Re: [PATCH 0/7] atomics: generate atomic headers

2018-06-04 Thread Mark Rutland
On Mon, Jun 04, 2018 at 06:21:22PM +0200, Dmitry Vyukov wrote:
> On Tue, May 29, 2018 at 8:07 PM, Mark Rutland  wrote:
> > This series scripts the generation of the various atomic headers, to
> > ensure that the various atomic APIs remain consistent, reducing the risk
> > of human error, and simplifying future rework.
> >
> > The series is based on my atomic API cleanup patches [1,2], and can be
> > found on my atomics/generated branch [3] on kernel.org.
> >
> > The first three patches are preparatory rework, with patch four
> > introducing the infrastructure. The final three patches migrate to
> > generated headers. The scripts themselves are mostly POSIX sh (modulo
> > local), without bashisms, and work in dash and bash.
> >
> > Per Linus request that it is possible to use git grep to inspect the
> > atomic headers [3], the headers are committed (and not generated by
> > kbuild). Since we now expand the fallback definitions inline, each
> > *should* be easier to find with grep. Each header also has a comment at
> > the top with a path to the script used to generate it.
> >
> > While the diff stat looks like a huge addition, the scripting comes in
> > at ~800 lines in total, including the fallback definitions, so we're
> > removing ~1000 lines of hand-written code. At the same time, we fill in
> > gaps in the atomic_long API, and the instrumented atomic wrappers.
> >
> > Longer-term, I think things could be simplified if we were to rework the
> > headers such that we have:
> >
> > * arch/*/include/asm/atomic.h providing arch_atomic_*().
> >
> > * include/linux/atomic-raw.h building raw_atomic_*() atop of the
> >   arch_atomic_*() definitions, filling in gaps in the API. Having
> >   separate arch_ and raw_ namespaces would simplify the ifdeffery.
> >
> > * include/linux/atomic.h building atomic_*() atop of the raw_atomic_*()
> >   definitions, complete with any instrumentation. Instrumenting at this
> >   level would lower the instrumentation overhead, and would not require
> >   any ifdeffery as the whole raw_atomic_*() API would be available.
> >
> > ... I've avoided this for the time being due to the necessary churn in
> > arch code.
> >
> > Thanks,
> > Mark.
> 
> Wow! Nice work!

Thanks!

> I can say that besides KASAN, we will need hooking into atomics for
> KMSAN and KTSAN in future. Without this it would be lots of manual
> work with possibility of copy-paste errors.

Indeed!

This was largely driven by wanting the arm64 atomics instrumented, and
generating that does end up easier to get right than manually expanding all the
acquire/release/relaxed ifdeffery manually.

First we need to get the preparatory patches [1,2] merged. Those have had a few
fixups since they were last posted, so I'll send an updated version come
v4.18-rc1 unless the atomics maintainers really want to queue those ASAP.

There'll be some additional prep work for instrumentation on arm64, too (e.g.
fiddling with the (cmp)xchg instrumentation), so I'll look into that in the
mean time.

Thanks,
Mark.

> > [1] https://lkml.kernel.org/r/20180529154346.3168-1-mark.rutl...@arm.com
> > [2] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git 
> > atomics/api-cleanup
> > [3] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git 
> > atomics/generated
> > [4] 
> > https://lkml.kernel.org/r/ca+55afxju0op8qllu0n-rjhbs7gqslvd8jcyedgw6jezfn7...@mail.gmail.com


Re: [PATCH 0/7] atomics: generate atomic headers

2018-06-04 Thread Mark Rutland
On Mon, Jun 04, 2018 at 06:21:22PM +0200, Dmitry Vyukov wrote:
> On Tue, May 29, 2018 at 8:07 PM, Mark Rutland  wrote:
> > This series scripts the generation of the various atomic headers, to
> > ensure that the various atomic APIs remain consistent, reducing the risk
> > of human error, and simplifying future rework.
> >
> > The series is based on my atomic API cleanup patches [1,2], and can be
> > found on my atomics/generated branch [3] on kernel.org.
> >
> > The first three patches are preparatory rework, with patch four
> > introducing the infrastructure. The final three patches migrate to
> > generated headers. The scripts themselves are mostly POSIX sh (modulo
> > local), without bashisms, and work in dash and bash.
> >
> > Per Linus request that it is possible to use git grep to inspect the
> > atomic headers [3], the headers are committed (and not generated by
> > kbuild). Since we now expand the fallback definitions inline, each
> > *should* be easier to find with grep. Each header also has a comment at
> > the top with a path to the script used to generate it.
> >
> > While the diff stat looks like a huge addition, the scripting comes in
> > at ~800 lines in total, including the fallback definitions, so we're
> > removing ~1000 lines of hand-written code. At the same time, we fill in
> > gaps in the atomic_long API, and the instrumented atomic wrappers.
> >
> > Longer-term, I think things could be simplified if we were to rework the
> > headers such that we have:
> >
> > * arch/*/include/asm/atomic.h providing arch_atomic_*().
> >
> > * include/linux/atomic-raw.h building raw_atomic_*() atop of the
> >   arch_atomic_*() definitions, filling in gaps in the API. Having
> >   separate arch_ and raw_ namespaces would simplify the ifdeffery.
> >
> > * include/linux/atomic.h building atomic_*() atop of the raw_atomic_*()
> >   definitions, complete with any instrumentation. Instrumenting at this
> >   level would lower the instrumentation overhead, and would not require
> >   any ifdeffery as the whole raw_atomic_*() API would be available.
> >
> > ... I've avoided this for the time being due to the necessary churn in
> > arch code.
> >
> > Thanks,
> > Mark.
> 
> Wow! Nice work!

Thanks!

> I can say that besides KASAN, we will need hooking into atomics for
> KMSAN and KTSAN in future. Without this it would be lots of manual
> work with possibility of copy-paste errors.

Indeed!

This was largely driven by wanting the arm64 atomics instrumented, and
generating that does end up easier to get right than manually expanding all the
acquire/release/relaxed ifdeffery manually.

First we need to get the preparatory patches [1,2] merged. Those have had a few
fixups since they were last posted, so I'll send an updated version come
v4.18-rc1 unless the atomics maintainers really want to queue those ASAP.

There'll be some additional prep work for instrumentation on arm64, too (e.g.
fiddling with the (cmp)xchg instrumentation), so I'll look into that in the
mean time.

Thanks,
Mark.

> > [1] https://lkml.kernel.org/r/20180529154346.3168-1-mark.rutl...@arm.com
> > [2] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git 
> > atomics/api-cleanup
> > [3] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git 
> > atomics/generated
> > [4] 
> > https://lkml.kernel.org/r/ca+55afxju0op8qllu0n-rjhbs7gqslvd8jcyedgw6jezfn7...@mail.gmail.com


Re: Spectre mitigation doesn't seem to work at all?!

2018-06-04 Thread Andreas Hartmann
On 06/04/2018 at 04:12 PM Alan Cox wrote:
>> A malicious program most probably won't care about that. Therefore, my
>> next question is: which memory regions can be exploited by a malicious
>> program? The complete physical memory or only the memory provided to the
>> malicious program? Should be the latter if this approach should have any
>> impact.
> 
> Spectre is not about memory regions. It's about speculative execution
> leaving measurable footprints. What footprints you leave depend upon what
> code you are executing. Thus the question becomes 'what can the target
> access'.
> 
> In order to attack something you need both a way to influence the code
> concerned and a way to measure it. In addition it needs to have some
> secret you want.
> 
> In practice that usually means something on the same system with its own
> memory space/privilege level. The usual cases then are user<->kernel and
> managed application<->runtime.

Would this be a practical test case: Gather keys and passwords used by a
ssh login by running a malicious program in parallel to sshd as another
ordinary user w/o root access.


Thanks,
Andreas


Re: Spectre mitigation doesn't seem to work at all?!

2018-06-04 Thread Andreas Hartmann
On 06/04/2018 at 04:12 PM Alan Cox wrote:
>> A malicious program most probably won't care about that. Therefore, my
>> next question is: which memory regions can be exploited by a malicious
>> program? The complete physical memory or only the memory provided to the
>> malicious program? Should be the latter if this approach should have any
>> impact.
> 
> Spectre is not about memory regions. It's about speculative execution
> leaving measurable footprints. What footprints you leave depend upon what
> code you are executing. Thus the question becomes 'what can the target
> access'.
> 
> In order to attack something you need both a way to influence the code
> concerned and a way to measure it. In addition it needs to have some
> secret you want.
> 
> In practice that usually means something on the same system with its own
> memory space/privilege level. The usual cases then are user<->kernel and
> managed application<->runtime.

Would this be a practical test case: Gather keys and passwords used by a
ssh login by running a malicious program in parallel to sshd as another
ordinary user w/o root access.


Thanks,
Andreas


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Viresh Kumar
On 05-06-18, 07:48, Daniel Lezcano wrote:
> As soon as we reach complete(), no timer can be set because of the
> condition before.

Why not ? We aren't using any locks here and it is possible that run_duration_ms
is set to 0 from idle_injection_stop() only after the first thread has restarted
the hrtimer. Isn't it ?

-- 
viresh


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Viresh Kumar
On 05-06-18, 07:48, Daniel Lezcano wrote:
> As soon as we reach complete(), no timer can be set because of the
> condition before.

Why not ? We aren't using any locks here and it is possible that run_duration_ms
is set to 0 from idle_injection_stop() only after the first thread has restarted
the hrtimer. Isn't it ?

-- 
viresh


RE: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Hatayama, Daisuke



> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Tuesday, June 5, 2018 11:03 AM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' ; Okajima, Toshiyuki 
> ; linux-kernel@vger.kernel.org;
> 'ebied...@aristanetworks.com' 
> Subject: Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> 
> ebied...@xmission.com (Eric W. Biederman) writes:
> 
> > "Hatayama, Daisuke"  writes:
> >
> >>> Can you test this and please verify it fixes your issue?
> >>
> >> I tried this patch on top of v4.17 but the system fails to boot
> >> without detecting root disks by dracut like this:
> [snip]
> 
> >> OTOH, there's no issue on the pure v4.17 kernel.
> >>
> >> As above, ls /sys/module looks apparently good. But I guess any part of
> >> behavior of getdentries() on sysfs must have changed, affecting the disk
> >> detection...
> >
> > I haven't been able to reproduce this yet.  My test system boots fine.
> > Which fedora are you testing on?
> 
> I reproduced something similar and fedora 28.  So I think I have found
> and fixed the issue.  I believe I simply reversed the test at the end of
> kernfs_dir_pos. AKA "<" instead of ">".

Though too late, I used fedora 28 and RHEL7.5.

I applied this fix to your patch and the system boot was successfully done.

I'll start testing your patch from now on.

> 
> I am going to see if I can test my changes more throughly on this side
> and then repost.
> 
> 
> 
> >>>  fs/kernfs/dir.c | 109
> >>> ++--
> >>>  1 file changed, 67 insertions(+), 42 deletions(-)
> >>>
> >>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> >>> index 89d1dc19340b..8148b5fec48d 100644
> >>> --- a/fs/kernfs/dir.c
> >>> +++ b/fs/kernfs/dir.c
> >>> @@ -1584,53 +1584,75 @@ static int kernfs_dir_fop_release(struct inode
> *inode,
> >>> struct file *filp)
> >>>   return 0;
> >>>  }
> >>>
> >>> +static struct kernfs_node *kernfs_dir_next(struct kernfs_node *pos)
> >>> +{
> >>> + struct rb_node *node = rb_next(>rb);
> >>> + return node ? rb_to_kn(node) : NULL;
> >>> +}
> >>> +
> >>>  static struct kernfs_node *kernfs_dir_pos(const void *ns,
> >>> - struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
> >>> + struct kernfs_node *parent, loff_t off, struct kernfs_node *saved)
> >>>  {
> >>> - if (pos) {
> >>> - int valid = kernfs_active(pos) &&
> >>> - pos->parent == parent && hash == pos->hash;
> >>> - kernfs_put(pos);
> >>> - if (!valid)
> >>> - pos = NULL;
> >>> - }
> >>> - if (!pos && (hash > 1) && (hash < INT_MAX)) {
> >>> - struct rb_node *node = parent->dir.children.rb_node;
> >>> - while (node) {
> >>> - pos = rb_to_kn(node);
> >>> -
> >>> - if (hash < pos->hash)
> >>> - node = node->rb_left;
> >>> - else if (hash > pos->hash)
> >>> - node = node->rb_right;
> >>> - else
> >>> - break;
> >>> + struct kernfs_node *pos;
> >>> + struct rb_node *node;
> >>> + unsigned int hash;
> >>> + const char *name = "";
> >>> +
> >>> + /* Is off a valid name hash? */
> >>> + if ((off < 2) || (off >= INT_MAX))
> >>> + return NULL;
> >>> + hash = off;
> >>> +
> >>> + /* Is the saved position usable? */
> >>> + if (saved) {
> >>> + /* Proper parent and hash? */
> >>> + if ((parent != saved->parent) || (saved->hash != hash)) {
> >>> + saved = NULL;
> >>
> >> name is uninitialized in this path.
> >
> > It is.  name is initialized to "" see above.
> >
> >>> + } else {
> >>> + if (kernfs_active(saved))
> >>> + return saved;
> >>> + name = saved->name;
> >>>   }
> >>>   }
> >>> - /* Skip over entries which are dying/dead or in the wrong namespace
> >>> */
> >>> - while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
> >>> - struct rb_node *node = rb_next(>rb);
> >>> - if (!node)
> >>> - pos = NULL;
> >>> +
> >>> + /* Find the closest pos to the hash we are looking for */
> >>> + pos = NULL;
> >>> + node = parent->dir.children.rb_node;
> >>> + while (node) {
> >>> + int result;
> >>> +
> >>> + pos = rb_to_kn(node);
> >>> + result = kernfs_name_compare(hash, name, ns, pos);
> >>> + if (result < 0)
> >>> + node = node->rb_left;
> >>> + else if (result > 0)
> >>> + node = node->rb_right;
> >>>   else
> >>> - pos = rb_to_kn(node);
> >>> + break;
> >>>   }
> >>> +
> >>> + /* Ensure pos is at or beyond the target position */
> >>> + if (pos && (kernfs_name_compare(hash, name, ns, pos) < 0))
> 
>   should be > 0
> >>> + pos = 

RE: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Hatayama, Daisuke



> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Tuesday, June 5, 2018 11:03 AM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' ; Okajima, Toshiyuki 
> ; linux-kernel@vger.kernel.org;
> 'ebied...@aristanetworks.com' 
> Subject: Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> 
> ebied...@xmission.com (Eric W. Biederman) writes:
> 
> > "Hatayama, Daisuke"  writes:
> >
> >>> Can you test this and please verify it fixes your issue?
> >>
> >> I tried this patch on top of v4.17 but the system fails to boot
> >> without detecting root disks by dracut like this:
> [snip]
> 
> >> OTOH, there's no issue on the pure v4.17 kernel.
> >>
> >> As above, ls /sys/module looks apparently good. But I guess any part of
> >> behavior of getdentries() on sysfs must have changed, affecting the disk
> >> detection...
> >
> > I haven't been able to reproduce this yet.  My test system boots fine.
> > Which fedora are you testing on?
> 
> I reproduced something similar and fedora 28.  So I think I have found
> and fixed the issue.  I believe I simply reversed the test at the end of
> kernfs_dir_pos. AKA "<" instead of ">".

Though too late, I used fedora 28 and RHEL7.5.

I applied this fix to your patch and the system boot was successfully done.

I'll start testing your patch from now on.

> 
> I am going to see if I can test my changes more throughly on this side
> and then repost.
> 
> 
> 
> >>>  fs/kernfs/dir.c | 109
> >>> ++--
> >>>  1 file changed, 67 insertions(+), 42 deletions(-)
> >>>
> >>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> >>> index 89d1dc19340b..8148b5fec48d 100644
> >>> --- a/fs/kernfs/dir.c
> >>> +++ b/fs/kernfs/dir.c
> >>> @@ -1584,53 +1584,75 @@ static int kernfs_dir_fop_release(struct inode
> *inode,
> >>> struct file *filp)
> >>>   return 0;
> >>>  }
> >>>
> >>> +static struct kernfs_node *kernfs_dir_next(struct kernfs_node *pos)
> >>> +{
> >>> + struct rb_node *node = rb_next(>rb);
> >>> + return node ? rb_to_kn(node) : NULL;
> >>> +}
> >>> +
> >>>  static struct kernfs_node *kernfs_dir_pos(const void *ns,
> >>> - struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
> >>> + struct kernfs_node *parent, loff_t off, struct kernfs_node *saved)
> >>>  {
> >>> - if (pos) {
> >>> - int valid = kernfs_active(pos) &&
> >>> - pos->parent == parent && hash == pos->hash;
> >>> - kernfs_put(pos);
> >>> - if (!valid)
> >>> - pos = NULL;
> >>> - }
> >>> - if (!pos && (hash > 1) && (hash < INT_MAX)) {
> >>> - struct rb_node *node = parent->dir.children.rb_node;
> >>> - while (node) {
> >>> - pos = rb_to_kn(node);
> >>> -
> >>> - if (hash < pos->hash)
> >>> - node = node->rb_left;
> >>> - else if (hash > pos->hash)
> >>> - node = node->rb_right;
> >>> - else
> >>> - break;
> >>> + struct kernfs_node *pos;
> >>> + struct rb_node *node;
> >>> + unsigned int hash;
> >>> + const char *name = "";
> >>> +
> >>> + /* Is off a valid name hash? */
> >>> + if ((off < 2) || (off >= INT_MAX))
> >>> + return NULL;
> >>> + hash = off;
> >>> +
> >>> + /* Is the saved position usable? */
> >>> + if (saved) {
> >>> + /* Proper parent and hash? */
> >>> + if ((parent != saved->parent) || (saved->hash != hash)) {
> >>> + saved = NULL;
> >>
> >> name is uninitialized in this path.
> >
> > It is.  name is initialized to "" see above.
> >
> >>> + } else {
> >>> + if (kernfs_active(saved))
> >>> + return saved;
> >>> + name = saved->name;
> >>>   }
> >>>   }
> >>> - /* Skip over entries which are dying/dead or in the wrong namespace
> >>> */
> >>> - while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
> >>> - struct rb_node *node = rb_next(>rb);
> >>> - if (!node)
> >>> - pos = NULL;
> >>> +
> >>> + /* Find the closest pos to the hash we are looking for */
> >>> + pos = NULL;
> >>> + node = parent->dir.children.rb_node;
> >>> + while (node) {
> >>> + int result;
> >>> +
> >>> + pos = rb_to_kn(node);
> >>> + result = kernfs_name_compare(hash, name, ns, pos);
> >>> + if (result < 0)
> >>> + node = node->rb_left;
> >>> + else if (result > 0)
> >>> + node = node->rb_right;
> >>>   else
> >>> - pos = rb_to_kn(node);
> >>> + break;
> >>>   }
> >>> +
> >>> + /* Ensure pos is at or beyond the target position */
> >>> + if (pos && (kernfs_name_compare(hash, name, ns, pos) < 0))
> 
>   should be > 0
> >>> + pos = 

Re: [PATCH v5 24/31] kconfig: add CC_IS_GCC and GCC_VERSION

2018-06-04 Thread Stefan Agner
On 05.06.2018 02:07, Masahiro Yamada wrote:
> Hi Stefan
> 
> 2018-06-05 6:49 GMT+09:00 Stefan Agner :
>> Hi Masahiro,
>>
>> On 28.05.2018 11:22, Masahiro Yamada wrote:
>>> This will be useful to specify the required compiler version,
>>> like this:
>>>
>>> config FOO
>>> bool "Use Foo"
>>> depends on GCC_VERSION >= 40800
>>> help
>>>   This feature requires GCC 4.8 or newer.
>>>
>>
>> I tried using CC_IS_GCC today while using clang. It seems that it is set
>> to y despite I am using CC=clang.
>>
>> .config looks like this after config:
>>
>> ...
>> CONFIG_CC_IS_GCC=y
>> CONFIG_GCC_VERSION=40201
>> CONFIG_CC_IS_CLANG=y
>> CONFIG_CLANG_VERSION=6
>> ...
>>
>>
>> I am using clang 6.0.0 on Arch Linux, which seems to return a version
>> when using gcc-version.sh:
>> ./scripts/gcc-version.sh clang | sed 's/^0*//'
>> 402
>>
>> I guess that should not be the case?
>>
> 
> 
> What will 'clang --version' print on your machine?

$ clang --version
clang version 6.0.0 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir:
/home/ags/gcc-linaro-7.2.1-2017.11-x86_64_arm-linux-gnueabihf/bin

I use a symlink to clang in my cross compiler toolchain, that is why
InstalledDir points to a GCC toolchain.

--
Stefan


Re: [PATCH v5 24/31] kconfig: add CC_IS_GCC and GCC_VERSION

2018-06-04 Thread Stefan Agner
On 05.06.2018 02:07, Masahiro Yamada wrote:
> Hi Stefan
> 
> 2018-06-05 6:49 GMT+09:00 Stefan Agner :
>> Hi Masahiro,
>>
>> On 28.05.2018 11:22, Masahiro Yamada wrote:
>>> This will be useful to specify the required compiler version,
>>> like this:
>>>
>>> config FOO
>>> bool "Use Foo"
>>> depends on GCC_VERSION >= 40800
>>> help
>>>   This feature requires GCC 4.8 or newer.
>>>
>>
>> I tried using CC_IS_GCC today while using clang. It seems that it is set
>> to y despite I am using CC=clang.
>>
>> .config looks like this after config:
>>
>> ...
>> CONFIG_CC_IS_GCC=y
>> CONFIG_GCC_VERSION=40201
>> CONFIG_CC_IS_CLANG=y
>> CONFIG_CLANG_VERSION=6
>> ...
>>
>>
>> I am using clang 6.0.0 on Arch Linux, which seems to return a version
>> when using gcc-version.sh:
>> ./scripts/gcc-version.sh clang | sed 's/^0*//'
>> 402
>>
>> I guess that should not be the case?
>>
> 
> 
> What will 'clang --version' print on your machine?

$ clang --version
clang version 6.0.0 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir:
/home/ags/gcc-linaro-7.2.1-2017.11-x86_64_arm-linux-gnueabihf/bin

I use a symlink to clang in my cross compiler toolchain, that is why
InstalledDir points to a GCC toolchain.

--
Stefan


[no subject]

2018-06-04 Thread Mavis Wanczyk



-- 
Good Day,

I, Mavis Wanczyk donates $ 5 Million Dollars from part of my Powerball
Jackpot Lottery of $ 758 Million Dollars, respond with your details
for claims.

I await your earliest response and God Bless you

Good luck.
Mavis Wanczyk


[no subject]

2018-06-04 Thread Mavis Wanczyk



-- 
Good Day,

I, Mavis Wanczyk donates $ 5 Million Dollars from part of my Powerball
Jackpot Lottery of $ 758 Million Dollars, respond with your details
for claims.

I await your earliest response and God Bless you

Good luck.
Mavis Wanczyk


Re: 4.14.44: BUG_ON(!list_empty(>wait_list));

2018-06-04 Thread Daniel J Blueman
On 5 June 2018 at 05:47,   wrote:
>> -Original Message-
>> From: Daniel J Blueman [mailto:dan...@quora.org]
>> Sent: Thursday, May 31, 2018 9:21 PM
>> To: Linux Kernel; linux-a...@vger.kernel.org
>> Cc: Limonciello, Mario; Dominguez, Jared
>> Subject: 4.14.44: BUG_ON(!list_empty(>wait_list));
>>
>> Plugging in a USB-C power source on my Dell XPS 9550 trips an ACPI
>> BUG_ON [1], reproducible with mainline 4.14.44, suggesting other
>> threads are waiting for semaphore acquisition due to
>> "BUG_ON(!list_empty(>wait_list))".
>>
>> This is the current 1.7.0 BIOS with Ubuntu 18.04 userspace, plugging
>> in an LG 27UD88 (also with the current firmware) monitor USB-C
>> connection which apparently advertises 60W charging (x1,
>> PowerDelivery, DisplayPort alternative mode, data). The same issues
>> reproduce on a Dell Precision 5510 with Ubuntu 16.04, the shipped
>> kernel and 4.14.44.
>>
>> I can enable ACPI debugging if useful? Perhaps ACPI_DB_MUTEX or other
>> levels would be appropriate?
>
> I think most useful would be if this can still reproduce with 4.17.

Fair suggestion!

I can achieve 100% reproducibility of the same backtrace on a clean
Ubuntu 18.04 install with 4.17 mainline [1]:

1. disable grub 'quiet' parameter, disconnect charger and power off laptop to S5
2. power on laptop from S5
3. suspend via closing lid
4. resume by opening lid
5. connect LG 27UD88 via USB-C
6. wait 20s
7. disconnect LG 27UD88
8. run 'systemctl poweroff'
9. observe the same backtrace from acpi_os_delete_semaphore

I don't observe the issue when using an Apple 87W USB-C Power Adapter,
so it may reproduce on other monitors advertising USB-C DisplayPort
alternate mode.

Thanks,
  Daniel

[1] 
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17/linux-image-unsigned-4.17.0-041700-generic_4.17.0-041700.201806041953_amd64.deb

>> kernel BUG at /home/kernel/COD/linux/drivers/acpi/osl.c:1201
>> invalid opcode:  [#1] SMP PTI
>> Modules linked in: [...]
>> CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.14.44-041444-generic
>> #201805251612
>> Hardware name: Dell Inc. XPS 15 9550/, BIOS 1.7.0 02/23/2018
>> task: 9bc2ab6b9740 task.stack: bOca80034000
>> RIP: 0010:acpi_os_delete_semaphore+0x6d/0x70
>> RSP: 0018:bOca80037be8 EFLAGS: 00010283
>> RAX: bOca83f8fc40 RBX: 9bc238b5dbe0 RCX: 
>> RDX: 9bc238b5dbe8 RSI:  RDI: 9bc238b5dbe0
>> RBP: 9bc2adlc0990 ROB: 9bc2bdc25f20 R09: 9bc29ee56300
>> R10: e03bd2796440 R11: 9bc2ad183fa0 R12: 9bc22f1321e0
>> R13: 0001 R14: 0001 R15: 9bc22f132eb0
>> FS: 7fc03886f940() GS:9bc2bdc0()
>> knlGS:
>> CS: 0010 DS:  ES:  CRO: 80050033
>> CR2: 7ffc645e70f8 CR3: 00049e120001 CR4: 003606f0
>> Call Trace:
>> acpi_ex_system_reset_event+0x3f/0x65
>> acpi_ex_opcode_1A_OT_0R+0x70/0xfa
>> acpi_ds_exec_end_op+0x15d/0x71b
>> acpi_ps_parse_loop+0x929/0x9d6
>> ? acpi_ds_result_push+0x82/0x1d2
>> acpi_ps_parse_aml+0x1a2/0x4af
>> acpi_ps_execute_method+0x1ef/0x2ab
>> acpi_ns_evaluate+0x2e4/0x41d
>> acpi_evaluate_object+0x1cb/0x38e
>> acpi_enter_sleep_state_prep+0xae/0x13a
>> acpi_sleep_prepare.part.2+0x2e/0x40
>> acpi_power_off_prepare+0xf/0x20
>> [38871.1925361 kernel_power_off+0x42/0x70
>> SYSC_reboot+0x12f/0x210
>> ? handle_mm_fault+0xea/0x1e0
>> [38871.1925861 ? do_writev+0x5e/0xf0
>> ? do_writev+0x5e/0xf0
>> do_syscall_64+0x6e/0x120
>> entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>> RIP: 0033:0x7fc03839b373
>> RSP: 002b:7ffc645e70f8 EFLAGS: 0202 ORIG_RAX: 00a9
>> RAX: ffda RBX: 4321fedc RCX: 7fc03839b373
>> ROX: 4321fedc RSI: 28121969 RDI: fee1dead
>> RBP: 7ffc645e7160 R08:  R09: 
>> R10: 0002 R11: 0202 R12: 7ffc645e7168
>> R13:  R14: 001b0004 R15: 7ffc645e7458
>> Code: b8 00 04 00 00 48 c7 c1 c3 91 28 ab 48 c7 c2 20 91 28 ab be of
>> 04 00 00 bf 00 00 00 01 03 41 85 04 00 58 eb b0 b8 01 10 00 00 c3 
>> Ob 90 Of if 44 00 00 80 3d 74 CO 97 01 00 41 54 55 53 Of 84
>> RIP: acpi_os_delete_semaphore+0x6d/0x70 RSP: b0ca80037be8
-- 
Daniel J Blueman


Re: 4.14.44: BUG_ON(!list_empty(>wait_list));

2018-06-04 Thread Daniel J Blueman
On 5 June 2018 at 05:47,   wrote:
>> -Original Message-
>> From: Daniel J Blueman [mailto:dan...@quora.org]
>> Sent: Thursday, May 31, 2018 9:21 PM
>> To: Linux Kernel; linux-a...@vger.kernel.org
>> Cc: Limonciello, Mario; Dominguez, Jared
>> Subject: 4.14.44: BUG_ON(!list_empty(>wait_list));
>>
>> Plugging in a USB-C power source on my Dell XPS 9550 trips an ACPI
>> BUG_ON [1], reproducible with mainline 4.14.44, suggesting other
>> threads are waiting for semaphore acquisition due to
>> "BUG_ON(!list_empty(>wait_list))".
>>
>> This is the current 1.7.0 BIOS with Ubuntu 18.04 userspace, plugging
>> in an LG 27UD88 (also with the current firmware) monitor USB-C
>> connection which apparently advertises 60W charging (x1,
>> PowerDelivery, DisplayPort alternative mode, data). The same issues
>> reproduce on a Dell Precision 5510 with Ubuntu 16.04, the shipped
>> kernel and 4.14.44.
>>
>> I can enable ACPI debugging if useful? Perhaps ACPI_DB_MUTEX or other
>> levels would be appropriate?
>
> I think most useful would be if this can still reproduce with 4.17.

Fair suggestion!

I can achieve 100% reproducibility of the same backtrace on a clean
Ubuntu 18.04 install with 4.17 mainline [1]:

1. disable grub 'quiet' parameter, disconnect charger and power off laptop to S5
2. power on laptop from S5
3. suspend via closing lid
4. resume by opening lid
5. connect LG 27UD88 via USB-C
6. wait 20s
7. disconnect LG 27UD88
8. run 'systemctl poweroff'
9. observe the same backtrace from acpi_os_delete_semaphore

I don't observe the issue when using an Apple 87W USB-C Power Adapter,
so it may reproduce on other monitors advertising USB-C DisplayPort
alternate mode.

Thanks,
  Daniel

[1] 
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17/linux-image-unsigned-4.17.0-041700-generic_4.17.0-041700.201806041953_amd64.deb

>> kernel BUG at /home/kernel/COD/linux/drivers/acpi/osl.c:1201
>> invalid opcode:  [#1] SMP PTI
>> Modules linked in: [...]
>> CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.14.44-041444-generic
>> #201805251612
>> Hardware name: Dell Inc. XPS 15 9550/, BIOS 1.7.0 02/23/2018
>> task: 9bc2ab6b9740 task.stack: bOca80034000
>> RIP: 0010:acpi_os_delete_semaphore+0x6d/0x70
>> RSP: 0018:bOca80037be8 EFLAGS: 00010283
>> RAX: bOca83f8fc40 RBX: 9bc238b5dbe0 RCX: 
>> RDX: 9bc238b5dbe8 RSI:  RDI: 9bc238b5dbe0
>> RBP: 9bc2adlc0990 ROB: 9bc2bdc25f20 R09: 9bc29ee56300
>> R10: e03bd2796440 R11: 9bc2ad183fa0 R12: 9bc22f1321e0
>> R13: 0001 R14: 0001 R15: 9bc22f132eb0
>> FS: 7fc03886f940() GS:9bc2bdc0()
>> knlGS:
>> CS: 0010 DS:  ES:  CRO: 80050033
>> CR2: 7ffc645e70f8 CR3: 00049e120001 CR4: 003606f0
>> Call Trace:
>> acpi_ex_system_reset_event+0x3f/0x65
>> acpi_ex_opcode_1A_OT_0R+0x70/0xfa
>> acpi_ds_exec_end_op+0x15d/0x71b
>> acpi_ps_parse_loop+0x929/0x9d6
>> ? acpi_ds_result_push+0x82/0x1d2
>> acpi_ps_parse_aml+0x1a2/0x4af
>> acpi_ps_execute_method+0x1ef/0x2ab
>> acpi_ns_evaluate+0x2e4/0x41d
>> acpi_evaluate_object+0x1cb/0x38e
>> acpi_enter_sleep_state_prep+0xae/0x13a
>> acpi_sleep_prepare.part.2+0x2e/0x40
>> acpi_power_off_prepare+0xf/0x20
>> [38871.1925361 kernel_power_off+0x42/0x70
>> SYSC_reboot+0x12f/0x210
>> ? handle_mm_fault+0xea/0x1e0
>> [38871.1925861 ? do_writev+0x5e/0xf0
>> ? do_writev+0x5e/0xf0
>> do_syscall_64+0x6e/0x120
>> entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>> RIP: 0033:0x7fc03839b373
>> RSP: 002b:7ffc645e70f8 EFLAGS: 0202 ORIG_RAX: 00a9
>> RAX: ffda RBX: 4321fedc RCX: 7fc03839b373
>> ROX: 4321fedc RSI: 28121969 RDI: fee1dead
>> RBP: 7ffc645e7160 R08:  R09: 
>> R10: 0002 R11: 0202 R12: 7ffc645e7168
>> R13:  R14: 001b0004 R15: 7ffc645e7458
>> Code: b8 00 04 00 00 48 c7 c1 c3 91 28 ab 48 c7 c2 20 91 28 ab be of
>> 04 00 00 bf 00 00 00 01 03 41 85 04 00 58 eb b0 b8 01 10 00 00 c3 
>> Ob 90 Of if 44 00 00 80 3d 74 CO 97 01 00 41 54 55 53 Of 84
>> RIP: acpi_os_delete_semaphore+0x6d/0x70 RSP: b0ca80037be8
-- 
Daniel J Blueman


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Daniel Lezcano
On 05/06/2018 07:14, Viresh Kumar wrote:
> On 31-05-18, 20:25, Daniel Lezcano wrote:
>> On 29/05/2018 11:31, Viresh Kumar wrote:
>>> On 25-05-18, 11:49, Daniel Lezcano wrote:
 +  /* + * The last CPU waking up is in charge of setting the
 timer. If + * the CPU is hotplugged, the timer will move to
 another CPU +   * (which may not belong to the same cluster) but
 that is not a + * problem as the timer will be set again by
 another CPU +   * belonging to the cluster. This mechanism is
 self adaptive. +*/
>>> 
>>> I am afraid that the above comment may not be completely true all
>>> the time. For a quad-core platform, it is possible for 3 CPUs
>>> (0,1,2) to run this function as soon as the kthread is woken up,
>>> but one of the CPUs (3) may be stuck servicing an IRQ, Deadline
>>> or RT activity. Because you do atomic_inc() also in this function
>>> (above) itself, below decrement may return a true value for the
>>> CPU2 and that will restart the hrtimer, while one of the CPUs
>>> never got a chance to increment count in the first place.
>>> 
>>> The fix is simple though, do the increment in
>>> idle_injection_wakeup() and things should be fine then.
>> 
>> Ok.
>> 
 +  if (!atomic_dec_and_test(_dev->count)) + return; + +
 run_duration_ms = atomic_read(_dev->run_duration_ms); + if
 (run_duration_ms) { +  hrtimer_start(_dev->timer,
 ms_to_ktime(run_duration_ms), +
 HRTIMER_MODE_REL_PINNED); +return; +   } + +
 complete(_dev->stop_complete);
>>> 
>>> So you call complete() after hrtimer is potentially restarted.
>>> This can happen if idle_injection_stop() is called right after
>>> the above atomic_read() has finished :)
>>> 
>>> IOW, this doesn't look safe now as well.
>> 
>> It is safe, we just missed a cycle and the stop will block until
>> the next cycle. I did it on purpose and for me it is correct.
> 
> Okay, what about this then:
> 
> Path A Path B
> 
> idle_injection_fn()
> idle_injection_unregister() hrtimer_start()
> idle_injection_stop()
> 
> complete()
> 
> wait_for_completion() kfree(ii_dev);
> 
> Hrtimer is still used here after getting freed.
> 
> Is this not possible ?

As soon as we reach complete(), no timer can be set because of the
condition before.

 +struct idle_injection_device *idle_injection_register(struct
 cpumask *cpumask) +{ + struct idle_injection_device *ii_dev; +
 int cpu, cpu2; + + ii_dev = ii_dev_alloc(); +  if (!ii_dev) +
 return NULL; + +   cpumask_copy(ii_dev->cpumask, cpumask); +
 hrtimer_init(_dev->timer, CLOCK_MONOTONIC,
 HRTIMER_MODE_REL); +   ii_dev->timer.function =
 idle_injection_wakeup_fn; + +  for_each_cpu(cpu,
 ii_dev->cpumask) { + + if (per_cpu(idle_injection_device,
 cpu)) {
>>> 
>>> Maybe add unlikely here ?
>> 
>> For this kind of init function and tight loop, there is no benefit
>> of adding the unlikely / likely. I can add it if you want, but it
>> is useless.
> 
> Okay.
> 
 +  pr_err("cpu%d is already registered\n", cpu); + 
 goto
 out_rollback_per_cpu; +} + +   
 per_cpu(idle_injection_device,
 cpu) = ii_dev; +   } + +   return ii_dev; + +out_rollback_per_cpu: 
 +  for_each_cpu(cpu2, ii_dev->cpumask) { + if (cpu == cpu2)
>>> 
>>> And is this really required? Perhaps this conditional is making
>>> it worse and just setting the per-cpu variable forcefully to NULL
>>> would be much faster :)
>> 
>> We want undo what was done, setting all variables to NULL resets
>> the values from a previous register and we don't want that.
> 
> Why will that happen ? We are only iterating over a particular
> cpumask and not all possible CPUs. Yes I understand the "undo only
> what we did" part, but the conditional is more expensive than that I
> feel.

Ok, I will remove it. In any case, there is no point of having the
function called twice. If that happens, there is something wrong with
the caller.


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Daniel Lezcano
On 05/06/2018 07:14, Viresh Kumar wrote:
> On 31-05-18, 20:25, Daniel Lezcano wrote:
>> On 29/05/2018 11:31, Viresh Kumar wrote:
>>> On 25-05-18, 11:49, Daniel Lezcano wrote:
 +  /* + * The last CPU waking up is in charge of setting the
 timer. If + * the CPU is hotplugged, the timer will move to
 another CPU +   * (which may not belong to the same cluster) but
 that is not a + * problem as the timer will be set again by
 another CPU +   * belonging to the cluster. This mechanism is
 self adaptive. +*/
>>> 
>>> I am afraid that the above comment may not be completely true all
>>> the time. For a quad-core platform, it is possible for 3 CPUs
>>> (0,1,2) to run this function as soon as the kthread is woken up,
>>> but one of the CPUs (3) may be stuck servicing an IRQ, Deadline
>>> or RT activity. Because you do atomic_inc() also in this function
>>> (above) itself, below decrement may return a true value for the
>>> CPU2 and that will restart the hrtimer, while one of the CPUs
>>> never got a chance to increment count in the first place.
>>> 
>>> The fix is simple though, do the increment in
>>> idle_injection_wakeup() and things should be fine then.
>> 
>> Ok.
>> 
 +  if (!atomic_dec_and_test(_dev->count)) + return; + +
 run_duration_ms = atomic_read(_dev->run_duration_ms); + if
 (run_duration_ms) { +  hrtimer_start(_dev->timer,
 ms_to_ktime(run_duration_ms), +
 HRTIMER_MODE_REL_PINNED); +return; +   } + +
 complete(_dev->stop_complete);
>>> 
>>> So you call complete() after hrtimer is potentially restarted.
>>> This can happen if idle_injection_stop() is called right after
>>> the above atomic_read() has finished :)
>>> 
>>> IOW, this doesn't look safe now as well.
>> 
>> It is safe, we just missed a cycle and the stop will block until
>> the next cycle. I did it on purpose and for me it is correct.
> 
> Okay, what about this then:
> 
> Path A Path B
> 
> idle_injection_fn()
> idle_injection_unregister() hrtimer_start()
> idle_injection_stop()
> 
> complete()
> 
> wait_for_completion() kfree(ii_dev);
> 
> Hrtimer is still used here after getting freed.
> 
> Is this not possible ?

As soon as we reach complete(), no timer can be set because of the
condition before.

 +struct idle_injection_device *idle_injection_register(struct
 cpumask *cpumask) +{ + struct idle_injection_device *ii_dev; +
 int cpu, cpu2; + + ii_dev = ii_dev_alloc(); +  if (!ii_dev) +
 return NULL; + +   cpumask_copy(ii_dev->cpumask, cpumask); +
 hrtimer_init(_dev->timer, CLOCK_MONOTONIC,
 HRTIMER_MODE_REL); +   ii_dev->timer.function =
 idle_injection_wakeup_fn; + +  for_each_cpu(cpu,
 ii_dev->cpumask) { + + if (per_cpu(idle_injection_device,
 cpu)) {
>>> 
>>> Maybe add unlikely here ?
>> 
>> For this kind of init function and tight loop, there is no benefit
>> of adding the unlikely / likely. I can add it if you want, but it
>> is useless.
> 
> Okay.
> 
 +  pr_err("cpu%d is already registered\n", cpu); + 
 goto
 out_rollback_per_cpu; +} + +   
 per_cpu(idle_injection_device,
 cpu) = ii_dev; +   } + +   return ii_dev; + +out_rollback_per_cpu: 
 +  for_each_cpu(cpu2, ii_dev->cpumask) { + if (cpu == cpu2)
>>> 
>>> And is this really required? Perhaps this conditional is making
>>> it worse and just setting the per-cpu variable forcefully to NULL
>>> would be much faster :)
>> 
>> We want undo what was done, setting all variables to NULL resets
>> the values from a previous register and we don't want that.
> 
> Why will that happen ? We are only iterating over a particular
> cpumask and not all possible CPUs. Yes I understand the "undo only
> what we did" part, but the conditional is more expensive than that I
> feel.

Ok, I will remove it. In any case, there is no point of having the
function called twice. If that happens, there is something wrong with
the caller.


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH v2 4/9] x86: alternatives: macrofy locks for better inlining

2018-06-04 Thread kbuild test robot
Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=um SUBARCH=x86_64

All errors (new ones prefixed by >>):

   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:220: Error: no such instruction: `lock_prefix 
>> btsq $0,(%rax)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 28(%r12)'
--
   arch/x86/include/asm/atomic64_64.h: Assembler messages:
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: 
>> `lock_prefix incq 1000(%rcx,%rdx)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: 
>> `lock_prefix incq 64(%rdx)'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:267: Error: no such instruction: `lock_prefix 
>> btrq $8,8(%rax)'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
>> orb $2,8(%rax)'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
   arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
orb $1,120(%rax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,120(%rax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,120(%rdx)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,120(%rdx)'
   arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
orb $1,120(%rax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,120(%rax)'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 16(%r12)'
   arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix 
decl 16(%r12)'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl (%rdx)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl (%rdx)'
--
   arch/x86/include/asm/atomic64_64.h: Assembler messages:
>> arch/x86/include/asm/atomic64_64.h:46: Error: no such instruction: 
>> `lock_prefix addq %rsi,1008(%rdx,%rax)'
>> arch/x86/include/asm/atomic64_64.h:46: Error: no such instruction: 
>> `lock_prefix addq %rsi,72(%rax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 72(%rax)'
>> arch/x86/include/asm/atomic64_64.h:183: Error: no such instruction: 
>> `lock_prefix cmpxchgq %rcx,56(%rdx)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 76(%rdi)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl (%r12)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl 72(%rdi)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: 
>> `lock_prefix incq 56(%rsi)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl 72(%rdi)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 76(%rbx)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl -868(%rbx)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl (%rdi)'
   arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
andb $-5,8(%rax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl (%rdi)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl (%rax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 72(%rbx)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: 
>> `lock_prefix incq 56(%rax)'
>> arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix 
>> decl 296(%rcx)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl 24(%rdx)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: 
>> `lock_prefix incq (%rbx)'
   arch/x86/in

Re: [PATCH v2 4/9] x86: alternatives: macrofy locks for better inlining

2018-06-04 Thread kbuild test robot
Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=um SUBARCH=x86_64

All errors (new ones prefixed by >>):

   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:220: Error: no such instruction: `lock_prefix 
>> btsq $0,(%rax)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 28(%r12)'
--
   arch/x86/include/asm/atomic64_64.h: Assembler messages:
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: 
>> `lock_prefix incq 1000(%rcx,%rdx)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: 
>> `lock_prefix incq 64(%rdx)'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:267: Error: no such instruction: `lock_prefix 
>> btrq $8,8(%rax)'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
>> orb $2,8(%rax)'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
   arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
orb $1,120(%rax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,120(%rax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,120(%rdx)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,120(%rdx)'
   arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
orb $1,120(%rax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,120(%rax)'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 16(%r12)'
   arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix 
decl 16(%r12)'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl (%rdx)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl (%rdx)'
--
   arch/x86/include/asm/atomic64_64.h: Assembler messages:
>> arch/x86/include/asm/atomic64_64.h:46: Error: no such instruction: 
>> `lock_prefix addq %rsi,1008(%rdx,%rax)'
>> arch/x86/include/asm/atomic64_64.h:46: Error: no such instruction: 
>> `lock_prefix addq %rsi,72(%rax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 72(%rax)'
>> arch/x86/include/asm/atomic64_64.h:183: Error: no such instruction: 
>> `lock_prefix cmpxchgq %rcx,56(%rdx)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 76(%rdi)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl (%r12)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl 72(%rdi)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: 
>> `lock_prefix incq 56(%rsi)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl 72(%rdi)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 76(%rbx)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl -868(%rbx)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl (%rdi)'
   arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
andb $-5,8(%rax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl (%rdi)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl (%rax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 72(%rbx)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: 
>> `lock_prefix incq 56(%rax)'
>> arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix 
>> decl 296(%rcx)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl 24(%rdx)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: 
>> `lock_prefix incq (%rbx)'
   arch/x86/in

RE: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Hatayama, Daisuke



> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Monday, June 4, 2018 11:45 PM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' ; Okajima, Toshiyuki 
> ; linux-kernel@vger.kernel.org;
> 'ebied...@aristanetworks.com' 
> Subject: Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> 
> "Hatayama, Daisuke"  writes:
> 
> > Hello,
> >
> > Thanks for this patch, Eric.
> >
> >> -Original Message-
> >> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> >> Sent: Monday, June 4, 2018 3:51 AM
> >> To: Hatayama, Daisuke 
> >> Cc: 'gre...@linuxfoundation.org' ;
> >> 't...@kernel.org' ; Okajima, Toshiyuki
> >> ; linux-kernel@vger.kernel.org;
> >> 'ebied...@aristanetworks.com' 
> >> Subject: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> >>
> >>
> >> Over the years two bugs have crept into the code that handled the last
> >> returned kernfs directory being deleted, and handled general seeking
> >> in kernfs directories.  The result is that reading the /sys/module
> >> directory while moduled are loading and unloading will sometimes
> >> skip over a module that was present for the entire duration of
> >> the directory read.
> >>
> >> The first bug is that kernfs_dir_next_pos was advancing the position
> >> in the directory even when kernfs_dir_pos had found the starting
> >> kernfs directory entry was not usable.  In this case kernfs_dir_pos is
> >> guaranteed to return the directory entry past the starting directory
> >> entry, making it so that advancing the directory position is wrong.
> >>
> >> The second bug is that kernfs_dir_pos when the saved kernfs_node
> >> did not validate, was not always returning a directory after
> >> the the target position, but was sometimes returning the directory
> >> entry just before the target position.
> >>
> >> To correct these issues and to make the code easily readable and
> >> comprehensible several cleanups are made.
> >>
> >> - A function kernfs_dir_next is factored out to make it straight-forward
> >>   to find the next node in a kernfs directory.
> >>
> >> - A function kernfs_dir_skip_pos is factored out to skip over directory
> >>   entries that should not be shown to userspace.  This function is called
> >>   from kernfs_fop_readdir directly removing the complication of calling
> >>   it from the other directory advancement functions.
> >>
> >> - The kernfs_put of the saved directory entry is moved from kernfs_dir_pos
> >>   to kernfs_fop_readdir.  Keeping it with the kernfs_get when it is saved
> >>   and ensuring the directory position advancment functions can reference
> >>   the saved node without complications.
> >>
> >> - The function kernfs_dir_next_pos is modified to only advance the 
> >> directory
> >>   position in the common case when kernfs_dir_pos validates and returns
> >>   the starting kernfs node.  For all other cases kernfs_dir_pos is
> guaranteed
> >>   to return a directory position in advance of the starting directory
> position.
> >>
> >> - kernfs_dir_pos is given a significant make over.  It's essense remains
> the
> >>   same but some siginficant changes were made.
> >>
> >>   + The offset is validated to be a valid hash value at the very beginning.
> >> The validation is updated to handle the fact that neither 0 nor 1 are
> >> valid hash values.
> >>
> >>   + An extra test is added at the end of the rbtree lookup to ensure
> >> the returned position is at or beyond the target position.
> >>
> >>   + kernfs_name_compare is used during the rbtree lookup instead of just
> >> comparing
> >> the hash.  This allows the the passed namespace parameter to be used,
> and
> >> if it is available from the saved entry the old saved name as well.
> >>
> >>   + The validation of the saved kernfs node now checks for two cases.
> >> If the saved node is returnable.
> >> If the name of the saved node is usable during lookup.
> >>
> >> In net this should result in a easier to understand, and more correct
> >> version of directory traversal for kernfs directories.
> >>
> >> Reported-by: "Hatayama, Daisuke" 
> >> Fixes: a406f75840e1 ("sysfs: use rb-tree for inode number lookup")
> >> Fixes: 1e5289c97bba ("sysfs: Cache the last sysfs_dirent to improve readdir
> >> scalability v2")
> >> Signed-off-by: "Eric W. Biederman" 
> >> ---
> >>
> >> Can you test this and please verify it fixes your issue?
> >>
> >
> > I tried this patch on top of v4.17 but the system fails to boot
> > without detecting root disks by dracut like this:
> >
> > [  196.121294] dracut-initqueue[324]: Warning: dracut-initqueue
> timeout - starting timeout scripts
> > [  196.672175] dracut-initqueue[324]: Warning: dracut-initqueue
> timeout - starting timeout scripts
> > [  197.219519] dracut-initqueue[324]: Warning: dracut-initqueue
> timeout - starting timeout scripts
> > [  197.768997] dracut-initqueue[324]: Warning: dracut-initqueue
> timeout - 

RE: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Hatayama, Daisuke



> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Monday, June 4, 2018 11:45 PM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' ; Okajima, Toshiyuki 
> ; linux-kernel@vger.kernel.org;
> 'ebied...@aristanetworks.com' 
> Subject: Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> 
> "Hatayama, Daisuke"  writes:
> 
> > Hello,
> >
> > Thanks for this patch, Eric.
> >
> >> -Original Message-
> >> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> >> Sent: Monday, June 4, 2018 3:51 AM
> >> To: Hatayama, Daisuke 
> >> Cc: 'gre...@linuxfoundation.org' ;
> >> 't...@kernel.org' ; Okajima, Toshiyuki
> >> ; linux-kernel@vger.kernel.org;
> >> 'ebied...@aristanetworks.com' 
> >> Subject: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> >>
> >>
> >> Over the years two bugs have crept into the code that handled the last
> >> returned kernfs directory being deleted, and handled general seeking
> >> in kernfs directories.  The result is that reading the /sys/module
> >> directory while moduled are loading and unloading will sometimes
> >> skip over a module that was present for the entire duration of
> >> the directory read.
> >>
> >> The first bug is that kernfs_dir_next_pos was advancing the position
> >> in the directory even when kernfs_dir_pos had found the starting
> >> kernfs directory entry was not usable.  In this case kernfs_dir_pos is
> >> guaranteed to return the directory entry past the starting directory
> >> entry, making it so that advancing the directory position is wrong.
> >>
> >> The second bug is that kernfs_dir_pos when the saved kernfs_node
> >> did not validate, was not always returning a directory after
> >> the the target position, but was sometimes returning the directory
> >> entry just before the target position.
> >>
> >> To correct these issues and to make the code easily readable and
> >> comprehensible several cleanups are made.
> >>
> >> - A function kernfs_dir_next is factored out to make it straight-forward
> >>   to find the next node in a kernfs directory.
> >>
> >> - A function kernfs_dir_skip_pos is factored out to skip over directory
> >>   entries that should not be shown to userspace.  This function is called
> >>   from kernfs_fop_readdir directly removing the complication of calling
> >>   it from the other directory advancement functions.
> >>
> >> - The kernfs_put of the saved directory entry is moved from kernfs_dir_pos
> >>   to kernfs_fop_readdir.  Keeping it with the kernfs_get when it is saved
> >>   and ensuring the directory position advancment functions can reference
> >>   the saved node without complications.
> >>
> >> - The function kernfs_dir_next_pos is modified to only advance the 
> >> directory
> >>   position in the common case when kernfs_dir_pos validates and returns
> >>   the starting kernfs node.  For all other cases kernfs_dir_pos is
> guaranteed
> >>   to return a directory position in advance of the starting directory
> position.
> >>
> >> - kernfs_dir_pos is given a significant make over.  It's essense remains
> the
> >>   same but some siginficant changes were made.
> >>
> >>   + The offset is validated to be a valid hash value at the very beginning.
> >> The validation is updated to handle the fact that neither 0 nor 1 are
> >> valid hash values.
> >>
> >>   + An extra test is added at the end of the rbtree lookup to ensure
> >> the returned position is at or beyond the target position.
> >>
> >>   + kernfs_name_compare is used during the rbtree lookup instead of just
> >> comparing
> >> the hash.  This allows the the passed namespace parameter to be used,
> and
> >> if it is available from the saved entry the old saved name as well.
> >>
> >>   + The validation of the saved kernfs node now checks for two cases.
> >> If the saved node is returnable.
> >> If the name of the saved node is usable during lookup.
> >>
> >> In net this should result in a easier to understand, and more correct
> >> version of directory traversal for kernfs directories.
> >>
> >> Reported-by: "Hatayama, Daisuke" 
> >> Fixes: a406f75840e1 ("sysfs: use rb-tree for inode number lookup")
> >> Fixes: 1e5289c97bba ("sysfs: Cache the last sysfs_dirent to improve readdir
> >> scalability v2")
> >> Signed-off-by: "Eric W. Biederman" 
> >> ---
> >>
> >> Can you test this and please verify it fixes your issue?
> >>
> >
> > I tried this patch on top of v4.17 but the system fails to boot
> > without detecting root disks by dracut like this:
> >
> > [  196.121294] dracut-initqueue[324]: Warning: dracut-initqueue
> timeout - starting timeout scripts
> > [  196.672175] dracut-initqueue[324]: Warning: dracut-initqueue
> timeout - starting timeout scripts
> > [  197.219519] dracut-initqueue[324]: Warning: dracut-initqueue
> timeout - starting timeout scripts
> > [  197.768997] dracut-initqueue[324]: Warning: dracut-initqueue
> timeout - 

Re: [PATCH v2 4/9] x86: alternatives: macrofy locks for better inlining

2018-06-04 Thread kbuild test robot
Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: um-i386_defconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=um SUBARCH=i386

All errors (new ones prefixed by >>):

   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:220: Error: no such instruction: `lock_prefix 
>> btsl $0,once.63562'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 16(%esi)'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl contig_page_data+500(%edx)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl vm_zone_stat+32'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:267: Error: no such instruction: `lock_prefix 
>> btrl $8,4(%eax)'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
>> orb $2,4(%eax)'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
   arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
orb $1,64(%eax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,64(%eax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,64(%edx)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,64(%edx)'
   arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
orb $1,64(%eax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,64(%eax)'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 8(%esi)'
   arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix 
decl 8(%esi)'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl host_sleep_count'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl host_sleep_count'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:55: Error: no such instruction: `lock_prefix 
>> addl %edx,contig_page_data+504(%eax)'
>> arch/x86/include/asm/atomic.h:55: Error: no such instruction: `lock_prefix 
>> addl %edx,vm_zone_stat+36'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 36(%eax)'
>> arch/x86/include/asm/atomic.h:197: Error: no such instruction: `lock_prefix 
>> cmpxchgl %ecx,28(%edx)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 40(%eax)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl (%esi)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl 36(%eax)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl 28(%eax)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 36(%ebx)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl 40(%ebx)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl -444(%ebx)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl (%ebx)'
   arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
andb $-5,4(%eax)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl (%eax)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl (%eax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 36(%eax)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl 28(%eax)'
>> arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix 
>> decl 184(%ecx)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 12(%edi)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl (%esi)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruc

Re: [PATCH v2 4/9] x86: alternatives: macrofy locks for better inlining

2018-06-04 Thread kbuild test robot
Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: um-i386_defconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=um SUBARCH=i386

All errors (new ones prefixed by >>):

   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:220: Error: no such instruction: `lock_prefix 
>> btsl $0,once.63562'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 16(%esi)'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl contig_page_data+500(%edx)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl vm_zone_stat+32'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:267: Error: no such instruction: `lock_prefix 
>> btrl $8,4(%eax)'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
>> orb $2,4(%eax)'
--
   arch/x86/include/asm/bitops.h: Assembler messages:
   arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
orb $1,64(%eax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,64(%eax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,64(%edx)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,64(%edx)'
   arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix 
orb $1,64(%eax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
>> andb $-2,64(%eax)'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 8(%esi)'
   arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix 
decl 8(%esi)'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl host_sleep_count'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl host_sleep_count'
--
   arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:55: Error: no such instruction: `lock_prefix 
>> addl %edx,contig_page_data+504(%eax)'
>> arch/x86/include/asm/atomic.h:55: Error: no such instruction: `lock_prefix 
>> addl %edx,vm_zone_stat+36'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 36(%eax)'
>> arch/x86/include/asm/atomic.h:197: Error: no such instruction: `lock_prefix 
>> cmpxchgl %ecx,28(%edx)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 40(%eax)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl (%esi)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl 36(%eax)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl 28(%eax)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
>> decl 36(%ebx)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl 40(%ebx)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl -444(%ebx)'
   arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix 
decl (%ebx)'
   arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix 
andb $-5,4(%eax)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl (%eax)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl (%eax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 36(%eax)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl 28(%eax)'
>> arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix 
>> decl 184(%ecx)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
>> incl 12(%edi)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix 
incl (%esi)'
   arch/x86/include/asm/atomic.h:96: Error: no such instruc

[PATCH] remoteproc: qcom: Introduce Hexagon V5 based WCSS driver

2018-06-04 Thread Sricharan R
IPQ8074 has an integrated Hexagon dsp core q6v5 and a wireless lan
(Lithium) IP. An mdt type single image format is used for the
firmware. So the mdt_load function can be directly used to load
the firmware. Also add the relevant resets required for this core.

Acked-by: Rob Herring  (for bindings)
Signed-off-by: Sricharan R 
[bjorn: Rewrote as a separate driver, intead of extending q6v5_pil.c]
Signed-off-by: Bjorn Andersson 
---
 Fixed review comments from Vinod.
 Retained the reg read/update/write sequence instead of modify for
 readability
 In q6v5_wcss_powerdown SSCAON_CONFIG bits 16,17,18 documentation
 have to be updated.

 .../devicetree/bindings/remoteproc/qcom,q6v5.txt   |   7 +-
 drivers/remoteproc/Kconfig |  16 +
 drivers/remoteproc/Makefile|   1 +
 drivers/remoteproc/qcom_q6v5_wcss.c| 588 +
 4 files changed, 611 insertions(+), 1 deletion(-)
 create mode 100644 drivers/remoteproc/qcom_q6v5_wcss.c

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt 
b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt
index 00d3d58..d52d05e 100644
--- a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt
@@ -11,6 +11,7 @@ on the Qualcomm Hexagon core.
"qcom,msm8916-mss-pil",
"qcom,msm8974-mss-pil"
"qcom,msm8996-mss-pil"
+   "qcom,ipq8074-wcss-pil"
 
 - reg:
Usage: required
@@ -49,11 +50,15 @@ on the Qualcomm Hexagon core.
Usage: required
Value type: 
Definition: reference to the reset-controller for the modem sub-system
+   reference to the list of 3 reset-controllers for the
+   wcss sub-system
 
 - reset-names:
Usage: required
Value type: 
-   Definition: must be "mss_restart"
+   Definition: must be "mss_restart" for the modem sub-system
+   Definition: must be "wcss_aon_reset", "wcss_reset", "wcss_q6_reset"
+   for the wcss syb-system
 
 - cx-supply:
 - mss-supply:
diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index be76619..7b1a9ef 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -123,6 +123,22 @@ config QCOM_Q6V5_PIL
  Say y here to support the Qualcomm Peripherial Image Loader for the
  Hexagon V5 based remote processors.
 
+config QCOM_Q6V5_WCSS
+   tristate "Qualcomm Hexagon based WCSS Peripheral Image Loader"
+   depends on OF && ARCH_QCOM
+   depends on QCOM_SMEM
+   depends on RPMSG_QCOM_SMD || (COMPILE_TEST && RPMSG_QCOM_SMD=n)
+   depends on RPMSG_QCOM_GLINK_SMEM || RPMSG_QCOM_GLINK_SMEM=n
+   depends on QCOM_SYSMON || QCOM_SYSMON=n
+   select MFD_SYSCON
+   select QCOM_MDT_LOADER
+   select QCOM_Q6V5_COMMON
+   select QCOM_RPROC_COMMON
+   select QCOM_SCM
+   help
+ Say y here to support the Qualcomm Peripheral Image Loader for the
+ Hexagon V5 based WCSS remote processors.
+
 config QCOM_SYSMON
tristate "Qualcomm sysmon driver"
depends on RPMSG
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index 5dd0249..03332fa 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_QCOM_ADSP_PIL)   += qcom_adsp_pil.o
 obj-$(CONFIG_QCOM_RPROC_COMMON)+= qcom_common.o
 obj-$(CONFIG_QCOM_Q6V5_COMMON) += qcom_q6v5.o
 obj-$(CONFIG_QCOM_Q6V5_PIL)+= qcom_q6v5_pil.o
+obj-$(CONFIG_QCOM_Q6V5_WCSS)   += qcom_q6v5_wcss.o
 obj-$(CONFIG_QCOM_SYSMON)  += qcom_sysmon.o
 obj-$(CONFIG_QCOM_WCNSS_PIL)   += qcom_wcnss_pil.o
 qcom_wcnss_pil-y   += qcom_wcnss.o
diff --git a/drivers/remoteproc/qcom_q6v5_wcss.c 
b/drivers/remoteproc/qcom_q6v5_wcss.c
new file mode 100644
index 000..9c5b3b4
--- /dev/null
+++ b/drivers/remoteproc/qcom_q6v5_wcss.c
@@ -0,0 +1,588 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2016-2018 Linaro Ltd.
+ * Copyright (C) 2014 Sony Mobile Communications AB
+ * Copyright (c) 2012-2018, The Linux Foundation. All rights reserved.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "qcom_common.h"
+#include "qcom_q6v5.h"
+
+#define WCSS_CRASH_REASON  421
+
+/* QDSP6SS Register Offsets */
+#define QDSP6SS_RESET_REG  0x014
+#define QDSP6SS_GFMUX_CTL_REG  0x020
+#define QDSP6SS_PWR_CTL_REG0x030
+#define QDSP6SS_MEM_PWR_CTL0x0B0
+
+/* AXI Halt Register Offsets */
+#define AXI_HALTREQ_REG0x0
+#define AXI_HALTACK_REG0x4
+#define AXI_IDLE_REG   0x8
+
+#define HALT_ACK_TIMEOUT_MS100
+
+/* QDSP6SS_RESET */
+#define Q6SS_STOP_CORE 

[PATCH] remoteproc: qcom: Introduce Hexagon V5 based WCSS driver

2018-06-04 Thread Sricharan R
IPQ8074 has an integrated Hexagon dsp core q6v5 and a wireless lan
(Lithium) IP. An mdt type single image format is used for the
firmware. So the mdt_load function can be directly used to load
the firmware. Also add the relevant resets required for this core.

Acked-by: Rob Herring  (for bindings)
Signed-off-by: Sricharan R 
[bjorn: Rewrote as a separate driver, intead of extending q6v5_pil.c]
Signed-off-by: Bjorn Andersson 
---
 Fixed review comments from Vinod.
 Retained the reg read/update/write sequence instead of modify for
 readability
 In q6v5_wcss_powerdown SSCAON_CONFIG bits 16,17,18 documentation
 have to be updated.

 .../devicetree/bindings/remoteproc/qcom,q6v5.txt   |   7 +-
 drivers/remoteproc/Kconfig |  16 +
 drivers/remoteproc/Makefile|   1 +
 drivers/remoteproc/qcom_q6v5_wcss.c| 588 +
 4 files changed, 611 insertions(+), 1 deletion(-)
 create mode 100644 drivers/remoteproc/qcom_q6v5_wcss.c

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt 
b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt
index 00d3d58..d52d05e 100644
--- a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt
@@ -11,6 +11,7 @@ on the Qualcomm Hexagon core.
"qcom,msm8916-mss-pil",
"qcom,msm8974-mss-pil"
"qcom,msm8996-mss-pil"
+   "qcom,ipq8074-wcss-pil"
 
 - reg:
Usage: required
@@ -49,11 +50,15 @@ on the Qualcomm Hexagon core.
Usage: required
Value type: 
Definition: reference to the reset-controller for the modem sub-system
+   reference to the list of 3 reset-controllers for the
+   wcss sub-system
 
 - reset-names:
Usage: required
Value type: 
-   Definition: must be "mss_restart"
+   Definition: must be "mss_restart" for the modem sub-system
+   Definition: must be "wcss_aon_reset", "wcss_reset", "wcss_q6_reset"
+   for the wcss syb-system
 
 - cx-supply:
 - mss-supply:
diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index be76619..7b1a9ef 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -123,6 +123,22 @@ config QCOM_Q6V5_PIL
  Say y here to support the Qualcomm Peripherial Image Loader for the
  Hexagon V5 based remote processors.
 
+config QCOM_Q6V5_WCSS
+   tristate "Qualcomm Hexagon based WCSS Peripheral Image Loader"
+   depends on OF && ARCH_QCOM
+   depends on QCOM_SMEM
+   depends on RPMSG_QCOM_SMD || (COMPILE_TEST && RPMSG_QCOM_SMD=n)
+   depends on RPMSG_QCOM_GLINK_SMEM || RPMSG_QCOM_GLINK_SMEM=n
+   depends on QCOM_SYSMON || QCOM_SYSMON=n
+   select MFD_SYSCON
+   select QCOM_MDT_LOADER
+   select QCOM_Q6V5_COMMON
+   select QCOM_RPROC_COMMON
+   select QCOM_SCM
+   help
+ Say y here to support the Qualcomm Peripheral Image Loader for the
+ Hexagon V5 based WCSS remote processors.
+
 config QCOM_SYSMON
tristate "Qualcomm sysmon driver"
depends on RPMSG
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index 5dd0249..03332fa 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_QCOM_ADSP_PIL)   += qcom_adsp_pil.o
 obj-$(CONFIG_QCOM_RPROC_COMMON)+= qcom_common.o
 obj-$(CONFIG_QCOM_Q6V5_COMMON) += qcom_q6v5.o
 obj-$(CONFIG_QCOM_Q6V5_PIL)+= qcom_q6v5_pil.o
+obj-$(CONFIG_QCOM_Q6V5_WCSS)   += qcom_q6v5_wcss.o
 obj-$(CONFIG_QCOM_SYSMON)  += qcom_sysmon.o
 obj-$(CONFIG_QCOM_WCNSS_PIL)   += qcom_wcnss_pil.o
 qcom_wcnss_pil-y   += qcom_wcnss.o
diff --git a/drivers/remoteproc/qcom_q6v5_wcss.c 
b/drivers/remoteproc/qcom_q6v5_wcss.c
new file mode 100644
index 000..9c5b3b4
--- /dev/null
+++ b/drivers/remoteproc/qcom_q6v5_wcss.c
@@ -0,0 +1,588 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2016-2018 Linaro Ltd.
+ * Copyright (C) 2014 Sony Mobile Communications AB
+ * Copyright (c) 2012-2018, The Linux Foundation. All rights reserved.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "qcom_common.h"
+#include "qcom_q6v5.h"
+
+#define WCSS_CRASH_REASON  421
+
+/* QDSP6SS Register Offsets */
+#define QDSP6SS_RESET_REG  0x014
+#define QDSP6SS_GFMUX_CTL_REG  0x020
+#define QDSP6SS_PWR_CTL_REG0x030
+#define QDSP6SS_MEM_PWR_CTL0x0B0
+
+/* AXI Halt Register Offsets */
+#define AXI_HALTREQ_REG0x0
+#define AXI_HALTACK_REG0x4
+#define AXI_IDLE_REG   0x8
+
+#define HALT_ACK_TIMEOUT_MS100
+
+/* QDSP6SS_RESET */
+#define Q6SS_STOP_CORE 

Re: [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions

2018-06-04 Thread kbuild test robot
Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: c6x-evmc6678_defconfig (attached as .config)
compiler: c6x-elf-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=c6x 

All errors (new ones prefixed by >>):

   include/linux/compiler.h: Assembler messages:
>> include/linux/compiler.h:308: Error: Macro `annotate_unreachable' was 
>> already defined

vim +308 include/linux/compiler.h

   307  
 > 308  .macro ANNOTATE_UNREACHABLE counter:req
   309  .endm
   310  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions

2018-06-04 Thread kbuild test robot
Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: c6x-evmc6678_defconfig (attached as .config)
compiler: c6x-elf-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=c6x 

All errors (new ones prefixed by >>):

   include/linux/compiler.h: Assembler messages:
>> include/linux/compiler.h:308: Error: Macro `annotate_unreachable' was 
>> already defined

vim +308 include/linux/compiler.h

   307  
 > 308  .macro ANNOTATE_UNREACHABLE counter:req
   309  .endm
   310  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 6/6] x86/vdso: Move out the CPU number store

2018-06-04 Thread H. Peter Anvin
On 06/04/18 20:57, Mika Penttilä wrote:
> 
> This won't work on X86-32 because it actually uses the segment limit with fs: 
> access. So there 
> is a reason why the lsl based method is X86-64 only.
> 



Why does that matter in any shape, way, or form?  The LSL instruction
doesn't touch any of the segment registers, it just uses a segment
selector number.



I see... we have a VERY unfortunate name collision: the x86-64
GDT_ENTRY_PERC_PU and the i386 GDT_ENTRY_PERCPU are in fact two
completely different things, with the latter being the actual percpu
offset used by the kernel.

So yes, this patch is wrong, because the naming of the x86-64 segment is
insane especially in the proximity of the  -- it should be something
like GDT_ENTRY_CPU_NUMBER.

Unfortunately we probably can't use the same GDT entry on x86-32 and
x86-64, because entry 15 (selector 0x7b) is USER_DS, which is something
we really don't want to screw with.  This means i386 programs that
execute LSL directly for whatever reason will have to understand the
difference, but most of the other segment numbers differ as well,
including user space %cs (USER_CS/USER32_CS) and %ds/%es/%ss (USER_DS).
Perhaps we could bump down segments 23-28 by one and put it as 23, that
way the CPU_NUMBER segment would always be at %ss+80 for the default
(flat, initial) user space %ss.  (We want to specify using %ss rather
than %ds, because it is less likely to be changed and because 64 bits,
too, have %ss defined, but not %ds.)



Rename the x86-64 segment to CPU_NUMBER, fixing the naming conflict.
Add 1 to GDT entry numbers 23-28 for i386 (all of these are
kernel-internal segments and so have no impact on user space).
Add i386 CPU_NUMBER equivalent to x86-64 at GDT entry 23.
Document the above relationship between segments.

OK, everyone?

-hpa


Re: [PATCH 6/6] x86/vdso: Move out the CPU number store

2018-06-04 Thread H. Peter Anvin
On 06/04/18 20:57, Mika Penttilä wrote:
> 
> This won't work on X86-32 because it actually uses the segment limit with fs: 
> access. So there 
> is a reason why the lsl based method is X86-64 only.
> 



Why does that matter in any shape, way, or form?  The LSL instruction
doesn't touch any of the segment registers, it just uses a segment
selector number.



I see... we have a VERY unfortunate name collision: the x86-64
GDT_ENTRY_PERC_PU and the i386 GDT_ENTRY_PERCPU are in fact two
completely different things, with the latter being the actual percpu
offset used by the kernel.

So yes, this patch is wrong, because the naming of the x86-64 segment is
insane especially in the proximity of the  -- it should be something
like GDT_ENTRY_CPU_NUMBER.

Unfortunately we probably can't use the same GDT entry on x86-32 and
x86-64, because entry 15 (selector 0x7b) is USER_DS, which is something
we really don't want to screw with.  This means i386 programs that
execute LSL directly for whatever reason will have to understand the
difference, but most of the other segment numbers differ as well,
including user space %cs (USER_CS/USER32_CS) and %ds/%es/%ss (USER_DS).
Perhaps we could bump down segments 23-28 by one and put it as 23, that
way the CPU_NUMBER segment would always be at %ss+80 for the default
(flat, initial) user space %ss.  (We want to specify using %ss rather
than %ds, because it is less likely to be changed and because 64 bits,
too, have %ss defined, but not %ds.)



Rename the x86-64 segment to CPU_NUMBER, fixing the naming conflict.
Add 1 to GDT entry numbers 23-28 for i386 (all of these are
kernel-internal segments and so have no impact on user space).
Add i386 CPU_NUMBER equivalent to x86-64 at GDT entry 23.
Document the above relationship between segments.

OK, everyone?

-hpa


Re: [PATCHv2 03/16] atomics/treewide: make atomic64_inc_not_zero() optional

2018-06-04 Thread Mark Rutland
On Mon, Jun 04, 2018 at 04:17:25PM -0700, Palmer Dabbelt wrote:
> On Tue, 29 May 2018 08:43:33 PDT (-0700), mark.rutl...@arm.com wrote:
> > We define a trivial fallback for atomic_inc_not_zero(), but don't do
> > the same for atmic64_inc_not_zero(), leading most architectures to
> > define the same boilerplate.
> 
> atmic64

Cheers for the spot; fixed now!

Mark.


Re: [PATCHv2 03/16] atomics/treewide: make atomic64_inc_not_zero() optional

2018-06-04 Thread Mark Rutland
On Mon, Jun 04, 2018 at 04:17:25PM -0700, Palmer Dabbelt wrote:
> On Tue, 29 May 2018 08:43:33 PDT (-0700), mark.rutl...@arm.com wrote:
> > We define a trivial fallback for atomic_inc_not_zero(), but don't do
> > the same for atmic64_inc_not_zero(), leading most architectures to
> > define the same boilerplate.
> 
> atmic64

Cheers for the spot; fixed now!

Mark.


Re: [greybus-dev] [PATCH] Staging:greybus Fix comparison to NULL

2018-06-04 Thread Viresh Kumar
On 03-06-18, 08:52, Janani Sankara Babu wrote:
> This patch replaces comparison of var to NULL with !var
> 
> Signed-off-by: Janani Sankara Babu 
> ---
>  drivers/staging/greybus/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/greybus/core.c b/drivers/staging/greybus/core.c
> index dafa430..5d14a4e 100644
> --- a/drivers/staging/greybus/core.c
> +++ b/drivers/staging/greybus/core.c
> @@ -48,7 +48,7 @@ static bool greybus_match_one_id(struct gb_bundle *bundle,
>  static const struct greybus_bundle_id *
>  greybus_match_id(struct gb_bundle *bundle, const struct greybus_bundle_id 
> *id)
>  {
> - if (id == NULL)
> + if (!id)

It is pretty much personal preference and there is no good reason to accept this
patch really.

-- 
viresh


Re: [greybus-dev] [PATCH] Staging:greybus Fix comparison to NULL

2018-06-04 Thread Viresh Kumar
On 03-06-18, 08:52, Janani Sankara Babu wrote:
> This patch replaces comparison of var to NULL with !var
> 
> Signed-off-by: Janani Sankara Babu 
> ---
>  drivers/staging/greybus/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/greybus/core.c b/drivers/staging/greybus/core.c
> index dafa430..5d14a4e 100644
> --- a/drivers/staging/greybus/core.c
> +++ b/drivers/staging/greybus/core.c
> @@ -48,7 +48,7 @@ static bool greybus_match_one_id(struct gb_bundle *bundle,
>  static const struct greybus_bundle_id *
>  greybus_match_id(struct gb_bundle *bundle, const struct greybus_bundle_id 
> *id)
>  {
> - if (id == NULL)
> + if (!id)

It is pretty much personal preference and there is no good reason to accept this
patch really.

-- 
viresh


Re: [PATCH 6/6] x86/vdso: Move out the CPU number store

2018-06-04 Thread Mika Penttilä
On 06/05/2018 07:44 AM, Bae, Chang Seok wrote:
>>> diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
>>> index ea554f8..e716e94 100644
>>> --- a/arch/x86/kernel/setup_percpu.c
>>> +++ b/arch/x86/kernel/setup_percpu.c
>>> @@ -155,12 +155,21 @@ static void __init pcpup_populate_pte(unsigned long 
>>> addr)
>>>  
>>>  static inline void setup_percpu_segment(int cpu)
>>>  {
>>> -#ifdef CONFIG_X86_32
>>> -   struct desc_struct d = GDT_ENTRY_INIT(0x8092, per_cpu_offset(cpu),
>>> - 0xF);
>>> +#ifdef CONFIG_NUMA
>>> +   unsigned long node = early_cpu_to_node(cpu);
>>> +#else
>>> +   unsigned long node = 0;
>>> +#endif
>>> +   struct desc_struct d = GDT_ENTRY_INIT(0x0, per_cpu_offset(cpu),
>>> +  make_lsl_tscp(cpu, node));
>>> +
>>> +   d.type = 5; /* R0 data, expand down, accessed */
>>> +   d.dpl = 3;  /* Visible to user code */
>>> +   d.s = 1;/* Not a system segment */
>>> +   d.p = 1;/* Present */
>>> +   d.d = 1;/* 32-bit */
>>>  
>>> write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PERCPU, , DESCTYPE_S);
>>> -#endif
>>>  }
> 
>> This won't work on X86-32 because it actually uses the segment limit with 
>> fs: access. So there 
>> is a reason why the lsl based method is X86-64 only.
> 
> The limit will be consumed in X86-64 only, while the unification with i386 
> was suggested for a
> different reason.
> 
> Thanks,
> Chang
> 

The unification affects i386, and the limit is consumed by the processor with 
fs: access.
The limit was 0xF before, now it depends on the cpu/node. So accesses on 
small number cpus 
are likely to fault.

--Mika


Re: [PATCH 6/6] x86/vdso: Move out the CPU number store

2018-06-04 Thread Mika Penttilä
On 06/05/2018 07:44 AM, Bae, Chang Seok wrote:
>>> diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
>>> index ea554f8..e716e94 100644
>>> --- a/arch/x86/kernel/setup_percpu.c
>>> +++ b/arch/x86/kernel/setup_percpu.c
>>> @@ -155,12 +155,21 @@ static void __init pcpup_populate_pte(unsigned long 
>>> addr)
>>>  
>>>  static inline void setup_percpu_segment(int cpu)
>>>  {
>>> -#ifdef CONFIG_X86_32
>>> -   struct desc_struct d = GDT_ENTRY_INIT(0x8092, per_cpu_offset(cpu),
>>> - 0xF);
>>> +#ifdef CONFIG_NUMA
>>> +   unsigned long node = early_cpu_to_node(cpu);
>>> +#else
>>> +   unsigned long node = 0;
>>> +#endif
>>> +   struct desc_struct d = GDT_ENTRY_INIT(0x0, per_cpu_offset(cpu),
>>> +  make_lsl_tscp(cpu, node));
>>> +
>>> +   d.type = 5; /* R0 data, expand down, accessed */
>>> +   d.dpl = 3;  /* Visible to user code */
>>> +   d.s = 1;/* Not a system segment */
>>> +   d.p = 1;/* Present */
>>> +   d.d = 1;/* 32-bit */
>>>  
>>> write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PERCPU, , DESCTYPE_S);
>>> -#endif
>>>  }
> 
>> This won't work on X86-32 because it actually uses the segment limit with 
>> fs: access. So there 
>> is a reason why the lsl based method is X86-64 only.
> 
> The limit will be consumed in X86-64 only, while the unification with i386 
> was suggested for a
> different reason.
> 
> Thanks,
> Chang
> 

The unification affects i386, and the limit is consumed by the processor with 
fs: access.
The limit was 0xF before, now it depends on the cpu/node. So accesses on 
small number cpus 
are likely to fault.

--Mika


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Viresh Kumar
On 31-05-18, 20:25, Daniel Lezcano wrote:
> On 29/05/2018 11:31, Viresh Kumar wrote:
> > On 25-05-18, 11:49, Daniel Lezcano wrote:
> >> +  /*
> >> +   * The last CPU waking up is in charge of setting the timer. If
> >> +   * the CPU is hotplugged, the timer will move to another CPU
> >> +   * (which may not belong to the same cluster) but that is not a
> >> +   * problem as the timer will be set again by another CPU
> >> +   * belonging to the cluster. This mechanism is self adaptive.
> >> +   */
> > 
> > I am afraid that the above comment may not be completely true all the
> > time. For a quad-core platform, it is possible for 3 CPUs (0,1,2) to
> > run this function as soon as the kthread is woken up, but one of the
> > CPUs (3) may be stuck servicing an IRQ, Deadline or RT activity.
> > Because you do atomic_inc() also in this function (above) itself,
> > below decrement may return a true value for the CPU2 and that will
> > restart the hrtimer, while one of the CPUs never got a chance to
> > increment count in the first place.
> >
> > The fix is simple though, do the increment in idle_injection_wakeup()
> > and things should be fine then.
> 
> Ok.
> 
> >> +  if (!atomic_dec_and_test(_dev->count))
> >> +  return;
> >> +
> >> +  run_duration_ms = atomic_read(_dev->run_duration_ms);
> >> +  if (run_duration_ms) {
> >> +  hrtimer_start(_dev->timer, ms_to_ktime(run_duration_ms),
> >> +HRTIMER_MODE_REL_PINNED);
> >> +  return;
> >> +  }
> >> +
> >> +  complete(_dev->stop_complete);
> > 
> > So you call complete() after hrtimer is potentially restarted. This
> > can happen if idle_injection_stop() is called right after the above
> > atomic_read() has finished :)
> > 
> > IOW, this doesn't look safe now as well.
> 
> It is safe, we just missed a cycle and the stop will block until the
> next cycle. I did it on purpose and for me it is correct.

Okay, what about this then:

Path A Path B

idle_injection_fn()idle_injection_unregister()
  hrtimer_start()idle_injection_stop()

  complete()

   wait_for_completion()
 kfree(ii_dev);
  
  Hrtimer is still used here after
  getting freed.


Is this not possible ?

> >> +struct idle_injection_device *idle_injection_register(struct cpumask 
> >> *cpumask)
> >> +{
> >> +  struct idle_injection_device *ii_dev;
> >> +  int cpu, cpu2;
> >> +
> >> +  ii_dev = ii_dev_alloc();
> >> +  if (!ii_dev)
> >> +  return NULL;
> >> +
> >> +  cpumask_copy(ii_dev->cpumask, cpumask);
> >> +  hrtimer_init(_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> >> +  ii_dev->timer.function = idle_injection_wakeup_fn;
> >> +
> >> +  for_each_cpu(cpu, ii_dev->cpumask) {
> >> +
> >> +  if (per_cpu(idle_injection_device, cpu)) {
> > 
> > Maybe add unlikely here ?
> 
> For this kind of init function and tight loop, there is no benefit of
> adding the unlikely / likely. I can add it if you want, but it is useless.

Okay.

> >> +  pr_err("cpu%d is already registered\n", cpu);
> >> +  goto out_rollback_per_cpu;
> >> +  }
> >> +
> >> +  per_cpu(idle_injection_device, cpu) = ii_dev;
> >> +  }
> >> +
> >> +  return ii_dev;
> >> +
> >> +out_rollback_per_cpu:
> >> +  for_each_cpu(cpu2, ii_dev->cpumask) {
> >> +  if (cpu == cpu2)
> > 
> > And is this really required? Perhaps this conditional is making it
> > worse and just setting the per-cpu variable forcefully to NULL would
> > be much faster :)
> 
> We want undo what was done, setting all variables to NULL resets the
> values from a previous register and we don't want that.

Why will that happen ? We are only iterating over a particular cpumask and not
all possible CPUs. Yes I understand the "undo only what we did" part, but the
conditional is more expensive than that I feel.

-- 
viresh


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Viresh Kumar
On 31-05-18, 20:25, Daniel Lezcano wrote:
> On 29/05/2018 11:31, Viresh Kumar wrote:
> > On 25-05-18, 11:49, Daniel Lezcano wrote:
> >> +  /*
> >> +   * The last CPU waking up is in charge of setting the timer. If
> >> +   * the CPU is hotplugged, the timer will move to another CPU
> >> +   * (which may not belong to the same cluster) but that is not a
> >> +   * problem as the timer will be set again by another CPU
> >> +   * belonging to the cluster. This mechanism is self adaptive.
> >> +   */
> > 
> > I am afraid that the above comment may not be completely true all the
> > time. For a quad-core platform, it is possible for 3 CPUs (0,1,2) to
> > run this function as soon as the kthread is woken up, but one of the
> > CPUs (3) may be stuck servicing an IRQ, Deadline or RT activity.
> > Because you do atomic_inc() also in this function (above) itself,
> > below decrement may return a true value for the CPU2 and that will
> > restart the hrtimer, while one of the CPUs never got a chance to
> > increment count in the first place.
> >
> > The fix is simple though, do the increment in idle_injection_wakeup()
> > and things should be fine then.
> 
> Ok.
> 
> >> +  if (!atomic_dec_and_test(_dev->count))
> >> +  return;
> >> +
> >> +  run_duration_ms = atomic_read(_dev->run_duration_ms);
> >> +  if (run_duration_ms) {
> >> +  hrtimer_start(_dev->timer, ms_to_ktime(run_duration_ms),
> >> +HRTIMER_MODE_REL_PINNED);
> >> +  return;
> >> +  }
> >> +
> >> +  complete(_dev->stop_complete);
> > 
> > So you call complete() after hrtimer is potentially restarted. This
> > can happen if idle_injection_stop() is called right after the above
> > atomic_read() has finished :)
> > 
> > IOW, this doesn't look safe now as well.
> 
> It is safe, we just missed a cycle and the stop will block until the
> next cycle. I did it on purpose and for me it is correct.

Okay, what about this then:

Path A Path B

idle_injection_fn()idle_injection_unregister()
  hrtimer_start()idle_injection_stop()

  complete()

   wait_for_completion()
 kfree(ii_dev);
  
  Hrtimer is still used here after
  getting freed.


Is this not possible ?

> >> +struct idle_injection_device *idle_injection_register(struct cpumask 
> >> *cpumask)
> >> +{
> >> +  struct idle_injection_device *ii_dev;
> >> +  int cpu, cpu2;
> >> +
> >> +  ii_dev = ii_dev_alloc();
> >> +  if (!ii_dev)
> >> +  return NULL;
> >> +
> >> +  cpumask_copy(ii_dev->cpumask, cpumask);
> >> +  hrtimer_init(_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> >> +  ii_dev->timer.function = idle_injection_wakeup_fn;
> >> +
> >> +  for_each_cpu(cpu, ii_dev->cpumask) {
> >> +
> >> +  if (per_cpu(idle_injection_device, cpu)) {
> > 
> > Maybe add unlikely here ?
> 
> For this kind of init function and tight loop, there is no benefit of
> adding the unlikely / likely. I can add it if you want, but it is useless.

Okay.

> >> +  pr_err("cpu%d is already registered\n", cpu);
> >> +  goto out_rollback_per_cpu;
> >> +  }
> >> +
> >> +  per_cpu(idle_injection_device, cpu) = ii_dev;
> >> +  }
> >> +
> >> +  return ii_dev;
> >> +
> >> +out_rollback_per_cpu:
> >> +  for_each_cpu(cpu2, ii_dev->cpumask) {
> >> +  if (cpu == cpu2)
> > 
> > And is this really required? Perhaps this conditional is making it
> > worse and just setting the per-cpu variable forcefully to NULL would
> > be much faster :)
> 
> We want undo what was done, setting all variables to NULL resets the
> values from a previous register and we don't want that.

Why will that happen ? We are only iterating over a particular cpumask and not
all possible CPUs. Yes I understand the "undo only what we did" part, but the
conditional is more expensive than that I feel.

-- 
viresh


[PATCH V2 2/2] arm: dts: sunxi: Add missing cooling device properties for CPUs

2018-06-04 Thread Viresh Kumar
The cooling device properties, like "#cooling-cells" and
"dynamic-power-coefficient", should either be present for all the CPUs
of a cluster or none. If these are present only for a subset of CPUs of
a cluster then things will start falling apart as soon as the CPUs are
brought online in a different order. For example, this will happen
because the operating system looks for such properties in the CPU node
it is trying to bring up, so that it can register a cooling device.

Add such missing properties.

Fix other missing properties (clocks, OPP, clock latency) as well to
make it all work.

Signed-off-by: Viresh Kumar 
---
V2:
- Separated patch for h3
- Fixed subject s/sun/sunxi/

 arch/arm/boot/dts/sun6i-a31.dtsi | 30 ++
 arch/arm/boot/dts/sun7i-a20.dtsi | 13 +
 arch/arm/boot/dts/sun8i-a33.dtsi |  9 +
 3 files changed, 52 insertions(+)

diff --git a/arch/arm/boot/dts/sun6i-a31.dtsi b/arch/arm/boot/dts/sun6i-a31.dtsi
index c72992556a86..debc0bf22ea3 100644
--- a/arch/arm/boot/dts/sun6i-a31.dtsi
+++ b/arch/arm/boot/dts/sun6i-a31.dtsi
@@ -119,18 +119,48 @@
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <1>;
+   clocks = < CLK_CPU>;
+   clock-latency = <244144>; /* 8 32k periods */
+   operating-points = <
+   /* kHzuV */
+   1008000 120
+   864000  120
+   72  110
+   48  100
+   >;
+   #cooling-cells = <2>;
};
 
cpu@2 {
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <2>;
+   clocks = < CLK_CPU>;
+   clock-latency = <244144>; /* 8 32k periods */
+   operating-points = <
+   /* kHzuV */
+   1008000 120
+   864000  120
+   72  110
+   48  100
+   >;
+   #cooling-cells = <2>;
};
 
cpu@3 {
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <3>;
+   clocks = < CLK_CPU>;
+   clock-latency = <244144>; /* 8 32k periods */
+   operating-points = <
+   /* kHzuV */
+   1008000 120
+   864000  120
+   72  110
+   48  100
+   >;
+   #cooling-cells = <2>;
};
};
 
diff --git a/arch/arm/boot/dts/sun7i-a20.dtsi b/arch/arm/boot/dts/sun7i-a20.dtsi
index e529e4ff2174..35372a0cfc8d 100644
--- a/arch/arm/boot/dts/sun7i-a20.dtsi
+++ b/arch/arm/boot/dts/sun7i-a20.dtsi
@@ -122,6 +122,19 @@
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <1>;
+   clocks = < CLK_CPU>;
+   clock-latency = <244144>; /* 8 32k periods */
+   operating-points = <
+   /* kHzuV */
+   96  140
+   912000  140
+   864000  130
+   72  120
+   528000  110
+   312000  100
+   144000  100
+   >;
+   #cooling-cells = <2>;
};
};
 
diff --git a/arch/arm/boot/dts/sun8i-a33.dtsi b/arch/arm/boot/dts/sun8i-a33.dtsi
index 8d278ee001e9..4e92741b24a7 100644
--- a/arch/arm/boot/dts/sun8i-a33.dtsi
+++ b/arch/arm/boot/dts/sun8i-a33.dtsi
@@ -132,21 +132,30 @@
};
 
cpu@1 {
+   clocks = < CLK_CPUX>;
+   clock-names = "cpu";
operating-points-v2 = <_opp_table>;
+   #cooling-cells = <2>;
};
 
cpu@2 {
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <2>;
+   clocks = < CLK_CPUX>;
+   clock-names = "cpu";
operating-points-v2 = <_opp_table>;
+   #cooling-cells = <2>;
};
 
cpu@3 {

[PATCH V2 2/2] arm: dts: sunxi: Add missing cooling device properties for CPUs

2018-06-04 Thread Viresh Kumar
The cooling device properties, like "#cooling-cells" and
"dynamic-power-coefficient", should either be present for all the CPUs
of a cluster or none. If these are present only for a subset of CPUs of
a cluster then things will start falling apart as soon as the CPUs are
brought online in a different order. For example, this will happen
because the operating system looks for such properties in the CPU node
it is trying to bring up, so that it can register a cooling device.

Add such missing properties.

Fix other missing properties (clocks, OPP, clock latency) as well to
make it all work.

Signed-off-by: Viresh Kumar 
---
V2:
- Separated patch for h3
- Fixed subject s/sun/sunxi/

 arch/arm/boot/dts/sun6i-a31.dtsi | 30 ++
 arch/arm/boot/dts/sun7i-a20.dtsi | 13 +
 arch/arm/boot/dts/sun8i-a33.dtsi |  9 +
 3 files changed, 52 insertions(+)

diff --git a/arch/arm/boot/dts/sun6i-a31.dtsi b/arch/arm/boot/dts/sun6i-a31.dtsi
index c72992556a86..debc0bf22ea3 100644
--- a/arch/arm/boot/dts/sun6i-a31.dtsi
+++ b/arch/arm/boot/dts/sun6i-a31.dtsi
@@ -119,18 +119,48 @@
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <1>;
+   clocks = < CLK_CPU>;
+   clock-latency = <244144>; /* 8 32k periods */
+   operating-points = <
+   /* kHzuV */
+   1008000 120
+   864000  120
+   72  110
+   48  100
+   >;
+   #cooling-cells = <2>;
};
 
cpu@2 {
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <2>;
+   clocks = < CLK_CPU>;
+   clock-latency = <244144>; /* 8 32k periods */
+   operating-points = <
+   /* kHzuV */
+   1008000 120
+   864000  120
+   72  110
+   48  100
+   >;
+   #cooling-cells = <2>;
};
 
cpu@3 {
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <3>;
+   clocks = < CLK_CPU>;
+   clock-latency = <244144>; /* 8 32k periods */
+   operating-points = <
+   /* kHzuV */
+   1008000 120
+   864000  120
+   72  110
+   48  100
+   >;
+   #cooling-cells = <2>;
};
};
 
diff --git a/arch/arm/boot/dts/sun7i-a20.dtsi b/arch/arm/boot/dts/sun7i-a20.dtsi
index e529e4ff2174..35372a0cfc8d 100644
--- a/arch/arm/boot/dts/sun7i-a20.dtsi
+++ b/arch/arm/boot/dts/sun7i-a20.dtsi
@@ -122,6 +122,19 @@
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <1>;
+   clocks = < CLK_CPU>;
+   clock-latency = <244144>; /* 8 32k periods */
+   operating-points = <
+   /* kHzuV */
+   96  140
+   912000  140
+   864000  130
+   72  120
+   528000  110
+   312000  100
+   144000  100
+   >;
+   #cooling-cells = <2>;
};
};
 
diff --git a/arch/arm/boot/dts/sun8i-a33.dtsi b/arch/arm/boot/dts/sun8i-a33.dtsi
index 8d278ee001e9..4e92741b24a7 100644
--- a/arch/arm/boot/dts/sun8i-a33.dtsi
+++ b/arch/arm/boot/dts/sun8i-a33.dtsi
@@ -132,21 +132,30 @@
};
 
cpu@1 {
+   clocks = < CLK_CPUX>;
+   clock-names = "cpu";
operating-points-v2 = <_opp_table>;
+   #cooling-cells = <2>;
};
 
cpu@2 {
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <2>;
+   clocks = < CLK_CPUX>;
+   clock-names = "cpu";
operating-points-v2 = <_opp_table>;
+   #cooling-cells = <2>;
};
 
cpu@3 {

[PATCH V2 1/2] arm: dts: sun8i-h3: Add missing cooling device properties for CPUs

2018-06-04 Thread Viresh Kumar
The cooling device properties, like "#cooling-cells" and
"dynamic-power-coefficient", should either be present for all the CPUs
of a cluster or none. If these are present only for a subset of CPUs of
a cluster then things will start falling apart as soon as the CPUs are
brought online in a different order. For example, this will happen
because the operating system looks for such properties in the CPU node
it is trying to bring up, so that it can register a cooling device.

Add such missing properties.

Fix other missing properties (clocks, clock-names) as well to make it all
work.

Signed-off-by: Viresh Kumar 
---
V2: Separated patch for h3.

 arch/arm/boot/dts/sun8i-h3.dtsi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/arm/boot/dts/sun8i-h3.dtsi b/arch/arm/boot/dts/sun8i-h3.dtsi
index 41d57c76f290..9dff6887923c 100644
--- a/arch/arm/boot/dts/sun8i-h3.dtsi
+++ b/arch/arm/boot/dts/sun8i-h3.dtsi
@@ -84,21 +84,30 @@
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <1>;
+   clocks = < CLK_CPUX>;
+   clock-names = "cpu";
operating-points-v2 = <_opp_table>;
+   #cooling-cells = <2>;
};
 
cpu@2 {
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <2>;
+   clocks = < CLK_CPUX>;
+   clock-names = "cpu";
operating-points-v2 = <_opp_table>;
+   #cooling-cells = <2>;
};
 
cpu@3 {
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <3>;
+   clocks = < CLK_CPUX>;
+   clock-names = "cpu";
operating-points-v2 = <_opp_table>;
+   #cooling-cells = <2>;
};
};
 
-- 
2.15.0.194.g9af6a3dea062



[PATCH V2 1/2] arm: dts: sun8i-h3: Add missing cooling device properties for CPUs

2018-06-04 Thread Viresh Kumar
The cooling device properties, like "#cooling-cells" and
"dynamic-power-coefficient", should either be present for all the CPUs
of a cluster or none. If these are present only for a subset of CPUs of
a cluster then things will start falling apart as soon as the CPUs are
brought online in a different order. For example, this will happen
because the operating system looks for such properties in the CPU node
it is trying to bring up, so that it can register a cooling device.

Add such missing properties.

Fix other missing properties (clocks, clock-names) as well to make it all
work.

Signed-off-by: Viresh Kumar 
---
V2: Separated patch for h3.

 arch/arm/boot/dts/sun8i-h3.dtsi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/arm/boot/dts/sun8i-h3.dtsi b/arch/arm/boot/dts/sun8i-h3.dtsi
index 41d57c76f290..9dff6887923c 100644
--- a/arch/arm/boot/dts/sun8i-h3.dtsi
+++ b/arch/arm/boot/dts/sun8i-h3.dtsi
@@ -84,21 +84,30 @@
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <1>;
+   clocks = < CLK_CPUX>;
+   clock-names = "cpu";
operating-points-v2 = <_opp_table>;
+   #cooling-cells = <2>;
};
 
cpu@2 {
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <2>;
+   clocks = < CLK_CPUX>;
+   clock-names = "cpu";
operating-points-v2 = <_opp_table>;
+   #cooling-cells = <2>;
};
 
cpu@3 {
compatible = "arm,cortex-a7";
device_type = "cpu";
reg = <3>;
+   clocks = < CLK_CPUX>;
+   clock-names = "cpu";
operating-points-v2 = <_opp_table>;
+   #cooling-cells = <2>;
};
};
 
-- 
2.15.0.194.g9af6a3dea062



Re: Regression in Linux next again

2018-06-04 Thread Tony Lindgren
Hi,

* Maciej Purski  [180604 14:02]:
> Tony, please apply this patchset and test it on your Beaglebone. It'd be
> great if you could try to find out, which patch causes failure. They should
> be appliable on the current next.

It seems beagle-x15 boots for me except with the last patch in the
series with compile issue fixed. With that applied after modules
get loaded it just hangs with:

[   25.925607] regulator_resolve_supply: 1608
[   25.930620] regulator_resolve_supply: 1608
[   26.098449] lib80211: common routines for IEEE802.11 drivers
[   26.446713] cfg80211: Loading compiled-in X.509 certificates for regulatory 
database
[   26.514834] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[   26.525311] platform regulatory.0: Direct firmware load for regulatory.db 
failed with error -2
[   26.534250] cfg80211: failed to load regulatory.db
[   26.998736] regulator_set_voltage: 3381
[   27.007662] _regulator_do_set_voltage: 2913
[   27.012593] regulator_set_voltage: 3381
[   27.016969] _regulator_do_set_voltage: 2913
[   27.022847] regulator_set_voltage: 3381

Not sure what module gets loaded at that point, I'll test again
with initcall_debug once you post a compiling version of your last
patch.

Regards,

Tony


Re: Regression in Linux next again

2018-06-04 Thread Tony Lindgren
Hi,

* Maciej Purski  [180604 14:02]:
> Tony, please apply this patchset and test it on your Beaglebone. It'd be
> great if you could try to find out, which patch causes failure. They should
> be appliable on the current next.

It seems beagle-x15 boots for me except with the last patch in the
series with compile issue fixed. With that applied after modules
get loaded it just hangs with:

[   25.925607] regulator_resolve_supply: 1608
[   25.930620] regulator_resolve_supply: 1608
[   26.098449] lib80211: common routines for IEEE802.11 drivers
[   26.446713] cfg80211: Loading compiled-in X.509 certificates for regulatory 
database
[   26.514834] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[   26.525311] platform regulatory.0: Direct firmware load for regulatory.db 
failed with error -2
[   26.534250] cfg80211: failed to load regulatory.db
[   26.998736] regulator_set_voltage: 3381
[   27.007662] _regulator_do_set_voltage: 2913
[   27.012593] regulator_set_voltage: 3381
[   27.016969] _regulator_do_set_voltage: 2913
[   27.022847] regulator_set_voltage: 3381

Not sure what module gets loaded at that point, I'll test again
with initcall_debug once you post a compiling version of your last
patch.

Regards,

Tony


Re: [PATCH 6/6] x86/vdso: Move out the CPU number store

2018-06-04 Thread Bae, Chang Seok
>> diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
>> index ea554f8..e716e94 100644
>> --- a/arch/x86/kernel/setup_percpu.c
>> +++ b/arch/x86/kernel/setup_percpu.c
>> @@ -155,12 +155,21 @@ static void __init pcpup_populate_pte(unsigned long 
>> addr)
>>  
>>  static inline void setup_percpu_segment(int cpu)
>>  {
>> -#ifdef CONFIG_X86_32
>> -struct desc_struct d = GDT_ENTRY_INIT(0x8092, per_cpu_offset(cpu),
>> -  0xF);
>> +#ifdef CONFIG_NUMA
>> +unsigned long node = early_cpu_to_node(cpu);
>> +#else
>> +unsigned long node = 0;
>> +#endif
>> +struct desc_struct d = GDT_ENTRY_INIT(0x0, per_cpu_offset(cpu),
>> +   make_lsl_tscp(cpu, node));
>> +
>> +d.type = 5; /* R0 data, expand down, accessed */
>> +d.dpl = 3;  /* Visible to user code */
>> +d.s = 1;/* Not a system segment */
>> +d.p = 1;/* Present */
>> +d.d = 1;/* 32-bit */
>>  
>>  write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PERCPU, , DESCTYPE_S);
>> -#endif
>>  }

> This won't work on X86-32 because it actually uses the segment limit with fs: 
> access. So there 
> is a reason why the lsl based method is X86-64 only.

The limit will be consumed in X86-64 only, while the unification with i386 was 
suggested for a
different reason.

Thanks,
Chang



Re: [PATCH 6/6] x86/vdso: Move out the CPU number store

2018-06-04 Thread Bae, Chang Seok
>> diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
>> index ea554f8..e716e94 100644
>> --- a/arch/x86/kernel/setup_percpu.c
>> +++ b/arch/x86/kernel/setup_percpu.c
>> @@ -155,12 +155,21 @@ static void __init pcpup_populate_pte(unsigned long 
>> addr)
>>  
>>  static inline void setup_percpu_segment(int cpu)
>>  {
>> -#ifdef CONFIG_X86_32
>> -struct desc_struct d = GDT_ENTRY_INIT(0x8092, per_cpu_offset(cpu),
>> -  0xF);
>> +#ifdef CONFIG_NUMA
>> +unsigned long node = early_cpu_to_node(cpu);
>> +#else
>> +unsigned long node = 0;
>> +#endif
>> +struct desc_struct d = GDT_ENTRY_INIT(0x0, per_cpu_offset(cpu),
>> +   make_lsl_tscp(cpu, node));
>> +
>> +d.type = 5; /* R0 data, expand down, accessed */
>> +d.dpl = 3;  /* Visible to user code */
>> +d.s = 1;/* Not a system segment */
>> +d.p = 1;/* Present */
>> +d.d = 1;/* 32-bit */
>>  
>>  write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PERCPU, , DESCTYPE_S);
>> -#endif
>>  }

> This won't work on X86-32 because it actually uses the segment limit with fs: 
> access. So there 
> is a reason why the lsl based method is X86-64 only.

The limit will be consumed in X86-64 only, while the unification with i386 was 
suggested for a
different reason.

Thanks,
Chang



Re: [PATCH 1/6] arm64: dts: amlogic: Add missing cooling device properties for CPUs

2018-06-04 Thread Viresh Kumar
On 02-06-18, 01:14, Olof Johansson wrote:
> And what I am saying is that it sounds like a broken binding if you don't 
> allow
> that, especially since it'll be a super common case that all CPUs will specify
> the same cooling-device specifier.

I am fine with allowing the #cooling-cells property in the cpus node if the DT
maintainers are fine with it.

@Rob: comments ?

@Olof: What about other properties which are still going to be duplicated for
the most common cases today, like: clocks, supply information, cache
information, cpu-idle-states and others. When we can duplicate these properties,
why not keep following the same for #cpu-cooling property ?

Note that the OPP table doesn't really need to get duplicated (for new
platforms) as the platforms use the v2 bindings now which just duplicates a
phandle assignment for all CPUs. Its a mess with older platforms which use the
earlier version of OPP table.

> > This property is required to declare a device as a cooling-device and
> > the device here is CPU. We use it as a cooling device by limiting its
> > higher range of frequencies, so that it doesn't generate too much
> > heat.
> > 
> > It is already there for CPU0 and CPU4, but it should really be there
> > for all the CPUs, like we have clock, supply, caches, etc.
> 
> You have #cooling-cells in the cpu node, but the actual data is in the
> thermal-zones nodes. Why isn't #cooling-cells under thermal-zones, next to
> cooling-maps?

Actually I thought about that when I worked on these patches initially and this
is why I felt convinced that the CPU nodes are the right place for this.

We add #interrupt-cells to an Interrupt controller's DT node, #gpio-cells to a
GPIO controller's DT node, #clock-cells to a clock controller's DT node and
that's exactly why we should (and we do) add #cooling-cells property to a
cooling device's DT node. This information is used in two ways, first it enables
the OS to know that the device is capable of being a cooling device and second
it tells us how many arguments will be required with a phandle of this device.

And so the cooling-maps always contain two arguments with the cooling device's
phandle (which is mostly a CPU or a gpio fan) as the #cooling-cells currently
is fixed to 2.

And so I am not really sure if thermal-zones is the right place to define this
thing. Is my understanding correct ?

-- 
viresh


Re: [PATCH 1/6] arm64: dts: amlogic: Add missing cooling device properties for CPUs

2018-06-04 Thread Viresh Kumar
On 02-06-18, 01:14, Olof Johansson wrote:
> And what I am saying is that it sounds like a broken binding if you don't 
> allow
> that, especially since it'll be a super common case that all CPUs will specify
> the same cooling-device specifier.

I am fine with allowing the #cooling-cells property in the cpus node if the DT
maintainers are fine with it.

@Rob: comments ?

@Olof: What about other properties which are still going to be duplicated for
the most common cases today, like: clocks, supply information, cache
information, cpu-idle-states and others. When we can duplicate these properties,
why not keep following the same for #cpu-cooling property ?

Note that the OPP table doesn't really need to get duplicated (for new
platforms) as the platforms use the v2 bindings now which just duplicates a
phandle assignment for all CPUs. Its a mess with older platforms which use the
earlier version of OPP table.

> > This property is required to declare a device as a cooling-device and
> > the device here is CPU. We use it as a cooling device by limiting its
> > higher range of frequencies, so that it doesn't generate too much
> > heat.
> > 
> > It is already there for CPU0 and CPU4, but it should really be there
> > for all the CPUs, like we have clock, supply, caches, etc.
> 
> You have #cooling-cells in the cpu node, but the actual data is in the
> thermal-zones nodes. Why isn't #cooling-cells under thermal-zones, next to
> cooling-maps?

Actually I thought about that when I worked on these patches initially and this
is why I felt convinced that the CPU nodes are the right place for this.

We add #interrupt-cells to an Interrupt controller's DT node, #gpio-cells to a
GPIO controller's DT node, #clock-cells to a clock controller's DT node and
that's exactly why we should (and we do) add #cooling-cells property to a
cooling device's DT node. This information is used in two ways, first it enables
the OS to know that the device is capable of being a cooling device and second
it tells us how many arguments will be required with a phandle of this device.

And so the cooling-maps always contain two arguments with the cooling device's
phandle (which is mostly a CPU or a gpio fan) as the #cooling-cells currently
is fixed to 2.

And so I am not really sure if thermal-zones is the right place to define this
thing. Is my understanding correct ?

-- 
viresh


Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-06-04 Thread Christoph Hellwig
On Mon, Jun 04, 2018 at 02:58:49PM -0700, Roland Dreier wrote:
> We plan to implement all the fancy NVMe standards like ANA, but it
> seems that there is still a requirement to let the host side choose
> policies about how to use paths (round-robin vs least queue depth for
> example).  Even in the modern SCSI world with VPD pages and ALUA,
> there are still knobs that are needed.  Maybe NVMe will be different
> and we can find defaults that work in all cases but I have to admit
> I'm skeptical...

The sensible thing to do in nvme is to use different paths for
different queues.  That is e.g. in the RDMA case use the HCA closer
to a given CPU by default.  We might allow to override this for
cases where the is a good reason, but what I really don't want is
configurability for configurabilities sake.


Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-06-04 Thread Christoph Hellwig
On Mon, Jun 04, 2018 at 02:58:49PM -0700, Roland Dreier wrote:
> We plan to implement all the fancy NVMe standards like ANA, but it
> seems that there is still a requirement to let the host side choose
> policies about how to use paths (round-robin vs least queue depth for
> example).  Even in the modern SCSI world with VPD pages and ALUA,
> there are still knobs that are needed.  Maybe NVMe will be different
> and we can find defaults that work in all cases but I have to admit
> I'm skeptical...

The sensible thing to do in nvme is to use different paths for
different queues.  That is e.g. in the RDMA case use the HCA closer
to a given CPU by default.  We might allow to override this for
cases where the is a good reason, but what I really don't want is
configurability for configurabilities sake.


Re: [PATCH -mm -V3 00/21] mm, THP, swap: Swapout/swapin THP in one piece

2018-06-04 Thread Huang, Ying
Daniel Jordan  writes:

> On Wed, May 23, 2018 at 04:26:04PM +0800, Huang, Ying wrote:
>> And for all, Any comment is welcome!
>> 
>> This patchset is based on the 2018-05-18 head of mmotm/master.
>
> Trying to review this and it doesn't apply to mmotm-2018-05-18-16-44.  git
> fails on patch 10:
>
> Applying: mm, THP, swap: Support to count THP swapin and its fallback
> error: Documentation/vm/transhuge.rst: does not exist in index
> Patch failed at 0010 mm, THP, swap: Support to count THP swapin and its 
> fallback
>
> Sure enough, this tag has Documentation/vm/transhuge.txt but not the .rst
> version.  Was this the tag you meant?  If so did you pull in some of Mike
> Rapoport's doc changes on top?

I use the mmotm tree at

git://git.cmpxchg.org/linux-mmotm.git

Maybe you are using the other one?

>> base  optimized
>>  -- 
>>  %stddev %change %stddev
>>  \  |\  
>>1417897   2%+992.8%   15494673vm-scalability.throughput
>>1020489   4%   +1091.2%   12156349vmstat.swap.si
>>1255093   3%+940.3%   13056114vmstat.swap.so
>>1259769   7%   +1818.3%   24166779meminfo.AnonHugePages
>>   28021761   -10.7%   25018848   2%  meminfo.AnonPages
>>   64080064   4% -95.6%2787565  33%  
>> interrupts.CAL:Function_call_interrupts
>>  13.91   5% -13.80.10  27%  
>> perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>> 
> ...snip...
>> test, while in optimized kernel, that is 96.6%.  The TLB flushing IPI
>> (represented as interrupts.CAL:Function_call_interrupts) reduced
>> 95.6%, while cycles for spinlock reduced from 13.9% to 0.1%.  These
>> are performance benefit of THP swapout/swapin too.
>
> Which spinlocks are we spending less time on?

"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.mem_cgroup_commit_charge.do_swap_page.__handle_mm_fault":
 4.39,
"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.drain_pages_zone.drain_pages":
 1.53,
"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.get_page_from_freelist.__alloc_pages_slowpath.__alloc_pages_nodemask":
 1.34,
"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.swapcache_free_entries.free_swap_slot.do_swap_page":
 1.02,
"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.shrink_inactive_list.shrink_node_memcg.shrink_node":
 0.61,
"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.shrink_active_list.shrink_node_memcg.shrink_node":
 0.54,

Best Regards,
Huang, Ying


Re: [PATCH -mm -V3 00/21] mm, THP, swap: Swapout/swapin THP in one piece

2018-06-04 Thread Huang, Ying
Daniel Jordan  writes:

> On Wed, May 23, 2018 at 04:26:04PM +0800, Huang, Ying wrote:
>> And for all, Any comment is welcome!
>> 
>> This patchset is based on the 2018-05-18 head of mmotm/master.
>
> Trying to review this and it doesn't apply to mmotm-2018-05-18-16-44.  git
> fails on patch 10:
>
> Applying: mm, THP, swap: Support to count THP swapin and its fallback
> error: Documentation/vm/transhuge.rst: does not exist in index
> Patch failed at 0010 mm, THP, swap: Support to count THP swapin and its 
> fallback
>
> Sure enough, this tag has Documentation/vm/transhuge.txt but not the .rst
> version.  Was this the tag you meant?  If so did you pull in some of Mike
> Rapoport's doc changes on top?

I use the mmotm tree at

git://git.cmpxchg.org/linux-mmotm.git

Maybe you are using the other one?

>> base  optimized
>>  -- 
>>  %stddev %change %stddev
>>  \  |\  
>>1417897   2%+992.8%   15494673vm-scalability.throughput
>>1020489   4%   +1091.2%   12156349vmstat.swap.si
>>1255093   3%+940.3%   13056114vmstat.swap.so
>>1259769   7%   +1818.3%   24166779meminfo.AnonHugePages
>>   28021761   -10.7%   25018848   2%  meminfo.AnonPages
>>   64080064   4% -95.6%2787565  33%  
>> interrupts.CAL:Function_call_interrupts
>>  13.91   5% -13.80.10  27%  
>> perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>> 
> ...snip...
>> test, while in optimized kernel, that is 96.6%.  The TLB flushing IPI
>> (represented as interrupts.CAL:Function_call_interrupts) reduced
>> 95.6%, while cycles for spinlock reduced from 13.9% to 0.1%.  These
>> are performance benefit of THP swapout/swapin too.
>
> Which spinlocks are we spending less time on?

"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.mem_cgroup_commit_charge.do_swap_page.__handle_mm_fault":
 4.39,
"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.drain_pages_zone.drain_pages":
 1.53,
"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.get_page_from_freelist.__alloc_pages_slowpath.__alloc_pages_nodemask":
 1.34,
"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.swapcache_free_entries.free_swap_slot.do_swap_page":
 1.02,
"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.shrink_inactive_list.shrink_node_memcg.shrink_node":
 0.61,
"perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.shrink_active_list.shrink_node_memcg.shrink_node":
 0.54,

Best Regards,
Huang, Ying


Re: Common config for N900 and D4

2018-06-04 Thread Tony Lindgren
* Pavel Machek  [180603 21:03]:
> Hi!
> 
> > > Aaro, I know I have asked before, but if you have common config for
> > > N900 and Droid4, please send me a copy. Yes, it should be somewhere in
> > > my inbox already, but I can't find it and version for v4.17 would be
> > > more useful.
> > > 
> > > While trying to came up with common config, I hit:
> > > 
> > > [0.00] L2C-310 erratum 727915 enabled
> > > [0.00] L2C-310 enabling early BRESP for Cortex-A9
> > > [0.00] L2C-310 full line of zeros enabled for Cortex-A9
> > 
> > I tried disabling outer cache to get rid of this. That got me further
> > in boot, but not to working system:
> 
> I now have config that works.
> 
> My kernels were probably grossly misconfigured (OMAP4 not enabled on
> droid4)... still I'd expect some kind of panic explaining board is not
> compatible with kernel, not a random oops...

Like "clk-provider not found" error? :) I agree it's not very
descriptive and I think we could easily print something from
after the SoC detection for missing pdata.

Eventually we should just get rid of all ARMv7 Kconfig variants
once most of the hwmod SoC pdata is gone.

Regards,

Tony


Re: Common config for N900 and D4

2018-06-04 Thread Tony Lindgren
* Pavel Machek  [180603 21:03]:
> Hi!
> 
> > > Aaro, I know I have asked before, but if you have common config for
> > > N900 and Droid4, please send me a copy. Yes, it should be somewhere in
> > > my inbox already, but I can't find it and version for v4.17 would be
> > > more useful.
> > > 
> > > While trying to came up with common config, I hit:
> > > 
> > > [0.00] L2C-310 erratum 727915 enabled
> > > [0.00] L2C-310 enabling early BRESP for Cortex-A9
> > > [0.00] L2C-310 full line of zeros enabled for Cortex-A9
> > 
> > I tried disabling outer cache to get rid of this. That got me further
> > in boot, but not to working system:
> 
> I now have config that works.
> 
> My kernels were probably grossly misconfigured (OMAP4 not enabled on
> droid4)... still I'd expect some kind of panic explaining board is not
> compatible with kernel, not a random oops...

Like "clk-provider not found" error? :) I agree it's not very
descriptive and I think we could easily print something from
after the SoC detection for missing pdata.

Eventually we should just get rid of all ARMv7 Kconfig variants
once most of the hwmod SoC pdata is gone.

Regards,

Tony


Re: [rfc patch] mm, oom: fix unnecessary killing of additional processes

2018-06-04 Thread David Rientjes
On Fri, 1 Jun 2018, Michal Hocko wrote:

> > We've discussed the mm 
> > having a single blockable mmu notifier.  Regardless of how we arrive at 
> > the point where the oom reaper can't free memory, which could be any of 
> > those three cases, if (1) the original victim is sufficiently large that 
> > follow-up oom kills would become unnecessary and (2) other threads 
> > allocate/charge before the oom victim reaches exit_mmap(), this occurs.
> > 
> > We have examples of cases where oom reaping was successful, but the rss 
> > numbers in the kernel log are very similar to when it was oom killed and 
> > the process is known not to mlock, the reason is because the oom reaper 
> > could free very little memory due to blockable mmu notifiers.
> 
> Please be more specific. Which notifiers these were. Blockable notifiers
> are a PITA and we should be addressing them. That requiers identifying
> them first.
> 

The most common offender seems to be ib_umem_notifier, but I have also 
heard of possible occurrences for mv_invl_range_start() for xen, but that 
would need more investigation.  The rather new invalidate_range callback 
for hmm mirroring could also be problematic.  Any mmu_notifier without 
MMU_INVALIDATE_DOES_NOT_BLOCK causes the mm to immediately be disregarded.  
For this reason, we see testing harnesses often oom killed immediately 
after running a unittest that stresses reclaim or compaction by inducing a 
system-wide oom condition.  The harness spawns the unittest which spawns 
an antagonist memory hog that is intended to be oom killed.  When memory 
is mlocked or there are a large number of threads faulting memory for the 
antagonist, the unittest and the harness itself get oom killed because the 
oom reaper sets MMF_OOM_SKIP; this ends up happening a lot on powerpc.  
The memory hog has mm->mmap_sem readers queued ahead of a writer that is 
doing mmap() so the oom reaper can't grab the sem quickly enough.

I agree that blockable mmu notifiers are a pain, but until such time as 
all can implicitly be MMU_INVALIDATE_DOES_NOT_BLOCK, the oom reaper can 
free all mlocked memory, and the oom reaper waits long enough to grab 
mm->mmap_sem for stalled mm->mmap_sem readers, we need a solution that 
won't oom kill everything running on the system.  I have doubts we'll ever 
reach a point where the oom reaper can do the equivalent of exit_mmap(), 
but it's possible to help solve the immediate issue of all oom kills 
killing many innocent processes while working in a direction to make oom 
reaping more successful at freeing memory.

> > The current implementation is a timeout based solution for mmap_sem, it 
> > just has the oom reaper spinning trying to grab the sem and eventually 
> > gives up.  This patch allows it to currently work on other mm's and 
> > detects the timeout in a different way, with jiffies instead of an 
> > iterator.
> 
> And I argue that anything timeout based is just broken by design. Trying
> n times will at least give you a consistent behavior.

It's not consistent, we see wildly inconsistent results especially on 
power because it depends on the number of queued readers of mm->mmap_sem 
ahead of a writer until such time that a thread doing mmap() can grab it, 
drop it, and allow the oom reaper to grab it for read.  It's so 
inconsistent that we can see the oom reaper successfully grab the sem for 
an oom killed memory hog with 128 faulting threads, and see it fail with 4 
faulting threads.

> Retrying on mmap
> sem makes sense because the lock might be taken for a short time.

It isn't a function of how long mmap_sem is taken for write, it's a 
function of how many readers are ahead of the queued writer.  We don't run 
with thp defrag set to "always" under standard configurations, but users 
of MADV_HUGEPAGE or configs where defrag is set to "always" can 
consistently cause any number of additional processes to be oom killed 
unnecessarily because the readers are performing compaction and the writer 
is queued behind it.

> > I'd love a solution where we can reliably detect an oom livelock and oom 
> > kill additional processes but only after the original victim has had a 
> > chance to do exit_mmap() without a timeout, but I don't see one being 
> > offered.  Given Tetsuo has seen issues with this in the past and suggested 
> > a similar proposal means we are not the only ones feeling pain from this.
> 
> Tetsuo is doing an artificial stress test which doesn't resemble any
> reasonable workload.

Tetsuo's test cases caught the CVE on powerpc which could trivially 
panic the system if configured to panic on any oops and required a 
security fix because it made it easy for any user doing a large mlock.  
His test case here is trivial to reproduce on powerpc and causes several 
additional processes to be oom killed.  It's not artificial, I see many 
test harnesses killed *nightly* because a memory hog is faulting with a 
large number of threads and two or three other threads are 

Re: [rfc patch] mm, oom: fix unnecessary killing of additional processes

2018-06-04 Thread David Rientjes
On Fri, 1 Jun 2018, Michal Hocko wrote:

> > We've discussed the mm 
> > having a single blockable mmu notifier.  Regardless of how we arrive at 
> > the point where the oom reaper can't free memory, which could be any of 
> > those three cases, if (1) the original victim is sufficiently large that 
> > follow-up oom kills would become unnecessary and (2) other threads 
> > allocate/charge before the oom victim reaches exit_mmap(), this occurs.
> > 
> > We have examples of cases where oom reaping was successful, but the rss 
> > numbers in the kernel log are very similar to when it was oom killed and 
> > the process is known not to mlock, the reason is because the oom reaper 
> > could free very little memory due to blockable mmu notifiers.
> 
> Please be more specific. Which notifiers these were. Blockable notifiers
> are a PITA and we should be addressing them. That requiers identifying
> them first.
> 

The most common offender seems to be ib_umem_notifier, but I have also 
heard of possible occurrences for mv_invl_range_start() for xen, but that 
would need more investigation.  The rather new invalidate_range callback 
for hmm mirroring could also be problematic.  Any mmu_notifier without 
MMU_INVALIDATE_DOES_NOT_BLOCK causes the mm to immediately be disregarded.  
For this reason, we see testing harnesses often oom killed immediately 
after running a unittest that stresses reclaim or compaction by inducing a 
system-wide oom condition.  The harness spawns the unittest which spawns 
an antagonist memory hog that is intended to be oom killed.  When memory 
is mlocked or there are a large number of threads faulting memory for the 
antagonist, the unittest and the harness itself get oom killed because the 
oom reaper sets MMF_OOM_SKIP; this ends up happening a lot on powerpc.  
The memory hog has mm->mmap_sem readers queued ahead of a writer that is 
doing mmap() so the oom reaper can't grab the sem quickly enough.

I agree that blockable mmu notifiers are a pain, but until such time as 
all can implicitly be MMU_INVALIDATE_DOES_NOT_BLOCK, the oom reaper can 
free all mlocked memory, and the oom reaper waits long enough to grab 
mm->mmap_sem for stalled mm->mmap_sem readers, we need a solution that 
won't oom kill everything running on the system.  I have doubts we'll ever 
reach a point where the oom reaper can do the equivalent of exit_mmap(), 
but it's possible to help solve the immediate issue of all oom kills 
killing many innocent processes while working in a direction to make oom 
reaping more successful at freeing memory.

> > The current implementation is a timeout based solution for mmap_sem, it 
> > just has the oom reaper spinning trying to grab the sem and eventually 
> > gives up.  This patch allows it to currently work on other mm's and 
> > detects the timeout in a different way, with jiffies instead of an 
> > iterator.
> 
> And I argue that anything timeout based is just broken by design. Trying
> n times will at least give you a consistent behavior.

It's not consistent, we see wildly inconsistent results especially on 
power because it depends on the number of queued readers of mm->mmap_sem 
ahead of a writer until such time that a thread doing mmap() can grab it, 
drop it, and allow the oom reaper to grab it for read.  It's so 
inconsistent that we can see the oom reaper successfully grab the sem for 
an oom killed memory hog with 128 faulting threads, and see it fail with 4 
faulting threads.

> Retrying on mmap
> sem makes sense because the lock might be taken for a short time.

It isn't a function of how long mmap_sem is taken for write, it's a 
function of how many readers are ahead of the queued writer.  We don't run 
with thp defrag set to "always" under standard configurations, but users 
of MADV_HUGEPAGE or configs where defrag is set to "always" can 
consistently cause any number of additional processes to be oom killed 
unnecessarily because the readers are performing compaction and the writer 
is queued behind it.

> > I'd love a solution where we can reliably detect an oom livelock and oom 
> > kill additional processes but only after the original victim has had a 
> > chance to do exit_mmap() without a timeout, but I don't see one being 
> > offered.  Given Tetsuo has seen issues with this in the past and suggested 
> > a similar proposal means we are not the only ones feeling pain from this.
> 
> Tetsuo is doing an artificial stress test which doesn't resemble any
> reasonable workload.

Tetsuo's test cases caught the CVE on powerpc which could trivially 
panic the system if configured to panic on any oops and required a 
security fix because it made it easy for any user doing a large mlock.  
His test case here is trivial to reproduce on powerpc and causes several 
additional processes to be oom killed.  It's not artificial, I see many 
test harnesses killed *nightly* because a memory hog is faulting with a 
large number of threads and two or three other threads are 

Re: [PATCH] uart: fix race between uart_put_char() and uart_shutdown()

2018-06-04 Thread Serge E. Hallyn
Quoting Tycho Andersen (ty...@tycho.ws):
> We have reports of the following crash:
> 
> PID: 7 TASK: 88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0"
> #0 [88085c6db710] machine_kexec at 81046239
> #1 [88085c6db760] crash_kexec at 810fc248
> #2 [88085c6db830] oops_end at 81008ae7
> #3 [88085c6db860] no_context at 81050b8f
> #4 [88085c6db8b0] __bad_area_nosemaphore at 81050d75
> #5 [88085c6db900] bad_area_nosemaphore at 81050e83
> #6 [88085c6db910] __do_page_fault at 8105132e
> #7 [88085c6db9b0] do_page_fault at 8105152c
> #8 [88085c6db9c0] page_fault at 81a3f122
> [exception RIP: uart_put_char+149]
> RIP: 814b67b5 RSP: 88085c6dba78 RFLAGS: 00010006
> RAX: 0292 RBX: 827c5120 RCX: 0081
> RDX:  RSI: 005f RDI: 827c5120
> RBP: 88085c6dba98 R8: 012c R9: 822ea320
> R10: 88085fe4db04 R11: 0001 R12: 881059f9c000
> R13: 0001 R14: 005f R15: 0fba
> ORIG_RAX:  CS: 0010 SS: 0018
> #9 [88085c6dbaa0] tty_put_char at 81497544
> #10 [88085c6dbac0] do_output_char at 8149c91c
> #11 [88085c6dbae0] __process_echoes at 8149cb8b
> #12 [88085c6dbb30] commit_echoes at 8149cdc2
> #13 [88085c6dbb60] n_tty_receive_buf_fast at 8149e49b
> #14 [88085c6dbbc0] __receive_buf at 8149ef5a
> #15 [88085c6dbc20] n_tty_receive_buf_common at 8149f016
> #16 [88085c6dbca0] n_tty_receive_buf2 at 8149f194
> #17 [88085c6dbcb0] flush_to_ldisc at 814a238a
> #18 [88085c6dbd50] process_one_work at 81090be2
> #19 [88085c6dbe20] worker_thread at 81091b4d
> #20 [88085c6dbeb0] kthread at 81096384
> #21 [88085c6dbf50] ret_from_fork at 81a3d69f​
> 
> after slogging through some dissasembly:
> 
> 814b6720 :
> 814b6720: 55  push   %rbp
> 814b6721: 48 89 e5mov%rsp,%rbp
> 814b6724: 48 83 ec 20 sub$0x20,%rsp
> 814b6728: 48 89 1c 24 mov%rbx,(%rsp)
> 814b672c: 4c 89 64 24 08  mov%r12,0x8(%rsp)
> 814b6731: 4c 89 6c 24 10  mov%r13,0x10(%rsp)
> 814b6736: 4c 89 74 24 18  mov%r14,0x18(%rsp)
> 814b673b: e8 b0 8e 58 00  callq  81a3f5f0 
> 814b6740: 4c 8b a7 88 02 00 00mov0x288(%rdi),%r12
> 814b6747: 45 31 edxor%r13d,%r13d
> 814b674a: 41 89 f6mov%esi,%r14d
> 814b674d: 49 83 bc 24 70 01 00cmpq   $0x0,0x170(%r12)
> 814b6754: 00 00
> 814b6756: 49 8b 9c 24 80 01 00mov0x180(%r12),%rbx
> 814b675d: 00
> 814b675e: 74 2f   je 814b678f 
> 
> 814b6760: 48 89 dfmov%rbx,%rdi
> 814b6763: e8 a8 67 58 00  callq  81a3cf10 
> <_raw_spin_lock_irqsave>
> 814b6768: 41 8b 8c 24 78 01 00mov0x178(%r12),%ecx
> 814b676f: 00
> 814b6770: 89 ca   mov%ecx,%edx
> 814b6772: f7 d2   not%edx
> 814b6774: 41 03 94 24 7c 01 00add0x17c(%r12),%edx
> 814b677b: 00
> 814b677c: 81 e2 ff 0f 00 00   and$0xfff,%edx
> 814b6782: 75 23   jne814b67a7 
> 
> 814b6784: 48 89 c6mov%rax,%rsi
> 814b6787: 48 89 dfmov%rbx,%rdi
> 814b678a: e8 e1 64 58 00  callq  81a3cc70 
> <_raw_spin_unlock_irqrestore>
> 814b678f: 44 89 e8mov%r13d,%eax
> 814b6792: 48 8b 1c 24 mov(%rsp),%rbx
> 814b6796: 4c 8b 64 24 08  mov0x8(%rsp),%r12
> 814b679b: 4c 8b 6c 24 10  mov0x10(%rsp),%r13
> 814b67a0: 4c 8b 74 24 18  mov0x18(%rsp),%r14
> 814b67a5: c9  leaveq
> 814b67a6: c3  retq
> 814b67a7: 49 8b 94 24 70 01 00mov0x170(%r12),%rdx
> 814b67ae: 00
> 814b67af: 48 63 c9movslq %ecx,%rcx
> 814b67b2: 41 b5 01mov$0x1,%r13b
> 814b67b5: 44 88 34 0a mov%r14b,(%rdx,%rcx,1)
> 814b67b9: 41 8b 94 24 78 01 00mov0x178(%r12),%edx
> 814b67c0: 00
> 814b67c1: 83 c2 01add$0x1,%edx
> 814b67c4: 81 e2 ff 0f 

Re: [PATCH] uart: fix race between uart_put_char() and uart_shutdown()

2018-06-04 Thread Serge E. Hallyn
Quoting Tycho Andersen (ty...@tycho.ws):
> We have reports of the following crash:
> 
> PID: 7 TASK: 88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0"
> #0 [88085c6db710] machine_kexec at 81046239
> #1 [88085c6db760] crash_kexec at 810fc248
> #2 [88085c6db830] oops_end at 81008ae7
> #3 [88085c6db860] no_context at 81050b8f
> #4 [88085c6db8b0] __bad_area_nosemaphore at 81050d75
> #5 [88085c6db900] bad_area_nosemaphore at 81050e83
> #6 [88085c6db910] __do_page_fault at 8105132e
> #7 [88085c6db9b0] do_page_fault at 8105152c
> #8 [88085c6db9c0] page_fault at 81a3f122
> [exception RIP: uart_put_char+149]
> RIP: 814b67b5 RSP: 88085c6dba78 RFLAGS: 00010006
> RAX: 0292 RBX: 827c5120 RCX: 0081
> RDX:  RSI: 005f RDI: 827c5120
> RBP: 88085c6dba98 R8: 012c R9: 822ea320
> R10: 88085fe4db04 R11: 0001 R12: 881059f9c000
> R13: 0001 R14: 005f R15: 0fba
> ORIG_RAX:  CS: 0010 SS: 0018
> #9 [88085c6dbaa0] tty_put_char at 81497544
> #10 [88085c6dbac0] do_output_char at 8149c91c
> #11 [88085c6dbae0] __process_echoes at 8149cb8b
> #12 [88085c6dbb30] commit_echoes at 8149cdc2
> #13 [88085c6dbb60] n_tty_receive_buf_fast at 8149e49b
> #14 [88085c6dbbc0] __receive_buf at 8149ef5a
> #15 [88085c6dbc20] n_tty_receive_buf_common at 8149f016
> #16 [88085c6dbca0] n_tty_receive_buf2 at 8149f194
> #17 [88085c6dbcb0] flush_to_ldisc at 814a238a
> #18 [88085c6dbd50] process_one_work at 81090be2
> #19 [88085c6dbe20] worker_thread at 81091b4d
> #20 [88085c6dbeb0] kthread at 81096384
> #21 [88085c6dbf50] ret_from_fork at 81a3d69f​
> 
> after slogging through some dissasembly:
> 
> 814b6720 :
> 814b6720: 55  push   %rbp
> 814b6721: 48 89 e5mov%rsp,%rbp
> 814b6724: 48 83 ec 20 sub$0x20,%rsp
> 814b6728: 48 89 1c 24 mov%rbx,(%rsp)
> 814b672c: 4c 89 64 24 08  mov%r12,0x8(%rsp)
> 814b6731: 4c 89 6c 24 10  mov%r13,0x10(%rsp)
> 814b6736: 4c 89 74 24 18  mov%r14,0x18(%rsp)
> 814b673b: e8 b0 8e 58 00  callq  81a3f5f0 
> 814b6740: 4c 8b a7 88 02 00 00mov0x288(%rdi),%r12
> 814b6747: 45 31 edxor%r13d,%r13d
> 814b674a: 41 89 f6mov%esi,%r14d
> 814b674d: 49 83 bc 24 70 01 00cmpq   $0x0,0x170(%r12)
> 814b6754: 00 00
> 814b6756: 49 8b 9c 24 80 01 00mov0x180(%r12),%rbx
> 814b675d: 00
> 814b675e: 74 2f   je 814b678f 
> 
> 814b6760: 48 89 dfmov%rbx,%rdi
> 814b6763: e8 a8 67 58 00  callq  81a3cf10 
> <_raw_spin_lock_irqsave>
> 814b6768: 41 8b 8c 24 78 01 00mov0x178(%r12),%ecx
> 814b676f: 00
> 814b6770: 89 ca   mov%ecx,%edx
> 814b6772: f7 d2   not%edx
> 814b6774: 41 03 94 24 7c 01 00add0x17c(%r12),%edx
> 814b677b: 00
> 814b677c: 81 e2 ff 0f 00 00   and$0xfff,%edx
> 814b6782: 75 23   jne814b67a7 
> 
> 814b6784: 48 89 c6mov%rax,%rsi
> 814b6787: 48 89 dfmov%rbx,%rdi
> 814b678a: e8 e1 64 58 00  callq  81a3cc70 
> <_raw_spin_unlock_irqrestore>
> 814b678f: 44 89 e8mov%r13d,%eax
> 814b6792: 48 8b 1c 24 mov(%rsp),%rbx
> 814b6796: 4c 8b 64 24 08  mov0x8(%rsp),%r12
> 814b679b: 4c 8b 6c 24 10  mov0x10(%rsp),%r13
> 814b67a0: 4c 8b 74 24 18  mov0x18(%rsp),%r14
> 814b67a5: c9  leaveq
> 814b67a6: c3  retq
> 814b67a7: 49 8b 94 24 70 01 00mov0x170(%r12),%rdx
> 814b67ae: 00
> 814b67af: 48 63 c9movslq %ecx,%rcx
> 814b67b2: 41 b5 01mov$0x1,%r13b
> 814b67b5: 44 88 34 0a mov%r14b,(%rdx,%rcx,1)
> 814b67b9: 41 8b 94 24 78 01 00mov0x178(%r12),%edx
> 814b67c0: 00
> 814b67c1: 83 c2 01add$0x1,%edx
> 814b67c4: 81 e2 ff 0f 

Re: [PATCH 6/6] x86/vdso: Move out the CPU number store

2018-06-04 Thread Mika Penttilä
On 06/04/2018 10:24 PM, Chang S. Bae wrote:
> The CPU (and node) number will be written, as early enough,
> to the segment limit of per CPU data and TSC_AUX MSR entry.
> The information has been retrieved by vgetcpu in user space
> and will be also loaded from the paranoid entry, when
> FSGSBASE enabled. So, it is moved out from vDSO to the CPU
> initialization path where IST setup is serialized.
> 
> Now, redundant setting of the segment in entry/vdso/vma.c
> was removed; a substantial code removal. It removes a
> hotplug notifier, makes a facility useful to both the kernel
> and userspace unconditionally available much sooner, and
> unification with i386. (Thanks to HPA for suggesting the
> cleanup)
> 
> Signed-off-by: Chang S. Bae 
> Cc: H. Peter Anvin 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Andi Kleen 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> ---

> diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
> index ea554f8..e716e94 100644
> --- a/arch/x86/kernel/setup_percpu.c
> +++ b/arch/x86/kernel/setup_percpu.c
> @@ -155,12 +155,21 @@ static void __init pcpup_populate_pte(unsigned long 
> addr)
>  
>  static inline void setup_percpu_segment(int cpu)
>  {
> -#ifdef CONFIG_X86_32
> - struct desc_struct d = GDT_ENTRY_INIT(0x8092, per_cpu_offset(cpu),
> -   0xF);
> +#ifdef CONFIG_NUMA
> + unsigned long node = early_cpu_to_node(cpu);
> +#else
> + unsigned long node = 0;
> +#endif
> + struct desc_struct d = GDT_ENTRY_INIT(0x0, per_cpu_offset(cpu),
> +make_lsl_tscp(cpu, node));
> +
> + d.type = 5; /* R0 data, expand down, accessed */
> + d.dpl = 3;  /* Visible to user code */
> + d.s = 1;/* Not a system segment */
> + d.p = 1;/* Present */
> + d.d = 1;/* 32-bit */
>  
>   write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PERCPU, , DESCTYPE_S);
> -#endif
>  }


This won't work on X86-32 because it actually uses the segment limit with fs: 
access. So there 
is a reason why the lsl based method is X86-64 only.


-Mika

>  
>  void __init setup_per_cpu_areas(void)
> 



Re: [PATCH 6/6] x86/vdso: Move out the CPU number store

2018-06-04 Thread Mika Penttilä
On 06/04/2018 10:24 PM, Chang S. Bae wrote:
> The CPU (and node) number will be written, as early enough,
> to the segment limit of per CPU data and TSC_AUX MSR entry.
> The information has been retrieved by vgetcpu in user space
> and will be also loaded from the paranoid entry, when
> FSGSBASE enabled. So, it is moved out from vDSO to the CPU
> initialization path where IST setup is serialized.
> 
> Now, redundant setting of the segment in entry/vdso/vma.c
> was removed; a substantial code removal. It removes a
> hotplug notifier, makes a facility useful to both the kernel
> and userspace unconditionally available much sooner, and
> unification with i386. (Thanks to HPA for suggesting the
> cleanup)
> 
> Signed-off-by: Chang S. Bae 
> Cc: H. Peter Anvin 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Andi Kleen 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> ---

> diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
> index ea554f8..e716e94 100644
> --- a/arch/x86/kernel/setup_percpu.c
> +++ b/arch/x86/kernel/setup_percpu.c
> @@ -155,12 +155,21 @@ static void __init pcpup_populate_pte(unsigned long 
> addr)
>  
>  static inline void setup_percpu_segment(int cpu)
>  {
> -#ifdef CONFIG_X86_32
> - struct desc_struct d = GDT_ENTRY_INIT(0x8092, per_cpu_offset(cpu),
> -   0xF);
> +#ifdef CONFIG_NUMA
> + unsigned long node = early_cpu_to_node(cpu);
> +#else
> + unsigned long node = 0;
> +#endif
> + struct desc_struct d = GDT_ENTRY_INIT(0x0, per_cpu_offset(cpu),
> +make_lsl_tscp(cpu, node));
> +
> + d.type = 5; /* R0 data, expand down, accessed */
> + d.dpl = 3;  /* Visible to user code */
> + d.s = 1;/* Not a system segment */
> + d.p = 1;/* Present */
> + d.d = 1;/* 32-bit */
>  
>   write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PERCPU, , DESCTYPE_S);
> -#endif
>  }


This won't work on X86-32 because it actually uses the segment limit with fs: 
access. So there 
is a reason why the lsl based method is X86-64 only.


-Mika

>  
>  void __init setup_per_cpu_areas(void)
> 



Re: [PATCH 16/19] sched/numa: Detect if node actively handling migration

2018-06-04 Thread Srikar Dronamraju
* Rik van Riel  [2018-06-04 16:05:55]:

> On Mon, 2018-06-04 at 15:30 +0530, Srikar Dronamraju wrote:
> 
> > @@ -1554,6 +1562,9 @@ static void task_numa_compare(struct
> > task_numa_env *env,
> > if (READ_ONCE(dst_rq->numa_migrate_on))
> > return;
> >  
> > +   if (*move && READ_ONCE(pgdat->active_node_migrate))
> > +   *move = false;
> 
> Why not do this check in task_numa_find_cpu?
> 
> That way you won't have to pass this in as a
> pointer, and you also will not have to recalculate
> NODE_DATA(cpu_to_node(env->dst_cpu)) for every CPU.
> 

I thought about this. Lets say we evaluated that destination node can
allow movement. While we iterate through the list of cpus trying to find
the best cpu node, we find a idle cpu towards the end of the list.
However if another task as already raced with us to move a task to this
node, then we should bail out. Keeping the check in task_numa_compare
will allow us to do this.

> > /*
> > +* If the numa importance is less than SMALLIMP,
> 
>   ^^^ numa improvement
> 

okay

> > +* task migration might only result in ping pong
> > +* of tasks and also hurt performance due to cache
> > +* misses.
> > +*/
> > +   if (imp < SMALLIMP || imp <= env->best_imp + SMALLIMP / 2)
> > +   goto unlock;
> 
> I can see a use for the first test, but why limit the
> search for the best score once you are past the
> threshold?
> 
> I don't understand the use for that second test.
> 

Lets say few threads are racing with each other to find a cpu on the
node. The first thread has already found a task/cpu 'A' to swap and
finds another task/cpu 'B' thats slightly better than the current
best_cpu which is 'A'. Currently we allow the second task/cpu 'B' to be
set as best_cpu. However the second or subsequent threads cannot find
the first task/cpu A because its suppose to be in active migration. By
the time it reaches task/cpu B even that may look to be in active
migration. It may never know that task/cpu A was cleared. In this way,
the second and subsequent threads may not get a task/cpu in the
preferred node to swap just because the first task kept hopping task/cpu
as its choice of migration.

While we can't complete avoid this, the second check will try to make
sure we don't hop on/hop off just for small incremental numa
improvement.



> What workload benefits from it?

> 
> -- 
> All Rights Reversed.




Re: [PATCH 16/19] sched/numa: Detect if node actively handling migration

2018-06-04 Thread Srikar Dronamraju
* Rik van Riel  [2018-06-04 16:05:55]:

> On Mon, 2018-06-04 at 15:30 +0530, Srikar Dronamraju wrote:
> 
> > @@ -1554,6 +1562,9 @@ static void task_numa_compare(struct
> > task_numa_env *env,
> > if (READ_ONCE(dst_rq->numa_migrate_on))
> > return;
> >  
> > +   if (*move && READ_ONCE(pgdat->active_node_migrate))
> > +   *move = false;
> 
> Why not do this check in task_numa_find_cpu?
> 
> That way you won't have to pass this in as a
> pointer, and you also will not have to recalculate
> NODE_DATA(cpu_to_node(env->dst_cpu)) for every CPU.
> 

I thought about this. Lets say we evaluated that destination node can
allow movement. While we iterate through the list of cpus trying to find
the best cpu node, we find a idle cpu towards the end of the list.
However if another task as already raced with us to move a task to this
node, then we should bail out. Keeping the check in task_numa_compare
will allow us to do this.

> > /*
> > +* If the numa importance is less than SMALLIMP,
> 
>   ^^^ numa improvement
> 

okay

> > +* task migration might only result in ping pong
> > +* of tasks and also hurt performance due to cache
> > +* misses.
> > +*/
> > +   if (imp < SMALLIMP || imp <= env->best_imp + SMALLIMP / 2)
> > +   goto unlock;
> 
> I can see a use for the first test, but why limit the
> search for the best score once you are past the
> threshold?
> 
> I don't understand the use for that second test.
> 

Lets say few threads are racing with each other to find a cpu on the
node. The first thread has already found a task/cpu 'A' to swap and
finds another task/cpu 'B' thats slightly better than the current
best_cpu which is 'A'. Currently we allow the second task/cpu 'B' to be
set as best_cpu. However the second or subsequent threads cannot find
the first task/cpu A because its suppose to be in active migration. By
the time it reaches task/cpu B even that may look to be in active
migration. It may never know that task/cpu A was cleared. In this way,
the second and subsequent threads may not get a task/cpu in the
preferred node to swap just because the first task kept hopping task/cpu
as its choice of migration.

While we can't complete avoid this, the second check will try to make
sure we don't hop on/hop off just for small incremental numa
improvement.



> What workload benefits from it?

> 
> -- 
> All Rights Reversed.




Re: [PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-04 Thread Chris Chiu
On Tue, Jun 5, 2018 at 10:31 AM, Darren Hart  wrote:
> On Mon, Jun 04, 2018 at 04:23:04PM +0200, Hans de Goede wrote:
>> Hi,
>>
>> On 04-06-18 15:51, Daniel Drake wrote:
>> > On Mon, Jun 4, 2018 at 7:22 AM, Hans de Goede  wrote:
>> > > Is this really a case of the hardware itself processing the
>> > > keypress and then changing the brightness *itself* ?
>> > >
>> > >  From the "[PATCH 2/2] platform/x86: asus-wmi: Add keyboard backlight
>> > > toggle support" patch I get the impression that the driver is
>> > > modifying the brightness from within the kernel rather then the
>> > > keyboard controller are ACPI embeddec-controller doing it itself.
>> > >
>> > > If that is the case then the right fix is for the driver to stop
>> > > mucking with the brighness itself, it should simply report the
>> > > right keyboard events and export a led interface and then userspace
>> > > will do the right thing (and be able to offer flexible policies
>> > > to the user).
>> >
>> > Before this modification, the driver reports the brightness keypresses
>> > to userspace and then userspace can respond by changing the brightness
>> > level, as you describe.
>> >
>> > You are right in that the hardware doesn't change the brightness
>> > directly itself, which is the normal usage of LED_BRIGHT_HW_CHANGED.
>> >
>> > However this approach was suggested by Benjamin Berg and Bastien
>> > Nocera in the thread: Re: [PATCH v2] platform/x86: asus-wmi: Add
>> > keyboard backlight toggle support
>> > https://marc.info/?l=linux-kernel=152639169210655=2
>> >
>> > The issue is that we need to support a new "keyboard backlight
>> > brightness cycle" key (in the patch that follows this one) which
>> > doesn't fit into any definitions of keys recognised by the kernel and
>> > likewise there's no userspace code to handle it.
>> >
>> > If preferred we could leave the standard brightness keys behaving as
>> > they are (input events) and make the new special key type directly
>> > handled by the kernel?
>>
>> I'm sorry that Benjamin and Bastien steered you in this direction,
>> IMHO none of it should be handled in the kernel.
>>
>> Anytime any sort of input is directly responded to by the kernel
>> it is a huge PITA to deal with from userspace. The kernel will have
>> a simplistic implementation which almost always is wrong.
>>
>> Benjamin, remember the pain we went through with rfkill hotkey
>> presses being handled in the kernel ?
>>
>> And then there is the whole acpi_video.brightness_switch_enabled
>> debacle, which is an option which defaults to true which causes
>> the kernel to handle LCD brightness key presses, which all distros
>> have been patching to default to off for ages.
>>
>> To give a concrete example, we may want to implement software
>> dimming / auto-off of the kbd backlight when the no keys are
>> touched for x seconds. This would seriously get in the way of that.
>>
>> So sorry, but NACK to this series.
>
> So if instead of modifying the LED value, the kernel platform drivers
> converted the TOGGLE into a cycle even by converting to an UP event
> based on awareness of the plaform specific max value and the read
> current value, leaving userspace to act on the TOGGLE/UP events - would
> that be preferable?
>
> Something like:
>
> if (code == TOGGLE && ledval < ledmax)
> code = UP;
>
> sparse_keymap_report_event(..., code, ...)
>
> }
> --
> Darren Hart
> VMware Open Source Technology Center

That's what I was trying to do in  [PATCH v2] platform/x86: asus-wmi: Add
keyboard backlight toggle support. However, that brought another problem
discussed in the thread.
https://marc.info/?l=linux-kernel=152639169210655=2

So I moved the brightness change in the driver without passing to userspace.
Per Hans, seems there're some other concerns and I also wonder if the
TOGGLE event happens in ASUS HID (asus-hid.c) which also convert and
pass the keycode to userspace but no TOGGLE key support yet What should
we do then?


Re: [PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-04 Thread Chris Chiu
On Tue, Jun 5, 2018 at 10:31 AM, Darren Hart  wrote:
> On Mon, Jun 04, 2018 at 04:23:04PM +0200, Hans de Goede wrote:
>> Hi,
>>
>> On 04-06-18 15:51, Daniel Drake wrote:
>> > On Mon, Jun 4, 2018 at 7:22 AM, Hans de Goede  wrote:
>> > > Is this really a case of the hardware itself processing the
>> > > keypress and then changing the brightness *itself* ?
>> > >
>> > >  From the "[PATCH 2/2] platform/x86: asus-wmi: Add keyboard backlight
>> > > toggle support" patch I get the impression that the driver is
>> > > modifying the brightness from within the kernel rather then the
>> > > keyboard controller are ACPI embeddec-controller doing it itself.
>> > >
>> > > If that is the case then the right fix is for the driver to stop
>> > > mucking with the brighness itself, it should simply report the
>> > > right keyboard events and export a led interface and then userspace
>> > > will do the right thing (and be able to offer flexible policies
>> > > to the user).
>> >
>> > Before this modification, the driver reports the brightness keypresses
>> > to userspace and then userspace can respond by changing the brightness
>> > level, as you describe.
>> >
>> > You are right in that the hardware doesn't change the brightness
>> > directly itself, which is the normal usage of LED_BRIGHT_HW_CHANGED.
>> >
>> > However this approach was suggested by Benjamin Berg and Bastien
>> > Nocera in the thread: Re: [PATCH v2] platform/x86: asus-wmi: Add
>> > keyboard backlight toggle support
>> > https://marc.info/?l=linux-kernel=152639169210655=2
>> >
>> > The issue is that we need to support a new "keyboard backlight
>> > brightness cycle" key (in the patch that follows this one) which
>> > doesn't fit into any definitions of keys recognised by the kernel and
>> > likewise there's no userspace code to handle it.
>> >
>> > If preferred we could leave the standard brightness keys behaving as
>> > they are (input events) and make the new special key type directly
>> > handled by the kernel?
>>
>> I'm sorry that Benjamin and Bastien steered you in this direction,
>> IMHO none of it should be handled in the kernel.
>>
>> Anytime any sort of input is directly responded to by the kernel
>> it is a huge PITA to deal with from userspace. The kernel will have
>> a simplistic implementation which almost always is wrong.
>>
>> Benjamin, remember the pain we went through with rfkill hotkey
>> presses being handled in the kernel ?
>>
>> And then there is the whole acpi_video.brightness_switch_enabled
>> debacle, which is an option which defaults to true which causes
>> the kernel to handle LCD brightness key presses, which all distros
>> have been patching to default to off for ages.
>>
>> To give a concrete example, we may want to implement software
>> dimming / auto-off of the kbd backlight when the no keys are
>> touched for x seconds. This would seriously get in the way of that.
>>
>> So sorry, but NACK to this series.
>
> So if instead of modifying the LED value, the kernel platform drivers
> converted the TOGGLE into a cycle even by converting to an UP event
> based on awareness of the plaform specific max value and the read
> current value, leaving userspace to act on the TOGGLE/UP events - would
> that be preferable?
>
> Something like:
>
> if (code == TOGGLE && ledval < ledmax)
> code = UP;
>
> sparse_keymap_report_event(..., code, ...)
>
> }
> --
> Darren Hart
> VMware Open Source Technology Center

That's what I was trying to do in  [PATCH v2] platform/x86: asus-wmi: Add
keyboard backlight toggle support. However, that brought another problem
discussed in the thread.
https://marc.info/?l=linux-kernel=152639169210655=2

So I moved the brightness change in the driver without passing to userspace.
Per Hans, seems there're some other concerns and I also wonder if the
TOGGLE event happens in ASUS HID (asus-hid.c) which also convert and
pass the keycode to userspace but no TOGGLE key support yet What should
we do then?


linux-next: build warning after merge of the mfd tree

2018-06-04 Thread Stephen Rothwell
Hi Lee,

After merging the mfd tree, today's linux-next build (x86_64 allmodconfig)
produced this warning:

drivers/mfd/cros_ec_dev.c:265:13: warning: '__remove' defined but not used 
[-Wunused-function]
 static void __remove(struct device *dev) { }
 ^~~~

Introduced by commit

  3aa2177e4787 ("mfd: cros_ec: Use devm_kzalloc for private data")

-- 
Cheers,
Stephen Rothwell


pgpq1FM7QUqan.pgp
Description: OpenPGP digital signature


linux-next: build warning after merge of the mfd tree

2018-06-04 Thread Stephen Rothwell
Hi Lee,

After merging the mfd tree, today's linux-next build (x86_64 allmodconfig)
produced this warning:

drivers/mfd/cros_ec_dev.c:265:13: warning: '__remove' defined but not used 
[-Wunused-function]
 static void __remove(struct device *dev) { }
 ^~~~

Introduced by commit

  3aa2177e4787 ("mfd: cros_ec: Use devm_kzalloc for private data")

-- 
Cheers,
Stephen Rothwell


pgpq1FM7QUqan.pgp
Description: OpenPGP digital signature


rf69_set_deviation in rf69.c (pi433 driver)

2018-06-04 Thread Hugo Lefeuvre
Hi Marcus,

I have been taking a look at the pi433 driver these last days, and started
working on the remaining TODOs. I just stumbled across the following
one (drivers/staging/pi433/rf69.c):

245 // TODO: Dependency to bitrate
246 if (deviation < 600 || deviation > 50) {
247 dev_dbg(>dev, "set_deviation: illegal input param");
248 return -EINVAL;
249 }

According to the datasheet[0], the deviation should always be smaller
than 300kHz, and the following equation should be respected:

  (1) FDA + BRF/2 =< 500 kHz

Why did you choose 500kHz as max for FDA, instead of 300kHz ? It looks like
a bug to me.

Concerning the TODO, I can see two solutions currently:

1. Introduce a new rf69_get_bit_rate function which reads REG_BITRATE_MSB
   and REG_BITRATE_LSB and returns reconstructed BRF.
   We could use this function in rf69_set_deviation to retrieve the BRF.

+ clean
+ intuitive
- heavy / slow

2. Store BRF somewhere in rf69.c, initialize it with the default value
   (4.8 kb/s) and update it when rf69_set_bit_rate is called.

+ easy
- dirty, doesn't fit well with the design of rf69.c (do not store
  anything, simply expose API)

I'd really prefer going for the first one, but I wanted to have your opinion
on this.

Thanks for your work !

Best regards,
 Hugo

[0] http://www.hoperf.com/upload/rf/RFM69CW-V1.1.pdf

[CC-ing Valentin Vidic, he was quite active on the pi433 driver these
last months]

-- 
 Hugo Lefeuvre (hle)|www.owl.eu.com
4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA


signature.asc
Description: PGP signature


rf69_set_deviation in rf69.c (pi433 driver)

2018-06-04 Thread Hugo Lefeuvre
Hi Marcus,

I have been taking a look at the pi433 driver these last days, and started
working on the remaining TODOs. I just stumbled across the following
one (drivers/staging/pi433/rf69.c):

245 // TODO: Dependency to bitrate
246 if (deviation < 600 || deviation > 50) {
247 dev_dbg(>dev, "set_deviation: illegal input param");
248 return -EINVAL;
249 }

According to the datasheet[0], the deviation should always be smaller
than 300kHz, and the following equation should be respected:

  (1) FDA + BRF/2 =< 500 kHz

Why did you choose 500kHz as max for FDA, instead of 300kHz ? It looks like
a bug to me.

Concerning the TODO, I can see two solutions currently:

1. Introduce a new rf69_get_bit_rate function which reads REG_BITRATE_MSB
   and REG_BITRATE_LSB and returns reconstructed BRF.
   We could use this function in rf69_set_deviation to retrieve the BRF.

+ clean
+ intuitive
- heavy / slow

2. Store BRF somewhere in rf69.c, initialize it with the default value
   (4.8 kb/s) and update it when rf69_set_bit_rate is called.

+ easy
- dirty, doesn't fit well with the design of rf69.c (do not store
  anything, simply expose API)

I'd really prefer going for the first one, but I wanted to have your opinion
on this.

Thanks for your work !

Best regards,
 Hugo

[0] http://www.hoperf.com/upload/rf/RFM69CW-V1.1.pdf

[CC-ing Valentin Vidic, he was quite active on the pi433 driver these
last months]

-- 
 Hugo Lefeuvre (hle)|www.owl.eu.com
4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA


signature.asc
Description: PGP signature


Re: [PATCH] ARM: dts: armada388-helios4

2018-06-04 Thread Baruch Siach
Hi Dennis,

On Mon, Jun 04, 2018 at 08:13:43PM -0500, Dennis Gilmore wrote:
> The helios4 is a Armada388 based nas board designed by SolidRun and
> based on their SOM. It is sold by kobol.io the dts file came from
> https://raw.githubusercontent.com/armbian/build/master/patch/kernel/mvebu-default/95-helios4-device-tree.patch
> I added a SPDX license line to match the clearfog it says it was based
> on and a compatible line for "solidrun,helios4"
> 
> Signed-off-by: Dennis Gilmore 
> ---
>  arch/arm/boot/dts/Makefile   |   1 +
>  arch/arm/boot/dts/armada-388-helios4.dts | 315 +++
>  2 files changed, 316 insertions(+)
>  create mode 100644 arch/arm/boot/dts/armada-388-helios4.dts
> 
> diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
> index 7e2424957809..490bfd586198 100644
> --- a/arch/arm/boot/dts/Makefile
> +++ b/arch/arm/boot/dts/Makefile
> @@ -1123,6 +1123,7 @@ dtb-$(CONFIG_MACH_ARMADA_38X) += \
>   armada-388-clearfog-pro.dtb \
>   armada-388-db.dtb \
>   armada-388-gp.dtb \
> + armada-388-helios4.dtb \
>   armada-388-rd.dtb
>  dtb-$(CONFIG_MACH_ARMADA_39X) += \
>   armada-398-db.dtb
> diff --git a/arch/arm/boot/dts/armada-388-helios4.dts 
> b/arch/arm/boot/dts/armada-388-helios4.dts
> new file mode 100644
> index ..a6d6996e1378
> --- /dev/null
> +++ b/arch/arm/boot/dts/armada-388-helios4.dts
> @@ -0,0 +1,315 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +/*
> + * Device Tree file for Helios4
> + * based on SolidRun Clearfog revision A1 rev 2.0 (88F6828)
> + *
> + *  Copyright (C) 2017
> + *
> + */
> +
> +/dts-v1/;
> +#include "armada-388.dtsi"
> +#include "armada-38x-solidrun-microsom.dtsi"
> +
> +/ {
> + model = "Helios4";
> + compatible = "solidrun,helios4", "marvell,armada388",

SolidRun is not the producer of the Helios4 carrier board. So I think 
"kobol,helios4" makes more sense. This is just like the "auvidea,h100" 
compatible string for Auvidea H100 which is also using a SolidRun SOM.

baruch

> + "marvell,armada385", "marvell,armada380";
> +

-- 
 http://baruch.siach.name/blog/  ~. .~   Tk Open Systems
=}ooO--U--Ooo{=
   - bar...@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -


Re: [PATCH] ARM: dts: armada388-helios4

2018-06-04 Thread Baruch Siach
Hi Dennis,

On Mon, Jun 04, 2018 at 08:13:43PM -0500, Dennis Gilmore wrote:
> The helios4 is a Armada388 based nas board designed by SolidRun and
> based on their SOM. It is sold by kobol.io the dts file came from
> https://raw.githubusercontent.com/armbian/build/master/patch/kernel/mvebu-default/95-helios4-device-tree.patch
> I added a SPDX license line to match the clearfog it says it was based
> on and a compatible line for "solidrun,helios4"
> 
> Signed-off-by: Dennis Gilmore 
> ---
>  arch/arm/boot/dts/Makefile   |   1 +
>  arch/arm/boot/dts/armada-388-helios4.dts | 315 +++
>  2 files changed, 316 insertions(+)
>  create mode 100644 arch/arm/boot/dts/armada-388-helios4.dts
> 
> diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
> index 7e2424957809..490bfd586198 100644
> --- a/arch/arm/boot/dts/Makefile
> +++ b/arch/arm/boot/dts/Makefile
> @@ -1123,6 +1123,7 @@ dtb-$(CONFIG_MACH_ARMADA_38X) += \
>   armada-388-clearfog-pro.dtb \
>   armada-388-db.dtb \
>   armada-388-gp.dtb \
> + armada-388-helios4.dtb \
>   armada-388-rd.dtb
>  dtb-$(CONFIG_MACH_ARMADA_39X) += \
>   armada-398-db.dtb
> diff --git a/arch/arm/boot/dts/armada-388-helios4.dts 
> b/arch/arm/boot/dts/armada-388-helios4.dts
> new file mode 100644
> index ..a6d6996e1378
> --- /dev/null
> +++ b/arch/arm/boot/dts/armada-388-helios4.dts
> @@ -0,0 +1,315 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +/*
> + * Device Tree file for Helios4
> + * based on SolidRun Clearfog revision A1 rev 2.0 (88F6828)
> + *
> + *  Copyright (C) 2017
> + *
> + */
> +
> +/dts-v1/;
> +#include "armada-388.dtsi"
> +#include "armada-38x-solidrun-microsom.dtsi"
> +
> +/ {
> + model = "Helios4";
> + compatible = "solidrun,helios4", "marvell,armada388",

SolidRun is not the producer of the Helios4 carrier board. So I think 
"kobol,helios4" makes more sense. This is just like the "auvidea,h100" 
compatible string for Auvidea H100 which is also using a SolidRun SOM.

baruch

> + "marvell,armada385", "marvell,armada380";
> +

-- 
 http://baruch.siach.name/blog/  ~. .~   Tk Open Systems
=}ooO--U--Ooo{=
   - bar...@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -


Re: [PATCH] Use an IDR to allocate apparmor secids

2018-06-04 Thread John Johansen
On 06/04/2018 07:27 PM, Matthew Wilcox wrote:
> On Mon, Jun 04, 2018 at 06:27:09PM -0700, John Johansen wrote:
>> hey Mathew,
>>
>> I've pulled this into apparmor-next and done the retuning of
>> AA_SECID_INVALID a follow on patch. The reworking of the api to
>> return the specific error type can wait for another cycle.
> 
> Oh ... here's what I currently have.  I decided that AA_SECID_INVALID
> wasn't needed.
> 
well not needed in the allocation path, but definitely needed and it
needs to be 0.

This is for catching some uninitialized or freed and zeroed values.
The debug checks aren't in the current version, as they were
residing in another debug patch, but I will pull them out into their
own patch.


Re: [PATCH] Use an IDR to allocate apparmor secids

2018-06-04 Thread John Johansen
On 06/04/2018 07:27 PM, Matthew Wilcox wrote:
> On Mon, Jun 04, 2018 at 06:27:09PM -0700, John Johansen wrote:
>> hey Mathew,
>>
>> I've pulled this into apparmor-next and done the retuning of
>> AA_SECID_INVALID a follow on patch. The reworking of the api to
>> return the specific error type can wait for another cycle.
> 
> Oh ... here's what I currently have.  I decided that AA_SECID_INVALID
> wasn't needed.
> 
well not needed in the allocation path, but definitely needed and it
needs to be 0.

This is for catching some uninitialized or freed and zeroed values.
The debug checks aren't in the current version, as they were
residing in another debug patch, but I will pull them out into their
own patch.


Re: [PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-04 Thread Darren Hart
On Mon, Jun 04, 2018 at 04:23:04PM +0200, Hans de Goede wrote:
> Hi,
> 
> On 04-06-18 15:51, Daniel Drake wrote:
> > On Mon, Jun 4, 2018 at 7:22 AM, Hans de Goede  wrote:
> > > Is this really a case of the hardware itself processing the
> > > keypress and then changing the brightness *itself* ?
> > > 
> > >  From the "[PATCH 2/2] platform/x86: asus-wmi: Add keyboard backlight
> > > toggle support" patch I get the impression that the driver is
> > > modifying the brightness from within the kernel rather then the
> > > keyboard controller are ACPI embeddec-controller doing it itself.
> > > 
> > > If that is the case then the right fix is for the driver to stop
> > > mucking with the brighness itself, it should simply report the
> > > right keyboard events and export a led interface and then userspace
> > > will do the right thing (and be able to offer flexible policies
> > > to the user).
> > 
> > Before this modification, the driver reports the brightness keypresses
> > to userspace and then userspace can respond by changing the brightness
> > level, as you describe.
> > 
> > You are right in that the hardware doesn't change the brightness
> > directly itself, which is the normal usage of LED_BRIGHT_HW_CHANGED.
> > 
> > However this approach was suggested by Benjamin Berg and Bastien
> > Nocera in the thread: Re: [PATCH v2] platform/x86: asus-wmi: Add
> > keyboard backlight toggle support
> > https://marc.info/?l=linux-kernel=152639169210655=2
> > 
> > The issue is that we need to support a new "keyboard backlight
> > brightness cycle" key (in the patch that follows this one) which
> > doesn't fit into any definitions of keys recognised by the kernel and
> > likewise there's no userspace code to handle it.
> > 
> > If preferred we could leave the standard brightness keys behaving as
> > they are (input events) and make the new special key type directly
> > handled by the kernel?
> 
> I'm sorry that Benjamin and Bastien steered you in this direction,
> IMHO none of it should be handled in the kernel.
> 
> Anytime any sort of input is directly responded to by the kernel
> it is a huge PITA to deal with from userspace. The kernel will have
> a simplistic implementation which almost always is wrong.
> 
> Benjamin, remember the pain we went through with rfkill hotkey
> presses being handled in the kernel ?
> 
> And then there is the whole acpi_video.brightness_switch_enabled
> debacle, which is an option which defaults to true which causes
> the kernel to handle LCD brightness key presses, which all distros
> have been patching to default to off for ages.
> 
> To give a concrete example, we may want to implement software
> dimming / auto-off of the kbd backlight when the no keys are
> touched for x seconds. This would seriously get in the way of that.
> 
> So sorry, but NACK to this series.

So if instead of modifying the LED value, the kernel platform drivers
converted the TOGGLE into a cycle even by converting to an UP event
based on awareness of the plaform specific max value and the read
current value, leaving userspace to act on the TOGGLE/UP events - would
that be preferable?

Something like:

if (code == TOGGLE && ledval < ledmax)
code = UP;

sparse_keymap_report_event(..., code, ...)

}
-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-04 Thread Darren Hart
On Mon, Jun 04, 2018 at 04:23:04PM +0200, Hans de Goede wrote:
> Hi,
> 
> On 04-06-18 15:51, Daniel Drake wrote:
> > On Mon, Jun 4, 2018 at 7:22 AM, Hans de Goede  wrote:
> > > Is this really a case of the hardware itself processing the
> > > keypress and then changing the brightness *itself* ?
> > > 
> > >  From the "[PATCH 2/2] platform/x86: asus-wmi: Add keyboard backlight
> > > toggle support" patch I get the impression that the driver is
> > > modifying the brightness from within the kernel rather then the
> > > keyboard controller are ACPI embeddec-controller doing it itself.
> > > 
> > > If that is the case then the right fix is for the driver to stop
> > > mucking with the brighness itself, it should simply report the
> > > right keyboard events and export a led interface and then userspace
> > > will do the right thing (and be able to offer flexible policies
> > > to the user).
> > 
> > Before this modification, the driver reports the brightness keypresses
> > to userspace and then userspace can respond by changing the brightness
> > level, as you describe.
> > 
> > You are right in that the hardware doesn't change the brightness
> > directly itself, which is the normal usage of LED_BRIGHT_HW_CHANGED.
> > 
> > However this approach was suggested by Benjamin Berg and Bastien
> > Nocera in the thread: Re: [PATCH v2] platform/x86: asus-wmi: Add
> > keyboard backlight toggle support
> > https://marc.info/?l=linux-kernel=152639169210655=2
> > 
> > The issue is that we need to support a new "keyboard backlight
> > brightness cycle" key (in the patch that follows this one) which
> > doesn't fit into any definitions of keys recognised by the kernel and
> > likewise there's no userspace code to handle it.
> > 
> > If preferred we could leave the standard brightness keys behaving as
> > they are (input events) and make the new special key type directly
> > handled by the kernel?
> 
> I'm sorry that Benjamin and Bastien steered you in this direction,
> IMHO none of it should be handled in the kernel.
> 
> Anytime any sort of input is directly responded to by the kernel
> it is a huge PITA to deal with from userspace. The kernel will have
> a simplistic implementation which almost always is wrong.
> 
> Benjamin, remember the pain we went through with rfkill hotkey
> presses being handled in the kernel ?
> 
> And then there is the whole acpi_video.brightness_switch_enabled
> debacle, which is an option which defaults to true which causes
> the kernel to handle LCD brightness key presses, which all distros
> have been patching to default to off for ages.
> 
> To give a concrete example, we may want to implement software
> dimming / auto-off of the kbd backlight when the no keys are
> touched for x seconds. This would seriously get in the way of that.
> 
> So sorry, but NACK to this series.

So if instead of modifying the LED value, the kernel platform drivers
converted the TOGGLE into a cycle even by converting to an UP event
based on awareness of the plaform specific max value and the read
current value, leaving userspace to act on the TOGGLE/UP events - would
that be preferable?

Something like:

if (code == TOGGLE && ledval < ledmax)
code = UP;

sparse_keymap_report_event(..., code, ...)

}
-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH 2/2] sched/fair: util_est: add running_sum tracking

2018-06-04 Thread kbuild test robot
Hi Patrick,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20180604]
[cannot apply to v4.17]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Patrick-Bellasi/sched-fair-pelt-use-u32-for-util_avg/20180605-082640
config: x86_64-randconfig-x011-201822 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   kernel/sched/fair.c: In function 'enqueue_task_fair':
>> kernel/sched/fair.c:5450:2: error: implicit declaration of function 
>> 'util_est_enqueue_running'; did you mean 'util_est_enqueue'? 
>> [-Werror=implicit-function-declaration]
 util_est_enqueue_running(p);
 ^~~~
 util_est_enqueue
   Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size
   Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_write
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:constant_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:variable_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u32
   Cyclomatic Complexity 1 arch/x86/include/asm/current.h:get_current
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_add
   Cyclomatic Complexity 1 
include/asm-generic/atomic-instrumented.h:atomic64_add
   Cyclomatic Complexity 2 arch/x86/include/asm/jump_label.h:arch_static_branch
   Cyclomatic Complexity 1 include/linux/jump_label.h:static_key_false
   Cyclomatic Complexity 1 include/linux/math64.h:mul_u64_u32_shr
   Cyclomatic Complexity 2 include/linux/thread_info.h:test_ti_thread_flag
   Cyclomatic Complexity 5 arch/x86/include/asm/preempt.h:__preempt_count_add
   Cyclomatic Complexity 1 
arch/x86/include/asm/preempt.h:__preempt_count_dec_and_test
   Cyclomatic Complexity 1 include/linux/lockdep.h:lock_is_held
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_lock_acquire
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_lock_release
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_read_lock
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_read_unlock
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_read_lock_sched_notrace
   Cyclomatic Complexity 2 
include/linux/rcupdate.h:rcu_read_unlock_sched_notrace
   Cyclomatic Complexity 1 include/linux/rbtree.h:rb_link_node
   Cyclomatic Complexity 1 include/linux/sched.h:task_thread_info
   Cyclomatic Complexity 1 include/linux/sched.h:test_tsk_thread_flag
   Cyclomatic Complexity 1 include/linux/sched.h:test_tsk_need_resched
   Cyclomatic Complexity 1 include/linux/sched.h:task_cpu
   Cyclomatic Complexity 1 include/linux/sched/cputime.h:get_running_cputimer
   Cyclomatic Complexity 2 
include/linux/sched/cputime.h:account_group_exec_runtime
   Cyclomatic Complexity 1 include/linux/cgroup.h:task_css_set
   Cyclomatic Complexity 1 include/linux/cgroup.h:task_dfl_cgroup
   Cyclomatic Complexity 3 include/linux/cgroup.h:cgroup_parent
   Cyclomatic Complexity 1 include/linux/cgroup.h:cpuacct_charge
   Cyclomatic Complexity 2 include/linux/cgroup.h:cgroup_account_cputime
   Cyclomatic Complexity 1 kernel/sched/sched.h:cpu_of
   Cyclomatic Complexity 1 kernel/sched/sched.h:assert_clock_updated
   Cyclomatic Complexity 4 kernel/sched/sched.h:rq_clock
   Cyclomatic Complexity 4 kernel/sched/sched.h:rq_clock_task
   Cyclomatic Complexity 4 kernel/sched/sched.h:rq_clock_skip_update
   Cyclomatic Complexity 1 kernel/sched/sched.h:rq_pin_lock
   Cyclomatic Complexity 1 kernel/sched/sched.h:rq_unpin_lock
   Cyclomatic Complexity 1 kernel/sched/sched.h:task_on_rq_queued
   Cyclomatic Complexity 1 kernel/sched/sched.h:put_prev_task
   Cyclomatic Complexity 1 kernel/sched/sched.h:sched_update_tick_dependency
   Cyclomatic Complexity 2 kernel/sched/sched.h:add_nr_running
   Cyclomatic Complexity 1 kernel/sched/sched.h:sub_nr_running
   Cyclomatic Complexity 1 kernel/sched/sched.h:hrtick_enabled
   Cyclomatic Complexity 1 kernel/sched/sched.h:rq_lock
   Cyclomatic Complexity 1 kernel/sched/sched.h:rq_unlock
   Cyclomatic Complexity 2 kernel/sched/sched.h:cpufreq_update_util
   Cyclomatic Complexity 4 include/trace/events/sched.h:trace_sched_stat_runtime
   Cyclomatic Complexity 1 kernel/sched/fair.c:update_load_add
   Cyclomatic Complexity 1 kernel/sched/fair.c:update_load_sub
   Cyclomatic Complexity 1 kernel/sched/fair.c:update_load_set
   Cyclomatic Complexity 2 kernel/sched/fair.c:task_of
   Cyclomatic Complexity 2 kernel/sched/fair.c:rq_of
   Cyclomatic Complexity 1 kernel/sched/fair.c:task_cfs_rq
   Cyclomatic Complexity 1 kernel/sched/fair.c:cfs_rq_of
   Cyclomatic Complexity 1 kernel/sched/fair.c:group_cfs_rq
   Cyclomatic C

Re: [PATCH 2/2] sched/fair: util_est: add running_sum tracking

2018-06-04 Thread kbuild test robot
Hi Patrick,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20180604]
[cannot apply to v4.17]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Patrick-Bellasi/sched-fair-pelt-use-u32-for-util_avg/20180605-082640
config: x86_64-randconfig-x011-201822 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   kernel/sched/fair.c: In function 'enqueue_task_fair':
>> kernel/sched/fair.c:5450:2: error: implicit declaration of function 
>> 'util_est_enqueue_running'; did you mean 'util_est_enqueue'? 
>> [-Werror=implicit-function-declaration]
 util_est_enqueue_running(p);
 ^~~~
 util_est_enqueue
   Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size
   Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_write
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:constant_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:variable_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u32
   Cyclomatic Complexity 1 arch/x86/include/asm/current.h:get_current
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_add
   Cyclomatic Complexity 1 
include/asm-generic/atomic-instrumented.h:atomic64_add
   Cyclomatic Complexity 2 arch/x86/include/asm/jump_label.h:arch_static_branch
   Cyclomatic Complexity 1 include/linux/jump_label.h:static_key_false
   Cyclomatic Complexity 1 include/linux/math64.h:mul_u64_u32_shr
   Cyclomatic Complexity 2 include/linux/thread_info.h:test_ti_thread_flag
   Cyclomatic Complexity 5 arch/x86/include/asm/preempt.h:__preempt_count_add
   Cyclomatic Complexity 1 
arch/x86/include/asm/preempt.h:__preempt_count_dec_and_test
   Cyclomatic Complexity 1 include/linux/lockdep.h:lock_is_held
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_lock_acquire
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_lock_release
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_read_lock
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_read_unlock
   Cyclomatic Complexity 1 include/linux/rcupdate.h:rcu_read_lock_sched_notrace
   Cyclomatic Complexity 2 
include/linux/rcupdate.h:rcu_read_unlock_sched_notrace
   Cyclomatic Complexity 1 include/linux/rbtree.h:rb_link_node
   Cyclomatic Complexity 1 include/linux/sched.h:task_thread_info
   Cyclomatic Complexity 1 include/linux/sched.h:test_tsk_thread_flag
   Cyclomatic Complexity 1 include/linux/sched.h:test_tsk_need_resched
   Cyclomatic Complexity 1 include/linux/sched.h:task_cpu
   Cyclomatic Complexity 1 include/linux/sched/cputime.h:get_running_cputimer
   Cyclomatic Complexity 2 
include/linux/sched/cputime.h:account_group_exec_runtime
   Cyclomatic Complexity 1 include/linux/cgroup.h:task_css_set
   Cyclomatic Complexity 1 include/linux/cgroup.h:task_dfl_cgroup
   Cyclomatic Complexity 3 include/linux/cgroup.h:cgroup_parent
   Cyclomatic Complexity 1 include/linux/cgroup.h:cpuacct_charge
   Cyclomatic Complexity 2 include/linux/cgroup.h:cgroup_account_cputime
   Cyclomatic Complexity 1 kernel/sched/sched.h:cpu_of
   Cyclomatic Complexity 1 kernel/sched/sched.h:assert_clock_updated
   Cyclomatic Complexity 4 kernel/sched/sched.h:rq_clock
   Cyclomatic Complexity 4 kernel/sched/sched.h:rq_clock_task
   Cyclomatic Complexity 4 kernel/sched/sched.h:rq_clock_skip_update
   Cyclomatic Complexity 1 kernel/sched/sched.h:rq_pin_lock
   Cyclomatic Complexity 1 kernel/sched/sched.h:rq_unpin_lock
   Cyclomatic Complexity 1 kernel/sched/sched.h:task_on_rq_queued
   Cyclomatic Complexity 1 kernel/sched/sched.h:put_prev_task
   Cyclomatic Complexity 1 kernel/sched/sched.h:sched_update_tick_dependency
   Cyclomatic Complexity 2 kernel/sched/sched.h:add_nr_running
   Cyclomatic Complexity 1 kernel/sched/sched.h:sub_nr_running
   Cyclomatic Complexity 1 kernel/sched/sched.h:hrtick_enabled
   Cyclomatic Complexity 1 kernel/sched/sched.h:rq_lock
   Cyclomatic Complexity 1 kernel/sched/sched.h:rq_unlock
   Cyclomatic Complexity 2 kernel/sched/sched.h:cpufreq_update_util
   Cyclomatic Complexity 4 include/trace/events/sched.h:trace_sched_stat_runtime
   Cyclomatic Complexity 1 kernel/sched/fair.c:update_load_add
   Cyclomatic Complexity 1 kernel/sched/fair.c:update_load_sub
   Cyclomatic Complexity 1 kernel/sched/fair.c:update_load_set
   Cyclomatic Complexity 2 kernel/sched/fair.c:task_of
   Cyclomatic Complexity 2 kernel/sched/fair.c:rq_of
   Cyclomatic Complexity 1 kernel/sched/fair.c:task_cfs_rq
   Cyclomatic Complexity 1 kernel/sched/fair.c:cfs_rq_of
   Cyclomatic Complexity 1 kernel/sched/fair.c:group_cfs_rq
   Cyclomatic C

Re: [PATCH] Use an IDR to allocate apparmor secids

2018-06-04 Thread Matthew Wilcox
On Mon, Jun 04, 2018 at 06:27:09PM -0700, John Johansen wrote:
> hey Mathew,
> 
> I've pulled this into apparmor-next and done the retuning of
> AA_SECID_INVALID a follow on patch. The reworking of the api to
> return the specific error type can wait for another cycle.

Oh ... here's what I currently have.  I decided that AA_SECID_INVALID
wasn't needed.

>From 4e43853a7d984d7b16fd68f356aef1b686a05298 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox 
Date: Tue, 22 May 2018 05:37:19 -0400
Subject: [PATCH] Use an IDR to allocate apparmor secids

Replace the custom usage of the radix tree to store a list of free IDs
with the IDR.  Change the aa_alloc_secid() API to store the secid in the
label while holding the spinlock to ensure that a simultaneous lookup
cannot find an uninitialised secid.

Signed-off-by: Matthew Wilcox 
---
 security/apparmor/include/secid.h |   5 +-
 security/apparmor/label.c |   3 +-
 security/apparmor/secid.c | 135 ++
 3 files changed, 28 insertions(+), 115 deletions(-)

diff --git a/security/apparmor/include/secid.h 
b/security/apparmor/include/secid.h
index 686de8e50a79..a8ba767b2d2c 100644
--- a/security/apparmor/include/secid.h
+++ b/security/apparmor/include/secid.h
@@ -19,16 +19,13 @@
 
 struct aa_label;
 
-/* secid value that will not be allocated */
-#define AA_SECID_INVALID 0
-
 struct aa_label *aa_secid_to_label(u32 secid);
 int apparmor_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
 int apparmor_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
 void apparmor_release_secctx(char *secdata, u32 seclen);
 
 
-u32 aa_alloc_secid(struct aa_label *label, gfp_t gfp);
+int aa_alloc_secid(struct aa_label *label, gfp_t gfp);
 void aa_free_secid(u32 secid);
 void aa_secid_update(u32 secid, struct aa_label *label);
 
diff --git a/security/apparmor/label.c b/security/apparmor/label.c
index a17574df611b..ba11bdf9043a 100644
--- a/security/apparmor/label.c
+++ b/security/apparmor/label.c
@@ -407,8 +407,7 @@ bool aa_label_init(struct aa_label *label, int size, gfp_t 
gfp)
AA_BUG(!label);
AA_BUG(size < 1);
 
-   label->secid = aa_alloc_secid(label, gfp);
-   if (label->secid == AA_SECID_INVALID)
+   if (aa_alloc_secid(label, gfp) < 0)
return false;
 
label->size = size; /* doesn't include null */
diff --git a/security/apparmor/secid.c b/security/apparmor/secid.c
index c2f0c1571156..bc9f8e101b65 100644
--- a/security/apparmor/secid.c
+++ b/security/apparmor/secid.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -30,18 +31,10 @@
 /*
  * secids - do not pin labels with a refcount. They rely on the label
  * properly updating/freeing them
- *
- * A singly linked free list is used to track secids that have been
- * freed and reuse them before allocating new ones
  */
 
-#define FREE_LIST_HEAD 1
-
-static RADIX_TREE(aa_secids_map, GFP_ATOMIC);
+static DEFINE_IDR(aa_secids);
 static DEFINE_SPINLOCK(secid_lock);
-static u32 alloced_secid = FREE_LIST_HEAD;
-static u32 free_list = FREE_LIST_HEAD;
-static unsigned long free_count;
 
 /*
  * TODO: allow policy to reserve a secid range?
@@ -49,65 +42,6 @@ static unsigned long free_count;
  * TODO: use secid_update in label replace
  */
 
-#define SECID_MAX U32_MAX
-
-/* TODO: mark free list as exceptional */
-static void *to_ptr(u32 secid)
-{
-   return (void *)
-   unsigned long) secid) << RADIX_TREE_EXCEPTIONAL_SHIFT));
-}
-
-static u32 to_secid(void *ptr)
-{
-   return (u32) (((unsigned long) ptr) >> RADIX_TREE_EXCEPTIONAL_SHIFT);
-}
-
-
-/* TODO: tag free_list entries to mark them as different */
-static u32 __pop(struct aa_label *label)
-{
-   u32 secid = free_list;
-   void __rcu **slot;
-   void *entry;
-
-   if (free_list == FREE_LIST_HEAD)
-   return AA_SECID_INVALID;
-
-   slot = radix_tree_lookup_slot(_secids_map, secid);
-   AA_BUG(!slot);
-   entry = radix_tree_deref_slot_protected(slot, _lock);
-   free_list = to_secid(entry);
-   radix_tree_replace_slot(_secids_map, slot, label);
-   free_count--;
-
-   return secid;
-}
-
-static void __push(u32 secid)
-{
-   void __rcu **slot;
-
-   slot = radix_tree_lookup_slot(_secids_map, secid);
-   AA_BUG(!slot);
-   radix_tree_replace_slot(_secids_map, slot, to_ptr(free_list));
-   free_list = secid;
-   free_count++;
-}
-
-static struct aa_label * __secid_update(u32 secid, struct aa_label *label)
-{
-   struct aa_label *old;
-   void __rcu **slot;
-
-   slot = radix_tree_lookup_slot(_secids_map, secid);
-   AA_BUG(!slot);
-   old = radix_tree_deref_slot_protected(slot, _lock);
-   radix_tree_replace_slot(_secids_map, slot, label);
-
-   return old;
-}
-
 /**
  * aa_secid_update - update a secid mapping to a new label
  * @secid: secid to update
@@ -115,11 +49,10 @@ static struct aa_label * 

Re: [PATCH] Use an IDR to allocate apparmor secids

2018-06-04 Thread Matthew Wilcox
On Mon, Jun 04, 2018 at 06:27:09PM -0700, John Johansen wrote:
> hey Mathew,
> 
> I've pulled this into apparmor-next and done the retuning of
> AA_SECID_INVALID a follow on patch. The reworking of the api to
> return the specific error type can wait for another cycle.

Oh ... here's what I currently have.  I decided that AA_SECID_INVALID
wasn't needed.

>From 4e43853a7d984d7b16fd68f356aef1b686a05298 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox 
Date: Tue, 22 May 2018 05:37:19 -0400
Subject: [PATCH] Use an IDR to allocate apparmor secids

Replace the custom usage of the radix tree to store a list of free IDs
with the IDR.  Change the aa_alloc_secid() API to store the secid in the
label while holding the spinlock to ensure that a simultaneous lookup
cannot find an uninitialised secid.

Signed-off-by: Matthew Wilcox 
---
 security/apparmor/include/secid.h |   5 +-
 security/apparmor/label.c |   3 +-
 security/apparmor/secid.c | 135 ++
 3 files changed, 28 insertions(+), 115 deletions(-)

diff --git a/security/apparmor/include/secid.h 
b/security/apparmor/include/secid.h
index 686de8e50a79..a8ba767b2d2c 100644
--- a/security/apparmor/include/secid.h
+++ b/security/apparmor/include/secid.h
@@ -19,16 +19,13 @@
 
 struct aa_label;
 
-/* secid value that will not be allocated */
-#define AA_SECID_INVALID 0
-
 struct aa_label *aa_secid_to_label(u32 secid);
 int apparmor_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
 int apparmor_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
 void apparmor_release_secctx(char *secdata, u32 seclen);
 
 
-u32 aa_alloc_secid(struct aa_label *label, gfp_t gfp);
+int aa_alloc_secid(struct aa_label *label, gfp_t gfp);
 void aa_free_secid(u32 secid);
 void aa_secid_update(u32 secid, struct aa_label *label);
 
diff --git a/security/apparmor/label.c b/security/apparmor/label.c
index a17574df611b..ba11bdf9043a 100644
--- a/security/apparmor/label.c
+++ b/security/apparmor/label.c
@@ -407,8 +407,7 @@ bool aa_label_init(struct aa_label *label, int size, gfp_t 
gfp)
AA_BUG(!label);
AA_BUG(size < 1);
 
-   label->secid = aa_alloc_secid(label, gfp);
-   if (label->secid == AA_SECID_INVALID)
+   if (aa_alloc_secid(label, gfp) < 0)
return false;
 
label->size = size; /* doesn't include null */
diff --git a/security/apparmor/secid.c b/security/apparmor/secid.c
index c2f0c1571156..bc9f8e101b65 100644
--- a/security/apparmor/secid.c
+++ b/security/apparmor/secid.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -30,18 +31,10 @@
 /*
  * secids - do not pin labels with a refcount. They rely on the label
  * properly updating/freeing them
- *
- * A singly linked free list is used to track secids that have been
- * freed and reuse them before allocating new ones
  */
 
-#define FREE_LIST_HEAD 1
-
-static RADIX_TREE(aa_secids_map, GFP_ATOMIC);
+static DEFINE_IDR(aa_secids);
 static DEFINE_SPINLOCK(secid_lock);
-static u32 alloced_secid = FREE_LIST_HEAD;
-static u32 free_list = FREE_LIST_HEAD;
-static unsigned long free_count;
 
 /*
  * TODO: allow policy to reserve a secid range?
@@ -49,65 +42,6 @@ static unsigned long free_count;
  * TODO: use secid_update in label replace
  */
 
-#define SECID_MAX U32_MAX
-
-/* TODO: mark free list as exceptional */
-static void *to_ptr(u32 secid)
-{
-   return (void *)
-   unsigned long) secid) << RADIX_TREE_EXCEPTIONAL_SHIFT));
-}
-
-static u32 to_secid(void *ptr)
-{
-   return (u32) (((unsigned long) ptr) >> RADIX_TREE_EXCEPTIONAL_SHIFT);
-}
-
-
-/* TODO: tag free_list entries to mark them as different */
-static u32 __pop(struct aa_label *label)
-{
-   u32 secid = free_list;
-   void __rcu **slot;
-   void *entry;
-
-   if (free_list == FREE_LIST_HEAD)
-   return AA_SECID_INVALID;
-
-   slot = radix_tree_lookup_slot(_secids_map, secid);
-   AA_BUG(!slot);
-   entry = radix_tree_deref_slot_protected(slot, _lock);
-   free_list = to_secid(entry);
-   radix_tree_replace_slot(_secids_map, slot, label);
-   free_count--;
-
-   return secid;
-}
-
-static void __push(u32 secid)
-{
-   void __rcu **slot;
-
-   slot = radix_tree_lookup_slot(_secids_map, secid);
-   AA_BUG(!slot);
-   radix_tree_replace_slot(_secids_map, slot, to_ptr(free_list));
-   free_list = secid;
-   free_count++;
-}
-
-static struct aa_label * __secid_update(u32 secid, struct aa_label *label)
-{
-   struct aa_label *old;
-   void __rcu **slot;
-
-   slot = radix_tree_lookup_slot(_secids_map, secid);
-   AA_BUG(!slot);
-   old = radix_tree_deref_slot_protected(slot, _lock);
-   radix_tree_replace_slot(_secids_map, slot, label);
-
-   return old;
-}
-
 /**
  * aa_secid_update - update a secid mapping to a new label
  * @secid: secid to update
@@ -115,11 +49,10 @@ static struct aa_label * 

RE: [PATCH] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks

2018-06-04 Thread Hoeun Ryu
I misunderstood the cause of a deadlock.
I sent v2 fixing the commit message about the reason of the deadlock.
Please ignore this and review v2.
Thank you.

> -Original Message-
> From: Steven Rostedt [mailto:rost...@goodmis.org]
> Sent: Tuesday, June 05, 2018 10:44 AM
> To: Hoeun Ryu 
> Cc: Andrew Morton ; Kees Cook
> ; Borislav Petkov ; Andi Kleen
> ; Josh Poimboeuf ; Hoeun Ryu
> ; linux-kernel@vger.kernel.org; Tejun Heo
> ; Vitaly Kuznetsov 
> Subject: Re: [PATCH] panic: move bust_spinlocks(0) after
> console_flush_on_panic() to avoid deadlocks
> 
> On Mon,  4 Jun 2018 14:45:57 +0900
> Hoeun Ryu  wrote:
> 
> > From: Hoeun Ryu 
> >
> >  Many console device drivers hold the uart_port->lock spinlock with irq
> enabled
> > (using spin_lock()) while the device drivers are writing characters to
> their devices,
> > but the device drivers just try to hold the spin lock (using
> spin_trylock()) if
> > "oops_in_progress" is equal or greater than 1 to avoid deadlocks.
> >
> >  There is a case ocurring a deadlock related to the lock and
> oops_in_progress. A CPU
> > could be stopped by smp_send_stop() while it was holding the port lock
> because irq was
> > enabled. Once a CPU stops, it doesn't respond interrupts anymore and the
> lock stays
> > locked forever.
> >
> >  console_flush_on_panic() is called during panic() and it eventually
> holds the uart
> > lock but the lock is held by another stopped CPU and it is a deadlock.
> By moving
> > bust_spinlocks(0) after console_flush_on_panic(), let the console device
> drivers
> > think the Oops is still in progress to call spin_trylock() instead of
> spin_lock() to
> > avoid the deadlock.
> >
> > Signed-off-by: Hoeun Ryu 
> > ---
> >  kernel/panic.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 42e4874..b4063b6 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
> > if (_crash_kexec_post_notifiers)
> > __crash_kexec(NULL);
> >
> > -   bust_spinlocks(0);
> > -
> > /*
> >  * We may have ended up stopping the CPU holding the lock (in
> >  * smp_send_stop()) while still having some valuable data in the
> console
> > @@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
> > debug_locks_off();
> > console_flush_on_panic();
> >
> > +   bust_spinlocks(0);
> 
> Added a few more to Cc. This looks like it could have subtle
> side-effects. I'd like those that have been touching the code around
> here to have a look.
> 
> -- Steve
> 
> 
> > +
> > if (!panic_blink)
> > panic_blink = no_blink;
> >



RE: [PATCH] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks

2018-06-04 Thread Hoeun Ryu
I misunderstood the cause of a deadlock.
I sent v2 fixing the commit message about the reason of the deadlock.
Please ignore this and review v2.
Thank you.

> -Original Message-
> From: Steven Rostedt [mailto:rost...@goodmis.org]
> Sent: Tuesday, June 05, 2018 10:44 AM
> To: Hoeun Ryu 
> Cc: Andrew Morton ; Kees Cook
> ; Borislav Petkov ; Andi Kleen
> ; Josh Poimboeuf ; Hoeun Ryu
> ; linux-kernel@vger.kernel.org; Tejun Heo
> ; Vitaly Kuznetsov 
> Subject: Re: [PATCH] panic: move bust_spinlocks(0) after
> console_flush_on_panic() to avoid deadlocks
> 
> On Mon,  4 Jun 2018 14:45:57 +0900
> Hoeun Ryu  wrote:
> 
> > From: Hoeun Ryu 
> >
> >  Many console device drivers hold the uart_port->lock spinlock with irq
> enabled
> > (using spin_lock()) while the device drivers are writing characters to
> their devices,
> > but the device drivers just try to hold the spin lock (using
> spin_trylock()) if
> > "oops_in_progress" is equal or greater than 1 to avoid deadlocks.
> >
> >  There is a case ocurring a deadlock related to the lock and
> oops_in_progress. A CPU
> > could be stopped by smp_send_stop() while it was holding the port lock
> because irq was
> > enabled. Once a CPU stops, it doesn't respond interrupts anymore and the
> lock stays
> > locked forever.
> >
> >  console_flush_on_panic() is called during panic() and it eventually
> holds the uart
> > lock but the lock is held by another stopped CPU and it is a deadlock.
> By moving
> > bust_spinlocks(0) after console_flush_on_panic(), let the console device
> drivers
> > think the Oops is still in progress to call spin_trylock() instead of
> spin_lock() to
> > avoid the deadlock.
> >
> > Signed-off-by: Hoeun Ryu 
> > ---
> >  kernel/panic.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 42e4874..b4063b6 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
> > if (_crash_kexec_post_notifiers)
> > __crash_kexec(NULL);
> >
> > -   bust_spinlocks(0);
> > -
> > /*
> >  * We may have ended up stopping the CPU holding the lock (in
> >  * smp_send_stop()) while still having some valuable data in the
> console
> > @@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
> > debug_locks_off();
> > console_flush_on_panic();
> >
> > +   bust_spinlocks(0);
> 
> Added a few more to Cc. This looks like it could have subtle
> side-effects. I'd like those that have been touching the code around
> here to have a look.
> 
> -- Steve
> 
> 
> > +
> > if (!panic_blink)
> > panic_blink = no_blink;
> >



[PATCH v2] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks

2018-06-04 Thread Hoeun Ryu
From: Hoeun Ryu 

 Many console device drivers hold the uart_port->lock spinlock with irq disabled
(using spin_lock_irqsave()) while the device drivers are writing characters to 
their
devices, but the device drivers just try to hold the spin lock (using
spin_trylock_irqsave()) instead if "oops_in_progress" is equal or greater than 
1 to
avoid deadlocks.

 There is a case ocurring a deadlock related to the lock and oops_in_progress. 
If the
kernel lockup detector calls panic() while the device driver is holding the 
lock,
it can cause a deadlock because panic() eventually calls console_unlock() and 
tries
to hold the lock. Here is an example.

 CPU0

 local_irq_save()
 .
 foo()
 bar()
 .  // foo() + bar() takes long time
 printk()
   console_unlock()
 call_console_drivers() // close to watchdog threshold
   some_slow_console_device_write() // device driver code
 spin_lock_irqsave(uart->lock)  // acquire uart spin lock
   slow-write()
 watchdog_overflow_callback()   // watchdog expired and call 
panic()
   panic()
 bust_spinlocks(0)  // now, oops_in_progress = 0
   console_flush_on_panic()
 console_unlock()
   call_console_drivers()
 some_slow_console_device_write()
   spin_lock_irqsave(uart->lock)
    deadlock// we can use 
spin_trylock_irqsave()

 console_flush_on_panic() is called in panic() and it eventually holds the uart
lock but the lock is held by the preempted CPU (the same CPU in NMI context) 
and it is
a deadlock.
 By moving bust_spinlocks(0) after console_flush_on_panic(), let the console 
device
drivers think the Oops is still in progress to call spin_trylock_irqsave() 
instead of
spin_lock_irqsave() to avoid the deadlock.

 CPU0

 watchdog_overflow_callback()   // watchdog expired and call 
panic()
   panic()
 console_flush_on_panic()
   console_unlock()
 call_console_drivers()
   some_slow_console_device_write()
 spin_trylock_irqsave(uart->lock)   // oops_in_progress = 1
  use trylock, no deadlock
 bust_spinlocks(0)  // now, oops_in_progress = 0

Signed-off-by: Hoeun Ryu 
---
 v2: fix commit message on the reason of a deadlock, no code change.

 kernel/panic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..b4063b6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
if (_crash_kexec_post_notifiers)
__crash_kexec(NULL);
 
-   bust_spinlocks(0);
-
/*
 * We may have ended up stopping the CPU holding the lock (in
 * smp_send_stop()) while still having some valuable data in the console
@@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
debug_locks_off();
console_flush_on_panic();
 
+   bust_spinlocks(0);
+
if (!panic_blink)
panic_blink = no_blink;
 
-- 
2.1.4



[PATCH v2] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks

2018-06-04 Thread Hoeun Ryu
From: Hoeun Ryu 

 Many console device drivers hold the uart_port->lock spinlock with irq disabled
(using spin_lock_irqsave()) while the device drivers are writing characters to 
their
devices, but the device drivers just try to hold the spin lock (using
spin_trylock_irqsave()) instead if "oops_in_progress" is equal or greater than 
1 to
avoid deadlocks.

 There is a case ocurring a deadlock related to the lock and oops_in_progress. 
If the
kernel lockup detector calls panic() while the device driver is holding the 
lock,
it can cause a deadlock because panic() eventually calls console_unlock() and 
tries
to hold the lock. Here is an example.

 CPU0

 local_irq_save()
 .
 foo()
 bar()
 .  // foo() + bar() takes long time
 printk()
   console_unlock()
 call_console_drivers() // close to watchdog threshold
   some_slow_console_device_write() // device driver code
 spin_lock_irqsave(uart->lock)  // acquire uart spin lock
   slow-write()
 watchdog_overflow_callback()   // watchdog expired and call 
panic()
   panic()
 bust_spinlocks(0)  // now, oops_in_progress = 0
   console_flush_on_panic()
 console_unlock()
   call_console_drivers()
 some_slow_console_device_write()
   spin_lock_irqsave(uart->lock)
    deadlock// we can use 
spin_trylock_irqsave()

 console_flush_on_panic() is called in panic() and it eventually holds the uart
lock but the lock is held by the preempted CPU (the same CPU in NMI context) 
and it is
a deadlock.
 By moving bust_spinlocks(0) after console_flush_on_panic(), let the console 
device
drivers think the Oops is still in progress to call spin_trylock_irqsave() 
instead of
spin_lock_irqsave() to avoid the deadlock.

 CPU0

 watchdog_overflow_callback()   // watchdog expired and call 
panic()
   panic()
 console_flush_on_panic()
   console_unlock()
 call_console_drivers()
   some_slow_console_device_write()
 spin_trylock_irqsave(uart->lock)   // oops_in_progress = 1
  use trylock, no deadlock
 bust_spinlocks(0)  // now, oops_in_progress = 0

Signed-off-by: Hoeun Ryu 
---
 v2: fix commit message on the reason of a deadlock, no code change.

 kernel/panic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..b4063b6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
if (_crash_kexec_post_notifiers)
__crash_kexec(NULL);
 
-   bust_spinlocks(0);
-
/*
 * We may have ended up stopping the CPU holding the lock (in
 * smp_send_stop()) while still having some valuable data in the console
@@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
debug_locks_off();
console_flush_on_panic();
 
+   bust_spinlocks(0);
+
if (!panic_blink)
panic_blink = no_blink;
 
-- 
2.1.4



Re: [PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-04 Thread Darren Hart
On Mon, Jun 04, 2018 at 08:32:37PM +0800, Chris Chiu wrote:
> Make asus-wmi notify on hotkey kbd brightness changes, listen for
> brightness events and update the brightness directly in the driver.
> For this purpose, bound check on brightness in kbd_led_set must be
> based on the same data type to prevent illegal value been set.
> 
> Update the brightness by led_classdev_notify_brightness_hw_changed.
> This will allow userspace to monitor (poll) for brightness changes
> on the LED without reporting via input keymapping.
> 
> Signed-off-by: Chris Chiu 
> ---
>  drivers/platform/x86/asus-nb-wmi.c |  2 --
>  drivers/platform/x86/asus-wmi.c| 21 +++--
>  2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/platform/x86/asus-nb-wmi.c 
> b/drivers/platform/x86/asus-nb-wmi.c
> index 136ff2b4cce5..7ce80e4bb5a3 100644
> --- a/drivers/platform/x86/asus-nb-wmi.c
> +++ b/drivers/platform/x86/asus-nb-wmi.c
> @@ -493,8 +493,6 @@ static const struct key_entry asus_nb_wmi_keymap[] = {
>   { KE_KEY, 0xA6, { KEY_SWITCHVIDEOMODE } }, /* SDSP CRT + TV + HDMI */
>   { KE_KEY, 0xA7, { KEY_SWITCHVIDEOMODE } }, /* SDSP LCD + CRT + TV + 
> HDMI */
>   { KE_KEY, 0xB5, { KEY_CALC } },
> - { KE_KEY, 0xC4, { KEY_KBDILLUMUP } },
> - { KE_KEY, 0xC5, { KEY_KBDILLUMDOWN } },
>   { KE_IGNORE, 0xC6, },  /* Ambient Light Sensor notification */
>   { KE_END, 0},
>  };
> diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
> index 1f6e68f0b646..b4915b7718c1 100644
> --- a/drivers/platform/x86/asus-wmi.c
> +++ b/drivers/platform/x86/asus-wmi.c
> @@ -460,6 +460,7 @@ static void kbd_led_update(struct work_struct *work)
>   ctrl_param = 0x80 | (asus->kbd_led_wk & 0x7F);
>  
>   asus_wmi_set_devstate(ASUS_WMI_DEVID_KBD_BACKLIGHT, ctrl_param, NULL);
> + led_classdev_notify_brightness_hw_changed(>kbd_led, 
> asus->kbd_led_wk);
>  }
>  
>  static int kbd_led_read(struct asus_wmi *asus, int *level, int *env)
> @@ -497,9 +498,9 @@ static void kbd_led_set(struct led_classdev *led_cdev,
>  
>   asus = container_of(led_cdev, struct asus_wmi, kbd_led);
>  
> - if (value > asus->kbd_led.max_brightness)
> + if ((int)value > (int)asus->kbd_led.max_brightness)
>   value = asus->kbd_led.max_brightness;
> - else if (value < 0)
> + else if ((int)value < 0)
>   value = 0;
>  
>   asus->kbd_led_wk = value;
> @@ -656,6 +657,7 @@ static int asus_wmi_led_init(struct asus_wmi *asus)
>  
>   asus->kbd_led_wk = led_val;
>   asus->kbd_led.name = "asus::kbd_backlight";
> + asus->kbd_led.flags = LED_BRIGHT_HW_CHANGED;
>   asus->kbd_led.brightness_set = kbd_led_set;
>   asus->kbd_led.brightness_get = kbd_led_get;
>   asus->kbd_led.max_brightness = 3;
> @@ -1700,6 +1702,13 @@ static int is_display_toggle(int code)
>   return 0;
>  }
>  
> +static int is_kbd_led_event(int code)
> +{
> + if (code == NOTIFY_KBD_BRTUP || code == NOTIFY_KBD_BRTDWN)
> + return 1;
> + return 0;

I don't think this is worth breaking out into a separate function. This
isn't exactly a hot path, but I don't think this makes the code any more
readable really. The is_display_toggle was comparing 8 values in a
complex logic statement, so we don't need to follow that for something
this simple.

> +}
> +
>  static void asus_wmi_notify(u32 value, void *context)
>  {
>   struct asus_wmi *asus = context;
> @@ -1745,6 +1754,14 @@ static void asus_wmi_notify(u32 value, void *context)
>   }
>   }
>  
> + if (is_kbd_led_event(code)) {
> + if (code == NOTIFY_KBD_BRTDWN)

Seems like we could eliminate the extra function and eliminate a level
of indentation by rewriting this as:

if (code == NOTIFY_KBD_BRTDWN)
kbd_led_set(>kbd_led, asus->kbd_led_wk - 1);
else if (code == NOTIFY_KBD_BRTUP)
kbd_led_set(>kbd_led, asus->kbd_led_wk + 1);
if (code == NOTIFY_KBD_BRTDWN || NOTIFY_KBD_BRTUP)
goto exit;

Or, just treat them as separate events:


if (code == NOTIFY_KBD_BRTDWN) {
kbd_led_set(>kbd_led, asus->kbd_led_wk - 1);
goto exit;
}

if (code == NOTIFY_KBD_BRTUP) {
kbd_led_set(>kbd_led, asus->kbd_led_wk + 1);
goto exit;
}



>   if (is_display_toggle(code) &&
>   asus->driver->quirks->no_display_toggle)
>   goto exit;
> -- 
> 2.11.0
> 
> 

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-04 Thread Darren Hart
On Mon, Jun 04, 2018 at 08:32:37PM +0800, Chris Chiu wrote:
> Make asus-wmi notify on hotkey kbd brightness changes, listen for
> brightness events and update the brightness directly in the driver.
> For this purpose, bound check on brightness in kbd_led_set must be
> based on the same data type to prevent illegal value been set.
> 
> Update the brightness by led_classdev_notify_brightness_hw_changed.
> This will allow userspace to monitor (poll) for brightness changes
> on the LED without reporting via input keymapping.
> 
> Signed-off-by: Chris Chiu 
> ---
>  drivers/platform/x86/asus-nb-wmi.c |  2 --
>  drivers/platform/x86/asus-wmi.c| 21 +++--
>  2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/platform/x86/asus-nb-wmi.c 
> b/drivers/platform/x86/asus-nb-wmi.c
> index 136ff2b4cce5..7ce80e4bb5a3 100644
> --- a/drivers/platform/x86/asus-nb-wmi.c
> +++ b/drivers/platform/x86/asus-nb-wmi.c
> @@ -493,8 +493,6 @@ static const struct key_entry asus_nb_wmi_keymap[] = {
>   { KE_KEY, 0xA6, { KEY_SWITCHVIDEOMODE } }, /* SDSP CRT + TV + HDMI */
>   { KE_KEY, 0xA7, { KEY_SWITCHVIDEOMODE } }, /* SDSP LCD + CRT + TV + 
> HDMI */
>   { KE_KEY, 0xB5, { KEY_CALC } },
> - { KE_KEY, 0xC4, { KEY_KBDILLUMUP } },
> - { KE_KEY, 0xC5, { KEY_KBDILLUMDOWN } },
>   { KE_IGNORE, 0xC6, },  /* Ambient Light Sensor notification */
>   { KE_END, 0},
>  };
> diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
> index 1f6e68f0b646..b4915b7718c1 100644
> --- a/drivers/platform/x86/asus-wmi.c
> +++ b/drivers/platform/x86/asus-wmi.c
> @@ -460,6 +460,7 @@ static void kbd_led_update(struct work_struct *work)
>   ctrl_param = 0x80 | (asus->kbd_led_wk & 0x7F);
>  
>   asus_wmi_set_devstate(ASUS_WMI_DEVID_KBD_BACKLIGHT, ctrl_param, NULL);
> + led_classdev_notify_brightness_hw_changed(>kbd_led, 
> asus->kbd_led_wk);
>  }
>  
>  static int kbd_led_read(struct asus_wmi *asus, int *level, int *env)
> @@ -497,9 +498,9 @@ static void kbd_led_set(struct led_classdev *led_cdev,
>  
>   asus = container_of(led_cdev, struct asus_wmi, kbd_led);
>  
> - if (value > asus->kbd_led.max_brightness)
> + if ((int)value > (int)asus->kbd_led.max_brightness)
>   value = asus->kbd_led.max_brightness;
> - else if (value < 0)
> + else if ((int)value < 0)
>   value = 0;
>  
>   asus->kbd_led_wk = value;
> @@ -656,6 +657,7 @@ static int asus_wmi_led_init(struct asus_wmi *asus)
>  
>   asus->kbd_led_wk = led_val;
>   asus->kbd_led.name = "asus::kbd_backlight";
> + asus->kbd_led.flags = LED_BRIGHT_HW_CHANGED;
>   asus->kbd_led.brightness_set = kbd_led_set;
>   asus->kbd_led.brightness_get = kbd_led_get;
>   asus->kbd_led.max_brightness = 3;
> @@ -1700,6 +1702,13 @@ static int is_display_toggle(int code)
>   return 0;
>  }
>  
> +static int is_kbd_led_event(int code)
> +{
> + if (code == NOTIFY_KBD_BRTUP || code == NOTIFY_KBD_BRTDWN)
> + return 1;
> + return 0;

I don't think this is worth breaking out into a separate function. This
isn't exactly a hot path, but I don't think this makes the code any more
readable really. The is_display_toggle was comparing 8 values in a
complex logic statement, so we don't need to follow that for something
this simple.

> +}
> +
>  static void asus_wmi_notify(u32 value, void *context)
>  {
>   struct asus_wmi *asus = context;
> @@ -1745,6 +1754,14 @@ static void asus_wmi_notify(u32 value, void *context)
>   }
>   }
>  
> + if (is_kbd_led_event(code)) {
> + if (code == NOTIFY_KBD_BRTDWN)

Seems like we could eliminate the extra function and eliminate a level
of indentation by rewriting this as:

if (code == NOTIFY_KBD_BRTDWN)
kbd_led_set(>kbd_led, asus->kbd_led_wk - 1);
else if (code == NOTIFY_KBD_BRTUP)
kbd_led_set(>kbd_led, asus->kbd_led_wk + 1);
if (code == NOTIFY_KBD_BRTDWN || NOTIFY_KBD_BRTUP)
goto exit;

Or, just treat them as separate events:


if (code == NOTIFY_KBD_BRTDWN) {
kbd_led_set(>kbd_led, asus->kbd_led_wk - 1);
goto exit;
}

if (code == NOTIFY_KBD_BRTUP) {
kbd_led_set(>kbd_led, asus->kbd_led_wk + 1);
goto exit;
}



>   if (is_display_toggle(code) &&
>   asus->driver->quirks->no_display_toggle)
>   goto exit;
> -- 
> 2.11.0
> 
> 

-- 
Darren Hart
VMware Open Source Technology Center


Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> "Hatayama, Daisuke"  writes:
>
>>> Can you test this and please verify it fixes your issue?
>>
>> I tried this patch on top of v4.17 but the system fails to boot
>> without detecting root disks by dracut like this:
[snip]

>> OTOH, there's no issue on the pure v4.17 kernel.
>>
>> As above, ls /sys/module looks apparently good. But I guess any part of
>> behavior of getdentries() on sysfs must have changed, affecting the disk
>> detection...
>
> I haven't been able to reproduce this yet.  My test system boots fine.
> Which fedora are you testing on?

I reproduced something similar and fedora 28.  So I think I have found
and fixed the issue.  I believe I simply reversed the test at the end of
kernfs_dir_pos. AKA "<" instead of ">".

I am going to see if I can test my changes more throughly on this side
and then repost.



>>>  fs/kernfs/dir.c | 109
>>> ++--
>>>  1 file changed, 67 insertions(+), 42 deletions(-)
>>> 
>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>> index 89d1dc19340b..8148b5fec48d 100644
>>> --- a/fs/kernfs/dir.c
>>> +++ b/fs/kernfs/dir.c
>>> @@ -1584,53 +1584,75 @@ static int kernfs_dir_fop_release(struct inode 
>>> *inode,
>>> struct file *filp)
>>> return 0;
>>>  }
>>> 
>>> +static struct kernfs_node *kernfs_dir_next(struct kernfs_node *pos)
>>> +{
>>> +   struct rb_node *node = rb_next(>rb);
>>> +   return node ? rb_to_kn(node) : NULL;
>>> +}
>>> +
>>>  static struct kernfs_node *kernfs_dir_pos(const void *ns,
>>> -   struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
>>> +   struct kernfs_node *parent, loff_t off, struct kernfs_node *saved)
>>>  {
>>> -   if (pos) {
>>> -   int valid = kernfs_active(pos) &&
>>> -   pos->parent == parent && hash == pos->hash;
>>> -   kernfs_put(pos);
>>> -   if (!valid)
>>> -   pos = NULL;
>>> -   }
>>> -   if (!pos && (hash > 1) && (hash < INT_MAX)) {
>>> -   struct rb_node *node = parent->dir.children.rb_node;
>>> -   while (node) {
>>> -   pos = rb_to_kn(node);
>>> -
>>> -   if (hash < pos->hash)
>>> -   node = node->rb_left;
>>> -   else if (hash > pos->hash)
>>> -   node = node->rb_right;
>>> -   else
>>> -   break;
>>> +   struct kernfs_node *pos;
>>> +   struct rb_node *node;
>>> +   unsigned int hash;
>>> +   const char *name = "";
>>> +
>>> +   /* Is off a valid name hash? */
>>> +   if ((off < 2) || (off >= INT_MAX))
>>> +   return NULL;
>>> +   hash = off;
>>> +
>>> +   /* Is the saved position usable? */
>>> +   if (saved) {
>>> +   /* Proper parent and hash? */
>>> +   if ((parent != saved->parent) || (saved->hash != hash)) {
>>> +   saved = NULL;
>>
>> name is uninitialized in this path.
>
> It is.  name is initialized to "" see above.
>
>>> +   } else {
>>> +   if (kernfs_active(saved))
>>> +   return saved;
>>> +   name = saved->name;
>>> }
>>> }
>>> -   /* Skip over entries which are dying/dead or in the wrong namespace
>>> */
>>> -   while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
>>> -   struct rb_node *node = rb_next(>rb);
>>> -   if (!node)
>>> -   pos = NULL;
>>> +
>>> +   /* Find the closest pos to the hash we are looking for */
>>> +   pos = NULL;
>>> +   node = parent->dir.children.rb_node;
>>> +   while (node) {
>>> +   int result;
>>> +
>>> +   pos = rb_to_kn(node);
>>> +   result = kernfs_name_compare(hash, name, ns, pos);
>>> +   if (result < 0)
>>> +   node = node->rb_left;
>>> +   else if (result > 0)
>>> +   node = node->rb_right;
>>> else
>>> -   pos = rb_to_kn(node);
>>> +   break;
>>> }
>>> +
>>> +   /* Ensure pos is at or beyond the target position */
>>> +   if (pos && (kernfs_name_compare(hash, name, ns, pos) < 0))

  should be > 0
>>> +   pos = kernfs_dir_next(pos);
>>> +

Eric


Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> "Hatayama, Daisuke"  writes:
>
>>> Can you test this and please verify it fixes your issue?
>>
>> I tried this patch on top of v4.17 but the system fails to boot
>> without detecting root disks by dracut like this:
[snip]

>> OTOH, there's no issue on the pure v4.17 kernel.
>>
>> As above, ls /sys/module looks apparently good. But I guess any part of
>> behavior of getdentries() on sysfs must have changed, affecting the disk
>> detection...
>
> I haven't been able to reproduce this yet.  My test system boots fine.
> Which fedora are you testing on?

I reproduced something similar and fedora 28.  So I think I have found
and fixed the issue.  I believe I simply reversed the test at the end of
kernfs_dir_pos. AKA "<" instead of ">".

I am going to see if I can test my changes more throughly on this side
and then repost.



>>>  fs/kernfs/dir.c | 109
>>> ++--
>>>  1 file changed, 67 insertions(+), 42 deletions(-)
>>> 
>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>> index 89d1dc19340b..8148b5fec48d 100644
>>> --- a/fs/kernfs/dir.c
>>> +++ b/fs/kernfs/dir.c
>>> @@ -1584,53 +1584,75 @@ static int kernfs_dir_fop_release(struct inode 
>>> *inode,
>>> struct file *filp)
>>> return 0;
>>>  }
>>> 
>>> +static struct kernfs_node *kernfs_dir_next(struct kernfs_node *pos)
>>> +{
>>> +   struct rb_node *node = rb_next(>rb);
>>> +   return node ? rb_to_kn(node) : NULL;
>>> +}
>>> +
>>>  static struct kernfs_node *kernfs_dir_pos(const void *ns,
>>> -   struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
>>> +   struct kernfs_node *parent, loff_t off, struct kernfs_node *saved)
>>>  {
>>> -   if (pos) {
>>> -   int valid = kernfs_active(pos) &&
>>> -   pos->parent == parent && hash == pos->hash;
>>> -   kernfs_put(pos);
>>> -   if (!valid)
>>> -   pos = NULL;
>>> -   }
>>> -   if (!pos && (hash > 1) && (hash < INT_MAX)) {
>>> -   struct rb_node *node = parent->dir.children.rb_node;
>>> -   while (node) {
>>> -   pos = rb_to_kn(node);
>>> -
>>> -   if (hash < pos->hash)
>>> -   node = node->rb_left;
>>> -   else if (hash > pos->hash)
>>> -   node = node->rb_right;
>>> -   else
>>> -   break;
>>> +   struct kernfs_node *pos;
>>> +   struct rb_node *node;
>>> +   unsigned int hash;
>>> +   const char *name = "";
>>> +
>>> +   /* Is off a valid name hash? */
>>> +   if ((off < 2) || (off >= INT_MAX))
>>> +   return NULL;
>>> +   hash = off;
>>> +
>>> +   /* Is the saved position usable? */
>>> +   if (saved) {
>>> +   /* Proper parent and hash? */
>>> +   if ((parent != saved->parent) || (saved->hash != hash)) {
>>> +   saved = NULL;
>>
>> name is uninitialized in this path.
>
> It is.  name is initialized to "" see above.
>
>>> +   } else {
>>> +   if (kernfs_active(saved))
>>> +   return saved;
>>> +   name = saved->name;
>>> }
>>> }
>>> -   /* Skip over entries which are dying/dead or in the wrong namespace
>>> */
>>> -   while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
>>> -   struct rb_node *node = rb_next(>rb);
>>> -   if (!node)
>>> -   pos = NULL;
>>> +
>>> +   /* Find the closest pos to the hash we are looking for */
>>> +   pos = NULL;
>>> +   node = parent->dir.children.rb_node;
>>> +   while (node) {
>>> +   int result;
>>> +
>>> +   pos = rb_to_kn(node);
>>> +   result = kernfs_name_compare(hash, name, ns, pos);
>>> +   if (result < 0)
>>> +   node = node->rb_left;
>>> +   else if (result > 0)
>>> +   node = node->rb_right;
>>> else
>>> -   pos = rb_to_kn(node);
>>> +   break;
>>> }
>>> +
>>> +   /* Ensure pos is at or beyond the target position */
>>> +   if (pos && (kernfs_name_compare(hash, name, ns, pos) < 0))

  should be > 0
>>> +   pos = kernfs_dir_next(pos);
>>> +

Eric


[PATCH] ksys_mount: check for permissions before resource allocation

2018-06-04 Thread Ilya Matveychikov
Early check for mount permissions prevents possible allocation of 3
pages from kmalloc() pool by unpriveledged user which can be used for
spraying the kernel heap.

Signed-off-by: Ilya V. Matveychikov 
---
 fs/namespace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5f75969adff1..1ef8feb2de2a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3046,6 +3046,9 @@ int ksys_mount(char __user *dev_name, char __user 
*dir_name, char __user *type,
char *kernel_dev;
void *options;

+   if (!may_mount())
+   return -EPERM;
+
kernel_type = copy_mount_string(type);
ret = PTR_ERR(kernel_type);
if (IS_ERR(kernel_type))
--
2.17.0



[PATCH] ksys_mount: check for permissions before resource allocation

2018-06-04 Thread Ilya Matveychikov
Early check for mount permissions prevents possible allocation of 3
pages from kmalloc() pool by unpriveledged user which can be used for
spraying the kernel heap.

Signed-off-by: Ilya V. Matveychikov 
---
 fs/namespace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5f75969adff1..1ef8feb2de2a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3046,6 +3046,9 @@ int ksys_mount(char __user *dev_name, char __user 
*dir_name, char __user *type,
char *kernel_dev;
void *options;

+   if (!may_mount())
+   return -EPERM;
+
kernel_type = copy_mount_string(type);
ret = PTR_ERR(kernel_type);
if (IS_ERR(kernel_type))
--
2.17.0



Re: [GIT PULL] x86/asm changes for v4.18

2018-06-04 Thread Linus Torvalds
On Mon, Jun 4, 2018 at 5:21 AM Ingo Molnar  wrote:
>
>  - __clear_user() micro-optimization (Alexey Dobriyan)

Was this actually tested?

I think one reason people avoided the constant was that on some
microarchitecture it ended up being a separate uop just for the
constant generation, because it wouldn't fit in a single uop.

I'm pretty sure that used to be the case for P4, for example.

Afaik there have also been issues with decoding instructions that have
both an immediate and a memory offset.

I suspect none of this is an issue on modern cores, but there really
at least historically were cases where

   mov %reg,mem

was better than

   mov $imm,mem

if %reg already had the right value, so it's not at all 100% obvious
that the micro-optimization really _optimizes_ anything.

Any time people do this, they should add numbers.

Linus


Re: [PATCH v2] platform/x86: asus-wmi: Add keyboard backlight toggle support

2018-06-04 Thread Darren Hart
On Mon, Jun 04, 2018 at 11:27:43AM +0200, Benjamin Berg wrote:
> Hi,
> 
> On Thu, 2018-05-24 at 16:33 +0800, Chris Chiu wrote:
> > I've made my change to set the brightness level directly in the
> > driver, but the
> > OSD doesn't show correctly correspond to the level value. The brightness 
> > shows
> > OK in /sys/class/led//brighness but the OSD always shows level 0. I 
> > thought
> > GNOME should read the brightness from /sys before showing OSD?
> 
> Sorry for the late response.
> 
> There is a special mechanism to report that the HW changed the
> brightness. This works using the "brightness_hw_changed" sysfs
> attribute. So you will need to set the LED_BRIGHT_HW_CHANGED flag on
> the LED and then call led_classdev_notify_brightness_hw_changed to make
> it work.
> 
> Userspace should correctly show the OSD when this is done.

This makes sense. Userspace can't be aware of every platforms sys files,
so there needs to be a common mechanism. LED_BRIGHT_HW_CHANGED makes
sense.

-- 
Darren Hart
VMware Open Source Technology Center


Re: [GIT PULL] x86/asm changes for v4.18

2018-06-04 Thread Linus Torvalds
On Mon, Jun 4, 2018 at 5:21 AM Ingo Molnar  wrote:
>
>  - __clear_user() micro-optimization (Alexey Dobriyan)

Was this actually tested?

I think one reason people avoided the constant was that on some
microarchitecture it ended up being a separate uop just for the
constant generation, because it wouldn't fit in a single uop.

I'm pretty sure that used to be the case for P4, for example.

Afaik there have also been issues with decoding instructions that have
both an immediate and a memory offset.

I suspect none of this is an issue on modern cores, but there really
at least historically were cases where

   mov %reg,mem

was better than

   mov $imm,mem

if %reg already had the right value, so it's not at all 100% obvious
that the micro-optimization really _optimizes_ anything.

Any time people do this, they should add numbers.

Linus


Re: [PATCH v2] platform/x86: asus-wmi: Add keyboard backlight toggle support

2018-06-04 Thread Darren Hart
On Mon, Jun 04, 2018 at 11:27:43AM +0200, Benjamin Berg wrote:
> Hi,
> 
> On Thu, 2018-05-24 at 16:33 +0800, Chris Chiu wrote:
> > I've made my change to set the brightness level directly in the
> > driver, but the
> > OSD doesn't show correctly correspond to the level value. The brightness 
> > shows
> > OK in /sys/class/led//brighness but the OSD always shows level 0. I 
> > thought
> > GNOME should read the brightness from /sys before showing OSD?
> 
> Sorry for the late response.
> 
> There is a special mechanism to report that the HW changed the
> brightness. This works using the "brightness_hw_changed" sysfs
> attribute. So you will need to set the LED_BRIGHT_HW_CHANGED flag on
> the LED and then call led_classdev_notify_brightness_hw_changed to make
> it work.
> 
> Userspace should correctly show the OSD when this is done.

This makes sense. Userspace can't be aware of every platforms sys files,
so there needs to be a common mechanism. LED_BRIGHT_HW_CHANGED makes
sense.

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH v4 8/8] Documentation/ABI: Add documentation mlxreg-io sysfs interfaces

2018-06-04 Thread Darren Hart
On Tue, May 29, 2018 at 08:59:07AM +, Vadim Pasternak wrote:
> Add documentation for mlxreg-io driver sysfs interfaces for user space
> access to system's power resets control, reset causes monitoring,
> programmable devices version reading and devices selection control.
> 
> Signed-off-by: Vadim Pasternak 

Hi Vadim,

You mentioned that these can vary by platform, I presume that means that
any given platform will present a subset of these attributes - but that
all possible attributes are documented here. Is that correct?

> ---
> v4:
>  Comments pointed out by Greg:
>  Add Documentation/ABI/ entries for the new sysfs files.
> ---
>  Documentation/ABI/stable/sysfs-driver-mlxreg-io | 51 
> +
>  1 file changed, 51 insertions(+)
>  create mode 100644 Documentation/ABI/stable/sysfs-driver-mlxreg-io
> 
> diff --git a/Documentation/ABI/stable/sysfs-driver-mlxreg-io 
> b/Documentation/ABI/stable/sysfs-driver-mlxreg-io
> new file mode 100644
> index 000..fcd659e
> --- /dev/null
> +++ b/Documentation/ABI/stable/sysfs-driver-mlxreg-io
> @@ -0,0 +1,51 @@
> +What:/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
> + cause_aux_pwr_or_ref
> + cause_asic_thermal
> + cause_hotswap_or_wd
> + cause_fw_reset
> + cause_long_pb
> + cause_main_pwr_fail
> + cause_short_pb
> + cause_sw_reset

These property names do not seem to follow any kind of logical pattern
with respect to abbreviation. Starting with "cause_" made me think they
were event generators at first... Of course, who wants to be able to
generate an asic thermal shutdown from sysfs... ;-)

The only thing that might make more sense to *me* would be to replace
"cause" with "reset". Not required. I'll leave it to you, you may have
good reason for the names chosen.

> +Date:May 2018
> +KernelVersion:   4.18
> +Contact: Vadim Pasternak 
> +Description: These files show the system reset cause, as following: power
> + auxiliary outage or power refresh, ASIC thermal shutdown,
> + hotswap or watchdog, firmware reset, long press power button,
> + short press power button, software reset. Value 1 in file means
> + this is reset cause, 0 - otherwise. Only one of the above
> + causes could be 1 at the same time, representing only last
> + reset cause.
> +
> + The files are read only.
> +
> +What:/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
> + cpld1_version
> + cpld2_version
> +Date:May 2018
> +KernelVersion:   4.18
> +Contact: Vadim Pasternak 
> +Description: These files show with which CPLD versions have been burned
> + on carrier and switch boards.
> +
> + The files are read only.
> +
> +What:
> /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/select_iio
> +Date:May 2018
> +KernelVersion:   4.18
> +Contact: Vadim Pasternak 
> +Description: This file allows iio devices selection.
> +
> + The file is read/write.
> +

Some description is appropriate here for what values it accepts and how
it is intended to be used.

> +What:
> /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/psu1_on
> + /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/psu2_on
> + /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/pwr_cycle
> +Date:May 2018
> +KernelVersion:   4.18
> +Contact: Vadim Pasternak 
> +Description: These files allow assert system's power cycling and PS units
> + on/off switching.

This Description is awkward. Consider:

These files allow asserting system power cycling and switching power
supply units on and off.

> + The files are write only.

Please provide usage and resulting behavior.

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH v4 8/8] Documentation/ABI: Add documentation mlxreg-io sysfs interfaces

2018-06-04 Thread Darren Hart
On Tue, May 29, 2018 at 08:59:07AM +, Vadim Pasternak wrote:
> Add documentation for mlxreg-io driver sysfs interfaces for user space
> access to system's power resets control, reset causes monitoring,
> programmable devices version reading and devices selection control.
> 
> Signed-off-by: Vadim Pasternak 

Hi Vadim,

You mentioned that these can vary by platform, I presume that means that
any given platform will present a subset of these attributes - but that
all possible attributes are documented here. Is that correct?

> ---
> v4:
>  Comments pointed out by Greg:
>  Add Documentation/ABI/ entries for the new sysfs files.
> ---
>  Documentation/ABI/stable/sysfs-driver-mlxreg-io | 51 
> +
>  1 file changed, 51 insertions(+)
>  create mode 100644 Documentation/ABI/stable/sysfs-driver-mlxreg-io
> 
> diff --git a/Documentation/ABI/stable/sysfs-driver-mlxreg-io 
> b/Documentation/ABI/stable/sysfs-driver-mlxreg-io
> new file mode 100644
> index 000..fcd659e
> --- /dev/null
> +++ b/Documentation/ABI/stable/sysfs-driver-mlxreg-io
> @@ -0,0 +1,51 @@
> +What:/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
> + cause_aux_pwr_or_ref
> + cause_asic_thermal
> + cause_hotswap_or_wd
> + cause_fw_reset
> + cause_long_pb
> + cause_main_pwr_fail
> + cause_short_pb
> + cause_sw_reset

These property names do not seem to follow any kind of logical pattern
with respect to abbreviation. Starting with "cause_" made me think they
were event generators at first... Of course, who wants to be able to
generate an asic thermal shutdown from sysfs... ;-)

The only thing that might make more sense to *me* would be to replace
"cause" with "reset". Not required. I'll leave it to you, you may have
good reason for the names chosen.

> +Date:May 2018
> +KernelVersion:   4.18
> +Contact: Vadim Pasternak 
> +Description: These files show the system reset cause, as following: power
> + auxiliary outage or power refresh, ASIC thermal shutdown,
> + hotswap or watchdog, firmware reset, long press power button,
> + short press power button, software reset. Value 1 in file means
> + this is reset cause, 0 - otherwise. Only one of the above
> + causes could be 1 at the same time, representing only last
> + reset cause.
> +
> + The files are read only.
> +
> +What:/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
> + cpld1_version
> + cpld2_version
> +Date:May 2018
> +KernelVersion:   4.18
> +Contact: Vadim Pasternak 
> +Description: These files show with which CPLD versions have been burned
> + on carrier and switch boards.
> +
> + The files are read only.
> +
> +What:
> /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/select_iio
> +Date:May 2018
> +KernelVersion:   4.18
> +Contact: Vadim Pasternak 
> +Description: This file allows iio devices selection.
> +
> + The file is read/write.
> +

Some description is appropriate here for what values it accepts and how
it is intended to be used.

> +What:
> /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/psu1_on
> + /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/psu2_on
> + /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/pwr_cycle
> +Date:May 2018
> +KernelVersion:   4.18
> +Contact: Vadim Pasternak 
> +Description: These files allow assert system's power cycling and PS units
> + on/off switching.

This Description is awkward. Consider:

These files allow asserting system power cycling and switching power
supply units on and off.

> + The files are write only.

Please provide usage and resulting behavior.

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks

2018-06-04 Thread Steven Rostedt
On Mon,  4 Jun 2018 14:45:57 +0900
Hoeun Ryu  wrote:

> From: Hoeun Ryu 
> 
>  Many console device drivers hold the uart_port->lock spinlock with irq 
> enabled
> (using spin_lock()) while the device drivers are writing characters to their 
> devices,
> but the device drivers just try to hold the spin lock (using spin_trylock()) 
> if
> "oops_in_progress" is equal or greater than 1 to avoid deadlocks.
> 
>  There is a case ocurring a deadlock related to the lock and 
> oops_in_progress. A CPU
> could be stopped by smp_send_stop() while it was holding the port lock 
> because irq was
> enabled. Once a CPU stops, it doesn't respond interrupts anymore and the lock 
> stays
> locked forever.
> 
>  console_flush_on_panic() is called during panic() and it eventually holds 
> the uart
> lock but the lock is held by another stopped CPU and it is a deadlock. By 
> moving
> bust_spinlocks(0) after console_flush_on_panic(), let the console device 
> drivers
> think the Oops is still in progress to call spin_trylock() instead of 
> spin_lock() to
> avoid the deadlock.
> 
> Signed-off-by: Hoeun Ryu 
> ---
>  kernel/panic.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 42e4874..b4063b6 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
>   if (_crash_kexec_post_notifiers)
>   __crash_kexec(NULL);
>  
> - bust_spinlocks(0);
> -
>   /*
>* We may have ended up stopping the CPU holding the lock (in
>* smp_send_stop()) while still having some valuable data in the console
> @@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
>   debug_locks_off();
>   console_flush_on_panic();
>  
> + bust_spinlocks(0);

Added a few more to Cc. This looks like it could have subtle
side-effects. I'd like those that have been touching the code around
here to have a look.

-- Steve


> +
>   if (!panic_blink)
>   panic_blink = no_blink;
>  



Re: [PATCH] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks

2018-06-04 Thread Steven Rostedt
On Mon,  4 Jun 2018 14:45:57 +0900
Hoeun Ryu  wrote:

> From: Hoeun Ryu 
> 
>  Many console device drivers hold the uart_port->lock spinlock with irq 
> enabled
> (using spin_lock()) while the device drivers are writing characters to their 
> devices,
> but the device drivers just try to hold the spin lock (using spin_trylock()) 
> if
> "oops_in_progress" is equal or greater than 1 to avoid deadlocks.
> 
>  There is a case ocurring a deadlock related to the lock and 
> oops_in_progress. A CPU
> could be stopped by smp_send_stop() while it was holding the port lock 
> because irq was
> enabled. Once a CPU stops, it doesn't respond interrupts anymore and the lock 
> stays
> locked forever.
> 
>  console_flush_on_panic() is called during panic() and it eventually holds 
> the uart
> lock but the lock is held by another stopped CPU and it is a deadlock. By 
> moving
> bust_spinlocks(0) after console_flush_on_panic(), let the console device 
> drivers
> think the Oops is still in progress to call spin_trylock() instead of 
> spin_lock() to
> avoid the deadlock.
> 
> Signed-off-by: Hoeun Ryu 
> ---
>  kernel/panic.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 42e4874..b4063b6 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
>   if (_crash_kexec_post_notifiers)
>   __crash_kexec(NULL);
>  
> - bust_spinlocks(0);
> -
>   /*
>* We may have ended up stopping the CPU holding the lock (in
>* smp_send_stop()) while still having some valuable data in the console
> @@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
>   debug_locks_off();
>   console_flush_on_panic();
>  
> + bust_spinlocks(0);

Added a few more to Cc. This looks like it could have subtle
side-effects. I'd like those that have been touching the code around
here to have a look.

-- Steve


> +
>   if (!panic_blink)
>   panic_blink = no_blink;
>  



Re: [PATCH 1/2] sched/fair: pelt: use u32 for util_avg

2018-06-04 Thread kbuild test robot
Hi Patrick,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.17 next-20180604]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Patrick-Bellasi/sched-fair-pelt-use-u32-for-util_avg/20180605-082640
config: i386-randconfig-s0-201822 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/sched/fair.o: In function `post_init_entity_util_avg':
>> kernel/sched/fair.c:761: undefined reference to `__udivdi3'

vim +761 kernel/sched/fair.c

   724  
   725  /*
   726   * With new tasks being created, their initial util_avgs are 
extrapolated
   727   * based on the cfs_rq's current util_avg:
   728   *
   729   *   util_avg = cfs_rq->util_avg / (cfs_rq->load_avg + 1) * 
se.load.weight
   730   *
   731   * However, in many cases, the above util_avg does not give a desired
   732   * value. Moreover, the sum of the util_avgs may be divergent, such
   733   * as when the series is a harmonic series.
   734   *
   735   * To solve this problem, we also cap the util_avg of successive tasks 
to
   736   * only 1/2 of the left utilization budget:
   737   *
   738   *   util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n
   739   *
   740   * where n denotes the nth task.
   741   *
   742   * For example, a simplest series from the beginning would be like:
   743   *
   744   *  task  util_avg: 512, 256, 128,  64,  32,   16,8, ...
   745   * cfs_rq util_avg: 512, 768, 896, 960, 992, 1008, 1016, ...
   746   *
   747   * Finally, that extrapolated util_avg is clamped to the cap 
(util_avg_cap)
   748   * if util_avg > util_avg_cap.
   749   */
   750  void post_init_entity_util_avg(struct sched_entity *se)
   751  {
   752  struct cfs_rq *cfs_rq = cfs_rq_of(se);
   753  long cap = (long)(SCHED_CAPACITY_SCALE - cfs_rq->avg.util_avg) 
/ 2;
   754  
   755  if (cap > 0) {
   756  struct sched_avg *sa = >avg;
   757  u64 util_avg = READ_ONCE(sa->util_avg);
   758  
   759  if (cfs_rq->avg.util_avg != 0) {
   760  util_avg  =  cfs_rq->avg.util_avg * 
se->load.weight;
 > 761  util_avg /= (cfs_rq->avg.load_avg + 1);
   762  if (util_avg > cap)
   763  util_avg = cap;
   764  } else {
   765  util_avg = cap;
   766  }
   767  
   768  WRITE_ONCE(sa->util_avg, util_avg);
   769  }
   770  
   771  if (entity_is_task(se)) {
   772  struct task_struct *p = task_of(se);
   773  if (p->sched_class != _sched_class) {
   774  /*
   775   * For !fair tasks do:
   776   *
   777  update_cfs_rq_load_avg(now, cfs_rq);
   778  attach_entity_load_avg(cfs_rq, se, 0);
   779  switched_from_fair(rq, p);
   780   *
   781   * such that the next switched_to_fair() has the
   782   * expected state.
   783   */
   784  se->avg.last_update_time = 
cfs_rq_clock_task(cfs_rq);
   785  return;
   786  }
   787  }
   788  
   789  attach_entity_cfs_rq(se);
   790  }
   791  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 1/2] sched/fair: pelt: use u32 for util_avg

2018-06-04 Thread kbuild test robot
Hi Patrick,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.17 next-20180604]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Patrick-Bellasi/sched-fair-pelt-use-u32-for-util_avg/20180605-082640
config: i386-randconfig-s0-201822 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/sched/fair.o: In function `post_init_entity_util_avg':
>> kernel/sched/fair.c:761: undefined reference to `__udivdi3'

vim +761 kernel/sched/fair.c

   724  
   725  /*
   726   * With new tasks being created, their initial util_avgs are 
extrapolated
   727   * based on the cfs_rq's current util_avg:
   728   *
   729   *   util_avg = cfs_rq->util_avg / (cfs_rq->load_avg + 1) * 
se.load.weight
   730   *
   731   * However, in many cases, the above util_avg does not give a desired
   732   * value. Moreover, the sum of the util_avgs may be divergent, such
   733   * as when the series is a harmonic series.
   734   *
   735   * To solve this problem, we also cap the util_avg of successive tasks 
to
   736   * only 1/2 of the left utilization budget:
   737   *
   738   *   util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n
   739   *
   740   * where n denotes the nth task.
   741   *
   742   * For example, a simplest series from the beginning would be like:
   743   *
   744   *  task  util_avg: 512, 256, 128,  64,  32,   16,8, ...
   745   * cfs_rq util_avg: 512, 768, 896, 960, 992, 1008, 1016, ...
   746   *
   747   * Finally, that extrapolated util_avg is clamped to the cap 
(util_avg_cap)
   748   * if util_avg > util_avg_cap.
   749   */
   750  void post_init_entity_util_avg(struct sched_entity *se)
   751  {
   752  struct cfs_rq *cfs_rq = cfs_rq_of(se);
   753  long cap = (long)(SCHED_CAPACITY_SCALE - cfs_rq->avg.util_avg) 
/ 2;
   754  
   755  if (cap > 0) {
   756  struct sched_avg *sa = >avg;
   757  u64 util_avg = READ_ONCE(sa->util_avg);
   758  
   759  if (cfs_rq->avg.util_avg != 0) {
   760  util_avg  =  cfs_rq->avg.util_avg * 
se->load.weight;
 > 761  util_avg /= (cfs_rq->avg.load_avg + 1);
   762  if (util_avg > cap)
   763  util_avg = cap;
   764  } else {
   765  util_avg = cap;
   766  }
   767  
   768  WRITE_ONCE(sa->util_avg, util_avg);
   769  }
   770  
   771  if (entity_is_task(se)) {
   772  struct task_struct *p = task_of(se);
   773  if (p->sched_class != _sched_class) {
   774  /*
   775   * For !fair tasks do:
   776   *
   777  update_cfs_rq_load_avg(now, cfs_rq);
   778  attach_entity_load_avg(cfs_rq, se, 0);
   779  switched_from_fair(rq, p);
   780   *
   781   * such that the next switched_to_fair() has the
   782   * expected state.
   783   */
   784  se->avg.last_update_time = 
cfs_rq_clock_task(cfs_rq);
   785  return;
   786  }
   787  }
   788  
   789  attach_entity_cfs_rq(se);
   790  }
   791  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH V2] cros_ec_keyb: Mark cros_ec_keyb driver as wake enabled device.

2018-06-04 Thread Brian Norris
Hi,

On Fri, May 25, 2018 at 06:14:40PM -0700, Ravi Chandra Sadineni wrote:
> Mark cros_ec_keyb has wake enabled by default. If we see a MKBP event
> related to keyboard,  call pm_wakeup_event() to make sure wakeup
> triggers are accounted to keyb during suspend resume path.
> 
> Signed-off-by: Ravi Chandra Sadineni 
> ---
> V2: Marked the ckdev as wake enabled instead of input devices.

I'm not sure I saw v1? But FYI, if you're not marking the input devices
as wake-enabled, then you only have one knob for disabling wakeup on the
buttons and switches (e.g., power button) vs. the keyboard. I know you
were previously concerned about this, but given that the EC itself
usually has full knowledge of these situations (e.g., it knows to
disable keyboard wakeup when a convertible is in a tablet orientation,
but leave power-button wakeup enabled), this may not be a problem.

Anyway, just wanted to highlight that part.

>  drivers/input/keyboard/cros_ec_keyb.c | 21 +
>  drivers/mfd/cros_ec.c | 19 +++
>  2 files changed, 24 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/input/keyboard/cros_ec_keyb.c 
> b/drivers/input/keyboard/cros_ec_keyb.c
> index 79eb29550c348..a7c96f0317123 100644
> --- a/drivers/input/keyboard/cros_ec_keyb.c
> +++ b/drivers/input/keyboard/cros_ec_keyb.c
> @@ -245,12 +245,17 @@ static int cros_ec_keyb_work(struct notifier_block *nb,
>   switch (ckdev->ec->event_data.event_type) {
>   case EC_MKBP_EVENT_KEY_MATRIX:
>   /*
> -  * If EC is not the wake source, discard key state changes
> +  * If Keyb is not wake enabled, discard key state changes

We can use the full word; text is cheap:

s/Keyb/keyboard/

>* during suspend.
>*/
> - if (queued_during_suspend)
> + if (queued_during_suspend
> + && !device_may_wakeup(ckdev->dev))
>   return NOTIFY_OK;
>  
> + if (device_may_wakeup(ckdev->dev))
> + pm_wakeup_event(ckdev->dev, 0);
> +
> +

I don't think you need two blank lines here. One is enough.

>   if (ckdev->ec->event_size != ckdev->cols) {
>   dev_err(ckdev->dev,
>   "Discarded incomplete key matrix event.\n");
> @@ -265,18 +270,25 @@ static int cros_ec_keyb_work(struct notifier_block *nb,
>   val = get_unaligned_le32(>ec->event_data.data.sysrq);
>   dev_dbg(ckdev->dev, "sysrq code from EC: %#x\n", val);
>   handle_sysrq(val);
> +
> + if (device_may_wakeup(ckdev->dev))
> + pm_wakeup_event(ckdev->dev, 0);
>   break;
>  
>   case EC_MKBP_EVENT_BUTTON:
>   case EC_MKBP_EVENT_SWITCH:
>   /*
> -  * If EC is not the wake source, discard key state
> +  * If keyb is not wake enabled, discard key state

s/keyb/keyboard/

Or, since this is talking about buttons and switches (which don't
technically require the "keyboard" part of this driver), you might just
leave that off ("If not wake enabled...").

Otherwise, I believe this looks good, though I may have overlooked
something:

Reviewed-by: Brian Norris 

And given Lee acked this, and it's mostly a keyboard change, it should
probably go through Dmitry? And I'd expect he'd be a better reviewer
than me anyway.

Brian

>* changes during suspend. Switches will be re-checked in
>* cros_ec_keyb_resume() to be sure nothing is lost.
>*/
> - if (queued_during_suspend)
> + if (queued_during_suspend
> + && !device_may_wakeup(ckdev->dev))
>   return NOTIFY_OK;
>  
> + if (device_may_wakeup(ckdev->dev))
> + pm_wakeup_event(ckdev->dev, 0);
> +
>   if (ckdev->ec->event_data.event_type == EC_MKBP_EVENT_BUTTON) {
>   val = get_unaligned_le32(
>   >ec->event_data.data.buttons);
> @@ -639,6 +651,7 @@ static int cros_ec_keyb_probe(struct platform_device 
> *pdev)
>   return err;
>   }
>  
> + device_init_wakeup(ckdev->dev, true);
>   return 0;
>  }
>  
> diff --git a/drivers/mfd/cros_ec.c b/drivers/mfd/cros_ec.c
> index d61024141e2b6..36156a41499c9 100644
> --- a/drivers/mfd/cros_ec.c
> +++ b/drivers/mfd/cros_ec.c
> @@ -229,7 +229,7 @@ int cros_ec_suspend(struct cros_ec_device *ec_dev)
>  }
>  EXPORT_SYMBOL(cros_ec_suspend);
>  
> -static void cros_ec_drain_events(struct cros_ec_device *ec_dev)
> +static void cros_ec_report_events_during_suspend(struct cros_ec_device 
> *ec_dev)
>  {
>   while (cros_ec_get_next_event(ec_dev, NULL) > 0)
>   blocking_notifier_call_chain(_dev->event_notifier,
> @@ -253,21 +253,16 @@ int cros_ec_resume(struct cros_ec_device *ec_dev)
>   dev_dbg(ec_dev->dev, "Error %d sending resume 

Re: [PATCH V2] cros_ec_keyb: Mark cros_ec_keyb driver as wake enabled device.

2018-06-04 Thread Brian Norris
Hi,

On Fri, May 25, 2018 at 06:14:40PM -0700, Ravi Chandra Sadineni wrote:
> Mark cros_ec_keyb has wake enabled by default. If we see a MKBP event
> related to keyboard,  call pm_wakeup_event() to make sure wakeup
> triggers are accounted to keyb during suspend resume path.
> 
> Signed-off-by: Ravi Chandra Sadineni 
> ---
> V2: Marked the ckdev as wake enabled instead of input devices.

I'm not sure I saw v1? But FYI, if you're not marking the input devices
as wake-enabled, then you only have one knob for disabling wakeup on the
buttons and switches (e.g., power button) vs. the keyboard. I know you
were previously concerned about this, but given that the EC itself
usually has full knowledge of these situations (e.g., it knows to
disable keyboard wakeup when a convertible is in a tablet orientation,
but leave power-button wakeup enabled), this may not be a problem.

Anyway, just wanted to highlight that part.

>  drivers/input/keyboard/cros_ec_keyb.c | 21 +
>  drivers/mfd/cros_ec.c | 19 +++
>  2 files changed, 24 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/input/keyboard/cros_ec_keyb.c 
> b/drivers/input/keyboard/cros_ec_keyb.c
> index 79eb29550c348..a7c96f0317123 100644
> --- a/drivers/input/keyboard/cros_ec_keyb.c
> +++ b/drivers/input/keyboard/cros_ec_keyb.c
> @@ -245,12 +245,17 @@ static int cros_ec_keyb_work(struct notifier_block *nb,
>   switch (ckdev->ec->event_data.event_type) {
>   case EC_MKBP_EVENT_KEY_MATRIX:
>   /*
> -  * If EC is not the wake source, discard key state changes
> +  * If Keyb is not wake enabled, discard key state changes

We can use the full word; text is cheap:

s/Keyb/keyboard/

>* during suspend.
>*/
> - if (queued_during_suspend)
> + if (queued_during_suspend
> + && !device_may_wakeup(ckdev->dev))
>   return NOTIFY_OK;
>  
> + if (device_may_wakeup(ckdev->dev))
> + pm_wakeup_event(ckdev->dev, 0);
> +
> +

I don't think you need two blank lines here. One is enough.

>   if (ckdev->ec->event_size != ckdev->cols) {
>   dev_err(ckdev->dev,
>   "Discarded incomplete key matrix event.\n");
> @@ -265,18 +270,25 @@ static int cros_ec_keyb_work(struct notifier_block *nb,
>   val = get_unaligned_le32(>ec->event_data.data.sysrq);
>   dev_dbg(ckdev->dev, "sysrq code from EC: %#x\n", val);
>   handle_sysrq(val);
> +
> + if (device_may_wakeup(ckdev->dev))
> + pm_wakeup_event(ckdev->dev, 0);
>   break;
>  
>   case EC_MKBP_EVENT_BUTTON:
>   case EC_MKBP_EVENT_SWITCH:
>   /*
> -  * If EC is not the wake source, discard key state
> +  * If keyb is not wake enabled, discard key state

s/keyb/keyboard/

Or, since this is talking about buttons and switches (which don't
technically require the "keyboard" part of this driver), you might just
leave that off ("If not wake enabled...").

Otherwise, I believe this looks good, though I may have overlooked
something:

Reviewed-by: Brian Norris 

And given Lee acked this, and it's mostly a keyboard change, it should
probably go through Dmitry? And I'd expect he'd be a better reviewer
than me anyway.

Brian

>* changes during suspend. Switches will be re-checked in
>* cros_ec_keyb_resume() to be sure nothing is lost.
>*/
> - if (queued_during_suspend)
> + if (queued_during_suspend
> + && !device_may_wakeup(ckdev->dev))
>   return NOTIFY_OK;
>  
> + if (device_may_wakeup(ckdev->dev))
> + pm_wakeup_event(ckdev->dev, 0);
> +
>   if (ckdev->ec->event_data.event_type == EC_MKBP_EVENT_BUTTON) {
>   val = get_unaligned_le32(
>   >ec->event_data.data.buttons);
> @@ -639,6 +651,7 @@ static int cros_ec_keyb_probe(struct platform_device 
> *pdev)
>   return err;
>   }
>  
> + device_init_wakeup(ckdev->dev, true);
>   return 0;
>  }
>  
> diff --git a/drivers/mfd/cros_ec.c b/drivers/mfd/cros_ec.c
> index d61024141e2b6..36156a41499c9 100644
> --- a/drivers/mfd/cros_ec.c
> +++ b/drivers/mfd/cros_ec.c
> @@ -229,7 +229,7 @@ int cros_ec_suspend(struct cros_ec_device *ec_dev)
>  }
>  EXPORT_SYMBOL(cros_ec_suspend);
>  
> -static void cros_ec_drain_events(struct cros_ec_device *ec_dev)
> +static void cros_ec_report_events_during_suspend(struct cros_ec_device 
> *ec_dev)
>  {
>   while (cros_ec_get_next_event(ec_dev, NULL) > 0)
>   blocking_notifier_call_chain(_dev->event_notifier,
> @@ -253,21 +253,16 @@ int cros_ec_resume(struct cros_ec_device *ec_dev)
>   dev_dbg(ec_dev->dev, "Error %d sending resume 

Re: [PATCH 1/2] sched/fair: pelt: use u32 for util_avg

2018-06-04 Thread kbuild test robot
Hi Patrick,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.17 next-20180604]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Patrick-Bellasi/sched-fair-pelt-use-u32-for-util_avg/20180605-082640
config: i386-defconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/sched/fair.o: In function `post_init_entity_util_avg':
>> fair.c:(.text+0xa057): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 1/2] sched/fair: pelt: use u32 for util_avg

2018-06-04 Thread kbuild test robot
Hi Patrick,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.17 next-20180604]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Patrick-Bellasi/sched-fair-pelt-use-u32-for-util_avg/20180605-082640
config: i386-defconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/sched/fair.o: In function `post_init_entity_util_avg':
>> fair.c:(.text+0xa057): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


  1   2   3   4   5   6   7   8   9   10   >