Re: Process for severe early stable bugs?

2018-12-07 Thread Willy Tarreau
Hi Laura,

On Fri, Dec 07, 2018 at 04:33:10PM -0800, Laura Abbott wrote:
> The latest file system corruption issue (Nominally fixed by
> ffe81d45322c ("blk-mq: fix corruption with direct issue") later
> fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch
> list")) brought a lot of rightfully concerned users asking about
> release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to
> 4.19.3 on Nov 23. When the issue started getting visibility,
> users were left with the option of running known EOL 4.18.x
> kernels or running a 4.19 series that could corrupt their
> data. Admittedly, the risk of running the EOL kernel was pretty
> low given how recent it was, but it's still not a great look
> to tell people to run something marked EOL.
> 
> I'm wondering if there's anything we can do to make things easier
> on kernel consumers. Bugs will certainly happen but it really
> makes it hard to push the "always run the latest stable" narrative
> if there isn't a good fallback when things go seriously wrong. I
> don't actually have a great proposal for a solution here other than
> retroactively bringing back 4.18 (which I don't think Greg would
> like) but I figured I should at least bring it up.

This type of problem may happen once in a while but fortunately is
extremely rare, so I guess it can be addressed with unusual methods.

For my use cases, I always make sure that the last two LTS branches
work fine. Since there's some great maintenance overlap between LTS
branches, I can quickly switch to 4.14.x (or even 4.9.x) if this
happens. In our products we make sure that our toolchain is built
with support for the previous kernel as well "just in case". We've
never switched back and will probably never do, but at least it
serves us a lot to compare strange behaviours between two kernels.

I think that if your distro is functionally and technically compatible
with the previous LTS branch, it could be an acceptable escape for
users who are concerned about their data and their security at the
same time. After all, previous LTS branches are there for those who
can't upgrade. In my opinion this situation perfectly qualifies.

But it requires some preparation like I mentioned. It might be that
some components in the distro rely on features from the very latest
kernels. At the very least it might deserve a bit of inspection to
know if such dependencies exist, and/or what is lost in case of such
a fall back, to warn users.

Just my two cents,
Willy


Re: Process for severe early stable bugs?

2018-12-07 Thread Willy Tarreau
Hi Laura,

On Fri, Dec 07, 2018 at 04:33:10PM -0800, Laura Abbott wrote:
> The latest file system corruption issue (Nominally fixed by
> ffe81d45322c ("blk-mq: fix corruption with direct issue") later
> fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch
> list")) brought a lot of rightfully concerned users asking about
> release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to
> 4.19.3 on Nov 23. When the issue started getting visibility,
> users were left with the option of running known EOL 4.18.x
> kernels or running a 4.19 series that could corrupt their
> data. Admittedly, the risk of running the EOL kernel was pretty
> low given how recent it was, but it's still not a great look
> to tell people to run something marked EOL.
> 
> I'm wondering if there's anything we can do to make things easier
> on kernel consumers. Bugs will certainly happen but it really
> makes it hard to push the "always run the latest stable" narrative
> if there isn't a good fallback when things go seriously wrong. I
> don't actually have a great proposal for a solution here other than
> retroactively bringing back 4.18 (which I don't think Greg would
> like) but I figured I should at least bring it up.

This type of problem may happen once in a while but fortunately is
extremely rare, so I guess it can be addressed with unusual methods.

For my use cases, I always make sure that the last two LTS branches
work fine. Since there's some great maintenance overlap between LTS
branches, I can quickly switch to 4.14.x (or even 4.9.x) if this
happens. In our products we make sure that our toolchain is built
with support for the previous kernel as well "just in case". We've
never switched back and will probably never do, but at least it
serves us a lot to compare strange behaviours between two kernels.

I think that if your distro is functionally and technically compatible
with the previous LTS branch, it could be an acceptable escape for
users who are concerned about their data and their security at the
same time. After all, previous LTS branches are there for those who
can't upgrade. In my opinion this situation perfectly qualifies.

But it requires some preparation like I mentioned. It might be that
some components in the distro rely on features from the very latest
kernels. At the very least it might deserve a bit of inspection to
know if such dependencies exist, and/or what is lost in case of such
a fall back, to warn users.

Just my two cents,
Willy


Re: [PATCH] leds: tlc591xx: fix device_node_continue.cocci warnings (fwd)

2018-12-07 Thread Julia Lawall



On Fri, 7 Dec 2018, Jacek Anaszewski wrote:

> Hi Julia,
>
> Thank you for the patch, but it doesn't apply to LED tree.
>
> The patch causing the problem is out-of-LED-tree.

OK, I guess that the patch is in a TI-specific tree, given the name.

Thanks for looking into it.

julia

>
> Best regards,
> Jacek Anaszewski
>
> On 12/6/18 9:28 PM, Julia Lawall wrote:
> > Hello,
> >
> > The code seems to be wrong in several ways.  If the continue is wanted,
> > the of_node_put is not needed; it will happen on the next iteration.  If
> > the continue is not wanted, the of_node_put is needed, and the continue
> > should be dropped.
> >
> > julia
> >
> > -- Forwarded message --
> > Date: Thu, 6 Dec 2018 19:48:54 +0800
> > From: kbuild test robot 
> > To: kbu...@01.org
> > Cc: Julia Lawall 
> > Subject: [PATCH] leds: tlc591xx: fix device_node_continue.cocci warnings
> >
> > CC: kbuild-...@01.org
> > TO: Jyri Sarha 
> > CC: Peter Ujfalusi 
> > CC: Jacek Anaszewski 
> > CC: Pavel Machek 
> > CC: linux-l...@vger.kernel.org
> > CC: linux-kernel@vger.kernel.org
> >
> > From: kbuild test robot 
> >
> > drivers/leds/leds-tlc591xx.c:342:3-14: ERROR: probable double put.
> >
> >   Device node iterators put the previous value of the index variable, so an
> >   explicit put causes a double put.
> >
> > Generated by: scripts/coccinelle/iterators/device_node_continue.cocci
> >
> > Fixes: 7b2d34aaede7 ("leds: tlc591xx: Add gpio output support")
> > CC: Jyri Sarha 
> > Signed-off-by: kbuild test robot 
> > ---
> >
> > tree:   https://github.com/omap-audio/linux-audio peter/ti-linux-4.19.y/wip
> > head:   838f24e2deaf1229002bd6555eb7e889b09ac1f9
> > commit: 7b2d34aaede727b4abfc78061bbd2202fcd92bc8 [62/67] leds: tlc591xx: Add
> > gpio output support
> > :: branch date: 26 hours ago
> > :: commit date: 26 hours ago
> >
> > Please take the patch only if it's a positive warning. Thanks!
> >
> >   leds-tlc591xx.c |1 -
> >   1 file changed, 1 deletion(-)
> >
> > --- a/drivers/leds/leds-tlc591xx.c
> > +++ b/drivers/leds/leds-tlc591xx.c
> > @@ -339,7 +339,6 @@ tlc591xx_probe(struct i2c_client *client
> > for_each_child_of_node(np, child) {
> > err = of_property_read_u32(child, "reg", );
> > if (err) {
> > -   of_node_put(child);
> > continue;
> > return err;
> > }
> >
>
>
>


Re: [PATCH] leds: tlc591xx: fix device_node_continue.cocci warnings (fwd)

2018-12-07 Thread Julia Lawall



On Fri, 7 Dec 2018, Jacek Anaszewski wrote:

> Hi Julia,
>
> Thank you for the patch, but it doesn't apply to LED tree.
>
> The patch causing the problem is out-of-LED-tree.

OK, I guess that the patch is in a TI-specific tree, given the name.

Thanks for looking into it.

julia

>
> Best regards,
> Jacek Anaszewski
>
> On 12/6/18 9:28 PM, Julia Lawall wrote:
> > Hello,
> >
> > The code seems to be wrong in several ways.  If the continue is wanted,
> > the of_node_put is not needed; it will happen on the next iteration.  If
> > the continue is not wanted, the of_node_put is needed, and the continue
> > should be dropped.
> >
> > julia
> >
> > -- Forwarded message --
> > Date: Thu, 6 Dec 2018 19:48:54 +0800
> > From: kbuild test robot 
> > To: kbu...@01.org
> > Cc: Julia Lawall 
> > Subject: [PATCH] leds: tlc591xx: fix device_node_continue.cocci warnings
> >
> > CC: kbuild-...@01.org
> > TO: Jyri Sarha 
> > CC: Peter Ujfalusi 
> > CC: Jacek Anaszewski 
> > CC: Pavel Machek 
> > CC: linux-l...@vger.kernel.org
> > CC: linux-kernel@vger.kernel.org
> >
> > From: kbuild test robot 
> >
> > drivers/leds/leds-tlc591xx.c:342:3-14: ERROR: probable double put.
> >
> >   Device node iterators put the previous value of the index variable, so an
> >   explicit put causes a double put.
> >
> > Generated by: scripts/coccinelle/iterators/device_node_continue.cocci
> >
> > Fixes: 7b2d34aaede7 ("leds: tlc591xx: Add gpio output support")
> > CC: Jyri Sarha 
> > Signed-off-by: kbuild test robot 
> > ---
> >
> > tree:   https://github.com/omap-audio/linux-audio peter/ti-linux-4.19.y/wip
> > head:   838f24e2deaf1229002bd6555eb7e889b09ac1f9
> > commit: 7b2d34aaede727b4abfc78061bbd2202fcd92bc8 [62/67] leds: tlc591xx: Add
> > gpio output support
> > :: branch date: 26 hours ago
> > :: commit date: 26 hours ago
> >
> > Please take the patch only if it's a positive warning. Thanks!
> >
> >   leds-tlc591xx.c |1 -
> >   1 file changed, 1 deletion(-)
> >
> > --- a/drivers/leds/leds-tlc591xx.c
> > +++ b/drivers/leds/leds-tlc591xx.c
> > @@ -339,7 +339,6 @@ tlc591xx_probe(struct i2c_client *client
> > for_each_child_of_node(np, child) {
> > err = of_property_read_u32(child, "reg", );
> > if (err) {
> > -   of_node_put(child);
> > continue;
> > return err;
> > }
> >
>
>
>


Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-07 Thread Dan Williams
On Fri, Dec 7, 2018 at 4:53 PM John Hubbard  wrote:
>
> On 12/7/18 11:16 AM, Jerome Glisse wrote:
> > On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote:
[..]
> I see. OK, HMM has done an efficient job of mopping up unused fields, and now 
> we are
> completely out of space. At this point, after thinking about it carefully, it 
> seems clear
> that it's time for a single, new field:
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5ed8f6292a53..1c789e324da8 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -182,6 +182,9 @@ struct page {
> /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
> atomic_t _refcount;
>
> +   /* DMA usage count. See get_user_pages*(), put_user_page*(). */
> +   atomic_t _dma_pinned_count;
> +
>  #ifdef CONFIG_MEMCG
> struct mem_cgroup *mem_cgroup;
>  #endif
>
>
> ...because after all, the reason this is so difficult is that this fix has to 
> work
> in pretty much every configuration. get_user_pages() use is widespread, it's 
> a very
> general facility, and...it needs fixing.  And we're out of space.

HMM seems entirely too greedy in this regard. Especially with zero
upstream users. When can we start to delete the pieces of HMM that
have no upstream consumers? I would think that would be 4.21 / 5.0 as
there needs to be some forcing function. We can always re-add pieces
of HMM with it's users when / if they arrive.


Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-07 Thread Dan Williams
On Fri, Dec 7, 2018 at 4:53 PM John Hubbard  wrote:
>
> On 12/7/18 11:16 AM, Jerome Glisse wrote:
> > On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote:
[..]
> I see. OK, HMM has done an efficient job of mopping up unused fields, and now 
> we are
> completely out of space. At this point, after thinking about it carefully, it 
> seems clear
> that it's time for a single, new field:
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5ed8f6292a53..1c789e324da8 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -182,6 +182,9 @@ struct page {
> /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
> atomic_t _refcount;
>
> +   /* DMA usage count. See get_user_pages*(), put_user_page*(). */
> +   atomic_t _dma_pinned_count;
> +
>  #ifdef CONFIG_MEMCG
> struct mem_cgroup *mem_cgroup;
>  #endif
>
>
> ...because after all, the reason this is so difficult is that this fix has to 
> work
> in pretty much every configuration. get_user_pages() use is widespread, it's 
> a very
> general facility, and...it needs fixing.  And we're out of space.

HMM seems entirely too greedy in this regard. Especially with zero
upstream users. When can we start to delete the pieces of HMM that
have no upstream consumers? I would think that would be 4.21 / 5.0 as
there needs to be some forcing function. We can always re-add pieces
of HMM with it's users when / if they arrive.


Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions

2018-12-07 Thread Andrea Righi
On Sat, Dec 08, 2018 at 12:48:59PM +0900, Masami Hiramatsu wrote:
> On Fri, 7 Dec 2018 18:58:05 +0100
> Andrea Righi  wrote:
> 
> > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote:
> > > Hi Andrea and Ingo,
> > > 
> > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed 
> > > working.
> > > After introducing this patch, I will start adding 
> > > arch_populate_kprobe_blacklist()
> > > to some arches.
> > > 
> > > Thank you,
> > > 
> > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited 
> > > area
> > > 
> > > From: Masami Hiramatsu 
> > > 
> > > Blacklist symbols in arch-defined probe-prohibited areas.
> > > With this change, user can see all symbols which are prohibited
> > > to probe in debugfs.
> > > 
> > > All archtectures which have custom prohibit areas should define
> > > its own arch_populate_kprobe_blacklist() function, but unless that,
> > > all symbols marked __kprobes are blacklisted.
> > 
> > What about iterating all symbols and use arch_within_kprobe_blacklist()
> > to check if we need to blacklist them or not.
> 
> Sorry, I don't want to iterate all ksyms since it may take a long time
> (especially embedded small devices.)
> 
> > 
> > In this way we don't have to introduce an
> > arch_populate_kprobe_blacklist() for each architecture.
> 
> Hmm, I had a same idea, but there are some arch which prohibit probing
> extable entries (e.g. arm64.) For correctness of the blacklist, I think
> it should be listed (not entire the function body).
> I also rather like to remove arch_within_kprobe_blacklist() instead.

OK. Thanks.

-Andrea


Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions

2018-12-07 Thread Andrea Righi
On Sat, Dec 08, 2018 at 12:48:59PM +0900, Masami Hiramatsu wrote:
> On Fri, 7 Dec 2018 18:58:05 +0100
> Andrea Righi  wrote:
> 
> > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote:
> > > Hi Andrea and Ingo,
> > > 
> > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed 
> > > working.
> > > After introducing this patch, I will start adding 
> > > arch_populate_kprobe_blacklist()
> > > to some arches.
> > > 
> > > Thank you,
> > > 
> > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited 
> > > area
> > > 
> > > From: Masami Hiramatsu 
> > > 
> > > Blacklist symbols in arch-defined probe-prohibited areas.
> > > With this change, user can see all symbols which are prohibited
> > > to probe in debugfs.
> > > 
> > > All archtectures which have custom prohibit areas should define
> > > its own arch_populate_kprobe_blacklist() function, but unless that,
> > > all symbols marked __kprobes are blacklisted.
> > 
> > What about iterating all symbols and use arch_within_kprobe_blacklist()
> > to check if we need to blacklist them or not.
> 
> Sorry, I don't want to iterate all ksyms since it may take a long time
> (especially embedded small devices.)
> 
> > 
> > In this way we don't have to introduce an
> > arch_populate_kprobe_blacklist() for each architecture.
> 
> Hmm, I had a same idea, but there are some arch which prohibit probing
> extable entries (e.g. arm64.) For correctness of the blacklist, I think
> it should be listed (not entire the function body).
> I also rather like to remove arch_within_kprobe_blacklist() instead.

OK. Thanks.

-Andrea


Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions

2018-12-07 Thread Andrea Righi
On Sat, Dec 08, 2018 at 12:42:10PM +0900, Masami Hiramatsu wrote:
> On Fri, 7 Dec 2018 18:00:26 +0100
> Andrea Righi  wrote:
> 
> > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote:
> > > Hi Andrea and Ingo,
> > > 
> > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed 
> > > working.
> > > After introducing this patch, I will start adding 
> > > arch_populate_kprobe_blacklist()
> > > to some arches.
> > > 
> > > Thank you,
> > > 
> > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited 
> > > area
> > > 
> > > From: Masami Hiramatsu 
> > > 
> > > Blacklist symbols in arch-defined probe-prohibited areas.
> > > With this change, user can see all symbols which are prohibited
> > > to probe in debugfs.
> > > 
> > > All archtectures which have custom prohibit areas should define
> > > its own arch_populate_kprobe_blacklist() function, but unless that,
> > > all symbols marked __kprobes are blacklisted.
> > > 
> > > Reported-by: Andrea Righi 
> > > Signed-off-by: Masami Hiramatsu 
> > > ---
> > 
> > [snip]
> > 
> > > +int kprobe_add_ksym_blacklist(unsigned long entry)
> > > +{
> > > + struct kprobe_blacklist_entry *ent;
> > > + unsigned long offset = 0, size = 0;
> > > +
> > > + if (!kernel_text_address(entry) ||
> > > + !kallsyms_lookup_size_offset(entry, , ))
> > > + return -EINVAL;
> > > +
> > > + ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> > > + if (!ent)
> > > + return -ENOMEM;
> > > + ent->start_addr = entry - offset;
> > > + ent->end_addr = entry - offset + size;
> > 
> > Do we need to take offset into account? The code before wasn't using it.
> 
> Yes, if we hit an alias symbol (zero-size), we forcibly increment address
> and retry it. In that case, offset will be 1.
> 
> > 
> > > + INIT_LIST_HEAD(>list);
> > > + list_add_tail(>list, _blacklist);
> > > +
> > > + return (int)size;
> > > +}
> > > +
> > > +/* Add functions in arch defined probe-prohibited area */
> > > +int __weak arch_populate_kprobe_blacklist(void)
> > > +{
> > > + unsigned long entry;
> > > + int ret = 0;
> > > +
> > > + for (entry = (unsigned long)__kprobes_text_start;
> > > +  entry < (unsigned long)__kprobes_text_end;
> > > +  entry += ret) {
> > > + ret = kprobe_add_ksym_blacklist(entry);
> > > + if (ret < 0)
> > > + return ret;
> > > + if (ret == 0)   /* In case of alias symbol */
> > > + ret = 1;
> 
> Here, we incremented.
> 
> Thank you,

Makes sense, thanks for the clarification.

-Andrea


Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions

2018-12-07 Thread Andrea Righi
On Sat, Dec 08, 2018 at 12:42:10PM +0900, Masami Hiramatsu wrote:
> On Fri, 7 Dec 2018 18:00:26 +0100
> Andrea Righi  wrote:
> 
> > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote:
> > > Hi Andrea and Ingo,
> > > 
> > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed 
> > > working.
> > > After introducing this patch, I will start adding 
> > > arch_populate_kprobe_blacklist()
> > > to some arches.
> > > 
> > > Thank you,
> > > 
> > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited 
> > > area
> > > 
> > > From: Masami Hiramatsu 
> > > 
> > > Blacklist symbols in arch-defined probe-prohibited areas.
> > > With this change, user can see all symbols which are prohibited
> > > to probe in debugfs.
> > > 
> > > All archtectures which have custom prohibit areas should define
> > > its own arch_populate_kprobe_blacklist() function, but unless that,
> > > all symbols marked __kprobes are blacklisted.
> > > 
> > > Reported-by: Andrea Righi 
> > > Signed-off-by: Masami Hiramatsu 
> > > ---
> > 
> > [snip]
> > 
> > > +int kprobe_add_ksym_blacklist(unsigned long entry)
> > > +{
> > > + struct kprobe_blacklist_entry *ent;
> > > + unsigned long offset = 0, size = 0;
> > > +
> > > + if (!kernel_text_address(entry) ||
> > > + !kallsyms_lookup_size_offset(entry, , ))
> > > + return -EINVAL;
> > > +
> > > + ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> > > + if (!ent)
> > > + return -ENOMEM;
> > > + ent->start_addr = entry - offset;
> > > + ent->end_addr = entry - offset + size;
> > 
> > Do we need to take offset into account? The code before wasn't using it.
> 
> Yes, if we hit an alias symbol (zero-size), we forcibly increment address
> and retry it. In that case, offset will be 1.
> 
> > 
> > > + INIT_LIST_HEAD(>list);
> > > + list_add_tail(>list, _blacklist);
> > > +
> > > + return (int)size;
> > > +}
> > > +
> > > +/* Add functions in arch defined probe-prohibited area */
> > > +int __weak arch_populate_kprobe_blacklist(void)
> > > +{
> > > + unsigned long entry;
> > > + int ret = 0;
> > > +
> > > + for (entry = (unsigned long)__kprobes_text_start;
> > > +  entry < (unsigned long)__kprobes_text_end;
> > > +  entry += ret) {
> > > + ret = kprobe_add_ksym_blacklist(entry);
> > > + if (ret < 0)
> > > + return ret;
> > > + if (ret == 0)   /* In case of alias symbol */
> > > + ret = 1;
> 
> Here, we incremented.
> 
> Thank you,

Makes sense, thanks for the clarification.

-Andrea


[PATCH] ARC: Remove 0x prefix from unit-address of node names

2018-12-07 Thread Alexey Brodkin
Done automatically with help of:
--->8
sed -i 's/@0x/@/g' arch/arc/boot/dts/*.dts*
--->8

Inspired by [1] and the like.

[1] http://kisskb.ellerman.id.au/kisskb/buildresult/13612017/

Signed-off-by: Alexey Brodkin 
Cc: Rob Herring 

Signed-off-by: Alexey Brodkin 
---
 arch/arc/boot/dts/abilis_tb10x.dtsi   |  4 ++--
 arch/arc/boot/dts/axc001.dtsi |  6 +++---
 arch/arc/boot/dts/axc003.dtsi | 14 +++---
 arch/arc/boot/dts/axc003_idu.dtsi | 14 +++---
 arch/arc/boot/dts/axs10x_mb.dtsi  | 22 +++---
 arch/arc/boot/dts/vdk_axc003.dtsi |  4 ++--
 arch/arc/boot/dts/vdk_axc003_idu.dtsi |  4 ++--
 arch/arc/boot/dts/vdk_axs10x_mb.dtsi  | 14 +++---
 8 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/arch/arc/boot/dts/abilis_tb10x.dtsi 
b/arch/arc/boot/dts/abilis_tb10x.dtsi
index 3121536b25a3..77ac1368a65b 100644
--- a/arch/arc/boot/dts/abilis_tb10x.dtsi
+++ b/arch/arc/boot/dts/abilis_tb10x.dtsi
@@ -178,7 +178,7 @@
clocks = <_clk>;
};
 
-   spi0: spi@0xFE01 {
+   spi0: spi@FE01 {
#address-cells = <1>;
#size-cells = <0>;
cell-index = <0>;
@@ -189,7 +189,7 @@
interrupts = <26 8>;
clocks = <_clk>;
};
-   spi1: spi@0xFE011000 {
+   spi1: spi@FE011000 {
#address-cells = <1>;
#size-cells = <0>;
cell-index = <1>;
diff --git a/arch/arc/boot/dts/axc001.dtsi b/arch/arc/boot/dts/axc001.dtsi
index fdc266504ada..37be3bf03ad6 100644
--- a/arch/arc/boot/dts/axc001.dtsi
+++ b/arch/arc/boot/dts/axc001.dtsi
@@ -41,7 +41,7 @@
 * this GPIO block ORs all interrupts on CPU card (creg,..)
 * to uplink only 1 IRQ to ARC core intc
 */
-   dw-apb-gpio@0x2000 {
+   dw-apb-gpio@2000 {
compatible = "snps,dw-apb-gpio";
reg = < 0x2000 0x80 >;
#address-cells = <1>;
@@ -60,7 +60,7 @@
};
};
 
-   debug_uart: dw-apb-uart@0x5000 {
+   debug_uart: dw-apb-uart@5000 {
compatible = "snps,dw-apb-uart";
reg = <0x5000 0x100>;
clock-frequency = <3000>;
@@ -88,7 +88,7 @@
 * avoid duplicating the MB dtsi file given that IRQ from
 * this intc to cpu intc are different for axs101 and axs103
 */
-   mb_intc: dw-apb-ictl@0xe0012000 {
+   mb_intc: dw-apb-ictl@e0012000 {
#interrupt-cells = <1>;
compatible = "snps,dw-apb-ictl";
reg = < 0x0 0xe0012000 0x0 0x200 >;
diff --git a/arch/arc/boot/dts/axc003.dtsi b/arch/arc/boot/dts/axc003.dtsi
index d75d65ddf8e3..ffa208a57a0a 100644
--- a/arch/arc/boot/dts/axc003.dtsi
+++ b/arch/arc/boot/dts/axc003.dtsi
@@ -55,7 +55,7 @@
 * this GPIO block ORs all interrupts on CPU card (creg,..)
 * to uplink only 1 IRQ to ARC core intc
 */
-   dw-apb-gpio@0x2000 {
+   dw-apb-gpio@2000 {
compatible = "snps,dw-apb-gpio";
reg = < 0x2000 0x80 >;
#address-cells = <1>;
@@ -74,7 +74,7 @@
};
};
 
-   debug_uart: dw-apb-uart@0x5000 {
+   debug_uart: dw-apb-uart@5000 {
compatible = "snps,dw-apb-uart";
reg = <0x5000 0x100>;
clock-frequency = <3000>;
@@ -102,19 +102,19 @@
 * external DMA buffer located outside of IOC aperture.
 */
axs10x_mb {
-   ethernet@0x18000 {
+   ethernet@18000 {
dma-coherent;
};
 
-   ehci@0x4 {
+   ehci@4 {
dma-coherent;
};
 
-   ohci@0x6 {
+   ohci@6 {
dma-coherent;
};
 
-   mmc@0x15000 {
+   mmc@15000 {
dma-coherent;
};
};
@@ -132,7 +132,7 @@
 * avoid duplicating the MB dtsi file given that IRQ from
 * this intc to cpu intc are different for axs101 and axs103
 */
-   mb_intc: dw-apb-ictl@0xe0012000 {
+   mb_intc: dw-apb-ictl@e0012000 {
#interrupt-cells = <1>;
compatible = "snps,dw-apb-ictl";
reg = < 0x0 0xe0012000 0x0 0x200 >;
diff --git a/arch/arc/boot/dts/axc003_idu.dtsi 
b/arch/arc/boot/dts/axc003_idu.dtsi
index a05bb737ea63..ad795748ac63 100644
--- 

Re: [PATCH 1/5] kconfig: remove unneeded setsym label in conf_read_simple()

2018-12-07 Thread Masahiro Yamada
On Fri, Nov 30, 2018 at 6:17 PM Masahiro Yamada
 wrote:
>
> The two 'goto setsym' statements are reachable only when sym == NULL.
>
> The code below the 'setsym:' label does nothing when sym == NULL
> since there is just one if-block guarded by 'if (sym && ...)'.
>
> Hence, 'goto setsym' can be replaced with 'continue'.
>
> Signed-off-by: Masahiro Yamada 
> ---

Series, applied to linux-kbuild/kconfig.



>  scripts/kconfig/confdata.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
> index 91d0a5c..1e35529 100644
> --- a/scripts/kconfig/confdata.c
> +++ b/scripts/kconfig/confdata.c
> @@ -363,7 +363,7 @@ int conf_read_simple(const char *name, int def)
> sym = sym_find(line + 2 + strlen(CONFIG_));
> if (!sym) {
> sym_add_change_count(1);
> -   goto setsym;
> +   continue;
> }
> } else {
> sym = sym_lookup(line + 2 + strlen(CONFIG_), 
> 0);
> @@ -397,7 +397,7 @@ int conf_read_simple(const char *name, int def)
> sym = sym_find(line + strlen(CONFIG_));
> if (!sym) {
> sym_add_change_count(1);
> -   goto setsym;
> +   continue;
> }
> } else {
> sym = sym_lookup(line + strlen(CONFIG_), 0);
> @@ -416,7 +416,7 @@ int conf_read_simple(const char *name, int def)
>
> continue;
> }
> -setsym:
> +
> if (sym && sym_is_choice_value(sym)) {
> struct symbol *cs = 
> prop_get_symbol(sym_get_choice_prop(sym));
> switch (sym->def[def].tri) {
> --
> 2.7.4
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH 1/5] kconfig: remove unneeded setsym label in conf_read_simple()

2018-12-07 Thread Masahiro Yamada
On Fri, Nov 30, 2018 at 6:17 PM Masahiro Yamada
 wrote:
>
> The two 'goto setsym' statements are reachable only when sym == NULL.
>
> The code below the 'setsym:' label does nothing when sym == NULL
> since there is just one if-block guarded by 'if (sym && ...)'.
>
> Hence, 'goto setsym' can be replaced with 'continue'.
>
> Signed-off-by: Masahiro Yamada 
> ---

Series, applied to linux-kbuild/kconfig.



>  scripts/kconfig/confdata.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
> index 91d0a5c..1e35529 100644
> --- a/scripts/kconfig/confdata.c
> +++ b/scripts/kconfig/confdata.c
> @@ -363,7 +363,7 @@ int conf_read_simple(const char *name, int def)
> sym = sym_find(line + 2 + strlen(CONFIG_));
> if (!sym) {
> sym_add_change_count(1);
> -   goto setsym;
> +   continue;
> }
> } else {
> sym = sym_lookup(line + 2 + strlen(CONFIG_), 
> 0);
> @@ -397,7 +397,7 @@ int conf_read_simple(const char *name, int def)
> sym = sym_find(line + strlen(CONFIG_));
> if (!sym) {
> sym_add_change_count(1);
> -   goto setsym;
> +   continue;
> }
> } else {
> sym = sym_lookup(line + strlen(CONFIG_), 0);
> @@ -416,7 +416,7 @@ int conf_read_simple(const char *name, int def)
>
> continue;
> }
> -setsym:
> +
> if (sym && sym_is_choice_value(sym)) {
> struct symbol *cs = 
> prop_get_symbol(sym_get_choice_prop(sym));
> switch (sym->def[def].tri) {
> --
> 2.7.4
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH] kbuild: move .SECONDARY special target to Kbuild.include

2018-12-07 Thread Masahiro Yamada
On Sat, Dec 1, 2018 at 9:29 AM Masahiro Yamada
 wrote:
>
> In commit 54a702f70589 ("kbuild: mark $(targets) as .SECONDARY and
> remove .PRECIOUS markers"), I missed one important feature of the
> .SECONDARY target:
>
>   .SECONDARY with no prerequisites causes all targets to be treated
>as secondary.
>
> Kbuild's policy is, "Do not delete any intermediate files."
> So, just move it to scripts/Kbuild.include, with no prerequisites.
>
> Note:
> If an intermediate file is generated by $(call if_changed,...), you
> still need to add it to "targets" so its .*.cmd file is included.
>
> The arm/arm64 crypto files are generated by $(call cmd,shipped),
> so they do not need to be added to "targets", but need to be added
> to "clean-files" so "make clean" can properly clean them away.
>
> Signed-off-by: Masahiro Yamada 
> ---


Applied to linux-kbuild.



>  arch/arm/crypto/Makefile   | 2 +-
>  arch/arm64/crypto/Makefile | 2 +-
>  scripts/Kbuild.include | 3 +++
>  scripts/Makefile.build | 4 
>  4 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
> index bd5bcee..89f88ab 100644
> --- a/arch/arm/crypto/Makefile
> +++ b/arch/arm/crypto/Makefile
> @@ -65,4 +65,4 @@ $(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl
> $(call cmd,perl)
>  endif
>
> -targets += sha256-core.S sha512-core.S
> +clean-files += sha256-core.S sha512-core.S
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index f476fed..860d931 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -75,4 +75,4 @@ $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl
> $(call cmd,perlasm)
>  endif
>
> -targets += sha256-core.S sha512-core.S
> +clean-files += sha256-core.S sha512-core.S
> diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
> index bb01555..0454916 100644
> --- a/scripts/Kbuild.include
> +++ b/scripts/Kbuild.include
> @@ -393,3 +393,6 @@ endef
>
>  # delete partially updated (i.e. corrupted) files on error
>  .DELETE_ON_ERROR:
> +
> +# do not delete intermediate files automatically
> +.SECONDARY:
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index a8e7ba9..604096a 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -546,8 +546,4 @@ $(shell mkdir -p $(obj-dirs))
>  endif
>  endif
>
> -# Some files contained in $(targets) are intermediate artifacts.
> -# We never want them to be removed automatically.
> -.SECONDARY: $(targets)
> -
>  .PHONY: $(PHONY)
> --
> 2.7.4
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH 6/7] microblaze: fix race condition in building boot images

2018-12-07 Thread Masahiro Yamada
On Thu, Dec 6, 2018 at 1:32 AM Michal Simek  wrote:
>
> On 03. 12. 18 8:50, Masahiro Yamada wrote:
> > I fixed a race condition in the parallel building of ARM in commit
> > 3939f3345050 ("ARM: 8418/1: add boot image dependencies to not
> > generate invalid images").
> >
> > I see the same problem for MicroBlaze too.
> >
> > "make -j ARCH=microblaze all linux.bin.ub" results in a broken build
> > since two threads descend into arch/microblaze/boot simultaneously.
>
> I see also different problem that when I run that make above
> linux.bin.ub is not generated at all.


That's why I am fixing the problem.

Before the fix, the parallel build fails.


/usr/bin/mkimage: Can't read arch/microblaze/boot/linux.bin: Invalid argument
arch/microblaze/boot/Makefile:14: recipe for target
'arch/microblaze/boot/linux.bin.ub' failed
make[1]: *** [arch/microblaze/boot/linux.bin.ub] Error 1
make[1]: *** Deleting file 'arch/microblaze/boot/linux.bin.ub'
arch/microblaze/Makefile:87: recipe for target 'linux.bin.ub' failed
make: *** [linux.bin.ub] Error 2
make: *** Waiting for unfinished jobs
  MODPOST 6 modules
Kernel: arch/microblaze/boot/linux.bin is ready  (#2)





>
> >
> > Add proper dependencies to avoid it.
> >
> > Signed-off-by: Masahiro Yamada 
> > ---
> >
> >  arch/microblaze/Makefile | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/microblaze/Makefile b/arch/microblaze/Makefile
> > index 7a5df02..544796d 100644
> > --- a/arch/microblaze/Makefile
> > +++ b/arch/microblaze/Makefile
> > @@ -79,13 +79,15 @@ all: linux.bin
> >  archclean:
> >   $(Q)$(MAKE) $(clean)=$(boot)
> >
> > +linux.bin.ub linux.bin.gz: linux.bin
> > +
> >  PHONY += linux.bin linux.bin.gz linux.bin.ub
> >  linux.bin linux.bin.gz linux.bin.ub: vmlinux
> >   $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
> >   @echo 'Kernel: $(boot)/$@ is ready' ' (#'`cat .version`')'
> >
> >  PHONY += simpleImage.$(DTB)
> > -simpleImage.$(DTB): vmlinux
> > +simpleImage.$(DTB): linux.bin.ub
>
> It looks weird that simpleImage requires linux.bin.ub.
> Is it really linux.bin.ub here or just linux.bin?


This is intentional to avoid a race in
"make simpleImage. linux.bin.ub"


But, I chose to make simpleImage* independent of linux.bin* in v2.

I hope you will like it.



> >   $(Q)$(MAKE) $(build)=$(boot) simple_images
> >   @echo 'Kernel: $(boot)/$@ is ready' ' (#'`cat .version`')'
> >
> >
>
>
> Thanks,
> Michal
>
> --
> Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
> w: www.monstr.eu p: +42-0-721842854
> Maintainer of Linux kernel - Xilinx Microblaze
> Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs
> U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP/Versal SoCs
>
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH 6/7] microblaze: fix race condition in building boot images

2018-12-07 Thread Masahiro Yamada
On Thu, Dec 6, 2018 at 1:32 AM Michal Simek  wrote:
>
> On 03. 12. 18 8:50, Masahiro Yamada wrote:
> > I fixed a race condition in the parallel building of ARM in commit
> > 3939f3345050 ("ARM: 8418/1: add boot image dependencies to not
> > generate invalid images").
> >
> > I see the same problem for MicroBlaze too.
> >
> > "make -j ARCH=microblaze all linux.bin.ub" results in a broken build
> > since two threads descend into arch/microblaze/boot simultaneously.
>
> I see also different problem that when I run that make above
> linux.bin.ub is not generated at all.


That's why I am fixing the problem.

Before the fix, the parallel build fails.


/usr/bin/mkimage: Can't read arch/microblaze/boot/linux.bin: Invalid argument
arch/microblaze/boot/Makefile:14: recipe for target
'arch/microblaze/boot/linux.bin.ub' failed
make[1]: *** [arch/microblaze/boot/linux.bin.ub] Error 1
make[1]: *** Deleting file 'arch/microblaze/boot/linux.bin.ub'
arch/microblaze/Makefile:87: recipe for target 'linux.bin.ub' failed
make: *** [linux.bin.ub] Error 2
make: *** Waiting for unfinished jobs
  MODPOST 6 modules
Kernel: arch/microblaze/boot/linux.bin is ready  (#2)





>
> >
> > Add proper dependencies to avoid it.
> >
> > Signed-off-by: Masahiro Yamada 
> > ---
> >
> >  arch/microblaze/Makefile | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/microblaze/Makefile b/arch/microblaze/Makefile
> > index 7a5df02..544796d 100644
> > --- a/arch/microblaze/Makefile
> > +++ b/arch/microblaze/Makefile
> > @@ -79,13 +79,15 @@ all: linux.bin
> >  archclean:
> >   $(Q)$(MAKE) $(clean)=$(boot)
> >
> > +linux.bin.ub linux.bin.gz: linux.bin
> > +
> >  PHONY += linux.bin linux.bin.gz linux.bin.ub
> >  linux.bin linux.bin.gz linux.bin.ub: vmlinux
> >   $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
> >   @echo 'Kernel: $(boot)/$@ is ready' ' (#'`cat .version`')'
> >
> >  PHONY += simpleImage.$(DTB)
> > -simpleImage.$(DTB): vmlinux
> > +simpleImage.$(DTB): linux.bin.ub
>
> It looks weird that simpleImage requires linux.bin.ub.
> Is it really linux.bin.ub here or just linux.bin?


This is intentional to avoid a race in
"make simpleImage. linux.bin.ub"


But, I chose to make simpleImage* independent of linux.bin* in v2.

I hope you will like it.



> >   $(Q)$(MAKE) $(build)=$(boot) simple_images
> >   @echo 'Kernel: $(boot)/$@ is ready' ' (#'`cat .version`')'
> >
> >
>
>
> Thanks,
> Michal
>
> --
> Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
> w: www.monstr.eu p: +42-0-721842854
> Maintainer of Linux kernel - Xilinx Microblaze
> Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs
> U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP/Versal SoCs
>
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH v9 2/3] x86: add support for Huawei WMI hotkeys.

2018-12-07 Thread ayman . bagabas
On Fri, 2018-12-07 at 00:52 -0500, ayman.baga...@gmail.com wrote:
On Mon, 2018-12-03 at 21:17 +0200, Andy Shevchenko wrote:
> On Mon, Dec 3, 2018 at 9:04 PM Takashi Iwai  wrote:
> > On Mon, 03 Dec 2018 19:53:39 +0100,
> > Ayman Bagabas wrote:
> > > + if (code == 0x80) {
> > > + acpi_status status;
> > > + acpi_handle handle;
> > > + unsigned long long result;
> > > + union acpi_object args[1];
> > > + struct acpi_object_list arg_list = {
> > > + .pointer = args,
> > > + .count = ARRAY_SIZE(args),
> > > + };
> > > +
> > > + args[0].type = ACPI_TYPE_INTEGER;
> > > + args[0].integer.value = 0;
> > > +
> > > + status = acpi_get_handle(NULL, "\\WMI0", );
> > > + if (ACPI_FAILURE(status)) {
> > > + dev_err(>dev, "Unable to get ACPI
> > > handle\n");
> > > + return;
> > > + }
> > > +
> > > + status = acpi_evaluate_integer(NULL, "WQ00",
> > > _list, );
> > 
> > I guess you need to pass handle here?  In the earlier version, you
> > passed \\WMI0.WQ00, so it worked with NULL handle.  But now it's no
> > longer so...
> 
> I think in this case we don't need to have a separate call to get
> handle and try to get integer directly. In either we will have an
> error case if method  / namespace / etc is not found.
> 
> 

I was digging further into the DSDT table of the laptop that has WQ00
method and it turned out that this method is a ACPI_WMI_EXPENSIVE. It
has its own GUID of "39142400-C6A3-40fa-BADB-8A2652834100". I believe
wmi_query_block, which is deprecated BTW, can be used to access this
method instead of acpi_evaluate. However, I don't have a machine to
test it on. Any ideas?




Re: [PATCH v9 2/3] x86: add support for Huawei WMI hotkeys.

2018-12-07 Thread ayman . bagabas
On Fri, 2018-12-07 at 00:52 -0500, ayman.baga...@gmail.com wrote:
On Mon, 2018-12-03 at 21:17 +0200, Andy Shevchenko wrote:
> On Mon, Dec 3, 2018 at 9:04 PM Takashi Iwai  wrote:
> > On Mon, 03 Dec 2018 19:53:39 +0100,
> > Ayman Bagabas wrote:
> > > + if (code == 0x80) {
> > > + acpi_status status;
> > > + acpi_handle handle;
> > > + unsigned long long result;
> > > + union acpi_object args[1];
> > > + struct acpi_object_list arg_list = {
> > > + .pointer = args,
> > > + .count = ARRAY_SIZE(args),
> > > + };
> > > +
> > > + args[0].type = ACPI_TYPE_INTEGER;
> > > + args[0].integer.value = 0;
> > > +
> > > + status = acpi_get_handle(NULL, "\\WMI0", );
> > > + if (ACPI_FAILURE(status)) {
> > > + dev_err(>dev, "Unable to get ACPI
> > > handle\n");
> > > + return;
> > > + }
> > > +
> > > + status = acpi_evaluate_integer(NULL, "WQ00",
> > > _list, );
> > 
> > I guess you need to pass handle here?  In the earlier version, you
> > passed \\WMI0.WQ00, so it worked with NULL handle.  But now it's no
> > longer so...
> 
> I think in this case we don't need to have a separate call to get
> handle and try to get integer directly. In either we will have an
> error case if method  / namespace / etc is not found.
> 
> 

I was digging further into the DSDT table of the laptop that has WQ00
method and it turned out that this method is a ACPI_WMI_EXPENSIVE. It
has its own GUID of "39142400-C6A3-40fa-BADB-8A2652834100". I believe
wmi_query_block, which is deprecated BTW, can be used to access this
method instead of acpi_evaluate. However, I don't have a machine to
test it on. Any ideas?




Re: [PATCH 1/2] um: remove -fno-unit-at-a-time workaround for pre-4.0 GCC

2018-12-07 Thread Masahiro Yamada
x86 maintainers,


Ping.



On Tue, Nov 13, 2018 at 6:48 PM Richard Weinberger  wrote:
>
> Am Montag, 12. November 2018, 03:35:19 CET schrieb Masahiro Yamada:
> > Commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
> > bumped the minimum GCC version to 4.6 for all architectures.
> >
> > '$(call cc-option,-fno-unit-at-a-time)' is now dead code since
> > '$(cc-version) -lt 0400' is always false.
> >
> > Signed-off-by: Masahiro Yamada 
> > ---
> >
> >  arch/x86/Makefile.um | 8 ++--
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/Makefile.um b/arch/x86/Makefile.um
> > index 91085a0..577976b 100644
> > --- a/arch/x86/Makefile.um
> > +++ b/arch/x86/Makefile.um
> > @@ -26,12 +26,8 @@ cflags-y += $(call 
> > cc-option,-mpreferred-stack-boundary=2)
> >  # an unresolved reference.
> >  cflags-y += -ffreestanding
> >
> > -# Disable unit-at-a-time mode on pre-gcc-4.0 compilers, it makes gcc use
> > -# a lot more stack due to the lack of sharing of stacklots.  Also, gcc
> > -# 4.3.0 needs -funit-at-a-time for extern inline functions.
> > -KBUILD_CFLAGS += $(shell if [ $(cc-version) -lt 0400 ] ; then \
> > - echo $(call cc-option,-fno-unit-at-a-time); \
> > - else echo $(call cc-option,-funit-at-a-time); fi ;)
> > +# gcc 4.3.0 needs -funit-at-a-time for extern inline functions.
> > +KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
>
> Acked-by: Richard Weinberger 
>
> Thanks,
> //richard
>
>
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH 2/2] x86, powerpc: remove -funit-at-a-time compiler option entirely

2018-12-07 Thread Masahiro Yamada
x86 maintainers,

Ping.



On Mon, Nov 12, 2018 at 8:23 PM Michael Ellerman  wrote:
>
> Masahiro Yamada  writes:
>
> > GCC 4.6 manual says:
> >
> > -funit-at-a-time
> >   This option is left for compatibility reasons. -funit-at-a-time has
> >   no effect, while -fno-unit-at-a-time implies -fno-toplevel-reorder
> >   and -fno-section-anchors.
> >   Enabled by default.
> >
> > Signed-off-by: Masahiro Yamada 
> > ---
> >
> >  arch/powerpc/Makefile | 4 
> >  arch/x86/Makefile | 4 
> >  arch/x86/Makefile.um  | 5 -
> >  3 files changed, 13 deletions(-)
> >
> > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> > index 8a2ce14..854199c 100644
> > --- a/arch/powerpc/Makefile
> > +++ b/arch/powerpc/Makefile
> > @@ -228,10 +228,6 @@ KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> >  KBUILD_CFLAGS += $(call cc-option,-mno-spe)
> >  KBUILD_CFLAGS += $(call cc-option,-mspe=no)
> >
> > -# Enable unit-at-a-time mode when possible. It shrinks the
> > -# kernel considerably.
> > -KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
> > -
>
> Thanks for cleaning it up.
>
> Acked-by: Michael Ellerman 
>
> cheers



-- 
Best Regards
Masahiro Yamada


[RFC PATCH] hwmon/k10temp: Add Hygon Dhyana support

2018-12-07 Thread Pu Wen
Add support for Hygon Dhyana family 18h processor for k10temp to get the
temperature. As Hygon Dhyana shares the same function interface with AMD
family 17h, so add Hygon PCI Vendor ID and reuse the code path of AMD.

Signed-off-by: Pu Wen 
---
 drivers/hwmon/k10temp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 2cef0c3..e24ba10 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -330,7 +330,7 @@ static int k10temp_probe(struct pci_dev *pdev,
 (boot_cpu_data.x86_model & 0xf0) == 0x70)) {
data->read_htcreg = read_htcreg_nb_f15;
data->read_tempreg = read_tempreg_nb_f15;
-   } else if (boot_cpu_data.x86 == 0x17) {
+   } else if (boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) {
data->temp_adjust_mask = 0x8;
data->read_tempreg = read_tempreg_nb_f17;
data->show_tdie = true;
@@ -367,6 +367,7 @@ static const struct pci_device_id k10temp_id_table[] = {
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_DF_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_M10H_DF_F3) },
+   { PCI_VDEVICE(HYGON, PCI_DEVICE_ID_AMD_17H_DF_F3) },
{}
 };
 MODULE_DEVICE_TABLE(pci, k10temp_id_table);
-- 
2.7.4



Re: [PATCH] nvme-rdma: complete requests from ->timeout

2018-12-07 Thread Jaesoo Lee
Now, I see that my patch is not safe and can cause double completions.
However, I am having a hard time finding out a good solution to
barrier the racing completions.

Could you suggest where the fix should go and what should it look
like? We can provide more details on reproducing this issue if that
helps.

On Fri, Dec 7, 2018 at 6:04 PM Keith Busch  wrote:
>
> On Fri, Dec 07, 2018 at 12:05:37PM -0800, Sagi Grimberg wrote:
> >
> > > Could you please take a look at this bug and code review?
> > >
> > > We are seeing more instances of this bug and found that reconnect_work
> > > could hang as well, as can be seen from below stacktrace.
> > >
> > >   Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
> > >   Call Trace:
> > >   __schedule+0x2ab/0x880
> > >   schedule+0x36/0x80
> > >   schedule_timeout+0x161/0x300
> > >   ? __next_timer_interrupt+0xe0/0xe0
> > >   io_schedule_timeout+0x1e/0x50
> > >   wait_for_completion_io_timeout+0x130/0x1a0
> > >   ? wake_up_q+0x80/0x80
> > >   blk_execute_rq+0x6e/0xa0
> > >   __nvme_submit_sync_cmd+0x6e/0xe0
> > >   nvmf_connect_admin_queue+0x128/0x190 [nvme_fabrics]
> > >   ? wait_for_completion_interruptible_timeout+0x157/0x1b0
> > >   nvme_rdma_start_queue+0x5e/0x90 [nvme_rdma]
> > >   nvme_rdma_setup_ctrl+0x1b4/0x730 [nvme_rdma]
> > >   nvme_rdma_reconnect_ctrl_work+0x27/0x70 [nvme_rdma]
> > >   process_one_work+0x179/0x390
> > >   worker_thread+0x4f/0x3e0
> > >   kthread+0x105/0x140
> > >   ? max_active_store+0x80/0x80
> > >   ? kthread_bind+0x20/0x20
> > >
> > > This bug is produced by setting MTU of RoCE interface to '568' for
> > > test while running I/O traffics.
> >
> > I think that with the latest changes from Keith we can no longer rely
> > on blk-mq to barrier racing completions. We will probably need
> > to barrier ourselves in nvme-rdma...
>
> You really need to do that anyway. If you were relying on blk-mq to save
> you from double completions by ending a request in the nvme driver while
> the lower half can still complete the same one, the only thing preventing
> data corruption is the probability the request wasn't reallocated for a
> new command.


Re: [PATCH] nvme-rdma: complete requests from ->timeout

2018-12-07 Thread Jaesoo Lee
Now, I see that my patch is not safe and can cause double completions.
However, I am having a hard time finding out a good solution to
barrier the racing completions.

Could you suggest where the fix should go and what should it look
like? We can provide more details on reproducing this issue if that
helps.

On Fri, Dec 7, 2018 at 6:04 PM Keith Busch  wrote:
>
> On Fri, Dec 07, 2018 at 12:05:37PM -0800, Sagi Grimberg wrote:
> >
> > > Could you please take a look at this bug and code review?
> > >
> > > We are seeing more instances of this bug and found that reconnect_work
> > > could hang as well, as can be seen from below stacktrace.
> > >
> > >   Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
> > >   Call Trace:
> > >   __schedule+0x2ab/0x880
> > >   schedule+0x36/0x80
> > >   schedule_timeout+0x161/0x300
> > >   ? __next_timer_interrupt+0xe0/0xe0
> > >   io_schedule_timeout+0x1e/0x50
> > >   wait_for_completion_io_timeout+0x130/0x1a0
> > >   ? wake_up_q+0x80/0x80
> > >   blk_execute_rq+0x6e/0xa0
> > >   __nvme_submit_sync_cmd+0x6e/0xe0
> > >   nvmf_connect_admin_queue+0x128/0x190 [nvme_fabrics]
> > >   ? wait_for_completion_interruptible_timeout+0x157/0x1b0
> > >   nvme_rdma_start_queue+0x5e/0x90 [nvme_rdma]
> > >   nvme_rdma_setup_ctrl+0x1b4/0x730 [nvme_rdma]
> > >   nvme_rdma_reconnect_ctrl_work+0x27/0x70 [nvme_rdma]
> > >   process_one_work+0x179/0x390
> > >   worker_thread+0x4f/0x3e0
> > >   kthread+0x105/0x140
> > >   ? max_active_store+0x80/0x80
> > >   ? kthread_bind+0x20/0x20
> > >
> > > This bug is produced by setting MTU of RoCE interface to '568' for
> > > test while running I/O traffics.
> >
> > I think that with the latest changes from Keith we can no longer rely
> > on blk-mq to barrier racing completions. We will probably need
> > to barrier ourselves in nvme-rdma...
>
> You really need to do that anyway. If you were relying on blk-mq to save
> you from double completions by ending a request in the nvme driver while
> the lower half can still complete the same one, the only thing preventing
> data corruption is the probability the request wasn't reallocated for a
> new command.


Re: [PATCH 0/7] microblaze: fix various problems in building boot images

2018-12-07 Thread Masahiro Yamada
On Sat, Dec 8, 2018 at 12:20 AM Michal Simek  wrote:
>
> On 07. 12. 18 14:29, Michal Simek wrote:
> > On 07. 12. 18 12:29, Masahiro Yamada wrote:
> >> On Thu, Dec 6, 2018 at 11:55 PM Michal Simek  wrote:
> >>>
> >>> On 03. 12. 18 8:50, Masahiro Yamada wrote:
>  This patch set fixes various issues in microblaze Makefiles.
> 
>  BTW, "simpleImage." works like a phony target to generate the
>  following four images, where the first three are just aliases.
> 
>   - arch/microblaze/boot/simpleImage.:
>   identical to arch/microblaze/boot/linux.bin
> 
>   - arch/microblaze/boot/simpleImage..unstrip:
>   identical to vmlinux
> 
>   - arch/microblaze/boot/simpleImage..ub:
>   identical to arch/microblaze/boot/linux.bin.ub
> 
>   - arch/microblaze/boot/simpleImage..strip:
>   stripped vmlinux
> 
>  I am not sure how much useful those copies are,
>  but, I tried my best to keep the same behavior.
> 
>  IMHO, I guess DTB= would be more sensible,
>  but it is up to Michal.
> 
> 
> 
>  Masahiro Yamada (7):
>    microblaze: fix cleaning of boot images
>    microblaze: adjust the help to the real behavior
>    microblaze: move "... is ready" message to arch/microblaze/Makefile
>    microblaze: fix multiple bugs in arch/microblaze/boot/Makefile
>    microblaze: add linux.bin* and simpleImage.* to PHONY
>    microblaze: fix race condition in building boot images
>    microblaze: remove the unneeded code just in case file copy fails
> 
>   arch/microblaze/Makefile  | 14 +-
>   arch/microblaze/boot/Makefile | 33 +
>   arch/microblaze/boot/dts/Makefile |  5 +
>   3 files changed, 27 insertions(+), 25 deletions(-)
> 
> >>>
> >>> One more thing I have in my mind for a while is that will be good to
> >>> configure kernel build flags from DT and completely get rid of these
> >>> symbols.
> >>>
> >>> XILINX_MICROBLAZE0_USE_MSR_INSTR
> >>> XILINX_MICROBLAZE0_USE_PCMP_INSTR
> >>> XILINX_MICROBLAZE0_USE_BARREL
> >>> XILINX_MICROBLAZE0_USE_DIV
> >>> XILINX_MICROBLAZE0_USE_HW_MUL
> >>> XILINX_MICROBLAZE0_USE_FPU
> >>>
> >>> It means setup CPUFLAGS based on extracting that values from DT that it
> >>> all the time match the hardware.
> >>> It will also simplify all the CPUFLAGS logic which is in
> >>> arch/microblaze/Makefile.
> >>>
> >>> Do you have any idea how this can be done?
> >>
> >>
> >> I have no idea.
> >>
> >> Parsing DTS with sed or something would be possible, but ugly.
> >>
> >> In my opinion, the kernel should be multi platform image,
> >> in other words, it is the least common denominator
> >> of supported platforms.
> >>
> >> So, the kernel should enable all features that may or may not be used
> >> depending on platform.
> >>
> >> I do not know if this works for MB.
> >
> > Microblaze is soft core CPU where you can select if you want to have it
> > with multiplier, divider, barrel shifter, etc.
> > You can of course say that you don't have them and you have "universal"
> > kernel but very slow.
> > That's why user has to setup these configs which are converted to cflags
> > to say GCC what can be used.
> > And these configs can be simply parsed from dt.
> >
> > For example like this based on dtb (quick hack)
> >
> > for i in `echo use-msr-instr use-pcmp-instr use-barrel use-div
> > use-hw-mul use-fpu`; do
> >   UPPER=`echo $i | tr '-' '_' | tr '[a-z]' '[A-Z}'`
> >   echo $i $UPPER;
> >
> >   VAR=`fdtget -t i $FILE/arch/microblaze/boot/dts/system.dtb 
> > /cpus/cpu@0/
> > xlnx,$i`
> >   FLAGS+="CONFIG_XILINX_MICROBLAZE0_${UPPER}=${VAR} "
> > done
> >
> > make $FLAGS
> >
> >
> > When simpleImage is requested dt could be parsed to setup proper build
> > flags.
>
> One more thing I found is that I have done these changes
>
> diff --git a/arch/microblaze/kernel/setup.c b/arch/microblaze/kernel/setup.c
> index bbd6968ce55b..dc6a6fee3ae2 100644
> --- a/arch/microblaze/kernel/setup.c
> +++ b/arch/microblaze/kernel/setup.c
> @@ -153,11 +153,13 @@ void __init machine_early_init(const char
> *cmdline, unsigned int ram,
>  #endif
>
>  #if CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR
> +#error MSR is enabled
> if (msr) {
> pr_info("!!!Your kernel has setup MSR instruction but ");
> pr_cont("CPU don't have it %x\n", msr);
> }
>  #else
> +#error MSR is not enabled
> if (!msr) {
> pr_info("!!!Your kernel not setup MSR instruction but ");
> pr_cont("CPU have it %x\n", msr);
>
> and run MSR with 1
> make defconfig && make CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR=1
> arch/microblaze/kernel/setup.o
> it errors #error MSR is enabled
> and all is good.
>
> And when I run MSR with 0
> make defconfig && make CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR=0
> arch/microblaze/kernel/setup.o
> also getting 

Re: [PATCH 0/7] microblaze: fix various problems in building boot images

2018-12-07 Thread Masahiro Yamada
On Sat, Dec 8, 2018 at 12:20 AM Michal Simek  wrote:
>
> On 07. 12. 18 14:29, Michal Simek wrote:
> > On 07. 12. 18 12:29, Masahiro Yamada wrote:
> >> On Thu, Dec 6, 2018 at 11:55 PM Michal Simek  wrote:
> >>>
> >>> On 03. 12. 18 8:50, Masahiro Yamada wrote:
>  This patch set fixes various issues in microblaze Makefiles.
> 
>  BTW, "simpleImage." works like a phony target to generate the
>  following four images, where the first three are just aliases.
> 
>   - arch/microblaze/boot/simpleImage.:
>   identical to arch/microblaze/boot/linux.bin
> 
>   - arch/microblaze/boot/simpleImage..unstrip:
>   identical to vmlinux
> 
>   - arch/microblaze/boot/simpleImage..ub:
>   identical to arch/microblaze/boot/linux.bin.ub
> 
>   - arch/microblaze/boot/simpleImage..strip:
>   stripped vmlinux
> 
>  I am not sure how much useful those copies are,
>  but, I tried my best to keep the same behavior.
> 
>  IMHO, I guess DTB= would be more sensible,
>  but it is up to Michal.
> 
> 
> 
>  Masahiro Yamada (7):
>    microblaze: fix cleaning of boot images
>    microblaze: adjust the help to the real behavior
>    microblaze: move "... is ready" message to arch/microblaze/Makefile
>    microblaze: fix multiple bugs in arch/microblaze/boot/Makefile
>    microblaze: add linux.bin* and simpleImage.* to PHONY
>    microblaze: fix race condition in building boot images
>    microblaze: remove the unneeded code just in case file copy fails
> 
>   arch/microblaze/Makefile  | 14 +-
>   arch/microblaze/boot/Makefile | 33 +
>   arch/microblaze/boot/dts/Makefile |  5 +
>   3 files changed, 27 insertions(+), 25 deletions(-)
> 
> >>>
> >>> One more thing I have in my mind for a while is that will be good to
> >>> configure kernel build flags from DT and completely get rid of these
> >>> symbols.
> >>>
> >>> XILINX_MICROBLAZE0_USE_MSR_INSTR
> >>> XILINX_MICROBLAZE0_USE_PCMP_INSTR
> >>> XILINX_MICROBLAZE0_USE_BARREL
> >>> XILINX_MICROBLAZE0_USE_DIV
> >>> XILINX_MICROBLAZE0_USE_HW_MUL
> >>> XILINX_MICROBLAZE0_USE_FPU
> >>>
> >>> It means setup CPUFLAGS based on extracting that values from DT that it
> >>> all the time match the hardware.
> >>> It will also simplify all the CPUFLAGS logic which is in
> >>> arch/microblaze/Makefile.
> >>>
> >>> Do you have any idea how this can be done?
> >>
> >>
> >> I have no idea.
> >>
> >> Parsing DTS with sed or something would be possible, but ugly.
> >>
> >> In my opinion, the kernel should be multi platform image,
> >> in other words, it is the least common denominator
> >> of supported platforms.
> >>
> >> So, the kernel should enable all features that may or may not be used
> >> depending on platform.
> >>
> >> I do not know if this works for MB.
> >
> > Microblaze is soft core CPU where you can select if you want to have it
> > with multiplier, divider, barrel shifter, etc.
> > You can of course say that you don't have them and you have "universal"
> > kernel but very slow.
> > That's why user has to setup these configs which are converted to cflags
> > to say GCC what can be used.
> > And these configs can be simply parsed from dt.
> >
> > For example like this based on dtb (quick hack)
> >
> > for i in `echo use-msr-instr use-pcmp-instr use-barrel use-div
> > use-hw-mul use-fpu`; do
> >   UPPER=`echo $i | tr '-' '_' | tr '[a-z]' '[A-Z}'`
> >   echo $i $UPPER;
> >
> >   VAR=`fdtget -t i $FILE/arch/microblaze/boot/dts/system.dtb 
> > /cpus/cpu@0/
> > xlnx,$i`
> >   FLAGS+="CONFIG_XILINX_MICROBLAZE0_${UPPER}=${VAR} "
> > done
> >
> > make $FLAGS
> >
> >
> > When simpleImage is requested dt could be parsed to setup proper build
> > flags.
>
> One more thing I found is that I have done these changes
>
> diff --git a/arch/microblaze/kernel/setup.c b/arch/microblaze/kernel/setup.c
> index bbd6968ce55b..dc6a6fee3ae2 100644
> --- a/arch/microblaze/kernel/setup.c
> +++ b/arch/microblaze/kernel/setup.c
> @@ -153,11 +153,13 @@ void __init machine_early_init(const char
> *cmdline, unsigned int ram,
>  #endif
>
>  #if CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR
> +#error MSR is enabled
> if (msr) {
> pr_info("!!!Your kernel has setup MSR instruction but ");
> pr_cont("CPU don't have it %x\n", msr);
> }
>  #else
> +#error MSR is not enabled
> if (!msr) {
> pr_info("!!!Your kernel not setup MSR instruction but ");
> pr_cont("CPU have it %x\n", msr);
>
> and run MSR with 1
> make defconfig && make CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR=1
> arch/microblaze/kernel/setup.o
> it errors #error MSR is enabled
> and all is good.
>
> And when I run MSR with 0
> make defconfig && make CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR=0
> arch/microblaze/kernel/setup.o
> also getting 

Re: [PATCH] x86/kernel: Fix more -Wmissing-prototypes warnings

2018-12-07 Thread Masami Hiramatsu
On Fri, 7 Dec 2018 20:48:47 +0100
Borislav Petkov  wrote:

> On Fri, Dec 07, 2018 at 11:42:10PM +0900, Masami Hiramatsu wrote:
> > Hmm, I just thought that the symbol only referred from inline asm should
> > be visible. But if it is OK for any version of supported gcc and clang,
> > I'm good to remove it. :-) (IOW, I just concerned about older gcc)
> 
> I just tried two gcc versions: 8.2 and 4.8.5. Both asms looks good:
> 
> 81044690 :
> ...
> 
> 
> 810446ab:   41 57   push   %r15
> 810446ad:   48 89 e7mov%rsp,%rdi
> 810446b0:   e8 db 01 00 00  callq  81044890 
> 
> 
> there's the CALL...
> 
> 810446b5:   48 89 84 24 98 00 00mov%rax,0x98(%rsp)
> 810446bc:   00
> 
> ... and the handler is at the expected address.
> 
> 81044890 :
> 81044890:   e8 ab c9 7b 00  callq  81801240 
> <__fentry__>
> 81044895:   41 57   push   %r15
> 81044897:   41 56   push   %r14
> 81044899:   41 55   push   %r13
> 8104489b:   49 89 fdmov%rdi,%r13
> 
> Dunno, if you feel like there might be some trouble with some compilers,
> I can keep the hunk below as a separate patch and revert it when it
> explodes somewhere...?

OK, then please remove it. :-) Even if any problem occurs, which must be
build error, so we can revert it in that case.

Thank you!

> 
> > Reviewed-by: Masami Hiramatsu 
> > 
> > Thank you,
> > 
> > > 
> > > ---
> > > diff --git a/arch/x86/kernel/kprobes/core.c 
> > > b/arch/x86/kernel/kprobes/core.c
> > > index 6480056d370f..308bf103cc73 100644
> > > --- a/arch/x86/kernel/kprobes/core.c
> > > +++ b/arch/x86/kernel/kprobes/core.c
> > > @@ -66,8 +66,6 @@
> > >  
> > >  #include "common.h"
> > >  
> > > -void *trampoline_handler(struct pt_regs *regs);
> > > -
> > >  DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
> > >  DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
> > >  
> > > @@ -753,7 +751,7 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> > >  /*
> > >   * Called from kretprobe_trampoline
> > >   */
> > > -__visible __used void *trampoline_handler(struct pt_regs *regs)
> > > +static __used void *trampoline_handler(struct pt_regs *regs)
> > >  {
> > >   struct kretprobe_instance *ri = NULL;
> > >   struct hlist_head *head, empty_rp;
> > > 
> > > -- 
> > > Regards/Gruss,
> > > Boris.
> > > 
> > > Good mailing practices for 400: avoid top-posting and trim the reply.
> > 
> > 
> > -- 
> > Masami Hiramatsu 
> 
> -- 
> Regards/Gruss,
> Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.


-- 
Masami Hiramatsu 


Re: [PATCH] x86/kernel: Fix more -Wmissing-prototypes warnings

2018-12-07 Thread Masami Hiramatsu
On Fri, 7 Dec 2018 20:48:47 +0100
Borislav Petkov  wrote:

> On Fri, Dec 07, 2018 at 11:42:10PM +0900, Masami Hiramatsu wrote:
> > Hmm, I just thought that the symbol only referred from inline asm should
> > be visible. But if it is OK for any version of supported gcc and clang,
> > I'm good to remove it. :-) (IOW, I just concerned about older gcc)
> 
> I just tried two gcc versions: 8.2 and 4.8.5. Both asms looks good:
> 
> 81044690 :
> ...
> 
> 
> 810446ab:   41 57   push   %r15
> 810446ad:   48 89 e7mov%rsp,%rdi
> 810446b0:   e8 db 01 00 00  callq  81044890 
> 
> 
> there's the CALL...
> 
> 810446b5:   48 89 84 24 98 00 00mov%rax,0x98(%rsp)
> 810446bc:   00
> 
> ... and the handler is at the expected address.
> 
> 81044890 :
> 81044890:   e8 ab c9 7b 00  callq  81801240 
> <__fentry__>
> 81044895:   41 57   push   %r15
> 81044897:   41 56   push   %r14
> 81044899:   41 55   push   %r13
> 8104489b:   49 89 fdmov%rdi,%r13
> 
> Dunno, if you feel like there might be some trouble with some compilers,
> I can keep the hunk below as a separate patch and revert it when it
> explodes somewhere...?

OK, then please remove it. :-) Even if any problem occurs, which must be
build error, so we can revert it in that case.

Thank you!

> 
> > Reviewed-by: Masami Hiramatsu 
> > 
> > Thank you,
> > 
> > > 
> > > ---
> > > diff --git a/arch/x86/kernel/kprobes/core.c 
> > > b/arch/x86/kernel/kprobes/core.c
> > > index 6480056d370f..308bf103cc73 100644
> > > --- a/arch/x86/kernel/kprobes/core.c
> > > +++ b/arch/x86/kernel/kprobes/core.c
> > > @@ -66,8 +66,6 @@
> > >  
> > >  #include "common.h"
> > >  
> > > -void *trampoline_handler(struct pt_regs *regs);
> > > -
> > >  DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
> > >  DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
> > >  
> > > @@ -753,7 +751,7 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> > >  /*
> > >   * Called from kretprobe_trampoline
> > >   */
> > > -__visible __used void *trampoline_handler(struct pt_regs *regs)
> > > +static __used void *trampoline_handler(struct pt_regs *regs)
> > >  {
> > >   struct kretprobe_instance *ri = NULL;
> > >   struct hlist_head *head, empty_rp;
> > > 
> > > -- 
> > > Regards/Gruss,
> > > Boris.
> > > 
> > > Good mailing practices for 400: avoid top-posting and trim the reply.
> > 
> > 
> > -- 
> > Masami Hiramatsu 
> 
> -- 
> Regards/Gruss,
> Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.


-- 
Masami Hiramatsu 


RE: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller"

2018-12-07 Thread Clark Wang
Hi Joe,

This patch series adds slave mode for lpspi controller. So this driver can 
support both master and slave mode for lpspi controller after apply this patch 
series. Currently, both master mode and slave mode share the code in this 
driver. Therefore, using spi_master to represent the structure of slave mode 
will cause confusion.

When Geert Uytterhoeven add the slave support for spi, he had done " Generalize 
SPI 'master' to 'controller'" step, too. The commit ID is 
8caab75fd2c2a92667cbb1cd315720bede3feaa9.

Thank you.

Regards,
Clark Wang

> -Original Message-
> From: Joe Perches 
> Sent: Saturday, December 8, 2018 13:14
> To: Clark Wang ; broo...@kernel.org
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller"
> 
> On Fri, 2018-12-07 at 02:50 +, Clark Wang wrote:
> > In order to enable the slave mode and make the code more readable,
> > replace all related structure names and object names which is named
> > "master" with "controller".
> 
> In what sense does this make the code more readable?
> 



RE: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller"

2018-12-07 Thread Clark Wang
Hi Joe,

This patch series adds slave mode for lpspi controller. So this driver can 
support both master and slave mode for lpspi controller after apply this patch 
series. Currently, both master mode and slave mode share the code in this 
driver. Therefore, using spi_master to represent the structure of slave mode 
will cause confusion.

When Geert Uytterhoeven add the slave support for spi, he had done " Generalize 
SPI 'master' to 'controller'" step, too. The commit ID is 
8caab75fd2c2a92667cbb1cd315720bede3feaa9.

Thank you.

Regards,
Clark Wang

> -Original Message-
> From: Joe Perches 
> Sent: Saturday, December 8, 2018 13:14
> To: Clark Wang ; broo...@kernel.org
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller"
> 
> On Fri, 2018-12-07 at 02:50 +, Clark Wang wrote:
> > In order to enable the slave mode and make the code more readable,
> > replace all related structure names and object names which is named
> > "master" with "controller".
> 
> In what sense does this make the code more readable?
> 



[PATCH v5 0/1] signaling processes through pidfds

2018-12-07 Thread Christian Brauner
Hey everyone,

This is v5 of this patchset.
v5 does not introduce any functional changes since none were requested
or required in the thread. Instead, it focusses on updated documentation
making it very clear what the intentions are how to extend this syscall.

Eric, I dragged Serge into this and we went through the mails and tried
to very thoroughly address your concerns about whether the width of the
target should depend on flags or file descriptor types. As far as we
understand from the threads this was your ultimate worry and also the
reason why you withheld your agreement to the name of the syscall.
We have outlined how the syscall is intended to be extended and decided
that flags (e.g. PIDFD_TYPE_TID, PIDFD_TYPE_PGID) are the way to go. In
line with this we decided to accept "pidfd_" as prefix for the new
syscall.
All concerns we could identify and understand we tried to address. I
hope this will be sufficient for you to get behind the patch.
The relevant section in the commit message is titled:

/* sending signals to threads (tid) and process groups (pgid) */

Thanks!
Christian

Christian Brauner (1):
  signal: add pidfd_send_signal() syscall

 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/proc/base.c |  20 +++-
 include/linux/proc_fs.h|  12 +++
 include/linux/syscalls.h   |   3 +
 include/uapi/asm-generic/unistd.h  |   4 +-
 kernel/signal.c| 141 +++--
 7 files changed, 173 insertions(+), 9 deletions(-)

-- 
2.19.1



[PATCH v5 1/1] signal: add pidfd_send_signal() syscall

2018-12-07 Thread Christian Brauner
The kill() syscall operates on process identifiers (pid). After a process
has exited its pid can be reused by another process. If a caller sends a
signal to a reused pid it will end up signaling the wrong process. This
issue has often surfaced and there has been a push to address this problem [1].

This patch uses file descriptors (fd) from proc/ as stable handles on
struct pid. Even if a pid is recycled the handle will not change. The fd
can be used to send signals to the process it refers to.
Thus, the new syscall pidfd_send_signal() is introduced to solve this
problem. Instead of pids it operates on process fds (pidfd).

/* prototype and argument /*
long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags);

In addition to the pidfd and signal argument it takes an additional
siginfo_t and flags argument. If the siginfo_t argument is NULL then
pidfd_send_signal() is equivalent to kill(, ). If it
is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo().
The flags argument is added to allow for future extensions of this syscall.
It currently needs to be passed as 0. Failing to do so will cause EINVAL.

/* pidfd_send_signal() replaces multiple pid-based syscalls */
The pidfd_send_signal() syscall currently takes on the job of
rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a
positive pid is passed to kill(2). It will however be possible to also
replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended.

/* sending signals to threads (tid) and process groups (pgid) */
Specifically, the pidfd_send_signal() syscall does currently not operate on
process groups or threads. This is left for future extensions.
In order to extend the syscall to allow sending signal to threads and
process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and
PIDFD_TYPE_TID) should be added. This implies that the flags argument will
determine what is signaled and not the file descriptor itself. Put in other
words, grouping in this api is a property of the flags argument not a
property of the file descriptor (cf. [13]).
When appropriate extensions through the flags argument are added then
pidfd_send_signal() can additionally replace the part of kill(2) which
operates on process groups as well as the tgkill(2) and
rt_tgsigqueueinfo(2) syscalls.
How such an extension could be implemented has been very roughly sketched
in [14], [15], and [16]. However, this should not be taken as a commitment
to a particular implementation. There might be better ways to do it.
Right now this is intentionally left out to keep this patchset as simple as
possible (cf. [4]). For example, if a pidfd for a tid from
/proc//task/ is passed EOPNOTSUPP will be returned to give
userspace a way to detect when I add support for signaling to threads (cf. 
[10]).

/* naming */
The syscall had various names throughout iterations of this patchset:
- procfd_signal()
- procfd_send_signal()
- taskfd_send_signal()
In the last round of reviews it was pointed out that given that if the
flags argument decides the scope of the signal instead of different types
of fds it might make sense to either settle for "procfd_" or "pidfd_" as
prefix. The community was willing to accept either (cf. [17] and [18]).
Given that one developer expressed strong preference for the "pidfd_"
prefix (cf. [13] and with other developers less opinionated about the name
we should settle for "pidfd_" to avoid further bikeshedding.

The  "_send_signal" suffix was chosen to reflect the fact that the syscall
takes on the job of multiple syscalls. It is therefore intentional that the
name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the
fomer because it might imply that pidfd_send_signal() is a replacement for
kill(2), and not the latter because it is a hassle to remember the correct
spelling - especially for non-native speakers - and because it is not
descriptive enough of what the syscall actually does. The name
"pidfd_send_signal" makes it very clear that its job is to send signals.

/* O_PATH file descriptors */
pidfds opened as O_PATH fds cannot be used to send signals to a process
(cf. [2]). Signaling processes through pidfds is the equivalent of writing
to a file. Thus, this is not an operation that operates "purely at the file
descriptor level" as required by the open(2) manpage.

/* zombies */
Zombies can be signaled just as any other process. No special error will be
reported since a zombie state is an unreliable state (cf. [3]). However,
this can be added as an extension through the @flags argument if the need
ever arises.

/* cross-namespace signals */
The patch currently enforces that the signaler and signalee either are in
the same pid namespace or that the signaler's pid namespace is an ancestor
of the signalee's pid namespace. This is done for the sake of simplicity
and because it is unclear to what values certain members of struct
siginfo_t would need to be set to (cf. [5], [6]).

/* compat 

[PATCH v5 0/1] signaling processes through pidfds

2018-12-07 Thread Christian Brauner
Hey everyone,

This is v5 of this patchset.
v5 does not introduce any functional changes since none were requested
or required in the thread. Instead, it focusses on updated documentation
making it very clear what the intentions are how to extend this syscall.

Eric, I dragged Serge into this and we went through the mails and tried
to very thoroughly address your concerns about whether the width of the
target should depend on flags or file descriptor types. As far as we
understand from the threads this was your ultimate worry and also the
reason why you withheld your agreement to the name of the syscall.
We have outlined how the syscall is intended to be extended and decided
that flags (e.g. PIDFD_TYPE_TID, PIDFD_TYPE_PGID) are the way to go. In
line with this we decided to accept "pidfd_" as prefix for the new
syscall.
All concerns we could identify and understand we tried to address. I
hope this will be sufficient for you to get behind the patch.
The relevant section in the commit message is titled:

/* sending signals to threads (tid) and process groups (pgid) */

Thanks!
Christian

Christian Brauner (1):
  signal: add pidfd_send_signal() syscall

 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/proc/base.c |  20 +++-
 include/linux/proc_fs.h|  12 +++
 include/linux/syscalls.h   |   3 +
 include/uapi/asm-generic/unistd.h  |   4 +-
 kernel/signal.c| 141 +++--
 7 files changed, 173 insertions(+), 9 deletions(-)

-- 
2.19.1



[PATCH v5 1/1] signal: add pidfd_send_signal() syscall

2018-12-07 Thread Christian Brauner
The kill() syscall operates on process identifiers (pid). After a process
has exited its pid can be reused by another process. If a caller sends a
signal to a reused pid it will end up signaling the wrong process. This
issue has often surfaced and there has been a push to address this problem [1].

This patch uses file descriptors (fd) from proc/ as stable handles on
struct pid. Even if a pid is recycled the handle will not change. The fd
can be used to send signals to the process it refers to.
Thus, the new syscall pidfd_send_signal() is introduced to solve this
problem. Instead of pids it operates on process fds (pidfd).

/* prototype and argument /*
long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags);

In addition to the pidfd and signal argument it takes an additional
siginfo_t and flags argument. If the siginfo_t argument is NULL then
pidfd_send_signal() is equivalent to kill(, ). If it
is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo().
The flags argument is added to allow for future extensions of this syscall.
It currently needs to be passed as 0. Failing to do so will cause EINVAL.

/* pidfd_send_signal() replaces multiple pid-based syscalls */
The pidfd_send_signal() syscall currently takes on the job of
rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a
positive pid is passed to kill(2). It will however be possible to also
replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended.

/* sending signals to threads (tid) and process groups (pgid) */
Specifically, the pidfd_send_signal() syscall does currently not operate on
process groups or threads. This is left for future extensions.
In order to extend the syscall to allow sending signal to threads and
process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and
PIDFD_TYPE_TID) should be added. This implies that the flags argument will
determine what is signaled and not the file descriptor itself. Put in other
words, grouping in this api is a property of the flags argument not a
property of the file descriptor (cf. [13]).
When appropriate extensions through the flags argument are added then
pidfd_send_signal() can additionally replace the part of kill(2) which
operates on process groups as well as the tgkill(2) and
rt_tgsigqueueinfo(2) syscalls.
How such an extension could be implemented has been very roughly sketched
in [14], [15], and [16]. However, this should not be taken as a commitment
to a particular implementation. There might be better ways to do it.
Right now this is intentionally left out to keep this patchset as simple as
possible (cf. [4]). For example, if a pidfd for a tid from
/proc//task/ is passed EOPNOTSUPP will be returned to give
userspace a way to detect when I add support for signaling to threads (cf. 
[10]).

/* naming */
The syscall had various names throughout iterations of this patchset:
- procfd_signal()
- procfd_send_signal()
- taskfd_send_signal()
In the last round of reviews it was pointed out that given that if the
flags argument decides the scope of the signal instead of different types
of fds it might make sense to either settle for "procfd_" or "pidfd_" as
prefix. The community was willing to accept either (cf. [17] and [18]).
Given that one developer expressed strong preference for the "pidfd_"
prefix (cf. [13] and with other developers less opinionated about the name
we should settle for "pidfd_" to avoid further bikeshedding.

The  "_send_signal" suffix was chosen to reflect the fact that the syscall
takes on the job of multiple syscalls. It is therefore intentional that the
name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the
fomer because it might imply that pidfd_send_signal() is a replacement for
kill(2), and not the latter because it is a hassle to remember the correct
spelling - especially for non-native speakers - and because it is not
descriptive enough of what the syscall actually does. The name
"pidfd_send_signal" makes it very clear that its job is to send signals.

/* O_PATH file descriptors */
pidfds opened as O_PATH fds cannot be used to send signals to a process
(cf. [2]). Signaling processes through pidfds is the equivalent of writing
to a file. Thus, this is not an operation that operates "purely at the file
descriptor level" as required by the open(2) manpage.

/* zombies */
Zombies can be signaled just as any other process. No special error will be
reported since a zombie state is an unreliable state (cf. [3]). However,
this can be added as an extension through the @flags argument if the need
ever arises.

/* cross-namespace signals */
The patch currently enforces that the signaler and signalee either are in
the same pid namespace or that the signaler's pid namespace is an ancestor
of the signalee's pid namespace. This is done for the sake of simplicity
and because it is unclear to what values certain members of struct
siginfo_t would need to be set to (cf. [5], [6]).

/* compat 

arch/x86/include/asm/cmpxchg.h:245:2: error: 'asm' operand has impossible constraints

2018-12-07 Thread kbuild test robot
Hi Juergen,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   5f179793f0a73965681db6a3203fa1baabd9b3c3
commit: 6da63eb241a05b0e676d68975e793c0521387141 x86/paravirt: Move the 
pv_irq_ops under the PARAVIRT_XXL umbrella
date:   3 months ago
config: i386-randconfig-sb0-12081226 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 6da63eb241a05b0e676d68975e793c0521387141
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/atomic.h:8:0,
from arch/x86/include/asm/msr.h:67,
from arch/x86/include/asm/processor.h:21,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:81,
from include/linux/spinlock.h:51,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/mm.h:10,
from mm/slub.c:13:
   mm/slub.c: In function '__slab_free.isra.74':
>> arch/x86/include/asm/cmpxchg.h:245:2: error: 'asm' operand has impossible 
>> constraints
 asm volatile(pfx "cmpxchg%c4b %2; sete %0"   \
 ^
   arch/x86/include/asm/cmpxchg.h:254:2: note: in expansion of macro 
'__cmpxchg_double'
 __cmpxchg_double(LOCK_PREFIX, p1, p2, o1, o2, n1, n2)
 ^
   include/asm-generic/atomic-instrumented.h:457:2: note: in expansion of macro 
'arch_cmpxchg_double'
 arch_cmpxchg_double(__ai_p1, (p2), (o1), (o2), (n1), (n2)); \
 ^
   mm/slub.c:403:7: note: in expansion of macro 'cmpxchg_double'
  if (cmpxchg_double(>freelist, >counters,
  ^

vim +/asm +245 arch/x86/include/asm/cmpxchg.h

3d94ae0c Jeremy Fitzhardinge 2011-09-28  235  
cdcd6298 Jan Beulich 2012-01-02  236  #define __cmpxchg_double(pfx, p1, 
p2, o1, o2, n1, n2) \
cdcd6298 Jan Beulich 2012-01-02  237  ({
\
cdcd6298 Jan Beulich 2012-01-02  238bool __ret; 
\
cdcd6298 Jan Beulich 2012-01-02  239__typeof__(*(p1)) __old1 = 
(o1), __new1 = (n1); \
cdcd6298 Jan Beulich 2012-01-02  240__typeof__(*(p2)) __old2 = 
(o2), __new2 = (n2); \
cdcd6298 Jan Beulich 2012-01-02  241BUILD_BUG_ON(sizeof(*(p1)) != 
sizeof(long));\
cdcd6298 Jan Beulich 2012-01-02  242BUILD_BUG_ON(sizeof(*(p2)) != 
sizeof(long));\
cdcd6298 Jan Beulich 2012-01-02  243VM_BUG_ON((unsigned long)(p1) % 
(2 * sizeof(long)));\
cdcd6298 Jan Beulich 2012-01-02  244VM_BUG_ON((unsigned long)((p1) 
+ 1) != (unsigned long)(p2));\
cdcd6298 Jan Beulich 2012-01-02 @245asm volatile(pfx "cmpxchg%c4b 
%2; sete %0"  \
cdcd6298 Jan Beulich 2012-01-02  246 : "=a" (__ret), 
"+d" (__old2), \
cdcd6298 Jan Beulich 2012-01-02  247   "+m" (*(p1)), 
"+m" (*(p2))   \
cdcd6298 Jan Beulich 2012-01-02  248 : "i" (2 * 
sizeof(long)), "a" (__old1),\
cdcd6298 Jan Beulich 2012-01-02  249   "b" (__new1), 
"c" (__new2)); \
cdcd6298 Jan Beulich 2012-01-02  250__ret;  
\
cdcd6298 Jan Beulich 2012-01-02  251  })
cdcd6298 Jan Beulich 2012-01-02  252  

:: The code at line 245 was first introduced by commit
:: cdcd629869fabcd38ebd24a03b0a05ec1cbcafb0 x86: Fix and improve 
cmpxchg_double{,_local}()

:: TO: Jan Beulich 
:: CC: Ingo Molnar 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


arch/x86/include/asm/cmpxchg.h:245:2: error: 'asm' operand has impossible constraints

2018-12-07 Thread kbuild test robot
Hi Juergen,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   5f179793f0a73965681db6a3203fa1baabd9b3c3
commit: 6da63eb241a05b0e676d68975e793c0521387141 x86/paravirt: Move the 
pv_irq_ops under the PARAVIRT_XXL umbrella
date:   3 months ago
config: i386-randconfig-sb0-12081226 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 6da63eb241a05b0e676d68975e793c0521387141
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/atomic.h:8:0,
from arch/x86/include/asm/msr.h:67,
from arch/x86/include/asm/processor.h:21,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:81,
from include/linux/spinlock.h:51,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/mm.h:10,
from mm/slub.c:13:
   mm/slub.c: In function '__slab_free.isra.74':
>> arch/x86/include/asm/cmpxchg.h:245:2: error: 'asm' operand has impossible 
>> constraints
 asm volatile(pfx "cmpxchg%c4b %2; sete %0"   \
 ^
   arch/x86/include/asm/cmpxchg.h:254:2: note: in expansion of macro 
'__cmpxchg_double'
 __cmpxchg_double(LOCK_PREFIX, p1, p2, o1, o2, n1, n2)
 ^
   include/asm-generic/atomic-instrumented.h:457:2: note: in expansion of macro 
'arch_cmpxchg_double'
 arch_cmpxchg_double(__ai_p1, (p2), (o1), (o2), (n1), (n2)); \
 ^
   mm/slub.c:403:7: note: in expansion of macro 'cmpxchg_double'
  if (cmpxchg_double(>freelist, >counters,
  ^

vim +/asm +245 arch/x86/include/asm/cmpxchg.h

3d94ae0c Jeremy Fitzhardinge 2011-09-28  235  
cdcd6298 Jan Beulich 2012-01-02  236  #define __cmpxchg_double(pfx, p1, 
p2, o1, o2, n1, n2) \
cdcd6298 Jan Beulich 2012-01-02  237  ({
\
cdcd6298 Jan Beulich 2012-01-02  238bool __ret; 
\
cdcd6298 Jan Beulich 2012-01-02  239__typeof__(*(p1)) __old1 = 
(o1), __new1 = (n1); \
cdcd6298 Jan Beulich 2012-01-02  240__typeof__(*(p2)) __old2 = 
(o2), __new2 = (n2); \
cdcd6298 Jan Beulich 2012-01-02  241BUILD_BUG_ON(sizeof(*(p1)) != 
sizeof(long));\
cdcd6298 Jan Beulich 2012-01-02  242BUILD_BUG_ON(sizeof(*(p2)) != 
sizeof(long));\
cdcd6298 Jan Beulich 2012-01-02  243VM_BUG_ON((unsigned long)(p1) % 
(2 * sizeof(long)));\
cdcd6298 Jan Beulich 2012-01-02  244VM_BUG_ON((unsigned long)((p1) 
+ 1) != (unsigned long)(p2));\
cdcd6298 Jan Beulich 2012-01-02 @245asm volatile(pfx "cmpxchg%c4b 
%2; sete %0"  \
cdcd6298 Jan Beulich 2012-01-02  246 : "=a" (__ret), 
"+d" (__old2), \
cdcd6298 Jan Beulich 2012-01-02  247   "+m" (*(p1)), 
"+m" (*(p2))   \
cdcd6298 Jan Beulich 2012-01-02  248 : "i" (2 * 
sizeof(long)), "a" (__old1),\
cdcd6298 Jan Beulich 2012-01-02  249   "b" (__new1), 
"c" (__new2)); \
cdcd6298 Jan Beulich 2012-01-02  250__ret;  
\
cdcd6298 Jan Beulich 2012-01-02  251  })
cdcd6298 Jan Beulich 2012-01-02  252  

:: The code at line 245 was first introduced by commit
:: cdcd629869fabcd38ebd24a03b0a05ec1cbcafb0 x86: Fix and improve 
cmpxchg_double{,_local}()

:: TO: Jan Beulich 
:: CC: Ingo Molnar 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 1/2] arm64: dts: ti: k3-am654: Add Support for MMC/SD

2018-12-07 Thread Vignesh R



On 07/12/18 2:12 PM, Faiz Abbas wrote:
> There are two MMC host controller instances present on the TI's
> Am654 SOCs. Add device tree nodes for the same.
> 
> Signed-off-by: Faiz Abbas 
> ---
>  arch/arm64/boot/dts/ti/k3-am65-main.dtsi | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/ti/k3-am65-main.dtsi 
> b/arch/arm64/boot/dts/ti/k3-am65-main.dtsi
> index 916434839603..d07212f16a81 100644
> --- a/arch/arm64/boot/dts/ti/k3-am65-main.dtsi
> +++ b/arch/arm64/boot/dts/ti/k3-am65-main.dtsi
> @@ -129,4 +129,32 @@
>   clocks = <_clks 113 1>;
>   power-domains = <_pds 113>;
>   };
> +
> + sdhci0: sdhci@4f8 {
> + compatible = "ti,am654-sdhci-5.1";
> + reg = <0x0 0x4f8 0x0 0x260>, <0x0 0x4f9 0x0 0x134>;
> + power-domains = <_pds 47>;
> + clocks = <_clks 47 0>, <_clks 47 1>;
> + clock-names = "clk_ahb", "clk_xin";
> + interrupts = ;
> + sdhci-caps-mask = <0x8007 0x0>;
> + mmc-ddr-1_8v;
> + ti,otap-del-sel = <0x2>;
> + ti,trm-icp = <0x8>;
> + status = "disabled";
> + };

Please drop "status=disabled" from dtsi. Can be disabled as required in
the board dts.

> +
> + sdhci1: sdhci@4fa {
> + compatible = "ti,am654-sdhci-5.1";
> + reg = <0x0 0x4fa 0x0 0x260>, <0x0 0x4fb 0x0 0x134>;
> + power-domains = <_pds 48>;
> + clocks = <_clks 48 0>, <_clks 48 1>;
> + clock-names = "clk_ahb", "clk_xin";
> + interrupts = ;
> + sdhci-caps-mask = <0x8007 0x0>;
> + mmc-ddr-1_8v;
> + ti,otap-del-sel = <0x2>;
> + ti,trm-icp = <0x8>;
> + status = "disabled";
> + };
>  };
> 

-- 
Regards
Vignesh


Re: [PATCH 1/2] arm64: dts: ti: k3-am654: Add Support for MMC/SD

2018-12-07 Thread Vignesh R



On 07/12/18 2:12 PM, Faiz Abbas wrote:
> There are two MMC host controller instances present on the TI's
> Am654 SOCs. Add device tree nodes for the same.
> 
> Signed-off-by: Faiz Abbas 
> ---
>  arch/arm64/boot/dts/ti/k3-am65-main.dtsi | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/ti/k3-am65-main.dtsi 
> b/arch/arm64/boot/dts/ti/k3-am65-main.dtsi
> index 916434839603..d07212f16a81 100644
> --- a/arch/arm64/boot/dts/ti/k3-am65-main.dtsi
> +++ b/arch/arm64/boot/dts/ti/k3-am65-main.dtsi
> @@ -129,4 +129,32 @@
>   clocks = <_clks 113 1>;
>   power-domains = <_pds 113>;
>   };
> +
> + sdhci0: sdhci@4f8 {
> + compatible = "ti,am654-sdhci-5.1";
> + reg = <0x0 0x4f8 0x0 0x260>, <0x0 0x4f9 0x0 0x134>;
> + power-domains = <_pds 47>;
> + clocks = <_clks 47 0>, <_clks 47 1>;
> + clock-names = "clk_ahb", "clk_xin";
> + interrupts = ;
> + sdhci-caps-mask = <0x8007 0x0>;
> + mmc-ddr-1_8v;
> + ti,otap-del-sel = <0x2>;
> + ti,trm-icp = <0x8>;
> + status = "disabled";
> + };

Please drop "status=disabled" from dtsi. Can be disabled as required in
the board dts.

> +
> + sdhci1: sdhci@4fa {
> + compatible = "ti,am654-sdhci-5.1";
> + reg = <0x0 0x4fa 0x0 0x260>, <0x0 0x4fb 0x0 0x134>;
> + power-domains = <_pds 48>;
> + clocks = <_clks 48 0>, <_clks 48 1>;
> + clock-names = "clk_ahb", "clk_xin";
> + interrupts = ;
> + sdhci-caps-mask = <0x8007 0x0>;
> + mmc-ddr-1_8v;
> + ti,otap-del-sel = <0x2>;
> + ti,trm-icp = <0x8>;
> + status = "disabled";
> + };
>  };
> 

-- 
Regards
Vignesh


Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-07 Thread Matthew Wilcox
On Fri, Dec 07, 2018 at 04:52:42PM -0800, John Hubbard wrote:
> I see. OK, HMM has done an efficient job of mopping up unused fields, and now 
> we are
> completely out of space. At this point, after thinking about it carefully, it 
> seems clear
> that it's time for a single, new field:

Sorry for not replying earlier; I'm travelling and have had trouble
keeping on top of my mail.

Adding this field will grow struct page by 4-8 bytes, so it will no
longer be 64 bytes.  This isn't an acceptable answer.

We have a few options for bits.  One is that we have (iirc) two
bits available in page->flags on 32-bit.  That'll force a few more
configurations into using _last_cpupid and/or page_ext.  I'm not a huge
fan of this approach.

The second is to use page->lru.next bit 1.  This requires some care
because m68k allows misaligned pointers.  If the list_head that it's
joined to is misaligned, we'll be in trouble.  This can get tricky because
some pages are attached to list_heads which are on the stack ... and I
don't think gcc guarantees __aligned attributes work for stack variables.

The third is to use page->lru.prev bit 0.  We'd want to switch pgmap
and hmm_data around to make this work, and we'd want to record this
in mm_types.h so nobody tries to use a field which aliases with
page->lru.prev and has bit 0 set on a page which can be mapped to
userspace (which I currently believe to be true).

The fourth is to use a bit in page->flags for 64-bit and a bit in
page_ext->flags for 32-bit.  Or we could get rid of page_ext and grow
struct page with a ->flags2 on 32-bit.

Fifth, it isn't clear to me how many bits might be left in ->_last_cpupid
at this point, and perhaps there's scope for using a bit in there.

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5ed8f6292a53..1c789e324da8 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -182,6 +182,9 @@ struct page {
> /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
> atomic_t _refcount;
>  
> +   /* DMA usage count. See get_user_pages*(), put_user_page*(). */
> +   atomic_t _dma_pinned_count;
> +
>  #ifdef CONFIG_MEMCG
> struct mem_cgroup *mem_cgroup;
>  #endif
> 
> 
> ...because after all, the reason this is so difficult is that this fix has to 
> work
> in pretty much every configuration. get_user_pages() use is widespread, it's 
> a very
> general facility, and...it needs fixing.  And we're out of space. 
> 
> I'm going to send out an updated RFC that shows the latest, and I think it's 
> going
> to include the above.
> 
> -- 
> thanks,
> John Hubbard
> NVIDIA
> 


Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-07 Thread Matthew Wilcox
On Fri, Dec 07, 2018 at 04:52:42PM -0800, John Hubbard wrote:
> I see. OK, HMM has done an efficient job of mopping up unused fields, and now 
> we are
> completely out of space. At this point, after thinking about it carefully, it 
> seems clear
> that it's time for a single, new field:

Sorry for not replying earlier; I'm travelling and have had trouble
keeping on top of my mail.

Adding this field will grow struct page by 4-8 bytes, so it will no
longer be 64 bytes.  This isn't an acceptable answer.

We have a few options for bits.  One is that we have (iirc) two
bits available in page->flags on 32-bit.  That'll force a few more
configurations into using _last_cpupid and/or page_ext.  I'm not a huge
fan of this approach.

The second is to use page->lru.next bit 1.  This requires some care
because m68k allows misaligned pointers.  If the list_head that it's
joined to is misaligned, we'll be in trouble.  This can get tricky because
some pages are attached to list_heads which are on the stack ... and I
don't think gcc guarantees __aligned attributes work for stack variables.

The third is to use page->lru.prev bit 0.  We'd want to switch pgmap
and hmm_data around to make this work, and we'd want to record this
in mm_types.h so nobody tries to use a field which aliases with
page->lru.prev and has bit 0 set on a page which can be mapped to
userspace (which I currently believe to be true).

The fourth is to use a bit in page->flags for 64-bit and a bit in
page_ext->flags for 32-bit.  Or we could get rid of page_ext and grow
struct page with a ->flags2 on 32-bit.

Fifth, it isn't clear to me how many bits might be left in ->_last_cpupid
at this point, and perhaps there's scope for using a bit in there.

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5ed8f6292a53..1c789e324da8 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -182,6 +182,9 @@ struct page {
> /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
> atomic_t _refcount;
>  
> +   /* DMA usage count. See get_user_pages*(), put_user_page*(). */
> +   atomic_t _dma_pinned_count;
> +
>  #ifdef CONFIG_MEMCG
> struct mem_cgroup *mem_cgroup;
>  #endif
> 
> 
> ...because after all, the reason this is so difficult is that this fix has to 
> work
> in pretty much every configuration. get_user_pages() use is widespread, it's 
> a very
> general facility, and...it needs fixing.  And we're out of space. 
> 
> I'm going to send out an updated RFC that shows the latest, and I think it's 
> going
> to include the above.
> 
> -- 
> thanks,
> John Hubbard
> NVIDIA
> 


Re: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller"

2018-12-07 Thread Joe Perches
On Fri, 2018-12-07 at 02:50 +, Clark Wang wrote:
> In order to enable the slave mode and make the code more readable,
> replace all related structure names and object names which is
> named "master" with "controller".

In what sense does this make the code more readable?




Re: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller"

2018-12-07 Thread Joe Perches
On Fri, 2018-12-07 at 02:50 +, Clark Wang wrote:
> In order to enable the slave mode and make the code more readable,
> replace all related structure names and object names which is
> named "master" with "controller".

In what sense does this make the code more readable?




[GIT PULL] Qualcomm ARM64 DT updates for 4.21 - Part 2

2018-12-07 Thread Andy Gross
The following changes since commit 70827d9f6bc4f481fafe790dd6654ba568526768:

  arm64: dts: qcom: msm8998: Fix compatible of scm node (2018-11-30 07:59:02 
-0600)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git 
tags/qcom-arm64-for-4.21-2

for you to fetch changes up to 1504b91c819359b574b55c269c850352260b8d19:

  arm64: dts: msm8996: Use dwc3-qcom glue driver for USB (2018-12-07 14:10:56 
-0600)


Qualcomm ARM64 Updates for v4.21 Part 2

* Switch to use dwc3-qcom glue driver on MSM8996
* Fix issue with xo clk name on MSM8998
* Add cooling maps on MSM8916
* Add UART nodes on SDM845
* Add camera subsystem support on MSM8996 and MSM8916


Andy Gross (1):
  arm64: dts: qcom: msm8998: Fixup clock to use xo_board

Manu Gautam (1):
  arm64: dts: msm8996: Use dwc3-qcom glue driver for USB

Matthias Kaehlcke (1):
  arm64: dts: qcom: sdm845: Add UART nodes

Todor Tomov (6):
  arm64: dts: qcom: msm8916: Add IOMMU sub-node for VFE context bank
  arm64: dts: qcom: msm8916: Add CAMSS support
  arm64: dts: qcom: Add Camera Control Interface pinctrls
  arm64: dts: qcom: Add pinctrls for camera sensors
  arm64: dts: qcom: msm8996: Add VFE SMMU node
  arm64: dts: qcom: msm8996: Add CAMSS support

Viresh Kumar (1):
  arm64: dts: msm8916: Add all CPUs in cooling maps

 arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi |   6 +-
 arch/arm64/boot/dts/qcom/msm8916-pins.dtsi   |  76 
 arch/arm64/boot/dts/qcom/msm8916.dtsi|  97 +-
 arch/arm64/boot/dts/qcom/msm8996-pins.dtsi   | 120 
 arch/arm64/boot/dts/qcom/msm8996.dtsi| 162 +++-
 arch/arm64/boot/dts/qcom/msm8998.dtsi|   3 +-
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 270 +++
 7 files changed, 725 insertions(+), 9 deletions(-)


[GIT PULL] Qualcomm Driver updates for 4.21 - Part 2

2018-12-07 Thread Andy Gross
The following changes since commit b601f73130a375c912d9f2ec93c5f3cea5d6a3da:

  drm: msm: Check cmd_db_read_aux_data() for failure (2018-11-29 17:41:53 -0600)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git 
tags/qcom-drivers-for-4.21-2

for you to fetch changes up to 92def04bd7b46f5b186f985d5c9d3804858c3c2f:

  MAINTAINERS: Change Todor Tomov's email address (2018-12-05 16:45:34 -0600)


Qualcomm ARM Based Driver Updates for v4.21 Part 2

* Fix MAINTAINERS email address for Todor
* Fix SCM compilation error


Jonathan Marek (1):
  firmware: qcom: scm: fix compilation error when disabled

Todor Tomov (1):
  MAINTAINERS: Change Todor Tomov's email address

 MAINTAINERS  | 2 +-
 include/linux/qcom_scm.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)


[GIT PULL] Qualcomm ARM64 DT updates for 4.21 - Part 2

2018-12-07 Thread Andy Gross
The following changes since commit 70827d9f6bc4f481fafe790dd6654ba568526768:

  arm64: dts: qcom: msm8998: Fix compatible of scm node (2018-11-30 07:59:02 
-0600)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git 
tags/qcom-arm64-for-4.21-2

for you to fetch changes up to 1504b91c819359b574b55c269c850352260b8d19:

  arm64: dts: msm8996: Use dwc3-qcom glue driver for USB (2018-12-07 14:10:56 
-0600)


Qualcomm ARM64 Updates for v4.21 Part 2

* Switch to use dwc3-qcom glue driver on MSM8996
* Fix issue with xo clk name on MSM8998
* Add cooling maps on MSM8916
* Add UART nodes on SDM845
* Add camera subsystem support on MSM8996 and MSM8916


Andy Gross (1):
  arm64: dts: qcom: msm8998: Fixup clock to use xo_board

Manu Gautam (1):
  arm64: dts: msm8996: Use dwc3-qcom glue driver for USB

Matthias Kaehlcke (1):
  arm64: dts: qcom: sdm845: Add UART nodes

Todor Tomov (6):
  arm64: dts: qcom: msm8916: Add IOMMU sub-node for VFE context bank
  arm64: dts: qcom: msm8916: Add CAMSS support
  arm64: dts: qcom: Add Camera Control Interface pinctrls
  arm64: dts: qcom: Add pinctrls for camera sensors
  arm64: dts: qcom: msm8996: Add VFE SMMU node
  arm64: dts: qcom: msm8996: Add CAMSS support

Viresh Kumar (1):
  arm64: dts: msm8916: Add all CPUs in cooling maps

 arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi |   6 +-
 arch/arm64/boot/dts/qcom/msm8916-pins.dtsi   |  76 
 arch/arm64/boot/dts/qcom/msm8916.dtsi|  97 +-
 arch/arm64/boot/dts/qcom/msm8996-pins.dtsi   | 120 
 arch/arm64/boot/dts/qcom/msm8996.dtsi| 162 +++-
 arch/arm64/boot/dts/qcom/msm8998.dtsi|   3 +-
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 270 +++
 7 files changed, 725 insertions(+), 9 deletions(-)


[GIT PULL] Qualcomm Driver updates for 4.21 - Part 2

2018-12-07 Thread Andy Gross
The following changes since commit b601f73130a375c912d9f2ec93c5f3cea5d6a3da:

  drm: msm: Check cmd_db_read_aux_data() for failure (2018-11-29 17:41:53 -0600)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git 
tags/qcom-drivers-for-4.21-2

for you to fetch changes up to 92def04bd7b46f5b186f985d5c9d3804858c3c2f:

  MAINTAINERS: Change Todor Tomov's email address (2018-12-05 16:45:34 -0600)


Qualcomm ARM Based Driver Updates for v4.21 Part 2

* Fix MAINTAINERS email address for Todor
* Fix SCM compilation error


Jonathan Marek (1):
  firmware: qcom: scm: fix compilation error when disabled

Todor Tomov (1):
  MAINTAINERS: Change Todor Tomov's email address

 MAINTAINERS  | 2 +-
 include/linux/qcom_scm.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)


Re: [PATCH v2 01/34] kbuild: Add support for DT binding schema checks

2018-12-07 Thread Masahiro Yamada
Hi Rob,


On Tue, Dec 4, 2018 at 6:32 AM Rob Herring  wrote:
>
> This adds the build infrastructure for checking DT binding schema
> documents and validating dts files using the binding schema.
>
> Check DT binding schema documents:
> make dt_binding_check
>
> Build dts files and check using DT binding schema:
> make dtbs_check
>
> Optionally, DT_SCHEMA_FILES can passed in with a schema file(s) to use
> for validation. This makes it easier to find and fix errors generated by
> a specific schema.
>
> Currently, the validation targets are separate from a normal build to
> avoid a hard dependency on the external DT schema project and because
> there are lots of warnings generated.
>
> Cc: Jonathan Corbet 
> Cc: Mark Rutland 
> Cc: Masahiro Yamada 
> Cc: Michal Marek 
> Cc: linux-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org
> Cc: linux-kbu...@vger.kernel.org
> Signed-off-by: Rob Herring 
> ---
>  .gitignore   |  1 +
>  Documentation/Makefile   |  2 +-
>  Documentation/devicetree/bindings/.gitignore |  1 +
>  Documentation/devicetree/bindings/Makefile   | 33 
>  Makefile | 11 +--
>  scripts/Makefile.lib | 24 --
>  6 files changed, 67 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/.gitignore
>  create mode 100644 Documentation/devicetree/bindings/Makefile
>
> diff --git a/.gitignore b/.gitignore
> index 97ba6b79834c..a20ac26aa2f5 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -15,6 +15,7 @@
>  *.bin
>  *.bz2
>  *.c.[012]*.*
> +*.dt.yaml
>  *.dtb
>  *.dtb.S
>  *.dwo
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index 2ca77ad0f238..9786957c6a35 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -2,7 +2,7 @@
>  # Makefile for Sphinx documentation
>  #
>
> -subdir-y :=
> +subdir-y := devicetree/bindings/
>
>  # You can set these variables from the command line.
>  SPHINXBUILD   = sphinx-build
> diff --git a/Documentation/devicetree/bindings/.gitignore 
> b/Documentation/devicetree/bindings/.gitignore
> new file mode 100644
> index ..d9194c02dd08
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/.gitignore
> @@ -0,0 +1 @@
> +*.example.dts
> diff --git a/Documentation/devicetree/bindings/Makefile 
> b/Documentation/devicetree/bindings/Makefile
> new file mode 100644
> index ..ee0110dd8131
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/Makefile
> @@ -0,0 +1,33 @@
> +# SPDX-License-Identifier: GPL-2.0
> +DT_DOC_CHECKER ?= dt-doc-validate
> +DT_EXTRACT_EX ?= dt-extract-example
> +DT_MK_SCHEMA ?= dt-mk-schema
> +DT_MK_SCHEMA_FLAGS := $(if $(DT_SCHEMA_FILES), -u)
> +
> +quiet_cmd_chk_binding = CHKDT   $<
> +  cmd_chk_binding = (set -e; \
> + $(DT_DOC_CHECKER) $< ; \
> + mkdir -p $(dir $@) ; \
> + $(DT_EXTRACT_EX) $< > $@ )
> +
> +$(obj)/%.example.dts: $(src)/%.yaml FORCE
> +   $(call if_changed,chk_binding)
> +
> +DT_TMP_SCHEMA := .schema.yaml.tmp
> +extra-y += $(DT_TMP_SCHEMA)
> +
> +quiet_cmd_mk_schema = SCHEMA  $@
> +  cmd_mk_schema = mkdir -p $(obj); \
> +  rm -f $@; \
> +  $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $<


I think '$<' is wrong.

'$<' is replaced with the first prerequisite.


You can easily check what is happening here.

$ cat   Documentation/devicetree/bindings/..schema.yaml.tmp.cmd
cmd_Documentation/devicetree/bindings/.schema.yaml.tmp := mkdir -p
Documentation/devicetree/bindings; rm -f
Documentation/devicetree/bindings/.schema.yaml.tmp; dt-mk-schema  -o
Documentation/devicetree/bindings/.schema.yaml.tmp
Documentation/devicetree/bindings/arm/ti/ti,davinci.yaml


So, the dt-validater will check only binding from ti,davinci.yaml,
which is almost useless.



If I understand it correctly,
.schema.yaml.tmp should contain all binding yaml.


I fixed it up like follows:

diff --git a/Documentation/devicetree/bindings/Makefile
b/Documentation/devicetree/bindings/Makefile
index ee0110d..267458f 100644
--- a/Documentation/devicetree/bindings/Makefile
+++ b/Documentation/devicetree/bindings/Makefile
@@ -19,7 +19,7 @@ extra-y += $(DT_TMP_SCHEMA)
 quiet_cmd_mk_schema = SCHEMA  $@
   cmd_mk_schema = mkdir -p $(obj); \
   rm -f $@; \
-  $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $<
+  $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@
$(filter-out FORCE, $^)

 DT_DOCS = $(shell cd $(srctree)/$(src) && find * -name '*.yaml')
 DT_SCHEMA_FILES ?= $(addprefix $(src)/,$(DT_DOCS))



Then, I see another error.


  SCHEMA  Documentation/devicetree/bindings/.schema.yaml.tmp
Traceback (most recent call last):
  File "/home/masahiro/ref/yaml-bindings/tools/dt-mk-schema", line 32,
in 
schemas = dtschema.process_schemas(args.schemas, core_schema=(not

Re: [RFC PATCH RESEND] x86/cpu: Avoid endless loop to get the number of cache leaves

2018-12-07 Thread Pu Wen

On 2018/12/6 18:37, Borislav Petkov wrote:
> Did you not see my reply to this last time?
>
> https://lkml.kernel.org/r/20181115172155.gb25...@zn.tnic

I'm sorry that there is something wrong with my mail filter. So I was
not notified about your reply for many days. :)

I just found your reply and replied to it.

Thx.

--
Regards,
Pu Wen


Re: [RFC PATCH RESEND] x86/cpu: Avoid endless loop to get the number of cache leaves

2018-12-07 Thread Pu Wen

On 2018/12/6 18:37, Borislav Petkov wrote:
> Did you not see my reply to this last time?
>
> https://lkml.kernel.org/r/20181115172155.gb25...@zn.tnic

I'm sorry that there is something wrong with my mail filter. So I was
not notified about your reply for many days. :)

I just found your reply and replied to it.

Thx.

--
Regards,
Pu Wen


Re: [RFC PATCH RESEND] x86/cpu: Avoid endless loop to get the number of cache leaves

2018-12-07 Thread Pu Wen

On 2018/12/6 18:37, Borislav Petkov wrote:

Did you not see my reply to this last time?

https://lkml.kernel.org/r/20181115172155.gb25...@zn.tnic


I'm sorry that there is something wrong with my mail filter. So I was
not notified about your reply for many days. :)

I just found your reply and replied to it.

Thx.

--
Regards,
Pu Wen


Re: [RFC PATCH RESEND] x86/cpu: Avoid endless loop to get the number of cache leaves

2018-12-07 Thread Pu Wen

On 2018/12/6 18:37, Borislav Petkov wrote:

Did you not see my reply to this last time?

https://lkml.kernel.org/r/20181115172155.gb25...@zn.tnic


I'm sorry that there is something wrong with my mail filter. So I was
not notified about your reply for many days. :)

I just found your reply and replied to it.

Thx.

--
Regards,
Pu Wen


Re: [PATCH v6 2/5] phy: qcom-qmp: Utilize fully-specified DT registers

2018-12-07 Thread Kishon Vijay Abraham I
Hi,

On 07/12/18 2:16 PM, Vivek Gautam wrote:
> On Fri, Dec 7, 2018 at 5:06 AM Evan Green  wrote:
>>
>> Utilize the newly fixed up DT bindings to get the tx2 and rx2 register
>> regions for the second lane of dual-lane PHYs. Before this change,
>> the driver was simply using lane one's register region and adding
>> 0x400, which reached well beyond the DT-specified register
>> allocation. This would have been a crash were it not for the page size
>> on ARM64. Fix the driver not to rely on the magic of virtual memory by
>> using the newly specified DT register regions for tx2 and rx2.
>>
>> In order to support existing device trees, this change also contains a
>> fallback mode for when those new register regions don't exist, which
>> reverts to the original behavior of overreaching and prints a complaint.
>>
>> Signed-off-by: Evan Green 
>> Reviewed-by: Douglas Anderson 
>> ---
>> As Doug mentioned, this should land before the dts patches land, otherwise
>> the old driver code will use the tx2 register region as pcs_misc.
>>
>> Changes in v6: None
>> Changes in v5: None
>> Changes in v4: None
>> Changes in v3: None
>> Changes in v2: None
>>
>>  drivers/phy/qualcomm/phy-qcom-qmp.c | 51 +
>>  1 file changed, 38 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c 
>> b/drivers/phy/qualcomm/phy-qcom-qmp.c
>> index 514db7248a5d0..8204d55e2d650 100644
>> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
>> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
>> @@ -72,6 +72,9 @@
>>
>>  #define MAX_PROP_NAME  32
>>
>> +/* Define the assumed distance between lanes for underspecified device 
>> trees. */
>> +#define QMP_PHY_LEGACY_LANE_STRIDE 0x400
>> +
>>  struct qmp_phy_init_tbl {
>> unsigned int offset;
>> unsigned int val;
>> @@ -733,9 +736,6 @@ struct qmp_phy_cfg {
>> bool has_phy_dp_com_ctrl;
>> /* true, if PHY has secondary tx/rx lanes to be configured */
>> bool is_dual_lane_phy;
>> -   /* Register offset of secondary tx/rx lanes for USB DP combo PHY */
>> -   unsigned int tx_b_lane_offset;
>> -   unsigned int rx_b_lane_offset;
>>
>> /* true, if PCS block has no separate SW_RESET register */
>> bool no_pcs_sw_reset;
>> @@ -748,6 +748,8 @@ struct qmp_phy_cfg {
>>   * @tx: iomapped memory space for lane's tx
>>   * @rx: iomapped memory space for lane's rx
>>   * @pcs: iomapped memory space for lane's pcs
>> + * @tx2: iomapped memory space for second lane's tx (in dual lane PHYs)
>> + * @rx2: iomapped memory space for second lane's rx (in dual lane PHYs)
>>   * @pcs_misc: iomapped memory space for lane's pcs_misc
>>   * @pipe_clk: pipe lock
>>   * @index: lane index
>> @@ -759,6 +761,8 @@ struct qmp_phy {
>> void __iomem *tx;
>> void __iomem *rx;
>> void __iomem *pcs;
>> +   void __iomem *tx2;
>> +   void __iomem *rx2;
>> void __iomem *pcs_misc;
>> struct clk *pipe_clk;
>> unsigned int index;
>> @@ -975,8 +979,6 @@ static const struct qmp_phy_cfg qmp_v3_usb3phy_cfg = {
>>
>> .has_phy_dp_com_ctrl= true,
>> .is_dual_lane_phy   = true,
>> -   .tx_b_lane_offset   = 0x400,
>> -   .rx_b_lane_offset   = 0x400,
>>  };
>>
>>  static const struct qmp_phy_cfg qmp_v3_usb3_uniphy_cfg = {
>> @@ -1031,9 +1033,6 @@ static const struct qmp_phy_cfg sdm845_ufsphy_cfg = {
>> .mask_pcs_ready = PCS_READY,
>>
>> .is_dual_lane_phy   = true,
>> -   .tx_b_lane_offset   = 0x400,
>> -   .rx_b_lane_offset   = 0x400,
>> -
>> .no_pcs_sw_reset= true,
>>  };
>>
>> @@ -1238,12 +1237,12 @@ static int qcom_qmp_phy_init(struct phy *phy)
>> qcom_qmp_phy_configure(tx, cfg->regs, cfg->tx_tbl, cfg->tx_tbl_num);
>> /* Configuration for other LANE for USB-DP combo PHY */
>> if (cfg->is_dual_lane_phy)
>> -   qcom_qmp_phy_configure(tx + cfg->tx_b_lane_offset, cfg->regs,
>> +   qcom_qmp_phy_configure(qphy->tx2, cfg->regs,
>>cfg->tx_tbl, cfg->tx_tbl_num);
>>
>> qcom_qmp_phy_configure(rx, cfg->regs, cfg->rx_tbl, cfg->rx_tbl_num);
>> if (cfg->is_dual_lane_phy)
>> -   qcom_qmp_phy_configure(rx + cfg->rx_b_lane_offset, cfg->regs,
>> +   qcom_qmp_phy_configure(qphy->rx2, cfg->regs,
>>cfg->rx_tbl, cfg->rx_tbl_num);
>>
>> qcom_qmp_phy_configure(pcs, cfg->regs, cfg->pcs_tbl, 
>> cfg->pcs_tbl_num);
>> @@ -1615,8 +1614,9 @@ int qcom_qmp_phy_create(struct device *dev, struct 
>> device_node *np, int id)
>>
>> /*
>>  * Get memory resources for each phy lane:
>> -* Resources are indexed as: tx -> 0; rx -> 1; pcs -> 2; and
>> -* pcs_misc (optional) -> 3.
>> +* Resources are indexed as: tx -> 0; rx -> 1; pcs -> 2.
>> +* For dual lane PHYs: tx2 -> 3, rx2 

Re: [PATCH v6 2/5] phy: qcom-qmp: Utilize fully-specified DT registers

2018-12-07 Thread Kishon Vijay Abraham I
Hi,

On 07/12/18 2:16 PM, Vivek Gautam wrote:
> On Fri, Dec 7, 2018 at 5:06 AM Evan Green  wrote:
>>
>> Utilize the newly fixed up DT bindings to get the tx2 and rx2 register
>> regions for the second lane of dual-lane PHYs. Before this change,
>> the driver was simply using lane one's register region and adding
>> 0x400, which reached well beyond the DT-specified register
>> allocation. This would have been a crash were it not for the page size
>> on ARM64. Fix the driver not to rely on the magic of virtual memory by
>> using the newly specified DT register regions for tx2 and rx2.
>>
>> In order to support existing device trees, this change also contains a
>> fallback mode for when those new register regions don't exist, which
>> reverts to the original behavior of overreaching and prints a complaint.
>>
>> Signed-off-by: Evan Green 
>> Reviewed-by: Douglas Anderson 
>> ---
>> As Doug mentioned, this should land before the dts patches land, otherwise
>> the old driver code will use the tx2 register region as pcs_misc.
>>
>> Changes in v6: None
>> Changes in v5: None
>> Changes in v4: None
>> Changes in v3: None
>> Changes in v2: None
>>
>>  drivers/phy/qualcomm/phy-qcom-qmp.c | 51 +
>>  1 file changed, 38 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c 
>> b/drivers/phy/qualcomm/phy-qcom-qmp.c
>> index 514db7248a5d0..8204d55e2d650 100644
>> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
>> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
>> @@ -72,6 +72,9 @@
>>
>>  #define MAX_PROP_NAME  32
>>
>> +/* Define the assumed distance between lanes for underspecified device 
>> trees. */
>> +#define QMP_PHY_LEGACY_LANE_STRIDE 0x400
>> +
>>  struct qmp_phy_init_tbl {
>> unsigned int offset;
>> unsigned int val;
>> @@ -733,9 +736,6 @@ struct qmp_phy_cfg {
>> bool has_phy_dp_com_ctrl;
>> /* true, if PHY has secondary tx/rx lanes to be configured */
>> bool is_dual_lane_phy;
>> -   /* Register offset of secondary tx/rx lanes for USB DP combo PHY */
>> -   unsigned int tx_b_lane_offset;
>> -   unsigned int rx_b_lane_offset;
>>
>> /* true, if PCS block has no separate SW_RESET register */
>> bool no_pcs_sw_reset;
>> @@ -748,6 +748,8 @@ struct qmp_phy_cfg {
>>   * @tx: iomapped memory space for lane's tx
>>   * @rx: iomapped memory space for lane's rx
>>   * @pcs: iomapped memory space for lane's pcs
>> + * @tx2: iomapped memory space for second lane's tx (in dual lane PHYs)
>> + * @rx2: iomapped memory space for second lane's rx (in dual lane PHYs)
>>   * @pcs_misc: iomapped memory space for lane's pcs_misc
>>   * @pipe_clk: pipe lock
>>   * @index: lane index
>> @@ -759,6 +761,8 @@ struct qmp_phy {
>> void __iomem *tx;
>> void __iomem *rx;
>> void __iomem *pcs;
>> +   void __iomem *tx2;
>> +   void __iomem *rx2;
>> void __iomem *pcs_misc;
>> struct clk *pipe_clk;
>> unsigned int index;
>> @@ -975,8 +979,6 @@ static const struct qmp_phy_cfg qmp_v3_usb3phy_cfg = {
>>
>> .has_phy_dp_com_ctrl= true,
>> .is_dual_lane_phy   = true,
>> -   .tx_b_lane_offset   = 0x400,
>> -   .rx_b_lane_offset   = 0x400,
>>  };
>>
>>  static const struct qmp_phy_cfg qmp_v3_usb3_uniphy_cfg = {
>> @@ -1031,9 +1033,6 @@ static const struct qmp_phy_cfg sdm845_ufsphy_cfg = {
>> .mask_pcs_ready = PCS_READY,
>>
>> .is_dual_lane_phy   = true,
>> -   .tx_b_lane_offset   = 0x400,
>> -   .rx_b_lane_offset   = 0x400,
>> -
>> .no_pcs_sw_reset= true,
>>  };
>>
>> @@ -1238,12 +1237,12 @@ static int qcom_qmp_phy_init(struct phy *phy)
>> qcom_qmp_phy_configure(tx, cfg->regs, cfg->tx_tbl, cfg->tx_tbl_num);
>> /* Configuration for other LANE for USB-DP combo PHY */
>> if (cfg->is_dual_lane_phy)
>> -   qcom_qmp_phy_configure(tx + cfg->tx_b_lane_offset, cfg->regs,
>> +   qcom_qmp_phy_configure(qphy->tx2, cfg->regs,
>>cfg->tx_tbl, cfg->tx_tbl_num);
>>
>> qcom_qmp_phy_configure(rx, cfg->regs, cfg->rx_tbl, cfg->rx_tbl_num);
>> if (cfg->is_dual_lane_phy)
>> -   qcom_qmp_phy_configure(rx + cfg->rx_b_lane_offset, cfg->regs,
>> +   qcom_qmp_phy_configure(qphy->rx2, cfg->regs,
>>cfg->rx_tbl, cfg->rx_tbl_num);
>>
>> qcom_qmp_phy_configure(pcs, cfg->regs, cfg->pcs_tbl, 
>> cfg->pcs_tbl_num);
>> @@ -1615,8 +1614,9 @@ int qcom_qmp_phy_create(struct device *dev, struct 
>> device_node *np, int id)
>>
>> /*
>>  * Get memory resources for each phy lane:
>> -* Resources are indexed as: tx -> 0; rx -> 1; pcs -> 2; and
>> -* pcs_misc (optional) -> 3.
>> +* Resources are indexed as: tx -> 0; rx -> 1; pcs -> 2.
>> +* For dual lane PHYs: tx2 -> 3, rx2 

Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions

2018-12-07 Thread Masami Hiramatsu
On Fri, 7 Dec 2018 18:58:05 +0100
Andrea Righi  wrote:

> On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote:
> > Hi Andrea and Ingo,
> > 
> > Here is the patch what I meant. I just ran it on qemu-x86, and seemed 
> > working.
> > After introducing this patch, I will start adding 
> > arch_populate_kprobe_blacklist()
> > to some arches.
> > 
> > Thank you,
> > 
> > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited 
> > area
> > 
> > From: Masami Hiramatsu 
> > 
> > Blacklist symbols in arch-defined probe-prohibited areas.
> > With this change, user can see all symbols which are prohibited
> > to probe in debugfs.
> > 
> > All archtectures which have custom prohibit areas should define
> > its own arch_populate_kprobe_blacklist() function, but unless that,
> > all symbols marked __kprobes are blacklisted.
> 
> What about iterating all symbols and use arch_within_kprobe_blacklist()
> to check if we need to blacklist them or not.

Sorry, I don't want to iterate all ksyms since it may take a long time
(especially embedded small devices.)

> 
> In this way we don't have to introduce an
> arch_populate_kprobe_blacklist() for each architecture.

Hmm, I had a same idea, but there are some arch which prohibit probing
extable entries (e.g. arm64.) For correctness of the blacklist, I think
it should be listed (not entire the function body).
I also rather like to remove arch_within_kprobe_blacklist() instead.

Thank you,

> 
> Something like the following maybe.
> 
> Thanks.
> 
> [RFC] kprobes: blacklist all symbols in arch-defined prohibited area
> 
> From: Andrea Righi 
> 
> Blacklist symbols in arch-defined probe-prohibited areas.
> With this change, user can see all symbols which are prohibited
> to probe in debugfs.
> 
> Signed-off-by: Masami Hiramatsu 
> Signed-off-by: Andrea Righi 
> ---
>  kernel/kprobes.c | 55 +--
>  1 file changed, 41 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 90e98e233647..e67598dd7468 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -2093,6 +2093,35 @@ void dump_kprobe(struct kprobe *kp)
>  }
>  NOKPROBE_SYMBOL(dump_kprobe);
>  
> +static int kprobe_blacklist_add(unsigned long entry)
> +{
> + struct kprobe_blacklist_entry *ent;
> + unsigned long offset = 0, size = 0;
> +
> + if (!kernel_text_address(entry) ||
> + !kallsyms_lookup_size_offset(entry, , ))
> + return -EINVAL;
> +
> + ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> + if (!ent)
> + return -ENOMEM;
> + ent->start_addr = entry;
> + ent->end_addr = entry + size;
> + INIT_LIST_HEAD(>list);
> + list_add_tail(>list, _blacklist);
> +
> + return 0;
> +}
> +
> +static int arch_populate_kprobe_blacklist(void *data, const char *name,
> +   struct module *mod,
> +   unsigned long entry)
> +{
> + if (arch_within_kprobe_blacklist(entry))
> + kprobe_blacklist_add(entry);
> + return 0;
> +}
> +
>  /*
>   * Lookup and populate the kprobe_blacklist.
>   *
> @@ -2104,24 +2133,22 @@ NOKPROBE_SYMBOL(dump_kprobe);
>  static int __init populate_kprobe_blacklist(unsigned long *start,
>unsigned long *end)
>  {
> - unsigned long *iter;
> - struct kprobe_blacklist_entry *ent;
> - unsigned long entry, offset = 0, size = 0;
> + unsigned long entry, *iter;
> + int ret;
>  
> + /* Blacklist all arch_within_kprobe_blacklist() symbols */
> +mutex_lock(_mutex);
> + kallsyms_on_each_symbol(arch_populate_kprobe_blacklist, NULL);
> +mutex_unlock(_mutex);
> +
> + /* Add explicitly blacklisted symbols */
>   for (iter = start; iter < end; iter++) {
>   entry = arch_deref_entry_point((void *)*iter);
> -
> - if (!kernel_text_address(entry) ||
> - !kallsyms_lookup_size_offset(entry, , ))
> + ret = kprobe_blacklist_add(entry);
> + if (ret == -EINVAL)
>   continue;
> -
> - ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> - if (!ent)
> - return -ENOMEM;
> - ent->start_addr = entry;
> - ent->end_addr = entry + size;
> - INIT_LIST_HEAD(>list);
> - list_add_tail(>list, _blacklist);
> + if (ret < 0)
> + return ret;
>   }
>   return 0;
>  }


-- 
Masami Hiramatsu 


Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions

2018-12-07 Thread Masami Hiramatsu
On Fri, 7 Dec 2018 18:58:05 +0100
Andrea Righi  wrote:

> On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote:
> > Hi Andrea and Ingo,
> > 
> > Here is the patch what I meant. I just ran it on qemu-x86, and seemed 
> > working.
> > After introducing this patch, I will start adding 
> > arch_populate_kprobe_blacklist()
> > to some arches.
> > 
> > Thank you,
> > 
> > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited 
> > area
> > 
> > From: Masami Hiramatsu 
> > 
> > Blacklist symbols in arch-defined probe-prohibited areas.
> > With this change, user can see all symbols which are prohibited
> > to probe in debugfs.
> > 
> > All archtectures which have custom prohibit areas should define
> > its own arch_populate_kprobe_blacklist() function, but unless that,
> > all symbols marked __kprobes are blacklisted.
> 
> What about iterating all symbols and use arch_within_kprobe_blacklist()
> to check if we need to blacklist them or not.

Sorry, I don't want to iterate all ksyms since it may take a long time
(especially embedded small devices.)

> 
> In this way we don't have to introduce an
> arch_populate_kprobe_blacklist() for each architecture.

Hmm, I had a same idea, but there are some arch which prohibit probing
extable entries (e.g. arm64.) For correctness of the blacklist, I think
it should be listed (not entire the function body).
I also rather like to remove arch_within_kprobe_blacklist() instead.

Thank you,

> 
> Something like the following maybe.
> 
> Thanks.
> 
> [RFC] kprobes: blacklist all symbols in arch-defined prohibited area
> 
> From: Andrea Righi 
> 
> Blacklist symbols in arch-defined probe-prohibited areas.
> With this change, user can see all symbols which are prohibited
> to probe in debugfs.
> 
> Signed-off-by: Masami Hiramatsu 
> Signed-off-by: Andrea Righi 
> ---
>  kernel/kprobes.c | 55 +--
>  1 file changed, 41 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 90e98e233647..e67598dd7468 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -2093,6 +2093,35 @@ void dump_kprobe(struct kprobe *kp)
>  }
>  NOKPROBE_SYMBOL(dump_kprobe);
>  
> +static int kprobe_blacklist_add(unsigned long entry)
> +{
> + struct kprobe_blacklist_entry *ent;
> + unsigned long offset = 0, size = 0;
> +
> + if (!kernel_text_address(entry) ||
> + !kallsyms_lookup_size_offset(entry, , ))
> + return -EINVAL;
> +
> + ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> + if (!ent)
> + return -ENOMEM;
> + ent->start_addr = entry;
> + ent->end_addr = entry + size;
> + INIT_LIST_HEAD(>list);
> + list_add_tail(>list, _blacklist);
> +
> + return 0;
> +}
> +
> +static int arch_populate_kprobe_blacklist(void *data, const char *name,
> +   struct module *mod,
> +   unsigned long entry)
> +{
> + if (arch_within_kprobe_blacklist(entry))
> + kprobe_blacklist_add(entry);
> + return 0;
> +}
> +
>  /*
>   * Lookup and populate the kprobe_blacklist.
>   *
> @@ -2104,24 +2133,22 @@ NOKPROBE_SYMBOL(dump_kprobe);
>  static int __init populate_kprobe_blacklist(unsigned long *start,
>unsigned long *end)
>  {
> - unsigned long *iter;
> - struct kprobe_blacklist_entry *ent;
> - unsigned long entry, offset = 0, size = 0;
> + unsigned long entry, *iter;
> + int ret;
>  
> + /* Blacklist all arch_within_kprobe_blacklist() symbols */
> +mutex_lock(_mutex);
> + kallsyms_on_each_symbol(arch_populate_kprobe_blacklist, NULL);
> +mutex_unlock(_mutex);
> +
> + /* Add explicitly blacklisted symbols */
>   for (iter = start; iter < end; iter++) {
>   entry = arch_deref_entry_point((void *)*iter);
> -
> - if (!kernel_text_address(entry) ||
> - !kallsyms_lookup_size_offset(entry, , ))
> + ret = kprobe_blacklist_add(entry);
> + if (ret == -EINVAL)
>   continue;
> -
> - ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> - if (!ent)
> - return -ENOMEM;
> - ent->start_addr = entry;
> - ent->end_addr = entry + size;
> - INIT_LIST_HEAD(>list);
> - list_add_tail(>list, _blacklist);
> + if (ret < 0)
> + return ret;
>   }
>   return 0;
>  }


-- 
Masami Hiramatsu 


Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions

2018-12-07 Thread Masami Hiramatsu
On Fri, 7 Dec 2018 18:00:26 +0100
Andrea Righi  wrote:

> On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote:
> > Hi Andrea and Ingo,
> > 
> > Here is the patch what I meant. I just ran it on qemu-x86, and seemed 
> > working.
> > After introducing this patch, I will start adding 
> > arch_populate_kprobe_blacklist()
> > to some arches.
> > 
> > Thank you,
> > 
> > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited 
> > area
> > 
> > From: Masami Hiramatsu 
> > 
> > Blacklist symbols in arch-defined probe-prohibited areas.
> > With this change, user can see all symbols which are prohibited
> > to probe in debugfs.
> > 
> > All archtectures which have custom prohibit areas should define
> > its own arch_populate_kprobe_blacklist() function, but unless that,
> > all symbols marked __kprobes are blacklisted.
> > 
> > Reported-by: Andrea Righi 
> > Signed-off-by: Masami Hiramatsu 
> > ---
> 
> [snip]
> 
> > +int kprobe_add_ksym_blacklist(unsigned long entry)
> > +{
> > +   struct kprobe_blacklist_entry *ent;
> > +   unsigned long offset = 0, size = 0;
> > +
> > +   if (!kernel_text_address(entry) ||
> > +   !kallsyms_lookup_size_offset(entry, , ))
> > +   return -EINVAL;
> > +
> > +   ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> > +   if (!ent)
> > +   return -ENOMEM;
> > +   ent->start_addr = entry - offset;
> > +   ent->end_addr = entry - offset + size;
> 
> Do we need to take offset into account? The code before wasn't using it.

Yes, if we hit an alias symbol (zero-size), we forcibly increment address
and retry it. In that case, offset will be 1.

> 
> > +   INIT_LIST_HEAD(>list);
> > +   list_add_tail(>list, _blacklist);
> > +
> > +   return (int)size;
> > +}
> > +
> > +/* Add functions in arch defined probe-prohibited area */
> > +int __weak arch_populate_kprobe_blacklist(void)
> > +{
> > +   unsigned long entry;
> > +   int ret = 0;
> > +
> > +   for (entry = (unsigned long)__kprobes_text_start;
> > +entry < (unsigned long)__kprobes_text_end;
> > +entry += ret) {
> > +   ret = kprobe_add_ksym_blacklist(entry);
> > +   if (ret < 0)
> > +   return ret;
> > +   if (ret == 0)   /* In case of alias symbol */
> > +   ret = 1;

Here, we incremented.

Thank you,

> > +   }
> > +   return 0;
> > +}
> > +
> >  /*
> >   * Lookup and populate the kprobe_blacklist.
> >   *
> > @@ -2104,26 +2142,20 @@ NOKPROBE_SYMBOL(dump_kprobe);
> >  static int __init populate_kprobe_blacklist(unsigned long *start,
> >  unsigned long *end)
> >  {
> > +   unsigned long entry;
> > unsigned long *iter;
> > -   struct kprobe_blacklist_entry *ent;
> > -   unsigned long entry, offset = 0, size = 0;
> > +   int ret;
> >  
> > for (iter = start; iter < end; iter++) {
> > entry = arch_deref_entry_point((void *)*iter);
> > -
> > -   if (!kernel_text_address(entry) ||
> > -   !kallsyms_lookup_size_offset(entry, , ))
> > +   ret = kprobe_add_ksym_blacklist(entry);
> > +   if (ret == -EINVAL)
> > continue;
> > -
> > -   ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> > -   if (!ent)
> > -   return -ENOMEM;
> > -   ent->start_addr = entry;
> > -   ent->end_addr = entry + size;
> > -   INIT_LIST_HEAD(>list);
> > -   list_add_tail(>list, _blacklist);
> > +   if (ret < 0)
> > +   return ret;
> > }
> > -   return 0;
> > +
> > +   return arch_populate_kprobe_blacklist();
> >  }
> >  
> >  /* Module notifier call back, checking kprobes on the module */


-- 
Masami Hiramatsu 


Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions

2018-12-07 Thread Masami Hiramatsu
On Fri, 7 Dec 2018 18:00:26 +0100
Andrea Righi  wrote:

> On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote:
> > Hi Andrea and Ingo,
> > 
> > Here is the patch what I meant. I just ran it on qemu-x86, and seemed 
> > working.
> > After introducing this patch, I will start adding 
> > arch_populate_kprobe_blacklist()
> > to some arches.
> > 
> > Thank you,
> > 
> > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited 
> > area
> > 
> > From: Masami Hiramatsu 
> > 
> > Blacklist symbols in arch-defined probe-prohibited areas.
> > With this change, user can see all symbols which are prohibited
> > to probe in debugfs.
> > 
> > All archtectures which have custom prohibit areas should define
> > its own arch_populate_kprobe_blacklist() function, but unless that,
> > all symbols marked __kprobes are blacklisted.
> > 
> > Reported-by: Andrea Righi 
> > Signed-off-by: Masami Hiramatsu 
> > ---
> 
> [snip]
> 
> > +int kprobe_add_ksym_blacklist(unsigned long entry)
> > +{
> > +   struct kprobe_blacklist_entry *ent;
> > +   unsigned long offset = 0, size = 0;
> > +
> > +   if (!kernel_text_address(entry) ||
> > +   !kallsyms_lookup_size_offset(entry, , ))
> > +   return -EINVAL;
> > +
> > +   ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> > +   if (!ent)
> > +   return -ENOMEM;
> > +   ent->start_addr = entry - offset;
> > +   ent->end_addr = entry - offset + size;
> 
> Do we need to take offset into account? The code before wasn't using it.

Yes, if we hit an alias symbol (zero-size), we forcibly increment address
and retry it. In that case, offset will be 1.

> 
> > +   INIT_LIST_HEAD(>list);
> > +   list_add_tail(>list, _blacklist);
> > +
> > +   return (int)size;
> > +}
> > +
> > +/* Add functions in arch defined probe-prohibited area */
> > +int __weak arch_populate_kprobe_blacklist(void)
> > +{
> > +   unsigned long entry;
> > +   int ret = 0;
> > +
> > +   for (entry = (unsigned long)__kprobes_text_start;
> > +entry < (unsigned long)__kprobes_text_end;
> > +entry += ret) {
> > +   ret = kprobe_add_ksym_blacklist(entry);
> > +   if (ret < 0)
> > +   return ret;
> > +   if (ret == 0)   /* In case of alias symbol */
> > +   ret = 1;

Here, we incremented.

Thank you,

> > +   }
> > +   return 0;
> > +}
> > +
> >  /*
> >   * Lookup and populate the kprobe_blacklist.
> >   *
> > @@ -2104,26 +2142,20 @@ NOKPROBE_SYMBOL(dump_kprobe);
> >  static int __init populate_kprobe_blacklist(unsigned long *start,
> >  unsigned long *end)
> >  {
> > +   unsigned long entry;
> > unsigned long *iter;
> > -   struct kprobe_blacklist_entry *ent;
> > -   unsigned long entry, offset = 0, size = 0;
> > +   int ret;
> >  
> > for (iter = start; iter < end; iter++) {
> > entry = arch_deref_entry_point((void *)*iter);
> > -
> > -   if (!kernel_text_address(entry) ||
> > -   !kallsyms_lookup_size_offset(entry, , ))
> > +   ret = kprobe_add_ksym_blacklist(entry);
> > +   if (ret == -EINVAL)
> > continue;
> > -
> > -   ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> > -   if (!ent)
> > -   return -ENOMEM;
> > -   ent->start_addr = entry;
> > -   ent->end_addr = entry + size;
> > -   INIT_LIST_HEAD(>list);
> > -   list_add_tail(>list, _blacklist);
> > +   if (ret < 0)
> > +   return ret;
> > }
> > -   return 0;
> > +
> > +   return arch_populate_kprobe_blacklist();
> >  }
> >  
> >  /* Module notifier call back, checking kprobes on the module */


-- 
Masami Hiramatsu 


Re: [PATCH bpf-next 2/3] bpf: add bpffs pretty print for cgroup local storage maps

2018-12-07 Thread Yonghong Song


On 12/7/18 4:53 PM, Roman Gushchin wrote:
> Implement bpffs pretty printing for cgroup local storage maps
> (both shared and per-cpu).
> Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c):
> 
> Shared:
>$ cat /sys/fs/bpf/map_2
># WARNING!! The output is for debug purpose only
># WARNING!! The output format will change
>{4294968594,1}: {,1039896}
> 
> Per-cpu:
>$ cat /sys/fs/bpf/map_1
># WARNING!! The output is for debug purpose only
># WARNING!! The output format will change
>{4294968594,1}: {
>   cpu0: {0,0,0,0,0}
>   cpu1: {0,0,0,0,0}
>   cpu2: {1,104,0,0,0}
>   cpu3: {0,0,0,0,0}
>}
> 
> Signed-off-by: Roman Gushchin 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 
> ---
>   include/linux/btf.h| 10 +
>   kernel/bpf/local_storage.c | 90 +-
>   2 files changed, 99 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 8c2199b5d250..ac67bc4cbfd9 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -5,6 +5,7 @@
>   #define _LINUX_BTF_H 1
>   
>   #include 
> +#include 
>   
>   struct btf;
>   struct btf_type;
> @@ -63,4 +64,13 @@ static inline const char *btf_name_by_offset(const struct 
> btf *btf,
>   }
>   #endif
>   
> +static inline const struct btf_type *btf_orig_type(const struct btf *btf,
> +const struct btf_type *t)
> +{
> + while (t && BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF)
> + t = btf_type_by_id(btf, t->type);
technically, type modifier "const" and "volatile" can apply to member 
type as well. But these modifiers really don't make sense here.
Could you add a comment here to mention that they will be treated
as an error since such a programming is not really recommended?

> +
> + return t;
> +}
> +
>   #endif
> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index b65017dead44..7b51fe1aba3c 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
> @@ -1,11 +1,13 @@
>   //SPDX-License-Identifier: GPL-2.0
>   #include 
>   #include 
> +#include 
>   #include 
>   #include 
>   #include 
>   #include 
>   #include 
> +#include 
>   
>   DEFINE_PER_CPU(struct bpf_cgroup_storage*, 
> bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
>   
> @@ -308,6 +310,91 @@ static int cgroup_storage_delete_elem(struct bpf_map 
> *map, void *key)
>   return -EINVAL;
>   }
>   
> +static int cgroup_storage_check_btf(const struct bpf_map *map,
> + const struct btf *btf,
> + const struct btf_type *key_type,
> + const struct btf_type *value_type)
> +{
> + const struct btf_type *st, *t;
> + struct btf_member *m;
> +
> + /* Key is expected to be of struct bpf_cgroup_storage_key type,
> +  * which is:
> +  * struct bpf_cgroup_storage_key {
> +  *  __u64   cgroup_inode_id;
> +  *  __u32   attach_type;
> +  * };
> +  */
> +
> + /*
> +  * Key_type must be a structure (or a typedef of a structure) with
> +  * two members.
> +  */
> + st = btf_orig_type(btf, key_type);
> + if (BTF_INFO_KIND(st->info) != BTF_KIND_STRUCT ||
> + BTF_INFO_VLEN(st->info) != 2)
> + return -EINVAL;
> +
> + /*
> +  * The first field must be a 64 bit integer at 0 offset.
> +  */
> + m = (struct btf_member *)(st + 1);
> + t = btf_orig_type(btf, btf_type_by_id(btf, m->type));
> + if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT || m->offset ||
> + t->size !=
> + FIELD_SIZEOF(struct bpf_cgroup_storage_key, cgroup_inode_id))
> + return -EINVAL;

We should not use t->size here. The "t->size" is the type size, and the
real number of bits held by the member is BTF_INT_BITS(...) with the 
argument of the u32 int value after "t".

> +
> + /*
> +  * The second field must be a 32 bit integer at 0 offset.
> +  */
> + m = m + 1;
> + t = btf_orig_type(btf, btf_type_by_id(btf, m->type));
> + if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT ||
> + m->offset != offsetof(struct bpf_cgroup_storage_key, attach_type) *
> + BITS_PER_BYTE || t->size !=
> + FIELD_SIZEOF(struct bpf_cgroup_storage_key, attach_type))
> + return -EINVAL;

The same is here. t->size should not be used.
BTF_INT_BITS(...) should be used.

> +
> + return 0;
> +}
> +
> +static void cgroup_storage_seq_show_elem(struct bpf_map *map, void *_key,
> +  struct seq_file *m)
> +{
> + enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
> + struct bpf_cgroup_storage_key *key = _key;
> + struct bpf_cgroup_storage *storage;
> + int cpu;
> +
> + rcu_read_lock();
> + storage = cgroup_storage_lookup(map_to_storage(map), key, false);
> + if (!storage) 

Re: [PATCH v4 4/5] ARM: dts: imx5: add gpu nodes

2018-12-07 Thread Shawn Guo
On Tue, Dec 04, 2018 at 10:17:00AM -0500, Jonathan Marek wrote:
> This adds the gpu nodes for the adreno 200 GPU on iMX51 and iMX53, now
> supported by the freedreno driver.
> 
> The compatible for the iMX51 uses a patchid of 1, which is used by drm/msm
> driver to identify the smaller 128KiB GMEM size.
> 
> Signed-off-by: Jonathan Marek 

Applied, thanks.


Re: [RFC][PATCHv2 3/4] serial: introduce uart_port locking helpers

2018-12-07 Thread Sergey Senozhatsky
On (10/16/18 14:04), Sergey Senozhatsky wrote:
[..]
> - The first entry point is console ->write() callback, which we call
>   from printk(). A possible deadlock scenario there is:
> 
>   CPU0
>   
>   spin_lock_irqsave(>lock, flags)  << deadlock
>   serial_foo_write()
>   call_console_drivers()
>   console_unlock()
>   console_flush_on_panic()
>   panic()
>   
>   spin_lock_irqsave(>lock, flags)
>   serial_foo_write()
>   call_console_drivers()
>   console_unlock()
>   printk()
>   ...

[..]
> - The rest (of entry points) requires a bit different handling.
>   Let's take a look at the following backtrace:
> 
>   CPU0
>   
>   spin_lock_irqsave(>lock, flags)  << deadlock
>   serial_foo_write()
>   call_console_drivers()
>   console_unlock()
>   printk()
>   __queue_work()
>   tty_flip_buffer_push()
>   spin_lock_irqsave(>lock, flags)
>   serial_foo_handle_IRQ()
>   
>
>   Serial drivers invoke tons of core kernel functions - WQ, MM, etc. All
>   of which may printk() in various cases. So we can't really just
>   "remove those printk-s". The simples way to address this seems to be
>   PRINTK_SAFE_CONTEXT_MASK.

serial/UART and printk guys, sorry to bother you, do you hate this
idea of removing console_driver->CORE KERNEL->printk->console_driver
deadlock path? Or is there any chance we can move forward?

-ss


Re: [RFC][PATCHv2 3/4] serial: introduce uart_port locking helpers

2018-12-07 Thread Sergey Senozhatsky
On (10/16/18 14:04), Sergey Senozhatsky wrote:
[..]
> - The first entry point is console ->write() callback, which we call
>   from printk(). A possible deadlock scenario there is:
> 
>   CPU0
>   
>   spin_lock_irqsave(>lock, flags)  << deadlock
>   serial_foo_write()
>   call_console_drivers()
>   console_unlock()
>   console_flush_on_panic()
>   panic()
>   
>   spin_lock_irqsave(>lock, flags)
>   serial_foo_write()
>   call_console_drivers()
>   console_unlock()
>   printk()
>   ...

[..]
> - The rest (of entry points) requires a bit different handling.
>   Let's take a look at the following backtrace:
> 
>   CPU0
>   
>   spin_lock_irqsave(>lock, flags)  << deadlock
>   serial_foo_write()
>   call_console_drivers()
>   console_unlock()
>   printk()
>   __queue_work()
>   tty_flip_buffer_push()
>   spin_lock_irqsave(>lock, flags)
>   serial_foo_handle_IRQ()
>   
>
>   Serial drivers invoke tons of core kernel functions - WQ, MM, etc. All
>   of which may printk() in various cases. So we can't really just
>   "remove those printk-s". The simples way to address this seems to be
>   PRINTK_SAFE_CONTEXT_MASK.

serial/UART and printk guys, sorry to bother you, do you hate this
idea of removing console_driver->CORE KERNEL->printk->console_driver
deadlock path? Or is there any chance we can move forward?

-ss


Re: [PATCH] scsi: Use of_node_name_eq for node name comparisons

2018-12-07 Thread Martin K. Petersen


Rob,

> Convert string compares of DT node names to use of_node_name_eq helper
> instead. This removes direct access to the node name pointer.

Applied to 4.21/scsi-queue, thanks.

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH] checkstack.pl: dynamic stack growth for aarch64

2018-12-07 Thread Qian Cai
This is to track dynamic amount of stack growth for aarch64, so it is
possible to print out offensive functions that may consume too much
stack. For example,

0x284d1270 try_to_unmap_one [vmlinux]:  Dynamic (0xcf0)
0x28538358 migrate_page_move_mapping [vmlinux]: Dynamic (0xc60)
0x281276c8 copy_process.isra.2 [vmlinux]:   Dynamic (0xb20)
0x28424958 show_free_areas [vmlinux]:   Dynamic (0xb40)
0x28545178 __split_huge_pmd_locked [vmlinux]:   Dynamic (0xb30)
0x28555120 collapse_shmem [vmlinux]:Dynamic (0xbc0)
0x2862e0d0 do_direct_IO [vmlinux]:  Dynamic (0xb70)
0x28cc0aa0 md_do_sync [vmlinux]:Dynamic (0xb90)

Signed-off-by: Qian Cai 
---
 scripts/checkstack.pl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/checkstack.pl b/scripts/checkstack.pl
index 34414c6efad6..122aef5e4e14 100755
--- a/scripts/checkstack.pl
+++ b/scripts/checkstack.pl
@@ -48,7 +48,9 @@ my (@stack, $re, $dre, $x, $xs, $funcre);
$funcre = qr/^$x* <(.*)>:$/;
if ($arch eq 'aarch64') {
#ffc0006325cc:   a9bb7bfdstp x29, x30, [sp, 
#-80]!
+   #a110:   d11643ffsub sp, sp, #0x590
$re = qr/^.*stp.*sp, \#-([0-9]{1,8})\]\!/o;
+   $dre = qr/^.*sub.*sp, sp, #(0x$x{1,8})/o;
} elsif ($arch eq 'arm') {
#c0008ffc:  e24dd064sub sp, sp, #100; 0x64
$re = qr/.*sub.*sp, sp, #(([0-9]{2}|[3-9])[0-9]{2})/o;
-- 
2.17.2 (Apple Git-113)



[PATCH] checkstack.pl: dynamic stack growth for aarch64

2018-12-07 Thread Qian Cai
This is to track dynamic amount of stack growth for aarch64, so it is
possible to print out offensive functions that may consume too much
stack. For example,

0x284d1270 try_to_unmap_one [vmlinux]:  Dynamic (0xcf0)
0x28538358 migrate_page_move_mapping [vmlinux]: Dynamic (0xc60)
0x281276c8 copy_process.isra.2 [vmlinux]:   Dynamic (0xb20)
0x28424958 show_free_areas [vmlinux]:   Dynamic (0xb40)
0x28545178 __split_huge_pmd_locked [vmlinux]:   Dynamic (0xb30)
0x28555120 collapse_shmem [vmlinux]:Dynamic (0xbc0)
0x2862e0d0 do_direct_IO [vmlinux]:  Dynamic (0xb70)
0x28cc0aa0 md_do_sync [vmlinux]:Dynamic (0xb90)

Signed-off-by: Qian Cai 
---
 scripts/checkstack.pl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/checkstack.pl b/scripts/checkstack.pl
index 34414c6efad6..122aef5e4e14 100755
--- a/scripts/checkstack.pl
+++ b/scripts/checkstack.pl
@@ -48,7 +48,9 @@ my (@stack, $re, $dre, $x, $xs, $funcre);
$funcre = qr/^$x* <(.*)>:$/;
if ($arch eq 'aarch64') {
#ffc0006325cc:   a9bb7bfdstp x29, x30, [sp, 
#-80]!
+   #a110:   d11643ffsub sp, sp, #0x590
$re = qr/^.*stp.*sp, \#-([0-9]{1,8})\]\!/o;
+   $dre = qr/^.*sub.*sp, sp, #(0x$x{1,8})/o;
} elsif ($arch eq 'arm') {
#c0008ffc:  e24dd064sub sp, sp, #100; 0x64
$re = qr/.*sub.*sp, sp, #(([0-9]{2}|[3-9])[0-9]{2})/o;
-- 
2.17.2 (Apple Git-113)



Re: [PATCH] x86/cpu: Avoid endless loop to get the number of cache leaves

2018-12-07 Thread Pu Wen
Sorry for the late reply :)

On 2018/11/16 1:22, Borislav Petkov wrote:
>> @@ -640,7 +641,7 @@ static int find_num_cache_leaves(struct cpuinfo_x86 *c)
>>  /* Do cpuid(op) loop to find out num_cache_leaves */
>>  cpuid_count(op, i, , , , );
>>  cache_eax.full = eax;
>> -} while (cache_eax.split.type != CTYPE_NULL);
>> +} while (cache_eax.split.type != CTYPE_NULL && i != CTYPE_MAX);
> i is an int and CTYPE_MAX is enum _cache_type. Huh?

How about define CTYPE_MAX like this:
#define CTYPE_MAX  4

> This works by chance because CTYPE_MAX is 4 and the termination CPUID
> leaf is the 4th too.

It will return CTYPE_NULL when accessing the 4th CPUID leaf in most of
the cases, but in certain case it will not. So I think it's better to
restrict the maximum CPUID access times to 4 for kernel robustness.

-- 
Regards,
Pu Wen



Re: [PATCH] x86/cpu: Avoid endless loop to get the number of cache leaves

2018-12-07 Thread Pu Wen
Sorry for the late reply :)

On 2018/11/16 1:22, Borislav Petkov wrote:
>> @@ -640,7 +641,7 @@ static int find_num_cache_leaves(struct cpuinfo_x86 *c)
>>  /* Do cpuid(op) loop to find out num_cache_leaves */
>>  cpuid_count(op, i, , , , );
>>  cache_eax.full = eax;
>> -} while (cache_eax.split.type != CTYPE_NULL);
>> +} while (cache_eax.split.type != CTYPE_NULL && i != CTYPE_MAX);
> i is an int and CTYPE_MAX is enum _cache_type. Huh?

How about define CTYPE_MAX like this:
#define CTYPE_MAX  4

> This works by chance because CTYPE_MAX is 4 and the termination CPUID
> leaf is the 4th too.

It will return CTYPE_NULL when accessing the 4th CPUID leaf in most of
the cases, but in certain case it will not. So I think it's better to
restrict the maximum CPUID access times to 4 for kernel robustness.

-- 
Regards,
Pu Wen



Re: [PATCH] scsi: qla2xxx: NULL check before some freeing functions is not needed.

2018-12-07 Thread Martin K. Petersen


Thomas,

> NULL check before some freeing functions is not needed.

Applied to 4.21/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: qedf: NULL check before some freeing functions is not needed.

2018-12-07 Thread Martin K. Petersen


Thomas,

> NULL check before some freeing functions is not needed.

Applied to 4.21/scsi-queue, thanks.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: megaraid_sas: NULL check before some freeing functions is not needed.

2018-12-07 Thread Martin K. Petersen


Thomas,

> NULL check before some freeing functions is not needed.

Applied to 4.21/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-07 Thread Thiago Jung Bauermann


Gautham R Shenoy  writes:
> On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote:
>> Sure. I will test the patch and report back.
>
> I added the following debug patch on top of your patch, and after an
> hour's run, the system crashed. Appending the log at the end.

Thank you very much for testing! Your debug patch was very helpful as
well.

> I suppose we still need to increase the number of tries since we wait
> only for 2.5ms looping before giving up.

Do you think it would have helped? In the debug output you posted I
would have expected the following message to show up if the loop
finished too early, and it didn't:

"Querying DEAD? cpu %i (%i) shows %i\n"

So I don't think increasing the loop length would have made a
difference. In fact, the call to smp_query_cpu_stopped() always
succeeded on the first iteration.

I think there is something else going on which we don't fully understand
yet. From your other email:

> I agree that the Kernel has to respect RTAS's restriction. The PAPR
> v2.8.1, Requirement R1-7.2.3-8 under section 7.2.3 says the following:
>
> "The stop-self service needs to be serialized with calls to the
>  stop-self, start-cpu, and set-power-level services. The OS must
>  be able to call RTAS services on other processors while the
>  processor is stopped or being stopped"
>
> Thus the onus is on the OS to ensure that there are no concurrent rtas
> calls with "stop-self" token.

As you say perhaps there's another call to stop-self, start-cpu or
set-power-level being made concurrently. I don't currently see how more
than one stop-self or start-cpu call could be in flight at the same time
given that there are a number of locks being grabbed during CPU hotplug
and unplug. OTOH the CPU that actually calls stop-self doesn't seem to
grab any locks itself so it's a possibility.

As for set-power-level, it's only used in the case of PCI hotplug from
what I can see, and that isn't part of the picture in this case, right?

We could address this problem directly by adding another lock separate
from rtas.lock to serialize just these calls. The challenge is with
stop-self, because the CPU calling it will never return to release the
lock. Is it possible to grab a lock (or down a semaphore) in the CPU
calling stop-self and then release the lock (or up the semaphore) in the
CPU running pseries_cpu_die()?

> > There's also a race between the CPU driving the unplug and the CPU
> > being unplugged which I think is not easy for the CPU being
> > unplugged to win, which makes the busy loop in pseries_cpu_die() a
> > bit fragile. I describe the race in the patch description.
> >
> > My solution to make the race less tight is to make the CPU driving
> > the unplug to only start the busy loop only after the CPU being
> > unplugged is in the CPU_STATE_OFFLINE state. At that point, we know
> > that it either is about to call RTAS or it already has.
>
> Ah, yes this is good optimization. Though, I think we ought to
> unconditionally wait until the target CPU has woken up from CEDE and
> changed its state to CPU_STATE_OFFLINE. After if PROD failed, then we
> would have caught it in dlpar_offline_cpu() itself.

I recently saw a QEMU-implemented hcall (H_LOGICAL_CI_LOAD) return
success when it had been given an invalid memory address to load from,
so my confidence in the error reporting of hcalls is a bit shaken. :-)

In that case the CPU would wait forever for the CPU state to change. If
you believe 100 ms is too short a timeout, we could make it 500 ms or
even 1s. What do you think?

> cpu 112 (hwid 112) Ready to die...
> [DEBUG] Waited for CPU 112 to enter rtas: tries=0, time=65
> cpu 113 (hwid 113) Ready to die...
> [DEBUG] Waited for CPU 113 to enter rtas: tries=0, time=1139
> cpu 114 (hwid 114) Ready to die...
> [DEBUG] Waited for CPU 114 to enter rtas: tries=0, time=1036
> cpu 115 (hwid 115) Ready to die...
> [DEBUG] Waited for CPU 115 to enter rtas: tries=0, time=133
> cpu 116 (hwid 116) Ready to die...
> [DEBUG] Waited for CPU 116 to enter rtas: tries=0, time=1231
> cpu 117 (hwid 117) Ready to die...
> [DEBUG] Waited for CPU 117 to enter rtas: tries=0, time=1231
> cpu 118 (hwid 118) Ready to die...
> [DEBUG] Waited for CPU 118 to enter rtas: tries=0, time=1231
> cpu 119 (hwid 119) Ready to die...
> [DEBUG] Waited for CPU 119 to enter rtas: tries=0, time=1131
> cpu 104 (hwid 104) Ready to die...
> [DEBUG] Waited for CPU 104 to enter rtas: tries=0, time=40

Interesting, so 1.2 ms can pass before the dying CPU actually gets close
to making the stop-self call. And even in those cases the retry loop is
succeeding on the first try! So this shows that changing the code to
wait for the CPU_STATE_OFFLINE state is worth it.

> *** RTAS CALL BUFFER CORRUPTION ***
> 393: rtas32_call_buff_ptr=
>  0060  0060  0060  0060 [...`...`...`...`]
>  0060  0060  0060  0060 [...`...`...`...`]
>  0060  0060  0060  0060 

Re: [PATCH 01/41] scsi: BusLogic: mark expected switch fall-through

2018-12-07 Thread Martin K. Petersen


Gustavo A.,

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.

Applied to 4.21/scsi-queue, thanks.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] drivers/scsi/fnic/fnic_trace.c: Use vzalloc

2018-12-07 Thread Martin K. Petersen


Sabyasachi,

> Replaced vmalloc + memset with vzalloc

Applied to 4.21/scsi-queue, thanks.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-07 Thread Jerome Glisse
On Fri, Dec 07, 2018 at 04:52:42PM -0800, John Hubbard wrote:
> On 12/7/18 11:16 AM, Jerome Glisse wrote:
> > On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote:
> >> On 12/4/18 5:57 PM, John Hubbard wrote:
> >>> On 12/4/18 5:44 PM, Jerome Glisse wrote:
>  On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote:
> > On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote:
> >> On 12/4/18 3:03 PM, Dan Williams wrote:
> >>> Except the LRU fields are already in use for ZONE_DEVICE pages... how
> >>> does this proposal interact with those?
> >>
> >> Very badly: page->pgmap and page->hmm_data both get corrupted. Is 
> >> there an entire
> >> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? 
> >> Said another
> >> way: is it reasonable to disallow calling get_user_pages() on 
> >> ZONE_DEVICE pages?
> >>
> >> If we have to support get_user_pages() on ZONE_DEVICE pages, then the 
> >> whole 
> >> LRU field approach is unusable.
> >
> > We just need to rearrange ZONE_DEVICE pages.  Please excuse the 
> > whitespace
> > damage:
> >
> > +++ b/include/linux/mm_types.h
> > @@ -151,10 +151,12 @@ struct page {
> >  #endif
> > };
> > struct {/* ZONE_DEVICE pages */
> > +   unsigned long _zd_pad_2;/* LRU */
> > +   unsigned long _zd_pad_3;/* LRU */
> > +   unsigned long _zd_pad_1;/* uses mapping 
> > */
> > /** @pgmap: Points to the hosting device page 
> > map. */
> > struct dev_pagemap *pgmap;
> > unsigned long hmm_data;
> > -   unsigned long _zd_pad_1;/* uses mapping 
> > */
> > };
> >  
> > /** @rcu_head: You can use this to free a page by RCU. 
> > */
> >
> > You don't use page->private or page->index, do you Dan?
> 
>  page->private and page->index are use by HMM DEVICE page.
> 
> >>>
> >>> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field 
> >>> remaining for 
> >>> dma-pinned information. Which might work. To recap, we need:
> >>>
> >>> -- 1 bit for PageDmaPinned
> >>> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru.
> >>> -- N bits for a reference count
> >>>
> >>> Those *could* be packed into a single 64-bit field, if really necessary.
> >>>
> >>
> >> ...actually, this needs to work on 32-bit systems, as well. And HMM is 
> >> using a lot.
> >> However, it is still possible for this to work.
> >>
> >> Matthew, can I have that bit now please? I'm about out of options, and now 
> >> it will actually
> >> solve the problem here.
> >>
> >> Given:
> >>
> >> 1) It's cheap to know if a page is ZONE_DEVICE, and ZONE_DEVICE means not 
> >> on the LRU.
> >> That, in turn, means only 1 bit instead of 2 bits (in addition to a 
> >> counter) is required, 
> >> for that case. 
> >>
> >> 2) There is an independent bit available (according to Matthew). 
> >>
> >> 3) HMM uses 4 of the 5 struct page fields, so only one field is available 
> >> for a counter 
> >>in that case.
> > 
> > To expend on this, HMM private page are use for anonymous page
> > so the index and mapping fields have the value you expect for
> > such pages. Down the road i want also to support file backed
> > page with HMM private (mapping, private, index).
> > 
> > For HMM public both anonymous and file back page are supported
> > today (HMM public is only useful on platform with something like
> > OpenCAPI, CCIX or NVlink ... so PowerPC for now).
> > 
> >> 4) get_user_pages() must work on ZONE_DEVICE and HMM pages.
> > 
> > get_user_pages() only need to work with HMM public page not the
> > private one as we can not allow _anyone_ to pin HMM private page.
> > So on get_user_pages() on HMM private we get a page fault and
> > it is migrated back to regular memory.
> > 
> > 
> >> 5) For a proper atomic counter for both 32- and 64-bit, we really do need 
> >> a complete
> >> unsigned long field.
> >>
> >> So that leads to the following approach:
> >>
> >> -- Use a single unsigned long field for an atomic reference count for the 
> >> DMA pinned count.
> >> For normal pages, this will be the *second* field of the LRU (in order to 
> >> avoid PageTail bit).
> >>
> >> For ZONE_DEVICE pages, we can also line up the fields so that the second 
> >> LRU field is 
> >> available and reserved for this DMA pinned count. Basically _zd_pad_1 gets 
> >> move up and
> >> optionally renamed:
> >>
> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> >> index 017ab82e36ca..b5dcd9398cae 100644
> >> --- a/include/linux/mm_types.h
> >> +++ b/include/linux/mm_types.h
> >> @@ -90,8 +90,8 @@ struct page {
> >>  * are in use.
> >>

Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-07 Thread Jerome Glisse
On Fri, Dec 07, 2018 at 04:52:42PM -0800, John Hubbard wrote:
> On 12/7/18 11:16 AM, Jerome Glisse wrote:
> > On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote:
> >> On 12/4/18 5:57 PM, John Hubbard wrote:
> >>> On 12/4/18 5:44 PM, Jerome Glisse wrote:
>  On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote:
> > On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote:
> >> On 12/4/18 3:03 PM, Dan Williams wrote:
> >>> Except the LRU fields are already in use for ZONE_DEVICE pages... how
> >>> does this proposal interact with those?
> >>
> >> Very badly: page->pgmap and page->hmm_data both get corrupted. Is 
> >> there an entire
> >> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? 
> >> Said another
> >> way: is it reasonable to disallow calling get_user_pages() on 
> >> ZONE_DEVICE pages?
> >>
> >> If we have to support get_user_pages() on ZONE_DEVICE pages, then the 
> >> whole 
> >> LRU field approach is unusable.
> >
> > We just need to rearrange ZONE_DEVICE pages.  Please excuse the 
> > whitespace
> > damage:
> >
> > +++ b/include/linux/mm_types.h
> > @@ -151,10 +151,12 @@ struct page {
> >  #endif
> > };
> > struct {/* ZONE_DEVICE pages */
> > +   unsigned long _zd_pad_2;/* LRU */
> > +   unsigned long _zd_pad_3;/* LRU */
> > +   unsigned long _zd_pad_1;/* uses mapping 
> > */
> > /** @pgmap: Points to the hosting device page 
> > map. */
> > struct dev_pagemap *pgmap;
> > unsigned long hmm_data;
> > -   unsigned long _zd_pad_1;/* uses mapping 
> > */
> > };
> >  
> > /** @rcu_head: You can use this to free a page by RCU. 
> > */
> >
> > You don't use page->private or page->index, do you Dan?
> 
>  page->private and page->index are use by HMM DEVICE page.
> 
> >>>
> >>> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field 
> >>> remaining for 
> >>> dma-pinned information. Which might work. To recap, we need:
> >>>
> >>> -- 1 bit for PageDmaPinned
> >>> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru.
> >>> -- N bits for a reference count
> >>>
> >>> Those *could* be packed into a single 64-bit field, if really necessary.
> >>>
> >>
> >> ...actually, this needs to work on 32-bit systems, as well. And HMM is 
> >> using a lot.
> >> However, it is still possible for this to work.
> >>
> >> Matthew, can I have that bit now please? I'm about out of options, and now 
> >> it will actually
> >> solve the problem here.
> >>
> >> Given:
> >>
> >> 1) It's cheap to know if a page is ZONE_DEVICE, and ZONE_DEVICE means not 
> >> on the LRU.
> >> That, in turn, means only 1 bit instead of 2 bits (in addition to a 
> >> counter) is required, 
> >> for that case. 
> >>
> >> 2) There is an independent bit available (according to Matthew). 
> >>
> >> 3) HMM uses 4 of the 5 struct page fields, so only one field is available 
> >> for a counter 
> >>in that case.
> > 
> > To expend on this, HMM private page are use for anonymous page
> > so the index and mapping fields have the value you expect for
> > such pages. Down the road i want also to support file backed
> > page with HMM private (mapping, private, index).
> > 
> > For HMM public both anonymous and file back page are supported
> > today (HMM public is only useful on platform with something like
> > OpenCAPI, CCIX or NVlink ... so PowerPC for now).
> > 
> >> 4) get_user_pages() must work on ZONE_DEVICE and HMM pages.
> > 
> > get_user_pages() only need to work with HMM public page not the
> > private one as we can not allow _anyone_ to pin HMM private page.
> > So on get_user_pages() on HMM private we get a page fault and
> > it is migrated back to regular memory.
> > 
> > 
> >> 5) For a proper atomic counter for both 32- and 64-bit, we really do need 
> >> a complete
> >> unsigned long field.
> >>
> >> So that leads to the following approach:
> >>
> >> -- Use a single unsigned long field for an atomic reference count for the 
> >> DMA pinned count.
> >> For normal pages, this will be the *second* field of the LRU (in order to 
> >> avoid PageTail bit).
> >>
> >> For ZONE_DEVICE pages, we can also line up the fields so that the second 
> >> LRU field is 
> >> available and reserved for this DMA pinned count. Basically _zd_pad_1 gets 
> >> move up and
> >> optionally renamed:
> >>
> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> >> index 017ab82e36ca..b5dcd9398cae 100644
> >> --- a/include/linux/mm_types.h
> >> +++ b/include/linux/mm_types.h
> >> @@ -90,8 +90,8 @@ struct page {
> >>  * are in use.
> >>

[PATCH] selftests/vm/gup_benchmark.c: match gup struct to kernel

2018-12-07 Thread Alison Schofield
An expansion field was added to the kernel copy of this structure for
future use. See mm/gup_benchmark.c.

Add the same expansion field here, so that the IOCTL command decodes
correctly. Otherwise, it fails with EINVAL.

Signed-off-by: Alison Schofield 
---
 tools/testing/selftests/vm/gup_benchmark.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/vm/gup_benchmark.c 
b/tools/testing/selftests/vm/gup_benchmark.c
index 880b96fc80d4..c0534e298b51 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -25,6 +25,7 @@ struct gup_benchmark {
__u64 size;
__u32 nr_pages_per_call;
__u32 flags;
+   __u64 expansion[10];/* For future use */
 };
 
 int main(int argc, char **argv)
-- 
2.14.1



Re: [PATCH] nvme-rdma: complete requests from ->timeout

2018-12-07 Thread Keith Busch
On Fri, Dec 07, 2018 at 12:05:37PM -0800, Sagi Grimberg wrote:
> 
> > Could you please take a look at this bug and code review?
> > 
> > We are seeing more instances of this bug and found that reconnect_work
> > could hang as well, as can be seen from below stacktrace.
> > 
> >   Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
> >   Call Trace:
> >   __schedule+0x2ab/0x880
> >   schedule+0x36/0x80
> >   schedule_timeout+0x161/0x300
> >   ? __next_timer_interrupt+0xe0/0xe0
> >   io_schedule_timeout+0x1e/0x50
> >   wait_for_completion_io_timeout+0x130/0x1a0
> >   ? wake_up_q+0x80/0x80
> >   blk_execute_rq+0x6e/0xa0
> >   __nvme_submit_sync_cmd+0x6e/0xe0
> >   nvmf_connect_admin_queue+0x128/0x190 [nvme_fabrics]
> >   ? wait_for_completion_interruptible_timeout+0x157/0x1b0
> >   nvme_rdma_start_queue+0x5e/0x90 [nvme_rdma]
> >   nvme_rdma_setup_ctrl+0x1b4/0x730 [nvme_rdma]
> >   nvme_rdma_reconnect_ctrl_work+0x27/0x70 [nvme_rdma]
> >   process_one_work+0x179/0x390
> >   worker_thread+0x4f/0x3e0
> >   kthread+0x105/0x140
> >   ? max_active_store+0x80/0x80
> >   ? kthread_bind+0x20/0x20
> > 
> > This bug is produced by setting MTU of RoCE interface to '568' for
> > test while running I/O traffics.
> 
> I think that with the latest changes from Keith we can no longer rely
> on blk-mq to barrier racing completions. We will probably need
> to barrier ourselves in nvme-rdma...

You really need to do that anyway. If you were relying on blk-mq to save
you from double completions by ending a request in the nvme driver while
the lower half can still complete the same one, the only thing preventing
data corruption is the probability the request wasn't reallocated for a
new command.


[PATCH] selftests/vm/gup_benchmark.c: match gup struct to kernel

2018-12-07 Thread Alison Schofield
An expansion field was added to the kernel copy of this structure for
future use. See mm/gup_benchmark.c.

Add the same expansion field here, so that the IOCTL command decodes
correctly. Otherwise, it fails with EINVAL.

Signed-off-by: Alison Schofield 
---
 tools/testing/selftests/vm/gup_benchmark.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/vm/gup_benchmark.c 
b/tools/testing/selftests/vm/gup_benchmark.c
index 880b96fc80d4..c0534e298b51 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -25,6 +25,7 @@ struct gup_benchmark {
__u64 size;
__u32 nr_pages_per_call;
__u32 flags;
+   __u64 expansion[10];/* For future use */
 };
 
 int main(int argc, char **argv)
-- 
2.14.1



Re: [PATCH] nvme-rdma: complete requests from ->timeout

2018-12-07 Thread Keith Busch
On Fri, Dec 07, 2018 at 12:05:37PM -0800, Sagi Grimberg wrote:
> 
> > Could you please take a look at this bug and code review?
> > 
> > We are seeing more instances of this bug and found that reconnect_work
> > could hang as well, as can be seen from below stacktrace.
> > 
> >   Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
> >   Call Trace:
> >   __schedule+0x2ab/0x880
> >   schedule+0x36/0x80
> >   schedule_timeout+0x161/0x300
> >   ? __next_timer_interrupt+0xe0/0xe0
> >   io_schedule_timeout+0x1e/0x50
> >   wait_for_completion_io_timeout+0x130/0x1a0
> >   ? wake_up_q+0x80/0x80
> >   blk_execute_rq+0x6e/0xa0
> >   __nvme_submit_sync_cmd+0x6e/0xe0
> >   nvmf_connect_admin_queue+0x128/0x190 [nvme_fabrics]
> >   ? wait_for_completion_interruptible_timeout+0x157/0x1b0
> >   nvme_rdma_start_queue+0x5e/0x90 [nvme_rdma]
> >   nvme_rdma_setup_ctrl+0x1b4/0x730 [nvme_rdma]
> >   nvme_rdma_reconnect_ctrl_work+0x27/0x70 [nvme_rdma]
> >   process_one_work+0x179/0x390
> >   worker_thread+0x4f/0x3e0
> >   kthread+0x105/0x140
> >   ? max_active_store+0x80/0x80
> >   ? kthread_bind+0x20/0x20
> > 
> > This bug is produced by setting MTU of RoCE interface to '568' for
> > test while running I/O traffics.
> 
> I think that with the latest changes from Keith we can no longer rely
> on blk-mq to barrier racing completions. We will probably need
> to barrier ourselves in nvme-rdma...

You really need to do that anyway. If you were relying on blk-mq to save
you from double completions by ending a request in the nvme driver while
the lower half can still complete the same one, the only thing preventing
data corruption is the probability the request wasn't reallocated for a
new command.


Re: [PATCH v2 18/34] dt-bindings: arm: Convert FSL board/soc bindings to json-schema

2018-12-07 Thread Shawn Guo
On Thu, Dec 06, 2018 at 05:33:13PM -0600, Rob Herring wrote:
> On Wed, Dec 5, 2018 at 8:32 PM Shawn Guo  wrote:
> >
> > On Mon, Dec 03, 2018 at 03:32:07PM -0600, Rob Herring wrote:
> > > Convert Freescale SoC bindings to DT schema format using json-schema.
> > >
> > > Cc: Shawn Guo 
> > > Cc: Mark Rutland 
> > > Cc: devicet...@vger.kernel.org
> > > Signed-off-by: Rob Herring 
> > > ---
> > >  .../devicetree/bindings/arm/armadeus.txt  |   6 -
> > >  Documentation/devicetree/bindings/arm/bhf.txt |   6 -
> > >  .../bindings/arm/compulab-boards.txt  |  25 --
> > >  Documentation/devicetree/bindings/arm/fsl.txt | 229 --
> > >  .../devicetree/bindings/arm/fsl.yaml  | 214 
> >
> > Rob,
> >
> > I do have any changes on bindings/arm/fsl.txt queued for 4.21 on my
> > tree, so please send it via your tree.
> 
> What about:
> 
> c386f362957b dt-bindings: Add compatible string for LS1028A-QDS
> 3671cd57de06 dt-bindings: ls1012a: Add FRWY-LS1012A device tree binding

Ah, sorry, I only checked on imx/dt branch and forgot imx/dt64.  I will
drop the changes on fsl.txt and update fsl.yaml after it hits mainline.

Shawn


Query about platform device registration

2018-12-07 Thread Vakul Garg
Hi

I am registering a device using platform_device_register_full().
In the above-mentioned api argument - 'struct platform_device_info', I set the 
parent device pointer, set_dma_mask from parent device etc.

The parent device is created by declaration in device tree and the relevant 
node has property 'dma-coherent'.
So I get dma_coherent attribute as 1 for the parent device. In other words, 
is_dev_dma_coherent() returns 1 for parent device.

But for the device registered using platform_device_register_full(), the 
dma_coherent is not set and is_dev_dma_coherent() returns 0.

Can someone please guide me, how to declare the newly registered device as 
'dma_coherent'.

Regards

Vakul


Query about platform device registration

2018-12-07 Thread Vakul Garg
Hi

I am registering a device using platform_device_register_full().
In the above-mentioned api argument - 'struct platform_device_info', I set the 
parent device pointer, set_dma_mask from parent device etc.

The parent device is created by declaration in device tree and the relevant 
node has property 'dma-coherent'.
So I get dma_coherent attribute as 1 for the parent device. In other words, 
is_dev_dma_coherent() returns 1 for parent device.

But for the device registered using platform_device_register_full(), the 
dma_coherent is not set and is_dev_dma_coherent() returns 0.

Can someone please guide me, how to declare the newly registered device as 
'dma_coherent'.

Regards

Vakul


[PATCH v3 2/2] USB: quirks: Disable LPM for Logitech UVC devices

2018-12-07 Thread Kyle Williams
From: Kyle Williams 

Description: Some USB device / host controller combinations seem to have
problems with Link Power management. In particular it is described that
the combination of certain Logitech uvc devices and other powered media
devices such causes 'not enough bandwidth for new device state' error.

This patch enables the USB_QUIRK_NO_LPM quirk entries for all connected
Logitech UVC devices indicating LPM should remain disabled for the device.

Signed-off-by: Kyle Williams 
---
 drivers/usb/core/quirks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index 0690fcff0ea2..7e6df958 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -440,7 +440,7 @@ static const struct usb_device_id usb_quirk_list[] = {
 static const struct usb_device_id usb_interface_quirk_list[] = {
/* Logitech UVC Cameras */
{ USB_VENDOR_AND_INTERFACE_INFO(0x046d, USB_CLASS_VIDEO, 1, 0),
- .driver_info = USB_QUIRK_RESET_RESUME },
+ .driver_info = USB_QUIRK_RESET_RESUME | USB_QUIRK_NO_LPM },
 
{ }  /* terminating entry must be last */
 };
-- 
2.20.0.rc2.403.gdbc3b29805-goog



[PATCH v3 0/2] Disable LPM by matching interface

2018-12-07 Thread Kyle Williams
Changes in v3:
- seperated work to allow matched interfaces to use the
USB_QUIRK_NO_LPM quirk to affect a broader range of devices instead of
setting each device individually

Kyle Williams (2):
  USB: quirks: Check device interface LPM capability
  USB: quirks: Disable LPM for Logitech UVC devices

 drivers/usb/core/hub.c| 87 +--
 drivers/usb/core/quirks.c |  2 +-
 2 files changed, 49 insertions(+), 40 deletions(-)

-- 
2.20.0.rc2.403.gdbc3b29805-goog



[PATCH v3 1/2] USB: quirks: Check device interface LPM capability

2018-12-07 Thread Kyle Williams
From: Kyle Williams 

Description: enable the ability to disable LPM for all devices matched
by interface information

Signed-off-by: Kyle Williams 
---
 drivers/usb/core/hub.c | 87 +++---
 1 file changed, 48 insertions(+), 39 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 0f9381b69a3b..8f366ec4d21b 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -365,6 +365,51 @@ static void usb_set_lpm_parameters(struct usb_device *udev)
usb_set_lpm_sel(udev, >u2_params);
 }
 
+/*
+ * There are reports of USB 3.0 devices that say they support USB 2.0 Link PM
+ * when they're plugged into a USB 2.0 port, but they don't work when LPM is
+ * enabled.
+ *
+ * Only enable USB 2.0 Link PM if the port is internal (hardwired), or the
+ * device says it supports the new USB 2.0 Link PM errata by setting the BESL
+ * support bit in the BOS descriptor.
+ */
+static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev)
+{
+   struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent);
+   int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN;
+
+   if (!udev->usb2_hw_lpm_capable || !udev->bos)
+   return;
+
+   if (hub)
+   connect_type = hub->ports[udev->portnum - 1]->connect_type;
+
+   if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT))
+   || connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) {
+   udev->usb2_hw_lpm_allowed = 1;
+   usb_set_usb2_hardware_lpm(udev, 1);
+   }
+}
+
+void usb_update_device_lpm(struct usb_hcd *hcd, struct usb_device *udev)
+{
+   int retval;
+
+   if (udev->wusb == 0 && le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0201) {
+   retval = usb_get_bos_descriptor(udev);
+   if (!retval) {
+   udev->lpm_capable = usb_device_supports_lpm(udev);
+   usb_set_lpm_parameters(udev);
+   }
+   }
+
+   /* notify HCD that we have a device connected and addressed */
+   if (hcd->driver->update_device)
+   hcd->driver->update_device(hcd, udev);
+   hub_set_initial_usb2_lpm_policy(udev);
+}
+
 /* USB 2.0 spec Section 11.24.4.5 */
 static int get_hub_descriptor(struct usb_device *hdev,
struct usb_hub_descriptor *desc)
@@ -2295,7 +2340,6 @@ static int usb_enumerate_device_otg(struct usb_device 
*udev)
return err;
 }
 
-
 /**
  * usb_enumerate_device - Read device configs/intfs/otg (usbcore-internal)
  * @udev: newly addressed device (in ADDRESS state)
@@ -2351,6 +2395,8 @@ static int usb_enumerate_device(struct usb_device *udev)
 
usb_detect_interface_quirks(udev);
 
+   usb_update_device_lpm(hcd, udev);
+
return 0;
 }
 
@@ -4402,33 +4448,6 @@ static int hub_set_address(struct usb_device *udev, int 
devnum)
return retval;
 }
 
-/*
- * There are reports of USB 3.0 devices that say they support USB 2.0 Link PM
- * when they're plugged into a USB 2.0 port, but they don't work when LPM is
- * enabled.
- *
- * Only enable USB 2.0 Link PM if the port is internal (hardwired), or the
- * device says it supports the new USB 2.0 Link PM errata by setting the BESL
- * support bit in the BOS descriptor.
- */
-static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev)
-{
-   struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent);
-   int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN;
-
-   if (!udev->usb2_hw_lpm_capable || !udev->bos)
-   return;
-
-   if (hub)
-   connect_type = hub->ports[udev->portnum - 1]->connect_type;
-
-   if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) 
||
-   connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) {
-   udev->usb2_hw_lpm_allowed = 1;
-   usb_set_usb2_hardware_lpm(udev, 1);
-   }
-}
-
 static int hub_enable_device(struct usb_device *udev)
 {
struct usb_hcd *hcd = bus_to_hcd(udev->bus);
@@ -4779,19 +4798,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device 
*udev, int port1,
 
usb_detect_quirks(udev);
 
-   if (udev->wusb == 0 && le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0201) {
-   retval = usb_get_bos_descriptor(udev);
-   if (!retval) {
-   udev->lpm_capable = usb_device_supports_lpm(udev);
-   usb_set_lpm_parameters(udev);
-   }
-   }
+   usb_update_device_lpm(hcd, udev);
 
retval = 0;
-   /* notify HCD that we have a device connected and addressed */
-   if (hcd->driver->update_device)
-   hcd->driver->update_device(hcd, udev);
-   hub_set_initial_usb2_lpm_policy(udev);
 fail:
if (retval) {
hub_port_disable(hub, port1, 0);
-- 
2.20.0.rc2.403.gdbc3b29805-goog



Re: [PATCH 1/1] virtio: remove deprecated VIRTIO_PCI_CONFIG()

2018-12-07 Thread Dongli Zhang



On 12/08/2018 02:01 AM, Michael S. Tsirkin wrote:
> On Fri, Dec 07, 2018 at 03:34:41PM +0800, Dongli Zhang wrote:
>> VIRTIO_PCI_CONFIG() is deprecated. Use VIRTIO_PCI_CONFIG_OFF() instead.
>>
>> Signed-off-by: Dongli Zhang 
>> ---
>>  drivers/virtio/virtio_pci_legacy.c | 6 --
>>  include/uapi/linux/virtio_pci.h| 2 --
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_pci_legacy.c 
>> b/drivers/virtio/virtio_pci_legacy.c
>> index de062fb..eff9ddc 100644
>> --- a/drivers/virtio/virtio_pci_legacy.c
>> +++ b/drivers/virtio/virtio_pci_legacy.c
>> @@ -52,7 +52,8 @@ static void vp_get(struct virtio_device *vdev, unsigned 
>> offset,
>>  {
>>  struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>  void __iomem *ioaddr = vp_dev->ioaddr +
>> -VIRTIO_PCI_CONFIG(vp_dev) + offset;
>> +VIRTIO_PCI_CONFIG_OFF(vp_dev->msix_enabled) +
>> +offset;
>>  u8 *ptr = buf;
>>  int i;
>>  
>> @@ -67,7 +68,8 @@ static void vp_set(struct virtio_device *vdev, unsigned 
>> offset,
>>  {
>>  struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>  void __iomem *ioaddr = vp_dev->ioaddr +
>> -VIRTIO_PCI_CONFIG(vp_dev) + offset;
>> +VIRTIO_PCI_CONFIG_OFF(vp_dev->msix_enabled) +
>> +offset;
>>  const u8 *ptr = buf;
>>  int i;
>>
> 
> I agree that VIRTIO_PCI_CONFIG_OFF is a better interface. So above looks
> fine.
>   
>> diff --git a/include/uapi/linux/virtio_pci.h 
>> b/include/uapi/linux/virtio_pci.h
>> index 90007a1..2070232 100644
>> --- a/include/uapi/linux/virtio_pci.h
>> +++ b/include/uapi/linux/virtio_pci.h
>> @@ -78,8 +78,6 @@
>>  /* The remaining space is defined by each driver as the per-driver
>>   * configuration space */
>>  #define VIRTIO_PCI_CONFIG_OFF(msix_enabled) ((msix_enabled) ? 24 : 20)
>> -/* Deprecated: please use VIRTIO_PCI_CONFIG_OFF instead */
>> -#define VIRTIO_PCI_CONFIG(dev)  
>> VIRTIO_PCI_CONFIG_OFF((dev)->msix_enabled)
>>  
>>  /* Virtio ABI version, this must match exactly */
>>  #define VIRTIO_PCI_ABI_VERSION  0
> 
> This might break some userspace builds, I don't see why we should bother
> removing it. Any reason?

Apologies.

I thought about some compatibility issue for building third-party kernel module
at userspace, but did not realize it will break other userspace software builds.

I will keep the definition of VIRTIO_PCI_CONFIG() and resend again.

Thank you very much!

Dongli Zhang

> 
> 
>> -- 
>> 2.7.4


[PATCH bpf-next 2/3] bpf: add bpffs pretty print for cgroup local storage maps

2018-12-07 Thread Roman Gushchin
Implement bpffs pretty printing for cgroup local storage maps
(both shared and per-cpu).
Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c):

Shared:
  $ cat /sys/fs/bpf/map_2
  # WARNING!! The output is for debug purpose only
  # WARNING!! The output format will change
  {4294968594,1}: {,1039896}

Per-cpu:
  $ cat /sys/fs/bpf/map_1
  # WARNING!! The output is for debug purpose only
  # WARNING!! The output format will change
  {4294968594,1}: {
cpu0: {0,0,0,0,0}
cpu1: {0,0,0,0,0}
cpu2: {1,104,0,0,0}
cpu3: {0,0,0,0,0}
  }

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---
 include/linux/btf.h| 10 +
 kernel/bpf/local_storage.c | 90 +-
 2 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 8c2199b5d250..ac67bc4cbfd9 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -5,6 +5,7 @@
 #define _LINUX_BTF_H 1
 
 #include 
+#include 
 
 struct btf;
 struct btf_type;
@@ -63,4 +64,13 @@ static inline const char *btf_name_by_offset(const struct 
btf *btf,
 }
 #endif
 
+static inline const struct btf_type *btf_orig_type(const struct btf *btf,
+  const struct btf_type *t)
+{
+   while (t && BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF)
+   t = btf_type_by_id(btf, t->type);
+
+   return t;
+}
+
 #endif
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index b65017dead44..7b51fe1aba3c 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -1,11 +1,13 @@
 //SPDX-License-Identifier: GPL-2.0
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 DEFINE_PER_CPU(struct bpf_cgroup_storage*, 
bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
 
@@ -308,6 +310,91 @@ static int cgroup_storage_delete_elem(struct bpf_map *map, 
void *key)
return -EINVAL;
 }
 
+static int cgroup_storage_check_btf(const struct bpf_map *map,
+   const struct btf *btf,
+   const struct btf_type *key_type,
+   const struct btf_type *value_type)
+{
+   const struct btf_type *st, *t;
+   struct btf_member *m;
+
+   /* Key is expected to be of struct bpf_cgroup_storage_key type,
+* which is:
+* struct bpf_cgroup_storage_key {
+*  __u64   cgroup_inode_id;
+*  __u32   attach_type;
+* };
+*/
+
+   /*
+* Key_type must be a structure (or a typedef of a structure) with
+* two members.
+*/
+   st = btf_orig_type(btf, key_type);
+   if (BTF_INFO_KIND(st->info) != BTF_KIND_STRUCT ||
+   BTF_INFO_VLEN(st->info) != 2)
+   return -EINVAL;
+
+   /*
+* The first field must be a 64 bit integer at 0 offset.
+*/
+   m = (struct btf_member *)(st + 1);
+   t = btf_orig_type(btf, btf_type_by_id(btf, m->type));
+   if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT || m->offset ||
+   t->size !=
+   FIELD_SIZEOF(struct bpf_cgroup_storage_key, cgroup_inode_id))
+   return -EINVAL;
+
+   /*
+* The second field must be a 32 bit integer at 0 offset.
+*/
+   m = m + 1;
+   t = btf_orig_type(btf, btf_type_by_id(btf, m->type));
+   if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT ||
+   m->offset != offsetof(struct bpf_cgroup_storage_key, attach_type) *
+   BITS_PER_BYTE || t->size !=
+   FIELD_SIZEOF(struct bpf_cgroup_storage_key, attach_type))
+   return -EINVAL;
+
+   return 0;
+}
+
+static void cgroup_storage_seq_show_elem(struct bpf_map *map, void *_key,
+struct seq_file *m)
+{
+   enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
+   struct bpf_cgroup_storage_key *key = _key;
+   struct bpf_cgroup_storage *storage;
+   int cpu;
+
+   rcu_read_lock();
+   storage = cgroup_storage_lookup(map_to_storage(map), key, false);
+   if (!storage) {
+   rcu_read_unlock();
+   return;
+   }
+
+   btf_type_seq_show(map->btf, map->btf_key_type_id, key, m);
+   stype = cgroup_storage_type(map);
+   if (stype == BPF_CGROUP_STORAGE_SHARED) {
+   seq_puts(m, ": ");
+   btf_type_seq_show(map->btf, map->btf_value_type_id,
+ _ONCE(storage->buf)->data[0], m);
+   seq_puts(m, "\n");
+   } else {
+   seq_puts(m, ": {\n");
+   for_each_possible_cpu(cpu) {
+   seq_printf(m, "\tcpu%d: ", cpu);
+   btf_type_seq_show(map->btf, map->btf_value_type_id,
+ per_cpu_ptr(storage->percpu_buf, cpu),
+ 

[PATCH bpf-next 1/3] bpf: pass struct btf pointer to the map_check_btf() callback

2018-12-07 Thread Roman Gushchin
If key_type or value_type are of non-trivial data types
(e.g. structure or typedef), it's not possible to check them without
the additional information, which can't be obtained without a pointer
to the btf structure.

So, let's pass btf pointer to the map_check_btf() callbacks.

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---
 include/linux/bpf.h   | 3 +++
 kernel/bpf/arraymap.c | 1 +
 kernel/bpf/lpm_trie.c | 1 +
 kernel/bpf/syscall.c  | 3 ++-
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e82b7039fc66..128d93540b23 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -23,6 +23,7 @@ struct bpf_prog;
 struct bpf_map;
 struct sock;
 struct seq_file;
+struct btf;
 struct btf_type;
 
 /* map is generic key/value storage optionally accesible by eBPF programs */
@@ -52,6 +53,7 @@ struct bpf_map_ops {
void (*map_seq_show_elem)(struct bpf_map *map, void *key,
  struct seq_file *m);
int (*map_check_btf)(const struct bpf_map *map,
+const struct btf *btf,
 const struct btf_type *key_type,
 const struct btf_type *value_type);
 };
@@ -126,6 +128,7 @@ static inline bool bpf_map_support_seq_show(const struct 
bpf_map *map)
 }
 
 int map_check_no_btf(const struct bpf_map *map,
+const struct btf *btf,
 const struct btf_type *key_type,
 const struct btf_type *value_type);
 
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 24583da9ffd1..25632a75d630 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -382,6 +382,7 @@ static void percpu_array_map_seq_show_elem(struct bpf_map 
*map, void *key,
 }
 
 static int array_map_check_btf(const struct bpf_map *map,
+  const struct btf *btf,
   const struct btf_type *key_type,
   const struct btf_type *value_type)
 {
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index bfd4882e1106..abf1002080df 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -728,6 +728,7 @@ static int trie_get_next_key(struct bpf_map *map, void 
*_key, void *_next_key)
 }
 
 static int trie_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
  const struct btf_type *key_type,
  const struct btf_type *value_type)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index aa05aa38f4a8..7c2e8ab03a34 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -456,6 +456,7 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
 }
 
 int map_check_no_btf(const struct bpf_map *map,
+const struct btf *btf,
 const struct btf_type *key_type,
 const struct btf_type *value_type)
 {
@@ -478,7 +479,7 @@ static int map_check_btf(const struct bpf_map *map, const 
struct btf *btf,
return -EINVAL;
 
if (map->ops->map_check_btf)
-   ret = map->ops->map_check_btf(map, key_type, value_type);
+   ret = map->ops->map_check_btf(map, btf, key_type, value_type);
 
return ret;
 }
-- 
2.19.2



[PATCH bpf-next 3/3] selftests/bpf: add btf annotations for cgroup_local_storage maps

2018-12-07 Thread Roman Gushchin
Add btf annotations to cgroup local storage maps (per-cpu and shared)
in the network packet counting example.

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---
 tools/testing/selftests/bpf/netcnt_prog.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/bpf/netcnt_prog.c 
b/tools/testing/selftests/bpf/netcnt_prog.c
index 1198abca1360..9f741e69cebe 100644
--- a/tools/testing/selftests/bpf/netcnt_prog.c
+++ b/tools/testing/selftests/bpf/netcnt_prog.c
@@ -16,12 +16,18 @@ struct bpf_map_def SEC("maps") percpu_netcnt = {
.value_size = sizeof(struct percpu_net_cnt),
 };
 
+BPF_ANNOTATE_KV_PAIR(percpu_netcnt, struct bpf_cgroup_storage_key,
+struct percpu_net_cnt);
+
 struct bpf_map_def SEC("maps") netcnt = {
.type = BPF_MAP_TYPE_CGROUP_STORAGE,
.key_size = sizeof(struct bpf_cgroup_storage_key),
.value_size = sizeof(struct net_cnt),
 };
 
+BPF_ANNOTATE_KV_PAIR(netcnt, struct bpf_cgroup_storage_key,
+struct net_cnt);
+
 SEC("cgroup/skb")
 int bpf_nextcnt(struct __sk_buff *skb)
 {
-- 
2.19.2



Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-07 Thread John Hubbard
On 12/7/18 11:16 AM, Jerome Glisse wrote:
> On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote:
>> On 12/4/18 5:57 PM, John Hubbard wrote:
>>> On 12/4/18 5:44 PM, Jerome Glisse wrote:
 On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote:
> On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote:
>> On 12/4/18 3:03 PM, Dan Williams wrote:
>>> Except the LRU fields are already in use for ZONE_DEVICE pages... how
>>> does this proposal interact with those?
>>
>> Very badly: page->pgmap and page->hmm_data both get corrupted. Is there 
>> an entire
>> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? 
>> Said another
>> way: is it reasonable to disallow calling get_user_pages() on 
>> ZONE_DEVICE pages?
>>
>> If we have to support get_user_pages() on ZONE_DEVICE pages, then the 
>> whole 
>> LRU field approach is unusable.
>
> We just need to rearrange ZONE_DEVICE pages.  Please excuse the whitespace
> damage:
>
> +++ b/include/linux/mm_types.h
> @@ -151,10 +151,12 @@ struct page {
>  #endif
> };
> struct {/* ZONE_DEVICE pages */
> +   unsigned long _zd_pad_2;/* LRU */
> +   unsigned long _zd_pad_3;/* LRU */
> +   unsigned long _zd_pad_1;/* uses mapping */
> /** @pgmap: Points to the hosting device page 
> map. */
> struct dev_pagemap *pgmap;
> unsigned long hmm_data;
> -   unsigned long _zd_pad_1;/* uses mapping */
> };
>  
> /** @rcu_head: You can use this to free a page by RCU. */
>
> You don't use page->private or page->index, do you Dan?

 page->private and page->index are use by HMM DEVICE page.

>>>
>>> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field remaining 
>>> for 
>>> dma-pinned information. Which might work. To recap, we need:
>>>
>>> -- 1 bit for PageDmaPinned
>>> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru.
>>> -- N bits for a reference count
>>>
>>> Those *could* be packed into a single 64-bit field, if really necessary.
>>>
>>
>> ...actually, this needs to work on 32-bit systems, as well. And HMM is using 
>> a lot.
>> However, it is still possible for this to work.
>>
>> Matthew, can I have that bit now please? I'm about out of options, and now 
>> it will actually
>> solve the problem here.
>>
>> Given:
>>
>> 1) It's cheap to know if a page is ZONE_DEVICE, and ZONE_DEVICE means not on 
>> the LRU.
>> That, in turn, means only 1 bit instead of 2 bits (in addition to a counter) 
>> is required, 
>> for that case. 
>>
>> 2) There is an independent bit available (according to Matthew). 
>>
>> 3) HMM uses 4 of the 5 struct page fields, so only one field is available 
>> for a counter 
>>in that case.
> 
> To expend on this, HMM private page are use for anonymous page
> so the index and mapping fields have the value you expect for
> such pages. Down the road i want also to support file backed
> page with HMM private (mapping, private, index).
> 
> For HMM public both anonymous and file back page are supported
> today (HMM public is only useful on platform with something like
> OpenCAPI, CCIX or NVlink ... so PowerPC for now).
> 
>> 4) get_user_pages() must work on ZONE_DEVICE and HMM pages.
> 
> get_user_pages() only need to work with HMM public page not the
> private one as we can not allow _anyone_ to pin HMM private page.
> So on get_user_pages() on HMM private we get a page fault and
> it is migrated back to regular memory.
> 
> 
>> 5) For a proper atomic counter for both 32- and 64-bit, we really do need a 
>> complete
>> unsigned long field.
>>
>> So that leads to the following approach:
>>
>> -- Use a single unsigned long field for an atomic reference count for the 
>> DMA pinned count.
>> For normal pages, this will be the *second* field of the LRU (in order to 
>> avoid PageTail bit).
>>
>> For ZONE_DEVICE pages, we can also line up the fields so that the second LRU 
>> field is 
>> available and reserved for this DMA pinned count. Basically _zd_pad_1 gets 
>> move up and
>> optionally renamed:
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 017ab82e36ca..b5dcd9398cae 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -90,8 +90,8 @@ struct page {
>>  * are in use.
>>  */
>> struct {
>> -   unsigned long dma_pinned_flags;
>> -   atomic_t  dma_pinned_count;
>> +   unsigned long dma_pinned_flags; /* 
>> LRU.next */
>> + 

Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-07 Thread John Hubbard
On 12/7/18 11:16 AM, Jerome Glisse wrote:
> On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote:
>> On 12/4/18 5:57 PM, John Hubbard wrote:
>>> On 12/4/18 5:44 PM, Jerome Glisse wrote:
 On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote:
> On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote:
>> On 12/4/18 3:03 PM, Dan Williams wrote:
>>> Except the LRU fields are already in use for ZONE_DEVICE pages... how
>>> does this proposal interact with those?
>>
>> Very badly: page->pgmap and page->hmm_data both get corrupted. Is there 
>> an entire
>> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? 
>> Said another
>> way: is it reasonable to disallow calling get_user_pages() on 
>> ZONE_DEVICE pages?
>>
>> If we have to support get_user_pages() on ZONE_DEVICE pages, then the 
>> whole 
>> LRU field approach is unusable.
>
> We just need to rearrange ZONE_DEVICE pages.  Please excuse the whitespace
> damage:
>
> +++ b/include/linux/mm_types.h
> @@ -151,10 +151,12 @@ struct page {
>  #endif
> };
> struct {/* ZONE_DEVICE pages */
> +   unsigned long _zd_pad_2;/* LRU */
> +   unsigned long _zd_pad_3;/* LRU */
> +   unsigned long _zd_pad_1;/* uses mapping */
> /** @pgmap: Points to the hosting device page 
> map. */
> struct dev_pagemap *pgmap;
> unsigned long hmm_data;
> -   unsigned long _zd_pad_1;/* uses mapping */
> };
>  
> /** @rcu_head: You can use this to free a page by RCU. */
>
> You don't use page->private or page->index, do you Dan?

 page->private and page->index are use by HMM DEVICE page.

>>>
>>> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field remaining 
>>> for 
>>> dma-pinned information. Which might work. To recap, we need:
>>>
>>> -- 1 bit for PageDmaPinned
>>> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru.
>>> -- N bits for a reference count
>>>
>>> Those *could* be packed into a single 64-bit field, if really necessary.
>>>
>>
>> ...actually, this needs to work on 32-bit systems, as well. And HMM is using 
>> a lot.
>> However, it is still possible for this to work.
>>
>> Matthew, can I have that bit now please? I'm about out of options, and now 
>> it will actually
>> solve the problem here.
>>
>> Given:
>>
>> 1) It's cheap to know if a page is ZONE_DEVICE, and ZONE_DEVICE means not on 
>> the LRU.
>> That, in turn, means only 1 bit instead of 2 bits (in addition to a counter) 
>> is required, 
>> for that case. 
>>
>> 2) There is an independent bit available (according to Matthew). 
>>
>> 3) HMM uses 4 of the 5 struct page fields, so only one field is available 
>> for a counter 
>>in that case.
> 
> To expend on this, HMM private page are use for anonymous page
> so the index and mapping fields have the value you expect for
> such pages. Down the road i want also to support file backed
> page with HMM private (mapping, private, index).
> 
> For HMM public both anonymous and file back page are supported
> today (HMM public is only useful on platform with something like
> OpenCAPI, CCIX or NVlink ... so PowerPC for now).
> 
>> 4) get_user_pages() must work on ZONE_DEVICE and HMM pages.
> 
> get_user_pages() only need to work with HMM public page not the
> private one as we can not allow _anyone_ to pin HMM private page.
> So on get_user_pages() on HMM private we get a page fault and
> it is migrated back to regular memory.
> 
> 
>> 5) For a proper atomic counter for both 32- and 64-bit, we really do need a 
>> complete
>> unsigned long field.
>>
>> So that leads to the following approach:
>>
>> -- Use a single unsigned long field for an atomic reference count for the 
>> DMA pinned count.
>> For normal pages, this will be the *second* field of the LRU (in order to 
>> avoid PageTail bit).
>>
>> For ZONE_DEVICE pages, we can also line up the fields so that the second LRU 
>> field is 
>> available and reserved for this DMA pinned count. Basically _zd_pad_1 gets 
>> move up and
>> optionally renamed:
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 017ab82e36ca..b5dcd9398cae 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -90,8 +90,8 @@ struct page {
>>  * are in use.
>>  */
>> struct {
>> -   unsigned long dma_pinned_flags;
>> -   atomic_t  dma_pinned_count;
>> +   unsigned long dma_pinned_flags; /* 
>> LRU.next */
>> + 

Re: [PATCH] Fix sync. in blkdev_write_iter() acessing i_flags

2018-12-07 Thread Al Viro
On Fri, Dec 07, 2018 at 08:49:16PM +0100, Alexander Lochmann wrote:

> > _What_ SUID bit?  We are talking about a write to block device, for fsck 
> > sake...
> > 
> That's the way I understood Jan's explanation:
> "
> Thinking more about this I'm not sure if this is actually the right
> solution. Because for example the write(2) can set S_NOSEC flag wrongly
> when it would race with chmod adding SUID bit. So probably we rather need
> to acquire i_rwsem in blkdev_write_iter() if file does not have S_NOSEC set
> (we don't want to acquire it unconditionally as that would heavily impact
> scalability of block device writes).

IDGI.  We are talking about a block device here.  What business could
file_remove_privs() have doing _anything_ to it?  should_remove_suid() returns
to return 0 for those; what case do you have in mind?  Somebody setting
security.capabilities on a block device inode?

IMO the bug here is file_remove_privs() not buggering off immediately
after having observed that we are dealing with a block device.  It really
has nothing useful to do.


Re: [PATCH] Fix sync. in blkdev_write_iter() acessing i_flags

2018-12-07 Thread Al Viro
On Fri, Dec 07, 2018 at 08:49:16PM +0100, Alexander Lochmann wrote:

> > _What_ SUID bit?  We are talking about a write to block device, for fsck 
> > sake...
> > 
> That's the way I understood Jan's explanation:
> "
> Thinking more about this I'm not sure if this is actually the right
> solution. Because for example the write(2) can set S_NOSEC flag wrongly
> when it would race with chmod adding SUID bit. So probably we rather need
> to acquire i_rwsem in blkdev_write_iter() if file does not have S_NOSEC set
> (we don't want to acquire it unconditionally as that would heavily impact
> scalability of block device writes).

IDGI.  We are talking about a block device here.  What business could
file_remove_privs() have doing _anything_ to it?  should_remove_suid() returns
to return 0 for those; what case do you have in mind?  Somebody setting
security.capabilities on a block device inode?

IMO the bug here is file_remove_privs() not buggering off immediately
after having observed that we are dealing with a block device.  It really
has nothing useful to do.


[PATCH v2] USB: quirks: disable LPM for Logitech UVC devices

2018-12-07 Thread Kyle Williams
From: Kyle Williams 

Description: Some USB device / host controller combinations seem to have
problems with Link Power management. In particular it is described that
the combination of certain Logitech uvc devices and other powered media
devices such causes 'not enough bandwidth for new device state' error.

This patch enables the USB_QUIRK_NO_LPM quirk entries for all connected
Logitech UVC devices indicating LPM should remain disabled for the device.

Signed-off-by: Kyle Williams 
---

Changes in v2:
- changed commit message
- disble lpm for all logitech uvc devices instead of listing manually
- changes to allow the lpm to correctly be disabled after enumeration

 drivers/usb/core/hub.c| 87 +--
 drivers/usb/core/quirks.c |  2 +-
 2 files changed, 49 insertions(+), 40 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 0f9381b69a3b..8f366ec4d21b 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -365,6 +365,51 @@ static void usb_set_lpm_parameters(struct usb_device *udev)
usb_set_lpm_sel(udev, >u2_params);
 }
 
+/*
+ * There are reports of USB 3.0 devices that say they support USB 2.0 Link PM
+ * when they're plugged into a USB 2.0 port, but they don't work when LPM is
+ * enabled.
+ *
+ * Only enable USB 2.0 Link PM if the port is internal (hardwired), or the
+ * device says it supports the new USB 2.0 Link PM errata by setting the BESL
+ * support bit in the BOS descriptor.
+ */
+static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev)
+{
+   struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent);
+   int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN;
+
+   if (!udev->usb2_hw_lpm_capable || !udev->bos)
+   return;
+
+   if (hub)
+   connect_type = hub->ports[udev->portnum - 1]->connect_type;
+
+   if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT))
+   || connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) {
+   udev->usb2_hw_lpm_allowed = 1;
+   usb_set_usb2_hardware_lpm(udev, 1);
+   }
+}
+
+void usb_update_device_lpm(struct usb_hcd *hcd, struct usb_device *udev)
+{
+   int retval;
+
+   if (udev->wusb == 0 && le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0201) {
+   retval = usb_get_bos_descriptor(udev);
+   if (!retval) {
+   udev->lpm_capable = usb_device_supports_lpm(udev);
+   usb_set_lpm_parameters(udev);
+   }
+   }
+
+   /* notify HCD that we have a device connected and addressed */
+   if (hcd->driver->update_device)
+   hcd->driver->update_device(hcd, udev);
+   hub_set_initial_usb2_lpm_policy(udev);
+}
+
 /* USB 2.0 spec Section 11.24.4.5 */
 static int get_hub_descriptor(struct usb_device *hdev,
struct usb_hub_descriptor *desc)
@@ -2295,7 +2340,6 @@ static int usb_enumerate_device_otg(struct usb_device 
*udev)
return err;
 }
 
-
 /**
  * usb_enumerate_device - Read device configs/intfs/otg (usbcore-internal)
  * @udev: newly addressed device (in ADDRESS state)
@@ -2351,6 +2395,8 @@ static int usb_enumerate_device(struct usb_device *udev)
 
usb_detect_interface_quirks(udev);
 
+   usb_update_device_lpm(hcd, udev);
+
return 0;
 }
 
@@ -4402,33 +4448,6 @@ static int hub_set_address(struct usb_device *udev, int 
devnum)
return retval;
 }
 
-/*
- * There are reports of USB 3.0 devices that say they support USB 2.0 Link PM
- * when they're plugged into a USB 2.0 port, but they don't work when LPM is
- * enabled.
- *
- * Only enable USB 2.0 Link PM if the port is internal (hardwired), or the
- * device says it supports the new USB 2.0 Link PM errata by setting the BESL
- * support bit in the BOS descriptor.
- */
-static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev)
-{
-   struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent);
-   int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN;
-
-   if (!udev->usb2_hw_lpm_capable || !udev->bos)
-   return;
-
-   if (hub)
-   connect_type = hub->ports[udev->portnum - 1]->connect_type;
-
-   if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) 
||
-   connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) {
-   udev->usb2_hw_lpm_allowed = 1;
-   usb_set_usb2_hardware_lpm(udev, 1);
-   }
-}
-
 static int hub_enable_device(struct usb_device *udev)
 {
struct usb_hcd *hcd = bus_to_hcd(udev->bus);
@@ -4779,19 +4798,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device 
*udev, int port1,
 
usb_detect_quirks(udev);
 
-   if (udev->wusb == 0 && le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0201) {
-   retval = usb_get_bos_descriptor(udev);
-   if (!retval) {
-   udev->lpm_capable = usb_device_supports_lpm(udev);
- 

Re: Recommended driver for current AMD processors

2018-12-07 Thread Ken Moffat
Hi Paul,

On Fri, 7 Dec 2018 at 15:32, Paul Menzel  wrote:
>
> Dear Linux folks,
>
>
> What driver is recommended for current AMD Ryzen based processors
> like *AMD Ryzen 5 PRO 1500 Quad-Core Processor* or *AMD EPYC 7601
> 32-Core Processor*?
>
> Only from the acpi-cpufreq Kconfig description, I assume, that that
> driver should be used.
>
> > config X86_ACPI_CPUFREQ
> > tristate "ACPI Processor P-States driver"
> > depends on ACPI_PROCESSOR
> > help
> >   This driver adds a CPUFreq driver which utilizes the ACPI
> >   Processor Performance States.
> >   This driver also supports Intel Enhanced Speedstep and newer
> >   AMD CPUs.
> >
> >   To compile this driver as a module, choose M here: the
> >   module will be called acpi-cpufreq.
> >
> >   For details, take a look at .
> >
> >   If in doubt, say N.
>
> Would a “native” driver like Intel’s P state driver also give better
> results? Do you know if AMD is working on something like that?
>
>
>
> Kind regards,
>
> Paul
>

As a mere user, I bought a ryzen 3 1300X earlier this year, and was
annoyed about how the frequencies changed when a single-threaded
compile moved to a different core (unlike e.g. a haswell where
frequencies barely vary).

I have a meter which can report the current power consumption
for the system (computer, monitor, network switch, kvm switch), and
using that to keep an eye on the range of reported wattages when idle,
compiling with -j1 and compiling with -j4, I eventually formed the
impression that the ondemand governor was marginally better than
the performance governor, and that omitting cpufreq did not appear to
increase the poer consumption.

But that is just one set of observations.  I agree that using less
power and getting faster compiles would be nice, so if something is
available I'll be keen to try it.

ĸen


Re: Recommended driver for current AMD processors

2018-12-07 Thread Ken Moffat
Hi Paul,

On Fri, 7 Dec 2018 at 15:32, Paul Menzel  wrote:
>
> Dear Linux folks,
>
>
> What driver is recommended for current AMD Ryzen based processors
> like *AMD Ryzen 5 PRO 1500 Quad-Core Processor* or *AMD EPYC 7601
> 32-Core Processor*?
>
> Only from the acpi-cpufreq Kconfig description, I assume, that that
> driver should be used.
>
> > config X86_ACPI_CPUFREQ
> > tristate "ACPI Processor P-States driver"
> > depends on ACPI_PROCESSOR
> > help
> >   This driver adds a CPUFreq driver which utilizes the ACPI
> >   Processor Performance States.
> >   This driver also supports Intel Enhanced Speedstep and newer
> >   AMD CPUs.
> >
> >   To compile this driver as a module, choose M here: the
> >   module will be called acpi-cpufreq.
> >
> >   For details, take a look at .
> >
> >   If in doubt, say N.
>
> Would a “native” driver like Intel’s P state driver also give better
> results? Do you know if AMD is working on something like that?
>
>
>
> Kind regards,
>
> Paul
>

As a mere user, I bought a ryzen 3 1300X earlier this year, and was
annoyed about how the frequencies changed when a single-threaded
compile moved to a different core (unlike e.g. a haswell where
frequencies barely vary).

I have a meter which can report the current power consumption
for the system (computer, monitor, network switch, kvm switch), and
using that to keep an eye on the range of reported wattages when idle,
compiling with -j1 and compiling with -j4, I eventually formed the
impression that the ondemand governor was marginally better than
the performance governor, and that omitting cpufreq did not appear to
increase the poer consumption.

But that is just one set of observations.  I agree that using less
power and getting faster compiles would be nice, so if something is
available I'll be keen to try it.

ĸen


Should this_cpu_read() be volatile?

2018-12-07 Thread Nadav Amit
[Resend, changing title & adding lkml and some others ]

On Dec 7, 2018, at 3:12 PM, Nadav Amit  wrote:

[ We can start a new thread, since I have the tendency to hijack threads. ]

> On Dec 7, 2018, at 12:45 AM, Peter Zijlstra  wrote:
> 
> On Thu, Dec 06, 2018 at 09:26:24AM -0800, Nadav Amit wrote:
>>> On Dec 6, 2018, at 2:25 AM, Peter Zijlstra  wrote:
>>> 
>>> On Thu, Dec 06, 2018 at 12:28:26AM -0800, Nadav Amit wrote:
 [ +Peter ]
 

[snip]

 
 *But* there is one thing that may require some attention - patch
 b59167ac7bafd ("x86/percpu: Fix this_cpu_read()”) set ordering constraints
 on the VM_ARGS() evaluation. And this patch also imposes, it appears,
 (unnecessary) constraints on other pieces of code.
 
 These constraints are due to the addition of the volatile keyword for
 this_cpu_read() by the patch. This affects at least 68 functions in my
 kernel build, some of which are hot (I think), e.g., finish_task_switch(),
 smp_x86_platform_ipi() and select_idle_sibling().
 
 Peter, perhaps the solution was too big of a hammer? Is it possible instead
 to create a separate "this_cpu_read_once()” with the volatile keyword? Such
 a function can be used for native_sched_clock() and other seqlocks, etc.
>>> 
>>> No. like the commit writes this_cpu_read() _must_ imply READ_ONCE(). If
>>> you want something else, use something else, there's plenty other
>>> options available.
>>> 
>>> There's this_cpu_op_stable(), but also __this_cpu_read() and
>>> raw_this_cpu_read() (which currently don't differ from this_cpu_read()
>>> but could).
>> 
>> Would setting the inline assembly memory operand both as input and output be
>> better than using the “volatile”?
> 
> I don't know.. I'm forever befuddled by the exact semantics of gcc
> inline asm.
> 
>> I think that If you do that, the compiler would should the this_cpu_read()
>> as something that changes the per-cpu-variable, which would make it invalid
>> to re-read the value. At the same time, it would not prevent reordering the
>> read with other stuff.
> 
> So the thing is; as I wrote, the generic version of this_cpu_*() is:
> 
>   local_irq_save();
>   __this_cpu_*();
>   local_irq_restore();
> 
> And per local_irq_{save,restore}() including compiler barriers that
> cannot be reordered around either.
> 
> And per the principle of least surprise, I think our primitives should
> have similar semantics.

I guess so, but as you’ll see below, the end result is ugly.

> I'm actually having difficulty finding the this_cpu_read() in any of the
> functions you mention, so I cannot make any concrete suggestions other
> than pointing at the alternative functions available.


So I got deeper into the code to understand a couple of differences. In the
case of select_idle_sibling(), the patch (Peter’s) increase the function
code size by 123 bytes (over the baseline of 986). The per-cpu variable is
called through the following call chain:

select_idle_sibling()
=> select_idle_cpu()
=> local_clock()
=> raw_smp_processor_id()

And results in 2 more calls to sched_clock_cpu(), as the compiler assumes
the processor id changes in between (which obviously wouldn’t happen). There
may be more changes around, which I didn’t fully analyze. But the very least
reading the processor id should not get “volatile”.

As for finish_task_switch(), the impact is only few bytes, but still
unnecessary. It appears that with your patch preempt_count() causes multiple
reads of __preempt_count in this code:

   if (WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
 "corrupted preempt_count: %s/%d/0x%x\n",
 current->comm, current->pid, preempt_count()))
   preempt_count_set(FORK_PREEMPT_COUNT);

Again, this is unwarranted, as the preemption count should not be changed in
any interrupt.




Should this_cpu_read() be volatile?

2018-12-07 Thread Nadav Amit
[Resend, changing title & adding lkml and some others ]

On Dec 7, 2018, at 3:12 PM, Nadav Amit  wrote:

[ We can start a new thread, since I have the tendency to hijack threads. ]

> On Dec 7, 2018, at 12:45 AM, Peter Zijlstra  wrote:
> 
> On Thu, Dec 06, 2018 at 09:26:24AM -0800, Nadav Amit wrote:
>>> On Dec 6, 2018, at 2:25 AM, Peter Zijlstra  wrote:
>>> 
>>> On Thu, Dec 06, 2018 at 12:28:26AM -0800, Nadav Amit wrote:
 [ +Peter ]
 

[snip]

 
 *But* there is one thing that may require some attention - patch
 b59167ac7bafd ("x86/percpu: Fix this_cpu_read()”) set ordering constraints
 on the VM_ARGS() evaluation. And this patch also imposes, it appears,
 (unnecessary) constraints on other pieces of code.
 
 These constraints are due to the addition of the volatile keyword for
 this_cpu_read() by the patch. This affects at least 68 functions in my
 kernel build, some of which are hot (I think), e.g., finish_task_switch(),
 smp_x86_platform_ipi() and select_idle_sibling().
 
 Peter, perhaps the solution was too big of a hammer? Is it possible instead
 to create a separate "this_cpu_read_once()” with the volatile keyword? Such
 a function can be used for native_sched_clock() and other seqlocks, etc.
>>> 
>>> No. like the commit writes this_cpu_read() _must_ imply READ_ONCE(). If
>>> you want something else, use something else, there's plenty other
>>> options available.
>>> 
>>> There's this_cpu_op_stable(), but also __this_cpu_read() and
>>> raw_this_cpu_read() (which currently don't differ from this_cpu_read()
>>> but could).
>> 
>> Would setting the inline assembly memory operand both as input and output be
>> better than using the “volatile”?
> 
> I don't know.. I'm forever befuddled by the exact semantics of gcc
> inline asm.
> 
>> I think that If you do that, the compiler would should the this_cpu_read()
>> as something that changes the per-cpu-variable, which would make it invalid
>> to re-read the value. At the same time, it would not prevent reordering the
>> read with other stuff.
> 
> So the thing is; as I wrote, the generic version of this_cpu_*() is:
> 
>   local_irq_save();
>   __this_cpu_*();
>   local_irq_restore();
> 
> And per local_irq_{save,restore}() including compiler barriers that
> cannot be reordered around either.
> 
> And per the principle of least surprise, I think our primitives should
> have similar semantics.

I guess so, but as you’ll see below, the end result is ugly.

> I'm actually having difficulty finding the this_cpu_read() in any of the
> functions you mention, so I cannot make any concrete suggestions other
> than pointing at the alternative functions available.


So I got deeper into the code to understand a couple of differences. In the
case of select_idle_sibling(), the patch (Peter’s) increase the function
code size by 123 bytes (over the baseline of 986). The per-cpu variable is
called through the following call chain:

select_idle_sibling()
=> select_idle_cpu()
=> local_clock()
=> raw_smp_processor_id()

And results in 2 more calls to sched_clock_cpu(), as the compiler assumes
the processor id changes in between (which obviously wouldn’t happen). There
may be more changes around, which I didn’t fully analyze. But the very least
reading the processor id should not get “volatile”.

As for finish_task_switch(), the impact is only few bytes, but still
unnecessary. It appears that with your patch preempt_count() causes multiple
reads of __preempt_count in this code:

   if (WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
 "corrupted preempt_count: %s/%d/0x%x\n",
 current->comm, current->pid, preempt_count()))
   preempt_count_set(FORK_PREEMPT_COUNT);

Again, this is unwarranted, as the preemption count should not be changed in
any interrupt.




Process for severe early stable bugs?

2018-12-07 Thread Laura Abbott

The latest file system corruption issue (Nominally fixed by
ffe81d45322c ("blk-mq: fix corruption with direct issue") later
fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch
list")) brought a lot of rightfully concerned users asking about
release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to
4.19.3 on Nov 23. When the issue started getting visibility,
users were left with the option of running known EOL 4.18.x
kernels or running a 4.19 series that could corrupt their
data. Admittedly, the risk of running the EOL kernel was pretty
low given how recent it was, but it's still not a great look
to tell people to run something marked EOL.

I'm wondering if there's anything we can do to make things easier
on kernel consumers. Bugs will certainly happen but it really
makes it hard to push the "always run the latest stable" narrative
if there isn't a good fallback when things go seriously wrong. I
don't actually have a great proposal for a solution here other than
retroactively bringing back 4.18 (which I don't think Greg would
like) but I figured I should at least bring it up.

Thanks,
Laura


Process for severe early stable bugs?

2018-12-07 Thread Laura Abbott

The latest file system corruption issue (Nominally fixed by
ffe81d45322c ("blk-mq: fix corruption with direct issue") later
fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch
list")) brought a lot of rightfully concerned users asking about
release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to
4.19.3 on Nov 23. When the issue started getting visibility,
users were left with the option of running known EOL 4.18.x
kernels or running a 4.19 series that could corrupt their
data. Admittedly, the risk of running the EOL kernel was pretty
low given how recent it was, but it's still not a great look
to tell people to run something marked EOL.

I'm wondering if there's anything we can do to make things easier
on kernel consumers. Bugs will certainly happen but it really
makes it hard to push the "always run the latest stable" narrative
if there isn't a good fallback when things go seriously wrong. I
don't actually have a great proposal for a solution here other than
retroactively bringing back 4.18 (which I don't think Greg would
like) but I figured I should at least bring it up.

Thanks,
Laura


Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped

2018-12-07 Thread Andrew Morton
On Wed, 5 Dec 2018 22:07:37 -0800 Anthony Yznaga  
wrote:

> Would you like me to submit a revised patch?  An -mm tree diff?

Either is OK.  I usually turn replacemensts into deltas so we can see
what changed.



Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped

2018-12-07 Thread Andrew Morton
On Wed, 5 Dec 2018 22:07:37 -0800 Anthony Yznaga  
wrote:

> Would you like me to submit a revised patch?  An -mm tree diff?

Either is OK.  I usually turn replacemensts into deltas so we can see
what changed.



  1   2   3   4   5   6   7   8   9   10   >