Re: Process for severe early stable bugs?
Hi Laura, On Fri, Dec 07, 2018 at 04:33:10PM -0800, Laura Abbott wrote: > The latest file system corruption issue (Nominally fixed by > ffe81d45322c ("blk-mq: fix corruption with direct issue") later > fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch > list")) brought a lot of rightfully concerned users asking about > release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to > 4.19.3 on Nov 23. When the issue started getting visibility, > users were left with the option of running known EOL 4.18.x > kernels or running a 4.19 series that could corrupt their > data. Admittedly, the risk of running the EOL kernel was pretty > low given how recent it was, but it's still not a great look > to tell people to run something marked EOL. > > I'm wondering if there's anything we can do to make things easier > on kernel consumers. Bugs will certainly happen but it really > makes it hard to push the "always run the latest stable" narrative > if there isn't a good fallback when things go seriously wrong. I > don't actually have a great proposal for a solution here other than > retroactively bringing back 4.18 (which I don't think Greg would > like) but I figured I should at least bring it up. This type of problem may happen once in a while but fortunately is extremely rare, so I guess it can be addressed with unusual methods. For my use cases, I always make sure that the last two LTS branches work fine. Since there's some great maintenance overlap between LTS branches, I can quickly switch to 4.14.x (or even 4.9.x) if this happens. In our products we make sure that our toolchain is built with support for the previous kernel as well "just in case". We've never switched back and will probably never do, but at least it serves us a lot to compare strange behaviours between two kernels. I think that if your distro is functionally and technically compatible with the previous LTS branch, it could be an acceptable escape for users who are concerned about their data and their security at the same time. After all, previous LTS branches are there for those who can't upgrade. In my opinion this situation perfectly qualifies. But it requires some preparation like I mentioned. It might be that some components in the distro rely on features from the very latest kernels. At the very least it might deserve a bit of inspection to know if such dependencies exist, and/or what is lost in case of such a fall back, to warn users. Just my two cents, Willy
Re: Process for severe early stable bugs?
Hi Laura, On Fri, Dec 07, 2018 at 04:33:10PM -0800, Laura Abbott wrote: > The latest file system corruption issue (Nominally fixed by > ffe81d45322c ("blk-mq: fix corruption with direct issue") later > fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch > list")) brought a lot of rightfully concerned users asking about > release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to > 4.19.3 on Nov 23. When the issue started getting visibility, > users were left with the option of running known EOL 4.18.x > kernels or running a 4.19 series that could corrupt their > data. Admittedly, the risk of running the EOL kernel was pretty > low given how recent it was, but it's still not a great look > to tell people to run something marked EOL. > > I'm wondering if there's anything we can do to make things easier > on kernel consumers. Bugs will certainly happen but it really > makes it hard to push the "always run the latest stable" narrative > if there isn't a good fallback when things go seriously wrong. I > don't actually have a great proposal for a solution here other than > retroactively bringing back 4.18 (which I don't think Greg would > like) but I figured I should at least bring it up. This type of problem may happen once in a while but fortunately is extremely rare, so I guess it can be addressed with unusual methods. For my use cases, I always make sure that the last two LTS branches work fine. Since there's some great maintenance overlap between LTS branches, I can quickly switch to 4.14.x (or even 4.9.x) if this happens. In our products we make sure that our toolchain is built with support for the previous kernel as well "just in case". We've never switched back and will probably never do, but at least it serves us a lot to compare strange behaviours between two kernels. I think that if your distro is functionally and technically compatible with the previous LTS branch, it could be an acceptable escape for users who are concerned about their data and their security at the same time. After all, previous LTS branches are there for those who can't upgrade. In my opinion this situation perfectly qualifies. But it requires some preparation like I mentioned. It might be that some components in the distro rely on features from the very latest kernels. At the very least it might deserve a bit of inspection to know if such dependencies exist, and/or what is lost in case of such a fall back, to warn users. Just my two cents, Willy
Re: [PATCH] leds: tlc591xx: fix device_node_continue.cocci warnings (fwd)
On Fri, 7 Dec 2018, Jacek Anaszewski wrote: > Hi Julia, > > Thank you for the patch, but it doesn't apply to LED tree. > > The patch causing the problem is out-of-LED-tree. OK, I guess that the patch is in a TI-specific tree, given the name. Thanks for looking into it. julia > > Best regards, > Jacek Anaszewski > > On 12/6/18 9:28 PM, Julia Lawall wrote: > > Hello, > > > > The code seems to be wrong in several ways. If the continue is wanted, > > the of_node_put is not needed; it will happen on the next iteration. If > > the continue is not wanted, the of_node_put is needed, and the continue > > should be dropped. > > > > julia > > > > -- Forwarded message -- > > Date: Thu, 6 Dec 2018 19:48:54 +0800 > > From: kbuild test robot > > To: kbu...@01.org > > Cc: Julia Lawall > > Subject: [PATCH] leds: tlc591xx: fix device_node_continue.cocci warnings > > > > CC: kbuild-...@01.org > > TO: Jyri Sarha > > CC: Peter Ujfalusi > > CC: Jacek Anaszewski > > CC: Pavel Machek > > CC: linux-l...@vger.kernel.org > > CC: linux-kernel@vger.kernel.org > > > > From: kbuild test robot > > > > drivers/leds/leds-tlc591xx.c:342:3-14: ERROR: probable double put. > > > > Device node iterators put the previous value of the index variable, so an > > explicit put causes a double put. > > > > Generated by: scripts/coccinelle/iterators/device_node_continue.cocci > > > > Fixes: 7b2d34aaede7 ("leds: tlc591xx: Add gpio output support") > > CC: Jyri Sarha > > Signed-off-by: kbuild test robot > > --- > > > > tree: https://github.com/omap-audio/linux-audio peter/ti-linux-4.19.y/wip > > head: 838f24e2deaf1229002bd6555eb7e889b09ac1f9 > > commit: 7b2d34aaede727b4abfc78061bbd2202fcd92bc8 [62/67] leds: tlc591xx: Add > > gpio output support > > :: branch date: 26 hours ago > > :: commit date: 26 hours ago > > > > Please take the patch only if it's a positive warning. Thanks! > > > > leds-tlc591xx.c |1 - > > 1 file changed, 1 deletion(-) > > > > --- a/drivers/leds/leds-tlc591xx.c > > +++ b/drivers/leds/leds-tlc591xx.c > > @@ -339,7 +339,6 @@ tlc591xx_probe(struct i2c_client *client > > for_each_child_of_node(np, child) { > > err = of_property_read_u32(child, "reg", ); > > if (err) { > > - of_node_put(child); > > continue; > > return err; > > } > > > > >
Re: [PATCH] leds: tlc591xx: fix device_node_continue.cocci warnings (fwd)
On Fri, 7 Dec 2018, Jacek Anaszewski wrote: > Hi Julia, > > Thank you for the patch, but it doesn't apply to LED tree. > > The patch causing the problem is out-of-LED-tree. OK, I guess that the patch is in a TI-specific tree, given the name. Thanks for looking into it. julia > > Best regards, > Jacek Anaszewski > > On 12/6/18 9:28 PM, Julia Lawall wrote: > > Hello, > > > > The code seems to be wrong in several ways. If the continue is wanted, > > the of_node_put is not needed; it will happen on the next iteration. If > > the continue is not wanted, the of_node_put is needed, and the continue > > should be dropped. > > > > julia > > > > -- Forwarded message -- > > Date: Thu, 6 Dec 2018 19:48:54 +0800 > > From: kbuild test robot > > To: kbu...@01.org > > Cc: Julia Lawall > > Subject: [PATCH] leds: tlc591xx: fix device_node_continue.cocci warnings > > > > CC: kbuild-...@01.org > > TO: Jyri Sarha > > CC: Peter Ujfalusi > > CC: Jacek Anaszewski > > CC: Pavel Machek > > CC: linux-l...@vger.kernel.org > > CC: linux-kernel@vger.kernel.org > > > > From: kbuild test robot > > > > drivers/leds/leds-tlc591xx.c:342:3-14: ERROR: probable double put. > > > > Device node iterators put the previous value of the index variable, so an > > explicit put causes a double put. > > > > Generated by: scripts/coccinelle/iterators/device_node_continue.cocci > > > > Fixes: 7b2d34aaede7 ("leds: tlc591xx: Add gpio output support") > > CC: Jyri Sarha > > Signed-off-by: kbuild test robot > > --- > > > > tree: https://github.com/omap-audio/linux-audio peter/ti-linux-4.19.y/wip > > head: 838f24e2deaf1229002bd6555eb7e889b09ac1f9 > > commit: 7b2d34aaede727b4abfc78061bbd2202fcd92bc8 [62/67] leds: tlc591xx: Add > > gpio output support > > :: branch date: 26 hours ago > > :: commit date: 26 hours ago > > > > Please take the patch only if it's a positive warning. Thanks! > > > > leds-tlc591xx.c |1 - > > 1 file changed, 1 deletion(-) > > > > --- a/drivers/leds/leds-tlc591xx.c > > +++ b/drivers/leds/leds-tlc591xx.c > > @@ -339,7 +339,6 @@ tlc591xx_probe(struct i2c_client *client > > for_each_child_of_node(np, child) { > > err = of_property_read_u32(child, "reg", ); > > if (err) { > > - of_node_put(child); > > continue; > > return err; > > } > > > > >
Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
On Fri, Dec 7, 2018 at 4:53 PM John Hubbard wrote: > > On 12/7/18 11:16 AM, Jerome Glisse wrote: > > On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote: [..] > I see. OK, HMM has done an efficient job of mopping up unused fields, and now > we are > completely out of space. At this point, after thinking about it carefully, it > seems clear > that it's time for a single, new field: > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 5ed8f6292a53..1c789e324da8 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -182,6 +182,9 @@ struct page { > /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */ > atomic_t _refcount; > > + /* DMA usage count. See get_user_pages*(), put_user_page*(). */ > + atomic_t _dma_pinned_count; > + > #ifdef CONFIG_MEMCG > struct mem_cgroup *mem_cgroup; > #endif > > > ...because after all, the reason this is so difficult is that this fix has to > work > in pretty much every configuration. get_user_pages() use is widespread, it's > a very > general facility, and...it needs fixing. And we're out of space. HMM seems entirely too greedy in this regard. Especially with zero upstream users. When can we start to delete the pieces of HMM that have no upstream consumers? I would think that would be 4.21 / 5.0 as there needs to be some forcing function. We can always re-add pieces of HMM with it's users when / if they arrive.
Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
On Fri, Dec 7, 2018 at 4:53 PM John Hubbard wrote: > > On 12/7/18 11:16 AM, Jerome Glisse wrote: > > On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote: [..] > I see. OK, HMM has done an efficient job of mopping up unused fields, and now > we are > completely out of space. At this point, after thinking about it carefully, it > seems clear > that it's time for a single, new field: > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 5ed8f6292a53..1c789e324da8 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -182,6 +182,9 @@ struct page { > /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */ > atomic_t _refcount; > > + /* DMA usage count. See get_user_pages*(), put_user_page*(). */ > + atomic_t _dma_pinned_count; > + > #ifdef CONFIG_MEMCG > struct mem_cgroup *mem_cgroup; > #endif > > > ...because after all, the reason this is so difficult is that this fix has to > work > in pretty much every configuration. get_user_pages() use is widespread, it's > a very > general facility, and...it needs fixing. And we're out of space. HMM seems entirely too greedy in this regard. Especially with zero upstream users. When can we start to delete the pieces of HMM that have no upstream consumers? I would think that would be 4.21 / 5.0 as there needs to be some forcing function. We can always re-add pieces of HMM with it's users when / if they arrive.
Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions
On Sat, Dec 08, 2018 at 12:48:59PM +0900, Masami Hiramatsu wrote: > On Fri, 7 Dec 2018 18:58:05 +0100 > Andrea Righi wrote: > > > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote: > > > Hi Andrea and Ingo, > > > > > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed > > > working. > > > After introducing this patch, I will start adding > > > arch_populate_kprobe_blacklist() > > > to some arches. > > > > > > Thank you, > > > > > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited > > > area > > > > > > From: Masami Hiramatsu > > > > > > Blacklist symbols in arch-defined probe-prohibited areas. > > > With this change, user can see all symbols which are prohibited > > > to probe in debugfs. > > > > > > All archtectures which have custom prohibit areas should define > > > its own arch_populate_kprobe_blacklist() function, but unless that, > > > all symbols marked __kprobes are blacklisted. > > > > What about iterating all symbols and use arch_within_kprobe_blacklist() > > to check if we need to blacklist them or not. > > Sorry, I don't want to iterate all ksyms since it may take a long time > (especially embedded small devices.) > > > > > In this way we don't have to introduce an > > arch_populate_kprobe_blacklist() for each architecture. > > Hmm, I had a same idea, but there are some arch which prohibit probing > extable entries (e.g. arm64.) For correctness of the blacklist, I think > it should be listed (not entire the function body). > I also rather like to remove arch_within_kprobe_blacklist() instead. OK. Thanks. -Andrea
Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions
On Sat, Dec 08, 2018 at 12:48:59PM +0900, Masami Hiramatsu wrote: > On Fri, 7 Dec 2018 18:58:05 +0100 > Andrea Righi wrote: > > > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote: > > > Hi Andrea and Ingo, > > > > > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed > > > working. > > > After introducing this patch, I will start adding > > > arch_populate_kprobe_blacklist() > > > to some arches. > > > > > > Thank you, > > > > > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited > > > area > > > > > > From: Masami Hiramatsu > > > > > > Blacklist symbols in arch-defined probe-prohibited areas. > > > With this change, user can see all symbols which are prohibited > > > to probe in debugfs. > > > > > > All archtectures which have custom prohibit areas should define > > > its own arch_populate_kprobe_blacklist() function, but unless that, > > > all symbols marked __kprobes are blacklisted. > > > > What about iterating all symbols and use arch_within_kprobe_blacklist() > > to check if we need to blacklist them or not. > > Sorry, I don't want to iterate all ksyms since it may take a long time > (especially embedded small devices.) > > > > > In this way we don't have to introduce an > > arch_populate_kprobe_blacklist() for each architecture. > > Hmm, I had a same idea, but there are some arch which prohibit probing > extable entries (e.g. arm64.) For correctness of the blacklist, I think > it should be listed (not entire the function body). > I also rather like to remove arch_within_kprobe_blacklist() instead. OK. Thanks. -Andrea
Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions
On Sat, Dec 08, 2018 at 12:42:10PM +0900, Masami Hiramatsu wrote: > On Fri, 7 Dec 2018 18:00:26 +0100 > Andrea Righi wrote: > > > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote: > > > Hi Andrea and Ingo, > > > > > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed > > > working. > > > After introducing this patch, I will start adding > > > arch_populate_kprobe_blacklist() > > > to some arches. > > > > > > Thank you, > > > > > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited > > > area > > > > > > From: Masami Hiramatsu > > > > > > Blacklist symbols in arch-defined probe-prohibited areas. > > > With this change, user can see all symbols which are prohibited > > > to probe in debugfs. > > > > > > All archtectures which have custom prohibit areas should define > > > its own arch_populate_kprobe_blacklist() function, but unless that, > > > all symbols marked __kprobes are blacklisted. > > > > > > Reported-by: Andrea Righi > > > Signed-off-by: Masami Hiramatsu > > > --- > > > > [snip] > > > > > +int kprobe_add_ksym_blacklist(unsigned long entry) > > > +{ > > > + struct kprobe_blacklist_entry *ent; > > > + unsigned long offset = 0, size = 0; > > > + > > > + if (!kernel_text_address(entry) || > > > + !kallsyms_lookup_size_offset(entry, , )) > > > + return -EINVAL; > > > + > > > + ent = kmalloc(sizeof(*ent), GFP_KERNEL); > > > + if (!ent) > > > + return -ENOMEM; > > > + ent->start_addr = entry - offset; > > > + ent->end_addr = entry - offset + size; > > > > Do we need to take offset into account? The code before wasn't using it. > > Yes, if we hit an alias symbol (zero-size), we forcibly increment address > and retry it. In that case, offset will be 1. > > > > > > + INIT_LIST_HEAD(>list); > > > + list_add_tail(>list, _blacklist); > > > + > > > + return (int)size; > > > +} > > > + > > > +/* Add functions in arch defined probe-prohibited area */ > > > +int __weak arch_populate_kprobe_blacklist(void) > > > +{ > > > + unsigned long entry; > > > + int ret = 0; > > > + > > > + for (entry = (unsigned long)__kprobes_text_start; > > > + entry < (unsigned long)__kprobes_text_end; > > > + entry += ret) { > > > + ret = kprobe_add_ksym_blacklist(entry); > > > + if (ret < 0) > > > + return ret; > > > + if (ret == 0) /* In case of alias symbol */ > > > + ret = 1; > > Here, we incremented. > > Thank you, Makes sense, thanks for the clarification. -Andrea
Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions
On Sat, Dec 08, 2018 at 12:42:10PM +0900, Masami Hiramatsu wrote: > On Fri, 7 Dec 2018 18:00:26 +0100 > Andrea Righi wrote: > > > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote: > > > Hi Andrea and Ingo, > > > > > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed > > > working. > > > After introducing this patch, I will start adding > > > arch_populate_kprobe_blacklist() > > > to some arches. > > > > > > Thank you, > > > > > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited > > > area > > > > > > From: Masami Hiramatsu > > > > > > Blacklist symbols in arch-defined probe-prohibited areas. > > > With this change, user can see all symbols which are prohibited > > > to probe in debugfs. > > > > > > All archtectures which have custom prohibit areas should define > > > its own arch_populate_kprobe_blacklist() function, but unless that, > > > all symbols marked __kprobes are blacklisted. > > > > > > Reported-by: Andrea Righi > > > Signed-off-by: Masami Hiramatsu > > > --- > > > > [snip] > > > > > +int kprobe_add_ksym_blacklist(unsigned long entry) > > > +{ > > > + struct kprobe_blacklist_entry *ent; > > > + unsigned long offset = 0, size = 0; > > > + > > > + if (!kernel_text_address(entry) || > > > + !kallsyms_lookup_size_offset(entry, , )) > > > + return -EINVAL; > > > + > > > + ent = kmalloc(sizeof(*ent), GFP_KERNEL); > > > + if (!ent) > > > + return -ENOMEM; > > > + ent->start_addr = entry - offset; > > > + ent->end_addr = entry - offset + size; > > > > Do we need to take offset into account? The code before wasn't using it. > > Yes, if we hit an alias symbol (zero-size), we forcibly increment address > and retry it. In that case, offset will be 1. > > > > > > + INIT_LIST_HEAD(>list); > > > + list_add_tail(>list, _blacklist); > > > + > > > + return (int)size; > > > +} > > > + > > > +/* Add functions in arch defined probe-prohibited area */ > > > +int __weak arch_populate_kprobe_blacklist(void) > > > +{ > > > + unsigned long entry; > > > + int ret = 0; > > > + > > > + for (entry = (unsigned long)__kprobes_text_start; > > > + entry < (unsigned long)__kprobes_text_end; > > > + entry += ret) { > > > + ret = kprobe_add_ksym_blacklist(entry); > > > + if (ret < 0) > > > + return ret; > > > + if (ret == 0) /* In case of alias symbol */ > > > + ret = 1; > > Here, we incremented. > > Thank you, Makes sense, thanks for the clarification. -Andrea
[PATCH] ARC: Remove 0x prefix from unit-address of node names
Done automatically with help of: --->8 sed -i 's/@0x/@/g' arch/arc/boot/dts/*.dts* --->8 Inspired by [1] and the like. [1] http://kisskb.ellerman.id.au/kisskb/buildresult/13612017/ Signed-off-by: Alexey Brodkin Cc: Rob Herring Signed-off-by: Alexey Brodkin --- arch/arc/boot/dts/abilis_tb10x.dtsi | 4 ++-- arch/arc/boot/dts/axc001.dtsi | 6 +++--- arch/arc/boot/dts/axc003.dtsi | 14 +++--- arch/arc/boot/dts/axc003_idu.dtsi | 14 +++--- arch/arc/boot/dts/axs10x_mb.dtsi | 22 +++--- arch/arc/boot/dts/vdk_axc003.dtsi | 4 ++-- arch/arc/boot/dts/vdk_axc003_idu.dtsi | 4 ++-- arch/arc/boot/dts/vdk_axs10x_mb.dtsi | 14 +++--- 8 files changed, 41 insertions(+), 41 deletions(-) diff --git a/arch/arc/boot/dts/abilis_tb10x.dtsi b/arch/arc/boot/dts/abilis_tb10x.dtsi index 3121536b25a3..77ac1368a65b 100644 --- a/arch/arc/boot/dts/abilis_tb10x.dtsi +++ b/arch/arc/boot/dts/abilis_tb10x.dtsi @@ -178,7 +178,7 @@ clocks = <_clk>; }; - spi0: spi@0xFE01 { + spi0: spi@FE01 { #address-cells = <1>; #size-cells = <0>; cell-index = <0>; @@ -189,7 +189,7 @@ interrupts = <26 8>; clocks = <_clk>; }; - spi1: spi@0xFE011000 { + spi1: spi@FE011000 { #address-cells = <1>; #size-cells = <0>; cell-index = <1>; diff --git a/arch/arc/boot/dts/axc001.dtsi b/arch/arc/boot/dts/axc001.dtsi index fdc266504ada..37be3bf03ad6 100644 --- a/arch/arc/boot/dts/axc001.dtsi +++ b/arch/arc/boot/dts/axc001.dtsi @@ -41,7 +41,7 @@ * this GPIO block ORs all interrupts on CPU card (creg,..) * to uplink only 1 IRQ to ARC core intc */ - dw-apb-gpio@0x2000 { + dw-apb-gpio@2000 { compatible = "snps,dw-apb-gpio"; reg = < 0x2000 0x80 >; #address-cells = <1>; @@ -60,7 +60,7 @@ }; }; - debug_uart: dw-apb-uart@0x5000 { + debug_uart: dw-apb-uart@5000 { compatible = "snps,dw-apb-uart"; reg = <0x5000 0x100>; clock-frequency = <3000>; @@ -88,7 +88,7 @@ * avoid duplicating the MB dtsi file given that IRQ from * this intc to cpu intc are different for axs101 and axs103 */ - mb_intc: dw-apb-ictl@0xe0012000 { + mb_intc: dw-apb-ictl@e0012000 { #interrupt-cells = <1>; compatible = "snps,dw-apb-ictl"; reg = < 0x0 0xe0012000 0x0 0x200 >; diff --git a/arch/arc/boot/dts/axc003.dtsi b/arch/arc/boot/dts/axc003.dtsi index d75d65ddf8e3..ffa208a57a0a 100644 --- a/arch/arc/boot/dts/axc003.dtsi +++ b/arch/arc/boot/dts/axc003.dtsi @@ -55,7 +55,7 @@ * this GPIO block ORs all interrupts on CPU card (creg,..) * to uplink only 1 IRQ to ARC core intc */ - dw-apb-gpio@0x2000 { + dw-apb-gpio@2000 { compatible = "snps,dw-apb-gpio"; reg = < 0x2000 0x80 >; #address-cells = <1>; @@ -74,7 +74,7 @@ }; }; - debug_uart: dw-apb-uart@0x5000 { + debug_uart: dw-apb-uart@5000 { compatible = "snps,dw-apb-uart"; reg = <0x5000 0x100>; clock-frequency = <3000>; @@ -102,19 +102,19 @@ * external DMA buffer located outside of IOC aperture. */ axs10x_mb { - ethernet@0x18000 { + ethernet@18000 { dma-coherent; }; - ehci@0x4 { + ehci@4 { dma-coherent; }; - ohci@0x6 { + ohci@6 { dma-coherent; }; - mmc@0x15000 { + mmc@15000 { dma-coherent; }; }; @@ -132,7 +132,7 @@ * avoid duplicating the MB dtsi file given that IRQ from * this intc to cpu intc are different for axs101 and axs103 */ - mb_intc: dw-apb-ictl@0xe0012000 { + mb_intc: dw-apb-ictl@e0012000 { #interrupt-cells = <1>; compatible = "snps,dw-apb-ictl"; reg = < 0x0 0xe0012000 0x0 0x200 >; diff --git a/arch/arc/boot/dts/axc003_idu.dtsi b/arch/arc/boot/dts/axc003_idu.dtsi index a05bb737ea63..ad795748ac63 100644 ---
Re: [PATCH 1/5] kconfig: remove unneeded setsym label in conf_read_simple()
On Fri, Nov 30, 2018 at 6:17 PM Masahiro Yamada wrote: > > The two 'goto setsym' statements are reachable only when sym == NULL. > > The code below the 'setsym:' label does nothing when sym == NULL > since there is just one if-block guarded by 'if (sym && ...)'. > > Hence, 'goto setsym' can be replaced with 'continue'. > > Signed-off-by: Masahiro Yamada > --- Series, applied to linux-kbuild/kconfig. > scripts/kconfig/confdata.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c > index 91d0a5c..1e35529 100644 > --- a/scripts/kconfig/confdata.c > +++ b/scripts/kconfig/confdata.c > @@ -363,7 +363,7 @@ int conf_read_simple(const char *name, int def) > sym = sym_find(line + 2 + strlen(CONFIG_)); > if (!sym) { > sym_add_change_count(1); > - goto setsym; > + continue; > } > } else { > sym = sym_lookup(line + 2 + strlen(CONFIG_), > 0); > @@ -397,7 +397,7 @@ int conf_read_simple(const char *name, int def) > sym = sym_find(line + strlen(CONFIG_)); > if (!sym) { > sym_add_change_count(1); > - goto setsym; > + continue; > } > } else { > sym = sym_lookup(line + strlen(CONFIG_), 0); > @@ -416,7 +416,7 @@ int conf_read_simple(const char *name, int def) > > continue; > } > -setsym: > + > if (sym && sym_is_choice_value(sym)) { > struct symbol *cs = > prop_get_symbol(sym_get_choice_prop(sym)); > switch (sym->def[def].tri) { > -- > 2.7.4 > -- Best Regards Masahiro Yamada
Re: [PATCH 1/5] kconfig: remove unneeded setsym label in conf_read_simple()
On Fri, Nov 30, 2018 at 6:17 PM Masahiro Yamada wrote: > > The two 'goto setsym' statements are reachable only when sym == NULL. > > The code below the 'setsym:' label does nothing when sym == NULL > since there is just one if-block guarded by 'if (sym && ...)'. > > Hence, 'goto setsym' can be replaced with 'continue'. > > Signed-off-by: Masahiro Yamada > --- Series, applied to linux-kbuild/kconfig. > scripts/kconfig/confdata.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c > index 91d0a5c..1e35529 100644 > --- a/scripts/kconfig/confdata.c > +++ b/scripts/kconfig/confdata.c > @@ -363,7 +363,7 @@ int conf_read_simple(const char *name, int def) > sym = sym_find(line + 2 + strlen(CONFIG_)); > if (!sym) { > sym_add_change_count(1); > - goto setsym; > + continue; > } > } else { > sym = sym_lookup(line + 2 + strlen(CONFIG_), > 0); > @@ -397,7 +397,7 @@ int conf_read_simple(const char *name, int def) > sym = sym_find(line + strlen(CONFIG_)); > if (!sym) { > sym_add_change_count(1); > - goto setsym; > + continue; > } > } else { > sym = sym_lookup(line + strlen(CONFIG_), 0); > @@ -416,7 +416,7 @@ int conf_read_simple(const char *name, int def) > > continue; > } > -setsym: > + > if (sym && sym_is_choice_value(sym)) { > struct symbol *cs = > prop_get_symbol(sym_get_choice_prop(sym)); > switch (sym->def[def].tri) { > -- > 2.7.4 > -- Best Regards Masahiro Yamada
Re: [PATCH] kbuild: move .SECONDARY special target to Kbuild.include
On Sat, Dec 1, 2018 at 9:29 AM Masahiro Yamada wrote: > > In commit 54a702f70589 ("kbuild: mark $(targets) as .SECONDARY and > remove .PRECIOUS markers"), I missed one important feature of the > .SECONDARY target: > > .SECONDARY with no prerequisites causes all targets to be treated >as secondary. > > Kbuild's policy is, "Do not delete any intermediate files." > So, just move it to scripts/Kbuild.include, with no prerequisites. > > Note: > If an intermediate file is generated by $(call if_changed,...), you > still need to add it to "targets" so its .*.cmd file is included. > > The arm/arm64 crypto files are generated by $(call cmd,shipped), > so they do not need to be added to "targets", but need to be added > to "clean-files" so "make clean" can properly clean them away. > > Signed-off-by: Masahiro Yamada > --- Applied to linux-kbuild. > arch/arm/crypto/Makefile | 2 +- > arch/arm64/crypto/Makefile | 2 +- > scripts/Kbuild.include | 3 +++ > scripts/Makefile.build | 4 > 4 files changed, 5 insertions(+), 6 deletions(-) > > diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile > index bd5bcee..89f88ab 100644 > --- a/arch/arm/crypto/Makefile > +++ b/arch/arm/crypto/Makefile > @@ -65,4 +65,4 @@ $(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl > $(call cmd,perl) > endif > > -targets += sha256-core.S sha512-core.S > +clean-files += sha256-core.S sha512-core.S > diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile > index f476fed..860d931 100644 > --- a/arch/arm64/crypto/Makefile > +++ b/arch/arm64/crypto/Makefile > @@ -75,4 +75,4 @@ $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > endif > > -targets += sha256-core.S sha512-core.S > +clean-files += sha256-core.S sha512-core.S > diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include > index bb01555..0454916 100644 > --- a/scripts/Kbuild.include > +++ b/scripts/Kbuild.include > @@ -393,3 +393,6 @@ endef > > # delete partially updated (i.e. corrupted) files on error > .DELETE_ON_ERROR: > + > +# do not delete intermediate files automatically > +.SECONDARY: > diff --git a/scripts/Makefile.build b/scripts/Makefile.build > index a8e7ba9..604096a 100644 > --- a/scripts/Makefile.build > +++ b/scripts/Makefile.build > @@ -546,8 +546,4 @@ $(shell mkdir -p $(obj-dirs)) > endif > endif > > -# Some files contained in $(targets) are intermediate artifacts. > -# We never want them to be removed automatically. > -.SECONDARY: $(targets) > - > .PHONY: $(PHONY) > -- > 2.7.4 > -- Best Regards Masahiro Yamada
Re: [PATCH 6/7] microblaze: fix race condition in building boot images
On Thu, Dec 6, 2018 at 1:32 AM Michal Simek wrote: > > On 03. 12. 18 8:50, Masahiro Yamada wrote: > > I fixed a race condition in the parallel building of ARM in commit > > 3939f3345050 ("ARM: 8418/1: add boot image dependencies to not > > generate invalid images"). > > > > I see the same problem for MicroBlaze too. > > > > "make -j ARCH=microblaze all linux.bin.ub" results in a broken build > > since two threads descend into arch/microblaze/boot simultaneously. > > I see also different problem that when I run that make above > linux.bin.ub is not generated at all. That's why I am fixing the problem. Before the fix, the parallel build fails. /usr/bin/mkimage: Can't read arch/microblaze/boot/linux.bin: Invalid argument arch/microblaze/boot/Makefile:14: recipe for target 'arch/microblaze/boot/linux.bin.ub' failed make[1]: *** [arch/microblaze/boot/linux.bin.ub] Error 1 make[1]: *** Deleting file 'arch/microblaze/boot/linux.bin.ub' arch/microblaze/Makefile:87: recipe for target 'linux.bin.ub' failed make: *** [linux.bin.ub] Error 2 make: *** Waiting for unfinished jobs MODPOST 6 modules Kernel: arch/microblaze/boot/linux.bin is ready (#2) > > > > > Add proper dependencies to avoid it. > > > > Signed-off-by: Masahiro Yamada > > --- > > > > arch/microblaze/Makefile | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/arch/microblaze/Makefile b/arch/microblaze/Makefile > > index 7a5df02..544796d 100644 > > --- a/arch/microblaze/Makefile > > +++ b/arch/microblaze/Makefile > > @@ -79,13 +79,15 @@ all: linux.bin > > archclean: > > $(Q)$(MAKE) $(clean)=$(boot) > > > > +linux.bin.ub linux.bin.gz: linux.bin > > + > > PHONY += linux.bin linux.bin.gz linux.bin.ub > > linux.bin linux.bin.gz linux.bin.ub: vmlinux > > $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@ > > @echo 'Kernel: $(boot)/$@ is ready' ' (#'`cat .version`')' > > > > PHONY += simpleImage.$(DTB) > > -simpleImage.$(DTB): vmlinux > > +simpleImage.$(DTB): linux.bin.ub > > It looks weird that simpleImage requires linux.bin.ub. > Is it really linux.bin.ub here or just linux.bin? This is intentional to avoid a race in "make simpleImage. linux.bin.ub" But, I chose to make simpleImage* independent of linux.bin* in v2. I hope you will like it. > > $(Q)$(MAKE) $(build)=$(boot) simple_images > > @echo 'Kernel: $(boot)/$@ is ready' ' (#'`cat .version`')' > > > > > > > Thanks, > Michal > > -- > Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91 > w: www.monstr.eu p: +42-0-721842854 > Maintainer of Linux kernel - Xilinx Microblaze > Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs > U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP/Versal SoCs > > -- Best Regards Masahiro Yamada
Re: [PATCH 6/7] microblaze: fix race condition in building boot images
On Thu, Dec 6, 2018 at 1:32 AM Michal Simek wrote: > > On 03. 12. 18 8:50, Masahiro Yamada wrote: > > I fixed a race condition in the parallel building of ARM in commit > > 3939f3345050 ("ARM: 8418/1: add boot image dependencies to not > > generate invalid images"). > > > > I see the same problem for MicroBlaze too. > > > > "make -j ARCH=microblaze all linux.bin.ub" results in a broken build > > since two threads descend into arch/microblaze/boot simultaneously. > > I see also different problem that when I run that make above > linux.bin.ub is not generated at all. That's why I am fixing the problem. Before the fix, the parallel build fails. /usr/bin/mkimage: Can't read arch/microblaze/boot/linux.bin: Invalid argument arch/microblaze/boot/Makefile:14: recipe for target 'arch/microblaze/boot/linux.bin.ub' failed make[1]: *** [arch/microblaze/boot/linux.bin.ub] Error 1 make[1]: *** Deleting file 'arch/microblaze/boot/linux.bin.ub' arch/microblaze/Makefile:87: recipe for target 'linux.bin.ub' failed make: *** [linux.bin.ub] Error 2 make: *** Waiting for unfinished jobs MODPOST 6 modules Kernel: arch/microblaze/boot/linux.bin is ready (#2) > > > > > Add proper dependencies to avoid it. > > > > Signed-off-by: Masahiro Yamada > > --- > > > > arch/microblaze/Makefile | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/arch/microblaze/Makefile b/arch/microblaze/Makefile > > index 7a5df02..544796d 100644 > > --- a/arch/microblaze/Makefile > > +++ b/arch/microblaze/Makefile > > @@ -79,13 +79,15 @@ all: linux.bin > > archclean: > > $(Q)$(MAKE) $(clean)=$(boot) > > > > +linux.bin.ub linux.bin.gz: linux.bin > > + > > PHONY += linux.bin linux.bin.gz linux.bin.ub > > linux.bin linux.bin.gz linux.bin.ub: vmlinux > > $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@ > > @echo 'Kernel: $(boot)/$@ is ready' ' (#'`cat .version`')' > > > > PHONY += simpleImage.$(DTB) > > -simpleImage.$(DTB): vmlinux > > +simpleImage.$(DTB): linux.bin.ub > > It looks weird that simpleImage requires linux.bin.ub. > Is it really linux.bin.ub here or just linux.bin? This is intentional to avoid a race in "make simpleImage. linux.bin.ub" But, I chose to make simpleImage* independent of linux.bin* in v2. I hope you will like it. > > $(Q)$(MAKE) $(build)=$(boot) simple_images > > @echo 'Kernel: $(boot)/$@ is ready' ' (#'`cat .version`')' > > > > > > > Thanks, > Michal > > -- > Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91 > w: www.monstr.eu p: +42-0-721842854 > Maintainer of Linux kernel - Xilinx Microblaze > Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs > U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP/Versal SoCs > > -- Best Regards Masahiro Yamada
Re: [PATCH v9 2/3] x86: add support for Huawei WMI hotkeys.
On Fri, 2018-12-07 at 00:52 -0500, ayman.baga...@gmail.com wrote: On Mon, 2018-12-03 at 21:17 +0200, Andy Shevchenko wrote: > On Mon, Dec 3, 2018 at 9:04 PM Takashi Iwai wrote: > > On Mon, 03 Dec 2018 19:53:39 +0100, > > Ayman Bagabas wrote: > > > + if (code == 0x80) { > > > + acpi_status status; > > > + acpi_handle handle; > > > + unsigned long long result; > > > + union acpi_object args[1]; > > > + struct acpi_object_list arg_list = { > > > + .pointer = args, > > > + .count = ARRAY_SIZE(args), > > > + }; > > > + > > > + args[0].type = ACPI_TYPE_INTEGER; > > > + args[0].integer.value = 0; > > > + > > > + status = acpi_get_handle(NULL, "\\WMI0", ); > > > + if (ACPI_FAILURE(status)) { > > > + dev_err(>dev, "Unable to get ACPI > > > handle\n"); > > > + return; > > > + } > > > + > > > + status = acpi_evaluate_integer(NULL, "WQ00", > > > _list, ); > > > > I guess you need to pass handle here? In the earlier version, you > > passed \\WMI0.WQ00, so it worked with NULL handle. But now it's no > > longer so... > > I think in this case we don't need to have a separate call to get > handle and try to get integer directly. In either we will have an > error case if method / namespace / etc is not found. > > I was digging further into the DSDT table of the laptop that has WQ00 method and it turned out that this method is a ACPI_WMI_EXPENSIVE. It has its own GUID of "39142400-C6A3-40fa-BADB-8A2652834100". I believe wmi_query_block, which is deprecated BTW, can be used to access this method instead of acpi_evaluate. However, I don't have a machine to test it on. Any ideas?
Re: [PATCH v9 2/3] x86: add support for Huawei WMI hotkeys.
On Fri, 2018-12-07 at 00:52 -0500, ayman.baga...@gmail.com wrote: On Mon, 2018-12-03 at 21:17 +0200, Andy Shevchenko wrote: > On Mon, Dec 3, 2018 at 9:04 PM Takashi Iwai wrote: > > On Mon, 03 Dec 2018 19:53:39 +0100, > > Ayman Bagabas wrote: > > > + if (code == 0x80) { > > > + acpi_status status; > > > + acpi_handle handle; > > > + unsigned long long result; > > > + union acpi_object args[1]; > > > + struct acpi_object_list arg_list = { > > > + .pointer = args, > > > + .count = ARRAY_SIZE(args), > > > + }; > > > + > > > + args[0].type = ACPI_TYPE_INTEGER; > > > + args[0].integer.value = 0; > > > + > > > + status = acpi_get_handle(NULL, "\\WMI0", ); > > > + if (ACPI_FAILURE(status)) { > > > + dev_err(>dev, "Unable to get ACPI > > > handle\n"); > > > + return; > > > + } > > > + > > > + status = acpi_evaluate_integer(NULL, "WQ00", > > > _list, ); > > > > I guess you need to pass handle here? In the earlier version, you > > passed \\WMI0.WQ00, so it worked with NULL handle. But now it's no > > longer so... > > I think in this case we don't need to have a separate call to get > handle and try to get integer directly. In either we will have an > error case if method / namespace / etc is not found. > > I was digging further into the DSDT table of the laptop that has WQ00 method and it turned out that this method is a ACPI_WMI_EXPENSIVE. It has its own GUID of "39142400-C6A3-40fa-BADB-8A2652834100". I believe wmi_query_block, which is deprecated BTW, can be used to access this method instead of acpi_evaluate. However, I don't have a machine to test it on. Any ideas?
Re: [PATCH 1/2] um: remove -fno-unit-at-a-time workaround for pre-4.0 GCC
x86 maintainers, Ping. On Tue, Nov 13, 2018 at 6:48 PM Richard Weinberger wrote: > > Am Montag, 12. November 2018, 03:35:19 CET schrieb Masahiro Yamada: > > Commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6") > > bumped the minimum GCC version to 4.6 for all architectures. > > > > '$(call cc-option,-fno-unit-at-a-time)' is now dead code since > > '$(cc-version) -lt 0400' is always false. > > > > Signed-off-by: Masahiro Yamada > > --- > > > > arch/x86/Makefile.um | 8 ++-- > > 1 file changed, 2 insertions(+), 6 deletions(-) > > > > diff --git a/arch/x86/Makefile.um b/arch/x86/Makefile.um > > index 91085a0..577976b 100644 > > --- a/arch/x86/Makefile.um > > +++ b/arch/x86/Makefile.um > > @@ -26,12 +26,8 @@ cflags-y += $(call > > cc-option,-mpreferred-stack-boundary=2) > > # an unresolved reference. > > cflags-y += -ffreestanding > > > > -# Disable unit-at-a-time mode on pre-gcc-4.0 compilers, it makes gcc use > > -# a lot more stack due to the lack of sharing of stacklots. Also, gcc > > -# 4.3.0 needs -funit-at-a-time for extern inline functions. > > -KBUILD_CFLAGS += $(shell if [ $(cc-version) -lt 0400 ] ; then \ > > - echo $(call cc-option,-fno-unit-at-a-time); \ > > - else echo $(call cc-option,-funit-at-a-time); fi ;) > > +# gcc 4.3.0 needs -funit-at-a-time for extern inline functions. > > +KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time) > > Acked-by: Richard Weinberger > > Thanks, > //richard > > > -- Best Regards Masahiro Yamada
Re: [PATCH 2/2] x86, powerpc: remove -funit-at-a-time compiler option entirely
x86 maintainers, Ping. On Mon, Nov 12, 2018 at 8:23 PM Michael Ellerman wrote: > > Masahiro Yamada writes: > > > GCC 4.6 manual says: > > > > -funit-at-a-time > > This option is left for compatibility reasons. -funit-at-a-time has > > no effect, while -fno-unit-at-a-time implies -fno-toplevel-reorder > > and -fno-section-anchors. > > Enabled by default. > > > > Signed-off-by: Masahiro Yamada > > --- > > > > arch/powerpc/Makefile | 4 > > arch/x86/Makefile | 4 > > arch/x86/Makefile.um | 5 - > > 3 files changed, 13 deletions(-) > > > > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile > > index 8a2ce14..854199c 100644 > > --- a/arch/powerpc/Makefile > > +++ b/arch/powerpc/Makefile > > @@ -228,10 +228,6 @@ KBUILD_CFLAGS += $(call cc-option,-mno-vsx) > > KBUILD_CFLAGS += $(call cc-option,-mno-spe) > > KBUILD_CFLAGS += $(call cc-option,-mspe=no) > > > > -# Enable unit-at-a-time mode when possible. It shrinks the > > -# kernel considerably. > > -KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time) > > - > > Thanks for cleaning it up. > > Acked-by: Michael Ellerman > > cheers -- Best Regards Masahiro Yamada
[RFC PATCH] hwmon/k10temp: Add Hygon Dhyana support
Add support for Hygon Dhyana family 18h processor for k10temp to get the temperature. As Hygon Dhyana shares the same function interface with AMD family 17h, so add Hygon PCI Vendor ID and reuse the code path of AMD. Signed-off-by: Pu Wen --- drivers/hwmon/k10temp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c index 2cef0c3..e24ba10 100644 --- a/drivers/hwmon/k10temp.c +++ b/drivers/hwmon/k10temp.c @@ -330,7 +330,7 @@ static int k10temp_probe(struct pci_dev *pdev, (boot_cpu_data.x86_model & 0xf0) == 0x70)) { data->read_htcreg = read_htcreg_nb_f15; data->read_tempreg = read_tempreg_nb_f15; - } else if (boot_cpu_data.x86 == 0x17) { + } else if (boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) { data->temp_adjust_mask = 0x8; data->read_tempreg = read_tempreg_nb_f17; data->show_tdie = true; @@ -367,6 +367,7 @@ static const struct pci_device_id k10temp_id_table[] = { { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F3) }, { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_DF_F3) }, { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_M10H_DF_F3) }, + { PCI_VDEVICE(HYGON, PCI_DEVICE_ID_AMD_17H_DF_F3) }, {} }; MODULE_DEVICE_TABLE(pci, k10temp_id_table); -- 2.7.4
Re: [PATCH] nvme-rdma: complete requests from ->timeout
Now, I see that my patch is not safe and can cause double completions. However, I am having a hard time finding out a good solution to barrier the racing completions. Could you suggest where the fix should go and what should it look like? We can provide more details on reproducing this issue if that helps. On Fri, Dec 7, 2018 at 6:04 PM Keith Busch wrote: > > On Fri, Dec 07, 2018 at 12:05:37PM -0800, Sagi Grimberg wrote: > > > > > Could you please take a look at this bug and code review? > > > > > > We are seeing more instances of this bug and found that reconnect_work > > > could hang as well, as can be seen from below stacktrace. > > > > > > Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma] > > > Call Trace: > > > __schedule+0x2ab/0x880 > > > schedule+0x36/0x80 > > > schedule_timeout+0x161/0x300 > > > ? __next_timer_interrupt+0xe0/0xe0 > > > io_schedule_timeout+0x1e/0x50 > > > wait_for_completion_io_timeout+0x130/0x1a0 > > > ? wake_up_q+0x80/0x80 > > > blk_execute_rq+0x6e/0xa0 > > > __nvme_submit_sync_cmd+0x6e/0xe0 > > > nvmf_connect_admin_queue+0x128/0x190 [nvme_fabrics] > > > ? wait_for_completion_interruptible_timeout+0x157/0x1b0 > > > nvme_rdma_start_queue+0x5e/0x90 [nvme_rdma] > > > nvme_rdma_setup_ctrl+0x1b4/0x730 [nvme_rdma] > > > nvme_rdma_reconnect_ctrl_work+0x27/0x70 [nvme_rdma] > > > process_one_work+0x179/0x390 > > > worker_thread+0x4f/0x3e0 > > > kthread+0x105/0x140 > > > ? max_active_store+0x80/0x80 > > > ? kthread_bind+0x20/0x20 > > > > > > This bug is produced by setting MTU of RoCE interface to '568' for > > > test while running I/O traffics. > > > > I think that with the latest changes from Keith we can no longer rely > > on blk-mq to barrier racing completions. We will probably need > > to barrier ourselves in nvme-rdma... > > You really need to do that anyway. If you were relying on blk-mq to save > you from double completions by ending a request in the nvme driver while > the lower half can still complete the same one, the only thing preventing > data corruption is the probability the request wasn't reallocated for a > new command.
Re: [PATCH] nvme-rdma: complete requests from ->timeout
Now, I see that my patch is not safe and can cause double completions. However, I am having a hard time finding out a good solution to barrier the racing completions. Could you suggest where the fix should go and what should it look like? We can provide more details on reproducing this issue if that helps. On Fri, Dec 7, 2018 at 6:04 PM Keith Busch wrote: > > On Fri, Dec 07, 2018 at 12:05:37PM -0800, Sagi Grimberg wrote: > > > > > Could you please take a look at this bug and code review? > > > > > > We are seeing more instances of this bug and found that reconnect_work > > > could hang as well, as can be seen from below stacktrace. > > > > > > Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma] > > > Call Trace: > > > __schedule+0x2ab/0x880 > > > schedule+0x36/0x80 > > > schedule_timeout+0x161/0x300 > > > ? __next_timer_interrupt+0xe0/0xe0 > > > io_schedule_timeout+0x1e/0x50 > > > wait_for_completion_io_timeout+0x130/0x1a0 > > > ? wake_up_q+0x80/0x80 > > > blk_execute_rq+0x6e/0xa0 > > > __nvme_submit_sync_cmd+0x6e/0xe0 > > > nvmf_connect_admin_queue+0x128/0x190 [nvme_fabrics] > > > ? wait_for_completion_interruptible_timeout+0x157/0x1b0 > > > nvme_rdma_start_queue+0x5e/0x90 [nvme_rdma] > > > nvme_rdma_setup_ctrl+0x1b4/0x730 [nvme_rdma] > > > nvme_rdma_reconnect_ctrl_work+0x27/0x70 [nvme_rdma] > > > process_one_work+0x179/0x390 > > > worker_thread+0x4f/0x3e0 > > > kthread+0x105/0x140 > > > ? max_active_store+0x80/0x80 > > > ? kthread_bind+0x20/0x20 > > > > > > This bug is produced by setting MTU of RoCE interface to '568' for > > > test while running I/O traffics. > > > > I think that with the latest changes from Keith we can no longer rely > > on blk-mq to barrier racing completions. We will probably need > > to barrier ourselves in nvme-rdma... > > You really need to do that anyway. If you were relying on blk-mq to save > you from double completions by ending a request in the nvme driver while > the lower half can still complete the same one, the only thing preventing > data corruption is the probability the request wasn't reallocated for a > new command.
Re: [PATCH 0/7] microblaze: fix various problems in building boot images
On Sat, Dec 8, 2018 at 12:20 AM Michal Simek wrote: > > On 07. 12. 18 14:29, Michal Simek wrote: > > On 07. 12. 18 12:29, Masahiro Yamada wrote: > >> On Thu, Dec 6, 2018 at 11:55 PM Michal Simek wrote: > >>> > >>> On 03. 12. 18 8:50, Masahiro Yamada wrote: > This patch set fixes various issues in microblaze Makefiles. > > BTW, "simpleImage." works like a phony target to generate the > following four images, where the first three are just aliases. > > - arch/microblaze/boot/simpleImage.: > identical to arch/microblaze/boot/linux.bin > > - arch/microblaze/boot/simpleImage..unstrip: > identical to vmlinux > > - arch/microblaze/boot/simpleImage..ub: > identical to arch/microblaze/boot/linux.bin.ub > > - arch/microblaze/boot/simpleImage..strip: > stripped vmlinux > > I am not sure how much useful those copies are, > but, I tried my best to keep the same behavior. > > IMHO, I guess DTB= would be more sensible, > but it is up to Michal. > > > > Masahiro Yamada (7): > microblaze: fix cleaning of boot images > microblaze: adjust the help to the real behavior > microblaze: move "... is ready" message to arch/microblaze/Makefile > microblaze: fix multiple bugs in arch/microblaze/boot/Makefile > microblaze: add linux.bin* and simpleImage.* to PHONY > microblaze: fix race condition in building boot images > microblaze: remove the unneeded code just in case file copy fails > > arch/microblaze/Makefile | 14 +- > arch/microblaze/boot/Makefile | 33 + > arch/microblaze/boot/dts/Makefile | 5 + > 3 files changed, 27 insertions(+), 25 deletions(-) > > >>> > >>> One more thing I have in my mind for a while is that will be good to > >>> configure kernel build flags from DT and completely get rid of these > >>> symbols. > >>> > >>> XILINX_MICROBLAZE0_USE_MSR_INSTR > >>> XILINX_MICROBLAZE0_USE_PCMP_INSTR > >>> XILINX_MICROBLAZE0_USE_BARREL > >>> XILINX_MICROBLAZE0_USE_DIV > >>> XILINX_MICROBLAZE0_USE_HW_MUL > >>> XILINX_MICROBLAZE0_USE_FPU > >>> > >>> It means setup CPUFLAGS based on extracting that values from DT that it > >>> all the time match the hardware. > >>> It will also simplify all the CPUFLAGS logic which is in > >>> arch/microblaze/Makefile. > >>> > >>> Do you have any idea how this can be done? > >> > >> > >> I have no idea. > >> > >> Parsing DTS with sed or something would be possible, but ugly. > >> > >> In my opinion, the kernel should be multi platform image, > >> in other words, it is the least common denominator > >> of supported platforms. > >> > >> So, the kernel should enable all features that may or may not be used > >> depending on platform. > >> > >> I do not know if this works for MB. > > > > Microblaze is soft core CPU where you can select if you want to have it > > with multiplier, divider, barrel shifter, etc. > > You can of course say that you don't have them and you have "universal" > > kernel but very slow. > > That's why user has to setup these configs which are converted to cflags > > to say GCC what can be used. > > And these configs can be simply parsed from dt. > > > > For example like this based on dtb (quick hack) > > > > for i in `echo use-msr-instr use-pcmp-instr use-barrel use-div > > use-hw-mul use-fpu`; do > > UPPER=`echo $i | tr '-' '_' | tr '[a-z]' '[A-Z}'` > > echo $i $UPPER; > > > > VAR=`fdtget -t i $FILE/arch/microblaze/boot/dts/system.dtb > > /cpus/cpu@0/ > > xlnx,$i` > > FLAGS+="CONFIG_XILINX_MICROBLAZE0_${UPPER}=${VAR} " > > done > > > > make $FLAGS > > > > > > When simpleImage is requested dt could be parsed to setup proper build > > flags. > > One more thing I found is that I have done these changes > > diff --git a/arch/microblaze/kernel/setup.c b/arch/microblaze/kernel/setup.c > index bbd6968ce55b..dc6a6fee3ae2 100644 > --- a/arch/microblaze/kernel/setup.c > +++ b/arch/microblaze/kernel/setup.c > @@ -153,11 +153,13 @@ void __init machine_early_init(const char > *cmdline, unsigned int ram, > #endif > > #if CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR > +#error MSR is enabled > if (msr) { > pr_info("!!!Your kernel has setup MSR instruction but "); > pr_cont("CPU don't have it %x\n", msr); > } > #else > +#error MSR is not enabled > if (!msr) { > pr_info("!!!Your kernel not setup MSR instruction but "); > pr_cont("CPU have it %x\n", msr); > > and run MSR with 1 > make defconfig && make CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR=1 > arch/microblaze/kernel/setup.o > it errors #error MSR is enabled > and all is good. > > And when I run MSR with 0 > make defconfig && make CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR=0 > arch/microblaze/kernel/setup.o > also getting
Re: [PATCH 0/7] microblaze: fix various problems in building boot images
On Sat, Dec 8, 2018 at 12:20 AM Michal Simek wrote: > > On 07. 12. 18 14:29, Michal Simek wrote: > > On 07. 12. 18 12:29, Masahiro Yamada wrote: > >> On Thu, Dec 6, 2018 at 11:55 PM Michal Simek wrote: > >>> > >>> On 03. 12. 18 8:50, Masahiro Yamada wrote: > This patch set fixes various issues in microblaze Makefiles. > > BTW, "simpleImage." works like a phony target to generate the > following four images, where the first three are just aliases. > > - arch/microblaze/boot/simpleImage.: > identical to arch/microblaze/boot/linux.bin > > - arch/microblaze/boot/simpleImage..unstrip: > identical to vmlinux > > - arch/microblaze/boot/simpleImage..ub: > identical to arch/microblaze/boot/linux.bin.ub > > - arch/microblaze/boot/simpleImage..strip: > stripped vmlinux > > I am not sure how much useful those copies are, > but, I tried my best to keep the same behavior. > > IMHO, I guess DTB= would be more sensible, > but it is up to Michal. > > > > Masahiro Yamada (7): > microblaze: fix cleaning of boot images > microblaze: adjust the help to the real behavior > microblaze: move "... is ready" message to arch/microblaze/Makefile > microblaze: fix multiple bugs in arch/microblaze/boot/Makefile > microblaze: add linux.bin* and simpleImage.* to PHONY > microblaze: fix race condition in building boot images > microblaze: remove the unneeded code just in case file copy fails > > arch/microblaze/Makefile | 14 +- > arch/microblaze/boot/Makefile | 33 + > arch/microblaze/boot/dts/Makefile | 5 + > 3 files changed, 27 insertions(+), 25 deletions(-) > > >>> > >>> One more thing I have in my mind for a while is that will be good to > >>> configure kernel build flags from DT and completely get rid of these > >>> symbols. > >>> > >>> XILINX_MICROBLAZE0_USE_MSR_INSTR > >>> XILINX_MICROBLAZE0_USE_PCMP_INSTR > >>> XILINX_MICROBLAZE0_USE_BARREL > >>> XILINX_MICROBLAZE0_USE_DIV > >>> XILINX_MICROBLAZE0_USE_HW_MUL > >>> XILINX_MICROBLAZE0_USE_FPU > >>> > >>> It means setup CPUFLAGS based on extracting that values from DT that it > >>> all the time match the hardware. > >>> It will also simplify all the CPUFLAGS logic which is in > >>> arch/microblaze/Makefile. > >>> > >>> Do you have any idea how this can be done? > >> > >> > >> I have no idea. > >> > >> Parsing DTS with sed or something would be possible, but ugly. > >> > >> In my opinion, the kernel should be multi platform image, > >> in other words, it is the least common denominator > >> of supported platforms. > >> > >> So, the kernel should enable all features that may or may not be used > >> depending on platform. > >> > >> I do not know if this works for MB. > > > > Microblaze is soft core CPU where you can select if you want to have it > > with multiplier, divider, barrel shifter, etc. > > You can of course say that you don't have them and you have "universal" > > kernel but very slow. > > That's why user has to setup these configs which are converted to cflags > > to say GCC what can be used. > > And these configs can be simply parsed from dt. > > > > For example like this based on dtb (quick hack) > > > > for i in `echo use-msr-instr use-pcmp-instr use-barrel use-div > > use-hw-mul use-fpu`; do > > UPPER=`echo $i | tr '-' '_' | tr '[a-z]' '[A-Z}'` > > echo $i $UPPER; > > > > VAR=`fdtget -t i $FILE/arch/microblaze/boot/dts/system.dtb > > /cpus/cpu@0/ > > xlnx,$i` > > FLAGS+="CONFIG_XILINX_MICROBLAZE0_${UPPER}=${VAR} " > > done > > > > make $FLAGS > > > > > > When simpleImage is requested dt could be parsed to setup proper build > > flags. > > One more thing I found is that I have done these changes > > diff --git a/arch/microblaze/kernel/setup.c b/arch/microblaze/kernel/setup.c > index bbd6968ce55b..dc6a6fee3ae2 100644 > --- a/arch/microblaze/kernel/setup.c > +++ b/arch/microblaze/kernel/setup.c > @@ -153,11 +153,13 @@ void __init machine_early_init(const char > *cmdline, unsigned int ram, > #endif > > #if CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR > +#error MSR is enabled > if (msr) { > pr_info("!!!Your kernel has setup MSR instruction but "); > pr_cont("CPU don't have it %x\n", msr); > } > #else > +#error MSR is not enabled > if (!msr) { > pr_info("!!!Your kernel not setup MSR instruction but "); > pr_cont("CPU have it %x\n", msr); > > and run MSR with 1 > make defconfig && make CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR=1 > arch/microblaze/kernel/setup.o > it errors #error MSR is enabled > and all is good. > > And when I run MSR with 0 > make defconfig && make CONFIG_XILINX_MICROBLAZE0_USE_MSR_INSTR=0 > arch/microblaze/kernel/setup.o > also getting
Re: [PATCH] x86/kernel: Fix more -Wmissing-prototypes warnings
On Fri, 7 Dec 2018 20:48:47 +0100 Borislav Petkov wrote: > On Fri, Dec 07, 2018 at 11:42:10PM +0900, Masami Hiramatsu wrote: > > Hmm, I just thought that the symbol only referred from inline asm should > > be visible. But if it is OK for any version of supported gcc and clang, > > I'm good to remove it. :-) (IOW, I just concerned about older gcc) > > I just tried two gcc versions: 8.2 and 4.8.5. Both asms looks good: > > 81044690 : > ... > > > 810446ab: 41 57 push %r15 > 810446ad: 48 89 e7mov%rsp,%rdi > 810446b0: e8 db 01 00 00 callq 81044890 > > > there's the CALL... > > 810446b5: 48 89 84 24 98 00 00mov%rax,0x98(%rsp) > 810446bc: 00 > > ... and the handler is at the expected address. > > 81044890 : > 81044890: e8 ab c9 7b 00 callq 81801240 > <__fentry__> > 81044895: 41 57 push %r15 > 81044897: 41 56 push %r14 > 81044899: 41 55 push %r13 > 8104489b: 49 89 fdmov%rdi,%r13 > > Dunno, if you feel like there might be some trouble with some compilers, > I can keep the hunk below as a separate patch and revert it when it > explodes somewhere...? OK, then please remove it. :-) Even if any problem occurs, which must be build error, so we can revert it in that case. Thank you! > > > Reviewed-by: Masami Hiramatsu > > > > Thank you, > > > > > > > > --- > > > diff --git a/arch/x86/kernel/kprobes/core.c > > > b/arch/x86/kernel/kprobes/core.c > > > index 6480056d370f..308bf103cc73 100644 > > > --- a/arch/x86/kernel/kprobes/core.c > > > +++ b/arch/x86/kernel/kprobes/core.c > > > @@ -66,8 +66,6 @@ > > > > > > #include "common.h" > > > > > > -void *trampoline_handler(struct pt_regs *regs); > > > - > > > DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL; > > > DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); > > > > > > @@ -753,7 +751,7 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline); > > > /* > > > * Called from kretprobe_trampoline > > > */ > > > -__visible __used void *trampoline_handler(struct pt_regs *regs) > > > +static __used void *trampoline_handler(struct pt_regs *regs) > > > { > > > struct kretprobe_instance *ri = NULL; > > > struct hlist_head *head, empty_rp; > > > > > > -- > > > Regards/Gruss, > > > Boris. > > > > > > Good mailing practices for 400: avoid top-posting and trim the reply. > > > > > > -- > > Masami Hiramatsu > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply. -- Masami Hiramatsu
Re: [PATCH] x86/kernel: Fix more -Wmissing-prototypes warnings
On Fri, 7 Dec 2018 20:48:47 +0100 Borislav Petkov wrote: > On Fri, Dec 07, 2018 at 11:42:10PM +0900, Masami Hiramatsu wrote: > > Hmm, I just thought that the symbol only referred from inline asm should > > be visible. But if it is OK for any version of supported gcc and clang, > > I'm good to remove it. :-) (IOW, I just concerned about older gcc) > > I just tried two gcc versions: 8.2 and 4.8.5. Both asms looks good: > > 81044690 : > ... > > > 810446ab: 41 57 push %r15 > 810446ad: 48 89 e7mov%rsp,%rdi > 810446b0: e8 db 01 00 00 callq 81044890 > > > there's the CALL... > > 810446b5: 48 89 84 24 98 00 00mov%rax,0x98(%rsp) > 810446bc: 00 > > ... and the handler is at the expected address. > > 81044890 : > 81044890: e8 ab c9 7b 00 callq 81801240 > <__fentry__> > 81044895: 41 57 push %r15 > 81044897: 41 56 push %r14 > 81044899: 41 55 push %r13 > 8104489b: 49 89 fdmov%rdi,%r13 > > Dunno, if you feel like there might be some trouble with some compilers, > I can keep the hunk below as a separate patch and revert it when it > explodes somewhere...? OK, then please remove it. :-) Even if any problem occurs, which must be build error, so we can revert it in that case. Thank you! > > > Reviewed-by: Masami Hiramatsu > > > > Thank you, > > > > > > > > --- > > > diff --git a/arch/x86/kernel/kprobes/core.c > > > b/arch/x86/kernel/kprobes/core.c > > > index 6480056d370f..308bf103cc73 100644 > > > --- a/arch/x86/kernel/kprobes/core.c > > > +++ b/arch/x86/kernel/kprobes/core.c > > > @@ -66,8 +66,6 @@ > > > > > > #include "common.h" > > > > > > -void *trampoline_handler(struct pt_regs *regs); > > > - > > > DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL; > > > DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); > > > > > > @@ -753,7 +751,7 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline); > > > /* > > > * Called from kretprobe_trampoline > > > */ > > > -__visible __used void *trampoline_handler(struct pt_regs *regs) > > > +static __used void *trampoline_handler(struct pt_regs *regs) > > > { > > > struct kretprobe_instance *ri = NULL; > > > struct hlist_head *head, empty_rp; > > > > > > -- > > > Regards/Gruss, > > > Boris. > > > > > > Good mailing practices for 400: avoid top-posting and trim the reply. > > > > > > -- > > Masami Hiramatsu > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply. -- Masami Hiramatsu
RE: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller"
Hi Joe, This patch series adds slave mode for lpspi controller. So this driver can support both master and slave mode for lpspi controller after apply this patch series. Currently, both master mode and slave mode share the code in this driver. Therefore, using spi_master to represent the structure of slave mode will cause confusion. When Geert Uytterhoeven add the slave support for spi, he had done " Generalize SPI 'master' to 'controller'" step, too. The commit ID is 8caab75fd2c2a92667cbb1cd315720bede3feaa9. Thank you. Regards, Clark Wang > -Original Message- > From: Joe Perches > Sent: Saturday, December 8, 2018 13:14 > To: Clark Wang ; broo...@kernel.org > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller" > > On Fri, 2018-12-07 at 02:50 +, Clark Wang wrote: > > In order to enable the slave mode and make the code more readable, > > replace all related structure names and object names which is named > > "master" with "controller". > > In what sense does this make the code more readable? >
RE: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller"
Hi Joe, This patch series adds slave mode for lpspi controller. So this driver can support both master and slave mode for lpspi controller after apply this patch series. Currently, both master mode and slave mode share the code in this driver. Therefore, using spi_master to represent the structure of slave mode will cause confusion. When Geert Uytterhoeven add the slave support for spi, he had done " Generalize SPI 'master' to 'controller'" step, too. The commit ID is 8caab75fd2c2a92667cbb1cd315720bede3feaa9. Thank you. Regards, Clark Wang > -Original Message- > From: Joe Perches > Sent: Saturday, December 8, 2018 13:14 > To: Clark Wang ; broo...@kernel.org > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller" > > On Fri, 2018-12-07 at 02:50 +, Clark Wang wrote: > > In order to enable the slave mode and make the code more readable, > > replace all related structure names and object names which is named > > "master" with "controller". > > In what sense does this make the code more readable? >
[PATCH v5 0/1] signaling processes through pidfds
Hey everyone, This is v5 of this patchset. v5 does not introduce any functional changes since none were requested or required in the thread. Instead, it focusses on updated documentation making it very clear what the intentions are how to extend this syscall. Eric, I dragged Serge into this and we went through the mails and tried to very thoroughly address your concerns about whether the width of the target should depend on flags or file descriptor types. As far as we understand from the threads this was your ultimate worry and also the reason why you withheld your agreement to the name of the syscall. We have outlined how the syscall is intended to be extended and decided that flags (e.g. PIDFD_TYPE_TID, PIDFD_TYPE_PGID) are the way to go. In line with this we decided to accept "pidfd_" as prefix for the new syscall. All concerns we could identify and understand we tried to address. I hope this will be sufficient for you to get behind the patch. The relevant section in the commit message is titled: /* sending signals to threads (tid) and process groups (pgid) */ Thanks! Christian Christian Brauner (1): signal: add pidfd_send_signal() syscall arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + fs/proc/base.c | 20 +++- include/linux/proc_fs.h| 12 +++ include/linux/syscalls.h | 3 + include/uapi/asm-generic/unistd.h | 4 +- kernel/signal.c| 141 +++-- 7 files changed, 173 insertions(+), 9 deletions(-) -- 2.19.1
[PATCH v5 1/1] signal: add pidfd_send_signal() syscall
The kill() syscall operates on process identifiers (pid). After a process has exited its pid can be reused by another process. If a caller sends a signal to a reused pid it will end up signaling the wrong process. This issue has often surfaced and there has been a push to address this problem [1]. This patch uses file descriptors (fd) from proc/ as stable handles on struct pid. Even if a pid is recycled the handle will not change. The fd can be used to send signals to the process it refers to. Thus, the new syscall pidfd_send_signal() is introduced to solve this problem. Instead of pids it operates on process fds (pidfd). /* prototype and argument /* long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags); In addition to the pidfd and signal argument it takes an additional siginfo_t and flags argument. If the siginfo_t argument is NULL then pidfd_send_signal() is equivalent to kill(, ). If it is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo(). The flags argument is added to allow for future extensions of this syscall. It currently needs to be passed as 0. Failing to do so will cause EINVAL. /* pidfd_send_signal() replaces multiple pid-based syscalls */ The pidfd_send_signal() syscall currently takes on the job of rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a positive pid is passed to kill(2). It will however be possible to also replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended. /* sending signals to threads (tid) and process groups (pgid) */ Specifically, the pidfd_send_signal() syscall does currently not operate on process groups or threads. This is left for future extensions. In order to extend the syscall to allow sending signal to threads and process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and PIDFD_TYPE_TID) should be added. This implies that the flags argument will determine what is signaled and not the file descriptor itself. Put in other words, grouping in this api is a property of the flags argument not a property of the file descriptor (cf. [13]). When appropriate extensions through the flags argument are added then pidfd_send_signal() can additionally replace the part of kill(2) which operates on process groups as well as the tgkill(2) and rt_tgsigqueueinfo(2) syscalls. How such an extension could be implemented has been very roughly sketched in [14], [15], and [16]. However, this should not be taken as a commitment to a particular implementation. There might be better ways to do it. Right now this is intentionally left out to keep this patchset as simple as possible (cf. [4]). For example, if a pidfd for a tid from /proc//task/ is passed EOPNOTSUPP will be returned to give userspace a way to detect when I add support for signaling to threads (cf. [10]). /* naming */ The syscall had various names throughout iterations of this patchset: - procfd_signal() - procfd_send_signal() - taskfd_send_signal() In the last round of reviews it was pointed out that given that if the flags argument decides the scope of the signal instead of different types of fds it might make sense to either settle for "procfd_" or "pidfd_" as prefix. The community was willing to accept either (cf. [17] and [18]). Given that one developer expressed strong preference for the "pidfd_" prefix (cf. [13] and with other developers less opinionated about the name we should settle for "pidfd_" to avoid further bikeshedding. The "_send_signal" suffix was chosen to reflect the fact that the syscall takes on the job of multiple syscalls. It is therefore intentional that the name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer because it might imply that pidfd_send_signal() is a replacement for kill(2), and not the latter because it is a hassle to remember the correct spelling - especially for non-native speakers - and because it is not descriptive enough of what the syscall actually does. The name "pidfd_send_signal" makes it very clear that its job is to send signals. /* O_PATH file descriptors */ pidfds opened as O_PATH fds cannot be used to send signals to a process (cf. [2]). Signaling processes through pidfds is the equivalent of writing to a file. Thus, this is not an operation that operates "purely at the file descriptor level" as required by the open(2) manpage. /* zombies */ Zombies can be signaled just as any other process. No special error will be reported since a zombie state is an unreliable state (cf. [3]). However, this can be added as an extension through the @flags argument if the need ever arises. /* cross-namespace signals */ The patch currently enforces that the signaler and signalee either are in the same pid namespace or that the signaler's pid namespace is an ancestor of the signalee's pid namespace. This is done for the sake of simplicity and because it is unclear to what values certain members of struct siginfo_t would need to be set to (cf. [5], [6]). /* compat
[PATCH v5 0/1] signaling processes through pidfds
Hey everyone, This is v5 of this patchset. v5 does not introduce any functional changes since none were requested or required in the thread. Instead, it focusses on updated documentation making it very clear what the intentions are how to extend this syscall. Eric, I dragged Serge into this and we went through the mails and tried to very thoroughly address your concerns about whether the width of the target should depend on flags or file descriptor types. As far as we understand from the threads this was your ultimate worry and also the reason why you withheld your agreement to the name of the syscall. We have outlined how the syscall is intended to be extended and decided that flags (e.g. PIDFD_TYPE_TID, PIDFD_TYPE_PGID) are the way to go. In line with this we decided to accept "pidfd_" as prefix for the new syscall. All concerns we could identify and understand we tried to address. I hope this will be sufficient for you to get behind the patch. The relevant section in the commit message is titled: /* sending signals to threads (tid) and process groups (pgid) */ Thanks! Christian Christian Brauner (1): signal: add pidfd_send_signal() syscall arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + fs/proc/base.c | 20 +++- include/linux/proc_fs.h| 12 +++ include/linux/syscalls.h | 3 + include/uapi/asm-generic/unistd.h | 4 +- kernel/signal.c| 141 +++-- 7 files changed, 173 insertions(+), 9 deletions(-) -- 2.19.1
[PATCH v5 1/1] signal: add pidfd_send_signal() syscall
The kill() syscall operates on process identifiers (pid). After a process has exited its pid can be reused by another process. If a caller sends a signal to a reused pid it will end up signaling the wrong process. This issue has often surfaced and there has been a push to address this problem [1]. This patch uses file descriptors (fd) from proc/ as stable handles on struct pid. Even if a pid is recycled the handle will not change. The fd can be used to send signals to the process it refers to. Thus, the new syscall pidfd_send_signal() is introduced to solve this problem. Instead of pids it operates on process fds (pidfd). /* prototype and argument /* long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags); In addition to the pidfd and signal argument it takes an additional siginfo_t and flags argument. If the siginfo_t argument is NULL then pidfd_send_signal() is equivalent to kill(, ). If it is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo(). The flags argument is added to allow for future extensions of this syscall. It currently needs to be passed as 0. Failing to do so will cause EINVAL. /* pidfd_send_signal() replaces multiple pid-based syscalls */ The pidfd_send_signal() syscall currently takes on the job of rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a positive pid is passed to kill(2). It will however be possible to also replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended. /* sending signals to threads (tid) and process groups (pgid) */ Specifically, the pidfd_send_signal() syscall does currently not operate on process groups or threads. This is left for future extensions. In order to extend the syscall to allow sending signal to threads and process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and PIDFD_TYPE_TID) should be added. This implies that the flags argument will determine what is signaled and not the file descriptor itself. Put in other words, grouping in this api is a property of the flags argument not a property of the file descriptor (cf. [13]). When appropriate extensions through the flags argument are added then pidfd_send_signal() can additionally replace the part of kill(2) which operates on process groups as well as the tgkill(2) and rt_tgsigqueueinfo(2) syscalls. How such an extension could be implemented has been very roughly sketched in [14], [15], and [16]. However, this should not be taken as a commitment to a particular implementation. There might be better ways to do it. Right now this is intentionally left out to keep this patchset as simple as possible (cf. [4]). For example, if a pidfd for a tid from /proc//task/ is passed EOPNOTSUPP will be returned to give userspace a way to detect when I add support for signaling to threads (cf. [10]). /* naming */ The syscall had various names throughout iterations of this patchset: - procfd_signal() - procfd_send_signal() - taskfd_send_signal() In the last round of reviews it was pointed out that given that if the flags argument decides the scope of the signal instead of different types of fds it might make sense to either settle for "procfd_" or "pidfd_" as prefix. The community was willing to accept either (cf. [17] and [18]). Given that one developer expressed strong preference for the "pidfd_" prefix (cf. [13] and with other developers less opinionated about the name we should settle for "pidfd_" to avoid further bikeshedding. The "_send_signal" suffix was chosen to reflect the fact that the syscall takes on the job of multiple syscalls. It is therefore intentional that the name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer because it might imply that pidfd_send_signal() is a replacement for kill(2), and not the latter because it is a hassle to remember the correct spelling - especially for non-native speakers - and because it is not descriptive enough of what the syscall actually does. The name "pidfd_send_signal" makes it very clear that its job is to send signals. /* O_PATH file descriptors */ pidfds opened as O_PATH fds cannot be used to send signals to a process (cf. [2]). Signaling processes through pidfds is the equivalent of writing to a file. Thus, this is not an operation that operates "purely at the file descriptor level" as required by the open(2) manpage. /* zombies */ Zombies can be signaled just as any other process. No special error will be reported since a zombie state is an unreliable state (cf. [3]). However, this can be added as an extension through the @flags argument if the need ever arises. /* cross-namespace signals */ The patch currently enforces that the signaler and signalee either are in the same pid namespace or that the signaler's pid namespace is an ancestor of the signalee's pid namespace. This is done for the sake of simplicity and because it is unclear to what values certain members of struct siginfo_t would need to be set to (cf. [5], [6]). /* compat
arch/x86/include/asm/cmpxchg.h:245:2: error: 'asm' operand has impossible constraints
Hi Juergen, FYI, the error/warning still remains. tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 5f179793f0a73965681db6a3203fa1baabd9b3c3 commit: 6da63eb241a05b0e676d68975e793c0521387141 x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella date: 3 months ago config: i386-randconfig-sb0-12081226 (attached as .config) compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4 reproduce: git checkout 6da63eb241a05b0e676d68975e793c0521387141 # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): In file included from arch/x86/include/asm/atomic.h:8:0, from arch/x86/include/asm/msr.h:67, from arch/x86/include/asm/processor.h:21, from arch/x86/include/asm/cpufeature.h:5, from arch/x86/include/asm/thread_info.h:53, from include/linux/thread_info.h:38, from arch/x86/include/asm/preempt.h:7, from include/linux/preempt.h:81, from include/linux/spinlock.h:51, from include/linux/mmzone.h:8, from include/linux/gfp.h:6, from include/linux/mm.h:10, from mm/slub.c:13: mm/slub.c: In function '__slab_free.isra.74': >> arch/x86/include/asm/cmpxchg.h:245:2: error: 'asm' operand has impossible >> constraints asm volatile(pfx "cmpxchg%c4b %2; sete %0" \ ^ arch/x86/include/asm/cmpxchg.h:254:2: note: in expansion of macro '__cmpxchg_double' __cmpxchg_double(LOCK_PREFIX, p1, p2, o1, o2, n1, n2) ^ include/asm-generic/atomic-instrumented.h:457:2: note: in expansion of macro 'arch_cmpxchg_double' arch_cmpxchg_double(__ai_p1, (p2), (o1), (o2), (n1), (n2)); \ ^ mm/slub.c:403:7: note: in expansion of macro 'cmpxchg_double' if (cmpxchg_double(>freelist, >counters, ^ vim +/asm +245 arch/x86/include/asm/cmpxchg.h 3d94ae0c Jeremy Fitzhardinge 2011-09-28 235 cdcd6298 Jan Beulich 2012-01-02 236 #define __cmpxchg_double(pfx, p1, p2, o1, o2, n1, n2) \ cdcd6298 Jan Beulich 2012-01-02 237 ({ \ cdcd6298 Jan Beulich 2012-01-02 238bool __ret; \ cdcd6298 Jan Beulich 2012-01-02 239__typeof__(*(p1)) __old1 = (o1), __new1 = (n1); \ cdcd6298 Jan Beulich 2012-01-02 240__typeof__(*(p2)) __old2 = (o2), __new2 = (n2); \ cdcd6298 Jan Beulich 2012-01-02 241BUILD_BUG_ON(sizeof(*(p1)) != sizeof(long));\ cdcd6298 Jan Beulich 2012-01-02 242BUILD_BUG_ON(sizeof(*(p2)) != sizeof(long));\ cdcd6298 Jan Beulich 2012-01-02 243VM_BUG_ON((unsigned long)(p1) % (2 * sizeof(long)));\ cdcd6298 Jan Beulich 2012-01-02 244VM_BUG_ON((unsigned long)((p1) + 1) != (unsigned long)(p2));\ cdcd6298 Jan Beulich 2012-01-02 @245asm volatile(pfx "cmpxchg%c4b %2; sete %0" \ cdcd6298 Jan Beulich 2012-01-02 246 : "=a" (__ret), "+d" (__old2), \ cdcd6298 Jan Beulich 2012-01-02 247 "+m" (*(p1)), "+m" (*(p2)) \ cdcd6298 Jan Beulich 2012-01-02 248 : "i" (2 * sizeof(long)), "a" (__old1),\ cdcd6298 Jan Beulich 2012-01-02 249 "b" (__new1), "c" (__new2)); \ cdcd6298 Jan Beulich 2012-01-02 250__ret; \ cdcd6298 Jan Beulich 2012-01-02 251 }) cdcd6298 Jan Beulich 2012-01-02 252 :: The code at line 245 was first introduced by commit :: cdcd629869fabcd38ebd24a03b0a05ec1cbcafb0 x86: Fix and improve cmpxchg_double{,_local}() :: TO: Jan Beulich :: CC: Ingo Molnar --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
arch/x86/include/asm/cmpxchg.h:245:2: error: 'asm' operand has impossible constraints
Hi Juergen, FYI, the error/warning still remains. tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 5f179793f0a73965681db6a3203fa1baabd9b3c3 commit: 6da63eb241a05b0e676d68975e793c0521387141 x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella date: 3 months ago config: i386-randconfig-sb0-12081226 (attached as .config) compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4 reproduce: git checkout 6da63eb241a05b0e676d68975e793c0521387141 # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): In file included from arch/x86/include/asm/atomic.h:8:0, from arch/x86/include/asm/msr.h:67, from arch/x86/include/asm/processor.h:21, from arch/x86/include/asm/cpufeature.h:5, from arch/x86/include/asm/thread_info.h:53, from include/linux/thread_info.h:38, from arch/x86/include/asm/preempt.h:7, from include/linux/preempt.h:81, from include/linux/spinlock.h:51, from include/linux/mmzone.h:8, from include/linux/gfp.h:6, from include/linux/mm.h:10, from mm/slub.c:13: mm/slub.c: In function '__slab_free.isra.74': >> arch/x86/include/asm/cmpxchg.h:245:2: error: 'asm' operand has impossible >> constraints asm volatile(pfx "cmpxchg%c4b %2; sete %0" \ ^ arch/x86/include/asm/cmpxchg.h:254:2: note: in expansion of macro '__cmpxchg_double' __cmpxchg_double(LOCK_PREFIX, p1, p2, o1, o2, n1, n2) ^ include/asm-generic/atomic-instrumented.h:457:2: note: in expansion of macro 'arch_cmpxchg_double' arch_cmpxchg_double(__ai_p1, (p2), (o1), (o2), (n1), (n2)); \ ^ mm/slub.c:403:7: note: in expansion of macro 'cmpxchg_double' if (cmpxchg_double(>freelist, >counters, ^ vim +/asm +245 arch/x86/include/asm/cmpxchg.h 3d94ae0c Jeremy Fitzhardinge 2011-09-28 235 cdcd6298 Jan Beulich 2012-01-02 236 #define __cmpxchg_double(pfx, p1, p2, o1, o2, n1, n2) \ cdcd6298 Jan Beulich 2012-01-02 237 ({ \ cdcd6298 Jan Beulich 2012-01-02 238bool __ret; \ cdcd6298 Jan Beulich 2012-01-02 239__typeof__(*(p1)) __old1 = (o1), __new1 = (n1); \ cdcd6298 Jan Beulich 2012-01-02 240__typeof__(*(p2)) __old2 = (o2), __new2 = (n2); \ cdcd6298 Jan Beulich 2012-01-02 241BUILD_BUG_ON(sizeof(*(p1)) != sizeof(long));\ cdcd6298 Jan Beulich 2012-01-02 242BUILD_BUG_ON(sizeof(*(p2)) != sizeof(long));\ cdcd6298 Jan Beulich 2012-01-02 243VM_BUG_ON((unsigned long)(p1) % (2 * sizeof(long)));\ cdcd6298 Jan Beulich 2012-01-02 244VM_BUG_ON((unsigned long)((p1) + 1) != (unsigned long)(p2));\ cdcd6298 Jan Beulich 2012-01-02 @245asm volatile(pfx "cmpxchg%c4b %2; sete %0" \ cdcd6298 Jan Beulich 2012-01-02 246 : "=a" (__ret), "+d" (__old2), \ cdcd6298 Jan Beulich 2012-01-02 247 "+m" (*(p1)), "+m" (*(p2)) \ cdcd6298 Jan Beulich 2012-01-02 248 : "i" (2 * sizeof(long)), "a" (__old1),\ cdcd6298 Jan Beulich 2012-01-02 249 "b" (__new1), "c" (__new2)); \ cdcd6298 Jan Beulich 2012-01-02 250__ret; \ cdcd6298 Jan Beulich 2012-01-02 251 }) cdcd6298 Jan Beulich 2012-01-02 252 :: The code at line 245 was first introduced by commit :: cdcd629869fabcd38ebd24a03b0a05ec1cbcafb0 x86: Fix and improve cmpxchg_double{,_local}() :: TO: Jan Beulich :: CC: Ingo Molnar --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 1/2] arm64: dts: ti: k3-am654: Add Support for MMC/SD
On 07/12/18 2:12 PM, Faiz Abbas wrote: > There are two MMC host controller instances present on the TI's > Am654 SOCs. Add device tree nodes for the same. > > Signed-off-by: Faiz Abbas > --- > arch/arm64/boot/dts/ti/k3-am65-main.dtsi | 28 > 1 file changed, 28 insertions(+) > > diff --git a/arch/arm64/boot/dts/ti/k3-am65-main.dtsi > b/arch/arm64/boot/dts/ti/k3-am65-main.dtsi > index 916434839603..d07212f16a81 100644 > --- a/arch/arm64/boot/dts/ti/k3-am65-main.dtsi > +++ b/arch/arm64/boot/dts/ti/k3-am65-main.dtsi > @@ -129,4 +129,32 @@ > clocks = <_clks 113 1>; > power-domains = <_pds 113>; > }; > + > + sdhci0: sdhci@4f8 { > + compatible = "ti,am654-sdhci-5.1"; > + reg = <0x0 0x4f8 0x0 0x260>, <0x0 0x4f9 0x0 0x134>; > + power-domains = <_pds 47>; > + clocks = <_clks 47 0>, <_clks 47 1>; > + clock-names = "clk_ahb", "clk_xin"; > + interrupts = ; > + sdhci-caps-mask = <0x8007 0x0>; > + mmc-ddr-1_8v; > + ti,otap-del-sel = <0x2>; > + ti,trm-icp = <0x8>; > + status = "disabled"; > + }; Please drop "status=disabled" from dtsi. Can be disabled as required in the board dts. > + > + sdhci1: sdhci@4fa { > + compatible = "ti,am654-sdhci-5.1"; > + reg = <0x0 0x4fa 0x0 0x260>, <0x0 0x4fb 0x0 0x134>; > + power-domains = <_pds 48>; > + clocks = <_clks 48 0>, <_clks 48 1>; > + clock-names = "clk_ahb", "clk_xin"; > + interrupts = ; > + sdhci-caps-mask = <0x8007 0x0>; > + mmc-ddr-1_8v; > + ti,otap-del-sel = <0x2>; > + ti,trm-icp = <0x8>; > + status = "disabled"; > + }; > }; > -- Regards Vignesh
Re: [PATCH 1/2] arm64: dts: ti: k3-am654: Add Support for MMC/SD
On 07/12/18 2:12 PM, Faiz Abbas wrote: > There are two MMC host controller instances present on the TI's > Am654 SOCs. Add device tree nodes for the same. > > Signed-off-by: Faiz Abbas > --- > arch/arm64/boot/dts/ti/k3-am65-main.dtsi | 28 > 1 file changed, 28 insertions(+) > > diff --git a/arch/arm64/boot/dts/ti/k3-am65-main.dtsi > b/arch/arm64/boot/dts/ti/k3-am65-main.dtsi > index 916434839603..d07212f16a81 100644 > --- a/arch/arm64/boot/dts/ti/k3-am65-main.dtsi > +++ b/arch/arm64/boot/dts/ti/k3-am65-main.dtsi > @@ -129,4 +129,32 @@ > clocks = <_clks 113 1>; > power-domains = <_pds 113>; > }; > + > + sdhci0: sdhci@4f8 { > + compatible = "ti,am654-sdhci-5.1"; > + reg = <0x0 0x4f8 0x0 0x260>, <0x0 0x4f9 0x0 0x134>; > + power-domains = <_pds 47>; > + clocks = <_clks 47 0>, <_clks 47 1>; > + clock-names = "clk_ahb", "clk_xin"; > + interrupts = ; > + sdhci-caps-mask = <0x8007 0x0>; > + mmc-ddr-1_8v; > + ti,otap-del-sel = <0x2>; > + ti,trm-icp = <0x8>; > + status = "disabled"; > + }; Please drop "status=disabled" from dtsi. Can be disabled as required in the board dts. > + > + sdhci1: sdhci@4fa { > + compatible = "ti,am654-sdhci-5.1"; > + reg = <0x0 0x4fa 0x0 0x260>, <0x0 0x4fb 0x0 0x134>; > + power-domains = <_pds 48>; > + clocks = <_clks 48 0>, <_clks 48 1>; > + clock-names = "clk_ahb", "clk_xin"; > + interrupts = ; > + sdhci-caps-mask = <0x8007 0x0>; > + mmc-ddr-1_8v; > + ti,otap-del-sel = <0x2>; > + ti,trm-icp = <0x8>; > + status = "disabled"; > + }; > }; > -- Regards Vignesh
Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
On Fri, Dec 07, 2018 at 04:52:42PM -0800, John Hubbard wrote: > I see. OK, HMM has done an efficient job of mopping up unused fields, and now > we are > completely out of space. At this point, after thinking about it carefully, it > seems clear > that it's time for a single, new field: Sorry for not replying earlier; I'm travelling and have had trouble keeping on top of my mail. Adding this field will grow struct page by 4-8 bytes, so it will no longer be 64 bytes. This isn't an acceptable answer. We have a few options for bits. One is that we have (iirc) two bits available in page->flags on 32-bit. That'll force a few more configurations into using _last_cpupid and/or page_ext. I'm not a huge fan of this approach. The second is to use page->lru.next bit 1. This requires some care because m68k allows misaligned pointers. If the list_head that it's joined to is misaligned, we'll be in trouble. This can get tricky because some pages are attached to list_heads which are on the stack ... and I don't think gcc guarantees __aligned attributes work for stack variables. The third is to use page->lru.prev bit 0. We'd want to switch pgmap and hmm_data around to make this work, and we'd want to record this in mm_types.h so nobody tries to use a field which aliases with page->lru.prev and has bit 0 set on a page which can be mapped to userspace (which I currently believe to be true). The fourth is to use a bit in page->flags for 64-bit and a bit in page_ext->flags for 32-bit. Or we could get rid of page_ext and grow struct page with a ->flags2 on 32-bit. Fifth, it isn't clear to me how many bits might be left in ->_last_cpupid at this point, and perhaps there's scope for using a bit in there. > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 5ed8f6292a53..1c789e324da8 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -182,6 +182,9 @@ struct page { > /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */ > atomic_t _refcount; > > + /* DMA usage count. See get_user_pages*(), put_user_page*(). */ > + atomic_t _dma_pinned_count; > + > #ifdef CONFIG_MEMCG > struct mem_cgroup *mem_cgroup; > #endif > > > ...because after all, the reason this is so difficult is that this fix has to > work > in pretty much every configuration. get_user_pages() use is widespread, it's > a very > general facility, and...it needs fixing. And we're out of space. > > I'm going to send out an updated RFC that shows the latest, and I think it's > going > to include the above. > > -- > thanks, > John Hubbard > NVIDIA >
Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
On Fri, Dec 07, 2018 at 04:52:42PM -0800, John Hubbard wrote: > I see. OK, HMM has done an efficient job of mopping up unused fields, and now > we are > completely out of space. At this point, after thinking about it carefully, it > seems clear > that it's time for a single, new field: Sorry for not replying earlier; I'm travelling and have had trouble keeping on top of my mail. Adding this field will grow struct page by 4-8 bytes, so it will no longer be 64 bytes. This isn't an acceptable answer. We have a few options for bits. One is that we have (iirc) two bits available in page->flags on 32-bit. That'll force a few more configurations into using _last_cpupid and/or page_ext. I'm not a huge fan of this approach. The second is to use page->lru.next bit 1. This requires some care because m68k allows misaligned pointers. If the list_head that it's joined to is misaligned, we'll be in trouble. This can get tricky because some pages are attached to list_heads which are on the stack ... and I don't think gcc guarantees __aligned attributes work for stack variables. The third is to use page->lru.prev bit 0. We'd want to switch pgmap and hmm_data around to make this work, and we'd want to record this in mm_types.h so nobody tries to use a field which aliases with page->lru.prev and has bit 0 set on a page which can be mapped to userspace (which I currently believe to be true). The fourth is to use a bit in page->flags for 64-bit and a bit in page_ext->flags for 32-bit. Or we could get rid of page_ext and grow struct page with a ->flags2 on 32-bit. Fifth, it isn't clear to me how many bits might be left in ->_last_cpupid at this point, and perhaps there's scope for using a bit in there. > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 5ed8f6292a53..1c789e324da8 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -182,6 +182,9 @@ struct page { > /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */ > atomic_t _refcount; > > + /* DMA usage count. See get_user_pages*(), put_user_page*(). */ > + atomic_t _dma_pinned_count; > + > #ifdef CONFIG_MEMCG > struct mem_cgroup *mem_cgroup; > #endif > > > ...because after all, the reason this is so difficult is that this fix has to > work > in pretty much every configuration. get_user_pages() use is widespread, it's > a very > general facility, and...it needs fixing. And we're out of space. > > I'm going to send out an updated RFC that shows the latest, and I think it's > going > to include the above. > > -- > thanks, > John Hubbard > NVIDIA >
Re: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller"
On Fri, 2018-12-07 at 02:50 +, Clark Wang wrote: > In order to enable the slave mode and make the code more readable, > replace all related structure names and object names which is > named "master" with "controller". In what sense does this make the code more readable?
Re: [PATCH V3 1/4] spi: lpspi: Replace all "master" with "controller"
On Fri, 2018-12-07 at 02:50 +, Clark Wang wrote: > In order to enable the slave mode and make the code more readable, > replace all related structure names and object names which is > named "master" with "controller". In what sense does this make the code more readable?
[GIT PULL] Qualcomm ARM64 DT updates for 4.21 - Part 2
The following changes since commit 70827d9f6bc4f481fafe790dd6654ba568526768: arm64: dts: qcom: msm8998: Fix compatible of scm node (2018-11-30 07:59:02 -0600) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git tags/qcom-arm64-for-4.21-2 for you to fetch changes up to 1504b91c819359b574b55c269c850352260b8d19: arm64: dts: msm8996: Use dwc3-qcom glue driver for USB (2018-12-07 14:10:56 -0600) Qualcomm ARM64 Updates for v4.21 Part 2 * Switch to use dwc3-qcom glue driver on MSM8996 * Fix issue with xo clk name on MSM8998 * Add cooling maps on MSM8916 * Add UART nodes on SDM845 * Add camera subsystem support on MSM8996 and MSM8916 Andy Gross (1): arm64: dts: qcom: msm8998: Fixup clock to use xo_board Manu Gautam (1): arm64: dts: msm8996: Use dwc3-qcom glue driver for USB Matthias Kaehlcke (1): arm64: dts: qcom: sdm845: Add UART nodes Todor Tomov (6): arm64: dts: qcom: msm8916: Add IOMMU sub-node for VFE context bank arm64: dts: qcom: msm8916: Add CAMSS support arm64: dts: qcom: Add Camera Control Interface pinctrls arm64: dts: qcom: Add pinctrls for camera sensors arm64: dts: qcom: msm8996: Add VFE SMMU node arm64: dts: qcom: msm8996: Add CAMSS support Viresh Kumar (1): arm64: dts: msm8916: Add all CPUs in cooling maps arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi | 6 +- arch/arm64/boot/dts/qcom/msm8916-pins.dtsi | 76 arch/arm64/boot/dts/qcom/msm8916.dtsi| 97 +- arch/arm64/boot/dts/qcom/msm8996-pins.dtsi | 120 arch/arm64/boot/dts/qcom/msm8996.dtsi| 162 +++- arch/arm64/boot/dts/qcom/msm8998.dtsi| 3 +- arch/arm64/boot/dts/qcom/sdm845.dtsi | 270 +++ 7 files changed, 725 insertions(+), 9 deletions(-)
[GIT PULL] Qualcomm Driver updates for 4.21 - Part 2
The following changes since commit b601f73130a375c912d9f2ec93c5f3cea5d6a3da: drm: msm: Check cmd_db_read_aux_data() for failure (2018-11-29 17:41:53 -0600) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git tags/qcom-drivers-for-4.21-2 for you to fetch changes up to 92def04bd7b46f5b186f985d5c9d3804858c3c2f: MAINTAINERS: Change Todor Tomov's email address (2018-12-05 16:45:34 -0600) Qualcomm ARM Based Driver Updates for v4.21 Part 2 * Fix MAINTAINERS email address for Todor * Fix SCM compilation error Jonathan Marek (1): firmware: qcom: scm: fix compilation error when disabled Todor Tomov (1): MAINTAINERS: Change Todor Tomov's email address MAINTAINERS | 2 +- include/linux/qcom_scm.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-)
[GIT PULL] Qualcomm ARM64 DT updates for 4.21 - Part 2
The following changes since commit 70827d9f6bc4f481fafe790dd6654ba568526768: arm64: dts: qcom: msm8998: Fix compatible of scm node (2018-11-30 07:59:02 -0600) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git tags/qcom-arm64-for-4.21-2 for you to fetch changes up to 1504b91c819359b574b55c269c850352260b8d19: arm64: dts: msm8996: Use dwc3-qcom glue driver for USB (2018-12-07 14:10:56 -0600) Qualcomm ARM64 Updates for v4.21 Part 2 * Switch to use dwc3-qcom glue driver on MSM8996 * Fix issue with xo clk name on MSM8998 * Add cooling maps on MSM8916 * Add UART nodes on SDM845 * Add camera subsystem support on MSM8996 and MSM8916 Andy Gross (1): arm64: dts: qcom: msm8998: Fixup clock to use xo_board Manu Gautam (1): arm64: dts: msm8996: Use dwc3-qcom glue driver for USB Matthias Kaehlcke (1): arm64: dts: qcom: sdm845: Add UART nodes Todor Tomov (6): arm64: dts: qcom: msm8916: Add IOMMU sub-node for VFE context bank arm64: dts: qcom: msm8916: Add CAMSS support arm64: dts: qcom: Add Camera Control Interface pinctrls arm64: dts: qcom: Add pinctrls for camera sensors arm64: dts: qcom: msm8996: Add VFE SMMU node arm64: dts: qcom: msm8996: Add CAMSS support Viresh Kumar (1): arm64: dts: msm8916: Add all CPUs in cooling maps arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi | 6 +- arch/arm64/boot/dts/qcom/msm8916-pins.dtsi | 76 arch/arm64/boot/dts/qcom/msm8916.dtsi| 97 +- arch/arm64/boot/dts/qcom/msm8996-pins.dtsi | 120 arch/arm64/boot/dts/qcom/msm8996.dtsi| 162 +++- arch/arm64/boot/dts/qcom/msm8998.dtsi| 3 +- arch/arm64/boot/dts/qcom/sdm845.dtsi | 270 +++ 7 files changed, 725 insertions(+), 9 deletions(-)
[GIT PULL] Qualcomm Driver updates for 4.21 - Part 2
The following changes since commit b601f73130a375c912d9f2ec93c5f3cea5d6a3da: drm: msm: Check cmd_db_read_aux_data() for failure (2018-11-29 17:41:53 -0600) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git tags/qcom-drivers-for-4.21-2 for you to fetch changes up to 92def04bd7b46f5b186f985d5c9d3804858c3c2f: MAINTAINERS: Change Todor Tomov's email address (2018-12-05 16:45:34 -0600) Qualcomm ARM Based Driver Updates for v4.21 Part 2 * Fix MAINTAINERS email address for Todor * Fix SCM compilation error Jonathan Marek (1): firmware: qcom: scm: fix compilation error when disabled Todor Tomov (1): MAINTAINERS: Change Todor Tomov's email address MAINTAINERS | 2 +- include/linux/qcom_scm.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-)
Re: [PATCH v2 01/34] kbuild: Add support for DT binding schema checks
Hi Rob, On Tue, Dec 4, 2018 at 6:32 AM Rob Herring wrote: > > This adds the build infrastructure for checking DT binding schema > documents and validating dts files using the binding schema. > > Check DT binding schema documents: > make dt_binding_check > > Build dts files and check using DT binding schema: > make dtbs_check > > Optionally, DT_SCHEMA_FILES can passed in with a schema file(s) to use > for validation. This makes it easier to find and fix errors generated by > a specific schema. > > Currently, the validation targets are separate from a normal build to > avoid a hard dependency on the external DT schema project and because > there are lots of warnings generated. > > Cc: Jonathan Corbet > Cc: Mark Rutland > Cc: Masahiro Yamada > Cc: Michal Marek > Cc: linux-...@vger.kernel.org > Cc: devicet...@vger.kernel.org > Cc: linux-kbu...@vger.kernel.org > Signed-off-by: Rob Herring > --- > .gitignore | 1 + > Documentation/Makefile | 2 +- > Documentation/devicetree/bindings/.gitignore | 1 + > Documentation/devicetree/bindings/Makefile | 33 > Makefile | 11 +-- > scripts/Makefile.lib | 24 -- > 6 files changed, 67 insertions(+), 5 deletions(-) > create mode 100644 Documentation/devicetree/bindings/.gitignore > create mode 100644 Documentation/devicetree/bindings/Makefile > > diff --git a/.gitignore b/.gitignore > index 97ba6b79834c..a20ac26aa2f5 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -15,6 +15,7 @@ > *.bin > *.bz2 > *.c.[012]*.* > +*.dt.yaml > *.dtb > *.dtb.S > *.dwo > diff --git a/Documentation/Makefile b/Documentation/Makefile > index 2ca77ad0f238..9786957c6a35 100644 > --- a/Documentation/Makefile > +++ b/Documentation/Makefile > @@ -2,7 +2,7 @@ > # Makefile for Sphinx documentation > # > > -subdir-y := > +subdir-y := devicetree/bindings/ > > # You can set these variables from the command line. > SPHINXBUILD = sphinx-build > diff --git a/Documentation/devicetree/bindings/.gitignore > b/Documentation/devicetree/bindings/.gitignore > new file mode 100644 > index ..d9194c02dd08 > --- /dev/null > +++ b/Documentation/devicetree/bindings/.gitignore > @@ -0,0 +1 @@ > +*.example.dts > diff --git a/Documentation/devicetree/bindings/Makefile > b/Documentation/devicetree/bindings/Makefile > new file mode 100644 > index ..ee0110dd8131 > --- /dev/null > +++ b/Documentation/devicetree/bindings/Makefile > @@ -0,0 +1,33 @@ > +# SPDX-License-Identifier: GPL-2.0 > +DT_DOC_CHECKER ?= dt-doc-validate > +DT_EXTRACT_EX ?= dt-extract-example > +DT_MK_SCHEMA ?= dt-mk-schema > +DT_MK_SCHEMA_FLAGS := $(if $(DT_SCHEMA_FILES), -u) > + > +quiet_cmd_chk_binding = CHKDT $< > + cmd_chk_binding = (set -e; \ > + $(DT_DOC_CHECKER) $< ; \ > + mkdir -p $(dir $@) ; \ > + $(DT_EXTRACT_EX) $< > $@ ) > + > +$(obj)/%.example.dts: $(src)/%.yaml FORCE > + $(call if_changed,chk_binding) > + > +DT_TMP_SCHEMA := .schema.yaml.tmp > +extra-y += $(DT_TMP_SCHEMA) > + > +quiet_cmd_mk_schema = SCHEMA $@ > + cmd_mk_schema = mkdir -p $(obj); \ > + rm -f $@; \ > + $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $< I think '$<' is wrong. '$<' is replaced with the first prerequisite. You can easily check what is happening here. $ cat Documentation/devicetree/bindings/..schema.yaml.tmp.cmd cmd_Documentation/devicetree/bindings/.schema.yaml.tmp := mkdir -p Documentation/devicetree/bindings; rm -f Documentation/devicetree/bindings/.schema.yaml.tmp; dt-mk-schema -o Documentation/devicetree/bindings/.schema.yaml.tmp Documentation/devicetree/bindings/arm/ti/ti,davinci.yaml So, the dt-validater will check only binding from ti,davinci.yaml, which is almost useless. If I understand it correctly, .schema.yaml.tmp should contain all binding yaml. I fixed it up like follows: diff --git a/Documentation/devicetree/bindings/Makefile b/Documentation/devicetree/bindings/Makefile index ee0110d..267458f 100644 --- a/Documentation/devicetree/bindings/Makefile +++ b/Documentation/devicetree/bindings/Makefile @@ -19,7 +19,7 @@ extra-y += $(DT_TMP_SCHEMA) quiet_cmd_mk_schema = SCHEMA $@ cmd_mk_schema = mkdir -p $(obj); \ rm -f $@; \ - $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $< + $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $(filter-out FORCE, $^) DT_DOCS = $(shell cd $(srctree)/$(src) && find * -name '*.yaml') DT_SCHEMA_FILES ?= $(addprefix $(src)/,$(DT_DOCS)) Then, I see another error. SCHEMA Documentation/devicetree/bindings/.schema.yaml.tmp Traceback (most recent call last): File "/home/masahiro/ref/yaml-bindings/tools/dt-mk-schema", line 32, in schemas = dtschema.process_schemas(args.schemas, core_schema=(not
Re: [RFC PATCH RESEND] x86/cpu: Avoid endless loop to get the number of cache leaves
On 2018/12/6 18:37, Borislav Petkov wrote: > Did you not see my reply to this last time? > > https://lkml.kernel.org/r/20181115172155.gb25...@zn.tnic I'm sorry that there is something wrong with my mail filter. So I was not notified about your reply for many days. :) I just found your reply and replied to it. Thx. -- Regards, Pu Wen
Re: [RFC PATCH RESEND] x86/cpu: Avoid endless loop to get the number of cache leaves
On 2018/12/6 18:37, Borislav Petkov wrote: > Did you not see my reply to this last time? > > https://lkml.kernel.org/r/20181115172155.gb25...@zn.tnic I'm sorry that there is something wrong with my mail filter. So I was not notified about your reply for many days. :) I just found your reply and replied to it. Thx. -- Regards, Pu Wen
Re: [RFC PATCH RESEND] x86/cpu: Avoid endless loop to get the number of cache leaves
On 2018/12/6 18:37, Borislav Petkov wrote: Did you not see my reply to this last time? https://lkml.kernel.org/r/20181115172155.gb25...@zn.tnic I'm sorry that there is something wrong with my mail filter. So I was not notified about your reply for many days. :) I just found your reply and replied to it. Thx. -- Regards, Pu Wen
Re: [RFC PATCH RESEND] x86/cpu: Avoid endless loop to get the number of cache leaves
On 2018/12/6 18:37, Borislav Petkov wrote: Did you not see my reply to this last time? https://lkml.kernel.org/r/20181115172155.gb25...@zn.tnic I'm sorry that there is something wrong with my mail filter. So I was not notified about your reply for many days. :) I just found your reply and replied to it. Thx. -- Regards, Pu Wen
Re: [PATCH v6 2/5] phy: qcom-qmp: Utilize fully-specified DT registers
Hi, On 07/12/18 2:16 PM, Vivek Gautam wrote: > On Fri, Dec 7, 2018 at 5:06 AM Evan Green wrote: >> >> Utilize the newly fixed up DT bindings to get the tx2 and rx2 register >> regions for the second lane of dual-lane PHYs. Before this change, >> the driver was simply using lane one's register region and adding >> 0x400, which reached well beyond the DT-specified register >> allocation. This would have been a crash were it not for the page size >> on ARM64. Fix the driver not to rely on the magic of virtual memory by >> using the newly specified DT register regions for tx2 and rx2. >> >> In order to support existing device trees, this change also contains a >> fallback mode for when those new register regions don't exist, which >> reverts to the original behavior of overreaching and prints a complaint. >> >> Signed-off-by: Evan Green >> Reviewed-by: Douglas Anderson >> --- >> As Doug mentioned, this should land before the dts patches land, otherwise >> the old driver code will use the tx2 register region as pcs_misc. >> >> Changes in v6: None >> Changes in v5: None >> Changes in v4: None >> Changes in v3: None >> Changes in v2: None >> >> drivers/phy/qualcomm/phy-qcom-qmp.c | 51 + >> 1 file changed, 38 insertions(+), 13 deletions(-) >> >> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c >> b/drivers/phy/qualcomm/phy-qcom-qmp.c >> index 514db7248a5d0..8204d55e2d650 100644 >> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c >> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c >> @@ -72,6 +72,9 @@ >> >> #define MAX_PROP_NAME 32 >> >> +/* Define the assumed distance between lanes for underspecified device >> trees. */ >> +#define QMP_PHY_LEGACY_LANE_STRIDE 0x400 >> + >> struct qmp_phy_init_tbl { >> unsigned int offset; >> unsigned int val; >> @@ -733,9 +736,6 @@ struct qmp_phy_cfg { >> bool has_phy_dp_com_ctrl; >> /* true, if PHY has secondary tx/rx lanes to be configured */ >> bool is_dual_lane_phy; >> - /* Register offset of secondary tx/rx lanes for USB DP combo PHY */ >> - unsigned int tx_b_lane_offset; >> - unsigned int rx_b_lane_offset; >> >> /* true, if PCS block has no separate SW_RESET register */ >> bool no_pcs_sw_reset; >> @@ -748,6 +748,8 @@ struct qmp_phy_cfg { >> * @tx: iomapped memory space for lane's tx >> * @rx: iomapped memory space for lane's rx >> * @pcs: iomapped memory space for lane's pcs >> + * @tx2: iomapped memory space for second lane's tx (in dual lane PHYs) >> + * @rx2: iomapped memory space for second lane's rx (in dual lane PHYs) >> * @pcs_misc: iomapped memory space for lane's pcs_misc >> * @pipe_clk: pipe lock >> * @index: lane index >> @@ -759,6 +761,8 @@ struct qmp_phy { >> void __iomem *tx; >> void __iomem *rx; >> void __iomem *pcs; >> + void __iomem *tx2; >> + void __iomem *rx2; >> void __iomem *pcs_misc; >> struct clk *pipe_clk; >> unsigned int index; >> @@ -975,8 +979,6 @@ static const struct qmp_phy_cfg qmp_v3_usb3phy_cfg = { >> >> .has_phy_dp_com_ctrl= true, >> .is_dual_lane_phy = true, >> - .tx_b_lane_offset = 0x400, >> - .rx_b_lane_offset = 0x400, >> }; >> >> static const struct qmp_phy_cfg qmp_v3_usb3_uniphy_cfg = { >> @@ -1031,9 +1033,6 @@ static const struct qmp_phy_cfg sdm845_ufsphy_cfg = { >> .mask_pcs_ready = PCS_READY, >> >> .is_dual_lane_phy = true, >> - .tx_b_lane_offset = 0x400, >> - .rx_b_lane_offset = 0x400, >> - >> .no_pcs_sw_reset= true, >> }; >> >> @@ -1238,12 +1237,12 @@ static int qcom_qmp_phy_init(struct phy *phy) >> qcom_qmp_phy_configure(tx, cfg->regs, cfg->tx_tbl, cfg->tx_tbl_num); >> /* Configuration for other LANE for USB-DP combo PHY */ >> if (cfg->is_dual_lane_phy) >> - qcom_qmp_phy_configure(tx + cfg->tx_b_lane_offset, cfg->regs, >> + qcom_qmp_phy_configure(qphy->tx2, cfg->regs, >>cfg->tx_tbl, cfg->tx_tbl_num); >> >> qcom_qmp_phy_configure(rx, cfg->regs, cfg->rx_tbl, cfg->rx_tbl_num); >> if (cfg->is_dual_lane_phy) >> - qcom_qmp_phy_configure(rx + cfg->rx_b_lane_offset, cfg->regs, >> + qcom_qmp_phy_configure(qphy->rx2, cfg->regs, >>cfg->rx_tbl, cfg->rx_tbl_num); >> >> qcom_qmp_phy_configure(pcs, cfg->regs, cfg->pcs_tbl, >> cfg->pcs_tbl_num); >> @@ -1615,8 +1614,9 @@ int qcom_qmp_phy_create(struct device *dev, struct >> device_node *np, int id) >> >> /* >> * Get memory resources for each phy lane: >> -* Resources are indexed as: tx -> 0; rx -> 1; pcs -> 2; and >> -* pcs_misc (optional) -> 3. >> +* Resources are indexed as: tx -> 0; rx -> 1; pcs -> 2. >> +* For dual lane PHYs: tx2 -> 3, rx2
Re: [PATCH v6 2/5] phy: qcom-qmp: Utilize fully-specified DT registers
Hi, On 07/12/18 2:16 PM, Vivek Gautam wrote: > On Fri, Dec 7, 2018 at 5:06 AM Evan Green wrote: >> >> Utilize the newly fixed up DT bindings to get the tx2 and rx2 register >> regions for the second lane of dual-lane PHYs. Before this change, >> the driver was simply using lane one's register region and adding >> 0x400, which reached well beyond the DT-specified register >> allocation. This would have been a crash were it not for the page size >> on ARM64. Fix the driver not to rely on the magic of virtual memory by >> using the newly specified DT register regions for tx2 and rx2. >> >> In order to support existing device trees, this change also contains a >> fallback mode for when those new register regions don't exist, which >> reverts to the original behavior of overreaching and prints a complaint. >> >> Signed-off-by: Evan Green >> Reviewed-by: Douglas Anderson >> --- >> As Doug mentioned, this should land before the dts patches land, otherwise >> the old driver code will use the tx2 register region as pcs_misc. >> >> Changes in v6: None >> Changes in v5: None >> Changes in v4: None >> Changes in v3: None >> Changes in v2: None >> >> drivers/phy/qualcomm/phy-qcom-qmp.c | 51 + >> 1 file changed, 38 insertions(+), 13 deletions(-) >> >> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c >> b/drivers/phy/qualcomm/phy-qcom-qmp.c >> index 514db7248a5d0..8204d55e2d650 100644 >> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c >> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c >> @@ -72,6 +72,9 @@ >> >> #define MAX_PROP_NAME 32 >> >> +/* Define the assumed distance between lanes for underspecified device >> trees. */ >> +#define QMP_PHY_LEGACY_LANE_STRIDE 0x400 >> + >> struct qmp_phy_init_tbl { >> unsigned int offset; >> unsigned int val; >> @@ -733,9 +736,6 @@ struct qmp_phy_cfg { >> bool has_phy_dp_com_ctrl; >> /* true, if PHY has secondary tx/rx lanes to be configured */ >> bool is_dual_lane_phy; >> - /* Register offset of secondary tx/rx lanes for USB DP combo PHY */ >> - unsigned int tx_b_lane_offset; >> - unsigned int rx_b_lane_offset; >> >> /* true, if PCS block has no separate SW_RESET register */ >> bool no_pcs_sw_reset; >> @@ -748,6 +748,8 @@ struct qmp_phy_cfg { >> * @tx: iomapped memory space for lane's tx >> * @rx: iomapped memory space for lane's rx >> * @pcs: iomapped memory space for lane's pcs >> + * @tx2: iomapped memory space for second lane's tx (in dual lane PHYs) >> + * @rx2: iomapped memory space for second lane's rx (in dual lane PHYs) >> * @pcs_misc: iomapped memory space for lane's pcs_misc >> * @pipe_clk: pipe lock >> * @index: lane index >> @@ -759,6 +761,8 @@ struct qmp_phy { >> void __iomem *tx; >> void __iomem *rx; >> void __iomem *pcs; >> + void __iomem *tx2; >> + void __iomem *rx2; >> void __iomem *pcs_misc; >> struct clk *pipe_clk; >> unsigned int index; >> @@ -975,8 +979,6 @@ static const struct qmp_phy_cfg qmp_v3_usb3phy_cfg = { >> >> .has_phy_dp_com_ctrl= true, >> .is_dual_lane_phy = true, >> - .tx_b_lane_offset = 0x400, >> - .rx_b_lane_offset = 0x400, >> }; >> >> static const struct qmp_phy_cfg qmp_v3_usb3_uniphy_cfg = { >> @@ -1031,9 +1033,6 @@ static const struct qmp_phy_cfg sdm845_ufsphy_cfg = { >> .mask_pcs_ready = PCS_READY, >> >> .is_dual_lane_phy = true, >> - .tx_b_lane_offset = 0x400, >> - .rx_b_lane_offset = 0x400, >> - >> .no_pcs_sw_reset= true, >> }; >> >> @@ -1238,12 +1237,12 @@ static int qcom_qmp_phy_init(struct phy *phy) >> qcom_qmp_phy_configure(tx, cfg->regs, cfg->tx_tbl, cfg->tx_tbl_num); >> /* Configuration for other LANE for USB-DP combo PHY */ >> if (cfg->is_dual_lane_phy) >> - qcom_qmp_phy_configure(tx + cfg->tx_b_lane_offset, cfg->regs, >> + qcom_qmp_phy_configure(qphy->tx2, cfg->regs, >>cfg->tx_tbl, cfg->tx_tbl_num); >> >> qcom_qmp_phy_configure(rx, cfg->regs, cfg->rx_tbl, cfg->rx_tbl_num); >> if (cfg->is_dual_lane_phy) >> - qcom_qmp_phy_configure(rx + cfg->rx_b_lane_offset, cfg->regs, >> + qcom_qmp_phy_configure(qphy->rx2, cfg->regs, >>cfg->rx_tbl, cfg->rx_tbl_num); >> >> qcom_qmp_phy_configure(pcs, cfg->regs, cfg->pcs_tbl, >> cfg->pcs_tbl_num); >> @@ -1615,8 +1614,9 @@ int qcom_qmp_phy_create(struct device *dev, struct >> device_node *np, int id) >> >> /* >> * Get memory resources for each phy lane: >> -* Resources are indexed as: tx -> 0; rx -> 1; pcs -> 2; and >> -* pcs_misc (optional) -> 3. >> +* Resources are indexed as: tx -> 0; rx -> 1; pcs -> 2. >> +* For dual lane PHYs: tx2 -> 3, rx2
Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions
On Fri, 7 Dec 2018 18:58:05 +0100 Andrea Righi wrote: > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote: > > Hi Andrea and Ingo, > > > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed > > working. > > After introducing this patch, I will start adding > > arch_populate_kprobe_blacklist() > > to some arches. > > > > Thank you, > > > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited > > area > > > > From: Masami Hiramatsu > > > > Blacklist symbols in arch-defined probe-prohibited areas. > > With this change, user can see all symbols which are prohibited > > to probe in debugfs. > > > > All archtectures which have custom prohibit areas should define > > its own arch_populate_kprobe_blacklist() function, but unless that, > > all symbols marked __kprobes are blacklisted. > > What about iterating all symbols and use arch_within_kprobe_blacklist() > to check if we need to blacklist them or not. Sorry, I don't want to iterate all ksyms since it may take a long time (especially embedded small devices.) > > In this way we don't have to introduce an > arch_populate_kprobe_blacklist() for each architecture. Hmm, I had a same idea, but there are some arch which prohibit probing extable entries (e.g. arm64.) For correctness of the blacklist, I think it should be listed (not entire the function body). I also rather like to remove arch_within_kprobe_blacklist() instead. Thank you, > > Something like the following maybe. > > Thanks. > > [RFC] kprobes: blacklist all symbols in arch-defined prohibited area > > From: Andrea Righi > > Blacklist symbols in arch-defined probe-prohibited areas. > With this change, user can see all symbols which are prohibited > to probe in debugfs. > > Signed-off-by: Masami Hiramatsu > Signed-off-by: Andrea Righi > --- > kernel/kprobes.c | 55 +-- > 1 file changed, 41 insertions(+), 14 deletions(-) > > diff --git a/kernel/kprobes.c b/kernel/kprobes.c > index 90e98e233647..e67598dd7468 100644 > --- a/kernel/kprobes.c > +++ b/kernel/kprobes.c > @@ -2093,6 +2093,35 @@ void dump_kprobe(struct kprobe *kp) > } > NOKPROBE_SYMBOL(dump_kprobe); > > +static int kprobe_blacklist_add(unsigned long entry) > +{ > + struct kprobe_blacklist_entry *ent; > + unsigned long offset = 0, size = 0; > + > + if (!kernel_text_address(entry) || > + !kallsyms_lookup_size_offset(entry, , )) > + return -EINVAL; > + > + ent = kmalloc(sizeof(*ent), GFP_KERNEL); > + if (!ent) > + return -ENOMEM; > + ent->start_addr = entry; > + ent->end_addr = entry + size; > + INIT_LIST_HEAD(>list); > + list_add_tail(>list, _blacklist); > + > + return 0; > +} > + > +static int arch_populate_kprobe_blacklist(void *data, const char *name, > + struct module *mod, > + unsigned long entry) > +{ > + if (arch_within_kprobe_blacklist(entry)) > + kprobe_blacklist_add(entry); > + return 0; > +} > + > /* > * Lookup and populate the kprobe_blacklist. > * > @@ -2104,24 +2133,22 @@ NOKPROBE_SYMBOL(dump_kprobe); > static int __init populate_kprobe_blacklist(unsigned long *start, >unsigned long *end) > { > - unsigned long *iter; > - struct kprobe_blacklist_entry *ent; > - unsigned long entry, offset = 0, size = 0; > + unsigned long entry, *iter; > + int ret; > > + /* Blacklist all arch_within_kprobe_blacklist() symbols */ > +mutex_lock(_mutex); > + kallsyms_on_each_symbol(arch_populate_kprobe_blacklist, NULL); > +mutex_unlock(_mutex); > + > + /* Add explicitly blacklisted symbols */ > for (iter = start; iter < end; iter++) { > entry = arch_deref_entry_point((void *)*iter); > - > - if (!kernel_text_address(entry) || > - !kallsyms_lookup_size_offset(entry, , )) > + ret = kprobe_blacklist_add(entry); > + if (ret == -EINVAL) > continue; > - > - ent = kmalloc(sizeof(*ent), GFP_KERNEL); > - if (!ent) > - return -ENOMEM; > - ent->start_addr = entry; > - ent->end_addr = entry + size; > - INIT_LIST_HEAD(>list); > - list_add_tail(>list, _blacklist); > + if (ret < 0) > + return ret; > } > return 0; > } -- Masami Hiramatsu
Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions
On Fri, 7 Dec 2018 18:58:05 +0100 Andrea Righi wrote: > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote: > > Hi Andrea and Ingo, > > > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed > > working. > > After introducing this patch, I will start adding > > arch_populate_kprobe_blacklist() > > to some arches. > > > > Thank you, > > > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited > > area > > > > From: Masami Hiramatsu > > > > Blacklist symbols in arch-defined probe-prohibited areas. > > With this change, user can see all symbols which are prohibited > > to probe in debugfs. > > > > All archtectures which have custom prohibit areas should define > > its own arch_populate_kprobe_blacklist() function, but unless that, > > all symbols marked __kprobes are blacklisted. > > What about iterating all symbols and use arch_within_kprobe_blacklist() > to check if we need to blacklist them or not. Sorry, I don't want to iterate all ksyms since it may take a long time (especially embedded small devices.) > > In this way we don't have to introduce an > arch_populate_kprobe_blacklist() for each architecture. Hmm, I had a same idea, but there are some arch which prohibit probing extable entries (e.g. arm64.) For correctness of the blacklist, I think it should be listed (not entire the function body). I also rather like to remove arch_within_kprobe_blacklist() instead. Thank you, > > Something like the following maybe. > > Thanks. > > [RFC] kprobes: blacklist all symbols in arch-defined prohibited area > > From: Andrea Righi > > Blacklist symbols in arch-defined probe-prohibited areas. > With this change, user can see all symbols which are prohibited > to probe in debugfs. > > Signed-off-by: Masami Hiramatsu > Signed-off-by: Andrea Righi > --- > kernel/kprobes.c | 55 +-- > 1 file changed, 41 insertions(+), 14 deletions(-) > > diff --git a/kernel/kprobes.c b/kernel/kprobes.c > index 90e98e233647..e67598dd7468 100644 > --- a/kernel/kprobes.c > +++ b/kernel/kprobes.c > @@ -2093,6 +2093,35 @@ void dump_kprobe(struct kprobe *kp) > } > NOKPROBE_SYMBOL(dump_kprobe); > > +static int kprobe_blacklist_add(unsigned long entry) > +{ > + struct kprobe_blacklist_entry *ent; > + unsigned long offset = 0, size = 0; > + > + if (!kernel_text_address(entry) || > + !kallsyms_lookup_size_offset(entry, , )) > + return -EINVAL; > + > + ent = kmalloc(sizeof(*ent), GFP_KERNEL); > + if (!ent) > + return -ENOMEM; > + ent->start_addr = entry; > + ent->end_addr = entry + size; > + INIT_LIST_HEAD(>list); > + list_add_tail(>list, _blacklist); > + > + return 0; > +} > + > +static int arch_populate_kprobe_blacklist(void *data, const char *name, > + struct module *mod, > + unsigned long entry) > +{ > + if (arch_within_kprobe_blacklist(entry)) > + kprobe_blacklist_add(entry); > + return 0; > +} > + > /* > * Lookup and populate the kprobe_blacklist. > * > @@ -2104,24 +2133,22 @@ NOKPROBE_SYMBOL(dump_kprobe); > static int __init populate_kprobe_blacklist(unsigned long *start, >unsigned long *end) > { > - unsigned long *iter; > - struct kprobe_blacklist_entry *ent; > - unsigned long entry, offset = 0, size = 0; > + unsigned long entry, *iter; > + int ret; > > + /* Blacklist all arch_within_kprobe_blacklist() symbols */ > +mutex_lock(_mutex); > + kallsyms_on_each_symbol(arch_populate_kprobe_blacklist, NULL); > +mutex_unlock(_mutex); > + > + /* Add explicitly blacklisted symbols */ > for (iter = start; iter < end; iter++) { > entry = arch_deref_entry_point((void *)*iter); > - > - if (!kernel_text_address(entry) || > - !kallsyms_lookup_size_offset(entry, , )) > + ret = kprobe_blacklist_add(entry); > + if (ret == -EINVAL) > continue; > - > - ent = kmalloc(sizeof(*ent), GFP_KERNEL); > - if (!ent) > - return -ENOMEM; > - ent->start_addr = entry; > - ent->end_addr = entry + size; > - INIT_LIST_HEAD(>list); > - list_add_tail(>list, _blacklist); > + if (ret < 0) > + return ret; > } > return 0; > } -- Masami Hiramatsu
Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions
On Fri, 7 Dec 2018 18:00:26 +0100 Andrea Righi wrote: > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote: > > Hi Andrea and Ingo, > > > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed > > working. > > After introducing this patch, I will start adding > > arch_populate_kprobe_blacklist() > > to some arches. > > > > Thank you, > > > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited > > area > > > > From: Masami Hiramatsu > > > > Blacklist symbols in arch-defined probe-prohibited areas. > > With this change, user can see all symbols which are prohibited > > to probe in debugfs. > > > > All archtectures which have custom prohibit areas should define > > its own arch_populate_kprobe_blacklist() function, but unless that, > > all symbols marked __kprobes are blacklisted. > > > > Reported-by: Andrea Righi > > Signed-off-by: Masami Hiramatsu > > --- > > [snip] > > > +int kprobe_add_ksym_blacklist(unsigned long entry) > > +{ > > + struct kprobe_blacklist_entry *ent; > > + unsigned long offset = 0, size = 0; > > + > > + if (!kernel_text_address(entry) || > > + !kallsyms_lookup_size_offset(entry, , )) > > + return -EINVAL; > > + > > + ent = kmalloc(sizeof(*ent), GFP_KERNEL); > > + if (!ent) > > + return -ENOMEM; > > + ent->start_addr = entry - offset; > > + ent->end_addr = entry - offset + size; > > Do we need to take offset into account? The code before wasn't using it. Yes, if we hit an alias symbol (zero-size), we forcibly increment address and retry it. In that case, offset will be 1. > > > + INIT_LIST_HEAD(>list); > > + list_add_tail(>list, _blacklist); > > + > > + return (int)size; > > +} > > + > > +/* Add functions in arch defined probe-prohibited area */ > > +int __weak arch_populate_kprobe_blacklist(void) > > +{ > > + unsigned long entry; > > + int ret = 0; > > + > > + for (entry = (unsigned long)__kprobes_text_start; > > +entry < (unsigned long)__kprobes_text_end; > > +entry += ret) { > > + ret = kprobe_add_ksym_blacklist(entry); > > + if (ret < 0) > > + return ret; > > + if (ret == 0) /* In case of alias symbol */ > > + ret = 1; Here, we incremented. Thank you, > > + } > > + return 0; > > +} > > + > > /* > > * Lookup and populate the kprobe_blacklist. > > * > > @@ -2104,26 +2142,20 @@ NOKPROBE_SYMBOL(dump_kprobe); > > static int __init populate_kprobe_blacklist(unsigned long *start, > > unsigned long *end) > > { > > + unsigned long entry; > > unsigned long *iter; > > - struct kprobe_blacklist_entry *ent; > > - unsigned long entry, offset = 0, size = 0; > > + int ret; > > > > for (iter = start; iter < end; iter++) { > > entry = arch_deref_entry_point((void *)*iter); > > - > > - if (!kernel_text_address(entry) || > > - !kallsyms_lookup_size_offset(entry, , )) > > + ret = kprobe_add_ksym_blacklist(entry); > > + if (ret == -EINVAL) > > continue; > > - > > - ent = kmalloc(sizeof(*ent), GFP_KERNEL); > > - if (!ent) > > - return -ENOMEM; > > - ent->start_addr = entry; > > - ent->end_addr = entry + size; > > - INIT_LIST_HEAD(>list); > > - list_add_tail(>list, _blacklist); > > + if (ret < 0) > > + return ret; > > } > > - return 0; > > + > > + return arch_populate_kprobe_blacklist(); > > } > > > > /* Module notifier call back, checking kprobes on the module */ -- Masami Hiramatsu
Re: [PATCH] kprobes: x86_64: blacklist non-attachable interrupt functions
On Fri, 7 Dec 2018 18:00:26 +0100 Andrea Righi wrote: > On Sat, Dec 08, 2018 at 01:01:20AM +0900, Masami Hiramatsu wrote: > > Hi Andrea and Ingo, > > > > Here is the patch what I meant. I just ran it on qemu-x86, and seemed > > working. > > After introducing this patch, I will start adding > > arch_populate_kprobe_blacklist() > > to some arches. > > > > Thank you, > > > > [RFC] kprobes: x86/kprobes: Blacklist symbols in arch-defined prohibited > > area > > > > From: Masami Hiramatsu > > > > Blacklist symbols in arch-defined probe-prohibited areas. > > With this change, user can see all symbols which are prohibited > > to probe in debugfs. > > > > All archtectures which have custom prohibit areas should define > > its own arch_populate_kprobe_blacklist() function, but unless that, > > all symbols marked __kprobes are blacklisted. > > > > Reported-by: Andrea Righi > > Signed-off-by: Masami Hiramatsu > > --- > > [snip] > > > +int kprobe_add_ksym_blacklist(unsigned long entry) > > +{ > > + struct kprobe_blacklist_entry *ent; > > + unsigned long offset = 0, size = 0; > > + > > + if (!kernel_text_address(entry) || > > + !kallsyms_lookup_size_offset(entry, , )) > > + return -EINVAL; > > + > > + ent = kmalloc(sizeof(*ent), GFP_KERNEL); > > + if (!ent) > > + return -ENOMEM; > > + ent->start_addr = entry - offset; > > + ent->end_addr = entry - offset + size; > > Do we need to take offset into account? The code before wasn't using it. Yes, if we hit an alias symbol (zero-size), we forcibly increment address and retry it. In that case, offset will be 1. > > > + INIT_LIST_HEAD(>list); > > + list_add_tail(>list, _blacklist); > > + > > + return (int)size; > > +} > > + > > +/* Add functions in arch defined probe-prohibited area */ > > +int __weak arch_populate_kprobe_blacklist(void) > > +{ > > + unsigned long entry; > > + int ret = 0; > > + > > + for (entry = (unsigned long)__kprobes_text_start; > > +entry < (unsigned long)__kprobes_text_end; > > +entry += ret) { > > + ret = kprobe_add_ksym_blacklist(entry); > > + if (ret < 0) > > + return ret; > > + if (ret == 0) /* In case of alias symbol */ > > + ret = 1; Here, we incremented. Thank you, > > + } > > + return 0; > > +} > > + > > /* > > * Lookup and populate the kprobe_blacklist. > > * > > @@ -2104,26 +2142,20 @@ NOKPROBE_SYMBOL(dump_kprobe); > > static int __init populate_kprobe_blacklist(unsigned long *start, > > unsigned long *end) > > { > > + unsigned long entry; > > unsigned long *iter; > > - struct kprobe_blacklist_entry *ent; > > - unsigned long entry, offset = 0, size = 0; > > + int ret; > > > > for (iter = start; iter < end; iter++) { > > entry = arch_deref_entry_point((void *)*iter); > > - > > - if (!kernel_text_address(entry) || > > - !kallsyms_lookup_size_offset(entry, , )) > > + ret = kprobe_add_ksym_blacklist(entry); > > + if (ret == -EINVAL) > > continue; > > - > > - ent = kmalloc(sizeof(*ent), GFP_KERNEL); > > - if (!ent) > > - return -ENOMEM; > > - ent->start_addr = entry; > > - ent->end_addr = entry + size; > > - INIT_LIST_HEAD(>list); > > - list_add_tail(>list, _blacklist); > > + if (ret < 0) > > + return ret; > > } > > - return 0; > > + > > + return arch_populate_kprobe_blacklist(); > > } > > > > /* Module notifier call back, checking kprobes on the module */ -- Masami Hiramatsu
Re: [PATCH bpf-next 2/3] bpf: add bpffs pretty print for cgroup local storage maps
On 12/7/18 4:53 PM, Roman Gushchin wrote: > Implement bpffs pretty printing for cgroup local storage maps > (both shared and per-cpu). > Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c): > > Shared: >$ cat /sys/fs/bpf/map_2 ># WARNING!! The output is for debug purpose only ># WARNING!! The output format will change >{4294968594,1}: {,1039896} > > Per-cpu: >$ cat /sys/fs/bpf/map_1 ># WARNING!! The output is for debug purpose only ># WARNING!! The output format will change >{4294968594,1}: { > cpu0: {0,0,0,0,0} > cpu1: {0,0,0,0,0} > cpu2: {1,104,0,0,0} > cpu3: {0,0,0,0,0} >} > > Signed-off-by: Roman Gushchin > Cc: Alexei Starovoitov > Cc: Daniel Borkmann > --- > include/linux/btf.h| 10 + > kernel/bpf/local_storage.c | 90 +- > 2 files changed, 99 insertions(+), 1 deletion(-) > > diff --git a/include/linux/btf.h b/include/linux/btf.h > index 8c2199b5d250..ac67bc4cbfd9 100644 > --- a/include/linux/btf.h > +++ b/include/linux/btf.h > @@ -5,6 +5,7 @@ > #define _LINUX_BTF_H 1 > > #include > +#include > > struct btf; > struct btf_type; > @@ -63,4 +64,13 @@ static inline const char *btf_name_by_offset(const struct > btf *btf, > } > #endif > > +static inline const struct btf_type *btf_orig_type(const struct btf *btf, > +const struct btf_type *t) > +{ > + while (t && BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF) > + t = btf_type_by_id(btf, t->type); technically, type modifier "const" and "volatile" can apply to member type as well. But these modifiers really don't make sense here. Could you add a comment here to mention that they will be treated as an error since such a programming is not really recommended? > + > + return t; > +} > + > #endif > diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c > index b65017dead44..7b51fe1aba3c 100644 > --- a/kernel/bpf/local_storage.c > +++ b/kernel/bpf/local_storage.c > @@ -1,11 +1,13 @@ > //SPDX-License-Identifier: GPL-2.0 > #include > #include > +#include > #include > #include > #include > #include > #include > +#include > > DEFINE_PER_CPU(struct bpf_cgroup_storage*, > bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]); > > @@ -308,6 +310,91 @@ static int cgroup_storage_delete_elem(struct bpf_map > *map, void *key) > return -EINVAL; > } > > +static int cgroup_storage_check_btf(const struct bpf_map *map, > + const struct btf *btf, > + const struct btf_type *key_type, > + const struct btf_type *value_type) > +{ > + const struct btf_type *st, *t; > + struct btf_member *m; > + > + /* Key is expected to be of struct bpf_cgroup_storage_key type, > + * which is: > + * struct bpf_cgroup_storage_key { > + * __u64 cgroup_inode_id; > + * __u32 attach_type; > + * }; > + */ > + > + /* > + * Key_type must be a structure (or a typedef of a structure) with > + * two members. > + */ > + st = btf_orig_type(btf, key_type); > + if (BTF_INFO_KIND(st->info) != BTF_KIND_STRUCT || > + BTF_INFO_VLEN(st->info) != 2) > + return -EINVAL; > + > + /* > + * The first field must be a 64 bit integer at 0 offset. > + */ > + m = (struct btf_member *)(st + 1); > + t = btf_orig_type(btf, btf_type_by_id(btf, m->type)); > + if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT || m->offset || > + t->size != > + FIELD_SIZEOF(struct bpf_cgroup_storage_key, cgroup_inode_id)) > + return -EINVAL; We should not use t->size here. The "t->size" is the type size, and the real number of bits held by the member is BTF_INT_BITS(...) with the argument of the u32 int value after "t". > + > + /* > + * The second field must be a 32 bit integer at 0 offset. > + */ > + m = m + 1; > + t = btf_orig_type(btf, btf_type_by_id(btf, m->type)); > + if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT || > + m->offset != offsetof(struct bpf_cgroup_storage_key, attach_type) * > + BITS_PER_BYTE || t->size != > + FIELD_SIZEOF(struct bpf_cgroup_storage_key, attach_type)) > + return -EINVAL; The same is here. t->size should not be used. BTF_INT_BITS(...) should be used. > + > + return 0; > +} > + > +static void cgroup_storage_seq_show_elem(struct bpf_map *map, void *_key, > + struct seq_file *m) > +{ > + enum bpf_cgroup_storage_type stype = cgroup_storage_type(map); > + struct bpf_cgroup_storage_key *key = _key; > + struct bpf_cgroup_storage *storage; > + int cpu; > + > + rcu_read_lock(); > + storage = cgroup_storage_lookup(map_to_storage(map), key, false); > + if (!storage)
Re: [PATCH v4 4/5] ARM: dts: imx5: add gpu nodes
On Tue, Dec 04, 2018 at 10:17:00AM -0500, Jonathan Marek wrote: > This adds the gpu nodes for the adreno 200 GPU on iMX51 and iMX53, now > supported by the freedreno driver. > > The compatible for the iMX51 uses a patchid of 1, which is used by drm/msm > driver to identify the smaller 128KiB GMEM size. > > Signed-off-by: Jonathan Marek Applied, thanks.
Re: [RFC][PATCHv2 3/4] serial: introduce uart_port locking helpers
On (10/16/18 14:04), Sergey Senozhatsky wrote: [..] > - The first entry point is console ->write() callback, which we call > from printk(). A possible deadlock scenario there is: > > CPU0 > > spin_lock_irqsave(>lock, flags) << deadlock > serial_foo_write() > call_console_drivers() > console_unlock() > console_flush_on_panic() > panic() > > spin_lock_irqsave(>lock, flags) > serial_foo_write() > call_console_drivers() > console_unlock() > printk() > ... [..] > - The rest (of entry points) requires a bit different handling. > Let's take a look at the following backtrace: > > CPU0 > > spin_lock_irqsave(>lock, flags) << deadlock > serial_foo_write() > call_console_drivers() > console_unlock() > printk() > __queue_work() > tty_flip_buffer_push() > spin_lock_irqsave(>lock, flags) > serial_foo_handle_IRQ() > > > Serial drivers invoke tons of core kernel functions - WQ, MM, etc. All > of which may printk() in various cases. So we can't really just > "remove those printk-s". The simples way to address this seems to be > PRINTK_SAFE_CONTEXT_MASK. serial/UART and printk guys, sorry to bother you, do you hate this idea of removing console_driver->CORE KERNEL->printk->console_driver deadlock path? Or is there any chance we can move forward? -ss
Re: [RFC][PATCHv2 3/4] serial: introduce uart_port locking helpers
On (10/16/18 14:04), Sergey Senozhatsky wrote: [..] > - The first entry point is console ->write() callback, which we call > from printk(). A possible deadlock scenario there is: > > CPU0 > > spin_lock_irqsave(>lock, flags) << deadlock > serial_foo_write() > call_console_drivers() > console_unlock() > console_flush_on_panic() > panic() > > spin_lock_irqsave(>lock, flags) > serial_foo_write() > call_console_drivers() > console_unlock() > printk() > ... [..] > - The rest (of entry points) requires a bit different handling. > Let's take a look at the following backtrace: > > CPU0 > > spin_lock_irqsave(>lock, flags) << deadlock > serial_foo_write() > call_console_drivers() > console_unlock() > printk() > __queue_work() > tty_flip_buffer_push() > spin_lock_irqsave(>lock, flags) > serial_foo_handle_IRQ() > > > Serial drivers invoke tons of core kernel functions - WQ, MM, etc. All > of which may printk() in various cases. So we can't really just > "remove those printk-s". The simples way to address this seems to be > PRINTK_SAFE_CONTEXT_MASK. serial/UART and printk guys, sorry to bother you, do you hate this idea of removing console_driver->CORE KERNEL->printk->console_driver deadlock path? Or is there any chance we can move forward? -ss
Re: [PATCH] scsi: Use of_node_name_eq for node name comparisons
Rob, > Convert string compares of DT node names to use of_node_name_eq helper > instead. This removes direct access to the node name pointer. Applied to 4.21/scsi-queue, thanks. -- Martin K. Petersen Oracle Linux Engineering
[PATCH] checkstack.pl: dynamic stack growth for aarch64
This is to track dynamic amount of stack growth for aarch64, so it is possible to print out offensive functions that may consume too much stack. For example, 0x284d1270 try_to_unmap_one [vmlinux]: Dynamic (0xcf0) 0x28538358 migrate_page_move_mapping [vmlinux]: Dynamic (0xc60) 0x281276c8 copy_process.isra.2 [vmlinux]: Dynamic (0xb20) 0x28424958 show_free_areas [vmlinux]: Dynamic (0xb40) 0x28545178 __split_huge_pmd_locked [vmlinux]: Dynamic (0xb30) 0x28555120 collapse_shmem [vmlinux]:Dynamic (0xbc0) 0x2862e0d0 do_direct_IO [vmlinux]: Dynamic (0xb70) 0x28cc0aa0 md_do_sync [vmlinux]:Dynamic (0xb90) Signed-off-by: Qian Cai --- scripts/checkstack.pl | 2 ++ 1 file changed, 2 insertions(+) diff --git a/scripts/checkstack.pl b/scripts/checkstack.pl index 34414c6efad6..122aef5e4e14 100755 --- a/scripts/checkstack.pl +++ b/scripts/checkstack.pl @@ -48,7 +48,9 @@ my (@stack, $re, $dre, $x, $xs, $funcre); $funcre = qr/^$x* <(.*)>:$/; if ($arch eq 'aarch64') { #ffc0006325cc: a9bb7bfdstp x29, x30, [sp, #-80]! + #a110: d11643ffsub sp, sp, #0x590 $re = qr/^.*stp.*sp, \#-([0-9]{1,8})\]\!/o; + $dre = qr/^.*sub.*sp, sp, #(0x$x{1,8})/o; } elsif ($arch eq 'arm') { #c0008ffc: e24dd064sub sp, sp, #100; 0x64 $re = qr/.*sub.*sp, sp, #(([0-9]{2}|[3-9])[0-9]{2})/o; -- 2.17.2 (Apple Git-113)
[PATCH] checkstack.pl: dynamic stack growth for aarch64
This is to track dynamic amount of stack growth for aarch64, so it is possible to print out offensive functions that may consume too much stack. For example, 0x284d1270 try_to_unmap_one [vmlinux]: Dynamic (0xcf0) 0x28538358 migrate_page_move_mapping [vmlinux]: Dynamic (0xc60) 0x281276c8 copy_process.isra.2 [vmlinux]: Dynamic (0xb20) 0x28424958 show_free_areas [vmlinux]: Dynamic (0xb40) 0x28545178 __split_huge_pmd_locked [vmlinux]: Dynamic (0xb30) 0x28555120 collapse_shmem [vmlinux]:Dynamic (0xbc0) 0x2862e0d0 do_direct_IO [vmlinux]: Dynamic (0xb70) 0x28cc0aa0 md_do_sync [vmlinux]:Dynamic (0xb90) Signed-off-by: Qian Cai --- scripts/checkstack.pl | 2 ++ 1 file changed, 2 insertions(+) diff --git a/scripts/checkstack.pl b/scripts/checkstack.pl index 34414c6efad6..122aef5e4e14 100755 --- a/scripts/checkstack.pl +++ b/scripts/checkstack.pl @@ -48,7 +48,9 @@ my (@stack, $re, $dre, $x, $xs, $funcre); $funcre = qr/^$x* <(.*)>:$/; if ($arch eq 'aarch64') { #ffc0006325cc: a9bb7bfdstp x29, x30, [sp, #-80]! + #a110: d11643ffsub sp, sp, #0x590 $re = qr/^.*stp.*sp, \#-([0-9]{1,8})\]\!/o; + $dre = qr/^.*sub.*sp, sp, #(0x$x{1,8})/o; } elsif ($arch eq 'arm') { #c0008ffc: e24dd064sub sp, sp, #100; 0x64 $re = qr/.*sub.*sp, sp, #(([0-9]{2}|[3-9])[0-9]{2})/o; -- 2.17.2 (Apple Git-113)
Re: [PATCH] x86/cpu: Avoid endless loop to get the number of cache leaves
Sorry for the late reply :) On 2018/11/16 1:22, Borislav Petkov wrote: >> @@ -640,7 +641,7 @@ static int find_num_cache_leaves(struct cpuinfo_x86 *c) >> /* Do cpuid(op) loop to find out num_cache_leaves */ >> cpuid_count(op, i, , , , ); >> cache_eax.full = eax; >> -} while (cache_eax.split.type != CTYPE_NULL); >> +} while (cache_eax.split.type != CTYPE_NULL && i != CTYPE_MAX); > i is an int and CTYPE_MAX is enum _cache_type. Huh? How about define CTYPE_MAX like this: #define CTYPE_MAX 4 > This works by chance because CTYPE_MAX is 4 and the termination CPUID > leaf is the 4th too. It will return CTYPE_NULL when accessing the 4th CPUID leaf in most of the cases, but in certain case it will not. So I think it's better to restrict the maximum CPUID access times to 4 for kernel robustness. -- Regards, Pu Wen
Re: [PATCH] x86/cpu: Avoid endless loop to get the number of cache leaves
Sorry for the late reply :) On 2018/11/16 1:22, Borislav Petkov wrote: >> @@ -640,7 +641,7 @@ static int find_num_cache_leaves(struct cpuinfo_x86 *c) >> /* Do cpuid(op) loop to find out num_cache_leaves */ >> cpuid_count(op, i, , , , ); >> cache_eax.full = eax; >> -} while (cache_eax.split.type != CTYPE_NULL); >> +} while (cache_eax.split.type != CTYPE_NULL && i != CTYPE_MAX); > i is an int and CTYPE_MAX is enum _cache_type. Huh? How about define CTYPE_MAX like this: #define CTYPE_MAX 4 > This works by chance because CTYPE_MAX is 4 and the termination CPUID > leaf is the 4th too. It will return CTYPE_NULL when accessing the 4th CPUID leaf in most of the cases, but in certain case it will not. So I think it's better to restrict the maximum CPUID access times to 4 for kernel robustness. -- Regards, Pu Wen
Re: [PATCH] scsi: qla2xxx: NULL check before some freeing functions is not needed.
Thomas, > NULL check before some freeing functions is not needed. Applied to 4.21/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: qedf: NULL check before some freeing functions is not needed.
Thomas, > NULL check before some freeing functions is not needed. Applied to 4.21/scsi-queue, thanks. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: megaraid_sas: NULL check before some freeing functions is not needed.
Thomas, > NULL check before some freeing functions is not needed. Applied to 4.21/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Gautham R Shenoy writes: > On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote: >> Sure. I will test the patch and report back. > > I added the following debug patch on top of your patch, and after an > hour's run, the system crashed. Appending the log at the end. Thank you very much for testing! Your debug patch was very helpful as well. > I suppose we still need to increase the number of tries since we wait > only for 2.5ms looping before giving up. Do you think it would have helped? In the debug output you posted I would have expected the following message to show up if the loop finished too early, and it didn't: "Querying DEAD? cpu %i (%i) shows %i\n" So I don't think increasing the loop length would have made a difference. In fact, the call to smp_query_cpu_stopped() always succeeded on the first iteration. I think there is something else going on which we don't fully understand yet. From your other email: > I agree that the Kernel has to respect RTAS's restriction. The PAPR > v2.8.1, Requirement R1-7.2.3-8 under section 7.2.3 says the following: > > "The stop-self service needs to be serialized with calls to the > stop-self, start-cpu, and set-power-level services. The OS must > be able to call RTAS services on other processors while the > processor is stopped or being stopped" > > Thus the onus is on the OS to ensure that there are no concurrent rtas > calls with "stop-self" token. As you say perhaps there's another call to stop-self, start-cpu or set-power-level being made concurrently. I don't currently see how more than one stop-self or start-cpu call could be in flight at the same time given that there are a number of locks being grabbed during CPU hotplug and unplug. OTOH the CPU that actually calls stop-self doesn't seem to grab any locks itself so it's a possibility. As for set-power-level, it's only used in the case of PCI hotplug from what I can see, and that isn't part of the picture in this case, right? We could address this problem directly by adding another lock separate from rtas.lock to serialize just these calls. The challenge is with stop-self, because the CPU calling it will never return to release the lock. Is it possible to grab a lock (or down a semaphore) in the CPU calling stop-self and then release the lock (or up the semaphore) in the CPU running pseries_cpu_die()? > > There's also a race between the CPU driving the unplug and the CPU > > being unplugged which I think is not easy for the CPU being > > unplugged to win, which makes the busy loop in pseries_cpu_die() a > > bit fragile. I describe the race in the patch description. > > > > My solution to make the race less tight is to make the CPU driving > > the unplug to only start the busy loop only after the CPU being > > unplugged is in the CPU_STATE_OFFLINE state. At that point, we know > > that it either is about to call RTAS or it already has. > > Ah, yes this is good optimization. Though, I think we ought to > unconditionally wait until the target CPU has woken up from CEDE and > changed its state to CPU_STATE_OFFLINE. After if PROD failed, then we > would have caught it in dlpar_offline_cpu() itself. I recently saw a QEMU-implemented hcall (H_LOGICAL_CI_LOAD) return success when it had been given an invalid memory address to load from, so my confidence in the error reporting of hcalls is a bit shaken. :-) In that case the CPU would wait forever for the CPU state to change. If you believe 100 ms is too short a timeout, we could make it 500 ms or even 1s. What do you think? > cpu 112 (hwid 112) Ready to die... > [DEBUG] Waited for CPU 112 to enter rtas: tries=0, time=65 > cpu 113 (hwid 113) Ready to die... > [DEBUG] Waited for CPU 113 to enter rtas: tries=0, time=1139 > cpu 114 (hwid 114) Ready to die... > [DEBUG] Waited for CPU 114 to enter rtas: tries=0, time=1036 > cpu 115 (hwid 115) Ready to die... > [DEBUG] Waited for CPU 115 to enter rtas: tries=0, time=133 > cpu 116 (hwid 116) Ready to die... > [DEBUG] Waited for CPU 116 to enter rtas: tries=0, time=1231 > cpu 117 (hwid 117) Ready to die... > [DEBUG] Waited for CPU 117 to enter rtas: tries=0, time=1231 > cpu 118 (hwid 118) Ready to die... > [DEBUG] Waited for CPU 118 to enter rtas: tries=0, time=1231 > cpu 119 (hwid 119) Ready to die... > [DEBUG] Waited for CPU 119 to enter rtas: tries=0, time=1131 > cpu 104 (hwid 104) Ready to die... > [DEBUG] Waited for CPU 104 to enter rtas: tries=0, time=40 Interesting, so 1.2 ms can pass before the dying CPU actually gets close to making the stop-self call. And even in those cases the retry loop is succeeding on the first try! So this shows that changing the code to wait for the CPU_STATE_OFFLINE state is worth it. > *** RTAS CALL BUFFER CORRUPTION *** > 393: rtas32_call_buff_ptr= > 0060 0060 0060 0060 [...`...`...`...`] > 0060 0060 0060 0060 [...`...`...`...`] > 0060 0060 0060 0060
Re: [PATCH 01/41] scsi: BusLogic: mark expected switch fall-through
Gustavo A., > In preparation to enabling -Wimplicit-fallthrough, mark switch cases > where we are expecting to fall through. Applied to 4.21/scsi-queue, thanks. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v2] drivers/scsi/fnic/fnic_trace.c: Use vzalloc
Sabyasachi, > Replaced vmalloc + memset with vzalloc Applied to 4.21/scsi-queue, thanks. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
On Fri, Dec 07, 2018 at 04:52:42PM -0800, John Hubbard wrote: > On 12/7/18 11:16 AM, Jerome Glisse wrote: > > On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote: > >> On 12/4/18 5:57 PM, John Hubbard wrote: > >>> On 12/4/18 5:44 PM, Jerome Glisse wrote: > On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote: > > On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote: > >> On 12/4/18 3:03 PM, Dan Williams wrote: > >>> Except the LRU fields are already in use for ZONE_DEVICE pages... how > >>> does this proposal interact with those? > >> > >> Very badly: page->pgmap and page->hmm_data both get corrupted. Is > >> there an entire > >> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? > >> Said another > >> way: is it reasonable to disallow calling get_user_pages() on > >> ZONE_DEVICE pages? > >> > >> If we have to support get_user_pages() on ZONE_DEVICE pages, then the > >> whole > >> LRU field approach is unusable. > > > > We just need to rearrange ZONE_DEVICE pages. Please excuse the > > whitespace > > damage: > > > > +++ b/include/linux/mm_types.h > > @@ -151,10 +151,12 @@ struct page { > > #endif > > }; > > struct {/* ZONE_DEVICE pages */ > > + unsigned long _zd_pad_2;/* LRU */ > > + unsigned long _zd_pad_3;/* LRU */ > > + unsigned long _zd_pad_1;/* uses mapping > > */ > > /** @pgmap: Points to the hosting device page > > map. */ > > struct dev_pagemap *pgmap; > > unsigned long hmm_data; > > - unsigned long _zd_pad_1;/* uses mapping > > */ > > }; > > > > /** @rcu_head: You can use this to free a page by RCU. > > */ > > > > You don't use page->private or page->index, do you Dan? > > page->private and page->index are use by HMM DEVICE page. > > >>> > >>> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field > >>> remaining for > >>> dma-pinned information. Which might work. To recap, we need: > >>> > >>> -- 1 bit for PageDmaPinned > >>> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru. > >>> -- N bits for a reference count > >>> > >>> Those *could* be packed into a single 64-bit field, if really necessary. > >>> > >> > >> ...actually, this needs to work on 32-bit systems, as well. And HMM is > >> using a lot. > >> However, it is still possible for this to work. > >> > >> Matthew, can I have that bit now please? I'm about out of options, and now > >> it will actually > >> solve the problem here. > >> > >> Given: > >> > >> 1) It's cheap to know if a page is ZONE_DEVICE, and ZONE_DEVICE means not > >> on the LRU. > >> That, in turn, means only 1 bit instead of 2 bits (in addition to a > >> counter) is required, > >> for that case. > >> > >> 2) There is an independent bit available (according to Matthew). > >> > >> 3) HMM uses 4 of the 5 struct page fields, so only one field is available > >> for a counter > >>in that case. > > > > To expend on this, HMM private page are use for anonymous page > > so the index and mapping fields have the value you expect for > > such pages. Down the road i want also to support file backed > > page with HMM private (mapping, private, index). > > > > For HMM public both anonymous and file back page are supported > > today (HMM public is only useful on platform with something like > > OpenCAPI, CCIX or NVlink ... so PowerPC for now). > > > >> 4) get_user_pages() must work on ZONE_DEVICE and HMM pages. > > > > get_user_pages() only need to work with HMM public page not the > > private one as we can not allow _anyone_ to pin HMM private page. > > So on get_user_pages() on HMM private we get a page fault and > > it is migrated back to regular memory. > > > > > >> 5) For a proper atomic counter for both 32- and 64-bit, we really do need > >> a complete > >> unsigned long field. > >> > >> So that leads to the following approach: > >> > >> -- Use a single unsigned long field for an atomic reference count for the > >> DMA pinned count. > >> For normal pages, this will be the *second* field of the LRU (in order to > >> avoid PageTail bit). > >> > >> For ZONE_DEVICE pages, we can also line up the fields so that the second > >> LRU field is > >> available and reserved for this DMA pinned count. Basically _zd_pad_1 gets > >> move up and > >> optionally renamed: > >> > >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > >> index 017ab82e36ca..b5dcd9398cae 100644 > >> --- a/include/linux/mm_types.h > >> +++ b/include/linux/mm_types.h > >> @@ -90,8 +90,8 @@ struct page { > >> * are in use. > >>
Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
On Fri, Dec 07, 2018 at 04:52:42PM -0800, John Hubbard wrote: > On 12/7/18 11:16 AM, Jerome Glisse wrote: > > On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote: > >> On 12/4/18 5:57 PM, John Hubbard wrote: > >>> On 12/4/18 5:44 PM, Jerome Glisse wrote: > On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote: > > On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote: > >> On 12/4/18 3:03 PM, Dan Williams wrote: > >>> Except the LRU fields are already in use for ZONE_DEVICE pages... how > >>> does this proposal interact with those? > >> > >> Very badly: page->pgmap and page->hmm_data both get corrupted. Is > >> there an entire > >> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? > >> Said another > >> way: is it reasonable to disallow calling get_user_pages() on > >> ZONE_DEVICE pages? > >> > >> If we have to support get_user_pages() on ZONE_DEVICE pages, then the > >> whole > >> LRU field approach is unusable. > > > > We just need to rearrange ZONE_DEVICE pages. Please excuse the > > whitespace > > damage: > > > > +++ b/include/linux/mm_types.h > > @@ -151,10 +151,12 @@ struct page { > > #endif > > }; > > struct {/* ZONE_DEVICE pages */ > > + unsigned long _zd_pad_2;/* LRU */ > > + unsigned long _zd_pad_3;/* LRU */ > > + unsigned long _zd_pad_1;/* uses mapping > > */ > > /** @pgmap: Points to the hosting device page > > map. */ > > struct dev_pagemap *pgmap; > > unsigned long hmm_data; > > - unsigned long _zd_pad_1;/* uses mapping > > */ > > }; > > > > /** @rcu_head: You can use this to free a page by RCU. > > */ > > > > You don't use page->private or page->index, do you Dan? > > page->private and page->index are use by HMM DEVICE page. > > >>> > >>> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field > >>> remaining for > >>> dma-pinned information. Which might work. To recap, we need: > >>> > >>> -- 1 bit for PageDmaPinned > >>> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru. > >>> -- N bits for a reference count > >>> > >>> Those *could* be packed into a single 64-bit field, if really necessary. > >>> > >> > >> ...actually, this needs to work on 32-bit systems, as well. And HMM is > >> using a lot. > >> However, it is still possible for this to work. > >> > >> Matthew, can I have that bit now please? I'm about out of options, and now > >> it will actually > >> solve the problem here. > >> > >> Given: > >> > >> 1) It's cheap to know if a page is ZONE_DEVICE, and ZONE_DEVICE means not > >> on the LRU. > >> That, in turn, means only 1 bit instead of 2 bits (in addition to a > >> counter) is required, > >> for that case. > >> > >> 2) There is an independent bit available (according to Matthew). > >> > >> 3) HMM uses 4 of the 5 struct page fields, so only one field is available > >> for a counter > >>in that case. > > > > To expend on this, HMM private page are use for anonymous page > > so the index and mapping fields have the value you expect for > > such pages. Down the road i want also to support file backed > > page with HMM private (mapping, private, index). > > > > For HMM public both anonymous and file back page are supported > > today (HMM public is only useful on platform with something like > > OpenCAPI, CCIX or NVlink ... so PowerPC for now). > > > >> 4) get_user_pages() must work on ZONE_DEVICE and HMM pages. > > > > get_user_pages() only need to work with HMM public page not the > > private one as we can not allow _anyone_ to pin HMM private page. > > So on get_user_pages() on HMM private we get a page fault and > > it is migrated back to regular memory. > > > > > >> 5) For a proper atomic counter for both 32- and 64-bit, we really do need > >> a complete > >> unsigned long field. > >> > >> So that leads to the following approach: > >> > >> -- Use a single unsigned long field for an atomic reference count for the > >> DMA pinned count. > >> For normal pages, this will be the *second* field of the LRU (in order to > >> avoid PageTail bit). > >> > >> For ZONE_DEVICE pages, we can also line up the fields so that the second > >> LRU field is > >> available and reserved for this DMA pinned count. Basically _zd_pad_1 gets > >> move up and > >> optionally renamed: > >> > >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > >> index 017ab82e36ca..b5dcd9398cae 100644 > >> --- a/include/linux/mm_types.h > >> +++ b/include/linux/mm_types.h > >> @@ -90,8 +90,8 @@ struct page { > >> * are in use. > >>
[PATCH] selftests/vm/gup_benchmark.c: match gup struct to kernel
An expansion field was added to the kernel copy of this structure for future use. See mm/gup_benchmark.c. Add the same expansion field here, so that the IOCTL command decodes correctly. Otherwise, it fails with EINVAL. Signed-off-by: Alison Schofield --- tools/testing/selftests/vm/gup_benchmark.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index 880b96fc80d4..c0534e298b51 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -25,6 +25,7 @@ struct gup_benchmark { __u64 size; __u32 nr_pages_per_call; __u32 flags; + __u64 expansion[10];/* For future use */ }; int main(int argc, char **argv) -- 2.14.1
Re: [PATCH] nvme-rdma: complete requests from ->timeout
On Fri, Dec 07, 2018 at 12:05:37PM -0800, Sagi Grimberg wrote: > > > Could you please take a look at this bug and code review? > > > > We are seeing more instances of this bug and found that reconnect_work > > could hang as well, as can be seen from below stacktrace. > > > > Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma] > > Call Trace: > > __schedule+0x2ab/0x880 > > schedule+0x36/0x80 > > schedule_timeout+0x161/0x300 > > ? __next_timer_interrupt+0xe0/0xe0 > > io_schedule_timeout+0x1e/0x50 > > wait_for_completion_io_timeout+0x130/0x1a0 > > ? wake_up_q+0x80/0x80 > > blk_execute_rq+0x6e/0xa0 > > __nvme_submit_sync_cmd+0x6e/0xe0 > > nvmf_connect_admin_queue+0x128/0x190 [nvme_fabrics] > > ? wait_for_completion_interruptible_timeout+0x157/0x1b0 > > nvme_rdma_start_queue+0x5e/0x90 [nvme_rdma] > > nvme_rdma_setup_ctrl+0x1b4/0x730 [nvme_rdma] > > nvme_rdma_reconnect_ctrl_work+0x27/0x70 [nvme_rdma] > > process_one_work+0x179/0x390 > > worker_thread+0x4f/0x3e0 > > kthread+0x105/0x140 > > ? max_active_store+0x80/0x80 > > ? kthread_bind+0x20/0x20 > > > > This bug is produced by setting MTU of RoCE interface to '568' for > > test while running I/O traffics. > > I think that with the latest changes from Keith we can no longer rely > on blk-mq to barrier racing completions. We will probably need > to barrier ourselves in nvme-rdma... You really need to do that anyway. If you were relying on blk-mq to save you from double completions by ending a request in the nvme driver while the lower half can still complete the same one, the only thing preventing data corruption is the probability the request wasn't reallocated for a new command.
[PATCH] selftests/vm/gup_benchmark.c: match gup struct to kernel
An expansion field was added to the kernel copy of this structure for future use. See mm/gup_benchmark.c. Add the same expansion field here, so that the IOCTL command decodes correctly. Otherwise, it fails with EINVAL. Signed-off-by: Alison Schofield --- tools/testing/selftests/vm/gup_benchmark.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index 880b96fc80d4..c0534e298b51 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -25,6 +25,7 @@ struct gup_benchmark { __u64 size; __u32 nr_pages_per_call; __u32 flags; + __u64 expansion[10];/* For future use */ }; int main(int argc, char **argv) -- 2.14.1
Re: [PATCH] nvme-rdma: complete requests from ->timeout
On Fri, Dec 07, 2018 at 12:05:37PM -0800, Sagi Grimberg wrote: > > > Could you please take a look at this bug and code review? > > > > We are seeing more instances of this bug and found that reconnect_work > > could hang as well, as can be seen from below stacktrace. > > > > Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma] > > Call Trace: > > __schedule+0x2ab/0x880 > > schedule+0x36/0x80 > > schedule_timeout+0x161/0x300 > > ? __next_timer_interrupt+0xe0/0xe0 > > io_schedule_timeout+0x1e/0x50 > > wait_for_completion_io_timeout+0x130/0x1a0 > > ? wake_up_q+0x80/0x80 > > blk_execute_rq+0x6e/0xa0 > > __nvme_submit_sync_cmd+0x6e/0xe0 > > nvmf_connect_admin_queue+0x128/0x190 [nvme_fabrics] > > ? wait_for_completion_interruptible_timeout+0x157/0x1b0 > > nvme_rdma_start_queue+0x5e/0x90 [nvme_rdma] > > nvme_rdma_setup_ctrl+0x1b4/0x730 [nvme_rdma] > > nvme_rdma_reconnect_ctrl_work+0x27/0x70 [nvme_rdma] > > process_one_work+0x179/0x390 > > worker_thread+0x4f/0x3e0 > > kthread+0x105/0x140 > > ? max_active_store+0x80/0x80 > > ? kthread_bind+0x20/0x20 > > > > This bug is produced by setting MTU of RoCE interface to '568' for > > test while running I/O traffics. > > I think that with the latest changes from Keith we can no longer rely > on blk-mq to barrier racing completions. We will probably need > to barrier ourselves in nvme-rdma... You really need to do that anyway. If you were relying on blk-mq to save you from double completions by ending a request in the nvme driver while the lower half can still complete the same one, the only thing preventing data corruption is the probability the request wasn't reallocated for a new command.
Re: [PATCH v2 18/34] dt-bindings: arm: Convert FSL board/soc bindings to json-schema
On Thu, Dec 06, 2018 at 05:33:13PM -0600, Rob Herring wrote: > On Wed, Dec 5, 2018 at 8:32 PM Shawn Guo wrote: > > > > On Mon, Dec 03, 2018 at 03:32:07PM -0600, Rob Herring wrote: > > > Convert Freescale SoC bindings to DT schema format using json-schema. > > > > > > Cc: Shawn Guo > > > Cc: Mark Rutland > > > Cc: devicet...@vger.kernel.org > > > Signed-off-by: Rob Herring > > > --- > > > .../devicetree/bindings/arm/armadeus.txt | 6 - > > > Documentation/devicetree/bindings/arm/bhf.txt | 6 - > > > .../bindings/arm/compulab-boards.txt | 25 -- > > > Documentation/devicetree/bindings/arm/fsl.txt | 229 -- > > > .../devicetree/bindings/arm/fsl.yaml | 214 > > > > Rob, > > > > I do have any changes on bindings/arm/fsl.txt queued for 4.21 on my > > tree, so please send it via your tree. > > What about: > > c386f362957b dt-bindings: Add compatible string for LS1028A-QDS > 3671cd57de06 dt-bindings: ls1012a: Add FRWY-LS1012A device tree binding Ah, sorry, I only checked on imx/dt branch and forgot imx/dt64. I will drop the changes on fsl.txt and update fsl.yaml after it hits mainline. Shawn
Query about platform device registration
Hi I am registering a device using platform_device_register_full(). In the above-mentioned api argument - 'struct platform_device_info', I set the parent device pointer, set_dma_mask from parent device etc. The parent device is created by declaration in device tree and the relevant node has property 'dma-coherent'. So I get dma_coherent attribute as 1 for the parent device. In other words, is_dev_dma_coherent() returns 1 for parent device. But for the device registered using platform_device_register_full(), the dma_coherent is not set and is_dev_dma_coherent() returns 0. Can someone please guide me, how to declare the newly registered device as 'dma_coherent'. Regards Vakul
Query about platform device registration
Hi I am registering a device using platform_device_register_full(). In the above-mentioned api argument - 'struct platform_device_info', I set the parent device pointer, set_dma_mask from parent device etc. The parent device is created by declaration in device tree and the relevant node has property 'dma-coherent'. So I get dma_coherent attribute as 1 for the parent device. In other words, is_dev_dma_coherent() returns 1 for parent device. But for the device registered using platform_device_register_full(), the dma_coherent is not set and is_dev_dma_coherent() returns 0. Can someone please guide me, how to declare the newly registered device as 'dma_coherent'. Regards Vakul
[PATCH v3 2/2] USB: quirks: Disable LPM for Logitech UVC devices
From: Kyle Williams Description: Some USB device / host controller combinations seem to have problems with Link Power management. In particular it is described that the combination of certain Logitech uvc devices and other powered media devices such causes 'not enough bandwidth for new device state' error. This patch enables the USB_QUIRK_NO_LPM quirk entries for all connected Logitech UVC devices indicating LPM should remain disabled for the device. Signed-off-by: Kyle Williams --- drivers/usb/core/quirks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 0690fcff0ea2..7e6df958 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -440,7 +440,7 @@ static const struct usb_device_id usb_quirk_list[] = { static const struct usb_device_id usb_interface_quirk_list[] = { /* Logitech UVC Cameras */ { USB_VENDOR_AND_INTERFACE_INFO(0x046d, USB_CLASS_VIDEO, 1, 0), - .driver_info = USB_QUIRK_RESET_RESUME }, + .driver_info = USB_QUIRK_RESET_RESUME | USB_QUIRK_NO_LPM }, { } /* terminating entry must be last */ }; -- 2.20.0.rc2.403.gdbc3b29805-goog
[PATCH v3 0/2] Disable LPM by matching interface
Changes in v3: - seperated work to allow matched interfaces to use the USB_QUIRK_NO_LPM quirk to affect a broader range of devices instead of setting each device individually Kyle Williams (2): USB: quirks: Check device interface LPM capability USB: quirks: Disable LPM for Logitech UVC devices drivers/usb/core/hub.c| 87 +-- drivers/usb/core/quirks.c | 2 +- 2 files changed, 49 insertions(+), 40 deletions(-) -- 2.20.0.rc2.403.gdbc3b29805-goog
[PATCH v3 1/2] USB: quirks: Check device interface LPM capability
From: Kyle Williams Description: enable the ability to disable LPM for all devices matched by interface information Signed-off-by: Kyle Williams --- drivers/usb/core/hub.c | 87 +++--- 1 file changed, 48 insertions(+), 39 deletions(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 0f9381b69a3b..8f366ec4d21b 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -365,6 +365,51 @@ static void usb_set_lpm_parameters(struct usb_device *udev) usb_set_lpm_sel(udev, >u2_params); } +/* + * There are reports of USB 3.0 devices that say they support USB 2.0 Link PM + * when they're plugged into a USB 2.0 port, but they don't work when LPM is + * enabled. + * + * Only enable USB 2.0 Link PM if the port is internal (hardwired), or the + * device says it supports the new USB 2.0 Link PM errata by setting the BESL + * support bit in the BOS descriptor. + */ +static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev) +{ + struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); + int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN; + + if (!udev->usb2_hw_lpm_capable || !udev->bos) + return; + + if (hub) + connect_type = hub->ports[udev->portnum - 1]->connect_type; + + if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) + || connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { + udev->usb2_hw_lpm_allowed = 1; + usb_set_usb2_hardware_lpm(udev, 1); + } +} + +void usb_update_device_lpm(struct usb_hcd *hcd, struct usb_device *udev) +{ + int retval; + + if (udev->wusb == 0 && le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0201) { + retval = usb_get_bos_descriptor(udev); + if (!retval) { + udev->lpm_capable = usb_device_supports_lpm(udev); + usb_set_lpm_parameters(udev); + } + } + + /* notify HCD that we have a device connected and addressed */ + if (hcd->driver->update_device) + hcd->driver->update_device(hcd, udev); + hub_set_initial_usb2_lpm_policy(udev); +} + /* USB 2.0 spec Section 11.24.4.5 */ static int get_hub_descriptor(struct usb_device *hdev, struct usb_hub_descriptor *desc) @@ -2295,7 +2340,6 @@ static int usb_enumerate_device_otg(struct usb_device *udev) return err; } - /** * usb_enumerate_device - Read device configs/intfs/otg (usbcore-internal) * @udev: newly addressed device (in ADDRESS state) @@ -2351,6 +2395,8 @@ static int usb_enumerate_device(struct usb_device *udev) usb_detect_interface_quirks(udev); + usb_update_device_lpm(hcd, udev); + return 0; } @@ -4402,33 +4448,6 @@ static int hub_set_address(struct usb_device *udev, int devnum) return retval; } -/* - * There are reports of USB 3.0 devices that say they support USB 2.0 Link PM - * when they're plugged into a USB 2.0 port, but they don't work when LPM is - * enabled. - * - * Only enable USB 2.0 Link PM if the port is internal (hardwired), or the - * device says it supports the new USB 2.0 Link PM errata by setting the BESL - * support bit in the BOS descriptor. - */ -static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev) -{ - struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); - int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN; - - if (!udev->usb2_hw_lpm_capable || !udev->bos) - return; - - if (hub) - connect_type = hub->ports[udev->portnum - 1]->connect_type; - - if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) || - connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { - udev->usb2_hw_lpm_allowed = 1; - usb_set_usb2_hardware_lpm(udev, 1); - } -} - static int hub_enable_device(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); @@ -4779,19 +4798,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, usb_detect_quirks(udev); - if (udev->wusb == 0 && le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0201) { - retval = usb_get_bos_descriptor(udev); - if (!retval) { - udev->lpm_capable = usb_device_supports_lpm(udev); - usb_set_lpm_parameters(udev); - } - } + usb_update_device_lpm(hcd, udev); retval = 0; - /* notify HCD that we have a device connected and addressed */ - if (hcd->driver->update_device) - hcd->driver->update_device(hcd, udev); - hub_set_initial_usb2_lpm_policy(udev); fail: if (retval) { hub_port_disable(hub, port1, 0); -- 2.20.0.rc2.403.gdbc3b29805-goog
Re: [PATCH 1/1] virtio: remove deprecated VIRTIO_PCI_CONFIG()
On 12/08/2018 02:01 AM, Michael S. Tsirkin wrote: > On Fri, Dec 07, 2018 at 03:34:41PM +0800, Dongli Zhang wrote: >> VIRTIO_PCI_CONFIG() is deprecated. Use VIRTIO_PCI_CONFIG_OFF() instead. >> >> Signed-off-by: Dongli Zhang >> --- >> drivers/virtio/virtio_pci_legacy.c | 6 -- >> include/uapi/linux/virtio_pci.h| 2 -- >> 2 files changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/virtio/virtio_pci_legacy.c >> b/drivers/virtio/virtio_pci_legacy.c >> index de062fb..eff9ddc 100644 >> --- a/drivers/virtio/virtio_pci_legacy.c >> +++ b/drivers/virtio/virtio_pci_legacy.c >> @@ -52,7 +52,8 @@ static void vp_get(struct virtio_device *vdev, unsigned >> offset, >> { >> struct virtio_pci_device *vp_dev = to_vp_device(vdev); >> void __iomem *ioaddr = vp_dev->ioaddr + >> -VIRTIO_PCI_CONFIG(vp_dev) + offset; >> +VIRTIO_PCI_CONFIG_OFF(vp_dev->msix_enabled) + >> +offset; >> u8 *ptr = buf; >> int i; >> >> @@ -67,7 +68,8 @@ static void vp_set(struct virtio_device *vdev, unsigned >> offset, >> { >> struct virtio_pci_device *vp_dev = to_vp_device(vdev); >> void __iomem *ioaddr = vp_dev->ioaddr + >> -VIRTIO_PCI_CONFIG(vp_dev) + offset; >> +VIRTIO_PCI_CONFIG_OFF(vp_dev->msix_enabled) + >> +offset; >> const u8 *ptr = buf; >> int i; >> > > I agree that VIRTIO_PCI_CONFIG_OFF is a better interface. So above looks > fine. > >> diff --git a/include/uapi/linux/virtio_pci.h >> b/include/uapi/linux/virtio_pci.h >> index 90007a1..2070232 100644 >> --- a/include/uapi/linux/virtio_pci.h >> +++ b/include/uapi/linux/virtio_pci.h >> @@ -78,8 +78,6 @@ >> /* The remaining space is defined by each driver as the per-driver >> * configuration space */ >> #define VIRTIO_PCI_CONFIG_OFF(msix_enabled) ((msix_enabled) ? 24 : 20) >> -/* Deprecated: please use VIRTIO_PCI_CONFIG_OFF instead */ >> -#define VIRTIO_PCI_CONFIG(dev) >> VIRTIO_PCI_CONFIG_OFF((dev)->msix_enabled) >> >> /* Virtio ABI version, this must match exactly */ >> #define VIRTIO_PCI_ABI_VERSION 0 > > This might break some userspace builds, I don't see why we should bother > removing it. Any reason? Apologies. I thought about some compatibility issue for building third-party kernel module at userspace, but did not realize it will break other userspace software builds. I will keep the definition of VIRTIO_PCI_CONFIG() and resend again. Thank you very much! Dongli Zhang > > >> -- >> 2.7.4
[PATCH bpf-next 2/3] bpf: add bpffs pretty print for cgroup local storage maps
Implement bpffs pretty printing for cgroup local storage maps (both shared and per-cpu). Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c): Shared: $ cat /sys/fs/bpf/map_2 # WARNING!! The output is for debug purpose only # WARNING!! The output format will change {4294968594,1}: {,1039896} Per-cpu: $ cat /sys/fs/bpf/map_1 # WARNING!! The output is for debug purpose only # WARNING!! The output format will change {4294968594,1}: { cpu0: {0,0,0,0,0} cpu1: {0,0,0,0,0} cpu2: {1,104,0,0,0} cpu3: {0,0,0,0,0} } Signed-off-by: Roman Gushchin Cc: Alexei Starovoitov Cc: Daniel Borkmann --- include/linux/btf.h| 10 + kernel/bpf/local_storage.c | 90 +- 2 files changed, 99 insertions(+), 1 deletion(-) diff --git a/include/linux/btf.h b/include/linux/btf.h index 8c2199b5d250..ac67bc4cbfd9 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -5,6 +5,7 @@ #define _LINUX_BTF_H 1 #include +#include struct btf; struct btf_type; @@ -63,4 +64,13 @@ static inline const char *btf_name_by_offset(const struct btf *btf, } #endif +static inline const struct btf_type *btf_orig_type(const struct btf *btf, + const struct btf_type *t) +{ + while (t && BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF) + t = btf_type_by_id(btf, t->type); + + return t; +} + #endif diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c index b65017dead44..7b51fe1aba3c 100644 --- a/kernel/bpf/local_storage.c +++ b/kernel/bpf/local_storage.c @@ -1,11 +1,13 @@ //SPDX-License-Identifier: GPL-2.0 #include #include +#include #include #include #include #include #include +#include DEFINE_PER_CPU(struct bpf_cgroup_storage*, bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]); @@ -308,6 +310,91 @@ static int cgroup_storage_delete_elem(struct bpf_map *map, void *key) return -EINVAL; } +static int cgroup_storage_check_btf(const struct bpf_map *map, + const struct btf *btf, + const struct btf_type *key_type, + const struct btf_type *value_type) +{ + const struct btf_type *st, *t; + struct btf_member *m; + + /* Key is expected to be of struct bpf_cgroup_storage_key type, +* which is: +* struct bpf_cgroup_storage_key { +* __u64 cgroup_inode_id; +* __u32 attach_type; +* }; +*/ + + /* +* Key_type must be a structure (or a typedef of a structure) with +* two members. +*/ + st = btf_orig_type(btf, key_type); + if (BTF_INFO_KIND(st->info) != BTF_KIND_STRUCT || + BTF_INFO_VLEN(st->info) != 2) + return -EINVAL; + + /* +* The first field must be a 64 bit integer at 0 offset. +*/ + m = (struct btf_member *)(st + 1); + t = btf_orig_type(btf, btf_type_by_id(btf, m->type)); + if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT || m->offset || + t->size != + FIELD_SIZEOF(struct bpf_cgroup_storage_key, cgroup_inode_id)) + return -EINVAL; + + /* +* The second field must be a 32 bit integer at 0 offset. +*/ + m = m + 1; + t = btf_orig_type(btf, btf_type_by_id(btf, m->type)); + if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT || + m->offset != offsetof(struct bpf_cgroup_storage_key, attach_type) * + BITS_PER_BYTE || t->size != + FIELD_SIZEOF(struct bpf_cgroup_storage_key, attach_type)) + return -EINVAL; + + return 0; +} + +static void cgroup_storage_seq_show_elem(struct bpf_map *map, void *_key, +struct seq_file *m) +{ + enum bpf_cgroup_storage_type stype = cgroup_storage_type(map); + struct bpf_cgroup_storage_key *key = _key; + struct bpf_cgroup_storage *storage; + int cpu; + + rcu_read_lock(); + storage = cgroup_storage_lookup(map_to_storage(map), key, false); + if (!storage) { + rcu_read_unlock(); + return; + } + + btf_type_seq_show(map->btf, map->btf_key_type_id, key, m); + stype = cgroup_storage_type(map); + if (stype == BPF_CGROUP_STORAGE_SHARED) { + seq_puts(m, ": "); + btf_type_seq_show(map->btf, map->btf_value_type_id, + _ONCE(storage->buf)->data[0], m); + seq_puts(m, "\n"); + } else { + seq_puts(m, ": {\n"); + for_each_possible_cpu(cpu) { + seq_printf(m, "\tcpu%d: ", cpu); + btf_type_seq_show(map->btf, map->btf_value_type_id, + per_cpu_ptr(storage->percpu_buf, cpu), +
[PATCH bpf-next 1/3] bpf: pass struct btf pointer to the map_check_btf() callback
If key_type or value_type are of non-trivial data types (e.g. structure or typedef), it's not possible to check them without the additional information, which can't be obtained without a pointer to the btf structure. So, let's pass btf pointer to the map_check_btf() callbacks. Signed-off-by: Roman Gushchin Cc: Alexei Starovoitov Cc: Daniel Borkmann --- include/linux/bpf.h | 3 +++ kernel/bpf/arraymap.c | 1 + kernel/bpf/lpm_trie.c | 1 + kernel/bpf/syscall.c | 3 ++- 4 files changed, 7 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e82b7039fc66..128d93540b23 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -23,6 +23,7 @@ struct bpf_prog; struct bpf_map; struct sock; struct seq_file; +struct btf; struct btf_type; /* map is generic key/value storage optionally accesible by eBPF programs */ @@ -52,6 +53,7 @@ struct bpf_map_ops { void (*map_seq_show_elem)(struct bpf_map *map, void *key, struct seq_file *m); int (*map_check_btf)(const struct bpf_map *map, +const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type); }; @@ -126,6 +128,7 @@ static inline bool bpf_map_support_seq_show(const struct bpf_map *map) } int map_check_no_btf(const struct bpf_map *map, +const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type); diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 24583da9ffd1..25632a75d630 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -382,6 +382,7 @@ static void percpu_array_map_seq_show_elem(struct bpf_map *map, void *key, } static int array_map_check_btf(const struct bpf_map *map, + const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) { diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index bfd4882e1106..abf1002080df 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -728,6 +728,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key) } static int trie_check_btf(const struct bpf_map *map, + const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index aa05aa38f4a8..7c2e8ab03a34 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -456,6 +456,7 @@ static int bpf_obj_name_cpy(char *dst, const char *src) } int map_check_no_btf(const struct bpf_map *map, +const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) { @@ -478,7 +479,7 @@ static int map_check_btf(const struct bpf_map *map, const struct btf *btf, return -EINVAL; if (map->ops->map_check_btf) - ret = map->ops->map_check_btf(map, key_type, value_type); + ret = map->ops->map_check_btf(map, btf, key_type, value_type); return ret; } -- 2.19.2
[PATCH bpf-next 3/3] selftests/bpf: add btf annotations for cgroup_local_storage maps
Add btf annotations to cgroup local storage maps (per-cpu and shared) in the network packet counting example. Signed-off-by: Roman Gushchin Cc: Alexei Starovoitov Cc: Daniel Borkmann --- tools/testing/selftests/bpf/netcnt_prog.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/tools/testing/selftests/bpf/netcnt_prog.c b/tools/testing/selftests/bpf/netcnt_prog.c index 1198abca1360..9f741e69cebe 100644 --- a/tools/testing/selftests/bpf/netcnt_prog.c +++ b/tools/testing/selftests/bpf/netcnt_prog.c @@ -16,12 +16,18 @@ struct bpf_map_def SEC("maps") percpu_netcnt = { .value_size = sizeof(struct percpu_net_cnt), }; +BPF_ANNOTATE_KV_PAIR(percpu_netcnt, struct bpf_cgroup_storage_key, +struct percpu_net_cnt); + struct bpf_map_def SEC("maps") netcnt = { .type = BPF_MAP_TYPE_CGROUP_STORAGE, .key_size = sizeof(struct bpf_cgroup_storage_key), .value_size = sizeof(struct net_cnt), }; +BPF_ANNOTATE_KV_PAIR(netcnt, struct bpf_cgroup_storage_key, +struct net_cnt); + SEC("cgroup/skb") int bpf_nextcnt(struct __sk_buff *skb) { -- 2.19.2
Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
On 12/7/18 11:16 AM, Jerome Glisse wrote: > On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote: >> On 12/4/18 5:57 PM, John Hubbard wrote: >>> On 12/4/18 5:44 PM, Jerome Glisse wrote: On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote: > On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote: >> On 12/4/18 3:03 PM, Dan Williams wrote: >>> Except the LRU fields are already in use for ZONE_DEVICE pages... how >>> does this proposal interact with those? >> >> Very badly: page->pgmap and page->hmm_data both get corrupted. Is there >> an entire >> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? >> Said another >> way: is it reasonable to disallow calling get_user_pages() on >> ZONE_DEVICE pages? >> >> If we have to support get_user_pages() on ZONE_DEVICE pages, then the >> whole >> LRU field approach is unusable. > > We just need to rearrange ZONE_DEVICE pages. Please excuse the whitespace > damage: > > +++ b/include/linux/mm_types.h > @@ -151,10 +151,12 @@ struct page { > #endif > }; > struct {/* ZONE_DEVICE pages */ > + unsigned long _zd_pad_2;/* LRU */ > + unsigned long _zd_pad_3;/* LRU */ > + unsigned long _zd_pad_1;/* uses mapping */ > /** @pgmap: Points to the hosting device page > map. */ > struct dev_pagemap *pgmap; > unsigned long hmm_data; > - unsigned long _zd_pad_1;/* uses mapping */ > }; > > /** @rcu_head: You can use this to free a page by RCU. */ > > You don't use page->private or page->index, do you Dan? page->private and page->index are use by HMM DEVICE page. >>> >>> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field remaining >>> for >>> dma-pinned information. Which might work. To recap, we need: >>> >>> -- 1 bit for PageDmaPinned >>> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru. >>> -- N bits for a reference count >>> >>> Those *could* be packed into a single 64-bit field, if really necessary. >>> >> >> ...actually, this needs to work on 32-bit systems, as well. And HMM is using >> a lot. >> However, it is still possible for this to work. >> >> Matthew, can I have that bit now please? I'm about out of options, and now >> it will actually >> solve the problem here. >> >> Given: >> >> 1) It's cheap to know if a page is ZONE_DEVICE, and ZONE_DEVICE means not on >> the LRU. >> That, in turn, means only 1 bit instead of 2 bits (in addition to a counter) >> is required, >> for that case. >> >> 2) There is an independent bit available (according to Matthew). >> >> 3) HMM uses 4 of the 5 struct page fields, so only one field is available >> for a counter >>in that case. > > To expend on this, HMM private page are use for anonymous page > so the index and mapping fields have the value you expect for > such pages. Down the road i want also to support file backed > page with HMM private (mapping, private, index). > > For HMM public both anonymous and file back page are supported > today (HMM public is only useful on platform with something like > OpenCAPI, CCIX or NVlink ... so PowerPC for now). > >> 4) get_user_pages() must work on ZONE_DEVICE and HMM pages. > > get_user_pages() only need to work with HMM public page not the > private one as we can not allow _anyone_ to pin HMM private page. > So on get_user_pages() on HMM private we get a page fault and > it is migrated back to regular memory. > > >> 5) For a proper atomic counter for both 32- and 64-bit, we really do need a >> complete >> unsigned long field. >> >> So that leads to the following approach: >> >> -- Use a single unsigned long field for an atomic reference count for the >> DMA pinned count. >> For normal pages, this will be the *second* field of the LRU (in order to >> avoid PageTail bit). >> >> For ZONE_DEVICE pages, we can also line up the fields so that the second LRU >> field is >> available and reserved for this DMA pinned count. Basically _zd_pad_1 gets >> move up and >> optionally renamed: >> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h >> index 017ab82e36ca..b5dcd9398cae 100644 >> --- a/include/linux/mm_types.h >> +++ b/include/linux/mm_types.h >> @@ -90,8 +90,8 @@ struct page { >> * are in use. >> */ >> struct { >> - unsigned long dma_pinned_flags; >> - atomic_t dma_pinned_count; >> + unsigned long dma_pinned_flags; /* >> LRU.next */ >> +
Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
On 12/7/18 11:16 AM, Jerome Glisse wrote: > On Thu, Dec 06, 2018 at 06:45:49PM -0800, John Hubbard wrote: >> On 12/4/18 5:57 PM, John Hubbard wrote: >>> On 12/4/18 5:44 PM, Jerome Glisse wrote: On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote: > On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote: >> On 12/4/18 3:03 PM, Dan Williams wrote: >>> Except the LRU fields are already in use for ZONE_DEVICE pages... how >>> does this proposal interact with those? >> >> Very badly: page->pgmap and page->hmm_data both get corrupted. Is there >> an entire >> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? >> Said another >> way: is it reasonable to disallow calling get_user_pages() on >> ZONE_DEVICE pages? >> >> If we have to support get_user_pages() on ZONE_DEVICE pages, then the >> whole >> LRU field approach is unusable. > > We just need to rearrange ZONE_DEVICE pages. Please excuse the whitespace > damage: > > +++ b/include/linux/mm_types.h > @@ -151,10 +151,12 @@ struct page { > #endif > }; > struct {/* ZONE_DEVICE pages */ > + unsigned long _zd_pad_2;/* LRU */ > + unsigned long _zd_pad_3;/* LRU */ > + unsigned long _zd_pad_1;/* uses mapping */ > /** @pgmap: Points to the hosting device page > map. */ > struct dev_pagemap *pgmap; > unsigned long hmm_data; > - unsigned long _zd_pad_1;/* uses mapping */ > }; > > /** @rcu_head: You can use this to free a page by RCU. */ > > You don't use page->private or page->index, do you Dan? page->private and page->index are use by HMM DEVICE page. >>> >>> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field remaining >>> for >>> dma-pinned information. Which might work. To recap, we need: >>> >>> -- 1 bit for PageDmaPinned >>> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru. >>> -- N bits for a reference count >>> >>> Those *could* be packed into a single 64-bit field, if really necessary. >>> >> >> ...actually, this needs to work on 32-bit systems, as well. And HMM is using >> a lot. >> However, it is still possible for this to work. >> >> Matthew, can I have that bit now please? I'm about out of options, and now >> it will actually >> solve the problem here. >> >> Given: >> >> 1) It's cheap to know if a page is ZONE_DEVICE, and ZONE_DEVICE means not on >> the LRU. >> That, in turn, means only 1 bit instead of 2 bits (in addition to a counter) >> is required, >> for that case. >> >> 2) There is an independent bit available (according to Matthew). >> >> 3) HMM uses 4 of the 5 struct page fields, so only one field is available >> for a counter >>in that case. > > To expend on this, HMM private page are use for anonymous page > so the index and mapping fields have the value you expect for > such pages. Down the road i want also to support file backed > page with HMM private (mapping, private, index). > > For HMM public both anonymous and file back page are supported > today (HMM public is only useful on platform with something like > OpenCAPI, CCIX or NVlink ... so PowerPC for now). > >> 4) get_user_pages() must work on ZONE_DEVICE and HMM pages. > > get_user_pages() only need to work with HMM public page not the > private one as we can not allow _anyone_ to pin HMM private page. > So on get_user_pages() on HMM private we get a page fault and > it is migrated back to regular memory. > > >> 5) For a proper atomic counter for both 32- and 64-bit, we really do need a >> complete >> unsigned long field. >> >> So that leads to the following approach: >> >> -- Use a single unsigned long field for an atomic reference count for the >> DMA pinned count. >> For normal pages, this will be the *second* field of the LRU (in order to >> avoid PageTail bit). >> >> For ZONE_DEVICE pages, we can also line up the fields so that the second LRU >> field is >> available and reserved for this DMA pinned count. Basically _zd_pad_1 gets >> move up and >> optionally renamed: >> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h >> index 017ab82e36ca..b5dcd9398cae 100644 >> --- a/include/linux/mm_types.h >> +++ b/include/linux/mm_types.h >> @@ -90,8 +90,8 @@ struct page { >> * are in use. >> */ >> struct { >> - unsigned long dma_pinned_flags; >> - atomic_t dma_pinned_count; >> + unsigned long dma_pinned_flags; /* >> LRU.next */ >> +
Re: [PATCH] Fix sync. in blkdev_write_iter() acessing i_flags
On Fri, Dec 07, 2018 at 08:49:16PM +0100, Alexander Lochmann wrote: > > _What_ SUID bit? We are talking about a write to block device, for fsck > > sake... > > > That's the way I understood Jan's explanation: > " > Thinking more about this I'm not sure if this is actually the right > solution. Because for example the write(2) can set S_NOSEC flag wrongly > when it would race with chmod adding SUID bit. So probably we rather need > to acquire i_rwsem in blkdev_write_iter() if file does not have S_NOSEC set > (we don't want to acquire it unconditionally as that would heavily impact > scalability of block device writes). IDGI. We are talking about a block device here. What business could file_remove_privs() have doing _anything_ to it? should_remove_suid() returns to return 0 for those; what case do you have in mind? Somebody setting security.capabilities on a block device inode? IMO the bug here is file_remove_privs() not buggering off immediately after having observed that we are dealing with a block device. It really has nothing useful to do.
Re: [PATCH] Fix sync. in blkdev_write_iter() acessing i_flags
On Fri, Dec 07, 2018 at 08:49:16PM +0100, Alexander Lochmann wrote: > > _What_ SUID bit? We are talking about a write to block device, for fsck > > sake... > > > That's the way I understood Jan's explanation: > " > Thinking more about this I'm not sure if this is actually the right > solution. Because for example the write(2) can set S_NOSEC flag wrongly > when it would race with chmod adding SUID bit. So probably we rather need > to acquire i_rwsem in blkdev_write_iter() if file does not have S_NOSEC set > (we don't want to acquire it unconditionally as that would heavily impact > scalability of block device writes). IDGI. We are talking about a block device here. What business could file_remove_privs() have doing _anything_ to it? should_remove_suid() returns to return 0 for those; what case do you have in mind? Somebody setting security.capabilities on a block device inode? IMO the bug here is file_remove_privs() not buggering off immediately after having observed that we are dealing with a block device. It really has nothing useful to do.
[PATCH v2] USB: quirks: disable LPM for Logitech UVC devices
From: Kyle Williams Description: Some USB device / host controller combinations seem to have problems with Link Power management. In particular it is described that the combination of certain Logitech uvc devices and other powered media devices such causes 'not enough bandwidth for new device state' error. This patch enables the USB_QUIRK_NO_LPM quirk entries for all connected Logitech UVC devices indicating LPM should remain disabled for the device. Signed-off-by: Kyle Williams --- Changes in v2: - changed commit message - disble lpm for all logitech uvc devices instead of listing manually - changes to allow the lpm to correctly be disabled after enumeration drivers/usb/core/hub.c| 87 +-- drivers/usb/core/quirks.c | 2 +- 2 files changed, 49 insertions(+), 40 deletions(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 0f9381b69a3b..8f366ec4d21b 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -365,6 +365,51 @@ static void usb_set_lpm_parameters(struct usb_device *udev) usb_set_lpm_sel(udev, >u2_params); } +/* + * There are reports of USB 3.0 devices that say they support USB 2.0 Link PM + * when they're plugged into a USB 2.0 port, but they don't work when LPM is + * enabled. + * + * Only enable USB 2.0 Link PM if the port is internal (hardwired), or the + * device says it supports the new USB 2.0 Link PM errata by setting the BESL + * support bit in the BOS descriptor. + */ +static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev) +{ + struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); + int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN; + + if (!udev->usb2_hw_lpm_capable || !udev->bos) + return; + + if (hub) + connect_type = hub->ports[udev->portnum - 1]->connect_type; + + if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) + || connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { + udev->usb2_hw_lpm_allowed = 1; + usb_set_usb2_hardware_lpm(udev, 1); + } +} + +void usb_update_device_lpm(struct usb_hcd *hcd, struct usb_device *udev) +{ + int retval; + + if (udev->wusb == 0 && le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0201) { + retval = usb_get_bos_descriptor(udev); + if (!retval) { + udev->lpm_capable = usb_device_supports_lpm(udev); + usb_set_lpm_parameters(udev); + } + } + + /* notify HCD that we have a device connected and addressed */ + if (hcd->driver->update_device) + hcd->driver->update_device(hcd, udev); + hub_set_initial_usb2_lpm_policy(udev); +} + /* USB 2.0 spec Section 11.24.4.5 */ static int get_hub_descriptor(struct usb_device *hdev, struct usb_hub_descriptor *desc) @@ -2295,7 +2340,6 @@ static int usb_enumerate_device_otg(struct usb_device *udev) return err; } - /** * usb_enumerate_device - Read device configs/intfs/otg (usbcore-internal) * @udev: newly addressed device (in ADDRESS state) @@ -2351,6 +2395,8 @@ static int usb_enumerate_device(struct usb_device *udev) usb_detect_interface_quirks(udev); + usb_update_device_lpm(hcd, udev); + return 0; } @@ -4402,33 +4448,6 @@ static int hub_set_address(struct usb_device *udev, int devnum) return retval; } -/* - * There are reports of USB 3.0 devices that say they support USB 2.0 Link PM - * when they're plugged into a USB 2.0 port, but they don't work when LPM is - * enabled. - * - * Only enable USB 2.0 Link PM if the port is internal (hardwired), or the - * device says it supports the new USB 2.0 Link PM errata by setting the BESL - * support bit in the BOS descriptor. - */ -static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev) -{ - struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); - int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN; - - if (!udev->usb2_hw_lpm_capable || !udev->bos) - return; - - if (hub) - connect_type = hub->ports[udev->portnum - 1]->connect_type; - - if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) || - connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { - udev->usb2_hw_lpm_allowed = 1; - usb_set_usb2_hardware_lpm(udev, 1); - } -} - static int hub_enable_device(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); @@ -4779,19 +4798,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, usb_detect_quirks(udev); - if (udev->wusb == 0 && le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0201) { - retval = usb_get_bos_descriptor(udev); - if (!retval) { - udev->lpm_capable = usb_device_supports_lpm(udev); -
Re: Recommended driver for current AMD processors
Hi Paul, On Fri, 7 Dec 2018 at 15:32, Paul Menzel wrote: > > Dear Linux folks, > > > What driver is recommended for current AMD Ryzen based processors > like *AMD Ryzen 5 PRO 1500 Quad-Core Processor* or *AMD EPYC 7601 > 32-Core Processor*? > > Only from the acpi-cpufreq Kconfig description, I assume, that that > driver should be used. > > > config X86_ACPI_CPUFREQ > > tristate "ACPI Processor P-States driver" > > depends on ACPI_PROCESSOR > > help > > This driver adds a CPUFreq driver which utilizes the ACPI > > Processor Performance States. > > This driver also supports Intel Enhanced Speedstep and newer > > AMD CPUs. > > > > To compile this driver as a module, choose M here: the > > module will be called acpi-cpufreq. > > > > For details, take a look at . > > > > If in doubt, say N. > > Would a “native” driver like Intel’s P state driver also give better > results? Do you know if AMD is working on something like that? > > > > Kind regards, > > Paul > As a mere user, I bought a ryzen 3 1300X earlier this year, and was annoyed about how the frequencies changed when a single-threaded compile moved to a different core (unlike e.g. a haswell where frequencies barely vary). I have a meter which can report the current power consumption for the system (computer, monitor, network switch, kvm switch), and using that to keep an eye on the range of reported wattages when idle, compiling with -j1 and compiling with -j4, I eventually formed the impression that the ondemand governor was marginally better than the performance governor, and that omitting cpufreq did not appear to increase the poer consumption. But that is just one set of observations. I agree that using less power and getting faster compiles would be nice, so if something is available I'll be keen to try it. ĸen
Re: Recommended driver for current AMD processors
Hi Paul, On Fri, 7 Dec 2018 at 15:32, Paul Menzel wrote: > > Dear Linux folks, > > > What driver is recommended for current AMD Ryzen based processors > like *AMD Ryzen 5 PRO 1500 Quad-Core Processor* or *AMD EPYC 7601 > 32-Core Processor*? > > Only from the acpi-cpufreq Kconfig description, I assume, that that > driver should be used. > > > config X86_ACPI_CPUFREQ > > tristate "ACPI Processor P-States driver" > > depends on ACPI_PROCESSOR > > help > > This driver adds a CPUFreq driver which utilizes the ACPI > > Processor Performance States. > > This driver also supports Intel Enhanced Speedstep and newer > > AMD CPUs. > > > > To compile this driver as a module, choose M here: the > > module will be called acpi-cpufreq. > > > > For details, take a look at . > > > > If in doubt, say N. > > Would a “native” driver like Intel’s P state driver also give better > results? Do you know if AMD is working on something like that? > > > > Kind regards, > > Paul > As a mere user, I bought a ryzen 3 1300X earlier this year, and was annoyed about how the frequencies changed when a single-threaded compile moved to a different core (unlike e.g. a haswell where frequencies barely vary). I have a meter which can report the current power consumption for the system (computer, monitor, network switch, kvm switch), and using that to keep an eye on the range of reported wattages when idle, compiling with -j1 and compiling with -j4, I eventually formed the impression that the ondemand governor was marginally better than the performance governor, and that omitting cpufreq did not appear to increase the poer consumption. But that is just one set of observations. I agree that using less power and getting faster compiles would be nice, so if something is available I'll be keen to try it. ĸen
Should this_cpu_read() be volatile?
[Resend, changing title & adding lkml and some others ] On Dec 7, 2018, at 3:12 PM, Nadav Amit wrote: [ We can start a new thread, since I have the tendency to hijack threads. ] > On Dec 7, 2018, at 12:45 AM, Peter Zijlstra wrote: > > On Thu, Dec 06, 2018 at 09:26:24AM -0800, Nadav Amit wrote: >>> On Dec 6, 2018, at 2:25 AM, Peter Zijlstra wrote: >>> >>> On Thu, Dec 06, 2018 at 12:28:26AM -0800, Nadav Amit wrote: [ +Peter ] [snip] *But* there is one thing that may require some attention - patch b59167ac7bafd ("x86/percpu: Fix this_cpu_read()”) set ordering constraints on the VM_ARGS() evaluation. And this patch also imposes, it appears, (unnecessary) constraints on other pieces of code. These constraints are due to the addition of the volatile keyword for this_cpu_read() by the patch. This affects at least 68 functions in my kernel build, some of which are hot (I think), e.g., finish_task_switch(), smp_x86_platform_ipi() and select_idle_sibling(). Peter, perhaps the solution was too big of a hammer? Is it possible instead to create a separate "this_cpu_read_once()” with the volatile keyword? Such a function can be used for native_sched_clock() and other seqlocks, etc. >>> >>> No. like the commit writes this_cpu_read() _must_ imply READ_ONCE(). If >>> you want something else, use something else, there's plenty other >>> options available. >>> >>> There's this_cpu_op_stable(), but also __this_cpu_read() and >>> raw_this_cpu_read() (which currently don't differ from this_cpu_read() >>> but could). >> >> Would setting the inline assembly memory operand both as input and output be >> better than using the “volatile”? > > I don't know.. I'm forever befuddled by the exact semantics of gcc > inline asm. > >> I think that If you do that, the compiler would should the this_cpu_read() >> as something that changes the per-cpu-variable, which would make it invalid >> to re-read the value. At the same time, it would not prevent reordering the >> read with other stuff. > > So the thing is; as I wrote, the generic version of this_cpu_*() is: > > local_irq_save(); > __this_cpu_*(); > local_irq_restore(); > > And per local_irq_{save,restore}() including compiler barriers that > cannot be reordered around either. > > And per the principle of least surprise, I think our primitives should > have similar semantics. I guess so, but as you’ll see below, the end result is ugly. > I'm actually having difficulty finding the this_cpu_read() in any of the > functions you mention, so I cannot make any concrete suggestions other > than pointing at the alternative functions available. So I got deeper into the code to understand a couple of differences. In the case of select_idle_sibling(), the patch (Peter’s) increase the function code size by 123 bytes (over the baseline of 986). The per-cpu variable is called through the following call chain: select_idle_sibling() => select_idle_cpu() => local_clock() => raw_smp_processor_id() And results in 2 more calls to sched_clock_cpu(), as the compiler assumes the processor id changes in between (which obviously wouldn’t happen). There may be more changes around, which I didn’t fully analyze. But the very least reading the processor id should not get “volatile”. As for finish_task_switch(), the impact is only few bytes, but still unnecessary. It appears that with your patch preempt_count() causes multiple reads of __preempt_count in this code: if (WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET, "corrupted preempt_count: %s/%d/0x%x\n", current->comm, current->pid, preempt_count())) preempt_count_set(FORK_PREEMPT_COUNT); Again, this is unwarranted, as the preemption count should not be changed in any interrupt.
Should this_cpu_read() be volatile?
[Resend, changing title & adding lkml and some others ] On Dec 7, 2018, at 3:12 PM, Nadav Amit wrote: [ We can start a new thread, since I have the tendency to hijack threads. ] > On Dec 7, 2018, at 12:45 AM, Peter Zijlstra wrote: > > On Thu, Dec 06, 2018 at 09:26:24AM -0800, Nadav Amit wrote: >>> On Dec 6, 2018, at 2:25 AM, Peter Zijlstra wrote: >>> >>> On Thu, Dec 06, 2018 at 12:28:26AM -0800, Nadav Amit wrote: [ +Peter ] [snip] *But* there is one thing that may require some attention - patch b59167ac7bafd ("x86/percpu: Fix this_cpu_read()”) set ordering constraints on the VM_ARGS() evaluation. And this patch also imposes, it appears, (unnecessary) constraints on other pieces of code. These constraints are due to the addition of the volatile keyword for this_cpu_read() by the patch. This affects at least 68 functions in my kernel build, some of which are hot (I think), e.g., finish_task_switch(), smp_x86_platform_ipi() and select_idle_sibling(). Peter, perhaps the solution was too big of a hammer? Is it possible instead to create a separate "this_cpu_read_once()” with the volatile keyword? Such a function can be used for native_sched_clock() and other seqlocks, etc. >>> >>> No. like the commit writes this_cpu_read() _must_ imply READ_ONCE(). If >>> you want something else, use something else, there's plenty other >>> options available. >>> >>> There's this_cpu_op_stable(), but also __this_cpu_read() and >>> raw_this_cpu_read() (which currently don't differ from this_cpu_read() >>> but could). >> >> Would setting the inline assembly memory operand both as input and output be >> better than using the “volatile”? > > I don't know.. I'm forever befuddled by the exact semantics of gcc > inline asm. > >> I think that If you do that, the compiler would should the this_cpu_read() >> as something that changes the per-cpu-variable, which would make it invalid >> to re-read the value. At the same time, it would not prevent reordering the >> read with other stuff. > > So the thing is; as I wrote, the generic version of this_cpu_*() is: > > local_irq_save(); > __this_cpu_*(); > local_irq_restore(); > > And per local_irq_{save,restore}() including compiler barriers that > cannot be reordered around either. > > And per the principle of least surprise, I think our primitives should > have similar semantics. I guess so, but as you’ll see below, the end result is ugly. > I'm actually having difficulty finding the this_cpu_read() in any of the > functions you mention, so I cannot make any concrete suggestions other > than pointing at the alternative functions available. So I got deeper into the code to understand a couple of differences. In the case of select_idle_sibling(), the patch (Peter’s) increase the function code size by 123 bytes (over the baseline of 986). The per-cpu variable is called through the following call chain: select_idle_sibling() => select_idle_cpu() => local_clock() => raw_smp_processor_id() And results in 2 more calls to sched_clock_cpu(), as the compiler assumes the processor id changes in between (which obviously wouldn’t happen). There may be more changes around, which I didn’t fully analyze. But the very least reading the processor id should not get “volatile”. As for finish_task_switch(), the impact is only few bytes, but still unnecessary. It appears that with your patch preempt_count() causes multiple reads of __preempt_count in this code: if (WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET, "corrupted preempt_count: %s/%d/0x%x\n", current->comm, current->pid, preempt_count())) preempt_count_set(FORK_PREEMPT_COUNT); Again, this is unwarranted, as the preemption count should not be changed in any interrupt.
Process for severe early stable bugs?
The latest file system corruption issue (Nominally fixed by ffe81d45322c ("blk-mq: fix corruption with direct issue") later fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list")) brought a lot of rightfully concerned users asking about release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to 4.19.3 on Nov 23. When the issue started getting visibility, users were left with the option of running known EOL 4.18.x kernels or running a 4.19 series that could corrupt their data. Admittedly, the risk of running the EOL kernel was pretty low given how recent it was, but it's still not a great look to tell people to run something marked EOL. I'm wondering if there's anything we can do to make things easier on kernel consumers. Bugs will certainly happen but it really makes it hard to push the "always run the latest stable" narrative if there isn't a good fallback when things go seriously wrong. I don't actually have a great proposal for a solution here other than retroactively bringing back 4.18 (which I don't think Greg would like) but I figured I should at least bring it up. Thanks, Laura
Process for severe early stable bugs?
The latest file system corruption issue (Nominally fixed by ffe81d45322c ("blk-mq: fix corruption with direct issue") later fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list")) brought a lot of rightfully concerned users asking about release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to 4.19.3 on Nov 23. When the issue started getting visibility, users were left with the option of running known EOL 4.18.x kernels or running a 4.19 series that could corrupt their data. Admittedly, the risk of running the EOL kernel was pretty low given how recent it was, but it's still not a great look to tell people to run something marked EOL. I'm wondering if there's anything we can do to make things easier on kernel consumers. Bugs will certainly happen but it really makes it hard to push the "always run the latest stable" narrative if there isn't a good fallback when things go seriously wrong. I don't actually have a great proposal for a solution here other than retroactively bringing back 4.18 (which I don't think Greg would like) but I figured I should at least bring it up. Thanks, Laura
Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped
On Wed, 5 Dec 2018 22:07:37 -0800 Anthony Yznaga wrote: > Would you like me to submit a revised patch? An -mm tree diff? Either is OK. I usually turn replacemensts into deltas so we can see what changed.
Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped
On Wed, 5 Dec 2018 22:07:37 -0800 Anthony Yznaga wrote: > Would you like me to submit a revised patch? An -mm tree diff? Either is OK. I usually turn replacemensts into deltas so we can see what changed.