Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)
On Wed, 20 Mar 2024 12:44:23 +0900 Masami Hiramatsu (Google) wrote: > > > kernel/trace/trace_probe.c > > > 846 return; > > > 847 > > > 848 for (i = 0; i < earg->size; i++) { > > > 849 struct fetch_insn *code = >code[i]; > > > 850 > > > 851 switch (code->op) { > > > 852 case FETCH_OP_ARG: > > > 853 val = regs_get_kernel_argument(regs, > > > code->param); > > > 854 break; > > > 855 case FETCH_OP_ST_EDATA: > > > --> 856 *(unsigned long *)((unsigned long)edata + > > > code->offset) = val; > > > > > > Probably the earg->code[i] always has FETCH_OP_ARG before > > > FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out... > > > > Looks that way: > > > > case FETCH_OP_END: > > earg->code[i].op = FETCH_OP_ARG; > > earg->code[i].param = argnum; > > earg->code[i + 1].op = FETCH_OP_ST_EDATA; > > earg->code[i + 1].offset = offset; > > return offset; > > > > But probably should still initialize val to zero or have a WARN_ON() if > > that doesn't happen. > > OK, let's val = 0 in the store_trace_entry_data(), but WARN_ON() in this loop > is a bit strange. I think we should have a verifiler. Initializing to zero is fine. -- Steve
Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)
On Tue, 19 Mar 2024 10:10:00 -0400 Steven Rostedt wrote: > On Tue, 19 Mar 2024 10:19:09 +0300 > Dan Carpenter wrote: > > > Hello Masami Hiramatsu (Google), > > > > Commit 25f00e40ce79 ("tracing/probes: Support $argN in return probe > > (kprobe and fprobe)") from Mar 4, 2024 (linux-next), leads to the > > following Smatch static checker warning: > > > > kernel/trace/trace_probe.c:856 store_trace_entry_data() > > error: uninitialized symbol 'val'. > > > > kernel/trace/trace_probe.c > > 846 return; > > 847 > > 848 for (i = 0; i < earg->size; i++) { > > 849 struct fetch_insn *code = >code[i]; > > 850 > > 851 switch (code->op) { > > 852 case FETCH_OP_ARG: > > 853 val = regs_get_kernel_argument(regs, > > code->param); > > 854 break; > > 855 case FETCH_OP_ST_EDATA: > > --> 856 *(unsigned long *)((unsigned long)edata + > > code->offset) = val; > > > > Probably the earg->code[i] always has FETCH_OP_ARG before > > FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out... > > Looks that way: > > case FETCH_OP_END: > earg->code[i].op = FETCH_OP_ARG; > earg->code[i].param = argnum; > earg->code[i + 1].op = FETCH_OP_ST_EDATA; > earg->code[i + 1].offset = offset; > return offset; > > But probably should still initialize val to zero or have a WARN_ON() if > that doesn't happen. OK, let's val = 0 in the store_trace_entry_data(), but WARN_ON() in this loop is a bit strange. I think we should have a verifiler. Thank you, > > -- Steve > > > > > > 857 break; > > 858 case FETCH_OP_END: > > 859 goto end; > > 860 default: > > 861 break; > > 862 } > > 863 } > > 864 end: > > 865 return; > > 866 } > > > > regards, > > dan carpenter > -- Masami Hiramatsu (Google)
Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)
On Tue, Mar 19, 2024 at 10:10:00AM -0400, Steven Rostedt wrote: > On Tue, 19 Mar 2024 10:19:09 +0300 > Dan Carpenter wrote: > > > Hello Masami Hiramatsu (Google), > > > > Commit 25f00e40ce79 ("tracing/probes: Support $argN in return probe > > (kprobe and fprobe)") from Mar 4, 2024 (linux-next), leads to the > > following Smatch static checker warning: > > > > kernel/trace/trace_probe.c:856 store_trace_entry_data() > > error: uninitialized symbol 'val'. > > > > kernel/trace/trace_probe.c > > 846 return; > > 847 > > 848 for (i = 0; i < earg->size; i++) { > > 849 struct fetch_insn *code = >code[i]; > > 850 > > 851 switch (code->op) { > > 852 case FETCH_OP_ARG: > > 853 val = regs_get_kernel_argument(regs, > > code->param); > > 854 break; > > 855 case FETCH_OP_ST_EDATA: > > --> 856 *(unsigned long *)((unsigned long)edata + > > code->offset) = val; > > > > Probably the earg->code[i] always has FETCH_OP_ARG before > > FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out... > > Looks that way: > > case FETCH_OP_END: > earg->code[i].op = FETCH_OP_ARG; > earg->code[i].param = argnum; > earg->code[i + 1].op = FETCH_OP_ST_EDATA; > earg->code[i + 1].offset = offset; > return offset; > > But probably should still initialize val to zero or have a WARN_ON() if > that doesn't happen. > Most people use the GCC extension to initialize everything to zero so initializing to zero really has zero cost. I always recomend people to do it. regards, dan carpenter
Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)
On Tue, 19 Mar 2024 10:19:09 +0300 Dan Carpenter wrote: > Hello Masami Hiramatsu (Google), > > Commit 25f00e40ce79 ("tracing/probes: Support $argN in return probe > (kprobe and fprobe)") from Mar 4, 2024 (linux-next), leads to the > following Smatch static checker warning: > > kernel/trace/trace_probe.c:856 store_trace_entry_data() > error: uninitialized symbol 'val'. > > kernel/trace/trace_probe.c > 846 return; > 847 > 848 for (i = 0; i < earg->size; i++) { > 849 struct fetch_insn *code = >code[i]; > 850 > 851 switch (code->op) { > 852 case FETCH_OP_ARG: > 853 val = regs_get_kernel_argument(regs, > code->param); > 854 break; > 855 case FETCH_OP_ST_EDATA: > --> 856 *(unsigned long *)((unsigned long)edata + > code->offset) = val; > > Probably the earg->code[i] always has FETCH_OP_ARG before > FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out... Looks that way: case FETCH_OP_END: earg->code[i].op = FETCH_OP_ARG; earg->code[i].param = argnum; earg->code[i + 1].op = FETCH_OP_ST_EDATA; earg->code[i + 1].offset = offset; return offset; But probably should still initialize val to zero or have a WARN_ON() if that doesn't happen. -- Steve > > 857 break; > 858 case FETCH_OP_END: > 859 goto end; > 860 default: > 861 break; > 862 } > 863 } > 864 end: > 865 return; > 866 } > > regards, > dan carpenter
Re: [bug report] Memory leak from acpi_ev_install_space_handler()
On 06/04/2021 17:40, Rafael J. Wysocki wrote: On Tue, Apr 6, 2021 at 5:51 PM John Garry wrote: Hi guys, On next-20210406, I enabled CONFIG_DEBUG_KMEMLEAK and CONFIG_DEBUG_TEST_DRIVER_REMOVE for my arm64 system, and see this: Hi Rafael, Why exactly do you think that acpi_ev_install_space_handler() leaks memory? I don't think that acpi_ev_install_space_handler() itself leaks memory, but it seems that there is something missing in the code which is meant to undo/clean up after that on the uninstall path - I did make the point in writing "memory leak from", but maybe still not worded clearly. Anyway, I have not analyzed the problem fully - I'm just reporting. I don't mind looking further if requested. Thanks, John root@debian:/home/john# more /sys/kernel/debug/kmemleak unreferenced object 0x202803c11f00 (size 128): comm "swapper/0", pid 1, jiffies 4294894325 (age 337.524s) hex dump (first 32 bytes): 00 00 00 00 02 00 00 00 08 1f c1 03 28 20 ff ff( .. 08 1f c1 03 28 20 ff ff 00 00 00 00 00 00 00 00( .. backtrace: [<670a0938>] slab_post_alloc_hook+0x9c/0x2f8 [] kmem_cache_alloc+0x198/0x2a8 [<2bdba864>] acpi_os_create_semaphore+0x54/0xe0 [ ] acpi_ev_install_space_handler+0x24c/0x300 [<02e116e2>] acpi_install_address_space_handler+0x64/0xb0 [ ] i2c_acpi_install_space_handler+0xd4/0x138 [<8da42058>] i2c_register_adapter+0x368/0x910 [ ] i2c_add_adapter+0x9c/0x100 [<00ba2fcf>] i2c_add_numbered_adapter+0x44/0x58 [<7df22d67>] i2c_dw_probe_master+0x68c/0x900 [<682dfc98>] dw_i2c_plat_probe+0x460/0x640 [ ] platform_probe+0x8c/0x108 [ ] really_probe+0x190/0x670 [<66017341>] driver_probe_device+0x8c/0xf8 [ ] device_driver_attach+0x9c/0xa8 [ ] __driver_attach+0x88/0x138 unreferenced object 0x00280452c100 (size 128): comm "swapper/0", pid 1, jiffies 4294894558 (age 336.604s) hex dump (first 32 bytes): 00 00 00 00 02 00 00 00 08 c1 52 04 28 00 ff ff..R.(... 08 c1 52 04 28 00 ff ff 00 00 00 00 00 00 00 00..R.(... backtrace: [<670a0938>] slab_post_alloc_hook+0x9c/0x2f8 [ ] kmem_cache_alloc+0x198/0x2a8 [<2bdba864>] acpi_os_create_semaphore+0x54/0xe0 [ ] acpi_ev_install_space_handler+0x24c/0x300 [<02e116e2>] acpi_install_address_space_handler+0x64/0xb0 [<988d4f61>] acpi_gpiochip_add+0x20c/0x4a0 [<73d4faab>] gpiochip_add_data_with_key+0xd10/0x1680 [<1d50b98a>] devm_gpiochip_add_data_with_key+0x30/0x78 [ ] dwapb_gpio_probe+0x828/0xb28 [ ] platform_probe+0x8c/0x108 [ ] really_probe+0x190/0x670 [<66017341>] driver_probe_device+0x8c/0xf8 [ ] device_driver_attach+0x9c/0xa8 [ ] __driver_attach+0x88/0x138 [ ] bus_for_each_dev+0xec/0x160 [ ] driver_attach+0x34/0x48 root@debian:/home/john# Thanks, John .
Re: [bug report] Memory leak from acpi_ev_install_space_handler()
On Tue, Apr 6, 2021 at 5:51 PM John Garry wrote: > > Hi guys, > > On next-20210406, I enabled CONFIG_DEBUG_KMEMLEAK and > CONFIG_DEBUG_TEST_DRIVER_REMOVE for my arm64 system, and see this: Why exactly do you think that acpi_ev_install_space_handler() leaks memory? > root@debian:/home/john# more /sys/kernel/debug/kmemleak > unreferenced object 0x202803c11f00 (size 128): > comm "swapper/0", pid 1, jiffies 4294894325 (age 337.524s) > hex dump (first 32 bytes): > 00 00 00 00 02 00 00 00 08 1f c1 03 28 20 ff ff( .. > 08 1f c1 03 28 20 ff ff 00 00 00 00 00 00 00 00( .. > backtrace: > [<670a0938>] slab_post_alloc_hook+0x9c/0x2f8 > [] kmem_cache_alloc+0x198/0x2a8 > [<2bdba864>] acpi_os_create_semaphore+0x54/0xe0 > [ ] acpi_ev_install_space_handler+0x24c/0x300 > [<02e116e2>] acpi_install_address_space_handler+0x64/0xb0 > [ ] i2c_acpi_install_space_handler+0xd4/0x138 > [<8da42058>] i2c_register_adapter+0x368/0x910 > [ ] i2c_add_adapter+0x9c/0x100 > [<00ba2fcf>] i2c_add_numbered_adapter+0x44/0x58 > [<7df22d67>] i2c_dw_probe_master+0x68c/0x900 > [<682dfc98>] dw_i2c_plat_probe+0x460/0x640 > [ ] platform_probe+0x8c/0x108 > [ ] really_probe+0x190/0x670 > [<66017341>] driver_probe_device+0x8c/0xf8 > [ ] device_driver_attach+0x9c/0xa8 > [ ] __driver_attach+0x88/0x138 > unreferenced object 0x00280452c100 (size 128): > comm "swapper/0", pid 1, jiffies 4294894558 (age 336.604s) > hex dump (first 32 bytes): > 00 00 00 00 02 00 00 00 08 c1 52 04 28 00 ff ff..R.(... > 08 c1 52 04 28 00 ff ff 00 00 00 00 00 00 00 00..R.(... > backtrace: > [<670a0938>] slab_post_alloc_hook+0x9c/0x2f8 > [ ] kmem_cache_alloc+0x198/0x2a8 > [<2bdba864>] acpi_os_create_semaphore+0x54/0xe0 > [ ] acpi_ev_install_space_handler+0x24c/0x300 > [<02e116e2>] acpi_install_address_space_handler+0x64/0xb0 > [<988d4f61>] acpi_gpiochip_add+0x20c/0x4a0 > [<73d4faab>] gpiochip_add_data_with_key+0xd10/0x1680 > [<1d50b98a>] devm_gpiochip_add_data_with_key+0x30/0x78 > [ ] dwapb_gpio_probe+0x828/0xb28 > [ ] platform_probe+0x8c/0x108 > [ ] really_probe+0x190/0x670 > [<66017341>] driver_probe_device+0x8c/0xf8 > [ ] device_driver_attach+0x9c/0xa8 > [ ] __driver_attach+0x88/0x138 > [ ] bus_for_each_dev+0xec/0x160 > [ ] driver_attach+0x34/0x48 > root@debian:/home/john# > > Thanks, > John
Re: [bug report] node: Add memory-side caching attributes
On Thu, Apr 01, 2021 at 12:00:39PM +0300, Dan Carpenter wrote: > Hi Keith, > > I've been trying to figure out ways Smatch can check for device managed > resources. Like adding rules that if we call dev_set_name(>bar) > then it's device managaged and if there is a kfree(foo) without calling > device_put(>bar) then that's a resource leak. > > Of course one of the rules is that if you call device_register(dev) then > you can't kfree(dev), it has to released with device_put(dev) and that's > true even if the register fails. But this code here feels very > intentional so maybe there is an exception to the rule? Thanks for the notice. There was no intentional use of this, so I think it was a mistake. The proposal in the follow on replies looks much better. > The patch acc02a109b04: "node: Add memory-side caching attributes" > from Mar 11, 2019, leads to the following static checker warning: > > drivers/base/node.c:285 node_init_cache_dev() > error: kfree after device_register(): 'dev' > > drivers/base/node.c >263 static void node_init_cache_dev(struct node *node) >264 { >265 struct device *dev; >266 >267 dev = kzalloc(sizeof(*dev), GFP_KERNEL); >268 if (!dev) >269 return; >270 >271 dev->parent = >dev; >272 dev->release = node_cache_release; >273 if (dev_set_name(dev, "memory_side_cache")) >274 goto free_dev; >275 >276 if (device_register(dev)) > ^^^ >277 goto free_name; >278 >279 pm_runtime_no_callbacks(dev); >280 node->cache_dev = dev; >281 return; >282 free_name: >283 kfree_const(dev->kobj.name); >284 free_dev: >285 kfree(dev); > ^^ >286 } > > regards, > dan carpenter
Re: [bug report] node: Add memory-side caching attributes
On Thu, Apr 01, 2021 at 08:25:11AM -0300, Jason Gunthorpe wrote: > On Thu, Apr 01, 2021 at 12:00:39PM +0300, Dan Carpenter wrote: > > Hi Keith, > > > > I've been trying to figure out ways Smatch can check for device managed > > resources. Like adding rules that if we call dev_set_name(>bar) > > then it's device managaged and if there is a kfree(foo) without calling > > device_put(>bar) then that's a resource leak. > > It seems to be working from what I can see This check is actually more simple, and older. It just looks for device_register(dev); ... kfree(dev); I've written your proposed check of: device_register(>dev); ... kfree(foo); // warning missing device_put(>dev); But I just did that earler today and it will probably take a couple iterations to work out the kinks. Plus I'm off for a small vacation so it will be a week before I have the results from that. > > Also I wasn't able to convince myself that any locking around > node->cache_attrs exist.. > > > Of course one of the rules is that if you call device_register(dev) then > > you can't kfree(dev), it has to released with device_put(dev) and that's > > true even if the register fails. But this code here feels very > > intentional so maybe there is an exception to the rule? > > There is no exception. Open coding this: > > >282 free_name: > >283 kfree_const(dev->kobj.name); > > To avoid leaking memory from dev_set_name is a straight up layering > violation, WTF? > > node_cacheinfo_release() is just kfree(), so there is no need. > Instead (please feel free to send this Dan): Sure, I can send this (tomorrow). > > diff --git a/drivers/base/node.c b/drivers/base/node.c > index f449dbb2c74666..89c28952863977 100644 > --- a/drivers/base/node.c > +++ b/drivers/base/node.c > @@ -319,25 +319,24 @@ void node_add_cache(unsigned int nid, struct > node_cache_attrs *cache_attrs) > return; > > dev = >dev; > + device_initialize(dev) > dev->parent = node->cache_dev; > dev->release = node_cacheinfo_release; > dev->groups = cache_groups; > if (dev_set_name(dev, "index%d", cache_attrs->level)) Is calling dev_set_name() without doing a device_initialize() a bug? I could write a check for that. regards, dan carpenter
Re: [bug report] node: Add memory-side caching attributes
On Thu, Apr 01, 2021 at 05:06:52PM +0300, Dan Carpenter wrote: > > diff --git a/drivers/base/node.c b/drivers/base/node.c > > index f449dbb2c74666..89c28952863977 100644 > > +++ b/drivers/base/node.c > > @@ -319,25 +319,24 @@ void node_add_cache(unsigned int nid, struct > > node_cache_attrs *cache_attrs) > > return; > > > > dev = >dev; > > + device_initialize(dev) > > dev->parent = node->cache_dev; > > dev->release = node_cacheinfo_release; > > dev->groups = cache_groups; > > if (dev_set_name(dev, "index%d", cache_attrs->level)) > > Is calling dev_set_name() without doing a device_initialize() a bug? I > could write a check for that. IMHO, yes. However, Greg may not agree as dev_set_name() with no error check followed by device_register() is a very common pattern. If the user omits the device_initialize() then dev_set_name() must be immediately before only device_register() with no error unwind between them. It must not error unwind dev_set_name() to kfree. (This is really tricky) Greg and I have argued on the merits of device_initialize() several times before. I argue the error control flow is simpler and easier to get right, he argues the extra statement is unneeded complexity and people will get the error unwind wrong. Every time you find a bug like this is someone getting the complexity around error handling and device_register() wrong, so my advices is to stop using device_register (aka device_init_and_add) and make it explicit so the goto unwind has logical pairing. put_device() pairs with device_initialize(). The tricky bit is establishing the release function, as complex release functions often have complex init sequences and you need the setup done enough to go to release. When things become this complex I advocate splitting alloc into a function: /* Caller must use put_device(>dev) */ struct foo *foo_allocate() { foo = kzalloc allocate release freeing thing 1; allocate release freeing thing 2; allocate release freeing thing 3; device_initialize(); return foo; err: free thing 3; err: free thing 2; err: free thing 1; err: kfree(foo) return rc; } Thus there is a clear logical seperation between the world of 'unwind to kfree' and the world of 'unwind to put_device'. dev_set_name() can not be inside an alloc function. Simple cases, like here, should just do device_initialize() immediately after kzalloc() and never have a kfree() error unwind. I saw Christoph had a similar opinion on 'init and add' patterns being bad, maybe he has additional colour to share. Jason
Re: [bug report] node: Add memory-side caching attributes
On Thu, Apr 01, 2021 at 12:00:39PM +0300, Dan Carpenter wrote: > Hi Keith, > > I've been trying to figure out ways Smatch can check for device managed > resources. Like adding rules that if we call dev_set_name(>bar) > then it's device managaged and if there is a kfree(foo) without calling > device_put(>bar) then that's a resource leak. It seems to be working from what I can see Also I wasn't able to convince myself that any locking around node->cache_attrs exist.. > Of course one of the rules is that if you call device_register(dev) then > you can't kfree(dev), it has to released with device_put(dev) and that's > true even if the register fails. But this code here feels very > intentional so maybe there is an exception to the rule? There is no exception. Open coding this: >282 free_name: >283 kfree_const(dev->kobj.name); To avoid leaking memory from dev_set_name is a straight up layering violation, WTF? node_cacheinfo_release() is just kfree(), so there is no need. Instead (please feel free to send this Dan): diff --git a/drivers/base/node.c b/drivers/base/node.c index f449dbb2c74666..89c28952863977 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -319,25 +319,24 @@ void node_add_cache(unsigned int nid, struct node_cache_attrs *cache_attrs) return; dev = >dev; + device_initialize(dev) dev->parent = node->cache_dev; dev->release = node_cacheinfo_release; dev->groups = cache_groups; if (dev_set_name(dev, "index%d", cache_attrs->level)) - goto free_cache; + goto put_device; info->cache_attrs = *cache_attrs; - if (device_register(dev)) { + if (device_add(dev)) { dev_warn(>dev, "failed to add cache level:%d\n", cache_attrs->level); - goto free_name; + goto put_device } pm_runtime_no_callbacks(dev); list_add_tail(>node, >cache_attrs); return; -free_name: - kfree_const(dev->kobj.name); -free_cache: - kfree(info); +put_device: + put_device(dev); } static void node_remove_caches(struct node *node)
Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
Am Montag, dem 15.02.2021 um 10:54 -0500 schrieb Sven Van Asbroeck: > Hi Lucas, > > On Mon, Feb 15, 2021 at 5:15 AM Lucas Stach wrote: > > > > The straight forward way to fix this would be to just disable the PRE > > when the stride is getting too large, which might not work well with > > all userspace requirements, as it effectively disables the ability to > > scan GPU tiled surfaces when the stride is getting too large. > > Thank you for your very knowledgeable input, really appreciate it. > > I am wondering why I am the first to notice this particular corner > case. Is this perhaps because X+armada plugin allocate a huge bitmap > that fits all displays, and other software frameworks do not? Are > people on i.MX6 mostly using X or Wayland? If Wayland allocates a > separate bitmap for each display, this PRE bug will of course never > show up... Yep, I really doubt that there are a lot i.MX6QP, multi-display, X.Org based devices out there. While it's not anywhere in a protocol or similar fixed API, Wayland compositors mostly opted to have a separate surface per display. The weston reference compositor started out this way (as it makes surface repaint easier) and other followed the lead, so Wayland based stacks won't hit this case. Regards, Lucas
Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
Hi Lucas, On Mon, Feb 15, 2021 at 5:15 AM Lucas Stach wrote: > > The straight forward way to fix this would be to just disable the PRE > when the stride is getting too large, which might not work well with > all userspace requirements, as it effectively disables the ability to > scan GPU tiled surfaces when the stride is getting too large. Thank you for your very knowledgeable input, really appreciate it. I am wondering why I am the first to notice this particular corner case. Is this perhaps because X+armada plugin allocate a huge bitmap that fits all displays, and other software frameworks do not? Are people on i.MX6 mostly using X or Wayland? If Wayland allocates a separate bitmap for each display, this PRE bug will of course never show up... Sven
Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
Hi Sven, Am Freitag, dem 12.02.2021 um 18:52 -0500 schrieb Sven Van Asbroeck: > Philipp, Fabio, > > I was able to verify that the PREs do indeed overrun their allocated ocram > area. > > Section 38.5.1 of the iMX6QuadPlus manual indicates the ocram size > required: width(pixels) x 8 lines x 4 bytes. For 2048 pixels max, this > comes to 64K. This is what the PRE driver allocates. So far, so good. > > The trouble starts when we're displaying a section of a much wider > bitmap. This happens in X when using two displays. e.g.: > HDMI 1920x1088 > LVDS 1280x800 > X bitmap 3200x1088, left side displayed on HDMI, right side on LVDS. > > In such a case, the stride will be much larger than the width of a > display scanline. Urgh, bad tested corner case. > This is where things start to go very wrong. > > I found that the ocram area used by the PREs increases with the > stride. I experimentally found a formula: > ocam_used = display_widthx8x4 + (bitmap_width-display_width)x7x4 > > As the stride increases, the PRE eventually overruns the ocram and... > ends up in the "ocram aliased" area, where it overwrites the ocram in > use by the vpu/coda ! > > I could not find any PRE register setting that changes the used ocram area. There is no such setting. The PRE always prefetches a doublebuffer of 2x4 scanlines and the scanline size is defined by the store engine pitch. The straight forward way to fix this would be to just disable the PRE when the stride is getting too large, which might not work well with all userspace requirements, as it effectively disables the ability to scan GPU tiled surfaces when the stride is getting too large. I'm not sure if this works in practice, as the PRG address rewriting might make this harder than it seems, but on could probably try to rewrite the prefetch start address, input pitch, input width/height and store pitch of the PRE settings to cover only the area used by the the CRTC to reduce OCRAM requirements. Regards, Lucas
Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
Philipp, Fabio, I was able to verify that the PREs do indeed overrun their allocated ocram area. Section 38.5.1 of the iMX6QuadPlus manual indicates the ocram size required: width(pixels) x 8 lines x 4 bytes. For 2048 pixels max, this comes to 64K. This is what the PRE driver allocates. So far, so good. The trouble starts when we're displaying a section of a much wider bitmap. This happens in X when using two displays. e.g.: HDMI 1920x1088 LVDS 1280x800 X bitmap 3200x1088, left side displayed on HDMI, right side on LVDS. In such a case, the stride will be much larger than the width of a display scanline. This is where things start to go very wrong. I found that the ocram area used by the PREs increases with the stride. I experimentally found a formula: ocam_used = display_widthx8x4 + (bitmap_width-display_width)x7x4 As the stride increases, the PRE eventually overruns the ocram and... ends up in the "ocram aliased" area, where it overwrites the ocram in use by the vpu/coda ! I could not find any PRE register setting that changes the used ocram area. Sven
Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
Hi Philipp, thank you so much for looking into this, I really appreciate it ! On Thu, Feb 11, 2021 at 9:32 AM Philipp Zabel wrote: > > Another thing that might help to identify who is writing where might be to > clear the whole OCRAM region and dump it after running only decode or only > PRE/PRG scanout, for example: Great idea, I will try that out. This might take a few days. I am also dealing with higher priority issues, > > Could you check /sys/kernel/debug/dri/?/state while running the error case? dri state in non-error case: # cat state plane[31]: plane-0 crtc=(null) fb=0 crtc-pos=0x0+0+0 src-pos=0.00x0.00+0.00+0.00 rotation=1 normalized-zpos=0 color-encoding=ITU-R BT.601 YCbCr color-range=YCbCr limited range plane[35]: plane-1 crtc=(null) fb=0 crtc-pos=0x0+0+0 src-pos=0.00x0.00+0.00+0.00 rotation=1 normalized-zpos=1 color-encoding=ITU-R BT.601 YCbCr color-range=YCbCr limited range plane[38]: plane-2 crtc=(null) fb=0 crtc-pos=0x0+0+0 src-pos=0.00x0.00+0.00+0.00 rotation=1 normalized-zpos=0 color-encoding=ITU-R BT.601 YCbCr color-range=YCbCr limited range plane[42]: plane-3 crtc=crtc-2 fb=59 allocated by = X refcount=2 format=XR24 little-endian (0x34325258) modifier=0x0 size=1280x1088 layers: size[0]=1280x1088 pitch[0]=5120 offset[0]=0 obj[0]: name=2 refcount=4 start=000105e4 size=5570560 imported=no paddr=0xee80 vaddr=78a02004 crtc-pos=1280x800+0+0 src-pos=1280.00x800.00+0.00+0.00 rotation=1 normalized-zpos=0 color-encoding=ITU-R BT.601 YCbCr color-range=YCbCr limited range plane[46]: plane-4 crtc=(null) fb=0 crtc-pos=0x0+0+0 src-pos=0.00x0.00+0.00+0.00 rotation=1 normalized-zpos=1 color-encoding=ITU-R BT.601 YCbCr color-range=YCbCr limited range plane[49]: plane-5 crtc=(null) fb=0 crtc-pos=0x0+0+0 src-pos=0.00x0.00+0.00+0.00 rotation=1 normalized-zpos=0 color-encoding=ITU-R BT.601 YCbCr color-range=YCbCr limited range crtc[34]: crtc-0 enable=0 active=0 self_refresh_active=0 planes_changed=0 mode_changed=0 active_changed=0 connectors_changed=0 color_mgmt_changed=0 plane_mask=0 connector_mask=0 encoder_mask=0 mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0 crtc[41]: crtc-1 enable=0 active=0 self_refresh_active=0 planes_changed=0 mode_changed=0 active_changed=0 connectors_changed=0 color_mgmt_changed=0 plane_mask=0 connector_mask=0 encoder_mask=0 mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0 crtc[45]: crtc-2 enable=1 active=1 self_refresh_active=0 planes_changed=0 mode_changed=0 active_changed=0 connectors_changed=0 color_mgmt_changed=0 plane_mask=8 connector_mask=2 encoder_mask=2 mode: "": 60 67880 1280 1344 1345 1350 800 838 839 841 0x0 0x0 crtc[52]: crtc-3 enable=0 active=0 self_refresh_active=0 planes_changed=0 mode_changed=0 active_changed=0 connectors_changed=0 color_mgmt_changed=0 plane_mask=0 connector_mask=0 encoder_mask=0 mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0 connector[54]: HDMI-A-1 crtc=(null) self_refresh_aware=0 connector[57]: LVDS-1 crtc=crtc-2 self_refresh_aware=0 dri state in error case: # cat state plane[31]: plane-0 crtc=(null) fb=0 crtc-pos=0x0+0+0 src-pos=0.00x0.00+0.00+0.00 rotation=1 normalized-zpos=0 color-encoding=ITU-R BT.601 YCbCr color-range=YCbCr limited range plane[35]: plane-1 crtc=(null) fb=0 crtc-pos=0x0+0+0 src-pos=0.00x0.00+0.00+0.00 rotation=1 normalized-zpos=1 color-encoding=ITU-R BT.601 YCbCr color-range=YCbCr limited range plane[38]: plane-2 crtc=(null) fb=0 crtc-pos=0x0+0+0 src-pos=0.00x0.00+0.00+0.00 rotation=1
Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
Hi Sven, On Wed, Feb 10, 2021 at 01:29:29PM -0500, Sven Van Asbroeck wrote: > Found it! > > The i.MX6QuadPlus has two pairs of PREs, which use the extended > section of the iRAM. The Classic does not have any PREs or extended > iRAM: > > pre1: pre@21c8000 { >compatible = "fsl,imx6qp-pre"; > > fsl,iram = <>; > }; > > pre3: pre@21ca000 { > compatible = "fsl,imx6qp-pre"; > > fsl,iram = <>; > }; > > The CODA (VPU) driver uses the common section of iRAM: > > vpu: vpu@204 { > compatible = "cnm,coda960"; > > iram = <>; > }; > > The VPU or the PREs are overrunning their assigned iRAM area. How do I > know? Because if I change the PRE iRAM order, the problem disappears! > > PRE1: ocram2 change to ocram3 > PRE2: ocram2 change to ocram3 > PRE3: ocram3 change to ocram2 > PRE4: ocram3 change to ocram2 Thank you for debugging this. Given that CODA uses the OCRAM address range 0x90-0x94 and the PREs use OCRAM2 at 0x94-0x96 and OCRAM3 at 0x96-0x98, it seems unlikely that the PREs would overrun into the CODA iRAM. But maybe there is some stride related overflow that causes it to write at negative offsets or some other kind of oversight. Could you check /sys/kernel/debug/dri/?/state while running the error case? Another thing that might help to identify who is writing where might be to clear the whole OCRAM region and dump it after running only decode or only PRE/PRG scanout, for example: --8<-- /* Clear OCRAM */ #include #include #include #include #include #define OCRAM_START 0x90 #define OCRAM_SIZE 0x8 int main(int argc, char *argv[]) { int fd = open("/dev/mem", O_RDWR | O_SYNC); void *map = mmap(NULL, OCRAM_SIZE, PROT_WRITE, MAP_SHARED, fd, OCRAM_START); if (map == MAP_FAILED) return EXIT_FAILURE; memset(map, 0, OCRAM_SIZE); munmap(map, OCRAM_SIZE); close(fd); return EXIT_SUCCESS; } -->8-- --8<-- /* Dump OCRAM to stdout */ #include #include #include #include #define OCRAM_START 0x90 #define OCRAM_SIZE 0x8 int main(int argc, char *argv[]) { int fd = open("/dev/mem", O_RDONLY | O_SYNC); void *map = mmap(NULL, OCRAM_SIZE, PROT_READ, MAP_SHARED, fd, OCRAM_START); if (map == MAP_FAILED) return EXIT_FAILURE; write(1, map, OCRAM_SIZE); munmap(map, OCRAM_SIZE); close(fd); return EXIT_SUCCESS; } -->8-- regards Philipp
Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
Found it! The i.MX6QuadPlus has two pairs of PREs, which use the extended section of the iRAM. The Classic does not have any PREs or extended iRAM: pre1: pre@21c8000 { compatible = "fsl,imx6qp-pre"; fsl,iram = <>; }; pre3: pre@21ca000 { compatible = "fsl,imx6qp-pre"; fsl,iram = <>; }; The CODA (VPU) driver uses the common section of iRAM: vpu: vpu@204 { compatible = "cnm,coda960"; iram = <>; }; The VPU or the PREs are overrunning their assigned iRAM area. How do I know? Because if I change the PRE iRAM order, the problem disappears! PRE1: ocram2 change to ocram3 PRE2: ocram2 change to ocram3 PRE3: ocram3 change to ocram2 PRE4: ocram3 change to ocram2 Sven
Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
Bonjour Nicolas, On Wed, Feb 10, 2021 at 11:11 AM Nicolas Dufresne wrote: > > Are you sure you aren't just running out of CMA ? This is the only things that > comes to mind at the moment, sorry if it's not that useful. Thanks for the suggestion! No worries, this is such a strange/weird problem, that basically any idea has merit at this point. I tried increasing the CMA area from 256M -> 512M, but there was no impact. The critical framebuffer width still remains the same (=0x900). And everything works fine on a classic i.MX6Quad, it's only the i.MX6QuadPlus that has the problem. I am running i.MX6Quad and i.MX6QuadPlus side-by-side with identical kernels/rootfses. Obviously the devicetree is slightly different. Sven
Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
Hi Sven, Le mercredi 03 février 2021 à 11:33 -0500, Sven Van Asbroeck a écrit : > From: Sven Van Asbroeck > > We have observed that under certain repeatable circumstances, the CODA > mem2mem device consistently generates corrupted frames. This happens only > on an i.MX6qp (Plus) - the classic imx6q is not affected. > > This happens when the virtual X screen is wider than 0x900 pixels (1). Are you sure you aren't just running out of CMA ? This is the only things that comes to mind at the moment, sorry if it's not that useful. > > Quite strange, because CODA is a mem2mem device, and is presumably not > touching > any of the IPU/GPU2D/GPU3D infrastructure used by X. Except if there is a > hidden > dependency somehow. > > I have captured and visualized generated CODA frames as follows: > gst-launch-1.0 playbin uri=file:///home/default/nycTrain1080p.mp4 flags=0x45 > video-sink='multifilesink location=frame%d.yuv' > See (2) for how I converted the raw YUV frame to a PNG image. > > For example, the following will break CODA mpeg4 decode (width >= 0x900): > # xrandr --fb 2400x1088 > Screen 0: minimum 1 x 1, current 2400 x 1088, maximum 4096 x 4096 > HDMI1 disconnected (normal left inverted right x axis y axis) > LVDS1 connected primary 1280x800+0+0 (normal left inverted right x axis y > axis) 0mm x 0mm > 1280x800 59.79*+ > > Resulting frame when dumped with multifilesink (NOT written to the display): > https://gitlab.com/TheSven73/coda-investigation/-/blob/master/stripes.png > > And the following will restore CODA mpeg4 decode (width < 0x900): > # xrandr --fb 2300x1088 > Screen 0: minimum 1 x 1, current 2300 x 1088, maximum 4096 x 4096 > HDMI1 disconnected (normal left inverted right x axis y axis) > LVDS1 connected primary 1280x800+0+0 (normal left inverted right x axis y > axis) 0mm x 0mm > 1280x800 59.79*+ > > Resulting frame when dumped with multifilesink (NOT written to the display): > https://gitlab.com/TheSven73/coda-investigation/-/blob/master/ok.png > > Additional info: > - only the virtual X screen width seems to trigger the issue, it is > independent of the height. > - issue seems independent of the pixel format. Forcing CODA to output NV12 > shows the same behaviour. > > System description: > - i.MX6 QuadPlus: > [ 0.144518] CPU identified as i.MX6QP, silicon rev 1.1 > - mainline Linux v5.9.16 with a small private patchset on top > (patchset does not touch CODA) > - CODA960 silicon contained within i.MX6 QuadPlus: > [ 4798.510033] coda 204.vpu: Firmware code revision: 46076 > [ 4798.515916] coda 204.vpu: Initialized CODA960. > [ 4798.520779] coda 204.vpu: Firmware version: 3.1.1 > - gstreamer from buildroot: > gst-launch-1.0 version 1.16.2 > GStreamer 1.16.2 > - X from buildroot, using armada and etnadrm_gpu plugins: > X.Org X Server 1.20.7 > X Protocol Version 11, Revision 0 > [ 99.527] (II) LoadModule: "armada" > [ 99.527] (II) Loading /usr/lib/xorg/modules/drivers/armada_drv.so > [ 99.538] (II) Module armada: vendor="X.Org Foundation" > [ 99.538]compiled for 1.20.7, module version = 0.0.0 > [ 99.538]Module class: X.Org Video Driver > [ 99.538]ABI class: X.Org Video Driver, version 24.1 > [ 99.538] (II) armada: Support for Marvell LCD Controller: 88AP510 > [ 99.539] (II) armada: Support for Freescale IPU: i.MX6 > [ 99.545] (II) armada(0): Added screen for KMS device /dev/dri/card1 > [ 99.561] (II) armada(0): hardware: imx-drm > [ 99.563] (**) armada(0): Option "AccelModule" "etnadrm_gpu" > [ 99.563] (II) Loading sub module "etnadrm_gpu" > [ 99.563] (II) LoadModule: "etnadrm_gpu" > [ 99.564] (II) Loading /usr/lib/xorg/modules/drivers/etnadrm_gpu.so > [ 99.576] (II) Module Etnaviv GPU driver (DRM): vendor="X.Org Foundation" > [ 99.576]compiled for 1.20.7, module version = 0.0.0 > > > (1) When using multiple displays, the virtual X screen is typically the > bounding > rectangle which includes all screens. That's why it can become wider than > 1920 pixels. > > (2) > > # Convert raw YUYV to PNG > # Python, runs out of the box on a stock Google Colab notebook > import cv2 > import numpy as np > import matplotlib.pyplot as plt > import matplotlib > > img = np.fromfile('frame1.yuv', dtype=np.uint8) > # YUYV has two 8-bit channels per pixel > img.shape = (1088, 1920, 2) > > img2 = cv2.cvtColor(img, cv2.COLOR_YUV2RGB_YUYV) > plt.imshow(img2) > matplotlib.image.imsave('frame1.png', img2) > > To: Philipp Zabel > To: Mauro Carvalho Chehab > Cc: Adrian Ratiu > Cc: Lucas Stach > Cc: Fabio Estevam > Cc: linux-me...@vger.kernel.org > Cc: linux-kernel@vger.kernel.org
Re: [bug report] sched/fair: Prefer prev cpu in asymmetric wakeup path
On Fri, Nov 13, 2020 at 09:56:37AM +0100, Vincent Guittot wrote: > Hi Dan, > > Le vendredi 13 nov. 2020 à 11:46:57 (+0300), Dan Carpenter a écrit : > > Hello Vincent Guittot, > > > > The patch b4c9c9f15649: "sched/fair: Prefer prev cpu in asymmetric > > wakeup path" from Oct 29, 2020, leads to the following static checker > > warning: > > > > kernel/sched/fair.c:6249 select_idle_sibling() > > error: uninitialized symbol 'task_util'. > > > > kernel/sched/fair.c > > 6233 static int select_idle_sibling(struct task_struct *p, int prev, int > > target) > > 6234 { > > 6235 struct sched_domain *sd; > > 6236 unsigned long task_util; > > 6237 int i, recent_used_cpu; > > 6238 > > 6239 /* > > 6240 * On asymmetric system, update task utilization because we > > will check > > 6241 * that the task fits with cpu's capacity. > > 6242 */ > > > > The original comment was a bit more clear... Perhaps "On asymmetric > > system[s], [record the] task utilization because we will check that the > > task [can be done within] the cpu's capacity." > > The comment "update task utilization because we will check ..." refers to > sync_entity_load_avg() > > > > > 6243 if (static_branch_unlikely(_asym_cpucapacity)) { > > 6244 sync_entity_load_avg(>se); > > 6245 task_util = uclamp_task_util(p); > > 6246 } > > > > "task_util" is not initialized on the else path. > > no need because it will not be used > > > > > 6247 > > 6248 if ((available_idle_cpu(target) || sched_idle_cpu(target)) > > && > > 6249 asym_fits_capacity(task_util, target)) > >^ > > Uninitialized variable warning. > > asym_fits_capacity includes the same condition as above when we set task_util > so task_util can't be used unintialize > > static inline bool asym_fits_capacity(int task_util, int cpu) > { > if (static_branch_unlikely(_asym_cpucapacity)) > return fits_capacity(task_util, capacity_of(cpu)); > > return true; It's an interesting question, because unless the compiler makes this inline, then it will lead to a KASan/syzbot warning at runtime. The function is, of course, marked as inline but the compiler, also of course, generally ignores those hints (use __always_inline if you want the compiler to pay attention). On the other hand, the compiler will still probably inline it... So this is *probably* not going to lead to a runtime warning. regards, dan carpenter
Re: [bug report] sched/fair: Prefer prev cpu in asymmetric wakeup path
Hi Dan, Le vendredi 13 nov. 2020 à 11:46:57 (+0300), Dan Carpenter a écrit : > Hello Vincent Guittot, > > The patch b4c9c9f15649: "sched/fair: Prefer prev cpu in asymmetric > wakeup path" from Oct 29, 2020, leads to the following static checker > warning: > > kernel/sched/fair.c:6249 select_idle_sibling() > error: uninitialized symbol 'task_util'. > > kernel/sched/fair.c > 6233 static int select_idle_sibling(struct task_struct *p, int prev, int > target) > 6234 { > 6235 struct sched_domain *sd; > 6236 unsigned long task_util; > 6237 int i, recent_used_cpu; > 6238 > 6239 /* > 6240 * On asymmetric system, update task utilization because we > will check > 6241 * that the task fits with cpu's capacity. > 6242 */ > > The original comment was a bit more clear... Perhaps "On asymmetric > system[s], [record the] task utilization because we will check that the > task [can be done within] the cpu's capacity." The comment "update task utilization because we will check ..." refers to sync_entity_load_avg() > > 6243 if (static_branch_unlikely(_asym_cpucapacity)) { > 6244 sync_entity_load_avg(>se); > 6245 task_util = uclamp_task_util(p); > 6246 } > > "task_util" is not initialized on the else path. no need because it will not be used > > 6247 > 6248 if ((available_idle_cpu(target) || sched_idle_cpu(target)) && > 6249 asym_fits_capacity(task_util, target)) >^ > Uninitialized variable warning. asym_fits_capacity includes the same condition as above when we set task_util so task_util can't be used unintialize static inline bool asym_fits_capacity(int task_util, int cpu) { if (static_branch_unlikely(_asym_cpucapacity)) return fits_capacity(task_util, capacity_of(cpu)); return true; } > > 6250 return target; > 6251 > 6252 /* > 6253 * If the previous CPU is cache affine and idle, don't be > stupid: > 6254 */ > 6255 if (prev != target && cpus_share_cache(prev, target) && > 6256 (available_idle_cpu(prev) || sched_idle_cpu(prev)) && > 6257 asym_fits_capacity(task_util, prev)) > 6258 return prev; > 6259 > 6260 /* > 6261 * Allow a per-cpu kthread to stack with the wakee if the > > regards, > dan carpenter
Re: [bug report] tracing, synthetic events: Replace buggy strcat() with seq_buf operations
On Mon, 2 Nov 2020 10:45:24 +0300 Dan Carpenter wrote: > Hello Steven Rostedt (VMware), > > The patch 761a8c58db6b: "tracing, synthetic events: Replace buggy > strcat() with seq_buf operations" from Oct 23, 2020, leads to the > following static checker warning: > > kernel/trace/trace_events_synth.c:703 parse_synth_field() > warn: passing zero to 'ERR_PTR' > > kernel/trace/trace_events_synth.c >582 static struct synth_field *parse_synth_field(int argc, const char > **argv, >583 int *consumed) >584 { >585 struct synth_field *field; >586 const char *prefix = NULL, *field_type = argv[0], > *field_name, *array; >587 int len, ret = 0; >588 struct seq_buf s; >589 ssize_t size; >590 >591 if (field_type[0] == ';') >592 field_type++; >593 >594 if (!strcmp(field_type, "unsigned")) { >595 if (argc < 3) { >596 synth_err(SYNTH_ERR_INCOMPLETE_TYPE, > errpos(field_type)); >597 return ERR_PTR(-EINVAL); >598 } >599 prefix = "unsigned "; >600 field_type = argv[1]; >601 field_name = argv[2]; >602 *consumed = 3; >603 } else { >604 field_name = argv[1]; >605 *consumed = 2; >606 } >607 >608 field = kzalloc(sizeof(*field), GFP_KERNEL); >609 if (!field) >610 return ERR_PTR(-ENOMEM); >611 >612 len = strlen(field_name); >613 array = strchr(field_name, '['); >614 if (array) >615 len -= strlen(array); >616 else if (field_name[len - 1] == ';') >617 len--; >618 >619 field->name = kmemdup_nul(field_name, len, GFP_KERNEL); >620 if (!field->name) { >621 ret = -ENOMEM; >622 goto free; >623 } >624 if (!is_good_name(field->name)) { >625 synth_err(SYNTH_ERR_BAD_NAME, errpos(field_name)); >626 ret = -EINVAL; >627 goto free; >628 } >629 >630 if (field_type[0] == ';') >631 field_type++; >632 len = strlen(field_type) + 1; >633 >634 if (array) >635 len += strlen(array); >636 >637 if (prefix) >638 len += strlen(prefix); >639 >640 field->type = kzalloc(len, GFP_KERNEL); >641 if (!field->type) { >642 ret = -ENOMEM; >643 goto free; >644 } >645 seq_buf_init(, field->type, len); >646 if (prefix) >647 seq_buf_puts(, prefix); >648 seq_buf_puts(, field_type); >649 if (array) { >650 seq_buf_puts(, array); >651 if (s.buffer[s.len - 1] == ';') >652 s.len--; >653 } >654 if (WARN_ON_ONCE(!seq_buf_buffer_left())) >655 goto free; > > This was originally reported by kbuild-bot, but it was somehow > overlooked so now it's on my system. The missing error code will lead > to a NULL dereference in the caller. I misread the original report, modified it to fix something else, and didn't see the real problem. Having to initialize ret for *every* error path is ridiculous. Here's the fix. -- Steve >From 2980f226a7fb08e37fd3948e206854fe5a1a8c50 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Mon, 2 Nov 2020 11:28:39 -0500 Subject: [PATCH] tracing: Make -ENOMEM the default error for parse_synth_field() parse_synth_field() returns a pointer and requires that errors get surrounded by ERR_PTR(). The ret variable is initialized to zero, but should never be used as zero, and if it is, it could cause a false return code and produce a NULL pointer dereference. It makes no sense to set ret to zero. Set ret to -ENOMEM (the most common error case), and have any other errors set it to something else. This removes the need to initialize ret on *every* error branch. Fixes: 761a8c58db6b ("tracing, synthetic events: Replace buggy strcat() with seq_buf operations") Reported-by: Dan Carpenter Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_events_synth.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c index 84b7cab55291..881df991742a 100644 --- a/kernel/trace/trace_events_synth.c +++ b/kernel/trace/trace_events_synth.c @@ -584,7 +584,7 @@ static struct synth_field
Re: Bug Report High CPU Usage events_power_efficient
Hi Oki i find the problem is come from nf_conntrack_core I isolate problem in this part : queue_delayed_work(system_power_efficient_wq, _gc_work.dwork, HZ); When package go to queue delayed in one moment if connection track is to big process to delayed go to lock and start high cpu load. This is need to check and find solution… For now I remove queue_delayed_work and wait to check machine and will write status. Martin > On 8 Jul 2020, at 12:28, Greg KH wrote: > > > A: Because it messes up the order in which people normally read text. > Q: Why is top-posting such a bad thing? > A: Top-posting. > Q: What is the most annoying thing in e-mail? > A: No. > Q: Should I include quotations after my reply? > > http://daringfireball.net/2007/07/on_top > > On Wed, Jul 08, 2020 at 11:34:52AM +0300, Martin Zaharinov wrote: >> Yes i search but not find any information. > > Please do the testing yourself, using 'git bisect' to find the offending > commit. > > thanks, > > greg k-h
Re: Bug Report High CPU Usage events_power_efficient
A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? A: No. Q: Should I include quotations after my reply? http://daringfireball.net/2007/07/on_top On Wed, Jul 08, 2020 at 11:34:52AM +0300, Martin Zaharinov wrote: > Yes i search but not find any information. Please do the testing yourself, using 'git bisect' to find the offending commit. thanks, greg k-h
Re: Bug Report High CPU Usage events_power_efficient
Yes i search but not find any information. And write hear to help if any have same problem or any of you have idea from where is comme this problem. If need more debug i will only write how to get more information. Martin На ср, 8.07.2020 г. в 10:09 Greg KH написа: > > On Wed, Jul 08, 2020 at 09:50:49AM +0300, Martin Zaharinov wrote: > > Add Greg , Florian, Eric to this bug > > > > > On 7 Jul 2020, at 22:54, Martin Zaharinov wrote: > > > > > > And this is log from /sys/kernel/debug/tracing/trace > > > > > > > > > # entries-in-buffer/entries-written: 32410/32410 #P:64 > > > # > > > # _-=> irqs-off > > > # / _=> need-resched > > > #| / _---=> hardirq/softirq > > > #|| / _--=> preempt-depth > > > #||| / delay > > > # TASK-PID CPU# TIMESTAMP FUNCTION > > > # | | | | | > > > <...>-57259 [005] 29619.680698: workqueue_execute_start: > > > work struct ef22e4b8: function gc_worker [nf_conntrack] > > > <...>-57259 [005] 29623.811407: workqueue_execute_end: > > > work struct ef22e4b8: function gc_worker [nf_conntrack] > > > <...>-57259 [005] 29623.811410: workqueue_execute_start: > > > work struct 00aeec55: function fb_flashcursor > > > <...>-57259 [005] 29623.811421: workqueue_execute_end: > > > work struct 00aeec55: function fb_flashcursor > > > <...>-57259 [005] 29623.811422: workqueue_execute_start: > > > work struct a6d382bb: function vmstat_update > > > <...>-57259 [005] 29623.811435: workqueue_execute_end: > > > work struct a6d382bb: function vmstat_update > > > > > >> On 7 Jul 2020, at 22:44, Martin Zaharinov wrote: > > >> > > >> the problem is hear with kernel 5.7.7 > > >> > > >> last work kernel without this problem is 5.6.7 > > >> > > >> hear is more info: > > >> > > >> cat /proc/57259/stack > > >> root@megacableamarilis:~# cat /proc/57259/stack > > >> [<0>] gc_worker+0x1be/0x380 [nf_conntrack] > > >> [<0>] process_one_work+0x1bc/0x3b0 > > >> [<0>] worker_thread+0x4d/0x460 > > >> [<0>] kthread+0x10d/0x130 > > >> [<0>] ret_from_fork+0x1f/0x30 > > >> > > >> PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ > > >> COMMAND57259 root 28 0 0 0 > > >> 0 R 69.8 0.0 82:42.14 kworker/5:2+events_power_efficient > > >> 32 root 21 0 0 0 0 R 31.0 0.0 87:06.33 > > >> ksoftirqd/4 > > >> > > >> > > >> Please help to fix this problem > > >> > > >>> On 22 Apr 2020, at 15:55, Martin Zaharinov wrote: > > >>> > > >>> Hello Qian and Greg > > >>> With latest 5.6.x kernel have problem with events_power_efficient 28488 > > >>> root 28 0 0 0 0 R 95.5 0.0 101:38.19 > > >>> kworker/2:1+events_power_efficient Process start to load machine after > > >>> 3-4 hour and load not stop only reboot machine remove process . Server > > >>> runing on AMD EPIC CPU 2x 7301 32Gb Ram Have 2 x 10G card Intel when > > >>> machine load over 1G traffic machine locked and only restart fix > > >>> problem to next load . After move traffic and server stop load process > > >>> still hear and load server ? > > >>> And after reboot process move to other core . > > Have you used 'git bisect' to try to find the offending commit? > > Without that, it's going to be hard to help you out here. > > thanks, > > greg k-h
Re: Bug Report High CPU Usage events_power_efficient
On Wed, Jul 08, 2020 at 09:50:49AM +0300, Martin Zaharinov wrote: > Add Greg , Florian, Eric to this bug > > > On 7 Jul 2020, at 22:54, Martin Zaharinov wrote: > > > > And this is log from /sys/kernel/debug/tracing/trace > > > > > > # entries-in-buffer/entries-written: 32410/32410 #P:64 > > # > > # _-=> irqs-off > > # / _=> need-resched > > #| / _---=> hardirq/softirq > > #|| / _--=> preempt-depth > > #||| / delay > > # TASK-PID CPU# TIMESTAMP FUNCTION > > # | | | | | > > <...>-57259 [005] 29619.680698: workqueue_execute_start: > > work struct ef22e4b8: function gc_worker [nf_conntrack] > > <...>-57259 [005] 29623.811407: workqueue_execute_end: work > > struct ef22e4b8: function gc_worker [nf_conntrack] > > <...>-57259 [005] 29623.811410: workqueue_execute_start: > > work struct 00aeec55: function fb_flashcursor > > <...>-57259 [005] 29623.811421: workqueue_execute_end: work > > struct 00aeec55: function fb_flashcursor > > <...>-57259 [005] 29623.811422: workqueue_execute_start: > > work struct a6d382bb: function vmstat_update > > <...>-57259 [005] 29623.811435: workqueue_execute_end: work > > struct a6d382bb: function vmstat_update > > > >> On 7 Jul 2020, at 22:44, Martin Zaharinov wrote: > >> > >> the problem is hear with kernel 5.7.7 > >> > >> last work kernel without this problem is 5.6.7 > >> > >> hear is more info: > >> > >> cat /proc/57259/stack > >> root@megacableamarilis:~# cat /proc/57259/stack > >> [<0>] gc_worker+0x1be/0x380 [nf_conntrack] > >> [<0>] process_one_work+0x1bc/0x3b0 > >> [<0>] worker_thread+0x4d/0x460 > >> [<0>] kthread+0x10d/0x130 > >> [<0>] ret_from_fork+0x1f/0x30 > >> > >> PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND > >>57259 root 28 0 0 0 0 R > >> 69.8 0.0 82:42.14 kworker/5:2+events_power_efficient > >> > >> 32 root 21 0 0 0 0 R 31.0 0.0 87:06.33 > >> ksoftirqd/4 > >> > >> > >> Please help to fix this problem > >> > >>> On 22 Apr 2020, at 15:55, Martin Zaharinov wrote: > >>> > >>> Hello Qian and Greg > >>> With latest 5.6.x kernel have problem with events_power_efficient 28488 > >>> root 28 0 0 0 0 R 95.5 0.0 101:38.19 > >>> kworker/2:1+events_power_efficient Process start to load machine after > >>> 3-4 hour and load not stop only reboot machine remove process . Server > >>> runing on AMD EPIC CPU 2x 7301 32Gb Ram Have 2 x 10G card Intel when > >>> machine load over 1G traffic machine locked and only restart fix problem > >>> to next load . After move traffic and server stop load process still hear > >>> and load server ? > >>> And after reboot process move to other core . Have you used 'git bisect' to try to find the offending commit? Without that, it's going to be hard to help you out here. thanks, greg k-h
Re: Bug Report High CPU Usage events_power_efficient
Add Greg , Florian, Eric to this bug > On 7 Jul 2020, at 22:54, Martin Zaharinov wrote: > > And this is log from /sys/kernel/debug/tracing/trace > > > # entries-in-buffer/entries-written: 32410/32410 #P:64 > # > # _-=> irqs-off > # / _=> need-resched > #| / _---=> hardirq/softirq > #|| / _--=> preempt-depth > #||| / delay > # TASK-PID CPU# TIMESTAMP FUNCTION > # | | | | | > <...>-57259 [005] 29619.680698: workqueue_execute_start: work > struct ef22e4b8: function gc_worker [nf_conntrack] > <...>-57259 [005] 29623.811407: workqueue_execute_end: work > struct ef22e4b8: function gc_worker [nf_conntrack] > <...>-57259 [005] 29623.811410: workqueue_execute_start: work > struct 00aeec55: function fb_flashcursor > <...>-57259 [005] 29623.811421: workqueue_execute_end: work > struct 00aeec55: function fb_flashcursor > <...>-57259 [005] 29623.811422: workqueue_execute_start: work > struct a6d382bb: function vmstat_update > <...>-57259 [005] 29623.811435: workqueue_execute_end: work > struct a6d382bb: function vmstat_update > >> On 7 Jul 2020, at 22:44, Martin Zaharinov wrote: >> >> the problem is hear with kernel 5.7.7 >> >> last work kernel without this problem is 5.6.7 >> >> hear is more info: >> >> cat /proc/57259/stack >> root@megacableamarilis:~# cat /proc/57259/stack >> [<0>] gc_worker+0x1be/0x380 [nf_conntrack] >> [<0>] process_one_work+0x1bc/0x3b0 >> [<0>] worker_thread+0x4d/0x460 >> [<0>] kthread+0x10d/0x130 >> [<0>] ret_from_fork+0x1f/0x30 >> >> PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND >> 57259 root 28 0 0 0 0 R >> 69.8 0.0 82:42.14 kworker/5:2+events_power_efficient >> >> 32 root 21 0 0 0 0 R 31.0 0.0 87:06.33 ksoftirqd/4 >> >> >> Please help to fix this problem >> >>> On 22 Apr 2020, at 15:55, Martin Zaharinov wrote: >>> >>> Hello Qian and Greg >>> With latest 5.6.x kernel have problem with events_power_efficient 28488 >>> root 28 0 0 0 0 R 95.5 0.0 101:38.19 >>> kworker/2:1+events_power_efficient Process start to load machine after 3-4 >>> hour and load not stop only reboot machine remove process . Server runing >>> on AMD EPIC CPU 2x 7301 32Gb Ram Have 2 x 10G card Intel when machine load >>> over 1G traffic machine locked and only restart fix problem to next load . >>> After move traffic and server stop load process still hear and load server ? >>> And after reboot process move to other core . >>> >>> Best Regards, >>> Martin >> >
Re: Bug Report High CPU Usage events_power_efficient
And this is log from /sys/kernel/debug/tracing/trace # entries-in-buffer/entries-written: 32410/32410 #P:64 # # _-=> irqs-off # / _=> need-resched #| / _---=> hardirq/softirq #|| / _--=> preempt-depth #||| / delay # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | <...>-57259 [005] 29619.680698: workqueue_execute_start: work struct ef22e4b8: function gc_worker [nf_conntrack] <...>-57259 [005] 29623.811407: workqueue_execute_end: work struct ef22e4b8: function gc_worker [nf_conntrack] <...>-57259 [005] 29623.811410: workqueue_execute_start: work struct 00aeec55: function fb_flashcursor <...>-57259 [005] 29623.811421: workqueue_execute_end: work struct 00aeec55: function fb_flashcursor <...>-57259 [005] 29623.811422: workqueue_execute_start: work struct a6d382bb: function vmstat_update <...>-57259 [005] 29623.811435: workqueue_execute_end: work struct a6d382bb: function vmstat_update > On 7 Jul 2020, at 22:44, Martin Zaharinov wrote: > > the problem is hear with kernel 5.7.7 > > last work kernel without this problem is 5.6.7 > > hear is more info: > > cat /proc/57259/stack > root@megacableamarilis:~# cat /proc/57259/stack > [<0>] gc_worker+0x1be/0x380 [nf_conntrack] > [<0>] process_one_work+0x1bc/0x3b0 > [<0>] worker_thread+0x4d/0x460 > [<0>] kthread+0x10d/0x130 > [<0>] ret_from_fork+0x1f/0x30 > > PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND > 57259 root 28 0 0 0 0 R 69.8 > 0.0 82:42.14 kworker/5:2+events_power_efficient > > 32 root 21 0 0 0 0 R 31.0 0.0 87:06.33 ksoftirqd/4 > > > Please help to fix this problem > >> On 22 Apr 2020, at 15:55, Martin Zaharinov wrote: >> >> Hello Qian and Greg >> With latest 5.6.x kernel have problem with events_power_efficient 28488 root >> 28 0 0 0 0 R 95.5 0.0 101:38.19 >> kworker/2:1+events_power_efficient Process start to load machine after 3-4 >> hour and load not stop only reboot machine remove process . Server runing on >> AMD EPIC CPU 2x 7301 32Gb Ram Have 2 x 10G card Intel when machine load over >> 1G traffic machine locked and only restart fix problem to next load . After >> move traffic and server stop load process still hear and load server ? >> And after reboot process move to other core . >> >> Best Regards, >> Martin >
Re: Bug Report High CPU Usage events_power_efficient
the problem is hear with kernel 5.7.7 last work kernel without this problem is 5.6.7 hear is more info: cat /proc/57259/stack root@megacableamarilis:~# cat /proc/57259/stack [<0>] gc_worker+0x1be/0x380 [nf_conntrack] [<0>] process_one_work+0x1bc/0x3b0 [<0>] worker_thread+0x4d/0x460 [<0>] kthread+0x10d/0x130 [<0>] ret_from_fork+0x1f/0x30 PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 57259 root 28 0 0 0 0 R 69.8 0.0 82:42.14 kworker/5:2+events_power_efficient 32 root 21 0 0 0 0 R 31.0 0.0 87:06.33 ksoftirqd/4 Please help to fix this problem > On 22 Apr 2020, at 15:55, Martin Zaharinov wrote: > > Hello Qian and Greg > With latest 5.6.x kernel have problem with events_power_efficient 28488 root > 28 0 0 0 0 R 95.5 0.0 101:38.19 > kworker/2:1+events_power_efficient Process start to load machine after 3-4 > hour and load not stop only reboot machine remove process . Server runing on > AMD EPIC CPU 2x 7301 32Gb Ram Have 2 x 10G card Intel when machine load over > 1G traffic machine locked and only restart fix problem to next load . After > move traffic and server stop load process still hear and load server ? > And after reboot process move to other core . > > Best Regards, > Martin
Re: bug report: libceph: follow redirect replies from osds
On Fri, Aug 30, 2019 at 4:05 PM Colin Ian King wrote: > > Hi, > > Static analysis with Coverity has picked up an issue with commit: > > commit 205ee1187a671c3b067d7f1e974903b44036f270 > Author: Ilya Dryomov > Date: Mon Jan 27 17:40:20 2014 +0200 > > libceph: follow redirect replies from osds > > Specifically in function ceph_redirect_decode in net/ceph/osd_client.c: > > 3485 > 3486len = ceph_decode_32(p); > > CID 17904: Unused value (UNUSED_VALUE) > > 3487*p += len; /* skip osd_instructions */ > 3488 > 3489/* skip the rest */ > 3490*p = struct_end; > > The double write to *p looks wrong, I suspect the *p += len; should be > just incrementing pointer p as in: p += len. Am I correct to assume > this is the correct fix? Hi Colin, No, the double write to *p is correct. It skips over len bytes and then skips to the end of the redirect reply. There is no bug here but we could drop len = ceph_decode_32(p); *p += len; /* skip osd_instructions */ and skip to the end directly to make Coverity happier. Thanks, Ilya
Re: [bug report][stable] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Confirmed the following 3 upstream commits can resolve this issue: d2a68c4effd8 x86/ftrace: Do not call function graph from dynamic trampolines 3c0dab44e227 x86/ftrace: Set trampoline pages as executable 7298e24f9042 x86/kprobes: Set instruction page as executable And they are all included in stable 4.19.49. Thanks, Joseph On 19/6/9 22:50, Greg KH wrote: > On Sun, Jun 09, 2019 at 09:10:45PM +0800, Joseph Qi wrote: >> Hi Nadav, >> Thanks for the comments. >> I'll test the 3 patches in the mentioned thread. > > This should all be fixed in the latest release that happened today. If > not, please let us know. > > thanks, > > greg k-h >
Re: [bug report][stable] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
On Sun, Jun 09, 2019 at 09:10:45PM +0800, Joseph Qi wrote: > Hi Nadav, > Thanks for the comments. > I'll test the 3 patches in the mentioned thread. This should all be fixed in the latest release that happened today. If not, please let us know. thanks, greg k-h
Re: [bug report][stable] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Hi Nadav, Thanks for the comments. I'll test the 3 patches in the mentioned thread. Thanks, Joseph On 19/6/8 00:38, Nadav Amit wrote: >> On Jun 7, 2019, at 3:24 AM, Joseph Qi wrote: >> >> Hi all, >> Any idea on this regression? > > Sorry for the late response (I assumed, for some reason, that you also follow > the second thread about this issue). > > Anyhow, it should be fixed by backporting some patches which were mistakenly > missed. > > See https://lore.kernel.org/stable/20190606131558.GJ29739@sasha-vm/ > > Regards, > Nadav > > >> Thanks, >> Joseph >> >> On 19/6/5 18:23, Joseph Qi wrote: >>> Hi, >>> >>> I have encountered a kernel BUG when running ltp ftrace-stress-test >>> on 4.19.48. >>> >>> [ 209.704855] LTP: starting ftrace-stress-test (ftrace_stress_test.sh 90) >>> [ 209.739412] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked >>> and stat_runtime require the kernel parameter schedstats=enable or >>> kernel.sched_schedstats=1 >>> [ 212.054506] kernel tried to execute NX-protected page - exploit attempt? >>> (uid: 0) >>> [ 212.055595] BUG: unable to handle kernel paging request at >>> c0349000 >>> [ 212.056589] PGD d00c067 P4D d00c067 PUD d00e067 PMD 23673e067 PTE >>> 80023457f061 >>> [ 212.057759] Oops: 0011 [#1] SMP PTI >>> [ 212.058303] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted >>> 4.19.48 #112 >>> >>> After some investigation I have found that it is introduced by commit >>> 8715ce033eb3 ("x86/modules: Avoid breaking W^X while loading modules"), >>> and then revert this commit the issue is gone. >>> >>> I have also tested the same case on 5.2-rc3 as well as right at >>> upstream commit f2c65fb3221a ("x86/modules: Avoid breaking W^X while >>> loading modules"), which has been merged in 5.2-rc1, it doesn't >>> happen. >>> >>> So I don't know why only stable has this issue while upstream doesn't. >>> >>> Thanks, >>> Joseph >
Re: [bug report][stable] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> On Jun 7, 2019, at 3:24 AM, Joseph Qi wrote: > > Hi all, > Any idea on this regression? Sorry for the late response (I assumed, for some reason, that you also follow the second thread about this issue). Anyhow, it should be fixed by backporting some patches which were mistakenly missed. See https://lore.kernel.org/stable/20190606131558.GJ29739@sasha-vm/ Regards, Nadav > Thanks, > Joseph > > On 19/6/5 18:23, Joseph Qi wrote: >> Hi, >> >> I have encountered a kernel BUG when running ltp ftrace-stress-test >> on 4.19.48. >> >> [ 209.704855] LTP: starting ftrace-stress-test (ftrace_stress_test.sh 90) >> [ 209.739412] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked >> and stat_runtime require the kernel parameter schedstats=enable or >> kernel.sched_schedstats=1 >> [ 212.054506] kernel tried to execute NX-protected page - exploit attempt? >> (uid: 0) >> [ 212.055595] BUG: unable to handle kernel paging request at >> c0349000 >> [ 212.056589] PGD d00c067 P4D d00c067 PUD d00e067 PMD 23673e067 PTE >> 80023457f061 >> [ 212.057759] Oops: 0011 [#1] SMP PTI >> [ 212.058303] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted >> 4.19.48 #112 >> >> After some investigation I have found that it is introduced by commit >> 8715ce033eb3 ("x86/modules: Avoid breaking W^X while loading modules"), >> and then revert this commit the issue is gone. >> >> I have also tested the same case on 5.2-rc3 as well as right at >> upstream commit f2c65fb3221a ("x86/modules: Avoid breaking W^X while >> loading modules"), which has been merged in 5.2-rc1, it doesn't >> happen. >> >> So I don't know why only stable has this issue while upstream doesn't. >> >> Thanks, >> Joseph
Re: [bug report][stable] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Hi all, Any idea on this regression? Thanks, Joseph On 19/6/5 18:23, Joseph Qi wrote: > Hi, > > I have encountered a kernel BUG when running ltp ftrace-stress-test > on 4.19.48. > > [ 209.704855] LTP: starting ftrace-stress-test (ftrace_stress_test.sh 90) > [ 209.739412] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked > and stat_runtime require the kernel parameter schedstats=enable or > kernel.sched_schedstats=1 > [ 212.054506] kernel tried to execute NX-protected page - exploit attempt? > (uid: 0) > [ 212.055595] BUG: unable to handle kernel paging request at c0349000 > [ 212.056589] PGD d00c067 P4D d00c067 PUD d00e067 PMD 23673e067 PTE > 80023457f061 > [ 212.057759] Oops: 0011 [#1] SMP PTI > [ 212.058303] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted > 4.19.48 #112 > > After some investigation I have found that it is introduced by commit > 8715ce033eb3 ("x86/modules: Avoid breaking W^X while loading modules"), > and then revert this commit the issue is gone. > > I have also tested the same case on 5.2-rc3 as well as right at > upstream commit f2c65fb3221a ("x86/modules: Avoid breaking W^X while > loading modules"), which has been merged in 5.2-rc1, it doesn't > happen. > > So I don't know why only stable has this issue while upstream doesn't. > > Thanks, > Joseph >
Re: Bug report: A commit about serial8250 cause the output disorderly at the phase of startup
On 4/23/19 2:48 PM, John Ogness wrote: On 2019-04-22, Hongzhi, Song wrote: Anyone notice this issue? Yes, I am aware of the issue. It is actually a feature, not a bug. ;-) Individual LOG_CONT messages, when classified as emergency messages, are printed immediately to the console. Yes, this is what I have acquired from commit log and code. The emergency mechanism is a good way. This makes them appear "disorderly". It is not yet clear how best to deal with LOG_CONT emergency messages. Extracting a part of log from my LOG_CONT message, [ 0.761635] 000: (family: 0x6, model: 0xf [ 0.761635] 000: , stepping: 0xb) It obviously separates a full line. Maybe it is not related to LOG_CONT. ^_^ The counterpart source code: 1499 pr_cont(" (family: 0x%x, model: 0x%x", c->x86, c->x86_model); 1500 1501 if (c->x86_stepping || c->cpuid_level >= 0) 1502 pr_cont(", stepping: 0x%x)\n", c->x86_stepping); 1503 else 1504 pr_cont(")\n"); I don't know if the issue exists in other place except LOG_CONT. But it indeed influences customer's judgement. Thanks for you reply. --Hongzhi But I suppose we should introduce a config option to disable atomic consoles altogether if users preferred buffered/delayed printk output. John Ogness On 4/19/19 10:24 AM, Hongzhi, Song wrote: 1. Issue description: Boot kernel( >= linux-rt-devel-v5.0.3 ) with qemu. Then qemu will print following disorderly messages. At the beginning, the messages are disorderly. But then it becomes normally from printing "[ 0.00] 000: Linux version..." -- [ 0.019000] 000: tsc: Unable to calibrate against PIT [ 0.002583] 000: 6199.83 BogoMIPS (lpj=3099918) [ 0.521247] 000: Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz [ 0.521247] 000: (family: 0x6, model: 0xf [ 0.521247] 000: , stepping: 0xb) [ 0.533126] 000: unsupported p6 CPU model 15 [ 0.533318] 000: no PMU driver, software events only. [ 0.765082] 000: 1 ACPI AML tables successfully acquired and loaded [ 0.765274] 000: [ 0.785903] 000: Enabled 2 GPEs in block 00 to 0F [ 0.786128] 000: [ 0.835675] 000: acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge. [ 0.892056] 000: 5 [ 0.892289] 000: *10 [ 0.892416] 000: 11 [ 0.892527] 000: ) [ 0.892661] 000: /* skip some repeated contents */ [ 5.052149] 000: , 512kB Cache [ 0.00] 000: Linux version 5.0.3-yocto-preempt-rt+ (hsong@pek-lpggp1) (gcc version 8.3.0 (GCC)) #24 SMP PREEMPT Thu Apr 18 03:29:58 EDT 2019 [ 0.00] 000: Command line: root=/dev/vda rw highres=off console=ttyS0 mem=256M ip=192.168.7.4::192.168.7.3:255.255.255.0 vga=0 uvesafb.mode_opti0 [ 0.00] 000: x86/fpu: x87 FPU will use FXSAVE [ 0.00] 000: BIOS-provided physical RAM map: [ 0.00] 000: BIOS-e820: [mem 0x-0x0009fbff] [ 0.00] 000: BIOS-e820: [mem 0x0009fc00-0x0009] [ 0.00] 000: BIOS-e820: [mem 0x000f-0x000f] [ 0.00] 000: BIOS-e820: [mem 0x0010-0x0ffdbfff] -- 2. Reproduce: (1)build kernel: (Attachment is my .config) make ARCH=x86_64 CROSS_COMPILE=[path-to-my-cross-toolchain]/x86_64-wrs-linux- (2)boot kernel with qemu: qemu-system-x86_64 \ -drive file=qemux86-64.rootfs.ext4,if=virtio,format=raw \ -nographic \ -kernel arch/x86/boot/bzImage \ -append 'root=/dev/vda rw highres=off console=ttyS0 mem=256M ip=dhcp' 3. Analysis: I find the following commit from >=linux-rt-devel-v5.0.3. cause the issue. b9d460e serial: 8250: implement write_atomic
Re: Bug report: A commit about serial8250 cause the output disorderly at the phase of startup
On 2019-04-22, Hongzhi, Song wrote: > Anyone notice this issue? Yes, I am aware of the issue. It is actually a feature, not a bug. ;-) Individual LOG_CONT messages, when classified as emergency messages, are printed immediately to the console. This makes them appear "disorderly". It is not yet clear how best to deal with LOG_CONT emergency messages. But I suppose we should introduce a config option to disable atomic consoles altogether if users preferred buffered/delayed printk output. John Ogness > On 4/19/19 10:24 AM, Hongzhi, Song wrote: >> 1. Issue description: >> >> Boot kernel( >= linux-rt-devel-v5.0.3 ) with qemu. >> Then qemu will print following disorderly messages. >> >> At the beginning, the messages are disorderly. But then it becomes >> normally from printing "[ 0.00] 000: Linux version..." >> >> -- >> [ 0.019000] 000: tsc: Unable to calibrate against PIT >> [ 0.002583] 000: 6199.83 BogoMIPS (lpj=3099918) >> [ 0.521247] 000: Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz >> [ 0.521247] 000: (family: 0x6, model: 0xf >> [ 0.521247] 000: , stepping: 0xb) >> [ 0.533126] 000: unsupported p6 CPU model 15 >> [ 0.533318] 000: no PMU driver, software events only. >> [ 0.765082] 000: 1 ACPI AML tables successfully acquired and loaded >> [ 0.765274] 000: >> [ 0.785903] 000: Enabled 2 GPEs in block 00 to 0F >> [ 0.786128] 000: >> [ 0.835675] 000: acpi PNP0A03:00: fail to add MMCONFIG >> information, can't access extended PCI configuration space under >> this bridge. >> [ 0.892056] 000: 5 >> [ 0.892289] 000: *10 >> [ 0.892416] 000: 11 >> [ 0.892527] 000: ) >> [ 0.892661] 000: >> >> /* skip some repeated contents */ >> >> [ 5.052149] 000: , 512kB Cache >> [ 0.00] 000: Linux version 5.0.3-yocto-preempt-rt+ >> (hsong@pek-lpggp1) (gcc version 8.3.0 (GCC)) #24 SMP PREEMPT Thu Apr >> 18 03:29:58 EDT 2019 >> [ 0.00] 000: Command line: root=/dev/vda rw highres=off >> console=ttyS0 mem=256M ip=192.168.7.4::192.168.7.3:255.255.255.0 >> vga=0 uvesafb.mode_opti0 >> [ 0.00] 000: x86/fpu: x87 FPU will use FXSAVE >> [ 0.00] 000: BIOS-provided physical RAM map: >> [ 0.00] 000: BIOS-e820: [mem >> 0x-0x0009fbff] >> [ 0.00] 000: BIOS-e820: [mem >> 0x0009fc00-0x0009] >> [ 0.00] 000: BIOS-e820: [mem >> 0x000f-0x000f] >> [ 0.00] 000: BIOS-e820: [mem >> 0x0010-0x0ffdbfff] >> >> -- >> >> 2. Reproduce: >> (1)build kernel: (Attachment is my .config) >> make ARCH=x86_64 >> CROSS_COMPILE=[path-to-my-cross-toolchain]/x86_64-wrs-linux- >> >> (2)boot kernel with qemu: >> >> qemu-system-x86_64 \ >> -drive file=qemux86-64.rootfs.ext4,if=virtio,format=raw \ >> -nographic \ >> -kernel arch/x86/boot/bzImage \ >> -append 'root=/dev/vda rw highres=off console=ttyS0 mem=256M ip=dhcp' >> >> 3. Analysis: >> I find the following commit from >=linux-rt-devel-v5.0.3. cause the >> issue. >> >> b9d460e serial: 8250: implement write_atomic
Re: Bug report: A commit about serial8250 cause the output disorderly at the phase of startup
Hi all, Anyone notice this issue? --Hongzhi On 4/19/19 10:24 AM, Hongzhi, Song wrote: 1. Issue description: Boot kernel( >= linux-rt-devel-v5.0.3 ) with qemu. Then qemu will print following disorderly messages. At the beginning, the messages are disorderly. But then it becomes normally from printing "[ 0.00] 000: Linux version..." -- [ 0.019000] 000: tsc: Unable to calibrate against PIT [ 0.002583] 000: 6199.83 BogoMIPS (lpj=3099918) [ 0.521247] 000: Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz [ 0.521247] 000: (family: 0x6, model: 0xf [ 0.521247] 000: , stepping: 0xb) [ 0.533126] 000: unsupported p6 CPU model 15 [ 0.533318] 000: no PMU driver, software events only. [ 0.765082] 000: 1 ACPI AML tables successfully acquired and loaded [ 0.765274] 000: [ 0.785903] 000: Enabled 2 GPEs in block 00 to 0F [ 0.786128] 000: [ 0.835675] 000: acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge. [ 0.892056] 000: 5 [ 0.892289] 000: *10 [ 0.892416] 000: 11 [ 0.892527] 000: ) [ 0.892661] 000: /* skip some repeated contents */ [ 5.052149] 000: , 512kB Cache [ 0.00] 000: Linux version 5.0.3-yocto-preempt-rt+ (hsong@pek-lpggp1) (gcc version 8.3.0 (GCC)) #24 SMP PREEMPT Thu Apr 18 03:29:58 EDT 2019 [ 0.00] 000: Command line: root=/dev/vda rw highres=off console=ttyS0 mem=256M ip=192.168.7.4::192.168.7.3:255.255.255.0 vga=0 uvesafb.mode_opti0 [ 0.00] 000: x86/fpu: x87 FPU will use FXSAVE [ 0.00] 000: BIOS-provided physical RAM map: [ 0.00] 000: BIOS-e820: [mem 0x-0x0009fbff] [ 0.00] 000: BIOS-e820: [mem 0x0009fc00-0x0009] [ 0.00] 000: BIOS-e820: [mem 0x000f-0x000f] [ 0.00] 000: BIOS-e820: [mem 0x0010-0x0ffdbfff] -- 2. Reproduce: (1)build kernel: (Attachment is my .config) make ARCH=x86_64 CROSS_COMPILE=[path-to-my-cross-toolchain]/x86_64-wrs-linux- (2)boot kernel with qemu: qemu-system-x86_64 \ -drive file=qemux86-64.rootfs.ext4,if=virtio,format=raw \ -nographic \ -kernel arch/x86/boot/bzImage \ -append 'root=/dev/vda rw highres=off console=ttyS0 mem=256M ip=dhcp' 3. Analysis: I find the following commit from >=linux-rt-devel-v5.0.3. cause the issue. b9d460e serial: 8250: implement write_atomic
Re: [BUG REPORT] linux-input: keyboard: gpio_keys: False Button Press Event on Wake
Hi Ken, On Wed, Apr 03, 2019 at 05:50:09PM +, Ken Sloat wrote: > Hello Dmitry, > > I may have found a potential bug in the "gpio_keys" driver. FYI, I am > running the 4.14 LTS kernel on my system, but from my understanding of > the issue, it seems that this would still occur in the latest version > of the kernel. > > The problem: In the 4.14 LTS kernel, both key press and release events > can generate a wake event. In the 5.x kernel, wake events are > configurable for press only, release only or "both" (see > "wakeup-event-action" binding). The issue can occur in the "both" case > or release/deasserted case. Let's imagine that a system is suspended > when a gpio key button is pressed, and subsequently resumed when the > button is released. If we look at the sequence of actions and events > reported by the input system, we can see the potential problem: > > Button Pressed > Event Value 1 > System Suspend > Button Released > System Wake & Resume > Event Value 0 > Event Value 1 > Event Value 0 > > As you can see the input system will report an extra button > event/press. This appears to be caused in gpio_keys_gpio_isr by the > following statement: > > if (bdata->suspended && > (button->type == 0 || button->type == EV_KEY)) { > /* > * Simulate wakeup key press in case the key has > * already released by the time we got interrupt > * handler to run. > */ > input_report_key(bdata->input, button->code, 1); > } > > This code does not seem to take into account that the wake event may > have been caused by a button release action, and just assumes we must > have a button press. > > This can obviously be problematic in the use case I mentioned, as the > system would be put in a constant loop between waking and sleeping. > While there are other ways to deal with or react to this issue in the > userspace, it seems that the driver should probably take this into > account. > I believe the expectation is that we do not go to sleep with button still pressed, as we expect it to be released momentarily. Given that we do not know which edge woke us it is not clear if we can avoid simulating the keypress event, as this definitely causes "missed press", at least on some Android devices, where by the time we get to run this ISR user has already released the button. Thanks. -- Dmitry
Re: [bug report][stable] perf probe: failed to add events
On 21/03/19 12:10 PM, Greg KH wrote: > On Thu, Feb 28, 2019 at 09:19:08AM +0200, Adrian Hunter wrote: >> On 28/02/19 4:07 AM, Joseph Qi wrote: >>> Hi Adrian, >>> >>> On 19/2/27 20:39, Adrian Hunter wrote: Seems to be fixed by this: From: Adrian Hunter Date: Wed, 27 Feb 2019 05:35:25 +0200 Subject: [PATCH] perf probe: Fix getting the kernel map Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI entry trampolines"), perf tools has been creating more than one kernel map, however 'perf probe' assumed there could be only one. Fix by using machine__kernel_map() to get the main kernel map. Signed-off-by: Adrian Hunter Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI entry trampolines") >>> >>> Below is my investigation result before, FYI. >>> the first bad commit (v4.18 ~ v4.19): >>> d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines >> >> Yes we should add a fixes tag for that also. > > So, what do I need to do here for the stable tree(s) to resolve this > issue? Nothing. Patch was re-sent with a stable tag so will go through like normal: https://lore.kernel.org/lkml/2ed432de-e904-85d2-5c36-5897ddc5b...@intel.com/
Re: [bug report][stable] perf probe: failed to add events
On Thu, Feb 28, 2019 at 09:19:08AM +0200, Adrian Hunter wrote: > On 28/02/19 4:07 AM, Joseph Qi wrote: > > Hi Adrian, > > > > On 19/2/27 20:39, Adrian Hunter wrote: > >> Seems to be fixed by this: > >> > >> From: Adrian Hunter > >> Date: Wed, 27 Feb 2019 05:35:25 +0200 > >> Subject: [PATCH] perf probe: Fix getting the kernel map > >> > >> Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86 > >> PTI entry trampolines"), perf tools has been creating more than one kernel > >> map, however 'perf probe' assumed there could be only one. > >> > >> Fix by using machine__kernel_map() to get the main kernel map. > >> > >> Signed-off-by: Adrian Hunter > >> Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI > >> entry trampolines") > > > > Below is my investigation result before, FYI. > > the first bad commit (v4.18 ~ v4.19): > > d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines > > Yes we should add a fixes tag for that also. So, what do I need to do here for the stable tree(s) to resolve this issue? thanks, greg k-h
Re: [bug report][stable] perf probe: failed to add events
Hi Adrian, On 19/2/28 15:19, Adrian Hunter wrote: > On 28/02/19 4:07 AM, Joseph Qi wrote: >> Hi Adrian, >> >> On 19/2/27 20:39, Adrian Hunter wrote: >>> Seems to be fixed by this: >>> >>> From: Adrian Hunter >>> Date: Wed, 27 Feb 2019 05:35:25 +0200 >>> Subject: [PATCH] perf probe: Fix getting the kernel map >>> >>> Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86 >>> PTI entry trampolines"), perf tools has been creating more than one kernel >>> map, however 'perf probe' assumed there could be only one. >>> >>> Fix by using machine__kernel_map() to get the main kernel map. >>> >>> Signed-off-by: Adrian Hunter >>> Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI >>> entry trampolines") >> >> Below is my investigation result before, FYI. >> the first bad commit (v4.18 ~ v4.19): >> d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines > > Yes we should add a fixes tag for that also. > Have you sent out the patch officially? I can't find it in the mail list. Thanks, Joseph >> revert this commit on 4.19.0, it works. >> the first good commit again (v4.19 ~ v4.20): >> bf904d2762ee x86/pti/64: Remove the SYSCALL64 entry trampoline >> backported this commit as well as the related commit 98f05b5138f0 on >> v4.19.24, it works. >> >> And I've tested your fix on v4.19.24, it also works. >> Tested-by: Joseph Qi >>
Re: [bug report][stable] perf probe: failed to add events
On 28/02/19 4:07 AM, Joseph Qi wrote: > Hi Adrian, > > On 19/2/27 20:39, Adrian Hunter wrote: >> Seems to be fixed by this: >> >> From: Adrian Hunter >> Date: Wed, 27 Feb 2019 05:35:25 +0200 >> Subject: [PATCH] perf probe: Fix getting the kernel map >> >> Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86 >> PTI entry trampolines"), perf tools has been creating more than one kernel >> map, however 'perf probe' assumed there could be only one. >> >> Fix by using machine__kernel_map() to get the main kernel map. >> >> Signed-off-by: Adrian Hunter >> Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI >> entry trampolines") > > Below is my investigation result before, FYI. > the first bad commit (v4.18 ~ v4.19): > d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines Yes we should add a fixes tag for that also. > revert this commit on 4.19.0, it works. > the first good commit again (v4.19 ~ v4.20): > bf904d2762ee x86/pti/64: Remove the SYSCALL64 entry trampoline > backported this commit as well as the related commit 98f05b5138f0 on > v4.19.24, it works. > > And I've tested your fix on v4.19.24, it also works. > Tested-by: Joseph Qi >
Re: [bug report][stable] perf probe: failed to add events
Hi Adrian, On 19/2/27 20:39, Adrian Hunter wrote: > Seems to be fixed by this: > > From: Adrian Hunter > Date: Wed, 27 Feb 2019 05:35:25 +0200 > Subject: [PATCH] perf probe: Fix getting the kernel map > > Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86 > PTI entry trampolines"), perf tools has been creating more than one kernel > map, however 'perf probe' assumed there could be only one. > > Fix by using machine__kernel_map() to get the main kernel map. > > Signed-off-by: Adrian Hunter > Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI entry > trampolines") Below is my investigation result before, FYI. the first bad commit (v4.18 ~ v4.19): d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines revert this commit on 4.19.0, it works. the first good commit again (v4.19 ~ v4.20): bf904d2762ee x86/pti/64: Remove the SYSCALL64 entry trampoline backported this commit as well as the related commit 98f05b5138f0 on v4.19.24, it works. And I've tested your fix on v4.19.24, it also works. Tested-by: Joseph Qi
Re: [bug report][stable] perf probe: failed to add events
On 26/02/19 4:20 PM, Arnaldo Carvalho de Melo wrote: > Em Tue, Feb 26, 2019 at 02:08:02PM +0100, Greg KH escreveu: >> On Tue, Feb 26, 2019 at 08:32:34PM +0800, Joseph Qi wrote: >>> >>> >>> On 19/2/26 17:05, Greg KH wrote: On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote: > Hi, > > I'm using kernel v4.19.24 and have found that there is an issue when > using perf probe to define a new dynamic tracepoint. > > $ perf probe -a handle_mm_fault > Failed to write event: Numerical result out of range > Error: Failed to add events. > > I've also tried kernel v4.20, and it can pass. Ick, has this ever worked on the 4.19 stable tree? If so, any chance you can run 'git bisect' to find the offending commit? >>> >From my test, v4.19.0 also has this issue. >>> Bisect locates that it is introduced by commit bf904d2762ee >>> "x86/pti/64: Remove the SYSCALL64 entry trampoline". >> >> But that commit was in 4.20, not 4.19. So if this never worked, it's >> not a regression? >> >> confused, > > Adrian, Ideas? > Seems to be fixed by this: From: Adrian Hunter Date: Wed, 27 Feb 2019 05:35:25 +0200 Subject: [PATCH] perf probe: Fix getting the kernel map Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI entry trampolines"), perf tools has been creating more than one kernel map, however 'perf probe' assumed there could be only one. Fix by using machine__kernel_map() to get the main kernel map. Signed-off-by: Adrian Hunter Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI entry trampolines") --- tools/perf/util/probe-event.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index e86f8be89157..6cd96f9b346d 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -157,8 +157,10 @@ static struct map *kernel_get_module_map(const char *module) if (module && strchr(module, '/')) return dso__new_map(module); - if (!module) - module = "kernel"; + if (!module) { + pos = machine__kernel_map(host_machine); + return map__get(pos); + } for (pos = maps__first(maps); pos; pos = map__next(pos)) { /* short_name is "[module]" */ -- 2.19.1
Re: [bug report][stable] perf probe: failed to add events
On 19/2/26 21:08, Greg KH wrote: > On Tue, Feb 26, 2019 at 08:32:34PM +0800, Joseph Qi wrote: >> >> >> On 19/2/26 17:05, Greg KH wrote: >>> On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote: Hi, I'm using kernel v4.19.24 and have found that there is an issue when using perf probe to define a new dynamic tracepoint. $ perf probe -a handle_mm_fault Failed to write event: Numerical result out of range Error: Failed to add events. I've also tried kernel v4.20, and it can pass. >>> >>> Ick, has this ever worked on the 4.19 stable tree? If so, any chance >>> you can run 'git bisect' to find the offending commit? >>> >> >From my test, v4.19.0 also has this issue. >> Bisect locates that it is introduced by commit bf904d2762ee >> "x86/pti/64: Remove the SYSCALL64 entry trampoline". > > But that commit was in 4.20, not 4.19. So if this never worked, it's > not a regression? > > confused, Oops, my fault. The first bad commit is: d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines It is between v4.18 and v4.19. > > greg k-h >
Re: [bug report][stable] perf probe: failed to add events
Em Tue, Feb 26, 2019 at 02:08:02PM +0100, Greg KH escreveu: > On Tue, Feb 26, 2019 at 08:32:34PM +0800, Joseph Qi wrote: > > > > > > On 19/2/26 17:05, Greg KH wrote: > > > On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote: > > >> Hi, > > >> > > >> I'm using kernel v4.19.24 and have found that there is an issue when > > >> using perf probe to define a new dynamic tracepoint. > > >> > > >> $ perf probe -a handle_mm_fault > > >> Failed to write event: Numerical result out of range > > >> Error: Failed to add events. > > >> > > >> I've also tried kernel v4.20, and it can pass. > > > > > > Ick, has this ever worked on the 4.19 stable tree? If so, any chance > > > you can run 'git bisect' to find the offending commit? > > > > > >From my test, v4.19.0 also has this issue. > > Bisect locates that it is introduced by commit bf904d2762ee > > "x86/pti/64: Remove the SYSCALL64 entry trampoline". > > But that commit was in 4.20, not 4.19. So if this never worked, it's > not a regression? > > confused, Adrian, Ideas? - Arnaldo
Re: [bug report][stable] perf probe: failed to add events
On Tue, Feb 26, 2019 at 08:32:34PM +0800, Joseph Qi wrote: > > > On 19/2/26 17:05, Greg KH wrote: > > On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote: > >> Hi, > >> > >> I'm using kernel v4.19.24 and have found that there is an issue when > >> using perf probe to define a new dynamic tracepoint. > >> > >> $ perf probe -a handle_mm_fault > >> Failed to write event: Numerical result out of range > >> Error: Failed to add events. > >> > >> I've also tried kernel v4.20, and it can pass. > > > > Ick, has this ever worked on the 4.19 stable tree? If so, any chance > > you can run 'git bisect' to find the offending commit? > > > >From my test, v4.19.0 also has this issue. > Bisect locates that it is introduced by commit bf904d2762ee > "x86/pti/64: Remove the SYSCALL64 entry trampoline". But that commit was in 4.20, not 4.19. So if this never worked, it's not a regression? confused, greg k-h
Re: [bug report][stable] perf probe: failed to add events
On 19/2/26 17:05, Greg KH wrote: > On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote: >> Hi, >> >> I'm using kernel v4.19.24 and have found that there is an issue when >> using perf probe to define a new dynamic tracepoint. >> >> $ perf probe -a handle_mm_fault >> Failed to write event: Numerical result out of range >> Error: Failed to add events. >> >> I've also tried kernel v4.20, and it can pass. > > Ick, has this ever worked on the 4.19 stable tree? If so, any chance > you can run 'git bisect' to find the offending commit? > >From my test, v4.19.0 also has this issue. Bisect locates that it is introduced by commit bf904d2762ee "x86/pti/64: Remove the SYSCALL64 entry trampoline". Thanks, Joseph >> So I've bisected and finally found the first good commit is: >> bf904d2762ee x86/pti/64: Remove the SYSCALL64 entry trampoline >> which is based on another commit: >> 98f05b5138f0 Use the TSS sp2 slot for SYSCALL/SYSRET scratch space >> >> Once I've backpoted these two commits into 4.19.24, the above case can >> pass, though I'm not sure how it is fixed. >> So is there any plan to let them go into stable as well? > > If they are needed, I'll gladly queue them up, but this feels like > something might have broken, so it should be easier to just revert the > offending commit instead. > > Andy, any ideas? > > thanks, > > greg k-h >
Re: [bug report][stable] perf probe: failed to add events
On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote: > Hi, > > I'm using kernel v4.19.24 and have found that there is an issue when > using perf probe to define a new dynamic tracepoint. > > $ perf probe -a handle_mm_fault > Failed to write event: Numerical result out of range > Error: Failed to add events. > > I've also tried kernel v4.20, and it can pass. Ick, has this ever worked on the 4.19 stable tree? If so, any chance you can run 'git bisect' to find the offending commit? > So I've bisected and finally found the first good commit is: > bf904d2762ee x86/pti/64: Remove the SYSCALL64 entry trampoline > which is based on another commit: > 98f05b5138f0 Use the TSS sp2 slot for SYSCALL/SYSRET scratch space > > Once I've backpoted these two commits into 4.19.24, the above case can > pass, though I'm not sure how it is fixed. > So is there any plan to let them go into stable as well? If they are needed, I'll gladly queue them up, but this feels like something might have broken, so it should be easier to just revert the offending commit instead. Andy, any ideas? thanks, greg k-h
Re: bug report: iwlwifi: mvm: support mac80211 TXQs model
On Wed, 2019-02-20 at 14:40 +, Colin Ian King wrote: > > ..when the used_hw_queues initialization was removed: > > @@ -360,8 +300,6 @@ int iwl_mvm_mac_ctxt_init(struct iwl_mvm *mvm, > struct ieee80211_vif *vif) > mvm->hw, IEEE80211_IFACE_ITER_RESUME_ALL, > iwl_mvm_mac_iface_iterator, ); > > - used_hw_queues = iwl_mvm_get_used_hw_queues(mvm, vif); > - > /* > * In the case we're getting here during resume, it's similar to > * firmware restart, and with RESUME_ALL the iterator will find > > > I'm not 100% sure if the right thing to do is just to now initialize > used_hw_queues to zero; it's not entirely clear what the initial value > should be now. My gut feeling says that we should just remove all of the code - we no longer use the vif->hw_queue stuff. Which also explains why we didn't notice any problems from this :) Something like diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c index aa308f7e7989..f90383943c62 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c @@ -263,8 +263,7 @@ int iwl_mvm_mac_ctxt_init(struct iwl_mvm *mvm, struct ieee80211_vif *vif) .found_vif = false, }; u32 ac; - int ret, i, queue_limit; - unsigned long used_hw_queues; + int ret, i; lockdep_assert_held(>mutex); @@ -341,38 +340,9 @@ int iwl_mvm_mac_ctxt_init(struct iwl_mvm *mvm, struct ieee80211_vif *vif) INIT_LIST_HEAD(>time_event_data.list); mvmvif->time_event_data.id = TE_MAX; - /* No need to allocate data queues to P2P Device MAC and NAN.*/ if (vif->type == NL80211_IFTYPE_P2P_DEVICE || - vif->type == NL80211_IFTYPE_NAN) { - for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) - vif->hw_queue[ac] = IEEE80211_INVAL_HW_QUEUE; - + vif->type == NL80211_IFTYPE_NAN) return 0; - } - - /* -* queues in mac80211 almost entirely independent of -* the ones here - no real limit -*/ - queue_limit = IEEE80211_MAX_QUEUES; - - /* -* Find available queues, and allocate them to the ACs. When in -* DQA-mode they aren't really used, and this is done only so the -* mac80211 ieee80211_check_queues() function won't fail -*/ - for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) { - u8 queue = find_first_zero_bit(_hw_queues, queue_limit); - - if (queue >= queue_limit) { - IWL_ERR(mvm, "Failed to allocate queue\n"); - ret = -EIO; - goto exit_fail; - } - - __set_bit(queue, _hw_queues); - vif->hw_queue[ac] = queue; - } /* Allocate the CAB queue for softAP and GO interfaces */ if (vif->type == NL80211_IFTYPE_AP || I'll think about it a bit more and test it out, thanks for the report! johannes
Re: [bug report] Input: add st-keyscan driver
Hi all, I prefer to fix it. I guess people used their own correction. I will send you a fix asap. Best Regards Gabriel On 1/27/19 2:20 AM, Ken Sloat wrote: > On Sat, Jan 26, 2019 at 5:15 PM Dmitry Torokhov > wrote: >> On Sat, Jan 26, 2019 at 1:25 PM Ken Sloat >> wrote: >>> On Tue, Jan 22, 2019 at 1:53 PM Dan Carpenter >>> wrote: Hello Gabriel FERNANDEZ, >>> Hello Dan, >>> >>> I have added CCs for the maintainers as well as Gabriel Fernandez as >>> currently you just have the linux-input mailing list >>> The patch 062589b13991: "Input: add st-keyscan driver" from Apr 12, 2014, leads to the following static checker warning: drivers/input/keyboard/st-keyscan.c:156 keyscan_probe() error: potential zalloc NULL dereference: 'keypad_data->input_dev' drivers/input/keyboard/st-keyscan.c 125 static int keyscan_probe(struct platform_device *pdev) 126 { 127 struct st_keyscan *keypad_data; 128 struct input_dev *input_dev; 129 struct resource *res; 130 int error; 131 132 if (!pdev->dev.of_node) { 133 dev_err(>dev, "no DT data present\n"); 134 return -EINVAL; 135 } 136 137 keypad_data = devm_kzalloc(>dev, sizeof(*keypad_data), 138GFP_KERNEL); 139 if (!keypad_data) 140 return -ENOMEM; 141 142 input_dev = devm_input_allocate_device(>dev); 143 if (!input_dev) { 144 dev_err(>dev, "failed to allocate the input device\n"); 145 return -ENOMEM; 146 } 147 148 input_dev->name = pdev->name; 149 input_dev->phys = "keyscan-keys/input0"; 150 input_dev->dev.parent = >dev; 151 input_dev->open = keyscan_open; 152 input_dev->close = keyscan_close; 153 154 input_dev->id.bustype = BUS_HOST; 155 --> 156 error = keypad_matrix_key_parse_dt(keypad_data); ^^^ >>> I agree with you this would be a problem >>> to clarify the NULL derefence would occur here within >>> keypad_matrix_key_parse_dt >>> >>> struct device *dev = keypad_data->input_dev->dev.parent; >>> This assumes we have set "keypad_data->input_dev = input_dev;" but we don't do that until... 157 if (error) 158 return error; 159 160 error = matrix_keypad_build_keymap(NULL, NULL, 161keypad_data->n_rows, 162keypad_data->n_cols, 163NULL, input_dev); 164 if (error) { 165 dev_err(>dev, "failed to build keymap\n"); 166 return error; 167 } 168 169 input_set_drvdata(input_dev, keypad_data); 170 171 keypad_data->input_dev = input_dev; ^^ this line here. This driver has never worked and it was included almost five years ago. Is it worth fixing? 172 173 res = platform_get_resource(pdev, IORESOURCE_MEM, 0); 174 keypad_data->base = devm_ioremap_resource(>dev, res); 175 if (IS_ERR(keypad_data->base)) 176 return PTR_ERR(keypad_data->base); 177 regards, dan carpenter >>> Here is the interesting thing, I was looking on patchwork, and several >>> of the patches including what appears to be the latest actually set >>> "keypad_data->input_dev = input_dev" before calling >>> "keypad_matrix_key_parse_dt" >>> >>> From v4 on patchwork >>> + if (IS_ERR(keypad_data->clk)) { >>> + dev_err(>dev, "cannot get clock"); >>> + return PTR_ERR(keypad_data->clk); >>> + } >>> + >>> + keypad_data->input_dev = input_dev; >>> + >>> + input_dev->name = pdev->name; >>> + input_dev->phys = "keyscan-keys/input0"; >>> + input_dev->dev.parent = >dev; >>> + input_dev->open = keyscan_open; >>> + input_dev->close = keyscan_close; >>> + >>> + input_dev->id.bustype = BUS_HOST; >>> + >>> + error = keypad_matrix_key_parse_dt(keypad_data); >>> >>> According to patchwork, these aren't listed as accepted, so I'm not >>> sure where the exact accepted patch came from. Looking at the commit >>> log, it looks like the issue you showed above was made in the original >>> commit 062589b1399176a9c14bc68e16169f40439d658c so I'm not quite sure >>> what is
Re: [bug report] Input: add st-keyscan driver
On Sat, Jan 26, 2019 at 5:15 PM Dmitry Torokhov wrote: > > On Sat, Jan 26, 2019 at 1:25 PM Ken Sloat > wrote: > > > > On Tue, Jan 22, 2019 at 1:53 PM Dan Carpenter > > wrote: > > > > > > Hello Gabriel FERNANDEZ, > > > > Hello Dan, > > > > I have added CCs for the maintainers as well as Gabriel Fernandez as > > currently you just have the linux-input mailing list > > > > > The patch 062589b13991: "Input: add st-keyscan driver" from Apr 12, > > > 2014, leads to the following static checker warning: > > > > > > drivers/input/keyboard/st-keyscan.c:156 keyscan_probe() > > > error: potential zalloc NULL dereference: 'keypad_data->input_dev' > > > > > > drivers/input/keyboard/st-keyscan.c > > > 125 static int keyscan_probe(struct platform_device *pdev) > > > 126 { > > > 127 struct st_keyscan *keypad_data; > > > 128 struct input_dev *input_dev; > > > 129 struct resource *res; > > > 130 int error; > > > 131 > > > 132 if (!pdev->dev.of_node) { > > > 133 dev_err(>dev, "no DT data present\n"); > > > 134 return -EINVAL; > > > 135 } > > > 136 > > > 137 keypad_data = devm_kzalloc(>dev, > > > sizeof(*keypad_data), > > > 138GFP_KERNEL); > > > 139 if (!keypad_data) > > > 140 return -ENOMEM; > > > 141 > > > 142 input_dev = devm_input_allocate_device(>dev); > > > 143 if (!input_dev) { > > > 144 dev_err(>dev, "failed to allocate the input > > > device\n"); > > > 145 return -ENOMEM; > > > 146 } > > > 147 > > > 148 input_dev->name = pdev->name; > > > 149 input_dev->phys = "keyscan-keys/input0"; > > > 150 input_dev->dev.parent = >dev; > > > 151 input_dev->open = keyscan_open; > > > 152 input_dev->close = keyscan_close; > > > 153 > > > 154 input_dev->id.bustype = BUS_HOST; > > > 155 > > > --> 156 error = keypad_matrix_key_parse_dt(keypad_data); > > >^^^ > > I agree with you this would be a problem > > to clarify the NULL derefence would occur here within > > keypad_matrix_key_parse_dt > > > > struct device *dev = keypad_data->input_dev->dev.parent; > > > > > This assumes we have set "keypad_data->input_dev = input_dev;" but we > > > don't do that until... > > > > > > 157 if (error) > > > 158 return error; > > > 159 > > > 160 error = matrix_keypad_build_keymap(NULL, NULL, > > > 161keypad_data->n_rows, > > > 162keypad_data->n_cols, > > > 163NULL, input_dev); > > > 164 if (error) { > > > 165 dev_err(>dev, "failed to build keymap\n"); > > > 166 return error; > > > 167 } > > > 168 > > > 169 input_set_drvdata(input_dev, keypad_data); > > > 170 > > > 171 keypad_data->input_dev = input_dev; > > > ^^ > > > > > > this line here. This driver has never worked and it was included almost > > > five years ago. Is it worth fixing? > > > > > > 172 > > > 173 res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > > > 174 keypad_data->base = devm_ioremap_resource(>dev, > > > res); > > > 175 if (IS_ERR(keypad_data->base)) > > > 176 return PTR_ERR(keypad_data->base); > > > 177 > > > > > > regards, > > > dan carpenter > > > > > > > Here is the interesting thing, I was looking on patchwork, and several > > of the patches including what appears to be the latest actually set > > "keypad_data->input_dev = input_dev" before calling > > "keypad_matrix_key_parse_dt" > > > > From v4 on patchwork > > + if (IS_ERR(keypad_data->clk)) { > > + dev_err(>dev, "cannot get clock"); > > + return PTR_ERR(keypad_data->clk); > > + } > > + > > + keypad_data->input_dev = input_dev; > > + > > + input_dev->name = pdev->name; > > + input_dev->phys = "keyscan-keys/input0"; > > + input_dev->dev.parent = >dev; > > + input_dev->open = keyscan_open; > > + input_dev->close = keyscan_close; > > + > > + input_dev->id.bustype = BUS_HOST; > > + > > + error = keypad_matrix_key_parse_dt(keypad_data); > > > > According to patchwork, these aren't listed as accepted, so I'm not > > sure where the exact accepted patch came from. Looking at the commit > > log, it looks like the issue you showed above was made in the original > > commit 062589b1399176a9c14bc68e16169f40439d658c so I'm not quite sure > > what is going on here. Maybe the maintainer can chime in with the > > original patch/mailing list discussion on this. For reference, I've > >
Re: [bug report] Input: add st-keyscan driver
On Sat, Jan 26, 2019 at 1:25 PM Ken Sloat wrote: > > On Tue, Jan 22, 2019 at 1:53 PM Dan Carpenter > wrote: > > > > Hello Gabriel FERNANDEZ, > > Hello Dan, > > I have added CCs for the maintainers as well as Gabriel Fernandez as > currently you just have the linux-input mailing list > > > The patch 062589b13991: "Input: add st-keyscan driver" from Apr 12, > > 2014, leads to the following static checker warning: > > > > drivers/input/keyboard/st-keyscan.c:156 keyscan_probe() > > error: potential zalloc NULL dereference: 'keypad_data->input_dev' > > > > drivers/input/keyboard/st-keyscan.c > > 125 static int keyscan_probe(struct platform_device *pdev) > > 126 { > > 127 struct st_keyscan *keypad_data; > > 128 struct input_dev *input_dev; > > 129 struct resource *res; > > 130 int error; > > 131 > > 132 if (!pdev->dev.of_node) { > > 133 dev_err(>dev, "no DT data present\n"); > > 134 return -EINVAL; > > 135 } > > 136 > > 137 keypad_data = devm_kzalloc(>dev, sizeof(*keypad_data), > > 138GFP_KERNEL); > > 139 if (!keypad_data) > > 140 return -ENOMEM; > > 141 > > 142 input_dev = devm_input_allocate_device(>dev); > > 143 if (!input_dev) { > > 144 dev_err(>dev, "failed to allocate the input > > device\n"); > > 145 return -ENOMEM; > > 146 } > > 147 > > 148 input_dev->name = pdev->name; > > 149 input_dev->phys = "keyscan-keys/input0"; > > 150 input_dev->dev.parent = >dev; > > 151 input_dev->open = keyscan_open; > > 152 input_dev->close = keyscan_close; > > 153 > > 154 input_dev->id.bustype = BUS_HOST; > > 155 > > --> 156 error = keypad_matrix_key_parse_dt(keypad_data); > >^^^ > I agree with you this would be a problem > to clarify the NULL derefence would occur here within > keypad_matrix_key_parse_dt > > struct device *dev = keypad_data->input_dev->dev.parent; > > > This assumes we have set "keypad_data->input_dev = input_dev;" but we > > don't do that until... > > > > 157 if (error) > > 158 return error; > > 159 > > 160 error = matrix_keypad_build_keymap(NULL, NULL, > > 161keypad_data->n_rows, > > 162keypad_data->n_cols, > > 163NULL, input_dev); > > 164 if (error) { > > 165 dev_err(>dev, "failed to build keymap\n"); > > 166 return error; > > 167 } > > 168 > > 169 input_set_drvdata(input_dev, keypad_data); > > 170 > > 171 keypad_data->input_dev = input_dev; > > ^^ > > > > this line here. This driver has never worked and it was included almost > > five years ago. Is it worth fixing? > > > > 172 > > 173 res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > > 174 keypad_data->base = devm_ioremap_resource(>dev, res); > > 175 if (IS_ERR(keypad_data->base)) > > 176 return PTR_ERR(keypad_data->base); > > 177 > > > > regards, > > dan carpenter > > > > Here is the interesting thing, I was looking on patchwork, and several > of the patches including what appears to be the latest actually set > "keypad_data->input_dev = input_dev" before calling > "keypad_matrix_key_parse_dt" > > From v4 on patchwork > + if (IS_ERR(keypad_data->clk)) { > + dev_err(>dev, "cannot get clock"); > + return PTR_ERR(keypad_data->clk); > + } > + > + keypad_data->input_dev = input_dev; > + > + input_dev->name = pdev->name; > + input_dev->phys = "keyscan-keys/input0"; > + input_dev->dev.parent = >dev; > + input_dev->open = keyscan_open; > + input_dev->close = keyscan_close; > + > + input_dev->id.bustype = BUS_HOST; > + > + error = keypad_matrix_key_parse_dt(keypad_data); > > According to patchwork, these aren't listed as accepted, so I'm not > sure where the exact accepted patch came from. Looking at the commit > log, it looks like the issue you showed above was made in the original > commit 062589b1399176a9c14bc68e16169f40439d658c so I'm not quite sure > what is going on here. Maybe the maintainer can chime in with the > original patch/mailing list discussion on this. For reference, I've > added the patchwork links below > https://patchwork.kernel.org/patch/3854341/ > https://patchwork.kernel.org/patch/3968891/ > https://patchwork.kernel.org/patch/3969991/ It may very well be that I messed up when applying the patch. I guess whatever platform that is using the driver has not attempted to update their
Re: [bug report] Input: add st-keyscan driver
On Tue, Jan 22, 2019 at 1:53 PM Dan Carpenter wrote: > > Hello Gabriel FERNANDEZ, Hello Dan, I have added CCs for the maintainers as well as Gabriel Fernandez as currently you just have the linux-input mailing list > The patch 062589b13991: "Input: add st-keyscan driver" from Apr 12, > 2014, leads to the following static checker warning: > > drivers/input/keyboard/st-keyscan.c:156 keyscan_probe() > error: potential zalloc NULL dereference: 'keypad_data->input_dev' > > drivers/input/keyboard/st-keyscan.c > 125 static int keyscan_probe(struct platform_device *pdev) > 126 { > 127 struct st_keyscan *keypad_data; > 128 struct input_dev *input_dev; > 129 struct resource *res; > 130 int error; > 131 > 132 if (!pdev->dev.of_node) { > 133 dev_err(>dev, "no DT data present\n"); > 134 return -EINVAL; > 135 } > 136 > 137 keypad_data = devm_kzalloc(>dev, sizeof(*keypad_data), > 138GFP_KERNEL); > 139 if (!keypad_data) > 140 return -ENOMEM; > 141 > 142 input_dev = devm_input_allocate_device(>dev); > 143 if (!input_dev) { > 144 dev_err(>dev, "failed to allocate the input > device\n"); > 145 return -ENOMEM; > 146 } > 147 > 148 input_dev->name = pdev->name; > 149 input_dev->phys = "keyscan-keys/input0"; > 150 input_dev->dev.parent = >dev; > 151 input_dev->open = keyscan_open; > 152 input_dev->close = keyscan_close; > 153 > 154 input_dev->id.bustype = BUS_HOST; > 155 > --> 156 error = keypad_matrix_key_parse_dt(keypad_data); >^^^ I agree with you this would be a problem to clarify the NULL derefence would occur here within keypad_matrix_key_parse_dt struct device *dev = keypad_data->input_dev->dev.parent; > This assumes we have set "keypad_data->input_dev = input_dev;" but we > don't do that until... > > 157 if (error) > 158 return error; > 159 > 160 error = matrix_keypad_build_keymap(NULL, NULL, > 161keypad_data->n_rows, > 162keypad_data->n_cols, > 163NULL, input_dev); > 164 if (error) { > 165 dev_err(>dev, "failed to build keymap\n"); > 166 return error; > 167 } > 168 > 169 input_set_drvdata(input_dev, keypad_data); > 170 > 171 keypad_data->input_dev = input_dev; > ^^ > > this line here. This driver has never worked and it was included almost > five years ago. Is it worth fixing? > > 172 > 173 res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > 174 keypad_data->base = devm_ioremap_resource(>dev, res); > 175 if (IS_ERR(keypad_data->base)) > 176 return PTR_ERR(keypad_data->base); > 177 > > regards, > dan carpenter > Here is the interesting thing, I was looking on patchwork, and several of the patches including what appears to be the latest actually set "keypad_data->input_dev = input_dev" before calling "keypad_matrix_key_parse_dt" >From v4 on patchwork + if (IS_ERR(keypad_data->clk)) { + dev_err(>dev, "cannot get clock"); + return PTR_ERR(keypad_data->clk); + } + + keypad_data->input_dev = input_dev; + + input_dev->name = pdev->name; + input_dev->phys = "keyscan-keys/input0"; + input_dev->dev.parent = >dev; + input_dev->open = keyscan_open; + input_dev->close = keyscan_close; + + input_dev->id.bustype = BUS_HOST; + + error = keypad_matrix_key_parse_dt(keypad_data); According to patchwork, these aren't listed as accepted, so I'm not sure where the exact accepted patch came from. Looking at the commit log, it looks like the issue you showed above was made in the original commit 062589b1399176a9c14bc68e16169f40439d658c so I'm not quite sure what is going on here. Maybe the maintainer can chime in with the original patch/mailing list discussion on this. For reference, I've added the patchwork links below https://patchwork.kernel.org/patch/3854341/ https://patchwork.kernel.org/patch/3968891/ https://patchwork.kernel.org/patch/3969991/ Thanks, Ken Sloat
Re: Bug report: unaligned access with ext4 encryption
Hi, On Thu, Jan 10, 2019 at 03:01:14PM -0800, Eric Biggers wrote: > On Fri, Jan 11, 2019 at 12:29:28AM +0200, Aaro Koskinen wrote: > > Hi, > > > > On Fri, Jan 04, 2019 at 05:28:02PM +, David Howells wrote: > > > Eric Biggers wrote: > > > > Hi Aaro, thanks for the bug report! I think you're on the right track; > > > > it makes > > > > much more sense to have the keyrings subsystem store the payload with > > > > better > > > > alignment, than to work around the 2-byte alignment in fscrypt. > > > > > > > > But how about '__aligned(__alignof__(u64))' instead? 4 bytes may not > > > > be enough. > > > > > > > > David, what do you think? > > > > > > Does that even work? > > > > That should work. > > > > > Might be better to just insert 6 bytes of padding with a comment, but yes > > > I > > > agree that it's probably better to align it to at least machine word size. > > > > Padding is fragile, e.g. if struct rcu_head changes. Using __aligned should > > make it always right automatically. > > > > A. > > I agree that __aligned is better. It should work; see 'struct crypto_tfm' in > include/linux/crypto.h for another example of a struct that uses __aligned on > a > flexible array at the end. > > Aaro, can you send a formal patch? If you don't I'll do so, but I figure I'll > ask first. Please go ahead; I'd prefer if you send the patch, I will then test it on SPARC and reply with Tested-by (if it works :). A.
Re: Bug report: unaligned access with ext4 encryption
On Fri, Jan 11, 2019 at 12:29:28AM +0200, Aaro Koskinen wrote: > Hi, > > On Fri, Jan 04, 2019 at 05:28:02PM +, David Howells wrote: > > Eric Biggers wrote: > > > Hi Aaro, thanks for the bug report! I think you're on the right track; > > > it makes > > > much more sense to have the keyrings subsystem store the payload with > > > better > > > alignment, than to work around the 2-byte alignment in fscrypt. > > > > > > But how about '__aligned(__alignof__(u64))' instead? 4 bytes may not be > > > enough. > > > > > > David, what do you think? > > > > Does that even work? > > That should work. > > > Might be better to just insert 6 bytes of padding with a comment, but yes I > > agree that it's probably better to align it to at least machine word size. > > Padding is fragile, e.g. if struct rcu_head changes. Using __aligned should > make it always right automatically. > > A. I agree that __aligned is better. It should work; see 'struct crypto_tfm' in include/linux/crypto.h for another example of a struct that uses __aligned on a flexible array at the end. Aaro, can you send a formal patch? If you don't I'll do so, but I figure I'll ask first. Thanks, - Eric
Re: Bug report: unaligned access with ext4 encryption
Hi, On Fri, Jan 04, 2019 at 05:28:02PM +, David Howells wrote: > Eric Biggers wrote: > > Hi Aaro, thanks for the bug report! I think you're on the right track; it > > makes > > much more sense to have the keyrings subsystem store the payload with better > > alignment, than to work around the 2-byte alignment in fscrypt. > > > > But how about '__aligned(__alignof__(u64))' instead? 4 bytes may not be > > enough. > > > > David, what do you think? > > Does that even work? That should work. > Might be better to just insert 6 bytes of padding with a comment, but yes I > agree that it's probably better to align it to at least machine word size. Padding is fragile, e.g. if struct rcu_head changes. Using __aligned should make it always right automatically. A.
Re: Bug report: unaligned access with ext4 encryption
Eric Biggers wrote: > Hi Aaro, thanks for the bug report! I think you're on the right track; it > makes > much more sense to have the keyrings subsystem store the payload with better > alignment, than to work around the 2-byte alignment in fscrypt. > > But how about '__aligned(__alignof__(u64))' instead? 4 bytes may not be > enough. > > David, what do you think? Does that even work? Might be better to just insert 6 bytes of padding with a comment, but yes I agree that it's probably better to align it to at least machine word size. David
Re: Bug report: unaligned access with ext4 encryption
On Sun, Dec 30, 2018 at 06:29:06PM +0200, Aaro Koskinen wrote: > Hi, > > When using ext4 encryption on SPARC, there's plenty of dmesg noise about > unaligned access: > > [ 167.269526] Kernel unaligned access at TPC[5497a0] > find_and_lock_process_key+0x80/0x120 > [ 167.270152] Kernel unaligned access at TPC[5497a0] > find_and_lock_process_key+0x80/0x120 > [ 181.087451] log_unaligned: 5 callbacks suppressed > [ 181.087511] Kernel unaligned access at TPC[5497a0] > find_and_lock_process_key+0x80/0x120 > [ 181.092435] Kernel unaligned access at TPC[5497a0] > find_and_lock_process_key+0x80/0x120 > [ 181.095816] Kernel unaligned access at TPC[5497a0] > find_and_lock_process_key+0x80/0x120 > > And also seen on an ARM machine: > > $ cat /proc/cpu/alignment > User: 0 > System: 1028193 (find_and_lock_process_key+0x84/0x10c) > Skipped:0 > Half: 0 > Word: 1028193 > DWord: 0 > Multi: 0 > User faults:0 (ignored) > > Looks like user_key_payload layout is not optimal when data address > is used for fscrypt_key... I tried the below change and got rid of the > messages. Not sure what the proper fix should be? > > A. > > diff --git a/include/keys/user-type.h b/include/keys/user-type.h > index e098cbe27db5..6495ffcfe510 100644 > --- a/include/keys/user-type.h > +++ b/include/keys/user-type.h > @@ -31,7 +31,7 @@ > struct user_key_payload { > struct rcu_head rcu;/* RCU destructor */ > unsigned short datalen;/* length of this data */ > - chardata[0];/* actual data */ > + char data[0] __aligned(4); /* actual data */ > }; > > extern struct key_type key_type_user; > Hi Aaro, thanks for the bug report! I think you're on the right track; it makes much more sense to have the keyrings subsystem store the payload with better alignment, than to work around the 2-byte alignment in fscrypt. But how about '__aligned(__alignof__(u64))' instead? 4 bytes may not be enough. David, what do you think? - Eric
Re: bug report: hugetlbfs: use i_mmap_rwsem for more pmd sharing, synchronization
On 12/27/18 6:45 PM, Andrew Morton wrote: > On Thu, 27 Dec 2018 11:24:31 -0800 Mike Kravetz > wrote: >> It would be better to make an explicit check for mapping != null before >> calling i_mmap_lock_write/try_to_unmap. In this way, unrelated changes to >> code above will not potentially lead to the possibility of mapping == null. >> >> I'm not sure what is the best way to handle this. Below is an updated >> version >> of the patch sent to Andrew. I can also provide a simple patch to the patch >> if that is easier. >> > > Below is the delta. Please check it. It seems to do more than the > above implies. > > Also, I have notes here that > > hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization.patch > and > hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race.patch > > have additional updates pending. Due to emails such as > > http://lkml.kernel.org/r/849f5202-2200-265f-7769-8363053e8...@oracle.com > http://lkml.kernel.org/r/732c0b7d-5a4e-97a8-9677-30f352089...@oracle.com > http://lkml.kernel.org/r/6b91dd42-b903-1f6c-729a-bd9f51273...@oracle.com > > What's the status, please? > There was a V3 of the patches which was Acked-by Kirill. See, http://lkml.kernel.org/r/20181224101349.jjjmk2hzwah6g64h@kshutemo-mobl1 The two V3 patches are: http://lkml.kernel.org/r/2018123013.22193-2-mike.krav...@oracle.com http://lkml.kernel.org/r/2018123013.22193-3-mike.krav...@oracle.com The patch I sent in this thread was an update to the V3. The delta you created was based on V2. So, the delta contains V2 -> V3 changes as well as the changes mentioned in this thread. My apologies for not noticing and clarifying. Let me know what you would like me to do to help. I hate to send any more patches right now as they might cause more confusion. -- Mike Kravetz
Re: bug report: hugetlbfs: use i_mmap_rwsem for more pmd sharing, synchronization
On Thu, 27 Dec 2018 11:24:31 -0800 Mike Kravetz wrote: > On 12/27/18 3:44 AM, Colin Ian King wrote: > > Hi, > > > > Static analysis with CoverityScan on linux-next detected a potential > > null pointer dereference with the following commit: > > > > From d8a1051ed4ba55679ef24e838a1942c9c40f0a14 Mon Sep 17 00:00:00 2001 > > From: Mike Kravetz > > Date: Sat, 22 Dec 2018 10:55:57 +1100 > > Subject: [PATCH] hugetlbfs: use i_mmap_rwsem for more pmd sharing > > > > The earlier check implies that "mapping" may be a null pointer: > > > > var_compare_op: Comparing mapping to null implies that mapping might be > > null. > > > > 1008if (!(flags & MF_MUST_KILL) && !PageDirty(hpage) && mapping && > > 1009mapping_cap_writeback_dirty(mapping)) { > > > > ..however later "mapper" is dereferenced when it may be potentially null: > > > > 1034/* > > 1035 * For hugetlb pages, try_to_unmap could potentially > > call > > 1036 * huge_pmd_unshare. Because of this, take semaphore in > > 1037 * write mode here and set TTU_RMAP_LOCKED to > > indicate we > > 1038 * have taken the lock at this higer level. > > 1039 */ > > CID 1476097 (#1 of 1): Dereference after null check (FORWARD_NULL) > > > > var_deref_model: Passing null pointer mapping to > > i_mmap_lock_write, which dereferences it. > > > > 1040i_mmap_lock_write(mapping); > > 1041unmap_success = try_to_unmap(hpage, > > ttu|TTU_RMAP_LOCKED); > > 1042i_mmap_unlock_write(mapping); > > > > Thanks for the report. > > The 'good news' is that mapping can not be null in the code path above. > The reasons are: > - The page is locked upon entry to the routine > - Earlier in the routine there is the check: > if (!page_mapped(hpage)) > return true; > For huge pages (which are processed in the else clause above), page_mapped > implies page->mapping != null. > > However, the routine hwpoison_user_mappings handles all page types. The > page_mapped check is actually there to check for pages in the swap cache. > It is just coincidence that it also implies mapping != null for huge pages. > > It would be better to make an explicit check for mapping != null before > calling i_mmap_lock_write/try_to_unmap. In this way, unrelated changes to > code above will not potentially lead to the possibility of mapping == null. > > I'm not sure what is the best way to handle this. Below is an updated version > of the patch sent to Andrew. I can also provide a simple patch to the patch > if that is easier. > Below is the delta. Please check it. It seems to do more than the above implies. Also, I have notes here that hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization.patch and hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race.patch have additional updates pending. Due to emails such as http://lkml.kernel.org/r/849f5202-2200-265f-7769-8363053e8...@oracle.com http://lkml.kernel.org/r/732c0b7d-5a4e-97a8-9677-30f352089...@oracle.com http://lkml.kernel.org/r/6b91dd42-b903-1f6c-729a-bd9f51273...@oracle.com What's the status, please? From: Mike Kravetz Subject: hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization-fix It would be better to make an explicit check for mapping != null before calling i_mmap_lock_write/try_to_unmap. In this way, unrelated changes to code above will not potentially lead to the possibility of mapping == null. Signed-off-by: Mike Kravetz Cc: Michal Hocko Cc: Hugh Dickins Cc: Naoya Horiguchi Cc: "Aneesh Kumar K . V" Cc: Andrea Arcangeli Cc: "Kirill A . Shutemov" Cc: Davidlohr Bueso Cc: Prakash Sangappa Cc: Colin Ian King Signed-off-by: Andrew Morton --- --- a/mm/hugetlb.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization-fix +++ a/mm/hugetlb.c @@ -3250,6 +3250,14 @@ int copy_hugetlb_page_range(struct mm_st mmu_notifier_range_init(, src, vma->vm_start, vma->vm_end, MMU_NOTIFY_CLEAR); mmu_notifier_invalidate_range_start(); + } else { + /* +* For shared mappings i_mmap_rwsem must be held to call +* huge_pte_alloc, otherwise the returned ptep could go +* away if part of a shared pmd and another thread calls +* huge_pmd_unshare. +*/ + i_mmap_lock_read(mapping); } for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) { @@ -3259,18 +3267,8 @@ int copy_hugetlb_page_range(struct mm_st if (!src_pte) continue; - /* -* i_mmap_rwsem must be held to call huge_pte_alloc. -* Continue to hold until finished with dst_pte, otherwise -* it could go away if part of a shared pmd. -* -*
Re: bug report: hugetlbfs: use i_mmap_rwsem for more pmd sharing, synchronization
On 12/27/18 3:44 AM, Colin Ian King wrote: > Hi, > > Static analysis with CoverityScan on linux-next detected a potential > null pointer dereference with the following commit: > > From d8a1051ed4ba55679ef24e838a1942c9c40f0a14 Mon Sep 17 00:00:00 2001 > From: Mike Kravetz > Date: Sat, 22 Dec 2018 10:55:57 +1100 > Subject: [PATCH] hugetlbfs: use i_mmap_rwsem for more pmd sharing > > The earlier check implies that "mapping" may be a null pointer: > > var_compare_op: Comparing mapping to null implies that mapping might be > null. > > 1008if (!(flags & MF_MUST_KILL) && !PageDirty(hpage) && mapping && > 1009mapping_cap_writeback_dirty(mapping)) { > > ..however later "mapper" is dereferenced when it may be potentially null: > > 1034/* > 1035 * For hugetlb pages, try_to_unmap could potentially > call > 1036 * huge_pmd_unshare. Because of this, take semaphore in > 1037 * write mode here and set TTU_RMAP_LOCKED to > indicate we > 1038 * have taken the lock at this higer level. > 1039 */ > CID 1476097 (#1 of 1): Dereference after null check (FORWARD_NULL) > > var_deref_model: Passing null pointer mapping to > i_mmap_lock_write, which dereferences it. > > 1040i_mmap_lock_write(mapping); > 1041unmap_success = try_to_unmap(hpage, > ttu|TTU_RMAP_LOCKED); > 1042i_mmap_unlock_write(mapping); > Thanks for the report. The 'good news' is that mapping can not be null in the code path above. The reasons are: - The page is locked upon entry to the routine - Earlier in the routine there is the check: if (!page_mapped(hpage)) return true; For huge pages (which are processed in the else clause above), page_mapped implies page->mapping != null. However, the routine hwpoison_user_mappings handles all page types. The page_mapped check is actually there to check for pages in the swap cache. It is just coincidence that it also implies mapping != null for huge pages. It would be better to make an explicit check for mapping != null before calling i_mmap_lock_write/try_to_unmap. In this way, unrelated changes to code above will not potentially lead to the possibility of mapping == null. I'm not sure what is the best way to handle this. Below is an updated version of the patch sent to Andrew. I can also provide a simple patch to the patch if that is easier. From: Mike Kravetz hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization While looking at BUGs associated with invalid huge page map counts, it was discovered and observed that a huge pte pointer could become 'invalid' and point to another task's page table. Consider the following: A task takes a page fault on a shared hugetlbfs file and calls huge_pte_alloc to get a ptep. Suppose the returned ptep points to a shared pmd. Now, another task truncates the hugetlbfs file. As part of truncation, it unmaps everyone who has the file mapped. If the range being truncated is covered by a shared pmd, huge_pmd_unshare will be called. For all but the last user of the shared pmd, huge_pmd_unshare will clear the pud pointing to the pmd. If the task in the middle of the page fault is not the last user, the ptep returned by huge_pte_alloc now points to another task's page table or worse. This leads to bad things such as incorrect page map/reference counts or invalid memory references. To fix, expand the use of i_mmap_rwsem as follows: - i_mmap_rwsem is held in read mode whenever huge_pmd_share is called. huge_pmd_share is only called via huge_pte_alloc, so callers of huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers of huge_pte_alloc continue to hold the semaphore until finished with the ptep. - i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called. Cc: Fixes: 39dde65c9940 ("shared page table for hugetlb page") Signed-off-by: Mike Kravetz --- mm/hugetlb.c| 67 ++--- mm/memory-failure.c | 16 +-- mm/migrate.c| 13 - mm/rmap.c | 4 +++ mm/userfaultfd.c| 11 ++-- 5 files changed, 90 insertions(+), 21 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 309fb8c969af..2a3162030167 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3239,6 +3239,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, int cow; struct hstate *h = hstate_vma(vma); unsigned long sz = huge_page_size(h); + struct address_space *mapping = vma->vm_file->f_mapping; unsigned long mmun_start; /* For mmu_notifiers */ unsigned long mmun_end; /* For mmu_notifiers */ int ret = 0; @@ -3247,14 +3248,25 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, mmun_start = vma->vm_start; mmun_end = vma->vm_end; - if (cow) + if
Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE
On Wed, 17 Oct 2018 at 16:50, Aaro Koskinen wrote: > > Hi, > > On Wed, Oct 17, 2018 at 03:38:07PM +0200, Mathieu Malaterre wrote: > > Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292 > > could you confirm that the original mmc driver (kernel from imgtech > > people) did work ok with PREEMPT_NONE (sorry I did not do my homework) > > ? > > Sorry, cannot confirm or test that. I have only used the mainline kernel > on this board, since v4.5 or so with my own custom config which has > been PREEMPT_NONE until now. > Aaro, I spent some time today investigating this issue. I think this driver has a broken pre-request/post-request implementation. Will post some patches soon. Thanks for the report, -- Ezequiel García, VanguardiaSur www.vanguardiasur.com.ar
Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE
On Wed, 17 Oct 2018 at 16:50, Aaro Koskinen wrote: > > Hi, > > On Wed, Oct 17, 2018 at 03:38:07PM +0200, Mathieu Malaterre wrote: > > Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292 > > could you confirm that the original mmc driver (kernel from imgtech > > people) did work ok with PREEMPT_NONE (sorry I did not do my homework) > > ? > > Sorry, cannot confirm or test that. I have only used the mainline kernel > on this board, since v4.5 or so with my own custom config which has > been PREEMPT_NONE until now. > Aaro, I spent some time today investigating this issue. I think this driver has a broken pre-request/post-request implementation. Will post some patches soon. Thanks for the report, -- Ezequiel García, VanguardiaSur www.vanguardiasur.com.ar
Re: [bug report] armada-8040-mcbin: 4.20-rc1 boot failure
Hi Sergey, Gregory, Sergey Matyukevich writes: > MacchiatoBin board fails to boot v4.20-rc1 kernel built using arm64 > defconfig. According to quick bisection of the board dts file, the > root cause is in dts patch enabling CPU deep idle states: > see 8ed46368776b3bc. Is there any pending fix other than reverting > deep idle state support ? I reproduced the same boot failure on the Armada 8K based Clearfog GT-8K. Revert of commit 8ed46368776b3bc on top of v4.20-rc1 makes the system boot again. Here is the splat I get: [ 87.358425] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 87.364556] rcu: 3-...!: (1 ticks this GP) idle=d04/0/0x0 softirq=79/79 fqs=1 [ 87.371897] rcu: (detected by 2, t=21490 jiffies, g=-1035, q=179) [ 87.378106] Task dump for CPU 3: [ 87.381347] swapper/3 R running task0 0 1 0x0028 [ 87.388428] Call trace: [ 87.390892] __switch_to+0x78/0xb4 [ 87.394310] arm_enter_idle_state+0x38/0x4c [ 87.398512] cpuidle_enter_state+0xc8/0x15c [ 87.402713] cpuidle_enter+0x18/0x20 [ 87.406304] call_cpuidle+0x30/0x34 [ 87.409807] do_idle+0x19c/0x224 [ 87.413048] cpu_startup_entry+0x20/0x24 [ 87.416988] secondary_start_kernel+0x158/0x168 baruch -- http://baruch.siach.name/blog/ ~. .~ Tk Open Systems =}ooO--U--Ooo{= - bar...@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -
Re: [bug report] armada-8040-mcbin: 4.20-rc1 boot failure
Hi Sergey, Gregory, Sergey Matyukevich writes: > MacchiatoBin board fails to boot v4.20-rc1 kernel built using arm64 > defconfig. According to quick bisection of the board dts file, the > root cause is in dts patch enabling CPU deep idle states: > see 8ed46368776b3bc. Is there any pending fix other than reverting > deep idle state support ? I reproduced the same boot failure on the Armada 8K based Clearfog GT-8K. Revert of commit 8ed46368776b3bc on top of v4.20-rc1 makes the system boot again. Here is the splat I get: [ 87.358425] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 87.364556] rcu: 3-...!: (1 ticks this GP) idle=d04/0/0x0 softirq=79/79 fqs=1 [ 87.371897] rcu: (detected by 2, t=21490 jiffies, g=-1035, q=179) [ 87.378106] Task dump for CPU 3: [ 87.381347] swapper/3 R running task0 0 1 0x0028 [ 87.388428] Call trace: [ 87.390892] __switch_to+0x78/0xb4 [ 87.394310] arm_enter_idle_state+0x38/0x4c [ 87.398512] cpuidle_enter_state+0xc8/0x15c [ 87.402713] cpuidle_enter+0x18/0x20 [ 87.406304] call_cpuidle+0x30/0x34 [ 87.409807] do_idle+0x19c/0x224 [ 87.413048] cpu_startup_entry+0x20/0x24 [ 87.416988] secondary_start_kernel+0x158/0x168 baruch -- http://baruch.siach.name/blog/ ~. .~ Tk Open Systems =}ooO--U--Ooo{= - bar...@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -
Re: [bug report] armada-8040-mcbin: 4.20-rc1 boot failure
On Fri, Nov 09, 2018 at 03:11:12PM +0300, Sergey Matyukevich wrote: > Hello Gregory, > > MacchiatoBin board fails to boot v4.20-rc1 kernel built using arm64 > defconfig. According to quick bisection of the board dts file, the > root cause is in dts patch enabling CPU deep idle states: > see 8ed46368776b3bc. Is there any pending fix other than reverting > deep idle state support ? Adding devicetree mailing list.
Re: [bug report] armada-8040-mcbin: 4.20-rc1 boot failure
On Fri, Nov 09, 2018 at 03:11:12PM +0300, Sergey Matyukevich wrote: > Hello Gregory, > > MacchiatoBin board fails to boot v4.20-rc1 kernel built using arm64 > defconfig. According to quick bisection of the board dts file, the > root cause is in dts patch enabling CPU deep idle states: > see 8ed46368776b3bc. Is there any pending fix other than reverting > deep idle state support ? Adding devicetree mailing list.
Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE
Hi, On Wed, Oct 17, 2018 at 03:38:07PM +0200, Mathieu Malaterre wrote: > Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292 > could you confirm that the original mmc driver (kernel from imgtech > people) did work ok with PREEMPT_NONE (sorry I did not do my homework) > ? Sorry, cannot confirm or test that. I have only used the mainline kernel on this board, since v4.5 or so with my own custom config which has been PREEMPT_NONE until now. A.
Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE
Hi, On Wed, Oct 17, 2018 at 03:38:07PM +0200, Mathieu Malaterre wrote: > Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292 > could you confirm that the original mmc driver (kernel from imgtech > people) did work ok with PREEMPT_NONE (sorry I did not do my homework) > ? Sorry, cannot confirm or test that. I have only used the mainline kernel on this board, since v4.5 or so with my own custom config which has been PREEMPT_NONE until now. A.
Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE
On Sun, 2018-10-14 at 20:04 +0300, Aaro Koskinen wrote: > Hi, > > There is something wrong with jz4740-mmc in current mainline kernel > (tested v4.18 and 4.19-rc, the MMC support for CI20 does not exist > prior those), as the DMA support does not work properly if I disable > kernel pre-emption. The console gets flooded with: > > [ 16.461094] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 567 host->next_data.cookie 568 > [ 16.473120] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 568 host->next_data.cookie 569 > [ 16.485144] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 569 host->next_data.cookie 570 > [ 16.497170] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 570 host->next_data.cookie 571 > [ 16.509194] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 571 host->next_data.cookie 572 > [ 16.532421] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 572 host->next_data.cookie 573 > [ 16.544594] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 573 host->next_data.cookie 574 > [ 16.556621] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 574 host->next_data.cookie 575 > [ 16.568638] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 575 host->next_data.cookie 576 > [ 16.601092] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 582 host->next_data.cookie 583 > > etc. ad inf. > > This should be easily reproducible on CI20 board with ci20_defconfig > and setting CONFIG_PREEMPT_NONE=y. > Will take a look as soon as possible, most likely after ELCE. Thanks, Eze
Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE
On Sun, 2018-10-14 at 20:04 +0300, Aaro Koskinen wrote: > Hi, > > There is something wrong with jz4740-mmc in current mainline kernel > (tested v4.18 and 4.19-rc, the MMC support for CI20 does not exist > prior those), as the DMA support does not work properly if I disable > kernel pre-emption. The console gets flooded with: > > [ 16.461094] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 567 host->next_data.cookie 568 > [ 16.473120] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 568 host->next_data.cookie 569 > [ 16.485144] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 569 host->next_data.cookie 570 > [ 16.497170] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 570 host->next_data.cookie 571 > [ 16.509194] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 571 host->next_data.cookie 572 > [ 16.532421] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 572 host->next_data.cookie 573 > [ 16.544594] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 573 host->next_data.cookie 574 > [ 16.556621] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 574 host->next_data.cookie 575 > [ 16.568638] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 575 host->next_data.cookie 576 > [ 16.601092] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 582 host->next_data.cookie 583 > > etc. ad inf. > > This should be easily reproducible on CI20 board with ci20_defconfig > and setting CONFIG_PREEMPT_NONE=y. > Will take a look as soon as possible, most likely after ELCE. Thanks, Eze
Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE
Hi, On Sun, Oct 14, 2018 at 7:04 PM Aaro Koskinen wrote: > > Hi, > > There is something wrong with jz4740-mmc in current mainline kernel > (tested v4.18 and 4.19-rc, the MMC support for CI20 does not exist > prior those), as the DMA support does not work properly if I disable > kernel pre-emption. The console gets flooded with: > > [ 16.461094] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 567 host->next_data.cookie 568 > [ 16.473120] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 568 host->next_data.cookie 569 > [ 16.485144] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 569 host->next_data.cookie 570 > [ 16.497170] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 570 host->next_data.cookie 571 > [ 16.509194] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 571 host->next_data.cookie 572 > [ 16.532421] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 572 host->next_data.cookie 573 > [ 16.544594] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 573 host->next_data.cookie 574 > [ 16.556621] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 574 host->next_data.cookie 575 > [ 16.568638] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 575 host->next_data.cookie 576 > [ 16.601092] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 582 host->next_data.cookie 583 > > etc. ad inf. > > This should be easily reproducible on CI20 board with ci20_defconfig > and setting CONFIG_PREEMPT_NONE=y. Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292 could you confirm that the original mmc driver (kernel from imgtech people) did work ok with PREEMPT_NONE (sorry I did not do my homework) ? Thanks > A.
Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE
Hi, On Sun, Oct 14, 2018 at 7:04 PM Aaro Koskinen wrote: > > Hi, > > There is something wrong with jz4740-mmc in current mainline kernel > (tested v4.18 and 4.19-rc, the MMC support for CI20 does not exist > prior those), as the DMA support does not work properly if I disable > kernel pre-emption. The console gets flooded with: > > [ 16.461094] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 567 host->next_data.cookie 568 > [ 16.473120] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 568 host->next_data.cookie 569 > [ 16.485144] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 569 host->next_data.cookie 570 > [ 16.497170] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 570 host->next_data.cookie 571 > [ 16.509194] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 571 host->next_data.cookie 572 > [ 16.532421] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 572 host->next_data.cookie 573 > [ 16.544594] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 573 host->next_data.cookie 574 > [ 16.556621] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 574 host->next_data.cookie 575 > [ 16.568638] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 575 host->next_data.cookie 576 > [ 16.601092] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid > cookie: data->host_cookie 582 host->next_data.cookie 583 > > etc. ad inf. > > This should be easily reproducible on CI20 board with ci20_defconfig > and setting CONFIG_PREEMPT_NONE=y. Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292 could you confirm that the original mmc driver (kernel from imgtech people) did work ok with PREEMPT_NONE (sorry I did not do my homework) ? Thanks > A.
Re: Bug report: HiBMC crash
Quoting John Garry (2018-09-21 09:11:19) > On 21/09/2018 06:49, Liuxinliang (Matthew Liu) wrote: > > Hi John, > > Thank you for reporting bug. > > I am now using 4.18.7. I haven't found this issue yet. > > I will try linux-next and figure out what's wrong with it. > > > > Thanks, > > Xinliang > > > > > > As mentioned in internal mail, the issue may be that the surface > depth/bpp we were using the in the driver was previously invalid, but > code has since been added in v4.19 to reject this. Specifically it looks > like this patch: > > commit 70109354fed232dfce8fb2c7cadf635acbe03e19 > Author: Chris Wilson > Date: Wed Sep 5 16:31:16 2018 +0100 > > drm: Reject unknown legacy bpp and depth for drm_mode_addfb ioctl diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c index b92595c477ef..f3e7f41e6781 100644 --- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c +++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c @@ -71,7 +71,6 @@ static int hibmc_drm_fb_create(struct drm_fb_helper *helper, DRM_DEBUG_DRIVER("surface width(%d), height(%d) and bpp(%d)\n", sizes->surface_width, sizes->surface_height, sizes->surface_bpp); - sizes->surface_depth = 32; bytes_per_pixel = DIV_ROUND_UP(sizes->surface_bpp, 8); @@ -192,7 +191,6 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv) return -ENOMEM; } - priv->fbdev = hifbdev; drm_fb_helper_prepare(priv->dev, >helper, _fbdev_helper_funcs); @@ -246,6 +244,7 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv) fix->ypanstep, fix->ywrapstep, fix->line_length, fix->accel, fix->capabilities); + priv->fbdev = hifbdev; return 0; fini: Apply chunks 2&3 first to confirm they fix the GPF. -Chris
Re: Bug report: HiBMC crash
Quoting John Garry (2018-09-21 09:11:19) > On 21/09/2018 06:49, Liuxinliang (Matthew Liu) wrote: > > Hi John, > > Thank you for reporting bug. > > I am now using 4.18.7. I haven't found this issue yet. > > I will try linux-next and figure out what's wrong with it. > > > > Thanks, > > Xinliang > > > > > > As mentioned in internal mail, the issue may be that the surface > depth/bpp we were using the in the driver was previously invalid, but > code has since been added in v4.19 to reject this. Specifically it looks > like this patch: > > commit 70109354fed232dfce8fb2c7cadf635acbe03e19 > Author: Chris Wilson > Date: Wed Sep 5 16:31:16 2018 +0100 > > drm: Reject unknown legacy bpp and depth for drm_mode_addfb ioctl diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c index b92595c477ef..f3e7f41e6781 100644 --- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c +++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c @@ -71,7 +71,6 @@ static int hibmc_drm_fb_create(struct drm_fb_helper *helper, DRM_DEBUG_DRIVER("surface width(%d), height(%d) and bpp(%d)\n", sizes->surface_width, sizes->surface_height, sizes->surface_bpp); - sizes->surface_depth = 32; bytes_per_pixel = DIV_ROUND_UP(sizes->surface_bpp, 8); @@ -192,7 +191,6 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv) return -ENOMEM; } - priv->fbdev = hifbdev; drm_fb_helper_prepare(priv->dev, >helper, _fbdev_helper_funcs); @@ -246,6 +244,7 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv) fix->ypanstep, fix->ywrapstep, fix->line_length, fix->accel, fix->capabilities); + priv->fbdev = hifbdev; return 0; fini: Apply chunks 2&3 first to confirm they fix the GPF. -Chris
Re: Bug report about KASLR and ZONE_MOVABLE
On Tue 17-07-18 09:51:20, Baoquan He wrote: > On 07/16/18 at 05:24pm, Michal Hocko wrote: > > On Mon 16-07-18 21:02:02, Baoquan He wrote: > > > On 07/16/18 at 01:38pm, Michal Hocko wrote: > > > > On Fri 13-07-18 07:52:40, Baoquan He wrote: > > > > > Hi Michal, > > > > > > > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote: > > > > [...] > > > > > > I am not able to find the beginning of the email thread right now. > > > > > > Could > > > > > > you summarize what is the actual problem please? > > > > > > > > > > The bug is found on x86 now. > > > > > > > > > > When added "kernelcore=" or "movablecore=" into kernel command line, > > > > > kernel memory is spread evenly among nodes. However, this is right > > > > > when > > > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch. > > > > > If KASLR enabled, it could be put any place from 16M to 64T randomly. > > > > > > > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, > > > > > and > > > > > we specify "kernelcore=50%", means each node will take 10G for > > > > > kernelcore, 10G for movable area. But this doesn't take kernel > > > > > position > > > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely > > > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for > > > > > movable, in fact there's only 5G available for movable, just after > > > > > kernel. > > > > > > > > OK, I guess I see that part. But who is going to use movablecore along > > > > with KASLR enabled? I mean do we really have to support those two > > > > obscure command line parameters for KASLR? > > > > > > Not very sure whether we have to support both of those to work with > > > KASLR. Maybe it's time to make clear of it now. > > > > Yes, I would really like to deprecate this. It is an ugly piece of code > > and it's far from easily maintainable as well. > > > > > For 'kernelcore=mirror', we have solved the conflict to make it work well > > > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting > > > patches to fix it. As for 'kernelcore=' and 'movablecore=', > > > > > > 1) solve the conflict between them with KASLR in > > >find_zone_movable_pfns_for_nodes(); > > > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set; > > > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled; > > > 4) add note in doc to notice people to not add them at the same time; > > > > I would simply warn that those kernel parameters are not supported > > anymore. If somebody shows up with a valid usecase we can reconsider. > > OK, got it. The use case I can think of is that people want to check > hotplug on system w/o hotplug ACPI info. Let's see whether there is really somebody like that and complain. > I am fine with warning people they are not supported. Should I post a > patch to address this, or you will do it? Both is fine to me. I will happily ack it if you create a patch. -- Michal Hocko SUSE Labs
Re: Bug report about KASLR and ZONE_MOVABLE
On Tue 17-07-18 09:51:20, Baoquan He wrote: > On 07/16/18 at 05:24pm, Michal Hocko wrote: > > On Mon 16-07-18 21:02:02, Baoquan He wrote: > > > On 07/16/18 at 01:38pm, Michal Hocko wrote: > > > > On Fri 13-07-18 07:52:40, Baoquan He wrote: > > > > > Hi Michal, > > > > > > > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote: > > > > [...] > > > > > > I am not able to find the beginning of the email thread right now. > > > > > > Could > > > > > > you summarize what is the actual problem please? > > > > > > > > > > The bug is found on x86 now. > > > > > > > > > > When added "kernelcore=" or "movablecore=" into kernel command line, > > > > > kernel memory is spread evenly among nodes. However, this is right > > > > > when > > > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch. > > > > > If KASLR enabled, it could be put any place from 16M to 64T randomly. > > > > > > > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, > > > > > and > > > > > we specify "kernelcore=50%", means each node will take 10G for > > > > > kernelcore, 10G for movable area. But this doesn't take kernel > > > > > position > > > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely > > > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for > > > > > movable, in fact there's only 5G available for movable, just after > > > > > kernel. > > > > > > > > OK, I guess I see that part. But who is going to use movablecore along > > > > with KASLR enabled? I mean do we really have to support those two > > > > obscure command line parameters for KASLR? > > > > > > Not very sure whether we have to support both of those to work with > > > KASLR. Maybe it's time to make clear of it now. > > > > Yes, I would really like to deprecate this. It is an ugly piece of code > > and it's far from easily maintainable as well. > > > > > For 'kernelcore=mirror', we have solved the conflict to make it work well > > > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting > > > patches to fix it. As for 'kernelcore=' and 'movablecore=', > > > > > > 1) solve the conflict between them with KASLR in > > >find_zone_movable_pfns_for_nodes(); > > > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set; > > > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled; > > > 4) add note in doc to notice people to not add them at the same time; > > > > I would simply warn that those kernel parameters are not supported > > anymore. If somebody shows up with a valid usecase we can reconsider. > > OK, got it. The use case I can think of is that people want to check > hotplug on system w/o hotplug ACPI info. Let's see whether there is really somebody like that and complain. > I am fine with warning people they are not supported. Should I post a > patch to address this, or you will do it? Both is fine to me. I will happily ack it if you create a patch. -- Michal Hocko SUSE Labs
Re: Bug report about KASLR and ZONE_MOVABLE
On 07/16/18 at 05:24pm, Michal Hocko wrote: > On Mon 16-07-18 21:02:02, Baoquan He wrote: > > On 07/16/18 at 01:38pm, Michal Hocko wrote: > > > On Fri 13-07-18 07:52:40, Baoquan He wrote: > > > > Hi Michal, > > > > > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote: > > > [...] > > > > > I am not able to find the beginning of the email thread right now. > > > > > Could > > > > > you summarize what is the actual problem please? > > > > > > > > The bug is found on x86 now. > > > > > > > > When added "kernelcore=" or "movablecore=" into kernel command line, > > > > kernel memory is spread evenly among nodes. However, this is right when > > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch. > > > > If KASLR enabled, it could be put any place from 16M to 64T randomly. > > > > > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and > > > > we specify "kernelcore=50%", means each node will take 10G for > > > > kernelcore, 10G for movable area. But this doesn't take kernel position > > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely > > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for > > > > movable, in fact there's only 5G available for movable, just after > > > > kernel. > > > > > > OK, I guess I see that part. But who is going to use movablecore along > > > with KASLR enabled? I mean do we really have to support those two > > > obscure command line parameters for KASLR? > > > > Not very sure whether we have to support both of those to work with > > KASLR. Maybe it's time to make clear of it now. > > Yes, I would really like to deprecate this. It is an ugly piece of code > and it's far from easily maintainable as well. > > > For 'kernelcore=mirror', we have solved the conflict to make it work well > > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting > > patches to fix it. As for 'kernelcore=' and 'movablecore=', > > > > 1) solve the conflict between them with KASLR in > >find_zone_movable_pfns_for_nodes(); > > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set; > > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled; > > 4) add note in doc to notice people to not add them at the same time; > > I would simply warn that those kernel parameters are not supported > anymore. If somebody shows up with a valid usecase we can reconsider. OK, got it. The use case I can think of is that people want to check hotplug on system w/o hotplug ACPI info. I am fine with warning people they are not supported. Should I post a patch to address this, or you will do it? Both is fine to me. > > > 2) and 3) may need be fixed in arch/x86 code. As long as come to an > > agreement, any one is fine to me. > > > > > > In fact I would be much more concerned about memory hotplug and > > > pre-defined movable nodes. Does the current KASLR code work in that > > > case? > > > > As said above, kernelcore=mirror works well with KASLR now. Making > > 'movable_node' work with KASLR is in progress. > > OK, thanks for the info. You are welcome. Thanks Baoquan
Re: Bug report about KASLR and ZONE_MOVABLE
On 07/16/18 at 05:24pm, Michal Hocko wrote: > On Mon 16-07-18 21:02:02, Baoquan He wrote: > > On 07/16/18 at 01:38pm, Michal Hocko wrote: > > > On Fri 13-07-18 07:52:40, Baoquan He wrote: > > > > Hi Michal, > > > > > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote: > > > [...] > > > > > I am not able to find the beginning of the email thread right now. > > > > > Could > > > > > you summarize what is the actual problem please? > > > > > > > > The bug is found on x86 now. > > > > > > > > When added "kernelcore=" or "movablecore=" into kernel command line, > > > > kernel memory is spread evenly among nodes. However, this is right when > > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch. > > > > If KASLR enabled, it could be put any place from 16M to 64T randomly. > > > > > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and > > > > we specify "kernelcore=50%", means each node will take 10G for > > > > kernelcore, 10G for movable area. But this doesn't take kernel position > > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely > > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for > > > > movable, in fact there's only 5G available for movable, just after > > > > kernel. > > > > > > OK, I guess I see that part. But who is going to use movablecore along > > > with KASLR enabled? I mean do we really have to support those two > > > obscure command line parameters for KASLR? > > > > Not very sure whether we have to support both of those to work with > > KASLR. Maybe it's time to make clear of it now. > > Yes, I would really like to deprecate this. It is an ugly piece of code > and it's far from easily maintainable as well. > > > For 'kernelcore=mirror', we have solved the conflict to make it work well > > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting > > patches to fix it. As for 'kernelcore=' and 'movablecore=', > > > > 1) solve the conflict between them with KASLR in > >find_zone_movable_pfns_for_nodes(); > > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set; > > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled; > > 4) add note in doc to notice people to not add them at the same time; > > I would simply warn that those kernel parameters are not supported > anymore. If somebody shows up with a valid usecase we can reconsider. OK, got it. The use case I can think of is that people want to check hotplug on system w/o hotplug ACPI info. I am fine with warning people they are not supported. Should I post a patch to address this, or you will do it? Both is fine to me. > > > 2) and 3) may need be fixed in arch/x86 code. As long as come to an > > agreement, any one is fine to me. > > > > > > In fact I would be much more concerned about memory hotplug and > > > pre-defined movable nodes. Does the current KASLR code work in that > > > case? > > > > As said above, kernelcore=mirror works well with KASLR now. Making > > 'movable_node' work with KASLR is in progress. > > OK, thanks for the info. You are welcome. Thanks Baoquan
Re: Bug report about KASLR and ZONE_MOVABLE
On Mon 16-07-18 21:02:02, Baoquan He wrote: > On 07/16/18 at 01:38pm, Michal Hocko wrote: > > On Fri 13-07-18 07:52:40, Baoquan He wrote: > > > Hi Michal, > > > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote: > > [...] > > > > I am not able to find the beginning of the email thread right now. Could > > > > you summarize what is the actual problem please? > > > > > > The bug is found on x86 now. > > > > > > When added "kernelcore=" or "movablecore=" into kernel command line, > > > kernel memory is spread evenly among nodes. However, this is right when > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch. > > > If KASLR enabled, it could be put any place from 16M to 64T randomly. > > > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and > > > we specify "kernelcore=50%", means each node will take 10G for > > > kernelcore, 10G for movable area. But this doesn't take kernel position > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for > > > movable, in fact there's only 5G available for movable, just after > > > kernel. > > > > OK, I guess I see that part. But who is going to use movablecore along > > with KASLR enabled? I mean do we really have to support those two > > obscure command line parameters for KASLR? > > Not very sure whether we have to support both of those to work with > KASLR. Maybe it's time to make clear of it now. Yes, I would really like to deprecate this. It is an ugly piece of code and it's far from easily maintainable as well. > For 'kernelcore=mirror', we have solved the conflict to make it work well > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting > patches to fix it. As for 'kernelcore=' and 'movablecore=', > > 1) solve the conflict between them with KASLR in >find_zone_movable_pfns_for_nodes(); > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set; > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled; > 4) add note in doc to notice people to not add them at the same time; I would simply warn that those kernel parameters are not supported anymore. If somebody shows up with a valid usecase we can reconsider. > 2) and 3) may need be fixed in arch/x86 code. As long as come to an > agreement, any one is fine to me. > > > > In fact I would be much more concerned about memory hotplug and > > pre-defined movable nodes. Does the current KASLR code work in that > > case? > > As said above, kernelcore=mirror works well with KASLR now. Making > 'movable_node' work with KASLR is in progress. OK, thanks for the info. -- Michal Hocko SUSE Labs
Re: Bug report about KASLR and ZONE_MOVABLE
On Mon 16-07-18 21:02:02, Baoquan He wrote: > On 07/16/18 at 01:38pm, Michal Hocko wrote: > > On Fri 13-07-18 07:52:40, Baoquan He wrote: > > > Hi Michal, > > > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote: > > [...] > > > > I am not able to find the beginning of the email thread right now. Could > > > > you summarize what is the actual problem please? > > > > > > The bug is found on x86 now. > > > > > > When added "kernelcore=" or "movablecore=" into kernel command line, > > > kernel memory is spread evenly among nodes. However, this is right when > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch. > > > If KASLR enabled, it could be put any place from 16M to 64T randomly. > > > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and > > > we specify "kernelcore=50%", means each node will take 10G for > > > kernelcore, 10G for movable area. But this doesn't take kernel position > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for > > > movable, in fact there's only 5G available for movable, just after > > > kernel. > > > > OK, I guess I see that part. But who is going to use movablecore along > > with KASLR enabled? I mean do we really have to support those two > > obscure command line parameters for KASLR? > > Not very sure whether we have to support both of those to work with > KASLR. Maybe it's time to make clear of it now. Yes, I would really like to deprecate this. It is an ugly piece of code and it's far from easily maintainable as well. > For 'kernelcore=mirror', we have solved the conflict to make it work well > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting > patches to fix it. As for 'kernelcore=' and 'movablecore=', > > 1) solve the conflict between them with KASLR in >find_zone_movable_pfns_for_nodes(); > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set; > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled; > 4) add note in doc to notice people to not add them at the same time; I would simply warn that those kernel parameters are not supported anymore. If somebody shows up with a valid usecase we can reconsider. > 2) and 3) may need be fixed in arch/x86 code. As long as come to an > agreement, any one is fine to me. > > > > In fact I would be much more concerned about memory hotplug and > > pre-defined movable nodes. Does the current KASLR code work in that > > case? > > As said above, kernelcore=mirror works well with KASLR now. Making > 'movable_node' work with KASLR is in progress. OK, thanks for the info. -- Michal Hocko SUSE Labs
Re: Bug report about KASLR and ZONE_MOVABLE
On 07/16/18 at 01:38pm, Michal Hocko wrote: > On Fri 13-07-18 07:52:40, Baoquan He wrote: > > Hi Michal, > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote: > [...] > > > I am not able to find the beginning of the email thread right now. Could > > > you summarize what is the actual problem please? > > > > The bug is found on x86 now. > > > > When added "kernelcore=" or "movablecore=" into kernel command line, > > kernel memory is spread evenly among nodes. However, this is right when > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch. > > If KASLR enabled, it could be put any place from 16M to 64T randomly. > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and > > we specify "kernelcore=50%", means each node will take 10G for > > kernelcore, 10G for movable area. But this doesn't take kernel position > > into consideration. E.g if kernel is put at 15G of 2nd node, namely > > node1. Then we think on node1 there's 10G for kernelcore, 10G for > > movable, in fact there's only 5G available for movable, just after > > kernel. > > OK, I guess I see that part. But who is going to use movablecore along > with KASLR enabled? I mean do we really have to support those two > obscure command line parameters for KASLR? Not very sure whether we have to support both of those to work with KASLR. Maybe it's time to make clear of it now. For 'kernelcore=mirror', we have solved the conflict to make it work well with KASLR. For 'movable_node' conflict with KASLR, Chao is posting patches to fix it. As for 'kernelcore=' and 'movablecore=', 1) solve the conflict between them with KASLR in find_zone_movable_pfns_for_nodes(); 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set; 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled; 4) add note in doc to notice people to not add them at the same time; 2) and 3) may need be fixed in arch/x86 code. As long as come to an agreement, any one is fine to me. > > In fact I would be much more concerned about memory hotplug and > pre-defined movable nodes. Does the current KASLR code work in that > case? As said above, kernelcore=mirror works well with KASLR now. Making 'movable_node' work with KASLR is in progress. Thanks Baoquan
Re: Bug report about KASLR and ZONE_MOVABLE
On 07/16/18 at 01:38pm, Michal Hocko wrote: > On Fri 13-07-18 07:52:40, Baoquan He wrote: > > Hi Michal, > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote: > [...] > > > I am not able to find the beginning of the email thread right now. Could > > > you summarize what is the actual problem please? > > > > The bug is found on x86 now. > > > > When added "kernelcore=" or "movablecore=" into kernel command line, > > kernel memory is spread evenly among nodes. However, this is right when > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch. > > If KASLR enabled, it could be put any place from 16M to 64T randomly. > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and > > we specify "kernelcore=50%", means each node will take 10G for > > kernelcore, 10G for movable area. But this doesn't take kernel position > > into consideration. E.g if kernel is put at 15G of 2nd node, namely > > node1. Then we think on node1 there's 10G for kernelcore, 10G for > > movable, in fact there's only 5G available for movable, just after > > kernel. > > OK, I guess I see that part. But who is going to use movablecore along > with KASLR enabled? I mean do we really have to support those two > obscure command line parameters for KASLR? Not very sure whether we have to support both of those to work with KASLR. Maybe it's time to make clear of it now. For 'kernelcore=mirror', we have solved the conflict to make it work well with KASLR. For 'movable_node' conflict with KASLR, Chao is posting patches to fix it. As for 'kernelcore=' and 'movablecore=', 1) solve the conflict between them with KASLR in find_zone_movable_pfns_for_nodes(); 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set; 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled; 4) add note in doc to notice people to not add them at the same time; 2) and 3) may need be fixed in arch/x86 code. As long as come to an agreement, any one is fine to me. > > In fact I would be much more concerned about memory hotplug and > pre-defined movable nodes. Does the current KASLR code work in that > case? As said above, kernelcore=mirror works well with KASLR now. Making 'movable_node' work with KASLR is in progress. Thanks Baoquan
Re: Bug report about KASLR and ZONE_MOVABLE
On Fri 13-07-18 07:52:40, Baoquan He wrote: > Hi Michal, > > On 07/12/18 at 02:32pm, Michal Hocko wrote: [...] > > I am not able to find the beginning of the email thread right now. Could > > you summarize what is the actual problem please? > > The bug is found on x86 now. > > When added "kernelcore=" or "movablecore=" into kernel command line, > kernel memory is spread evenly among nodes. However, this is right when > KASLR is not enabled, then kernel will be at 16M of place in x86 arch. > If KASLR enabled, it could be put any place from 16M to 64T randomly. > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and > we specify "kernelcore=50%", means each node will take 10G for > kernelcore, 10G for movable area. But this doesn't take kernel position > into consideration. E.g if kernel is put at 15G of 2nd node, namely > node1. Then we think on node1 there's 10G for kernelcore, 10G for > movable, in fact there's only 5G available for movable, just after > kernel. OK, I guess I see that part. But who is going to use movablecore along with KASLR enabled? I mean do we really have to support those two obscure command line parameters for KASLR? In fact I would be much more concerned about memory hotplug and pre-defined movable nodes. Does the current KASLR code work in that case? -- Michal Hocko SUSE Labs
Re: Bug report about KASLR and ZONE_MOVABLE
On Fri 13-07-18 07:52:40, Baoquan He wrote: > Hi Michal, > > On 07/12/18 at 02:32pm, Michal Hocko wrote: [...] > > I am not able to find the beginning of the email thread right now. Could > > you summarize what is the actual problem please? > > The bug is found on x86 now. > > When added "kernelcore=" or "movablecore=" into kernel command line, > kernel memory is spread evenly among nodes. However, this is right when > KASLR is not enabled, then kernel will be at 16M of place in x86 arch. > If KASLR enabled, it could be put any place from 16M to 64T randomly. > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and > we specify "kernelcore=50%", means each node will take 10G for > kernelcore, 10G for movable area. But this doesn't take kernel position > into consideration. E.g if kernel is put at 15G of 2nd node, namely > node1. Then we think on node1 there's 10G for kernelcore, 10G for > movable, in fact there's only 5G available for movable, just after > kernel. OK, I guess I see that part. But who is going to use movablecore along with KASLR enabled? I mean do we really have to support those two obscure command line parameters for KASLR? In fact I would be much more concerned about memory hotplug and pre-defined movable nodes. Does the current KASLR code work in that case? -- Michal Hocko SUSE Labs
Re: Bug report about KASLR and ZONE_MOVABLE
On Fri, Jul 13, 2018 at 07:52:40AM +0800, Baoquan He wrote: >Hi Michal, > >On 07/12/18 at 02:32pm, Michal Hocko wrote: >> On Thu 12-07-18 14:01:15, Chao Fan wrote: >> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote: >> > >Hi Baoquan, >> > > >> > >At 07/11/2018 08:40 PM, Baoquan He wrote: >> > >> Please try this v3 patch: >> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 >> > >> From: Baoquan He >> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800 >> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text >> > >> >> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting >> > >> PFN movable zone begins in each node, kernel text position is not >> > >> considered. KASLR may put kernel after which movable zone begins. >> > >> >> > >> Fix it by finding movable zone after kernel text on that node. >> > >> >> > >> Signed-off-by: Baoquan He >> > > >> > > >> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or >> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the >> > >last node, or more. >> > >> > I think it may not fail. >> > There is a 'restart' to do another pass. >> > >> > > >> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying >> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix >> > >this in the KASLR side. >> > > >> > >> > I think it's better to fix here, but not KASLR side. >> > Cause much more code will be change if doing it in KASLR side. >> > Since we didn't parse 'kernelcore' in compressed code, and you can see >> > the distribution of ZONE_MOVABLE need so much code, so we do not need >> > to do so much job in KASLR side. But here, several lines will be OK. >> >> I am not able to find the beginning of the email thread right now. Could >> you summarize what is the actual problem please? > >The bug is found on x86 now. > >When added "kernelcore=" or "movablecore=" into kernel command line, >kernel memory is spread evenly among nodes. However, this is right when >KASLR is not enabled, then kernel will be at 16M of place in x86 arch. >If KASLR enabled, it could be put any place from 16M to 64T randomly. > >Consider a scenario, we have 10 nodes, and each node has 20G memory, and >we specify "kernelcore=50%", means each node will take 10G for >kernelcore, 10G for movable area. But this doesn't take kernel position >into consideration. E.g if kernel is put at 15G of 2nd node, namely >node1. Then we think on node1 there's 10G for kernelcore, 10G for >movable, in fact there's only 5G available for movable, just after >kernel. > >I made a v4 patch which possibly can fix it. > > >From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001 >From: Baoquan He >Date: Fri, 13 Jul 2018 07:49:29 +0800 >Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text > >In find_zone_movable_pfns_for_nodes(), when try to find the starting >PFN movable zone begins at in each node, kernel text position is not >considered. KASLR may put kernel after which movable zone begins. > >Fix it by finding movable zone after kernel text on that node. > >Signed-off-by: Baoquan He You can post it as alone PATCH, then I will test it next week. Thanks, Chao Fan >--- > mm/page_alloc.c | 15 +-- > 1 file changed, 13 insertions(+), 2 deletions(-) > >diff --git a/mm/page_alloc.c b/mm/page_alloc.c >index 1521100f1e63..5bc1a47dafda 100644 >--- a/mm/page_alloc.c >+++ b/mm/page_alloc.c >@@ -6547,7 +6547,7 @@ static unsigned long __init >early_calculate_totalpages(void) > static void __init find_zone_movable_pfns_for_nodes(void) > { > int i, nid; >- unsigned long usable_startpfn; >+ unsigned long usable_startpfn, kernel_endpfn, arch_startpfn; > unsigned long kernelcore_node, kernelcore_remaining; > /* save the state before borrow the nodemask */ > nodemask_t saved_node_state = node_states[N_MEMORY]; >@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void) > if (!required_kernelcore || required_kernelcore >= totalpages) > goto out; > >+ kernel_endpfn = PFN_UP(__pa_symbol(_end)); > /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ >- usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; >+ arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; > > restart: > /* Spread kernelcore memory as evenly as possible throughout nodes */ >@@ -6659,6 +6660,16 @@ static void __init >find_zone_movable_pfns_for_nodes(void) > unsigned long start_pfn, end_pfn; > > /* >+ * KASLR may put kernel near tail of node memory, >+ * start after kernel on that node to find PFN >+ * at which zone begins. >+ */ >+ if (pfn_to_nid(kernel_endpfn) == nid) >+ usable_startpfn = max(arch_startpfn, kernel_endpfn);
Re: Bug report about KASLR and ZONE_MOVABLE
On Fri, Jul 13, 2018 at 07:52:40AM +0800, Baoquan He wrote: >Hi Michal, > >On 07/12/18 at 02:32pm, Michal Hocko wrote: >> On Thu 12-07-18 14:01:15, Chao Fan wrote: >> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote: >> > >Hi Baoquan, >> > > >> > >At 07/11/2018 08:40 PM, Baoquan He wrote: >> > >> Please try this v3 patch: >> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 >> > >> From: Baoquan He >> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800 >> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text >> > >> >> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting >> > >> PFN movable zone begins in each node, kernel text position is not >> > >> considered. KASLR may put kernel after which movable zone begins. >> > >> >> > >> Fix it by finding movable zone after kernel text on that node. >> > >> >> > >> Signed-off-by: Baoquan He >> > > >> > > >> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or >> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the >> > >last node, or more. >> > >> > I think it may not fail. >> > There is a 'restart' to do another pass. >> > >> > > >> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying >> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix >> > >this in the KASLR side. >> > > >> > >> > I think it's better to fix here, but not KASLR side. >> > Cause much more code will be change if doing it in KASLR side. >> > Since we didn't parse 'kernelcore' in compressed code, and you can see >> > the distribution of ZONE_MOVABLE need so much code, so we do not need >> > to do so much job in KASLR side. But here, several lines will be OK. >> >> I am not able to find the beginning of the email thread right now. Could >> you summarize what is the actual problem please? > >The bug is found on x86 now. > >When added "kernelcore=" or "movablecore=" into kernel command line, >kernel memory is spread evenly among nodes. However, this is right when >KASLR is not enabled, then kernel will be at 16M of place in x86 arch. >If KASLR enabled, it could be put any place from 16M to 64T randomly. > >Consider a scenario, we have 10 nodes, and each node has 20G memory, and >we specify "kernelcore=50%", means each node will take 10G for >kernelcore, 10G for movable area. But this doesn't take kernel position >into consideration. E.g if kernel is put at 15G of 2nd node, namely >node1. Then we think on node1 there's 10G for kernelcore, 10G for >movable, in fact there's only 5G available for movable, just after >kernel. > >I made a v4 patch which possibly can fix it. > > >From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001 >From: Baoquan He >Date: Fri, 13 Jul 2018 07:49:29 +0800 >Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text > >In find_zone_movable_pfns_for_nodes(), when try to find the starting >PFN movable zone begins at in each node, kernel text position is not >considered. KASLR may put kernel after which movable zone begins. > >Fix it by finding movable zone after kernel text on that node. > >Signed-off-by: Baoquan He You can post it as alone PATCH, then I will test it next week. Thanks, Chao Fan >--- > mm/page_alloc.c | 15 +-- > 1 file changed, 13 insertions(+), 2 deletions(-) > >diff --git a/mm/page_alloc.c b/mm/page_alloc.c >index 1521100f1e63..5bc1a47dafda 100644 >--- a/mm/page_alloc.c >+++ b/mm/page_alloc.c >@@ -6547,7 +6547,7 @@ static unsigned long __init >early_calculate_totalpages(void) > static void __init find_zone_movable_pfns_for_nodes(void) > { > int i, nid; >- unsigned long usable_startpfn; >+ unsigned long usable_startpfn, kernel_endpfn, arch_startpfn; > unsigned long kernelcore_node, kernelcore_remaining; > /* save the state before borrow the nodemask */ > nodemask_t saved_node_state = node_states[N_MEMORY]; >@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void) > if (!required_kernelcore || required_kernelcore >= totalpages) > goto out; > >+ kernel_endpfn = PFN_UP(__pa_symbol(_end)); > /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ >- usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; >+ arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; > > restart: > /* Spread kernelcore memory as evenly as possible throughout nodes */ >@@ -6659,6 +6660,16 @@ static void __init >find_zone_movable_pfns_for_nodes(void) > unsigned long start_pfn, end_pfn; > > /* >+ * KASLR may put kernel near tail of node memory, >+ * start after kernel on that node to find PFN >+ * at which zone begins. >+ */ >+ if (pfn_to_nid(kernel_endpfn) == nid) >+ usable_startpfn = max(arch_startpfn, kernel_endpfn);
Re: Bug report about KASLR and ZONE_MOVABLE
Hi Michal, On 07/12/18 at 02:32pm, Michal Hocko wrote: > On Thu 12-07-18 14:01:15, Chao Fan wrote: > > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote: > > >Hi Baoquan, > > > > > >At 07/11/2018 08:40 PM, Baoquan He wrote: > > >> Please try this v3 patch: > > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 > > >> From: Baoquan He > > >> Date: Wed, 11 Jul 2018 20:31:51 +0800 > > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text > > >> > > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting > > >> PFN movable zone begins in each node, kernel text position is not > > >> considered. KASLR may put kernel after which movable zone begins. > > >> > > >> Fix it by finding movable zone after kernel text on that node. > > >> > > >> Signed-off-by: Baoquan He > > > > > > > > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or > > >'movablecore=' failed if the KASLR puts the kernel back the tail of the > > >last node, or more. > > > > I think it may not fail. > > There is a 'restart' to do another pass. > > > > > > > >Due to we have fix the mirror memory in KASLR side, and Chao is trying > > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix > > >this in the KASLR side. > > > > > > > I think it's better to fix here, but not KASLR side. > > Cause much more code will be change if doing it in KASLR side. > > Since we didn't parse 'kernelcore' in compressed code, and you can see > > the distribution of ZONE_MOVABLE need so much code, so we do not need > > to do so much job in KASLR side. But here, several lines will be OK. > > I am not able to find the beginning of the email thread right now. Could > you summarize what is the actual problem please? The bug is found on x86 now. When added "kernelcore=" or "movablecore=" into kernel command line, kernel memory is spread evenly among nodes. However, this is right when KASLR is not enabled, then kernel will be at 16M of place in x86 arch. If KASLR enabled, it could be put any place from 16M to 64T randomly. Consider a scenario, we have 10 nodes, and each node has 20G memory, and we specify "kernelcore=50%", means each node will take 10G for kernelcore, 10G for movable area. But this doesn't take kernel position into consideration. E.g if kernel is put at 15G of 2nd node, namely node1. Then we think on node1 there's 10G for kernelcore, 10G for movable, in fact there's only 5G available for movable, just after kernel. I made a v4 patch which possibly can fix it. >From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Fri, 13 Jul 2018 07:49:29 +0800 Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text In find_zone_movable_pfns_for_nodes(), when try to find the starting PFN movable zone begins at in each node, kernel text position is not considered. KASLR may put kernel after which movable zone begins. Fix it by finding movable zone after kernel text on that node. Signed-off-by: Baoquan He --- mm/page_alloc.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1521100f1e63..5bc1a47dafda 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6547,7 +6547,7 @@ static unsigned long __init early_calculate_totalpages(void) static void __init find_zone_movable_pfns_for_nodes(void) { int i, nid; - unsigned long usable_startpfn; + unsigned long usable_startpfn, kernel_endpfn, arch_startpfn; unsigned long kernelcore_node, kernelcore_remaining; /* save the state before borrow the nodemask */ nodemask_t saved_node_state = node_states[N_MEMORY]; @@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void) if (!required_kernelcore || required_kernelcore >= totalpages) goto out; + kernel_endpfn = PFN_UP(__pa_symbol(_end)); /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ - usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; + arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; restart: /* Spread kernelcore memory as evenly as possible throughout nodes */ @@ -6659,6 +6660,16 @@ static void __init find_zone_movable_pfns_for_nodes(void) unsigned long start_pfn, end_pfn; /* +* KASLR may put kernel near tail of node memory, +* start after kernel on that node to find PFN +* at which zone begins. +*/ + if (pfn_to_nid(kernel_endpfn) == nid) + usable_startpfn = max(arch_startpfn, kernel_endpfn); + else + usable_startpfn = arch_startpfn; + + /* * Recalculate kernelcore_node if the division per node * now exceeds what is necessary to satisfy the
Re: Bug report about KASLR and ZONE_MOVABLE
Hi Michal, On 07/12/18 at 02:32pm, Michal Hocko wrote: > On Thu 12-07-18 14:01:15, Chao Fan wrote: > > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote: > > >Hi Baoquan, > > > > > >At 07/11/2018 08:40 PM, Baoquan He wrote: > > >> Please try this v3 patch: > > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 > > >> From: Baoquan He > > >> Date: Wed, 11 Jul 2018 20:31:51 +0800 > > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text > > >> > > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting > > >> PFN movable zone begins in each node, kernel text position is not > > >> considered. KASLR may put kernel after which movable zone begins. > > >> > > >> Fix it by finding movable zone after kernel text on that node. > > >> > > >> Signed-off-by: Baoquan He > > > > > > > > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or > > >'movablecore=' failed if the KASLR puts the kernel back the tail of the > > >last node, or more. > > > > I think it may not fail. > > There is a 'restart' to do another pass. > > > > > > > >Due to we have fix the mirror memory in KASLR side, and Chao is trying > > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix > > >this in the KASLR side. > > > > > > > I think it's better to fix here, but not KASLR side. > > Cause much more code will be change if doing it in KASLR side. > > Since we didn't parse 'kernelcore' in compressed code, and you can see > > the distribution of ZONE_MOVABLE need so much code, so we do not need > > to do so much job in KASLR side. But here, several lines will be OK. > > I am not able to find the beginning of the email thread right now. Could > you summarize what is the actual problem please? The bug is found on x86 now. When added "kernelcore=" or "movablecore=" into kernel command line, kernel memory is spread evenly among nodes. However, this is right when KASLR is not enabled, then kernel will be at 16M of place in x86 arch. If KASLR enabled, it could be put any place from 16M to 64T randomly. Consider a scenario, we have 10 nodes, and each node has 20G memory, and we specify "kernelcore=50%", means each node will take 10G for kernelcore, 10G for movable area. But this doesn't take kernel position into consideration. E.g if kernel is put at 15G of 2nd node, namely node1. Then we think on node1 there's 10G for kernelcore, 10G for movable, in fact there's only 5G available for movable, just after kernel. I made a v4 patch which possibly can fix it. >From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Fri, 13 Jul 2018 07:49:29 +0800 Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text In find_zone_movable_pfns_for_nodes(), when try to find the starting PFN movable zone begins at in each node, kernel text position is not considered. KASLR may put kernel after which movable zone begins. Fix it by finding movable zone after kernel text on that node. Signed-off-by: Baoquan He --- mm/page_alloc.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1521100f1e63..5bc1a47dafda 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6547,7 +6547,7 @@ static unsigned long __init early_calculate_totalpages(void) static void __init find_zone_movable_pfns_for_nodes(void) { int i, nid; - unsigned long usable_startpfn; + unsigned long usable_startpfn, kernel_endpfn, arch_startpfn; unsigned long kernelcore_node, kernelcore_remaining; /* save the state before borrow the nodemask */ nodemask_t saved_node_state = node_states[N_MEMORY]; @@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void) if (!required_kernelcore || required_kernelcore >= totalpages) goto out; + kernel_endpfn = PFN_UP(__pa_symbol(_end)); /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ - usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; + arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; restart: /* Spread kernelcore memory as evenly as possible throughout nodes */ @@ -6659,6 +6660,16 @@ static void __init find_zone_movable_pfns_for_nodes(void) unsigned long start_pfn, end_pfn; /* +* KASLR may put kernel near tail of node memory, +* start after kernel on that node to find PFN +* at which zone begins. +*/ + if (pfn_to_nid(kernel_endpfn) == nid) + usable_startpfn = max(arch_startpfn, kernel_endpfn); + else + usable_startpfn = arch_startpfn; + + /* * Recalculate kernelcore_node if the division per node * now exceeds what is necessary to satisfy the
Re: Bug report about KASLR and ZONE_MOVABLE
On Thu 12-07-18 14:01:15, Chao Fan wrote: > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote: > >Hi Baoquan, > > > >At 07/11/2018 08:40 PM, Baoquan He wrote: > >> Please try this v3 patch: > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 > >> From: Baoquan He > >> Date: Wed, 11 Jul 2018 20:31:51 +0800 > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text > >> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting > >> PFN movable zone begins in each node, kernel text position is not > >> considered. KASLR may put kernel after which movable zone begins. > >> > >> Fix it by finding movable zone after kernel text on that node. > >> > >> Signed-off-by: Baoquan He > > > > > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or > >'movablecore=' failed if the KASLR puts the kernel back the tail of the > >last node, or more. > > I think it may not fail. > There is a 'restart' to do another pass. > > > > >Due to we have fix the mirror memory in KASLR side, and Chao is trying > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix > >this in the KASLR side. > > > > I think it's better to fix here, but not KASLR side. > Cause much more code will be change if doing it in KASLR side. > Since we didn't parse 'kernelcore' in compressed code, and you can see > the distribution of ZONE_MOVABLE need so much code, so we do not need > to do so much job in KASLR side. But here, several lines will be OK. I am not able to find the beginning of the email thread right now. Could you summarize what is the actual problem please? -- Michal Hocko SUSE Labs
Re: Bug report about KASLR and ZONE_MOVABLE
On Thu 12-07-18 14:01:15, Chao Fan wrote: > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote: > >Hi Baoquan, > > > >At 07/11/2018 08:40 PM, Baoquan He wrote: > >> Please try this v3 patch: > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 > >> From: Baoquan He > >> Date: Wed, 11 Jul 2018 20:31:51 +0800 > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text > >> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting > >> PFN movable zone begins in each node, kernel text position is not > >> considered. KASLR may put kernel after which movable zone begins. > >> > >> Fix it by finding movable zone after kernel text on that node. > >> > >> Signed-off-by: Baoquan He > > > > > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or > >'movablecore=' failed if the KASLR puts the kernel back the tail of the > >last node, or more. > > I think it may not fail. > There is a 'restart' to do another pass. > > > > >Due to we have fix the mirror memory in KASLR side, and Chao is trying > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix > >this in the KASLR side. > > > > I think it's better to fix here, but not KASLR side. > Cause much more code will be change if doing it in KASLR side. > Since we didn't parse 'kernelcore' in compressed code, and you can see > the distribution of ZONE_MOVABLE need so much code, so we do not need > to do so much job in KASLR side. But here, several lines will be OK. I am not able to find the beginning of the email thread right now. Could you summarize what is the actual problem please? -- Michal Hocko SUSE Labs
Re: Bug report about KASLR and ZONE_MOVABLE
On 07/12/18 at 09:19am, Chao Fan wrote: > On Wed, Jul 11, 2018 at 08:40:08PM +0800, Baoquan He wrote: > >Please try this v3 patch: > > > >From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 > >From: Baoquan He > >Date: Wed, 11 Jul 2018 20:31:51 +0800 > >Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text > > > >In find_zone_movable_pfns_for_nodes(), when try to find the starting > >PFN movable zone begins in each node, kernel text position is not > >considered. KASLR may put kernel after which movable zone begins. > > > >Fix it by finding movable zone after kernel text on that node. > > > >Signed-off-by: Baoquan He > >--- > > mm/page_alloc.c | 20 +++- > > 1 file changed, 15 insertions(+), 5 deletions(-) > > > >diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >index 1521100..390eb35 100644 > >--- a/mm/page_alloc.c > >+++ b/mm/page_alloc.c > >@@ -6547,7 +6547,7 @@ static unsigned long __init > >early_calculate_totalpages(void) > > static void __init find_zone_movable_pfns_for_nodes(void) > > { > > int i, nid; > >-unsigned long usable_startpfn; > >+unsigned long usable_startpfn, real_startpfn; > > unsigned long kernelcore_node, kernelcore_remaining; > > /* save the state before borrow the nodemask */ > > nodemask_t saved_node_state = node_states[N_MEMORY]; > >@@ -6681,10 +6681,20 @@ static void __init > >find_zone_movable_pfns_for_nodes(void) > > if (start_pfn >= end_pfn) > > continue; > > Hi Baoquan, > > Thanks for your quick reply and PATCH. > I think it can work well after reviewing the code. But I think the new > variable 'real_startpfn' is unnecessary. How about this: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 6d00f746c2fd..0fc9c4283947 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -6492,6 +6492,10 @@ static void __init > find_zone_movable_pfns_for_nodes(void) > if (start_pfn >= end_pfn) > continue; > > + if (pfn_to_nid(PFN_UP(_etext)) == i) > + usable_startpfn = max(usable_startpfn, > + PFN_UP(_etext)); > + > /* Account for what is only usable for kernelcore */ > if (start_pfn < usable_startpfn) { > unsigned long kernel_pages; > > I think the logic of these two method are the same, and this method > change less code. If I am wrong, please let me know. Might be not. Need consider usable_startpfn and kernel_pfn are in the same node, or in different node, two cases. I will correct code after fix the compiling error. > > > > > >+/* > >+ * KASLR may put kernel near tail of node memory, > >+ * start after kernel on that node to find PFN > >+ * which zone begins. > >+ */ > >+if (pfn_to_nid(PFN_UP(_etext)) == i) > >+real_startpfn = max(usable_startpfn, > >+PFN_UP(_etext)) > >+else > >+real_startpfn = usable_startpfn; > > /* Account for what is only usable for kernelcore */ > >-if (start_pfn < usable_startpfn) { > >+if (start_pfn < real_startpfn) { > > unsigned long kernel_pages; > >-kernel_pages = min(end_pfn, usable_startpfn) > >+kernel_pages = min(end_pfn, real_startpfn) > > - start_pfn; > > > > kernelcore_remaining -= min(kernel_pages, > >@@ -6693,7 +6703,7 @@ static void __init > >find_zone_movable_pfns_for_nodes(void) > > required_kernelcore); > > > > /* Continue if range is now fully accounted */ > >-if (end_pfn <= usable_startpfn) { > >+if (end_pfn <= real_startpfn) { > > > > /* > > * Push zone_movable_pfn to the end so > >@@ -6704,7 +6714,7 @@ static void __init > >find_zone_movable_pfns_for_nodes(void) > > zone_movable_pfn[nid] = end_pfn; > > continue; > > } > >-start_pfn = usable_startpfn; > >+start_pfn = real_startpfn; > > } > > > > /* > >-- > >2.1.0 > > > > > > > >
Re: Bug report about KASLR and ZONE_MOVABLE
On 07/12/18 at 09:19am, Chao Fan wrote: > On Wed, Jul 11, 2018 at 08:40:08PM +0800, Baoquan He wrote: > >Please try this v3 patch: > > > >From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 > >From: Baoquan He > >Date: Wed, 11 Jul 2018 20:31:51 +0800 > >Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text > > > >In find_zone_movable_pfns_for_nodes(), when try to find the starting > >PFN movable zone begins in each node, kernel text position is not > >considered. KASLR may put kernel after which movable zone begins. > > > >Fix it by finding movable zone after kernel text on that node. > > > >Signed-off-by: Baoquan He > >--- > > mm/page_alloc.c | 20 +++- > > 1 file changed, 15 insertions(+), 5 deletions(-) > > > >diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >index 1521100..390eb35 100644 > >--- a/mm/page_alloc.c > >+++ b/mm/page_alloc.c > >@@ -6547,7 +6547,7 @@ static unsigned long __init > >early_calculate_totalpages(void) > > static void __init find_zone_movable_pfns_for_nodes(void) > > { > > int i, nid; > >-unsigned long usable_startpfn; > >+unsigned long usable_startpfn, real_startpfn; > > unsigned long kernelcore_node, kernelcore_remaining; > > /* save the state before borrow the nodemask */ > > nodemask_t saved_node_state = node_states[N_MEMORY]; > >@@ -6681,10 +6681,20 @@ static void __init > >find_zone_movable_pfns_for_nodes(void) > > if (start_pfn >= end_pfn) > > continue; > > Hi Baoquan, > > Thanks for your quick reply and PATCH. > I think it can work well after reviewing the code. But I think the new > variable 'real_startpfn' is unnecessary. How about this: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 6d00f746c2fd..0fc9c4283947 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -6492,6 +6492,10 @@ static void __init > find_zone_movable_pfns_for_nodes(void) > if (start_pfn >= end_pfn) > continue; > > + if (pfn_to_nid(PFN_UP(_etext)) == i) > + usable_startpfn = max(usable_startpfn, > + PFN_UP(_etext)); > + > /* Account for what is only usable for kernelcore */ > if (start_pfn < usable_startpfn) { > unsigned long kernel_pages; > > I think the logic of these two method are the same, and this method > change less code. If I am wrong, please let me know. Might be not. Need consider usable_startpfn and kernel_pfn are in the same node, or in different node, two cases. I will correct code after fix the compiling error. > > > > > >+/* > >+ * KASLR may put kernel near tail of node memory, > >+ * start after kernel on that node to find PFN > >+ * which zone begins. > >+ */ > >+if (pfn_to_nid(PFN_UP(_etext)) == i) > >+real_startpfn = max(usable_startpfn, > >+PFN_UP(_etext)) > >+else > >+real_startpfn = usable_startpfn; > > /* Account for what is only usable for kernelcore */ > >-if (start_pfn < usable_startpfn) { > >+if (start_pfn < real_startpfn) { > > unsigned long kernel_pages; > >-kernel_pages = min(end_pfn, usable_startpfn) > >+kernel_pages = min(end_pfn, real_startpfn) > > - start_pfn; > > > > kernelcore_remaining -= min(kernel_pages, > >@@ -6693,7 +6703,7 @@ static void __init > >find_zone_movable_pfns_for_nodes(void) > > required_kernelcore); > > > > /* Continue if range is now fully accounted */ > >-if (end_pfn <= usable_startpfn) { > >+if (end_pfn <= real_startpfn) { > > > > /* > > * Push zone_movable_pfn to the end so > >@@ -6704,7 +6714,7 @@ static void __init > >find_zone_movable_pfns_for_nodes(void) > > zone_movable_pfn[nid] = end_pfn; > > continue; > > } > >-start_pfn = usable_startpfn; > >+start_pfn = real_startpfn; > > } > > > > /* > >-- > >2.1.0 > > > > > > > >
Re: Bug report about KASLR and ZONE_MOVABLE
On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote: >Hi Baoquan, > >At 07/11/2018 08:40 PM, Baoquan He wrote: >> Please try this v3 patch: >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 >> From: Baoquan He >> Date: Wed, 11 Jul 2018 20:31:51 +0800 >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text >> >> In find_zone_movable_pfns_for_nodes(), when try to find the starting >> PFN movable zone begins in each node, kernel text position is not >> considered. KASLR may put kernel after which movable zone begins. >> >> Fix it by finding movable zone after kernel text on that node. >> >> Signed-off-by: Baoquan He > > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or >'movablecore=' failed if the KASLR puts the kernel back the tail of the >last node, or more. I think it may not fail. There is a 'restart' to do another pass. > >Due to we have fix the mirror memory in KASLR side, and Chao is trying >to fix the 'movable_node' in KASLR side. Have you had a chance to fix >this in the KASLR side. > I think it's better to fix here, but not KASLR side. Cause much more code will be change if doing it in KASLR side. Since we didn't parse 'kernelcore' in compressed code, and you can see the distribution of ZONE_MOVABLE need so much code, so we do not need to do so much job in KASLR side. But here, several lines will be OK. Thanks, Chao Fan > >> --- >> mm/page_alloc.c | 20 +++- >> 1 file changed, 15 insertions(+), 5 deletions(-) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 1521100..390eb35 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -6547,7 +6547,7 @@ static unsigned long __init >> early_calculate_totalpages(void) >> static void __init find_zone_movable_pfns_for_nodes(void) >> { >> int i, nid; >> -unsigned long usable_startpfn; >> +unsigned long usable_startpfn, real_startpfn; >> unsigned long kernelcore_node, kernelcore_remaining; >> /* save the state before borrow the nodemask */ >> nodemask_t saved_node_state = node_states[N_MEMORY]; >> @@ -6681,10 +6681,20 @@ static void __init >> find_zone_movable_pfns_for_nodes(void) >> if (start_pfn >= end_pfn) >> continue; >> +/* >> + * KASLR may put kernel near tail of node memory, >> + * start after kernel on that node to find PFN >> + * which zone begins. >> + */ >> +if (pfn_to_nid(PFN_UP(_etext)) == i) > >Here, did you want to check the Node id? seems may be nid. > >and > >for_each_node_state(nid, N_MEMORY) { > >... seems check here is more suitable. > > for_each_mem_pfn_range(i, nid, _pfn, _pfn, NULL) { > > } >} > >Thanks, > dou > >> +real_startpfn = max(usable_startpfn, >> +PFN_UP(_etext)) >> +else >> +real_startpfn = usable_startpfn; >> /* Account for what is only usable for kernelcore */ >> -if (start_pfn < usable_startpfn) { >> +if (start_pfn < real_startpfn) { >> unsigned long kernel_pages; >> -kernel_pages = min(end_pfn, usable_startpfn) >> +kernel_pages = min(end_pfn, real_startpfn) >> - start_pfn; >> kernelcore_remaining -= min(kernel_pages, >> @@ -6693,7 +6703,7 @@ static void __init >> find_zone_movable_pfns_for_nodes(void) >> required_kernelcore); >> /* Continue if range is now fully accounted */ >> -if (end_pfn <= usable_startpfn) { >> +if (end_pfn <= real_startpfn) { >> /* >> * Push zone_movable_pfn to the end so >> @@ -6704,7 +6714,7 @@ static void __init >> find_zone_movable_pfns_for_nodes(void) >> zone_movable_pfn[nid] = end_pfn; >> continue; >> } >> -start_pfn = usable_startpfn; >> +start_pfn = real_startpfn; >> } >> /* >>
Re: Bug report about KASLR and ZONE_MOVABLE
On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote: >Hi Baoquan, > >At 07/11/2018 08:40 PM, Baoquan He wrote: >> Please try this v3 patch: >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 >> From: Baoquan He >> Date: Wed, 11 Jul 2018 20:31:51 +0800 >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text >> >> In find_zone_movable_pfns_for_nodes(), when try to find the starting >> PFN movable zone begins in each node, kernel text position is not >> considered. KASLR may put kernel after which movable zone begins. >> >> Fix it by finding movable zone after kernel text on that node. >> >> Signed-off-by: Baoquan He > > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or >'movablecore=' failed if the KASLR puts the kernel back the tail of the >last node, or more. I think it may not fail. There is a 'restart' to do another pass. > >Due to we have fix the mirror memory in KASLR side, and Chao is trying >to fix the 'movable_node' in KASLR side. Have you had a chance to fix >this in the KASLR side. > I think it's better to fix here, but not KASLR side. Cause much more code will be change if doing it in KASLR side. Since we didn't parse 'kernelcore' in compressed code, and you can see the distribution of ZONE_MOVABLE need so much code, so we do not need to do so much job in KASLR side. But here, several lines will be OK. Thanks, Chao Fan > >> --- >> mm/page_alloc.c | 20 +++- >> 1 file changed, 15 insertions(+), 5 deletions(-) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 1521100..390eb35 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -6547,7 +6547,7 @@ static unsigned long __init >> early_calculate_totalpages(void) >> static void __init find_zone_movable_pfns_for_nodes(void) >> { >> int i, nid; >> -unsigned long usable_startpfn; >> +unsigned long usable_startpfn, real_startpfn; >> unsigned long kernelcore_node, kernelcore_remaining; >> /* save the state before borrow the nodemask */ >> nodemask_t saved_node_state = node_states[N_MEMORY]; >> @@ -6681,10 +6681,20 @@ static void __init >> find_zone_movable_pfns_for_nodes(void) >> if (start_pfn >= end_pfn) >> continue; >> +/* >> + * KASLR may put kernel near tail of node memory, >> + * start after kernel on that node to find PFN >> + * which zone begins. >> + */ >> +if (pfn_to_nid(PFN_UP(_etext)) == i) > >Here, did you want to check the Node id? seems may be nid. > >and > >for_each_node_state(nid, N_MEMORY) { > >... seems check here is more suitable. > > for_each_mem_pfn_range(i, nid, _pfn, _pfn, NULL) { > > } >} > >Thanks, > dou > >> +real_startpfn = max(usable_startpfn, >> +PFN_UP(_etext)) >> +else >> +real_startpfn = usable_startpfn; >> /* Account for what is only usable for kernelcore */ >> -if (start_pfn < usable_startpfn) { >> +if (start_pfn < real_startpfn) { >> unsigned long kernel_pages; >> -kernel_pages = min(end_pfn, usable_startpfn) >> +kernel_pages = min(end_pfn, real_startpfn) >> - start_pfn; >> kernelcore_remaining -= min(kernel_pages, >> @@ -6693,7 +6703,7 @@ static void __init >> find_zone_movable_pfns_for_nodes(void) >> required_kernelcore); >> /* Continue if range is now fully accounted */ >> -if (end_pfn <= usable_startpfn) { >> +if (end_pfn <= real_startpfn) { >> /* >> * Push zone_movable_pfn to the end so >> @@ -6704,7 +6714,7 @@ static void __init >> find_zone_movable_pfns_for_nodes(void) >> zone_movable_pfn[nid] = end_pfn; >> continue; >> } >> -start_pfn = usable_startpfn; >> +start_pfn = real_startpfn; >> } >> /* >>
Re: Bug report about KASLR and ZONE_MOVABLE
Hi Baoquan, At 07/11/2018 08:40 PM, Baoquan He wrote: Please try this v3 patch: >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Wed, 11 Jul 2018 20:31:51 +0800 Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text In find_zone_movable_pfns_for_nodes(), when try to find the starting PFN movable zone begins in each node, kernel text position is not considered. KASLR may put kernel after which movable zone begins. Fix it by finding movable zone after kernel text on that node. Signed-off-by: Baoquan He You fix this in the _zone_init side_. This may make the 'kernelcore=' or 'movablecore=' failed if the KASLR puts the kernel back the tail of the last node, or more. Due to we have fix the mirror memory in KASLR side, and Chao is trying to fix the 'movable_node' in KASLR side. Have you had a chance to fix this in the KASLR side. --- mm/page_alloc.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1521100..390eb35 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6547,7 +6547,7 @@ static unsigned long __init early_calculate_totalpages(void) static void __init find_zone_movable_pfns_for_nodes(void) { int i, nid; - unsigned long usable_startpfn; + unsigned long usable_startpfn, real_startpfn; unsigned long kernelcore_node, kernelcore_remaining; /* save the state before borrow the nodemask */ nodemask_t saved_node_state = node_states[N_MEMORY]; @@ -6681,10 +6681,20 @@ static void __init find_zone_movable_pfns_for_nodes(void) if (start_pfn >= end_pfn) continue; + /* +* KASLR may put kernel near tail of node memory, +* start after kernel on that node to find PFN +* which zone begins. +*/ + if (pfn_to_nid(PFN_UP(_etext)) == i) Here, did you want to check the Node id? seems may be nid. and for_each_node_state(nid, N_MEMORY) { ... seems check here is more suitable. for_each_mem_pfn_range(i, nid, _pfn, _pfn, NULL) { } } Thanks, dou + real_startpfn = max(usable_startpfn, + PFN_UP(_etext)) + else + real_startpfn = usable_startpfn; /* Account for what is only usable for kernelcore */ - if (start_pfn < usable_startpfn) { + if (start_pfn < real_startpfn) { unsigned long kernel_pages; - kernel_pages = min(end_pfn, usable_startpfn) + kernel_pages = min(end_pfn, real_startpfn) - start_pfn; kernelcore_remaining -= min(kernel_pages, @@ -6693,7 +6703,7 @@ static void __init find_zone_movable_pfns_for_nodes(void) required_kernelcore); /* Continue if range is now fully accounted */ - if (end_pfn <= usable_startpfn) { + if (end_pfn <= real_startpfn) { /* * Push zone_movable_pfn to the end so @@ -6704,7 +6714,7 @@ static void __init find_zone_movable_pfns_for_nodes(void) zone_movable_pfn[nid] = end_pfn; continue; } - start_pfn = usable_startpfn; + start_pfn = real_startpfn; } /*
Re: Bug report about KASLR and ZONE_MOVABLE
Hi Baoquan, At 07/11/2018 08:40 PM, Baoquan He wrote: Please try this v3 patch: >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Wed, 11 Jul 2018 20:31:51 +0800 Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text In find_zone_movable_pfns_for_nodes(), when try to find the starting PFN movable zone begins in each node, kernel text position is not considered. KASLR may put kernel after which movable zone begins. Fix it by finding movable zone after kernel text on that node. Signed-off-by: Baoquan He You fix this in the _zone_init side_. This may make the 'kernelcore=' or 'movablecore=' failed if the KASLR puts the kernel back the tail of the last node, or more. Due to we have fix the mirror memory in KASLR side, and Chao is trying to fix the 'movable_node' in KASLR side. Have you had a chance to fix this in the KASLR side. --- mm/page_alloc.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1521100..390eb35 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6547,7 +6547,7 @@ static unsigned long __init early_calculate_totalpages(void) static void __init find_zone_movable_pfns_for_nodes(void) { int i, nid; - unsigned long usable_startpfn; + unsigned long usable_startpfn, real_startpfn; unsigned long kernelcore_node, kernelcore_remaining; /* save the state before borrow the nodemask */ nodemask_t saved_node_state = node_states[N_MEMORY]; @@ -6681,10 +6681,20 @@ static void __init find_zone_movable_pfns_for_nodes(void) if (start_pfn >= end_pfn) continue; + /* +* KASLR may put kernel near tail of node memory, +* start after kernel on that node to find PFN +* which zone begins. +*/ + if (pfn_to_nid(PFN_UP(_etext)) == i) Here, did you want to check the Node id? seems may be nid. and for_each_node_state(nid, N_MEMORY) { ... seems check here is more suitable. for_each_mem_pfn_range(i, nid, _pfn, _pfn, NULL) { } } Thanks, dou + real_startpfn = max(usable_startpfn, + PFN_UP(_etext)) + else + real_startpfn = usable_startpfn; /* Account for what is only usable for kernelcore */ - if (start_pfn < usable_startpfn) { + if (start_pfn < real_startpfn) { unsigned long kernel_pages; - kernel_pages = min(end_pfn, usable_startpfn) + kernel_pages = min(end_pfn, real_startpfn) - start_pfn; kernelcore_remaining -= min(kernel_pages, @@ -6693,7 +6703,7 @@ static void __init find_zone_movable_pfns_for_nodes(void) required_kernelcore); /* Continue if range is now fully accounted */ - if (end_pfn <= usable_startpfn) { + if (end_pfn <= real_startpfn) { /* * Push zone_movable_pfn to the end so @@ -6704,7 +6714,7 @@ static void __init find_zone_movable_pfns_for_nodes(void) zone_movable_pfn[nid] = end_pfn; continue; } - start_pfn = usable_startpfn; + start_pfn = real_startpfn; } /*
Re: Bug report about KASLR and ZONE_MOVABLE
On Wed, Jul 11, 2018 at 08:40:08PM +0800, Baoquan He wrote: >Please try this v3 patch: > >From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 >From: Baoquan He >Date: Wed, 11 Jul 2018 20:31:51 +0800 >Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text > >In find_zone_movable_pfns_for_nodes(), when try to find the starting >PFN movable zone begins in each node, kernel text position is not >considered. KASLR may put kernel after which movable zone begins. > >Fix it by finding movable zone after kernel text on that node. > >Signed-off-by: Baoquan He >--- > mm/page_alloc.c | 20 +++- > 1 file changed, 15 insertions(+), 5 deletions(-) > >diff --git a/mm/page_alloc.c b/mm/page_alloc.c >index 1521100..390eb35 100644 >--- a/mm/page_alloc.c >+++ b/mm/page_alloc.c >@@ -6547,7 +6547,7 @@ static unsigned long __init >early_calculate_totalpages(void) > static void __init find_zone_movable_pfns_for_nodes(void) > { > int i, nid; >- unsigned long usable_startpfn; >+ unsigned long usable_startpfn, real_startpfn; > unsigned long kernelcore_node, kernelcore_remaining; > /* save the state before borrow the nodemask */ > nodemask_t saved_node_state = node_states[N_MEMORY]; >@@ -6681,10 +6681,20 @@ static void __init >find_zone_movable_pfns_for_nodes(void) > if (start_pfn >= end_pfn) > continue; Hi Baoquan, Thanks for your quick reply and PATCH. I think it can work well after reviewing the code. But I think the new variable 'real_startpfn' is unnecessary. How about this: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6d00f746c2fd..0fc9c4283947 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6492,6 +6492,10 @@ static void __init find_zone_movable_pfns_for_nodes(void) if (start_pfn >= end_pfn) continue; + if (pfn_to_nid(PFN_UP(_etext)) == i) + usable_startpfn = max(usable_startpfn, + PFN_UP(_etext)); + /* Account for what is only usable for kernelcore */ if (start_pfn < usable_startpfn) { unsigned long kernel_pages; I think the logic of these two method are the same, and this method change less code. If I am wrong, please let me know. Thanks, Chao Fan > >+ /* >+ * KASLR may put kernel near tail of node memory, >+ * start after kernel on that node to find PFN >+ * which zone begins. >+ */ >+ if (pfn_to_nid(PFN_UP(_etext)) == i) >+ real_startpfn = max(usable_startpfn, >+ PFN_UP(_etext)) >+ else >+ real_startpfn = usable_startpfn; > /* Account for what is only usable for kernelcore */ >- if (start_pfn < usable_startpfn) { >+ if (start_pfn < real_startpfn) { > unsigned long kernel_pages; >- kernel_pages = min(end_pfn, usable_startpfn) >+ kernel_pages = min(end_pfn, real_startpfn) > - start_pfn; > > kernelcore_remaining -= min(kernel_pages, >@@ -6693,7 +6703,7 @@ static void __init find_zone_movable_pfns_for_nodes(void) > required_kernelcore); > > /* Continue if range is now fully accounted */ >- if (end_pfn <= usable_startpfn) { >+ if (end_pfn <= real_startpfn) { > > /* >* Push zone_movable_pfn to the end so >@@ -6704,7 +6714,7 @@ static void __init find_zone_movable_pfns_for_nodes(void) > zone_movable_pfn[nid] = end_pfn; > continue; > } >- start_pfn = usable_startpfn; >+ start_pfn = real_startpfn; > } > > /* >-- >2.1.0 > > >