Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)

2024-03-20 Thread Steven Rostedt
On Wed, 20 Mar 2024 12:44:23 +0900
Masami Hiramatsu (Google)  wrote:

> > > kernel/trace/trace_probe.c
> > > 846 return;
> > > 847 
> > > 848 for (i = 0; i < earg->size; i++) {
> > > 849 struct fetch_insn *code = >code[i];
> > > 850 
> > > 851 switch (code->op) {
> > > 852 case FETCH_OP_ARG:
> > > 853 val = regs_get_kernel_argument(regs, 
> > > code->param);
> > > 854 break;
> > > 855 case FETCH_OP_ST_EDATA:  
> > > --> 856 *(unsigned long *)((unsigned long)edata + 
> > > code->offset) = val;
> > > 
> > > Probably the earg->code[i] always has FETCH_OP_ARG before
> > > FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out...  
> > 
> > Looks that way:
> > 
> > case FETCH_OP_END:
> > earg->code[i].op = FETCH_OP_ARG;
> > earg->code[i].param = argnum;
> > earg->code[i + 1].op = FETCH_OP_ST_EDATA;
> > earg->code[i + 1].offset = offset;
> > return offset;
> > 
> > But probably should still initialize val to zero or have a WARN_ON() if
> > that doesn't happen.  
> 
> OK, let's val = 0 in the store_trace_entry_data(), but WARN_ON() in this loop
> is a bit strange. I think we should have a verifiler.

Initializing to zero is fine.

-- Steve



Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)

2024-03-19 Thread Google
On Tue, 19 Mar 2024 10:10:00 -0400
Steven Rostedt  wrote:

> On Tue, 19 Mar 2024 10:19:09 +0300
> Dan Carpenter  wrote:
> 
> > Hello Masami Hiramatsu (Google),
> > 
> > Commit 25f00e40ce79 ("tracing/probes: Support $argN in return probe
> > (kprobe and fprobe)") from Mar 4, 2024 (linux-next), leads to the
> > following Smatch static checker warning:
> > 
> > kernel/trace/trace_probe.c:856 store_trace_entry_data()
> > error: uninitialized symbol 'val'.
> > 
> > kernel/trace/trace_probe.c
> > 846 return;
> > 847 
> > 848 for (i = 0; i < earg->size; i++) {
> > 849 struct fetch_insn *code = >code[i];
> > 850 
> > 851 switch (code->op) {
> > 852 case FETCH_OP_ARG:
> > 853 val = regs_get_kernel_argument(regs, 
> > code->param);
> > 854 break;
> > 855 case FETCH_OP_ST_EDATA:
> > --> 856 *(unsigned long *)((unsigned long)edata + 
> > code->offset) = val;  
> > 
> > Probably the earg->code[i] always has FETCH_OP_ARG before
> > FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out...
> 
> Looks that way:
> 
>   case FETCH_OP_END:
>   earg->code[i].op = FETCH_OP_ARG;
>   earg->code[i].param = argnum;
>   earg->code[i + 1].op = FETCH_OP_ST_EDATA;
>   earg->code[i + 1].offset = offset;
>   return offset;
> 
> But probably should still initialize val to zero or have a WARN_ON() if
> that doesn't happen.

OK, let's val = 0 in the store_trace_entry_data(), but WARN_ON() in this loop
is a bit strange. I think we should have a verifiler.

Thank you,

> 
> -- Steve
> 
> 
> > 
> > 857 break;
> > 858 case FETCH_OP_END:
> > 859 goto end;
> > 860 default:
> > 861 break;
> > 862 }
> > 863 }
> > 864 end:
> > 865 return;
> > 866 }
> > 
> > regards,
> > dan carpenter
> 


-- 
Masami Hiramatsu (Google) 



Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)

2024-03-19 Thread Dan Carpenter
On Tue, Mar 19, 2024 at 10:10:00AM -0400, Steven Rostedt wrote:
> On Tue, 19 Mar 2024 10:19:09 +0300
> Dan Carpenter  wrote:
> 
> > Hello Masami Hiramatsu (Google),
> > 
> > Commit 25f00e40ce79 ("tracing/probes: Support $argN in return probe
> > (kprobe and fprobe)") from Mar 4, 2024 (linux-next), leads to the
> > following Smatch static checker warning:
> > 
> > kernel/trace/trace_probe.c:856 store_trace_entry_data()
> > error: uninitialized symbol 'val'.
> > 
> > kernel/trace/trace_probe.c
> > 846 return;
> > 847 
> > 848 for (i = 0; i < earg->size; i++) {
> > 849 struct fetch_insn *code = >code[i];
> > 850 
> > 851 switch (code->op) {
> > 852 case FETCH_OP_ARG:
> > 853 val = regs_get_kernel_argument(regs, 
> > code->param);
> > 854 break;
> > 855 case FETCH_OP_ST_EDATA:
> > --> 856 *(unsigned long *)((unsigned long)edata + 
> > code->offset) = val;  
> > 
> > Probably the earg->code[i] always has FETCH_OP_ARG before
> > FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out...
> 
> Looks that way:
> 
>   case FETCH_OP_END:
>   earg->code[i].op = FETCH_OP_ARG;
>   earg->code[i].param = argnum;
>   earg->code[i + 1].op = FETCH_OP_ST_EDATA;
>   earg->code[i + 1].offset = offset;
>   return offset;
> 
> But probably should still initialize val to zero or have a WARN_ON() if
> that doesn't happen.
> 

Most people use the GCC extension to initialize everything to zero so
initializing to zero really has zero cost.  I always recomend people to
do it.

regards,
dan carpenter




Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)

2024-03-19 Thread Steven Rostedt
On Tue, 19 Mar 2024 10:19:09 +0300
Dan Carpenter  wrote:

> Hello Masami Hiramatsu (Google),
> 
> Commit 25f00e40ce79 ("tracing/probes: Support $argN in return probe
> (kprobe and fprobe)") from Mar 4, 2024 (linux-next), leads to the
> following Smatch static checker warning:
> 
>   kernel/trace/trace_probe.c:856 store_trace_entry_data()
>   error: uninitialized symbol 'val'.
> 
> kernel/trace/trace_probe.c
> 846 return;
> 847 
> 848 for (i = 0; i < earg->size; i++) {
> 849 struct fetch_insn *code = >code[i];
> 850 
> 851 switch (code->op) {
> 852 case FETCH_OP_ARG:
> 853 val = regs_get_kernel_argument(regs, 
> code->param);
> 854 break;
> 855 case FETCH_OP_ST_EDATA:
> --> 856 *(unsigned long *)((unsigned long)edata + 
> code->offset) = val;  
> 
> Probably the earg->code[i] always has FETCH_OP_ARG before
> FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out...

Looks that way:

case FETCH_OP_END:
earg->code[i].op = FETCH_OP_ARG;
earg->code[i].param = argnum;
earg->code[i + 1].op = FETCH_OP_ST_EDATA;
earg->code[i + 1].offset = offset;
return offset;

But probably should still initialize val to zero or have a WARN_ON() if
that doesn't happen.

-- Steve


> 
> 857 break;
> 858 case FETCH_OP_END:
> 859 goto end;
> 860 default:
> 861 break;
> 862 }
> 863 }
> 864 end:
> 865 return;
> 866 }
> 
> regards,
> dan carpenter




Re: [bug report] Memory leak from acpi_ev_install_space_handler()

2021-04-06 Thread John Garry

On 06/04/2021 17:40, Rafael J. Wysocki wrote:

On Tue, Apr 6, 2021 at 5:51 PM John Garry  wrote:


Hi guys,

On next-20210406, I enabled CONFIG_DEBUG_KMEMLEAK and
CONFIG_DEBUG_TEST_DRIVER_REMOVE for my arm64 system, and see this:




Hi Rafael,


Why exactly do you think that acpi_ev_install_space_handler() leaks memory?



I don't think that acpi_ev_install_space_handler() itself leaks memory, 
but it seems that there is something missing in the code which is meant 
to undo/clean up after that on the uninstall path - I did make the point 
in writing "memory leak from", but maybe still not worded clearly.


Anyway, I have not analyzed the problem fully - I'm just reporting.

I don't mind looking further if requested.

Thanks,
John


root@debian:/home/john# more /sys/kernel/debug/kmemleak
unreferenced object 0x202803c11f00 (size 128):
comm "swapper/0", pid 1, jiffies 4294894325 (age 337.524s)
hex dump (first 32 bytes):
00 00 00 00 02 00 00 00 08 1f c1 03 28 20 ff ff( ..
08 1f c1 03 28 20 ff ff 00 00 00 00 00 00 00 00( ..
backtrace:
[<670a0938>] slab_post_alloc_hook+0x9c/0x2f8
[] kmem_cache_alloc+0x198/0x2a8
[<2bdba864>] acpi_os_create_semaphore+0x54/0xe0
[] acpi_ev_install_space_handler+0x24c/0x300
[<02e116e2>] acpi_install_address_space_handler+0x64/0xb0
[] i2c_acpi_install_space_handler+0xd4/0x138
[<8da42058>] i2c_register_adapter+0x368/0x910
[] i2c_add_adapter+0x9c/0x100
[<00ba2fcf>] i2c_add_numbered_adapter+0x44/0x58
[<7df22d67>] i2c_dw_probe_master+0x68c/0x900
[<682dfc98>] dw_i2c_plat_probe+0x460/0x640
[] platform_probe+0x8c/0x108
[] really_probe+0x190/0x670
[<66017341>] driver_probe_device+0x8c/0xf8
[] device_driver_attach+0x9c/0xa8
[] __driver_attach+0x88/0x138
unreferenced object 0x00280452c100 (size 128):
comm "swapper/0", pid 1, jiffies 4294894558 (age 336.604s)
hex dump (first 32 bytes):
00 00 00 00 02 00 00 00 08 c1 52 04 28 00 ff ff..R.(...
08 c1 52 04 28 00 ff ff 00 00 00 00 00 00 00 00..R.(...
backtrace:
[<670a0938>] slab_post_alloc_hook+0x9c/0x2f8
[] kmem_cache_alloc+0x198/0x2a8
[<2bdba864>] acpi_os_create_semaphore+0x54/0xe0
[] acpi_ev_install_space_handler+0x24c/0x300
[<02e116e2>] acpi_install_address_space_handler+0x64/0xb0
[<988d4f61>] acpi_gpiochip_add+0x20c/0x4a0
[<73d4faab>] gpiochip_add_data_with_key+0xd10/0x1680
[<1d50b98a>] devm_gpiochip_add_data_with_key+0x30/0x78
[] dwapb_gpio_probe+0x828/0xb28
[] platform_probe+0x8c/0x108
[] really_probe+0x190/0x670
[<66017341>] driver_probe_device+0x8c/0xf8
[] device_driver_attach+0x9c/0xa8
[] __driver_attach+0x88/0x138
[] bus_for_each_dev+0xec/0x160
[] driver_attach+0x34/0x48
root@debian:/home/john#

Thanks,
John

.





Re: [bug report] Memory leak from acpi_ev_install_space_handler()

2021-04-06 Thread Rafael J. Wysocki
On Tue, Apr 6, 2021 at 5:51 PM John Garry  wrote:
>
> Hi guys,
>
> On next-20210406, I enabled CONFIG_DEBUG_KMEMLEAK and
> CONFIG_DEBUG_TEST_DRIVER_REMOVE for my arm64 system, and see this:

Why exactly do you think that acpi_ev_install_space_handler() leaks memory?

> root@debian:/home/john# more /sys/kernel/debug/kmemleak
> unreferenced object 0x202803c11f00 (size 128):
> comm "swapper/0", pid 1, jiffies 4294894325 (age 337.524s)
> hex dump (first 32 bytes):
> 00 00 00 00 02 00 00 00 08 1f c1 03 28 20 ff ff( ..
> 08 1f c1 03 28 20 ff ff 00 00 00 00 00 00 00 00( ..
> backtrace:
> [<670a0938>] slab_post_alloc_hook+0x9c/0x2f8
> [] kmem_cache_alloc+0x198/0x2a8
> [<2bdba864>] acpi_os_create_semaphore+0x54/0xe0
> [] acpi_ev_install_space_handler+0x24c/0x300
> [<02e116e2>] acpi_install_address_space_handler+0x64/0xb0
> [] i2c_acpi_install_space_handler+0xd4/0x138
> [<8da42058>] i2c_register_adapter+0x368/0x910
> [] i2c_add_adapter+0x9c/0x100
> [<00ba2fcf>] i2c_add_numbered_adapter+0x44/0x58
> [<7df22d67>] i2c_dw_probe_master+0x68c/0x900
> [<682dfc98>] dw_i2c_plat_probe+0x460/0x640
> [] platform_probe+0x8c/0x108
> [] really_probe+0x190/0x670
> [<66017341>] driver_probe_device+0x8c/0xf8
> [] device_driver_attach+0x9c/0xa8
> [] __driver_attach+0x88/0x138
> unreferenced object 0x00280452c100 (size 128):
> comm "swapper/0", pid 1, jiffies 4294894558 (age 336.604s)
> hex dump (first 32 bytes):
> 00 00 00 00 02 00 00 00 08 c1 52 04 28 00 ff ff..R.(...
> 08 c1 52 04 28 00 ff ff 00 00 00 00 00 00 00 00..R.(...
> backtrace:
> [<670a0938>] slab_post_alloc_hook+0x9c/0x2f8
> [] kmem_cache_alloc+0x198/0x2a8
> [<2bdba864>] acpi_os_create_semaphore+0x54/0xe0
> [] acpi_ev_install_space_handler+0x24c/0x300
> [<02e116e2>] acpi_install_address_space_handler+0x64/0xb0
> [<988d4f61>] acpi_gpiochip_add+0x20c/0x4a0
> [<73d4faab>] gpiochip_add_data_with_key+0xd10/0x1680
> [<1d50b98a>] devm_gpiochip_add_data_with_key+0x30/0x78
> [] dwapb_gpio_probe+0x828/0xb28
> [] platform_probe+0x8c/0x108
> [] really_probe+0x190/0x670
> [<66017341>] driver_probe_device+0x8c/0xf8
> [] device_driver_attach+0x9c/0xa8
> [] __driver_attach+0x88/0x138
> [] bus_for_each_dev+0xec/0x160
> [] driver_attach+0x34/0x48
> root@debian:/home/john#
>
> Thanks,
> John


Re: [bug report] node: Add memory-side caching attributes

2021-04-01 Thread Keith Busch
On Thu, Apr 01, 2021 at 12:00:39PM +0300, Dan Carpenter wrote:
> Hi Keith,
> 
> I've been trying to figure out ways Smatch can check for device managed
> resources.  Like adding rules that if we call dev_set_name(>bar)
> then it's device managaged and if there is a kfree(foo) without calling
> device_put(>bar) then that's a resource leak.
> 
> Of course one of the rules is that if you call device_register(dev) then
> you can't kfree(dev), it has to released with device_put(dev) and that's
> true even if the register fails.  But this code here feels very
> intentional so maybe there is an exception to the rule?

Thanks for the notice. There was no intentional use of this, so I
think it was a mistake. The proposal in the follow on replies looks much
better.
 
> The patch acc02a109b04: "node: Add memory-side caching attributes"
> from Mar 11, 2019, leads to the following static checker warning:
> 
>   drivers/base/node.c:285 node_init_cache_dev()
>   error: kfree after device_register(): 'dev'
> 
> drivers/base/node.c
>263  static void node_init_cache_dev(struct node *node)
>264  {
>265  struct device *dev;
>266  
>267  dev = kzalloc(sizeof(*dev), GFP_KERNEL);
>268  if (!dev)
>269  return;
>270  
>271  dev->parent = >dev;
>272  dev->release = node_cache_release;
>273  if (dev_set_name(dev, "memory_side_cache"))
>274  goto free_dev;
>275  
>276  if (device_register(dev))
> ^^^
>277  goto free_name;
>278  
>279  pm_runtime_no_callbacks(dev);
>280  node->cache_dev = dev;
>281  return;
>282  free_name:
>283  kfree_const(dev->kobj.name);
>284  free_dev:
>285  kfree(dev);
> ^^
>286  }
> 
> regards,
> dan carpenter


Re: [bug report] node: Add memory-side caching attributes

2021-04-01 Thread Dan Carpenter
On Thu, Apr 01, 2021 at 08:25:11AM -0300, Jason Gunthorpe wrote:
> On Thu, Apr 01, 2021 at 12:00:39PM +0300, Dan Carpenter wrote:
> > Hi Keith,
> > 
> > I've been trying to figure out ways Smatch can check for device managed
> > resources.  Like adding rules that if we call dev_set_name(>bar)
> > then it's device managaged and if there is a kfree(foo) without calling
> > device_put(>bar) then that's a resource leak.
> 
> It seems to be working from what I can see

This check is actually more simple, and older.  It just looks for

device_register(dev);
...
kfree(dev);

I've written your proposed check of:

device_register(>dev);
...
kfree(foo); // warning missing device_put(>dev);

But I just did that earler today and it will probably take a couple
iterations to work out the kinks.  Plus I'm off for a small vacation so
it will be a week before I have the results from that.

> 
> Also I wasn't able to convince myself that any locking around
> node->cache_attrs exist..
> 
> > Of course one of the rules is that if you call device_register(dev) then
> > you can't kfree(dev), it has to released with device_put(dev) and that's
> > true even if the register fails.  But this code here feels very
> > intentional so maybe there is an exception to the rule?
> 
> There is no exception. Open coding this:
> 
> >282  free_name:
> >283  kfree_const(dev->kobj.name);
> 
> To avoid leaking memory from dev_set_name is a straight up layering
> violation, WTF?
> 
> node_cacheinfo_release() is just kfree(), so there is no need.
> Instead (please feel free to send this Dan):

Sure, I can send this (tomorrow).

> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index f449dbb2c74666..89c28952863977 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -319,25 +319,24 @@ void node_add_cache(unsigned int nid, struct 
> node_cache_attrs *cache_attrs)
>   return;
>  
>   dev = >dev;
> + device_initialize(dev)
>   dev->parent = node->cache_dev;
>   dev->release = node_cacheinfo_release;
>   dev->groups = cache_groups;
>   if (dev_set_name(dev, "index%d", cache_attrs->level))

Is calling dev_set_name() without doing a device_initialize() a bug?  I
could write a check for that.

regards,
dan carpenter



Re: [bug report] node: Add memory-side caching attributes

2021-04-01 Thread Jason Gunthorpe
On Thu, Apr 01, 2021 at 05:06:52PM +0300, Dan Carpenter wrote:

> > diff --git a/drivers/base/node.c b/drivers/base/node.c
> > index f449dbb2c74666..89c28952863977 100644
> > +++ b/drivers/base/node.c
> > @@ -319,25 +319,24 @@ void node_add_cache(unsigned int nid, struct 
> > node_cache_attrs *cache_attrs)
> > return;
> >  
> > dev = >dev;
> > +   device_initialize(dev)
> > dev->parent = node->cache_dev;
> > dev->release = node_cacheinfo_release;
> > dev->groups = cache_groups;
> > if (dev_set_name(dev, "index%d", cache_attrs->level))
> 
> Is calling dev_set_name() without doing a device_initialize() a bug?  I
> could write a check for that.

IMHO, yes.

However, Greg may not agree as dev_set_name() with no error check
followed by device_register() is a very common pattern. If the user
omits the device_initialize() then dev_set_name() must be immediately
before only device_register() with no error unwind between them. It
must not error unwind dev_set_name() to kfree. (This is really
tricky)

Greg and I have argued on the merits of device_initialize() several
times before.

I argue the error control flow is simpler and easier to get right, he
argues the extra statement is unneeded complexity and people will get
the error unwind wrong.

Every time you find a bug like this is someone getting the complexity
around error handling and device_register() wrong, so my advices is to
stop using device_register (aka device_init_and_add) and make it
explicit so the goto unwind has logical pairing. put_device() pairs
with device_initialize().

The tricky bit is establishing the release function, as complex
release functions often have complex init sequences and you need the
setup done enough to go to release. When things become this complex I
advocate splitting alloc into a function:

/* Caller must use put_device(>dev) */
struct foo *foo_allocate()
{ 
   foo = kzalloc
   allocate release freeing thing 1;
   allocate release freeing thing 2;
   allocate release freeing thing 3;
   device_initialize();
   return foo;

err:
   free thing 3;
err:
   free thing 2;
err:
   free thing 1;
err:
   kfree(foo)
   return rc;
}

Thus there is a clear logical seperation between the world of 'unwind
to kfree' and the world of 'unwind to put_device'. dev_set_name() can
not be inside an alloc function.

Simple cases, like here, should just do device_initialize()
immediately after kzalloc() and never have a kfree() error unwind.

I saw Christoph had a similar opinion on 'init and add' patterns being
bad, maybe he has additional colour to share.

Jason


Re: [bug report] node: Add memory-side caching attributes

2021-04-01 Thread Jason Gunthorpe
On Thu, Apr 01, 2021 at 12:00:39PM +0300, Dan Carpenter wrote:
> Hi Keith,
> 
> I've been trying to figure out ways Smatch can check for device managed
> resources.  Like adding rules that if we call dev_set_name(>bar)
> then it's device managaged and if there is a kfree(foo) without calling
> device_put(>bar) then that's a resource leak.

It seems to be working from what I can see

Also I wasn't able to convince myself that any locking around
node->cache_attrs exist..

> Of course one of the rules is that if you call device_register(dev) then
> you can't kfree(dev), it has to released with device_put(dev) and that's
> true even if the register fails.  But this code here feels very
> intentional so maybe there is an exception to the rule?

There is no exception. Open coding this:

>282  free_name:
>283  kfree_const(dev->kobj.name);

To avoid leaking memory from dev_set_name is a straight up layering
violation, WTF?

node_cacheinfo_release() is just kfree(), so there is no need.
Instead (please feel free to send this Dan):

diff --git a/drivers/base/node.c b/drivers/base/node.c
index f449dbb2c74666..89c28952863977 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -319,25 +319,24 @@ void node_add_cache(unsigned int nid, struct 
node_cache_attrs *cache_attrs)
return;
 
dev = >dev;
+   device_initialize(dev)
dev->parent = node->cache_dev;
dev->release = node_cacheinfo_release;
dev->groups = cache_groups;
if (dev_set_name(dev, "index%d", cache_attrs->level))
-   goto free_cache;
+   goto put_device;
 
info->cache_attrs = *cache_attrs;
-   if (device_register(dev)) {
+   if (device_add(dev)) {
dev_warn(>dev, "failed to add cache level:%d\n",
 cache_attrs->level);
-   goto free_name;
+   goto put_device
}
pm_runtime_no_callbacks(dev);
list_add_tail(>node, >cache_attrs);
return;
-free_name:
-   kfree_const(dev->kobj.name);
-free_cache:
-   kfree(info);
+put_device:
+   put_device(dev);
 }
 
 static void node_remove_caches(struct node *node)


Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

2021-02-15 Thread Lucas Stach
Am Montag, dem 15.02.2021 um 10:54 -0500 schrieb Sven Van Asbroeck:
> Hi Lucas,
> 
> On Mon, Feb 15, 2021 at 5:15 AM Lucas Stach  wrote:
> > 
> > The straight forward way to fix this would be to just disable the PRE
> > when the stride is getting too large, which might not work well with
> > all userspace requirements, as it effectively disables the ability to
> > scan GPU tiled surfaces when the stride is getting too large.
> 
> Thank you for your very knowledgeable input, really appreciate it.
> 
> I am wondering why I am the first to notice this particular corner
> case. Is this perhaps because X+armada plugin allocate a huge bitmap
> that fits all displays, and other software frameworks do not? Are
> people on i.MX6 mostly using X or Wayland? If Wayland allocates a
> separate bitmap for each display, this PRE bug will of course never
> show up...

Yep, I really doubt that there are a lot i.MX6QP, multi-display, X.Org
based devices out there.

While it's not anywhere in a protocol or similar fixed API, Wayland
compositors mostly opted to have a separate surface per display. The
weston reference compositor started out this way (as it makes surface
repaint easier) and other followed the lead, so Wayland based stacks
won't hit this case.

Regards,
Lucas



Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

2021-02-15 Thread Sven Van Asbroeck
Hi Lucas,

On Mon, Feb 15, 2021 at 5:15 AM Lucas Stach  wrote:
>
> The straight forward way to fix this would be to just disable the PRE
> when the stride is getting too large, which might not work well with
> all userspace requirements, as it effectively disables the ability to
> scan GPU tiled surfaces when the stride is getting too large.

Thank you for your very knowledgeable input, really appreciate it.

I am wondering why I am the first to notice this particular corner
case. Is this perhaps because X+armada plugin allocate a huge bitmap
that fits all displays, and other software frameworks do not? Are
people on i.MX6 mostly using X or Wayland? If Wayland allocates a
separate bitmap for each display, this PRE bug will of course never
show up...

Sven


Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

2021-02-15 Thread Lucas Stach
Hi Sven,

Am Freitag, dem 12.02.2021 um 18:52 -0500 schrieb Sven Van Asbroeck:
> Philipp, Fabio,
> 
> I was able to verify that the PREs do indeed overrun their allocated ocram 
> area.
> 
> Section 38.5.1 of the iMX6QuadPlus manual indicates the ocram size
> required: width(pixels) x 8 lines x 4 bytes. For 2048 pixels max, this
> comes to 64K. This is what the PRE driver allocates. So far, so good.
> 
> The trouble starts when we're displaying a section of a much wider
> bitmap. This happens in X when using two displays. e.g.:
> HDMI 1920x1088
> LVDS 1280x800
> X bitmap 3200x1088, left side displayed on HDMI, right side on LVDS.
> 
> In such a case, the stride will be much larger than the width of a
> display scanline.

Urgh, bad tested corner case.

> This is where things start to go very wrong.
> 
> I found that the ocram area used by the PREs increases with the
> stride. I experimentally found a formula:
> ocam_used = display_widthx8x4 + (bitmap_width-display_width)x7x4
> 
> As the stride increases, the PRE eventually overruns the ocram and...
> ends up in the "ocram aliased" area, where it overwrites the ocram in
> use by the vpu/coda !
> 
> I could not find any PRE register setting that changes the used ocram area.

There is no such setting. The PRE always prefetches a doublebuffer of
2x4 scanlines and the scanline size is defined by the store engine
pitch.

The straight forward way to fix this would be to just disable the PRE
when the stride is getting too large, which might not work well with
all userspace requirements, as it effectively disables the ability to
scan GPU tiled surfaces when the stride is getting too large.

I'm not sure if this works in practice, as the PRG address rewriting
might make this harder than it seems, but on could probably try to
rewrite the prefetch start address, input pitch, input width/height and
store pitch of the PRE settings to cover only the area used by the the
CRTC to reduce OCRAM requirements.

Regards,
Lucas



Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

2021-02-12 Thread Sven Van Asbroeck
Philipp, Fabio,

I was able to verify that the PREs do indeed overrun their allocated ocram area.

Section 38.5.1 of the iMX6QuadPlus manual indicates the ocram size
required: width(pixels) x 8 lines x 4 bytes. For 2048 pixels max, this
comes to 64K. This is what the PRE driver allocates. So far, so good.

The trouble starts when we're displaying a section of a much wider
bitmap. This happens in X when using two displays. e.g.:
HDMI 1920x1088
LVDS 1280x800
X bitmap 3200x1088, left side displayed on HDMI, right side on LVDS.

In such a case, the stride will be much larger than the width of a
display scanline.

This is where things start to go very wrong.

I found that the ocram area used by the PREs increases with the
stride. I experimentally found a formula:
ocam_used = display_widthx8x4 + (bitmap_width-display_width)x7x4

As the stride increases, the PRE eventually overruns the ocram and...
ends up in the "ocram aliased" area, where it overwrites the ocram in
use by the vpu/coda !

I could not find any PRE register setting that changes the used ocram area.

Sven


Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

2021-02-11 Thread Sven Van Asbroeck
Hi Philipp, thank you so much for looking into this, I really appreciate it !

On Thu, Feb 11, 2021 at 9:32 AM Philipp Zabel  wrote:
>
> Another thing that might help to identify who is writing where might be to
> clear the whole OCRAM region and dump it after running only decode or only
> PRE/PRG scanout, for example:

Great idea, I will try that out. This might take a few days. I am also
dealing with higher priority issues,

>
> Could you check /sys/kernel/debug/dri/?/state while running the error case?

dri state in non-error case:


# cat state
plane[31]: plane-0
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.00x0.00+0.00+0.00
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[35]: plane-1
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.00x0.00+0.00+0.00
rotation=1
normalized-zpos=1
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[38]: plane-2
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.00x0.00+0.00+0.00
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[42]: plane-3
crtc=crtc-2
fb=59
allocated by = X
refcount=2
format=XR24 little-endian (0x34325258)
modifier=0x0
size=1280x1088
layers:
size[0]=1280x1088
pitch[0]=5120
offset[0]=0
obj[0]:
name=2
refcount=4
start=000105e4
size=5570560
imported=no
paddr=0xee80
vaddr=78a02004
crtc-pos=1280x800+0+0
src-pos=1280.00x800.00+0.00+0.00
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[46]: plane-4
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.00x0.00+0.00+0.00
rotation=1
normalized-zpos=1
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[49]: plane-5
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.00x0.00+0.00+0.00
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
crtc[34]: crtc-0
enable=0
active=0
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=0
connector_mask=0
encoder_mask=0
mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
crtc[41]: crtc-1
enable=0
active=0
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=0
connector_mask=0
encoder_mask=0
mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
crtc[45]: crtc-2
enable=1
active=1
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=8
connector_mask=2
encoder_mask=2
mode: "": 60 67880 1280 1344 1345 1350 800 838 839 841 0x0 0x0
crtc[52]: crtc-3
enable=0
active=0
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=0
connector_mask=0
encoder_mask=0
mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
connector[54]: HDMI-A-1
crtc=(null)
self_refresh_aware=0
connector[57]: LVDS-1
crtc=crtc-2
self_refresh_aware=0

dri state in error case:

# cat state
plane[31]: plane-0
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.00x0.00+0.00+0.00
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[35]: plane-1
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.00x0.00+0.00+0.00
rotation=1
normalized-zpos=1
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[38]: plane-2
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.00x0.00+0.00+0.00
rotation=1

Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

2021-02-11 Thread Philipp Zabel
Hi Sven,

On Wed, Feb 10, 2021 at 01:29:29PM -0500, Sven Van Asbroeck wrote:
> Found it!
> 
> The i.MX6QuadPlus has two pairs of PREs, which use the extended
> section of the iRAM. The Classic does not have any PREs or extended
> iRAM:
> 
> pre1: pre@21c8000 {
>compatible = "fsl,imx6qp-pre";
> 
> fsl,iram = <>;
> };
> 
> pre3: pre@21ca000 {
> compatible = "fsl,imx6qp-pre";
> 
> fsl,iram = <>;
> };
> 
> The CODA (VPU) driver uses the common section of iRAM:
> 
> vpu: vpu@204 {
> compatible = "cnm,coda960";
> 
> iram = <>;
> };
> 
> The VPU or the PREs are overrunning their assigned iRAM area. How do I
> know? Because if I change the PRE iRAM order, the problem disappears!
> 
> PRE1: ocram2 change to ocram3
> PRE2: ocram2 change to ocram3
> PRE3: ocram3 change to ocram2
> PRE4: ocram3 change to ocram2

Thank you for debugging this. Given that CODA uses the OCRAM address
range 0x90-0x94 and the PREs use OCRAM2 at 0x94-0x96
and OCRAM3 at 0x96-0x98, it seems unlikely that the PREs would
overrun into the CODA iRAM. But maybe there is some stride related
overflow that causes it to write at negative offsets or some other kind
of oversight.

Could you check /sys/kernel/debug/dri/?/state while running the error case?

Another thing that might help to identify who is writing where might be to
clear the whole OCRAM region and dump it after running only decode or only
PRE/PRG scanout, for example:

--8<--
/* Clear OCRAM */
#include 
#include 
#include 
#include 
#include 

#define OCRAM_START 0x90
#define OCRAM_SIZE  0x8

int main(int argc, char *argv[])
{
int fd = open("/dev/mem", O_RDWR | O_SYNC);
void *map = mmap(NULL, OCRAM_SIZE, PROT_WRITE, MAP_SHARED, fd, 
OCRAM_START);
if (map == MAP_FAILED)
return EXIT_FAILURE;
memset(map, 0, OCRAM_SIZE);
munmap(map, OCRAM_SIZE);
close(fd);
return EXIT_SUCCESS;
}
-->8--

--8<--
/* Dump OCRAM to stdout */
#include 
#include 
#include 
#include 

#define OCRAM_START 0x90
#define OCRAM_SIZE  0x8

int main(int argc, char *argv[])
{
int fd = open("/dev/mem", O_RDONLY | O_SYNC);
void *map = mmap(NULL, OCRAM_SIZE, PROT_READ, MAP_SHARED, fd, 
OCRAM_START);
if (map == MAP_FAILED)
return EXIT_FAILURE;
write(1, map, OCRAM_SIZE);
munmap(map, OCRAM_SIZE);
close(fd);
return EXIT_SUCCESS;
}
-->8--

regards
Philipp


Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

2021-02-10 Thread Sven Van Asbroeck
Found it!

The i.MX6QuadPlus has two pairs of PREs, which use the extended
section of the iRAM. The Classic does not have any PREs or extended
iRAM:

pre1: pre@21c8000 {
   compatible = "fsl,imx6qp-pre";

fsl,iram = <>;
};

pre3: pre@21ca000 {
compatible = "fsl,imx6qp-pre";

fsl,iram = <>;
};

The CODA (VPU) driver uses the common section of iRAM:

vpu: vpu@204 {
compatible = "cnm,coda960";

iram = <>;
};

The VPU or the PREs are overrunning their assigned iRAM area. How do I
know? Because if I change the PRE iRAM order, the problem disappears!

PRE1: ocram2 change to ocram3
PRE2: ocram2 change to ocram3
PRE3: ocram3 change to ocram2
PRE4: ocram3 change to ocram2

Sven


Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

2021-02-10 Thread Sven Van Asbroeck
Bonjour Nicolas,

On Wed, Feb 10, 2021 at 11:11 AM Nicolas Dufresne  wrote:
>
> Are you sure you aren't just running out of CMA ? This is the only things that
> comes to mind at the moment, sorry if it's not that useful.

Thanks for the suggestion! No worries, this is such a strange/weird
problem, that basically any idea has merit at this point.

I tried increasing the CMA area from 256M -> 512M, but there was no
impact. The critical framebuffer width still remains the same
(=0x900).

And everything works fine on a classic i.MX6Quad, it's only the
i.MX6QuadPlus that has the problem. I am running i.MX6Quad and
i.MX6QuadPlus side-by-side with identical kernels/rootfses. Obviously
the devicetree is slightly different.

Sven


Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

2021-02-10 Thread Nicolas Dufresne
Hi Sven,

Le mercredi 03 février 2021 à 11:33 -0500, Sven Van Asbroeck a écrit :
> From: Sven Van Asbroeck 
> 
> We have observed that under certain repeatable circumstances, the CODA
> mem2mem device consistently generates corrupted frames. This happens only
> on an i.MX6qp (Plus) - the classic imx6q is not affected.
> 
> This happens when the virtual X screen is wider than 0x900 pixels (1).

Are you sure you aren't just running out of CMA ? This is the only things that
comes to mind at the moment, sorry if it's not that useful.

> 
> Quite strange, because CODA is a mem2mem device, and is presumably not
> touching
> any of the IPU/GPU2D/GPU3D infrastructure used by X. Except if there is a
> hidden
> dependency somehow.
> 
> I have captured and visualized generated CODA frames as follows:
> gst-launch-1.0 playbin uri=file:///home/default/nycTrain1080p.mp4 flags=0x45
>     video-sink='multifilesink location=frame%d.yuv'
> See (2) for how I converted the raw YUV frame to a PNG image.
> 
> For example, the following will break CODA mpeg4 decode (width >= 0x900):
> # xrandr --fb 2400x1088
> Screen 0: minimum 1 x 1, current 2400 x 1088, maximum 4096 x 4096
> HDMI1 disconnected (normal left inverted right x axis y axis)
> LVDS1 connected primary 1280x800+0+0 (normal left inverted right x axis y
> axis) 0mm x 0mm
>    1280x800  59.79*+
> 
> Resulting frame when dumped with multifilesink (NOT written to the display):
> https://gitlab.com/TheSven73/coda-investigation/-/blob/master/stripes.png
> 
> And the following will restore CODA mpeg4 decode (width < 0x900):
> # xrandr --fb 2300x1088
> Screen 0: minimum 1 x 1, current 2300 x 1088, maximum 4096 x 4096
> HDMI1 disconnected (normal left inverted right x axis y axis)
> LVDS1 connected primary 1280x800+0+0 (normal left inverted right x axis y
> axis) 0mm x 0mm
>    1280x800  59.79*+
> 
> Resulting frame when dumped with multifilesink (NOT written to the display):
> https://gitlab.com/TheSven73/coda-investigation/-/blob/master/ok.png
> 
> Additional info:
> - only the virtual X screen width seems to trigger the issue, it is
>   independent of the height.
> - issue seems independent of the pixel format. Forcing CODA to output NV12
>   shows the same behaviour.
> 
> System description:
> - i.MX6 QuadPlus:
> [    0.144518] CPU identified as i.MX6QP, silicon rev 1.1
> - mainline Linux v5.9.16 with a small private patchset on top
>   (patchset does not touch CODA)
> - CODA960 silicon contained within i.MX6 QuadPlus:
> [ 4798.510033] coda 204.vpu: Firmware code revision: 46076
> [ 4798.515916] coda 204.vpu: Initialized CODA960.
> [ 4798.520779] coda 204.vpu: Firmware version: 3.1.1
> - gstreamer from buildroot:
> gst-launch-1.0 version 1.16.2
> GStreamer 1.16.2
> - X from buildroot, using armada and etnadrm_gpu plugins:
> X.Org X Server 1.20.7
> X Protocol Version 11, Revision 0
> [    99.527] (II) LoadModule: "armada"
> [    99.527] (II) Loading /usr/lib/xorg/modules/drivers/armada_drv.so
> [    99.538] (II) Module armada: vendor="X.Org Foundation"
> [    99.538]compiled for 1.20.7, module version = 0.0.0
> [    99.538]Module class: X.Org Video Driver
> [    99.538]ABI class: X.Org Video Driver, version 24.1
> [    99.538] (II) armada: Support for Marvell LCD Controller: 88AP510
> [    99.539] (II) armada: Support for Freescale IPU: i.MX6
> [    99.545] (II) armada(0): Added screen for KMS device /dev/dri/card1
> [    99.561] (II) armada(0): hardware: imx-drm
> [    99.563] (**) armada(0): Option "AccelModule" "etnadrm_gpu"
> [    99.563] (II) Loading sub module "etnadrm_gpu"
> [    99.563] (II) LoadModule: "etnadrm_gpu"
> [    99.564] (II) Loading /usr/lib/xorg/modules/drivers/etnadrm_gpu.so
> [    99.576] (II) Module Etnaviv GPU driver (DRM): vendor="X.Org Foundation"
> [    99.576]compiled for 1.20.7, module version = 0.0.0
> 
> 
> (1) When using multiple displays, the virtual X screen is typically the
> bounding
>     rectangle which includes all screens. That's why it can become wider than
>     1920 pixels.
> 
> (2)
> 
> # Convert raw YUYV to PNG
> # Python, runs out of the box on a stock Google Colab notebook
> import cv2
> import numpy as np
> import matplotlib.pyplot as plt
> import matplotlib
> 
> img = np.fromfile('frame1.yuv', dtype=np.uint8)
> # YUYV has two 8-bit channels per pixel
> img.shape = (1088, 1920, 2)
> 
> img2 = cv2.cvtColor(img, cv2.COLOR_YUV2RGB_YUYV)
> plt.imshow(img2)
> matplotlib.image.imsave('frame1.png', img2)
> 
> To: Philipp Zabel 
> To: Mauro Carvalho Chehab 
> Cc: Adrian Ratiu 
> Cc: Lucas Stach 
> Cc: Fabio Estevam 
> Cc: linux-me...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org




Re: [bug report] sched/fair: Prefer prev cpu in asymmetric wakeup path

2020-11-13 Thread Dan Carpenter
On Fri, Nov 13, 2020 at 09:56:37AM +0100, Vincent Guittot wrote:
> Hi Dan,
> 
> Le vendredi 13 nov. 2020 à 11:46:57 (+0300), Dan Carpenter a écrit :
> > Hello Vincent Guittot,
> > 
> > The patch b4c9c9f15649: "sched/fair: Prefer prev cpu in asymmetric
> > wakeup path" from Oct 29, 2020, leads to the following static checker
> > warning:
> > 
> > kernel/sched/fair.c:6249 select_idle_sibling()
> > error: uninitialized symbol 'task_util'.
> > 
> > kernel/sched/fair.c
> >   6233  static int select_idle_sibling(struct task_struct *p, int prev, int 
> > target)
> >   6234  {
> >   6235  struct sched_domain *sd;
> >   6236  unsigned long task_util;
> >   6237  int i, recent_used_cpu;
> >   6238  
> >   6239  /*
> >   6240   * On asymmetric system, update task utilization because we 
> > will check
> >   6241   * that the task fits with cpu's capacity.
> >   6242   */
> > 
> > The original comment was a bit more clear...  Perhaps "On asymmetric
> > system[s], [record the] task utilization because we will check that the
> > task [can be done within] the cpu's capacity."
> 
> The comment "update task utilization because we will check ..." refers to
> sync_entity_load_avg()
> 
> > 
> >   6243  if (static_branch_unlikely(_asym_cpucapacity)) {
> >   6244  sync_entity_load_avg(>se);
> >   6245  task_util = uclamp_task_util(p);
> >   6246  }
> > 
> > "task_util" is not initialized on the else path.
> 
> no need because it will not be used
> 
> > 
> >   6247  
> >   6248  if ((available_idle_cpu(target) || sched_idle_cpu(target)) 
> > &&
> >   6249  asym_fits_capacity(task_util, target))
> >^
> > Uninitialized variable warning.
> 
> asym_fits_capacity includes the same condition as above when we set task_util
> so task_util can't be used unintialize
> 
> static inline bool asym_fits_capacity(int task_util, int cpu)
> {
>   if (static_branch_unlikely(_asym_cpucapacity))
>   return fits_capacity(task_util, capacity_of(cpu));
> 
>   return true;

It's an interesting question, because unless the compiler makes this
inline, then it will lead to a KASan/syzbot warning at runtime.  The
function is, of course, marked as inline but the compiler, also of
course,  generally ignores those hints (use __always_inline if you want
the compiler to pay attention).  On the other hand, the compiler will
still probably inline it...  So this is *probably* not going to
lead to a runtime warning.

regards,
dan carpenter



Re: [bug report] sched/fair: Prefer prev cpu in asymmetric wakeup path

2020-11-13 Thread Vincent Guittot
Hi Dan,

Le vendredi 13 nov. 2020 à 11:46:57 (+0300), Dan Carpenter a écrit :
> Hello Vincent Guittot,
> 
> The patch b4c9c9f15649: "sched/fair: Prefer prev cpu in asymmetric
> wakeup path" from Oct 29, 2020, leads to the following static checker
> warning:
> 
>   kernel/sched/fair.c:6249 select_idle_sibling()
>   error: uninitialized symbol 'task_util'.
> 
> kernel/sched/fair.c
>   6233  static int select_idle_sibling(struct task_struct *p, int prev, int 
> target)
>   6234  {
>   6235  struct sched_domain *sd;
>   6236  unsigned long task_util;
>   6237  int i, recent_used_cpu;
>   6238  
>   6239  /*
>   6240   * On asymmetric system, update task utilization because we 
> will check
>   6241   * that the task fits with cpu's capacity.
>   6242   */
> 
> The original comment was a bit more clear...  Perhaps "On asymmetric
> system[s], [record the] task utilization because we will check that the
> task [can be done within] the cpu's capacity."

The comment "update task utilization because we will check ..." refers to
sync_entity_load_avg()

> 
>   6243  if (static_branch_unlikely(_asym_cpucapacity)) {
>   6244  sync_entity_load_avg(>se);
>   6245  task_util = uclamp_task_util(p);
>   6246  }
> 
> "task_util" is not initialized on the else path.

no need because it will not be used

> 
>   6247  
>   6248  if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
>   6249  asym_fits_capacity(task_util, target))
>^
> Uninitialized variable warning.

asym_fits_capacity includes the same condition as above when we set task_util
so task_util can't be used unintialize

static inline bool asym_fits_capacity(int task_util, int cpu)
{
if (static_branch_unlikely(_asym_cpucapacity))
return fits_capacity(task_util, capacity_of(cpu));

return true;
}


> 
>   6250  return target;
>   6251  
>   6252  /*
>   6253   * If the previous CPU is cache affine and idle, don't be 
> stupid:
>   6254   */
>   6255  if (prev != target && cpus_share_cache(prev, target) &&
>   6256  (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
>   6257  asym_fits_capacity(task_util, prev))
>   6258  return prev;
>   6259  
>   6260  /*
>   6261   * Allow a per-cpu kthread to stack with the wakee if the
> 
> regards,
> dan carpenter


Re: [bug report] tracing, synthetic events: Replace buggy strcat() with seq_buf operations

2020-11-02 Thread Steven Rostedt
On Mon, 2 Nov 2020 10:45:24 +0300
Dan Carpenter  wrote:

> Hello Steven Rostedt (VMware),
> 
> The patch 761a8c58db6b: "tracing, synthetic events: Replace buggy
> strcat() with seq_buf operations" from Oct 23, 2020, leads to the
> following static checker warning:
> 
>   kernel/trace/trace_events_synth.c:703 parse_synth_field()
>   warn: passing zero to 'ERR_PTR'
> 
> kernel/trace/trace_events_synth.c
>582  static struct synth_field *parse_synth_field(int argc, const char 
> **argv,
>583   int *consumed)
>584  {
>585  struct synth_field *field;
>586  const char *prefix = NULL, *field_type = argv[0], 
> *field_name, *array;
>587  int len, ret = 0;
>588  struct seq_buf s;
>589  ssize_t size;
>590  
>591  if (field_type[0] == ';')
>592  field_type++;
>593  
>594  if (!strcmp(field_type, "unsigned")) {
>595  if (argc < 3) {
>596  synth_err(SYNTH_ERR_INCOMPLETE_TYPE, 
> errpos(field_type));
>597  return ERR_PTR(-EINVAL);
>598  }
>599  prefix = "unsigned ";
>600  field_type = argv[1];
>601  field_name = argv[2];
>602  *consumed = 3;
>603  } else {
>604  field_name = argv[1];
>605  *consumed = 2;
>606  }
>607  
>608  field = kzalloc(sizeof(*field), GFP_KERNEL);
>609  if (!field)
>610  return ERR_PTR(-ENOMEM);
>611  
>612  len = strlen(field_name);
>613  array = strchr(field_name, '[');
>614  if (array)
>615  len -= strlen(array);
>616  else if (field_name[len - 1] == ';')
>617  len--;
>618  
>619  field->name = kmemdup_nul(field_name, len, GFP_KERNEL);
>620  if (!field->name) {
>621  ret = -ENOMEM;
>622  goto free;
>623  }
>624  if (!is_good_name(field->name)) {
>625  synth_err(SYNTH_ERR_BAD_NAME, errpos(field_name));
>626  ret = -EINVAL;
>627  goto free;
>628  }
>629  
>630  if (field_type[0] == ';')
>631  field_type++;
>632  len = strlen(field_type) + 1;
>633  
>634  if (array)
>635  len += strlen(array);
>636  
>637  if (prefix)
>638  len += strlen(prefix);
>639  
>640  field->type = kzalloc(len, GFP_KERNEL);
>641  if (!field->type) {
>642  ret = -ENOMEM;
>643  goto free;
>644  }
>645  seq_buf_init(, field->type, len);
>646  if (prefix)
>647  seq_buf_puts(, prefix);
>648  seq_buf_puts(, field_type);
>649  if (array) {
>650  seq_buf_puts(, array);
>651  if (s.buffer[s.len - 1] == ';')
>652  s.len--;
>653  }
>654  if (WARN_ON_ONCE(!seq_buf_buffer_left()))
>655  goto free;
> 
> This was originally reported by kbuild-bot, but it was somehow
> overlooked so now it's on my system.  The missing error code will lead
> to a NULL dereference in the caller.


I misread the original report, modified it to fix something else, and
didn't see the real problem.

Having to initialize ret for *every* error path is ridiculous. Here's the
fix.

-- Steve

>From 2980f226a7fb08e37fd3948e206854fe5a1a8c50 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" 
Date: Mon, 2 Nov 2020 11:28:39 -0500
Subject: [PATCH] tracing: Make -ENOMEM the default error for
 parse_synth_field()

parse_synth_field() returns a pointer and requires that errors get
surrounded by ERR_PTR(). The ret variable is initialized to zero, but should
never be used as zero, and if it is, it could cause a false return code and
produce a NULL pointer dereference. It makes no sense to set ret to zero.

Set ret to -ENOMEM (the most common error case), and have any other errors
set it to something else. This removes the need to initialize ret on *every*
error branch.

Fixes: 761a8c58db6b ("tracing, synthetic events: Replace buggy strcat() with 
seq_buf operations")
Reported-by: Dan Carpenter 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_synth.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_events_synth.c 
b/kernel/trace/trace_events_synth.c
index 84b7cab55291..881df991742a 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -584,7 +584,7 @@ static struct synth_field 

Re: Bug Report High CPU Usage events_power_efficient

2020-07-14 Thread Martin Zaharinov
Hi 

Oki i find the problem is come from nf_conntrack_core 

I isolate problem in this part :

queue_delayed_work(system_power_efficient_wq, _gc_work.dwork, HZ);


When package go to queue delayed in one moment if connection track is to big 
process to delayed go to lock and start high cpu load.

This is need to check and find solution…

For now I remove queue_delayed_work and wait to check machine and will write 
status.

Martin

> On 8 Jul 2020, at 12:28, Greg KH  wrote:
> 
> 
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
> A: No.
> Q: Should I include quotations after my reply?
> 
> http://daringfireball.net/2007/07/on_top
> 
> On Wed, Jul 08, 2020 at 11:34:52AM +0300, Martin Zaharinov wrote:
>> Yes i search but not find any information.
> 
> Please do the testing yourself, using 'git bisect' to find the offending
> commit.
> 
> thanks,
> 
> greg k-h



Re: Bug Report High CPU Usage events_power_efficient

2020-07-08 Thread Greg KH


A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Wed, Jul 08, 2020 at 11:34:52AM +0300, Martin Zaharinov wrote:
> Yes i search but not find any information.

Please do the testing yourself, using 'git bisect' to find the offending
commit.

thanks,

greg k-h


Re: Bug Report High CPU Usage events_power_efficient

2020-07-08 Thread Martin Zaharinov
Yes i search but not find any information.

And write hear to help if any have same problem or any of you have
idea from where is comme this problem.

If need more debug i will only write how to get more information.

Martin


На ср, 8.07.2020 г. в 10:09 Greg KH  написа:
>
> On Wed, Jul 08, 2020 at 09:50:49AM +0300, Martin Zaharinov wrote:
> > Add Greg , Florian, Eric to this bug
> >
> > > On 7 Jul 2020, at 22:54, Martin Zaharinov  wrote:
> > >
> > > And this is log from /sys/kernel/debug/tracing/trace
> > >
> > >
> > > # entries-in-buffer/entries-written: 32410/32410   #P:64
> > > #
> > > #  _-=> irqs-off
> > > # / _=> need-resched
> > > #| / _---=> hardirq/softirq
> > > #|| / _--=> preempt-depth
> > > #||| / delay
> > > #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
> > > #  | |   |      | |
> > >   <...>-57259 [005]  29619.680698: workqueue_execute_start: 
> > > work struct ef22e4b8: function gc_worker [nf_conntrack]
> > >   <...>-57259 [005]  29623.811407: workqueue_execute_end: 
> > > work struct ef22e4b8: function gc_worker [nf_conntrack]
> > >   <...>-57259 [005]  29623.811410: workqueue_execute_start: 
> > > work struct 00aeec55: function fb_flashcursor
> > >   <...>-57259 [005]  29623.811421: workqueue_execute_end: 
> > > work struct 00aeec55: function fb_flashcursor
> > >   <...>-57259 [005]  29623.811422: workqueue_execute_start: 
> > > work struct a6d382bb: function vmstat_update
> > >   <...>-57259 [005]  29623.811435: workqueue_execute_end: 
> > > work struct a6d382bb: function vmstat_update
> > >
> > >> On 7 Jul 2020, at 22:44, Martin Zaharinov  wrote:
> > >>
> > >> the problem is hear with kernel 5.7.7
> > >>
> > >> last work kernel without this problem is 5.6.7
> > >>
> > >> hear is more info:
> > >>
> > >> cat /proc/57259/stack
> > >> root@megacableamarilis:~# cat /proc/57259/stack
> > >> [<0>] gc_worker+0x1be/0x380 [nf_conntrack]
> > >> [<0>] process_one_work+0x1bc/0x3b0
> > >> [<0>] worker_thread+0x4d/0x460
> > >> [<0>] kthread+0x10d/0x130
> > >> [<0>] ret_from_fork+0x1f/0x30
> > >>
> > >> PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ 
> > >> COMMAND57259 root  28   0   0  0 
> > >>  0 R  69.8   0.0  82:42.14 kworker/5:2+events_power_efficient
> > >> 32 root  21   0   0  0  0 R  31.0   0.0  87:06.33 
> > >> ksoftirqd/4
> > >>
> > >>
> > >> Please help to fix this problem
> > >>
> > >>> On 22 Apr 2020, at 15:55, Martin Zaharinov  wrote:
> > >>>
> > >>> Hello Qian and Greg
> > >>> With latest 5.6.x kernel have problem with events_power_efficient 28488 
> > >>> root  28   0   0  0  0 R  95.5   0.0 101:38.19 
> > >>> kworker/2:1+events_power_efficient Process start to load machine after 
> > >>> 3-4 hour and load not stop only reboot machine remove process . Server 
> > >>> runing on AMD EPIC CPU 2x 7301 32Gb Ram Have 2 x 10G card Intel when 
> > >>> machine load over 1G traffic machine locked and only restart fix 
> > >>> problem to next load . After move traffic and server stop load process 
> > >>> still hear and load server ?
> > >>> And after reboot process move to other core .
>
> Have you used 'git bisect' to try to find the offending commit?
>
> Without that, it's going to be hard to help you out here.
>
> thanks,
>
> greg k-h


Re: Bug Report High CPU Usage events_power_efficient

2020-07-08 Thread Greg KH
On Wed, Jul 08, 2020 at 09:50:49AM +0300, Martin Zaharinov wrote:
> Add Greg , Florian, Eric to this bug 
> 
> > On 7 Jul 2020, at 22:54, Martin Zaharinov  wrote:
> > 
> > And this is log from /sys/kernel/debug/tracing/trace
> > 
> > 
> > # entries-in-buffer/entries-written: 32410/32410   #P:64
> > #
> > #  _-=> irqs-off
> > # / _=> need-resched
> > #| / _---=> hardirq/softirq
> > #|| / _--=> preempt-depth
> > #||| / delay
> > #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
> > #  | |   |      | |
> >   <...>-57259 [005]  29619.680698: workqueue_execute_start: 
> > work struct ef22e4b8: function gc_worker [nf_conntrack]
> >   <...>-57259 [005]  29623.811407: workqueue_execute_end: work 
> > struct ef22e4b8: function gc_worker [nf_conntrack]
> >   <...>-57259 [005]  29623.811410: workqueue_execute_start: 
> > work struct 00aeec55: function fb_flashcursor
> >   <...>-57259 [005]  29623.811421: workqueue_execute_end: work 
> > struct 00aeec55: function fb_flashcursor
> >   <...>-57259 [005]  29623.811422: workqueue_execute_start: 
> > work struct a6d382bb: function vmstat_update
> >   <...>-57259 [005]  29623.811435: workqueue_execute_end: work 
> > struct a6d382bb: function vmstat_update
> > 
> >> On 7 Jul 2020, at 22:44, Martin Zaharinov  wrote:
> >> 
> >> the problem is hear with kernel 5.7.7 
> >> 
> >> last work kernel without this problem is 5.6.7
> >> 
> >> hear is more info:
> >> 
> >> cat /proc/57259/stack
> >> root@megacableamarilis:~# cat /proc/57259/stack
> >> [<0>] gc_worker+0x1be/0x380 [nf_conntrack]
> >> [<0>] process_one_work+0x1bc/0x3b0
> >> [<0>] worker_thread+0x4d/0x460
> >> [<0>] kthread+0x10d/0x130
> >> [<0>] ret_from_fork+0x1f/0x30
> >> 
> >> PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ COMMAND 
> >>57259 root  28   0   0  0  0 R  
> >> 69.8   0.0  82:42.14 kworker/5:2+events_power_efficient
> >>   
> >> 32 root  21   0   0  0  0 R  31.0   0.0  87:06.33 
> >> ksoftirqd/4
> >> 
> >> 
> >> Please help to fix this problem
> >> 
> >>> On 22 Apr 2020, at 15:55, Martin Zaharinov  wrote:
> >>> 
> >>> Hello Qian and Greg 
> >>> With latest 5.6.x kernel have problem with events_power_efficient 28488 
> >>> root  28   0   0  0  0 R  95.5   0.0 101:38.19 
> >>> kworker/2:1+events_power_efficient Process start to load machine after 
> >>> 3-4 hour and load not stop only reboot machine remove process . Server 
> >>> runing on AMD EPIC CPU 2x 7301 32Gb Ram Have 2 x 10G card Intel when 
> >>> machine load over 1G traffic machine locked and only restart fix problem 
> >>> to next load . After move traffic and server stop load process still hear 
> >>> and load server ?
> >>> And after reboot process move to other core .

Have you used 'git bisect' to try to find the offending commit?

Without that, it's going to be hard to help you out here.

thanks,

greg k-h


Re: Bug Report High CPU Usage events_power_efficient

2020-07-08 Thread Martin Zaharinov
Add Greg , Florian, Eric to this bug 

> On 7 Jul 2020, at 22:54, Martin Zaharinov  wrote:
> 
> And this is log from /sys/kernel/debug/tracing/trace
> 
> 
> # entries-in-buffer/entries-written: 32410/32410   #P:64
> #
> #  _-=> irqs-off
> # / _=> need-resched
> #| / _---=> hardirq/softirq
> #|| / _--=> preempt-depth
> #||| / delay
> #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
> #  | |   |      | |
>   <...>-57259 [005]  29619.680698: workqueue_execute_start: work 
> struct ef22e4b8: function gc_worker [nf_conntrack]
>   <...>-57259 [005]  29623.811407: workqueue_execute_end: work 
> struct ef22e4b8: function gc_worker [nf_conntrack]
>   <...>-57259 [005]  29623.811410: workqueue_execute_start: work 
> struct 00aeec55: function fb_flashcursor
>   <...>-57259 [005]  29623.811421: workqueue_execute_end: work 
> struct 00aeec55: function fb_flashcursor
>   <...>-57259 [005]  29623.811422: workqueue_execute_start: work 
> struct a6d382bb: function vmstat_update
>   <...>-57259 [005]  29623.811435: workqueue_execute_end: work 
> struct a6d382bb: function vmstat_update
> 
>> On 7 Jul 2020, at 22:44, Martin Zaharinov  wrote:
>> 
>> the problem is hear with kernel 5.7.7 
>> 
>> last work kernel without this problem is 5.6.7
>> 
>> hear is more info:
>> 
>> cat /proc/57259/stack
>> root@megacableamarilis:~# cat /proc/57259/stack
>> [<0>] gc_worker+0x1be/0x380 [nf_conntrack]
>> [<0>] process_one_work+0x1bc/0x3b0
>> [<0>] worker_thread+0x4d/0x460
>> [<0>] kthread+0x10d/0x130
>> [<0>] ret_from_fork+0x1f/0x30
>> 
>> PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ COMMAND   
>>  57259 root  28   0   0  0  0 R  
>> 69.8   0.0  82:42.14 kworker/5:2+events_power_efficient  
>> 
>> 32 root  21   0   0  0  0 R  31.0   0.0  87:06.33 ksoftirqd/4
>> 
>> 
>> Please help to fix this problem
>> 
>>> On 22 Apr 2020, at 15:55, Martin Zaharinov  wrote:
>>> 
>>> Hello Qian and Greg 
>>> With latest 5.6.x kernel have problem with events_power_efficient 28488 
>>> root  28   0   0  0  0 R  95.5   0.0 101:38.19 
>>> kworker/2:1+events_power_efficient Process start to load machine after 3-4 
>>> hour and load not stop only reboot machine remove process . Server runing 
>>> on AMD EPIC CPU 2x 7301 32Gb Ram Have 2 x 10G card Intel when machine load 
>>> over 1G traffic machine locked and only restart fix problem to next load . 
>>> After move traffic and server stop load process still hear and load server ?
>>> And after reboot process move to other core .
>>> 
>>> Best Regards,
>>> Martin
>> 
> 



Re: Bug Report High CPU Usage events_power_efficient

2020-07-07 Thread Martin Zaharinov
And this is log from /sys/kernel/debug/tracing/trace


# entries-in-buffer/entries-written: 32410/32410   #P:64
#
#  _-=> irqs-off
# / _=> need-resched
#| / _---=> hardirq/softirq
#|| / _--=> preempt-depth
#||| / delay
#   TASK-PID   CPU#  TIMESTAMP  FUNCTION
#  | |   |      | |
   <...>-57259 [005]  29619.680698: workqueue_execute_start: work 
struct ef22e4b8: function gc_worker [nf_conntrack]
   <...>-57259 [005]  29623.811407: workqueue_execute_end: work 
struct ef22e4b8: function gc_worker [nf_conntrack]
   <...>-57259 [005]  29623.811410: workqueue_execute_start: work 
struct 00aeec55: function fb_flashcursor
   <...>-57259 [005]  29623.811421: workqueue_execute_end: work 
struct 00aeec55: function fb_flashcursor
   <...>-57259 [005]  29623.811422: workqueue_execute_start: work 
struct a6d382bb: function vmstat_update
   <...>-57259 [005]  29623.811435: workqueue_execute_end: work 
struct a6d382bb: function vmstat_update

> On 7 Jul 2020, at 22:44, Martin Zaharinov  wrote:
> 
> the problem is hear with kernel 5.7.7 
> 
> last work kernel without this problem is 5.6.7
> 
> hear is more info:
> 
> cat /proc/57259/stack
> root@megacableamarilis:~# cat /proc/57259/stack
> [<0>] gc_worker+0x1be/0x380 [nf_conntrack]
> [<0>] process_one_work+0x1bc/0x3b0
> [<0>] worker_thread+0x4d/0x460
> [<0>] kthread+0x10d/0x130
> [<0>] ret_from_fork+0x1f/0x30
> 
> PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ COMMAND
> 57259 root  28   0   0  0  0 R  69.8  
>  0.0  82:42.14 kworker/5:2+events_power_efficient 
>  
> 32 root  21   0   0  0  0 R  31.0   0.0  87:06.33 ksoftirqd/4
> 
> 
> Please help to fix this problem
> 
>> On 22 Apr 2020, at 15:55, Martin Zaharinov  wrote:
>> 
>> Hello Qian and Greg 
>> With latest 5.6.x kernel have problem with events_power_efficient 28488 root 
>>  28   0   0  0  0 R  95.5   0.0 101:38.19 
>> kworker/2:1+events_power_efficient Process start to load machine after 3-4 
>> hour and load not stop only reboot machine remove process . Server runing on 
>> AMD EPIC CPU 2x 7301 32Gb Ram Have 2 x 10G card Intel when machine load over 
>> 1G traffic machine locked and only restart fix problem to next load . After 
>> move traffic and server stop load process still hear and load server ?
>> And after reboot process move to other core .
>> 
>> Best Regards,
>> Martin
> 



Re: Bug Report High CPU Usage events_power_efficient

2020-07-07 Thread Martin Zaharinov
the problem is hear with kernel 5.7.7 

last work kernel without this problem is 5.6.7

hear is more info:

cat /proc/57259/stack
root@megacableamarilis:~# cat /proc/57259/stack
[<0>] gc_worker+0x1be/0x380 [nf_conntrack]
[<0>] process_one_work+0x1bc/0x3b0
[<0>] worker_thread+0x4d/0x460
[<0>] kthread+0x10d/0x130
[<0>] ret_from_fork+0x1f/0x30

PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ COMMAND  
  57259 root  28   0   0  0  0 R  69.8   
0.0  82:42.14 kworker/5:2+events_power_efficient
  
32 root  21   0   0  0  0 R  31.0   0.0  87:06.33 ksoftirqd/4


Please help to fix this problem

> On 22 Apr 2020, at 15:55, Martin Zaharinov  wrote:
> 
> Hello Qian and Greg 
> With latest 5.6.x kernel have problem with events_power_efficient 28488 root  
> 28   0   0  0  0 R  95.5   0.0 101:38.19 
> kworker/2:1+events_power_efficient Process start to load machine after 3-4 
> hour and load not stop only reboot machine remove process . Server runing on 
> AMD EPIC CPU 2x 7301 32Gb Ram Have 2 x 10G card Intel when machine load over 
> 1G traffic machine locked and only restart fix problem to next load . After 
> move traffic and server stop load process still hear and load server ?
> And after reboot process move to other core .
> 
> Best Regards,
> Martin



Re: bug report: libceph: follow redirect replies from osds

2019-08-30 Thread Ilya Dryomov
On Fri, Aug 30, 2019 at 4:05 PM Colin Ian King  wrote:
>
> Hi,
>
> Static analysis with Coverity has picked up an issue with commit:
>
> commit 205ee1187a671c3b067d7f1e974903b44036f270
> Author: Ilya Dryomov 
> Date:   Mon Jan 27 17:40:20 2014 +0200
>
> libceph: follow redirect replies from osds
>
> Specifically in function ceph_redirect_decode in net/ceph/osd_client.c:
>
> 3485
> 3486len = ceph_decode_32(p);
>
> CID 17904: Unused value (UNUSED_VALUE)
>
> 3487*p += len; /* skip osd_instructions */
> 3488
> 3489/* skip the rest */
> 3490*p = struct_end;
>
> The double write to *p looks wrong, I suspect the *p += len; should be
> just incrementing pointer p as in: p += len.  Am I correct to assume
> this is the correct fix?

Hi Colin,

No, the double write to *p is correct.  It skips over len bytes and
then skips to the end of the redirect reply.  There is no bug here but
we could drop

  len = ceph_decode_32(p);
  *p += len; /* skip osd_instructions */

and skip to the end directly to make Coverity happier.

Thanks,

Ilya


Re: [bug report][stable] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)

2019-06-09 Thread Joseph Qi
Confirmed the following 3 upstream commits can resolve this issue:
d2a68c4effd8 x86/ftrace: Do not call function graph from dynamic trampolines
3c0dab44e227 x86/ftrace: Set trampoline pages as executable
7298e24f9042 x86/kprobes: Set instruction page as executable

And they are all included in stable 4.19.49.

Thanks,
Joseph

On 19/6/9 22:50, Greg KH wrote:
> On Sun, Jun 09, 2019 at 09:10:45PM +0800, Joseph Qi wrote:
>> Hi Nadav,
>> Thanks for the comments.
>> I'll test the 3 patches in the mentioned thread.
> 
> This should all be fixed in the latest release that happened today.  If
> not, please let us know.
> 
> thanks,
> 
> greg k-h
> 


Re: [bug report][stable] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)

2019-06-09 Thread Greg KH
On Sun, Jun 09, 2019 at 09:10:45PM +0800, Joseph Qi wrote:
> Hi Nadav,
> Thanks for the comments.
> I'll test the 3 patches in the mentioned thread.

This should all be fixed in the latest release that happened today.  If
not, please let us know.

thanks,

greg k-h


Re: [bug report][stable] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)

2019-06-09 Thread Joseph Qi
Hi Nadav,
Thanks for the comments.
I'll test the 3 patches in the mentioned thread.

Thanks,
Joseph

On 19/6/8 00:38, Nadav Amit wrote:
>> On Jun 7, 2019, at 3:24 AM, Joseph Qi  wrote:
>>
>> Hi all,
>> Any idea on this regression? 
> 
> Sorry for the late response (I assumed, for some reason, that you also follow 
> the second thread about this issue).
> 
> Anyhow, it should be fixed by backporting some patches which were mistakenly
> missed.
> 
> See https://lore.kernel.org/stable/20190606131558.GJ29739@sasha-vm/
> 
> Regards,
> Nadav
> 
> 
>> Thanks,
>> Joseph
>>
>> On 19/6/5 18:23, Joseph Qi wrote:
>>> Hi,
>>>
>>> I have encountered a kernel BUG when running ltp ftrace-stress-test
>>> on 4.19.48.
>>>
>>> [  209.704855] LTP: starting ftrace-stress-test (ftrace_stress_test.sh 90)
>>> [  209.739412] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked 
>>> and stat_runtime require the kernel parameter schedstats=enable or 
>>> kernel.sched_schedstats=1
>>> [  212.054506] kernel tried to execute NX-protected page - exploit attempt? 
>>> (uid: 0)
>>> [  212.055595] BUG: unable to handle kernel paging request at 
>>> c0349000
>>> [  212.056589] PGD d00c067 P4D d00c067 PUD d00e067 PMD 23673e067 PTE 
>>> 80023457f061
>>> [  212.057759] Oops: 0011 [#1] SMP PTI
>>> [  212.058303] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 
>>> 4.19.48 #112
>>>
>>> After some investigation I have found that it is introduced by commit
>>> 8715ce033eb3 ("x86/modules: Avoid breaking W^X while loading modules"),
>>> and then revert this commit the issue is gone.
>>>
>>> I have also tested the same case on 5.2-rc3 as well as right at
>>> upstream commit f2c65fb3221a ("x86/modules: Avoid breaking W^X while
>>> loading modules"), which has been merged in 5.2-rc1, it doesn't
>>> happen.
>>>
>>> So I don't know why only stable has this issue while upstream doesn't.
>>>
>>> Thanks,
>>> Joseph
> 


Re: [bug report][stable] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)

2019-06-07 Thread Nadav Amit
> On Jun 7, 2019, at 3:24 AM, Joseph Qi  wrote:
> 
> Hi all,
> Any idea on this regression? 

Sorry for the late response (I assumed, for some reason, that you also follow 
the second thread about this issue).

Anyhow, it should be fixed by backporting some patches which were mistakenly
missed.

See https://lore.kernel.org/stable/20190606131558.GJ29739@sasha-vm/

Regards,
Nadav


> Thanks,
> Joseph
> 
> On 19/6/5 18:23, Joseph Qi wrote:
>> Hi,
>> 
>> I have encountered a kernel BUG when running ltp ftrace-stress-test
>> on 4.19.48.
>> 
>> [  209.704855] LTP: starting ftrace-stress-test (ftrace_stress_test.sh 90)
>> [  209.739412] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked 
>> and stat_runtime require the kernel parameter schedstats=enable or 
>> kernel.sched_schedstats=1
>> [  212.054506] kernel tried to execute NX-protected page - exploit attempt? 
>> (uid: 0)
>> [  212.055595] BUG: unable to handle kernel paging request at 
>> c0349000
>> [  212.056589] PGD d00c067 P4D d00c067 PUD d00e067 PMD 23673e067 PTE 
>> 80023457f061
>> [  212.057759] Oops: 0011 [#1] SMP PTI
>> [  212.058303] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 
>> 4.19.48 #112
>> 
>> After some investigation I have found that it is introduced by commit
>> 8715ce033eb3 ("x86/modules: Avoid breaking W^X while loading modules"),
>> and then revert this commit the issue is gone.
>> 
>> I have also tested the same case on 5.2-rc3 as well as right at
>> upstream commit f2c65fb3221a ("x86/modules: Avoid breaking W^X while
>> loading modules"), which has been merged in 5.2-rc1, it doesn't
>> happen.
>> 
>> So I don't know why only stable has this issue while upstream doesn't.
>> 
>> Thanks,
>> Joseph




Re: [bug report][stable] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)

2019-06-07 Thread Joseph Qi
Hi all,
Any idea on this regression? 

Thanks,
Joseph

On 19/6/5 18:23, Joseph Qi wrote:
> Hi,
> 
> I have encountered a kernel BUG when running ltp ftrace-stress-test
> on 4.19.48.
> 
> [  209.704855] LTP: starting ftrace-stress-test (ftrace_stress_test.sh 90)
> [  209.739412] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked 
> and stat_runtime require the kernel parameter schedstats=enable or 
> kernel.sched_schedstats=1
> [  212.054506] kernel tried to execute NX-protected page - exploit attempt? 
> (uid: 0)
> [  212.055595] BUG: unable to handle kernel paging request at c0349000
> [  212.056589] PGD d00c067 P4D d00c067 PUD d00e067 PMD 23673e067 PTE 
> 80023457f061
> [  212.057759] Oops: 0011 [#1] SMP PTI
> [  212.058303] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 
> 4.19.48 #112
> 
> After some investigation I have found that it is introduced by commit
> 8715ce033eb3 ("x86/modules: Avoid breaking W^X while loading modules"),
> and then revert this commit the issue is gone.
> 
> I have also tested the same case on 5.2-rc3 as well as right at
> upstream commit f2c65fb3221a ("x86/modules: Avoid breaking W^X while
> loading modules"), which has been merged in 5.2-rc1, it doesn't
> happen.
> 
> So I don't know why only stable has this issue while upstream doesn't.
> 
> Thanks,
> Joseph
> 


Re: Bug report: A commit about serial8250 cause the output disorderly at the phase of startup

2019-04-23 Thread Hongzhi, Song



On 4/23/19 2:48 PM, John Ogness wrote:

On 2019-04-22, Hongzhi, Song  wrote:

Anyone notice this issue?

Yes, I am aware of the issue. It is actually a feature, not a bug. ;-)
Individual LOG_CONT messages, when classified as emergency messages, are
printed immediately to the console.


Yes, this is what I have acquired from commit log and code.

The emergency mechanism is a good way.



  This makes them appear
"disorderly". It is not yet clear how best to deal with LOG_CONT
emergency messages.



Extracting a part of log from my LOG_CONT message,

[    0.761635] 000:  (family: 0x6, model: 0xf
[    0.761635] 000: , stepping: 0xb)

It obviously separates a full line. Maybe it is not related to LOG_CONT. ^_^


The counterpart source code:

1499 pr_cont(" (family: 0x%x, model: 0x%x", c->x86, c->x86_model);
1500
1501 if (c->x86_stepping || c->cpuid_level >= 0)
1502 pr_cont(", stepping: 0x%x)\n", c->x86_stepping);
1503 else
1504 pr_cont(")\n");


I don't know if the issue exists in other place except LOG_CONT.

But it indeed influences customer's judgement.


Thanks for you reply.

--Hongzhi



But I suppose we should introduce a config option to
disable atomic consoles altogether if users preferred buffered/delayed
printk output.







John Ogness



On 4/19/19 10:24 AM, Hongzhi, Song wrote:

1. Issue description:

Boot kernel( >= linux-rt-devel-v5.0.3 ) with qemu.
Then qemu will print following disorderly messages.

At the beginning, the messages are disorderly. But then it becomes
normally from printing "[    0.00] 000: Linux version..."

--
[    0.019000] 000: tsc: Unable to calibrate against PIT
[    0.002583] 000: 6199.83 BogoMIPS (lpj=3099918)
[    0.521247] 000: Intel(R) Core(TM)2 Duo CPU T7700  @ 2.40GHz
[    0.521247] 000:  (family: 0x6, model: 0xf
[    0.521247] 000: , stepping: 0xb)
[    0.533126] 000: unsupported p6 CPU model 15
[    0.533318] 000: no PMU driver, software events only.
[    0.765082] 000: 1 ACPI AML tables successfully acquired and loaded
[    0.765274] 000:
[    0.785903] 000: Enabled 2 GPEs in block 00 to 0F
[    0.786128] 000:
[    0.835675] 000: acpi PNP0A03:00: fail to add MMCONFIG
information, can't access extended PCI configuration space under
this bridge.
[    0.892056] 000:  5
[    0.892289] 000:  *10
[    0.892416] 000:  11
[    0.892527] 000: )
[    0.892661] 000:

/* skip some repeated contents */

[    5.052149] 000: , 512kB Cache
[    0.00] 000: Linux version 5.0.3-yocto-preempt-rt+
(hsong@pek-lpggp1) (gcc version 8.3.0 (GCC)) #24 SMP PREEMPT Thu Apr
18 03:29:58 EDT 2019
[    0.00] 000: Command line: root=/dev/vda rw highres=off
console=ttyS0 mem=256M ip=192.168.7.4::192.168.7.3:255.255.255.0
vga=0 uvesafb.mode_opti0
[    0.00] 000: x86/fpu: x87 FPU will use FXSAVE
[    0.00] 000: BIOS-provided physical RAM map:
[    0.00] 000: BIOS-e820: [mem
0x-0x0009fbff]
[    0.00] 000: BIOS-e820: [mem
0x0009fc00-0x0009]
[    0.00] 000: BIOS-e820: [mem
0x000f-0x000f]
[    0.00] 000: BIOS-e820: [mem
0x0010-0x0ffdbfff]

--

2. Reproduce:
(1)build kernel: (Attachment is my .config)
make ARCH=x86_64
CROSS_COMPILE=[path-to-my-cross-toolchain]/x86_64-wrs-linux-

(2)boot kernel with qemu:

qemu-system-x86_64 \
-drive file=qemux86-64.rootfs.ext4,if=virtio,format=raw \
-nographic \
-kernel arch/x86/boot/bzImage \
-append 'root=/dev/vda rw highres=off  console=ttyS0 mem=256M ip=dhcp'

3. Analysis:
I find the following commit from >=linux-rt-devel-v5.0.3. cause the
issue.

b9d460e serial: 8250: implement write_atomic


Re: Bug report: A commit about serial8250 cause the output disorderly at the phase of startup

2019-04-23 Thread John Ogness
On 2019-04-22, Hongzhi, Song  wrote:
> Anyone notice this issue?

Yes, I am aware of the issue. It is actually a feature, not a bug. ;-)
Individual LOG_CONT messages, when classified as emergency messages, are
printed immediately to the console.  This makes them appear
"disorderly". It is not yet clear how best to deal with LOG_CONT
emergency messages. But I suppose we should introduce a config option to
disable atomic consoles altogether if users preferred buffered/delayed
printk output.

John Ogness


> On 4/19/19 10:24 AM, Hongzhi, Song wrote:
>> 1. Issue description:
>>
>> Boot kernel( >= linux-rt-devel-v5.0.3 ) with qemu.
>> Then qemu will print following disorderly messages.
>>
>> At the beginning, the messages are disorderly. But then it becomes
>> normally from printing "[    0.00] 000: Linux version..."
>>
>> --
>> [    0.019000] 000: tsc: Unable to calibrate against PIT
>> [    0.002583] 000: 6199.83 BogoMIPS (lpj=3099918)
>> [    0.521247] 000: Intel(R) Core(TM)2 Duo CPU T7700  @ 2.40GHz
>> [    0.521247] 000:  (family: 0x6, model: 0xf
>> [    0.521247] 000: , stepping: 0xb)
>> [    0.533126] 000: unsupported p6 CPU model 15
>> [    0.533318] 000: no PMU driver, software events only.
>> [    0.765082] 000: 1 ACPI AML tables successfully acquired and loaded
>> [    0.765274] 000:
>> [    0.785903] 000: Enabled 2 GPEs in block 00 to 0F
>> [    0.786128] 000:
>> [    0.835675] 000: acpi PNP0A03:00: fail to add MMCONFIG
>> information, can't access extended PCI configuration space under
>> this bridge.
>> [    0.892056] 000:  5
>> [    0.892289] 000:  *10
>> [    0.892416] 000:  11
>> [    0.892527] 000: )
>> [    0.892661] 000:
>>
>> /* skip some repeated contents */
>>
>> [    5.052149] 000: , 512kB Cache
>> [    0.00] 000: Linux version 5.0.3-yocto-preempt-rt+
>> (hsong@pek-lpggp1) (gcc version 8.3.0 (GCC)) #24 SMP PREEMPT Thu Apr
>> 18 03:29:58 EDT 2019
>> [    0.00] 000: Command line: root=/dev/vda rw highres=off
>> console=ttyS0 mem=256M ip=192.168.7.4::192.168.7.3:255.255.255.0
>> vga=0 uvesafb.mode_opti0
>> [    0.00] 000: x86/fpu: x87 FPU will use FXSAVE
>> [    0.00] 000: BIOS-provided physical RAM map:
>> [    0.00] 000: BIOS-e820: [mem
>> 0x-0x0009fbff]
>> [    0.00] 000: BIOS-e820: [mem
>> 0x0009fc00-0x0009]
>> [    0.00] 000: BIOS-e820: [mem
>> 0x000f-0x000f]
>> [    0.00] 000: BIOS-e820: [mem
>> 0x0010-0x0ffdbfff]
>>
>> --
>>
>> 2. Reproduce:
>> (1)build kernel: (Attachment is my .config)
>> make ARCH=x86_64
>> CROSS_COMPILE=[path-to-my-cross-toolchain]/x86_64-wrs-linux-
>>
>> (2)boot kernel with qemu:
>>
>> qemu-system-x86_64 \
>> -drive file=qemux86-64.rootfs.ext4,if=virtio,format=raw \
>> -nographic \
>> -kernel arch/x86/boot/bzImage \
>> -append 'root=/dev/vda rw highres=off  console=ttyS0 mem=256M ip=dhcp'
>>
>> 3. Analysis:
>> I find the following commit from >=linux-rt-devel-v5.0.3. cause the
>> issue.
>>
>> b9d460e serial: 8250: implement write_atomic


Re: Bug report: A commit about serial8250 cause the output disorderly at the phase of startup

2019-04-21 Thread Hongzhi, Song

Hi all,

Anyone notice this issue?


--Hongzhi


On 4/19/19 10:24 AM, Hongzhi, Song wrote:

1. Issue description:

Boot kernel( >= linux-rt-devel-v5.0.3 ) with qemu.
Then qemu will print following disorderly messages.

At the beginning, the messages are disorderly. But then it becomes 
normally from printing "[    0.00] 000: Linux version..."


--
[    0.019000] 000: tsc: Unable to calibrate against PIT
[    0.002583] 000: 6199.83 BogoMIPS (lpj=3099918)
[    0.521247] 000: Intel(R) Core(TM)2 Duo CPU T7700  @ 2.40GHz
[    0.521247] 000:  (family: 0x6, model: 0xf
[    0.521247] 000: , stepping: 0xb)
[    0.533126] 000: unsupported p6 CPU model 15
[    0.533318] 000: no PMU driver, software events only.
[    0.765082] 000: 1 ACPI AML tables successfully acquired and loaded
[    0.765274] 000:
[    0.785903] 000: Enabled 2 GPEs in block 00 to 0F
[    0.786128] 000:
[    0.835675] 000: acpi PNP0A03:00: fail to add MMCONFIG information, 
can't access extended PCI configuration space under this bridge.

[    0.892056] 000:  5
[    0.892289] 000:  *10
[    0.892416] 000:  11
[    0.892527] 000: )
[    0.892661] 000:

/* skip some repeated contents */

[    5.052149] 000: , 512kB Cache
[    0.00] 000: Linux version 5.0.3-yocto-preempt-rt+ 
(hsong@pek-lpggp1) (gcc version 8.3.0 (GCC)) #24 SMP PREEMPT Thu Apr 
18 03:29:58 EDT 2019
[    0.00] 000: Command line: root=/dev/vda rw highres=off 
console=ttyS0 mem=256M ip=192.168.7.4::192.168.7.3:255.255.255.0 vga=0 
uvesafb.mode_opti0

[    0.00] 000: x86/fpu: x87 FPU will use FXSAVE
[    0.00] 000: BIOS-provided physical RAM map:
[    0.00] 000: BIOS-e820: [mem 
0x-0x0009fbff]
[    0.00] 000: BIOS-e820: [mem 
0x0009fc00-0x0009]
[    0.00] 000: BIOS-e820: [mem 
0x000f-0x000f]
[    0.00] 000: BIOS-e820: [mem 
0x0010-0x0ffdbfff]


--

2. Reproduce:
(1)build kernel: (Attachment is my .config)
make ARCH=x86_64 
CROSS_COMPILE=[path-to-my-cross-toolchain]/x86_64-wrs-linux-


(2)boot kernel with qemu:

qemu-system-x86_64 \
-drive file=qemux86-64.rootfs.ext4,if=virtio,format=raw \
-nographic \
-kernel arch/x86/boot/bzImage \
-append 'root=/dev/vda rw highres=off  console=ttyS0 mem=256M ip=dhcp'

3. Analysis:
I find the following commit from >=linux-rt-devel-v5.0.3. cause the 
issue.


b9d460e serial: 8250: implement write_atomic



Re: [BUG REPORT] linux-input: keyboard: gpio_keys: False Button Press Event on Wake

2019-04-03 Thread dmitry.torok...@gmail.com
Hi Ken,

On Wed, Apr 03, 2019 at 05:50:09PM +, Ken Sloat wrote:
> Hello Dmitry,
> 
> I may have found a potential bug in the "gpio_keys" driver. FYI, I am
> running the 4.14 LTS kernel on my system, but from my understanding of
> the issue, it seems that this would still occur in the latest version
> of the kernel.
> 
> The problem: In the 4.14 LTS kernel, both key press and release events
> can generate a wake event. In the 5.x kernel, wake events are
> configurable for press only, release only or "both" (see
> "wakeup-event-action" binding). The issue can occur in the "both" case
> or release/deasserted case. Let's imagine that a system is suspended
> when a gpio key button is pressed, and subsequently resumed when the
> button is released. If we look at the sequence of actions and events
> reported by the input system, we can see the potential problem:
> 
> Button Pressed
> Event Value 1
> System Suspend
> Button Released
> System Wake & Resume
> Event Value 0
> Event Value 1
> Event Value 0
> 
> As you can see the input system will report an extra button
> event/press. This appears to be caused in gpio_keys_gpio_isr by the
> following statement:
> 
> if (bdata->suspended  &&
> (button->type == 0 || button->type == EV_KEY)) {
> /*
>  * Simulate wakeup key press in case the key has
>  * already released by the time we got interrupt
>  * handler to run.
>  */
> input_report_key(bdata->input, button->code, 1);
> }
> 
> This code does not seem to take into account that the wake event may
> have been caused by a button release action, and just assumes we must
> have a button press.
> 
> This can obviously be problematic in the use case I mentioned, as the
> system would be put in a constant loop between waking and sleeping.
> While there are other ways to deal with or react to this issue in the
> userspace, it seems that the driver should probably take this into
> account.
> 

I believe the expectation is that we do not go to sleep with button
still pressed, as we expect it to be released momentarily. Given that we
do not know which edge woke us it is not clear if we can avoid
simulating the keypress event, as this definitely causes "missed press",
at least on some Android devices, where by the time we get to run this
ISR user has already released the button.

Thanks.

-- 
Dmitry


Re: [bug report][stable] perf probe: failed to add events

2019-03-25 Thread Adrian Hunter
On 21/03/19 12:10 PM, Greg KH wrote:
> On Thu, Feb 28, 2019 at 09:19:08AM +0200, Adrian Hunter wrote:
>> On 28/02/19 4:07 AM, Joseph Qi wrote:
>>> Hi Adrian,
>>>
>>> On 19/2/27 20:39, Adrian Hunter wrote:
 Seems to be fixed by this:

 From: Adrian Hunter 
 Date: Wed, 27 Feb 2019 05:35:25 +0200
 Subject: [PATCH] perf probe: Fix getting the kernel map

 Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86
 PTI entry trampolines"), perf tools has been creating more than one kernel
 map, however 'perf probe' assumed there could be only one.

 Fix by using machine__kernel_map() to get the main kernel map.

 Signed-off-by: Adrian Hunter 
 Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI 
 entry trampolines")
>>>
>>> Below is my investigation result before, FYI.
>>> the first bad commit (v4.18 ~ v4.19):
>>> d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines
>>
>> Yes we should add a fixes tag for that also.
> 
> So, what do I need to do here for the stable tree(s) to resolve this
> issue?

Nothing.

Patch was re-sent with a stable tag so will go through like normal:


https://lore.kernel.org/lkml/2ed432de-e904-85d2-5c36-5897ddc5b...@intel.com/



Re: [bug report][stable] perf probe: failed to add events

2019-03-21 Thread Greg KH
On Thu, Feb 28, 2019 at 09:19:08AM +0200, Adrian Hunter wrote:
> On 28/02/19 4:07 AM, Joseph Qi wrote:
> > Hi Adrian,
> > 
> > On 19/2/27 20:39, Adrian Hunter wrote:
> >> Seems to be fixed by this:
> >>
> >> From: Adrian Hunter 
> >> Date: Wed, 27 Feb 2019 05:35:25 +0200
> >> Subject: [PATCH] perf probe: Fix getting the kernel map
> >>
> >> Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86
> >> PTI entry trampolines"), perf tools has been creating more than one kernel
> >> map, however 'perf probe' assumed there could be only one.
> >>
> >> Fix by using machine__kernel_map() to get the main kernel map.
> >>
> >> Signed-off-by: Adrian Hunter 
> >> Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI 
> >> entry trampolines")
> > 
> > Below is my investigation result before, FYI.
> > the first bad commit (v4.18 ~ v4.19):
> > d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines
> 
> Yes we should add a fixes tag for that also.

So, what do I need to do here for the stable tree(s) to resolve this
issue?

thanks,

greg k-h


Re: [bug report][stable] perf probe: failed to add events

2019-03-02 Thread Joseph Qi
Hi Adrian,

On 19/2/28 15:19, Adrian Hunter wrote:
> On 28/02/19 4:07 AM, Joseph Qi wrote:
>> Hi Adrian,
>>
>> On 19/2/27 20:39, Adrian Hunter wrote:
>>> Seems to be fixed by this:
>>>
>>> From: Adrian Hunter 
>>> Date: Wed, 27 Feb 2019 05:35:25 +0200
>>> Subject: [PATCH] perf probe: Fix getting the kernel map
>>>
>>> Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86
>>> PTI entry trampolines"), perf tools has been creating more than one kernel
>>> map, however 'perf probe' assumed there could be only one.
>>>
>>> Fix by using machine__kernel_map() to get the main kernel map.
>>>
>>> Signed-off-by: Adrian Hunter 
>>> Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI 
>>> entry trampolines")
>>
>> Below is my investigation result before, FYI.
>> the first bad commit (v4.18 ~ v4.19):
>> d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines
> 
> Yes we should add a fixes tag for that also.
> 
Have you sent out the patch officially?
I can't find it in the mail list.

Thanks,
Joseph

>> revert this commit on 4.19.0, it works.
>> the first good commit again (v4.19 ~ v4.20):
>> bf904d2762ee x86/pti/64: Remove the SYSCALL64 entry trampoline
>> backported this commit as well as the related commit 98f05b5138f0 on
>> v4.19.24, it works.
>>
>> And I've tested your fix on v4.19.24, it also works.
>> Tested-by: Joseph Qi 
>>


Re: [bug report][stable] perf probe: failed to add events

2019-02-27 Thread Adrian Hunter
On 28/02/19 4:07 AM, Joseph Qi wrote:
> Hi Adrian,
> 
> On 19/2/27 20:39, Adrian Hunter wrote:
>> Seems to be fixed by this:
>>
>> From: Adrian Hunter 
>> Date: Wed, 27 Feb 2019 05:35:25 +0200
>> Subject: [PATCH] perf probe: Fix getting the kernel map
>>
>> Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86
>> PTI entry trampolines"), perf tools has been creating more than one kernel
>> map, however 'perf probe' assumed there could be only one.
>>
>> Fix by using machine__kernel_map() to get the main kernel map.
>>
>> Signed-off-by: Adrian Hunter 
>> Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI 
>> entry trampolines")
> 
> Below is my investigation result before, FYI.
> the first bad commit (v4.18 ~ v4.19):
> d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines

Yes we should add a fixes tag for that also.

> revert this commit on 4.19.0, it works.
> the first good commit again (v4.19 ~ v4.20):
> bf904d2762ee x86/pti/64: Remove the SYSCALL64 entry trampoline
> backported this commit as well as the related commit 98f05b5138f0 on
> v4.19.24, it works.
> 
> And I've tested your fix on v4.19.24, it also works.
> Tested-by: Joseph Qi 
> 



Re: [bug report][stable] perf probe: failed to add events

2019-02-27 Thread Joseph Qi
Hi Adrian,

On 19/2/27 20:39, Adrian Hunter wrote:
> Seems to be fixed by this:
> 
> From: Adrian Hunter 
> Date: Wed, 27 Feb 2019 05:35:25 +0200
> Subject: [PATCH] perf probe: Fix getting the kernel map
> 
> Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86
> PTI entry trampolines"), perf tools has been creating more than one kernel
> map, however 'perf probe' assumed there could be only one.
> 
> Fix by using machine__kernel_map() to get the main kernel map.
> 
> Signed-off-by: Adrian Hunter 
> Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI entry 
> trampolines")

Below is my investigation result before, FYI.
the first bad commit (v4.18 ~ v4.19):
d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines
revert this commit on 4.19.0, it works.
the first good commit again (v4.19 ~ v4.20):
bf904d2762ee x86/pti/64: Remove the SYSCALL64 entry trampoline
backported this commit as well as the related commit 98f05b5138f0 on
v4.19.24, it works.

And I've tested your fix on v4.19.24, it also works.
Tested-by: Joseph Qi 


Re: [bug report][stable] perf probe: failed to add events

2019-02-27 Thread Adrian Hunter
On 26/02/19 4:20 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Feb 26, 2019 at 02:08:02PM +0100, Greg KH escreveu:
>> On Tue, Feb 26, 2019 at 08:32:34PM +0800, Joseph Qi wrote:
>>>
>>>
>>> On 19/2/26 17:05, Greg KH wrote:
 On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote:
> Hi,
>
> I'm using kernel v4.19.24 and have found that there is an issue when
> using perf probe to define a new dynamic tracepoint.
>
> $ perf probe -a handle_mm_fault
> Failed to write event: Numerical result out of range
>   Error: Failed to add events.
>
> I've also tried kernel v4.20, and it can pass.

 Ick, has this ever worked on the 4.19 stable tree?  If so, any chance
 you can run 'git bisect' to find the offending commit?

>>> >From my test, v4.19.0 also has this issue.
>>> Bisect locates that it is introduced by commit bf904d2762ee
>>> "x86/pti/64: Remove the SYSCALL64 entry trampoline".
>>
>> But that commit was in 4.20, not 4.19.  So if this never worked, it's
>> not a regression?
>>
>> confused,
> 
> Adrian, Ideas?
> 

Seems to be fixed by this:

From: Adrian Hunter 
Date: Wed, 27 Feb 2019 05:35:25 +0200
Subject: [PATCH] perf probe: Fix getting the kernel map

Since commit 4d99e4136580 ("perf machine: Workaround missing maps for x86
PTI entry trampolines"), perf tools has been creating more than one kernel
map, however 'perf probe' assumed there could be only one.

Fix by using machine__kernel_map() to get the main kernel map.

Signed-off-by: Adrian Hunter 
Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI entry 
trampolines")
---
 tools/perf/util/probe-event.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index e86f8be89157..6cd96f9b346d 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -157,8 +157,10 @@ static struct map *kernel_get_module_map(const char 
*module)
if (module && strchr(module, '/'))
return dso__new_map(module);
 
-   if (!module)
-   module = "kernel";
+   if (!module) {
+   pos = machine__kernel_map(host_machine);
+   return map__get(pos);
+   }
 
for (pos = maps__first(maps); pos; pos = map__next(pos)) {
/* short_name is "[module]" */
-- 
2.19.1


Re: [bug report][stable] perf probe: failed to add events

2019-02-26 Thread Joseph Qi



On 19/2/26 21:08, Greg KH wrote:
> On Tue, Feb 26, 2019 at 08:32:34PM +0800, Joseph Qi wrote:
>>
>>
>> On 19/2/26 17:05, Greg KH wrote:
>>> On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote:
 Hi,

 I'm using kernel v4.19.24 and have found that there is an issue when
 using perf probe to define a new dynamic tracepoint.

 $ perf probe -a handle_mm_fault
 Failed to write event: Numerical result out of range
   Error: Failed to add events.

 I've also tried kernel v4.20, and it can pass.
>>>
>>> Ick, has this ever worked on the 4.19 stable tree?  If so, any chance
>>> you can run 'git bisect' to find the offending commit?
>>>
>> >From my test, v4.19.0 also has this issue.
>> Bisect locates that it is introduced by commit bf904d2762ee
>> "x86/pti/64: Remove the SYSCALL64 entry trampoline".
> 
> But that commit was in 4.20, not 4.19.  So if this never worked, it's
> not a regression?
> 
> confused,
Oops, my fault.
The first bad commit is:
d83212d5dd67 kallsyms, x86: Export addresses of PTI entry trampolines

It is between v4.18 and v4.19.

> 
> greg k-h
> 


Re: [bug report][stable] perf probe: failed to add events

2019-02-26 Thread Arnaldo Carvalho de Melo
Em Tue, Feb 26, 2019 at 02:08:02PM +0100, Greg KH escreveu:
> On Tue, Feb 26, 2019 at 08:32:34PM +0800, Joseph Qi wrote:
> > 
> > 
> > On 19/2/26 17:05, Greg KH wrote:
> > > On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote:
> > >> Hi,
> > >>
> > >> I'm using kernel v4.19.24 and have found that there is an issue when
> > >> using perf probe to define a new dynamic tracepoint.
> > >>
> > >> $ perf probe -a handle_mm_fault
> > >> Failed to write event: Numerical result out of range
> > >>   Error: Failed to add events.
> > >>
> > >> I've also tried kernel v4.20, and it can pass.
> > > 
> > > Ick, has this ever worked on the 4.19 stable tree?  If so, any chance
> > > you can run 'git bisect' to find the offending commit?
> > > 
> > >From my test, v4.19.0 also has this issue.
> > Bisect locates that it is introduced by commit bf904d2762ee
> > "x86/pti/64: Remove the SYSCALL64 entry trampoline".
> 
> But that commit was in 4.20, not 4.19.  So if this never worked, it's
> not a regression?
> 
> confused,

Adrian, Ideas?

- Arnaldo


Re: [bug report][stable] perf probe: failed to add events

2019-02-26 Thread Greg KH
On Tue, Feb 26, 2019 at 08:32:34PM +0800, Joseph Qi wrote:
> 
> 
> On 19/2/26 17:05, Greg KH wrote:
> > On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote:
> >> Hi,
> >>
> >> I'm using kernel v4.19.24 and have found that there is an issue when
> >> using perf probe to define a new dynamic tracepoint.
> >>
> >> $ perf probe -a handle_mm_fault
> >> Failed to write event: Numerical result out of range
> >>   Error: Failed to add events.
> >>
> >> I've also tried kernel v4.20, and it can pass.
> > 
> > Ick, has this ever worked on the 4.19 stable tree?  If so, any chance
> > you can run 'git bisect' to find the offending commit?
> > 
> >From my test, v4.19.0 also has this issue.
> Bisect locates that it is introduced by commit bf904d2762ee
> "x86/pti/64: Remove the SYSCALL64 entry trampoline".

But that commit was in 4.20, not 4.19.  So if this never worked, it's
not a regression?

confused,

greg k-h


Re: [bug report][stable] perf probe: failed to add events

2019-02-26 Thread Joseph Qi



On 19/2/26 17:05, Greg KH wrote:
> On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote:
>> Hi,
>>
>> I'm using kernel v4.19.24 and have found that there is an issue when
>> using perf probe to define a new dynamic tracepoint.
>>
>> $ perf probe -a handle_mm_fault
>> Failed to write event: Numerical result out of range
>>   Error: Failed to add events.
>>
>> I've also tried kernel v4.20, and it can pass.
> 
> Ick, has this ever worked on the 4.19 stable tree?  If so, any chance
> you can run 'git bisect' to find the offending commit?
> 
>From my test, v4.19.0 also has this issue.
Bisect locates that it is introduced by commit bf904d2762ee
"x86/pti/64: Remove the SYSCALL64 entry trampoline".

Thanks,
Joseph

>> So I've bisected and finally found the first good commit is:
>> bf904d2762ee x86/pti/64: Remove the SYSCALL64 entry trampoline
>> which is based on another commit:
>> 98f05b5138f0 Use the TSS sp2 slot for SYSCALL/SYSRET scratch space
>>
>> Once I've backpoted these two commits into 4.19.24, the above case can
>> pass, though I'm not sure how it is fixed.
>> So is there any plan to let them go into stable as well?
> 
> If they are needed, I'll gladly queue them up, but this feels like
> something might have broken, so it should be easier to just revert the
> offending commit instead.
> 
> Andy, any ideas?
> 
> thanks,
> 
> greg k-h
> 


Re: [bug report][stable] perf probe: failed to add events

2019-02-26 Thread Greg KH
On Tue, Feb 26, 2019 at 03:31:14PM +0800, Joseph Qi wrote:
> Hi,
> 
> I'm using kernel v4.19.24 and have found that there is an issue when
> using perf probe to define a new dynamic tracepoint.
> 
> $ perf probe -a handle_mm_fault
> Failed to write event: Numerical result out of range
>   Error: Failed to add events.
> 
> I've also tried kernel v4.20, and it can pass.

Ick, has this ever worked on the 4.19 stable tree?  If so, any chance
you can run 'git bisect' to find the offending commit?

> So I've bisected and finally found the first good commit is:
> bf904d2762ee x86/pti/64: Remove the SYSCALL64 entry trampoline
> which is based on another commit:
> 98f05b5138f0 Use the TSS sp2 slot for SYSCALL/SYSRET scratch space
> 
> Once I've backpoted these two commits into 4.19.24, the above case can
> pass, though I'm not sure how it is fixed.
> So is there any plan to let them go into stable as well?

If they are needed, I'll gladly queue them up, but this feels like
something might have broken, so it should be easier to just revert the
offending commit instead.

Andy, any ideas?

thanks,

greg k-h


Re: bug report: iwlwifi: mvm: support mac80211 TXQs model

2019-02-20 Thread Johannes Berg
On Wed, 2019-02-20 at 14:40 +, Colin Ian King wrote:
> 
> ..when the used_hw_queues initialization was removed:
> 
> @@ -360,8 +300,6 @@ int iwl_mvm_mac_ctxt_init(struct iwl_mvm *mvm,
> struct ieee80211_vif *vif)
> mvm->hw, IEEE80211_IFACE_ITER_RESUME_ALL,
> iwl_mvm_mac_iface_iterator, );
> 
> -   used_hw_queues = iwl_mvm_get_used_hw_queues(mvm, vif);
> -
> /*
>  * In the case we're getting here during resume, it's similar to
>  * firmware restart, and with RESUME_ALL the iterator will find
> 
> 
> I'm not 100% sure if the right thing to do is just to now initialize
> used_hw_queues to zero; it's not entirely clear what the initial value
> should be now.

My gut feeling says that we should just remove all of the code - we no
longer use the vif->hw_queue stuff. Which also explains why we didn't
notice any problems from this :)

Something like

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c 
b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c
index aa308f7e7989..f90383943c62 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c
@@ -263,8 +263,7 @@ int iwl_mvm_mac_ctxt_init(struct iwl_mvm *mvm, struct 
ieee80211_vif *vif)
.found_vif = false,
};
u32 ac;
-   int ret, i, queue_limit;
-   unsigned long used_hw_queues;
+   int ret, i;
 
lockdep_assert_held(>mutex);
 
@@ -341,38 +340,9 @@ int iwl_mvm_mac_ctxt_init(struct iwl_mvm *mvm, struct 
ieee80211_vif *vif)
INIT_LIST_HEAD(>time_event_data.list);
mvmvif->time_event_data.id = TE_MAX;
 
-   /* No need to allocate data queues to P2P Device MAC and NAN.*/
if (vif->type == NL80211_IFTYPE_P2P_DEVICE ||
-   vif->type == NL80211_IFTYPE_NAN) {
-   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++)
-   vif->hw_queue[ac] = IEEE80211_INVAL_HW_QUEUE;
-
+   vif->type == NL80211_IFTYPE_NAN)
return 0;
-   }
-
-   /*
-* queues in mac80211 almost entirely independent of
-* the ones here - no real limit
-*/
-   queue_limit = IEEE80211_MAX_QUEUES;
-
-   /*
-* Find available queues, and allocate them to the ACs. When in
-* DQA-mode they aren't really used, and this is done only so the
-* mac80211 ieee80211_check_queues() function won't fail
-*/
-   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
-   u8 queue = find_first_zero_bit(_hw_queues, queue_limit);
-
-   if (queue >= queue_limit) {
-   IWL_ERR(mvm, "Failed to allocate queue\n");
-   ret = -EIO;
-   goto exit_fail;
-   }
-
-   __set_bit(queue, _hw_queues);
-   vif->hw_queue[ac] = queue;
-   }
 
/* Allocate the CAB queue for softAP and GO interfaces */
if (vif->type == NL80211_IFTYPE_AP ||


I'll think about it a bit more and test it out, thanks for the report!

johannes



Re: [bug report] Input: add st-keyscan driver

2019-01-30 Thread Gabriel FERNANDEZ
Hi all,

I prefer to fix it. I guess people used their own correction.

I will send you a fix  asap.


Best Regards


Gabriel


On 1/27/19 2:20 AM, Ken Sloat wrote:
> On Sat, Jan 26, 2019 at 5:15 PM Dmitry Torokhov
>  wrote:
>> On Sat, Jan 26, 2019 at 1:25 PM Ken Sloat
>>  wrote:
>>> On Tue, Jan 22, 2019 at 1:53 PM Dan Carpenter  
>>> wrote:
 Hello Gabriel FERNANDEZ,
>>> Hello Dan,
>>>
>>> I have added CCs for the maintainers as well as Gabriel Fernandez as
>>> currently you just have the linux-input mailing list
>>>
 The patch 062589b13991: "Input: add st-keyscan driver" from Apr 12,
 2014, leads to the following static checker warning:

  drivers/input/keyboard/st-keyscan.c:156 keyscan_probe()
  error: potential zalloc NULL dereference: 'keypad_data->input_dev'

 drivers/input/keyboard/st-keyscan.c
  125 static int keyscan_probe(struct platform_device *pdev)
  126 {
  127 struct st_keyscan *keypad_data;
  128 struct input_dev *input_dev;
  129 struct resource *res;
  130 int error;
  131
  132 if (!pdev->dev.of_node) {
  133 dev_err(>dev, "no DT data present\n");
  134 return -EINVAL;
  135 }
  136
  137 keypad_data = devm_kzalloc(>dev, 
 sizeof(*keypad_data),
  138GFP_KERNEL);
  139 if (!keypad_data)
  140 return -ENOMEM;
  141
  142 input_dev = devm_input_allocate_device(>dev);
  143 if (!input_dev) {
  144 dev_err(>dev, "failed to allocate the input 
 device\n");
  145 return -ENOMEM;
  146 }
  147
  148 input_dev->name = pdev->name;
  149 input_dev->phys = "keyscan-keys/input0";
  150 input_dev->dev.parent = >dev;
  151 input_dev->open = keyscan_open;
  152 input_dev->close = keyscan_close;
  153
  154 input_dev->id.bustype = BUS_HOST;
  155
 --> 156 error = keypad_matrix_key_parse_dt(keypad_data);
 ^^^
>>> I agree with you this would be a problem
>>> to clarify the NULL derefence would occur here within 
>>> keypad_matrix_key_parse_dt
>>>
>>> struct device *dev = keypad_data->input_dev->dev.parent;
>>>
 This assumes we have set "keypad_data->input_dev = input_dev;" but we
 don't do that until...

  157 if (error)
  158 return error;
  159
  160 error = matrix_keypad_build_keymap(NULL, NULL,
  161keypad_data->n_rows,
  162keypad_data->n_cols,
  163NULL, input_dev);
  164 if (error) {
  165 dev_err(>dev, "failed to build keymap\n");
  166 return error;
  167 }
  168
  169 input_set_drvdata(input_dev, keypad_data);
  170
  171 keypad_data->input_dev = input_dev;
  ^^

 this line here.  This driver has never worked and it was included almost
 five years ago.  Is it worth fixing?

  172
  173 res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
  174 keypad_data->base = devm_ioremap_resource(>dev, 
 res);
  175 if (IS_ERR(keypad_data->base))
  176 return PTR_ERR(keypad_data->base);
  177

 regards,
 dan carpenter

>>> Here is the interesting thing, I was looking on patchwork, and several
>>> of the patches including what appears to be the latest actually set
>>> "keypad_data->input_dev = input_dev" before calling
>>> "keypad_matrix_key_parse_dt"
>>>
>>>  From v4 on patchwork
>>> + if (IS_ERR(keypad_data->clk)) {
>>> + dev_err(>dev, "cannot get clock");
>>> + return PTR_ERR(keypad_data->clk);
>>> + }
>>> +
>>> + keypad_data->input_dev = input_dev;
>>> +
>>> + input_dev->name = pdev->name;
>>> + input_dev->phys = "keyscan-keys/input0";
>>> + input_dev->dev.parent = >dev;
>>> + input_dev->open = keyscan_open;
>>> + input_dev->close = keyscan_close;
>>> +
>>> + input_dev->id.bustype = BUS_HOST;
>>> +
>>> + error = keypad_matrix_key_parse_dt(keypad_data);
>>>
>>> According to patchwork, these aren't listed as accepted, so I'm not
>>> sure where the exact accepted patch came from. Looking at the commit
>>> log, it looks like the issue you showed above was made in the original
>>> commit 062589b1399176a9c14bc68e16169f40439d658c so I'm not quite sure
>>> what is 

Re: [bug report] Input: add st-keyscan driver

2019-01-26 Thread Ken Sloat
On Sat, Jan 26, 2019 at 5:15 PM Dmitry Torokhov
 wrote:
>
> On Sat, Jan 26, 2019 at 1:25 PM Ken Sloat
>  wrote:
> >
> > On Tue, Jan 22, 2019 at 1:53 PM Dan Carpenter  
> > wrote:
> > >
> > > Hello Gabriel FERNANDEZ,
> >
> > Hello Dan,
> >
> > I have added CCs for the maintainers as well as Gabriel Fernandez as
> > currently you just have the linux-input mailing list
> >
> > > The patch 062589b13991: "Input: add st-keyscan driver" from Apr 12,
> > > 2014, leads to the following static checker warning:
> > >
> > > drivers/input/keyboard/st-keyscan.c:156 keyscan_probe()
> > > error: potential zalloc NULL dereference: 'keypad_data->input_dev'
> > >
> > > drivers/input/keyboard/st-keyscan.c
> > > 125 static int keyscan_probe(struct platform_device *pdev)
> > > 126 {
> > > 127 struct st_keyscan *keypad_data;
> > > 128 struct input_dev *input_dev;
> > > 129 struct resource *res;
> > > 130 int error;
> > > 131
> > > 132 if (!pdev->dev.of_node) {
> > > 133 dev_err(>dev, "no DT data present\n");
> > > 134 return -EINVAL;
> > > 135 }
> > > 136
> > > 137 keypad_data = devm_kzalloc(>dev, 
> > > sizeof(*keypad_data),
> > > 138GFP_KERNEL);
> > > 139 if (!keypad_data)
> > > 140 return -ENOMEM;
> > > 141
> > > 142 input_dev = devm_input_allocate_device(>dev);
> > > 143 if (!input_dev) {
> > > 144 dev_err(>dev, "failed to allocate the input 
> > > device\n");
> > > 145 return -ENOMEM;
> > > 146 }
> > > 147
> > > 148 input_dev->name = pdev->name;
> > > 149 input_dev->phys = "keyscan-keys/input0";
> > > 150 input_dev->dev.parent = >dev;
> > > 151 input_dev->open = keyscan_open;
> > > 152 input_dev->close = keyscan_close;
> > > 153
> > > 154 input_dev->id.bustype = BUS_HOST;
> > > 155
> > > --> 156 error = keypad_matrix_key_parse_dt(keypad_data);
> > >^^^
> > I agree with you this would be a problem
> > to clarify the NULL derefence would occur here within 
> > keypad_matrix_key_parse_dt
> >
> > struct device *dev = keypad_data->input_dev->dev.parent;
> >
> > > This assumes we have set "keypad_data->input_dev = input_dev;" but we
> > > don't do that until...
> > >
> > > 157 if (error)
> > > 158 return error;
> > > 159
> > > 160 error = matrix_keypad_build_keymap(NULL, NULL,
> > > 161keypad_data->n_rows,
> > > 162keypad_data->n_cols,
> > > 163NULL, input_dev);
> > > 164 if (error) {
> > > 165 dev_err(>dev, "failed to build keymap\n");
> > > 166 return error;
> > > 167 }
> > > 168
> > > 169 input_set_drvdata(input_dev, keypad_data);
> > > 170
> > > 171 keypad_data->input_dev = input_dev;
> > > ^^
> > >
> > > this line here.  This driver has never worked and it was included almost
> > > five years ago.  Is it worth fixing?
> > >
> > > 172
> > > 173 res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > > 174 keypad_data->base = devm_ioremap_resource(>dev, 
> > > res);
> > > 175 if (IS_ERR(keypad_data->base))
> > > 176 return PTR_ERR(keypad_data->base);
> > > 177
> > >
> > > regards,
> > > dan carpenter
> > >
> >
> > Here is the interesting thing, I was looking on patchwork, and several
> > of the patches including what appears to be the latest actually set
> > "keypad_data->input_dev = input_dev" before calling
> > "keypad_matrix_key_parse_dt"
> >
> > From v4 on patchwork
> > + if (IS_ERR(keypad_data->clk)) {
> > + dev_err(>dev, "cannot get clock");
> > + return PTR_ERR(keypad_data->clk);
> > + }
> > +
> > + keypad_data->input_dev = input_dev;
> > +
> > + input_dev->name = pdev->name;
> > + input_dev->phys = "keyscan-keys/input0";
> > + input_dev->dev.parent = >dev;
> > + input_dev->open = keyscan_open;
> > + input_dev->close = keyscan_close;
> > +
> > + input_dev->id.bustype = BUS_HOST;
> > +
> > + error = keypad_matrix_key_parse_dt(keypad_data);
> >
> > According to patchwork, these aren't listed as accepted, so I'm not
> > sure where the exact accepted patch came from. Looking at the commit
> > log, it looks like the issue you showed above was made in the original
> > commit 062589b1399176a9c14bc68e16169f40439d658c so I'm not quite sure
> > what is going on here. Maybe the maintainer can chime in with the
> > original patch/mailing list discussion on this. For reference, I've
> > 

Re: [bug report] Input: add st-keyscan driver

2019-01-26 Thread Dmitry Torokhov
On Sat, Jan 26, 2019 at 1:25 PM Ken Sloat
 wrote:
>
> On Tue, Jan 22, 2019 at 1:53 PM Dan Carpenter  
> wrote:
> >
> > Hello Gabriel FERNANDEZ,
>
> Hello Dan,
>
> I have added CCs for the maintainers as well as Gabriel Fernandez as
> currently you just have the linux-input mailing list
>
> > The patch 062589b13991: "Input: add st-keyscan driver" from Apr 12,
> > 2014, leads to the following static checker warning:
> >
> > drivers/input/keyboard/st-keyscan.c:156 keyscan_probe()
> > error: potential zalloc NULL dereference: 'keypad_data->input_dev'
> >
> > drivers/input/keyboard/st-keyscan.c
> > 125 static int keyscan_probe(struct platform_device *pdev)
> > 126 {
> > 127 struct st_keyscan *keypad_data;
> > 128 struct input_dev *input_dev;
> > 129 struct resource *res;
> > 130 int error;
> > 131
> > 132 if (!pdev->dev.of_node) {
> > 133 dev_err(>dev, "no DT data present\n");
> > 134 return -EINVAL;
> > 135 }
> > 136
> > 137 keypad_data = devm_kzalloc(>dev, sizeof(*keypad_data),
> > 138GFP_KERNEL);
> > 139 if (!keypad_data)
> > 140 return -ENOMEM;
> > 141
> > 142 input_dev = devm_input_allocate_device(>dev);
> > 143 if (!input_dev) {
> > 144 dev_err(>dev, "failed to allocate the input 
> > device\n");
> > 145 return -ENOMEM;
> > 146 }
> > 147
> > 148 input_dev->name = pdev->name;
> > 149 input_dev->phys = "keyscan-keys/input0";
> > 150 input_dev->dev.parent = >dev;
> > 151 input_dev->open = keyscan_open;
> > 152 input_dev->close = keyscan_close;
> > 153
> > 154 input_dev->id.bustype = BUS_HOST;
> > 155
> > --> 156 error = keypad_matrix_key_parse_dt(keypad_data);
> >^^^
> I agree with you this would be a problem
> to clarify the NULL derefence would occur here within 
> keypad_matrix_key_parse_dt
>
> struct device *dev = keypad_data->input_dev->dev.parent;
>
> > This assumes we have set "keypad_data->input_dev = input_dev;" but we
> > don't do that until...
> >
> > 157 if (error)
> > 158 return error;
> > 159
> > 160 error = matrix_keypad_build_keymap(NULL, NULL,
> > 161keypad_data->n_rows,
> > 162keypad_data->n_cols,
> > 163NULL, input_dev);
> > 164 if (error) {
> > 165 dev_err(>dev, "failed to build keymap\n");
> > 166 return error;
> > 167 }
> > 168
> > 169 input_set_drvdata(input_dev, keypad_data);
> > 170
> > 171 keypad_data->input_dev = input_dev;
> > ^^
> >
> > this line here.  This driver has never worked and it was included almost
> > five years ago.  Is it worth fixing?
> >
> > 172
> > 173 res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > 174 keypad_data->base = devm_ioremap_resource(>dev, res);
> > 175 if (IS_ERR(keypad_data->base))
> > 176 return PTR_ERR(keypad_data->base);
> > 177
> >
> > regards,
> > dan carpenter
> >
>
> Here is the interesting thing, I was looking on patchwork, and several
> of the patches including what appears to be the latest actually set
> "keypad_data->input_dev = input_dev" before calling
> "keypad_matrix_key_parse_dt"
>
> From v4 on patchwork
> + if (IS_ERR(keypad_data->clk)) {
> + dev_err(>dev, "cannot get clock");
> + return PTR_ERR(keypad_data->clk);
> + }
> +
> + keypad_data->input_dev = input_dev;
> +
> + input_dev->name = pdev->name;
> + input_dev->phys = "keyscan-keys/input0";
> + input_dev->dev.parent = >dev;
> + input_dev->open = keyscan_open;
> + input_dev->close = keyscan_close;
> +
> + input_dev->id.bustype = BUS_HOST;
> +
> + error = keypad_matrix_key_parse_dt(keypad_data);
>
> According to patchwork, these aren't listed as accepted, so I'm not
> sure where the exact accepted patch came from. Looking at the commit
> log, it looks like the issue you showed above was made in the original
> commit 062589b1399176a9c14bc68e16169f40439d658c so I'm not quite sure
> what is going on here. Maybe the maintainer can chime in with the
> original patch/mailing list discussion on this. For reference, I've
> added the patchwork links below
> https://patchwork.kernel.org/patch/3854341/
> https://patchwork.kernel.org/patch/3968891/
> https://patchwork.kernel.org/patch/3969991/

It may very well be that I messed up when applying the patch. I guess
whatever platform that is using the driver has not attempted to update
their 

Re: [bug report] Input: add st-keyscan driver

2019-01-26 Thread Ken Sloat
On Tue, Jan 22, 2019 at 1:53 PM Dan Carpenter  wrote:
>
> Hello Gabriel FERNANDEZ,

Hello Dan,

I have added CCs for the maintainers as well as Gabriel Fernandez as
currently you just have the linux-input mailing list

> The patch 062589b13991: "Input: add st-keyscan driver" from Apr 12,
> 2014, leads to the following static checker warning:
>
> drivers/input/keyboard/st-keyscan.c:156 keyscan_probe()
> error: potential zalloc NULL dereference: 'keypad_data->input_dev'
>
> drivers/input/keyboard/st-keyscan.c
> 125 static int keyscan_probe(struct platform_device *pdev)
> 126 {
> 127 struct st_keyscan *keypad_data;
> 128 struct input_dev *input_dev;
> 129 struct resource *res;
> 130 int error;
> 131
> 132 if (!pdev->dev.of_node) {
> 133 dev_err(>dev, "no DT data present\n");
> 134 return -EINVAL;
> 135 }
> 136
> 137 keypad_data = devm_kzalloc(>dev, sizeof(*keypad_data),
> 138GFP_KERNEL);
> 139 if (!keypad_data)
> 140 return -ENOMEM;
> 141
> 142 input_dev = devm_input_allocate_device(>dev);
> 143 if (!input_dev) {
> 144 dev_err(>dev, "failed to allocate the input 
> device\n");
> 145 return -ENOMEM;
> 146 }
> 147
> 148 input_dev->name = pdev->name;
> 149 input_dev->phys = "keyscan-keys/input0";
> 150 input_dev->dev.parent = >dev;
> 151 input_dev->open = keyscan_open;
> 152 input_dev->close = keyscan_close;
> 153
> 154 input_dev->id.bustype = BUS_HOST;
> 155
> --> 156 error = keypad_matrix_key_parse_dt(keypad_data);
>^^^
I agree with you this would be a problem
to clarify the NULL derefence would occur here within keypad_matrix_key_parse_dt

struct device *dev = keypad_data->input_dev->dev.parent;

> This assumes we have set "keypad_data->input_dev = input_dev;" but we
> don't do that until...
>
> 157 if (error)
> 158 return error;
> 159
> 160 error = matrix_keypad_build_keymap(NULL, NULL,
> 161keypad_data->n_rows,
> 162keypad_data->n_cols,
> 163NULL, input_dev);
> 164 if (error) {
> 165 dev_err(>dev, "failed to build keymap\n");
> 166 return error;
> 167 }
> 168
> 169 input_set_drvdata(input_dev, keypad_data);
> 170
> 171 keypad_data->input_dev = input_dev;
> ^^
>
> this line here.  This driver has never worked and it was included almost
> five years ago.  Is it worth fixing?
>
> 172
> 173 res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> 174 keypad_data->base = devm_ioremap_resource(>dev, res);
> 175 if (IS_ERR(keypad_data->base))
> 176 return PTR_ERR(keypad_data->base);
> 177
>
> regards,
> dan carpenter
>

Here is the interesting thing, I was looking on patchwork, and several
of the patches including what appears to be the latest actually set
"keypad_data->input_dev = input_dev" before calling
"keypad_matrix_key_parse_dt"

>From v4 on patchwork
+ if (IS_ERR(keypad_data->clk)) {
+ dev_err(>dev, "cannot get clock");
+ return PTR_ERR(keypad_data->clk);
+ }
+
+ keypad_data->input_dev = input_dev;
+
+ input_dev->name = pdev->name;
+ input_dev->phys = "keyscan-keys/input0";
+ input_dev->dev.parent = >dev;
+ input_dev->open = keyscan_open;
+ input_dev->close = keyscan_close;
+
+ input_dev->id.bustype = BUS_HOST;
+
+ error = keypad_matrix_key_parse_dt(keypad_data);

According to patchwork, these aren't listed as accepted, so I'm not
sure where the exact accepted patch came from. Looking at the commit
log, it looks like the issue you showed above was made in the original
commit 062589b1399176a9c14bc68e16169f40439d658c so I'm not quite sure
what is going on here. Maybe the maintainer can chime in with the
original patch/mailing list discussion on this. For reference, I've
added the patchwork links below
https://patchwork.kernel.org/patch/3854341/
https://patchwork.kernel.org/patch/3968891/
https://patchwork.kernel.org/patch/3969991/

Thanks,
Ken Sloat


Re: Bug report: unaligned access with ext4 encryption

2019-01-10 Thread Aaro Koskinen
Hi,

On Thu, Jan 10, 2019 at 03:01:14PM -0800, Eric Biggers wrote:
> On Fri, Jan 11, 2019 at 12:29:28AM +0200, Aaro Koskinen wrote:
> > Hi,
> > 
> > On Fri, Jan 04, 2019 at 05:28:02PM +, David Howells wrote:
> > > Eric Biggers  wrote:
> > > > Hi Aaro, thanks for the bug report!  I think you're on the right track; 
> > > > it makes
> > > > much more sense to have the keyrings subsystem store the payload with 
> > > > better
> > > > alignment, than to work around the 2-byte alignment in fscrypt.
> > > > 
> > > > But how about '__aligned(__alignof__(u64))' instead?  4 bytes may not 
> > > > be enough.
> > > > 
> > > > David, what do you think?
> > > 
> > > Does that even work?
> > 
> > That should work.
> > 
> > > Might be better to just insert 6 bytes of padding with a comment, but yes 
> > > I
> > > agree that it's probably better to align it to at least machine word size.
> > 
> > Padding is fragile, e.g. if struct rcu_head changes. Using __aligned should
> > make it always right automatically.
> > 
> > A.
> 
> I agree that __aligned is better.  It should work; see 'struct crypto_tfm' in
> include/linux/crypto.h for another example of a struct that uses __aligned on 
> a
> flexible array at the end.
> 
> Aaro, can you send a formal patch?  If you don't I'll do so, but I figure I'll
> ask first.

Please go ahead; I'd prefer if you send the patch, I will then test it
on SPARC and reply with Tested-by (if it works :).

A.


Re: Bug report: unaligned access with ext4 encryption

2019-01-10 Thread Eric Biggers
On Fri, Jan 11, 2019 at 12:29:28AM +0200, Aaro Koskinen wrote:
> Hi,
> 
> On Fri, Jan 04, 2019 at 05:28:02PM +, David Howells wrote:
> > Eric Biggers  wrote:
> > > Hi Aaro, thanks for the bug report!  I think you're on the right track; 
> > > it makes
> > > much more sense to have the keyrings subsystem store the payload with 
> > > better
> > > alignment, than to work around the 2-byte alignment in fscrypt.
> > > 
> > > But how about '__aligned(__alignof__(u64))' instead?  4 bytes may not be 
> > > enough.
> > > 
> > > David, what do you think?
> > 
> > Does that even work?
> 
> That should work.
> 
> > Might be better to just insert 6 bytes of padding with a comment, but yes I
> > agree that it's probably better to align it to at least machine word size.
> 
> Padding is fragile, e.g. if struct rcu_head changes. Using __aligned should
> make it always right automatically.
> 
> A.

I agree that __aligned is better.  It should work; see 'struct crypto_tfm' in
include/linux/crypto.h for another example of a struct that uses __aligned on a
flexible array at the end.

Aaro, can you send a formal patch?  If you don't I'll do so, but I figure I'll
ask first.

Thanks,

- Eric


Re: Bug report: unaligned access with ext4 encryption

2019-01-10 Thread Aaro Koskinen
Hi,

On Fri, Jan 04, 2019 at 05:28:02PM +, David Howells wrote:
> Eric Biggers  wrote:
> > Hi Aaro, thanks for the bug report!  I think you're on the right track; it 
> > makes
> > much more sense to have the keyrings subsystem store the payload with better
> > alignment, than to work around the 2-byte alignment in fscrypt.
> > 
> > But how about '__aligned(__alignof__(u64))' instead?  4 bytes may not be 
> > enough.
> > 
> > David, what do you think?
> 
> Does that even work?

That should work.

> Might be better to just insert 6 bytes of padding with a comment, but yes I
> agree that it's probably better to align it to at least machine word size.

Padding is fragile, e.g. if struct rcu_head changes. Using __aligned should
make it always right automatically.

A.


Re: Bug report: unaligned access with ext4 encryption

2019-01-04 Thread David Howells
Eric Biggers  wrote:

> Hi Aaro, thanks for the bug report!  I think you're on the right track; it 
> makes
> much more sense to have the keyrings subsystem store the payload with better
> alignment, than to work around the 2-byte alignment in fscrypt.
> 
> But how about '__aligned(__alignof__(u64))' instead?  4 bytes may not be 
> enough.
> 
> David, what do you think?

Does that even work?

Might be better to just insert 6 bytes of padding with a comment, but yes I
agree that it's probably better to align it to at least machine word size.

David


Re: Bug report: unaligned access with ext4 encryption

2019-01-03 Thread Eric Biggers
On Sun, Dec 30, 2018 at 06:29:06PM +0200, Aaro Koskinen wrote:
> Hi,
> 
> When using ext4 encryption on SPARC, there's plenty of dmesg noise about
> unaligned access:
> 
> [  167.269526] Kernel unaligned access at TPC[5497a0] 
> find_and_lock_process_key+0x80/0x120
> [  167.270152] Kernel unaligned access at TPC[5497a0] 
> find_and_lock_process_key+0x80/0x120
> [  181.087451] log_unaligned: 5 callbacks suppressed
> [  181.087511] Kernel unaligned access at TPC[5497a0] 
> find_and_lock_process_key+0x80/0x120
> [  181.092435] Kernel unaligned access at TPC[5497a0] 
> find_and_lock_process_key+0x80/0x120
> [  181.095816] Kernel unaligned access at TPC[5497a0] 
> find_and_lock_process_key+0x80/0x120
> 
> And also seen on an ARM machine:
> 
> $ cat /proc/cpu/alignment
> User:   0
> System: 1028193 (find_and_lock_process_key+0x84/0x10c)
> Skipped:0
> Half:   0
> Word:   1028193
> DWord:  0
> Multi:  0
> User faults:0 (ignored)
> 
> Looks like user_key_payload layout is not optimal when data address
> is used for fscrypt_key... I tried the below change and got rid of the
> messages. Not sure what the proper fix should be?
> 
> A.
> 
> diff --git a/include/keys/user-type.h b/include/keys/user-type.h
> index e098cbe27db5..6495ffcfe510 100644
> --- a/include/keys/user-type.h
> +++ b/include/keys/user-type.h
> @@ -31,7 +31,7 @@
>  struct user_key_payload {
>   struct rcu_head rcu;/* RCU destructor */
>   unsigned short  datalen;/* length of this data */
> - chardata[0];/* actual data */
> + char data[0] __aligned(4);  /* actual data */
>  };
>  
>  extern struct key_type key_type_user;
> 

Hi Aaro, thanks for the bug report!  I think you're on the right track; it makes
much more sense to have the keyrings subsystem store the payload with better
alignment, than to work around the 2-byte alignment in fscrypt.

But how about '__aligned(__alignof__(u64))' instead?  4 bytes may not be enough.

David, what do you think?

- Eric


Re: bug report: hugetlbfs: use i_mmap_rwsem for more pmd sharing, synchronization

2018-12-27 Thread Mike Kravetz
On 12/27/18 6:45 PM, Andrew Morton wrote:
> On Thu, 27 Dec 2018 11:24:31 -0800 Mike Kravetz  
> wrote:
>> It would be better to make an explicit check for mapping != null before
>> calling i_mmap_lock_write/try_to_unmap.  In this way, unrelated changes to
>> code above will not potentially lead to the possibility of mapping == null.
>>
>> I'm not sure what is the best way to handle this.  Below is an updated 
>> version
>> of the patch sent to Andrew.  I can also provide a simple patch to the patch
>> if that is easier.
>>
> 
> Below is the delta.  Please check it.  It seems to do more than the
> above implies.
> 
> Also, I have notes here that 
> 
> hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization.patch
> and
> hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race.patch
> 
> have additional updates pending.  Due to emails such as
> 
> http://lkml.kernel.org/r/849f5202-2200-265f-7769-8363053e8...@oracle.com
> http://lkml.kernel.org/r/732c0b7d-5a4e-97a8-9677-30f352089...@oracle.com
> http://lkml.kernel.org/r/6b91dd42-b903-1f6c-729a-bd9f51273...@oracle.com
> 
> What's the status, please?
> 

There was a V3 of the patches which was Acked-by Kirill.   See,
http://lkml.kernel.org/r/20181224101349.jjjmk2hzwah6g64h@kshutemo-mobl1

The two V3 patches are:
http://lkml.kernel.org/r/2018123013.22193-2-mike.krav...@oracle.com
http://lkml.kernel.org/r/2018123013.22193-3-mike.krav...@oracle.com

The patch I sent in this thread was an update to the V3.  The delta you
created was based on V2.  So, the delta contains V2 -> V3 changes as well
as the changes mentioned in this thread.  My apologies for not noticing
and clarifying.

Let me know what you would like me to do to help.  I hate to send any
more patches right now as they might cause more confusion.
-- 
Mike Kravetz


Re: bug report: hugetlbfs: use i_mmap_rwsem for more pmd sharing, synchronization

2018-12-27 Thread Andrew Morton
On Thu, 27 Dec 2018 11:24:31 -0800 Mike Kravetz  wrote:

> On 12/27/18 3:44 AM, Colin Ian King wrote:
> > Hi,
> > 
> > Static analysis with CoverityScan on linux-next detected a potential
> > null pointer dereference with the following commit:
> > 
> > From d8a1051ed4ba55679ef24e838a1942c9c40f0a14 Mon Sep 17 00:00:00 2001
> > From: Mike Kravetz 
> > Date: Sat, 22 Dec 2018 10:55:57 +1100
> > Subject: [PATCH] hugetlbfs: use i_mmap_rwsem for more pmd sharing
> > 
> > The earlier check implies that "mapping" may be a null pointer:
> > 
> > var_compare_op: Comparing mapping to null implies that mapping might be
> > null.
> > 
> > 1008if (!(flags & MF_MUST_KILL) && !PageDirty(hpage) && mapping &&
> > 1009mapping_cap_writeback_dirty(mapping)) {
> > 
> > ..however later "mapper" is dereferenced when it may be potentially null:
> > 
> > 1034/*
> > 1035 * For hugetlb pages, try_to_unmap could potentially
> > call
> > 1036 * huge_pmd_unshare.  Because of this, take semaphore in
> > 1037 * write mode here and set TTU_RMAP_LOCKED to
> > indicate we
> > 1038 * have taken the lock at this higer level.
> > 1039 */
> > CID 1476097 (#1 of 1): Dereference after null check (FORWARD_NULL)
> > 
> > var_deref_model: Passing null pointer mapping to
> > i_mmap_lock_write, which dereferences it.
> > 
> > 1040i_mmap_lock_write(mapping);
> > 1041unmap_success = try_to_unmap(hpage,
> > ttu|TTU_RMAP_LOCKED);
> > 1042i_mmap_unlock_write(mapping);
> > 
> 
> Thanks for the report.
> 
> The 'good news' is that mapping can not be null in the code path above.
> The reasons are:
> - The page is locked upon entry to the routine
> - Earlier in the routine there is the check:
>   if (!page_mapped(hpage))
>   return true;
>   For huge pages (which are processed in the else clause above), page_mapped
>   implies page->mapping != null.
> 
> However, the routine hwpoison_user_mappings handles all page types.  The
> page_mapped check is actually there to check for pages in the swap cache.
> It is just coincidence that it also implies mapping != null for huge pages.
> 
> It would be better to make an explicit check for mapping != null before
> calling i_mmap_lock_write/try_to_unmap.  In this way, unrelated changes to
> code above will not potentially lead to the possibility of mapping == null.
> 
> I'm not sure what is the best way to handle this.  Below is an updated version
> of the patch sent to Andrew.  I can also provide a simple patch to the patch
> if that is easier.
> 

Below is the delta.  Please check it.  It seems to do more than the
above implies.

Also, I have notes here that 

hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization.patch
and
hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race.patch

have additional updates pending.  Due to emails such as

http://lkml.kernel.org/r/849f5202-2200-265f-7769-8363053e8...@oracle.com
http://lkml.kernel.org/r/732c0b7d-5a4e-97a8-9677-30f352089...@oracle.com
http://lkml.kernel.org/r/6b91dd42-b903-1f6c-729a-bd9f51273...@oracle.com

What's the status, please?


From: Mike Kravetz 
Subject: hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization-fix

It would be better to make an explicit check for mapping != null before
calling i_mmap_lock_write/try_to_unmap.  In this way, unrelated changes to
code above will not potentially lead to the possibility of mapping ==
null.

Signed-off-by: Mike Kravetz 
Cc: Michal Hocko 
Cc: Hugh Dickins 
Cc: Naoya Horiguchi 
Cc: "Aneesh Kumar K . V" 
Cc: Andrea Arcangeli 
Cc: "Kirill A . Shutemov" 
Cc: Davidlohr Bueso 
Cc: Prakash Sangappa 
Cc: Colin Ian King 
Signed-off-by: Andrew Morton 
---


--- 
a/mm/hugetlb.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization-fix
+++ a/mm/hugetlb.c
@@ -3250,6 +3250,14 @@ int copy_hugetlb_page_range(struct mm_st
mmu_notifier_range_init(, src, vma->vm_start,
vma->vm_end, MMU_NOTIFY_CLEAR);
mmu_notifier_invalidate_range_start();
+   } else {
+   /*
+* For shared mappings i_mmap_rwsem must be held to call
+* huge_pte_alloc, otherwise the returned ptep could go
+* away if part of a shared pmd and another thread calls
+* huge_pmd_unshare.
+*/
+   i_mmap_lock_read(mapping);
}
 
for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
@@ -3259,18 +3267,8 @@ int copy_hugetlb_page_range(struct mm_st
if (!src_pte)
continue;
 
-   /*
-* i_mmap_rwsem must be held to call huge_pte_alloc.
-* Continue to hold until finished  with dst_pte, otherwise
-* it could go away if part of a shared pmd.
-*
-* 

Re: bug report: hugetlbfs: use i_mmap_rwsem for more pmd sharing, synchronization

2018-12-27 Thread Mike Kravetz
On 12/27/18 3:44 AM, Colin Ian King wrote:
> Hi,
> 
> Static analysis with CoverityScan on linux-next detected a potential
> null pointer dereference with the following commit:
> 
> From d8a1051ed4ba55679ef24e838a1942c9c40f0a14 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz 
> Date: Sat, 22 Dec 2018 10:55:57 +1100
> Subject: [PATCH] hugetlbfs: use i_mmap_rwsem for more pmd sharing
> 
> The earlier check implies that "mapping" may be a null pointer:
> 
> var_compare_op: Comparing mapping to null implies that mapping might be
> null.
> 
> 1008if (!(flags & MF_MUST_KILL) && !PageDirty(hpage) && mapping &&
> 1009mapping_cap_writeback_dirty(mapping)) {
> 
> ..however later "mapper" is dereferenced when it may be potentially null:
> 
> 1034/*
> 1035 * For hugetlb pages, try_to_unmap could potentially
> call
> 1036 * huge_pmd_unshare.  Because of this, take semaphore in
> 1037 * write mode here and set TTU_RMAP_LOCKED to
> indicate we
> 1038 * have taken the lock at this higer level.
> 1039 */
> CID 1476097 (#1 of 1): Dereference after null check (FORWARD_NULL)
> 
> var_deref_model: Passing null pointer mapping to
> i_mmap_lock_write, which dereferences it.
> 
> 1040i_mmap_lock_write(mapping);
> 1041unmap_success = try_to_unmap(hpage,
> ttu|TTU_RMAP_LOCKED);
> 1042i_mmap_unlock_write(mapping);
> 

Thanks for the report.

The 'good news' is that mapping can not be null in the code path above.
The reasons are:
- The page is locked upon entry to the routine
- Earlier in the routine there is the check:
if (!page_mapped(hpage))
return true;
  For huge pages (which are processed in the else clause above), page_mapped
  implies page->mapping != null.

However, the routine hwpoison_user_mappings handles all page types.  The
page_mapped check is actually there to check for pages in the swap cache.
It is just coincidence that it also implies mapping != null for huge pages.

It would be better to make an explicit check for mapping != null before
calling i_mmap_lock_write/try_to_unmap.  In this way, unrelated changes to
code above will not potentially lead to the possibility of mapping == null.

I'm not sure what is the best way to handle this.  Below is an updated version
of the patch sent to Andrew.  I can also provide a simple patch to the patch
if that is easier.

From: Mike Kravetz 

hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization

While looking at BUGs associated with invalid huge page map counts,
it was discovered and observed that a huge pte pointer could become
'invalid' and point to another task's page table.  Consider the
following:

A task takes a page fault on a shared hugetlbfs file and calls
huge_pte_alloc to get a ptep.  Suppose the returned ptep points to a
shared pmd.

Now, another task truncates the hugetlbfs file.  As part of truncation,
it unmaps everyone who has the file mapped.  If the range being
truncated is covered by a shared pmd, huge_pmd_unshare will be called.
For all but the last user of the shared pmd, huge_pmd_unshare will
clear the pud pointing to the pmd.  If the task in the middle of the
page fault is not the last user, the ptep returned by huge_pte_alloc
now points to another task's page table or worse.  This leads to bad
things such as incorrect page map/reference counts or invalid memory
references.

To fix, expand the use of i_mmap_rwsem as follows:
- i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
  huge_pmd_share is only called via huge_pte_alloc, so callers of
  huge_pte_alloc take i_mmap_rwsem before calling.  In addition, callers
  of huge_pte_alloc continue to hold the semaphore until finished with
  the ptep.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.

Cc: 
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz 
---
 mm/hugetlb.c| 67 ++---
 mm/memory-failure.c | 16 +--
 mm/migrate.c| 13 -
 mm/rmap.c   |  4 +++
 mm/userfaultfd.c| 11 ++--
 5 files changed, 90 insertions(+), 21 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 309fb8c969af..2a3162030167 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3239,6 +3239,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct
mm_struct *src,
int cow;
struct hstate *h = hstate_vma(vma);
unsigned long sz = huge_page_size(h);
+   struct address_space *mapping = vma->vm_file->f_mapping;
unsigned long mmun_start;   /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */
int ret = 0;
@@ -3247,14 +3248,25 @@ int copy_hugetlb_page_range(struct mm_struct *dst,
struct mm_struct *src,

mmun_start = vma->vm_start;
mmun_end = vma->vm_end;
-   if (cow)
+   if 

Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE

2018-11-12 Thread Ezequiel Garcia
On Wed, 17 Oct 2018 at 16:50, Aaro Koskinen  wrote:
>
> Hi,
>
> On Wed, Oct 17, 2018 at 03:38:07PM +0200, Mathieu Malaterre wrote:
> > Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292
> > could you confirm that the original mmc driver (kernel from imgtech
> > people) did work ok with PREEMPT_NONE (sorry I did not do my homework)
> > ?
>
> Sorry, cannot confirm or test that. I have only used the mainline kernel
> on this board, since v4.5 or so with my own custom config which has
> been PREEMPT_NONE until now.
>

Aaro,

I spent some time today investigating this issue. I think this driver
has a broken pre-request/post-request implementation.

Will post some patches soon.

Thanks for the report,
-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar


Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE

2018-11-12 Thread Ezequiel Garcia
On Wed, 17 Oct 2018 at 16:50, Aaro Koskinen  wrote:
>
> Hi,
>
> On Wed, Oct 17, 2018 at 03:38:07PM +0200, Mathieu Malaterre wrote:
> > Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292
> > could you confirm that the original mmc driver (kernel from imgtech
> > people) did work ok with PREEMPT_NONE (sorry I did not do my homework)
> > ?
>
> Sorry, cannot confirm or test that. I have only used the mainline kernel
> on this board, since v4.5 or so with my own custom config which has
> been PREEMPT_NONE until now.
>

Aaro,

I spent some time today investigating this issue. I think this driver
has a broken pre-request/post-request implementation.

Will post some patches soon.

Thanks for the report,
-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar


Re: [bug report] armada-8040-mcbin: 4.20-rc1 boot failure

2018-11-11 Thread Baruch Siach
Hi Sergey, Gregory,

Sergey Matyukevich writes:
> MacchiatoBin board fails to boot v4.20-rc1 kernel built using arm64
> defconfig. According to quick bisection of the board dts file, the
> root cause is in dts patch enabling CPU deep idle states:
> see 8ed46368776b3bc. Is there any pending fix other than reverting
> deep idle state support ?

I reproduced the same boot failure on the Armada 8K based Clearfog
GT-8K. Revert of commit 8ed46368776b3bc on top of v4.20-rc1 makes the
system boot again.

Here is the splat I get:

[   87.358425] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   87.364556] rcu: 3-...!: (1 ticks this GP) idle=d04/0/0x0 softirq=79/79 
fqs=1
[   87.371897] rcu: (detected by 2, t=21490 jiffies, g=-1035, q=179)
[   87.378106] Task dump for CPU 3:
[   87.381347] swapper/3   R  running task0 0  1 0x0028
[   87.388428] Call trace:
[   87.390892]  __switch_to+0x78/0xb4
[   87.394310]  arm_enter_idle_state+0x38/0x4c
[   87.398512]  cpuidle_enter_state+0xc8/0x15c
[   87.402713]  cpuidle_enter+0x18/0x20
[   87.406304]  call_cpuidle+0x30/0x34
[   87.409807]  do_idle+0x19c/0x224
[   87.413048]  cpu_startup_entry+0x20/0x24
[   87.416988]  secondary_start_kernel+0x158/0x168

baruch

--
 http://baruch.siach.name/blog/  ~. .~   Tk Open Systems
=}ooO--U--Ooo{=
   - bar...@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -


Re: [bug report] armada-8040-mcbin: 4.20-rc1 boot failure

2018-11-11 Thread Baruch Siach
Hi Sergey, Gregory,

Sergey Matyukevich writes:
> MacchiatoBin board fails to boot v4.20-rc1 kernel built using arm64
> defconfig. According to quick bisection of the board dts file, the
> root cause is in dts patch enabling CPU deep idle states:
> see 8ed46368776b3bc. Is there any pending fix other than reverting
> deep idle state support ?

I reproduced the same boot failure on the Armada 8K based Clearfog
GT-8K. Revert of commit 8ed46368776b3bc on top of v4.20-rc1 makes the
system boot again.

Here is the splat I get:

[   87.358425] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   87.364556] rcu: 3-...!: (1 ticks this GP) idle=d04/0/0x0 softirq=79/79 
fqs=1
[   87.371897] rcu: (detected by 2, t=21490 jiffies, g=-1035, q=179)
[   87.378106] Task dump for CPU 3:
[   87.381347] swapper/3   R  running task0 0  1 0x0028
[   87.388428] Call trace:
[   87.390892]  __switch_to+0x78/0xb4
[   87.394310]  arm_enter_idle_state+0x38/0x4c
[   87.398512]  cpuidle_enter_state+0xc8/0x15c
[   87.402713]  cpuidle_enter+0x18/0x20
[   87.406304]  call_cpuidle+0x30/0x34
[   87.409807]  do_idle+0x19c/0x224
[   87.413048]  cpu_startup_entry+0x20/0x24
[   87.416988]  secondary_start_kernel+0x158/0x168

baruch

--
 http://baruch.siach.name/blog/  ~. .~   Tk Open Systems
=}ooO--U--Ooo{=
   - bar...@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -


Re: [bug report] armada-8040-mcbin: 4.20-rc1 boot failure

2018-11-09 Thread Sergey Matyukevich
On Fri, Nov 09, 2018 at 03:11:12PM +0300, Sergey Matyukevich wrote:
> Hello Gregory,
> 
> MacchiatoBin board fails to boot v4.20-rc1 kernel built using arm64
> defconfig. According to quick bisection of the board dts file, the
> root cause is in dts patch enabling CPU deep idle states:
> see 8ed46368776b3bc. Is there any pending fix other than reverting
> deep idle state support ?

Adding devicetree mailing list.


Re: [bug report] armada-8040-mcbin: 4.20-rc1 boot failure

2018-11-09 Thread Sergey Matyukevich
On Fri, Nov 09, 2018 at 03:11:12PM +0300, Sergey Matyukevich wrote:
> Hello Gregory,
> 
> MacchiatoBin board fails to boot v4.20-rc1 kernel built using arm64
> defconfig. According to quick bisection of the board dts file, the
> root cause is in dts patch enabling CPU deep idle states:
> see 8ed46368776b3bc. Is there any pending fix other than reverting
> deep idle state support ?

Adding devicetree mailing list.


Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE

2018-10-17 Thread Aaro Koskinen
Hi,

On Wed, Oct 17, 2018 at 03:38:07PM +0200, Mathieu Malaterre wrote:
> Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292
> could you confirm that the original mmc driver (kernel from imgtech
> people) did work ok with PREEMPT_NONE (sorry I did not do my homework)
> ?

Sorry, cannot confirm or test that. I have only used the mainline kernel
on this board, since v4.5 or so with my own custom config which has
been PREEMPT_NONE until now.

A.


Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE

2018-10-17 Thread Aaro Koskinen
Hi,

On Wed, Oct 17, 2018 at 03:38:07PM +0200, Mathieu Malaterre wrote:
> Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292
> could you confirm that the original mmc driver (kernel from imgtech
> people) did work ok with PREEMPT_NONE (sorry I did not do my homework)
> ?

Sorry, cannot confirm or test that. I have only used the mainline kernel
on this board, since v4.5 or so with my own custom config which has
been PREEMPT_NONE until now.

A.


Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE

2018-10-17 Thread Ezequiel Garcia
On Sun, 2018-10-14 at 20:04 +0300, Aaro Koskinen wrote:
> Hi,
> 
> There is something wrong with jz4740-mmc in current mainline kernel
> (tested v4.18 and 4.19-rc, the MMC support for CI20 does not exist
> prior those), as the DMA support does not work properly if I disable
> kernel pre-emption. The console gets flooded with:
> 
> [   16.461094] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 567 host->next_data.cookie 568
> [   16.473120] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 568 host->next_data.cookie 569
> [   16.485144] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 569 host->next_data.cookie 570
> [   16.497170] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 570 host->next_data.cookie 571
> [   16.509194] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 571 host->next_data.cookie 572
> [   16.532421] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 572 host->next_data.cookie 573
> [   16.544594] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 573 host->next_data.cookie 574
> [   16.556621] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 574 host->next_data.cookie 575
> [   16.568638] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 575 host->next_data.cookie 576
> [   16.601092] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 582 host->next_data.cookie 583
> 
> etc. ad inf.
> 
> This should be easily reproducible on CI20 board with ci20_defconfig
> and setting CONFIG_PREEMPT_NONE=y.
> 

Will take a look as soon as possible, most likely after ELCE.

Thanks,
Eze


Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE

2018-10-17 Thread Ezequiel Garcia
On Sun, 2018-10-14 at 20:04 +0300, Aaro Koskinen wrote:
> Hi,
> 
> There is something wrong with jz4740-mmc in current mainline kernel
> (tested v4.18 and 4.19-rc, the MMC support for CI20 does not exist
> prior those), as the DMA support does not work properly if I disable
> kernel pre-emption. The console gets flooded with:
> 
> [   16.461094] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 567 host->next_data.cookie 568
> [   16.473120] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 568 host->next_data.cookie 569
> [   16.485144] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 569 host->next_data.cookie 570
> [   16.497170] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 570 host->next_data.cookie 571
> [   16.509194] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 571 host->next_data.cookie 572
> [   16.532421] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 572 host->next_data.cookie 573
> [   16.544594] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 573 host->next_data.cookie 574
> [   16.556621] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 574 host->next_data.cookie 575
> [   16.568638] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 575 host->next_data.cookie 576
> [   16.601092] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 582 host->next_data.cookie 583
> 
> etc. ad inf.
> 
> This should be easily reproducible on CI20 board with ci20_defconfig
> and setting CONFIG_PREEMPT_NONE=y.
> 

Will take a look as soon as possible, most likely after ELCE.

Thanks,
Eze


Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE

2018-10-17 Thread Mathieu Malaterre
Hi,

On Sun, Oct 14, 2018 at 7:04 PM Aaro Koskinen  wrote:
>
> Hi,
>
> There is something wrong with jz4740-mmc in current mainline kernel
> (tested v4.18 and 4.19-rc, the MMC support for CI20 does not exist
> prior those), as the DMA support does not work properly if I disable
> kernel pre-emption. The console gets flooded with:
>
> [   16.461094] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 567 host->next_data.cookie 568
> [   16.473120] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 568 host->next_data.cookie 569
> [   16.485144] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 569 host->next_data.cookie 570
> [   16.497170] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 570 host->next_data.cookie 571
> [   16.509194] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 571 host->next_data.cookie 572
> [   16.532421] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 572 host->next_data.cookie 573
> [   16.544594] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 573 host->next_data.cookie 574
> [   16.556621] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 574 host->next_data.cookie 575
> [   16.568638] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 575 host->next_data.cookie 576
> [   16.601092] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 582 host->next_data.cookie 583
>
> etc. ad inf.
>
> This should be easily reproducible on CI20 board with ci20_defconfig
> and setting CONFIG_PREEMPT_NONE=y.

Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292
could you confirm that the original mmc driver (kernel from imgtech
people) did work ok with PREEMPT_NONE (sorry I did not do my homework)
?

Thanks

> A.


Re: Bug report: MIPS CI20/jz4740-mmc DMA and PREEMPT_NONE

2018-10-17 Thread Mathieu Malaterre
Hi,

On Sun, Oct 14, 2018 at 7:04 PM Aaro Koskinen  wrote:
>
> Hi,
>
> There is something wrong with jz4740-mmc in current mainline kernel
> (tested v4.18 and 4.19-rc, the MMC support for CI20 does not exist
> prior those), as the DMA support does not work properly if I disable
> kernel pre-emption. The console gets flooded with:
>
> [   16.461094] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 567 host->next_data.cookie 568
> [   16.473120] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 568 host->next_data.cookie 569
> [   16.485144] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 569 host->next_data.cookie 570
> [   16.497170] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 570 host->next_data.cookie 571
> [   16.509194] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 571 host->next_data.cookie 572
> [   16.532421] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 572 host->next_data.cookie 573
> [   16.544594] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 573 host->next_data.cookie 574
> [   16.556621] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 574 host->next_data.cookie 575
> [   16.568638] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 575 host->next_data.cookie 576
> [   16.601092] jz4740-mmc 1345.mmc: [jz4740_mmc_prepare_dma_data] invalid 
> cookie: data->host_cookie 582 host->next_data.cookie 583
>
> etc. ad inf.
>
> This should be easily reproducible on CI20 board with ci20_defconfig
> and setting CONFIG_PREEMPT_NONE=y.

Since CONFIG_PREEMPT has been 'y' since at least commit 0752f92934292
could you confirm that the original mmc driver (kernel from imgtech
people) did work ok with PREEMPT_NONE (sorry I did not do my homework)
?

Thanks

> A.


Re: Bug report: HiBMC crash

2018-09-21 Thread Chris Wilson
Quoting John Garry (2018-09-21 09:11:19)
> On 21/09/2018 06:49, Liuxinliang (Matthew Liu) wrote:
> > Hi John,
> > Thank you for reporting bug.
> > I am now using 4.18.7. I haven't found this issue yet.
> > I will try linux-next and figure out what's wrong with it.
> >
> > Thanks,
> > Xinliang
> >
> >
> 
> As mentioned in internal mail, the issue may be that the surface 
> depth/bpp we were using the in the driver was previously invalid, but 
> code has since been added in v4.19 to reject this. Specifically it looks 
> like this patch:
> 
> commit 70109354fed232dfce8fb2c7cadf635acbe03e19
> Author: Chris Wilson 
> Date:   Wed Sep 5 16:31:16 2018 +0100
> 
>  drm: Reject unknown legacy bpp and depth for drm_mode_addfb ioctl


diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c 
b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
index b92595c477ef..f3e7f41e6781 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
@@ -71,7 +71,6 @@ static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
DRM_DEBUG_DRIVER("surface width(%d), height(%d) and bpp(%d)\n",
 sizes->surface_width, sizes->surface_height,
 sizes->surface_bpp);
-   sizes->surface_depth = 32;

bytes_per_pixel = DIV_ROUND_UP(sizes->surface_bpp, 8);

@@ -192,7 +191,6 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv)
return -ENOMEM;
}

-   priv->fbdev = hifbdev;
drm_fb_helper_prepare(priv->dev, >helper,
  _fbdev_helper_funcs);

@@ -246,6 +244,7 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv)
 fix->ypanstep, fix->ywrapstep, fix->line_length,
 fix->accel, fix->capabilities);

+   priv->fbdev = hifbdev;
return 0;

 fini:

Apply chunks 2&3 first to confirm they fix the GPF.
-Chris


Re: Bug report: HiBMC crash

2018-09-21 Thread Chris Wilson
Quoting John Garry (2018-09-21 09:11:19)
> On 21/09/2018 06:49, Liuxinliang (Matthew Liu) wrote:
> > Hi John,
> > Thank you for reporting bug.
> > I am now using 4.18.7. I haven't found this issue yet.
> > I will try linux-next and figure out what's wrong with it.
> >
> > Thanks,
> > Xinliang
> >
> >
> 
> As mentioned in internal mail, the issue may be that the surface 
> depth/bpp we were using the in the driver was previously invalid, but 
> code has since been added in v4.19 to reject this. Specifically it looks 
> like this patch:
> 
> commit 70109354fed232dfce8fb2c7cadf635acbe03e19
> Author: Chris Wilson 
> Date:   Wed Sep 5 16:31:16 2018 +0100
> 
>  drm: Reject unknown legacy bpp and depth for drm_mode_addfb ioctl


diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c 
b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
index b92595c477ef..f3e7f41e6781 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
@@ -71,7 +71,6 @@ static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
DRM_DEBUG_DRIVER("surface width(%d), height(%d) and bpp(%d)\n",
 sizes->surface_width, sizes->surface_height,
 sizes->surface_bpp);
-   sizes->surface_depth = 32;

bytes_per_pixel = DIV_ROUND_UP(sizes->surface_bpp, 8);

@@ -192,7 +191,6 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv)
return -ENOMEM;
}

-   priv->fbdev = hifbdev;
drm_fb_helper_prepare(priv->dev, >helper,
  _fbdev_helper_funcs);

@@ -246,6 +244,7 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv)
 fix->ypanstep, fix->ywrapstep, fix->line_length,
 fix->accel, fix->capabilities);

+   priv->fbdev = hifbdev;
return 0;

 fini:

Apply chunks 2&3 first to confirm they fix the GPF.
-Chris


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-17 Thread Michal Hocko
On Tue 17-07-18 09:51:20, Baoquan He wrote:
> On 07/16/18 at 05:24pm, Michal Hocko wrote:
> > On Mon 16-07-18 21:02:02, Baoquan He wrote:
> > > On 07/16/18 at 01:38pm, Michal Hocko wrote:
> > > > On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > > > > Hi Michal,
> > > > > 
> > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> > > > [...]
> > > > > > I am not able to find the beginning of the email thread right now. 
> > > > > > Could
> > > > > > you summarize what is the actual problem please?
> > > > > 
> > > > > The bug is found on x86 now. 
> > > > > 
> > > > > When added "kernelcore=" or "movablecore=" into kernel command line,
> > > > > kernel memory is spread evenly among nodes. However, this is right 
> > > > > when
> > > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > > > > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> > > > >  
> > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, 
> > > > > and
> > > > > we specify "kernelcore=50%", means each node will take 10G for
> > > > > kernelcore, 10G for movable area. But this doesn't take kernel 
> > > > > position
> > > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > > > > movable, in fact there's only 5G available for movable, just after
> > > > > kernel.
> > > > 
> > > > OK, I guess I see that part. But who is going to use movablecore along
> > > > with KASLR enabled? I mean do we really have to support those two
> > > > obscure command line parameters for KASLR?
> > > 
> > > Not very sure whether we have to support both of those to work with
> > > KASLR. Maybe it's time to make clear of it now.
> > 
> > Yes, I would really like to deprecate this. It is an ugly piece of code
> > and it's far from easily maintainable as well.
> > 
> > > For 'kernelcore=mirror', we have solved the conflict to make it work well
> > > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
> > > patches to fix it. As for 'kernelcore=' and 'movablecore=', 
> > > 
> > > 1) solve the conflict between them with KASLR in
> > >find_zone_movable_pfns_for_nodes();
> > > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
> > > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
> > > 4) add note in doc to notice people to not add them at the same time;
> > 
> > I would simply warn that those kernel parameters are not supported
> > anymore. If somebody shows up with a valid usecase we can reconsider.
> 
> OK, got it. The use case I can think of is that people want to check 
> hotplug on system w/o hotplug ACPI info.

Let's see whether there is really somebody like that and complain.
 
> I am fine with warning people they are not supported. Should I post a
> patch to address this, or you will do it? Both is fine to me.

I will happily ack it if you create a patch.

-- 
Michal Hocko
SUSE Labs


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-17 Thread Michal Hocko
On Tue 17-07-18 09:51:20, Baoquan He wrote:
> On 07/16/18 at 05:24pm, Michal Hocko wrote:
> > On Mon 16-07-18 21:02:02, Baoquan He wrote:
> > > On 07/16/18 at 01:38pm, Michal Hocko wrote:
> > > > On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > > > > Hi Michal,
> > > > > 
> > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> > > > [...]
> > > > > > I am not able to find the beginning of the email thread right now. 
> > > > > > Could
> > > > > > you summarize what is the actual problem please?
> > > > > 
> > > > > The bug is found on x86 now. 
> > > > > 
> > > > > When added "kernelcore=" or "movablecore=" into kernel command line,
> > > > > kernel memory is spread evenly among nodes. However, this is right 
> > > > > when
> > > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > > > > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> > > > >  
> > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, 
> > > > > and
> > > > > we specify "kernelcore=50%", means each node will take 10G for
> > > > > kernelcore, 10G for movable area. But this doesn't take kernel 
> > > > > position
> > > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > > > > movable, in fact there's only 5G available for movable, just after
> > > > > kernel.
> > > > 
> > > > OK, I guess I see that part. But who is going to use movablecore along
> > > > with KASLR enabled? I mean do we really have to support those two
> > > > obscure command line parameters for KASLR?
> > > 
> > > Not very sure whether we have to support both of those to work with
> > > KASLR. Maybe it's time to make clear of it now.
> > 
> > Yes, I would really like to deprecate this. It is an ugly piece of code
> > and it's far from easily maintainable as well.
> > 
> > > For 'kernelcore=mirror', we have solved the conflict to make it work well
> > > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
> > > patches to fix it. As for 'kernelcore=' and 'movablecore=', 
> > > 
> > > 1) solve the conflict between them with KASLR in
> > >find_zone_movable_pfns_for_nodes();
> > > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
> > > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
> > > 4) add note in doc to notice people to not add them at the same time;
> > 
> > I would simply warn that those kernel parameters are not supported
> > anymore. If somebody shows up with a valid usecase we can reconsider.
> 
> OK, got it. The use case I can think of is that people want to check 
> hotplug on system w/o hotplug ACPI info.

Let's see whether there is really somebody like that and complain.
 
> I am fine with warning people they are not supported. Should I post a
> patch to address this, or you will do it? Both is fine to me.

I will happily ack it if you create a patch.

-- 
Michal Hocko
SUSE Labs


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-16 Thread Baoquan He
On 07/16/18 at 05:24pm, Michal Hocko wrote:
> On Mon 16-07-18 21:02:02, Baoquan He wrote:
> > On 07/16/18 at 01:38pm, Michal Hocko wrote:
> > > On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > > > Hi Michal,
> > > > 
> > > > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> > > [...]
> > > > > I am not able to find the beginning of the email thread right now. 
> > > > > Could
> > > > > you summarize what is the actual problem please?
> > > > 
> > > > The bug is found on x86 now. 
> > > > 
> > > > When added "kernelcore=" or "movablecore=" into kernel command line,
> > > > kernel memory is spread evenly among nodes. However, this is right when
> > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > > > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> > > >  
> > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> > > > we specify "kernelcore=50%", means each node will take 10G for
> > > > kernelcore, 10G for movable area. But this doesn't take kernel position
> > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > > > movable, in fact there's only 5G available for movable, just after
> > > > kernel.
> > > 
> > > OK, I guess I see that part. But who is going to use movablecore along
> > > with KASLR enabled? I mean do we really have to support those two
> > > obscure command line parameters for KASLR?
> > 
> > Not very sure whether we have to support both of those to work with
> > KASLR. Maybe it's time to make clear of it now.
> 
> Yes, I would really like to deprecate this. It is an ugly piece of code
> and it's far from easily maintainable as well.
> 
> > For 'kernelcore=mirror', we have solved the conflict to make it work well
> > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
> > patches to fix it. As for 'kernelcore=' and 'movablecore=', 
> > 
> > 1) solve the conflict between them with KASLR in
> >find_zone_movable_pfns_for_nodes();
> > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
> > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
> > 4) add note in doc to notice people to not add them at the same time;
> 
> I would simply warn that those kernel parameters are not supported
> anymore. If somebody shows up with a valid usecase we can reconsider.

OK, got it. The use case I can think of is that people want to check 
hotplug on system w/o hotplug ACPI info.

I am fine with warning people they are not supported. Should I post a
patch to address this, or you will do it? Both is fine to me.

> 
> > 2) and 3) may need be fixed in arch/x86 code. As long as come to an
> > agreement, any one is fine to me.
> > > 
> > > In fact I would be much more concerned about memory hotplug and
> > > pre-defined movable nodes. Does the current KASLR code work in that
> > > case?
> > 
> > As said above, kernelcore=mirror works well with KASLR now. Making
> > 'movable_node' work with KASLR is in progress.
> 
> OK, thanks for the info.

You are welcome.

Thanks
Baoquan


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-16 Thread Baoquan He
On 07/16/18 at 05:24pm, Michal Hocko wrote:
> On Mon 16-07-18 21:02:02, Baoquan He wrote:
> > On 07/16/18 at 01:38pm, Michal Hocko wrote:
> > > On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > > > Hi Michal,
> > > > 
> > > > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> > > [...]
> > > > > I am not able to find the beginning of the email thread right now. 
> > > > > Could
> > > > > you summarize what is the actual problem please?
> > > > 
> > > > The bug is found on x86 now. 
> > > > 
> > > > When added "kernelcore=" or "movablecore=" into kernel command line,
> > > > kernel memory is spread evenly among nodes. However, this is right when
> > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > > > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> > > >  
> > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> > > > we specify "kernelcore=50%", means each node will take 10G for
> > > > kernelcore, 10G for movable area. But this doesn't take kernel position
> > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > > > movable, in fact there's only 5G available for movable, just after
> > > > kernel.
> > > 
> > > OK, I guess I see that part. But who is going to use movablecore along
> > > with KASLR enabled? I mean do we really have to support those two
> > > obscure command line parameters for KASLR?
> > 
> > Not very sure whether we have to support both of those to work with
> > KASLR. Maybe it's time to make clear of it now.
> 
> Yes, I would really like to deprecate this. It is an ugly piece of code
> and it's far from easily maintainable as well.
> 
> > For 'kernelcore=mirror', we have solved the conflict to make it work well
> > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
> > patches to fix it. As for 'kernelcore=' and 'movablecore=', 
> > 
> > 1) solve the conflict between them with KASLR in
> >find_zone_movable_pfns_for_nodes();
> > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
> > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
> > 4) add note in doc to notice people to not add them at the same time;
> 
> I would simply warn that those kernel parameters are not supported
> anymore. If somebody shows up with a valid usecase we can reconsider.

OK, got it. The use case I can think of is that people want to check 
hotplug on system w/o hotplug ACPI info.

I am fine with warning people they are not supported. Should I post a
patch to address this, or you will do it? Both is fine to me.

> 
> > 2) and 3) may need be fixed in arch/x86 code. As long as come to an
> > agreement, any one is fine to me.
> > > 
> > > In fact I would be much more concerned about memory hotplug and
> > > pre-defined movable nodes. Does the current KASLR code work in that
> > > case?
> > 
> > As said above, kernelcore=mirror works well with KASLR now. Making
> > 'movable_node' work with KASLR is in progress.
> 
> OK, thanks for the info.

You are welcome.

Thanks
Baoquan


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-16 Thread Michal Hocko
On Mon 16-07-18 21:02:02, Baoquan He wrote:
> On 07/16/18 at 01:38pm, Michal Hocko wrote:
> > On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > > Hi Michal,
> > > 
> > > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> > [...]
> > > > I am not able to find the beginning of the email thread right now. Could
> > > > you summarize what is the actual problem please?
> > > 
> > > The bug is found on x86 now. 
> > > 
> > > When added "kernelcore=" or "movablecore=" into kernel command line,
> > > kernel memory is spread evenly among nodes. However, this is right when
> > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> > >  
> > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> > > we specify "kernelcore=50%", means each node will take 10G for
> > > kernelcore, 10G for movable area. But this doesn't take kernel position
> > > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > > movable, in fact there's only 5G available for movable, just after
> > > kernel.
> > 
> > OK, I guess I see that part. But who is going to use movablecore along
> > with KASLR enabled? I mean do we really have to support those two
> > obscure command line parameters for KASLR?
> 
> Not very sure whether we have to support both of those to work with
> KASLR. Maybe it's time to make clear of it now.

Yes, I would really like to deprecate this. It is an ugly piece of code
and it's far from easily maintainable as well.

> For 'kernelcore=mirror', we have solved the conflict to make it work well
> with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
> patches to fix it. As for 'kernelcore=' and 'movablecore=', 
> 
> 1) solve the conflict between them with KASLR in
>find_zone_movable_pfns_for_nodes();
> 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
> 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
> 4) add note in doc to notice people to not add them at the same time;

I would simply warn that those kernel parameters are not supported
anymore. If somebody shows up with a valid usecase we can reconsider.

> 2) and 3) may need be fixed in arch/x86 code. As long as come to an
> agreement, any one is fine to me.
> > 
> > In fact I would be much more concerned about memory hotplug and
> > pre-defined movable nodes. Does the current KASLR code work in that
> > case?
> 
> As said above, kernelcore=mirror works well with KASLR now. Making
> 'movable_node' work with KASLR is in progress.

OK, thanks for the info.

-- 
Michal Hocko
SUSE Labs


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-16 Thread Michal Hocko
On Mon 16-07-18 21:02:02, Baoquan He wrote:
> On 07/16/18 at 01:38pm, Michal Hocko wrote:
> > On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > > Hi Michal,
> > > 
> > > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> > [...]
> > > > I am not able to find the beginning of the email thread right now. Could
> > > > you summarize what is the actual problem please?
> > > 
> > > The bug is found on x86 now. 
> > > 
> > > When added "kernelcore=" or "movablecore=" into kernel command line,
> > > kernel memory is spread evenly among nodes. However, this is right when
> > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> > >  
> > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> > > we specify "kernelcore=50%", means each node will take 10G for
> > > kernelcore, 10G for movable area. But this doesn't take kernel position
> > > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > > movable, in fact there's only 5G available for movable, just after
> > > kernel.
> > 
> > OK, I guess I see that part. But who is going to use movablecore along
> > with KASLR enabled? I mean do we really have to support those two
> > obscure command line parameters for KASLR?
> 
> Not very sure whether we have to support both of those to work with
> KASLR. Maybe it's time to make clear of it now.

Yes, I would really like to deprecate this. It is an ugly piece of code
and it's far from easily maintainable as well.

> For 'kernelcore=mirror', we have solved the conflict to make it work well
> with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
> patches to fix it. As for 'kernelcore=' and 'movablecore=', 
> 
> 1) solve the conflict between them with KASLR in
>find_zone_movable_pfns_for_nodes();
> 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
> 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
> 4) add note in doc to notice people to not add them at the same time;

I would simply warn that those kernel parameters are not supported
anymore. If somebody shows up with a valid usecase we can reconsider.

> 2) and 3) may need be fixed in arch/x86 code. As long as come to an
> agreement, any one is fine to me.
> > 
> > In fact I would be much more concerned about memory hotplug and
> > pre-defined movable nodes. Does the current KASLR code work in that
> > case?
> 
> As said above, kernelcore=mirror works well with KASLR now. Making
> 'movable_node' work with KASLR is in progress.

OK, thanks for the info.

-- 
Michal Hocko
SUSE Labs


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-16 Thread Baoquan He
On 07/16/18 at 01:38pm, Michal Hocko wrote:
> On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > Hi Michal,
> > 
> > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> [...]
> > > I am not able to find the beginning of the email thread right now. Could
> > > you summarize what is the actual problem please?
> > 
> > The bug is found on x86 now. 
> > 
> > When added "kernelcore=" or "movablecore=" into kernel command line,
> > kernel memory is spread evenly among nodes. However, this is right when
> > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> >  
> > Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> > we specify "kernelcore=50%", means each node will take 10G for
> > kernelcore, 10G for movable area. But this doesn't take kernel position
> > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > movable, in fact there's only 5G available for movable, just after
> > kernel.
> 
> OK, I guess I see that part. But who is going to use movablecore along
> with KASLR enabled? I mean do we really have to support those two
> obscure command line parameters for KASLR?

Not very sure whether we have to support both of those to work with
KASLR. Maybe it's time to make clear of it now.

For 'kernelcore=mirror', we have solved the conflict to make it work well
with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
patches to fix it. As for 'kernelcore=' and 'movablecore=', 

1) solve the conflict between them with KASLR in
   find_zone_movable_pfns_for_nodes();
2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
4) add note in doc to notice people to not add them at the same time;

2) and 3) may need be fixed in arch/x86 code. As long as come to an
agreement, any one is fine to me.
> 
> In fact I would be much more concerned about memory hotplug and
> pre-defined movable nodes. Does the current KASLR code work in that
> case?

As said above, kernelcore=mirror works well with KASLR now. Making
'movable_node' work with KASLR is in progress.

Thanks
Baoquan


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-16 Thread Baoquan He
On 07/16/18 at 01:38pm, Michal Hocko wrote:
> On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > Hi Michal,
> > 
> > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> [...]
> > > I am not able to find the beginning of the email thread right now. Could
> > > you summarize what is the actual problem please?
> > 
> > The bug is found on x86 now. 
> > 
> > When added "kernelcore=" or "movablecore=" into kernel command line,
> > kernel memory is spread evenly among nodes. However, this is right when
> > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> >  
> > Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> > we specify "kernelcore=50%", means each node will take 10G for
> > kernelcore, 10G for movable area. But this doesn't take kernel position
> > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > movable, in fact there's only 5G available for movable, just after
> > kernel.
> 
> OK, I guess I see that part. But who is going to use movablecore along
> with KASLR enabled? I mean do we really have to support those two
> obscure command line parameters for KASLR?

Not very sure whether we have to support both of those to work with
KASLR. Maybe it's time to make clear of it now.

For 'kernelcore=mirror', we have solved the conflict to make it work well
with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
patches to fix it. As for 'kernelcore=' and 'movablecore=', 

1) solve the conflict between them with KASLR in
   find_zone_movable_pfns_for_nodes();
2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
4) add note in doc to notice people to not add them at the same time;

2) and 3) may need be fixed in arch/x86 code. As long as come to an
agreement, any one is fine to me.
> 
> In fact I would be much more concerned about memory hotplug and
> pre-defined movable nodes. Does the current KASLR code work in that
> case?

As said above, kernelcore=mirror works well with KASLR now. Making
'movable_node' work with KASLR is in progress.

Thanks
Baoquan


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-16 Thread Michal Hocko
On Fri 13-07-18 07:52:40, Baoquan He wrote:
> Hi Michal,
> 
> On 07/12/18 at 02:32pm, Michal Hocko wrote:
[...]
> > I am not able to find the beginning of the email thread right now. Could
> > you summarize what is the actual problem please?
> 
> The bug is found on x86 now. 
> 
> When added "kernelcore=" or "movablecore=" into kernel command line,
> kernel memory is spread evenly among nodes. However, this is right when
> KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> If KASLR enabled, it could be put any place from 16M to 64T randomly.
>  
> Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> we specify "kernelcore=50%", means each node will take 10G for
> kernelcore, 10G for movable area. But this doesn't take kernel position
> into consideration. E.g if kernel is put at 15G of 2nd node, namely
> node1. Then we think on node1 there's 10G for kernelcore, 10G for
> movable, in fact there's only 5G available for movable, just after
> kernel.

OK, I guess I see that part. But who is going to use movablecore along
with KASLR enabled? I mean do we really have to support those two
obscure command line parameters for KASLR?

In fact I would be much more concerned about memory hotplug and
pre-defined movable nodes. Does the current KASLR code work in that
case?
-- 
Michal Hocko
SUSE Labs


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-16 Thread Michal Hocko
On Fri 13-07-18 07:52:40, Baoquan He wrote:
> Hi Michal,
> 
> On 07/12/18 at 02:32pm, Michal Hocko wrote:
[...]
> > I am not able to find the beginning of the email thread right now. Could
> > you summarize what is the actual problem please?
> 
> The bug is found on x86 now. 
> 
> When added "kernelcore=" or "movablecore=" into kernel command line,
> kernel memory is spread evenly among nodes. However, this is right when
> KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> If KASLR enabled, it could be put any place from 16M to 64T randomly.
>  
> Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> we specify "kernelcore=50%", means each node will take 10G for
> kernelcore, 10G for movable area. But this doesn't take kernel position
> into consideration. E.g if kernel is put at 15G of 2nd node, namely
> node1. Then we think on node1 there's 10G for kernelcore, 10G for
> movable, in fact there's only 5G available for movable, just after
> kernel.

OK, I guess I see that part. But who is going to use movablecore along
with KASLR enabled? I mean do we really have to support those two
obscure command line parameters for KASLR?

In fact I would be much more concerned about memory hotplug and
pre-defined movable nodes. Does the current KASLR code work in that
case?
-- 
Michal Hocko
SUSE Labs


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Chao Fan
On Fri, Jul 13, 2018 at 07:52:40AM +0800, Baoquan He wrote:
>Hi Michal,
>
>On 07/12/18 at 02:32pm, Michal Hocko wrote:
>> On Thu 12-07-18 14:01:15, Chao Fan wrote:
>> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
>> > >Hi Baoquan,
>> > >
>> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
>> > >> Please try this v3 patch:
>> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
>> > >> From: Baoquan He 
>> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
>> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
>> > >> 
>> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
>> > >> PFN movable zone begins in each node, kernel text position is not
>> > >> considered. KASLR may put kernel after which movable zone begins.
>> > >> 
>> > >> Fix it by finding movable zone after kernel text on that node.
>> > >> 
>> > >> Signed-off-by: Baoquan He 
>> > >
>> > >
>> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or
>> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the
>> > >last node, or more.
>> > 
>> > I think it may not fail.
>> > There is a 'restart' to do another pass.
>> > 
>> > >
>> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying
>> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix
>> > >this in the KASLR side.
>> > >
>> > 
>> > I think it's better to fix here, but not KASLR side.
>> > Cause much more code will be change if doing it in KASLR side.
>> > Since we didn't parse 'kernelcore' in compressed code, and you can see
>> > the distribution of ZONE_MOVABLE need so much code, so we do not need
>> > to do so much job in KASLR side. But here, several lines will be OK.
>> 
>> I am not able to find the beginning of the email thread right now. Could
>> you summarize what is the actual problem please?
>
>The bug is found on x86 now. 
>
>When added "kernelcore=" or "movablecore=" into kernel command line,
>kernel memory is spread evenly among nodes. However, this is right when
>KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
>If KASLR enabled, it could be put any place from 16M to 64T randomly.
> 
>Consider a scenario, we have 10 nodes, and each node has 20G memory, and
>we specify "kernelcore=50%", means each node will take 10G for
>kernelcore, 10G for movable area. But this doesn't take kernel position
>into consideration. E.g if kernel is put at 15G of 2nd node, namely
>node1. Then we think on node1 there's 10G for kernelcore, 10G for
>movable, in fact there's only 5G available for movable, just after
>kernel.
>
>I made a v4 patch which possibly can fix it.
>
>
>From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
>From: Baoquan He 
>Date: Fri, 13 Jul 2018 07:49:29 +0800
>Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text
>
>In find_zone_movable_pfns_for_nodes(), when try to find the starting
>PFN movable zone begins at in each node, kernel text position is not
>considered. KASLR may put kernel after which movable zone begins.
>
>Fix it by finding movable zone after kernel text on that node.
>
>Signed-off-by: Baoquan He 

You can post it as alone PATCH, then I will test it next week.

Thanks,
Chao Fan

>---
> mm/page_alloc.c | 15 +--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 1521100f1e63..5bc1a47dafda 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -6547,7 +6547,7 @@ static unsigned long __init 
>early_calculate_totalpages(void)
> static void __init find_zone_movable_pfns_for_nodes(void)
> {
>   int i, nid;
>-  unsigned long usable_startpfn;
>+  unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
>   unsigned long kernelcore_node, kernelcore_remaining;
>   /* save the state before borrow the nodemask */
>   nodemask_t saved_node_state = node_states[N_MEMORY];
>@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>   if (!required_kernelcore || required_kernelcore >= totalpages)
>   goto out;
> 
>+  kernel_endpfn = PFN_UP(__pa_symbol(_end));
>   /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
>-  usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
>+  arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
> 
> restart:
>   /* Spread kernelcore memory as evenly as possible throughout nodes */
>@@ -6659,6 +6660,16 @@ static void __init 
>find_zone_movable_pfns_for_nodes(void)
>   unsigned long start_pfn, end_pfn;
> 
>   /*
>+   * KASLR may put kernel near tail of node memory,
>+   * start after kernel on that node to find PFN
>+   * at which zone begins.
>+   */
>+  if (pfn_to_nid(kernel_endpfn) == nid)
>+  usable_startpfn = max(arch_startpfn, kernel_endpfn);

Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Chao Fan
On Fri, Jul 13, 2018 at 07:52:40AM +0800, Baoquan He wrote:
>Hi Michal,
>
>On 07/12/18 at 02:32pm, Michal Hocko wrote:
>> On Thu 12-07-18 14:01:15, Chao Fan wrote:
>> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
>> > >Hi Baoquan,
>> > >
>> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
>> > >> Please try this v3 patch:
>> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
>> > >> From: Baoquan He 
>> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
>> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
>> > >> 
>> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
>> > >> PFN movable zone begins in each node, kernel text position is not
>> > >> considered. KASLR may put kernel after which movable zone begins.
>> > >> 
>> > >> Fix it by finding movable zone after kernel text on that node.
>> > >> 
>> > >> Signed-off-by: Baoquan He 
>> > >
>> > >
>> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or
>> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the
>> > >last node, or more.
>> > 
>> > I think it may not fail.
>> > There is a 'restart' to do another pass.
>> > 
>> > >
>> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying
>> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix
>> > >this in the KASLR side.
>> > >
>> > 
>> > I think it's better to fix here, but not KASLR side.
>> > Cause much more code will be change if doing it in KASLR side.
>> > Since we didn't parse 'kernelcore' in compressed code, and you can see
>> > the distribution of ZONE_MOVABLE need so much code, so we do not need
>> > to do so much job in KASLR side. But here, several lines will be OK.
>> 
>> I am not able to find the beginning of the email thread right now. Could
>> you summarize what is the actual problem please?
>
>The bug is found on x86 now. 
>
>When added "kernelcore=" or "movablecore=" into kernel command line,
>kernel memory is spread evenly among nodes. However, this is right when
>KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
>If KASLR enabled, it could be put any place from 16M to 64T randomly.
> 
>Consider a scenario, we have 10 nodes, and each node has 20G memory, and
>we specify "kernelcore=50%", means each node will take 10G for
>kernelcore, 10G for movable area. But this doesn't take kernel position
>into consideration. E.g if kernel is put at 15G of 2nd node, namely
>node1. Then we think on node1 there's 10G for kernelcore, 10G for
>movable, in fact there's only 5G available for movable, just after
>kernel.
>
>I made a v4 patch which possibly can fix it.
>
>
>From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
>From: Baoquan He 
>Date: Fri, 13 Jul 2018 07:49:29 +0800
>Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text
>
>In find_zone_movable_pfns_for_nodes(), when try to find the starting
>PFN movable zone begins at in each node, kernel text position is not
>considered. KASLR may put kernel after which movable zone begins.
>
>Fix it by finding movable zone after kernel text on that node.
>
>Signed-off-by: Baoquan He 

You can post it as alone PATCH, then I will test it next week.

Thanks,
Chao Fan

>---
> mm/page_alloc.c | 15 +--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 1521100f1e63..5bc1a47dafda 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -6547,7 +6547,7 @@ static unsigned long __init 
>early_calculate_totalpages(void)
> static void __init find_zone_movable_pfns_for_nodes(void)
> {
>   int i, nid;
>-  unsigned long usable_startpfn;
>+  unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
>   unsigned long kernelcore_node, kernelcore_remaining;
>   /* save the state before borrow the nodemask */
>   nodemask_t saved_node_state = node_states[N_MEMORY];
>@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>   if (!required_kernelcore || required_kernelcore >= totalpages)
>   goto out;
> 
>+  kernel_endpfn = PFN_UP(__pa_symbol(_end));
>   /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
>-  usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
>+  arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
> 
> restart:
>   /* Spread kernelcore memory as evenly as possible throughout nodes */
>@@ -6659,6 +6660,16 @@ static void __init 
>find_zone_movable_pfns_for_nodes(void)
>   unsigned long start_pfn, end_pfn;
> 
>   /*
>+   * KASLR may put kernel near tail of node memory,
>+   * start after kernel on that node to find PFN
>+   * at which zone begins.
>+   */
>+  if (pfn_to_nid(kernel_endpfn) == nid)
>+  usable_startpfn = max(arch_startpfn, kernel_endpfn);

Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Baoquan He
Hi Michal,

On 07/12/18 at 02:32pm, Michal Hocko wrote:
> On Thu 12-07-18 14:01:15, Chao Fan wrote:
> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
> > >Hi Baoquan,
> > >
> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
> > >> Please try this v3 patch:
> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
> > >> From: Baoquan He 
> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
> > >> 
> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
> > >> PFN movable zone begins in each node, kernel text position is not
> > >> considered. KASLR may put kernel after which movable zone begins.
> > >> 
> > >> Fix it by finding movable zone after kernel text on that node.
> > >> 
> > >> Signed-off-by: Baoquan He 
> > >
> > >
> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or
> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the
> > >last node, or more.
> > 
> > I think it may not fail.
> > There is a 'restart' to do another pass.
> > 
> > >
> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying
> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix
> > >this in the KASLR side.
> > >
> > 
> > I think it's better to fix here, but not KASLR side.
> > Cause much more code will be change if doing it in KASLR side.
> > Since we didn't parse 'kernelcore' in compressed code, and you can see
> > the distribution of ZONE_MOVABLE need so much code, so we do not need
> > to do so much job in KASLR side. But here, several lines will be OK.
> 
> I am not able to find the beginning of the email thread right now. Could
> you summarize what is the actual problem please?

The bug is found on x86 now. 

When added "kernelcore=" or "movablecore=" into kernel command line,
kernel memory is spread evenly among nodes. However, this is right when
KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
If KASLR enabled, it could be put any place from 16M to 64T randomly.
 
Consider a scenario, we have 10 nodes, and each node has 20G memory, and
we specify "kernelcore=50%", means each node will take 10G for
kernelcore, 10G for movable area. But this doesn't take kernel position
into consideration. E.g if kernel is put at 15G of 2nd node, namely
node1. Then we think on node1 there's 10G for kernelcore, 10G for
movable, in fact there's only 5G available for movable, just after
kernel.

I made a v4 patch which possibly can fix it.


>From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
From: Baoquan He 
Date: Fri, 13 Jul 2018 07:49:29 +0800
Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text

In find_zone_movable_pfns_for_nodes(), when try to find the starting
PFN movable zone begins at in each node, kernel text position is not
considered. KASLR may put kernel after which movable zone begins.

Fix it by finding movable zone after kernel text on that node.

Signed-off-by: Baoquan He 
---
 mm/page_alloc.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..5bc1a47dafda 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6547,7 +6547,7 @@ static unsigned long __init 
early_calculate_totalpages(void)
 static void __init find_zone_movable_pfns_for_nodes(void)
 {
int i, nid;
-   unsigned long usable_startpfn;
+   unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
if (!required_kernelcore || required_kernelcore >= totalpages)
goto out;
 
+   kernel_endpfn = PFN_UP(__pa_symbol(_end));
/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-   usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
+   arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
/* Spread kernelcore memory as evenly as possible throughout nodes */
@@ -6659,6 +6660,16 @@ static void __init find_zone_movable_pfns_for_nodes(void)
unsigned long start_pfn, end_pfn;
 
/*
+* KASLR may put kernel near tail of node memory,
+* start after kernel on that node to find PFN
+* at which zone begins.
+*/
+   if (pfn_to_nid(kernel_endpfn) == nid)
+   usable_startpfn = max(arch_startpfn, kernel_endpfn);
+   else
+   usable_startpfn = arch_startpfn;
+
+   /*
 * Recalculate kernelcore_node if the division per node
 * now exceeds what is necessary to satisfy the 

Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Baoquan He
Hi Michal,

On 07/12/18 at 02:32pm, Michal Hocko wrote:
> On Thu 12-07-18 14:01:15, Chao Fan wrote:
> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
> > >Hi Baoquan,
> > >
> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
> > >> Please try this v3 patch:
> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
> > >> From: Baoquan He 
> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
> > >> 
> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
> > >> PFN movable zone begins in each node, kernel text position is not
> > >> considered. KASLR may put kernel after which movable zone begins.
> > >> 
> > >> Fix it by finding movable zone after kernel text on that node.
> > >> 
> > >> Signed-off-by: Baoquan He 
> > >
> > >
> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or
> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the
> > >last node, or more.
> > 
> > I think it may not fail.
> > There is a 'restart' to do another pass.
> > 
> > >
> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying
> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix
> > >this in the KASLR side.
> > >
> > 
> > I think it's better to fix here, but not KASLR side.
> > Cause much more code will be change if doing it in KASLR side.
> > Since we didn't parse 'kernelcore' in compressed code, and you can see
> > the distribution of ZONE_MOVABLE need so much code, so we do not need
> > to do so much job in KASLR side. But here, several lines will be OK.
> 
> I am not able to find the beginning of the email thread right now. Could
> you summarize what is the actual problem please?

The bug is found on x86 now. 

When added "kernelcore=" or "movablecore=" into kernel command line,
kernel memory is spread evenly among nodes. However, this is right when
KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
If KASLR enabled, it could be put any place from 16M to 64T randomly.
 
Consider a scenario, we have 10 nodes, and each node has 20G memory, and
we specify "kernelcore=50%", means each node will take 10G for
kernelcore, 10G for movable area. But this doesn't take kernel position
into consideration. E.g if kernel is put at 15G of 2nd node, namely
node1. Then we think on node1 there's 10G for kernelcore, 10G for
movable, in fact there's only 5G available for movable, just after
kernel.

I made a v4 patch which possibly can fix it.


>From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
From: Baoquan He 
Date: Fri, 13 Jul 2018 07:49:29 +0800
Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text

In find_zone_movable_pfns_for_nodes(), when try to find the starting
PFN movable zone begins at in each node, kernel text position is not
considered. KASLR may put kernel after which movable zone begins.

Fix it by finding movable zone after kernel text on that node.

Signed-off-by: Baoquan He 
---
 mm/page_alloc.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..5bc1a47dafda 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6547,7 +6547,7 @@ static unsigned long __init 
early_calculate_totalpages(void)
 static void __init find_zone_movable_pfns_for_nodes(void)
 {
int i, nid;
-   unsigned long usable_startpfn;
+   unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
if (!required_kernelcore || required_kernelcore >= totalpages)
goto out;
 
+   kernel_endpfn = PFN_UP(__pa_symbol(_end));
/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-   usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
+   arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
/* Spread kernelcore memory as evenly as possible throughout nodes */
@@ -6659,6 +6660,16 @@ static void __init find_zone_movable_pfns_for_nodes(void)
unsigned long start_pfn, end_pfn;
 
/*
+* KASLR may put kernel near tail of node memory,
+* start after kernel on that node to find PFN
+* at which zone begins.
+*/
+   if (pfn_to_nid(kernel_endpfn) == nid)
+   usable_startpfn = max(arch_startpfn, kernel_endpfn);
+   else
+   usable_startpfn = arch_startpfn;
+
+   /*
 * Recalculate kernelcore_node if the division per node
 * now exceeds what is necessary to satisfy the 

Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Michal Hocko
On Thu 12-07-18 14:01:15, Chao Fan wrote:
> On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
> >Hi Baoquan,
> >
> >At 07/11/2018 08:40 PM, Baoquan He wrote:
> >> Please try this v3 patch:
> >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
> >> From: Baoquan He 
> >> Date: Wed, 11 Jul 2018 20:31:51 +0800
> >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
> >> 
> >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
> >> PFN movable zone begins in each node, kernel text position is not
> >> considered. KASLR may put kernel after which movable zone begins.
> >> 
> >> Fix it by finding movable zone after kernel text on that node.
> >> 
> >> Signed-off-by: Baoquan He 
> >
> >
> >You fix this in the _zone_init side_. This may make the 'kernelcore=' or
> >'movablecore=' failed if the KASLR puts the kernel back the tail of the
> >last node, or more.
> 
> I think it may not fail.
> There is a 'restart' to do another pass.
> 
> >
> >Due to we have fix the mirror memory in KASLR side, and Chao is trying
> >to fix the 'movable_node' in KASLR side. Have you had a chance to fix
> >this in the KASLR side.
> >
> 
> I think it's better to fix here, but not KASLR side.
> Cause much more code will be change if doing it in KASLR side.
> Since we didn't parse 'kernelcore' in compressed code, and you can see
> the distribution of ZONE_MOVABLE need so much code, so we do not need
> to do so much job in KASLR side. But here, several lines will be OK.

I am not able to find the beginning of the email thread right now. Could
you summarize what is the actual problem please?
-- 
Michal Hocko
SUSE Labs


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Michal Hocko
On Thu 12-07-18 14:01:15, Chao Fan wrote:
> On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
> >Hi Baoquan,
> >
> >At 07/11/2018 08:40 PM, Baoquan He wrote:
> >> Please try this v3 patch:
> >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
> >> From: Baoquan He 
> >> Date: Wed, 11 Jul 2018 20:31:51 +0800
> >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
> >> 
> >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
> >> PFN movable zone begins in each node, kernel text position is not
> >> considered. KASLR may put kernel after which movable zone begins.
> >> 
> >> Fix it by finding movable zone after kernel text on that node.
> >> 
> >> Signed-off-by: Baoquan He 
> >
> >
> >You fix this in the _zone_init side_. This may make the 'kernelcore=' or
> >'movablecore=' failed if the KASLR puts the kernel back the tail of the
> >last node, or more.
> 
> I think it may not fail.
> There is a 'restart' to do another pass.
> 
> >
> >Due to we have fix the mirror memory in KASLR side, and Chao is trying
> >to fix the 'movable_node' in KASLR side. Have you had a chance to fix
> >this in the KASLR side.
> >
> 
> I think it's better to fix here, but not KASLR side.
> Cause much more code will be change if doing it in KASLR side.
> Since we didn't parse 'kernelcore' in compressed code, and you can see
> the distribution of ZONE_MOVABLE need so much code, so we do not need
> to do so much job in KASLR side. But here, several lines will be OK.

I am not able to find the beginning of the email thread right now. Could
you summarize what is the actual problem please?
-- 
Michal Hocko
SUSE Labs


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Baoquan He
On 07/12/18 at 09:19am, Chao Fan wrote:
> On Wed, Jul 11, 2018 at 08:40:08PM +0800, Baoquan He wrote:
> >Please try this v3 patch:
> >
> >From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
> >From: Baoquan He 
> >Date: Wed, 11 Jul 2018 20:31:51 +0800
> >Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
> >
> >In find_zone_movable_pfns_for_nodes(), when try to find the starting
> >PFN movable zone begins in each node, kernel text position is not
> >considered. KASLR may put kernel after which movable zone begins.
> >
> >Fix it by finding movable zone after kernel text on that node.
> >
> >Signed-off-by: Baoquan He 
> >---
> > mm/page_alloc.c | 20 +++-
> > 1 file changed, 15 insertions(+), 5 deletions(-)
> >
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 1521100..390eb35 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -6547,7 +6547,7 @@ static unsigned long __init 
> >early_calculate_totalpages(void)
> > static void __init find_zone_movable_pfns_for_nodes(void)
> > {
> > int i, nid;
> >-unsigned long usable_startpfn;
> >+unsigned long usable_startpfn, real_startpfn;
> > unsigned long kernelcore_node, kernelcore_remaining;
> > /* save the state before borrow the nodemask */
> > nodemask_t saved_node_state = node_states[N_MEMORY];
> >@@ -6681,10 +6681,20 @@ static void __init 
> >find_zone_movable_pfns_for_nodes(void)
> > if (start_pfn >= end_pfn)
> > continue;
> 
> Hi Baoquan,
> 
> Thanks for your quick reply and PATCH.
> I think it can work well after reviewing the code. But I think the new
> variable 'real_startpfn' is unnecessary. How about this:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6d00f746c2fd..0fc9c4283947 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6492,6 +6492,10 @@ static void __init 
> find_zone_movable_pfns_for_nodes(void)
> if (start_pfn >= end_pfn)
> continue;
> 
> +   if (pfn_to_nid(PFN_UP(_etext)) == i)
> +   usable_startpfn = max(usable_startpfn,
> + PFN_UP(_etext));
> +
> /* Account for what is only usable for kernelcore */
> if (start_pfn < usable_startpfn) {
> unsigned long kernel_pages;
> 
> I think the logic of these two method are the same, and this method
> change less code. If I am wrong, please let me know.

Might be not. Need consider usable_startpfn and kernel_pfn are in the
same node, or in different node, two cases.

I will correct code after fix the compiling error.
> 
> 
> > 
> >+/*
> >+ * KASLR may put kernel near tail of node memory,
> >+ * start after kernel on that node to find PFN
> >+ * which zone begins.
> >+ */
> >+if (pfn_to_nid(PFN_UP(_etext)) == i)
> >+real_startpfn = max(usable_startpfn,
> >+PFN_UP(_etext))
> >+else
> >+real_startpfn = usable_startpfn;
> > /* Account for what is only usable for kernelcore */
> >-if (start_pfn < usable_startpfn) {
> >+if (start_pfn < real_startpfn) {
> > unsigned long kernel_pages;
> >-kernel_pages = min(end_pfn, usable_startpfn)
> >+kernel_pages = min(end_pfn, real_startpfn)
> > - start_pfn;
> > 
> > kernelcore_remaining -= min(kernel_pages,
> >@@ -6693,7 +6703,7 @@ static void __init 
> >find_zone_movable_pfns_for_nodes(void)
> > required_kernelcore);
> > 
> > /* Continue if range is now fully accounted */
> >-if (end_pfn <= usable_startpfn) {
> >+if (end_pfn <= real_startpfn) {
> > 
> > /*
> >  * Push zone_movable_pfn to the end so
> >@@ -6704,7 +6714,7 @@ static void __init 
> >find_zone_movable_pfns_for_nodes(void)
> > zone_movable_pfn[nid] = end_pfn;
> > continue;
> > }
> >-start_pfn = usable_startpfn;
> >+start_pfn = real_startpfn;
> > }
> > 
> > /*
> >-- 
> >2.1.0
> >
> >
> >
> 
> 


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Baoquan He
On 07/12/18 at 09:19am, Chao Fan wrote:
> On Wed, Jul 11, 2018 at 08:40:08PM +0800, Baoquan He wrote:
> >Please try this v3 patch:
> >
> >From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
> >From: Baoquan He 
> >Date: Wed, 11 Jul 2018 20:31:51 +0800
> >Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
> >
> >In find_zone_movable_pfns_for_nodes(), when try to find the starting
> >PFN movable zone begins in each node, kernel text position is not
> >considered. KASLR may put kernel after which movable zone begins.
> >
> >Fix it by finding movable zone after kernel text on that node.
> >
> >Signed-off-by: Baoquan He 
> >---
> > mm/page_alloc.c | 20 +++-
> > 1 file changed, 15 insertions(+), 5 deletions(-)
> >
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 1521100..390eb35 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -6547,7 +6547,7 @@ static unsigned long __init 
> >early_calculate_totalpages(void)
> > static void __init find_zone_movable_pfns_for_nodes(void)
> > {
> > int i, nid;
> >-unsigned long usable_startpfn;
> >+unsigned long usable_startpfn, real_startpfn;
> > unsigned long kernelcore_node, kernelcore_remaining;
> > /* save the state before borrow the nodemask */
> > nodemask_t saved_node_state = node_states[N_MEMORY];
> >@@ -6681,10 +6681,20 @@ static void __init 
> >find_zone_movable_pfns_for_nodes(void)
> > if (start_pfn >= end_pfn)
> > continue;
> 
> Hi Baoquan,
> 
> Thanks for your quick reply and PATCH.
> I think it can work well after reviewing the code. But I think the new
> variable 'real_startpfn' is unnecessary. How about this:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6d00f746c2fd..0fc9c4283947 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6492,6 +6492,10 @@ static void __init 
> find_zone_movable_pfns_for_nodes(void)
> if (start_pfn >= end_pfn)
> continue;
> 
> +   if (pfn_to_nid(PFN_UP(_etext)) == i)
> +   usable_startpfn = max(usable_startpfn,
> + PFN_UP(_etext));
> +
> /* Account for what is only usable for kernelcore */
> if (start_pfn < usable_startpfn) {
> unsigned long kernel_pages;
> 
> I think the logic of these two method are the same, and this method
> change less code. If I am wrong, please let me know.

Might be not. Need consider usable_startpfn and kernel_pfn are in the
same node, or in different node, two cases.

I will correct code after fix the compiling error.
> 
> 
> > 
> >+/*
> >+ * KASLR may put kernel near tail of node memory,
> >+ * start after kernel on that node to find PFN
> >+ * which zone begins.
> >+ */
> >+if (pfn_to_nid(PFN_UP(_etext)) == i)
> >+real_startpfn = max(usable_startpfn,
> >+PFN_UP(_etext))
> >+else
> >+real_startpfn = usable_startpfn;
> > /* Account for what is only usable for kernelcore */
> >-if (start_pfn < usable_startpfn) {
> >+if (start_pfn < real_startpfn) {
> > unsigned long kernel_pages;
> >-kernel_pages = min(end_pfn, usable_startpfn)
> >+kernel_pages = min(end_pfn, real_startpfn)
> > - start_pfn;
> > 
> > kernelcore_remaining -= min(kernel_pages,
> >@@ -6693,7 +6703,7 @@ static void __init 
> >find_zone_movable_pfns_for_nodes(void)
> > required_kernelcore);
> > 
> > /* Continue if range is now fully accounted */
> >-if (end_pfn <= usable_startpfn) {
> >+if (end_pfn <= real_startpfn) {
> > 
> > /*
> >  * Push zone_movable_pfn to the end so
> >@@ -6704,7 +6714,7 @@ static void __init 
> >find_zone_movable_pfns_for_nodes(void)
> > zone_movable_pfn[nid] = end_pfn;
> > continue;
> > }
> >-start_pfn = usable_startpfn;
> >+start_pfn = real_startpfn;
> > }
> > 
> > /*
> >-- 
> >2.1.0
> >
> >
> >
> 
> 


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Chao Fan
On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
>Hi Baoquan,
>
>At 07/11/2018 08:40 PM, Baoquan He wrote:
>> Please try this v3 patch:
>> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
>> From: Baoquan He 
>> Date: Wed, 11 Jul 2018 20:31:51 +0800
>> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
>> 
>> In find_zone_movable_pfns_for_nodes(), when try to find the starting
>> PFN movable zone begins in each node, kernel text position is not
>> considered. KASLR may put kernel after which movable zone begins.
>> 
>> Fix it by finding movable zone after kernel text on that node.
>> 
>> Signed-off-by: Baoquan He 
>
>
>You fix this in the _zone_init side_. This may make the 'kernelcore=' or
>'movablecore=' failed if the KASLR puts the kernel back the tail of the
>last node, or more.

I think it may not fail.
There is a 'restart' to do another pass.

>
>Due to we have fix the mirror memory in KASLR side, and Chao is trying
>to fix the 'movable_node' in KASLR side. Have you had a chance to fix
>this in the KASLR side.
>

I think it's better to fix here, but not KASLR side.
Cause much more code will be change if doing it in KASLR side.
Since we didn't parse 'kernelcore' in compressed code, and you can see
the distribution of ZONE_MOVABLE need so much code, so we do not need
to do so much job in KASLR side. But here, several lines will be OK.

Thanks,
Chao Fan

>
>> ---
>>   mm/page_alloc.c | 20 +++-
>>   1 file changed, 15 insertions(+), 5 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 1521100..390eb35 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6547,7 +6547,7 @@ static unsigned long __init 
>> early_calculate_totalpages(void)
>>   static void __init find_zone_movable_pfns_for_nodes(void)
>>   {
>>  int i, nid;
>> -unsigned long usable_startpfn;
>> +unsigned long usable_startpfn, real_startpfn;
>>  unsigned long kernelcore_node, kernelcore_remaining;
>>  /* save the state before borrow the nodemask */
>>  nodemask_t saved_node_state = node_states[N_MEMORY];
>> @@ -6681,10 +6681,20 @@ static void __init 
>> find_zone_movable_pfns_for_nodes(void)
>>  if (start_pfn >= end_pfn)
>>  continue;
>> +/*
>> + * KASLR may put kernel near tail of node memory,
>> + * start after kernel on that node to find PFN
>> + * which zone begins.
>> + */
>> +if (pfn_to_nid(PFN_UP(_etext)) == i)
>
>Here, did you want to check the Node id? seems may be nid.
>
>and
>
>for_each_node_state(nid, N_MEMORY) {
>
>... seems check here is more suitable.
>
>   for_each_mem_pfn_range(i, nid, _pfn, _pfn, NULL) {
>
>   }
>}
>
>Thanks,
>   dou
>
>> +real_startpfn = max(usable_startpfn,
>> +PFN_UP(_etext))
>> +else
>> +real_startpfn = usable_startpfn;
>>  /* Account for what is only usable for kernelcore */
>> -if (start_pfn < usable_startpfn) {
>> +if (start_pfn < real_startpfn) {
>>  unsigned long kernel_pages;
>> -kernel_pages = min(end_pfn, usable_startpfn)
>> +kernel_pages = min(end_pfn, real_startpfn)
>>  - start_pfn;
>>  kernelcore_remaining -= min(kernel_pages,
>> @@ -6693,7 +6703,7 @@ static void __init 
>> find_zone_movable_pfns_for_nodes(void)
>>  required_kernelcore);
>>  /* Continue if range is now fully accounted */
>> -if (end_pfn <= usable_startpfn) {
>> +if (end_pfn <= real_startpfn) {
>>  /*
>>   * Push zone_movable_pfn to the end so
>> @@ -6704,7 +6714,7 @@ static void __init 
>> find_zone_movable_pfns_for_nodes(void)
>>  zone_movable_pfn[nid] = end_pfn;
>>  continue;
>>  }
>> -start_pfn = usable_startpfn;
>> +start_pfn = real_startpfn;
>>  }
>>  /*
>> 




Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Chao Fan
On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
>Hi Baoquan,
>
>At 07/11/2018 08:40 PM, Baoquan He wrote:
>> Please try this v3 patch:
>> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
>> From: Baoquan He 
>> Date: Wed, 11 Jul 2018 20:31:51 +0800
>> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
>> 
>> In find_zone_movable_pfns_for_nodes(), when try to find the starting
>> PFN movable zone begins in each node, kernel text position is not
>> considered. KASLR may put kernel after which movable zone begins.
>> 
>> Fix it by finding movable zone after kernel text on that node.
>> 
>> Signed-off-by: Baoquan He 
>
>
>You fix this in the _zone_init side_. This may make the 'kernelcore=' or
>'movablecore=' failed if the KASLR puts the kernel back the tail of the
>last node, or more.

I think it may not fail.
There is a 'restart' to do another pass.

>
>Due to we have fix the mirror memory in KASLR side, and Chao is trying
>to fix the 'movable_node' in KASLR side. Have you had a chance to fix
>this in the KASLR side.
>

I think it's better to fix here, but not KASLR side.
Cause much more code will be change if doing it in KASLR side.
Since we didn't parse 'kernelcore' in compressed code, and you can see
the distribution of ZONE_MOVABLE need so much code, so we do not need
to do so much job in KASLR side. But here, several lines will be OK.

Thanks,
Chao Fan

>
>> ---
>>   mm/page_alloc.c | 20 +++-
>>   1 file changed, 15 insertions(+), 5 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 1521100..390eb35 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6547,7 +6547,7 @@ static unsigned long __init 
>> early_calculate_totalpages(void)
>>   static void __init find_zone_movable_pfns_for_nodes(void)
>>   {
>>  int i, nid;
>> -unsigned long usable_startpfn;
>> +unsigned long usable_startpfn, real_startpfn;
>>  unsigned long kernelcore_node, kernelcore_remaining;
>>  /* save the state before borrow the nodemask */
>>  nodemask_t saved_node_state = node_states[N_MEMORY];
>> @@ -6681,10 +6681,20 @@ static void __init 
>> find_zone_movable_pfns_for_nodes(void)
>>  if (start_pfn >= end_pfn)
>>  continue;
>> +/*
>> + * KASLR may put kernel near tail of node memory,
>> + * start after kernel on that node to find PFN
>> + * which zone begins.
>> + */
>> +if (pfn_to_nid(PFN_UP(_etext)) == i)
>
>Here, did you want to check the Node id? seems may be nid.
>
>and
>
>for_each_node_state(nid, N_MEMORY) {
>
>... seems check here is more suitable.
>
>   for_each_mem_pfn_range(i, nid, _pfn, _pfn, NULL) {
>
>   }
>}
>
>Thanks,
>   dou
>
>> +real_startpfn = max(usable_startpfn,
>> +PFN_UP(_etext))
>> +else
>> +real_startpfn = usable_startpfn;
>>  /* Account for what is only usable for kernelcore */
>> -if (start_pfn < usable_startpfn) {
>> +if (start_pfn < real_startpfn) {
>>  unsigned long kernel_pages;
>> -kernel_pages = min(end_pfn, usable_startpfn)
>> +kernel_pages = min(end_pfn, real_startpfn)
>>  - start_pfn;
>>  kernelcore_remaining -= min(kernel_pages,
>> @@ -6693,7 +6703,7 @@ static void __init 
>> find_zone_movable_pfns_for_nodes(void)
>>  required_kernelcore);
>>  /* Continue if range is now fully accounted */
>> -if (end_pfn <= usable_startpfn) {
>> +if (end_pfn <= real_startpfn) {
>>  /*
>>   * Push zone_movable_pfn to the end so
>> @@ -6704,7 +6714,7 @@ static void __init 
>> find_zone_movable_pfns_for_nodes(void)
>>  zone_movable_pfn[nid] = end_pfn;
>>  continue;
>>  }
>> -start_pfn = usable_startpfn;
>> +start_pfn = real_startpfn;
>>  }
>>  /*
>> 




Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-11 Thread Dou Liyang

Hi Baoquan,

At 07/11/2018 08:40 PM, Baoquan He wrote:

Please try this v3 patch:
>>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
From: Baoquan He 
Date: Wed, 11 Jul 2018 20:31:51 +0800
Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text

In find_zone_movable_pfns_for_nodes(), when try to find the starting
PFN movable zone begins in each node, kernel text position is not
considered. KASLR may put kernel after which movable zone begins.

Fix it by finding movable zone after kernel text on that node.

Signed-off-by: Baoquan He 



You fix this in the _zone_init side_. This may make the 'kernelcore=' or
'movablecore=' failed if the KASLR puts the kernel back the tail of the
last node, or more.

Due to we have fix the mirror memory in KASLR side, and Chao is trying
to fix the 'movable_node' in KASLR side. Have you had a chance to fix
this in the KASLR side.



---
  mm/page_alloc.c | 20 +++-
  1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100..390eb35 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6547,7 +6547,7 @@ static unsigned long __init 
early_calculate_totalpages(void)
  static void __init find_zone_movable_pfns_for_nodes(void)
  {
int i, nid;
-   unsigned long usable_startpfn;
+   unsigned long usable_startpfn, real_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -6681,10 +6681,20 @@ static void __init 
find_zone_movable_pfns_for_nodes(void)
if (start_pfn >= end_pfn)
continue;
  
+			/*

+* KASLR may put kernel near tail of node memory,
+* start after kernel on that node to find PFN
+* which zone begins.
+*/
+   if (pfn_to_nid(PFN_UP(_etext)) == i)


Here, did you want to check the Node id? seems may be nid.

and

for_each_node_state(nid, N_MEMORY) {

... seems check here is more suitable.

for_each_mem_pfn_range(i, nid, _pfn, _pfn, NULL) {

}
}

Thanks,
dou


+   real_startpfn = max(usable_startpfn,
+   PFN_UP(_etext))
+   else
+   real_startpfn = usable_startpfn;
/* Account for what is only usable for kernelcore */
-   if (start_pfn < usable_startpfn) {
+   if (start_pfn < real_startpfn) {
unsigned long kernel_pages;
-   kernel_pages = min(end_pfn, usable_startpfn)
+   kernel_pages = min(end_pfn, real_startpfn)
- start_pfn;
  
  kernelcore_remaining -= min(kernel_pages,

@@ -6693,7 +6703,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
required_kernelcore);
  
  /* Continue if range is now fully accounted */

-   if (end_pfn <= usable_startpfn) {
+   if (end_pfn <= real_startpfn) {
  
  	/*

 * Push zone_movable_pfn to the end so
@@ -6704,7 +6714,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
zone_movable_pfn[nid] = end_pfn;
continue;
}
-   start_pfn = usable_startpfn;
+   start_pfn = real_startpfn;
}
  
  			/*







Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-11 Thread Dou Liyang

Hi Baoquan,

At 07/11/2018 08:40 PM, Baoquan He wrote:

Please try this v3 patch:
>>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
From: Baoquan He 
Date: Wed, 11 Jul 2018 20:31:51 +0800
Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text

In find_zone_movable_pfns_for_nodes(), when try to find the starting
PFN movable zone begins in each node, kernel text position is not
considered. KASLR may put kernel after which movable zone begins.

Fix it by finding movable zone after kernel text on that node.

Signed-off-by: Baoquan He 



You fix this in the _zone_init side_. This may make the 'kernelcore=' or
'movablecore=' failed if the KASLR puts the kernel back the tail of the
last node, or more.

Due to we have fix the mirror memory in KASLR side, and Chao is trying
to fix the 'movable_node' in KASLR side. Have you had a chance to fix
this in the KASLR side.



---
  mm/page_alloc.c | 20 +++-
  1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100..390eb35 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6547,7 +6547,7 @@ static unsigned long __init 
early_calculate_totalpages(void)
  static void __init find_zone_movable_pfns_for_nodes(void)
  {
int i, nid;
-   unsigned long usable_startpfn;
+   unsigned long usable_startpfn, real_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -6681,10 +6681,20 @@ static void __init 
find_zone_movable_pfns_for_nodes(void)
if (start_pfn >= end_pfn)
continue;
  
+			/*

+* KASLR may put kernel near tail of node memory,
+* start after kernel on that node to find PFN
+* which zone begins.
+*/
+   if (pfn_to_nid(PFN_UP(_etext)) == i)


Here, did you want to check the Node id? seems may be nid.

and

for_each_node_state(nid, N_MEMORY) {

... seems check here is more suitable.

for_each_mem_pfn_range(i, nid, _pfn, _pfn, NULL) {

}
}

Thanks,
dou


+   real_startpfn = max(usable_startpfn,
+   PFN_UP(_etext))
+   else
+   real_startpfn = usable_startpfn;
/* Account for what is only usable for kernelcore */
-   if (start_pfn < usable_startpfn) {
+   if (start_pfn < real_startpfn) {
unsigned long kernel_pages;
-   kernel_pages = min(end_pfn, usable_startpfn)
+   kernel_pages = min(end_pfn, real_startpfn)
- start_pfn;
  
  kernelcore_remaining -= min(kernel_pages,

@@ -6693,7 +6703,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
required_kernelcore);
  
  /* Continue if range is now fully accounted */

-   if (end_pfn <= usable_startpfn) {
+   if (end_pfn <= real_startpfn) {
  
  	/*

 * Push zone_movable_pfn to the end so
@@ -6704,7 +6714,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
zone_movable_pfn[nid] = end_pfn;
continue;
}
-   start_pfn = usable_startpfn;
+   start_pfn = real_startpfn;
}
  
  			/*







Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-11 Thread Chao Fan
On Wed, Jul 11, 2018 at 08:40:08PM +0800, Baoquan He wrote:
>Please try this v3 patch:
>
>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
>From: Baoquan He 
>Date: Wed, 11 Jul 2018 20:31:51 +0800
>Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
>
>In find_zone_movable_pfns_for_nodes(), when try to find the starting
>PFN movable zone begins in each node, kernel text position is not
>considered. KASLR may put kernel after which movable zone begins.
>
>Fix it by finding movable zone after kernel text on that node.
>
>Signed-off-by: Baoquan He 
>---
> mm/page_alloc.c | 20 +++-
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 1521100..390eb35 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -6547,7 +6547,7 @@ static unsigned long __init 
>early_calculate_totalpages(void)
> static void __init find_zone_movable_pfns_for_nodes(void)
> {
>   int i, nid;
>-  unsigned long usable_startpfn;
>+  unsigned long usable_startpfn, real_startpfn;
>   unsigned long kernelcore_node, kernelcore_remaining;
>   /* save the state before borrow the nodemask */
>   nodemask_t saved_node_state = node_states[N_MEMORY];
>@@ -6681,10 +6681,20 @@ static void __init 
>find_zone_movable_pfns_for_nodes(void)
>   if (start_pfn >= end_pfn)
>   continue;

Hi Baoquan,

Thanks for your quick reply and PATCH.
I think it can work well after reviewing the code. But I think the new
variable 'real_startpfn' is unnecessary. How about this:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6d00f746c2fd..0fc9c4283947 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6492,6 +6492,10 @@ static void __init find_zone_movable_pfns_for_nodes(void)
if (start_pfn >= end_pfn)
continue;

+   if (pfn_to_nid(PFN_UP(_etext)) == i)
+   usable_startpfn = max(usable_startpfn,
+ PFN_UP(_etext));
+
/* Account for what is only usable for kernelcore */
if (start_pfn < usable_startpfn) {
unsigned long kernel_pages;

I think the logic of these two method are the same, and this method
change less code. If I am wrong, please let me know.

Thanks,
Chao Fan

> 
>+  /*
>+   * KASLR may put kernel near tail of node memory,
>+   * start after kernel on that node to find PFN
>+   * which zone begins.
>+   */
>+  if (pfn_to_nid(PFN_UP(_etext)) == i)
>+  real_startpfn = max(usable_startpfn,
>+  PFN_UP(_etext))
>+  else
>+  real_startpfn = usable_startpfn;
>   /* Account for what is only usable for kernelcore */
>-  if (start_pfn < usable_startpfn) {
>+  if (start_pfn < real_startpfn) {
>   unsigned long kernel_pages;
>-  kernel_pages = min(end_pfn, usable_startpfn)
>+  kernel_pages = min(end_pfn, real_startpfn)
>   - start_pfn;
> 
>   kernelcore_remaining -= min(kernel_pages,
>@@ -6693,7 +6703,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>   required_kernelcore);
> 
>   /* Continue if range is now fully accounted */
>-  if (end_pfn <= usable_startpfn) {
>+  if (end_pfn <= real_startpfn) {
> 
>   /*
>* Push zone_movable_pfn to the end so
>@@ -6704,7 +6714,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>   zone_movable_pfn[nid] = end_pfn;
>   continue;
>   }
>-  start_pfn = usable_startpfn;
>+  start_pfn = real_startpfn;
>   }
> 
>   /*
>-- 
>2.1.0
>
>
>




  1   2   3   4   5   6   7   >