Re: PROBLEM: Kernel BUG in mfgpt_tick (cs5535-clockevt.c) on ALIX 2c3 - null call

2017-10-09 Thread David Kozub

On Mon, 9 Oct 2017, Daniel Lezcano wrote:


On 07/10/2017 23:26, David Kozub wrote:

Hi all,

booting up kernel 4.14-rc3 with CS5535_CLOCK_EVENT_SRC on an ALIX 2c3
(http://pcengines.ch/alix2c3.htm) dies with:

[    2.313086] cs5535-smb cs5535-smb: SCx200 device 'CS5535 ACB0'
registered
[    2.338711] cs5535-mfgpt cs5535-mfgpt: registered timer 0
[    2.355676] ledtrig-cpu: registered to indicate activity on CPUs
[    2.373745] cs5535-mfgpt cs5535-mfgpt: registered timer 1
[    2.389976] cs5535-clockevt: Registering MFGPT timer as a clock
event, using IRQ 7
[    2.412687] BUG: unable to handle kernel NULL pointer dereference
at   (null)
[    2.412698] IP:   (null)
[    2.412702] *pde = 
[    2.412713] Oops:  [#1]
[    2.412716] Modules linked in:
[    2.412732] CPU: 0 PID: 1 Comm: swapper Not tainted
4.14.0-rc3-humel-test17 #36
[    2.412739] task: c001 task.stack: c008e000
[    2.412744] EIP:   (null)
[    2.412749] EFLAGS: 00210093 CPU: 0
[    2.412758] EAX: c05fb8c0 EBX: c05fb880 ECX:  EDX: 620c
[    2.412769] ESI: c0009fd4 EDI: c014e14e EBP: c0009fac ESP: c0009fa8
[    2.412780]  DS: 007b ES: 007b FS:  GS:  SS: 0068
[    2.412790] CR0: 80050033 CR2:  CR3: 0064f000 CR4: 0090
[    2.412793] Call Trace:
[    2.412800]  
[    2.412825]  ? mfgpt_tick+0x5d/0x81
[    2.412845]  __handle_irq_event_percpu+0x56/0xb6
[    2.412864]  ? handle_fasteoi_irq+0xf3/0xf3
[    2.412878]  handle_irq_event_percpu+0x17/0x3f
[    2.412892]  handle_irq_event+0x1d/0x29
[    2.412905]  handle_level_irq+0x57/0xc6
[    2.412924]  handle_irq+0x47/0x52
[    2.412929]  
[    2.412934]  
[    2.412949]  do_IRQ+0x32/0x9b
[    2.412963]  ? __irqentry_text_end+0x7/0x7
[    2.412976]  common_interrupt+0x2e/0x34
...

The problem seems to be in drivers/clocksource/cs5535-clockevt.c in
mfgpt_tick() on line 132:
...
cs5535_clockevent.event_handler(_clockevent);
...
cs5535_clockevent.event_handler is null.

Adding some more traces I see mfgpt_tick() gets called before
clockevents_config_and_register() finishes (invoked from
cs5535_mfgpt_init() on line 178). So when mfgpt_tick() accessess the
event_handler, it's NULL. Wrapping the event_handler call on line 132 in
a null pointer check results in a working system.

The issue is present at least also in kernel 4.13.5. In kernel versions
<= 4.1-rc6 the cs5535_clockevent worked OK. Kernels >= 4.1-rc7 never
booted at all on my ALIX 2c3 (last I tried 4.5, then gave up and tried
4.13.5 recently), so I don't know when exactly this issue appeared.

My guess is that the timer interrupt is enabled too early. If I change
the order of clockevents_config_and_register() and
cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val) in
cs5535_mfgpt_init() like this:
...
/* Set up the clock event */
printk(KERN_INFO DRV_NAME
    ": Registering MFGPT timer as a clock event, using IRQ %d\n",
    timer_irq);
clockevents_config_and_register(_clockevent, MFGPT_HZ,
    0xF, 0xFFFE);

/* Set the clock scale and enable the event mode for CMP2 */
val = MFGPT_SCALE | (3 << 8);
cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val);
...
the system boots and seems to be OK.


What happens if instead of inverting those two lines, you add
mfgpt_shutdown() early in the init function ?


Hi Daniel,

I can't call mfgpt_shutdown() before clockevents_config_and_register() as 
mfgpt_shutdown() wants a struct clock_event_device* which is only 
available after clockevents_config_and_register(). But I tried calling 
disable_timer():


@@ -152,6 +152,8 @@ static int __init cs5535_mfgpt_init(void)
}
cs5535_event_clock = timer;

+   disable_timer(cs5535_event_clock);
+
/* Set up the IRQ on the MFGPT side */
if (cs5535_mfgpt_setup_irq(timer, MFGPT_CMP2, _irq)) {
printk(KERN_ERR DRV_NAME ": Could not set up IRQ %d\n",

which should do the same thing. Unfortunately I still get the same BUG.

I added some more traces and I think the IRQ comes just after:
ret = setup_irq(timer_irq, );

(And my reordering of cs5535_mfgpt_write and 
clockevents_config_and_register was just lucky, because the confing and 
register occured sooner than the IRQ.)


Attached is a patch that shows where I added the traces. It produces the 
following output (I also added a bunch of sleeps in between to be sure 
which step triggers the mfgpt_tick call):


[3.229612] cs5535-clockevt: enabling verbose handler
[3.234699] cs5535-clockevt: sleeping(1)...
[4.320051] cs5535-clockevt: done(1)
[4.323641] cs5535-clockevt: calling cs5535_mfgpt_alloc_timer
[4.329402] cs5535-mfgpt cs5535-mfgpt: registered timer 1
[4.334835] cs5535-clockevt: sleeping(2)...
[5.360051] cs5535-clockevt: done(2)
[5.363632] cs5535-clockevt: calling disable_timer
[5.368425] cs5535-clockevt: sleeping(3)...
[6.400048] cs5535-clockevt: done(3)
[6.403631] cs5535-clockevt: calling 

Re: PROBLEM: Kernel BUG in mfgpt_tick (cs5535-clockevt.c) on ALIX 2c3 - null call

2017-10-09 Thread David Kozub

On Mon, 9 Oct 2017, Daniel Lezcano wrote:


On 07/10/2017 23:26, David Kozub wrote:

Hi all,

booting up kernel 4.14-rc3 with CS5535_CLOCK_EVENT_SRC on an ALIX 2c3
(http://pcengines.ch/alix2c3.htm) dies with:

[    2.313086] cs5535-smb cs5535-smb: SCx200 device 'CS5535 ACB0'
registered
[    2.338711] cs5535-mfgpt cs5535-mfgpt: registered timer 0
[    2.355676] ledtrig-cpu: registered to indicate activity on CPUs
[    2.373745] cs5535-mfgpt cs5535-mfgpt: registered timer 1
[    2.389976] cs5535-clockevt: Registering MFGPT timer as a clock
event, using IRQ 7
[    2.412687] BUG: unable to handle kernel NULL pointer dereference
at   (null)
[    2.412698] IP:   (null)
[    2.412702] *pde = 
[    2.412713] Oops:  [#1]
[    2.412716] Modules linked in:
[    2.412732] CPU: 0 PID: 1 Comm: swapper Not tainted
4.14.0-rc3-humel-test17 #36
[    2.412739] task: c001 task.stack: c008e000
[    2.412744] EIP:   (null)
[    2.412749] EFLAGS: 00210093 CPU: 0
[    2.412758] EAX: c05fb8c0 EBX: c05fb880 ECX:  EDX: 620c
[    2.412769] ESI: c0009fd4 EDI: c014e14e EBP: c0009fac ESP: c0009fa8
[    2.412780]  DS: 007b ES: 007b FS:  GS:  SS: 0068
[    2.412790] CR0: 80050033 CR2:  CR3: 0064f000 CR4: 0090
[    2.412793] Call Trace:
[    2.412800]  
[    2.412825]  ? mfgpt_tick+0x5d/0x81
[    2.412845]  __handle_irq_event_percpu+0x56/0xb6
[    2.412864]  ? handle_fasteoi_irq+0xf3/0xf3
[    2.412878]  handle_irq_event_percpu+0x17/0x3f
[    2.412892]  handle_irq_event+0x1d/0x29
[    2.412905]  handle_level_irq+0x57/0xc6
[    2.412924]  handle_irq+0x47/0x52
[    2.412929]  
[    2.412934]  
[    2.412949]  do_IRQ+0x32/0x9b
[    2.412963]  ? __irqentry_text_end+0x7/0x7
[    2.412976]  common_interrupt+0x2e/0x34
...

The problem seems to be in drivers/clocksource/cs5535-clockevt.c in
mfgpt_tick() on line 132:
...
cs5535_clockevent.event_handler(_clockevent);
...
cs5535_clockevent.event_handler is null.

Adding some more traces I see mfgpt_tick() gets called before
clockevents_config_and_register() finishes (invoked from
cs5535_mfgpt_init() on line 178). So when mfgpt_tick() accessess the
event_handler, it's NULL. Wrapping the event_handler call on line 132 in
a null pointer check results in a working system.

The issue is present at least also in kernel 4.13.5. In kernel versions
<= 4.1-rc6 the cs5535_clockevent worked OK. Kernels >= 4.1-rc7 never
booted at all on my ALIX 2c3 (last I tried 4.5, then gave up and tried
4.13.5 recently), so I don't know when exactly this issue appeared.

My guess is that the timer interrupt is enabled too early. If I change
the order of clockevents_config_and_register() and
cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val) in
cs5535_mfgpt_init() like this:
...
/* Set up the clock event */
printk(KERN_INFO DRV_NAME
    ": Registering MFGPT timer as a clock event, using IRQ %d\n",
    timer_irq);
clockevents_config_and_register(_clockevent, MFGPT_HZ,
    0xF, 0xFFFE);

/* Set the clock scale and enable the event mode for CMP2 */
val = MFGPT_SCALE | (3 << 8);
cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val);
...
the system boots and seems to be OK.


What happens if instead of inverting those two lines, you add
mfgpt_shutdown() early in the init function ?


Hi Daniel,

I can't call mfgpt_shutdown() before clockevents_config_and_register() as 
mfgpt_shutdown() wants a struct clock_event_device* which is only 
available after clockevents_config_and_register(). But I tried calling 
disable_timer():


@@ -152,6 +152,8 @@ static int __init cs5535_mfgpt_init(void)
}
cs5535_event_clock = timer;

+   disable_timer(cs5535_event_clock);
+
/* Set up the IRQ on the MFGPT side */
if (cs5535_mfgpt_setup_irq(timer, MFGPT_CMP2, _irq)) {
printk(KERN_ERR DRV_NAME ": Could not set up IRQ %d\n",

which should do the same thing. Unfortunately I still get the same BUG.

I added some more traces and I think the IRQ comes just after:
ret = setup_irq(timer_irq, );

(And my reordering of cs5535_mfgpt_write and 
clockevents_config_and_register was just lucky, because the confing and 
register occured sooner than the IRQ.)


Attached is a patch that shows where I added the traces. It produces the 
following output (I also added a bunch of sleeps in between to be sure 
which step triggers the mfgpt_tick call):


[3.229612] cs5535-clockevt: enabling verbose handler
[3.234699] cs5535-clockevt: sleeping(1)...
[4.320051] cs5535-clockevt: done(1)
[4.323641] cs5535-clockevt: calling cs5535_mfgpt_alloc_timer
[4.329402] cs5535-mfgpt cs5535-mfgpt: registered timer 1
[4.334835] cs5535-clockevt: sleeping(2)...
[5.360051] cs5535-clockevt: done(2)
[5.363632] cs5535-clockevt: calling disable_timer
[5.368425] cs5535-clockevt: sleeping(3)...
[6.400048] cs5535-clockevt: done(3)
[6.403631] cs5535-clockevt: calling 

[PATCH] ftrace/docs: Add documentation on how to use ftrace from within the kernel

2017-10-09 Thread Steven Rostedt

With the coming removal of jprobes, using ftrace callbacks is one of the
utilities that replace the jprobes functionality. Having a document that
explains how to use ftrace as such will help in the transition from jprobes
to ftrace. This document is for kernel developers that require attaching a
callback to a function within the kernel.

Link: 
http://lkml.kernel.org/r/150724519527.5014.10207042218696587159.stgit@devbox

Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/ftrace-uses.rst | 297 
 1 file changed, 297 insertions(+)
 create mode 100644 Documentation/trace/ftrace-uses.rst

diff --git a/Documentation/trace/ftrace-uses.rst 
b/Documentation/trace/ftrace-uses.rst
new file mode 100644
index 000..5914d71
--- /dev/null
+++ b/Documentation/trace/ftrace-uses.rst
@@ -0,0 +1,297 @@
+   Using ftrace to hook to functions
+   =
+
+Copyright 2017 VMware Inc.
+   Author:   Steven Rostedt 
+  License:   The GNU Free Documentation License, Version 1.2
+   (dual licensed under the GPL v2)
+
+Written for: 4.14
+
+Introduction
+
+
+The ftrace infrastructure was originially created to attach callbacks to the
+beginning of functions in order to record and trace the flow of the kernel.
+But callbacks to the start of a function can have other use cases. Either
+for live kernel patching, or for security monitoring. This document describes
+how to use ftrace to implement your own function callbacks.
+
+
+The ftrace context
+==
+
+WARNING: The ability to add a callback to almost any function within the
+kernel comes with risks. A callback can be called from any context
+(normal, softirq, irq, and NMI). Callbacks can also be called just before
+going to idle, during CPU bring up and takedown, or going to user space.
+This requires extra care to what can be done inside a callback. A callback
+can be called outside the protective scope of RCU.
+
+The ftrace infrastructure has some protections agains recursions and RCU
+but one must still be very careful how they use the callbacks.
+
+
+The ftrace_ops structure
+
+
+To register a function callback, a ftrace_ops is required. This structure
+is used to tell ftrace what function should be called as the callback
+as well as what protections the callback will perform and not require
+ftrace to handle.
+
+There is only one field that is needed to be set when registering
+an ftrace_ops with ftrace::
+
+ struct ftrace_ops ops = {
+   .func   = my_callback_func,
+   .flags  = MY_FTRACE_FLAGS
+   .private= any_private_data_structure,
+ };
+
+Both .flags and .private are optional. Only .func is required.
+
+To enable tracing call::
+
+  register_ftrace_function();
+
+To disable tracing call::
+
+  unregister_ftrace_function();
+
+The above is defined by including the header::
+
+ #include 
+
+The registered callback will start being called some time after the
+register_ftrace_function() is called and before it returns. The exact time
+that callbacks start being called is dependent upon architecture and scheduling
+of services. The callback itself will have to handle any synchronization if it
+must begin at an exact moment.
+
+The unregister_ftrace_function() will guarantee that the callback is
+no longer being called by functions after the unregister_ftrace_function()
+returns. Note that to perform this guarantee, the unregister_ftrace_function()
+may take some time to finish.
+
+
+The callback function
+=
+
+The prototype of the callback function is as follows (as of v4.14)::
+
+ void callback_func(unsigned long ip, unsigned long parent_ip,
+   struct ftrace_ops *op, struct pt_regs *regs);
+
+@ip
+This is the instruction pointer of the function that is being traced.
+(where the fentry or mcount is within the function)
+
+@parent_ip
+   This is the instruction pointer of the function that called the
+   the function being traced (where the call of the function occurred).
+
+@op
+   This is a pointer to ftrace_ops that was used to register the callback.
+   This can be used to pass data to the callback via the private pointer.
+
+@regs
+   If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
+   flags are set in the ftrace_ops structure, then this will be pointing
+   to the pt_regs structure like it would be if an breakpoint was placed
+   at the start of the function where ftrace was tracing. Otherwise it
+   either contains garbage, or NULL.
+
+
+The ftrace FLAGS
+
+
+The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
+Some of the flags are used for internal infrastructure of ftrace, but the
+ones that users should be aware of are the following:
+

[PATCH] ftrace/docs: Add documentation on how to use ftrace from within the kernel

2017-10-09 Thread Steven Rostedt

With the coming removal of jprobes, using ftrace callbacks is one of the
utilities that replace the jprobes functionality. Having a document that
explains how to use ftrace as such will help in the transition from jprobes
to ftrace. This document is for kernel developers that require attaching a
callback to a function within the kernel.

Link: 
http://lkml.kernel.org/r/150724519527.5014.10207042218696587159.stgit@devbox

Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/ftrace-uses.rst | 297 
 1 file changed, 297 insertions(+)
 create mode 100644 Documentation/trace/ftrace-uses.rst

diff --git a/Documentation/trace/ftrace-uses.rst 
b/Documentation/trace/ftrace-uses.rst
new file mode 100644
index 000..5914d71
--- /dev/null
+++ b/Documentation/trace/ftrace-uses.rst
@@ -0,0 +1,297 @@
+   Using ftrace to hook to functions
+   =
+
+Copyright 2017 VMware Inc.
+   Author:   Steven Rostedt 
+  License:   The GNU Free Documentation License, Version 1.2
+   (dual licensed under the GPL v2)
+
+Written for: 4.14
+
+Introduction
+
+
+The ftrace infrastructure was originially created to attach callbacks to the
+beginning of functions in order to record and trace the flow of the kernel.
+But callbacks to the start of a function can have other use cases. Either
+for live kernel patching, or for security monitoring. This document describes
+how to use ftrace to implement your own function callbacks.
+
+
+The ftrace context
+==
+
+WARNING: The ability to add a callback to almost any function within the
+kernel comes with risks. A callback can be called from any context
+(normal, softirq, irq, and NMI). Callbacks can also be called just before
+going to idle, during CPU bring up and takedown, or going to user space.
+This requires extra care to what can be done inside a callback. A callback
+can be called outside the protective scope of RCU.
+
+The ftrace infrastructure has some protections agains recursions and RCU
+but one must still be very careful how they use the callbacks.
+
+
+The ftrace_ops structure
+
+
+To register a function callback, a ftrace_ops is required. This structure
+is used to tell ftrace what function should be called as the callback
+as well as what protections the callback will perform and not require
+ftrace to handle.
+
+There is only one field that is needed to be set when registering
+an ftrace_ops with ftrace::
+
+ struct ftrace_ops ops = {
+   .func   = my_callback_func,
+   .flags  = MY_FTRACE_FLAGS
+   .private= any_private_data_structure,
+ };
+
+Both .flags and .private are optional. Only .func is required.
+
+To enable tracing call::
+
+  register_ftrace_function();
+
+To disable tracing call::
+
+  unregister_ftrace_function();
+
+The above is defined by including the header::
+
+ #include 
+
+The registered callback will start being called some time after the
+register_ftrace_function() is called and before it returns. The exact time
+that callbacks start being called is dependent upon architecture and scheduling
+of services. The callback itself will have to handle any synchronization if it
+must begin at an exact moment.
+
+The unregister_ftrace_function() will guarantee that the callback is
+no longer being called by functions after the unregister_ftrace_function()
+returns. Note that to perform this guarantee, the unregister_ftrace_function()
+may take some time to finish.
+
+
+The callback function
+=
+
+The prototype of the callback function is as follows (as of v4.14)::
+
+ void callback_func(unsigned long ip, unsigned long parent_ip,
+   struct ftrace_ops *op, struct pt_regs *regs);
+
+@ip
+This is the instruction pointer of the function that is being traced.
+(where the fentry or mcount is within the function)
+
+@parent_ip
+   This is the instruction pointer of the function that called the
+   the function being traced (where the call of the function occurred).
+
+@op
+   This is a pointer to ftrace_ops that was used to register the callback.
+   This can be used to pass data to the callback via the private pointer.
+
+@regs
+   If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
+   flags are set in the ftrace_ops structure, then this will be pointing
+   to the pt_regs structure like it would be if an breakpoint was placed
+   at the start of the function where ftrace was tracing. Otherwise it
+   either contains garbage, or NULL.
+
+
+The ftrace FLAGS
+
+
+The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
+Some of the flags are used for internal infrastructure of ftrace, but the
+ones that users should be aware of are the following:
+
+FTRACE_OPS_FL_PER_CPU
+   When set, the 

Re: [Intel-gfx] [PATCH v3] drm/i915: Replace *_reference/unreference() or *_ref/unref with _get/put()

2017-10-09 Thread Sean Paul
On Mon, Oct 9, 2017 at 4:59 AM, Daniel Vetter  wrote:
> On Sun, Oct 08, 2017 at 03:43:35PM +0100, Chris Wilson wrote:
>> Quoting Harsha Sharma (2017-10-08 15:04:07)
>> > @@ -624,7 +624,7 @@ static bool intel_fbdev_init_bios(struct drm_device 
>> > *dev,
>> > ifbdev->preferred_bpp = fb->base.format->cpp[0] * 8;
>> > ifbdev->fb = fb;
>> >
>> > -   drm_framebuffer_reference(>fb->base);
>> > +   drm_framebuffer_put(>fb->base);
>>
>> Whoops.
>
> Hm yeah, how did this happen? Does cocci really do this, or is that an
> accident from manually fixing stuff up?


Running the spatch from the commit message gives me the correct substitution:
@@ -627,7 +627,7 @@ static bool intel_fbdev_init_bios(struct drm_device *dev,
ifbdev->preferred_bpp = fb->base.format->cpp[0] * 8;
ifbdev->fb = fb;

-   drm_framebuffer_reference(>fb->base);
+   drm_framebuffer_get(>fb->base);

/* Final pass to check if any active pipes don't have fbs */
for_each_crtc(dev, crtc) {


Probably just finger slip since this is the last chunk before the
omitted selftests changes.

Harsha: the "better" way to omit the selftests without hand tuning the
patch would be to run the cocci spatch on i915 as normal, and then run
"git checkout -- drivers/gpu/drm/i915/selftests/" before committing.
It's dangerous to edit patches by hand, or to misrepresent a patch as
being the result of a cocci spatch when it's not.

Sean



> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH v3] drm/i915: Replace *_reference/unreference() or *_ref/unref with _get/put()

2017-10-09 Thread Sean Paul
On Mon, Oct 9, 2017 at 4:59 AM, Daniel Vetter  wrote:
> On Sun, Oct 08, 2017 at 03:43:35PM +0100, Chris Wilson wrote:
>> Quoting Harsha Sharma (2017-10-08 15:04:07)
>> > @@ -624,7 +624,7 @@ static bool intel_fbdev_init_bios(struct drm_device 
>> > *dev,
>> > ifbdev->preferred_bpp = fb->base.format->cpp[0] * 8;
>> > ifbdev->fb = fb;
>> >
>> > -   drm_framebuffer_reference(>fb->base);
>> > +   drm_framebuffer_put(>fb->base);
>>
>> Whoops.
>
> Hm yeah, how did this happen? Does cocci really do this, or is that an
> accident from manually fixing stuff up?


Running the spatch from the commit message gives me the correct substitution:
@@ -627,7 +627,7 @@ static bool intel_fbdev_init_bios(struct drm_device *dev,
ifbdev->preferred_bpp = fb->base.format->cpp[0] * 8;
ifbdev->fb = fb;

-   drm_framebuffer_reference(>fb->base);
+   drm_framebuffer_get(>fb->base);

/* Final pass to check if any active pipes don't have fbs */
for_each_crtc(dev, crtc) {


Probably just finger slip since this is the last chunk before the
omitted selftests changes.

Harsha: the "better" way to omit the selftests without hand tuning the
patch would be to run the cocci spatch on i915 as normal, and then run
"git checkout -- drivers/gpu/drm/i915/selftests/" before committing.
It's dangerous to edit patches by hand, or to misrepresent a patch as
being the result of a cocci spatch when it's not.

Sean



> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


[PATCH v3] ftrace: Clear hashes of stale ips of init memory

2017-10-09 Thread Joel Fernandes
Filters should be cleared of init functions during freeing of init
memory when the ftrace dyn records are released. However in current
code, the filters are left as is. This patch clears the hashes of the
saved init functions when the init memory is freed. This fixes the
following issue reproducible with the following sequence of commands for
a test module:


void bar(void)
{
printk(KERN_INFO "bar!\n");
}

void foo(void)
{
printk(KERN_INFO "foo!\n");
bar();
}

static int __init hello_init(void)
{
printk(KERN_INFO "Hello world!\n");
foo();
return 0;
}

static void __exit hello_cleanup(void)
{
printk(KERN_INFO "Cleaning up module.\n");
}

module_init(hello_init);
module_exit(hello_cleanup);


Commands:
echo '*:mod:test' > /d/tracing/set_ftrace_filter
echo function > /d/tracing/current_tracer
modprobe test
rmmod test
sleep 1
modprobe test
cat /d/tracing/set_ftrace_filter

Behavior without patch: Init function is still in the filter
Expected behavior: Shouldn't have any of the filters set

Signed-off-by: Joel Fernandes 
---
changes:
v1->v2: do the hash clearing when init memory is freed
v2->v3: Left a stale print, no other change.

 kernel/trace/ftrace.c | 69 +++
 1 file changed, 69 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 9e99bd55732e..e0a98225666b 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6067,6 +6067,63 @@ allocate_ftrace_mod_map(struct module *mod,
 }
 #endif /* CONFIG_MODULES */
 
+struct ftrace_init_func {
+   struct list_head list;
+   unsigned long ip;
+};
+
+/* Clear any init ips from hashes */
+static void
+clear_func_from_hash(struct ftrace_init_func *func, struct ftrace_hash *hash)
+{
+   struct ftrace_func_entry *entry;
+
+   if (ftrace_hash_empty(hash))
+   return;
+
+   entry = __ftrace_lookup_ip(hash, func->ip);
+
+   /*
+* Do not allow this rec to match again.
+* Yeah, it may waste some memory, but will be removed
+* if/when the hash is modified again.
+*/
+   if (entry)
+   entry->ip = 0;
+}
+
+static void
+clear_func_from_hashes(struct ftrace_init_func *func)
+{
+   struct trace_array *tr;
+
+   mutex_lock(_types_lock);
+   list_for_each_entry(tr, _trace_arrays, list) {
+   if (!tr->ops || !tr->ops->func_hash)
+   continue;
+   mutex_lock(>ops->func_hash->regex_lock);
+   clear_func_from_hash(func, tr->ops->func_hash->filter_hash);
+   clear_func_from_hash(func, tr->ops->func_hash->notrace_hash);
+   mutex_unlock(>ops->func_hash->regex_lock);
+   }
+   mutex_unlock(_types_lock);
+}
+
+static void add_to_clear_hash_list(struct list_head *clear_list,
+  struct dyn_ftrace *rec)
+{
+   struct ftrace_init_func *func;
+
+   func = kmalloc(sizeof(*func), GFP_KERNEL);
+   if (!func) {
+   WARN_ONCE(1, "alloc failure, ftrace filter could be stale\n");
+   return;
+   }
+
+   func->ip = rec->ip;
+   list_add(>list, clear_list);
+}
+
 void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
 {
unsigned long start = (unsigned long)(start_ptr);
@@ -6076,8 +6133,12 @@ void ftrace_free_mem(struct module *mod, void 
*start_ptr, void *end_ptr)
struct dyn_ftrace *rec;
struct dyn_ftrace key;
struct ftrace_mod_map *mod_map = NULL;
+   struct ftrace_init_func *func, *func_next;
+   struct list_head clear_hash;
int order;
 
+   INIT_LIST_HEAD(_hash);
+
key.ip = start;
key.flags = end;/* overload flags, as it is unsigned long */
 
@@ -6102,6 +6163,9 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, 
void *end_ptr)
if (!rec)
continue;
 
+   /* rec will be cleared from hashes after ftrace_lock unlock */
+   add_to_clear_hash_list(_hash, rec);
+
if (mod_map)
save_ftrace_mod_rec(mod_map, rec);
 
@@ -6123,6 +6187,11 @@ void ftrace_free_mem(struct module *mod, void 
*start_ptr, void *end_ptr)
goto again;
}
mutex_unlock(_lock);
+
+   list_for_each_entry_safe(func, func_next, _hash, list) {
+   clear_func_from_hashes(func);
+   kfree(func);
+   }
 }
 
 void __init ftrace_free_init_mem(void)
-- 
2.14.2.920.gcf0c67979c-goog



[PATCH v3] ftrace: Clear hashes of stale ips of init memory

2017-10-09 Thread Joel Fernandes
Filters should be cleared of init functions during freeing of init
memory when the ftrace dyn records are released. However in current
code, the filters are left as is. This patch clears the hashes of the
saved init functions when the init memory is freed. This fixes the
following issue reproducible with the following sequence of commands for
a test module:


void bar(void)
{
printk(KERN_INFO "bar!\n");
}

void foo(void)
{
printk(KERN_INFO "foo!\n");
bar();
}

static int __init hello_init(void)
{
printk(KERN_INFO "Hello world!\n");
foo();
return 0;
}

static void __exit hello_cleanup(void)
{
printk(KERN_INFO "Cleaning up module.\n");
}

module_init(hello_init);
module_exit(hello_cleanup);


Commands:
echo '*:mod:test' > /d/tracing/set_ftrace_filter
echo function > /d/tracing/current_tracer
modprobe test
rmmod test
sleep 1
modprobe test
cat /d/tracing/set_ftrace_filter

Behavior without patch: Init function is still in the filter
Expected behavior: Shouldn't have any of the filters set

Signed-off-by: Joel Fernandes 
---
changes:
v1->v2: do the hash clearing when init memory is freed
v2->v3: Left a stale print, no other change.

 kernel/trace/ftrace.c | 69 +++
 1 file changed, 69 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 9e99bd55732e..e0a98225666b 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6067,6 +6067,63 @@ allocate_ftrace_mod_map(struct module *mod,
 }
 #endif /* CONFIG_MODULES */
 
+struct ftrace_init_func {
+   struct list_head list;
+   unsigned long ip;
+};
+
+/* Clear any init ips from hashes */
+static void
+clear_func_from_hash(struct ftrace_init_func *func, struct ftrace_hash *hash)
+{
+   struct ftrace_func_entry *entry;
+
+   if (ftrace_hash_empty(hash))
+   return;
+
+   entry = __ftrace_lookup_ip(hash, func->ip);
+
+   /*
+* Do not allow this rec to match again.
+* Yeah, it may waste some memory, but will be removed
+* if/when the hash is modified again.
+*/
+   if (entry)
+   entry->ip = 0;
+}
+
+static void
+clear_func_from_hashes(struct ftrace_init_func *func)
+{
+   struct trace_array *tr;
+
+   mutex_lock(_types_lock);
+   list_for_each_entry(tr, _trace_arrays, list) {
+   if (!tr->ops || !tr->ops->func_hash)
+   continue;
+   mutex_lock(>ops->func_hash->regex_lock);
+   clear_func_from_hash(func, tr->ops->func_hash->filter_hash);
+   clear_func_from_hash(func, tr->ops->func_hash->notrace_hash);
+   mutex_unlock(>ops->func_hash->regex_lock);
+   }
+   mutex_unlock(_types_lock);
+}
+
+static void add_to_clear_hash_list(struct list_head *clear_list,
+  struct dyn_ftrace *rec)
+{
+   struct ftrace_init_func *func;
+
+   func = kmalloc(sizeof(*func), GFP_KERNEL);
+   if (!func) {
+   WARN_ONCE(1, "alloc failure, ftrace filter could be stale\n");
+   return;
+   }
+
+   func->ip = rec->ip;
+   list_add(>list, clear_list);
+}
+
 void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
 {
unsigned long start = (unsigned long)(start_ptr);
@@ -6076,8 +6133,12 @@ void ftrace_free_mem(struct module *mod, void 
*start_ptr, void *end_ptr)
struct dyn_ftrace *rec;
struct dyn_ftrace key;
struct ftrace_mod_map *mod_map = NULL;
+   struct ftrace_init_func *func, *func_next;
+   struct list_head clear_hash;
int order;
 
+   INIT_LIST_HEAD(_hash);
+
key.ip = start;
key.flags = end;/* overload flags, as it is unsigned long */
 
@@ -6102,6 +6163,9 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, 
void *end_ptr)
if (!rec)
continue;
 
+   /* rec will be cleared from hashes after ftrace_lock unlock */
+   add_to_clear_hash_list(_hash, rec);
+
if (mod_map)
save_ftrace_mod_rec(mod_map, rec);
 
@@ -6123,6 +6187,11 @@ void ftrace_free_mem(struct module *mod, void 
*start_ptr, void *end_ptr)
goto again;
}
mutex_unlock(_lock);
+
+   list_for_each_entry_safe(func, func_next, _hash, list) {
+   clear_func_from_hashes(func);
+   kfree(func);
+   }
 }
 
 void __init ftrace_free_init_mem(void)
-- 
2.14.2.920.gcf0c67979c-goog



Re: [PATCH v2] ftrace: Clear hashes of stale ips of init memory

2017-10-09 Thread Joel Fernandes
On Mon, Oct 9, 2017 at 12:11 PM, Joel Fernandes  wrote:
> On Mon, Oct 9, 2017 at 11:53 AM, Joel Fernandes  wrote:
> [..]
>> +
>>  void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
>>  {
>> unsigned long start = (unsigned long)(start_ptr);
>> @@ -6076,8 +6135,12 @@ void ftrace_free_mem(struct module *mod, void 
>> *start_ptr, void *end_ptr)
>> struct dyn_ftrace *rec;
>> struct dyn_ftrace key;
>> struct ftrace_mod_map *mod_map = NULL;
>> +   struct ftrace_init_func *func, *func_next;
>> +   struct list_head clear_hash;
>> int order;
>>
>> +   INIT_LIST_HEAD(_hash);
>> +
>> key.ip = start;
>> key.flags = end;/* overload flags, as it is unsigned long */
>>
>> @@ -6102,6 +6165,9 @@ void ftrace_free_mem(struct module *mod, void 
>> *start_ptr, void *end_ptr)
>> if (!rec)
>> continue;
>>
>> +   /* rec will be cleared from hashes after ftrace_lock unlock 
>> */
>> +   add_to_clear_hash_list(_hash, rec);
>> +
>> if (mod_map)
>> save_ftrace_mod_rec(mod_map, rec);
>>
>> @@ -6123,6 +6189,11 @@ void ftrace_free_mem(struct module *mod, void 
>> *start_ptr, void *end_ptr)
>> goto again;
>> }
>> mutex_unlock(_lock);
>> +
>> +   list_for_each_entry_safe(func, func_next, _hash, list) {
>
> I think I screwed this list_for_each_entry up! Please ignore this
> revision, I'll send another patch revision shortly.

Actually, its correct. I got confused with how
list_for_each_entry_safe's terminating condition is defined:

>member != (head)

I thought head itself had to be an entry's member, but here the
pointer offset is calculated and will take care of the termination so
its fine.

Sorry about the noise!

thanks,

- Joel


Re: [PATCH v2] ftrace: Clear hashes of stale ips of init memory

2017-10-09 Thread Joel Fernandes
On Mon, Oct 9, 2017 at 12:11 PM, Joel Fernandes  wrote:
> On Mon, Oct 9, 2017 at 11:53 AM, Joel Fernandes  wrote:
> [..]
>> +
>>  void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
>>  {
>> unsigned long start = (unsigned long)(start_ptr);
>> @@ -6076,8 +6135,12 @@ void ftrace_free_mem(struct module *mod, void 
>> *start_ptr, void *end_ptr)
>> struct dyn_ftrace *rec;
>> struct dyn_ftrace key;
>> struct ftrace_mod_map *mod_map = NULL;
>> +   struct ftrace_init_func *func, *func_next;
>> +   struct list_head clear_hash;
>> int order;
>>
>> +   INIT_LIST_HEAD(_hash);
>> +
>> key.ip = start;
>> key.flags = end;/* overload flags, as it is unsigned long */
>>
>> @@ -6102,6 +6165,9 @@ void ftrace_free_mem(struct module *mod, void 
>> *start_ptr, void *end_ptr)
>> if (!rec)
>> continue;
>>
>> +   /* rec will be cleared from hashes after ftrace_lock unlock 
>> */
>> +   add_to_clear_hash_list(_hash, rec);
>> +
>> if (mod_map)
>> save_ftrace_mod_rec(mod_map, rec);
>>
>> @@ -6123,6 +6189,11 @@ void ftrace_free_mem(struct module *mod, void 
>> *start_ptr, void *end_ptr)
>> goto again;
>> }
>> mutex_unlock(_lock);
>> +
>> +   list_for_each_entry_safe(func, func_next, _hash, list) {
>
> I think I screwed this list_for_each_entry up! Please ignore this
> revision, I'll send another patch revision shortly.

Actually, its correct. I got confused with how
list_for_each_entry_safe's terminating condition is defined:

>member != (head)

I thought head itself had to be an entry's member, but here the
pointer offset is calculated and will take care of the termination so
its fine.

Sorry about the noise!

thanks,

- Joel


Build regressions/improvements in v4.14-rc4

2017-10-09 Thread Geert Uytterhoeven
Below is the list of build error/warning regressions/improvements in
v4.14-rc4[1] compared to v4.13[2].

Summarized:
  - build errors: +7/-0
  - build warnings: +1502/-6250

JFYI, when comparing v4.14-rc4[1] to v4.14-rc3[3], the summaries are:
  - build errors: +1/-2
  - build warnings: +817/-384

Note that there may be false regressions, as some logs are incomplete.
Still, they're build errors/warnings.

Happy fixing! ;-)

Thanks to the linux-next team for providing the build service.

[1] 
http://kisskb.ellerman.id.au/kisskb/head/8a5776a5f49812d29fe4b2d0a2d71675c3facf3f/
 (all 269 configs)
[2] 
http://kisskb.ellerman.id.au/kisskb/head/569dbb88e80deb68974ef6fdd6a13edb9d686261/
 (267 out of 269 configs)
[3] 
http://kisskb.ellerman.id.au/kisskb/head/9e66317d3c92ddaab330c125dfe9d06eee268aff/
 (267 out of 269 configs)


*** ERRORS ***

7 error regressions:
  + /home/kisskb/slave/src/arch/sparc/include/asm/mmu_context_64.h: error: 
'per_cpu_secondary_mm' undeclared (first use in this function):  => 81:10
  + /home/kisskb/slave/src/arch/sparc/include/asm/mmu_context_64.h: error: 
implicit declaration of function 'per_cpu' 
[-Werror=implicit-function-declaration]:  => 81:2
  + /home/kisskb/slave/src/arch/sparc/include/asm/mmu_context_64.h: error: 
implicit declaration of function 'smp_processor_id' 
[-Werror=implicit-function-declaration]:  => 79:12
  + /home/kisskb/slave/src/arch/sparc/include/asm/mmu_context_64.h: error: 
unknown type name 'per_cpu_secondary_mm':  => 22:37
  + /home/kisskb/slave/src/drivers/net/ethernet/intel/i40e/i40e_ethtool.c: 
error: implicit declaration of function 'cmpxchg64' 
[-Werror=implicit-function-declaration]:  => 4150:6
  + error: "__cmpxchg_u64" [drivers/net/ethernet/intel/i40e/i40e.ko] 
undefined!:  => N/A
  + error: "devm_ioremap_resource" [drivers/auxdisplay/img-ascii-lcd.ko] 
undefined!:  => N/A


*** WARNINGS ***

[Deleted 855 lines about "warning: ... [-Wpointer-sign]" on parisc-allmodconfig]
[Deleted 6084 lines about "warning: -ffunction-sections disabled; it makes 
profiling impossible [enabled by default]" on parisc-allmodconfig]

1502 warning regressions:
  + /home/kisskb/slave/src/arch/powerpc/platforms/ps3/device-init.c: warning: 
the frame size of 2112 bytes is larger than 2048 bytes [-Wframe-larger-than=]:  
=> 888:1
  + /home/kisskb/slave/src/arch/um/os-Linux/skas/process.c: warning: control 
reaches end of non-void function [-Wreturn-type]:  => 613:1
  + /home/kisskb/slave/src/block/blk-cgroup.c: warning: the frame size of 1160 
bytes is larger than 1024 bytes [-Wframe-larger-than=]:  => 354:1
  + /home/kisskb/slave/src/crypto/algif_aead.c: warning: 'crypto_aead_copy_sgl' 
uses dynamic stack allocation [enabled by default]:  => 91:1
  + /home/kisskb/slave/src/drivers/clk/renesas/clk-sh73a0.c: warning: 
'parent_name' may be used uninitialized in this function [-Wuninitialized]:  => 
155:3
  + /home/kisskb/slave/src/drivers/crypto/caam/caamalg_qi.c: warning: format 
'%lu' expects argument of type 'long unsigned int', but argument 4 has type 
'unsigned int' [-Wformat]:  => 672:16, 1062:16, 909:16
  + /home/kisskb/slave/src/drivers/gpu/drm/i915/intel_runtime_pm.c: warning: 
'pg' may be used uninitialized in this function [-Wuninitialized]:  => 392:33
  + /home/kisskb/slave/src/drivers/infiniband/core/uverbs_std_types.c: warning: 
'inbuf' may be used uninitialized in this function [-Wuninitialized]:  => 249:2
  + /home/kisskb/slave/src/drivers/infiniband/core/uverbs_std_types.c: warning: 
'outbuf' may be used uninitialized in this function [-Wuninitialized]:  => 249:2
  + /home/kisskb/slave/src/drivers/md/dm-crypt.c: warning: 
'crypt_iv_lmk_one.isra.28' uses dynamic stack allocation [enabled by default]:  
=> 640:1
  + /home/kisskb/slave/src/drivers/md/dm-crypt.c: warning: 
'crypt_iv_tcw_whitening.isra.27' uses dynamic stack allocation [enabled by 
default]:  => 787:1
  + /home/kisskb/slave/src/drivers/media/pci/ddbridge/ddbridge-io.h: warning: 
'return' with a value, in function returning void [enabled by default]:  => 
55:2, 50:2
  + /home/kisskb/slave/src/drivers/mtd/chips/cfi_cmdset_0020.c: warning: the 
frame size of 1396 bytes is larger than 1280 bytes [-Wframe-larger-than=]:  => 
603:1
  + /home/kisskb/slave/src/drivers/net/tun.c: warning: 'copylen' may be used 
uninitialized in this function [-Wuninitialized]:  => 1471:22
  + /home/kisskb/slave/src/drivers/net/tun.c: warning: 'linear' may be used 
uninitialized in this function [-Wuninitialized]:  => 1384:46, 1192:31, 1192:34
  + /home/kisskb/slave/src/drivers/scsi/bfa/bfa_fcs_lport.c: warning: the frame 
size of 1712 bytes is larger than 1280 bytes [-Wframe-larger-than=]:  => 2160:1
  + /home/kisskb/slave/src/drivers/staging/media/imx/imx-media-dev.c: warning: 
'ret' may be used uninitialized in this function [-Wuninitialized]:  => 431:5
  + /home/kisskb/slave/src/fs/btrfs/backref.c: warning: 'count' may be used 
uninitialized in this function [-Wuninitialized]:  => 799:15
  + 

Build regressions/improvements in v4.14-rc4

2017-10-09 Thread Geert Uytterhoeven
Below is the list of build error/warning regressions/improvements in
v4.14-rc4[1] compared to v4.13[2].

Summarized:
  - build errors: +7/-0
  - build warnings: +1502/-6250

JFYI, when comparing v4.14-rc4[1] to v4.14-rc3[3], the summaries are:
  - build errors: +1/-2
  - build warnings: +817/-384

Note that there may be false regressions, as some logs are incomplete.
Still, they're build errors/warnings.

Happy fixing! ;-)

Thanks to the linux-next team for providing the build service.

[1] 
http://kisskb.ellerman.id.au/kisskb/head/8a5776a5f49812d29fe4b2d0a2d71675c3facf3f/
 (all 269 configs)
[2] 
http://kisskb.ellerman.id.au/kisskb/head/569dbb88e80deb68974ef6fdd6a13edb9d686261/
 (267 out of 269 configs)
[3] 
http://kisskb.ellerman.id.au/kisskb/head/9e66317d3c92ddaab330c125dfe9d06eee268aff/
 (267 out of 269 configs)


*** ERRORS ***

7 error regressions:
  + /home/kisskb/slave/src/arch/sparc/include/asm/mmu_context_64.h: error: 
'per_cpu_secondary_mm' undeclared (first use in this function):  => 81:10
  + /home/kisskb/slave/src/arch/sparc/include/asm/mmu_context_64.h: error: 
implicit declaration of function 'per_cpu' 
[-Werror=implicit-function-declaration]:  => 81:2
  + /home/kisskb/slave/src/arch/sparc/include/asm/mmu_context_64.h: error: 
implicit declaration of function 'smp_processor_id' 
[-Werror=implicit-function-declaration]:  => 79:12
  + /home/kisskb/slave/src/arch/sparc/include/asm/mmu_context_64.h: error: 
unknown type name 'per_cpu_secondary_mm':  => 22:37
  + /home/kisskb/slave/src/drivers/net/ethernet/intel/i40e/i40e_ethtool.c: 
error: implicit declaration of function 'cmpxchg64' 
[-Werror=implicit-function-declaration]:  => 4150:6
  + error: "__cmpxchg_u64" [drivers/net/ethernet/intel/i40e/i40e.ko] 
undefined!:  => N/A
  + error: "devm_ioremap_resource" [drivers/auxdisplay/img-ascii-lcd.ko] 
undefined!:  => N/A


*** WARNINGS ***

[Deleted 855 lines about "warning: ... [-Wpointer-sign]" on parisc-allmodconfig]
[Deleted 6084 lines about "warning: -ffunction-sections disabled; it makes 
profiling impossible [enabled by default]" on parisc-allmodconfig]

1502 warning regressions:
  + /home/kisskb/slave/src/arch/powerpc/platforms/ps3/device-init.c: warning: 
the frame size of 2112 bytes is larger than 2048 bytes [-Wframe-larger-than=]:  
=> 888:1
  + /home/kisskb/slave/src/arch/um/os-Linux/skas/process.c: warning: control 
reaches end of non-void function [-Wreturn-type]:  => 613:1
  + /home/kisskb/slave/src/block/blk-cgroup.c: warning: the frame size of 1160 
bytes is larger than 1024 bytes [-Wframe-larger-than=]:  => 354:1
  + /home/kisskb/slave/src/crypto/algif_aead.c: warning: 'crypto_aead_copy_sgl' 
uses dynamic stack allocation [enabled by default]:  => 91:1
  + /home/kisskb/slave/src/drivers/clk/renesas/clk-sh73a0.c: warning: 
'parent_name' may be used uninitialized in this function [-Wuninitialized]:  => 
155:3
  + /home/kisskb/slave/src/drivers/crypto/caam/caamalg_qi.c: warning: format 
'%lu' expects argument of type 'long unsigned int', but argument 4 has type 
'unsigned int' [-Wformat]:  => 672:16, 1062:16, 909:16
  + /home/kisskb/slave/src/drivers/gpu/drm/i915/intel_runtime_pm.c: warning: 
'pg' may be used uninitialized in this function [-Wuninitialized]:  => 392:33
  + /home/kisskb/slave/src/drivers/infiniband/core/uverbs_std_types.c: warning: 
'inbuf' may be used uninitialized in this function [-Wuninitialized]:  => 249:2
  + /home/kisskb/slave/src/drivers/infiniband/core/uverbs_std_types.c: warning: 
'outbuf' may be used uninitialized in this function [-Wuninitialized]:  => 249:2
  + /home/kisskb/slave/src/drivers/md/dm-crypt.c: warning: 
'crypt_iv_lmk_one.isra.28' uses dynamic stack allocation [enabled by default]:  
=> 640:1
  + /home/kisskb/slave/src/drivers/md/dm-crypt.c: warning: 
'crypt_iv_tcw_whitening.isra.27' uses dynamic stack allocation [enabled by 
default]:  => 787:1
  + /home/kisskb/slave/src/drivers/media/pci/ddbridge/ddbridge-io.h: warning: 
'return' with a value, in function returning void [enabled by default]:  => 
55:2, 50:2
  + /home/kisskb/slave/src/drivers/mtd/chips/cfi_cmdset_0020.c: warning: the 
frame size of 1396 bytes is larger than 1280 bytes [-Wframe-larger-than=]:  => 
603:1
  + /home/kisskb/slave/src/drivers/net/tun.c: warning: 'copylen' may be used 
uninitialized in this function [-Wuninitialized]:  => 1471:22
  + /home/kisskb/slave/src/drivers/net/tun.c: warning: 'linear' may be used 
uninitialized in this function [-Wuninitialized]:  => 1384:46, 1192:31, 1192:34
  + /home/kisskb/slave/src/drivers/scsi/bfa/bfa_fcs_lport.c: warning: the frame 
size of 1712 bytes is larger than 1280 bytes [-Wframe-larger-than=]:  => 2160:1
  + /home/kisskb/slave/src/drivers/staging/media/imx/imx-media-dev.c: warning: 
'ret' may be used uninitialized in this function [-Wuninitialized]:  => 431:5
  + /home/kisskb/slave/src/fs/btrfs/backref.c: warning: 'count' may be used 
uninitialized in this function [-Wuninitialized]:  => 799:15
  + 

Re: [tho...@m3y3r.de: Re: [PATCH] um: Fix kcov crash before kernel is started.]

2017-10-09 Thread Thomas Meyer
On Mon, Oct 09, 2017 at 08:10:45PM +0200, Dmitry Vyukov wrote:
> On Mon, Oct 9, 2017 at 6:47 PM, Thomas Meyer  wrote:
> > - Forwarded message from Thomas Meyer  -
> >
> > Hi,
> >
> > are you able to shed light on this topic?
> > Any help is greatly appreciated!
> >
> > With kind regards
> > thomas
> >
> > Date: Sun, 8 Oct 2017 13:18:24 +0200
> > From: Thomas Meyer 
> > To: Richard Weinberger 
> > Cc: user-mode-linux-de...@lists.sourceforge.net, 
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH] um: Fix kcov crash before kernel is started.
> > User-Agent: NeoMutt/20170113 (1.7.2)
> >
> > On Sun, Oct 08, 2017 at 12:44:12PM +0200, Richard Weinberger wrote:
> >> Am Sonntag, 8. Oktober 2017, 12:31:58 CEST schrieb Thomas Meyer:
> >> > UMLs current_thread_info() unconditionally assumes that the top of the 
> >> > stack
> >> > contains the thread_info structure. But on UML the 
> >> > __sanitizer_cov_trace_pc
> >> > function is called for *all* functions! This results in an early crash:
> >> >
> >> > Prevent kcov from using invalid curent_thread_info() data by checking
> >> > the system_state.
> >> >
> >> > Signed-off-by: Thomas Meyer 
> >> > ---
> >> >  kernel/kcov.c | 6 ++
> >> >  1 file changed, 6 insertions(+)
> >> >
> >> > diff --git a/kernel/kcov.c b/kernel/kcov.c
> >> > index 3f693a0f6f3e..d601c0e956f6 100644
> >> > --- a/kernel/kcov.c
> >> > +++ b/kernel/kcov.c
> >> > @@ -56,6 +56,12 @@ void notrace __sanitizer_cov_trace_pc(void)
> >> > struct task_struct *t;
> >> > enum kcov_mode mode;
> >> >
> >> > +#ifdef CONFIG_UML
> >> > +   if(!(system_state == SYSTEM_SCHEDULING ||
> >> > +system_state == SYSTEM_RUNNING))
> >> > +   return;
> >> > +#endif
> >>
> >> Hmm, and why does it work on all other archs then?
> >
> > Hi,
> >
> > I guess UML is different then other archs! But to be honest I'm not sure
> > why. I assume that __sanitizer_cov_trace_pc on other archs isn't called
> > that early, or that curent_thread_info returns NULL on other archs when
> > the first task isn't running yet.
> >
> > But as I fail to use/setup the qemu gdb attachment to debug early x86_64 
> > code
> > I can't say exactly why.
> >
> > Maybe someone how knows the inner workings of x86_64 and/or kcov can
> > answer this question!
> 
> 
> Hi,

Hi,

> Yes, kcov can have some issues with early bootstrap code, because it
> accesses current and it can also conflict with say, per-cpu setup code
> (at least it was the case for x86). For x86 and arm64 we just bulk
> blacklist instrumentation of arch code involved in early bootstrap.
> See e.g. KCOV_INSTRUMENT in arch/x86/boot/Makefile. I think you need
> to do the same for um. Start with bulk ignoring as much as possible
> until you get it booting and then bisect back from there.

oh, arch/um/* already contains the Makefile exception settings!
I guess CONFIG_KCOV_INSTRUMENT_ALL overrides the the Makefile settings?
Or doesn't it? I looked at scripts/Makefile.lib but failed to understand
what config options has precedens in that case.

greets
thomas 


Re: [tho...@m3y3r.de: Re: [PATCH] um: Fix kcov crash before kernel is started.]

2017-10-09 Thread Thomas Meyer
On Mon, Oct 09, 2017 at 08:10:45PM +0200, Dmitry Vyukov wrote:
> On Mon, Oct 9, 2017 at 6:47 PM, Thomas Meyer  wrote:
> > - Forwarded message from Thomas Meyer  -
> >
> > Hi,
> >
> > are you able to shed light on this topic?
> > Any help is greatly appreciated!
> >
> > With kind regards
> > thomas
> >
> > Date: Sun, 8 Oct 2017 13:18:24 +0200
> > From: Thomas Meyer 
> > To: Richard Weinberger 
> > Cc: user-mode-linux-de...@lists.sourceforge.net, 
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH] um: Fix kcov crash before kernel is started.
> > User-Agent: NeoMutt/20170113 (1.7.2)
> >
> > On Sun, Oct 08, 2017 at 12:44:12PM +0200, Richard Weinberger wrote:
> >> Am Sonntag, 8. Oktober 2017, 12:31:58 CEST schrieb Thomas Meyer:
> >> > UMLs current_thread_info() unconditionally assumes that the top of the 
> >> > stack
> >> > contains the thread_info structure. But on UML the 
> >> > __sanitizer_cov_trace_pc
> >> > function is called for *all* functions! This results in an early crash:
> >> >
> >> > Prevent kcov from using invalid curent_thread_info() data by checking
> >> > the system_state.
> >> >
> >> > Signed-off-by: Thomas Meyer 
> >> > ---
> >> >  kernel/kcov.c | 6 ++
> >> >  1 file changed, 6 insertions(+)
> >> >
> >> > diff --git a/kernel/kcov.c b/kernel/kcov.c
> >> > index 3f693a0f6f3e..d601c0e956f6 100644
> >> > --- a/kernel/kcov.c
> >> > +++ b/kernel/kcov.c
> >> > @@ -56,6 +56,12 @@ void notrace __sanitizer_cov_trace_pc(void)
> >> > struct task_struct *t;
> >> > enum kcov_mode mode;
> >> >
> >> > +#ifdef CONFIG_UML
> >> > +   if(!(system_state == SYSTEM_SCHEDULING ||
> >> > +system_state == SYSTEM_RUNNING))
> >> > +   return;
> >> > +#endif
> >>
> >> Hmm, and why does it work on all other archs then?
> >
> > Hi,
> >
> > I guess UML is different then other archs! But to be honest I'm not sure
> > why. I assume that __sanitizer_cov_trace_pc on other archs isn't called
> > that early, or that curent_thread_info returns NULL on other archs when
> > the first task isn't running yet.
> >
> > But as I fail to use/setup the qemu gdb attachment to debug early x86_64 
> > code
> > I can't say exactly why.
> >
> > Maybe someone how knows the inner workings of x86_64 and/or kcov can
> > answer this question!
> 
> 
> Hi,

Hi,

> Yes, kcov can have some issues with early bootstrap code, because it
> accesses current and it can also conflict with say, per-cpu setup code
> (at least it was the case for x86). For x86 and arm64 we just bulk
> blacklist instrumentation of arch code involved in early bootstrap.
> See e.g. KCOV_INSTRUMENT in arch/x86/boot/Makefile. I think you need
> to do the same for um. Start with bulk ignoring as much as possible
> until you get it booting and then bisect back from there.

oh, arch/um/* already contains the Makefile exception settings!
I guess CONFIG_KCOV_INSTRUMENT_ALL overrides the the Makefile settings?
Or doesn't it? I looked at scripts/Makefile.lib but failed to understand
what config options has precedens in that case.

greets
thomas 


Re: [PATCH] [media] ov5645: I2C address change

2017-10-09 Thread Laurent Pinchart
Hi Todor,

On Monday, 9 October 2017 19:18:17 EEST Todor Tomov wrote:
> On  9.10.2017 15:52, Laurent Pinchart wrote:
> > On Monday, 9 October 2017 12:34:26 EEST Sakari Ailus wrote:
> >> On Mon, Oct 09, 2017 at 11:36:01AM +0300, Todor Tomov wrote:
> >>> On  4.10.2017 13:47, Laurent Pinchart wrote:
>  CC'ing the I2C mainling list and the I2C maintainer.
>  
>  On Wednesday, 4 October 2017 13:30:08 EEST Sakari Ailus wrote:
> > On Mon, Oct 02, 2017 at 04:28:45PM +0300, Todor Tomov wrote:
> >> As soon as the sensor is powered on, change the I2C address to the
> >> one specified in DT. This allows to use multiple physical sensors
> >> connected to the same I2C bus.
> >> 
> >> Signed-off-by: Todor Tomov 
> > 
> > The smiapp driver does something similar and I understand Laurent
> > might be interested in such functionality as well.
> > 
> > It'd be nice to handle this through the I²C framework instead and to
> > define how the information is specified through DT. That way it could
> > be made generic, to work with more devices than just this one.
> > 
> > What do you think?
> >>> 
> >>> Thank you for this suggestion.
> >>> 
> >>> The way I have done it is to put the new I2C address in the DT and the
> >>> driver programs the change using the original I2C address. The original
> >>> I2C address is hardcoded in the driver. So maybe we can extend the DT
> >>> binding and the I2C framework so that both addresses come from the DT
> >>> and avoid hiding the original I2C address in the driver. This sounds
> >>> good to me.
> >> 
> >> Agreed.
> >> 
> >> In this case the address is known but in general that's not the case it's
> >> not that simple. There are register compatible devices that have
> >> different addresses even if they're the same devices.
> >> 
> >> It might be a good idea to make this explicit.
> > 
> > Yes, in the general case we need to specify the original address in DT, as
> > the chip could have a non-fixed boot-up I2C address.
> > 
> > In many cases the value of the new I2C address doesn't matter much, as
> > long as it's unique on the bus. I was thinking about implementing a
> > dynamic allocator for I2C addresses, but after discussing it with Wolfram
> > we concluded that it would probably not be a good idea. There could be
> > other I2C devices on the bus that Linux isn't aware of, in which case the
> > dynamic allocator could create address conflicts. Specifying the new
> > address in DT is likely a better idea, even if it could feel a bit more
> > of system configuration information than a pure hardware description.
> > 
> >>> Then changing the address could be device specific and also this must be
> >>> done right after power on so that there are no address conflicts. So I
> >>> don't think that we can support this through the I2C framework only, the
> >>> drivers that we want to do that will have to be expanded with this
> >>> functionality. Or do you have any other idea?
> >> 
> >> Yes, how the address is changed is always hardware specific. This would
> >> be most conveniently done in driver's probe or PM runtime_resume
> >> functions.
> > 
> > This patch modifies client->addr directly, which I don't think is a good
> > idea. I'd prefer making the I2C core aware of the address change through
> > an explicit API call. This would allow catching I2C adress conflicts for
> > instance.
> > 
> >> It could be as simple as providing an adapter specific mutex to serialise
> >> address changes on the bus so that no two address changes are taking
> >> place at the same time. Which is essentially the impliementation you had,
> >> only the mutex would be for the I²C adapter, not the driver. An helper
> >> functions for acquiring and releasing the mutex.
> > 
> > Why do you need to serialize address changes ?
> 
> Correct me if I'm wrong, but if you power on more than one device with the
> same I2C address and issue a command to change it, then all devices will
> recognize this command as addressed to them. The only solution (which I know
> about) to avoid this is to serialize the power on and address change (as a
> whole!) for these devices.

Yes, that's correct. It can be even worse than that, sometimes only one of the 
two devices with the same address can be reconfigured, which means that 
powering that device requires powering up the other device and changing its 
address first, otherwise the second device can't be used as long as the first 
one is power on (this happened for real in a Nokia platform).

> I think it would be better to move the mutex out of the driver - to avoid
> all client drivers which will change I2C address to add a global variable
> mutex for this. We just have to find a better place for it :)

The biggest issue I see is that there's no C code that has knowledge of the 
whole platform. It's hard to describe this in DT in a generic way, board files 
were clearly useful for this kind 

Re: [PATCH] [media] ov5645: I2C address change

2017-10-09 Thread Laurent Pinchart
Hi Todor,

On Monday, 9 October 2017 19:18:17 EEST Todor Tomov wrote:
> On  9.10.2017 15:52, Laurent Pinchart wrote:
> > On Monday, 9 October 2017 12:34:26 EEST Sakari Ailus wrote:
> >> On Mon, Oct 09, 2017 at 11:36:01AM +0300, Todor Tomov wrote:
> >>> On  4.10.2017 13:47, Laurent Pinchart wrote:
>  CC'ing the I2C mainling list and the I2C maintainer.
>  
>  On Wednesday, 4 October 2017 13:30:08 EEST Sakari Ailus wrote:
> > On Mon, Oct 02, 2017 at 04:28:45PM +0300, Todor Tomov wrote:
> >> As soon as the sensor is powered on, change the I2C address to the
> >> one specified in DT. This allows to use multiple physical sensors
> >> connected to the same I2C bus.
> >> 
> >> Signed-off-by: Todor Tomov 
> > 
> > The smiapp driver does something similar and I understand Laurent
> > might be interested in such functionality as well.
> > 
> > It'd be nice to handle this through the I²C framework instead and to
> > define how the information is specified through DT. That way it could
> > be made generic, to work with more devices than just this one.
> > 
> > What do you think?
> >>> 
> >>> Thank you for this suggestion.
> >>> 
> >>> The way I have done it is to put the new I2C address in the DT and the
> >>> driver programs the change using the original I2C address. The original
> >>> I2C address is hardcoded in the driver. So maybe we can extend the DT
> >>> binding and the I2C framework so that both addresses come from the DT
> >>> and avoid hiding the original I2C address in the driver. This sounds
> >>> good to me.
> >> 
> >> Agreed.
> >> 
> >> In this case the address is known but in general that's not the case it's
> >> not that simple. There are register compatible devices that have
> >> different addresses even if they're the same devices.
> >> 
> >> It might be a good idea to make this explicit.
> > 
> > Yes, in the general case we need to specify the original address in DT, as
> > the chip could have a non-fixed boot-up I2C address.
> > 
> > In many cases the value of the new I2C address doesn't matter much, as
> > long as it's unique on the bus. I was thinking about implementing a
> > dynamic allocator for I2C addresses, but after discussing it with Wolfram
> > we concluded that it would probably not be a good idea. There could be
> > other I2C devices on the bus that Linux isn't aware of, in which case the
> > dynamic allocator could create address conflicts. Specifying the new
> > address in DT is likely a better idea, even if it could feel a bit more
> > of system configuration information than a pure hardware description.
> > 
> >>> Then changing the address could be device specific and also this must be
> >>> done right after power on so that there are no address conflicts. So I
> >>> don't think that we can support this through the I2C framework only, the
> >>> drivers that we want to do that will have to be expanded with this
> >>> functionality. Or do you have any other idea?
> >> 
> >> Yes, how the address is changed is always hardware specific. This would
> >> be most conveniently done in driver's probe or PM runtime_resume
> >> functions.
> > 
> > This patch modifies client->addr directly, which I don't think is a good
> > idea. I'd prefer making the I2C core aware of the address change through
> > an explicit API call. This would allow catching I2C adress conflicts for
> > instance.
> > 
> >> It could be as simple as providing an adapter specific mutex to serialise
> >> address changes on the bus so that no two address changes are taking
> >> place at the same time. Which is essentially the impliementation you had,
> >> only the mutex would be for the I²C adapter, not the driver. An helper
> >> functions for acquiring and releasing the mutex.
> > 
> > Why do you need to serialize address changes ?
> 
> Correct me if I'm wrong, but if you power on more than one device with the
> same I2C address and issue a command to change it, then all devices will
> recognize this command as addressed to them. The only solution (which I know
> about) to avoid this is to serialize the power on and address change (as a
> whole!) for these devices.

Yes, that's correct. It can be even worse than that, sometimes only one of the 
two devices with the same address can be reconfigured, which means that 
powering that device requires powering up the other device and changing its 
address first, otherwise the second device can't be used as long as the first 
one is power on (this happened for real in a Nokia platform).

> I think it would be better to move the mutex out of the driver - to avoid
> all client drivers which will change I2C address to add a global variable
> mutex for this. We just have to find a better place for it :)

The biggest issue I see is that there's no C code that has knowledge of the 
whole platform. It's hard to describe this in DT in a generic way, board files 
were clearly useful for this kind of situations.

> >> I 

Re: [PATCH] w1: keep balance of mutex locks and refcnts

2017-10-09 Thread Evgeniy Polyakov
Hi Alexey

07.10.2017, 20:59, "Alexey Khoroshilov" :
> Is it possible to switch to a nested variant:
> mutex_lock-atomic_inc-atomic_dec-mutex_unlock
> or
> atomic_inc-mutex_lock-mutex_unlock-atomic_dec
> ?

Yeah, you are right, it is a bit messy - we drop the lock while sleeping 
waiting for the bus master to complete operation,
and during this period family driver has to be referenced.

But we can easily grab the reference earlier and then try to lock the bus, so 
the second variant will work.

> --
> Alexey
>
> On 01.10.2017 08:55, Evgeniy Polyakov wrote:
>>  Hi Alex
>>
>>  29.09.2017, 23:23, "Alexey Khoroshilov" :
>>>  w1_therm_eeprom() and w1_DS18B20_precision() decrement THERM_REFCNT
>>>  on error paths, while they did not increment it yet.
>>>
>>>  read_therm() unlocks bus mutex on some error paths,
>>>  while it is not acquired.
>>>
>>>  The patch makes sure all the functions keep the balance in usage of
>>>  the mutex and the THERM_REFCNT.
>>>
>>>  Found by Linux Driver Verification project (linuxtesting.org).
>>
>>  Yes, this looks like a bug, thanks for finding it!
>>
>>  Please update your patch to use single exit point and not a mix of returns 
>> in the body of the function.
>>
>>>   ret = mutex_lock_interruptible(>bus_mutex);
>>>   if (ret != 0)
>>>  - goto post_unlock;
>>>  + return ret;
>>>
>>>   if (!sl->family_data) {
>>>  - ret = -ENODEV;
>>>  - goto pre_unlock;
>>>  + mutex_unlock(>bus_mutex);
>>>  + return -ENODEV;
>>>   }


Re: [PATCH] w1: keep balance of mutex locks and refcnts

2017-10-09 Thread Evgeniy Polyakov
Hi Alexey

07.10.2017, 20:59, "Alexey Khoroshilov" :
> Is it possible to switch to a nested variant:
> mutex_lock-atomic_inc-atomic_dec-mutex_unlock
> or
> atomic_inc-mutex_lock-mutex_unlock-atomic_dec
> ?

Yeah, you are right, it is a bit messy - we drop the lock while sleeping 
waiting for the bus master to complete operation,
and during this period family driver has to be referenced.

But we can easily grab the reference earlier and then try to lock the bus, so 
the second variant will work.

> --
> Alexey
>
> On 01.10.2017 08:55, Evgeniy Polyakov wrote:
>>  Hi Alex
>>
>>  29.09.2017, 23:23, "Alexey Khoroshilov" :
>>>  w1_therm_eeprom() and w1_DS18B20_precision() decrement THERM_REFCNT
>>>  on error paths, while they did not increment it yet.
>>>
>>>  read_therm() unlocks bus mutex on some error paths,
>>>  while it is not acquired.
>>>
>>>  The patch makes sure all the functions keep the balance in usage of
>>>  the mutex and the THERM_REFCNT.
>>>
>>>  Found by Linux Driver Verification project (linuxtesting.org).
>>
>>  Yes, this looks like a bug, thanks for finding it!
>>
>>  Please update your patch to use single exit point and not a mix of returns 
>> in the body of the function.
>>
>>>   ret = mutex_lock_interruptible(>bus_mutex);
>>>   if (ret != 0)
>>>  - goto post_unlock;
>>>  + return ret;
>>>
>>>   if (!sl->family_data) {
>>>  - ret = -ENODEV;
>>>  - goto pre_unlock;
>>>  + mutex_unlock(>bus_mutex);
>>>  + return -ENODEV;
>>>   }


Re: linux-next: manual merge of the staging tree with the media tree

2017-10-09 Thread Mark Brown
On Mon, Oct 09, 2017 at 09:05:32PM +0200, Greg KH wrote:

> What fix?  :(

It's a null diff, the change isn't needed any more.


signature.asc
Description: PGP signature


Re: linux-next: manual merge of the staging tree with the media tree

2017-10-09 Thread Mark Brown
On Mon, Oct 09, 2017 at 09:05:32PM +0200, Greg KH wrote:

> What fix?  :(

It's a null diff, the change isn't needed any more.


signature.asc
Description: PGP signature


Re: [PATCH v2] ftrace: Clear hashes of stale ips of init memory

2017-10-09 Thread Joel Fernandes
On Mon, Oct 9, 2017 at 11:53 AM, Joel Fernandes  wrote:
[..]
> +
>  void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
>  {
> unsigned long start = (unsigned long)(start_ptr);
> @@ -6076,8 +6135,12 @@ void ftrace_free_mem(struct module *mod, void 
> *start_ptr, void *end_ptr)
> struct dyn_ftrace *rec;
> struct dyn_ftrace key;
> struct ftrace_mod_map *mod_map = NULL;
> +   struct ftrace_init_func *func, *func_next;
> +   struct list_head clear_hash;
> int order;
>
> +   INIT_LIST_HEAD(_hash);
> +
> key.ip = start;
> key.flags = end;/* overload flags, as it is unsigned long */
>
> @@ -6102,6 +6165,9 @@ void ftrace_free_mem(struct module *mod, void 
> *start_ptr, void *end_ptr)
> if (!rec)
> continue;
>
> +   /* rec will be cleared from hashes after ftrace_lock unlock */
> +   add_to_clear_hash_list(_hash, rec);
> +
> if (mod_map)
> save_ftrace_mod_rec(mod_map, rec);
>
> @@ -6123,6 +6189,11 @@ void ftrace_free_mem(struct module *mod, void 
> *start_ptr, void *end_ptr)
> goto again;
> }
> mutex_unlock(_lock);
> +
> +   list_for_each_entry_safe(func, func_next, _hash, list) {

I think I screwed this list_for_each_entry up! Please ignore this
revision, I'll send another patch revision shortly.

Regards,
Joel


Re: [PATCH v2] ftrace: Clear hashes of stale ips of init memory

2017-10-09 Thread Joel Fernandes
On Mon, Oct 9, 2017 at 11:53 AM, Joel Fernandes  wrote:
[..]
> +
>  void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
>  {
> unsigned long start = (unsigned long)(start_ptr);
> @@ -6076,8 +6135,12 @@ void ftrace_free_mem(struct module *mod, void 
> *start_ptr, void *end_ptr)
> struct dyn_ftrace *rec;
> struct dyn_ftrace key;
> struct ftrace_mod_map *mod_map = NULL;
> +   struct ftrace_init_func *func, *func_next;
> +   struct list_head clear_hash;
> int order;
>
> +   INIT_LIST_HEAD(_hash);
> +
> key.ip = start;
> key.flags = end;/* overload flags, as it is unsigned long */
>
> @@ -6102,6 +6165,9 @@ void ftrace_free_mem(struct module *mod, void 
> *start_ptr, void *end_ptr)
> if (!rec)
> continue;
>
> +   /* rec will be cleared from hashes after ftrace_lock unlock */
> +   add_to_clear_hash_list(_hash, rec);
> +
> if (mod_map)
> save_ftrace_mod_rec(mod_map, rec);
>
> @@ -6123,6 +6189,11 @@ void ftrace_free_mem(struct module *mod, void 
> *start_ptr, void *end_ptr)
> goto again;
> }
> mutex_unlock(_lock);
> +
> +   list_for_each_entry_safe(func, func_next, _hash, list) {

I think I screwed this list_for_each_entry up! Please ignore this
revision, I'll send another patch revision shortly.

Regards,
Joel


[PATCH] Documentation: leds: Update 00-INDEX file

2017-10-09 Thread Jacek Anaszewski
Add missing entries for the following documentation files:

- leds-class-flash.txt
- leds-lm3556.txt
- leds-mlxcpld.txt
- ledtrig-usbport.txt
- uleds.txt

Signed-off-by: Jacek Anaszewski 
---
 Documentation/leds/00-INDEX | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/leds/00-INDEX b/Documentation/leds/00-INDEX
index b4ef1f3..ae626b2 100644
--- a/Documentation/leds/00-INDEX
+++ b/Documentation/leds/00-INDEX
@@ -4,6 +4,10 @@ leds-blinkm.txt
- Driver for BlinkM LED-devices.
 leds-class.txt
- documents LED handling under Linux.
+leds-class-flash.txt
+   - documents flash LED handling under Linux.
+leds-lm3556.txt
+   - notes on how to use the leds-lm3556 driver.
 leds-lp3944.txt
- notes on how to use the leds-lp3944 driver.
 leds-lp5521.txt
@@ -16,7 +20,13 @@ leds-lp55xx.txt
- description about lp55xx common driver.
 leds-lm3556.txt
- notes on how to use the leds-lm3556 driver.
+leds-mlxcpld.txt
+   - notes on how to use the leds-mlxcpld driver.
 ledtrig-oneshot.txt
- One-shot LED trigger for both sporadic and dense events.
 ledtrig-transient.txt
- LED Transient Trigger, one shot timer activation.
+ledtrig-usbport.txt
+   - notes on how to use the drivers/usb/core/ledtrig-usbport.c trigger.
+uleds.txt
+   - notes on how to use the uleds driver.
-- 
2.1.4



[PATCH] Documentation: leds: Update 00-INDEX file

2017-10-09 Thread Jacek Anaszewski
Add missing entries for the following documentation files:

- leds-class-flash.txt
- leds-lm3556.txt
- leds-mlxcpld.txt
- ledtrig-usbport.txt
- uleds.txt

Signed-off-by: Jacek Anaszewski 
---
 Documentation/leds/00-INDEX | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/leds/00-INDEX b/Documentation/leds/00-INDEX
index b4ef1f3..ae626b2 100644
--- a/Documentation/leds/00-INDEX
+++ b/Documentation/leds/00-INDEX
@@ -4,6 +4,10 @@ leds-blinkm.txt
- Driver for BlinkM LED-devices.
 leds-class.txt
- documents LED handling under Linux.
+leds-class-flash.txt
+   - documents flash LED handling under Linux.
+leds-lm3556.txt
+   - notes on how to use the leds-lm3556 driver.
 leds-lp3944.txt
- notes on how to use the leds-lp3944 driver.
 leds-lp5521.txt
@@ -16,7 +20,13 @@ leds-lp55xx.txt
- description about lp55xx common driver.
 leds-lm3556.txt
- notes on how to use the leds-lm3556 driver.
+leds-mlxcpld.txt
+   - notes on how to use the leds-mlxcpld driver.
 ledtrig-oneshot.txt
- One-shot LED trigger for both sporadic and dense events.
 ledtrig-transient.txt
- LED Transient Trigger, one shot timer activation.
+ledtrig-usbport.txt
+   - notes on how to use the drivers/usb/core/ledtrig-usbport.c trigger.
+uleds.txt
+   - notes on how to use the uleds driver.
-- 
2.1.4



Re: [patch v2] madvise.2: Add MADV_WIPEONFORK documentation

2017-10-09 Thread Michael Kerrisk (man-pages)
Hi Rik,

Thanks for the blazingly fast response :-)

On 9 October 2017 at 21:08, Rik van Riel  wrote:
> On Mon, 2017-10-09 at 21:06 +0200, Michael Kerrisk (man-pages) wrote:
>> Hi Rik,
>>
>> I have a follow-up question re wipe-on-fork. What are the semantics
>> for this setting with respect to fork() and exec()? That is, in the
>> child of a fork(), does the flag remain set for the specified address
>> range? (My quick read of the source suggests yes, but I have not
>> tested.) And, when we do an exec(), my assumption is that the flag is
>> cleared for the address range, but it would be good to have
>> confirmation.
>
> Indeed, on exec() the flag is cleared, because all
> memory regions get replaced on exec().

Thanks.

> The flag remains across a fork(), so if a child task
> were to fork, the memory would be empty of contents
> again in its child. This seems to most closely match
> the use case of discarding things like cryptographic
> secrets at fork time.

Thanks!

I'll add this info to the madvise(2) page.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: [patch v2] madvise.2: Add MADV_WIPEONFORK documentation

2017-10-09 Thread Michael Kerrisk (man-pages)
Hi Rik,

Thanks for the blazingly fast response :-)

On 9 October 2017 at 21:08, Rik van Riel  wrote:
> On Mon, 2017-10-09 at 21:06 +0200, Michael Kerrisk (man-pages) wrote:
>> Hi Rik,
>>
>> I have a follow-up question re wipe-on-fork. What are the semantics
>> for this setting with respect to fork() and exec()? That is, in the
>> child of a fork(), does the flag remain set for the specified address
>> range? (My quick read of the source suggests yes, but I have not
>> tested.) And, when we do an exec(), my assumption is that the flag is
>> cleared for the address range, but it would be good to have
>> confirmation.
>
> Indeed, on exec() the flag is cleared, because all
> memory regions get replaced on exec().

Thanks.

> The flag remains across a fork(), so if a child task
> were to fork, the memory would be empty of contents
> again in its child. This seems to most closely match
> the use case of discarding things like cryptographic
> secrets at fork time.

Thanks!

I'll add this info to the madvise(2) page.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: [patch v2] madvise.2: Add MADV_WIPEONFORK documentation

2017-10-09 Thread Rik van Riel
On Mon, 2017-10-09 at 21:06 +0200, Michael Kerrisk (man-pages) wrote:
> Hi Rik,
> 
> I have a follow-up question re wipe-on-fork. What are the semantics
> for this setting with respect to fork() and exec()? That is, in the
> child of a fork(), does the flag remain set for the specified address
> range? (My quick read of the source suggests yes, but I have not
> tested.) And, when we do an exec(), my assumption is that the flag is
> cleared for the address range, but it would be good to have
> confirmation.

Indeed, on exec() the flag is cleared, because all
memory regions get replaced on exec().

The flag remains across a fork(), so if a child task
were to fork, the memory would be empty of contents
again in its child. This seems to most closely match
the use case of discarding things like cryptographic
secrets at fork time.


Re: [patch v2] madvise.2: Add MADV_WIPEONFORK documentation

2017-10-09 Thread Rik van Riel
On Mon, 2017-10-09 at 21:06 +0200, Michael Kerrisk (man-pages) wrote:
> Hi Rik,
> 
> I have a follow-up question re wipe-on-fork. What are the semantics
> for this setting with respect to fork() and exec()? That is, in the
> child of a fork(), does the flag remain set for the specified address
> range? (My quick read of the source suggests yes, but I have not
> tested.) And, when we do an exec(), my assumption is that the flag is
> cleared for the address range, but it would be good to have
> confirmation.

Indeed, on exec() the flag is cleared, because all
memory regions get replaced on exec().

The flag remains across a fork(), so if a child task
were to fork, the memory would be empty of contents
again in its child. This seems to most closely match
the use case of discarding things like cryptographic
secrets at fork time.


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Pavel Tatashin
>
> Ok, but I'm still missing why you think that is needed. What would be the
> second page table walker that needs implementing?
>
> I guess we could implement that on arm64 using our current vmemmap_populate
> logic and an explicit memset.
>

Hi Will,

What do you mean by explicit memset()? We can't simply memset() from
start to end without doing the page table walk, because at the time
kasan is calling vmemmap_populate() we have a tmp_pg_dir instead of
swapper_pg_dir.

We could do the explicit memset() after
cpu_replace_ttbr1(lm_alias(swapper_pg_dir)); but again, this was in
one of my previous implementations, and I was asked to replace that.

Pavel


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Pavel Tatashin
>
> Ok, but I'm still missing why you think that is needed. What would be the
> second page table walker that needs implementing?
>
> I guess we could implement that on arm64 using our current vmemmap_populate
> logic and an explicit memset.
>

Hi Will,

What do you mean by explicit memset()? We can't simply memset() from
start to end without doing the page table walk, because at the time
kasan is calling vmemmap_populate() we have a tmp_pg_dir instead of
swapper_pg_dir.

We could do the explicit memset() after
cpu_replace_ttbr1(lm_alias(swapper_pg_dir)); but again, this was in
one of my previous implementations, and I was asked to replace that.

Pavel


Re: [patch v2] madvise.2: Add MADV_WIPEONFORK documentation

2017-10-09 Thread Michael Kerrisk (man-pages)
Hi Rik,

I have a follow-up question re wipe-on-fork. What are the semantics
for this setting with respect to fork() and exec()? That is, in the
child of a fork(), does the flag remain set for the specified address
range? (My quick read of the source suggests yes, but I have not
tested.) And, when we do an exec(), my assumption is that the flag is
cleared for the address range, but it would be good to have
confirmation.

Thanks,

Michael


On 19 September 2017 at 21:21, Rik van Riel  wrote:
> On Tue, 2017-09-19 at 21:07 +0200, Michael Kerrisk (man-pages) wrote:
>
>> Thanks. I applied this, and tweaked the madvise.2 text a little, to
>> read as follows (please let me know if I messed anything up):
>>
>>MADV_WIPEONFORK (since Linux 4.14)
>>   Present the child process with zero-filled
>> memory  in  this
>>   range  after  a fork(2).  This is useful in forking
>> servers
>>   in order to ensure that  sensitive  per-
>> process  data  (for
>>   example,  PRNG  seeds, cryptographic secrets, and so
>> on) is
>>   not handed to child processes.
>>
>>   The MADV_WIPEONFORK operation can be applied
>> only  to  pri‐
>>   vate anonymous pages (see mmap(2)).
>
> That looks great. Thank you, Michael!
>
> --
> All rights reversed



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: [patch v2] madvise.2: Add MADV_WIPEONFORK documentation

2017-10-09 Thread Michael Kerrisk (man-pages)
Hi Rik,

I have a follow-up question re wipe-on-fork. What are the semantics
for this setting with respect to fork() and exec()? That is, in the
child of a fork(), does the flag remain set for the specified address
range? (My quick read of the source suggests yes, but I have not
tested.) And, when we do an exec(), my assumption is that the flag is
cleared for the address range, but it would be good to have
confirmation.

Thanks,

Michael


On 19 September 2017 at 21:21, Rik van Riel  wrote:
> On Tue, 2017-09-19 at 21:07 +0200, Michael Kerrisk (man-pages) wrote:
>
>> Thanks. I applied this, and tweaked the madvise.2 text a little, to
>> read as follows (please let me know if I messed anything up):
>>
>>MADV_WIPEONFORK (since Linux 4.14)
>>   Present the child process with zero-filled
>> memory  in  this
>>   range  after  a fork(2).  This is useful in forking
>> servers
>>   in order to ensure that  sensitive  per-
>> process  data  (for
>>   example,  PRNG  seeds, cryptographic secrets, and so
>> on) is
>>   not handed to child processes.
>>
>>   The MADV_WIPEONFORK operation can be applied
>> only  to  pri‐
>>   vate anonymous pages (see mmap(2)).
>
> That looks great. Thank you, Michael!
>
> --
> All rights reversed



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


[PATCH] watchdog: hpwdt: change maintainer.

2017-10-09 Thread Jerry Hoemann
Signed-off-by: Jerry Hoemann 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2d3d750..e7dc993 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6114,7 +6114,7 @@ S:Odd Fixes
 F: drivers/media/usb/hdpvr/
 
 HEWLETT PACKARD ENTERPRISE ILO NMI WATCHDOG DRIVER
-M: Jimmy Vance 
+M: Jerry Hoemann 
 S: Supported
 F: Documentation/watchdog/hpwdt.txt
 F: drivers/watchdog/hpwdt.c
-- 
1.8.3.1



[PATCH] watchdog: hpwdt: change maintainer.

2017-10-09 Thread Jerry Hoemann
Signed-off-by: Jerry Hoemann 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2d3d750..e7dc993 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6114,7 +6114,7 @@ S:Odd Fixes
 F: drivers/media/usb/hdpvr/
 
 HEWLETT PACKARD ENTERPRISE ILO NMI WATCHDOG DRIVER
-M: Jimmy Vance 
+M: Jerry Hoemann 
 S: Supported
 F: Documentation/watchdog/hpwdt.txt
 F: drivers/watchdog/hpwdt.c
-- 
1.8.3.1



Re: linux-next: manual merge of the staging tree with the media tree

2017-10-09 Thread Greg KH
On Mon, Oct 09, 2017 at 07:26:54PM +0100, Mark Brown wrote:
> Hi Greg,
> 
> Today's linux-next merge of the staging tree got a conflict in:
> 
>   drivers/staging/media/atomisp/pci/atomisp2/css2400/sh_css_firmware.c
> 
> between commit:
> 
>866af46e6ebbc ("media: Staging: atomisp: fix alloc_cast.cocci warnings")
> 
> from the media tree and commit:
> 
>4d962df5a7771 ("atomisp2: remove cast from memory allocation")
> 
> from the staging tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.


What fix?  :(


Re: linux-next: manual merge of the staging tree with the media tree

2017-10-09 Thread Greg KH
On Mon, Oct 09, 2017 at 07:26:54PM +0100, Mark Brown wrote:
> Hi Greg,
> 
> Today's linux-next merge of the staging tree got a conflict in:
> 
>   drivers/staging/media/atomisp/pci/atomisp2/css2400/sh_css_firmware.c
> 
> between commit:
> 
>866af46e6ebbc ("media: Staging: atomisp: fix alloc_cast.cocci warnings")
> 
> from the media tree and commit:
> 
>4d962df5a7771 ("atomisp2: remove cast from memory allocation")
> 
> from the staging tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.


What fix?  :(


Re: [PATCH 08/13] net/ipv4/tcp_input.c: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
On Mon, Oct 09, 2017 at 07:28:45PM +0100, Mark Rutland wrote:

> - ACCESS_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
> - sk->sk_max_pacing_rate);
> + WRITE_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
> +sk->sk_max_pacing_rate);

Sorry, I messed this up when attempting to fix the horizontal alignment
of the min_t() parameters.

I've pushed out a corrected version to my core/access-once branch [1].

Thanks,
Mark.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git 
core/access-once


Re: [PATCH 08/13] net/ipv4/tcp_input.c: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
On Mon, Oct 09, 2017 at 07:28:45PM +0100, Mark Rutland wrote:

> - ACCESS_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
> - sk->sk_max_pacing_rate);
> + WRITE_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
> +sk->sk_max_pacing_rate);

Sorry, I messed this up when attempting to fix the horizontal alignment
of the min_t() parameters.

I've pushed out a corrected version to my core/access-once branch [1].

Thanks,
Mark.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git 
core/access-once


Re: [PATCH] perf tools: unbreak perf record for arm/arm64

2017-10-09 Thread Will Deacon
On Mon, Oct 09, 2017 at 04:00:00PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Oct 09, 2017 at 11:22:00AM +0100, Will Deacon escreveu:
> > On Fri, Oct 06, 2017 at 07:38:22PM +0100, Mark Rutland wrote:
> > > Currently, perf record is broken on arm/arm64 systems when the PMU is
> > > specified explicitly as part of the event, e.g.
> > > 
> > > $ ./perf record -e armv8_cortex_a53/cpu_cycles/u true
> > > 
> > > In such cases, perf record fails to open events unless
> > > perf_event_paranoid is set to -1, even if the PMU in question supports
> > > mode exclusion. Further, even when perf_event_paranoid is toggled, no
> > > samples are recorded.
> > > 
> > > This is an unintended side effect of commit:
> > > 
> > >   e3ba76deef23064f ("perf tools: Force uncore events to system wide 
> > > monitoring)
> > > 
> > > ... which assumes that if a PMU has an associated cpu_map, it is an
> > > uncore PMU, and forces events for such PMUs to be system-wide.
> > > 
> > > This is not true for arm/arm64 systems, which can have heterogeneous
> > > CPUs. To account for this, multiple CPU PMUs are exposed, each with a
> > > "cpus" field under sysfs, which the perf tool parses into a cpu_map. ARM
> > > PMUs do not have a "cpumask" file, and only have a "cpus" file. For the
> > > gory details as to why, see commit:
> > > 
> > >  7e3fcffe95544010 ("perf pmu: Support alternative sysfs cpumask")
> > > 
> > > Given all of this, we can instead identify uncore PMUs by explicitly
> > > checking for a "cpumask" file, and restore arm/arm64 PMU support back to
> > > a working state. This patch does so, adding a new perf_pmu::is_uncore
> > > field, and splitting the existing cpumask parsing so that it can be
> > > reused.
> > > 
> > > Signed-off-by: Mark Rutland 
> > > Fixes: e3ba76deef23064f ("perf tools: Force uncore events to system wide 
> > > monitoring)
> > 
> > It sucks that we haven't noticed this being broken for so long, but I can
> > confirm that this fixes the issue:
> > 
> > Acked-by: Will Deacon 
> > Tested-by Will Deacon 
> > 
> > Any chance we can get this into 4.14? You'll probably need to do some stable
> > backports too, since this is a bit spread out.
> 
> Sure, I've added this to my perf/urgent branch, that, together with a
> few other fixes is undergoing testing now.

Thanks, Arnaldo.

Will


Re: [PATCH] perf tools: unbreak perf record for arm/arm64

2017-10-09 Thread Will Deacon
On Mon, Oct 09, 2017 at 04:00:00PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Oct 09, 2017 at 11:22:00AM +0100, Will Deacon escreveu:
> > On Fri, Oct 06, 2017 at 07:38:22PM +0100, Mark Rutland wrote:
> > > Currently, perf record is broken on arm/arm64 systems when the PMU is
> > > specified explicitly as part of the event, e.g.
> > > 
> > > $ ./perf record -e armv8_cortex_a53/cpu_cycles/u true
> > > 
> > > In such cases, perf record fails to open events unless
> > > perf_event_paranoid is set to -1, even if the PMU in question supports
> > > mode exclusion. Further, even when perf_event_paranoid is toggled, no
> > > samples are recorded.
> > > 
> > > This is an unintended side effect of commit:
> > > 
> > >   e3ba76deef23064f ("perf tools: Force uncore events to system wide 
> > > monitoring)
> > > 
> > > ... which assumes that if a PMU has an associated cpu_map, it is an
> > > uncore PMU, and forces events for such PMUs to be system-wide.
> > > 
> > > This is not true for arm/arm64 systems, which can have heterogeneous
> > > CPUs. To account for this, multiple CPU PMUs are exposed, each with a
> > > "cpus" field under sysfs, which the perf tool parses into a cpu_map. ARM
> > > PMUs do not have a "cpumask" file, and only have a "cpus" file. For the
> > > gory details as to why, see commit:
> > > 
> > >  7e3fcffe95544010 ("perf pmu: Support alternative sysfs cpumask")
> > > 
> > > Given all of this, we can instead identify uncore PMUs by explicitly
> > > checking for a "cpumask" file, and restore arm/arm64 PMU support back to
> > > a working state. This patch does so, adding a new perf_pmu::is_uncore
> > > field, and splitting the existing cpumask parsing so that it can be
> > > reused.
> > > 
> > > Signed-off-by: Mark Rutland 
> > > Fixes: e3ba76deef23064f ("perf tools: Force uncore events to system wide 
> > > monitoring)
> > 
> > It sucks that we haven't noticed this being broken for so long, but I can
> > confirm that this fixes the issue:
> > 
> > Acked-by: Will Deacon 
> > Tested-by Will Deacon 
> > 
> > Any chance we can get this into 4.14? You'll probably need to do some stable
> > backports too, since this is a bit spread out.
> 
> Sure, I've added this to my perf/urgent branch, that, together with a
> few other fixes is undergoing testing now.

Thanks, Arnaldo.

Will


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Will Deacon
Hi Pavel,

On Mon, Oct 09, 2017 at 02:59:09PM -0400, Pavel Tatashin wrote:
> > We have two table walks even with your patch series applied afaict: one in
> > our definition of vmemmap_populate (arch/arm64/mm/mmu.c) and this one
> > in the core code.
> 
> I meant to say implementing two new page table walkers, not at runtime.

Ok, but I'm still missing why you think that is needed. What would be the
second page table walker that needs implementing?

> > My worry is that these are actually highly arch-specific, but will likely
> > grow more users in mm/ that assume things for all architectures that aren't
> > necessarily valid.
> 
> I see, how about moving new kasan_map_populate() implementation into
> arch dependent code:
> 
> arch/x86/mm/kasan_init_64.c
> arch/arm64/mm/kasan_init.c
> 
> This way we won't need to add pmd_large()/pud_large() macros for arm64?

I guess we could implement that on arm64 using our current vmemmap_populate
logic and an explicit memset.

Will


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Will Deacon
Hi Pavel,

On Mon, Oct 09, 2017 at 02:59:09PM -0400, Pavel Tatashin wrote:
> > We have two table walks even with your patch series applied afaict: one in
> > our definition of vmemmap_populate (arch/arm64/mm/mmu.c) and this one
> > in the core code.
> 
> I meant to say implementing two new page table walkers, not at runtime.

Ok, but I'm still missing why you think that is needed. What would be the
second page table walker that needs implementing?

> > My worry is that these are actually highly arch-specific, but will likely
> > grow more users in mm/ that assume things for all architectures that aren't
> > necessarily valid.
> 
> I see, how about moving new kasan_map_populate() implementation into
> arch dependent code:
> 
> arch/x86/mm/kasan_init_64.c
> arch/arm64/mm/kasan_init.c
> 
> This way we won't need to add pmd_large()/pud_large() macros for arm64?

I guess we could implement that on arm64 using our current vmemmap_populate
logic and an explicit memset.

Will


Re: [Outreachy kernel] Re: [PATCH v2] drm/tegra: Replace dev_* with DRM_DEV_*

2017-10-09 Thread Sean Paul
On Mon, Sep 25, 2017 at 3:38 AM, Thierry Reding
 wrote:
> On Sun, Sep 24, 2017 at 10:13:57PM +0530, Harsha Sharma wrote:
>> Replace all occurences of dev_info/err/dbg with DRM_DEV_INFO/
>> ERROR/DEBUG as we have DRM_DEV_* variants of drm print macros
>> Done using following coccinelle semantic patch
>>
>> @r@
>> @@
>>
>> (
>> -dev_info
>> +DRM_DEV_INFO
>> |
>> -dev_err
>> +DRM_DEV_ERROR
>> |
>> -dev_dbg
>> +DRM_DEV_DEBUG
>> )
>>
>> Signed-off-by: Harsha Sharma 
>> ---
>> Changes in v2:
>>  -Break line over 80 characters
>>  -Changes in comments not required
>
> Please don't do this. Most of the functions that you're trying to
> replace here are not DRM_DEV_*() for a very specific reason: none of
> them have anything to do with DRM/KMS in particular. This is important,
> in my opinion, because these messages are very device-specific and the
> additional information added by the DRM format string aren't useful in
> the context.

Hey Thierry,
It's likely not useful to a tegra expert such as yourself. However,
when I'm switching between platforms or providing log-parsing
instructions, it's very useful to say "grep for drm". Without the
common drm prefix, the reader needs to know they're looking for
tegra-sor, tegra-dc, etc as well.

Not a big deal, I just wanted to provide color for why someone might want this.

>
> Perhaps the only ones I consider to be good candidates for this
> conversion are the ones in drivers/gpu/drm/tegra/fb.c because they deal
> with the DRM fbdev setup and hence are not device specific. And even in
> those cases I'm not sure we gain very much by this conversion,
> especially since most of the replacements now end up having to split up
> argument lists.
>
> Sorry if this isn't documented anywhere. I also suspect other driver
> maintainers will be less picky about this sort of thing, so you might
> have more luck there.

I think the TODO entry states that contributors should check with the
driver maintainers before taking on this work, so it is documented :)

Sean


>
> Thierry
>
> --
> You received this message because you are subscribed to the Google Groups 
> "outreachy-kernel" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to outreachy-kernel+unsubscr...@googlegroups.com.
> To post to this group, send email to outreachy-ker...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/outreachy-kernel/20170925073841.GA12494%40ulmo.fritz.box.
> For more options, visit https://groups.google.com/d/optout.


Re: [Outreachy kernel] Re: [PATCH v2] drm/tegra: Replace dev_* with DRM_DEV_*

2017-10-09 Thread Sean Paul
On Mon, Sep 25, 2017 at 3:38 AM, Thierry Reding
 wrote:
> On Sun, Sep 24, 2017 at 10:13:57PM +0530, Harsha Sharma wrote:
>> Replace all occurences of dev_info/err/dbg with DRM_DEV_INFO/
>> ERROR/DEBUG as we have DRM_DEV_* variants of drm print macros
>> Done using following coccinelle semantic patch
>>
>> @r@
>> @@
>>
>> (
>> -dev_info
>> +DRM_DEV_INFO
>> |
>> -dev_err
>> +DRM_DEV_ERROR
>> |
>> -dev_dbg
>> +DRM_DEV_DEBUG
>> )
>>
>> Signed-off-by: Harsha Sharma 
>> ---
>> Changes in v2:
>>  -Break line over 80 characters
>>  -Changes in comments not required
>
> Please don't do this. Most of the functions that you're trying to
> replace here are not DRM_DEV_*() for a very specific reason: none of
> them have anything to do with DRM/KMS in particular. This is important,
> in my opinion, because these messages are very device-specific and the
> additional information added by the DRM format string aren't useful in
> the context.

Hey Thierry,
It's likely not useful to a tegra expert such as yourself. However,
when I'm switching between platforms or providing log-parsing
instructions, it's very useful to say "grep for drm". Without the
common drm prefix, the reader needs to know they're looking for
tegra-sor, tegra-dc, etc as well.

Not a big deal, I just wanted to provide color for why someone might want this.

>
> Perhaps the only ones I consider to be good candidates for this
> conversion are the ones in drivers/gpu/drm/tegra/fb.c because they deal
> with the DRM fbdev setup and hence are not device specific. And even in
> those cases I'm not sure we gain very much by this conversion,
> especially since most of the replacements now end up having to split up
> argument lists.
>
> Sorry if this isn't documented anywhere. I also suspect other driver
> maintainers will be less picky about this sort of thing, so you might
> have more luck there.

I think the TODO entry states that contributors should check with the
driver maintainers before taking on this work, so it is documented :)

Sean


>
> Thierry
>
> --
> You received this message because you are subscribed to the Google Groups 
> "outreachy-kernel" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to outreachy-kernel+unsubscr...@googlegroups.com.
> To post to this group, send email to outreachy-ker...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/outreachy-kernel/20170925073841.GA12494%40ulmo.fritz.box.
> For more options, visit https://groups.google.com/d/optout.


Re: [PATCH 2/2] of/fdt: skip unflattening of disabled nodes

2017-10-09 Thread Rob Herring
On Sun, Oct 8, 2017 at 5:57 PM, Frank Rowand  wrote:
> On 10/03/17 09:18, Rob Herring wrote:
>> For static DT usecases, we don't need the disabled nodes and can skip
>> unflattening. This saves a significant amount of RAM in memory constrained
>> cases. In one example on STM32F469, the RAM usage goes from 118K to 26K.
>>
>> There are a few cases in the kernel that modify the status property
>> dynamically. These all are changes from enabled to disabled, depend on
>> OF_DYNAMIC or are not FDT based (PDT based).
>>
>> Cc: Nicolas Pitre 
>> Cc: Frank Rowand 
>> Signed-off-by: Rob Herring 
>> ---
>> For more background, see this presentation from Nico:
>>
>> https://connect.linaro.org/resource/sfo17/sfo17-100/
>>
>>  drivers/of/fdt.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>> index f8c39705418b..efe91c6856a0 100644
>> --- a/drivers/of/fdt.c
>> +++ b/drivers/of/fdt.c
>> @@ -396,6 +396,10 @@ static int unflatten_dt_nodes(const void *blob,
>>   if (WARN_ON_ONCE(depth >= FDT_MAX_DEPTH))
>>   continue;
>>
>> + if (!IS_ENABLED(CONFIG_OF_DYNAMIC) &&
>> + !of_fdt_device_is_available(blob, offset))
>> + continue;
>> +
>>   if (!populate_node(blob, offset, , nps[depth],
>>  [depth+1], dryrun))
>>   return mem - base;
>>
>
> Hi Rob,
>
> I strongly support the idea of this patch, but there may be an
> issue we have to resolve.  I'm pretty sure we had talked about
> the issue a long time ago, and it has been sitting on my todo
> list.
>
> We have two sets of node traversal macros and functions.  One
> set honors the status property, and the other ignores it.  If
> I recall our previous discussion properly, we want the normal
> usage to honor the status property and only a tiny (or maybe
> non-existent) set of locations to be allowed to ignore the
> status property.

Ignoring status is a bug for a static DT. There could be places that
expect the node to be present, but disabled. Those may be bugs too.

> A rough sense of how often the status property is honored or
> not is:
>
>$ git grep for_each_child_of_node | wc -l
>293
>$ git grep of_get_next_child | wc -l
>103
>
>$ git grep for_each_available_child_of_node | wc -l
>106
>$ git grep of_get_next_available_child | wc -l
>20
>
> Many of the cases where the status flag is ignored will not
> actually encounter a node that is not available, so many of
> the cases where status is not checked could currently be
> checking status.

For many nodes, status simply makes no sense or at least is undefined.

> And just for completeness, there are a number of standalone
> checks for whether a node is available:
>
>$ git grep of_device_is_available | wc -l
>128

I'm surprised it's that many. It's a low-level detail that the core
should handle. We'd also need to make things like of_find_node_by_name
honor status.

> It will be a pain to manually check all of the sites that
> ignore the status property, but that task should be done.
>
> In the mean time, maybe we could flush out the few cases
> that currently depend on ignoring the status property by
>
>- making for_each_child_of_node() and of_get_next_child()
>  actually check for valid status
>
>- provide a temporary (one or two kernel release)
>  CONFIG option to allow the old behavior for
>  for_each_child_of_node() and of_get_next_child()
>  just in case we miss any locations that need to
>  be fixed
>
>- fix up the few places in core device tree code that
>  actually need to ignore status (if such places exist)
>
> In the end, the *_available_*() interfaces should be
> removed, because the normal behavior of node traversal
> should be to only traverse nodes that are available.

I'm not sure this is really something we want or need to fix.

I could just make this depend on OF_KOBJ instead. Then practically no
one would see any change as almost everyone enables sysfs (and in turn
/proc/device-tree).

Rob


Re: [PATCH 2/2] of/fdt: skip unflattening of disabled nodes

2017-10-09 Thread Rob Herring
On Sun, Oct 8, 2017 at 5:57 PM, Frank Rowand  wrote:
> On 10/03/17 09:18, Rob Herring wrote:
>> For static DT usecases, we don't need the disabled nodes and can skip
>> unflattening. This saves a significant amount of RAM in memory constrained
>> cases. In one example on STM32F469, the RAM usage goes from 118K to 26K.
>>
>> There are a few cases in the kernel that modify the status property
>> dynamically. These all are changes from enabled to disabled, depend on
>> OF_DYNAMIC or are not FDT based (PDT based).
>>
>> Cc: Nicolas Pitre 
>> Cc: Frank Rowand 
>> Signed-off-by: Rob Herring 
>> ---
>> For more background, see this presentation from Nico:
>>
>> https://connect.linaro.org/resource/sfo17/sfo17-100/
>>
>>  drivers/of/fdt.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>> index f8c39705418b..efe91c6856a0 100644
>> --- a/drivers/of/fdt.c
>> +++ b/drivers/of/fdt.c
>> @@ -396,6 +396,10 @@ static int unflatten_dt_nodes(const void *blob,
>>   if (WARN_ON_ONCE(depth >= FDT_MAX_DEPTH))
>>   continue;
>>
>> + if (!IS_ENABLED(CONFIG_OF_DYNAMIC) &&
>> + !of_fdt_device_is_available(blob, offset))
>> + continue;
>> +
>>   if (!populate_node(blob, offset, , nps[depth],
>>  [depth+1], dryrun))
>>   return mem - base;
>>
>
> Hi Rob,
>
> I strongly support the idea of this patch, but there may be an
> issue we have to resolve.  I'm pretty sure we had talked about
> the issue a long time ago, and it has been sitting on my todo
> list.
>
> We have two sets of node traversal macros and functions.  One
> set honors the status property, and the other ignores it.  If
> I recall our previous discussion properly, we want the normal
> usage to honor the status property and only a tiny (or maybe
> non-existent) set of locations to be allowed to ignore the
> status property.

Ignoring status is a bug for a static DT. There could be places that
expect the node to be present, but disabled. Those may be bugs too.

> A rough sense of how often the status property is honored or
> not is:
>
>$ git grep for_each_child_of_node | wc -l
>293
>$ git grep of_get_next_child | wc -l
>103
>
>$ git grep for_each_available_child_of_node | wc -l
>106
>$ git grep of_get_next_available_child | wc -l
>20
>
> Many of the cases where the status flag is ignored will not
> actually encounter a node that is not available, so many of
> the cases where status is not checked could currently be
> checking status.

For many nodes, status simply makes no sense or at least is undefined.

> And just for completeness, there are a number of standalone
> checks for whether a node is available:
>
>$ git grep of_device_is_available | wc -l
>128

I'm surprised it's that many. It's a low-level detail that the core
should handle. We'd also need to make things like of_find_node_by_name
honor status.

> It will be a pain to manually check all of the sites that
> ignore the status property, but that task should be done.
>
> In the mean time, maybe we could flush out the few cases
> that currently depend on ignoring the status property by
>
>- making for_each_child_of_node() and of_get_next_child()
>  actually check for valid status
>
>- provide a temporary (one or two kernel release)
>  CONFIG option to allow the old behavior for
>  for_each_child_of_node() and of_get_next_child()
>  just in case we miss any locations that need to
>  be fixed
>
>- fix up the few places in core device tree code that
>  actually need to ignore status (if such places exist)
>
> In the end, the *_available_*() interfaces should be
> removed, because the normal behavior of node traversal
> should be to only traverse nodes that are available.

I'm not sure this is really something we want or need to fix.

I could just make this depend on OF_KOBJ instead. Then practically no
one would see any change as almost everyone enables sysfs (and in turn
/proc/device-tree).

Rob


Re: [PATCH] perf tools: unbreak perf record for arm/arm64

2017-10-09 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 09, 2017 at 11:22:00AM +0100, Will Deacon escreveu:
> On Fri, Oct 06, 2017 at 07:38:22PM +0100, Mark Rutland wrote:
> > Currently, perf record is broken on arm/arm64 systems when the PMU is
> > specified explicitly as part of the event, e.g.
> > 
> > $ ./perf record -e armv8_cortex_a53/cpu_cycles/u true
> > 
> > In such cases, perf record fails to open events unless
> > perf_event_paranoid is set to -1, even if the PMU in question supports
> > mode exclusion. Further, even when perf_event_paranoid is toggled, no
> > samples are recorded.
> > 
> > This is an unintended side effect of commit:
> > 
> >   e3ba76deef23064f ("perf tools: Force uncore events to system wide 
> > monitoring)
> > 
> > ... which assumes that if a PMU has an associated cpu_map, it is an
> > uncore PMU, and forces events for such PMUs to be system-wide.
> > 
> > This is not true for arm/arm64 systems, which can have heterogeneous
> > CPUs. To account for this, multiple CPU PMUs are exposed, each with a
> > "cpus" field under sysfs, which the perf tool parses into a cpu_map. ARM
> > PMUs do not have a "cpumask" file, and only have a "cpus" file. For the
> > gory details as to why, see commit:
> > 
> >  7e3fcffe95544010 ("perf pmu: Support alternative sysfs cpumask")
> > 
> > Given all of this, we can instead identify uncore PMUs by explicitly
> > checking for a "cpumask" file, and restore arm/arm64 PMU support back to
> > a working state. This patch does so, adding a new perf_pmu::is_uncore
> > field, and splitting the existing cpumask parsing so that it can be
> > reused.
> > 
> > Signed-off-by: Mark Rutland 
> > Fixes: e3ba76deef23064f ("perf tools: Force uncore events to system wide 
> > monitoring)
> 
> It sucks that we haven't noticed this being broken for so long, but I can
> confirm that this fixes the issue:
> 
> Acked-by: Will Deacon 
> Tested-by Will Deacon 
> 
> Any chance we can get this into 4.14? You'll probably need to do some stable
> backports too, since this is a bit spread out.

Sure, I've added this to my perf/urgent branch, that, together with a
few other fixes is undergoing testing now.

- Arnaldo


Re: [PATCH] perf tools: unbreak perf record for arm/arm64

2017-10-09 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 09, 2017 at 11:22:00AM +0100, Will Deacon escreveu:
> On Fri, Oct 06, 2017 at 07:38:22PM +0100, Mark Rutland wrote:
> > Currently, perf record is broken on arm/arm64 systems when the PMU is
> > specified explicitly as part of the event, e.g.
> > 
> > $ ./perf record -e armv8_cortex_a53/cpu_cycles/u true
> > 
> > In such cases, perf record fails to open events unless
> > perf_event_paranoid is set to -1, even if the PMU in question supports
> > mode exclusion. Further, even when perf_event_paranoid is toggled, no
> > samples are recorded.
> > 
> > This is an unintended side effect of commit:
> > 
> >   e3ba76deef23064f ("perf tools: Force uncore events to system wide 
> > monitoring)
> > 
> > ... which assumes that if a PMU has an associated cpu_map, it is an
> > uncore PMU, and forces events for such PMUs to be system-wide.
> > 
> > This is not true for arm/arm64 systems, which can have heterogeneous
> > CPUs. To account for this, multiple CPU PMUs are exposed, each with a
> > "cpus" field under sysfs, which the perf tool parses into a cpu_map. ARM
> > PMUs do not have a "cpumask" file, and only have a "cpus" file. For the
> > gory details as to why, see commit:
> > 
> >  7e3fcffe95544010 ("perf pmu: Support alternative sysfs cpumask")
> > 
> > Given all of this, we can instead identify uncore PMUs by explicitly
> > checking for a "cpumask" file, and restore arm/arm64 PMU support back to
> > a working state. This patch does so, adding a new perf_pmu::is_uncore
> > field, and splitting the existing cpumask parsing so that it can be
> > reused.
> > 
> > Signed-off-by: Mark Rutland 
> > Fixes: e3ba76deef23064f ("perf tools: Force uncore events to system wide 
> > monitoring)
> 
> It sucks that we haven't noticed this being broken for so long, but I can
> confirm that this fixes the issue:
> 
> Acked-by: Will Deacon 
> Tested-by Will Deacon 
> 
> Any chance we can get this into 4.14? You'll probably need to do some stable
> backports too, since this is a bit spread out.

Sure, I've added this to my perf/urgent branch, that, together with a
few other fixes is undergoing testing now.

- Arnaldo


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Pavel Tatashin
Hi Will,

> We have two table walks even with your patch series applied afaict: one in
> our definition of vmemmap_populate (arch/arm64/mm/mmu.c) and this one
> in the core code.

I meant to say implementing two new page table walkers, not at runtime.

> My worry is that these are actually highly arch-specific, but will likely
> grow more users in mm/ that assume things for all architectures that aren't
> necessarily valid.

I see, how about moving new kasan_map_populate() implementation into
arch dependent code:

arch/x86/mm/kasan_init_64.c
arch/arm64/mm/kasan_init.c

This way we won't need to add pmd_large()/pud_large() macros for arm64?

Pavel


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Pavel Tatashin
Hi Will,

> We have two table walks even with your patch series applied afaict: one in
> our definition of vmemmap_populate (arch/arm64/mm/mmu.c) and this one
> in the core code.

I meant to say implementing two new page table walkers, not at runtime.

> My worry is that these are actually highly arch-specific, but will likely
> grow more users in mm/ that assume things for all architectures that aren't
> necessarily valid.

I see, how about moving new kasan_map_populate() implementation into
arch dependent code:

arch/x86/mm/kasan_init_64.c
arch/arm64/mm/kasan_init.c

This way we won't need to add pmd_large()/pud_large() macros for arm64?

Pavel


[PATCH v2] ftrace: Clear hashes of stale ips of init memory

2017-10-09 Thread Joel Fernandes
Filters should be cleared of init functions during freeing of init
memory when the ftrace dyn records are released. However in current
code, the filters are left as is. This patch clears the hashes of the
saved init functions when the init memory is freed. This fixes the
following issue reproducible with the following sequence of commands for
a test module:


void bar(void)
{
printk(KERN_INFO "bar!\n");
}

void foo(void)
{
printk(KERN_INFO "foo!\n");
bar();
}

static int __init hello_init(void)
{
printk(KERN_INFO "Hello world!\n");
foo();
return 0;
}

static void __exit hello_cleanup(void)
{
printk(KERN_INFO "Cleaning up module.\n");
}

module_init(hello_init);
module_exit(hello_cleanup);


Commands:
echo '*:mod:test' > /d/tracing/set_ftrace_filter
echo function > /d/tracing/current_tracer
cat /d/tracing/set_ftrace_filter
modprobe test
cat /d/tracing/trace
cat /d/tracing/set_ftrace_filter
rmmod test
cat /d/tracing/set_ftrace_filter
sleep 1
modprobe test
cat /d/tracing/trace
cat /d/tracing/set_ftrace_filter

Behavior without patch: Init function is still in the filter
Expected behavior: Shouldn't have any of the filters set

Cc: Jessica Yu 
Cc: Steven Rostedt 
Signed-off-by: Joel Fernandes 
---
 kernel/trace/ftrace.c | 71 +++
 1 file changed, 71 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 9e99bd55732e..ac7e68c5e2d2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6067,6 +6067,65 @@ allocate_ftrace_mod_map(struct module *mod,
 }
 #endif /* CONFIG_MODULES */
 
+struct ftrace_init_func {
+   struct list_head list;
+   unsigned long ip;
+};
+
+/* Clear any init ips from hashes */
+static void
+clear_func_from_hash(struct ftrace_init_func *func, struct ftrace_hash *hash)
+{
+   struct ftrace_func_entry *entry;
+
+   if (ftrace_hash_empty(hash))
+   return;
+
+   printk("clear func %ps from hash\n", (void *)func->ip);
+
+   entry = __ftrace_lookup_ip(hash, func->ip);
+
+   /*
+* Do not allow this rec to match again.
+* Yeah, it may waste some memory, but will be removed
+* if/when the hash is modified again.
+*/
+   if (entry)
+   entry->ip = 0;
+}
+
+static void
+clear_func_from_hashes(struct ftrace_init_func *func)
+{
+   struct trace_array *tr;
+
+   mutex_lock(_types_lock);
+   list_for_each_entry(tr, _trace_arrays, list) {
+   if (!tr->ops || !tr->ops->func_hash)
+   continue;
+   mutex_lock(>ops->func_hash->regex_lock);
+   clear_func_from_hash(func, tr->ops->func_hash->filter_hash);
+   clear_func_from_hash(func, tr->ops->func_hash->notrace_hash);
+   mutex_unlock(>ops->func_hash->regex_lock);
+   }
+   mutex_unlock(_types_lock);
+}
+
+static void add_to_clear_hash_list(struct list_head *clear_list,
+  struct dyn_ftrace *rec)
+{
+   struct ftrace_init_func *func;
+
+   func = kmalloc(sizeof(*func), GFP_KERNEL);
+   if (!func) {
+   WARN_ONCE(1, "alloc failure, ftrace filter could be stale\n");
+   return;
+   }
+
+   func->ip = rec->ip;
+   list_add(>list, clear_list);
+}
+
 void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
 {
unsigned long start = (unsigned long)(start_ptr);
@@ -6076,8 +6135,12 @@ void ftrace_free_mem(struct module *mod, void 
*start_ptr, void *end_ptr)
struct dyn_ftrace *rec;
struct dyn_ftrace key;
struct ftrace_mod_map *mod_map = NULL;
+   struct ftrace_init_func *func, *func_next;
+   struct list_head clear_hash;
int order;
 
+   INIT_LIST_HEAD(_hash);
+
key.ip = start;
key.flags = end;/* overload flags, as it is unsigned long */
 
@@ -6102,6 +6165,9 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, 
void *end_ptr)
if (!rec)
continue;
 
+   /* rec will be cleared from hashes after ftrace_lock unlock */
+   add_to_clear_hash_list(_hash, rec);
+
if (mod_map)
save_ftrace_mod_rec(mod_map, rec);
 
@@ -6123,6 +6189,11 @@ void ftrace_free_mem(struct module *mod, void 
*start_ptr, void *end_ptr)
goto again;
}
mutex_unlock(_lock);
+
+   list_for_each_entry_safe(func, func_next, _hash, list) {
+   clear_func_from_hashes(func);
+   kfree(func);
+   }
 }
 
 void __init ftrace_free_init_mem(void)
-- 
2.14.2.920.gcf0c67979c-goog



[PATCH v2] ftrace: Clear hashes of stale ips of init memory

2017-10-09 Thread Joel Fernandes
Filters should be cleared of init functions during freeing of init
memory when the ftrace dyn records are released. However in current
code, the filters are left as is. This patch clears the hashes of the
saved init functions when the init memory is freed. This fixes the
following issue reproducible with the following sequence of commands for
a test module:


void bar(void)
{
printk(KERN_INFO "bar!\n");
}

void foo(void)
{
printk(KERN_INFO "foo!\n");
bar();
}

static int __init hello_init(void)
{
printk(KERN_INFO "Hello world!\n");
foo();
return 0;
}

static void __exit hello_cleanup(void)
{
printk(KERN_INFO "Cleaning up module.\n");
}

module_init(hello_init);
module_exit(hello_cleanup);


Commands:
echo '*:mod:test' > /d/tracing/set_ftrace_filter
echo function > /d/tracing/current_tracer
cat /d/tracing/set_ftrace_filter
modprobe test
cat /d/tracing/trace
cat /d/tracing/set_ftrace_filter
rmmod test
cat /d/tracing/set_ftrace_filter
sleep 1
modprobe test
cat /d/tracing/trace
cat /d/tracing/set_ftrace_filter

Behavior without patch: Init function is still in the filter
Expected behavior: Shouldn't have any of the filters set

Cc: Jessica Yu 
Cc: Steven Rostedt 
Signed-off-by: Joel Fernandes 
---
 kernel/trace/ftrace.c | 71 +++
 1 file changed, 71 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 9e99bd55732e..ac7e68c5e2d2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6067,6 +6067,65 @@ allocate_ftrace_mod_map(struct module *mod,
 }
 #endif /* CONFIG_MODULES */
 
+struct ftrace_init_func {
+   struct list_head list;
+   unsigned long ip;
+};
+
+/* Clear any init ips from hashes */
+static void
+clear_func_from_hash(struct ftrace_init_func *func, struct ftrace_hash *hash)
+{
+   struct ftrace_func_entry *entry;
+
+   if (ftrace_hash_empty(hash))
+   return;
+
+   printk("clear func %ps from hash\n", (void *)func->ip);
+
+   entry = __ftrace_lookup_ip(hash, func->ip);
+
+   /*
+* Do not allow this rec to match again.
+* Yeah, it may waste some memory, but will be removed
+* if/when the hash is modified again.
+*/
+   if (entry)
+   entry->ip = 0;
+}
+
+static void
+clear_func_from_hashes(struct ftrace_init_func *func)
+{
+   struct trace_array *tr;
+
+   mutex_lock(_types_lock);
+   list_for_each_entry(tr, _trace_arrays, list) {
+   if (!tr->ops || !tr->ops->func_hash)
+   continue;
+   mutex_lock(>ops->func_hash->regex_lock);
+   clear_func_from_hash(func, tr->ops->func_hash->filter_hash);
+   clear_func_from_hash(func, tr->ops->func_hash->notrace_hash);
+   mutex_unlock(>ops->func_hash->regex_lock);
+   }
+   mutex_unlock(_types_lock);
+}
+
+static void add_to_clear_hash_list(struct list_head *clear_list,
+  struct dyn_ftrace *rec)
+{
+   struct ftrace_init_func *func;
+
+   func = kmalloc(sizeof(*func), GFP_KERNEL);
+   if (!func) {
+   WARN_ONCE(1, "alloc failure, ftrace filter could be stale\n");
+   return;
+   }
+
+   func->ip = rec->ip;
+   list_add(>list, clear_list);
+}
+
 void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
 {
unsigned long start = (unsigned long)(start_ptr);
@@ -6076,8 +6135,12 @@ void ftrace_free_mem(struct module *mod, void 
*start_ptr, void *end_ptr)
struct dyn_ftrace *rec;
struct dyn_ftrace key;
struct ftrace_mod_map *mod_map = NULL;
+   struct ftrace_init_func *func, *func_next;
+   struct list_head clear_hash;
int order;
 
+   INIT_LIST_HEAD(_hash);
+
key.ip = start;
key.flags = end;/* overload flags, as it is unsigned long */
 
@@ -6102,6 +6165,9 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, 
void *end_ptr)
if (!rec)
continue;
 
+   /* rec will be cleared from hashes after ftrace_lock unlock */
+   add_to_clear_hash_list(_hash, rec);
+
if (mod_map)
save_ftrace_mod_rec(mod_map, rec);
 
@@ -6123,6 +6189,11 @@ void ftrace_free_mem(struct module *mod, void 
*start_ptr, void *end_ptr)
goto again;
}
mutex_unlock(_lock);
+
+   list_for_each_entry_safe(func, func_next, _hash, list) {
+   clear_func_from_hashes(func);
+   kfree(func);
+   }
 }
 
 void __init ftrace_free_init_mem(void)
-- 
2.14.2.920.gcf0c67979c-goog



Re: [PATCH 3/3] mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory

2017-10-09 Thread Yang Shi



On 10/8/17 11:36 PM, Michal Hocko wrote:

On Mon 09-10-17 08:33:16, Michal Hocko wrote:

On Sat 07-10-17 00:37:55, Yang Shi wrote:



On 10/6/17 2:37 AM, Michal Hocko wrote:

On Thu 05-10-17 05:29:10, Yang Shi wrote:

[...]

+   list_for_each_entry_safe(s, s2, _caches, list) {
+   if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
+   continue;
+
+   memset(, 0, sizeof(sinfo));


why do you zero out the structure. All the fields you are printing are
filled out in get_slabinfo.


No special reason, just wipe out the potential stale data on the stack.


Do not add code that has no meaning. The OOM killer is a slow path but
that doesn't mean we should throw spare cycles out of the window.


With this fixed and the compile fix [1] folded, feel free to add my
Acked-by: Michal Hocko 

[1] 
http://lkml.kernel.org/r/1507492085-42264-1-git-send-email-yan...@alibaba-inc.com


Did some more thorough test and took the code a little deeper, it sounds 
!CONFIG_SLOB is not enough. Some data structure and functions depends on 
CONFIG_SLUB_DEBUG, i.e. kmem_cache_node->total_objects and 
node_nr_objs(), which are essential of get_slabinfo().


So, I'm supposed it makes more sense to protect the related slab stats 
code and the unreclaimable slabinfo dump with CONFIG_SLAB || 
CONFIG_SLUB_DEBUG.


Thanks,
Yang





Re: [PATCH 3/3] mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory

2017-10-09 Thread Yang Shi



On 10/8/17 11:36 PM, Michal Hocko wrote:

On Mon 09-10-17 08:33:16, Michal Hocko wrote:

On Sat 07-10-17 00:37:55, Yang Shi wrote:



On 10/6/17 2:37 AM, Michal Hocko wrote:

On Thu 05-10-17 05:29:10, Yang Shi wrote:

[...]

+   list_for_each_entry_safe(s, s2, _caches, list) {
+   if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
+   continue;
+
+   memset(, 0, sizeof(sinfo));


why do you zero out the structure. All the fields you are printing are
filled out in get_slabinfo.


No special reason, just wipe out the potential stale data on the stack.


Do not add code that has no meaning. The OOM killer is a slow path but
that doesn't mean we should throw spare cycles out of the window.


With this fixed and the compile fix [1] folded, feel free to add my
Acked-by: Michal Hocko 

[1] 
http://lkml.kernel.org/r/1507492085-42264-1-git-send-email-yan...@alibaba-inc.com


Did some more thorough test and took the code a little deeper, it sounds 
!CONFIG_SLOB is not enough. Some data structure and functions depends on 
CONFIG_SLUB_DEBUG, i.e. kmem_cache_node->total_objects and 
node_nr_objs(), which are essential of get_slabinfo().


So, I'm supposed it makes more sense to protect the related slab stats 
code and the unreclaimable slabinfo dump with CONFIG_SLAB || 
CONFIG_SLUB_DEBUG.


Thanks,
Yang





Re: [PATCH v5] crypto: s5p-sss: Add HASH support for Exynos

2017-10-09 Thread Krzysztof Kozlowski
On Mon, Oct 09, 2017 at 01:12:30PM +0200, Kamil Konieczny wrote:
> Add support for MD5, SHA1, SHA256 hash algorithms for Exynos HW.
> It uses the crypto framework asynchronous hash api.
> It is based on omap-sham.c driver.
> S5P has some HW differencies and is not implemented.
> 
> Modifications in s5p-sss:
> 
> - Add hash supporting structures and functions.
> 
> - Modify irq handler to handle both aes and hash signals.
> 
> - Resize resource end in probe if EXYNOS_HASH is enabled in
>   Kconfig.
> 
> - Add new copyright line and new author.
> 
> - Tested on Odroid-U3 with Exynos 4412 CPU, kernel 4.13-rc6
>   with crypto run-time self test testmgr
>   and with tcrypt module with: modprobe tcrypt sec=1 mode=N
>   where N=402, 403, 404 (MD5, SHA1, SHA256).
> 
> Modifications in drivers/crypto/Kconfig:
> 
> - Add new CRYPTO_DEV_EXYNOS_HASH, depend on !EXYNOS_RNG
>   and CRYPTO_DEV_S5P
> 
> - Select sw algorithms MD5, SHA1 and SHA256 in EXYNOS_HASH
>   as they are nedded for fallback.
> 
> Signed-off-by: Kamil Konieczny 
> ---
> version 5:
> - fix suggested by Krzysztof Kozlowski: change defines HASH_OP into enum, fix
>   comments
> 
> version 4:
> - fixes suggested by Krzysztof Kozlowski: reformat comments, convert context
>   flags into two bool vars, drop SSS_ALIGNED, change name of SSS_DMA_ALIGN and
>   SSS_DMA_ALIGN_MASK, split assignments into separate lines, use IS_ENABLED in
>   place of ifdef, remove sss_hash_algs_info and simplify register and 
> deregister
>   HASH algs
> 
> version 3:
> - many fixes suggested by Krzysztof Kozlowski: comments, uppercases in const,
>   remove unused defines, remove unused variable bs, constify aes_variant,
>   remove global var use_hash, remove WARN_ON, improve hash_import(),
>   change goto label into 'out' in s5p_hash_handle_queue(), reorder variable
>   declarations, add spinlock to protect clearing HASH_FLAGS_BUSY
> - simplify code: replace one-line functions s5p_hash_update_req(),
>   s5p_hash_final_req() with call to s5p_hash_xmit_dma(), and delete them
> - replace call to s5p_hash_hw_init() into s5p_ahash_dma_init() and delete it
> - fix clearing shash flag CRYPTO_TFM_REQ_MAY_SLEEP
> - fix s5p_hash_set_flow()
> 
> version 2:
> - change patch format so number of lines drops
> - change in Kconfig as suggested by Krzysztof Kozlowski, add
>   EXYNOS_HASH subsection
> - change #ifndef EXYNOS_RNG into #ifdef CRYPTO_DEV_EXYNOS_HASH
> - remove style fixups in aes, as they should go in separate patch
> - remove FLOW_LOG, FLOW_DUMP macros and its uses
> - remove #if 0 ... endif
> - remove unused function hash_wait and its defines
> - fix compiler warning in dev_dbg
> - remove some comments
> - other minor fixes in comments
> 
>  drivers/crypto/Kconfig   |   14 +
>  drivers/crypto/s5p-sss.c | 1441 
> +-
>  2 files changed, 1445 insertions(+), 10 deletions(-)
> 

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof



Re: [PATCH v5] crypto: s5p-sss: Add HASH support for Exynos

2017-10-09 Thread Krzysztof Kozlowski
On Mon, Oct 09, 2017 at 01:12:30PM +0200, Kamil Konieczny wrote:
> Add support for MD5, SHA1, SHA256 hash algorithms for Exynos HW.
> It uses the crypto framework asynchronous hash api.
> It is based on omap-sham.c driver.
> S5P has some HW differencies and is not implemented.
> 
> Modifications in s5p-sss:
> 
> - Add hash supporting structures and functions.
> 
> - Modify irq handler to handle both aes and hash signals.
> 
> - Resize resource end in probe if EXYNOS_HASH is enabled in
>   Kconfig.
> 
> - Add new copyright line and new author.
> 
> - Tested on Odroid-U3 with Exynos 4412 CPU, kernel 4.13-rc6
>   with crypto run-time self test testmgr
>   and with tcrypt module with: modprobe tcrypt sec=1 mode=N
>   where N=402, 403, 404 (MD5, SHA1, SHA256).
> 
> Modifications in drivers/crypto/Kconfig:
> 
> - Add new CRYPTO_DEV_EXYNOS_HASH, depend on !EXYNOS_RNG
>   and CRYPTO_DEV_S5P
> 
> - Select sw algorithms MD5, SHA1 and SHA256 in EXYNOS_HASH
>   as they are nedded for fallback.
> 
> Signed-off-by: Kamil Konieczny 
> ---
> version 5:
> - fix suggested by Krzysztof Kozlowski: change defines HASH_OP into enum, fix
>   comments
> 
> version 4:
> - fixes suggested by Krzysztof Kozlowski: reformat comments, convert context
>   flags into two bool vars, drop SSS_ALIGNED, change name of SSS_DMA_ALIGN and
>   SSS_DMA_ALIGN_MASK, split assignments into separate lines, use IS_ENABLED in
>   place of ifdef, remove sss_hash_algs_info and simplify register and 
> deregister
>   HASH algs
> 
> version 3:
> - many fixes suggested by Krzysztof Kozlowski: comments, uppercases in const,
>   remove unused defines, remove unused variable bs, constify aes_variant,
>   remove global var use_hash, remove WARN_ON, improve hash_import(),
>   change goto label into 'out' in s5p_hash_handle_queue(), reorder variable
>   declarations, add spinlock to protect clearing HASH_FLAGS_BUSY
> - simplify code: replace one-line functions s5p_hash_update_req(),
>   s5p_hash_final_req() with call to s5p_hash_xmit_dma(), and delete them
> - replace call to s5p_hash_hw_init() into s5p_ahash_dma_init() and delete it
> - fix clearing shash flag CRYPTO_TFM_REQ_MAY_SLEEP
> - fix s5p_hash_set_flow()
> 
> version 2:
> - change patch format so number of lines drops
> - change in Kconfig as suggested by Krzysztof Kozlowski, add
>   EXYNOS_HASH subsection
> - change #ifndef EXYNOS_RNG into #ifdef CRYPTO_DEV_EXYNOS_HASH
> - remove style fixups in aes, as they should go in separate patch
> - remove FLOW_LOG, FLOW_DUMP macros and its uses
> - remove #if 0 ... endif
> - remove unused function hash_wait and its defines
> - fix compiler warning in dev_dbg
> - remove some comments
> - other minor fixes in comments
> 
>  drivers/crypto/Kconfig   |   14 +
>  drivers/crypto/s5p-sss.c | 1441 
> +-
>  2 files changed, 1445 insertions(+), 10 deletions(-)
> 

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof



Re: [PATCH v4 1/2] PCI: pci-host-generic: add support for Synopsys DesignWare RC in ECAM mode

2017-10-09 Thread Ard Biesheuvel
On 9 October 2017 at 17:46, Will Deacon  wrote:
> On Fri, Oct 06, 2017 at 06:21:59PM -0500, Bjorn Helgaas wrote:
>> On Fri, Oct 06, 2017 at 05:39:18PM +0100, Ard Biesheuvel wrote:
>> > diff --git a/drivers/pci/host/pci-host-generic.c 
>> > b/drivers/pci/host/pci-host-generic.c
>> > index 7d709a7e0aa8..01e81a30e303 100644
>> > --- a/drivers/pci/host/pci-host-generic.c
>> > +++ b/drivers/pci/host/pci-host-generic.c
>> > @@ -35,6 +35,43 @@ static struct pci_ecam_ops gen_pci_cfg_cam_bus_ops = {
>> > }
>> >  };
>> >
>> > +static int pci_dw_ecam_config_read(struct pci_bus *bus, u32 devfn, int 
>> > where,
>> > +  int size, u32 *val)
>> > +{
>> > +   struct pci_config_window *cfg = bus->sysdata;
>> > +
>> > +   /*
>> > +* The Synopsys DesignWare PCIe controller in ECAM mode will not filter
>> > +* type 0 config TLPs sent to devices 1 and up on its downstream port,
>> > +* resulting in devices appearing multiple times on bus 0 unless we
>> > +* filter out those accesses here.
>> > +*/
>> > +   if (bus->number == cfg->busr.start && PCI_SLOT(devfn) > 0)
>> > +   return PCIBIOS_DEVICE_NOT_FOUND;
>>
>> I think we should return 0x data here, as we do in other
>> similar accessors (dw_pcie_rd_conf(), altera_pcie_cfg_read(),
>> rockchip_pcie_rd_conf()).
>>
>> Actually, xilinx-nwl has a nice trick: it implements
>> nwl_pcie_map_bus(), which returns NULL for invalid devices.  Then it
>> can use the generic accessors, and pci_generic_config_read() already
>> fills in ~0 for this case.
>>
>> What do you think of the following?  I put it on my pci/host-generic
>> branch for now (pending your response and an ack from Will).
>
> I'd probably just inline the device check in pci_dw_ecam_map_bus directly,
> but either way:
>
> Acked-by: Will Deacon 
>

Thanks

> I'm assuming this is so small that there's no point in having a Kconfig
> option for these guys that depends on the generic host driver?
>

I can make the compatible strings shorter if you like?


Re: [PATCH v4 1/2] PCI: pci-host-generic: add support for Synopsys DesignWare RC in ECAM mode

2017-10-09 Thread Ard Biesheuvel
On 9 October 2017 at 17:46, Will Deacon  wrote:
> On Fri, Oct 06, 2017 at 06:21:59PM -0500, Bjorn Helgaas wrote:
>> On Fri, Oct 06, 2017 at 05:39:18PM +0100, Ard Biesheuvel wrote:
>> > diff --git a/drivers/pci/host/pci-host-generic.c 
>> > b/drivers/pci/host/pci-host-generic.c
>> > index 7d709a7e0aa8..01e81a30e303 100644
>> > --- a/drivers/pci/host/pci-host-generic.c
>> > +++ b/drivers/pci/host/pci-host-generic.c
>> > @@ -35,6 +35,43 @@ static struct pci_ecam_ops gen_pci_cfg_cam_bus_ops = {
>> > }
>> >  };
>> >
>> > +static int pci_dw_ecam_config_read(struct pci_bus *bus, u32 devfn, int 
>> > where,
>> > +  int size, u32 *val)
>> > +{
>> > +   struct pci_config_window *cfg = bus->sysdata;
>> > +
>> > +   /*
>> > +* The Synopsys DesignWare PCIe controller in ECAM mode will not filter
>> > +* type 0 config TLPs sent to devices 1 and up on its downstream port,
>> > +* resulting in devices appearing multiple times on bus 0 unless we
>> > +* filter out those accesses here.
>> > +*/
>> > +   if (bus->number == cfg->busr.start && PCI_SLOT(devfn) > 0)
>> > +   return PCIBIOS_DEVICE_NOT_FOUND;
>>
>> I think we should return 0x data here, as we do in other
>> similar accessors (dw_pcie_rd_conf(), altera_pcie_cfg_read(),
>> rockchip_pcie_rd_conf()).
>>
>> Actually, xilinx-nwl has a nice trick: it implements
>> nwl_pcie_map_bus(), which returns NULL for invalid devices.  Then it
>> can use the generic accessors, and pci_generic_config_read() already
>> fills in ~0 for this case.
>>
>> What do you think of the following?  I put it on my pci/host-generic
>> branch for now (pending your response and an ack from Will).
>
> I'd probably just inline the device check in pci_dw_ecam_map_bus directly,
> but either way:
>
> Acked-by: Will Deacon 
>

Thanks

> I'm assuming this is so small that there's no point in having a Kconfig
> option for these guys that depends on the generic host driver?
>

I can make the compatible strings shorter if you like?


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Will Deacon
On Mon, Oct 09, 2017 at 08:14:33PM +0200, Michal Hocko wrote:
> On Mon 09-10-17 13:51:47, Pavel Tatashin wrote:
> > I can go back to that approach, if Michal OK with it. But, that would
> > mean that I would need to touch every single architecture that
> > implements vmemmap_populate(), and also pass flags at least through
> > these functions on every architectures (some have more than one
> > decided by configs).:
> > 
> > vmemmap_populate()
> > vmemmap_populate_basepages()
> > vmemmap_populate_hugepages()
> > vmemmap_pte_populate()
> > __vmemmap_alloc_block_buf()
> > alloc_block_buf()
> > vmemmap_alloc_block()
> > 
> > IMO, while I understand that it looks strange that we must walk page
> > table after creating it, it is a better approach: more enclosed as it
> > effects kasan only, and more universal as it is in common code.
> 
> While I understand that gfp mask approach might look better at first
> sight this is by no means a general purpose API so I would rather be
> pragmatic and have a smaller code footprint than a more general
> interface. Kasan is pretty much a special case and doing a one time
> initialization 2 pass thing is imho acceptable. If this turns out to be
> impractical in future then let's fix it up but right now I would rather
> go a simpler path.

I think the simpler path for arm64 is really to say when we want the memory
zeroing as opposed to exposing pmd_large/pud_large macros. Those are likely
to grow more users too, but are difficult to use correctly as we have things
like contiguous ptes that map to a granule smaller than a pmd.

I proposed an alternative solution to Pavel already, but it could be made
less general purpose by marking the function __meminit and only having it
do anything if KASAN is compiled in.

Will


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Will Deacon
On Mon, Oct 09, 2017 at 08:14:33PM +0200, Michal Hocko wrote:
> On Mon 09-10-17 13:51:47, Pavel Tatashin wrote:
> > I can go back to that approach, if Michal OK with it. But, that would
> > mean that I would need to touch every single architecture that
> > implements vmemmap_populate(), and also pass flags at least through
> > these functions on every architectures (some have more than one
> > decided by configs).:
> > 
> > vmemmap_populate()
> > vmemmap_populate_basepages()
> > vmemmap_populate_hugepages()
> > vmemmap_pte_populate()
> > __vmemmap_alloc_block_buf()
> > alloc_block_buf()
> > vmemmap_alloc_block()
> > 
> > IMO, while I understand that it looks strange that we must walk page
> > table after creating it, it is a better approach: more enclosed as it
> > effects kasan only, and more universal as it is in common code.
> 
> While I understand that gfp mask approach might look better at first
> sight this is by no means a general purpose API so I would rather be
> pragmatic and have a smaller code footprint than a more general
> interface. Kasan is pretty much a special case and doing a one time
> initialization 2 pass thing is imho acceptable. If this turns out to be
> impractical in future then let's fix it up but right now I would rather
> go a simpler path.

I think the simpler path for arm64 is really to say when we want the memory
zeroing as opposed to exposing pmd_large/pud_large macros. Those are likely
to grow more users too, but are difficult to use correctly as we have things
like contiguous ptes that map to a granule smaller than a pmd.

I proposed an alternative solution to Pavel already, but it could be made
less general purpose by marking the function __meminit and only having it
do anything if KASAN is compiled in.

Will


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Will Deacon
On Mon, Oct 09, 2017 at 02:42:32PM -0400, Pavel Tatashin wrote:
> Hi Will,
> 
> In addition to what Michal wrote:
> 
> > As an interim step, why not introduce something like
> > vmemmap_alloc_block_flags and make the page-table walking opt-out for
> > architectures that don't want it? Then we can just pass __GFP_ZERO from
> > our vmemmap_populate where necessary and other architectures can do the
> > page-table walking dance if they prefer.
> 
> I do not see the benefit, implementing this approach means that we
> would need to implement two table walks instead of one: one for x86,
> another for ARM, as these two architectures support kasan. Also, this
> would become a requirement for any future architecture that want to
> add kasan support to add this page table walk implementation.

We have two table walks even with your patch series applied afaict: one in
our definition of vmemmap_populate (arch/arm64/mm/mmu.c) and this one
in the core code.

> >> IMO, while I understand that it looks strange that we must walk page
> >> table after creating it, it is a better approach: more enclosed as it
> >> effects kasan only, and more universal as it is in common code.
> >
> > I don't buy the more universal aspect, but I appreciate it's subjective.
> > Frankly, I'd just sooner not have core code walking early page tables if
> > it can be avoided, and it doesn't look hard to avoid it in this case.
> > The fact that you're having to add pmd_large and pud_large, which are
> > otherwise unused in mm/, is an indication that this isn't quite right imo.
> 
>  28 +#define pmd_large(pmd) pmd_sect(pmd)
>  29 +#define pud_large(pud) pud_sect(pud)
> 
> it is just naming difference, ARM64 calls them pmd_sect, common mm and
> other arches call them
> pmd_large/pud_large. Even the ARM has these defines in
> 
> arm/include/asm/pgtable-3level.h
> arm/include/asm/pgtable-2level.h

My worry is that these are actually highly arch-specific, but will likely
grow more users in mm/ that assume things for all architectures that aren't
necessarily valid.

Will


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Will Deacon
On Mon, Oct 09, 2017 at 02:42:32PM -0400, Pavel Tatashin wrote:
> Hi Will,
> 
> In addition to what Michal wrote:
> 
> > As an interim step, why not introduce something like
> > vmemmap_alloc_block_flags and make the page-table walking opt-out for
> > architectures that don't want it? Then we can just pass __GFP_ZERO from
> > our vmemmap_populate where necessary and other architectures can do the
> > page-table walking dance if they prefer.
> 
> I do not see the benefit, implementing this approach means that we
> would need to implement two table walks instead of one: one for x86,
> another for ARM, as these two architectures support kasan. Also, this
> would become a requirement for any future architecture that want to
> add kasan support to add this page table walk implementation.

We have two table walks even with your patch series applied afaict: one in
our definition of vmemmap_populate (arch/arm64/mm/mmu.c) and this one
in the core code.

> >> IMO, while I understand that it looks strange that we must walk page
> >> table after creating it, it is a better approach: more enclosed as it
> >> effects kasan only, and more universal as it is in common code.
> >
> > I don't buy the more universal aspect, but I appreciate it's subjective.
> > Frankly, I'd just sooner not have core code walking early page tables if
> > it can be avoided, and it doesn't look hard to avoid it in this case.
> > The fact that you're having to add pmd_large and pud_large, which are
> > otherwise unused in mm/, is an indication that this isn't quite right imo.
> 
>  28 +#define pmd_large(pmd) pmd_sect(pmd)
>  29 +#define pud_large(pud) pud_sect(pud)
> 
> it is just naming difference, ARM64 calls them pmd_sect, common mm and
> other arches call them
> pmd_large/pud_large. Even the ARM has these defines in
> 
> arm/include/asm/pgtable-3level.h
> arm/include/asm/pgtable-2level.h

My worry is that these are actually highly arch-specific, but will likely
grow more users in mm/ that assume things for all architectures that aren't
necessarily valid.

Will


Re: [PATCH v2 10/16] iommu: introduce device fault report API

2017-10-09 Thread Jacob Pan
On Fri, 6 Oct 2017 10:36:02 +0100
Jean-Philippe Brucker  wrote:

> Hi Jacob,
> 
> On 06/10/17 00:03, Jacob Pan wrote:
> > Traditionally, device specific faults are detected and handled
> > within their own device drivers. When IOMMU is enabled, faults such
> > as DMA related transactions are detected by IOMMU. There is no
> > generic reporting mechanism to report faults back to the in-kernel
> > device driver or the guest OS in case of assigned devices.
> > 
> > Faults detected by IOMMU is based on the transaction's source ID
> > which can be reported at per device basis, regardless of the device
> > type is a PCI device or not.
> > 
> > The fault types include recoverable (e.g. page request) and
> > unrecoverable faults(e.g. access error). In most cases, faults can
> > be handled by IOMMU drivers internally. The primary use cases are as
> > follows:
> > 1. page request fault originated from an SVM capable device that is
> > assigned to guest via vIOMMU. In this case, the first level page
> > tables are owned by the guest. Page request must be propagated to
> > the guest to let guest OS fault in the pages then send page
> > response. In this mechanism, the direct receiver of IOMMU fault
> > notification is VFIO, which can relay notification events to QEMU
> > or other user space software.
> > 
> > 2. faults need more subtle handling by device drivers. Other than
> > simply invoke reset function, there are needs to let device driver
> > handle the fault with a smaller impact.
> > 
> > This patchset is intended to create a generic fault report API such
> > that it can scale as follows:
> > - all IOMMU types
> > - PCI and non-PCI devices
> > - recoverable and unrecoverable faults
> > - VFIO and other other in kernel users
> > - DMA & IRQ remapping (TBD)
> > The original idea was brought up by David Woodhouse and discussions
> > summarized at https://lwn.net/Articles/608914/.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj 
> > ---  
> [...]
> > +int iommu_register_device_fault_handler(struct device *dev,
> > +   iommu_dev_fault_handler_t
> > handler) +{
> > +   if (dev->iommu_fault_param)
> > +   return -EBUSY;
> > +   get_device(dev);
> > +   dev->iommu_fault_param =
> > +   kzalloc(sizeof(struct iommu_fault_param),
> > GFP_KERNEL);
> > +   if (!dev->iommu_fault_param)
> > +   return -ENOMEM;
> > +   dev->iommu_fault_param->dev_fault_handler = handler;  
> 
> Since the handler is owned by a device driver, you also need to clean
> it up when switching the driver (native->VFIO and VFIO->native), in
> iommu_attach_device I suppose.
> 
I was thinking the driver who registered fault handler shall be held
accountable to unregister. e.g. User must unbind driver (unregister
fault handler included) before assigning device to vfio-pci. Otherwise,
VFIO call to register handler would fail.
I am assuming VFIO needs to have a separate device fault handler of its
own.

Jacob


Re: [PATCH v2 10/16] iommu: introduce device fault report API

2017-10-09 Thread Jacob Pan
On Fri, 6 Oct 2017 10:36:02 +0100
Jean-Philippe Brucker  wrote:

> Hi Jacob,
> 
> On 06/10/17 00:03, Jacob Pan wrote:
> > Traditionally, device specific faults are detected and handled
> > within their own device drivers. When IOMMU is enabled, faults such
> > as DMA related transactions are detected by IOMMU. There is no
> > generic reporting mechanism to report faults back to the in-kernel
> > device driver or the guest OS in case of assigned devices.
> > 
> > Faults detected by IOMMU is based on the transaction's source ID
> > which can be reported at per device basis, regardless of the device
> > type is a PCI device or not.
> > 
> > The fault types include recoverable (e.g. page request) and
> > unrecoverable faults(e.g. access error). In most cases, faults can
> > be handled by IOMMU drivers internally. The primary use cases are as
> > follows:
> > 1. page request fault originated from an SVM capable device that is
> > assigned to guest via vIOMMU. In this case, the first level page
> > tables are owned by the guest. Page request must be propagated to
> > the guest to let guest OS fault in the pages then send page
> > response. In this mechanism, the direct receiver of IOMMU fault
> > notification is VFIO, which can relay notification events to QEMU
> > or other user space software.
> > 
> > 2. faults need more subtle handling by device drivers. Other than
> > simply invoke reset function, there are needs to let device driver
> > handle the fault with a smaller impact.
> > 
> > This patchset is intended to create a generic fault report API such
> > that it can scale as follows:
> > - all IOMMU types
> > - PCI and non-PCI devices
> > - recoverable and unrecoverable faults
> > - VFIO and other other in kernel users
> > - DMA & IRQ remapping (TBD)
> > The original idea was brought up by David Woodhouse and discussions
> > summarized at https://lwn.net/Articles/608914/.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj 
> > ---  
> [...]
> > +int iommu_register_device_fault_handler(struct device *dev,
> > +   iommu_dev_fault_handler_t
> > handler) +{
> > +   if (dev->iommu_fault_param)
> > +   return -EBUSY;
> > +   get_device(dev);
> > +   dev->iommu_fault_param =
> > +   kzalloc(sizeof(struct iommu_fault_param),
> > GFP_KERNEL);
> > +   if (!dev->iommu_fault_param)
> > +   return -ENOMEM;
> > +   dev->iommu_fault_param->dev_fault_handler = handler;  
> 
> Since the handler is owned by a device driver, you also need to clean
> it up when switching the driver (native->VFIO and VFIO->native), in
> iommu_attach_device I suppose.
> 
I was thinking the driver who registered fault handler shall be held
accountable to unregister. e.g. User must unbind driver (unregister
fault handler included) before assigning device to vfio-pci. Otherwise,
VFIO call to register handler would fail.
I am assuming VFIO needs to have a separate device fault handler of its
own.

Jacob


Re: [PATCH v2 1/3] kcov: support comparison operands collection

2017-10-09 Thread Dmitry Vyukov
On Mon, Oct 9, 2017 at 8:37 PM, Mark Rutland  wrote:
> On Mon, Oct 09, 2017 at 08:15:10PM +0200, 'Dmitry Vyukov' via syzkaller wrote:
>> On Mon, Oct 9, 2017 at 5:46 PM, Mark Rutland  wrote:
>> > On Mon, Oct 09, 2017 at 05:05:19PM +0200, Alexander Potapenko wrote:
>
>> > ... I note that a few places in the kernel use a 128-bit type. Are
>> > 128-bit comparisons not instrumented?
>>
>> Yes, they are not instrumented.
>> How many are there? Can you give some examples?
>
> From a quick scan, it doesn't looks like there are currently any
> comparisons.
>
> It's used as a data type in a few places under arm64:
>
> arch/arm64/include/asm/checksum.h:  __uint128_t tmp;
> arch/arm64/include/asm/checksum.h:  tmp = *(const __uint128_t *)iph;
> arch/arm64/include/asm/fpsimd.h:__uint128_t vregs[32];
> arch/arm64/include/uapi/asm/ptrace.h:   __uint128_t vregs[32];
> arch/arm64/include/uapi/asm/sigcontext.h:   __uint128_t vregs[32];
> arch/arm64/kernel/signal32.c:   __uint128_t raw;
> arch/arm64/kvm/guest.c: __uint128_t tmp;


Then I think we just continue ignoring them for now :)
In the future we can extend kcov to trace 128-bits values. We will
need to add a special flag and write 2 consecutive entries for them.
Or something along these lines.


>> >> + area = t->kcov_area;
>> >> + /* The first 64-bit word is the number of subsequent PCs. */
>> >> + pos = READ_ONCE(area[0]) + 1;
>> >> + if (likely(pos < t->kcov_size)) {
>> >> + area[pos] = ip;
>> >> + WRITE_ONCE(area[0], pos);
>> >
>> > Not a new problem, but if the area for one thread is mmap'd, and read by
>> > another thread, these two writes could be seen out-of-order, since we
>> > don't have an smp_wmb() between them.
>> >
>> > I guess Syzkaller doesn't read the mmap'd kcov file from another thread?
>>
>>
>> Yes, that's the intention. If you read coverage from another thread,
>> you can't know coverage from what exactly you read. So the usage
>> pattern is:
>>
>> reset coverage;
>> do something;
>> read coverage;
>
> Ok. I guess without a use-case for reading this from another thread it doesn't
> really matter.
>
> Thanks,
> Mark.


Re: [PATCH v2 1/3] kcov: support comparison operands collection

2017-10-09 Thread Dmitry Vyukov
On Mon, Oct 9, 2017 at 8:37 PM, Mark Rutland  wrote:
> On Mon, Oct 09, 2017 at 08:15:10PM +0200, 'Dmitry Vyukov' via syzkaller wrote:
>> On Mon, Oct 9, 2017 at 5:46 PM, Mark Rutland  wrote:
>> > On Mon, Oct 09, 2017 at 05:05:19PM +0200, Alexander Potapenko wrote:
>
>> > ... I note that a few places in the kernel use a 128-bit type. Are
>> > 128-bit comparisons not instrumented?
>>
>> Yes, they are not instrumented.
>> How many are there? Can you give some examples?
>
> From a quick scan, it doesn't looks like there are currently any
> comparisons.
>
> It's used as a data type in a few places under arm64:
>
> arch/arm64/include/asm/checksum.h:  __uint128_t tmp;
> arch/arm64/include/asm/checksum.h:  tmp = *(const __uint128_t *)iph;
> arch/arm64/include/asm/fpsimd.h:__uint128_t vregs[32];
> arch/arm64/include/uapi/asm/ptrace.h:   __uint128_t vregs[32];
> arch/arm64/include/uapi/asm/sigcontext.h:   __uint128_t vregs[32];
> arch/arm64/kernel/signal32.c:   __uint128_t raw;
> arch/arm64/kvm/guest.c: __uint128_t tmp;


Then I think we just continue ignoring them for now :)
In the future we can extend kcov to trace 128-bits values. We will
need to add a special flag and write 2 consecutive entries for them.
Or something along these lines.


>> >> + area = t->kcov_area;
>> >> + /* The first 64-bit word is the number of subsequent PCs. */
>> >> + pos = READ_ONCE(area[0]) + 1;
>> >> + if (likely(pos < t->kcov_size)) {
>> >> + area[pos] = ip;
>> >> + WRITE_ONCE(area[0], pos);
>> >
>> > Not a new problem, but if the area for one thread is mmap'd, and read by
>> > another thread, these two writes could be seen out-of-order, since we
>> > don't have an smp_wmb() between them.
>> >
>> > I guess Syzkaller doesn't read the mmap'd kcov file from another thread?
>>
>>
>> Yes, that's the intention. If you read coverage from another thread,
>> you can't know coverage from what exactly you read. So the usage
>> pattern is:
>>
>> reset coverage;
>> do something;
>> read coverage;
>
> Ok. I guess without a use-case for reading this from another thread it doesn't
> really matter.
>
> Thanks,
> Mark.


[PATCH] ARM: dts: imx7d: Invert legacy PCI irq mapping

2017-10-09 Thread Andrey Smirnov
According to i.MX7D reference manual (Rev. 0.1, table 7-1, page 1221)
legacy PCI interrupt mapping is as follows:

 - PCIE INT A is IRQ 122
 - PCIE INT B is IRQ 123
 - PCIE INT C is IRQ 124
 - PCIE INT D is IRQ 125

Invert the mapping information in corresponding DT node to reflect
that.

Cc: Shawn Guo 
Cc: yurov...@gmail.com
Cc: Fabio Estevam 
Cc: Rob Herring 
Cc: Mark Rutland 
Cc: devicet...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Andrey Smirnov 
---

Discovered by using custom build of QEMU[1] when using together with
"usb-ehci" which is, AFAIU, ICH4. A bit of a convoluted use-case, but
I thought it would still be worht fixing.

[1] https://github.com/ndreys/qemu/tree/imx7-support-upstreaming 

 arch/arm/boot/dts/imx7d.dtsi | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/imx7d.dtsi b/arch/arm/boot/dts/imx7d.dtsi
index f46814a7ea44..4d308d17f040 100644
--- a/arch/arm/boot/dts/imx7d.dtsi
+++ b/arch/arm/boot/dts/imx7d.dtsi
@@ -144,10 +144,10 @@
interrupt-names = "msi";
#interrupt-cells = <1>;
interrupt-map-mask = <0 0 0 0x7>;
-   interrupt-map = <0 0 0 1  GIC_SPI 125 IRQ_TYPE_LEVEL_HIGH>,
-   <0 0 0 2  GIC_SPI 124 IRQ_TYPE_LEVEL_HIGH>,
-   <0 0 0 3  GIC_SPI 123 IRQ_TYPE_LEVEL_HIGH>,
-   <0 0 0 4  GIC_SPI 122 IRQ_TYPE_LEVEL_HIGH>;
+   interrupt-map = <0 0 0 1  GIC_SPI 122 IRQ_TYPE_LEVEL_HIGH>,
+   <0 0 0 2  GIC_SPI 123 IRQ_TYPE_LEVEL_HIGH>,
+   <0 0 0 3  GIC_SPI 124 IRQ_TYPE_LEVEL_HIGH>,
+   <0 0 0 4  GIC_SPI 125 IRQ_TYPE_LEVEL_HIGH>;
clocks = < IMX7D_PCIE_CTRL_ROOT_CLK>,
 < IMX7D_PLL_ENET_MAIN_100M_CLK>,
 < IMX7D_PCIE_PHY_ROOT_CLK>;
-- 
2.13.5



Re: [PATCH] leds: tca6507: Remove unnecessary reg check

2017-10-09 Thread Jacek Anaszewski
Hi Christos,

Thanks for the patch.

On 10/08/2017 08:55 PM, Christos Gkekas wrote:
> Variable reg is unsigned so checking whether it is less than zero is not
> necessary.
> 
> Signed-off-by: Christos Gkekas 
> ---
>  drivers/leds/leds-tca6507.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/leds/leds-tca6507.c b/drivers/leds/leds-tca6507.c
> index 45222a7..c12c16f 100644
> --- a/drivers/leds/leds-tca6507.c
> +++ b/drivers/leds/leds-tca6507.c
> @@ -715,7 +715,7 @@ tca6507_led_dt_init(struct i2c_client *client)
>   if (of_property_match_string(child, "compatible", "gpio") >= 0)
>   led.flags |= TCA6507_MAKE_GPIO;
>   ret = of_property_read_u32(child, "reg", );
> - if (ret != 0 || reg < 0 || reg >= NUM_LEDS)
> + if (ret != 0 || reg >= NUM_LEDS)
>   continue;
>  
>   tca_leds[reg] = led;
> 

Applied.

-- 
Best regards,
Jacek Anaszewski


[PATCH] ARM: dts: imx7d: Invert legacy PCI irq mapping

2017-10-09 Thread Andrey Smirnov
According to i.MX7D reference manual (Rev. 0.1, table 7-1, page 1221)
legacy PCI interrupt mapping is as follows:

 - PCIE INT A is IRQ 122
 - PCIE INT B is IRQ 123
 - PCIE INT C is IRQ 124
 - PCIE INT D is IRQ 125

Invert the mapping information in corresponding DT node to reflect
that.

Cc: Shawn Guo 
Cc: yurov...@gmail.com
Cc: Fabio Estevam 
Cc: Rob Herring 
Cc: Mark Rutland 
Cc: devicet...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Andrey Smirnov 
---

Discovered by using custom build of QEMU[1] when using together with
"usb-ehci" which is, AFAIU, ICH4. A bit of a convoluted use-case, but
I thought it would still be worht fixing.

[1] https://github.com/ndreys/qemu/tree/imx7-support-upstreaming 

 arch/arm/boot/dts/imx7d.dtsi | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/imx7d.dtsi b/arch/arm/boot/dts/imx7d.dtsi
index f46814a7ea44..4d308d17f040 100644
--- a/arch/arm/boot/dts/imx7d.dtsi
+++ b/arch/arm/boot/dts/imx7d.dtsi
@@ -144,10 +144,10 @@
interrupt-names = "msi";
#interrupt-cells = <1>;
interrupt-map-mask = <0 0 0 0x7>;
-   interrupt-map = <0 0 0 1  GIC_SPI 125 IRQ_TYPE_LEVEL_HIGH>,
-   <0 0 0 2  GIC_SPI 124 IRQ_TYPE_LEVEL_HIGH>,
-   <0 0 0 3  GIC_SPI 123 IRQ_TYPE_LEVEL_HIGH>,
-   <0 0 0 4  GIC_SPI 122 IRQ_TYPE_LEVEL_HIGH>;
+   interrupt-map = <0 0 0 1  GIC_SPI 122 IRQ_TYPE_LEVEL_HIGH>,
+   <0 0 0 2  GIC_SPI 123 IRQ_TYPE_LEVEL_HIGH>,
+   <0 0 0 3  GIC_SPI 124 IRQ_TYPE_LEVEL_HIGH>,
+   <0 0 0 4  GIC_SPI 125 IRQ_TYPE_LEVEL_HIGH>;
clocks = < IMX7D_PCIE_CTRL_ROOT_CLK>,
 < IMX7D_PLL_ENET_MAIN_100M_CLK>,
 < IMX7D_PCIE_PHY_ROOT_CLK>;
-- 
2.13.5



Re: [PATCH] leds: tca6507: Remove unnecessary reg check

2017-10-09 Thread Jacek Anaszewski
Hi Christos,

Thanks for the patch.

On 10/08/2017 08:55 PM, Christos Gkekas wrote:
> Variable reg is unsigned so checking whether it is less than zero is not
> necessary.
> 
> Signed-off-by: Christos Gkekas 
> ---
>  drivers/leds/leds-tca6507.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/leds/leds-tca6507.c b/drivers/leds/leds-tca6507.c
> index 45222a7..c12c16f 100644
> --- a/drivers/leds/leds-tca6507.c
> +++ b/drivers/leds/leds-tca6507.c
> @@ -715,7 +715,7 @@ tca6507_led_dt_init(struct i2c_client *client)
>   if (of_property_match_string(child, "compatible", "gpio") >= 0)
>   led.flags |= TCA6507_MAKE_GPIO;
>   ret = of_property_read_u32(child, "reg", );
> - if (ret != 0 || reg < 0 || reg >= NUM_LEDS)
> + if (ret != 0 || reg >= NUM_LEDS)
>   continue;
>  
>   tca_leds[reg] = led;
> 

Applied.

-- 
Best regards,
Jacek Anaszewski


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Pavel Tatashin
Hi Will,

In addition to what Michal wrote:

> As an interim step, why not introduce something like
> vmemmap_alloc_block_flags and make the page-table walking opt-out for
> architectures that don't want it? Then we can just pass __GFP_ZERO from
> our vmemmap_populate where necessary and other architectures can do the
> page-table walking dance if they prefer.

I do not see the benefit, implementing this approach means that we
would need to implement two table walks instead of one: one for x86,
another for ARM, as these two architectures support kasan. Also, this
would become a requirement for any future architecture that want to
add kasan support to add this page table walk implementation.

>> IMO, while I understand that it looks strange that we must walk page
>> table after creating it, it is a better approach: more enclosed as it
>> effects kasan only, and more universal as it is in common code.
>
> I don't buy the more universal aspect, but I appreciate it's subjective.
> Frankly, I'd just sooner not have core code walking early page tables if
> it can be avoided, and it doesn't look hard to avoid it in this case.
> The fact that you're having to add pmd_large and pud_large, which are
> otherwise unused in mm/, is an indication that this isn't quite right imo.

 28 +#define pmd_large(pmd) pmd_sect(pmd)
 29 +#define pud_large(pud) pud_sect(pud)

it is just naming difference, ARM64 calls them pmd_sect, common mm and
other arches call them
pmd_large/pud_large. Even the ARM has these defines in

arm/include/asm/pgtable-3level.h
arm/include/asm/pgtable-2level.h

Pavel


Re: [PATCH v9 09/12] mm/kasan: kasan specific map populate function

2017-10-09 Thread Pavel Tatashin
Hi Will,

In addition to what Michal wrote:

> As an interim step, why not introduce something like
> vmemmap_alloc_block_flags and make the page-table walking opt-out for
> architectures that don't want it? Then we can just pass __GFP_ZERO from
> our vmemmap_populate where necessary and other architectures can do the
> page-table walking dance if they prefer.

I do not see the benefit, implementing this approach means that we
would need to implement two table walks instead of one: one for x86,
another for ARM, as these two architectures support kasan. Also, this
would become a requirement for any future architecture that want to
add kasan support to add this page table walk implementation.

>> IMO, while I understand that it looks strange that we must walk page
>> table after creating it, it is a better approach: more enclosed as it
>> effects kasan only, and more universal as it is in common code.
>
> I don't buy the more universal aspect, but I appreciate it's subjective.
> Frankly, I'd just sooner not have core code walking early page tables if
> it can be avoided, and it doesn't look hard to avoid it in this case.
> The fact that you're having to add pmd_large and pud_large, which are
> otherwise unused in mm/, is an indication that this isn't quite right imo.

 28 +#define pmd_large(pmd) pmd_sect(pmd)
 29 +#define pud_large(pud) pud_sect(pud)

it is just naming difference, ARM64 calls them pmd_sect, common mm and
other arches call them
pmd_large/pud_large. Even the ARM has these defines in

arm/include/asm/pgtable-3level.h
arm/include/asm/pgtable-2level.h

Pavel


[PATCH] arm64: dts: rockchip: enable touchpad button for rk3399-gru-kevin

2017-10-09 Thread Emil Renner Berthing
Adding the linux,gpio-keymap entry also has
the side-effect of making the driver register
the touchpad as a touchpad rather than another
touchscreen.

The index for BTN_LEFT was found by trial and error.

Signed-off-by: Emil Renner Berthing 
---
 arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts 
b/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts
index 5c8621766908..1c58d12551f5 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts
@@ -278,6 +278,10 @@ ap_i2c_dig:  {
pinctrl-0 = <_int_l>;
interrupt-parent = <>;
interrupts = <4 IRQ_TYPE_LEVEL_LOW>;
+   linux,gpio-keymap = ;
wakeup-source;
};
 };
-- 
2.14.2



[PATCH] arm64: dts: rockchip: enable touchpad button for rk3399-gru-kevin

2017-10-09 Thread Emil Renner Berthing
Adding the linux,gpio-keymap entry also has
the side-effect of making the driver register
the touchpad as a touchpad rather than another
touchscreen.

The index for BTN_LEFT was found by trial and error.

Signed-off-by: Emil Renner Berthing 
---
 arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts 
b/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts
index 5c8621766908..1c58d12551f5 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts
@@ -278,6 +278,10 @@ ap_i2c_dig:  {
pinctrl-0 = <_int_l>;
interrupt-parent = <>;
interrupts = <4 IRQ_TYPE_LEVEL_LOW>;
+   linux,gpio-keymap = ;
wakeup-source;
};
 };
-- 
2.14.2



RE: [PATCH v1 RFC 1/7] Replace license with GPL

2017-10-09 Thread Tristram.Ha
> From: tristram...@microchip.com
> > Sent: 06 October 2017 21:33
> > Replace license with GPL.
> 
> Don't you need permission from all the people who have updated
> the files in order to make this change?
> 
>   David

I am a little confused by your comment.  The 4 original KSZ9477 DSA
driver files were written by Woojung at Microchip Technology Inc.
There was a complaint the "AS IS" license is not exactly GPL.

It should be submitted formally to net-next instead of a RFC, but it
is probably pointless to do that when there is no code change.

I am hoping these drastic changes of KSZ9477 driver can be accepted
so that the patches can be submitted formally to net-next.



RE: [PATCH v1 RFC 1/7] Replace license with GPL

2017-10-09 Thread Tristram.Ha
> From: tristram...@microchip.com
> > Sent: 06 October 2017 21:33
> > Replace license with GPL.
> 
> Don't you need permission from all the people who have updated
> the files in order to make this change?
> 
>   David

I am a little confused by your comment.  The 4 original KSZ9477 DSA
driver files were written by Woojung at Microchip Technology Inc.
There was a complaint the "AS IS" license is not exactly GPL.

It should be submitted formally to net-next instead of a RFC, but it
is probably pointless to do that when there is no code change.

I am hoping these drastic changes of KSZ9477 driver can be accepted
so that the patches can be submitted formally to net-next.



Re: [PATCH v2 1/3] kcov: support comparison operands collection

2017-10-09 Thread Mark Rutland
On Mon, Oct 09, 2017 at 08:15:10PM +0200, 'Dmitry Vyukov' via syzkaller wrote:
> On Mon, Oct 9, 2017 at 5:46 PM, Mark Rutland  wrote:
> > On Mon, Oct 09, 2017 at 05:05:19PM +0200, Alexander Potapenko wrote:

> > ... I note that a few places in the kernel use a 128-bit type. Are
> > 128-bit comparisons not instrumented?
> 
> Yes, they are not instrumented.
> How many are there? Can you give some examples?

>From a quick scan, it doesn't looks like there are currently any
comparisons.

It's used as a data type in a few places under arm64:

arch/arm64/include/asm/checksum.h:  __uint128_t tmp;
arch/arm64/include/asm/checksum.h:  tmp = *(const __uint128_t *)iph;
arch/arm64/include/asm/fpsimd.h:__uint128_t vregs[32];
arch/arm64/include/uapi/asm/ptrace.h:   __uint128_t vregs[32];
arch/arm64/include/uapi/asm/sigcontext.h:   __uint128_t vregs[32];
arch/arm64/kernel/signal32.c:   __uint128_t raw;
arch/arm64/kvm/guest.c: __uint128_t tmp;

[...]

> >> + area = t->kcov_area;
> >> + /* The first 64-bit word is the number of subsequent PCs. */
> >> + pos = READ_ONCE(area[0]) + 1;
> >> + if (likely(pos < t->kcov_size)) {
> >> + area[pos] = ip;
> >> + WRITE_ONCE(area[0], pos);
> >
> > Not a new problem, but if the area for one thread is mmap'd, and read by
> > another thread, these two writes could be seen out-of-order, since we
> > don't have an smp_wmb() between them.
> >
> > I guess Syzkaller doesn't read the mmap'd kcov file from another thread?
> 
> 
> Yes, that's the intention. If you read coverage from another thread,
> you can't know coverage from what exactly you read. So the usage
> pattern is:
> 
> reset coverage;
> do something;
> read coverage;

Ok. I guess without a use-case for reading this from another thread it doesn't
really matter.

Thanks,
Mark.


Re: [PATCH v2 1/3] kcov: support comparison operands collection

2017-10-09 Thread Mark Rutland
On Mon, Oct 09, 2017 at 08:15:10PM +0200, 'Dmitry Vyukov' via syzkaller wrote:
> On Mon, Oct 9, 2017 at 5:46 PM, Mark Rutland  wrote:
> > On Mon, Oct 09, 2017 at 05:05:19PM +0200, Alexander Potapenko wrote:

> > ... I note that a few places in the kernel use a 128-bit type. Are
> > 128-bit comparisons not instrumented?
> 
> Yes, they are not instrumented.
> How many are there? Can you give some examples?

>From a quick scan, it doesn't looks like there are currently any
comparisons.

It's used as a data type in a few places under arm64:

arch/arm64/include/asm/checksum.h:  __uint128_t tmp;
arch/arm64/include/asm/checksum.h:  tmp = *(const __uint128_t *)iph;
arch/arm64/include/asm/fpsimd.h:__uint128_t vregs[32];
arch/arm64/include/uapi/asm/ptrace.h:   __uint128_t vregs[32];
arch/arm64/include/uapi/asm/sigcontext.h:   __uint128_t vregs[32];
arch/arm64/kernel/signal32.c:   __uint128_t raw;
arch/arm64/kvm/guest.c: __uint128_t tmp;

[...]

> >> + area = t->kcov_area;
> >> + /* The first 64-bit word is the number of subsequent PCs. */
> >> + pos = READ_ONCE(area[0]) + 1;
> >> + if (likely(pos < t->kcov_size)) {
> >> + area[pos] = ip;
> >> + WRITE_ONCE(area[0], pos);
> >
> > Not a new problem, but if the area for one thread is mmap'd, and read by
> > another thread, these two writes could be seen out-of-order, since we
> > don't have an smp_wmb() between them.
> >
> > I guess Syzkaller doesn't read the mmap'd kcov file from another thread?
> 
> 
> Yes, that's the intention. If you read coverage from another thread,
> you can't know coverage from what exactly you read. So the usage
> pattern is:
> 
> reset coverage;
> do something;
> read coverage;

Ok. I guess without a use-case for reading this from another thread it doesn't
really matter.

Thanks,
Mark.


linux-next: manual merge of the cgroup tree with the net-next tree

2017-10-09 Thread Mark Brown
Hi Tejun,

Today's linux-next merge of the cgroup tree got a conflict in:

  kernel/cgroup/cgroup.c

between commit:

  324bda9e6c5ad ("bpf: multi program support for cgroup+bpf")

from the net-next tree and commit:

  041cd640b2f3c ("cgroup: Implement cgroup2 basic CPU usage accounting")

from the cgroup tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

diff --cc kernel/cgroup/cgroup.c
index 00f5b358aeac,c3421ee0d230..
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@@ -4765,8 -4785,9 +4788,11 @@@ static struct cgroup *cgroup_create(str
  
return cgrp;
  
 +out_idr_free:
 +  cgroup_idr_remove(>cgroup_idr, cgrp->id);
+ out_stat_exit:
+   if (cgroup_on_dfl(parent))
+   cgroup_stat_exit(cgrp);
  out_cancel_ref:
percpu_ref_exit(>self.refcnt);
  out_free_cgrp:


signature.asc
Description: PGP signature


linux-next: manual merge of the cgroup tree with the net-next tree

2017-10-09 Thread Mark Brown
Hi Tejun,

Today's linux-next merge of the cgroup tree got a conflict in:

  kernel/cgroup/cgroup.c

between commit:

  324bda9e6c5ad ("bpf: multi program support for cgroup+bpf")

from the net-next tree and commit:

  041cd640b2f3c ("cgroup: Implement cgroup2 basic CPU usage accounting")

from the cgroup tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

diff --cc kernel/cgroup/cgroup.c
index 00f5b358aeac,c3421ee0d230..
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@@ -4765,8 -4785,9 +4788,11 @@@ static struct cgroup *cgroup_create(str
  
return cgrp;
  
 +out_idr_free:
 +  cgroup_idr_remove(>cgroup_idr, cgrp->id);
+ out_stat_exit:
+   if (cgroup_on_dfl(parent))
+   cgroup_stat_exit(cgrp);
  out_cancel_ref:
percpu_ref_exit(>self.refcnt);
  out_free_cgrp:


signature.asc
Description: PGP signature


Re: [PATCH] block: remove unnecessary NULL checks in bioset_integrity_free()

2017-10-09 Thread Kyle Fortin
On Oct 6, 2017, at 6:26 PM, Tim Hansen  wrote:
> On Fri, Oct 06, 2017 at 01:04:25PM -0600, Jens Axboe wrote:
>> On 10/05/2017 12:09 PM, Tim Hansen wrote:
>>> mempool_destroy() already checks for a NULL value being passed in, this 
>>> eliminates duplicate checks.
>>> 
>>> This was caught by running make coccicheck M=block/ on linus' tree on 
>>> commit 77ede3a014a32746002f7889211f0cecf4803163 (current head as of this 
>>> patch).
>> 
>> Applied, but I had to fix up your commit message. Please line wrap at
>> 72 chars, otherwise it turns into an unreadable mess when people
>> try to view them.
>> 
>> -- 
>> Jens Axboe
>> 
> 
> Noted! Thanks for the heads up on the char limit.

scripts/checkpatch.pl would have caught it too

--
Kyle Fortin - Oracle Linux Engineering






Re: [PATCH] block: remove unnecessary NULL checks in bioset_integrity_free()

2017-10-09 Thread Kyle Fortin
On Oct 6, 2017, at 6:26 PM, Tim Hansen  wrote:
> On Fri, Oct 06, 2017 at 01:04:25PM -0600, Jens Axboe wrote:
>> On 10/05/2017 12:09 PM, Tim Hansen wrote:
>>> mempool_destroy() already checks for a NULL value being passed in, this 
>>> eliminates duplicate checks.
>>> 
>>> This was caught by running make coccicheck M=block/ on linus' tree on 
>>> commit 77ede3a014a32746002f7889211f0cecf4803163 (current head as of this 
>>> patch).
>> 
>> Applied, but I had to fix up your commit message. Please line wrap at
>> 72 chars, otherwise it turns into an unreadable mess when people
>> try to view them.
>> 
>> -- 
>> Jens Axboe
>> 
> 
> Noted! Thanks for the heads up on the char limit.

scripts/checkpatch.pl would have caught it too

--
Kyle Fortin - Oracle Linux Engineering






[PATCH 02/13] EDAC, altera: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts the Altera EDAC code and comments to use
{READ,WRITE}_ONCE() consistently.


virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)


Signed-off-by: Mark Rutland 
Cc: Borislav Petkov 
Cc: Thor Thayer 
---
 drivers/edac/altera_edac.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
index 346c498..11d6419 100644
--- a/drivers/edac/altera_edac.c
+++ b/drivers/edac/altera_edac.c
@@ -175,11 +175,11 @@ static ssize_t altr_sdr_mc_err_inject_write(struct file 
*file,
/*
 * To trigger the error, we need to read the data back
 * (the data was written with errors above).
-* The ACCESS_ONCE macros and printk are used to prevent the
+* The READ_ONCE macros and printk are used to prevent the
 * the compiler optimizing these reads out.
 */
-   reg = ACCESS_ONCE(ptemp[0]);
-   read_reg = ACCESS_ONCE(ptemp[1]);
+   reg = READ_ONCE(ptemp[0]);
+   read_reg = READ_ONCE(ptemp[1]);
/* Force Read */
rmb();
 
@@ -618,7 +618,7 @@ static ssize_t altr_edac_device_trig(struct file *file,
for (i = 0; i < (priv->trig_alloc_sz / sizeof(*ptemp)); i++) {
/* Read data so we're in the correct state */
rmb();
-   if (ACCESS_ONCE(ptemp[i]))
+   if (READ_ONCE(ptemp[i]))
result = -1;
/* Toggle Error bit (it is latched), leave ECC enabled */
writel(error_mask, (drvdata->base + priv->set_err_ofst));
@@ -635,7 +635,7 @@ static ssize_t altr_edac_device_trig(struct file *file,
 
/* Read out written data. ECC error caused here */
for (i = 0; i < ALTR_TRIGGER_READ_WRD_CNT; i++)
-   if (ACCESS_ONCE(ptemp[i]) != i)
+   if (READ_ONCE(ptemp[i]) != i)
edac_printk(KERN_ERR, EDAC_DEVICE,
"Read doesn't match written data\n");
 
-- 
1.9.1



[PATCH 02/13] EDAC, altera: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts the Altera EDAC code and comments to use
{READ,WRITE}_ONCE() consistently.


virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)


Signed-off-by: Mark Rutland 
Cc: Borislav Petkov 
Cc: Thor Thayer 
---
 drivers/edac/altera_edac.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
index 346c498..11d6419 100644
--- a/drivers/edac/altera_edac.c
+++ b/drivers/edac/altera_edac.c
@@ -175,11 +175,11 @@ static ssize_t altr_sdr_mc_err_inject_write(struct file 
*file,
/*
 * To trigger the error, we need to read the data back
 * (the data was written with errors above).
-* The ACCESS_ONCE macros and printk are used to prevent the
+* The READ_ONCE macros and printk are used to prevent the
 * the compiler optimizing these reads out.
 */
-   reg = ACCESS_ONCE(ptemp[0]);
-   read_reg = ACCESS_ONCE(ptemp[1]);
+   reg = READ_ONCE(ptemp[0]);
+   read_reg = READ_ONCE(ptemp[1]);
/* Force Read */
rmb();
 
@@ -618,7 +618,7 @@ static ssize_t altr_edac_device_trig(struct file *file,
for (i = 0; i < (priv->trig_alloc_sz / sizeof(*ptemp)); i++) {
/* Read data so we're in the correct state */
rmb();
-   if (ACCESS_ONCE(ptemp[i]))
+   if (READ_ONCE(ptemp[i]))
result = -1;
/* Toggle Error bit (it is latched), leave ECC enabled */
writel(error_mask, (drvdata->base + priv->set_err_ofst));
@@ -635,7 +635,7 @@ static ssize_t altr_edac_device_trig(struct file *file,
 
/* Read out written data. ECC error caused here */
for (i = 0; i < ALTR_TRIGGER_READ_WRD_CNT; i++)
-   if (ACCESS_ONCE(ptemp[i]) != i)
+   if (READ_ONCE(ptemp[i]) != i)
edac_printk(KERN_ERR, EDAC_DEVICE,
"Read doesn't match written data\n");
 
-- 
1.9.1



[PATCH 06/13] media: dvb_ringbuffer: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts the DVB ringbuffer code and comments to use
{READ,WRITE}_ONCE() consistently.


virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)


Signed-off-by: Mark Rutland 
Cc: Mauro Carvalho Chehab 
Cc: Sakari Ailus 
---
 drivers/media/dvb-core/dvb_ringbuffer.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/media/dvb-core/dvb_ringbuffer.c 
b/drivers/media/dvb-core/dvb_ringbuffer.c
index 2322af1..5301162 100644
--- a/drivers/media/dvb-core/dvb_ringbuffer.c
+++ b/drivers/media/dvb-core/dvb_ringbuffer.c
@@ -66,12 +66,12 @@ ssize_t dvb_ringbuffer_free(struct dvb_ringbuffer *rbuf)
 {
ssize_t free;
 
-   /* ACCESS_ONCE() to load read pointer on writer side
+   /* READ_ONCE() to load read pointer on writer side
 * this pairs with smp_store_release() in dvb_ringbuffer_read(),
 * dvb_ringbuffer_read_user(), dvb_ringbuffer_flush(),
 * or dvb_ringbuffer_reset()
 */
-   free = ACCESS_ONCE(rbuf->pread) - rbuf->pwrite;
+   free = READ_ONCE(rbuf->pread) - rbuf->pwrite;
if (free <= 0)
free += rbuf->size;
return free-1;
@@ -143,7 +143,7 @@ ssize_t dvb_ringbuffer_read_user(struct dvb_ringbuffer 
*rbuf, u8 __user *buf, si
todo -= split;
/* smp_store_release() for read pointer update to ensure
 * that buf is not overwritten until read is complete,
-* this pairs with ACCESS_ONCE() in dvb_ringbuffer_free()
+* this pairs with READ_ONCE() in dvb_ringbuffer_free()
 */
smp_store_release(>pread, 0);
}
@@ -168,7 +168,7 @@ void dvb_ringbuffer_read(struct dvb_ringbuffer *rbuf, u8 
*buf, size_t len)
todo -= split;
/* smp_store_release() for read pointer update to ensure
 * that buf is not overwritten until read is complete,
-* this pairs with ACCESS_ONCE() in dvb_ringbuffer_free()
+* this pairs with READ_ONCE() in dvb_ringbuffer_free()
 */
smp_store_release(>pread, 0);
}
-- 
1.9.1



[PATCH 06/13] media: dvb_ringbuffer: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts the DVB ringbuffer code and comments to use
{READ,WRITE}_ONCE() consistently.


virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)


Signed-off-by: Mark Rutland 
Cc: Mauro Carvalho Chehab 
Cc: Sakari Ailus 
---
 drivers/media/dvb-core/dvb_ringbuffer.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/media/dvb-core/dvb_ringbuffer.c 
b/drivers/media/dvb-core/dvb_ringbuffer.c
index 2322af1..5301162 100644
--- a/drivers/media/dvb-core/dvb_ringbuffer.c
+++ b/drivers/media/dvb-core/dvb_ringbuffer.c
@@ -66,12 +66,12 @@ ssize_t dvb_ringbuffer_free(struct dvb_ringbuffer *rbuf)
 {
ssize_t free;
 
-   /* ACCESS_ONCE() to load read pointer on writer side
+   /* READ_ONCE() to load read pointer on writer side
 * this pairs with smp_store_release() in dvb_ringbuffer_read(),
 * dvb_ringbuffer_read_user(), dvb_ringbuffer_flush(),
 * or dvb_ringbuffer_reset()
 */
-   free = ACCESS_ONCE(rbuf->pread) - rbuf->pwrite;
+   free = READ_ONCE(rbuf->pread) - rbuf->pwrite;
if (free <= 0)
free += rbuf->size;
return free-1;
@@ -143,7 +143,7 @@ ssize_t dvb_ringbuffer_read_user(struct dvb_ringbuffer 
*rbuf, u8 __user *buf, si
todo -= split;
/* smp_store_release() for read pointer update to ensure
 * that buf is not overwritten until read is complete,
-* this pairs with ACCESS_ONCE() in dvb_ringbuffer_free()
+* this pairs with READ_ONCE() in dvb_ringbuffer_free()
 */
smp_store_release(>pread, 0);
}
@@ -168,7 +168,7 @@ void dvb_ringbuffer_read(struct dvb_ringbuffer *rbuf, u8 
*buf, size_t len)
todo -= split;
/* smp_store_release() for read pointer update to ensure
 * that buf is not overwritten until read is complete,
-* this pairs with ACCESS_ONCE() in dvb_ringbuffer_free()
+* this pairs with READ_ONCE() in dvb_ringbuffer_free()
 */
smp_store_release(>pread, 0);
}
-- 
1.9.1



[PATCH 09/13] net: average: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't pick up some uses, including those
in . As a preparatory step, this patch converts the
file to use {READ,WRITE}_ONCE() consistently.

At the same time, this patch addds missing includes necessary for
{READ,WRITE}_ONCE(), *BUG_ON*(), and ilog2().


virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)


Signed-off-by: Mark Rutland 
Cc: Johannes Berg 
Cc: David S. Miller 
---
 include/linux/average.h | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/average.h b/include/linux/average.h
index 7ddaf34..7a72de4 100644
--- a/include/linux/average.h
+++ b/include/linux/average.h
@@ -1,6 +1,10 @@
 #ifndef _LINUX_AVERAGE_H
 #define _LINUX_AVERAGE_H
 
+#include 
+#include 
+#include 
+
 /*
  * Exponentially weighted moving average (EWMA)
  *
@@ -48,7 +52,7 @@
static inline void ewma_##name##_add(struct ewma_##name *e, \
 unsigned long val) \
{   \
-   unsigned long internal = ACCESS_ONCE(e->internal);  \
+   unsigned long internal = READ_ONCE(e->internal);\
unsigned long weight_rcp = ilog2(_weight_rcp);  \
unsigned long precision = _precision;   \
\
@@ -57,10 +61,10 @@
BUILD_BUG_ON((_precision) > 30);\
BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp);   \
\
-   ACCESS_ONCE(e->internal) = internal ?   \
+   WRITE_ONCE(e->internal, internal ?  \
(((internal << weight_rcp) - internal) +\
(val << precision)) >> weight_rcp : \
-   (val << precision); \
+   (val << precision));\
}
 
 #endif /* _LINUX_AVERAGE_H */
-- 
1.9.1



[PATCH 09/13] net: average: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't pick up some uses, including those
in . As a preparatory step, this patch converts the
file to use {READ,WRITE}_ONCE() consistently.

At the same time, this patch addds missing includes necessary for
{READ,WRITE}_ONCE(), *BUG_ON*(), and ilog2().


virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)


Signed-off-by: Mark Rutland 
Cc: Johannes Berg 
Cc: David S. Miller 
---
 include/linux/average.h | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/average.h b/include/linux/average.h
index 7ddaf34..7a72de4 100644
--- a/include/linux/average.h
+++ b/include/linux/average.h
@@ -1,6 +1,10 @@
 #ifndef _LINUX_AVERAGE_H
 #define _LINUX_AVERAGE_H
 
+#include 
+#include 
+#include 
+
 /*
  * Exponentially weighted moving average (EWMA)
  *
@@ -48,7 +52,7 @@
static inline void ewma_##name##_add(struct ewma_##name *e, \
 unsigned long val) \
{   \
-   unsigned long internal = ACCESS_ONCE(e->internal);  \
+   unsigned long internal = READ_ONCE(e->internal);\
unsigned long weight_rcp = ilog2(_weight_rcp);  \
unsigned long precision = _precision;   \
\
@@ -57,10 +61,10 @@
BUILD_BUG_ON((_precision) > 30);\
BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp);   \
\
-   ACCESS_ONCE(e->internal) = internal ?   \
+   WRITE_ONCE(e->internal, internal ?  \
(((internal << weight_rcp) - internal) +\
(val << precision)) >> weight_rcp : \
-   (val << precision); \
+   (val << precision));\
}
 
 #endif /* _LINUX_AVERAGE_H */
-- 
1.9.1



Re: net/sunrpc: v4.14-rc4 lockdep warning

2017-10-09 Thread Trond Myklebust
On Mon, 2017-10-09 at 19:17 +0100, Lorenzo Pieralisi wrote:
> Hi,
> 
> I have run into the lockdep warning below while running v4.14-rc3/rc4
> on an ARM64 defconfig Juno dev board - reporting it to check whether
> it is a known/genuine issue.
> 
> Please let me know if you need further debug data or need some
> specific tests.
> 
> Thanks,
> Lorenzo
> 
> [6.209384] ==
> [6.215569] WARNING: possible circular locking dependency detected
> [6.221755] 4.14.0-rc4 #54 Not tainted
> [6.225503] --
> [6.231689] kworker/4:0H/32 is trying to acquire lock:
> [6.236830]  ((>u.tk_work)){+.+.}, at: []
> process_one_work+0x1cc/0x3f0
> [6.245472] 
>but task is already holding lock:
> [6.251309]  ("xprtiod"){+.+.}, at: []
> process_one_work+0x1cc/0x3f0
> [6.259158] 
>which lock already depends on the new lock.
> 
> [6.267345] 
>the existing dependency chain (in reverse order) is:
> [6.274836] 
>-> #1 ("xprtiod"){+.+.}:
> [6.279903]lock_acquire+0x6c/0xb8
> [6.283914]flush_work+0x188/0x270
> [6.287926]__cancel_work_timer+0x120/0x198
> [6.292720]cancel_work_sync+0x10/0x18
> [6.297081]xs_destroy+0x34/0x58
> [6.300917]xprt_destroy+0x84/0x90
> [6.304927]xprt_put+0x34/0x40
> [6.308589]rpc_task_release_client+0x6c/0x80
> [6.313557]rpc_release_resources_task+0x2c/0x38
> [6.318786]__rpc_execute+0x9c/0x210
> [6.322971]rpc_async_schedule+0x10/0x18
> [6.327504]process_one_work+0x240/0x3f0
> [6.332036]worker_thread+0x48/0x420
> [6.336222]kthread+0x12c/0x158
> [6.339972]ret_from_fork+0x10/0x18
> [6.344068] 
>-> #0 ((>u.tk_work)){+.+.}:
> [6.349920]__lock_acquire+0x12ec/0x14a8
> [6.354451]lock_acquire+0x6c/0xb8
> [6.358462]process_one_work+0x22c/0x3f0
> [6.362994]worker_thread+0x48/0x420
> [6.367180]kthread+0x12c/0x158
> [6.370929]ret_from_fork+0x10/0x18
> [6.375025] 
>other info that might help us debug this:
> 
> [6.383038]  Possible unsafe locking scenario:
> 
> [6.388962]CPU0CPU1
> [6.393493]
> [6.398023]   lock("xprtiod");
> [6.401080]lock((
> >u.tk_work));
> [6.407444]lock("xprtiod");
> [6.413024]   lock((>u.tk_work));
> [6.416863] 
> *** DEADLOCK ***
> 
> [6.422789] 1 lock held by kworker/4:0H/32:
> [6.426972]  #0:  ("xprtiod"){+.+.}, at: []
> process_one_work+0x1cc/0x3f0
> [6.435258] 
>stack backtrace:
> [6.439618] CPU: 4 PID: 32 Comm: kworker/4:0H Not tainted 4.14.0-
> rc4 #54
> [6.446325] Hardware name: ARM Juno development board (r2) (DT)
> [6.452252] Workqueue: xprtiod rpc_async_schedule
> [6.456959] Call trace:
> [6.459406] [] dump_backtrace+0x0/0x3c8
> [6.464810] [] show_stack+0x14/0x20
> [6.469866] [] dump_stack+0xb8/0xf0
> [6.474922] [] print_circular_bug+0x224/0x3a0
> [6.480849] [] check_prev_add+0x304/0x860
> [6.486426] [] __lock_acquire+0x12ec/0x14a8
> [6.492177] [] lock_acquire+0x6c/0xb8
> [6.497406] [] process_one_work+0x22c/0x3f0
> [6.503156] [] worker_thread+0x48/0x420
> [6.508560] [] kthread+0x12c/0x158
> [6.513528] [] ret_from_fork+0x10/0x18
> 

Adding Tejun and Lai, since this looks like a workqueue locking issue.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com


Re: [RFC PATCH] x86/mm: Flush more aggressively in lazy TLB mode

2017-10-09 Thread Andy Lutomirski
On Mon, Oct 9, 2017 at 11:08 AM, Borislav Petkov  wrote:
> On Mon, Oct 09, 2017 at 10:50:34AM -0700, Andy Lutomirski wrote:
>> The choices are somewhat lazy and not lazy at all.
>
> Yeah, you probably should explain those choices somewhere and what
> exactly they mean.
>
>> The degree of simplification I would get by removing it is basically
>> nil.  The debugfs code itself goes away, and a
>> static_branch_unlikely() turns into a static_cpu_has(), and that's it.
>
> Sure. But it is one variable less which is not really needed by the
> widest audience.
>
>> The real reason I added it is because Chris Mason volunteered to
>> benchmark it, and I'll send it to him once it survives a bit of
>> review.
>
> Sure but it still doesn't need to be upstream. You can do all the
> measurements with a patch ontop. You don't need the permanent knob in
> debugfs either. After a year, no one would really need that anymore,
> since the majority will be PCID machines.
>
>> This is non-lazy.  It's roughtly what our state was in old kernels
>> when we went lazy and then called leave_mm().
>
> non-lazy when we went lazy?!
>
> Now I'm confused :)

The function enter_lazy_tlb() is horribly named.  It really just means
scheduler_doesnt_need_an_mm_anymore().

--Andy


Re: net/sunrpc: v4.14-rc4 lockdep warning

2017-10-09 Thread Trond Myklebust
On Mon, 2017-10-09 at 19:17 +0100, Lorenzo Pieralisi wrote:
> Hi,
> 
> I have run into the lockdep warning below while running v4.14-rc3/rc4
> on an ARM64 defconfig Juno dev board - reporting it to check whether
> it is a known/genuine issue.
> 
> Please let me know if you need further debug data or need some
> specific tests.
> 
> Thanks,
> Lorenzo
> 
> [6.209384] ==
> [6.215569] WARNING: possible circular locking dependency detected
> [6.221755] 4.14.0-rc4 #54 Not tainted
> [6.225503] --
> [6.231689] kworker/4:0H/32 is trying to acquire lock:
> [6.236830]  ((>u.tk_work)){+.+.}, at: []
> process_one_work+0x1cc/0x3f0
> [6.245472] 
>but task is already holding lock:
> [6.251309]  ("xprtiod"){+.+.}, at: []
> process_one_work+0x1cc/0x3f0
> [6.259158] 
>which lock already depends on the new lock.
> 
> [6.267345] 
>the existing dependency chain (in reverse order) is:
> [6.274836] 
>-> #1 ("xprtiod"){+.+.}:
> [6.279903]lock_acquire+0x6c/0xb8
> [6.283914]flush_work+0x188/0x270
> [6.287926]__cancel_work_timer+0x120/0x198
> [6.292720]cancel_work_sync+0x10/0x18
> [6.297081]xs_destroy+0x34/0x58
> [6.300917]xprt_destroy+0x84/0x90
> [6.304927]xprt_put+0x34/0x40
> [6.308589]rpc_task_release_client+0x6c/0x80
> [6.313557]rpc_release_resources_task+0x2c/0x38
> [6.318786]__rpc_execute+0x9c/0x210
> [6.322971]rpc_async_schedule+0x10/0x18
> [6.327504]process_one_work+0x240/0x3f0
> [6.332036]worker_thread+0x48/0x420
> [6.336222]kthread+0x12c/0x158
> [6.339972]ret_from_fork+0x10/0x18
> [6.344068] 
>-> #0 ((>u.tk_work)){+.+.}:
> [6.349920]__lock_acquire+0x12ec/0x14a8
> [6.354451]lock_acquire+0x6c/0xb8
> [6.358462]process_one_work+0x22c/0x3f0
> [6.362994]worker_thread+0x48/0x420
> [6.367180]kthread+0x12c/0x158
> [6.370929]ret_from_fork+0x10/0x18
> [6.375025] 
>other info that might help us debug this:
> 
> [6.383038]  Possible unsafe locking scenario:
> 
> [6.388962]CPU0CPU1
> [6.393493]
> [6.398023]   lock("xprtiod");
> [6.401080]lock((
> >u.tk_work));
> [6.407444]lock("xprtiod");
> [6.413024]   lock((>u.tk_work));
> [6.416863] 
> *** DEADLOCK ***
> 
> [6.422789] 1 lock held by kworker/4:0H/32:
> [6.426972]  #0:  ("xprtiod"){+.+.}, at: []
> process_one_work+0x1cc/0x3f0
> [6.435258] 
>stack backtrace:
> [6.439618] CPU: 4 PID: 32 Comm: kworker/4:0H Not tainted 4.14.0-
> rc4 #54
> [6.446325] Hardware name: ARM Juno development board (r2) (DT)
> [6.452252] Workqueue: xprtiod rpc_async_schedule
> [6.456959] Call trace:
> [6.459406] [] dump_backtrace+0x0/0x3c8
> [6.464810] [] show_stack+0x14/0x20
> [6.469866] [] dump_stack+0xb8/0xf0
> [6.474922] [] print_circular_bug+0x224/0x3a0
> [6.480849] [] check_prev_add+0x304/0x860
> [6.486426] [] __lock_acquire+0x12ec/0x14a8
> [6.492177] [] lock_acquire+0x6c/0xb8
> [6.497406] [] process_one_work+0x22c/0x3f0
> [6.503156] [] worker_thread+0x48/0x420
> [6.508560] [] kthread+0x12c/0x158
> [6.513528] [] ret_from_fork+0x10/0x18
> 

Adding Tejun and Lai, since this looks like a workqueue locking issue.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com


Re: [RFC PATCH] x86/mm: Flush more aggressively in lazy TLB mode

2017-10-09 Thread Andy Lutomirski
On Mon, Oct 9, 2017 at 11:08 AM, Borislav Petkov  wrote:
> On Mon, Oct 09, 2017 at 10:50:34AM -0700, Andy Lutomirski wrote:
>> The choices are somewhat lazy and not lazy at all.
>
> Yeah, you probably should explain those choices somewhere and what
> exactly they mean.
>
>> The degree of simplification I would get by removing it is basically
>> nil.  The debugfs code itself goes away, and a
>> static_branch_unlikely() turns into a static_cpu_has(), and that's it.
>
> Sure. But it is one variable less which is not really needed by the
> widest audience.
>
>> The real reason I added it is because Chris Mason volunteered to
>> benchmark it, and I'll send it to him once it survives a bit of
>> review.
>
> Sure but it still doesn't need to be upstream. You can do all the
> measurements with a patch ontop. You don't need the permanent knob in
> debugfs either. After a year, no one would really need that anymore,
> since the majority will be PCID machines.
>
>> This is non-lazy.  It's roughtly what our state was in old kernels
>> when we went lazy and then called leave_mm().
>
> non-lazy when we went lazy?!
>
> Now I'm confused :)

The function enter_lazy_tlb() is horribly named.  It really just means
scheduler_doesnt_need_an_mm_anymore().

--Andy


[PATCH 07/13] net: netlink/netfilter: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts netlink and netfilter code and comments to use
{READ,WRITE}_ONCE() consistently.


virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)


Signed-off-by: Mark Rutland 
Cc: David S. Miller 
Cc: Florian Westphal 
Cc: Jozsef Kadlecsik 
Cc: Pablo Neira Ayuso 
---
 include/linux/genetlink.h   | 2 +-
 include/linux/netfilter/nfnetlink.h | 2 +-
 include/linux/rtnetlink.h   | 2 +-
 include/net/netfilter/nf_tables.h   | 4 ++--
 net/netfilter/ipvs/ip_vs_sync.c | 2 +-
 net/netfilter/nfnetlink_queue.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/genetlink.h b/include/linux/genetlink.h
index a4c61cb..0e694cf 100644
--- a/include/linux/genetlink.h
+++ b/include/linux/genetlink.h
@@ -30,7 +30,7 @@
  * @p: The pointer to read, prior to dereferencing
  *
  * Return the value of the specified RCU-protected pointer, but omit
- * both the smp_read_barrier_depends() and the ACCESS_ONCE(), because
+ * both the smp_read_barrier_depends() and the READ_ONCE(), because
  * caller holds genl mutex.
  */
 #define genl_dereference(p)\
diff --git a/include/linux/netfilter/nfnetlink.h 
b/include/linux/netfilter/nfnetlink.h
index 41d04e9..0f47a4a 100644
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -66,7 +66,7 @@ static inline bool lockdep_nfnl_is_held(__u8 subsys_id)
  * @ss: The nfnetlink subsystem ID
  *
  * Return the value of the specified RCU-protected pointer, but omit
- * both the smp_read_barrier_depends() and the ACCESS_ONCE(), because
+ * both the smp_read_barrier_depends() and the READ_ONCE(), because
  * caller holds the NFNL subsystem mutex.
  */
 #define nfnl_dereference(p, ss)\
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dea59c8..765f7b9 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -67,7 +67,7 @@ static inline bool lockdep_rtnl_is_held(void)
  * @p: The pointer to read, prior to dereferencing
  *
  * Return the value of the specified RCU-protected pointer, but omit
- * both the smp_read_barrier_depends() and the ACCESS_ONCE(), because
+ * both the smp_read_barrier_depends() and the READ_ONCE(), because
  * caller holds RTNL.
  */
 #define rtnl_dereference(p)\
diff --git a/include/net/netfilter/nf_tables.h 
b/include/net/netfilter/nf_tables.h
index 0f5b12a..5c68e27 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1164,8 +1164,8 @@ static inline u8 nft_genmask_next(const struct net *net)
 
 static inline u8 nft_genmask_cur(const struct net *net)
 {
-   /* Use ACCESS_ONCE() to prevent refetching the value for atomicity */
-   return 1 << ACCESS_ONCE(net->nft.gencursor);
+   /* Use READ_ONCE() to prevent refetching the value for atomicity */
+   return 1 << READ_ONCE(net->nft.gencursor);
 }
 
 #define NFT_GENMASK_ANY((1 << 0) | (1 << 1))
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 0e5b64a..1cfffd4 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -457,7 +457,7 @@ static inline bool in_persistence(struct ip_vs_conn *cp)
 static int ip_vs_sync_conn_needed(struct netns_ipvs *ipvs,
  struct ip_vs_conn *cp, int pkts)
 {
-   unsigned long orig = ACCESS_ONCE(cp->sync_endtime);
+   unsigned long orig = READ_ONCE(cp->sync_endtime);
unsigned long now = jiffies;
unsigned long n = (now + cp->timeout) & ~3UL;
unsigned int sync_refresh_period;
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index c979662..a16356c 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -401,7 +401,7 @@ static int nfqnl_put_bridge(struct nf_queue_entry *entry, 
struct sk_buff *skb)
 
outdev = entry->state.out;
 
-   switch ((enum nfqnl_config_mode)ACCESS_ONCE(queue->copy_mode)) {
+   switch ((enum 

[PATCH 07/13] net: netlink/netfilter: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts netlink and netfilter code and comments to use
{READ,WRITE}_ONCE() consistently.


virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)


Signed-off-by: Mark Rutland 
Cc: David S. Miller 
Cc: Florian Westphal 
Cc: Jozsef Kadlecsik 
Cc: Pablo Neira Ayuso 
---
 include/linux/genetlink.h   | 2 +-
 include/linux/netfilter/nfnetlink.h | 2 +-
 include/linux/rtnetlink.h   | 2 +-
 include/net/netfilter/nf_tables.h   | 4 ++--
 net/netfilter/ipvs/ip_vs_sync.c | 2 +-
 net/netfilter/nfnetlink_queue.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/genetlink.h b/include/linux/genetlink.h
index a4c61cb..0e694cf 100644
--- a/include/linux/genetlink.h
+++ b/include/linux/genetlink.h
@@ -30,7 +30,7 @@
  * @p: The pointer to read, prior to dereferencing
  *
  * Return the value of the specified RCU-protected pointer, but omit
- * both the smp_read_barrier_depends() and the ACCESS_ONCE(), because
+ * both the smp_read_barrier_depends() and the READ_ONCE(), because
  * caller holds genl mutex.
  */
 #define genl_dereference(p)\
diff --git a/include/linux/netfilter/nfnetlink.h 
b/include/linux/netfilter/nfnetlink.h
index 41d04e9..0f47a4a 100644
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -66,7 +66,7 @@ static inline bool lockdep_nfnl_is_held(__u8 subsys_id)
  * @ss: The nfnetlink subsystem ID
  *
  * Return the value of the specified RCU-protected pointer, but omit
- * both the smp_read_barrier_depends() and the ACCESS_ONCE(), because
+ * both the smp_read_barrier_depends() and the READ_ONCE(), because
  * caller holds the NFNL subsystem mutex.
  */
 #define nfnl_dereference(p, ss)\
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dea59c8..765f7b9 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -67,7 +67,7 @@ static inline bool lockdep_rtnl_is_held(void)
  * @p: The pointer to read, prior to dereferencing
  *
  * Return the value of the specified RCU-protected pointer, but omit
- * both the smp_read_barrier_depends() and the ACCESS_ONCE(), because
+ * both the smp_read_barrier_depends() and the READ_ONCE(), because
  * caller holds RTNL.
  */
 #define rtnl_dereference(p)\
diff --git a/include/net/netfilter/nf_tables.h 
b/include/net/netfilter/nf_tables.h
index 0f5b12a..5c68e27 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1164,8 +1164,8 @@ static inline u8 nft_genmask_next(const struct net *net)
 
 static inline u8 nft_genmask_cur(const struct net *net)
 {
-   /* Use ACCESS_ONCE() to prevent refetching the value for atomicity */
-   return 1 << ACCESS_ONCE(net->nft.gencursor);
+   /* Use READ_ONCE() to prevent refetching the value for atomicity */
+   return 1 << READ_ONCE(net->nft.gencursor);
 }
 
 #define NFT_GENMASK_ANY((1 << 0) | (1 << 1))
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 0e5b64a..1cfffd4 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -457,7 +457,7 @@ static inline bool in_persistence(struct ip_vs_conn *cp)
 static int ip_vs_sync_conn_needed(struct netns_ipvs *ipvs,
  struct ip_vs_conn *cp, int pkts)
 {
-   unsigned long orig = ACCESS_ONCE(cp->sync_endtime);
+   unsigned long orig = READ_ONCE(cp->sync_endtime);
unsigned long now = jiffies;
unsigned long n = (now + cp->timeout) & ~3UL;
unsigned int sync_refresh_period;
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index c979662..a16356c 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -401,7 +401,7 @@ static int nfqnl_put_bridge(struct nf_queue_entry *entry, 
struct sk_buff *skb)
 
outdev = entry->state.out;
 
-   switch ((enum nfqnl_config_mode)ACCESS_ONCE(queue->copy_mode)) {
+   switch ((enum nfqnl_config_mode)READ_ONCE(queue->copy_mode)) {
case NFQNL_COPY_META:
case NFQNL_COPY_NONE:
  

[PATCH 08/13] net/ipv4/tcp_input.c: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts the IPv4 TCP input code and comments to use
{READ,WRITE}_ONCE() consistently.


virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)


Signed-off-by: Mark Rutland 
Cc: David S. Miller 
---
 net/ipv4/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656..0b3bb19 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -815,12 +815,12 @@ static void tcp_update_pacing_rate(struct sock *sk)
if (likely(tp->srtt_us))
do_div(rate, tp->srtt_us);
 
-   /* ACCESS_ONCE() is needed because sch_fq fetches sk_pacing_rate
+   /* WRITE_ONCE() is needed because sch_fq fetches sk_pacing_rate
 * without any lock. We want to make sure compiler wont store
 * intermediate values in this location.
 */
-   ACCESS_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
-   sk->sk_max_pacing_rate);
+   WRITE_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
+  sk->sk_max_pacing_rate);
 }
 
 /* Calculate rto without backoff.  This is the second half of Van Jacobson's
-- 
1.9.1



[PATCH 08/13] net/ipv4/tcp_input.c: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts the IPv4 TCP input code and comments to use
{READ,WRITE}_ONCE() consistently.


virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)


Signed-off-by: Mark Rutland 
Cc: David S. Miller 
---
 net/ipv4/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656..0b3bb19 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -815,12 +815,12 @@ static void tcp_update_pacing_rate(struct sock *sk)
if (likely(tp->srtt_us))
do_div(rate, tp->srtt_us);
 
-   /* ACCESS_ONCE() is needed because sch_fq fetches sk_pacing_rate
+   /* WRITE_ONCE() is needed because sch_fq fetches sk_pacing_rate
 * without any lock. We want to make sure compiler wont store
 * intermediate values in this location.
 */
-   ACCESS_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
-   sk->sk_max_pacing_rate);
+   WRITE_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
+  sk->sk_max_pacing_rate);
 }
 
 /* Calculate rto without backoff.  This is the second half of Van Jacobson's
-- 
1.9.1



[PATCH 10/13] samples: mic/mpssd/mpssd.c: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

The bulk of the kernel code can be transformed via Coccinelle to use
{READ,WRITE}_ONCE(), though this only modifies users of ACCESS_ONCE(),
and not the implementation itself. As such, it has the potential to
break homebrew ACCESS_ONCE() macros seen in some user code in the kernel
tree (e.g. the virtio code, as fixed in commit ea9156fb3b71d9f7).

To avoid fragility if/when that transformation occurs, and to align with
the preferred usage of {READ,WRITE}_ONCE(), this patch updates the MPSSD
sample code to use READ_ONCE() rather than ACCESS_ONCE(). There should
be no functional change as a result of this patch.

Signed-off-by: Mark Rutland 
Cc: Shuah Khan 
---
 samples/mic/mpssd/mpssd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/samples/mic/mpssd/mpssd.c b/samples/mic/mpssd/mpssd.c
index 49db1de..f42ce55 100644
--- a/samples/mic/mpssd/mpssd.c
+++ b/samples/mic/mpssd/mpssd.c
@@ -65,7 +65,7 @@
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr)_ALIGN(addr, PAGE_SIZE)
 
-#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
 
 #define GSO_ENABLED1
 #define MAX_GSO_SIZE   (64 * 1024)
@@ -382,7 +382,7 @@ static inline void verify_out_len(struct mic_info *mic,
 
 static inline __u16 read_avail_idx(struct mic_vring *vr)
 {
-   return ACCESS_ONCE(vr->info->avail_idx);
+   return READ_ONCE(vr->info->avail_idx);
 }
 
 static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr,
@@ -523,7 +523,7 @@ static inline unsigned _vring_size(unsigned int num, 
unsigned long align)
 {
__u16 avail_idx = read_avail_idx(vr);
 
-   while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) {
+   while (avail_idx == le16toh(READ_ONCE(vr->vr.avail->idx))) {
 #ifdef DEBUG
mpsslog("%s %s waiting for desc avail %d info_avail %d\n",
mic->name, __func__,
-- 
1.9.1



[PATCH 10/13] samples: mic/mpssd/mpssd.c: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

The bulk of the kernel code can be transformed via Coccinelle to use
{READ,WRITE}_ONCE(), though this only modifies users of ACCESS_ONCE(),
and not the implementation itself. As such, it has the potential to
break homebrew ACCESS_ONCE() macros seen in some user code in the kernel
tree (e.g. the virtio code, as fixed in commit ea9156fb3b71d9f7).

To avoid fragility if/when that transformation occurs, and to align with
the preferred usage of {READ,WRITE}_ONCE(), this patch updates the MPSSD
sample code to use READ_ONCE() rather than ACCESS_ONCE(). There should
be no functional change as a result of this patch.

Signed-off-by: Mark Rutland 
Cc: Shuah Khan 
---
 samples/mic/mpssd/mpssd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/samples/mic/mpssd/mpssd.c b/samples/mic/mpssd/mpssd.c
index 49db1de..f42ce55 100644
--- a/samples/mic/mpssd/mpssd.c
+++ b/samples/mic/mpssd/mpssd.c
@@ -65,7 +65,7 @@
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr)_ALIGN(addr, PAGE_SIZE)
 
-#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
 
 #define GSO_ENABLED1
 #define MAX_GSO_SIZE   (64 * 1024)
@@ -382,7 +382,7 @@ static inline void verify_out_len(struct mic_info *mic,
 
 static inline __u16 read_avail_idx(struct mic_vring *vr)
 {
-   return ACCESS_ONCE(vr->info->avail_idx);
+   return READ_ONCE(vr->info->avail_idx);
 }
 
 static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr,
@@ -523,7 +523,7 @@ static inline unsigned _vring_size(unsigned int num, 
unsigned long align)
 {
__u16 avail_idx = read_avail_idx(vr);
 
-   while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) {
+   while (avail_idx == le16toh(READ_ONCE(vr->vr.avail->idx))) {
 #ifdef DEBUG
mpsslog("%s %s waiting for desc avail %d info_avail %d\n",
mic->name, __func__,
-- 
1.9.1



[PATCH 11/13] selftests/powerpc: kill off ACCESS_ONCE()

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

The bulk of the kernel code can be transformed via Coccinelle to use
{READ,WRITE}_ONCE(), though this only modifies users of ACCESS_ONCE(),
and not the implementation itself. As such, it has the potential to
break homebrew ACCESS_ONCE() macros seen in some user code in the kernel
tree (e.g. the virtio code, as fixed in commit ea9156fb3b71d9f7).

To avoid fragility if/when that transformation occurs, and to align with
the preferred usage of {READ,WRITE}_ONCE(), this patch updates the DSCR
selftest code to use READ_ONCE() rather than ACCESS_ONCE(). There should
be no functional change as a result of this patch.

Signed-off-by: Mark Rutland 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Shuah Khan 
---
 tools/testing/selftests/powerpc/dscr/dscr.h  | 2 +-
 tools/testing/selftests/powerpc/dscr/dscr_default_test.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/dscr/dscr.h 
b/tools/testing/selftests/powerpc/dscr/dscr.h
index 18ea223b..cdb840b 100644
--- a/tools/testing/selftests/powerpc/dscr/dscr.h
+++ b/tools/testing/selftests/powerpc/dscr/dscr.h
@@ -39,7 +39,7 @@
 #define rmb()  asm volatile("lwsync":::"memory")
 #define wmb()  asm volatile("lwsync":::"memory")
 
-#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
 
 /* Prilvilege state DSCR access */
 inline unsigned long get_dscr(void)
diff --git a/tools/testing/selftests/powerpc/dscr/dscr_default_test.c 
b/tools/testing/selftests/powerpc/dscr/dscr_default_test.c
index df17c3b..9e1a37e 100644
--- a/tools/testing/selftests/powerpc/dscr/dscr_default_test.c
+++ b/tools/testing/selftests/powerpc/dscr/dscr_default_test.c
@@ -27,7 +27,7 @@ static void *do_test(void *in)
unsigned long d, cur_dscr, cur_dscr_usr;
unsigned long s1, s2;
 
-   s1 = ACCESS_ONCE(sequence);
+   s1 = READ_ONCE(sequence);
if (s1 & 1)
continue;
rmb();
-- 
1.9.1



[PATCH 13/13] rcutorture: formal: prepare for ACCESS_ONCE() removal

2017-10-09 Thread Mark Rutland
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

The bulk of the kernel code can be transformed via Coccinelle to use
{READ,WRITE}_ONCE(), though this only modifies users of ACCESS_ONCE(),
and not the implementation itself. As such, it has the potential to
break homebrew ACCESS_ONCE() macros seen in some user code in the kernel
tree (e.g. the virtio code, as fixed in commit ea9156fb3b71d9f7).

To avoid fragility if/when that transformation occurs, this patch
reworks the rcutorture formal tests to use an intermediate
__ACCESS_ONCE() helper, which will avoid {READ,WRITE}_ONCE() being
turned into tautological definitions. There should be no functional
change as a result of this patch.

Cc: Paul E. McKenney 
Signed-off-by: Mark Rutland 
---
 tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/barriers.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/barriers.h 
b/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/barriers.h
index 6687acc..ee4e4f8 100644
--- a/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/barriers.h
+++ b/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/barriers.h
@@ -34,8 +34,9 @@
 #define rs_smp_mb() do {} while (0)
 #endif
 
-#define ACCESS_ONCE(x) (*(volatile typeof(x) *) &(x))
-#define READ_ONCE(x) ACCESS_ONCE(x)
-#define WRITE_ONCE(x, val) (ACCESS_ONCE(x) = (val))
+#define __ACCESS_ONCE(x) (*(volatile typeof(x) *) &(x))
+#define ACCESS_ONCE(x) __ACCESS_ONCE(x)
+#define READ_ONCE(x) __ACCESS_ONCE(x)
+#define WRITE_ONCE(x, val) (__ACCESS_ONCE(x) = (val))
 
 #endif
-- 
1.9.1



<    3   4   5   6   7   8   9   10   11   12   >