Re: [RFC][PATCH 2/3] drm/bridge: adv7511: Add 200ms delay on power-on

2016-11-22 Thread Daniel Vetter
On Tue, Nov 22, 2016 at 08:23:38PM +0200, Laurent Pinchart wrote:
> Hi John,
> 
> (CC'ing Daniel)
> 
> On Tuesday 22 Nov 2016 10:07:53 John Stultz wrote:
> > On Tue, Nov 22, 2016 at 9:38 AM, John Stultz  wrote:
> > > Interestingly, without the msleep added in this patch, removing the
> > > wait_event_interruptible_timeout() method in adv7511_wait_for_edid()
> > > and using the polling loop seems to make things just as reliable. So
> > > maybe something is off with the irq handling here instead?
> > 
> > A.. So I think the trouble here is that when we fail waiting
> > for the irq, the backtrace is as follows:
> > 
> > [8.318654] [] dump_backtrace+0x0/0x1a0
> > [8.318661] [] show_stack+0x14/0x20
> > [8.318671] [] dump_stack+0x90/0xb0
> > [8.318680] [] adv7511_get_edid_block+0x2c8/0x320
> > [8.318687] [] drm_do_get_edid+0x78/0x280
> > [8.318693] [] adv7511_get_modes+0x80/0xd8
> > [8.318700] [] adv7511_connector_get_modes+0x14/0x20
> > [8.318710] [] drm_helper_probe_single_connector_modes+0x2bc/0x500
> > [8.318718] [] drm_fb_helper_hotplug_event+0x130/0x188
> > [8.318726] [] drm_fbdev_cma_hotplug_event+0x10/0x20
> > [8.318733] [] kirin_fbdev_output_poll_changed+0x20/0x58
> > [8.318740] [] drm_kms_helper_hotplug_event+0x28/0x38
> > [8.318748] [] drm_helper_hpd_irq_event+0x138/0x180
> > [8.318754] [] adv7511_irq_process+0x78/0xd8
> > [8.318761] [] adv7511_irq_handler+0x14/0x28
> > [8.318769] [] irq_thread_fn+0x28/0x68
> > [8.318775] [] irq_thread+0x128/0x1e8
> > [8.318782] [] kthread+0xd0/0xe8
> > [8.318788] [] ret_from_fork+0x10/0x50
> > 
> > So we're actually in irq handling the hotplug interrupt, which is why
> > we never get the irq notification when the edid is read.
> > 
> > I suspect we need to use a workqueue to do the hotplug handling out of irq.
> 
> Lovely :-)
> 
> Quoting the DRM documentation:
> 
> /**
>  * drm_helper_hpd_irq_event - hotplug processing
>  * @dev: drm_device
>  *
>  * Drivers can use this helper function to run a detect cycle on all connectors
>  * which have the DRM_CONNECTOR_POLL_HPD flag set in their  member. All
>  * other connectors are ignored, which is useful to avoid reprobing fixed
>  * panels.
>  *
>  * This helper function is useful for drivers which can't or don't track hotplug
>  * interrupts for each connector.
>  *
>  * Drivers which support hotplug interrupts for each connector individually and
>  * which have a more fine-grained detect logic should bypass this code and
>  * directly call drm_kms_helper_hotplug_event() in case the connector state
>  * changed.
>  *
>  * This function must be called from process context with no mode
>  * setting locks held.
>  *
>  * Note that a connector can be both polled and probed from the hotplug handler,
>  * in case the hotplug interrupt is known to be unreliable.
>  */
> 
> So it looks like we should use drm_kms_helper_hotplug_event() instead.
> 
> /**
>  * drm_kms_helper_hotplug_event - fire off KMS hotplug events
>  * @dev: drm_device whose connector state changed
>  *
>  * This function fires off the uevent for userspace and also calls the
>  * output_poll_changed function, which is most commonly used to inform the fbdev
>  * emulation code and allow it to update the fbcon output configuration.
>  *
>  * Drivers should call this from their hotplug handling code when a change is
>  * detected. Note that this function does not do any output detection of its
>  * own, like drm_helper_hpd_irq_event() does - this is assumed to be done by the
>  * driver already.
>  *
>  * This function must be called from process context with no mode
>  * setting locks held.
>  */
> 
> The function suffers from the same problem though, that it must be called
> from process context.
> 
> Daniel, why do we have an API that is clearly related to interrupt handling
> but requires the caller to implement a workqueue?

Because in general you need that workqueue anyway, and up to now there was
no driver ever who didn't have a work-queue already. Nesting workqueues
within workqueues seemed beyond silly, hence why I removed them in:

commit 69787f7da6b2adc4054357a661aaa1701a9ca76f
Author: Daniel Vetter 
Date:   Tue Oct 23 18:23:34 2012 +

drm: run the hpd irq event code directly

I guess we could talk about re-introducing a work-item based version of
drm_helper_hpd_irq_event. But for drm_kms_helper_hotplug_event I think it
doesn't make sense - if you call that you've probably just done a pile of
i2c transactions, and those can sleep. If you haven't done i2c
transactions, then it's not an external panel, and why exactly are you
handling hpd for them?

-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
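
The deferral John suggests in the quoted text can be sketched in plain
userspace C. This is only a single-threaded model, assuming an "interrupt" is
just a function call; every name in it (work_queue, queue_work_item,
run_pending_work, hpd_irq_handler, hpd_work) is a hypothetical stand-in and
not a kernel or DRM API. It shows the one property the thread cares about:
the handler merely records that a hotplug happened and returns, so the slow
EDID read runs afterwards, outside the handler.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORK 8

typedef void (*work_fn)(void *data);

struct work_item {
	work_fn fn;
	void *data;
};

/* Tiny ring buffer standing in for a real workqueue (hypothetical). */
struct work_queue {
	struct work_item items[MAX_WORK];
	size_t head;	/* next item to run */
	size_t tail;	/* next free slot */
};

/* Queue one item; returns 0 on success, -1 if the queue is full. */
static int queue_work_item(struct work_queue *wq, work_fn fn, void *data)
{
	if (wq->tail - wq->head >= MAX_WORK)
		return -1;
	wq->items[wq->tail % MAX_WORK] = (struct work_item){ fn, data };
	wq->tail++;
	return 0;
}

/* Run everything queued so far; returns the number of items executed. */
static size_t run_pending_work(struct work_queue *wq)
{
	size_t ran = 0;

	while (wq->head != wq->tail) {
		struct work_item w = wq->items[wq->head % MAX_WORK];

		wq->head++;
		w.fn(w.data);
		ran++;
	}
	return ran;
}

struct hpd_state {
	struct work_queue *wq;
	int in_irq;		/* 1 while the handler is running */
	int edid_reads;		/* total simulated EDID reads */
	int edid_reads_in_irq;	/* reads that happened inside the handler */
};

/* Deferred part: stands in for the detect cycle and EDID read. */
static void hpd_work(void *data)
{
	struct hpd_state *st = data;

	st->edid_reads++;
	if (st->in_irq)
		st->edid_reads_in_irq++;
}

/* Handler: only queues the work and returns immediately. */
static void hpd_irq_handler(struct hpd_state *st)
{
	st->in_irq = 1;
	(void)queue_work_item(st->wq, hpd_work, st);
	st->in_irq = 0;
}
```

Calling hpd_irq_handler() and then run_pending_work() performs the simulated
EDID read only in the second step. A real driver would have to pick an actual
deferral mechanism and deal with locking and teardown, which this model
ignores.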


[RFC 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath

2016-11-22 Thread Michal Hocko
From: Michal Hocko 

Tetsuo Handa has pointed out that 0a0337e0d1d1 ("mm, oom: rework oom
detection") has subtly changed semantic for costly high order requests
with __GFP_NOFAIL and without __GFP_REPEAT and those can fail right now.
My code inspection didn't reveal any such users in the tree but it is
true that this might lead to unexpected allocation failures and
subsequent OOPs.

__alloc_pages_slowpath wrt. GFP_NOFAIL is hard to follow currently.
There are few special cases but we are lacking a catch all place to be
sure we will not miss any case where the non failing allocation might
fail. This patch reorganizes the code a bit and puts all those special
cases under nopage label which is the generic go-to-fail path. Non
failing allocations are retried or those that cannot retry like
non-sleeping allocation go to the failure point directly. This should
make the code flow much easier to follow and make it less error prone
for future changes.

While we are there we have to move the stall check up to catch
potentially looping non-failing allocations.

Signed-off-by: Michal Hocko 
---
 mm/page_alloc.c | 68 ++---
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0fbfead6aa7d..76c0b6bb0baf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3627,32 +3627,23 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
order,
goto got_pg;
 
/* Caller is not willing to reclaim, we can't balance anything */
-   if (!can_direct_reclaim) {
-   /*
-* All existing users of the __GFP_NOFAIL are blockable, so warn
-* of any new users that actually allow this type of allocation
-* to fail.
-*/
-   WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+   if (!can_direct_reclaim)
goto nopage;
+
+   /* Make sure we know about allocations which stall for too long */
+   if (time_after(jiffies, alloc_start + stall_timeout)) {
+   warn_alloc(gfp_mask,
+   "page alloction stalls for %ums, order:%u",
+   jiffies_to_msecs(jiffies-alloc_start), order);
+   stall_timeout += 10 * HZ;
}
 
/* Avoid recursion of direct reclaim */
-   if (current->flags & PF_MEMALLOC) {
-   /*
-* __GFP_NOFAIL request from this context is rather bizarre
-* because we cannot reclaim anything and only can loop waiting
-* for somebody to do a work for us.
-*/
-   if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
-   cond_resched();
-   goto retry;
-   }
+   if (current->flags & PF_MEMALLOC)
goto nopage;
-   }
 
/* Avoid allocations with no watermarks from looping endlessly */
-   if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+   if (test_thread_flag(TIF_MEMDIE))
goto nopage;
 
 
@@ -3679,14 +3670,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
order,
if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
goto nopage;
 
-   /* Make sure we know about allocations which stall for too long */
-   if (time_after(jiffies, alloc_start + stall_timeout)) {
-   warn_alloc(gfp_mask,
-   "page alloction stalls for %ums, order:%u",
-   jiffies_to_msecs(jiffies-alloc_start), order);
-   stall_timeout += 10 * HZ;
-   }
-
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 did_some_progress > 0, _progress_loops))
goto retry;
@@ -3715,6 +3698,37 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
order,
}
 
 nopage:
+   /*
+* Make sure that __GFP_NOFAIL request doesn't leak out and make sure
+* we always retry
+*/
+   if (gfp_mask & __GFP_NOFAIL) {
+   /*
+* All existing users of the __GFP_NOFAIL are blockable, so warn
+* of any new users that actually require GFP_NOWAIT
+*/
+   if (WARN_ON_ONCE(!can_direct_reclaim))
+   goto fail;
+
+   /*
+* PF_MEMALLOC request from this context is rather bizarre
+* because we cannot reclaim anything and only can loop waiting
+* for somebody to do a work for us
+*/
+   WARN_ON_ONCE(current->flags & PF_MEMALLOC);
+
+   /*
+* non failing costly orders are a hard requirement which we
+* are not prepared for much so let's warn about these users
+* so that we can identify them and convert them to something
+* else.
+*/
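
The decision the new nopage block makes can be modelled as a small pure
function. This is a sketch only: the hunk is cut off in this archive, so the
retry step is taken from the commit message ("Non failing allocations are
retried or those that cannot retry like non-sleeping allocation go to the
failure point directly") and not from the visible diff. The names
nopage_decide, nopage_result and nopage_action are hypothetical, and the
costly-order warning that the truncated comment introduces is left out.

```c
#include <assert.h>
#include <stdbool.h>

enum nopage_action { NOPAGE_FAIL, NOPAGE_RETRY };

struct nopage_result {
	enum nopage_action action;
	bool warned;	/* models a WARN_ON_ONCE() firing */
};

/*
 * Hypothetical model of the consolidated check at the nopage label.
 *   nofail:             gfp_mask & __GFP_NOFAIL
 *   can_direct_reclaim: the caller is allowed to block and reclaim
 *   pf_memalloc:        current->flags & PF_MEMALLOC
 */
static struct nopage_result nopage_decide(bool nofail, bool can_direct_reclaim,
					  bool pf_memalloc)
{
	struct nopage_result r = { NOPAGE_FAIL, false };

	/* Ordinary requests are allowed to fail. */
	if (!nofail)
		return r;

	/* A non-blocking __GFP_NOFAIL request cannot retry: warn and fail. */
	if (!can_direct_reclaim) {
		r.warned = true;
		return r;
	}

	/* __GFP_NOFAIL from PF_MEMALLOC context is only warned about. */
	if (pf_memalloc)
		r.warned = true;

	/* Retry, per the commit message (not visible in the truncated hunk). */
	r.action = NOPAGE_RETRY;
	return r;
}
```

Keeping the whole policy in one place is what the commit message argues for:
every path that reaches nopage goes through the same __GFP_NOFAIL check.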
+

Re: crash by cdc_acm driver in kernels 4.8-rc1/5

2016-11-22 Thread Bjørn Mork


On November 23, 2016 1:54:57 AM CET, Wim Osterholt  wrote:
>On Tue, Nov 22, 2016 at 07:08:30PM +0100, Bjørn Mork wrote:
>> > On kernel 4.8.8  this crashes hard and produces over a serial link:
>> 
>> Huh?  That device shouldn't ever enter that code path AFAICS.
>> Unless you wouldn't happen to add a dynamic entry for this device,
>
>No idea of what you mean here.
>
>> would you?  What's the output of
>> 
>>  cat /sys/bus/usb/drivers/cdc_acm/new_id
>
>Just empty.

Shit. Back to not understanding how you could possibly enter the debugging
code at all.

Bjørn



Re: [PATCH 1/3] of: base: add support to get machine compatible string

2016-11-22 Thread Sekhar Nori
On Tuesday 22 November 2016 09:16 PM, Sudeep Holla wrote:
> Hi Sekhar,
> 
> On 22/11/16 15:06, Sekhar Nori wrote:
>> Hi Sudeep,
>>
>> On Tuesday 22 November 2016 04:23 PM, Sudeep Holla wrote:
>>>
>>>
>>> On 22/11/16 10:41, Bartosz Golaszewski wrote:
>>>> Add a function allowing to retrieve the compatible string of the root
>>>> node of the device tree.
>>>>
>>>
>>> Rob has queued [1] and it's in -next today. You can reuse that if you
>>> are planning to target this for v4.11 or just use open coding in your
>>> driver for v4.10 and target this move for v4.11 to avoid cross tree
>>> dependencies as I already mentioned in your previous thread.
>>
>> I dont have your original patch in my mailbox, but I wonder if
>> returning a pointer to property string for a node whose reference has
>> already been released is safe to do? Probably not an issue for the root
>> node, but still feels counter-intuitive.
>>
> 
> I am not sure if I understand the issue here. Are you referring a case
> where of_root is freed ?

Yes, right, that's what I was hinting at. Since you are giving up the
reference to the device node before the function returns, the user can
be left with a dangling reference.

> Also I have seen drivers today just using this pointer directly, but
> it's better to copy the string(I just saw this done in one case)

Hmm, the reference is given up before the API returns, so I doubt
copying it later is any additional benefit.

I suspect this is a theoretical issue though since the root device node is
probably never freed.

Thanks,
Sekhar
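
The lifetime concern can be illustrated with a toy reference-counted node in
userspace C. This is a sketch under stated assumptions, not the device tree
API: struct node, node_create, node_get, node_put and node_copy_compatible
are all hypothetical names. The point it makes is the one raised above: a
pointer into the node's property is only valid while a reference is held, so
a copy has to be taken before the reference is dropped, not afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy reference-counted node that owns its "compatible" string. */
struct node {
	int refcount;
	char *compatible;
};

static struct node *node_create(const char *compatible)
{
	size_t len = strlen(compatible) + 1;
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->compatible = malloc(len);
	if (!n->compatible) {
		free(n);
		return NULL;
	}
	memcpy(n->compatible, compatible, len);
	n->refcount = 1;
	return n;
}

static struct node *node_get(struct node *n)
{
	n->refcount++;
	return n;
}

/* Drop one reference; returns 1 if this freed the node, 0 otherwise. */
static int node_put(struct node *n)
{
	if (--n->refcount > 0)
		return 0;
	free(n->compatible);
	free(n);
	return 1;
}

/*
 * Copy the string while the reference is still held, then drop the
 * reference. The caller owns the returned buffer and must free() it.
 */
static char *node_copy_compatible(struct node *n)
{
	size_t len;
	char *copy;

	node_get(n);
	len = strlen(n->compatible) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, n->compatible, len);
	node_put(n);
	return copy;
}
```

A helper that instead returned n->compatible after node_put() would hand the
caller a pointer whose validity depends on someone else still holding a
reference, which is the counter-intuitive part discussed in this thread.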



Re: Enabling peer to peer device transactions for PCIe devices

2016-11-22 Thread Daniel Vetter
On Tue, Nov 22, 2016 at 01:21:03PM -0800, Dan Williams wrote:
> On Tue, Nov 22, 2016 at 1:03 PM, Daniel Vetter  wrote:
> > On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch
> >  wrote:
> >>
> >> On 2016-11-22 03:10 PM, Daniel Vetter wrote:
> >>>
> >>> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams 
> >>> wrote:
> >>>>
> >>>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
> >>>> wrote:
> >>>>>
> >>>>> I personally like "device-DAX" idea but my concerns are:
> >>>>>
> >>>>> -  How well it will co-exists with the  DRM infrastructure /
> >>>>> implementations
> >>>>> in part dealing with CPU pointers?
> >>>>
> >>>> Inside the kernel a device-DAX range is "just memory" in the sense
> >>>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
> >>>> not migratable. To be honest I do not know how well that co-exists
> >>>> with drm infrastructure.
> >>>>
> >>>>> -  How well we will be able to handle case when we need to
> >>>>> "move"/"evict"
> >>>>> memory/data to the new location so CPU pointer should point to the
> >>>>> new
> >>>>> physical location/address
> >>>>>  (and may be not in PCI device memory at all)?
> >>>>
> >>>> So, device-DAX deliberately avoids support for in-kernel migration or
> >>>> overcommit. Those cases are left to the core mm or drm. The device-dax
> >>>> interface is for cases where all that is needed is a direct-mapping to
> >>>> a statically-allocated physical-address range be it persistent memory
> >>>> or some other special reserved memory range.
> >>>
> >>> For some of the fancy use-cases (e.g. to be comparable to what HMM can
> >>> pull off) I think we want all the magic in core mm, i.e. migration and
> >>> overcommit. At least that seems to be the very strong drive in all
> >>> general-purpose gpu abstractions and implementations, where memory is
> >>> allocated with malloc, and then mapped/moved into vram/gpu address
> >>> space through some magic,
> >>
> >> It is possible that there is other way around: memory is requested to be
> >> allocated and should be kept in vram for  performance reason but due
> >> to possible overcommit case we need at least temporally to "move" such
> >> allocation to system memory.
> >
> > With migration I meant migrating both ways of course. And with stuff
> > like numactl we can also influence where exactly the malloc'ed memory
> > is allocated originally, at least if we'd expose the vram range as a
> > very special numa node that happens to be far away and not hold any
> > cpu cores.
> 
> I don't think we should be using numa distance to reverse engineer a
> certain allocation behavior.  The latency data should be truthful, but
> you're right we'll need a mechanism to keep general purpose
> allocations out of that range by default. Btw, strict isolation is
> another design point of device-dax, but I think in this case we're
> describing something between the two extremes of full isolation and
> full compatibility with existing numactl apis.

Yes, agreed. My idea with exposing vram sections using numa nodes wasn't
to reuse all the existing allocation policies directly, those won't work.
So at boot-up your default numa policy would exclude any vram nodes.

But I think (as an -mm layman) that numa gives us a lot of the tools and
policy interface that we need to implement what we want for gpus.

Wrt isolation: There's a sliding scale of what different users expect.
Everything from full auto everything, including migrating pages around if
needed, to full isolation seems to be on the table. As long as we keep vram
nodes out of any default allocation numasets, full isolation should be
possible.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH] Staging: iio: adc: fix sysfs files modes in ad7192.c

2016-11-22 Thread Boyan Vladinov
Fixes sysfs entries user/group modes and coding style warnings
found by checkpatch.pl tool

Signed-off-by: Boyan Vladinov 
---
 drivers/staging/iio/adc/ad7192.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/iio/adc/ad7192.c b/drivers/staging/iio/adc/ad7192.c
index 1fb68c01abd5..3f9f54b654f7 100644
--- a/drivers/staging/iio/adc/ad7192.c
+++ b/drivers/staging/iio/adc/ad7192.c
@@ -341,10 +341,10 @@ ad7192_show_scale_available(struct device *dev,
 }
 
 static IIO_DEVICE_ATTR_NAMED(in_v_m_v_scale_available,
-in_voltage-voltage_scale_available,
-S_IRUGO, ad7192_show_scale_available, NULL, 0);
+in_voltage - voltage_scale_available,
+0444, ad7192_show_scale_available, NULL, 0);
 
-static IIO_DEVICE_ATTR(in_voltage_scale_available, S_IRUGO,
+static IIO_DEVICE_ATTR(in_voltage_scale_available, 0444,
   ad7192_show_scale_available, NULL, 0);
 
 static ssize_t ad7192_show_ac_excitation(struct device *dev,
@@ -412,11 +412,11 @@ static ssize_t ad7192_set(struct device *dev,
return ret ? ret : len;
 }
 
-static IIO_DEVICE_ATTR(bridge_switch_en, S_IRUGO | S_IWUSR,
+static IIO_DEVICE_ATTR(bridge_switch_en, 0444 | 0200,
   ad7192_show_bridge_switch, ad7192_set,
   AD7192_REG_GPOCON);
 
-static IIO_DEVICE_ATTR(ac_excitation_en, S_IRUGO | S_IWUSR,
+static IIO_DEVICE_ATTR(ac_excitation_en, 0444 | 0200,
   ad7192_show_ac_excitation, ad7192_set,
   AD7192_REG_MODE);
 
-- 
2.7.4
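
Two properties of the change above can be checked in userspace C. This is a
sketch, not driver code. S_IRUGO is a kernel-only macro, so it is defined
here the way include/linux/stat.h defines it; STRINGIFY is a hypothetical
stand-in written in the style of the kernel's __stringify(). The second part
rests on an assumption: that IIO_DEVICE_ATTR_NAMED turns its name argument
into the sysfs file name by stringification, as __ATTR() does. If that holds,
the spaces added around the '-' end up in the file name, which would be worth
verifying before applying the patch.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Kernel-only macro; same definition as in include/linux/stat.h. */
#ifndef S_IRUGO
#define S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)
#endif

/* Two-level stringification, in the style of the kernel's __stringify(). */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Mode used for the read-only attributes. */
static int mode_read_only(void)
{
	return S_IRUGO;
}

/* Mode used for the attributes the owner may also write. */
static int mode_read_write_owner(void)
{
	return S_IRUGO | S_IWUSR;
}

/* Name as written before the patch. */
static const char *name_without_spaces(void)
{
	return STRINGIFY(in_voltage-voltage_scale_available);
}

/* Name as written after the patch: the whitespace is kept by '#'. */
static const char *name_with_spaces(void)
{
	return STRINGIFY(in_voltage - voltage_scale_available);
}
```

S_IRUGO equals 0444 and S_IRUGO | S_IWUSR equals 0644, so "0444 | 0200"
could be written simply as 0644.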



[PATCH] arm64: dts: exynos: enable hs400 mode for eMMC for TM2

2016-11-22 Thread Jaehoon Chung
TM2 can support the HS400 mode, but eMMC is working as the lowest mode.
This patch added the properties for HS400 and other modes.

Signed-off-by: Jaehoon Chung 
---
 arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
index 88cb6c1..f21bdc2 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
+++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
@@ -701,6 +701,9 @@
 _0 {
status = "okay";
num-slots = <1>;
+   mmc-hs200-1_8v;
+   mmc-hs400-1_8v;
+   cap-mmc-highspeed;
non-removable;
card-detect-delay = <200>;
samsung,dw-mshc-ciu-div = <3>;
-- 
2.10.1



Re: [PATCH] Add support for disabling Intel PT trace in ftrace

2016-11-22 Thread Alexander Shishkin
Andi Kleen  writes:

> +/*
> + * Disable the PT trace for debugging purposes.
> + */
> +void pt_disable(void)
> +{
> + u64 val;
> +
> + if (!boot_cpu_has(X86_FEATURE_INTEL_PT))
> + return;
> +
> + rdmsrl_safe(MSR_IA32_RTIT_CTL, &val);
> + val &= ~RTIT_CTL_TRACEEN;
> + wrmsrl_safe(MSR_IA32_RTIT_CTL, val);
> +}
> +EXPORT_SYMBOL(pt_disable);

This will create unexplainable gaps in the trace, at least we should
output RECORD_AUX when this happens, maybe add a flag for "had to stop
the trace for reasons external to perf".

Also, I can't tell if this is called from an atomic context.

But I'd suggest something more generic like perf_pmu_off($pmu):
 - we already have the code to stop the output;
 - this won't be a driver-specific api then;
 - this will be reflected in the event hw state;
 - it will also go through the driver's callbacks, so its internal
 states will actually match the reality;
 - will work equally well for intel_bts or the ARM/Coresight tracers.

Regards,
--
Alex
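
For readers following the quoted pt_disable(): the core of it is a
read-modify-write that clears one enable bit. A minimal userspace sketch of
that pattern follows. Everything prefixed sk_ is hypothetical, the "MSR" is a
plain struct, and the bit position of the enable flag is an assumption made
for illustration. Unlike the quoted code, the sketch checks the return values
of the _safe-style accessors.

```c
#include <stdint.h>

/* Assumed bit position of the trace-enable flag; illustration only. */
#define SK_CTL_TRACEEN (1ULL << 0)

/* Hypothetical stand-in for a model-specific register. */
struct sk_msr {
    uint64_t value;
    int faulting;   /* non-zero: every access fails, like a faulting MSR */
};

/* Mimics a *_safe read: returns 0 on success, -1 if the access faults. */
static int sk_rdmsr_safe(const struct sk_msr *m, uint64_t *val)
{
    if (m->faulting)
        return -1;
    *val = m->value;
    return 0;
}

/* Mimics a *_safe write: returns 0 on success, -1 if the access faults. */
static int sk_wrmsr_safe(struct sk_msr *m, uint64_t val)
{
    if (m->faulting)
        return -1;
    m->value = val;
    return 0;
}

/* Clear only the enable bit and leave every other control bit untouched. */
static int sk_trace_disable(struct sk_msr *ctl)
{
    uint64_t val;

    if (sk_rdmsr_safe(ctl, &val))
        return -1;
    val &= ~SK_CTL_TRACEEN;
    return sk_wrmsr_safe(ctl, val);
}
```

This only illustrates the register update. It does not address the points
raised in the reply (trace gaps, atomic context, going through the PMU
callbacks).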


Re: [PATCH v2] mm: support anonymous stable page

2016-11-22 Thread Minchan Kim
Hi Hugh,

On Tue, Nov 22, 2016 at 08:43:54PM -0800, Hugh Dickins wrote:
> On Tue, 22 Nov 2016, Minchan Kim wrote:
> > On Mon, Nov 21, 2016 at 07:46:28PM -0800, Hugh Dickins wrote:
> > > 
> > > Andrew might ask if we should Cc stable (haha): I think we agree
> > > that it's a defect we've been aware of ever since stable pages were
> > > first proposed, but nobody has actually been troubled by it before
> > > your async zram development: so, you're right to be fixing it ahead
> > > of your zram changes, but we don't see a call for backporting.
> > 
> > I thought so until I see your comment. However, I checked again
> > and found it seems a ancient bug since zram birth.
> > swap_writepage unlock the page right before submitting bio while
> > it keeps the lock during rw_page operation during bdev_write_page.
> > So, if zram_rw_page fails(e.g, -ENOMEM) and then fallback to
> > submit_bio in __swap_writepage, the problem can occur.
> 
> It's not clear to me why that matters.  If it drives zram mad
> to the point of crashing the kernel, yes, that would matter.  But
> if it just places incomprehensible or mis-CRCed data on the device,
> who cares?  The reused swap page is marked dirty, and nobody should
> be reading the stale data back off swap.  If you do resend with a
> stable tag, please make clear why it matters.

Your comment made me think again. For old zram, it would not be a
problem. Thanks for the hint, Hugh!
However, it makes the 4.7 kernel crash with the per-cpu stream feature
introduced in zram to increase the cache hit ratio.

The problem is that zram compresses the page and then gets the
compressed size. With that size, it allocates a buffer via zsmalloc,
but that can easily fail due to the limited gfp flags in per-cpu
context, so zram retries the allocation outside per-cpu context with
more relaxed gfp flags. If that succeeds, it compresses the page again
and copies the compressed data into the buffer allocated in advance.
If the page content changes during these operations, the compressed
size can differ, so the buffer size allocated in the first attempt is
no longer valid. That ends in a buffer overrun, which breaks the
zsmalloc free object chaining and crashes the kernel.
So, if we really want to fix it, it should go to stable for v4.7,
at least.

Unfortunately, zram has a reset feature, which means zram shrinks
the disksize to zero and then has to revalidate the disk, where
it will reset BDI_CAP_STABLE_WRITES. Currently zram cannot do that
atomically, so someone can miss BDI_CAP_STABLE_WRITES of /dev/zram0.

I need to redesign the locking before supporting stable pages in
zram, so now I realize it may be hard to reach the stable tree.
I will think it over with more time.

Thanks, as always, for the helpful comments!
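
The two-pass problem described above can be reproduced in miniature. The
sketch below is not zram code: it uses a toy run-length encoder, a 16-byte
"page", and hypothetical sk_ names. It only shows that a buffer sized from a
first compression pass is too small once the page changes before the second
pass, and that bounding the second pass by the measured size turns the
overrun into an error.

```c
#include <stddef.h>
#include <string.h>

#define SK_PAGE_BYTES 16   /* toy page size, not the kernel's PAGE_SIZE */

/*
 * Toy run-length encoder emitting (count, byte) pairs.
 * With out == NULL it only measures the encoded size.
 * Returns the encoded size, or (size_t)-1 if out_cap is too small.
 */
static size_t sk_rle(const unsigned char *src, size_t len,
                     unsigned char *out, size_t out_cap)
{
    size_t used = 0, i = 0;

    while (i < len) {
        unsigned char b = src[i];
        size_t run = 1;

        while (i + run < len && src[i + run] == b && run < 255)
            run++;
        if (out) {
            if (used + 2 > out_cap)
                return (size_t)-1;      /* would overrun the buffer */
            out[used] = (unsigned char)run;
            out[used + 1] = b;
        }
        used += 2;
        i += run;
    }
    return used;
}

/*
 * First pass measures, second pass stores into a buffer of the measured
 * size. modify_between simulates the page being written in between.
 * Returns 0 on success, -1 if the second pass no longer fits.
 */
static int sk_two_pass_store(unsigned char *page, int modify_between,
                             unsigned char *buf, size_t buf_cap,
                             size_t *first_len)
{
    size_t need = sk_rle(page, SK_PAGE_BYTES, NULL, 0);
    size_t i;

    *first_len = need;
    if (need > buf_cap)
        return -1;
    if (modify_between)
        for (i = 0; i < SK_PAGE_BYTES; i++)
            page[i] = (unsigned char)i;  /* content no longer compresses */
    if (sk_rle(page, SK_PAGE_BYTES, buf, need) == (size_t)-1)
        return -1;
    return 0;
}
```

With an all-zero page the first pass needs 2 bytes. If the page is rewritten
before the second pass, the encoder needs 32 bytes and the bounded store
fails instead of writing past the 2-byte allocation.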



Re: [PATCH v2 2/2] mmc: sdhci-pci: Use ACPI to get max frequency for Intel byt sdio controller sub-vended by NI

2016-11-22 Thread Adrian Hunter
On 22/11/16 23:53, Zach Brown wrote:
> On NI 9037 boards the max SDIO frequency is limited by trace lengths
> and other layout choices. The max SDIO frequency is stored in an ACPI
> table.
> 
> The driver reads the ACPI entry MXFQ during sdio_probe_slot and sets the
> f_max field of the host.
> 
> Signed-off-by: Nathan Sullivan 
> Reviewed-by: Jaeden Amero 
> Reviewed-by: Josh Cartwright 
> Signed-off-by: Zach Brown 
> ---
>  drivers/mmc/host/sdhci-pci-core.c | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
> index 9741505..34284b8 100644
> --- a/drivers/mmc/host/sdhci-pci-core.c
> +++ b/drivers/mmc/host/sdhci-pci-core.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "sdhci.h"
>  #include "sdhci-pci.h"
> @@ -375,6 +376,30 @@ static int byt_emmc_probe_slot(struct sdhci_pci_slot *slot)
>   return 0;
>  }
>  
> +#ifdef CONFIG_ACPI
> +
> +static int ni_byt_sdio_probe_slot(struct sdhci_pci_slot *slot)
> +{
> + acpi_status status;
> + unsigned long long max_freq;
> +
> + status = acpi_evaluate_integer(ACPI_HANDLE(&slot->chip->pdev->dev),
> +"MXFQ", NULL, &max_freq);
> + if (ACPI_FAILURE(status)) {
> + dev_err(&slot->chip->pdev->dev,
> + "MXFQ not found in acpi table\n");
> + return -EINVAL;
> + }
> +
> + slot->host->mmc->f_max = max_freq * 100;
> +
> + slot->host->mmc->caps |= MMC_CAP_POWER_OFF_CARD | MMC_CAP_NONREMOVABLE |
> +  MMC_CAP_WAIT_WHILE_BUSY;
> + return 0;
> +}
> +
> +#else

No, it is the ACPI access that needs to be a separate function.  We don't
want 2 ni_byt_sdio_probe_slot(). Perhaps, something like:

#ifdef CONFIG_ACPI
static int ni_set_max_freq(struct sdhci_pci_slot *slot)
{
	acpi_status status;
	unsigned long long max_freq;

	status = acpi_evaluate_integer(ACPI_HANDLE(&slot->chip->pdev->dev),
				       "MXFQ", NULL, &max_freq);
	if (ACPI_FAILURE(status)) {
		dev_err(&slot->chip->pdev->dev,
			"MXFQ not found in acpi table\n");
		return -EINVAL;
	}

	slot->host->mmc->f_max = max_freq * 100;

	return 0;
}
#else
static inline int ni_set_max_freq(struct sdhci_pci_slot *slot)
{
	return 0;
}
#endif
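
The point of the suggestion is that only the ACPI lookup sits behind the
#ifdef, so a single probe function can call it unconditionally. A userspace
sketch of that shape follows. The sk_ names, the struct, the capability flag
values, and the way the probe function handles the error are all assumptions
made for illustration. The scale factor is copied from the snippet above.

```c
#include <errno.h>

#define SK_HAVE_ACPI 1                 /* stand-in for CONFIG_ACPI */
#define SK_CAP_NONREMOVABLE    0x1u    /* hypothetical flag values */
#define SK_CAP_WAIT_WHILE_BUSY 0x2u

/* Hypothetical, flattened stand-in for the slot/host/mmc structures. */
struct sk_slot {
    int has_mxfq;              /* whether the firmware table has MXFQ */
    unsigned long long mxfq;   /* value the firmware would report */
    unsigned long long f_max;
    unsigned int caps;
};

#if SK_HAVE_ACPI
/* Firmware-backed variant: fails if the entry is missing. */
static int sk_set_max_freq(struct sk_slot *slot)
{
    if (!slot->has_mxfq)
        return -EINVAL;
    slot->f_max = slot->mxfq * 100;
    return 0;
}
#else
/* Stub variant: nothing to look up, so report success. */
static inline int sk_set_max_freq(struct sk_slot *slot)
{
    (void)slot;
    return 0;
}
#endif

/* One probe function regardless of configuration. */
static int sk_probe_slot(struct sk_slot *slot)
{
    int err = sk_set_max_freq(slot);

    if (err)
        return err;
    slot->caps |= SK_CAP_NONREMOVABLE | SK_CAP_WAIT_WHILE_BUSY;
    return 0;
}
```

Whether a missing MXFQ entry should fail the whole probe, as it does in this
sketch, is a design choice the thread leaves to the patch author.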




Re: [PATCH] Staging: iio: adc: fix sysfs files modes in ad7192.c

2016-11-22 Thread Greg KH
On Tue, Nov 22, 2016 at 11:25:14PM -0800, Boyan Vladinov wrote:
> Fixes sysfs entries user/group modes and coding style warnings
> found by checkpatch.pl tool
> 
> Signed-off-by: Boyan Vladinov 
> ---
>  drivers/staging/iio/adc/ad7192.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/staging/iio/adc/ad7192.c 
> b/drivers/staging/iio/adc/ad7192.c
> index 1fb68c01abd5..3f9f54b654f7 100644
> --- a/drivers/staging/iio/adc/ad7192.c
> +++ b/drivers/staging/iio/adc/ad7192.c
> @@ -341,10 +341,10 @@ ad7192_show_scale_available(struct device *dev,
>  }
>  
>  static IIO_DEVICE_ATTR_NAMED(in_v_m_v_scale_available,
> -  in_voltage-voltage_scale_available,
> -  S_IRUGO, ad7192_show_scale_available, NULL, 0);
> +  in_voltage - voltage_scale_available,
> +  0444, ad7192_show_scale_available, NULL, 0);

IIO_DEVICE_ATTR_RO() after fixing up some variable names?

>  
> -static IIO_DEVICE_ATTR(in_voltage_scale_available, S_IRUGO,
> +static IIO_DEVICE_ATTR(in_voltage_scale_available, 0444,
>  ad7192_show_scale_available, NULL, 0);

IIO_DEVICE_ATTR_RO()?

>  
>  static ssize_t ad7192_show_ac_excitation(struct device *dev,
> @@ -412,11 +412,11 @@ static ssize_t ad7192_set(struct device *dev,
>   return ret ? ret : len;
>  }
>  
> -static IIO_DEVICE_ATTR(bridge_switch_en, S_IRUGO | S_IWUSR,
> +static IIO_DEVICE_ATTR(bridge_switch_en, 0444 | 0200,
>  ad7192_show_bridge_switch, ad7192_set,
>  AD7192_REG_GPOCON);

IIO_DEVICE_ATTR_RW()?

thanks,

greg k-h
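
For reference, the patch quoted above replaces S_IRUGO | S_IWUSR with the
literal expression 0444 | 0200, which is simply 0644. The userspace sketch
below checks that arithmetic against the POSIX permission macros. S_IRUGO is
a kernel-side shorthand, so it is re-created here under the hypothetical
name SK_S_IRUGO.

```c
#include <sys/stat.h>

/* Userspace re-creation of the kernel shorthand; illustration only. */
#define SK_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)

/* Mode of a read-only attribute: readable by user, group and others. */
static unsigned int sk_mode_ro(void)
{
    return SK_S_IRUGO;
}

/* Mode of a read-write attribute: as above, plus writable by the owner. */
static unsigned int sk_mode_rw(void)
{
    return SK_S_IRUGO | S_IWUSR;
}
```

Here sk_mode_ro() evaluates to 0444 and sk_mode_rw() to 0644, so the
two-term octal expression in the patch could be written as a single literal.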


Re: [v7] QE: remove PPCisms for QE

2016-11-22 Thread Scott Wood
On Wed, Sep 28, 2016 at 11:15:31AM +0800, Zhao Qiang wrote:
> QE was supported on PowerPC, and dependent on PPC,
> Now it is supported on other platforms. so remove PPCisms.
> 
> Signed-off-by: Zhao Qiang 
> ---

Changelog should be something like:

soc/fsl/qe: Cleanups and portability fixes

QE was supported on PowerPC, and dependent on PPC.  In preparation for
supporting on other platforms, remove some PPCisms.

The PPC kconfig dependency is moved from the QE core into the individual
QE peripheral drivers, to allow portability work to occur on them
separately.

> diff --git a/include/soc/fsl/qe/qe.h b/include/soc/fsl/qe/qe.h
> index 70339d7..f7a14f2 100644
> --- a/include/soc/fsl/qe/qe.h
> +++ b/include/soc/fsl/qe/qe.h
> @@ -21,7 +21,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 

When building corenet32_smp_defconfig, I get this:

/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:262:31: error: 
'BD_SC_READY' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:266:31: error: 
'BD_SC_WRAP' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:348:3: error: 
'BD_SC_READY' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:350:31: error: 
'BD_SC_WRAP' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:477:16: error: 
'BD_SC_EMPTY' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:501:9: error: 
'BD_SC_BR' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:501:20: error: 
'BD_SC_FR' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:501:31: error: 
'BD_SC_PR' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:501:42: error: 
'BD_SC_OV' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:512:3: error: 
'BD_SC_ID' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:514:31: error: 
'BD_SC_WRAP' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:605:26: error: 
'BD_SC_EMPTY' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:605:40: error: 
'BD_SC_INTRPT' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:613:25: error: 
'BD_SC_WRAP' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:926:27: error: 
'BD_SC_EMPTY' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:926:41: error: 
'BD_SC_OV' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:928:29: error: 
'BD_SC_FR' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:928:40: error: 
'BD_SC_PR' undeclared (first use in this function)
/home/scott/fsl/git/linux/upstream/drivers/tty/serial/ucc_uart.c:930:29: error: 
'BD_SC_BR' undeclared (first use in this function)

-Scott


Re: [PATCHv0 1/1] fbdev: add Intel FPGA FRAME BUFFER driver

2016-11-22 Thread Tomi Valkeinen
Hi,

On 16/11/16 11:07, Ong, Hean Loong wrote:
> From: Ong Hean Loong 
> 
>   This patch enables the display port IP driver for
>   Intel Arria 10 SOCFPGA Golden Hardware
>   Reference Design (GHRD).
> 
>   The driver requires enabling the options such as
>   Coherent Memory Allocation,
>   Intel FPGA Frame Buffer, Frame Buffer Console
> 
> Signed-off-by: Ong Hean Loong 
> ---
>  .../devicetree/bindings/video/intelfpgavipfb.txt   |   22 ++
>  MAINTAINERS|6 +
>  drivers/video/fbdev/Kconfig|   15 +
>  drivers/video/fbdev/Makefile   |1 +
>  drivers/video/fbdev/intelfpgavipfb.c   |  302 
> 
>  5 files changed, 346 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/video/intelfpgavipfb.txt
>  create mode 100644 drivers/video/fbdev/intelfpgavipfb.c

As mentioned by Rob and Alan, no new fbdev drivers please. Write a DRM
driver for this.

 Tomi





RE: [PATCH V3 1/2] powerpc/mpc85xx: Update TMU device tree node for T1040/T1042

2016-11-22 Thread Troy Jia


> -Original Message-
> From: Scott Wood [mailto:o...@buserror.net]
> Sent: Wednesday, November 23, 2016 3:07 PM
> To: Troy Jia ; rui.zh...@intel.com; edubez...@gmail.com;
> robh...@kernel.org; Scott Wood ; shawn...@kernel.org
> Cc: devicet...@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH V3 1/2] powerpc/mpc85xx: Update TMU device tree node for
> T1040/T1042
> 
> On Tue, 2016-10-25 at 10:15 +0800, Jia Hongtao wrote:
> > From: Hongtao Jia 
> >
> > Update #thermal-sensor-cells from 0 to 1 according to the new binding.
> > The sensor specifier added is the monitoring site ID, and represents
> > the "n" in TRITSRn and TRATSRn.
> >
> > Signed-off-by: Jia Hongtao 
> 
> Where can I find this new binding?  As of the current linux-next I don't see
> anything in qoriq-thermal.txt about this.

Hi Rui Zhang,

As we discussed before, the timing was inappropriate because the merge window
was about to close.
Do you have any plan to apply the binding file soon?

-Hongtao.



Re: [PATCH] PCI: Add information about describing PCI in ACPI

2016-11-22 Thread Ard Biesheuvel
On 23 November 2016 at 01:06, Bjorn Helgaas  wrote:
> On Tue, Nov 22, 2016 at 10:09:50AM +, Ard Biesheuvel wrote:
>> On 17 November 2016 at 17:59, Bjorn Helgaas  wrote:
>
>> > +PCI host bridges are PNP0A03 or PNP0A08 devices.  Their _CRS should
>> > +describe all the address space they consume.  In principle, this would
>> > +be all the windows they forward down to the PCI bus, as well as the
>> > +bridge registers themselves.  The bridge registers include things like
>> > +secondary/subordinate bus registers that determine the bus range below
>> > +the bridge, window registers that describe the apertures, etc.  These
>> > +are all device-specific, non-architected things, so the only way a
>> > +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
>> > +contain the device-specific details.  These bridge registers also
>> > +include ECAM space, since it is consumed by the bridge.
>> > +
>> > +ACPI defined a Producer/Consumer bit that was intended to distinguish
>> > +the bridge apertures from the bridge registers [4, 5].  However,
>> > +BIOSes didn't use that bit correctly, and the result is that OSes have
>> > +to assume that everything in a PCI host bridge _CRS is a window.  That
>> > +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
>> > +device itself.
>>
>> Is that universally true? Or is it still possible to do the right
>> thing here on new ACPI architectures such as arm64?
>
> That's a very good question.  I had thought that the ACPI spec had
> given up on Consumer/Producer completely, but I was wrong.  In the 6.0
> spec, the Consumer/Producer bit is still documented in the Extended
> Address Space Descriptor (sec 6.4.3.5.4).  It is documented as
> "ignored" in the QWord, DWord, and Word descriptors (sec 6.4.3.5.1,2,3).
>
> Linux looks at the producer_consumer bit in acpi_decode_space(), which
> I think is used for all these descriptors (QWord, DWord, Word, and
> Extended).  This doesn't quite follow the spec -- we probably should
> ignore it except for Extended.  In any event, acpi_decode_space() sets
> IORESOURCE_WINDOW for Producer descriptors, but we don't test
> IORESOURCE_WINDOW in the PCI host bridge code.
>
> x86 and ia64 supply their own pci_acpi_root_prepare_resources()
> functions that call acpi_pci_probe_root_resources(), which parses _CRS
> and looks at producer_consumer.  Then they do a little arch-specific
> stuff on the result.
>
> On arm64 we use acpi_pci_probe_root_resources() directly, with no
> arch-specific stuff.
>
> On all three arches, we ignore the Consumer/Producer bit, so all the
> resources are treated as Producers, i.e., as bridge windows.
>
> I think we *could* implement an arm64 version of
> pci_acpi_root_prepare_resources() that would pay attention to the
> Consumer/Producer bit by checking IORESOURCE_WINDOW.  To be spec
> compliant, we would have to use Extended descriptors for all bridge
> windows, even if they would fit in a DWord or QWord.
>
> Should we do that?  I dunno.  I'd like to hear your opinion(s).
>

Yes, I think we should. If the spec allows for a way for a PNP0A03
device to describe all of its resources unambiguously, we should not
be relying on workarounds that were designed for another architecture
in another decade (for, presumably, another OS).

Just for my understanding, we will need to use extended descriptors
for all consumed *and* produced regions, even though dword/qword are
implicitly produced-only, due to the fact that the bit is ignored?
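
To make the Producer/Consumer distinction concrete, here is a minimal
userspace sketch of what "paying attention to IORESOURCE_WINDOW" could
look like. Everything prefixed sketch_/SKETCH_ is hypothetical: the
struct is a simplified stand-in for a decoded _CRS entry, and the flag
is defined locally so the example builds outside the kernel. It is not
the host bridge code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Defined locally for the sketch; stands in for the kernel's
 * IORESOURCE_WINDOW flag ("forwarded by bridge"). */
#define SKETCH_IORESOURCE_WINDOW 0x00200000UL

/* Hypothetical, simplified stand-in for one decoded _CRS entry. */
struct sketch_resource {
	const char *name;
	unsigned long long start;
	unsigned long long end;
	unsigned long flags;
};

/*
 * Split a _CRS-like list into bridge windows (Producer) and ranges the
 * bridge consumes itself (Consumer), looking only at the window flag.
 */
static void sketch_classify(const struct sketch_resource *res, size_t n,
			    size_t *windows, size_t *consumed)
{
	size_t i;

	assert(res != NULL && windows != NULL && consumed != NULL);
	*windows = 0;
	*consumed = 0;
	for (i = 0; i < n; i++) {
		if (res[i].flags & SKETCH_IORESOURCE_WINDOW)
			(*windows)++;	/* forwarded down to the PCI bus */
		else
			(*consumed)++;	/* e.g. bridge registers, ECAM */
	}
}
```

With a list holding two flagged apertures and one unflagged register
range, sketch_classify() reports two windows and one consumed range,
which is the split that the current "everything is a window"
assumption cannot express.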

> It *would* be nice to have bridge registers in the bridge _CRS.  That
> would eliminate the need for looking up the HISI0081/PNP0C02 devices
> to find the bridge registers.  Avoiding that lookup is only a
> temporary advantage -- the next round of bridges are supposed to fully
> implement ECAM, and then we won't need to know where the registers
> are.
>
> Apart from the lookup, there's still some advantage in describing the
> registers in the PNP0A03 device instead of an unrelated PNP0C02
> device, because it makes /proc/iomem more accurate and potentially
> makes host bridge hotplug cleaner.  We would have to enhance the host
> bridge driver to do the reservations currently done by pnp/system.c.
>
> There's some value in doing it the same way as on x86, even though
> that way is somewhat broken.
>
> Whatever we decide, I think it's very important to get it figured out
> ASAP because it affects the ECAM quirks that we're trying to merge in
> v4.10.
>

I agree. What exactly is the impact for the quirks mechanism as proposed?


[PATCH] trace: Add documentation for mono and mono_raw trace clocks

2016-11-22 Thread Joel Fernandes
mono and mono_raw trace clocks access CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW
clocks for tracing purposes. Add documentation for the same.

Signed-off-by: Joel Fernandes 
---
Steven,
I skipped adding docs for boot clock as that patch is still being discussed,
but please accept documentation for the other clocks you said were missing.
I will follow with the boot clock documentation at a later time. Thanks.

 Documentation/trace/ftrace.txt | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 185c39f..32cc1ee 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -362,6 +362,14 @@ of ftrace. Here is a list of some of the key files:
  to correlate events across hypervisor/guest if
  tb_offset is known.
 
+ mono: This uses the fast monotonic clock (CLOCK_MONOTONIC)
+   which is monotonic and is subject to NTP rate adjustments.
+
+ mono_raw:
+   This is the raw monotonic clock (CLOCK_MONOTONIC_RAW)
+   which is monotonic but is not subject to any rate adjustments
+   and ticks at the same rate as the hardware clocksource.
+
To set a clock, simply echo the clock name into this file.
 
  echo global > trace_clock
-- 
2.8.0.rc3.226.g39d4020
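
For readers who want to see the two underlying clocks from userspace,
the following is a small sketch (Linux-only, not part of the patch)
that reads CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW with
clock_gettime(). The sketch_ helper names are made up for this
example; it says nothing about how the tracer itself reads the clocks.

```c
#define _GNU_SOURCE	/* make sure CLOCK_MONOTONIC_RAW is visible */
#include <assert.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static long long sketch_ts_to_ns(const struct timespec *ts)
{
	return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Read one of the two clocks discussed above.
 * Returns the reading in nanoseconds, or -1 if clock_gettime() fails.
 */
static long long sketch_read_ns(clockid_t id)
{
	struct timespec ts;

	assert(id == CLOCK_MONOTONIC || id == CLOCK_MONOTONIC_RAW);
	if (clock_gettime(id, &ts) != 0)
		return -1;
	return sketch_ts_to_ns(&ts);
}
```

Two successive reads of either clock never go backwards. Over long
intervals the two may drift apart, since only CLOCK_MONOTONIC follows
NTP rate adjustments.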



Re: [LKP] [net] 34fad54c25: kernel BUG at include/linux/skbuff.h:1935!

2016-11-22 Thread Linus Torvalds
On Tue, Nov 22, 2016 at 10:44 PM, Fengguang Wu  wrote:
>
> On Tue, Nov 22, 2016 at 02:04:42PM -0800, Linus Torvalds wrote:
>
>> I also noticed that the kernel test robot had screwed up the
>> participants list for some reason, and had
>>
>>  "Acked-by: Alexander Duyck , David S.
>> Miller" 
>>
>> as one of the participants. So there's some odd commit parsing issue
>> there somewhere. But Alexander seems to have seen this report despite
>> that, it just never went anywhere that I can tell.
>
>
> Yeah the robot will CC all "Acked-by" people in the bug reports.
>
> Shall we limit it to the below TO/CC list?

No. We do want to keep the Acked-by's on the cc.

But you missed the real problem.

It *didn't* cc the acked-by. Look closer. What happened was that it cc'd this:

 "Acked-by: Alexander Duyck , David S. Miller"

 

ie there is only _one_ email address (that of da...@davemloft.net),
and the whole "Acked-by: Alexander Duyck <...>" part is quoted as the
_name_ of that email address.

At least that's what the headers look like for me in the original report:

   From: kernel test robot 
   To: Eric Dumazet 
   Cc: l...@01.org, Linus Torvalds ,
LKML , Alexei Starovoitov
, Willem de Bruijn , "Acked-by:
Alexander Duyck , David S. Miller"


Notice the quoting of that last "name".
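
As an illustration of why the quoting matters, here is a rough sketch
of how a mail header parser counts recipients: commas separate
addresses, except inside a double-quoted display name. The function
name is hypothetical and the parsing is deliberately simplified (no
RFC 5322 comments or groups); it is not the robot's actual code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the addresses in a To:/Cc: header value. A comma ends an
 * entry only when it is outside double quotes; a backslash inside
 * quotes escapes the next character.
 */
static size_t sketch_count_addresses(const char *value)
{
	size_t count = 0;
	int in_quotes = 0;
	int seen_text = 0;
	const char *p;

	assert(value != NULL);
	for (p = value; *p; p++) {
		if (in_quotes && *p == '\\' && p[1]) {
			p++;		/* skip the escaped character */
			continue;
		}
		if (*p == '"') {
			in_quotes = !in_quotes;
			seen_text = 1;
		} else if (*p == ',' && !in_quotes) {
			if (seen_text)
				count++;	/* entry ended at this comma */
			seen_text = 0;
		} else if (*p != ' ' && *p != '\t' &&
			   *p != '\n' && *p != '\r') {
			seen_text = 1;
		}
	}
	if (seen_text)
		count++;		/* last entry has no trailing comma */
	return count;
}
```

Fed two plain "Name <addr>" entries it returns 2, but once both are
wrapped in a single pair of quotes in front of one address it returns
1: the whole quoted string is just the display name of that one
address.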

  Linus


Re: [PATCH V3 1/2] powerpc/mpc85xx: Update TMU device tree node for T1040/T1042

2016-11-22 Thread Scott Wood
On Tue, 2016-10-25 at 10:15 +0800, Jia Hongtao wrote:
> From: Hongtao Jia 
> 
> Update #thermal-sensor-cells from 0 to 1 according to the new binding. The
> sensor specifier added is the monitoring site ID, and represents the "n" in
> TRITSRn and TRATSRn.
> 
> Signed-off-by: Jia Hongtao 

Where can I find this new binding?  As of the current linux-next I don't see
anything in qoriq-thermal.txt about this.

-Scott



Re: [PATCH] HID: lg: fix noderef.cocci warnings

2016-11-22 Thread Julia Lawall


On Wed, 23 Nov 2016, Fengguang Wu wrote:

> On Tue, Nov 22, 2016 at 11:44:34AM +0100, Jiri Kosina wrote:
> > On Mon, 21 Nov 2016, Benjamin Tissoires wrote:
> >
> > > > Generated by: scripts/coccinelle/misc/noderef.cocci
> > > >
> > > > CC: Benjamin Tissoires 
> > > > Signed-off-by: Fengguang Wu 
> > > > ---
> > > >
> > > >  hid-lg.c |6 --
> > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > --- a/drivers/hid/hid-lg.c
> > > > +++ b/drivers/hid/hid-lg.c
> > > > @@ -777,8 +777,10 @@ static int lg_probe(struct hid_device *h
> > > > buf[1] = 0xB2;
> > > > get_random_bytes(&buf[2], 2);
> > > >
> > > > -   ret = hid_hw_raw_request(hdev, buf[0], buf,
> > > sizeof(buf),
> > > > -   HID_FEATURE_REPORT,
> > > HID_REQ_SET_REPORT);
> > > > +   ret = hid_hw_raw_request(hdev, buf[0], buf,
> > > > +sizeof(*buf),
> > >
> > > This is wrong. I messed up and should have used "sizeof(cbuf)", but the
> > > coccinelle script failed at detecting the correct solution (I guess it
> > > couldn't).
> >
> > Fengguang, is there anything that could be done to improve this?
>
> CC Julia and Gilles. I'm not sure if the coccinelle script could be
> made that smart. :)

Thanks for forwarding.  From looking at the code snippet, I have the
impression that if it were possible, it would require an interprocedural
analysis, and the cost would outweigh the benefit.  Basically, I don't see
any cbuf nearby.
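
For reference, the three sizeof variants under discussion differ as
in the following userspace sketch. The array contents and the
sketch_ names are placeholders chosen for the example, and malloc()
stands in for whatever allocation the driver uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical report template; the bytes are placeholders. */
static const unsigned char sketch_cbuf[] = {
	0x00, 0xAF, 0x01, 0x00, 0x00, 0x00, 0x00
};

/*
 * Duplicate the template on the heap, as a driver might do before
 * passing the buffer on. Returns NULL on allocation failure; on
 * success *len holds the number of bytes in the buffer.
 */
static unsigned char *sketch_dup_report(size_t *len)
{
	unsigned char *buf;

	assert(len != NULL);
	buf = malloc(sizeof(sketch_cbuf));
	if (!buf)
		return NULL;
	memcpy(buf, sketch_cbuf, sizeof(sketch_cbuf));

	/*
	 * sizeof(buf)          size of the pointer (what noderef.cocci flags)
	 * sizeof(*buf)         1, a single unsigned char
	 * sizeof(sketch_cbuf)  7, the whole array: the intended length
	 */
	*len = sizeof(sketch_cbuf);
	return buf;
}
```

Only the size of the original array gives the intended length, and
that array is not visible from the pointer alone.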

julia


> > > Jiri, do you want me to send a v2 of the series or will you just amend
> > > the patch while applying?
> >
> > I'll fix that up, no worries. Thanks,
> >
> > --
> > Jiri Kosina
> > SUSE Labs
>


[RFC 0/2] GFP_NOFAIL cleanups

2016-11-22 Thread Michal Hocko
Hi,
Tetsuo has noticed [1] that recent changes have changed the GFP_NOFAIL
semantics for costly order requests. I believe that the primary reason
why this happened is that our GFP_NOFAIL checks are too scattered
and it is really easy to forget about adding one. That's why I am
proposing patch 1, which consolidates all the nofail handling in a single
place. This should help to make this code more maintainable.

Patch 2 on top is a further attempt to make the GFP_NOFAIL semantics less
surprising. As things stand currently, GFP_NOFAIL overrides the oom killer
prevention code, which is both subtle and not really needed. Patch 2
has more details about the issues this might cause.

I would consider both patches more of a cleanup than anything else. Any
feedback is highly appreciated.

[1] 
http://lkml.kernel.org/r/1479387004-5998-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp



[RFC 2/2] mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically

2016-11-22 Thread Michal Hocko
From: Michal Hocko 

__alloc_pages_may_oom makes sure to skip the OOM killer depending on
the allocation request. This includes lowmem requests, costly high
order requests and others. For a long time __GFP_NOFAIL acted as an
override for all those rules. This is not documented and it can be quite
surprising as well. E.g. GFP_NOFS requests do not invoke the OOM
killer but GFP_NOFS|__GFP_NOFAIL requests do, so if we try to convert
some of the existing open coded loops around the allocator to nofail
requests (and we have done that in the past) then such a change would
have a non-trivial side effect which is not obvious. Note that the
primary motivation for skipping the OOM killer is to prevent premature
invocation.

The exception has been added by 82553a937f12 ("oom: invoke oom killer
for __GFP_NOFAIL"). The changelog points out that the oom killer has to
be invoked, otherwise the request would be looping forever. But this
argument is rather weak because the OOM killer doesn't really guarantee
any forward progress for those exceptional cases:

- it will hardly help to form a costly order page. I believe we
  certainly do not want to kill all processes and eventually panic the
  system just because there is a nasty driver asking for an order-9 page
  with GFP_NOFAIL without realizing all the consequences. It is much
  better for this request to loop forever than to cause a massive
  system disruption.
- lowmem is also highly unlikely to be freed during OOM killer
- a GFP_NOFS request could trigger while there is still a lot of
  memory pinned by filesystems.

This patch simply removes the __GFP_NOFAIL special case in order to have
clearer semantics without surprising side effects. Instead we do
allow nofail requests to access memory reserves to move forward in both
cases: when the OOM killer is invoked and when it should be suppressed.
The __alloc_pages_nowmark helper has been introduced for that purpose.

Signed-off-by: Michal Hocko 
---
 mm/oom_kill.c   |  2 +-
 mm/page_alloc.c | 95 +++--
 2 files changed, 59 insertions(+), 38 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ec9f11d4f094..12a6fce85f61 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1013,7 +1013,7 @@ bool out_of_memory(struct oom_control *oc)
 * make sure exclude 0 mask - all other users should have at least
 * ___GFP_DIRECT_RECLAIM to get here.
 */
-   if (oc->gfp_mask && !(oc->gfp_mask & (__GFP_FS|__GFP_NOFAIL)))
+   if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS))
return true;
 
/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 76c0b6bb0baf..7102641147c4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3044,6 +3044,25 @@ void warn_alloc(gfp_t gfp_mask, const char *fmt, ...)
 }
 
 static inline struct page *
+__alloc_pages_nowmark(gfp_t gfp_mask, unsigned int order,
+   const struct alloc_context *ac)
+{
+   struct page *page;
+
+   page = get_page_from_freelist(gfp_mask, order,
+   ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
+   /*
+* fallback to ignore cpuset restriction if our nodes
+* are depleted
+*/
+   if (!page)
+   page = get_page_from_freelist(gfp_mask, order,
+   ALLOC_NO_WATERMARKS, ac);
+
+   return page;
+}
+
+static inline struct page *
 __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
const struct alloc_context *ac, unsigned long *did_some_progress)
 {
@@ -3078,47 +3097,41 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int 
order,
if (page)
goto out;
 
-   if (!(gfp_mask & __GFP_NOFAIL)) {
-   /* Coredumps can quickly deplete all memory reserves */
-   if (current->flags & PF_DUMPCORE)
-   goto out;
-   /* The OOM killer will not help higher order allocs */
-   if (order > PAGE_ALLOC_COSTLY_ORDER)
-   goto out;
-   /* The OOM killer does not needlessly kill tasks for lowmem */
-   if (ac->high_zoneidx < ZONE_NORMAL)
-   goto out;
-   if (pm_suspended_storage())
-   goto out;
-   /*
-* XXX: GFP_NOFS allocations should rather fail than rely on
-* other request to make a forward progress.
-* We are in an unfortunate situation where out_of_memory cannot
-* do much for this context but let's try it to at least get
-* access to memory reserved if the current task is killed (see
-* out_of_memory). Once filesystems are ready to handle 
allocation
-* failures more gracefully we should just bail out here.
-*/
+   /* Coredumps can quickly deplete all memory reserves */
+   if (current->flags & PF_DUMPCORE)
+   

Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-22 Thread Michal Hocko
On Wed 23-11-16 14:53:12, Hillf Danton wrote:
> On Wednesday, November 23, 2016 2:34 PM Michal Hocko wrote:
> > @@ -3161,6 +3161,16 @@ should_compact_retry(struct alloc_context *ac, 
> > unsigned int order, int alloc_fla
> > if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
> > return false;
> > 
> > +#ifdef CONFIG_COMPACTION
> > +   /*
> > +* This is a gross workaround to compensate a lack of reliable 
> > compaction
> > +* operation. We cannot simply go OOM with the current state of the 
> > compaction
> > +* code because this can lead to pre mature OOM declaration.
> > +*/
> > +   if (order <= PAGE_ALLOC_COSTLY_ORDER)
> 
> No need to check order once more.

Yes, a simple return true would be sufficient, but I wanted the code to be
more obvious.

> Plus can we retry without CONFIG_COMPACTION enabled?

Yes, checking the order-0 watermark was the original implementation of
the high order retry without compaction enabled. I do not remember any
reports for that so I didn't want to touch that path.
-- 
Michal Hocko
SUSE Labs


linux-next: Tree for Nov 23

2016-11-22 Thread Stephen Rothwell
Hi all,

Changes since 20161122:

The clk tree gained conflicts against the arm-soc tree (resolved today
with help).

The md tree gained a conflict against the block tree.

The kvm-ppc-paulus tree gained a conflict against the powerpc-fixes tree.

The kvms390 tree gained a conflict against the s390 tree.

The userns tree lost its complex conflicts against Linus' tree.

Non-merge commits (relative to Linus' tree): 7367
 7572 files changed, 446863 insertions(+), 165009 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(with KALLSYMS_EXTRA_PASS=1) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 244 trees (counting Linus' and 34 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (23400ac99706 Merge branch 'for-rc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux)
Merging fixes/master (30066ce675d3 Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (c6a385539175 kbuild: Steal gcc's pie from the 
very beginning)
Merging arc-current/for-curr (a25f0944ba9b Linux 4.9-rc5)
Merging arm-current/fixes (575f6e7a97ab ARM: 8630/1: don't use assembler macro 
arguments with EXPORT_SYMBOL())
Merging m68k-current/for-linus (7e251bb21ae0 m68k: Fix ndelay() macro)
Merging metag-fixes/fixes (35d04077ad96 metag: Only define 
atomic_dec_if_positive conditionally)
Merging powerpc-fixes/fixes (9e5f68842276 powerpc: Fix missing CRCs, add more 
asm-prototypes.h declarations)
Merging sparc/master (9dd35d6882a1 sparc: drop duplicate header scatterlist.h)
Merging net/master (c9b8af133019 flow_dissect: call 
init_default_flow_dissectors() earlier)
Merging ipsec/master (330e832abda9 xfrm: unbreak xfrm_sk_policy_lookup)
Merging netfilter/master (9b6c14d51bd2 net: tcp response should set oif only if 
it is L3 master)
Merging ipvs/master (9b6c14d51bd2 net: tcp response should set oif only if it 
is L3 master)
Merging wireless-drivers/master (fcd2042e8d36 mwifiex: printk() overflow with 
32-byte SSIDs)
Merging mac80211/master (51b9a31c42ed tipc: eliminate obsolete socket locking 
policy description)
Merging sound-current/for-linus (6ff1a25318eb ALSA: usb-audio: Fix 
use-after-free of usb_device at disconnect)
Merging pci-current/for-linus (9f46107b8ce4 PCI: designware-plat: Update author 
email)
Merging driver-core.current/driver-core-linus (a25f0944ba9b Linux 4.9-rc5)
Merging tty.current/tty-linus (a909d3e63699 Linux 4.9-rc3)
Merging usb.current/usb-linus (c0da038d7afe Merge tag 'usb-serial-4.9-rc6' of 
git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus)
Merging usb-gadget-fixes/fixes (05e78c6933d6 usb: gadget: f_fs: fix wrong 
parenthesis in ffs_func_req_match())
Merging usb-serial-fixes/usb-linus (2ab13292d7a3 USB: serial: cp210x: add ID 
for the Zone DPMX)
Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move 
the lock initialization to core file)
Merging phy/fixes (4320f9d4c183 phy: sun4i: check PMU presence when poking 
unknown bit of pmu)
Merging staging.current/staging-linus (a25f0944ba9b Linux 4.9-rc5)
Merging char-misc.current/char-misc-linus (a25f0944ba9b Linux 4.9-rc5)
Merging input-current/for-linus (e9fb7cc63801 Input: psmouse - disable 
automatic probing of BYD touchpads)
Merging crypto-current/master (c8467f7a3620 crypto: scatterwalk - Remove 
unnecessary aliasing check in map_and_copy)
Merging ide/master

Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-22 Thread Hillf Danton
On Wednesday, November 23, 2016 2:34 PM Michal Hocko wrote:
> @@ -3161,6 +3161,16 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
>   if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
>           return false;
> 
> +#ifdef CONFIG_COMPACTION
> + /*
> +  * This is a gross workaround to compensate a lack of reliable compaction
> +  * operation. We cannot simply go OOM with the current state of the compaction
> +  * code because this can lead to pre mature OOM declaration.
> +  */
> + if (order <= PAGE_ALLOC_COSTLY_ORDER)

No need to check order once more.
Plus can we retry without CONFIG_COMPACTION enabled?

> + return true;
> +#endif
> +
>   /*
>* There are setups with compaction disabled which would prefer to loop
>* inside the allocator rather than hit the oom killer prematurely.
> --
> Michal Hocko
> SUSE Labs
> 
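
The first remark above (the order check is redundant) can be verified mechanically. What follows is only a minimal sketch, assuming PAGE_ALLOC_COSTLY_ORDER is 3 as defined in include/linux/mmzone.h; the helper names are hypothetical and model just the two order tests in the hunk, not the rest of should_compact_retry().

```c
#include <stdbool.h>

/* Assumed value; the kernel defines this in include/linux/mmzone.h. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical model of the early "return false" at the top of the hunk. */
static bool early_return_taken(unsigned int order)
{
	return !order || order > PAGE_ALLOC_COSTLY_ORDER;
}

/* Hypothetical model of the condition added by the patch. */
static bool added_check(unsigned int order)
{
	return order <= PAGE_ALLOC_COSTLY_ORDER;
}

/*
 * Returns true if every order in [0, max_order] that gets past the
 * early return also satisfies the added check, i.e. the added check
 * can never be false at the point where it is evaluated.
 */
static bool added_check_is_redundant(unsigned int max_order)
{
	for (unsigned int order = 0; order <= max_order; order++) {
		if (!early_return_taken(order) && !added_check(order))
			return false;
	}
	return true;
}
```

Under these assumptions only orders 1 to 3 reach the added test, and all of them pass it, so a plain `return true;` would behave the same.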



Re: [LKP] [net] 34fad54c25: kernel BUG at include/linux/skbuff.h:1935!

2016-11-22 Thread Fengguang Wu

Hi Linus,

On Tue, Nov 22, 2016 at 02:04:42PM -0800, Linus Torvalds wrote:
[snip]


> I also noticed that the kernel test robot had screwed up the
> participants list for some reason, and had
>
>  "Acked-by: Alexander Duyck , David S.
> Miller" 
>
> as one of the participants. So there's some odd commit parsing issue
> there somewhere. But Alexander seems to have seen this report despite
> that, it just never went anywhere that I can tell.


Yeah the robot will CC all "Acked-by" people in the bug reports.

Shall we limit it to the below TO/CC list?

   TO: author
   CC: committer (maintainer)
   CC: all Signed-off-by
   CC: all Reviewed-by
   CC: mailing lists, if the bug is found in a maintainer/well known tree
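
The proposed policy amounts to a small lookup from commit header or trailer to recipient class. The following is a hypothetical sketch only, not the robot's actual code: the enum, the function name, and the use of the "Author:" and "Commit:" labels (as printed by `git log --format=fuller`) are assumptions made for illustration. The mailing-list rule is left out because it depends on the tree, not on the commit text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical recipient classes for a bug-report mail. */
enum rcpt { RCPT_NONE, RCPT_TO, RCPT_CC };

/*
 * Classify one line of commit metadata under the proposed policy:
 * author goes to TO, committer/Signed-off-by/Reviewed-by go to CC,
 * everything else (including Acked-by) is not mailed.
 */
static enum rcpt classify_line(const char *line)
{
	static const struct {
		const char *prefix;
		enum rcpt kind;
	} rules[] = {
		{ "Author:",        RCPT_TO },
		{ "Commit:",        RCPT_CC },
		{ "Signed-off-by:", RCPT_CC },
		{ "Reviewed-by:",   RCPT_CC },
	};

	/* Trailers in "git log" output are indented; skip that. */
	while (*line == ' ' || *line == '\t')
		line++;

	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		if (strncmp(line, rules[i].prefix, strlen(rules[i].prefix)) == 0)
			return rules[i].kind;
	}
	return RCPT_NONE;
}
```

Matching on the full tag including the colon keeps a line such as "AuthorDate:" from being mistaken for "Author:".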

Regards,
Fengguang


On Tue, Nov 15, 2016 at 1:20 PM, kernel test robot
 wrote:


FYI, we noticed the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit 34fad54c2537f7c99d07375e50cb30aa3c23bd83 ("net: __skb_flow_dissect() must cap 
its return value")

in testcase: pbzip2
with following parameters:

nr_threads: 25%
blocksize: 900K
cpufreq_governor: performance



on test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz 
with 64G memory

caused below changes:


+------------------------------------------------------------------+------------+------------+
|                                                                  | 79774d6bfa | 34fad54c25 |
+------------------------------------------------------------------+------------+------------+
| boot_successes                                                   | 0          | 2          |
| boot_failures                                                    | 2          | 20         |
| invoked_oom-killer:gfp_mask=0x                                   | 2          | 2          |
| Mem-Info                                                         | 2          | 2          |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 2          | 2          |
| kernel_BUG_at_include/linux/skbuff.h                             | 0          | 16         |
| invalid_opcode:#[##]SMP                                          | 0          | 16         |
| RIP:eth_type_trans                                               | 0          | 16         |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt            | 0          | 15         |
| calltrace:hub_event                                              | 0          | 1          |
| WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup                        | 0          | 2          |
| calltrace:parport_pc_init                                        | 0          | 2          |
| calltrace:SyS_finit_module                                       | 0          | 2          |
| WARNING:at_lib/kobject.c:#kobject_add_internal                   | 0          | 2          |
+------------------------------------------------------------------+------------+------------+



[   19.375251] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[   19.388892] Sending DHCP requests .
[   19.388892] [ cut here ]
[   19.388894] kernel BUG at include/linux/skbuff.h:1935!
[   19.388895] invalid opcode:  [#1] SMP
[   19.388896] Modules linked in:
[   19.388897] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.9.0-rc3-00320-g34fad54 #1
[   19.388898] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS 
SE5C600.86B.02.02.0002.122320131210 12/23/2013
[   19.388899] task: 81e0e4c0 task.stack: 81e0
[   19.388904] RIP: 0010:[]  [] 
eth_type_trans+0xe8/0x140
[   19.388904] RSP: :88081e803db8  EFLAGS: 00010297
[   19.388905] RAX: 0152 RBX: 88080221f200 RCX: 1073
[   19.388905] RDX: 8808013afdc0 RSI: 880801114000 RDI: 880819407c00
[   19.388906] RBP: 88081e803e20 R08: 880801114000 R09: 0800
[   19.388907] R10: 8808013afec0 R11: ea003fd5a880 R12: 880819407c00
[   19.388907] R13: 881033408000 R14: c9000843e000 R15: 0158
[   19.388908] FS:  () GS:88081e80() 
knlGS:
[   19.388909] CS:  0010 DS:  ES:  CR0: 80050033
[   19.388910] CR2: 88103000 CR3: 01e07000 CR4: 001406f0
[   19.388910] Stack:
[   19.388912]  816905a7 ea003fd5a880 ea08 
88080221f050
[   19.388913]  88080221f000 00400160 ea003fd5a880 

[   19.388915]  0040  88080221f050 
88100d216000
[   19.388915] Call Trace:
[   19.388919]  
[   19.388919]  [] ? igb_clean_rx_irq+0x6a7/0x7d0
[   19.388921]  [] igb_poll+0x382/0x700
[   19.388922]  [] ? igb_poll+0x397/0x700
[   19.388925]  [] net_rx_action+0x217/0x360
[   19.388928]  [] __do_softirq+0x104/0x2ab
[   19.388931]  [] irq_exit+0xf1/0x100
[   19.388932]  [] do_IRQ+0x54/0xd0
[   19.388935]  [] 

Re: [HMM v13 08/18] mm/hmm: heterogeneous memory management (HMM for short)

2016-11-22 Thread Anshuman Khandual
On 11/18/2016 11:48 PM, Jérôme Glisse wrote:
> HMM provides 3 separate functionality :
> - Mirroring: synchronize CPU page table and device page table
> - Device memory: allocating struct page for device memory
> - Migration: migrating regular memory to device memory
> 
> This patch introduces some common helpers and definitions to all of
> those 3 functionality.
> 
> Signed-off-by: Jérôme Glisse 
> Signed-off-by: Jatin Kumar 
> Signed-off-by: John Hubbard 
> Signed-off-by: Mark Hairgrove 
> Signed-off-by: Sherry Cheung 
> Signed-off-by: Subhash Gutti 
> ---
>  MAINTAINERS  |   7 +++
>  include/linux/hmm.h  | 139 
> +++
>  include/linux/mm_types.h |   5 ++
>  kernel/fork.c|   2 +
>  mm/Kconfig   |  11 
>  mm/Makefile  |   1 +
>  mm/hmm.c |  86 +
>  7 files changed, 251 insertions(+)
>  create mode 100644 include/linux/hmm.h
>  create mode 100644 mm/hmm.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f593300..41cd63d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5582,6 +5582,13 @@ S: Supported
>  F:   drivers/scsi/hisi_sas/
>  F:   Documentation/devicetree/bindings/scsi/hisilicon-sas.txt
>  
> +HMM - Heterogeneous Memory Management
> +M:   Jérôme Glisse 
> +L:   linux...@kvack.org
> +S:   Maintained
> +F:   mm/hmm*
> +F:   include/linux/hmm*
> +
>  HOST AP DRIVER
>  M:   Jouni Malinen 
>  L:   hos...@shmoo.com (subscribers-only)
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> new file mode 100644
> index 000..54dd529
> --- /dev/null
> +++ b/include/linux/hmm.h
> @@ -0,0 +1,139 @@
> +/*
> + * Copyright 2013 Red Hat Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * Authors: Jérôme Glisse 
> + */
> +/*
> + * HMM provides 3 separate functionality :
> + *   - Mirroring: synchronize CPU page table and device page table
> + *   - Device memory: allocating struct page for device memory
> + *   - Migration: migrating regular memory to device memory
> + *
> + * Each can be use independently from the others.

Small nit s/use/used/

> + *
> + *
> + * Mirroring:
> + *
> + * HMM provide helpers to mirror process address space on a device. For this it
> + * provides several helpers to order device page table update in respect to CPU
> + * page table update. Requirement is that for any given virtual address the CPU
> + * and device page table can not point to different physical page. It uses the
> + * mmu_notifier API and introduce virtual address range lock which block CPU
> + * page table update for a range while the device page table is being updated.
> + * Usage pattern is:
> + *
> + *  hmm_vma_range_lock(vma, start, end);
> + *  // snap shot CPU page table
> + *  // update device page table from snapshot
> + *  hmm_vma_range_unlock(vma, start, end);

This code block can be explained better in more detail.
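
As one possible reading of the pattern, here is a single-threaded toy model in userspace C. It is not the HMM API: every name below (toy_mirror, toy_range_lock, and so on) is hypothetical, the page tables are plain arrays, and a conflicting CPU update is refused instead of sleeping until the range is unlocked as the quoted comment describes.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical toy model: one entry per virtual page. */
struct toy_mirror {
	unsigned long cpu_pt[NPAGES];	/* CPU page table: vpage -> physical page */
	unsigned long dev_pt[NPAGES];	/* device page table mirroring cpu_pt */
	bool locked;			/* is a range lock currently held? */
	size_t lock_start, lock_end;	/* locked range, [start, end) */
};

/* Take the range lock; fails on a bad range or if a lock is already held. */
static bool toy_range_lock(struct toy_mirror *m, size_t start, size_t end)
{
	if (m->locked || start >= end || end > NPAGES)
		return false;
	m->locked = true;
	m->lock_start = start;
	m->lock_end = end;
	return true;
}

static void toy_range_unlock(struct toy_mirror *m)
{
	m->locked = false;
}

/*
 * CPU-side page table update. Returns false if it conflicts with the
 * locked range (the real code would wait for the unlock instead).
 */
static bool toy_cpu_update(struct toy_mirror *m, size_t vpage, unsigned long phys)
{
	if (vpage >= NPAGES)
		return false;
	if (m->locked && vpage >= m->lock_start && vpage < m->lock_end)
		return false;
	m->cpu_pt[vpage] = phys;
	return true;
}

/* The usage pattern: lock, snapshot CPU table, update device table, unlock. */
static bool toy_mirror_range(struct toy_mirror *m, size_t start, size_t end)
{
	unsigned long snapshot[NPAGES];
	size_t i;

	if (!toy_range_lock(m, start, end))
		return false;
	for (i = start; i < end; i++)		/* snapshot CPU page table */
		snapshot[i] = m->cpu_pt[i];
	for (i = start; i < end; i++)		/* update device page table */
		m->dev_pt[i] = snapshot[i];
	toy_range_unlock(m);
	return true;
}
```

Because CPU updates inside the locked range cannot proceed between the snapshot and the device update, the two tables cannot end up pointing at different physical pages for that range, which is the requirement stated in the quoted comment.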

> + *
> + * Any CPU page table update that conflict with a range lock will wait until
> + * range is unlock. This garanty proper serialization of CPU and device page
> + * table update.
> + *

Small typo in here  

> + *
> + * Device memory:
> + *
> + * HMM provides helpers to help leverage device memory either addressable like
> + * regular memory by the CPU or un-addressable at all. In both case the device
> + * memory is associated to dedicated structs page (which are allocated like for
> + * hotplug memory). Device memory management is under the responsability of the

Typo in here   

> + * device driver. HMM only allocate and initialize the struct pages associated
> + * with the device memory.

We should also mention that it is hot-plugged into the kernel as ZONE_DEVICE
based memory.

> + *
> + * Allocating struct page for device memory allow to use device memory allmost
> + * like any regular memory. Unlike regular memory it can not be added to the
> + * lru, nor can any memory allocation can use device memory directly. Device
> + * memory will only end up to be use in a process if device driver migrate some
> + * of the process memory from regular memory to device memory.
> + *
> + *
> + * Migration:
> + *
> + * Existing memory migration mechanism (mm/migrate.c) does not allow to use
> + * something else than the CPU to copy from source to destination memory. 
> More
> + * over existing code is not tailor 

Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-22 Thread Michal Hocko
On Tue 22-11-16 11:38:47, Linus Torvalds wrote:
> On Tue, Nov 22, 2016 at 8:14 AM, Vlastimil Babka  wrote:
> >
> > Thanks a lot for the testing. So what do we do now about 4.8? (4.7 is
> > already EOL AFAICS).
> >
> > - send the patch [1] as 4.8-only stable.
> 
> I think that's the right thing to do. It's pretty small, and the
> argument that it changes the oom logic too much is pretty bogus, I
> think. The oom logic in 4.8 is simply broken. Let's get it fixed.
> Changing it is the point.

The point I've tried to make is that it is not should_reclaim_retry
which is broken. It's an overly optimistic reliance on the compaction
to do its work which led to all those issues. My previous fix
31e49bfda184 ("mm, oom: protect !costly allocations some more for
!CONFIG_COMPACTION") tried to cope with that by checking the order-0
watermark, which has proven to help most users. Obviously it didn't
cover everybody. Rather than fiddling with fine tuning of these
heuristics I think it would be safer to simply admit that high order
OOM detection doesn't work in the 4.8 kernel and so not invoke the OOM
killer for those requests at all. The risk of such a change is not big
because there usually are order-0 requests happening all the time, so if
we are really OOM we would trigger the OOM killer eventually.

So I am proposing this for the 4.8 stable tree instead:
---
commit b2ccdcb731b666aa28f86483656c39c5e53828c7
Author: Michal Hocko 
Date:   Wed Nov 23 07:26:30 2016 +0100

mm, oom: stop pre-mature high-order OOM killer invocations

31e49bfda184 ("mm, oom: protect !costly allocations some more for
!CONFIG_COMPACTION") was an attempt to reduce chances of pre-mature OOM
killer invocation for high order requests. It seemed to work for most
users just fine but it is far from bullet proof and obviously not
sufficient for Marc who has reported pre-mature OOM killer invocations
with 4.8 based kernels. 4.9 with all the compaction improvements seems
to be behaving much better but that would be too intrusive to backport
to 4.8 stable kernels. Instead this patch simply never declares OOM for
!costly high order requests. We rely on order-0 requests to do that in
case we are really out of memory. Order-0 requests are much more common
and so a risk of a livelock without any way forward is highly unlikely.

Reported-by: Marc MERLIN 
Signed-off-by: Michal Hocko 

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a2214c64ed3c..7401e996009a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3161,6 +3161,16 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
 	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
 		return false;
 
+#ifdef CONFIG_COMPACTION
+	/*
+	 * This is a gross workaround to compensate a lack of reliable compaction
+	 * operation. We cannot simply go OOM with the current state of the compaction
+	 * code because this can lead to pre mature OOM declaration.
+	 */
+	if (order <= PAGE_ALLOC_COSTLY_ORDER)
+		return true;
+#endif
+
 	/*
 	 * There are setups with compaction disabled which would prefer to loop
 	 * inside the allocator rather than hit the oom killer prematurely.
-- 
Michal Hocko
SUSE Labs


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-22 Thread Michal Hocko
On Tue 22-11-16 11:38:47, Linus Torvalds wrote:
> On Tue, Nov 22, 2016 at 8:14 AM, Vlastimil Babka  wrote:
> >
> > Thanks a lot for the testing. So what do we do now about 4.8? (4.7 is
> > already EOL AFAICS).
> >
> > - send the patch [1] as 4.8-only stable.
> 
> I think that's the right thing to do. It's pretty small, and the
> argument that it changes the oom logic too much is pretty bogus, I
> think. The oom logic in 4.8 is simply broken. Let's get it fixed.
> Changing it is the point.

The point I've tried to make is that it is not should_reclaim_retry
which is broken. It's an overly optimistic reliance on the compaction
to do it's work which led to all those issues. My previous fix
31e49bfda184 ("mm, oom: protect !costly allocations some more for
!CONFIG_COMPACTION") tried to cope with that by checking the order-0
watermark which has proven to help most users. Now it didn't cover
everybody obviously. Rather than fiddling with fine tuning of these
heuristics I think it would be safer to simply admit that high order
OOM detection doesn't work in 4.8 kernel and so do not declare the OOM
killer for those requests at all. The risk of such a change is not big
because there usually are order-0 requests happening all the time so if
we are really OOM we would trigger the OOM eventually.

So I am proposing this for 4.8 stable tree instead
---
commit b2ccdcb731b666aa28f86483656c39c5e53828c7
Author: Michal Hocko 
Date:   Wed Nov 23 07:26:30 2016 +0100

mm, oom: stop pre-mature high-order OOM killer invocations

31e49bfda184 ("mm, oom: protect !costly allocations some more for
!CONFIG_COMPACTION") was an attempt to reduce chances of pre-mature OOM
killer invocation for high order requests. It seemed to work for most
users just fine but it is far from bullet proof and obviously not
sufficient for Marc who has reported pre-mature OOM killer invocations
with 4.8 based kernels. 4.9 will all the compaction improvements seems
to be behaving much better but that would be too intrusive to backport
to 4.8 stable kernels. Instead this patch simply never declares OOM for
!costly high order requests. We rely on order-0 requests to do that in
case we are really out of memory. Order-0 requests are much more common
and so a risk of a livelock without any way forward is highly unlikely.

Reported-by: Marc MERLIN 
Signed-off-by: Michal Hocko 

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a2214c64ed3c..7401e996009a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3161,6 +3161,16 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
 	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
 		return false;
 
+#ifdef CONFIG_COMPACTION
+	/*
+	 * This is a gross workaround to compensate a lack of reliable compaction
+	 * operation. We cannot simply go OOM with the current state of the compaction
+	 * code because this can lead to pre mature OOM declaration.
+	 */
+	if (order <= PAGE_ALLOC_COSTLY_ORDER)
+		return true;
+#endif
+
 	/*
 	 * There are setups with compaction disabled which would prefer to loop
 	 * inside the allocator rather than hit the oom killer prematurely.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/8] CaitSith LSM module

2016-11-22 Thread Tetsuo Handa
Tetsuo Handa wrote:
> John Johansen wrote:
> > > In order to minimize the burden of reviewing, this patchset implements
> > > only functionality of checking program execution requests (i.e. execve()
> > > system call) using pathnames. I'm planning to add other functionalities
> > > after this version got included into mainline. You can find how future
> > > versions of CaitSith will look like at http://caitsith.osdn.jp/ .
> > > 
> > Thanks I've started working my way through this, but it is going to take
> > me a while.
> > 
> 
> Thank you for your time.

May I hear the status? Is there something I can do other than waiting?

I wrote a full manual for this patchset as http://caitsith.osdn.jp/index2.html .
Does anybody have fundamental objection against CaitSith?


Re: sendfile from 9p fs into af_alg

2016-11-22 Thread Al Viro
On Tue, Nov 22, 2016 at 08:55:59PM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 23, 2016 at 04:46:26AM +0000, Al Viro wrote:
> > On Tue, Nov 22, 2016 at 07:58:29PM -0800, Alexei Starovoitov wrote:
> > > Hi Al,
> > > 
> > > it seems the following commit 523ac9afc73a ("switch 
> > > default_file_splice_read() to use of pipe-backed iov_iter")
> > > breaks sendfile from 9p fs into af_alg socket.
> > > sendfile into af_alg is used by iproute2/tc.
> > > I'm not sure whether it's 9p or crypto or vfs problem, but happy to test 
> > > any patches.
> > 
> > Could you try -rc6 (or anything that contains 680bb946a1ae04, for that
> > matter)?
> 
> already tested with that patch in the latest net-next. Still broken :(

Joy...  Which transport are you using there?  The interesting part is
whether it's zerocopy or non-zerocopy path in p9_client_read()...


Re: [PATCH v16 10/15] clocksource/drivers/arm_arch_timer: Refactor the timer init code to prepare for GTDT

2016-11-22 Thread Fu Wei
Hi Mark,


On 19 November 2016 at 04:03, Mark Rutland  wrote:
> On Wed, Nov 16, 2016 at 09:49:03PM +0800, fu@linaro.org wrote:
>> From: Fu Wei 
>>
>> The patch refactors the original memory-mapped timer init code:
>> (1) Extract a subfunction for detecting a best time frame:
>> is_best_frame.
>
> Please leave this logic in arch_timer_mem_init(). Pulling it out gains
> us nothing, but makes the patch harder to review.

OK, I have put it back to arch_timer_mem_init() in next version: v17
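
For reference while reviewing, the frame selection rule can be modelled in
user space roughly as below. This is only a sketch: classify_frame() and
enum frame_kind are hypothetical names, and the CNTACR_* bit values are
assumed to match the ones in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* CNTACR access bits; values assumed to match arm_arch_timer.c. */
#define CNTACR_RPCT	(1u << 0)
#define CNTACR_RVCT	(1u << 1)
#define CNTACR_RFRQ	(1u << 2)
#define CNTACR_RVOFF	(1u << 3)
#define CNTACR_RWVT	(1u << 4)
#define CNTACR_RWPT	(1u << 5)

enum frame_kind { FRAME_UNUSABLE, FRAME_PHYSICAL, FRAME_VIRTUAL };

/*
 * Hypothetical helper: classify a frame from the CNTACR value read back
 * after all access bits were written ("see what sticks").
 */
static enum frame_kind classify_frame(bool virt_capable, uint32_t cntacr)
{
	/* Virtual timer and virtual counter access both stuck. */
	if (virt_capable && !(~cntacr & (CNTACR_RWVT | CNTACR_RVCT)))
		return FRAME_VIRTUAL;

	/* Physical timer or physical counter access is missing. */
	if (~cntacr & (CNTACR_RWPT | CNTACR_RPCT))
		return FRAME_UNUSABLE;

	return FRAME_PHYSICAL;
}
```

A caller would stop at the first FRAME_VIRTUAL result and otherwise remember
the last FRAME_PHYSICAL frame, which is what the loop in the patch does.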

>
>> (2) Refactor "arch_timer_mem_init", make it become a common code for
>> memory-mapped timer init.
>> (3) Add a new function "arch_timer_mem_of_init" for DT init.
>
> These generally look fine.
>
> Thanks,
> Mark.
>
>> Signed-off-by: Fu Wei 
>> ---
>>  drivers/clocksource/arm_arch_timer.c | 162 +++
>>  1 file changed, 107 insertions(+), 55 deletions(-)
>>
>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>> index 9ddc091..0836bb9 100644
>> --- a/drivers/clocksource/arm_arch_timer.c
>> +++ b/drivers/clocksource/arm_arch_timer.c
>> @@ -923,17 +923,35 @@ static int __init arch_timer_of_init(struct device_node *np)
>>  CLOCKSOURCE_OF_DECLARE(armv7_arch_timer, "arm,armv7-timer", arch_timer_of_init);
>>  CLOCKSOURCE_OF_DECLARE(armv8_arch_timer, "arm,armv8-timer", arch_timer_of_init);
>>
>> -static int __init arch_timer_mem_init(struct device_node *np)
>> +static bool __init is_best_frame(void __iomem *cntctlbase, u32 cnttidr, int n)
>> +{
>> + u32 cntacr = CNTACR_RFRQ | CNTACR_RWPT | CNTACR_RPCT | CNTACR_RWVT |
>> +  CNTACR_RVOFF | CNTACR_RVCT;
>> +
>> + /* Try enabling everything, and see what sticks */
>> + writel_relaxed(cntacr, cntctlbase + CNTACR(n));
>> + cntacr = readl_relaxed(cntctlbase + CNTACR(n));
>> +
>> + if ((cnttidr & CNTTIDR_VIRT(n)) &&
>> + !(~cntacr & (CNTACR_RWVT | CNTACR_RVCT)))
>> + arch_timer_mem_use_virtual = true;
>> + else if (~cntacr & (CNTACR_RWPT | CNTACR_RPCT))
>> + return false;
>> +
>> + return true;
>> +}
>> +
>> +static int __init arch_timer_mem_init(struct arch_timer_mem *timer_mem)
>>  {
>> - struct device_node *frame, *best_frame = NULL;
>>   void __iomem *cntctlbase, *base;
>> - unsigned int irq, ret = -EINVAL;
>> + struct arch_timer_mem_frame *best_frame = NULL;
>> + unsigned int irq;
>>   u32 cnttidr;
>> + int i, ret;
>>
>> - arch_timers_present |= ARCH_TIMER_TYPE_MEM;
>> - cntctlbase = of_iomap(np, 0);
>> + cntctlbase = ioremap(timer_mem->cntctlbase, timer_mem->size);
>>   if (!cntctlbase) {
>> - pr_err("Can't find CNTCTLBase\n");
>> + pr_err("Can't map CNTCTLBase.\n");
>>   return -ENXIO;
>>   }
>>
>> @@ -943,76 +961,110 @@ static int __init arch_timer_mem_init(struct device_node *np)
>>* Try to find a virtual capable frame. Otherwise fall back to a
>>* physical capable frame.
>>*/
>> - for_each_available_child_of_node(np, frame) {
>> - int n;
>> - u32 cntacr;
>> -
>> - if (of_property_read_u32(frame, "frame-number", &n)) {
>> - pr_err("Missing frame-number\n");
>> - of_node_put(frame);
>> - goto out;
>> - }
>> -
>> - /* Try enabling everything, and see what sticks */
>> - cntacr = CNTACR_RFRQ | CNTACR_RWPT | CNTACR_RPCT |
>> -  CNTACR_RWVT | CNTACR_RVOFF | CNTACR_RVCT;
>> - writel_relaxed(cntacr, cntctlbase + CNTACR(n));
>> - cntacr = readl_relaxed(cntctlbase + CNTACR(n));
>> -
>> - if ((cnttidr & CNTTIDR_VIRT(n)) &&
>> - !(~cntacr & (CNTACR_RWVT | CNTACR_RVCT))) {
>> - of_node_put(best_frame);
>> - best_frame = frame;
>> - arch_timer_mem_use_virtual = true;
>> - break;
>> + for (i = 0; i < timer_mem->num_frames; i++) {
>> + if (is_best_frame(cntctlbase, cnttidr,
>> +   timer_mem->frame[i].frame_nr)) {
>> + best_frame = &timer_mem->frame[i];
>> + if (arch_timer_mem_use_virtual)
>> + break;
>>   }
>> -
>> - if (~cntacr & (CNTACR_RWPT | CNTACR_RPCT))
>> - continue;
>> -
>> - of_node_put(best_frame);
>> - best_frame = of_node_get(frame);
>>   }
>> + iounmap(cntctlbase);
>>
>> - ret= -ENXIO;
>> - base = arch_counter_base = of_iomap(best_frame, 0);
>> - if (!base) {
>> - pr_err("Can't map frame's registers\n");
>> - goto out;
>> + if (!best_frame) {
>> + pr_err("Can't find frame for register\n");
>> + return -EINVAL;
>>   }
>>
>>   if 

Re: net/can: use-after-free in bcm_rx_thr_flush

2016-11-22 Thread Oliver Hartkopp

On 11/22/2016 06:37 PM, Andrey Konovalov wrote:

On Tue, Nov 22, 2016 at 6:29 PM, Oliver Hartkopp  wrote:

Hi Andrey,

thanks for the report.

Although I can't see the issue in the code ...



Oh, I can see it now m(

Will send a patch today.

Many thanks,
Oliver



Re: [PATCH v2 1/5] ARM: memory: da8xx-ddrctl: new driver

2016-11-22 Thread Sekhar Nori
On Tuesday 22 November 2016 11:51 PM, Frank Rowand wrote:
> Please note that the compatible property might contain several strings,
> not just a single string.

So I guess the best thing to do is to use
of_property_read_string_index() and print the string at index 0.
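
To illustrate the layout in question, here is a minimal user-space sketch of
picking one entry out of a property that holds several NUL-terminated
strings back to back. prop_string_at() is a hypothetical stand-in written
for this mail, not the kernel helper, and the buffer contents used with it
are placeholders.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in: return the string at the given index
 * from a buffer of consecutive NUL-terminated strings, or NULL if the
 * index is out of range or the data is not properly terminated.
 */
static const char *prop_string_at(const char *prop, size_t len,
				  unsigned int index)
{
	const char *p = prop;
	const char *end = prop + len;

	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));

		if (!nul)
			return NULL;	/* unterminated trailing data */
		if (index == 0)
			return p;
		index--;
		p = nul + 1;		/* skip to the next string */
	}
	return NULL;			/* fewer strings than requested */
}
```

Index 0 yields the first and most specific entry, which is the one worth
printing in the message.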

Thanks,
Sekhar


Re: [PATCH] HID: lg: fix noderef.cocci warnings

2016-11-22 Thread Fengguang Wu

On Tue, Nov 22, 2016 at 11:44:34AM +0100, Jiri Kosina wrote:

On Mon, 21 Nov 2016, Benjamin Tissoires wrote:


> Generated by: scripts/coccinelle/misc/noderef.cocci
>
> CC: Benjamin Tissoires 
> Signed-off-by: Fengguang Wu 
> ---
>
>  hid-lg.c |6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> --- a/drivers/hid/hid-lg.c
> +++ b/drivers/hid/hid-lg.c
> @@ -777,8 +777,10 @@ static int lg_probe(struct hid_device *h
>buf[1] = 0xB2;
>get_random_bytes(&buf[2], 2);
>
> -  ret = hid_hw_raw_request(hdev, buf[0], buf, sizeof(buf),
> -  HID_FEATURE_REPORT, HID_REQ_SET_REPORT);
> +  ret = hid_hw_raw_request(hdev, buf[0], buf,
> +   sizeof(*buf),

This is wrong. I messed up and should have used "sizeof(cbuf)", but the
coccinelle script failed at detecting the correct solution (I guess it
couldn't).
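
To spell out the difference, a minimal user-space sketch follows. The array
contents and the three helper names are hypothetical and only serve to show
what each sizeof expression evaluates to.

```c
#include <stddef.h>

/* Hypothetical 9-byte report template; the contents are illustrative. */
static const unsigned char cbuf[] = {
	0x00, 0xAF, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Size of the whole array: the number of bytes that should be sent. */
static size_t report_len_from_array(void)
{
	return sizeof(cbuf);
}

/* Size of what a byte pointer points to: always a single byte. */
static size_t report_len_from_deref(const unsigned char *buf)
{
	return sizeof(*buf);
}

/* Size of the pointer itself: unrelated to the amount of data behind it. */
static size_t report_len_from_pointer(const unsigned char *buf)
{
	return sizeof(buf);
}
```

Only the first form gives the report length; the script's suggestion
corresponds to the second form, and the original code to the third.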


Fengguang, is there anything that could be done to improve this?


CC Julie and Gilles. I'm not sure if the coccinelle script could be
made that smart. :)


Jiri, do you want me to send a v2 of the series or will you just amend
the patch while applying?


I'll fix that up, no worries. Thanks,

--
Jiri Kosina
SUSE Labs


Crypto Fixes for 4.9

2016-11-22 Thread Herbert Xu
Hi Linus:

The last push broke algif_hash for all shash implementations,
so this is a follow-up to fix that.  It also fixes a problem
in the crypto scatterwalk that triggers a BUG_ON with certain
debugging options due to the new vmalloced-stack code.


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus


Herbert Xu (2):
  crypto: algif_hash - Fix result clobbering in recvmsg
  crypto: scatterwalk - Remove unnecessary aliasing check in map_and_copy

 crypto/algif_hash.c  |2 +-
 crypto/scatterwalk.c |4 
 2 files changed, 1 insertion(+), 5 deletions(-)

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2 7/8] perf sched timehist: Add -V/--cpu-visual option

2016-11-22 Thread Namhyung Kim
On Tue, Nov 22, 2016 at 03:33:26PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 16, 2016 at 03:06:33PM +0900, Namhyung Kim escreveu:
> > From: David Ahern 
> > 
> > The -V option provides a visual aid for sched switches by cpu:
> > 
> >   $ perf sched timehist -V
> >              time    cpu  0123456789abc  task name              b/n time  sch delay   run time
> >                                          [tid/pid]                (msec)     (msec)     (msec)
> >   --------------- ------  -------------  --------------------  ---------  ---------  ---------
> >   ...
> >    2412598.429696 [0009]           i     <idle>                    0.000      0.000      0.000
> >    2412598.429767 [0002]    s            perf[7219]                0.000      0.000      0.000
> >    2412598.429783 [0009]           s     perf[7220]                0.000      0.006      0.087
> >    2412598.429794 [0010]            i    <idle>                    0.000      0.000      0.000
> >    2412598.429795 [0009]           s     migration/9[53]           0.000      0.003      0.011
> >    2412598.430370 [0010]            s    sleep[7220]               0.011      0.000      0.576
> >    2412598.432584 [0003]     i           <idle>                    0.000      0.000      0.000
> >   ...
> 
> Forgot to add docs, will do.

The documentation of sched timehist command (including the -V option)
comes with the next patch (8/8).
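
Until then, the layout of the visual column can be summarised with a small
user-space sketch. The two helpers below are hypothetical; they mirror the
loops in the patch but fill a caller-provided buffer (at least ncpus + 1
bytes) instead of calling printf().

```c
/* Header: one hex digit per CPU, wrapping back to 0 after f. */
static void cpu_visual_header(char *buf, unsigned int ncpus)
{
	static const char hex[] = "0123456789abcdef";
	unsigned int i;

	for (i = 0; i < ncpus; i++)
		buf[i] = hex[i % 16];
	buf[ncpus] = '\0';
}

/* Row: 'i' (idle) or 's' (sched event) in the column of the sample's CPU. */
static void cpu_visual_row(char *buf, unsigned int ncpus,
			   unsigned int cpu, int is_idle)
{
	unsigned int i;

	for (i = 0; i < ncpus; i++)
		buf[i] = (i == cpu) ? (is_idle ? 'i' : 's') : ' ';
	buf[ncpus] = '\0';
}
```

For a 13-CPU system the header comes out as 0123456789abc, and an idle
sample on CPU 9 puts a single 'i' under the 9.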

Thanks,
Namhyung


> 
> - Arnaldo
>  
> > Signed-off-by: David Ahern 
> > Signed-off-by: Namhyung Kim 
> > ---
> >  tools/perf/builtin-sched.c | 44 
> > ++--
> >  1 file changed, 42 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> > index 1f8731640809..829468defa07 100644
> > --- a/tools/perf/builtin-sched.c
> > +++ b/tools/perf/builtin-sched.c
> > @@ -201,6 +201,7 @@ struct perf_sched {
> >  	bool		summary_only;
> >  	bool		show_callchain;
> >  	unsigned int	max_stack;
> > +	bool		show_cpu_visual;
> >  	bool		show_wakeups;
> >  	u64		skipped_samples;
> >  };
> > @@ -1783,10 +1784,23 @@ static char *timehist_get_commstr(struct thread *thread)
> > return str;
> >  }
> >  
> > -static void timehist_header(void)
> > +static void timehist_header(struct perf_sched *sched)
> >  {
> > +   u32 ncpus = sched->max_cpu + 1;
> > +   u32 i, j;
> > +
> > printf("%15s %6s ", "time", "cpu");
> >  
> > +   if (sched->show_cpu_visual) {
> > +   printf(" ");
> > +   for (i = 0, j = 0; i < ncpus; ++i) {
> > +   printf("%x", j++);
> > +   if (j > 15)
> > +   j = 0;
> > +   }
> > +   printf(" ");
> > +   }
> > +
> > printf(" %-20s  %9s  %9s  %9s",
> > "task name", "wait time", "sch delay", "run time");
> >  
> > @@ -1797,6 +1811,9 @@ static void timehist_header(void)
> >  */
> > printf("%15s %-6s ", "", "");
> >  
> > +   if (sched->show_cpu_visual)
> > +   printf(" %*s ", ncpus, "");
> > +
> > printf(" %-20s  %9s  %9s  %9s\n", "[tid/pid]", "(msec)", "(msec)", 
> > "(msec)");
> >  
> > /*
> > @@ -1804,6 +1821,9 @@ static void timehist_header(void)
> >  */
> > printf("%.15s %.6s ", graph_dotted_line, graph_dotted_line);
> >  
> > +   if (sched->show_cpu_visual)
> > +   printf(" %.*s ", ncpus, graph_dotted_line);
> > +
> > printf(" %.20s  %.9s  %.9s  %.9s",
> > graph_dotted_line, graph_dotted_line, graph_dotted_line,
> > graph_dotted_line);
> > @@ -1817,11 +1837,28 @@ static void timehist_print_sample(struct perf_sched *sched,
> >   struct thread *thread)
> >  {
> > struct thread_runtime *tr = thread__priv(thread);
> > +   u32 max_cpus = sched->max_cpu + 1;
> > char tstr[64];
> >  
> > timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
> > printf("%15s [%04d] ", tstr, sample->cpu);
> >  
> > +   if (sched->show_cpu_visual) {
> > +   u32 i;
> > +   char c;
> > +
> > +   printf(" ");
> > +   for (i = 0; i < max_cpus; ++i) {
> > +   /* flag idle times with 'i'; others are sched events */
> > +   if (i == sample->cpu)
> > +   c = (thread->tid == 0) ? 'i' : 's';
> > +   else
> > +   c = ' ';
> > +   printf("%c", c);
> > +   }
> > +   printf(" ");
> > +   }
> > +
> > printf(" %-*s ", comm_width, timehist_get_commstr(thread));
> >  
> > print_sched_time(tr->dt_wait, 6);
> > @@ -2095,6 +2132,8 @@ static void timehist_print_wakeup_event(struct perf_sched *sched,
> >  
> > timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
> > printf("%15s [%04d] ", tstr, sample->cpu);
> > +   if (sched->show_cpu_visual)
> > +   

Re: Locking API testsuite output mangled

2016-11-22 Thread Michael Ellerman
Christian Kujau  writes:

> The "Locking API testsuite" output during bootup (with 
> CONFIG_DEBUG_LOCKING_API_SELFTESTS=y) on this PowerPC system looks 
> mangled, possibly related to the recent printk changes (4bcc595ccd80, 
> "printk: reinstate KERN_CONT for printing continuation lines"). Before 
> (e.g. with v4.6) it looked like this:
>
>  http://nerdbynature.de/bits/4.6.0-rc7/dmesg.txt
>
> See below for the current output.

That's nothing powerpc specific AFAICS, does this fix it?
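
For background, the behaviour change can be modelled in user space roughly
like this. It is a simplified sketch, not the kernel's log buffer: struct
log_model, log_append() and log_emit() are hypothetical names, and the only
rule modelled is that a message without the continuation flag always starts
a new line.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical fixed-size log model; not the kernel's ring buffer. */
struct log_model {
	char text[256];
	size_t len;
};

/* Append a string, truncating if the model buffer would overflow. */
static void log_append(struct log_model *log, const char *s)
{
	size_t n = strlen(s);

	if (log->len + n >= sizeof(log->text))
		n = sizeof(log->text) - 1 - log->len;
	memcpy(log->text + log->len, s, n);
	log->len += n;
	log->text[log->len] = '\0';
}

/*
 * Emit one message. Without the continuation flag the message starts a
 * new line, even if the previous one had no trailing newline.
 */
static void log_emit(struct log_model *log, bool cont, const char *msg)
{
	bool open_line = log->len > 0 && log->text[log->len - 1] != '\n';

	if (!cont && open_line)
		log_append(log, "\n");
	log_append(log, msg);
}
```

In this model, emitting a test name followed by a result without the flag
splits them across two lines, while flagging the result as a continuation
keeps them on one line.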

cheers

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 872a15a2a637..f3a217ea0388 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -980,23 +980,23 @@ static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask)
 #ifndef CONFIG_PROVE_LOCKING
if (expected == FAILURE && debug_locks) {
expected_testcase_failures++;
-   printk("failed|");
+   pr_cont("failed|");
}
else
 #endif
if (debug_locks != expected) {
unexpected_testcase_failures++;
-   printk("FAILED|");
+   pr_cont("FAILED|");
 
dump_stack();
} else {
testcase_successes++;
-   printk("  ok  |");
+   pr_cont("  ok  |");
}
testcase_total++;
 
if (debug_locks_verbose)
-   printk(" lockclass mask: %x, debug_locks: %d, expected: %d\n",
+   pr_cont(" lockclass mask: %x, debug_locks: %d, expected: %d\n",
lockclass_mask, debug_locks, expected);
/*
 * Some tests (e.g. double-unlock) might corrupt the preemption
@@ -1021,26 +1021,26 @@ static inline void print_testname(const char *testname)
 #define DO_TESTCASE_1(desc, name, nr)  \
print_testname(desc"/"#nr); \
dotest(name##_##nr, SUCCESS, LOCKTYPE_RWLOCK);  \
-   printk("\n");
+   pr_cont("\n");
 
 #define DO_TESTCASE_1B(desc, name, nr) \
print_testname(desc"/"#nr); \
dotest(name##_##nr, FAILURE, LOCKTYPE_RWLOCK);  \
-   printk("\n");
+   pr_cont("\n");
 
 #define DO_TESTCASE_3(desc, name, nr)  \
print_testname(desc"/"#nr); \
dotest(name##_spin_##nr, FAILURE, LOCKTYPE_SPIN);   \
dotest(name##_wlock_##nr, FAILURE, LOCKTYPE_RWLOCK);\
dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK);\
-   printk("\n");
+   pr_cont("\n");
 
 #define DO_TESTCASE_3RW(desc, name, nr)\
print_testname(desc"/"#nr); \
dotest(name##_spin_##nr, FAILURE, LOCKTYPE_SPIN|LOCKTYPE_RWLOCK);\
dotest(name##_wlock_##nr, FAILURE, LOCKTYPE_RWLOCK);\
dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK);\
-   printk("\n");
+   pr_cont("\n");
 
 #define DO_TESTCASE_6(desc, name)  \
print_testname(desc);   \
@@ -1050,7 +1050,7 @@ static inline void print_testname(const char *testname)
dotest(name##_mutex, FAILURE, LOCKTYPE_MUTEX);  \
dotest(name##_wsem, FAILURE, LOCKTYPE_RWSEM);   \
dotest(name##_rsem, FAILURE, LOCKTYPE_RWSEM);   \
-   printk("\n");
+   pr_cont("\n");
 
 #define DO_TESTCASE_6_SUCCESS(desc, name)  \
print_testname(desc);   \
@@ -1060,7 +1060,7 @@ static inline void print_testname(const char *testname)
dotest(name##_mutex, SUCCESS, LOCKTYPE_MUTEX);  \
dotest(name##_wsem, SUCCESS, LOCKTYPE_RWSEM);   \
dotest(name##_rsem, SUCCESS, LOCKTYPE_RWSEM);   \
-   printk("\n");
+   pr_cont("\n");
 
 /*
  * 'read' variant: rlocks must not trigger.
@@ -1073,7 +1073,7 @@ static inline void print_testname(const char *testname)
dotest(name##_mutex, FAILURE, LOCKTYPE_MUTEX);  \
dotest(name##_wsem, FAILURE, LOCKTYPE_RWSEM);   \
dotest(name##_rsem, FAILURE, LOCKTYPE_RWSEM);   \
-   printk("\n");
+   pr_cont("\n");
 
 #define DO_TESTCASE_2I(desc, name, nr) \
DO_TESTCASE_1("hard-"desc, name##_hard, nr);\
@@ -1726,25 +1726,25 @@ static void ww_tests(void)
dotest(ww_test_fail_acquire, SUCCESS, LOCKTYPE_WW);
dotest(ww_test_normal, SUCCESS, LOCKTYPE_WW);
dotest(ww_test_unneeded_slow, FAILURE, LOCKTYPE_WW);
-   printk("\n");
+   pr_cont("\n");
 
print_testname("ww contexts mixing");
dotest(ww_test_two_contexts, FAILURE, LOCKTYPE_WW);
dotest(ww_test_diff_class, FAILURE, LOCKTYPE_WW);
-   printk("\n");
+   pr_cont("\n");
 
print_testname("finishing ww context");
  

[PATCH 1/2] kvm: Move KVM_PPC_PVINFO_FLAGS_EV_IDLE definition next to its structure

2016-11-22 Thread David Gibson
The KVM_PPC_PVINFO_FLAGS_EV_IDLE macro defines a bit for use in the flags
field of struct kvm_ppc_pvinfo.  However, changes since that was introduced
have moved it away from that structure definition, which is confusing.

Move it back next to the structure it belongs with.

Signed-off-by: David Gibson 
---
 include/uapi/linux/kvm.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4ee67cb..cac48ed 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -651,6 +651,9 @@ struct kvm_enable_cap {
 };
 
 /* for KVM_PPC_GET_PVINFO */
+
+#define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
+
 struct kvm_ppc_pvinfo {
/* out */
__u32 flags;
@@ -682,8 +685,6 @@ struct kvm_ppc_smmu_info {
struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
 };
 
-#define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
-
 #define KVMIO 0xAE
 
 /* machine type bits, to be used as argument to KVM_CREATE_VM */
-- 
2.7.4
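
As a small illustration of why the macro reads best beside the structure, the sketch below tests the bit against a flags word. struct pvinfo_model is a hypothetical one-field stand-in for struct kvm_ppc_pvinfo (only the flags member quoted in the diff is mirrored), and ev_idle_supported() is an invented helper. Real code would include <linux/kvm.h> and fill the structure through the KVM_PPC_GET_PVINFO ioctl.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as in the patch; guarded in case <linux/kvm.h> is also included. */
#ifndef KVM_PPC_PVINFO_FLAGS_EV_IDLE
#define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
#endif

/* Hypothetical stand-in mirroring only the 'flags' output field. */
struct pvinfo_model {
	uint32_t flags;
};

/* Invented helper: returns 1 if the EV_IDLE bit is set, 0 otherwise. */
static int ev_idle_supported(const struct pvinfo_model *pv)
{
	return (pv->flags & KVM_PPC_PVINFO_FLAGS_EV_IDLE) != 0;
}
```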



[PATCH 0/2] Preliminary cleanups for HPT resizing

2016-11-22 Thread David Gibson
Hi Paul,

I'm still chasing this confusion about the CAS bit to send the real
HPT resizing patches.  However, in the meantime, here are some
preliminary cleanups.

These cleanups stand on their own, although I wrote them in the
context of writing the HPT resizing code, and are prerequisites for
those patches.

David Gibson (2):
  kvm: Move KVM_PPC_PVINFO_FLAGS_EV_IDLE definition next to its
structure
  powerpc/kvm: Corectly report KVM_CAP_PPC_ALLOC_HTAB

 arch/powerpc/kvm/powerpc.c | 5 -
 include/uapi/linux/kvm.h   | 5 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)

-- 
2.7.4



[PATCH 2/2] powerpc/kvm: Corectly report KVM_CAP_PPC_ALLOC_HTAB

2016-11-22 Thread David Gibson
At present KVM on powerpc always reports KVM_CAP_PPC_ALLOC_HTAB as enabled.
However, the ioctl() it advertises (KVM_PPC_ALLOCATE_HTAB) only actually
works on KVM HV.  On KVM PR it will fail with ENOTTY.

qemu already has a workaround for this, so it's not breaking things in
practice, but it would be better to advertise this correctly.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/powerpc.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 70963c8..7b6b9eb 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -536,7 +536,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 #ifdef CONFIG_PPC_BOOK3S_64
case KVM_CAP_SPAPR_TCE:
case KVM_CAP_SPAPR_TCE_64:
-   case KVM_CAP_PPC_ALLOC_HTAB:
case KVM_CAP_PPC_RTAS:
case KVM_CAP_PPC_FIXUP_HCALL:
case KVM_CAP_PPC_ENABLE_HCALL:
@@ -545,6 +544,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 #endif
r = 1;
break;
+
+   case KVM_CAP_PPC_ALLOC_HTAB:
+   r = hv_enabled;
+   break;
 #endif /* CONFIG_PPC_BOOK3S_64 */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
case KVM_CAP_PPC_SMT:
-- 
2.7.4
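
The effect of the hunk above can be modelled in a few lines. This is a sketch, not the kernel function: the enum values and check_extension_model() are hypothetical, and hv_enabled is passed in as a parameter rather than derived from the VM as the real code does.

```c
#include <assert.h>

/* Hypothetical capability identifiers; the real ones live in <linux/kvm.h>. */
enum model_cap {
	MODEL_CAP_SPAPR_TCE,
	MODEL_CAP_PPC_RTAS,
	MODEL_CAP_PPC_ALLOC_HTAB,
	MODEL_CAP_UNKNOWN,
};

/*
 * Mirrors the shape of the patched switch: most Book3S-64 capabilities are
 * reported unconditionally, while ALLOC_HTAB is reported only for KVM HV.
 */
static int check_extension_model(enum model_cap ext, int hv_enabled)
{
	int r;

	switch (ext) {
	case MODEL_CAP_SPAPR_TCE:
	case MODEL_CAP_PPC_RTAS:
		r = 1;
		break;
	case MODEL_CAP_PPC_ALLOC_HTAB:
		r = hv_enabled;
		break;
	default:
		r = 0;
		break;
	}
	return r;
}
```

With this behaviour, user space on KVM PR sees 0 for the capability and can skip KVM_PPC_ALLOCATE_HTAB instead of calling it and handling ENOTTY.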



Re: [PATCH V9 1/6] tracing: add a possibility of exporting function trace to other places instead of ring buffer only

2016-11-22 Thread Chunyan Zhang
Hi Steve,

Actually, I have been keeping in mind that we will need to export most
kinds of traces, not only function traces, to somewhere else, say STM.
That is also why I originally made STM_SOURCE_FTRACE depend on TRACING,
which was later changed to FUNCTION_TRACER on your advice.

Thanks,
Chunyan

On 23 November 2016 at 10:27, Chunyan Zhang  wrote:
> On 23 November 2016 at 06:39, Steven Rostedt  wrote:
>> On Mon, 21 Nov 2016 15:57:18 +0800
>> Chunyan Zhang  wrote:
>>
>>> Currently, function traces can only be exported to the ring buffer. This
>>> patch adds the trace_export concept, which can process traces and export
>>> them to a registered destination in addition to the ring buffer, which is
>>> currently the only output of Ftrace.
>>>
>>> This way, if we want function traces to be sent to a destination other
>>> than the ring buffer, we just need to register a new trace_export and
>>> implement its own .write() function for writing traces to storage.
>>>
>>> With this patch, only Function trace (trace type is TRACE_FN)
>>> is supported.
>>>
>>> Signed-off-by: Chunyan Zhang 
>>> ---
>>>  include/linux/trace.h |  28 +++
>>>  kernel/trace/trace.c  | 129 
>>> +-
>>>  2 files changed, 156 insertions(+), 1 deletion(-)
>>>  create mode 100644 include/linux/trace.h
>>>
>>> diff --git a/include/linux/trace.h b/include/linux/trace.h
>>> new file mode 100644
>>> index 000..9330a58
>>> --- /dev/null
>>> +++ b/include/linux/trace.h
>>> @@ -0,0 +1,28 @@
>>> +#ifndef _LINUX_TRACE_H
>>> +#define _LINUX_TRACE_H
>>> +
>>> +#ifdef CONFIG_TRACING
>>> +/*
>>> + * The trace export - an export of Ftrace output. The trace_export
>>> + * can process traces and export them to a registered destination as
>>> + * an addition to the current only output of Ftrace - i.e. ring buffer.
>>> + *
>>> + * If you want traces to be sent to some other place rather than ring
>>> + * buffer only, just need to register a new trace_export and implement
>>> + * its own .write() function for writing traces to the storage.
>>> + *
>>> + * next  - pointer to the next trace_export
>>> + * write - copy traces which have been dealt with ->commit() to
>>> + * the destination
>>> + */
>>> +struct trace_export {
>>> + struct trace_export __rcu   *next;
>>> + void (*write)(const void *, unsigned int);
>>> +};
>>> +
>>> +int register_ftrace_export(struct trace_export *export);
>>> +int unregister_ftrace_export(struct trace_export *export);
>>> +
>>> +#endif   /* CONFIG_TRACING */
>>> +
>>> +#endif   /* _LINUX_TRACE_H */
>>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>>> index 8696ce6..038291d 100644
>>> --- a/kernel/trace/trace.c
>>> +++ b/kernel/trace/trace.c
>>> @@ -40,6 +40,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>
>>>  #include "trace.h"
>>> @@ -2128,6 +2129,129 @@ void trace_buffer_unlock_commit_regs(struct 
>>> trace_array *tr,
>>>   ftrace_trace_userstack(buffer, flags, pc);
>>>  }
>>>
>>> +static void
>>> +trace_process_export(struct trace_export *export,
>>> +struct ring_buffer_event *event)
>>> +{
>>> + struct trace_entry *entry;
>>> + unsigned int size = 0;
>>> +
>>> + entry = ring_buffer_event_data(event);
>>> + size = ring_buffer_event_length(event);
>>> + export->write(entry, size);
>>> +}
>>> +
>>> +static DEFINE_MUTEX(ftrace_export_lock);
>>> +
>>> +static struct trace_export __rcu *ftrace_exports_list __read_mostly;
>>> +
>>> +static DEFINE_STATIC_KEY_FALSE(ftrace_exports_enabled);
>>> +
>>> +static inline void ftrace_exports_enable(void)
>>> +{
>>> + static_branch_enable(&ftrace_exports_enabled);
>>> +}
>>> +
>>> +static inline void ftrace_exports_disable(void)
>>> +{
>>> + static_branch_disable(&ftrace_exports_enabled);
>>> +}
>>> +
>>> +void ftrace_exports(struct ring_buffer_event *event)
>>
>> I'm currently testing the patches, but is there a reason that
>> ftrace_exports() is not static?
>
> At present ftrace_exports() is only used by the function tracer, but
> I hope it can be used by other tracers when needed.
> So I didn't mark it static, but if you think it would be better to make
> it static for the time being, I can revise that.
>
> Thanks,
> Chunyan
>
>
>>
>> -- Steve
>>
>>> +{
>>> + struct trace_export *export;
>>> +
>>> + preempt_disable_notrace();
>>> +
>>> + export = rcu_dereference_raw_notrace(ftrace_exports_list);
>>> + while (export) {
>>> + trace_process_export(export, event);
>>> + export = rcu_dereference_raw_notrace(export->next);
>>> + }
>>> +
>>> + preempt_enable_notrace();
>>> +}
>>> +
>>> +static inline void
>>> +add_trace_export(struct trace_export **list, struct trace_export *export)
>>> +{
>>> + rcu_assign_pointer(export->next, *list);
>>> + /*
>>> +  * We are entering export into the list but another
>>> +  * CPU might be walking that list. We need to 
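
The registration and dispatch scheme described in the quoted patch can be sketched in user space as follows. This is only a model under stated assumptions: the struct mirrors the quoted one without the __rcu annotation, the RCU and mutex protection is left out entirely, and register_export_model(), run_exports_model(), count_write() and bytes_seen are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* User-space mirror of the quoted struct, without the __rcu annotation. */
struct trace_export {
	struct trace_export *next;
	void (*write)(const void *, unsigned int);
};

/* Hypothetical list head; the patch protects the real one with RCU and a mutex. */
static struct trace_export *exports_list;

/* Model of registration: push the new export onto the head of the list. */
static int register_export_model(struct trace_export *export)
{
	if (!export || !export->write)
		return -1;
	export->next = exports_list;
	exports_list = export;
	return 0;
}

/* Model of ftrace_exports(): hand one entry to every registered export. */
static void run_exports_model(const void *entry, unsigned int size)
{
	struct trace_export *export;

	for (export = exports_list; export; export = export->next)
		export->write(entry, size);
}

/* Example destination: just accumulates how many bytes it was given. */
static unsigned int bytes_seen;

static void count_write(const void *entry, unsigned int size)
{
	(void)entry;
	bytes_seen += size;
}
```

A real destination such as STM would replace count_write() with a callback that copies the entry to its own storage.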

Re: [RESEND PATCH 2/3] ARM: davinci: hawk: Remove vbus and over current gpios

2016-11-22 Thread Sekhar Nori
On Tuesday 22 November 2016 09:11 PM, Axel Haslam wrote:
> Hi Sekhar
> 
> On Tue, Nov 22, 2016 at 11:37 AM, Sekhar Nori  wrote:
>> On Monday 21 November 2016 10:23 PM, Axel Haslam wrote:
>>> The hawk board VBUS is fixed to a 5v source, and the over
>>> current pin is actually not connected to the SoC.
>>>
>>> Do not reserve these gpios for OHCI as they are not related
>>> to usb.
>>>
>>> Signed-off-by: Axel Haslam 
>>
>> As discussed over the MMC/SD patches, this patch should be based off the
>> hawkboard schematic, not the LCDK schematic.
>>
> 
> I looked at the hawkboard schematics and they are the same
> as the lcdk as far as usb is concerned:
> 
> The ohci vbus is fixed to 5v, and the over current pins of the
> TPS are not connected. so this patch should be ok for
> both the hawk and the lcdk.

Alright! Thanks for checking.

Regards,
Sekhar


Re: sendfile from 9p fs into af_alg

2016-11-22 Thread Alexei Starovoitov
On Wed, Nov 23, 2016 at 04:46:26AM +, Al Viro wrote:
> On Tue, Nov 22, 2016 at 07:58:29PM -0800, Alexei Starovoitov wrote:
> > Hi Al,
> > 
> > it seems the following commit 523ac9afc73a ("switch 
> > default_file_splice_read() to use of pipe-backed iov_iter")
> > breaks sendfile from 9p fs into af_alg socket.
> > sendfile into af_alg is used by iproute2/tc.
> > I'm not sure whether it's 9p or crypto or vfs problem, but happy to test 
> > any patches.
> 
> Could you try -rc6 (or anything that contains 680bb946a1ae04, for that
> matter)?

already tested with that patch in the latest net-next. Still broken :(
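
For context, the user-space pattern at issue looks roughly like the sketch below: bind an AF_ALG socket to a hash, accept() an operation socket, sendfile() the file into it and read the digest back. The choice of SHA-1 and the helper name sha1_via_af_alg() are assumptions for illustration; the thread does not say which algorithm tc uses or show its code.

```c
#include <assert.h>
#include <linux/if_alg.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef AF_ALG
#define AF_ALG 38	/* fallback for older libc headers */
#endif

/*
 * Hash 'len' bytes of 'fd' with SHA-1 by sendfile()ing them into an AF_ALG
 * socket. Returns 0 and fills 'digest' on success, -1 on any failure
 * (including kernels or sandboxes without AF_ALG support).
 */
static int sha1_via_af_alg(int fd, size_t len, unsigned char digest[20])
{
	struct sockaddr_alg sa;
	off_t off = 0;
	int tfm, op, ret = -1;

	memset(&sa, 0, sizeof(sa));
	sa.salg_family = AF_ALG;
	strcpy((char *)sa.salg_type, "hash");
	strcpy((char *)sa.salg_name, "sha1");

	tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfm < 0)
		return -1;
	if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto out_tfm;
	op = accept(tfm, NULL, NULL);
	if (op < 0)
		goto out_tfm;

	/* This is the step the report says broke for files on 9p. */
	if (sendfile(op, fd, &off, len) != (ssize_t)len)
		goto out_op;
	if (read(op, digest, 20) != 20)
		goto out_op;
	ret = 0;
out_op:
	close(op);
out_tfm:
	close(tfm);
	return ret;
}
```

Running the helper against the same file on 9p and on a local filesystem would be one way to narrow down where the regression sits.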



Re: sendfile from 9p fs into af_alg

2016-11-22 Thread Al Viro
On Tue, Nov 22, 2016 at 07:58:29PM -0800, Alexei Starovoitov wrote:
> Hi Al,
> 
> it seems the following commit 523ac9afc73a ("switch 
> default_file_splice_read() to use of pipe-backed iov_iter")
> breaks sendfile from 9p fs into af_alg socket.
> sendfile into af_alg is used by iproute2/tc.
> I'm not sure whether it's 9p or crypto or vfs problem, but happy to test any 
> patches.

Could you try -rc6 (or anything that contains 680bb946a1ae04, for that
matter)?


Re: [PATCH v2] tools lib traceevent: Add retrieval of preempt count and latency flags

2016-11-22 Thread Namhyung Kim
Hi Arnaldo and Steve,

On Tue, Nov 22, 2016 at 03:06:24PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Nov 22, 2016 at 11:31:58AM -0500, Steven Rostedt wrote:
> > 
> > Add a way to retrieve the preempt count as well as the latency flags from a
> > pevent_record.
> > 
> >  int pevent_data_preempt_count(pevent, record);
> > 
> > returns the preempt count of a record.
> > 
> >  int pevent_data_flags(pevent, record);
> > 
> > returns the latency flags for a record.
> 
> Namhyung, I'm preemptively adding your Acked-by, ok?

Sure.

Acked-by: Namhyung Kim 

Thanks,
Namhyung


>  
> > Signed-off-by: Steven Rostedt 
> > ---
> >  tools/lib/traceevent/event-parse.c | 30 --
> >  tools/lib/traceevent/event-parse.h |  2 ++
> >  2 files changed, 30 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/lib/traceevent/event-parse.c 
> > b/tools/lib/traceevent/event-parse.c
> > index 664c90c8e22b..6e2dfcbf9e30 100644
> > --- a/tools/lib/traceevent/event-parse.c
> > +++ b/tools/lib/traceevent/event-parse.c
> > @@ -5191,11 +5191,11 @@ struct event_format 
> > *pevent_data_event_from_type(struct pevent *pevent, int type
> >  }
> >  
> >  /**
> > - * pevent_data_pid - parse the PID from raw data
> > + * pevent_data_pid - parse the PID from record
> >   * @pevent: a handle to the pevent
> >   * @rec: the record to parse
> >   *
> > - * This returns the PID from a raw data.
> > + * This returns the PID from a record.
> >   */
> >  int pevent_data_pid(struct pevent *pevent, struct pevent_record *rec)
> >  {
> > @@ -5203,6 +5203,32 @@ int pevent_data_pid(struct pevent *pevent, struct 
> > pevent_record *rec)
> >  }
> >  
> >  /**
> > + * pevent_data_prempt_count - parse the preempt count from the record
> > + * @pevent: a handle to the pevent
> > + * @rec: the record to parse
> > + *
> > + * This returns the preempt count from a record.
> > + */
> > +int pevent_data_prempt_count(struct pevent *pevent, struct pevent_record 
> > *rec)
> > +{
> > +   return parse_common_pc(pevent, rec->data);
> > +}
> > +
> > +/**
> > + * pevent_data_flags - parse the latency flags from the record
> > + * @pevent: a handle to the pevent
> > + * @rec: the record to parse
> > + *
> > + * This returns the latency flags from a record.
> > + *
> > + *  Use trace_flag_type enum for the flags (see event-parse.h).
> > + */
> > +int pevent_data_flags(struct pevent *pevent, struct pevent_record *rec)
> > +{
> > +   return parse_common_flags(pevent, rec->data);
> > +}
> > +
> > +/**
> >   * pevent_data_comm_from_pid - return the command line from PID
> >   * @pevent: a handle to the pevent
> >   * @pid: the PID of the task to search for
> > diff --git a/tools/lib/traceevent/event-parse.h 
> > b/tools/lib/traceevent/event-parse.h
> > index 9ffde377e89d..82b83c4d621c 100644
> > --- a/tools/lib/traceevent/event-parse.h
> > +++ b/tools/lib/traceevent/event-parse.h
> > @@ -712,6 +712,8 @@ void pevent_data_lat_fmt(struct pevent *pevent,
> >  int pevent_data_type(struct pevent *pevent, struct pevent_record *rec);
> >  struct event_format *pevent_data_event_from_type(struct pevent *pevent, 
> > int type);
> >  int pevent_data_pid(struct pevent *pevent, struct pevent_record *rec);
> > +int pevent_data_prempt_count(struct pevent *pevent, struct pevent_record 
> > *rec);
> > +int pevent_data_flags(struct pevent *pevent, struct pevent_record *rec);
> >  const char *pevent_data_comm_from_pid(struct pevent *pevent, int pid);
> >  struct cmdline;
> >  struct cmdline *pevent_data_pid_from_comm(struct pevent *pevent, const 
> > char *comm,
> > -- 
> > 2.1.0
> > 
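
A caller of the two new accessors might render their results the way the latency format does. The sketch below is an illustration under assumptions: the flag values are assumed to match enum trace_flag_type in event-parse.h and are redefined locally under MODEL_ names, and lat_flags_model() is a hypothetical helper, not part of the library. Its flags and pc arguments stand for the values the accessors in the quoted diff would return for a record.

```c
#include <assert.h>
#include <string.h>

/* Assumed to match enum trace_flag_type in event-parse.h. */
enum model_trace_flag {
	MODEL_FLAG_IRQS_OFF       = 0x01,
	MODEL_FLAG_IRQS_NOSUPPORT = 0x02,
	MODEL_FLAG_NEED_RESCHED   = 0x04,
	MODEL_FLAG_HARDIRQ        = 0x08,
	MODEL_FLAG_SOFTIRQ        = 0x10,
};

/*
 * Hypothetical helper: formats latency flags and preempt count as four
 * characters plus a terminating NUL, e.g. "dNh1".
 */
static void lat_flags_model(int flags, int pc, char out[5])
{
	int hardirq = flags & MODEL_FLAG_HARDIRQ;
	int softirq = flags & MODEL_FLAG_SOFTIRQ;

	out[0] = (flags & MODEL_FLAG_IRQS_OFF) ? 'd' :
		 (flags & MODEL_FLAG_IRQS_NOSUPPORT) ? 'X' : '.';
	out[1] = (flags & MODEL_FLAG_NEED_RESCHED) ? 'N' : '.';
	out[2] = (hardirq && softirq) ? 'H' : hardirq ? 'h' : softirq ? 's' : '.';
	/* Simplification: low hex digit of the preempt count, '.' when zero. */
	out[3] = pc ? "0123456789abcdef"[pc & 0xf] : '.';
	out[4] = '\0';
}
```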


Re: [PATCH v2] mm: support anonymous stable page

2016-11-22 Thread Hugh Dickins
On Tue, 22 Nov 2016, Minchan Kim wrote:
> On Mon, Nov 21, 2016 at 07:46:28PM -0800, Hugh Dickins wrote:
> > 
> > Andrew might ask if we should Cc stable (haha): I think we agree
> > that it's a defect we've been aware of ever since stable pages were
> > first proposed, but nobody has actually been troubled by it before
> > your async zram development: so, you're right to be fixing it ahead
> > of your zram changes, but we don't see a call for backporting.
> 
> I thought so until I saw your comment. However, I checked again
> and found that it seems to be an ancient bug, present since zram's birth.
> swap_writepage unlocks the page right before submitting the bio, while
> it keeps the lock during the rw_page operation in bdev_write_page.
> So, if zram_rw_page fails (e.g., -ENOMEM) and then falls back to
> submit_bio in __swap_writepage, the problem can occur.

It's not clear to me why that matters.  If it drives zram mad
to the point of crashing the kernel, yes, that would matter.  But
if it just places incomprehensible or mis-CRCed data on the device,
who cares?  The reused swap page is marked dirty, and nobody should
be reading the stale data back off swap.  If you do resend with a
stable tag, please make clear why it matters.

Hugh

> 
> Hmm, I will resend the patchset with the zram fix part marked
> for stable.
> 
> Thanks, Hugh!


linux-next: remove the remoteproc tree?

2016-11-22 Thread Stephen Rothwell
Hi Ohad,

The remoteproc tree
(git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc.git#for-next)
has not been updated for more than a year.  Should I remove it from
linux-next?  Or could it (at least) be cleaned up, please?

-- 
Cheers,
Stephen Rothwell


Re: [PATCH 3/5] thermal: rockchip: fixes invalid temperature case

2016-11-22 Thread Brian Norris
On Wed, Nov 23, 2016 at 11:03:33AM +0800, Caesar Wang wrote:
> On 2016-11-23 10:33, Brian Norris wrote:
> >IIUC, "too high" should not be interpreted as TSADCV2_DATA_MASK on
> >rk3288, should it? That corresponds to -40C, which means you'll be
> >triggering the alarm temperature at a very *low* temperature, not a very
> >high one, no?
> 
> The "too high" will correspond to -40C on rk3288, but shouldn't
> trigger the alarm temperature.
> 
> This is because the alarm or tshut function will handle it.
> 
> e.g.:
> static void rk_tsadcv2_alarm_temp(const struct chip_tsadc_table *table,
>   int chn, void __iomem *regs, int temp)
> {
> u32 alarm_value, int_en;
> 
> /* Make sure the value is valid */
> alarm_value = rk_tsadcv2_temp_to_code(table, temp);
> if (alarm_value == table->data_mask)
> return;

Ah, right. I keep forgetting about this odd error handling.

That's still the wrong error handling though; the right response is
never to avoid doing anything (and therefore returning "success" to the
thermal core). You need to either program a high (or low) trip value, or
else report an error (i.e., allow rk_tsadcv2_alarm_temp() to return an
error code back to the calling function). Otherwise, this:

echo -45000 > trip_0_temp

will succeed without error, and:

cat trip_0_temp
-45000

will return the cached temperature from of-thermal, even though the trip
point is programmed to something else entirely.
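
A minimal user-space sketch of the second option (report the error instead of
silently returning), assuming hypothetical names throughout: the struct,
temp_to_code_sketch() and alarm_temp_sketch() are stand-ins for illustration,
not the driver's real table or API, and -ERANGE is just one plausible error
code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's conversion table. */
struct tsadc_table_sketch {
	uint32_t data_mask;	/* doubles as the "invalid code" marker */
	int min_temp;		/* millidegrees Celsius */
	int max_temp;		/* millidegrees Celsius */
};

/* Toy conversion: returns data_mask when temp is outside the table. */
static uint32_t temp_to_code_sketch(const struct tsadc_table_sketch *t,
				    int temp)
{
	assert(t != NULL);
	if (temp < t->min_temp || temp > t->max_temp)
		return t->data_mask;
	return (uint32_t)(temp - t->min_temp) / 1000;
}

/*
 * Returns 0 and writes the trip code to *reg on success, or a negative
 * errno without touching *reg, so the caller can refuse the new trip
 * point instead of caching a value that was never programmed.
 */
static int alarm_temp_sketch(const struct tsadc_table_sketch *t,
			     int temp, uint32_t *reg)
{
	uint32_t code = temp_to_code_sketch(t, temp);

	if (code == t->data_mask)
		return -ERANGE;

	*reg = code;
	return 0;
}
```

With a table covering -40000..125000, a request for -45000 returns -ERANGE,
so a write like the one above would fail visibly rather than leave the cached
and programmed values out of sync.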

Brian


Re: [fuse-devel] fuse: max_background and congestion_threshold settings

2016-11-22 Thread Maxim Patlasov

On 11/22/2016 02:45 PM, Nikolaus Rath wrote:


On Nov 16 2016, Maxim Patlasov  wrote:

On 11/16/2016 12:19 PM, Nikolaus Rath wrote:


On Nov 16 2016, Maxim Patlasov  wrote:

On 11/16/2016 11:19 AM, Nikolaus Rath wrote:


Hi Maxim,

On Nov 15 2016, Maxim Patlasov  wrote:

On 11/15/2016 08:18 AM, Nikolaus Rath wrote:

Could someone explain to me the meaning of the max_background and
congestion_threshold settings of the fuse module?

At first I assumed that max_background specifies the maximum number of
pending requests (i.e., requests that have been sent to userspace but
for which no reply was received yet). But looking at fs/fuse/dev.c, it
looks as if not every request is included in this number.

fuse uses max_background for cases where the total number of
simultaneous requests of given type is not limited by some other
natural means. AFAIU, these cases are: 1) async processing of direct
IO; 2) read-ahead. As an example of "natural" limitation: when
userspace process blocks on a sync direct IO read/write, the number of
requests fuse consumed is limited by the number of such processes
(actually their threads). In contrast, if userspace requests 1GB
direct IO read/write, it would be unreasonable to issue 1GB/128K==8192
fuse requests simultaneously. That's where max_background steps in.
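
The arithmetic above can be sketched as follows. This is only an illustration
of the counting argument, not fuse code: requests_needed() and in_flight_cap()
are hypothetical helpers, and the cap passed in is whatever value
max_background happens to be set to.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size requests needed to cover a transfer, rounding up. */
static uint64_t requests_needed(uint64_t io_bytes, uint64_t max_req_bytes)
{
	assert(max_req_bytes > 0);
	return (io_bytes + max_req_bytes - 1) / max_req_bytes;
}

/* How many of those may be outstanding at once under a background cap. */
static uint64_t in_flight_cap(uint64_t needed, uint64_t max_background)
{
	return needed < max_background ? needed : max_background;
}
```

For a 1GB transfer split into 128K requests, requests_needed(1ULL << 30,
128ULL << 10) gives 8192; with an illustrative cap of 12, in_flight_cap()
limits the number outstanding at any moment to 12.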

Ah, that makes sense. Are these two cases meant as examples, or is that
an exhaustive list? Because I would have thought that other cases should
be writing of cached data (when writeback caching is enabled), and
asynchronous I/O from userspace...?

I think that's an exhaustive list, but I may be missing something.

As for writing of cached data, that definitely doesn't go through
background requests. Here we rely on flusher: fuse will allocate as
many requests as the flusher wants to writeback.

Buffered AIO READs actually block in io_submit until fully
processed. So it's just another example of the "natural" limitation I
described above.

Not sure I understand. What is it that's blocking? It can't be the
userspace process, because then it wouldn't be asynchronous I/O...

Surprise! Alas, Linux kernel does NOT process buffered AIO reads in
async manner. You can verify it yourself by strace-ing a simple
program looping over io_submit + io_getevents: for direct IO (as
expected) io_submit returns immediately while io_getevents waits for
actual IO; in contrast, for buffered IO (surprisingly) io_submit waits
for actual IO while io_getevents returns immediately. Presumably,
people are supposed to use mmap-ed read/writes rather than buffered
AIO.

What about buffered, asynchronous writes when writeback cache is
disabled? It sounds as if io_submit does not block (so userspace could
create an unlimited number), nor can the kernel coalesce them (since
writeback caching is disabled).


I've never looked closely at it. Do you have a particular use case or 
concern?





Thanks!
-Nikolaus





Re: linux-next: Tree for Nov 22 (ext4 + dax)

2016-11-22 Thread Theodore Ts'o
On Wed, Nov 23, 2016 at 04:19:35AM +1100, Stephen Rothwell wrote:
> > I don't see why this is happening but I reproduced it multiple times.
> > 
> > fs/built-in.o: In function `ext4_dax_fault':
> > file.c:(.text+0x6278b): undefined reference to `dax_iomap_fault'
> > file.c:(.text+0x627dc): undefined reference to `dax_iomap_fault'
> >
> > Full randconfig file is attached.
> 
> CONFIG_FS_IOMAP is not set ...

I've just added a Kconfig fixup from Jan Kara to address this.

 - Ted


Re: [PATCH v4 3/3] dmaengine: sun6i: share the dma driver with sun50i

2016-11-22 Thread Vinod Koul
On Sun, Nov 20, 2016 at 06:45:40PM +0800, Hao Zhang wrote:
> Changes the limited buswith to 8 bytes,and add
> the test in sun6i_dma_config function
> 
> Accroding to sun6i dma driver, i think ,if the client
  
typo and other grammatical mistakes here..


> doesn't configure the address width with dmaengine_slave_config
> function, it would use the default width. So we can add the test
> in sun6i_dma_config function called by dmaengine_slave_config,
> and test the configuration whether is support for the device.
> 
> Signed-off-by: Hao Zhang 
> ---
>  drivers/dma/sun6i-dma.c | 33 -
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/sun6i-dma.c b/drivers/dma/sun6i-dma.c
> index a235878..f7c90b6 100644
> --- a/drivers/dma/sun6i-dma.c
> +++ b/drivers/dma/sun6i-dma.c
> @@ -250,7 +250,7 @@ static inline s8 convert_burst(u32 maxburst)
>  static inline s8 convert_buswidth(enum dma_slave_buswidth addr_width)
>  {
>   if ((addr_width < DMA_SLAVE_BUSWIDTH_1_BYTE) ||
> - (addr_width > DMA_SLAVE_BUSWIDTH_4_BYTES))
> + (addr_width > DMA_SLAVE_BUSWIDTH_8_BYTES))
>   return -EINVAL;
>  
>   return addr_width >> 1;
> @@ -758,6 +758,18 @@ static int sun6i_dma_config(struct dma_chan *chan,
>  {
>   struct sun6i_vchan *vchan = to_sun6i_vchan(chan);
>  
> + if ((BIT(config->src_addr_width) | chan->device->src_addr_widths) !=
> + chan->device->src_addr_widths) {

First, I don't like the coding style here.

Second, this is not driver specific; it should be moved to core.
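
A minimal sketch of what a driver-independent form of that test could look
like, written as plain user-space C. The names width_bit() and
check_addr_width_sketch() are hypothetical and are not existing dmaengine
helpers; the only thing taken from the quoted patch is the idea that the
requested width is used as a bit index into a mask of supported widths.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit for a given bus width, mirroring BIT(width) in the quoted patch. */
static uint32_t width_bit(unsigned int width)
{
	assert(width < 32);	/* shifting by 32 or more would be undefined */
	return UINT32_C(1) << width;
}

/*
 * Returns 0 if the requested width is advertised in supported_widths,
 * -EINVAL otherwise. Equivalent to the patch's
 * (BIT(w) | mask) != mask test, written as a single mask check.
 */
static int check_addr_width_sketch(unsigned int width,
				   uint32_t supported_widths)
{
	if (width >= 32)
		return -EINVAL;
	if (!(supported_widths & width_bit(width)))
		return -EINVAL;
	return 0;
}
```

Nothing in the check depends on sun6i, which is the point of the second
comment: the same test would apply to any device that advertises a width
mask, for source and destination alike.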

-- 
~Vinod

