Re: [PATCH] soc: rockchip: power-domain: Handle errors from of_genpd_add_provider_onecell

2016-09-15 Thread Tomeu Vizoso
On 09/16/2016 12:31 AM, Heiko Stübner wrote:
> Am Donnerstag, 15. September 2016, 16:39:34 schrieb Heiko Stübner:
>> Am Donnerstag, 15. September 2016, 12:43:41 schrieb Tomeu Vizoso:
>>> It was a bit surprising that the device was reported to have probed just
>>> fine, but the provider hadn't been registered.
>>>
>>> So handle any errors when registering the provider and fail the probe
>>> accordingly.
>>>
>>> Signed-off-by: Tomeu Vizoso <tomeu.viz...@collabora.com>
>>> Cc: Caesar Wang <w...@rock-chips.com>
>>> ---
>>>
>>>  drivers/soc/rockchip/pm_domains.c | 6 +-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/soc/rockchip/pm_domains.c
>>> b/drivers/soc/rockchip/pm_domains.c index 7acd1517dd37..57e920128cb2
>>> 100644
>>> --- a/drivers/soc/rockchip/pm_domains.c
>>> +++ b/drivers/soc/rockchip/pm_domains.c
>>> @@ -627,7 +627,11 @@ static int rockchip_pm_domain_probe(struct
>>> platform_device *pdev) goto err_out;
>>>
>>> }
>>>
>>> -   of_genpd_add_provider_onecell(np, >genpd_data);
>>> +   error = of_genpd_add_provider_onecell(np, >genpd_data);
>>> +   if (error) {
>>> +   dev_err(dev, "failed to add provider: %d\n", error);
>>> +   goto err_out;
>>> +   }
>>>
>>> return 0;
>>
>> Looks good in itself, but seems to trigger some issue in the genpd code
>> when applied alone on top of linux-next-20160915. Looks like genpd
>> is missing counter-initialization somewhere, as I'm seeing now:
>>
>> [1.664744] genpd_poweroff_unused disabling (null)
>> [1.669553] [ cut here ]
>> [1.674169] WARNING: CPU: 0 PID: 1 at ../kernel/workqueue.c:1440
>> __queue_work+0x2b8/0x3f8 [1.682337] Modules linked in:
>> [1.685401] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>> 4.8.0-rc6-next-20160915-1-g7432710-dirty #5 [1.694608] Hardware
>> name: Rockchip (Device Tree)
>> [1.699312] [] (unwind_backtrace) from []
>> (show_stack+0x10/0x14) [1.707050] [] (show_stack) from
>> [] (dump_stack+0x90/0xa4) [1.714267] []
>> (dump_stack) from [] (__warn+0xe8/0x100) [1.721221]
>> [] (__warn) from [] (warn_slowpath_null+0x20/0x28) [   
>> 1.728785] [] (warn_slowpath_null) from []
>> (__queue_work+0x2b8/0x3f8) [1.737041] [] (__queue_work) from
>> [] (queue_work_on+0x40/0x4c) [1.744692] []
>> (queue_work_on) from [] (genpd_poweroff_unused+0x78/0x9c) [   
>> 1.753123] [] (genpd_poweroff_unused) from []
>> (do_one_initcall+0x40/0x170) [1.761815] [] (do_one_initcall)
>> from [] (kernel_init_freeable+0x15c/0x1fc) [1.770507]
>> [] (kernel_init_freeable) from []
>> (kernel_init+0x8/0x114) [1.778678] [] (kernel_init) from
>> [] (ret_from_fork+0x14/0x3c) [1.786240] ---[ end trace
>> b38c51ace1463add ]---
>> [1.790875] genpd_poweroff_unused disabling ��
>> [1.795856] genpd_poweroff_unused disabling ��
>> [1.800830] [ cut here ]
>> [1.805443] WARNING: CPU: 0 PID: 1 at ../kernel/workqueue.c:1440
>> __queue_work+0x2b8/0x3f8 [1.813610] Modules linked in:
>> [1.816673] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW  
>> 4.8.0-rc6-next-20160915-1-g7432710-dirty #5 [1.827094] Hardware
>> name: Rockchip (Device Tree)
>> [1.831795] [] (unwind_backtrace) from []
>> (show_stack+0x10/0x14) [1.839532] [] (show_stack) from
>> [] (dump_stack+0x90/0xa4) [1.846747] []
>> (dump_stack) from [] (__warn+0xe8/0x100) [1.853700]
>> [] (__warn) from [] (warn_slowpath_null+0x20/0x28) [   
>> 1.861264] [] (warn_slowpath_null) from []
>> (__queue_work+0x2b8/0x3f8) [1.869521] [] (__queue_work) from
>> [] (queue_work_on+0x40/0x4c) [1.877170] []
>> (queue_work_on) from [] (genpd_poweroff_unused+0x78/0x9c) [   
>> 1.885600] [] (genpd_poweroff_unused) from []
>> (do_one_initcall+0x40/0x170) [1.894290] [] (do_one_initcall)
>> from [] (kernel_init_freeable+0x15c/0x1fc) [1.902981]
>> [] (kernel_init_freeable) from []
>> (kernel_init+0x8/0x114) [1.911152] [] (kernel_init) from
>> [] (ret_from_fork+0x14/0x3c) [1.918713] ---[ end trace
>> b38c51ace1463ade ]---
>> [1.923338] genpd_poweroff_unused disabling ��
>> [1.928325] genpd_poweroff_unused disabling ��
>>
>> [+ millions more of those]
> 
> just for completenes sake, I've included your patch in a series of my
> own [0] which adds the necessary  pm_genpd_remove prequisite to prevent
> the errors shown above.

That's awesome, thanks. For some reason I hadn't noticed those.

Regards,

Tomeu

> 
> Heiko
> 
> [0] "[PATCH 0/2] soc: rockchip: fix probe error path in power-domain driver"
> 



Re: [PATCH] soc: rockchip: power-domain: Handle errors from of_genpd_add_provider_onecell

2016-09-15 Thread Tomeu Vizoso
On 09/16/2016 12:31 AM, Heiko Stübner wrote:
> Am Donnerstag, 15. September 2016, 16:39:34 schrieb Heiko Stübner:
>> Am Donnerstag, 15. September 2016, 12:43:41 schrieb Tomeu Vizoso:
>>> It was a bit surprising that the device was reported to have probed just
>>> fine, but the provider hadn't been registered.
>>>
>>> So handle any errors when registering the provider and fail the probe
>>> accordingly.
>>>
>>> Signed-off-by: Tomeu Vizoso 
>>> Cc: Caesar Wang 
>>> ---
>>>
>>>  drivers/soc/rockchip/pm_domains.c | 6 +-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/soc/rockchip/pm_domains.c
>>> b/drivers/soc/rockchip/pm_domains.c index 7acd1517dd37..57e920128cb2
>>> 100644
>>> --- a/drivers/soc/rockchip/pm_domains.c
>>> +++ b/drivers/soc/rockchip/pm_domains.c
>>> @@ -627,7 +627,11 @@ static int rockchip_pm_domain_probe(struct
>>> platform_device *pdev) goto err_out;
>>>
>>> }
>>>
>>> -   of_genpd_add_provider_onecell(np, >genpd_data);
>>> +   error = of_genpd_add_provider_onecell(np, >genpd_data);
>>> +   if (error) {
>>> +   dev_err(dev, "failed to add provider: %d\n", error);
>>> +   goto err_out;
>>> +   }
>>>
>>> return 0;
>>
>> Looks good in itself, but seems to trigger some issue in the genpd code
>> when applied alone on top of linux-next-20160915. Looks like genpd
>> is missing counter-initialization somewhere, as I'm seeing now:
>>
>> [1.664744] genpd_poweroff_unused disabling (null)
>> [1.669553] [ cut here ]
>> [1.674169] WARNING: CPU: 0 PID: 1 at ../kernel/workqueue.c:1440
>> __queue_work+0x2b8/0x3f8 [1.682337] Modules linked in:
>> [1.685401] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>> 4.8.0-rc6-next-20160915-1-g7432710-dirty #5 [1.694608] Hardware
>> name: Rockchip (Device Tree)
>> [1.699312] [] (unwind_backtrace) from []
>> (show_stack+0x10/0x14) [1.707050] [] (show_stack) from
>> [] (dump_stack+0x90/0xa4) [1.714267] []
>> (dump_stack) from [] (__warn+0xe8/0x100) [1.721221]
>> [] (__warn) from [] (warn_slowpath_null+0x20/0x28) [   
>> 1.728785] [] (warn_slowpath_null) from []
>> (__queue_work+0x2b8/0x3f8) [1.737041] [] (__queue_work) from
>> [] (queue_work_on+0x40/0x4c) [1.744692] []
>> (queue_work_on) from [] (genpd_poweroff_unused+0x78/0x9c) [   
>> 1.753123] [] (genpd_poweroff_unused) from []
>> (do_one_initcall+0x40/0x170) [1.761815] [] (do_one_initcall)
>> from [] (kernel_init_freeable+0x15c/0x1fc) [1.770507]
>> [] (kernel_init_freeable) from []
>> (kernel_init+0x8/0x114) [1.778678] [] (kernel_init) from
>> [] (ret_from_fork+0x14/0x3c) [1.786240] ---[ end trace
>> b38c51ace1463add ]---
>> [1.790875] genpd_poweroff_unused disabling ��
>> [1.795856] genpd_poweroff_unused disabling ��
>> [1.800830] [ cut here ]
>> [1.805443] WARNING: CPU: 0 PID: 1 at ../kernel/workqueue.c:1440
>> __queue_work+0x2b8/0x3f8 [1.813610] Modules linked in:
>> [1.816673] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW  
>> 4.8.0-rc6-next-20160915-1-g7432710-dirty #5 [1.827094] Hardware
>> name: Rockchip (Device Tree)
>> [1.831795] [] (unwind_backtrace) from []
>> (show_stack+0x10/0x14) [1.839532] [] (show_stack) from
>> [] (dump_stack+0x90/0xa4) [1.846747] []
>> (dump_stack) from [] (__warn+0xe8/0x100) [1.853700]
>> [] (__warn) from [] (warn_slowpath_null+0x20/0x28) [   
>> 1.861264] [] (warn_slowpath_null) from []
>> (__queue_work+0x2b8/0x3f8) [1.869521] [] (__queue_work) from
>> [] (queue_work_on+0x40/0x4c) [1.877170] []
>> (queue_work_on) from [] (genpd_poweroff_unused+0x78/0x9c) [   
>> 1.885600] [] (genpd_poweroff_unused) from []
>> (do_one_initcall+0x40/0x170) [1.894290] [] (do_one_initcall)
>> from [] (kernel_init_freeable+0x15c/0x1fc) [1.902981]
>> [] (kernel_init_freeable) from []
>> (kernel_init+0x8/0x114) [1.911152] [] (kernel_init) from
>> [] (ret_from_fork+0x14/0x3c) [1.918713] ---[ end trace
>> b38c51ace1463ade ]---
>> [1.923338] genpd_poweroff_unused disabling ��
>> [1.928325] genpd_poweroff_unused disabling ��
>>
>> [+ millions more of those]
> 
> just for completenes sake, I've included your patch in a series of my
> own [0] which adds the necessary  pm_genpd_remove prequisite to prevent
> the errors shown above.

That's awesome, thanks. For some reason I hadn't noticed those.

Regards,

Tomeu

> 
> Heiko
> 
> [0] "[PATCH 0/2] soc: rockchip: fix probe error path in power-domain driver"
> 



Re: [PATCH net-next 0/4] rxrpc: Support IPv6

2016-09-15 Thread David Miller
From: David Howells 
Date: Tue, 13 Sep 2016 23:41:31 +0100

> 
> Here is a set of patches that add IPv6 support.  They need to be applied on
> top of the just-posted miscellaneous fix patches.  They are:
> 
>  (1) Make autobinding of an unconnected socket work when sendmsg() is
>  called to initiate a client call.
> 
>  (2) Don't specify the protocol when creating the client socket, but rather
>  take the default instead.
> 
>  (3) Use rxrpc_extract_addr_from_skb() in a couple of places that were
>  doing the same thing manually.  This allows the IPv6 address
>  extraction to be done in fewer places.
> 
>  (4) Add IPv6 support.  With this, calls can be made to IPv6 servers from
>  userspace AF_RXRPC programs; AFS, however, can't use IPv6 yet as the
>  RPC calls need to be upgradeable.
 ...
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-rewrite-20160913-2

Looks good, pulled, thanks.


Re: [PATCH net-next 0/4] rxrpc: Support IPv6

2016-09-15 Thread David Miller
From: David Howells 
Date: Tue, 13 Sep 2016 23:41:31 +0100

> 
> Here is a set of patches that add IPv6 support.  They need to be applied on
> top of the just-posted miscellaneous fix patches.  They are:
> 
>  (1) Make autobinding of an unconnected socket work when sendmsg() is
>  called to initiate a client call.
> 
>  (2) Don't specify the protocol when creating the client socket, but rather
>  take the default instead.
> 
>  (3) Use rxrpc_extract_addr_from_skb() in a couple of places that were
>  doing the same thing manually.  This allows the IPv6 address
>  extraction to be done in fewer places.
> 
>  (4) Add IPv6 support.  With this, calls can be made to IPv6 servers from
>  userspace AF_RXRPC programs; AFS, however, can't use IPv6 yet as the
>  RPC calls need to be upgradeable.
 ...
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-rewrite-20160913-2

Looks good, pulled, thanks.


Re: [PATCH net-next 00/10] rxrpc: Miscellaneous fixes

2016-09-15 Thread David Miller
From: David Howells 
Date: Tue, 13 Sep 2016 23:20:56 +0100

> 
> Here's a set of miscellaneous fix patches.  There are a couple of points of
> note:
> 
>  (1) There is one non-fix patch that adjusts the call ref tracking
>  tracepoint to make kernel API-held refs on calls more obvious.  This
>  is a prerequisite for the patch that fixes prealloc refcounting.
> 
>  (2) The final patch alters how jumbo packets that partially exceed the
>  receive window are handled.  Previously, space was being left in the
>  Rx buffer for them, but this significantly hurts performance as the Rx
>  window can't be increased to match the OpenAFS Tx window size.
> 
>  Instead, the excess subpackets are discarded and an EXCEEDS_WINDOW ACK
>  is generated for the first.  To avoid the problem of someone trying to
>  run the kernel out of space by feeding the kernel a series of
>  overlapping maximal jumbo packets, we stop allowing jumbo packets on a
>  call if we encounter more than three jumbo packets with duplicate or
>  excessive subpackets.
 ...
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-rewrite-20160913-1

Pulled, thanks David.


Re: [PATCH net-next 00/10] rxrpc: Miscellaneous fixes

2016-09-15 Thread David Miller
From: David Howells 
Date: Tue, 13 Sep 2016 23:20:56 +0100

> 
> Here's a set of miscellaneous fix patches.  There are a couple of points of
> note:
> 
>  (1) There is one non-fix patch that adjusts the call ref tracking
>  tracepoint to make kernel API-held refs on calls more obvious.  This
>  is a prerequisite for the patch that fixes prealloc refcounting.
> 
>  (2) The final patch alters how jumbo packets that partially exceed the
>  receive window are handled.  Previously, space was being left in the
>  Rx buffer for them, but this significantly hurts performance as the Rx
>  window can't be increased to match the OpenAFS Tx window size.
> 
>  Instead, the excess subpackets are discarded and an EXCEEDS_WINDOW ACK
>  is generated for the first.  To avoid the problem of someone trying to
>  run the kernel out of space by feeding the kernel a series of
>  overlapping maximal jumbo packets, we stop allowing jumbo packets on a
>  call if we encounter more than three jumbo packets with duplicate or
>  excessive subpackets.
 ...
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-rewrite-20160913-1

Pulled, thanks David.


Re: DAX mapping detection (was: Re: [PATCH] Fix region lost in /proc/self/smaps)

2016-09-15 Thread Nicholas Piggin
On Fri, 16 Sep 2016 08:33:50 +1000
Dave Chinner  wrote:

> On Thu, Sep 15, 2016 at 09:42:22PM +1000, Nicholas Piggin wrote:
> > On Thu, 15 Sep 2016 20:32:10 +1000
> > Dave Chinner  wrote:  
> > > 
> > > You still haven't described anything about what a per-block flag
> > > design is supposed to look like :/  
> > 
> > For the API, or implementation? I'm not quite sure what you mean
> > here. For implementation it's possible to carefully ensure metadata
> > is persistent when allocating blocks in page fault but before
> > mapping pages. Truncate or hole punch or such things can be made to
> > work by invalidating all such mappings and holding them off until
> > you can cope with them again. Not necessarily for a filesystem with
> > *all* capabilities of XFS -- I don't know -- but for a complete basic
> > one.  
> 
> SO, essentially, it comes down to synchrnous metadta updates again.

Yes. I guess fundamentally you can't get away from either that or
preloading at some level.

(Also I don't know that there's a sane way to handle [cm]time properly,
so some things like that -- this is just about block allocation /
avoiding fdatasync).

> but synchronous updates would be conditional on whether an extent
> metadata with the "nofsync" flag asserted was updated? Where's the
> nofsync flag kept? in memory at a generic layer, or in the
> filesystem, potentially in an on-disk structure? How would the
> application set it for a given range?

I guess that comes back to the API. Whether you want it to be persistent,
request based, etc. It could be derived type of storage blocks that are
mapped there, stored per-inode, in-memory, or on extents on disk. I'm not
advocating for a particular API and of course less complexity better.

> 
> > > > > > Filesystem will
> > > > > > invalidate all such mappings before it does buffered IOs or hole 
> > > > > > punch,
> > > > > > and will sync metadata after allocating a new block but before 
> > > > > > returning
> > > > > > from a fault.  
> > > > > 
> > > > > ... requires synchronous metadata updates from page fault context,
> > > > > which we already know is not a good solution.  I'll quote one of
> > > > > Christoph's previous replies to save me the trouble:
> > > > > 
> > > > >   "You could write all metadata synchronously from the page
> > > > >   fault handler, but that's basically asking for all kinds of
> > > > >   deadlocks."
> > > > > So, let's redirect back to the "no sync" flag you were talking about
> > > > > - can you answer the questions I asked above? It would be especially
> > > > > important to highlight how the proposed feature would avoid requiring
> > > > > synchronous metadata updates in page fault contexts
> > > > 
> > > > Right. So what deadlocks are you concerned about?
> > > 
> > > It basically puts the entire journal checkpoint path under a page
> > > fault context. i.e. a whole new global locking context problem is  
> > 
> > Yes there are potentially some new lock orderings created if you
> > do that, depending on what locks the filesystem does.  
> 
> Well, that's the whole issue.

For filesystem implementations, but perhaps not mm/vfs implemenatation
AFAIKS.

> 
> > > created as this path can now be run both inside and outside the
> > > mmap_sem. Nothing ever good comes from running filesystem locking
> > > code both inside and outside the mmap_sem.  
> > 
> > You mean that some cases journal checkpoint runs with mmap_sem
> > held, and others without mmap_sem held? Not that mmap_sem is taken
> > inside journal checkpoint.  
> 
> Maybe not, but now we open up the potential for locks held inside
> or outside mmap sem to interact with the journal locks that are now
> held inside and outside mmap_sem. See below
> 
> > Then I don't really see why that's a
> > problem. I mean performance could suffer a bit, but with fault
> > retry you can almost always do the syncing outside mmap_sem in
> > practice.
> > 
> > Yes, I'll preemptively agree with you -- We don't want to add any
> > such burden if it is not needed and well justified.
> >   
> > > FWIW, We've never executed synchronous transactions inside page
> > > faults in XFS, and I think ext4 is in the same boat - it may be even
> > > worse because of the way it does ordered data dispatch through the
> > > journal. I don't really even want to think about the level of hurt
> > > this might put btrfs or other COW/log structured filesystems under.
> > > I'm sure Christoph can reel off a bunch more issues off the top of
> > > his head  
> > 
> > I asked him, we'll see what he thinks. I don't beleive there is
> > anything fundamental about mm or fs core layers that cause deadlocks
> > though.  
> 
> Spent 5 minutes looking at ext4 for an example: filesystems are
> allowed to take page locks during transaction commit. e.g ext4
> journal commit when using the default ordered data mode:
> 
> jbd2_journal_commit_transaction
>   

Re: DAX mapping detection (was: Re: [PATCH] Fix region lost in /proc/self/smaps)

2016-09-15 Thread Nicholas Piggin
On Fri, 16 Sep 2016 08:33:50 +1000
Dave Chinner  wrote:

> On Thu, Sep 15, 2016 at 09:42:22PM +1000, Nicholas Piggin wrote:
> > On Thu, 15 Sep 2016 20:32:10 +1000
> > Dave Chinner  wrote:  
> > > 
> > > You still haven't described anything about what a per-block flag
> > > design is supposed to look like :/  
> > 
> > For the API, or implementation? I'm not quite sure what you mean
> > here. For implementation it's possible to carefully ensure metadata
> > is persistent when allocating blocks in page fault but before
> > mapping pages. Truncate or hole punch or such things can be made to
> > work by invalidating all such mappings and holding them off until
> > you can cope with them again. Not necessarily for a filesystem with
> > *all* capabilities of XFS -- I don't know -- but for a complete basic
> > one.  
> 
> SO, essentially, it comes down to synchrnous metadta updates again.

Yes. I guess fundamentally you can't get away from either that or
preloading at some level.

(Also I don't know that there's a sane way to handle [cm]time properly,
so some things like that -- this is just about block allocation /
avoiding fdatasync).

> but synchronous updates would be conditional on whether an extent
> metadata with the "nofsync" flag asserted was updated? Where's the
> nofsync flag kept? in memory at a generic layer, or in the
> filesystem, potentially in an on-disk structure? How would the
> application set it for a given range?

I guess that comes back to the API. Whether you want it to be persistent,
request based, etc. It could be derived type of storage blocks that are
mapped there, stored per-inode, in-memory, or on extents on disk. I'm not
advocating for a particular API and of course less complexity better.

> 
> > > > > > Filesystem will
> > > > > > invalidate all such mappings before it does buffered IOs or hole 
> > > > > > punch,
> > > > > > and will sync metadata after allocating a new block but before 
> > > > > > returning
> > > > > > from a fault.  
> > > > > 
> > > > > ... requires synchronous metadata updates from page fault context,
> > > > > which we already know is not a good solution.  I'll quote one of
> > > > > Christoph's previous replies to save me the trouble:
> > > > > 
> > > > >   "You could write all metadata synchronously from the page
> > > > >   fault handler, but that's basically asking for all kinds of
> > > > >   deadlocks."
> > > > > So, let's redirect back to the "no sync" flag you were talking about
> > > > > - can you answer the questions I asked above? It would be especially
> > > > > important to highlight how the proposed feature would avoid requiring
> > > > > synchronous metadata updates in page fault contexts
> > > > 
> > > > Right. So what deadlocks are you concerned about?
> > > 
> > > It basically puts the entire journal checkpoint path under a page
> > > fault context. i.e. a whole new global locking context problem is  
> > 
> > Yes there are potentially some new lock orderings created if you
> > do that, depending on what locks the filesystem does.  
> 
> Well, that's the whole issue.

For filesystem implementations, but perhaps not mm/vfs implemenatation
AFAIKS.

> 
> > > created as this path can now be run both inside and outside the
> > > mmap_sem. Nothing ever good comes from running filesystem locking
> > > code both inside and outside the mmap_sem.  
> > 
> > You mean that some cases journal checkpoint runs with mmap_sem
> > held, and others without mmap_sem held? Not that mmap_sem is taken
> > inside journal checkpoint.  
> 
> Maybe not, but now we open up the potential for locks held inside
> or outside mmap sem to interact with the journal locks that are now
> held inside and outside mmap_sem. See below
> 
> > Then I don't really see why that's a
> > problem. I mean performance could suffer a bit, but with fault
> > retry you can almost always do the syncing outside mmap_sem in
> > practice.
> > 
> > Yes, I'll preemptively agree with you -- We don't want to add any
> > such burden if it is not needed and well justified.
> >   
> > > FWIW, We've never executed synchronous transactions inside page
> > > faults in XFS, and I think ext4 is in the same boat - it may be even
> > > worse because of the way it does ordered data dispatch through the
> > > journal. I don't really even want to think about the level of hurt
> > > this might put btrfs or other COW/log structured filesystems under.
> > > I'm sure Christoph can reel off a bunch more issues off the top of
> > > his head  
> > 
> > I asked him, we'll see what he thinks. I don't beleive there is
> > anything fundamental about mm or fs core layers that cause deadlocks
> > though.  
> 
> Spent 5 minutes looking at ext4 for an example: filesystems are
> allowed to take page locks during transaction commit. e.g ext4
> journal commit when using the default ordered data mode:
> 
> jbd2_journal_commit_transaction
>   journal_submit_data_buffers()
> 

Re: [ANNOUNCE] 4.6.7-rt13

2016-09-15 Thread Sebastian Andrzej Siewior
On 2016-09-15 18:52:26 [+0200], To Thomas Gleixner wrote:
> The delta patch against 4.6.7-rt12 is appended below and can be found here:

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 190af4271b5c..58fd4ff3f53a 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -89,6 +89,8 @@ static __always_inline bool __preempt_count_dec_and_test(void)
if (preempt_count_dec_and_test())
return true;
 #ifdef CONFIG_PREEMPT_LAZY
+   if (current_thread_info()->preempt_lazy_count)
+   return false;
return test_thread_flag(TIF_NEED_RESCHED_LAZY);
 #else
return false;
@@ -101,8 +103,19 @@ static __always_inline bool 
__preempt_count_dec_and_test(void)
 static __always_inline bool should_resched(int preempt_offset)
 {
 #ifdef CONFIG_PREEMPT_LAZY
-   return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset ||
-   test_thread_flag(TIF_NEED_RESCHED_LAZY));
+   u32 tmp;
+
+   tmp = raw_cpu_read_4(__preempt_count);
+   if (tmp == preempt_offset)
+   return true;
+
+   /* preempt count == 0 ? */
+   tmp &= ~PREEMPT_NEED_RESCHED;
+   if (tmp)
+   return false;
+   if (current_thread_info()->preempt_lazy_count)
+   return false;
+   return test_thread_flag(TIF_NEED_RESCHED_LAZY);
 #else
return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
 #endif
diff --git a/fs/dcache.c b/fs/dcache.c
index aa418c1bdcb5..90b66896ccb2 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -40,8 +40,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 
 #include "internal.h"
 #include "mount.h"
@@ -795,10 +793,11 @@ void dput(struct dentry *dentry)
if (parent == dentry) {
/* the task with the highest priority won't schedule */
r = cond_resched();
-   if (!r && (rt_task(current) || dl_task(current)))
+   if (!r)
cpu_chill();
-   } else
+   } else {
dentry = parent;
+   }
goto repeat;
}
 }
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index dea12a6e413b..72cb21071425 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -713,6 +713,7 @@ static inline void __ftrace_enabled_restore(int enabled)
 #define CALLER_ADDR5 ((unsigned long)ftrace_return_address(5))
 #define CALLER_ADDR6 ((unsigned long)ftrace_return_address(6))
 
+#ifdef CONFIG_USING_GET_LOCK_PARENT_IP
 static inline unsigned long get_lock_parent_ip(void)
 {
unsigned long addr = CALLER_ADDR0;
@@ -724,6 +725,7 @@ static inline unsigned long get_lock_parent_ip(void)
return addr;
return CALLER_ADDR2;
 }
+#endif
 
 #ifdef CONFIG_IRQSOFF_TRACER
   extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
diff --git a/kernel/Makefile b/kernel/Makefile
index f0c40bf49d9f..c60cc9130374 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -11,6 +11,13 @@ obj-y = fork.o exec_domain.o panic.o \
notifier.o ksysfs.o cred.o reboot.o \
async.o range.o smpboot.o
 
+# Tracing may do some dangerous __builtin_return_address() operations
+# We know they are dangerous, we don't need gcc telling us that.
+ifdef CONFIG_USING_GET_LOCK_PARENT_IP
+FRAME_CFLAGS := $(call cc-disable-warning,frame-address)
+KBUILD_CFLAGS += $(FRAME_CFLAGS)
+endif
+
 obj-$(CONFIG_MULTIUSER) += groups.o
 
 ifdef CONFIG_FUNCTION_TRACER
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 364ccd0eb57b..9aae45fae52a 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -215,6 +215,7 @@ config PREEMPT_TRACER
select RING_BUFFER_ALLOW_SWAP
select TRACER_SNAPSHOT
select TRACER_SNAPSHOT_PER_CPU_SWAP
+   select USING_GET_LOCK_PARENT_IP
help
  This option measures the time spent in preemption-off critical
  sections, with microsecond accuracy.
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1e9a607534ca..0adcc993f372 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -962,6 +962,7 @@ config TIMER_STATS
 config DEBUG_PREEMPT
bool "Debug preemptible kernel"
depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT
+   select USING_GET_LOCK_PARENT_IP
default y
help
  If you say Y here then the kernel will use a debug variant of the
@@ -1144,8 +1145,17 @@ config LOCK_TORTURE_TEST
 
 endmenu # lock debugging
 
+config USING_GET_LOCK_PARENT_IP
+bool
+   help
+ Enables the use of the function get_lock_parent_ip() that
+ will use __builtin_return_address(n) with n > 0 causing
+ some gcc warnings. When this is selected, those warnings
+ will be suppressed.
+
 config TRACE_IRQFLAGS
bool
+   select 

Re: [ANNOUNCE] 4.6.7-rt13

2016-09-15 Thread Sebastian Andrzej Siewior
On 2016-09-15 18:52:26 [+0200], To Thomas Gleixner wrote:
> The delta patch against 4.6.7-rt12 is appended below and can be found here:

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 190af4271b5c..58fd4ff3f53a 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -89,6 +89,8 @@ static __always_inline bool __preempt_count_dec_and_test(void)
if (preempt_count_dec_and_test())
return true;
 #ifdef CONFIG_PREEMPT_LAZY
+   if (current_thread_info()->preempt_lazy_count)
+   return false;
return test_thread_flag(TIF_NEED_RESCHED_LAZY);
 #else
return false;
@@ -101,8 +103,19 @@ static __always_inline bool 
__preempt_count_dec_and_test(void)
 static __always_inline bool should_resched(int preempt_offset)
 {
 #ifdef CONFIG_PREEMPT_LAZY
-   return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset ||
-   test_thread_flag(TIF_NEED_RESCHED_LAZY));
+   u32 tmp;
+
+   tmp = raw_cpu_read_4(__preempt_count);
+   if (tmp == preempt_offset)
+   return true;
+
+   /* preempt count == 0 ? */
+   tmp &= ~PREEMPT_NEED_RESCHED;
+   if (tmp)
+   return false;
+   if (current_thread_info()->preempt_lazy_count)
+   return false;
+   return test_thread_flag(TIF_NEED_RESCHED_LAZY);
 #else
return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
 #endif
diff --git a/fs/dcache.c b/fs/dcache.c
index aa418c1bdcb5..90b66896ccb2 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -40,8 +40,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 
 #include "internal.h"
 #include "mount.h"
@@ -795,10 +793,11 @@ void dput(struct dentry *dentry)
if (parent == dentry) {
/* the task with the highest priority won't schedule */
r = cond_resched();
-   if (!r && (rt_task(current) || dl_task(current)))
+   if (!r)
cpu_chill();
-   } else
+   } else {
dentry = parent;
+   }
goto repeat;
}
 }
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index dea12a6e413b..72cb21071425 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -713,6 +713,7 @@ static inline void __ftrace_enabled_restore(int enabled)
 #define CALLER_ADDR5 ((unsigned long)ftrace_return_address(5))
 #define CALLER_ADDR6 ((unsigned long)ftrace_return_address(6))
 
+#ifdef CONFIG_USING_GET_LOCK_PARENT_IP
 static inline unsigned long get_lock_parent_ip(void)
 {
unsigned long addr = CALLER_ADDR0;
@@ -724,6 +725,7 @@ static inline unsigned long get_lock_parent_ip(void)
return addr;
return CALLER_ADDR2;
 }
+#endif
 
 #ifdef CONFIG_IRQSOFF_TRACER
   extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
diff --git a/kernel/Makefile b/kernel/Makefile
index f0c40bf49d9f..c60cc9130374 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -11,6 +11,13 @@ obj-y = fork.o exec_domain.o panic.o \
notifier.o ksysfs.o cred.o reboot.o \
async.o range.o smpboot.o
 
+# Tracing may do some dangerous __builtin_return_address() operations
+# We know they are dangerous, we don't need gcc telling us that.
+ifdef CONFIG_USING_GET_LOCK_PARENT_IP
+FRAME_CFLAGS := $(call cc-disable-warning,frame-address)
+KBUILD_CFLAGS += $(FRAME_CFLAGS)
+endif
+
 obj-$(CONFIG_MULTIUSER) += groups.o
 
 ifdef CONFIG_FUNCTION_TRACER
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 364ccd0eb57b..9aae45fae52a 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -215,6 +215,7 @@ config PREEMPT_TRACER
select RING_BUFFER_ALLOW_SWAP
select TRACER_SNAPSHOT
select TRACER_SNAPSHOT_PER_CPU_SWAP
+   select USING_GET_LOCK_PARENT_IP
help
  This option measures the time spent in preemption-off critical
  sections, with microsecond accuracy.
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1e9a607534ca..0adcc993f372 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -962,6 +962,7 @@ config TIMER_STATS
 config DEBUG_PREEMPT
bool "Debug preemptible kernel"
depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT
+   select USING_GET_LOCK_PARENT_IP
default y
help
  If you say Y here then the kernel will use a debug variant of the
@@ -1144,8 +1145,17 @@ config LOCK_TORTURE_TEST
 
 endmenu # lock debugging
 
+config USING_GET_LOCK_PARENT_IP
+bool
+   help
+ Enables the use of the function get_lock_parent_ip() that
+ will use __builtin_return_address(n) with n > 0 causing
+ some gcc warnings. When this is selected, those warnings
+ will be suppressed.
+
 config TRACE_IRQFLAGS
bool
+   select 

Re: v4.8-rc: GSM audio causes trouble

2016-09-15 Thread Sebastian Reichel
Hi,

On Thu, Sep 15, 2016 at 10:51:42PM +0200, Pavel Machek wrote:
> > I was trying to improve GSM call quality, and hit problems in
> > v4.8-rc. Sound only worked for a while, then I tried to kill
> > cmtspeech_ofono_test, and could not, not even with -9 and could not
> > even reboot.
> > 
> > I went back to v4.1 (ok, quite far, I see), and problems are gone.
> > 
> > Does it work for you? Any ideas what to try... besides bisect?
> 
> v4.7 is okay.
> 
> v4.8 with
> 
> git diff ..mini-v4.7 drivers/hsi/ | git apply
> 
> Seems to work ok.
> 
> Best regards,

hsi changes from 4.7 to 4.8 are basically runtime PM enablement.
Does 4.8 work for you with just 9c99e5e51988 reverted?

-- Sebastian


signature.asc
Description: PGP signature


Re: v4.8-rc: GSM audio causes trouble

2016-09-15 Thread Sebastian Reichel
Hi,

On Thu, Sep 15, 2016 at 10:51:42PM +0200, Pavel Machek wrote:
> > I was trying to improve GSM call quality, and hit problems in
> > v4.8-rc. Sound only worked for a while, then I tried to kill
> > cmtspeech_ofono_test, and could not, not even with -9 and could not
> > even reboot.
> > 
> > I went back to v4.1 (ok, quite far, I see), and problems are gone.
> > 
> > Does it work for you? Any ideas what to try... besides bisect?
> 
> v4.7 is okay.
> 
> v4.8 with
> 
> git diff ..mini-v4.7 drivers/hsi/ | git apply
> 
> Seems to work ok.
> 
> Best regards,

hsi changes from 4.7 to 4.8 are basically runtime PM enablement.
Does 4.8 work for you with just 9c99e5e51988 reverted?

-- Sebastian


signature.asc
Description: PGP signature


Re: [PATCH v3] leds: Introduce userspace leds driver

2016-09-15 Thread Pavel Machek
On Thu 2016-09-15 10:31:50, David Lechner wrote:
> On 09/15/2016 09:54 AM, Jacek Anaszewski wrote:
> >Hi Pavel,
> >
> >On 09/15/2016 03:08 PM, Pavel Machek wrote:
> >>Hi!
> >>
> +if (copy_from_user(>user_dev, buffer,
> +   sizeof(struct uleds_user_dev))) {
> +ret = -EFAULT;
> +goto out;
> +}
> +
> +if (!udev->user_dev.name[0]) {
> +ret = -EINVAL;
> +goto out;
> +}
> +
> +ret = led_classdev_register(NULL, >led_cdev);
> +if (ret < 0)
> +goto out;
> >>
> >>No sanity checking on the name -> probably a security hole. Do not
> >>push this upstream before this is fixed.
> >
> >Thanks for catching this.
> >
> >David, please check if the LED name sticks to the LED class
> >device naming convention.
> 
> I don't think it is a good idea to enforce the LED class naming convention.
> Someone may have a userspace application they want to test that has a
> hard-coded name that does not follow the convention.

Umm.

Noone has applications with hardcoded names that are not possible
today, right?

And better not encourage crazy names.

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH v3] leds: Introduce userspace leds driver

2016-09-15 Thread Pavel Machek
On Thu 2016-09-15 10:31:50, David Lechner wrote:
> On 09/15/2016 09:54 AM, Jacek Anaszewski wrote:
> >Hi Pavel,
> >
> >On 09/15/2016 03:08 PM, Pavel Machek wrote:
> >>Hi!
> >>
> +if (copy_from_user(>user_dev, buffer,
> +   sizeof(struct uleds_user_dev))) {
> +ret = -EFAULT;
> +goto out;
> +}
> +
> +if (!udev->user_dev.name[0]) {
> +ret = -EINVAL;
> +goto out;
> +}
> +
> +ret = led_classdev_register(NULL, >led_cdev);
> +if (ret < 0)
> +goto out;
> >>
> >>No sanity checking on the name -> probably a security hole. Do not
> >>push this upstream before this is fixed.
> >
> >Thanks for catching this.
> >
> >David, please check if the LED name sticks to the LED class
> >device naming convention.
> 
> I don't think it is a good idea to enforce the LED class naming convention.
> Someone may have a userspace application they want to test that has a
> hard-coded name that does not follow the convention.

Umm.

Noone has applications with hardcoded names that are not possible
today, right?

And better not encourage crazy names.

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


[PATCH] clocksource: arm_arch_timer: Don't assume clock runs in suspend

2016-09-15 Thread Brian Norris
Since commit 4fbcdc813fb9 ("clocksource: arm_arch_timer: Use clocksource
for suspend timekeeping"), this driver assumes that the ARM architected
timer keeps running in suspend. This is not the case for some ARM SoCs,
depending on the HW state used for system suspend. Let's not assume that
all SoCs support this, and instead only support this if the device tree
explicitly tells us it's "always on". In all other cases, just fall back
to the RTC. This should be relatively harmless.

It seems fair to key the system-suspend behavior off the same property
used for C3STOP, since if the timer doesn't keep context for CPU sleep,
it likely doesn't for system sleep either.

Signed-off-by: Brian Norris 
---
 drivers/clocksource/arm_arch_timer.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/clocksource/arm_arch_timer.c 
b/drivers/clocksource/arm_arch_timer.c
index 57700541f951..e28677a34f02 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -490,7 +490,7 @@ static struct clocksource clocksource_counter = {
.rating = 400,
.read   = arch_counter_read,
.mask   = CLOCKSOURCE_MASK(56),
-   .flags  = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_SUSPEND_NONSTOP,
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
 static struct cyclecounter cyclecounter = {
@@ -526,6 +526,8 @@ static void __init arch_counter_register(unsigned type)
clocksource_counter.name = "arch_mem_counter";
}
 
+   if (!arch_timer_c3stop)
+   clocksource_counter.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
start_count = arch_timer_read_counter();
clocksource_register_hz(_counter, arch_timer_rate);
cyclecounter.mult = clocksource_counter.mult;
-- 
2.8.1.340.g018a5d0.dirty



Re: [PATCH v3] leds: Introduce userspace leds driver

2016-09-15 Thread Pavel Machek
Hi!

> +if (copy_from_user(>user_dev, buffer,
> +   sizeof(struct uleds_user_dev))) {
> +ret = -EFAULT;
> +goto out;
> +}
> +
> +if (!udev->user_dev.name[0]) {
> +ret = -EINVAL;
> +goto out;
> +}
> +
> +ret = led_classdev_register(NULL, >led_cdev);
> +if (ret < 0)
> +goto out;
> >>
> >>No sanity checking on the name -> probably a security hole. Do not
> >>push this upstream before this is fixed.
> >
> 
> If this is a serious security issue, then you should also raise an issue
> with input maintainers because this is the extent of sanity checking for
> uinput device names as well.

I guess that should be fixed. But lets not add new ones.

> I must confess that I am no security expert, so unless you can give specific
> examples of what potential threats are, I will not be able to guess what I
> need to do to fix it.
> 
> After some digging around the kernel, I don't see many instances of
> validating device node names. The best I have found so far comes from
> create_entry() in binfmt_misc.c
> 
>   if (!e->name[0] ||
>   !strcmp(e->name, ".") ||
>   !strcmp(e->name, "..") ||
>   strchr(e->name, '/'))
>   goto einval;
> 
> Would something like this be a sufficient sanity check? I suppose we could
> also check for non-printing characters, but I don't think ignoring them
> would be a security issue.

That would be minimum, yes. I guess it would be better/easier to just
limit the names to [a-zA-Z:-_0-9]*?

Thanks,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


[PATCH] clocksource: arm_arch_timer: Don't assume clock runs in suspend

2016-09-15 Thread Brian Norris
Since commit 4fbcdc813fb9 ("clocksource: arm_arch_timer: Use clocksource
for suspend timekeeping"), this driver assumes that the ARM architected
timer keeps running in suspend. This is not the case for some ARM SoCs,
depending on the HW state used for system suspend. Let's not assume that
all SoCs support this, and instead only support this if the device tree
explicitly tells us it's "always on". In all other cases, just fall back
to the RTC. This should be relatively harmless.

It seems fair to key the system-suspend behavior off the same property
used for C3STOP, since if the timer doesn't keep context for CPU sleep,
it likely doesn't for system sleep either.

Signed-off-by: Brian Norris 
---
 drivers/clocksource/arm_arch_timer.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/clocksource/arm_arch_timer.c 
b/drivers/clocksource/arm_arch_timer.c
index 57700541f951..e28677a34f02 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -490,7 +490,7 @@ static struct clocksource clocksource_counter = {
.rating = 400,
.read   = arch_counter_read,
.mask   = CLOCKSOURCE_MASK(56),
-   .flags  = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_SUSPEND_NONSTOP,
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
 static struct cyclecounter cyclecounter = {
@@ -526,6 +526,8 @@ static void __init arch_counter_register(unsigned type)
clocksource_counter.name = "arch_mem_counter";
}
 
+   if (!arch_timer_c3stop)
+   clocksource_counter.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
start_count = arch_timer_read_counter();
clocksource_register_hz(_counter, arch_timer_rate);
cyclecounter.mult = clocksource_counter.mult;
-- 
2.8.1.340.g018a5d0.dirty



Re: [PATCH v3] leds: Introduce userspace leds driver

2016-09-15 Thread Pavel Machek
Hi!

> +if (copy_from_user(>user_dev, buffer,
> +   sizeof(struct uleds_user_dev))) {
> +ret = -EFAULT;
> +goto out;
> +}
> +
> +if (!udev->user_dev.name[0]) {
> +ret = -EINVAL;
> +goto out;
> +}
> +
> +ret = led_classdev_register(NULL, >led_cdev);
> +if (ret < 0)
> +goto out;
> >>
> >>No sanity checking on the name -> probably a security hole. Do not
> >>push this upstream before this is fixed.
> >
> 
> If this is a serious security issue, then you should also raise an issue
> with input maintainers because this is the extent of sanity checking for
> uinput device names as well.

I guess that should be fixed. But lets not add new ones.

> I must confess that I am no security expert, so unless you can give specific
> examples of what potential threats are, I will not be able to guess what I
> need to do to fix it.
> 
> After some digging around the kernel, I don't see many instances of
> validating device node names. The best I have found so far comes from
> create_entry() in binfmt_misc.c
> 
>   if (!e->name[0] ||
>   !strcmp(e->name, ".") ||
>   !strcmp(e->name, "..") ||
>   strchr(e->name, '/'))
>   goto einval;
> 
> Would something like this be a sufficient sanity check? I suppose we could
> also check for non-printing characters, but I don't think ignoring them
> would be a security issue.

That would be minimum, yes. I guess it would be better/easier to just
limit the names to [a-zA-Z:-_0-9]*?

Thanks,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


[PATCH v2 4/8] x86/dumpstack: Pin the target stack when dumping it

2016-09-15 Thread Andy Lutomirski
Specifically, pin the stack in save_stack_trace_tsk() and
show_trace_log_lvl().

This will prevent a crash if the target task dies before or while
dumping its stack once we start freeing task stacks early.

Cc: Josh Poimboeuf 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/dumpstack_32.c | 5 +
 arch/x86/kernel/dumpstack_64.c | 5 +
 arch/x86/kernel/stacktrace.c   | 5 +
 3 files changed, 15 insertions(+)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 2d65cfa5e0b4..122f37d7bb7e 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -163,6 +163,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs 
*regs,
unsigned long *stack;
int i;
 
+   if (!try_get_task_stack(task))
+   return;
+
sp = sp ? : get_stack_pointer(task, regs);
 
stack = sp;
@@ -179,6 +182,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs 
*regs,
}
pr_cont("\n");
show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+
+   put_task_stack(task);
 }
 
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 8cb6004a4dfd..16c0d5f89b5e 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -218,6 +218,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs 
*regs,
unsigned long *stack;
int i;
 
+   if (!try_get_task_stack(task))
+   return;
+
irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
irq_stack = irq_stack_end - (IRQ_STACK_SIZE / sizeof(long));
 
@@ -253,6 +256,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs 
*regs,
 
pr_cont("\n");
show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+
+   put_task_stack(task);
 }
 
 void show_regs(struct pt_regs *regs)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 785aef1c7ef5..23fa81e24c8a 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -79,9 +79,14 @@ void save_stack_trace_regs(struct pt_regs *regs, struct 
stack_trace *trace)
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
+   if (!try_get_task_stack(tsk))
+   return;
+
dump_trace(tsk, NULL, NULL, 0, _stack_ops_nosched, trace);
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
+
+   put_task_stack(tsk);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
-- 
2.7.4



[PATCH v2 3/8] kthread: to_live_kthread() needs try_get_task_stack()

2016-09-15 Thread Andy Lutomirski
From: Oleg Nesterov 

get_task_struct(tsk) no longer pins tsk->stack so all users of
to_live_kthread() should do try_get_task_stack/put_task_stack to protect
"struct kthread" which lives on kthread's stack.

TODO: Kill to_live_kthread(), perhaps we can even kill "struct kthread" too,
and rework kthread_stop(), it can use task_work_add() to sync with the exiting
kernel thread.

Message-Id: <20160629180357.ga7...@redhat.com>
Signed-off-by: Oleg Nesterov 
Signed-off-by: Andy Lutomirski 
---
 kernel/kthread.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9ff173dca1ae..4ab4c3766a80 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -64,7 +64,7 @@ static inline struct kthread *to_kthread(struct task_struct 
*k)
 static struct kthread *to_live_kthread(struct task_struct *k)
 {
struct completion *vfork = ACCESS_ONCE(k->vfork_done);
-   if (likely(vfork))
+   if (likely(vfork) && try_get_task_stack(k))
return __to_kthread(vfork);
return NULL;
 }
@@ -425,8 +425,10 @@ void kthread_unpark(struct task_struct *k)
 {
struct kthread *kthread = to_live_kthread(k);
 
-   if (kthread)
+   if (kthread) {
__kthread_unpark(k, kthread);
+   put_task_stack(k);
+   }
 }
 EXPORT_SYMBOL_GPL(kthread_unpark);
 
@@ -455,6 +457,7 @@ int kthread_park(struct task_struct *k)
wait_for_completion(>parked);
}
}
+   put_task_stack(k);
ret = 0;
}
return ret;
@@ -490,6 +493,7 @@ int kthread_stop(struct task_struct *k)
__kthread_unpark(k, kthread);
wake_up_process(k);
wait_for_completion(>exited);
+   put_task_stack(k);
}
ret = k->exit_code;
put_task_struct(k);
-- 
2.7.4



[PATCH v2 6/8] lib/syscall: Pin the task stack in collect_syscall()

2016-09-15 Thread Andy Lutomirski
This will avoid a potential read-after-free if collect_syscall()
(e.g. /proc/PID/syscall) is called on an exiting task.

Reported-by: Jann Horn 
Signed-off-by: Andy Lutomirski 
---
 lib/syscall.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/lib/syscall.c b/lib/syscall.c
index e30e03932480..63239e097b13 100644
--- a/lib/syscall.c
+++ b/lib/syscall.c
@@ -7,9 +7,19 @@ static int collect_syscall(struct task_struct *target, long 
*callno,
   unsigned long args[6], unsigned int maxargs,
   unsigned long *sp, unsigned long *pc)
 {
-   struct pt_regs *regs = task_pt_regs(target);
-   if (unlikely(!regs))
+   struct pt_regs *regs;
+
+   if (!try_get_task_stack(target)) {
+   /* Task has no stack, so the task isn't in a syscall. */
+   *callno = -1;
+   return 0;
+   }
+
+   regs = task_pt_regs(target);
+   if (unlikely(!regs)) {
+   put_task_stack(target);
return -EAGAIN;
+   }
 
*sp = user_stack_pointer(regs);
*pc = instruction_pointer(regs);
@@ -18,6 +28,7 @@ static int collect_syscall(struct task_struct *target, long 
*callno,
if (*callno != -1L && maxargs > 0)
syscall_get_arguments(target, regs, 0, maxargs, args);
 
+   put_task_stack(target);
return 0;
 }
 
-- 
2.7.4



[PATCH v2 8/8] fork: Cache two thread stacks per cpu if CONFIG_VMAP_STACK is set

2016-09-15 Thread Andy Lutomirski
vmalloc is a bit slow, and pounding vmalloc/vfree will eventually
force a global TLB flush.

To reduce pressure on them, if CONFIG_VMAP_STACK, cache two thread
stacks per cpu.  This will let us quickly allocate a hopefully
cache-hot, TLB-hot stack under heavy forking workloads (shell script
style).

On my silly pthread_create benchmark, it saves about 2 µs per
pthread_create+join with CONFIG_VMAP_STACK=y.

Signed-off-by: Andy Lutomirski 
---
 kernel/fork.c | 62 ++-
 1 file changed, 53 insertions(+), 9 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 5dd0a516626d..2d44a9d05218 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -159,15 +159,41 @@ void __weak arch_release_thread_stack(unsigned long 
*stack)
  * kmemcache based allocator.
  */
 # if THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK)
+
+#ifdef CONFIG_VMAP_STACK
+/*
+ * vmalloc is a bit slow, and calling vfree enough times will force a TLB
+ * flush.  Try to minimize the number of calls by caching stacks.
+ */
+#define NR_CACHED_STACKS 2
+static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]);
+#endif
+
 static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int 
node)
 {
 #ifdef CONFIG_VMAP_STACK
-   void *stack = __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE,
-  VMALLOC_START, VMALLOC_END,
-  THREADINFO_GFP | __GFP_HIGHMEM,
-  PAGE_KERNEL,
-  0, node,
-  __builtin_return_address(0));
+   void *stack;
+   int i;
+
+   local_irq_disable();
+   for (i = 0; i < NR_CACHED_STACKS; i++) {
+   struct vm_struct *s = this_cpu_read(cached_stacks[i]);
+
+   if (!s)
+   continue;
+   this_cpu_write(cached_stacks[i], NULL);
+
+   tsk->stack_vm_area = s;
+   local_irq_enable();
+   return s->addr;
+   }
+   local_irq_enable();
+
+   stack = __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE,
+VMALLOC_START, VMALLOC_END,
+THREADINFO_GFP | __GFP_HIGHMEM,
+PAGE_KERNEL,
+0, node, __builtin_return_address(0));
 
/*
 * We can't call find_vm_area() in interrupt context, and
@@ -187,10 +213,28 @@ static unsigned long *alloc_thread_stack_node(struct 
task_struct *tsk, int node)
 
 static inline void free_thread_stack(struct task_struct *tsk)
 {
-   if (task_stack_vm_area(tsk))
+#ifdef CONFIG_VMAP_STACK
+   if (task_stack_vm_area(tsk)) {
+   unsigned long flags;
+   int i;
+
+   local_irq_save(flags);
+   for (i = 0; i < NR_CACHED_STACKS; i++) {
+   if (this_cpu_read(cached_stacks[i]))
+   continue;
+
+   this_cpu_write(cached_stacks[i], tsk->stack_vm_area);
+   local_irq_restore(flags);
+   return;
+   }
+   local_irq_restore(flags);
+
vfree(tsk->stack);
-   else
-   __free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER);
+   return;
+   }
+#endif
+
+   __free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER);
 }
 # else
 static struct kmem_cache *thread_stack_cache;
-- 
2.7.4



[PATCH v2 4/8] x86/dumpstack: Pin the target stack when dumping it

2016-09-15 Thread Andy Lutomirski
Specifically, pin the stack in save_stack_trace_tsk() and
show_trace_log_lvl().

This will prevent a crash if the target task dies before or while
dumping its stack once we start freeing task stacks early.

Cc: Josh Poimboeuf 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/dumpstack_32.c | 5 +
 arch/x86/kernel/dumpstack_64.c | 5 +
 arch/x86/kernel/stacktrace.c   | 5 +
 3 files changed, 15 insertions(+)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 2d65cfa5e0b4..122f37d7bb7e 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -163,6 +163,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs 
*regs,
unsigned long *stack;
int i;
 
+   if (!try_get_task_stack(task))
+   return;
+
sp = sp ? : get_stack_pointer(task, regs);
 
stack = sp;
@@ -179,6 +182,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs 
*regs,
}
pr_cont("\n");
show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+
+   put_task_stack(task);
 }
 
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 8cb6004a4dfd..16c0d5f89b5e 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -218,6 +218,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs 
*regs,
unsigned long *stack;
int i;
 
+   if (!try_get_task_stack(task))
+   return;
+
irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
irq_stack = irq_stack_end - (IRQ_STACK_SIZE / sizeof(long));
 
@@ -253,6 +256,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs 
*regs,
 
pr_cont("\n");
show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+
+   put_task_stack(task);
 }
 
 void show_regs(struct pt_regs *regs)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 785aef1c7ef5..23fa81e24c8a 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -79,9 +79,14 @@ void save_stack_trace_regs(struct pt_regs *regs, struct 
stack_trace *trace)
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
+   if (!try_get_task_stack(tsk))
+   return;
+
dump_trace(tsk, NULL, NULL, 0, _stack_ops_nosched, trace);
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
+
+   put_task_stack(tsk);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
-- 
2.7.4



[PATCH v2 3/8] kthread: to_live_kthread() needs try_get_task_stack()

2016-09-15 Thread Andy Lutomirski
From: Oleg Nesterov 

get_task_struct(tsk) no longer pins tsk->stack so all users of
to_live_kthread() should do try_get_task_stack/put_task_stack to protect
"struct kthread" which lives on kthread's stack.

TODO: Kill to_live_kthread(), perhaps we can even kill "struct kthread" too,
and rework kthread_stop(), it can use task_work_add() to sync with the exiting
kernel thread.

Message-Id: <20160629180357.ga7...@redhat.com>
Signed-off-by: Oleg Nesterov 
Signed-off-by: Andy Lutomirski 
---
 kernel/kthread.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9ff173dca1ae..4ab4c3766a80 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -64,7 +64,7 @@ static inline struct kthread *to_kthread(struct task_struct 
*k)
 static struct kthread *to_live_kthread(struct task_struct *k)
 {
struct completion *vfork = ACCESS_ONCE(k->vfork_done);
-   if (likely(vfork))
+   if (likely(vfork) && try_get_task_stack(k))
return __to_kthread(vfork);
return NULL;
 }
@@ -425,8 +425,10 @@ void kthread_unpark(struct task_struct *k)
 {
struct kthread *kthread = to_live_kthread(k);
 
-   if (kthread)
+   if (kthread) {
__kthread_unpark(k, kthread);
+   put_task_stack(k);
+   }
 }
 EXPORT_SYMBOL_GPL(kthread_unpark);
 
@@ -455,6 +457,7 @@ int kthread_park(struct task_struct *k)
wait_for_completion(>parked);
}
}
+   put_task_stack(k);
ret = 0;
}
return ret;
@@ -490,6 +493,7 @@ int kthread_stop(struct task_struct *k)
__kthread_unpark(k, kthread);
wake_up_process(k);
wait_for_completion(>exited);
+   put_task_stack(k);
}
ret = k->exit_code;
put_task_struct(k);
-- 
2.7.4



[PATCH v2 6/8] lib/syscall: Pin the task stack in collect_syscall()

2016-09-15 Thread Andy Lutomirski
This will avoid a potential read-after-free if collect_syscall()
(e.g. /proc/PID/syscall) is called on an exiting task.

Reported-by: Jann Horn 
Signed-off-by: Andy Lutomirski 
---
 lib/syscall.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/lib/syscall.c b/lib/syscall.c
index e30e03932480..63239e097b13 100644
--- a/lib/syscall.c
+++ b/lib/syscall.c
@@ -7,9 +7,19 @@ static int collect_syscall(struct task_struct *target, long 
*callno,
   unsigned long args[6], unsigned int maxargs,
   unsigned long *sp, unsigned long *pc)
 {
-   struct pt_regs *regs = task_pt_regs(target);
-   if (unlikely(!regs))
+   struct pt_regs *regs;
+
+   if (!try_get_task_stack(target)) {
+   /* Task has no stack, so the task isn't in a syscall. */
+   *callno = -1;
+   return 0;
+   }
+
+   regs = task_pt_regs(target);
+   if (unlikely(!regs)) {
+   put_task_stack(target);
return -EAGAIN;
+   }
 
*sp = user_stack_pointer(regs);
*pc = instruction_pointer(regs);
@@ -18,6 +28,7 @@ static int collect_syscall(struct task_struct *target, long 
*callno,
if (*callno != -1L && maxargs > 0)
syscall_get_arguments(target, regs, 0, maxargs, args);
 
+   put_task_stack(target);
return 0;
 }
 
-- 
2.7.4



[PATCH v2 8/8] fork: Cache two thread stacks per cpu if CONFIG_VMAP_STACK is set

2016-09-15 Thread Andy Lutomirski
vmalloc is a bit slow, and pounding vmalloc/vfree will eventually
force a global TLB flush.

To reduce pressure on them, if CONFIG_VMAP_STACK, cache two thread
stacks per cpu.  This will let us quickly allocate a hopefully
cache-hot, TLB-hot stack under heavy forking workloads (shell script
style).

On my silly pthread_create benchmark, it saves about 2 µs per
pthread_create+join with CONFIG_VMAP_STACK=y.

Signed-off-by: Andy Lutomirski 
---
 kernel/fork.c | 62 ++-
 1 file changed, 53 insertions(+), 9 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 5dd0a516626d..2d44a9d05218 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -159,15 +159,41 @@ void __weak arch_release_thread_stack(unsigned long 
*stack)
  * kmemcache based allocator.
  */
 # if THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK)
+
+#ifdef CONFIG_VMAP_STACK
+/*
+ * vmalloc is a bit slow, and calling vfree enough times will force a TLB
+ * flush.  Try to minimize the number of calls by caching stacks.
+ */
+#define NR_CACHED_STACKS 2
+static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]);
+#endif
+
 static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int 
node)
 {
 #ifdef CONFIG_VMAP_STACK
-   void *stack = __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE,
-  VMALLOC_START, VMALLOC_END,
-  THREADINFO_GFP | __GFP_HIGHMEM,
-  PAGE_KERNEL,
-  0, node,
-  __builtin_return_address(0));
+   void *stack;
+   int i;
+
+   local_irq_disable();
+   for (i = 0; i < NR_CACHED_STACKS; i++) {
+   struct vm_struct *s = this_cpu_read(cached_stacks[i]);
+
+   if (!s)
+   continue;
+   this_cpu_write(cached_stacks[i], NULL);
+
+   tsk->stack_vm_area = s;
+   local_irq_enable();
+   return s->addr;
+   }
+   local_irq_enable();
+
+   stack = __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE,
+VMALLOC_START, VMALLOC_END,
+THREADINFO_GFP | __GFP_HIGHMEM,
+PAGE_KERNEL,
+0, node, __builtin_return_address(0));
 
/*
 * We can't call find_vm_area() in interrupt context, and
@@ -187,10 +213,28 @@ static unsigned long *alloc_thread_stack_node(struct 
task_struct *tsk, int node)
 
 static inline void free_thread_stack(struct task_struct *tsk)
 {
-   if (task_stack_vm_area(tsk))
+#ifdef CONFIG_VMAP_STACK
+   if (task_stack_vm_area(tsk)) {
+   unsigned long flags;
+   int i;
+
+   local_irq_save(flags);
+   for (i = 0; i < NR_CACHED_STACKS; i++) {
+   if (this_cpu_read(cached_stacks[i]))
+   continue;
+
+   this_cpu_write(cached_stacks[i], tsk->stack_vm_area);
+   local_irq_restore(flags);
+   return;
+   }
+   local_irq_restore(flags);
+
vfree(tsk->stack);
-   else
-   __free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER);
+   return;
+   }
+#endif
+
+   __free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER);
 }
 # else
 static struct kmem_cache *thread_stack_cache;
-- 
2.7.4



[PATCH v2 7/8] sched: Free the stack early if CONFIG_THREAD_INFO_IN_TASK

2016-09-15 Thread Andy Lutomirski
We currently keep every task's stack around until the task_struct
itself is freed.  This means that we keep the stack allocation alive
for longer than necessary and that, under load, we free stacks in
big batches whenever RCU drops the last task reference.  Neither of
these is good for reuse of cache-hot memory, and freeing in batches
prevents us from usefully caching small numbers of vmalloced stacks.

On architectures that have thread_info on the stack, we can't easily
change this, but on architectures that set THREAD_INFO_IN_TASK, we
can free it as soon as the task is dead.

Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Signed-off-by: Andy Lutomirski 
---
 include/linux/init_task.h |  4 +++-
 include/linux/sched.h | 14 ++
 kernel/fork.c | 35 ++-
 kernel/sched/core.c   |  4 
 4 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 9c04d44eeb3c..325f649d77ff 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -186,7 +186,9 @@ extern struct task_group root_task_group;
 #endif
 
 #ifdef CONFIG_THREAD_INFO_IN_TASK
-# define INIT_TASK_TI(tsk) .thread_info = INIT_THREAD_INFO(tsk),
+# define INIT_TASK_TI(tsk) \
+   .thread_info = INIT_THREAD_INFO(tsk),   \
+   .stack_refcount = ATOMIC_INIT(1),
 #else
 # define INIT_TASK_TI(tsk)
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a95867267e9f..abb795afc823 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1936,6 +1936,10 @@ struct task_struct {
 #ifdef CONFIG_VMAP_STACK
struct vm_struct *stack_vm_area;
 #endif
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+   /* A live task holds one reference. */
+   atomic_t stack_refcount;
+#endif
 /* CPU-specific state of this task */
struct thread_struct thread;
 /*
@@ -3143,12 +3147,22 @@ static inline unsigned long *end_of_stack(struct 
task_struct *p)
 
 #endif
 
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+static inline void *try_get_task_stack(struct task_struct *tsk)
+{
+   return atomic_inc_not_zero(>stack_refcount) ?
+   task_stack_page(tsk) : NULL;
+}
+
+extern void put_task_stack(struct task_struct *tsk);
+#else
 static inline void *try_get_task_stack(struct task_struct *tsk)
 {
return task_stack_page(tsk);
 }
 
 static inline void put_task_stack(struct task_struct *tsk) {}
+#endif
 
 #define task_stack_end_corrupted(task) \
(*(end_of_stack(task)) != STACK_END_MAGIC)
diff --git a/kernel/fork.c b/kernel/fork.c
index 0c240fd5beba..5dd0a516626d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -269,11 +269,40 @@ static void account_kernel_stack(struct task_struct *tsk, 
int account)
}
 }
 
-void free_task(struct task_struct *tsk)
+static void release_task_stack(struct task_struct *tsk)
 {
account_kernel_stack(tsk, -1);
arch_release_thread_stack(tsk->stack);
free_thread_stack(tsk);
+   tsk->stack = NULL;
+#ifdef CONFIG_VMAP_STACK
+   tsk->stack_vm_area = NULL;
+#endif
+}
+
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+void put_task_stack(struct task_struct *tsk)
+{
+   if (atomic_dec_and_test(>stack_refcount))
+   release_task_stack(tsk);
+}
+#endif
+
+void free_task(struct task_struct *tsk)
+{
+#ifndef CONFIG_THREAD_INFO_IN_TASK
+   /*
+* The task is finally done with both the stack and thread_info,
+* so free both.
+*/
+   release_task_stack(tsk);
+#else
+   /*
+* If the task had a separate stack allocation, it should be gone
+* by now.
+*/
+   WARN_ON_ONCE(atomic_read(>stack_refcount) != 0);
+#endif
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
put_seccomp_filter(tsk);
@@ -411,6 +440,9 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
 #ifdef CONFIG_VMAP_STACK
tsk->stack_vm_area = stack_vm_area;
 #endif
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+   atomic_set(>stack_refcount, 1);
+#endif
 
if (err)
goto free_stack;
@@ -1771,6 +1803,7 @@ bad_fork_cleanup_count:
atomic_dec(>cred->user->processes);
exit_creds(p);
 bad_fork_free:
+   put_task_stack(p);
free_task(p);
 fork_out:
return ERR_PTR(retval);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0b6238f18da2..23c6037e2d89 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2772,6 +2772,10 @@ static struct rq *finish_task_switch(struct task_struct 
*prev)
 * task and put them back on the free list.
 */
kprobe_flush_task(prev);
+
+   /* Task is done with its stack. */
+   put_task_stack(prev);
+
put_task_struct(prev);
}
 
-- 
2.7.4



[PATCH v2 5/8] x86/process: Pin the target stack in get_wchan()

2016-09-15 Thread Andy Lutomirski
This will prevent a crash if get_wchan() runs after the task stack
is freed.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/process.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0b9ed8ec5226..4002b475171c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -532,15 +532,18 @@ unsigned long thread_saved_pc(struct task_struct *tsk)
  */
 unsigned long get_wchan(struct task_struct *p)
 {
-   unsigned long start, bottom, top, sp, fp, ip;
+   unsigned long start, bottom, top, sp, fp, ip, ret = 0;
int count = 0;
 
if (!p || p == current || p->state == TASK_RUNNING)
return 0;
 
+   if (!try_get_task_stack(p))
+   return 0;
+
start = (unsigned long)task_stack_page(p);
if (!start)
-   return 0;
+   goto out;
 
/*
 * Layout of the stack page:
@@ -564,16 +567,21 @@ unsigned long get_wchan(struct task_struct *p)
 
sp = READ_ONCE(p->thread.sp);
if (sp < bottom || sp > top)
-   return 0;
+   goto out;
 
fp = READ_ONCE_NOCHECK(((struct inactive_task_frame *)sp)->bp);
do {
if (fp < bottom || fp > top)
-   return 0;
+   goto out;
ip = READ_ONCE_NOCHECK(*(unsigned long *)(fp + sizeof(unsigned 
long)));
-   if (!in_sched_functions(ip))
-   return ip;
+   if (!in_sched_functions(ip)) {
+   ret = ip;
+   goto out;
+   }
fp = READ_ONCE_NOCHECK(*(unsigned long *)fp);
} while (count++ < 16 && p->state != TASK_RUNNING);
-   return 0;
+
+out:
+   put_task_stack(p);
+   return ret;
 }
-- 
2.7.4



[PATCH v2 7/8] sched: Free the stack early if CONFIG_THREAD_INFO_IN_TASK

2016-09-15 Thread Andy Lutomirski
We currently keep every task's stack around until the task_struct
itself is freed.  This means that we keep the stack allocation alive
for longer than necessary and that, under load, we free stacks in
big batches whenever RCU drops the last task reference.  Neither of
these is good for reuse of cache-hot memory, and freeing in batches
prevents us from usefully caching small numbers of vmalloced stacks.

On architectures that have thread_info on the stack, we can't easily
change this, but on architectures that set THREAD_INFO_IN_TASK, we
can free it as soon as the task is dead.

Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Signed-off-by: Andy Lutomirski 
---
 include/linux/init_task.h |  4 +++-
 include/linux/sched.h | 14 ++
 kernel/fork.c | 35 ++-
 kernel/sched/core.c   |  4 
 4 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 9c04d44eeb3c..325f649d77ff 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -186,7 +186,9 @@ extern struct task_group root_task_group;
 #endif
 
 #ifdef CONFIG_THREAD_INFO_IN_TASK
-# define INIT_TASK_TI(tsk) .thread_info = INIT_THREAD_INFO(tsk),
+# define INIT_TASK_TI(tsk) \
+   .thread_info = INIT_THREAD_INFO(tsk),   \
+   .stack_refcount = ATOMIC_INIT(1),
 #else
 # define INIT_TASK_TI(tsk)
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a95867267e9f..abb795afc823 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1936,6 +1936,10 @@ struct task_struct {
 #ifdef CONFIG_VMAP_STACK
struct vm_struct *stack_vm_area;
 #endif
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+   /* A live task holds one reference. */
+   atomic_t stack_refcount;
+#endif
 /* CPU-specific state of this task */
struct thread_struct thread;
 /*
@@ -3143,12 +3147,22 @@ static inline unsigned long *end_of_stack(struct 
task_struct *p)
 
 #endif
 
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+static inline void *try_get_task_stack(struct task_struct *tsk)
+{
+   return atomic_inc_not_zero(>stack_refcount) ?
+   task_stack_page(tsk) : NULL;
+}
+
+extern void put_task_stack(struct task_struct *tsk);
+#else
 static inline void *try_get_task_stack(struct task_struct *tsk)
 {
return task_stack_page(tsk);
 }
 
 static inline void put_task_stack(struct task_struct *tsk) {}
+#endif
 
 #define task_stack_end_corrupted(task) \
(*(end_of_stack(task)) != STACK_END_MAGIC)
diff --git a/kernel/fork.c b/kernel/fork.c
index 0c240fd5beba..5dd0a516626d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -269,11 +269,40 @@ static void account_kernel_stack(struct task_struct *tsk, 
int account)
}
 }
 
-void free_task(struct task_struct *tsk)
+static void release_task_stack(struct task_struct *tsk)
 {
account_kernel_stack(tsk, -1);
arch_release_thread_stack(tsk->stack);
free_thread_stack(tsk);
+   tsk->stack = NULL;
+#ifdef CONFIG_VMAP_STACK
+   tsk->stack_vm_area = NULL;
+#endif
+}
+
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+void put_task_stack(struct task_struct *tsk)
+{
+   if (atomic_dec_and_test(>stack_refcount))
+   release_task_stack(tsk);
+}
+#endif
+
+void free_task(struct task_struct *tsk)
+{
+#ifndef CONFIG_THREAD_INFO_IN_TASK
+   /*
+* The task is finally done with both the stack and thread_info,
+* so free both.
+*/
+   release_task_stack(tsk);
+#else
+   /*
+* If the task had a separate stack allocation, it should be gone
+* by now.
+*/
+   WARN_ON_ONCE(atomic_read(>stack_refcount) != 0);
+#endif
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
put_seccomp_filter(tsk);
@@ -411,6 +440,9 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
 #ifdef CONFIG_VMAP_STACK
tsk->stack_vm_area = stack_vm_area;
 #endif
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+   atomic_set(>stack_refcount, 1);
+#endif
 
if (err)
goto free_stack;
@@ -1771,6 +1803,7 @@ bad_fork_cleanup_count:
atomic_dec(>cred->user->processes);
exit_creds(p);
 bad_fork_free:
+   put_task_stack(p);
free_task(p);
 fork_out:
return ERR_PTR(retval);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0b6238f18da2..23c6037e2d89 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2772,6 +2772,10 @@ static struct rq *finish_task_switch(struct task_struct 
*prev)
 * task and put them back on the free list.
 */
kprobe_flush_task(prev);
+
+   /* Task is done with its stack. */
+   put_task_stack(prev);
+
put_task_struct(prev);
}
 
-- 
2.7.4



[PATCH v2 5/8] x86/process: Pin the target stack in get_wchan()

2016-09-15 Thread Andy Lutomirski
This will prevent a crash if get_wchan() runs after the task stack
is freed.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/process.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0b9ed8ec5226..4002b475171c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -532,15 +532,18 @@ unsigned long thread_saved_pc(struct task_struct *tsk)
  */
 unsigned long get_wchan(struct task_struct *p)
 {
-   unsigned long start, bottom, top, sp, fp, ip;
+   unsigned long start, bottom, top, sp, fp, ip, ret = 0;
int count = 0;
 
if (!p || p == current || p->state == TASK_RUNNING)
return 0;
 
+   if (!try_get_task_stack(p))
+   return 0;
+
start = (unsigned long)task_stack_page(p);
if (!start)
-   return 0;
+   goto out;
 
/*
 * Layout of the stack page:
@@ -564,16 +567,21 @@ unsigned long get_wchan(struct task_struct *p)
 
sp = READ_ONCE(p->thread.sp);
if (sp < bottom || sp > top)
-   return 0;
+   goto out;
 
fp = READ_ONCE_NOCHECK(((struct inactive_task_frame *)sp)->bp);
do {
if (fp < bottom || fp > top)
-   return 0;
+   goto out;
ip = READ_ONCE_NOCHECK(*(unsigned long *)(fp + sizeof(unsigned 
long)));
-   if (!in_sched_functions(ip))
-   return ip;
+   if (!in_sched_functions(ip)) {
+   ret = ip;
+   goto out;
+   }
fp = READ_ONCE_NOCHECK(*(unsigned long *)fp);
} while (count++ < 16 && p->state != TASK_RUNNING);
-   return 0;
+
+out:
+   put_task_stack(p);
+   return ret;
 }
-- 
2.7.4



[PATCH v2 2/8] sched: Add try_get_task_stack() and put_task_stack()

2016-09-15 Thread Andy Lutomirski
There are a few places in the kernel that access stack memory
belonging to a different task.  Before we can start freeing task
stacks before the task_struct is freed, we need a way for those code
paths to pin the stack.

Signed-off-by: Andy Lutomirski 
---
 include/linux/sched.h | 16 
 init/Kconfig  |  3 +++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a287e8b13549..a95867267e9f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3094,11 +3094,19 @@ static inline struct thread_info 
*task_thread_info(struct task_struct *task)
 {
return >thread_info;
 }
+
+/*
+ * When accessing the stack of a non-current task that might exit, use
+ * try_get_task_stack() instead.  task_stack_page will return a pointer
+ * that could get freed out from under you.
+ */
 static inline void *task_stack_page(const struct task_struct *task)
 {
return task->stack;
 }
+
 #define setup_thread_stack(new,old)do { } while(0)
+
 static inline unsigned long *end_of_stack(const struct task_struct *task)
 {
return task->stack;
@@ -3134,6 +3142,14 @@ static inline unsigned long *end_of_stack(struct 
task_struct *p)
 }
 
 #endif
+
+static inline void *try_get_task_stack(struct task_struct *tsk)
+{
+   return task_stack_page(tsk);
+}
+
+static inline void put_task_stack(struct task_struct *tsk) {}
+
 #define task_stack_end_corrupted(task) \
(*(end_of_stack(task)) != STACK_END_MAGIC)
 
diff --git a/init/Kconfig b/init/Kconfig
index ec8d43894b02..3b9a47fe843b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -33,6 +33,9 @@ config THREAD_INFO_IN_TASK
  make this work, an arch will need to remove all thread_info fields
  except flags and fix any runtime bugs.
 
+ One subtle change that will be needed is to use try_get_task_stack()
+ and put_task_stack() in save_thread_stack_tsk() and get_wchan().
+
 menu "General setup"
 
 config BROKEN
-- 
2.7.4



[PATCH v2 1/8] x86/entry/64: Fix a minor comment rebase error

2016-09-15 Thread Andy Lutomirski
When I rebased my thread_info changes onto Brian's switch_to()
changes, I carefully checked that I fixed up all the code correctly,
but I missed a comment :(

Fixes: 15f4eae70d36 ("x86: Move thread_info into task_struct")
Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_64.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 2b46384b4a4f..80ab68a42621 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -372,7 +372,6 @@ END(ptregs_\func)
 /*
  * %rdi: prev task
  * %rsi: next task
- * rsi: task we're switching to
  */
 ENTRY(__switch_to_asm)
/*
-- 
2.7.4



[PATCH v2 2/8] sched: Add try_get_task_stack() and put_task_stack()

2016-09-15 Thread Andy Lutomirski
There are a few places in the kernel that access stack memory
belonging to a different task.  Before we can start freeing task
stacks before the task_struct is freed, we need a way for those code
paths to pin the stack.

Signed-off-by: Andy Lutomirski 
---
 include/linux/sched.h | 16 
 init/Kconfig  |  3 +++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a287e8b13549..a95867267e9f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3094,11 +3094,19 @@ static inline struct thread_info 
*task_thread_info(struct task_struct *task)
 {
return >thread_info;
 }
+
+/*
+ * When accessing the stack of a non-current task that might exit, use
+ * try_get_task_stack() instead.  task_stack_page will return a pointer
+ * that could get freed out from under you.
+ */
 static inline void *task_stack_page(const struct task_struct *task)
 {
return task->stack;
 }
+
 #define setup_thread_stack(new,old)do { } while(0)
+
 static inline unsigned long *end_of_stack(const struct task_struct *task)
 {
return task->stack;
@@ -3134,6 +3142,14 @@ static inline unsigned long *end_of_stack(struct 
task_struct *p)
 }
 
 #endif
+
+static inline void *try_get_task_stack(struct task_struct *tsk)
+{
+   return task_stack_page(tsk);
+}
+
+static inline void put_task_stack(struct task_struct *tsk) {}
+
 #define task_stack_end_corrupted(task) \
(*(end_of_stack(task)) != STACK_END_MAGIC)
 
diff --git a/init/Kconfig b/init/Kconfig
index ec8d43894b02..3b9a47fe843b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -33,6 +33,9 @@ config THREAD_INFO_IN_TASK
  make this work, an arch will need to remove all thread_info fields
  except flags and fix any runtime bugs.
 
+ One subtle change that will be needed is to use try_get_task_stack()
+ and put_task_stack() in save_thread_stack_tsk() and get_wchan().
+
 menu "General setup"
 
 config BROKEN
-- 
2.7.4



[PATCH v2 1/8] x86/entry/64: Fix a minor comment rebase error

2016-09-15 Thread Andy Lutomirski
When I rebased my thread_info changes onto Brian's switch_to()
changes, I carefully checked that I fixed up all the code correctly,
but I missed a comment :(

Fixes: 15f4eae70d36 ("x86: Move thread_info into task_struct")
Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_64.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 2b46384b4a4f..80ab68a42621 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -372,7 +372,6 @@ END(ptregs_\func)
 /*
  * %rdi: prev task
  * %rsi: next task
- * rsi: task we're switching to
  */
 ENTRY(__switch_to_asm)
/*
-- 
2.7.4



[PATCH v2 0/8] thread_info cleanups and stack caching

2016-09-15 Thread Andy Lutomirski
This is the last bit of the vmap stack pile.  Now that thread_info
is non-magical, we can free the thread stack as soon as the task is
dead (without waiting for RCU) and then, if vmapped stacks are in
use, cache the entire stack for reuse on the same cpu.

This seems to be an overall speedup of about 0.5-1 µs per
pthread_create/join compared to the old CONFIG_VMAP_STACK=n baseline
in a simple test -- a percpu cache of vmalloced stacks appears to be
a bit faster than a high-order stack allocation, at least when the
cache hits.  (I expect that workloads with a low cache hit rate are
likely to be dominated by other effects anyway.)

Changes from v1:
 - Rebased.
 - Added a comment fixup (patch 1).
 - Add one more try_get_task_stack() that Josh noticed.

Changes from before:
 - A bunch of the series is already in 4.8-rc.
 - Added the get_wchan() and collect_syscall() patches.
 - Rebased.

Andy Lutomirski (7):
  x86/entry/64: Fix a minor comment rebase error
  sched: Add try_get_task_stack() and put_task_stack()
  x86/dumpstack: Pin the target stack when dumping it
  x86/process: Pin the target stack in get_wchan()
  lib/syscall: Pin the task stack in collect_syscall()
  sched: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
  fork: Cache two thread stacks per cpu if CONFIG_VMAP_STACK is set

Oleg Nesterov (1):
  kthread: to_live_kthread() needs try_get_task_stack()

 arch/x86/entry/entry_64.S  |  1 -
 arch/x86/kernel/dumpstack_32.c |  5 +++
 arch/x86/kernel/dumpstack_64.c |  5 +++
 arch/x86/kernel/process.c  | 22 +++---
 arch/x86/kernel/stacktrace.c   |  5 +++
 include/linux/init_task.h  |  4 +-
 include/linux/sched.h  | 30 +
 init/Kconfig   |  3 ++
 kernel/fork.c  | 97 +-
 kernel/kthread.c   |  8 +++-
 kernel/sched/core.c|  4 ++
 lib/syscall.c  | 15 ++-
 12 files changed, 176 insertions(+), 23 deletions(-)

-- 
2.7.4



[PATCH v2 0/8] thread_info cleanups and stack caching

2016-09-15 Thread Andy Lutomirski
This is the last bit of the vmap stack pile.  Now that thread_info
is non-magical, we can free the thread stack as soon as the task is
dead (without waiting for RCU) and then, if vmapped stacks are in
use, cache the entire stack for reuse on the same cpu.

This seems to be an overall speedup of about 0.5-1 µs per
pthread_create/join compared to the old CONFIG_VMAP_STACK=n baseline
in a simple test -- a percpu cache of vmalloced stacks appears to be
a bit faster than a high-order stack allocation, at least when the
cache hits.  (I expect that workloads with a low cache hit rate are
likely to be dominated by other effects anyway.)

Changes from v1:
 - Rebased.
 - Added a comment fixup (patch 1).
 - Add one more try_get_task_stack() that Josh noticed.

Changes from before:
 - A bunch of the series is already in 4.8-rc.
 - Added the get_wchan() and collect_syscall() patches.
 - Rebased.

Andy Lutomirski (7):
  x86/entry/64: Fix a minor comment rebase error
  sched: Add try_get_task_stack() and put_task_stack()
  x86/dumpstack: Pin the target stack when dumping it
  x86/process: Pin the target stack in get_wchan()
  lib/syscall: Pin the task stack in collect_syscall()
  sched: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
  fork: Cache two thread stacks per cpu if CONFIG_VMAP_STACK is set

Oleg Nesterov (1):
  kthread: to_live_kthread() needs try_get_task_stack()

 arch/x86/entry/entry_64.S  |  1 -
 arch/x86/kernel/dumpstack_32.c |  5 +++
 arch/x86/kernel/dumpstack_64.c |  5 +++
 arch/x86/kernel/process.c  | 22 +++---
 arch/x86/kernel/stacktrace.c   |  5 +++
 include/linux/init_task.h  |  4 +-
 include/linux/sched.h  | 30 +
 init/Kconfig   |  3 ++
 kernel/fork.c  | 97 +-
 kernel/kthread.c   |  8 +++-
 kernel/sched/core.c|  4 ++
 lib/syscall.c  | 15 ++-
 12 files changed, 176 insertions(+), 23 deletions(-)

-- 
2.7.4



Re: [PATCH v2 2/3] mm, dax: add VM_DAX flag for DAX VMAs

2016-09-15 Thread Dave Chinner
On Thu, Sep 15, 2016 at 07:04:27PM -0700, Dan Williams wrote:
> On Thu, Sep 15, 2016 at 6:24 PM, Dave Chinner  wrote:
> > On Thu, Sep 15, 2016 at 05:16:42PM -0700, Dan Williams wrote:
> >> On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner  wrote:
> >> > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote:
> >> >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig  wrote:
> >> >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote:
> >> >> >> The DAX property, page cache bypass, of a VMA is only detectable via 
> >> >> >> the
> >> >> >> vma_is_dax() helper to check the S_DAX inode flag.  However, this is
> >> >> >> only available internal to the kernel and is a property that 
> >> >> >> userspace
> >> >> >> applications would like to interrogate.
> >> >> >
> >> >> > They have absolutely no business knowing such an implementation 
> >> >> > detail.
> >> >>
> >> >> Hasn't that train already left the station with FS_XFLAG_DAX?
> >> >
> >> > No, that's an admin flag, not a runtime hint for applications. Just
> >> > because that flag is set on an inode, it does not mean that DAX is
> >> > actually in use - it will be ignored if the backing dev is not dax
> >> > capable.
> >>
> >> What's the point of an admin flag if an admin can't do cat /proc/ >> of interest>/smaps, or some other mechanism, to validate that the
> >> setting the admin cares about is in effect?
> >
> > Sorry, I don't follow - why would you be looking at mapping file
> > regions in /proc to determine if some file somewhere in a filesystem
> > has a specific flag set on it or not?
> >
> > FS_XFLAG_DAX is an inode attribute flag, not something you can
> > query or administrate through mmap:
> >
> > I.e.
> > # xfs_io -c "lsattr" -c "chattr +x" -c lsattr -c "chattr -x" -c "lsattr" foo
> >  --- foo
> >  --x foo
> >  --- foo
> > #
> >
> > What happens when that flag is set on an inode is determined by a
> > whole bunch of other things that are completely separate to the
> > management of the inode flag itself.
> 
> Right, I understand that, but how does an admin audit those "bunch of
> other things"

Filesystem mounts checks all the various stuff that determines
whether DAX can be used. It logs to the console that it is "Dax
capable". Any file that then has FS_XFLAG_DAX set will result in DAX
being used. There is no other possibility when these two things are
reported.

/me points at runtime diagnostic tracepoints like
trace_xfs_file_dax_read() and notes that dax is sadly lacking in
diagnostic tracepoints.

Besides, userspace can't do anything useful with this information,
because the FS_XFLAG_DAX can be changed /at any time/ by an admin.
And the filesystem is free to remove it at any time, too, if it
needs to (e.g. file gets reflinked or snapshotted).

That's right, an inode can dynamically change from DAX to non-DAX
underneath the application, and the application /will not notice/.
That's because changing the flag will sync and invalidate the
existing mappings and the next application access will simply fault
it back in using whatever mechanism the inode is now configured
with.

Plain and simple: userspace has absolutely no fucking idea of
whether DAX is enabled or not, and whatever the kernel returns to
userspace above the DAX configuration is stale before it even got
out of the kernel

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v2 2/3] mm, dax: add VM_DAX flag for DAX VMAs

2016-09-15 Thread Dave Chinner
On Thu, Sep 15, 2016 at 07:04:27PM -0700, Dan Williams wrote:
> On Thu, Sep 15, 2016 at 6:24 PM, Dave Chinner  wrote:
> > On Thu, Sep 15, 2016 at 05:16:42PM -0700, Dan Williams wrote:
> >> On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner  wrote:
> >> > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote:
> >> >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig  wrote:
> >> >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote:
> >> >> >> The DAX property, page cache bypass, of a VMA is only detectable via 
> >> >> >> the
> >> >> >> vma_is_dax() helper to check the S_DAX inode flag.  However, this is
> >> >> >> only available internal to the kernel and is a property that 
> >> >> >> userspace
> >> >> >> applications would like to interrogate.
> >> >> >
> >> >> > They have absolutely no business knowing such an implementation 
> >> >> > detail.
> >> >>
> >> >> Hasn't that train already left the station with FS_XFLAG_DAX?
> >> >
> >> > No, that's an admin flag, not a runtime hint for applications. Just
> >> > because that flag is set on an inode, it does not mean that DAX is
> >> > actually in use - it will be ignored if the backing dev is not dax
> >> > capable.
> >>
> >> What's the point of an admin flag if an admin can't do cat /proc/ >> of interest>/smaps, or some other mechanism, to validate that the
> >> setting the admin cares about is in effect?
> >
> > Sorry, I don't follow - why would you be looking at mapping file
> > regions in /proc to determine if some file somewhere in a filesystem
> > has a specific flag set on it or not?
> >
> > FS_XFLAG_DAX is an inode attribute flag, not something you can
> > query or administrate through mmap:
> >
> > I.e.
> > # xfs_io -c "lsattr" -c "chattr +x" -c lsattr -c "chattr -x" -c "lsattr" foo
> >  --- foo
> >  --x foo
> >  --- foo
> > #
> >
> > What happens when that flag is set on an inode is determined by a
> > whole bunch of other things that are completely separate to the
> > management of the inode flag itself.
> 
> Right, I understand that, but how does an admin audit those "bunch of
> other things"

Filesystem mounts checks all the various stuff that determines
whether DAX can be used. It logs to the console that it is "Dax
capable". Any file that then has FS_XFLAG_DAX set will result in DAX
being used. There is no other possibility when these two things are
reported.

/me points at runtime diagnostic tracepoints like
trace_xfs_file_dax_read() and notes that dax is sadly lacking in
diagnostic tracepoints.

Besides, userspace can't do anything useful with this information,
because the FS_XFLAG_DAX can be changed /at any time/ by an admin.
And the filesystem is free to remove it at any time, too, if it
needs to (e.g. file gets reflinked or snapshotted).

That's right, an inode can dynamically change from DAX to non-DAX
underneath the application, and the application /will not notice/.
That's because changing the flag will sync and invalidate the
existing mappings and the next application access will simply fault
it back in using whatever mechanism the inode is now configured
with.

Plain and simple: userspace has absolutely no fucking idea of
whether DAX is enabled or not, and whatever the kernel returns to
userspace above the DAX configuration is stale before it even got
out of the kernel

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: clk/Renesas-MSTP: Less function calls in cpg_mstp_clocks_init() after error detection

2016-09-15 Thread SF Markus Elfring
>> * Should it usually be determined quicker if a required resource like
>>   memory could be acquired before trying the next allocation?
> 
> Note that if memory allocation fails in this driver, the system won't
> boot at all.

Thanks for this information.


> So even not checking for allocation failures at all could be acceptable.

I find this opinion interesting somehow.

I would generally prefer to check return values from various function calls
immediately instead of keeping the discussed source code structure unchanged.

Regards,
Markus


Re: clk/Renesas-MSTP: Less function calls in cpg_mstp_clocks_init() after error detection

2016-09-15 Thread SF Markus Elfring
>> * Should it usually be determined quicker if a required resource like
>>   memory could be acquired before trying the next allocation?
> 
> Note that if memory allocation fails in this driver, the system won't
> boot at all.

Thanks for this information.


> So even not checking for allocation failures at all could be acceptable.

I find this opinion interesting somehow.

I would generally prefer to check return values from various function calls
immediately instead of keeping the discussed source code structure unchanged.

Regards,
Markus


Re: [PATCH v3 3/3] x86,arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction

2016-09-15 Thread Kyle Huey
On Thu, Sep 15, 2016 at 5:07 PM, Andy Lutomirski  wrote:
> On Thu, Sep 15, 2016 at 4:33 PM, Kyle Huey  wrote:
>> +int get_cpuid_mode(unsigned long adr)
>> +{
>> +   unsigned int val;
>> +
>> +   if (test_thread_flag(TIF_NOCPUID))
>> +   val = ARCH_CPUID_SIGSEGV;
>> +   else
>> +   val = ARCH_CPUID_ENABLE;
>> +
>> +   return put_user(val, (unsigned int __user *)adr);
>> +}
>
> Can we just do:
>
> if (arg2 != 0)
>   return -EINVAL;
> else
>  return test_thread_flag(TIF_NOCPUID) ? ARCH_CPUID_SIGSEGBV : 
> ARCH_CPUID_ENABLE;

We could.  I copied the pattern of PR_GET_TSC here, but I don't feel
strongly about it.

>> diff --git a/tools/testing/selftests/x86/cpuid-fault.c 
>> b/tools/testing/selftests/x86/cpuid-fault.c
>> new file mode 100644
>> index 000..a9f3f68
>> --- /dev/null
>> +++ b/tools/testing/selftests/x86/cpuid-fault.c
>> @@ -0,0 +1,234 @@
>> +
>> +/*
>> + * Tests for arch_prctl(ARCH_GET_CPUID, ...) / prctl(ARCH_SET_CPUID, ...)
>> + *
>> + * Basic test to test behaviour of ARCH_GET_CPUID and ARCH_SET_CPUID
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +#include 
>> +
>> +const char *cpuid_names[] = {
>> +   [0] = "[not set]",
>
> Is 0 even possible?

Only if the call fails.

- Kyle


Re: [PATCH v3 3/3] x86,arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction

2016-09-15 Thread Kyle Huey
On Thu, Sep 15, 2016 at 5:07 PM, Andy Lutomirski  wrote:
> On Thu, Sep 15, 2016 at 4:33 PM, Kyle Huey  wrote:
>> +int get_cpuid_mode(unsigned long adr)
>> +{
>> +   unsigned int val;
>> +
>> +   if (test_thread_flag(TIF_NOCPUID))
>> +   val = ARCH_CPUID_SIGSEGV;
>> +   else
>> +   val = ARCH_CPUID_ENABLE;
>> +
>> +   return put_user(val, (unsigned int __user *)adr);
>> +}
>
> Can we just do:
>
> if (arg2 != 0)
>   return -EINVAL;
> else
>  return test_thread_flag(TIF_NOCPUID) ? ARCH_CPUID_SIGSEGBV : 
> ARCH_CPUID_ENABLE;

We could.  I copied the pattern of PR_GET_TSC here, but I don't feel
strongly about it.

>> diff --git a/tools/testing/selftests/x86/cpuid-fault.c 
>> b/tools/testing/selftests/x86/cpuid-fault.c
>> new file mode 100644
>> index 000..a9f3f68
>> --- /dev/null
>> +++ b/tools/testing/selftests/x86/cpuid-fault.c
>> @@ -0,0 +1,234 @@
>> +
>> +/*
>> + * Tests for arch_prctl(ARCH_GET_CPUID, ...) / prctl(ARCH_SET_CPUID, ...)
>> + *
>> + * Basic test to test behaviour of ARCH_GET_CPUID and ARCH_SET_CPUID
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +#include 
>> +
>> +const char *cpuid_names[] = {
>> +   [0] = "[not set]",
>
> Is 0 even possible?

Only if the call fails.

- Kyle


Re: [PATCH] tools: iio: fix iio build error by adding KBUILD_OUTPUT support to makefile

2016-09-15 Thread Mugunthan V N
Jonathan

On Monday 22 August 2016 12:35 AM, Jonathan Cameron wrote:
> On 11/08/16 07:01, Mugunthan V N wrote:
>> > Current make doesn't have support to pass kernel built directory
>> > to find events.h kernel header file, so adding support for
>> > KBUILD_OUTPUT support to Makefile.
>> > 
>> > $ make CROSS_COMPILE=arm-linux-gnueabihf- -C tools/iio
>> > make: Entering directory 
>> > '/home/a0131834/workspace/git/mainline/linux/tools/iio'
>> > arm-linux-gnueabihf-gcc -Wall -g -D_GNU_SOURCE   -c -o iio_event_monitor.o 
>> > iio_event_monitor.c
>> > arm-linux-gnueabihf-gcc -Wall -g -D_GNU_SOURCE   -c -o iio_utils.o 
>> > iio_utils.c
>> > arm-linux-gnueabihf-gcc -Wall -g -D_GNU_SOURCE   -c -o lsiio.o lsiio.c
>> > arm-linux-gnueabihf-gcc -Wall -g -D_GNU_SOURCE   -c -o 
>> > iio_generic_buffer.o iio_generic_buffer.c
>> > arm-linux-gnueabihf-gcc   lsiio.o iio_utils.o   -o lsiio
>> > arm-linux-gnueabihf-gcc   iio_generic_buffer.o iio_utils.o   -o 
>> > iio_generic_buffer
>> > iio_event_monitor.c:28:30: fatal error: linux/iio/events.h: No such file 
>> > or directory
>> >  #include 
>> >^
>> > compilation terminated.
>> > : recipe for target 'iio_event_monitor.o' failed
>> > make: *** [iio_event_monitor.o] Error 1
>> > make: Leaving directory 
>> > '/home/a0131834/workspace/git/mainline/linux/tools/iio'
>> > 
>> > Signed-off-by: Mugunthan V N 
> I can see where you are coming from, but wouldn't we normally expect the 
> kernel
> headers for the relevant kernel to available as part of the toolchain being
> used to do the build?

If it is a native build, we can expect toolchain to provide kernel
headers, but for a cross compile user has to provide where the kernel
headers present for the target which is done this patch.

> 
> They often lag a bit I guess, so perhaps we do want to put this little bit
> of 'fudging' in place.
> 
> What do others think?

Any updates on the patch?

Regards
Mugunthan V N


Re: [PATCH] tools: iio: fix iio build error by adding KBUILD_OUTPUT support to makefile

2016-09-15 Thread Mugunthan V N
Jonathan

On Monday 22 August 2016 12:35 AM, Jonathan Cameron wrote:
> On 11/08/16 07:01, Mugunthan V N wrote:
>> > Current make doesn't have support to pass kernel built directory
>> > to find events.h kernel header file, so adding support for
>> > KBUILD_OUTPUT support to Makefile.
>> > 
>> > $ make CROSS_COMPILE=arm-linux-gnueabihf- -C tools/iio
>> > make: Entering directory 
>> > '/home/a0131834/workspace/git/mainline/linux/tools/iio'
>> > arm-linux-gnueabihf-gcc -Wall -g -D_GNU_SOURCE   -c -o iio_event_monitor.o 
>> > iio_event_monitor.c
>> > arm-linux-gnueabihf-gcc -Wall -g -D_GNU_SOURCE   -c -o iio_utils.o 
>> > iio_utils.c
>> > arm-linux-gnueabihf-gcc -Wall -g -D_GNU_SOURCE   -c -o lsiio.o lsiio.c
>> > arm-linux-gnueabihf-gcc -Wall -g -D_GNU_SOURCE   -c -o 
>> > iio_generic_buffer.o iio_generic_buffer.c
>> > arm-linux-gnueabihf-gcc   lsiio.o iio_utils.o   -o lsiio
>> > arm-linux-gnueabihf-gcc   iio_generic_buffer.o iio_utils.o   -o 
>> > iio_generic_buffer
>> > iio_event_monitor.c:28:30: fatal error: linux/iio/events.h: No such file 
>> > or directory
>> >  #include 
>> >^
>> > compilation terminated.
>> > : recipe for target 'iio_event_monitor.o' failed
>> > make: *** [iio_event_monitor.o] Error 1
>> > make: Leaving directory 
>> > '/home/a0131834/workspace/git/mainline/linux/tools/iio'
>> > 
>> > Signed-off-by: Mugunthan V N 
> I can see where you are coming from, but wouldn't we normally expect the 
> kernel
> headers for the relevant kernel to available as part of the toolchain being
> used to do the build?

If it is a native build, we can expect toolchain to provide kernel
headers, but for a cross compile user has to provide where the kernel
headers present for the target which is done this patch.

> 
> They often lag a bit I guess, so perhaps we do want to put this little bit
> of 'fudging' in place.
> 
> What do others think?

Any updates on the patch?

Regards
Mugunthan V N


[PATCH v3 3/3] nvme: Enable autonomous power state transitions

2016-09-15 Thread Andy Lutomirski
NVMe devices can advertise multiple power states.  These states can
be either "operational" (the device is fully functional but possibly
slow) or "non-operational" (the device is asleep until woken up).
Some devices can automatically enter a non-operational state when
idle for a specified amount of time and then automatically wake back
up when needed.

The hardware configuration is a table.  For each state, an entry in
the table indicates the next deeper non-operational state, if any,
to autonomously transition to and the idle time required before
transitioning.

This patch teaches the driver to program APST so that each
successive non-operational state will be entered after an idle time
equal to 100% of the total latency (entry plus exit) associated with
that state.  A sysfs attribute 'ps_max_latency_us' gives the maximum
acceptable latency in ns; non-operational states with total latency
greater than this value will not be used.  As a special case,
ps_max_latency_us=0 will disable APST entirely.  On hardware without
APST support, ps_max_latency_us will not be exposed in sysfs.

The ps_max_latency_us parameter for newly-probed devices is set by
the module parameter nvme_core.default_ps_max_latency_us.

In theory, the device can expose "default" APST table, but this
doesn't seem to function correctly on my device (Samsung 950), nor
does it seem particularly useful.  There is also an optional
mechanism by which a configuration can be "saved" so it will be
automatically loaded on reset.  This can be configured from
userspace, but it doesn't seem useful to support in the driver.

On my laptop, enabling APST seems to save nearly 1W.

The hardware tables can be decoded in userspace with nvme-cli.
'nvme id-ctrl /dev/nvmeN' will show the power state table and
'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
configuration.

I called the parameters ps_max_latency_us instead of
apst_max_latency_us because we might support other power saving
modes (e.g. non-automonous power state transitions or even runtime
D3) and the same parameter could control the maximum allowable
latency for these states as well.

Signed-off-by: Andy Lutomirski 
---
 drivers/nvme/host/core.c | 166 +++
 drivers/nvme/host/nvme.h |   6 ++
 include/linux/nvme.h |   6 ++
 3 files changed, 178 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9260d2971176..8144e383c9f9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -56,6 +56,11 @@ EXPORT_SYMBOL_GPL(nvme_max_retries);
 static int nvme_char_major;
 module_param(nvme_char_major, int, 0);
 
+static unsigned long default_ps_max_latency_us = 25000;
+module_param(default_ps_max_latency_us, ulong, 0644);
+MODULE_PARM_DESC(ps_max_latency_us,
+"default max power saving latency; overridden per device in 
sysfs");
+
 static LIST_HEAD(nvme_ctrl_list);
 static DEFINE_SPINLOCK(dev_list_lock);
 
@@ -1209,6 +1214,98 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
blk_queue_write_cache(q, vwc, vwc);
 }
 
+static void nvme_configure_apst(struct nvme_ctrl *ctrl)
+{
+   /*
+* APST (Autonomous Power State Transition) lets us program a
+* table of power state transitions that the controller will
+* perform automatically.  We configure it with a simple
+* heuristic: we are willing to spend at most 2% of the time
+* transitioning between power states.  Therefore, when running
+* in any given state, we will enter the next lower-power
+* non-operational state after waiting 100 * (enlat + exlat)
+* microseconds, as long as that state's total latency is under
+* the requested maximum latency.
+*
+* We will not autonomously enter any non-operational state for
+* which the total latency exceeds ps_max_latency_us.  Users
+* can set ps_max_latency_us to zero to turn off APST.
+*/
+
+   unsigned apste;
+   struct nvme_feat_auto_pst *table;
+   int ret;
+
+   if (!ctrl->apsta)
+   return; /* APST isn't supported. */
+
+   if (ctrl->npss > 31) {
+   dev_warn(ctrl->device, "NPSS is invalid; not using APST\n");
+   return;
+   }
+
+   table = kzalloc(sizeof(*table), GFP_KERNEL);
+   if (!table)
+   return;
+
+   if (ctrl->ps_max_latency_us == 0) {
+   /* Turn off APST. */
+   apste = 0;
+   } else {
+   __le64 target = cpu_to_le64(0);
+   int state;
+
+   /*
+* Walk through all states from lowest- to highest-power.
+* According to the spec, lower-numbered states use more
+* power.  NPSS, despite the name, is the index of the
+* lowest-power state, not the number of states.
+*/
+   for (state = 

[PATCH v3 3/3] nvme: Enable autonomous power state transitions

2016-09-15 Thread Andy Lutomirski
NVMe devices can advertise multiple power states.  These states can
be either "operational" (the device is fully functional but possibly
slow) or "non-operational" (the device is asleep until woken up).
Some devices can automatically enter a non-operational state when
idle for a specified amount of time and then automatically wake back
up when needed.

The hardware configuration is a table.  For each state, an entry in
the table indicates the next deeper non-operational state, if any,
to autonomously transition to and the idle time required before
transitioning.

This patch teaches the driver to program APST so that each
successive non-operational state will be entered after an idle time
equal to 100% of the total latency (entry plus exit) associated with
that state.  A sysfs attribute 'ps_max_latency_us' gives the maximum
acceptable latency in ns; non-operational states with total latency
greater than this value will not be used.  As a special case,
ps_max_latency_us=0 will disable APST entirely.  On hardware without
APST support, ps_max_latency_us will not be exposed in sysfs.

The ps_max_latency_us parameter for newly-probed devices is set by
the module parameter nvme_core.default_ps_max_latency_us.

In theory, the device can expose "default" APST table, but this
doesn't seem to function correctly on my device (Samsung 950), nor
does it seem particularly useful.  There is also an optional
mechanism by which a configuration can be "saved" so it will be
automatically loaded on reset.  This can be configured from
userspace, but it doesn't seem useful to support in the driver.

On my laptop, enabling APST seems to save nearly 1W.

The hardware tables can be decoded in userspace with nvme-cli.
'nvme id-ctrl /dev/nvmeN' will show the power state table and
'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
configuration.

I called the parameters ps_max_latency_us instead of
apst_max_latency_us because we might support other power saving
modes (e.g. non-automonous power state transitions or even runtime
D3) and the same parameter could control the maximum allowable
latency for these states as well.

Signed-off-by: Andy Lutomirski 
---
 drivers/nvme/host/core.c | 166 +++
 drivers/nvme/host/nvme.h |   6 ++
 include/linux/nvme.h |   6 ++
 3 files changed, 178 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9260d2971176..8144e383c9f9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -56,6 +56,11 @@ EXPORT_SYMBOL_GPL(nvme_max_retries);
 static int nvme_char_major;
 module_param(nvme_char_major, int, 0);
 
+static unsigned long default_ps_max_latency_us = 25000;
+module_param(default_ps_max_latency_us, ulong, 0644);
+MODULE_PARM_DESC(ps_max_latency_us,
+"default max power saving latency; overridden per device in 
sysfs");
+
 static LIST_HEAD(nvme_ctrl_list);
 static DEFINE_SPINLOCK(dev_list_lock);
 
@@ -1209,6 +1214,98 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
blk_queue_write_cache(q, vwc, vwc);
 }
 
+static void nvme_configure_apst(struct nvme_ctrl *ctrl)
+{
+   /*
+* APST (Autonomous Power State Transition) lets us program a
+* table of power state transitions that the controller will
+* perform automatically.  We configure it with a simple
+* heuristic: we are willing to spend at most 2% of the time
+* transitioning between power states.  Therefore, when running
+* in any given state, we will enter the next lower-power
+* non-operational state after waiting 100 * (enlat + exlat)
+* microseconds, as long as that state's total latency is under
+* the requested maximum latency.
+*
+* We will not autonomously enter any non-operational state for
+* which the total latency exceeds ps_max_latency_us.  Users
+* can set ps_max_latency_us to zero to turn off APST.
+*/
+
+   unsigned apste;
+   struct nvme_feat_auto_pst *table;
+   int ret;
+
+   if (!ctrl->apsta)
+   return; /* APST isn't supported. */
+
+   if (ctrl->npss > 31) {
+   dev_warn(ctrl->device, "NPSS is invalid; not using APST\n");
+   return;
+   }
+
+   table = kzalloc(sizeof(*table), GFP_KERNEL);
+   if (!table)
+   return;
+
+   if (ctrl->ps_max_latency_us == 0) {
+   /* Turn off APST. */
+   apste = 0;
+   } else {
+   __le64 target = cpu_to_le64(0);
+   int state;
+
+   /*
+* Walk through all states from lowest- to highest-power.
+* According to the spec, lower-numbered states use more
+* power.  NPSS, despite the name, is the index of the
+* lowest-power state, not the number of states.
+*/
+   for (state = (int)ctrl->npss; state >= 0; 

[PATCH v3 1/3] nvme/scsi: Remove power management support

2016-09-15 Thread Andy Lutomirski
As far as I can tell, there is basically nothing correct about this
code.  It misinterprets npss (off-by-one).  It hardcodes a bunch of
power states, which is nonsense, because they're all just indices
into a table that software needs to parse.  It completely ignores
the distinction between operational and non-operational states.
And, until 4.8, if all of the above magically succeeded, it would
dereference a NULL pointer and OOPS.

Since this code appears to be useless, just delete it.

Signed-off-by: Andy Lutomirski 
---
 drivers/nvme/host/scsi.c | 74 ++--
 1 file changed, 3 insertions(+), 71 deletions(-)

diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
index e947e298a737..44009105f8c8 100644
--- a/drivers/nvme/host/scsi.c
+++ b/drivers/nvme/host/scsi.c
@@ -72,15 +72,6 @@ static int sg_version_num = 30534;   /* 2 digits for each 
component */
 #define ALL_LUNS_RETURNED  0x02
 #define ALL_WELL_KNOWN_LUNS_RETURNED   0x01
 #define RESTRICTED_LUNS_RETURNED   0x00
-#define NVME_POWER_STATE_START_VALID   0x00
-#define NVME_POWER_STATE_ACTIVE0x01
-#define NVME_POWER_STATE_IDLE  0x02
-#define NVME_POWER_STATE_STANDBY   0x03
-#define NVME_POWER_STATE_LU_CONTROL0x07
-#define POWER_STATE_0  0
-#define POWER_STATE_1  1
-#define POWER_STATE_2  2
-#define POWER_STATE_3  3
 #define DOWNLOAD_SAVE_ACTIVATE 0x05
 #define DOWNLOAD_SAVE_DEFER_ACTIVATE   0x0E
 #define ACTIVATE_DEFERRED_MICROCODE0x0F
@@ -1229,64 +1220,6 @@ static void nvme_trans_fill_read_cap(u8 *response, 
struct nvme_id_ns *id_ns,
 
 /* Start Stop Unit Helper Functions */
 
-static int nvme_trans_power_state(struct nvme_ns *ns, struct sg_io_hdr *hdr,
-   u8 pc, u8 pcmod, u8 start)
-{
-   int res;
-   int nvme_sc;
-   struct nvme_id_ctrl *id_ctrl;
-   int lowest_pow_st;  /* max npss = lowest power consumption */
-   unsigned ps_desired = 0;
-
-   nvme_sc = nvme_identify_ctrl(ns->ctrl, _ctrl);
-   res = nvme_trans_status_code(hdr, nvme_sc);
-   if (res)
-   return res;
-
-   lowest_pow_st = max(POWER_STATE_0, (int)(id_ctrl->npss - 1));
-   kfree(id_ctrl);
-
-   switch (pc) {
-   case NVME_POWER_STATE_START_VALID:
-   /* Action unspecified if POWER CONDITION MODIFIER != 0 */
-   if (pcmod == 0 && start == 0x1)
-   ps_desired = POWER_STATE_0;
-   if (pcmod == 0 && start == 0x0)
-   ps_desired = lowest_pow_st;
-   break;
-   case NVME_POWER_STATE_ACTIVE:
-   /* Action unspecified if POWER CONDITION MODIFIER != 0 */
-   if (pcmod == 0)
-   ps_desired = POWER_STATE_0;
-   break;
-   case NVME_POWER_STATE_IDLE:
-   /* Action unspecified if POWER CONDITION MODIFIER != [0,1,2] */
-   if (pcmod == 0x0)
-   ps_desired = POWER_STATE_1;
-   else if (pcmod == 0x1)
-   ps_desired = POWER_STATE_2;
-   else if (pcmod == 0x2)
-   ps_desired = POWER_STATE_3;
-   break;
-   case NVME_POWER_STATE_STANDBY:
-   /* Action unspecified if POWER CONDITION MODIFIER != [0,1] */
-   if (pcmod == 0x0)
-   ps_desired = max(POWER_STATE_0, (lowest_pow_st - 2));
-   else if (pcmod == 0x1)
-   ps_desired = max(POWER_STATE_0, (lowest_pow_st - 1));
-   break;
-   case NVME_POWER_STATE_LU_CONTROL:
-   default:
-   res = nvme_trans_completion(hdr, SAM_STAT_CHECK_CONDITION,
-   ILLEGAL_REQUEST, SCSI_ASC_INVALID_CDB,
-   SCSI_ASCQ_CAUSE_NOT_REPORTABLE);
-   break;
-   }
-   nvme_sc = nvme_set_features(ns->ctrl, NVME_FEAT_POWER_MGMT, ps_desired, 
0,
-   NULL);
-   return nvme_trans_status_code(hdr, nvme_sc);
-}
-
 static int nvme_trans_send_activate_fw_cmd(struct nvme_ns *ns, struct 
sg_io_hdr *hdr,
u8 buffer_id)
 {
@@ -2235,11 +2168,10 @@ static int nvme_trans_synchronize_cache(struct nvme_ns 
*ns,
 static int nvme_trans_start_stop(struct nvme_ns *ns, struct sg_io_hdr *hdr,
u8 *cmd)
 {
-   u8 immed, pcmod, pc, no_flush, start;
+   u8 immed, pcmod, no_flush, start;
 
immed = cmd[1] & 0x01;
pcmod = cmd[3] & 0x0f;
-   pc = (cmd[4] & 0xf0) >> 4;

[PATCH v3 2/3] nvme: Pass pointers, not dma addresses, to nvme_get/set_features()

2016-09-15 Thread Andy Lutomirski
Any user I can imagine that needs a buffer at all will want to pass
a pointer directly.  There are no currently callers that use
buffers, so this change is painless, and it will make it much easier
to start using features that use buffers (e.g. APST).

Signed-off-by: Andy Lutomirski 
---
 drivers/nvme/host/core.c | 18 ++
 drivers/nvme/host/nvme.h |  4 ++--
 drivers/nvme/host/scsi.c |  6 +++---
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2feacc70bf61..9260d2971176 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -597,7 +597,7 @@ int nvme_identify_ns(struct nvme_ctrl *dev, unsigned nsid,
 }
 
 int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned nsid,
-   dma_addr_t dma_addr, u32 *result)
+ void *buffer, size_t buflen, u32 *result)
 {
struct nvme_command c;
struct nvme_completion cqe;
@@ -606,10 +606,9 @@ int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, 
unsigned nsid,
memset(, 0, sizeof(c));
c.features.opcode = nvme_admin_get_features;
c.features.nsid = cpu_to_le32(nsid);
-   c.features.dptr.prp1 = cpu_to_le64(dma_addr);
c.features.fid = cpu_to_le32(fid);
 
-   ret = __nvme_submit_sync_cmd(dev->admin_q, , , NULL, 0, 0,
+   ret = __nvme_submit_sync_cmd(dev->admin_q, , , buffer, buflen, 0,
NVME_QID_ANY, 0, 0);
if (ret >= 0 && result)
*result = le32_to_cpu(cqe.result);
@@ -617,7 +616,7 @@ int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, 
unsigned nsid,
 }
 
 int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
-   dma_addr_t dma_addr, u32 *result)
+ const void *buffer, size_t buflen, u32 *result)
 {
struct nvme_command c;
struct nvme_completion cqe;
@@ -625,12 +624,15 @@ int nvme_set_features(struct nvme_ctrl *dev, unsigned 
fid, unsigned dword11,
 
memset(, 0, sizeof(c));
c.features.opcode = nvme_admin_set_features;
-   c.features.dptr.prp1 = cpu_to_le64(dma_addr);
c.features.fid = cpu_to_le32(fid);
c.features.dword11 = cpu_to_le32(dword11);
 
-   ret = __nvme_submit_sync_cmd(dev->admin_q, , , NULL, 0, 0,
-   NVME_QID_ANY, 0, 0);
+   /*
+* Casting buffer to void* is safe here: __nvme_submit_sync_cmd knows
+* that we're writing because it decodes the opcode.
+*/
+   ret = __nvme_submit_sync_cmd(dev->admin_q, , ,
+   (void *)buffer, buflen, 0, NVME_QID_ANY, 0, 0);
if (ret >= 0 && result)
*result = le32_to_cpu(cqe.result);
return ret;
@@ -664,7 +666,7 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
u32 result;
int status, nr_io_queues;
 
-   status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, 0,
+   status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, NULL, 0,
);
if (status < 0)
return status;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ab18b78102bf..383ae22e169e 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -292,9 +292,9 @@ int nvme_identify_ns(struct nvme_ctrl *dev, unsigned nsid,
struct nvme_id_ns **id);
 int nvme_get_log_page(struct nvme_ctrl *dev, struct nvme_smart_log **log);
 int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned nsid,
-   dma_addr_t dma_addr, u32 *result);
+ void *buffer, size_t buflen, u32 *result);
 int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
-   dma_addr_t dma_addr, u32 *result);
+ const void *buffer, size_t buflen, u32 *result);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
 void nvme_start_keep_alive(struct nvme_ctrl *ctrl);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
index 44009105f8c8..c2a0a1c7d05d 100644
--- a/drivers/nvme/host/scsi.c
+++ b/drivers/nvme/host/scsi.c
@@ -906,7 +906,7 @@ static int nvme_trans_log_temperature(struct nvme_ns *ns, 
struct sg_io_hdr *hdr,
kfree(smart_log);
 
/* Get Features for Temp Threshold */
-   res = nvme_get_features(ns->ctrl, NVME_FEAT_TEMP_THRESH, 0, 0,
+   res = nvme_get_features(ns->ctrl, NVME_FEAT_TEMP_THRESH, 0, NULL, 0,
_resp);
if (res != NVME_SC_SUCCESS)
temp_c_thresh = LOG_TEMP_UNKNOWN;
@@ -1039,7 +1039,7 @@ static int nvme_trans_fill_caching_page(struct nvme_ns 
*ns,
if (len < MODE_PAGE_CACHING_LEN)
return -EINVAL;
 
-   nvme_sc = 

[PATCH v3 1/3] nvme/scsi: Remove power management support

2016-09-15 Thread Andy Lutomirski
As far as I can tell, there is basically nothing correct about this
code.  It misinterprets npss (off-by-one).  It hardcodes a bunch of
power states, which is nonsense, because they're all just indices
into a table that software needs to parse.  It completely ignores
the distinction between operational and non-operational states.
And, until 4.8, if all of the above magically succeeded, it would
dereference a NULL pointer and OOPS.

Since this code appears to be useless, just delete it.

Signed-off-by: Andy Lutomirski 
---
 drivers/nvme/host/scsi.c | 74 ++--
 1 file changed, 3 insertions(+), 71 deletions(-)

diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
index e947e298a737..44009105f8c8 100644
--- a/drivers/nvme/host/scsi.c
+++ b/drivers/nvme/host/scsi.c
@@ -72,15 +72,6 @@ static int sg_version_num = 30534;   /* 2 digits for each 
component */
 #define ALL_LUNS_RETURNED  0x02
 #define ALL_WELL_KNOWN_LUNS_RETURNED   0x01
 #define RESTRICTED_LUNS_RETURNED   0x00
-#define NVME_POWER_STATE_START_VALID   0x00
-#define NVME_POWER_STATE_ACTIVE0x01
-#define NVME_POWER_STATE_IDLE  0x02
-#define NVME_POWER_STATE_STANDBY   0x03
-#define NVME_POWER_STATE_LU_CONTROL0x07
-#define POWER_STATE_0  0
-#define POWER_STATE_1  1
-#define POWER_STATE_2  2
-#define POWER_STATE_3  3
 #define DOWNLOAD_SAVE_ACTIVATE 0x05
 #define DOWNLOAD_SAVE_DEFER_ACTIVATE   0x0E
 #define ACTIVATE_DEFERRED_MICROCODE0x0F
@@ -1229,64 +1220,6 @@ static void nvme_trans_fill_read_cap(u8 *response, 
struct nvme_id_ns *id_ns,
 
 /* Start Stop Unit Helper Functions */
 
-static int nvme_trans_power_state(struct nvme_ns *ns, struct sg_io_hdr *hdr,
-   u8 pc, u8 pcmod, u8 start)
-{
-   int res;
-   int nvme_sc;
-   struct nvme_id_ctrl *id_ctrl;
-   int lowest_pow_st;  /* max npss = lowest power consumption */
-   unsigned ps_desired = 0;
-
-   nvme_sc = nvme_identify_ctrl(ns->ctrl, _ctrl);
-   res = nvme_trans_status_code(hdr, nvme_sc);
-   if (res)
-   return res;
-
-   lowest_pow_st = max(POWER_STATE_0, (int)(id_ctrl->npss - 1));
-   kfree(id_ctrl);
-
-   switch (pc) {
-   case NVME_POWER_STATE_START_VALID:
-   /* Action unspecified if POWER CONDITION MODIFIER != 0 */
-   if (pcmod == 0 && start == 0x1)
-   ps_desired = POWER_STATE_0;
-   if (pcmod == 0 && start == 0x0)
-   ps_desired = lowest_pow_st;
-   break;
-   case NVME_POWER_STATE_ACTIVE:
-   /* Action unspecified if POWER CONDITION MODIFIER != 0 */
-   if (pcmod == 0)
-   ps_desired = POWER_STATE_0;
-   break;
-   case NVME_POWER_STATE_IDLE:
-   /* Action unspecified if POWER CONDITION MODIFIER != [0,1,2] */
-   if (pcmod == 0x0)
-   ps_desired = POWER_STATE_1;
-   else if (pcmod == 0x1)
-   ps_desired = POWER_STATE_2;
-   else if (pcmod == 0x2)
-   ps_desired = POWER_STATE_3;
-   break;
-   case NVME_POWER_STATE_STANDBY:
-   /* Action unspecified if POWER CONDITION MODIFIER != [0,1] */
-   if (pcmod == 0x0)
-   ps_desired = max(POWER_STATE_0, (lowest_pow_st - 2));
-   else if (pcmod == 0x1)
-   ps_desired = max(POWER_STATE_0, (lowest_pow_st - 1));
-   break;
-   case NVME_POWER_STATE_LU_CONTROL:
-   default:
-   res = nvme_trans_completion(hdr, SAM_STAT_CHECK_CONDITION,
-   ILLEGAL_REQUEST, SCSI_ASC_INVALID_CDB,
-   SCSI_ASCQ_CAUSE_NOT_REPORTABLE);
-   break;
-   }
-   nvme_sc = nvme_set_features(ns->ctrl, NVME_FEAT_POWER_MGMT, ps_desired, 
0,
-   NULL);
-   return nvme_trans_status_code(hdr, nvme_sc);
-}
-
 static int nvme_trans_send_activate_fw_cmd(struct nvme_ns *ns, struct 
sg_io_hdr *hdr,
u8 buffer_id)
 {
@@ -2235,11 +2168,10 @@ static int nvme_trans_synchronize_cache(struct nvme_ns 
*ns,
 static int nvme_trans_start_stop(struct nvme_ns *ns, struct sg_io_hdr *hdr,
u8 *cmd)
 {
-   u8 immed, pcmod, pc, no_flush, start;
+   u8 immed, pcmod, no_flush, start;
 
immed = cmd[1] & 0x01;
pcmod = cmd[3] & 0x0f;
-   pc = (cmd[4] & 0xf0) >> 4;
no_flush = cmd[4] & 

[PATCH v3 2/3] nvme: Pass pointers, not dma addresses, to nvme_get/set_features()

2016-09-15 Thread Andy Lutomirski
Any user I can imagine that needs a buffer at all will want to pass
a pointer directly.  There are no currently callers that use
buffers, so this change is painless, and it will make it much easier
to start using features that use buffers (e.g. APST).

Signed-off-by: Andy Lutomirski 
---
 drivers/nvme/host/core.c | 18 ++
 drivers/nvme/host/nvme.h |  4 ++--
 drivers/nvme/host/scsi.c |  6 +++---
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2feacc70bf61..9260d2971176 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -597,7 +597,7 @@ int nvme_identify_ns(struct nvme_ctrl *dev, unsigned nsid,
 }
 
 int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned nsid,
-   dma_addr_t dma_addr, u32 *result)
+ void *buffer, size_t buflen, u32 *result)
 {
struct nvme_command c;
struct nvme_completion cqe;
@@ -606,10 +606,9 @@ int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, 
unsigned nsid,
memset(, 0, sizeof(c));
c.features.opcode = nvme_admin_get_features;
c.features.nsid = cpu_to_le32(nsid);
-   c.features.dptr.prp1 = cpu_to_le64(dma_addr);
c.features.fid = cpu_to_le32(fid);
 
-   ret = __nvme_submit_sync_cmd(dev->admin_q, , , NULL, 0, 0,
+   ret = __nvme_submit_sync_cmd(dev->admin_q, , , buffer, buflen, 0,
NVME_QID_ANY, 0, 0);
if (ret >= 0 && result)
*result = le32_to_cpu(cqe.result);
@@ -617,7 +616,7 @@ int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, 
unsigned nsid,
 }
 
 int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
-   dma_addr_t dma_addr, u32 *result)
+ const void *buffer, size_t buflen, u32 *result)
 {
struct nvme_command c;
struct nvme_completion cqe;
@@ -625,12 +624,15 @@ int nvme_set_features(struct nvme_ctrl *dev, unsigned 
fid, unsigned dword11,
 
memset(, 0, sizeof(c));
c.features.opcode = nvme_admin_set_features;
-   c.features.dptr.prp1 = cpu_to_le64(dma_addr);
c.features.fid = cpu_to_le32(fid);
c.features.dword11 = cpu_to_le32(dword11);
 
-   ret = __nvme_submit_sync_cmd(dev->admin_q, , , NULL, 0, 0,
-   NVME_QID_ANY, 0, 0);
+   /*
+* Casting buffer to void* is safe here: __nvme_submit_sync_cmd knows
+* that we're writing because it decodes the opcode.
+*/
+   ret = __nvme_submit_sync_cmd(dev->admin_q, , ,
+   (void *)buffer, buflen, 0, NVME_QID_ANY, 0, 0);
if (ret >= 0 && result)
*result = le32_to_cpu(cqe.result);
return ret;
@@ -664,7 +666,7 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
u32 result;
int status, nr_io_queues;
 
-   status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, 0,
+   status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, NULL, 0,
);
if (status < 0)
return status;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ab18b78102bf..383ae22e169e 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -292,9 +292,9 @@ int nvme_identify_ns(struct nvme_ctrl *dev, unsigned nsid,
struct nvme_id_ns **id);
 int nvme_get_log_page(struct nvme_ctrl *dev, struct nvme_smart_log **log);
 int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned nsid,
-   dma_addr_t dma_addr, u32 *result);
+ void *buffer, size_t buflen, u32 *result);
 int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
-   dma_addr_t dma_addr, u32 *result);
+ const void *buffer, size_t buflen, u32 *result);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
 void nvme_start_keep_alive(struct nvme_ctrl *ctrl);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
index 44009105f8c8..c2a0a1c7d05d 100644
--- a/drivers/nvme/host/scsi.c
+++ b/drivers/nvme/host/scsi.c
@@ -906,7 +906,7 @@ static int nvme_trans_log_temperature(struct nvme_ns *ns, 
struct sg_io_hdr *hdr,
kfree(smart_log);
 
/* Get Features for Temp Threshold */
-   res = nvme_get_features(ns->ctrl, NVME_FEAT_TEMP_THRESH, 0, 0,
+   res = nvme_get_features(ns->ctrl, NVME_FEAT_TEMP_THRESH, 0, NULL, 0,
_resp);
if (res != NVME_SC_SUCCESS)
temp_c_thresh = LOG_TEMP_UNKNOWN;
@@ -1039,7 +1039,7 @@ static int nvme_trans_fill_caching_page(struct nvme_ns 
*ns,
if (len < MODE_PAGE_CACHING_LEN)
return -EINVAL;
 
-   nvme_sc = nvme_get_features(ns->ctrl, 

[PATCH v3 0/3] nvme power saving

2016-09-15 Thread Andy Lutomirski
Hi all-

Here's v3 of the APST patch set.  The biggest bikesheddable thing (I
think) is the scaling factor.  I currently have it hardcoded so that
we wait 50x the total latency before entering a power saving state.
On my Samsung 950, this means we enter state 3 (70mW, 0.5ms entry
latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms entry
latency, 22ms exit latency) after 1200ms.  I have the default max
latency set to 25ms.

FWIW, in practice, the latency this introduces seems to be well
under 22ms, but my benchmark is a bit silly and I might have
measured it wrong.  I certainly haven't observed a slowdown just
using my laptop.

This time around, I changed the names of parameters after Jay
Frayensee got confused by the first try.  Now they are:

 - ps_max_latency_us in sysfs: actually controls it.
 - nvme_core.default_ps_max_latency_us: sets the default.

Yeah, they're mouthfuls, but they should be clearer now.

Changes from v2:
 - Rename the parameters.

Changes from v1:
 - Get rid of feature buffer alignment warnings.
 - Change the error message if NPSS is bogus.
 - Rename apst_max_latency_ns to apst_max_latency_us because module params
   don't like u64 or unsigned long long and I wanted to make it fit more
   comfortably in a ulong module param.  (And the nanoseconds were useless.)
 - Add a module parameter for the default max latency.

Andy Lutomirski (3):
  nvme/scsi: Remove power management support
  nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
  nvme: Enable autonomous power state transitions

 drivers/nvme/host/core.c | 184 ---
 drivers/nvme/host/nvme.h |  10 ++-
 drivers/nvme/host/scsi.c |  80 ++---
 include/linux/nvme.h |   6 ++
 4 files changed, 196 insertions(+), 84 deletions(-)

-- 
2.7.4



[PATCH v3 0/3] nvme power saving

2016-09-15 Thread Andy Lutomirski
Hi all-

Here's v3 of the APST patch set.  The biggest bikesheddable thing (I
think) is the scaling factor.  I currently have it hardcoded so that
we wait 50x the total latency before entering a power saving state.
On my Samsung 950, this means we enter state 3 (70mW, 0.5ms entry
latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms entry
latency, 22ms exit latency) after 1200ms.  I have the default max
latency set to 25ms.

FWIW, in practice, the latency this introduces seems to be well
under 22ms, but my benchmark is a bit silly and I might have
measured it wrong.  I certainly haven't observed a slowdown just
using my laptop.

This time around, I changed the names of parameters after Jay
Frayensee got confused by the first try.  Now they are:

 - ps_max_latency_us in sysfs: actually controls it.
 - nvme_core.default_ps_max_latency_us: sets the default.

Yeah, they're mouthfuls, but they should be clearer now.

Changes from v2:
 - Rename the parameters.

Changes from v1:
 - Get rid of feature buffer alignment warnings.
 - Change the error message if NPSS is bogus.
 - Rename apst_max_latency_ns to apst_max_latency_us because module params
   don't like u64 or unsigned long long and I wanted to make it fit more
   comfortably in a ulong module param.  (And the nanoseconds were useless.)
 - Add a module parameter for the default max latency.

Andy Lutomirski (3):
  nvme/scsi: Remove power management support
  nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
  nvme: Enable autonomous power state transitions

 drivers/nvme/host/core.c | 184 ---
 drivers/nvme/host/nvme.h |  10 ++-
 drivers/nvme/host/scsi.c |  80 ++---
 include/linux/nvme.h |   6 ++
 4 files changed, 196 insertions(+), 84 deletions(-)

-- 
2.7.4



Re: clk/Renesas-MSTP: Delete an error message for a failed memory allocation

2016-09-15 Thread SF Markus Elfring
>> Does this wish influence the handling of suggested changes for the source
>> file "drivers/clk/renesas/clk-mstp.c" anyhow?
>> https://patchwork.kernel.org/patch/9332363/
> 
> Yes, please submit a new version of this patch that removes all the memory 
> allocation error messages from drivers/clk/renesas/ in one go.

I find this kind of feedback a bit surprising and it seems to be promising
to increase software development in this design direction.
Have you got any corresponding scripts for the semantic patch language
prepared to make such a source code adjustment safer (and a bit more convenient
with the help of the Coccinelle software)?

I am also curious on how the change acceptance will evolve for the other
three update steps from this patch series when this one indicates further
software update opportunities already.

Regards,
Markus


Re: clk/Renesas-MSTP: Delete an error message for a failed memory allocation

2016-09-15 Thread SF Markus Elfring
>> Does this wish influence the handling of suggested changes for the source
>> file "drivers/clk/renesas/clk-mstp.c" anyhow?
>> https://patchwork.kernel.org/patch/9332363/
> 
> Yes, please submit a new version of this patch that removes all the memory 
> allocation error messages from drivers/clk/renesas/ in one go.

I find this kind of feedback a bit surprising and it seems to be promising
to increase software development in this design direction.
Have you got any corresponding scripts for the semantic patch language
prepared to make such a source code adjustment safer (and a bit more convenient
with the help of the Coccinelle software)?

I am also curious on how the change acceptance will evolve for the other
three update steps from this patch series when this one indicates further
software update opportunities already.

Regards,
Markus


linux-next: Tree for Sep 16

2016-09-15 Thread Stephen Rothwell
Hi all,

Changes since 20160915:

New tree: kgdb

The kbuild tree still had its build failure and warnings for PowerPC,
for which I applied a couple of patches

The drm-intel tree gained a conflict against Linus' tree.

The drm-msm tree gained a conflict against Linus' tree.

The block tree gained a build failure so I used the version from
next-20160915.

The slave-dma tree gained a build failure so I used the version from
next-20160915.

Non-merge commits (relative to Linus' tree): 7534
 7442 files changed, 346319 insertions(+), 212779 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 242 trees (counting Linus' and 34 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (4cea8776571b Merge tag 'pci-v4.8-fixes-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci)
Merging fixes/master (d3396e1e4ec4 Merge tag 'fixes-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging kbuild-current/rc-fixes (d3e2773c4ede builddeb: Skip gcc-plugins when 
not configured)
Merging arc-current/for-curr (3eab887a5542 Linux 4.8-rc4)
Merging arm-current/fixes (1a57c286d8ce ARM: pxa/lubbock: add pcmcia clock)
Merging m68k-current/for-linus (6bd80f372371 m68k/defconfig: Update defconfigs 
for v4.7-rc2)
Merging metag-fixes/fixes (97b1d23f7bcb metag: Drop show_mem() from mem_init())
Merging powerpc-fixes/fixes (ffed15d3ce3f powerpc/kernel: Fix size of 
NUM_CPU_FTR_KEYS on 32-bit)
Merging sparc/master (4620a06e4b3c shmem: Fix link error if huge pages support 
is disabled)
Merging net/master (440f895aa97f drivers: net: phy: xgene: Fix 'remove' 
function)
Merging ipsec/master (1fb81e09d487 vti: use right inner_mode for inbound inter 
address family policy checks)
Merging netfilter/master (440f895aa97f drivers: net: phy: xgene: Fix 'remove' 
function)
Merging ipvs/master (ea43f860d984 Merge branch 'ethoc-fixes')
Merging wireless-drivers/master (a0714125d11b Merge ath-current from ath.git)
Merging mac80211/master (85d5313ed717 mac80211: reject TSPEC TIDs (TSIDs) for 
aggregation)
Merging sound-current/for-linus (3f640970a414 ALSA: hda - Fix headset mic 
detection problem for several Dell laptops)
Merging pci-current/for-linus (035ee288ae7a PCI: Fix bridge_d3 update on device 
removal)
Merging driver-core.current/driver-core-linus (c6935931c189 Linux 4.8-rc5)
Merging tty.current/tty-linus (c6935931c189 Linux 4.8-rc5)
Merging usb.current/usb-linus (9395452b4aab Linux 4.8-rc6)
Merging usb-gadget-fixes/fixes (696118c016dd usb: dwc3: pci: fix build warning 
on !PM_SLEEP)
Merging usb-serial-fixes/usb-linus (f190fd92458d USB: serial: simple: add 
support for another Infineon flashloader)
Merging usb-chipidea-fixes/ci-for-usb-stable (6f3c4fb6d05e usb: chipidea: udc: 
fix NULL ptr dereference in isr_setup_status_phase)
Merging staging.current/staging-linus (9395452b4aab Linux 4.8-rc6)
Merging char-misc.current/char-misc-linus (c6935931c189 Linux 4.8-rc5)
Merging input-current/for-linus (4af2ff91ec3f Input: silead_gsl1680 - use 
"silead/" prefix for firmware loading)
Merging crypto-current/master (2db34e78f126 crypto: arm64/aes-ctr - fix NULL 
dereference in tail processing)
Merging ide/master (797cee982eef Merge branch 'stable-4.8' of 
git://git.infradead.org/users/pcmoore/audit)
Merging rr-fixes/fixes (8244062ef1e5 modules: fix longstanding /proc/kallsyms 
vs m

linux-next: Tree for Sep 16

2016-09-15 Thread Stephen Rothwell
Hi all,

Changes since 20160915:

New tree: kgdb

The kbuild tree still had its build failure and warnings for PowerPC,
for which I applied a couple of patches

The drm-intel tree gained a conflict against Linus' tree.

The drm-msm tree gained a conflict against Linus' tree.

The block tree gained a build failure so I used the version from
next-20160915.

The slave-dma tree gained a build failure so I used the version from
next-20160915.

Non-merge commits (relative to Linus' tree): 7534
 7442 files changed, 346319 insertions(+), 212779 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 242 trees (counting Linus' and 34 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (4cea8776571b Merge tag 'pci-v4.8-fixes-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci)
Merging fixes/master (d3396e1e4ec4 Merge tag 'fixes-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging kbuild-current/rc-fixes (d3e2773c4ede builddeb: Skip gcc-plugins when 
not configured)
Merging arc-current/for-curr (3eab887a5542 Linux 4.8-rc4)
Merging arm-current/fixes (1a57c286d8ce ARM: pxa/lubbock: add pcmcia clock)
Merging m68k-current/for-linus (6bd80f372371 m68k/defconfig: Update defconfigs 
for v4.7-rc2)
Merging metag-fixes/fixes (97b1d23f7bcb metag: Drop show_mem() from mem_init())
Merging powerpc-fixes/fixes (ffed15d3ce3f powerpc/kernel: Fix size of 
NUM_CPU_FTR_KEYS on 32-bit)
Merging sparc/master (4620a06e4b3c shmem: Fix link error if huge pages support 
is disabled)
Merging net/master (440f895aa97f drivers: net: phy: xgene: Fix 'remove' 
function)
Merging ipsec/master (1fb81e09d487 vti: use right inner_mode for inbound inter 
address family policy checks)
Merging netfilter/master (440f895aa97f drivers: net: phy: xgene: Fix 'remove' 
function)
Merging ipvs/master (ea43f860d984 Merge branch 'ethoc-fixes')
Merging wireless-drivers/master (a0714125d11b Merge ath-current from ath.git)
Merging mac80211/master (85d5313ed717 mac80211: reject TSPEC TIDs (TSIDs) for 
aggregation)
Merging sound-current/for-linus (3f640970a414 ALSA: hda - Fix headset mic 
detection problem for several Dell laptops)
Merging pci-current/for-linus (035ee288ae7a PCI: Fix bridge_d3 update on device 
removal)
Merging driver-core.current/driver-core-linus (c6935931c189 Linux 4.8-rc5)
Merging tty.current/tty-linus (c6935931c189 Linux 4.8-rc5)
Merging usb.current/usb-linus (9395452b4aab Linux 4.8-rc6)
Merging usb-gadget-fixes/fixes (696118c016dd usb: dwc3: pci: fix build warning 
on !PM_SLEEP)
Merging usb-serial-fixes/usb-linus (f190fd92458d USB: serial: simple: add 
support for another Infineon flashloader)
Merging usb-chipidea-fixes/ci-for-usb-stable (6f3c4fb6d05e usb: chipidea: udc: 
fix NULL ptr dereference in isr_setup_status_phase)
Merging staging.current/staging-linus (9395452b4aab Linux 4.8-rc6)
Merging char-misc.current/char-misc-linus (c6935931c189 Linux 4.8-rc5)
Merging input-current/for-linus (4af2ff91ec3f Input: silead_gsl1680 - use 
"silead/" prefix for firmware loading)
Merging crypto-current/master (2db34e78f126 crypto: arm64/aes-ctr - fix NULL 
dereference in tail processing)
Merging ide/master (797cee982eef Merge branch 'stable-4.8' of 
git://git.infradead.org/users/pcmoore/audit)
Merging rr-fixes/fixes (8244062ef1e5 modules: fix longstanding /proc/kallsyms 
vs m

Re: [GIT PULL v2] phy: for 4.9

2016-09-15 Thread Kishon Vijay Abraham I
Hi Greg,

On Thursday 15 September 2016 07:31 PM, Greg KH wrote:
> On Thu, Sep 15, 2016 at 06:26:18PM +0530, Kishon Vijay Abraham I wrote:
>> Hi Greg,
>>
>> Please find the updated PHY pull request for 4.9 based on usb-next. This
>> now fixes the merge conflicts caused because of extcon branch merged
>> into linux-phy.
>>
>> Please find the detailed list of changes in the tag message.
>>
>> Let me know if you want me to change something.
> 
> Much better, thanks, now pulled and pushed out.

Looks like there are still problems with the merge. I'll fix them and send a
new pull request.

Thanks
Kishon


Re: [RESEND PATCH] arm64: kgdb: fix single stepping

2016-09-15 Thread AKASHI Takahiro
Hi Jason,

Welcome back to mainline development.
I've been looking forward to your comments.

On Thu, Sep 15, 2016 at 08:04:57AM -0500, Jason Wessel wrote:
> On 04/20/2015 08:13 PM, AKASHI Takahiro wrote:
> >Jason,
> >
> >Could you please review my patch below?
> >See also arm64 maintainer's comment:
> >http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/313712.html
> >
> >Thanks,
> >-Takahiro AKASHI
> >
> >I tried to verify kgdb in vanilla kernel on fast model, but it seems that
> >the single stepping with kgdb doesn't work correctly since its first
> >appearance at v3.15.
> >
> >On v3.15, 'stepi' command after breaking the kernel at some breakpoint
> >steps forward to the next instruction, but the succeeding 'stepi' never
> >goes beyond that.
> >On v3.16, 'stepi' moves forward and stops at the next instruction just
> >after enable_dbg in el1_dbg, and never goes beyond that. This variance of
> >behavior seems to come in with the following patch in v3.16:
> >
> >commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
> >paths where possible")
> >
> >This patch
> >(1) moves kgdb_disable_single_step() from 'c' command handling to single
> >step handler.
> >This makes sure that single stepping gets effective at every 's' command.
> >Please note that, under the current implementation, single step bit in
> >spsr, which is cleared by the first single stepping, will not be set
> >again for the consecutive 's' commands because single step bit in mdscr
> >is still kept on (that is, kernel_active_single_step() in
> >kgdb_arch_handle_exception() is true).
> >(2) re-implements kgdb_roundup_cpus() because the current implementation
> >enabled interrupts naively. See below.
> >(3) removes 'enable_dbg' in el1_dbg.
> >Single step bit in mdscr is turned on in do_handle_exception()->
> >kgdb_handle_expection() before returning to debugged context, and if
> >debug exception is enabled in el1_dbg, we will see unexpected single-
> >stepping in el1_dbg.
> >Since v3.18, the following patch does the same:
> >  commit 1059c6bf8534 ("arm64: debug: don't re-enable debug exceptions
> >  on return from el1_dbg)
> >(4) masks interrupts while single-stepping one instruction.
> >If an interrupt is caught during processing a single-stepping, debug
> >exception is unintentionally enabled by el1_irq's 'enable_dbg' before
> >returning to debugged context.
> >Thus, like in (2), we will see unexpected single-stepping in el1_irq.
> >
> >Basically (1) and (2) are for v3.15, (3) and (4) for v3.1[67].
> >
> >@@ -176,18 +183,14 @@ int kgdb_arch_handle_exception(int exception_vector, 
> >int signo,
> >  * over and over again.
> >  */
> > kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
> >-atomic_set(_cpu_doing_single_step, -1);
> >-kgdb_single_step =  0;
> 
> 
> This is a subtle change, but I assume it is what you intended?  All the CPUs 
> will get released into the run state when exiting the kgdb exception handler.

You are talking about "- kgdb_single_step = 0." Right?
Do you think that there is any (negative) side effect of this change?
Since most of architectures, including x86, don't handle this variable
explicitly, I didn't expect that it was essential.

> 
> >-
> >-/*
> >- * Received continue command, disable single step
> >- */
> >-if (kernel_active_single_step())
> >-kernel_disable_single_step();
> >
> 
> 
> I see why this is not needed above any more given that you have a function 
> that cleans up the state on entry to the kgdb exception handler.

Yeah, the point here is that, on arm64, we need to set SS (Software step)
bit in SPSR as well as MDSCR before returning from the exception handler
in order to enable a hardware single step according to ARM ARM D2.12
("Software Step exceptions").
So it might be good enough just to call kernel_enable_single_step()
at 's' command unconditionally instead of "if (!kernel_active_single_step)".
(Please note that, as I mentioned in the commit message, SPSR.SS bit will
be cleared while MDSCR.SS bit is kept on after a hardware single step.)

But anyhow, I believe that the hunk above is redundant in my implementation.

> The rest of the patch is fine.

Thank you.

> I added the patch to kgdb-next after fixing up the context since it no longer 
> applied to the mainline ( 
> https://git.kernel.org/cgit/linux/kernel/git/jwessel/kgdb.git/log/?h=kgdb-next).
>   If there is further discussion on the point above, another patch can be 
> added, but it I am assuming this is the way you desire it to work as there 
> are some other architectures that use the same behaviour.  I do not presently 
> have any ARM64 hardware to validate this particular change.
> 
> I also added to the patch a "Cc: linux-stable " so we 
> can have this appear on some of 

Re: [GIT PULL v2] phy: for 4.9

2016-09-15 Thread Kishon Vijay Abraham I
Hi Greg,

On Thursday 15 September 2016 07:31 PM, Greg KH wrote:
> On Thu, Sep 15, 2016 at 06:26:18PM +0530, Kishon Vijay Abraham I wrote:
>> Hi Greg,
>>
>> Please find the updated PHY pull request for 4.9 based on usb-next. This
>> now fixes the merge conflicts caused because of extcon branch merged
>> into linux-phy.
>>
>> Please find the detailed list of changes in the tag message.
>>
>> Let me know if you want me to change something.
> 
> Much better, thanks, now pulled and pushed out.

Looks like there are still problems with the merge. I'll fix them and send a
new pull request.

Thanks
Kishon


Re: [RESEND PATCH] arm64: kgdb: fix single stepping

2016-09-15 Thread AKASHI Takahiro
Hi Jason,

Welcome back to mainline development.
I've been looking forward to your comments.

On Thu, Sep 15, 2016 at 08:04:57AM -0500, Jason Wessel wrote:
> On 04/20/2015 08:13 PM, AKASHI Takahiro wrote:
> >Jason,
> >
> >Could you please review my patch below?
> >See also arm64 maintainer's comment:
> >http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/313712.html
> >
> >Thanks,
> >-Takahiro AKASHI
> >
> >I tried to verify kgdb in vanilla kernel on fast model, but it seems that
> >the single stepping with kgdb doesn't work correctly since its first
> >appearance at v3.15.
> >
> >On v3.15, 'stepi' command after breaking the kernel at some breakpoint
> >steps forward to the next instruction, but the succeeding 'stepi' never
> >goes beyond that.
> >On v3.16, 'stepi' moves forward and stops at the next instruction just
> >after enable_dbg in el1_dbg, and never goes beyond that. This variance of
> >behavior seems to come in with the following patch in v3.16:
> >
> >commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
> >paths where possible")
> >
> >This patch
> >(1) moves kgdb_disable_single_step() from 'c' command handling to single
> >step handler.
> >This makes sure that single stepping gets effective at every 's' command.
> >Please note that, under the current implementation, single step bit in
> >spsr, which is cleared by the first single stepping, will not be set
> >again for the consecutive 's' commands because single step bit in mdscr
> >is still kept on (that is, kernel_active_single_step() in
> >kgdb_arch_handle_exception() is true).
> >(2) re-implements kgdb_roundup_cpus() because the current implementation
> >enabled interrupts naively. See below.
> >(3) removes 'enable_dbg' in el1_dbg.
> >Single step bit in mdscr is turned on in do_handle_exception()->
> >kgdb_handle_expection() before returning to debugged context, and if
> >debug exception is enabled in el1_dbg, we will see unexpected single-
> >stepping in el1_dbg.
> >Since v3.18, the following patch does the same:
> >  commit 1059c6bf8534 ("arm64: debug: don't re-enable debug exceptions
> >  on return from el1_dbg)
> >(4) masks interrupts while single-stepping one instruction.
> >If an interrupt is caught during processing a single-stepping, debug
> >exception is unintentionally enabled by el1_irq's 'enable_dbg' before
> >returning to debugged context.
> >Thus, like in (2), we will see unexpected single-stepping in el1_irq.
> >
> >Basically (1) and (2) are for v3.15, (3) and (4) for v3.1[67].
> >
> >@@ -176,18 +183,14 @@ int kgdb_arch_handle_exception(int exception_vector, 
> >int signo,
> >  * over and over again.
> >  */
> > kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
> >-atomic_set(_cpu_doing_single_step, -1);
> >-kgdb_single_step =  0;
> 
> 
> This is a subtle change, but I assume it is what you intended?  All the CPUs 
> will get released into the run state when exiting the kgdb exception handler.

You are talking about "- kgdb_single_step = 0." Right?
Do you think that there is any (negative) side effect of this change?
Since most of architectures, including x86, don't handle this variable
explicitly, I didn't expect that it was essential.

> 
> >-
> >-/*
> >- * Received continue command, disable single step
> >- */
> >-if (kernel_active_single_step())
> >-kernel_disable_single_step();
> >
> 
> 
> I see why this is not needed above any more given that you have a function 
> that cleans up the state on entry to the kgdb exception handler.

Yeah, the point here is that, on arm64, we need to set SS (Software step)
bit in SPSR as well as MDSCR before returning from the exception handler
in order to enable a hardware single step according to ARM ARM D2.12
("Software Step exceptions").
So it might be good enough just to call kernel_enable_single_step()
at 's' command unconditionally instead of "if (!kernel_active_single_step)".
(Please note that, as I mentioned in the commit message, SPSR.SS bit will
be cleared while MDSCR.SS bit is kept on after a hardware single step.)

But anyhow, I believe that the hunk above is redundant in my implementation.

> The rest of the patch is fine.

Thank you.

> I added the patch to kgdb-next after fixing up the context since it no longer 
> applied to the mainline ( 
> https://git.kernel.org/cgit/linux/kernel/git/jwessel/kgdb.git/log/?h=kgdb-next).
>   If there is further discussion on the point above, another patch can be 
> added, but it I am assuming this is the way you desire it to work as there 
> are some other architectures that use the same behaviour.  I do not presently 
> have any ARM64 hardware to validate this particular change.
> 
> I also added to the patch a "Cc: linux-stable " so we 
> can have this appear on some of the older kernels.


Re: linux-next: build failure after merge of the block tree

2016-09-15 Thread Jens Axboe

On 09/15/2016 07:14 PM, Stephen Rothwell wrote:

Hi Jens,

After merging the block tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

drivers/block/nbd.c:884:2: error: unknown field 'map_queue' specified in 
initializer
  .map_queue = blk_mq_map_queue,
  ^
/home/sfr/next/next/drivers/block/nbd.c:884:15: error: 'blk_mq_map_queue' 
undeclared here (not in a function)
  .map_queue = blk_mq_map_queue,
   ^

Caused by commit

  9151bcb4fb38 ("blk-mq: kill unused blk_mq_create_mq_map()")

interacting with commit

  fd8383fd88a2 ("nbd: convert to blkmq")

also in the block tree.


That's my fault, when merged together the nbd part needs an update. I've
pushed out a fix for it.

--
Jens Axboe



Re: linux-next: build failure after merge of the block tree

2016-09-15 Thread Jens Axboe

On 09/15/2016 07:14 PM, Stephen Rothwell wrote:

Hi Jens,

After merging the block tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

drivers/block/nbd.c:884:2: error: unknown field 'map_queue' specified in 
initializer
  .map_queue = blk_mq_map_queue,
  ^
/home/sfr/next/next/drivers/block/nbd.c:884:15: error: 'blk_mq_map_queue' 
undeclared here (not in a function)
  .map_queue = blk_mq_map_queue,
   ^

Caused by commit

  9151bcb4fb38 ("blk-mq: kill unused blk_mq_create_mq_map()")

interacting with commit

  fd8383fd88a2 ("nbd: convert to blkmq")

also in the block tree.


That's my fault, when merged together the nbd part needs an update. I've
pushed out a fix for it.

--
Jens Axboe



Re: [PATCH] mwifiex: fix null pointer deference when adapter is null

2016-09-15 Thread Kalle Valo
kbuild test robot <l...@intel.com> writes:

> url:
> https://github.com/0day-ci/linux/commits/Colin-King/mwifiex-fix-null-pointer-deference-when-adapter-is-null/20160915-231625
> base:   
> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git
>  master
> config: x86_64-randconfig-x013-201637 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
>
> All warnings (new ones prefixed by >>):
>
>drivers/net/wireless/marvell/mwifiex/main.c: In function 
> 'mwifiex_shutdown_sw':
>>> drivers/net/wireless/marvell/mwifiex/main.c:1433:1: warning: label 
>>> 'exit_remove' defined but not used [-Wunused-label]
> exit_remove:
> ^~~

Looks like a valid warning to me, so please resend.

-- 
Kalle Valo


Re: [PATCH] mwifiex: fix null pointer deference when adapter is null

2016-09-15 Thread Kalle Valo
kbuild test robot  writes:

> url:
> https://github.com/0day-ci/linux/commits/Colin-King/mwifiex-fix-null-pointer-deference-when-adapter-is-null/20160915-231625
> base:   
> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git
>  master
> config: x86_64-randconfig-x013-201637 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
>
> All warnings (new ones prefixed by >>):
>
>drivers/net/wireless/marvell/mwifiex/main.c: In function 
> 'mwifiex_shutdown_sw':
>>> drivers/net/wireless/marvell/mwifiex/main.c:1433:1: warning: label 
>>> 'exit_remove' defined but not used [-Wunused-label]
> exit_remove:
> ^~~

Looks like a valid warning to me, so please resend.

-- 
Kalle Valo


Re: [PATCH v2 2/3] mm, dax: add VM_DAX flag for DAX VMAs

2016-09-15 Thread Dan Williams
On Thu, Sep 15, 2016 at 7:04 PM, Dan Williams  wrote:
> On Thu, Sep 15, 2016 at 6:24 PM, Dave Chinner  wrote:
>> On Thu, Sep 15, 2016 at 05:16:42PM -0700, Dan Williams wrote:
>>> On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner  wrote:
>>> > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote:
>>> >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig  wrote:
>>> >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote:
>>> >> >> The DAX property, page cache bypass, of a VMA is only detectable via 
>>> >> >> the
>>> >> >> vma_is_dax() helper to check the S_DAX inode flag.  However, this is
>>> >> >> only available internal to the kernel and is a property that userspace
>>> >> >> applications would like to interrogate.
>>> >> >
>>> >> > They have absolutely no business knowing such an implementation detail.
>>> >>
>>> >> Hasn't that train already left the station with FS_XFLAG_DAX?
>>> >
>>> > No, that's an admin flag, not a runtime hint for applications. Just
>>> > because that flag is set on an inode, it does not mean that DAX is
>>> > actually in use - it will be ignored if the backing dev is not dax
>>> > capable.
>>>
>>> What's the point of an admin flag if an admin can't do cat /proc/>> of interest>/smaps, or some other mechanism, to validate that the
>>> setting the admin cares about is in effect?
>>
>> Sorry, I don't follow - why would you be looking at mapping file
>> regions in /proc to determine if some file somewhere in a filesystem
>> has a specific flag set on it or not?
>>
>> FS_XFLAG_DAX is an inode attribute flag, not something you can
>> query or administrate through mmap:
>>
>> I.e.
>> # xfs_io -c "lsattr" -c "chattr +x" -c lsattr -c "chattr -x" -c "lsattr" foo
>>  --- foo
>>  --x foo
>>  --- foo
>> #
>>
>> What happens when that flag is set on an inode is determined by a
>> whole bunch of other things that are completely separate to the
>> management of the inode flag itself.
>
> Right, I understand that, but how does an admin audit those "bunch of
> other things" that actually gate whether DAX ends up being used in
> practice?  There's currently no way for userspace to observe that a
> file with FS_XFLAG_DAX actually results in a change in mmap behavior.

Let me put it another way, if we inadvertently break DAX causing it to
be disabled in scenarios when it should be enabled.  What is the
interface for the admin to check "I have the DAX inode flag set, but
the file this application expects to be mapped DAX is mapped with page
cache"?


Re: [PATCH v2 2/3] mm, dax: add VM_DAX flag for DAX VMAs

2016-09-15 Thread Dan Williams
On Thu, Sep 15, 2016 at 7:04 PM, Dan Williams  wrote:
> On Thu, Sep 15, 2016 at 6:24 PM, Dave Chinner  wrote:
>> On Thu, Sep 15, 2016 at 05:16:42PM -0700, Dan Williams wrote:
>>> On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner  wrote:
>>> > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote:
>>> >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig  wrote:
>>> >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote:
>>> >> >> The DAX property, page cache bypass, of a VMA is only detectable via 
>>> >> >> the
>>> >> >> vma_is_dax() helper to check the S_DAX inode flag.  However, this is
>>> >> >> only available internal to the kernel and is a property that userspace
>>> >> >> applications would like to interrogate.
>>> >> >
>>> >> > They have absolutely no business knowing such an implementation detail.
>>> >>
>>> >> Hasn't that train already left the station with FS_XFLAG_DAX?
>>> >
>>> > No, that's an admin flag, not a runtime hint for applications. Just
>>> > because that flag is set on an inode, it does not mean that DAX is
>>> > actually in use - it will be ignored if the backing dev is not dax
>>> > capable.
>>>
>>> What's the point of an admin flag if an admin can't do cat /proc/>> of interest>/smaps, or some other mechanism, to validate that the
>>> setting the admin cares about is in effect?
>>
>> Sorry, I don't follow - why would you be looking at mapping file
>> regions in /proc to determine if some file somewhere in a filesystem
>> has a specific flag set on it or not?
>>
>> FS_XFLAG_DAX is an inode attribute flag, not something you can
>> query or administrate through mmap:
>>
>> I.e.
>> # xfs_io -c "lsattr" -c "chattr +x" -c lsattr -c "chattr -x" -c "lsattr" foo
>>  --- foo
>>  --x foo
>>  --- foo
>> #
>>
>> What happens when that flag is set on an inode is determined by a
>> whole bunch of other things that are completely separate to the
>> management of the inode flag itself.
>
> Right, I understand that, but how does an admin audit those "bunch of
> other things" that actually gate whether DAX ends up being used in
> practice?  There's currently no way for userspace to observe that a
> file with FS_XFLAG_DAX actually results in a change in mmap behavior.

Let me put it another way, if we inadvertently break DAX causing it to
be disabled in scenarios when it should be enabled.  What is the
interface for the admin to check "I have the DAX inode flag set, but
the file this application expects to be mapped DAX is mapped with page
cache"?


[PATCH V2 3/3] mfd: wm8994-core: Don't use managed regulator bulk get API

2016-09-15 Thread Viresh Kumar
The kernel WARNs and then crashes today if wm8994_device_init() fails
after calling devm_regulator_bulk_get().

That happens because there are multiple devices involved here and the
order in which managed resources are freed isn't correct.

The regulators are added as children of wm8994->dev.  Whereas,
devm_regulator_bulk_get() receives wm8994->dev as the device, though it
gets the same regulators which were added as children of wm8994->dev
earlier.

During failures, the children are removed first and the core eventually
calls regulator_unregister() for them. As regulator_put() was never done
for them (opposite of devm_regulator_bulk_get()), the kernel WARNs at

WARN_ON(rdev->open_count);

And eventually it crashes from debugfs_remove_recursive().

x--x

 wm8994 3-001a: Device is not a WM8994, ID is 0
 [ cut here ]
 WARNING: CPU: 0 PID: 1 at 
/mnt/ssd/all/work/repos/devel/linux/drivers/regulator/core.c:4072 
regulator_unregister+0xc8/0xd0
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc6-00154-g54fe84cbd50b #41
 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
 [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
 [] (show_stack) from [] (dump_stack+0x88/0x9c)
 [] (dump_stack) from [] (__warn+0xe8/0x100)
 [] (__warn) from [] (warn_slowpath_null+0x20/0x28)
 [] (warn_slowpath_null) from [] 
(regulator_unregister+0xc8/0xd0)
 [] (regulator_unregister) from [] 
(release_nodes+0x16c/0x1dc)
 [] (release_nodes) from [] 
(__device_release_driver+0x8c/0x110)
 [] (__device_release_driver) from [] 
(device_release_driver+0x1c/0x28)
 [] (device_release_driver) from [] 
(bus_remove_device+0xd8/0x104)
 [] (bus_remove_device) from [] (device_del+0x10c/0x218)
 [] (device_del) from [] (platform_device_del+0x1c/0x88)
 [] (platform_device_del) from [] 
(platform_device_unregister+0xc/0x20)
 [] (platform_device_unregister) from [] 
(mfd_remove_devices_fn+0x5c/0x64)
 [] (mfd_remove_devices_fn) from [] 
(device_for_each_child_reverse+0x4c/0x78)
 [] (device_for_each_child_reverse) from [] 
(mfd_remove_devices+0x20/0x30)
 [] (mfd_remove_devices) from [] 
(wm8994_device_init+0x2ac/0x7f0)
 [] (wm8994_device_init) from [] 
(i2c_device_probe+0x178/0x1fc)
 [] (i2c_device_probe) from [] 
(driver_probe_device+0x214/0x2c0)
 [] (driver_probe_device) from [] 
(__driver_attach+0xac/0xb0)
 [] (__driver_attach) from [] (bus_for_each_dev+0x68/0x9c)
 [] (bus_for_each_dev) from [] (bus_add_driver+0x1a0/0x218)
 [] (bus_add_driver) from [] (driver_register+0x78/0xf8)
 [] (driver_register) from [] 
(i2c_register_driver+0x34/0x84)
 [] (i2c_register_driver) from [] 
(do_one_initcall+0x40/0x170)
 [] (do_one_initcall) from [] 
(kernel_init_freeable+0x15c/0x1fc)
 [] (kernel_init_freeable) from [] (kernel_init+0x8/0x114)
 [] (kernel_init) from [] (ret_from_fork+0x14/0x3c)
 ---[ end trace 0919d3d0bc998260 ]---

 [snip..]

 Unable to handle kernel NULL pointer dereference at virtual address 0078
 pgd = c0004000
 [0078] *pgd=
 Internal error: Oops: 5 [#1] PREEMPT SMP ARM
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW   
4.8.0-rc6-00154-g54fe84cbd50b #41
 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
 task: ee874000 task.stack: ee878000
 PC is at down_write+0x14/0x54
 LR is at debugfs_remove_recursive+0x30/0x150

 [snip..]

 [] (down_write) from [] 
(debugfs_remove_recursive+0x30/0x150)
 [] (debugfs_remove_recursive) from [] 
(_regulator_put+0x24/0xac)
 [] (_regulator_put) from [] (regulator_put+0x1c/0x2c)
 [] (regulator_put) from [] (release_nodes+0x16c/0x1dc)
 [] (release_nodes) from [] (driver_probe_device+0xec/0x2c0)
 [] (driver_probe_device) from [] 
(__driver_attach+0xac/0xb0)
 [] (__driver_attach) from [] (bus_for_each_dev+0x68/0x9c)
 [] (bus_for_each_dev) from [] (bus_add_driver+0x1a0/0x218)
 [] (bus_add_driver) from [] (driver_register+0x78/0xf8)
 [] (driver_register) from [] 
(i2c_register_driver+0x34/0x84)
 [] (i2c_register_driver) from [] 
(do_one_initcall+0x40/0x170)
 [] (do_one_initcall) from [] 
(kernel_init_freeable+0x15c/0x1fc)
 [] (kernel_init_freeable) from [] (kernel_init+0x8/0x114)
 [] (kernel_init) from [] (ret_from_fork+0x14/0x3c)
 Code: e1a04000 f590f000 e3a03001 e34f3fff (e1902f9f)
 ---[ end trace 0919d3d0bc998262 ]---

x--x

Fix the kernel warnings and crashes by using regulator_bulk_get()
instead of devm_regulator_bulk_get() and explicitly freeing the supplies
in exit paths.

Tested on Exynos 5250, dual core ARM A15 machine.

Signed-off-by: Viresh Kumar 

---
V1->V2:
- Use regulator_bulk_free() instead of open coding it.
- Shorter backtrace
- Reworded the last paragraph to make it more clear
- Added a comment in code
---
 drivers/mfd/wm8994-core.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/mfd/wm8994-core.c b/drivers/mfd/wm8994-core.c
index 95e6bc55adbb..953d0790ffd5 

[PATCH V2 3/3] mfd: wm8994-core: Don't use managed regulator bulk get API

2016-09-15 Thread Viresh Kumar
The kernel WARNs and then crashes today if wm8994_device_init() fails
after calling devm_regulator_bulk_get().

That happens because there are multiple devices involved here and the
order in which managed resources are freed isn't correct.

The regulators are added as children of wm8994->dev.  Whereas,
devm_regulator_bulk_get() receives wm8994->dev as the device, though it
gets the same regulators which were added as children of wm8994->dev
earlier.

During failures, the children are removed first and the core eventually
calls regulator_unregister() for them. As regulator_put() was never done
for them (opposite of devm_regulator_bulk_get()), the kernel WARNs at

WARN_ON(rdev->open_count);

And eventually it crashes from debugfs_remove_recursive().

x--x

 wm8994 3-001a: Device is not a WM8994, ID is 0
 [ cut here ]
 WARNING: CPU: 0 PID: 1 at 
/mnt/ssd/all/work/repos/devel/linux/drivers/regulator/core.c:4072 
regulator_unregister+0xc8/0xd0
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc6-00154-g54fe84cbd50b #41
 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
 [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
 [] (show_stack) from [] (dump_stack+0x88/0x9c)
 [] (dump_stack) from [] (__warn+0xe8/0x100)
 [] (__warn) from [] (warn_slowpath_null+0x20/0x28)
 [] (warn_slowpath_null) from [] 
(regulator_unregister+0xc8/0xd0)
 [] (regulator_unregister) from [] 
(release_nodes+0x16c/0x1dc)
 [] (release_nodes) from [] 
(__device_release_driver+0x8c/0x110)
 [] (__device_release_driver) from [] 
(device_release_driver+0x1c/0x28)
 [] (device_release_driver) from [] 
(bus_remove_device+0xd8/0x104)
 [] (bus_remove_device) from [] (device_del+0x10c/0x218)
 [] (device_del) from [] (platform_device_del+0x1c/0x88)
 [] (platform_device_del) from [] 
(platform_device_unregister+0xc/0x20)
 [] (platform_device_unregister) from [] 
(mfd_remove_devices_fn+0x5c/0x64)
 [] (mfd_remove_devices_fn) from [] 
(device_for_each_child_reverse+0x4c/0x78)
 [] (device_for_each_child_reverse) from [] 
(mfd_remove_devices+0x20/0x30)
 [] (mfd_remove_devices) from [] 
(wm8994_device_init+0x2ac/0x7f0)
 [] (wm8994_device_init) from [] 
(i2c_device_probe+0x178/0x1fc)
 [] (i2c_device_probe) from [] 
(driver_probe_device+0x214/0x2c0)
 [] (driver_probe_device) from [] 
(__driver_attach+0xac/0xb0)
 [] (__driver_attach) from [] (bus_for_each_dev+0x68/0x9c)
 [] (bus_for_each_dev) from [] (bus_add_driver+0x1a0/0x218)
 [] (bus_add_driver) from [] (driver_register+0x78/0xf8)
 [] (driver_register) from [] 
(i2c_register_driver+0x34/0x84)
 [] (i2c_register_driver) from [] 
(do_one_initcall+0x40/0x170)
 [] (do_one_initcall) from [] 
(kernel_init_freeable+0x15c/0x1fc)
 [] (kernel_init_freeable) from [] (kernel_init+0x8/0x114)
 [] (kernel_init) from [] (ret_from_fork+0x14/0x3c)
 ---[ end trace 0919d3d0bc998260 ]---

 [snip..]

 Unable to handle kernel NULL pointer dereference at virtual address 0078
 pgd = c0004000
 [0078] *pgd=
 Internal error: Oops: 5 [#1] PREEMPT SMP ARM
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW   
4.8.0-rc6-00154-g54fe84cbd50b #41
 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
 task: ee874000 task.stack: ee878000
 PC is at down_write+0x14/0x54
 LR is at debugfs_remove_recursive+0x30/0x150

 [snip..]

 [] (down_write) from [] 
(debugfs_remove_recursive+0x30/0x150)
 [] (debugfs_remove_recursive) from [] 
(_regulator_put+0x24/0xac)
 [] (_regulator_put) from [] (regulator_put+0x1c/0x2c)
 [] (regulator_put) from [] (release_nodes+0x16c/0x1dc)
 [] (release_nodes) from [] (driver_probe_device+0xec/0x2c0)
 [] (driver_probe_device) from [] 
(__driver_attach+0xac/0xb0)
 [] (__driver_attach) from [] (bus_for_each_dev+0x68/0x9c)
 [] (bus_for_each_dev) from [] (bus_add_driver+0x1a0/0x218)
 [] (bus_add_driver) from [] (driver_register+0x78/0xf8)
 [] (driver_register) from [] 
(i2c_register_driver+0x34/0x84)
 [] (i2c_register_driver) from [] 
(do_one_initcall+0x40/0x170)
 [] (do_one_initcall) from [] 
(kernel_init_freeable+0x15c/0x1fc)
 [] (kernel_init_freeable) from [] (kernel_init+0x8/0x114)
 [] (kernel_init) from [] (ret_from_fork+0x14/0x3c)
 Code: e1a04000 f590f000 e3a03001 e34f3fff (e1902f9f)
 ---[ end trace 0919d3d0bc998262 ]---

x--x

Fix the kernel warnings and crashes by using regulator_bulk_get()
instead of devm_regulator_bulk_get() and explicitly freeing the supplies
in exit paths.

Tested on Exynos 5250, dual core ARM A15 machine.

Signed-off-by: Viresh Kumar 

---
V1->V2:
- Use regulator_bulk_free() instead of open coding it.
- Shorter backtrace
- Reworded the last paragraph to make it more clear
- Added a comment in code
---
 drivers/mfd/wm8994-core.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/mfd/wm8994-core.c b/drivers/mfd/wm8994-core.c
index 95e6bc55adbb..953d0790ffd5 100644
--- 

[PATCH V2 1/3] mfd: wm8994-core: don't split lines unnecessarily

2016-09-15 Thread Viresh Kumar
These can fit in a single line (80 columns), don't split lines
unnecessarily.

Signed-off-by: Viresh Kumar 

---
V1->V2: New patch
---
 drivers/mfd/wm8994-core.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/mfd/wm8994-core.c b/drivers/mfd/wm8994-core.c
index 7eec619a6023..1990b2c90732 100644
--- a/drivers/mfd/wm8994-core.c
+++ b/drivers/mfd/wm8994-core.c
@@ -401,8 +401,7 @@ static int wm8994_device_init(struct wm8994 *wm8994, int 
irq)
goto err;
}
 
-   ret = regulator_bulk_enable(wm8994->num_supplies,
-   wm8994->supplies);
+   ret = regulator_bulk_enable(wm8994->num_supplies, wm8994->supplies);
if (ret != 0) {
dev_err(wm8994->dev, "Failed to enable supplies: %d\n", ret);
goto err;
@@ -606,8 +605,7 @@ static void wm8994_device_exit(struct wm8994 *wm8994)
pm_runtime_disable(wm8994->dev);
mfd_remove_devices(wm8994->dev);
wm8994_irq_exit(wm8994);
-   regulator_bulk_disable(wm8994->num_supplies,
-  wm8994->supplies);
+   regulator_bulk_disable(wm8994->num_supplies, wm8994->supplies);
 }
 
 static const struct of_device_id wm8994_of_match[] = {
-- 
2.7.1.410.g6faf27b



[PATCH V2 2/3] mfd: wm8994-core: disable regulators before removing them

2016-09-15 Thread Viresh Kumar
The order in which resources were freed in wm8994_device_exit() isn't
correct. The regulators are removed before they are disabled.

Fix it by reordering code a bit, which makes it exact opposite of
wm8994_device_init() as well.

Signed-off-by: Viresh Kumar 
Acked-by: Charles Keepax 

---
V1->V2:
- Added Ack from Charles.
---
 drivers/mfd/wm8994-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mfd/wm8994-core.c b/drivers/mfd/wm8994-core.c
index 1990b2c90732..95e6bc55adbb 100644
--- a/drivers/mfd/wm8994-core.c
+++ b/drivers/mfd/wm8994-core.c
@@ -603,9 +603,9 @@ static int wm8994_device_init(struct wm8994 *wm8994, int 
irq)
 static void wm8994_device_exit(struct wm8994 *wm8994)
 {
pm_runtime_disable(wm8994->dev);
-   mfd_remove_devices(wm8994->dev);
wm8994_irq_exit(wm8994);
regulator_bulk_disable(wm8994->num_supplies, wm8994->supplies);
+   mfd_remove_devices(wm8994->dev);
 }
 
 static const struct of_device_id wm8994_of_match[] = {
-- 
2.7.1.410.g6faf27b



[PATCH V2 1/3] mfd: wm8994-core: don't split lines unnecessarily

2016-09-15 Thread Viresh Kumar
These can fit in a single line (80 columns), don't split lines
unnecessarily.

Signed-off-by: Viresh Kumar 

---
V1->V2: New patch
---
 drivers/mfd/wm8994-core.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/mfd/wm8994-core.c b/drivers/mfd/wm8994-core.c
index 7eec619a6023..1990b2c90732 100644
--- a/drivers/mfd/wm8994-core.c
+++ b/drivers/mfd/wm8994-core.c
@@ -401,8 +401,7 @@ static int wm8994_device_init(struct wm8994 *wm8994, int 
irq)
goto err;
}
 
-   ret = regulator_bulk_enable(wm8994->num_supplies,
-   wm8994->supplies);
+   ret = regulator_bulk_enable(wm8994->num_supplies, wm8994->supplies);
if (ret != 0) {
dev_err(wm8994->dev, "Failed to enable supplies: %d\n", ret);
goto err;
@@ -606,8 +605,7 @@ static void wm8994_device_exit(struct wm8994 *wm8994)
pm_runtime_disable(wm8994->dev);
mfd_remove_devices(wm8994->dev);
wm8994_irq_exit(wm8994);
-   regulator_bulk_disable(wm8994->num_supplies,
-  wm8994->supplies);
+   regulator_bulk_disable(wm8994->num_supplies, wm8994->supplies);
 }
 
 static const struct of_device_id wm8994_of_match[] = {
-- 
2.7.1.410.g6faf27b



[PATCH V2 2/3] mfd: wm8994-core: disable regulators before removing them

2016-09-15 Thread Viresh Kumar
The order in which resources were freed in wm8994_device_exit() isn't
correct. The regulators are removed before they are disabled.

Fix it by reordering code a bit, which makes it exact opposite of
wm8994_device_init() as well.

Signed-off-by: Viresh Kumar 
Acked-by: Charles Keepax 

---
V1->V2:
- Added Ack from Charles.
---
 drivers/mfd/wm8994-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mfd/wm8994-core.c b/drivers/mfd/wm8994-core.c
index 1990b2c90732..95e6bc55adbb 100644
--- a/drivers/mfd/wm8994-core.c
+++ b/drivers/mfd/wm8994-core.c
@@ -603,9 +603,9 @@ static int wm8994_device_init(struct wm8994 *wm8994, int 
irq)
 static void wm8994_device_exit(struct wm8994 *wm8994)
 {
pm_runtime_disable(wm8994->dev);
-   mfd_remove_devices(wm8994->dev);
wm8994_irq_exit(wm8994);
regulator_bulk_disable(wm8994->num_supplies, wm8994->supplies);
+   mfd_remove_devices(wm8994->dev);
 }
 
 static const struct of_device_id wm8994_of_match[] = {
-- 
2.7.1.410.g6faf27b



[fsnotify] ced8959181: kernel BUG at fs/notify/notification.c:66!

2016-09-15 Thread kernel test robot

FYI, we noticed the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit ced89591817caae5d5e1c0c1e76a7f809d004f57 ("fsnotify: convert 
notification_mutex to a spinlock")

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu kvm64,+ssse3 -smp 2 -m 8G

caused below changes:


+--+++
|  | f82fa3d0e7 | ced8959181 |
+--+++
| boot_successes   | 8  | 0  |
| boot_failures| 0  | 8  |
| kernel_BUG_at_fs/notify/notification.c   | 0  | 8  |
| invalid_opcode:#[##]DEBUG_PAGEALLOC  | 0  | 8  |
| RIP:fsnotify_notify_queue_is_empty   | 0  | 8  |
| calltrace:SyS_poll   | 0  | 4  |
| Kernel_panic-not_syncing:Fatal_exception | 0  | 8  |
| calltrace:SyS_epoll_ctl  | 0  | 4  |
+--+++



[   17.967731] random: systemd: uninitialized urandom read (16 bytes read)
[   17.969211] random: systemd: uninitialized urandom read (16 bytes read)
[   17.971379] [ cut here ]
[   17.972173] kernel BUG at fs/notify/notification.c:66!
[   17.973243] invalid opcode:  [#1] DEBUG_PAGEALLOC
[   17.974000] Modules linked in:
[   17.974487] CPU: 0 PID: 1 Comm: systemd Not tainted 4.8.0-rc6-00236-gced8959 
#1
[   17.975597] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[   17.976943] task: a1d9f2be7000 task.stack: a1d9f2bfc000
[   17.977825] RIP: 0010:[]  [] 
fsnotify_notify_queue_is_empty+0x9/0x10
[   17.979342] RSP: 0018:a1d9f2bffe58  EFLAGS: 00010297
[   17.980134] RAX: a1d9f2be7000 RBX: a1d9f2bfff00 RCX: a1d9eb8ce408
[   17.981244] RDX:  RSI: a1d9eb8ce358 RDI: a1d9eb8ce3e8
[   17.982452] RBP: a1d9f2bffe58 R08: 0004 R09: a1d9eb8ce340
[   17.983499] R10: 01c4 R11: a1d9eb8ce000 R12: a1d9eb8ce3e8
[   17.984594] R13: a1d9eb8cea00 R14: a1d9eb8ce408 R15: 979ae630
[   17.985710] FS:  7f81ac2c5900() GS:9801f000() 
knlGS:
[   17.986903] CS:  0010 DS:  ES:  CR0: 80050033
[   17.987772] CR2: 7f81aadc12a0 CR3: 00022a715000 CR4: 06b0
[   17.988921] Stack:
[   17.989237]  a1d9f2bffe90 979ada7f fff4 
a1d9eb8ce200
[   17.990420]  317ed0e00019 a1d9eb8cea00 a1d9ebefdd00 
a1d9f2bfff48
[   17.991618]  979afd0a a1d9eb8cea00 a1d9eb8cea00 
a1d9f2bffed0
[   17.992832] Call Trace:
[   17.993215]  [] inotify_poll+0x5f/0x80
[   17.994015]  [] SyS_epoll_ctl+0x55a/0xce0
[   17.994858]  [] ? ep_poll_wakeup_proc+0x30/0x30
[   17.995775]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
[   17.996778] Code: 66 90 55 48 89 e5 e8 f7 d4 f0 ff b8 01 00 00 00 0f c1 05 
db 8b 86 00 83 c0 01 5d c3 66 0f 1f 44 00 00 55 48 89 e5 e8 d7 d4 f0 ff <0f> 0b 
0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 f3 49 89 fc e8 
[   18.001178] RIP  [] fsnotify_notify_queue_is_empty+0x9/0x10
[   18.002268]  RSP 
[   18.002809] ---[ end trace 299e1f1032a0eef3 ]---
[   18.003501] Kernel panic - not syncing: Fatal exception





Thanks,
Kernel Test Robot
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 4.8.0-rc6 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set

[fsnotify] ced8959181: kernel BUG at fs/notify/notification.c:66!

2016-09-15 Thread kernel test robot

FYI, we noticed the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit ced89591817caae5d5e1c0c1e76a7f809d004f57 ("fsnotify: convert 
notification_mutex to a spinlock")

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu kvm64,+ssse3 -smp 2 -m 8G

caused below changes:


+--+++
|  | f82fa3d0e7 | ced8959181 |
+--+++
| boot_successes   | 8  | 0  |
| boot_failures| 0  | 8  |
| kernel_BUG_at_fs/notify/notification.c   | 0  | 8  |
| invalid_opcode:#[##]DEBUG_PAGEALLOC  | 0  | 8  |
| RIP:fsnotify_notify_queue_is_empty   | 0  | 8  |
| calltrace:SyS_poll   | 0  | 4  |
| Kernel_panic-not_syncing:Fatal_exception | 0  | 8  |
| calltrace:SyS_epoll_ctl  | 0  | 4  |
+--+++



[   17.967731] random: systemd: uninitialized urandom read (16 bytes read)
[   17.969211] random: systemd: uninitialized urandom read (16 bytes read)
[   17.971379] [ cut here ]
[   17.972173] kernel BUG at fs/notify/notification.c:66!
[   17.973243] invalid opcode:  [#1] DEBUG_PAGEALLOC
[   17.974000] Modules linked in:
[   17.974487] CPU: 0 PID: 1 Comm: systemd Not tainted 4.8.0-rc6-00236-gced8959 
#1
[   17.975597] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[   17.976943] task: a1d9f2be7000 task.stack: a1d9f2bfc000
[   17.977825] RIP: 0010:[]  [] 
fsnotify_notify_queue_is_empty+0x9/0x10
[   17.979342] RSP: 0018:a1d9f2bffe58  EFLAGS: 00010297
[   17.980134] RAX: a1d9f2be7000 RBX: a1d9f2bfff00 RCX: a1d9eb8ce408
[   17.981244] RDX:  RSI: a1d9eb8ce358 RDI: a1d9eb8ce3e8
[   17.982452] RBP: a1d9f2bffe58 R08: 0004 R09: a1d9eb8ce340
[   17.983499] R10: 01c4 R11: a1d9eb8ce000 R12: a1d9eb8ce3e8
[   17.984594] R13: a1d9eb8cea00 R14: a1d9eb8ce408 R15: 979ae630
[   17.985710] FS:  7f81ac2c5900() GS:9801f000() 
knlGS:
[   17.986903] CS:  0010 DS:  ES:  CR0: 80050033
[   17.987772] CR2: 7f81aadc12a0 CR3: 00022a715000 CR4: 06b0
[   17.988921] Stack:
[   17.989237]  a1d9f2bffe90 979ada7f fff4 
a1d9eb8ce200
[   17.990420]  317ed0e00019 a1d9eb8cea00 a1d9ebefdd00 
a1d9f2bfff48
[   17.991618]  979afd0a a1d9eb8cea00 a1d9eb8cea00 
a1d9f2bffed0
[   17.992832] Call Trace:
[   17.993215]  [] inotify_poll+0x5f/0x80
[   17.994015]  [] SyS_epoll_ctl+0x55a/0xce0
[   17.994858]  [] ? ep_poll_wakeup_proc+0x30/0x30
[   17.995775]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
[   17.996778] Code: 66 90 55 48 89 e5 e8 f7 d4 f0 ff b8 01 00 00 00 0f c1 05 
db 8b 86 00 83 c0 01 5d c3 66 0f 1f 44 00 00 55 48 89 e5 e8 d7 d4 f0 ff <0f> 0b 
0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 f3 49 89 fc e8 
[   18.001178] RIP  [] fsnotify_notify_queue_is_empty+0x9/0x10
[   18.002268]  RSP 
[   18.002809] ---[ end trace 299e1f1032a0eef3 ]---
[   18.003501] Kernel panic - not syncing: Fatal exception





Thanks,
Kernel Test Robot
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 4.8.0-rc6 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set

linux-next: build failure after merge of the slave-dma tree

2016-09-15 Thread Stephen Rothwell
Hi Vinod,

After merging the slave-dma tree, today's linux-next build (arm
multi_v7_defconfig build) failed like this:

drivers/video/fbdev/Kconfig:5:error: recursive dependency detected!
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/fbdev/Kconfig:5:  symbol FB is selected by DRM_KMS_FB_HELPER
drivers/gpu/drm/Kconfig:42: symbol DRM_KMS_FB_HELPER depends on 
DRM_KMS_HELPER
drivers/gpu/drm/Kconfig:36: symbol DRM_KMS_HELPER is selected by 
DRM_VIRTIO_GPU
drivers/gpu/drm/virtio/Kconfig:1:   symbol DRM_VIRTIO_GPU depends on VIRTIO
drivers/virtio/Kconfig:1:   symbol VIRTIO is selected by REMOTEPROC
drivers/remoteproc/Kconfig:4:   symbol REMOTEPROC is selected by 
ST_SLIM_REMOTEPROC
drivers/remoteproc/Kconfig:103: symbol ST_SLIM_REMOTEPROC is selected by ST_FDMA
drivers/dma/Kconfig:440:symbol ST_FDMA depends on DMADEVICES
drivers/dma/Kconfig:5:  symbol DMADEVICES is selected by SND_SOC_SH4_SIU
sound/soc/sh/Kconfig:29:symbol SND_SOC_SH4_SIU is selected by 
SND_SIU_MIGOR
sound/soc/sh/Kconfig:58:symbol SND_SIU_MIGOR depends on I2C
drivers/i2c/Kconfig:7:  symbol I2C is selected by FB_DDC
drivers/video/fbdev/Kconfig:63: symbol FB_DDC is selected by FB_CYBER2000_DDC
drivers/video/fbdev/Kconfig:382:symbol FB_CYBER2000_DDC depends on 
FB_CYBER2000
drivers/video/fbdev/Kconfig:370:symbol FB_CYBER2000 depends on FB
*
* Restart config...
*
*
* USB HID Boot Protocol drivers
*
USB HIDBP Keyboard (simple Boot) support (USB_KBD) [N/m/y/?] (NEW) aborted!

Console input/output is redirected. Run 'make oldconfig' to update 
configuration.

/home/sfr/next/next/scripts/kconfig/Makefile:37: recipe for target 
'silentoldconfig' failed

Probably caused by commits

  73f19dcade3d ("remoteproc: st_slim_rproc: add a slimcore rproc driver")
  e0cbdfc766bf ("dmaengine: st_fdma: Add STMicroelectronics FDMA engine driver 
support")
  a490790f8df3 ("ARM: multi_v7_defconfig: Enable STi and simple-card drivers.")

I have used the slave-dma tree from next-20160915 for today.

-- 
Cheers,
Stephen Rothwell


linux-next: build failure after merge of the slave-dma tree

2016-09-15 Thread Stephen Rothwell
Hi Vinod,

After merging the slave-dma tree, today's linux-next build (arm
multi_v7_defconfig build) failed like this:

drivers/video/fbdev/Kconfig:5:error: recursive dependency detected!
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/fbdev/Kconfig:5:  symbol FB is selected by DRM_KMS_FB_HELPER
drivers/gpu/drm/Kconfig:42: symbol DRM_KMS_FB_HELPER depends on 
DRM_KMS_HELPER
drivers/gpu/drm/Kconfig:36: symbol DRM_KMS_HELPER is selected by 
DRM_VIRTIO_GPU
drivers/gpu/drm/virtio/Kconfig:1:   symbol DRM_VIRTIO_GPU depends on VIRTIO
drivers/virtio/Kconfig:1:   symbol VIRTIO is selected by REMOTEPROC
drivers/remoteproc/Kconfig:4:   symbol REMOTEPROC is selected by 
ST_SLIM_REMOTEPROC
drivers/remoteproc/Kconfig:103: symbol ST_SLIM_REMOTEPROC is selected by ST_FDMA
drivers/dma/Kconfig:440:symbol ST_FDMA depends on DMADEVICES
drivers/dma/Kconfig:5:  symbol DMADEVICES is selected by SND_SOC_SH4_SIU
sound/soc/sh/Kconfig:29:symbol SND_SOC_SH4_SIU is selected by 
SND_SIU_MIGOR
sound/soc/sh/Kconfig:58:symbol SND_SIU_MIGOR depends on I2C
drivers/i2c/Kconfig:7:  symbol I2C is selected by FB_DDC
drivers/video/fbdev/Kconfig:63: symbol FB_DDC is selected by FB_CYBER2000_DDC
drivers/video/fbdev/Kconfig:382:symbol FB_CYBER2000_DDC depends on 
FB_CYBER2000
drivers/video/fbdev/Kconfig:370:symbol FB_CYBER2000 depends on FB
*
* Restart config...
*
*
* USB HID Boot Protocol drivers
*
USB HIDBP Keyboard (simple Boot) support (USB_KBD) [N/m/y/?] (NEW) aborted!

Console input/output is redirected. Run 'make oldconfig' to update 
configuration.

/home/sfr/next/next/scripts/kconfig/Makefile:37: recipe for target 
'silentoldconfig' failed

Probably caused by commits

  73f19dcade3d ("remoteproc: st_slim_rproc: add a slimcore rproc driver")
  e0cbdfc766bf ("dmaengine: st_fdma: Add STMicroelectronics FDMA engine driver 
support")
  a490790f8df3 ("ARM: multi_v7_defconfig: Enable STi and simple-card drivers.")

I have used the slave-dma tree from next-20160915 for today.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH v5 1/6] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request

2016-09-15 Thread Aneesh Kumar K.V
js1...@gmail.com writes:

> From: Joonsoo Kim 
>
> Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> important to reserve. When ZONE_MOVABLE is used, this problem would
> theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> allocation request which is mainly used for page cache and anon page
> allocation. So, fix it.
>
> And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> makes code complex. For example, if there is highmem system, following
> reserve ratio is activated for *NORMAL ZONE* which would be easyily
> misleading people.
>
>  #ifdef CONFIG_HIGHMEM
>  32
>  #endif
>
> This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> array by MAX_NR_ZONES and place "#ifdef" to right place.
>
> Signed-off-by: Joonsoo Kim 
> ---
>  include/linux/mmzone.h | 2 +-
>  mm/page_alloc.c| 7 ---
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index d572b78..e3f39af 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -877,7 +877,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, 
> int,
>   void __user *, size_t *, loff_t *);
>  int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
>   void __user *, size_t *, loff_t *);
> -extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
> +extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
>  int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
>   void __user *, size_t *, loff_t *);
>  int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4f7d5d7..a8310de 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -198,17 +198,18 @@ static void __free_pages_ok(struct page *page, unsigned 
> int order);
>   * TBD: should special case ZONE_DMA32 machines here - in those we normally
>   * don't need any ZONE_NORMAL reservation
>   */
> -int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
> +int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
>  #ifdef CONFIG_ZONE_DMA
>256,
>  #endif
>  #ifdef CONFIG_ZONE_DMA32
>256,
>  #endif
> -#ifdef CONFIG_HIGHMEM
>32,
> +#ifdef CONFIG_HIGHMEM
> +  INT_MAX,
>  #endif
> -  32,
> +  INT_MAX,
>  };
>
>  EXPORT_SYMBOL(totalram_pages);
> -- 
> 1.9.1

We can also do things like below to make it readable ?

#ifdef CONFIG_ZONE_DMA
[ZONE_DMA] = 256,
#endif

Reviewed-by: Aneesh Kumar K.V 



Re: [PATCH v5 1/6] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request

2016-09-15 Thread Aneesh Kumar K.V
js1...@gmail.com writes:

> From: Joonsoo Kim 
>
> Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> important to reserve. When ZONE_MOVABLE is used, this problem would
> theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> allocation request which is mainly used for page cache and anon page
> allocation. So, fix it.
>
> And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> makes code complex. For example, if there is highmem system, following
> reserve ratio is activated for *NORMAL ZONE* which would be easyily
> misleading people.
>
>  #ifdef CONFIG_HIGHMEM
>  32
>  #endif
>
> This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> array by MAX_NR_ZONES and place "#ifdef" to right place.
>
> Signed-off-by: Joonsoo Kim 
> ---
>  include/linux/mmzone.h | 2 +-
>  mm/page_alloc.c| 7 ---
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index d572b78..e3f39af 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -877,7 +877,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, 
> int,
>   void __user *, size_t *, loff_t *);
>  int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
>   void __user *, size_t *, loff_t *);
> -extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
> +extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
>  int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
>   void __user *, size_t *, loff_t *);
>  int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4f7d5d7..a8310de 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -198,17 +198,18 @@ static void __free_pages_ok(struct page *page, unsigned 
> int order);
>   * TBD: should special case ZONE_DMA32 machines here - in those we normally
>   * don't need any ZONE_NORMAL reservation
>   */
> -int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
> +int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
>  #ifdef CONFIG_ZONE_DMA
>256,
>  #endif
>  #ifdef CONFIG_ZONE_DMA32
>256,
>  #endif
> -#ifdef CONFIG_HIGHMEM
>32,
> +#ifdef CONFIG_HIGHMEM
> +  INT_MAX,
>  #endif
> -  32,
> +  INT_MAX,
>  };
>
>  EXPORT_SYMBOL(totalram_pages);
> -- 
> 1.9.1

We can also do things like below to make it readable ?

#ifdef CONFIG_ZONE_DMA
[ZONE_DMA] = 256,
#endif

Reviewed-by: Aneesh Kumar K.V 



Re: [PATCH 3/4] autofs - make mountpoint checks namespace aware

2016-09-15 Thread Ian Kent
On Thu, 2016-09-15 at 19:47 -0500, Eric W. Biederman wrote:
> Ian Kent  writes:
> 
> > On Wed, 2016-09-14 at 21:08 -0500, Eric W. Biederman wrote:
> > > Ian Kent  writes:
> > > 
> > > > On Wed, 2016-09-14 at 12:28 -0500, Eric W. Biederman wrote:
> > > > > Ian Kent  writes:
> > > > > 
> > > > > > If an automount mount is clone(2)ed into a file system that is
> > > > > > propagation private, when it later expires in the originating
> > > > > > namespace subsequent calls to autofs ->d_automount() for that
> > > > > > dentry in the original namespace will return ELOOP until the
> > > > > > mount is manually umounted in the cloned namespace.
> > > > > > 
> > > > > > In the same way, if an autofs mount is triggered by automount(8)
> > > > > > running within a container the dentry will be seen as mounted in
> > > > > > the root init namespace and calls to ->d_automount() in that
> > > > > > namespace
> > > > > > will return ELOOP until the mount is umounted within the container.
> > > > > > 
> > > > > > Also, have_submounts() can return an incorect result when a mount
> > > > > > exists in a namespace other than the one being checked.
> > > > > 
> > > > > Overall this appears to be a fairly reasonable set of changes.  It
> > > > > does
> > > > > increase the expense when an actual mount point is encountered, but if
> > > > > these are the desired some increase in cost when a dentry is a
> > > > > mountpoint is unavoidable.
> > > > > 
> > > > > May I ask the motiviation for this set of changes?  Reading through
> > > > > the
> > > > > changes I don't grasp why we want to change the behavior of autofs.
> > > > > What problem is being solved?  What are the benefits?
> > > > 
> > > > LOL, it's all too easy for me to give a patch description that I think
> > > > explains
> > > > a problem I need to solve without realizing it isn't clear to others
> > > > what
> > > > the
> > > > problem is, sorry about that.
> > > > 
> > > > For quite a while now, and not that frequently but consistently, I've
> > > > been
> > > > getting reports of people using autofs getting ELOOP errors and not
> > > > being
> > > > able
> > > > to mount automounts.
> > > > 
> > > > This has been due to the cloning of autofs file systems (that have
> > > > active
> > > > automounts at the time of the clone) by other systems.
> > > > 
> > > > An unshare, as one example, can easily result in the cloning of an
> > > > autofs
> > > > file
> > > > system that has active mounts which shows this problem.
> > > > 
> > > > Once an active mount that has been cloned is expired in the namespace
> > > > that
> > > > performed the unshare it can't be (auto)mounted again in the the
> > > > originating
> > > > namespace because the mounted check in the autofs module will think it
> > > > is
> > > > already mounted.
> > > > 
> > > > I'm not sure this is a clear description either, hopefully it is enough
> > > > to
> > > > demonstrate the type of problem I'm typing to solve.
> > > 
> > > So to rephrase the problem is that an autofs instance can stop working
> > > properly from the perspective of the mount namespace it is mounted in
> > > if the autofs instance is shared between multiple mount namespaces.  The
> > > problem is that mounts and unmounts do not always propogate between
> > > mount namespaces.  This lack of symmetric mount/unmount behavior
> > > leads to mountpoints that become unusable.
> > 
> > That's right.
> > 
> > It's also worth considering that symmetric mount propagation is usually not
> > the
> > behaviour needed either and things like LXC and Docker are set propagation
> > slave
> > because of problems caused by propagation back to the parent namespace.
> > 
> > So a mount can be triggered within a container, mounted by the automount
> > daemon
> > in the parent namespace, and propagated to the child and similarly for
> > expires,
> > which is the common use case now.
> > 
> > > 
> > > Which leads to the question what is the expected new behavior with your
> > > patchset applied.  New mounts can be added in the parent mount namespace
> > > (because the test is local).  Does your change also allow the
> > > autofs mountpoints to be used in the other mount namespaces that share
> > > the autofs instance if everything becomes unmounted?
> > 
> > The problem occurs when the subordinate namespace doesn't deal with these
> > propagated mounts properly, although they can obviously be used by the
> > subordinate namespace.
> > 
> > > 
> > > Or is it expected that other mount namespaces that share an autofs
> > > instance will get changes in their mounts via mount propagation and if
> > > mount propagation is insufficient they are on their own.
> > 
> > Namespaces that receive updates via mount propagation from a parent will
> > continue to function as they do now.
> > 
> > Mounts that don't get updates via mount propagation will retain the mount to
> > use
> > if they need to, as they would without 

Re: [PATCH 3/4] autofs - make mountpoint checks namespace aware

2016-09-15 Thread Ian Kent
On Thu, 2016-09-15 at 19:47 -0500, Eric W. Biederman wrote:
> Ian Kent  writes:
> 
> > On Wed, 2016-09-14 at 21:08 -0500, Eric W. Biederman wrote:
> > > Ian Kent  writes:
> > > 
> > > > On Wed, 2016-09-14 at 12:28 -0500, Eric W. Biederman wrote:
> > > > > Ian Kent  writes:
> > > > > 
> > > > > > If an automount mount is clone(2)ed into a file system that is
> > > > > > propagation private, when it later expires in the originating
> > > > > > namespace subsequent calls to autofs ->d_automount() for that
> > > > > > dentry in the original namespace will return ELOOP until the
> > > > > > mount is manually umounted in the cloned namespace.
> > > > > > 
> > > > > > In the same way, if an autofs mount is triggered by automount(8)
> > > > > > running within a container the dentry will be seen as mounted in
> > > > > > the root init namespace and calls to ->d_automount() in that
> > > > > > namespace
> > > > > > will return ELOOP until the mount is umounted within the container.
> > > > > > 
> > > > > > Also, have_submounts() can return an incorect result when a mount
> > > > > > exists in a namespace other than the one being checked.
> > > > > 
> > > > > Overall this appears to be a fairly reasonable set of changes.  It
> > > > > does
> > > > > increase the expense when an actual mount point is encountered, but if
> > > > > these are the desired some increase in cost when a dentry is a
> > > > > mountpoint is unavoidable.
> > > > > 
> > > > > May I ask the motiviation for this set of changes?  Reading through
> > > > > the
> > > > > changes I don't grasp why we want to change the behavior of autofs.
> > > > > What problem is being solved?  What are the benefits?
> > > > 
> > > > LOL, it's all too easy for me to give a patch description that I think
> > > > explains
> > > > a problem I need to solve without realizing it isn't clear to others
> > > > what
> > > > the
> > > > problem is, sorry about that.
> > > > 
> > > > For quite a while now, and not that frequently but consistently, I've
> > > > been
> > > > getting reports of people using autofs getting ELOOP errors and not
> > > > being
> > > > able
> > > > to mount automounts.
> > > > 
> > > > This has been due to the cloning of autofs file systems (that have
> > > > active
> > > > automounts at the time of the clone) by other systems.
> > > > 
> > > > An unshare, as one example, can easily result in the cloning of an
> > > > autofs
> > > > file
> > > > system that has active mounts which shows this problem.
> > > > 
> > > > Once an active mount that has been cloned is expired in the namespace
> > > > that
> > > > performed the unshare it can't be (auto)mounted again in the the
> > > > originating
> > > > namespace because the mounted check in the autofs module will think it
> > > > is
> > > > already mounted.
> > > > 
> > > > I'm not sure this is a clear description either, hopefully it is enough
> > > > to
> > > > demonstrate the type of problem I'm typing to solve.
> > > 
> > > So to rephrase the problem is that an autofs instance can stop working
> > > properly from the perspective of the mount namespace it is mounted in
> > > if the autofs instance is shared between multiple mount namespaces.  The
> > > problem is that mounts and unmounts do not always propogate between
> > > mount namespaces.  This lack of symmetric mount/unmount behavior
> > > leads to mountpoints that become unusable.
> > 
> > That's right.
> > 
> > It's also worth considering that symmetric mount propagation is usually not
> > the
> > behaviour needed either and things like LXC and Docker are set propagation
> > slave
> > because of problems caused by propagation back to the parent namespace.
> > 
> > So a mount can be triggered within a container, mounted by the automount
> > daemon
> > in the parent namespace, and propagated to the child and similarly for
> > expires,
> > which is the common use case now.
> > 
> > > 
> > > Which leads to the question what is the expected new behavior with your
> > > patchset applied.  New mounts can be added in the parent mount namespace
> > > (because the test is local).  Does your change also allow the
> > > autofs mountpoints to be used in the other mount namespaces that share
> > > the autofs instance if everything becomes unmounted?
> > 
> > The problem occurs when the subordinate namespace doesn't deal with these
> > propagated mounts properly, although they can obviously be used by the
> > subordinate namespace.
> > 
> > > 
> > > Or is it expected that other mount namespaces that share an autofs
> > > instance will get changes in their mounts via mount propagation and if
> > > mount propagation is insufficient they are on their own.
> > 
> > Namespaces that receive updates via mount propagation from a parent will
> > continue to function as they do now.
> > 
> > Mounts that don't get updates via mount propagation will retain the mount to
> > use
> > if they need to, as they would without this change, but the originating
> > namespace will 

Re: [PATCH] PM / OPP: avoid maybe-uninitialized warning

2016-09-15 Thread Viresh Kumar
On 15-09-16, 17:38, Arnd Bergmann wrote:
> When CONFIG_OPTIMIZE_INLINING is set and we are building with 
> -Wmaybe-uninitialized
> enabled, we can get a warning for the opp core driver:
> 
> drivers/base/power/opp/core.c: In function 'dev_pm_opp_set_rate':
> drivers/base/power/opp/core.c:560:8: warning: 'ou_volt_min' may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
> 
> This has only now appeared as a result of commit 797da5598f3a ("PM / devfreq:
> Add COMPILE_TEST for build coverage"), which makes the driver visible in
> some configurations that didn't have it before.
> 
> The warning is a false positive that I got with gcc-6.1.1, but there is
> a simple workaround in removing the local variables that we get warnings
> for (all three are affected depending on the configuration). This also
> makes the code easier to read.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/base/power/opp/core.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
> index df0c70963d9e..4c7c6da7a989 100644
> --- a/drivers/base/power/opp/core.c
> +++ b/drivers/base/power/opp/core.c
> @@ -584,7 +584,6 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long 
> target_freq)
>   struct clk *clk;
>   unsigned long freq, old_freq;
>   unsigned long u_volt, u_volt_min, u_volt_max;
> - unsigned long ou_volt, ou_volt_min, ou_volt_max;
>   int ret;
>  
>   if (unlikely(!target_freq)) {
> @@ -620,11 +619,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned 
> long target_freq)
>   }
>  
>   old_opp = _find_freq_ceil(opp_table, _freq);
> - if (!IS_ERR(old_opp)) {
> - ou_volt = old_opp->u_volt;
> - ou_volt_min = old_opp->u_volt_min;
> - ou_volt_max = old_opp->u_volt_max;
> - } else {
> + if (IS_ERR(old_opp)) {
>   dev_err(dev, "%s: failed to find current OPP for freq %lu 
> (%ld)\n",
>   __func__, old_freq, PTR_ERR(old_opp));
>   }
> @@ -683,7 +678,8 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long 
> target_freq)
>  restore_voltage:
>   /* This shouldn't harm even if the voltages weren't updated earlier */
>   if (!IS_ERR(old_opp))
> - _set_opp_voltage(dev, reg, ou_volt, ou_volt_min, ou_volt_max);
> + _set_opp_voltage(dev, reg, old_opp->u_volt,
> +  old_opp->u_volt_min, old_opp->u_volt_max);
>  
>   return ret;
>  }

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH] PM / OPP: avoid maybe-uninitialized warning

2016-09-15 Thread Viresh Kumar
On 15-09-16, 17:38, Arnd Bergmann wrote:
> When CONFIG_OPTIMIZE_INLINING is set and we are building with 
> -Wmaybe-uninitialized
> enabled, we can get a warning for the opp core driver:
> 
> drivers/base/power/opp/core.c: In function 'dev_pm_opp_set_rate':
> drivers/base/power/opp/core.c:560:8: warning: 'ou_volt_min' may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
> 
> This has only now appeared as a result of commit 797da5598f3a ("PM / devfreq:
> Add COMPILE_TEST for build coverage"), which makes the driver visible in
> some configurations that didn't have it before.
> 
> The warning is a false positive that I got with gcc-6.1.1, but there is
> a simple workaround in removing the local variables that we get warnings
> for (all three are affected depending on the configuration). This also
> makes the code easier to read.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/base/power/opp/core.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
> index df0c70963d9e..4c7c6da7a989 100644
> --- a/drivers/base/power/opp/core.c
> +++ b/drivers/base/power/opp/core.c
> @@ -584,7 +584,6 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long 
> target_freq)
>   struct clk *clk;
>   unsigned long freq, old_freq;
>   unsigned long u_volt, u_volt_min, u_volt_max;
> - unsigned long ou_volt, ou_volt_min, ou_volt_max;
>   int ret;
>  
>   if (unlikely(!target_freq)) {
> @@ -620,11 +619,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned 
> long target_freq)
>   }
>  
>   old_opp = _find_freq_ceil(opp_table, _freq);
> - if (!IS_ERR(old_opp)) {
> - ou_volt = old_opp->u_volt;
> - ou_volt_min = old_opp->u_volt_min;
> - ou_volt_max = old_opp->u_volt_max;
> - } else {
> + if (IS_ERR(old_opp)) {
>   dev_err(dev, "%s: failed to find current OPP for freq %lu 
> (%ld)\n",
>   __func__, old_freq, PTR_ERR(old_opp));
>   }
> @@ -683,7 +678,8 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long 
> target_freq)
>  restore_voltage:
>   /* This shouldn't harm even if the voltages weren't updated earlier */
>   if (!IS_ERR(old_opp))
> - _set_opp_voltage(dev, reg, ou_volt, ou_volt_min, ou_volt_max);
> + _set_opp_voltage(dev, reg, old_opp->u_volt,
> +  old_opp->u_volt_min, old_opp->u_volt_max);
>  
>   return ret;
>  }

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH] staging: android ion/hisi: fix dependencies

2016-09-15 Thread Sumit Semwal
On 16 September 2016 at 07:40, Laura Abbott  wrote:
> On 09/15/2016 08:57 AM, Arnd Bergmann wrote:
>>
>> The newly added Hi6220 Ion code fails to build when the ION_OF helpers
>> are not present:
>>
>> drivers/staging/android/ion/hisilicon/hi6220_ion.o: In function
>> `hi6220_ion_remove':
>> hi6220_ion.c:(.text.hi6220_ion_remove+0x4c): undefined reference to
>> `ion_destroy_platform_data'
>> drivers/staging/android/ion/hisilicon/hi6220_ion.o: In function
>> `hi6220_ion_probe':
>> hi6220_ion.c:(.text.hi6220_ion_probe+0x5c): undefined reference to
>> `ion_parse_dt'
>> hi6220_ion.c:(.text.hi6220_ion_probe+0xf8): undefined reference to
>> `ion_destroy_platform_data'
>>
>> This selects the symbol when needed.
>>
>
> Acked-by: Laura Abbott 
>
Reviewed-by: Sumit Semwal 
>
>> Signed-off-by: Arnd Bergmann 
>> Fixes: 2b40182a19bc ("staging: android: ion: Add ion driver for Hi6220 SoC
>> platform")
>> ---
>>  drivers/staging/android/ion/Kconfig | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/staging/android/ion/Kconfig
>> b/drivers/staging/android/ion/Kconfig
>> index 94105544feef..c8fb4134c55d 100644
>> --- a/drivers/staging/android/ion/Kconfig
>> +++ b/drivers/staging/android/ion/Kconfig
>> @@ -36,6 +36,7 @@ config ION_TEGRA
>>  config ION_HISI
>> tristate "Ion for Hisilicon"
>> depends on ARCH_HISI && ION
>> +   select ION_OF
>> help
>>   Choose this option if you wish to use ion on Hisilicon Platform.
>>
>>
>


Re: [PATCH] staging: android ion/hisi: fix dependencies

2016-09-15 Thread Sumit Semwal
On 16 September 2016 at 07:40, Laura Abbott  wrote:
> On 09/15/2016 08:57 AM, Arnd Bergmann wrote:
>>
>> The newly added Hi6220 Ion code fails to build when the ION_OF helpers
>> are not present:
>>
>> drivers/staging/android/ion/hisilicon/hi6220_ion.o: In function
>> `hi6220_ion_remove':
>> hi6220_ion.c:(.text.hi6220_ion_remove+0x4c): undefined reference to
>> `ion_destroy_platform_data'
>> drivers/staging/android/ion/hisilicon/hi6220_ion.o: In function
>> `hi6220_ion_probe':
>> hi6220_ion.c:(.text.hi6220_ion_probe+0x5c): undefined reference to
>> `ion_parse_dt'
>> hi6220_ion.c:(.text.hi6220_ion_probe+0xf8): undefined reference to
>> `ion_destroy_platform_data'
>>
>> This selects the symbol when needed.
>>
>
> Acked-by: Laura Abbott 
>
Reviewed-by: Sumit Semwal 
>
>> Signed-off-by: Arnd Bergmann 
>> Fixes: 2b40182a19bc ("staging: android: ion: Add ion driver for Hi6220 SoC
>> platform")
>> ---
>>  drivers/staging/android/ion/Kconfig | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/staging/android/ion/Kconfig
>> b/drivers/staging/android/ion/Kconfig
>> index 94105544feef..c8fb4134c55d 100644
>> --- a/drivers/staging/android/ion/Kconfig
>> +++ b/drivers/staging/android/ion/Kconfig
>> @@ -36,6 +36,7 @@ config ION_TEGRA
>>  config ION_HISI
>> tristate "Ion for Hisilicon"
>> depends on ARCH_HISI && ION
>> +   select ION_OF
>> help
>>   Choose this option if you wish to use ion on Hisilicon Platform.
>>
>>
>


Re: [PATCH -next] staging: android: ion: Fix return value check in hi6220_ion_probe()

2016-09-15 Thread Sumit Semwal
On 16 September 2016 at 07:40, Laura Abbott  wrote:
> On 09/14/2016 07:20 PM, Wei Yongjun wrote:
>>
>> From: Wei Yongjun 
>>
>> In case of error, the function ion_device_create() returns ERR_PTR() and
>> never returns NULL. The NULL test in the return value check should be
>> replaced with IS_ERR().
>>
>
> Acked-by: Laura Abbott 
>
Reviewed-by: Sumit Semwal 
>
>> Signed-off-by: Wei Yongjun 
>> ---
>>  drivers/staging/android/ion/hisilicon/hi6220_ion.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/staging/android/ion/hisilicon/hi6220_ion.c
>> b/drivers/staging/android/ion/hisilicon/hi6220_ion.c
>> index f392db2..659aa71 100644
>> --- a/drivers/staging/android/ion/hisilicon/hi6220_ion.c
>> +++ b/drivers/staging/android/ion/hisilicon/hi6220_ion.c
>> @@ -49,8 +49,8 @@ static int hi6220_ion_probe(struct platform_device
>> *pdev)
>> platform_set_drvdata(pdev, ipdev);
>>
>> ipdev->idev = ion_device_create(NULL);
>> -   if (!ipdev->idev)
>> -   return -ENOMEM;
>> +   if (IS_ERR(ipdev->idev))
>> +   return PTR_ERR(ipdev->idev);
>>
>> ipdev->data = ion_parse_dt(pdev, hisi_heaps);
>> if (IS_ERR(ipdev->data))
>>
>>
>>
>



-- 
Thanks and regards,

Sumit Semwal
Linaro Mobile Group - Kernel Team Lead
Linaro.org │ Open source software for ARM SoCs


Re: [PATCH -next] staging: android: ion: Fix return value check in hi6220_ion_probe()

2016-09-15 Thread Sumit Semwal
On 16 September 2016 at 07:40, Laura Abbott  wrote:
> On 09/14/2016 07:20 PM, Wei Yongjun wrote:
>>
>> From: Wei Yongjun 
>>
>> In case of error, the function ion_device_create() returns ERR_PTR() and
>> never returns NULL. The NULL test in the return value check should be
>> replaced with IS_ERR().
>>
>
> Acked-by: Laura Abbott 
>
Reviewed-by: Sumit Semwal 
>
>> Signed-off-by: Wei Yongjun 
>> ---
>>  drivers/staging/android/ion/hisilicon/hi6220_ion.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/staging/android/ion/hisilicon/hi6220_ion.c
>> b/drivers/staging/android/ion/hisilicon/hi6220_ion.c
>> index f392db2..659aa71 100644
>> --- a/drivers/staging/android/ion/hisilicon/hi6220_ion.c
>> +++ b/drivers/staging/android/ion/hisilicon/hi6220_ion.c
>> @@ -49,8 +49,8 @@ static int hi6220_ion_probe(struct platform_device
>> *pdev)
>> platform_set_drvdata(pdev, ipdev);
>>
>> ipdev->idev = ion_device_create(NULL);
>> -   if (!ipdev->idev)
>> -   return -ENOMEM;
>> +   if (IS_ERR(ipdev->idev))
>> +   return PTR_ERR(ipdev->idev);
>>
>> ipdev->data = ion_parse_dt(pdev, hisi_heaps);
>> if (IS_ERR(ipdev->data))
>>
>>
>>
>



-- 
Thanks and regards,

Sumit Semwal
Linaro Mobile Group - Kernel Team Lead
Linaro.org │ Open source software for ARM SoCs


Re: Crashing 'kzm' target in next-20160913 due to 'gpio: mxc: shift gpio_mxc_init() to subsys_initcall level'

2016-09-15 Thread Guenter Roeck

On 09/15/2016 07:46 AM, Uwe Kleine-König wrote:

On Thu, Sep 15, 2016 at 07:35:16AM -0700, Guenter Roeck wrote:

On 09/15/2016 07:23 AM, Uwe Kleine-König wrote:

On Thu, Sep 15, 2016 at 04:35:04PM +0300, Vladimir Zapolskiy wrote:

Hi Guenter,

On 09/14/2016 06:20 AM, Guenter Roeck wrote:

Hi Vladimir,

your commit e188cbf7564f ("gpio: mxc: shift gpio_mxc_init() to subsys_initcall 
level")
in -next causes the following crash when running the 'kzm' target (and most 
likely
the real thing) with qemu.

[1.211426] Unable to handle kernel NULL pointer dereference at virtual 
address 000c
[1.211600] pgd = c0004000
[1.211680] [000c] *pgd=
[1.212067] Internal error: Oops: 5 [#1] SMP ARM
[1.212245] Modules linked in:
[1.212542] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.8.0-rc6-next-20160913 #1
[1.212671] Hardware name: Kyoto Microcomputer Co., Ltd. KZM-ARM11-01
[1.212825] task: c6848000 task.stack: c683e000
[1.213231] PC is at platform_get_irq+0xc0/0xe8

See 
http://kerneltests.org/builders/qemu-arm-next/builds/525/steps/qemubuildcommand/logs/stdio
for a complete log.

Problem is quite subtle. The change causes the gpio driver to be installed 
later.
As a result, kzm_init_smsc9118() fails to initialize the gpio pins correctly.
gpio_request() in that function returns -EPROBE_DEFER, which is ignored,
gpio_to_irq() then returns -22 which is unconditionally assigned as interrupt 
number.
platform_get_irq(), as called from the smsc driver, gets this negative interrupt
number, and passes it unconditionally to irq_get_irq_data(), which returns NULL.
The NULL pointer is then passed to irqd_set_trigger_type() which, not entirely
surprisingly, crashes.

So, in other words, lots of bugs here. Nevertheless, I would suggest to keep 
using
postcore_initcall(), at least until it is sure that all gpio clients handle 
-EPROBE_DEFER
correctly.


I'm inviting Shawn and Uwe to the discussion.

The proper fix in this particular case should be like this one:

diff --git a/arch/arm/mach-imx/mach-kzm_arm11_01.c 
b/arch/arm/mach-imx/mach-kzm_arm11_01.c
index 31df4361996f..8288acfe7221 100644
--- a/arch/arm/mach-imx/mach-kzm_arm11_01.c
+++ b/arch/arm/mach-imx/mach-kzm_arm11_01.c
@@ -245,13 +245,17 @@ static void __init kzm_board_init(void)
mxc_iomux_setup_multiple_pins(kzm_pins,
  ARRAY_SIZE(kzm_pins), "kzm");
-   kzm_init_ext_uart();
-   kzm_init_smsc9118();
kzm_init_imx_uart();
pr_info("Clock input source is 26MHz\n");
 }
+static void __init kzm_late_init(void)
+{
+   kzm_init_ext_uart();
+   kzm_init_smsc9118();
+}
+
 /*
  * This structure defines static mappings for the kzm-arm11-01 board.
  */
@@ -291,5 +295,6 @@ MACHINE_START(KZM_ARM11_01, "Kyoto Microcomputer Co., Ltd. 
KZM-ARM11-01")
.init_irq = mx31_init_irq,
.init_time  = kzm_timer_init,
.init_machine = kzm_board_init,
+   .init_late  = kzm_late_init,
.restart= mxc_restart,
 MACHINE_END


That + checking the return code of gpio_request and the other calls.
Or better, convert the machine to dt.


But I agree that there might be more legacy boards (i.MX31 only IMHO),
which may attempt to manipulate GPIO lines before subsys_initcall()
level.


I wouldn't revert anything for legacy boards. That's the chance to say
in the near future: They stopped working in September 2016, obviously
nobody cares, let's rip them. :-)


New kernel development philosophy ? Regressions are acceptable as long as
they affect a board older than X years ? What is your cut-off date for accepting
regressions like that ?


I would soften your statement about regressions to: Regressions like
these are an incentive to pre-dt platforms to modernize. Of course they
should ideally be prevented, and fixed when they happen, but I wouldn't
start reverting gpio-changes because of a regression on a machine that
is hardly used with modern kernel (ok, that's an assumption, but it
didn't receive enough love to be converted to dt), that fails to even
check return values of gpio_request and that can be easily fixed.


All 890 of them ? Really ?

Guenter



Re: Crashing 'kzm' target in next-20160913 due to 'gpio: mxc: shift gpio_mxc_init() to subsys_initcall level'

2016-09-15 Thread Guenter Roeck

On 09/15/2016 07:46 AM, Uwe Kleine-König wrote:

On Thu, Sep 15, 2016 at 07:35:16AM -0700, Guenter Roeck wrote:

On 09/15/2016 07:23 AM, Uwe Kleine-König wrote:

On Thu, Sep 15, 2016 at 04:35:04PM +0300, Vladimir Zapolskiy wrote:

Hi Guenter,

On 09/14/2016 06:20 AM, Guenter Roeck wrote:

Hi Vladimir,

your commit e188cbf7564f ("gpio: mxc: shift gpio_mxc_init() to subsys_initcall 
level")
in -next causes the following crash when running the 'kzm' target (and most 
likely
the real thing) with qemu.

[1.211426] Unable to handle kernel NULL pointer dereference at virtual 
address 000c
[1.211600] pgd = c0004000
[1.211680] [000c] *pgd=
[1.212067] Internal error: Oops: 5 [#1] SMP ARM
[1.212245] Modules linked in:
[1.212542] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.8.0-rc6-next-20160913 #1
[1.212671] Hardware name: Kyoto Microcomputer Co., Ltd. KZM-ARM11-01
[1.212825] task: c6848000 task.stack: c683e000
[1.213231] PC is at platform_get_irq+0xc0/0xe8

See 
http://kerneltests.org/builders/qemu-arm-next/builds/525/steps/qemubuildcommand/logs/stdio
for a complete log.

Problem is quite subtle. The change causes the gpio driver to be installed 
later.
As a result, kzm_init_smsc9118() fails to initialize the gpio pins correctly.
gpio_request() in that function returns -EPROBE_DEFER, which is ignored,
gpio_to_irq() then returns -22 which is unconditionally assigned as interrupt 
number.
platform_get_irq(), as called from the smsc driver, gets this negative interrupt
number, and passes it unconditionally to irq_get_irq_data(), which returns NULL.
The NULL pointer is then passed to irqd_set_trigger_type() which, not entirely
surprisingly, crashes.

So, in other words, lots of bugs here. Nevertheless, I would suggest to keep 
using
postcore_initcall(), at least until it is sure that all gpio clients handle 
-EPROBE_DEFER
correctly.


I'm inviting Shawn and Uwe to the discussion.

The proper fix in this particular case should be like this one:

diff --git a/arch/arm/mach-imx/mach-kzm_arm11_01.c 
b/arch/arm/mach-imx/mach-kzm_arm11_01.c
index 31df4361996f..8288acfe7221 100644
--- a/arch/arm/mach-imx/mach-kzm_arm11_01.c
+++ b/arch/arm/mach-imx/mach-kzm_arm11_01.c
@@ -245,13 +245,17 @@ static void __init kzm_board_init(void)
mxc_iomux_setup_multiple_pins(kzm_pins,
  ARRAY_SIZE(kzm_pins), "kzm");
-   kzm_init_ext_uart();
-   kzm_init_smsc9118();
kzm_init_imx_uart();
pr_info("Clock input source is 26MHz\n");
 }
+static void __init kzm_late_init(void)
+{
+   kzm_init_ext_uart();
+   kzm_init_smsc9118();
+}
+
 /*
  * This structure defines static mappings for the kzm-arm11-01 board.
  */
@@ -291,5 +295,6 @@ MACHINE_START(KZM_ARM11_01, "Kyoto Microcomputer Co., Ltd. 
KZM-ARM11-01")
.init_irq = mx31_init_irq,
.init_time  = kzm_timer_init,
.init_machine = kzm_board_init,
+   .init_late  = kzm_late_init,
.restart= mxc_restart,
 MACHINE_END


That + checking the return code of gpio_request and the other calls.
Or better, convert the machine to dt.


But I agree that there might be more legacy boards (i.MX31 only IMHO),
which may attempt to manipulate GPIO lines before subsys_initcall()
level.


I wouldn't revert anything for legacy boards. That's the chance to say
in the near future: They stopped working in September 2016, obviously
nobody cares, let's rip them. :-)


New kernel development philosophy ? Regressions are acceptable as long as
they affect a board older than X years ? What is your cut-off date for accepting
regressions like that ?


I would soften your statement about regressions to: Regressions like
these are an incentive to pre-dt platforms to modernize. Of course they
should ideally be prevented, and fixed when they happen, but I wouldn't
start reverting gpio-changes because of a regression on a machine that
is hardly used with modern kernel (ok, that's an assumption, but it
didn't receive enough love to be converted to dt), that fails to even
check return values of gpio_request and that can be easily fixed.


All 890 of them ? Really ?

Guenter



Crashes in next-20160915 (BUG at fs/notify/notification.c:66!)

2016-09-15 Thread Guenter Roeck

Hi,

I see various architectures crashing in -next with the following error.

[ cut here ]
kernel BUG at fs/notify/notification.c:66!
invalid opcode:  [#1] PREEMPT
Modules linked in:
CPU: 0 PID: 110 Comm: udevd Not tainted 4.8.0-rc6-next-20160915-yocto-standard 
#1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
task: 88000dd58000 task.stack: c925
RIP: 0010:[]  [] 
fsnotify_notify_queue_is_empty+0x9/0x10
RSP: 0018:c9253e68  EFLAGS: 0296
RAX: 88000dd58000 RBX: 88000dc9a9c0 RCX: 00011ff0
RDX: 811bd52a RSI: 88000e3c1258 RDI: 88000dc9a9c0
RBP: c9253e68 R08: 88000ffd37e0 R09: 88000dcdd600
R10: 88000dcdd600 R11:  R12: 0001
R13: 88000dd28c00 R14: 88000dc0a380 R15: 88000dc0a398
FS:  7ff5bea85740() GS:81c31000() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7ff5bea79000 CR3: 0dd63000 CR4: 000406f0
Stack:
 c9253e88 811bd532  88000dd28c00
 c9253f48 811bfeba 0006 
 88000dcdd600 88000dcdd600 001900253f58 04080002
Call Trace:
 [] inotify_poll+0x42/0x70
 [] SyS_epoll_ctl+0x84a/0xf60
 [] ? ep_send_events_proc+0x180/0x180
 [] entry_SYSCALL_64_fastpath+0x13/0x8f
Code: 90 90 0f 1f 44 00 00 55 b8 01 00 00 00 48 89 e5 0f c1 05 bb e4 d4 00 83 c0 01 
5d c3 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 0f 1f 
44 00 00 55 48 89 e5 48 83 ec 10 48
RIP  [] fsnotify_notify_queue_is_empty+0x9/0x10
 RSP 
---[ end trace 7dc4a27003f0b575 ]---

I didn't bisect, but I would guess the culprit is one of the new patches in the
affected file.

22e9cf146d3b fanotify: fix possible false warning when freeing events
ced89591817c fsnotify: convert notification_mutex to a spinlock
f82fa3d0e7f5 fsnotify: drop notification_mutex before destroying event
782fbc7e8685 fanotify: fix list corruption in fanotify_get_response()
56cf1c8a1b35 fsnotify: add a way to stop queueing events on group shutdown

Guenter


Crashes in next-20160915 (BUG at fs/notify/notification.c:66!)

2016-09-15 Thread Guenter Roeck

Hi,

I see various architectures crashing in -next with the following error.

[ cut here ]
kernel BUG at fs/notify/notification.c:66!
invalid opcode:  [#1] PREEMPT
Modules linked in:
CPU: 0 PID: 110 Comm: udevd Not tainted 4.8.0-rc6-next-20160915-yocto-standard 
#1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
task: 88000dd58000 task.stack: c925
RIP: 0010:[]  [] 
fsnotify_notify_queue_is_empty+0x9/0x10
RSP: 0018:c9253e68  EFLAGS: 0296
RAX: 88000dd58000 RBX: 88000dc9a9c0 RCX: 00011ff0
RDX: 811bd52a RSI: 88000e3c1258 RDI: 88000dc9a9c0
RBP: c9253e68 R08: 88000ffd37e0 R09: 88000dcdd600
R10: 88000dcdd600 R11:  R12: 0001
R13: 88000dd28c00 R14: 88000dc0a380 R15: 88000dc0a398
FS:  7ff5bea85740() GS:81c31000() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7ff5bea79000 CR3: 0dd63000 CR4: 000406f0
Stack:
 c9253e88 811bd532  88000dd28c00
 c9253f48 811bfeba 0006 
 88000dcdd600 88000dcdd600 001900253f58 04080002
Call Trace:
 [] inotify_poll+0x42/0x70
 [] SyS_epoll_ctl+0x84a/0xf60
 [] ? ep_send_events_proc+0x180/0x180
 [] entry_SYSCALL_64_fastpath+0x13/0x8f
Code: 90 90 0f 1f 44 00 00 55 b8 01 00 00 00 48 89 e5 0f c1 05 bb e4 d4 00 83 c0 01 
5d c3 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 0f 1f 
44 00 00 55 48 89 e5 48 83 ec 10 48
RIP  [] fsnotify_notify_queue_is_empty+0x9/0x10
 RSP 
---[ end trace 7dc4a27003f0b575 ]---

I didn't bisect, but I would guess the culprit is one of the new patches in the
affected file.

22e9cf146d3b fanotify: fix possible false warning when freeing events
ced89591817c fsnotify: convert notification_mutex to a spinlock
f82fa3d0e7f5 fsnotify: drop notification_mutex before destroying event
782fbc7e8685 fanotify: fix list corruption in fanotify_get_response()
56cf1c8a1b35 fsnotify: add a way to stop queueing events on group shutdown

Guenter


[PATCH -next v2] ARM: orion: fix invalid use of sizeof in orion_ge00_switch_init()

2016-09-15 Thread Wei Yongjun
From: Wei Yongjun 

sizeof() when applied to a pointer typed expression gives the
size of the pointer, not that of the pointed data.

Fixes: fe158a17c1e0 ("ARM: orion: simplify orion_ge00_switch_init")
Signed-off-by: Wei Yongjun 
---
v1 -> v2: fix file module name in subject
---
 arch/arm/plat-orion/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/plat-orion/common.c b/arch/arm/plat-orion/common.c
index 272f49b..23a975a 100644
--- a/arch/arm/plat-orion/common.c
+++ b/arch/arm/plat-orion/common.c
@@ -478,7 +478,7 @@ void __init orion_ge00_switch_init(struct dsa_platform_data 
*d)
for (i = 0; i < d->nr_chips; i++)
d->chip[i].host_dev = _ge_mvmdio.dev;
 
-   platform_device_register_data(NULL, "dsa", 0, d, sizeof(d));
+   platform_device_register_data(NULL, "dsa", 0, d, sizeof(*d));
 }
 
 /*



[PATCH -next v2] ARM: orion: fix invalid use of sizeof in orion_ge00_switch_init()

2016-09-15 Thread Wei Yongjun
From: Wei Yongjun 

sizeof() when applied to a pointer typed expression gives the
size of the pointer, not that of the pointed data.

Fixes: fe158a17c1e0 ("ARM: orion: simplify orion_ge00_switch_init")
Signed-off-by: Wei Yongjun 
---
v1 -> v2: fix file module name in subject
---
 arch/arm/plat-orion/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/plat-orion/common.c b/arch/arm/plat-orion/common.c
index 272f49b..23a975a 100644
--- a/arch/arm/plat-orion/common.c
+++ b/arch/arm/plat-orion/common.c
@@ -478,7 +478,7 @@ void __init orion_ge00_switch_init(struct dsa_platform_data 
*d)
for (i = 0; i < d->nr_chips; i++)
d->chip[i].host_dev = _ge_mvmdio.dev;
 
-   platform_device_register_data(NULL, "dsa", 0, d, sizeof(d));
+   platform_device_register_data(NULL, "dsa", 0, d, sizeof(*d));
 }
 
 /*



[PATCH] phy: sun4i-usb: Fix build dependency

2016-09-15 Thread Axel Lin
of_usb_get_dr_mode_by_phy will not be compiled if !USB_COMMON, fix below
build error.

ERROR: "of_usb_get_dr_mode_by_phy" [drivers/phy/phy-sun4i-usb.ko] undefined!
scripts/Makefile.modpost:91: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1

Signed-off-by: Axel Lin 
---
 drivers/phy/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index cc27c12..b65aba0 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -258,6 +258,8 @@ config PHY_SUN4I_USB
depends on RESET_CONTROLLER
depends on EXTCON
depends on POWER_SUPPLY
+   depends on USB_SUPPORT
+   select USB_COMMON
select GENERIC_PHY
help
  Enable this to support the transceiver that is part of Allwinner
-- 
2.7.4



[PATCH] phy: sun4i-usb: Fix build dependency

2016-09-15 Thread Axel Lin
of_usb_get_dr_mode_by_phy will not be compiled if !USB_COMMON, fix below
build error.

ERROR: "of_usb_get_dr_mode_by_phy" [drivers/phy/phy-sun4i-usb.ko] undefined!
scripts/Makefile.modpost:91: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1

Signed-off-by: Axel Lin 
---
 drivers/phy/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index cc27c12..b65aba0 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -258,6 +258,8 @@ config PHY_SUN4I_USB
depends on RESET_CONTROLLER
depends on EXTCON
depends on POWER_SUPPLY
+   depends on USB_SUPPORT
+   select USB_COMMON
select GENERIC_PHY
help
  Enable this to support the transceiver that is part of Allwinner
-- 
2.7.4



Re: [PATCH -next] staging: android: ion: Fix return value check in hi6220_ion_probe()

2016-09-15 Thread Laura Abbott

On 09/14/2016 07:20 PM, Wei Yongjun wrote:

From: Wei Yongjun 

In case of error, the function ion_device_create() returns ERR_PTR() and
never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().



Acked-by: Laura Abbott 


Signed-off-by: Wei Yongjun 
---
 drivers/staging/android/ion/hisilicon/hi6220_ion.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/ion/hisilicon/hi6220_ion.c 
b/drivers/staging/android/ion/hisilicon/hi6220_ion.c
index f392db2..659aa71 100644
--- a/drivers/staging/android/ion/hisilicon/hi6220_ion.c
+++ b/drivers/staging/android/ion/hisilicon/hi6220_ion.c
@@ -49,8 +49,8 @@ static int hi6220_ion_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, ipdev);

ipdev->idev = ion_device_create(NULL);
-   if (!ipdev->idev)
-   return -ENOMEM;
+   if (IS_ERR(ipdev->idev))
+   return PTR_ERR(ipdev->idev);

ipdev->data = ion_parse_dt(pdev, hisi_heaps);
if (IS_ERR(ipdev->data))







Re: [PATCH -next] staging: android: ion: Fix return value check in hi6220_ion_probe()

2016-09-15 Thread Laura Abbott

On 09/14/2016 07:20 PM, Wei Yongjun wrote:

From: Wei Yongjun 

In case of error, the function ion_device_create() returns ERR_PTR() and
never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().



Acked-by: Laura Abbott 


Signed-off-by: Wei Yongjun 
---
 drivers/staging/android/ion/hisilicon/hi6220_ion.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/ion/hisilicon/hi6220_ion.c 
b/drivers/staging/android/ion/hisilicon/hi6220_ion.c
index f392db2..659aa71 100644
--- a/drivers/staging/android/ion/hisilicon/hi6220_ion.c
+++ b/drivers/staging/android/ion/hisilicon/hi6220_ion.c
@@ -49,8 +49,8 @@ static int hi6220_ion_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, ipdev);

ipdev->idev = ion_device_create(NULL);
-   if (!ipdev->idev)
-   return -ENOMEM;
+   if (IS_ERR(ipdev->idev))
+   return PTR_ERR(ipdev->idev);

ipdev->data = ion_parse_dt(pdev, hisi_heaps);
if (IS_ERR(ipdev->data))







Re: [PATCH] staging: android ion/hisi: fix dependencies

2016-09-15 Thread Laura Abbott

On 09/15/2016 08:57 AM, Arnd Bergmann wrote:

The newly added Hi6220 Ion code fails to build when the ION_OF helpers
are not present:

drivers/staging/android/ion/hisilicon/hi6220_ion.o: In function 
`hi6220_ion_remove':
hi6220_ion.c:(.text.hi6220_ion_remove+0x4c): undefined reference to 
`ion_destroy_platform_data'
drivers/staging/android/ion/hisilicon/hi6220_ion.o: In function 
`hi6220_ion_probe':
hi6220_ion.c:(.text.hi6220_ion_probe+0x5c): undefined reference to 
`ion_parse_dt'
hi6220_ion.c:(.text.hi6220_ion_probe+0xf8): undefined reference to 
`ion_destroy_platform_data'

This selects the symbol when needed.



Acked-by: Laura Abbott 


Signed-off-by: Arnd Bergmann 
Fixes: 2b40182a19bc ("staging: android: ion: Add ion driver for Hi6220 SoC 
platform")
---
 drivers/staging/android/ion/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/android/ion/Kconfig 
b/drivers/staging/android/ion/Kconfig
index 94105544feef..c8fb4134c55d 100644
--- a/drivers/staging/android/ion/Kconfig
+++ b/drivers/staging/android/ion/Kconfig
@@ -36,6 +36,7 @@ config ION_TEGRA
 config ION_HISI
tristate "Ion for Hisilicon"
depends on ARCH_HISI && ION
+   select ION_OF
help
  Choose this option if you wish to use ion on Hisilicon Platform.






Re: [PATCH] staging: android ion/hisi: fix dependencies

2016-09-15 Thread Laura Abbott

On 09/15/2016 08:57 AM, Arnd Bergmann wrote:

The newly added Hi6220 Ion code fails to build when the ION_OF helpers
are not present:

drivers/staging/android/ion/hisilicon/hi6220_ion.o: In function 
`hi6220_ion_remove':
hi6220_ion.c:(.text.hi6220_ion_remove+0x4c): undefined reference to 
`ion_destroy_platform_data'
drivers/staging/android/ion/hisilicon/hi6220_ion.o: In function 
`hi6220_ion_probe':
hi6220_ion.c:(.text.hi6220_ion_probe+0x5c): undefined reference to 
`ion_parse_dt'
hi6220_ion.c:(.text.hi6220_ion_probe+0xf8): undefined reference to 
`ion_destroy_platform_data'

This selects the symbol when needed.



Acked-by: Laura Abbott 


Signed-off-by: Arnd Bergmann 
Fixes: 2b40182a19bc ("staging: android: ion: Add ion driver for Hi6220 SoC 
platform")
---
 drivers/staging/android/ion/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/android/ion/Kconfig 
b/drivers/staging/android/ion/Kconfig
index 94105544feef..c8fb4134c55d 100644
--- a/drivers/staging/android/ion/Kconfig
+++ b/drivers/staging/android/ion/Kconfig
@@ -36,6 +36,7 @@ config ION_TEGRA
 config ION_HISI
tristate "Ion for Hisilicon"
depends on ARCH_HISI && ION
+   select ION_OF
help
  Choose this option if you wish to use ion on Hisilicon Platform.






[PATCH] phy: meson-usb2: Fix build dependency

2016-09-15 Thread Axel Lin
of_usb_get_dr_mode_by_phy will not be compiled if !USB_COMMON, fix below
build error:

drivers/built-in.o: In function `phy_meson_usb2_probe':
debugfs.c:(.text+0x76b4): undefined reference to `of_usb_get_dr_mode_by_phy'
Makefile:961: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1

Signed-off-by: Axel Lin 
---
 drivers/phy/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index 6ad87ec..cc27c12 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -458,6 +458,8 @@ config PHY_MESON_USB2
tristate "Meson USB2 PHY driver"
default ARCH_MESON
depends on OF && (ARCH_MESON || COMPILE_TEST)
+   depends on USB_SUPPORT
+   select USB_COMMON
select GENERIC_PHY
help
  Enable this to support the Meson USB2 PHYs found in Meson8b
-- 
2.7.4



[PATCH] phy: meson-usb2: Fix build dependency

2016-09-15 Thread Axel Lin
of_usb_get_dr_mode_by_phy will not be compiled if !USB_COMMON, fix below
build error:

drivers/built-in.o: In function `phy_meson_usb2_probe':
debugfs.c:(.text+0x76b4): undefined reference to `of_usb_get_dr_mode_by_phy'
Makefile:961: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1

Signed-off-by: Axel Lin 
---
 drivers/phy/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index 6ad87ec..cc27c12 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -458,6 +458,8 @@ config PHY_MESON_USB2
tristate "Meson USB2 PHY driver"
default ARCH_MESON
depends on OF && (ARCH_MESON || COMPILE_TEST)
+   depends on USB_SUPPORT
+   select USB_COMMON
select GENERIC_PHY
help
  Enable this to support the Meson USB2 PHYs found in Meson8b
-- 
2.7.4



  1   2   3   4   5   6   7   8   9   10   >