Re: [PATCH] checkpatch.pl: Add SPDX license tag check

2018-02-01 Thread Greg Kroah-Hartman
On Thu, Feb 01, 2018 at 03:14:29PM -0600, Rob Herring wrote:
> Add SPDX license tag check based on the rules defined in
> Documentation/process/license-rules.rst. To summarize, SPDX license tags
> should be on the 1st line (or 2nd line in scripts) using the appropriate
> comment style for the file type.
> 
> Cc: Andy Whitcroft 
> Cc: Joe Perches 
> Cc: Greg Kroah-Hartman 
> Cc: Thomas Gleixner 
> Cc: Philippe Ombredanne 
> Cc: Andrew Morton 
> Signed-off-by: Rob Herring 

Acked-by: Greg Kroah-Hartman 
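For reference, the placement the new check enforces looks like this for a C source file (the license identifier is only an example); per license-rules.rst, header files carry the tag in a C block comment on line 1, and scripts carry it on line 2, below the interpreter line:

// SPDX-License-Identifier: GPL-2.0
/*
 * example.c: the tag sits on the very first line, in C99 comment style,
 * with the normal file contents following below.
 */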


Re: [perf] perf test BPF fails on 4.9.20

2018-02-01 Thread Pintu Kumar
Hi,

The 'perf test' BPF prologue generation subtest is failing:
37.2: Test BPF prologue generation   : FAILED!

Try to find probe point from debuginfo.
Matched function: null_lseek [105be32]
Probe point found: null_lseek+0
Searching 'file' variable in context.
Converting variable file into trace event.
converting f_mode in file
file(type:file) has no member f_mode.
An error occurred in debuginfo analysis (-22).
bpf_probe: failed to convert perf probe eventsFailed to add events
selected by BPF
test child finished with -1
 end 
Test BPF filter subtest 1: FAILED!


Is there any fix available for this issue?
I searched 4.15, but could not relate any of the patches to this.


Thanks,
Pintu
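
For context, this subtest loads a small BPF object whose section name carries the probe configuration ('func=null_lseek file->f_mode offset orig', visible in the verbose log quoted below). Turning "file->f_mode" into a generated prologue is what needs the DWARF description of struct file from vmlinux, and is where the "has no member f_mode" error above comes from. A rough sketch of such an object, written for illustration rather than copied from the in-tree test source:

#include <linux/version.h>

#define SEC(name) __attribute__((section(name), used))

/*
 * perf parses the section name: probe null_lseek() and fetch file->f_mode,
 * offset and orig as arguments; resolving the f_mode member requires kernel
 * debuginfo (CONFIG_DEBUG_INFO).
 */
SEC("func=null_lseek file->f_mode offset orig")
int bpf_func__null_lseek(void *ctx, int err, unsigned long f_mode,
			 unsigned long offset, unsigned long orig)
{
	if (err)		/* the prologue failed to fetch the arguments */
		return 0;
	return f_mode & 0x2;	/* e.g. keep only FMODE_WRITE (0x2) openers */
}

char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;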



On Thu, Feb 1, 2018 at 7:34 PM, Pintu Kumar  wrote:
> Hi,
>
> After enabling DEBUG_INFO in the kernel I still get this error for the BPF test.
> Please help.
>
> # perf test BPF -v
> .
> Looking at the vmlinux_path (8 entries long)
> Using /usr/lib/debug/boot/vmlinux-4.9.00--amd-x86-64-00071-gd94c220-dirty
> for symbols
> Open Debuginfo file:
> /usr/lib/debug/boot/vmlinux-4.9.00--amd-x86-64-00071-gd94c220-dirty
> Try to find probe point from debuginfo.
> Matched function: null_lseek [105be32]
> Probe point found: null_lseek+0
> Searching 'file' variable in context.
> Converting variable file into trace event.
> converting f_mode in file
> file(type:file) has no member f_mode.
> An error occurred in debuginfo analysis (-22).
> bpf_probe: failed to convert perf probe eventsFailed to add events
> selected by BPF
> test child finished with -1
>  end 
> Test BPF filter subtest 1: FAILED!
>
>
>
> On Thu, Feb 1, 2018 at 10:50 AM, Pintu Kumar  wrote:
>> Dear Masami,
>>
>> Now I am stuck again with 'perf test' failure on 4.9
>>
>> # perf --version
>> perf version 4.9.20-
>>
>> # perf test
>> 16: Try 'import perf' in python, checking link problems  : FAILED!
>> 37.2: Test BPF prologue generation   : FAILED!
>>
>> If you have any clue about these failures, please help me.
>>
>> Here is the verbose output:
>> -
>> 1) # perf test python -v
>> 16: Try 'import perf' in python, checking link problems  :
>> --- start ---
>> test child forked, pid 24562
>> Traceback (most recent call last):
>>   File "", line 1, in 
>> ImportError: No module named perf
>> test child finished with -1
>>  end 
>> Try 'import perf' in python, checking link problems: FAILED!
>> --
>>
>> 2) # perf test BPF -v
>> ---
>> .
>> bpf: config 'func=null_lseek file->f_mode offset orig' is ok
>> Looking at the vmlinux_path (8 entries long)
>> symsrc__init: cannot get elf header.
>> Failed to find the path for kernel: Invalid ELF file
>> bpf_probe: failed to convert perf probe eventsFailed to add events
>> selected by BPF
>> test child finished with -1
>>  end 
>> Test BPF filter subtest 1: FAILED!
>>
>> ---
>>
>>
>> Thanks,
>> Pintu
>>
>>
>> On Wed, Jan 31, 2018 at 9:01 AM, Masami Hiramatsu  
>> wrote:
>>> On Tue, 30 Jan 2018 19:20:36 +0530
>>> Pintu Kumar  wrote:
>>>
 On Tue, Jan 30, 2018 at 11:13 AM, Masami Hiramatsu  
 wrote:
 >
 > On Mon, 29 Jan 2018 22:00:52 +0530
 > Pintu Kumar  wrote:
 >
 > > Dear Masami,
 > >
 > > Thank you so much for your reply.
 > > Please find some of my answers inline.
 > >
 > >
 > > On Mon, Jan 29, 2018 at 7:47 PM, Masami Hiramatsu 
 > >  wrote:
 > > > On Mon, 29 Jan 2018 13:40:34 +0530
 > > > Pintu Kumar  wrote:
 > > >
 > > >> Hi All,
 > > >>
 > > >> 'perf probe' is failing sometimes on 4.9.20 with AMD-64.
 > > >> # perf probe --add schedule
 > > >> schedule is out of .text, skip it.
 > > >>   Error: Failed to add events.
 > > >>
 > > >> If anyone has come across this problem, please let me know the
 > > >> cause.
 > > >
 > > > Hi Pintu,
 > > >
 > > > Could you run it with --vv?
 > > >
 > > Ok, I will send verbose output by tomorrow.
 > >
 > > >>
 > > >> Note: I don't have CONFIG_DEBUG_INFO enabled in kernel. Is this the 
 > > >> problem?
 > > >
 > > > Without it, you cannot set source-level probes or trace local
 > > > variables.
 > > >
 > >
 > > Currently I am facing a problem enabling DEBUG_INFO in our 4.9.20 kernel.
 > > However, I will try to manually include the "-g" option during compilation.
 > >
 > > >> However, I manually copied the vmlinux file to the /boot/ directory,
 > > >> but it still does not work.
 > > >
 > > > That doesn't work.
 > > > The CONFIG_DEBUG_INFO option makes gcc compile the kernel with extra
 > > > debuginfo.
 > > > Without 

RE: [PATCH] mm/swap: add function get_total_swap_pages to expose total_swap_pages

2018-02-01 Thread He, Roger
Can you try to use a fixed limit like I suggested once more?
E.g. just stop swapping if get_nr_swap_pages() < 256MB.

Maybe you missed my previous mail; I will explain again here.
Setting the value to 256MB does not work on my platform. My machine has 8GB of
system memory and an 8GB swap disk.
On my machine, setting it to 4GB works.
But 4GB also does not work on a test machine with 16GB of system memory and an 8GB swap disk.


Thanks
Roger(Hongbo.He)
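For reference, the fixed lower bound being debated is essentially a check of the following shape on the TTM side. This is only a sketch; the helper name and threshold are illustrative, and the numbers above suggest a single fixed value does not fit every memory/swap configuration:

#include <linux/mm.h>
#include <linux/swap.h>

/* Illustrative threshold: stop swapping BOs out to shmem/swap once the
 * amount of free swap drops below 256MB (expressed in pages). */
#define TTM_SWAP_MIN_FREE_PAGES	((256ULL << 20) >> PAGE_SHIFT)

static bool ttm_swapout_allowed(void)
{
	/* get_nr_swap_pages() returns the number of currently free swap pages */
	return get_nr_swap_pages() > (long)TTM_SWAP_MIN_FREE_PAGES;
}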

-Original Message-
From: Koenig, Christian 
Sent: Friday, February 02, 2018 3:46 PM
To: He, Roger ; Zhou, David(ChunMing) ; 
dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

Can you try to use a fixed limit like I suggested once more?

E.g. just stop swapping if get_nr_swap_pages() < 256MB.

Regards,
Christian.

Am 02.02.2018 um 07:57 schrieb He, Roger:
>   Using a limit of total RAM * 1/2 seems to work very well.
>   No OOM, although the swap disk fills up at peak during the piglit test.
>
> But David noticed that this approach has an obvious defect.
> For example, if the platform has 32GB of system memory and an 8GB swap disk,
> 1/2 * RAM = 16GB, which is bigger than the swap disk, so no swap for TTM is
> allowed at all.
> For now we have worked out an improved version based on get_nr_swap_pages().
> It will be sent out later.
>
> Thanks
> Roger(Hongbo.He)
> -Original Message-
> From: He, Roger
> Sent: Thursday, February 01, 2018 4:03 PM
> To: Koenig, Christian ; Zhou, 
> David(ChunMing) ; dri-de...@lists.freedesktop.org
> Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; 'He, Roger' 
> 
> Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to 
> expose total_swap_pages
>
> Just now, I tried a fixed limit, but it does not always work.
> For example, setting the limit to 4GB on my platform with 8GB of system
> memory passes.
> But when run on a platform with 16GB of system memory, it failed due to OOM.
>
> And I guess it also depends on the application's behavior.
> I mean some apps make the OS use more swap space as well.
>
> Thanks
> Roger(Hongbo.He)
> -Original Message-
> From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On 
> Behalf Of He, Roger
> Sent: Thursday, February 01, 2018 1:48 PM
> To: Koenig, Christian ; Zhou, 
> David(ChunMing) ; dri-de...@lists.freedesktop.org
> Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to 
> expose total_swap_pages
>
>   But what we could do is to rely on a fixed limit like the Intel driver 
> does and I suggested before.
>   E.g. don't copy anything into a shmemfile when there is only x MB of 
> swap space left.
>
> Here I think we can go further and let the limit value scale with total
> system memory.
> For example: total system memory * 1/2.
> That would match the platform configuration better.
>
>   Roger can you test that approach once more with your fix for the OOM 
> issues in the page fault handler?
>
> Sure. Using a limit of total RAM * 1/2 seems to work very well.
> No OOM, although the swap disk fills up at peak during the piglit test.
> I speculate this case happens without OOM because:
>
> a. After running a while, swap disk usage gets close to, but not over,
> 1/2 * total size.
> b. All subsequent swapped pages stay in system memory until there is no
> space left there. Then the swapped pages in shmem are flushed into the swap
> disk, and the OS probably needs some swap space as well.
> For this case, it is easy for the swap disk to fill up.
> c. But because the free swap size is now < 1/2 * total, no further swap out
> happens after that.
> And at least 1/4 * system memory will be left, because the check below in
> ttm_mem_global_reserve ensures that:
>	if (zone->used_mem > limit)
>		goto out_unlock;
>  
> Thanks
> Roger(Hongbo.He)
> -Original Message-
> From: Koenig, Christian
> Sent: Wednesday, January 31, 2018 4:13 PM
> To: He, Roger ; Zhou, David(ChunMing) 
> ; dri-de...@lists.freedesktop.org
> Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to 
> expose total_swap_pages
>
> Yeah, indeed. But what we could do is to rely on a fixed limit like the Intel 
> driver does and I suggested before.
>
> E.g. don't copy anything into a shmemfile when there is only x MB of swap 
> space left.
>
> Roger can you test that approach once more with your fix for the OOM issues 
> in the page fault handler?
>
> Thanks,
> Christian.
>
> Am 31.01.2018 um 09:08 schrieb He, Roger:
>>  I think this patch isn't needed at all. You can directly read the
>> total_swap_pages variable in TTM.
>>
>> Because the variable is not exported by EXPORT_SYMBOL_GPL, using it directly
>> will result in:
>> "WARNING: "total_swap_pages" [drivers/gpu/drm/ttm/ttm.ko] undefined!".
>>
>> Thanks
>> Roger(Hongbo.He)
>> -Original Message-
>> From: dri-devel 

Re: [PATCH] mm/swap: add function get_total_swap_pages to expose total_swap_pages

2018-02-01 Thread Christian König

Can you try to use a fixed limit like I suggested once more?

E.g. just stop swapping if get_nr_swap_pages() < 256MB.

Regards,
Christian.

Am 02.02.2018 um 07:57 schrieb He, Roger:

Using a limit of total RAM * 1/2 seems to work very well.
No OOM, although the swap disk fills up at peak during the piglit test.

But David noticed that this approach has an obvious defect.
For example, if the platform has 32GB of system memory and an 8GB swap disk,
1/2 * RAM = 16GB, which is bigger than the swap disk, so no swap for TTM is
allowed at all.
For now we have worked out an improved version based on get_nr_swap_pages().
It will be sent out later.

Thanks
Roger(Hongbo.He)
-Original Message-
From: He, Roger
Sent: Thursday, February 01, 2018 4:03 PM
To: Koenig, Christian ; Zhou, David(ChunMing) 
; dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; 'He, Roger' 

Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

Just now, I tried a fixed limit, but it does not always work.
For example, setting the limit to 4GB on my platform with 8GB of system memory
passes.
But when run on a platform with 16GB of system memory, it failed due to OOM.

And I guess it also depends on the application's behavior.
I mean some apps make the OS use more swap space as well.

Thanks
Roger(Hongbo.He)
-Original Message-
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of 
He, Roger
Sent: Thursday, February 01, 2018 1:48 PM
To: Koenig, Christian ; Zhou, David(ChunMing) 
; dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

But what we could do is to rely on a fixed limit like the Intel driver 
does and I suggested before.
E.g. don't copy anything into a shmemfile when there is only x MB of 
swap space left.

Here I think we can go further and let the limit value scale with total
system memory.
For example: total system memory * 1/2.
That would match the platform configuration better.

Roger can you test that approach once more with your fix for the OOM 
issues in the page fault handler?

Sure. Using a limit of total RAM * 1/2 seems to work very well.
No OOM, although the swap disk fills up at peak during the piglit test.
I speculate this case happens without OOM because:

a. After running a while, swap disk usage gets close to, but not over,
1/2 * total size.
b. All subsequent swapped pages stay in system memory until there is no space
left there. Then the swapped pages in shmem are flushed into the swap disk,
and the OS probably needs some swap space as well.
For this case, it is easy for the swap disk to fill up.
c. But because the free swap size is now < 1/2 * total, no further swap out
happens after that.
And at least 1/4 * system memory will be left, because the check below in
ttm_mem_global_reserve ensures that:
	if (zone->used_mem > limit)
		goto out_unlock;
 
Thanks

Roger(Hongbo.He)
-Original Message-
From: Koenig, Christian
Sent: Wednesday, January 31, 2018 4:13 PM
To: He, Roger ; Zhou, David(ChunMing) ; 
dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

Yeah, indeed. But what we could do is to rely on a fixed limit like the Intel 
driver does and I suggested before.

E.g. don't copy anything into a shmemfile when there is only x MB of swap space 
left.

Roger can you test that approach once more with your fix for the OOM issues in 
the page fault handler?

Thanks,
Christian.

Am 31.01.2018 um 09:08 schrieb He, Roger:

I think this patch isn't needed at all. You can directly read the
total_swap_pages variable in TTM.

Because the variable is not exported by EXPORT_SYMBOL_GPL, using it directly
will result in:
"WARNING: "total_swap_pages" [drivers/gpu/drm/ttm/ttm.ko] undefined!".

Thanks
Roger(Hongbo.He)
-Original Message-
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On
Behalf Of Chunming Zhou
Sent: Wednesday, January 31, 2018 3:15 PM
To: He, Roger ; dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; Koenig,
Christian 
Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to
expose total_swap_pages

Hi Roger,

I think this patch isn't needed at all. You can directly read the
total_swap_pages variable in TTM. See the comment:

/* protected with swap_lock. reading in vm_swap_full() doesn't need
lock */ long total_swap_pages;

There are many places that use it directly; you just can't change its value.
Reading it doesn't need the lock.


Regards,

David Zhou


On 2018年01月29日 16:29, Roger He wrote:

The ttm module needs it to determine its internal parameter settings.

Signed-off-by: Roger He 
---
 include/linux/swap.h |  6 ++++++
 mm/swapfile.c        | 15 +++++++++++++++
 2 files changed, 21 insertions(+)
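
The patch body itself is not reproduced in this digest, but going by the subject line and the diffstat above, the exported accessor presumably has roughly the shape sketched below; this is a guess for illustration, not the actual hunk:

/* mm/swapfile.c: sketch of a read-only accessor. An EXPORT_SYMBOL_GPL
 * wrapper lets a modular user such as ttm.ko read the value even though
 * the total_swap_pages variable itself is not exported. */
long get_total_swap_pages(void)
{
	return total_swap_pages;
}
EXPORT_SYMBOL_GPL(get_total_swap_pages);

Whether such a wrapper is wanted at all is exactly what is being questioned above, since built-in code can read total_swap_pages directly.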

drivers/infiniband/hw/bnxt_re/ib_verbs.c:3315:2: note: in expansion of macro 'if'

2018-02-01 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   4bf772b14675411a69b3c807f73006de0fe4b649
commit: 872f3578241d7e648b3bfcf6451a55faf97ce2e9 RDMA/bnxt_re: Add support for 
MRs with Huge pages
date:   2 weeks ago
config: i386-randconfig-sb0-02021411 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 872f3578241d7e648b3bfcf6451a55faf97ce2e9
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device':
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:2: warning: left shift count >= 
width of type
 ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE;
 ^
   drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_reg_user_mr':
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:3315:2: warning: left shift count 
>= width of type
 if (length > BNXT_RE_MAX_MR_SIZE) {
 ^
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:3315:2: warning: left shift count 
>= width of type
   In file included from include/linux/kernel.h:10:0,
from include/linux/interrupt.h:6,
from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39:
>> include/linux/compiler.h:61:17: warning: left shift count >= width of type
  static struct ftrace_branch_data   \
^
   include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^
>> drivers/infiniband/hw/bnxt_re/ib_verbs.c:3315:2: note: in expansion of macro 
>> 'if'
 if (length > BNXT_RE_MAX_MR_SIZE) {
 ^
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:3317:4: warning: left shift count 
>= width of type
   length, BNXT_RE_MAX_MR_SIZE);
   ^
--
   drivers/infiniband//hw/bnxt_re/ib_verbs.c: In function 
'bnxt_re_query_device':
   drivers/infiniband//hw/bnxt_re/ib_verbs.c:149:2: warning: left shift count 
>= width of type
 ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE;
 ^
   drivers/infiniband//hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_reg_user_mr':
   drivers/infiniband//hw/bnxt_re/ib_verbs.c:3315:2: warning: left shift count 
>= width of type
 if (length > BNXT_RE_MAX_MR_SIZE) {
 ^
   drivers/infiniband//hw/bnxt_re/ib_verbs.c:3315:2: warning: left shift count 
>= width of type
   In file included from include/linux/kernel.h:10:0,
from include/linux/interrupt.h:6,
from drivers/infiniband//hw/bnxt_re/ib_verbs.c:39:
>> include/linux/compiler.h:61:17: warning: left shift count >= width of type
  static struct ftrace_branch_data   \
^
   include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^
   drivers/infiniband//hw/bnxt_re/ib_verbs.c:3315:2: note: in expansion of 
macro 'if'
 if (length > BNXT_RE_MAX_MR_SIZE) {
 ^
   drivers/infiniband//hw/bnxt_re/ib_verbs.c:3317:4: warning: left shift count 
>= width of type
   length, BNXT_RE_MAX_MR_SIZE);
   ^

vim +/if +3315 drivers/infiniband/hw/bnxt_re/ib_verbs.c

872f35782 Somnath Kotur   2018-01-11  3302  
1ac5a4047 Selvin Xavier   2017-02-10  3303  /* uverbs */
1ac5a4047 Selvin Xavier   2017-02-10  3304  struct ib_mr 
*bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
1ac5a4047 Selvin Xavier   2017-02-10  3305u64 
virt_addr, int mr_access_flags,
1ac5a4047 Selvin Xavier   2017-02-10  3306
struct ib_udata *udata)
1ac5a4047 Selvin Xavier   2017-02-10  3307  {
1ac5a4047 Selvin Xavier   2017-02-10  3308  struct bnxt_re_pd *pd = 
container_of(ib_pd, struct bnxt_re_pd, ib_pd);
1ac5a4047 Selvin Xavier   2017-02-10  3309  struct bnxt_re_dev *rdev = 
pd->rdev;
1ac5a4047 Selvin Xavier   2017-02-10  3310  struct bnxt_re_mr *mr;
1ac5a4047 Selvin Xavier   2017-02-10  3311  struct ib_umem *umem;
872f35782 Somnath Kotur   2018-01-11  3312  u64 *pbl_tbl = NULL;
872f35782 Somnath Kotur   2018-01-11  3313  int umem_pgs, page_shift, rc;
1ac5a4047 Selvin Xavier   2017-02-10  3314  
58d4a671d Selvin Xavier   2017-06-29 @3315  if (length > 
BNXT_RE_MAX_MR_SIZE) {
58d4a671d Selvin Xavier   2017-06-29  3316  
dev_err(rdev_to_dev(rdev), "MR Size: %lld > Max supported:%ld\n",
58d4a671d Selvin Xavier   2017-06-29  3317  length, 
BNXT_RE_MAX_MR_SIZE);
58d4a671d Selvin Xavier   2017-06-29  3318  return ERR_PTR(-ENOMEM);
58d4a671d Selvin Xavier   2017-06-29  3319  }
58d4a671d Selvin Xavier   2017-06-29  3320  
1ac5a4047 Selvin Xavier   2017-02-10  3321  mr = kzalloc(sizeof(*mr), 
GFP_KERNEL);
1ac5a4047 Selvin Xavier   2017-02-10  3322  if (!mr)
1ac5a4047 Selvin Xavier   2017-02-10  3323  return ERR_PTR(-ENOMEM);
1ac5a4047 Selvin 
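
The diagnostic itself is gcc's usual complaint about a shift count that is at least the width of the promoted type, which on i386 typically means a size constant built from a 32-bit-wide literal. The definition of BNXT_RE_MAX_MR_SIZE is not shown in this report, so the constants below only illustrate the pattern; the extra notes about the 'if' macro appear because this randconfig apparently enables branch profiling (CONFIG_PROFILE_ALL_BRANCHES), which wraps every if () condition in __trace_if() and re-expands the oversized constant there:

/* Illustration only: not the actual bnxt_re definitions. */
#define MAX_MR_SIZE_BAD		(1UL << 36)	/* unsigned long is 32 bits on i386:
						 * "left shift count >= width of type" */
#define MAX_MR_SIZE_GOOD	(1ULL << 36)	/* 64-bit literal evaluates as intended */

static inline int mr_size_ok(unsigned long long length)
{
	return length <= MAX_MR_SIZE_GOOD;	/* safe comparison on 32-bit builds too */
}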

Re: [git pull] drm pull for v4.16-rc1

2018-02-01 Thread Daniel Stone
On 2 February 2018 at 02:50, Dave Airlie  wrote:
> On 2 February 2018 at 12:44, Linus Torvalds
>  wrote:
>> On Thu, Feb 1, 2018 at 6:22 PM, Dave Airlie  wrote:
>>>
>>> Turned out I was running on wayland instead of X.org and my cut-n-paste from
>>> gedit to firefox got truncated, weird. I'll go annoy some people, and make 
>>> sure
>>> it doesn't happen again.
>>
>> Heh, so there's some Wayland clipboard buffer limit.
>
> Yup or some bug getting the second chunks across from one place to another.

The transfer part of Wayland's clipboard protocol is an FD between
both clients for them to send data directly. But Firefox isn't yet
native, and I can fully believe that GNOME Shell's Xwayland clipboard
translator isn't perfect.

>> But that reminds me: is there any *standard* tool to programmatically
>> feed into the clipboard?
>>
>> I occasionally do things like
>>
>> git shortlog A..B | xsel
>>
>> in order to then paste it into some browser window or other.
>>
>> And sure, that works well. But I do it seldom enough that I never
>> remember the command, and half the time it's not even installed
>> because I've switched machines or something, and xsel is always some
>> add-on.
>>
>> What's the thing "real" X people do/use?
>
> I use gedit to move things from files to clip now, for mostly the same 
> reasons,
> I know it's usually installed. xclip and xsel are two utilities I know
> of, but I don't
> think anything gets installed by default.

That's the state of the art for X11.

Cheers,
Daniel


Re: [RFC PATCH 1/9] media: add request API core and UAPI

2018-02-01 Thread Tomasz Figa
On Fri, Feb 2, 2018 at 4:33 PM, Sakari Ailus
 wrote:
>> >> +/**
>> >> + * struct media_request_queue - queue of requests
>> >> + *
>> >> + * @mdev:media_device that manages this queue
>> >> + * @ops: implementation of the queue
>> >> + * @mutex:   protects requests, active_request, req_id, and all members 
>> >> of
>> >> + *   struct media_request
>> >> + * @active_request: request being currently run by this queue
>> >> + * @requests:list of requests (not in any particular order) that 
>> >> this
>> >> + *   queue owns.
>> >> + * @req_id:  counter used to identify requests for debugging purposes
>> >> + */
>> >> +struct media_request_queue {
>> >> + struct media_device *mdev;
>> >> + const struct media_request_queue_ops *ops;
>> >> +
>> >> + struct mutex mutex;
>> >
>> > Any particular reason for using a mutex? The request queue lock will need
>> > to be acquired from interrupts, too, so this should be changed to a
>> > spinlock.
>>
>> Will it be acquired from interrupts? In any case it should be possible
>> to change this to a spinlock.
>
> Using mutexes will effectively make this impossible, and I don't think we
> can safely say there's not going to be a need for that. So spinlocks,
> please.
>

IMHO whether a mutex or a spinlock is the right thing depends on what
kind of critical section it is used for. If it only protects data (and
according to the comment, this one seems to do so), a spinlock might
actually have better properties, e.g. not introducing the need to
reschedule if another CPU is accessing the data at the moment. It
might also depend on how heavy the data accesses are, though; we
shouldn't need to spin for too long.

Best regards,
Tomasz
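
A minimal sketch of the trade-off under discussion, using illustrative names rather than the structures from the patch: a short, data-only critical section under a spinlock remains usable from interrupt context, which a mutex-protected queue would not be:

#include <linux/spinlock.h>
#include <linux/list.h>

struct media_request;			/* opaque here; defined by the patch */

struct req_queue {			/* illustrative stand-in for the queue */
	spinlock_t lock;		/* protects requests and active */
	struct list_head requests;
	struct media_request *active;
};

/* could be called from a driver's completion interrupt */
static void req_queue_mark_idle(struct req_queue *q)
{
	unsigned long flags;

	spin_lock_irqsave(&q->lock, flags);	/* short, data-only update */
	q->active = NULL;
	spin_unlock_irqrestore(&q->lock, flags);
}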


Re: [PATCH] esp4: remove redundant initialization of pointer esph

2018-02-01 Thread Steffen Klassert
On Tue, Jan 30, 2018 at 02:53:48PM +, Colin King wrote:
> From: Colin Ian King 
> 
> Pointer esph is being assigned a value that is never read, esph is
> re-assigned and only read inside an if statement, hence the
> initialization is redundant and can be removed.
> 
> Cleans up clang warning:
> net/ipv4/esp4.c:657:21: warning: Value stored to 'esph' during
> its initialization is never read
> 
> Signed-off-by: Colin Ian King 

I've queued this for ipsec-next; it will be applied
after the merge window.


Re: [RFC PATCH 1/9] media: add request API core and UAPI

2018-02-01 Thread Sakari Ailus
Hi Alexandre,

On Tue, Jan 30, 2018 at 01:23:05PM +0900, Alexandre Courbot wrote:
> Hi Sakari, thanks for the review!
> 
> The version you reviewed is not the latest one, but I suppose most of
> your comments still apply.
> 
> On Fri, Jan 26, 2018 at 5:39 PM, Sakari Ailus  wrote:
> > Hi Alexandre,
> >
> > I remember it was discussed that the work after the V4L2 jobs API would
> > continue from the existing request API patches. I see that at least the
> > rather important support for events is missing in this version. Why was it
> > left out?
> 
> Request completion is signaled by polling on the request FD, so we
> don't need to rely on V4L2 events to signal this anymore. If we want
> to signal different kinds of events on requests we could implement a
> more sophisticated event system on top of that, but for our current
> needs polling is sufficient.

Right. This works for now indeed. We will need to revisit this when
requests are moved to the media device in the future.
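
For completeness, the userspace side of the completion model described above is plain poll() on the request file descriptor. A minimal sketch follows; the event bits used here are an assumption for illustration, since the RFC text quoted in this thread does not spell them out:

#include <poll.h>

/* Wait for a request FD to complete; POLLPRI | POLLIN is an assumed mask. */
static int wait_request_done(int req_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = req_fd, .events = POLLPRI | POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;	/* poll() error */
	if (ret == 0)
		return 0;	/* timed out, request still in flight */
	return 1;		/* request completed */
}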

> 
> What other kind of event besides completion could we want to deliver
> to user-space from a request?
> 
> >
> > I also see that variable size IOCTL argument support is no longer included.
> 
> Do we need this for the request API?

Technically there's no strict need for that now. However when the requests
are moved to the media device (i.e. other device nodes are not needed
anymore), then this is a must.

It was proposed and AFAIR agreed on as well that new media device
IOCTLs would not use reserved fields any longer but rely on variable size
IOCTL arguments instead. This is in line with your request argument struct
having no reserved fields and I don't think we should add them there.

> 
> >
> > On Fri, Dec 15, 2017 at 04:56:17PM +0900, Alexandre Courbot wrote:
> >> The request API provides a way to group buffers and device parameters
> >> into units of work to be queued and executed. This patch introduces the
> >> UAPI and core framework.
> >>
> >> This patch is based on the previous work by Laurent Pinchart. The core
> >> has changed considerably, but the UAPI is mostly untouched.
> >>
> >> Signed-off-by: Alexandre Courbot 
> >> ---
> >>  drivers/media/Makefile   |   3 +-
> >>  drivers/media/media-device.c |   6 +
> >>  drivers/media/media-request.c| 390 
> >> +++
> >>  drivers/media/v4l2-core/v4l2-ioctl.c |   2 +-
> >>  include/media/media-device.h |   3 +
> >>  include/media/media-entity.h |   6 +
> >>  include/media/media-request.h| 269 
> >>  include/uapi/linux/media.h   |  11 +
> >>  8 files changed, 688 insertions(+), 2 deletions(-)
> >>  create mode 100644 drivers/media/media-request.c
> >>  create mode 100644 include/media/media-request.h
> >>
> >> diff --git a/drivers/media/Makefile b/drivers/media/Makefile
> >> index 594b462ddf0e..985d35ec6b29 100644
> >> --- a/drivers/media/Makefile
> >> +++ b/drivers/media/Makefile
> >> @@ -3,7 +3,8 @@
> >>  # Makefile for the kernel multimedia device drivers.
> >>  #
> >>
> >> -media-objs   := media-device.o media-devnode.o media-entity.o
> >> +media-objs   := media-device.o media-devnode.o media-entity.o \
> >> +media-request.o
> >>
> >>  #
> >>  # I2C drivers should come before other drivers, otherwise they'll fail
> >> diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
> >> index e79f72b8b858..045cec7d2de9 100644
> >> --- a/drivers/media/media-device.c
> >> +++ b/drivers/media/media-device.c
> >> @@ -32,6 +32,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>
> >>  #ifdef CONFIG_MEDIA_CONTROLLER
> >>
> >> @@ -407,6 +408,7 @@ static const struct media_ioctl_info ioctl_info[] = {
> >>   MEDIA_IOC(ENUM_LINKS, media_device_enum_links, 
> >> MEDIA_IOC_FL_GRAPH_MUTEX),
> >>   MEDIA_IOC(SETUP_LINK, media_device_setup_link, 
> >> MEDIA_IOC_FL_GRAPH_MUTEX),
> >>   MEDIA_IOC(G_TOPOLOGY, media_device_get_topology, 
> >> MEDIA_IOC_FL_GRAPH_MUTEX),
> >> + MEDIA_IOC(REQUEST_CMD, media_device_request_cmd, 0),
> >>  };
> >>
> >>  static long media_device_ioctl(struct file *filp, unsigned int cmd,
> >> @@ -688,6 +690,10 @@ EXPORT_SYMBOL_GPL(media_device_init);
> >>
> >>  void media_device_cleanup(struct media_device *mdev)
> >>  {
> >> + if (mdev->req_queue) {
> >> + mdev->req_queue->ops->release(mdev->req_queue);
> >> + mdev->req_queue = NULL;
> >> + }
> >>   ida_destroy(>entity_internal_idx);
> >>   mdev->entity_internal_idx_max = 0;
> >>   media_graph_walk_cleanup(>pm_count_walk);
> >> diff --git a/drivers/media/media-request.c b/drivers/media/media-request.c
> >> new file mode 100644
> >> index ..15dc65ddfe41
> >> --- /dev/null
> >> +++ b/drivers/media/media-request.c
> >> @@ -0,0 +1,390 @@
> >> +/*
> >> + * Request and request queue base management
> >> + *
> >> + * Copyright (C) 2017, The Chromium OS Authors.  All rights reserved.
> >> + *
> >> + * 

Re: KASAN: stack-out-of-bounds Read in xfrm_state_find (4)

2018-02-01 Thread Steffen Klassert
On Thu, Feb 01, 2018 at 11:30:00AM +0100, Dmitry Vyukov wrote:
> On Thu, Feb 1, 2018 at 9:34 AM, Steffen Klassert
> 
> Hi Steffen,
> 
> Please see the email footer:
> 
> > If you want to test a patch for this bug, please reply with:
> > #syz test: git://repo/address.git branch
> > and provide the patch inline or as an attachment.

Thanks for the hint, I'd overlooked this. This is very useful
for the case where I cannot reproduce the bug but think I know
how to fix it.

There are two more cases that come to my mind where syzbot could
help.

1. I can not reproduce the bug and I don't know how to fix it,
   but some debug output would be helpful:

   syz test-debug-patch-and-send-dmesg-output: git://repo/address.git branch

2. I can not reproduce the bug and I have absolutely no idea what it
   could be:

   syz bisect: git://repo/address.git branch commit a commit b

I don't know if this is possible, but it would bring the bug-fixing
process a bit closer to the case where a real user reports a bug.


#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master


Subject: [PATCH RFC] xfrm: Refuse to insert 32 bit userspace socket policies on 
64 bit systems

We don't have a compat layer for xfrm, so userspace and kernel
structures have different sizes in this case. This results in
a broken configuration, so refuse to configure socket policies
when inserting from 32 bit userspace, as we already do for
policies inserted via netlink.

Reported-by: syzbot+e1a1577ca8bcb47b7...@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert 
---
 net/xfrm/xfrm_state.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index a3785f538018..25861a4ef872 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2056,6 +2056,11 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 
__user *optval, int optlen
struct xfrm_mgr *km;
struct xfrm_policy *pol = NULL;
 
+#ifdef CONFIG_COMPAT
+   if (in_compat_syscall())
+   return -EOPNOTSUPP;
+#endif
+
if (optlen <= 0 || optlen > PAGE_SIZE)
return -EMSGSIZE;
 
-- 
2.14.1
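
For illustration only, a minimal stand-alone sketch (not part of the posted
patch) of the size mismatch the commit message refers to: the same structure
layout can differ between a 32-bit and a 64-bit ABI, so without a compat
layer the kernel misreads what a 32-bit process passed in. The struct below
is hypothetical, not the real xfrm UAPI.

/*
 * Hypothetical example: the same struct compiled for 32-bit and 64-bit
 * userspace ends up with different sizes because 'unsigned long' and the
 * trailing padding differ between the ABIs.
 */
#include <stdio.h>

struct example_policy {
	unsigned int	index;		/* 4 bytes on both ABIs */
	unsigned long	lifetime;	/* 4 bytes on 32-bit, 8 on 64-bit */
	unsigned char	action;		/* plus padding up to the alignment */
};

int main(void)
{
	/* Prints 12 when built with -m32, 24 when built with -m64. */
	printf("sizeof(struct example_policy) = %zu\n",
	       sizeof(struct example_policy));
	return 0;
}

Because of this kind of divergence, the patch simply rejects the 32-bit path
with -EOPNOTSUPP instead of silently misinterpreting the structure.
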



Re: [PATCH v11 3/3] mm, x86: display pkey in smaps only if arch supports pkeys

2018-02-01 Thread Ram Pai
On Fri, Feb 02, 2018 at 12:27:27PM +0800, kbuild test robot wrote:
> Hi Ram,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.15 next-20180201]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_0day-2Dci_linux_commits_Ram-2DPai_mm-2Dx86-2Dpowerpc-2DEnhancements-2Dto-2DMemory-2DProtection-2DKeys_20180202-2D120004=DwIBAg=jf_iaSHvJObTbx-siA1ZOg=m-UrKChQVkZtnPpjbF6YY99NbT8FBByQ-E-ygV8luxw=Fv3tEHet1bTUrDjOnzEhXvGM_4tGlkYhJHPBnWNWgVA=Z1W6CV2tfPmLYU8lVv1oDRl2cAyQA76KE2P064A2CQY=
> config: x86_64-randconfig-x005-201804 (attached as .config)
> compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All error/warnings (new ones prefixed by >>):
> 
>In file included from arch/x86/include/asm/mmu_context.h:8:0,
> from arch/x86/events/core.c:36:
> >> include/linux/pkeys.h:16:23: error: expected identifier or '(' before 
> >> numeric constant
> #define vma_pkey(vma) 0
>   ^
> >> arch/x86/include/asm/mmu_context.h:298:19: note: in expansion of macro 
> >> 'vma_pkey'
> static inline int vma_pkey(struct vm_area_struct *vma)
>   ^~~~
> 
> vim +16 include/linux/pkeys.h
> 
>  7
>  8#ifdef CONFIG_ARCH_HAS_PKEYS
>  9#include 
> 10#else /* ! CONFIG_ARCH_HAS_PKEYS */
> 11#define arch_max_pkey() (1)
> 12#define execute_only_pkey(mm) (0)
> 13#define arch_override_mprotect_pkey(vma, prot, pkey) (0)
> 14#define PKEY_DEDICATED_EXECUTE_ONLY 0
> 15#define ARCH_VM_PKEY_FLAGS 0
>   > 16#define vma_pkey(vma) 0

Oops. Thanks for catching the issue. The following fix will resolve the error.

diff --git a/arch/x86/include/asm/mmu_context.h
b/arch/x86/include/asm/mmu_context.h
index 6d16d15..c1aeb19 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -238,11 +238,6 @@ static inline int vma_pkey(struct vm_area_struct
*vma)
 
return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
}
-#else
-static inline int vma_pkey(struct vm_area_struct *vma)
-{
-   return 0;
-}
 #endif

RP
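
For readers following the build error above, here is a minimal sketch of the
pattern the fix restores (hypothetical stand-ins, not the real kernel
headers): for a given configuration, vma_pkey must have exactly one
definition, either the arch's static inline or the generic 0-returning macro
fallback. If both are visible at once, the function-like macro rewrites the
inline's name to "0" before the compiler sees it, which is exactly the
"expected identifier or '(' before numeric constant" error in the robot
report.

/* Build with or without -DCONFIG_ARCH_HAS_PKEYS; either way it compiles. */
struct vm_area_struct { unsigned long vm_flags; };

#ifdef CONFIG_ARCH_HAS_PKEYS
/* stand-in for the arch-provided accessor */
static inline int vma_pkey(struct vm_area_struct *vma)
{
	return (int)(vma->vm_flags & 0xf);	/* placeholder extraction */
}
#else
/* generic fallback: pkeys unsupported, always report key 0 */
#define vma_pkey(vma) 0
#endif

int report_pkey(struct vm_area_struct *vma)
{
	return vma_pkey(vma);	/* resolves to whichever single definition exists */
}
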



Re: [PATCH v2] ACPI / tables: Add IORT to injectable table list

2018-02-01 Thread Yang, Shunyong
Hi, Hanjun

On Wed, 2018-01-31 at 21:32 +0800, Hanjun Guo wrote:
> Hi Shunyong,
> 
> On 2018/1/30 9:44, Yang, Shunyong wrote:
> > 
> > Hi, Rafael
> > 
> > Could you please help to review this patch? This is a small change
> > to
> > add ACPI_SIG_IORT to table_sigs[]. 
> > Loading IORT table from initrd is very useful to debug SMMU
> > node/device
> > probe, MSI allocation, stream id translation and verifying IORT
> > table
> > from firmware. So, I add this.
> It's true, mappings in IORT will be easy getting wrong, so it would
> be
> good to test it without updating the firmware.
> 
> But I think you'd better to add your comment about why you need
> IORT in the commit message in your patch, that will be useful
> to convince Rafael to take your patch.
> 

Thanks for your suggestion. I will add detailed information to commit
message and send out v3 later.

Thanks.
Shunyong.


Re: [PATCH] ARM: dts: imx6q-bx50v3: Enable secure-reg-access

2018-02-01 Thread Shawn Guo
+ Frank

On Mon, Jan 15, 2018 at 05:07:22PM +0100, Sebastian Reichel wrote:
> From: Peter Senna Tschudin 
> 
> Add secure-reg-access on device tree include file for Bx50 devices
> to enable PMU and hardware counters for perf.
> 
> Signed-off-by: Peter Senna Tschudin 
> Signed-off-by: Sebastian Reichel 
> ---
>  arch/arm/boot/dts/imx6q-bx50v3.dtsi | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx6q-bx50v3.dtsi 
> b/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> index 86cfd4481e72..ccaaee83e2fa 100644
> --- a/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> +++ b/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> @@ -43,6 +43,13 @@
>  #include "imx6q-ba16.dtsi"
>  
>  / {
> + soc {
> + pmu {
> + compatible = "arm,cortex-a9-pmu";
> + secure-reg-access;

I'm not sure this could be a board-level configuration.  Shouldn't this
property just be added to the pmu node in imx6qdl.dtsi?

Shawn

> + };
> + };
> +
>   clocks {
>   mclk: clock@0 {
>   compatible = "fixed-clock";
> -- 
> 2.15.1
> 



Re: [PATCH v4 02/10] ufs: sysfs: device descriptor

2018-02-01 Thread gre...@linuxfoundation.org
On Fri, Feb 02, 2018 at 12:25:46AM +, Bart Van Assche wrote:
> On Thu, 2018-02-01 at 18:15 +0200, Stanislav Nijnikov wrote:
> > +enum ufs_desc_param_size {
> > +   UFS_PARAM_BYTE_SIZE = 1,
> > +   UFS_PARAM_WORD_SIZE = 2,
> > +   UFS_PARAM_DWORD_SIZE= 4,
> > +   UFS_PARAM_QWORD_SIZE= 8,
> > +};
> 
> Please do not copy bad naming choices from the Windows kernel into the Linux
> kernel. Using names like WORD / DWORD / QWORD is much less readable than using
> the numeric constants 2, 4, 8. Hence my proposal to leave out the above enum
> completely.

Are you sure those do not come from the spec itself?  It's been a while
since I last read it, but for some reason I remember those types of
names being in there.  But I might be confusing specs here.

thanks,

greg k-h


Re: [PATCH] ARM: dts: imx6q-bx50v3: disable SD card (usdhc2)

2018-02-01 Thread Shawn Guo
On Mon, Jan 15, 2018 at 03:44:24PM +0100, Sebastian Reichel wrote:
> From: Ian Ray 
> 
> Disable the SD card interface from devicetree.
> 
> Signed-off-by: Ian Ray 
> Signed-off-by: Sebastian Reichel 

I applied the patch [1] from Ian.

Shawn

[1] https://www.spinics.net/lists/devicetree/msg209294.html

> ---
>  arch/arm/boot/dts/imx6q-bx50v3.dtsi | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx6q-bx50v3.dtsi 
> b/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> index aefce581c0c3..86cfd4481e72 100644
> --- a/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> +++ b/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> @@ -328,6 +328,10 @@
>   };
>  };
>  
> + {
> + status = "disabled";
> +};
> +
>   {
>   pinctrl-names = "default";
>   pinctrl-0 = <_usdhc4>;
> -- 
> 2.15.1
> 


[PATCH 2/2] KVM: X86: Add per-VM no-HLT-exiting capability

2018-02-01 Thread Wanpeng Li
From: Wanpeng Li 

If host CPUs are dedicated to a VM, we can avoid VM exits on HLT.
This patch adds the per-VM non-HLT-exiting capability.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 Documentation/virtual/kvm/api.txt | 11 +++
 arch/x86/include/asm/kvm_host.h   |  2 ++
 arch/x86/kvm/vmx.c| 21 +
 arch/x86/kvm/x86.c|  5 +
 arch/x86/kvm/x86.h|  5 +
 include/uapi/linux/kvm.h  |  1 +
 6 files changed, 45 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index e5f1743..573a3e5 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4222,6 +4222,17 @@ enables QEMU to build error log and branch to guest 
kernel registered
 machine check handling routine. Without this capability KVM will
 branch to guests' 0x200 interrupt vector.
 
+7.13 KVM_CAP_X86_GUEST_HLT
+
+Architectures: x86
+Parameters: none
+Returns: 0 on success
+
+This capability indicates that a guest using HLT to stop a virtual CPU
+will not cause a VM exit. As such, time spent while a virtual CPU is
+halted in this way will then be accounted for as guest running time on
+the host, KVM_FEATURE_PV_UNHALT should be disabled.
+
 8. Other capabilities.
 --
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dd6f57a..c566ea0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -804,6 +804,8 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
+   bool hlt_in_guest;
+
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b1e554a..6cfd8d3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2478,6 +2478,19 @@ static int nested_vmx_check_exception(struct kvm_vcpu 
*vcpu, unsigned long *exit
return 0;
 }
 
+static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
+{
+   /*
+* Ensure that we clear the HLT state in the VMCS.  We don't need to
+* explicitly skip the instruction because if the HLT state is set,
+* then the instruction is already executing and RIP has already been
+* advanced.
+*/
+   if (kvm_hlt_in_guest(vcpu->kvm) &&
+   vmcs_read32(GUEST_ACTIVITY_STATE) == GUEST_ACTIVITY_HLT)
+   vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
+}
+
 static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -2508,6 +2521,8 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
intr_info |= INTR_TYPE_HARD_EXCEPTION;
 
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
+
+   vmx_clear_hlt(vcpu);
 }
 
 static bool vmx_rdtscp_supported(void)
@@ -5301,6 +5316,8 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
exec_control |= CPU_BASED_CR3_STORE_EXITING |
CPU_BASED_CR3_LOAD_EXITING  |
CPU_BASED_INVLPG_EXITING;
+   if (kvm_hlt_in_guest(vmx->vcpu.kvm))
+   exec_control &= ~CPU_BASED_HLT_EXITING;
return exec_control;
 }
 
@@ -5729,6 +5746,8 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
} else
intr |= INTR_TYPE_EXT_INTR;
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
+
+   vmx_clear_hlt(vcpu);
 }
 
 static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
@@ -5759,6 +5778,8 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
 
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
+
+   vmx_clear_hlt(vcpu);
 }
 
 static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c13cd14..a508247 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2740,6 +2740,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
ext)
case KVM_CAP_SET_BOOT_CPU_ID:
case KVM_CAP_SPLIT_IRQCHIP:
case KVM_CAP_IMMEDIATE_EXIT:
+   case KVM_CAP_X86_GUEST_HLT:
r = 1;
break;
case KVM_CAP_ADJUST_CLOCK:
@@ -4061,6 +4062,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 
r = 0;
break;
+   case KVM_CAP_X86_GUEST_HLT:
+   kvm->arch.hlt_in_guest = cap->args[0];
+   r = 0;
+   break;
default:
r = -EINVAL;
break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index b91215d..96fe84e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -270,4 +270,9 @@ static inline bool kvm_mwait_in_guest(void)
!boot_cpu_has_bug(X86_BUG_MONITOR);
 }
 
+static inline bool kvm_hlt_in_guest(struct kvm *kvm)
+{
+   return kvm->arch.hlt_in_guest;
+}
+
 #endif
diff --git 
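
As a rough illustration of how a VMM might opt a VM into this behaviour,
here is a hedged userspace sketch. It assumes the KVM_CAP_X86_GUEST_HLT
constant added by this series is present in the uapi headers, and it mirrors
how kvm_vm_ioctl_enable_cap() above consumes cap->args[0]; it is not an
official QEMU snippet.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: enable the proposed per-VM no-HLT-exiting behaviour on an
 * already-created VM file descriptor via the standard KVM_ENABLE_CAP
 * VM ioctl.  Returns 0 on success, -1 with errno set on failure.
 */
int enable_guest_hlt(int vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_X86_GUEST_HLT;	/* capability added by this series */
	cap.args[0] = 1;			/* 1 = do not exit on HLT */

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
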

[PATCH 1/2] KVM: X86: Add dedicated pCPU hint PV_DEDICATED

2018-02-01 Thread Wanpeng Li
From: Wanpeng Li 

Waiman Long mentioned that:

 Generally speaking, unfair lock performs well for VMs with a small
 number of vCPUs. Native qspinlock may perform better than pvqspinlock
 if there is vCPU pinning and there is no vCPU over-commitment.

This patch adds a performance hint to allow the hypervisor admin to choose
the qspinlock to be used when a dedicated pCPU is available.

PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
PV_DEDICATED = 0, PV_UNHALT = 1: default is Hybrid PV queued/unfair lock
PV_DEDICATED = 0, PV_UNHALT = 0: default is tas

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 Documentation/virtual/kvm/cpuid.txt  | 6 ++
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kernel/kvm.c| 6 ++
 3 files changed, 13 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt 
b/Documentation/virtual/kvm/cpuid.txt
index 87a7506..c0740b1 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT  || 7 || guest checks 
this feature bit
||   || before enabling paravirtualized
||   || spinlock support.
 --
+KVM_FEATURE_PV_DEDICATED   || 8 || guest checks this feature bit
+   ||   || to determine if they run on
+   ||   || dedicated vCPUs, allowing opti-
+   ||   || mizations such as usage of
+   ||   || qspinlocks.
+--
 KVM_FEATURE_PV_TLB_FLUSH   || 9 || guest checks this feature bit
||   || before enabling paravirtualized
||   || tlb flush.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 6cfa9c8..9a5ef67 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -25,6 +25,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_PV_DEDICATED   8
 #define KVM_FEATURE_PV_TLB_FLUSH   9
 #define KVM_FEATURE_ASYNC_PF_VMEXIT10
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index aa2b706..6f0e43f 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -726,6 +726,12 @@ void __init kvm_spinlock_init(void)
 {
if (!kvm_para_available())
return;
+
+   if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED)) {
+   static_branch_disable(_spin_lock_key);
+   return;
+   }
+
/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;
-- 
2.7.4
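
For context, kvm_para_has_feature() boils down to a CPUID query of the KVM
paravirt feature leaf. Below is an illustrative C sketch of the same check
(leaf 0x40000001, feature bits in EAX, bit 8 as proposed by this patch); the
real guest code goes through kvm_arch_para_features() and should first
confirm a KVM hypervisor is present via leaf 0x40000000.

#include <stdbool.h>
#include <cpuid.h>	/* raw __cpuid() helper from GCC/Clang */

#define KVM_CPUID_FEATURES		0x40000001
#define KVM_FEATURE_PV_DEDICATED	8	/* bit number proposed by this patch */

/*
 * Sketch: query the KVM paravirt feature leaf directly.  The raw __cpuid()
 * helper is used because the 0x4000xxxx hypervisor leaves are not covered
 * by __get_cpuid()'s maximum-level check.
 */
static bool guest_has_pv_dedicated(void)
{
	unsigned int eax, ebx, ecx, edx;

	__cpuid(KVM_CPUID_FEATURES, eax, ebx, ecx, edx);

	return eax & (1u << KVM_FEATURE_PV_DEDICATED);
}
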



[PATCH 0/2] KVM: X86: Add dedicated pCPU hint and per-VM non-HLT-exiting capability

2018-02-01 Thread Wanpeng Li
Waiman Long mentioned that:

 Generally speaking, unfair lock performs well for VMs with a small
 number of vCPUs. Native qspinlock may perform better than pvqspinlock
 if there is vCPU pinning and there is no vCPU over-commitment.

This patchset adds a PV_DEDICATED performance hint to allow the hypervisor
admin to choose the qspinlock to be used when a dedicated pCPU is
available.

In addition, following the original discussion of hlt in VMX non-root mode,
https://www.spinics.net/lists/kvm/msg152397.html, this patchset also
adds the per-VM non-HLT-exiting capability to further improve performance
in dedicated pCPU scenarios.

Wanpeng Li (2):
  KVM: X86: Add dedicated pCPU hint PV_DEDICATED
  KVM: X86: Add per-VM no-HLT-exiting capability

 Documentation/virtual/kvm/api.txt| 11 +++
 Documentation/virtual/kvm/cpuid.txt  |  6 ++
 arch/x86/include/asm/kvm_host.h  |  2 ++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/kvm.c|  6 ++
 arch/x86/kvm/vmx.c   | 21 +
 arch/x86/kvm/x86.c   |  5 +
 arch/x86/kvm/x86.h   |  5 +
 include/uapi/linux/kvm.h |  1 +
 9 files changed, 58 insertions(+)

-- 
2.7.4



[PATCH 5/6] nvme-pci: discard wait timeout when delete cq/sq

2018-02-01 Thread Jianchao Wang
Currently, nvme_disable_io_queues can be woken up by both the request
completion path and the wait timeout path. This is unnecessary and could
introduce a race between nvme_dev_disable and the request timeout path.
When a delete cq/sq command expires, nvme_disable_io_queues will
also be woken up and return to nvme_dev_disable, which then handles the
outstanding requests. This races with the request timeout path.

To fix it, just use wait_for_completion instead of the timeout variant.
The request timeout path will wake it up.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5b192b0..a838713c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2048,7 +2048,6 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 
opcode)
 static void nvme_disable_io_queues(struct nvme_dev *dev)
 {
int pass, queues = dev->online_queues - 1;
-   unsigned long timeout;
u8 opcode = nvme_admin_delete_sq;
 
for (pass = 0; pass < 2; pass++) {
@@ -2056,15 +2055,12 @@ static void nvme_disable_io_queues(struct nvme_dev *dev)
 
reinit_completion(>ioq_wait);
  retry:
-   timeout = ADMIN_TIMEOUT;
for (; i > 0; i--, sent++)
if (nvme_delete_queue(>queues[i], opcode))
break;
 
while (sent--) {
-   timeout = 
wait_for_completion_io_timeout(>ioq_wait, timeout);
-   if (timeout == 0)
-   return;
+   wait_for_completion(>ioq_wait);
if (i)
goto retry;
}
-- 
2.7.4



[PATCH 3/6] blk-mq: make blk_mq_rq_update_aborted_gstate an external interface

2018-02-01 Thread Jianchao Wang
No functional change, just make blk_mq_rq_update_aborted_gstate an
external interface.

Signed-off-by: Jianchao Wang 
---
 block/blk-mq.c | 3 ++-
 include/linux/blk-mq.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 01f271d..a027ca2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -581,7 +581,7 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int 
*srcu_idx)
*srcu_idx = srcu_read_lock(hctx->srcu);
 }
 
-static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
+void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
 {
unsigned long flags;
 
@@ -597,6 +597,7 @@ static void blk_mq_rq_update_aborted_gstate(struct request 
*rq, u64 gstate)
u64_stats_update_end(>aborted_gstate_sync);
local_irq_restore(flags);
 }
+EXPORT_SYMBOL(blk_mq_rq_update_aborted_gstate);
 
 static u64 blk_mq_rq_aborted_gstate(struct request *rq)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8efcf49..ad54024 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -257,6 +257,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool 
at_head,
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long 
msecs);
 void blk_mq_complete_request(struct request *rq);
+void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate);
 
 bool blk_mq_queue_stopped(struct request_queue *q);
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
-- 
2.7.4



[PATCH 1/6] nvme-pci: move clearing host mem behind stopping queues

2018-02-01 Thread Jianchao Wang
Move clearing the host memory buffer to after stopping the queues. This
prepares for a following patch which will grab all the outstanding requests.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6fe7af0..00cffed 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2186,7 +2186,10 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown)
if (!dead) {
if (shutdown)
nvme_wait_freeze_timeout(>ctrl, NVME_IO_TIMEOUT);
+   }
+   nvme_stop_queues(>ctrl);
 
+   if (!dead) {
/*
 * If the controller is still alive tell it to stop using the
 * host memory buffer.  In theory the shutdown / reset should
@@ -2195,11 +2198,6 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown)
 */
if (dev->host_mem_descs)
nvme_set_host_mem(dev, 0);
-
-   }
-   nvme_stop_queues(>ctrl);
-
-   if (!dead) {
nvme_disable_io_queues(dev);
nvme_disable_admin_queue(dev, shutdown);
}
-- 
2.7.4



[PATCH 4/6] nvme-pci: break up nvme_timeout and nvme_dev_disable

2018-02-01 Thread Jianchao Wang
Currently, the complicated relationship between nvme_dev_disable
and nvme_timeout introduces many circular patterns which may
trigger deadlocks or IO hangs. Let's
enumerate the tangles between them:
 - nvme_timeout has to invoke nvme_dev_disable to stop the
   controller from doing DMA access before freeing the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete
   adminq requests to set the HMB or delete sq/cq when the controller
   does not respond.
 - nvme_dev_disable races with nvme_timeout when cancelling the
   outstanding requests.

To break them up, let's first look at what kinds of requests
nvme_timeout has to handle.

RESETTING  previous adminq/IOq request
or shutdownadminq requests from nvme_dev_disable

RECONNECTING   adminq requests from nvme_reset_work

nvme_timeout has to invoke nvme_dev_disable first before completing
all the expired requests above. We avoid this as follows.

For the previous adminq/IOq request:
use blk_abort_request to force all the outstanding requests to expire
in nvme_dev_disable. In nvme_timeout, set NVME_REQ_CANCELLED and
return BLK_EH_NOT_HANDLED. Then the request will not be completed and
freed. We needn't invoke nvme_dev_disable any more.

blk_abort_request is safe when it races with the irq completion path.
We have been able to grab all the outstanding requests. This
eliminates the race between nvme_timeout and nvme_dev_disable.

We use NVME_REQ_CANCELLED to identify them. After the controller is
totally disabled/shutdown, we invoke blk_mq_rq_update_aborted_gstate
to clear requests and invoke blk_mq_complete_request to complete them.

In addition, to identify the previous adminq/IOq request and adminq
requests from nvme_dev_disable, we introduce NVME_PCI_OUTSTANDING_GRABBING
and NVME_PCI_OUTSTANDING_GRABBED to let nvme_timeout be able to
distinguish them.

For the adminq requests from nvme_dev_disable/nvme_reset_work:
invoke nvme_disable_ctrl directly, then set NVME_REQ_CANCELLED and
return BLK_EH_HANDLED. nvme_dev_disable/nvme_reset_work will
see the error.

With this patch, we avoid nvme_dev_disable being invoked
by nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 146 
 1 file changed, 123 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7fa397..5b192b0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -70,6 +70,8 @@ struct nvme_queue;
 
 static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+#define NVME_PCI_OUTSTANDING_GRABBING 1
+#define NVME_PCI_OUTSTANDING_GRABBED 2
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -80,6 +82,7 @@ struct nvme_dev {
struct blk_mq_tag_set admin_tagset;
u32 __iomem *dbs;
struct device *dev;
+   int grab_flag;
struct dma_pool *prp_page_pool;
struct dma_pool *prp_small_pool;
unsigned online_queues;
@@ -1130,6 +1133,23 @@ static void abort_endio(struct request *req, 
blk_status_t error)
blk_mq_free_request(req);
 }
 
+static void nvme_pci_disable_ctrl_directly(struct nvme_dev *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
+   u32 csts;
+   bool dead;
+
+   if (!pci_is_enabled(pdev))
+   return;
+
+   csts = readl(dev->bar + NVME_REG_CSTS);
+   dead = !!((csts & NVME_CSTS_CFS) ||
+   !(csts & NVME_CSTS_RDY) ||
+   pdev->error_state  != pci_channel_io_normal);
+   if (!dead)
+   nvme_disable_ctrl(>ctrl, dev->ctrl.cap);
+}
+
 static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 {
 
@@ -1191,12 +1211,13 @@ static enum blk_eh_timer_return nvme_timeout(struct 
request *req, bool reserved)
 
/*
 * Reset immediately if the controller is failed
+* nvme_dev_disable will take over the expired requests.
 */
if (nvme_should_reset(dev, csts)) {
+   nvme_req(req)->flags |= NVME_REQ_CANCELLED;
nvme_warn_reset(dev, csts);
-   nvme_dev_disable(dev, false);
nvme_reset_ctrl(>ctrl);
-   return BLK_EH_HANDLED;
+   return BLK_EH_NOT_HANDLED;
}
 
/*
@@ -1210,38 +1231,51 @@ static enum blk_eh_timer_return nvme_timeout(struct 
request *req, bool reserved)
}
 
/*
-* Shutdown immediately if controller times out while starting. The
-* reset work will see the pci device disabled when it gets the forced
-* cancellation error. All outstanding requests are completed on
-* shutdown, so we return BLK_EH_HANDLED.
+* The previous outstanding requests on adminq and ioq have been
+* grabbed 

[PATCH 0/6] nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-02-01 Thread Jianchao Wang
Hi Christoph, Keith and Sagi

Please consider and comment on the following patchset.
That's really appreciated.

There is a complicated relationship between nvme_timeout and nvme_dev_disable.
 - nvme_timeout has to invoke nvme_dev_disable to stop the
   controller from doing DMA access before freeing the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete
   adminq requests to set the HMB or delete sq/cq when the controller
   does not respond.
 - nvme_dev_disable races with nvme_timeout when cancelling the
   outstanding requests.
We have found some issues introduced by them; please refer to the following links:

http://lists.infradead.org/pipermail/linux-nvme/2018-January/015053.html 
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015276.html
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015328.html
Even so, we cannot ensure there are no other issues.

The best way to fix them is to break up the relationship between them.
With this patchset, we avoid nvme_dev_disable being invoked
by nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.


There are 6 patches:

The 1st ~ 3rd patches do some preparation for the 4th one.
The 4th avoids nvme_dev_disable being invoked by nvme_timeout, and implements
the synchronization between them. For more details, please refer to the comment
of that patch.
The 5th fixes a bug that appears after the 4th patch is introduced. It lets
nvme_disable_io_queues be woken up only by the completion path.
The 6th fixes a bug found during testing; it is not related to the 4th patch.

This patchset was tested under a debug patch for some days,
and some bug fixes have been done.
The debug patch and other patches are available in the following git branch:
https://github.com/jianchwa/linux-blcok.git nvme_fixes_test

Jianchao Wang (6)
0001-nvme-pci-move-clearing-host-mem-behind-stopping-queu.patch
0002-nvme-pci-fix-the-freeze-and-quiesce-for-shutdown-and.patch
0003-blk-mq-make-blk_mq_rq_update_aborted_gstate-a-extern.patch
0004-nvme-pci-break-up-nvme_timeout-and-nvme_dev_disable.patch
0005-nvme-pci-discard-wait-timeout-when-delete-cq-sq.patch
0006-nvme-pci-suspend-queues-based-on-online_queues.patch

diff stat following:
 block/blk-mq.c  |   3 +-
 drivers/nvme/host/pci.c | 225 
++-
 include/linux/blk-mq.h  |   1 +
 3 files changed, 169 insertions(+), 60 deletions(-)

Thanks
Jianchao



[PATCH 6/6] nvme-pci: suspend queues based on online_queues

2018-02-01 Thread Jianchao Wang
The nvme cq irq is freed based on queue_count. When sq/cq creation
fails, the irq will not be set up, and free_irq will warn 'Try to free
already-free irq'.

To fix it, only increase online_queues when the adminq/sq/cq is
created and the associated irq is set up. Then suspend queues based
on online_queues.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a838713c..e37f209 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1349,9 +1349,6 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
nvmeq->cq_vector = -1;
spin_unlock_irq(>q_lock);
 
-   if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
-   blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
-
pci_free_irq(to_pci_dev(nvmeq->dev->dev), vector, nvmeq);
 
return 0;
@@ -1495,13 +1492,15 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, 
int qid)
nvme_init_queue(nvmeq, qid);
result = queue_request_irq(nvmeq);
if (result < 0)
-   goto release_sq;
+   goto offline;
 
return result;
 
- release_sq:
+offline:
+   dev->online_queues--;
+release_sq:
adapter_delete_sq(dev, qid);
- release_cq:
+release_cq:
adapter_delete_cq(dev, qid);
return result;
 }
@@ -1641,6 +1640,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev 
*dev)
result = queue_request_irq(nvmeq);
if (result) {
nvmeq->cq_vector = -1;
+   dev->online_queues--;
return result;
}
 
@@ -1988,6 +1988,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
result = queue_request_irq(adminq);
if (result) {
adminq->cq_vector = -1;
+   dev->online_queues--;
return result;
}
return nvme_create_io_queues(dev);
@@ -2257,13 +2258,16 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown)
int i;
bool dead = true;
struct pci_dev *pdev = to_pci_dev(dev->dev);
+   int onlines;
 
mutex_lock(>shutdown_lock);
if (pci_is_enabled(pdev)) {
u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
-   dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
-   pdev->error_state  != pci_channel_io_normal);
+   dead = !!((csts & NVME_CSTS_CFS) ||
+   !(csts & NVME_CSTS_RDY) ||
+   (pdev->error_state  != pci_channel_io_normal) ||
+   (dev->online_queues == 0));
}
 
/* Just freeze the queue for shutdown case */
@@ -2297,9 +2301,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown)
nvme_disable_io_queues(dev);
nvme_disable_admin_queue(dev, shutdown);
}
-   for (i = dev->ctrl.queue_count - 1; i >= 0; i--)
+
+   onlines = dev->online_queues;
+   for (i = onlines - 1; i >= 0; i--)
nvme_suspend_queue(>queues[i]);
 
+   if (dev->ctrl.admin_q)
+   blk_mq_quiesce_queue(dev->ctrl.admin_q);
+
nvme_pci_disable(dev);
 
blk_mq_tagset_busy_iter(>tagset, nvme_pci_cancel_rq, >ctrl);
@@ -2444,12 +2453,12 @@ static void nvme_reset_work(struct work_struct *work)
 * Keep the controller around but remove all namespaces if we don't have
 * any working I/O queue.
 */
-   if (dev->online_queues < 2) {
+   if (dev->online_queues == 1) {
dev_warn(dev->ctrl.device, "IO queues not created\n");
nvme_kill_queues(>ctrl);
nvme_remove_namespaces(>ctrl);
new_state = NVME_CTRL_ADMIN_ONLY;
-   } else {
+   } else if (dev->online_queues > 1) {
/* hit this only when allocate tagset fails */
if (nvme_dev_add(dev))
new_state = NVME_CTRL_ADMIN_ONLY;
-- 
2.7.4



[PATCH 4/6] nvme-pci: break up nvme_timeout and nvme_dev_disable

2018-02-01 Thread Jianchao Wang
Currently, the complicated relationship between nvme_dev_disable
and nvme_timeout has become a devil that will introduce many
circular pattern which may trigger deadlock or IO hang. Let's
enumerate the tangles between them:
 - nvme_timeout has to invoke nvme_dev_disable to stop the
   controller doing DMA access before free the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete
   adminq requests to set HMB or delete sq/cq when the controller
   has no response.
 - nvme_dev_disable will race with nvme_timeout when cancels the
   outstanding requests.

To break up them, let's first look at what's kind of requests
nvme_timeout has to handle.

RESETTING  previous adminq/IOq request
or shutdownadminq requests from nvme_dev_disable

RECONNECTING   adminq requests from nvme_reset_work

nvme_timeout has to invoke nvme_dev_disable first before complete
all the expired request above. We avoid this as following.

For the previous adminq/IOq request:
use blk_abort_request to force all the outstanding requests expired
in nvme_dev_disable. In nvme_timeout, set NVME_REQ_CANCELLED and
return BLK_EH_NOT_HANDLED. Then the request will not be completed and
freed. We needn't invoke nvme_dev_disable any more.

blk_abort_request is safe when race with irq completion path.
we have been able to grab all the outstanding requests. This could
eliminate the race between nvme_timeout and nvme_dev_disable.

We use NVME_REQ_CANCELLED to identify them. After the controller is
totally disabled/shutdown, we invoke blk_mq_rq_update_aborted_gstate
to clear requests and invoke blk_mq_complete_request to complete them.

In addition, to identify the previous adminq/IOq request and adminq
requests from nvme_dev_disable, we introduce NVME_PCI_OUTSTANDING_GRABBING
and NVME_PCI_OUTSTANDING_GRABBED to let nvme_timeout be able to
distinguish them.

For the adminq requests from nvme_dev_disable/nvme_reset_work:
invoke nvme_disable_ctrl directly, then set NVME_REQ_CANCELLED and
return BLK_EH_HANDLED. nvme_dev_disable/nvme_reset_work will
see the error.

With this patch, we could avoid nvme_dev_disable to be invoked
by nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 146 
 1 file changed, 123 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7fa397..5b192b0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -70,6 +70,8 @@ struct nvme_queue;
 
 static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+#define NVME_PCI_OUTSTANDING_GRABBING 1
+#define NVME_PCI_OUTSTANDING_GRABBED 2
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -80,6 +82,7 @@ struct nvme_dev {
struct blk_mq_tag_set admin_tagset;
u32 __iomem *dbs;
struct device *dev;
+   int grab_flag;
struct dma_pool *prp_page_pool;
struct dma_pool *prp_small_pool;
unsigned online_queues;
@@ -1130,6 +1133,23 @@ static void abort_endio(struct request *req, 
blk_status_t error)
blk_mq_free_request(req);
 }
 
+static void nvme_pci_disable_ctrl_directly(struct nvme_dev *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
+   u32 csts;
+   bool dead;
+
+   if (!pci_is_enabled(pdev))
+   return;
+
+   csts = readl(dev->bar + NVME_REG_CSTS);
+   dead = !!((csts & NVME_CSTS_CFS) ||
+   !(csts & NVME_CSTS_RDY) ||
+   pdev->error_state  != pci_channel_io_normal);
+   if (!dead)
+   nvme_disable_ctrl(>ctrl, dev->ctrl.cap);
+}
+
 static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 {
 
@@ -1191,12 +1211,13 @@ static enum blk_eh_timer_return nvme_timeout(struct 
request *req, bool reserved)
 
/*
 * Reset immediately if the controller is failed
+* nvme_dev_disable will take over the expired requests.
 */
if (nvme_should_reset(dev, csts)) {
+   nvme_req(req)->flags |= NVME_REQ_CANCELLED;
nvme_warn_reset(dev, csts);
-   nvme_dev_disable(dev, false);
nvme_reset_ctrl(>ctrl);
-   return BLK_EH_HANDLED;
+   return BLK_EH_NOT_HANDLED;
}
 
/*
@@ -1210,38 +1231,51 @@ static enum blk_eh_timer_return nvme_timeout(struct 
request *req, bool reserved)
}
 
/*
-* Shutdown immediately if controller times out while starting. The
-* reset work will see the pci device disabled when it gets the forced
-* cancellation error. All outstanding requests are completed on
-* shutdown, so we return BLK_EH_HANDLED.
+* The previous outstanding requests on adminq and ioq have been
+* grabbed or drained for RECONNECTING 

[PATCH 0/6]nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-02-01 Thread Jianchao Wang
Hi Christoph, Keith and Sagi

Please consider and comment on the following patchset.
That's really appreciated.

There is a complicated relationship between nvme_timeout and nvme_dev_disable.
 - nvme_timeout has to invoke nvme_dev_disable to stop the
   controller doing DMA access before freeing the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete
   adminq requests to set HMB or delete sq/cq when the controller
   has no response.
 - nvme_dev_disable will race with nvme_timeout when it cancels the
   outstanding requests.
We have found some issues introduced by this; please refer to the following links:

http://lists.infradead.org/pipermail/linux-nvme/2018-January/015053.html 
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015276.html
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015328.html
Even so, we cannot ensure there are no other issues.

The best way to fix them is to break up the relationship between them.
With this patchset, we can avoid nvme_dev_disable being invoked
by nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.


There are 6 patches:

Patches 1-3 do some preparation for the 4th one.
The 4th avoids nvme_dev_disable being invoked by nvme_timeout and implements
the synchronization between them. For more details, please refer to the
comment of that patch.
The 5th fixes a bug introduced by the 4th patch: it lets nvme_delete_io_queues
be woken up only by the completion path.
The 6th fixes a bug found during testing; it is not related to the 4th patch.

This patchset was tested with a debug patch for some days,
and some bug fixes have been folded in.
The debug patch and the other patches are available in the following git branch:
https://github.com/jianchwa/linux-blcok.git nvme_fixes_test

Jianchao Wang (6)
0001-nvme-pci-move-clearing-host-mem-behind-stopping-queu.patch
0002-nvme-pci-fix-the-freeze-and-quiesce-for-shutdown-and.patch
0003-blk-mq-make-blk_mq_rq_update_aborted_gstate-a-extern.patch
0004-nvme-pci-break-up-nvme_timeout-and-nvme_dev_disable.patch
0005-nvme-pci-discard-wait-timeout-when-delete-cq-sq.patch
0006-nvme-pci-suspend-queues-based-on-online_queues.patch

diff stat following:
 block/blk-mq.c  |   3 +-
 drivers/nvme/host/pci.c | 225 
++-
 include/linux/blk-mq.h  |   1 +
 3 files changed, 169 insertions(+), 60 deletions(-)

Thanks
Jianchao



[PATCH 6/6] nvme-pci: suspend queues based on online_queues

2018-02-01 Thread Jianchao Wang
The nvme cq irq is freed based on queue_count. When sq/cq creation
fails, the irq will not be set up, so free_irq will warn
'Try to free already-free irq'.

To fix it, only increase online_queues when the adminq/sq/cq has been
created and its associated irq has been set up, then suspend queues
based on online_queues.
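
Put differently, the invariant after this patch is that dev->online_queues
counts exactly the queues whose irq was successfully requested, so teardown
can be driven by it (an illustrative summary, not a hunk from the patch):

	/* 0 <= i < dev->online_queues  =>  queues[i] owns a live irq vector */
	for (i = dev->online_queues - 1; i >= 0; i--)
		nvme_suspend_queue(&dev->queues[i]);	/* frees each irq exactly once */

and pci_free_irq() is never called for a vector that was never requested,
which is what produced the 'Try to free already-free irq' warning before.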

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a838713c..e37f209 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1349,9 +1349,6 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
nvmeq->cq_vector = -1;
spin_unlock_irq(&nvmeq->q_lock);
 
-   if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
-   blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
-
pci_free_irq(to_pci_dev(nvmeq->dev->dev), vector, nvmeq);
 
return 0;
@@ -1495,13 +1492,15 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, 
int qid)
nvme_init_queue(nvmeq, qid);
result = queue_request_irq(nvmeq);
if (result < 0)
-   goto release_sq;
+   goto offline;
 
return result;
 
- release_sq:
+offline:
+   dev->online_queues--;
+release_sq:
adapter_delete_sq(dev, qid);
- release_cq:
+release_cq:
adapter_delete_cq(dev, qid);
return result;
 }
@@ -1641,6 +1640,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev 
*dev)
result = queue_request_irq(nvmeq);
if (result) {
nvmeq->cq_vector = -1;
+   dev->online_queues--;
return result;
}
 
@@ -1988,6 +1988,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
result = queue_request_irq(adminq);
if (result) {
adminq->cq_vector = -1;
+   dev->online_queues--;
return result;
}
return nvme_create_io_queues(dev);
@@ -2257,13 +2258,16 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown)
int i;
bool dead = true;
struct pci_dev *pdev = to_pci_dev(dev->dev);
+   int onlines;
 
mutex_lock(&dev->shutdown_lock);
if (pci_is_enabled(pdev)) {
u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
-   dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
-   pdev->error_state  != pci_channel_io_normal);
+   dead = !!((csts & NVME_CSTS_CFS) ||
+   !(csts & NVME_CSTS_RDY) ||
+   (pdev->error_state  != pci_channel_io_normal) ||
+   (dev->online_queues == 0));
}
 
/* Just freeze the queue for shutdown case */
@@ -2297,9 +2301,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown)
nvme_disable_io_queues(dev);
nvme_disable_admin_queue(dev, shutdown);
}
-   for (i = dev->ctrl.queue_count - 1; i >= 0; i--)
+
+   onlines = dev->online_queues;
+   for (i = onlines - 1; i >= 0; i--)
nvme_suspend_queue(&dev->queues[i]);
 
+   if (dev->ctrl.admin_q)
+   blk_mq_quiesce_queue(dev->ctrl.admin_q);
+
nvme_pci_disable(dev);
 
blk_mq_tagset_busy_iter(&dev->tagset, nvme_pci_cancel_rq, &dev->ctrl);
@@ -2444,12 +2453,12 @@ static void nvme_reset_work(struct work_struct *work)
 * Keep the controller around but remove all namespaces if we don't have
 * any working I/O queue.
 */
-   if (dev->online_queues < 2) {
+   if (dev->online_queues == 1) {
dev_warn(dev->ctrl.device, "IO queues not created\n");
nvme_kill_queues(&dev->ctrl);
nvme_remove_namespaces(&dev->ctrl);
new_state = NVME_CTRL_ADMIN_ONLY;
-   } else {
+   } else if (dev->online_queues > 1) {
/* hit this only when allocate tagset fails */
if (nvme_dev_add(dev))
new_state = NVME_CTRL_ADMIN_ONLY;
-- 
2.7.4



[PATCH] PCI: Add quirk for Cavium Thunder-X2 PCIe erratum #173

2018-02-01 Thread George Cherian
The PCIe controller on Cavium ThunderX2 processors does not
respond to downstream CFG/ECFG cycles when the root port is
in the power-management D3hot state.

In our tests, the above-mentioned erratum causes the following crash when
the downstream endpoint's config space is accessed while the root port is
in D3:

[   12.775202] Unhandled fault: synchronous external abort (0x96000610) at 
0x
[   12.783453] Internal error: : 96000610 [#1] SMP
[   12.787971] Modules linked in: aes_neon_blk ablk_helper cryptd
[   12.793799] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.8.0-32-generic #34
[   12.800659] Hardware name: Cavium Inc. Unknown/Unknown, BIOS 1.0 01/01/2018
[   12.807607] task: 808f346b8d80 task.stack: 808f346b4000
[   12.813518] PC is at pci_generic_config_read+0x5c/0xf0
[   12.818643] LR is at pci_generic_config_read+0x48/0xf0
[   12.823767] pc : [] lr : [] pstate: 
204000c9
[   12.831148] sp : 808f346b7bf0
[   12.834449] x29: 808f346b7bf0 x28: 08e2b848
[   12.839750] x27: 08dc3070 x26: 08d516c0
[   12.845050] x25: 0040 x24: 0937a480
[   12.850351] x23: 006c x22: 
[   12.855651] x21: 808f346b7c84 x20: 0004
[   12.860951] x19: 808f31076000 x18: 
[   12.866251] x17: 1b3613e6 x16: 7f330457
[   12.871551] x15: 67268ad7 x14: 5c6254ac
[   12.876851] x13: f1e100cb x12: 0030
[   12.882151] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[   12.887452] x9 : ff656d6e626d686f x8 : 7f7f7f7f7f7f7f7f
[   12.892752] x7 : 808f310da108 x6 : 
[   12.898052] x5 : 0003 x4 : 808f3107a800
[   12.903352] x3 : 0030006c x2 : 0014
[   12.908652] x1 : 2000 x0 : 2030006c
[   12.913952]
[   12.915431] Process swapper/0 (pid: 1, stack limit = 0x808f346b4020)
[   12.922118] Stack: (0x808f346b7bf0 to 0x808f346b8000)
[   12.927850] 7be0:   808f346b7c30 
08506e2c
[   12.935665] 7c00: 09185000 006c 808f31076000 
808f346b7d14
[   12.943481] 7c20:  08309488 808f346b7c90 
085089f4
[   12.951296] 7c40: 0004 808f310d4000  
808f346b7d14
[   12.959111] 7c60: 0068 08dc3078 08d604c8 
085089d8
[   12.966927] 7c80: 0004 0004080b 808f346b7cd0 
08513d28
[   12.974742] 7ca0: 09185000 ffe7 0001 
808f310d4000
[   12.982557] 7cc0: 092ae000 808f310d4000 808f346b7d20 
085142d4
[   12.990372] 7ce0: 808f310d4000 808f310d4000 09214000 
808f310d40b0
[   12.998188] 7d00: 092ae000 808f310d40b0 092ae000 
0004080b
[   13.006003] 7d20: 808f346b7d40 08518754  
808f310d4000
[   13.013818] 7d40: 808f346b7d80 08d9a974  
808f310d4000
[   13.021634] 7d60: 08d9a93c  092ae000 
0004080b
[   13.029449] 7d80: 808f346b7da0 08083b4c 09185000 
808f346b4000
[   13.037264] 7da0: 808f346b7e30 08d60dfc 00f5 
09185000
[   13.045079] 7dc0: 092ae000 0007 092ae000 
08dc3078
[   13.052895] 7de0: 08d604c8 08d51600 08dc3070 
08e2b720
[   13.060710] 7e00: 091a68d8 08c09678  
00070007
[   13.068526] 7e20:  0004080b 808f346b7ea0 
08980d90
[   13.076342] 7e40: 08980d78   

[   13.084157] 7e60:    

[   13.091972] 7e80:    
0004080b
[   13.099788] 7ea0:  08083690 08980d78 

[   13.107603] 7ec0:    

[   13.115418] 7ee0:    

[   13.123233] 7f00:    

[   13.131048] 7f20:    

[   13.138864] 7f40:    

[   13.146679] 7f60:    

[   13.154494] 7f80:    

[   13.162309] 7fa0:    

[   13.170125] 7fc0:  0005  

[   13.177940] 7fe0:    

[   13.185755] Call trace:
[   13.188190] 
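
The quirk hunk itself is truncated from this digest. As an illustration only
(the actual patch may take a different approach entirely, e.g. rejecting the
config access instead), a quirk for this class of erratum often has a shape
like the following, keeping the root port out of D3hot so downstream config
space stays reachable:

static void quirk_thunderx2_rp_no_d3(struct pci_dev *pdev)
{
	/* Hypothetical: refuse to put the root port into D3hot. */
	pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, PCI_ANY_ID /* root-port ID */,
			quirk_thunderx2_rp_no_d3);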

Re: [PATCH] ARM: dts: imx53: use PMIC's TSI pins in adc mode

2018-02-01 Thread Shawn Guo
On Mon, Jan 15, 2018 at 03:28:20PM +0100, Sebastian Reichel wrote:
> PPD uses the PMIC's TSI pins in general purpose ADC mode.
> 
> Signed-off-by: Sebastian Reichel 

s/imx53/imx53-ppd

> ---
>  arch/arm/boot/dts/imx53-ppd.dts | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx53-ppd.dts b/arch/arm/boot/dts/imx53-ppd.dts
> index 123297da43a7..80224fa995f9 100644
> --- a/arch/arm/boot/dts/imx53-ppd.dts
> +++ b/arch/arm/boot/dts/imx53-ppd.dts
> @@ -132,6 +132,14 @@
>   enable-active-high;
>   };
>  
> + reg_tsiref: tsiref {

A better node name should be regulator-tsiref.

> + compatible = "regulator-fixed";
> + regulator-name = "tsiref";
> + regulator-min-microvolt = <250>;
> + regulator-max-microvolt = <250>;
> + regulator-always-on;
> + };
> +
>   pwm_bl: backlight {
>   compatible = "pwm-backlight";
>   pwms = < 0 5>;
> @@ -295,6 +303,9 @@
>   interrupts = <12 0x8>;
>   spi-max-frequency = <100>;
>  

This new line can be dropped now.

I fixed up all these, and applied the patch.

Shawn 

> + dlg,tsi-as-adc;
> + tsiref-supply = <_tsiref>;
> +
>   regulators {
>   buck1_reg: buck1 {
>   regulator-name = "BUCKCORE";
> -- 
> 2.15.1
> 


Re: [PATCH] xen: fix frontend driver disconnected from xenbus on removal

2018-02-01 Thread Oleksandr Andrushchenko

On 02/01/2018 11:09 PM, Boris Ostrovsky wrote:

On 02/01/2018 03:24 PM, Oleksandr Andrushchenko wrote:


On 02/01/2018 10:08 PM, Boris Ostrovsky wrote:

On 02/01/2018 03:57 AM, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

Current xenbus frontend driver removal flow first disconnects
the driver from xenbus and then calls driver's remove callback.
This makes it impossible for the driver to listen to backend's
state changes and synchronize the removal procedure.

Fix this by removing other end XenBus watches after the
driver's remove callback is called.

Signed-off-by: Oleksandr Andrushchenko

---
   drivers/xen/xenbus/xenbus_probe.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c
b/drivers/xen/xenbus/xenbus_probe.c
index 74888cacd0b0..9c63cd3f416b 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -258,11 +258,11 @@ int xenbus_dev_remove(struct device *_dev)
 DPRINTK("%s", dev->nodename);
   -free_otherend_watch(dev);
-
   if (drv->remove)
   drv->remove(dev);

Is it possible for the watch to fire here?

Indeed. Yes, it is possible, so we have to somehow protect the removed
driver from being called: e.g. the driver cleans up in its .remove,
but the watch may still trigger the .otherend_changed callback.
Is this what you mean?

(-David who is not at Citrix anymore)

Exactly.

That's why otherend cleanup is split into free_otherend_watch() and
free_otherend_details().

Understood, thank you.
The confusion came from patch [1]: in .remove we wait for the backend to
change its state in the .otherend_changed callback and wake us, but I am
not sure how those state changes can be observed if the driver's watches
have already been freed during .remove. That is why I tried to play around
with free_otherend_watch()...



If so, do you have something neat on your mind how to solve this?

Not necessarily "neat" but perhaps you can use
xenbus_read_otherend_details() in both front and back ends. After all,
IIUIC you are doing something synchronously so you don't really need a
watch.

Yes, I will implement a dedicated flow in the .remove
instead of relying on .otherend_changed

-boris


-boris


   +free_otherend_watch(dev);
+
   free_otherend_details(dev);
 xenbus_switch_state(dev, XenbusStateClosed);

Thank you,
Oleksandr

Thank you,
Oleksandr

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/xen-netfront.c?id=5b5971df3bc2775107ddad164018a8a8db633b81


[PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-01 Thread Jianchao Wang
Currently, the request queues are frozen and quiesced for both the reset
and the shutdown case. This triggers ioq requests in the RECONNECTING
state, which should be avoided to prepare for the following patch.
Freeze the request queues only for the shutdown case, and drain all the
residual entered requests after the controller has been shut down.
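
For readers less used to the blk-mq freeze/quiesce split, the shutdown flow
described above boils down to roughly this ordering (a sketch of the intent,
mirroring the hunks below, not an extra hunk):

	/* shutdown == true */
	nvme_start_freeze(&dev->ctrl);		/* block new submitters */
	nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
						/* bounded wait for entered I/O */
	nvme_stop_queues(&dev->ctrl);		/* quiesce dispatch */

	/* ... shut the controller down, cancel what is still outstanding ... */

	nvme_start_queues(&dev->ctrl);		/* unquiesce so residual requests dispatch */
	nvme_wait_freeze(&dev->ctrl);		/* and run to (failed) completion */
	nvme_stop_queues(&dev->ctrl);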

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 00cffed..a7fa397 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2172,21 +2172,23 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown)
if (pci_is_enabled(pdev)) {
u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
-   if (dev->ctrl.state == NVME_CTRL_LIVE ||
-   dev->ctrl.state == NVME_CTRL_RESETTING)
-   nvme_start_freeze(&dev->ctrl);
dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
pdev->error_state  != pci_channel_io_normal);
}
 
-   /*
-* Give the controller a chance to complete all entered requests if
-* doing a safe shutdown.
-*/
-   if (!dead) {
-   if (shutdown)
+   /* Just freeze the queue for shutdown case */
+   if (shutdown) {
+   if (dev->ctrl.state == NVME_CTRL_LIVE ||
+   dev->ctrl.state == NVME_CTRL_RESETTING)
+   nvme_start_freeze(&dev->ctrl);
+   /*
+* Give the controller a chance to complete all
+* entered requests if doing a safe shutdown.
+*/
+   if (!dead)
nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
}
+
nvme_stop_queues(&dev->ctrl);
 
if (!dead) {
@@ -2210,12 +2212,15 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown)
blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
 
/*
-* The driver will not be starting up queues again if shutting down so
-* must flush all entered requests to their failed completion to avoid
-* deadlocking blk-mq hot-cpu notifier.
+* For shutdown case, controller will not be setup again soon. If any
+* residual requests here, the controller must have go wrong. Drain and
+* fail all the residual entered IO requests.
 */
-   if (shutdown)
+   if (shutdown) {
nvme_start_queues(&dev->ctrl);
+   nvme_wait_freeze(&dev->ctrl);
+   nvme_stop_queues(&dev->ctrl);
+   }
mutex_unlock(&dev->shutdown_lock);
 }
 
@@ -2349,12 +2354,11 @@ static void nvme_reset_work(struct work_struct *work)
nvme_remove_namespaces(&dev->ctrl);
new_state = NVME_CTRL_ADMIN_ONLY;
} else {
-   nvme_start_queues(&dev->ctrl);
-   nvme_wait_freeze(&dev->ctrl);
/* hit this only when allocate tagset fails */
if (nvme_dev_add(dev))
new_state = NVME_CTRL_ADMIN_ONLY;
-   nvme_unfreeze(&dev->ctrl);
+   if (was_suspend)
+   nvme_unfreeze(&dev->ctrl);
}
 
/*
-- 
2.7.4



Is the hisilicon tree maintained ?

2018-02-01 Thread Daniel Lezcano

Hi Wei Xu,

I found in the MAINTAINERS file the hisilicon tree is at:

https://github.com/hisilicon/linux-hisi

But (unless I missed it) I didn't find any update since Nov 2017.

Is that tree maintained?

Thanks,

  -- Daniel


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



[PATCH 3/6] blk-mq: make blk_mq_rq_update_aborted_gstate a external interface

2018-02-01 Thread Jianchao Wang
No functional change; just make blk_mq_rq_update_aborted_gstate an
external interface.

Signed-off-by: Jianchao Wang 
---
 block/blk-mq.c | 3 ++-
 include/linux/blk-mq.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 01f271d..a027ca2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -581,7 +581,7 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int 
*srcu_idx)
*srcu_idx = srcu_read_lock(hctx->srcu);
 }
 
-static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
+void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
 {
unsigned long flags;
 
@@ -597,6 +597,7 @@ static void blk_mq_rq_update_aborted_gstate(struct request 
*rq, u64 gstate)
u64_stats_update_end(&rq->aborted_gstate_sync);
local_irq_restore(flags);
 }
+EXPORT_SYMBOL(blk_mq_rq_update_aborted_gstate);
 
 static u64 blk_mq_rq_aborted_gstate(struct request *rq)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8efcf49..ad54024 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -257,6 +257,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool 
at_head,
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long 
msecs);
 void blk_mq_complete_request(struct request *rq);
+void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate);
 
 bool blk_mq_queue_stopped(struct request_queue *q);
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
-- 
2.7.4



RE: [PATCH] mm/swap: add function get_total_swap_pages to expose total_swap_pages

2018-02-01 Thread He, Roger
Using a limit of 1/2 * total RAM seems to work very well:
no OOM, although the swap disk fills up at peak during the piglit test.

But David noticed that this approach has an obvious defect.
For example, if the platform has 32G of system memory and an 8G swap disk,
1/2 * RAM = 16G, which is bigger than the swap disk, so no swap is allowed
for TTM at all.
For now we have worked out an improved version based on get_nr_swap_pages(),
which will be sent out later.
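
The improved version is not shown in this thread; as a sketch of the kind of
limit being discussed (the function name and the exact formula are
illustrative only, not the code being prepared):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/swap.h>

/* Cap TTM's swapped pages by both system RAM and the swap space actually left. */
static unsigned long ttm_swap_limit_pages(void)
{
	unsigned long by_ram  = totalram_pages / 2;		/* the "1/2 * ram" heuristic */
	unsigned long by_swap = get_nr_swap_pages() / 2;	/* never target more than half of free swap */

	return min(by_ram, by_swap);
}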

Thanks
Roger(Hongbo.He)
-Original Message-
From: He, Roger 
Sent: Thursday, February 01, 2018 4:03 PM
To: Koenig, Christian ; Zhou, David(ChunMing) 
; dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; 'He, Roger' 

Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

Just now, I tried with fixed limit.  But not work always.
For example: set the limit as 4GB on my platform with 8GB system memory, it can 
pass.
But when run with platform with 16GB system memory, it failed since OOM.

And I guess it also depends on app's behavior.
I mean some apps  make OS to use more swap space as well.

Thanks
Roger(Hongbo.He)
-Original Message-
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of 
He, Roger
Sent: Thursday, February 01, 2018 1:48 PM
To: Koenig, Christian ; Zhou, David(ChunMing) 
; dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

But what we could do is to rely on a fixed limit like the Intel driver 
does and I suggested before.
E.g. don't copy anything into a shmemfile when there is only x MB of 
swap space left.

Here I think we can do it further, let the limit value scaling with total 
system memory.
For example: total system memory * 1/2. 
If that it will match the platform configuration better. 

Roger can you test that approach once more with your fix for the OOM 
issues in the page fault handler?

Sure. Use the limit as total ram*1/2 seems work very well. 
No OOM although swap disk reaches full at peak for piglit test.
I speculate this case happens but no OOM because:

a. run a while, swap disk be used close to 1/2* total size and but not over 1/2 
* total.
b. all subsequent swapped pages stay in system memory until no space there.
 Then the swapped pages in shmem be flushed into swap disk. And probably OS 
also need some swap space.
 For this case, it is easy to get full for swap disk.
c. but because now free swap size < 1/2 * total, so no swap out happen  after 
that. 
And at least 1/4* system memory will left because below check in 
ttm_mem_global_reserve will ensure that.
if (zone->used_mem > limit)
goto out_unlock;

Thanks
Roger(Hongbo.He)
-Original Message-
From: Koenig, Christian
Sent: Wednesday, January 31, 2018 4:13 PM
To: He, Roger ; Zhou, David(ChunMing) ; 
dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

Yeah, indeed. But what we could do is to rely on a fixed limit like the Intel 
driver does and I suggested before.

E.g. don't copy anything into a shmemfile when there is only x MB of swap space 
left.

Roger can you test that approach once more with your fix for the OOM issues in 
the page fault handler?

Thanks,
Christian.

Am 31.01.2018 um 09:08 schrieb He, Roger:
>   I think this patch isn't need at all. You can directly read 
> total_swap_pages variable in TTM.
>
> Because the variable is not exported by EXPORT_SYMBOL_GPL. So direct using 
> will result in:
> "WARNING: "total_swap_pages" [drivers/gpu/drm/ttm/ttm.ko] undefined!".
>
> Thanks
> Roger(Hongbo.He)
> -Original Message-
> From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On 
> Behalf Of Chunming Zhou
> Sent: Wednesday, January 31, 2018 3:15 PM
> To: He, Roger ; dri-de...@lists.freedesktop.org
> Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; Koenig, 
> Christian 
> Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to 
> expose total_swap_pages
>
> Hi Roger,
>
> I think this patch isn't need at all. You can directly read total_swap_pages 
> variable in TTM. See the comment:
>
> /* protected with swap_lock. reading in vm_swap_full() doesn't need 
> lock */ long total_swap_pages;
>
> there are many places using it directly, you just couldn't change its value. 
> Reading it doesn't need lock.
>
>
> Regards,
>
> David Zhou
>
>
> On 2018年01月29日 16:29, Roger He wrote:
>> ttm module needs it to determine its internal parameter setting.
>>
>> Signed-off-by: Roger He 
>> ---
>>include/linux/swap.h |  6 ++
>>mm/swapfile.c| 15 +++
>>2 files changed, 21 insertions(+)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h index 
>> c2b8128..708d66f 100644
>> 

[PATCH 1/6] nvme-pci: move clearing host mem behind stopping queues

2018-02-01 Thread Jianchao Wang
Move clearing the host memory buffer to after stopping the queues. This
prepares for a following patch which will grab all the outstanding requests.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6fe7af0..00cffed 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2186,7 +2186,10 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown)
if (!dead) {
if (shutdown)
nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+   }
+   nvme_stop_queues(&dev->ctrl);
 
+   if (!dead) {
/*
 * If the controller is still alive tell it to stop using the
 * host memory buffer.  In theory the shutdown / reset should
@@ -2195,11 +2198,6 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool 
shutdown)
 */
if (dev->host_mem_descs)
nvme_set_host_mem(dev, 0);
-
-   }
-   nvme_stop_queues(&dev->ctrl);
-
-   if (!dead) {
nvme_disable_io_queues(dev);
nvme_disable_admin_queue(dev, shutdown);
}
-- 
2.7.4



[PATCH] x86/retpoline: check CONFIG_RETPOLINE option when SPECTRE_V2_CMD_AUTO

2018-02-01 Thread Chen Baozi
Currently, if neither spectre_v2= nor nospectre_v2 is specified on the boot
command line, the kernel automatically chooses a mitigation by default.
However, when selecting the auto mode, it does not check whether retpoline
support has been built into the kernel. Thus, if someone built a kernel
without CONFIG_RETPOLINE and booted the system without specifying any
spectre_v2 kernel parameters, the kernel would still report that it has
enabled a minimal retpoline mitigation, which is not the case. This patch
adds a check of the CONFIG_RETPOLINE option in the 'auto' mode to fix
this.
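
Since IS_ENABLED() is a compile-time constant, the change is equivalent to
spelling the case out with an #ifdef (shown only for clarity, not as an
alternative hunk):

	case SPECTRE_V2_CMD_AUTO:
#ifdef CONFIG_RETPOLINE
		goto retpoline_auto;
#else
		/* No retpoline built in: do not claim a retpoline-based mitigation. */
		break;
#endif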

Signed-off-by: Chen Baozi 
---
 arch/x86/kernel/cpu/bugs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 390b3dc3d438..70b7d17426eb 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -193,7 +193,9 @@ static void __init spectre_v2_select_mitigation(void)
case SPECTRE_V2_CMD_FORCE:
/* FALLTRHU */
case SPECTRE_V2_CMD_AUTO:
-   goto retpoline_auto;
+   if (IS_ENABLED(CONFIG_RETPOLINE))
+   goto retpoline_auto;
+   break;
 
case SPECTRE_V2_CMD_RETPOLINE_AMD:
if (IS_ENABLED(CONFIG_RETPOLINE))
-- 
2.13.5 (Apple Git-94)



[PATCH 4/6] nvme-pci: break up nvme_timeout and nvme_dev_disable

2018-02-01 Thread Jianchao Wang
Currently, the complicated relationship between nvme_dev_disable
and nvme_timeout has become a source of circular patterns which may
trigger deadlocks or IO hangs. Let's enumerate the tangles between them:
 - nvme_timeout has to invoke nvme_dev_disable to stop the
   controller doing DMA access before freeing the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete
   adminq requests to set HMB or delete sq/cq when the controller
   has no response.
 - nvme_dev_disable will race with nvme_timeout when it cancels the
   outstanding requests.

To break them up, let's first look at what kinds of requests
nvme_timeout has to handle.

RESETTING      previous adminq/IOq request
or shutdown    adminq requests from nvme_dev_disable

RECONNECTING   adminq requests from nvme_reset_work

nvme_timeout has to invoke nvme_dev_disable first before completing
all the expired requests above. We avoid this as follows.

For the previous adminq/IOq requests:
use blk_abort_request to force all the outstanding requests to expire
in nvme_dev_disable. In nvme_timeout, set NVME_REQ_CANCELLED and
return BLK_EH_NOT_HANDLED. Then the request will not be completed and
freed. We needn't invoke nvme_dev_disable any more.

blk_abort_request is safe when racing with the irq completion path,
since we have already grabbed all the outstanding requests. This
eliminates the race between nvme_timeout and nvme_dev_disable.

We use NVME_REQ_CANCELLED to identify them. After the controller is
totally disabled/shutdown, we invoke blk_mq_rq_update_aborted_gstate
to clear requests and invoke blk_mq_complete_request to complete them.

In addition, to identify the previous adminq/IOq request and adminq
requests from nvme_dev_disable, we introduce NVME_PCI_OUTSTANDING_GRABBING
and NVME_PCI_OUTSTANDING_GRABBED to let nvme_timeout be able to
distinguish them.

For the adminq requests from nvme_dev_disable/nvme_reset_work:
invoke nvme_disable_ctrl directly, then set NVME_REQ_CANCELLED and
return BLK_EH_HANDLED. nvme_dev_disable/nvme_reset_work will
see the error.

With this patch, we avoid having nvme_dev_disable invoked by
nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 146 
 1 file changed, 123 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7fa397..5b192b0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -70,6 +70,8 @@ struct nvme_queue;
 
 static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+#define NVME_PCI_OUTSTANDING_GRABBING 1
+#define NVME_PCI_OUTSTANDING_GRABBED 2
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -80,6 +82,7 @@ struct nvme_dev {
struct blk_mq_tag_set admin_tagset;
u32 __iomem *dbs;
struct device *dev;
+   int grab_flag;
struct dma_pool *prp_page_pool;
struct dma_pool *prp_small_pool;
unsigned online_queues;
@@ -1130,6 +1133,23 @@ static void abort_endio(struct request *req, 
blk_status_t error)
blk_mq_free_request(req);
 }
 
+static void nvme_pci_disable_ctrl_directly(struct nvme_dev *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
+   u32 csts;
+   bool dead;
+
+   if (!pci_is_enabled(pdev))
+   return;
+
+   csts = readl(dev->bar + NVME_REG_CSTS);
+   dead = !!((csts & NVME_CSTS_CFS) ||
+   !(csts & NVME_CSTS_RDY) ||
+   pdev->error_state  != pci_channel_io_normal);
+   if (!dead)
+   nvme_disable_ctrl(&dev->ctrl, dev->ctrl.cap);
+}
+
 static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 {
 
@@ -1191,12 +1211,13 @@ static enum blk_eh_timer_return nvme_timeout(struct 
request *req, bool reserved)
 
/*
 * Reset immediately if the controller is failed
+* nvme_dev_disable will take over the expired requests.
 */
if (nvme_should_reset(dev, csts)) {
+   nvme_req(req)->flags |= NVME_REQ_CANCELLED;
nvme_warn_reset(dev, csts);
-   nvme_dev_disable(dev, false);
nvme_reset_ctrl(&dev->ctrl);
-   return BLK_EH_HANDLED;
+   return BLK_EH_NOT_HANDLED;
}
 
/*
@@ -1210,38 +1231,51 @@ static enum blk_eh_timer_return nvme_timeout(struct 
request *req, bool reserved)
}
 
/*
-* Shutdown immediately if controller times out while starting. The
-* reset work will see the pci device disabled when it gets the forced
-* cancellation error. All outstanding requests are completed on
-* shutdown, so we return BLK_EH_HANDLED.
+* The previous outstanding requests on adminq and ioq have been
+* grabbed or drained for RECONNECTING 
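
Condensed, the nvme_timeout() decision described above looks roughly like the
fragment below. This is only a sketch, not the actual hunk; the grab_flag
tests are an assumption based on the NVME_PCI_OUTSTANDING_* states introduced
earlier in the patch:

	/* Sketch: nvme_timeout() never calls nvme_dev_disable() any more. */
	if (dev->grab_flag == NVME_PCI_OUTSTANDING_GRABBING ||
	    dev->grab_flag == NVME_PCI_OUTSTANDING_GRABBED) {
		/*
		 * Previous adminq/IOq request: nvme_dev_disable() has already
		 * expired it via blk_abort_request(), so just mark it and let
		 * the disable path complete it later.
		 */
		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
		return BLK_EH_NOT_HANDLED;
	}

	/*
	 * Adminq request issued by nvme_dev_disable()/nvme_reset_work():
	 * stop the controller directly and report the failure back.
	 */
	nvme_pci_disable_ctrl_directly(dev);
	nvme_req(req)->flags |= NVME_REQ_CANCELLED;
	return BLK_EH_HANDLED;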

[no subject]

2018-02-01 Thread Jianchao Wang
Hi Christoph, Keith and Sagi

Please consider and comment on the following patchset.
That's really appreciated.

There is a complicated relationship between nvme_timeout and nvme_dev_disable.
 - nvme_timeout has to invoke nvme_dev_disable to stop the
   controller doing DMA access before freeing the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete
   adminq requests to set HMB or delete sq/cq when the controller
   has no response.
 - nvme_dev_disable will race with nvme_timeout when it cancels the
   outstanding requests.
We have found some issues introduced by them; please refer to the following links:

http://lists.infradead.org/pipermail/linux-nvme/2018-January/015053.html 
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015276.html
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015328.html
Even so, we cannot ensure there is no other issue.

The best way to fix them is to break up the relationship between them.
With this patchset, we avoid having nvme_dev_disable invoked by
nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.


There are 6 patches:

The 1st ~ 3rd patches do some preparation for the 4th one.
The 4th avoids nvme_dev_disable being invoked by nvme_timeout and implements
the synchronization between them. For more details, please refer to the comment
of that patch.
The 5th fixes a bug introduced by the 4th patch. It lets nvme_delete_io_queues
be woken up only by the completion path.
The 6th fixes a bug found during testing; it is not related to the 4th patch.

This patchset was tested with a debug patch for some days,
and some bug fixes have been made.
The debug patch and the other patches are available in the following git branch:
https://github.com/jianchwa/linux-blcok.git nvme_fixes_test

Jianchao Wang (6)
0001-nvme-pci-move-clearing-host-mem-behind-stopping-queu.patch
0002-nvme-pci-fix-the-freeze-and-quiesce-for-shutdown-and.patch
0003-blk-mq-make-blk_mq_rq_update_aborted_gstate-a-extern.patch
0004-nvme-pci-break-up-nvme_timeout-and-nvme_dev_disable.patch
0005-nvme-pci-discard-wait-timeout-when-delete-cq-sq.patch
0006-nvme-pci-suspend-queues-based-on-online_queues.patch

diff stat following:
 block/blk-mq.c  |   3 +-
 drivers/nvme/host/pci.c | 225 
++-
 include/linux/blk-mq.h  |   1 +
 3 files changed, 169 insertions(+), 60 deletions(-)

Thanks
Jianchao



[PATCH 5/6] nvme-pci: discard wait timeout when delete cq/sq

2018-02-01 Thread Jianchao Wang
Currently, nvme_disable_io_queues can be woken up by both the request
completion path and the wait timeout path. This is unnecessary and can
introduce a race between nvme_dev_disable and the request timeout path.
When a delete cq/sq command expires, nvme_disable_io_queues will also
be woken up and return to nvme_dev_disable, which then handles the
outstanding requests. This races with the request timeout path.

To fix it, just use wait_for_completion instead of the timeout variant.
The request timeout path will wake it up.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5b192b0..a838713c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2048,7 +2048,6 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 
opcode)
 static void nvme_disable_io_queues(struct nvme_dev *dev)
 {
int pass, queues = dev->online_queues - 1;
-   unsigned long timeout;
u8 opcode = nvme_admin_delete_sq;
 
for (pass = 0; pass < 2; pass++) {
@@ -2056,15 +2055,12 @@ static void nvme_disable_io_queues(struct nvme_dev *dev)
 
reinit_completion(&dev->ioq_wait);
  retry:
-   timeout = ADMIN_TIMEOUT;
for (; i > 0; i--, sent++)
if (nvme_delete_queue(&dev->queues[i], opcode))
break;
 
while (sent--) {
-   timeout = wait_for_completion_io_timeout(&dev->ioq_wait, timeout);
-   if (timeout == 0)
-   return;
+   wait_for_completion(&dev->ioq_wait);
if (i)
goto retry;
}
-- 
2.7.4



Re: [PATCH] ARM: dts: imx53: Add touchscreen reset line

2018-02-01 Thread Shawn Guo
On Mon, Jan 15, 2018 at 03:24:52PM +0100, Sebastian Reichel wrote:
> From: Martyn Welch 
> 
> Utilise new support in Atmel MaxTouch driver to drive the
> touchscreen controllers reset line correctly.
> 
> Signed-off-by: Martyn Welch 
> Signed-off-by: Sebastian Reichel 

s/imx53/imx53-ppd in subject.

I fixed it up and applied the patch.

Shawn


[patch v1 4/4] platform/x86: mlx-platform: Add support for new Mellanox systems

2018-02-01 Thread Vadim Pasternak
Add support for the following new Mellanox system types: msn274x, msn201x,
qmb7, sn34, sn37. The current members of these types are:
- MSN2740 (32x100GbE Ethernet switch with cost reduction);
- MSN2010 (18x10GbE plus 4x4x25GbE);
- QMB700 (40x200GbE InfiniBand switch);
- SN3700 (32x200GbE and 16x400GbE Ethernet switch);
- SN3410 (6x400GbE plus 48x50GbE Ethernet switch).

Signed-off-by: Vadim Pasternak 
---
 drivers/platform/x86/mlx-platform.c | 298 
 1 file changed, 298 insertions(+)

diff --git a/drivers/platform/x86/mlx-platform.c 
b/drivers/platform/x86/mlx-platform.c
index 4d8078d..94b0bfc 100644
--- a/drivers/platform/x86/mlx-platform.c
+++ b/drivers/platform/x86/mlx-platform.c
@@ -83,6 +83,7 @@
 #define MLXPLAT_CPLD_PSU_MASK  GENMASK(1, 0)
 #define MLXPLAT_CPLD_PWR_MASK  GENMASK(1, 0)
 #define MLXPLAT_CPLD_FAN_MASK  GENMASK(3, 0)
+#define MLXPLAT_CPLD_FAN_NG_MASK   GENMASK(5, 0)
 
 /* Start channel numbers */
 #define MLXPLAT_CPLD_CH1   2
@@ -94,6 +95,7 @@
 /* Hotplug devices adapter numbers */
 #define MLXPLAT_CPLD_NR_NONE   -1
#define MLXPLAT_CPLD_PSU_DEFAULT_NR   10
+#define MLXPLAT_CPLD_PSU_MSN_NR   4
 #define MLXPLAT_CPLD_FAN1_DEFAULT_NR   11
 #define MLXPLAT_CPLD_FAN2_DEFAULT_NR   12
 #define MLXPLAT_CPLD_FAN3_DEFAULT_NR   13
@@ -335,6 +337,225 @@ struct mlxreg_core_hotplug_platform_data 
mlxplat_mlxcpld_msn21xx_data = {
.mask_low = MLXPLAT_CPLD_LOW_AGGR_MASK_LOW,
 };
 
+/* Platform hotplug MSN201x system family data */
+static struct mlxreg_core_data mlxplat_mlxcpld_msn201x_pwr_items_data[] = {
+   {
+   .label = "pwr1",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(0),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+   {
+   .label = "pwr2",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(1),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+};
+
+static struct mlxreg_core_item mlxplat_mlxcpld_msn201x_items[] = {
+   {
+   .data = mlxplat_mlxcpld_msn201x_pwr_items_data,
+   .aggr_mask = MLXPLAT_CPLD_AGGR_PWR_MASK_DEF,
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = MLXPLAT_CPLD_PWR_MASK,
+   .count = ARRAY_SIZE(mlxplat_mlxcpld_msn201x_pwr_items_data),
+   .inversed = 0,
+   .health = false,
+   },
+};
+
+static
+struct mlxreg_core_hotplug_platform_data mlxplat_mlxcpld_msn201x_data = {
+   .items = mlxplat_mlxcpld_msn21xx_items,
+   .counter = ARRAY_SIZE(mlxplat_mlxcpld_msn201x_items),
+   .cell = MLXPLAT_CPLD_LPC_REG_AGGR_OFFSET,
+   .mask = MLXPLAT_CPLD_AGGR_MASK_DEF,
+   .cell_low = MLXPLAT_CPLD_LPC_REG_AGGRLO_OFFSET,
+   .mask_low = MLXPLAT_CPLD_LOW_AGGR_MASK_LOW,
+};
+
+/* Platform hotplug next generation system family data */
+static struct mlxreg_core_data mlxplat_mlxcpld_default_ng_psu_items_data[] = {
+   {
+   .label = "psu1",
+   .reg = MLXPLAT_CPLD_LPC_REG_PSU_OFFSET,
+   .mask = BIT(0),
+   .hpdev.brdinfo = &mlxplat_mlxcpld_psu[0],
+   .hpdev.nr = MLXPLAT_CPLD_PSU_MSN_NR,
+   },
+   {
+   .label = "psu2",
+   .reg = MLXPLAT_CPLD_LPC_REG_PSU_OFFSET,
+   .mask = BIT(1),
+   .hpdev.brdinfo = &mlxplat_mlxcpld_psu[1],
+   .hpdev.nr = MLXPLAT_CPLD_PSU_MSN_NR,
+   },
+};
+
+static struct mlxreg_core_data mlxplat_mlxcpld_default_ng_pwr_items_data[] = {
+   {
+   .label = "pwr1",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(0),
+   .hpdev.brdinfo = &mlxplat_mlxcpld_pwr[0],
+   .hpdev.nr = MLXPLAT_CPLD_PSU_MSN_NR,
+   },
+   {
+   .label = "pwr2",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(1),
+   .hpdev.brdinfo = &mlxplat_mlxcpld_pwr[1],
+   .hpdev.nr = MLXPLAT_CPLD_PSU_MSN_NR,
+   },
+};
+
+static struct mlxreg_core_data mlxplat_mlxcpld_default_ng_fan_items_data[] = {
+   {
+   .label = "fan1",
+   .reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
+   .mask = BIT(0),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+   {
+   .label = "fan2",
+   .reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
+   .mask = BIT(1),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+   {
+   .label = "fan3",
+   .reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
+   .mask = BIT(2),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+   {
+   .label = "fan4",
+   .reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
+   .mask = BIT(3),
+   .hpdev.nr 

[patch v1 3/4] platform/x86: mlx-platform: Fix power cable setting for systems from msn21xx family

2018-02-01 Thread Vadim Pasternak
Add a dedicated structure with the power cable setting for Mellanox systems
from the msn21xx family. These systems do not have a physical device for the
power unit controller. So, in case a power cable is inserted or removed, the
relevant interrupt signal is handled and the status is updated, but no device
is associated with this signal.

Add a definition for the interrupt low aggregation signal. On systems from
the msn21xx family, the low aggregation mask should be removed in order to
allow the signal to hit the CPU.

Fixes: 6613d18e9038 ("platform/x86: mlx-platform: Move module from arch/x86")
Signed-off-by: Vadim Pasternak 
---
 drivers/platform/x86/mlx-platform.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/mlx-platform.c 
b/drivers/platform/x86/mlx-platform.c
index 177b40a..4d8078d 100644
--- a/drivers/platform/x86/mlx-platform.c
+++ b/drivers/platform/x86/mlx-platform.c
@@ -77,6 +77,8 @@
 #define MLXPLAT_CPLD_AGGR_FAN_MASK_DEF 0x40
 #define MLXPLAT_CPLD_AGGR_MASK_DEF (MLXPLAT_CPLD_AGGR_PSU_MASK_DEF | \
 MLXPLAT_CPLD_AGGR_FAN_MASK_DEF)
+#define MLXPLAT_CPLD_AGGR_MASK_NG_DEF  0x04
+#define MLXPLAT_CPLD_LOW_AGGR_MASK_LOW 0xc0
 #define MLXPLAT_CPLD_AGGR_MASK_MSN21XX 0x04
 #define MLXPLAT_CPLD_PSU_MASK  GENMASK(1, 0)
 #define MLXPLAT_CPLD_PWR_MASK  GENMASK(1, 0)
@@ -295,14 +297,29 @@ struct mlxreg_core_hotplug_platform_data 
mlxplat_mlxcpld_default_data = {
.mask = MLXPLAT_CPLD_AGGR_MASK_DEF,
 };
 
+static struct mlxreg_core_data mlxplat_mlxcpld_msn21xx_pwr_items_data[] = {
+   {
+   .label = "pwr1",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(0),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+   {
+   .label = "pwr2",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(1),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+};
+
 /* Platform hotplug MSN21xx system family data */
 static struct mlxreg_core_item mlxplat_mlxcpld_msn21xx_items[] = {
{
-   .data = mlxplat_mlxcpld_default_pwr_items_data,
+   .data = mlxplat_mlxcpld_msn21xx_pwr_items_data,
.aggr_mask = MLXPLAT_CPLD_AGGR_PWR_MASK_DEF,
.reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
.mask = MLXPLAT_CPLD_PWR_MASK,
-   .count = ARRAY_SIZE(mlxplat_mlxcpld_pwr),
+   .count = ARRAY_SIZE(mlxplat_mlxcpld_msn21xx_pwr_items_data),
.inversed = 0,
.health = false,
},
@@ -314,6 +331,8 @@ struct mlxreg_core_hotplug_platform_data 
mlxplat_mlxcpld_msn21xx_data = {
.counter = ARRAY_SIZE(mlxplat_mlxcpld_msn21xx_items),
.cell = MLXPLAT_CPLD_LPC_REG_AGGR_OFFSET,
.mask = MLXPLAT_CPLD_AGGR_MASK_DEF,
+   .cell_low = MLXPLAT_CPLD_LPC_REG_AGGRLO_OFFSET,
+   .mask_low = MLXPLAT_CPLD_LOW_AGGR_MASK_LOW,
 };
 
 static bool mlxplat_mlxcpld_writeable_reg(struct device *dev, unsigned int reg)
-- 
2.1.4



[patch v1 2/4] platform/x86: mlx-platform: Add define for the negative bus

2018-02-01 Thread Vadim Pasternak
Add a define for the negative bus Id, to be used in case no hotplug
device is associated with a hotplug interrupt signal. In this case the signal
will be handled by the mlxreg-hotplug driver, but no device will be
associated with it.

Signed-off-by: Vadim Pasternak 
---
 drivers/platform/x86/mlx-platform.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/platform/x86/mlx-platform.c 
b/drivers/platform/x86/mlx-platform.c
index a1ae93d..177b40a 100644
--- a/drivers/platform/x86/mlx-platform.c
+++ b/drivers/platform/x86/mlx-platform.c
@@ -90,6 +90,7 @@
 #define MLXPLAT_CPLD_LPC_MUX_DEVS  2
 
 /* Hotplug devices adapter numbers */
+#define MLXPLAT_CPLD_NR_NONE   -1
 #define MLXPLAT_CPLD_PSU_DEFAULT_NR10
 #define MLXPLAT_CPLD_FAN1_DEFAULT_NR   11
 #define MLXPLAT_CPLD_FAN2_DEFAULT_NR   12
-- 
2.1.4



[patch v1 0/4] mlx-platform: Add support for new Mellanox systems, code improvement, fixes for msn21xx system

2018-02-01 Thread Vadim Pasternak
The patchset:
- adds defines for bus numbers, used for system topology description;
- fixes definition for power cables for system family msn21xx;
- introduces support for new Mellanox systems;

Vadim Pasternak (4):
  platform/x86: mlx-platform: Use defines for bus assignment
  platform/x86: mlx-platform: Add define for the negative bus
  platform/x86: mlx-platform: Fix power cable setting for systems from
msn21xx family
  platform/x86: mlx-platform: Add support for new Mellanox systems

 drivers/platform/x86/mlx-platform.c | 345 ++--
 1 file changed, 335 insertions(+), 10 deletions(-)

-- 
2.1.4



[patch v1 1/4] platform/x86: mlx-platform: Use defines for bus assignment

2018-02-01 Thread Vadim Pasternak
Add defines for the bus Ids used for the hotplug device topology, in order
to improve code readability. Defines are added for FAN and power units.

Signed-off-by: Vadim Pasternak 
---
 drivers/platform/x86/mlx-platform.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/platform/x86/mlx-platform.c 
b/drivers/platform/x86/mlx-platform.c
index dfecba4..a1ae93d 100644
--- a/drivers/platform/x86/mlx-platform.c
+++ b/drivers/platform/x86/mlx-platform.c
@@ -89,6 +89,13 @@
 /* Number of LPC attached MUX platform devices */
 #define MLXPLAT_CPLD_LPC_MUX_DEVS  2
 
+/* Hotplug devices adapter numbers */
+#define MLXPLAT_CPLD_PSU_DEFAULT_NR   10
+#define MLXPLAT_CPLD_FAN1_DEFAULT_NR   11
+#define MLXPLAT_CPLD_FAN2_DEFAULT_NR   12
+#define MLXPLAT_CPLD_FAN3_DEFAULT_NR   13
+#define MLXPLAT_CPLD_FAN4_DEFAULT_NR   14
+
 /* mlxplat_priv - platform private data
  * @pdev_i2c - i2c controller platform device
  * @pdev_mux - array of mux platform devices
@@ -190,14 +197,14 @@ static struct mlxreg_core_data 
mlxplat_mlxcpld_default_psu_items_data[] = {
.reg = MLXPLAT_CPLD_LPC_REG_PSU_OFFSET,
.mask = BIT(0),
.hpdev.brdinfo = &mlxplat_mlxcpld_psu[0],
-   .hpdev.nr = 10,
+   .hpdev.nr = MLXPLAT_CPLD_PSU_DEFAULT_NR,
},
{
.label = "psu2",
.reg = MLXPLAT_CPLD_LPC_REG_PSU_OFFSET,
.mask = BIT(1),
.hpdev.brdinfo = &mlxplat_mlxcpld_psu[1],
-   .hpdev.nr = 10,
+   .hpdev.nr = MLXPLAT_CPLD_PSU_DEFAULT_NR,
},
 };
 
@@ -207,14 +214,14 @@ static struct mlxreg_core_data 
mlxplat_mlxcpld_default_pwr_items_data[] = {
.reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
.mask = BIT(0),
.hpdev.brdinfo = &mlxplat_mlxcpld_pwr[0],
-   .hpdev.nr = 10,
+   .hpdev.nr = MLXPLAT_CPLD_PSU_DEFAULT_NR,
},
{
.label = "pwr2",
.reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
.mask = BIT(1),
.hpdev.brdinfo = &mlxplat_mlxcpld_pwr[1],
-   .hpdev.nr = 10,
+   .hpdev.nr = MLXPLAT_CPLD_PSU_DEFAULT_NR,
},
 };
 
@@ -224,28 +231,28 @@ static struct mlxreg_core_data 
mlxplat_mlxcpld_default_fan_items_data[] = {
.reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
.mask = BIT(0),
.hpdev.brdinfo = &mlxplat_mlxcpld_fan[0],
-   .hpdev.nr = 11,
+   .hpdev.nr = MLXPLAT_CPLD_FAN1_DEFAULT_NR,
},
{
.label = "fan2",
.reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
.mask = BIT(1),
.hpdev.brdinfo = &mlxplat_mlxcpld_fan[1],
-   .hpdev.nr = 12,
+   .hpdev.nr = MLXPLAT_CPLD_FAN2_DEFAULT_NR,
},
{
.label = "fan3",
.reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
.mask = BIT(2),
.hpdev.brdinfo = &mlxplat_mlxcpld_fan[2],
-   .hpdev.nr = 13,
+   .hpdev.nr = MLXPLAT_CPLD_FAN3_DEFAULT_NR,
},
{
.label = "fan4",
.reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
.mask = BIT(3),
.hpdev.brdinfo = &mlxplat_mlxcpld_fan[3],
-   .hpdev.nr = 14,
+   .hpdev.nr = MLXPLAT_CPLD_FAN4_DEFAULT_NR,
},
 };
 
-- 
2.1.4



[GIT PULL] arch/microblaze patches for 4.16-rc1

2018-02-01 Thread Michal Simek
Hi,

please pull the following fixes to your tree.

Thanks,
Michal


The following changes since commit a8750ddca918032d6349adbf9a4b6555e7db20da:

  Linux 4.15-rc8 (2018-01-14 15:32:30 -0800)

are available in the git repository at:

  git://git.monstr.eu/linux-2.6-microblaze.git tags/microblaze-4.16-rc1

for you to fetch changes up to 7b6ce52be3f86520524711a6f33f3866f9339694:

  microblaze: Setup proper dependency for optimized lib functions
(2018-01-22 11:24:14 +0100)


Microblaze patches for 4.16-rc1

- Fix endian handling and Kconfig dependency
- Fix iounmap prototype


Arnd Bergmann (2):
  microblaze: fix endian handling
  microblaze: fix iounmap prototype

Michal Simek (1):
  microblaze: Setup proper dependency for optimized lib functions

 arch/microblaze/Kconfig.platform |  1 +
 arch/microblaze/Makefile | 17 +++--
 arch/microblaze/include/asm/io.h |  2 +-
 arch/microblaze/mm/pgtable.c |  2 +-
 4 files changed, 14 insertions(+), 8 deletions(-)



-- 
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Xilinx Microblaze
Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs
U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP SoCs




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] socket: Provide bounce buffer for constant sized put_cmsg()

2018-02-01 Thread kbuild test robot
Hi Kees,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.15 next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Kees-Cook/socket-Provide-bounce-buffer-for-constant-sized-put_cmsg/20180202-113637
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/bluetooth/hci_sock.c:1406:17: sparse: incorrect type in initializer 
>> (invalid types) @@ expected void _val @@ got void _val @@
   net/bluetooth/hci_sock.c:1406:17: expected void _val
   net/bluetooth/hci_sock.c:1406:17: got void 
>> net/bluetooth/hci_sock.c:1406:17: sparse: expression using sizeof(void)
   In file included from include/linux/compat.h:16:0,
from include/linux/ethtool.h:17,
from include/linux/netdevice.h:41,
from include/net/sock.h:51,
from include/net/bluetooth/bluetooth.h:29,
from net/bluetooth/hci_sock.c:32:
   net/bluetooth/hci_sock.c: In function 'hci_sock_cmsg':
   include/linux/socket.h:355:19: error: variable or field '_val' declared void
_val = 14- ^
   net/bluetooth/hci_sock.c:1406:3: note: in expansion of macro 'put_cmsg'
put_cmsg(msg, SOL_HCI, HCI_CMSG_TSTAMP, len, data);
^~~~
   include/linux/socket.h:355:26: warning: dereferencing 'void pointer
_val = 20- ^~~
   net/bluetooth/hci_sock.c:1406:3: note: in expansion of macro 'put_cmsg'
put_cmsg(msg, SOL_HCI, HCI_CMSG_TSTAMP, len, data);
^~~~
   include/linux/socket.h:355:26: error: void value not ignored as it ought to 
be
_val = 26- ^
   net/bluetooth/hci_sock.c:1406:3: note: in expansion of macro 'put_cmsg'
put_cmsg(msg, SOL_HCI, HCI_CMSG_TSTAMP, len, data);
^~~~

vim +1406 net/bluetooth/hci_sock.c

767c5eb5 Marcel Holtmann 2007-09-09  1405  
767c5eb5 Marcel Holtmann 2007-09-09 @1406   put_cmsg(msg, SOL_HCI, 
HCI_CMSG_TSTAMP, len, data);
a61bbcf2 Patrick McHardy 2005-08-14  1407   }
^1da177e Linus Torvalds  2005-04-16  1408  }
^1da177e Linus Torvalds  2005-04-16  1409  

:: The code at line 1406 was first introduced by commit
:: 767c5eb5d35aeb85987143f0a730bc21d3ecfb3d [Bluetooth] Add compat handling 
for timestamp structure

:: TO: Marcel Holtmann <mar...@holtmann.org>
:: CC: Marcel Holtmann <mar...@holtmann.org>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
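
The sparse and gcc errors above come down to a basic C rule: an object of
type void cannot be declared, so a typeof()-based bounce buffer breaks as
soon as the value being copied is the pointee of a void pointer. A minimal
userspace illustration (not the macro from the patch under review, just an
assumed simplification of the same pattern):

#include <string.h>

/* Bounce 'src' through a stack copy before writing it to 'dst'.
 * typeof(*(src)) is the pointee type, which must not be void. */
#define PUT_BOUNCED(dst, src) do {			\
	typeof(*(src)) _val = *(src);			\
	memcpy((dst), &_val, sizeof(_val));		\
} while (0)

int main(void)
{
	int in = 42, out = 0;

	PUT_BOUNCED(&out, &in);	/* fine: typeof(*(&in)) is int */

	/*
	 * void *data = &in;
	 * PUT_BOUNCED(&out, data);
	 * would fail exactly like hci_sock.c above:
	 * "variable or field '_val' declared void".
	 */
	return out == 42 ? 0 : 1;
}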


Re: [PATCH v6 17/41] dt-bindings: clock: Add bindings for DA8XX CFGCHIP clocks

2018-02-01 Thread Sekhar Nori
On Saturday 20 January 2018 10:43 PM, David Lechner wrote:
> +EMIFA clock source (ASYNC1)
> +---
> +Required properties:
> +- compatible: shall be "ti,da850-async1-clksrc".
> +- #clock-cells: from common clock binding; shall be set to 0.
> +- clocks: phandles to the parent clocks corresponding to clock-names
> +- clock-names: shall be "pll0_sysclk3", "div4.5"

Is this clock really referred to as async1 in documentation? I don't get
hits for async1 in the OMAP-L138 TRM.

Thanks,
Sekhar


Re: possible deadlock in get_user_pages_unlocked

2018-02-01 Thread Al Viro
On Fri, Feb 02, 2018 at 05:46:26AM +, Al Viro wrote:
> On Thu, Feb 01, 2018 at 09:35:02PM -0800, Eric Biggers wrote:
> 
> > Try starting up multiple instances of the program; that sometimes helps with
> > these races that are hard to hit (since you may e.g. have a different 
> > number of
> > CPUs than syzbot used).  If I start up 4 instances I see the lockdep splat 
> > after
> > around 2-5 seconds.
> 
> 5 instances in parallel, 10 minutes into the run...
> 
> >  This is on latest Linus tree (4bf772b1467).  Also note the
> > reproducer uses KVM, so if you're running it in a VM it will only work if 
> > you've
> > enabled nested virtualization on the host (kvm_intel.nested=1).
> 
> cat /sys/module/kvm_amd/parameters/nested 
> 1
> 
> on host
> 
> > Also it appears to go away if I revert ce53053ce378c21 ("kvm: switch
> > get_user_page_nowait() to get_user_pages_unlocked()").
> 
> That simply prevents this reproducer hitting get_user_pages_unlocked()
> instead of grab mmap_sem/get_user_pages/drop mmap_sem.  I.e. does not
> allow __get_user_pages_locked() to drop/regain ->mmap_sem.
> 
> The bug may be in the way we call get_user_pages_unlocked() in that
> commit, but it might easily be a bug in __get_user_pages_locked()
> exposed by that reproducer somehow.

I think I understand what's going on.  FOLL_NOWAIT handling is a serious
mess ;-/  I'll probably have something to test tomorrow - I still can't
reproduce it here, unfortunately.


Re: [PATCH v2 1/7] ARM: imx: add timer stop flag to ARM power off state

2018-02-01 Thread Shawn Guo
On Wed, Jan 10, 2018 at 10:04:47PM +0100, Stefan Agner wrote:
> When the CPU is in ARM power off state the ARM architected
> timers are stopped. The flag is already present in the higher
> power WAIT mode.
> 
> This allows to use the ARM generic timer on i.MX 6UL/6ULL SoC.
> Without the flag the kernel freezes when the timer enters the
> first time ARM power off mode.
> 
> Note: The default timer on i.MX6SX is the i.MX GPT timer which is
> not disabled during CPU idle. However, the timer is not affected
> by the CPUIDLE_FLAG_TIMER_STOP flag. The flag only affects CPU
> local timers.
> 
> Cc: Anson Huang 
> Signed-off-by: Stefan Agner 
> Reviewed-by: Lucas Stach 

Applied all, thanks.


[V2][PATCH] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()

2018-02-01 Thread Haiqing Bai
From: Shigeru Yoshida 

Running io_watchdog_func() while ohci_urb_enqueue() is running can
cause a race condition where ohci->prev_frame_no is corrupted and the
watchdog can mis-detect the following error:

  ohci-platform 664a0800.usb: frame counter not updating; disabled
  ohci-platform 664a0800.usb: HC died; cleaning up

Specifically, the following scenario causes a race condition:

  1. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
     and enters the critical section
  2. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
     returns false
  3. ohci_urb_enqueue() sets ohci->prev_frame_no to a frame number
     read by ohci_frame_no(ohci)
  4. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
  5. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
     flags) and exits the critical section
  6. Later, ohci_urb_enqueue() is called again
  7. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
     and enters the critical section
  8. The timer scheduled on step 4 expires and io_watchdog_func() runs
  9. io_watchdog_func() calls spin_lock_irqsave(&ohci->lock, flags)
     and waits on it because ohci_urb_enqueue() is already in the
     critical section on step 7
 10. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
     returns false
 11. ohci_urb_enqueue() sets ohci->prev_frame_no to the new frame
     number read by ohci_frame_no(ohci), because the frame number
     advanced between steps 3 and 6
 12. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
 13. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
     flags) and exits the critical section, which lets the
     io_watchdog_func() waiting since step 9 proceed
 14. io_watchdog_func() enters the critical section
 15. io_watchdog_func() calls ohci_frame_no(ohci) and sets the frame_no
     variable to the frame number
 16. io_watchdog_func() compares frame_no and ohci->prev_frame_no

On step 16, because this call of io_watchdog_func() was scheduled on
step 4, the frame number stored in ohci->prev_frame_no is expected to
be the one set on step 3.  However, ohci->prev_frame_no was
overwritten on step 11.  Because step 16 is executed soon after step
11, the frame number might not have advanced, so ohci->prev_frame_no
can equal frame_no and the watchdog falsely reports that the frame
counter is not updating.

To address the above scenario, this patch introduces a special
sentinel value IO_WATCHDOG_OFF and sets ohci->prev_frame_no to this
value whenever the watchdog is neither pending nor running.  When
ohci_urb_enqueue() schedules the watchdog (steps 4 and 12 above), it
checks ohci->prev_frame_no against IO_WATCHDOG_OFF instead of
timer_pending(), so that ohci->prev_frame_no is not overwritten while
io_watchdog_func() is running.

v2: Instead of adding an extra flag variable, define IO_WATCHDOG_OFF
as a special sentinel value for prev_frame_no.

Signed-off-by: Shigeru Yoshida 
Signed-off-by: Haiqing Bai 
---
 drivers/usb/host/ohci-hcd.c | 10 +++---
 drivers/usb/host/ohci-hub.c |  4 +++-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
index ee96763..84f88fa 100644
--- a/drivers/usb/host/ohci-hcd.c
+++ b/drivers/usb/host/ohci-hcd.c
@@ -74,6 +74,7 @@
 
 #define STATECHANGE_DELAY   msecs_to_jiffies(300)
 #define IO_WATCHDOG_DELAY   msecs_to_jiffies(275)
+#define IO_WATCHDOG_OFF 0xff00
 
 #include "ohci.h"
 #include "pci-quirks.h"
@@ -231,7 +232,7 @@ static int ohci_urb_enqueue (
}
 
/* Start up the I/O watchdog timer, if it's not running */
-   if (!timer_pending(&ohci->io_watchdog) &&
+   if (ohci->prev_frame_no == IO_WATCHDOG_OFF &&
list_empty(&ohci->eds_in_use) &&
!(ohci->flags & OHCI_QUIRK_QEMU)) {
ohci->prev_frame_no = ohci_frame_no(ohci);
@@ -501,6 +502,7 @@ static int ohci_init (struct ohci_hcd *ohci)
return 0;
 
timer_setup(&ohci->io_watchdog, io_watchdog_func, 0);
+   ohci->prev_frame_no = IO_WATCHDOG_OFF;
 
ohci->hcca = dma_alloc_coherent (hcd->self.controller,
sizeof(*ohci->hcca), &ohci->hcca_dma, GFP_KERNEL);
@@ -730,7 +732,7 @@ static void io_watchdog_func(struct timer_list *t)
u32 head;
struct ed   *ed;
struct td   *td, *td_start, *td_next;
-   unsigned    frame_no;
+   unsigned    frame_no, prev_frame_no = IO_WATCHDOG_OFF;
unsigned long   flags;
 
spin_lock_irqsave(&ohci->lock, flags);
@@ -835,7 +837,7 @@ static void io_watchdog_func(struct timer_list *t)
}
}
if (!list_empty(&ohci->eds_in_use)) {
-   ohci->prev_frame_no = frame_no;
+   prev_frame_no = frame_no;
ohci->prev_wdh_cnt = ohci->wdh_cnt;
ohci->prev_donehead = ohci_readl(ohci,
&ohci->regs->donehead);
@@ -845,6 +847,7 @@ 
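
To make the fix easier to follow, here is a condensed, self-contained
sketch of the sentinel pattern described above.  This is a simplified
model, not the real ohci-hcd code: the ex_ctrl structure, its
hw_frame_no and work_pending fields, and the EX_WATCHDOG_OFF value are
illustrative assumptions (any value a 16-bit frame number can never
take will do), and the dead-controller handling is omitted.

#include <linux/jiffies.h>
#include <linux/spinlock.h>
#include <linux/timer.h>
#include <linux/types.h>

#define EX_WATCHDOG_OFF 0xffffff00u  /* outside the 16-bit frame range */

struct ex_ctrl {
        spinlock_t lock;
        struct timer_list io_watchdog;
        unsigned int prev_frame_no;  /* EX_WATCHDOG_OFF while watchdog idle */
        unsigned int hw_frame_no;    /* stand-in for ohci_frame_no() */
        bool work_pending;           /* stand-in for !list_empty(&eds_in_use) */
};

/*
 * Enqueue side: arm the watchdog only when it is truly idle.  Testing
 * the sentinel instead of timer_pending() means a watchdog that has
 * already started running (so its timer is no longer pending) cannot
 * have its reference frame number overwritten, which is exactly the
 * race described above.
 */
static void ex_enqueue(struct ex_ctrl *c)
{
        unsigned long flags;

        spin_lock_irqsave(&c->lock, flags);
        if (c->prev_frame_no == EX_WATCHDOG_OFF) {
                /* The real driver applies additional conditions here. */
                c->prev_frame_no = c->hw_frame_no;
                mod_timer(&c->io_watchdog, jiffies + msecs_to_jiffies(275));
        }
        spin_unlock_irqrestore(&c->lock, flags);
}

/*
 * Watchdog side: work on a local value and publish it exactly once,
 * at the end.  EX_WATCHDOG_OFF is published when no further watching
 * is needed, which re-enables arming from the enqueue path.
 */
static void ex_watchdog(struct timer_list *t)
{
        struct ex_ctrl *c = from_timer(c, t, io_watchdog);
        unsigned int frame_no, prev_frame_no = EX_WATCHDOG_OFF;
        unsigned long flags;

        spin_lock_irqsave(&c->lock, flags);
        frame_no = c->hw_frame_no;
        if (frame_no != c->prev_frame_no && c->work_pending) {
                /* Progress was made and work remains: keep watching. */
                prev_frame_no = frame_no;
                mod_timer(&c->io_watchdog, jiffies + msecs_to_jiffies(275));
        }
        c->prev_frame_no = prev_frame_no;
        spin_unlock_irqrestore(&c->lock, flags);
}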
