Re: [PATCH] crypto: x86/twofish-3way - Fix %rbp usage

2017-12-18 Thread Ingo Molnar

* Eric Biggers  wrote:

> There may be a small overhead caused by replacing 'xchg REG, REG' with
> the needed sequence 'mov MEM, REG; mov REG, MEM; mov REG, REG' once per
> round.  But, counterintuitively, when I tested "ctr-twofish-3way" on a
> Haswell processor, the new version was actually about 2% faster.
> (Perhaps 'xchg' is not as well optimized as plain moves.)

XCHG has implicit LOCK semantics on all x86 CPUs, so that's not a surprising 
result I think.

Thanks,

Ingo


Re: [PATCH] crypto: x86/twofish-3way - Fix %rbp usage

2017-12-18 Thread Ingo Molnar

* Eric Biggers  wrote:

> There may be a small overhead caused by replacing 'xchg REG, REG' with
> the needed sequence 'mov MEM, REG; mov REG, MEM; mov REG, REG' once per
> round.  But, counterintuitively, when I tested "ctr-twofish-3way" on a
> Haswell processor, the new version was actually about 2% faster.
> (Perhaps 'xchg' is not as well optimized as plain moves.)

XCHG has implicit LOCK semantics on all x86 CPUs, so that's not a surprising 
result I think.

Thanks,

Ingo


Re: [PATCH v2 00/22] mmc: tmio: various fixes and cleanups

2017-12-18 Thread Ulf Hansson
On 19 December 2017 at 04:56, Masahiro Yamada
 wrote:
> Hi Ulf,
>
>
> 2017-12-15 18:18 GMT+09:00 Ulf Hansson :
>> On 24 November 2017 at 17:24, Masahiro Yamada
>>  wrote:
>>>
>>> I am working on this IP for Socionext SoCs.
>>>
>>> I was hit by several issues, and noticed various
>>> clean-up candidates.
>>>
>>>  - Fix and clean-up Kconfig
>>>  - Fix various card detection problems
>>>  - Move Renesas private data out of TMIO core
>>>  - Allow to perform platform-specific settings before MMC host starts
>>>  - Fix weird IRQ handling
>>>
>>> I am getting more and more patches for TMIO.
>>> I put all in a single series to clarify the patch order.
>>>
>>> 1, 2, 4, 5, 6, 7 were already acked or reviewed by Wolfram Sang.
>>>
>>>
>>> Masahiro Yamada (22):
>>>   mmc: renesas_sdhi: consolidate DMAC CONFIG options
>>>   mmc: renesas_sdhi: remove wrong depends on to enable compile test
>>>   mmc: renesas_sdhi: remove eprobe jump label
>>>   mmc: tmio: set tmio_mmc_host to driver data
>>>   mmc: tmio: use devm_ioremap_resource() instead of devm_ioremap()
>>>   mmc: tmio: move mmc_host_ops to struct tmio_mmc_host from static data
>>>   mmc: tmio, renesas_sdhi: set mmc_host_ops hooks directly
>>>   mmc: tmio: move mmc_gpio_request_cd() before mmc_add_host()
>>>   mmc: tmio: use mmc_can_gpio_cd() instead of checking
>>> TMIO_MMC_USE_GPIO_CD
>>>   mmc: tmio: support IP-builtin card detection logic
>>>   mmc: renesas_sdhi: remove always false condition
>>>   mmc: tmio,renesas_sdhi: move struct tmio_mmc_dma to renesas_sdhi.h
>>>   mmc: tmio,renesas_sdhi: move Renesas-specific DMA data to
>>> renesas_sdhi.h
>>>   mmc: tmio,renesas_sdhi: move ssc_tappos to renesas_sdhi.h
>>>   mmc: tmio: change bus_shift to unsigned int
>>>   mmc: tmio: fix never-detected card insertion bug
>>>   mmc: tmio: move TMIO_MASK_{READOP,WRITEOP} handling to correct place
>>>   mmc: tmio: remove useless TMIO_MASK_CMD handling in
>>> tmio_mmc_host_probe()
>>>   mmc: tmio: ioremap memory resource in tmio_mmc_host_alloc()
>>>   mmc: tmio: move clk_enable/disable out of tmio_mmc_host_probe()
>>>   mmc: tmio: move {tmio_}mmc_of_parse() to tmio_mmc_host_alloc()
>>>   mmc: tmio: remove dma_ops from tmio_mmc_host_probe() argument
>>>
>>>  drivers/mmc/host/Kconfig  |   5 +-
>>>  drivers/mmc/host/Makefile |   8 +-
>>>  drivers/mmc/host/renesas_sdhi.h   |  22 
>>>  drivers/mmc/host/renesas_sdhi_core.c  |  49 -
>>>  drivers/mmc/host/renesas_sdhi_internal_dmac.c |  14 ++-
>>>  drivers/mmc/host/renesas_sdhi_sys_dmac.c  |  35 +++---
>>>  drivers/mmc/host/tmio_mmc.c   |  23 ++--
>>>  drivers/mmc/host/tmio_mmc.h   |  23 +---
>>>  drivers/mmc/host/tmio_mmc_core.c  | 149 
>>> +-
>>>  9 files changed, 170 insertions(+), 158 deletions(-)
>>>
>>> --
>>> 2.7.4
>>>
>>
>> To get this moving, I have applied patch 1->8 for next, thanks!
>
>
> Could you apply 11->15 as well?
> They were reviewed by Wolfram.

Applied for next!

>
> We can skip 9, 10.

Okay, thought there were a dependency.

Thanks and kind regards
Uffe


Re: [PATCH v2 00/22] mmc: tmio: various fixes and cleanups

2017-12-18 Thread Ulf Hansson
On 19 December 2017 at 04:56, Masahiro Yamada
 wrote:
> Hi Ulf,
>
>
> 2017-12-15 18:18 GMT+09:00 Ulf Hansson :
>> On 24 November 2017 at 17:24, Masahiro Yamada
>>  wrote:
>>>
>>> I am working on this IP for Socionext SoCs.
>>>
>>> I was hit by several issues, and noticed various
>>> clean-up candidates.
>>>
>>>  - Fix and clean-up Kconfig
>>>  - Fix various card detection problems
>>>  - Move Renesas private data out of TMIO core
>>>  - Allow to perform platform-specific settings before MMC host starts
>>>  - Fix weird IRQ handling
>>>
>>> I am getting more and more patches for TMIO.
>>> I put all in a single series to clarify the patch order.
>>>
>>> 1, 2, 4, 5, 6, 7 were already acked or reviewed by Wolfram Sang.
>>>
>>>
>>> Masahiro Yamada (22):
>>>   mmc: renesas_sdhi: consolidate DMAC CONFIG options
>>>   mmc: renesas_sdhi: remove wrong depends on to enable compile test
>>>   mmc: renesas_sdhi: remove eprobe jump label
>>>   mmc: tmio: set tmio_mmc_host to driver data
>>>   mmc: tmio: use devm_ioremap_resource() instead of devm_ioremap()
>>>   mmc: tmio: move mmc_host_ops to struct tmio_mmc_host from static data
>>>   mmc: tmio, renesas_sdhi: set mmc_host_ops hooks directly
>>>   mmc: tmio: move mmc_gpio_request_cd() before mmc_add_host()
>>>   mmc: tmio: use mmc_can_gpio_cd() instead of checking
>>> TMIO_MMC_USE_GPIO_CD
>>>   mmc: tmio: support IP-builtin card detection logic
>>>   mmc: renesas_sdhi: remove always false condition
>>>   mmc: tmio,renesas_sdhi: move struct tmio_mmc_dma to renesas_sdhi.h
>>>   mmc: tmio,renesas_sdhi: move Renesas-specific DMA data to
>>> renesas_sdhi.h
>>>   mmc: tmio,renesas_sdhi: move ssc_tappos to renesas_sdhi.h
>>>   mmc: tmio: change bus_shift to unsigned int
>>>   mmc: tmio: fix never-detected card insertion bug
>>>   mmc: tmio: move TMIO_MASK_{READOP,WRITEOP} handling to correct place
>>>   mmc: tmio: remove useless TMIO_MASK_CMD handling in
>>> tmio_mmc_host_probe()
>>>   mmc: tmio: ioremap memory resource in tmio_mmc_host_alloc()
>>>   mmc: tmio: move clk_enable/disable out of tmio_mmc_host_probe()
>>>   mmc: tmio: move {tmio_}mmc_of_parse() to tmio_mmc_host_alloc()
>>>   mmc: tmio: remove dma_ops from tmio_mmc_host_probe() argument
>>>
>>>  drivers/mmc/host/Kconfig  |   5 +-
>>>  drivers/mmc/host/Makefile |   8 +-
>>>  drivers/mmc/host/renesas_sdhi.h   |  22 
>>>  drivers/mmc/host/renesas_sdhi_core.c  |  49 -
>>>  drivers/mmc/host/renesas_sdhi_internal_dmac.c |  14 ++-
>>>  drivers/mmc/host/renesas_sdhi_sys_dmac.c  |  35 +++---
>>>  drivers/mmc/host/tmio_mmc.c   |  23 ++--
>>>  drivers/mmc/host/tmio_mmc.h   |  23 +---
>>>  drivers/mmc/host/tmio_mmc_core.c  | 149 
>>> +-
>>>  9 files changed, 170 insertions(+), 158 deletions(-)
>>>
>>> --
>>> 2.7.4
>>>
>>
>> To get this moving, I have applied patch 1->8 for next, thanks!
>
>
> Could you apply 11->15 as well?
> They were reviewed by Wolfram.

Applied for next!

>
> We can skip 9, 10.

Okay, thought there were a dependency.

Thanks and kind regards
Uffe


Re: stack traces and zombie tasks

2017-12-18 Thread Ingo Molnar

* Miroslav Benes  wrote:

> On Sun, 17 Dec 2017, Josh Poimboeuf wrote:
> 
> > On Fri, Dec 15, 2017 at 07:51:45AM -0800, Andy Lutomirski wrote:
> > > On Fri, Dec 15, 2017 at 4:54 AM, Miroslav Benes  wrote:
> > > > Hi,
> > > >
> > > > commit 1959a60182f4 ("x86/dumpstack: Pin the target stack when dumping
> > > > it") slightly changed the behaviour of stack traces dumping for zombie
> > > > tasks.
> > > >
> > > > Before the commit (well, this is older SLE12 kernel, but that should not
> > > > matter), if one called 'cat /proc//stack', they would get
> > > > something like this
> > > >
> > > >   [] do_exit+0x6f7/0xa80
> > > >   [] do_group_exit+0x39/0xa0
> > > >   [] __wake_up_parent+0x0/0x30
> > > >   [] system_call_fastpath+0x16/0x1b
> > > >   [<7fd128f9c4f9>] 0x7fd128f9c4f9
> > > >   [] 0x
> > > >
> > > > After, one gets nothing. The trace is empty. try_get_task_stack() 
> > > > contains
> > > > atomic_inc_not_zero() (CONFIG_THREAD_INFO_IN_TASK is now default on
> > > > x86_64) and because stack_refcount is 0 for a zombie task, it returns
> > > > NULL. Therefore, all save_stack_trace_*() functions return immediately.
> > > >
> > > > I guess that no one has cared about it so far. There is a problem for
> > > > live patching though. save_stack_trace_tsk_reliable() returns -EINVAL 
> > > > for
> > > > the zombie task and its stack is deemed unreliable. It could block our
> > > > transition for quite a long time.
> > > >
> > > > We can skip those tasks in kernel/livepatch/ with a simple test we have 
> > > > in
> > > > kGraft. Skip the task if (task->state == TASK_DEAD && task->on_cpu == 
> > > > 0).
> > > > But you may want to change it generally, so better to ask first.
> > > >
> > > 
> > > Sounds like a bug in save_stack_trace_tsk_reliable() to me: if the
> > > task has no stack, then the trace is 100% definitely empty :)
> > 
> > I would agree with that, something like the following should fix it?
> > 
> > diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> > index 77835bc021c7..20161ef53537 100644
> > --- a/arch/x86/kernel/stacktrace.c
> > +++ b/arch/x86/kernel/stacktrace.c
> > @@ -164,8 +164,12 @@ int save_stack_trace_tsk_reliable(struct task_struct 
> > *tsk,
> >  {
> > int ret;
> >  
> > +   /*
> > +* If the task doesn't have a stack (e.g., a zombie), the stack is
> > +* "reliably" empty.
> > +*/
> > if (!try_get_task_stack(tsk))
> > -   return -EINVAL;
> > +   return 0;
> >  
> > ret = __save_stack_trace_reliable(trace, tsk);
> 
> This obviously fixes the problem, so you can add
> 
> Reported-and-tested-by: Miroslav Benes 

Great.

Josh, mind sending a changelogged version, or should I distill a commit out of 
this discussion, for tip:x86/urgent?

Thanks,

Ingo


Re: stack traces and zombie tasks

2017-12-18 Thread Ingo Molnar

* Miroslav Benes  wrote:

> On Sun, 17 Dec 2017, Josh Poimboeuf wrote:
> 
> > On Fri, Dec 15, 2017 at 07:51:45AM -0800, Andy Lutomirski wrote:
> > > On Fri, Dec 15, 2017 at 4:54 AM, Miroslav Benes  wrote:
> > > > Hi,
> > > >
> > > > commit 1959a60182f4 ("x86/dumpstack: Pin the target stack when dumping
> > > > it") slightly changed the behaviour of stack traces dumping for zombie
> > > > tasks.
> > > >
> > > > Before the commit (well, this is older SLE12 kernel, but that should not
> > > > matter), if one called 'cat /proc//stack', they would get
> > > > something like this
> > > >
> > > >   [] do_exit+0x6f7/0xa80
> > > >   [] do_group_exit+0x39/0xa0
> > > >   [] __wake_up_parent+0x0/0x30
> > > >   [] system_call_fastpath+0x16/0x1b
> > > >   [<7fd128f9c4f9>] 0x7fd128f9c4f9
> > > >   [] 0x
> > > >
> > > > After, one gets nothing. The trace is empty. try_get_task_stack() 
> > > > contains
> > > > atomic_inc_not_zero() (CONFIG_THREAD_INFO_IN_TASK is now default on
> > > > x86_64) and because stack_refcount is 0 for a zombie task, it returns
> > > > NULL. Therefore, all save_stack_trace_*() functions return immediately.
> > > >
> > > > I guess that no one has cared about it so far. There is a problem for
> > > > live patching though. save_stack_trace_tsk_reliable() returns -EINVAL 
> > > > for
> > > > the zombie task and its stack is deemed unreliable. It could block our
> > > > transition for quite a long time.
> > > >
> > > > We can skip those tasks in kernel/livepatch/ with a simple test we have 
> > > > in
> > > > kGraft. Skip the task if (task->state == TASK_DEAD && task->on_cpu == 
> > > > 0).
> > > > But you may want to change it generally, so better to ask first.
> > > >
> > > 
> > > Sounds like a bug in save_stack_trace_tsk_reliable() to me: if the
> > > task has no stack, then the trace is 100% definitely empty :)
> > 
> > I would agree with that, something like the following should fix it?
> > 
> > diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> > index 77835bc021c7..20161ef53537 100644
> > --- a/arch/x86/kernel/stacktrace.c
> > +++ b/arch/x86/kernel/stacktrace.c
> > @@ -164,8 +164,12 @@ int save_stack_trace_tsk_reliable(struct task_struct 
> > *tsk,
> >  {
> > int ret;
> >  
> > +   /*
> > +* If the task doesn't have a stack (e.g., a zombie), the stack is
> > +* "reliably" empty.
> > +*/
> > if (!try_get_task_stack(tsk))
> > -   return -EINVAL;
> > +   return 0;
> >  
> > ret = __save_stack_trace_reliable(trace, tsk);
> 
> This obviously fixes the problem, so you can add
> 
> Reported-and-tested-by: Miroslav Benes 

Great.

Josh, mind sending a changelogged version, or should I distill a commit out of 
this discussion, for tip:x86/urgent?

Thanks,

Ingo


Re: [PATCH v1] media: videobuf2: Add new uAPI for DVB streaming I/O

2017-12-18 Thread Philippe Ombredanne
Dear Satendra,

On Tue, Dec 19, 2017 at 4:35 AM, Satendra Singh Thakur
 wrote:



> --- /dev/null
> +++ b/drivers/media/dvb-core/dvb_vb2.c
> @@ -0,0 +1,422 @@
> +/*
> + * dvb-vb2.c - dvb-vb2
> + *
> + * Copyright (C) 2015 Samsung Electronics
> + *
> + * Author: jh1009.s...@samsung.com
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation.
> + */

Would you mind using the new SPDX tags documented in Thomas patch set
[1] rather than this fine but longer legalese?

And if you could spread the word to others in your team this would be very nice.
See also this fine article posted by Mauro on the Samsung Open Source
Group Blog [2]
Thank you!

[1] https://lkml.org/lkml/2017/12/4/934
[2] https://blogs.s-osg.org/linux-kernel-license-practices-revisited-spdx/
-- 
Cordially
Philippe Ombredanne


Re: [PATCH v1] media: videobuf2: Add new uAPI for DVB streaming I/O

2017-12-18 Thread Philippe Ombredanne
Dear Satendra,

On Tue, Dec 19, 2017 at 4:35 AM, Satendra Singh Thakur
 wrote:



> --- /dev/null
> +++ b/drivers/media/dvb-core/dvb_vb2.c
> @@ -0,0 +1,422 @@
> +/*
> + * dvb-vb2.c - dvb-vb2
> + *
> + * Copyright (C) 2015 Samsung Electronics
> + *
> + * Author: jh1009.s...@samsung.com
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation.
> + */

Would you mind using the new SPDX tags documented in Thomas patch set
[1] rather than this fine but longer legalese?

And if you could spread the word to others in your team this would be very nice.
See also this fine article posted by Mauro on the Samsung Open Source
Group Blog [2]
Thank you!

[1] https://lkml.org/lkml/2017/12/4/934
[2] https://blogs.s-osg.org/linux-kernel-license-practices-revisited-spdx/
-- 
Cordially
Philippe Ombredanne


Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-18 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Mon, Dec 18, 2017 at 12:45:13PM -0800, Dave Hansen wrote:
> > On 12/18/2017 12:41 PM, Peter Zijlstra wrote:
> > >> I also don't think the user_shared area of the fixmap can get *that*
> > >> big.  Does anybody know offhand what the theoretical limits are there?
> > > Problem there is the nr_cpus term I think, we currently have up to 8k
> > > CPUs, but I can see that getting bigger in the future.
> > 
> > It only matters if we go over 512GB, though.  Is the per-cpu part of the
> > fixmap ever more than 512GB/8k=64MB?
> 
> Unlikely, I think the LDT (@ 32 pages / 128K) and the DS (@ 2*4 pages /
> 32K) are the largest entries in there.

Note that with the latest state of things the LDT is not in the fixmap anymore, 
it's mapped separately, via Andy's following patch:

  e86aaee3f2d9: ("x86/pti: Put the LDT in its own PGD if PTI is on")

We have the IDT, the per-CPU entry area and the Debug Store (on Intel CPUs) 
mapped 
in the fixmap area, in addition to the usual fixmap entries that are a handful 
of 
pages. (That's on 64-bit - on 32-bit we have a pretty large kmap area.)

The biggest contribution to the size of the fixmap area is struct 
cpu_entry_area 
(FIX_CPU_ENTRY_AREA_BOTTOM..FIX_CPU_ENTRY_AREA_TOP), which is ~180k, i.e. 44 
pages.

Our current NR_CPUS limit is 8,192 CPUs, but even with 65,536 CPUs the fixmap 
area 
would still only be ~12 GB total - so we are far from running out of space.

Btw., the DS portion of the fixmap is not 64K but 128K. Here's the current size 
distribution of struct cpu_entry_area:

  ::gdt   -4K
  ::tss   -   12K
  ::entry_trampoline  -4K
#if 64-bit
  ::exception_stacks  -   20K
#endif
#if Intel
  ::cpu_debug_store   -4K
  ::cpu_debug_buffers -  128K
#endif

So ::cpu_debug_buffers (struct debug_store_buffers) is the biggest fixmap 
chunk, 
by far, distributed the following way:

  ::bts_buffer[]  -   64K
  ::pebs_buffer[] -   64K

Thanks,

Ingo


Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-18 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Mon, Dec 18, 2017 at 12:45:13PM -0800, Dave Hansen wrote:
> > On 12/18/2017 12:41 PM, Peter Zijlstra wrote:
> > >> I also don't think the user_shared area of the fixmap can get *that*
> > >> big.  Does anybody know offhand what the theoretical limits are there?
> > > Problem there is the nr_cpus term I think, we currently have up to 8k
> > > CPUs, but I can see that getting bigger in the future.
> > 
> > It only matters if we go over 512GB, though.  Is the per-cpu part of the
> > fixmap ever more than 512GB/8k=64MB?
> 
> Unlikely, I think the LDT (@ 32 pages / 128K) and the DS (@ 2*4 pages /
> 32K) are the largest entries in there.

Note that with the latest state of things the LDT is not in the fixmap anymore, 
it's mapped separately, via Andy's following patch:

  e86aaee3f2d9: ("x86/pti: Put the LDT in its own PGD if PTI is on")

We have the IDT, the per-CPU entry area and the Debug Store (on Intel CPUs) 
mapped 
in the fixmap area, in addition to the usual fixmap entries that are a handful 
of 
pages. (That's on 64-bit - on 32-bit we have a pretty large kmap area.)

The biggest contribution to the size of the fixmap area is struct 
cpu_entry_area 
(FIX_CPU_ENTRY_AREA_BOTTOM..FIX_CPU_ENTRY_AREA_TOP), which is ~180k, i.e. 44 
pages.

Our current NR_CPUS limit is 8,192 CPUs, but even with 65,536 CPUs the fixmap 
area 
would still only be ~12 GB total - so we are far from running out of space.

Btw., the DS portion of the fixmap is not 64K but 128K. Here's the current size 
distribution of struct cpu_entry_area:

  ::gdt   -4K
  ::tss   -   12K
  ::entry_trampoline  -4K
#if 64-bit
  ::exception_stacks  -   20K
#endif
#if Intel
  ::cpu_debug_store   -4K
  ::cpu_debug_buffers -  128K
#endif

So ::cpu_debug_buffers (struct debug_store_buffers) is the biggest fixmap 
chunk, 
by far, distributed the following way:

  ::bts_buffer[]  -   64K
  ::pebs_buffer[] -   64K

Thanks,

Ingo


error: redefinition of ‘shipped_regdb_certs’

2017-12-18 Thread Michal Hocko
Hi Johannes,
this is the first time I am seeing this with the current Linus
tree. 4.15-rc3 compiled fine. I have checked that you have updated the
makefile which generates the file by 715a12334764 ("wireless: don't
write C files on failures").

I am getting the following compilation error:
  CC [M]  net/wireless/shipped-certs.o
net/wireless/shipped-certs.c:686:10: error: redefinition of 
‘shipped_regdb_certs’
 const u8 shipped_regdb_certs[] = {
  ^~~
net/wireless/shipped-certs.c:2:10: note: previous definition of 
‘shipped_regdb_certs’ was here
 const u8 shipped_regdb_certs[] = {
  ^~~
net/wireless/shipped-certs.c:1368:14: error: redefinition of 
‘shipped_regdb_certs_len’
 unsigned int shipped_regdb_certs_len = sizeof(shipped_regdb_certs);
  ^~~
net/wireless/shipped-certs.c:684:14: note: previous definition of 
‘shipped_regdb_certs_len’ was here
 unsigned int shipped_regdb_certs_len = sizeof(shipped_regdb_certs);
  ^~~
scripts/Makefile.build:310: recipe for target 'net/wireless/shipped-certs.o' 
failed
make[3]: *** [net/wireless/shipped-certs.o] Error 1
scripts/Makefile.build:569: recipe for target 'net/wireless' failed
make[2]: *** [net/wireless] Error 2
Makefile:1012: recipe for target 'net' failed
make[1]: *** [net] Error 2

I have no idea how the generated file is supposed to work but it seems
that the cleanup doesn't work properly.  Compilation succeeds if I
remove the file and it is generated again:

$ rm net/wireless/shipped-certs.c
[...]
  GEN net/wireless/shipped-certs.c
  CC [M]  net/wireless/shipped-certs.o

I have double checked that neither make clean/distclean removed the
file. Strangely enough neither did the tree with 715a12334764 reverted.

Any idea?
-- 
Michal Hocko
SUSE Labs


error: redefinition of ‘shipped_regdb_certs’

2017-12-18 Thread Michal Hocko
Hi Johannes,
this is the first time I am seeing this with the current Linus
tree. 4.15-rc3 compiled fine. I have checked that you have updated the
makefile which generates the file by 715a12334764 ("wireless: don't
write C files on failures").

I am getting the following compilation error:
  CC [M]  net/wireless/shipped-certs.o
net/wireless/shipped-certs.c:686:10: error: redefinition of 
‘shipped_regdb_certs’
 const u8 shipped_regdb_certs[] = {
  ^~~
net/wireless/shipped-certs.c:2:10: note: previous definition of 
‘shipped_regdb_certs’ was here
 const u8 shipped_regdb_certs[] = {
  ^~~
net/wireless/shipped-certs.c:1368:14: error: redefinition of 
‘shipped_regdb_certs_len’
 unsigned int shipped_regdb_certs_len = sizeof(shipped_regdb_certs);
  ^~~
net/wireless/shipped-certs.c:684:14: note: previous definition of 
‘shipped_regdb_certs_len’ was here
 unsigned int shipped_regdb_certs_len = sizeof(shipped_regdb_certs);
  ^~~
scripts/Makefile.build:310: recipe for target 'net/wireless/shipped-certs.o' 
failed
make[3]: *** [net/wireless/shipped-certs.o] Error 1
scripts/Makefile.build:569: recipe for target 'net/wireless' failed
make[2]: *** [net/wireless] Error 2
Makefile:1012: recipe for target 'net' failed
make[1]: *** [net] Error 2

I have no idea how the generated file is supposed to work but it seems
that the cleanup doesn't work properly.  Compilation succeeds if I
remove the file and it is generated again:

$ rm net/wireless/shipped-certs.c
[...]
  GEN net/wireless/shipped-certs.c
  CC [M]  net/wireless/shipped-certs.o

I have double checked that neither make clean/distclean removed the
file. Strangely enough neither did the tree with 715a12334764 reverted.

Any idea?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH for-4.15] ASoC: rt5514: don't assume rt5514 component was "attached"

2017-12-18 Thread Cheng-yi Chiang
Hi Brian, I am sorry for not using the plain text mode in the previous mail.
I agree with you on other points.

On Tue, Dec 19, 2017 at 1:42 AM, Brian Norris  wrote:
> Hi!
>
> (By the way, your mail is HTML and likely will get rejected by many
> mailing lists and/or people reading these mailing lists.)
>
> On Mon, Dec 18, 2017 at 12:23:18PM +0800, Cheng-yi Chiang wrote:
>>Hi Brian,
>>  Oder has posted the same fix : [1]https://patchwork.kernel.org/
>>patch/10066257/ and it has been applied.
>
> OK cool. Obviously I'm biased, but I prefer mine, as it has less
> needless whitespace, and is appropriately documented ;) But it should be
> a fine replacement.
>
>>Perhaps you can cherry-pick it to v4.15 ?
>
> I have no say in that; that would be up to Mark, I think. It's most
> certainly a regression in 4.15-rc1, so IMO it should be given a proper
> 'Fixes: e9c50aa6bd39 ("ASoC: rt5514-spi: check irq status to schedule
> data copy in resume function")' tag and sent for 4.15, not 4.16.
>
> Brian


Re: [PATCH for-4.15] ASoC: rt5514: don't assume rt5514 component was "attached"

2017-12-18 Thread Cheng-yi Chiang
Hi Brian, I am sorry for not using the plain text mode in the previous mail.
I agree with you on other points.

On Tue, Dec 19, 2017 at 1:42 AM, Brian Norris  wrote:
> Hi!
>
> (By the way, your mail is HTML and likely will get rejected by many
> mailing lists and/or people reading these mailing lists.)
>
> On Mon, Dec 18, 2017 at 12:23:18PM +0800, Cheng-yi Chiang wrote:
>>Hi Brian,
>>  Oder has posted the same fix : [1]https://patchwork.kernel.org/
>>patch/10066257/ and it has been applied.
>
> OK cool. Obviously I'm biased, but I prefer mine, as it has less
> needless whitespace, and is appropriately documented ;) But it should be
> a fine replacement.
>
>>Perhaps you can cherry-pick it to v4.15 ?
>
> I have no say in that; that would be up to Mark, I think. It's most
> certainly a regression in 4.15-rc1, so IMO it should be given a proper
> 'Fixes: e9c50aa6bd39 ("ASoC: rt5514-spi: check irq status to schedule
> data copy in resume function")' tag and sent for 4.15, not 4.16.
>
> Brian


Re: [PATCH 3/4] PM / core: Direct DPM_FLAG_SMART_SUSPEND optimization

2017-12-18 Thread Ulf Hansson
On 10 December 2017 at 01:00, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> Make the PM core avoid invoking the "late" and "noirq" system-wide
> suspend (or analogous) callbacks provided by device drivers directly
> for devices with DPM_FLAG_SMART_SUSPEND set that are in runtime
> suspend during the "late" and "noirq" phases of system-wide suspend
> (or analogous) transitions.  That is only done for devices without
> any middle-layer "late" and "noirq" suspend callbacks (to avoid
> confusing the middle layer if there is one).
>
> The underlying observation is that runtime PM is disabled for devices
> during the "late" and "noirq" system-wide suspend phases, so if they
> remain in runtime suspend from the "late" phase forward, it doesn't
> make sense to invoke the "late" and "noirq" callbacks provided by
> the drivers for them (arguably, the device is already suspended and
> in the right state).  Thus, if the remaining driver suspend callbacks
> are to be invoked directly by the core, they can be skipped.
>

As I have stated earlier, this isn't going to solve the general case,
as the above change log seems to state. I think we really need to do
that, before adding yet another system suspend/resume optimization
path in the PM core.

The main reason is that lots of drivers have additional operations to
perform, besides making sure that its device is put into a "runtime
suspended state" during system suspend. In other words, skipping
system suspend callbacks (and likewise for system resume) is to me the
wrong solution to mitigate these problems.

> This change really makes it possible for, say, platform device
> drivers to re-use runtime PM suspend and resume callbacks by
> pointing ->suspend_late and ->resume_early, respectively (and
> possibly the analogous hibernation-related callback pointers too),
> to them without adding any extra "is the device already suspended?"
> type of checks to the callback routines, as long as they will be
> invoked directly by the core.

Certainly there are drivers that could start to deploying this
solution, because at the moment they don't have any additional
operations to perform at system suspend. But what about the rest,
don't we care about them?

Moreover, what happens when/if a driver that has deployed this
solution, starts being used on a new SoC and then additional
operations during system suspend becomes required (for example
pinctrls that needs to be put in a system sleep state)? Then
everything falls apart, doesn't it?

Kind regards
Uffe

>
> Signed-off-by: Rafael J. Wysocki 
> ---
>  Documentation/driver-api/pm/devices.rst |   18 +++---
>  drivers/base/power/main.c   |   85 
> +---
>  2 files changed, 88 insertions(+), 15 deletions(-)
>
> Index: linux-pm/Documentation/driver-api/pm/devices.rst
> ===
> --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> +++ linux-pm/Documentation/driver-api/pm/devices.rst
> @@ -777,14 +777,16 @@ The driver can indicate that by setting
>  runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
>  suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
>  has been disabled for it, under the assumption that its state should not 
> change
> -after that point until the system-wide transition is over.  If that happens, 
> the
> -driver's system-wide resume callbacks, if present, may still be invoked 
> during
> -the subsequent system-wide resume transition and the device's runtime power
> -management status may be set to "active" before enabling runtime PM for it,
> -so the driver must be prepared to cope with the invocation of its system-wide
> -resume callbacks back-to-back with its ``->runtime_suspend`` one (without the
> -intervening ``->runtime_resume`` and so on) and the final state of the device
> -must reflect the "active" status for runtime PM in that case.
> +after that point until the system-wide transition is over (the PM core itself
> +does that for devices whose "noirq", "late" and "early" system-wide PM 
> callbacks
> +are executed directly by it).  If that happens, the driver's system-wide 
> resume
> +callbacks, if present, may still be invoked during the subsequent system-wide
> +resume transition and the device's runtime power management status may be set
> +to "active" before enabling runtime PM for it, so the driver must be 
> prepared to
> +cope with the invocation of its system-wide resume callbacks back-to-back 
> with
> +its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` 
> and
> +so on) and the final state of the device must reflect the "active" runtime PM
> +status in that case.
>
>  During system-wide resume from a sleep state it's easiest to put devices into
>  the full-power state, as explained in 
> :file:`Documentation/power/runtime_pm.txt`.
> 

Re: [PATCH 3/4] PM / core: Direct DPM_FLAG_SMART_SUSPEND optimization

2017-12-18 Thread Ulf Hansson
On 10 December 2017 at 01:00, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> Make the PM core avoid invoking the "late" and "noirq" system-wide
> suspend (or analogous) callbacks provided by device drivers directly
> for devices with DPM_FLAG_SMART_SUSPEND set that are in runtime
> suspend during the "late" and "noirq" phases of system-wide suspend
> (or analogous) transitions.  That is only done for devices without
> any middle-layer "late" and "noirq" suspend callbacks (to avoid
> confusing the middle layer if there is one).
>
> The underlying observation is that runtime PM is disabled for devices
> during the "late" and "noirq" system-wide suspend phases, so if they
> remain in runtime suspend from the "late" phase forward, it doesn't
> make sense to invoke the "late" and "noirq" callbacks provided by
> the drivers for them (arguably, the device is already suspended and
> in the right state).  Thus, if the remaining driver suspend callbacks
> are to be invoked directly by the core, they can be skipped.
>

As I have stated earlier, this isn't going to solve the general case,
as the above change log seems to state. I think we really need to do
that, before adding yet another system suspend/resume optimization
path in the PM core.

The main reason is that lots of drivers have additional operations to
perform, besides making sure that its device is put into a "runtime
suspended state" during system suspend. In other words, skipping
system suspend callbacks (and likewise for system resume) is to me the
wrong solution to mitigate these problems.

> This change really makes it possible for, say, platform device
> drivers to re-use runtime PM suspend and resume callbacks by
> pointing ->suspend_late and ->resume_early, respectively (and
> possibly the analogous hibernation-related callback pointers too),
> to them without adding any extra "is the device already suspended?"
> type of checks to the callback routines, as long as they will be
> invoked directly by the core.

Certainly there are drivers that could start to deploying this
solution, because at the moment they don't have any additional
operations to perform at system suspend. But what about the rest,
don't we care about them?

Moreover, what happens when/if a driver that has deployed this
solution, starts being used on a new SoC and then additional
operations during system suspend becomes required (for example
pinctrls that needs to be put in a system sleep state)? Then
everything falls apart, doesn't it?

Kind regards
Uffe

>
> Signed-off-by: Rafael J. Wysocki 
> ---
>  Documentation/driver-api/pm/devices.rst |   18 +++---
>  drivers/base/power/main.c   |   85 
> +---
>  2 files changed, 88 insertions(+), 15 deletions(-)
>
> Index: linux-pm/Documentation/driver-api/pm/devices.rst
> ===
> --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> +++ linux-pm/Documentation/driver-api/pm/devices.rst
> @@ -777,14 +777,16 @@ The driver can indicate that by setting
>  runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
>  suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
>  has been disabled for it, under the assumption that its state should not 
> change
> -after that point until the system-wide transition is over.  If that happens, 
> the
> -driver's system-wide resume callbacks, if present, may still be invoked 
> during
> -the subsequent system-wide resume transition and the device's runtime power
> -management status may be set to "active" before enabling runtime PM for it,
> -so the driver must be prepared to cope with the invocation of its system-wide
> -resume callbacks back-to-back with its ``->runtime_suspend`` one (without the
> -intervening ``->runtime_resume`` and so on) and the final state of the device
> -must reflect the "active" status for runtime PM in that case.
> +after that point until the system-wide transition is over (the PM core itself
> +does that for devices whose "noirq", "late" and "early" system-wide PM 
> callbacks
> +are executed directly by it).  If that happens, the driver's system-wide 
> resume
> +callbacks, if present, may still be invoked during the subsequent system-wide
> +resume transition and the device's runtime power management status may be set
> +to "active" before enabling runtime PM for it, so the driver must be 
> prepared to
> +cope with the invocation of its system-wide resume callbacks back-to-back 
> with
> +its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` 
> and
> +so on) and the final state of the device must reflect the "active" runtime PM
> +status in that case.
>
>  During system-wide resume from a sleep state it's easiest to put devices into
>  the full-power state, as explained in 
> :file:`Documentation/power/runtime_pm.txt`.
> Index: linux-pm/drivers/base/power/main.c
> 

Re: [PATCH 4.14 000/178] 4.14.8-stable review

2017-12-18 Thread Greg Kroah-Hartman
On Mon, Dec 18, 2017 at 01:25:06PM -0700, Shuah Khan wrote:
> On 12/18/2017 08:47 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.14.8 release.
> > There are 178 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Dec 20 15:28:30 UTC 2017.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.8-rc1.gz
> > or in the git tree and branch at:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.14.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH 4.14 000/178] 4.14.8-stable review

2017-12-18 Thread Greg Kroah-Hartman
On Mon, Dec 18, 2017 at 01:25:06PM -0700, Shuah Khan wrote:
> On 12/18/2017 08:47 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.14.8 release.
> > There are 178 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Dec 20 15:28:30 UTC 2017.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.8-rc1.gz
> > or in the git tree and branch at:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.14.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH 4.4 000/115] 4.4.107-stable review

2017-12-18 Thread Greg Kroah-Hartman
On Mon, Dec 18, 2017 at 11:17:30AM -0700, Nathan Chancellor wrote:
> Merged, compiled, and flashed onto my Pixel 2 XL and OnePlus 5.
> 
> No initial issues noticed in general usage or dmesg.

Great, thanks for testing and letting me know.

greg k-h


Re: [PATCH 4.4 000/115] 4.4.107-stable review

2017-12-18 Thread Greg Kroah-Hartman
On Mon, Dec 18, 2017 at 11:17:30AM -0700, Nathan Chancellor wrote:
> Merged, compiled, and flashed onto my Pixel 2 XL and OnePlus 5.
> 
> No initial issues noticed in general usage or dmesg.

Great, thanks for testing and letting me know.

greg k-h


[PATCH] cgroup: Fix deadlock in cpu hotplug path

2017-12-18 Thread Prateek Sood
Deadlock during cgroup migration from cpu hotplug path when a task T is
being moved from source to destination cgroup.

kworker/0:0
cpuset_hotplug_workfn()
   cpuset_hotplug_update_tasks()
  hotplug_update_tasks_legacy()
remove_tasks_in_empty_cpuset()
  cgroup_transfer_tasks() // stuck in iterator loop
cgroup_migrate()
  cgroup_migrate_add_task()

In cgroup_migrate_add_task() it checks for PF_EXITING flag of task T.
Task T will not migrate to destination cgroup. css_task_iter_start()
will keep pointing to task T in loop waiting for task T cg_list node
to be removed.

Task T
do_exit()
  exit_signals() // sets PF_EXITING
  exit_task_namespaces()
switch_task_namespaces()
  free_nsproxy()
put_mnt_ns()
  drop_collected_mounts()
namespace_unlock()
  synchronize_rcu()
_synchronize_rcu_expedited()
  schedule_work() // on cpu0 low priority worker pool
  wait_event() // waiting for work item to execute

Task T inserted a work item in the worklist of cpu0 low priority
worker pool. It is waiting for expedited grace period work item
to execute. This work item will only be executed once kworker/0:0
complete execution of cpuset_hotplug_workfn().

kworker/0:0 ==> Task T ==>kworker/0:0

In case of PF_EXITING task being migrated from source to destination
cgroup, migrate next available task in source cgroup.

Change-Id: I8874fb04479c136cae4dabd5c168c7749df4
Signed-off-by: Prateek Sood 
---
 kernel/cgroup/cgroup-v1.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 024085d..a2c05d2 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -123,7 +123,11 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup 
*from)
 */
do {
css_task_iter_start(>self, 0, );
-   task = css_task_iter_next();
+
+   do {
+   task = css_task_iter_next();
+   } while (task && (task->flags & PF_EXITING));
+
if (task)
get_task_struct(task);
css_task_iter_end();
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., 
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.



[PATCH] cgroup: Fix deadlock in cpu hotplug path

2017-12-18 Thread Prateek Sood
Deadlock during cgroup migration from cpu hotplug path when a task T is
being moved from source to destination cgroup.

kworker/0:0
cpuset_hotplug_workfn()
   cpuset_hotplug_update_tasks()
  hotplug_update_tasks_legacy()
remove_tasks_in_empty_cpuset()
  cgroup_transfer_tasks() // stuck in iterator loop
cgroup_migrate()
  cgroup_migrate_add_task()

In cgroup_migrate_add_task() it checks for PF_EXITING flag of task T.
Task T will not migrate to destination cgroup. css_task_iter_start()
will keep pointing to task T in loop waiting for task T cg_list node
to be removed.

Task T
do_exit()
  exit_signals() // sets PF_EXITING
  exit_task_namespaces()
switch_task_namespaces()
  free_nsproxy()
put_mnt_ns()
  drop_collected_mounts()
namespace_unlock()
  synchronize_rcu()
_synchronize_rcu_expedited()
  schedule_work() // on cpu0 low priority worker pool
  wait_event() // waiting for work item to execute

Task T inserted a work item in the worklist of cpu0 low priority
worker pool. It is waiting for expedited grace period work item
to execute. This work item will only be executed once kworker/0:0
complete execution of cpuset_hotplug_workfn().

kworker/0:0 ==> Task T ==>kworker/0:0

In case of PF_EXITING task being migrated from source to destination
cgroup, migrate next available task in source cgroup.

Change-Id: I8874fb04479c136cae4dabd5c168c7749df4
Signed-off-by: Prateek Sood 
---
 kernel/cgroup/cgroup-v1.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 024085d..a2c05d2 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -123,7 +123,11 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup 
*from)
 */
do {
css_task_iter_start(>self, 0, );
-   task = css_task_iter_next();
+
+   do {
+   task = css_task_iter_next();
+   } while (task && (task->flags & PF_EXITING));
+
if (task)
get_task_struct(task);
css_task_iter_end();
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., 
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.



Re: [PATCH] powerpc/perf: Dereference bhrb entries safely

2017-12-18 Thread Ravi Bangoria
Hi Balbir,

Sorry was away for few days.

On 12/14/2017 05:54 PM, Balbir Singh wrote:
> On Tue, Dec 12, 2017 at 11:29 PM, Ravi Bangoria
>  wrote:
>> It may very well happen that branch instructions recorded by
>> bhrb entries already get unmapped before they get processed by
>> the kernel. Hence, trying to dereference such memory location
>> will endup in a crash. Ex,
>>
>> Unable to handle kernel paging request for data at address 
>> 0xc00819c41764
>> Faulting instruction address: 0xc0084a14
>> NIP [c0084a14] branch_target+0x4/0x70
>> LR [c00eb828] record_and_restart+0x568/0x5c0
>> Call Trace:
>> [c00eb3b4] record_and_restart+0xf4/0x5c0 (unreliable)
>> [c00ec378] perf_event_interrupt+0x298/0x460
>> [c0027964] performance_monitor_exception+0x54/0x70
>> [c0009ba4] performance_monitor_common+0x114/0x120
>>
>> Fix this by deferefencing them safely.
>>
>> Suggested-by: Naveen N. Rao 
>> Signed-off-by: Ravi Bangoria 
>> ---
>>  arch/powerpc/perf/core-book3s.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/perf/core-book3s.c 
>> b/arch/powerpc/perf/core-book3s.c
>> index 9e3da168d54c..5a68d2effdf9 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -410,8 +410,11 @@ static __u64 power_pmu_bhrb_to(u64 addr)
>> int ret;
>> __u64 target;
>>
>> -   if (is_kernel_addr(addr))
>> -   return branch_target((unsigned int *)addr);
>> +   if (is_kernel_addr(addr)) {
> I think __kernel_text_address() is more accurate right? In which case
> you need to check for is_kernel_addr(addr) and if its not 
> kernel_text_address()
> then we have an interesting case of a branch from something not text.
> It would be nice to catch such cases.

Something like this?

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 1538129..627af56 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -410,8 +410,13 @@ static __u64 power_pmu_bhrb_to(u64 addr)
 int ret;
 __u64 target;
 
-    if (is_kernel_addr(addr))
-        return branch_target((unsigned int *)addr);
+    if (is_kernel_addr(addr)) {
+        if (probe_kernel_address((void *)addr, instr)) {
+            WARN_ON(!__kernel_text_address(addr));
+            return 0;
+        }
+        return branch_target();
+    }
 
 /* Userspace: need copy instruction here then translate it */
 pagefault_disable();


I think this will throw warnings when you try to read recently unmapped
vmalloced address. Is that fine?

Thanks for the review.
Ravi

>> +   if (probe_kernel_address((void *)addr, instr))
>> +   return 0;
>> +   return branch_target();
>> +   }
>>
>> /* Userspace: need copy instruction here then translate it */
>> pagefault_disable();
> Otherwise,
>
> Reviewed-by: Balbir Singh 
>



Re: [PATCH] powerpc/perf: Dereference bhrb entries safely

2017-12-18 Thread Ravi Bangoria
Hi Balbir,

Sorry was away for few days.

On 12/14/2017 05:54 PM, Balbir Singh wrote:
> On Tue, Dec 12, 2017 at 11:29 PM, Ravi Bangoria
>  wrote:
>> It may very well happen that branch instructions recorded by
>> bhrb entries already get unmapped before they get processed by
>> the kernel. Hence, trying to dereference such memory location
>> will endup in a crash. Ex,
>>
>> Unable to handle kernel paging request for data at address 
>> 0xc00819c41764
>> Faulting instruction address: 0xc0084a14
>> NIP [c0084a14] branch_target+0x4/0x70
>> LR [c00eb828] record_and_restart+0x568/0x5c0
>> Call Trace:
>> [c00eb3b4] record_and_restart+0xf4/0x5c0 (unreliable)
>> [c00ec378] perf_event_interrupt+0x298/0x460
>> [c0027964] performance_monitor_exception+0x54/0x70
>> [c0009ba4] performance_monitor_common+0x114/0x120
>>
>> Fix this by deferefencing them safely.
>>
>> Suggested-by: Naveen N. Rao 
>> Signed-off-by: Ravi Bangoria 
>> ---
>>  arch/powerpc/perf/core-book3s.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/perf/core-book3s.c 
>> b/arch/powerpc/perf/core-book3s.c
>> index 9e3da168d54c..5a68d2effdf9 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -410,8 +410,11 @@ static __u64 power_pmu_bhrb_to(u64 addr)
>> int ret;
>> __u64 target;
>>
>> -   if (is_kernel_addr(addr))
>> -   return branch_target((unsigned int *)addr);
>> +   if (is_kernel_addr(addr)) {
> I think __kernel_text_address() is more accurate right? In which case
> you need to check for is_kernel_addr(addr) and if its not 
> kernel_text_address()
> then we have an interesting case of a branch from something not text.
> It would be nice to catch such cases.

Something like this?

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 1538129..627af56 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -410,8 +410,13 @@ static __u64 power_pmu_bhrb_to(u64 addr)
 int ret;
 __u64 target;
 
-    if (is_kernel_addr(addr))
-        return branch_target((unsigned int *)addr);
+    if (is_kernel_addr(addr)) {
+        if (probe_kernel_address((void *)addr, instr)) {
+            WARN_ON(!__kernel_text_address(addr));
+            return 0;
+        }
+        return branch_target();
+    }
 
 /* Userspace: need copy instruction here then translate it */
 pagefault_disable();


I think this will throw warnings when you try to read recently unmapped
vmalloced address. Is that fine?

Thanks for the review.
Ravi

>> +   if (probe_kernel_address((void *)addr, instr))
>> +   return 0;
>> +   return branch_target();
>> +   }
>>
>> /* Userspace: need copy instruction here then translate it */
>> pagefault_disable();
> Otherwise,
>
> Reviewed-by: Balbir Singh 
>



Re: [alsa-devel] [trivial PATCH] treewide: Align function definition open/close braces

2017-12-18 Thread Takashi Iwai
On Mon, 18 Dec 2017 01:28:44 +0100,
Joe Perches wrote:
> 
> Some functions definitions have either the initial open brace and/or
> the closing brace outside of column 1.
> 
> Move those braces to column 1.
> 
> This allows various function analyzers like gnu complexity to work
> properly for these modified functions.
> 
> Miscellanea:
> 
> o Remove extra trailing ; and blank line from xfs_agf_verify
> 
> Signed-off-by: Joe Perches 
> ---
> git diff -w shows no difference other than the above 'Miscellanea'
> 
> (this is against -next, but it applies against Linus' tree
>  with a couple offsets)
> 
>  arch/x86/include/asm/atomic64_32.h   |  2 +-
>  drivers/acpi/custom_method.c |  2 +-
>  drivers/acpi/fan.c   |  2 +-
>  drivers/gpu/drm/amd/display/dc/core/dc.c |  2 +-
>  drivers/media/i2c/msp3400-kthreads.c |  2 +-
>  drivers/message/fusion/mptsas.c  |  2 +-
>  drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c |  2 +-
>  drivers/net/wireless/ath/ath9k/xmit.c|  2 +-
>  drivers/platform/x86/eeepc-laptop.c  |  2 +-
>  drivers/rtc/rtc-ab-b5ze-s3.c |  2 +-
>  drivers/scsi/dpt_i2o.c   |  2 +-
>  drivers/scsi/sym53c8xx_2/sym_glue.c  |  2 +-
>  fs/locks.c   |  2 +-
>  fs/ocfs2/stack_user.c|  2 +-
>  fs/xfs/libxfs/xfs_alloc.c|  5 ++---
>  fs/xfs/xfs_export.c  |  2 +-
>  kernel/audit.c   |  6 +++---
>  kernel/trace/trace_printk.c  |  4 ++--
>  lib/raid6/sse2.c | 14 +++---
>  sound/soc/fsl/fsl_dma.c  |  2 +-

For sound bits,
  Acked-by: Takashi Iwai 


thanks,

Takashi


Re: [alsa-devel] [trivial PATCH] treewide: Align function definition open/close braces

2017-12-18 Thread Takashi Iwai
On Mon, 18 Dec 2017 01:28:44 +0100,
Joe Perches wrote:
> 
> Some functions definitions have either the initial open brace and/or
> the closing brace outside of column 1.
> 
> Move those braces to column 1.
> 
> This allows various function analyzers like gnu complexity to work
> properly for these modified functions.
> 
> Miscellanea:
> 
> o Remove extra trailing ; and blank line from xfs_agf_verify
> 
> Signed-off-by: Joe Perches 
> ---
> git diff -w shows no difference other than the above 'Miscellanea'
> 
> (this is against -next, but it applies against Linus' tree
>  with a couple offsets)
> 
>  arch/x86/include/asm/atomic64_32.h   |  2 +-
>  drivers/acpi/custom_method.c |  2 +-
>  drivers/acpi/fan.c   |  2 +-
>  drivers/gpu/drm/amd/display/dc/core/dc.c |  2 +-
>  drivers/media/i2c/msp3400-kthreads.c |  2 +-
>  drivers/message/fusion/mptsas.c  |  2 +-
>  drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c |  2 +-
>  drivers/net/wireless/ath/ath9k/xmit.c|  2 +-
>  drivers/platform/x86/eeepc-laptop.c  |  2 +-
>  drivers/rtc/rtc-ab-b5ze-s3.c |  2 +-
>  drivers/scsi/dpt_i2o.c   |  2 +-
>  drivers/scsi/sym53c8xx_2/sym_glue.c  |  2 +-
>  fs/locks.c   |  2 +-
>  fs/ocfs2/stack_user.c|  2 +-
>  fs/xfs/libxfs/xfs_alloc.c|  5 ++---
>  fs/xfs/xfs_export.c  |  2 +-
>  kernel/audit.c   |  6 +++---
>  kernel/trace/trace_printk.c  |  4 ++--
>  lib/raid6/sse2.c | 14 +++---
>  sound/soc/fsl/fsl_dma.c  |  2 +-

For sound bits,
  Acked-by: Takashi Iwai 


thanks,

Takashi


Re: [PATCH v3 07/33] nds32: MMU initialization

2017-12-18 Thread Greentime Hu
Hi, Guo Ren:

2017-12-18 20:22 GMT+08:00 Guo Ren :
> On Mon, Dec 18, 2017 at 07:21:30PM +0800, Greentime Hu wrote:
>> Hi, Guo Ren:
>>
>> 2017-12-18 17:08 GMT+08:00 Guo Ren :
>> > Hi Greentime,
>> >
>> > On Fri, Dec 08, 2017 at 05:11:50PM +0800, Greentime Hu wrote:
>> > [...]
>> >>
>> >> diff --git a/arch/nds32/mm/highmem.c b/arch/nds32/mm/highmem.c
>> > [...]
>> >> +void *kmap(struct page *page)
>> >> +{
>> >> + unsigned long vaddr;
>> >> + might_sleep();
>> >> + if (!PageHighMem(page))
>> >> + return page_address(page);
>> >> + vaddr = (unsigned long)kmap_high(page);
>> > Here should invalid the cpu_mmu_tlb's entry, Or invalid it in the
>> > set_pte().
>> >
>> > eg:
>> > vaddr0 = kmap(page0)
>> > *vaddr0 = val0 //It will cause tlb-miss, and hard-refill to MMU-tlb
>> > kunmap(page0)
>> > vaddr1 = kmap(page1) // Mostly vaddr1 = vaddr0
>> > val = vaddr1; //No tlb-miss and it will get page0's val not page1, because
>> > last expired vaddr0's entry is left in CPU-MMU-tlb.
>> >
>>
>> Thanks.
>> I will add __nds32__tlbop_inv(vaddr); to invalidate this mapping
>> before retrun vaddr.
>
> Sorry, perhaps I'm wrong. See
> kmap->kmap_high->map_new_virtual->get_next_pkmap_nr(color).
>
> Seems pkmap will return the vaddr by vaddr + 1 until
> no_more_pkmaps(), and then flush_all_zero_pkmaps.
> Just kmap_atomic need it, and you've done.

Thanks for double checking this case. :)
As you said, it will flush tlb in the generic code flow.


Re: [PATCH v3 07/33] nds32: MMU initialization

2017-12-18 Thread Greentime Hu
Hi, Guo Ren:

2017-12-18 20:22 GMT+08:00 Guo Ren :
> On Mon, Dec 18, 2017 at 07:21:30PM +0800, Greentime Hu wrote:
>> Hi, Guo Ren:
>>
>> 2017-12-18 17:08 GMT+08:00 Guo Ren :
>> > Hi Greentime,
>> >
>> > On Fri, Dec 08, 2017 at 05:11:50PM +0800, Greentime Hu wrote:
>> > [...]
>> >>
>> >> diff --git a/arch/nds32/mm/highmem.c b/arch/nds32/mm/highmem.c
>> > [...]
>> >> +void *kmap(struct page *page)
>> >> +{
>> >> + unsigned long vaddr;
>> >> + might_sleep();
>> >> + if (!PageHighMem(page))
>> >> + return page_address(page);
>> >> + vaddr = (unsigned long)kmap_high(page);
>> > Here should invalid the cpu_mmu_tlb's entry, Or invalid it in the
>> > set_pte().
>> >
>> > eg:
>> > vaddr0 = kmap(page0)
>> > *vaddr0 = val0 //It will cause tlb-miss, and hard-refill to MMU-tlb
>> > kunmap(page0)
>> > vaddr1 = kmap(page1) // Mostly vaddr1 = vaddr0
>> > val = vaddr1; //No tlb-miss and it will get page0's val not page1, because
>> > last expired vaddr0's entry is left in CPU-MMU-tlb.
>> >
>>
>> Thanks.
>> I will add __nds32__tlbop_inv(vaddr); to invalidate this mapping
>> before retrun vaddr.
>
> Sorry, perhaps I'm wrong. See
> kmap->kmap_high->map_new_virtual->get_next_pkmap_nr(color).
>
> Seems pkmap will return the vaddr by vaddr + 1 until
> no_more_pkmaps(), and then flush_all_zero_pkmaps.
> Just kmap_atomic need it, and you've done.

Thanks for double checking this case. :)
As you said, it will flush tlb in the generic code flow.


Re: [PATCH 0/2] arm64 SMMUv3 PMU driver with IORT support

2017-12-18 Thread Linu Cherian
Hi Marc,

On Mon Dec 18, 2017 at 03:39:22PM +, Marc Zyngier wrote:
> Thanks for putting me in the loop Robin.
> 
> On 18/12/17 14:48, Robin Murphy wrote:
> > On 10/12/17 02:35, Linu Cherian wrote:
> >> Hi,
> >>
> >>
> >> On Fri Aug 04, 2017 at 03:59:12PM -0400, Neil Leeder wrote:
> >>> This adds a driver for the SMMUv3 PMU into the perf framework.
> >>> It includes an IORT update to support PM Counter Groups.
> >>>
> >>
> >> In one of Cavium's upcoming SOC, SMMU PMCG implementation is such a way
> >> that platform bus id (Device ID in ITS terminmology)is shared with that of 
> >> SMMU.
> >> This would be a matter of concern for software if the SMMU and SMMU PMCG 
> >> blocks
> >> are managed by two independent drivers.
> >>
> >> The problem arises when we want to alloc/free MSIs for these devices
> >> using the APIs, platform_msi_domain_alloc/free_irqs.
> >> Platform bus id being same for these two hardware blocks, they end up 
> >> sharing the same
> >> ITT(Interrupt Translation Table) in GIC ITS and hence alloc, free and 
> >> management
> >> of this shared ITT becomes a problem when they are managed by two 
> >> independent
> >> drivers.
> > 
> > What is the problem exactly? IIRC resizing a possibly-live ITT is a 
> > right pain in the bum to do - is it just that?
> 
> Understatement of the day! ;-) Yes, it is a massive headache, and will
> either result in a temporary loss of interrupts (at some point you have
> to unmap the ITT), or with spurious interrupts (you generate interrupts
> for all the MSIs you've blackholed when unmapping the ITT).

Just sharing a thought.

If the firmware, through device tree/ACPI 
can provide the following input to the ITS driver,
(For example, as part of msi-parent property in device tree)

1. flag indicating the ITT (Device ID) is being shared 
2. the maximum number of vectors that are required to be allocated for this ITT

resizing of ITT and the associated complexities could be avoided.   

 
> 
> > 
> >> We were looking into the option of keeping the SMMU PMCG nodes as sub 
> >> nodes under
> >> SMMUv3 node, so that SMMUv3 driver could probe and figure out the total 
> >> vectors
> >> required for SMMU PMCG devices and make a common 
> >> platform_msi_domain_alloc/free_irqs
> >> function call for all devices that share the platform bus id.
> > 
> > I'm not sure how scalable that approach would be, since it's not 
> > entirely obvious how to handle PMCGs associated with named components or 
> > root complexes (rather than directly with SMMU instances). We certainly 
> > don't want to end up spraying similar PMCG DevID logic around all manner 
> > of GPU/accelerator/etc. drivers in future (whilst PMCGs for device TLBs 
> > will be expected to have distinct IDs from their host devices, they 
> > could reasonably still overlap with other PMCGs/SMMUs).
> > 
> >> Would like to know your expert opinion on what would be the right approach
> >> to handle this case ?
> > 
> > My gut feeling says the way to deal with this properly is in the ITS 
> > code, but I appreciate that that's a lot easier said than done :/
> 
> I can revive the hack I once wrote for that (and that was hoping to
> forever forget), but that's pretty disgusting, and subject to the above
> caveat.
> 
> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...

-- 
Linu cherian


Re: [PATCH 0/2] arm64 SMMUv3 PMU driver with IORT support

2017-12-18 Thread Linu Cherian
Hi Marc,

On Mon Dec 18, 2017 at 03:39:22PM +, Marc Zyngier wrote:
> Thanks for putting me in the loop Robin.
> 
> On 18/12/17 14:48, Robin Murphy wrote:
> > On 10/12/17 02:35, Linu Cherian wrote:
> >> Hi,
> >>
> >>
> >> On Fri Aug 04, 2017 at 03:59:12PM -0400, Neil Leeder wrote:
> >>> This adds a driver for the SMMUv3 PMU into the perf framework.
> >>> It includes an IORT update to support PM Counter Groups.
> >>>
> >>
> >> In one of Cavium's upcoming SOC, SMMU PMCG implementation is such a way
> >> that platform bus id (Device ID in ITS terminmology)is shared with that of 
> >> SMMU.
> >> This would be a matter of concern for software if the SMMU and SMMU PMCG 
> >> blocks
> >> are managed by two independent drivers.
> >>
> >> The problem arises when we want to alloc/free MSIs for these devices
> >> using the APIs, platform_msi_domain_alloc/free_irqs.
> >> Platform bus id being same for these two hardware blocks, they end up 
> >> sharing the same
> >> ITT(Interrupt Translation Table) in GIC ITS and hence alloc, free and 
> >> management
> >> of this shared ITT becomes a problem when they are managed by two 
> >> independent
> >> drivers.
> > 
> > What is the problem exactly? IIRC resizing a possibly-live ITT is a 
> > right pain in the bum to do - is it just that?
> 
> Understatement of the day! ;-) Yes, it is a massive headache, and will
> either result in a temporary loss of interrupts (at some point you have
> to unmap the ITT), or with spurious interrupts (you generate interrupts
> for all the MSIs you've blackholed when unmapping the ITT).

Just sharing a thought.

If the firmware, through device tree/ACPI 
can provide the following input to the ITS driver,
(For example, as part of msi-parent property in device tree)

1. flag indicating the ITT (Device ID) is being shared 
2. the maximum number of vectors that are required to be allocated for this ITT

resizing of ITT and the associated complexities could be avoided.   

 
> 
> > 
> >> We were looking into the option of keeping the SMMU PMCG nodes as sub 
> >> nodes under
> >> SMMUv3 node, so that SMMUv3 driver could probe and figure out the total 
> >> vectors
> >> required for SMMU PMCG devices and make a common 
> >> platform_msi_domain_alloc/free_irqs
> >> function call for all devices that share the platform bus id.
> > 
> > I'm not sure how scalable that approach would be, since it's not 
> > entirely obvious how to handle PMCGs associated with named components or 
> > root complexes (rather than directly with SMMU instances). We certainly 
> > don't want to end up spraying similar PMCG DevID logic around all manner 
> > of GPU/accelerator/etc. drivers in future (whilst PMCGs for device TLBs 
> > will be expected to have distinct IDs from their host devices, they 
> > could reasonably still overlap with other PMCGs/SMMUs).
> > 
> >> Would like to know your expert opinion on what would be the right approach
> >> to handle this case ?
> > 
> > My gut feeling says the way to deal with this properly is in the ITS 
> > code, but I appreciate that that's a lot easier said than done :/
> 
> I can revive the hack I once wrote for that (and that was hoping to
> forever forget), but that's pretty disgusting, and subject to the above
> caveat.
> 
> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...

-- 
Linu cherian


Re: [PATCH v6] mfd: syscon: Add hardware spinlock support

2017-12-18 Thread Baolin Wang
On 18 December 2017 at 20:44, Arnd Bergmann  wrote:
> On Mon, Dec 18, 2017 at 7:54 AM, Baolin Wang  wrote:
>> On 15 December 2017 at 21:13, Arnd Bergmann  wrote:
>>> On Fri, Dec 15, 2017 at 11:42 AM, Lee Jones  wrote:
>>>
> @@ -87,6 +88,30 @@ static struct syscon *of_syscon_register(struct 
> device_node *np)
>   if (ret)
>   reg_io_width = 4;
>
> + ret = of_hwspin_lock_get_id(np, 0);
> + if (ret > 0) {
> + syscon_config.hwlock_id = ret;
> + syscon_config.hwlock_mode = HWLOCK_IRQSTATE;
> + } else {
> + switch (ret) {
> + case -ENOENT:
> + /* Ignore missing hwlock, it's optional. */
> + break;
> + case 0:
> + /* In case of the HWSPINLOCK is not enabled. */
> + if (!IS_ENABLED(CONFIG_HWSPINLOCK))
> + break;
> +
> + ret = -EINVAL;
> + /* fall-through */
> + default:
> + pr_err("Failed to retrieve valid hwlock: %d\n", 
> ret);
> + /* fall-through */
> + case -EPROBE_DEFER:
> + goto err_regmap;
> + }
>>>
>>> The 'case 0' seems odd here, are we sure that this is always a failure?
>>> From the of_hwspin_lock_get_id() definition it looks like zero might
>>> be valid, and the !CONFIG_HWSPINLOCK implementation appears
>>> to be written so that we should consider '0' valid but unused and
>>> silently continue with that. If that is generally not the intended
>>> use, it should probably return -EINVAL or something like that.
>>
>> Yes, 0 is valid for of_hwspin_lock_get_id(), but if we pass 'hwlock id
>> = 0' to regmap, the regmap core will not regard it as a valid hwlock
>> id to request the hwlock and will use default mutex lock instead of
>> hwlock, which will cause problems. Meanwhile if we silently continue
>> with case 0, users will not realize that they set one invalid hwlock
>> id to regmap core, so here we regarded case 0 as one invalid id to
>> print error messages for users.
>
> Something else still seems wrong then: If regmap doesn't accept a zero
> lock-id, then of_hwspin_lock_get_id() should never return that as a
> valid ID, right?

Um, why regmap doesn't accept a zero lock-id, that because regmap will
reguest hwlock depending on the 'regmap_config->hwlock_id' is not
zero, if regmap regard a zero lock-id as valid which will affect other
'struct regmap_config' definition. So users should not assign the zero
lock-id to regmap.

Now of_hwspin_lock_get_id() can return 0 as valid, which depend on
what is the base id registered by hwspinlock driver. So you think we
should not regard 0 as valid from of_hwspin_lock_get_id(), I can try
to send another patch to fix.

But for this patch I still think we need regard the zero lock-id as
invalid and gave error messages to users.

-- 
Baolin.wang
Best Regards


Re: [PATCH v6] mfd: syscon: Add hardware spinlock support

2017-12-18 Thread Baolin Wang
On 18 December 2017 at 20:44, Arnd Bergmann  wrote:
> On Mon, Dec 18, 2017 at 7:54 AM, Baolin Wang  wrote:
>> On 15 December 2017 at 21:13, Arnd Bergmann  wrote:
>>> On Fri, Dec 15, 2017 at 11:42 AM, Lee Jones  wrote:
>>>
> @@ -87,6 +88,30 @@ static struct syscon *of_syscon_register(struct 
> device_node *np)
>   if (ret)
>   reg_io_width = 4;
>
> + ret = of_hwspin_lock_get_id(np, 0);
> + if (ret > 0) {
> + syscon_config.hwlock_id = ret;
> + syscon_config.hwlock_mode = HWLOCK_IRQSTATE;
> + } else {
> + switch (ret) {
> + case -ENOENT:
> + /* Ignore missing hwlock, it's optional. */
> + break;
> + case 0:
> + /* In case of the HWSPINLOCK is not enabled. */
> + if (!IS_ENABLED(CONFIG_HWSPINLOCK))
> + break;
> +
> + ret = -EINVAL;
> + /* fall-through */
> + default:
> + pr_err("Failed to retrieve valid hwlock: %d\n", 
> ret);
> + /* fall-through */
> + case -EPROBE_DEFER:
> + goto err_regmap;
> + }
>>>
>>> The 'case 0' seems odd here, are we sure that this is always a failure?
>>> From the of_hwspin_lock_get_id() definition it looks like zero might
>>> be valid, and the !CONFIG_HWSPINLOCK implementation appears
>>> to be written so that we should consider '0' valid but unused and
>>> silently continue with that. If that is generally not the intended
>>> use, it should probably return -EINVAL or something like that.
>>
>> Yes, 0 is valid for of_hwspin_lock_get_id(), but if we pass 'hwlock id
>> = 0' to regmap, the regmap core will not regard it as a valid hwlock
>> id to request the hwlock and will use default mutex lock instead of
>> hwlock, which will cause problems. Meanwhile if we silently continue
>> with case 0, users will not realize that they set one invalid hwlock
>> id to regmap core, so here we regarded case 0 as one invalid id to
>> print error messages for users.
>
> Something else still seems wrong then: If regmap doesn't accept a zero
> lock-id, then of_hwspin_lock_get_id() should never return that as a
> valid ID, right?

Um, why regmap doesn't accept a zero lock-id, that because regmap will
reguest hwlock depending on the 'regmap_config->hwlock_id' is not
zero, if regmap regard a zero lock-id as valid which will affect other
'struct regmap_config' definition. So users should not assign the zero
lock-id to regmap.

Now of_hwspin_lock_get_id() can return 0 as valid, which depend on
what is the base id registered by hwspinlock driver. So you think we
should not regard 0 as valid from of_hwspin_lock_get_id(), I can try
to send another patch to fix.

But for this patch I still think we need regard the zero lock-id as
invalid and gave error messages to users.

-- 
Baolin.wang
Best Regards


[PATCH 3/3] i2c: mediatek: Enable i2c module clock before i2c registers access.

2017-12-18 Thread Jun Gao
From: Jun Gao 

Make sure i2c module clock has been enabled before i2c registers
access.

Signed-off-by: Jun Gao 
---
 drivers/i2c/busses/i2c-mt65xx.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
index 58d6401..cf23a74 100644
--- a/drivers/i2c/busses/i2c-mt65xx.c
+++ b/drivers/i2c/busses/i2c-mt65xx.c
@@ -861,10 +861,19 @@ static int mtk_i2c_remove(struct platform_device *pdev)
 #ifdef CONFIG_PM_SLEEP
 static int mtk_i2c_resume(struct device *dev)
 {
+   int ret;
struct mtk_i2c *i2c = dev_get_drvdata(dev);
 
+   ret = mtk_i2c_clock_enable(i2c);
+   if (ret) {
+   dev_err(dev, "clock enable failed!\n");
+   return ret;
+   }
+
mtk_i2c_init_hw(i2c);
 
+   mtk_i2c_clock_disable(i2c);
+
return 0;
 }
 #endif
-- 
1.8.1.1



[PATCH 3/3] i2c: mediatek: Enable i2c module clock before i2c registers access.

2017-12-18 Thread Jun Gao
From: Jun Gao 

Make sure i2c module clock has been enabled before i2c registers
access.

Signed-off-by: Jun Gao 
---
 drivers/i2c/busses/i2c-mt65xx.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
index 58d6401..cf23a74 100644
--- a/drivers/i2c/busses/i2c-mt65xx.c
+++ b/drivers/i2c/busses/i2c-mt65xx.c
@@ -861,10 +861,19 @@ static int mtk_i2c_remove(struct platform_device *pdev)
 #ifdef CONFIG_PM_SLEEP
 static int mtk_i2c_resume(struct device *dev)
 {
+   int ret;
struct mtk_i2c *i2c = dev_get_drvdata(dev);
 
+   ret = mtk_i2c_clock_enable(i2c);
+   if (ret) {
+   dev_err(dev, "clock enable failed!\n");
+   return ret;
+   }
+
mtk_i2c_init_hw(i2c);
 
+   mtk_i2c_clock_disable(i2c);
+
return 0;
 }
 #endif
-- 
1.8.1.1



[PATCH 0/3] Add i2c dt-binding and compatible for Mediatek MT2712

2017-12-18 Thread Jun Gao
This patch series based on v4.15-rc1, include MT2712 i2c dt-binding, compatible
and i2c module clock enable.

Jun Gao (3):
  dt-bindings: i2c: Add MediaTek MT2712 i2c binding
  i2c: mediatek: Add i2c compatible for MediaTek MT2712
  i2c: mediatek: Enable i2c module clock before i2c registers access.

 Documentation/devicetree/bindings/i2c/i2c-mtk.txt |  1 +
 drivers/i2c/busses/i2c-mt65xx.c   | 40 ---
 2 files changed, 37 insertions(+), 4 deletions(-)

--
1.8.1.1



[PATCH 1/3] dt-bindings: i2c: Add MediaTek MT2712 i2c binding

2017-12-18 Thread Jun Gao
From: Jun Gao 

Add MT2712 i2c binding to binding file. Compare to MT8173 i2c
controller, MT2712 has timing adjust registers which can adjust
the internal divider of i2c source clock, SCL duty cycle, SCL
compare point, start(repeated start) and stop time, SDA change
time.

Signed-off-by: Jun Gao 
---
 Documentation/devicetree/bindings/i2c/i2c-mtk.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/i2c/i2c-mtk.txt 
b/Documentation/devicetree/bindings/i2c/i2c-mtk.txt
index ff7bf37..e199695 100644
--- a/Documentation/devicetree/bindings/i2c/i2c-mtk.txt
+++ b/Documentation/devicetree/bindings/i2c/i2c-mtk.txt
@@ -5,6 +5,7 @@ The MediaTek's I2C controller is used to interface with I2C 
devices.
 Required properties:
   - compatible: value should be either of the following.
   "mediatek,mt2701-i2c", "mediatek,mt6577-i2c": for MediaTek MT2701
+  "mediatek,mt2712-i2c": for MediaTek MT2712
   "mediatek,mt6577-i2c": for MediaTek MT6577
   "mediatek,mt6589-i2c": for MediaTek MT6589
   "mediatek,mt7622-i2c": for MediaTek MT7622
-- 
1.8.1.1



[PATCH 0/3] Add i2c dt-binding and compatible for Mediatek MT2712

2017-12-18 Thread Jun Gao
This patch series based on v4.15-rc1, include MT2712 i2c dt-binding, compatible
and i2c module clock enable.

Jun Gao (3):
  dt-bindings: i2c: Add MediaTek MT2712 i2c binding
  i2c: mediatek: Add i2c compatible for MediaTek MT2712
  i2c: mediatek: Enable i2c module clock before i2c registers access.

 Documentation/devicetree/bindings/i2c/i2c-mtk.txt |  1 +
 drivers/i2c/busses/i2c-mt65xx.c   | 40 ---
 2 files changed, 37 insertions(+), 4 deletions(-)

--
1.8.1.1



[PATCH 1/3] dt-bindings: i2c: Add MediaTek MT2712 i2c binding

2017-12-18 Thread Jun Gao
From: Jun Gao 

Add MT2712 i2c binding to binding file. Compare to MT8173 i2c
controller, MT2712 has timing adjust registers which can adjust
the internal divider of i2c source clock, SCL duty cycle, SCL
compare point, start(repeated start) and stop time, SDA change
time.

Signed-off-by: Jun Gao 
---
 Documentation/devicetree/bindings/i2c/i2c-mtk.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/i2c/i2c-mtk.txt 
b/Documentation/devicetree/bindings/i2c/i2c-mtk.txt
index ff7bf37..e199695 100644
--- a/Documentation/devicetree/bindings/i2c/i2c-mtk.txt
+++ b/Documentation/devicetree/bindings/i2c/i2c-mtk.txt
@@ -5,6 +5,7 @@ The MediaTek's I2C controller is used to interface with I2C 
devices.
 Required properties:
   - compatible: value should be either of the following.
   "mediatek,mt2701-i2c", "mediatek,mt6577-i2c": for MediaTek MT2701
+  "mediatek,mt2712-i2c": for MediaTek MT2712
   "mediatek,mt6577-i2c": for MediaTek MT6577
   "mediatek,mt6589-i2c": for MediaTek MT6589
   "mediatek,mt7622-i2c": for MediaTek MT7622
-- 
1.8.1.1



[PATCH 2/3] i2c: mediatek: Add i2c compatible for MediaTek MT2712

2017-12-18 Thread Jun Gao
From: Jun Gao 

Add i2c compatible for MT2712. Compare to MT8173 i2c controller,
internal divider of i2c source clock need to be configured for
MT2712 i2c speed calculation.

Signed-off-by: Jun Gao 
---
 drivers/i2c/busses/i2c-mt65xx.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
index 09d288c..58d6401 100644
--- a/drivers/i2c/busses/i2c-mt65xx.c
+++ b/drivers/i2c/busses/i2c-mt65xx.c
@@ -61,6 +61,7 @@
 #define I2C_DMA_HARD_RST   0x0002
 #define I2C_DMA_4G_MODE0x0001
 
+#define I2C_DEFAULT_CLK_DIV5
 #define I2C_DEFAULT_SPEED  10  /* hz */
 #define MAX_FS_MODE_SPEED  40
 #define MAX_HS_MODE_SPEED  340
@@ -127,6 +128,7 @@ enum I2C_REGS_OFFSET {
OFFSET_DEBUGSTAT = 0x64,
OFFSET_DEBUGCTRL = 0x68,
OFFSET_TRANSFER_LEN_AUX = 0x6c,
+   OFFSET_CLOCK_DIV = 0x70,
 };
 
 struct mtk_i2c_compatible {
@@ -136,6 +138,7 @@ struct mtk_i2c_compatible {
unsigned char auto_restart: 1;
unsigned char aux_len_reg: 1;
unsigned char support_33bits: 1;
+   unsigned char timing_adjust: 1;
 };
 
 struct mtk_i2c {
@@ -176,6 +179,15 @@ struct mtk_i2c {
.max_num_msgs = 255,
 };
 
+static const struct mtk_i2c_compatible mt2712_compat = {
+   .pmic_i2c = 0,
+   .dcm = 1,
+   .auto_restart = 1,
+   .aux_len_reg = 1,
+   .support_33bits = 1,
+   .timing_adjust = 1,
+};
+
 static const struct mtk_i2c_compatible mt6577_compat = {
.quirks = _i2c_quirks,
.pmic_i2c = 0,
@@ -183,6 +195,7 @@ struct mtk_i2c {
.auto_restart = 0,
.aux_len_reg = 0,
.support_33bits = 0,
+   .timing_adjust = 0,
 };
 
 static const struct mtk_i2c_compatible mt6589_compat = {
@@ -192,6 +205,7 @@ struct mtk_i2c {
.auto_restart = 0,
.aux_len_reg = 0,
.support_33bits = 0,
+   .timing_adjust = 0,
 };
 
 static const struct mtk_i2c_compatible mt7622_compat = {
@@ -201,6 +215,7 @@ struct mtk_i2c {
.auto_restart = 1,
.aux_len_reg = 1,
.support_33bits = 0,
+   .timing_adjust = 0,
 };
 
 static const struct mtk_i2c_compatible mt8173_compat = {
@@ -209,9 +224,11 @@ struct mtk_i2c {
.auto_restart = 1,
.aux_len_reg = 1,
.support_33bits = 1,
+   .timing_adjust = 0,
 };
 
 static const struct of_device_id mtk_i2c_of_match[] = {
+   { .compatible = "mediatek,mt2712-i2c", .data = _compat },
{ .compatible = "mediatek,mt6577-i2c", .data = _compat },
{ .compatible = "mediatek,mt6589-i2c", .data = _compat },
{ .compatible = "mediatek,mt7622-i2c", .data = _compat },
@@ -271,6 +288,9 @@ static void mtk_i2c_init_hw(struct mtk_i2c *i2c)
if (i2c->dev_comp->dcm)
writew(I2C_DCM_DISABLE, i2c->base + OFFSET_DCM_EN);
 
+   if (i2c->dev_comp->timing_adjust)
+   writew(I2C_DEFAULT_CLK_DIV - 1, i2c->base + OFFSET_CLOCK_DIV);
+
writew(i2c->timing_reg, i2c->base + OFFSET_TIMING);
writew(i2c->high_speed_reg, i2c->base + OFFSET_HS);
 
@@ -725,10 +745,6 @@ static int mtk_i2c_probe(struct platform_device *pdev)
if (!i2c)
return -ENOMEM;
 
-   ret = mtk_i2c_parse_dt(pdev->dev.of_node, i2c);
-   if (ret)
-   return -EINVAL;
-
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
i2c->base = devm_ioremap_resource(>dev, res);
if (IS_ERR(i2c->base))
@@ -759,6 +775,13 @@ static int mtk_i2c_probe(struct platform_device *pdev)
i2c->adap.timeout = 2 * HZ;
i2c->adap.retries = 1;
 
+   ret = mtk_i2c_parse_dt(pdev->dev.of_node, i2c);
+   if (ret)
+   return -EINVAL;
+
+   if (i2c->dev_comp->timing_adjust)
+   i2c->clk_src_div *= I2C_DEFAULT_CLK_DIV;
+
if (i2c->have_pmic && !i2c->dev_comp->pmic_i2c)
return -EINVAL;
 
-- 
1.8.1.1



[PATCH 2/3] i2c: mediatek: Add i2c compatible for MediaTek MT2712

2017-12-18 Thread Jun Gao
From: Jun Gao 

Add i2c compatible for MT2712. Compare to MT8173 i2c controller,
internal divider of i2c source clock need to be configured for
MT2712 i2c speed calculation.

Signed-off-by: Jun Gao 
---
 drivers/i2c/busses/i2c-mt65xx.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
index 09d288c..58d6401 100644
--- a/drivers/i2c/busses/i2c-mt65xx.c
+++ b/drivers/i2c/busses/i2c-mt65xx.c
@@ -61,6 +61,7 @@
 #define I2C_DMA_HARD_RST   0x0002
 #define I2C_DMA_4G_MODE0x0001
 
+#define I2C_DEFAULT_CLK_DIV5
 #define I2C_DEFAULT_SPEED  10  /* hz */
 #define MAX_FS_MODE_SPEED  40
 #define MAX_HS_MODE_SPEED  340
@@ -127,6 +128,7 @@ enum I2C_REGS_OFFSET {
OFFSET_DEBUGSTAT = 0x64,
OFFSET_DEBUGCTRL = 0x68,
OFFSET_TRANSFER_LEN_AUX = 0x6c,
+   OFFSET_CLOCK_DIV = 0x70,
 };
 
 struct mtk_i2c_compatible {
@@ -136,6 +138,7 @@ struct mtk_i2c_compatible {
unsigned char auto_restart: 1;
unsigned char aux_len_reg: 1;
unsigned char support_33bits: 1;
+   unsigned char timing_adjust: 1;
 };
 
 struct mtk_i2c {
@@ -176,6 +179,15 @@ struct mtk_i2c {
.max_num_msgs = 255,
 };
 
+static const struct mtk_i2c_compatible mt2712_compat = {
+   .pmic_i2c = 0,
+   .dcm = 1,
+   .auto_restart = 1,
+   .aux_len_reg = 1,
+   .support_33bits = 1,
+   .timing_adjust = 1,
+};
+
 static const struct mtk_i2c_compatible mt6577_compat = {
.quirks = _i2c_quirks,
.pmic_i2c = 0,
@@ -183,6 +195,7 @@ struct mtk_i2c {
.auto_restart = 0,
.aux_len_reg = 0,
.support_33bits = 0,
+   .timing_adjust = 0,
 };
 
 static const struct mtk_i2c_compatible mt6589_compat = {
@@ -192,6 +205,7 @@ struct mtk_i2c {
.auto_restart = 0,
.aux_len_reg = 0,
.support_33bits = 0,
+   .timing_adjust = 0,
 };
 
 static const struct mtk_i2c_compatible mt7622_compat = {
@@ -201,6 +215,7 @@ struct mtk_i2c {
.auto_restart = 1,
.aux_len_reg = 1,
.support_33bits = 0,
+   .timing_adjust = 0,
 };
 
 static const struct mtk_i2c_compatible mt8173_compat = {
@@ -209,9 +224,11 @@ struct mtk_i2c {
.auto_restart = 1,
.aux_len_reg = 1,
.support_33bits = 1,
+   .timing_adjust = 0,
 };
 
 static const struct of_device_id mtk_i2c_of_match[] = {
+   { .compatible = "mediatek,mt2712-i2c", .data = _compat },
{ .compatible = "mediatek,mt6577-i2c", .data = _compat },
{ .compatible = "mediatek,mt6589-i2c", .data = _compat },
{ .compatible = "mediatek,mt7622-i2c", .data = _compat },
@@ -271,6 +288,9 @@ static void mtk_i2c_init_hw(struct mtk_i2c *i2c)
if (i2c->dev_comp->dcm)
writew(I2C_DCM_DISABLE, i2c->base + OFFSET_DCM_EN);
 
+   if (i2c->dev_comp->timing_adjust)
+   writew(I2C_DEFAULT_CLK_DIV - 1, i2c->base + OFFSET_CLOCK_DIV);
+
writew(i2c->timing_reg, i2c->base + OFFSET_TIMING);
writew(i2c->high_speed_reg, i2c->base + OFFSET_HS);
 
@@ -725,10 +745,6 @@ static int mtk_i2c_probe(struct platform_device *pdev)
if (!i2c)
return -ENOMEM;
 
-   ret = mtk_i2c_parse_dt(pdev->dev.of_node, i2c);
-   if (ret)
-   return -EINVAL;
-
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
i2c->base = devm_ioremap_resource(>dev, res);
if (IS_ERR(i2c->base))
@@ -759,6 +775,13 @@ static int mtk_i2c_probe(struct platform_device *pdev)
i2c->adap.timeout = 2 * HZ;
i2c->adap.retries = 1;
 
+   ret = mtk_i2c_parse_dt(pdev->dev.of_node, i2c);
+   if (ret)
+   return -EINVAL;
+
+   if (i2c->dev_comp->timing_adjust)
+   i2c->clk_src_div *= I2C_DEFAULT_CLK_DIV;
+
if (i2c->have_pmic && !i2c->dev_comp->pmic_i2c)
return -EINVAL;
 
-- 
1.8.1.1



Re: [PATCH] KVM/Eventfd: Avoid crash when assign and deassign same eventfd in parallel.

2017-12-18 Thread Lan Tianyu
On 2017年12月18日 16:50, Paolo Bonzini wrote:
> On 18/12/2017 09:30, David Hildenbrand wrote:
>> The ugly thing in kvm_irqfd_assign() is that we access irqfd without
>> holding a lock. I think that should rather be fixed than working around
>> that issue. (e.g. lock() -> lookup again -> verify still in list ->
>> unlock())
> 
> I wonder if it's even simpler:
> 
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index f2ac53ab8243..17ed298bd66f 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -387,7 +387,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  
>   idx = srcu_read_lock(>irq_srcu);
>   irqfd_update(kvm, irqfd);
> - srcu_read_unlock(>irq_srcu, idx);
>  
>   list_add_tail(>list, >irqfds.items);
>  
> @@ -420,10 +419,12 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd 
> *args)
>   irqfd->consumer.token, ret);
>   }
>  #endif
> + srcu_read_unlock(>irq_srcu, idx);
>  
>   return 0;
>  
>  fail:
> + /* irq_srcu is *not* held here.  */
>   if (irqfd->resampler)
>   irqfd_resampler_shutdown(irqfd);
>  
> 

Hi Paolo:
This patch still can't prevent from freeing struct irqfd in
irq_shutdown() during assign irqfd. Once irqfd is added to
kvm->irqfds.items list, deassign path can get irqfd and free it.
-- 
Best regards
Tianyu Lan


Re: [PATCH] KVM/Eventfd: Avoid crash when assign and deassign same eventfd in parallel.

2017-12-18 Thread Lan Tianyu
On 2017年12月18日 16:50, Paolo Bonzini wrote:
> On 18/12/2017 09:30, David Hildenbrand wrote:
>> The ugly thing in kvm_irqfd_assign() is that we access irqfd without
>> holding a lock. I think that should rather be fixed than working around
>> that issue. (e.g. lock() -> lookup again -> verify still in list ->
>> unlock())
> 
> I wonder if it's even simpler:
> 
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index f2ac53ab8243..17ed298bd66f 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -387,7 +387,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  
>   idx = srcu_read_lock(>irq_srcu);
>   irqfd_update(kvm, irqfd);
> - srcu_read_unlock(>irq_srcu, idx);
>  
>   list_add_tail(>list, >irqfds.items);
>  
> @@ -420,10 +419,12 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd 
> *args)
>   irqfd->consumer.token, ret);
>   }
>  #endif
> + srcu_read_unlock(>irq_srcu, idx);
>  
>   return 0;
>  
>  fail:
> + /* irq_srcu is *not* held here.  */
>   if (irqfd->resampler)
>   irqfd_resampler_shutdown(irqfd);
>  
> 

Hi Paolo:
This patch still can't prevent from freeing struct irqfd in
irq_shutdown() during assign irqfd. Once irqfd is added to
kvm->irqfds.items list, deassign path can get irqfd and free it.
-- 
Best regards
Tianyu Lan


[PATCH v2 1/5] mm: migrate NUMA stats from per-zone to per-node

2017-12-18 Thread Kemi Wang
There is not really any use to get NUMA stats separated by zone, and
current per-zone NUMA stats is only consumed in /proc/zoneinfo. For code
cleanup purpose, we move NUMA stats from per-zone to per-node and reuse the
existed per-cpu infrastructure.

Suggested-by: Andi Kleen 
Suggested-by: Michal Hocko 
Signed-off-by: Kemi Wang 
---
 drivers/base/node.c|  23 +++
 include/linux/mmzone.h |  27 
 include/linux/vmstat.h |  31 -
 mm/mempolicy.c |   2 +-
 mm/page_alloc.c|  16 +++--
 mm/vmstat.c| 177 +
 6 files changed, 46 insertions(+), 230 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index ee090ab..a045ea1 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -169,13 +169,14 @@ static ssize_t node_read_numastat(struct device *dev,
   "interleave_hit %lu\n"
   "local_node %lu\n"
   "other_node %lu\n",
-  sum_zone_numa_state(dev->id, NUMA_HIT),
-  sum_zone_numa_state(dev->id, NUMA_MISS),
-  sum_zone_numa_state(dev->id, NUMA_FOREIGN),
-  sum_zone_numa_state(dev->id, NUMA_INTERLEAVE_HIT),
-  sum_zone_numa_state(dev->id, NUMA_LOCAL),
-  sum_zone_numa_state(dev->id, NUMA_OTHER));
+  node_page_state(NODE_DATA(dev->id), NUMA_HIT),
+  node_page_state(NODE_DATA(dev->id), NUMA_MISS),
+  node_page_state(NODE_DATA(dev->id), NUMA_FOREIGN),
+  node_page_state(NODE_DATA(dev->id), NUMA_INTERLEAVE_HIT),
+  node_page_state(NODE_DATA(dev->id), NUMA_LOCAL),
+  node_page_state(NODE_DATA(dev->id), NUMA_OTHER));
 }
+
 static DEVICE_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
 
 static ssize_t node_read_vmstat(struct device *dev,
@@ -190,17 +191,9 @@ static ssize_t node_read_vmstat(struct device *dev,
n += sprintf(buf+n, "%s %lu\n", vmstat_text[i],
 sum_zone_node_page_state(nid, i));
 
-#ifdef CONFIG_NUMA
-   for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++)
-   n += sprintf(buf+n, "%s %lu\n",
-vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
-sum_zone_numa_state(nid, i));
-#endif
-
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
n += sprintf(buf+n, "%s %lu\n",
-vmstat_text[i + NR_VM_ZONE_STAT_ITEMS +
-NR_VM_NUMA_STAT_ITEMS],
+vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
 node_page_state(pgdat, i));
 
return n;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 67f2e3c..c06d880 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -115,20 +115,6 @@ struct zone_padding {
 #define ZONE_PADDING(name)
 #endif
 
-#ifdef CONFIG_NUMA
-enum numa_stat_item {
-   NUMA_HIT,   /* allocated in intended node */
-   NUMA_MISS,  /* allocated in non intended node */
-   NUMA_FOREIGN,   /* was intended here, hit elsewhere */
-   NUMA_INTERLEAVE_HIT,/* interleaver preferred this zone */
-   NUMA_LOCAL, /* allocation from local node */
-   NUMA_OTHER, /* allocation from other node */
-   NR_VM_NUMA_STAT_ITEMS
-};
-#else
-#define NR_VM_NUMA_STAT_ITEMS 0
-#endif
-
 enum zone_stat_item {
/* First 128 byte cacheline (assuming 64 bit words) */
NR_FREE_PAGES,
@@ -151,7 +137,18 @@ enum zone_stat_item {
NR_VM_ZONE_STAT_ITEMS };
 
 enum node_stat_item {
-   NR_LRU_BASE,
+#ifdef CONFIG_NUMA
+   NUMA_HIT,   /* allocated in intended node */
+   NUMA_MISS,  /* allocated in non intended node */
+   NUMA_FOREIGN,   /* was intended here, hit elsewhere */
+   NUMA_INTERLEAVE_HIT,/* interleaver preferred this zone */
+   NUMA_LOCAL, /* allocation from local node */
+   NUMA_OTHER, /* allocation from other node */
+   NR_VM_NUMA_STAT_ITEMS,
+#else
+#defineNR_VM_NUMA_STAT_ITEMS 0
+#endif
+   NR_LRU_BASE = NR_VM_NUMA_STAT_ITEMS,
NR_INACTIVE_ANON = NR_LRU_BASE, /* must match order of LRU_[IN]ACTIVE */
NR_ACTIVE_ANON, /*  " " "   "   " */
NR_INACTIVE_FILE,   /*  " " "   "   " */
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 1779c98..80bf290 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -118,37 +118,8 @@ static inline void vm_events_fold_cpu(int cpu)
  * Zone and node-based page accounting with per cpu differentials.
  */
 extern atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS];
-extern 

[PATCH v2 1/5] mm: migrate NUMA stats from per-zone to per-node

2017-12-18 Thread Kemi Wang
There is not really any use to get NUMA stats separated by zone, and
current per-zone NUMA stats is only consumed in /proc/zoneinfo. For code
cleanup purpose, we move NUMA stats from per-zone to per-node and reuse the
existed per-cpu infrastructure.

Suggested-by: Andi Kleen 
Suggested-by: Michal Hocko 
Signed-off-by: Kemi Wang 
---
 drivers/base/node.c|  23 +++
 include/linux/mmzone.h |  27 
 include/linux/vmstat.h |  31 -
 mm/mempolicy.c |   2 +-
 mm/page_alloc.c|  16 +++--
 mm/vmstat.c| 177 +
 6 files changed, 46 insertions(+), 230 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index ee090ab..a045ea1 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -169,13 +169,14 @@ static ssize_t node_read_numastat(struct device *dev,
   "interleave_hit %lu\n"
   "local_node %lu\n"
   "other_node %lu\n",
-  sum_zone_numa_state(dev->id, NUMA_HIT),
-  sum_zone_numa_state(dev->id, NUMA_MISS),
-  sum_zone_numa_state(dev->id, NUMA_FOREIGN),
-  sum_zone_numa_state(dev->id, NUMA_INTERLEAVE_HIT),
-  sum_zone_numa_state(dev->id, NUMA_LOCAL),
-  sum_zone_numa_state(dev->id, NUMA_OTHER));
+  node_page_state(NODE_DATA(dev->id), NUMA_HIT),
+  node_page_state(NODE_DATA(dev->id), NUMA_MISS),
+  node_page_state(NODE_DATA(dev->id), NUMA_FOREIGN),
+  node_page_state(NODE_DATA(dev->id), NUMA_INTERLEAVE_HIT),
+  node_page_state(NODE_DATA(dev->id), NUMA_LOCAL),
+  node_page_state(NODE_DATA(dev->id), NUMA_OTHER));
 }
+
 static DEVICE_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
 
 static ssize_t node_read_vmstat(struct device *dev,
@@ -190,17 +191,9 @@ static ssize_t node_read_vmstat(struct device *dev,
n += sprintf(buf+n, "%s %lu\n", vmstat_text[i],
 sum_zone_node_page_state(nid, i));
 
-#ifdef CONFIG_NUMA
-   for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++)
-   n += sprintf(buf+n, "%s %lu\n",
-vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
-sum_zone_numa_state(nid, i));
-#endif
-
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
n += sprintf(buf+n, "%s %lu\n",
-vmstat_text[i + NR_VM_ZONE_STAT_ITEMS +
-NR_VM_NUMA_STAT_ITEMS],
+vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
 node_page_state(pgdat, i));
 
return n;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 67f2e3c..c06d880 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -115,20 +115,6 @@ struct zone_padding {
 #define ZONE_PADDING(name)
 #endif
 
-#ifdef CONFIG_NUMA
-enum numa_stat_item {
-   NUMA_HIT,   /* allocated in intended node */
-   NUMA_MISS,  /* allocated in non intended node */
-   NUMA_FOREIGN,   /* was intended here, hit elsewhere */
-   NUMA_INTERLEAVE_HIT,/* interleaver preferred this zone */
-   NUMA_LOCAL, /* allocation from local node */
-   NUMA_OTHER, /* allocation from other node */
-   NR_VM_NUMA_STAT_ITEMS
-};
-#else
-#define NR_VM_NUMA_STAT_ITEMS 0
-#endif
-
 enum zone_stat_item {
/* First 128 byte cacheline (assuming 64 bit words) */
NR_FREE_PAGES,
@@ -151,7 +137,18 @@ enum zone_stat_item {
NR_VM_ZONE_STAT_ITEMS };
 
 enum node_stat_item {
-   NR_LRU_BASE,
+#ifdef CONFIG_NUMA
+   NUMA_HIT,   /* allocated in intended node */
+   NUMA_MISS,  /* allocated in non intended node */
+   NUMA_FOREIGN,   /* was intended here, hit elsewhere */
+   NUMA_INTERLEAVE_HIT,/* interleaver preferred this zone */
+   NUMA_LOCAL, /* allocation from local node */
+   NUMA_OTHER, /* allocation from other node */
+   NR_VM_NUMA_STAT_ITEMS,
+#else
+#defineNR_VM_NUMA_STAT_ITEMS 0
+#endif
+   NR_LRU_BASE = NR_VM_NUMA_STAT_ITEMS,
NR_INACTIVE_ANON = NR_LRU_BASE, /* must match order of LRU_[IN]ACTIVE */
NR_ACTIVE_ANON, /*  " " "   "   " */
NR_INACTIVE_FILE,   /*  " " "   "   " */
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 1779c98..80bf290 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -118,37 +118,8 @@ static inline void vm_events_fold_cpu(int cpu)
  * Zone and node-based page accounting with per cpu differentials.
  */
 extern atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS];
-extern atomic_long_t vm_numa_stat[NR_VM_NUMA_STAT_ITEMS];
 extern 

[PATCH v2 5/5] mm: Rename zone_statistics() to numa_statistics()

2017-12-18 Thread Kemi Wang
Since the functionality of zone_statistics() updates numa counters, but
numa statistics has been separated from zone statistics framework. Thus,
the function name makes people confused. So, change the name to
numa_statistics() as well as its call sites accordingly.

Signed-off-by: Kemi Wang 
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 81e8d8f..f7583de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2790,7 +2790,7 @@ int __isolate_free_page(struct page *page, unsigned int 
order)
  *
  * Must be called with interrupts disabled.
  */
-static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
+static inline void numa_statistics(struct zone *preferred_zone, struct zone *z)
 {
 #ifdef CONFIG_NUMA
int preferred_nid = preferred_zone->node;
@@ -2854,7 +2854,7 @@ static struct page *rmqueue_pcplist(struct zone 
*preferred_zone,
page = __rmqueue_pcplist(zone,  migratetype, pcp, list);
if (page) {
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
-   zone_statistics(preferred_zone, zone);
+   numa_statistics(preferred_zone, zone);
}
local_irq_restore(flags);
return page;
@@ -2902,7 +2902,7 @@ struct page *rmqueue(struct zone *preferred_zone,
  get_pcppage_migratetype(page));
 
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
-   zone_statistics(preferred_zone, zone);
+   numa_statistics(preferred_zone, zone);
local_irq_restore(flags);
 
 out:
-- 
2.7.4



[PATCH v2 5/5] mm: Rename zone_statistics() to numa_statistics()

2017-12-18 Thread Kemi Wang
Since the functionality of zone_statistics() updates numa counters, but
numa statistics has been separated from zone statistics framework. Thus,
the function name makes people confused. So, change the name to
numa_statistics() as well as its call sites accordingly.

Signed-off-by: Kemi Wang 
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 81e8d8f..f7583de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2790,7 +2790,7 @@ int __isolate_free_page(struct page *page, unsigned int 
order)
  *
  * Must be called with interrupts disabled.
  */
-static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
+static inline void numa_statistics(struct zone *preferred_zone, struct zone *z)
 {
 #ifdef CONFIG_NUMA
int preferred_nid = preferred_zone->node;
@@ -2854,7 +2854,7 @@ static struct page *rmqueue_pcplist(struct zone 
*preferred_zone,
page = __rmqueue_pcplist(zone,  migratetype, pcp, list);
if (page) {
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
-   zone_statistics(preferred_zone, zone);
+   numa_statistics(preferred_zone, zone);
}
local_irq_restore(flags);
return page;
@@ -2902,7 +2902,7 @@ struct page *rmqueue(struct zone *preferred_zone,
  get_pcppage_migratetype(page));
 
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
-   zone_statistics(preferred_zone, zone);
+   numa_statistics(preferred_zone, zone);
local_irq_restore(flags);
 
 out:
-- 
2.7.4



[PATCH v2 4/5] mm: use node_page_state_snapshot to avoid deviation

2017-12-18 Thread Kemi Wang
To avoid deviation, this patch uses node_page_state_snapshot instead of
node_page_state for node page stats query.
e.g. cat /proc/zoneinfo
 cat /sys/devices/system/node/node*/vmstat
 cat /sys/devices/system/node/node*/numastat

As it is a slow path and would not be read frequently, I would worry about
it.

Signed-off-by: Kemi Wang 
---
 drivers/base/node.c | 17 ++---
 mm/vmstat.c |  2 +-
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index a045ea1..cf303f8 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -169,12 +169,15 @@ static ssize_t node_read_numastat(struct device *dev,
   "interleave_hit %lu\n"
   "local_node %lu\n"
   "other_node %lu\n",
-  node_page_state(NODE_DATA(dev->id), NUMA_HIT),
-  node_page_state(NODE_DATA(dev->id), NUMA_MISS),
-  node_page_state(NODE_DATA(dev->id), NUMA_FOREIGN),
-  node_page_state(NODE_DATA(dev->id), NUMA_INTERLEAVE_HIT),
-  node_page_state(NODE_DATA(dev->id), NUMA_LOCAL),
-  node_page_state(NODE_DATA(dev->id), NUMA_OTHER));
+  node_page_state_snapshot(NODE_DATA(dev->id), NUMA_HIT),
+  node_page_state_snapshot(NODE_DATA(dev->id), NUMA_MISS),
+  node_page_state_snapshot(NODE_DATA(dev->id),
+  NUMA_FOREIGN),
+  node_page_state_snapshot(NODE_DATA(dev->id),
+  NUMA_INTERLEAVE_HIT),
+  node_page_state_snapshot(NODE_DATA(dev->id), NUMA_LOCAL),
+  node_page_state_snapshot(NODE_DATA(dev->id),
+  NUMA_OTHER));
 }
 
 static DEVICE_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
@@ -194,7 +197,7 @@ static ssize_t node_read_vmstat(struct device *dev,
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
n += sprintf(buf+n, "%s %lu\n",
 vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
-node_page_state(pgdat, i));
+node_page_state_snapshot(pgdat, i));
 
return n;
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 64e08ae..d65f28d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1466,7 +1466,7 @@ static void zoneinfo_show_print(struct seq_file *m, 
pg_data_t *pgdat,
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
seq_printf(m, "\n  %-12s %lu",
vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
-   node_page_state(pgdat, i));
+   node_page_state_snapshot(pgdat, i));
}
}
seq_printf(m,
-- 
2.7.4



[PATCH v2 4/5] mm: use node_page_state_snapshot to avoid deviation

2017-12-18 Thread Kemi Wang
To avoid deviation, this patch uses node_page_state_snapshot instead of
node_page_state for node page stats query.
e.g. cat /proc/zoneinfo
 cat /sys/devices/system/node/node*/vmstat
 cat /sys/devices/system/node/node*/numastat

As it is a slow path and would not be read frequently, I would worry about
it.

Signed-off-by: Kemi Wang 
---
 drivers/base/node.c | 17 ++---
 mm/vmstat.c |  2 +-
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index a045ea1..cf303f8 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -169,12 +169,15 @@ static ssize_t node_read_numastat(struct device *dev,
   "interleave_hit %lu\n"
   "local_node %lu\n"
   "other_node %lu\n",
-  node_page_state(NODE_DATA(dev->id), NUMA_HIT),
-  node_page_state(NODE_DATA(dev->id), NUMA_MISS),
-  node_page_state(NODE_DATA(dev->id), NUMA_FOREIGN),
-  node_page_state(NODE_DATA(dev->id), NUMA_INTERLEAVE_HIT),
-  node_page_state(NODE_DATA(dev->id), NUMA_LOCAL),
-  node_page_state(NODE_DATA(dev->id), NUMA_OTHER));
+  node_page_state_snapshot(NODE_DATA(dev->id), NUMA_HIT),
+  node_page_state_snapshot(NODE_DATA(dev->id), NUMA_MISS),
+  node_page_state_snapshot(NODE_DATA(dev->id),
+  NUMA_FOREIGN),
+  node_page_state_snapshot(NODE_DATA(dev->id),
+  NUMA_INTERLEAVE_HIT),
+  node_page_state_snapshot(NODE_DATA(dev->id), NUMA_LOCAL),
+  node_page_state_snapshot(NODE_DATA(dev->id),
+  NUMA_OTHER));
 }
 
 static DEVICE_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
@@ -194,7 +197,7 @@ static ssize_t node_read_vmstat(struct device *dev,
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
n += sprintf(buf+n, "%s %lu\n",
 vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
-node_page_state(pgdat, i));
+node_page_state_snapshot(pgdat, i));
 
return n;
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 64e08ae..d65f28d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1466,7 +1466,7 @@ static void zoneinfo_show_print(struct seq_file *m, 
pg_data_t *pgdat,
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
seq_printf(m, "\n  %-12s %lu",
vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
-   node_page_state(pgdat, i));
+   node_page_state_snapshot(pgdat, i));
}
}
seq_printf(m,
-- 
2.7.4



[PATCH v2 2/5] mm: Extends local cpu counter vm_diff_nodestat from s8 to s16

2017-12-18 Thread Kemi Wang
The type s8 used for vm_diff_nodestat[] as local cpu counters has the
limitation of global counters update frequency, especially for those
monotone increasing type of counters like NUMA counters with more and more
cpus/nodes. This patch extends the type of vm_diff_nodestat from s8 to s16
without any functionality change.

 before after
sizeof(struct per_cpu_nodestat)28 68

Signed-off-by: Kemi Wang 
---
 include/linux/mmzone.h |  4 ++--
 mm/vmstat.c| 16 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c06d880..2da6b6f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -289,8 +289,8 @@ struct per_cpu_pageset {
 };
 
 struct per_cpu_nodestat {
-   s8 stat_threshold;
-   s8 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS];
+   s16 stat_threshold;
+   s16 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS];
 };
 
 #endif /* !__GENERATING_BOUNDS.H */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1dd12ae..9c681cc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -332,7 +332,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum 
node_stat_item item,
long delta)
 {
struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-   s8 __percpu *p = pcp->vm_node_stat_diff + item;
+   s16 __percpu *p = pcp->vm_node_stat_diff + item;
long x;
long t;
 
@@ -390,13 +390,13 @@ void __inc_zone_state(struct zone *zone, enum 
zone_stat_item item)
 void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 {
struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-   s8 __percpu *p = pcp->vm_node_stat_diff + item;
-   s8 v, t;
+   s16 __percpu *p = pcp->vm_node_stat_diff + item;
+   s16 v, t;
 
v = __this_cpu_inc_return(*p);
t = __this_cpu_read(pcp->stat_threshold);
if (unlikely(v > t)) {
-   s8 overstep = t >> 1;
+   s16 overstep = t >> 1;
 
node_page_state_add(v + overstep, pgdat, item);
__this_cpu_write(*p, -overstep);
@@ -434,13 +434,13 @@ void __dec_zone_state(struct zone *zone, enum 
zone_stat_item item)
 void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 {
struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-   s8 __percpu *p = pcp->vm_node_stat_diff + item;
-   s8 v, t;
+   s16 __percpu *p = pcp->vm_node_stat_diff + item;
+   s16 v, t;
 
v = __this_cpu_dec_return(*p);
t = __this_cpu_read(pcp->stat_threshold);
if (unlikely(v < - t)) {
-   s8 overstep = t >> 1;
+   s16 overstep = t >> 1;
 
node_page_state_add(v - overstep, pgdat, item);
__this_cpu_write(*p, overstep);
@@ -533,7 +533,7 @@ static inline void mod_node_state(struct pglist_data *pgdat,
enum node_stat_item item, int delta, int overstep_mode)
 {
struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-   s8 __percpu *p = pcp->vm_node_stat_diff + item;
+   s16 __percpu *p = pcp->vm_node_stat_diff + item;
long o, n, t, z;
 
do {
-- 
2.7.4



[PATCH v2 3/5] mm: enlarge NUMA counters threshold size

2017-12-18 Thread Kemi Wang
We have seen significant overhead in cache bouncing caused by NUMA counters
update in multi-threaded page allocation. See 'commit 1d90ca897cb0 ("mm:
update NUMA counter threshold size")' for more details.

This patch updates NUMA counters to a fixed size of (MAX_S16 - 2) and deals
with global counter update using different threshold size for node page
stats.

Signed-off-by: Kemi Wang 
---
 mm/vmstat.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 9c681cc..64e08ae 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -30,6 +30,8 @@
 
 #include "internal.h"
 
+#define VM_NUMA_STAT_THRESHOLD (S16_MAX - 2)
+
 #ifdef CONFIG_NUMA
 int sysctl_vm_numa_stat = ENABLE_NUMA_STAT;
 
@@ -394,7 +396,11 @@ void __inc_node_state(struct pglist_data *pgdat, enum 
node_stat_item item)
s16 v, t;
 
v = __this_cpu_inc_return(*p);
-   t = __this_cpu_read(pcp->stat_threshold);
+   if (item >= NR_VM_NUMA_STAT_ITEMS)
+   t = __this_cpu_read(pcp->stat_threshold);
+   else
+   t = VM_NUMA_STAT_THRESHOLD;
+
if (unlikely(v > t)) {
s16 overstep = t >> 1;
 
@@ -549,7 +555,10 @@ static inline void mod_node_state(struct pglist_data 
*pgdat,
 * Most of the time the thresholds are the same anyways
 * for all cpus in a node.
 */
-   t = this_cpu_read(pcp->stat_threshold);
+   if (item >= NR_VM_NUMA_STAT_ITEMS)
+   t = this_cpu_read(pcp->stat_threshold);
+   else
+   t = VM_NUMA_STAT_THRESHOLD;
 
o = this_cpu_read(*p);
n = delta + o;
-- 
2.7.4



[PATCH v2 2/5] mm: Extends local cpu counter vm_diff_nodestat from s8 to s16

2017-12-18 Thread Kemi Wang
The type s8 used for vm_diff_nodestat[] as local cpu counters has the
limitation of global counters update frequency, especially for those
monotone increasing type of counters like NUMA counters with more and more
cpus/nodes. This patch extends the type of vm_diff_nodestat from s8 to s16
without any functionality change.

 before after
sizeof(struct per_cpu_nodestat)28 68

Signed-off-by: Kemi Wang 
---
 include/linux/mmzone.h |  4 ++--
 mm/vmstat.c| 16 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c06d880..2da6b6f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -289,8 +289,8 @@ struct per_cpu_pageset {
 };
 
 struct per_cpu_nodestat {
-   s8 stat_threshold;
-   s8 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS];
+   s16 stat_threshold;
+   s16 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS];
 };
 
 #endif /* !__GENERATING_BOUNDS.H */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1dd12ae..9c681cc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -332,7 +332,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum 
node_stat_item item,
long delta)
 {
struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-   s8 __percpu *p = pcp->vm_node_stat_diff + item;
+   s16 __percpu *p = pcp->vm_node_stat_diff + item;
long x;
long t;
 
@@ -390,13 +390,13 @@ void __inc_zone_state(struct zone *zone, enum 
zone_stat_item item)
 void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 {
struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-   s8 __percpu *p = pcp->vm_node_stat_diff + item;
-   s8 v, t;
+   s16 __percpu *p = pcp->vm_node_stat_diff + item;
+   s16 v, t;
 
v = __this_cpu_inc_return(*p);
t = __this_cpu_read(pcp->stat_threshold);
if (unlikely(v > t)) {
-   s8 overstep = t >> 1;
+   s16 overstep = t >> 1;
 
node_page_state_add(v + overstep, pgdat, item);
__this_cpu_write(*p, -overstep);
@@ -434,13 +434,13 @@ void __dec_zone_state(struct zone *zone, enum 
zone_stat_item item)
 void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 {
struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-   s8 __percpu *p = pcp->vm_node_stat_diff + item;
-   s8 v, t;
+   s16 __percpu *p = pcp->vm_node_stat_diff + item;
+   s16 v, t;
 
v = __this_cpu_dec_return(*p);
t = __this_cpu_read(pcp->stat_threshold);
if (unlikely(v < - t)) {
-   s8 overstep = t >> 1;
+   s16 overstep = t >> 1;
 
node_page_state_add(v - overstep, pgdat, item);
__this_cpu_write(*p, overstep);
@@ -533,7 +533,7 @@ static inline void mod_node_state(struct pglist_data *pgdat,
enum node_stat_item item, int delta, int overstep_mode)
 {
struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-   s8 __percpu *p = pcp->vm_node_stat_diff + item;
+   s16 __percpu *p = pcp->vm_node_stat_diff + item;
long o, n, t, z;
 
do {
-- 
2.7.4



[PATCH v2 3/5] mm: enlarge NUMA counters threshold size

2017-12-18 Thread Kemi Wang
We have seen significant overhead in cache bouncing caused by NUMA counters
update in multi-threaded page allocation. See 'commit 1d90ca897cb0 ("mm:
update NUMA counter threshold size")' for more details.

This patch updates NUMA counters to a fixed size of (MAX_S16 - 2) and deals
with global counter update using different threshold size for node page
stats.

Signed-off-by: Kemi Wang 
---
 mm/vmstat.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 9c681cc..64e08ae 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -30,6 +30,8 @@
 
 #include "internal.h"
 
+#define VM_NUMA_STAT_THRESHOLD (S16_MAX - 2)
+
 #ifdef CONFIG_NUMA
 int sysctl_vm_numa_stat = ENABLE_NUMA_STAT;
 
@@ -394,7 +396,11 @@ void __inc_node_state(struct pglist_data *pgdat, enum 
node_stat_item item)
s16 v, t;
 
v = __this_cpu_inc_return(*p);
-   t = __this_cpu_read(pcp->stat_threshold);
+   if (item >= NR_VM_NUMA_STAT_ITEMS)
+   t = __this_cpu_read(pcp->stat_threshold);
+   else
+   t = VM_NUMA_STAT_THRESHOLD;
+
if (unlikely(v > t)) {
s16 overstep = t >> 1;
 
@@ -549,7 +555,10 @@ static inline void mod_node_state(struct pglist_data 
*pgdat,
 * Most of the time the thresholds are the same anyways
 * for all cpus in a node.
 */
-   t = this_cpu_read(pcp->stat_threshold);
+   if (item >= NR_VM_NUMA_STAT_ITEMS)
+   t = this_cpu_read(pcp->stat_threshold);
+   else
+   t = VM_NUMA_STAT_THRESHOLD;
 
o = this_cpu_read(*p);
n = delta + o;
-- 
2.7.4



[PATCH v2 0/5] mm: NUMA stats code cleanup and enhancement

2017-12-18 Thread Kemi Wang
The existed implementation of NUMA counters is per logical CPU along with
zone->vm_numa_stat[] separated by zone, plus a global numa counter array
vm_numa_stat[]. However, unlike the other vmstat counters, NUMA stats don't
effect system's decision and are only consumed when reading from /proc and
/sys. Also, usually nodes only have a single zone, except for node 0, and
there isn't really any use where you need these hits counts separated by
zone.

Therefore, we can migrate the implementation of numa stats from per-zone to
per-node (as suggested by Andi Kleen), and reuse the existed per-cpu
infrastructure with a little enhancement for NUMA stats. In this way, we
can get rid of the special way for NUMA stats and keep the performance gain
at the same time. With this patch series, about 170 lines code can be
saved.

The first patch migrates NUMA stats from per-zone to pre-node using the
existed per-cpu infrastructure. There is a little user-visual change when
read /proc/zoneinfo listed below:
 Before   After
Node 0, zone  DMA   Node 0, zone  DMA
  per-node stats  per-node stats
  nr_inactive_anon 7244  *numa_hit 98665086*
  nr_active_anon 177064  *numa_miss0*
  ...*numa_foreign 0*
  nr_bounce0 *numa_interleave 21059*
  nr_free_cma  0 *numa_local   98665086*
 *numa_hit 0**numa_other   0*
 *numa_miss0* nr_inactive_anon 20055
 *numa_foreign 0* nr_active_anon 389771
 *numa_interleave 0*  ...
 *numa_local   0* nr_bounce0
 *numa_other   0* nr_free_cma  0

The second patch extends the local cpu counter vm_stat_node_diff from s8 to
s16. It does not have any functionality change.

The third patch uses a large and constant threshold size for NUMA counters
to reduce the global NUMA counters update frequency.

The forth patch uses node_page_state_snapshot instead of node_page_state
when query a node stats (e.g. cat /sys/devices/system/node/node*/vmstat).
The only differece is that the stats value in local cpus are also included
in node_page_state_snapshot.

The last patch renames zone_statistics() to numa_statistics().

At last, I want to extend my heartiest appreciation for Michal Hocko's
suggestion of reusing the existed per-cpu infrastructure making it much
better than before.

Changelog:
  v1->v2:
  a) enhance the existed per-cpu infrastructure for node page stats by
  entending local cpu counters vm_node_stat_diff from s8 to s16
  b) reuse the per-cpu infrastrcuture for NUMA stats

Kemi Wang (5):
  mm: migrate NUMA stats from per-zone to per-node
  mm: Extends local cpu counter vm_diff_nodestat from s8 to s16
  mm: enlarge NUMA counters threshold size
  mm: use node_page_state_snapshot to avoid deviation
  mm: Rename zone_statistics() to numa_statistics()

 drivers/base/node.c|  28 +++
 include/linux/mmzone.h |  31 
 include/linux/vmstat.h |  31 
 mm/mempolicy.c |   2 +-
 mm/page_alloc.c|  22 +++---
 mm/vmstat.c| 206 +
 6 files changed, 74 insertions(+), 246 deletions(-)

-- 
2.7.4



[PATCH v2 0/5] mm: NUMA stats code cleanup and enhancement

2017-12-18 Thread Kemi Wang
The existed implementation of NUMA counters is per logical CPU along with
zone->vm_numa_stat[] separated by zone, plus a global numa counter array
vm_numa_stat[]. However, unlike the other vmstat counters, NUMA stats don't
effect system's decision and are only consumed when reading from /proc and
/sys. Also, usually nodes only have a single zone, except for node 0, and
there isn't really any use where you need these hits counts separated by
zone.

Therefore, we can migrate the implementation of numa stats from per-zone to
per-node (as suggested by Andi Kleen), and reuse the existed per-cpu
infrastructure with a little enhancement for NUMA stats. In this way, we
can get rid of the special way for NUMA stats and keep the performance gain
at the same time. With this patch series, about 170 lines code can be
saved.

The first patch migrates NUMA stats from per-zone to pre-node using the
existed per-cpu infrastructure. There is a little user-visual change when
read /proc/zoneinfo listed below:
 Before   After
Node 0, zone  DMA   Node 0, zone  DMA
  per-node stats  per-node stats
  nr_inactive_anon 7244  *numa_hit 98665086*
  nr_active_anon 177064  *numa_miss0*
  ...*numa_foreign 0*
  nr_bounce0 *numa_interleave 21059*
  nr_free_cma  0 *numa_local   98665086*
 *numa_hit 0**numa_other   0*
 *numa_miss0* nr_inactive_anon 20055
 *numa_foreign 0* nr_active_anon 389771
 *numa_interleave 0*  ...
 *numa_local   0* nr_bounce0
 *numa_other   0* nr_free_cma  0

The second patch extends the local cpu counter vm_stat_node_diff from s8 to
s16. It does not have any functionality change.

The third patch uses a large and constant threshold size for NUMA counters
to reduce the global NUMA counters update frequency.

The forth patch uses node_page_state_snapshot instead of node_page_state
when query a node stats (e.g. cat /sys/devices/system/node/node*/vmstat).
The only differece is that the stats value in local cpus are also included
in node_page_state_snapshot.

The last patch renames zone_statistics() to numa_statistics().

At last, I want to extend my heartiest appreciation for Michal Hocko's
suggestion of reusing the existed per-cpu infrastructure making it much
better than before.

Changelog:
  v1->v2:
  a) enhance the existed per-cpu infrastructure for node page stats by
  entending local cpu counters vm_node_stat_diff from s8 to s16
  b) reuse the per-cpu infrastrcuture for NUMA stats

Kemi Wang (5):
  mm: migrate NUMA stats from per-zone to per-node
  mm: Extends local cpu counter vm_diff_nodestat from s8 to s16
  mm: enlarge NUMA counters threshold size
  mm: use node_page_state_snapshot to avoid deviation
  mm: Rename zone_statistics() to numa_statistics()

 drivers/base/node.c|  28 +++
 include/linux/mmzone.h |  31 
 include/linux/vmstat.h |  31 
 mm/mempolicy.c |   2 +-
 mm/page_alloc.c|  22 +++---
 mm/vmstat.c| 206 +
 6 files changed, 74 insertions(+), 246 deletions(-)

-- 
2.7.4



Re: [PATCH v10 3/5] bpf: add a bpf_override_function helper

2017-12-18 Thread Masami Hiramatsu
On Mon, 18 Dec 2017 16:09:30 +0100
Daniel Borkmann  wrote:

> On 12/18/2017 10:51 AM, Masami Hiramatsu wrote:
> > On Fri, 15 Dec 2017 14:12:54 -0500
> > Josef Bacik  wrote:
> >> From: Josef Bacik 
> >>
> >> Error injection is sloppy and very ad-hoc.  BPF could fill this niche
> >> perfectly with it's kprobe functionality.  We could make sure errors are
> >> only triggered in specific call chains that we care about with very
> >> specific situations.  Accomplish this with the bpf_override_funciton
> >> helper.  This will modify the probe'd callers return value to the
> >> specified value and set the PC to an override function that simply
> >> returns, bypassing the originally probed function.  This gives us a nice
> >> clean way to implement systematic error injection for all of our code
> >> paths.
> > 
> > OK, got it. I think the error_injectable function list should be defined
> > in kernel/trace/bpf_trace.c because only bpf calls it and needs to care
> > the "safeness".
> > 
> > [...]
> >> diff --git a/arch/x86/kernel/kprobes/ftrace.c 
> >> b/arch/x86/kernel/kprobes/ftrace.c
> >> index 8dc0161cec8f..1ea748d682fd 100644
> >> --- a/arch/x86/kernel/kprobes/ftrace.c
> >> +++ b/arch/x86/kernel/kprobes/ftrace.c
> >> @@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
> >>p->ainsn.boostable = false;
> >>return 0;
> >>  }
> >> +
> >> +asmlinkage void override_func(void);
> >> +asm(
> >> +  ".type override_func, @function\n"
> >> +  "override_func:\n"
> >> +  "   ret\n"
> >> +  ".size override_func, .-override_func\n"
> >> +);
> >> +
> >> +void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
> >> +{
> >> +  regs->ip = (unsigned long)_func;
> >> +}
> >> +NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
> > 
> > Calling this as "override_function" is meaningless. This is a function
> > which just return. So I think combination of just_return_func() and
> > arch_bpf_override_func_just_return() will be better.
> > 
> > Moreover, this arch/x86/kernel/kprobes/ftrace.c is an archtecture
> > dependent implementation of kprobes, not bpf.
> 
> Josef, please work out any necessary cleanups that would still need
> to be addressed based on Masami's feedback and send them as follow-up
> patches, thanks.
> 
> > Hmm, arch/x86/net/bpf_jit_comp.c will be better place?
> 
> (No, it's JIT only and I'd really prefer to keep it that way, mixing
>  this would result in a huge mess.)

OK, that is same to kprobes. kernel/kprobes.c and arch/x86/kernel/kprobe/*
are for instrumentation code. And kernel/trace/trace_kprobe.c is ftrace's
kprobe user interface, just one implementation of kprobe usage. So please
do not mix it up. It will result in a huge mess to me.

Thank you,

-- 
Masami Hiramatsu 


Re: [PATCH v10 3/5] bpf: add a bpf_override_function helper

2017-12-18 Thread Masami Hiramatsu
On Mon, 18 Dec 2017 16:09:30 +0100
Daniel Borkmann  wrote:

> On 12/18/2017 10:51 AM, Masami Hiramatsu wrote:
> > On Fri, 15 Dec 2017 14:12:54 -0500
> > Josef Bacik  wrote:
> >> From: Josef Bacik 
> >>
> >> Error injection is sloppy and very ad-hoc.  BPF could fill this niche
> >> perfectly with it's kprobe functionality.  We could make sure errors are
> >> only triggered in specific call chains that we care about with very
> >> specific situations.  Accomplish this with the bpf_override_funciton
> >> helper.  This will modify the probe'd callers return value to the
> >> specified value and set the PC to an override function that simply
> >> returns, bypassing the originally probed function.  This gives us a nice
> >> clean way to implement systematic error injection for all of our code
> >> paths.
> > 
> > OK, got it. I think the error_injectable function list should be defined
> > in kernel/trace/bpf_trace.c because only bpf calls it and needs to care
> > the "safeness".
> > 
> > [...]
> >> diff --git a/arch/x86/kernel/kprobes/ftrace.c 
> >> b/arch/x86/kernel/kprobes/ftrace.c
> >> index 8dc0161cec8f..1ea748d682fd 100644
> >> --- a/arch/x86/kernel/kprobes/ftrace.c
> >> +++ b/arch/x86/kernel/kprobes/ftrace.c
> >> @@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
> >>p->ainsn.boostable = false;
> >>return 0;
> >>  }
> >> +
> >> +asmlinkage void override_func(void);
> >> +asm(
> >> +  ".type override_func, @function\n"
> >> +  "override_func:\n"
> >> +  "   ret\n"
> >> +  ".size override_func, .-override_func\n"
> >> +);
> >> +
> >> +void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
> >> +{
> >> +  regs->ip = (unsigned long)_func;
> >> +}
> >> +NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
> > 
> > Calling this as "override_function" is meaningless. This is a function
> > which just return. So I think combination of just_return_func() and
> > arch_bpf_override_func_just_return() will be better.
> > 
> > Moreover, this arch/x86/kernel/kprobes/ftrace.c is an archtecture
> > dependent implementation of kprobes, not bpf.
> 
> Josef, please work out any necessary cleanups that would still need
> to be addressed based on Masami's feedback and send them as follow-up
> patches, thanks.
> 
> > Hmm, arch/x86/net/bpf_jit_comp.c will be better place?
> 
> (No, it's JIT only and I'd really prefer to keep it that way, mixing
>  this would result in a huge mess.)

OK, that is same to kprobes. kernel/kprobes.c and arch/x86/kernel/kprobe/*
are for instrumentation code. And kernel/trace/trace_kprobe.c is ftrace's
kprobe user interface, just one implementation of kprobe usage. So please
do not mix it up. It will result in a huge mess to me.

Thank you,

-- 
Masami Hiramatsu 


Re: [PATCH 0/2] arm64 SMMUv3 PMU driver with IORT support

2017-12-18 Thread Linu Cherian
Hi Robin,

On Mon Dec 18, 2017 at 02:48:14PM +, Robin Murphy wrote:
> On 10/12/17 02:35, Linu Cherian wrote:
> >Hi,
> >
> >
> >On Fri Aug 04, 2017 at 03:59:12PM -0400, Neil Leeder wrote:
> >>This adds a driver for the SMMUv3 PMU into the perf framework.
> >>It includes an IORT update to support PM Counter Groups.
> >>
> >
> >In one of Cavium's upcoming SOC, SMMU PMCG implementation is such a way
> >that platform bus id (Device ID in ITS terminmology)is shared with that of 
> >SMMU.
> >This would be a matter of concern for software if the SMMU and SMMU PMCG 
> >blocks
> >are managed by two independent drivers.
> >
> >The problem arises when we want to alloc/free MSIs for these devices
> >using the APIs, platform_msi_domain_alloc/free_irqs.
> >Platform bus id being same for these two hardware blocks, they end up 
> >sharing the same
> >ITT(Interrupt Translation Table) in GIC ITS and hence alloc, free and 
> >management
> >of this shared ITT becomes a problem when they are managed by two independent
> >drivers.
> 
> What is the problem exactly? IIRC resizing a possibly-live ITT is a
> right pain in the bum to do - is it just that?

Yes exactly. Resizing ITT was the problem in sharing.

 
> >We were looking into the option of keeping the SMMU PMCG nodes as sub nodes 
> >under
> >SMMUv3 node, so that SMMUv3 driver could probe and figure out the total 
> >vectors
> >required for SMMU PMCG devices and make a common 
> >platform_msi_domain_alloc/free_irqs
> >function call for all devices that share the platform bus id.
> 
> I'm not sure how scalable that approach would be, since it's not
> entirely obvious how to handle PMCGs associated with named
> components or root complexes (rather than directly with SMMU
> instances). We certainly don't want to end up spraying similar PMCG
> DevID logic around all manner of GPU/accelerator/etc. drivers in
> future (whilst PMCGs for device TLBs will be expected to have
> distinct IDs from their host devices, they could reasonably still
> overlap with other PMCGs/SMMUs).
>

OK.
 
While trying the above approach, we also felt that the code will become 
lot messier than actually thought.

> >Would like to know your expert opinion on what would be the right approach
> >to handle this case ?
> 
> My gut feeling says the way to deal with this properly is in the ITS
> code, but I appreciate that that's a lot easier said than done :/
>

Yes Correct.
 
> Robin.

-- 
Linu cherian


Re: [PATCH 0/2] arm64 SMMUv3 PMU driver with IORT support

2017-12-18 Thread Linu Cherian
Hi Robin,

On Mon Dec 18, 2017 at 02:48:14PM +, Robin Murphy wrote:
> On 10/12/17 02:35, Linu Cherian wrote:
> >Hi,
> >
> >
> >On Fri Aug 04, 2017 at 03:59:12PM -0400, Neil Leeder wrote:
> >>This adds a driver for the SMMUv3 PMU into the perf framework.
> >>It includes an IORT update to support PM Counter Groups.
> >>
> >
> >In one of Cavium's upcoming SOC, SMMU PMCG implementation is such a way
> >that platform bus id (Device ID in ITS terminmology)is shared with that of 
> >SMMU.
> >This would be a matter of concern for software if the SMMU and SMMU PMCG 
> >blocks
> >are managed by two independent drivers.
> >
> >The problem arises when we want to alloc/free MSIs for these devices
> >using the APIs, platform_msi_domain_alloc/free_irqs.
> >Platform bus id being same for these two hardware blocks, they end up 
> >sharing the same
> >ITT(Interrupt Translation Table) in GIC ITS and hence alloc, free and 
> >management
> >of this shared ITT becomes a problem when they are managed by two independent
> >drivers.
> 
> What is the problem exactly? IIRC resizing a possibly-live ITT is a
> right pain in the bum to do - is it just that?

Yes exactly. Resizing ITT was the problem in sharing.

 
> >We were looking into the option of keeping the SMMU PMCG nodes as sub nodes 
> >under
> >SMMUv3 node, so that SMMUv3 driver could probe and figure out the total 
> >vectors
> >required for SMMU PMCG devices and make a common 
> >platform_msi_domain_alloc/free_irqs
> >function call for all devices that share the platform bus id.
> 
> I'm not sure how scalable that approach would be, since it's not
> entirely obvious how to handle PMCGs associated with named
> components or root complexes (rather than directly with SMMU
> instances). We certainly don't want to end up spraying similar PMCG
> DevID logic around all manner of GPU/accelerator/etc. drivers in
> future (whilst PMCGs for device TLBs will be expected to have
> distinct IDs from their host devices, they could reasonably still
> overlap with other PMCGs/SMMUs).
>

OK.
 
While trying the above approach, we also felt that the code will become 
lot messier than actually thought.

> >Would like to know your expert opinion on what would be the right approach
> >to handle this case ?
> 
> My gut feeling says the way to deal with this properly is in the ITS
> code, but I appreciate that that's a lot easier said than done :/
>

Yes Correct.
 
> Robin.

-- 
Linu cherian


Re: [PATCH v2 2/3] vsprintf: print if symbol not found

2017-12-18 Thread Tobin C. Harding
On Mon, Dec 18, 2017 at 10:18:27PM -0800, Joe Perches wrote:
> On Tue, 2017-12-19 at 14:28 +1100, Tobin C. Harding wrote:
> > Depends on: commit 40eee173a35e ("kallsyms: don't leak address when
> > symbol not found")
> > 
> > Currently vsprintf for specifiers %p[SsB] relies on the behaviour of
> > kallsyms (sprint_symbol()) and prints the actual address if a symbol is
> > not found. Previous patch changes this behaviour so that sprint_symbol()
> > returns an error if symbol not found. With this patch in place we can
> > print a sanitized message '' instead of leaking the
> > address.
> > 
> > Print '' for printk specifier %p[sSB] if symbol look
> > up fails.
> > 
> > Signed-off-by: Tobin C. Harding 
> > ---
> >  lib/vsprintf.c | 11 ---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> > index 01c3957b2de6..820ed4fe6e6c 100644
> > --- a/lib/vsprintf.c
> > +++ b/lib/vsprintf.c
> > @@ -674,6 +674,8 @@ char *symbol_string(char *buf, char *end, void *ptr,
> > unsigned long value;
> >  #ifdef CONFIG_KALLSYMS
> > char sym[KSYM_SYMBOL_LEN];
> > +   const char *sym_not_found = "";
> 
> This will be reinitialized on every use.
> 
> > +   int ret;
> >  #endif
> >  
> > if (fmt[1] == 'R')
> > @@ -682,11 +684,14 @@ char *symbol_string(char *buf, char *end, void *ptr,
> >  
> >  #ifdef CONFIG_KALLSYMS
> > if (*fmt == 'B')
> > -   sprint_backtrace(sym, value);
> > +   ret = sprint_backtrace(sym, value);
> > else if (*fmt != 'f' && *fmt != 's')
> > -   sprint_symbol(sym, value);
> > +   ret = sprint_symbol(sym, value);
> > else
> > -   sprint_symbol_no_offset(sym, value);
> > +   ret = sprint_symbol_no_offset(sym, value);
> > +
> > +   if (ret == -1)
> > +   strcpy(sym, sym_not_found);
> 
> 
> This could avoid the unnecessary strcpy if sym_not_found
> was not used at all and this was used instead
> 
>   if (ret == -1)
>   return string(buf, end, "", spec);
> 
>   return string(buf, end, sym, spec);
> 
> or maybe
> 
>   return string(buf, end, ret == -1 ? "" : sum, spec);

Oh, thanks. This is much cleaner. Will re-spin.

thanks,
Tobin.


Re: [PATCH v2 2/3] vsprintf: print if symbol not found

2017-12-18 Thread Tobin C. Harding
On Mon, Dec 18, 2017 at 10:18:27PM -0800, Joe Perches wrote:
> On Tue, 2017-12-19 at 14:28 +1100, Tobin C. Harding wrote:
> > Depends on: commit 40eee173a35e ("kallsyms: don't leak address when
> > symbol not found")
> > 
> > Currently vsprintf for specifiers %p[SsB] relies on the behaviour of
> > kallsyms (sprint_symbol()) and prints the actual address if a symbol is
> > not found. Previous patch changes this behaviour so that sprint_symbol()
> > returns an error if symbol not found. With this patch in place we can
> > print a sanitized message '' instead of leaking the
> > address.
> > 
> > Print '' for printk specifier %p[sSB] if symbol look
> > up fails.
> > 
> > Signed-off-by: Tobin C. Harding 
> > ---
> >  lib/vsprintf.c | 11 ---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> > index 01c3957b2de6..820ed4fe6e6c 100644
> > --- a/lib/vsprintf.c
> > +++ b/lib/vsprintf.c
> > @@ -674,6 +674,8 @@ char *symbol_string(char *buf, char *end, void *ptr,
> > unsigned long value;
> >  #ifdef CONFIG_KALLSYMS
> > char sym[KSYM_SYMBOL_LEN];
> > +   const char *sym_not_found = "";
> 
> This will be reinitialized on every use.
> 
> > +   int ret;
> >  #endif
> >  
> > if (fmt[1] == 'R')
> > @@ -682,11 +684,14 @@ char *symbol_string(char *buf, char *end, void *ptr,
> >  
> >  #ifdef CONFIG_KALLSYMS
> > if (*fmt == 'B')
> > -   sprint_backtrace(sym, value);
> > +   ret = sprint_backtrace(sym, value);
> > else if (*fmt != 'f' && *fmt != 's')
> > -   sprint_symbol(sym, value);
> > +   ret = sprint_symbol(sym, value);
> > else
> > -   sprint_symbol_no_offset(sym, value);
> > +   ret = sprint_symbol_no_offset(sym, value);
> > +
> > +   if (ret == -1)
> > +   strcpy(sym, sym_not_found);
> 
> 
> This could avoid the unnecessary strcpy if sym_not_found
> was not used at all and this was used instead
> 
>   if (ret == -1)
>   return string(buf, end, "", spec);
> 
>   return string(buf, end, sym, spec);
> 
> or maybe
> 
>   return string(buf, end, ret == -1 ? "" : sum, spec);

Oh, thanks. This is much cleaner. Will re-spin.

thanks,
Tobin.


Re: [PATCH] KVM/Eventfd: Avoid crash when assign and deassign same eventfd in parallel.

2017-12-18 Thread Lan Tianyu
Hi David:
Thanks for your review.

On 2017年12月18日 16:30, David Hildenbrand wrote:
> On 18.12.2017 00:40, Lan Tianyu wrote:
>> Syzroot reports crash in kvm_irqfd_assign() is caused by use-after-free.
>> Because kvm_irqfd_assign() and kvm_irqfd_deassign() can't run in parallel
>> for same eventfd. When assign path hasn't been finished after irqfd
>> has been added to kvm->irqfds.items list, another thead may deassign the
>> eventfd and free struct kvm_kernel_irqfd(). This causes assign path still
>> uses struct kvm_kernel_irqfd freed by deassign path. To avoid such issue,
>> add "initialized" flag in the struct kvm_kernel_irqfd and check the flag 
>> before
>> deactivating irqfd. If irqfd is still in initialization, deassign path
>> return fault.>
>> Reported-by: Dmitry Vyukov 
>> Cc: Paolo Bonzini 
>> Cc: Radim Krčmář 
>> Cc: Dmitry Vyukov 
>> Cc: Wanpeng Li 
>> Signed-off-by: Tianyu Lan 
>> ---
>>  include/linux/kvm_irqfd.h |  1 +
>>  virt/kvm/eventfd.c| 11 +--
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
>> index 76c2fbc..be6b254 100644
>> --- a/include/linux/kvm_irqfd.h
>> +++ b/include/linux/kvm_irqfd.h
>> @@ -66,6 +66,7 @@ struct kvm_kernel_irqfd {
>>  struct work_struct shutdown;
>>  struct irq_bypass_consumer consumer;
>>  struct irq_bypass_producer *producer;
>> +u8 initialized:1;
>>  };
>>  
>>  #endif /* __LINUX_KVM_IRQFD_H */
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> index a334399..80f06e6 100644
>> --- a/virt/kvm/eventfd.c
>> +++ b/virt/kvm/eventfd.c
>> @@ -421,6 +421,7 @@ int  __attribute__((weak)) kvm_arch_update_irqfd_routing(
>>  }
>>  #endif
>>  
>> +irqfd->initialized = 1;
> 
> The ugly thing in kvm_irqfd_assign() is that we access irqfd without
> holding a lock. I think that should rather be fixed than working around
> that issue. (e.g. lock() -> lookup again -> verify still in list ->
> unlock())

The new lock should be always held in assign path otherwise we need to
lookup irqfds list frequently, right?

At first, I tried to use a mutex lock between assign and deassign path
but assign path already involves some locks and add new lock maybe
introduce dead lock. So I used flag check to replace with new lock.

> 
>>  return 0;
>>  
>>  fail:
>> @@ -525,6 +526,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>>  {
> 
> Which tool are you using to generate diffs? git format-patch?

Yes, I used git version 1.8.3.1 :)
This also confused me. I will try newer version.

> 
> Mentioning, because the indicated function here  (kvm_irqfd_deassign)
> 
>>  struct kvm_kernel_irqfd *irqfd, *tmp;
>>  struct eventfd_ctx *eventfd;
>> +int ret = 0;
>>  
>>  eventfd = eventfd_ctx_fdget(args->fd);
>>  if (IS_ERR(eventfd))
>> @@ -543,7 +545,12 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>>  write_seqcount_begin(>irq_entry_sc);
>>  irqfd->irq_entry.type = 0;
>>  write_seqcount_end(>irq_entry_sc);
>> -irqfd_deactivate(irqfd);
>> +
>> +if (irqfd->initialized)
>> +irqfd_deactivate(irqfd);
>> +else
>> +ret = -EFAULT;
>> +
>>  }
>>  }
>>  
>> @@ -557,7 +564,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>>   */
> 
> and here are wrong and misleading (kvm_irqfd_deassig). (and just noticed
> also in the second hunk)
> 
>>  flush_workqueue(irqfd_cleanup_wq);
>>  
>> -return 0;
>> +return ret;
>>  }
>>  
>>  int
>>
> 
> 


-- 
Best regards
Tianyu Lan


Re: [PATCH] KVM/Eventfd: Avoid crash when assign and deassign same eventfd in parallel.

2017-12-18 Thread Lan Tianyu
Hi David:
Thanks for your review.

On 2017年12月18日 16:30, David Hildenbrand wrote:
> On 18.12.2017 00:40, Lan Tianyu wrote:
>> Syzroot reports crash in kvm_irqfd_assign() is caused by use-after-free.
>> Because kvm_irqfd_assign() and kvm_irqfd_deassign() can't run in parallel
>> for same eventfd. When assign path hasn't been finished after irqfd
>> has been added to kvm->irqfds.items list, another thead may deassign the
>> eventfd and free struct kvm_kernel_irqfd(). This causes assign path still
>> uses struct kvm_kernel_irqfd freed by deassign path. To avoid such issue,
>> add "initialized" flag in the struct kvm_kernel_irqfd and check the flag 
>> before
>> deactivating irqfd. If irqfd is still in initialization, deassign path
>> return fault.>
>> Reported-by: Dmitry Vyukov 
>> Cc: Paolo Bonzini 
>> Cc: Radim Krčmář 
>> Cc: Dmitry Vyukov 
>> Cc: Wanpeng Li 
>> Signed-off-by: Tianyu Lan 
>> ---
>>  include/linux/kvm_irqfd.h |  1 +
>>  virt/kvm/eventfd.c| 11 +--
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
>> index 76c2fbc..be6b254 100644
>> --- a/include/linux/kvm_irqfd.h
>> +++ b/include/linux/kvm_irqfd.h
>> @@ -66,6 +66,7 @@ struct kvm_kernel_irqfd {
>>  struct work_struct shutdown;
>>  struct irq_bypass_consumer consumer;
>>  struct irq_bypass_producer *producer;
>> +u8 initialized:1;
>>  };
>>  
>>  #endif /* __LINUX_KVM_IRQFD_H */
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> index a334399..80f06e6 100644
>> --- a/virt/kvm/eventfd.c
>> +++ b/virt/kvm/eventfd.c
>> @@ -421,6 +421,7 @@ int  __attribute__((weak)) kvm_arch_update_irqfd_routing(
>>  }
>>  #endif
>>  
>> +irqfd->initialized = 1;
> 
> The ugly thing in kvm_irqfd_assign() is that we access irqfd without
> holding a lock. I think that should rather be fixed than working around
> that issue. (e.g. lock() -> lookup again -> verify still in list ->
> unlock())

The new lock should be always held in assign path otherwise we need to
lookup irqfds list frequently, right?

At first, I tried to use a mutex lock between assign and deassign path
but assign path already involves some locks and add new lock maybe
introduce dead lock. So I used flag check to replace with new lock.

> 
>>  return 0;
>>  
>>  fail:
>> @@ -525,6 +526,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>>  {
> 
> Which tool are you using to generate diffs? git format-patch?

Yes, I used git version 1.8.3.1 :)
This also confused me. I will try newer version.

> 
> Mentioning, because the indicated function here  (kvm_irqfd_deassign)
> 
>>  struct kvm_kernel_irqfd *irqfd, *tmp;
>>  struct eventfd_ctx *eventfd;
>> +int ret = 0;
>>  
>>  eventfd = eventfd_ctx_fdget(args->fd);
>>  if (IS_ERR(eventfd))
>> @@ -543,7 +545,12 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>>  write_seqcount_begin(>irq_entry_sc);
>>  irqfd->irq_entry.type = 0;
>>  write_seqcount_end(>irq_entry_sc);
>> -irqfd_deactivate(irqfd);
>> +
>> +if (irqfd->initialized)
>> +irqfd_deactivate(irqfd);
>> +else
>> +ret = -EFAULT;
>> +
>>  }
>>  }
>>  
>> @@ -557,7 +564,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>>   */
> 
> and here are wrong and misleading (kvm_irqfd_deassig). (and just noticed
> also in the second hunk)
> 
>>  flush_workqueue(irqfd_cleanup_wq);
>>  
>> -return 0;
>> +return ret;
>>  }
>>  
>>  int
>>
> 
> 


-- 
Best regards
Tianyu Lan


Re: [PATCH v10 1/5] add infrastructure for tagging functions as error injectable

2017-12-18 Thread Masami Hiramatsu
On Fri, 15 Dec 2017 14:12:52 -0500
Josef Bacik  wrote:

> From: Josef Bacik 
> 
> Using BPF we can override kprob'ed functions and return arbitrary
> values.  Obviously this can be a bit unsafe, so make this feature opt-in
> for functions.  Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
> order to give BPF access to that function for error injection purposes.
> 
> Signed-off-by: Josef Bacik 
> Acked-by: Ingo Molnar 
> ---
>  include/asm-generic/vmlinux.lds.h |  10 +++
>  include/linux/bpf.h   |  11 +++
>  include/linux/kprobes.h   |   1 +
>  include/linux/module.h|   5 ++
>  kernel/kprobes.c  | 163 
> ++
>  kernel/module.c   |   6 +-
>  6 files changed, 195 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index ee8b707d9fa9..a2e8582d094a 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -136,6 +136,15 @@
>  #define KPROBE_BLACKLIST()
>  #endif
>  
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +#define ERROR_INJECT_LIST()  . = ALIGN(8);   
> \
> + 
> VMLINUX_SYMBOL(__start_kprobe_error_inject_list) = .;   \
> + KEEP(*(_kprobe_error_inject_list))  
> \
> + VMLINUX_SYMBOL(__stop_kprobe_error_inject_list) 
> = .;
> +#else
> +#define ERROR_INJECT_LIST()
> +#endif
> +
>  #ifdef CONFIG_EVENT_TRACING
>  #define FTRACE_EVENTS()  . = ALIGN(8);   
> \
>   VMLINUX_SYMBOL(__start_ftrace_events) = .;  \
> @@ -564,6 +573,7 @@
>   FTRACE_EVENTS() \
>   TRACE_SYSCALLS()\
>   KPROBE_BLACKLIST()  \
> + ERROR_INJECT_LIST() \
>   MEM_DISCARD(init.rodata)\
>   CLK_OF_TABLES() \
>   RESERVEDMEM_OF_TABLES() \
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e55e4255a210..7f4d2a953173 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -576,4 +576,15 @@ extern const struct bpf_func_proto 
> bpf_sock_map_update_proto;
>  void bpf_user_rnd_init_once(void);
>  u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
>  
> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE

BTW, CONFIG_BPF_KPROBE_OVERRIDE is also confusable name.
Since this feature override a function to just return with
some return value (as far as I understand, or would you
also plan to modify execution path inside a function?),
I think it should be better CONFIG_BPF_FUNCTION_OVERRIDE or
CONFIG_BPF_EXECUTION_OVERRIDE.

Indeed, BPF is based on kprobes, but it seems you are limiting it
with ftrace (function-call trace) (I'm not sure the reason why),
so using "kprobes" for this feature seems strange for me.

The idea in this patch itself (marking injectable function on
a list) is OK to me. 

Thank you,

> +#define BPF_ALLOW_ERROR_INJECTION(fname) \
> +static unsigned long __used  \
> + __attribute__((__section__("_kprobe_error_inject_list")))   \
> + _eil_addr_##fname = (unsigned long)fname;
> +#else
> +#define BPF_ALLOW_ERROR_INJECTION(fname)
> +#endif
> +#endif
> +
>  #endif /* _LINUX_BPF_H */
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9440a2fc8893..963fd364f3d6 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -271,6 +271,7 @@ extern bool arch_kprobe_on_func_entry(unsigned long 
> offset);
>  extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, 
> unsigned long offset);
>  
>  extern bool within_kprobe_blacklist(unsigned long addr);
> +extern bool within_kprobe_error_injection_list(unsigned long addr);
>  
>  struct kprobe_insn_cache {
>   struct mutex mutex;
> diff --git a/include/linux/module.h b/include/linux/module.h
> index c69b49abe877..548fa09fa806 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -475,6 +475,11 @@ struct module {
>   ctor_fn_t *ctors;
>   unsigned int num_ctors;
>  #endif
> +
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> + unsigned int num_kprobe_ei_funcs;
> + unsigned long *kprobe_ei_funcs;
> +#endif
>  } cacheline_aligned __randomize_layout;
>  #ifndef MODULE_ARCH_INIT
>  #define MODULE_ARCH_INIT {}
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index da2ccf142358..b4aab48ad258 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -83,6 

Re: [PATCH v10 1/5] add infrastructure for tagging functions as error injectable

2017-12-18 Thread Masami Hiramatsu
On Fri, 15 Dec 2017 14:12:52 -0500
Josef Bacik  wrote:

> From: Josef Bacik 
> 
> Using BPF we can override kprob'ed functions and return arbitrary
> values.  Obviously this can be a bit unsafe, so make this feature opt-in
> for functions.  Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
> order to give BPF access to that function for error injection purposes.
> 
> Signed-off-by: Josef Bacik 
> Acked-by: Ingo Molnar 
> ---
>  include/asm-generic/vmlinux.lds.h |  10 +++
>  include/linux/bpf.h   |  11 +++
>  include/linux/kprobes.h   |   1 +
>  include/linux/module.h|   5 ++
>  kernel/kprobes.c  | 163 
> ++
>  kernel/module.c   |   6 +-
>  6 files changed, 195 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index ee8b707d9fa9..a2e8582d094a 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -136,6 +136,15 @@
>  #define KPROBE_BLACKLIST()
>  #endif
>  
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +#define ERROR_INJECT_LIST()  . = ALIGN(8);   
> \
> + 
> VMLINUX_SYMBOL(__start_kprobe_error_inject_list) = .;   \
> + KEEP(*(_kprobe_error_inject_list))  
> \
> + VMLINUX_SYMBOL(__stop_kprobe_error_inject_list) 
> = .;
> +#else
> +#define ERROR_INJECT_LIST()
> +#endif
> +
>  #ifdef CONFIG_EVENT_TRACING
>  #define FTRACE_EVENTS()  . = ALIGN(8);   
> \
>   VMLINUX_SYMBOL(__start_ftrace_events) = .;  \
> @@ -564,6 +573,7 @@
>   FTRACE_EVENTS() \
>   TRACE_SYSCALLS()\
>   KPROBE_BLACKLIST()  \
> + ERROR_INJECT_LIST() \
>   MEM_DISCARD(init.rodata)\
>   CLK_OF_TABLES() \
>   RESERVEDMEM_OF_TABLES() \
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e55e4255a210..7f4d2a953173 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -576,4 +576,15 @@ extern const struct bpf_func_proto 
> bpf_sock_map_update_proto;
>  void bpf_user_rnd_init_once(void);
>  u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
>  
> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE

BTW, CONFIG_BPF_KPROBE_OVERRIDE is also confusable name.
Since this feature override a function to just return with
some return value (as far as I understand, or would you
also plan to modify execution path inside a function?),
I think it should be better CONFIG_BPF_FUNCTION_OVERRIDE or
CONFIG_BPF_EXECUTION_OVERRIDE.

Indeed, BPF is based on kprobes, but it seems you are limiting it
with ftrace (function-call trace) (I'm not sure the reason why),
so using "kprobes" for this feature seems strange for me.

The idea in this patch itself (marking injectable function on
a list) is OK to me. 

Thank you,

> +#define BPF_ALLOW_ERROR_INJECTION(fname) \
> +static unsigned long __used  \
> + __attribute__((__section__("_kprobe_error_inject_list")))   \
> + _eil_addr_##fname = (unsigned long)fname;
> +#else
> +#define BPF_ALLOW_ERROR_INJECTION(fname)
> +#endif
> +#endif
> +
>  #endif /* _LINUX_BPF_H */
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9440a2fc8893..963fd364f3d6 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -271,6 +271,7 @@ extern bool arch_kprobe_on_func_entry(unsigned long 
> offset);
>  extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, 
> unsigned long offset);
>  
>  extern bool within_kprobe_blacklist(unsigned long addr);
> +extern bool within_kprobe_error_injection_list(unsigned long addr);
>  
>  struct kprobe_insn_cache {
>   struct mutex mutex;
> diff --git a/include/linux/module.h b/include/linux/module.h
> index c69b49abe877..548fa09fa806 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -475,6 +475,11 @@ struct module {
>   ctor_fn_t *ctors;
>   unsigned int num_ctors;
>  #endif
> +
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> + unsigned int num_kprobe_ei_funcs;
> + unsigned long *kprobe_ei_funcs;
> +#endif
>  } cacheline_aligned __randomize_layout;
>  #ifndef MODULE_ARCH_INIT
>  #define MODULE_ARCH_INIT {}
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index da2ccf142358..b4aab48ad258 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -83,6 +83,16 @@ static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned 
> 

[PATCH] fs: add kernel-doc entry for function argument

2017-12-18 Thread Tobin C. Harding
Building kernel docs causes SPHINX to emit

warning: No description found for parameter 'rcu'

Add kernel-doc description for parameter 'rcu'

Signed-off-by: Tobin C. Harding 
---

Description taken from comments in update_ovl_inode_times()

/*
 * Nothing to do if in rcu or if non-overlayfs
 */

 fs/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/inode.c b/fs/inode.c
index 03102d6ef044..0530b5b5e76f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1669,6 +1669,7 @@ static int update_time(struct inode *inode, struct 
timespec *time, int flags)
  * touch_atime -   update the access time
  * @path: the  path to update
  * @inode: inode to update
+ * @rcu: true if in rcu
  *
  * Update the accessed time on an inode and mark it for writeback.
  * This function automatically handles read only file systems and media,
-- 
2.7.4



[PATCH] fs: add kernel-doc entry for function argument

2017-12-18 Thread Tobin C. Harding
Building kernel docs causes SPHINX to emit

warning: No description found for parameter 'rcu'

Add kernel-doc description for parameter 'rcu'

Signed-off-by: Tobin C. Harding 
---

Description taken from comments in update_ovl_inode_times()

/*
 * Nothing to do if in rcu or if non-overlayfs
 */

 fs/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/inode.c b/fs/inode.c
index 03102d6ef044..0530b5b5e76f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1669,6 +1669,7 @@ static int update_time(struct inode *inode, struct 
timespec *time, int flags)
  * touch_atime -   update the access time
  * @path: the  path to update
  * @inode: inode to update
+ * @rcu: true if in rcu
  *
  * Update the accessed time on an inode and mark it for writeback.
  * This function automatically handles read only file systems and media,
-- 
2.7.4



[PATCH v2] crypto: AF_ALG - limit mask and type

2017-12-18 Thread Stephan Müller
The user space interface allows specifying the type and the mask field
used to allocate the cipher. As user space can precisely select the
desired cipher by using either the name or the driver name, additional
selection options for cipher are not considered necessary and relevant
for user space.

This fixes a bug where user space is able to cause one cipher to be
registered multiple times potentially exhausting kernel memory.

Reported-by: syzbot 
Cc: 
Signed-off-by: Stephan Mueller 
---
 crypto/af_alg.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 1e5353f62067..4f4cfc5a7ef3 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -150,7 +150,6 @@ EXPORT_SYMBOL_GPL(af_alg_release_parent);
 
 static int alg_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
-   const u32 forbidden = CRYPTO_ALG_INTERNAL;
struct sock *sk = sock->sk;
struct alg_sock *ask = alg_sk(sk);
struct sockaddr_alg *sa = (void *)uaddr;
@@ -176,9 +175,12 @@ static int alg_bind(struct socket *sock, struct sockaddr 
*uaddr, int addr_len)
if (IS_ERR(type))
return PTR_ERR(type);
 
-   private = type->bind(sa->salg_name,
-sa->salg_feat & ~forbidden,
-sa->salg_mask & ~forbidden);
+   /*
+* The use of the salg_feat and salg_mask are forbidden as they expose
+* too much of the low-level handling which is not suitable for
+* hostile code.
+*/
+   private = type->bind(sa->salg_name, 0, 0);
if (IS_ERR(private)) {
module_put(type->owner);
return PTR_ERR(private);
-- 
2.14.3




[PATCH v2] crypto: AF_ALG - limit mask and type

2017-12-18 Thread Stephan Müller
The user space interface allows specifying the type and the mask field
used to allocate the cipher. As user space can precisely select the
desired cipher by using either the name or the driver name, additional
selection options for cipher are not considered necessary and relevant
for user space.

This fixes a bug where user space is able to cause one cipher to be
registered multiple times potentially exhausting kernel memory.

Reported-by: syzbot 
Cc: 
Signed-off-by: Stephan Mueller 
---
 crypto/af_alg.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 1e5353f62067..4f4cfc5a7ef3 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -150,7 +150,6 @@ EXPORT_SYMBOL_GPL(af_alg_release_parent);
 
 static int alg_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
-   const u32 forbidden = CRYPTO_ALG_INTERNAL;
struct sock *sk = sock->sk;
struct alg_sock *ask = alg_sk(sk);
struct sockaddr_alg *sa = (void *)uaddr;
@@ -176,9 +175,12 @@ static int alg_bind(struct socket *sock, struct sockaddr 
*uaddr, int addr_len)
if (IS_ERR(type))
return PTR_ERR(type);
 
-   private = type->bind(sa->salg_name,
-sa->salg_feat & ~forbidden,
-sa->salg_mask & ~forbidden);
+   /*
+* The use of the salg_feat and salg_mask are forbidden as they expose
+* too much of the low-level handling which is not suitable for
+* hostile code.
+*/
+   private = type->bind(sa->salg_name, 0, 0);
if (IS_ERR(private)) {
module_put(type->owner);
return PTR_ERR(private);
-- 
2.14.3




RE: [PATCH 0/3] Use mm_struct and switch_mm() instead of manually

2017-12-18 Thread Prakhya, Sai Praneeth
> > Changes in V3:
> > 1. When CPUMASK_OFFSTACK is enabled, switch_mm_irqs_off() sets cpumask
> > by calling cpumask_set_cpu(). This panics kernel as efi_mm is not
> > initialized, therefore initialize efi_mm in efi_alloc_page_tables().
> 
> Thanks for the v3.
> 
> I confirmed that the issue I saw with the v2 when I enabled 'efi=debug' on the
> sgi-uv 300 machine (i.e the NULL pointer access while accessing
> mm_cpumask(next), in the function call
> 'switch_mm_irqs_off') is fixed in the v3.
> 
> Also as I noted during the v2 review, introducing the 'efi_switch_mm() ' 
> helper
> instead of manually twiddling with %cr3 seems more cleaner (having personally
> debugged this leg several times on the underlying x86 EFI machines).
> 
> So in addition to me testing this on the sgi-uv300 and Dell Optiplex EFI 
> enabled
> machine, please feel free to add:
> 
> Reviewed-by: Bhupesh Sharma 
> 

Hi  Bhupesh,

Thanks for the review and re-iterating the usefulness of this patch set.

Regards,
Sai


RE: [PATCH 0/3] Use mm_struct and switch_mm() instead of manually

2017-12-18 Thread Prakhya, Sai Praneeth
> > Changes in V3:
> > 1. When CPUMASK_OFFSTACK is enabled, switch_mm_irqs_off() sets cpumask
> > by calling cpumask_set_cpu(). This panics kernel as efi_mm is not
> > initialized, therefore initialize efi_mm in efi_alloc_page_tables().
> 
> Thanks for the v3.
> 
> I confirmed that the issue I saw with the v2 when I enabled 'efi=debug' on the
> sgi-uv 300 machine (i.e the NULL pointer access while accessing
> mm_cpumask(next), in the function call
> 'switch_mm_irqs_off') is fixed in the v3.
> 
> Also as I noted during the v2 review, introducing the 'efi_switch_mm() ' 
> helper
> instead of manually twiddling with %cr3 seems more cleaner (having personally
> debugged this leg several times on the underlying x86 EFI machines).
> 
> So in addition to me testing this on the sgi-uv300 and Dell Optiplex EFI 
> enabled
> machine, please feel free to add:
> 
> Reviewed-by: Bhupesh Sharma 
> 

Hi  Bhupesh,

Thanks for the review and re-iterating the usefulness of this patch set.

Regards,
Sai


Re: [PATCH v2 2/3] vsprintf: print if symbol not found

2017-12-18 Thread Joe Perches
On Tue, 2017-12-19 at 14:28 +1100, Tobin C. Harding wrote:
> Depends on: commit 40eee173a35e ("kallsyms: don't leak address when
> symbol not found")
> 
> Currently vsprintf for specifiers %p[SsB] relies on the behaviour of
> kallsyms (sprint_symbol()) and prints the actual address if a symbol is
> not found. Previous patch changes this behaviour so that sprint_symbol()
> returns an error if symbol not found. With this patch in place we can
> print a sanitized message '' instead of leaking the
> address.
> 
> Print '' for printk specifier %p[sSB] if symbol look
> up fails.
> 
> Signed-off-by: Tobin C. Harding 
> ---
>  lib/vsprintf.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 01c3957b2de6..820ed4fe6e6c 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -674,6 +674,8 @@ char *symbol_string(char *buf, char *end, void *ptr,
>   unsigned long value;
>  #ifdef CONFIG_KALLSYMS
>   char sym[KSYM_SYMBOL_LEN];
> + const char *sym_not_found = "";

This will be reinitialized on every use.

> + int ret;
>  #endif
>  
>   if (fmt[1] == 'R')
> @@ -682,11 +684,14 @@ char *symbol_string(char *buf, char *end, void *ptr,
>  
>  #ifdef CONFIG_KALLSYMS
>   if (*fmt == 'B')
> - sprint_backtrace(sym, value);
> + ret = sprint_backtrace(sym, value);
>   else if (*fmt != 'f' && *fmt != 's')
> - sprint_symbol(sym, value);
> + ret = sprint_symbol(sym, value);
>   else
> - sprint_symbol_no_offset(sym, value);
> + ret = sprint_symbol_no_offset(sym, value);
> +
> + if (ret == -1)
> + strcpy(sym, sym_not_found);


This could avoid the unnecessary strcpy if sym_not_found
was not used at all and this was used instead

if (ret == -1)
return string(buf, end, "", spec);

return string(buf, end, sym, spec);

or maybe

return string(buf, end, ret == -1 ? "" : sum, spec);

>  
>   return string(buf, end, sym, spec);
>  #else


Re: [PATCH v2 2/3] vsprintf: print if symbol not found

2017-12-18 Thread Joe Perches
On Tue, 2017-12-19 at 14:28 +1100, Tobin C. Harding wrote:
> Depends on: commit 40eee173a35e ("kallsyms: don't leak address when
> symbol not found")
> 
> Currently vsprintf for specifiers %p[SsB] relies on the behaviour of
> kallsyms (sprint_symbol()) and prints the actual address if a symbol is
> not found. Previous patch changes this behaviour so that sprint_symbol()
> returns an error if symbol not found. With this patch in place we can
> print a sanitized message '' instead of leaking the
> address.
> 
> Print '' for printk specifier %p[sSB] if symbol look
> up fails.
> 
> Signed-off-by: Tobin C. Harding 
> ---
>  lib/vsprintf.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 01c3957b2de6..820ed4fe6e6c 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -674,6 +674,8 @@ char *symbol_string(char *buf, char *end, void *ptr,
>   unsigned long value;
>  #ifdef CONFIG_KALLSYMS
>   char sym[KSYM_SYMBOL_LEN];
> + const char *sym_not_found = "";

This will be reinitialized on every use.

> + int ret;
>  #endif
>  
>   if (fmt[1] == 'R')
> @@ -682,11 +684,14 @@ char *symbol_string(char *buf, char *end, void *ptr,
>  
>  #ifdef CONFIG_KALLSYMS
>   if (*fmt == 'B')
> - sprint_backtrace(sym, value);
> + ret = sprint_backtrace(sym, value);
>   else if (*fmt != 'f' && *fmt != 's')
> - sprint_symbol(sym, value);
> + ret = sprint_symbol(sym, value);
>   else
> - sprint_symbol_no_offset(sym, value);
> + ret = sprint_symbol_no_offset(sym, value);
> +
> + if (ret == -1)
> + strcpy(sym, sym_not_found);


This could avoid the unnecessary strcpy if sym_not_found
was not used at all and this was used instead

if (ret == -1)
return string(buf, end, "", spec);

return string(buf, end, sym, spec);

or maybe

return string(buf, end, ret == -1 ? "" : sum, spec);

>  
>   return string(buf, end, sym, spec);
>  #else


[PATCH] iio: add kernel-doc '@owner'

2017-12-18 Thread Tobin C. Harding
When building kernel documentation sphinx emits the following warning

warning: No description found for parameter 'owner'

Add description for struct member 'owner'.

Signed-off-by: Tobin C. Harding 
---

I couldn't work out what DRIVER/INTERN were for so I took a guess.

 include/linux/iio/trigger.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iio/trigger.h b/include/linux/iio/trigger.h
index 7d5e44518379..d884b5099cd0 100644
--- a/include/linux/iio/trigger.h
+++ b/include/linux/iio/trigger.h
@@ -43,6 +43,7 @@ struct iio_trigger_ops {
 /**
  * struct iio_trigger - industrial I/O trigger device
  * @ops:   [DRIVER] operations structure
+ * @owner: [DRIVER] owner of this driver module
  * @id:[INTERN] unique id number
  * @name:  [DRIVER] unique name
  * @dev:   [DRIVER] associated device (if relevant)
-- 
2.7.4



[PATCH] iio: add kernel-doc '@owner'

2017-12-18 Thread Tobin C. Harding
When building kernel documentation sphinx emits the following warning

warning: No description found for parameter 'owner'

Add description for struct member 'owner'.

Signed-off-by: Tobin C. Harding 
---

I couldn't work out what DRIVER/INTERN were for so I took a guess.

 include/linux/iio/trigger.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iio/trigger.h b/include/linux/iio/trigger.h
index 7d5e44518379..d884b5099cd0 100644
--- a/include/linux/iio/trigger.h
+++ b/include/linux/iio/trigger.h
@@ -43,6 +43,7 @@ struct iio_trigger_ops {
 /**
  * struct iio_trigger - industrial I/O trigger device
  * @ops:   [DRIVER] operations structure
+ * @owner: [DRIVER] owner of this driver module
  * @id:[INTERN] unique id number
  * @name:  [DRIVER] unique name
  * @dev:   [DRIVER] associated device (if relevant)
-- 
2.7.4



Re: [PATCH] USB: serial: option: adding support for YUGA CLM920-NC5

2017-12-18 Thread Bjørn Mork
"SZ Lin (林上智)"  writes:
>> Johan Hovold  writes:
>> 
>> >> +static const struct option_blacklist_info yuga_clm920_nc5_blacklist = {
>> >> + .reserved = BIT(0) | BIT(1) | BIT(4), };
>> >
>> > Do you really need to blacklist the first interface?
>> 
>> Good question. Interface #0 does look a lot like a Qualcomm DM/DIAG 
>> function, based
>> on two bulk endpoints, no additional descriptors and the fact that it is the 
>> first interface.
>> If so, then we do want a serial driver for it.  There is a basic libqcdm 
>> implementation in
>> ModemManager if you want to test it out.
>> 
>
> I have confirmed that interface #0 is QCDM/DIAG port in this module,
>and thus I will remove this from reserved list in next patch.
>Furthermore, interface #1 is ADB port. Should I also remove this from
>reserved list?

No. ADB is handled by userspace tools using libusb.  It should not be
bound to any serial driver, so you will need to blacklist it.  But you
need to keep the blacklist anyway to include the QCDM/DIAG port

I assume Johan's alternative was to match class/subclass/protocol
against ff/00/00, which would have worked if you only wanted to include
interfaces 2 and 3.

>> And I expect interface #4 is QMI/rmnet?  Feel free to confirm that 
>> assumption with a
>> patch against qmi_wwan :-)
>> 
> Yes, it is. I will send qmi_wwan patch by all means.

Thanks


Bjørn


Re: [PATCH] USB: serial: option: adding support for YUGA CLM920-NC5

2017-12-18 Thread Bjørn Mork
"SZ Lin (林上智)"  writes:
>> Johan Hovold  writes:
>> 
>> >> +static const struct option_blacklist_info yuga_clm920_nc5_blacklist = {
>> >> + .reserved = BIT(0) | BIT(1) | BIT(4), };
>> >
>> > Do you really need to blacklist the first interface?
>> 
>> Good question. Interface #0 does look a lot like a Qualcomm DM/DIAG 
>> function, based
>> on two bulk endpoints, no additional descriptors and the fact that it is the 
>> first interface.
>> If so, then we do want a serial driver for it.  There is a basic libqcdm 
>> implementation in
>> ModemManager if you want to test it out.
>> 
>
> I have confirmed that interface #0 is QCDM/DIAG port in this module,
>and thus I will remove this from reserved list in next patch.
>Furthermore, interface #1 is ADB port. Should I also remove this from
>reserved list?

No. ADB is handled by userspace tools using libusb.  It should not be
bound to any serial driver, so you will need to blacklist it.  But you
need to keep the blacklist anyway to include the QCDM/DIAG port

I assume Johan's alternative was to match class/subclass/protocol
against ff/00/00, which would have worked if you only wanted to include
interfaces 2 and 3.

>> And I expect interface #4 is QMI/rmnet?  Feel free to confirm that 
>> assumption with a
>> patch against qmi_wwan :-)
>> 
> Yes, it is. I will send qmi_wwan patch by all means.

Thanks


Bjørn


RE: [PATCH] USB: serial: option: adding support for YUGA CLM920-NC5

2017-12-18 Thread 林上智
> -Original Message-
> From: Bjørn Mork [mailto:bj...@mork.no]
> Sent: Tuesday, December 19, 2017 2:44 AM
> To: SZ Lin (林上智)
> Cc: Johan Hovold; Taiyi TY Wu (吳泰毅); Greg Kroah-Hartman;
> linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] USB: serial: option: adding support for YUGA CLM920-NC5
> 
> Johan Hovold  writes:
> 
> >> +static const struct option_blacklist_info yuga_clm920_nc5_blacklist = {
> >> +  .reserved = BIT(0) | BIT(1) | BIT(4), };
> >
> > Do you really need to blacklist the first interface?
> 
> Good question. Interface #0 does look a lot like a Qualcomm DM/DIAG function, 
> based
> on two bulk endpoints, no additional descriptors and the fact that it is the 
> first interface.
> If so, then we do want a serial driver for it.  There is a basic libqcdm 
> implementation in
> ModemManager if you want to test it out.
> 

I have confirmed that interface #0 is QCDM/DIAG port in this module, and thus I 
will remove this from reserved list in next patch.
Furthermore, interface #1 is ADB port. Should I also remove this from reserved 
list?

> And I expect interface #4 is QMI/rmnet?  Feel free to confirm that assumption 
> with a
> patch against qmi_wwan :-)
> 
Yes, it is. I will send qmi_wwan patch by all means.
> 
> Bjørn

SZ


RE: [PATCH] USB: serial: option: adding support for YUGA CLM920-NC5

2017-12-18 Thread 林上智
> -Original Message-
> From: Bjørn Mork [mailto:bj...@mork.no]
> Sent: Tuesday, December 19, 2017 2:44 AM
> To: SZ Lin (林上智)
> Cc: Johan Hovold; Taiyi TY Wu (吳泰毅); Greg Kroah-Hartman;
> linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] USB: serial: option: adding support for YUGA CLM920-NC5
> 
> Johan Hovold  writes:
> 
> >> +static const struct option_blacklist_info yuga_clm920_nc5_blacklist = {
> >> +  .reserved = BIT(0) | BIT(1) | BIT(4), };
> >
> > Do you really need to blacklist the first interface?
> 
> Good question. Interface #0 does look a lot like a Qualcomm DM/DIAG function, 
> based
> on two bulk endpoints, no additional descriptors and the fact that it is the 
> first interface.
> If so, then we do want a serial driver for it.  There is a basic libqcdm 
> implementation in
> ModemManager if you want to test it out.
> 

I have confirmed that interface #0 is QCDM/DIAG port in this module, and thus I 
will remove this from reserved list in next patch.
Furthermore, interface #1 is ADB port. Should I also remove this from reserved 
list?

> And I expect interface #4 is QMI/rmnet?  Feel free to confirm that assumption 
> with a
> patch against qmi_wwan :-)
> 
Yes, it is. I will send qmi_wwan patch by all means.
> 
> Bjørn

SZ


net/wireless/certs/*.x509 binary files

2017-12-18 Thread Randy Dunlap
This is just an FYI/acknowledgment that net/wireless/certs/*.x509
binary file(s) practically kills use of Linux kernel tarballs.

Of course, someone can always enable EXPERT and CFG80211_CERTIFICATION_ONUS
and disable the REGDB kconfig symbols to get around this.
Oh, and then chmod +x tools/objtool/sync-check.sh (unrelated problem).

Then you are good to go. :)

Background:
I was getting build errors on net/wireless/shipped-certs.o and the
build log didn't help me at all. It just said something like,
Error: build failed on net/wireless/shipped-certs.o.
Even building with V=1 didn't help.
So I finally discovered the reason and worked around it.

-- 
~Randy

PS:  Yes, I know about git.


net/wireless/certs/*.x509 binary files

2017-12-18 Thread Randy Dunlap
This is just an FYI/acknowledgment that net/wireless/certs/*.x509
binary file(s) practically kills use of Linux kernel tarballs.

Of course, someone can always enable EXPERT and CFG80211_CERTIFICATION_ONUS
and disable the REGDB kconfig symbols to get around this.
Oh, and then chmod +x tools/objtool/sync-check.sh (unrelated problem).

Then you are good to go. :)

Background:
I was getting build errors on net/wireless/shipped-certs.o and the
build log didn't help me at all. It just said something like,
Error: build failed on net/wireless/shipped-certs.o.
Even building with V=1 didn't help.
So I finally discovered the reason and worked around it.

-- 
~Randy

PS:  Yes, I know about git.


Re: linux-next: Signed-off-by missing for commit in the scsi tree

2017-12-18 Thread chenxiang (M)

在 2017/12/19 12:27, Martin K. Petersen 写道:

Stephen,


   9188808c7760 ("scsi: hisi_sas: fix SAS_QUEUE_FULL problem while running IO")

is missing a Signed-off-by from its author.

John?


Hi Stephen and Martin,
John is on vacation. I have checked it and please add a "Signed-off-by: 
Xiang Chen " or let me know if want us to 
re-send this patch again .









Re: linux-next: Signed-off-by missing for commit in the scsi tree

2017-12-18 Thread chenxiang (M)

在 2017/12/19 12:27, Martin K. Petersen 写道:

Stephen,


   9188808c7760 ("scsi: hisi_sas: fix SAS_QUEUE_FULL problem while running IO")

is missing a Signed-off-by from its author.

John?


Hi Stephen and Martin,
John is on vacation. I have checked it and please add a "Signed-off-by: 
Xiang Chen " or let me know if want us to 
re-send this patch again .









[PATCH] iio: fix SPHINX kernel-docs build warning

2017-12-18 Thread Tobin C. Harding
When building kernel documentation sphinx emits the following warnings

No description found for parameter 'iio_dev'
Excess function parameter 'indio_dev' description in 'iio_device_register'

These warnings are caused by a macro with a different argument identifier to the
one listed in the kernel-docs.

Change macro argument to be the same as the docs.

Signed-off-by: Tobin C. Harding 
---
 include/linux/iio/iio.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h
index 20b61347ea58..1234e5571d45 100644
--- a/include/linux/iio/iio.h
+++ b/include/linux/iio/iio.h
@@ -606,8 +606,8 @@ const struct iio_chan_spec
  * iio_device_register() - register a device with the IIO subsystem
  * @indio_dev: Device structure filled by the device driver
  **/
-#define iio_device_register(iio_dev) \
-   __iio_device_register((iio_dev), THIS_MODULE)
+#define iio_device_register(indio_dev) \
+   __iio_device_register((indio_dev), THIS_MODULE)
 int __iio_device_register(struct iio_dev *indio_dev, struct module *this_mod);
 void iio_device_unregister(struct iio_dev *indio_dev);
 /**
-- 
2.7.4



[PATCH] iio: fix SPHINX kernel-docs build warning

2017-12-18 Thread Tobin C. Harding
When building kernel documentation sphinx emits the following warnings

No description found for parameter 'iio_dev'
Excess function parameter 'indio_dev' description in 'iio_device_register'

These warnings are caused by a macro with a different argument identifier to the
one listed in the kernel-docs.

Change macro argument to be the same as the docs.

Signed-off-by: Tobin C. Harding 
---
 include/linux/iio/iio.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h
index 20b61347ea58..1234e5571d45 100644
--- a/include/linux/iio/iio.h
+++ b/include/linux/iio/iio.h
@@ -606,8 +606,8 @@ const struct iio_chan_spec
  * iio_device_register() - register a device with the IIO subsystem
  * @indio_dev: Device structure filled by the device driver
  **/
-#define iio_device_register(iio_dev) \
-   __iio_device_register((iio_dev), THIS_MODULE)
+#define iio_device_register(indio_dev) \
+   __iio_device_register((indio_dev), THIS_MODULE)
 int __iio_device_register(struct iio_dev *indio_dev, struct module *this_mod);
 void iio_device_unregister(struct iio_dev *indio_dev);
 /**
-- 
2.7.4



[PATCH V2] USBIP: return correct port ENABLE status

2017-12-18 Thread pei . zhang
From: Pei Zhang 

USB system will clear port's ENABLE feature for some USB devices when
vdev is already assigned port address. This cause getPortStatus reports
to system that this device is not enabled, client OS will failed to use
this usb device.

The failure devices include a SAMSUNG SSD storage, Logitech webcam C920.

V2: send again to all related maintainers.

Signed-off-by: Pei Zhang 
---
 drivers/usb/usbip/vhci_hcd.c | 63 
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
index 713e941..7970bab 100644
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -430,38 +430,45 @@ static int vhci_hub_control(struct usb_hcd *hcd, u16 
typeReq, u16 wValue,
vhci_hcd->re_timeout = 0;
}
 
-   if ((vhci_hcd->port_status[rhport] & (1 << 
USB_PORT_FEAT_RESET)) !=
-   0 && time_after(jiffies, vhci_hcd->re_timeout)) {
-   vhci_hcd->port_status[rhport] |= (1 << 
USB_PORT_FEAT_C_RESET);
-   vhci_hcd->port_status[rhport] &= ~(1 << 
USB_PORT_FEAT_RESET);
-   vhci_hcd->re_timeout = 0;
-
-   if (vhci_hcd->vdev[rhport].ud.status ==
-   VDEV_ST_NOTASSIGNED) {
-   usbip_dbg_vhci_rh(
-   " enable rhport %d (status %u)\n",
-   rhport,
-   vhci_hcd->vdev[rhport].ud.status);
-   vhci_hcd->port_status[rhport] |=
-   USB_PORT_STAT_ENABLE;
-   }
-
-   if (hcd->speed < HCD_USB3) {
-   switch (vhci_hcd->vdev[rhport].speed) {
-   case USB_SPEED_HIGH:
-   vhci_hcd->port_status[rhport] |=
- USB_PORT_STAT_HIGH_SPEED;
-   break;
-   case USB_SPEED_LOW:
+   if ((vhci_hcd->port_status[rhport] & (1 << 
USB_PORT_FEAT_RESET))) {
+   if (time_after(jiffies, vhci_hcd->re_timeout)) {
+   vhci_hcd->port_status[rhport] |= (1 << 
USB_PORT_FEAT_C_RESET);
+   vhci_hcd->port_status[rhport] &= ~(1 << 
USB_PORT_FEAT_RESET);
+   vhci_hcd->re_timeout = 0;
+
+   if (vhci_hcd->vdev[rhport].ud.status ==
+   VDEV_ST_NOTASSIGNED) {
+   usbip_dbg_vhci_rh(
+   " enable rhport %d (status 
%u)\n",
+   rhport, 
vhci_hcd->vdev[rhport].ud.status);
vhci_hcd->port_status[rhport] |=
-   USB_PORT_STAT_LOW_SPEED;
-   break;
-   default:
-   pr_err("vhci_device speed not set\n");
-   break;
+   USB_PORT_STAT_ENABLE;
+   }
+
+   if (hcd->speed < HCD_USB3) {
+   switch (vhci_hcd->vdev[rhport].speed) {
+   case USB_SPEED_HIGH:
+   vhci_hcd->port_status[rhport] |=
+   
USB_PORT_STAT_HIGH_SPEED;
+   break;
+   case USB_SPEED_LOW:
+   vhci_hcd->port_status[rhport] |=
+   USB_PORT_STAT_LOW_SPEED;
+   break;
+   default:
+   pr_err("vhci_device speed not 
set\n");
+   break;
+   }
}
}
+   } else {
+   /* Port would be disabled by clearing FEAT_ENABLE,
+* make it enabled again here.
+*/
+   if (vhci_hcd->vdev[rhport].ud.status == VDEV_ST_USED)
+   vhci_hcd->port_status[rhport] |= 
USB_PORT_STAT_ENABLE;
}
+
((__le16 *) buf)[0] = 
cpu_to_le16(vhci_hcd->port_status[rhport]);
((__le16 *) buf)[1] =

[PATCH V2] USBIP: return correct port ENABLE status

2017-12-18 Thread pei . zhang
From: Pei Zhang 

USB system will clear port's ENABLE feature for some USB devices when
vdev is already assigned port address. This cause getPortStatus reports
to system that this device is not enabled, client OS will failed to use
this usb device.

The failure devices include a SAMSUNG SSD storage, Logitech webcam C920.

V2: send again to all related maintainers.

Signed-off-by: Pei Zhang 
---
 drivers/usb/usbip/vhci_hcd.c | 63 
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
index 713e941..7970bab 100644
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -430,38 +430,45 @@ static int vhci_hub_control(struct usb_hcd *hcd, u16 
typeReq, u16 wValue,
vhci_hcd->re_timeout = 0;
}
 
-   if ((vhci_hcd->port_status[rhport] & (1 << 
USB_PORT_FEAT_RESET)) !=
-   0 && time_after(jiffies, vhci_hcd->re_timeout)) {
-   vhci_hcd->port_status[rhport] |= (1 << 
USB_PORT_FEAT_C_RESET);
-   vhci_hcd->port_status[rhport] &= ~(1 << 
USB_PORT_FEAT_RESET);
-   vhci_hcd->re_timeout = 0;
-
-   if (vhci_hcd->vdev[rhport].ud.status ==
-   VDEV_ST_NOTASSIGNED) {
-   usbip_dbg_vhci_rh(
-   " enable rhport %d (status %u)\n",
-   rhport,
-   vhci_hcd->vdev[rhport].ud.status);
-   vhci_hcd->port_status[rhport] |=
-   USB_PORT_STAT_ENABLE;
-   }
-
-   if (hcd->speed < HCD_USB3) {
-   switch (vhci_hcd->vdev[rhport].speed) {
-   case USB_SPEED_HIGH:
-   vhci_hcd->port_status[rhport] |=
- USB_PORT_STAT_HIGH_SPEED;
-   break;
-   case USB_SPEED_LOW:
+   if ((vhci_hcd->port_status[rhport] & (1 << 
USB_PORT_FEAT_RESET))) {
+   if (time_after(jiffies, vhci_hcd->re_timeout)) {
+   vhci_hcd->port_status[rhport] |= (1 << 
USB_PORT_FEAT_C_RESET);
+   vhci_hcd->port_status[rhport] &= ~(1 << 
USB_PORT_FEAT_RESET);
+   vhci_hcd->re_timeout = 0;
+
+   if (vhci_hcd->vdev[rhport].ud.status ==
+   VDEV_ST_NOTASSIGNED) {
+   usbip_dbg_vhci_rh(
+   " enable rhport %d (status 
%u)\n",
+   rhport, 
vhci_hcd->vdev[rhport].ud.status);
vhci_hcd->port_status[rhport] |=
-   USB_PORT_STAT_LOW_SPEED;
-   break;
-   default:
-   pr_err("vhci_device speed not set\n");
-   break;
+   USB_PORT_STAT_ENABLE;
+   }
+
+   if (hcd->speed < HCD_USB3) {
+   switch (vhci_hcd->vdev[rhport].speed) {
+   case USB_SPEED_HIGH:
+   vhci_hcd->port_status[rhport] |=
+   
USB_PORT_STAT_HIGH_SPEED;
+   break;
+   case USB_SPEED_LOW:
+   vhci_hcd->port_status[rhport] |=
+   USB_PORT_STAT_LOW_SPEED;
+   break;
+   default:
+   pr_err("vhci_device speed not 
set\n");
+   break;
+   }
}
}
+   } else {
+   /* Port would be disabled by clearing FEAT_ENABLE,
+* make it enabled again here.
+*/
+   if (vhci_hcd->vdev[rhport].ud.status == VDEV_ST_USED)
+   vhci_hcd->port_status[rhport] |= 
USB_PORT_STAT_ENABLE;
}
+
((__le16 *) buf)[0] = 
cpu_to_le16(vhci_hcd->port_status[rhport]);
((__le16 *) buf)[1] =
cpu_to_le16(vhci_hcd->port_status[rhport] >> 

Re: r8169 regression: UDP packets dropped intermittantly

2017-12-18 Thread Jonathan Woithe
Hi again

This is a follow up to my earlier message.

On Tue, Dec 19, 2017 at 09:02:25AM +1030, Jonathan Woithe wrote:
> On Mon, Dec 18, 2017 at 02:38:53PM +0100, Holger Hoffstätte wrote:
> > Since I've seen your postings several times now with no comment or 
> > resolution
> > I've decided to try your reproducer on my own systems. In short, I cannot
> > reproduce any packet loss, despite having 2 (cheap) 1Gb switches between the
> > two machines. Both are running 4.14.7.
> 
> Thanks for trying the test program on your system.  The result indicates
> that the problem might be specific to the behaviour of a particular network
> variant of the r8169 chip.

I was able to temporarily acquire a PCIe card which uses the r8169 driver.
This allowed me to run the reproducer on the same machine with two different
r8169-based cards.  The original NIC is this:

  05:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 
  Gigabit Ethernet (rev 10) [10ec:8169]
  Subsystem: Netgear GA311 [1385:311a]

The PCIe card is this:

  02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B 
  PCI Express Gigabit Ethernet controller (rev 06) [10ec:8168]
  Subsystem: Realtek Semiconductor Co., Ltd. Device 0123 [10ec:0123]

The test was conducted with kernel 4.3.0 since both the 4.3.0 driver (which
triggers the fault) and the forward ported driver (which predates commit
da78dbff2e05630921c551dbbc70a4b7981a8fff) was available.  For the record,
the machine used as the slave in these tests (the one receiving the 6 byte
request and sending the 14 byte response) was using its onboard NIC:

  00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network 
  Connection (rev 05) [8086:1503]
  Subsystem: Gigabyte Technology Co., Ltd 82579V Gigabit Network 
  Connection [1458:e000]

Test outcomes were as follows:

  PCIe card, unpatched 4.3.0 r8169 driver: no error (tested for 1 hour)
  PCIe card, forward ported r8169 driver:  no error (tested for 1 hour)

  GA311 card, unpatched 4.3.0 r8169 driver: test fail in under 4 minutes
  GA311 card, forward ported r8169 driver:  no error (tested for 1 hour)

For completeness, I then booted 4.14 and repeated the test with its r8168
driver.  The PCIe card ran for an hour without triggering the error, while
the GA311 triggered it quickly (in under 3 minutes).

This clearly indicates that not every card using the r8169 driver is
vulnerable to the problem.  It also explains why Holger was unable to
reproduce the result on his system: the PCIe cards do not appear to suffer
from the problem.  Most likely the PCI RTL-8169 chip is affected, but newer
PCIe variations do not.  However, obviously more testing will be required
with a wider variety of cards if this inference is to hold up.

The above result (and those from Holger) allow the problem description to be
refined a little: changes in commit da78dbff2e05630921c551dbbc70a4b7981a8fff
cause GA311 NICs (and possibly other PCI cards using an RTL-8169) to have
trouble with small UDP packets, while PCIe variants are seemingly
unaffected.

Does this help?

Regards
  jonathan


Re: r8169 regression: UDP packets dropped intermittantly

2017-12-18 Thread Jonathan Woithe
Hi again

This is a follow up to my earlier message.

On Tue, Dec 19, 2017 at 09:02:25AM +1030, Jonathan Woithe wrote:
> On Mon, Dec 18, 2017 at 02:38:53PM +0100, Holger Hoffstätte wrote:
> > Since I've seen your postings several times now with no comment or 
> > resolution
> > I've decided to try your reproducer on my own systems. In short, I cannot
> > reproduce any packet loss, despite having 2 (cheap) 1Gb switches between the
> > two machines. Both are running 4.14.7.
> 
> Thanks for trying the test program on your system.  The result indicates
> that the problem might be specific to the behaviour of a particular network
> variant of the r8169 chip.

I was able to temporarily acquire a PCIe card which uses the r8169 driver.
This allowed me to run the reproducer on the same machine with two different
r8169-based cards.  The original NIC is this:

  05:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 
  Gigabit Ethernet (rev 10) [10ec:8169]
  Subsystem: Netgear GA311 [1385:311a]

The PCIe card is this:

  02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B 
  PCI Express Gigabit Ethernet controller (rev 06) [10ec:8168]
  Subsystem: Realtek Semiconductor Co., Ltd. Device 0123 [10ec:0123]

The test was conducted with kernel 4.3.0 since both the 4.3.0 driver (which
triggers the fault) and the forward ported driver (which predates commit
da78dbff2e05630921c551dbbc70a4b7981a8fff) was available.  For the record,
the machine used as the slave in these tests (the one receiving the 6 byte
request and sending the 14 byte response) was using its onboard NIC:

  00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network 
  Connection (rev 05) [8086:1503]
  Subsystem: Gigabyte Technology Co., Ltd 82579V Gigabit Network 
  Connection [1458:e000]

Test outcomes were as follows:

  PCIe card, unpatched 4.3.0 r8169 driver: no error (tested for 1 hour)
  PCIe card, forward ported r8169 driver:  no error (tested for 1 hour)

  GA311 card, unpatched 4.3.0 r8169 driver: test fail in under 4 minutes
  GA311 card, forward ported r8169 driver:  no error (tested for 1 hour)

For completeness, I then booted 4.14 and repeated the test with its r8168
driver.  The PCIe card ran for an hour without triggering the error, while
the GA311 triggered it quickly (in under 3 minutes).

This clearly indicates that not every card using the r8169 driver is
vulnerable to the problem.  It also explains why Holger was unable to
reproduce the result on his system: the PCIe cards do not appear to suffer
from the problem.  Most likely the PCI RTL-8169 chip is affected, but newer
PCIe variations do not.  However, obviously more testing will be required
with a wider variety of cards if this inference is to hold up.

The above result (and those from Holger) allow the problem description to be
refined a little: changes in commit da78dbff2e05630921c551dbbc70a4b7981a8fff
cause GA311 NICs (and possibly other PCI cards using an RTL-8169) to have
trouble with small UDP packets, while PCIe variants are seemingly
unaffected.

Does this help?

Regards
  jonathan


[PATCH V4 02/26] powerpc/PCI: deprecate pci_get_bus_and_slot()

2017-12-18 Thread Sinan Kaya
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.

Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().

Use pci_get_domain_bus_and_slot() with a domain number of 0 as the code
is not ready to consume multiple domains and existing code used domain
number 0.

Signed-off-by: Sinan Kaya 
---
 arch/powerpc/kernel/pci_32.c  | 3 ++-
 arch/powerpc/platforms/powermac/feature.c | 2 +-
 arch/powerpc/sysdev/mv64x60_pci.c | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
index 1d817f4..85ad2f7 100644
--- a/arch/powerpc/kernel/pci_32.c
+++ b/arch/powerpc/kernel/pci_32.c
@@ -96,7 +96,8 @@
reg = of_get_property(node, "reg", NULL);
if (!reg)
continue;
-   dev = pci_get_bus_and_slot(pci_bus, ((reg[0] >> 8) & 0xff));
+   dev = pci_get_domain_bus_and_slot(0, pci_bus,
+ ((reg[0] >> 8) & 0xff));
if (!dev || !dev->subordinate) {
pci_dev_put(dev);
continue;
diff --git a/arch/powerpc/platforms/powermac/feature.c 
b/arch/powerpc/platforms/powermac/feature.c
index 466b842..3f82cb2 100644
--- a/arch/powerpc/platforms/powermac/feature.c
+++ b/arch/powerpc/platforms/powermac/feature.c
@@ -829,7 +829,7 @@ static long core99_scc_enable(struct device_node *node, 
long param, long value)
 
if (value) {
if (pci_device_from_OF_node(node, , ) == 0)
-   pdev = pci_get_bus_and_slot(pbus, pid);
+   pdev = pci_get_domain_bus_and_slot(0, pbus, pid);
if (pdev == NULL)
return 0;
rc = pci_enable_device(pdev);
diff --git a/arch/powerpc/sysdev/mv64x60_pci.c 
b/arch/powerpc/sysdev/mv64x60_pci.c
index d52b3b8..6fe9104 100644
--- a/arch/powerpc/sysdev/mv64x60_pci.c
+++ b/arch/powerpc/sysdev/mv64x60_pci.c
@@ -37,7 +37,7 @@ static ssize_t mv64x60_hs_reg_read(struct file *filp, struct 
kobject *kobj,
if (count < MV64X60_VAL_LEN_MAX)
return -EINVAL;
 
-   phb = pci_get_bus_and_slot(0, PCI_DEVFN(0, 0));
+   phb = pci_get_domain_bus_and_slot(0, 0, PCI_DEVFN(0, 0));
if (!phb)
return -ENODEV;
pci_read_config_dword(phb, MV64X60_PCICFG_CPCI_HOTSWAP, );
@@ -61,7 +61,7 @@ static ssize_t mv64x60_hs_reg_write(struct file *filp, struct 
kobject *kobj,
if (sscanf(buf, "%i", ) != 1)
return -EINVAL;
 
-   phb = pci_get_bus_and_slot(0, PCI_DEVFN(0, 0));
+   phb = pci_get_domain_bus_and_slot(0, 0, PCI_DEVFN(0, 0));
if (!phb)
return -ENODEV;
pci_write_config_dword(phb, MV64X60_PCICFG_CPCI_HOTSWAP, v);
-- 
1.9.1



[PATCH V4 02/26] powerpc/PCI: deprecate pci_get_bus_and_slot()

2017-12-18 Thread Sinan Kaya
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.

Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().

Use pci_get_domain_bus_and_slot() with a domain number of 0 as the code
is not ready to consume multiple domains and existing code used domain
number 0.

Signed-off-by: Sinan Kaya 
---
 arch/powerpc/kernel/pci_32.c  | 3 ++-
 arch/powerpc/platforms/powermac/feature.c | 2 +-
 arch/powerpc/sysdev/mv64x60_pci.c | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
index 1d817f4..85ad2f7 100644
--- a/arch/powerpc/kernel/pci_32.c
+++ b/arch/powerpc/kernel/pci_32.c
@@ -96,7 +96,8 @@
reg = of_get_property(node, "reg", NULL);
if (!reg)
continue;
-   dev = pci_get_bus_and_slot(pci_bus, ((reg[0] >> 8) & 0xff));
+   dev = pci_get_domain_bus_and_slot(0, pci_bus,
+ ((reg[0] >> 8) & 0xff));
if (!dev || !dev->subordinate) {
pci_dev_put(dev);
continue;
diff --git a/arch/powerpc/platforms/powermac/feature.c 
b/arch/powerpc/platforms/powermac/feature.c
index 466b842..3f82cb2 100644
--- a/arch/powerpc/platforms/powermac/feature.c
+++ b/arch/powerpc/platforms/powermac/feature.c
@@ -829,7 +829,7 @@ static long core99_scc_enable(struct device_node *node, 
long param, long value)
 
if (value) {
if (pci_device_from_OF_node(node, , ) == 0)
-   pdev = pci_get_bus_and_slot(pbus, pid);
+   pdev = pci_get_domain_bus_and_slot(0, pbus, pid);
if (pdev == NULL)
return 0;
rc = pci_enable_device(pdev);
diff --git a/arch/powerpc/sysdev/mv64x60_pci.c 
b/arch/powerpc/sysdev/mv64x60_pci.c
index d52b3b8..6fe9104 100644
--- a/arch/powerpc/sysdev/mv64x60_pci.c
+++ b/arch/powerpc/sysdev/mv64x60_pci.c
@@ -37,7 +37,7 @@ static ssize_t mv64x60_hs_reg_read(struct file *filp, struct 
kobject *kobj,
if (count < MV64X60_VAL_LEN_MAX)
return -EINVAL;
 
-   phb = pci_get_bus_and_slot(0, PCI_DEVFN(0, 0));
+   phb = pci_get_domain_bus_and_slot(0, 0, PCI_DEVFN(0, 0));
if (!phb)
return -ENODEV;
pci_read_config_dword(phb, MV64X60_PCICFG_CPCI_HOTSWAP, );
@@ -61,7 +61,7 @@ static ssize_t mv64x60_hs_reg_write(struct file *filp, struct 
kobject *kobj,
if (sscanf(buf, "%i", ) != 1)
return -EINVAL;
 
-   phb = pci_get_bus_and_slot(0, PCI_DEVFN(0, 0));
+   phb = pci_get_domain_bus_and_slot(0, 0, PCI_DEVFN(0, 0));
if (!phb)
return -ENODEV;
pci_write_config_dword(phb, MV64X60_PCICFG_CPCI_HOTSWAP, v);
-- 
1.9.1



[PATCH V4 06/26] edd: deprecate pci_get_bus_and_slot()

2017-12-18 Thread Sinan Kaya
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.

Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().

Domain number is not available in struct edd_info. Hard-coding the domain
number as 0.

Signed-off-by: Sinan Kaya 
---
 drivers/firmware/edd.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/edd.c b/drivers/firmware/edd.c
index e229576..60a8f13 100644
--- a/drivers/firmware/edd.c
+++ b/drivers/firmware/edd.c
@@ -669,10 +669,10 @@ static void edd_release(struct kobject * kobj)
struct edd_info *info = edd_dev_get_info(edev);
 
if (edd_dev_is_type(edev, "PCI") || edd_dev_is_type(edev, "XPRS")) {
-   return pci_get_bus_and_slot(info->params.interface_path.pci.bus,
-
PCI_DEVFN(info->params.interface_path.pci.slot,
-  info->params.interface_path.pci.
-  function));
+   return pci_get_domain_bus_and_slot(0,
+   info->params.interface_path.pci.bus,
+   PCI_DEVFN(info->params.interface_path.pci.slot,
+   info->params.interface_path.pci.function));
}
return NULL;
 }
-- 
1.9.1



[PATCH V4 06/26] edd: deprecate pci_get_bus_and_slot()

2017-12-18 Thread Sinan Kaya
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.

Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().

Domain number is not available in struct edd_info. Hard-coding the domain
number as 0.

Signed-off-by: Sinan Kaya 
---
 drivers/firmware/edd.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/edd.c b/drivers/firmware/edd.c
index e229576..60a8f13 100644
--- a/drivers/firmware/edd.c
+++ b/drivers/firmware/edd.c
@@ -669,10 +669,10 @@ static void edd_release(struct kobject * kobj)
struct edd_info *info = edd_dev_get_info(edev);
 
if (edd_dev_is_type(edev, "PCI") || edd_dev_is_type(edev, "XPRS")) {
-   return pci_get_bus_and_slot(info->params.interface_path.pci.bus,
-
PCI_DEVFN(info->params.interface_path.pci.slot,
-  info->params.interface_path.pci.
-  function));
+   return pci_get_domain_bus_and_slot(0,
+   info->params.interface_path.pci.bus,
+   PCI_DEVFN(info->params.interface_path.pci.slot,
+   info->params.interface_path.pci.function));
}
return NULL;
 }
-- 
1.9.1



linux-next: Tree for Dec 19

2017-12-18 Thread Stephen Rothwell
Hi all,

Changes since 20171218:

The xtensa tree gained a conflict against Linus' tree.

The v4l-dvb tree gained a conflict against the vfs tree.

The net-next tree gained a conflict against the net tree.

The tty tree lost its build failure.

Non-merge commits (relative to Linus' tree): 4879
 5210 files changed, 176553 insertions(+), 150930 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 253 trees (counting Linus' and 43 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (cb81fc6a3cf6 Merge branch 'parisc-4.15-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (cfe17c9bbe6a kbuild: move cc-option and 
cc-disable-warning after incl. arch Makefile)
Merging arc-current/for-curr (799ab4a2c85b arc: do not use __print_symbol())
Merging arm-current/fixes (36b0cb84ee85 ARM: 8731/1: Fix 
csum_partial_copy_from_user() stack mismatch)
Merging m68k-current/for-linus (5e387199c17c m68k/defconfig: Update defconfigs 
for v4.14-rc7)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (110df8bd3e41 powerpc/perf: Fix kfree memory 
allocated for nest pmus)
Merging sparc/master (a0908a1b7d68 Merge branch 'akpm' (patches from Andrew))
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (ab14436065c8 net: phy: xgene: disable clk on error paths)
Merging bpf/master (c1b08ebe5003 Merge branch 'bpf-jit-fixes')
Merging ipsec/master (d2950278d2d0 xfrm: put policies when reusing pcpu xdst 
entry)
Merging netfilter/master (d6da83813fb3 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (a41886f56b7b Merge tag 
'iwlwifi-for-kalle-2017-12-05' of 
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (4564b187c163 nl80211: fix nl80211_send_iface() error 
paths)
Merging sound-current/for-linus (9226665159f0 ALSA: hda/realtek - Fix Dell AIO 
LineOut issue)
Merging pci-current/for-linus (0c31f1d7be1b PCI: rcar: Fix use-after-free in 
probe error path)
Merging driver-core.current/driver-core-linus (f57ab9a01a36 drivers: base: 
cacheinfo: fix cache type for non-architected system cache)
Merging tty.current/tty-linus (50c4c4e268a2 Linux 4.15-rc3)
Merging usb.current/usb-linus (1291a0d5049d Linux 4.15-rc4)
Merging usb-gadget-fixes/fixes (9dbe416b656b Revert "usb: gadget: allow to 
enable legacy drivers without USB_ETH")
Merging usb-serial-fixes/usb-linus (92a18a657fb2 USB: serial: qcserial: add 
Sierra Wireless EM7565)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (2b88212c4cc6 phy: rcar-gen3-usb2: select USB_COMMON)
Merging staging.current/staging-linus (1291a0d5049d Linux 4.15-rc4)
Merging char-misc.current/char-misc-linus (7f3dc0088b98 binder: fix proc->files 
use-after-free)
Merging input-current/for-linus (8b7e9d9e2d8b Input: hideep - fix compile error 
due to missing include fi

linux-next: Tree for Dec 19

2017-12-18 Thread Stephen Rothwell
Hi all,

Changes since 20171218:

The xtensa tree gained a conflict against Linus' tree.

The v4l-dvb tree gained a conflict against the vfs tree.

The net-next tree gained a conflict against the net tree.

The tty tree lost its build failure.

Non-merge commits (relative to Linus' tree): 4879
 5210 files changed, 176553 insertions(+), 150930 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 253 trees (counting Linus' and 43 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (cb81fc6a3cf6 Merge branch 'parisc-4.15-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (cfe17c9bbe6a kbuild: move cc-option and 
cc-disable-warning after incl. arch Makefile)
Merging arc-current/for-curr (799ab4a2c85b arc: do not use __print_symbol())
Merging arm-current/fixes (36b0cb84ee85 ARM: 8731/1: Fix 
csum_partial_copy_from_user() stack mismatch)
Merging m68k-current/for-linus (5e387199c17c m68k/defconfig: Update defconfigs 
for v4.14-rc7)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (110df8bd3e41 powerpc/perf: Fix kfree memory 
allocated for nest pmus)
Merging sparc/master (a0908a1b7d68 Merge branch 'akpm' (patches from Andrew))
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (ab14436065c8 net: phy: xgene: disable clk on error paths)
Merging bpf/master (c1b08ebe5003 Merge branch 'bpf-jit-fixes')
Merging ipsec/master (d2950278d2d0 xfrm: put policies when reusing pcpu xdst 
entry)
Merging netfilter/master (d6da83813fb3 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (a41886f56b7b Merge tag 
'iwlwifi-for-kalle-2017-12-05' of 
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (4564b187c163 nl80211: fix nl80211_send_iface() error 
paths)
Merging sound-current/for-linus (9226665159f0 ALSA: hda/realtek - Fix Dell AIO 
LineOut issue)
Merging pci-current/for-linus (0c31f1d7be1b PCI: rcar: Fix use-after-free in 
probe error path)
Merging driver-core.current/driver-core-linus (f57ab9a01a36 drivers: base: 
cacheinfo: fix cache type for non-architected system cache)
Merging tty.current/tty-linus (50c4c4e268a2 Linux 4.15-rc3)
Merging usb.current/usb-linus (1291a0d5049d Linux 4.15-rc4)
Merging usb-gadget-fixes/fixes (9dbe416b656b Revert "usb: gadget: allow to 
enable legacy drivers without USB_ETH")
Merging usb-serial-fixes/usb-linus (92a18a657fb2 USB: serial: qcserial: add 
Sierra Wireless EM7565)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (2b88212c4cc6 phy: rcar-gen3-usb2: select USB_COMMON)
Merging staging.current/staging-linus (1291a0d5049d Linux 4.15-rc4)
Merging char-misc.current/char-misc-linus (7f3dc0088b98 binder: fix proc->files 
use-after-free)
Merging input-current/for-linus (8b7e9d9e2d8b Input: hideep - fix compile error 
due to missing include fi

[PATCH V4 11/26] iommu/amd: deprecate pci_get_bus_and_slot()

2017-12-18 Thread Sinan Kaya
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.

Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().

Hard-code the domain number as 0 for the AMD IOMMU driver.

Signed-off-by: Sinan Kaya 
---
 drivers/iommu/amd_iommu.c  | 3 ++-
 drivers/iommu/amd_iommu_init.c | 9 +
 drivers/iommu/amd_iommu_v2.c   | 3 ++-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 7d5eb00..821547b 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -527,7 +527,8 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
struct iommu_dev_data *dev_data = NULL;
struct pci_dev *pdev;
 
-   pdev = pci_get_bus_and_slot(PCI_BUS_NUM(devid), devid & 0xff);
+   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+  devid & 0xff);
if (pdev)
dev_data = get_dev_data(>dev);
 
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6fe2d03..4e4a615 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1697,8 +1697,8 @@ static int iommu_init_pci(struct amd_iommu *iommu)
u32 range, misc, low, high;
int ret;
 
-   iommu->dev = pci_get_bus_and_slot(PCI_BUS_NUM(iommu->devid),
- iommu->devid & 0xff);
+   iommu->dev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(iommu->devid),
+iommu->devid & 0xff);
if (!iommu->dev)
return -ENODEV;
 
@@ -1764,8 +1764,9 @@ static int iommu_init_pci(struct amd_iommu *iommu)
if (is_rd890_iommu(iommu->dev)) {
int i, j;
 
-   iommu->root_pdev = pci_get_bus_and_slot(iommu->dev->bus->number,
-   PCI_DEVFN(0, 0));
+   iommu->root_pdev =
+   pci_get_domain_bus_and_slot(0, iommu->dev->bus->number,
+   PCI_DEVFN(0, 0));
 
/*
 * Some rd890 systems may not be fully reconfigured by the
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 7d94e1d..8696382 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -564,7 +564,8 @@ static int ppr_notifier(struct notifier_block *nb, unsigned 
long e, void *data)
finish  = (iommu_fault->tag >> 9) & 1;
 
devid = iommu_fault->device_id;
-   pdev = pci_get_bus_and_slot(PCI_BUS_NUM(devid), devid & 0xff);
+   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+  devid & 0xff);
if (!pdev)
return -ENODEV;
dev_data = get_dev_data(>dev);
-- 
1.9.1



[PATCH V4 11/26] iommu/amd: deprecate pci_get_bus_and_slot()

2017-12-18 Thread Sinan Kaya
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.

Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().

Hard-code the domain number as 0 for the AMD IOMMU driver.

Signed-off-by: Sinan Kaya 
---
 drivers/iommu/amd_iommu.c  | 3 ++-
 drivers/iommu/amd_iommu_init.c | 9 +
 drivers/iommu/amd_iommu_v2.c   | 3 ++-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 7d5eb00..821547b 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -527,7 +527,8 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
struct iommu_dev_data *dev_data = NULL;
struct pci_dev *pdev;
 
-   pdev = pci_get_bus_and_slot(PCI_BUS_NUM(devid), devid & 0xff);
+   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+  devid & 0xff);
if (pdev)
dev_data = get_dev_data(>dev);
 
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6fe2d03..4e4a615 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1697,8 +1697,8 @@ static int iommu_init_pci(struct amd_iommu *iommu)
u32 range, misc, low, high;
int ret;
 
-   iommu->dev = pci_get_bus_and_slot(PCI_BUS_NUM(iommu->devid),
- iommu->devid & 0xff);
+   iommu->dev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(iommu->devid),
+iommu->devid & 0xff);
if (!iommu->dev)
return -ENODEV;
 
@@ -1764,8 +1764,9 @@ static int iommu_init_pci(struct amd_iommu *iommu)
if (is_rd890_iommu(iommu->dev)) {
int i, j;
 
-   iommu->root_pdev = pci_get_bus_and_slot(iommu->dev->bus->number,
-   PCI_DEVFN(0, 0));
+   iommu->root_pdev =
+   pci_get_domain_bus_and_slot(0, iommu->dev->bus->number,
+   PCI_DEVFN(0, 0));
 
/*
 * Some rd890 systems may not be fully reconfigured by the
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 7d94e1d..8696382 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -564,7 +564,8 @@ static int ppr_notifier(struct notifier_block *nb, unsigned 
long e, void *data)
finish  = (iommu_fault->tag >> 9) & 1;
 
devid = iommu_fault->device_id;
-   pdev = pci_get_bus_and_slot(PCI_BUS_NUM(devid), devid & 0xff);
+   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+  devid & 0xff);
if (!pdev)
return -ENODEV;
dev_data = get_dev_data(>dev);
-- 
1.9.1



[PATCH V4 15/26] PCI: cpqhp: deprecate pci_get_bus_and_slot()

2017-12-18 Thread Sinan Kaya
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.

Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().

Hard-coding the domain number as 0. The code doesn't seem to be ready
for multiple domains.

Signed-off-by: Sinan Kaya 
---
 drivers/pci/hotplug/cpqphp_pci.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/hotplug/cpqphp_pci.c b/drivers/pci/hotplug/cpqphp_pci.c
index e220d49..8897a77 100644
--- a/drivers/pci/hotplug/cpqphp_pci.c
+++ b/drivers/pci/hotplug/cpqphp_pci.c
@@ -89,7 +89,9 @@ int cpqhp_configure_device(struct controller *ctrl, struct 
pci_func *func)
pci_lock_rescan_remove();
 
if (func->pci_dev == NULL)
-   func->pci_dev = pci_get_bus_and_slot(func->bus, 
PCI_DEVFN(func->device, func->function));
+   func->pci_dev = pci_get_domain_bus_and_slot(0, func->bus,
+   PCI_DEVFN(func->device,
+   func->function));
 
/* No pci device, we need to create it then */
if (func->pci_dev == NULL) {
@@ -99,7 +101,9 @@ int cpqhp_configure_device(struct controller *ctrl, struct 
pci_func *func)
if (num)
pci_bus_add_devices(ctrl->pci_dev->bus);
 
-   func->pci_dev = pci_get_bus_and_slot(func->bus, 
PCI_DEVFN(func->device, func->function));
+   func->pci_dev = pci_get_domain_bus_and_slot(0, func->bus,
+   PCI_DEVFN(func->device,
+   func->function));
if (func->pci_dev == NULL) {
dbg("ERROR: pci_dev still null\n");
goto out;
@@ -129,7 +133,10 @@ int cpqhp_unconfigure_device(struct pci_func *func)
 
pci_lock_rescan_remove();
for (j = 0; j < 8 ; j++) {
-   struct pci_dev *temp = pci_get_bus_and_slot(func->bus, 
PCI_DEVFN(func->device, j));
+   struct pci_dev *temp = pci_get_domain_bus_and_slot(0,
+   func->bus,
+   PCI_DEVFN(func->device,
+   j));
if (temp) {
pci_dev_put(temp);
pci_stop_and_remove_bus_device(temp);
@@ -319,6 +326,7 @@ int cpqhp_save_config(struct controller *ctrl, int 
busnumber, int is_hot_plug)
int cloop = 0;
int stop_it;
int index;
+   u16 devfn;
 
/* Decide which slots are supported */
 
@@ -416,7 +424,9 @@ int cpqhp_save_config(struct controller *ctrl, int 
busnumber, int is_hot_plug)
new_slot->switch_save = 0x10;
/* In case of unsupported board */
new_slot->status = DevError;
-   new_slot->pci_dev = pci_get_bus_and_slot(new_slot->bus, 
(new_slot->device << 3) | new_slot->function);
+   devfn = (new_slot->device << 3) | new_slot->function;
+   new_slot->pci_dev = pci_get_domain_bus_and_slot(0,
+   new_slot->bus, devfn);
 
for (cloop = 0; cloop < 0x20; cloop++) {
rc = pci_bus_read_config_dword(ctrl->pci_bus, 
PCI_DEVFN(device, function), cloop << 2, (u32 *) 
&(new_slot->config_space[cloop]));
-- 
1.9.1



[PATCH V4 14/26] pch_gbe: deprecate pci_get_bus_and_slot()

2017-12-18 Thread Sinan Kaya
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.

Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().

Use the domain information from pdev while calling into
pci_get_domain_bus_and_slot() function.

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 40e52ff..7cd4946 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -2594,8 +2594,10 @@ static int pch_gbe_probe(struct pci_dev *pdev,
if (adapter->pdata && adapter->pdata->platform_init)
adapter->pdata->platform_init(pdev);
 
-   adapter->ptp_pdev = pci_get_bus_and_slot(adapter->pdev->bus->number,
-  PCI_DEVFN(12, 4));
+   adapter->ptp_pdev =
+   pci_get_domain_bus_and_slot(pci_domain_nr(adapter->pdev->bus),
+   adapter->pdev->bus->number,
+   PCI_DEVFN(12, 4));
 
netdev->netdev_ops = _gbe_netdev_ops;
netdev->watchdog_timeo = PCH_GBE_WATCHDOG_PERIOD;
-- 
1.9.1



[PATCH V4 15/26] PCI: cpqhp: deprecate pci_get_bus_and_slot()

2017-12-18 Thread Sinan Kaya
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.

Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().

Hard-coding the domain number as 0. The code doesn't seem to be ready
for multiple domains.

Signed-off-by: Sinan Kaya 
---
 drivers/pci/hotplug/cpqphp_pci.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/hotplug/cpqphp_pci.c b/drivers/pci/hotplug/cpqphp_pci.c
index e220d49..8897a77 100644
--- a/drivers/pci/hotplug/cpqphp_pci.c
+++ b/drivers/pci/hotplug/cpqphp_pci.c
@@ -89,7 +89,9 @@ int cpqhp_configure_device(struct controller *ctrl, struct 
pci_func *func)
pci_lock_rescan_remove();
 
if (func->pci_dev == NULL)
-   func->pci_dev = pci_get_bus_and_slot(func->bus, 
PCI_DEVFN(func->device, func->function));
+   func->pci_dev = pci_get_domain_bus_and_slot(0, func->bus,
+   PCI_DEVFN(func->device,
+   func->function));
 
/* No pci device, we need to create it then */
if (func->pci_dev == NULL) {
@@ -99,7 +101,9 @@ int cpqhp_configure_device(struct controller *ctrl, struct 
pci_func *func)
if (num)
pci_bus_add_devices(ctrl->pci_dev->bus);
 
-   func->pci_dev = pci_get_bus_and_slot(func->bus, 
PCI_DEVFN(func->device, func->function));
+   func->pci_dev = pci_get_domain_bus_and_slot(0, func->bus,
+   PCI_DEVFN(func->device,
+   func->function));
if (func->pci_dev == NULL) {
dbg("ERROR: pci_dev still null\n");
goto out;
@@ -129,7 +133,10 @@ int cpqhp_unconfigure_device(struct pci_func *func)
 
pci_lock_rescan_remove();
for (j = 0; j < 8 ; j++) {
-   struct pci_dev *temp = pci_get_bus_and_slot(func->bus, 
PCI_DEVFN(func->device, j));
+   struct pci_dev *temp = pci_get_domain_bus_and_slot(0,
+   func->bus,
+   PCI_DEVFN(func->device,
+   j));
if (temp) {
pci_dev_put(temp);
pci_stop_and_remove_bus_device(temp);
@@ -319,6 +326,7 @@ int cpqhp_save_config(struct controller *ctrl, int 
busnumber, int is_hot_plug)
int cloop = 0;
int stop_it;
int index;
+   u16 devfn;
 
/* Decide which slots are supported */
 
@@ -416,7 +424,9 @@ int cpqhp_save_config(struct controller *ctrl, int 
busnumber, int is_hot_plug)
new_slot->switch_save = 0x10;
/* In case of unsupported board */
new_slot->status = DevError;
-   new_slot->pci_dev = pci_get_bus_and_slot(new_slot->bus, 
(new_slot->device << 3) | new_slot->function);
+   devfn = (new_slot->device << 3) | new_slot->function;
+   new_slot->pci_dev = pci_get_domain_bus_and_slot(0,
+   new_slot->bus, devfn);
 
for (cloop = 0; cloop < 0x20; cloop++) {
rc = pci_bus_read_config_dword(ctrl->pci_bus, 
PCI_DEVFN(device, function), cloop << 2, (u32 *) 
&(new_slot->config_space[cloop]));
-- 
1.9.1



[PATCH V4 14/26] pch_gbe: deprecate pci_get_bus_and_slot()

2017-12-18 Thread Sinan Kaya
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.

Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().

Use the domain information from pdev while calling into
pci_get_domain_bus_and_slot() function.

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 40e52ff..7cd4946 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -2594,8 +2594,10 @@ static int pch_gbe_probe(struct pci_dev *pdev,
if (adapter->pdata && adapter->pdata->platform_init)
adapter->pdata->platform_init(pdev);
 
-   adapter->ptp_pdev = pci_get_bus_and_slot(adapter->pdev->bus->number,
-  PCI_DEVFN(12, 4));
+   adapter->ptp_pdev =
+   pci_get_domain_bus_and_slot(pci_domain_nr(adapter->pdev->bus),
+   adapter->pdev->bus->number,
+   PCI_DEVFN(12, 4));
 
netdev->netdev_ops = _gbe_netdev_ops;
netdev->watchdog_timeo = PCH_GBE_WATCHDOG_PERIOD;
-- 
1.9.1



  1   2   3   4   5   6   7   8   9   10   >