Re: [PATCH v1] mm: relax deferred struct page requirements

2017-11-16 Thread Heiko Carstens
On Thu, Nov 16, 2017 at 08:46:01PM -0500, Pavel Tatashin wrote:
> There is no need to have ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT,
> as all the page initialization code is in common code.
> 
> Also, there is no need to depend on MEMORY_HOTPLUG, as initialization code
> does not really use hotplug memory functionality. So, we can remove this
> requirement as well.
> 
> This patch allows to use deferred struct page initialization on all
> platforms with memblock allocator.
> 
> Tested on x86, arm64, and sparc. Also, verified that code compiles on
> PPC with CONFIG_MEMORY_HOTPLUG disabled.
> 
> Signed-off-by: Pavel Tatashin 
> ---
>  arch/powerpc/Kconfig | 1 -
>  arch/s390/Kconfig| 1 -
>  arch/x86/Kconfig | 1 -
>  mm/Kconfig   | 7 +--
>  4 files changed, 1 insertion(+), 9 deletions(-)

For the s390 bit:

Acked-by: Heiko Carstens 



Re: [PATCH kernel] vfio/spapr: Add trace points for map/unmap

2017-11-16 Thread Alexey Kardashevskiy
On 17/11/17 11:13, Alex Williamson wrote:
> On Tue, 14 Nov 2017 10:47:12 +1100
> Alexey Kardashevskiy  wrote:
> 
>> On 27/10/17 14:00, Alexey Kardashevskiy wrote:
>>> This adds trace_map/trace_unmap tracepoints to spapr driver. Type1 already
>>> uses these via the IOMMU API (iommu_map/__iommu_unmap).
>>>
>>> Signed-off-by: Alexey Kardashevskiy   
> 
> Is this really legitimate to include tracepoints from a different
> subsystem?  The vfio type1 backend gets these trace points by virtue of
> it actually using the IOMMU API, it doesn't call them itself.  I'm kind
> of surprised these are actually available to be called from a module.

They are explicitly exported:

EXPORT_TRACEPOINT_SYMBOL_GPL(map);
EXPORT_TRACEPOINT_SYMBOL_GPL(unmap);

I would think this is for things like drivers/vfio/vfio_iommu_spapr_tce.c ,
why else?...


> I suspect the way to do this is probably to define our own tracepoints
> in the vfio/spapr backend or insert tracepoints into the IOMMU layers
> that that code calls into rather than masquerading as tracepoints from
> a different subsystem.  Right?

This makes sense too. But it is going to be just cut-n-paste and some
confusion -
/sys/kernel/debug/tracing/events/iommu/map will still be present but
won't work and
/sys/kernel/debug/tracing/events/vfio/vfio_iommu_spapr_tce/map will.

Judges? :)




>  Thanks,
> 
> Alex
> 
>>> ---
>>>
>>> Example:
>>>  qemu-system-ppc-8655  [096]   724.662740: unmap:IOMMU: 
>>> iova=0x3000 size=4096 unmapped_size=4096
>>>  qemu-system-ppc-8656  [104]   724.970912: map:  IOMMU: 
>>> iova=0x0800 paddr=0x7ffef7ff size=65536
>>> ---
>>>  drivers/vfio/vfio_iommu_spapr_tce.c | 12 ++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
>>> b/drivers/vfio/vfio_iommu_spapr_tce.c
>>> index 63112c36ab2d..4531486c77c6 100644
>>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>>> @@ -22,6 +22,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  
>>>  #include 
>>>  #include 
>>> @@ -502,17 +503,19 @@ static int tce_iommu_clear(struct tce_container 
>>> *container,
>>> struct iommu_table *tbl,
>>> unsigned long entry, unsigned long pages)
>>>  {
>>> -   unsigned long oldhpa;
>>> +   unsigned long oldhpa, unmapped, firstentry = entry, totalpages = pages;
>>> long ret;
>>> enum dma_data_direction direction;
>>>  
>>> -   for ( ; pages; --pages, ++entry) {
>>> +   for (unmapped = 0; pages; --pages, ++entry) {
>>> direction = DMA_NONE;
>>> oldhpa = 0;
>>> ret = iommu_tce_xchg(tbl, entry, &oldhpa, &direction);
>>> if (ret)
>>> continue;
>>>  
>>> +   ++unmapped;
>>> +
>>> if (direction == DMA_NONE)
>>> continue;
>>>  
>>> @@ -523,6 +526,9 @@ static int tce_iommu_clear(struct tce_container 
>>> *container,
>>>  
>>> tce_iommu_unuse_page(container, oldhpa);
>>> }
>>> +   trace_unmap(firstentry << tbl->it_page_shift,
>>> +   totalpages << tbl->it_page_shift,
>>> +   unmapped << tbl->it_page_shift);
>>>  
>>> return 0;
>>>  }
>>> @@ -965,6 +971,8 @@ static long tce_iommu_ioctl(void *iommu_data,
>>> direction);
>>>  
>>> iommu_flush_tce(tbl);
>>> +   if (!ret)
>>> +   trace_map(param.iova, param.vaddr, param.size);
>>>  
>>> return ret;
>>> }
>>>   
>>
>>
> 


-- 
Alexey
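
For reference, the three arguments the quoted patch passes to trace_unmap() are plain shifts of TCE entry counts. Below is a minimal userspace sketch of that arithmetic, assuming a 4K IOMMU page (it_page_shift = 12, chosen only to match the example trace line). The struct and function names are hypothetical and not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the three values handed to trace_unmap(). */
struct unmap_event {
	uint64_t iova;          /* firstentry << it_page_shift */
	uint64_t size;          /* totalpages << it_page_shift */
	uint64_t unmapped_size; /* unmapped   << it_page_shift */
};

/* Convert TCE entry counts into byte quantities, as the patch does. */
static struct unmap_event make_unmap_event(uint64_t firstentry,
					   uint64_t totalpages,
					   uint64_t unmapped,
					   unsigned int it_page_shift)
{
	struct unmap_event ev;

	assert(it_page_shift < 64);
	assert(unmapped <= totalpages);

	ev.iova = firstentry << it_page_shift;
	ev.size = totalpages << it_page_shift;
	ev.unmapped_size = unmapped << it_page_shift;
	return ev;
}
```

With first entry 3, one page requested, one page unmapped and a shift of 12, this gives iova=0x3000, size=4096 and unmapped_size=4096, the same values as the unmap line in the example trace.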


Re: [PATCH v2 0/8] powerpc: Support ibm,dynamic-memory-v2 property

2017-11-16 Thread Bharata B Rao
On Thu, Nov 16, 2017 at 9:31 PM, Nathan Fontenot 
wrote:

>
>
> On 11/15/2017 11:37 PM, Bharata B Rao wrote:
> > On Fri, Oct 20, 2017 at 6:51 PM, Nathan Fontenot <nf...@linux.vnet.ibm.com> wrote:
> >
> > This patch set provides a set of updates to de-couple the LMB information
> > provided in the ibm,dynamic-memory device tree property from the device
> > tree property format. A part of this patch series introduces a new
> > device tree property format for dynamic memory, ibm,dynamic-memory-v2.
> > By separating the device tree format from the information provided by
> > the device tree property, consumers of this information need not know
> > what format is currently being used and provide multiple parsing routines
> > for the information.
> >
> > The first two patches update the of_get_assoc_arrays() and
> > of_get_usable_memory() routines to look up the device node for the
> > properties they parse. This is needed because the calling routines for
> > these two functions will not have the device node to pass in in
> > subsequent patches.
> >
> > The third patch adds a new kernel structure, struct drmem_lmb, that
> > is used to represent each of the possible LMBs specified in the
> > ibm,dynamic-memory* device tree properties. The patch adds code
> > to parse the property and build the LMB array data, and updates prom.c
> > to use this new data structure instead of parsing the device tree
> > directly.
> >
> > The fourth and fifth patches update the numa and pseries hotplug code
> > respectively to use the new LMB array data instead of parsing the
> > device tree directly.
> >
> > The sixth patch moves the of_drconf_cell struct to drmem.h where it
> > fits better than prom.h
> >
> > The seventh patch introduces support for the ibm,dynamic-memory-v2
> > property format by updating the new drmem.c code to be able to parse
> > and create this new device tree format.
> >
> > The last patch in the series updates the architecture vector to indicate
> > support for ibm,dynamic-memory-v2.
> >
> >
> > Here we are consolidating LMBs into LMB sets but still end up working
> > with individual LMBs during hotplug. Can we instead start working with
> > LMB sets together during hotplug ? In other words
>
> In a sense we do do this when handling memory DLPAR indexed-count
> requests. This takes a starting drc index for a LMB and adds/removes the
> following  contiguous LMBs. This operation is all-or-nothing, if any
> LMB fails to add/remove we revert back to the original state.
>

I am aware of count-indexed and we do use it for memory hotplug/unplug for
KVM on Power. However the RTAS and configure-connector calls there are
still per-LMB.


> This isn't exactly what you're asking for but...
> >
> > - The RTAS calls involved during DRC acquire stage can be done only once
> >   per LMB set.
> > - One configure-connector call for the entire LMB set.
>
> these two interfaces work on a single drc index, not a set of drc indexes.
> Working on a set of LMBs would require extending the current rtas calls
> or creating new ones.
>

Yes.


>
> One thing we can look into doing for indexed-count requests is to perform
> each of the steps for all LMBs in the set at once, i.e. make the acquire
> call for LMBs, then make the configure-connector calls for all the LMBs...
>

That is what I am hinting at to check the feasibility of such a mechanism.
Given that all the LMBs of the set are supposed to have similar attributes
(like node associativity etc), it makes sense to have a single DRC acquire
call and single configure-connector call for the entire set.


> The only drawback is this approach would make handling failures and
> backing out of the updates a bit messier, but I've never really thought
> that optimizing for the failure case to be as important.
>

Yes, error recovery can be messy given that we have multiple calls under
DRC acquire call (get-sensor-state and set-indicator).

BTW I thought this reorganization involving ibm,drc-info and
ibm,dynamic-memory-v2 was for representing and hotplugging huge amounts of
memory efficiently and quickly. So you have not yet found the per-LMB calls
to be hurting when huge amount of memory is involved ?

Regards,
Bharata.
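
To make the step-at-a-time idea discussed above concrete, here is a rough userspace model of an all-or-nothing add over a set of LMBs: every LMB is acquired first, then every LMB is configured, and any failure backs out the whole set. This is only a sketch. The struct, the helper names and the fail_configure test hook are hypothetical, the real RTAS and configure-connector calls are not modelled, and it still issues one call per LMB rather than one call per set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-LMB state; the real struct drmem_lmb is not modelled. */
struct sketch_lmb {
	bool acquired;
	bool configured;
	bool fail_configure;	/* test hook: force the configure step to fail */
};

/* Stand-in for the DRC acquire step of one LMB. */
static bool sketch_acquire(struct sketch_lmb *lmb)
{
	lmb->acquired = true;
	return true;
}

/* Stand-in for the configure-connector step of one LMB. */
static bool sketch_configure(struct sketch_lmb *lmb)
{
	if (lmb->fail_configure)
		return false;
	lmb->configured = true;
	return true;
}

/* Undo both steps for one LMB. */
static void sketch_release(struct sketch_lmb *lmb)
{
	lmb->configured = false;
	lmb->acquired = false;
}

/* Run each step across the whole set; on any failure undo everything. */
static bool sketch_add_lmb_set(struct sketch_lmb *lmbs, size_t count)
{
	size_t i;

	assert(lmbs != NULL);

	for (i = 0; i < count; i++)
		if (!sketch_acquire(&lmbs[i]))
			goto rollback;

	for (i = 0; i < count; i++)
		if (!sketch_configure(&lmbs[i]))
			goto rollback;

	return true;

rollback:
	for (i = 0; i < count; i++)
		sketch_release(&lmbs[i]);
	return false;
}
```

The rollback loop is where the "messier" error handling mentioned above shows up: a failure in the second phase has to undo work already completed for every LMB in the set, not just the one that failed.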


Re: [PATCH 2/2] ASoC: fsl_asrc: constify some arrays

2017-11-16 Thread Nicolin Chen
On Thu, Nov 16, 2017 at 08:25:47AM -0800, Joe Perches wrote:
> Using const reduces data.
> 
> $ size sound/soc/fsl/fsl_asrc.o*
>    text    data     bss     dec     hex filename
>   21691    5872     192   27755    6c6b sound/soc/fsl/fsl_asrc.o.new
>   21435    6128     192   27755    6c6b sound/soc/fsl/fsl_asrc.o.old
> 
> Signed-off-by: Joe Perches 

Acked-by: Nicolin Chen 

> ---
>  sound/soc/fsl/fsl_asrc.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
> index ed683fe8b94a..641724c9b3f8 100644
> --- a/sound/soc/fsl/fsl_asrc.c
> +++ b/sound/soc/fsl/fsl_asrc.c
> @@ -49,12 +49,12 @@ static const u8 process_option[][12][2] = {
>  };
>  
>  /* Corresponding to process_option */
> -static int supported_input_rate[] = {
> +static const int supported_input_rate[] = {
>   5512, 8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200,
>   96000, 176400, 192000,
>  };
>  
> -static int supported_asrc_rate[] = {
> +static const int supported_asrc_rate[] = {
>   8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200, 96000, 
> 176400, 192000,
>  };
>  
> @@ -62,26 +62,26 @@ static int supported_asrc_rate[] = {
>   * The following tables map the relationship between asrc_inclk/asrc_outclk 
> in
>   * fsl_asrc.h and the registers of ASRCSR
>   */
> -static unsigned char input_clk_map_imx35[] = {
> +static const unsigned char input_clk_map_imx35[] = {
>   0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf,
>  };
>  
> -static unsigned char output_clk_map_imx35[] = {
> +static const unsigned char output_clk_map_imx35[] = {
>   0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf,
>  };
>  
>  /* i.MX53 uses the same map for input and output */
> -static unsigned char input_clk_map_imx53[] = {
> +static const unsigned char input_clk_map_imx53[] = {
>  /*   0x0  0x1  0x2  0x3  0x4  0x5  0x6  0x7  0x8  0x9  0xa  0xb  0xc  0xd  
> 0xe  0xf */
>   0x0, 0x1, 0x2, 0x7, 0x4, 0x5, 0x6, 0x3, 0x8, 0x9, 0xa, 0xb, 0xc, 0xf, 
> 0xe, 0xd,
>  };
>  
> -static unsigned char output_clk_map_imx53[] = {
> +static const unsigned char output_clk_map_imx53[] = {
>  /*   0x0  0x1  0x2  0x3  0x4  0x5  0x6  0x7  0x8  0x9  0xa  0xb  0xc  0xd  
> 0xe  0xf */
>   0x8, 0x9, 0xa, 0x7, 0xc, 0x5, 0x6, 0xb, 0x0, 0x1, 0x2, 0x3, 0x4, 0xf, 
> 0xe, 0xd,
>  };
>  
> -static unsigned char *clk_map[2];
> +static const unsigned char *clk_map[2];
>  
>  /**
>   * Request ASRC pair
> -- 
> 2.15.0
> 
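
As a side note on why the size output above shifts 256 bytes from data to text: a const-qualified static table can be placed in a read-only section, which size(1) counts under text. A small standalone sketch of the two patterns in the patch follows (a const table, and a writable pointer to const data like clk_map). The table contents and function names are illustrative only, not taken from fsl_asrc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Read-only lookup table: the toolchain may place this in .rodata. */
static const int sketch_rates[] = {
	8000, 11025, 16000, 22050, 32000, 44100, 48000,
};

/* Pointer to const: the pointer stays writable, the table behind it does not. */
static const int *active_rates;

/* Point the writable pointer at the read-only table. */
static void select_default_rates(void)
{
	active_rates = sketch_rates;
}

/* Return the index of rate in the active table, or -1 if it is not listed. */
static int rate_index(int rate)
{
	size_t i;

	assert(active_rates != NULL);
	for (i = 0; i < sizeof(sketch_rates) / sizeof(sketch_rates[0]); i++)
		if (active_rates[i] == rate)
			return (int)i;
	return -1;
}
```

Building such a file with and without the const qualifiers and comparing the output of size on the two objects should show the same kind of data-to-text shift, although the exact section placement is up to the toolchain.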


Re: [PATCH RESEND v10 00/10] Application Data Integrity feature introduced by SPARC M7

2017-11-16 Thread Anthony Yznaga

> On Nov 16, 2017, at 6:38 AM, Khalid Aziz  wrote:
> 
> Changelog v10:
> 
>   - Patch 1/10: Updated si_codes definitions for SEGV to match 4.14
>   - Patch 2/10: No changes
>   - Patch 3/10: Updated copyright
>   - Patch 4/10: No changes
>   - Patch 5/10: No changes
>   - Patch 6/10: Updated copyright
>   - Patch 7/10: No changes
>   - Patch 8/10: No changes
>   - Patch 9/10: No changes
>   - Patch 10/10: Added code to return from kernel path to set
> PSTATE.mcde if kernel continues execution in another thread
> (Suggested by Anthony)

Looks good, Khalid.  Thanks for making the changes.

For the entire series:

Reviewed-by: Anthony Yznaga 


Re: [PATCH] powerpc/npu: Cleanup MMIO ATSD flushing

2017-11-16 Thread Balbir Singh
On Thu, Nov 16, 2017 at 5:24 PM, Aneesh Kumar K.V
 wrote:
> Balbir Singh  writes:
>
>> + address = start;
>> + do {
>> + local_irq_disable();
>> + find_linux_pte(mm->pgd, address, &is_thp, &hshift);
>> + if (!is_thp)
>> + shift = PAGE_SHIFT;
>
> It can still be hugetlb if is_thp is false.
>

Yep, the hshift check needs to be upfront, the version below makes sense.

>> + else if (hshift && !is_thp)
>> + shift = hshift;
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> + else
>> + shift = HPAGE_PMD_SIZE;
>
> That is wrong. I guess it should be shift = HPAGE_PMD_SHIFT. But I am
> not sure we need to make it this complex at all. See below.
>
>> +#else
>> + else {
>> + shift = PAGE_SHIFT;
>> + pr_warn_once("unsupport page size for mm %p,addr %lx\n",
>> + mm, start);
>> + }
>> +#endif
>
> I am still not sure this is correct from a pure page table walking
> point. Why not
>
>if (hshift)
>   shift = hshift;
>else
>   shift = PAGE_SHIFT;
>

I don't think I care about THP at this point

I'll respin

Balbir
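
The simplified selection suggested above can be written as a small standalone helper. This is a sketch under two assumptions: the page-table walk reports a non-zero hshift for any huge mapping, and the base page shift is 16 (64K pages), a value picked only for the example. The address-stepping helper is likewise an illustration of how the chosen shift could drive the loop, not code from the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 16u	/* assumption: 64K base pages */

/* Pick the flush granule: the huge-page shift if one was found, else the base shift. */
static unsigned int flush_shift(unsigned int hshift)
{
	return hshift ? hshift : SKETCH_PAGE_SHIFT;
}

/* Advance to the start of the next page of size 1 << shift. */
static unsigned long next_flush_address(unsigned long address, unsigned int shift)
{
	unsigned long size;

	assert(shift < sizeof(unsigned long) * 8);
	size = 1UL << shift;
	return (address & ~(size - 1)) + size;
}
```

With this shape there is no separate THP branch at all, which is the point of the suggestion: the walk result alone decides the shift.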


[PATCH v1] mm: relax deferred struct page requirements

2017-11-16 Thread Pavel Tatashin
There is no need to have ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT,
as all the page initialization code is in common code.

Also, there is no need to depend on MEMORY_HOTPLUG, as initialization code
does not really use hotplug memory functionality. So, we can remove this
requirement as well.

This patch allows to use deferred struct page initialization on all
platforms with memblock allocator.

Tested on x86, arm64, and sparc. Also, verified that code compiles on
PPC with CONFIG_MEMORY_HOTPLUG disabled.

Signed-off-by: Pavel Tatashin 
---
 arch/powerpc/Kconfig | 1 -
 arch/s390/Kconfig| 1 -
 arch/x86/Kconfig | 1 -
 mm/Kconfig   | 7 +--
 4 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cb782ac1c35d..1540348691c9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -148,7 +148,6 @@ config PPC
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
select ARCH_SUPPORTS_ATOMIC_RMW
-   select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_WANT_IPC_PARSE_VERSION
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 863a62a6de3c..525c2e3df6f5 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -108,7 +108,6 @@ config S390
select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE
select ARCH_SAVE_PAGE_KEYS if HIBERNATION
select ARCH_SUPPORTS_ATOMIC_RMW
-   select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index df3276d6bfe3..00a5446de394 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -69,7 +69,6 @@ config X86
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
select ARCH_SUPPORTS_ATOMIC_RMW
-   select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
diff --git a/mm/Kconfig b/mm/Kconfig
index 9c4b80c2..c6bd0309ce7a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -639,15 +639,10 @@ config MAX_STACK_SIZE_MB
 
  A sane initial value is 80 MB.
 
-# For architectures that support deferred memory initialisation
-config ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
-   bool
-
 config DEFERRED_STRUCT_PAGE_INIT
bool "Defer initialisation of struct pages to kthreads"
default n
-   depends on ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
-   depends on NO_BOOTMEM && MEMORY_HOTPLUG
+   depends on NO_BOOTMEM
depends on !FLATMEM
help
  Ordinarily all struct pages are initialised during early boot in a
-- 
2.15.0



Re: [PATCH kernel] vfio/spapr: Add trace points for map/unmap

2017-11-16 Thread Alex Williamson
On Tue, 14 Nov 2017 10:47:12 +1100
Alexey Kardashevskiy  wrote:

> On 27/10/17 14:00, Alexey Kardashevskiy wrote:
> > This adds trace_map/trace_unmap tracepoints to spapr driver. Type1 already
> > uses these via the IOMMU API (iommu_map/__iommu_unmap).
> > 
> > Signed-off-by: Alexey Kardashevskiy   

Is this really legitimate to include tracepoints from a different
subsystem?  The vfio type1 backend gets these trace points by virtue of
it actually using the IOMMU API, it doesn't call them itself.  I'm kind
of surprised these are actually available to be called from a module.
I suspect the way to do this is probably to define our own tracepoints
in the vfio/spapr backend or insert tracepoints into the IOMMU layers
that that code calls into rather than masquerading as tracepoints from
a different subsystem.  Right?  Thanks,

Alex

> > ---
> > 
> > Example:
> >  qemu-system-ppc-8655  [096]   724.662740: unmap:IOMMU: 
> > iova=0x3000 size=4096 unmapped_size=4096
> >  qemu-system-ppc-8656  [104]   724.970912: map:  IOMMU: 
> > iova=0x0800 paddr=0x7ffef7ff size=65536
> > ---
> >  drivers/vfio/vfio_iommu_spapr_tce.c | 12 ++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> > b/drivers/vfio/vfio_iommu_spapr_tce.c
> > index 63112c36ab2d..4531486c77c6 100644
> > --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> > @@ -22,6 +22,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -502,17 +503,19 @@ static int tce_iommu_clear(struct tce_container 
> > *container,
> > struct iommu_table *tbl,
> > unsigned long entry, unsigned long pages)
> >  {
> > -   unsigned long oldhpa;
> > +   unsigned long oldhpa, unmapped, firstentry = entry, totalpages = pages;
> > long ret;
> > enum dma_data_direction direction;
> >  
> > -   for ( ; pages; --pages, ++entry) {
> > +   for (unmapped = 0; pages; --pages, ++entry) {
> > direction = DMA_NONE;
> > oldhpa = 0;
> > ret = iommu_tce_xchg(tbl, entry, &oldhpa, &direction);
> > if (ret)
> > continue;
> >  
> > +   ++unmapped;
> > +
> > if (direction == DMA_NONE)
> > continue;
> >  
> > @@ -523,6 +526,9 @@ static int tce_iommu_clear(struct tce_container 
> > *container,
> >  
> > tce_iommu_unuse_page(container, oldhpa);
> > }
> > +   trace_unmap(firstentry << tbl->it_page_shift,
> > +   totalpages << tbl->it_page_shift,
> > +   unmapped << tbl->it_page_shift);
> >  
> > return 0;
> >  }
> > @@ -965,6 +971,8 @@ static long tce_iommu_ioctl(void *iommu_data,
> > direction);
> >  
> > iommu_flush_tce(tbl);
> > +   if (!ret)
> > +   trace_map(param.iova, param.vaddr, param.size);
> >  
> > return ret;
> > }
> >   
> 
> 



Re: [PATCH 0/9] posix_clocks: Prepare syscalls for 64 bit time_t conversion

2017-11-16 Thread Arnd Bergmann
On Thu, Nov 16, 2017 at 10:04 AM, Thomas Gleixner  wrote:
> On Wed, 15 Nov 2017, Deepa Dinamani wrote:
>> > I had one concern about x32, maybe we should check
>> > for "COMPAT_USE_64BIT_TIME" before zeroing out the tv_nsec
>> > bits.
>>
>> Thanks, I think you are right. I had the check conditional on
>> CONFIG_64BIT_TIME and then removed as I forgot why I added it. :)
>>
>> > Regarding CONFIG_COMPAT_TIME/CONFIG_64BIT_TIME, would
>> > it help to just leave out that part for now and unconditionally
>> > define '__kernel_timespec' as 'timespec' until we are ready to
>> > convert the architectures?
>>
>> Another approach would be to use separate configs:
>>
>> 1. To indicate 64 bit time_t syscall support. This will be dependent
>> on architectures as CONFIG_64BIT_TIME.
>> We can delete this once all architectures have provided support for this.
>>
>> 2. Another config (maybe COMPAT_32BIT_TIME?) to be introduced later,
>> which will compile out all syscalls/ features that use 32 bit time_t.
>> This can help build a y2038 safe kernel later.
>>
>> Would this work for everyone?
>
> Having extra config switches which are selectable by architectures and
> removed when everything is converted is definitely the right way to go.
>
> That allows you to gradually convert stuff w/o inflicting wreckage all over
> the place.

The CONFIG_64BIT_TIME would do that nicely for the new stuff like
the conditional definition of __kernel_timespec, this one would get
removed after we convert all architectures.

A second issue is how to control the compilation of the compat syscalls.
CONFIG_COMPAT_32BIT_TIME handles that and could be defined
in Kconfig as 'def_bool (!64BIT && 64BIT_TIME) || COMPAT',
this is then just a more readable way of expressing exactly when the
functions should be built.

For completeness, there may be a third category, depending on how
we handle things like sys_nanosleep(): Here, we want the native
sys_nanosleep on 64-bit architectures, and compat_sys_nanosleep()
to handle the 32-bit time_t variant on both 32-bit and 64-bit targets,
but our plan is to not have a native 32-bit sys_nanosleep on 32-bit
architectures any more, as new glibc should call clock_nanosleep()
with a new syscall number instead. Should we then enclose
sys_nanosleep in "#if !defined(CONFIG_64BIT_TIME) ||
defined(CONFIG_64BIT)", or should we try to come up with another
Kconfig symbol name that expresses this better?

   Arnd
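
To illustrate how the two conditions described above would evaluate, here is a standalone preprocessor sketch. It assumes a converted 32-bit architecture (CONFIG_64BIT_TIME set, CONFIG_64BIT and CONFIG_COMPAT unset). The SKETCH_* macros and the two helper functions are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed configuration for the sketch: a converted 32-bit architecture. */
#define CONFIG_64BIT_TIME 1
/* CONFIG_64BIT and CONFIG_COMPAT are deliberately left undefined. */

/* Mirrors the proposed COMPAT_32BIT_TIME expression. */
#if (!defined(CONFIG_64BIT) && defined(CONFIG_64BIT_TIME)) || defined(CONFIG_COMPAT)
#define SKETCH_COMPAT_32BIT_TIME 1
#else
#define SKETCH_COMPAT_32BIT_TIME 0
#endif

/* Mirrors the proposed guard around the native sys_nanosleep(). */
#if !defined(CONFIG_64BIT_TIME) || defined(CONFIG_64BIT)
#define SKETCH_NATIVE_NANOSLEEP 1
#else
#define SKETCH_NATIVE_NANOSLEEP 0
#endif

/* For the assumed configuration only the compat entry point is built. */
static_assert(SKETCH_COMPAT_32BIT_TIME && !SKETCH_NATIVE_NANOSLEEP,
	      "converted 32-bit: compat nanosleep only");

static bool builds_compat_nanosleep(void)
{
	return SKETCH_COMPAT_32BIT_TIME;
}

static bool builds_native_nanosleep(void)
{
	return SKETCH_NATIVE_NANOSLEEP;
}
```

Defining CONFIG_64BIT instead (with or without CONFIG_COMPAT) flips the native guard on, which is the 64-bit case described above. The static_assert would then need adjusting, since it encodes only the assumed 32-bit configuration.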


Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends

2017-11-16 Thread Michael Ellerman
Brian King  writes:

> On 11/16/2017 01:33 PM, Jesse Brandeburg wrote:
>> On Thu, 16 Nov 2017 09:37:48 -0600
>> Brian King  wrote:
>> 
>>> Resending as the first attempt is not showing up in the list archive.
>>>
>>> This patch converts several network drivers to use smp_rmb
>>> rather than read_barrier_depends. The initial issue was
>>> discovered with ixgbe on a Power machine which resulted
>>> in skb list corruption due to fetching a stale skb pointer.
>>> More details can be found in the ixgbe patch description.
>> 
>> Thanks for the fix Brian, I bet it was a tough debug.
>> 
>> The only users in the entire kernel of read_barrier_depends() (not
>> smp_read_barrier_depends) are the Intel network drivers.
>> 
>> Wouldn't it be better for power to just fix read_barrier_depends to do
>> the right thing on power? The question I'm not sure of the answer to is:
>> Is it really the wrong barrier to be using or is the implementation in
>> the kernel powerpc wrong?
>> 
>> So I think the right thing might actually be to:
>> Fix arch powerpc read_barrier_depends to not be a noop, as the
>> semantics of the read_barrier_depends seems to be sufficient to solve
>> this problem, but it seems not to work for powerpc?
>
> Jesse,
>
> Thanks for the quick response.
>
> Cc'ing linuxppc-dev as well. 
>
> I did think about changing the powerpc definition of read_barrier_depends,
> but after reading up on that barrier, decided it was not the correct barrier
> to be used in this context. Here is some good historical background on
> read_barrier_depends that I found, along with an example.
>
> https://lwn.net/Articles/5159/
>
> Since there is no data-dependency in the code in question here, I think
> the smp_rmb is the proper barrier to use.

Yes I agree.

The read_barrier_depends() is correct to order the load of eop_desc and
then the dependent load of eop_desc->wb.status, but it's only required
or does anything on Alpha.

> For background, the code in question looks like this:
>
> CPU 1                                   CPU2
>
> 1: ixgbe_xmit_frame_ring                ixgbe_clean_tx_irq
> 2:  first->skb = skb                    eop_desc = tx_buffer->next_to_watch
>                                         if (!eop_desc)
>                                             break;
> 3:  ixgbe_tx_map                        read_barrier_depends()
>                                         if (!(eop_desc->wb.status) ... )
>                                             break;
> 4:   wmb
> 5:   first->next_to_watch = tx_desc     napi_consume_skb(tx_buffer->skb ..);
> 6:   writel(i, tx_ring->tail);
>
> What we see on powerpc is that tx_buffer->skb on CPU2 is getting loaded
> prior to tx_buffer->next_to_watch. Changing the read_barrier_depends
> to a smp_rmb solves this and prevents us from dereferencing old pointer.

Right. Given that read_barrier_depends() is a nop, there's nothing there
to order the load of tx_buffer->skb vs anything else.

If it's actually the load of tx_buffer->skb that's the issue then the
smp_rmb() should really be immediately prior to that, rather than where
the read_barrier_depends() currently is.

cheers
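
A userspace analogue of the ordering problem may help here. The sketch below uses C11 atomics rather than kernel barriers: a release fence stands in for wmb() on the producer side and an acquire fence stands in for smp_rmb() on the consumer side, placed immediately before the load of skb as suggested above. The struct and function names are hypothetical, and the mapping from C11 fences to kernel barriers is an approximation for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_tx_buffer {
	void *skb;			/* plain field, like tx_buffer->skb */
	_Atomic(void *) next_to_watch;	/* non-NULL once the buffer is published */
};

/* Producer side (xmit path): fill in skb, then publish next_to_watch. */
static void sketch_xmit(struct sketch_tx_buffer *buf, void *skb, void *desc)
{
	assert(desc != NULL);	/* NULL means "nothing published" */

	buf->skb = skb;
	atomic_thread_fence(memory_order_release);	/* stands in for wmb() */
	atomic_store_explicit(&buf->next_to_watch, desc, memory_order_relaxed);
}

/* Consumer side (clean path): return the skb, or NULL if nothing is published. */
static void *sketch_clean(struct sketch_tx_buffer *buf)
{
	void *eop = atomic_load_explicit(&buf->next_to_watch,
					 memory_order_relaxed);

	if (!eop)
		return NULL;

	/* Stands in for smp_rmb(): the skb load below is ordered after the load above. */
	atomic_thread_fence(memory_order_acquire);
	return buf->skb;
}
```

Without the acquire fence nothing orders the two loads in sketch_clean(), since the skb load does not depend on the value read from next_to_watch. That is the same gap a no-op read_barrier_depends() leaves in the driver.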



Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends

2017-11-16 Thread Jesse Brandeburg
On Thu, 16 Nov 2017 14:03:02 -0600
Brian King  wrote:
> I did think about changing the powerpc definition of read_barrier_depends,
> but after reading up on that barrier, decided it was not the correct barrier
> to be used in this context. Here is some good historical background on
> read_barrier_depends that I found, along with an example.
> 
> https://lwn.net/Articles/5159/
> 
> Since there is no data-dependency in the code in question here, I think
> the smp_rmb is the proper barrier to use.

Hey Brian, thanks for the explanation, I'll agree with you and Alex
that the smp_rmb replacement is okay.  Does your test still pass
without the ->skb NULLs?


Re: [PATCH v3] ppc64 boot: Wait for boot cpu to show up if nr_cpus limit is about to hit.

2017-11-16 Thread Guilherme G. Piccoli
On 11/06/2017 03:34 PM, Thadeu Lima de Souza Cascardo wrote:
> From: Mahesh Salgaonkar 
> 
> The kernel boot parameter 'nr_cpus=' allows one to specify number of
> possible cpus in the system. In the normal scenario the first cpu (cpu0)
> that shows up is the boot cpu and hence it gets covered under nr_cpus
> limit.
> 
> But this assumption will be broken in kdump scenario where kdump kenrel

Minor nit: s/kenrel/kernel

> after a crash can boot up on an non-zero boot cpu. The paca structure
> allocation depends on value of nr_cpus and is indexed using logical cpu
> ids. This definetly will be an issue if boot cpu id > nr_cpus

Minor nit (2): s/definetly/definitely
> 
> This patch modifies allocate_pacas() and smp_setup_cpu_maps() to
> accommodate boot cpu for the case where boot_cpuid > nr_cpu_ids.
> 
> This change would help to reduce the memory reservation requirement for
> kdump on ppc64.
> 
> Signed-off-by: Mahesh Salgaonkar 
> Signed-off-by: Thadeu Lima de Souza Cascardo 

Tested-by: Guilherme G. Piccoli 

Without this patch, got a crash deterministically when booting with
nr_cpus=1 in P8 bare-metal. The patch fixes the issue...

Thanks,


Guilherme


> ---
> 
> v3: fixup signedness of nr_cpus to match nr_cpu_ids
>  and fix conflict due to change from %d to %u
> 
> Resending this as it was not applied, and I can reproduce the issue with
> v4.14-rc8 when booting a kdump kernel after a crash that has been given
> nr_cpus=1 as a parameter. With this patch, I can't reproduce it anymore.
> 
> ---
>  arch/powerpc/include/asm/paca.h|  3 +++
>  arch/powerpc/include/asm/smp.h |  1 +
>  arch/powerpc/kernel/paca.c | 23 +-
>  arch/powerpc/kernel/prom.c | 39 
> +-
>  arch/powerpc/kernel/setup-common.c | 25 
>  5 files changed, 85 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index 04b60af027ae..ea0dbf2bbeef 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -49,6 +49,9 @@ extern unsigned int debug_smp_processor_id(void); /* from 
> linux/smp.h */
>  #define get_lppaca() (get_paca()->lppaca_ptr)
>  #define get_slb_shadow() (get_paca()->slb_shadow_ptr)
> 
> +/* Maximum number of threads per core. */
> +#define  MAX_SMT 8
> +
>  struct task_struct;
> 
>  /*
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index fac963e10d39..553cd22b2ccc 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -30,6 +30,7 @@
>  #include 
> 
>  extern int boot_cpuid;
> +extern int boot_hw_cpuid;
>  extern int spinning_secondaries;
> 
>  extern void cpu_die(void);
> diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
> index 2ff2b8a19f71..9c689ee4b6a3 100644
> --- a/arch/powerpc/kernel/paca.c
> +++ b/arch/powerpc/kernel/paca.c
> @@ -207,6 +207,7 @@ void __init allocate_pacas(void)
>  {
>   u64 limit;
>   int cpu;
> + unsigned int nr_cpus;
> 
>   limit = ppc64_rma_size;
> 
> @@ -219,20 +220,32 @@ void __init allocate_pacas(void)
>   limit = min(0x1000ULL, limit);
>  #endif
> 
> - paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpu_ids);
> + /*
> +  * Always align up the nr_cpu_ids to SMT threads and allocate
> +  * the paca. This will help us to prepare for a situation where
> +  * boot cpu id > nr_cpus_id. We will use the last nthreads
> +  * slots (nthreads == threads per core) to accommodate a core
> +  * that contains boot cpu thread.
> +  *
> +  * Do not change nr_cpu_ids value here. Let us do that in
> +  * early_init_dt_scan_cpus() where we know exact value
> +  * of threads per core.
> +  */
> + nr_cpus = _ALIGN_UP(nr_cpu_ids, MAX_SMT);
> + paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpus);
> 
>   paca = __va(memblock_alloc_base(paca_size, PAGE_SIZE, limit));
>   memset(paca, 0, paca_size);
> 
>   printk(KERN_DEBUG "Allocated %u bytes for %u pacas at %p\n",
> - paca_size, nr_cpu_ids, paca);
> + paca_size, nr_cpus, paca);
> 
> - allocate_lppacas(nr_cpu_ids, limit);
> + allocate_lppacas(nr_cpus, limit);
> 
> - allocate_slb_shadows(nr_cpu_ids, limit);
> + allocate_slb_shadows(nr_cpus, limit);
> 
>   /* Can't use for_each_*_cpu, as they aren't functional yet */
> - for (cpu = 0; cpu < nr_cpu_ids; cpu++)
> + for (cpu = 0; cpu < nr_cpus; cpu++)
>   initialise_paca(&paca[cpu], cpu);
>  }
> 
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index f83056297441..93837093c5cb 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -302,6 +302,29 @@ static void __init check_cpu_feature_properties(unsigned 
> 
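
The paca sizing change quoted above boils down to rounding nr_cpu_ids up to a multiple of MAX_SMT, so that a whole core's worth of slots exists even when nr_cpus=1. Here is a minimal standalone sketch of that rounding. The helper names are hypothetical, and sketch_align_up() is an ordinary power-of-two round-up written for this example rather than the kernel's _ALIGN_UP macro itself.

```c
#include <assert.h>

#define SKETCH_MAX_SMT 8u	/* mirrors MAX_SMT from the quoted patch */

/* Round n up to the next multiple of align (align must be a power of two). */
static unsigned int sketch_align_up(unsigned int n, unsigned int align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (n + align - 1) & ~(align - 1);
}

/* Number of paca slots to allocate for a given nr_cpu_ids. */
static unsigned int paca_slots(unsigned int nr_cpu_ids)
{
	return sketch_align_up(nr_cpu_ids, SKETCH_MAX_SMT);
}
```

For nr_cpus=1 this yields 8 slots instead of 1, which leaves room for the threads of the core that holds a non-zero boot cpu in the kdump case.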

Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends

2017-11-16 Thread Duyck, Alexander H
On Thu, 2017-11-16 at 14:03 -0600, Brian King wrote:
> On 11/16/2017 01:33 PM, Jesse Brandeburg wrote:
> > On Thu, 16 Nov 2017 09:37:48 -0600
> > Brian King  wrote:
> > 
> > > Resending as the first attempt is not showing up in the list archive.
> > > 
> > > This patch converts several network drivers to use smp_rmb
> > > rather than read_barrier_depends. The initial issue was
> > > discovered with ixgbe on a Power machine which resulted
> > > in skb list corruption due to fetching a stale skb pointer.
> > > More details can be found in the ixgbe patch description.
> > 
> > Thanks for the fix Brian, I bet it was a tough debug.
> > 
> > The only users in the entire kernel of read_barrier_depends() (not
> > smp_read_barrier_depends) are the Intel network drivers.
> > 
> > Wouldn't it be better for power to just fix read_barrier_depends to do
> > the right thing on power? The question I'm not sure of the answer to is:
> > Is it really the wrong barrier to be using or is the implementation in
> > the kernel powerpc wrong?
> > 
> > So I think the right thing might actually be to:
> > Fix arch powerpc read_barrier_depends to not be a noop, as the
> > semantics of the read_barrier_depends seems to be sufficient to solve
> > this problem, but it seems not to work for powerpc?
> 
> Jesse,
> 
> Thanks for the quick response.
> 
> Cc'ing linuxppc-dev as well. 
> 
> I did think about changing the powerpc definition of read_barrier_depends,
> but after reading up on that barrier, decided it was not the correct barrier
> to be used in this context. Here is some good historical background on
> read_barrier_depends that I found, along with an example.
> 
> https://lwn.net/Articles/5159/
> 
> Since there is no data-dependency in the code in question here, I think
> the smp_rmb is the proper barrier to use.
> 
> For background, the code in question looks like this:
> 
> CPU 1                                   CPU2
>
> 1: ixgbe_xmit_frame_ring                ixgbe_clean_tx_irq
> 2:  first->skb = skb                    eop_desc = tx_buffer->next_to_watch
>                                         if (!eop_desc)
>                                             break;
> 3:  ixgbe_tx_map                        read_barrier_depends()
>                                         if (!(eop_desc->wb.status) ... )
>                                             break;
> 4:   wmb
> 5:   first->next_to_watch = tx_desc     napi_consume_skb(tx_buffer->skb ..);
> 6:   writel(i, tx_ring->tail);
> 
> What we see on powerpc is that tx_buffer->skb on CPU2 is getting loaded
> prior to tx_buffer->next_to_watch. Changing the read_barrier_depends
> to an smp_rmb solves this and prevents us from dereferencing an old pointer.
> 
> -Brian

So the barrier part I am okay with for all the drivers. I hadn't
accounted for the skb being read before next_to_watch. I was more
concerned about the descriptor ring versus buffer_info structure at the
time I made use of that.

The updates to clear tx_buffer->skb in ixgbe I am not okay with.
Basically the tell-tale sign for skb present is next_to_watch being
non-null. The extra writes add overhead and I want to avoid that at all
costs since I want to avoid as much bouncing between the xmit path and
the Tx clean-up as possible.

- Alex



[PATCH V3 4/4] powerpc: Enable support for ibm,drc-info devtree property

2017-11-16 Thread Michael Bringmann
prom_init.c: Enable support for new DRC device tree property
"ibm,drc-info" in initial handshake between the Linux kernel and
the front end processor.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/kernel/prom_init.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 02190e9..f962908 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -873,6 +873,7 @@ struct ibm_arch_vec __cacheline_aligned 
ibm_architecture_vec = {
.mmu = 0,
.hash_ext = 0,
.radix_ext = 0,
+   .byte22 = OV5_FEAT(OV5_DRC_INFO),
},
 
/* option vector 6: IBM PAPR hints */



[PATCH V4 3/4] hotplug/drc-info: Add code to search ibm,drc-info property

2017-11-16 Thread Michael Bringmann
rpadlpar_core.c: Provide parallel routines to search the older device-
tree properties ("ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
and "ibm,drc-power-domains"), or the new property "ibm,drc-info".

The interface to examine the DRC information is changed from a "get"
function that returns values for local verification elsewhere, to a
"check" function that validates the 'name' and/or 'type' of a device
node.  This update hides the format of the underlying device-tree
properties, and concentrates the value checks into a single function
without requiring the user to verify whether a search was successful.
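
As a rough illustration of the "check" style described above, here is a minimal user-space sketch. The struct drc_rec type and the check_drc() helper are hypothetical names made up for this example, and it returns -1 where the kernel code would return -EINVAL; it is not the rpaphp implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record standing in for one node's DRC name and type. */
struct drc_rec {
	const char *name;
	const char *type;
};

/*
 * "check" style: return 0 when every non-NULL argument matches the
 * record, -1 otherwise.  A NULL argument means "do not care", so the
 * caller never has to look at the underlying property format.
 */
static int check_drc(const struct drc_rec *r, const char *name,
		     const char *type)
{
	assert(r && r->name && r->type);

	if (name && strcmp(name, r->name) != 0)
		return -1;
	if (type && strcmp(type, r->type) != 0)
		return -1;
	return 0;
}
```

With this shape a caller only tests the return value, which is what lets find_vio_slot_node() and find_php_slot_pci_node() drop their local strcmp() calls.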

Signed-off-by: Michael Bringmann 
---
Changes in V4:
  -- Rename of_one_drc_info to of_read_drc_info_cell
  -- Fix some spacing within arguments
---
 drivers/pci/hotplug/rpadlpar_core.c |   13 ++--
 drivers/pci/hotplug/rpaphp.h|4 +
 drivers/pci/hotplug/rpaphp_core.c   |  110 +++
 3 files changed, 92 insertions(+), 35 deletions(-)

diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
b/drivers/pci/hotplug/rpadlpar_core.c
index a3449d7..fc01d7d 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../pci.h"
 #include "rpaphp.h"
@@ -44,15 +45,14 @@ static struct device_node *find_vio_slot_node(char 
*drc_name)
 {
struct device_node *parent = of_find_node_by_name(NULL, "vdevice");
struct device_node *dn = NULL;
-   char *name;
int rc;
 
if (!parent)
return NULL;
 
while ((dn = of_get_next_child(parent, dn))) {
-   rc = rpaphp_get_drc_props(dn, NULL, &name, NULL, NULL);
-   if ((rc == 0) && (!strcmp(drc_name, name)))
+   rc = rpaphp_check_drc_props(dn, drc_name, NULL);
+   if (rc == 0)
break;
}
 
@@ -64,15 +64,12 @@ static struct device_node *find_php_slot_pci_node(char 
*drc_name,
  char *drc_type)
 {
struct device_node *np = NULL;
-   char *name;
-   char *type;
int rc;
 
while ((np = of_find_node_by_name(np, "pci"))) {
-   rc = rpaphp_get_drc_props(np, NULL, &name, &type, NULL);
+   rc = rpaphp_check_drc_props(np, drc_name, drc_type);
if (rc == 0)
-   if (!strcmp(drc_name, name) && !strcmp(drc_type, type))
-   break;
+   break;
}
 
return np;
diff --git a/drivers/pci/hotplug/rpaphp.h b/drivers/pci/hotplug/rpaphp.h
index 7db024e..8db5f2e 100644
--- a/drivers/pci/hotplug/rpaphp.h
+++ b/drivers/pci/hotplug/rpaphp.h
@@ -91,8 +91,8 @@ struct slot {
 
 /* rpaphp_core.c */
 int rpaphp_add_slot(struct device_node *dn);
-int rpaphp_get_drc_props(struct device_node *dn, int *drc_index,
-   char **drc_name, char **drc_type, int *drc_power_domain);
+int rpaphp_check_drc_props(struct device_node *dn, char *drc_name,
+   char *drc_type);
 
 /* rpaphp_slot.c */
 void dealloc_slot_struct(struct slot *slot);
diff --git a/drivers/pci/hotplug/rpaphp_core.c 
b/drivers/pci/hotplug/rpaphp_core.c
index 1e29aba..0a3b5f5 100644
--- a/drivers/pci/hotplug/rpaphp_core.c
+++ b/drivers/pci/hotplug/rpaphp_core.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include/* for eeh_add_device() */
 #include   /* rtas_call */
 #include /* for pci_controller */
@@ -196,25 +197,21 @@ static int get_children_props(struct device_node *dn, 
const int **drc_indexes,
return 0;
 }
 
-/* To get the DRC props describing the current node, first obtain it's
- * my-drc-index property.  Next obtain the DRC list from it's parent.  Use
- * the my-drc-index for correlation, and obtain the requested properties.
+
+/* Verify the existence of 'drc_name' and/or 'drc_type' within the
+ * current node.  First obtain it's my-drc-index property.  Next,
+ * obtain the DRC info from it's parent.  Use the my-drc-index for
+ * correlation, and obtain/validate the requested properties.
  */
-int rpaphp_get_drc_props(struct device_node *dn, int *drc_index,
-   char **drc_name, char **drc_type, int *drc_power_domain)
+
+static int rpaphp_check_drc_props_v1(struct device_node *dn, char *drc_name,
+   char *drc_type, unsigned int my_index)
 {
+   char *name_tmp, *type_tmp;
const int *indexes, *names;
const int *types, *domains;
-   const unsigned int *my_index;
-   char *name_tmp, *type_tmp;
int i, rc;
 
-   my_index = of_get_property(dn, "ibm,my-drc-index", NULL);
-   if (!my_index) {
-   /* Node isn't DLPAR/hotplug capable */
-   return -EINVAL;
-   }
-
rc = get_children_props(dn->parent, &indexes, &names, &types, &domains);
if (rc < 0) {
return -EINVAL;
@@ -225,24 +222,87 @@ int 

[PATCH V4 2/4] pseries/drc-info: Search DRC properties for CPU indexes

2017-11-16 Thread Michael Bringmann
pseries/drc-info: Provide parallel routines to convert between
drc_index and CPU numbers at runtime, using the older device-tree
properties ("ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
and "ibm,drc-power-domains"), or the new property "ibm,drc-info".
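
To make the index arithmetic concrete, here is a small user-space sketch of how one "ibm,drc-info" entry could be mapped to drc_index values and back. The field names mirror struct of_drc_info from this patch and the last-index formula matches of_read_drc_info_cell(); the helper names and the sample numbers in the usage note are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Numeric fields of one entry, mirroring struct of_drc_info. */
struct drc_range {
	uint32_t drc_index_start;
	uint32_t num_sequential_elems;
	uint32_t sequential_inc;
};

/* Last index covered by the entry (same formula as last_drc_index). */
static uint32_t drc_last_index(const struct drc_range *r)
{
	assert(r->num_sequential_elems > 0);
	return r->drc_index_start +
	       (r->num_sequential_elems - 1) * r->sequential_inc;
}

/* drc_index of the n-th element (0-based); 0 on success, -1 if out of range. */
static int drc_index_of(const struct drc_range *r, uint32_t n, uint32_t *out)
{
	if (n >= r->num_sequential_elems)
		return -1;
	*out = r->drc_index_start + n * r->sequential_inc;
	return 0;
}

/* Inverse mapping: position of idx within the entry, or -1 if not a member. */
static long drc_position_of(const struct drc_range *r, uint32_t idx)
{
	uint32_t off;

	if (idx < r->drc_index_start || idx > drc_last_index(r))
		return -1;
	off = idx - r->drc_index_start;
	if (off == 0)
		return 0;
	/* Guard against a zero increment before dividing. */
	if (r->sequential_inc == 0 || off % r->sequential_inc != 0)
		return -1;
	return (long)(off / r->sequential_inc);
}
```

For a hypothetical entry with start 0x10000000, 4 elements and an increment of 8, the covered indexes are 0x10000000, 0x10000008, 0x10000010 and 0x10000018, so a whole run of cores is described by three integers instead of one array slot per core.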

Signed-off-by: Michael Bringmann 
---
Changes in V4:
  -- Rename of_one_drc_info to of_read_drc_info_cell
  -- Fix some spacing within expressions
  -- Make some style corrections
---
 arch/powerpc/include/asm/prom.h |   15 +++
 arch/powerpc/platforms/pseries/of_helpers.c |   60 ++
 arch/powerpc/platforms/pseries/pseries_energy.c |  138 ++-
 3 files changed, 185 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 3243455..0ef41b1 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -96,6 +96,21 @@ struct of_drconf_cell {
 #define DRCONF_MEM_AI_INVALID  0x0040
 #define DRCONF_MEM_RESERVED 0x0080
 
+struct of_drc_info {
+   char *drc_type;
+   char *drc_name_prefix;
+   u32 drc_index_start;
+   u32 drc_name_suffix_start;
+   u32 num_sequential_elems;
+   u32 sequential_inc;
+   u32 drc_power_domain;
+   u32 last_drc_index;
+};
+
+extern int of_read_drc_info_cell(struct property **prop,
+   const __be32 **curval, struct of_drc_info *data);
+
+
 /*
  * There are two methods for telling firmware what our capabilities are.
  * Newer machines have an "ibm,client-architecture-support" method on the
diff --git a/arch/powerpc/platforms/pseries/of_helpers.c 
b/arch/powerpc/platforms/pseries/of_helpers.c
index 2798933..b36f1ae 100644
--- a/arch/powerpc/platforms/pseries/of_helpers.c
+++ b/arch/powerpc/platforms/pseries/of_helpers.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "of_helpers.h"
 
@@ -36,3 +37,62 @@ struct device_node *pseries_of_derive_parent(const char 
*path)
kfree(parent_path);
return parent ? parent : ERR_PTR(-EINVAL);
 }
+
+
+/* Helper Routines to convert between drc_index to cpu numbers */
+
+int of_read_drc_info_cell(struct property **prop, const __be32 **curval,
+   struct of_drc_info *data)
+{
+   const char *p;
+   const __be32 *p2;
+
+   if (!data)
+   return -EINVAL;
+
+   /* Get drc-type:encode-string */
+   p = data->drc_type = (char*) (*curval);
+   p = of_prop_next_string(*prop, p);
+   if (!p)
+   return -EINVAL;
+
+   /* Get drc-name-prefix:encode-string */
+   data->drc_name_prefix = (char *)p;
+   p = of_prop_next_string(*prop, p);
+   if (!p)
+   return -EINVAL;
+
+   /* Get drc-index-start:encode-int */
+   p2 = (const __be32 *)p;
+   p2 = of_prop_next_u32(*prop, p2, &data->drc_index_start);
+   if (!p2)
+   return -EINVAL;
+
+   /* Get drc-name-suffix-start:encode-int */
+   p2 = of_prop_next_u32(*prop, p2, &data->drc_name_suffix_start);
+   if (!p2)
+   return -EINVAL;
+
+   /* Get number-sequential-elements:encode-int */
+   p2 = of_prop_next_u32(*prop, p2, &data->num_sequential_elems);
+   if (!p2)
+   return -EINVAL;
+
+   /* Get sequential-increment:encode-int */
+   p2 = of_prop_next_u32(*prop, p2, &data->sequential_inc);
+   if (!p2)
+   return -EINVAL;
+
+   /* Get drc-power-domain:encode-int */
+   p2 = of_prop_next_u32(*prop, p2, &data->drc_power_domain);
+   if (!p2)
+   return -EINVAL;
+
+   /* Should now know end of current entry */
+   (*curval) = (void *)p2;
+   data->last_drc_index = data->drc_index_start +
+   ((data->num_sequential_elems - 1) * data->sequential_inc);
+
+   return 0;
+}
+EXPORT_SYMBOL(of_read_drc_info_cell);
diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c 
b/arch/powerpc/platforms/pseries/pseries_energy.c
index 35c891a..b8f6603 100644
--- a/arch/powerpc/platforms/pseries/pseries_energy.c
+++ b/arch/powerpc/platforms/pseries/pseries_energy.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 
 #define MODULE_VERS "1.0"
@@ -38,26 +39,64 @@
 static u32 cpu_to_drc_index(int cpu)
 {
struct device_node *dn = NULL;
-   const int *indexes;
-   int i;
+   int thread_index;
int rc = 1;
u32 ret = 0;
 
dn = of_find_node_by_path("/cpus");
if (dn == NULL)
goto err;
-   indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
-   if (indexes == NULL)
-   goto err_of_node_put;
+
/* Convert logical cpu number to core number */
-   i = cpu_core_index_of_thread(cpu);
-   /*
-* The first element indexes[0] is the number of drc_indexes
-* returned in the list.  Hence i+1 will get the drc_index
-* corresponding to core number i.
-*/
-   WARN_ON(i > 

[PATCH V4 1/4] powerpc/firmware: Add definitions for new drc-info firmware feature

2017-11-16 Thread Michael Bringmann
Firmware Features: Define new bit flag representing the presence of
new device tree property "ibm,drc-info".  The flag is used to tell
the front end processor whether the Linux kernel supports the new
property, and by the front end processor to tell the Linux kernel
that the new property is present in the device tree.
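
For readers unfamiliar with how these option vector 5 bits are looked up, here is a small stand-alone sketch. The OV5_INDX()/OV5_FEAT() split (high byte is the byte offset, low byte is the bit mask) is my understanding of the existing kernel macros, OV5_DRC_INFO uses the value from this patch, and vec5_has() is a hypothetical helper, not the kernel's fw_vec5_feature_init().

```c
#include <assert.h>
#include <stddef.h>

/* High byte selects the byte in the vector, low byte is the bit mask. */
#define OV5_INDX(x)	((x) >> 8)
#define OV5_FEAT(x)	((x) & 0xff)
#define OV5_DRC_INFO	0x1640	/* byte 22, mask 0x40 */

/* Returns 1 if the feature bit is set in the vector bytes, else 0. */
static int vec5_has(const unsigned char *vec5, size_t len, unsigned int feat)
{
	size_t idx = OV5_INDX(feat);

	assert(vec5);
	if (idx >= len)
		return 0;	/* vector too short to carry this feature */
	return (vec5[idx] & OV5_FEAT(feat)) != 0;
}
```

A table-driven caller would run a check like this for each {FW_FEATURE_*, OV5_*} pair and OR the matching FW_FEATURE bit into the firmware feature word.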

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/firmware.h   |3 ++-
 arch/powerpc/include/asm/prom.h   |1 +
 arch/powerpc/platforms/pseries/firmware.c |1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/firmware.h 
b/arch/powerpc/include/asm/firmware.h
index 8645897..329d537 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -51,6 +51,7 @@
 #define FW_FEATURE_BEST_ENERGY ASM_CONST(0x8000)
 #define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0001)
 #define FW_FEATURE_PRRN ASM_CONST(0x0002)
+#define FW_FEATURE_DRC_INFO ASM_CONST(0x0004)
 
 #ifndef __ASSEMBLY__
 
@@ -67,7 +68,7 @@ enum {
FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
-   FW_FEATURE_HPT_RESIZE,
+   FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRC_INFO,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 825bd59..3243455 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -175,6 +175,7 @@ struct of_drconf_cell {
 #define OV5_HASH_GTSE  0x1940  /* Guest Translation Shoot Down Avail */
 /* Radix Table Extensions */
 #define OV5_RADIX_GTSE 0x1A40  /* Guest Translation Shoot Down Avail */
+#define OV5_DRC_INFO   0x1640  /* Redef Prop Structures: drc-info   */
 
 /* Option Vector 6: IBM PAPR hints */
 #define OV6_LINUX  0x02/* Linux is our OS */
diff --git a/arch/powerpc/platforms/pseries/firmware.c 
b/arch/powerpc/platforms/pseries/firmware.c
index 63cc82a..757d757 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -114,6 +114,7 @@ struct vec5_fw_feature {
 vec5_fw_features_table[] = {
{FW_FEATURE_TYPE1_AFFINITY, OV5_TYPE1_AFFINITY},
{FW_FEATURE_PRRN,   OV5_PRRN},
+   {FW_FEATURE_DRC_INFO,   OV5_DRC_INFO},
 };
 
 static void __init fw_vec5_feature_init(const char *vec5, unsigned long len)



[PATCH V4 0/4] powerpc/devtree: Add support for 'ibm,drc-info' property

2017-11-16 Thread Michael Bringmann
Several properties in the DRC device tree format are replaced by
more compact representations to allow, for example, for the encoding
of vast amounts of memory, and/or reduced duplication of information
in related data structures.

"ibm,drc-info": This property, when present, replaces the following
four properties: "ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
and "ibm,drc-power-domains".  This property is defined for all
dynamically reconfigurable platform nodes.  The "ibm,drc-info" elements
are intended to provide a more compact representation, and reduce some
search overhead.

"ibm,architecture.vec": Bidirectional communication mechanism between
the host system and the front end processor indicating what features
the host system supports and what features the front end processor will
actually provide.  In this case, we are indicating that the host system
can support the new device tree structure "ibm,drc-info".

Signed-off-by: Michael Bringmann 

Michael Bringmann (4):
  powerpc/firmware: Add definitions for new drc-info firmware feature.
  pseries/drc-info: Search new DRC properties for CPU indexes
  hotplug/drc-info: Add code to search new devtree property
  powerpc: Enable support for new DRC devtree property
---
Changes in V4:
  -- Rename of_one_drc_info to of_read_drc_info_cell
  -- Fix some spacing within expressions
  -- Make some style corrections



Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends

2017-11-16 Thread Brian King
On 11/16/2017 01:33 PM, Jesse Brandeburg wrote:
> On Thu, 16 Nov 2017 09:37:48 -0600
> Brian King  wrote:
> 
>> Resending as the first attempt is not showing up in the list archive.
>>
>> This patch converts several network drivers to use smp_rmb
>> rather than read_barrier_depends. The initial issue was
>> discovered with ixgbe on a Power machine which resulted
>> in skb list corruption due to fetching a stale skb pointer.
>> More details can be found in the ixgbe patch description.
> 
> Thanks for the fix Brian, I bet it was a tough debug.
> 
> The only users in the entire kernel of read_barrier_depends() (not
> smp_read_barrier_depends) are the Intel network drivers.
> 
> Wouldn't it be better for power to just fix read_barrier_depends to do
> the right thing on power? The question I'm not sure of the answer to is:
> Is it really the wrong barrier to be using or is the implementation in
> the kernel powerpc wrong?
> 
> So I think the right thing might actually be to:
> Fix arch powerpc read_barrier_depends to not be a noop, as the
> semantics of the read_barrier_depends seems to be sufficient to solve
> this problem, but it seems not to work for powerpc?

Jesse,

Thanks for the quick response.

Cc'ing linuxppc-dev as well. 

I did think about changing the powerpc definition of read_barrier_depends,
but after reading up on that barrier, decided it was not the correct barrier
to be used in this context. Here is some good historical background on
read_barrier_depends that I found, along with an example.

https://lwn.net/Articles/5159/

Since there is no data-dependency in the code in question here, I think
the smp_rmb is the proper barrier to use.

For background, the code in question looks like this:

         CPU 1                              CPU2

1: ixgbe_xmit_frame_ring                    ixgbe_clean_tx_irq
2:  first->skb = skb                        eop_desc = tx_buffer->next_to_watch
                                            if (!eop_desc)
                                                break;
3:  ixgbe_tx_map                            read_barrier_depends()
                                            if (!(eop_desc->wb.status) ... )
                                                break;
4:   wmb
5:   first->next_to_watch = tx_desc         napi_consume_skb(tx_buffer->skb ..);
6:   writel(i, tx_ring->tail);

What we see on powerpc is that tx_buffer->skb on CPU2 is getting loaded
prior to tx_buffer->next_to_watch. Changing the read_barrier_depends
to an smp_rmb solves this and prevents us from dereferencing an old pointer.
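
The ordering requirement can be sketched in user space with C11 atomics. This is only an analogy under stated assumptions: atomic_thread_fence(memory_order_release) stands in for wmb(), atomic_thread_fence(memory_order_acquire) stands in for smp_rmb(), and the struct and function names are hypothetical, not the ixgbe code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the descriptor and the buffer info. */
struct desc {
	_Atomic unsigned int status;
};

struct tx_buf {
	void *skb;				/* plain field, written first */
	_Atomic(struct desc *) next_to_watch;	/* published last */
};

/* Producer (xmit path): store skb, then publish next_to_watch. */
static void publish(struct tx_buf *b, void *skb, struct desc *eop)
{
	assert(b && skb && eop);
	b->skb = skb;
	atomic_thread_fence(memory_order_release);	/* role of wmb() */
	atomic_store_explicit(&b->next_to_watch, eop, memory_order_relaxed);
}

/* Consumer (clean-up path): returns the skb to free, or NULL if not ready. */
static void *consume(struct tx_buf *b)
{
	struct desc *eop;

	eop = atomic_load_explicit(&b->next_to_watch, memory_order_relaxed);
	if (!eop)
		return NULL;

	/*
	 * The address of b->skb does not depend on the value of eop, so a
	 * dependency barrier gives no ordering here.  An explicit read
	 * barrier is needed before the later loads.
	 */
	atomic_thread_fence(memory_order_acquire);	/* role of smp_rmb() */

	if (!atomic_load_explicit(&eop->status, memory_order_relaxed))
		return NULL;

	atomic_store_explicit(&b->next_to_watch, (struct desc *)NULL,
			      memory_order_relaxed);
	return b->skb;
}
```

The point of the sketch is the comment in consume(): the skb load is not data-dependent on the next_to_watch load, which is why a dependency-only barrier does not order the two.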

-Brian


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH] [net-next,v2] ibmvnic: fix dma_mapping_error call

2017-11-16 Thread Desnes Augusto Nunes do Rosário

First of all, I apologize for sending this patch to net-next!

Since this is a fix, it should have been sent to the regular net tree, 
which I'll do now with the proper fixes tag. My mistake!


Thanks for understanding and please discard this one.

On 11/16/2017 04:33 PM, Desnes Augusto Nunes do Rosario wrote:

This patch fixes the dma_mapping_error call to use the correct dma_addr
which is inside the ibmvnic_vpd struct. Moreover, it fixes an
uninitialized-variable warning for the local dma_addr.

Signed-off-by: Desnes A. Nunes do Rosario 
---
  drivers/net/ethernet/ibm/ibmvnic.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 04aaacb..1dc4aef 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -849,7 +849,6 @@ static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
  {
struct device *dev = &adapter->vdev->dev;
union ibmvnic_crq crq;
-   dma_addr_t dma_addr;
int len = 0;

if (adapter->vpd->buff)
@@ -879,7 +878,7 @@ static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
adapter->vpd->dma_addr =
dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len,
   DMA_FROM_DEVICE);
-   if (dma_mapping_error(dev, dma_addr)) {
+   if (dma_mapping_error(dev, adapter->vpd->dma_addr)) {
dev_err(dev, "Could not map VPD buffer\n");
kfree(adapter->vpd->buff);
return -ENOMEM;



--
Desnes Augusto Nunes do Rosário
--

Linux Developer - IBM / Brazil
M.Sc. in Electrical and Computer Engineering - UFRN

(11) 9595-30-900
desn...@br.ibm.com



Re: STRICT_KERNEL_RWX on PPC32 is broken on PowerMac G4

2017-11-16 Thread Meelis Roos
> > For me, 4.13 worked and 4.14 hangs early during boot. Bisecting led to
> > the following commit. I had STRICT_KERNEL_RWX enabled when I met the
> > option. When I disabled STRICT_KERNEL_RWX, the same kernel booted fine.
> 
> > Can you please check that 4.13 boots properly with the 'nobats' kernel parameter?

Yes, 4.13.0 with nobats works.

> 
> Christophe
> 
> > 
> > 
> > 95902e6c8864d39b09134dcaa3c99d8161d1deea is the first bad commit
> > commit 95902e6c8864d39b09134dcaa3c99d8161d1deea
> > Author: Christophe Leroy 
> > Date:   Wed Aug 2 15:51:05 2017 +0200
> > 
> >powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32
> > 
> >This patch implements STRICT_KERNEL_RWX on PPC32.
> > 
> >As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings
> >in order to allow page protection setup at the level of each page.
> > 
> >As BAT/LTLB mappings are deactivated, there might be a performance
> >impact.
> > 
> >Signed-off-by: Christophe Leroy 
> >Signed-off-by: Michael Ellerman 
> > 
> > :04 04 1eac3de57642856e31a914da2e1fe5368095f04b
> > ee3634b9ae309852feebc69b8a6bd473944e212c M  arch
> > 
> > 
> > Config:
> > 
> > #
> > # Automatically generated file; DO NOT EDIT.
> > # Linux/powerpc 4.13.0-rc2 Kernel Configuration
> > #
> > # CONFIG_PPC64 is not set
> > 
> > #
> > # Processor support
> > #
> > CONFIG_PPC_BOOK3S_32=y
> > # CONFIG_PPC_85xx is not set
> > # CONFIG_PPC_8xx is not set
> > # CONFIG_40x is not set
> > # CONFIG_44x is not set
> > # CONFIG_E200 is not set
> > CONFIG_PPC_BOOK3S=y
> > CONFIG_6xx=y
> > CONFIG_PPC_FPU=y
> > CONFIG_ALTIVEC=y
> > CONFIG_PPC_STD_MMU=y
> > CONFIG_PPC_STD_MMU_32=y
> > # CONFIG_PPC_MM_SLICES is not set
> > CONFIG_PPC_HAVE_PMU_SUPPORT=y
> > # CONFIG_FORCE_SMP is not set
> > # CONFIG_SMP is not set
> > # CONFIG_PPC_DOORBELL is not set
> > CONFIG_VDSO32=y
> > CONFIG_CPU_BIG_ENDIAN=y
> > CONFIG_PPC32=y
> > CONFIG_32BIT=y
> > # CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
> > # CONFIG_ARCH_DMA_ADDR_T_64BIT is not set
> > CONFIG_MMU=y
> > CONFIG_ARCH_MMAP_RND_BITS_MAX=17
> > CONFIG_ARCH_MMAP_RND_BITS_MIN=11
> > CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=17
> > CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=11
> > # CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
> > # CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set
> > CONFIG_NR_IRQS=512
> > CONFIG_STACKTRACE_SUPPORT=y
> > CONFIG_TRACE_IRQFLAGS_SUPPORT=y
> > CONFIG_LOCKDEP_SUPPORT=y
> > CONFIG_RWSEM_XCHGADD_ALGORITHM=y
> > CONFIG_GENERIC_HWEIGHT=y
> > CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
> > CONFIG_PPC=y
> > # CONFIG_GENERIC_CSUM is not set
> > CONFIG_EARLY_PRINTK=y
> > CONFIG_PANIC_TIMEOUT=180
> > CONFIG_GENERIC_NVRAM=y
> > CONFIG_SCHED_OMIT_FRAME_POINTER=y
> > CONFIG_ARCH_MAY_HAVE_PC_FDC=y
> > # CONFIG_PPC_UDBG_16550 is not set
> > # CONFIG_GENERIC_TBSYNC is not set
> > CONFIG_AUDIT_ARCH=y
> > CONFIG_GENERIC_BUG=y
> > # CONFIG_EPAPR_BOOT is not set
> > # CONFIG_DEFAULT_UIMAGE is not set
> > CONFIG_ARCH_HIBERNATION_POSSIBLE=y
> > CONFIG_ARCH_SUSPEND_POSSIBLE=y
> > # CONFIG_PPC_DCR_NATIVE is not set
> > # CONFIG_PPC_DCR_MMIO is not set
> > CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> > CONFIG_ARCH_SUPPORTS_UPROBES=y
> > CONFIG_PGTABLE_LEVELS=2
> > CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
> > CONFIG_IRQ_WORK=y
> > CONFIG_BUILDTIME_EXTABLE_SORT=y
> > 
> > #
> > # General setup
> > #
> > CONFIG_BROKEN_ON_SMP=y
> > CONFIG_INIT_ENV_ARG_LIMIT=32
> > CONFIG_CROSS_COMPILE=""
> > # CONFIG_COMPILE_TEST is not set
> > CONFIG_LOCALVERSION=""
> > CONFIG_LOCALVERSION_AUTO=y
> > CONFIG_HAVE_KERNEL_GZIP=y
> > CONFIG_KERNEL_GZIP=y
> > CONFIG_DEFAULT_HOSTNAME="pohl"
> > CONFIG_SWAP=y
> > CONFIG_SYSVIPC=y
> > CONFIG_SYSVIPC_SYSCTL=y
> > CONFIG_POSIX_MQUEUE=y
> > CONFIG_POSIX_MQUEUE_SYSCTL=y
> > CONFIG_CROSS_MEMORY_ATTACH=y
> > CONFIG_FHANDLE=y
> > # CONFIG_USELIB is not set
> > # CONFIG_AUDIT is not set
> > CONFIG_HAVE_ARCH_AUDITSYSCALL=y
> > 
> > #
> > # IRQ subsystem
> > #
> > CONFIG_GENERIC_IRQ_SHOW=y
> > CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
> > CONFIG_IRQ_DOMAIN=y
> > # CONFIG_IRQ_DOMAIN_DEBUG is not set
> > CONFIG_IRQ_FORCED_THREADING=y
> > CONFIG_SPARSE_IRQ=y
> > # CONFIG_GENERIC_IRQ_DEBUGFS is not set
> > CONFIG_GENERIC_TIME_VSYSCALL=y
> > CONFIG_GENERIC_CLOCKEVENTS=y
> > CONFIG_GENERIC_CMOS_UPDATE=y
> > 
> > #
> > # Timers subsystem
> > #
> > CONFIG_TICK_ONESHOT=y
> > CONFIG_NO_HZ_COMMON=y
> > # CONFIG_HZ_PERIODIC is not set
> > CONFIG_NO_HZ_IDLE=y
> > CONFIG_NO_HZ=y
> > CONFIG_HIGH_RES_TIMERS=y
> > 
> > #
> > # CPU/Task time and stats accounting
> > #
> > CONFIG_TICK_CPU_ACCOUNTING=y
> > # CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
> > # CONFIG_IRQ_TIME_ACCOUNTING is not set
> > # CONFIG_BSD_PROCESS_ACCT is not set
> > # CONFIG_TASKSTATS is not set
> > 
> > #
> > # RCU Subsystem
> > #
> > CONFIG_TINY_RCU=y
> > # CONFIG_RCU_EXPERT is not set
> > CONFIG_SRCU=y
> > CONFIG_TINY_SRCU=y
> > # CONFIG_TASKS_RCU is not set
> > # 

Re: STRICT_KERNEL_RWX on PPC32 is broken on PowerMac G4

2017-11-16 Thread LEROY Christophe

Meelis Roos wrote:


For me, 4.13 worked and 4.14 hangs early during boot. Bisecting led to
the following commit. I had STRICT_KERNEL_RWX enabled when I met the
option. When I disabled STRICT_KERNEL_RWX, the same kernel booted fine.


Can you please check that 4.13 boots properly with the 'nobats' kernel parameter?

Christophe




95902e6c8864d39b09134dcaa3c99d8161d1deea is the first bad commit
commit 95902e6c8864d39b09134dcaa3c99d8161d1deea
Author: Christophe Leroy 
Date:   Wed Aug 2 15:51:05 2017 +0200

powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32

This patch implements STRICT_KERNEL_RWX on PPC32.

As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings
in order to allow page protection setup at the level of each page.

As BAT/LTLB mappings are deactivated, there might be a performance
impact.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 

:04 04 1eac3de57642856e31a914da2e1fe5368095f04b  
ee3634b9ae309852feebc69b8a6bd473944e212c M  arch



Config:

#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.13.0-rc2 Kernel Configuration
#
# CONFIG_PPC64 is not set

#
# Processor support
#
CONFIG_PPC_BOOK3S_32=y
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_E200 is not set
CONFIG_PPC_BOOK3S=y
CONFIG_6xx=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
# CONFIG_PPC_MM_SLICES is not set
CONFIG_PPC_HAVE_PMU_SUPPORT=y
# CONFIG_FORCE_SMP is not set
# CONFIG_SMP is not set
# CONFIG_PPC_DOORBELL is not set
CONFIG_VDSO32=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_PPC32=y
CONFIG_32BIT=y
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
# CONFIG_ARCH_DMA_ADDR_T_64BIT is not set
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_BITS_MIN=11
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=11
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
# CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
CONFIG_PPC=y
# CONFIG_GENERIC_CSUM is not set
CONFIG_EARLY_PRINTK=y
CONFIG_PANIC_TIMEOUT=180
CONFIG_GENERIC_NVRAM=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_PPC_UDBG_16550 is not set
# CONFIG_GENERIC_TBSYNC is not set
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_EPAPR_BOOT is not set
# CONFIG_DEFAULT_UIMAGE is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_KERNEL_GZIP=y
CONFIG_DEFAULT_HOSTNAME="pohl"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
# CONFIG_TASKS_RCU is not set
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_RCU_NEED_SEGCBLIST is not set
# CONFIG_BUILD_BIN2C is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_BPF is not set
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_SOCK_CGROUP_DATA is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# 

Re: [PATCH 1/2] ASoC: fsl_asrc: Fix line continuation format

2017-11-16 Thread Nicolin Chen
On Thu, Nov 16, 2017 at 08:25:46AM -0800, Joe Perches wrote:
> Line continuations with excess spacing cause unexpected output
> 
> Signed-off-by: Joe Perches 

Acked-by: Nicolin Chen 

> ---
>  sound/soc/fsl/fsl_asrc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
> index 806d39927318..ed683fe8b94a 100644
> --- a/sound/soc/fsl/fsl_asrc.c
> +++ b/sound/soc/fsl/fsl_asrc.c
> @@ -288,8 +288,8 @@ static int fsl_asrc_config_pair(struct fsl_asrc_pair 
> *pair)
>  
>   if ((outrate > 8000 && outrate < 3) &&
>   (outrate/inrate > 24 || inrate/outrate > 8)) {
> - pair_err("exceed supported ratio range [1/24, 8] for \
> - inrate/outrate: %d/%d\n", inrate, outrate);
> + pair_err("exceed supported ratio range [1/24, 8] for 
> inrate/outrate: %d/%d\n",
> +  inrate, outrate);
>   return -EINVAL;
>   }
>  
> -- 
> 2.15.0
> 


Re: [PATCH] [net-next] ibmvnic: This patch fixes the dma_mapping_error call to use the correct dma_addr which is inside the ibmvnic_vpd struct.

2017-11-16 Thread Desnes Augusto Nunes do Rosário

Version 2 of this patch has been already sent with correct styling.

On 11/16/2017 04:28 PM, Desnes Augusto Nunes do Rosario wrote:

Signed-off-by: Desnes A. Nunes do Rosario 
---
  drivers/net/ethernet/ibm/ibmvnic.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 04aaacb..1dc4aef 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -849,7 +849,6 @@ static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
  {
struct device *dev = &adapter->vdev->dev;
union ibmvnic_crq crq;
-   dma_addr_t dma_addr;
int len = 0;

if (adapter->vpd->buff)
@@ -879,7 +878,7 @@ static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
adapter->vpd->dma_addr =
dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len,
   DMA_FROM_DEVICE);
-   if (dma_mapping_error(dev, dma_addr)) {
+   if (dma_mapping_error(dev, adapter->vpd->dma_addr)) {
dev_err(dev, "Could not map VPD buffer\n");
kfree(adapter->vpd->buff);
return -ENOMEM;



--
Desnes Augusto Nunes do Rosário
--

Linux Developer - IBM / Brazil
M.Sc. in Electrical and Computer Engineering - UFRN

(11) 9595-30-900
desn...@br.ibm.com



[PATCH] [net-next,v2] ibmvnic: fix dma_mapping_error call

2017-11-16 Thread Desnes Augusto Nunes do Rosario
This patch fixes the dma_mapping_error call to use the correct dma_addr
which is inside the ibmvnic_vpd struct. Moreover, it fixes an
uninitialized-variable warning for the local dma_addr.
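
The bug class is easy to reproduce outside the kernel. Below is a minimal user-space sketch, assuming a made-up mapping function and error cookie (fake_map(), fake_mapping_error() and FAKE_DMA_ERROR are hypothetical and are not the DMA API): the result is stored in the struct field, so the error check has to test that same field rather than an unrelated local.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t fake_dma_addr_t;
#define FAKE_DMA_ERROR UINT64_MAX	/* hypothetical error cookie */

struct vpd {
	void *buff;
	size_t len;
	fake_dma_addr_t dma_addr;
};

/* Hypothetical mapper: fails for an empty buffer, else returns a cookie. */
static fake_dma_addr_t fake_map(void *buf, size_t len)
{
	return (buf && len) ? (fake_dma_addr_t)(uintptr_t)buf : FAKE_DMA_ERROR;
}

static int fake_mapping_error(fake_dma_addr_t addr)
{
	return addr == FAKE_DMA_ERROR;
}

/* Returns 0 on success, -1 if the mapping failed. */
static int map_vpd(struct vpd *v)
{
	assert(v);
	v->dma_addr = fake_map(v->buff, v->len);

	/* Test the value that was just stored, not a separate local. */
	if (fake_mapping_error(v->dma_addr))
		return -1;
	return 0;
}
```

Checking an uninitialized local instead would make the error path depend on whatever happened to be on the stack, which is what the compiler warning was pointing at.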

Signed-off-by: Desnes A. Nunes do Rosario 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 04aaacb..1dc4aef 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -849,7 +849,6 @@ static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
 {
struct device *dev = &adapter->vdev->dev;
union ibmvnic_crq crq;
-   dma_addr_t dma_addr;
int len = 0;
 
if (adapter->vpd->buff)
@@ -879,7 +878,7 @@ static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
adapter->vpd->dma_addr =
dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len,
   DMA_FROM_DEVICE);
-   if (dma_mapping_error(dev, dma_addr)) {
+   if (dma_mapping_error(dev, adapter->vpd->dma_addr)) {
dev_err(dev, "Could not map VPD buffer\n");
kfree(adapter->vpd->buff);
return -ENOMEM;
-- 
2.9.5
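
Editor's sketch (not part of the patch): the bug pattern can be
reproduced in user space. Nothing below is ibmvnic or kernel code;
fake_dma_map(), fake_dma_mapping_error(), FAKE_DMA_ERROR and struct vpd
are hypothetical stand-ins for the DMA API, used only to illustrate why
the error check has to test the handle that was actually stored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for dma_addr_t and the DMA API (not kernel code). */
typedef uint64_t fake_dma_addr_t;
#define FAKE_DMA_ERROR ((fake_dma_addr_t)~0ULL)

struct vpd {
	unsigned char *buff;
	size_t len;
	fake_dma_addr_t dma_addr;
};

/* Pretend mapping: fails for a NULL or empty buffer. */
static fake_dma_addr_t fake_dma_map(const void *buf, size_t len)
{
	if (!buf || len == 0)
		return FAKE_DMA_ERROR;
	return (fake_dma_addr_t)(uintptr_t)buf;
}

static int fake_dma_mapping_error(fake_dma_addr_t addr)
{
	return addr == FAKE_DMA_ERROR;
}

/* Returns 0 on success, -1 if the mapping failed. */
static int map_vpd(struct vpd *vpd)
{
	vpd->dma_addr = fake_dma_map(vpd->buff, vpd->len);

	/* Check the handle that was just stored, not an unrelated local. */
	if (fake_dma_mapping_error(vpd->dma_addr))
		return -1;
	return 0;
}
```

Testing a separate, never-assigned local instead of vpd->dma_addr would
read an indeterminate value, which is the situation the compiler warning
pointed at.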



Re: RESEND [PATCH V3 3/4] hotplug/drc-info: Add code to search ibm,drc-info property

2017-11-16 Thread Michael Bringmann

>> +
>> +static int rpaphp_check_drc_props_v2(struct device_node *dn, char *drc_name,
>> +char *drc_type, unsigned int my_index)
>> +{
>> +struct property *info;
>> +unsigned int entries;
>> +struct of_drc_info drc;
>> +void *value;
> 
> This should be __be32 *

Okay.

> 
>> +int j;
>> +
>> +info = of_find_property(dn->parent, "ibm,drc-info", NULL);
>> +if (info == NULL)
>> +return -EINVAL;
>> +
>> +value = info->value;
>> +value = (void *)of_prop_next_u32(info, value, &entries);
>> +if (!value)
>> +return -EINVAL;
>> +
>> +for (j = 0; j < entries; j++) {
>> +of_one_drc_info(&info, &value, &drc);
>> +
>> +/* Should now know end of current entry */
>> +
>> +WARN_ON((my_index < drc.drc_index_start) ||
>> +(((my_index-drc.drc_index_start)%
>> +drc.sequential_inc) != 0));
>> +
>> +if (my_index > drc.last_drc_index)
>> +continue;
>> +
>> +break;
>> +}
>> +/* Found it */
>> +
>> +if (((drc_name == NULL) ||
>> + (drc_name && !strncmp(drc_name,
>> +drc.drc_name_prefix,
>> +strlen(drc.drc_name_prefix &&
> 
> Shouldn't we be doing a string compare on the entire name, not just the 
> prefix?
> 
> If I remember correctly the prefix is the same for every cpu.

The prefix may be a value like "CPU", "MEM", "PHB", or other.
I modeled the comparisons using 'drc_name_prefix' after the comparison
of the 'name_tmp' found in the array 'ibm,drc-names' and 'type_tmp'
found in the array 'ibm,drc-types'.  This is modeled in the new
function 'rpaphp_check_drc_props_v1' which was lifted from the original
function rpaphp_get_drc_props().
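
Editor's sketch: the difference between the two comparisons under
discussion can be shown in plain C. The helper names matches_prefix()
and matches_full() and the sample strings used with them are
hypothetical; the real DRC name format is whatever firmware provides.

```c
#include <string.h>

/* Prefix-only match: true when 'name' begins with 'prefix'. */
static int matches_prefix(const char *name, const char *prefix)
{
	return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* Full match: true only when the two strings are identical. */
static int matches_full(const char *name, const char *full)
{
	return strcmp(name, full) == 0;
}
```

A prefix check accepts every name that shares the prefix, so on its own
it cannot tell two entries of the same range apart; in the quoted code
that job falls to the my_index range check.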

> 
> -Nathan
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



[PATCH] [net-next] ibmvnic: This patch fixes the dma_mapping_error call to use the correct dma_addr which is inside the ibmvnic_vpd struct.

2017-11-16 Thread Desnes Augusto Nunes do Rosario
Signed-off-by: Desnes A. Nunes do Rosario 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 04aaacb..1dc4aef 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -849,7 +849,6 @@ static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
 {
struct device *dev = &adapter->vdev->dev;
union ibmvnic_crq crq;
-   dma_addr_t dma_addr;
int len = 0;
 
if (adapter->vpd->buff)
@@ -879,7 +878,7 @@ static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
adapter->vpd->dma_addr =
dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len,
   DMA_FROM_DEVICE);
-   if (dma_mapping_error(dev, dma_addr)) {
+   if (dma_mapping_error(dev, adapter->vpd->dma_addr)) {
dev_err(dev, "Could not map VPD buffer\n");
kfree(adapter->vpd->buff);
return -ENOMEM;
-- 
2.9.5



[PATCH V2 3/3] postmigration/memory: Associativity & ibm,dynamic-memory-v2

2017-11-16 Thread Michael Bringmann
postmigration/memory: Now apply changes to the associativity of memory
blocks described by the 'ibm,dynamic-memory-v2' property regarding
the topology of LPARS in Post Migration events.

* Extend the previous work done for the 'ibm,associativity-lookup-array'
  to apply to either property 'ibm,dynamic-memory' or
  'ibm,dynamic-memory-v2', whichever is present.
* Add new code to parse the 'ibm,dynamic-memory-v2' property looking
  for differences in block 'assignment', associativity indexes per
  block, and any other difference currently known.

When block differences are recognized, the memory block may be removed,
added, or updated depending upon the state of the new device tree
property and differences from the migrated value of the property.

Signed-off-by: Michael Bringmann 
---
Changes in V2:
  -- Remove unnecessary spacing changes from patch.
  -- Improve patch description.
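
Editor's sketch: each 'ibm,dynamic-memory-v2' entry describes a run of
sequential LMBs, so finding one LMB means finding the run that covers
its DRC index and adding an offset to the run's base address. The code
below is a user-space illustration only. struct lmb_range and
find_lmb_base() are hypothetical names, values are host-endian (no
be32_to_cpu), and, like the patch, it assumes DRC indexes increase by
one within a run.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-endian stand-in for one ibm,dynamic-memory-v2 entry (hypothetical). */
struct lmb_range {
	uint32_t num_seq_lmbs;
	uint64_t base_address;
	uint32_t drc_index;
};

/*
 * Look up drc_index in an array of ranges. On success store the LMB's
 * base address in *base_addr and return 0; return -1 if no range covers it.
 */
static int find_lmb_base(const struct lmb_range *r, size_t n,
			 uint32_t drc_index, uint64_t memblock_size,
			 uint64_t *base_addr)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t first = r[i].drc_index;

		if (drc_index < first)
			continue;
		if (drc_index - first < r[i].num_seq_lmbs) {
			*base_addr = r[i].base_address +
				     (uint64_t)(drc_index - first) * memblock_size;
			return 0;
		}
	}
	return -1;
}
```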
---
 arch/powerpc/include/asm/prom.h |   12 ++
 arch/powerpc/platforms/pseries/hotplug-memory.c |  169 ++-
 2 files changed, 172 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 825bd59..e16ef0f 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -92,6 +92,18 @@ struct of_drconf_cell {
u32 flags;
 };
 
+/* The of_drconf_cell_v2 struct defines the layout of the LMB array
+ * specified in the device tree property
+ * ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory-v2
+ */
+struct of_drconf_cell_v2 {
+   u32 num_seq_lmbs;
+   u64 base_address;
+   u32 drc_index;
+   u32 aa_index;
+   u32 flags;
+} __attribute__((packed));
+
 #define DRCONF_MEM_ASSIGNED0x0008
 #define DRCONF_MEM_AI_INVALID  0x0040
 #define DRCONF_MEM_RESERVED0x0080
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index b37e6ad..bf9687b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -1171,14 +1171,111 @@ static int pseries_update_drconf_memory(struct 
of_reconfig_data *pr)
return rc;
 }
 
+static inline int pseries_memory_v2_find_drc(u32 drc_index,
+   u64 *base_addr, unsigned long memblock_size,
+   struct of_drconf_cell_v2 **drmem,
+   struct of_drconf_cell_v2 *last_drmem)
+{
+   struct of_drconf_cell_v2 *dm = (*drmem);
+
+   while (dm < last_drmem) {
+   if ((be32_to_cpu(dm->drc_index) <= drc_index) &&
+   (drc_index <= (be32_to_cpu(dm->drc_index)+
+   be32_to_cpu(dm->num_seq_lmbs)-1))) {
+   int offset = drc_index - be32_to_cpu(dm->drc_index);
+   (*base_addr) = be64_to_cpu(dm->base_address) +
+   (offset * memblock_size);
+   break;
+   } else if (drc_index > (be32_to_cpu(dm->drc_index)+
+   be32_to_cpu(dm->num_seq_lmbs)-1)) {
+   dm++;
+   (*drmem) = dm;
+   } else if (be32_to_cpu(dm->drc_index) > drc_index) {
+   return -1;
+   }
+   }
+
+   return 0;
+}
+
+static int pseries_update_drconf_memory_v2(struct of_reconfig_data *pr)
+{
+   struct of_drconf_cell_v2 *new_drmem, *old_drmem, *last_old_drmem;
+   unsigned long memblock_size;
+   u32 new_entries, old_entries;
+   u64 old_base_addr;
+   __be32 *p;
+   int i, rc = 0;
+
+   if (rtas_hp_event)
+   return 0;
+
+   memblock_size = pseries_memory_block_size();
+   if (!memblock_size)
+   return -EINVAL;
+
+   /* The first int of the property is the number of lmb's
+* described by the property. This is followed by an array
+* of of_drconf_cell_v2 entries. Get the number of entries
+* and skip to the array of of_drconf_cell_v2's.
+*/
+   p = (__be32 *) pr->old_prop->value;
+   if (!p)
+   return -EINVAL;
+   old_entries = be32_to_cpu(*p++);
+   old_drmem = (struct of_drconf_cell_v2 *)p;
+   last_old_drmem = old_drmem +
+   (sizeof(struct of_drconf_cell_v2) * old_entries);
+
+   p = (__be32 *)pr->prop->value;
+   new_entries = be32_to_cpu(*p++);
+   new_drmem = (struct of_drconf_cell_v2 *)p;
+
+   for (i = 0; i < new_entries; i++) {
+   int j;
+   u32 new_drc_index = be32_to_cpu(new_drmem->drc_index);
+
+   for (j = 0; j < new_drmem->num_seq_lmbs; j++) {
+   if (!pseries_memory_v2_find_drc(new_drc_index+j,
+   &old_base_addr,
+   memblock_size,
+ 

[PATCH V2 2/3] postmigration/memory: Review assoc lookup array changes

2017-11-16 Thread Michael Bringmann
postmigration/memory: In an LPAR migration scenario, the property
"ibm,associativity-lookup-arrays" may change.  In the event that a
row of the array differs, locate all assigned memory blocks with that
'aa_index' and 're-add' them to the system memory block data structures.
In the process of the 're-add', the appropriate entry of the property
'ibm,dynamic-memory' would be updated as well as any other applicable
system data structures.

Signed-off-by: Michael Bringmann 
---
Changes in V2:
  -- Remove unnecessary spacing changes
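
Editor's sketch: the row-by-row comparison described above, reduced to
plain C. changed_rows() is a hypothetical helper, the arrays are
host-endian copies rather than device tree properties, and both arrays
are assumed to have the same row width.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare the first 'rows' rows of two lookup arrays whose rows are
 * 'row_len' cells wide. Writes the index of every differing row into
 * 'changed' and returns how many were found.
 */
static size_t changed_rows(const uint32_t *old_a, const uint32_t *new_a,
			   size_t rows, size_t row_len, size_t *changed)
{
	size_t count = 0;

	for (size_t i = 0; i < rows; i++) {
		const uint32_t *o = old_a + i * row_len;
		const uint32_t *n = new_a + i * row_len;

		/* memcmp() takes a length in bytes, not in cells. */
		if (memcmp(o, n, row_len * sizeof(*o)) != 0)
			changed[count++] = i;
	}
	return count;
}
```

Each index reported this way corresponds to an aa_index whose assigned
memory blocks would then be candidates for a 're-add'.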
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |  119 +++
 1 file changed, 119 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c61cfc6..b37e6ad 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -1171,6 +1171,122 @@ static int pseries_update_drconf_memory(struct 
of_reconfig_data *pr)
return rc;
 }
 
+struct assoc_arrays {
+   u32 n_arrays;
+   u32 array_sz;
+   const __be32 *arrays;
+};
+
+static int pseries_update_ala_memory_aai(int aa_index,
+   struct property *dmprop)
+{
+   struct of_drconf_cell *drmem;
+   u32 entries;
+   __be32 *p;
+   int i;
+
+   p = (__be32 *) dmprop->value;
+   if (!p)
+   return -EINVAL;
+
+   /* The first int of the property is the number of lmb's
+* described by the property. This is followed by an array
+* of of_drconf_cell entries. Get the number of entries
+* and skip to the array of of_drconf_cell's.
+*/
+   entries = be32_to_cpu(*p++);
+   drmem = (struct of_drconf_cell *)p;
+
+   for (i = 0; i < entries; i++) {
+   if ((be32_to_cpu(drmem[i].aa_index) != aa_index) &&
+   (be32_to_cpu(drmem[i].flags) & DRCONF_MEM_ASSIGNED)) {
+   pseries_memory_readd_by_index(
+   be32_to_cpu(drmem[i].drc_index));
+   }
+   }
+
+   return 0;
+}
+
+static int pseries_update_ala_memory(struct of_reconfig_data *pr)
+{
+   struct assoc_arrays new_ala, old_ala;
+   struct device_node *dn;
+   struct property *dmprop;
+   __be32 *p;
+   int i, lim;
+
+   if (rtas_hp_event)
+   return 0;
+
+   dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+   if (!dn)
+   return -ENODEV;
+
+   dmprop = of_find_property(dn, "ibm,dynamic-memory", NULL);
+   if (!dmprop) {
+   of_node_put(dn);
+   return -ENODEV;
+   }
+
+   /*
+* The layout of the ibm,associativity-lookup-arrays
+* property is a number N indicating the number of
+* associativity arrays, followed by a number M
+* indicating the size of each associativity array,
+* followed by a list of N associativity arrays.
+*/
+
+   p = (__be32 *) pr->old_prop->value;
+   if (!p) {
+   of_node_put(dn);
+   return -EINVAL;
+   }
+   old_ala.n_arrays = of_read_number(p++, 1);
+   old_ala.array_sz = of_read_number(p++, 1);
+   old_ala.arrays = p;
+
+   p = (__be32 *) pr->prop->value;
+   if (!p) {
+   of_node_put(dn);
+   return -EINVAL;
+   }
+   new_ala.n_arrays = of_read_number(p++, 1);
+   new_ala.array_sz = of_read_number(p++, 1);
+   new_ala.arrays = p;
+
+   lim = (new_ala.n_arrays > old_ala.n_arrays) ? old_ala.n_arrays :
+   new_ala.n_arrays;
+
+   if (old_ala.array_sz == new_ala.array_sz) {
+
+   for (i = 0; i < lim; i++) {
+   int index = (i * new_ala.array_sz);
+
+   if (!memcmp(&old_ala.arrays[index],
+   &new_ala.arrays[index],
+   new_ala.array_sz))
+   continue;
+
+   pseries_update_ala_memory_aai(i, dmprop);
+   }
+
+   for (i = lim; i < new_ala.n_arrays; i++)
+   pseries_update_ala_memory_aai(i, dmprop);
+
+   } else {
+   /* Update all entries representing these rows;
+* as all rows have different sizes, none can
+* have equivalent values.
+*/
+   for (i = 0; i < lim; i++)
+   pseries_update_ala_memory_aai(i, dmprop);
+   }
+
+   of_node_put(dn);
+   return 0;
+}
+
 static int pseries_memory_notifier(struct notifier_block *nb,
   unsigned long action, void *data)
 {
@@ -1187,6 +1303,9 @@ static int pseries_memory_notifier(struct notifier_block 
*nb,
case OF_RECONFIG_UPDATE_PROPERTY:
if (!strcmp(rd->prop->name, 

[PATCH V2 1/3] hotplug/mobility: Apply assoc updates for Post Migration Topo

2017-11-16 Thread Michael Bringmann
hotplug/mobility: Recognize more changes to the associativity of
memory blocks described by the 'ibm,dynamic-memory' and 'cpu'
properties when processing the topology of LPARS in Post Migration
events.  Previous efforts only recognized whether a memory block's
assignment had changed in the property.  Changes here include:

* Checking the aa_index values of the old/new properties and 'readd'
  any block for which the setting has changed.
* Checking for changes in cpus and submitting 'readd' ops for them.
* Creating some common support routines for the submission of memory
  or cpu 'readd' operations.

Signed-off-by: Michael Bringmann 
---
Changes in V2:
  -- Try to improve patch header documentation.
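
Editor's sketch: the "has the associativity changed?" test that decides
whether a CPU needs a 'readd', written as stand-alone C. assoc_changed()
is a hypothetical helper working on host-endian copies of the property;
it follows the layout stated in the patch comment (a count followed by
that many values) and is not the patch's own code.

```c
#include <stdint.h>
#include <string.h>

/*
 * 'prop' points at a host-endian copy of an associativity property:
 * prop[0] is the number of entries, followed by that many values.
 * Returns 1 when the two properties differ, 0 when they are identical.
 */
static int assoc_changed(const uint32_t *old_prop, const uint32_t *new_prop)
{
	uint32_t old_n = old_prop[0];
	uint32_t new_n = new_prop[0];

	if (old_n != new_n)
		return 1;
	return memcmp(old_prop + 1, new_prop + 1,
		      old_n * sizeof(uint32_t)) != 0;
}
```

Under this reading a 'readd' would be submitted only when the function
returns 1.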
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c|   64 +++
 arch/powerpc/platforms/pseries/hotplug-memory.c |6 ++
 arch/powerpc/platforms/pseries/mobility.c   |   47 +
 arch/powerpc/platforms/pseries/pseries.h|2 +
 4 files changed, 109 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index fadb95e..d127c3a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -634,6 +634,27 @@ static int dlpar_cpu_remove_by_index(u32 drc_index)
return rc;
 }
 
+static int dlpar_cpu_readd_by_index(u32 drc_index)
+{
+   int rc = 0;
+
+   pr_info("Attempting to update CPU, drc index %x\n", drc_index);
+
+   if (dlpar_cpu_remove_by_index(drc_index))
+   rc = -EINVAL;
+   else if (dlpar_cpu_add(drc_index))
+   rc = -EINVAL;
+
+   if (rc)
+   pr_info("Failed to update cpu at drc_index %lx\n",
+   (unsigned long int)drc_index);
+   else
+   pr_info("CPU at drc_index %lx was updated\n",
+   (unsigned long int)drc_index);
+
+   return rc;
+}
+
 static int find_dlpar_cpus_to_remove(u32 *cpu_drcs, int cpus_to_remove)
 {
struct device_node *dn;
@@ -824,6 +845,9 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
else
rc = -EINVAL;
break;
+   case PSERIES_HP_ELOG_ACTION_READD:
+   rc = dlpar_cpu_readd_by_index(drc_index);
+   break;
default:
pr_err("Invalid action (%d) specified\n", hp_elog->action);
rc = -EINVAL;
@@ -874,6 +898,42 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t 
count)
 
 #endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
+static int pseries_update_drconf_cpu(struct of_reconfig_data *pr)
+{
+   u32 old_entries, new_entries;
+   __be32 *p, *old_assoc, *new_assoc;
+
+   if (strcmp(pr->dn->type, "cpu"))
+   return 0;
+
+   /* The first int of the property is the number of domains
+* described.  This is followed by an array of level values.
+*/
+   p = (__be32 *) pr->old_prop->value;
+   if (!p)
+   return -EINVAL;
+   old_entries = be32_to_cpu(*p++);
+   old_assoc = p;
+
+   p = (__be32 *)pr->prop->value;
+   if (!p)
+   return -EINVAL;
+   new_entries = be32_to_cpu(*p++);
+   new_assoc = p;
+
+   if (old_entries == new_entries) {
+   int sz = old_entries * sizeof(int);
+
+   if (!memcmp(old_assoc, new_assoc, sz))
+   pseries_cpu_readd_by_index(pr->dn->phandle);
+
+   } else {
+   pseries_cpu_readd_by_index(pr->dn->phandle);
+   }
+
+   return 0;
+}
+
 static int pseries_smp_notifier(struct notifier_block *nb,
unsigned long action, void *data)
 {
@@ -887,6 +947,10 @@ static int pseries_smp_notifier(struct notifier_block *nb,
case OF_RECONFIG_DETACH_NODE:
pseries_remove_processor(rd->dn);
break;
+   case OF_RECONFIG_UPDATE_PROPERTY:
+   if (!strcmp(rd->prop->name, "ibm,associativity"))
+   err = pseries_update_drconf_cpu(rd);
+   break;
}
return notifier_from_errno(err);
 }
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 1d48ab4..c61cfc6 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -1160,6 +1160,12 @@ static int pseries_update_drconf_memory(struct 
of_reconfig_data *pr)
  memblock_size);
rc = (rc < 0) ? -EINVAL : 0;
break;
+   } else if ((be32_to_cpu(old_drmem[i].aa_index) !=
+   be32_to_cpu(new_drmem[i].aa_index)) &&
+   (be32_to_cpu(new_drmem[i].flags) &
+   DRCONF_MEM_ASSIGNED)) 

Re: RESEND [PATCH V3 3/4] hotplug/drc-info: Add code to search ibm,drc-info property

2017-11-16 Thread Nathan Fontenot


On 11/15/2017 12:09 PM, Michael Bringmann wrote:
> rpadlpar_core.c: Provide parallel routines to search the older device-
> tree properties ("ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
> and "ibm,drc-power-domains"), or the new property "ibm,drc-info".
> 
> The interface to examine the DRC information is changed from a "get"
> function that returns values for local verification elsewhere, to a
> "check" function that validates the 'name' and/or 'type' of a device
> node.  This update hides the format of the underlying device-tree
> properties, and concentrates the value checks into a single function
> without requiring the user to verify whether a search was successful.
> 
> Signed-off-by: Michael Bringmann 
> ---
> Changes in V3:
>   -- Now passing more values by structure reducing use of local
>  declarations / initialization.
>   -- Improve some code spacing for better clarity.
> ---
>  drivers/pci/hotplug/rpadlpar_core.c |   13 ++--
>  drivers/pci/hotplug/rpaphp.h|4 +
>  drivers/pci/hotplug/rpaphp_core.c   |  110 
> +++
>  3 files changed, 92 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> b/drivers/pci/hotplug/rpadlpar_core.c
> index a3449d7..fc01d7d 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "../pci.h"
>  #include "rpaphp.h"
> @@ -44,15 +45,14 @@ static struct device_node *find_vio_slot_node(char 
> *drc_name)
>  {
>   struct device_node *parent = of_find_node_by_name(NULL, "vdevice");
>   struct device_node *dn = NULL;
> - char *name;
>   int rc;
> 
>   if (!parent)
>   return NULL;
> 
>   while ((dn = of_get_next_child(parent, dn))) {
> - rc = rpaphp_get_drc_props(dn, NULL, &name, NULL, NULL);
> - if ((rc == 0) && (!strcmp(drc_name, name)))
> + rc = rpaphp_check_drc_props(dn, drc_name, NULL);
> + if (rc == 0)
>   break;
>   }
> 
> @@ -64,15 +64,12 @@ static struct device_node *find_php_slot_pci_node(char 
> *drc_name,
> char *drc_type)
>  {
>   struct device_node *np = NULL;
> - char *name;
> - char *type;
>   int rc;
> 
>   while ((np = of_find_node_by_name(np, "pci"))) {
> - rc = rpaphp_get_drc_props(np, NULL, &name, &type, NULL);
> + rc = rpaphp_check_drc_props(np, drc_name, drc_type);
>   if (rc == 0)
> - if (!strcmp(drc_name, name) && !strcmp(drc_type, type))
> - break;
> + break;
>   }
> 
>   return np;
> diff --git a/drivers/pci/hotplug/rpaphp.h b/drivers/pci/hotplug/rpaphp.h
> index 7db024e..8db5f2e 100644
> --- a/drivers/pci/hotplug/rpaphp.h
> +++ b/drivers/pci/hotplug/rpaphp.h
> @@ -91,8 +91,8 @@ struct slot {
> 
>  /* rpaphp_core.c */
>  int rpaphp_add_slot(struct device_node *dn);
> -int rpaphp_get_drc_props(struct device_node *dn, int *drc_index,
> - char **drc_name, char **drc_type, int *drc_power_domain);
> +int rpaphp_check_drc_props(struct device_node *dn, char *drc_name,
> + char *drc_type);
> 
>  /* rpaphp_slot.c */
>  void dealloc_slot_struct(struct slot *slot);
> diff --git a/drivers/pci/hotplug/rpaphp_core.c 
> b/drivers/pci/hotplug/rpaphp_core.c
> index 1e29aba..6606175 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include/* for eeh_add_device() */
>  #include /* rtas_call */
>  #include   /* for pci_controller */
> @@ -196,25 +197,21 @@ static int get_children_props(struct device_node *dn, 
> const int **drc_indexes,
>   return 0;
>  }
> 
> -/* To get the DRC props describing the current node, first obtain it's
> - * my-drc-index property.  Next obtain the DRC list from it's parent.  Use
> - * the my-drc-index for correlation, and obtain the requested properties.
> +
> +/* Verify the existence of 'drc_name' and/or 'drc_type' within the
> + * current node.  First obtain it's my-drc-index property.  Next,
> + * obtain the DRC info from it's parent.  Use the my-drc-index for
> + * correlation, and obtain/validate the requested properties.
>   */
> -int rpaphp_get_drc_props(struct device_node *dn, int *drc_index,
> - char **drc_name, char **drc_type, int *drc_power_domain)
> +
> +static int rpaphp_check_drc_props_v1(struct device_node *dn, char *drc_name,
> + char *drc_type, unsigned int my_index)
>  {
> + char *name_tmp, *type_tmp;
>   const int *indexes, *names;
>   const int *types, *domains;
> - const unsigned int *my_index;
> - char *name_tmp, *type_tmp;
>   int i, rc;
> 
> - my_index = of_get_property(dn, 

[PATCH V2 0/3] powerpc/hotplug: Fix affinity assoc for LPAR migration

2017-11-16 Thread Michael Bringmann
The migration of LPARs across Power systems affects many attributes,
including the associativity of memory blocks and CPUs.  The
patches in this set execute when a system is coming up fresh on a
migration target.  They are intended to:

* Recognize changes to the associativity of memory and CPUs recorded
  in internal data structures when compared to the latest copies in
  the device tree (e.g. ibm,dynamic-memory, ibm,dynamic-memory-v2,
  cpus),
* Recognize changes to the associativity mapping (e.g. ibm,
  associativity-lookup-arrays), locate all assigned memory blocks
  corresponding to each changed row, and readd all such blocks.
* Generate calls to other code layers to reset the data structures
  related to associativity of the CPUs and memory.
* Re-register the 'changed' entities into the target system.
  Re-registration of CPUs and memory blocks mostly entails acting as
  if they have been newly hot-added into the target system.

Signed-off-by: Michael Bringmann 

Michael Bringmann (3):
  hotplug/mobility: Apply assoc lookup updates for Post Migration Topo
  postmigration/memory: Review assoc lookup array changes
  postmigration/memory: Associativity & 'ibm,dynamic-memory-v2'
---
Changes in V2:
  -- Try to improve patch header documentation.
  -- Remove unnecessary spacing changes from patch



[PATCH v4.2] powerpc/modules: Don't try to restore r2 after a sibling call

2017-11-16 Thread Josh Poimboeuf
On Thu, Nov 16, 2017 at 06:39:03PM +0530, Naveen N. Rao wrote:
> Josh Poimboeuf wrote:
> > On Wed, Nov 15, 2017 at 02:58:33PM +0530, Naveen N. Rao wrote:
> > > > +int instr_is_link_branch(unsigned int instr)
> > > > +{
> > > > +   return (instr_is_branch_iform(instr) || 
> > > > instr_is_branch_bform(instr)) &&
> > > > +  (instr & BRANCH_SET_LINK);
> > > > +}
> > > > +
> > > 
> > > Nitpicking here, but since we're not considering the other branch forms,
> > > perhaps this can be renamed to instr_is_link_relative_branch() (or maybe
> > > instr_is_relative_branch_link()), just so we're clear :)
> > 
> > My understanding is that the absolute/relative bit isn't a "form", but
> > rather a bit that can be set for either the b-form (conditional) or the
> > i-form (unconditional).  And the above function isn't checking the
> > absolute bit, so it isn't necessarily a relative branch.  Or did I miss
> > something?
> 
> Ah, good point. I was coming from the fact that we are only considering the
> i-form and b-form branches and not the lr/ctr/tar based branches, which are
> always absolute branches, but can also set the link register.

Hm, RISC is more complicated than I realized ;-)

> Thinking about this more, aren't we only interested in relative branches
> here (for relocations), so can we actually filter out the absolute branches?
> Something like this?
> 
> int instr_is_relative_branch_link(unsigned int instr)
> {
>   return ((instr_is_branch_iform(instr) || instr_is_branch_bform(instr)) 
> &&
>  !(instr & BRANCH_ABSOLUTE) && (instr & BRANCH_SET_LINK));

Yeah, makes sense to me.  Here's another try (also untested).  If this
looks ok, Kamalesh would you mind testing again?

8<

From: Josh Poimboeuf 
Subject: [PATCH v4.2] powerpc/modules: Don't try to restore r2 after a sibling 
call

When attempting to load a livepatch module, I got the following error:

  module_64: patch_module: Expect noop after relocate, got 3c82

The error was triggered by the following code in
unregister_netdevice_queue():

  14c:   00 00 00 48 b   14c 
 14c: R_PPC64_REL24  net_set_todo
  150:   00 00 82 3c addis   r4,r2,0

GCC didn't insert a nop after the branch to net_set_todo() because it's
a sibling call, so it never returns.  The nop isn't needed after the
branch in that case.

Signed-off-by: Josh Poimboeuf 
---
 arch/powerpc/include/asm/code-patching.h |  1 +
 arch/powerpc/kernel/module_64.c  | 12 +++-
 arch/powerpc/lib/code-patching.c |  5 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index abef812de7f8..2c895e8d07f7 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -33,6 +33,7 @@ int patch_branch(unsigned int *addr, unsigned long target, 
int flags);
 int patch_instruction(unsigned int *addr, unsigned int instr);
 
 int instr_is_relative_branch(unsigned int instr);
+int instr_is_relative_link_branch(unsigned int instr);
 int instr_is_branch_to_addr(const unsigned int *instr, unsigned long addr);
 unsigned long branch_target(const unsigned int *instr);
 unsigned int translate_branch(const unsigned int *dest,
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 759104b99f9f..180c16f04063 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -487,7 +487,17 @@ static bool is_early_mcount_callsite(u32 *instruction)
restore r2. */
 static int restore_r2(u32 *instruction, struct module *me)
 {
-   if (is_early_mcount_callsite(instruction - 1))
+   u32 *prev_insn = instruction - 1;
+
+   if (is_early_mcount_callsite(prev_insn))
+   return 1;
+
+   /*
+* Make sure the branch isn't a sibling call.  Sibling calls aren't
+* "link" branches and they don't return, so they don't need the r2
+* restore afterwards.
+*/
+   if (!instr_is_relative_link_branch(*prev_insn))
return 1;
 
if (*instruction != PPC_INST_NOP) {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c9de03e0c1f1..d81aab7441f7 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -304,6 +304,11 @@ int instr_is_relative_branch(unsigned int instr)
return instr_is_branch_iform(instr) || instr_is_branch_bform(instr);
 }
 
+int instr_is_relative_link_branch(unsigned int instr)
+{
+   return instr_is_relative_branch(instr) && (instr & BRANCH_SET_LINK);
+}
+
 static unsigned long branch_iform_target(const unsigned int *instr)
 {
signed long imm;
-- 
2.13.6
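
Editor's sketch: the classification the patch relies on, as a user-space
program fragment. The function names mirror the kernel helpers but this
is not the kernel source. The opcode numbers (18 for I-form, 16 for
B-form) and the AA/LK bit positions are restated from the Power ISA, and
the sketch assumes the "relative" check rejects any instruction with the
AA bit set.

```c
#include <stdint.h>

#define BRANCH_SET_LINK	0x1u	/* LK bit: save return address in LR */
#define BRANCH_ABSOLUTE	0x2u	/* AA bit: target is an absolute address */

/* The primary opcode is the top six bits of the instruction word. */
static unsigned int branch_opcode(uint32_t instr)
{
	return (instr >> 26) & 0x3f;
}

static int is_branch_iform(uint32_t instr)
{
	return branch_opcode(instr) == 18;
}

static int is_branch_bform(uint32_t instr)
{
	return branch_opcode(instr) == 16;
}

static int is_relative_branch(uint32_t instr)
{
	if (instr & BRANCH_ABSOLUTE)
		return 0;
	return is_branch_iform(instr) || is_branch_bform(instr);
}

/* True for relative branches that also set the link register (bl, bcl). */
static int is_relative_link_branch(uint32_t instr)
{
	return is_relative_branch(instr) && (instr & BRANCH_SET_LINK) != 0;
}
```

The sibling call from the commit message, bytes "00 00 00 48", is the
word 0x48000000: a relative I-form branch with LK clear, so it is not a
link branch and no r2 restore is expected after it.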



Re: RESEND [PATCH V3 2/4] pseries/drc-info: Search DRC properties for CPU indexes

2017-11-16 Thread Michael Bringmann
See below.

On 11/16/2017 11:34 AM, Nathan Fontenot wrote:
> On 11/15/2017 12:09 PM, Michael Bringmann wrote:
>> pseries/drc-info: Provide parallel routines to convert between
>> drc_index and CPU numbers at runtime, using the older device-tree
>> properties ("ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
>> and "ibm,drc-power-domains"), or the new property "ibm,drc-info".
>>
>> Signed-off-by: Michael Bringmann 
>> ---
>> Changes in V3:
>>   -- Some code compression and use of data structures for value passing.
>> ---
>>  arch/powerpc/include/asm/prom.h |   15 ++
>>  arch/powerpc/platforms/pseries/of_helpers.c |   60 ++
>>  arch/powerpc/platforms/pseries/pseries_energy.c |  139 
>> ++-
>>  3 files changed, 186 insertions(+), 28 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/prom.h 
>> b/arch/powerpc/include/asm/prom.h
>> index 3243455..007430a 100644
>> --- a/arch/powerpc/include/asm/prom.h
>> +++ b/arch/powerpc/include/asm/prom.h
>> @@ -96,6 +96,21 @@ struct of_drconf_cell {
>>  #define DRCONF_MEM_AI_INVALID   0x0040
>>  #define DRCONF_MEM_RESERVED 0x0080
>>
>> +struct of_drc_info {
>> +char *drc_type;
>> +char *drc_name_prefix;
>> +u32 drc_index_start;
>> +u32 drc_name_suffix_start;
>> +u32 num_sequential_elems;
>> +u32 sequential_inc;
>> +u32 drc_power_domain;
>> +u32 last_drc_index;
>> +};
>> +
>> +extern int of_one_drc_info(struct property **prop, void **curval,
>> +struct of_drc_info *data);
> 
> I'm not sure if prom.h is where this really belongs but I also do
> not see an existing header file that it really makes sense to put it in.

If you think of a better place, please let me know.
> 
>> +
>> +
>>  /*
>>   * There are two methods for telling firmware what our capabilities are.
>>   * Newer machines have an "ibm,client-architecture-support" method on the
>> diff --git a/arch/powerpc/platforms/pseries/of_helpers.c 
>> b/arch/powerpc/platforms/pseries/of_helpers.c
>> index 2798933..62dc8e9 100644
>> --- a/arch/powerpc/platforms/pseries/of_helpers.c
>> +++ b/arch/powerpc/platforms/pseries/of_helpers.c
>> @@ -2,6 +2,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include "of_helpers.h"
>>
>> @@ -36,3 +37,62 @@ struct device_node *pseries_of_derive_parent(const char 
>> *path)
>>  kfree(parent_path);
>>  return parent ? parent : ERR_PTR(-EINVAL);
>>  }
>> +
>> +
>> +/* Helper Routines to convert between drc_index to cpu numbers */
>> +
>> +int of_one_drc_info(struct property **prop, void **curval,
>> +struct of_drc_info *data)
> 
> Small nit, this should probably be of_read_drc_info_cell.

Okay.  Will change.

> 
>> +{
>> +const char *p;
>> +const __be32 *p2;
>> +
>> +if (!data)
>> +return -EINVAL;
>> +
>> +/* Get drc-type:encode-string */
>> +p = data->drc_type = (*curval);
>> +p = of_prop_next_string(*prop, p);
>> +if (!p)
>> +return -EINVAL;
>> +
>> +/* Get drc-name-prefix:encode-string */
>> +data->drc_name_prefix = (char *)p;
>> +p = of_prop_next_string(*prop, p);
>> +if (!p)
>> +return -EINVAL;
>> +
>> +/* Get drc-index-start:encode-int */
>> +p2 = (const __be32 *)p;
>> +p2 = of_prop_next_u32(*prop, p2, &data->drc_index_start);
>> +if (!p2)
>> +return -EINVAL;
>> +
>> +/* Get/skip drc-name-suffix-start:encode-int */
> 
> You're getting the suffix, should probably drop 'skip' in the comment.

Okay.

> 
>> +p2 = of_prop_next_u32(*prop, p2, &data->drc_name_suffix_start);
>> +if (!p2)
>> +return -EINVAL;
>> +
>> +/* Get number-sequential-elements:encode-int */
>> +p2 = of_prop_next_u32(*prop, p2, &data->num_sequential_elems);
>> +if (!p2)
>> +return -EINVAL;
>> +
>> +/* Get sequential-increment:encode-int */
>> +p2 = of_prop_next_u32(*prop, p2, &data->sequential_inc);
>> +if (!p2)
>> +return -EINVAL;
>> +
>> +/* Get/skip drc-power-domain:encode-int */
> 
> Same here.

Okay.
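
Editor's sketch: the entry layout this routine walks (two strings
followed by five 32-bit cells) can be parsed in user space as follows.
struct drc_info, read_be32() and parse_drc_info() are hypothetical
names; the field order and the last_drc_index formula are taken from the
quoted code, and the sketch assumes the cells are big-endian and may be
unaligned after the strings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct drc_info {
	const char *drc_type;
	const char *drc_name_prefix;
	uint32_t drc_index_start;
	uint32_t drc_name_suffix_start;
	uint32_t num_sequential_elems;
	uint32_t sequential_inc;
	uint32_t drc_power_domain;
	uint32_t last_drc_index;
};

/* Decode one big-endian 32-bit value from a possibly unaligned buffer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Parse one entry starting at 'buf' (at most 'len' bytes). Returns the
 * number of bytes consumed, or 0 if the entry is truncated.
 */
static size_t parse_drc_info(const unsigned char *buf, size_t len,
			     struct drc_info *out)
{
	const unsigned char *p = buf;
	const unsigned char *end = buf + len;
	const unsigned char *nul;
	uint32_t v[5];
	int i;

	/* Two NUL-terminated strings: drc-type, then drc-name-prefix. */
	nul = memchr(p, 0, (size_t)(end - p));
	if (!nul)
		return 0;
	out->drc_type = (const char *)p;
	p = nul + 1;

	nul = memchr(p, 0, (size_t)(end - p));
	if (!nul)
		return 0;
	out->drc_name_prefix = (const char *)p;
	p = nul + 1;

	/* Five 32-bit cells follow the strings. */
	if ((size_t)(end - p) < 5 * 4)
		return 0;
	for (i = 0; i < 5; i++, p += 4)
		v[i] = read_be32(p);

	out->drc_index_start = v[0];
	out->drc_name_suffix_start = v[1];
	out->num_sequential_elems = v[2];
	out->sequential_inc = v[3];
	out->drc_power_domain = v[4];
	/* Same formula as the quoted code; wraps if the count is zero. */
	out->last_drc_index = v[0] + (v[2] - 1) * v[3];

	return (size_t)(p - buf);
}
```

Returning the number of bytes consumed plays the role of updating
*curval in the quoted routine: the caller knows where the next entry
starts.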

> 
>> +p2 = of_prop_next_u32(*prop, p2, &data->drc_power_domain);
>> +if (!p2)
>> +return -EINVAL;
>> +
>> +/* Should now know end of current entry */
>> +(*curval) = (void *)p2;
>> +data->last_drc_index = data->drc_index_start +
>> +((data->num_sequential_elems-1)*data->sequential_inc);
>> +
>> +return 0;
>> +}
>> +EXPORT_SYMBOL(of_one_drc_info);
>> diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c 
>> b/arch/powerpc/platforms/pseries/pseries_energy.c
>> index 35c891a..7160855 100644
>> --- a/arch/powerpc/platforms/pseries/pseries_energy.c
>> +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
>> @@ -22,6 +22,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>
>>  #define MODULE_VERS "1.0"
>> @@ -38,26 +39,65 @@
>>  static u32 cpu_to_drc_index(int cpu)
>>  {
>>  struct device_node *dn = NULL;

Re: RESEND [PATCH V3 1/4] powerpc/firmware: Add definitions for new drc-info firmware feature

2017-11-16 Thread Michael Bringmann


On 11/16/2017 11:06 AM, Nathan Fontenot wrote:
> On 11/15/2017 12:09 PM, Michael Bringmann wrote:
>> Firmware Features: Define new bit flag representing the presence of
>> new device tree property "ibm,drc-info".  The flag is used to tell
>> the front end processor whether the Linux kernel supports the new
>> property, and by the front end processor to tell the Linux kernel
>> that the new property is present in the device tree.
> 
> This patch seems to be adding a bit for the drc-info feature so that
> we can use the firmware_has_feature() interface to determine if the
> device tree has the new ibm,drc-info properties.
> 
> I'm not sure what front-end processor you're referring to? Is this
> in reference to the architecture vector that is exchanged with firmware?

I was trying to be generic instead of writing pHyp, BMC, or other.
We can change the comment if it is misleading.

> 
> -Nathan
> 
>>
>> Signed-off-by: Michael Bringmann 
>> ---
>>  arch/powerpc/include/asm/firmware.h   |3 ++-
>>  arch/powerpc/include/asm/prom.h   |1 +
>>  arch/powerpc/platforms/pseries/firmware.c |1 +
>>  3 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/firmware.h 
>> b/arch/powerpc/include/asm/firmware.h
>> index 8645897..329d537 100644
>> --- a/arch/powerpc/include/asm/firmware.h
>> +++ b/arch/powerpc/include/asm/firmware.h
>> @@ -51,6 +51,7 @@
>>  #define FW_FEATURE_BEST_ENERGY  ASM_CONST(0x8000)
>>  #define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0001)
>>  #define FW_FEATURE_PRRN ASM_CONST(0x0002)
>> +#define FW_FEATURE_DRC_INFO ASM_CONST(0x0004)
>>
>>  #ifndef __ASSEMBLY__
>>
>> @@ -67,7 +68,7 @@ enum {
>>  FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
>>  FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
>>  FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
>> -FW_FEATURE_HPT_RESIZE,
>> +FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRC_INFO,
>>  FW_FEATURE_PSERIES_ALWAYS = 0,
>>  FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
>>  FW_FEATURE_POWERNV_ALWAYS = 0,
>> diff --git a/arch/powerpc/include/asm/prom.h 
>> b/arch/powerpc/include/asm/prom.h
>> index 825bd59..3243455 100644
>> --- a/arch/powerpc/include/asm/prom.h
>> +++ b/arch/powerpc/include/asm/prom.h
>> @@ -175,6 +175,7 @@ struct of_drconf_cell {
>>  #define OV5_HASH_GTSE   0x1940  /* Guest Translation Shoot Down Avail */
>>  /* Radix Table Extensions */
>>  #define OV5_RADIX_GTSE  0x1A40  /* Guest Translation Shoot Down Avail */
>> +#define OV5_DRC_INFO    0x1640  /* Redef Prop Structures: drc-info */
>>
>>  /* Option Vector 6: IBM PAPR hints */
>>  #define OV6_LINUX   0x02    /* Linux is our OS */
>> diff --git a/arch/powerpc/platforms/pseries/firmware.c 
>> b/arch/powerpc/platforms/pseries/firmware.c
>> index 63cc82a..757d757 100644
>> --- a/arch/powerpc/platforms/pseries/firmware.c
>> +++ b/arch/powerpc/platforms/pseries/firmware.c
>> @@ -114,6 +114,7 @@ struct vec5_fw_feature {
>>  vec5_fw_features_table[] = {
>>  {FW_FEATURE_TYPE1_AFFINITY, OV5_TYPE1_AFFINITY},
>>  {FW_FEATURE_PRRN,   OV5_PRRN},
>> +{FW_FEATURE_DRC_INFO,   OV5_DRC_INFO},
>>  };
>>
>>  static void __init fw_vec5_feature_init(const char *vec5, unsigned long len)
>>
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



Re: [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug

2017-11-16 Thread Michael Bringmann
>>>
>>>> +  if ((NODE_DATA(nid) == NULL) ||
>>>> +  (NODE_DATA(nid)->node_spanned_pages == 0)) {
>>>> +  if (try_online_node(nid))
>>>
>>> .. to do something like online a node.
>>
>> We have changed the function name to 'find_cpu_nid'.
> 
> Ok, but I would still not expect 'find_cpu_nid' to online the node.
> 

We would have to talk to the developer that created try_online_node()
which fully initializes the node and all of the related data structures.
A few of the APIs are external, and 'numa.c' knows how to allocate the
base 'pgdat' structure, but everything else that the kernel depends
upon for a node is handled in mm/page_alloc.c and mm/memory_hotplug.c.
I was trying to avoid piecemeal changes to that code -- avoid any changes
if it comes to it.

Even if it was not expected to put the node online, it is convenient,
as otherwise the patches to 'numa.c' would have to put the node online
-- that is expected for a CPU that is online.
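
To make the behaviour under discussion concrete, here is a minimal user-space model of the check quoted above. It is only a sketch: `struct node_data`, `node_table`, `model_try_online_node()` and `prepare_node()` are hypothetical stand-ins for `pg_data_t`, `NODE_DATA()`, `try_online_node()` and the renamed helper, and the `can_online` flag is an assumption added so that both outcomes can be exercised. Callers are assumed to pass 0 <= nid < MAX_NODES.

```c
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical stand-in for pg_data_t; only the field the check reads. */
struct node_data {
	unsigned long node_spanned_pages;
};

static struct node_data *node_table[MAX_NODES];	/* models NODE_DATA(nid) */
static struct node_data node_storage[MAX_NODES];
static int first_online;			/* models first_online_node */

/* Stub for try_online_node(): returns 0 on success, non-zero on failure. */
static int model_try_online_node(int nid, int can_online)
{
	if (!can_online)
		return -1;
	node_table[nid] = &node_storage[nid];
	return 0;
}

/*
 * If the node has no data structure or spans no pages, try to bring it
 * online; only when that fails is the CPU redirected to the first
 * online node.
 */
static int prepare_node(int nid, int can_online)
{
	if (node_table[nid] == NULL ||
	    node_table[nid]->node_spanned_pages == 0) {
		if (model_try_online_node(nid, can_online))
			return first_online;
	}
	return nid;
}
```

In this model the lookup has a side effect (the node table is modified), which is the property the review comment objects to in a function named 'find_cpu_nid'.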

Regards,

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



Re: RESEND [PATCH V3 2/4] pseries/drc-info: Search DRC properties for CPU indexes

2017-11-16 Thread Nathan Fontenot
On 11/15/2017 12:09 PM, Michael Bringmann wrote:
> pseries/drc-info: Provide parallel routines to convert between
> drc_index and CPU numbers at runtime, using the older device-tree
> properties ("ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
> and "ibm,drc-power-domains"), or the new property "ibm,drc-info".
> 
> Signed-off-by: Michael Bringmann 
> ---
> Changes in V3:
>   -- Some code compression and use of data structures for value passing.
> ---
>  arch/powerpc/include/asm/prom.h |   15 ++
>  arch/powerpc/platforms/pseries/of_helpers.c |   60 ++
>  arch/powerpc/platforms/pseries/pseries_energy.c |  139 
> ++-
>  3 files changed, 186 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
> index 3243455..007430a 100644
> --- a/arch/powerpc/include/asm/prom.h
> +++ b/arch/powerpc/include/asm/prom.h
> @@ -96,6 +96,21 @@ struct of_drconf_cell {
>  #define DRCONF_MEM_AI_INVALID  0x0040
>  #define DRCONF_MEM_RESERVED  0x0080
> 
> +struct of_drc_info {
> + char *drc_type;
> + char *drc_name_prefix;
> + u32 drc_index_start;
> + u32 drc_name_suffix_start;
> + u32 num_sequential_elems;
> + u32 sequential_inc;
> + u32 drc_power_domain;
> + u32 last_drc_index;
> +};
> +
> +extern int of_one_drc_info(struct property **prop, void **curval,
> + struct of_drc_info *data);

I'm not sure if prom.h is where this really belongs but I also do
not see an existing header file that it really makes sense to put it in.
 
> +
> +
>  /*
>   * There are two methods for telling firmware what our capabilities are.
>   * Newer machines have an "ibm,client-architecture-support" method on the
> diff --git a/arch/powerpc/platforms/pseries/of_helpers.c 
> b/arch/powerpc/platforms/pseries/of_helpers.c
> index 2798933..62dc8e9 100644
> --- a/arch/powerpc/platforms/pseries/of_helpers.c
> +++ b/arch/powerpc/platforms/pseries/of_helpers.c
> @@ -2,6 +2,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "of_helpers.h"
> 
> @@ -36,3 +37,62 @@ struct device_node *pseries_of_derive_parent(const char 
> *path)
>   kfree(parent_path);
>   return parent ? parent : ERR_PTR(-EINVAL);
>  }
> +
> +
> +/* Helper Routines to convert between drc_index to cpu numbers */
> +
> +int of_one_drc_info(struct property **prop, void **curval,
> + struct of_drc_info *data)

Small nit, this should probably be of_read_drc_info_cell.
 
> +{
> + const char *p;
> + const __be32 *p2;
> +
> + if (!data)
> + return -EINVAL;
> +
> + /* Get drc-type:encode-string */
> + p = data->drc_type = (*curval);
> + p = of_prop_next_string(*prop, p);
> + if (!p)
> + return -EINVAL;
> +
> + /* Get drc-name-prefix:encode-string */
> + data->drc_name_prefix = (char *)p;
> + p = of_prop_next_string(*prop, p);
> + if (!p)
> + return -EINVAL;
> +
> + /* Get drc-index-start:encode-int */
> + p2 = (const __be32 *)p;
> + p2 = of_prop_next_u32(*prop, p2, &data->drc_index_start);
> + if (!p2)
> + return -EINVAL;
> +
> + /* Get/skip drc-name-suffix-start:encode-int */

You're getting the suffix, should probably drop 'skip' in the comment.

> + p2 = of_prop_next_u32(*prop, p2, &data->drc_name_suffix_start);
> + if (!p2)
> + return -EINVAL;
> +
> + /* Get number-sequential-elements:encode-int */
> + p2 = of_prop_next_u32(*prop, p2, &data->num_sequential_elems);
> + if (!p2)
> + return -EINVAL;
> +
> + /* Get sequential-increment:encode-int */
> + p2 = of_prop_next_u32(*prop, p2, &data->sequential_inc);
> + if (!p2)
> + return -EINVAL;
> +
> + /* Get/skip drc-power-domain:encode-int */

Same here.

> + p2 = of_prop_next_u32(*prop, p2, &data->drc_power_domain);
> + if (!p2)
> + return -EINVAL;
> +
> + /* Should now know end of current entry */
> + (*curval) = (void *)p2;
> + data->last_drc_index = data->drc_index_start +
> + ((data->num_sequential_elems-1)*data->sequential_inc);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(of_one_drc_info);
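
For reference, the range arithmetic at the end of the function can be checked in user space. The sketch below is a model, not the kernel code: `struct drc_range`, `drc_last_index()` and `drc_index_in_range()` are hypothetical names, and it assumes an entry describes indexes starting at drc_index_start and spaced sequential_inc apart.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the numeric fields used from struct of_drc_info. */
struct drc_range {
	uint32_t drc_index_start;
	uint32_t num_sequential_elems;
	uint32_t sequential_inc;
};

/* Same formula as the patch: start + (count - 1) * increment. */
static uint32_t drc_last_index(const struct drc_range *r)
{
	return r->drc_index_start +
	       (r->num_sequential_elems - 1) * r->sequential_inc;
}

/* True if idx is one of the indexes described by the entry. */
static bool drc_index_in_range(const struct drc_range *r, uint32_t idx)
{
	/* A zero count would make (count - 1) wrap, so reject it here. */
	if (r->num_sequential_elems == 0)
		return false;
	if (idx < r->drc_index_start || idx > drc_last_index(r))
		return false;
	/* Avoid a modulo by zero when the increment is zero. */
	if (r->sequential_inc == 0)
		return idx == r->drc_index_start;
	return (idx - r->drc_index_start) % r->sequential_inc == 0;
}
```

Note that the zero-count guard is an addition in this sketch; the posted function computes last_drc_index without it.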
> diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c 
> b/arch/powerpc/platforms/pseries/pseries_energy.c
> index 35c891a..7160855 100644
> --- a/arch/powerpc/platforms/pseries/pseries_energy.c
> +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
> 
>  #define MODULE_VERS "1.0"
> @@ -38,26 +39,65 @@
>  static u32 cpu_to_drc_index(int cpu)
>  {
>   struct device_node *dn = NULL;
> - const int *indexes;
> - int i;
> + int thread_index;
>   int rc = 1;
>   u32 ret = 0;
> 
>   dn = of_find_node_by_path("/cpus");
>   if (dn == NULL)
>   goto err;
> - indexes = of_get_property(dn, 

[PATCH V7 3/3] hotplug/cpu: Fix crash with memoryless nodes

2017-11-16 Thread Michael Bringmann
On powerpc systems with shared configurations of CPUs and memory and
memoryless nodes at boot, an event ordering problem was observed on
a SLES12 build platform with the hot-add of CPUs to the memoryless
nodes.

* The most common error occurred when the memory SLAB driver attempted
  to reference the memoryless node to which a CPU was being added
  before the kernel had finished initializing all of the data structures
  for the CPU and exited 'device_online' under DLPAR/hot-add.

  Normally the memoryless node would be initialized through the call
  path device_online ... arch_update_cpu_topology ... find_cpu_nid
  ...  try_online_node.  This patch ensures that the powerpc node will
  be initialized as early as possible, even if it was memoryless and
  CPU-less at the point when we are trying to hot-add a new CPU to it.

Signed-off-by: Michael Bringmann 
---
Changes in V7:
  -- Make function find_cpu_nid() externally visible/usable so that
 it may be used from hotplug-cpu.c
---
 arch/powerpc/mm/numa.c   |3 ++-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 163f4cc..d6d4f7c 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1310,7 +1310,7 @@ static long vphn_get_associativity(unsigned long cpu,
return rc;
 }
 
-static inline int find_cpu_nid(int cpu)
+int find_cpu_nid(int cpu)
 {
__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
int new_nid;
@@ -1343,6 +1343,7 @@ static inline int find_cpu_nid(int cpu)
 #endif
}
 
+   printk(KERN_INFO "%s:%d cpu %d nid %d\n", __FUNCTION__, __LINE__, cpu, 
new_nid);
return new_nid;
 }
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index a7d14aa7..df8c732 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -340,6 +340,8 @@ static void pseries_remove_processor(struct device_node *np)
cpu_maps_update_done();
 }
 
+extern int find_cpu_nid(int cpu);
+
 static int dlpar_online_cpu(struct device_node *dn)
 {
int rc = 0;
@@ -364,6 +366,7 @@ static int dlpar_online_cpu(struct device_node *dn)
!= CPU_STATE_OFFLINE);
cpu_maps_update_done();
timed_topology_update(1);
+   find_cpu_nid(cpu);
rc = device_online(get_cpu_device(cpu));
if (rc)
goto out;



RESEND [PATCH V7 2/3] powerpc/initnodes: Ensure nodes initialized for hotplug

2017-11-16 Thread Michael Bringmann
On powerpc systems which allow 'hot-add' of CPU, it may occur that
the new resources are to be inserted into nodes that were not used
for memory resources at bootup.  Many different configurations of
PowerPC resources may need to be supported depending upon the
environment.  Important characteristics of the nodes and operating
environment include:

* Dedicated vs. shared resources.  Shared resources require
  information such as the VPHN hcall for CPU assignment to nodes.
  Associativity decisions made based on dedicated resource rules,
  such as associativity properties in the device tree, may vary
  from decisions made using the values returned by the VPHN hcall.
* memoryless nodes at boot.  Nodes need to be defined as 'possible'
  at boot for operation with other code modules.  Previously, the
  powerpc code would limit the set of possible nodes to those which
  have memory assigned at boot, and were thus online.  Subsequent
  add/remove of CPUs or memory would only work with this subset of
  possible nodes.
* memoryless nodes with CPUs at boot.  Due to the previous restriction
  on nodes, nodes that had CPUs but no memory were being collapsed
  into other nodes that did have memory at boot.  In practice this
  meant that the node assignment presented by the runtime kernel
  differed from the affinity and associativity attributes presented
  by the device tree or VPHN hcalls.  Nodes that might be known to
  the pHyp were not 'possible' in the runtime kernel because they did
  not have memory at boot.

This patch fixes some problems encountered at runtime with
configurations that support memory-less nodes, or that hot-add CPUs
into nodes that are memoryless during system execution after boot.
The problems of interest include:

* Nodes known to powerpc to be memoryless at boot, but to have
  CPUs in them are allowed to be 'possible' and 'online'.  Memory
  allocations for those nodes are taken from another node that does
  have memory until and if memory is hot-added to the node.
* Nodes which have no resources assigned at boot, but which may still
  be referenced subsequently by affinity or associativity attributes,
  are kept in the list of 'possible' nodes for powerpc.  Hot-add of
  memory or CPUs to the system can reference these nodes and bring
  them online instead of redirecting the references to one of the set
  of nodes known to have memory at boot.

Note that this software operates under the context of CPU hotplug.
We are not doing memory hotplug in this code, but rather updating
the kernel's CPU topology (i.e. arch_update_cpu_topology /
numa_update_cpu_topology).  We are initializing a node that may be
used by CPUs or memory before it can be referenced as invalid by a
CPU hotplug operation.  CPU hotplug operations are protected by a
range of APIs including cpu_maps_update_begin/cpu_maps_update_done,
cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
Memory hotplug operations, including try_online_node, are protected
by mem_hotplug_begin/mem_hotplug_done, device locks, and more.  In
the case of CPUs being hot-added to a previously memoryless node, the
try_online_node operation occurs wholly within the CPU locks with no
overlap.  Using HMC hot-add/hot-remove operations, we have been able
to add and remove CPUs to any possible node without failures.  HMC
operations involve a degree of self-serialization, though.

Signed-off-by: Michael Bringmann 
---
Changes in V6:
  -- Add some needed node initialization to runtime code that maps
 CPUs based on VPHN associativity
  -- Add error checks and alternate recovery for compile flag
 CONFIG_MEMORY_HOTPLUG
  -- Add alternate node selection recovery for !CONFIG_MEMORY_HOTPLUG
  -- Add more information to the patch introductory text
---
 arch/powerpc/mm/numa.c |   51 ++--
 1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 334a1ff..163f4cc 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu)
nid = of_node_to_nid_single(cpu);
 
 out_present:
-   if (nid < 0 || !node_online(nid))
+   if (nid < 0 || !node_possible(nid))
nid = first_online_node;
 
map_cpu_to_node(lcpu, nid);
@@ -867,7 +867,7 @@ void __init dump_numa_cpu_topology(void)
 }
 
 /* Initialize NODE_DATA for a node on the local memory */
-static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
+static void setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 {
u64 spanned_pages = end_pfn - start_pfn;
const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
@@ -913,10 +913,8 @@ static void __init find_possible_nodes(void)
min_common_depth);
 
for (i = 0; i < numnodes; i++) {
-   if (!node_possible(i)) {
-   setup_node_data(i, 0, 

[PATCH V7 1/3] powerpc/nodes: Ensure enough nodes avail for operations

2017-11-16 Thread Michael Bringmann
On powerpc systems which allow 'hot-add' of CPU or memory resources,
it may occur that the new resources are to be inserted into nodes
that were not used for these resources at bootup.  In the kernel,
any node that is used must be defined and initialized.  These empty
nodes may arise in the following situations:

* Dedicated vs. shared resources.  Shared resources require
  information such as the VPHN hcall for CPU assignment to nodes.
  Associativity decisions made based on dedicated resource rules,
  such as associativity properties in the device tree, may vary
  from decisions made using the values returned by the VPHN hcall.
* memoryless nodes at boot.  Nodes need to be defined as 'possible'
  at boot for operation with other code modules.  Previously, the
  powerpc code would limit the set of possible nodes to those which
  have memory assigned at boot, and were thus online.  Subsequent
  add/remove of CPUs or memory would only work with this subset of
  possible nodes.
* memoryless nodes with CPUs at boot.  Due to the previous restriction
  on nodes, nodes that had CPUs but no memory were being collapsed
  into other nodes that did have memory at boot.  In practice this
  meant that the node assignment presented by the runtime kernel
  differed from the affinity and associativity attributes presented
  by the device tree or VPHN hcalls.  Nodes that might be known to
  the pHyp were not 'possible' in the runtime kernel because they did
  not have memory at boot.

This patch ensures that sufficient nodes are defined to support
configuration requirements after boot, as well as at boot.  This
patch set fixes a couple of problems.

* Nodes known to powerpc to be memoryless at boot, but to have
  CPUs in them are allowed to be 'possible' and 'online'.  Memory
  allocations for those nodes are taken from another node that does
  have memory until and if memory is hot-added to the node.
* Nodes which have no resources assigned at boot, but which may still
  be referenced subsequently by affinity or associativity attributes,
  are kept in the list of 'possible' nodes for powerpc.  Hot-add of
  memory or CPUs to the system can reference these nodes and bring
  them online instead of redirecting to one of the set of nodes that
  were known to have memory at boot.

This patch extracts the value of the lowest domain level (number of
allocable resources) from the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system.  This new setting will
override the instruction,

nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the "ibm,max-associativity-domains" property is not present at boot,
no operation will be performed to define or enable additional nodes, or
enable the above 'nodes_and()'.
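
As a rough illustration of the two policies described above, the following user-space sketch models the node maps as 32-bit masks. This is an assumption made for brevity: the kernel uses nodemask_t and its helpers, and `nodemask32`, `restrict_to_online()` and `add_possible_nodes()` are hypothetical names.

```c
#include <stdint.h>

/* Model a node map as a 32-bit mask; bit n set means node n is in the map. */
typedef uint32_t nodemask32;

/* Previous behaviour: possible = possible & online. */
static nodemask32 restrict_to_online(nodemask32 possible, nodemask32 online)
{
	return possible & online;
}

/*
 * Behaviour with the property present: nodes 0..numnodes-1 are marked
 * possible in addition to whatever is already set.
 */
static nodemask32 add_possible_nodes(nodemask32 possible, uint32_t numnodes)
{
	uint32_t i;

	for (i = 0; i < numnodes && i < 32; i++)
		possible |= (nodemask32)1 << i;
	return possible;
}
```

With nodes 0 and 2 online at boot and a property value of 4, the first function leaves only nodes 0 and 2 possible, while the second yields nodes 0 through 3.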

Signed-off-by: Michael Bringmann 
---
Changes in V6:
  -- Remove some node initialization/allocation from boot setup
 to later in runtime to try to limit memory needs early on
  -- Augment descriptive documentation for patch
---
 arch/powerpc/mm/numa.c |   40 +---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index eb604b3..334a1ff 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -892,6 +892,37 @@ static void __init setup_node_data(int nid, u64 start_pfn, 
u64 end_pfn)
NODE_DATA(nid)->node_spanned_pages = spanned_pages;
 }
 
+static void __init find_possible_nodes(void)
+{
+   struct device_node *rtas;
+   u32 numnodes, i;
+
+   if (min_common_depth <= 0)
+   return;
+
+   rtas = of_find_node_by_path("/rtas");
+   if (!rtas)
+   return;
+
+   if (of_property_read_u32_index(rtas,
+   "ibm,max-associativity-domains",
+   min_common_depth, &numnodes))
+   goto out;
+
+   pr_info("numa: Nodes = %d (mcd = %d)\n", numnodes,
+   min_common_depth);
+
+   for (i = 0; i < numnodes; i++) {
+   if (!node_possible(i)) {
+   setup_node_data(i, 0, 0);
+   node_set(i, node_possible_map);
+   }
+   }
+
+out:
+   of_node_put(rtas);
+}
+
 void __init initmem_init(void)
 {
int nid, cpu;
@@ -905,12 +936,15 @@ void __init initmem_init(void)
memblock_dump_all();
 
/*
-* Reduce the possible NUMA nodes to the online NUMA nodes,
-* since we do not support node hotplug. This ensures that  we
-* lower the maximum NUMA node ID to what is actually present.
+* Modify the set of possible NUMA nodes to reflect information
+* available about the set of online nodes, and the set of nodes
+* that we expect to make use of for this platform's affinity
+* calculations.
 */

[PATCH V7 2/3] powerpc/initnodes: Ensure nodes initialized for hotplug

2017-11-16 Thread Michael Bringmann
To: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman 
Cc: Michael Bringmann 
Cc: John Allen 
Cc: Nathan Fontenot 
Cc: Tyrel Datwyler 
Cc: Thomas Falcon 
Subject: [PATCH V7 2/3] powerpc/initnodes: Ensure nodes initialized for hotplug

On powerpc systems which allow 'hot-add' of CPU, it may occur that
the new resources are to be inserted into nodes that were not used
for memory resources at bootup.  Many different configurations of
PowerPC resources may need to be supported depending upon the
environment.  Important characteristics of the nodes and operating
environment include:

* Dedicated vs. shared resources.  Shared resources require
  information such as the VPHN hcall for CPU assignment to nodes.
  Associativity decisions made based on dedicated resource rules,
  such as associativity properties in the device tree, may vary
  from decisions made using the values returned by the VPHN hcall.
* memoryless nodes at boot.  Nodes need to be defined as 'possible'
  at boot for operation with other code modules.  Previously, the
  powerpc code would limit the set of possible nodes to those which
  have memory assigned at boot, and were thus online.  Subsequent
  add/remove of CPUs or memory would only work with this subset of
  possible nodes.
* memoryless nodes with CPUs at boot.  Due to the previous restriction
  on nodes, nodes that had CPUs but no memory were being collapsed
  into other nodes that did have memory at boot.  In practice this
  meant that the node assignment presented by the runtime kernel
  differed from the affinity and associativity attributes presented
  by the device tree or VPHN hcalls.  Nodes that might be known to
  the pHyp were not 'possible' in the runtime kernel because they did
  not have memory at boot.

This patch fixes some problems encountered at runtime with
configurations that support memory-less nodes, or that hot-add CPUs
into nodes that are memoryless during system execution after boot.
The problems of interest include:

* Nodes known to powerpc to be memoryless at boot, but to have
  CPUs in them are allowed to be 'possible' and 'online'.  Memory
  allocations for those nodes are taken from another node that does
  have memory until and if memory is hot-added to the node.
* Nodes which have no resources assigned at boot, but which may still
  be referenced subsequently by affinity or associativity attributes,
  are kept in the list of 'possible' nodes for powerpc.  Hot-add of
  memory or CPUs to the system can reference these nodes and bring
  them online instead of redirecting the references to one of the set
  of nodes known to have memory at boot.

Note that this software operates under the context of CPU hotplug.
We are not doing memory hotplug in this code, but rather updating
the kernel's CPU topology (i.e. arch_update_cpu_topology /
numa_update_cpu_topology).  We are initializing a node that may be
used by CPUs or memory before it can be referenced as invalid by a
CPU hotplug operation.  CPU hotplug operations are protected by a
range of APIs including cpu_maps_update_begin/cpu_maps_update_done,
cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
Memory hotplug operations, including try_online_node, are protected
by mem_hotplug_begin/mem_hotplug_done, device locks, and more.  In
the case of CPUs being hot-added to a previously memoryless node, the
try_online_node operation occurs wholly within the CPU locks with no
overlap.  Using HMC hot-add/hot-remove operations, we have been able
to add and remove CPUs to any possible node without failures.  HMC
operations involve a degree of self-serialization, though.

Signed-off-by: Michael Bringmann 
---
Changes in V6:
  -- Add some needed node initialization to runtime code that maps
 CPUs based on VPHN associativity
  -- Add error checks and alternate recovery for compile flag
 CONFIG_MEMORY_HOTPLUG
  -- Add alternate node selection recovery for !CONFIG_MEMORY_HOTPLUG
  -- Add more information to the patch introductory text
---
 arch/powerpc/mm/numa.c |   51 ++--
 1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 334a1ff..163f4cc 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu)
nid = of_node_to_nid_single(cpu);
 
 out_present:
-   if (nid < 0 || !node_online(nid))
+   if (nid < 0 || !node_possible(nid))
nid = first_online_node;
 
map_cpu_to_node(lcpu, nid);
@@ -867,7 +867,7 @@ void __init dump_numa_cpu_topology(void)
 }
 
 /* Initialize NODE_DATA for a node on the local memory */
-static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
+static void setup_node_data(int 

[PATCH V7 0/2] powerpc/nodes: Fix issues with memoryless nodes

2017-11-16 Thread Michael Bringmann
powerpc/nodes: Ensure enough nodes avail for operations

powerpc/initnodes: Ensure nodes initialized for hotplug

hotplug/cpu: Fix crash with memoryless nodes

Signed-off-by: Michael Bringmann 

Michael Bringmann (3):
  powerpc/nodes: Ensure enough nodes avail for operations
  powerpc/initnodes: Ensure nodes initialized for hotplug
  hotplug/cpu: Fix crash with memoryless nodes
---
Changes in V7:
  -- Remove some node initialization/allocation from boot setup
 to later in runtime to try to limit memory needs early on
  -- Add some needed node initialization to runtime code that maps
 CPUs based on VPHN associativity
  -- Add error checks and alternate recovery for compile flag
 CONFIG_MEMORY_HOTPLUG
  -- Add alternate node selection recovery for !CONFIG_MEMORY_HOTPLUG
  -- Add initialization for systems that start using nodes before new
 CPU brought online.
  -- Make function find_cpu_nid() externally visible/usable so that
 it may be used from hotplug-cpu.c



Re: RESEND [PATCH V3 1/4] powerpc/firmware: Add definitions for new drc-info firmware feature

2017-11-16 Thread Nathan Fontenot
On 11/15/2017 12:09 PM, Michael Bringmann wrote:
> Firmware Features: Define new bit flag representing the presence of
> new device tree property "ibm,drc-info".  The flag is used to tell
> the front end processor whether the Linux kernel supports the new
> property, and by the front end processor to tell the Linux kernel
> that the new property is present in the device tree.

This patch seems to be adding a bit for the drc-info feature so that
we can use the firmware_has_feature() interface to determine if the
device tree has the new ibm,drc-info properties.
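
A user-space sketch of how such a table entry might be turned into a feature bit follows. It assumes that an OV5_* constant packs the byte index into architecture vector 5 in its high byte and the bit mask in its low byte, which is how I read the OV5_INDX()/OV5_FEAT() helpers in prom.h. `MODEL_FW_FEATURE_DRC_INFO`, `struct vec5_feature`, `model_table` and `model_vec5_features()` are hypothetical names, and the flag value is a placeholder, not the value from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define OV5_INDX(x)	((x) >> 8)	/* byte index into vector 5 */
#define OV5_FEAT(x)	((x) & 0xff)	/* bit mask within that byte */
#define OV5_DRC_INFO	0x1640

/* Placeholder flag value for the model only. */
#define MODEL_FW_FEATURE_DRC_INFO	(1ULL << 34)

struct vec5_feature {
	uint64_t fw_feature;
	unsigned int ov5;
};

static const struct vec5_feature model_table[] = {
	{ MODEL_FW_FEATURE_DRC_INFO, OV5_DRC_INFO },
};

/* Return the feature flags whose vector 5 bits are set in vec5[0..len). */
static uint64_t model_vec5_features(const uint8_t *vec5, size_t len)
{
	uint64_t features = 0;
	size_t i;

	for (i = 0; i < sizeof(model_table) / sizeof(model_table[0]); i++) {
		unsigned int index = OV5_INDX(model_table[i].ov5);
		unsigned int feat = OV5_FEAT(model_table[i].ov5);

		if (index < len && (vec5[index] & feat))
			features |= model_table[i].fw_feature;
	}
	return features;
}
```

With this reading, OV5_DRC_INFO selects bit 0x40 of byte 0x16, and the flag is reported only when that bit is set in the vector.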

I'm not sure what front-end processor you're referring to? Is this
in reference to the architecture vector that is exchanged with firmware?

-Nathan

> 
> Signed-off-by: Michael Bringmann 
> ---
>  arch/powerpc/include/asm/firmware.h   |3 ++-
>  arch/powerpc/include/asm/prom.h   |1 +
>  arch/powerpc/platforms/pseries/firmware.c |1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/firmware.h 
> b/arch/powerpc/include/asm/firmware.h
> index 8645897..329d537 100644
> --- a/arch/powerpc/include/asm/firmware.h
> +++ b/arch/powerpc/include/asm/firmware.h
> @@ -51,6 +51,7 @@
>  #define FW_FEATURE_BEST_ENERGY   ASM_CONST(0x8000)
>  #define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0001)
>  #define FW_FEATURE_PRRN  ASM_CONST(0x0002)
> +#define FW_FEATURE_DRC_INFO  ASM_CONST(0x0004)
> 
>  #ifndef __ASSEMBLY__
> 
> @@ -67,7 +68,7 @@ enum {
>   FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
>   FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
>   FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
> - FW_FEATURE_HPT_RESIZE,
> + FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRC_INFO,
>   FW_FEATURE_PSERIES_ALWAYS = 0,
>   FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
>   FW_FEATURE_POWERNV_ALWAYS = 0,
> diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
> index 825bd59..3243455 100644
> --- a/arch/powerpc/include/asm/prom.h
> +++ b/arch/powerpc/include/asm/prom.h
> @@ -175,6 +175,7 @@ struct of_drconf_cell {
>  #define OV5_HASH_GTSE   0x1940  /* Guest Translation Shoot Down Avail */
>  /* Radix Table Extensions */
>  #define OV5_RADIX_GTSE  0x1A40  /* Guest Translation Shoot Down Avail */
> +#define OV5_DRC_INFO 0x1640  /* Redef Prop Structures: drc-info   */
> 
>  /* Option Vector 6: IBM PAPR hints */
>  #define OV6_LINUX   0x02    /* Linux is our OS */
> diff --git a/arch/powerpc/platforms/pseries/firmware.c 
> b/arch/powerpc/platforms/pseries/firmware.c
> index 63cc82a..757d757 100644
> --- a/arch/powerpc/platforms/pseries/firmware.c
> +++ b/arch/powerpc/platforms/pseries/firmware.c
> @@ -114,6 +114,7 @@ struct vec5_fw_feature {
>  vec5_fw_features_table[] = {
>   {FW_FEATURE_TYPE1_AFFINITY, OV5_TYPE1_AFFINITY},
>   {FW_FEATURE_PRRN,   OV5_PRRN},
> + {FW_FEATURE_DRC_INFO,   OV5_DRC_INFO},
>  };
> 
>  static void __init fw_vec5_feature_init(const char *vec5, unsigned long len)
> 



Re: [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug

2017-11-16 Thread Nathan Fontenot


On 11/15/2017 12:28 PM, Michael Bringmann wrote:
> Hello:
> See below.
> 
> On 10/16/2017 07:54 AM, Michael Ellerman wrote:
>> Michael Bringmann  writes:
>>
>>> powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU,
>>> it may occur that the new resources are to be inserted into nodes
>>> that were not used for memory resources at bootup.  Many different
>>> configurations of PowerPC resources may need to be supported depending
>>> upon the environment.
>>
>> Give me some detail please?!
> 
> The most important characteristics that I have observed are:
> 
> * Dedicated vs. shared resources.  Shared resources require information
>   such as the VPHN hcall for CPU assignment to nodes.
> * memoryless nodes at boot.  Nodes need to be defined as 'possible' at
>   boot for operation with other code modules.  Previously, the powerpc
>   code would limit the set of possible/online nodes to those which have
>   memory assigned at boot.  Subsequent add/remove of CPUs or memory would
>   only work with this subset of possible nodes.
> * memoryless nodes with CPUs at boot.  Due to the previous restriction on
>   nodes, nodes that had CPUs but no memory were being collapsed into other
>   nodes that did have memory at boot.  In practice this meant that the
>   node assignment presented by the runtime kernel differed from the affinity
>   and associativity attributes presented by the device tree or VPHN hcalls.
>   Nodes that might be known to the pHyp were not 'possible' in the runtime
>   kernel because they did not have memory at boot.
> 
>>
>>> This patch fixes some problems encountered at
>>
>> What problems?
> 
> This patch set fixes a couple of problems.
> 
> * Nodes known to powerpc to be memoryless at boot, but to have CPUs in them
>   are allowed to be 'possible' and 'online'.  Memory allocations for those
>   nodes are taken from another node that does have memory until and if memory
>   is hot-added to the node.
> * Nodes which have no resources assigned at boot, but which may still be
>   referenced subsequently by affinity or associativity attributes, are kept
>   in the list of 'possible' nodes for powerpc.  Hot-add of memory or CPUs
>   to the system can reference these nodes and bring them online instead of
>   redirecting the resources to the set of nodes known to have memory at boot.
> 
>>
>>> runtime with configurations that support memory-less nodes, but which
>>> allow CPUs to be added at and after boot.
>>
>> How does it fix those problems?
> 
> This problem was fixed in a couple of ways.  First, the code now checks
> whether the node to which a CPU is mapped by 'numa_update_cpu_topology' /
> 'arch_update_cpu_topology' has been initialized and has memory available.
> If either test is false, a call is made to 'try_online_node()' to finish
> the data structure initialization.  Only if we are unable to initialize
> the node at this point will the CPU node assignment be collapsed into an
> existing node.  After initialization by 'try_online_node()', calls to
> 'local_memory_node' no longer crash for these memoryless nodes.
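
The decision described above can be sketched in plain C as follows. This is
only a user-space model under stated assumptions: struct node_info,
mock_try_online_node() and pick_cpu_node() are hypothetical stand-ins for
NODE_DATA(), try_online_node() and the renamed helper, not the code in the
patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical stand-in for the kernel's per-node data. */
struct node_info {
    bool initialized;             /* models NODE_DATA(nid) != NULL */
    unsigned long spanned_pages;  /* models node_spanned_pages */
    bool can_online;              /* whether the mock onlining succeeds */
};

static struct node_info nodes[MAX_NODES];
static int first_online = 0;      /* models first_online_node */

/* Mock of try_online_node(): returns 0 on success, nonzero on failure. */
static int mock_try_online_node(int nid)
{
    if (!nodes[nid].can_online)
        return -1;
    nodes[nid].initialized = true;
    return 0;
}

/*
 * Keep the CPU on the node it maps to whenever possible; fall back to an
 * existing node only if the target node cannot be initialized.
 */
static int pick_cpu_node(int nid)
{
    if (nid < 0 || nid >= MAX_NODES)
        return first_online;

    if (!nodes[nid].initialized || nodes[nid].spanned_pages == 0) {
        if (mock_try_online_node(nid))
            return first_online;
    }
    return nid;
}
```

With node 0 populated and node 3 marked as onlinable, pick_cpu_node(3)
returns 3 and leaves node 3 initialized, while a node that cannot be brought
online collapses to node 0.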
> 
>>
>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>> index b385cd0..e811dd1 100644
>>> --- a/arch/powerpc/mm/numa.c
>>> +++ b/arch/powerpc/mm/numa.c
>>> @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
>>> return rc;
>>>  }
>>>  
>>> +static int verify_node_preparation(int nid)
>>> +{
>>
>> I would not expect a function called "verify" ...
>>
>>> +   if ((NODE_DATA(nid) == NULL) ||
>>> +   (NODE_DATA(nid)->node_spanned_pages == 0)) {
>>> +   if (try_online_node(nid))
>>
>> .. to do something like online a node.
> 
> We have changed the function name to 'find_cpu_nid'.

Ok, but I would still not expect 'find_cpu_nid' to online the node.

> 
>>
>>> +   return first_online_node;
>>> +   }
>>> +
>>> +   return nid;
>>> +}
>>> +
>>>  /*
>>>   * Update the CPU maps and sysfs entries for a single CPU when its NUMA
>>>   * characteristics change. This function doesn't perform any locking and is
>>> @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
>>> /* Use associativity from first thread for all siblings */
>>> vphn_get_associativity(cpu, associativity);
>>> new_nid = associativity_to_nid(associativity);
>>> -   if (new_nid < 0 || !node_online(new_nid))
>>> +   if (new_nid < 0 || !node_possible(new_nid))
>>> new_nid = first_online_node;
>>>  
>>> +   new_nid = verify_node_preparation(new_nid);
>>
>> You're being called part-way through CPU hotplug here, are we sure it's
>> safe to go and do memory hotplug from there? What's the locking
>> situation?
> 
> We are not doing memory hotplug.  We are initializing a node that may be used
> by CPUs or memory before it can be referenced as invalid by a CPU hotplug
> operation.  CPU hotplug operations are protected by a 

Re: [PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB

2017-11-16 Thread Lorenzo Pieralisi
On Mon, Nov 13, 2017 at 02:35:48AM +, M.h. Lian wrote:

[...]

> > > On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote:
> > > > Add the property of inbound and outbound windows number for ep driver.
> > > >
> > > > Signed-off-by: Bao Xiaowei 
> > > > Acked-by: Minghuan Lian 
> > > > ---
> > > >  v2:
> > > >  - no change
> > > >  v3:
> > > >  - modify the commit message
> > > >  v4:
> > > >  - no change
> > > >
> > > >  arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 6 ++
> > > >  1 file changed, 6 insertions(+)
> > >
> > > $subject should start with something like
> > > arm64: dts: ls1046a: **

Indeed.

> > > > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> > > > b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> > > > index 06b5e12d04d8..f8332669663c 100644
> > > > --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> > > > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> > > > @@ -674,6 +674,8 @@
> > > > device_type = "pci";
> > > > dma-coherent;
> > > > num-lanes = <4>;
> > > > +   num-ib-windows = <6>;
> > > > +   num-ob-windows = <6>;
> > >
> > > EP specific properties shouldn't be added in RC dt node. Ideally you
> > > should have a separate dt node for RC and EP.
> > 
> > It is a single PCIe controller which can be configured to either RC
> > mode or EP mode.  Wouldn't it conflict with the device tree
> > principles to have two device tree nodes for the same PCIe
> > controller?  And obviously the two modes cannot be used at the same
> > time so we cannot have two drivers both probe on the same hardware.
> > 
> [Minghuan Lian]  There is only one PCIe dts node in the dts file. PCIe
> dts node describes the PCIe controller's hardware properties and does
> not have work mode.  The new properties  "num-ib-windows " and
> "num-ob-windows" are used to describe the inbound/outbound window
> number included in the PCIe hardware. These windows are used in both
> RC and EP mode.  We can change work mode when resetting via RCW(reset
> configuration word).

I am not happy about this (which is why I am asking Rob to chime in
on the DT side, please).

1) I do not think it is allowed to have two DT nodes in a dts with same unit
   address (ie same reg property)

   
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/dra7.dtsi?h=v4.14

2) In the Synopsys DesignWare PCIe interface bindings we have some
   properties that are for RC mode and some for EP mode but there is
   no way from a *binding* perspective to detect in what mode the
   controller is:

   
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/pci/designware-pcie.txt?h=v4.14

3) You can't use properties that the bindings above declare as EP only
   for RC mode; we define bindings so that their rules are respected.

4) I think that a) a compatible should be added to the designware-pcie
   bindings to define endpoint mode and b) the same should be done for
   the ls1046a bindings. If the RC is programmed in EP mode DT firmware
   should be able to provide the information to an operating system, it
   is actually a _different_ component but on this I need DT people to
   chime in to define the best way forward.

I cannot review/merge this code until the points above are clarified.

Thanks,
Lorenzo


[PATCH 2/2] ASoC: fsl_asrc: constify some arrays

2017-11-16 Thread Joe Perches
Using const reduces data.

$ size sound/soc/fsl/fsl_asrc.o*
   text    data     bss     dec     hex filename
  21691    5872     192   27755    6c6b sound/soc/fsl/fsl_asrc.o.new
  21435    6128     192   27755    6c6b sound/soc/fsl/fsl_asrc.o.old
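
In the output above, 256 bytes move from the data column to the text column
while the total stays the same. A minimal sketch of the pattern, assuming a
table that is only ever read (input_rate_supported() is a hypothetical
helper, not part of the driver):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Same rates as supported_input_rate[] in the patch. Because the table is
 * const, the toolchain is free to place it in a read-only section.
 */
static const int supported_input_rate[] = {
    5512, 8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200,
    96000, 176400, 192000,
};

/* Hypothetical helper: a pure lookup never needs a writable table. */
static bool input_rate_supported(int rate)
{
    size_t i;
    size_t n = sizeof(supported_input_rate) / sizeof(supported_input_rate[0]);

    for (i = 0; i < n; i++)
        if (supported_input_rate[i] == rate)
            return true;
    return false;
}
```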

Signed-off-by: Joe Perches 
---
 sound/soc/fsl/fsl_asrc.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index ed683fe8b94a..641724c9b3f8 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -49,12 +49,12 @@ static const u8 process_option[][12][2] = {
 };
 
 /* Corresponding to process_option */
-static int supported_input_rate[] = {
+static const int supported_input_rate[] = {
5512, 8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200,
96000, 176400, 192000,
 };
 
-static int supported_asrc_rate[] = {
+static const int supported_asrc_rate[] = {
8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200, 96000, 
176400, 192000,
 };
 
@@ -62,26 +62,26 @@ static int supported_asrc_rate[] = {
  * The following tables map the relationship between asrc_inclk/asrc_outclk in
  * fsl_asrc.h and the registers of ASRCSR
  */
-static unsigned char input_clk_map_imx35[] = {
+static const unsigned char input_clk_map_imx35[] = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf,
 };
 
-static unsigned char output_clk_map_imx35[] = {
+static const unsigned char output_clk_map_imx35[] = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf,
 };
 
 /* i.MX53 uses the same map for input and output */
-static unsigned char input_clk_map_imx53[] = {
+static const unsigned char input_clk_map_imx53[] = {
 /* 0x0  0x1  0x2  0x3  0x4  0x5  0x6  0x7  0x8  0x9  0xa  0xb  0xc  0xd  
0xe  0xf */
0x0, 0x1, 0x2, 0x7, 0x4, 0x5, 0x6, 0x3, 0x8, 0x9, 0xa, 0xb, 0xc, 0xf, 
0xe, 0xd,
 };
 
-static unsigned char output_clk_map_imx53[] = {
+static const unsigned char output_clk_map_imx53[] = {
 /* 0x0  0x1  0x2  0x3  0x4  0x5  0x6  0x7  0x8  0x9  0xa  0xb  0xc  0xd  
0xe  0xf */
0x8, 0x9, 0xa, 0x7, 0xc, 0x5, 0x6, 0xb, 0x0, 0x1, 0x2, 0x3, 0x4, 0xf, 
0xe, 0xd,
 };
 
-static unsigned char *clk_map[2];
+static const unsigned char *clk_map[2];
 
 /**
  * Request ASRC pair
-- 
2.15.0



[PATCH 1/2] ASoC: fsl_asrc: Fix line continuation format

2017-11-16 Thread Joe Perches
Line continuations with excess spacing cause unexpected output.
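
A small illustration of the effect, with shortened strings chosen for this
sketch (extra_chars() is a hypothetical helper): a backslash-newline inside
a string literal keeps the next line's indentation as part of the string,
whereas adjacent literals are concatenated with nothing in between.

```c
#include <string.h>

/*
 * Backslash-newline inside a literal: the leading whitespace of the
 * following source line ends up inside the string.
 */
static const char continued[] = "ratio range [1/24, 8] for \
        inrate/outrate";

/* Adjacent string literals are joined by the compiler as written. */
static const char joined[] = "ratio range [1/24, 8] for "
        "inrate/outrate";

/* Number of extra characters carried by the continuation form. */
static size_t extra_chars(void)
{
    return strlen(continued) - strlen(joined);
}
```

Keeping the whole format string on one source line avoids the problem in
the same way and keeps the message greppable.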

Signed-off-by: Joe Perches 
---
 sound/soc/fsl/fsl_asrc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 806d39927318..ed683fe8b94a 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -288,8 +288,8 @@ static int fsl_asrc_config_pair(struct fsl_asrc_pair *pair)
 
if ((outrate > 8000 && outrate < 3) &&
(outrate/inrate > 24 || inrate/outrate > 8)) {
-   pair_err("exceed supported ratio range [1/24, 8] for \
-   inrate/outrate: %d/%d\n", inrate, outrate);
+   pair_err("exceed supported ratio range [1/24, 8] for 
inrate/outrate: %d/%d\n",
+inrate, outrate);
return -EINVAL;
}
 
-- 
2.15.0



[PATCH 0/2] ASoC: fsl_asrc: neatening

2017-11-16 Thread Joe Perches
Joe Perches (2):
  ASoC: fsl_asrc: Fix line continuation format
  ASoC: fsl_asrc: constify some arrays

 sound/soc/fsl/fsl_asrc.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

-- 
2.15.0



Re: [PATCH] powerpc/pseries/cpuidle: add polling idle for shared processor guests

2017-11-16 Thread Nicholas Piggin
On Tue, 10 Oct 2017 17:11:09 +1000
Nicholas Piggin  wrote:

> For shared processor guests (e.g., KVM), add an idle polling mode rather
> than immediately returning to the hypervisor when the guest CPU goes
> idle.
> 
> Test setup is a 2 socket POWER9 with 4 guests running, each with vCPUs
> equal to 1/2 of the real CPUs. Saturated each guest with tbench. Using
> polling idle gives about 1.4x throughput.
> 
> Kernel compile speed was not changed significantly.
> 
> Signed-off-by: Nicholas Piggin 

What should we do about this one?


> ---
>  drivers/cpuidle/cpuidle-pseries.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-pseries.c 
> b/drivers/cpuidle/cpuidle-pseries.c
> index e9b3853d93ea..16be7ad30fe1 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -171,11 +171,17 @@ static struct cpuidle_state dedicated_states[] = {
>   * States for shared partition case.
>   */
>  static struct cpuidle_state shared_states[] = {
> + { /* Snooze */
> + .name = "snooze",
> + .desc = "snooze",
> + .exit_latency = 0,
> + .target_residency = 0,
> + .enter = _loop },
>   { /* Shared Cede */
>   .name = "Shared Cede",
>   .desc = "Shared Cede",
> - .exit_latency = 0,
> - .target_residency = 0,
> + .exit_latency = 10,
> + .target_residency = 100,
>   .enter = _cede_loop },
>  };
>  
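
To see why the latency and residency numbers matter, here is a deliberately
simplified selection rule. It is not the real cpuidle governor; struct
idle_state and pick_state() are hypothetical, and only the table values come
from the patch above.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down version of struct cpuidle_state. */
struct idle_state {
    const char *name;
    unsigned int exit_latency;      /* microseconds */
    unsigned int target_residency;  /* microseconds */
};

/* Values taken from the shared_states[] table in the patch. */
static const struct idle_state shared_states[] = {
    { "snooze",      0,   0 },
    { "Shared Cede", 10, 100 },
};

/*
 * Pick the deepest state whose target residency fits in the predicted idle
 * time and whose exit latency is within the allowed limit.
 */
static int pick_state(unsigned int predicted_us, unsigned int latency_limit_us)
{
    int best = 0;
    size_t i;
    size_t n = sizeof(shared_states) / sizeof(shared_states[0]);

    for (i = 0; i < n; i++) {
        if (shared_states[i].target_residency <= predicted_us &&
            shared_states[i].exit_latency <= latency_limit_us)
            best = (int)i;
    }
    return best;
}
```

Under this rule a short predicted idle period stays in the polling state,
and only longer ones cede to the hypervisor.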



Re: [PATCH v2 0/8] powerpc: Support ibm,dynamic-memory-v2 property

2017-11-16 Thread Nathan Fontenot


On 11/15/2017 11:37 PM, Bharata B Rao wrote:
> On Fri, Oct 20, 2017 at 6:51 PM, Nathan Fontenot wrote:
> 
> This patch set provides a set of updates to de-couple the LMB information
> provided in the ibm,dynamic-memory device tree property from the device
> tree property format. A part of this patch series introduces a new
> device tree property format for dynamic memory, ibm,dynamic-memory-v2.
> By separating the device tree format from the information provided by
> the device tree property consumers of this information need not know
> what format is currently being used and provide multiple parsing routines
> for the information.
> 
> The first two patches update the of_get_assoc_arrays() and
> of_get_usable_memory() routines to look up the device node for the
> properties they parse. This is needed because the calling routines for
> these two functions will not have the device node to pass in in
> subsequent patches.
> 
> The third patch adds a new kernel structure, struct drmem_lmb, that
> is used to represent each of the possible LMBs specified in the
> ibm,dynamic-memory* device tree properties. The patch adds code
> to parse the property and build the LMB array data, and updates prom.c
> to use this new data structure instead of parsing the device tree 
> directly.
> 
> The fourth and fifth patches update the numa and pseries hotplug code
> respectively to use the new LMB array data instead of parsing the
> device tree directly.
> 
> The sixth patch moves the of_drconf_cell struct to drmem.h where it
> fits better than prom.h
> 
> The seventh patch introduces support for the ibm,dynamic-memory-v2
> property format by updating the new drmem.c code to be able to parse
> and create this new device tree format.
> 
> The last patch in the series updates the architecture vector to indicate
> support for ibm,dynamic-memory-v2.
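
A rough sketch of how a set-based description could be expanded into
per-LMB records, so that consumers see one array regardless of the property
format. Everything here is an assumption for illustration: struct lmb_set,
struct lmb and expand_lmb_sets() are hypothetical names, and the sketch
assumes that addresses and DRC indexes are consecutive within a set.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical description of one set of contiguous LMBs. */
struct lmb_set {
    uint32_t count;       /* number of LMBs in the set */
    uint64_t base_addr;   /* address of the first LMB */
    uint32_t drc_index;   /* DRC index of the first LMB */
};

/* Hypothetical per-LMB record, in the spirit of struct drmem_lmb. */
struct lmb {
    uint64_t base_addr;
    uint32_t drc_index;
};

/*
 * Expand sets into individual LMB records. Returns the number of records
 * written, or 0 if the output array is too small.
 */
static size_t expand_lmb_sets(const struct lmb_set *sets, size_t nsets,
                              uint64_t lmb_size, struct lmb *out, size_t max)
{
    size_t n = 0, s;
    uint32_t i;

    for (s = 0; s < nsets; s++) {
        for (i = 0; i < sets[s].count; i++) {
            if (n == max)
                return 0;
            out[n].base_addr = sets[s].base_addr + (uint64_t)i * lmb_size;
            out[n].drc_index = sets[s].drc_index + i;
            n++;
        }
    }
    return n;
}
```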
> 
> 
> Here we are consolidating LMBs into LMB sets but still end up working with 
> individual LMBs during hotplug. Can we instead start working with LMB sets 
> together during hotplug ? In other words

In a sense we already do this when handling memory DLPAR indexed-count
requests. Such a request takes a starting drc index for an LMB and
adds/removes the following  contiguous LMBs. This operation is
all-or-nothing: if any LMB fails to add/remove, we revert back to the
original state.
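
The all-or-nothing behaviour can be modelled roughly like this. It is a
user-space sketch only: mock_lmb_add(), mock_lmb_remove() and
add_lmbs_indexed_count() are hypothetical stand-ins, and it assumes every
LMB in the range starts out absent.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_LMBS 8

static bool lmb_present[NUM_LMBS];   /* mock "is this LMB added?" state */
static int  fail_at = -1;            /* mock: index whose add fails, or -1 */

/* Mock single-LMB operations standing in for the real add/remove paths. */
static int mock_lmb_add(size_t idx)
{
    if ((int)idx == fail_at)
        return -1;
    lmb_present[idx] = true;
    return 0;
}

static void mock_lmb_remove(size_t idx)
{
    lmb_present[idx] = false;
}

/* Add count contiguous LMBs starting at start; undo everything on failure. */
static int add_lmbs_indexed_count(size_t start, size_t count)
{
    size_t i;

    if (start > NUM_LMBS || count > NUM_LMBS - start)
        return -1;

    for (i = start; i < start + count; i++) {
        if (mock_lmb_add(i)) {
            /* Roll back the LMBs added earlier in this request. */
            while (i > start)
                mock_lmb_remove(--i);
            return -1;
        }
    }
    return 0;
}
```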

This isn't exactly what you're asking for but...
> 
> - The RTAS calls involved during DRC acquire stage can be done only once per 
> LMB set.
> - One configure-connector call for the entire LMB set.

These two interfaces work on a single drc index, not a set of drc indexes.
Working on a set of LMBs would require extending the current rtas calls or
creating new ones.

One thing we can look into doing for indexed-count requests is to perform
each of the steps for all LMBs in the set at once, i.e. make the acquire
call for all the LMBs, then make the configure-connector calls for all the
LMBs...

The only drawback is that this approach would make handling failures and
backing out of the updates a bit messier, but I've never really thought
that optimizing for the failure case was as important.

-Nathan

> 
> I think this should help hotplugging of large amounts of memory. Other than 
> that, if we choose to use LMB representation for PMEM, it will be useful 
> there too to handle all the LMBs of a PMEM range as one set.
> 
> Regards,
> Bharata.



[PATCH 4/4] cpuidle/powernv: avoid double irq enable coming out of idle

2017-11-16 Thread Nicholas Piggin
Since e1689795a7 ("cpuidle: Add common time keeping and irq enabling"),
cpuidle drivers are expected to return from ->enter with irqs disabled.

Update the cpuidle-pseries snooze and cede loops to disable irqs before
returning.
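
The contract can be pictured with a small model in which the caller enables
interrupts once after ->enter returns. All names below are hypothetical
mocks, not kernel functions; the counter only shows that no enable happens
while interrupts are already on.

```c
#include <stdbool.h>

static bool irqs_on;            /* mock of the interrupt-enable state */
static int  redundant_enables;  /* enable calls made while already enabled */

static void mock_irq_enable(void)
{
    if (irqs_on)
        redundant_enables++;
    irqs_on = true;
}

static void mock_irq_disable(void)
{
    irqs_on = false;
}

/*
 * Idle-state callback: interrupts are on while idling, but the callback
 * disables them again before returning.
 */
static int mock_enter(int index)
{
    mock_irq_enable();   /* idle loop runs with interrupts enabled */
    mock_irq_disable();  /* honour the contract on the way out */
    return index;
}

/* Caller side: enables interrupts exactly once after the callback. */
static int mock_idle_call(int index)
{
    int entered;

    mock_irq_disable();
    entered = mock_enter(index);
    mock_irq_enable();
    return entered;
}
```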

Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-pseries.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index a187a39fb866..0f2b697cbb27 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -51,8 +51,6 @@ static inline void idle_loop_epilog(unsigned long in_purr)
get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
get_lppaca()->idle = 0;
 
-   if (irqs_disabled())
-   local_irq_enable();
ppc64_runlatch_on();
 }
 
@@ -87,6 +85,8 @@ static int snooze_loop(struct cpuidle_device *dev,
HMT_medium();
clear_thread_flag(TIF_POLLING_NRFLAG);
 
+   local_irq_disable();
+
idle_loop_epilog(in_purr);
 
return index;
@@ -121,6 +121,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
HMT_medium();
check_and_cede_processor();
 
+   local_irq_disable();
get_lppaca()->donate_dedicated_cpu = 0;
 
idle_loop_epilog(in_purr);
@@ -145,6 +146,7 @@ static int shared_cede_loop(struct cpuidle_device *dev,
 */
check_and_cede_processor();
 
+   local_irq_disable();
idle_loop_epilog(in_purr);
 
return index;
-- 
2.15.0



[PATCH 3/4] cpuidle/powernv: avoid double irq enable coming out of idle

2017-11-16 Thread Nicholas Piggin
Since e1689795a7 ("cpuidle: Add common time keeping and irq enabling"),
cpuidle drivers are expected to return from ->enter with irqs disabled.

Update the cpuidle-powernv snooze loop to disable irqs before returning.

Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-powernv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index e06605b21841..1a8234e706bc 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -76,6 +76,8 @@ static int snooze_loop(struct cpuidle_device *dev,
ppc64_runlatch_on();
clear_thread_flag(TIF_POLLING_NRFLAG);
 
+   local_irq_disable();
+
return index;
 }
 
-- 
2.15.0



[PATCH 2/4] powerpc/64: do not trace irqs-off at interrupt return to soft-disabled context

2017-11-16 Thread Nicholas Piggin
When an interrupt is returning to a soft-disabled context (which can
happen for non-maskable interrupts or synchronous interrupts), it goes
through the motions of soft-disabling again, including calling
TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()).

This is not necessary, because we must already be soft-disabled in the
interrupt context. It may also be causing crashes in the irq tracing
code when it is re-entered from an NMI. Replace it with a warning to
ensure that soft-interrupts are still disabled.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3320bcac7192..36878b6ee8b8 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -911,9 +911,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
beq 1f
rlwinm  r7,r7,0,~PACA_IRQ_HARD_DIS
stb r7,PACAIRQHAPPENED(r13)
-1: li  r0,0
-   stb r0,PACASOFTIRQEN(r13);
-   TRACE_DISABLE_INTS
+1:
+#if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_BUG)
+   /* The interrupt should not have soft enabled. */
+   lbz r7,PACASOFTIRQEN(r13)
+1: tdnei   r7,0
+   EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
+#endif
b   .Ldo_restore
 
/*
-- 
2.15.0



[PATCH 1/4] powerpc: define __ARCH_IRQ_EXIT_IRQS_DISABLED

2017-11-16 Thread Nicholas Piggin
powerpc calls irq_exit() with local irqs disabled, therefore it
can define __ARCH_IRQ_EXIT_IRQS_DISABLED.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/hardirq.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/hardirq.h 
b/arch/powerpc/include/asm/hardirq.h
index 456f9e7b8d83..5986d473722b 100644
--- a/arch/powerpc/include/asm/hardirq.h
+++ b/arch/powerpc/include/asm/hardirq.h
@@ -29,6 +29,7 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
 #define local_softirq_pending()
__this_cpu_read(irq_stat.__softirq_pending)
 
 #define __ARCH_SET_SOFTIRQ_PENDING
+#define __ARCH_IRQ_EXIT_IRQS_DISABLED
 
 #define set_softirq_pending(x) __this_cpu_write(irq_stat.__softirq_pending, 
(x))
 #define or_softirq_pending(x) __this_cpu_or(irq_stat.__softirq_pending, (x))
-- 
2.15.0



[PATCH 0/4] interrupt tracing fixes

2017-11-16 Thread Nicholas Piggin
Here are a few loosely related fixes for interrupt tracing code
and irq state handling. They eliminate local_irq_enable() when
already enabled and local_irq_disable() when already disabled,
and also fix an NMI re-entrancy bug in irq tracing that has
been crashing in the field: when PMU interrupts (non-maskable) and
irq tracing run together, things get into "impossible" states.

I have only tested 64s, and don't know if patch 1 and 2 are right
on 64e or 32 so if anyone could take a look or test, that would
be good.

Thanks,
Nick

Nicholas Piggin (4):
  powerpc: define __ARCH_IRQ_EXIT_IRQS_DISABLED
  powerpc/64: do not trace irqs-off at interrupt return to soft-disabled
context
  cpuidle/powernv: avoid double irq enable coming out of idle
  cpuidle/powernv: avoid double irq enable coming out of idle

 arch/powerpc/include/asm/hardirq.h |  1 +
 arch/powerpc/kernel/entry_64.S | 10 +++---
 drivers/cpuidle/cpuidle-powernv.c  |  2 ++
 drivers/cpuidle/cpuidle-pseries.c  |  6 --
 4 files changed, 14 insertions(+), 5 deletions(-)

-- 
2.15.0



Re: [PATCH] cpufreq: powernv: Return the actual CPU frequency in /proc/cpuinfo

2017-11-16 Thread Nicholas Piggin
On Fri,  6 Oct 2017 12:54:08 +0530
Shriya  wrote:

> Make /proc/cpuinfo read the frequency of the CPU it is running at
> instead of reading the cached value of the last requested frequency.
> In conditions like WOF/throttle CPU can be running at a different
> frequency than the requested frequency.
> 
> Signed-off-by: Shriya 

This causes the following:

[7.203270] BUG: sleeping function called from invalid context at 
/home/npiggin/linux/kernel/locking/rwsem.c:23
[7.203323] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: systemd
[7.203352] 1 lock held by systemd/1:
[7.203367]  #0:  (>lock){+.+.}, at: [] 
seq_read+0x78/0x5c0
[7.203416] CPU: 164 PID: 1 Comm: systemd Not tainted 
4.14.0-00345-g8fb6e339cdf5 #33
[7.203452] Call Trace:
[7.203463] [c00ff55039d0] [c0be4f24] dump_stack+0x104/0x190 
(unreliable)
[7.203502] [c00ff5503a10] [c01297d0] ___might_sleep+0x2e0/0x320
[7.203539] [c00ff5503a90] [c0c049fc] down_read+0x3c/0xc0
[7.203569] [c00ff5503ad0] [c09f7c70] cpufreq_get+0x50/0xc0
[7.203600] [c00ff5503b20] [c0093618] pnv_get_proc_freq+0x28/0x60
[7.203637] [c00ff5503b50] [c002bfc0] show_cpuinfo+0x1b0/0x460
[7.203667] [c00ff5503c00] [c03e14b8] seq_read+0x238/0x5c0
[7.203698] [c00ff5503ca0] [c0441e80] proc_reg_read+0xb0/0x110
[7.203729] [c00ff5503cf0] [c03a16cc] __vfs_read+0x6c/0x1c0
[7.203759] [c00ff5503d90] [c03a18dc] vfs_read+0xbc/0x1b0
[7.203788] [c00ff5503de0] [c03a20cc] SyS_read+0x6c/0x110
[7.203819] [c00ff5503e30] [c000b82c] system_call+0x58/0x6c

>  arch/powerpc/platforms/powernv/setup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/setup.c 
> b/arch/powerpc/platforms/powernv/setup.c
> index 897aa14..55ea4bf 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -311,7 +311,7 @@ static unsigned long pnv_get_proc_freq(unsigned int cpu)
>  {
>   unsigned long ret_freq;
>  
> - ret_freq = cpufreq_quick_get(cpu) * 1000ul;
> + ret_freq = cpufreq_get(cpu) * 1000ul;
>  
>   /*
>* If the backend cpufreq driver does not exist,



Re: [PATCH 2/3] libnvdimm: Add a device-tree interface

2017-11-16 Thread Rob Herring
On Thu, Nov 16, 2017 at 04:51:30AM +1100, Oliver O'Halloran wrote:
> A fairly bare-bones set of device-tree bindings so libnvdimm can be used
> on powerpc and other device-tree based platforms.
> 
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Oliver O'Halloran 
> ---
>  .../devicetree/bindings/nvdimm/nvdimm-bus.txt  |  69 +++

Also, please split bindings to a separate patch.

>  MAINTAINERS|   8 +
>  drivers/nvdimm/Kconfig |  10 +
>  drivers/nvdimm/Makefile|   1 +
>  drivers/nvdimm/of_nvdimm.c | 202 
> +
>  5 files changed, 290 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/nvdimm/nvdimm-bus.txt
>  create mode 100644 drivers/nvdimm/of_nvdimm.c


Re: [PATCH 2/3] libnvdimm: Add a device-tree interface

2017-11-16 Thread Rob Herring
On Thu, Nov 16, 2017 at 04:51:30AM +1100, Oliver O'Halloran wrote:
> A fairly bare-bones set of device-tree bindings so libnvdimm can be used
> on powerpc and other device-tree based platforms.
> 
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Oliver O'Halloran 
> ---
>  .../devicetree/bindings/nvdimm/nvdimm-bus.txt  |  69 +++
>  MAINTAINERS|   8 +
>  drivers/nvdimm/Kconfig |  10 +
>  drivers/nvdimm/Makefile|   1 +
>  drivers/nvdimm/of_nvdimm.c | 202 
> +
>  5 files changed, 290 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/nvdimm/nvdimm-bus.txt
>  create mode 100644 drivers/nvdimm/of_nvdimm.c
> 
> diff --git a/Documentation/devicetree/bindings/nvdimm/nvdimm-bus.txt 
> b/Documentation/devicetree/bindings/nvdimm/nvdimm-bus.txt
> new file mode 100644
> index ..491e7c4900ed
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/nvdimm/nvdimm-bus.txt
> @@ -0,0 +1,69 @@
> +Device-tree bindings for nonvolatile-memory / NVDIMMs
> +-
> +
> +Non-volatile DIMMs are memory modules used to provide (cacheable) main 
> memory that
> +retains its contents across power cycles. In more practical terms, they are 
> kind
> +of storage device where the contents can be accessed by the CPU directly,
> +rather than indirectly via a storage controller or similar. This can provide
> +substantial performance improvements for applications designed to take
> +advantage of in-memory storage.
> +
> +This binding provides a way to describe memory regions that should be managed
> +by an NVDIMM storage driver (libNVDIMM in Linux) and some of the associated
> +metadata. The binding itself is split into two main parts: A container bus 
> and
> +its sub-nodes which describe which memory address ranges corresponding to
> +NVDIMM backed memory.
> +
> +Bindings for the container bus:
> +--
> +
> +Required properties:
> + - compatible = "nvdimm-bus";
> + - ranges;
> + A blank ranges property is required because the sub-nodes have
> + addresses in the system's physical address space.
> +
> +The use of a container bus is mainly to handle future expansion of the 
> binding. For
> +comparison the ACPI equivalent of this binding (NFIT) describes: Memory 
> regions, DIMM
> +control structures, Block mode DIMM control structures, interleave sets, and 
> more. Some
> +of these structures cross reference each other so everyone should be happier 
> if we keep
> +it relatively self contained.

Will adding any of these things need unit addresses and collide with the 
existing nodes below? IOW, at one level there's only 1 number space.

> +
> +Bindings for the region nodes:
> +-
> +
> +Required properties:
> + - compatible = "nvdimm-persistent" or "nvdimm-volatile"
> +
> + The "nvdimm-persistent" region type indicates that this memory 
> region
> + is actually a persistent region. The volatile type is mainly 
> useful
> + for testing and RAM disks that can persist across kexec.
> +
> + - reg = ;
> + The reg property should only contain one address range.
> +
> +Optional properties:
> + - Any relevant NUMA assocativity properties for the target platform.
> +
> +A complete example:
> +
> +
> +/ {
> + #size-cells = <1>;
> + #address-cells = <1>;
> +
> + nonvolatile-memory {
> + compatible = "nonvolatile-memory";
> + ranges;
> +
> + region@5000 {
> + compatible = "nvdimm-persistent";
> + reg = <0x5000 0x1000>;
> + };
> +
> + region@6000 {
> + compatible = "nvdimm-volatile";
> + reg = <0x6000 0x1000>;
> + };
> + };
> +};
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 65eff7857ec3..0350bf5a94d2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7875,6 +7875,14 @@ Q: 
> https://patchwork.kernel.org/project/linux-nvdimm/list/
>  S:   Supported
>  F:   drivers/nvdimm/pmem*
>  
> +LIBNVDIMM: DEVICETREE BINDINGS
> +M:   Oliver O'Halloran 
> +L:   linux-nvd...@lists.01.org
> +Q:   https://patchwork.kernel.org/project/linux-nvdimm/list/
> +S:   Supported
> +F:   drivers/nvdimm/of_nvdimm.c
> +F:   Documentation/devicetree/bindings/nvdimm/nvdimm-bus.txt
> +
>  LIBNVDIMM: NON-VOLATILE MEMORY DEVICE SUBSYSTEM
>  M:   Dan Williams 
>  L:   linux-nvd...@lists.01.org
> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> index 5bdd499b5f4f..72d147b55596 100644
> --- a/drivers/nvdimm/Kconfig
> +++ b/drivers/nvdimm/Kconfig
> @@ -102,4 +102,14 @@ config NVDIMM_DAX
>  
> Select Y if unsure
>  
> +config OF_NVDIMM
> + tristate 

Re: [PATCH v4 2/3] powerpc/modules: Don't try to restore r2 after a sibling call

2017-11-16 Thread Naveen N. Rao

Josh Poimboeuf wrote:

On Wed, Nov 15, 2017 at 02:58:33PM +0530, Naveen N. Rao wrote:

> +int instr_is_link_branch(unsigned int instr)
> +{
> +  return (instr_is_branch_iform(instr) || instr_is_branch_bform(instr)) &&
> + (instr & BRANCH_SET_LINK);
> +}
> +

Nitpicking here, but since we're not considering the other branch forms,
perhaps this can be renamed to instr_is_link_relative_branch() (or maybe
instr_is_relative_branch_link()), just so we're clear :)


My understanding is that the absolute/relative bit isn't a "form", but
rather a bit that can be set for either the b-form (conditional) or the
i-form (unconditional).  And the above function isn't checking the
absolute bit, so it isn't necessarily a relative branch.  Or did I miss
something?


Ah, good point. I was coming from the fact that we are only considering 
the i-form and b-form branches and not the lr/ctr/tar based branches, 
which are always absolute branches, but can also set the link register.


Thinking about this more, aren't we only interested in relative branches
here (for relocations), so can we actually filter out the absolute 
branches? Something like this?


int instr_is_relative_branch_link(unsigned int instr)
{
return ((instr_is_branch_iform(instr) || instr_is_branch_bform(instr)) 
&&
   !(instr & BRANCH_ABSOLUTE) && (instr & BRANCH_SET_LINK));
}


- Naveen
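
For reference, a self-contained version of the check that can be compiled
outside the kernel. The helper bodies are written out here on the assumption
that I-form branches use primary opcode 18 and B-form branches use primary
opcode 16, with AA and LK as the two low-order bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define BRANCH_SET_LINK  0x1u   /* LK bit */
#define BRANCH_ABSOLUTE  0x2u   /* AA bit */

/* The primary opcode is the top six bits of a 32-bit instruction. */
static unsigned int branch_opcode(uint32_t instr)
{
    return (instr >> 26) & 0x3f;
}

static bool instr_is_branch_iform(uint32_t instr)
{
    return branch_opcode(instr) == 18;   /* b, ba, bl, bla */
}

static bool instr_is_branch_bform(uint32_t instr)
{
    return branch_opcode(instr) == 16;   /* bc, bca, bcl, bcla */
}

/* True only for relative I-form/B-form branches that set the link register. */
static bool instr_is_relative_branch_link(uint32_t instr)
{
    return (instr_is_branch_iform(instr) || instr_is_branch_bform(instr)) &&
           !(instr & BRANCH_ABSOLUTE) && (instr & BRANCH_SET_LINK);
}
```

With these definitions a plain bl (0x48000001) matches, while bla
(0x48000003), b (0x48000000) and blrl (0x4e800021) do not.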




Re: [PATCH v2] powerpc/powernv: Add pci_reset_phbs parameter to issue a PHB reset

2017-11-16 Thread Guilherme G. Piccoli
On 11/16/2017 01:49 AM, Balbir Singh wrote:
> On Thu, Oct 26, 2017 at 2:27 AM, Guilherme G. Piccoli
>  wrote:
>> During a kdump kernel boot in PowerPC, we request a reset of the PHBs
>> to the FW. It makes sense, since if we are booting a kdump kernel it
>> means we had some trouble before and we cannot rely on the adapters'
>> health; they could be in a bad state, hence the reset is needed.
>>
>> But this reset is useful not only in kdump - there are situations,
>> especially when debugging drivers, in which we could break an adapter
>> in a way that requires such a reset. One could just go ahead and
>> reboot the machine, but many times doing kexec is much faster, and so
>> preferable to a full power cycle.
>>
>> This patch adds the pci_reset_phbs parameter to perform such reset
>> when desired by the user.
>>
> 
> Do we care to reset specific phbs or all of them? I guess all based on
> your description.

Exactly Balbir, it does reset all of them. We could add such
granularity, but I don't see much use for it.
But if somebody feels it's useful, we can change...

Thanks!


> 
> Balbir Singh.
> 



[GIT PULL] Please pull powerpc/linux.git powerpc-4.15-1 tag

2017-11-16 Thread Michael Ellerman
Hi Linus,

Please pull powerpc updates for 4.15. Apologies this is a bit late, we
had some late breaking fixes come in.

A bit of a small release, I suspect in part due to me travelling for KS.
But my backlog of patches to review is smaller than usual, so I think in
part folks just didn't send as much this cycle.

There are two conflicts you'll see. The first is in KVM code, against a fix
that went through Paul's tree, and the other is against one of the timer
changes.

The correct resolution in the KVM case is:
case KVM_CAP_PPC_HTM:
r = hv_enabled &&
(cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_HTM_COMP);

And for the timer change:
mod_timer(_timer, jiffies + topology_timer_secs * HZ);

cheers


The following changes since commit e19b205be43d11bff638cad4487008c48d21c103:

  Linux 4.14-rc2 (2017-09-24 16:38:56 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-4.15-1

for you to fetch changes up to 3ffa9d9e2a7c10127d8cbf91ea2be15390b450ed:

  powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature (2017-11-15 
14:25:42 +1100)


powerpc updates for 4.15

Non-highlights:

 - Five fixes for the >128T address space handling, both to fix bugs in our
   implementation and to bring the semantics exactly into line with x86.

Highlights:

 - Support for a new OPAL call on bare metal machines which gives us a true NMI
   (ie. is not masked by MSR[EE]=0) for debugging etc.

 - Support for Power9 DD2 in the CXL driver.

 - Improvements to machine check handling so that uncorrectable errors can be
   reported into the generic memory_failure() machinery.

 - Some fixes and improvements for VPHN, which is used under PowerVM to notify
   the Linux partition of topology changes.

 - Plumbing to enable TM (transactional memory) without suspend on some Power9
   processors (PPC_FEATURE2_HTM_NO_SUSPEND).

 - Support for emulating vector loads from cache-inhibited memory, on some
   Power9 revisions.

 - Disable the fast-endian switch "syscall" by default (behind a CONFIG), we
   believe it has never had any users.

 - A major rework of the API drivers use when initiating and waiting for long
   running operations performed by OPAL firmware, and changes to the
   powernv_flash driver to use the new API.

 - Several fixes for the handling of FP/VMX/VSX while processes are using
   transactional memory.

 - Optimisations of TLB range flushes when using the radix MMU on Power9.

 - Improvements to the VAS facility used to access coprocessors on Power9, and
   related improvements to the way the NX crypto driver handles requests.

 - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.

Thanks to:
  Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew Donnellan, Aneesh
  Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Breno Leitao,
  Christophe Leroy, Christophe Lombard, Cyril Bur, Frederic Barrat, Gautham R.
  Shenoy, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo Romero, Haren
  Myneni, Joel Stanley, Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami
  Hiramatsu, Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
  Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia Franco de
  Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee, Shriya, Stephen
  Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel Datwyler, Vaibhav Jain,
  Vaidyanathan Srinivasan, William A. Kennington III.


Alexey Kardashevskiy (2):
  powerpc/powernv: Reserve a hole which appears after enabling IOV
  powerpc/powernv/ioda: Remove explicit max window size check

Alistair Popple (2):
  powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
  powerpc/powernv/npu: Don't explicitly flush nmmu tlb

Allen Pais (3):
  powerpc/oprofile: Use setup_timer() helper
  powerpc/6xx: Use setup_timer() helper
  powerpc/powermac: Use setup_timer() helper

Andrew Donnellan (1):
  powerpc/configs: Enable I2C_CHARDEV for pseries and powernv

Aneesh Kumar K.V (1):
  powerpc/mm/hash: Add pr_fmt() to hash_utils64.c

Arnd Bergmann (1):
  powerpc/eeh: Stop using do_gettimeofday()

Balbir Singh (7):
  powerpc/mce: Remove unused function get_mce_fault_addr()
  powerpc/mce: Align the print of physical address better
  powerpc/mce: Hookup derror (load/store) UE errors
  powerpc/mce: Hookup ierror (instruction) UE errors
  powerpc/mce: hookup memory_failure for UE errors
  powerpc/xmon: Support dumping software pagetables
  powerpc/mm/radix: Fix crashes on Power9 DD1 with radix MMU and STRICT_RWX

Benjamin Herrenschmidt (2):
  powerpc/powernv: Rework EEH initialization on powernv
  powerpc: Fix DABR match on hash based systems

Breno Leitao (1):
  powerpc/xmon: Check 

STRICT_KERNEL_RWX on PPC32 is broken on PowerMac G4

2017-11-16 Thread Meelis Roos
For me, 4.13 worked and 4.14 hangs early during boot. Bisecting led to 
the following commit. I had enabled STRICT_KERNEL_RWX when the option 
first came up. With STRICT_KERNEL_RWX disabled, the same kernel boots fine.


95902e6c8864d39b09134dcaa3c99d8161d1deea is the first bad commit
commit 95902e6c8864d39b09134dcaa3c99d8161d1deea
Author: Christophe Leroy 
Date:   Wed Aug 2 15:51:05 2017 +0200

powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32

This patch implements STRICT_KERNEL_RWX on PPC32.

As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings
in order to allow page protection setup at the level of each page.

As BAT/LTLB mappings are deactivated, there might be a performance
impact.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 

:04 04 1eac3de57642856e31a914da2e1fe5368095f04b ee3634b9ae309852feebc69b8a6bd473944e212c M  arch


Config:

#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.13.0-rc2 Kernel Configuration
#
# CONFIG_PPC64 is not set

#
# Processor support
#
CONFIG_PPC_BOOK3S_32=y
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_E200 is not set
CONFIG_PPC_BOOK3S=y
CONFIG_6xx=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
# CONFIG_PPC_MM_SLICES is not set
CONFIG_PPC_HAVE_PMU_SUPPORT=y
# CONFIG_FORCE_SMP is not set
# CONFIG_SMP is not set
# CONFIG_PPC_DOORBELL is not set
CONFIG_VDSO32=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_PPC32=y
CONFIG_32BIT=y
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
# CONFIG_ARCH_DMA_ADDR_T_64BIT is not set
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_BITS_MIN=11
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=11
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
# CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
CONFIG_PPC=y
# CONFIG_GENERIC_CSUM is not set
CONFIG_EARLY_PRINTK=y
CONFIG_PANIC_TIMEOUT=180
CONFIG_GENERIC_NVRAM=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_PPC_UDBG_16550 is not set
# CONFIG_GENERIC_TBSYNC is not set
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_EPAPR_BOOT is not set
# CONFIG_DEFAULT_UIMAGE is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_KERNEL_GZIP=y
CONFIG_DEFAULT_HOSTNAME="pohl"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
# CONFIG_TASKS_RCU is not set
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_RCU_NEED_SEGCBLIST is not set
# CONFIG_BUILD_BIN2C is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_BPF is not set
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_SOCK_CGROUP_DATA is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
# 

Re: [PATCH 0/9] posix_clocks: Prepare syscalls for 64 bit time_t conversion

2017-11-16 Thread Thomas Gleixner
On Wed, 15 Nov 2017, Deepa Dinamani wrote:
> > I had one concern about x32, maybe we should check
> > for "COMPAT_USE_64BIT_TIME" before zeroing out the tv_nsec
> > bits.
> 
> Thanks, I think you are right. I had the check conditional on
> CONFIG_64BIT_TIME and then removed it as I forgot why I added it. :)
> 
> > Regarding CONFIG_COMPAT_TIME/CONFIG_64BIT_TIME, would
> > it help to just leave out that part for now and unconditionally
> > define '__kernel_timespec' as 'timespec' until we are ready to
> > convert the architectures?
> 
> Another approach would be to use separate configs:
> 
> 1. To indicate 64 bit time_t syscall support. This will be dependent
> on architectures as CONFIG_64BIT_TIME.
> We can delete this once all architectures have provided support for this.
> 
> 2. Another config (maybe COMPAT_32BIT_TIME?) to be introduced later,
> which will compile out all syscalls/ features that use 32 bit time_t.
> This can help build a y2038 safe kernel later.
> 
> Would this work for everyone?

Having extra config switches which are selectable by architectures and
removed when everything is converted is definitely the right way to go.

That allows you to gradually convert stuff w/o inflicting wreckage all over
the place.
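
To make the x32 concern quoted above concrete, here is a minimal userspace
sketch, not the code from the series. It assumes a timespec with 64-bit
fields and shows why the upper bits of tv_nsec would be cleared only for
compat callers that do not already use 64-bit time. All names prefixed with
sketch_ are hypothetical; the two flags stand in for whatever the kernel
would derive from in_compat_syscall() and COMPAT_USE_64BIT_TIME.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the proposed __kernel_timespec: both fields
 * are 64 bits wide regardless of the calling ABI.
 */
struct sketch_kernel_timespec {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/*
 * Hypothetical per-call context. These are plain flags (an assumption
 * made for this sketch) so that the logic can run in userspace.
 */
struct sketch_abi {
	bool compat_task;       /* caller is a 32-bit/compat task */
	bool compat_64bit_time; /* compat ABI already has 64-bit time (x32-like) */
};

/*
 * A 32-bit caller only fills the low 32 bits of tv_nsec; the upper half
 * is padding and may hold garbage, so clear it. An x32-like caller has a
 * real 64-bit tv_nsec, so leave it alone and let validation reject it.
 */
static struct sketch_kernel_timespec
sketch_sanitize(struct sketch_kernel_timespec ts, struct sketch_abi abi)
{
	if (abi.compat_task && !abi.compat_64bit_time)
		ts.tv_nsec &= 0xFFFFFFFFLL;
	return ts;
}

/* Simplified validity rule: tv_sec >= 0 and 0 <= tv_nsec < 1e9. */
static bool sketch_valid(struct sketch_kernel_timespec ts)
{
	return ts.tv_sec >= 0 && ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000LL;
}
```

With garbage in the upper half of tv_nsec, the plain 32-bit caller ends up
with a valid value after sanitizing, while the x32-like caller keeps the
full 64-bit value and fails validation, which is the behaviour the extra
check is meant to preserve.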

Thanks,

tglx