Re: [PATCH v4] selftests/powerpc: Fix pkey syscall redefinitions

2020-08-04 Thread Sachin Sant



> On 04-Aug-2020, at 11:01 PM, Sandipan Das  wrote:
> 
> On distros using older glibc versions, the pkey tests
> encounter build failures due to redefinition of the
> pkey syscall numbers.
> 
> For compatibility, commit 743f3544fffb added a wrapper
> for the gettid() syscall and included syscall.h if the
> version of glibc used is older than 2.30. This leads
> to different definitions of SYS_pkey_* as the ones in
> the pkey test header set numeric constants where as the
> ones from syscall.h reuse __NR_pkey_*. The compiler
> complains about redefinitions since they are different.
> 
> This replaces SYS_pkey_* definitions with __NR_pkey_*
> such that the definitions in both syscall.h and pkeys.h
> are alike. This way, if syscall.h has to be included
> for compatibility reasons, builds will still succeed.
> 
> Fixes: 743f3544fffb ("selftests/powerpc: Add wrapper for gettid")
> Reported-by: Sachin Sant 
> Suggested-by: David Laight 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Sandipan Das 

Tested-by: Sachin Sant 

Thanks
-Sachin


[powerpc:next] BUILD SUCCESS 0c83b277ada72b585e6a3e52b067669df15bcedb

2020-08-04 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
next
branch HEAD: 0c83b277ada72b585e6a3e52b067669df15bcedb  powerpc: Fix circular 
dependency between percpu.h and mmu.h

elapsed time: 925m

configs tested: 112
configs skipped: 7

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
sh   se7750_defconfig
mipsgpr_defconfig
arm   spear13xx_defconfig
mips  maltasmvp_defconfig
sh   se7206_defconfig
arm   spitz_defconfig
armneponset_defconfig
arm  colibri_pxa300_defconfig
arm   netwinder_defconfig
arm   versatile_defconfig
powerpcmpc7448_hpc2_defconfig
mipsbcm63xx_defconfig
sh  r7785rp_defconfig
shshmin_defconfig
armmulti_v7_defconfig
powerpc  chrp32_defconfig
powerpc powernv_defconfig
sh  sdk7786_defconfig
arm eseries_pxa_defconfig
m68k  atari_defconfig
powerpc   holly_defconfig
armkeystone_defconfig
powerpc mpc83xx_defconfig
sh ap325rxa_defconfig
arm mxs_defconfig
shtitan_defconfig
arc  alldefconfig
m68kmvme147_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
c6x  allyesconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a006-20200804
x86_64   randconfig-a001-20200804
x86_64   randconfig-a004-20200804
x86_64   randconfig-a005-20200804
x86_64   randconfig-a002-20200804
x86_64   randconfig-a003-20200804
i386 randconfig-a005-20200805
i386 randconfig-a004-20200805
i386 randconfig-a001-20200805
i386 randconfig-a003-20200805
i386 randconfig-a002-20200805
i386 randconfig-a006-20200805
i386 randconfig-a005-20200804
i386 randconfig-a004-20200804
i386 randconfig-a006-20200804
i386 randconfig-a001-20200804
i386 randconfig-a003-20200804
i386 randconfig-a002-20200804
x86_64   randconfig-a013-20200805
x86_64   randconfig-a011-20200805
x86_64   randconfig-a012-20200805
x86_64   randconfig-a016-20200805
x86_64   randconfig-a015-20200805
x86_64   randconfig-a014-20200805
i386 randconfig-a011-20200804
i386 randconfig-a012-20200804
i386 randconfig-a013-20200804
i386 randconfig-a014-20200804
i386 randconfig-a015-20200804
i386 randconfig-a016-20200804
i386 randconfig-a014-20200805
i386 randconfig-a015-20200805
i386 randconfig-a016-20200805
i386 randconfig-a011-20200805
i386 randconfig

[powerpc:merge] BUILD SUCCESS 14fd53d1e5ee7350564cac75e336f8c0dea13bc9

2020-08-04 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
merge
branch HEAD: 14fd53d1e5ee7350564cac75e336f8c0dea13bc9  Automatic merge of 
'master', 'next' and 'fixes' (2020-08-04 23:16)

elapsed time: 927m

configs tested: 108
configs skipped: 7

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
sh   se7750_defconfig
mipsgpr_defconfig
arm   spear13xx_defconfig
mips  maltasmvp_defconfig
sh   se7206_defconfig
arm   spitz_defconfig
armneponset_defconfig
armshmobile_defconfig
powerpc  pmac32_defconfig
shsh7785lcr_defconfig
mips cobalt_defconfig
mips rt305x_defconfig
mips pnx8335_stb225_defconfig
ia64  tiger_defconfig
arm  ep93xx_defconfig
arm  colibri_pxa300_defconfig
arm   netwinder_defconfig
arm   versatile_defconfig
powerpcmpc7448_hpc2_defconfig
mipsbcm63xx_defconfig
sh  r7785rp_defconfig
shshmin_defconfig
armmulti_v7_defconfig
mips tb0287_defconfig
sh  rsk7269_defconfig
mipsjmr3927_defconfig
xtensa  iss_defconfig
powerpc  chrp32_defconfig
powerpc powernv_defconfig
sh  sdk7786_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
c6x  allyesconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a005-20200804
i386 randconfig-a004-20200804
i386 randconfig-a001-20200804
i386 randconfig-a003-20200804
i386 randconfig-a002-20200804
i386 randconfig-a006-20200804
i386 randconfig-a005-20200805
i386 randconfig-a004-20200805
i386 randconfig-a001-20200805
i386 randconfig-a003-20200805
i386 randconfig-a002-20200805
i386 randconfig-a006-20200805
x86_64   randconfig-a013-20200805
x86_64   randconfig-a011-20200805
x86_64   randconfig-a012-20200805
x86_64   randconfig-a016-20200805
x86_64   randconfig-a015-20200805
x86_64   randconfig-a014-20200805
i386 randconfig-a011-20200805
i386 randconfig-a012-20200805
i386 randconfig-a013-20200805
i386 randconfig-a014-20200805
i386 randconfig-a015-20200805
i386 randconfig-a016-20200805
x86_64   randconfig-a006-20200804
x86_64   randconfig-a001-20200804
x86_64   randconfig-a004-20200804
x86_64   randconfig-a005-20200804
x86_64   randconfig-a002-20200804
x86_64   randconfig-a003-20200804
riscvallyesconfig
riscv allnoconfig
riscv

Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death

2020-08-04 Thread Michael Roth
Quoting Michael Ellerman (2020-08-04 22:07:08)
> Greg Kurz  writes:
> > On Tue, 04 Aug 2020 23:35:10 +1000
> > Michael Ellerman  wrote:
> >> There is a bit of history to this code, but not in a good way :)
> >> 
> >> Michael Roth  writes:
> >> > For a power9 KVM guest with XIVE enabled, running a test loop
> >> > where we hotplug 384 vcpus and then unplug them, the following traces
> >> > can be seen (generally within a few loops) either from the unplugged
> >> > vcpu:
> >> >
> >> >   [ 1767.353447] cpu 65 (hwid 65) Ready to die...
> >> >   [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
> >> >   [ 1767.952311] list_del corruption. next->prev should be 
> >> > c00a02470208, but was c00a02470048
> >> ...
> >> >
> >> > At that point the worker thread assumes the unplugged CPU is in some
> >> > unknown/dead state and procedes with the cleanup, causing the race with
> >> > the XIVE cleanup code executed by the unplugged CPU.
> >> >
> >> > Fix this by inserting an msleep() after each RTAS call to avoid
> >> 
> >> We previously had an msleep(), but it was removed:
> >> 
> >>   b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
> >
> > Ah, I hadn't seen that one...
> >
> >> > pseries_cpu_die() returning prematurely, and double the number of
> >> > attempts so we wait at least a total of 5 seconds. While this isn't an
> >> > ideal solution, it is similar to how we dealt with a similar issue for
> >> > cede_offline mode in the past (940ce422a3).
> >> 
> >> Thiago tried to fix this previously but there was a bit of discussion
> >> that didn't quite resolve:
> >> 
> >>   
> >> https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauer...@linux.ibm.com/
> >
> > Yeah it appears that the motivation at the time was to make the "Querying 
> > DEAD?"
> > messages to disappear and to avoid potentially concurrent calls to 
> > rtas-stop-self
> > which is prohibited by PAPR... not fixing actual crashes.
> 
> I'm pretty sure at one point we were triggering crashes *in* RTAS via
> this path, I think that got resolved.
> 
> >> Spinning forever seems like a bad idea, but as has been demonstrated at
> >> least twice now, continuing when we don't know the state of the other
> >> CPU can lead to straight up crashes.
> >> 
> >> So I think I'm persuaded that it's preferable to have the kernel stuck
> >> spinning rather than oopsing.
> >> 
> >
> > +1
> >
> >> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> >> first instinct is no, if we're stuck here for 20s a stack trace would be
> >> good. But then we will probably hit that on some big and/or heavily
> >> loaded machine.
> >> 
> >> So possibly we should call cond_resched() but have some custom logic in
> >> the loop to print a warning if we are stuck for more than some
> >> sufficiently long amount of time.
> >
> > How long should that be ?
> 
> Yeah good question.
> 
> I guess step one would be seeing how long it can take on the 384 vcpu
> machine. And we can probably test on some other big machines.
> 
> Hopefully Nathan can give us some idea of how long he's seen it take on
> large systems? I know he was concerned about the 20s timeout of the
> softlockup detector.
> 
> Maybe a minute or two?

Hmm, so I took a stab at this where I called cond_resched() after
every 5 seconds of polling and printed a warning at the same time (FWIW
that doesn't seem to trigger any warnings on a loaded 96-core mihawk
system using KVM running the 384vcpu unplug loop)

But it sounds like that's not quite what you had in mind. How frequently
do you think we should call cond_resched()? Maybe after 25 iterations
of polling smp_query_cpu_stopped() to keep original behavior somewhat
similar?

I'll let the current patch run on the mihawk system overnight in the
meantime so we at least have that data point, but would be good to
know what things look like a large pHyp machine.

Thanks!

> 
> >> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE 
> >> > interrupt controller")
> >> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
> >> 
> >> This is not public.
> >
> > I'll have a look at changing that.
> 
> Thanks.
> 
> cheers


Re: [PATCH v2 13/17] x86/setup: simplify initrd relocation and reservation

2020-08-04 Thread Baoquan He
On 08/02/20 at 07:35pm, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> Currently, initrd image is reserved very early during setup and then it
> might be relocated and re-reserved after the initial physical memory
> mapping is created. The "late" reservation of memblock verifies that mapped
> memory size exceeds the size of initrd, the checks whether the relocation
  ~ then?
> required and, if yes, relocates inirtd to a new memory allocated from
> memblock and frees the old location.
> 
> The check for memory size is excessive as memblock allocation will anyway
> fail if there is not enough memory. Besides, there is no point to allocate
> memory from memblock using memblock_find_in_range() + memblock_reserve()
> when there exists memblock_phys_alloc_range() with required functionality.
> 
> Remove the redundant check and simplify memblock allocation.
> 
> Signed-off-by: Mike Rapoport 
> ---
>  arch/x86/kernel/setup.c | 16 +++-
>  1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index a3767e74c758..d8de4053c5e8 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -262,16 +262,12 @@ static void __init relocate_initrd(void)
>   u64 area_size = PAGE_ALIGN(ramdisk_size);
>  
>   /* We need to move the initrd down into directly mapped mem */
> - relocated_ramdisk = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped),
> -area_size, PAGE_SIZE);
> -
> + relocated_ramdisk = memblock_phys_alloc_range(area_size, PAGE_SIZE, 0,
> +   PFN_PHYS(max_pfn_mapped));
>   if (!relocated_ramdisk)
>   panic("Cannot find place for new RAMDISK of size %lld\n",
> ramdisk_size);
>  
> - /* Note: this includes all the mem currently occupied by
> -the initrd, we rely on that fact to keep the data intact. */
> - memblock_reserve(relocated_ramdisk, area_size);
>   initrd_start = relocated_ramdisk + PAGE_OFFSET;
>   initrd_end   = initrd_start + ramdisk_size;
>   printk(KERN_INFO "Allocated new RAMDISK: [mem %#010llx-%#010llx]\n",
> @@ -298,13 +294,13 @@ static void __init early_reserve_initrd(void)
>  
>   memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
>  }
> +
>  static void __init reserve_initrd(void)
>  {
>   /* Assume only end is not page aligned */
>   u64 ramdisk_image = get_ramdisk_image();
>   u64 ramdisk_size  = get_ramdisk_size();
>   u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
> - u64 mapped_size;
>  
>   if (!boot_params.hdr.type_of_loader ||
>   !ramdisk_image || !ramdisk_size)
> @@ -312,12 +308,6 @@ static void __init reserve_initrd(void)
>  
>   initrd_start = 0;
>  
> - mapped_size = memblock_mem_size(max_pfn_mapped);
> - if (ramdisk_size >= (mapped_size>>1))
> - panic("initrd too large to handle, "
> -"disabling initrd (%lld needed, %lld available)\n",
> -ramdisk_size, mapped_size>>1);

Reviewed-by: Baoquan He 

> -
>   printk(KERN_INFO "RAMDISK: [mem %#010llx-%#010llx]\n", ramdisk_image,
>   ramdisk_end - 1);
>  
> -- 
> 2.26.2
> 



Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death

2020-08-04 Thread Thiago Jung Bauermann


Michael Ellerman  writes:

> Greg Kurz  writes:
>> On Tue, 04 Aug 2020 23:35:10 +1000
>> Michael Ellerman  wrote:
>>> There is a bit of history to this code, but not in a good way :)
>>>
>>> Michael Roth  writes:
>>> > For a power9 KVM guest with XIVE enabled, running a test loop
>>> > where we hotplug 384 vcpus and then unplug them, the following traces
>>> > can be seen (generally within a few loops) either from the unplugged
>>> > vcpu:
>>> >
>>> >   [ 1767.353447] cpu 65 (hwid 65) Ready to die...
>>> >   [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
>>> >   [ 1767.952311] list_del corruption. next->prev should be 
>>> > c00a02470208, but was c00a02470048
>>> ...
>>> >
>>> > At that point the worker thread assumes the unplugged CPU is in some
>>> > unknown/dead state and procedes with the cleanup, causing the race with
>>> > the XIVE cleanup code executed by the unplugged CPU.
>>> >
>>> > Fix this by inserting an msleep() after each RTAS call to avoid
>>>
>>> We previously had an msleep(), but it was removed:
>>>
>>>   b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
>>
>> Ah, I hadn't seen that one...
>>
>>> > pseries_cpu_die() returning prematurely, and double the number of
>>> > attempts so we wait at least a total of 5 seconds. While this isn't an
>>> > ideal solution, it is similar to how we dealt with a similar issue for
>>> > cede_offline mode in the past (940ce422a3).
>>>
>>> Thiago tried to fix this previously but there was a bit of discussion
>>> that didn't quite resolve:
>>>
>>>   
>>> https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauer...@linux.ibm.com/
>>
>> Yeah it appears that the motivation at the time was to make the "Querying 
>> DEAD?"
>> messages to disappear and to avoid potentially concurrent calls to 
>> rtas-stop-self
>> which is prohibited by PAPR... not fixing actual crashes.
>
> I'm pretty sure at one point we were triggering crashes *in* RTAS via
> this path, I think that got resolved.

Yes, pHyp's RTAS now tolerates concurrent calls to stop-self. The
original bug that was reported when I worked on this ended in an RTAS
crash because of this timeout. The crash was fixed then.

--
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v2 11/17] arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()

2020-08-04 Thread Baoquan He
On 08/02/20 at 07:35pm, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> There are several occurrences of the following pattern:
> 
>   for_each_memblock(memory, reg) {
>   start_pfn = memblock_region_memory_base_pfn(reg);
>   end_pfn = memblock_region_memory_end_pfn(reg);
> 
>   /* do something with start_pfn and end_pfn */
>   }
> 
> Rather than iterate over all memblock.memory regions and each time query
> for their start and end PFNs, use for_each_mem_pfn_range() iterator to get
> simpler and clearer code.
> 
> Signed-off-by: Mike Rapoport 
> ---
>  arch/arm/mm/init.c   | 11 ---
>  arch/arm64/mm/init.c | 11 ---
>  arch/powerpc/kernel/fadump.c | 11 ++-
>  arch/powerpc/mm/mem.c| 15 ---
>  arch/powerpc/mm/numa.c   |  7 ++-
>  arch/s390/mm/page-states.c   |  6 ++
>  arch/sh/mm/init.c|  9 +++--
>  mm/memblock.c|  6 ++
>  mm/sparse.c  | 10 --
>  9 files changed, 35 insertions(+), 51 deletions(-)
> 

Reviewed-by: Baoquan He 



Re: [PATCH v2 02/17] dma-contiguous: simplify cma_early_percent_memory()

2020-08-04 Thread Baoquan He
On 08/02/20 at 07:35pm, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> The memory size calculation in cma_early_percent_memory() traverses
> memblock.memory rather than simply call memblock_phys_mem_size(). The
> comment in that function suggests that at some point there should have been
> call to memblock_analyze() before memblock_phys_mem_size() could be used.
> As of now, there is no memblock_analyze() at all and
> memblock_phys_mem_size() can be used as soon as cold-plug memory is
> registerd with memblock.
> 
> Replace loop over memblock.memory with a call to memblock_phys_mem_size().
> 
> Signed-off-by: Mike Rapoport 
> Reviewed-by: Christoph Hellwig 
> ---
>  kernel/dma/contiguous.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> index 15bc5026c485..1992afd8ca7b 100644
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -73,16 +73,7 @@ early_param("cma", early_cma);
>  
>  static phys_addr_t __init __maybe_unused cma_early_percent_memory(void)
>  {
> - struct memblock_region *reg;
> - unsigned long total_pages = 0;
> -
> - /*
> -  * We cannot use memblock_phys_mem_size() here, because
> -  * memblock_analyze() has not been called yet.
> -  */
> - for_each_memblock(memory, reg)
> - total_pages += memblock_region_memory_end_pfn(reg) -
> -memblock_region_memory_base_pfn(reg);
> + unsigned long total_pages = PHYS_PFN(memblock_phys_mem_size());

Reviewed-by: Baoquan He 

>  
>   return (total_pages * CONFIG_CMA_SIZE_PERCENTAGE / 100) << PAGE_SHIFT;
>  }
> -- 
> 2.26.2
> 



Re: [PATCH v5 4/4] powerpc/pseries/iommu: Allow bigger 64bit window by removing default DMA window

2020-08-04 Thread Alexey Kardashevskiy



On 05/08/2020 13:04, Leonardo Bras wrote:
> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
> default DMA window for the device, before attempting to configure a DDW,
> in order to make the maximum resources available for the next DDW to be
> created.
> 
> This is a requirement for using DDW on devices in which hypervisor
> allows only one DMA window.
> 
> If setting up a new DDW fails anywhere after the removal of this
> default DMA window, it's needed to restore the default DMA window.
> For this, an implementation of ibm,reset-pe-dma-windows rtas call is
> needed:
> 
> Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> ibm,ddw-extensions. The first extension available (index 2) carries the
> token for ibm,reset-pe-dma-windows rtas call, which is used to restore
> the default DMA window for a device, if it has been deleted.
> 
> It does so by resetting the TCE table allocation for the PE to it's
> boot time value, available in "ibm,dma-window" device tree node.
> 
> Signed-off-by: Leonardo Bras 
> Tested-by: David Dai 


Reviewed-by: Alexey Kardashevskiy 



> ---
>  arch/powerpc/platforms/pseries/iommu.c | 73 +++---
>  1 file changed, 66 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 4e33147825cc..e4198700ed1a 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1066,6 +1066,38 @@ static phys_addr_t ddw_memory_hotplug_max(void)
>   return max_addr;
>  }
>  
> +/*
> + * Platforms supporting the DDW option starting with LoPAR level 2.7 
> implement
> + * ibm,ddw-extensions, which carries the rtas token for
> + * ibm,reset-pe-dma-windows.
> + * That rtas-call can be used to restore the default DMA window for the 
> device.
> + */
> +static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
> +{
> + int ret;
> + u32 cfg_addr, reset_dma_win;
> + u64 buid;
> + struct device_node *dn;
> + struct pci_dn *pdn;
> +
> + ret = ddw_read_ext(par_dn, DDW_EXT_RESET_DMA_WIN, _dma_win);
> + if (ret)
> + return;
> +
> + dn = pci_device_to_OF_node(dev);
> + pdn = PCI_DN(dn);
> + buid = pdn->phb->buid;
> + cfg_addr = (pdn->busno << 16) | (pdn->devfn << 8);
> +
> + ret = rtas_call(reset_dma_win, 3, 1, NULL, cfg_addr, BUID_HI(buid),
> + BUID_LO(buid));
> + if (ret)
> + dev_info(>dev,
> +  "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
> +  reset_dma_win, cfg_addr, BUID_HI(buid), BUID_LO(buid),
> +  ret);
> +}
> +
>  /*
>   * If the PE supports dynamic dma windows, and there is space for a table
>   * that can map all pages in a linear offset, then setup such a table,
> @@ -1090,6 +1122,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>   struct property *win64;
>   struct dynamic_dma_window_prop *ddwprop;
>   struct failed_ddw_pdn *fpdn;
> + bool default_win_removed = false;
>  
>   mutex_lock(_window_init_mutex);
>  
> @@ -1133,14 +1166,38 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>   if (ret != 0)
>   goto out_failed;
>  
> + /*
> +  * If there is no window available, remove the default DMA window,
> +  * if it's present. This will make all the resources available to the
> +  * new DDW window.
> +  * If anything fails after this, we need to restore it, so also check
> +  * for extensions presence.
> +  */
>   if (query.windows_available == 0) {
> - /*
> -  * no additional windows are available for this device.
> -  * We might be able to reallocate the existing window,
> -  * trading in for a larger page size.
> -  */
> - dev_dbg(>dev, "no free dynamic windows");
> - goto out_failed;
> + struct property *default_win;
> + int reset_win_ext;
> +
> + default_win = of_find_property(pdn, "ibm,dma-window", NULL);
> + if (!default_win)
> + goto out_failed;
> +
> + reset_win_ext = ddw_read_ext(pdn, DDW_EXT_RESET_DMA_WIN, NULL);
> + if (reset_win_ext)
> + goto out_failed;
> +
> + remove_dma_window(pdn, ddw_avail, default_win);
> + default_win_removed = true;
> +
> + /* Query again, to check if the window is available */
> + ret = query_ddw(dev, ddw_avail, , pdn);
> + if (ret != 0)
> + goto out_failed;
> +
> + if (query.windows_available == 0) {
> + /* no windows are available for this device. */
> + dev_dbg(>dev, "no free dynamic windows");
> + goto out_failed;
> +   

Re: [PATCH v5 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window

2020-08-04 Thread Alexey Kardashevskiy



On 05/08/2020 13:04, Leonardo Bras wrote:
> Move the window-removing part of remove_ddw into a new function
> (remove_dma_window), so it can be used to remove other DMA windows.
> 
> It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
> property, like the default DMA window from the device, which uses
> "ibm,dma-window".
> 
> Signed-off-by: Leonardo Bras 
> Tested-by: David Dai 


Reviewed-by: Alexey Kardashevskiy 



> ---
>  arch/powerpc/platforms/pseries/iommu.c | 45 +++---
>  1 file changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 1a933c4e8bba..4e33147825cc 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -781,25 +781,14 @@ static int __init disable_ddw_setup(char *str)
>  
>  early_param("disable_ddw", disable_ddw_setup);
>  
> -static void remove_ddw(struct device_node *np, bool remove_prop)
> +static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
> +   struct property *win)
>  {
>   struct dynamic_dma_window_prop *dwp;
> - struct property *win64;
> - u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   u64 liobn;
> - int ret = 0;
> -
> - ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> -  _avail[0], DDW_APPLICABLE_SIZE);
> -
> - win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> - if (!win64)
> - return;
> -
> - if (ret || win64->length < sizeof(*dwp))
> - goto delprop;
> + int ret;
>  
> - dwp = win64->value;
> + dwp = win->value;
>   liobn = (u64)be32_to_cpu(dwp->liobn);
>  
>   /* clear the whole window, note the arg is in kernel pages */
> @@ -821,10 +810,30 @@ static void remove_ddw(struct device_node *np, bool 
> remove_prop)
>   pr_debug("%pOF: successfully removed direct window: rtas 
> returned "
>   "%d to ibm,remove-pe-dma-window(%x) %llx\n",
>   np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
> +}
> +
> +static void remove_ddw(struct device_node *np, bool remove_prop)
> +{
> + struct property *win;
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
> + int ret = 0;
> +
> + ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> +  _avail[0], DDW_APPLICABLE_SIZE);
> + if (ret)
> + return;
> +
> + win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> + if (!win)
> + return;
> +
> + if (win->length >= sizeof(struct dynamic_dma_window_prop))
> + remove_dma_window(np, ddw_avail, win);
> +
> + if (!remove_prop)
> + return;
>  
> -delprop:
> - if (remove_prop)
> - ret = of_remove_property(np, win64);
> + ret = of_remove_property(np, win);
>   if (ret)
>   pr_warn("%pOF: failed to remove direct window property: %d\n",
>   np, ret);
> 

-- 
Alexey


Re: [PATCH v5 2/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows

2020-08-04 Thread Alexey Kardashevskiy



On 05/08/2020 13:04, Leonardo Bras wrote:
> From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
> outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
> 
> This change of output size is meant to expand the address size of
> largest_available_block PE TCE from 32-bit to 64-bit, which ends up
> shifting page_size and migration_capable.
> 
> This ends up requiring the update of
> ddw_query_response->largest_available_block from u32 to u64, and manually
> assigning the values from the buffer into this struct, according to
> output size.
> 
> Also, a routine was created for helping reading the ddw extensions as
> suggested by LoPAR: First reading the size of the extension array from
> index 0, checking if the property exists, and then returning it's value.
> 
> Signed-off-by: Leonardo Bras 
> Tested-by: David Dai 


Reviewed-by: Alexey Kardashevskiy 



> ---
>  arch/powerpc/platforms/pseries/iommu.c | 91 +++---
>  1 file changed, 81 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index ac0d6376bdad..1a933c4e8bba 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -47,6 +47,12 @@ enum {
>   DDW_APPLICABLE_SIZE
>  };
>  
> +enum {
> + DDW_EXT_SIZE = 0,
> + DDW_EXT_RESET_DMA_WIN = 1,
> + DDW_EXT_QUERY_OUT_SIZE = 2
> +};
> +
>  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
>  {
>   struct iommu_table_group *table_group;
> @@ -342,7 +348,7 @@ struct direct_window {
>  /* Dynamic DMA Window support */
>  struct ddw_query_response {
>   u32 windows_available;
> - u32 largest_available_block;
> + u64 largest_available_block;
>   u32 page_size;
>   u32 migration_capable;
>  };
> @@ -877,14 +883,62 @@ static int find_existing_ddw_windows(void)
>  }
>  machine_arch_initcall(pseries, find_existing_ddw_windows);
>  
> +/**
> + * ddw_read_ext - Get the value of an DDW extension
> + * @np:  device node from which the extension value is to be 
> read.
> + * @extnum:  index number of the extension.
> + * @value:   pointer to return value, modified when extension is available.
> + *
> + * Checks if "ibm,ddw-extensions" exists for this node, and get the value
> + * on index 'extnum'.
> + * It can be used only to check if a property exists, passing value == NULL.
> + *
> + * Returns:
> + *   0 if extension successfully read
> + *   -EINVAL if the "ibm,ddw-extensions" does not exist,
> + *   -ENODATA if "ibm,ddw-extensions" does not have a value, and
> + *   -EOVERFLOW if "ibm,ddw-extensions" does not contain this extension.
> + */
> +static inline int ddw_read_ext(const struct device_node *np, int extnum,
> +u32 *value)
> +{
> + static const char propname[] = "ibm,ddw-extensions";
> + u32 count;
> + int ret;
> +
> + ret = of_property_read_u32_index(np, propname, DDW_EXT_SIZE, );
> + if (ret)
> + return ret;
> +
> + if (count < extnum)
> + return -EOVERFLOW;
> +
> + if (!value)
> + value = 
> +
> + return of_property_read_u32_index(np, propname, extnum, value);
> +}
> +
>  static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> - struct ddw_query_response *query)
> +  struct ddw_query_response *query,
> +  struct device_node *parent)
>  {
>   struct device_node *dn;
>   struct pci_dn *pdn;
> - u32 cfg_addr;
> + u32 cfg_addr, ext_query, query_out[5];
>   u64 buid;
> - int ret;
> + int ret, out_sz;
> +
> + /*
> +  * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many
> +  * output parameters ibm,query-pe-dma-windows will have, ranging from
> +  * 5 to 6.
> +  */
> + ret = ddw_read_ext(parent, DDW_EXT_QUERY_OUT_SIZE, _query);
> + if (!ret && ext_query == 1)
> + out_sz = 6;
> + else
> + out_sz = 5;
>  
>   /*
>* Get the config address and phb buid of the PE window.
> @@ -897,11 +951,28 @@ static int query_ddw(struct pci_dev *dev, const u32 
> *ddw_avail,
>   buid = pdn->phb->buid;
>   cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>  
> - ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32 *)query,
> + ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, out_sz, query_out,
>   cfg_addr, BUID_HI(buid), BUID_LO(buid));
> - dev_info(>dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> - " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
> -  BUID_HI(buid), BUID_LO(buid), ret);
> + dev_info(>dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned 
> %d\n",
> +  ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr, BUID_HI(buid),
> +  BUID_LO(buid), ret);
> +
> + switch (out_sz) {
> + case 5:
> + 

Re: [PATCH v5 1/4] powerpc/pseries/iommu: Create defines for operations in ibm,ddw-applicable

2020-08-04 Thread Alexey Kardashevskiy



On 05/08/2020 13:04, Leonardo Bras wrote:
> Create defines to help handling ibm,ddw-applicable values, avoiding
> confusion about the index of given operations.
> 
> Signed-off-by: Leonardo Bras 
> Tested-by: David Dai 

Reviewed-by: Alexey Kardashevskiy 



> ---
>  arch/powerpc/platforms/pseries/iommu.c | 43 --
>  1 file changed, 26 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 6d47b4a3ce39..ac0d6376bdad 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -39,6 +39,14 @@
>  
>  #include "pseries.h"
>  
> +enum {
> + DDW_QUERY_PE_DMA_WIN  = 0,
> + DDW_CREATE_PE_DMA_WIN = 1,
> + DDW_REMOVE_PE_DMA_WIN = 2,
> +
> + DDW_APPLICABLE_SIZE
> +};
> +
>  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
>  {
>   struct iommu_table_group *table_group;
> @@ -771,12 +779,12 @@ static void remove_ddw(struct device_node *np, bool 
> remove_prop)
>  {
>   struct dynamic_dma_window_prop *dwp;
>   struct property *win64;
> - u32 ddw_avail[3];
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   u64 liobn;
>   int ret = 0;
>  
>   ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> -  _avail[0], 3);
> +  _avail[0], DDW_APPLICABLE_SIZE);
>  
>   win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
>   if (!win64)
> @@ -798,15 +806,15 @@ static void remove_ddw(struct device_node *np, bool 
> remove_prop)
>   pr_debug("%pOF successfully cleared tces in window.\n",
>np);
>  
> - ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
> + ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
>   if (ret)
>   pr_warn("%pOF: failed to remove direct window: rtas returned "
>   "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>   else
>   pr_debug("%pOF: successfully removed direct window: rtas 
> returned "
>   "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>  
>  delprop:
>   if (remove_prop)
> @@ -889,11 +897,11 @@ static int query_ddw(struct pci_dev *dev, const u32 
> *ddw_avail,
>   buid = pdn->phb->buid;
>   cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>  
> - ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
> -   cfg_addr, BUID_HI(buid), BUID_LO(buid));
> + ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32 *)query,
> + cfg_addr, BUID_HI(buid), BUID_LO(buid));
>   dev_info(>dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> - " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
> - BUID_LO(buid), ret);
> + " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
> +  BUID_HI(buid), BUID_LO(buid), ret);
>   return ret;
>  }
>  
> @@ -920,15 +928,16 @@ static int create_ddw(struct pci_dev *dev, const u32 
> *ddw_avail,
>  
>   do {
>   /* extra outputs are LIOBN and dma-addr (hi, lo) */
> - ret = rtas_call(ddw_avail[1], 5, 4, (u32 *)create,
> - cfg_addr, BUID_HI(buid), BUID_LO(buid),
> - page_shift, window_shift);
> + ret = rtas_call(ddw_avail[DDW_CREATE_PE_DMA_WIN], 5, 4,
> + (u32 *)create, cfg_addr, BUID_HI(buid),
> + BUID_LO(buid), page_shift, window_shift);
>   } while (rtas_busy_delay(ret));
>   dev_info(>dev,
>   "ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
> - "(liobn = 0x%x starting addr = %x %x)\n", ddw_avail[1],
> -  cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
> -  window_shift, ret, create->liobn, create->addr_hi, 
> create->addr_lo);
> + "(liobn = 0x%x starting addr = %x %x)\n",
> +  ddw_avail[DDW_CREATE_PE_DMA_WIN], cfg_addr, BUID_HI(buid),
> +  BUID_LO(buid), page_shift, window_shift, ret, create->liobn,
> +  create->addr_hi, create->addr_lo);
>  
>   return ret;
>  }
> @@ -996,7 +1005,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>   int page_shift;
>   u64 dma_addr, max_addr;
>   struct device_node *dn;
> - u32 ddw_avail[3];
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   struct direct_window *window;
>   struct property *win64;
>   struct dynamic_dma_window_prop *ddwprop;
> @@ -1029,7 +1038,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
> device_node 

Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death

2020-08-04 Thread Michael Ellerman
Greg Kurz  writes:
> On Tue, 04 Aug 2020 23:35:10 +1000
> Michael Ellerman  wrote:
>> There is a bit of history to this code, but not in a good way :)
>> 
>> Michael Roth  writes:
>> > For a power9 KVM guest with XIVE enabled, running a test loop
>> > where we hotplug 384 vcpus and then unplug them, the following traces
>> > can be seen (generally within a few loops) either from the unplugged
>> > vcpu:
>> >
>> >   [ 1767.353447] cpu 65 (hwid 65) Ready to die...
>> >   [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
>> >   [ 1767.952311] list_del corruption. next->prev should be 
>> > c00a02470208, but was c00a02470048
>> ...
>> >
>> > At that point the worker thread assumes the unplugged CPU is in some
>> > unknown/dead state and procedes with the cleanup, causing the race with
>> > the XIVE cleanup code executed by the unplugged CPU.
>> >
>> > Fix this by inserting an msleep() after each RTAS call to avoid
>> 
>> We previously had an msleep(), but it was removed:
>> 
>>   b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
>
> Ah, I hadn't seen that one...
>
>> > pseries_cpu_die() returning prematurely, and double the number of
>> > attempts so we wait at least a total of 5 seconds. While this isn't an
>> > ideal solution, it is similar to how we dealt with a similar issue for
>> > cede_offline mode in the past (940ce422a3).
>> 
>> Thiago tried to fix this previously but there was a bit of discussion
>> that didn't quite resolve:
>> 
>>   
>> https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauer...@linux.ibm.com/
>
> Yeah it appears that the motivation at the time was to make the "Querying 
> DEAD?"
> messages to disappear and to avoid potentially concurrent calls to 
> rtas-stop-self
> which is prohibited by PAPR... not fixing actual crashes.

I'm pretty sure at one point we were triggering crashes *in* RTAS via
this path, I think that got resolved.

>> Spinning forever seems like a bad idea, but as has been demonstrated at
>> least twice now, continuing when we don't know the state of the other
>> CPU can lead to straight up crashes.
>> 
>> So I think I'm persuaded that it's preferable to have the kernel stuck
>> spinning rather than oopsing.
>> 
>
> +1
>
>> I'm 50/50 on whether we should have a cond_resched() in the loop. My
>> first instinct is no, if we're stuck here for 20s a stack trace would be
>> good. But then we will probably hit that on some big and/or heavily
>> loaded machine.
>> 
>> So possibly we should call cond_resched() but have some custom logic in
>> the loop to print a warning if we are stuck for more than some
>> sufficiently long amount of time.
>
> How long should that be ?

Yeah good question.

I guess step one would be seeing how long it can take on the 384 vcpu
machine. And we can probably test on some other big machines.

Hopefully Nathan can give us some idea of how long he's seen it take on
large systems? I know he was concerned about the 20s timeout of the
softlockup detector.

Maybe a minute or two?

>> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE 
>> > interrupt controller")
>> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
>> 
>> This is not public.
>
> I'll have a look at changing that.

Thanks.

cheers


[PATCH v5 4/4] powerpc/pseries/iommu: Allow bigger 64bit window by removing default DMA window

2020-08-04 Thread Leonardo Bras
On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
default DMA window for the device, before attempting to configure a DDW,
in order to make the maximum resources available for the next DDW to be
created.

This is a requirement for using DDW on devices in which hypervisor
allows only one DMA window.

If setting up a new DDW fails anywhere after the removal of this
default DMA window, it's needed to restore the default DMA window.
For this, an implementation of ibm,reset-pe-dma-windows rtas call is
needed:

Platforms supporting the DDW option starting with LoPAR level 2.7 implement
ibm,ddw-extensions. The first extension available (index 2) carries the
token for ibm,reset-pe-dma-windows rtas call, which is used to restore
the default DMA window for a device, if it has been deleted.

It does so by resetting the TCE table allocation for the PE to it's
boot time value, available in "ibm,dma-window" device tree node.

Signed-off-by: Leonardo Bras 
Tested-by: David Dai 
---
 arch/powerpc/platforms/pseries/iommu.c | 73 +++---
 1 file changed, 66 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 4e33147825cc..e4198700ed1a 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1066,6 +1066,38 @@ static phys_addr_t ddw_memory_hotplug_max(void)
return max_addr;
 }
 
+/*
+ * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
+ * ibm,ddw-extensions, which carries the rtas token for
+ * ibm,reset-pe-dma-windows.
+ * That rtas-call can be used to restore the default DMA window for the device.
+ */
+static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
+{
+   int ret;
+   u32 cfg_addr, reset_dma_win;
+   u64 buid;
+   struct device_node *dn;
+   struct pci_dn *pdn;
+
+   ret = ddw_read_ext(par_dn, DDW_EXT_RESET_DMA_WIN, _dma_win);
+   if (ret)
+   return;
+
+   dn = pci_device_to_OF_node(dev);
+   pdn = PCI_DN(dn);
+   buid = pdn->phb->buid;
+   cfg_addr = (pdn->busno << 16) | (pdn->devfn << 8);
+
+   ret = rtas_call(reset_dma_win, 3, 1, NULL, cfg_addr, BUID_HI(buid),
+   BUID_LO(buid));
+   if (ret)
+   dev_info(>dev,
+"ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
+reset_dma_win, cfg_addr, BUID_HI(buid), BUID_LO(buid),
+ret);
+}
+
 /*
  * If the PE supports dynamic dma windows, and there is space for a table
  * that can map all pages in a linear offset, then setup such a table,
@@ -1090,6 +1122,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
struct property *win64;
struct dynamic_dma_window_prop *ddwprop;
struct failed_ddw_pdn *fpdn;
+   bool default_win_removed = false;
 
mutex_lock(_window_init_mutex);
 
@@ -1133,14 +1166,38 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
if (ret != 0)
goto out_failed;
 
+   /*
+* If there is no window available, remove the default DMA window,
+* if it's present. This will make all the resources available to the
+* new DDW window.
+* If anything fails after this, we need to restore it, so also check
+* for extensions presence.
+*/
if (query.windows_available == 0) {
-   /*
-* no additional windows are available for this device.
-* We might be able to reallocate the existing window,
-* trading in for a larger page size.
-*/
-   dev_dbg(>dev, "no free dynamic windows");
-   goto out_failed;
+   struct property *default_win;
+   int reset_win_ext;
+
+   default_win = of_find_property(pdn, "ibm,dma-window", NULL);
+   if (!default_win)
+   goto out_failed;
+
+   reset_win_ext = ddw_read_ext(pdn, DDW_EXT_RESET_DMA_WIN, NULL);
+   if (reset_win_ext)
+   goto out_failed;
+
+   remove_dma_window(pdn, ddw_avail, default_win);
+   default_win_removed = true;
+
+   /* Query again, to check if the window is available */
+   ret = query_ddw(dev, ddw_avail, , pdn);
+   if (ret != 0)
+   goto out_failed;
+
+   if (query.windows_available == 0) {
+   /* no windows are available for this device. */
+   dev_dbg(>dev, "no free dynamic windows");
+   goto out_failed;
+   }
}
if (query.page_size & 4) {
page_shift = 24; /* 16MB */
@@ -1231,6 +1288,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
kfree(win64);
 
 

[PATCH v5 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window

2020-08-04 Thread Leonardo Bras
Move the window-removing part of remove_ddw into a new function
(remove_dma_window), so it can be used to remove other DMA windows.

It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
property, like the default DMA window from the device, which uses
"ibm,dma-window".

Signed-off-by: Leonardo Bras 
Tested-by: David Dai 
---
 arch/powerpc/platforms/pseries/iommu.c | 45 +++---
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 1a933c4e8bba..4e33147825cc 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -781,25 +781,14 @@ static int __init disable_ddw_setup(char *str)
 
 early_param("disable_ddw", disable_ddw_setup);
 
-static void remove_ddw(struct device_node *np, bool remove_prop)
+static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
+ struct property *win)
 {
struct dynamic_dma_window_prop *dwp;
-   struct property *win64;
-   u32 ddw_avail[DDW_APPLICABLE_SIZE];
u64 liobn;
-   int ret = 0;
-
-   ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
-_avail[0], DDW_APPLICABLE_SIZE);
-
-   win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
-   if (!win64)
-   return;
-
-   if (ret || win64->length < sizeof(*dwp))
-   goto delprop;
+   int ret;
 
-   dwp = win64->value;
+   dwp = win->value;
liobn = (u64)be32_to_cpu(dwp->liobn);
 
/* clear the whole window, note the arg is in kernel pages */
@@ -821,10 +810,30 @@ static void remove_ddw(struct device_node *np, bool 
remove_prop)
pr_debug("%pOF: successfully removed direct window: rtas 
returned "
"%d to ibm,remove-pe-dma-window(%x) %llx\n",
np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
+}
+
+static void remove_ddw(struct device_node *np, bool remove_prop)
+{
+   struct property *win;
+   u32 ddw_avail[DDW_APPLICABLE_SIZE];
+   int ret = 0;
+
+   ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
+_avail[0], DDW_APPLICABLE_SIZE);
+   if (ret)
+   return;
+
+   win = of_find_property(np, DIRECT64_PROPNAME, NULL);
+   if (!win)
+   return;
+
+   if (win->length >= sizeof(struct dynamic_dma_window_prop))
+   remove_dma_window(np, ddw_avail, win);
+
+   if (!remove_prop)
+   return;
 
-delprop:
-   if (remove_prop)
-   ret = of_remove_property(np, win64);
+   ret = of_remove_property(np, win);
if (ret)
pr_warn("%pOF: failed to remove direct window property: %d\n",
np, ret);
-- 
2.25.4



[PATCH v5 2/4] powerpc/pseries/iommu: Update call to ibm, query-pe-dma-windows

2020-08-04 Thread Leonardo Bras
>From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
outputs from "ibm,query-pe-dma-windows" go from 5 to 6.

This change of output size is meant to expand the address size of
largest_available_block PE TCE from 32-bit to 64-bit, which ends up
shifting page_size and migration_capable.

This ends up requiring the update of
ddw_query_response->largest_available_block from u32 to u64, and manually
assigning the values from the buffer into this struct, according to
output size.

Also, a routine was created for helping reading the ddw extensions as
suggested by LoPAR: First reading the size of the extension array from
index 0, checking if the property exists, and then returning it's value.

Signed-off-by: Leonardo Bras 
Tested-by: David Dai 
---
 arch/powerpc/platforms/pseries/iommu.c | 91 +++---
 1 file changed, 81 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index ac0d6376bdad..1a933c4e8bba 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -47,6 +47,12 @@ enum {
DDW_APPLICABLE_SIZE
 };
 
+enum {
+   DDW_EXT_SIZE = 0,
+   DDW_EXT_RESET_DMA_WIN = 1,
+   DDW_EXT_QUERY_OUT_SIZE = 2
+};
+
 static struct iommu_table_group *iommu_pseries_alloc_group(int node)
 {
struct iommu_table_group *table_group;
@@ -342,7 +348,7 @@ struct direct_window {
 /* Dynamic DMA Window support */
 struct ddw_query_response {
u32 windows_available;
-   u32 largest_available_block;
+   u64 largest_available_block;
u32 page_size;
u32 migration_capable;
 };
@@ -877,14 +883,62 @@ static int find_existing_ddw_windows(void)
 }
 machine_arch_initcall(pseries, find_existing_ddw_windows);
 
+/**
+ * ddw_read_ext - Get the value of an DDW extension
+ * @np:device node from which the extension value is to be 
read.
+ * @extnum:index number of the extension.
+ * @value: pointer to return value, modified when extension is available.
+ *
+ * Checks if "ibm,ddw-extensions" exists for this node, and get the value
+ * on index 'extnum'.
+ * It can be used only to check if a property exists, passing value == NULL.
+ *
+ * Returns:
+ * 0 if extension successfully read
+ * -EINVAL if the "ibm,ddw-extensions" does not exist,
+ * -ENODATA if "ibm,ddw-extensions" does not have a value, and
+ * -EOVERFLOW if "ibm,ddw-extensions" does not contain this extension.
+ */
+static inline int ddw_read_ext(const struct device_node *np, int extnum,
+  u32 *value)
+{
+   static const char propname[] = "ibm,ddw-extensions";
+   u32 count;
+   int ret;
+
+   ret = of_property_read_u32_index(np, propname, DDW_EXT_SIZE, );
+   if (ret)
+   return ret;
+
+   if (count < extnum)
+   return -EOVERFLOW;
+
+   if (!value)
+   value = 
+
+   return of_property_read_u32_index(np, propname, extnum, value);
+}
+
 static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
-   struct ddw_query_response *query)
+struct ddw_query_response *query,
+struct device_node *parent)
 {
struct device_node *dn;
struct pci_dn *pdn;
-   u32 cfg_addr;
+   u32 cfg_addr, ext_query, query_out[5];
u64 buid;
-   int ret;
+   int ret, out_sz;
+
+   /*
+* From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many
+* output parameters ibm,query-pe-dma-windows will have, ranging from
+* 5 to 6.
+*/
+   ret = ddw_read_ext(parent, DDW_EXT_QUERY_OUT_SIZE, _query);
+   if (!ret && ext_query == 1)
+   out_sz = 6;
+   else
+   out_sz = 5;
 
/*
 * Get the config address and phb buid of the PE window.
@@ -897,11 +951,28 @@ static int query_ddw(struct pci_dev *dev, const u32 
*ddw_avail,
buid = pdn->phb->buid;
cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
 
-   ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32 *)query,
+   ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, out_sz, query_out,
cfg_addr, BUID_HI(buid), BUID_LO(buid));
-   dev_info(>dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
-   " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
-BUID_HI(buid), BUID_LO(buid), ret);
+   dev_info(>dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned 
%d\n",
+ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr, BUID_HI(buid),
+BUID_LO(buid), ret);
+
+   switch (out_sz) {
+   case 5:
+   query->windows_available = query_out[0];
+   query->largest_available_block = query_out[1];
+   query->page_size = query_out[2];
+   query->migration_capable = query_out[3];
+   

[PATCH v5 1/4] powerpc/pseries/iommu: Create defines for operations in ibm, ddw-applicable

2020-08-04 Thread Leonardo Bras
Create defines to help handling ibm,ddw-applicable values, avoiding
confusion about the index of given operations.

Signed-off-by: Leonardo Bras 
Tested-by: David Dai 
---
 arch/powerpc/platforms/pseries/iommu.c | 43 --
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 6d47b4a3ce39..ac0d6376bdad 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -39,6 +39,14 @@
 
 #include "pseries.h"
 
+enum {
+   DDW_QUERY_PE_DMA_WIN  = 0,
+   DDW_CREATE_PE_DMA_WIN = 1,
+   DDW_REMOVE_PE_DMA_WIN = 2,
+
+   DDW_APPLICABLE_SIZE
+};
+
 static struct iommu_table_group *iommu_pseries_alloc_group(int node)
 {
struct iommu_table_group *table_group;
@@ -771,12 +779,12 @@ static void remove_ddw(struct device_node *np, bool 
remove_prop)
 {
struct dynamic_dma_window_prop *dwp;
struct property *win64;
-   u32 ddw_avail[3];
+   u32 ddw_avail[DDW_APPLICABLE_SIZE];
u64 liobn;
int ret = 0;
 
ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
-_avail[0], 3);
+_avail[0], DDW_APPLICABLE_SIZE);
 
win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
if (!win64)
@@ -798,15 +806,15 @@ static void remove_ddw(struct device_node *np, bool 
remove_prop)
pr_debug("%pOF successfully cleared tces in window.\n",
 np);
 
-   ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
+   ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
if (ret)
pr_warn("%pOF: failed to remove direct window: rtas returned "
"%d to ibm,remove-pe-dma-window(%x) %llx\n",
-   np, ret, ddw_avail[2], liobn);
+   np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
else
pr_debug("%pOF: successfully removed direct window: rtas 
returned "
"%d to ibm,remove-pe-dma-window(%x) %llx\n",
-   np, ret, ddw_avail[2], liobn);
+   np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
 
 delprop:
if (remove_prop)
@@ -889,11 +897,11 @@ static int query_ddw(struct pci_dev *dev, const u32 
*ddw_avail,
buid = pdn->phb->buid;
cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
 
-   ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
- cfg_addr, BUID_HI(buid), BUID_LO(buid));
+   ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32 *)query,
+   cfg_addr, BUID_HI(buid), BUID_LO(buid));
dev_info(>dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
-   " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
-   BUID_LO(buid), ret);
+   " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
+BUID_HI(buid), BUID_LO(buid), ret);
return ret;
 }
 
@@ -920,15 +928,16 @@ static int create_ddw(struct pci_dev *dev, const u32 
*ddw_avail,
 
do {
/* extra outputs are LIOBN and dma-addr (hi, lo) */
-   ret = rtas_call(ddw_avail[1], 5, 4, (u32 *)create,
-   cfg_addr, BUID_HI(buid), BUID_LO(buid),
-   page_shift, window_shift);
+   ret = rtas_call(ddw_avail[DDW_CREATE_PE_DMA_WIN], 5, 4,
+   (u32 *)create, cfg_addr, BUID_HI(buid),
+   BUID_LO(buid), page_shift, window_shift);
} while (rtas_busy_delay(ret));
dev_info(>dev,
"ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
-   "(liobn = 0x%x starting addr = %x %x)\n", ddw_avail[1],
-cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
-window_shift, ret, create->liobn, create->addr_hi, 
create->addr_lo);
+   "(liobn = 0x%x starting addr = %x %x)\n",
+ddw_avail[DDW_CREATE_PE_DMA_WIN], cfg_addr, BUID_HI(buid),
+BUID_LO(buid), page_shift, window_shift, ret, create->liobn,
+create->addr_hi, create->addr_lo);
 
return ret;
 }
@@ -996,7 +1005,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
int page_shift;
u64 dma_addr, max_addr;
struct device_node *dn;
-   u32 ddw_avail[3];
+   u32 ddw_avail[DDW_APPLICABLE_SIZE];
struct direct_window *window;
struct property *win64;
struct dynamic_dma_window_prop *ddwprop;
@@ -1029,7 +1038,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
 * the property is actually in the parent, not the PE
 */
ret = of_property_read_u32_array(pdn, "ibm,ddw-applicable",
-   

[PATCH v5 0/4] Allow bigger 64bit window by removing default DMA window

2020-08-04 Thread Leonardo Bras
There are some devices in which a hypervisor may only allow 1 DMA window
to exist at a time, and in those cases, a DDW is never created to them,
since the default DMA window keeps using this resource.

LoPAR recommends this procedure:
1. Remove the default DMA window,
2. Query for which configs the DDW can be created,
3. Create a DDW.

Patch #1:
Create defines for outputs of ibm,ddw-applicable, so it's easier to
identify them.

Patch #2:
- After LoPAR level 2.8, there is an extension that can make
  ibm,query-pe-dma-windows to have 6 outputs instead of 5. This changes the
  order of the outputs, and that can cause some trouble. 
- query_ddw() was updated to check how many outputs the 
  ibm,query-pe-dma-windows is supposed to have, update the rtas_call() and
  deal correctly with the outputs in both cases.
- This patch looks somehow unrelated to the series, but it can avoid future
  problems on DDW creation.

Patch #3 moves the window-removing code from remove_ddw() to
remove_dma_window(), creating a way to delete any DMA window, so it can be
used to delete the default DMA window.

Patch #4 makes use of the remove_dma_window() from patch #3 to remove the
default DMA window before query_ddw(). It also implements a new rtas call
to recover the default DMA window, in case anything fails after it was
removed, and a DDW couldn't be created.

---
Changes since v4:
- Removed patches 5+ in order to deal with a feature at a time
- Remove unnecessary parentesis in patch #4
- Changed patch #4 title from 
  "Remove default DMA window before creating DDW"
- Included David Dai tested-by
- v4 link: 
http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=190051=%2A=both

Changes since v3:
- Introduces new patch #5, to prepare for an important change in #6
- struct iommu_table was not being updated, so include a way to do this
  in patch #6.
- Improved patch #4 based in a suggestion from Alexey, to make code
  more easily understandable
- v3 link: 
http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=187348=%2A=both

Changes since v2:
- Change the way ibm,ddw-extensions is accessed, using a proper function
  instead of doing this inline everytime it's used.
- Remove previous patch #6, as it doesn't look like it would be useful.
- Add new patch, for changing names from direct* to dma*, as indirect 
  mapping can be used from now on.
- Fix some typos, corrects some define usage.
- v2 link: 
http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=185433=%2A=both

Changes since v1:
- Add defines for ibm,ddw-applicable and ibm,ddw-extensions outputs
- Merge aux function query_ddw_out_sz() into query_ddw()
- Merge reset_dma_window() patch (prev. #2) into remove default DMA
  window patch (#4).
- Keep device_node *np name instead of using pdn in remove_*()
- Rename 'device_node *pdn' into 'parent' in new functions
- Rename dfl_win to default_win
- Only remove the default DMA window if there is no window available
  in first query.
- Check if default DMA window can be restored before removing it.
- Fix 'unitialized use' (found by travis mpe:ci-test)
- New patches #5 and #6
- v1 link: 
http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=184420=%2A=both

Special thanks for Alexey Kardashevskiy, Brian King and
Oliver O'Halloran for the feedback provided!


Leonardo Bras (4):
  powerpc/pseries/iommu: Create defines for operations in
ibm,ddw-applicable
  powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
  powerpc/pseries/iommu: Move window-removing part of remove_ddw into
remove_dma_window
  powerpc/pseries/iommu: Allow bigger 64bit window by removing default
DMA window

 arch/powerpc/platforms/pseries/iommu.c | 242 -
 1 file changed, 195 insertions(+), 47 deletions(-)

-- 
2.25.4



Re: [PATCH] powerpc/powernv/sriov: Fix use of uninitialised variable

2020-08-04 Thread Michael Ellerman
On Mon, 3 Aug 2020 17:54:08 +1000, Oliver O'Halloran wrote:
> Initialising the value before using it is generally regarded as a good
> idea so do that.

Applied to powerpc/next.

[1/1] powerpc/powernv/sriov: Fix use of uninitialised variable
  https://git.kernel.org/powerpc/c/2075ec9896c5aef01e837198381d04cfa6452317

cheers


Re: [PATCH v2] selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs

2020-08-04 Thread Michael Ellerman
On Mon, 3 Aug 2020 12:07:19 +1000, Michael Ellerman wrote:
> Some of our tests use VSX or newer VMX instructions, so need to be
> skipped on older CPUs to avoid SIGILL'ing.
> 
> Similarly TAR was added in v2.07, and the PMU event used in the stcx
> fail test only works on Power8 or later.

Applied to powerpc/next.

[1/1] selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
  https://git.kernel.org/powerpc/c/872d11bca9c29ed19595c993b9f552ffe9b63dcb

cheers


Re: [PATCH] powerpc/40x: Fix assembler warning about r0

2020-08-04 Thread Michael Ellerman
On Wed, 22 Jul 2020 12:24:22 +1000, Michael Ellerman wrote:
> The assembler says:
>   arch/powerpc/kernel/head_40x.S:623: Warning: invalid register expression
> 
> It's objecting to the use of r0 as the RA argument. That's because
> when RA = 0 the literal value 0 is used, rather than the content of
> r0, making the use of r0 in the source potentially confusing.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/40x: Fix assembler warning about r0
  https://git.kernel.org/powerpc/c/8d8a629d00a5283874b81b594f31f8d436dc57d8

cheers


Re: [PATCH v4 1/7] powerpc/pseries/iommu: Create defines for operations in ibm, ddw-applicable

2020-08-04 Thread David Dai
On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> Create defines to help handling ibm,ddw-applicable values, avoiding
> confusion about the index of given operations.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 43 --
> 
>  1 file changed, 26 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index 6d47b4a3ce39..ac0d6376bdad 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -39,6 +39,14 @@
>  
>  #include "pseries.h"
>  
> +enum {
> + DDW_QUERY_PE_DMA_WIN  = 0,
> + DDW_CREATE_PE_DMA_WIN = 1,
> + DDW_REMOVE_PE_DMA_WIN = 2,
> +
> + DDW_APPLICABLE_SIZE
> +};
> +
>  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
>  {
>   struct iommu_table_group *table_group;
> @@ -771,12 +779,12 @@ static void remove_ddw(struct device_node *np,
> bool remove_prop)
>  {
>   struct dynamic_dma_window_prop *dwp;
>   struct property *win64;
> - u32 ddw_avail[3];
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   u64 liobn;
>   int ret = 0;
>  
>   ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> -  _avail[0], 3);
> +  _avail[0],
> DDW_APPLICABLE_SIZE);
>  
>   win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
>   if (!win64)
> @@ -798,15 +806,15 @@ static void remove_ddw(struct device_node *np,
> bool remove_prop)
>   pr_debug("%pOF successfully cleared tces in window.\n",
>np);
>  
> - ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
> + ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL,
> liobn);
>   if (ret)
>   pr_warn("%pOF: failed to remove direct window: rtas
> returned "
>   "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN],
> liobn);
>   else
>   pr_debug("%pOF: successfully removed direct window:
> rtas returned "
>   "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN],
> liobn);
>  
>  delprop:
>   if (remove_prop)
> @@ -889,11 +897,11 @@ static int query_ddw(struct pci_dev *dev, const
> u32 *ddw_avail,
>   buid = pdn->phb->buid;
>   cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>  
> - ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
> -   cfg_addr, BUID_HI(buid), BUID_LO(buid));
> + ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32
> *)query,
> + cfg_addr, BUID_HI(buid), BUID_LO(buid));
>   dev_info(>dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> - " returned %d\n", ddw_avail[0], cfg_addr,
> BUID_HI(buid),
> - BUID_LO(buid), ret);
> + " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN],
> cfg_addr,
> +  BUID_HI(buid), BUID_LO(buid), ret);
>   return ret;
>  }
>  
> @@ -920,15 +928,16 @@ static int create_ddw(struct pci_dev *dev,
> const u32 *ddw_avail,
>  
>   do {
>   /* extra outputs are LIOBN and dma-addr (hi, lo) */
> - ret = rtas_call(ddw_avail[1], 5, 4, (u32 *)create,
> - cfg_addr, BUID_HI(buid), BUID_LO(buid),
> - page_shift, window_shift);
> + ret = rtas_call(ddw_avail[DDW_CREATE_PE_DMA_WIN], 5, 4,
> + (u32 *)create, cfg_addr, BUID_HI(buid),
> + BUID_LO(buid), page_shift,
> window_shift);
>   } while (rtas_busy_delay(ret));
>   dev_info(>dev,
>   "ibm,create-pe-dma-window(%x) %x %x %x %x %x returned
> %d "
> - "(liobn = 0x%x starting addr = %x %x)\n", ddw_avail[1],
> -  cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
> -  window_shift, ret, create->liobn, create->addr_hi,
> create->addr_lo);
> + "(liobn = 0x%x starting addr = %x %x)\n",
> +  ddw_avail[DDW_CREATE_PE_DMA_WIN], cfg_addr,
> BUID_HI(buid),
> +  BUID_LO(buid), page_shift, window_shift, ret, create-
> >liobn,
> +  create->addr_hi, create->addr_lo);
>  
>   return ret;
>  }
> @@ -996,7 +1005,7 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>   int page_shift;
>   u64 dma_addr, max_addr;
>   struct device_node *dn;
> - u32 ddw_avail[3];
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   struct direct_window *window;
>   struct property *win64;
>   struct dynamic_dma_window_prop *ddwprop;
> @@ -1029,7 +1038,7 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>* the property is 

Re: [PATCH v4 4/7] powerpc/pseries/iommu: Remove default DMA window before creating DDW

2020-08-04 Thread David Dai
On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove
> the
> default DMA window for the device, before attempting to configure a
> DDW,
> in order to make the maximum resources available for the next DDW to
> be
> created.
> 
> This is a requirement for using DDW on devices in which hypervisor
> allows only one DMA window.
> 
> If setting up a new DDW fails anywhere after the removal of this
> default DMA window, it's needed to restore the default DMA window.
> For this, an implementation of ibm,reset-pe-dma-windows rtas call is
> needed:
> 
> Platforms supporting the DDW option starting with LoPAR level 2.7
> implement
> ibm,ddw-extensions. The first extension available (index 2) carries
> the
> token for ibm,reset-pe-dma-windows rtas call, which is used to
> restore
> the default DMA window for a device, if it has been deleted.
> 
> It does so by resetting the TCE table allocation for the PE to it's
> boot time value, available in "ibm,dma-window" device tree node.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 73 +++-
> --
>  1 file changed, 66 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index 4e33147825cc..fc8d0555e2e9 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1066,6 +1066,38 @@ static phys_addr_t
> ddw_memory_hotplug_max(void)
>   return max_addr;
>  }
>  
> +/*
> + * Platforms supporting the DDW option starting with LoPAR level 2.7
> implement
> + * ibm,ddw-extensions, which carries the rtas token for
> + * ibm,reset-pe-dma-windows.
> + * That rtas-call can be used to restore the default DMA window for
> the device.
> + */
> +static void reset_dma_window(struct pci_dev *dev, struct device_node
> *par_dn)
> +{
> + int ret;
> + u32 cfg_addr, reset_dma_win;
> + u64 buid;
> + struct device_node *dn;
> + struct pci_dn *pdn;
> +
> + ret = ddw_read_ext(par_dn, DDW_EXT_RESET_DMA_WIN,
> _dma_win);
> + if (ret)
> + return;
> +
> + dn = pci_device_to_OF_node(dev);
> + pdn = PCI_DN(dn);
> + buid = pdn->phb->buid;
> + cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> +
> + ret = rtas_call(reset_dma_win, 3, 1, NULL, cfg_addr,
> BUID_HI(buid),
> + BUID_LO(buid));
> + if (ret)
> + dev_info(>dev,
> +  "ibm,reset-pe-dma-windows(%x) %x %x %x
> returned %d ",
> +  reset_dma_win, cfg_addr, BUID_HI(buid),
> BUID_LO(buid),
> +  ret);
> +}
> +
>  /*
>   * If the PE supports dynamic dma windows, and there is space for a
> table
>   * that can map all pages in a linear offset, then setup such a
> table,
> @@ -1090,6 +1122,7 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>   struct property *win64;
>   struct dynamic_dma_window_prop *ddwprop;
>   struct failed_ddw_pdn *fpdn;
> + bool default_win_removed = false;
>  
>   mutex_lock(_window_init_mutex);
>  
> @@ -1133,14 +1166,38 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>   if (ret != 0)
>   goto out_failed;
>  
> + /*
> +  * If there is no window available, remove the default DMA
> window,
> +  * if it's present. This will make all the resources available
> to the
> +  * new DDW window.
> +  * If anything fails after this, we need to restore it, so also
> check
> +  * for extensions presence.
> +  */
>   if (query.windows_available == 0) {
> - /*
> -  * no additional windows are available for this device.
> -  * We might be able to reallocate the existing window,
> -  * trading in for a larger page size.
> -  */
> - dev_dbg(>dev, "no free dynamic windows");
> - goto out_failed;
> + struct property *default_win;
> + int reset_win_ext;
> +
> + default_win = of_find_property(pdn, "ibm,dma-window",
> NULL);
> + if (!default_win)
> + goto out_failed;
> +
> + reset_win_ext = ddw_read_ext(pdn,
> DDW_EXT_RESET_DMA_WIN, NULL);
> + if (reset_win_ext)
> + goto out_failed;
> +
> + remove_dma_window(pdn, ddw_avail, default_win);
> + default_win_removed = true;
> +
> + /* Query again, to check if the window is available */
> + ret = query_ddw(dev, ddw_avail, , pdn);
> + if (ret != 0)
> + goto out_failed;
> +
> + if (query.windows_available == 0) {
> + /* no windows are available for this device. */
> + dev_dbg(>dev, "no free dynamic windows");
> + goto out_failed;
> + }
>  

Re: [PATCH v4 3/7] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window

2020-08-04 Thread David Dai
On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> Move the window-removing part of remove_ddw into a new function
> (remove_dma_window), so it can be used to remove other DMA windows.
> 
> It's useful for removing DMA windows that don't create
> DIRECT64_PROPNAME
> property, like the default DMA window from the device, which uses
> "ibm,dma-window".
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 45 +++-
> --
>  1 file changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index 1a933c4e8bba..4e33147825cc 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -781,25 +781,14 @@ static int __init disable_ddw_setup(char *str)
>  
>  early_param("disable_ddw", disable_ddw_setup);
>  
> -static void remove_ddw(struct device_node *np, bool remove_prop)
> +static void remove_dma_window(struct device_node *np, u32
> *ddw_avail,
> +   struct property *win)
>  {
>   struct dynamic_dma_window_prop *dwp;
> - struct property *win64;
> - u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   u64 liobn;
> - int ret = 0;
> -
> - ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> -  _avail[0],
> DDW_APPLICABLE_SIZE);
> -
> - win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> - if (!win64)
> - return;
> -
> - if (ret || win64->length < sizeof(*dwp))
> - goto delprop;
> + int ret;
>  
> - dwp = win64->value;
> + dwp = win->value;
>   liobn = (u64)be32_to_cpu(dwp->liobn);
>  
>   /* clear the whole window, note the arg is in kernel pages */
> @@ -821,10 +810,30 @@ static void remove_ddw(struct device_node *np,
> bool remove_prop)
>   pr_debug("%pOF: successfully removed direct window:
> rtas returned "
>   "%d to ibm,remove-pe-dma-window(%x) %llx\n",
>   np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN],
> liobn);
> +}
> +
> +static void remove_ddw(struct device_node *np, bool remove_prop)
> +{
> + struct property *win;
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
> + int ret = 0;
> +
> + ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> +  _avail[0],
> DDW_APPLICABLE_SIZE);
> + if (ret)
> + return;
> +
> + win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> + if (!win)
> + return;
> +
> + if (win->length >= sizeof(struct dynamic_dma_window_prop))
> + remove_dma_window(np, ddw_avail, win);
> +
> + if (!remove_prop)
> + return;
>  
> -delprop:
> - if (remove_prop)
> - ret = of_remove_property(np, win64);
> + ret = of_remove_property(np, win);
>   if (ret)
>   pr_warn("%pOF: failed to remove direct window property:
> %d\n",
>   np, ret);
Tested-by: David Dai 



Re: [PATCH v4 2/7] powerpc/pseries/iommu: Update call to ibm, query-pe-dma-windows

2020-08-04 Thread David Dai
On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> > From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the
> > number of
> 
> outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
> 
> This change of output size is meant to expand the address size of
> largest_available_block PE TCE from 32-bit to 64-bit, which ends up
> shifting page_size and migration_capable.
> 
> This ends up requiring the update of
> ddw_query_response->largest_available_block from u32 to u64, and
> manually
> assigning the values from the buffer into this struct, according to
> output size.
> 
> Also, a routine was created for helping reading the ddw extensions as
> suggested by LoPAR: First reading the size of the extension array
> from
> index 0, checking if the property exists, and then returning it's
> value.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 91 +++-
> --
>  1 file changed, 81 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index ac0d6376bdad..1a933c4e8bba 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -47,6 +47,12 @@ enum {
>   DDW_APPLICABLE_SIZE
>  };
>  
> +enum {
> + DDW_EXT_SIZE = 0,
> + DDW_EXT_RESET_DMA_WIN = 1,
> + DDW_EXT_QUERY_OUT_SIZE = 2
> +};
> +
>  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
>  {
>   struct iommu_table_group *table_group;
> @@ -342,7 +348,7 @@ struct direct_window {
>  /* Dynamic DMA Window support */
>  struct ddw_query_response {
>   u32 windows_available;
> - u32 largest_available_block;
> + u64 largest_available_block;
>   u32 page_size;
>   u32 migration_capable;
>  };
> @@ -877,14 +883,62 @@ static int find_existing_ddw_windows(void)
>  }
>  machine_arch_initcall(pseries, find_existing_ddw_windows);
>  
> +/**
> + * ddw_read_ext - Get the value of an DDW extension
> + * @np:  device node from which the extension value is
> to be read.
> + * @extnum:  index number of the extension.
> + * @value:   pointer to return value, modified when extension is
> available.
> + *
> + * Checks if "ibm,ddw-extensions" exists for this node, and get the
> value
> + * on index 'extnum'.
> + * It can be used only to check if a property exists, passing value
> == NULL.
> + *
> + * Returns:
> + *   0 if extension successfully read
> + *   -EINVAL if the "ibm,ddw-extensions" does not exist,
> + *   -ENODATA if "ibm,ddw-extensions" does not have a value, and
> + *   -EOVERFLOW if "ibm,ddw-extensions" does not contain this
> extension.
> + */
> +static inline int ddw_read_ext(const struct device_node *np, int
> extnum,
> +u32 *value)
> +{
> + static const char propname[] = "ibm,ddw-extensions";
> + u32 count;
> + int ret;
> +
> + ret = of_property_read_u32_index(np, propname, DDW_EXT_SIZE,
> );
> + if (ret)
> + return ret;
> +
> + if (count < extnum)
> + return -EOVERFLOW;
> +
> + if (!value)
> + value = 
> +
> + return of_property_read_u32_index(np, propname, extnum, value);
> +}
> +
>  static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> - struct ddw_query_response *query)
> +  struct ddw_query_response *query,
> +  struct device_node *parent)
>  {
>   struct device_node *dn;
>   struct pci_dn *pdn;
> - u32 cfg_addr;
> + u32 cfg_addr, ext_query, query_out[5];
>   u64 buid;
> - int ret;
> + int ret, out_sz;
> +
> + /*
> +  * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule
> how many
> +  * output parameters ibm,query-pe-dma-windows will have,
> ranging from
> +  * 5 to 6.
> +  */
> + ret = ddw_read_ext(parent, DDW_EXT_QUERY_OUT_SIZE, _query);
> + if (!ret && ext_query == 1)
> + out_sz = 6;
> + else
> + out_sz = 5;
>  
>   /*
>* Get the config address and phb buid of the PE window.
> @@ -897,11 +951,28 @@ static int query_ddw(struct pci_dev *dev, const
> u32 *ddw_avail,
>   buid = pdn->phb->buid;
>   cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>  
> - ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32
> *)query,
> + ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, out_sz,
> query_out,
>   cfg_addr, BUID_HI(buid), BUID_LO(buid));
> - dev_info(>dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> - " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN],
> cfg_addr,
> -  BUID_HI(buid), BUID_LO(buid), ret);
> + dev_info(>dev, "ibm,query-pe-dma-windows(%x) %x %x %x
> returned %d\n",
> +  ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
> BUID_HI(buid),
> +  BUID_LO(buid), ret);
> +
> + switch (out_sz) {
> + case 5:
> + 

Re: [PATCH v4 1/7] powerpc/pseries/iommu: Create defines for operations in ibm, ddw-applicable

2020-08-04 Thread David Dai
On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> Create defines to help handling ibm,ddw-applicable values, avoiding
> confusion about the index of given operations.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 43 --
> 
>  1 file changed, 26 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index 6d47b4a3ce39..ac0d6376bdad 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -39,6 +39,14 @@
>  
>  #include "pseries.h"
>  
> +enum {
> + DDW_QUERY_PE_DMA_WIN  = 0,
> + DDW_CREATE_PE_DMA_WIN = 1,
> + DDW_REMOVE_PE_DMA_WIN = 2,
> +
> + DDW_APPLICABLE_SIZE
> +};
> +
>  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
>  {
>   struct iommu_table_group *table_group;
> @@ -771,12 +779,12 @@ static void remove_ddw(struct device_node *np,
> bool remove_prop)
>  {
>   struct dynamic_dma_window_prop *dwp;
>   struct property *win64;
> - u32 ddw_avail[3];
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   u64 liobn;
>   int ret = 0;
>  
>   ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> -  _avail[0], 3);
> +  _avail[0],
> DDW_APPLICABLE_SIZE);
>  
>   win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
>   if (!win64)
> @@ -798,15 +806,15 @@ static void remove_ddw(struct device_node *np,
> bool remove_prop)
>   pr_debug("%pOF successfully cleared tces in window.\n",
>np);
>  
> - ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
> + ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL,
> liobn);
>   if (ret)
>   pr_warn("%pOF: failed to remove direct window: rtas
> returned "
>   "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN],
> liobn);
>   else
>   pr_debug("%pOF: successfully removed direct window:
> rtas returned "
>   "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN],
> liobn);
>  
>  delprop:
>   if (remove_prop)
> @@ -889,11 +897,11 @@ static int query_ddw(struct pci_dev *dev, const
> u32 *ddw_avail,
>   buid = pdn->phb->buid;
>   cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>  
> - ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
> -   cfg_addr, BUID_HI(buid), BUID_LO(buid));
> + ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32
> *)query,
> + cfg_addr, BUID_HI(buid), BUID_LO(buid));
>   dev_info(>dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> - " returned %d\n", ddw_avail[0], cfg_addr,
> BUID_HI(buid),
> - BUID_LO(buid), ret);
> + " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN],
> cfg_addr,
> +  BUID_HI(buid), BUID_LO(buid), ret);
>   return ret;
>  }
>  
> @@ -920,15 +928,16 @@ static int create_ddw(struct pci_dev *dev,
> const u32 *ddw_avail,
>  
>   do {
>   /* extra outputs are LIOBN and dma-addr (hi, lo) */
> - ret = rtas_call(ddw_avail[1], 5, 4, (u32 *)create,
> - cfg_addr, BUID_HI(buid), BUID_LO(buid),
> - page_shift, window_shift);
> + ret = rtas_call(ddw_avail[DDW_CREATE_PE_DMA_WIN], 5, 4,
> + (u32 *)create, cfg_addr, BUID_HI(buid),
> + BUID_LO(buid), page_shift,
> window_shift);
>   } while (rtas_busy_delay(ret));
>   dev_info(>dev,
>   "ibm,create-pe-dma-window(%x) %x %x %x %x %x returned
> %d "
> - "(liobn = 0x%x starting addr = %x %x)\n", ddw_avail[1],
> -  cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
> -  window_shift, ret, create->liobn, create->addr_hi,
> create->addr_lo);
> + "(liobn = 0x%x starting addr = %x %x)\n",
> +  ddw_avail[DDW_CREATE_PE_DMA_WIN], cfg_addr,
> BUID_HI(buid),
> +  BUID_LO(buid), page_shift, window_shift, ret, create-
> >liobn,
> +  create->addr_hi, create->addr_lo);
>  
>   return ret;
>  }
> @@ -996,7 +1005,7 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>   int page_shift;
>   u64 dma_addr, max_addr;
>   struct device_node *dn;
> - u32 ddw_avail[3];
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   struct direct_window *window;
>   struct property *win64;
>   struct dynamic_dma_window_prop *ddwprop;
> @@ -1029,7 +1038,7 @@ static u64 enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>* the property is 

[PATCH] powerpc/oprofile: fix spelling mistake "contex" -> "context"

2020-08-04 Thread Colin King
From: Colin Ian King 

There is a spelling mistake in a pr_debug message. Fix it.

Signed-off-by: Colin Ian King 
---
 arch/powerpc/oprofile/cell/spu_task_sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/oprofile/cell/spu_task_sync.c 
b/arch/powerpc/oprofile/cell/spu_task_sync.c
index df59d0bb121f..489f993100d5 100644
--- a/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ b/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -572,7 +572,7 @@ void spu_sync_buffer(int spu_num, unsigned int *samples,
 * samples are recorded.
 * No big deal -- so we just drop a few samples.
 */
-   pr_debug("SPU_PROF: No cached SPU contex "
+   pr_debug("SPU_PROF: No cached SPU context "
  "for SPU #%d. Dropping samples.\n", spu_num);
goto out;
}
-- 
2.27.0



[PATCH v4] selftests/powerpc: Fix pkey syscall redefinitions

2020-08-04 Thread Sandipan Das
On distros using older glibc versions, the pkey tests
encounter build failures due to redefinition of the
pkey syscall numbers.

For compatibility, commit 743f3544fffb added a wrapper
for the gettid() syscall and included syscall.h if the
version of glibc used is older than 2.30. This leads
to different definitions of SYS_pkey_* as the ones in
the pkey test header set numeric constants where as the
ones from syscall.h reuse __NR_pkey_*. The compiler
complains about redefinitions since they are different.

This replaces SYS_pkey_* definitions with __NR_pkey_*
such that the definitions in both syscall.h and pkeys.h
are alike. This way, if syscall.h has to be included
for compatibility reasons, builds will still succeed.

Fixes: 743f3544fffb ("selftests/powerpc: Add wrapper for gettid")
Reported-by: Sachin Sant 
Suggested-by: David Laight 
Suggested-by: Michael Ellerman 
Signed-off-by: Sandipan Das 
---
Previous versions can be found at:
v3: 
https://lore.kernel.org/linuxppc-dev/1bb744b0c7ed3985a5b73289f4de629ac0aeaf7c.1596453627.git.sandi...@linux.ibm.com/
v2: 
https://lore.kernel.org/linuxppc-dev/566dde119ce71f00f9642807ba30ceb7f54c9bfa.1596441105.git.sandi...@linux.ibm.com/
v1: 
https://lore.kernel.org/linuxppc-dev/20200803074043.466809-1-sandi...@linux.ibm.com/

Changes in v4:
- Replace SYS_pkey_* with __NR_pkey_* based on suggestions
  from David and Michael.
- Update commit message and add fixes tag.

Changes in v3:
- Use ifndef...endif instead of undef as suggested by
  Michael.

Changes in v2:
- Fix incorrect commit message.

---
 tools/testing/selftests/powerpc/include/pkeys.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/powerpc/include/pkeys.h 
b/tools/testing/selftests/powerpc/include/pkeys.h
index 6ba95039a034..3312cb1b058d 100644
--- a/tools/testing/selftests/powerpc/include/pkeys.h
+++ b/tools/testing/selftests/powerpc/include/pkeys.h
@@ -31,9 +31,9 @@
 
 #define SI_PKEY_OFFSET 0x20
 
-#define SYS_pkey_mprotect  386
-#define SYS_pkey_alloc 384
-#define SYS_pkey_free  385
+#define __NR_pkey_mprotect 386
+#define __NR_pkey_alloc384
+#define __NR_pkey_free 385
 
 #define PKEY_BITS_PER_PKEY 2
 #define NR_PKEYS   32
@@ -62,17 +62,17 @@ void pkey_set_rights(int pkey, unsigned long rights)
 
 int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
 {
-   return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
+   return syscall(__NR_pkey_mprotect, addr, len, prot, pkey);
 }
 
 int sys_pkey_alloc(unsigned long flags, unsigned long rights)
 {
-   return syscall(SYS_pkey_alloc, flags, rights);
+   return syscall(__NR_pkey_alloc, flags, rights);
 }
 
 int sys_pkey_free(int pkey)
 {
-   return syscall(SYS_pkey_free, pkey);
+   return syscall(__NR_pkey_free, pkey);
 }
 
 int pkeys_unsupported(void)
-- 
2.25.1



Re: [merge] Build failure selftest/powerpc/mm/pkey_exec_prot

2020-08-04 Thread Sandipan Das


On 04/08/20 5:53 pm, Michael Ellerman wrote:
> Sandipan Das  writes:
>> On 04/08/20 6:38 am, Michael Ellerman wrote:
>>> Sandipan Das  writes:
 On 03/08/20 4:32 pm, Michael Ellerman wrote:
> Sachin Sant  writes:
>>> On 02-Aug-2020, at 10:58 PM, Sandipan Das  
>>> wrote:
>>> On 02/08/20 4:45 pm, Sachin Sant wrote:
 pkey_exec_prot test from linuxppc merge branch (3f68564f1f5a) fails to
 build due to following error:

 gcc -std=gnu99 -O2 -Wall -Werror 
 -DGIT_VERSION='"v5.8-rc7-1276-g3f68564f1f5a"' 
 -I/home/sachin/linux/tools/testing/selftests/powerpc/include  -m64
 pkey_exec_prot.c 
 /home/sachin/linux/tools/testing/selftests/kselftest_harness.h 
 /home/sachin/linux/tools/testing/selftests/kselftest.h ../harness.c 
 ../utils.c  -o 
 /home/sachin/linux/tools/testing/selftests/powerpc/mm/pkey_exec_prot
 In file included from pkey_exec_prot.c:18:
 /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:34: 
 error: "SYS_pkey_mprotect" redefined [-Werror]
 #define SYS_pkey_mprotect 386

 In file included from /usr/include/sys/syscall.h:31,
 from 
 /home/sachin/linux/tools/testing/selftests/powerpc/include/utils.h:47,
 from 
 /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:12,
 from pkey_exec_prot.c:18:
 /usr/include/bits/syscall.h:1583: note: this is the location of the 
 previous definition
 # define SYS_pkey_mprotect __NR_pkey_mprotect

 commit 128d3d021007 introduced this error.
 selftests/powerpc: Move pkey helpers to headers

 Possibly the # defines for sys calls can be retained in 
 pkey_exec_prot.c or

>>>
>>> I am unable to reproduce this on the latest merge branch (HEAD at 
>>> f59195f7faa4).
>>> I don't see any redefinitions in pkey_exec_prot.c either.
>>>
>>
>> I can still see this problem on latest merge branch.
>> I have following gcc version
>>
>> gcc version 8.3.1 20191121
>
> What libc version? Or just the distro & version?

 Sachin observed this on RHEL 8.2 with glibc-2.28.
 I couldn't reproduce it on Ubuntu 20.04 and Fedora 32 and both these 
 distros
 are using glibc-2.31.
>>>
>>> OK odd. Usually it's newer glibc that hits this problem.
>>>
>>> I guess on RHEL 8.2 we're getting the asm-generic version? But that
>>> would be quite wrong if that's what's happening.
>>
>> If I let GCC dump all the headers that are being used for the source file, I 
>> always
>> see syscall.h being included on the RHEL 8.2 system. That is the header with 
>> the
>> conflicting definition.
>>
>>   $ cd tools/testing/selftests/powerpc/mm
>>   $ gcc -H -std=gnu99 -O2 -Wall -Werror 
>> -DGIT_VERSION='"v5.8-rc7-1456-gf59195f7faa4-dirty"' \
>> -I../include -m64 pkey_exec_prot.c ../../kselftest_harness.h 
>> ../../kselftest.h ../harness.c ../utils.c \
>> -o pkey_exec_prot 2>&1 | grep syscall
>>
>> On Ubuntu 20.04 and Fedora 32, grep doesn't find any matching text.
>> On RHEL 8.2, it shows the following.
>>   ... /usr/include/sys/syscall.h
>>    /usr/include/bits/syscall.h
>>   In file included from /usr/include/sys/syscall.h:31,
>>   /usr/include/bits/syscall.h:1583: note: this is the location of the 
>> previous definition
>>   In file included from /usr/include/sys/syscall.h:31,
>>   /usr/include/bits/syscall.h:1575: note: this is the location of the 
>> previous definition
>>   In file included from /usr/include/sys/syscall.h:31,
>>   /usr/include/bits/syscall.h:1579: note: this is the location of the 
>> previous definition
>>   /usr/include/bits/syscall.h
>>   .. /usr/include/sys/syscall.h
>>   ... /usr/include/bits/syscall.h
>>   /usr/include/bits/syscall.h
>>   .. /usr/include/sys/syscall.h
>>   ... /usr/include/bits/syscall.h
>>   /usr/include/bits/syscall.h
>>
>> So utils.h is also including /usr/include/sys/syscall.h for glibc versions 
>> older than 2.30
>> because of commit 743f3544fffb ("selftests/powerpc: Add wrapper for gettid") 
>> :)
> 
> Haha, of course. :facepalm_emoji:
> 
>> [...]
>> . ../include/pkeys.h
>> [...]
>> .. ../include/utils.h
>> [...]
>> ... /usr/include/sys/syscall.h
>>  /usr/include/asm/unistd.h
>>  /usr/include/bits/syscall.h
>> In file included from pkey_exec_prot.c:18:
>> ../include/pkeys.h:34: error: "SYS_pkey_mprotect" redefined [-Werror]
>>  #define SYS_pkey_mprotect 386
>>
>> In file included from /usr/include/sys/syscall.h:31,
>>  from ../include/utils.h:47,
>>  from ../include/pkeys.h:12,
>>  from pkey_exec_prot.c:18:
>> /usr/include/bits/syscall.h:1583: note: this is the location of the previous 
>> definition
>>  # define SYS_pkey_mprotect __NR_pkey_mprotect
> 
> Aha, that 

Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death

2020-08-04 Thread Greg Kurz
On Tue, 04 Aug 2020 23:35:10 +1000
Michael Ellerman  wrote:

> Hi Mike,
> 
> There is a bit of history to this code, but not in a good way :)
> 
> Michael Roth  writes:
> > For a power9 KVM guest with XIVE enabled, running a test loop
> > where we hotplug 384 vcpus and then unplug them, the following traces
> > can be seen (generally within a few loops) either from the unplugged
> > vcpu:
> >
> >   [ 1767.353447] cpu 65 (hwid 65) Ready to die...
> >   [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
> >   [ 1767.952311] list_del corruption. next->prev should be 
> > c00a02470208, but was c00a02470048
> ...
> >
> > At that point the worker thread assumes the unplugged CPU is in some
> > unknown/dead state and procedes with the cleanup, causing the race with
> > the XIVE cleanup code executed by the unplugged CPU.
> >
> > Fix this by inserting an msleep() after each RTAS call to avoid
> 
> We previously had an msleep(), but it was removed:
> 
>   b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
> 

Ah, I hadn't seen that one...

> > pseries_cpu_die() returning prematurely, and double the number of
> > attempts so we wait at least a total of 5 seconds. While this isn't an
> > ideal solution, it is similar to how we dealt with a similar issue for
> > cede_offline mode in the past (940ce422a3).
> 
> Thiago tried to fix this previously but there was a bit of discussion
> that didn't quite resolve:
> 
>   
> https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauer...@linux.ibm.com/
> 

Yeah it appears that the motivation at the time was to make the "Querying DEAD?"
messages to disappear and to avoid potentially concurrent calls to 
rtas-stop-self
which is prohibited by PAPR... not fixing actual crashes.

> 
> Spinning forever seems like a bad idea, but as has been demonstrated at
> least twice now, continuing when we don't know the state of the other
> CPU can lead to straight up crashes.
> 
> So I think I'm persuaded that it's preferable to have the kernel stuck
> spinning rather than oopsing.
> 

+1

> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> first instinct is no, if we're stuck here for 20s a stack trace would be
> good. But then we will probably hit that on some big and/or heavily
> loaded machine.
> 
> So possibly we should call cond_resched() but have some custom logic in
> the loop to print a warning if we are stuck for more than some
> sufficiently long amount of time.
> 

How long should that be ?

> 
> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE 
> > interrupt controller")
> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
> 
> This is not public.
> 

I'll have a look at changing that.

> I tend to trim Bugzilla links from the change log, because I'm not
> convinced they will last forever, but it is good to have them in the
> mail archive.
> 
> cheers
> 

Cheers,

--
Greg

> > Cc: Michael Ellerman 
> > Cc: Cedric Le Goater 
> > Cc: Greg Kurz 
> > Cc: Nathan Lynch 
> > Signed-off-by: Michael Roth 
> > ---
> >  arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> > b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > index c6e0d8abf75e..3cb172758052 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > @@ -111,13 +111,12 @@ static void pseries_cpu_die(unsigned int cpu)
> > int cpu_status = 1;
> > unsigned int pcpu = get_hard_smp_processor_id(cpu);
> >  
> > -   for (tries = 0; tries < 25; tries++) {
> > +   for (tries = 0; tries < 50; tries++) {
> > cpu_status = smp_query_cpu_stopped(pcpu);
> > if (cpu_status == QCSS_STOPPED ||
> > cpu_status == QCSS_HARDWARE_ERROR)
> > break;
> > -   cpu_relax();
> > -
> > +   msleep(100);
> > }
> >  
> > if (cpu_status != 0) {
> > -- 
> > 2.17.1



[PATCH -next] soc: fsl: qe: Remove unnessesary check in ucc_set_tdm_rxtx_clk

2020-08-04 Thread Wang Hai
Fix smatch warning:

drivers/soc/fsl/qe/ucc.c:526
 ucc_set_tdm_rxtx_clk() warn: unsigned 'tdm_num' is never less than zero.

'tdm_num' is u32 type, never less than zero.

Signed-off-by: Wang Hai 
---
 drivers/soc/fsl/qe/ucc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qe/ucc.c b/drivers/soc/fsl/qe/ucc.c
index cac0fb7693a0..21dbcd787cd5 100644
--- a/drivers/soc/fsl/qe/ucc.c
+++ b/drivers/soc/fsl/qe/ucc.c
@@ -523,7 +523,7 @@ int ucc_set_tdm_rxtx_clk(u32 tdm_num, enum qe_clock clock,
 
qe_mux_reg = _immr->qmx;
 
-   if (tdm_num > 7 || tdm_num < 0)
+   if (tdm_num > 7)
return -EINVAL;
 
/* The communications direction must be RX or TX */
-- 
2.17.1



Re: [PATCH v2 01/17] KVM: PPC: Book3S HV: simplify kvm_cma_reserve()

2020-08-04 Thread Daniel Axtens
Hi Mike,

>
> The memory size calculation in kvm_cma_reserve() traverses memblock.memory
> rather than simply call memblock_phys_mem_size(). The comment in that
> function suggests that at some point there should have been call to
> memblock_analyze() before memblock_phys_mem_size() could be used.
> As of now, there is no memblock_analyze() at all and
> memblock_phys_mem_size() can be used as soon as cold-plug memory is
> registerd with memblock.
>
> Replace loop over memblock.memory with a call to memblock_phys_mem_size().
>
> Signed-off-by: Mike Rapoport 
> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 11 ++-
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
> b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 7cd3cf3d366b..56ab0d28de2a 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -95,22 +95,15 @@ EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
>  void __init kvm_cma_reserve(void)
>  {
>   unsigned long align_size;
> - struct memblock_region *reg;
> - phys_addr_t selected_size = 0;
> + phys_addr_t selected_size;
>  
>   /*
>* We need CMA reservation only when we are in HV mode
>*/
>   if (!cpu_has_feature(CPU_FTR_HVMODE))
>   return;
> - /*
> -  * We cannot use memblock_phys_mem_size() here, because
> -  * memblock_analyze() has not been called yet.
> -  */
> - for_each_memblock(memory, reg)
> - selected_size += memblock_region_memory_end_pfn(reg) -
> -  memblock_region_memory_base_pfn(reg);
>  
> + selected_size = PHYS_PFN(memblock_phys_mem_size());
>   selected_size = (selected_size * kvm_cma_resv_ratio / 100) << 
> PAGE_SHIFT;

I think this is correct, but PHYS_PFN does x >> PAGE_SHIFT and then the
next line does x << PAGE_SHIFT, so I think we could combine those two
lines as:

selected_size = PAGE_ALIGN(memblock_phys_mem_size() * kvm_cma_resv_ratio / 100);

(I think that might technically change it from aligning down to aligning
up but I don't think 1 page matters here.)

Kind regards,
Daniel


>   if (selected_size) {
>   pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> -- 
> 2.26.2


Re: [PATCHv4 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents

2020-08-04 Thread Pingfan Liu
On Tue, Aug 4, 2020 at 12:29 AM Laurent Dufour  wrote:
>
[...]
> >   lmb_set_nid(lmb);
> >   lmb->flags |= DRCONF_MEM_ASSIGNED;
> > + if (dt_update) {
> > + ret = drmem_update_dt();
> > + if (ret)
> > + pr_warn("%s fail to update dt, but continue\n", 
> > __func__);
> > + }
> >
> >   block_sz = memory_block_size_bytes();
>
> In the case the call to __add_memory is failing, the flag DRCONF_MEM_ASSIGNED
> should be cleared as I mentioned in your previous patch. In addition to this,
Yes.
> the DT should be updated, or the caller should manage that but that will be 
> hard
> since it doesn't know where the error happened in this function.
Yeah, it is hard to manage it by caller, so just updating dt  is a
easier method.
>
> >
> > @@ -625,7 +653,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> >   invalidate_lmb_associativity_index(lmb);
> >   lmb_clear_nid(lmb);
> >   lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> > -
> > + if (dt_update) {
> > + ret = drmem_update_dt();
> > + if (ret)
> > + pr_warn("%s fail to update dt during 
> > rollback, but continue\n", __func__);
> > + }
> >   __remove_memory(nid, base_addr, block_sz);
> >   }
> >
> > @@ -638,6 +670,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> >   int lmbs_available = 0;
> >   int lmbs_added = 0;
> >   int rc;
> > + bool dt_update = false;
> >
> >   pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
> >
> > @@ -664,7 +697,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> >   if (rc)
> >   continue;
> >
> > - rc = dlpar_add_lmb(lmb);
> > + rc = dlpar_add_lmb(lmb, dt_update);
> >   if (rc) {
> >   dlpar_release_drc(lmb->drc_index);
> >   continue;
> > @@ -678,16 +711,23 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> >   lmbs_added++;
> >   if (lmbs_added == lmbs_to_add)
> >   break;
> > + else if (lmbs_added == lmbs_to_add - 1)
> > + dt_update = true;
>
> In the case the number of LMB to add is 1, dt_update is never set to true, and
> the device tree is never updated. You need to set dt_update to true earlier in
> the loop block.
Oh, I will fix it in V5
>
> >   }
> >
> >   if (lmbs_added != lmbs_to_add) {
> > + bool rollback_dt_update = false;
> > +
> >   pr_err("Memory hot-add failed, removing any added LMBs\n");
> >
> >   for_each_drmem_lmb(lmb) {
> >   if (!drmem_lmb_reserved(lmb))
> >   continue;
> >
> > - rc = dlpar_remove_lmb(lmb);
> > + if (--lmbs_added == 0 && dt_update)
> > + rollback_dt_update = true;
>
> That test may have to be review to deal with error during the last LMB 
> addition,
> the DT may have been updated before __add_memory() is failing in
> dlpar_add_lmb(). In that case, lmbs_added == 0 and that branch is not covered.
> That's not an issue if dlpar_add_lmb() is handling that case (see my comment 
> above).
I will fix it in next version.

Thanks for your review.

Regards,
Pingfan


Re: [PATCH] powerpc: Fix circular dependency between percpu.h and mmu.h

2020-08-04 Thread Michael Ellerman
On Tue, 4 Aug 2020 23:05:58 +1000, Michael Ellerman wrote:
> Recently random.h started including percpu.h (see commit
> f227e3ec3b5c ("random32: update the net random state on interrupt and
> activity")), which broke corenet64_smp_defconfig:
> 
>   In file included from /linux/arch/powerpc/include/asm/paca.h:18,
>from /linux/arch/powerpc/include/asm/percpu.h:13,
>from /linux/include/linux/random.h:14,
>from /linux/lib/uuid.c:14:
>   /linux/arch/powerpc/include/asm/mmu.h:139:22: error: unknown type name 
> 'next_tlbcam_idx'
> 139 | DECLARE_PER_CPU(int, next_tlbcam_idx);
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Fix circular dependency between percpu.h and mmu.h
  https://git.kernel.org/powerpc/c/0c83b277ada72b585e6a3e52b067669df15bcedb

cheers


Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death

2020-08-04 Thread Michael Ellerman
Hi Mike,

There is a bit of history to this code, but not in a good way :)

Michael Roth  writes:
> For a power9 KVM guest with XIVE enabled, running a test loop
> where we hotplug 384 vcpus and then unplug them, the following traces
> can be seen (generally within a few loops) either from the unplugged
> vcpu:
>
>   [ 1767.353447] cpu 65 (hwid 65) Ready to die...
>   [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
>   [ 1767.952311] list_del corruption. next->prev should be c00a02470208, 
> but was c00a02470048
...
>
> At that point the worker thread assumes the unplugged CPU is in some
> unknown/dead state and procedes with the cleanup, causing the race with
> the XIVE cleanup code executed by the unplugged CPU.
>
> Fix this by inserting an msleep() after each RTAS call to avoid

We previously had an msleep(), but it was removed:

  b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")

> pseries_cpu_die() returning prematurely, and double the number of
> attempts so we wait at least a total of 5 seconds. While this isn't an
> ideal solution, it is similar to how we dealt with a similar issue for
> cede_offline mode in the past (940ce422a3).

Thiago tried to fix this previously but there was a bit of discussion
that didn't quite resolve:

  
https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauer...@linux.ibm.com/


Spinning forever seems like a bad idea, but as has been demonstrated at
least twice now, continuing when we don't know the state of the other
CPU can lead to straight up crashes.

So I think I'm persuaded that it's preferable to have the kernel stuck
spinning rather than oopsing.

I'm 50/50 on whether we should have a cond_resched() in the loop. My
first instinct is no, if we're stuck here for 20s a stack trace would be
good. But then we will probably hit that on some big and/or heavily
loaded machine.

So possibly we should call cond_resched() but have some custom logic in
the loop to print a warning if we are stuck for more than some
sufficiently long amount of time.


> Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt 
> controller")
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588

This is not public.

I tend to trim Bugzilla links from the change log, because I'm not
convinced they will last forever, but it is good to have them in the
mail archive.

cheers

> Cc: Michael Ellerman 
> Cc: Cedric Le Goater 
> Cc: Greg Kurz 
> Cc: Nathan Lynch 
> Signed-off-by: Michael Roth 
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index c6e0d8abf75e..3cb172758052 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -111,13 +111,12 @@ static void pseries_cpu_die(unsigned int cpu)
>   int cpu_status = 1;
>   unsigned int pcpu = get_hard_smp_processor_id(cpu);
>  
> - for (tries = 0; tries < 25; tries++) {
> + for (tries = 0; tries < 50; tries++) {
>   cpu_status = smp_query_cpu_stopped(pcpu);
>   if (cpu_status == QCSS_STOPPED ||
>   cpu_status == QCSS_HARDWARE_ERROR)
>   break;
> - cpu_relax();
> -
> + msleep(100);
>   }
>  
>   if (cpu_status != 0) {
> -- 
> 2.17.1


Re: [PATCHv4 1/2] powerpc/pseries: group lmb operation and memblock's

2020-08-04 Thread Pingfan Liu
On Mon, Aug 3, 2020 at 9:52 PM Laurent Dufour  wrote:
>
> > @@ -603,6 +606,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> > }
> >
> > lmb_set_nid(lmb);
> > +   lmb->flags |= DRCONF_MEM_ASSIGNED;
> > +
> > block_sz = memory_block_size_bytes();
> >
> > /* Add the memory */
>
> Since the lmb->flags is now set earlier, you should unset it in the case the
> call to __add_memory() fails, something like:
>
> @@ -614,6 +614,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> rc = __add_memory(lmb->nid, lmb->base_addr, block_sz);
> if (rc) {
> invalidate_lmb_associativity_index(lmb);
> +   lmb->flags &= ~DRCONF_MEM_ASSIGNED;
You are right. I will fix it in V5.

Thanks,
Pingfan


[PATCH] powerpc: Fix circular dependency between percpu.h and mmu.h

2020-08-04 Thread Michael Ellerman
Recently random.h started including percpu.h (see commit
f227e3ec3b5c ("random32: update the net random state on interrupt and
activity")), which broke corenet64_smp_defconfig:

  In file included from /linux/arch/powerpc/include/asm/paca.h:18,
   from /linux/arch/powerpc/include/asm/percpu.h:13,
   from /linux/include/linux/random.h:14,
   from /linux/lib/uuid.c:14:
  /linux/arch/powerpc/include/asm/mmu.h:139:22: error: unknown type name 
'next_tlbcam_idx'
139 | DECLARE_PER_CPU(int, next_tlbcam_idx);

This is due to a circular header dependency:
  asm/mmu.h includes asm/percpu.h, which includes asm/paca.h, which
  includes asm/mmu.h

Which means DECLARE_PER_CPU() isn't defined when mmu.h needs it.

We can fix it by moving the include of paca.h below the include of
asm-generic/percpu.h.

This moves the include of paca.h out of the #ifdef __powerpc64__, but
that is OK because paca.h is almost entirely inside #ifdef
CONFIG_PPC64 anyway.

It also moves the include of paca.h out of the #ifdef CONFIG_SMP,
which could possibly break something, but seems to have no ill
effects.

Reported-by: Stephen Rothwell 
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and 
activity")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/percpu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/percpu.h 
b/arch/powerpc/include/asm/percpu.h
index dce863a7635c..8e5b7d0b851c 100644
--- a/arch/powerpc/include/asm/percpu.h
+++ b/arch/powerpc/include/asm/percpu.h
@@ -10,8 +10,6 @@
 
 #ifdef CONFIG_SMP
 
-#include 
-
 #define __my_cpu_offset local_paca->data_offset
 
 #endif /* CONFIG_SMP */
@@ -19,4 +17,6 @@
 
 #include 
 
+#include 
+
 #endif /* _ASM_POWERPC_PERCPU_H_ */
-- 
2.25.1



Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask

2020-08-04 Thread peterz
On Tue, Aug 04, 2020 at 05:40:07PM +0530, Srikar Dronamraju wrote:
> * pet...@infradead.org  [2020-08-04 12:45:20]:
> 
> > On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> > > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> > > most architectures. One of the users of cpu_smt_mask(), would be to
> > > identify idle-cores. On Power9, a pair of cores can be presented by the
> > > firmware as a big-core for backward compatibility reasons.
> > > 
> > > In order to maintain userspace backward compatibility with previous
> > > versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> > > is option to the firmware to advertise a pair of SMT4 cores as a fused
> > > cores (referred to as the big_core mode in the Linux Kernel). On Power9
> > > this pair shares the L2 cache as well. However, from the scheduler's point
> > > of view, a core should be determined by SMT4. The load-balancer already
> > > does this. Hence allow PowerPc architecture to override the default
> > > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
> > 
> > I'm utterly confused.
> > 
> > Why can't you set your regular siblings mask to the smt4 thing? Who
> > cares about the compat stuff, I thought that was an LPAR/AIX thing.
> 
> There are no technical challenges to set the sibling mask to SMT4.
> This is for Linux running on PowerVM. When these Power9 boxes are sold /
> marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
> world, everything is in SMT8 mode, the device tree properties still mark
> the system to be running on 8 thread core. There are a number of utilities
> like ppc64_cpu that directly read from device-tree. They would get core
> count and thread count which is SMT8 based.
> 
> If the sibling_mask is set to small core, then same user when looking at

FWIW, I find the small/big core naming utterly confusing vs the
big/little naming from ARM. When you say small, I'm thinking of
asymmetric cores, not SMT4/SMT8.

> output from lscpu and other utilities that look at sysfs will start seeing
> 2x number of cores to what he had provisioned and what the utilities from
> the device-tree show. This can gets users confused.

One will report SMT8 and the other SMT4, right? So only users that
cannot read will be confused, but if you can't read, confusion is
guaranteed anyway.

Also, by exposing the true (SMT4) topology to userspace, userspace
applications could behave better -- for those few that actually parse
the topology information.

> So to keep the device-tree properties, utilities depending on device-tree,
> sysfs and utilities depending on sysfs on the same page, userspace are only
> exposed as SMT8.

I'm not convinced it makes sense to lie to userspace just to accomodate
a few users that cannot read.


Re: [merge] Build failure selftest/powerpc/mm/pkey_exec_prot

2020-08-04 Thread Michael Ellerman
Sandipan Das  writes:
> On 04/08/20 6:38 am, Michael Ellerman wrote:
>> Sandipan Das  writes:
>>> On 03/08/20 4:32 pm, Michael Ellerman wrote:
 Sachin Sant  writes:
>> On 02-Aug-2020, at 10:58 PM, Sandipan Das  wrote:
>> On 02/08/20 4:45 pm, Sachin Sant wrote:
>>> pkey_exec_prot test from linuxppc merge branch (3f68564f1f5a) fails to
>>> build due to following error:
>>>
>>> gcc -std=gnu99 -O2 -Wall -Werror 
>>> -DGIT_VERSION='"v5.8-rc7-1276-g3f68564f1f5a"' 
>>> -I/home/sachin/linux/tools/testing/selftests/powerpc/include  -m64
>>> pkey_exec_prot.c 
>>> /home/sachin/linux/tools/testing/selftests/kselftest_harness.h 
>>> /home/sachin/linux/tools/testing/selftests/kselftest.h ../harness.c 
>>> ../utils.c  -o 
>>> /home/sachin/linux/tools/testing/selftests/powerpc/mm/pkey_exec_prot
>>> In file included from pkey_exec_prot.c:18:
>>> /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:34: 
>>> error: "SYS_pkey_mprotect" redefined [-Werror]
>>> #define SYS_pkey_mprotect 386
>>>
>>> In file included from /usr/include/sys/syscall.h:31,
>>> from 
>>> /home/sachin/linux/tools/testing/selftests/powerpc/include/utils.h:47,
>>> from 
>>> /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:12,
>>> from pkey_exec_prot.c:18:
>>> /usr/include/bits/syscall.h:1583: note: this is the location of the 
>>> previous definition
>>> # define SYS_pkey_mprotect __NR_pkey_mprotect
>>>
>>> commit 128d3d021007 introduced this error.
>>> selftests/powerpc: Move pkey helpers to headers
>>>
>>> Possibly the # defines for sys calls can be retained in 
>>> pkey_exec_prot.c or
>>>
>>
>> I am unable to reproduce this on the latest merge branch (HEAD at 
>> f59195f7faa4).
>> I don't see any redefinitions in pkey_exec_prot.c either.
>>
>
> I can still see this problem on latest merge branch.
> I have following gcc version
>
> gcc version 8.3.1 20191121

 What libc version? Or just the distro & version?
>>>
>>> Sachin observed this on RHEL 8.2 with glibc-2.28.
>>> I couldn't reproduce it on Ubuntu 20.04 and Fedora 32 and both these distros
>>> are using glibc-2.31.
>> 
>> OK odd. Usually it's newer glibc that hits this problem.
>> 
>> I guess on RHEL 8.2 we're getting the asm-generic version? But that
>> would be quite wrong if that's what's happening.
>
> If I let GCC dump all the headers that are being used for the source file, I 
> always
> see syscall.h being included on the RHEL 8.2 system. That is the header with 
> the
> conflicting definition.
>
>   $ cd tools/testing/selftests/powerpc/mm
>   $ gcc -H -std=gnu99 -O2 -Wall -Werror 
> -DGIT_VERSION='"v5.8-rc7-1456-gf59195f7faa4-dirty"' \
> -I../include -m64 pkey_exec_prot.c ../../kselftest_harness.h 
> ../../kselftest.h ../harness.c ../utils.c \
> -o pkey_exec_prot 2>&1 | grep syscall
>
> On Ubuntu 20.04 and Fedora 32, grep doesn't find any matching text.
> On RHEL 8.2, it shows the following.
>   ... /usr/include/sys/syscall.h
>    /usr/include/bits/syscall.h
>   In file included from /usr/include/sys/syscall.h:31,
>   /usr/include/bits/syscall.h:1583: note: this is the location of the 
> previous definition
>   In file included from /usr/include/sys/syscall.h:31,
>   /usr/include/bits/syscall.h:1575: note: this is the location of the 
> previous definition
>   In file included from /usr/include/sys/syscall.h:31,
>   /usr/include/bits/syscall.h:1579: note: this is the location of the 
> previous definition
>   /usr/include/bits/syscall.h
>   .. /usr/include/sys/syscall.h
>   ... /usr/include/bits/syscall.h
>   /usr/include/bits/syscall.h
>   .. /usr/include/sys/syscall.h
>   ... /usr/include/bits/syscall.h
>   /usr/include/bits/syscall.h
>
> So utils.h is also including /usr/include/sys/syscall.h for glibc versions 
> older than 2.30
> because of commit 743f3544fffb ("selftests/powerpc: Add wrapper for gettid") 
> :)

Haha, of course. :facepalm_emoji:

> [...]
> . ../include/pkeys.h
> [...]
> .. ../include/utils.h
> [...]
> ... /usr/include/sys/syscall.h
>  /usr/include/asm/unistd.h
>  /usr/include/bits/syscall.h
> In file included from pkey_exec_prot.c:18:
> ../include/pkeys.h:34: error: "SYS_pkey_mprotect" redefined [-Werror]
>  #define SYS_pkey_mprotect 386
>
> In file included from /usr/include/sys/syscall.h:31,
>  from ../include/utils.h:47,
>  from ../include/pkeys.h:12,
>  from pkey_exec_prot.c:18:
> /usr/include/bits/syscall.h:1583: note: this is the location of the previous 
> definition
>  # define SYS_pkey_mprotect __NR_pkey_mprotect

Aha, that explains why redefining gives us an error, because we're
defining it to the literal 386 whereas the system header is defining it
to the __NR value.

Is there a reason to use the SYS_ name?


Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask

2020-08-04 Thread Srikar Dronamraju
* pet...@infradead.org  [2020-08-04 12:45:20]:

> On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> > most architectures. One of the users of cpu_smt_mask(), would be to
> > identify idle-cores. On Power9, a pair of cores can be presented by the
> > firmware as a big-core for backward compatibility reasons.
> > 
> > In order to maintain userspace backward compatibility with previous
> > versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> > is option to the firmware to advertise a pair of SMT4 cores as a fused
> > cores (referred to as the big_core mode in the Linux Kernel). On Power9
> > this pair shares the L2 cache as well. However, from the scheduler's point
> > of view, a core should be determined by SMT4. The load-balancer already
> > does this. Hence allow PowerPc architecture to override the default
> > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
> 
> I'm utterly confused.
> 
> Why can't you set your regular siblings mask to the smt4 thing? Who
> cares about the compat stuff, I thought that was an LPAR/AIX thing.

There are no technical challenges to set the sibling mask to SMT4.
This is for Linux running on PowerVM. When these Power9 boxes are sold /
marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
world, everything is in SMT8 mode, the device tree properties still mark
the system to be running on 8 thread core. There are a number of utilities
like ppc64_cpu that directly read from device-tree. They would get core
count and thread count which is SMT8 based.

If the sibling_mask is set to small core, then same user when looking at
output from lscpu and other utilities that look at sysfs will start seeing
2x number of cores to what he had provisioned and what the utilities from
the device-tree show. This can gets users confused.

So to keep the device-tree properties, utilities depending on device-tree,
sysfs and utilities depending on sysfs on the same page, userspace are only
exposed as SMT8.

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH 1/6] powerpc/powernv/smp: Fix spurious DBG() warning

2020-08-04 Thread Michael Ellerman
Joel Stanley  writes:
> On Tue, 4 Aug 2020 at 00:57, Oliver O'Halloran  wrote:
>>
>> When building with W=1 we get the following warning:
>>
>>  arch/powerpc/platforms/powernv/smp.c: In function ‘pnv_smp_cpu_kill_self’:
>>  arch/powerpc/platforms/powernv/smp.c:276:16: error: suggest braces around
>> empty body in an ‘if’ statement [-Werror=empty-body]
>>276 |  cpu, srr1);
>>|^
>>  cc1: all warnings being treated as errors
>>
>> The full context is this block:
>>
>>  if (srr1 && !generic_check_cpu_restart(cpu))
>> DBG("CPU%d Unexpected exit while offline srr1=%lx!\n",
>> cpu, srr1);
>>
>> When building with DEBUG undefined DBG() expands to nothing and GCC emits
>> the warning due to the lack of braces around an empty statement.
>>
>> Signed-off-by: Oliver O'Halloran 
>> ---
>> We could add the braces too. That might even be better since it's a 
>> multi-line
>> if block even though it's only a single statement.
>
> Or you could put it all on one line, now that our 120 line overlords
> have taken over.

Yeah I think that one may as well be one line.

> Reviewed-by: Joel Stanley 
>
> Messy:
>
> $ git grep "define DBG(" arch/powerpc/ |grep -v print
> arch/powerpc/kernel/crash_dump.c:#define DBG(fmt...)
> arch/powerpc/kernel/iommu.c:#define DBG(...)
> arch/powerpc/kernel/legacy_serial.c:#define DBG(fmt...) do { } while(0)
> arch/powerpc/kernel/prom.c:#define DBG(fmt...)
..

Yeah, gross old cruft.

The vast majority of those should just be replaced with pr_devel()
and/or pr_debug().

The pnv_smp_cpu_kill_self() case is one where we probably do want to
stick with udbg_printf(), because I don't think it's kosher to call
printk() from an offline CPU.

cheers


Re: [PATCH v2] powerpc: Warn about use of smt_snooze_delay

2020-08-04 Thread Michael Ellerman
Joel Stanley  writes:
> It's not done anything for a long time. Save the percpu variable, and
> emit a warning to remind users to not expect it to do anything.
>
> Fixes: 3fa8cad82b94 ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.")
> Cc: sta...@vger.kernel.org # v3.14
> Signed-off-by: Joel Stanley 
> --
> v2:
>  Use pr_warn instead of WARN
>  Reword and print proccess name with pid in message
>  Leave CPU_FTR_SMT test in
>  Add Fixes line
>
> mpe, if you don't agree then feel free to drop the cc stable.
>
> Testing 'ppc64_cpu --smt=off' on a 24 core / 4 SMT system it's quite noisy
> as the online/offline loop that ppc64_cpu runs is slow.

Hmm, that is pretty spammy.

I know I suggested the ratelimit, but I was thinking it would print once
for each ppc64_cpu invocation, not ~30 times.

How about pr_warn_once(), that should still be sufficient for people to
notice if they're looking for it.

...
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 571b3259697e..ba6d4cee19ef 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -32,29 +32,26 @@
>  
>  static DEFINE_PER_CPU(struct cpu, cpu_devices);
>  
> -/*
> - * SMT snooze delay stuff, 64-bit only for now
> - */
> -
>  #ifdef CONFIG_PPC64
>  
> -/* Time in microseconds we delay before sleeping in the idle loop */
> -static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
> +/*
> + * Snooze delay has not been hooked up since 3fa8cad82b94 
> ("powerpc/pseries/cpuidle:
> + * smt-snooze-delay cleanup.") and has been broken even longer. As was 
> foretold in
> + * 2014:
> + *
> + *  "ppc64_util currently utilises it. Once we fix ppc64_util, propose to 
> clean
> + *  up the kernel code."
> + *
> + * At some point in the future this code should be removed.
> + */
>  
>  static ssize_t store_smt_snooze_delay(struct device *dev,
> struct device_attribute *attr,
> const char *buf,
> size_t count)
>  {
> - struct cpu *cpu = container_of(dev, struct cpu, dev);
> - ssize_t ret;
> - long snooze;
> -
> - ret = sscanf(buf, "%ld", );
> - if (ret != 1)
> - return -EINVAL;
> -
> - per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
> + pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this 
> has no effect\n",
> + current->comm, current->pid);

Can we make this:

"%s (%d) stored to unsupported smt_snooze_delay, which has no 
effect.\n",


>   return count;
>  }
>  
> @@ -62,9 +59,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
>struct device_attribute *attr,
>char *buf)
>  {
> - struct cpu *cpu = container_of(dev, struct cpu, dev);
> -
> - return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
> + pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this 
> has no effect\n",
> + current->comm, current->pid);

It has as much effect as it ever did :)

So maybe:

"%s (%d) read from unsupported smt_snooze_delay.\n",


I can do those changes when applying if you like rather than making you
do a v3.

cheers


[PATCH v9 5/5] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

2020-08-04 Thread Christophe Leroy
Provides __kernel_clock_gettime64() on vdso32. This is the
64 bits version of __kernel_clock_gettime() which is
y2038 compliant.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso32/gettimeofday.S  | 9 +
 arch/powerpc/kernel/vdso32/vdso32.lds.S| 1 +
 arch/powerpc/kernel/vdso32/vgettimeofday.c | 6 ++
 3 files changed, 16 insertions(+)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index fd7b01c51281..a6e29f880e0e 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -35,6 +35,15 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
cvdso_call __c_kernel_clock_gettime
 V_FUNCTION_END(__kernel_clock_gettime)
 
+/*
+ * Exact prototype of clock_gettime64()
+ *
+ * int __kernel_clock_gettime64(clockid_t clock_id, struct __timespec64 *ts);
+ *
+ */
+V_FUNCTION_BEGIN(__kernel_clock_gettime64)
+   cvdso_call __c_kernel_clock_gettime64
+V_FUNCTION_END(__kernel_clock_gettime64)
 
 /*
  * Exact prototype of clock_getres()
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S 
b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 4c985467a668..582c5b046cc9 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -148,6 +148,7 @@ VERSION
 #ifndef CONFIG_PPC_BOOK3S_601
__kernel_gettimeofday;
__kernel_clock_gettime;
+   __kernel_clock_gettime64;
__kernel_clock_getres;
__kernel_time;
__kernel_get_tbfreq;
diff --git a/arch/powerpc/kernel/vdso32/vgettimeofday.c 
b/arch/powerpc/kernel/vdso32/vgettimeofday.c
index 0b9ab4c22ef2..f7f71fecf4ed 100644
--- a/arch/powerpc/kernel/vdso32/vgettimeofday.c
+++ b/arch/powerpc/kernel/vdso32/vgettimeofday.c
@@ -11,6 +11,12 @@ int __c_kernel_clock_gettime(clockid_t clock, struct 
old_timespec32 *ts,
return __cvdso_clock_gettime32_data(vd, clock, ts);
 }
 
+int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts,
+  const struct vdso_data *vd)
+{
+   return __cvdso_clock_gettime_data(vd, clock, ts);
+}
+
 int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone 
*tz,
const struct vdso_data *vd)
 {
-- 
2.25.0



[PATCH v9 4/5] powerpc/vdso: Switch VDSO to generic C implementation.

2020-08-04 Thread Christophe Leroy
For VDSO32 on PPC64, we create a fake 32 bits config, on the same
principle as MIPS architecture, in order to get the correct parts of
the different asm header files.

With the C VDSO, the performance is slightly lower, but it is worth
it as it will ease maintenance and evolution, and also brings clocks
that are not supported with the ASM VDSO.

On an 8xx at 132 MHz, vdsotest with the ASM VDSO:
gettimeofday:vdso: 828 nsec/call
clock-getres-realtime-coarse:vdso: 391 nsec/call
clock-gettime-realtime-coarse:vdso: 614 nsec/call
clock-getres-realtime:vdso: 460 nsec/call
clock-gettime-realtime:vdso: 876 nsec/call
clock-getres-monotonic-coarse:vdso: 399 nsec/call
clock-gettime-monotonic-coarse:vdso: 691 nsec/call
clock-getres-monotonic:vdso: 460 nsec/call
clock-gettime-monotonic:vdso: 1026 nsec/call

On an 8xx at 132 MHz, vdsotest with the C VDSO:
gettimeofday:vdso: 955 nsec/call
clock-getres-realtime-coarse:vdso: 545 nsec/call
clock-gettime-realtime-coarse:vdso: 592 nsec/call
clock-getres-realtime:vdso: 545 nsec/call
clock-gettime-realtime:vdso: 941 nsec/call
clock-getres-monotonic-coarse:vdso: 545 nsec/call
clock-gettime-monotonic-coarse:vdso: 591 nsec/call
clock-getres-monotonic:vdso: 545 nsec/call
clock-gettime-monotonic:vdso: 940 nsec/call

It is even better for gettime with monotonic clocks.

Unsupported clocks with ASM VDSO:
clock-gettime-boottime:vdso: 3851 nsec/call
clock-gettime-tai:vdso: 3852 nsec/call
clock-gettime-monotonic-raw:vdso: 3396 nsec/call

Same clocks with C VDSO:
clock-gettime-tai:vdso: 941 nsec/call
clock-gettime-monotonic-raw:vdso: 1001 nsec/call
clock-gettime-monotonic-coarse:vdso: 591 nsec/call

On an 8321E at 333 MHz, vdsotest with the ASM VDSO:
gettimeofday:vdso: 220 nsec/call
clock-getres-realtime-coarse:vdso: 102 nsec/call
clock-gettime-realtime-coarse:vdso: 178 nsec/call
clock-getres-realtime:vdso: 129 nsec/call
clock-gettime-realtime:vdso: 235 nsec/call
clock-getres-monotonic-coarse:vdso: 105 nsec/call
clock-gettime-monotonic-coarse:vdso: 208 nsec/call
clock-getres-monotonic:vdso: 129 nsec/call
clock-gettime-monotonic:vdso: 274 nsec/call

On an 8321E at 333 MHz, vdsotest with the C VDSO:
gettimeofday:vdso: 272 nsec/call
clock-getres-realtime-coarse:vdso: 160 nsec/call
clock-gettime-realtime-coarse:vdso: 184 nsec/call
clock-getres-realtime:vdso: 166 nsec/call
clock-gettime-realtime:vdso: 281 nsec/call
clock-getres-monotonic-coarse:vdso: 160 nsec/call
clock-gettime-monotonic-coarse:vdso: 184 nsec/call
clock-getres-monotonic:vdso: 169 nsec/call
clock-gettime-monotonic:vdso: 275 nsec/call

Signed-off-by: Christophe Leroy 
---
v9:
- Rebased (Impact on arch/powerpc/kernel/vdso??/Makefile

v7:
- Split out preparatory changes in a new preceding patch
- Added -fasynchronous-unwind-tables to CC flags.

v6:
- Added missing prototypes in asm/vdso/gettimeofday.h for __c_kernel_ functions.
- Using STACK_FRAME_OVERHEAD instead of INT_FRAME_SIZE
- Rebased on powerpc/merge as of 7 Apr 2020
- Fixed build failure with gcc 9
- Added a patch to create asm/vdso/processor.h and more cpu_relax() in it

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |   2 +
 arch/powerpc/include/asm/vdso/vsyscall.h   |  25 ++
 arch/powerpc/include/asm/vdso_datapage.h   |  40 +--
 arch/powerpc/kernel/asm-offsets.c  |  49 +---
 arch/powerpc/kernel/time.c |  91 +--
 arch/powerpc/kernel/vdso.c |   5 +-
 arch/powerpc/kernel/vdso32/Makefile|  32 ++-
 arch/powerpc/kernel/vdso32/config-fake32.h |  34 +++
 arch/powerpc/kernel/vdso32/gettimeofday.S  | 291 +
 arch/powerpc/kernel/vdso64/Makefile|  23 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S  | 242 +
 11 files changed, 143 insertions(+), 691 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
 create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c7656cc10eb..9977ab939b42 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -171,6 +171,7 @@ config PPC
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
+   select GENERIC_GETTIMEOFDAY
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMAP  if PPC_BOOK3S_64 && 
PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
@@ -202,6 +203,7 @@ config PPC
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200   # 
plugin support on gcc <= 5.1 is buggy on PPC
+   select HAVE_GENERIC_VDSO
select HAVE_HW_BREAKPOINT   if PERF_EVENTS && (PPC_BOOK3S 
|| PPC_8xx)
select HAVE_IDE
select HAVE_IOREMAP_PROT
diff --git 

[PATCH v9 0/5] powerpc: switch VDSO to C implementation

2020-08-04 Thread Christophe Leroy
This is the nineth version of a series to switch powerpc VDSO to
generic C implementation.

Main changes since v8 are:
- Dropped the patches which put the VDSO datapage in front of VDSO text in the 
mapping
- Adds a second stack frame because the caller doesn't set one, at least on 
PPC64
- Saving the TOC pointer on PPC64 (is that really needed ?)

This series applies on today's powerpc/merge branch.

See the last patches for details on changes and performance.

Christophe Leroy (5):
  powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
  powerpc/vdso: Prepare for switching VDSO to generic C implementation.
  powerpc/vdso: Save and restore TOC pointer on PPC64
  powerpc/vdso: Switch VDSO to generic C implementation.
  powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

 arch/powerpc/Kconfig |   2 +
 arch/powerpc/include/asm/clocksource.h   |   7 +
 arch/powerpc/include/asm/processor.h |  13 +-
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 191 
 arch/powerpc/include/asm/vdso/processor.h|  23 ++
 arch/powerpc/include/asm/vdso/vsyscall.h |  25 ++
 arch/powerpc/include/asm/vdso_datapage.h |  40 +--
 arch/powerpc/kernel/asm-offsets.c|  49 +--
 arch/powerpc/kernel/time.c   |  91 +-
 arch/powerpc/kernel/vdso.c   |   5 +-
 arch/powerpc/kernel/vdso32/Makefile  |  32 +-
 arch/powerpc/kernel/vdso32/config-fake32.h   |  34 +++
 arch/powerpc/kernel/vdso32/gettimeofday.S| 300 +--
 arch/powerpc/kernel/vdso32/vdso32.lds.S  |   1 +
 arch/powerpc/kernel/vdso32/vgettimeofday.c   |  35 +++
 arch/powerpc/kernel/vdso64/Makefile  |  23 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S| 242 +--
 arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 ++
 19 files changed, 447 insertions(+), 702 deletions(-)
 create mode 100644 arch/powerpc/include/asm/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/include/asm/vdso/processor.h
 create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
 create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c

-- 
2.25.0



[PATCH v9 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-04 Thread Christophe Leroy
Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions

powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set a stack frame and saves the link register before calling
the C vdso function, retriving the vdso data pointer there is lighter.

Implement __arch_vdso_capable() and:
- When the timebase is used, make it always return true.
- When the RTC clock is used, make it always return false.

Provide vdso_shift_ns(), as the generic x >> s gives the following
bad result:

  18:   35 25 ff e0 addic.  r9,r5,-32
  1c:   41 80 00 10 blt 2c 
  20:   7c 64 4c 30 srw r4,r3,r9
  24:   38 60 00 00 li  r3,0
...
  2c:   54 69 08 3c rlwinm  r9,r3,1,0,30
  30:   21 45 00 1f subfic  r10,r5,31
  34:   7c 84 2c 30 srw r4,r4,r5
  38:   7d 29 50 30 slw r9,r9,r10
  3c:   7c 63 2c 30 srw r3,r3,r5
  40:   7d 24 23 78 or  r4,r9,r4

In our case the shift is always <= 32. In addition,  the upper 32 bits
of the result are likely nul. Lets GCC know it, it also optimises the
following calculations.

With the patch, we get:
   0:   21 25 00 20 subfic  r9,r5,32
   4:   7c 69 48 30 slw r9,r3,r9
   8:   7c 84 2c 30 srw r4,r4,r5
   c:   7d 24 23 78 or  r4,r9,r4
  10:   7c 63 2c 30 srw r3,r3,r5

Signed-off-by: Christophe Leroy 
---
v9:
- No more modification of __get_datapage(). Offset is added after.
- Adding a second stack frame because the PPC VDSO ABI doesn't force
the caller to set one.

v8:
- New, splitted out of last patch of the series

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/clocksource.h   |   7 +
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 179 +++
 arch/powerpc/kernel/vdso32/vgettimeofday.c   |  29 +++
 arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 +++
 5 files changed, 251 insertions(+)
 create mode 100644 arch/powerpc/include/asm/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c

diff --git a/arch/powerpc/include/asm/clocksource.h 
b/arch/powerpc/include/asm/clocksource.h
new file mode 100644
index ..482185566b0c
--- /dev/null
+++ b/arch/powerpc/include/asm/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_CLOCKSOURCE_H
+#define _ASM_CLOCKSOURCE_H
+
+#include 
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/clocksource.h 
b/arch/powerpc/include/asm/vdso/clocksource.h
new file mode 100644
index ..ec5d672d2569
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSOCLOCKSOURCE_H
+#define __ASM_VDSOCLOCKSOURCE_H
+
+#define VDSO_ARCH_CLOCKMODES   VDSO_CLOCKMODE_ARCHTIMER
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
new file mode 100644
index ..0b6fa245d54e
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H
+#define __ASM_VDSO_GETTIMEOFDAY_H
+
+#include 
+
+#ifdef __ASSEMBLY__
+
+.macro cvdso_call funct
+  .cfi_startproc
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+   mflrr0
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+  .cfi_register lr, r0
+   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   get_datapager5, r0
+   addir5, r5, VDSO_DATA_OFFSET
+   bl  \funct
+   PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   cmpwi   r3, 0
+   mtlrr0
+  .cfi_restore lr
+   addir1, r1, 2 * STACK_FRAME_OVERHEAD
+   crclr   so
+   beqlr+
+   crset   so
+   neg r3, r3
+   blr
+  .cfi_endproc
+.endm
+
+.macro cvdso_call_time funct
+  .cfi_startproc
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+   mflrr0
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+  .cfi_register lr, r0
+   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   get_datapager4, r0
+   addir4, r4, VDSO_DATA_OFFSET
+   bl   

[PATCH v9 3/5] powerpc/vdso: Save and restore TOC pointer on PPC64

2020-08-04 Thread Christophe Leroy
On PPC64, the TOC pointer needs to be saved and restored.

Suggested-by: Michael Ellerman 
Signed-off-by: Christophe Leroy 
---
v9: New.

I'm not sure this is really needed, I can't see the VDSO C code doing
anything with r2, at least on ppc64_defconfig.

So I let you decide whether you take it or not.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/vdso/gettimeofday.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
index 0b6fa245d54e..fa774086173b 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -13,10 +13,16 @@
PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
   .cfi_register lr, r0
PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+   PPC_STL r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
get_datapager5, r0
addir5, r5, VDSO_DATA_OFFSET
bl  \funct
PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+   PPC_LL  r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
cmpwi   r3, 0
mtlrr0
   .cfi_restore lr
@@ -36,10 +42,16 @@
PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
   .cfi_register lr, r0
PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+   PPC_STL r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
get_datapager4, r0
addir4, r4, VDSO_DATA_OFFSET
bl  \funct
PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+   PPC_LL  r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
crclr   so
mtlrr0
   .cfi_restore lr
-- 
2.25.0



[PATCH v9 1/5] powerpc/processor: Move cpu_relax() into asm/vdso/processor.h

2020-08-04 Thread Christophe Leroy
cpu_relax() need to be in asm/vdso/processor.h to be used by
the C VDSO generic library.

Move it there.

Signed-off-by: Christophe Leroy 
---
v9: Forgot to remove cpu_relax() from processor.h in v8
---
 arch/powerpc/include/asm/processor.h  | 13 ++---
 arch/powerpc/include/asm/vdso/processor.h | 23 +++
 2 files changed, 25 insertions(+), 11 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/processor.h

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa..c1ba9c8d9b90 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -6,6 +6,8 @@
  * Copyright (C) 2001 PPC 64 Team, IBM Corp
  */
 
+#include 
+
 #include 
 
 #ifdef CONFIG_VSX
@@ -63,14 +65,6 @@ extern int _chrp_type;
 
 #endif /* defined(__KERNEL__) && defined(CONFIG_PPC32) */
 
-/* Macros for adjusting thread priority (hardware multi-threading) */
-#define HMT_very_low()   asm volatile("or 31,31,31   # very low priority")
-#define HMT_low()   asm volatile("or 1,1,1  # low priority")
-#define HMT_medium_low() asm volatile("or 6,6,6  # medium low priority")
-#define HMT_medium()asm volatile("or 2,2,2  # medium priority")
-#define HMT_medium_high() asm volatile("or 5,5,5  # medium high priority")
-#define HMT_high()  asm volatile("or 3,3,3  # high priority")
-
 #ifdef __KERNEL__
 
 #ifdef CONFIG_PPC64
@@ -350,7 +344,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 }
 
 #ifdef CONFIG_PPC64
-#define cpu_relax()do { HMT_low(); HMT_medium(); barrier(); } while (0)
 
 #define spin_begin()   HMT_low()
 
@@ -369,8 +362,6 @@ do {
\
}   \
 } while (0)
 
-#else
-#define cpu_relax()barrier()
 #endif
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
diff --git a/arch/powerpc/include/asm/vdso/processor.h 
b/arch/powerpc/include/asm/vdso/processor.h
new file mode 100644
index ..39b9beace9ca
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/processor.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_VDSO_PROCESSOR_H
+#define __ASM_VDSO_PROCESSOR_H
+
+#ifndef __ASSEMBLY__
+
+/* Macros for adjusting thread priority (hardware multi-threading) */
+#define HMT_very_low() asm volatile("or 31, 31, 31 # very low 
priority")
+#define HMT_low()  asm volatile("or 1, 1, 1# low priority")
+#define HMT_medium_low()   asm volatile("or 6, 6, 6# medium low 
priority")
+#define HMT_medium()   asm volatile("or 2, 2, 2# medium 
priority")
+#define HMT_medium_high()  asm volatile("or 5, 5, 5# medium high 
priority")
+#define HMT_high() asm volatile("or 3, 3, 3# high 
priority")
+
+#ifdef CONFIG_PPC64
+#define cpu_relax()do { HMT_low(); HMT_medium(); barrier(); } while (0)
+#else
+#define cpu_relax()barrier()
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_PROCESSOR_H */
-- 
2.25.0



Re: [PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()

2020-08-04 Thread Christophe Leroy




On 07/16/2020 02:59 AM, Michael Ellerman wrote:

Christophe Leroy  writes:

The VDSO datapage and the text pages are always located immediately
next to each other, so it can be hardcoded without an indirection
through __kernel_datapage_offset

In order to ease things, move the data page in front like other
arches, that way there is no need to know the size of the library
to locate the data page.

Before:
clock-getres-realtime-coarse:vdso: 714 nsec/call
clock-gettime-realtime-coarse:vdso: 792 nsec/call
clock-gettime-realtime:vdso: 1243 nsec/call

After:
clock-getres-realtime-coarse:vdso: 699 nsec/call
clock-gettime-realtime-coarse:vdso: 784 nsec/call
clock-gettime-realtime:vdso: 1231 nsec/call

Signed-off-by: Christophe Leroy 
---
v7:
- Moved the removal of the tmp param of __get_datapage()
in a subsequent patch
- Included the addition of the offset param to __get_datapage()
in the further preparatory patch
---
  arch/powerpc/include/asm/vdso_datapage.h |  8 ++--
  arch/powerpc/kernel/vdso.c   | 53 
  arch/powerpc/kernel/vdso32/datapage.S|  3 --
  arch/powerpc/kernel/vdso32/vdso32.lds.S  |  7 +---
  arch/powerpc/kernel/vdso64/datapage.S|  3 --
  arch/powerpc/kernel/vdso64/vdso64.lds.S  |  7 +---
  6 files changed, 16 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
b/arch/powerpc/include/asm/vdso_datapage.h
index b9ef6cf50ea5..11886467dfdf 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -118,10 +118,12 @@ extern struct vdso_data *vdso_data;
  
  .macro get_datapage ptr, tmp

bcl 20, 31, .+4
+999:
mflr\ptr
-   addi\ptr, \ptr, (__kernel_datapage_offset - (.-4))@l
-   lwz \tmp, 0(\ptr)
-   add \ptr, \tmp, \ptr
+#if CONFIG_PPC_PAGE_SHIFT > 14
+   addis   \ptr, \ptr, (_vdso_datapage - 999b)@ha
+#endif
+   addi\ptr, \ptr, (_vdso_datapage - 999b)@l
  .endm
  
  #endif /* __ASSEMBLY__ */

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index f38f26e844b6..d33fa22ddbed 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -190,7 +190,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
 * install_special_mapping or the perf counter mmap tracking code
 * will fail to recognise it as a vDSO (since arch_vma_name fails).
 */
-   current->mm->context.vdso_base = vdso_base;
+   current->mm->context.vdso_base = vdso_base + PAGE_SIZE;


I merged this but then realised it breaks the display of the vdso in 
/proc/self/maps.

ie. the vdso vma gets no name:

   # cat /proc/self/maps
   110f9-110fa r-xp  08:03 17021844 
/usr/bin/cat
   110fa-110fb r--p  08:03 17021844 
/usr/bin/cat
   110fb-110fc rw-p 0001 08:03 17021844 
/usr/bin/cat
   12600-12603 rw-p  00:00 0
[heap]
   7fffa879-7fffa87d rw-p  00:00 0
   7fffa87d-7fffa883 r--p  08:03 17521786   
/usr/lib/locale/en_AU.utf8/LC_CTYPE
   7fffa883-7fffa884 r--p  08:03 16958337   
/usr/lib/locale/en_AU.utf8/LC_NUMERIC
   7fffa884-7fffa885 r--p  08:03 8501358
/usr/lib/locale/en_AU.utf8/LC_TIME
   7fffa885-7fffa8ad r--p  08:03 16870886   
/usr/lib/locale/en_AU.utf8/LC_COLLATE
   7fffa8ad-7fffa8ae r--p  08:03 8509433
/usr/lib/locale/en_AU.utf8/LC_MONETARY
   7fffa8ae-7fffa8af r--p  08:03 25383753   
/usr/lib/locale/en_AU.utf8/LC_MESSAGES/SYS_LC_MESSAGES
   7fffa8af-7fffa8b0 r--p  08:03 17521790   
/usr/lib/locale/en_AU.utf8/LC_PAPER
   7fffa8b0-7fffa8b1 r--p  08:03 8501354
/usr/lib/locale/en_AU.utf8/LC_NAME
   7fffa8b1-7fffa8b2 r--p  08:03 8509431
/usr/lib/locale/en_AU.utf8/LC_ADDRESS
   7fffa8b2-7fffa8b3 r--p  08:03 8509434
/usr/lib/locale/en_AU.utf8/LC_TELEPHONE
   7fffa8b3-7fffa8b4 r--p  08:03 17521787   
/usr/lib/locale/en_AU.utf8/LC_MEASUREMENT
   7fffa8b4-7fffa8b5 r--s  08:03 25623315   
/usr/lib64/gconv/gconv-modules.cache
   7fffa8b5-7fffa8d4 r-xp  08:03 25383789   
/usr/lib64/libc-2.30.so
   7fffa8d4-7fffa8d5 r--p 001e 08:03 25383789   
/usr/lib64/libc-2.30.so
   7fffa8d5-7fffa8d6 rw-p 001f 08:03 25383789   
/usr/lib64/libc-2.30.so
   7fffa8d6-7fffa8d7 r--p  08:03 8509432
/usr/lib/locale/en_AU.utf8/LC_IDENTIFICATION
   7fffa8d7-7fffa8d9 r-xp 

Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-04 Thread Christophe Leroy




On 07/15/2020 01:04 AM, Michael Ellerman wrote:

Christophe Leroy  writes:

Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Modify __get_datapage() to take an offset
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions

powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set a stack frame and saves the link register before calling
the C vdso function, retriving the vdso data pointer there is lighter.

...


diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
new file mode 100644
index ..4452897f9bd8
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H
+#define __ASM_VDSO_GETTIMEOFDAY_H
+
+#include 
+
+#ifdef __ASSEMBLY__
+
+.macro cvdso_call funct
+  .cfi_startproc
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+   mflrr0
+  .cfi_register lr, r0
+   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)


This doesn't work for me on ppc64(le) with glibc.

glibc doesn't create a stack frame before making the VDSO call, so the
store of r0 (LR) goes into the caller's frame, corrupting the saved LR,
leading to an infinite loop.

This is an example from a statically built program that calls
clock_gettime():

10030cb0 <__clock_gettime>:
 10030cb0:   0e 10 40 3c lis r2,4110
 10030cb4:   00 7a 42 38 addir2,r2,31232
 10030cb8:   a6 02 08 7c mflrr0
 10030cbc:   ff ff 22 3d addis   r9,r2,-1
 10030cc0:   58 6d 29 39 addir9,r9,27992
 10030cc4:   f0 ff c1 fb std r30,-16(r1)<-- 
redzone store
 10030cc8:   78 23 9e 7c mr  r30,r4
 10030ccc:   f8 ff e1 fb std r31,-8(r1) <-- 
redzone store
 10030cd0:   78 1b 7f 7c mr  r31,r3
 10030cd4:   10 00 01 f8 std r0,16(r1)  <-- 
save LR to caller's frame
 10030cd8:   00 00 09 e8 ld  r0,0(r9)
 10030cdc:   00 00 20 2c cmpdi   r0,0
 10030ce0:   50 00 82 41 beq 10030d30 <__clock_gettime+0x80>
 10030ce4:   a6 03 09 7c mtctr   r0
 10030ce8:   21 04 80 4e bctrl  <-- 
vdso call
 10030cec:   26 00 00 7c mfcrr0
 10030cf0:   00 10 09 74 andis.  r9,r0,4096
 10030cf4:   78 1b 69 7c mr  r9,r3
 10030cf8:   28 00 82 40 bne 10030d20 <__clock_gettime+0x70>
 10030cfc:   b4 07 23 7d extsw   r3,r9
 10030d00:   10 00 01 e8 ld  r0,16(r1)  <-- 
load saved LR, since clobbered by the VDSO
 10030d04:   f0 ff c1 eb ld  r30,-16(r1)
 10030d08:   f8 ff e1 eb ld  r31,-8(r1)
 10030d0c:   a6 03 08 7c mtlrr0 <-- 
restore LR
 10030d10:   20 00 80 4e blr<-- 
jumps to 10030cec


I'm kind of confused how it worked for you on 32-bit.


Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
frame whenever it has anything to same. Here is what I have in libc.so:


000fbb60 <__clock_gettime>:
   fbb60:   94 21 ff e0 stwur1,-32(r1)
   fbb64:   7c 08 02 a6 mflrr0
   fbb68:   48 09 75 c9 bl  193130 <_IO_stdout_+0x24b0>
   fbb6c:   93 c1 00 18 stw r30,24(r1)
   fbb70:   7f c8 02 a6 mflrr30
   fbb74:   90 01 00 24 stw r0,36(r1)
   fbb78:   93 81 00 10 stw r28,16(r1)
   fbb7c:   93 a1 00 14 stw r29,20(r1)
   fbb80:   83 9e fc 98 lwz r28,-872(r30)
   fbb84:   93 e1 00 1c stw r31,28(r1)
   fbb88:   80 1c 00 00 lwz r0,0(r28)
   fbb8c:   83 82 8f f4 lwz r28,-28684(r2)
   fbb90:   7c 7f 1b 78 mr  r31,r3
   fbb94:   7c 00 e2 79 xor.r0,r0,r28
   fbb98:   7c 9d 23 78 mr  r29,r4
   fbb9c:   41 82 00 40 beq fbbdc <__clock_gettime+0x7c>
   fbba0:   7c 09 03 a6 mtctr   r0
   fbba4:   4e 80 04 21 bctrl
   fbba8:   7c 00 00 26 mfcrr0
   fbbac:   74 1c 10 00 andis.  r28,r0,4096
   fbbb0:   40 82 00 24 bne fbbd4 <__clock_gettime+0x74>
   fbbb4:   80 01 00 24 lwz r0,36(r1)
   fbbb8:   83 81 00 10 lwz r28,16(r1)
   fbbbc:   7c 08 03 a6 mtlrr0
   fbbc0:   83 a1 00 14 lwz r29,20(r1)
   fbbc4:   83 c1 00 18 lwz   

Re: [PATCH 2/2] powerpc/topology: Override cpu_smt_mask

2020-08-04 Thread Valentin Schneider


On 04/08/20 11:46, pet...@infradead.org wrote:
> On Tue, Aug 04, 2020 at 09:03:07AM +0530, Srikar Dronamraju wrote:
>> On Power9 a pair of cores can be presented by the firmware as a big-core
>> for backward compatibility reasons, with 4 threads per (small) core and 8
>> threads per big-core. cpu_smt_mask() should generally point to the cpu mask
>> of the (small)core.
>>
>> In order to maintain userspace backward compatibility (with Power8 chips in
>> case of Power9) in enterprise Linux systems, the topology_sibling_cpumask
>> has to be set to big-core. Hence override the default cpu_smt_mask() to be
>> powerpc specific allowing for better scheduling behaviour on Power.
>
> Why does Linux userspace care about this?

Ditto; from [1], a core contains CPUs that all share the same L1 (and capacity,
as per SD_SHARE_CPUCAPACITY). So IMO it makes perfect sense to have a first
domain spanning L1, and its parent spanning L2 - that means
topology_sibling_cpumask *itself* should span a single core rather than a
pair.

[1]: https://lkml.kernel.org/r/jhjr1sviswg.mog...@arm.com


Re: [PATCH 2/2] powerpc/topology: Override cpu_smt_mask

2020-08-04 Thread peterz
On Tue, Aug 04, 2020 at 09:03:07AM +0530, Srikar Dronamraju wrote:
> On Power9 a pair of cores can be presented by the firmware as a big-core
> for backward compatibility reasons, with 4 threads per (small) core and 8
> threads per big-core. cpu_smt_mask() should generally point to the cpu mask
> of the (small)core.
> 
> In order to maintain userspace backward compatibility (with Power8 chips in
> case of Power9) in enterprise Linux systems, the topology_sibling_cpumask
> has to be set to big-core. Hence override the default cpu_smt_mask() to be
> powerpc specific allowing for better scheduling behaviour on Power.

Why does Linux userspace care about this?


Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask

2020-08-04 Thread peterz
On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> most architectures. One of the users of cpu_smt_mask(), would be to
> identify idle-cores. On Power9, a pair of cores can be presented by the
> firmware as a big-core for backward compatibility reasons.
> 
> In order to maintain userspace backward compatibility with previous
> versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> is option to the firmware to advertise a pair of SMT4 cores as a fused
> cores (referred to as the big_core mode in the Linux Kernel). On Power9
> this pair shares the L2 cache as well. However, from the scheduler's point
> of view, a core should be determined by SMT4. The load-balancer already
> does this. Hence allow PowerPc architecture to override the default
> cpu_smt_mask() to point to the SMT4 cores in a big_core mode.

I'm utterly confused.

Why can't you set your regular siblings mask to the smt4 thing? Who
cares about the compat stuff, I thought that was an LPAR/AIX thing.


Re: [PATCH 2/2] lockdep: warn on redundant or incorrect irq state changes

2020-08-04 Thread Alexey Kardashevskiy



On 23/07/2020 20:56, Nicholas Piggin wrote:
> With the previous patch, lockdep hardirq state changes should not be
> redundant. Softirq state changes already follow that pattern.
> 
> So warn on unexpected enable-when-enabled or disable-when-disabled
> conditions, to catch possible errors or sloppy patterns that could
> lead to similar bad behavior due to NMIs etc.


This one does not apply anymore as Peter's changes went in already but
this one is not really a fix so I can live with that. Did 1/2 go
anywhere? Thanks,


> 
> Signed-off-by: Nicholas Piggin 
> ---
>  kernel/locking/lockdep.c   | 80 +-
>  kernel/locking/lockdep_internals.h |  4 --
>  kernel/locking/lockdep_proc.c  | 10 +---
>  3 files changed, 35 insertions(+), 59 deletions(-)
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 29a8de4c50b9..138458fb2234 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3649,15 +3649,8 @@ void lockdep_hardirqs_on_prepare(unsigned long ip)
>   if (unlikely(!debug_locks || current->lockdep_recursion))
>   return;
>  
> - if (unlikely(current->hardirqs_enabled)) {
> - /*
> -  * Neither irq nor preemption are disabled here
> -  * so this is racy by nature but losing one hit
> -  * in a stat is not a big deal.
> -  */
> - __debug_atomic_inc(redundant_hardirqs_on);
> + if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled))
>   return;
> - }
>  
>   /*
>* We're enabling irqs and according to our state above irqs weren't
> @@ -3695,15 +3688,8 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
>   if (unlikely(!debug_locks || curr->lockdep_recursion))
>   return;
>  
> - if (curr->hardirqs_enabled) {
> - /*
> -  * Neither irq nor preemption are disabled here
> -  * so this is racy by nature but losing one hit
> -  * in a stat is not a big deal.
> -  */
> - __debug_atomic_inc(redundant_hardirqs_on);
> + if (DEBUG_LOCKS_WARN_ON(curr->hardirqs_enabled))
>   return;
> - }
>  
>   /*
>* We're enabling irqs and according to our state above irqs weren't
> @@ -3738,6 +3724,9 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
>   if (unlikely(!debug_locks || curr->lockdep_recursion))
>   return;
>  
> + if (DEBUG_LOCKS_WARN_ON(!curr->hardirqs_enabled))
> + return;
> +
>   /*
>* So we're supposed to get called after you mask local IRQs, but for
>* some reason the hardware doesn't quite think you did a proper job.
> @@ -3745,17 +3734,13 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
>   if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
>   return;
>  
> - if (curr->hardirqs_enabled) {
> - /*
> -  * We have done an ON -> OFF transition:
> -  */
> - curr->hardirqs_enabled = 0;
> - curr->hardirq_disable_ip = ip;
> - curr->hardirq_disable_event = ++curr->irq_events;
> - debug_atomic_inc(hardirqs_off_events);
> - } else {
> - debug_atomic_inc(redundant_hardirqs_off);
> - }
> + /*
> +  * We have done an ON -> OFF transition:
> +  */
> + curr->hardirqs_enabled = 0;
> + curr->hardirq_disable_ip = ip;
> + curr->hardirq_disable_event = ++curr->irq_events;
> + debug_atomic_inc(hardirqs_off_events);
>  }
>  EXPORT_SYMBOL_GPL(lockdep_hardirqs_off);
>  
> @@ -3769,6 +3754,9 @@ void lockdep_softirqs_on(unsigned long ip)
>   if (unlikely(!debug_locks || current->lockdep_recursion))
>   return;
>  
> + if (DEBUG_LOCKS_WARN_ON(curr->softirqs_enabled))
> + return;
> +
>   /*
>* We fancy IRQs being disabled here, see softirq.c, avoids
>* funny state and nesting things.
> @@ -3776,11 +3764,6 @@ void lockdep_softirqs_on(unsigned long ip)
>   if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
>   return;
>  
> - if (curr->softirqs_enabled) {
> - debug_atomic_inc(redundant_softirqs_on);
> - return;
> - }
> -
>   current->lockdep_recursion++;
>   /*
>* We'll do an OFF -> ON transition:
> @@ -3809,26 +3792,26 @@ void lockdep_softirqs_off(unsigned long ip)
>   if (unlikely(!debug_locks || current->lockdep_recursion))
>   return;
>  
> + if (DEBUG_LOCKS_WARN_ON(!curr->softirqs_enabled))
> + return;
> +
>   /*
>* We fancy IRQs being disabled here, see softirq.c
>*/
>   if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
>   return;
>  
> - if (curr->softirqs_enabled) {
> - /*
> -  * We have done an ON -> OFF transition:
> -  */
> - curr->softirqs_enabled = 0;
> - 

Re: [RFC PATCH 2/2] powerpc/powernv/cpufreq: Don't assume chip id is same as Linux node id

2020-08-04 Thread Gautham R Shenoy
On Fri, Jul 31, 2020 at 04:52:05PM +0530, Aneesh Kumar K.V wrote:
> On PowerNV platforms we always have 1:1 mapping between chip ID and
> firmware group id. Use the helper to convert firmware group id to
> node id instead of directly using chip ID as Linux node id.
> 
> NOTE: This doesn't have any functional change. On PowerNV platforms
> we continue to have 1:1 mapping between firmware group id and
> Linux node id.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index 8646eb197cd9..ca82b6ac0989 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include  /* Required for cpu_sibling_mask() in UP configs */
>  #include 
> +#include 
>  #include 
> 
>  #define POWERNV_MAX_PSTATES_ORDER  8
> @@ -1069,8 +1070,14 @@ static int init_chip_info(void)
>   }
> 
>   for (i = 0; i < nr_chips; i++) {
> + unsigned int nid;
> +
>   chips[i].id = chip[i];
> - cpumask_copy([i].mask, cpumask_of_node(chip[i]));
> + /*
> +  * On powervn platforms firmware group id is same as chipd id.

But doesn't hurt to be safe :-)

Reviewed-by: Gautham R. Shenoy 

> +  */
> + nid = firmware_group_id_to_nid(chip[i]);
> + cpumask_copy([i].mask, cpumask_of_node(nid));
>   INIT_WORK([i].throttle, powernv_cpufreq_work_fn);
>   for_each_cpu(cpu, [i].mask)
>   per_cpu(chip_info, cpu) =  [i];
> -- 
> 2.26.2
> 


Re: [PATCH v2] powerpc: Warn about use of smt_snooze_delay

2020-08-04 Thread Gautham R Shenoy
Hi Joel,

On Tue, Jun 30, 2020 at 11:29:35AM +0930, Joel Stanley wrote:
> It's not done anything for a long time. Save the percpu variable, and
> emit a warning to remind users to not expect it to do anything.
> 
> Fixes: 3fa8cad82b94 ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.")
> Cc: sta...@vger.kernel.org # v3.14
> Signed-off-by: Joel Stanley 


Sorry I missed this v2.

The patch looks good to me.

Acked-by: Gautham R. Shenoy 

> --
> v2:
>  Use pr_warn instead of WARN
>  Reword and print proccess name with pid in message
>  Leave CPU_FTR_SMT test in
>  Add Fixes line
> 
> mpe, if you don't agree then feel free to drop the cc stable.
> 
> Testing 'ppc64_cpu --smt=off' on a 24 core / 4 SMT system it's quite noisy
> as the online/offline loop that ppc64_cpu runs is slow.
> 
> This could be fixed by open coding pr_warn_ratelimit with the ratelimit
> parameters tweaked if someone was concerned. I'll leave that to someone
> else as a future enhancement.
> 
> [  237.642088][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  237.642175][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  237.642261][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  237.642345][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  237.642430][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  237.642516][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  237.642625][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  237.642709][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  237.642793][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  237.642878][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  254.264030][ T1197] store_smt_snooze_delay: 14 callbacks suppressed
> [  254.264033][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  254.264048][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  254.264062][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  254.264075][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  254.264089][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  254.264103][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  254.264116][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  254.264130][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  254.264143][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> [  254.264157][ T1197] ppc64_cpu (1197) used unsupported smt_snooze_delay, 
> this has no effect
> 
> Signed-off-by: Joel Stanley 
> ---
>  arch/powerpc/kernel/sysfs.c | 41 +++--
>  1 file changed, 16 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 571b3259697e..ba6d4cee19ef 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -32,29 +32,26 @@
> 
>  static DEFINE_PER_CPU(struct cpu, cpu_devices);
> 
> -/*
> - * SMT snooze delay stuff, 64-bit only for now
> - */
> -
>  #ifdef CONFIG_PPC64
> 
> -/* Time in microseconds we delay before sleeping in the idle loop */
> -static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
> +/*
> + * Snooze delay has not been hooked up since 3fa8cad82b94 
> ("powerpc/pseries/cpuidle:
> + * smt-snooze-delay cleanup.") and has been broken even longer. As was 
> foretold in
> + * 2014:
> + *
> + *  "ppc64_util currently utilises it. Once we fix ppc64_util, propose to 
> clean
> + *  up the kernel code."
> + *
> + * At some point in the future this code should be removed.
> + */
> 
>  static ssize_t store_smt_snooze_delay(struct device *dev,
> struct device_attribute *attr,
> const char *buf,
> size_t count)
>  {
> - struct cpu *cpu = container_of(dev, struct cpu, dev);
> - ssize_t ret;
> - long snooze;
> -
> - ret = sscanf(buf, "%ld", );
> - if (ret != 1)
> - return -EINVAL;
> -
> - per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
> + pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this 
> has no effect\n",
> + current->comm, current->pid);
>   return count;
>  }
> 
> @@ -62,9 +59,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
>struct device_attribute *attr,
>char *buf)
>  {
> - struct cpu *cpu 

Re: [RFC PATCH 1/2] powerpc/numa: Introduce logical numa id

2020-08-04 Thread Srikar Dronamraju
* Aneesh Kumar K.V  [2020-08-02 19:51:41]:
> Srikar Dronamraju  writes:
> > * Aneesh Kumar K.V  [2020-07-31 16:49:14]:
> >
> >
> > If its just to eliminate node 0, then we have 2 other probably better
> > solutions.
> > 1. Dont mark node 0 as spl (currently still in mm-tree and a result in
> > linux-next)
> > 2. powerpc specific: explicitly clear node 0 during numa bringup.
> >
> 
> 
> I am not sure I consider them better. But yes, those patches are good
> and also resolves the node 0 initialization when the firmware didn't
> indicate the presence of such a node.
> 
> This patch in addition make sure that we get the same topolgy report
> across reboot on a virtualized partitions as longs as the cpu/memory
> ratio per powervm domains remain the same. This should also help to
> avoid confusion after an LPM migration once we start applying topology
> updates. 
> 

What do we mean by cpu/memory ratio. The topology across reboot would have
changed only if PowerVM would have allocated resources differently by
scrambling/unscrambling. We are no more processing topology updates at
runtime. As far as I know, after LPM, the source topology is maintained.

> >> This can  be resolved by mapping the firmware provided group id to a 
> >> logical Linux
> >> NUMA id. In this patch, we do this only for pseries platforms considering 
> >> the
> >
> > On PowerVM, as you would know the nid is already a logical or a flattened
> > chip-id and not the actual hardware chip-id.
> 
> Yes. But then they are derived based on PowerVM resources AKA domains.
> Now based on the available resource on a system, we could end up with
> different node numbers with same toplogy across reboots. Making it
> logical at OS level prevent that. 

The above statement kind of gives an impression, that topology changes
across every reboot.  We only end up with different node numbers if and only
if the underlying topology has changed and that case is very rare. Or am I
missing something?

> 
> >> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> >> index e437a9ac4956..6c659aada55b 100644
> >> --- a/arch/powerpc/mm/numa.c
> >> +++ b/arch/powerpc/mm/numa.c
> >> @@ -221,25 +221,51 @@ static void initialize_distance_lookup_table(int nid,
> >>}
> >>  }
> >> 
> >> +static u32 nid_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] =  
> >> NUMA_NO_NODE};
> >> +
> >> +int firmware_group_id_to_nid(int firmware_gid)
> >> +{
> >> +  static int last_nid = 0;
> >> +
> >> +  /*
> >> +   * For PowerNV we don't change the node id. This helps to avoid
> >> +   * confusion w.r.t the expected node ids. On pseries, node numbers
> >> +   * are virtualized. Hence do logical node id for pseries.
> >> +   */
> >> +  if (!firmware_has_feature(FW_FEATURE_LPAR))
> >> +  return firmware_gid;
> >> +
> >> +  if (firmware_gid ==  -1)
> >> +  return NUMA_NO_NODE;
> >> +
> >> +  if (nid_map[firmware_gid] == NUMA_NO_NODE)
> >> +  nid_map[firmware_gid] = last_nid++;
> >
> > How do we ensure 2 simultaneous firmware_group_id_to_nid() calls dont end up
> > at this place in parallel?
> 
> Do we have a code path where we do that? All the node id init should
> happen early and there should not be two cpus doing node init at the
> same time. I might be mistaken. Can you point to the code path where you
> expect this to be called in parallel?
> 

associativity_to_nid gets called the first time a cpu is being made present
from offline. So it need not be in boot path. We may to verify if cpu
hotplug, dlpar, operations are synchronized. For example a memory hotadd and
cpu hotplug are they synchronized? I am not sure if they are synchronized at
this time.

> >
> >> +
> >> +  return nid_map[firmware_gid];
> >> +}
> >> +
> >>  /* Returns nid in the range [0..MAX_NUMNODES-1], or -1 if no useful numa
> >>   * info is found.
> >>   */
> >>  static int associativity_to_nid(const __be32 *associativity)
> >>  {
> >>int nid = NUMA_NO_NODE;
> >> +  int firmware_gid = -1;
> >> 
> >>if (!numa_enabled)
> >>goto out;
> >> 
> >>if (of_read_number(associativity, 1) >= min_common_depth)
> >> -  nid = of_read_number([min_common_depth], 1);
> >> +  firmware_gid = of_read_number([min_common_depth], 
> >> 1);
> >> 
> >>/* POWER4 LPAR uses 0x as invalid node */
> >> -  if (nid == 0x || nid >= MAX_NUMNODES)
> >> -  nid = NUMA_NO_NODE;
> >> +  if (firmware_gid == 0x || firmware_gid >= MAX_NUMNODES)
> >> +  firmware_gid = -1;
> >
> > Lets assume two or more invocations of associativity_to_nid for the same
> > associativity, end up with -1, In each case aren't giving different
> > nids?
> 
> 
> I didn't quiet get the comment here. But I assume you are indicating the
> same one you mentioned above?
> 

No its not related to the above comment.
We are incrementing the nid_map table for every unique firmware_gid or for
every -1 (aka invalid associativities). If there are sufficiently large
number of associativities that 

Re: [PATCH] ASoC: fsl_sai: Clean code for synchronize mode

2020-08-04 Thread Nicolin Chen
On Tue, Aug 04, 2020 at 12:03:46AM -0700, Nicolin Chen wrote:
> On Tue, Aug 04, 2020 at 12:22:53PM +0800, Shengjiu Wang wrote:
> 
> > > > Btw, do we need similar change for TRIGGER_STOP?
> > >
> > > This is a good question. It is better to do change for STOP,
> > > but I am afraid that there is no much test to guarantee the result.
> 
> > Maybe we can do this change for STOP.
> 
> The idea looks good to me...(check inline comments)
>  
> > diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
> > index 1c0e06bb3783..6e4be398eaee 100644
> > --- a/sound/soc/fsl/fsl_sai.c
> > +++ b/sound/soc/fsl/fsl_sai.c
> > @@ -517,6 +517,37 @@ static int fsl_sai_hw_free(struct
> > snd_pcm_substream *substream,
> > return 0;
> >  }
> > 
> > +static void fsl_sai_config_disable(struct fsl_sai *sai, bool tx)
> > +{
> > +   unsigned int ofs = sai->soc_data->reg_offset;
> > +   u32 xcsr, count = 100;
> > +
> > +   regmap_update_bits(sai->regmap, FSL_SAI_xCSR(tx, ofs),
> > +  FSL_SAI_CSR_TERE, 0);
> > +
> > +   /* TERE will remain set till the end of current frame */
> > +   do {
> > +   udelay(10);
> > +   regmap_read(sai->regmap, FSL_SAI_xCSR(tx, ofs), );
> > +   } while (--count && xcsr & FSL_SAI_CSR_TERE);
> > +
> > +   regmap_update_bits(sai->regmap, FSL_SAI_xCSR(tx, ofs),
> > +  FSL_SAI_CSR_FR, FSL_SAI_CSR_FR);
> > +
> > +   /*
> > +* For sai master mode, after several open/close sai,
> > +* there will be no frame clock, and can't recover
> > +* anymore. Add software reset to fix this issue.
> > +* This is a hardware bug, and will be fix in the
> > +* next sai version.
> > +*/
> > +   if (!sai->is_slave_mode) {
> > +   /* Software Reset for both Tx and Rx */
> 
> Remove "for both Tx and Rx"
> 
> > /* Check if the opposite FRDE is also disabled */
> > regmap_read(sai->regmap, FSL_SAI_xCSR(!tx, ofs), );
> > +   if (sai->synchronous[tx] && !sai->synchronous[!tx] && 
> > !(xcsr & FSL_SAI_CSR_FRDE))
> > +   fsl_sai_config_disable(sai, !tx);
> 
> > +   if (sai->synchronous[tx] || !sai->synchronous[!tx] || 
> > !(xcsr & FSL_SAI_CSR_FRDE))
> > +   fsl_sai_config_disable(sai, tx);
> 
> The first "||" should probably be "&&".
> 
> The trigger() logic is way more complicated than a simple cleanup
> in my opinion. Would suggest to split RMR part out of this change.
> 
> And for conditions like "sync[tx] && !sync[!tx]", it'd be better
> to have a helper function to improve readability:
> 
> /**
>  * fsl_sai_dir_is_synced - Check if stream is synced by the opposite stream
>  *
>  * SAI supports synchronous mode using bit/frame clocks of either 
> Transmitter's
>  * or Receiver's for both streams. This function is used to check if clocks of
>  * current stream's are synced by the opposite stream.

Aww..this should be a generic function, so drop "current":

  * or Receiver's for both streams. This function is used to check if clocks of
  * the stream's are synced by the opposite stream.

>  *
>  * @sai: SAI context
>  * @dir: direction of current stream

Ditto:
   * @dir: stream direction

Thanks.
-
   
>  */
> static inline bool fsl_sai_dir_is_synced(struct fsl_sai *sai, int dir)
> {
>   int adir = (dir == TX) ? RX : TX;
> 
>   /* current dir in async mode while opposite dir in sync mode */
>   return !sai->synchronous[dir] && sai->synchronous[adir];
> }
> 
> Then add more comments in trigger:
> 
> static ...trigger()
> {
>   bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
>   int adir = tx ? RX : TX;
>   int dir = tx ? TX : RX;
> 
>   // 
>   {
>   // ...
> 
>   /* Check if the opposite FRDE is also disabled */
>   regmap_read(sai->regmap, FSL_SAI_xCSR(!tx, ofs), );
> 
>   /*
>* If opposite stream provides clocks for synchronous mode and
>* it is inactive, disable it before disabling the current one
>*/
>   if (fsl_sai_dir_is_synced(adir) && !(xcsr & FSL_SAI_CSR_FRDE))
>   fsl_sai_config_disable(sai, adir);
> 
>   /*
>* Disable current stream if either of:
>* 1. current stream doesn't provide clocks for synchronous mode
>* 2. current stream provides clocks for synchronous mode but no
>*more stream is active.
>*/
>   if (!fsl_sai_dir_is_synced(dir) || !(xcsr & FSL_SAI_CSR_FRDE))
>   fsl_sai_config_disable(sai, dir);
> 
>   // ...
>   }
>   // 
> }
> 
> Note, in fsl_sai_config_disable(sai, dir):
>   bool tx = dir == TX;
> 
> Above all, I am just drafting, so please test it thoroughly :)


Re: [PATCH 1/6] powerpc/powernv/smp: Fix spurious DBG() warning

2020-08-04 Thread Oliver O'Halloran
On Tue, Aug 4, 2020 at 12:07 PM Joel Stanley  wrote:
>
> Messy:
>
> $ git grep "define DBG(" arch/powerpc/ |grep -v print
> arch/powerpc/kernel/crash_dump.c:#define DBG(fmt...)
> arch/powerpc/kernel/iommu.c:#define DBG(...)
> arch/powerpc/kernel/legacy_serial.c:#define DBG(fmt...) do { } while(0)
> arch/powerpc/kernel/prom.c:#define DBG(fmt...)
> arch/powerpc/kernel/setup-common.c:#define DBG(fmt...)
> arch/powerpc/kernel/setup_32.c:#define DBG(fmt...)
> arch/powerpc/kernel/smp.c:#define DBG(fmt...)
> arch/powerpc/kernel/vdso.c:#define DBG(fmt...)
> arch/powerpc/kvm/book3s_hv_rm_xive.c:#define DBG(fmt...) do { } while(0)
> arch/powerpc/mm/book3s64/hash_utils.c:#define DBG(fmt...)
> arch/powerpc/platforms/83xx/mpc832x_mds.c:#define DBG(fmt...)
> arch/powerpc/platforms/83xx/mpc832x_rdb.c:#define DBG(fmt...)
> arch/powerpc/platforms/83xx/mpc836x_mds.c:#define DBG(fmt...)
> arch/powerpc/platforms/85xx/mpc85xx_ds.c:#define DBG(fmt, args...)
> arch/powerpc/platforms/85xx/mpc85xx_mds.c:#define DBG(fmt...)
> arch/powerpc/platforms/85xx/mpc85xx_rdb.c:#define DBG(fmt, args...)
> arch/powerpc/platforms/86xx/mpc86xx_hpcn.c:#define DBG(fmt...) do { } while(0)
> arch/powerpc/platforms/cell/setup.c:#define DBG(fmt...)
> arch/powerpc/platforms/cell/smp.c:#define DBG(fmt...)
> arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c:#define DBG(fmt...)
> do { } while(0)
> arch/powerpc/platforms/maple/pci.c:#define DBG(x...)
> arch/powerpc/platforms/maple/setup.c:#define DBG(fmt...)
> arch/powerpc/platforms/maple/time.c:#define DBG(x...)
> arch/powerpc/platforms/powermac/bootx_init.c:#define DBG(fmt...) do { } 
> while(0)
> arch/powerpc/platforms/powermac/feature.c:#define DBG(fmt...)
> arch/powerpc/platforms/powermac/low_i2c.c:#define DBG(x...) do {\
> arch/powerpc/platforms/powermac/low_i2c.c:#define DBG(x...)
> arch/powerpc/platforms/powermac/nvram.c:#define DBG(x...)
> arch/powerpc/platforms/powermac/pci.c:#define DBG(x...)
> arch/powerpc/platforms/powermac/pfunc_base.c:#define DBG(fmt...)
> arch/powerpc/platforms/powermac/pfunc_core.c:#define DBG(fmt...)
> arch/powerpc/platforms/powermac/smp.c:#define DBG(fmt...)
> arch/powerpc/platforms/powermac/time.c:#define DBG(x...)
> arch/powerpc/platforms/powernv/smp.c:#define DBG(fmt...)
> arch/powerpc/sysdev/dart_iommu.c:#define DBG(...)
> arch/powerpc/sysdev/ge/ge_pic.c:#define DBG(fmt...) do { } while (0)
> arch/powerpc/sysdev/mpic.c:#define DBG(fmt...)
> arch/powerpc/sysdev/tsi108_dev.c:#define DBG(fmt...) do { } while(0)
> arch/powerpc/sysdev/tsi108_pci.c:#define DBG(x...)

I started off writing a patch that fixed all these too. When I went to
test it I discovered there's a giant pile of other W=1 warnings from
other parts of arch/powerpc/ so I figured I'd start with something
less ambitious.


Re: [merge] Build failure selftest/powerpc/mm/pkey_exec_prot

2020-08-04 Thread Sandipan Das
Hi Michael,

On 04/08/20 6:38 am, Michael Ellerman wrote:
> Sandipan Das  writes:
>> On 03/08/20 4:32 pm, Michael Ellerman wrote:
>>> Sachin Sant  writes:
> On 02-Aug-2020, at 10:58 PM, Sandipan Das  wrote:
> On 02/08/20 4:45 pm, Sachin Sant wrote:
>> pkey_exec_prot test from linuxppc merge branch (3f68564f1f5a) fails to
>> build due to following error:
>>
>> gcc -std=gnu99 -O2 -Wall -Werror 
>> -DGIT_VERSION='"v5.8-rc7-1276-g3f68564f1f5a"' 
>> -I/home/sachin/linux/tools/testing/selftests/powerpc/include  -m64
>> pkey_exec_prot.c 
>> /home/sachin/linux/tools/testing/selftests/kselftest_harness.h 
>> /home/sachin/linux/tools/testing/selftests/kselftest.h ../harness.c 
>> ../utils.c  -o 
>> /home/sachin/linux/tools/testing/selftests/powerpc/mm/pkey_exec_prot
>> In file included from pkey_exec_prot.c:18:
>> /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:34: 
>> error: "SYS_pkey_mprotect" redefined [-Werror]
>> #define SYS_pkey_mprotect 386
>>
>> In file included from /usr/include/sys/syscall.h:31,
>> from 
>> /home/sachin/linux/tools/testing/selftests/powerpc/include/utils.h:47,
>> from 
>> /home/sachin/linux/tools/testing/selftests/powerpc/include/pkeys.h:12,
>> from pkey_exec_prot.c:18:
>> /usr/include/bits/syscall.h:1583: note: this is the location of the 
>> previous definition
>> # define SYS_pkey_mprotect __NR_pkey_mprotect
>>
>> commit 128d3d021007 introduced this error.
>> selftests/powerpc: Move pkey helpers to headers
>>
>> Possibly the # defines for sys calls can be retained in pkey_exec_prot.c 
>> or
>>
>
> I am unable to reproduce this on the latest merge branch (HEAD at 
> f59195f7faa4).
> I don't see any redefinitions in pkey_exec_prot.c either.
>

 I can still see this problem on latest merge branch.
 I have following gcc version

 gcc version 8.3.1 20191121
>>>
>>> What libc version? Or just the distro & version?
>>
>> Sachin observed this on RHEL 8.2 with glibc-2.28.
>> I couldn't reproduce it on Ubuntu 20.04 and Fedora 32 and both these distros
>> are using glibc-2.31.
> 
> OK odd. Usually it's newer glibc that hits this problem.
> 
> I guess on RHEL 8.2 we're getting the asm-generic version? But that
> would be quite wrong if that's what's happening.
> 

If I let GCC dump all the headers that are being used for the source file, I 
always
see syscall.h being included on the RHEL 8.2 system. That is the header with the
conflicting definition.

  $ cd tools/testing/selftests/powerpc/mm
  $ gcc -H -std=gnu99 -O2 -Wall -Werror 
-DGIT_VERSION='"v5.8-rc7-1456-gf59195f7faa4-dirty"' \
-I../include -m64 pkey_exec_prot.c ../../kselftest_harness.h 
../../kselftest.h ../harness.c ../utils.c \
-o pkey_exec_prot 2>&1 | grep syscall

On Ubuntu 20.04 and Fedora 32, grep doesn't find any matching text.
On RHEL 8.2, it shows the following.
  ... /usr/include/sys/syscall.h
   /usr/include/bits/syscall.h
  In file included from /usr/include/sys/syscall.h:31,
  /usr/include/bits/syscall.h:1583: note: this is the location of the previous 
definition
  In file included from /usr/include/sys/syscall.h:31,
  /usr/include/bits/syscall.h:1575: note: this is the location of the previous 
definition
  In file included from /usr/include/sys/syscall.h:31,
  /usr/include/bits/syscall.h:1579: note: this is the location of the previous 
definition
  /usr/include/bits/syscall.h
  .. /usr/include/sys/syscall.h
  ... /usr/include/bits/syscall.h
  /usr/include/bits/syscall.h
  .. /usr/include/sys/syscall.h
  ... /usr/include/bits/syscall.h
  /usr/include/bits/syscall.h

So utils.h is also including /usr/include/sys/syscall.h for glibc versions 
older than 2.30
because of commit 743f3544fffb ("selftests/powerpc: Add wrapper for gettid") :)

[...]
. ../include/pkeys.h
[...]
.. ../include/utils.h
[...]
... /usr/include/sys/syscall.h
 /usr/include/asm/unistd.h
 /usr/include/bits/syscall.h
In file included from pkey_exec_prot.c:18:
../include/pkeys.h:34: error: "SYS_pkey_mprotect" redefined [-Werror]
 #define SYS_pkey_mprotect 386

In file included from /usr/include/sys/syscall.h:31,
 from ../include/utils.h:47,
 from ../include/pkeys.h:12,
 from pkey_exec_prot.c:18:
/usr/include/bits/syscall.h:1583: note: this is the location of the previous 
definition
 # define SYS_pkey_mprotect __NR_pkey_mprotect
[...]


- Sandipan