Re: [PATCH v1 04/10] vfio/type1: Prepare is_invalid_reserved_pfn() for PG_reserved changes

2019-11-07 Thread David Hildenbrand

On 08.11.19 06:09, Dan Williams wrote:

On Thu, Nov 7, 2019 at 2:07 PM David Hildenbrand  wrote:


On 07.11.19 19:22, David Hildenbrand wrote:




On 07.11.2019 at 16:40, Dan Williams wrote:

On Thu, Oct 24, 2019 at 5:12 AM David Hildenbrand  wrote:


Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
change that.

KVM has this weird use case that you can map anything from /dev/mem
into the guest. pfn_valid() is not a reliable check whether the memmap
was initialized and can be touched. pfn_to_online_page() makes sure
that we have an initialized memmap (and don't have ZONE_DEVICE memory).

Rewrite is_invalid_reserved_pfn() similar to kvm_is_reserved_pfn() to make
sure the function produces the same result once we stop setting ZONE_DEVICE
pages PG_reserved.

Cc: Alex Williamson 
Cc: Cornelia Huck 
Signed-off-by: David Hildenbrand 
---
drivers/vfio/vfio_iommu_type1.c | 10 --
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ada8e6cdb88..f8ce8c408ba8 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -299,9 +299,15 @@ static int vfio_lock_acct(struct vfio_dma *dma, long 
npage, bool async)
   */
static bool is_invalid_reserved_pfn(unsigned long pfn)
{
-   if (pfn_valid(pfn))
-   return PageReserved(pfn_to_page(pfn));
+   struct page *page = pfn_to_online_page(pfn);


Ugh, I just realized this is not a safe conversion until
pfn_to_online_page() is moved over to subsection granularity. As it
stands it will return true for any ZONE_DEVICE pages that share a
section with boot memory.


That should not happen right now and I commented back when you introduced 
subsection support that I don’t want to have ZONE_DEVICE mixed with online 
pages in a section. Having memory block devices that partially span ZONE_DEVICE 
would be ... really weird. With something like pfn_active() - as discussed - we 
could at least make this check work - but I am not sure if we really want to go 
down that path. In the worst case, some MB of RAM are lost ... I guess this 
needs more thought.



I just realized the "boot memory" part. Is that a real thing? IOW, can
we have ZONE_DEVICE falling into a memory block (with holes)? I somewhat
have doubts that this would work ...


One of the real world failure cases that started the subsection effort
is that Persistent Memory collides with System RAM on a 64MB boundary
on shipping platforms. System RAM ends on a 64MB boundary and due to a
lack of memory controller resources PMEM is mapped contiguously at the
end of that boundary. Some more details in the subsection cover letter
/ changelogs [1] [2]. It's not sufficient to just lose some memory,
that's the broken implementation that led to the subsection work
because the lost memory may change from one boot to the next and
software can't reliably inject a padding that conforms to the x86
128MB section constraint.


Thanks, I thought it was mostly for weird alignment where other parts of 
the section are basically "holes" and not memory.


Yes, it is a real bug that ZONE_DEVICE pages fall into sections that are 
marked SECTION_IS_ONLINE.




Suffice to say I think we need your pfn_active() to get subsection
granularity pfn_to_online_page() before PageReserved() can be removed.


I agree that we have to fix this. I don't like ZONE_DEVICE pages falling 
into memory device blocks (e.g., cannot get offlined), but I guess that 
train is gone :) As long as it's not for memory hotplug, I can most 
probably live with this.


Also, I'd like to get Michal's opinion on this and the pfn_active() 
approach, but I can understand he's busy.


This patch set can wait, I won't be working next week besides 
reading/writing mails either way.


Is anybody looking into the pfn_active() thingy?



[1]: 
https://lore.kernel.org/linux-mm/156092349300.979959.17603710711957735135.st...@dwillia2-desk3.amr.corp.intel.com/
[2]: 
https://lore.kernel.org/linux-mm/156092354368.979959.6232443923440952359.st...@dwillia2-desk3.amr.corp.intel.com/




--

Thanks,

David / dhildenb



Re: [PATCH 09/10] powerpc: Enable OpenCAPI Storage Class Memory driver on bare metal

2019-11-07 Thread Frederic Barrat




On 25/10/2019 at 06:47, Alastair D'Silva wrote:

From: Alastair D'Silva 

Enable OpenCAPI Storage Class Memory driver on bare metal

Signed-off-by: Alastair D'Silva 
---
  arch/powerpc/configs/powernv_defconfig | 4 
  1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/configs/powernv_defconfig 
b/arch/powerpc/configs/powernv_defconfig
index 6658cceb928c..45c0eff94964 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -352,3 +352,7 @@ CONFIG_KVM_BOOK3S_64=m
  CONFIG_KVM_BOOK3S_64_HV=m
  CONFIG_VHOST_NET=m
  CONFIG_PRINTK_TIME=y
+CONFIG_OCXL_SCM=m
+CONFIG_DEV_DAX=y
+CONFIG_DEV_DAX_PMEM=y
+CONFIG_FS_DAX=y



Is this really the intent or do we want to activate DAX only if 
CONFIG_OCXL_SCM is enabled?


  Fred



RE: [RFC v1 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.

2019-11-07 Thread Ram Pai
On Thu, Nov 07, 2019 at 09:29:54PM +1100, Michael Ellerman wrote:
> Ram Pai  writes:
> > The hypervisor needs to access the contents of the page holding the TCE
> > entries while setting up the TCE entries in the IOMMU's TCE table. For
> > SecureVMs, since this page is encrypted, the hypervisor cannot access
> > valid entries. Share the page with the hypervisor. This ensures that the
> > hypervisor sees the valid entries.
> 
> Can you please give people some explanation of why this is safe. After
> all the point of the Ultravisor is to protect the guest from a malicious
> hypervisor. Giving the hypervisor access to a page of TCEs sounds
> dangerous, so please explain why it's not.

Yes, will do in the next version of the patch.

BTW: this page, which is shared with the hypervisor, contains nothing
but TCE entries. The hypervisor needs to see those entries so that it
can update the TCE table with the correct entries.

Yes, a malicious hypervisor may try to update the TCE table with entries
that point to incorrect memory locations. But doing so will not help the
hypervisor steal any data from those memory locations, because those
memory locations, if accessed by the hypervisor, will only yield
encrypted data.

At most it can lead to denial of service, but not stolen data.

RP



Re: [PATCH 3/3] arch: sembuf.h: make uapi asm/sembuf.h self-contained

2019-11-07 Thread Masahiro Yamada
Hi Andrew,

I think you modified the commit log before applying this patch.
I just noticed a typo.


commit 411865d8dd2c31f56eefc54bc16fabb47e1bfb73
Author: Masahiro Yamada 
Date:   Wed Nov 6 16:07:08 2019 +1100

arch: sembuf.h: make uapi asm/sembuf.h self-contained

Uuserspace cannot compile  due to some missing type
definitions.  For example, building it for x86 fails as follows:



If possible, could you fix up  s/Uuserspace/Userspace/  ?


Thanks.
Masahiro Yamada





On Wed, Oct 30, 2019 at 3:40 PM Masahiro Yamada
 wrote:
>
> The user-space cannot compile  due to some missing type
> definitions. For example, building it for x86 fails as follows:
>
>   CC  usr/include/asm/sembuf.h.s
> In file included from :32:0:
> ./usr/include/asm/sembuf.h:17:20: error: field ‘sem_perm’ has incomplete type
>   struct ipc64_perm sem_perm; /* permissions .. see ipc.h */
> ^~~~
> ./usr/include/asm/sembuf.h:24:2: error: unknown type name ‘__kernel_time_t’
>   __kernel_time_t sem_otime; /* last semop time */
>   ^~~
> ./usr/include/asm/sembuf.h:25:2: error: unknown type name ‘__kernel_ulong_t’
>   __kernel_ulong_t __unused1;
>   ^~~~
> ./usr/include/asm/sembuf.h:26:2: error: unknown type name ‘__kernel_time_t’
>   __kernel_time_t sem_ctime; /* last change time */
>   ^~~
> ./usr/include/asm/sembuf.h:27:2: error: unknown type name ‘__kernel_ulong_t’
>   __kernel_ulong_t __unused2;
>   ^~~~
> ./usr/include/asm/sembuf.h:29:2: error: unknown type name ‘__kernel_ulong_t’
>   __kernel_ulong_t sem_nsems; /* no. of semaphores in array */
>   ^~~~
> ./usr/include/asm/sembuf.h:30:2: error: unknown type name ‘__kernel_ulong_t’
>   __kernel_ulong_t __unused3;
>   ^~~~
> ./usr/include/asm/sembuf.h:31:2: error: unknown type name ‘__kernel_ulong_t’
>   __kernel_ulong_t __unused4;
>   ^~~~
>
> It is just a matter of missing include directive.
>
> Include  to make it self-contained, and add it to
> the compile-test coverage.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/mips/include/uapi/asm/sembuf.h| 2 ++
>  arch/parisc/include/uapi/asm/sembuf.h  | 1 +
>  arch/powerpc/include/uapi/asm/sembuf.h | 2 ++
>  arch/sparc/include/uapi/asm/sembuf.h   | 2 ++
>  arch/x86/include/uapi/asm/sembuf.h | 2 ++
>  arch/xtensa/include/uapi/asm/sembuf.h  | 1 +
>  include/uapi/asm-generic/sembuf.h  | 1 +
>  usr/include/Makefile   | 1 -
>  8 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/mips/include/uapi/asm/sembuf.h 
> b/arch/mips/include/uapi/asm/sembuf.h
> index 60c89e6cb25b..7d135b93bebd 100644
> --- a/arch/mips/include/uapi/asm/sembuf.h
> +++ b/arch/mips/include/uapi/asm/sembuf.h
> @@ -2,6 +2,8 @@
>  #ifndef _ASM_SEMBUF_H
>  #define _ASM_SEMBUF_H
>
> +#include 
> +
>  /*
>   * The semid64_ds structure for the MIPS architecture.
>   * Note extra padding because this structure is passed back and forth
> diff --git a/arch/parisc/include/uapi/asm/sembuf.h 
> b/arch/parisc/include/uapi/asm/sembuf.h
> index 3c31163b1241..b17a2460b184 100644
> --- a/arch/parisc/include/uapi/asm/sembuf.h
> +++ b/arch/parisc/include/uapi/asm/sembuf.h
> @@ -3,6 +3,7 @@
>  #define _PARISC_SEMBUF_H
>
>  #include 
> +#include 
>
>  /*
>   * The semid64_ds structure for parisc architecture.
> diff --git a/arch/powerpc/include/uapi/asm/sembuf.h 
> b/arch/powerpc/include/uapi/asm/sembuf.h
> index 3f60946f77e3..f42c9c3502c7 100644
> --- a/arch/powerpc/include/uapi/asm/sembuf.h
> +++ b/arch/powerpc/include/uapi/asm/sembuf.h
> @@ -2,6 +2,8 @@
>  #ifndef _ASM_POWERPC_SEMBUF_H
>  #define _ASM_POWERPC_SEMBUF_H
>
> +#include 
> +
>  /*
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
> diff --git a/arch/sparc/include/uapi/asm/sembuf.h 
> b/arch/sparc/include/uapi/asm/sembuf.h
> index f3d309c2e1cd..5d7764cdf80f 100644
> --- a/arch/sparc/include/uapi/asm/sembuf.h
> +++ b/arch/sparc/include/uapi/asm/sembuf.h
> @@ -2,6 +2,8 @@
>  #ifndef _SPARC_SEMBUF_H
>  #define _SPARC_SEMBUF_H
>
> +#include 
> +
>  /*
>   * The semid64_ds structure for sparc architecture.
>   * Note extra padding because this structure is passed back and forth
> diff --git a/arch/x86/include/uapi/asm/sembuf.h 
> b/arch/x86/include/uapi/asm/sembuf.h
> index 89de6cd9f0a7..da0464af7aa6 100644
> --- a/arch/x86/include/uapi/asm/sembuf.h
> +++ b/arch/x86/include/uapi/asm/sembuf.h
> @@ -2,6 +2,8 @@
>  #ifndef _ASM_X86_SEMBUF_H
>  #define _ASM_X86_SEMBUF_H
>
> +#include 
> +
>  /*
>   * The semid64_ds structure for x86 architecture.
>   * Note extra padding because this structure is passed back and forth
> diff --git a/arch/xtensa/include/uapi/asm/sembuf.h 
> b/arch/xtensa/include/uapi/asm/sembuf.h
> index 09f348d643f1..3b9cdd406dfe 100644
> --- a/arch/xtensa/include/uapi/asm/sembuf.h
> +++ b/arch/xtensa/include/uapi/asm/sembuf.h
> @@ -22,6 +22,7 @@
>  #define 

RE: [RFC v1 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell.

2019-11-07 Thread Ram Pai
On Thu, Nov 07, 2019 at 09:26:28PM +1100, Michael Ellerman wrote:
> Ram Pai  writes:
> > This enables IOMMU support for pseries Secure VMs.
> 
> Can you give us some more explanation please?

Yes. Will do. 

The simple explanation is: it was a mistake. We should
not have disabled IOMMU ops for secure guests. Disabling them let
us use virtio devices, with the help of some additional patches to
the virtio subsystem, but in hindsight we should not have disabled IOMMU
ops for secure VMs  :-(.

RP



> 
> This is basically a revert of commit:
>   edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on secure 
> guests")
> 
> But neglects to remove the now unnecessary include of svm.h.
> 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> > b/arch/powerpc/platforms/pseries/iommu.c
> > index 07f0847..189717b 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -1333,15 +1333,7 @@ void iommu_init_early_pSeries(void)
> > of_reconfig_notifier_register(_reconfig_nb);
> > register_memory_notifier(_mem_nb);
> >  
> > -   /*
> > -* Secure guest memory is inacessible to devices so regular DMA isn't
> > -* possible.
> > -*
> > -* In that case keep devices' dma_map_ops as NULL so that the generic
> > -* DMA code path will use SWIOTLB to bounce buffers for DMA.
> 
> Please explain what has changed to make this no longer necessary.
> 
> cheers
> 
> > -*/
> > -   if (!is_secure_guest())
> > -   set_pci_dma_ops(_iommu_ops);
> > +   set_pci_dma_ops(_iommu_ops);
> >  }
> >  
> >  static int __init disable_multitce(char *str)
> > -- 
> > 1.8.3.1

-- 
Ram Pai



Re: [PATCH v1 04/10] vfio/type1: Prepare is_invalid_reserved_pfn() for PG_reserved changes

2019-11-07 Thread Dan Williams
On Thu, Nov 7, 2019 at 2:07 PM David Hildenbrand  wrote:
>
> On 07.11.19 19:22, David Hildenbrand wrote:
> >
> >
> >> On 07.11.2019 at 16:40, Dan Williams wrote:
> >>
> >> On Thu, Oct 24, 2019 at 5:12 AM David Hildenbrand  
> >> wrote:
> >>>
> >>> Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
> >>> change that.
> >>>
> >>> KVM has this weird use case that you can map anything from /dev/mem
> >>> into the guest. pfn_valid() is not a reliable check whether the memmap
> >>> was initialized and can be touched. pfn_to_online_page() makes sure
> >>> that we have an initialized memmap (and don't have ZONE_DEVICE memory).
> >>>
> >>> Rewrite is_invalid_reserved_pfn() similar to kvm_is_reserved_pfn() to make
> >>> sure the function produces the same result once we stop setting 
> >>> ZONE_DEVICE
> >>> pages PG_reserved.
> >>>
> >>> Cc: Alex Williamson 
> >>> Cc: Cornelia Huck 
> >>> Signed-off-by: David Hildenbrand 
> >>> ---
> >>> drivers/vfio/vfio_iommu_type1.c | 10 --
> >>> 1 file changed, 8 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >>> b/drivers/vfio/vfio_iommu_type1.c
> >>> index 2ada8e6cdb88..f8ce8c408ba8 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -299,9 +299,15 @@ static int vfio_lock_acct(struct vfio_dma *dma, long 
> >>> npage, bool async)
> >>>   */
> >>> static bool is_invalid_reserved_pfn(unsigned long pfn)
> >>> {
> >>> -   if (pfn_valid(pfn))
> >>> -   return PageReserved(pfn_to_page(pfn));
> >>> +   struct page *page = pfn_to_online_page(pfn);
> >>
> >> Ugh, I just realized this is not a safe conversion until
> >> pfn_to_online_page() is moved over to subsection granularity. As it
> >> stands it will return true for any ZONE_DEVICE pages that share a
> >> section with boot memory.
> >
> > That should not happen right now and I commented back when you introduced 
> > subsection support that I don’t want to have ZONE_DEVICE mixed with online 
> > pages in a section. Having memory block devices that partially span 
> > ZONE_DEVICE would be ... really weird. With something like pfn_active() - 
> > as discussed - we could at least make this check work - but I am not sure 
> > if we really want to go down that path. In the worst case, some MB of RAM 
> > are lost ... I guess this needs more thought.
> >
>
> I just realized the "boot memory" part. Is that a real thing? IOW, can
> we have ZONE_DEVICE falling into a memory block (with holes)? I somewhat
> have doubts that this would work ...

One of the real world failure cases that started the subsection effort
is that Persistent Memory collides with System RAM on a 64MB boundary
on shipping platforms. System RAM ends on a 64MB boundary and due to a
lack of memory controller resources PMEM is mapped contiguously at the
end of that boundary. Some more details in the subsection cover letter
/ changelogs [1] [2]. It's not sufficient to just lose some memory,
that's the broken implementation that led to the subsection work
because the lost memory may change from one boot to the next and
software can't reliably inject a padding that conforms to the x86
128MB section constraint.

Suffice to say I think we need your pfn_active() to get subsection
granularity pfn_to_online_page() before PageReserved() can be removed.

[1]: 
https://lore.kernel.org/linux-mm/156092349300.979959.17603710711957735135.st...@dwillia2-desk3.amr.corp.intel.com/
[2]: 
https://lore.kernel.org/linux-mm/156092354368.979959.6232443923440952359.st...@dwillia2-desk3.amr.corp.intel.com/


Re: [PATCH V8] mm/debug: Add tests validating architecture page table helpers

2019-11-07 Thread Anshuman Khandual



On 11/08/2019 12:35 AM, Vineet Gupta wrote:
> On 11/6/19 8:44 PM, Anshuman Khandual wrote:
>>
>>>
>>>>   */
>>>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
>>>>  #include 
>>>>  #endif
>>> This is wrong. CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE is just a glue toggle,
>>> used only in Kconfig files (and not in any "C" code). It enables generic Kconfig
>>> code to allow visibility of CONFIG_TRANSPARENT_HUGEPAGE w/o every arch needing to
>>> do a me too.
>>>
>>> I think you need to use CONFIG_TRANSPARENT_HUGEPAGE to guard appropriate tests. I
>>> understand that it only
>> We can probably replace CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE wrapper with
>> CONFIG_TRANSPARENT_HUGEPAGE. But CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> explicitly depends on CONFIG_TRANSPARENT_HUGEPAGE as a prerequisite. Could
>> you please confirm if the following change on this test will work on ARC
>> platform for both THP and !THP cases ? Thank you.
>>
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index 621ac09..99ebc7c 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
>> @@ -67,7 +67,7 @@ static void __init pte_basic_tests(unsigned long pfn, 
>> pgprot_t prot)
>>  WARN_ON(pte_write(pte_wrprotect(pte)));
>>  }
>>  
>> -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
>>  {
>>  pmd_t pmd = pfn_pmd(pfn, prot);
>> @@ -85,9 +85,6 @@ static void __init pmd_basic_tests(unsigned long pfn, 
>> pgprot_t prot)
>>   */
>>  WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
>>  }
>> -#else
>> -static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> -#endif
>>  
>>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>  static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
>> @@ -112,6 +109,10 @@ static void __init pud_basic_tests(unsigned long pfn, 
>> pgprot_t prot)
>>  #else
>>  static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>>  #endif
>> +#else
>> +static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> +#endif
> 
> Fails to build for THP case since
> 
> CONFIG_TRANSPARENT_HUGEPAGE=y
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=n
> 
> ../mm/debug_vm_pgtable.c:112:20: error: redefinition of ‘pmd_basic_tests’
> 

Hmm, really ? With arm64 defconfig we have the same default combination
where it builds.

CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=n /* It should not even appear */

With the above change, we have now

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
{


}

#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
{


}
#else /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
#endif
#else   /* !CONFIG_TRANSPARENT_HUGEPAGE */
static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
#endif

When !CONFIG_TRANSPARENT_HUGEPAGE

- Dummy definitions for pmd_basic_tests() and pud_basic_tests()

When CONFIG_TRANSPARENT_HUGEPAGE and !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD

- Actual pmd_basic_tests() and dummy pud_basic_tests()

When CONFIG_TRANSPARENT_HUGEPAGE and CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD

- Actual pmd_basic_tests() and pud_basic_tests()

Tested this on arm64, which does not have CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD,
for THP and !THP, and on x86, which has CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD,
for THP and !THP, which basically covers all combinations of these configs.

Is there something I am still missing in plain sight? :)

- Anshuman


Re: [PATCH 10/10] ocxl: Conditionally bind SCM devices to the generic OCXL driver

2019-11-07 Thread Alastair D'Silva
On Thu, 2019-11-07 at 19:08 +0100, Frederic Barrat wrote:
> 
> On 25/10/2019 at 06:47, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > This patch allows the user to bind OpenCAPI SCM devices to the
> > generic OCXL
> > driver.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> 
> I'm wondering if we should upstream this. Is it of any use outside
> of 
> some serious debug session for a developer?
> Also we would now have 2 drivers picking up the same device ID,
> since 
> the SCM driver is always registering for that ID, irrespective of 
> CONFIG_OCXL_SCM_GENERIC
> 
>Fred
> 

I think I'll drop this patch. It's easy enough to maintain out-of-tree
for our in-house SCM hardware engineers.

> 
> >   drivers/misc/ocxl/Kconfig | 7 +++
> >   drivers/misc/ocxl/pci.c   | 3 +++
> >   2 files changed, 10 insertions(+)
> > 
> > diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
> > index 1916fa65f2f2..8a683715c97c 100644
> > --- a/drivers/misc/ocxl/Kconfig
> > +++ b/drivers/misc/ocxl/Kconfig
> > @@ -29,3 +29,10 @@ config OCXL
> >   dedicated OpenCAPI link, and don't follow the same protocol.
> >   
> >   If unsure, say N.
> > +
> > +config OCXL_SCM_GENERIC
> > +   bool "Treat OpenCAPI Storage Class Memory as a generic OpenCAPI
> > device"
> > +   default n
> > +   help
> > + Select this option to treat OpenCAPI Storage Class Memory
> > + devices an generic OpenCAPI devices.
> > diff --git a/drivers/misc/ocxl/pci.c b/drivers/misc/ocxl/pci.c
> > index cb920aa88d3a..7137055c1883 100644
> > --- a/drivers/misc/ocxl/pci.c
> > +++ b/drivers/misc/ocxl/pci.c
> > @@ -10,6 +10,9 @@
> >*/
> >   static const struct pci_device_id ocxl_pci_tbl[] = {
> > { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x062B), },
> > +#ifdef CONFIG_OCXL_SCM_GENERIC
> > +   { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), },
> > +#endif
> > { }
> >   };
> >   MODULE_DEVICE_TABLE(pci, ocxl_pci_tbl);
> > 
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH v1 04/10] vfio/type1: Prepare is_invalid_reserved_pfn() for PG_reserved changes

2019-11-07 Thread David Hildenbrand

On 07.11.19 19:22, David Hildenbrand wrote:




On 07.11.2019 at 16:40, Dan Williams wrote:

On Thu, Oct 24, 2019 at 5:12 AM David Hildenbrand  wrote:


Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
change that.

KVM has this weird use case that you can map anything from /dev/mem
into the guest. pfn_valid() is not a reliable check whether the memmap
was initialized and can be touched. pfn_to_online_page() makes sure
that we have an initialized memmap (and don't have ZONE_DEVICE memory).

Rewrite is_invalid_reserved_pfn() similar to kvm_is_reserved_pfn() to make
sure the function produces the same result once we stop setting ZONE_DEVICE
pages PG_reserved.

Cc: Alex Williamson 
Cc: Cornelia Huck 
Signed-off-by: David Hildenbrand 
---
drivers/vfio/vfio_iommu_type1.c | 10 --
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ada8e6cdb88..f8ce8c408ba8 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -299,9 +299,15 @@ static int vfio_lock_acct(struct vfio_dma *dma, long 
npage, bool async)
  */
static bool is_invalid_reserved_pfn(unsigned long pfn)
{
-   if (pfn_valid(pfn))
-   return PageReserved(pfn_to_page(pfn));
+   struct page *page = pfn_to_online_page(pfn);


Ugh, I just realized this is not a safe conversion until
pfn_to_online_page() is moved over to subsection granularity. As it
stands it will return true for any ZONE_DEVICE pages that share a
section with boot memory.


That should not happen right now and I commented back when you introduced 
subsection support that I don’t want to have ZONE_DEVICE mixed with online 
pages in a section. Having memory block devices that partially span ZONE_DEVICE 
would be ... really weird. With something like pfn_active() - as discussed - we 
could at least make this check work - but I am not sure if we really want to go 
down that path. In the worst case, some MB of RAM are lost ... I guess this 
needs more thought.



I just realized the "boot memory" part. Is that a real thing? IOW, can 
we have ZONE_DEVICE falling into a memory block (with holes)? I somewhat 
have doubts that this would work ...


--

Thanks,

David / dhildenb



Re: [RFC PATCH] powerpc/pseries/mobility: notify network peers after migration

2019-11-07 Thread Thomas Falcon



On 11/6/19 4:14 PM, Nathan Lynch wrote:

Hi Tom,

Thomas Falcon  writes:

After a migration, it is necessary to send a gratuitous ARP
from all running interfaces so that the rest of the network
is aware of its new location. However, some supported network
devices are unaware that they have been migrated. To avoid network
interruptions and other unwanted behavior, force a GARP on all
valid, running interfaces as part of the post_mobility_fixup
routine.

[...]


@@ -331,6 +334,8 @@ void post_mobility_fixup(void)
  {
int rc;
int activate_fw_token;
+   struct net_device *netdev;
+   struct net *net;
  
  	activate_fw_token = rtas_token("ibm,activate-firmware");

if (activate_fw_token == RTAS_UNKNOWN_SERVICE) {
@@ -371,6 +376,21 @@ void post_mobility_fixup(void)
/* Possibly switch to a new RFI flush type */
pseries_setup_rfi_flush();
  
+	/* need to force a gratuitous ARP on running interfaces */

+   rtnl_lock();
+   for_each_net(net) {
+   for_each_netdev(net, netdev) {
+   if (netif_device_present(netdev) &&
+   netif_running(netdev) &&
+   !(netdev->flags & (IFF_NOARP | IFF_LOOPBACK)))
+   call_netdevice_notifiers(NETDEV_NOTIFY_PEERS,
+netdev);
+   call_netdevice_notifiers(NETDEV_RESEND_IGMP,
+netdev);
+   }
+   }
+   rtnl_unlock();
+

This isn't an outright nak, but this is not nice. It illustrates the
need to rethink the pseries partition migration code. There is no
mechanism for drivers and other interested code to prepare for a
migration or to adjust to the destination. So post_mobility_fixup() will
continue to grow into a fragile collection of calls into unrelated
subsystems until there is a better design -- either a pseries-specific
notification/callback mechanism, or something based on the pm framework.

My understanding is that this is needed specifically for ibmveth and,
unlike ibmvnic, the platform does not provide any notification to that
driver that a migration has occurred, right?


Correct, the ibmveth device, unlike ibmvnic, receives no signal or 
notification at all in the event of a partition migration, so it can not 
handle it or send a gratuitous ARP because from the driver's perspective 
nothing has changed.  As you've described, there is no existing notifier 
in the kernel to inform interested parties that the system has migrated 
or is about to migrate. Without adding the needed infrastructure to do 
that, I'm not sure how else to fix this.


Tom



Re: [RFC PATCH] powerpc/pseries/mobility: notify network peers after migration

2019-11-07 Thread Thomas Falcon



On 11/6/19 7:33 PM, Michael Ellerman wrote:

Hi Thomas,

Thomas Falcon  writes:

After a migration, it is necessary to send a gratuitous ARP
from all running interfaces so that the rest of the network
is aware of its new location. However, some supported network
devices are unaware that they have been migrated. To avoid network
interruptions and other unwanted behavior, force a GARP on all
valid, running interfaces as part of the post_mobility_fixup
routine.

Signed-off-by: Thomas Falcon 
---
  arch/powerpc/platforms/pseries/mobility.c | 20 
  1 file changed, 20 insertions(+)

This patch is in powerpc code, but it's doing networking stuff that I
don't really understand.

So I'd like an Ack from Dave or someone else in netdev land before I
merge it.


Thanks, I've already included netdev in the CC list. I'll wait and keep 
an eye out for any comments from that side.


Tom





cheers



diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index b571285f6c14..c1abc14cf2bb 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -17,6 +17,9 @@
  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
  
  #include 

  #include 
@@ -331,6 +334,8 @@ void post_mobility_fixup(void)
  {
int rc;
int activate_fw_token;
+   struct net_device *netdev;
+   struct net *net;
  
  	activate_fw_token = rtas_token("ibm,activate-firmware");

if (activate_fw_token == RTAS_UNKNOWN_SERVICE) {
@@ -371,6 +376,21 @@ void post_mobility_fixup(void)
/* Possibly switch to a new RFI flush type */
pseries_setup_rfi_flush();
  
+	/* need to force a gratuitous ARP on running interfaces */

+   rtnl_lock();
+   for_each_net(net) {
+   for_each_netdev(net, netdev) {
+   if (netif_device_present(netdev) &&
+   netif_running(netdev) &&
+   !(netdev->flags & (IFF_NOARP | IFF_LOOPBACK)))
+   call_netdevice_notifiers(NETDEV_NOTIFY_PEERS,
+netdev);
+   call_netdevice_notifiers(NETDEV_RESEND_IGMP,
+netdev);
+   }
+   }
+   rtnl_unlock();
+
return;
  }
  
--

2.12.3


Re: [PATCH v1 04/10] vfio/type1: Prepare is_invalid_reserved_pfn() for PG_reserved changes

2019-11-07 Thread David Hildenbrand


> On 07.11.2019 at 16:40, Dan Williams wrote:
> 
> On Thu, Oct 24, 2019 at 5:12 AM David Hildenbrand  wrote:
>> 
>> Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
>> change that.
>> 
>> KVM has this weird use case that you can map anything from /dev/mem
>> into the guest. pfn_valid() is not a reliable check whether the memmap
>> was initialized and can be touched. pfn_to_online_page() makes sure
>> that we have an initialized memmap (and don't have ZONE_DEVICE memory).
>> 
>> Rewrite is_invalid_reserved_pfn() similar to kvm_is_reserved_pfn() to make
>> sure the function produces the same result once we stop setting ZONE_DEVICE
>> pages PG_reserved.
>> 
>> Cc: Alex Williamson 
>> Cc: Cornelia Huck 
>> Signed-off-by: David Hildenbrand 
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 10 --
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 2ada8e6cdb88..f8ce8c408ba8 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -299,9 +299,15 @@ static int vfio_lock_acct(struct vfio_dma *dma, long 
>> npage, bool async)
>>  */
>> static bool is_invalid_reserved_pfn(unsigned long pfn)
>> {
>> -   if (pfn_valid(pfn))
>> -   return PageReserved(pfn_to_page(pfn));
>> +   struct page *page = pfn_to_online_page(pfn);
> 
> Ugh, I just realized this is not a safe conversion until
> pfn_to_online_page() is moved over to subsection granularity. As it
> stands it will return true for any ZONE_DEVICE pages that share a
> section with boot memory.

That should not happen right now and I commented back when you introduced 
subsection support that I don’t want to have ZONE_DEVICE mixed with online 
pages in a section. Having memory block devices that partially span ZONE_DEVICE 
would be ... really weird. With something like pfn_active() - as discussed - we 
could at least make this check work - but I am not sure if we really want to go 
down that path. In the worst case, some MB of RAM are lost ... I guess this 
needs more thought.

Re: [PATCH v1 04/10] vfio/type1: Prepare is_invalid_reserved_pfn() for PG_reserved changes

2019-11-07 Thread Dan Williams
On Thu, Oct 24, 2019 at 5:12 AM David Hildenbrand  wrote:
>
> Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
> change that.
>
> KVM has this weird use case that you can map anything from /dev/mem
> into the guest. pfn_valid() is not a reliable check whether the memmap
> was initialized and can be touched. pfn_to_online_page() makes sure
> that we have an initialized memmap (and don't have ZONE_DEVICE memory).
>
> Rewrite is_invalid_reserved_pfn() similar to kvm_is_reserved_pfn() to make
> sure the function produces the same result once we stop setting ZONE_DEVICE
> pages PG_reserved.
>
> Cc: Alex Williamson 
> Cc: Cornelia Huck 
> Signed-off-by: David Hildenbrand 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2ada8e6cdb88..f8ce8c408ba8 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -299,9 +299,15 @@ static int vfio_lock_acct(struct vfio_dma *dma, long 
> npage, bool async)
>   */
>  static bool is_invalid_reserved_pfn(unsigned long pfn)
>  {
> -   if (pfn_valid(pfn))
> -   return PageReserved(pfn_to_page(pfn));
> +   struct page *page = pfn_to_online_page(pfn);

Ugh, I just realized this is not a safe conversion until
pfn_to_online_page() is moved over to subsection granularity. As it
stands it will return true for any ZONE_DEVICE pages that share a
section with boot memory.
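
The granularity concern can be illustrated with a small user-space model. This is a sketch only: the struct, the helper names and the constants (128 MiB sections, 2 MiB subsections, 4 KiB pages) are assumptions chosen for illustration, not the kernel's data structures. A section-wide "online" flag reports a device-backed pfn as online as soon as it shares the section with boot RAM, while a per-subsection bitmap can tell the two apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants: 128 MiB sections, 2 MiB subsections, 4 KiB pages. */
#define PAGES_PER_SECTION       (1UL << 15)
#define PAGES_PER_SUBSECTION    (1UL << 9)
#define SUBSECTIONS_PER_SECTION (PAGES_PER_SECTION / PAGES_PER_SUBSECTION)

/* Hypothetical model of a single memory section. */
struct model_section {
    bool online;            /* set when any part of the section is online RAM */
    uint64_t online_submap; /* one bit per subsection that is online RAM */
};

/* Section granularity: every pfn in an online section looks online. */
static bool online_by_section(const struct model_section *s, unsigned long pfn)
{
    (void)pfn;
    return s->online;
}

/* Subsection granularity: also consult the bit for the pfn's subsection. */
static bool online_by_subsection(const struct model_section *s, unsigned long pfn)
{
    unsigned long idx = (pfn % PAGES_PER_SECTION) / PAGES_PER_SUBSECTION;

    assert(idx < SUBSECTIONS_PER_SECTION);
    return s->online && ((s->online_submap >> idx) & 1);
}
```

With the lower 32 subsections marked as boot RAM and the rest left for device memory, a pfn in subsection 40 is reported online by the first helper and not online by the second.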


Re: [PATCH V8] mm/debug: Add tests validating architecture page table helpers

2019-11-07 Thread Vineet Gupta
On 11/6/19 8:44 PM, Anshuman Khandual wrote:
>
>>
>>>   */
>>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
>>>  #include 
>>>  #endif
>> This is wrong.  CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE is just a glue toggle,
>> used only in Kconfig files (and not in any "C" code).  It enables generic
>> Kconfig code to allow visibility of CONFIG_TRANSPARENT_HUGEPAGE w/o every
>> arch needing to do a me too.
>>
>> I think you need to use CONFIG_TRANSPARENT_HUGEPAGE to guard appropriate
>> tests. I understand that it only
> We can probably replace CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE wrapper with
> CONFIG_TRANSPARENT_HUGEPAGE. But CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> explicitly depends on CONFIG_TRANSPARENT_HUGEPAGE as a prerequisite. Could
> you please confirm if the following change on this test will work on ARC
> platform for both THP and !THP cases ? Thank you.
>
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 621ac09..99ebc7c 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -67,7 +67,7 @@ static void __init pte_basic_tests(unsigned long pfn, 
> pgprot_t prot)
>   WARN_ON(pte_write(pte_wrprotect(pte)));
>  }
>  
> -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
>  {
>   pmd_t pmd = pfn_pmd(pfn, prot);
> @@ -85,9 +85,6 @@ static void __init pmd_basic_tests(unsigned long pfn, 
> pgprot_t prot)
>*/
>   WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
>  }
> -#else
> -static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
> -#endif
>  
>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>  static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
> @@ -112,6 +109,10 @@ static void __init pud_basic_tests(unsigned long pfn, 
> pgprot_t prot)
>  #else
>  static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>  #endif
> +#else
> +static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
> +static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
> +#endif

Fails to build for THP case since

CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=n

../mm/debug_vm_pgtable.c:112:20: error: redefinition of ‘pmd_basic_tests’
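
For reference, one possible guard layout that yields exactly one definition of each helper in every configuration is sketched below. It is a standalone illustration, not the code that was posted or merged: the helpers take no arguments and return 1 (real test) or 0 (stub) so the active variant is observable, and the two CONFIG_ macros are defined by hand to model the failing THP=y, PUD=n combination.

```c
/* Model the failing configuration: THP enabled, PUD-level THP disabled. */
#define CONFIG_TRANSPARENT_HUGEPAGE 1
/* #define CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD 1 */

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* Real PMD tests would live here. */
static int pmd_basic_tests(void) { return 1; }

#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
/* Real PUD tests would live here. */
static int pud_basic_tests(void) { return 1; }
#else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
static int pud_basic_tests(void) { return 0; }  /* stub */
#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */

#else  /* !CONFIG_TRANSPARENT_HUGEPAGE */
/* Both stubs are defined only when THP is off, so nothing is redefined. */
static int pmd_basic_tests(void) { return 0; }
static int pud_basic_tests(void) { return 0; }
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
```

The key point is that the PMD stub appears only in the outer #else branch, and the PUD stub appears once per branch, so no combination of the two options sees a helper twice.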


Re: [PATCH 07/10] ocxl: Save the device serial number in ocxl_fn

2019-11-07 Thread Frederic Barrat




On 25/10/2019 at 06:47, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch retrieves the serial number of the card and makes it available
to consumers of the ocxl driver via the ocxl_fn struct.

Signed-off-by: Alastair D'Silva 
---



Acked-by: Frederic Barrat 




  drivers/misc/ocxl/config.c | 46 ++
  include/misc/ocxl.h|  1 +
  2 files changed, 47 insertions(+)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index fb0c3b6f8312..a9203c309365 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -71,6 +71,51 @@ static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 
afu_idx)
return 0;
  }
  
+/**

+ * Find a related PCI device (function 0)
+ * @device: PCI device to match
+ *
+ * Returns a pointer to the related device, or null if not found
+ */
+static struct pci_dev *get_function_0(struct pci_dev *dev)
+{
+   unsigned int devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0); // Look for function 0
+
+   return pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
+   dev->bus->number, devfn);
+}
+
+static void read_serial(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+   u32 low, high;
+   int pos;
+
+   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DSN);
+   if (pos) {
+   pci_read_config_dword(dev, pos + 0x04, &low);
+   pci_read_config_dword(dev, pos + 0x08, &high);
+
+   fn->serial = low | ((u64)high) << 32;
+
+   return;
+   }
+
+   if (PCI_FUNC(dev->devfn) != 0) {
+   struct pci_dev *related = get_function_0(dev);
+
+   if (!related) {
+   fn->serial = 0;
+   return;
+   }
+
+   read_serial(related, fn);
+   pci_dev_put(related);
+   return;
+   }
+
+   fn->serial = 0;
+}
+
  static void read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
  {
u16 val;
@@ -208,6 +253,7 @@ int ocxl_config_read_function(struct pci_dev *dev, struct 
ocxl_fn_config *fn)
int rc;
  
  	read_pasid(dev, fn);

+   read_serial(dev, fn);
  
  	rc = read_dvsec_tl(dev, fn);

if (rc) {
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 6f7c02f0d5e3..9843051c3c5b 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -46,6 +46,7 @@ struct ocxl_fn_config {
int dvsec_afu_info_pos; /* offset of the AFU information DVSEC */
s8 max_pasid_log;
s8 max_afu_index;
+   u64 serial;
  };
  
  enum ocxl_endian {
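
The way read_serial() assembles the 64-bit value from two 32-bit config reads can be shown in isolation. This is a minimal user-space sketch; make_serial() is a hypothetical helper name, not part of the driver. The high half has to be widened to 64 bits before the shift, because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* Combine the lower and upper dwords of a 64-bit device serial number. */
static uint64_t make_serial(uint32_t low, uint32_t high)
{
    /* Widen first, then shift, so the upper half is not lost. */
    return (uint64_t)low | ((uint64_t)high << 32);
}
```

For example, a low dword of 0x89abcdef and a high dword of 0x01234567 combine to 0x0123456789abcdef.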






Re: [PATCH 10/10] ocxl: Conditionally bind SCM devices to the generic OCXL driver

2019-11-07 Thread Frederic Barrat




On 25/10/2019 at 06:47, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch allows the user to bind OpenCAPI SCM devices to the generic OCXL
driver.

Signed-off-by: Alastair D'Silva 
---



I'm wondering if we should upstream this. Is it of any use outside of 
some serious debug session for a developer?
Also we would now have 2 drivers picking up the same device ID, since 
the SCM driver is always registering for that ID, irrespective of 
CONFIG_OCXL_SCM_GENERIC


  Fred



  drivers/misc/ocxl/Kconfig | 7 +++
  drivers/misc/ocxl/pci.c   | 3 +++
  2 files changed, 10 insertions(+)

diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
index 1916fa65f2f2..8a683715c97c 100644
--- a/drivers/misc/ocxl/Kconfig
+++ b/drivers/misc/ocxl/Kconfig
@@ -29,3 +29,10 @@ config OCXL
  dedicated OpenCAPI link, and don't follow the same protocol.
  
  	  If unsure, say N.

+
+config OCXL_SCM_GENERIC
+   bool "Treat OpenCAPI Storage Class Memory as a generic OpenCAPI device"
+   default n
+   help
+ Select this option to treat OpenCAPI Storage Class Memory
+ devices an generic OpenCAPI devices.
diff --git a/drivers/misc/ocxl/pci.c b/drivers/misc/ocxl/pci.c
index cb920aa88d3a..7137055c1883 100644
--- a/drivers/misc/ocxl/pci.c
+++ b/drivers/misc/ocxl/pci.c
@@ -10,6 +10,9 @@
   */
  static const struct pci_device_id ocxl_pci_tbl[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x062B), },
+#ifdef CONFIG_OCXL_SCM_GENERIC
+   { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), },
+#endif
{ }
  };
  MODULE_DEVICE_TABLE(pci, ocxl_pci_tbl);





Re: [PATCH 06/10] ocxl: Add functions to map/unmap LPC memory

2019-11-07 Thread Frederic Barrat




On 25/10/2019 at 06:47, Alastair D'Silva wrote:

From: Alastair D'Silva 

Add functions to map/unmap LPC memory

Signed-off-by: Alastair D'Silva 
---
  drivers/misc/ocxl/config.c|  4 +++
  drivers/misc/ocxl/core.c  | 50 +++
  drivers/misc/ocxl/ocxl_internal.h |  3 ++
  include/misc/ocxl.h   | 18 +++
  4 files changed, 75 insertions(+)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index c8e19bfb5ef9..fb0c3b6f8312 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct pci_dev *dev,
afu->special_purpose_mem_size =
total_mem_size - lpc_mem_size;
}
+
+   dev_info(&dev->dev, "Probed LPC memory of %#llx bytes and special purpose memory of %#llx bytes\n",
+   afu->lpc_mem_size, afu->special_purpose_mem_size);
+
return 0;
  }
  
diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c

index 2531c6cf19a0..5554f5ce4b9e 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -210,6 +210,55 @@ static void unmap_mmio_areas(struct ocxl_afu *afu)
release_fn_bar(afu->fn, afu->config.global_mmio_bar);
  }
  
+int ocxl_afu_map_lpc_mem(struct ocxl_afu *afu)

+{
+   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
+
+   if ((afu->config.lpc_mem_size + afu->config.special_purpose_mem_size) == 0)
+   return 0;
+
+   afu->lpc_base_addr = ocxl_link_lpc_map(afu->fn->link, dev);
+   if (afu->lpc_base_addr == 0)
+   return -EINVAL;
+
+   if (afu->config.lpc_mem_size) {
+   afu->lpc_res.start = afu->lpc_base_addr + afu->config.lpc_mem_offset;
+   afu->lpc_res.end = afu->lpc_res.start + afu->config.lpc_mem_size - 1;
+   }
+
+   if (afu->config.special_purpose_mem_size) {
+   afu->special_purpose_res.start = afu->lpc_base_addr +
+   afu->config.special_purpose_mem_offset;
+   afu->special_purpose_res.end = afu->special_purpose_res.start +
+   afu->config.special_purpose_mem_size - 1;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(ocxl_afu_map_lpc_mem);



We should use EXPORT_SYMBOL_GPL().

ok, so we're unmapping the lpc memory implicitly when calling 
ocxl_function_close() and therefore don't really need to export 
(ocxl_)unmap_lpc_mem(). I guess that's fine and easy enough to add if 
one day somebody has the need to unmap without closing.




+
+struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu)
+{
+   return &afu->lpc_res;
+}
+EXPORT_SYMBOL(ocxl_afu_lpc_mem);
+
+static void unmap_lpc_mem(struct ocxl_afu *afu)
+{
+   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
+
+   if (afu->lpc_res.start || afu->special_purpose_res.start) {
+   void *link = afu->fn->link;
+
+   ocxl_link_lpc_release(link, dev);
+
+   afu->lpc_res.start = 0;
+   afu->lpc_res.end = 0;
+   afu->special_purpose_res.start = 0;
+   afu->special_purpose_res.end = 0;
+   }
+}
+
  static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev 
*dev)
  {
int rc;
@@ -251,6 +300,7 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, 
struct pci_dev *dev)
  
  static void deconfigure_afu(struct ocxl_afu *afu)

  {
+   unmap_lpc_mem(afu);
unmap_mmio_areas(afu);
reclaim_afu_pasid(afu);
reclaim_afu_actag(afu);
diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index 20b417e00949..9f4b47900e62 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -52,6 +52,9 @@ struct ocxl_afu {
void __iomem *global_mmio_ptr;
u64 pp_mmio_start;
void *private;
+   u64 lpc_base_addr; /* Covers both LPC & special purpose memory */
+   struct resource lpc_res;
+   struct resource special_purpose_res;
  };
  
  enum ocxl_context_status {

diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 06dd5839e438..6f7c02f0d5e3 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -212,6 +212,24 @@ int ocxl_irq_set_handler(struct ocxl_context *ctx, int 
irq_id,
  
  // AFU Metadata
  
+/**

+ * Map the LPC system & special purpose memory for an AFU
+ *
+ * Do not call this during device discovery, as there may me multiple
+ * devices on a link, and the memory is mapped for the whole link, not
+ * just one device. It should only be called after all devices have
+ * registered their memory on the link.



If we were supporting more than one AFU-carrying functions, we would 
need to rework this, as functions could come and go and the total range 
could be dynamic (even the max address of the range could increase, if a 
function is updated with an AFU with a bigger LPC size). But 

Re: [PATCH 05/10] ocxl: Tally up the LPC memory on a link & allow it to be mapped

2019-11-07 Thread Frederic Barrat




On 25/10/2019 at 06:47, Alastair D'Silva wrote:

From: Alastair D'Silva 

Tally up the LPC memory on an OpenCAPI link & allow it to be mapped

Signed-off-by: Alastair D'Silva 
---
  drivers/misc/ocxl/core.c  | 10 ++
  drivers/misc/ocxl/link.c  | 60 +++
  drivers/misc/ocxl/ocxl_internal.h | 33 +
  3 files changed, 103 insertions(+)

diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index b7a09b21ab36..2531c6cf19a0 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -230,8 +230,18 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, 
struct pci_dev *dev)
if (rc)
goto err_free_pasid;
  
+	if (afu->config.lpc_mem_size || afu->config.special_purpose_mem_size) {

+   rc = ocxl_link_add_lpc_mem(afu->fn->link, afu->config.lpc_mem_offset,
+  afu->config.lpc_mem_size +
+  afu->config.special_purpose_mem_size);
+   if (rc)
+   goto err_free_mmio;
+   }
+
return 0;
  
+err_free_mmio:

+   unmap_mmio_areas(afu);
  err_free_pasid:
reclaim_afu_pasid(afu);
  err_free_actag:
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 58d111afd9f6..1d350d0bb860 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -84,6 +84,11 @@ struct ocxl_link {
int dev;
atomic_t irq_available;
struct spa *spa;
+   struct mutex lpc_mem_lock;
+   u64 lpc_mem_sz; /* Total amount of LPC memory presented on the link */
+   u64 lpc_mem;
+   int lpc_consumers;
+
void *platform_data;
  };
  static struct list_head links_list = LIST_HEAD_INIT(links_list);
@@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, 
struct ocxl_link **out_l
if (rc)
goto err_spa;
  
+	mutex_init(&link->lpc_mem_lock);

+
/* platform specific hook */
rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
&link->platform_data);
@@ -711,3 +718,56 @@ void ocxl_link_free_irq(void *link_handle, int hw_irq)
atomic_inc(&link->irq_available);
  }
  EXPORT_SYMBOL_GPL(ocxl_link_free_irq);
+
+int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size)
+{
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
+
+   // Check for overflow
+   if (offset > (offset + size))
+   return -EINVAL;
+
+   mutex_lock(&link->lpc_mem_lock);
+   link->lpc_mem_sz = max(link->lpc_mem_sz, offset + size);



Good find to avoid having to maintain a range list!
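
The idea being praised here, tracking only the highest end offset instead of a list of ranges, can be sketched in user space as follows. The struct and function names are hypothetical stand-ins, locking is left out, and -1 stands in for -EINVAL. The overflow test relies on unsigned addition wrapping around.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-link bookkeeping. */
struct link_model {
    uint64_t lpc_mem_sz; /* highest end offset registered so far */
};

/* Returns 0 on success, -1 if offset + size wraps around. */
static int link_add_lpc_mem(struct link_model *link, uint64_t offset,
                            uint64_t size)
{
    uint64_t end = offset + size; /* wraps modulo 2^64 on overflow */

    if (end < offset)
        return -1;

    /* Keep only the high-water mark; no range list is needed. */
    if (end > link->lpc_mem_sz)
        link->lpc_mem_sz = end;

    return 0;
}
```

Registering ranges in any order gives the same total, since only the largest end offset is kept; the trade-off is that holes between ranges are included in the size that later gets mapped.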



+
+   mutex_unlock(&link->lpc_mem_lock);
+
+   return 0;
+}
+
+u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
+{
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
+   u64 lpc_mem;
+
+   mutex_lock(&link->lpc_mem_lock);
+   if (link->lpc_mem) {
+   lpc_mem = link->lpc_mem;
+
+   link->lpc_consumers++;
+   mutex_unlock(&link->lpc_mem_lock);
+   return lpc_mem;
+   }
+
+   link->lpc_mem = pnv_ocxl_platform_lpc_setup(pdev, link->lpc_mem_sz);
+   if (link->lpc_mem)
+   link->lpc_consumers++;
+   lpc_mem = link->lpc_mem;
+   mutex_unlock(&link->lpc_mem_lock);
+
+   return lpc_mem;
+}
+
+void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
+{
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
+
+   mutex_lock(&link->lpc_mem_lock);
+   link->lpc_consumers--;



Replace with WARN_ON(--link->lpc_consumers < 0) ?


  Fred
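
A user-space sketch of the suggested underflow check is shown below. The names are hypothetical, fprintf() stands in for WARN_ON(), and resetting the counter to zero after an unbalanced release is an extra assumption of this sketch rather than something the suggestion above asks for.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the consumer accounting on a link. */
struct lpc_model {
    int consumers; /* number of outstanding mappings */
    bool mapped;   /* whether the LPC memory is currently mapped */
};

/* Returns true for a balanced release, false when the count underflows. */
static bool lpc_release(struct lpc_model *l)
{
    if (--l->consumers < 0) {
        fprintf(stderr, "unbalanced LPC release\n"); /* WARN_ON() stand-in */
        l->consumers = 0; /* assumption: clamp so later maps still work */
        return false;
    }

    /* The last consumer going away tears down the mapping. */
    if (l->consumers == 0)
        l->mapped = false;

    return true;
}
```

Checking the decremented value directly catches a double release at the point where it happens, instead of leaving a negative count that silently keeps the mapping from ever being released.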



+   if (link->lpc_consumers == 0) {
+   pnv_ocxl_platform_lpc_release(pdev);
+   link->lpc_mem = 0;
+   }
+
+   mutex_unlock(&link->lpc_mem_lock);
+}
diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index 97415afd79f3..20b417e00949 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -141,4 +141,37 @@ int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 
offset);
  u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id);
  void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
  
+/**

+ * ocxl_link_add_lpc_mem() - Increment the amount of memory required by an 
OpenCAPI link
+ *
+ * @link_handle: The OpenCAPI link handle
+ * @offset: The offset of the memory to add
+ * @size: The amount of memory to increment by
+ *
+ * Return 0 on success, negative on overflow
+ */
+int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size);
+
+/**
+ * ocxl_link_lpc_map() - Map the LPC memory for an OpenCAPI device
+ *
+ * Since LPC memory belongs to a link, the whole LPC memory available
+ * on the link bust be mapped in order to make it accessible to a device.
+ *
+ * @link_handle: The OpenCAPI link handle
+ * @pdev: A device that is on the link
+ */
+u64 ocxl_link_lpc_map(void *link_handle, struct 

Re: [PATCH 04/10] powerpc: Map & release OpenCAPI LPC memory

2019-11-07 Thread Frederic Barrat




On 25/10/2019 at 06:46, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch adds platform support to map & release LPC memory.

Signed-off-by: Alastair D'Silva 
---
  arch/powerpc/include/asm/pnv-ocxl.h   |  2 ++
  arch/powerpc/platforms/powernv/ocxl.c | 41 +++
  include/linux/memory_hotplug.h|  5 
  mm/memory_hotplug.c   |  3 +-
  4 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h 
b/arch/powerpc/include/asm/pnv-ocxl.h
index 7de82647e761..f8f8ffb48aa8 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -32,5 +32,7 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void 
*platform_data, int pe_handle)
  
  extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);

  extern void pnv_ocxl_free_xive_irq(u32 irq);
+extern u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size);
+extern void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
  
  #endif /* _ASM_PNV_OCXL_H */

diff --git a/arch/powerpc/platforms/powernv/ocxl.c 
b/arch/powerpc/platforms/powernv/ocxl.c
index 8c65aacda9c8..c6d4234e0aba 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -475,6 +475,47 @@ void pnv_ocxl_spa_release(void *platform_data)
  }
  EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
  
+u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)

+{
+   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+   struct pnv_phb *phb = hose->private_data;
+   u32 bdfn = (pdev->bus->number << 8) | pdev->devfn;
+   u64 base_addr = 0;
+   int rc;
+
+   rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size, &base_addr);
+   if (rc) {
+   dev_warn(&pdev->dev,
+"OPAL could not allocate LPC memory, rc=%d\n", rc);
+   return 0;
+   }
+
+   base_addr = be64_to_cpu(base_addr);
+
+   rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
+ size >> PAGE_SHIFT);
+   if (rc)
+   return 0;
+
+   return base_addr;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup);
+
+void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev)
+{
+   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+   struct pnv_phb *phb = hose->private_data;
+   u32 bdfn = (pdev->bus->number << 8) | pdev->devfn;
+   int rc;
+
+   rc = opal_npu_mem_release(phb->opal_id, bdfn);
+   if (rc)
+   dev_warn(&pdev->dev,
+"OPAL reported rc=%d when releasing LPC memory\n", rc);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release);
+
+
  int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
  {
struct spa_data *data = (struct spa_data *) platform_data;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f46ea71b4ffd..3f5f1a642abe 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -339,6 +339,11 @@ static inline int remove_memory(int nid, u64 start, u64 
size)
  static inline void __remove_memory(int nid, u64 start, u64 size) {}
  #endif /* CONFIG_MEMORY_HOTREMOVE */
  
+#if CONFIG_MEMORY_HOTPLUG_SPARSE

+int check_hotplug_memory_addressable(unsigned long pfn,
+   unsigned long nr_pages);
+#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
+
  extern void __ref free_area_init_core_hotplug(int nid);
  extern int __add_memory(int nid, u64 start, u64 size);
  extern int add_memory(int nid, u64 start, u64 size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2cecf07b396f..b39827dbd071 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,7 +278,7 @@ static int check_pfn_span(unsigned long pfn, unsigned long 
nr_pages,
return 0;
  }
  
-static int check_hotplug_memory_addressable(unsigned long pfn,

+int check_hotplug_memory_addressable(unsigned long pfn,
unsigned long nr_pages)
  {
const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
@@ -294,6 +294,7 @@ static int check_hotplug_memory_addressable(unsigned long 
pfn,
  
  	return 0;

  }
+EXPORT_SYMBOL_GPL(check_hotplug_memory_addressable);



Making check_hotplug_memory_addressable() visible in the kernel could be 
a separate patch, to make sure it gets the proper attention instead of 
being buried in a powerpc patch.

Also, already mentioned, but it shouldn't be exported.


  
  /*

   * Reasonably generic function for adding memory.  It is





Re: [PATCH 03/10] powerpc: Add OPAL calls for LPC memory alloc/release

2019-11-07 Thread Frederic Barrat




On 25/10/2019 at 06:46, Alastair D'Silva wrote:

From: Alastair D'Silva 

Add OPAL calls for LPC memory alloc/release

Signed-off-by: Alastair D'Silva 
Acked-by: Andrew Donnellan 
---



Acked-by: Frederic Barrat 




  arch/powerpc/include/asm/opal-api.h| 2 ++
  arch/powerpc/include/asm/opal.h| 3 +++
  arch/powerpc/platforms/powernv/opal-call.c | 2 ++
  3 files changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 378e3997845a..2c88c02e69ed 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -208,6 +208,8 @@
  #define OPAL_HANDLE_HMI2  166
  #define   OPAL_NX_COPROC_INIT 167
  #define OPAL_XIVE_GET_VP_STATE170
+#define OPAL_NPU_MEM_ALLOC 171
+#define OPAL_NPU_MEM_RELEASE   172
  #define OPAL_MPIPL_UPDATE 173
  #define OPAL_MPIPL_REGISTER_TAG   174
  #define OPAL_MPIPL_QUERY_TAG  175
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index a0cf8fba4d12..4db135fb54ab 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -39,6 +39,9 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t 
bdfn,
uint64_t PE_handle);
  int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
uint64_t rate_phys, uint32_t size);
+int64_t opal_npu_mem_alloc(uint64_t phb_id, uint32_t bdfn,
+   uint64_t size, uint64_t *bar);
+int64_t opal_npu_mem_release(uint64_t phb_id, uint32_t bdfn);
  
  int64_t opal_console_write(int64_t term_number, __be64 *length,

   const uint8_t *buffer);
diff --git a/arch/powerpc/platforms/powernv/opal-call.c 
b/arch/powerpc/platforms/powernv/opal-call.c
index a2aa5e433ac8..27c4b93c774c 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -287,6 +287,8 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar, 
OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
  OPAL_CALL(opal_sensor_read_u64,   OPAL_SENSOR_READ_U64);
  OPAL_CALL(opal_sensor_group_enable,   OPAL_SENSOR_GROUP_ENABLE);
  OPAL_CALL(opal_nx_coproc_init,OPAL_NX_COPROC_INIT);
+OPAL_CALL(opal_npu_mem_alloc,  OPAL_NPU_MEM_ALLOC);
+OPAL_CALL(opal_npu_mem_release,OPAL_NPU_MEM_RELEASE);
  OPAL_CALL(opal_mpipl_update,  OPAL_MPIPL_UPDATE);
  OPAL_CALL(opal_mpipl_register_tag,OPAL_MPIPL_REGISTER_TAG);
  OPAL_CALL(opal_mpipl_query_tag,   OPAL_MPIPL_QUERY_TAG);





Re: [PATCH 02/10] nvdimm: remove prototypes for nonexistent functions

2019-11-07 Thread Frederic Barrat




On 25/10/2019 at 06:46, Alastair D'Silva wrote:

From: Alastair D'Silva 

These functions don't exist, so remove the prototypes for them.

Signed-off-by: Alastair D'Silva 
---



Reviewed-by: Frederic Barrat 



  drivers/nvdimm/nd-core.h | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 25fa121104d0..9f121a6aeb02 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -124,11 +124,7 @@ void nd_region_create_dax_seed(struct nd_region 
*nd_region);
  int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus);
  void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus);
  void nd_synchronize(void);
-int nvdimm_bus_register_dimms(struct nvdimm_bus *nvdimm_bus);
-int nvdimm_bus_register_regions(struct nvdimm_bus *nvdimm_bus);
-int nvdimm_bus_init_interleave_sets(struct nvdimm_bus *nvdimm_bus);
  void __nd_device_register(struct device *dev);
-int nd_match_dimm(struct device *dev, void *data);
  struct nd_label_id;
  char *nd_label_gen_id(struct nd_label_id *label_id, u8 *uuid, u32 flags);
  bool nd_is_uuid_unique(struct device *dev, u8 *uuid);





[PATCH v2 3/4] powerpc/kvm/book3e: Replace current->mm by kvm->mm

2019-11-07 Thread Leonardo Bras
Given that in kvm_create_vm() there is:
kvm->mm = current->mm;

And that on every kvm_*_ioctl we have:
if (kvm->mm != current->mm)
return -EIO;

I see no reason to keep using current->mm instead of kvm->mm.

Doing so reduces the use of 'global' variables in the code and relies
more on the contents of the kvm struct.

Signed-off-by: Leonardo Bras 
---
 arch/powerpc/kvm/booke.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index be9a45874194..fd7bdb4f8f87 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -775,7 +775,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
debug = current->thread.debug;
current->thread.debug = vcpu->arch.dbg_reg;
 
-   vcpu->arch.pgdir = current->mm->pgd;
+   vcpu->arch.pgdir = vcpu->kvm->mm->pgd;
kvmppc_fix_ee_before_entry();
 
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
-- 
2.23.0



[PATCH v2 2/4] powerpc/kvm/book3s: Replace current->mm by kvm->mm

2019-11-07 Thread Leonardo Bras
Given that in kvm_create_vm() there is:
kvm->mm = current->mm;

And that on every kvm_*_ioctl we have:
if (kvm->mm != current->mm)
return -EIO;

I see no reason to keep using current->mm instead of kvm->mm.

Doing so reduces the use of 'global' variables in the code and relies
more on the contents of the kvm struct.

Signed-off-by: Leonardo Bras 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 +-
 arch/powerpc/kvm/book3s_64_vio.c| 10 ++
 arch/powerpc/kvm/book3s_hv.c| 10 +-
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 9a75f0e1933b..43b3cdf011bd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -296,7 +296,7 @@ static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, 
unsigned long flags,
/* Protect linux PTE lookup from page table destruction */
rcu_read_lock_sched();  /* this disables preemption too */
ret = kvmppc_do_h_enter(kvm, flags, pte_index, pteh, ptel,
-   current->mm->pgd, false, pte_idx_ret);
+   kvm->mm->pgd, false, pte_idx_ret);
rcu_read_unlock_sched();
if (ret == H_TOO_HARD) {
/* this can't happen */
@@ -592,8 +592,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
npages = get_user_pages_fast(hva, 1, writing ? FOLL_WRITE : 0, pages);
if (npages < 1) {
/* Check if it's an I/O mapping */
-   down_read(&current->mm->mmap_sem);
-   vma = find_vma(current->mm, hva);
+   down_read(&kvm->mm->mmap_sem);
+   vma = find_vma(kvm->mm, hva);
if (vma && vma->vm_start <= hva && hva + psize <= vma->vm_end &&
(vma->vm_flags & VM_PFNMAP)) {
pfn = vma->vm_pgoff +
@@ -602,7 +602,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
is_ci = pte_ci(__pte((pgprot_val(vma->vm_page_prot))));
write_ok = vma->vm_flags & VM_WRITE;
}
-   up_read(&current->mm->mmap_sem);
+   up_read(&kvm->mm->mmap_sem);
if (!pfn)
goto out_put;
} else {
@@ -621,7 +621,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 * hugepage split and collapse.
 */
local_irq_save(flags);
-   ptep = find_current_mm_pte(current->mm->pgd,
+   ptep = find_current_mm_pte(kvm->mm->pgd,
   hva, NULL, NULL);
if (ptep) {
pte = kvmppc_read_update_linux_pte(ptep, 1);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index a402ead833b6..308aa3a639a5 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -253,10 +253,11 @@ static int kvm_spapr_tce_release(struct inode *inode, 
struct file *filp)
}
}
 
+   account_locked_vm(kvm->mm,
+   kvmppc_stt_pages(kvmppc_tce_pages(stt->size)), false);
+
kvm_put_kvm(stt->kvm);
 
-   account_locked_vm(current->mm,
-   kvmppc_stt_pages(kvmppc_tce_pages(stt->size)), false);
call_rcu(&stt->rcu, release_spapr_tce_table);
 
return 0;
@@ -272,6 +273,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 {
struct kvmppc_spapr_tce_table *stt = NULL;
struct kvmppc_spapr_tce_table *siter;
+   struct mm_struct *mm = kvm->mm;
unsigned long npages, size = args->size;
int ret = -ENOMEM;
 
@@ -280,7 +282,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
return -EINVAL;
 
npages = kvmppc_tce_pages(size);
-   ret = account_locked_vm(current->mm, kvmppc_stt_pages(npages), true);
+   ret = account_locked_vm(mm, kvmppc_stt_pages(npages), true);
if (ret)
return ret;
 
@@ -325,7 +327,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
kvm_put_kvm(kvm);
kfree(stt);
  fail_acct:
-   account_locked_vm(current->mm, kvmppc_stt_pages(npages), false);
+   account_locked_vm(mm, kvmppc_stt_pages(npages), false);
return ret;
 }
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 709cf1fd4cf4..679008c511e4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4280,7 +4280,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct 
kvm_vcpu *vcpu)
user_vrsave = mfspr(SPRN_VRSAVE);
 
vcpu->arch.wqp = &vcpu->arch.vcore->wq;
-   vcpu->arch.pgdir = current->mm->pgd;
+   vcpu->arch.pgdir = kvm->mm->pgd;
vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
 
do {
@@ 

[PATCH v2 0/4] Replace current->mm by kvm->mm on powerpc/kvm

2019-11-07 Thread Leonardo Bras
Replacing current->mm by kvm->mm reduces the use of the 'global' current in
the code and relies more on the contents of the kvm struct.

In the code, I found that kvm_create_vm() contains:
kvm->mm = current->mm;

And that every kvm_*_ioctl has a test like this:
if (kvm->mm != current->mm)
return -EIO;

So this change would be safe.
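
The argument can be made concrete with a small user-space model. Everything here is a hypothetical stand-in (struct names, vm_create(), vm_ioctl()) and it only models the ioctl paths described above: once the check has passed, kvm->mm and the caller's mm are the same pointer, so using either one gives the same result.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mm_struct and struct kvm. */
struct mm_model { int id; };
struct kvm_model { struct mm_model *mm; };

/* Mirrors the assignment done at VM creation time. */
static void vm_create(struct kvm_model *kvm, struct mm_model *current_mm)
{
    kvm->mm = current_mm;
}

/*
 * Mirrors the guard at the top of the ioctl handlers: callers with a
 * different mm are rejected, so past the check kvm->mm == current_mm.
 */
static int vm_ioctl(struct kvm_model *kvm, struct mm_model *current_mm,
                    struct mm_model **used_mm)
{
    if (kvm->mm != current_mm)
        return -EIO;

    *used_mm = kvm->mm; /* same pointer as current_mm at this point */
    return 0;
}
```

A call made from the creating address space succeeds and operates on that mm; a call from any other address space returns -EIO before the mm is touched.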

Also, I fixed a possible 'use after free' of the kvm variable in
kvm_vm_ioctl_create_spapr_tce, where it does a mutex_unlock(&kvm->lock)
after a kvm_put_kvm(kvm).

Changes since v1:
- Fixes possible 'use after free' on kvm_spapr_tce_release (from v1)
- Fixes possible 'use after free' on kvm_vm_ioctl_create_spapr_tce
- Fixes undeclared variable error

Build test:
- https://travis-ci.org/LeoBras/linux-ppc/builds/608807573

Leonardo Bras (4):
  powerpc/kvm/book3s: Fixes possible 'use after release' of kvm
  powerpc/kvm/book3s: Replace current->mm by kvm->mm
  powerpc/kvm/book3e: Replace current->mm by kvm->mm
  powerpc/kvm/e500: Replace current->mm by kvm->mm

 arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 +-
 arch/powerpc/kvm/book3s_64_vio.c| 13 +++--
 arch/powerpc/kvm/book3s_hv.c| 10 +-
 arch/powerpc/kvm/booke.c|  2 +-
 arch/powerpc/kvm/e500_mmu_host.c|  6 +++---
 5 files changed, 21 insertions(+), 20 deletions(-)

-- 
2.23.0
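
The safety argument in the cover letter can be sketched in userspace C. This is only an illustration under stated assumptions: struct mm_struct and struct kvm are reduced stand-ins, current_mm models current->mm, and vm_create() and vm_ioctl() are hypothetical names rather than kernel functions.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-ins for the kernel types; illustration only. */
struct mm_struct { int id; };
struct kvm { struct mm_struct *mm; };

/* Stand-in for the kernel's per-task current->mm. */
static struct mm_struct *current_mm;

/* Mirrors the assignment quoted from kvm_create_vm(). */
static void vm_create(struct kvm *kvm)
{
	kvm->mm = current_mm;
}

/* Mirrors the guard quoted from the kvm_*_ioctl entry points. */
static int vm_ioctl(const struct kvm *kvm)
{
	if (kvm->mm != current_mm)
		return -EIO;

	/* Past the guard, kvm->mm and current_mm name the same mm. */
	assert(kvm->mm == current_mm);
	return 0;
}
```

Once the guard has passed, any code reached from the ioctl can read kvm->mm where it previously read current->mm and observe the same pointer.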



[PATCH v2 4/4] powerpc/kvm/e500: Replace current->mm by kvm->mm

2019-11-07 Thread Leonardo Bras
Given that in kvm_create_vm() there is:
kvm->mm = current->mm;

And that on every kvm_*_ioctl we have:
if (kvm->mm != current->mm)
return -EIO;

I see no reason to keep using current->mm instead of kvm->mm.

Doing so reduces the use of 'global' variables in the code and relies
more on the contents of the kvm struct.

Signed-off-by: Leonardo Bras 
---
 arch/powerpc/kvm/e500_mmu_host.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 321db0fdb9db..425d13806645 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -355,9 +355,9 @@ static inline int kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
 
if (tlbsel == 1) {
struct vm_area_struct *vma;
-   down_read(&current->mm->mmap_sem);
+   down_read(&kvm->mm->mmap_sem);
 
-   vma = find_vma(current->mm, hva);
+   vma = find_vma(kvm->mm, hva);
if (vma && hva >= vma->vm_start &&
(vma->vm_flags & VM_PFNMAP)) {
/*
@@ -441,7 +441,7 @@ static inline int kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1);
}
 
-   up_read(&current->mm->mmap_sem);
+   up_read(&kvm->mm->mmap_sem);
}
 
if (likely(!pfnmap)) {
-- 
2.23.0



[PATCH v2 1/4] powerpc/kvm/book3s: Fixes possible 'use after release' of kvm

2019-11-07 Thread Leonardo Bras
Fixes a possible 'use after free' of the kvm variable in
kvm_vm_ioctl_create_spapr_tce, where it does a mutex_unlock(&kvm->lock)
after a kvm_put_kvm(kvm).

Signed-off-by: Leonardo Bras 
---
 arch/powerpc/kvm/book3s_64_vio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 5834db0a54c6..a402ead833b6 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -316,14 +316,13 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 
if (ret >= 0)
list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
-   else
-   kvm_put_kvm(kvm);
 
mutex_unlock(&kvm->lock);
 
if (ret >= 0)
return ret;
 
+   kvm_put_kvm(kvm);
kfree(stt);
  fail_acct:
account_locked_vm(current->mm, kvmppc_stt_pages(npages), false);
-- 
2.23.0
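
The ordering this patch establishes (unlock first, drop the reference last) can be modelled in userspace. This is a minimal sketch with hypothetical names: struct vm, vm_get(), vm_put(), vm_lock(), vm_unlock() and fail_path() are not kernel APIs, and "freeing" is modelled by a flag so that a late unlock would trip an assertion instead of touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a refcounted object that owns a lock. */
struct vm {
	int refcount;
	bool locked;
	bool freed;
};

static void vm_get(struct vm *vm)
{
	vm->refcount++;
}

/* Dropping the last reference "frees" the object (modelled by a flag). */
static void vm_put(struct vm *vm)
{
	if (--vm->refcount == 0)
		vm->freed = true;
}

static void vm_lock(struct vm *vm)
{
	assert(!vm->freed);
	vm->locked = true;
}

static void vm_unlock(struct vm *vm)
{
	/* Touching the lock after the final put would be the bug. */
	assert(!vm->freed);
	vm->locked = false;
}

/* Error path in the order the patch uses: unlock first, put last. */
static void fail_path(struct vm *vm)
{
	vm_lock(vm);
	/* ... setup fails here ... */
	vm_unlock(vm);
	vm_put(vm);
}
```

Swapping the last two calls in fail_path() makes vm_unlock() run on an object whose final reference is already gone, which is the situation the patch avoids.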



[PATCH v3] powerpc/fadump: when fadump is supported register the fadump sysfs files.

2019-11-07 Thread Michal Suchanek
Currently the kernel sysfs interface cannot distinguish between fadump being
supported by firmware but disabled in the kernel and fadump being completely
unsupported. The user can inspect the devicetree, but it is more reasonable
to provide sysfs files in case we get some fadumpv2 in the future.

With this patch sysfs files are available whenever fadump is supported
by firmware.

There is a duplicate message about lack of firmware support in
fadump_reserve_mem and setup_fadump. Remove the duplicate message in
setup_fadump.

Signed-off-by: Michal Suchanek 
---
v2: move the sysfs initialization earlier to avoid condition nesting
v3: remove duplicate message
---
 arch/powerpc/kernel/fadump.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index ed59855430b9..ff0114aeba9b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1466,16 +1466,15 @@ static void fadump_init_files(void)
  */
 int __init setup_fadump(void)
 {
-   if (!fw_dump.fadump_enabled)
-   return 0;
-
-   if (!fw_dump.fadump_supported) {
-   printk(KERN_ERR "Firmware-assisted dump is not supported on"
-   " this hardware\n");
+   if (!fw_dump.fadump_supported)
return 0;
-   }
 
+   fadump_init_files();
fadump_show_config();
+
+   if (!fw_dump.fadump_enabled)
+   return 1;
+
/*
 * If dump data is available then see if it is valid and prepare for
 * saving it to the disk.
@@ -1492,8 +1491,6 @@ int __init setup_fadump(void)
else if (fw_dump.reserve_dump_area_size)
fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 
-   fadump_init_files();
-
return 1;
 }
 subsys_initcall(setup_fadump);
-- 
2.23.0



Re: [PATCH] powerpc/fadump: Remove duplicate message.

2019-11-07 Thread Michal Suchánek
On Thu, Oct 24, 2019 at 01:16:51PM +0200, Michal Suchánek wrote:
> On Thu, Oct 24, 2019 at 04:08:08PM +0530, Hari Bathini wrote:
> > 
> > Michal, thanks for looking into this.
> > 
> > On 23/10/19 11:26 PM, Michal Suchanek wrote:
> > > There is duplicate message about lack of support by firmware in
> > > fadump_reserve_mem and setup_fadump. Due to different capitalization it
> > > is clear that the one in setup_fadump is shown on boot. Remove the
> > > duplicate that is not shown.
> > 
> > Actually, the message in fadump_reserve_mem() is logged. 
> > fadump_reserve_mem()
> > executes first and sets fw_dump.fadump_enabled to `0`, if fadump is not 
> > supported.
> > So, the other message in setup_fadump() doesn't get logged anymore with 
> > recent
> > changes. The right thing to do would be to remove similar message in 
> > setup_fadump() instead.
> 
> I need to re-check with a recent kernel build. I saw the message from
> setup_fadump and not the one from fadump_reserve_mem but not sure what
> the platform init code looked like in the kernel I tested with.

Indeed, I was missing the patch that changes the capitalization in
fadump_reserve_mem. In my kernel both messages are the same and the one
from fadump_reserve_mem is displayed.

Thanks

Michal


Re: [PATCH v6 0/7] Powerpc/Watchpoint: Few important fixes

2019-11-07 Thread Ravi Bangoria




On 10/29/19 7:31 PM, Christophe Leroy wrote:



Le 29/10/2019 à 05:54, Ravi Bangoria a écrit :



On 10/17/19 3:01 PM, Ravi Bangoria wrote:

v5: https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-October/198069.html

v5->v6:
  - patch 6/7: mpe reported that the perf-hwbreak.c doesn't compile with older
    gcc:
 perf-hwbreak.c:182:2: error: dereferencing type-punned pointer will
 break strict-aliasing rules [-Werror=strict-aliasing]
 temp16 = *((__u16 *)target);
 ^
    Fixed that.



Hi Christophe, Are you ok with the series wrt 8xx?


Yes it looks ok on my 885:

root@vgoip:~# ./ptrace-hwbreak
test: ptrace-hwbreak
tags: git_version:v5.4-rc4-835-gb235e63aa9f0-dirty
PTRACE_SET_DEBUGREG, WO, len: 1: Ok
PTRACE_SET_DEBUGREG, WO, len: 2: Ok
PTRACE_SET_DEBUGREG, WO, len: 4: Ok
PTRACE_SET_DEBUGREG, WO, len: 8: Ok
PTRACE_SET_DEBUGREG, RO, len: 1: Ok
PTRACE_SET_DEBUGREG, RO, len: 2: Ok
PTRACE_SET_DEBUGREG, RO, len: 4: Ok
PTRACE_SET_DEBUGREG, RO, len: 8: Ok
PTRACE_SET_DEBUGREG, RW, len: 1: Ok
PTRACE_SET_DEBUGREG, RW, len: 2: Ok
PTRACE_SET_DEBUGREG, RW, len: 4: Ok
PTRACE_SET_DEBUGREG, RW, len: 8: Ok
PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok
PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok
PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok
success: ptrace-hwbreak

Also ok on book3s/32:


mpe, Can you please pull the series.

Ravi



Re: [PATCH V8] mm/debug: Add tests validating architecture page table helpers

2019-11-07 Thread Anshuman Khandual



On 11/07/2019 06:24 PM, Michael Ellerman wrote:
> Anshuman Khandual  writes:
>> On 11/06/2019 12:11 PM, Christophe Leroy wrote:
>>> Le 06/11/2019 à 04:22, Anshuman Khandual a écrit :
>>>> On 10/28/2019 10:59 AM, Anshuman Khandual wrote:
>>>>> +    ---
>>>>> +    | arch |status|
>>>>> +    ---
>>>>> +    |   alpha: | TODO |
>>>>> +    | arc: | TODO |
>>>>> +    | arm: | TODO |
>>>>> +    |   arm64: |  ok  |
>>>>> +    | c6x: | TODO |
>>>>> +    |    csky: | TODO |
>>>>> +    |   h8300: | TODO |
>>>>> +    | hexagon: | TODO |
>>>>> +    |    ia64: | TODO |
>>>>> +    |    m68k: | TODO |
>>>>> +    |  microblaze: | TODO |
>>>>> +    |    mips: | TODO |
>>>>> +    |   nds32: | TODO |
>>>>> +    |   nios2: | TODO |
>>>>> +    |    openrisc: | TODO |
>>>>> +    |  parisc: | TODO |
>>>>> +    | powerpc: | TODO |
>>>>> +    |   ppc32: |  ok  |
>>>
>>> Note that ppc32 is a part of powerpc, not a standalone arch.
>>
>> Right, I understand. But we are yet to hear about how this test
>> came about on powerpc server platforms. Will update 'powerpc'
>> arch listing above once we get some confirmation. May be once
>> this works on all relevant powerpc platforms, we can just merge
>> 'powerpc' and 'ppc32' entries here as just 'powerpc'.
> 
> On pseries:
> 
>   watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.4.0-rc6-gcc-8.2.0-next-20191107-1-g250339d6747b-dirty #152
>   NIP:  c10435a0 LR: c10434b4 CTR: 
>   REGS: c0003a403980 TRAP: 0901   Not tainted  
> (5.4.0-rc6-gcc-8.2.0-next-20191107-1-g250339d6747b-dirty)
>   MSR:  82009033   CR: 44000222  XER: 
> 
>   CFAR: c10435a8 IRQMASK: 0 
>   GPR00: c10434b4 c0003a403c10 c1295000 0521000100c0 
>   GPR04: 8105 00400dc0 3eb0 0001 
>   GPR08:   0001 0100 
>   GPR12:  c18f 
>   NIP [c10435a0] debug_vm_pgtable+0x43c/0x82c
>   LR [c10434b4] debug_vm_pgtable+0x350/0x82c
>   Call Trace:
>   [c0003a403c10] [c104346c] debug_vm_pgtable+0x308/0x82c 
> (unreliable)
>   [c0003a403ce0] [c1004310] kernel_init_freeable+0x1d0/0x39c
>   [c0003a403db0] [c0010da0] kernel_init+0x24/0x174
>   [c0003a403e20] [c000bdc4] ret_from_kernel_thread+0x5c/0x78
>   Instruction dump:
>   7d075078 7ce74b78 7ce0f9ad 40c2fff0 3880 7f83e378 4b02eee1 6000 
>   4880 3920 3941 3900 <7ea0f8a8> 7ea75039 40c2fff8 7ea74878 
> 
> Looking at the asm I think it's stuck in hash__pte_update() waiting for
> H_PAGE_BUSY to clear, but not sure why.
> 
> That's just using qemu TCG, instructions here if anyone wants to test it
> themselves :)
> 
>   https://github.com/linuxppc/wiki/wiki/Booting-with-Qemu
> 
> 
> If I boot with -cpu power9 (using Radix MMU), I get a plain old BUG:
> 
>   debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table 
> helpers
>   [ cut here ]
>   kernel BUG at arch/powerpc/mm/pgtable.c:274!
>   Oops: Exception in kernel mode, sig: 5 [#1]
>   LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.4.0-rc6-gcc-8.2.0-next-20191107-1-g250339d6747b-dirty #152
>   NIP:  c00724e8 LR: c104358c CTR: 
>   REGS: c0003a483980 TRAP: 0700   Not tainted  
> (5.4.0-rc6-gcc-8.2.0-next-20191107-1-g250339d6747b-dirty)
>   MSR:  82029033   CR: 24000224  XER: 
> 2000
>   CFAR: c1043588 IRQMASK: 0 
>   GPR00: c104358c c0003a483c10 c1295000 0009 
>   GPR04:  0005  0009 
>   GPR08: 0001 000e 0001 c0003a5f 
>   GPR12:  c18f c0010d84  
>   GPR16:   c0003a5f 8105 
>   GPR20: c1003ab8 0015 0500613a0080 0900603a0080 
>   GPR24: 09202e3a0080 c133bd90 c133bd98 c133bda0 
>  

[PATCH 2/2] powerpc/perf: Check pmus_inuse flag in perf_event_print_debug()

2019-11-07 Thread Madhavan Srinivasan
The pmu_inuse flag is part of the lppaca struct and tells the hypervisor
whether the guest/partition is using the PMUs. This provides a hint for
save/restore of PMU registers. Currently perf_event_print_debug()
does not check the pmu_inuse flag, so it is not safe to use it to
dump PMU SPRs on a CONFIG_PPC_PSERIES kernel.

This patch adds two things: 1) an inline ppc_get_pmu_inuse() to get
the pmu_inuse value, and 2) a check in perf_event_print_debug() before
dumping the PMU SPRs.

ppc_get_pmu_inuse() is based on ppc_set_pmu_inuse() and includes the same
CONFIG_ checks.
---
 arch/powerpc/include/asm/pmc.h  | 15 +++
 arch/powerpc/perf/core-book3s.c |  9 +
 2 files changed, 24 insertions(+)

diff --git a/arch/powerpc/include/asm/pmc.h b/arch/powerpc/include/asm/pmc.h
index c6bbe9778d3c..35179d218e2e 100644
--- a/arch/powerpc/include/asm/pmc.h
+++ b/arch/powerpc/include/asm/pmc.h
@@ -34,11 +34,26 @@ static inline void ppc_set_pmu_inuse(int inuse)
 #endif
 }
 
+static inline u8 ppc_get_pmu_inuse(void)
+{
+#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+   if (firmware_has_feature(FW_FEATURE_LPAR)) {
+#ifdef CONFIG_PPC_PSERIES
+   return get_lppaca()->pmcregs_in_use;
+#endif
+   }
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   return get_paca()->pmcregs_in_use;
+#endif
+#endif
+}
+
 extern void power4_enable_pmcs(void);
 
 #else /* CONFIG_PPC64 */
 
 static inline void ppc_set_pmu_inuse(int inuse) { }
+static inline u8 ppc_get_pmu_inuse(void) { }
 
 #endif
 
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index f455e274281a..855a5f9589ef 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -816,6 +816,15 @@ void perf_event_print_debug(void)
if (!ppmu->n_counter)
return;
 
+   /*
+* Check pmu_inuse flag. As per PAPR spec, hypervisor
+* will save/restore the PMU regs only if pmu_inuse is
+* set. If it's not enabled, values dumped from these SPRs
+* may not be valid or useful.
+*/
+   if (!ppc_get_pmu_inuse())
+   return;
+
local_irq_save(flags);
 
pr_info("CPU: %d PMU registers, ppmu = %s n_counters = %d",
-- 
2.21.0
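
The check added to perf_event_print_debug() can be sketched in userspace. This is a simplified model, not the posted code: is_lpar, the two flag variables, pmu_inuse_set(), pmu_inuse_get() and dump_pmu_regs() are hypothetical stand-ins. One deliberate difference from the hunk above is that the getter returns a value on every control path; as posted, the non-PPC64 stub has an empty body and the PPC64 variant can reach the end of the function without a return under some config combinations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for firmware state and the two in-use flags. */
static bool is_lpar;
static uint8_t lppaca_pmcregs_in_use;
static uint8_t paca_pmcregs_in_use;

/* Simplified setter: record whether the PMU is in use. */
static void pmu_inuse_set(int inuse)
{
	assert(inuse == 0 || inuse == 1);
	if (is_lpar)
		lppaca_pmcregs_in_use = (uint8_t)inuse;
	else
		paca_pmcregs_in_use = (uint8_t)inuse;
}

/* Getter sketch: every control path returns a defined value. */
static uint8_t pmu_inuse_get(void)
{
	if (is_lpar)
		return lppaca_pmcregs_in_use;
	return paca_pmcregs_in_use;
}

/* Returns true if registers would be dumped, false if the dump is skipped. */
static bool dump_pmu_regs(void)
{
	if (!pmu_inuse_get())
		return false;
	/* ... the SPR dump would go here ... */
	return true;
}
```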



[PATCH 1/2] powerpc/perf: Add mtmmcr0(FC) after ppc_set_pmu_inuse(1)

2019-11-07 Thread Madhavan Srinivasan
The pmu_inuse flag is part of the lppaca struct and tells the hypervisor
whether the guest/partition is using the PMUs. This provides a hint for
save/restore of PMU registers. In power_pmu_enable(), Linux sets
the pmu_inuse flag and then updates the PMU registers. The current sequence
in power_pmu_enable() is 1) update the pmu_inuse flag, 2) update MMCRA, MMCR1,
MMCR0 and so on. With this sequence there is a window, while MMCRA is
being updated, in which the hypervisor could load a stale value into MMCR0,
which could cause a PMI exception. This patch adds an mtspr to MMCR0 with the
freeze counter bit set right after updating the pmu_inuse flag to avoid any
overflow scenarios.
---
 arch/powerpc/perf/core-book3s.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3fb6d265ed17..f455e274281a 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1351,6 +1351,7 @@ static void power_pmu_enable(struct pmu *pmu)
 * Then unfreeze the events.
 */
ppc_set_pmu_inuse(1);
+   mtspr(SPRN_MMCR0, MMCR0_FC);
mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
mtspr(SPRN_MMCR1, cpuhw->mmcr[1]);
mtspr(SPRN_MMCR0, (cpuhw->mmcr[0] & ~(MMCR0_PMC1CE | MMCR0_PMCjCE))
-- 
2.21.0



Re: [PATCH V8] mm/debug: Add tests validating architecture page table helpers

2019-11-07 Thread Michael Ellerman
Anshuman Khandual  writes:
> On 11/06/2019 12:11 PM, Christophe Leroy wrote:
>> Le 06/11/2019 à 04:22, Anshuman Khandual a écrit :
>>> On 10/28/2019 10:59 AM, Anshuman Khandual wrote:
>>>> +    ---
>>>> +    | arch |status|
>>>> +    ---
>>>> +    |   alpha: | TODO |
>>>> +    | arc: | TODO |
>>>> +    | arm: | TODO |
>>>> +    |   arm64: |  ok  |
>>>> +    | c6x: | TODO |
>>>> +    |    csky: | TODO |
>>>> +    |   h8300: | TODO |
>>>> +    | hexagon: | TODO |
>>>> +    |    ia64: | TODO |
>>>> +    |    m68k: | TODO |
>>>> +    |  microblaze: | TODO |
>>>> +    |    mips: | TODO |
>>>> +    |   nds32: | TODO |
>>>> +    |   nios2: | TODO |
>>>> +    |    openrisc: | TODO |
>>>> +    |  parisc: | TODO |
>>>> +    | powerpc: | TODO |
>>>> +    |   ppc32: |  ok  |
>> 
>> Note that ppc32 is a part of powerpc, not a standalone arch.
>
> Right, I understand. But we are yet to hear about how this test
> came about on powerpc server platforms. Will update 'powerpc'
> arch listing above once we get some confirmation. May be once
> this works on all relevant powerpc platforms, we can just merge
> 'powerpc' and 'ppc32' entries here as just 'powerpc'.

On pseries:

  watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.4.0-rc6-gcc-8.2.0-next-20191107-1-g250339d6747b-dirty #152
  NIP:  c10435a0 LR: c10434b4 CTR: 
  REGS: c0003a403980 TRAP: 0901   Not tainted  
(5.4.0-rc6-gcc-8.2.0-next-20191107-1-g250339d6747b-dirty)
  MSR:  82009033   CR: 44000222  XER: 
  CFAR: c10435a8 IRQMASK: 0 
  GPR00: c10434b4 c0003a403c10 c1295000 0521000100c0 
  GPR04: 8105 00400dc0 3eb0 0001 
  GPR08:   0001 0100 
  GPR12:  c18f 
  NIP [c10435a0] debug_vm_pgtable+0x43c/0x82c
  LR [c10434b4] debug_vm_pgtable+0x350/0x82c
  Call Trace:
  [c0003a403c10] [c104346c] debug_vm_pgtable+0x308/0x82c 
(unreliable)
  [c0003a403ce0] [c1004310] kernel_init_freeable+0x1d0/0x39c
  [c0003a403db0] [c0010da0] kernel_init+0x24/0x174
  [c0003a403e20] [c000bdc4] ret_from_kernel_thread+0x5c/0x78
  Instruction dump:
  7d075078 7ce74b78 7ce0f9ad 40c2fff0 3880 7f83e378 4b02eee1 6000 
  4880 3920 3941 3900 <7ea0f8a8> 7ea75039 40c2fff8 7ea74878 

Looking at the asm I think it's stuck in hash__pte_update() waiting for
H_PAGE_BUSY to clear, but not sure why.

That's just using qemu TCG, instructions here if anyone wants to test it
themselves :)

  https://github.com/linuxppc/wiki/wiki/Booting-with-Qemu


If I boot with -cpu power9 (using Radix MMU), I get a plain old BUG:

  debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table helpers
  [ cut here ]
  kernel BUG at arch/powerpc/mm/pgtable.c:274!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.4.0-rc6-gcc-8.2.0-next-20191107-1-g250339d6747b-dirty #152
  NIP:  c00724e8 LR: c104358c CTR: 
  REGS: c0003a483980 TRAP: 0700   Not tainted  
(5.4.0-rc6-gcc-8.2.0-next-20191107-1-g250339d6747b-dirty)
  MSR:  82029033   CR: 24000224  XER: 2000
  CFAR: c1043588 IRQMASK: 0 
  GPR00: c104358c c0003a483c10 c1295000 0009 
  GPR04:  0005  0009 
  GPR08: 0001 000e 0001 c0003a5f 
  GPR12:  c18f c0010d84  
  GPR16:   c0003a5f 8105 
  GPR20: c1003ab8 0015 0500613a0080 0900603a0080 
  GPR24: 09202e3a0080 c133bd90 c133bd98 c133bda0 
  GPR28: c0003a5e c0003a600af8 c0003a2e2d48 c0003a6100a0 
  NIP [c00724e8] assert_pte_locked+0x88/0x190
  LR [c104358c] debug_vm_pgtable+0x428/0x82c
  Call Trace:
  [c0003a483c10] [c104346c] debug_vm_pgtable+0x308/0x82c 
(unreliable)
  [c0003a483ce0] [c1004310] kernel_init_freeable+0x1d0/0x39c
  [c0003a483db0] [c0010da0] kernel_init+0x24/0x174
  [c0003a483e20] [c000bdc4] ret_from_kernel_thread+0x5c/0x78
  Instruction dump:
  7d251a14 39070010 7d463030 7d084a14 38c6 7c884436 7cc607b4 7d083038 
  79081f24 7ccb402a 7cc80074 7908d182 <0b08> 78cb0022 54c8c03e 7d473830 
  ---[ end trace a694f1bc56529c0e ]---


cheers


Re: [PATCH v5 5/6] powerpc: Chunk calls to flush_dcache_range in arch_*_memory

2019-11-07 Thread Michael Ellerman
"Alastair D'Silva"  writes:
> From: Alastair D'Silva 
>
> When presented with large amounts of memory being hotplugged
> (in my test case, ~890GB), the call to flush_dcache_range takes
> a while (~50 seconds), triggering RCU stalls.
>
> This patch breaks up the call into 1GB chunks, calling
> cond_resched() inbetween to allow the scheduler to run.
>
> Signed-off-by: Alastair D'Silva 

I'm going to mark this as:
  Fixes: fb5924fddf9e ("powerpc/mm: Flush cache on memory hot(un)plug")

Because anyone doing large memory hotplugs on older kernels is going to
want to backport this to at least that point, otherwise they will see
the softlockups/RCU stalls.

cheers

> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 54d61ba15e93..a7b662fc02c8 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -104,6 +104,27 @@ int __weak remove_section_mapping(unsigned long start, 
> unsigned long end)
>   return -ENODEV;
>  }
>  
> +#define FLUSH_CHUNK_SIZE SZ_1G
> +/**
> + * flush_dcache_range_chunked(): Write any modified data cache blocks out to
> + * memory and invalidate them, in chunks of up to FLUSH_CHUNK_SIZE
> + * Does not invalidate the corresponding instruction cache blocks.
> + *
> + * @start: the start address
> + * @stop: the stop address (exclusive)
> + * @chunk: the max size of the chunks
> + */
> +static void flush_dcache_range_chunked(unsigned long start, unsigned long 
> stop,
> +unsigned long chunk)
> +{
> + unsigned long i;
> +
> + for (i = start; i < stop; i += chunk) {
> + flush_dcache_range(i, min(stop, start + chunk));
> + cond_resched();
> + }
> +}
> +
>  int __ref arch_add_memory(int nid, u64 start, u64 size,
>   struct mhp_restrictions *restrictions)
>  {
> @@ -120,7 +141,8 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
>   start, start + size, rc);
>   return -EFAULT;
>   }
> - flush_dcache_range(start, start + size);
> +
> + flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
>  
>   return __add_pages(nid, start_pfn, nr_pages, restrictions);
>  }
> @@ -137,7 +159,8 @@ void __ref arch_remove_memory(int nid, u64 start, u64 
> size,
>  
>   /* Remove htab bolted mappings for this section of memory */
>   start = (unsigned long)__va(start);
> - flush_dcache_range(start, start + size);
> + flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
> +
>   ret = remove_section_mapping(start, start + size);
>   WARN_ON_ONCE(ret);
>  
> -- 
> 2.21.0
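
The chunking loop in the quoted patch can be tried in userspace with a mock flush. This is a sketch under assumptions: mock_flush() stands in for flush_dcache_range() and only records its arguments, and flush_chunked() is a hypothetical name. Note one deliberate difference: the quoted hunk computes the upper bound as min(stop, start + chunk), which does not advance with the loop cursor, while this sketch assumes the intent is i + chunk clamped to stop.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLS 16

/* Records each [start, stop) range handed to the mock flush. */
static unsigned long calls[MAX_CALLS][2];
static size_t ncalls;

/* Stand-in for flush_dcache_range(); it only logs its arguments. */
static void mock_flush(unsigned long start, unsigned long stop)
{
	assert(ncalls < MAX_CALLS);
	calls[ncalls][0] = start;
	calls[ncalls][1] = stop;
	ncalls++;
}

/* Flush [start, stop) in pieces of at most 'chunk' bytes. */
static void flush_chunked(unsigned long start, unsigned long stop,
			  unsigned long chunk)
{
	unsigned long i;

	for (i = start; i < stop; i += chunk) {
		/* The upper bound follows the cursor i, clamped to stop. */
		unsigned long end = (stop - i < chunk) ? stop : i + chunk;

		mock_flush(i, end);
		/* The kernel version calls cond_resched() at this point. */
	}
}
```

With start = 100, stop = 350 and chunk = 100, the mock sees the ranges [100, 200), [200, 300) and [300, 350).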


Re: [PATCH 7/9] PCI: rpaphp: annotate and correctly byte swap DRC properties

2019-11-07 Thread Michael Ellerman
Tyrel Datwyler  writes:
> The device tree is in big endian format and any properties directly
> retrieved using OF helpers that don't explicitly byte swap should
> be annotated. In particular there are several places where we grab
> the opaque property value for the old ibm,drc-* properties and the
> ibm,my-drc-index property.
>
> Fix this for better static checking by annotating values we know to
> explicitly big endian, and byte swap where appropriate.
>
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/pci/hotplug/rpaphp_core.c | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)

This is allegedly still popping some sparse warnings:

  +drivers/pci/hotplug/rpaphp_core.c:XX:28: warning: incorrect type in 
assignment (different base types) expected restricted __be32 const [usertype] * 
got int const *[assigned] names
  +drivers/pci/hotplug/rpaphp_core.c:XX:28: warning: incorrect type in 
assignment (different base types) expected restricted __be32 const [usertype] * 
got int const *[assigned] types
  +drivers/pci/hotplug/rpaphp_core.c:XX:30: warning: incorrect type in 
assignment (different base types) expected restricted __be32 const [usertype] * 
got int const *[assigned] indexes
  +drivers/pci/hotplug/rpaphp_core.c:XX:36: warning: incorrect type in 
assignment (different base types) expected restricted __be32 const [usertype] * 
got int const *[assigned] domains


I say allegedly because that output's from a script that tries to diff
sparse warnings before and after the build and it's not always 100% reliable.

cheers


Re: [PATCH 9/9] powerpc/pseries: Enable support for ibm, drc-info property

2019-11-07 Thread Michael Ellerman
Tyrel Datwyler  writes:

> Advertise client support for the PAPR architected ibm,drc-info device
> tree property during CAS handshake.
>
> Signed-off-by: Tyrel Datwyler 

Can you mark this as:

  Fixes: c7a3275e0f9e ("powerpc/pseries: Revert support for ibm,drc-info 
devtree property")


I'm not sure we're going to backport all those fixes into stable
kernels, but at least then we have the link between this commit
c7a3275e0f9e recorded.

cheers

> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index a4e7762..2ca9966 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -1053,7 +1053,7 @@ static const struct ibm_arch_vec 
> ibm_architecture_vec_template __initconst = {
>   .reserved2 = 0,
>   .reserved3 = 0,
>   .subprocessors = 1,
> - .byte22 = OV5_FEAT(OV5_DRMEM_V2),
> + .byte22 = OV5_FEAT(OV5_DRMEM_V2) | OV5_FEAT(OV5_DRC_INFO),
>   .intarch = 0,
>   .mmu = 0,
>   .hash_ext = 0,
> -- 
> 2.7.4


Re: [PATCH 3/9] powerpc/pseries: Add cpu DLPAR support for drc-info property

2019-11-07 Thread Michael Ellerman
Tyrel Datwyler  writes:
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index bbda646..9ba006c 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -730,24 +774,49 @@ static int find_dlpar_cpus_to_add(u32 *cpu_drcs, u32 
> cpus_to_add)
>   return -1;
>   }
>  
> - /* Search the ibm,drc-indexes array for possible CPU drcs to
> -  * add. Note that the format of the ibm,drc-indexes array is
> -  * the number of entries in the array followed by the array
> -  * of drc values so we start looking at index = 1.
> -  */
> - index = 1;
> - while (cpus_found < cpus_to_add) {
> - u32 drc;
> + info = of_find_property(parent, "ibm,drc-info", NULL);
> + if (info) {
> + struct of_drc_info drc;
> + const __be32 *value;
> + int count;
>  
> - rc = of_property_read_u32_index(parent, "ibm,drc-indexes",
> - index++, &drc);
> - if (rc)
> - break;
> + value = of_prop_next_u32(info, NULL, &count);
> + if (value)
> + value++;
>  
> - if (dlpar_cpu_exists(parent, drc))
> - continue;
> + for (i = 0; i < count; i++) {
> + of_read_drc_info_cell(&info, &value, &drc);
> + if (strncmp(drc.drc_type, "CPU", 3))
> + break;
> +
> + for (j = 0; j < drc.num_sequential_elems && cpus_found 
> < cpus_to_add; j++) {

This line's nearly 100 columns, which suggests that this logic has
gotten too convoluted to be a single function.

So I think you should split one or both arms of the if out into separate
functions.

You're basically doing nothing after the if, so possibly you can just
return the result of the split out functions directly.

cheers

> + drc_index = drc.drc_index_start + 
> (drc.sequential_inc * j);
> +
> + if (dlpar_cpu_exists(parent, drc_index))
> + continue;
> +
> + cpu_drcs[cpus_found++] = drc_index;
> + }
> + }
> + } else {
> + /* Search the ibm,drc-indexes array for possible CPU drcs to
> +  * add. Note that the format of the ibm,drc-indexes array is
> +  * the number of entries in the array followed by the array
> +  * of drc values so we start looking at index = 1.
> +  */
> + index = 1;
> + while (cpus_found < cpus_to_add) {
> + rc = of_property_read_u32_index(parent, 
> "ibm,drc-indexes",
> + index++, &drc_index);
> +
> + if (rc)
> + break;
>  
> - cpu_drcs[cpus_found++] = drc;
> + if (dlpar_cpu_exists(parent, drc_index))
> + continue;
> +
> + cpu_drcs[cpus_found++] = drc_index;
> + }
>   }
>  
>   of_node_put(parent);
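
The split suggested in the review could look roughly like the following userspace sketch. Everything here is hypothetical: struct drc_range is a flattened stand-in for one ibm,drc-info entry, cpu_exists() replaces dlpar_cpu_exists() with a toy rule, and the helper names are invented for illustration. The point is only the shape: each arm of the if becomes its own function, and the caller returns the helper's result directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one ibm,drc-info entry. */
struct drc_range {
	uint32_t index_start;
	uint32_t num_elems;
	uint32_t inc;
};

/* Toy stand-in for dlpar_cpu_exists(): odd indexes count as present. */
static bool cpu_exists(uint32_t drc_index)
{
	return drc_index & 1;
}

/* Arm 1: walk drc-info style ranges. */
static size_t find_from_ranges(const struct drc_range *r, size_t nranges,
			       uint32_t *out, size_t want)
{
	size_t found = 0;

	for (size_t i = 0; i < nranges; i++) {
		for (uint32_t j = 0; j < r[i].num_elems && found < want; j++) {
			uint32_t idx = r[i].index_start + r[i].inc * j;

			if (!cpu_exists(idx))
				out[found++] = idx;
		}
	}
	return found;
}

/* Arm 2: walk a flat drc-indexes style array (count cell already skipped). */
static size_t find_from_indexes(const uint32_t *idx, size_t n,
				uint32_t *out, size_t want)
{
	size_t found = 0;

	for (size_t i = 0; i < n && found < want; i++) {
		if (!cpu_exists(idx[i]))
			out[found++] = idx[i];
	}
	return found;
}

/* The caller shrinks to a dispatch that returns the helper's result. */
static size_t find_cpus_to_add(const struct drc_range *r, size_t nranges,
			       const uint32_t *idx, size_t n,
			       uint32_t *out, size_t want)
{
	assert(out != NULL);
	if (r)
		return find_from_ranges(r, nranges, out, want);
	return find_from_indexes(idx, n, out, want);
}
```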


Re: [PATCH 0/9] Fixes and Enablement of ibm,drc-info property

2019-11-07 Thread Michael Ellerman
Tyrel Datwyler  writes:
> On 11/5/19 9:03 AM, Thomas Falcon wrote:
>> On 11/5/19 9:24 AM, Tyrel Datwyler wrote:
..
>>>
>>> This serious fixs the short comings of the previous submission
>> 
>> Either "seriously fixes the shortcomings", or "fixes the serious 
>> shortcomings?"

> Should be "series" as in this "patch series".

This serious series seriously fixes the series of serious shortcomings?

:P

cheers


Re: [PATCH v3] powerpc: Support CMDLINE_EXTEND

2019-11-07 Thread Michael Ellerman
Chris Packham  writes:
> Hi All,
>
> On Fri, 2019-08-02 at 06:40 +0200, Christophe Leroy wrote:
>> 
>> Le 02/08/2019 à 00:50, Chris Packham a écrit :
>> > Bring powerpc in line with other architectures that support extending or
>> > overriding the bootloader provided command line.
>> > 
>> > The current behaviour is most like CMDLINE_FROM_BOOTLOADER where the
>> > bootloader command line is preferred but the kernel config can provide a
>> > fallback so CMDLINE_FROM_BOOTLOADER is the default. CMDLINE_EXTEND can
>> > be used to append the CMDLINE from the kernel config to the one provided
>> > by the bootloader.
>> > 
>> > Signed-off-by: Chris Packham 
>> 
>> Reviewed-by: Christophe Leroy 
>
> Just going over some old patches this doesn't appear to be in next. Is
> there anything stopping it from being accepted?

Just me not being overloaded :/, sorry.

Have put it in my next-test branch, which means it should appear in next
in the next few days.

cheers


Re: [RFC v1 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.

2019-11-07 Thread Michael Ellerman
Ram Pai  writes:
> The hypervisor needs to access the contents of the page holding the TCE
> entries while setting up the TCE entries in the IOMMU's TCE table. For
> SecureVMs, since this page is encrypted, the hypervisor cannot access
> valid entries. Share the page with the hypervisor. This ensures that the
> hypervisor sees the valid entries.

Can you please give people some explanation of why this is safe. After
all the point of the Ultravisor is to protect the guest from a malicious
hypervisor. Giving the hypervisor access to a page of TCEs sounds
dangerous, so please explain why it's not.

cheers

> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 8d9c2b1..07f0847 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "pseries.h"
>  
> @@ -179,6 +180,19 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, 
> long tcenum,
>  
>  static DEFINE_PER_CPU(__be64 *, tce_page);
>  
> +/*
> + * Allocate a tce page.  If secure VM, share the page with the hypervisor.
> + */
> +static __be64 *alloc_tce_page(void)
> +{
> + __be64 *tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> +
> + if (tcep && is_secure_guest())
> + uv_share_page(PHYS_PFN(__pa(tcep)), 1);
> +
> + return tcep;
> +}
> +
>  static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>long npages, unsigned long uaddr,
>enum dma_data_direction direction,
> @@ -206,8 +220,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
> *tbl, long tcenum,
>* from iommu_alloc{,_sg}()
>*/
>   if (!tcep) {
> - tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> - /* If allocation fails, fall back to the loop implementation */
> + tcep = alloc_tce_page();
>   if (!tcep) {
>   local_irq_restore(flags);
>   return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
> @@ -391,6 +404,7 @@ static int tce_clearrange_multi_pSeriesLP(unsigned long 
> start_pfn,
>   return rc;
>  }
>  
> +
>  static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
>   unsigned long num_pfn, const void *arg)
>  {
> @@ -405,7 +419,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long 
> start_pfn,
>   tcep = __this_cpu_read(tce_page);
>  
>   if (!tcep) {
> - tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> + tcep = alloc_tce_page();
>   if (!tcep) {
>   local_irq_enable();
>   return -ENOMEM;
> -- 
> 1.8.3.1


Re: [RFC v1 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell.

2019-11-07 Thread Michael Ellerman
Ram Pai  writes:
> This enables IOMMU support for pseries Secure VMs.

Can you give us some more explanation please?

This is basically a revert of commit:
  edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on secure 
guests")

But neglects to remove the now unnecessary include of svm.h.

> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 07f0847..189717b 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1333,15 +1333,7 @@ void iommu_init_early_pSeries(void)
>   of_reconfig_notifier_register(&iommu_reconfig_nb);
>   register_memory_notifier(&iommu_mem_nb);
>  
> - /*
> -  * Secure guest memory is inacessible to devices so regular DMA isn't
> -  * possible.
> -  *
> -  * In that case keep devices' dma_map_ops as NULL so that the generic
> -  * DMA code path will use SWIOTLB to bounce buffers for DMA.

Please explain what has changed to make this no longer necessary.

cheers

> -  */
> - if (!is_secure_guest())
> - set_pci_dma_ops(&dma_iommu_ops);
> + set_pci_dma_ops(&dma_iommu_ops);
>  }
>  
>  static int __init disable_multitce(char *str)
> -- 
> 1.8.3.1


Re: Bug 205201 - overflow of DMA mask and bus mask

2019-11-07 Thread Christian Zigotzky

On 05 November 2019 at 5:28 pm, Christoph Hellwig wrote:

On Tue, Nov 05, 2019 at 08:56:27AM +0100, Christian Zigotzky wrote:

Hi All,

We still have DMA problems with some PCI devices. Since the PowerPC updates
4.21-1 [1] we need to decrease the RAM to 3500MB (mem=3500M) if we want to
work with our PCI devices. The FSL P5020 and P5040 have these problems
currently.

Error message:

[   25.654852] bttv 1000:04:05.0: overflow 0xfe077000+4096 of DMA
mask  bus mask df00

All 5.x Linux kernels can't initialize a SCSI PCI card anymore so booting
of a Linux userland isn't possible.

PLEASE check the DMA changes in the PowerPC updates 4.21-1 [1]. The kernel
4.20 works with all PCI devices without limitation of RAM.

Can you send me the .config and a dmesg?  And in the meantime try the
patch below?

---
>From 4d659b7311bd4141fdd3eeeb80fa2d7602ea01d4 Mon Sep 17 00:00:00 2001
From: Nicolas Saenz Julienne 
Date: Fri, 18 Oct 2019 13:00:43 +0200
Subject: dma-direct: check for overflows on 32 bit DMA addresses

As seen on the new Raspberry Pi 4 and sta2x11's DMA implementation it is
possible for a device configured with 32 bit DMA addresses and a partial
DMA mapping located at the end of the address space to overflow. It
happens when a higher physical address, not DMAable, is translated to
its DMA counterpart.

For example the Raspberry Pi 4, configurable up to 4 GB of memory, has
an interconnect capable of addressing the lower 1 GB of physical memory
with a DMA offset of 0xc000. It transpires that any attempt to
translate physical addresses higher than the first GB will result in an
overflow which dma_capable() can't detect as it only checks for
addresses bigger than the maximum allowed DMA address.

Fix this by verifying in dma_capable() if the DMA address range provided
is at any point lower than the minimum possible DMA address on the bus.

Signed-off-by: Nicolas Saenz Julienne 
---
  include/linux/dma-direct.h | 8 
  1 file changed, 8 insertions(+)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index adf993a3bd58..6ad9e9ea7564 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -3,6 +3,7 @@
  #define _LINUX_DMA_DIRECT_H 1
  
  #include 

+#include  /* for min_low_pfn */
  #include 
  
  #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA

@@ -27,6 +28,13 @@ static inline bool dma_capable(struct device *dev, 
dma_addr_t addr, size_t size)
if (!dev->dma_mask)
return false;
  
+#ifndef CONFIG_ARCH_DMA_ADDR_T_64BIT

+   /* Check if DMA address overflowed */
+   if (min(addr, addr + size - 1) <
+   __phys_to_dma(dev, (phys_addr_t)(min_low_pfn << PAGE_SHIFT)))
+   return false;
+#endif
+
return addr + size - 1 <=
min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
  }
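
The overflow described in the commit message above can be reproduced in
user space. What follows is only a minimal sketch, not the kernel code:
dma32_t, DEMO_DMA_OFFSET and the demo_*/capable_* helpers are hypothetical
names, the offset value is assumed purely for illustration, and physical
address 0 stands in for the lowest RAM address (the patch itself derives
that bound from min_low_pfn).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit bus address type, standing in for a 32-bit dma_addr_t. */
typedef uint32_t dma32_t;

/* Assumed bus offset, chosen only so that translation wraps above 1 GB. */
#define DEMO_DMA_OFFSET 0xc0000000u

/* Translate a physical address to a bus address; the result wraps modulo 2^32. */
static dma32_t demo_phys_to_dma(uint64_t phys)
{
	return (dma32_t)(phys + DEMO_DMA_OFFSET);
}

/* Upper-bound-only check: a wrapped (small) address passes by mistake. */
static bool capable_upper_only(dma32_t addr, size_t size, dma32_t mask)
{
	assert(size > 0);
	return (dma32_t)(addr + (dma32_t)size - 1u) <= mask;
}

/*
 * Same check plus a lower bound: if either end of the range lies below the
 * lowest valid bus address, the translation must have wrapped.
 */
static bool capable_with_lower_bound(dma32_t addr, size_t size,
				     dma32_t mask, dma32_t min_dma)
{
	dma32_t end, lowest;

	assert(size > 0);
	end = (dma32_t)(addr + (dma32_t)size - 1u);
	lowest = addr < end ? addr : end;
	if (lowest < min_dma)
		return false;
	return end <= mask;
}
```

With these assumed values, a physical address of 0x40000000 translates to
bus address 0: the upper-bound-only check accepts it, while the version
with the lower bound rejects it.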

Hello Christoph,

Thanks a lot for your patch! Unfortunately this patch doesn't solve the 
issue.


Error messages:

[    6.041163] bttv: driver version 0.9.19 loaded
[    6.041167] bttv: using 8 buffers with 2080k (520 pages) each for capture
[    6.041559] bttv: Bt8xx card found (0)
[    6.041609] bttv: 0: Bt878 (rev 17) at 1000:04:05.0, irq: 19, 
latency: 128, mmio: 0xc20001000
[    6.041622] bttv: 0: using: Typhoon TView RDS + FM Stereo / KNC1 TV 
Station RDS [card=53,insmod option]

[    6.042216] bttv: 0: tuner type=5
[    6.111994] bttv: 0: audio absent, no audio device found!
[    6.176425] bttv: 0: Setting PLL: 28636363 => 35468950 (needs up to 
100ms)

[    6.25] bttv: PLL set ok
[    6.209351] bttv: 0: registered device video0
[    6.211576] bttv: 0: registered device vbi0
[    6.214897] bttv: 0: registered device radio0
[  114.218806] bttv 1000:04:05.0: overflow 0xff507000+4096 of 
DMA mask  bus mask df00
[  114.218848] Modules linked in: rfcomm bnep tuner_simple tuner_types 
tea5767 tuner tda7432 tvaudio msp3400 bttv tea575x tveeprom 
videobuf_dma_sg videobuf_core rc_core videodev mc btusb btrtl btbcm 
btintel bluetooth uio_pdrv_genirq uio ecdh_generic ecc
[  114.219012] [c001ecddf720] [808ff6e8] 
.buffer_prepare+0x150/0x268 [bttv]
[  114.219029] [c001ecddf860] [808fff6c] 
.bttv_qbuf+0x50/0x64 [bttv]


-

Trace:

[  462.783184] Call Trace:
[  462.783187] [c001c6c67420] [c00b3358] 
.report_addr+0xb8/0xc0 (unreliable)
[  462.783192] [c001c6c67490] [c00b351c] 
.dma_direct_map_page+0xf0/0x128
[  462.783195] [c001c6c67530] [c00b35b0] 
.dma_direct_map_sg+0x5c/0xac
[  462.783205] [c001c6c675e0] [80862e88] 
.__videobuf_iolock+0x660/0x6d8 [videobuf_dma_sg]
[  462.783220] [c001c6c676b0] [80854274] 
.videobuf_iolock+0x98/0xb4 [videobuf_core]
[  462.783271] [c001c6c67720] [808686e8] 
.buffer_prepare+0x150/0x268 [bttv]
[  462.783276] [c001c6c677c0] [80854afc] 
.videobuf_qbuf+0x2b8/0x428 [videobuf_core]
[  462.783288] [c001c6c67860] [80868f6c] 
.bttv_qbuf+0x50/0x64 [bttv]

[Bug 205201] Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M

2019-11-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205201

--- Comment #10 from Christian Zigotzky (chzigot...@xenosoft.de) ---
Created attachment 285815
  --> https://bugzilla.kernel.org/attachment.cgi?id=285815=edit
Kernel 5.4-rc6 config for the Cyrus+ board and for the QEMU ppce500 board (CPU:
P5040 and P5020)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 205201] Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M

2019-11-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205201

--- Comment #9 from Christian Zigotzky (chzigot...@xenosoft.de) ---
Created attachment 285813
  --> https://bugzilla.kernel.org/attachment.cgi?id=285813=edit
dmesg fsl p5040


[Bug 205201] Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M

2019-11-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205201

--- Comment #8 from Christian Zigotzky (chzigot...@xenosoft.de) ---
Trace:

[  462.783184] Call Trace:
[  462.783187] [c001c6c67420] [c00b3358] .report_addr+0xb8/0xc0
(unreliable)
[  462.783192] [c001c6c67490] [c00b351c]
.dma_direct_map_page+0xf0/0x128
[  462.783195] [c001c6c67530] [c00b35b0]
.dma_direct_map_sg+0x5c/0xac
[  462.783205] [c001c6c675e0] [80862e88]
.__videobuf_iolock+0x660/0x6d8 [videobuf_dma_sg]
[  462.783220] [c001c6c676b0] [80854274] .videobuf_iolock+0x98/0xb4
[videobuf_core]
[  462.783271] [c001c6c67720] [808686e8]
.buffer_prepare+0x150/0x268 [bttv]
[  462.783276] [c001c6c677c0] [80854afc] .videobuf_qbuf+0x2b8/0x428
[videobuf_core]
[  462.783288] [c001c6c67860] [80868f6c] .bttv_qbuf+0x50/0x64
[bttv]
[  462.783383] [c001c6c678e0] [807bf208] .v4l_qbuf+0x54/0x60
[videodev]
[  462.783402] [c001c6c67970] [807c1eac]
.__video_do_ioctl+0x30c/0x3f8 [videodev]
[  462.783421] [c001c6c67a80] [807c3c08]
.video_usercopy+0x18c/0x3dc [videodev]
[  462.783440] [c001c6c67c00] [807bb14c] .v4l2_ioctl+0x60/0x78
[videodev]
[  462.783460] [c001c6c67c90] [807d3c48]
.v4l2_compat_ioctl32+0x9b4/0x1850 [videodev]
[  462.783468] [c001c6c67d70] [c01ad9cc]
.__se_compat_sys_ioctl+0x284/0x127c
[  462.783473] [c001c6c67e20] [c67c] system_call+0x60/0x6c
[  462.783475] Instruction dump:
[  462.783477] 40fe0044 6000 892255d0 2f89 40fe0020 3c82ffc5 3921
6000 
[  462.783483] 38842029 992255d0 485ad0d9 6000 <0fe0> 38210070 e8010010
7c0803a6 
[  462.783490] ---[ end trace b677d4a00458e277 ]---


[Bug 205201] Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M

2019-11-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205201

--- Comment #7 from Christian Zigotzky (chzigot...@xenosoft.de) ---
Unfortunately this patch doesn't solve the issue. 

Error message:

[6.041163] bttv: driver version 0.9.19 loaded
[6.041167] bttv: using 8 buffers with 2080k (520 pages) each for
capture
[6.041559] bttv: Bt8xx card found (0)
[6.041609] bttv: 0: Bt878 (rev 17) at 1000:04:05.0, irq: 19, latency:
128, mmio: 0xc20001000
[6.041622] bttv: 0: using: Typhoon TView RDS + FM Stereo / KNC1 TV
Station RDS [card=53,insmod option]
[6.042216] bttv: 0: tuner type=5
[6.111994] bttv: 0: audio absent, no audio device found!
[6.176425] bttv: 0: Setting PLL: 28636363 => 35468950 (needs up to
100ms)
[6.25] bttv: PLL set ok
[6.209351] bttv: 0: registered device video0
[6.211576] bttv: 0: registered device vbi0
[6.214897] bttv: 0: registered device radio0
[  114.218806] bttv 1000:04:05.0: overflow 0xff507000+4096 of DMA
mask  bus mask df00
[  114.218848] Modules linked in: rfcomm bnep tuner_simple tuner_types
tea5767 tuner tda7432 tvaudio msp3400 bttv tea575x tveeprom videobuf_dma_sg
videobuf_core rc_core videodev mc btusb btrtl btbcm btintel bluetooth
uio_pdrv_genirq uio ecdh_generic ecc
[  114.219012] [c001ecddf720] [808ff6e8]
.buffer_prepare+0x150/0x268 [bttv]
[  114.219029] [c001ecddf860] [808fff6c] .bttv_qbuf+0x50/0x64
[bttv]


Re: [PATCH 3/3] powerpc/pseries: Fixup config space size of OpenCAPI devices

2019-11-07 Thread christophe lombard

On 05/11/2019 06:01, Andrew Donnellan wrote:

On 22/10/19 6:52 pm, christophe lombard wrote:

Fix up the pci config size of the OpenCAPI PCIe devices in the pseries
environment.
Most of OpenCAPI PCIe devices have 4096 bytes of configuration space.


It's not "most of", it's "all" - the OpenCAPI Discovery and 
Configuration Spec requires the use of extended capabilities that fall 
in the 0x100-0xFFF range.




Signed-off-by: Christophe Lombard 
---
  arch/powerpc/platforms/pseries/pci.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pci.c 
b/arch/powerpc/platforms/pseries/pci.c

index 1eae1d09980c..3397784767b0 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -291,6 +291,15 @@ static void fixup_winbond_82c105(struct pci_dev* 
dev)
  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, 
PCI_DEVICE_ID_WINBOND_82C105,

   fixup_winbond_82c105);
+static void fixup_opencapi_cfg_size(struct pci_dev *pdev)
+{
+    if (!machine_is(pseries))
+    return;
+
+    pdev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x062b, 
fixup_opencapi_cfg_size);


An OpenCAPI device can have any PCI ID, is there a particular reason 
we're limiting this to 1014:062b? On PowerNV, we check the PHB type to 
determine whether the device is OpenCAPI or not, what's the equivalent 
for pseries?




Thanks for the review. For pseries, there is no specific OpenCAPI PHB 
type which constrains this kind of request.

We are working to find another solution.


+
  int pseries_root_bridge_prepare(struct pci_host_bridge *bridge)
  {
  struct device_node *dn, *pdn;







Re: [RFC v1 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.

2019-11-07 Thread Alexey Kardashevskiy



On 07/11/2019 04:01, Ram Pai wrote:
> On Wed, Nov 06, 2019 at 12:58:50PM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 05/11/2019 08:28, Ram Pai wrote:
>>> The hypervisor needs to access the contents of the page holding the TCE
>>> entries while setting up the TCE entries in the IOMMU's TCE table. For
>>> SecureVMs, since this page is encrypted, the hypervisor cannot access
>>> valid entries. Share the page with the hypervisor. This ensures that the
>>> hypervisor sees the valid entries.
>>>
>>> Signed-off-by: Ram Pai 
>>> ---
>>>  arch/powerpc/platforms/pseries/iommu.c | 20 +---
>>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>> b/arch/powerpc/platforms/pseries/iommu.c
>>> index 8d9c2b1..07f0847 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -37,6 +37,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  
>>>  #include "pseries.h"
>>>  
>>> @@ -179,6 +180,19 @@ static int tce_build_pSeriesLP(struct iommu_table 
>>> *tbl, long tcenum,
>>>  
>>>  static DEFINE_PER_CPU(__be64 *, tce_page);
>>>  
>>> +/*
>>> + * Allocate a tce page.  If secure VM, share the page with the hypervisor.
>>> + */
>>> +static __be64 *alloc_tce_page(void)
>>> +{
>>> +   __be64 *tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> +
>>> +   if (tcep && is_secure_guest())
>>> +   uv_share_page(PHYS_PFN(__pa(tcep)), 1);
>>
>>
>> There is no matching unshare in this patch.
> 
> The page is allocated and shared, and stays that way for the life of the
> kernel. It is not explicitly unshared or freed.


Ah, fair enough, I missed that, strange that we do not free it but ok. Thanks,


>  It is however
> implicitly unshared by the guest kernel, through a UV_UNSHARE_ALL_PAGES ucall
> when the guest kernel reboots. And it also gets implicitly unshared by
> the Ultravisor/Hypervisor, if the SVM abruptly terminates.
> 
>>
>>
>>> +
>>> +   return tcep;
>>> +}
>>> +
>>>  static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>>>  long npages, unsigned long uaddr,
>>>  enum dma_data_direction direction,
>>> @@ -206,8 +220,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
>>> *tbl, long tcenum,
>>>  * from iommu_alloc{,_sg}()
>>>  */
>>> if (!tcep) {
>>> -   tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> -   /* If allocation fails, fall back to the loop implementation */
>>> +   tcep = alloc_tce_page();
>>> if (!tcep) {
>>> local_irq_restore(flags);
>>> return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
>>> @@ -391,6 +404,7 @@ static int tce_clearrange_multi_pSeriesLP(unsigned long 
>>> start_pfn,
>>> return rc;
>>>  }
>>>  
>>> +
>>
>> Unrelated.
> 
> yes. will fix it.
> 
> Thanks,
> RP
> 

-- 
Alexey
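
The allocate-once, share-once lifetime discussed above can be sketched in
user space. This is an illustration under stated assumptions, not the
pseries code: demo_secure_guest, demo_share_page() and demo_cached_page
are hypothetical stand-ins for is_secure_guest(), uv_share_page() and the
per-CPU tce_page, and aligned_alloc() replaces __get_free_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096u

static bool demo_secure_guest = true;	/* stands in for is_secure_guest() */
static int demo_share_calls;		/* how often the share hook ran */
static void *demo_cached_page;		/* stands in for the per-CPU tce_page */

/* Hypothetical share hook; it only counts calls. */
static void demo_share_page(void *page)
{
	assert(page != NULL);
	demo_share_calls++;
}

/* Allocate one page and, for a secure guest, share it right away. */
static void *demo_alloc_tce_page(void)
{
	void *page = aligned_alloc(DEMO_PAGE_SIZE, DEMO_PAGE_SIZE);

	if (page && demo_secure_guest)
		demo_share_page(page);
	return page;
}

/* Lazily allocate the page on first use; it is never freed or unshared. */
static void *demo_get_tce_page(void)
{
	if (!demo_cached_page)
		demo_cached_page = demo_alloc_tce_page();
	return demo_cached_page;
}
```

Because the page is cached after the first call, the share hook runs once
no matter how many times the page is requested, which is why no matching
unshare appears on the normal path in this sketch.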


Re: [PATCH V8] mm/debug: Add tests validating architecture page table helpers

2019-11-07 Thread Anshuman Khandual



On 11/06/2019 11:37 PM, Vineet Gupta wrote:
> On 11/5/19 7:03 PM, Anshuman Khandual wrote:
>> But should not pfn_pmd() be encapsulated inside 
>> HAVE_ARCH_TRANSPARENT_HUGEPAGE
>> at the minimum (but I would say it should be available always, nonetheless) 
>> when
>> the platform subscribes to THP irrespective of whether THP is enabled or not.
> 
> For ARC it was only introduced/needed when I added THP support so it is 
> dependent
> in some way.
Right, it is dependent.

> 
>> I could see in the file (arch/arc/include/asm/pgtable.h) that fetching 
>> pfn_pmd()
>> and all other basic PMD definitions is conditional on 
>> CONFIG_TRANSPARENT_HUGEPAGE.
>>
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> #include 
>> #endif
>>
>> IIUC, CONFIG_TRANSPARENT_HUGEPAGE should only encapsulate PMD page table 
>> helpers
>> which are expected from generic THP code (pmd_trans_huge, 
>> pmdp_set_access_flags
>> etc) but not the basic PMD helpers like pmd_pfn, pmd_mkyoung, pmd_mkdirty,
>> pmd_mkclean etc. 
> 
> ARC only has 2 levels of paging, so these don't make any sense in general and
> needed only for THP case.
> In case of arch/arm you see it is only defined in pgtable-3level.h

There is no uniformity for all these across architectures. It has been a bit
difficult to get some of these required helpers right (compile and run) on
different platforms.

> 
>> Hence wondering will it be possible to accommodate following
>> code change on arc platform (not even compiled) in order to fix the problem ?
> 
> I'm open to making changes in ARC code but let's do the right thing.
> 
>>   */
>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
>>  #include 
>>  #endif
> 
> This is wrong.  CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE is just a glue toggle,
> used only in Kconfig files (and not in any "C" code).  It enables generic 
> Kconfig
> code to allow visibility of CONFIG_TRANSPARENT_HUGEPAGE w/o every arch 
> needing to
> do a me too.
> 
> I think you need to use CONFIG_TRANSPARENT_HUGEPAGE to guard appropriate 
> tests. I
> understand that it only

We can probably replace CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE wrapper with
CONFIG_TRANSPARENT_HUGEPAGE. But CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
explicitly depends on CONFIG_TRANSPARENT_HUGEPAGE as a prerequisite. Could
you please confirm if the following change on this test will work on ARC
platform for both THP and !THP cases ? Thank you.

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 621ac09..99ebc7c 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -67,7 +67,7 @@ static void __init pte_basic_tests(unsigned long pfn, 
pgprot_t prot)
WARN_ON(pte_write(pte_wrprotect(pte)));
 }
 
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
 {
pmd_t pmd = pfn_pmd(pfn, prot);
@@ -85,9 +85,6 @@ static void __init pmd_basic_tests(unsigned long pfn, 
pgprot_t prot)
 */
WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
 }
-#else
-static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
-#endif
 
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
@@ -112,6 +109,10 @@ static void __init pud_basic_tests(unsigned long pfn, 
pgprot_t prot)
 #else
 static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
 #endif
+#else
+static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
 
 static void __init p4d_basic_tests(unsigned long pfn, pgprot_t prot)
 {

> -Vineet
> 
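
The guard layout proposed in the diff above (an outer switch that also
covers the dependent PUD tests, with empty stubs in the #else branch) can
be sketched in isolation. This is a hedged illustration only:
DEMO_FEATURE_THP and DEMO_FEATURE_THP_PUD are hypothetical macros standing
in for CONFIG_TRANSPARENT_HUGEPAGE and
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD, and the test bodies just count
calls.

```c
#include <assert.h>

/* Hypothetical outer switch; remove this line to model the !THP build. */
#define DEMO_FEATURE_THP 1
/* DEMO_FEATURE_THP_PUD is deliberately left undefined in this sketch. */

static int tests_run;

#ifdef DEMO_FEATURE_THP
static void demo_pmd_tests(void) { tests_run++; }

#ifdef DEMO_FEATURE_THP_PUD
static void demo_pud_tests(void) { tests_run++; }
#else
static void demo_pud_tests(void) { }
#endif

#else /* !DEMO_FEATURE_THP: both helpers become empty stubs */
static void demo_pmd_tests(void) { }
static void demo_pud_tests(void) { }
#endif

/* Callers invoke both helpers unconditionally; the stubs keep this compiling. */
static int demo_run_all(void)
{
	tests_run = 0;
	demo_pmd_tests();
	demo_pud_tests();
	assert(tests_run <= 2);	/* only two test groups exist */
	return tests_run;
}
```

The point of the nesting is that the call site never needs its own #ifdef:
whichever switches are set, both functions exist, and only the enabled
ones do any work.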


Re: [PATCH v2 05/18] mm/gup: introduce pin_user_pages*() and FOLL_PIN

2019-11-07 Thread Mike Rapoport
On Tue, Nov 05, 2019 at 11:00:06AM -0800, John Hubbard wrote:
> On 11/5/19 5:10 AM, Mike Rapoport wrote:
> ...
> >> ---
> >>  Documentation/vm/index.rst  |   1 +
> >>  Documentation/vm/pin_user_pages.rst | 212 ++
> > 
> > I think it belongs to Documentation/core-api.
> 
> Done:
> 
> diff --git a/Documentation/core-api/index.rst 
> b/Documentation/core-api/index.rst
> index ab0eae1c153a..413f7d7c8642 100644
> --- a/Documentation/core-api/index.rst
> +++ b/Documentation/core-api/index.rst
> @@ -31,6 +31,7 @@ Core utilities
> generic-radix-tree
> memory-allocation
> mm-api
> +   pin_user_pages
> gfp_mask-from-fs-io
> timekeeping
> boot-time-mm

Thanks!
 
> ...
> >> diff --git a/Documentation/vm/pin_user_pages.rst 
> >> b/Documentation/vm/pin_user_pages.rst
> >> new file mode 100644
> >> index ..3910f49ca98c
> >> --- /dev/null
> >> +++ b/Documentation/vm/pin_user_pages.rst
> >> @@ -0,0 +1,212 @@
> >> +.. SPDX-License-Identifier: GPL-2.0
> >> +
> >> +
> >> +pin_user_pages() and related calls
> >> +
> > 
> > I know this is too much to ask, but having pin_user_pages() a part of more
> > general GUP description would be really great :)
> > 
> 
> Yes, definitely. But until I saw the reaction to the pin_user_pages() API
> family, I didn't want to write too much--it could have all been tossed out
> in favor of a whole different API. But now that we've had some initial
> reviews, I'm much more confident in being able to write about the larger 
> API set.
> 
> So yes, I'll put that on my pending list.
> 
> 
> ...
> >> +This document describes the following functions: ::
> >> +
> >> + pin_user_pages
> >> + pin_user_pages_fast
> >> + pin_user_pages_remote
> >> +
> >> + pin_longterm_pages
> >> + pin_longterm_pages_fast
> >> + pin_longterm_pages_remote
> >> +
> >> +Basic description of FOLL_PIN
> >> +=
> >> +
> >> +A new flag for get_user_pages ("gup") has been added: FOLL_PIN. FOLL_PIN 
> >> has
> > 
> > Consider reading this after, say, half a year ;-)
> > 
> 
> OK, OK. I knew when I wrote that that it was not going to stay new forever, 
> but
> somehow failed to write the right thing anyway. :) 
> 
> Here's a revised set of paragraphs:
> 
> Basic description of FOLL_PIN
> =
> 
> FOLL_PIN and FOLL_LONGTERM are flags that can be passed to the 
> get_user_pages*()
> ("gup") family of functions. FOLL_PIN has significant interactions and
> interdependencies with FOLL_LONGTERM, so both are covered here.
> 
> Both FOLL_PIN and FOLL_LONGTERM are internal to gup, meaning that neither
> FOLL_PIN nor FOLL_LONGTERM should appear at the gup call sites. This 
> allows
> the associated wrapper functions  (pin_user_pages() and others) to set the
> correct combination of these flags, and to check for problems as well.

Great, thanks! 
 
> thanks,
> 
> John Hubbard
> NVIDIA

-- 
Sincerely yours,
Mike.
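
The wrapper idea in the revised paragraphs (callers never pass FOLL_PIN or
FOLL_LONGTERM; the wrappers set them and reject misuse) can be sketched as
follows. Everything here is hypothetical: the DEMO_* flag values are made
up, demo_gup_internal() merely records the flags it receives, and the
assumption that the long-term wrapper sets both flags is mine rather than
something stated in the thread.

```c
#include <assert.h>
#include <errno.h>

/* Made-up flag values for illustration; not the kernel's definitions. */
#define DEMO_FOLL_WRITE		0x01u
#define DEMO_FOLL_PIN		0x02u
#define DEMO_FOLL_LONGTERM	0x04u
#define DEMO_INTERNAL_FLAGS	(DEMO_FOLL_PIN | DEMO_FOLL_LONGTERM)

static unsigned int last_flags;	/* flags seen by the internal path */

/* Stand-in for the internal gup path; it expects a wrapper to have set PIN. */
static int demo_gup_internal(unsigned int flags)
{
	assert(flags & DEMO_FOLL_PIN);
	last_flags = flags;
	return 0;
}

/* Reject caller-supplied internal flags, then add FOLL_PIN. */
static int demo_pin_user_pages(unsigned int flags)
{
	if (flags & DEMO_INTERNAL_FLAGS)
		return -EINVAL;
	return demo_gup_internal(flags | DEMO_FOLL_PIN);
}

/* Same check, but this wrapper adds both internal flags (an assumption). */
static int demo_pin_longterm_pages(unsigned int flags)
{
	if (flags & DEMO_INTERNAL_FLAGS)
		return -EINVAL;
	return demo_gup_internal(flags | DEMO_FOLL_PIN | DEMO_FOLL_LONGTERM);
}
```

Keeping the flags out of the call sites means the valid combinations live
in one place, and a caller that passes an internal flag by hand gets an
error instead of an unchecked combination.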


Re: [RFC v1 0/2] Enable IOMMU support for pseries Secure VMs

2019-11-07 Thread Alexey Kardashevskiy



On 07/11/2019 05:06, Michael S. Tsirkin wrote:
> On Wed, Nov 06, 2019 at 12:59:50PM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 05/11/2019 08:28, Ram Pai wrote:
>>> This patch series enables IOMMU support for pseries Secure VMs.
>>>
>>>
>>> Tested using QEMU command line option:
>>>
>>>  "-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4,
>>> iommu_platform=on,disable-modern=off,disable-legacy=on"
>>>  and 
>>>
>>>  "-device virtio-blk-pci,scsi=off,bus=pci.0,
>>> addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,
>>> iommu_platform=on,disable-modern=off,disable-legacy=on"
>>
>>
>> Worth mentioning that SLOF won't boot with such devices as SLOF does not 
>> know about iommu_platform=on.
> 
> Shouldn't be hard to support: set up the iommu to allow everything
> and ack the feature. Right?

With the defaults we have (32bit window limited by 1GB and SLOF residing close 
to the end of the second GB), it is not
straightforward; also SLOF's XHCI and E1000 already use IOMMU so we should 
just do this for virtio as well, I am
hacking it now. Thanks,



> 
>>>
>>> Ram Pai (2):
>>>   powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
>>>   powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell.
>>>
>>>  arch/powerpc/platforms/pseries/iommu.c | 30 ++
>>>  1 file changed, 18 insertions(+), 12 deletions(-)
>>>
>>
>> -- 
>> Alexey
> 

-- 
Alexey


Re: [PATCH] powerpc/tools: Don't quote $objdump in scripts

2019-11-07 Thread Michael Ellerman
On Thu, 2019-10-24 at 00:47:30 UTC, Michael Ellerman wrote:
> Some of our scripts are passed $objdump and then call it as
> "$objdump". This doesn't work if it contains spaces because we're
> using ccache, for example you get errors such as:
> 
>   ./arch/powerpc/tools/relocs_check.sh: line 48: ccache ppc64le-objdump: No 
> such file or directory
>   ./arch/powerpc/tools/unrel_branch_check.sh: line 26: ccache 
> ppc64le-objdump: No such file or directory
> 
> Fix it by not quoting the string when we expand it, allowing the shell
> to do the right thing for us.
> 
> Fixes: a71aa05e1416 ("powerpc: Convert relocs_check to a shell script using 
> grep")
> Fixes: 4ea80652dc75 ("powerpc/64s: Tool to flag direct branches from 
> unrelocated interrupt vectors")
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/e44ff9ea8f4c8a90c82f7b85bd4f5e497c841960

cheers


Re: [PATCH] powerpc: Add build-time check of ptrace PT_xx defines

2019-11-07 Thread Michael Ellerman
On Wed, 2019-10-30 at 11:12:31 UTC, Michael Ellerman wrote:
> As part of the uapi we export a lot of PT_xx defines for each register
> in struct pt_regs. These are expressed as an index from gpr[0], in
> units of unsigned long.
> 
> Currently there's nothing tying the values of those defines to the
> actual layout of the struct.
> 
> But we *don't* want to change the uapi defines to derive the PT_xx
> values based on the layout of the struct, those values are ABI and
> must never change.
> 
> Instead we want to do the reverse, make sure that the layout of the
> struct never changes vs the PT_xx defines. So add build time checks of
> that.
> 
> This probably seems paranoid, but at least once in the past someone
> has sent a patch that would have broken the ABI if it hadn't been
> spotted. Although it probably would have been detected via testing,
> it's preferable to just quash any issues at the source.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/b9e0805abf2e92fc275ac5fbd8c1c9a92b00413d

cheers
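
The kind of build-time check described in the commit message can be
sketched with C11 _Static_assert and offsetof. This is an illustration
under assumptions, not the applied patch: struct demo_pt_regs is a
miniature stand-in for struct pt_regs, the DEMO_PT_* indices are chosen
for the example, and DEMO_CHECK_REG is a hypothetical helper macro.

```c
#include <assert.h>
#include <stddef.h>

/* Miniature stand-in for struct pt_regs; the real struct has more fields. */
struct demo_pt_regs {
	unsigned long gpr[32];
	unsigned long nip;
	unsigned long msr;
};

/* ABI-style indices, counted from gpr[0] in units of unsigned long. */
#define DEMO_PT_R0	0
#define DEMO_PT_R31	31
#define DEMO_PT_NIP	32
#define DEMO_PT_MSR	33

/* Break the build if a field's byte offset disagrees with its index. */
#define DEMO_CHECK_REG(index, field)					\
	_Static_assert(offsetof(struct demo_pt_regs, field) ==		\
		       (index) * sizeof(unsigned long),			\
		       #field " does not match its index")

DEMO_CHECK_REG(DEMO_PT_R0, gpr[0]);
DEMO_CHECK_REG(DEMO_PT_R31, gpr[31]);
DEMO_CHECK_REG(DEMO_PT_NIP, nip);
DEMO_CHECK_REG(DEMO_PT_MSR, msr);

/* Byte offset implied by an index, for callers that want it at run time. */
static size_t demo_reg_offset(int index)
{
	assert(index >= 0);
	return (size_t)index * sizeof(unsigned long);
}
```

The indices stay fixed and the struct is checked against them, matching
the direction the commit message argues for: the defines are ABI, so it is
the layout that must never drift.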


[PATCH v7 4/4] powerpc: load firmware trusted keys/hashes into kernel keyring

2019-11-07 Thread Eric Richter
From: Nayna Jain 

The keys used to verify the Host OS kernel are managed by firmware as
secure variables. This patch loads the verification keys into the .platform
keyring and revocation hashes into .blacklist keyring. This enables
verification and loading of the kernels signed by the boot time keys which
are trusted by firmware.

Signed-off-by: Nayna Jain 
Reviewed-by: Mimi Zohar 
Signed-off-by: Eric Richter 
---
 arch/powerpc/Kconfig  |  1 +
 security/integrity/Kconfig|  8 ++
 security/integrity/Makefile   |  4 +-
 .../integrity/platform_certs/load_powerpc.c   | 98 +++
 4 files changed, 110 insertions(+), 1 deletion(-)
 create mode 100644 security/integrity/platform_certs/load_powerpc.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cabc091f3fe1..498967a5ef4e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -939,6 +939,7 @@ config PPC_SECURE_BOOT
bool
depends on PPC_POWERNV
depends on IMA_ARCH_POLICY
+   select LOAD_PPC_KEYS
help
  Systems with firmware secure boot enabled need to define security
  policies to extend secure boot to the OS. This config allows a user
diff --git a/security/integrity/Kconfig b/security/integrity/Kconfig
index 0bae6adb63a9..26abee23e4e3 100644
--- a/security/integrity/Kconfig
+++ b/security/integrity/Kconfig
@@ -72,6 +72,14 @@ config LOAD_IPL_KEYS
depends on S390
def_bool y
 
+config LOAD_PPC_KEYS
+   bool "Enable loading of platform and blacklisted keys for POWER"
+   depends on INTEGRITY_PLATFORM_KEYRING
+   depends on PPC_SECURE_BOOT
+   help
+ Enable loading of keys to the .platform keyring and blacklisted
+ hashes to the .blacklist keyring for powerpc based platforms.
+
 config INTEGRITY_AUDIT
bool "Enables integrity auditing support "
depends on AUDIT
diff --git a/security/integrity/Makefile b/security/integrity/Makefile
index 351c9662994b..7ee39d66cf16 100644
--- a/security/integrity/Makefile
+++ b/security/integrity/Makefile
@@ -14,6 +14,8 @@ integrity-$(CONFIG_LOAD_UEFI_KEYS) += 
platform_certs/efi_parser.o \
  platform_certs/load_uefi.o \
  platform_certs/keyring_handler.o
 integrity-$(CONFIG_LOAD_IPL_KEYS) += platform_certs/load_ipl_s390.o
-
+integrity-$(CONFIG_LOAD_PPC_KEYS) += platform_certs/efi_parser.o \
+ platform_certs/load_powerpc.o \
+ platform_certs/keyring_handler.o
 obj-$(CONFIG_IMA)  += ima/
 obj-$(CONFIG_EVM)  += evm/
diff --git a/security/integrity/platform_certs/load_powerpc.c 
b/security/integrity/platform_certs/load_powerpc.c
new file mode 100644
index ..805f7df64769
--- /dev/null
+++ b/security/integrity/platform_certs/load_powerpc.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Nayna Jain
+ *
+ *  - loads keys and hashes stored and controlled by the firmware.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "keyring_handler.h"
+
+/*
+ * Get a certificate list blob from the named secure variable.
+ */
+static __init void *get_cert_list(u8 *key, unsigned long keylen, uint64_t 
*size)
+{
+   int rc;
+   void *db;
+
+   rc = secvar_ops->get(key, keylen, NULL, size);
+   if (rc) {
+   pr_err("Couldn't get size: %d\n", rc);
+   return NULL;
+   }
+
+   db = kmalloc(*size, GFP_KERNEL);
+   if (!db)
+   return NULL;
+
+   rc = secvar_ops->get(key, keylen, db, size);
+   if (rc) {
+   kfree(db);
+   pr_err("Error reading db var: %d\n", rc);
+   return NULL;
+   }
+
+   return db;
+}
+
+/*
+ * Load the certs contained in the keys databases into the platform trusted
+ * keyring and the blacklisted X.509 cert SHA256 hashes into the blacklist
+ * keyring.
+ */
+static int __init load_powerpc_certs(void)
+{
+   void *db = NULL, *dbx = NULL;
+   uint64_t dbsize = 0, dbxsize = 0;
+   int rc = 0;
+   struct device_node *node;
+
+   if (!secvar_ops)
+   return -ENODEV;
+
+   /* The following only applies for the edk2-compat backend.
+* Return early if it is not set.
+*/
+
+   node = of_find_compatible_node(NULL, NULL, "ibm,edk2-compat-v1");
+   if (!node)
+   return -ENODEV;
+
+   /* Get db, and dbx.  They might not exist, so it isn't
+* an error if we can't get them.
+*/
+   db = get_cert_list("db", 3, );
+   if (!db) {
+   pr_err("Couldn't get db list from firmware\n");
+   } else {
+   rc = parse_efi_signature_list("powerpc:db", db, dbsize,
+ 

[PATCH v7 3/4] x86/efi: move common keyring handler functions to new file

2019-11-07 Thread Eric Richter
From: Nayna Jain 

The handlers to add the keys to the .platform keyring and blacklisted
hashes to the .blacklist keyring are common to both the uefi and powerpc
mechanisms of loading the keys/hashes from the firmware.

This patch moves the common code from load_uefi.c to keyring_handler.c

Signed-off-by: Nayna Jain 
Acked-by: Mimi Zohar 
Signed-off-by: Eric Richter 
---
 security/integrity/Makefile   |  3 +-
 .../platform_certs/keyring_handler.c  | 80 +++
 .../platform_certs/keyring_handler.h  | 32 
 security/integrity/platform_certs/load_uefi.c | 67 +---
 4 files changed, 115 insertions(+), 67 deletions(-)
 create mode 100644 security/integrity/platform_certs/keyring_handler.c
 create mode 100644 security/integrity/platform_certs/keyring_handler.h

diff --git a/security/integrity/Makefile b/security/integrity/Makefile
index 35e6ca773734..351c9662994b 100644
--- a/security/integrity/Makefile
+++ b/security/integrity/Makefile
@@ -11,7 +11,8 @@ integrity-$(CONFIG_INTEGRITY_SIGNATURE) += digsig.o
 integrity-$(CONFIG_INTEGRITY_ASYMMETRIC_KEYS) += digsig_asymmetric.o
 integrity-$(CONFIG_INTEGRITY_PLATFORM_KEYRING) += 
platform_certs/platform_keyring.o
 integrity-$(CONFIG_LOAD_UEFI_KEYS) += platform_certs/efi_parser.o \
-   platform_certs/load_uefi.o
+ platform_certs/load_uefi.o \
+ platform_certs/keyring_handler.o
 integrity-$(CONFIG_LOAD_IPL_KEYS) += platform_certs/load_ipl_s390.o
 
 obj-$(CONFIG_IMA)  += ima/
diff --git a/security/integrity/platform_certs/keyring_handler.c 
b/security/integrity/platform_certs/keyring_handler.c
new file mode 100644
index ..c5ba695c10e3
--- /dev/null
+++ b/security/integrity/platform_certs/keyring_handler.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../integrity.h"
+
+static efi_guid_t efi_cert_x509_guid __initdata = EFI_CERT_X509_GUID;
+static efi_guid_t efi_cert_x509_sha256_guid __initdata =
+   EFI_CERT_X509_SHA256_GUID;
+static efi_guid_t efi_cert_sha256_guid __initdata = EFI_CERT_SHA256_GUID;
+
+/*
+ * Blacklist a hash.
+ */
+static __init void uefi_blacklist_hash(const char *source, const void *data,
+  size_t len, const char *type,
+  size_t type_len)
+{
+   char *hash, *p;
+
+   hash = kmalloc(type_len + len * 2 + 1, GFP_KERNEL);
+   if (!hash)
+   return;
+   p = memcpy(hash, type, type_len);
+   p += type_len;
+   bin2hex(p, data, len);
+   p += len * 2;
+   *p = 0;
+
+   mark_hash_blacklisted(hash);
+   kfree(hash);
+}
+
+/*
+ * Blacklist an X509 TBS hash.
+ */
+static __init void uefi_blacklist_x509_tbs(const char *source,
+  const void *data, size_t len)
+{
+   uefi_blacklist_hash(source, data, len, "tbs:", 4);
+}
+
+/*
+ * Blacklist the hash of an executable.
+ */
+static __init void uefi_blacklist_binary(const char *source,
+const void *data, size_t len)
+{
+   uefi_blacklist_hash(source, data, len, "bin:", 4);
+}
+
+/*
+ * Return the appropriate handler for particular signature list types found in
+ * the UEFI db and MokListRT tables.
+ */
+__init efi_element_handler_t get_handler_for_db(const efi_guid_t *sig_type)
+{
+	if (efi_guidcmp(*sig_type, efi_cert_x509_guid) == 0)
+		return add_to_platform_keyring;
+	return 0;
+}
+
+/*
+ * Return the appropriate handler for particular signature list types found in
+ * the UEFI dbx and MokListXRT tables.
+ */
+__init efi_element_handler_t get_handler_for_dbx(const efi_guid_t *sig_type)
+{
+	if (efi_guidcmp(*sig_type, efi_cert_x509_sha256_guid) == 0)
+		return uefi_blacklist_x509_tbs;
+	if (efi_guidcmp(*sig_type, efi_cert_sha256_guid) == 0)
+		return uefi_blacklist_binary;
+	return 0;
+}
diff --git a/security/integrity/platform_certs/keyring_handler.h b/security/integrity/platform_certs/keyring_handler.h
new file mode 100644
index ..2462bfa08fe3
--- /dev/null
+++ b/security/integrity/platform_certs/keyring_handler.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef PLATFORM_CERTS_INTERNAL_H
+#define PLATFORM_CERTS_INTERNAL_H
+
+#include 
+
+void blacklist_hash(const char *source, const void *data,
+   size_t len, const char *type,
+   size_t type_len);
+
+/*
+ * Blacklist an X509 TBS hash.
+ */
+void blacklist_x509_tbs(const char *source, const void *data, size_t len);
+
+/*
+ * Blacklist the hash of an executable.
+ */
+void blacklist_binary(const char *source, const void *data, size_t len);
+
+/*
+ * Return the handler for particular signature list types found in 

[PATCH v7 1/4] powerpc/powernv: Add OPAL API interface to access secure variable

2019-11-07 Thread Eric Richter
From: Nayna Jain 

The X.509 certificates trusted by the platform and required to secure boot
the OS kernel are wrapped in secure variables, which are controlled by
OPAL.

This patch adds a firmware/kernel interface to read and write OPAL secure
variables based on their unique key.

This support can be enabled using CONFIG_OPAL_SECVAR.

Signed-off-by: Claudio Carvalho 
Signed-off-by: Nayna Jain 
Signed-off-by: Eric Richter 
---
 arch/powerpc/include/asm/opal-api.h  |   5 +-
 arch/powerpc/include/asm/opal.h  |   7 +
 arch/powerpc/include/asm/secvar.h|  35 +
 arch/powerpc/kernel/Makefile |   2 +-
 arch/powerpc/kernel/secvar-ops.c |  16 +++
 arch/powerpc/platforms/powernv/Makefile  |   2 +-
 arch/powerpc/platforms/powernv/opal-call.c   |   3 +
 arch/powerpc/platforms/powernv/opal-secvar.c | 140 +++
 arch/powerpc/platforms/powernv/opal.c|   3 +
 9 files changed, 210 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/secvar.h
 create mode 100644 arch/powerpc/kernel/secvar-ops.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 378e3997845a..c1f25a760eb1 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -211,7 +211,10 @@
 #define OPAL_MPIPL_UPDATE                      173
 #define OPAL_MPIPL_REGISTER_TAG                174
 #define OPAL_MPIPL_QUERY_TAG                   175
-#define OPAL_LAST                              175
+#define OPAL_SECVAR_GET                        176
+#define OPAL_SECVAR_GET_NEXT                   177
+#define OPAL_SECVAR_ENQUEUE_UPDATE             178
+#define OPAL_LAST                              178
 
 #define QUIESCE_HOLD   1 /* Spin all calls at entry */
 #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index a0cf8fba4d12..9986ac34b8e2 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -298,6 +298,13 @@ int opal_sensor_group_clear(u32 group_hndl, int token);
 int opal_sensor_group_enable(u32 group_hndl, int token, bool enable);
 int opal_nx_coproc_init(uint32_t chip_id, uint32_t ct);
 
+int opal_secvar_get(const char *key, uint64_t key_len, u8 *data,
+   uint64_t *data_size);
+int opal_secvar_get_next(const char *key, uint64_t *key_len,
+uint64_t key_buf_size);
+int opal_secvar_enqueue_update(const char *key, uint64_t key_len, u8 *data,
+  uint64_t data_size);
+
 s64 opal_mpipl_update(enum opal_mpipl_ops op, u64 src, u64 dest, u64 size);
 s64 opal_mpipl_register_tag(enum opal_mpipl_tags tag, u64 addr);
 s64 opal_mpipl_query_tag(enum opal_mpipl_tags tag, u64 *addr);
diff --git a/arch/powerpc/include/asm/secvar.h b/arch/powerpc/include/asm/secvar.h
new file mode 100644
index ..4cc35b58b986
--- /dev/null
+++ b/arch/powerpc/include/asm/secvar.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Nayna Jain
+ *
+ * PowerPC secure variable operations.
+ */
+#ifndef SECVAR_OPS_H
+#define SECVAR_OPS_H
+
+#include 
+#include 
+
+extern const struct secvar_operations *secvar_ops;
+
+struct secvar_operations {
+   int (*get)(const char *key, uint64_t key_len, u8 *data,
+  uint64_t *data_size);
+   int (*get_next)(const char *key, uint64_t *key_len,
+   uint64_t keybufsize);
+   int (*set)(const char *key, uint64_t key_len, u8 *data,
+  uint64_t data_size);
+};
+
+#ifdef CONFIG_PPC_SECURE_BOOT
+
+extern void set_secvar_ops(const struct secvar_operations *ops);
+
+#else
+
+static inline void set_secvar_ops(const struct secvar_operations *ops) { }
+
+#endif
+
+#endif
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index e8eb2955b7d5..3cf26427334f 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -161,7 +161,7 @@ ifneq ($(CONFIG_PPC_POWERNV)$(CONFIG_PPC_SVM),)
 obj-y  += ucall.o
 endif
 
-obj-$(CONFIG_PPC_SECURE_BOOT)  += secure_boot.o ima_arch.o
+obj-$(CONFIG_PPC_SECURE_BOOT)  += secure_boot.o ima_arch.o secvar-ops.o
 
 # Disable GCOV, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
diff --git a/arch/powerpc/kernel/secvar-ops.c b/arch/powerpc/kernel/secvar-ops.c
new file mode 100644
index ..4cfa7dbd8850
--- /dev/null
+++ b/arch/powerpc/kernel/secvar-ops.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Nayna Jain
+ *
+ * This file initializes secvar operations for PowerPC Secureboot
+ */
+
+#include 
+
+const struct secvar_operations *secvar_ops;
+
+void set_secvar_ops(const struct 

Re: [PATCH V2 1/2] ASoC: dt-bindings: fsl_asrc: add compatible string for imx8qm

2019-11-07 Thread S.j. Wang
Hi Rob
> 
> On Wed, Oct 30, 2019 at 07:41:26PM +0800, Shengjiu Wang wrote:
> > In order to support the two asrc modules in imx8qm, we need to add
> > compatible string "fsl,imx8qm-asrc0" and "fsl,imx8qm-asrc1"
> 
> Are the blocks different in some way?
> 
> If not, why do you need to distinguish them?
> 
The internal clock mapping is different for each module.

Alternatively, we can use a single compatible string, but then we need to add
another property, "fsl,asrc-clk-map", to distinguish the different clock maps.

The change is shown below.

Which one do you think is better? 

Required properties:

-  - compatible : Contains "fsl,imx35-asrc" or "fsl,imx53-asrc".
+  - compatible : Contains "fsl,imx35-asrc", "fsl,imx53-asrc",
+ "fsl,imx8qm-asrc".

   - reg: Offset and length of the register set for the 
device.

@@ -35,6 +36,11 @@ Required properties:

- fsl,asrc-width: Defines a mutual sample width used by DPCM Back Ends.

+   - fsl,asrc-clk-map   : Defines clock map used in driver. which is required
+ by imx8qm
+ <0> - select the map for asrc0
+ <1> - select the map for asrc1
+
 Optional properties:
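
As a rough illustration of the second option, the sketch below shows how a
driver might validate the proposed "fsl,asrc-clk-map" value and pick a clock
map. It is not taken from the fsl_asrc driver: the enum, the function name
asrc_parse_clk_map() and the -1 error return are all hypothetical, and the
real driver would first read the property from the device tree.

```c
#include <stdint.h>

/* Hypothetical names for the two values of the proposed property. */
enum asrc_clk_map {
	ASRC_CLK_MAP_ASRC0 = 0,	/* <0>: select the map for asrc0 */
	ASRC_CLK_MAP_ASRC1 = 1,	/* <1>: select the map for asrc1 */
};

/*
 * Translate the raw property value into a clock map selector.
 * Returns 0 on success, or -1 if the value is not one of the two
 * values listed in the binding text.
 */
static int asrc_parse_clk_map(uint32_t prop, enum asrc_clk_map *map)
{
	switch (prop) {
	case 0:
		*map = ASRC_CLK_MAP_ASRC0;
		return 0;
	case 1:
		*map = ASRC_CLK_MAP_ASRC1;
		return 0;
	default:
		return -1;
	}
}
```

With this approach both modules share one compatible string, and any value
other than 0 or 1 is rejected at probe time rather than silently mapped.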


Best regards
Wang shengjiu


[PATCH v7 2/4] powerpc: expose secure variables to userspace via sysfs

2019-11-07 Thread Eric Richter
From: Nayna Jain 

PowerNV secure variables, which store the keys used for OS kernel
verification, are managed by the firmware. Userspace needs access to these
secure variables in order to add and delete certificates.

This patch adds a sysfs interface that exposes secure variables for PowerNV
secure boot. Users can use this interface to manipulate the keys stored in
the secure variables.

Signed-off-by: Nayna Jain 
Reviewed-by: Greg Kroah-Hartman 
Signed-off-by: Eric Richter 
---
 Documentation/ABI/testing/sysfs-secvar |  46 +
 arch/powerpc/Kconfig   |  11 ++
 arch/powerpc/kernel/Makefile   |   1 +
 arch/powerpc/kernel/secvar-sysfs.c | 247 +
 4 files changed, 305 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-secvar
 create mode 100644 arch/powerpc/kernel/secvar-sysfs.c

diff --git a/Documentation/ABI/testing/sysfs-secvar b/Documentation/ABI/testing/sysfs-secvar
new file mode 100644
index ..911b89cc6957
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-secvar
@@ -0,0 +1,46 @@
+What:  /sys/firmware/secvar
+Date:  August 2019
+Contact:   Nayna Jain 
+Description:   This directory is created if the POWER firmware supports OS
+   secureboot, thereby secure variables. It exposes interface
+   for reading/writing the secure variables
+
+What:  /sys/firmware/secvar/vars
+Date:  August 2019
+Contact:   Nayna Jain 
+Description:   This directory lists all the secure variables that are supported
+   by the firmware.
+
+What:  /sys/firmware/secvar/backend
+Date:  August 2019
+Contact:   Nayna Jain 
+Description:   A string indicating which backend is in use by the firmware.
+   This determines the format of the variable and the accepted
+   format of variable updates.
+
+What:  /sys/firmware/secvar/vars/
+Date:  August 2019
+Contact:   Nayna Jain 
+Description:   Each secure variable is represented as a directory named as
+   . The variable name is unique and is in ASCII
+   representation. The data and size can be determined by reading
+   their respective attribute files.
+
+What:  /sys/firmware/secvar/vars//size
+Date:  August 2019
+Contact:   Nayna Jain 
+Description:   An integer representation of the size of the content of the
+   variable. In other words, it represents the size of the data.
+
+What:  /sys/firmware/secvar/vars//data
+Date:  August 2019
+Contact:   Nayna Jain
+Description:   A read-only file containing the value of the variable. The size
+   of the file represents the maximum size of the variable data.
+
+What:  /sys/firmware/secvar/vars//update
+Date:  August 2019
+Contact:   Nayna Jain 
+Description:   A write-only file that is used to submit the new value for the
+   variable. The size of the file represents the maximum size of
+   the variable data that can be written.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c795039bdc73..cabc091f3fe1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -945,6 +945,17 @@ config PPC_SECURE_BOOT
  to enable OS secure boot on systems that have firmware support for
  it. If in doubt say N.
 
+config PPC_SECVAR_SYSFS
+   bool "Enable sysfs interface for POWER secure variables"
+   default y
+   depends on PPC_SECURE_BOOT
+   depends on SYSFS
+   help
+ POWER secure variables are managed and controlled by firmware.
+ These variables are exposed to userspace via sysfs to enable
+ read/write operations on these variables. Say Y if you have
+ secure boot enabled and want to expose variables to userspace.
+
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 3cf26427334f..b216e9f316ee 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -162,6 +162,7 @@ obj-y   += ucall.o
 endif
 
 obj-$(CONFIG_PPC_SECURE_BOOT)  += secure_boot.o ima_arch.o secvar-ops.o
+obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o
 
 # Disable GCOV, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
diff --git a/arch/powerpc/kernel/secvar-sysfs.c b/arch/powerpc/kernel/secvar-sysfs.c
new file mode 100644
index ..a3ba58ee4285
--- /dev/null
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -0,0 +1,247 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2019 IBM Corporation 
+ *
+ * This code exposes secure variables to user via sysfs
+ */
+
+#define pr_fmt(fmt) "secvar-sysfs: "fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define NAME_MAX_SIZE 1024
+
+static struct kobject *secvar_kobj;
+static struct kset *secvar_kset;
+
+static ssize_t 

Please add powerpc topic/kasan-bitops branch to linux-next

2019-11-07 Thread Michael Ellerman
Hi Stephen,

Can you please add the topic/kasan-bitops branch of the powerpc repository
to linux-next?

powerpc git 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git#topic/kasan-bitops

See:
  
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/log/?h=topic/kasan-bitops

This will be a (hopefully) short-lived branch to carry some
cross-architecture KASAN-related patches for v5.5.

cheers


[PATCH v7 0/4] powerpc: expose secure variables to the kernel and userspace

2019-11-07 Thread Eric Richter
In order to verify the OS kernel on PowerNV systems, secure boot requires
X.509 certificates trusted by the platform. These are stored in secure
variables controlled by OPAL, called OPAL secure variables. In order to
enable users to manage the keys, the secure variables need to be exposed
to userspace.

OPAL provides the runtime services for the kernel to be able to access the
secure variables[1]. This patchset defines the kernel interface for the
OPAL APIs. These APIs are used by the hooks, which load these variables
to the keyring and expose them to the userspace for reading/writing.

The previous version[2] of the patchset added support only for the sysfs
interface. This patchset adds two more patches that load the
firmware-trusted keys into the kernel keyring.

Overall, this patchset adds the following support:

* expose secure variables to the kernel via OPAL Runtime API interface
* expose secure variables to the userspace via kernel sysfs interface
* load kernel verification and revocation keys into the .platform and
.blacklist keyrings, respectively.

The secure variables can be read and written using simple Linux utilities
such as cat and hexdump.

For example, the path to the secure variables is:
/sys/firmware/secvar/vars

Each secure variable is listed as a directory.
$ ls -l
total 0
drwxr-xr-x. 2 root root 0 Aug 20 21:20 db
drwxr-xr-x. 2 root root 0 Aug 20 21:20 KEK
drwxr-xr-x. 2 root root 0 Aug 20 21:20 PK

The attributes of each secure variable are (for example, db):
[db]$ ls -l
total 0
-r--r--r--. 1 root root  4096 Oct  1 15:10 data
-r--r--r--. 1 root root 65536 Oct  1 15:10 size
--w-------. 1 root root  4096 Oct  1 15:12 update

The "data" file is used to read the existing variable value, for example
with hexdump. The data is stored in ESL format.
The "update" file is used to write a new value, for example with cat. The
update must be submitted as an AUTH file.
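
For readers who prefer a programmatic view of the same layout, here is a
minimal userspace sketch in C. It is only an illustration and not part of
the patchset: secvar_read_attr() and secvar_read_size() are hypothetical
helper names, the base directory is passed in so the code can be pointed at
a test directory, and the "size" attribute is assumed to hold an ASCII
decimal integer.

```c
#include <stdio.h>
#include <sys/types.h>

/* Default location from the description above. */
#define SECVAR_VARS_DIR "/sys/firmware/secvar/vars"

/*
 * Read up to buflen bytes of <base>/<var>/<attr> into buf.
 * Returns the number of bytes read, or -1 on error.
 * Hypothetical helper, not provided by the kernel or the patchset.
 */
static ssize_t secvar_read_attr(const char *base, const char *var,
				const char *attr, unsigned char *buf,
				size_t buflen)
{
	char path[512];
	FILE *f;
	size_t n;
	int len;

	len = snprintf(path, sizeof(path), "%s/%s/%s", base, var, attr);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	f = fopen(path, "rb");
	if (!f)
		return -1;
	n = fread(buf, 1, buflen, f);
	if (ferror(f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return (ssize_t)n;
}

/*
 * Parse the "size" attribute of a variable.
 * Assumption: the attribute is an ASCII decimal integer.
 * Returns 0 on success, -1 on error.
 */
static int secvar_read_size(const char *base, const char *var,
			    unsigned long long *size)
{
	unsigned char text[32];
	ssize_t n = secvar_read_attr(base, var, "size", text,
				     sizeof(text) - 1);

	if (n <= 0)
		return -1;
	text[n] = '\0';
	return sscanf((const char *)text, "%llu", size) == 1 ? 0 : -1;
}
```

A caller would typically query secvar_read_size(SECVAR_VARS_DIR, "db", &size)
first and then read the "data" attribute into a buffer of at least that many
bytes. The returned bytes are in ESL format and are not interpreted here.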

[1] Depends on skiboot OPAL API changes which remove metadata from
the API. https://lists.ozlabs.org/pipermail/skiboot/2019-September/015203.html.
[2] https://lkml.org/lkml/2019/6/13/1644

Changelog:
v7 (on behalf of Nayna, by Eric Richter):
* secvar-sysfs now a bool rather than a tristate option
* added documentation for backend sysfs entry

v6 (on behalf of Nayna, by Eric Richter):
* updated device tree layout
  * secvar node now sets compatible based on backend
  * all ibm,secvar-v1 compatible-checking code checks for
ibm,edk2-compat-v1
* added backend attribute to secvar-sysfs to expose backend version to
  userspace
* loading certs from db now depends on backend (not all backends may
  have a "db")
* fixed device node leaks
* fixed leaking string on early exit

v5:
* rebased to v5.4-rc3
* includes Oliver's feedback
  * changed the OPAL API to a platform driver
  * the sysfs interface is now enabled by default and depends on
  PPC_SECURE_BOOT
  * fixed code specific changes in both OPAL API and sysfs
  * reads the size of the "data" and "update" files from the device tree
  * fixed sysfs documentation to also reflect the data and update file
  size interpretation
  * this patchset no longer depends on the ima-arch/blacklist patchset

v4:
* rebased to v5.4-rc1 
* uses __BIN_ATTR_WO macro to create binary attribute as suggested by
  Greg
* removed email id from the file header
* renamed argument keysize to keybufsize in get_next() function
* updated default binary file sizes to 0, as firmware handles checking
against the maximum size
* fixed minor formatting issues in Patch 4/4
* added Greg's and Mimi's Reviewed-by and Acked-by

v3:
* includes Greg's feedback:
 * fixes in Patch 2/4
   * updates the Documentation.
   * fixes code feedback
* adds SYSFS Kconfig dependency for SECVAR_SYSFS
* fixes mixed tabs and spaces
* removes "name" attribute for each of the variable name based
directories
* fixes using __ATTR_RO() and __BIN_ATTR_RO() and statics and const
* fixes the racing issue by using kobj_type default groups. Also,
fixes the kobject leakage.
* removes extra print messages
  * updates patch description for Patch 3/4
  * removes file name from Patch 4/4 file header comment and removed
  def_bool y from the LOAD_PPC_KEYS Kconfig

* includes Oliver's feedback:
  * fixes Patch 1/2
   * moves OPAL API wrappers after opal_nx_coproc_init(), fixed the
   naming, types and removed extern.
   * fixes spaces
   * renames get_variable() to get(), get_next_variable() to get_next()
   and set_variable() to set()
   * removed get_secvar_ops() and defined secvar_ops as global
   * fixes consts and statics
   * removes generic secvar_init() and defined platform specific
   opal_secvar_init()
   * updates opal_secvar_supported() to check for secvar support even
   before checking the OPAL APIs support and also fixed the error codes.
   * adds a function that converts OPAL return codes to Linux errno
   * moves secvar check support in the opal_secvar_init() and defined its
   prototype in opal.h
  * fixes Patch 2/2
   * fixes static/const
   * defines macro for max name size
   *