Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception

2019-06-06 Thread Michael Ellerman
Ravi Bangoria  writes:

> Powerpc hw triggers watchpoint before executing the instruction.
> To make trigger-after-execute behavior, kernel emulates the
> instruction. If the instruction is 'load something into non-
> volatile register', exception handler should restore emulated
> register state while returning back, otherwise there will be
> register state corruption. Ex, Adding a watchpoint on a list
> can corrupt the list:
>
>   # cat /proc/kallsyms | grep kthread_create_list
>   c00000000121c8b8 d kthread_create_list
>
> Add watchpoint on kthread_create_list->next:
>
>   # perf record -e mem:0xc00000000121c8c0
>
> Run some workload such that new kthread gets invoked. Ex, I
> just logged out from console:
>
>   list_add corruption. next->prev should be prev (c000000001214e00), \
>   but was c00000000121c8b8. (next=c00000000121c8b8).
>   WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
>   CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
>   ...
>   NIP __list_add_valid+0xb4/0xc0
>   LR __list_add_valid+0xb0/0xc0
>   Call Trace:
>   __list_add_valid+0xb0/0xc0 (unreliable)
>   __kthread_create_on_node+0xe0/0x260
>   kthread_create_on_node+0x34/0x50
>   create_worker+0xe8/0x260
>   worker_thread+0x444/0x560
>   kthread+0x160/0x1a0
>   ret_from_kernel_thread+0x5c/0x70

This all depends on what code the compiler generates for the list
access. Can you include a disassembly of the relevant code in your
kernel so we have an example of the bad case?
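
Something like the following would be the bad case (hypothetical
codegen, not taken from your kernel; the actual instructions depend on
compiler and config):

	ld	r30,0(r3)	# load ->next; DAWR match, exception taken
				# before the load executes; the handler
				# emulates it into the saved image only
	...
	std	r30,8(r4)	# consumer of r30: the live register was
				# never written, so this stores stale data
				# unless the saved image is restored on
				# exception return (hence ret_from_except)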

> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 9481a11..96de0d1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1753,7 +1753,7 @@ handle_dabr_fault:
>   ld  r5,_DSISR(r1)
>   addir3,r1,STACK_FRAME_OVERHEAD
>   bl  do_break
> -12:  b   ret_from_except_lite
> +12:  b   ret_from_except

This probably warrants a comment explaining why we can't use the (badly
named) "lite" version.

cheers


[Bug 203837] Booting kernel under KVM immediately freezes host

2019-06-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203837

Paul Mackerras (pau...@ozlabs.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------
                 CC|                            |pau...@ozlabs.org

--- Comment #1 from Paul Mackerras (pau...@ozlabs.org) ---
I have tried but not succeeded in replicating this problem.

I have tried 5.2-rc3 in the host with the config I usually use, plus 5.2-rc3 in
the guest with that same config. That boots just fine.

With 5.2-rc3 in the host and my usual config, and 5.2-rc3 in the guest compiled
with the config attached to this bug, the guest gets a kernel panic due to
being unable to mount root. It looks like it never manages to load virtio-blk
for some reason.

With the config attached to this bug, I did once see the guest stop outputting
messages after the message about bringing up CPUs. The host was still running
just fine, and top in the host showed the qemu-system-ppc64 process using 100%
of a CPU, consistent with the guest being in an infinite loop.

I think we need more details about the machine where the crash is occurring -
host kernel config, details of VM config (qemu command line or libvirt xml),
etc.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate

2019-06-06 Thread Aneesh Kumar K.V
Nicholas Piggin  writes:

> The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
> the synchronisation against lock-free lookups: __find_linux_pte's
> pmd_none check no longer returns true for such cases.
>
> Fix this by adding a check for this condition as well.
>

Reviewed-by: Aneesh Kumar K.V 

> Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at 
> _PAGE_PRESENT bit")
> Cc: Christophe Leroy 
> Suggested-by: Aneesh Kumar K.V 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/mm/pgtable.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index db4a6253df92..533fc6fa6726 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
>   pdshift = PMD_SHIFT;
>   pmdp = pmd_offset(&pud, ea);
>   pmd  = READ_ONCE(*pmdp);
> +
>   /*
> -  * A hugepage collapse is captured by pmd_none, because
> -  * it mark the pmd none and do a hpte invalidate.
> +  * A hugepage collapse is captured by this condition, see
> +  * pmdp_collapse_flush.
>*/
>   if (pmd_none(pmd))
>   return NULL;
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
> + /*
> +  * A hugepage split is captured by this condition, see
> +  * pmdp_invalidate.
> +  *
> +  * Huge page modification can be caught here too.
> +  */
> + if (pmd_is_serializing(pmd))
> + return NULL;
> +#endif
> +
>   if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
>   if (is_thp)
>   *is_thp = true;
> -- 
> 2.20.1



Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate

2019-06-06 Thread Christophe Leroy




Le 07/06/2019 à 05:56, Nicholas Piggin a écrit :

The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
the synchronisation against lock-free lookups: __find_linux_pte's
pmd_none check no longer returns true for such cases.

Fix this by adding a check for this condition as well.

Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT 
bit")
Cc: Christophe Leroy 
Suggested-by: Aneesh Kumar K.V 
Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/mm/pgtable.c | 16 ++--
  1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index db4a6253df92..533fc6fa6726 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
pdshift = PMD_SHIFT;
pmdp = pmd_offset(&pud, ea);
pmd  = READ_ONCE(*pmdp);
+
/*
-* A hugepage collapse is captured by pmd_none, because
-* it mark the pmd none and do a hpte invalidate.
+* A hugepage collapse is captured by this condition, see
+* pmdp_collapse_flush.
 */
if (pmd_none(pmd))
return NULL;
  
+#ifdef CONFIG_PPC_BOOK3S_64
+   /*
+* A hugepage split is captured by this condition, see
+* pmdp_invalidate.
+*
+* Huge page modification can be caught here too.
+*/
+   if (pmd_is_serializing(pmd))
+   return NULL;
+#endif
+


Could get rid of that #ifdef by adding the following in book3s32 and 
nohash pgtable.h:


static inline bool pmd_is_serializing(pmd_t pmd) { return false; }

Christophe


if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
if (is_thp)
*is_thp = true;



Re: [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation

2019-06-06 Thread Aneesh Kumar K.V
Nicholas Piggin  writes:

> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
> in pte helpers") changed the actual bitwise tests in pte_access_permitted
> by using pte_write() and pte_present() helpers rather than raw bitwise
> testing _PAGE_WRITE and _PAGE_PRESENT bits.
>
> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
> synchronize access from lock-free lookups. pte_access_permitted is used by
> pmd_access_permitted, so allowing GUP lock free access to proceed with
> such PTEs breaks this synchronisation.
>
> This bug has been observed on HPT host, with random crashes and corruption
> in guests, usually together with bad PMD messages in the host.
>
> Fix this by adding an explicit check in pmd_access_permitted, and
> documenting the condition explicitly.
>
> The pte_write() change should be okay, and would prevent GUP from falling
> back to the slow path when encountering savedwrite ptes, which matches
> what x86 (that does not implement savedwrite) does.
>

Reviewed-by: Aneesh Kumar K.V 

> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in 
> pte helpers")
> Cc: Aneesh Kumar K.V 
> Cc: Christophe Leroy 
> Signed-off-by: Nicholas Piggin 
> ---
>
> I accounted for Aneesh's and Christophe's feedback, except I couldn't
> find a good way to replace the ifdef with IS_ENABLED because of
> _PAGE_INVALID etc., but at least cleaned that up a bit nicer.
>
> Patch 1 solves a problem I can hit quite reliably running HPT/HPT KVM.
> Patch 2 was noticed by Aneesh when inspecting code for similar bugs.
> They should probably both be merged in stable kernels after upstream.
>
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 30 
>  arch/powerpc/mm/book3s64/pgtable.c   |  3 ++
>  2 files changed, 33 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 7dede2e34b70..ccf00a8b98c6 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -876,6 +876,23 @@ static inline int pmd_present(pmd_t pmd)
>   return false;
>  }
>  
> +static inline int pmd_is_serializing(pmd_t pmd)
> +{
> + /*
> +  * If the pmd is undergoing a split, the _PAGE_PRESENT bit is clear
> +  * and _PAGE_INVALID is set (see pmd_present, pmdp_invalidate).
> +  *
> +  * This condition may also occur when flushing a pmd while flushing
> +  * it (see ptep_modify_prot_start), so callers must ensure this
> +  * case is fine as well.
> +  */
> + if ((pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)) ==
> + cpu_to_be64(_PAGE_INVALID))
> + return true;
> +
> + return false;
> +}
> +
>  static inline int pmd_bad(pmd_t pmd)
>  {
>   if (radix_enabled())
> @@ -1092,6 +1109,19 @@ static inline int pmd_protnone(pmd_t pmd)
>  #define pmd_access_permitted pmd_access_permitted
>  static inline bool pmd_access_permitted(pmd_t pmd, bool write)
>  {
> + /*
> +  * pmdp_invalidate sets this combination (which is not caught by
> +  * !pte_present() check in pte_access_permitted), to prevent
> +  * lock-free lookups, as part of the serialize_against_pte_lookup()
> +  * synchronisation.
> +  *
> +  * This also catches the case where the PTE's hardware PRESENT bit is
> +  * cleared while TLB is flushed, which is suboptimal but should not
> +  * be frequent.
> +  */
> + if (pmd_is_serializing(pmd))
> + return false;
> +
>   return pte_access_permitted(pmd_pte(pmd), write);
>  }
>  
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> b/arch/powerpc/mm/book3s64/pgtable.c
> index 16bda049187a..ff98b663c83e 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, 
> unsigned long address,
>   /*
>* This ensures that generic code that rely on IRQ disabling
>* to prevent a parallel THP split work as expected.
> +  *
> +  * Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
> +  * a special case check in pmd_access_permitted.
>*/
>   serialize_against_pte_lookup(vma->vm_mm);
>   return __pmd(old_pmd);
> -- 
> 2.20.1



Re: [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation

2019-06-06 Thread Christophe Leroy




Le 07/06/2019 à 05:56, Nicholas Piggin a écrit :

Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
in pte helpers") changed the actual bitwise tests in pte_access_permitted
by using pte_write() and pte_present() helpers rather than raw bitwise
testing _PAGE_WRITE and _PAGE_PRESENT bits.

The pte_present change now returns true for ptes which are !_PAGE_PRESENT
and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
synchronize access from lock-free lookups. pte_access_permitted is used by
pmd_access_permitted, so allowing GUP lock free access to proceed with
such PTEs breaks this synchronisation.

This bug has been observed on HPT host, with random crashes and corruption
in guests, usually together with bad PMD messages in the host.

Fix this by adding an explicit check in pmd_access_permitted, and
documenting the condition explicitly.

The pte_write() change should be okay, and would prevent GUP from falling
back to the slow path when encountering savedwrite ptes, which matches
what x86 (that does not implement savedwrite) does.

Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte 
helpers")
Cc: Aneesh Kumar K.V 
Cc: Christophe Leroy 
Signed-off-by: Nicholas Piggin 
---

I accounted for Aneesh's and Christophe's feedback, except I couldn't
find a good way to replace the ifdef with IS_ENABLED because of
_PAGE_INVALID etc., but at least cleaned that up a bit nicer.


I guess the standard way is to add a pmd_is_serializing() which return 
always false in book3s/32/pgtable.h and in nohash/pgtable.h




Patch 1 solves a problem I can hit quite reliably running HPT/HPT KVM.
Patch 2 was noticed by Aneesh when inspecting code for similar bugs.
They should probably both be merged in stable kernels after upstream.

  arch/powerpc/include/asm/book3s/64/pgtable.h | 30 
  arch/powerpc/mm/book3s64/pgtable.c   |  3 ++
  2 files changed, 33 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7dede2e34b70..ccf00a8b98c6 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -876,6 +876,23 @@ static inline int pmd_present(pmd_t pmd)
return false;
  }
  
+static inline int pmd_is_serializing(pmd_t pmd)


should be static inline bool instead of int ?

Christophe


+{
+   /*
+* If the pmd is undergoing a split, the _PAGE_PRESENT bit is clear
+* and _PAGE_INVALID is set (see pmd_present, pmdp_invalidate).
+*
+* This condition may also occur when flushing a pmd while flushing
+* it (see ptep_modify_prot_start), so callers must ensure this
+* case is fine as well.
+*/
+   if ((pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)) ==
+   cpu_to_be64(_PAGE_INVALID))
+   return true;
+
+   return false;
+}
+
  static inline int pmd_bad(pmd_t pmd)
  {
if (radix_enabled())
@@ -1092,6 +1109,19 @@ static inline int pmd_protnone(pmd_t pmd)
  #define pmd_access_permitted pmd_access_permitted
  static inline bool pmd_access_permitted(pmd_t pmd, bool write)
  {
+   /*
+* pmdp_invalidate sets this combination (which is not caught by
+* !pte_present() check in pte_access_permitted), to prevent
+* lock-free lookups, as part of the serialize_against_pte_lookup()
+* synchronisation.
+*
+* This also catches the case where the PTE's hardware PRESENT bit is
+* cleared while TLB is flushed, which is suboptimal but should not
+* be frequent.
+*/
+   if (pmd_is_serializing(pmd))
+   return false;
+
return pte_access_permitted(pmd_pte(pmd), write);
  }
  
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c

index 16bda049187a..ff98b663c83e 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned 
long address,
/*
 * This ensures that generic code that rely on IRQ disabling
 * to prevent a parallel THP split work as expected.
+*
+* Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
+* a special case check in pmd_access_permitted.
 */
serialize_against_pte_lookup(vma->vm_mm);
return __pmd(old_pmd);



[PATCH] powerpc/pseries: fix oops in hotplug memory notifier

2019-06-06 Thread Nathan Lynch
During post-migration device tree updates, we can oops in
pseries_update_drconf_memory if the source device tree has an
ibm,dynamic-memory-v2 property and the destination has an
ibm,dynamic-memory (v1) property. The notifier processes an "update"
for the ibm,dynamic-memory property, but it's really an add in this
scenario. So make sure the old property object is there before
dereferencing it.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 47087832f8b2..e6bd172bcf30 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -980,6 +980,9 @@ static int pseries_update_drconf_memory(struct 
of_reconfig_data *pr)
if (!memblock_size)
return -EINVAL;
 
+   if (!pr->old_prop)
+   return 0;
+
p = (__be32 *) pr->old_prop->value;
if (!p)
return -EINVAL;
-- 
2.20.1



Re: [PATCH v3 7/9] KVM: PPC: Ultravisor: Restrict LDBAR access

2019-06-06 Thread Madhavan Srinivasan



On 06/06/19 11:06 PM, Claudio Carvalho wrote:

When the ultravisor firmware is available, it takes control over the
LDBAR register. In this case, thread-imc updates and save/restore
operations on the LDBAR register are handled by ultravisor.

Signed-off-by: Claudio Carvalho 
Signed-off-by: Ram Pai 
---
  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 2 ++
  arch/powerpc/platforms/powernv/idle.c | 6 --
  arch/powerpc/platforms/powernv/opal-imc.c | 7 +++
  3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f9b2620fbecd..cffb365d9d02 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -375,8 +375,10 @@ BEGIN_FTR_SECTION
mtspr   SPRN_RPR, r0
ld  r0, KVM_SPLIT_PMMAR(r6)
mtspr   SPRN_PMMAR, r0
+BEGIN_FW_FTR_SECTION_NESTED(70)
ld  r0, KVM_SPLIT_LDBAR(r6)
mtspr   SPRN_LDBAR, r0
+END_FW_FTR_SECTION_NESTED(FW_FEATURE_ULTRAVISOR, 0, 70)
isync
  FTR_SECTION_ELSE
/* On P9 we use the split_info for coordinating LPCR changes */
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index c9133f7908ca..fd62435e3267 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -679,7 +679,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, 
bool mmu_on)
sprs.ptcr   = mfspr(SPRN_PTCR);
sprs.rpr= mfspr(SPRN_RPR);
sprs.tscr   = mfspr(SPRN_TSCR);
-   sprs.ldbar  = mfspr(SPRN_LDBAR);
+   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   sprs.ldbar  = mfspr(SPRN_LDBAR);

sprs_saved = true;

@@ -762,7 +763,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, 
bool mmu_on)
mtspr(SPRN_PTCR,sprs.ptcr);
mtspr(SPRN_RPR, sprs.rpr);
mtspr(SPRN_TSCR,sprs.tscr);
-   mtspr(SPRN_LDBAR,   sprs.ldbar);
+   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   mtspr(SPRN_LDBAR,   sprs.ldbar);

if (pls >= pnv_first_tb_loss_level) {
/* TB loss */
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 1b6932890a73..e9b641d313fb 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -254,6 +254,13 @@ static int opal_imc_counters_probe(struct platform_device 
*pdev)
bool core_imc_reg = false, thread_imc_reg = false;
u32 type;

+   /*
+* When the Ultravisor is enabled, it is responsible for thread-imc
+* updates
+*/


Would prefer the comment to be "Disable IMC devices, when Ultravisor is 
enabled"

Rest looks good.
Acked-by: Madhavan Srinivasan 


+   if (firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   return -EACCES;
+
/*
 * Check whether this is kdump kernel. If yes, force the engines to
 * stop and return.




Re: [PATCH] powerpc/64s: Fix THP PMD collapse serialisation

2019-06-06 Thread Nicholas Piggin
Excerpts from Aneesh Kumar K.V's message of June 7, 2019 1:23 am:
> Nicholas Piggin  writes:
> 
>> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
>> in pte helpers") changed the actual bitwise tests in pte_access_permitted
>> by using pte_write() and pte_present() helpers rather than raw bitwise
>> testing _PAGE_WRITE and _PAGE_PRESENT bits.
>>
>> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
>> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
>> synchronize access from lock-free lookups. pte_access_permitted is used by
>> pmd_access_permitted, so allowing GUP lock free access to proceed with
>> such PTEs breaks this synchronisation.
>>
>> This bug has been observed on HPT host, with random crashes and corruption
>> in guests, usually together with bad PMD messages in the host.
>>
>> Fix this by adding an explicit check in pmd_access_permitted, and
>> documenting the condition explicitly.
>>
>> The pte_write() change should be okay, and would prevent GUP from falling
>> back to the slow path when encountering savedwrite ptes, which matches
>> what x86 (that does not implement savedwrite) does.
>>
> 
> I guess we are doing the find_linux_pte change in another patch.
> 
> Reviewed-by: Aneesh Kumar K.V 

Sorry, just got delayed with re-testing. Thanks for the feedback on it;
I sent new patches.

Two patches yes because they fix issues introduced in different
commits so it should make backports easier.

Thanks,
Nick



[PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate

2019-06-06 Thread Nicholas Piggin
The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
the synchronisation against lock-free lookups: __find_linux_pte's
pmd_none check no longer returns true for such cases.

Fix this by adding a check for this condition as well.

Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at 
_PAGE_PRESENT bit")
Cc: Christophe Leroy 
Suggested-by: Aneesh Kumar K.V 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/pgtable.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index db4a6253df92..533fc6fa6726 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
pdshift = PMD_SHIFT;
pmdp = pmd_offset(&pud, ea);
pmd  = READ_ONCE(*pmdp);
+
/*
-* A hugepage collapse is captured by pmd_none, because
-* it mark the pmd none and do a hpte invalidate.
+* A hugepage collapse is captured by this condition, see
+* pmdp_collapse_flush.
 */
if (pmd_none(pmd))
return NULL;
 
+#ifdef CONFIG_PPC_BOOK3S_64
+   /*
+* A hugepage split is captured by this condition, see
+* pmdp_invalidate.
+*
+* Huge page modification can be caught here too.
+*/
+   if (pmd_is_serializing(pmd))
+   return NULL;
+#endif
+
if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
if (is_thp)
*is_thp = true;
-- 
2.20.1
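
To make the window concrete, the interleaving the new check closes
looks roughly like this (illustrative sketch, not part of the patch):

/*
 * CPU0: pmdp_invalidate()                CPU1: __find_linux_pte()
 *
 *   old = *pmdp;
 *   *pmdp = (old & ~_PAGE_PRESENT)
 *                  | _PAGE_INVALID;
 *                                          pmd = READ_ONCE(*pmdp);
 *                                          pmd_none(pmd) -> false
 *                                            (entry is non-zero, unlike
 *                                             the collapse case)
 *                                          pmd_is_serializing(pmd) -> true,
 *                                            so return NULL rather than
 *                                            walking the table mid-update
 *   serialize_against_pte_lookup(mm);
 */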



[PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation

2019-06-06 Thread Nicholas Piggin
Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
in pte helpers") changed the actual bitwise tests in pte_access_permitted
by using pte_write() and pte_present() helpers rather than raw bitwise
testing _PAGE_WRITE and _PAGE_PRESENT bits.

The pte_present change now returns true for ptes which are !_PAGE_PRESENT
and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
synchronize access from lock-free lookups. pte_access_permitted is used by
pmd_access_permitted, so allowing GUP lock free access to proceed with
such PTEs breaks this synchronisation.

This bug has been observed on HPT host, with random crashes and corruption
in guests, usually together with bad PMD messages in the host.

Fix this by adding an explicit check in pmd_access_permitted, and
documenting the condition explicitly.

The pte_write() change should be okay, and would prevent GUP from falling
back to the slow path when encountering savedwrite ptes, which matches
what x86 (that does not implement savedwrite) does.

Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte 
helpers")
Cc: Aneesh Kumar K.V 
Cc: Christophe Leroy 
Signed-off-by: Nicholas Piggin 
---

I accounted for Aneesh's and Christophe's feedback, except I couldn't
find a good way to replace the ifdef with IS_ENABLED because of
_PAGE_INVALID etc., but at least cleaned that up a bit nicer.

Patch 1 solves a problem I can hit quite reliably running HPT/HPT KVM.
Patch 2 was noticed by Aneesh when inspecting code for similar bugs.
They should probably both be merged in stable kernels after upstream.

 arch/powerpc/include/asm/book3s/64/pgtable.h | 30 
 arch/powerpc/mm/book3s64/pgtable.c   |  3 ++
 2 files changed, 33 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7dede2e34b70..ccf00a8b98c6 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -876,6 +876,23 @@ static inline int pmd_present(pmd_t pmd)
return false;
 }
 
+static inline int pmd_is_serializing(pmd_t pmd)
+{
+   /*
+* If the pmd is undergoing a split, the _PAGE_PRESENT bit is clear
+* and _PAGE_INVALID is set (see pmd_present, pmdp_invalidate).
+*
+* This condition may also occur when flushing a pmd while flushing
+* it (see ptep_modify_prot_start), so callers must ensure this
+* case is fine as well.
+*/
+   if ((pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)) ==
+   cpu_to_be64(_PAGE_INVALID))
+   return true;
+
+   return false;
+}
+
 static inline int pmd_bad(pmd_t pmd)
 {
if (radix_enabled())
@@ -1092,6 +1109,19 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pmd_access_permitted pmd_access_permitted
 static inline bool pmd_access_permitted(pmd_t pmd, bool write)
 {
+   /*
+* pmdp_invalidate sets this combination (which is not caught by
+* !pte_present() check in pte_access_permitted), to prevent
+* lock-free lookups, as part of the serialize_against_pte_lookup()
+* synchronisation.
+*
+* This also catches the case where the PTE's hardware PRESENT bit is
+* cleared while TLB is flushed, which is suboptimal but should not
+* be frequent.
+*/
+   if (pmd_is_serializing(pmd))
+   return false;
+
return pte_access_permitted(pmd_pte(pmd), write);
 }
 
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 16bda049187a..ff98b663c83e 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned 
long address,
/*
 * This ensures that generic code that rely on IRQ disabling
 * to prevent a parallel THP split work as expected.
+*
+* Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
+* a special case check in pmd_access_permitted.
 */
serialize_against_pte_lookup(vma->vm_mm);
return __pmd(old_pmd);
-- 
2.20.1
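
For reference, the masked compare in pmd_is_serializing() separates the
states of interest roughly as follows (illustrative summary, not patch
code):

/*
 * pmd state                 _PAGE_PRESENT  _PAGE_INVALID  pmd_is_serializing
 * valid / present                 1              0             false
 * being invalidated (split)       0              1             true
 * none (e.g. after collapse)      0              0             false
 */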



Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception

2019-06-06 Thread Ravi Bangoria



On 6/7/19 6:20 AM, Michael Neuling wrote:
> On Thu, 2019-06-06 at 12:59 +0530, Ravi Bangoria wrote:
>> Powerpc hw triggers watchpoint before executing the instruction.
>> To make trigger-after-execute behavior, kernel emulates the
>> instruction. If the instruction is 'load something into non-
>> volatile register', exception handler should restore emulated
>> register state while returning back, otherwise there will be
>> register state corruption. Ex, Adding a watchpoint on a list
>> can corrupt the list:
>>
>>   # cat /proc/kallsyms | grep kthread_create_list
>>   c00000000121c8b8 d kthread_create_list
>>
>> Add watchpoint on kthread_create_list->next:
>>
>>   # perf record -e mem:0xc00000000121c8c0
>>
>> Run some workload such that new kthread gets invoked. Ex, I
>> just logged out from console:
>>
>>   list_add corruption. next->prev should be prev (c000000001214e00), \
>>   but was c00000000121c8b8. (next=c00000000121c8b8).
>>   WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
>>   CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ 
>> #69
>>   ...
>>   NIP __list_add_valid+0xb4/0xc0
>>   LR __list_add_valid+0xb0/0xc0
>>   Call Trace:
>>   __list_add_valid+0xb0/0xc0 (unreliable)
>>   __kthread_create_on_node+0xe0/0x260
>>   kthread_create_on_node+0x34/0x50
>>   create_worker+0xe8/0x260
>>   worker_thread+0x444/0x560
>>   kthread+0x160/0x1a0
>>   ret_from_kernel_thread+0x5c/0x70
>>
>> Signed-off-by: Ravi Bangoria 
> 
> How long has this been around? Should we be CCing stable?

"bl .save_nvgprs" was added in the commit 5aae8a5370802 ("powerpc, 
hw_breakpoints:
Implement hw_breakpoints for 64-bit server processors"), which was merged in
v2.6.36.



Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-06 Thread Alistair Popple
On Thursday, 6 June 2019 10:07:54 PM AEST Oliver wrote:
> On Thu, Jun 6, 2019 at 5:17 PM Alistair Popple  wrote:
> > I have been hitting EEH address errors testing this with some network
> > cards which map/unmap DMA addresses more frequently. For example:
> > 
> > PHB4 PHB#5 Diag-data (Version: 1)
> > brdgCtl:0002
> > RootSts:00060020 00402000 a0220008 00100107 0800
> > PhbSts: 001c 001c
> > Lem:00010080  0080
> > PhbErr: 0280 0200 214898000240
> > a0084000 RxeTceErr:  2000 2000
> > c000  PblErr: 0002
> > 0002   RegbErr:   
> > 0040 0040 61000c48 
> > PE[000] A/B: 8300b038 8000
> > 
> > Interestingly the PE[000] A/B data is the same across different cards
> > and drivers.
> 
> TCE page fault due to permissions so odds are the DMA address was unmapped.
> 
> What cards did you get this with? I tried with one of the common
> BCM5719 NICs and generated network traffic by using rsync to copy a
> linux git tree to the system and it worked fine.

Personally I've seen it with the BCM5719 with the driver modified to set a DMA 
mask of 48 bits instead of 64 and using scp to copy a random 1GB file to the 
system repeatedly until it crashes. 

I have also had reports of someone hitting the same error using a Mellanox 
CX-5 adaptor with a similar driver modification.

- Alistair



Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception

2019-06-06 Thread Michael Neuling
On Thu, 2019-06-06 at 12:59 +0530, Ravi Bangoria wrote:
> Powerpc hw triggers watchpoint before executing the instruction.
> To make trigger-after-execute behavior, kernel emulates the
> instruction. If the instruction is 'load something into non-
> volatile register', exception handler should restore emulated
> register state while returning back, otherwise there will be
> register state corruption. Ex, Adding a watchpoint on a list
> can corrupt the list:
>
>   # cat /proc/kallsyms | grep kthread_create_list
>   c00000000121c8b8 d kthread_create_list
>
> Add watchpoint on kthread_create_list->next:
>
>   # perf record -e mem:0xc00000000121c8c0
>
> Run some workload such that new kthread gets invoked. Ex, I
> just logged out from console:
>
>   list_add corruption. next->prev should be prev (c000000001214e00), \
>   but was c00000000121c8b8. (next=c00000000121c8b8).
>   WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
>   CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
>   ...
>   NIP __list_add_valid+0xb4/0xc0
>   LR __list_add_valid+0xb0/0xc0
>   Call Trace:
>   __list_add_valid+0xb0/0xc0 (unreliable)
>   __kthread_create_on_node+0xe0/0x260
>   kthread_create_on_node+0x34/0x50
>   create_worker+0xe8/0x260
>   worker_thread+0x444/0x560
>   kthread+0x160/0x1a0
>   ret_from_kernel_thread+0x5c/0x70
> 
> Signed-off-by: Ravi Bangoria 

How long has this been around? Should we be CCing stable?

Mikey

> ---
>  arch/powerpc/kernel/exceptions-64s.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S
> b/arch/powerpc/kernel/exceptions-64s.S
> index 9481a11..96de0d1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1753,7 +1753,7 @@ handle_dabr_fault:
>   ld  r5,_DSISR(r1)
>   addir3,r1,STACK_FRAME_OVERHEAD
>   bl  do_break
> -12:  b   ret_from_except_lite
> +12:  b   ret_from_except
>  
>  
>  #ifdef CONFIG_PPC_BOOK3S_64



[Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument

2019-06-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203839

--- Comment #2 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 283139
  --> https://bugzilla.kernel.org/attachment.cgi?id=283139&action=edit
kernel .config (5.2-rc3, G4 MDD)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument

2019-06-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203839

--- Comment #1 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 283137
  --> https://bugzilla.kernel.org/attachment.cgi?id=283137&action=edit
failed boot, screenshot 5.2-rc1

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 203839] New: Kernel 5.2-rc3 fails to boot on a PowerMac G4 3, 6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument

2019-06-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203839

Bug ID: 203839
   Summary: Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6:
systemd[1]: Failed to bump fs.file-max, ignoring:
invalid argument
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: 5.2-rc3
  Hardware: PPC-32
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: PPC-32
  Assignee: platform_ppc...@kernel-bugs.osdl.org
  Reporter: erhar...@mailbox.org
Regression: No

Created attachment 283135
  --> https://bugzilla.kernel.org/attachment.cgi?id=283135&action=edit
failed boot, screenshot 5.2-rc3

The system boots fine with kernel 5.1.7. Starting with 5.2-rc1 the G4 has
had problems finishing boot correctly. With 5.2-rc3 the basic boot process
seems to complete, but it crashes when handing control over to systemd:

systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument
systemd[1]: segfault (11) at 0 nip 0 lr 0 code 1
systemd[1]: Bad NIP, not dumping instructions
[...]

For more details see the screenshot. Kernel 5.2-rc1 errors out even earlier
(see screenshot) with a different error. Also, this problem may be 32-bit
specific: I tried 5.2-rc3 on a PowerMac G5, which boots successfully without
problems.

root is ext4, boot is ext2, systemd is v241.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[RFC/RFT PATCH] Revert "ASoC: fsl_esai: ETDR and TX0~5 registers are non volatile"

2019-06-06 Thread Nicolin Chen
This reverts commit 8973112aa41b8ad956a5b47f2fe17bc2a5cf2645.

ETDR and TX0~5 are TX data registers. There are a couple of reasons
to revert the change:
1) Though ETDR and TX0~5 are not volatile but write-only registers,
   they should not be cached either. By the definition of
   "volatile_reg", a register should be put in the volatile list if
   it cannot be cached.
2) When doing regcache_sync(), the operation may accidentally write
   "dirty" data into these registers if the cached data happen to
   differ from the defaults. It may also result in a channel
   shift/swap, since the number of write-via-sync operations at ETDR
   is unlikely to match the channel count.
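
The rule in 1) looks like this in regmap terms (a minimal sketch with
hypothetical FOO_* registers, not the fsl_esai map itself):

	#include <linux/regmap.h>

	#define FOO_REG_TXDATA	0x00	/* hypothetical write-only FIFO port */
	#define FOO_REG_STATUS	0x04	/* hypothetical hardware-updated reg */

	/* A write-only data port must be reported as volatile so regcache
	 * never caches it and regcache_sync() never replays stale values
	 * into the FIFO. */
	static bool foo_volatile_reg(struct device *dev, unsigned int reg)
	{
		switch (reg) {
		case FOO_REG_TXDATA:	/* not readable, but must not be cached */
		case FOO_REG_STATUS:	/* changes under hardware control */
			return true;
		default:
			return false;	/* plain config registers may be cached */
		}
	}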

Note: this is not a complete revert, as it keeps the remaining register
macros in the default value list even though the original commit also
changed other entries in that list. And this patch doesn't really need
a Cc to the stable tree, since there has always been a FIFO reset
operation around the regcache_sync() call, even prior to the reverted
commit.

Signed-off-by: Nicolin Chen 
Cc: Shengjiu Wang 
---
Hi Mark,
In case there's no objection to the patch, I'd still like to
wait for a Tested-by from NXP folks before submitting it. Thanks!

 sound/soc/fsl/fsl_esai.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index 10d2210c91ef..8f0a86335f73 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -652,16 +652,9 @@ static const struct snd_soc_component_driver 
fsl_esai_component = {
 };
 
 static const struct reg_default fsl_esai_reg_defaults[] = {
-   {REG_ESAI_ETDR,  0x00000000},
	{REG_ESAI_ECR,   0x00000000},
	{REG_ESAI_TFCR,  0x00000000},
	{REG_ESAI_RFCR,  0x00000000},
-   {REG_ESAI_TX0,   0x00000000},
-   {REG_ESAI_TX1,   0x00000000},
-   {REG_ESAI_TX2,   0x00000000},
-   {REG_ESAI_TX3,   0x00000000},
-   {REG_ESAI_TX4,   0x00000000},
-   {REG_ESAI_TX5,   0x00000000},
	{REG_ESAI_TSR,   0x00000000},
	{REG_ESAI_SAICR, 0x00000000},
	{REG_ESAI_TCR,   0x00000000},
@@ -711,10 +704,17 @@ static bool fsl_esai_readable_reg(struct device *dev, 
unsigned int reg)
 static bool fsl_esai_volatile_reg(struct device *dev, unsigned int reg)
 {
switch (reg) {
+   case REG_ESAI_ETDR:
case REG_ESAI_ERDR:
case REG_ESAI_ESR:
case REG_ESAI_TFSR:
case REG_ESAI_RFSR:
+   case REG_ESAI_TX0:
+   case REG_ESAI_TX1:
+   case REG_ESAI_TX2:
+   case REG_ESAI_TX3:
+   case REG_ESAI_TX4:
+   case REG_ESAI_TX5:
case REG_ESAI_RX0:
case REG_ESAI_RX1:
case REG_ESAI_RX2:
-- 
2.17.1



[Bug 203837] New: Booting kernel under KVM immediately freezes host

2019-06-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203837

Bug ID: 203837
   Summary: Booting kernel under KVM immediately freezes host
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: v5.2-rc2
  Hardware: PPC-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: blocking
  Priority: P1
 Component: PPC-64
  Assignee: platform_ppc...@kernel-bugs.osdl.org
  Reporter: sh...@anastas.io
Regression: No

Created attachment 283133
  --> https://bugzilla.kernel.org/attachment.cgi?id=283133&action=edit
Guest kernel config

When booting kernel v5.2-rc2 (and confirmed up to 156c05917) in a VM on a
POWER9 host running kernel 5.1.7, the host immediately locks up and
becomes unresponsive to the point of requiring a hard reset.

The last guest kernel message printed to the screen before the
host locks up is:

[0.013940] smp: Bringing up secondary CPUs ...

Due to the nature of the bug, it is very difficult to bisect, since a manual
host reset is required each time the bug is encountered. Also, my only
POWER machine is my primary workstation.

The bug has also been confirmed on other host kernel versions (down to 5.0.x).
When downgrading the guest kernel to 5.1.0, the issue is not present.

The guest kernel .config is attached.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v3 6/9] KVM: PPC: Ultravisor: Restrict flush of the partition tlb cache

2019-06-06 Thread Paul Mackerras
On Thu, Jun 06, 2019 at 04:39:04PM -0300, Murilo Opsfelder Araújo wrote:
> Claudio Carvalho  writes:
> 
> > From: Ram Pai 
> >
> > Ultravisor is responsible for flushing the tlb cache, since it manages
> > the PATE entries. Hence skip tlb flush, if the ultravisor firmware is
> > available.
> >
> > Signed-off-by: Ram Pai 
> > Signed-off-by: Claudio Carvalho 
> > ---
> >  arch/powerpc/mm/book3s64/pgtable.c | 33 +-
> >  1 file changed, 19 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> > b/arch/powerpc/mm/book3s64/pgtable.c
> > index 40a9fc8b139f..1eeb5fe87023 100644
> > --- a/arch/powerpc/mm/book3s64/pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/pgtable.c
> > @@ -224,6 +224,23 @@ void __init mmu_partition_table_init(void)
> > powernv_set_nmmu_ptcr(ptcr);
> >  }
> >
> > +static void flush_partition(unsigned int lpid, unsigned long dw0)
> > +{
> > +   if (dw0 & PATB_HR) {
> > +   asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 1) : :
> > +"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
> > +   asm volatile(PPC_TLBIE_5(%0, %1, 2, 1, 1) : :
> > +"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
> > +   trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 1);
> > +   } else {
> > +   asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 0) : :
> > +"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
> > +   trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 0);
> > +   }
> > +   /* do we need fixup here ?*/
> > +   asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> > +}
> > +
> 
> checkpatch.pl seems to complain:
> 
> ERROR: need consistent spacing around '%' (ctx:WxV)
> #125: FILE: arch/powerpc/mm/book3s64/pgtable.c:230:
> +   asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 1) : :
>  ^
> 
> ERROR: need consistent spacing around '%' (ctx:WxV)
> #127: FILE: arch/powerpc/mm/book3s64/pgtable.c:232:
> +   asm volatile(PPC_TLBIE_5(%0, %1, 2, 1, 1) : :
>  ^
> 
> ERROR: need consistent spacing around '%' (ctx:WxV)
> #131: FILE: arch/powerpc/mm/book3s64/pgtable.c:236:
> +   asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 0) : :
>  ^

Then clearly checkpatch.pl has a bug.

Paul.


Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-06 Thread Larry Finger

On 6/6/19 6:43 AM, Christoph Hellwig wrote:

On Thu, Jun 06, 2019 at 08:57:49PM +1000, Benjamin Herrenschmidt wrote:

Wow... that's an odd amount. One thing we could possibly do is add code
to limit the amount of RAM when we detect that device


Sent too quickly... I mean that *or* force swiotlb at 30-bits on those systems 
based
on detecting the presence of that device in the device-tree.


swiotlb doesn't really help you, as these days swiotlb only matters for
the dma_map* case.  What would help is a ZONE_DMA that covers these
devices.  No need to do the 24-bit that x86 does, but 30-bit would do it.

WIP patch for testing below:

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index b8286a2013b4..7a367ce87c41 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -319,6 +319,10 @@ struct vm_area_struct;
  #endif /* __ASSEMBLY__ */
  #include 
  
+#if 1 /* XXX: pmac?  dynamic discovery? */
+#define ARCH_ZONE_DMA_BITS 30
+#else
  #define ARCH_ZONE_DMA_BITS 31
+#endif
  
  #endif /* _ASM_POWERPC_PAGE_H */

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index cba29131bccc..2540d3b2588c 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -248,7 +248,8 @@ void __init paging_init(void)
   (long int)((top_of_ram - total_ram) >> 20));
  
  #ifdef CONFIG_ZONE_DMA
-   max_zone_pfns[ZONE_DMA] = min(max_low_pfn, 0x7fffffffUL >> PAGE_SHIFT);
+   max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
+   ((1UL << ARCH_ZONE_DMA_BITS) - 1) >> PAGE_SHIFT);
  #endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
  #ifdef CONFIG_HIGHMEM



This trial patch failed.

Larry



Re: [PATCH v3 6/9] KVM: PPC: Ultravisor: Restrict flush of the partition tlb cache

2019-06-06 Thread Murilo Opsfelder Araújo
Claudio Carvalho  writes:

> From: Ram Pai 
>
> Ultravisor is responsible for flushing the tlb cache, since it manages
> the PATE entries. Hence skip tlb flush, if the ultravisor firmware is
> available.
>
> Signed-off-by: Ram Pai 
> Signed-off-by: Claudio Carvalho 
> ---
>  arch/powerpc/mm/book3s64/pgtable.c | 33 +-
>  1 file changed, 19 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> b/arch/powerpc/mm/book3s64/pgtable.c
> index 40a9fc8b139f..1eeb5fe87023 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -224,6 +224,23 @@ void __init mmu_partition_table_init(void)
>   powernv_set_nmmu_ptcr(ptcr);
>  }
>
> +static void flush_partition(unsigned int lpid, unsigned long dw0)
> +{
> + if (dw0 & PATB_HR) {
> + asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 1) : :
> +  "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
> + asm volatile(PPC_TLBIE_5(%0, %1, 2, 1, 1) : :
> +  "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
> + trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 1);
> + } else {
> + asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 0) : :
> +  "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
> + trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 0);
> + }
> + /* do we need fixup here ?*/
> + asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> +}
> +

checkpatch.pl seems to complain:

ERROR: need consistent spacing around '%' (ctx:WxV)
#125: FILE: arch/powerpc/mm/book3s64/pgtable.c:230:
+   asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 1) : :
 ^

ERROR: need consistent spacing around '%' (ctx:WxV)
#127: FILE: arch/powerpc/mm/book3s64/pgtable.c:232:
+   asm volatile(PPC_TLBIE_5(%0, %1, 2, 1, 1) : :
 ^

ERROR: need consistent spacing around '%' (ctx:WxV)
#131: FILE: arch/powerpc/mm/book3s64/pgtable.c:236:
+   asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 0) : :
 ^

>  static void __mmu_partition_table_set_entry(unsigned int lpid,
>   unsigned long dw0,
>   unsigned long dw1)
> @@ -238,20 +255,8 @@ static void __mmu_partition_table_set_entry(unsigned int 
> lpid,
>* The type of flush (hash or radix) depends on what the previous
>* use of this partition ID was, not the new use.
>*/
> - asm volatile("ptesync" : : : "memory");
> - if (old & PATB_HR) {
> - asm volatile(PPC_TLBIE_5(%0,%1,2,0,1) : :
> -  "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
> - asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
> -  "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
> - trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 1);
> - } else {
> - asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : :
> -  "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
> - trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 0);
> - }
> - /* do we need fixup here ?*/
> - asm volatile("eieio; tlbsync; ptesync" : : : "memory");
> + if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
> + flush_partition(lpid, old);
>  }
>
>  void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
> --
> 2.20.1


Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-06 Thread Larry Finger

On 6/6/19 6:43 AM, Christoph Hellwig wrote:

On Thu, Jun 06, 2019 at 08:57:49PM +1000, Benjamin Herrenschmidt wrote:

Wow... that's an odd amount. One thing we could possibly do is add code
to limit the amount of RAM when we detect that device


Sent too quickly... I mean that *or* force swiotlb at 30-bits on those systems 
based
on detecting the presence of that device in the device-tree.


swiotlb doesn't really help you, as these days swiotlb only matters for
the dma_map* case.  What would help is a ZONE_DMA that covers these
devices.  No need to do the 24-bit that x86 does, but 30-bit would do it.

WIP patch for testing below:

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index b8286a2013b4..7a367ce87c41 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -319,6 +319,10 @@ struct vm_area_struct;
  #endif /* __ASSEMBLY__ */
  #include 
  
+#if 1 /* XXX: pmac?  dynamic discovery? */
+#define ARCH_ZONE_DMA_BITS 30
+#else
  #define ARCH_ZONE_DMA_BITS 31
+#endif
  
  #endif /* _ASM_POWERPC_PAGE_H */

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index cba29131bccc..2540d3b2588c 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -248,7 +248,8 @@ void __init paging_init(void)
   (long int)((top_of_ram - total_ram) >> 20));
  
  #ifdef CONFIG_ZONE_DMA
-   max_zone_pfns[ZONE_DMA] = min(max_low_pfn, 0x7fffffffUL >> PAGE_SHIFT);
+   max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
+   ((1UL << ARCH_ZONE_DMA_BITS) - 1) >> PAGE_SHIFT);
  #endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
  #ifdef CONFIG_HIGHMEM



I am generating a test kernel with this patch.

FYI, the "free" command on my machine shows 1.5+ G of memory. That likely means 
I have 2G installed.


I have tested a patched kernel in which b43legacy falls back to a 31-bit DMA 
mask when the 32-bit one failed. That worked, but would likely kill the x86 
version. Let me know if think a fix in the driver rather than the kernel would 
be better. I still need to understand why the same setup works in b43 and fails 
in b43legacy. :(
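
That fallback is roughly this pattern (a sketch only, assuming a
generic struct device *dev; not the actual driver diff):

	/* sketch: retry with a narrower mask when the wide one fails */
	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
	if (err)
		err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(31));
	if (err) {
		dev_err(dev, "no usable DMA mask\n");
		return err;
	}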


Larry


[PATCH v3 9/9] KVM: PPC: Ultravisor: Check for MSR_S during hv_reset_msr

2019-06-06 Thread Claudio Carvalho
From: Michael Anderson 

 - Check for MSR_S so that kvmppc_set_msr will include it. Prior to
   this change, the return to the guest would not have the S bit set.

 - Patch based on comment from Paul Mackerras 

Signed-off-by: Michael Anderson 
Signed-off-by: Claudio Carvalho 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index ab3d484c5e2e..ab62a66f9b4e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -295,6 +295,7 @@ static void kvmppc_mmu_book3s_64_hv_reset_msr(struct 
kvm_vcpu *vcpu)
msr |= MSR_TS_S;
else
msr |= vcpu->arch.shregs.msr & MSR_TS_MASK;
+   msr |= vcpu->arch.shregs.msr & MSR_S;
kvmppc_set_msr(vcpu, msr);
 }
 
-- 
2.20.1



[PATCH v3 0/9] kvmppc: Paravirtualize KVM to support ultravisor

2019-06-06 Thread Claudio Carvalho
POWER platforms that support the Protected Execution Facility (PEF)
introduce features that combine hardware facilities and firmware to
enable secure virtual machines. That includes a new processor mode
(ultravisor mode) and the ultravisor firmware.

In PEF enabled systems, the ultravisor firmware runs at a privilege
level above the hypervisor and also takes control over some system
resources. The hypervisor, though, can make system calls to access these
resources. Such system calls, a.k.a. ucalls, are handled by the
ultravisor firmware.

The processor allows part of the system memory to be configured as
secure memory, and introduces a new mode, called secure mode, where
any software entity in that mode can access secure memory. The
hypervisor doesn't (and can't) run in secure mode, but a secure guest
and the ultravisor firmware do.

This patch set adds support for ultravisor calls and does some preparation
for running secure guests.
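
Throughout the series, ultravisor presence is keyed off a firmware
feature bit, using the usual pattern (as in the opal-imc and idle
changes in this set):

	if (firmware_has_feature(FW_FEATURE_ULTRAVISOR))
		return -EACCES;	/* resource is owned by the ultravisor */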

---
Changelog:
---
v2->v3:
 - Squashed patches:
 "[PATCH v2 08/10] KVM: PPC: Ultravisor: Return to UV for hcalls from SVM"
 "[PATCH v2 09/10] KVM: PPC: Book3S HV: Fixed for running secure guests"
 - Renamed patch from/to:
 "[PATCH v2 08/10] KVM: PPC: Ultravisor: Return to UV for hcalls from SVM"
 "[PATCH v3 08/09] KVM: PPC: Ultravisor: Enter a secure guest
 - Rebased
 - Addressed comments from Paul Mackerras
 - Dropped ultravisor checks made in power8 code
 - Updated the commit message for:
 "[PATCH v3 08/09] KVM: PPC: Ultravisor: Enter a secure guest"
 - Addressed comments from Maddy
 - Dropped imc-pmu.c changes
 - Changed opal-imc.c to fail the probe when the ultravisor is enabled
 - Fixed "ucall defined but not used" issue when CONFIG_PPC_UV not set 

v1->v2:
 - Addressed comments from Paul Mackerras:
 - Write the pate in HV's table before doing that in UV's
 - Renamed and better documented the ultravisor header files. Also added
   all possible return codes for each ucall
 - Updated the commit message that introduces the MSR_S bit 
 - Moved ultravisor.c and ucall.S to arch/powerpc/kernel
 - Changed ucall.S to not save CR
 - Rebased
 - Changed the patches order
 - Updated several commit messages
 - Added FW_FEATURE_ULTRAVISOR to enable use of firmware_has_feature()
 - Renamed CONFIG_PPC_KVM_UV to CONFIG_PPC_UV and used it to ifdef the ucall
   handler and the code that populates the powerpc_firmware_features for 
   ultravisor
 - Exported the ucall symbol. KVM may be built as module.
 - Restricted LDBAR access if the ultravisor firmware is available
 - Dropped patches:
 "[PATCH 06/13] KVM: PPC: Ultravisor: UV_RESTRICTED_SPR_WRITE ucall"
 "[PATCH 07/13] KVM: PPC: Ultravisor: UV_RESTRICTED_SPR_READ ucall"
 "[PATCH 08/13] KVM: PPC: Ultravisor: fix mtspr and mfspr"
 - Squashed patches:
 "[PATCH 09/13] KVM: PPC: Ultravisor: Return to UV for hcalls from SVM"
 "[PATCH 13/13] KVM: PPC: UV: Have fast_guest_return check secure_guest"

Anshuman Khandual (1):
  KVM: PPC: Ultravisor: Add PPC_UV config option

Claudio Carvalho (2):
  powerpc: Introduce FW_FEATURE_ULTRAVISOR
  KVM: PPC: Ultravisor: Restrict LDBAR access

Michael Anderson (2):
  KVM: PPC: Ultravisor: Use UV_WRITE_PATE ucall to register a PATE
  KVM: PPC: Ultravisor: Check for MSR_S during hv_reset_msr

Ram Pai (2):
  KVM: PPC: Ultravisor: Add generic ultravisor call handler
  KVM: PPC: Ultravisor: Restrict flush of the partition tlb cache

Sukadev Bhattiprolu (2):
  KVM: PPC: Ultravisor: Introduce the MSR_S bit
  KVM: PPC: Ultravisor: Enter a secure guest

 arch/powerpc/Kconfig  | 20 +++
 arch/powerpc/include/asm/firmware.h   |  5 +-
 arch/powerpc/include/asm/kvm_host.h   |  1 +
 arch/powerpc/include/asm/reg.h|  3 ++
 arch/powerpc/include/asm/ultravisor-api.h | 24 +
 arch/powerpc/include/asm/ultravisor.h | 49 +
 arch/powerpc/kernel/Makefile  |  1 +
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/prom.c|  6 +++
 arch/powerpc/kernel/ucall.S   | 31 +++
 arch/powerpc/kernel/ultravisor.c  | 28 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |  1 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 39 +++---
 arch/powerpc/mm/book3s64/hash_utils.c |  3 +-
 arch/powerpc/mm/book3s64/pgtable.c| 65 +--
 arch/powerpc/mm/book3s64/radix_pgtable.c  |  9 ++--
 arch/powerpc/platforms/powernv/idle.c |  6 ++-
 arch/powerpc/platforms/powernv/opal-imc.c |  7 +++
 18 files changed, 269 insertions(+), 30 deletions(-)
 create mode 100644 arch/powerpc/include/asm/ultravisor-api.h
 create mode 100644 arch/powerpc/include/asm/ultravisor.h
 create mode 100644 arch/powerpc/kernel/ucall.S
 create mode 100644 arch/powerpc/kernel/ultravisor.c

-- 
2.20.1



[PATCH v3 7/9] KVM: PPC: Ultravisor: Restrict LDBAR access

2019-06-06 Thread Claudio Carvalho
When the ultravisor firmware is available, it takes control over the
LDBAR register. In this case, thread-imc updates and save/restore
operations on the LDBAR register are handled by ultravisor.

Signed-off-by: Claudio Carvalho 
Signed-off-by: Ram Pai 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 2 ++
 arch/powerpc/platforms/powernv/idle.c | 6 --
 arch/powerpc/platforms/powernv/opal-imc.c | 7 +++
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f9b2620fbecd..cffb365d9d02 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -375,8 +375,10 @@ BEGIN_FTR_SECTION
mtspr   SPRN_RPR, r0
ld  r0, KVM_SPLIT_PMMAR(r6)
mtspr   SPRN_PMMAR, r0
+BEGIN_FW_FTR_SECTION_NESTED(70)
ld  r0, KVM_SPLIT_LDBAR(r6)
mtspr   SPRN_LDBAR, r0
+END_FW_FTR_SECTION_NESTED(FW_FEATURE_ULTRAVISOR, 0, 70)
isync
 FTR_SECTION_ELSE
/* On P9 we use the split_info for coordinating LPCR changes */
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index c9133f7908ca..fd62435e3267 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -679,7 +679,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, 
bool mmu_on)
sprs.ptcr   = mfspr(SPRN_PTCR);
sprs.rpr= mfspr(SPRN_RPR);
sprs.tscr   = mfspr(SPRN_TSCR);
-   sprs.ldbar  = mfspr(SPRN_LDBAR);
+   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   sprs.ldbar  = mfspr(SPRN_LDBAR);
 
sprs_saved = true;
 
@@ -762,7 +763,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, 
bool mmu_on)
mtspr(SPRN_PTCR,sprs.ptcr);
mtspr(SPRN_RPR, sprs.rpr);
mtspr(SPRN_TSCR,sprs.tscr);
-   mtspr(SPRN_LDBAR,   sprs.ldbar);
+   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   mtspr(SPRN_LDBAR,   sprs.ldbar);
 
if (pls >= pnv_first_tb_loss_level) {
/* TB loss */
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 1b6932890a73..e9b641d313fb 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -254,6 +254,13 @@ static int opal_imc_counters_probe(struct platform_device 
*pdev)
bool core_imc_reg = false, thread_imc_reg = false;
u32 type;
 
+   /*
+* When the Ultravisor is enabled, it is responsible for thread-imc
+* updates
+*/
+   if (firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   return -EACCES;
+
/*
 * Check whether this is kdump kernel. If yes, force the engines to
 * stop and return.
-- 
2.20.1



[PATCH v3 8/9] KVM: PPC: Ultravisor: Enter a secure guest

2019-06-06 Thread Claudio Carvalho
From: Sukadev Bhattiprolu 

To enter a secure guest, we have to go through the ultravisor, therefore
we do a ucall when we are entering a secure guest.

This change is needed for any sort of entry to the secure guest from the
hypervisor, whether it is a return from an hcall, a return from a
hypervisor interrupt, or the first time that a secure guest vCPU is run.

If we are returning from an hcall, the results are already in the
appropriate registers (R3:12), except for R6,7, which need to be
restored before doing the ucall (UV_RETURN).

Have fast_guest_return check the kvm_arch.secure_guest field so that a
new CPU enters UV when started (in response to a RTAS start-cpu call).

Thanks to input from Paul Mackerras, Ram Pai and Mike Anderson.

Signed-off-by: Sukadev Bhattiprolu 
[Pass SRR1 in r11 for UV_RETURN, fix kvmppc_msr_interrupt to preserve
 the MSR_S bit]
Signed-off-by: Paul Mackerras 
[Fix UV_RETURN token number and arch.secure_guest check]
Signed-off-by: Ram Pai 
[Update commit message and ret_to_ultra comment]
Signed-off-by: Claudio Carvalho 
---
 arch/powerpc/include/asm/kvm_host.h   |  1 +
 arch/powerpc/include/asm/ultravisor-api.h |  1 +
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 37 +++
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 013c76a0a03e..184becb62ea4 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -294,6 +294,7 @@ struct kvm_arch {
cpumask_t cpu_in_guest;
u8 radix;
u8 fwnmi_enabled;
+   u8 secure_guest;
bool threads_indep;
bool nested_enable;
pgd_t *pgtable;
diff --git a/arch/powerpc/include/asm/ultravisor-api.h 
b/arch/powerpc/include/asm/ultravisor-api.h
index 24bfb4c1737e..15e6ce77a131 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -19,5 +19,6 @@
 
 /* opcodes */
 #define UV_WRITE_PATE  0xF104
+#define UV_RETURN  0xF11C
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8e02444e9d3d..44742724513e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -508,6 +508,7 @@ int main(void)
OFFSET(KVM_VRMA_SLB_V, kvm, arch.vrma_slb_v);
OFFSET(KVM_RADIX, kvm, arch.radix);
OFFSET(KVM_FWNMI, kvm, arch.fwnmi_enabled);
+   OFFSET(KVM_SECURE_GUEST, kvm, arch.secure_guest);
OFFSET(VCPU_DSISR, kvm_vcpu, arch.shregs.dsisr);
OFFSET(VCPU_DAR, kvm_vcpu, arch.shregs.dar);
OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index cffb365d9d02..d719d730d31e 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Sign-extend HDEC if not on POWER9 */
 #define EXTEND_HDEC(reg)   \
@@ -1092,16 +1093,12 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
ld  r5, VCPU_LR(r4)
-   ld  r6, VCPU_CR(r4)
mtlrr5
-   mtcrr6
 
ld  r1, VCPU_GPR(R1)(r4)
ld  r2, VCPU_GPR(R2)(r4)
ld  r3, VCPU_GPR(R3)(r4)
ld  r5, VCPU_GPR(R5)(r4)
-   ld  r6, VCPU_GPR(R6)(r4)
-   ld  r7, VCPU_GPR(R7)(r4)
ld  r8, VCPU_GPR(R8)(r4)
ld  r9, VCPU_GPR(R9)(r4)
ld  r10, VCPU_GPR(R10)(r4)
@@ -1119,10 +1116,35 @@ BEGIN_FTR_SECTION
mtspr   SPRN_HDSISR, r0
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 
+   ld  r6, VCPU_KVM(r4)
+   lbz r7, KVM_SECURE_GUEST(r6)
+   cmpdi   r7, 0
+   bne ret_to_ultra
+
+   lwz r6, VCPU_CR(r4)
+   mtcrr6
+
+   ld  r7, VCPU_GPR(R7)(r4)
+   ld  r6, VCPU_GPR(R6)(r4)
ld  r0, VCPU_GPR(R0)(r4)
ld  r4, VCPU_GPR(R4)(r4)
HRFI_TO_GUEST
b   .
+/*
+ * We are entering a secure guest, so we have to invoke the ultravisor to do
+ * that. If we are returning from a hcall, the results are already in the
+ * appropriate registers (R3:12), except for R6,7 which we used as temporary
+ * registers above. Restore them, and set R0 to the ucall number (UV_RETURN).
+ */
+ret_to_ultra:
+   lwz r6, VCPU_CR(r4)
+   mtcrr6
+   mfspr   r11, SPRN_SRR1
+   LOAD_REG_IMMEDIATE(r0, UV_RETURN)
+   ld  r7, VCPU_GPR(R7)(r4)
+   ld  r6, VCPU_GPR(R6)(r4)
+   ld  r4, VCPU_GPR(R4)(r4)
+   sc  2
 
 /*
  * Enter the guest on a P9 or later system where we have exactly
@@ -3318,13 +3340,16 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
  *   r0 is used as a scratch register
  */
 kvmppc_msr_interrupt:
+   

[PATCH v3 6/9] KVM: PPC: Ultravisor: Restrict flush of the partition tlb cache

2019-06-06 Thread Claudio Carvalho
From: Ram Pai 

The ultravisor is responsible for flushing the TLB cache, since it
manages the PATE entries. Hence, skip the TLB flush if the ultravisor
firmware is available.

Signed-off-by: Ram Pai 
Signed-off-by: Claudio Carvalho 
---
 arch/powerpc/mm/book3s64/pgtable.c | 33 +-
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 40a9fc8b139f..1eeb5fe87023 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -224,6 +224,23 @@ void __init mmu_partition_table_init(void)
powernv_set_nmmu_ptcr(ptcr);
 }
 
+static void flush_partition(unsigned int lpid, unsigned long dw0)
+{
+   if (dw0 & PATB_HR) {
+   asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 1) : :
+"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
+   asm volatile(PPC_TLBIE_5(%0, %1, 2, 1, 1) : :
+"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
+   trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 1);
+   } else {
+   asm volatile(PPC_TLBIE_5(%0, %1, 2, 0, 0) : :
+"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
+   trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 0);
+   }
+   /* do we need fixup here ?*/
+   asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+}
+
 static void __mmu_partition_table_set_entry(unsigned int lpid,
unsigned long dw0,
unsigned long dw1)
@@ -238,20 +255,8 @@ static void __mmu_partition_table_set_entry(unsigned int 
lpid,
 * The type of flush (hash or radix) depends on what the previous
 * use of this partition ID was, not the new use.
 */
-   asm volatile("ptesync" : : : "memory");
-   if (old & PATB_HR) {
-   asm volatile(PPC_TLBIE_5(%0,%1,2,0,1) : :
-"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
-   asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
-"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
-   trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 1);
-   } else {
-   asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : :
-"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
-   trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 0);
-   }
-   /* do we need fixup here ?*/
-   asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   flush_partition(lpid, old);
 }
 
 void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
-- 
2.20.1



[PATCH v3 5/9] KVM: PPC: Ultravisor: Use UV_WRITE_PATE ucall to register a PATE

2019-06-06 Thread Claudio Carvalho
From: Michael Anderson 

When running under an ultravisor, the ultravisor controls the real
partition table and keeps it in secure memory, where the hypervisor
can't access it. Therefore, we (the HV) have to do a ucall whenever we
want to update an entry.

The HV still keeps a copy of its view of the partition table in normal
memory so that the nest MMU can access it.

Both partition tables will have PATE entries for HV and normal virtual
machines.

Suggested-by: Ryan Grimm 
Signed-off-by: Michael Anderson 
Signed-off-by: Madhavan Srinivasan 
Signed-off-by: Ram Pai 
[Write the pate in HV's table before doing that in UV's]
Signed-off-by: Claudio Carvalho 
---
 arch/powerpc/include/asm/ultravisor-api.h |  5 +++-
 arch/powerpc/include/asm/ultravisor.h | 14 ++
 arch/powerpc/mm/book3s64/hash_utils.c |  3 +-
 arch/powerpc/mm/book3s64/pgtable.c| 34 +--
 arch/powerpc/mm/book3s64/radix_pgtable.c  |  9 --
 5 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ultravisor-api.h 
b/arch/powerpc/include/asm/ultravisor-api.h
index 5f538f33c704..24bfb4c1737e 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -15,6 +15,9 @@
 #define U_SUCCESS  H_SUCCESS
 #define U_FUNCTION H_FUNCTION
 #define U_PARAMETERH_PARAMETER
+#define U_PERMISSION   H_PERMISSION
 
-#endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
+/* opcodes */
+#define UV_WRITE_PATE  0xF104
 
+#endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/include/asm/ultravisor.h 
b/arch/powerpc/include/asm/ultravisor.h
index 7500771a8ebd..4ffec7a36acd 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -12,6 +12,8 @@
 
 #if !defined(__ASSEMBLY__)
 
+#include 
+
 /* Internal functions */
 extern int early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
 int depth, void *data);
@@ -28,8 +30,20 @@ extern int early_init_dt_scan_ultravisor(unsigned long node, 
const char *uname,
  */
 #if defined(CONFIG_PPC_UV)
 long ucall(unsigned long opcode, unsigned long *retbuf, ...);
+#else
+static long ucall(unsigned long opcode, unsigned long *retbuf, ...)
+{
+   return U_NOT_AVAILABLE;
+}
 #endif
 
+static inline int uv_register_pate(u64 lpid, u64 dw0, u64 dw1)
+{
+   unsigned long retbuf[UCALL_BUFSIZE];
+
+   return ucall(UV_WRITE_PATE, retbuf, lpid, dw0, dw1);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index 1ff451892d7f..220a4e133240 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1080,9 +1080,10 @@ void hash__early_init_mmu_secondary(void)
 
if (!cpu_has_feature(CPU_FTR_ARCH_300))
mtspr(SPRN_SDR1, _SDR1);
-   else
+   else if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
mtspr(SPRN_PTCR,
  __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
+
}
/* Initialize SLB */
slb_initialize();
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 16bda049187a..40a9fc8b139f 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -16,6 +16,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -206,12 +208,25 @@ void __init mmu_partition_table_init(void)
 * 64 K size.
 */
ptcr = __pa(partition_tb) | (PATB_SIZE_SHIFT - 12);
-   mtspr(SPRN_PTCR, ptcr);
+   /*
+* If ultravisor is available, it is responsible for creating and
+* managing partition table
+*/
+   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   mtspr(SPRN_PTCR, ptcr);
+
+   /*
+* Since nestMMU cannot access secure memory. Create
+* and manage our own partition table. This table
+* contains entries for nonsecure and hypervisor
+* partition.
+*/
powernv_set_nmmu_ptcr(ptcr);
 }
 
-void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
-  unsigned long dw1)
+static void __mmu_partition_table_set_entry(unsigned int lpid,
+   unsigned long dw0,
+   unsigned long dw1)
 {
unsigned long old = be64_to_cpu(partition_tb[lpid].patb0);
 
@@ -238,6 +253,19 @@ void mmu_partition_table_set_entry(unsigned int lpid, 
unsigned long dw0,
/* do we need fixup here ?*/
asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 }
+
+void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
+ unsigned long dw1)
+{
+ 

[PATCH v3 4/9] KVM: PPC: Ultravisor: Add generic ultravisor call handler

2019-06-06 Thread Claudio Carvalho
From: Ram Pai 

Add the ucall() function, which can be used to make ultravisor calls
with a varying number of input and output arguments. Ultravisor calls
can be made from the host or from guests.

This copies the implementation of plpar_hcall().
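
A minimal usage sketch (illustrative only; uv_register_pate() in a later
patch of this series is the first real user):

	unsigned long retbuf[UCALL_BUFSIZE];	/* up to 4 return arguments */
	long rc;

	/* opcode, return buffer, then up to 6 input arguments */
	rc = ucall(UV_WRITE_PATE, retbuf, lpid, dw0, dw1);
	if (rc != U_SUCCESS)
		pr_err("UV_WRITE_PATE failed: %ld\n", rc);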

Signed-off-by: Ram Pai 
[Change ucall.S to not save CR, rename and move the headers, build
 ucall.S if CONFIG_PPC_UV set, and add some comments in the code]
Signed-off-by: Claudio Carvalho 
---
 arch/powerpc/include/asm/ultravisor-api.h | 20 +++
 arch/powerpc/include/asm/ultravisor.h | 20 +++
 arch/powerpc/kernel/Makefile  |  2 +-
 arch/powerpc/kernel/ucall.S   | 31 +++
 arch/powerpc/kernel/ultravisor.c  |  4 +++
 5 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/ultravisor-api.h
 create mode 100644 arch/powerpc/kernel/ucall.S

diff --git a/arch/powerpc/include/asm/ultravisor-api.h 
b/arch/powerpc/include/asm/ultravisor-api.h
new file mode 100644
index ..5f538f33c704
--- /dev/null
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Ultravisor calls.
+ *
+ * Copyright 2019, IBM Corporation.
+ *
+ */
+#ifndef _ASM_POWERPC_ULTRAVISOR_API_H
+#define _ASM_POWERPC_ULTRAVISOR_API_H
+
+#include 
+
+/* Return codes */
+#define U_NOT_AVAILABLEH_NOT_AVAILABLE
+#define U_SUCCESS  H_SUCCESS
+#define U_FUNCTION H_FUNCTION
+#define U_PARAMETERH_PARAMETER
+
+#endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
+
diff --git a/arch/powerpc/include/asm/ultravisor.h 
b/arch/powerpc/include/asm/ultravisor.h
index e5009b0d84ea..7500771a8ebd 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -8,8 +8,28 @@
 #ifndef _ASM_POWERPC_ULTRAVISOR_H
 #define _ASM_POWERPC_ULTRAVISOR_H
 
+#include 
+
+#if !defined(__ASSEMBLY__)
+
 /* Internal functions */
 extern int early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
 int depth, void *data);
 
+/* API functions */
+#define UCALL_BUFSIZE 4
+/**
+ * ucall: Make a powerpc ultravisor call.
+ * @opcode: The ultravisor call to make.
+ * @retbuf: Buffer to store up to 4 return arguments in.
+ *
+ * This call supports up to 6 arguments and 4 return arguments. Use
+ * UCALL_BUFSIZE to size the return argument buffer.
+ */
+#if defined(CONFIG_PPC_UV)
+long ucall(unsigned long opcode, unsigned long *retbuf, ...);
+#endif
+
+#endif /* !__ASSEMBLY__ */
+
 #endif /* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index c8ca219e54bf..43ff4546e469 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -154,7 +154,7 @@ endif
 
 obj-$(CONFIG_EPAPR_PARAVIRT)   += epapr_paravirt.o epapr_hcalls.o
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvm_emul.o
-obj-$(CONFIG_PPC_UV)   += ultravisor.o
+obj-$(CONFIG_PPC_UV)   += ultravisor.o ucall.o
 
 # Disable GCOV, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
diff --git a/arch/powerpc/kernel/ucall.S b/arch/powerpc/kernel/ucall.S
new file mode 100644
index ..ecc88998a13b
--- /dev/null
+++ b/arch/powerpc/kernel/ucall.S
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Generic code to perform an ultravisor call.
+ *
+ * Copyright 2019, IBM Corporation.
+ *
+ */
+#include 
+
+/*
+ * This function is based on the plpar_hcall()
+ */
+_GLOBAL_TOC(ucall)
+   mr  r0,r3
+   std r4,STK_PARAM(R4)(r1) /* Save ret buffer */
+   mr  r3,r5
+   mr  r4,r6
+   mr  r5,r7
+   mr  r6,r8
+   mr  r7,r9
+   mr  r8,r10
+
+   sc 2/* invoke the ultravisor */
+
+   ld  r12,STK_PARAM(R4)(r1)
+   std r4,  0(r12)
+   std r5,  8(r12)
+   std r6, 16(r12)
+   std r7, 24(r12)
+
+   blr /* return r3 = status */
diff --git a/arch/powerpc/kernel/ultravisor.c b/arch/powerpc/kernel/ultravisor.c
index dc6021f63c97..02ddf79a9522 100644
--- a/arch/powerpc/kernel/ultravisor.c
+++ b/arch/powerpc/kernel/ultravisor.c
@@ -8,10 +8,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 
+/* in ucall.S */
+EXPORT_SYMBOL_GPL(ucall);
+
 int __init early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
 int depth, void *data)
 {
-- 
2.20.1



[PATCH v3 3/9] powerpc: Introduce FW_FEATURE_ULTRAVISOR

2019-06-06 Thread Claudio Carvalho
This feature indicates whether the ultravisor firmware is available to
handle ucalls.
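
Code can then guard direct hardware access with the usual firmware
feature check. A sketch of the pattern used by later patches in this
series:

	if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
		mtspr(SPRN_PTCR, ptcr);	/* only the HV writes PTCR directly */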

Signed-off-by: Claudio Carvalho 
[Device node name to "ibm,ultravisor"]
Signed-off-by: Michael Anderson 
---
 arch/powerpc/include/asm/firmware.h   |  5 +++--
 arch/powerpc/include/asm/ultravisor.h | 15 +++
 arch/powerpc/kernel/Makefile  |  1 +
 arch/powerpc/kernel/prom.c|  6 ++
 arch/powerpc/kernel/ultravisor.c  | 24 
 5 files changed, 49 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/ultravisor.h
 create mode 100644 arch/powerpc/kernel/ultravisor.c

diff --git a/arch/powerpc/include/asm/firmware.h 
b/arch/powerpc/include/asm/firmware.h
index 00bc42d95679..43b48c4d3ca9 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -54,6 +54,7 @@
 #define FW_FEATURE_DRC_INFOASM_CONST(0x0008)
 #define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0010)
 #define FW_FEATURE_PAPR_SCMASM_CONST(0x0020)
+#define FW_FEATURE_ULTRAVISOR  ASM_CONST(0x0040)
 
 #ifndef __ASSEMBLY__
 
@@ -72,9 +73,9 @@ enum {
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
-   FW_FEATURE_PAPR_SCM,
+   FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR,
FW_FEATURE_PSERIES_ALWAYS = 0,
-   FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
+   FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
FW_FEATURE_POWERNV_ALWAYS = 0,
FW_FEATURE_PS3_POSSIBLE = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
FW_FEATURE_PS3_ALWAYS = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
diff --git a/arch/powerpc/include/asm/ultravisor.h 
b/arch/powerpc/include/asm/ultravisor.h
new file mode 100644
index ..e5009b0d84ea
--- /dev/null
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Ultravisor definitions
+ *
+ * Copyright 2019, IBM Corporation.
+ *
+ */
+#ifndef _ASM_POWERPC_ULTRAVISOR_H
+#define _ASM_POWERPC_ULTRAVISOR_H
+
+/* Internal functions */
+extern int early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
+int depth, void *data);
+
+#endif /* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 0ea6c4aa3a20..c8ca219e54bf 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -154,6 +154,7 @@ endif
 
 obj-$(CONFIG_EPAPR_PARAVIRT)   += epapr_paravirt.o epapr_hcalls.o
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvm_emul.o
+obj-$(CONFIG_PPC_UV)   += ultravisor.o
 
 # Disable GCOV, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 4221527b082f..8a9a8a319959 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -713,6 +714,11 @@ void __init early_init_devtree(void *params)
of_scan_flat_dt(early_init_dt_scan_fw_dump, NULL);
 #endif
 
+#if defined(CONFIG_PPC_UV)
+   /* Scan tree for ultravisor feature */
+   of_scan_flat_dt(early_init_dt_scan_ultravisor, NULL);
+#endif
+
/* Retrieve various informations from the /chosen node of the
 * device-tree, including the platform type, initrd location and
 * size, TCE reserve, and more ...
diff --git a/arch/powerpc/kernel/ultravisor.c b/arch/powerpc/kernel/ultravisor.c
new file mode 100644
index ..dc6021f63c97
--- /dev/null
+++ b/arch/powerpc/kernel/ultravisor.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Ultravisor high level interfaces
+ *
+ * Copyright 2019, IBM Corporation.
+ *
+ */
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+int __init early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
+int depth, void *data)
+{
+   if (depth != 1 || strcmp(uname, "ibm,ultravisor") != 0)
+   return 0;
+
+   powerpc_firmware_features |= FW_FEATURE_ULTRAVISOR;
+   pr_debug("Ultravisor detected!\n");
+   return 1;
+}
-- 
2.20.1



[PATCH v3 1/9] KVM: PPC: Ultravisor: Add PPC_UV config option

2019-06-06 Thread Claudio Carvalho
From: Anshuman Khandual 

CONFIG_PPC_UV adds support for the ultravisor.

Signed-off-by: Anshuman Khandual 
Signed-off-by: Bharata B Rao 
Signed-off-by: Ram Pai 
[Update config help and commit message]
Signed-off-by: Claudio Carvalho 
---
 arch/powerpc/Kconfig | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..276c1857c335 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -439,6 +439,26 @@ config PPC_TRANSACTIONAL_MEM
---help---
  Support user-mode Transactional Memory on POWERPC.
 
+config PPC_UV
+   bool "Ultravisor support"
+   depends on KVM_BOOK3S_HV_POSSIBLE
+   select HMM_MIRROR
+   select HMM
+   select ZONE_DEVICE
+   select MIGRATE_VMA_HELPER
+   select DEV_PAGEMAP_OPS
+   select DEVICE_PRIVATE
+   select MEMORY_HOTPLUG
+   select MEMORY_HOTREMOVE
+   default n
+   help
+ This option paravirtualizes the kernel to run in POWER platforms that
+ supports the Protected Execution Facility (PEF). In such platforms,
+ the ultravisor firmware runs at a privilege level above the
+ hypervisor.
+
+ If unsure, say "N".
+
 config LD_HEAD_STUB_CATCH
bool "Reserve 256 bytes to cope with linker stubs in HEAD text" if 
EXPERT
depends on PPC64
-- 
2.20.1



[PATCH v3 2/9] KVM: PPC: Ultravisor: Introduce the MSR_S bit

2019-06-06 Thread Claudio Carvalho
From: Sukadev Bhattiprolu 

The ultravisor processor mode is introduced on POWER platforms that
support the Protected Execution Facility (PEF). Ultravisor mode is more
privileged than hypervisor mode.

On PEF-enabled platforms, the MSR_S bit is used to indicate whether the
thread is in secure state. With the MSR_S bit, the privilege state of
the thread is now determined by MSR_S, MSR_HV and MSR_PR, as follows:

S   HV  PR
---
0   x   1   problem
1   0   1   problem
x   x   0   privileged
x   1   0   hypervisor
1   1   0   ultravisor
1   1   1   reserved

The hypervisor doesn't (and can't) run with the MSR_S bit set, but a
secure guest and the ultravisor firmware do.
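
For illustration only (not part of this patch), the table collapses to a
simple predicate on the three bits:

	/* Illustrative sketch: S=1, HV=1, PR=0 is the ultravisor state */
	static inline bool msr_is_ultravisor(unsigned long msr)
	{
		return (msr & (MSR_S | MSR_HV | MSR_PR)) == (MSR_S | MSR_HV);
	}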

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Ram Pai 
[Update the commit message]
Signed-off-by: Claudio Carvalho 
---
 arch/powerpc/include/asm/reg.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 10caa145f98b..39b4c0a519f5 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -38,6 +38,7 @@
 #define MSR_TM_LG  32  /* Trans Mem Available */
 #define MSR_VEC_LG 25  /* Enable AltiVec */
 #define MSR_VSX_LG 23  /* Enable VSX */
+#define MSR_S_LG   22  /* Secure VM bit */
 #define MSR_POW_LG 18  /* Enable Power Management */
 #define MSR_WE_LG  18  /* Wait State Enable */
 #define MSR_TGPR_LG17  /* TLB Update registers in use */
@@ -71,11 +72,13 @@
 #define MSR_SF __MASK(MSR_SF_LG)   /* Enable 64 bit mode */
 #define MSR_ISF__MASK(MSR_ISF_LG)  /* Interrupt 64b mode 
valid on 630 */
 #define MSR_HV __MASK(MSR_HV_LG)   /* Hypervisor state */
+#define MSR_S  __MASK(MSR_S_LG)/* Secure state */
 #else
 /* so tests for these bits fail on 32-bit */
 #define MSR_SF 0
 #define MSR_ISF0
 #define MSR_HV 0
+#define MSR_S  0
 #endif
 
 /*
-- 
2.20.1



Re: [PATCH] powerpc/64s: Fix THP PMD collapse serialisation

2019-06-06 Thread Aneesh Kumar K.V
Nicholas Piggin  writes:

> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
> in pte helpers") changed the actual bitwise tests in pte_access_permitted
> by using pte_write() and pte_present() helpers rather than raw bitwise
> testing _PAGE_WRITE and _PAGE_PRESENT bits.
>
> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
> synchronize access from lock-free lookups. pte_access_permitted is used by
> pmd_access_permitted, so allowing GUP lock free access to proceed with
> such PTEs breaks this synchronisation.
>
> This bug has been observed on HPT host, with random crashes and corruption
> in guests, usually together with bad PMD messages in the host.
>
> Fix this by adding an explicit check in pmd_access_permitted, and
> documenting the condition explicitly.
>
> The pte_write() change should be okay, and would prevent GUP from falling
> back to the slow path when encountering savedwrite ptes, which matches
> what x86 (that does not implement savedwrite) does.
>

I guess we are doing the find_linux_pte change in another patch.

Reviewed-by: Aneesh Kumar K.V 

> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in 
> pte helpers")
> Cc: Aneesh Kumar K.V 
> Cc: Christophe Leroy 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++-
>  arch/powerpc/mm/book3s64/pgtable.c   |  3 +++
>  2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 7dede2e34b70..aaa72aa1b765 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1092,7 +1092,24 @@ static inline int pmd_protnone(pmd_t pmd)
>  #define pmd_access_permitted pmd_access_permitted
>  static inline bool pmd_access_permitted(pmd_t pmd, bool write)
>  {
> - return pte_access_permitted(pmd_pte(pmd), write);
> + pte_t pte = pmd_pte(pmd);
> + unsigned long pteval = pte_val(pte);
> +
> + /*
> +  * pmdp_invalidate sets this combination (that is not caught by
> +  * !pte_present() check in pte_access_permitted), to prevent
> +  * lock-free lookups, as part of the serialize_against_pte_lookup()
> +  * synchronisation.
> +  *
> +  * This check inadvertently catches the case where the PTE's hardware
> +  * PRESENT bit is cleared while TLB is flushed, to work around
> +  * hardware TLB issues. This is suboptimal, but should not be hit
> +  * frequently and should be harmless.
> +  */
> + if ((pteval & _PAGE_INVALID) && !(pteval & _PAGE_PRESENT))
> + return false;
> +
> + return pte_access_permitted(pte, write);
>  }
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> b/arch/powerpc/mm/book3s64/pgtable.c
> index 16bda049187a..ff98b663c83e 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, 
> unsigned long address,
>   /*
>* This ensures that generic code that rely on IRQ disabling
>* to prevent a parallel THP split work as expected.
> +  *
> +  * Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
> +  * a special case check in pmd_access_permitted.
>*/
>   serialize_against_pte_lookup(vma->vm_mm);
>   return __pmd(old_pmd);
> -- 
> 2.20.1



[PATCH] powerpc/cacheflush: fix variable set but not used

2019-06-06 Thread Qian Cai
The powerpc flush_cache_vmap() is defined as a macro and never uses
both of its arguments, so it generates a compilation warning:

lib/ioremap.c: In function 'ioremap_page_range':
lib/ioremap.c:203:16: warning: variable 'start' set but not used
[-Wunused-but-set-variable]

Fix it by making it an inline function.

Signed-off-by: Qian Cai 
---
 arch/powerpc/include/asm/cacheflush.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index 74d60cfe8ce5..fd318f7c3eed 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -29,9 +29,12 @@
  * not expect this type of fault. flush_cache_vmap is not exactly the right
  * place to put this, but it seems to work well enough.
  */
-#define flush_cache_vmap(start, end)   do { asm volatile("ptesync" ::: 
"memory"); } while (0)
+static inline void flush_cache_vmap(unsigned long start, unsigned long end)
+{
+   asm volatile("ptesync" ::: "memory");
+}
 #else
-#define flush_cache_vmap(start, end)   do { } while (0)
+static inline void flush_cache_vmap(unsigned long start, unsigned long end) { }
 #endif
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-- 
1.8.3.1



Re: spidev.c driver on the ppc8247 (kernel 2.6.27.19)

2019-06-06 Thread siva krishna
Can you explain a little more on this?



--
Sent from: http://linuxppc.10917.n7.nabble.com/linuxppc-dev-f3.html


Re: spidev.c driver on the ppc8247 (kernel 2.6.27.19)

2019-06-06 Thread siva krishna
Hi,

Can you elaborate on this, i.e. which file needs to be modified, etc.?



--
Sent from: http://linuxppc.10917.n7.nabble.com/linuxppc-dev-f3.html


Re: PowerPC arch_ptrace() writes beyond thread_struct/task_struct

2019-06-06 Thread Radu Rendec
On Thu, 2019-06-06 at 07:15 +0200, Christophe Leroy wrote:
> 
> Le 05/06/2019 à 23:45, Radu Rendec a écrit :
> > Hi Everyone,
> > 
> > I'm seeing some weird memory corruption that I have been able to isolate
> > to arch_ptrace() [arch/powerpc/kernel/ptrace.c] and PTRACE_POKEUSR. I am
> > on PowerPC 32 (MPC8378), kernel 4.9.179.
> > 
> > It's not very easy for me to test on the latest kernel, but I guess
> > little has changed since 4.9 in either the architecture specific ptrace
> > code or PowerPC register data structures.
> > 
> > What happens is that gdb calls ptrace(PTRACE_POKEUSER) with addr=0x158.
> > This goes down to arch_ptrace() [arch/powerpc/kernel/ptrace.c], inside
> > `case PTRACE_POKEUSR`, on the branch that does this:
> > 
> >  memcpy(>thread.TS_FPR(fpidx), ,
> >  sizeof(long));
> > 
> > where:
> >  index = addr >> 2 = 0x56 = 86
> >  fpidx = index - PT_FPR0 = 86 - 48 = 38
> 
> In struct thread_fp_state, fpr field is u64, so I guess we should have 
> the following on PPC32:
> 
> fpidx = (index - PT_FPR0) >> 1;

I guess this would only apply to PPC32, since everything up to fpidx is
calculated in units of sizeof(long) - which is 4 on PPC32 and 8 on
PPC64. But fpr[0:31] is always u64.

It also looks odd that only sizeof(long) bytes are ever copied for any
given fpr[fpidx], which means one half of the u64 is never accessible on
PPC32.
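
To make the arithmetic concrete (values taken from the report above; the
shifted variant is the fix suggested by Christophe):

	index = addr >> 2;		/* 0x158 >> 2 = 0x56 = 86 */
	fpidx = index - PT_FPR0;	/* 86 - 48 = 38, past the end of fpr[0..31] */
	fpidx = (index - PT_FPR0) >> 1;	/* 38 >> 1 = 19, in bounds */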

One other thing I don't get is the "+1" in the definition of PT_FPSCR
for PPC32:

#define PT_FPSCR (PT_FPR0 + 2*32 + 1)

Looking at struct thread_fp_state, fpscr follows immediately after
fpr[31]. Is the FPSCR register only 32-bit on PPC32? Is it stored in the
2nd half of (struct thread_fp_state).fpscr? This line:

child->thread.fp_state.fpscr = data;

suggests so. And in that case, the "+1" in PT_FPSCR makes sense, but
only for big endian: assigning `data` (which is "long", 32-bit) to the
`fpscr` field (which is "u64") would go to the higher address, which is
indeed "+1" in units of 32-bit words.

Then there is also a problem in the condition that determines whether
memcpy() is used to access one of the fpr[0:31] or fpscr is assigned
directly:

if (fpidx < (PT_FPSCR - PT_FPR0))

The case when the supplied addr points to the lower half of fpscr (which
is unused on PPC32?) erroneously indexes into fpr[0:31].

Is there any documentation of what "addr" is supposed to mean?


> >  >thread.TS_FPR(fpidx) = (void *)child + 1296
> > 
> >  offsetof(struct task_struct, thread) = 960
> >  sizeof(struct thread_struct) = 336
> >  sizeof(struct task_struct) = 1296
> > 
> > In other words, the memcpy() call writes just beyond thread_struct
> > (which is also beyond task_struct, for that matter).
> > 
> > This should never get past the bounds checks for `index`, so perhaps
> > there is a mismatch between ptrace macros and the actual register data
> > structures layout.
> > 
> > I will continue to investigate, but I'm not familiar with the PowerPC
> > registers so it will take a while before I make sense of all the data
> > structures and macros. Hopefully this rings a bell to someone who is
> > already familiar with those and could figure out quickly what the
> > problem is.




Re: [PATCH] powerpc/nvdimm: Add support for multibyte read/write for metadata

2019-06-06 Thread Aneesh Kumar K.V
Michael Ellerman  writes:

> "Aneesh Kumar K.V"  writes:
>> Oliver  writes:
>>> On Sun, Jun 2, 2019 at 2:44 PM Aneesh Kumar K.V
>>>  wrote:
> ...
 diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
 b/arch/powerpc/platforms/pseries/papr_scm.c
 index 0176ce66673f..e33cebb8ee6c 100644
 --- a/arch/powerpc/platforms/pseries/papr_scm.c
 +++ b/arch/powerpc/platforms/pseries/papr_scm.c
 @@ -97,42 +97,102 @@ static int drc_pmem_unbind(struct papr_scm_priv *p)
  }

  static int papr_scm_meta_get(struct papr_scm_priv *p,
 -   struct nd_cmd_get_config_data_hdr *hdr)
 +struct nd_cmd_get_config_data_hdr *hdr)
  {
 unsigned long data[PLPAR_HCALL_BUFSIZE];
 +   unsigned long offset, data_offset;
 +   int len, read;
 int64_t ret;

 -   if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1)
 +   if ((hdr->in_offset + hdr->in_length) >= p->metadata_size)
 return -EINVAL;

 -   ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
 -   hdr->in_offset, 1);
 -
 -   if (ret == H_PARAMETER) /* bad DRC index */
 -   return -ENODEV;
 -   if (ret)
 -   return -EINVAL; /* other invalid parameter */
 -
 -   hdr->out_buf[0] = data[0] & 0xff;
 -
 +   for (len = hdr->in_length; len; len -= read) {
 +
 +   data_offset = hdr->in_length - len;
 +   offset = hdr->in_offset + data_offset;
 +
 +   if (len >= 8)
 +   read = 8;
 +   else if (len >= 4)
 +   read = 4;
 +   else if ( len >= 2)
 +   read = 2;
 +   else
 +   read = 1;
 +
 +   ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
 + offset, read);
 +
 +   if (ret == H_PARAMETER) /* bad DRC index */
 +   return -ENODEV;
 +   if (ret)
 +   return -EINVAL; /* other invalid parameter */
 +
 +   switch (read) {
 +   case 8:
 +   *(uint64_t *)(hdr->out_buf + data_offset) = 
 be64_to_cpu(data[0]);
 +   break;
 +   case 4:
 +   *(uint32_t *)(hdr->out_buf + data_offset) = 
 be32_to_cpu(data[0] & 0xffffffff);
 +   break;
> ...
>>>
>>> I assume you got the qemu bits sorted out with Shiva? Looks good otherwise.
>>
>> That is correct. I also tested with different xfer values (1, 2, 4, 8)
>> on both Qemu and PowerVM.
>
> With a big endian kernel?

I completed this testing and found new bugs in other parts of the code.

Thanks for the suggestion.

-aneesh



Re: [PATCH] powerpc/nvdimm: Add support for multibyte read/write for metadata

2019-06-06 Thread Aneesh Kumar K.V
Alexey Kardashevskiy  writes:

> On 02/06/2019 14:43, Aneesh Kumar K.V wrote:
>> SCM_READ/WRITE_MEATADATA hcall supports multibyte read/write. This patch
>> updates the metadata read/write to use 1, 2, 4 or 8 byte read/write as
>> mentioned in PAPR document.
>> 
>> READ/WRITE_METADATA hcall supports the 1, 2, 4, or 8 bytes read/write.
>> For other values hcall results H_P3.
>> 
>> Hypervisor stores the metadata contents in big-endian format and in-order
>> to enable read/write in different granularity, we need to switch the contents
>> to big-endian before calling HCALL.
>> 
>> Based on an patch from Oliver O'Halloran 
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  arch/powerpc/platforms/pseries/papr_scm.c | 104 +-
>>  1 file changed, 82 insertions(+), 22 deletions(-)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
>> b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 0176ce66673f..e33cebb8ee6c 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -97,42 +97,102 @@ static int drc_pmem_unbind(struct papr_scm_priv *p)
>>  }
>>  
>>  static int papr_scm_meta_get(struct papr_scm_priv *p,
>> -struct nd_cmd_get_config_data_hdr *hdr)
>> + struct nd_cmd_get_config_data_hdr *hdr)
>>  {
>>  unsigned long data[PLPAR_HCALL_BUFSIZE];
>> +unsigned long offset, data_offset;
>> +int len, read;
>>  int64_t ret;
>>  
>> -if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1)
>> +if ((hdr->in_offset + hdr->in_length) >= p->metadata_size)
>>  return -EINVAL;
>>  
>> -ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
>> -hdr->in_offset, 1);
>> -
>> -if (ret == H_PARAMETER) /* bad DRC index */
>> -return -ENODEV;
>> -if (ret)
>> -return -EINVAL; /* other invalid parameter */
>> -
>> -hdr->out_buf[0] = data[0] & 0xff;
>> -
>> +for (len = hdr->in_length; len; len -= read) {
>> +
>> +data_offset = hdr->in_length - len;
>> +offset = hdr->in_offset + data_offset;
>> +
>> +if (len >= 8)
>> +read = 8;
>> +else if (len >= 4)
>> +read = 4;
>> +else if ( len >= 2)
>
> Do not need a space before "len".

Will fix in the next update.

>
>
>> +read = 2;
>> +else
>> +read = 1;
>> +
>> +ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
>> +  offset, read);
>> +
>> +if (ret == H_PARAMETER) /* bad DRC index */
>> +return -ENODEV;
>> +if (ret)
>> +return -EINVAL; /* other invalid parameter */
>> +
>> +switch (read) {
>> +case 8:
>> +*(uint64_t *)(hdr->out_buf + data_offset) = 
>> be64_to_cpu(data[0]);
>> +break;
>> +case 4:
>> +*(uint32_t *)(hdr->out_buf + data_offset) = 
>> be32_to_cpu(data[0] & 0xffffffff);
>> +break;
>> +
>> +case 2:
>> +*(uint16_t *)(hdr->out_buf + data_offset) = 
>> be16_to_cpu(data[0] & 0xffff);
>> +break;
>> +
>> +case 1:
>> +*(uint32_t *)(hdr->out_buf + data_offset) = (data[0] & 
>> 0xff);
>
>
> Memory corruption, should be uint8_t*.

Good catch. That also resulted in an error on a big endian kernel. Will
fix that in the next update.
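
A minimal sketch of the corrected stores (helper name is hypothetical,
not the final patch; the point is the uint8_t store in the 1-byte case):

	static void papr_scm_put_chunk(void *out, unsigned long data, int read)
	{
		switch (read) {
		case 8:
			*(uint64_t *)out = be64_to_cpu(data);
			break;
		case 4:
			*(uint32_t *)out = be32_to_cpu(data & 0xffffffff);
			break;
		case 2:
			*(uint16_t *)out = be16_to_cpu(data & 0xffff);
			break;
		case 1:
			*(uint8_t *)out = data & 0xff;	/* was uint32_t: clobbered 3 bytes */
			break;
		}
	}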
>
>
>> +break;
>> +}
>> +}
>>  return 0;
>>  }
>>  
>>  static int papr_scm_meta_set(struct papr_scm_priv *p,
>> -struct nd_cmd_set_config_hdr *hdr)
>> + struct nd_cmd_set_config_hdr *hdr)
>>  {
>> +unsigned long offset, data_offset;
>> +int len, wrote;
>> +unsigned long data;
>> +__be64 data_be;
>>  int64_t ret;
>>  
>> -if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1)
>> +if ((hdr->in_offset + hdr->in_length) >= p->metadata_size)
>>  return -EINVAL;
>>  
>> -ret = plpar_hcall_norets(H_SCM_WRITE_METADATA,
>> -p->drc_index, hdr->in_offset, hdr->in_buf[0], 1);
>> -
>> -if (ret == H_PARAMETER) /* bad DRC index */
>> -return -ENODEV;
>> -if (ret)
>> -return -EINVAL; /* other invalid parameter */
>> +for (len = hdr->in_length; len; len -= wrote) {
>> +
>> +data_offset = hdr->in_length - len;
>> +offset = hdr->in_offset + data_offset;
>> +
>> +if (len >= 8) {
>> +data = *(uint64_t *)(hdr->in_buf + data_offset);
>> +data_be = cpu_to_be64(data);
>> +wrote = 8;
>> +} else if (len >= 4) {
>> +data = *(uint32_t *)(hdr->in_buf + data_offset);
>> +data &= 

Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-06 Thread Oliver
On Thu, Jun 6, 2019 at 5:17 PM Alistair Popple  wrote:
>
> I have been hitting EEH address errors testing this with some network
> cards which map/unmap DMA addresses more frequently. For example:
>
> PHB4 PHB#5 Diag-data (Version: 1)
> brdgCtl:0002
> RootSts:00060020 00402000 a0220008 00100107 0800
> PhbSts: 001c 001c
> Lem:00010080  0080
> PhbErr: 0280 0200 214898000240 
> a0084000
> RxeTceErr:  2000 2000 c000 
> 
> PblErr: 0002 0002  
> 
> RegbErr:0040 0040 61000c48 
> 
> PE[000] A/B: 8300b038 8000
>
> Interestingly the PE[000] A/B data is the same across different cards
> and drivers.

TCE page fault due to permissions, so odds are the DMA address was unmapped.

What cards did you get this with? I tried with one of the common
BCM5719 NICs and generated network traffic by using rsync to copy a
linux git tree to the system and it worked fine.


Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-06 Thread Christoph Hellwig
On Thu, Jun 06, 2019 at 08:57:49PM +1000, Benjamin Herrenschmidt wrote:
> > Wow... that's an odd amount. One thing we could possibly do is add code
> > to limit the amount of RAM when we detect that device
> 
> Sent too quickly... I mean that *or* force swiotlb at 30-bits on those 
> systems based
> on detecting the presence of that device in the device-tree.

swiotlb doesn't really help you, as these days swiotlb only matters for
the dma_map* case.  What would help is a ZONE_DMA that covers these
devices.  No need for the 24-bit zone x86 uses, but 30-bit would do it.

WIP patch for testing below:

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index b8286a2013b4..7a367ce87c41 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -319,6 +319,10 @@ struct vm_area_struct;
 #endif /* __ASSEMBLY__ */
 #include 
 
+#if 1 /* XXX: pmac?  dynamic discovery? */
+#define ARCH_ZONE_DMA_BITS 30
+#else
 #define ARCH_ZONE_DMA_BITS 31
+#endif
 
 #endif /* _ASM_POWERPC_PAGE_H */
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index cba29131bccc..2540d3b2588c 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -248,7 +248,8 @@ void __init paging_init(void)
   (long int)((top_of_ram - total_ram) >> 20));
 
 #ifdef CONFIG_ZONE_DMA
-   max_zone_pfns[ZONE_DMA] = min(max_low_pfn, 0x7fffUL >> PAGE_SHIFT);
+   max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
+   ((1UL << ARCH_ZONE_DMA_BITS) - 1) >> PAGE_SHIFT);
 #endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
 #ifdef CONFIG_HIGHMEM


[PATCH v1 5/5] crypto: talitos - drop icv_ool

2019-06-06 Thread Christophe Leroy
icv_ool is not used anymore, drop it.

Fixes: 9cc87bc3613b ("crypto: talitos - fix AEAD processing")
Signed-off-by: Christophe Leroy 
---
 drivers/crypto/talitos.c | 3 ---
 drivers/crypto/talitos.h | 2 --
 2 files changed, 5 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index b2de931de623..03b7a5d28fb0 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -1278,9 +1278,6 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct 
aead_request *areq,
 is_ipsec_esp && !encrypt);
tbl_off += ret;
 
-   /* ICV data */
-   edesc->icv_ool = !encrypt;
-
if (!encrypt && is_ipsec_esp) {
struct talitos_ptr *tbl_ptr = >link_tbl[tbl_off];
 
diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h
index 95f78c6d9206..1469b956948a 100644
--- a/drivers/crypto/talitos.h
+++ b/drivers/crypto/talitos.h
@@ -46,7 +46,6 @@ struct talitos_desc {
  * talitos_edesc - s/w-extended descriptor
  * @src_nents: number of segments in input scatterlist
  * @dst_nents: number of segments in output scatterlist
- * @icv_ool: whether ICV is out-of-line
  * @iv_dma: dma address of iv for checking continuity and link table
  * @dma_len: length of dma mapped link_tbl space
  * @dma_link_tbl: bus physical address of link_tbl/buf
@@ -61,7 +60,6 @@ struct talitos_desc {
 struct talitos_edesc {
int src_nents;
int dst_nents;
-   bool icv_ool;
dma_addr_t iv_dma;
int dma_len;
dma_addr_t dma_link_tbl;
-- 
2.13.3



[PATCH v1 3/5] crypto: talitos - fix hash on SEC1.

2019-06-06 Thread Christophe Leroy
On SEC1, hash provides a wrong result when performing hashing in several
steps and the input data SG list has more than one element. This was
detected with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS:

[   44.185947] alg: hash: md5-talitos test failed (wrong result) on test vector 
6, cfg="random: may_sleep use_finup src_divs=[25.88%@+8063, 
24.19%@+9588, 28.63%@+16333, 4.60%@+6756, 16.70%@+16281] 
dst_divs=[71.61%@alignmask+16361, 14.36%@+7756, 14.3%@+"
[   44.325122] alg: hash: sha1-talitos test failed (wrong result) on test 
vector 3, cfg="random: inplace use_final src_divs=[16.56%@+16378, 
52.0%@+16329, 21.42%@alignmask+16380, 10.2%@alignmask+16380] 
iv_offset=39"
[   44.493500] alg: hash: sha224-talitos test failed (wrong result) on test 
vector 4, cfg="random: use_final nosimd src_divs=[52.27%@+7401, 
17.34%@+16285, 17.71%@+26, 12.68%@+10644] iv_offset=43"
[   44.673262] alg: hash: sha256-talitos test failed (wrong result) on test 
vector 4, cfg="random: may_sleep use_finup src_divs=[60.6%@+12790, 
17.86%@+1329, 12.64%@alignmask+16300, 8.29%@+15, 0.40%@+13506, 
0.51%@+16322, 0.24%@+16339] dst_divs"

This is due to two issues:
- There is an overlap between the buffer used for copying the input
data (SEC1 doesn't do scatter/gather) and the chained descriptor.
- The data copy is wrong when the previous hash step left less than one
blocksize of data to hash, which requires completing the previous
block with a few bytes from the new request.

This patch fixes it by:
- Moving the second descriptor after the buffer, as moving the buffer
after the descriptor would make it more complex for other cipher
operations (AEAD, ABLKCIPHER)
- Rebuilding a new data SG list without the bytes taken from the new
request to complete the previous one.

Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on 
SEC1")
Signed-off-by: Christophe Leroy 
---
 drivers/crypto/talitos.c | 63 ++--
 1 file changed, 40 insertions(+), 23 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 5b401aec6c84..4f03baef952b 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -336,15 +336,18 @@ static void flush_channel(struct device *dev, int ch, int 
error, int reset_ch)
tail = priv->chan[ch].tail;
while (priv->chan[ch].fifo[tail].desc) {
__be32 hdr;
+   struct talitos_edesc *edesc;
 
request = >chan[ch].fifo[tail];
+   edesc = container_of(request->desc, struct talitos_edesc, desc);
 
/* descriptors with their done bits set don't get the error */
rmb();
if (!is_sec1)
hdr = request->desc->hdr;
else if (request->desc->next_desc)
-   hdr = (request->desc + 1)->hdr1;
+   hdr = ((struct talitos_desc *)
+  (edesc->buf + edesc->dma_len))->hdr1;
else
hdr = request->desc->hdr1;
 
@@ -476,8 +479,14 @@ static u32 current_desc_hdr(struct device *dev, int ch)
}
}
 
-   if (priv->chan[ch].fifo[iter].desc->next_desc == cur_desc)
-   return (priv->chan[ch].fifo[iter].desc + 1)->hdr;
+   if (priv->chan[ch].fifo[iter].desc->next_desc == cur_desc) {
+   struct talitos_edesc *edesc;
+
+   edesc = container_of(priv->chan[ch].fifo[iter].desc,
+struct talitos_edesc, desc);
+   return ((struct talitos_desc *)
+   (edesc->buf + edesc->dma_len))->hdr;
+   }
 
return priv->chan[ch].fifo[iter].desc->hdr;
 }
@@ -1402,15 +1411,11 @@ static struct talitos_edesc *talitos_edesc_alloc(struct 
device *dev,
edesc->dst_nents = dst_nents;
edesc->iv_dma = iv_dma;
edesc->dma_len = dma_len;
-   if (dma_len) {
-   void *addr = >link_tbl[0];
-
-   if (is_sec1 && !dst)
-   addr += sizeof(struct talitos_desc);
-   edesc->dma_link_tbl = dma_map_single(dev, addr,
+   if (dma_len)
+   edesc->dma_link_tbl = dma_map_single(dev, >link_tbl[0],
 edesc->dma_len,
 DMA_BIDIRECTIONAL);
-   }
+
return edesc;
 }
 
@@ -1722,14 +1727,16 @@ static void common_nonsnoop_hash_unmap(struct device 
*dev,
struct talitos_private *priv = dev_get_drvdata(dev);
bool is_sec1 = has_ftr_sec1(priv);
struct talitos_desc *desc = >desc;
-   struct talitos_desc *desc2 = desc + 1;
+   struct talitos_desc *desc2 = (struct talitos_desc *)
+(edesc->buf + edesc->dma_len);
 
unmap_single_talitos_ptr(dev, >desc.ptr[5], DMA_FROM_DEVICE);
if (desc->next_desc &&
desc->ptr[5].ptr != desc2->ptr[5].ptr)

[PATCH v1 4/5] crypto: talitos - eliminate unneeded 'done' functions at build time

2019-06-06 Thread Christophe Leroy
When building for SEC1 only, talitos2_done functions are unneeded
and should go away.

For this, use has_ftr_sec1(), which will always return true when only
SEC1 support is being built, allowing GCC to drop the TALITOS2 functions.

Signed-off-by: Christophe Leroy 
---
 drivers/crypto/talitos.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 4f03baef952b..b2de931de623 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -3401,7 +3401,7 @@ static int talitos_probe(struct platform_device *ofdev)
if (err)
goto err_out;
 
-   if (of_device_is_compatible(np, "fsl,sec1.0")) {
+   if (has_ftr_sec1(priv)) {
if (priv->num_channels == 1)
tasklet_init(>done_task[0], talitos1_done_ch0,
 (unsigned long)dev);
-- 
2.13.3



[PATCH v1 2/5] crypto: talitos - move struct talitos_edesc into talitos.h

2019-06-06 Thread Christophe Leroy
The next patch will require struct talitos_edesc to be defined
earlier in talitos.c.

This patch moves it into talitos.h so that it can be used
from any place in talitos.c.

Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on 
SEC1")
Signed-off-by: Christophe Leroy 
---
 drivers/crypto/talitos.c | 30 --
 drivers/crypto/talitos.h | 30 ++
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 3b3e99f1cddb..5b401aec6c84 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -951,36 +951,6 @@ static int aead_des3_setkey(struct crypto_aead *authenc,
goto out;
 }
 
-/*
- * talitos_edesc - s/w-extended descriptor
- * @src_nents: number of segments in input scatterlist
- * @dst_nents: number of segments in output scatterlist
- * @icv_ool: whether ICV is out-of-line
- * @iv_dma: dma address of iv for checking continuity and link table
- * @dma_len: length of dma mapped link_tbl space
- * @dma_link_tbl: bus physical address of link_tbl/buf
- * @desc: h/w descriptor
- * @link_tbl: input and output h/w link tables (if {src,dst}_nents > 1) (SEC2)
- * @buf: input and output buffeur (if {src,dst}_nents > 1) (SEC1)
- *
- * if decrypting (with authcheck), or either one of src_nents or dst_nents
- * is greater than 1, an integrity check value is concatenated to the end
- * of link_tbl data
- */
-struct talitos_edesc {
-   int src_nents;
-   int dst_nents;
-   bool icv_ool;
-   dma_addr_t iv_dma;
-   int dma_len;
-   dma_addr_t dma_link_tbl;
-   struct talitos_desc desc;
-   union {
-   struct talitos_ptr link_tbl[0];
-   u8 buf[0];
-   };
-};
-
 static void talitos_sg_unmap(struct device *dev,
 struct talitos_edesc *edesc,
 struct scatterlist *src,
diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h
index 32ad4fc679ed..95f78c6d9206 100644
--- a/drivers/crypto/talitos.h
+++ b/drivers/crypto/talitos.h
@@ -42,6 +42,36 @@ struct talitos_desc {
 
 #define TALITOS_DESC_SIZE  (sizeof(struct talitos_desc) - sizeof(__be32))
 
+/*
+ * talitos_edesc - s/w-extended descriptor
+ * @src_nents: number of segments in input scatterlist
+ * @dst_nents: number of segments in output scatterlist
+ * @icv_ool: whether ICV is out-of-line
+ * @iv_dma: dma address of iv for checking continuity and link table
+ * @dma_len: length of dma mapped link_tbl space
+ * @dma_link_tbl: bus physical address of link_tbl/buf
+ * @desc: h/w descriptor
+ * @link_tbl: input and output h/w link tables (if {src,dst}_nents > 1) (SEC2)
+ * @buf: input and output buffeur (if {src,dst}_nents > 1) (SEC1)
+ *
+ * if decrypting (with authcheck), or either one of src_nents or dst_nents
+ * is greater than 1, an integrity check value is concatenated to the end
+ * of link_tbl data
+ */
+struct talitos_edesc {
+   int src_nents;
+   int dst_nents;
+   bool icv_ool;
+   dma_addr_t iv_dma;
+   int dma_len;
+   dma_addr_t dma_link_tbl;
+   struct talitos_desc desc;
+   union {
+   struct talitos_ptr link_tbl[0];
+   u8 buf[0];
+   };
+};
+
 /**
  * talitos_request - descriptor submission request
  * @desc: descriptor pointer (kernel virtual)
-- 
2.13.3



[PATCH v1 0/5] Additional fixes on Talitos driver

2019-06-06 Thread Christophe Leroy
This series is the last set of fixes for the Talitos driver.

We now get a fully clean boot on both SEC1 (SEC1.2 on mpc885) and
SEC2 (SEC2.2 on mpc8321E) with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS:

[3.385197] bus: 'platform': really_probe: probing driver talitos with 
device ff02.crypto
[3.450982] random: fast init done
[   12.252548] alg: No test for authenc(hmac(md5),cbc(aes)) 
(authenc-hmac-md5-cbc-aes-talitos-hsna)
[   12.262226] alg: No test for authenc(hmac(md5),cbc(des3_ede)) 
(authenc-hmac-md5-cbc-3des-talitos-hsna)
[   43.310737] Bug in SEC1, padding ourself
[   45.603318] random: crng init done
[   54.612333] talitos ff02.crypto: fsl,sec1.2 algorithms registered in 
/proc/crypto
[   54.620232] driver: 'talitos': driver_bound: bound to device 
'ff02.crypto'

[1.193721] bus: 'platform': really_probe: probing driver talitos with 
device b003.crypto
[1.229197] random: fast init done
[2.714920] alg: No test for authenc(hmac(sha224),cbc(aes)) 
(authenc-hmac-sha224-cbc-aes-talitos)
[2.724312] alg: No test for authenc(hmac(sha224),cbc(aes)) 
(authenc-hmac-sha224-cbc-aes-talitos-hsna)
[4.482045] alg: No test for authenc(hmac(md5),cbc(aes)) 
(authenc-hmac-md5-cbc-aes-talitos)
[4.490940] alg: No test for authenc(hmac(md5),cbc(aes)) 
(authenc-hmac-md5-cbc-aes-talitos-hsna)
[4.500280] alg: No test for authenc(hmac(md5),cbc(des3_ede)) 
(authenc-hmac-md5-cbc-3des-talitos)
[4.509727] alg: No test for authenc(hmac(md5),cbc(des3_ede)) 
(authenc-hmac-md5-cbc-3des-talitos-hsna)
[6.631781] random: crng init done
[   11.521795] talitos b003.crypto: fsl,sec2.2 algorithms registered in 
/proc/crypto
[   11.529803] driver: 'talitos': driver_bound: bound to device 
'b003.crypto'

Christophe Leroy (5):
  crypto: talitos - fix ECB and CBC algs ivsize
  crypto: talitos - move struct talitos_edesc into talitos.h
  crypto: talitos - fix hash on SEC1.
  crypto: talitos - eliminate unneeded 'done' functions at build time
  crypto: talitos - drop icv_ool

 drivers/crypto/talitos.c | 104 ---
 drivers/crypto/talitos.h |  28 +
 2 files changed, 72 insertions(+), 60 deletions(-)

-- 
2.13.3



[PATCH v1 1/5] crypto: talitos - fix ECB and CBC algs ivsize

2019-06-06 Thread Christophe Leroy
commit d84cc9c9524e ("crypto: talitos - fix ECB algs ivsize")
wrongly modified the CBC algs' ivsize instead of the ECB algs' ivsize.

This restores the CBC algs' original ivsize and removes ECB's.

Signed-off-by: Christophe Leroy 
Fixes: d84cc9c9524e ("crypto: talitos - fix ECB algs ivsize")
---
 drivers/crypto/talitos.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 122ec6c85446..3b3e99f1cddb 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -2753,7 +2753,6 @@ static struct talitos_alg_template driver_algs[] = {
.cra_ablkcipher = {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
-   .ivsize = AES_BLOCK_SIZE,
.setkey = ablkcipher_aes_setkey,
}
},
@@ -2770,6 +2769,7 @@ static struct talitos_alg_template driver_algs[] = {
.cra_ablkcipher = {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
+   .ivsize = AES_BLOCK_SIZE,
.setkey = ablkcipher_aes_setkey,
}
},
@@ -2805,7 +2805,6 @@ static struct talitos_alg_template driver_algs[] = {
.cra_ablkcipher = {
.min_keysize = DES_KEY_SIZE,
.max_keysize = DES_KEY_SIZE,
-   .ivsize = DES_BLOCK_SIZE,
.setkey = ablkcipher_des_setkey,
}
},
@@ -2822,6 +2821,7 @@ static struct talitos_alg_template driver_algs[] = {
.cra_ablkcipher = {
.min_keysize = DES_KEY_SIZE,
.max_keysize = DES_KEY_SIZE,
+   .ivsize = DES_BLOCK_SIZE,
.setkey = ablkcipher_des_setkey,
}
},
@@ -2839,7 +2839,6 @@ static struct talitos_alg_template driver_algs[] = {
.cra_ablkcipher = {
.min_keysize = DES3_EDE_KEY_SIZE,
.max_keysize = DES3_EDE_KEY_SIZE,
-   .ivsize = DES3_EDE_BLOCK_SIZE,
.setkey = ablkcipher_des3_setkey,
}
},
@@ -2857,6 +2856,7 @@ static struct talitos_alg_template driver_algs[] = {
.cra_ablkcipher = {
.min_keysize = DES3_EDE_KEY_SIZE,
.max_keysize = DES3_EDE_KEY_SIZE,
+   .ivsize = DES3_EDE_BLOCK_SIZE,
.setkey = ablkcipher_des3_setkey,
}
},
-- 
2.13.3



Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-06 Thread Benjamin Herrenschmidt
On Thu, 2019-06-06 at 20:56 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2019-06-06 at 12:31 +0300, Aaro Koskinen wrote:
> > Hi,
> > 
> > On Thu, Jun 06, 2019 at 10:54:51AM +1000, Benjamin Herrenschmidt
> > wrote:
> > > On Thu, 2019-06-06 at 01:50 +0300, Aaro Koskinen wrote:
> > > > Hi,
> > > > 
> > > > When upgrading from v5.0 -> v5.1 on G4 PowerBook, I noticed WLAN
> > > > does
> > > > not work anymore:
> > > > 
> > > > [   42.004303] b43legacy-phy0: Loading firmware version 0x127,
> > > > patch level 14 (2005-04-18 02:36:27)
> > > > [   42.184837] b43legacy-phy0 debug: Chip initialized
> > > > [   42.184873] b43legacy-phy0 ERROR: The machine/kernel does not
> > > > support the required 30-bit DMA mask
> > > > 
> > > > The same happens with the current mainline.
> > > 
> > > How much RAM do you have ?
> > 
> > The system has 1129 MB RAM. Booting with mem=1G makes it work.
> 
> Wow... that's an odd amount. One thing we could possibly do is add code
> to limit the amount of RAM when we detect that device

Sent too quickly... I mean that *or* force swiotlb at 30-bits on those systems 
based
on detecting the presence of that device in the device-tree.

Cheers,
Ben.




Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-06 Thread Benjamin Herrenschmidt
On Thu, 2019-06-06 at 12:31 +0300, Aaro Koskinen wrote:
> Hi,
> 
> On Thu, Jun 06, 2019 at 10:54:51AM +1000, Benjamin Herrenschmidt
> wrote:
> > On Thu, 2019-06-06 at 01:50 +0300, Aaro Koskinen wrote:
> > > Hi,
> > > 
> > > When upgrading from v5.0 -> v5.1 on G4 PowerBook, I noticed WLAN
> > > does
> > > not work anymore:
> > > 
> > > [   42.004303] b43legacy-phy0: Loading firmware version 0x127,
> > > patch level 14 (2005-04-18 02:36:27)
> > > [   42.184837] b43legacy-phy0 debug: Chip initialized
> > > [   42.184873] b43legacy-phy0 ERROR: The machine/kernel does not
> > > support the required 30-bit DMA mask
> > > 
> > > The same happens with the current mainline.
> > 
> > How much RAM do you have ?
> 
> The system has 1129 MB RAM. Booting with mem=1G makes it work.

Wow... that's an odd amount. One thing we could possibly do is add code
to limit the amount of RAM when we detect that device.

Cheers,
Ben.




Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-06 Thread Aaro Koskinen
Hi,

On Thu, Jun 06, 2019 at 10:54:51AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2019-06-06 at 01:50 +0300, Aaro Koskinen wrote:
> > Hi,
> > 
> > When upgrading from v5.0 -> v5.1 on G4 PowerBook, I noticed WLAN does
> > not work anymore:
> > 
> > [   42.004303] b43legacy-phy0: Loading firmware version 0x127, patch level 
> > 14 (2005-04-18 02:36:27)
> > [   42.184837] b43legacy-phy0 debug: Chip initialized
> > [   42.184873] b43legacy-phy0 ERROR: The machine/kernel does not support 
> > the required 30-bit DMA mask
> > 
> > The same happens with the current mainline.
> 
> How much RAM do you have ?

The system has 1129 MB RAM. Booting with mem=1G makes it work.

A.


Re: [PATCH] ocxl: do not use C++ style comments in uapi header

2019-06-06 Thread Masahiro Yamada
Hi Michael,

On Wed, Jun 5, 2019 at 3:18 PM Andrew Donnellan  wrote:
>
> On 4/6/19 10:12 pm, Masahiro Yamada wrote:
> > On Tue, Jun 4, 2019 at 8:54 PM Frederic Barrat  
> > wrote:
> >>
> >>
> >>
> >> Le 04/06/2019 à 13:16, Masahiro Yamada a écrit :
> >>> Linux kernel tolerates C++ style comments these days. Actually, the
> >>> SPDX License tags for .c files start with //.
> >>>
> >>> On the other hand, uapi headers are written in more strict C, where
> >>> the C++ comment style is forbidden.
> >>>
> >>> Signed-off-by: Masahiro Yamada 
> >>> ---
> >>
> >> Thanks!
> >> Acked-by: Frederic Barrat 
> >>
> >
> > Please hold on this patch until
> > we get consensus about the C++ comment style.
> >
> > Discussion just started here:
> > https://lore.kernel.org/patchwork/patch/1083801/
>
> If you choose to proceed with this patch:
>
> Acked-by: Andrew Donnellan 

After some discussion,
the other one was applied to the media subsystem.

Please pick up this one with Frederic and Andrew's Ack.

Thanks.



-- 
Best Regards
Masahiro Yamada


Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception

2019-06-06 Thread Ravi Bangoria



On 6/6/19 12:59 PM, Ravi Bangoria wrote:
> Powerpc hw triggers the watchpoint before executing the instruction.
> To get trigger-after-execute behavior, the kernel emulates the
> instruction. If the instruction is 'load something into a non-
> volatile register', the exception handler should restore the emulated
> register state while returning, otherwise there will be
> register state corruption. For example, adding a watchpoint on a list
> can corrupt the list:
> 
>   # cat /proc/kallsyms | grep kthread_create_list
>   c121c8b8 d kthread_create_list
> 
> Add watchpoint on kthread_create_list->next:

s/kthread_create_list->next/kthread_create_list->prev/
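
(struct list_head is { *next, *prev }, so kthread_create_list.next sits at
c121c8b8 and .prev at c121c8b8 + 8 = c121c8c0, which is the address the
perf command above actually watches.)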



Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception

2019-06-06 Thread Naveen N. Rao

Ravi Bangoria wrote:

> Powerpc hw triggers the watchpoint before executing the instruction.
> To get trigger-after-execute behavior, the kernel emulates the
> instruction. If the instruction is 'load something into a non-
> volatile register', the exception handler should restore the emulated
> register state while returning, otherwise there will be
> register state corruption. For example, adding a watchpoint on a list
> can corrupt the list:
> 
>   # cat /proc/kallsyms | grep kthread_create_list
>   c121c8b8 d kthread_create_list
> 
> Add watchpoint on kthread_create_list->next:
> 
>   # perf record -e mem:0xc121c8c0
> 
> Run some workload such that a new kthread gets invoked. For example, I
> just logged out from the console:
> 
>   list_add corruption. next->prev should be prev (c1214e00), \
> but was c121c8b8. (next=c121c8b8).
>   WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
>   CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
>   ...
>   NIP __list_add_valid+0xb4/0xc0
>   LR __list_add_valid+0xb0/0xc0
>   Call Trace:
>   __list_add_valid+0xb0/0xc0 (unreliable)
>   __kthread_create_on_node+0xe0/0x260
>   kthread_create_on_node+0x34/0x50
>   create_worker+0xe8/0x260
>   worker_thread+0x444/0x560
>   kthread+0x160/0x1a0
>   ret_from_kernel_thread+0x5c/0x70
> 
> Signed-off-by: Ravi Bangoria 
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)



Awesome catch - this one has had a glorious run...
Fixes: 5aae8a5370802 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit 
server processors")

Reviewed-by: Naveen N. Rao 


- Naveen




Re: [PATCH 12/16] mm: consolidate the get_user_pages* implementations

2019-06-06 Thread John Hubbard

On 6/5/19 11:20 PM, Christoph Hellwig wrote:

> On Wed, Jun 05, 2019 at 11:01:17PM -0700, John Hubbard wrote:
>> I started reviewing this one patch, and it's kind of messy figuring out
>> if the code motion preserves everything because of
>> all the consolidation from other places, plus having to move things in
>> and out of the ifdef blocks.  So I figured I'd check and see if this is
>> going to make it past RFC status soon, and if it's going before or after
>> Ira's recent RFC ("RDMA/FS DAX truncate proposal").
>
> I don't like the huge moves either, but I can't really think of any
> better way to do it.  Proposals welcome, though.



One way would be to do it in two patches:

1) Move the code into gup.c, maybe at the bottom. Surround each function
or group of functions by whatever ifdefs they need.

2) Move code out of the bottom of gup.c, into the final location.

...but I'm not certain that will be that much better. In the spirit of
not creating gratuitous work for others, I could try it out and send
out something if it looks like it's noticeably easier to verify/review.

thanks,
--
John Hubbard
NVIDIA


[PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception

2019-06-06 Thread Ravi Bangoria
Powerpc hw triggers the watchpoint before executing the instruction.
To get trigger-after-execute behavior, the kernel emulates the
instruction. If the instruction is 'load something into a non-
volatile register', the exception handler should restore the emulated
register state while returning, otherwise there will be
register state corruption. For example, adding a watchpoint on a list
can corrupt the list:

  # cat /proc/kallsyms | grep kthread_create_list
  c121c8b8 d kthread_create_list

Add watchpoint on kthread_create_list->next:

  # perf record -e mem:0xc121c8c0

Run some workload such that a new kthread gets invoked. For example, I
just logged out from the console:

  list_add corruption. next->prev should be prev (c1214e00), \
but was c121c8b8. (next=c121c8b8).
  WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
  CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
  ...
  NIP __list_add_valid+0xb4/0xc0
  LR __list_add_valid+0xb0/0xc0
  Call Trace:
  __list_add_valid+0xb0/0xc0 (unreliable)
  __kthread_create_on_node+0xe0/0x260
  kthread_create_on_node+0x34/0x50
  create_worker+0xe8/0x260
  worker_thread+0x444/0x560
  kthread+0x160/0x1a0
  ret_from_kernel_thread+0x5c/0x70

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/exceptions-64s.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 9481a11..96de0d1 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1753,7 +1753,7 @@ handle_dabr_fault:
	ld	r5,_DSISR(r1)
	addi	r3,r1,STACK_FRAME_OVERHEAD
	bl	do_break
-12:	b	ret_from_except_lite
+12:	b	ret_from_except
 
 
 #ifdef CONFIG_PPC_BOOK3S_64
-- 
1.8.3.1
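
To spell out why the exit path matters, here is a condensed, illustrative
sketch of the emulation flow; the helper below is hypothetical, not the
actual hw_breakpoint code:

#include <asm/ptrace.h>
#include <asm/sstep.h>

/*
 * Illustrative sketch of trigger-after-execute emulation.  For
 * "ld r30,0(r9)", emulate_step() performs the load in software,
 * writing the result into regs->gpr[30], i.e. into the *saved*
 * register frame, and advances regs->nip past the instruction.
 */
static void watchpoint_emulate_sketch(struct pt_regs *regs, unsigned int instr)
{
	if (!emulate_step(regs, instr))
		return;	/* could not emulate; would single-step instead */

	/*
	 * regs->gpr[14..31] may now differ from the live non-volatile
	 * registers.  The saved values only reach the CPU if the
	 * exception exit restores the NVGPRs: ret_from_except does,
	 * ret_from_except_lite does not, hence the one-line fix above.
	 */
}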



Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-06 Thread Alistair Popple
I have been hitting EEH address errors testing this with some network
cards which map/unmap DMA addresses more frequently. For example:

PHB4 PHB#5 Diag-data (Version: 1)
brdgCtl:    0002
RootSts:    00060020 00402000 a0220008 00100107 0800
PhbSts:     001c 001c
Lem:        00010080  0080
PhbErr:     0280 0200 214898000240 a0084000
RxeTceErr:  2000 2000 c000
PblErr:     0002 0002
RegbErr:    0040 0040 61000c48
PE[000] A/B: 8300b038 8000

Interestingly the PE[000] A/B data is the same across different cards
and drivers.

- Alistair

On Wednesday, 5 June 2019 11:11:06 PM AEST Shawn Anastasio wrote:
> On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:
> > This is an attempt to allow DMA masks between 32..59 which are not large
> > enough to use either a PHB3 bypass mode or a sketchy bypass. Depending
> > on the max order, up to 40 is usually available.
> > 
> > 
> > This is based on v5.2-rc2.
> > 
> > Please comment. Thanks.
> 
> I have tested this patch set with an AMD GPU that's limited to <64bit
> DMA (I believe it's 40 or 42 bit). It successfully allows the card to
> operate without falling back to 32-bit DMA mode as it does without
> the patches.
> 
> Relevant kernel log message:
> ```
> [0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass
> ```
> 
> Tested-by: Shawn Anastasio 




Re: [PATCH v12 00/31] Speculative page faults

2019-06-06 Thread Haiyan Song
Hi Laurent,

Regression tests for the v12 patch series have been run on an Intel 2-socket
Skylake platform; some regressions were found by LKP-tools (Linux Kernel
Performance). Only the cases that had shown regressions on the v11 patch
series were tested.

The patch series was taken from https://github.com/ldu4/linux/tree/spf-v12.
Kernel commit:
  base: a297558ad4479e0c9c5c14f3f69fe43113f72d1c 
(v5.1-rc4-mmotm-2019-04-09-17-51)
  head: 02c5a1f984a8061d075cfd74986ac8aa01d81064 (spf-v12)

Benchmark: will-it-scale
Download link: https://github.com/antonblanchard/will-it-scale/tree/master
Metrics: will-it-scale.per_thread_ops=threads/nr_cpu
test box: lkp-skl-2sp8 (nr_cpu=72, memory=192G)
THP: enable / disable
nr_task: 100%

The following are the benchmark results; each case was tested 4 times.

a). Enable THP
                                              base   %stddev   change    head   %stddev
will-it-scale.page_fault3.per_thread_ops     63216   ±3%       -16.9%   52537   ±4%
will-it-scale.page_fault2.per_thread_ops     36862             -9.8%    33256

b). Disable THP
                                              base   %stddev   change    head   %stddev
will-it-scale.page_fault3.per_thread_ops     65111             -18.6%   53023   ±2%
will-it-scale.page_fault2.per_thread_ops     38164             -12.0%   33565

Best regards,
Haiyan Song

On Tue, Apr 16, 2019 at 03:44:51PM +0200, Laurent Dufour wrote:
> This is a port on kernel 5.1 of the work done by Peter Zijlstra to handle
> page fault without holding the mm semaphore [1].
> 
> The idea is to try to handle user space page faults without holding the
> mmap_sem. This should allow better concurrency for massively threaded
> processes since the page fault handler will not wait for other threads'
> memory layout changes to be done, assuming such a change is done in another part
> of the process's memory space. This type of page fault is named speculative
> page fault. If the speculative page fault fails because concurrency has
> been detected or because the underlying PMD or PTE tables are not yet
> allocated, it aborts its processing and a regular page fault is then
> tried.
> 
> The speculative page fault (SPF) has to look for the VMA matching the fault
> address without holding the mmap_sem; this is done by protecting the MM RB
> tree with RCU and by using a reference counter on each VMA. When fetching a
> VMA under the RCU protection, the VMA's reference counter is incremented to
> ensure that the VMA will not be freed behind our back during the SPF
> processing. Once that processing is done the VMA's reference counter is
> decremented. To ensure that a VMA is still present when walking the RB tree
> locklessly, the VMA's reference counter is incremented when that VMA is
> linked in the RB tree. When the VMA is unlinked from the RB tree, its
> reference counter will be decremented at the end of the RCU grace period,
> ensuring it will be available during this time. This means that the VMA
> freeing could be delayed and could delay the file closing for file
> mappings. Since the SPF handler is not able to manage file mappings, the file
> is closed synchronously and not during the RCU cleanup. This is safe since
> the page fault handler is aborting if a file pointer is associated to the
> VMA.
> 
> Using RCU fixes the overhead seen by Haiyan Song using the will-it-scale
> benchmark [2].
> 
> The VMA's attributes checked during the speculative page fault processing
> have to be protected against parallel changes. This is done by using a per
> VMA sequence lock. This sequence lock allows the speculative page fault
> handler to fast check for parallel changes in progress and to abort the
> speculative page fault in that case.
> 
> Once the VMA has been found, the speculative page fault handler would check
> the VMA's attributes to verify whether the page fault can be handled
> this way or not. Thus, the VMA is protected through a sequence lock which
> allows fast detection of concurrent VMA changes. If such a change is
> detected, the speculative page fault is aborted and a *classic* page fault
> is tried.  VMA sequence locking is added around modifications of the VMA
> attributes that are checked during the page fault.
> 
> When the PTE is fetched, the VMA is checked to see if it has been changed;
> once the page table is locked, the VMA is known to be valid, and any other
> change touching this PTE would need to take the page table lock, so no
> parallel change is possible at this time.
> 
> The locking of the PTE is done with interrupts disabled; this allows
> checking the PMD to ensure that there is not an ongoing collapsing
> operation. Since khugepaged first sets the PMD to pmd_none and then waits
> for the other CPUs to acknowledge the IPI, if the PMD is
> valid at the time the PTE is locked, we have the guarantee that the
> collapsing operation will have to wait on the PTE lock to move
> forward. This allows the SPF handler to map the PTE safely. If the PMD
> value is 
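
In outline, the speculative path described in the cover letter looks roughly
like this; a sketch only, where find_vma_rcu(), put_vma() and
vma->vm_sequence approximate the spf-v12 tree rather than quoting it:

#include <linux/mm.h>

/*
 * Rough outline of the speculative fault path described above.
 * Function and field names approximate the spf-v12 tree; this is
 * a sketch, not the actual implementation.
 */
static vm_fault_t speculative_fault_sketch(struct mm_struct *mm,
					   unsigned long address)
{
	struct vm_area_struct *vma;
	vm_fault_t ret = VM_FAULT_RETRY;	/* i.e. try the classic path */
	unsigned int seq;

	rcu_read_lock();
	vma = find_vma_rcu(mm, address);	/* takes a VMA reference */
	if (!vma)
		goto out_unlock;

	seq = raw_read_seqcount(&vma->vm_sequence);
	if (seq & 1)		/* a VMA change is in progress: abort */
		goto out_put;

	/*
	 * Validate vma->vm_flags, then walk the page tables with
	 * interrupts disabled (to fend off khugepaged's IPI), lock the
	 * PTE and re-check the sequence count before mapping the page.
	 */
	if (read_seqcount_retry(&vma->vm_sequence, seq))
		goto out_put;	/* concurrent VMA change: abort */

	ret = 0;		/* the PTE would be handled here */
out_put:
	put_vma(vma);
out_unlock:
	rcu_read_unlock();
	return ret;
}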

Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-06 Thread Christoph Hellwig
On Wed, Jun 05, 2019 at 10:06:18PM -0500, Larry Finger wrote:
> First of all, you have my sympathy for the laborious bisection on a 
> PowerBook G4. I have done several myself. Thank you.
>
> I confirm your results.
>
> The ppc code has a maximum DMA size of 31 bits, thus a 32-bit request will 
> fail. Why the 30-bit fallback fails in b43legacy fails while it works in 
> b43 is a mystery.
>
> Although dma_nommu_dma_supported() may be "largely identical" to 
> dma_direct_supported(), they obviously differ. Routine 
> dma_nommu_dma_supported() returns 1 for 32-bit systems, but I do not know 
> what dma_direct_supported() returns.
>
> I am trying to find a patch.

	if (IS_ENABLED(CONFIG_ZONE_DMA))
		min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
	else
		min_mask = DMA_BIT_MASK(32);

	min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
	return mask >= __phys_to_dma(dev, min_mask);

So the smaller of:

 (1) 32-bit
 (2) ARCH_ZONE_DMA_BITS
 (3) the actual amount of memory in the system

modulo any DMA offsets that come into play.

No offsets should exists on pmac, and ARCH_ZONE_DMA_BITS is 31 on
powerpc.  So unless the system has 1GB or less memory it will probably
return false for b43, because it can't actually guarantee reliable
allocation.  It will work fine on x86 with the smaller ZONE_DMA.
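
Worked through for the report above, assuming 4K pages and no DMA offset:
with 1129 MB of RAM, (max_pfn - 1) << PAGE_SHIFT comes to roughly
0x468ff000, so min_mask ends up there (below DMA_BIT_MASK(31)), and
b43legacy's 30-bit mask 0x3fffffff fails the comparison.  With mem=1G the
memory top drops to 0x3ffff000 and the same 30-bit mask passes, which
matches what Aaro sees.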


Re: [PATCH 12/16] mm: consolidate the get_user_pages* implementations

2019-06-06 Thread Christoph Hellwig
On Wed, Jun 05, 2019 at 11:01:17PM -0700, John Hubbard wrote:
> I started reviewing this one patch, and it's kind of messy figuring out 
> if the code motion preserves everything because of
> all the consolidation from other places, plus having to move things in
> and out of the ifdef blocks.  So I figured I'd check and see if this is
> going to make it past RFC status soon, and if it's going before or after
> Ira's recent RFC ("RDMA/FS DAX truncate proposal").

I don't like the huge moves either, but I can't really think of any
better way to do it.  Proposals welcome, though.


Re: [PATCH 12/16] mm: consolidate the get_user_pages* implementations

2019-06-06 Thread John Hubbard
On 6/1/19 12:49 AM, Christoph Hellwig wrote:
> Always build mm/gup.c, and move the nommu versions and replace the
> separate stubs for various functions by the default ones, with the _fast
> version always falling back to the slow path because gup_fast_permitted
> always returns false now if HAVE_FAST_GUP is not set, and we use the
> nommu version of __get_user_pages while keeping all the wrappers common.
> 
> This also ensures the new put_user_pages* helpers are available for
> nommu, as those are currently missing, which would create a problem as
> soon as we actually grew users for it.
> 

Hi Christoph,

Thanks for fixing up the nommu case. And the patchset overall is a huge
relief to see, because I'd filed those arches under the "despair" category
for the gup conversions. :)

I started reviewing this one patch, and it's kind of messy figuring out 
if the code motion preserves everything because of
all the consolidation from other places, plus having to move things in
and out of the ifdef blocks.  So I figured I'd check and see if this is
going to make it past RFC status soon, and if it's going before or after
Ira's recent RFC ("RDMA/FS DAX truncate proposal").


thanks,
-- 
John Hubbard
NVIDIA

> Signed-off-by: Christoph Hellwig 
> ---
>  mm/Kconfig  |   1 +
>  mm/Makefile |   4 +-
>  mm/gup.c| 476 +---
>  mm/nommu.c  |  88 --
>  mm/util.c   |  47 --
>  5 files changed, 269 insertions(+), 347 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 98dffb0f2447..5c41409557da 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -133,6 +133,7 @@ config HAVE_MEMBLOCK_PHYS_MAP
>   bool
>  
>  config HAVE_FAST_GUP
> + depends on MMU
>   bool
>  
>  config ARCH_KEEP_MEMBLOCK
> diff --git a/mm/Makefile b/mm/Makefile
> index ac5e5ba78874..dc0746ca1109 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -22,7 +22,7 @@ KCOV_INSTRUMENT_mmzone.o := n
>  KCOV_INSTRUMENT_vmstat.o := n
>  
>  mmu-y:= nommu.o
> -mmu-$(CONFIG_MMU):= gup.o highmem.o memory.o mincore.o \
> +mmu-$(CONFIG_MMU):= highmem.o memory.o mincore.o \
>  mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \
>  msync.o page_vma_mapped.o pagewalk.o \
>  pgtable-generic.o rmap.o vmalloc.o
> @@ -39,7 +39,7 @@ obj-y   := filemap.o mempool.o 
> oom_kill.o fadvise.o \
>  mm_init.o mmu_context.o percpu.o slab_common.o \
>  compaction.o vmacache.o \
>  interval_tree.o list_lru.o workingset.o \
> -debug.o $(mmu-y)
> +debug.o gup.o $(mmu-y)
>  
>  # Give 'page_alloc' its own module-parameter namespace
>  page-alloc-y := page_alloc.o
> diff --git a/mm/gup.c b/mm/gup.c
> index a24f52292c7f..c8da7764de9c 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -134,6 +134,7 @@ void put_user_pages(struct page **pages, unsigned long 
> npages)
>  }
>  EXPORT_SYMBOL(put_user_pages);
>  
> +#ifdef CONFIG_MMU
>  static struct page *no_page_table(struct vm_area_struct *vma,
>   unsigned int flags)
>  {
> @@ -1099,86 +1100,6 @@ static __always_inline long 
> __get_user_pages_locked(struct task_struct *tsk,
>   return pages_done;
>  }
>  
> -/*
> - * We can leverage the VM_FAULT_RETRY functionality in the page fault
> - * paths better by using either get_user_pages_locked() or
> - * get_user_pages_unlocked().
> - *
> - * get_user_pages_locked() is suitable to replace the form:
> - *
> - *  down_read(&mm->mmap_sem);
> - *  do_something()
> - *  get_user_pages(tsk, mm, ..., pages, NULL);
> - *  up_read(&mm->mmap_sem);
> - *
> - *  to:
> - *
> - *  int locked = 1;
> - *  down_read(&mm->mmap_sem);
> - *  do_something()
> - *  get_user_pages_locked(tsk, mm, ..., pages, &locked);
> - *  if (locked)
> - *  up_read(&mm->mmap_sem);
> - */
> -long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
> -unsigned int gup_flags, struct page **pages,
> -int *locked)
> -{
> - /*
> -  * FIXME: Current FOLL_LONGTERM behavior is incompatible with
> -  * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
> -  * vmas.  As there are no users of this flag in this call we simply
> -  * disallow this option for now.
> -  */
> - if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
> - return -EINVAL;
> -
> - return __get_user_pages_locked(current, current->mm, start, nr_pages,
> -pages, NULL, locked,
> -gup_flags | FOLL_TOUCH);
> -}
> -EXPORT_SYMBOL(get_user_pages_locked);
> -
> -/*
> - * get_user_pages_unlocked() is suitable to replace the form:
> - *
> - *  down_read(&mm->mmap_sem);
> - *  get_user_pages(tsk, mm, ..., pages, NULL);
> - *  up_read(&mm->mmap_sem);
> - *