Re: How to handle PTE tables with non contiguous entries ?

2018-09-10 Thread Christophe LEROY




Le 10/09/2018 à 23:06, Nicholas Piggin a écrit :

On Mon, 10 Sep 2018 14:34:37 +
Christophe Leroy  wrote:


Hi,

I'm having a hard time figuring out the best way to handle the following
situation:

On the powerpc8xx, handling 16k size pages requires page tables
with 4 identical entries.

Initially I was thinking about handling this by simply modifying
pte_index() and changing the pte_t type in order to have one entry every
16 bytes, then replicating the PTE value at *ptep, *ptep+1, *ptep+2 and
*ptep+3 in both set_pte_at() and pte_update().

However, this doesn't work because many, many places in the mm core part
of the kernel use loops on ptep with a single ptep++ increment.

Therefore I did it with the following hack:

   /* PTE level */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
+#else
   typedef struct { pte_basic_t pte; } pte_t;
+#endif

@@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
  : "cc" );
   #else /* PTE_ATOMIC_UPDATES */
  unsigned long old = pte_val(*p);
-   *p = __pte((old & ~clr) | set);
+   unsigned long new = (old & ~clr) | set;
+
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+   p->pte = p->pte1 = p->pte2 = p->pte3 = new;
+#else
+   *p = __pte(new);
+#endif
   #endif /* !PTE_ATOMIC_UPDATES */

   #ifdef CONFIG_44x


@@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct
*mm, unsigned long addr,
  /* Anything else just stores the PTE normally. That covers all
64-bit
   * cases, and 32-bit non-hash with 32-bit PTEs.
   */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+   ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
+#else
  *ptep = pte;
+#endif



But I'm not too happy with it, as it means pte_t is no longer a single
type, so passing it from one function to another is quite heavy.


Would someone have an idea of an elegant way to handle that?


I can't think of anything better. Do we pass pte by value to a lot of
non-inlined functions? Possible to inline the important ones?


Good question, I need to check that.



Other option, try to get an iterator like pte = pte_next(pte) into core
code.


Yes, I've been thinking about that, but it looks like a huge job to 
identify all the places, as some drivers are also playing with it.
I'm not sure it is enough to find all 'pte++' and 'ptep++'; I fear there 
might be places using more unexpected names.
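
For illustration, a rough sketch (not existing kernel code, just what
such an iterator could look like) of an arch-overridable pte_next(),
with pte_t kept a single word and the 8xx/16k case stepping over the
four identical entries that back one 16k page:

#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
static inline pte_t *pte_next(pte_t *ptep)
{
	/* one 16k page is backed by 4 identical 4-byte entries */
	return ptep + 4;
}
#else
static inline pte_t *pte_next(pte_t *ptep)
{
	return ptep + 1;
}
#endif

Core code loops would then use 'ptep = pte_next(ptep)' instead of the
open-coded 'ptep++'.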


Thanks
Christophe



Thanks,
Nick



[PATCH kernel] powerpc/powernv/ioda2: Reduce upper limit for DMA window size (retry)

2018-09-10 Thread Alexey Kardashevskiy
We use the PHB in mode1, which uses bit 59 to select the correct DMA window.
However there is mode2, which uses bits 59:55 and allows up to 32 DMA
windows per PE.

Even though the documentation does not clearly specify it, it seems that
the actual hardware does not support bits 59:55 even in mode1; in other
words we can create a window as big as 1<<58 but DMA simply won't work.

This reduces the upper limit from 59 to 55 bits to let userspace know
about the hardware limits.

Fixes: ce57c6610cc2 "Merge branch 'topic/ppc-kvm' into next"
Signed-off-by: Alexey Kardashevskiy 
---

The merge commit says d3d4ffaae439 (the original of this one) was
propagated, but it was not:

[vpl1 kernel]$ git s ce57c6610cc2:arch/powerpc/platforms/powernv/pci-ioda-tce.c 
| grep 'page_shift >= '
if ((level_shift - 3) * levels + page_shift >= 60)
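
For reference, a small sketch (not part of the patch; the helper name is
made up) of the constraint being enforced: each TCE level holds
1 << (level_shift - 3) eight-byte entries, so the table covers
2^((level_shift - 3) * levels + page_shift) bytes, and the patch caps
that exponent below 55:

static bool pnv_window_too_big(unsigned int level_shift, unsigned int levels,
			       unsigned int page_shift)
{
	/* total window = 2^((level_shift - 3) * levels + page_shift) bytes */
	return (level_shift - 3) * levels + page_shift >= 55;
}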

---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c 
b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index 6c5db1a..fe96910 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -276,7 +276,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 
bus_offset,
level_shift = entries_shift + 3;
level_shift = max_t(unsigned int, level_shift, PAGE_SHIFT);
 
-   if ((level_shift - 3) * levels + page_shift >= 60)
+   if ((level_shift - 3) * levels + page_shift >= 55)
return -EINVAL;
 
/* Allocate TCE table */
-- 
2.11.0



Re: How to handle PTE tables with non contiguous entries ?

2018-09-10 Thread Christophe LEROY




Le 10/09/2018 à 22:05, Dan Malek a écrit :


Hello Christophe.


On Sep 10, 2018, at 7:34 AM, Christophe Leroy  wrote:

On the powerpc8xx, handling 16k size pages requires page tables with 4 
identical entries.


Do you think a 16k page is useful?  Back in the day, the goal was to keep the 
fault handling and management overhead as simple and generic as possible, as 
you know this affects the system performance.  I understand there would be 
fewer page faults and more efficient use of the MMU resources with 16k, but if 
this comes at an overhead cost, is it really worth it?


Yes, that's definitely useful. The current 16k implementation already 
provides nice results, but it is based on the Linux structure, which 
implies not being able to use the 8xx HW assistance in TLB miss handlers.


That's the reason why I'm trying to alter the Linux structure to match 
the 8xx page layout, hence the need for 4 entries in the PTE table for 
each 16k page.




In addition to the normal 4k mapping, I had thought about using 512k mapping, 
which could be easily detected at level 2 (PMD), with a single entry loaded 
into the MMU.  We would need an aux header or something from the 
executable/library to assist with knowing when this could be done.  I never got 
around to it. :)


Yes, 512k and 8M hugepages are implemented as well, but they are based 
on the Linux structure, hence requiring some time-consuming handling, 
like checking the page size on every miss in order to run the 
appropriate part of the handler.


With the HW layout, the 512k entries are spread every 128 bytes in the 
PTE table, but with those I don't have much of a problem because the 
hugepage code uses huge_pte_offset() and never increases the pte pointer 
directly.





The 8xx platforms tended to have smaller memory resources, so the 4k 
granularity was also useful in making better use of the available space.


Well, on my boards I have 128 Mbytes; 16k pages and hugepages have shown 
their benefit.





Would someone have an idea of an elegent way to handle that ?


My suggestion would be to not change the PTE table, but have the fault handler 
detect a 16k page and load any one of the four entries based upon miss offset.  
Kinda use the same 4k miss handler, but with 16k knowledge.  You wouldn’t save 
any PTE table space, but the MMU efficiency may be worth it.  As I recall, the 
hardware may ignore/mask any LS bits, and there is PMD level information to 
utilize as well.


That's exactly what I want to do, which means that every time pte++ is 
encountered in mm/memory.c and friends, the pointer needs to be pushed 
to the next 16k page, i.e. increased by 4 entries.




It’s been a long time since I’ve investigated how things have evolved, glad 
it’s still in use, and I hope you at least have some fun with the development :)


Thanks
Christophe



Re: [PATCH kernel v2 6/6] KVM: PPC: Remove redundand permission bits removal

2018-09-10 Thread David Gibson
On Mon, Sep 10, 2018 at 06:29:12PM +1000, Alexey Kardashevskiy wrote:
> The kvmppc_gpa_to_ua() helper itself takes care of the permission
> bits in the TCE and yet every single caller removes them.
> 
> This changes the semantics of kvmppc_gpa_to_ua() so it takes TCEs
> (which are GPAs + TCE permission bits) to make the callers simpler.
> 
> This should cause no behavioural change.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
> Changes:
> v2:
> * %s/kvmppc_gpa_to_ua/kvmppc_tce_to_ua/g
> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
>  arch/powerpc/kvm/book3s_64_vio.c| 12 
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 22 +-
>  3 files changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index 2f5d431..38d0328 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -194,7 +194,7 @@ extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
>   (iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
>   (stt)->size, (ioba), (npages)) ?\
>   H_PARAMETER : H_SUCCESS)
> -extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> +extern long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
>   unsigned long *ua, unsigned long **prmap);
>  extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
>   unsigned long idx, unsigned long tce);
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c 
> b/arch/powerpc/kvm/book3s_64_vio.c
> index 8231b17..c0c64d1 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -378,8 +378,7 @@ static long kvmppc_tce_validate(struct 
> kvmppc_spapr_tce_table *stt,
>   if (iommu_tce_check_gpa(stt->page_shift, gpa))
>   return H_TOO_HARD;
>  
> - if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> - , NULL))
> + if (kvmppc_tce_to_ua(stt->kvm, tce, , NULL))
>   return H_TOO_HARD;
>  
>   list_for_each_entry_rcu(stit, >iommu_tables, next) {
> @@ -552,8 +551,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned 
> long liobn,
>  
>   idx = srcu_read_lock(>kvm->srcu);
>  
> - if ((dir != DMA_NONE) && kvmppc_gpa_to_ua(vcpu->kvm,
> - tce & ~(TCE_PCI_READ | TCE_PCI_WRITE), , NULL)) {
> + if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, , NULL)) {
>   ret = H_PARAMETER;
>   goto unlock_exit;
>   }
> @@ -614,7 +612,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>   return ret;
>  
>   idx = srcu_read_lock(>kvm->srcu);
> - if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, , NULL)) {
> + if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, , NULL)) {
>   ret = H_TOO_HARD;
>   goto unlock_exit;
>   }
> @@ -649,9 +647,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>   }
>   tce = be64_to_cpu(tce);
>  
> - if (kvmppc_gpa_to_ua(vcpu->kvm,
> - tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> - , NULL))
> + if (kvmppc_tce_to_ua(vcpu->kvm, tce, , NULL))
>   return H_PARAMETER;
>  
>   list_for_each_entry_lockless(stit, >iommu_tables, next) {
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
> b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index adf3b21..389dac1 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -110,8 +110,7 @@ static long kvmppc_rm_tce_validate(struct 
> kvmppc_spapr_tce_table *stt,
>   if (iommu_tce_check_gpa(stt->page_shift, gpa))
>   return H_PARAMETER;
>  
> - if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> - , NULL))
> + if (kvmppc_tce_to_ua(stt->kvm, tce, , NULL))
>   return H_TOO_HARD;
>  
>   list_for_each_entry_lockless(stit, >iommu_tables, next) {
> @@ -180,10 +179,10 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>  
> -long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> +long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
>   unsigned long *ua, unsigned long **prmap)
>  {
> - unsigned long gfn = gpa >> PAGE_SHIFT;
> + unsigned long gfn = tce >> PAGE_SHIFT;
>   struct kvm_memory_slot *memslot;
>  
>   memslot = search_memslots(kvm_memslots(kvm), gfn);
> @@ -191,7 +190,7 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>   return -EINVAL;
>  
>   *ua = __gfn_to_hva_memslot(memslot, gfn) |
> - (gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
> + (tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
>  
>  #ifdef 

Re: [PATCH kernel v2 1/6] KVM: PPC: Avoid marking DMA-mapped pages dirty in real mode

2018-09-10 Thread David Gibson
On Mon, Sep 10, 2018 at 06:29:07PM +1000, Alexey Kardashevskiy wrote:
> At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm(),
> which in turn reads the old TCE and, if it was a valid entry, marks
> the physical page dirty if it was mapped for writing. Since this is
> real mode, realmode_pfn_to_page() is used instead of pfn_to_page()
> to get the page struct. However SetPageDirty() itself reads the compound
> page head and returns a virtual address for the head page struct, and
> setting the dirty bit on that kills the system.
> 
> This adds additional dirty bit tracking into the MM/IOMMU API for use
> in the real mode. Note that this does not change how VFIO and
> KVM (in virtual mode) set this bit. The KVM (real mode) changes include:
> - use the lowest bit of the cached host phys address to carry
> the dirty bit;
> - mark pages dirty when they are unpinned which happens when
> the preregistered memory is released which always happens in virtual
> mode;
> - add mm_iommu_ua_mark_dirty_rm() helper to set delayed dirty bit;
> - change iommu_tce_xchg_rm() to take the kvm struct for the mm to use
> in the new mm_iommu_ua_mark_dirty_rm() helper;
> - move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is the only
> caller anyway) to reduce the real mode KVM and IOMMU knowledge
> across different subsystems.
> 
> This removes realmode_pfn_to_page() as it is not used anymore.
> 
> While we are at it, remove some EXPORT_SYMBOL_GPL() as that code is for 
> the real mode only and modules cannot call it anyway.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 
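
(For readers unfamiliar with the low-bit trick described in the quoted
changelog, here is a generic sketch, not taken from the patch, of
stashing a dirty flag in the otherwise-unused low bit of an aligned
host physical address; the names are illustrative only:

#define HPA_DIRTY	0x1UL	/* bit 0 is free for any 2-byte-or-better alignment */

static inline unsigned long hpa_mark_dirty(unsigned long hpa)
{
	return hpa | HPA_DIRTY;
}

static inline bool hpa_test_dirty(unsigned long hpa)
{
	return hpa & HPA_DIRTY;
}

static inline unsigned long hpa_address(unsigned long hpa)
{
	return hpa & ~HPA_DIRTY;
}

The real patch applies this idea to the host physical addresses cached
by the MM/IOMMU preregistration code.)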

> ---
> Changes:
> v2:
> * only do delaying dirtying for the real mode
> * no change in VFIO IOMMU SPAPR TCE driver is needed anymore
> * inverted MM_IOMMU_TABLE_GROUP_PAGE_MASK
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  1 -
>  arch/powerpc/include/asm/iommu.h |  2 --
>  arch/powerpc/include/asm/mmu_context.h   |  1 +
>  arch/powerpc/kernel/iommu.c  | 25 --
>  arch/powerpc/kvm/book3s_64_vio_hv.c  | 39 +-
>  arch/powerpc/mm/init_64.c| 49 
> 
>  arch/powerpc/mm/mmu_context_iommu.c  | 34 ---
>  7 files changed, 62 insertions(+), 89 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 13a688f..2fdc865 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1051,7 +1051,6 @@ static inline void vmemmap_remove_mapping(unsigned long 
> start,
>   return hash__vmemmap_remove_mapping(start, page_size);
>  }
>  #endif
> -struct page *realmode_pfn_to_page(unsigned long pfn);
>  
>  static inline pte_t pmd_pte(pmd_t pmd)
>  {
> diff --git a/arch/powerpc/include/asm/iommu.h 
> b/arch/powerpc/include/asm/iommu.h
> index ab3a4fb..3d4b88c 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -220,8 +220,6 @@ extern void iommu_del_device(struct device *dev);
>  extern int __init tce_iommu_bus_notifier_init(void);
>  extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
>   unsigned long *hpa, enum dma_data_direction *direction);
> -extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> - unsigned long *hpa, enum dma_data_direction *direction);
>  #else
>  static inline void iommu_register_group(struct iommu_table_group 
> *table_group,
>   int pci_domain_number,
> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index b2f89b6..b694d6a 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -38,6 +38,7 @@ extern long mm_iommu_ua_to_hpa(struct 
> mm_iommu_table_group_mem_t *mem,
>   unsigned long ua, unsigned int pageshift, unsigned long *hpa);
>  extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
>   unsigned long ua, unsigned int pageshift, unsigned long *hpa);
> +extern void mm_iommu_ua_mark_dirty_rm(struct mm_struct *mm, unsigned long 
> ua);
>  extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
>  extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
>  #endif
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index af7a20d..19b4c62 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1013,31 +1013,6 @@ long iommu_tce_xchg(struct iommu_table *tbl, unsigned 
> long entry,
>  }
>  EXPORT_SYMBOL_GPL(iommu_tce_xchg);
>  
> -#ifdef CONFIG_PPC_BOOK3S_64
> -long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> - unsigned long *hpa, enum dma_data_direction *direction)
> -{
> - long ret;
> -
> - ret = tbl->it_ops->exchange_rm(tbl, 

[PATCH] powerpc/tm: Fix HFSCR bit for no suspend case

2018-09-10 Thread Michael Neuling
Currently on P9N DD2.1 we end up taking infinite TM facility
unavailable exceptions on the first TM usage by userspace.

In the special case of TM no suspend (P9N DD2.1), Linux is told TM is
off via CPU dt-ftrs but told to (partially) use it via
OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED. So HFSCR[TM] will be off from
dt-ftrs but we need to turn it on for the no suspend case.

This patch fixes this by enabling HFSCR TM in this case.

Cc: sta...@vger.kernel.org # 4.15+
Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/setup_64.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6a501b25dd..faf00222b3 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -243,13 +243,19 @@ static void cpu_ready_for_interrupts(void)
}
 
/*
-* Fixup HFSCR:TM based on CPU features. The bit is set by our
-* early asm init because at that point we haven't updated our
-* CPU features from firmware and device-tree. Here we have,
-* so let's do it.
+* Set HFSCR:TM based on CPU features:
+* In the special case of TM no suspend (P9N DD2.1), Linux is
+* told TM is off via the dt-ftrs but told to (partially) use
+* it via OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED. So HFSCR[TM]
+* will be off from dt-ftrs but we need to turn it on for the
+* no suspend case.
 */
-   if (cpu_has_feature(CPU_FTR_HVMODE) && 
!cpu_has_feature(CPU_FTR_TM_COMP))
-   mtspr(SPRN_HFSCR, mfspr(SPRN_HFSCR) & ~HFSCR_TM);
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   if (cpu_has_feature(CPU_FTR_TM_COMP))
+   mtspr(SPRN_HFSCR, mfspr(SPRN_HFSCR) | HFSCR_TM);
+   else
+   mtspr(SPRN_HFSCR, mfspr(SPRN_HFSCR) & ~HFSCR_TM);
+   }
 
/* Set IR and DR in PACA MSR */
get_paca()->kernel_msr = MSR_KERNEL;
-- 
2.17.1



[PATCH] powerpc/mpc85xx: fix issues in clock node

2018-09-10 Thread andy . tang
From: Yuantian Tang 

The compatible string in the clock node is not correct, and the clocks
property refers to the wrong node too. This patch fixes both.

Signed-off-by: Tang Yuantian 
---
 arch/powerpc/boot/dts/fsl/t1023si-post.dtsi |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/t1023si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/t1023si-post.dtsi
index 4908af5..763caf4 100644
--- a/arch/powerpc/boot/dts/fsl/t1023si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t1023si-post.dtsi
@@ -348,7 +348,7 @@
mux0: mux0@0 {
#clock-cells = <0>;
reg = <0x0 4>;
-   compatible = "fsl,core-mux-clock";
+   compatible = "fsl,qoriq-core-mux-2.0";
clocks = < 0>, < 1>;
clock-names = "pll0_0", "pll0_1";
clock-output-names = "cmux0";
@@ -356,9 +356,9 @@
mux1: mux1@20 {
#clock-cells = <0>;
reg = <0x20 4>;
-   compatible = "fsl,core-mux-clock";
-   clocks = < 0>, < 1>;
-   clock-names = "pll0_0", "pll0_1";
+   compatible = "fsl,qoriq-core-mux-2.0";
+   clocks = < 0>, < 1>;
+   clock-names = "pll1_0", "pll1_1";
clock-output-names = "cmux1";
};
};
-- 
1.7.1



Re: [PATCH] powerpc: Avoid code patching freed init sections

2018-09-10 Thread Paul Mackerras
On Mon, Sep 10, 2018 at 08:05:38PM +1000, Michael Neuling wrote:
> 
> > > + /* Make sure we aren't patching a freed init section */
> > > + if (in_init_section(patch_addr) && init_freed())
> > > + return 0;
> > > +
> > 
> > Do we even need the init_freed() check?
> 
> Maybe not.  If userspace isn't up, then maybe it's ok to skip.

Isn't this same function used for patching asm feature sections?  It's
not OK to skip patching them in init code.

> > What user input can we process in init-only code?
> 
> See the stack trace in the commit message. It's a weird case for KVM guests in
> KVM PR mode. 

The fault_in_pages_readable (formerly __get_user) there isn't actually
reading userspace, it's just a way of doing a load with a convenient
way to handle it if it traps.
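
Roughly, the pattern is (sketch only, not the actual KVM PR code path):

#include <linux/pagemap.h>

/*
 * fault_in_pages_readable() does the access under the exception table,
 * so a bad address just makes it return non-zero instead of oopsing;
 * the caller only cares whether the load would have succeeded.
 */
static bool addr_is_readable(const char __user *addr)
{
	return fault_in_pages_readable(addr, sizeof(u32)) == 0;
}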

Paul.


Re: MPC83xx reset status register (RSR, offset 0x910)

2018-09-10 Thread Radu Rendec
Hi,

On Mon, 2018-09-10 at 07:37 +0200, Christophe Leroy wrote:
> Le 10/09/2018 à 01:13, Radu Rendec a écrit :
> >
> > I'm using U-boot as well, but it's just not configured to read or clear
> > the RSR. I'm curious: if U-boot reads/clears the RSR in your case, how
> > do you make the initial value available to user space programs running
> > under Linux?
>
> I'm surprised. When looking at the U-boot code, I don't see any way to
> configure that. It seems to just be done by default in function cpu_init_f():
>
> https://elixir.bootlin.com/u-boot/v2018.07/source/arch/powerpc/cpu/mpc83xx/cpu_init.c#L217
>
> /* RSR - Reset Status Register - clear all status (4.6.1.3) */
> gd->arch.reset_status = __raw_readl(>reset.rsr);
> __raw_writel(~(RSR_RES), >reset.rsr);

I'm working as a contractor in a large embedded project, so I don't know
all the bits and pieces. I just checked the U-boot code. Whoever was
maintaining it, "configured" it by commenting out the __raw_writel()
that clears the register :)

Probably the reason was specifically to be able to read it from Linux,
but unfortunately the guy is not here any more to ask him.

It may make more sense to read it from U-boot, but (1) the value must
still be passed to Linux somehow and (2) in my case, I would prefer not
to touch U-boot.

> Do you know any user space program in Linux that needs this value ?

I don't know of any "standard" program that needs it. In the project I'm
working on, there are multiple peripherals on the board and initialization
is slightly different when the reset line is physically asserted vs. a
soft CPU reset. Besides, we need to show the reset reason to the user.

I guess in the embedded world this is a fairly common use case, so
perhaps others can benefit from that if I fix it in a way that can be
pushed upstream.

> > Thank you very much for the patches. Is there any chance they can be
> > submitted upstream?
>
> I see no problem submitting them upstream, but are they really worth it
> ? Adding Michael in copy to get his opinion.

I guess it's worth it if they are changed to make the value available to
the kernel and user space rather than just decoding/printing it, for the
reasons I mentioned above.

The MPC83xx also has a watchdog and the kernel driver (mpc8xxx_wdt.c)
could also be improved to support the WDIOC_GETBOOTSTATUS ioctl and
properly report if the system rebooted due to a watchdog.
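
Something along these lines (just a sketch: the RSR bit name is made up,
and the real driver would need to get hold of the saved RSR value, but
the watchdog core already answers WDIOC_GETBOOTSTATUS from
wdd->bootstatus):

#include <linux/watchdog.h>
#include <asm/io.h>

#define RSR_SWRS	0x00000008	/* hypothetical watchdog-reset bit */

static void mpc8xxx_wdt_set_bootstatus(struct watchdog_device *wdd,
				       u32 __iomem *rsr)
{
	u32 status = in_be32(rsr);	/* RSR is big-endian on MPC83xx */

	if (status & RSR_SWRS)
		wdd->bootstatus |= WDIOF_CARDRESET;
}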

> > I tried to look for something similar on other platforms or architectures,
> > but couldn't find anything.
>
> I believe the first thing is to identify some app needing such
> information, then we'll be able to investigate how to handle it.

Well, I guess I explained my reasons and use case. If there is any
interest in that, I will gladly implement it in a way that makes sense
to upstream. Let's see what Michael thinks.

Thanks,
Radu Rendec


Re: How to handle PTE tables with non contiguous entries ?

2018-09-10 Thread Dan Malek


Hello Christophe.

> On Sep 10, 2018, at 7:34 AM, Christophe Leroy  wrote:
> 
> On the powerpc8xx, handling 16k size pages requires page tables with 
> 4 identical entries.

Do you think a 16k page is useful?  Back in the day, the goal was to keep the 
fault handling and management overhead as simple and generic as possible, as 
you know this affects the system performance.  I understand there would be 
fewer page faults and more efficient use of the MMU resources with 16k, but if 
this comes at an overhead cost, is it really worth it?

In addition to the normal 4k mapping, I had thought about using 512k mapping, 
which could be easily detected at level 2 (PMD), with a single entry loaded 
into the MMU.  We would need an aux header or something from the 
executable/library to assist with knowing when this could be done.  I never got 
around to it. :)

The 8xx platforms tended to have smaller memory resources, so the 4k 
granularity was also useful in making better use of the available space.

> Would someone have an idea of an elegant way to handle that?

My suggestion would be to not change the PTE table, but have the fault handler 
detect a 16k page and load any one of the four entries based upon miss offset.  
Kinda use the same 4k miss handler, but with 16k knowledge.  You wouldn’t save 
any PTE table space, but the MMU efficiency may be worth it.  As I recall, the 
hardware may ignore/mask any LS bits, and there is PMD level information to 
utilize as well.

It’s been a long time since I’ve investigated how things have evolved, glad 
it’s still in use, and I hope you at least have some fun with the development :)

Thanks.

— Dan



Re: How to handle PTE tables with non contiguous entries ?

2018-09-10 Thread Nicholas Piggin
On Mon, 10 Sep 2018 14:34:37 +
Christophe Leroy  wrote:

> Hi,
> 
> I'm having a hard time figuring out the best way to handle the following 
> situation:
> 
> On the powerpc8xx, handling 16k size pages requires page tables 
> with 4 identical entries.
> 
> Initially I was thinking about handling this by simply modifying 
> pte_index() and changing the pte_t type in order to have one entry every 
> 16 bytes, then replicating the PTE value at *ptep, *ptep+1, *ptep+2 and 
> *ptep+3 in both set_pte_at() and pte_update().
> 
> However, this doesn't work because many, many places in the mm core part 
> of the kernel use loops on ptep with a single ptep++ increment.
> 
> Therefore I did it with the following hack:
> 
>   /* PTE level */
> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
> +typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
> +#else
>   typedef struct { pte_basic_t pte; } pte_t;
> +#endif
> 
> @@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
>  : "cc" );
>   #else /* PTE_ATOMIC_UPDATES */
>  unsigned long old = pte_val(*p);
> -   *p = __pte((old & ~clr) | set);
> +   unsigned long new = (old & ~clr) | set;
> +
> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
> +   p->pte = p->pte1 = p->pte2 = p->pte3 = new;
> +#else
> +   *p = __pte(new);
> +#endif
>   #endif /* !PTE_ATOMIC_UPDATES */
> 
>   #ifdef CONFIG_44x
> 
> 
> @@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct 
> *mm, unsigned long addr,
>  /* Anything else just stores the PTE normally. That covers all 
> 64-bit
>   * cases, and 32-bit non-hash with 32-bit PTEs.
>   */
> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
> +   ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
> +#else
>  *ptep = pte;
> +#endif
> 
> 
> 
> But I'm not too happy with it, as it means pte_t is no longer a single 
> type, so passing it from one function to another is quite heavy.
> 
> 
> Would someone have an idea of an elegant way to handle that?

I can't think of anything better. Do we pass pte by value to a lot of
non-inlined functions? Possible to inline the important ones?

Other option, try to get an iterator like pte = pte_next(pte) into core
code.

Thanks,
Nick


Re: [PATCH 2/5] include: add setbits32/clrbits32/clrsetbits32/setbits64/clrbits64/clrsetbits64 in linux/setbits.h

2018-09-10 Thread LABBE Corentin
On Mon, Sep 10, 2018 at 07:22:04AM +0200, Christophe LEROY wrote:
> 
> 
> Le 07/09/2018 à 21:41, Corentin Labbe a écrit :
> > This patch adds setbits32/clrbits32/clrsetbits32 and
> > setbits64/clrbits64/clrsetbits64 in linux/setbits.h header.
> 
> So you changed the name of setbits32() ... to setbits32_be() and now you 
> are adding new functions called setbits32() ... which do something 
> different ?
> 
> What will happen if any file has been forgotten during the conversion, 
> or if anybody has out-of-tree drivers and missed this change?
> They will silently compile successfully without any error or warning, 
> and the result will be buggy.
> 
> And why would it be more legitimate to have setbits32() be implicitly LE 
> instead of implicitly BE?
> 
> I really think those new functions should be called something like 
> setbits_le32() ...
> 

I believed that writel/readl were endian-agnostic, which explains my mistake.

I will use xxxbits_le32 as you request.
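
i.e. something along these lines (sketch only, final names and placement
still to be decided), built on readl/writel which operate on
little-endian registers:

#define setbits_le32(addr, set) \
	writel(readl(addr) | (set), (addr))
#define clrbits_le32(addr, clear) \
	writel(readl(addr) & ~(clear), (addr))
#define clrsetbits_le32(addr, clear, set) \
	writel((readl(addr) & ~(clear)) | (set), (addr))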

Thanks
Regards


Re: [PATCH 1/5] powerpc: rename setbits32/clrbits32 to setbits32_be/clrbits32_be

2018-09-10 Thread LABBE Corentin
On Mon, Sep 10, 2018 at 07:16:56AM +0200, Christophe LEROY wrote:
> 
> 
> Le 07/09/2018 à 21:41, Corentin Labbe a écrit :
> > Since setbits32/clrbits32 work on be32, it's better to remove ambiguity on
> > the used data type.
> 
> Wouldn't it be better to call them setbits_be32() / clrbits_be32() to 
> have something looking similar to in_be32() / out_be32() ?
> 

I agree, I will update the patch.

Thanks



Re: [PATCH 2/5] include: add setbits32/clrbits32/clrsetbits32/setbits64/clrbits64/clrsetbits64 in linux/setbits.h

2018-09-10 Thread LABBE Corentin
On Fri, Sep 07, 2018 at 03:00:40PM -0500, Scott Wood wrote:
> On Fri, 2018-09-07 at 19:41 +, Corentin Labbe wrote:
> > This patch adds setbits32/clrbits32/clrsetbits32 and
> > setbits64/clrbits64/clrsetbits64 in linux/setbits.h header.
> > 
> > Signed-off-by: Corentin Labbe 
> > ---
> >  include/linux/setbits.h | 55
> > +
> >  1 file changed, 55 insertions(+)
> >  create mode 100644 include/linux/setbits.h
> > 
> > diff --git a/include/linux/setbits.h b/include/linux/setbits.h
> > new file mode 100644
> > index ..3e1e273551bb
> > --- /dev/null
> > +++ b/include/linux/setbits.h
> > @@ -0,0 +1,55 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __LINUX_SETBITS_H
> > +#define __LINUX_SETBITS_H
> > +
> > +#include 
> > +
> > +#define __setbits(readfunction, writefunction, addr, set) \
> > +   writefunction((readfunction(addr) | (set)), addr)
> > +#define __clrbits(readfunction, writefunction, addr, mask) \
> > +   writefunction((readfunction(addr) & ~(mask)), addr)
> > +#define __clrsetbits(readfunction, writefunction, addr, mask, set) \
> > +   writefunction(((readfunction(addr) & ~(mask)) | (set)), addr)
> > +#define __setclrbits(readfunction, writefunction, addr, mask, set) \
> > +   writefunction(((readfunction(addr) | (set)) & ~(mask)), addr)
> > +
> > +#define setbits32(addr, set) __setbits(readl, writel, addr, set)
> > +#define setbits32_relaxed(addr, set) __setbits(readl_relaxed,
> > writel_relaxed, \
> > +  addr, set)
> > +
> > +#define clrbits32(addr, mask) __clrbits(readl, writel, addr, mask)
> > +#define clrbits32_relaxed(addr, mask) __clrbits(readl_relaxed,
> > writel_relaxed, \
> > +   addr, mask)
> 
> So now setbits32/clrbits32 is implicitly little-endian?  Introducing new
> implicit-endian accessors is probably a bad thing in general, but doing it
> with a name that until this patchset was implicitly big-endian (at least on
> powerpc) seems worse.  Why not setbits32_le()?
> 

I believed that writel/readl were endian-agnostic, but it seems that I was wrong.
So I will use _le32.

> 
> > +
> > +#define clrsetbits32(addr, mask, set) __clrsetbits(readl, writel, addr,
> > mask, set)
> > +#define clrsetbits32_relaxed(addr, mask, set) __clrsetbits(readl_relaxed, \
> > +  writel_relaxed,
> > \
> > +  addr, mask, set)
> > +
> > +#define setclrbits32(addr, mask, set) __setclrbits(readl, writel, addr,
> > mask, set)
> > +#define setclrbits32_relaxed(addr, mask, set) __setclrbits(readl_relaxed, \
> > +  writel_relaxed,
> > \
> > +  addr, mask, set)
> 
> What's the use case for setclrbits?  I don't see it used anywhere in this
> patchset (not even in the coccinelle patterns), it doesn't seem like it would
> be a common pattern, and it could easily get confused with clrsetbits.
> 

It is absent from the coccinelle script due to a copy/paste error,
and absent from the patchset since these are the only two examples that I can test.

If you run the next, fixed coccinelle script, you will find some setclrbits.
Since I fear that mask and set could sometimes have common bits, I prefer 
to keep clrsetbits and setclrbits separate.
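
A small worked example (illustrative values only) of why the two are not
interchangeable when mask and set share bits:

/* old register value 0x0f, mask = 0x03, set = 0x01 */
#define EX_CLRSET ((0x0f & ~0x03) | 0x01)	/* clear then set -> 0x0d */
#define EX_SETCLR ((0x0f | 0x01) & ~0x03)	/* set then clear -> 0x0c */

With clrsetbits the overlapping bit ends up set, with setclrbits it ends
up cleared.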

Regards


Re: [PATCH v2 0/3] tty: hvc: latency break regression fixes

2018-09-10 Thread Greg Kroah-Hartman
On Sun, Sep 09, 2018 at 03:39:13PM +1000, Nicholas Piggin wrote:
> Re-sending this one with the used-uninitialized warning in patch
> 3 fixed.
> 
> Greg these patches are needed to fix regressions in this merge
> window, please consider them for your tty tree.

All now queued up, thanks.

greg k-h


Re: [PATCH v2 5/5] PCI/powerpc/eeh: Add pcibios hooks for preparing to rescan

2018-09-10 Thread Sergey Miroshnichenko
Hello Sam,

On 9/10/18 8:03 AM, Sam Bobroff wrote:
> Hi Sergey,
> 
> On Thu, Sep 06, 2018 at 02:57:52PM +0300, Sergey Miroshnichenko wrote:
>> Reading an empty slot returns all ones, which triggers a false
>> EEH error event on PowerNV.
>>
>> New callbacks pcibios_rescan_prepare/done are introduced to
>> pause/resume the EEH during rescan.
> 
> If I understand it correctly, this temporarily disables EEH for config space
> accesses on the whole PHB while the rescan runs. Is it possible that a
> real EEH event could be missed if it occurred during the rescan?
> 
> Even if it's not possible, I think it would be good to mention that in a
> comment.

Yes, missing a real EEH event is possible, unfortunately, and it is
indeed worth mentioning.

To reduce this probability, the next patchset I'll post in a few days,
among other things, pauses all the affected device drivers during the
rescan (mainly because of moving BARs and bridge windows), but it will
also help here a bit.

> 
>> Signed-off-by: Sergey Miroshnichenko 
>> ---
>>  arch/powerpc/include/asm/eeh.h   |  2 ++
>>  arch/powerpc/kernel/eeh.c| 12 +++
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 22 
>>  drivers/pci/probe.c  | 14 +
>>  include/linux/pci.h  |  2 ++
>>  5 files changed, 52 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>> index 219637ea69a1..926c3e31df99 100644
>> --- a/arch/powerpc/include/asm/eeh.h
>> +++ b/arch/powerpc/include/asm/eeh.h
>> @@ -219,6 +219,8 @@ struct eeh_ops {
>>  int (*next_error)(struct eeh_pe **pe);
>>  int (*restore_config)(struct pci_dn *pdn);
>>  int (*notify_resume)(struct pci_dn *pdn);
>> +int (*pause)(struct pci_bus *bus);
>> +int (*resume)(struct pci_bus *bus);
> 
> I think these names are a bit too generic, what about naming them
> pause_bus()/resume_bus() or even prepare_rescan()/rescan_done()?
> 

Thanks! I will rename them to rescan_prepare/rescan_done to make friends
with reset_prepare/reset_done from struct pci_error_handlers.

>>  };
>>  
>>  extern int eeh_subsystem_flags;
>> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>> index 6ebba3e48b01..9fb5012f389d 100644
>> --- a/arch/powerpc/kernel/eeh.c
>> +++ b/arch/powerpc/kernel/eeh.c
>> @@ -1831,3 +1831,15 @@ static int __init eeh_init_proc(void)
>>  return 0;
>>  }
>>  __initcall(eeh_init_proc);
>> +
>> +void pcibios_rescan_prepare(struct pci_bus *bus)
>> +{
>> +if (eeh_ops && eeh_ops->pause)
>> +eeh_ops->pause(bus);
>> +}
>> +
>> +void pcibios_rescan_done(struct pci_bus *bus)
>> +{
>> +if (eeh_ops && eeh_ops->resume)
>> +eeh_ops->resume(bus);
>> +}
>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
>> b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> index 3c1beae29f2d..9724a58afcd2 100644
>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> @@ -59,6 +59,26 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
>>  eeh_sysfs_add_device(pdev);
>>  }
>>  
>> +static int pnv_eeh_pause(struct pci_bus *bus)
>> +{
>> +struct pci_controller *hose = pci_bus_to_host(bus);
>> +struct pnv_phb *phb = hose->private_data;
>> +
>> +phb->flags &= ~PNV_PHB_FLAG_EEH;
>> +disable_irq(eeh_event_irq);
>> +return 0;
>> +}
>> +
>> +static int pnv_eeh_resume(struct pci_bus *bus)
>> +{
>> +struct pci_controller *hose = pci_bus_to_host(bus);
>> +struct pnv_phb *phb = hose->private_data;
>> +
>> +enable_irq(eeh_event_irq);
>> +phb->flags |= PNV_PHB_FLAG_EEH;
>> +return 0;
>> +}
>> +
>>  static int pnv_eeh_init(void)
>>  {
>>  struct pci_controller *hose;
>> @@ -1710,6 +1730,8 @@ static struct eeh_ops pnv_eeh_ops = {
>>  .write_config   = pnv_eeh_write_config,
>>  .next_error = pnv_eeh_next_error,
>>  .restore_config = pnv_eeh_restore_config,
>> +.pause  = pnv_eeh_pause,
>> +.resume = pnv_eeh_resume,
>>  .notify_resume  = NULL
>>  };
>>  
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index ac876e32de4b..4a9045364809 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2801,6 +2801,14 @@ void __weak pcibios_remove_bus(struct pci_bus *bus)
>>  {
>>  }
>>  
>> +void __weak pcibios_rescan_prepare(struct pci_bus *bus)
>> +{
>> +}
>> +
>> +void __weak pcibios_rescan_done(struct pci_bus *bus)
>> +{
>> +}
>> +
>>  struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
>>  struct pci_ops *ops, void *sysdata, struct list_head *resources)
>>  {
>> @@ -3055,9 +3063,15 @@ unsigned int pci_rescan_bus_bridge_resize(struct 
>> pci_dev *bridge)
>>  unsigned int pci_rescan_bus(struct pci_bus *bus)
>>  {
>>  unsigned int max;
>> +struct pci_bus *root = bus;
>> +
>> +while 

Re: [PATCH v2 2/5] powerpc/pci: Create pci_dn on demand

2018-09-10 Thread Sergey Miroshnichenko
Hello Sam,

On 9/10/18 7:47 AM, Sam Bobroff wrote:
> Hi Sergey,
> 
> On Thu, Sep 06, 2018 at 02:57:49PM +0300, Sergey Miroshnichenko wrote:
>> The pci_dn structures can be created not only from DT, but also
>> directly from newly discovered PCIe devices, so allocate them
>> dynamically.
>>
>> Signed-off-by: Sergey Miroshnichenko 
>> ---
>>  arch/powerpc/kernel/pci_dn.c | 76 
>>  1 file changed, 59 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>> index ab147a1909c8..48ec16407835 100644
>> --- a/arch/powerpc/kernel/pci_dn.c
>> +++ b/arch/powerpc/kernel/pci_dn.c
>> @@ -33,6 +33,8 @@
>>  #include 
>>  #include 
>>  
>> +static struct pci_dn *create_pdn(struct pci_dev *pdev, struct pci_dn 
>> *parent);
>> +
>>  /*
>>   * The function is used to find the firmware data of one
>>   * specific PCI device, which is attached to the indicated
>> @@ -58,6 +60,9 @@ static struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus)
>>  pbus = pbus->parent;
>>  }
>>  
>> +if (!pbus->self && !pci_is_root_bus(pbus))
>> +return NULL;
>> +
>>  /*
>>   * Except virtual bus, all PCI buses should
>>   * have device nodes.
>> @@ -65,13 +70,15 @@ static struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus)
>>  dn = pci_bus_to_OF_node(pbus);
>>  pdn = dn ? PCI_DN(dn) : NULL;
>>  
>> +if (!pdn && pbus->self)
>> +pdn = pbus->self->dev.archdata.pci_data;
>> +
>>  return pdn;
>>  }
>>  
>>  struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
>>  int devfn)
>>  {
>> -struct device_node *dn = NULL;
>>  struct pci_dn *parent, *pdn;
>>  struct pci_dev *pdev = NULL;
>>  
>> @@ -80,17 +87,10 @@ struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
>>  if (pdev->devfn == devfn) {
>>  if (pdev->dev.archdata.pci_data)
>>  return pdev->dev.archdata.pci_data;
>> -
>> -dn = pci_device_to_OF_node(pdev);
>>  break;
>>  }
>>  }
>>  
>> -/* Fast path: fetch from device node */
>> -pdn = dn ? PCI_DN(dn) : NULL;
>> -if (pdn)
>> -return pdn;
>> -
> 
> Why is it necessary to remove the above fast-path?
> 

It is not, actually. This had leaked in from the early stages of debugging,
when I found that after hotplug+rescan or hotplug+kexec the kernel
took pdns from device nodes that no longer represent the actual PCIe
topology. But after patches 3 and 4, PCI_DN() is NULL for
every PCIe device except the root on PowerNV, so this block is safe.

It will remain in version 3 of this patchset.

>>  /* Slow path: fetch from firmware data hierarchy */
>>  parent = pci_bus_to_pdn(bus);
>>  if (!parent)
>> @@ -128,16 +128,9 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
>>  if (!parent)
>>  return NULL;
>>  
>> -list_for_each_entry(pdn, >child_list, list) {
>> -if (pdn->busno == pdev->bus->number &&
>> -pdn->devfn == pdev->devfn)
>> -return pdn;
>> -}
> 
> Could you explain why the above block was removed? Is it now impossible
> for it to find a pdn?
> 

I see now that this block was also removed too early: on PowerNV with
this patchset it is impossible to have a pdn present in
parent->child_list but not in pdev->dev.archdata.pci_data, but this may
not be the case for pSeries.

>> -
>> -return NULL;
>> +return create_pdn(pdev, parent);
>>  }
>>  
>> -#ifdef CONFIG_PCI_IOV
>>  static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
>> int vf_index,
>> int busno, int devfn)
>> @@ -156,7 +149,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn 
>> *parent,
>>  pdn->parent = parent;
>>  pdn->busno = busno;
>>  pdn->devfn = devfn;
>> +#ifdef CONFIG_PCI_IOV
>>  pdn->vf_index = vf_index;
>> +#endif /* CONFIG_PCI_IOV */
>>  pdn->pe_number = IODA_INVALID_PE;
>>  INIT_LIST_HEAD(>child_list);
>>  INIT_LIST_HEAD(>list);
> 
> I can see that this change allows you to re-use this to set up a pdn in
> create_pdn(). Perhaps you should refactor pci_add_device_node_info() to
> use it as well, now that it's possible?
> 

Sure, will do.

>> @@ -164,7 +159,54 @@ static struct pci_dn *add_one_dev_pci_data(struct 
>> pci_dn *parent,
>>  
>>  return pdn;
>>  }
>> -#endif
>> +
>> +static struct pci_dn *create_pdn(struct pci_dev *pdev, struct pci_dn 
>> *parent)
>> +{
>> +struct pci_dn *pdn = NULL;
>> +
>> +pdn = add_one_dev_pci_data(parent, 0, pdev->bus->number, pdev->devfn);
>> +dev_info(>dev, "Create a new pdn for devfn %2x\n", pdev->devfn / 
>> 8);
>> +
>> +if (pdn) {
>> +#ifdef CONFIG_EEH
>> +struct eeh_dev *edev;
>> +#endif /* CONFIG_EEH */
>> +  

Re: [PATCH v2 1/5] powerpc/pci: Access PCI config space directly w/o pci_dn

2018-09-10 Thread Sergey Miroshnichenko
Hello Sam,

On 9/10/18 7:23 AM, Sam Bobroff wrote:
> Hi Sergey,
> 
> On Thu, Sep 06, 2018 at 02:57:48PM +0300, Sergey Miroshnichenko wrote:
>> The pci_dn structures are retrieved from a DT, but hot-plugged PCIe
>> devices don't have them. Don't stop PCIe I/O in absence of pci_dn, so
>> it is now possible to discover new devices.
>>
>> Signed-off-by: Sergey Miroshnichenko 
>> ---
>>  arch/powerpc/kernel/rtas_pci.c   | 97 +++-
>>  arch/powerpc/platforms/powernv/pci.c | 64 --
>>  2 files changed, 109 insertions(+), 52 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
>> index c2b148b1634a..0611b46d9b5f 100644
>> --- a/arch/powerpc/kernel/rtas_pci.c
>> +++ b/arch/powerpc/kernel/rtas_pci.c
>> @@ -55,10 +55,26 @@ static inline int config_access_valid(struct pci_dn *dn, 
>> int where)
>>  return 0;
>>  }
>>  
>> -int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
>> +static int rtas_read_raw_config(unsigned long buid, int busno, unsigned int 
>> devfn,
>> +int where, int size, u32 *val)
>>  {
>>  int returnval = -1;
>> -unsigned long buid, addr;
>> +unsigned long addr = rtas_config_addr(busno, devfn, where);
>> +int ret;
>> +
>> +if (buid) {
>> +ret = rtas_call(ibm_read_pci_config, 4, 2, ,
>> +addr, BUID_HI(buid), BUID_LO(buid), size);
>> +} else {
>> +ret = rtas_call(read_pci_config, 2, 2, , addr, size);
>> +}
>> +*val = returnval;
>> +
>> +return ret;
>> +}
>> +
>> +int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
>> +{
>>  int ret;
>>  
>>  if (!pdn)
>> @@ -71,16 +87,8 @@ int rtas_read_config(struct pci_dn *pdn, int where, int 
>> size, u32 *val)
>>  return PCIBIOS_SET_FAILED;
>>  #endif
>>  
>> -addr = rtas_config_addr(pdn->busno, pdn->devfn, where);
>> -buid = pdn->phb->buid;
>> -if (buid) {
>> -ret = rtas_call(ibm_read_pci_config, 4, 2, ,
>> -addr, BUID_HI(buid), BUID_LO(buid), size);
>> -} else {
>> -ret = rtas_call(read_pci_config, 2, 2, , addr, size);
>> -}
>> -*val = returnval;
>> -
>> +ret = rtas_read_raw_config(pdn->phb->buid, pdn->busno, pdn->devfn,
>> +   where, size, val);
>>  if (ret)
>>  return PCIBIOS_DEVICE_NOT_FOUND;
>>  
>> @@ -98,18 +106,44 @@ static int rtas_pci_read_config(struct pci_bus *bus,
>>  
>>  pdn = pci_get_pdn_by_devfn(bus, devfn);
>>  
>> -/* Validity of pdn is checked in here */
>> -ret = rtas_read_config(pdn, where, size, val);
>> -if (*val == EEH_IO_ERROR_VALUE(size) &&
>> -eeh_dev_check_failure(pdn_to_eeh_dev(pdn)))
>> -return PCIBIOS_DEVICE_NOT_FOUND;
>> +if (pdn && eeh_enabled()) {
>> +/* Validity of pdn is checked in here */
>> +ret = rtas_read_config(pdn, where, size, val);
>> +
>> +if (*val == EEH_IO_ERROR_VALUE(size) &&
>> +eeh_dev_check_failure(pdn_to_eeh_dev(pdn)))
>> +ret = PCIBIOS_DEVICE_NOT_FOUND;
>> +} else {
>> +struct pci_controller *phb = pci_bus_to_host(bus);
>> +
>> +ret = rtas_read_raw_config(phb->buid, bus->number, devfn,
>> +   where, size, val);
>> +}
> 
> In the above block, if pdn is valid but EEH isn't enabled,
> rtas_read_raw_config() will be used instead of rtas_read_config(), so
> config_access_valid() won't be tested. Is that correct?
> 

Thank you for the review!

This was the original intention, but now I can see that if a pdn is
valid, the EEH-branch should be taken even if EEH is disabled, as it was
before this patch; and functions there have checks for eeh_enabled()
inside. I'll fix that in v3 as follows:

-   if (pdn && eeh_enabled()) {
+   if (pdn) {

>>  
>>  return ret;
>>  }
>>  
>> +static int rtas_write_raw_config(unsigned long buid, int busno, unsigned 
>> int devfn,
>> + int where, int size, u32 val)
>> +{
>> +unsigned long addr = rtas_config_addr(busno, devfn, where);
>> +int ret;
>> +
>> +if (buid) {
>> +ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr,
>> +BUID_HI(buid), BUID_LO(buid), size, (ulong)val);
>> +} else {
>> +ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, 
>> (ulong)val);
>> +}
>> +
>> +if (ret)
>> +return PCIBIOS_DEVICE_NOT_FOUND;
>> +
>> +return PCIBIOS_SUCCESSFUL;
>> +}
>> +
>>  int rtas_write_config(struct pci_dn *pdn, int where, int size, u32 val)
>>  {
>> -unsigned long buid, addr;
>>  int ret;
>>  
>>  if (!pdn)
>> @@ -122,15 +156,8 @@ int rtas_write_config(struct pci_dn *pdn, int where, 
>> int size, u32 val)
>>  return PCIBIOS_SET_FAILED;
>>  #endif
>>  
>> -addr = 

[PATCH v5 2/2] powerpc/pseries:Remove unneeded uses of dlpar work queue

2018-09-10 Thread Nathan Fontenot
There are three instances in which dlpar hotplug events are invoked:
handling a hotplug interrupt (in a kvm guest), handling a dlpar
request through sysfs, and updating LMB affinity when handling a
PRRN event. Only in the case of handling a hotplug interrupt do we
have to put the work on a workqueue; the other cases can handle the
dlpar request directly.

This patch exports the handle_dlpar_errorlog() function so that
dlpar hotplug events can be handled directly and updates the two
instances mentioned above to use the direct invocation.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/platforms/pseries/dlpar.c|   37 +++--
 arch/powerpc/platforms/pseries/mobility.c |   18 +-
 arch/powerpc/platforms/pseries/pseries.h  |5 ++--
 arch/powerpc/platforms/pseries/ras.c  |2 +-
 4 files changed, 19 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index a0b20c03f078..052c4f2ba0a0 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -32,8 +32,6 @@ static struct workqueue_struct *pseries_hp_wq;
 struct pseries_hp_work {
struct work_struct work;
struct pseries_hp_errorlog *errlog;
-   struct completion *hp_completion;
-   int *rc;
 };
 
 struct cc_workarea {
@@ -329,7 +327,7 @@ int dlpar_release_drc(u32 drc_index)
return 0;
 }
 
-static int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog)
+int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog)
 {
int rc;
 
@@ -371,20 +369,13 @@ static void pseries_hp_work_fn(struct work_struct *work)
struct pseries_hp_work *hp_work =
container_of(work, struct pseries_hp_work, work);
 
-   if (hp_work->rc)
-   *(hp_work->rc) = handle_dlpar_errorlog(hp_work->errlog);
-   else
-   handle_dlpar_errorlog(hp_work->errlog);
-
-   if (hp_work->hp_completion)
-   complete(hp_work->hp_completion);
+   handle_dlpar_errorlog(hp_work->errlog);
 
kfree(hp_work->errlog);
kfree((void *)work);
 }
 
-void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog,
-struct completion *hotplug_done, int *rc)
+void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog)
 {
struct pseries_hp_work *work;
struct pseries_hp_errorlog *hp_errlog_copy;
@@ -397,13 +388,9 @@ void queue_hotplug_event(struct pseries_hp_errorlog 
*hp_errlog,
if (work) {
INIT_WORK((struct work_struct *)work, pseries_hp_work_fn);
work->errlog = hp_errlog_copy;
-   work->hp_completion = hotplug_done;
-   work->rc = rc;
queue_work(pseries_hp_wq, (struct work_struct *)work);
} else {
-   *rc = -ENOMEM;
kfree(hp_errlog_copy);
-   complete(hotplug_done);
}
 }
 
@@ -521,18 +508,15 @@ static int dlpar_parse_id_type(char **cmd, struct 
pseries_hp_errorlog *hp_elog)
 static ssize_t dlpar_store(struct class *class, struct class_attribute *attr,
   const char *buf, size_t count)
 {
-   struct pseries_hp_errorlog *hp_elog;
-   struct completion hotplug_done;
+   struct pseries_hp_errorlog hp_elog;
char *argbuf;
char *args;
int rc;
 
args = argbuf = kstrdup(buf, GFP_KERNEL);
-   hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL);
-   if (!hp_elog || !argbuf) {
+   if (!argbuf) {
pr_info("Could not allocate resources for DLPAR operation\n");
kfree(argbuf);
-   kfree(hp_elog);
return -ENOMEM;
}
 
@@ -540,25 +524,22 @@ static ssize_t dlpar_store(struct class *class, struct 
class_attribute *attr,
 * Parse out the request from the user, this will be in the form:
 *
 */
-   rc = dlpar_parse_resource(, hp_elog);
+   rc = dlpar_parse_resource(, _elog);
if (rc)
goto dlpar_store_out;
 
-   rc = dlpar_parse_action(, hp_elog);
+   rc = dlpar_parse_action(, _elog);
if (rc)
goto dlpar_store_out;
 
-   rc = dlpar_parse_id_type(, hp_elog);
+   rc = dlpar_parse_id_type(, _elog);
if (rc)
goto dlpar_store_out;
 
-   init_completion(_done);
-   queue_hotplug_event(hp_elog, _done, );
-   wait_for_completion(_done);
+   rc = handle_dlpar_errorlog(_elog);
 
 dlpar_store_out:
kfree(argbuf);
-   kfree(hp_elog);
 
if (rc)
pr_err("Could not handle DLPAR request \"%s\"\n", buf);
diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index f0e30dc94988..6f27d00505cf 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -242,7 +242,7 @@ static int 

[PATCH v5 1/2] powerpc/pseries: Remove prrn_work workqueue

2018-09-10 Thread Nathan Fontenot
When a PRRN event is received we are already running in a worker
thread. Instead of spawning off another worker thread on the prrn_work
workqueue to handle the PRRN event we can just call the PRRN handler
routine directly.

With this update we can also pass the scope variable for the PRRN
event directly to the handler instead of it being a global variable.

This patch fixes the following oops message we are seeing in PRRN testing:

Oops: Bad kernel stack pointer, sig: 6 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver 
nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) 
rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag 
af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure 
scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) 
scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua scsi_mod autofs4
Supported: Yes, External 54
CPU: 7 PID: 18967 Comm: kworker/u96:0 Tainted: G X 
4.4.126-94.22-default #1
Workqueue: pseries hotplug workque pseries_hp_work_fn
task: c00775367790 ti: c0001ebd4000 task.ti: c0070d14
NIP:  LR: 1fb3d050 CTR: 
REGS: c0001ebd7d40 TRAP: 0700   Tainted: G X  
(4.4.126-94.22-default)
MSR: 800102081000 <41,VEC,ME5  CR: 2802  XER: 20040018   4
CFAR: 1fb3d084 40 419   13
GPR00: 400010007 1400 00041fffe200
GPR04: 00805 1fb15fa8 00050500
GPR08: 0001f40040001  05:5200040002
GPR12: 5c7a05400 c00e89f8 1ed9f668
GPR16: 1fbeff9441fbeff94 1fb545e4 00600060
GPR20: 4  
GPR24: 540001fb3c000  1fb1b040
GPR28: 1fb2400041fb440d8 0008 
NIP [] 5 (null)
LR [1fb3d050] 031fb3d050
Call Trace:4
Instruction dump:  4   5:47 122
  X4XX     
  X5XX  6000 6000 6000 6000
---[ end trace aa5627b04a7d9d6b ]---   3NMI 
watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [kworker/27:0:13903]
Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver 
nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) 
rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag 
af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure 
scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) 
scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua scsi_mod autofs4
Supported: Yes, External
CPU: 27 PID: 13903 Comm: kworker/27:0 Tainted: G  D  X 
4.4.126-94.22-default #1
Workqueue: events prrn_work_fn
task: c00747cfa390 ti: c0074712c000 task.ti: c0074712c000
NIP: c08002a8 LR: c0090770 CTR: 0032e088
REGS: c0074712f7b0 TRAP: 0901   Tainted: G  D  X  
(4.4.126-94.22-default)
MSR: 80019033   CR: 22482044  XER: 2004
CFAR: c08002c4 SOFTE: 1
GPR00: c0090770 c0074712fa30 c0f09800 c0fa1928 6:02
GPR04: c00775f5e000 fffe 0001 c0f42db8
GPR08: 0001 8007  
GPR12: 800621008318 c7a14400
NIP [c08002a8] _raw_spin_lock+0x68/0xd0
LR [c0090770] mobility_rtas_call+0x50/0x100
Call Trace:595
[c0074712fa60] [c0090770] mobility_rtas_call+0x50/0x100
[c0074712faf0] [c0090b08] pseries_devicetree_update+0xf8/0x530
[c0074712fc20] [c0031ba4] prrn_work_fn+0x34/0x50
[c0074712fc40] [c00e0390] process_one_work+0x1a0/0x4e0
[c0074712fcd0] [c00e0870] worker_thread+0x1a0/0x6105:57   2
[c0074712fd80] [c00e8b18] kthread+0x128/0x150
[c0074712fe30] [c00096f8] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
2c09 40c20010 7d40192d 40c2fff0 7c2004ac 2fa9 40de0018 5:540030   3
e8010010 ebe1fff8 7c0803a6 4e800020 <7c210b78> e92d 89290009 792affe3

Signed-off-by: John Allen 
Signed-off-by: Haren Myneni 
---
v5:
  - Update commit message to include oops message
v4:
  - Remove prrn_work workqueue as suggested by Michael Ellerman
  - Make the PRRN event scope passed in as opposed to a global, suggested
by Michael Ellerman
v3:
  -Scrap the mutex as it only 

[PATCH v5 0/2] powerpc/pseries: Improve serialization of PRRN events

2018-09-10 Thread Nathan Fontenot
Stress testing has uncovered issues with handling continuously queued PRRN
events. Running PRRN events in this way can seriously load the system given
the sheer volume of dlpar actions being handled, eventually resulting
in a system oops (see below). This patchset ensures that PRRN
events are handled more synchronously. It also updates dlpar invocation
so that it can be done directly instead of waiting on a workqueue.

Oops: Bad kernel stack pointer, sig: 6 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Supported: Yes, External 54
CPU: 7 PID: 18967 Comm: kworker/u96:0 Tainted: G X 
4.4.126-94.22-default #1
Workqueue: pseries hotplug workque pseries_hp_work_fn
task: c00775367790 ti: c0001ebd4000 task.ti: c0070d14
NIP:  LR: 1fb3d050 CTR: 
REGS: c0001ebd7d40 TRAP: 0700   Tainted: G X  
(4.4.126-94.22-default)
MSR: 800102081000 <41,VEC,ME5  CR: 2802  XER: 20040018   4
CFAR: 1fb3d084 40 419   13
GPR00: 400010007 1400 00041fffe200 
GPR04: 00805 1fb15fa8 00050500 
GPR08: 0001f40040001  05:5200040002
GPR12: 5c7a05400 c00e89f8 1ed9f668 
GPR16: 1fbeff9441fbeff94 1fb545e4 00600060 
GPR20: 4   
GPR24: 540001fb3c000  1fb1b040 
GPR28: 1fb2400041fb440d8 0008  
NIP [] 5 (null)
LR [1fb3d050] 031fb3d050
Call Trace:4
Instruction dump:  4   5:47 122
  X4XX      
  X5XX  6000 6000 6000 6000 

-Nathan
---

Nathan Fontenot (2):
  powerpc/pseries: Remove prrn_work workqueue
  powerpc/pseries:Remove unneeded uses of dlpar work queue


 arch/powerpc/kernel/rtasd.c   |   17 ++---
 arch/powerpc/platforms/pseries/dlpar.c|   37 +++--
 arch/powerpc/platforms/pseries/mobility.c |   18 +-
 arch/powerpc/platforms/pseries/pseries.h  |5 ++--
 arch/powerpc/platforms/pseries/ras.c  |2 +-
 5 files changed, 22 insertions(+), 57 deletions(-)



Re: Conflict between sparse and commit cafa0010cd51f ("Raise the minimum required gcc version to 4.6")

2018-09-10 Thread Luc Van Oostenryck
On Mon, Sep 10, 2018 at 04:05:34PM +0200, Christophe LEROY wrote:
> 
> This time it works, thanks for your help.

You're welcome.
 
> Should we find a way to automate that in the Makefile when
> CROSS_COMPILE is defined ?

The situation here with an old gcc is really an oddity.
I was instead thinking of updating sparse so that it reports a
GCC version of at least 4.6, maybe something even more recent.

But maybe, yes, kbuild could pass GCC_VERSION to sparse so
that both the compiler used and sparse will report the same.
I'll see. The problem is not tied to cross-compilation, though,
just that sparse may be compiled with an older compiler.

-- Luc


[PATCH v3 7/9] powerpc: enable building all dtbs

2018-09-10 Thread Rob Herring
Enable the 'dtbs' target for powerpc. This allows building all the dts
files in arch/powerpc/boot/dts/ when COMPILE_TEST and OF_ALL_DTBS are
enabled.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring 
---
 arch/powerpc/boot/dts/Makefile | 5 +
 arch/powerpc/boot/dts/fsl/Makefile | 4 
 2 files changed, 9 insertions(+)
 create mode 100644 arch/powerpc/boot/dts/fsl/Makefile

diff --git a/arch/powerpc/boot/dts/Makefile b/arch/powerpc/boot/dts/Makefile
index f66554cd5c45..fb335d05aae8 100644
--- a/arch/powerpc/boot/dts/Makefile
+++ b/arch/powerpc/boot/dts/Makefile
@@ -1 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
+
+subdir-y += fsl
+
+dtstree:= $(srctree)/$(src)
+dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard 
$(dtstree)/*.dts))
diff --git a/arch/powerpc/boot/dts/fsl/Makefile 
b/arch/powerpc/boot/dts/fsl/Makefile
new file mode 100644
index ..3bae982641e9
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+
+dtstree:= $(srctree)/$(src)
+dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard 
$(dtstree)/*.dts))
-- 
2.17.1



[PATCH v3 6/9] kbuild: consolidate Devicetree dtb build rules

2018-09-10 Thread Rob Herring
There is nothing arch specific about building dtb files other than their
location under /arch/*/boot/dts/. Keeping each arch aligned is a pain.
The dependencies and supported targets are all slightly different.
Also, a cross-compiler for each arch is needed, but really the host
compiler preprocessor is perfectly fine for building dtbs. Move the
build rules to a common location and remove the arch specific ones. This
is done in a single step to avoid warnings about overriding rules.

The build dependencies had been a mixture of 'scripts' and/or 'prepare'.
These pull in several dependencies some of which need a target compiler
(specifically devicetable-offsets.h) and aren't needed to build dtbs.
All that is really needed is dtc, so adjust the dependencies to only be
dtc.

This change enables support for 'dtbs_install' on some arches which were
missing the target.

Acked-by: Will Deacon 
Acked-by: Paul Burton 
Acked-by: Ley Foon Tan 
Cc: Masahiro Yamada 
Cc: Michal Marek 
Cc: Vineet Gupta 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Yoshinori Sato 
Cc: Michal Simek 
Cc: Ralf Baechle 
Cc: James Hogan 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Chris Zankel 
Cc: Max Filippov 
Cc: linux-kbu...@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: uclinux-h8-de...@lists.sourceforge.jp
Cc: linux-m...@linux-mips.org
Cc: nios2-...@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-xte...@linux-xtensa.org
Signed-off-by: Rob Herring 
---
 Makefile  | 35 ++-
 arch/arc/Makefile |  6 --
 arch/arm/Makefile | 20 +-
 arch/arm64/Makefile   | 17 +--
 arch/c6x/Makefile |  2 --
 arch/h8300/Makefile   | 11 +-
 arch/microblaze/Makefile  |  4 +---
 arch/microblaze/boot/dts/Makefile |  2 ++
 arch/mips/Makefile| 15 +
 arch/nds32/Makefile   |  2 +-
 arch/nios2/Makefile   |  7 ---
 arch/nios2/boot/Makefile  |  4 
 arch/powerpc/Makefile |  3 ---
 arch/xtensa/Makefile  | 12 +--
 scripts/Makefile  |  3 +--
 scripts/Makefile.lib  |  2 +-
 scripts/dtc/Makefile  |  2 +-
 17 files changed, 46 insertions(+), 101 deletions(-)

diff --git a/Makefile b/Makefile
index 19948e556941..c43859eba70f 100644
--- a/Makefile
+++ b/Makefile
@@ -1071,7 +1071,7 @@ include/config/kernel.release: $(srctree)/Makefile FORCE
 # Carefully list dependencies so we do not try to build scripts twice
 # in parallel
 PHONY += scripts
-scripts: scripts_basic asm-generic gcc-plugins $(autoksyms_h)
+scripts: scripts_basic scripts_dtc asm-generic gcc-plugins $(autoksyms_h)
$(Q)$(MAKE) $(build)=$(@)
 
 # Things we need to do before we recursively start building the kernel
@@ -1215,6 +1215,33 @@ kselftest-merge:
$(srctree)/tools/testing/selftests/*/config
+$(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig
 
+# ---
+# Devicetree files
+
+ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/boot/dts/),)
+dtstree := arch/$(SRCARCH)/boot/dts
+endif
+
+ifdef CONFIG_OF_EARLY_FLATTREE
+
+%.dtb : scripts_dtc
+   $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@
+
+PHONY += dtbs dtbs_install
+dtbs: scripts_dtc
+   $(Q)$(MAKE) $(build)=$(dtstree)
+
+dtbs_install: dtbs
+   $(Q)$(MAKE) $(dtbinst)=$(dtstree)
+
+all: dtbs
+
+endif
+
+PHONY += scripts_dtc
+scripts_dtc: scripts_basic
+   $(Q)$(MAKE) $(build)=scripts/dtc
+
 # ---
 # Modules
 
@@ -1424,6 +1451,12 @@ help:
@echo  '  kselftest-merge - Merge all the config dependencies of 
kselftest to existing'
@echo  '.config.'
@echo  ''
+   @$(if $(dtstree), \
+   echo 'Devicetree:'; \
+   echo '* dtbs- Build device tree blobs for enabled 
boards'; \
+   echo '  dtbs_install- Install dtbs to 
$(INSTALL_DTBS_PATH)'; \
+   echo '')
+
@echo 'Userspace tools targets:'
@echo '  use "make tools/help"'
@echo '  or  "cd tools; make help"'
diff --git a/arch/arc/Makefile b/arch/arc/Makefile
index fb026196aaab..5c7bc6d62f43 100644
--- a/arch/arc/Makefile
+++ b/arch/arc/Makefile
@@ -132,11 +132,5 @@ boot_targets += uImage uImage.bin uImage.gz
 $(boot_targets): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
 
-%.dtb %.dtb.S %.dtb.o: scripts
-   $(Q)$(MAKE) $(build)=$(boot)/dts $(boot)/dts/$@
-
-dtbs: scripts
-   $(Q)$(MAKE) $(build)=$(boot)/dts
-
 archclean:
$(Q)$(MAKE) $(clean)=$(boot)
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index d1516f85f25d..161c2df6567e 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ 

[PATCH v3 1/9] powerpc: build .dtb files in dts directory

2018-09-10 Thread Rob Herring
Align powerpc with other architectures which build the dtb files in the
same directory as the dts files. This is also in line with most other
build targets which are located in the same directory as the source.
This move will help enable the 'dtbs' target which builds all the dtbs
regardless of kernel config.

This transition could break some scripts if they expect dtb files in the
old location.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring 
---
v3:
 - Remove duplicate mpc5200 dtbs from image-y targets. The dtb target already
   comes from the cuImage. target.

 arch/powerpc/Makefile  |  2 +-
 arch/powerpc/boot/Makefile | 55 --
 arch/powerpc/boot/dts/Makefile |  1 +
 3 files changed, 28 insertions(+), 30 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/Makefile

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 11a1acba164a..53ea887eb34e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -294,7 +294,7 @@ bootwrapper_install:
$(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)

 %.dtb: scripts
-   $(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
+   $(Q)$(MAKE) $(build)=$(boot)/dts $(patsubst %,$(boot)/dts/%,$@)

 # Used to create 'merged defconfigs'
 # To use it $(call) it with the first argument as the base defconfig
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 0fb96c26136f..bca5c23767df 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -304,9 +304,9 @@ image-$(CONFIG_PPC_ADDER875)+= 
cuImage.adder875-uboot \
   dtbImage.adder875-redboot

 # Board ports in arch/powerpc/platform/52xx/Kconfig
-image-$(CONFIG_PPC_LITE5200)   += cuImage.lite5200 lite5200.dtb
-image-$(CONFIG_PPC_LITE5200)   += cuImage.lite5200b lite5200b.dtb
-image-$(CONFIG_PPC_MEDIA5200)  += cuImage.media5200 media5200.dtb
+image-$(CONFIG_PPC_LITE5200)   += cuImage.lite5200
+image-$(CONFIG_PPC_LITE5200)   += cuImage.lite5200b
+image-$(CONFIG_PPC_MEDIA5200)  += cuImage.media5200

 # Board ports in arch/powerpc/platform/82xx/Kconfig
 image-$(CONFIG_MPC8272_ADS)+= cuImage.mpc8272ads
@@ -381,11 +381,11 @@ $(addprefix $(obj)/, $(sort $(filter zImage.%, 
$(image-y: vmlinux $(wrapperb
$(call if_changed,wrap,$(subst $(obj)/zImage.,,$@))

 # dtbImage% - a dtbImage is a zImage with an embedded device tree blob
-$(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/%.dtb FORCE
-   $(call if_changed,wrap,$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/dts/%.dtb FORCE
+   $(call if_changed,wrap,$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)

-$(obj)/dtbImage.%: vmlinux $(wrapperbits) $(obj)/%.dtb FORCE
-   $(call if_changed,wrap,$*,,$(obj)/$*.dtb)
+$(obj)/dtbImage.%: vmlinux $(wrapperbits) $(obj)/dts/%.dtb FORCE
+   $(call if_changed,wrap,$*,,$(obj)/dts/$*.dtb)

 # This cannot be in the root of $(src) as the zImage rule always adds a $(obj)
 # prefix
@@ -395,36 +395,33 @@ $(obj)/vmlinux.strip: vmlinux
 $(obj)/uImage: vmlinux $(wrapperbits) FORCE
$(call if_changed,wrap,uboot)

-$(obj)/uImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,uboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/uImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call 
if_changed,wrap,uboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)

-$(obj)/uImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,uboot-$*,,$(obj)/$*.dtb)
+$(obj)/uImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,uboot-$*,,$(obj)/dts/$*.dtb)

-$(obj)/cuImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,cuboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/cuImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call 
if_changed,wrap,cuboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)

-$(obj)/cuImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,cuboot-$*,,$(obj)/$*.dtb)
+$(obj)/cuImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,cuboot-$*,,$(obj)/dts/$*.dtb)

-$(obj)/simpleImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call 
if_changed,wrap,simpleboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/simpleImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call 
if_changed,wrap,simpleboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)

-$(obj)/simpleImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,simpleboot-$*,,$(obj)/$*.dtb)
+$(obj)/simpleImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call 

[PATCH v3 0/9] Devicetree build consolidation

2018-09-10 Thread Rob Herring
This series addresses a couple of issues I have with building dts files.

First, the ability to build all the dts files in the tree. This has been
supported on most arches for some time with powerpc being the main
exception. The reason powerpc wasn't supported was that it needed a change
in the location where built dtb files are put.

Secondly, it's a pain to acquire all the cross-compilers needed to build
dtbs for each arch. There's no reason to build with the cross compiler, and
the host compiler is perfectly fine as we only need the pre-processor.

I started addressing just those 2 problems, but kept finding small
differences such as target dependencies and dtbs_install support across
architectures. Instead of trying to align all these, I've consolidated the
build targets moving them out of the arch makefiles.

I'd like to take the series via the DT tree.

Rob

v3:
 - Rework dtc dependency to avoid 2 entry paths to scripts/dtc/. Essentially,
   I copied 'scripts_basic'.
 - Add missing scripts_basic dependency for dtc and missing PHONY tag.
 - Drop the '|' order only from dependencies
 - Drop %.dtb.S and %.dtb.o as top-level targets
 - PPC: remove duplicate mpc5200 dtbs from image-y targets

v2:
 - Fix $arch/boot/dts path check for out of tree builds
 - Fix dtc dependency for building built-in dtbs
 - Fix microblaze built-in dtb building
 - Add dtbs target for microblaze

Rob Herring (9):
  powerpc: build .dtb files in dts directory
  nios2: build .dtb files in dts directory
  nios2: use common rules to build built-in dtb
  nios2: fix building all dtbs
  c6x: use common built-in dtb support
  kbuild: consolidate Devicetree dtb build rules
  powerpc: enable building all dtbs
  c6x: enable building all dtbs
  microblaze: enable building all dtbs

 Makefile   | 35 ++-
 arch/arc/Makefile  |  6 
 arch/arm/Makefile  | 20 +--
 arch/arm64/Makefile| 17 +
 arch/c6x/Makefile  |  2 --
 arch/c6x/boot/dts/Makefile | 17 -
 arch/c6x/boot/dts/linked_dtb.S |  2 --
 arch/c6x/include/asm/sections.h|  1 -
 arch/c6x/kernel/setup.c|  4 +--
 arch/c6x/kernel/vmlinux.lds.S  | 10 --
 arch/h8300/Makefile| 11 +-
 arch/microblaze/Makefile   |  4 +--
 arch/microblaze/boot/dts/Makefile  |  4 +++
 arch/mips/Makefile | 15 +---
 arch/nds32/Makefile|  2 +-
 arch/nios2/Makefile| 11 +-
 arch/nios2/boot/Makefile   | 22 
 arch/nios2/boot/dts/Makefile   |  6 
 arch/nios2/boot/linked_dtb.S   | 19 ---
 arch/powerpc/Makefile  |  3 --
 arch/powerpc/boot/Makefile | 55 ++
 arch/powerpc/boot/dts/Makefile |  6 
 arch/powerpc/boot/dts/fsl/Makefile |  4 +++
 arch/xtensa/Makefile   | 12 +--
 scripts/Makefile   |  3 +-
 scripts/Makefile.lib   |  2 +-
 scripts/dtc/Makefile   |  2 +-
 27 files changed, 100 insertions(+), 195 deletions(-)
 delete mode 100644 arch/c6x/boot/dts/linked_dtb.S
 create mode 100644 arch/nios2/boot/dts/Makefile
 delete mode 100644 arch/nios2/boot/linked_dtb.S
 create mode 100644 arch/powerpc/boot/dts/Makefile
 create mode 100644 arch/powerpc/boot/dts/fsl/Makefile

--
2.17.1


How to handle PTE tables with non contiguous entries ?

2018-09-10 Thread Christophe Leroy

Hi,

I'm having a hard time figuring out the best way to handle the following 
situation:


On the powerpc8xx, handling 16k size pages requires having page tables 
with 4 identical entries.


Initially I was thinking about handling this by simply modifying 
pte_index() while changing the pte_t type in order to have one entry every 
16 bytes, then replicating the PTE value at *ptep, *ptep+1, *ptep+2 and 
*ptep+3 both in set_pte_at() and pte_update().


However, this doesn't work because many, many places in the mm core part 
of the kernel use loops on ptep with a single ptep++ increment.


Therefore I did it with the following hack:

 /* PTE level */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
+#else
 typedef struct { pte_basic_t pte; } pte_t;
+#endif

@@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
: "cc" );
 #else /* PTE_ATOMIC_UPDATES */
unsigned long old = pte_val(*p);
-   *p = __pte((old & ~clr) | set);
+   unsigned long new = (old & ~clr) | set;
+
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+   p->pte = p->pte1 = p->pte2 = p->pte3 = new;
+#else
+   *p = __pte(new);
+#endif
 #endif /* !PTE_ATOMIC_UPDATES */

 #ifdef CONFIG_44x


@@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct 
*mm, unsigned long addr,
/* Anything else just stores the PTE normally. That covers all 
64-bit

 * cases, and 32-bit non-hash with 32-bit PTEs.
 */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+   ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
+#else
*ptep = pte;
+#endif



But I'm not too happy with it as it means pte_t is not a single type 
anymore, so passing it from one function to another is quite heavy.



Would someone have an idea of an elegant way to handle that ?

Thanks
Christophe
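
A minimal sketch (not from this mail, and assuming the usual kernel pte_t and
PAGE_SIZE definitions) of the kind of core-mm loop referred to above, in the
style of zap_pte_range(): the pointer advances one pte_t per iteration, so
with one logical entry every 16 bytes it would land on the replicated words
instead of on the next page's entry.

static void walk_range_sketch(pte_t *ptep, unsigned long addr,
			      unsigned long end)
{
	do {
		pte_t pte = *ptep;	/* reads one hardware word */
		(void)pte;		/* ... process one entry ... */
	} while (ptep++, addr += PAGE_SIZE, addr != end);
}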


Re: [PATCH v2 6/9] kbuild: consolidate Devicetree dtb build rules

2018-09-10 Thread Rob Herring
On Sun, Sep 9, 2018 at 6:28 PM Masahiro Yamada
 wrote:
>
> 2018-09-06 8:53 GMT+09:00 Rob Herring :
> > There is nothing arch specific about building dtb files other than their
> > location under /arch/*/boot/dts/. Keeping each arch aligned is a pain.
> > The dependencies and supported targets are all slightly different.
> > Also, a cross-compiler for each arch is needed, but really the host
> > compiler preprocessor is perfectly fine for building dtbs. Move the
> > build rules to a common location and remove the arch specific ones. This
> > is done in a single step to avoid warnings about overriding rules.
> >
> > The build dependencies had been a mixture of 'scripts' and/or 'prepare'.
> > These pull in several dependencies some of which need a target compiler
> > (specifically devicetable-offsets.h) and aren't needed to build dtbs.
> > All that is really needed is dtc, so adjust the dependencies to only be
> > dtc.
> >
> > This change enables support for 'dtbs_install' on some arches which were
> > missing the target.
> >
> > Cc: Masahiro Yamada 
> > Cc: Michal Marek 
> > Cc: Vineet Gupta 
> > Cc: Russell King 
> > Cc: Catalin Marinas 
> > Cc: Will Deacon 
> > Cc: Yoshinori Sato 
> > Cc: Michal Simek 
> > Cc: Ralf Baechle 
> > Cc: Paul Burton 
> > Cc: James Hogan 
> > Cc: Ley Foon Tan 
> > Cc: Benjamin Herrenschmidt 
> > Cc: Paul Mackerras 
> > Cc: Michael Ellerman 
> > Cc: Chris Zankel 
> > Cc: Max Filippov 
> > Cc: linux-kbu...@vger.kernel.org
> > Cc: linux-snps-...@lists.infradead.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: uclinux-h8-de...@lists.sourceforge.jp
> > Cc: linux-m...@linux-mips.org
> > Cc: nios2-...@lists.rocketboards.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: linux-xte...@linux-xtensa.org
> > Signed-off-by: Rob Herring 
> > ---
> > Please ack so I can take the whole series via the DT tree.
> >
> > v2:
> >  - Fix $arch/boot/dts path check for out of tree builds
> >  - Fix dtc dependency for building built-in dtbs
> >  - Fix microblaze built-in dtb building
> >
> >  Makefile  | 32 +++
> >  arch/arc/Makefile |  6 --
> >  arch/arm/Makefile | 20 +--
> >  arch/arm64/Makefile   | 17 +---
> >  arch/c6x/Makefile |  2 --
> >  arch/h8300/Makefile   | 11 +--
> >  arch/microblaze/Makefile  |  4 +---
> >  arch/microblaze/boot/dts/Makefile |  2 ++
> >  arch/mips/Makefile| 15 +--
> >  arch/nds32/Makefile   |  2 +-
> >  arch/nios2/Makefile   |  7 ---
> >  arch/nios2/boot/Makefile  |  4 
> >  arch/powerpc/Makefile |  3 ---
> >  arch/xtensa/Makefile  | 12 +---
> >  scripts/Makefile.lib  |  2 +-
> >  15 files changed, 42 insertions(+), 97 deletions(-)
> >
> > diff --git a/Makefile b/Makefile
> > index 2b458801ba74..bc18dbbc16c5 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1212,6 +1212,32 @@ kselftest-merge:
> > $(srctree)/tools/testing/selftests/*/config
> > +$(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig
> >
> > +# 
> > ---
> > +# Devicetree files
> > +
> > +ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/boot/dts/),)
> > +dtstree := arch/$(SRCARCH)/boot/dts
> > +endif
> > +
> > +ifdef CONFIG_OF_EARLY_FLATTREE
> > +
> > +%.dtb %.dtb.S %.dtb.o: | dtc
> > +   $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@
>
>
> Hmm, I was worried about '%.dtb.o: | dtc'
> but seems working.
>
> Compiling %.S -> %.o requires objtool for x86,
> but x86 does not support DT.

Well, x86 does support DT to some extent. There are 2 platforms and the
DT unittests build and run on x86.

Actually, we can remove "%.dtb.S %.dtb.o" because we don't need those
as top-level build targets. Must have been a copy-n-paste relic from
before having common rules.

>
> If CONFIG_MODVERSIONS=y, scripts/genksyms/genksyms is required,
> %.dtb.S does not contain EXPORT_SYMBOL.

Okay, but that shouldn't affect any of this. We only build *.dtb.S
when doing built-in dtbs.

> BTW, 'dtc' should be a PHONY target.

Right, I found that too.

Rob


Re: [PATCH] powerpc: fix csum_ipv6_magic() on little endian platforms

2018-09-10 Thread Xin Long
On Mon, Sep 10, 2018 at 2:09 PM Christophe Leroy
 wrote:
>
> On little endian platforms, csum_ipv6_magic() keeps len and proto in
> CPU byte order. This generates bad results, leading to ICMPv6 packets
> from other hosts being dropped by powerpc64le platforms.
>
> In order to fix this, len and proto should be converted to network
> byte order ie bigendian byte order. However checksumming 0x12345678
> and 0x56341278 provide the exact same result so it is enough to
> rotate the sum of len and proto by 1 byte.
>
> PPC32 only supports big endian, so the fix is needed for PPC64 only
>
> Fixes: e9c4943a107b ("powerpc: Implement csum_ipv6_magic in assembly")
> Reported-by: Jianlin Shi 
> Reported-by: Xin Long 
> Cc:  # 4.18+
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/lib/checksum_64.S | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S
> index 886ed94b9c13..2a68c43e13f5 100644
> --- a/arch/powerpc/lib/checksum_64.S
> +++ b/arch/powerpc/lib/checksum_64.S
> @@ -443,6 +443,9 @@ _GLOBAL(csum_ipv6_magic)
> addcr0, r8, r9
> ld  r10, 0(r4)
> ld  r11, 8(r4)
> +#ifndef CONFIG_CPU_BIG_ENDIAN
> +   rotldi  r5, r5, 8
> +#endif
> adder0, r0, r10
> add r5, r5, r7
> adder0, r0, r11
> --
> 2.13.3
>
Tested-by: Xin Long 
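
A quick userspace demonstration (not part of the patch) of the equivalence the
changelog relies on: folding a 32-bit word into a 16-bit one's-complement
checksum gives the same result for 0x12345678 and 0x56341278, because only the
per-column byte sums matter, not which 16-bit half a byte lands in.

#include <stdint.h>
#include <stdio.h>

static uint16_t csum_fold32(uint32_t v)
{
	uint32_t s = (v >> 16) + (v & 0xffff);	/* add the two 16-bit halves */

	s = (s >> 16) + (s & 0xffff);		/* fold the carry back in */
	return (uint16_t)~s;			/* one's complement */
}

int main(void)
{
	/* both print 9753 */
	printf("%04x %04x\n", csum_fold32(0x12345678), csum_fold32(0x56341278));
	return 0;
}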


Re: Conflict between sparse and commit cafa0010cd51f ("Raise the minimum required gcc version to 4.6")

2018-09-10 Thread Christophe LEROY




Le 10/09/2018 à 15:56, Luc Van Oostenryck a écrit :

On Mon, Sep 10, 2018 at 01:19:07PM +, Christophe Leroy wrote:



On 09/10/2018 11:34 AM, Luc Van Oostenryck wrote:

On Mon, Sep 10, 2018 at 09:56:33AM +, Christophe Leroy wrote:


# export REAL_CC=ppc-linux-gcc
# make CHECK="cgcc -target=ppc -D_CALL_ELF=2 -D__GCC__=5
-D__GCC_MINOR__=4" C=2 arch/powerpc/kernel/process.o
scripts/kconfig/conf  --syncconfig Kconfig
#
# configuration written to .config
#
UPD include/config/kernel.release
UPD include/generated/utsrelease.h
CC  kernel/bounds.s
CC  arch/powerpc/kernel/asm-offsets.s
CALLscripts/checksyscalls.sh
CHECK   scripts/mod/empty.c
Can't exec "/bin/sh": Argument list too long at /usr/local/bin/cgcc line 86.
make[2]: *** [scripts/mod/empty.o] Error 1
make[1]: *** [scripts/mod] Error 2
make: *** [scripts] Error 2


OK. Clearly nobody has ever used it so :(
There is an infinite loop because cgcc uses the env var CHECK
to call sparse while kbuild uses CHECK to call cgcc here.

The following seems to work here.
$ export REAL_CC=ppc-linux-gcc
$ make CHECK="CHECK=sparse cgcc -target=ppc ...


Not yet ...

[root@pc16082vm linux-powerpc]# export REAL_CC=ppc-linux-gcc
[root@pc16082vm linux-powerpc]# make CHECK="CHECK=sparse cgcc
-target=ppc -D_CALL_ELF=2 -D__GNUC__=5 -D__GNUC_MINOR__=4" C=2
arch/powerpc/kernel/process.o
   CALLscripts/checksyscalls.sh
   CHECK   scripts/mod/empty.c
:0:0: warning: "__STDC__" redefined
: note: this is the location of the previous definition
/opt/cldk-1.4.0/lib/gcc/ppc-linux/5.4.0/../../../../ppc-linux/lib/crt1.o:(.rodata+0x4):
undefined reference to `main'
collect2: error: ld returned 1 exit status
make[2]: *** [scripts/mod/empty.o] Error 1
make[1]: *** [scripts/mod] Error 2
make: *** [scripts] Error 2


OK. Using cgcc creates more problems than it solves and this file
scripts/mod/empty.c is weird.
Dropping cgcc and simply giving the GCC version to sparse works for
me here (the needed defines are given by arch/powerpc/Makefile) but
for sure I don't have the same environment as you have:
   $ make CHECK="sparse -D__GNUC__=5 -D__GNUC_MINOR__=4" ...


This time it works, thanks for your help.

Should we find a way to automate that in the Makefile when CROSS_COMPILE 
is defined ?




Bonne chance,


Merci

Christophe


Re: Conflict between sparse and commit cafa0010cd51f ("Raise the minimum required gcc version to 4.6")

2018-09-10 Thread Luc Van Oostenryck
On Mon, Sep 10, 2018 at 01:19:07PM +, Christophe Leroy wrote:
> 
> 
> On 09/10/2018 11:34 AM, Luc Van Oostenryck wrote:
> > On Mon, Sep 10, 2018 at 09:56:33AM +, Christophe Leroy wrote:
> > > 
> > > # export REAL_CC=ppc-linux-gcc
> > > # make CHECK="cgcc -target=ppc -D_CALL_ELF=2 -D__GCC__=5
> > > -D__GCC_MINOR__=4" C=2 arch/powerpc/kernel/process.o
> > > scripts/kconfig/conf  --syncconfig Kconfig
> > > #
> > > # configuration written to .config
> > > #
> > >UPD include/config/kernel.release
> > >UPD include/generated/utsrelease.h
> > >CC  kernel/bounds.s
> > >CC  arch/powerpc/kernel/asm-offsets.s
> > >CALLscripts/checksyscalls.sh
> > >CHECK   scripts/mod/empty.c
> > > Can't exec "/bin/sh": Argument list too long at /usr/local/bin/cgcc line 
> > > 86.
> > > make[2]: *** [scripts/mod/empty.o] Error 1
> > > make[1]: *** [scripts/mod] Error 2
> > > make: *** [scripts] Error 2
> > 
> > OK. Clearly nobody has ever used it so :(
> > There is an infinite loop because cgcc uses the env var CHECK
> > to call sparse while kbuild uses CHECK to call cgcc here.
> > 
> > The following seems to work here.
> > $ export REAL_CC=ppc-linux-gcc
> > $ make CHECK="CHECK=sparse cgcc -target=ppc ...
> 
> Not yet ...
> 
> [root@pc16082vm linux-powerpc]# export REAL_CC=ppc-linux-gcc
> [root@pc16082vm linux-powerpc]# make CHECK="CHECK=sparse cgcc
> -target=ppc -D_CALL_ELF=2 -D__GNUC__=5 -D__GNUC_MINOR__=4" C=2
> arch/powerpc/kernel/process.o
>   CALLscripts/checksyscalls.sh
>   CHECK   scripts/mod/empty.c
> :0:0: warning: "__STDC__" redefined
> : note: this is the location of the previous definition
> /opt/cldk-1.4.0/lib/gcc/ppc-linux/5.4.0/../../../../ppc-linux/lib/crt1.o:(.rodata+0x4):
> undefined reference to `main'
> collect2: error: ld returned 1 exit status
> make[2]: *** [scripts/mod/empty.o] Error 1
> make[1]: *** [scripts/mod] Error 2
> make: *** [scripts] Error 2

OK. Using cgcc creates more problems than it solves and this file
scripts/mod/empty.c is weird.
Dropping cgcc and simply giving the GCC version to sparse works for
me here (the needed defines are given by arch/powerpc/Makefile) but
for sure I don't have the same environment as you have:
  $ make CHECK="sparse -D__GNUC__=5 -D__GNUC_MINOR__=4" ...

Bonne chance,
-- Luc


[PATCH 7/7 v7] arm64: dts: ls208xa: comply with the iommu map binding for fsl_mc

2018-09-10 Thread Nipun Gupta
fsl-mc bus support the new iommu-map property. Comply to this binding
for fsl_mc bus.

Signed-off-by: Nipun Gupta 
Reviewed-by: Laurentiu Tudor 
---
 arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
index 137ef4d..3d5e049 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
@@ -184,6 +184,7 @@
#address-cells = <2>;
#size-cells = <2>;
ranges;
+   dma-ranges = <0x0 0x0 0x0 0x0 0x1 0x>;
 
clockgen: clocking@130 {
compatible = "fsl,ls2080a-clockgen";
@@ -357,6 +358,8 @@
reg = <0x0008 0x0c00 0 0x40>,/* MC portal 
base */
  <0x 0x0834 0 0x4>; /* MC control 
reg */
msi-parent = <&its>;
+   iommu-map = <0 &smmu 0 0>;  /* This is fixed-up by 
u-boot */
+   dma-coherent;
#address-cells = <3>;
#size-cells = <1>;
 
@@ -460,6 +463,9 @@
compatible = "arm,mmu-500";
reg = <0 0x500 0 0x80>;
#global-interrupts = <12>;
+   #iommu-cells = <1>;
+   stream-match-mask = <0x7C00>;
+   dma-coherent;
interrupts = <0 13 4>, /* global secure fault */
 <0 14 4>, /* combined secure interrupt */
 <0 15 4>, /* global non-secure fault */
@@ -502,7 +508,6 @@
 <0 204 4>, <0 205 4>,
 <0 206 4>, <0 207 4>,
 <0 208 4>, <0 209 4>;
-   mmu-masters = <&fsl_mc 0x300 0>;
};
 
dspi: dspi@210 {
-- 
1.9.1



[PATCH 6/7 v7] bus/fsl-mc: set coherent dma mask for devices on fsl-mc bus

2018-09-10 Thread Nipun Gupta
of_dma_configure() API expects coherent_dma_mask to be correctly
set in the devices. This patch does the needful.

Signed-off-by: Nipun Gupta 
Reviewed-by: Robin Murphy 
---
 drivers/bus/fsl-mc/fsl-mc-bus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index fa43c7d..624828b 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -627,6 +627,7 @@ int fsl_mc_device_add(struct fsl_mc_obj_desc *obj_desc,
mc_dev->icid = parent_mc_dev->icid;
mc_dev->dma_mask = FSL_MC_DEFAULT_DMA_MASK;
mc_dev->dev.dma_mask = &mc_dev->dma_mask;
+   mc_dev->dev.coherent_dma_mask = mc_dev->dma_mask;
dev_set_msi_domain(&mc_dev->dev,
   dev_get_msi_domain(&parent_mc_dev->dev));
}
-- 
1.9.1



[PATCH 5/7 v7] bus/fsl-mc: support dma configure for devices on fsl-mc bus

2018-09-10 Thread Nipun Gupta
This patch adds support for dma configuration of devices on the fsl-mc
bus using the 'dma_configure' callback for busses. Also, the direct call to
arch_setup_dma_ops is removed from the fsl-mc bus.

Signed-off-by: Nipun Gupta 
Reviewed-by: Laurentiu Tudor 
Reviewed-by: Robin Murphy 
---
 drivers/bus/fsl-mc/fsl-mc-bus.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index 5d8266c..fa43c7d 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -127,6 +127,16 @@ static int fsl_mc_bus_uevent(struct device *dev, struct 
kobj_uevent_env *env)
return 0;
 }
 
+static int fsl_mc_dma_configure(struct device *dev)
+{
+   struct device *dma_dev = dev;
+
+   while (dev_is_fsl_mc(dma_dev))
+   dma_dev = dma_dev->parent;
+
+   return of_dma_configure(dev, dma_dev->of_node, 0);
+}
+
 static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
 char *buf)
 {
@@ -148,6 +158,7 @@ struct bus_type fsl_mc_bus_type = {
.name = "fsl-mc",
.match = fsl_mc_bus_match,
.uevent = fsl_mc_bus_uevent,
+   .dma_configure  = fsl_mc_dma_configure,
.dev_groups = fsl_mc_dev_groups,
 };
 EXPORT_SYMBOL_GPL(fsl_mc_bus_type);
@@ -633,10 +644,6 @@ int fsl_mc_device_add(struct fsl_mc_obj_desc *obj_desc,
goto error_cleanup_dev;
}
 
-   /* Objects are coherent, unless 'no shareability' flag set. */
-   if (!(obj_desc->flags & FSL_MC_OBJ_FLAG_NO_MEM_SHAREABILITY))
-   arch_setup_dma_ops(&mc_dev->dev, 0, 0, NULL, true);
-
/*
 * The device-specific probe callback will get invoked by device_add()
 */
-- 
1.9.1



[PATCH 4/7 v7] iommu/arm-smmu: Add support for the fsl-mc bus

2018-09-10 Thread Nipun Gupta
Implement bus specific support for the fsl-mc bus including
registering arm_smmu_ops and bus specific device add operations.

Signed-off-by: Nipun Gupta 
Reviewed-by: Robin Murphy 
---
 drivers/iommu/arm-smmu.c |  7 +++
 drivers/iommu/iommu.c| 13 +
 include/linux/fsl/mc.h   |  8 
 include/linux/iommu.h|  2 ++
 4 files changed, 30 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index f7a96bc..a011bb6 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,6 +52,7 @@
 #include 
 
 #include 
+#include 
 
 #include "io-pgtable.h"
 #include "arm-smmu-regs.h"
@@ -1459,6 +1460,8 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
 
if (dev_is_pci(dev))
group = pci_device_group(dev);
+   else if (dev_is_fsl_mc(dev))
+   group = fsl_mc_device_group(dev);
else
group = generic_device_group(dev);
 
@@ -2037,6 +2040,10 @@ static void arm_smmu_bus_init(void)
bus_set_iommu(&pci_bus_type, &arm_smmu_ops);
}
#endif
+#ifdef CONFIG_FSL_MC_BUS
+   if (!iommu_present(&fsl_mc_bus_type))
+   bus_set_iommu(&fsl_mc_bus_type, &arm_smmu_ops);
+#endif
 }
 
 static int arm_smmu_device_probe(struct platform_device *pdev)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d227b86..df2f49e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct kset *iommu_group_kset;
@@ -988,6 +989,18 @@ struct iommu_group *pci_device_group(struct device *dev)
return iommu_group_alloc();
 }
 
+/* Get the IOMMU group for device on fsl-mc bus */
+struct iommu_group *fsl_mc_device_group(struct device *dev)
+{
+   struct device *cont_dev = fsl_mc_cont_dev(dev);
+   struct iommu_group *group;
+
+   group = iommu_group_get(cont_dev);
+   if (!group)
+   group = iommu_group_alloc();
+   return group;
+}
+
 /**
  * iommu_group_get_for_dev - Find or create the IOMMU group for a device
  * @dev: target device
diff --git a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h
index f27cb14..dddaca1 100644
--- a/include/linux/fsl/mc.h
+++ b/include/linux/fsl/mc.h
@@ -351,6 +351,14 @@ struct fsl_mc_io {
 #define dev_is_fsl_mc(_dev) (0)
 #endif
 
+/* Macro to check if a device is a container device */
+#define fsl_mc_is_cont_dev(_dev) (to_fsl_mc_device(_dev)->flags & \
+   FSL_MC_IS_DPRC)
+
+/* Macro to get the container device of a MC device */
+#define fsl_mc_cont_dev(_dev) (fsl_mc_is_cont_dev(_dev) ? \
+   (_dev) : (_dev)->parent)
+
 /*
  * module_fsl_mc_driver() - Helper macro for drivers that don't do
  * anything special in module init/exit.  This eliminates a lot of
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7447b0b..209891d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -389,6 +389,8 @@ static inline size_t iommu_map_sg(struct iommu_domain 
*domain,
 extern struct iommu_group *pci_device_group(struct device *dev);
 /* Generic device grouping function */
 extern struct iommu_group *generic_device_group(struct device *dev);
+/* FSL-MC device grouping function */
+struct iommu_group *fsl_mc_device_group(struct device *dev);
 
 /**
  * struct iommu_fwspec - per-device IOMMU instance data
-- 
1.9.1



[PATCH 3/7 v7] iommu/of: support iommu configuration for fsl-mc devices

2018-09-10 Thread Nipun Gupta
With of_pci_map_rid available for all the busses, use the function
for configuration of devices on fsl-mc bus

Signed-off-by: Nipun Gupta 
Reviewed-by: Robin Murphy 
---
 drivers/iommu/of_iommu.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 811e160..284474d 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define NO_IOMMU   1
 
@@ -159,6 +160,23 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 
alias, void *data)
return err;
 }
 
+static int of_fsl_mc_iommu_init(struct fsl_mc_device *mc_dev,
+   struct device_node *master_np)
+{
+   struct of_phandle_args iommu_spec = { .args_count = 1 };
+   int err;
+
+   err = of_map_rid(master_np, mc_dev->icid, "iommu-map",
+"iommu-map-mask", _spec.np,
+iommu_spec.args);
+   if (err)
+   return err == -ENODEV ? NO_IOMMU : err;
+
+   err = of_iommu_xlate(&mc_dev->dev, &iommu_spec);
+   of_node_put(iommu_spec.np);
+   return err;
+}
+
 const struct iommu_ops *of_iommu_configure(struct device *dev,
   struct device_node *master_np)
 {
@@ -190,6 +208,8 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
 
err = pci_for_each_dma_alias(to_pci_dev(dev),
of_pci_iommu_init, &info);
+   } else if (dev_is_fsl_mc(dev)) {
+   err = of_fsl_mc_iommu_init(to_fsl_mc_device(dev), master_np);
} else {
struct of_phandle_args iommu_spec;
int idx = 0;
-- 
1.9.1



[PATCH 2/7 v7] iommu/of: make of_pci_map_rid() available for other devices too

2018-09-10 Thread Nipun Gupta
The iommu-map property is also used by devices on the fsl-mc bus. This
patch moves of_pci_map_rid to a generic location, so that it
can be used by other busses too.

'of_pci_map_rid' is renamed here to 'of_map_rid' and there is no
functional change done in the API.

Signed-off-by: Nipun Gupta 
Reviewed-by: Rob Herring 
Reviewed-by: Robin Murphy 
Acked-by: Bjorn Helgaas 
---
 drivers/iommu/of_iommu.c |   5 +--
 drivers/of/base.c| 102 +++
 drivers/of/irq.c |   5 +--
 drivers/pci/of.c | 101 --
 include/linux/of.h   |  11 +
 include/linux/of_pci.h   |  10 -
 6 files changed, 117 insertions(+), 117 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5c36a8b..811e160 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -149,9 +149,8 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 
alias, void *data)
struct of_phandle_args iommu_spec = { .args_count = 1 };
int err;
 
-   err = of_pci_map_rid(info->np, alias, "iommu-map",
-"iommu-map-mask", _spec.np,
-iommu_spec.args);
+   err = of_map_rid(info->np, alias, "iommu-map", "iommu-map-mask",
+_spec.np, iommu_spec.args);
if (err)
return err == -ENODEV ? NO_IOMMU : err;
 
diff --git a/drivers/of/base.c b/drivers/of/base.c
index 848f549..c7aac81 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1995,3 +1995,105 @@ int of_find_last_cache_level(unsigned int cpu)
 
return cache_level;
 }
+
+/**
+ * of_map_rid - Translate a requester ID through a downstream mapping.
+ * @np: root complex device node.
+ * @rid: device requester ID to map.
+ * @map_name: property name of the map to use.
+ * @map_mask_name: optional property name of the mask to use.
+ * @target: optional pointer to a target device node.
+ * @id_out: optional pointer to receive the translated ID.
+ *
+ * Given a device requester ID, look up the appropriate implementation-defined
+ * platform ID and/or the target device which receives transactions on that
+ * ID, as per the "iommu-map" and "msi-map" bindings. Either of @target or
+ * @id_out may be NULL if only the other is required. If @target points to
+ * a non-NULL device node pointer, only entries targeting that node will be
+ * matched; if it points to a NULL value, it will receive the device node of
+ * the first matching target phandle, with a reference held.
+ *
+ * Return: 0 on success or a standard error code on failure.
+ */
+int of_map_rid(struct device_node *np, u32 rid,
+  const char *map_name, const char *map_mask_name,
+  struct device_node **target, u32 *id_out)
+{
+   u32 map_mask, masked_rid;
+   int map_len;
+   const __be32 *map = NULL;
+
+   if (!np || !map_name || (!target && !id_out))
+   return -EINVAL;
+
+   map = of_get_property(np, map_name, &map_len);
+   if (!map) {
+   if (target)
+   return -ENODEV;
+   /* Otherwise, no map implies no translation */
+   *id_out = rid;
+   return 0;
+   }
+
+   if (!map_len || map_len % (4 * sizeof(*map))) {
+   pr_err("%pOF: Error: Bad %s length: %d\n", np,
+   map_name, map_len);
+   return -EINVAL;
+   }
+
+   /* The default is to select all bits. */
+   map_mask = 0xffffffff;
+
+   /*
+* Can be overridden by "{iommu,msi}-map-mask" property.
+* If of_property_read_u32() fails, the default is used.
+*/
+   if (map_mask_name)
+   of_property_read_u32(np, map_mask_name, &map_mask);
+
+   masked_rid = map_mask & rid;
+   for ( ; map_len > 0; map_len -= 4 * sizeof(*map), map += 4) {
+   struct device_node *phandle_node;
+   u32 rid_base = be32_to_cpup(map + 0);
+   u32 phandle = be32_to_cpup(map + 1);
+   u32 out_base = be32_to_cpup(map + 2);
+   u32 rid_len = be32_to_cpup(map + 3);
+
+   if (rid_base & ~map_mask) {
+   pr_err("%pOF: Invalid %s translation - %s-mask (0x%x) 
ignores rid-base (0x%x)\n",
+   np, map_name, map_name,
+   map_mask, rid_base);
+   return -EFAULT;
+   }
+
+   if (masked_rid < rid_base || masked_rid >= rid_base + rid_len)
+   continue;
+
+   phandle_node = of_find_node_by_phandle(phandle);
+   if (!phandle_node)
+   return -ENODEV;
+
+   if (target) {
+   if (*target)
+   of_node_put(phandle_node);
+   else
+   *target = phandle_node;
+
+   if 

[PATCH 1/7 v7] Documentation: fsl-mc: add iommu-map device-tree binding for fsl-mc bus

2018-09-10 Thread Nipun Gupta
The existing IOMMU bindings cannot be used to specify the relationship
between fsl-mc devices and IOMMUs. This patch adds a generic binding for
mapping fsl-mc devices to IOMMUs, using iommu-map property.

Signed-off-by: Nipun Gupta 
Reviewed-by: Rob Herring 
Acked-by: Robin Murphy 
---
 .../devicetree/bindings/misc/fsl,qoriq-mc.txt  | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt 
b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
index 6611a7c..01fdc33 100644
--- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
+++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
@@ -9,6 +9,25 @@ blocks that can be used to create functional hardware 
objects/devices
 such as network interfaces, crypto accelerator instances, L2 switches,
 etc.
 
+For an overview of the DPAA2 architecture and fsl-mc bus see:
+Documentation/networking/dpaa2/overview.rst
+
+As described in the above overview, all DPAA2 objects in a DPRC share the
+same hardware "isolation context" and a 10-bit value called an ICID
+(isolation context id) is expressed by the hardware to identify
+the requester.
+
+The generic 'iommus' property is insufficient to describe the relationship
+between ICIDs and IOMMUs, so an iommu-map property is used to define
+the set of possible ICIDs under a root DPRC and how they map to
+an IOMMU.
+
+For generic IOMMU bindings, see
+Documentation/devicetree/bindings/iommu/iommu.txt.
+
+For arm-smmu binding, see:
+Documentation/devicetree/bindings/iommu/arm,smmu.txt.
+
 Required properties:
 
 - compatible
@@ -88,14 +107,34 @@ Sub-nodes:
   Value type: 
   Definition: Specifies the phandle to the PHY device node 
associated
   with the this dpmac.
+Optional properties:
+
+- iommu-map: Maps an ICID to an IOMMU and associated iommu-specifier
+  data.
+
+  The property is an arbitrary number of tuples of
+  (icid-base,iommu,iommu-base,length).
+
+  Any ICID i in the interval [icid-base, icid-base + length) is
+  associated with the listed IOMMU, with the iommu-specifier
+  (i - icid-base + iommu-base).
 
 Example:
 
+smmu: iommu@500 {
+   compatible = "arm,mmu-500";
+   #iommu-cells = <1>;
+   stream-match-mask = <0x7C00>;
+   ...
+};
+
 fsl_mc: fsl-mc@80c00 {
 compatible = "fsl,qoriq-mc";
 reg = <0x0008 0x0c00 0 0x40>,/* MC portal base */
   <0x 0x0834 0 0x4>; /* MC control reg */
msi-parent = <&its>;
+/* define map for ICIDs 23-64 */
iommu-map = <23 &smmu 23 41>;
 #address-cells = <3>;
 #size-cells = <1>;
 
-- 
1.9.1



[PATCH 0/7 v7] Support for fsl-mc bus and its devices in SMMU

2018-09-10 Thread Nipun Gupta
This patchset defines IOMMU DT binding for fsl-mc bus and adds
support in SMMU for fsl-mc bus.

These patches
  - Define property 'iommu-map' for fsl-mc bus (patch 1)
  - Integrates the fsl-mc bus with the SMMU using this
IOMMU binding (patch 2,3,4)
  - Adds the dma configuration support for fsl-mc bus (patch 5, 6)
  - Updates the fsl-mc device node with iommu/dma related changes (patch 7)

Changes in v2:
  - use iommu-map property for fsl-mc bus
  - rebase over patchset https://patchwork.kernel.org/patch/10317337/
and make corresponding changes for dma configuration of devices on
fsl-mc bus

Changes in v3:
  - move of_map_rid in drivers/of/address.c

Changes in v4:
  - move of_map_rid in drivers/of/base.c

Changes in v5:
  - break patch 5 in two separate patches (now patch 5/7 and patch 6/7)
  - add changelog text in patch 3/7 and patch 5/7
  - typo fix

Changes in v6:
  - Updated fsl_mc_device_group() API to be more rational
  - Added dma-coherent property in the LS2 smmu device node
  - Minor fixes in the device-tree documentation

Changes in v7:
  - Rebased over linux 4.19

Nipun Gupta (7):
  Documentation: fsl-mc: add iommu-map device-tree binding for fsl-mc
bus
  iommu/of: make of_pci_map_rid() available for other devices too
  iommu/of: support iommu configuration for fsl-mc devices
  iommu/arm-smmu: Add support for the fsl-mc bus
  bus: fsl-mc: support dma configure for devices on fsl-mc bus
  bus: fsl-mc: set coherent dma mask for devices on fsl-mc bus
  arm64: dts: ls208xa: comply with the iommu map binding for fsl_mc

 .../devicetree/bindings/misc/fsl,qoriq-mc.txt  |  39 
 arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi |   7 +-
 drivers/bus/fsl-mc/fsl-mc-bus.c|  16 +++-
 drivers/iommu/arm-smmu.c   |   7 ++
 drivers/iommu/iommu.c  |  13 +++
 drivers/iommu/of_iommu.c   |  25 -
 drivers/of/base.c  | 102 +
 drivers/of/irq.c   |   5 +-
 drivers/pci/of.c   | 101 
 include/linux/fsl/mc.h |   8 ++
 include/linux/iommu.h  |   2 +
 include/linux/of.h |  11 +++
 include/linux/of_pci.h |  10 --
 13 files changed, 224 insertions(+), 122 deletions(-)

-- 
1.9.1



Re: [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE

2018-09-10 Thread Laurent Dufour
Hi Michael,

Do you plan to pull it for 4.20 ?

Cheers,
Laurent.

On 20/08/2018 16:29, Laurent Dufour wrote:
> On very large systems we could see a soft lockup fired when a process is
> exiting
> 
> watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
> Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c 
> xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum 
> lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac 
> scsi_dh_alua autofs4
> CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
> NIP:  c00b995c LR: c00b8f64 CTR: aa18
> REGS: c6b0645b7610 TRAP: 0901   Not tainted  (4.17.0)
> MSR:  80010280b033   CR: 22042082  
> XER: 
> CFAR: 006cf8f0 SOFTE: 0 
> GPR00: 0010 c6b0645b7890 c0f99200  
> GPR04: 8e01a5a4de58 400249cf1bfd5480 8e01a5a4de50 400249cf1bfd5480 
> GPR08: 8e01a5a4de48 400249cf1bfd5480 8e01a5a4de40 400249cf1bfd5480 
> GPR12:  c0001e690800 
> NIP [c00b995c] plpar_hcall9+0x44/0x7c
> LR [c00b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
> Call Trace:
> [c6b0645b7890] [8e01a5a4dd20] 0x8e01a5a4dd20 (unreliable)
> [c6b0645b7a00] [c006d5b0] flush_hash_range+0x60/0x110
> [c6b0645b7a50] [c0072a2c] __flush_tlb_pending+0x4c/0xd0
> [c6b0645b7a80] [c02eaf44] unmap_page_range+0x984/0xbd0
> [c6b0645b7bc0] [c02eb594] unmap_vmas+0x84/0x100
> [c6b0645b7c10] [c02f8afc] exit_mmap+0xac/0x1f0
> [c6b0645b7cd0] [c00f2638] mmput+0x98/0x1b0
> [c6b0645b7d00] [c00fc9d0] do_exit+0x330/0xc00
> [c6b0645b7dc0] [c00fd384] do_group_exit+0x64/0x100
> [c6b0645b7e00] [c00fd44c] sys_exit_group+0x2c/0x30
> [c6b0645b7e30] [c000b960] system_call+0x58/0x6c
> Instruction dump:
> 6000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 
> e9410060 e9610068 e9810070 4422 <7d806378> e9810028 f88c f8ac0008
> 
> This happens when removing the PTE by calling the hypervisor using the
> H_BULK_REMOVE call. This call is processing up to 4 PTEs but is doing a
> tlbie for each PTE it is processing. This could lead to long time spent in
> the hypervisor (sometimes up to 4s) and soft lockup being raised because
> the scheduler is not called in zap_pte_range().
> 
> Since the Power7's time, the hypervisor is providing a new hcall
> H_BLOCK_REMOVE allowing processing up to 8 PTEs with one call to
> tlbie. By limiting the amount of tlbie generated, this reduces the time
> spent invalidating the PTEs.
> 
> This hcall requires that the pages are "all within the same naturally
> aligned 8 page virtual address block".
> 
> With this patch series applied, I couldn't see any soft lockup raised on
> the victim LPAR I was running the test on.
> 
> Changes since V1:
> - Remove a call to BUG_ON() in call_block_remove() since this one can be
>   handled gently.
> - Remove unneeded of current_vpgb to 0 when retrying entries in
>   hugepage_block_invalidate() and do_block_remove().
> 
> Laurent Dufour (3):
>   powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
>   powerpc/pseries/mm: factorize PTE slot computation
>   powerpc/pseries/mm: call H_BLOCK_REMOVE
> 
>  arch/powerpc/include/asm/firmware.h   |   3 +-
>  arch/powerpc/include/asm/hvcall.h |   1 +
>  arch/powerpc/platforms/pseries/firmware.c |   1 +
>  arch/powerpc/platforms/pseries/lpar.c | 241 
> --
>  4 files changed, 230 insertions(+), 16 deletions(-)
> 
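
A minimal sketch (an assumption about the constraint, not code from the
series) of the "naturally aligned 8 page virtual address block" requirement
mentioned above: two virtual page numbers can only be batched into one
H_BLOCK_REMOVE when they agree on all bits above the low three.

static bool same_8page_block(unsigned long vpn_a, unsigned long vpn_b)
{
	return (vpn_a >> 3) == (vpn_b >> 3);
}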



Re: Conflict between sparse and commit cafa0010cd51f ("Raise the minimum required gcc version to 4.6")

2018-09-10 Thread Christophe Leroy




On 09/10/2018 11:34 AM, Luc Van Oostenryck wrote:

On Mon, Sep 10, 2018 at 09:56:33AM +, Christophe Leroy wrote:


# export REAL_CC=ppc-linux-gcc
# make CHECK="cgcc -target=ppc -D_CALL_ELF=2 -D__GCC__=5
-D__GCC_MINOR__=4" C=2 arch/powerpc/kernel/process.o
scripts/kconfig/conf  --syncconfig Kconfig
#
# configuration written to .config
#
   UPD include/config/kernel.release
   UPD include/generated/utsrelease.h
   CC  kernel/bounds.s
   CC  arch/powerpc/kernel/asm-offsets.s
   CALLscripts/checksyscalls.sh
   CHECK   scripts/mod/empty.c
Can't exec "/bin/sh": Argument list too long at /usr/local/bin/cgcc line 86.
make[2]: *** [scripts/mod/empty.o] Error 1
make[1]: *** [scripts/mod] Error 2
make: *** [scripts] Error 2


OK. Clearly nobody has ever used it so :(
There is an infinite loop because cgcc uses the env var CHECK
to call sparse while kbuild uses CHECK to call cgcc here.

The following seems to work here.
$ export REAL_CC=ppc-linux-gcc
$ make CHECK="CHECK=sparse cgcc -target=ppc ...


Not yet ...

[root@pc16082vm linux-powerpc]# export REAL_CC=ppc-linux-gcc
[root@pc16082vm linux-powerpc]# make CHECK="CHECK=sparse cgcc 
-target=ppc -D_CALL_ELF=2 -D__GNUC__=5 -D__GNUC_MINOR__=4" C=2 
arch/powerpc/kernel/process.o

  CALLscripts/checksyscalls.sh
  CHECK   scripts/mod/empty.c
:0:0: warning: "__STDC__" redefined
: note: this is the location of the previous definition
/opt/cldk-1.4.0/lib/gcc/ppc-linux/5.4.0/../../../../ppc-linux/lib/crt1.o:(.rodata+0x4): 
undefined reference to `main'

collect2: error: ld returned 1 exit status
make[2]: *** [scripts/mod/empty.o] Error 1
make[1]: *** [scripts/mod] Error 2
make: *** [scripts] Error 2

Christophe



It's a bit kludgy, I admit.

-- Luc



Re: v4.17 regression: PowerMac G3 won't boot, was Re: [PATCH v5 1/3] of: cache phandle nodes to reduce cost of of_find_node_by_phandle()

2018-09-10 Thread Rob Herring
On Sun, Sep 09, 2018 at 07:04:25PM +0200, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-31 at 14:58 +1000, Benjamin Herrenschmidt wrote:
> > 
> > > A long shot, but something to consider, is that I failed to cover the
> > > cases of dynamic devicetree updates (removing nodes that contain a
> > > phandle) in ways other than overlays.  Michael Ellerman has reported
> > > such a problem for powerpc/mobility with of_detach_node().  A patch to
> > > fix that is one of the tasks I need to complete.
> > 
> > The only thing I can think of is booting via the BootX bootloader on
> > those ancient macs results in a DT with no phandles. I didn't see an
> > obvious reason why that would cause that patch to break though.
> 
> Guys, we still don't have a fix for this one on its way upstream...
> 
> My test patch just creates phandle properties for all nodes, that was
> not intended as a fix, more a way to check if the problem was related
> to the lack of phandles.
> 
> I don't actually know why the new code causes things to fail when
> phandles are absent. This needs to be looked at.
> 
> I'm travelling at the moment and generally caught up with other things,
> I haven't had a chance to dig, so just a heads up. I don't intend to
> submit my patch since it's just a band aid. We need to figure out what
> the actual problem is.

Can you try this patch (w/o Ben's patch). I think the problem is if 
there are no phandles, then roundup_pow_of_two is passed 0 which is 
documented as undefined result.

Though, if a DT has no properties with phandles, then why are we doing a 
lookup in the first place?


8<--

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 9095b8290150..74eaedd5b860 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -140,6 +140,9 @@ void of_populate_phandle_cache(void)
if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
phandles++;
 
+   if (!phandles)
+   goto out;
+
cache_entries = roundup_pow_of_two(phandles);
phandle_cache_mask = cache_entries - 1;
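
For context, a hedged sketch of why a count of zero is a problem here,
assuming the usual implementation style of roundup_pow_of_two() for runtime
values, i.e. 1UL << fls_long(n - 1):

#include <limits.h>

static int fls_long_sketch(unsigned long x)
{
	return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clzl(x) : 0;
}

static unsigned long roundup_pow_of_two_sketch(unsigned long n)
{
	/* For n == 0, n - 1 wraps to ULONG_MAX, fls_long() returns the word
	 * size, and shifting 1UL by BITS_PER_LONG is undefined behaviour,
	 * hence the guard added above.
	 */
	return 1UL << fls_long_sketch(n - 1);
}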
 


Re: Conflict between sparse and commit cafa0010cd51f ("Raise the minimum required gcc version to 4.6")

2018-09-10 Thread Luc Van Oostenryck
On Mon, Sep 10, 2018 at 09:56:33AM +, Christophe Leroy wrote:
> 
> # export REAL_CC=ppc-linux-gcc
> # make CHECK="cgcc -target=ppc -D_CALL_ELF=2 -D__GCC__=5
> -D__GCC_MINOR__=4" C=2 arch/powerpc/kernel/process.o
> scripts/kconfig/conf  --syncconfig Kconfig
> #
> # configuration written to .config
> #
>   UPD include/config/kernel.release
>   UPD include/generated/utsrelease.h
>   CC  kernel/bounds.s
>   CC  arch/powerpc/kernel/asm-offsets.s
>   CALLscripts/checksyscalls.sh
>   CHECK   scripts/mod/empty.c
> Can't exec "/bin/sh": Argument list too long at /usr/local/bin/cgcc line 86.
> make[2]: *** [scripts/mod/empty.o] Error 1
> make[1]: *** [scripts/mod] Error 2
> make: *** [scripts] Error 2

OK. Clearly nobody has ever used it so :(
There is an infinite loop because cgcc uses the env var CHECK
to call sparse while kbuild uses CHECK to call cgcc here.

The following seems to work here.
$ export REAL_CC=ppc-linux-gcc
$ make CHECK="CHECK=sparse cgcc -target=ppc ...

It's a bit kludgy, I admit.

-- Luc


[tip:sched/core] sched/topology: Set correct NUMA topology type

2018-09-10 Thread tip-bot for Srikar Dronamraju
Commit-ID:  e5e96fafd9028b1478b165db78c52d981c14f471
Gitweb: https://git.kernel.org/tip/e5e96fafd9028b1478b165db78c52d981c14f471
Author: Srikar Dronamraju 
AuthorDate: Fri, 10 Aug 2018 22:30:18 +0530
Committer:  Ingo Molnar 
CommitDate: Mon, 10 Sep 2018 10:13:45 +0200

sched/topology: Set correct NUMA topology type

With the following commit:

  051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain")

the scheduler introduced a new NUMA level. However, this causes the NUMA topology
on 2 node systems to no longer be marked as NUMA_DIRECT.

After this commit, it gets reported as NUMA_BACKPLANE, because
sched_domains_numa_level is now 2 on 2 node systems.

Fix this by allowing setting systems that have up to 2 NUMA levels as
NUMA_DIRECT.

While here remove code that assumes that level can be 0.

Signed-off-by: Srikar Dronamraju 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andre Wild 
Cc: Heiko Carstens 
Cc: Linus Torvalds 
Cc: Mel Gorman 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Suravee Suthikulpanit 
Cc: Thomas Gleixner 
Cc: linuxppc-dev 
Fixes: 051f3ca02e46 "Introduce NUMA identity node sched domain"
Link: 
http://lkml.kernel.org/r/1533920419-17410-1-git-send-email-sri...@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/topology.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 56a0fed30c0a..505a41c42b96 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1295,7 +1295,7 @@ static void init_numa_topology_type(void)
 
n = sched_max_numa_distance;
 
-   if (sched_domains_numa_levels <= 1) {
+   if (sched_domains_numa_levels <= 2) {
sched_numa_topology_type = NUMA_DIRECT;
return;
}
@@ -1380,9 +1380,6 @@ void sched_init_numa(void)
break;
}
 
-   if (!level)
-   return;
-
/*
 * 'level' contains the number of unique distances
 *


Re: [PATCH] powerpc: Avoid code patching freed init sections

2018-09-10 Thread Michal Suchánek
On Mon, 10 Sep 2018 12:16:35 +0200
Christophe LEROY  wrote:

> Le 10/09/2018 à 12:05, Michael Neuling a écrit :
> >   
> >>> + /* Make sure we aren't patching a freed init section */
> >>> + if (in_init_section(patch_addr) && init_freed())
> >>> + return 0;
> >>> +  
> >>
> >> Do we even need the init_freed() check?  
> > 
> > Maybe not.  If userspace isn't up, then maybe it's ok to skip.  
> 
> Euh ... Do you mean you'll skip all patches into init functions ?
> But code patching is not only for meltdown/spectre workarounds, some
> of the patchings might be needed for the init functions themselves.

Some stuff like cpu feature tests have an early variant that does not
need patching but maybe not everything has.

And some stuff like lwsync might also be expanded from some macros or
inlines and may be needed in the init code. It might be questionable to
rely on it getting patched, though. Hard to tell without seeing what is
actually patched where.

Thanks

Michal
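
For reference, a hedged sketch of what an address check like the
in_init_section() discussed above could look like, assuming the usual
__init_begin/__init_end markers from asm/sections.h (not necessarily the
form the final patch takes):

/* Linker-provided bounds of the init sections. */
extern char __init_begin[], __init_end[];

static bool in_init_section_sketch(unsigned long addr)
{
	return addr >= (unsigned long)__init_begin &&
	       addr <  (unsigned long)__init_end;
}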


Re: [PATCH] powerpc: Avoid code patching freed init sections

2018-09-10 Thread Christophe LEROY




Le 10/09/2018 à 12:05, Michael Neuling a écrit :



+   /* Make sure we aren't patching a freed init section */
+   if (in_init_section(patch_addr) && init_freed())
+   return 0;
+


Do we even need the init_freed() check?


Maybe not.  If userspace isn't up, then maybe it's ok to skip.


Euh ... Do you mean you'll skip all patches into init functions ?
But code patching is not only for meltdown/spectre workarounds, some of 
the patchings might be needed for the init functions themselves.


Christophe




What user input can we process in init-only code?


See the stack trace in the commit message. It's a weird case for KVM guests in
KVM PR mode.

That's the only case I have found so far.


Also it would be nice to write the function+offset of the skipped patch
location into the kernel log.


OK. I'll update.

Mikey



Re: [PATCH] powerpc: Avoid code patching freed init sections

2018-09-10 Thread Michael Neuling


> > +   /* Make sure we aren't patching a freed init section */
> > +   if (in_init_section(patch_addr) && init_freed())
> > +   return 0;
> > +
> 
> Do we even need the init_freed() check?

Maybe not.  If userspace isn't up, then maybe it's ok to skip.

> What user input can we process in init-only code?

See the stack trace in the commit message. It's a weird case for KVM guests in
KVM PR mode. 

That's the only case I have found so far.

> Also it would be nice to write the function+offset of the skipped patch
> location into the kernel log.

OK. I'll update.

Mikey


Re: [PATCH] powerpc: Avoid code patching freed init sections

2018-09-10 Thread Michael Neuling


> > For stable I've marked this as v4.13+ since that's when we refactored
> > code-patching.c but it could go back even further than that. In
> > reality though, I think we can only hit this since the first
> > spectre/meltdown changes.
> 
> Which means it affects all maintained stable trees because the
> spectre/meltdown changes are backported.

Yep, we hit this on SLES12 SP3

Mikey


Re: Conflict between sparse and commit cafa0010cd51f ("Raise the minimum required gcc version to 4.6")

2018-09-10 Thread Christophe Leroy




On 09/10/2018 09:28 AM, Luc Van Oostenryck wrote:

On Mon, Sep 10, 2018 at 08:49:07AM +0200, Christophe LEROY wrote:

Le 07/09/2018 à 20:19, Nick Desaulniers a écrit :

On Fri, Sep 7, 2018 at 11:13 AM Luc Van Oostenryck wrote:


Sparse expands these macros to the same version as the compiler used
to compile it. I find it a bit strange though to have sparse v0.5.2 but
using an old compiler.


So Christophe must have a version of gcc < 4.6 installed somewhere?
Does sparse use `cc`? If so, Christophe, does your `ls -l $(which cc)`
point to an old version of gcc maybe?


Indeed it looks like sparse expands these macros to the version of
the compiler it was compiled with.

I'm building kernels for powerpc platforms, with CROSS_COMPILE set
to ppc-linux- and ppc-linux-gcc being version 5.4

However my build machine is a CentOS6 and the native gcc has version
4.4.7, so sparse expands that version.


OK, I see.
  

Is there a way to get sparse in line with my cross compiler version
and not with the local native version ?


When cross-compiling, there are also things like the machine word-size
and the endianness to take into account (they also default to the
native compiler used to compile sparse itself) as well as a few
defines (like __PPC64__). To be in line with your cross-compiler
you can use the wrapper 'cgcc' (installed with sparse) and call
it, for example, like this:
$ export REAL_CC=ppc-linux-gcc
$ cgcc -target=ppcc64 -D_CALL_ELF=2 -D__GCC__=5 -D__GCC_MINOR__=4 ...
or, since this is for the kernel:
$ export REAL_CC=ppc-linux-gcc
$ make CHECK='cgcc -target=ppcc64 ...


I think this should solve it. Do not hesitate to report any
difficulties you may encounter.


# export REAL_CC=ppc-linux-gcc
# make CHECK="cgcc -target=ppc -D_CALL_ELF=2 -D__GCC__=5 
-D__GCC_MINOR__=4" C=2 arch/powerpc/kernel/process.o

scripts/kconfig/conf  --syncconfig Kconfig
#
# configuration written to .config
#
  UPD include/config/kernel.release
  UPD include/generated/utsrelease.h
  CC  kernel/bounds.s
  CC  arch/powerpc/kernel/asm-offsets.s
  CALLscripts/checksyscalls.sh
  CHECK   scripts/mod/empty.c
Can't exec "/bin/sh": Argument list too long at /usr/local/bin/cgcc line 86.
make[2]: *** [scripts/mod/empty.o] Error 1
make[1]: *** [scripts/mod] Error 2
make: *** [scripts] Error 2

Christophe



-- Luc



Re: [PATCH] powerpc: Avoid code patching freed init sections

2018-09-10 Thread Michal Suchánek
On Mon, 10 Sep 2018 15:44:05 +1000
Michael Neuling  wrote:

> This stops us from doing code patching in init sections after they've
> been freed.
> 
> In this chain:
>   kvm_guest_init() ->
> kvm_use_magic_page() ->
>   fault_in_pages_readable() ->
>__get_user() ->
>  __get_user_nocheck() ->
>barrier_nospec();
> 
> We have a code patching location at barrier_nospec() and
> kvm_guest_init() is an init function. This whole chain gets inlined,
> so when we free the init section (hence kvm_guest_init()), this code
> goes away and hence should no longer be patched.
> 
> We've seen this as userspace memory corruption when using a memory
> checker while doing partition migration testing on powervm (this
> starts the code patching post migration via
> /sys/kernel/mobility/migration). In theory, it could also happen when
> using /sys/kernel/debug/powerpc/barrier_nospec.
> 
> With this patch there is a small chance of a race if we code patch
> between the init section being freed and setting SYSTEM_RUNNING (in
> kernel_init()) but that seems like an impractical time and small
> window for any code patching to occur.
> 
> cc: sta...@vger.kernel.org # 4.13+
> Signed-off-by: Michael Neuling 
> 
> ---
> For stable I've marked this as v4.13+ since that's when we refactored
> code-patching.c but it could go back even further than that. In
> reality though, I think we can only hit this since the first
> spectre/meltdown changes.
> ---
>  arch/powerpc/lib/code-patching.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/arch/powerpc/lib/code-patching.c
> b/arch/powerpc/lib/code-patching.c index 850f3b8f4d..a2bc08bfd8 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -23,11 +23,30 @@
>  #include 
>  #include 
>  
> +
> +static inline bool in_init_section(unsigned int *patch_addr)
> +{
> + if (patch_addr < (unsigned int *)__init_begin)
> + return false;
> + if (patch_addr >= (unsigned int *)__init_end)
> + return false;
> + return true;
> +}
> +
> +static inline bool init_freed(void)
> +{
> + return (system_state >= SYSTEM_RUNNING);
> +}
> +
>  static int __patch_instruction(unsigned int *exec_addr, unsigned int
> instr, unsigned int *patch_addr)
>  {
>   int err;
>  
> + /* Make sure we aren't patching a freed init section */
> + if (in_init_section(patch_addr) && init_freed())
> + return 0;
> +

Do we even need the init_freed() check?

What user input can we process in init-only code?

Also it would be nice to write the function+offset of the skipped patch
location into the kernel log.
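
(A minimal sketch of what that logging could look like, reusing the
in_init_section()/init_freed() helpers from the patch above; %pS prints
symbol+offset. This is an illustration only, not the actual follow-up:)

	/* Make sure we aren't patching a freed init section */
	if (in_init_section(patch_addr) && init_freed()) {
		pr_warn("Skipping patch of freed init text at %pS\n", patch_addr);
		return 0;
	}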

Thanks

Michal


RE: [PATCH 1/3] soc: fsl: add Platform PM driver QorIQ platforms

2018-09-10 Thread Ran Wang
Hi Scott,

On 2018/9/8 4:35, Scott Wood wrote:
> 
> On Fri, 2018-08-31 at 11:52 +0800, Ran Wang wrote:
> > This driver provides an independent framework for a PM service
> > provider and consumer to configure system-level wakeup features. For
> > example, the RCPM driver registers a callback function with this
> > platform driver first, and the Flex timer driver, which wants to enable
> > the timer wakeup feature, calls the generic API provided by this
> > platform driver, which then triggers the RCPM driver to do the work.
> > The benefit is to isolate the user and the service: the flex timer
> > driver does not have to know the implementation details of the wakeup
> > function it requires. Besides, it is also easy for the service side to
> > upgrade its logic when the design changes while the user side remains
> > unchanged.
> >
> > Signed-off-by: Ran Wang 
> > ---
> >  drivers/soc/fsl/Kconfig   |   14 +
> >  drivers/soc/fsl/Makefile  |1 +
> >  drivers/soc/fsl/plat_pm.c |  144
> > +
> >  include/soc/fsl/plat_pm.h |   22 +++
> >  4 files changed, 181 insertions(+), 0 deletions(-)  create mode
> > 100644 drivers/soc/fsl/plat_pm.c  create mode 100644
> > include/soc/fsl/plat_pm.h
> >
> > diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig index
> > 7a9fb9b..6517412 100644
> > --- a/drivers/soc/fsl/Kconfig
> > +++ b/drivers/soc/fsl/Kconfig
> > @@ -16,3 +16,17 @@ config FSL_GUTS
> >   Initially only reading SVR and registering soc device are
> > supported.
> >   Other guts accesses, such as reading RCW, should eventually be
> > moved
> >   into this driver as well.
> +
> > +config FSL_PLAT_PM
> > +   bool "Freescale platform PM framework"
> 
> This name seems to be simultaneously too generic (for something that is
> likely intended only for use with certain Freescale/NXP chip families) and too
> specific (for something that seems to be general infrastructure with no real
> hardware dependencies).

Yes, this driver has no real HW dependencies at all. But we'd like to
introduce it to provide more flexibility and a generic way to configure
FSL PM features (so far we have RCPM for system wakeup source control).
I think it's good for porting drivers/IPs among different SoCs in the
future. As to the name, do you have a better suggestion?
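
(For illustration, a generic sketch of the provider/consumer split being
described; the names below are hypothetical and not the actual plat_pm API
from the patch:)

	#include <linux/device.h>
	#include <linux/errno.h>

	struct plat_pm_ops {
		/* filled in by a service provider such as the RCPM driver */
		int (*set_wakeup_source)(struct device *dev, bool enable);
	};

	static const struct plat_pm_ops *pm_service;

	int plat_pm_register_ops(const struct plat_pm_ops *ops)
	{
		if (pm_service)
			return -EBUSY;
		pm_service = ops;
		return 0;
	}

	/* called by a consumer such as the flextimer driver */
	int plat_pm_enable_wakeup(struct device *dev)
	{
		if (!pm_service)
			return -ENODEV;
		return pm_service->set_wakeup_source(dev, true);
	}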

> What specific problems with Linux's generic wakeup infrastructure is this
> trying to solve, and why would those problems not be better solved there?

Actually, I am not sure the generic wakeup infrastructure has this kind of
PM feature (keeping a specific IP alive during system suspend; could you
please show me?). And I think it is not a common requirement, so I decided
to put it in the FSL folder.

> Also, you should CC linux-pm on these patches.

Yes, thanks for suggestion

Regards,
Ran

> -Scott



RE: [PATCH 3/3] soc: fsl: add RCPM driver

2018-09-10 Thread Ran Wang
Hi Scott,

On 2018/9/8 18:16, Scott Wood wrote:
> 
> On Fri, 2018-08-31 at 11:52 +0800, Ran Wang wrote:
> > The NXP's QorIQ Processors based on ARM Core have RCPM module (Run
> > Control and Power Management), which performs all device-level tasks
> > associated with power management such as wakeup source control.
> >
> > This driver depends on the FSL platform PM driver framework which helps to
> > isolate the user and the PM service provider (such as the RCPM driver).
> >
> > Signed-off-by: Chenhui Zhao 
> > Signed-off-by: Ying Zhang 
> > Signed-off-by: Ran Wang 
> > ---
> >  drivers/soc/fsl/Kconfig   |6 ++
> >  drivers/soc/fsl/Makefile  |1 +
> >  drivers/soc/fsl/ls-rcpm.c |  153
> > +
> >  3 files changed, 160 insertions(+), 0 deletions(-)  create mode
> > 100644 drivers/soc/fsl/ls-rcpm.c
> 
> Is there a reason why this is LS-specific, or could it be used with PPC RCPM
> blocks?

They have different SW architectures for low power operation: the PPC RCPM
driver takes care of most of the suspend enter/exit work, while the LS RCPM
driver only handles wakeup source configuration and leaves the rest to
system firmware. So the LS RCPM driver only gets called when the plat_pm
driver API is called, rather than when system suspend begins.

> 
> > diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig index
> > 6517412..882330d 100644
> > --- a/drivers/soc/fsl/Kconfig
> > +++ b/drivers/soc/fsl/Kconfig
> > @@ -30,3 +30,9 @@ config FSL_PLAT_PM
> >   have to know the implement details of wakeup function it require.
> >   Besides, it is also easy for service side to upgrade its logic
> > when
> >   design changed and remain user side unchanged.
> > +
> > +config LS_RCPM
> > +   bool "Freescale RCPM support"
> > +   depends on (FSL_PLAT_PM)
> 
> Why is this parenthesized?

Because we'd like to decouple the RCPM driver and its user.
The benefit is that the user doesn't have to know who serves it for some
PM features (such as wakeup source control), and it provides some
flexibility when either RCPM or the user driver evolves in the future.
So I added a plat_pm driver to prevent the wakeup IP from knowing any
details of RCPM.

Regards,
Ran

> 
> -Scott



[PATCH 2/2] powerpc/boot: Ensure _zimage_start is a weak symbol

2018-09-10 Thread Joel Stanley
When building with clang, crt0's _zimage_start is not marked weak, which
breaks the build when linking the kernel image:

 $ objdump -t arch/powerpc/boot/crt0.o |grep _zimage_start$
 0058 g   .text   _zimage_start

 ld: arch/powerpc/boot/wrapper.a(crt0.o): in function '_zimage_start':
 (.text+0x58): multiple definition of '_zimage_start';
 arch/powerpc/boot/pseries-head.o:(.text+0x0): first defined here

Clang requires the .weak directive to appear after the symbol is
declared. The binutils manual says:

 This directive sets the weak attribute on the comma separated list of
 symbol names. If the symbols do not already exist, they will be
 created.

So it appears this is different with clang. The only reference I could
see for this was an OpenBSD mailing list post[1].

Changing it to be after the declaration fixes building with Clang, and
still works with GCC.

 $ objdump -t arch/powerpc/boot/crt0.o |grep _zimage_start$
 0058  w  .text  _zimage_start

[1] https://groups.google.com/forum/#!topic/fa.openbsd.tech/PAgKKen2YCY

Signed-off-by: Joel Stanley 
---
 arch/powerpc/boot/crt0.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/crt0.S b/arch/powerpc/boot/crt0.S
index ace3f3c64620..41c6d03d6e2d 100644
--- a/arch/powerpc/boot/crt0.S
+++ b/arch/powerpc/boot/crt0.S
@@ -47,8 +47,8 @@ p_end:.long   _end
 p_pstack:  .long   _platform_stack_top
 #endif
 
-   .weak   _zimage_start
.globl  _zimage_start
+   .weak   _zimage_start
 _zimage_start:
.globl  _zimage_start_lib
 _zimage_start_lib:
-- 
2.17.1



[PATCH 1/2] powerpc/boot: Fix crt0.S syntax for clang

2018-09-10 Thread Joel Stanley
Clang's assembler does not like the syntax of the cmpdi:

 arch/powerpc/boot/crt0.S:168:22: error: unexpected modifier on variable 
reference
 cmpdi   12,RELACOUNT@l
  ^
 arch/powerpc/boot/crt0.S:168:11: error: unknown operand
 cmpdi   12,RELACOUNT@l
   ^
Enclosing RELACOUNT in () makes it happy. Tested with GCC 8 and Clang
8 (trunk).

Signed-off-by: Joel Stanley 
---
 arch/powerpc/boot/crt0.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/crt0.S b/arch/powerpc/boot/crt0.S
index dcf2f15e6797..ace3f3c64620 100644
--- a/arch/powerpc/boot/crt0.S
+++ b/arch/powerpc/boot/crt0.S
@@ -165,7 +165,7 @@ p_base: mflrr10 /* r10 now points to 
runtime addr of p_base */
ld  r13,8(r11)   /* get RELA pointer in r13 */
b   11f
 10:addis   r12,r12,(-RELACOUNT)@ha
-   cmpdi   r12,RELACOUNT@l
+   cmpdi   r12,(RELACOUNT)@l
bne 11f
ld  r8,8(r11)   /* get RELACOUNT value in r8 */
 11:addir11,r11,16
-- 
2.17.1



[PATCH 0/2] powerpc: Clang build fixes

2018-09-10 Thread Joel Stanley
Two fixes to get us closer to building with clang. With one patch[1]
on top of clang master I can build and boot a powernv kernel:

$ make ARCH=powerpc powernv_defconfig
$ ./scripts/config -e PPC_DISABLE_WERROR -d FTRACE -d BTRFS_FS -d MD_RAID456
$ make CC=/scratch/joel/llvm-build/bin/clang-8 
CLANG_TRIPLE=powerpc64le-linux-gnu -j128

$ qemu-system-ppc64 -M powernv -m 3G -nographic -kernel zImage.epapr \
 -L ~/skiboot/ -initrd ~/rootfs.cpio.xz

Linux version 4.19.0-rc3-3-g728b25f26bce (joel@ozrom3) (clang version 8.0.0 
(trunk 341773)) #12 SMP Mon Sep 10 17:32:05 ACST 2018

The DISABLE_WERROR is due to clang's -Wduplicate-decl-specifier. Some
macros we have in arch/powerpc/include/asm/uaccess.h warn on 'const
typeof(var)', whereas the GCC version doesn't. Anton did fix this a
while ago, but the fix was 'reverted' to resolve some sparse warnings.
I think we should re-apply Anton's patch[2].

[1] https://reviews.llvm.org/D50965
[2] http://git.kernel.org/torvalds/c/b91c1e3e7a6f22a6b898e345b745b6a43273c973

Joel Stanley (2):
  powerpc/boot: Fix crt0.S syntax for clang
  powerpc/boot: Ensure _zimage_start is a weak symbol

 arch/powerpc/boot/crt0.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.17.1



Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-09-10 Thread Gerd Hoffmann
> > this to set the VIRTIO_F_IOMMU_PLATFORM flag. But for example
> > QEMU has the use of iommu_platform attribute disabled for virtio-gpu
> > device.  So would also like to move towards not having to specify
> > the VIRTIO_F_IOMMU_PLATFORM flag.
> 
> Specifying VIRTIO_F_IOMMU_PLATFORM is the right thing for your 
> platform given that you can't directly use physical addresses.
> Please fix qemu so that virtio-gpu works with VIRTIO_F_IOMMU_PLATFORM.

This needs both host and guest side changes btw.

Guest side patch is in drm-misc (a3b815f09bb8) and should land in the
next merge window.

Host side patches are here:
  https://git.kraxel.org/cgit/qemu/log/?h=sirius/virtio-gpu-iommu

Should also land in the next qemu version.

cheers,
  Gerd



RE: [PATCH 2/3] Documentation: dt: binding: fsl: update property description for RCPM

2018-09-10 Thread Ran Wang
Hi Scott,

On 2018/9/8 4:23, Scott Wood wrote:
> 
> On Fri, 2018-08-31 at 11:52 +0800, Ran Wang wrote:
> > +Optional properties:
> > + - big-endian : Indicate RCPM registers is big-endian. A RCPM node
> > +   that doesn't have this property will be regarded as little-endian.
> 
> You've just broken all the existing powerpc device trees that are big-endian
> and have no big-endian property.

Yes, the powerpc RCPM driver (arch/powerpc/sysdev/fsl_rcpm.c) will not refer
to big-endian. However, I think if the Layerscape RCPM driver uses a different
compatible id (such as 'fsl,qoriq-rcpm-2.2'), it might be safe. Is that OK?

> > + -  : This string
> > +   is referred by RCPM driver to judge if the consumer (such as flex timer)
> > +   is able to be regards as wakeup source or not, such as
> > + 'fsl,ls1012a-
> > ftm'.
> > +   Further, this property will carry the bit mask info to control
> > +   coresponding wake up source.
> 
> What will you do if there are multiple instances of a device with the same
> compatible, and different wakeup bits?  

You got me! This is a problem in the current version. Well, we have to
decouple the wakeup source IP and the RCPM driver. That's why I added a
plat_pm driver to prevent the wakeup IP from knowing any information about
RCPM. So in the current context, one idea that comes to me is to redesign
the property 'fsl,ls1012a-ftm = <0x2>;' and change it to 'fsl,ls1012a-ftm =
< 0x2>;'. What do you say? And could you tell me which API I can use to
check whether that device's name is ftm0 (for example, we want to find a
node ftm0: ftm@29d000)?
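
(If the property carried a phandle plus a bit mask as proposed, one way to
resolve it from the RCPM node (rcpm_np below) would be
of_parse_phandle_with_fixed_args(); this is only a sketch with hypothetical
names, not the actual driver code:)

	struct of_phandle_args args;
	int ret;

	/* one argument cell: the IPPDEXPCR bit mask for that IP */
	ret = of_parse_phandle_with_fixed_args(rcpm_np, "fsl,ls1012a-ftm",
					       1, 0, &args);
	if (!ret) {
		struct device_node *consumer = args.np;	/* node the phandle points at */
		u32 mask = args.args[0];

		/* ... compare 'consumer' with the requesting device's of_node ... */
		of_node_put(consumer);
	}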

>Plus, it's an awkward design in
> general, and you don't describe what the value actually means (bits in which
> register?). 

Yes, I admit my design looks ugly and not flexible or extensible enough.
However, for the above reason, do you have any good ideas, please? :)

As to the register information, I can explain here in detail (and will add
it to the commit message of the next version): there is an RCPM HW block
which has a register named IPPDEXPCR. It controls whether certain IPs (such
as timer, usb, etc.) are prevented from entering low power mode when the
system is suspended. So it's necessary to program it if we want one of those
IPs to work as a wakeup source. However, different Layerscape SoCs have
different bit vs. IP mapping layouts, so I have to record this information
in the device tree so that the RCPM driver can fetch it directly.

Do I need to list every SoC's mapping information in this binding doc for
reference?

>What was wrong with the existing binding?  

There was one version of the RCPM patch which required the property
'fsl,#rcpm-wakeup-cells', but it was not accepted upstream in the end. Now we
no longer need it because the new design allows the case of multiple RCPM
devices existing.

>Alternatively, use the clock bindings.

Sorry I didn't get your point.

> > -
> > -Example:
> > -   lpuart0: serial@295 {
> > -   compatible = "fsl,ls1021a-lpuart";
> > -   reg = <0x0 0x295 0x0 0x1000>;
> > -   interrupts = ;
> > -   clocks = <>;
> > -   clock-names = "ipg";
> > -   fsl,rcpm-wakeup = < 0x0 0x4000>;
> > +   big-endian;
> > +   fsl,ls1012a-ftm = <0x2>;
> > +   fsl,pfe = <0xf020>;
> 
> fsl,pfe is not documented.

pfe patch is not upstream yet, will remove it.

Regards,
Ran

> -Scott



[PATCH kernel v2 6/6] KVM: PPC: Remove redundant permission bits removal

2018-09-10 Thread Alexey Kardashevskiy
The kvmppc_gpa_to_ua() helper itself takes care of the permission
bits in the TCE and yet every single caller removes them.

This changes semantics of kvmppc_gpa_to_ua() so it takes TCEs
(which are GPAs + TCE permission bits) to make the callers simpler.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* %s/kvmppc_gpa_to_ua/kvmppc_tce_to_ua/g
---
 arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
 arch/powerpc/kvm/book3s_64_vio.c| 12 
 arch/powerpc/kvm/book3s_64_vio_hv.c | 22 +-
 3 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 2f5d431..38d0328 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -194,7 +194,7 @@ extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
(stt)->size, (ioba), (npages)) ?\
H_PARAMETER : H_SUCCESS)
-extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+extern long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
unsigned long *ua, unsigned long **prmap);
 extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
unsigned long idx, unsigned long tce);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 8231b17..c0c64d1 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -378,8 +378,7 @@ static long kvmppc_tce_validate(struct 
kvmppc_spapr_tce_table *stt,
if (iommu_tce_check_gpa(stt->page_shift, gpa))
return H_TOO_HARD;
 
-   if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
-   , NULL))
+   if (kvmppc_tce_to_ua(stt->kvm, tce, , NULL))
return H_TOO_HARD;
 
list_for_each_entry_rcu(stit, >iommu_tables, next) {
@@ -552,8 +551,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long 
liobn,
 
idx = srcu_read_lock(>kvm->srcu);
 
-   if ((dir != DMA_NONE) && kvmppc_gpa_to_ua(vcpu->kvm,
-   tce & ~(TCE_PCI_READ | TCE_PCI_WRITE), , NULL)) {
+   if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, , NULL)) {
ret = H_PARAMETER;
goto unlock_exit;
}
@@ -614,7 +612,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
return ret;
 
idx = srcu_read_lock(>kvm->srcu);
-   if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, , NULL)) {
+   if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, , NULL)) {
ret = H_TOO_HARD;
goto unlock_exit;
}
@@ -649,9 +647,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
}
tce = be64_to_cpu(tce);
 
-   if (kvmppc_gpa_to_ua(vcpu->kvm,
-   tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
-   , NULL))
+   if (kvmppc_tce_to_ua(vcpu->kvm, tce, , NULL))
return H_PARAMETER;
 
list_for_each_entry_lockless(stit, >iommu_tables, next) {
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index adf3b21..389dac1 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -110,8 +110,7 @@ static long kvmppc_rm_tce_validate(struct 
kvmppc_spapr_tce_table *stt,
if (iommu_tce_check_gpa(stt->page_shift, gpa))
return H_PARAMETER;
 
-   if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
-   , NULL))
+   if (kvmppc_tce_to_ua(stt->kvm, tce, , NULL))
return H_TOO_HARD;
 
list_for_each_entry_lockless(stit, >iommu_tables, next) {
@@ -180,10 +179,10 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
 }
 EXPORT_SYMBOL_GPL(kvmppc_tce_put);
 
-long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
unsigned long *ua, unsigned long **prmap)
 {
-   unsigned long gfn = gpa >> PAGE_SHIFT;
+   unsigned long gfn = tce >> PAGE_SHIFT;
struct kvm_memory_slot *memslot;
 
memslot = search_memslots(kvm_memslots(kvm), gfn);
@@ -191,7 +190,7 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
return -EINVAL;
 
*ua = __gfn_to_hva_memslot(memslot, gfn) |
-   (gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
+   (tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
if (prmap)
@@ -200,7 +199,7 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
 
return 0;
 }
-EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
+EXPORT_SYMBOL_GPL(kvmppc_tce_to_ua);
 
 #ifdef 

[PATCH kernel v2 5/6] KVM: PPC: Propagate errors to the guest when failed instead of ignoring

2018-09-10 Thread Alexey Kardashevskiy
At the moment if the PUT_TCE{_INDIRECT} handlers fail to update
the hardware tables, we print a warning once, clear the entry and
continue. This is because, at the time, the assumption was that if
a VFIO device is hotplugged into the guest and the userspace replays
virtual DMA mappings (i.e. TCEs) to the hardware tables, and this fails,
then there is nothing useful we can do about it.

However the assumption is not valid as these handlers are not called for
TCE replay (VFIO ioctl interface is used for that) and these handlers
are for new TCEs.

This returns an error to the guest if there is a request which cannot be
processed. By now the only possible failure must be H_TOO_HARD.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_vio.c| 20 ++--
 arch/powerpc/kvm/book3s_64_vio_hv.c | 21 +++--
 2 files changed, 13 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 01e1994..8231b17 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -568,14 +568,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned 
long liobn,
ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
entry, ua, dir);
 
-   if (ret == H_SUCCESS)
-   continue;
-
-   if (ret == H_TOO_HARD)
+   if (ret != H_SUCCESS) {
+   kvmppc_clear_tce(stit->tbl, entry);
goto unlock_exit;
-
-   WARN_ON_ONCE(1);
-   kvmppc_clear_tce(stit->tbl, entry);
+   }
}
 
kvmppc_tce_put(stt, entry, tce);
@@ -663,14 +659,10 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
stit->tbl, entry + i, ua,
iommu_tce_direction(tce));
 
-   if (ret == H_SUCCESS)
-   continue;
-
-   if (ret == H_TOO_HARD)
+   if (ret != H_SUCCESS) {
+   kvmppc_clear_tce(stit->tbl, entry);
goto unlock_exit;
-
-   WARN_ON_ONCE(1);
-   kvmppc_clear_tce(stit->tbl, entry);
+   }
}
 
kvmppc_tce_put(stt, entry + i, tce);
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 977e95a..adf3b21 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -403,14 +403,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned 
long liobn,
ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
stit->tbl, entry, ua, dir);
 
-   if (ret == H_SUCCESS)
-   continue;
-
-   if (ret == H_TOO_HARD)
+   if (ret != H_SUCCESS) {
+   kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
return ret;
-
-   WARN_ON_ONCE_RM(1);
-   kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
+   }
}
 
kvmppc_tce_put(stt, entry, tce);
@@ -556,14 +552,11 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
stit->tbl, entry + i, ua,
iommu_tce_direction(tce));
 
-   if (ret == H_SUCCESS)
-   continue;
-
-   if (ret == H_TOO_HARD)
+   if (ret != H_SUCCESS) {
+   kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl,
+   entry);
goto unlock_exit;
-
-   WARN_ON_ONCE_RM(1);
-   kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
+   }
}
 
kvmppc_tce_put(stt, entry + i, tce);
-- 
2.11.0



[PATCH kernel v2 4/6] KVM: PPC: Validate TCEs against preregistered memory page sizes

2018-09-10 Thread Alexey Kardashevskiy
The userspace can request an arbitrary supported page size for a DMA
window and this works fine as long as the mapped memory is backed with
the pages of the same or bigger size; if this is not the case,
mm_iommu_ua_to_hpa{_rm}() fail and the tables do not get populated with
dangerously incorrect TCEs.

However, since it is quite easy to misconfigure KVM and we do not revert
all changes made to TCE tables if an error happens in the middle,
we had better do the acceptable page size validation before we even touch
the tables.

This enhances kvmppc_tce_validate() to check the hardware IOMMU page sizes
against the preregistered memory page sizes.

Since the new check uses real/virtual mode helpers, this renames
kvmppc_tce_validate() to kvmppc_rm_tce_validate() to handle the real mode
case and mirrors it for the virtual mode under the old name. The real
mode handler is not used for the virtual mode as:
1. it uses _lockless() list traversing primitives instead of RCU;
2. realmode's mm_iommu_ua_to_hpa_rm() uses vmalloc_to_phys() which
virtual mode does not have to use and since on POWER9+radix only virtual
mode handlers actually work, we do not want to slow down that path even
a bit.

This removes EXPORT_SYMBOL_GPL(kvmppc_tce_validate) as the validators
are static now.

From now on, attempts to map IOMMU pages bigger than allowed will
result in a KVM exit.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
Changes:
v2:
* updated commit log
---
 arch/powerpc/include/asm/kvm_ppc.h  |  2 --
 arch/powerpc/kvm/book3s_64_vio.c| 35 +++
 arch/powerpc/kvm/book3s_64_vio_hv.c | 30 +++---
 3 files changed, 58 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index e991821..2f5d431 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -194,8 +194,6 @@ extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
(stt)->size, (ioba), (npages)) ?\
H_PARAMETER : H_SUCCESS)
-extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
-   unsigned long tce);
 extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
unsigned long *ua, unsigned long **prmap);
 extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 984cec8..01e1994 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -363,6 +363,41 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
return ret;
 }
 
+static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
+   unsigned long tce)
+{
+   unsigned long gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
+   enum dma_data_direction dir = iommu_tce_direction(tce);
+   struct kvmppc_spapr_tce_iommu_table *stit;
+   unsigned long ua = 0;
+
+   /* Allow userspace to poison TCE table */
+   if (dir == DMA_NONE)
+   return H_SUCCESS;
+
+   if (iommu_tce_check_gpa(stt->page_shift, gpa))
+   return H_TOO_HARD;
+
+   if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
+   , NULL))
+   return H_TOO_HARD;
+
+   list_for_each_entry_rcu(stit, >iommu_tables, next) {
+   unsigned long hpa = 0;
+   struct mm_iommu_table_group_mem_t *mem;
+   long shift = stit->tbl->it_page_shift;
+
+   mem = mm_iommu_lookup(stt->kvm->mm, ua, 1ULL << shift);
+   if (!mem)
+   return H_TOO_HARD;
+
+   if (mm_iommu_ua_to_hpa(mem, ua, shift, ))
+   return H_TOO_HARD;
+   }
+
+   return H_SUCCESS;
+}
+
 static void kvmppc_clear_tce(struct iommu_table *tbl, unsigned long entry)
 {
unsigned long hpa = 0;
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 7388b66..977e95a 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -94,14 +94,14 @@ EXPORT_SYMBOL_GPL(kvmppc_find_table);
  * to the table and user space is supposed to process them), we can skip
  * checking other things (such as TCE is a guest RAM address or the page
  * was actually allocated).
- *
- * WARNING: This will be called in real-mode on HV KVM and virtual
- *  mode on PR KVM
  */
-long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
+static long kvmppc_rm_tce_validate(struct kvmppc_spapr_tce_table *stt,
+   unsigned long tce)
 {
unsigned long gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
enum dma_data_direction dir = iommu_tce_direction(tce);
+   struct kvmppc_spapr_tce_iommu_table *stit;
+  

[PATCH kernel v2 2/6] KVM: PPC: Validate all tces before updating tables

2018-09-10 Thread Alexey Kardashevskiy
The KVM TCE handlers are written in a way so they fail when either
something went horribly wrong or the userspace did some obvious mistake
such as passing a misaligned address.

We are going to enhance the TCE checker to fail on attempts to map a bigger
IOMMU page than the underlying pinned memory, so let's validate the TCEs
beforehand.

This should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
Changes:
v2:
* added a comment for the second get_user() from v1 discussion
---
 arch/powerpc/kvm/book3s_64_vio.c| 18 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c |  4 
 2 files changed, 22 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 9a3f264..3c17977 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -599,6 +599,24 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
ret = kvmppc_tce_validate(stt, tce);
if (ret != H_SUCCESS)
goto unlock_exit;
+   }
+
+   for (i = 0; i < npages; ++i) {
+   /*
+* This looks unsafe, because we validate, then regrab
+* the TCE from userspace which could have been changed by
+* another thread.
+*
+* But it actually is safe, because the relevant checks will be
+* re-executed in the following code.  If userspace tries to
+* change this dodgily it will result in a messier failure mode
+* but won't threaten the host.
+*/
+   if (get_user(tce, tces + i)) {
+   ret = H_TOO_HARD;
+   goto unlock_exit;
+   }
+   tce = be64_to_cpu(tce);
 
if (kvmppc_gpa_to_ua(vcpu->kvm,
tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 6821ead..c2848e0b 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -524,6 +524,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
ret = kvmppc_tce_validate(stt, tce);
if (ret != H_SUCCESS)
goto unlock_exit;
+   }
+
+   for (i = 0; i < npages; ++i) {
+   unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
 
ua = 0;
if (kvmppc_gpa_to_ua(vcpu->kvm,
-- 
2.11.0



[PATCH kernel v2 1/6] KVM: PPC: Avoid marking DMA-mapped pages dirty in real mode

2018-09-10 Thread Alexey Kardashevskiy
At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm()
which in turn reads the old TCE and if it was a valid entry - marks
the physical page dirty if it was mapped for writing. Since it is
the real mode, realmode_pfn_to_page() is used instead of pfn_to_page()
to get the page struct. However SetPageDirty() itself reads the compound
page head and returns a virtual address for the head page struct and
setting dirty bit for that kills the system.

This adds additional dirty bit tracking into the MM/IOMMU API for use
in the real mode. Note that this does not change how VFIO and
KVM (in virtual mode) set this bit. The KVM (real mode) changes include:
- use the lowest bit of the cached host phys address to carry
the dirty bit;
- mark pages dirty when they are unpinned which happens when
the preregistered memory is released which always happens in virtual
mode;
- add mm_iommu_ua_mark_dirty_rm() helper to set delayed dirty bit;
- change iommu_tce_xchg_rm() to take the kvm struct for the mm to use
in the new mm_iommu_ua_mark_dirty_rm() helper;
- move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is the only
caller anyway) to reduce the real mode KVM and IOMMU knowledge
across different subsystems.

This removes realmode_pfn_to_page() as it is not used anymore.

While we are at it, remove some EXPORT_SYMBOL_GPL() as that code is for
real mode only and modules cannot call it anyway.
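
(A minimal sketch of the low-bit trick from the first change listed above:
the host physical addresses cached per TCE are page aligned, so bit 0 is
free to carry the dirty flag. The names here are illustrative, not the
actual mm_iommu_* implementation:)

	#include <stdbool.h>

	#define HPA_DIRTY	0x1UL	/* page-aligned addresses leave bit 0 free */

	static inline void hpa_mark_dirty(unsigned long *cached_hpa)
	{
		*cached_hpa |= HPA_DIRTY;	/* SetPageDirty() later, in virtual mode */
	}

	static inline bool hpa_is_dirty(unsigned long cached_hpa)
	{
		return cached_hpa & HPA_DIRTY;
	}

	static inline unsigned long hpa_address(unsigned long cached_hpa)
	{
		return cached_hpa & ~HPA_DIRTY;	/* strip the flag to recover the address */
	}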

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* only do delayed dirtying for the real mode
* no change in VFIO IOMMU SPAPR TCE driver is needed anymore
* inverted MM_IOMMU_TABLE_GROUP_PAGE_MASK
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  1 -
 arch/powerpc/include/asm/iommu.h |  2 --
 arch/powerpc/include/asm/mmu_context.h   |  1 +
 arch/powerpc/kernel/iommu.c  | 25 --
 arch/powerpc/kvm/book3s_64_vio_hv.c  | 39 +-
 arch/powerpc/mm/init_64.c| 49 
 arch/powerpc/mm/mmu_context_iommu.c  | 34 ---
 7 files changed, 62 insertions(+), 89 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 13a688f..2fdc865 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1051,7 +1051,6 @@ static inline void vmemmap_remove_mapping(unsigned long 
start,
return hash__vmemmap_remove_mapping(start, page_size);
 }
 #endif
-struct page *realmode_pfn_to_page(unsigned long pfn);
 
 static inline pte_t pmd_pte(pmd_t pmd)
 {
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index ab3a4fb..3d4b88c 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -220,8 +220,6 @@ extern void iommu_del_device(struct device *dev);
 extern int __init tce_iommu_bus_notifier_init(void);
 extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
unsigned long *hpa, enum dma_data_direction *direction);
-extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
-   unsigned long *hpa, enum dma_data_direction *direction);
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number,
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index b2f89b6..b694d6a 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -38,6 +38,7 @@ extern long mm_iommu_ua_to_hpa(struct 
mm_iommu_table_group_mem_t *mem,
unsigned long ua, unsigned int pageshift, unsigned long *hpa);
 extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
unsigned long ua, unsigned int pageshift, unsigned long *hpa);
+extern void mm_iommu_ua_mark_dirty_rm(struct mm_struct *mm, unsigned long ua);
 extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
 extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
 #endif
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index af7a20d..19b4c62 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1013,31 +1013,6 @@ long iommu_tce_xchg(struct iommu_table *tbl, unsigned 
long entry,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_xchg);
 
-#ifdef CONFIG_PPC_BOOK3S_64
-long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
-   unsigned long *hpa, enum dma_data_direction *direction)
-{
-   long ret;
-
-   ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
-
-   if (!ret && ((*direction == DMA_FROM_DEVICE) ||
-   (*direction == DMA_BIDIRECTIONAL))) {
-   struct page *pg = realmode_pfn_to_page(*hpa >> PAGE_SHIFT);
-
-   if (likely(pg)) {
-   SetPageDirty(pg);
-  

[PATCH kernel v2 0/6] KVM: PPC: TCE improvements

2018-09-10 Thread Alexey Kardashevskiy
Hi,

Here is my current queue of TCE/KVM patches.

1/6 is a bugfix for https://bugzilla.redhat.com/show_bug.cgi?id=1620360
2/6..5/6 are to help with testing 
https://bugzilla.redhat.com/show_bug.cgi?id=1613190
6/6 is a small cleanup


This is based on sha1
11da3a7 Linus Torvalds "Linux 4.19-rc3".

Please comment. Thanks.


Alex, I cc: you to keep you informed that
[RFC 1/6] did change drivers/vfio/vfio_iommu_spapr_tce.c but
this one does not.


Alexey Kardashevskiy (6):
  KVM: PPC: Avoid marking DMA-mapped pages dirty in real mode
  KVM: PPC: Validate all tces before updating tables
  KVM: PPC: Inform the userspace about TCE update failures
  KVM: PPC: Validate TCEs against preregistered memory page sizes
  KVM: PPC: Propagate errors to the guest when failed instead of
ignoring
  KVM: PPC: Remove redundant permission bits removal

 arch/powerpc/include/asm/book3s/64/pgtable.h |   1 -
 arch/powerpc/include/asm/iommu.h |   2 -
 arch/powerpc/include/asm/kvm_ppc.h   |   4 +-
 arch/powerpc/include/asm/mmu_context.h   |   1 +
 arch/powerpc/kernel/iommu.c  |  25 --
 arch/powerpc/kvm/book3s_64_vio.c |  89 +++--
 arch/powerpc/kvm/book3s_64_vio_hv.c  | 114 +--
 arch/powerpc/mm/init_64.c|  49 
 arch/powerpc/mm/mmu_context_iommu.c  |  34 +++-
 9 files changed, 170 insertions(+), 149 deletions(-)

-- 
2.11.0



[PATCH kernel v2 3/6] KVM: PPC: Inform the userspace about TCE update failures

2018-09-10 Thread Alexey Kardashevskiy
We return H_TOO_HARD from TCE update handlers when we think that
the next handler (realmode -> virtual mode -> user mode) has a chance to
handle the request; H_HARDWARE/H_CLOSED otherwise.

This changes the handlers to return H_TOO_HARD on every error giving
the userspace an opportunity to handle any request or at least log
them all.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_vio.c| 8 
 arch/powerpc/kvm/book3s_64_vio_hv.c | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 3c17977..984cec8 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -401,7 +401,7 @@ static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
long ret;
 
if (WARN_ON_ONCE(iommu_tce_xchg(tbl, entry, , )))
-   return H_HARDWARE;
+   return H_TOO_HARD;
 
if (dir == DMA_NONE)
return H_SUCCESS;
@@ -449,15 +449,15 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct 
iommu_table *tbl,
return H_TOO_HARD;
 
if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, )))
-   return H_HARDWARE;
+   return H_TOO_HARD;
 
if (mm_iommu_mapped_inc(mem))
-   return H_CLOSED;
+   return H_TOO_HARD;
 
ret = iommu_tce_xchg(tbl, entry, , );
if (WARN_ON_ONCE(ret)) {
mm_iommu_mapped_dec(mem);
-   return H_HARDWARE;
+   return H_TOO_HARD;
}
 
if (dir != DMA_NONE)
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index c2848e0b..7388b66 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -300,10 +300,10 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, 
struct iommu_table *tbl,
 
if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
)))
-   return H_HARDWARE;
+   return H_TOO_HARD;
 
if (WARN_ON_ONCE_RM(mm_iommu_mapped_inc(mem)))
-   return H_CLOSED;
+   return H_TOO_HARD;
 
ret = iommu_tce_xchg_rm(kvm->mm, tbl, entry, , );
if (ret) {
@@ -501,7 +501,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 
rmap = (void *) vmalloc_to_phys(rmap);
if (WARN_ON_ONCE_RM(!rmap))
-   return H_HARDWARE;
+   return H_TOO_HARD;
 
/*
 * Synchronize with the MMU notifier callbacks in
-- 
2.11.0



Re: [PATCH] powerpc: Avoid code patching freed init sections

2018-09-10 Thread Michal Suchánek
On Mon, 10 Sep 2018 15:44:05 +1000
Michael Neuling  wrote:

> This stops us from doing code patching in init sections after they've
> been freed.
> 
> In this chain:
>   kvm_guest_init() ->
> kvm_use_magic_page() ->
>   fault_in_pages_readable() ->
>__get_user() ->
>  __get_user_nocheck() ->
>barrier_nospec();
> 
> We have a code patching location at barrier_nospec() and
> kvm_guest_init() is an init function. This whole chain gets inlined,
> so when we free the init section (hence kvm_guest_init()), this code
> goes away and hence should no longer be patched.
> 
> We've seen this as userspace memory corruption when using a memory
> checker while doing partition migration testing on powervm (this
> starts the code patching post migration via
> /sys/kernel/mobility/migration). In theory, it could also happen when
> using /sys/kernel/debug/powerpc/barrier_nospec.
> 
> With this patch there is a small chance of a race if we code patch
> between the init section being freed and setting SYSTEM_RUNNING (in
> kernel_init()) but that seems like an impractical time and small
> window for any code patching to occur.
> 
> cc: sta...@vger.kernel.org # 4.13+
> Signed-off-by: Michael Neuling 
> 
> ---
> For stable I've marked this as v4.13+ since that's when we refactored
> code-patching.c but it could go back even further than that. In
> reality though, I think we can only hit this since the first
> spectre/meltdown changes.

Which means it affects all maintained stable trees because the
spectre/meltdown changes are backported.

Thanks

Michal


Re: [PATCH 4.4.y] crypto: vmx - Fix sleep-in-atomic bugs

2018-09-10 Thread Ondrej Mosnacek
On Mon, Sep 10, 2018 at 9:42 AM Ondrej Mosnacek  wrote:
> commit 0522236d4f9c5ab2e79889cb020d1acbe5da416e upstream.
>
> Conflicts:
>   drivers/crypto/vmx/
> aes_cbc.c - adapted enable/disable calls to v4.4 state
> aes_xts.c - did not exist yet in v4.4
>
> This patch fixes sleep-in-atomic bugs in AES-CBC and AES-XTS VMX
> implementations. The problem is that the blkcipher_* functions should
> not be called in atomic context.
>
> The bugs can be reproduced via the AF_ALG interface by trying to
> encrypt/decrypt sufficiently large buffers (at least 64 KiB) using the
> VMX implementations of 'cbc(aes)' or 'xts(aes)'. Such operations then
> trigger BUG in crypto_yield():
>
> [  891.863680] BUG: sleeping function called from invalid context at 
> include/crypto/algapi.h:424
> [  891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: kcapi-enc
> [  891.864739] 1 lock held by kcapi-enc/12347:
> [  891.864811]  #0: f5d42c46 (sk_lock-AF_ALG){+.+.}, at: 
> skcipher_recvmsg+0x50/0x530
> [  891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted 
> 4.19.0-0.rc0.git3.1.fc30.ppc64le #1
> [  891.865251] Call Trace:
> [  891.865340] [c003387578c0] [c0d67ea4] dump_stack+0xe8/0x164 
> (unreliable)
> [  891.865511] [c00338757910] [c0172a58] 
> ___might_sleep+0x2f8/0x310
> [  891.865679] [c00338757990] [c06bff74] 
> blkcipher_walk_done+0x374/0x4a0
> [  891.865825] [c003387579e0] [d7e73e70] 
> p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto]
> [  891.865993] [c00338757ad0] [c06c0ee0] 
> skcipher_encrypt_blkcipher+0x60/0x80
> [  891.866128] [c00338757b10] [c06ec504] 
> skcipher_recvmsg+0x424/0x530
> [  891.866283] [c00338757bd0] [c0b00654] sock_recvmsg+0x74/0xa0
> [  891.866403] [c00338757c10] [c0b00f64] ___sys_recvmsg+0xf4/0x2f0
> [  891.866515] [c00338757d90] [c0b02bb8] __sys_recvmsg+0x68/0xe0
> [  891.866631] [c00338757e30] [c000bbe4] system_call+0x5c/0x70
>
> Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module")
> Fixes: c07f5d3da643 ("crypto: vmx - Adding support for XTS")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Ondrej Mosnacek 
> Signed-off-by: Herbert Xu 
> Signed-off-by: Greg Kroah-Hartman 

Whoops, sorry about that last Signed-off-by, I cherry-picked from
another stable commit and forgot to edit it out...

> ---
>  drivers/crypto/vmx/aes_cbc.c | 30 ++
>  1 file changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
> index 9506e8693c81..d8ef1147b344 100644
> --- a/drivers/crypto/vmx/aes_cbc.c
> +++ b/drivers/crypto/vmx/aes_cbc.c
> @@ -111,24 +111,23 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc 
> *desc,
> ret = crypto_blkcipher_encrypt(_desc, dst, src,
>nbytes);
> } else {
> -   preempt_disable();
> -   pagefault_disable();
> -   enable_kernel_altivec();
> -   enable_kernel_vsx();
> -
> blkcipher_walk_init(, dst, src, nbytes);
> ret = blkcipher_walk_virt(desc, );
> while ((nbytes = walk.nbytes)) {
> +   preempt_disable();
> +   pagefault_disable();
> +   enable_kernel_vsx();
> +   enable_kernel_altivec();
> aes_p8_cbc_encrypt(walk.src.virt.addr,
>walk.dst.virt.addr,
>nbytes & AES_BLOCK_MASK,
>>enc_key, walk.iv, 1);
> +   pagefault_enable();
> +   preempt_enable();
> +
> nbytes &= AES_BLOCK_SIZE - 1;
> ret = blkcipher_walk_done(desc, , nbytes);
> }
> -
> -   pagefault_enable();
> -   preempt_enable();
> }
>
> return ret;
> @@ -152,24 +151,23 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc 
> *desc,
> ret = crypto_blkcipher_decrypt(_desc, dst, src,
>nbytes);
> } else {
> -   preempt_disable();
> -   pagefault_disable();
> -   enable_kernel_altivec();
> -   enable_kernel_vsx();
> -
> blkcipher_walk_init(, dst, src, nbytes);
> ret = blkcipher_walk_virt(desc, );
> while ((nbytes = walk.nbytes)) {
> +   preempt_disable();
> +   pagefault_disable();
> +   enable_kernel_vsx();
> +   enable_kernel_altivec();
> aes_p8_cbc_encrypt(walk.src.virt.addr,
>walk.dst.virt.addr,
>  

[PATCH 4.4.y] crypto: vmx - Fix sleep-in-atomic bugs

2018-09-10 Thread Ondrej Mosnacek
commit 0522236d4f9c5ab2e79889cb020d1acbe5da416e upstream.

Conflicts:
  drivers/crypto/vmx/
aes_cbc.c - adapted enable/disable calls to v4.4 state
aes_xts.c - did not exist yet in v4.4

This patch fixes sleep-in-atomic bugs in AES-CBC and AES-XTS VMX
implementations. The problem is that the blkcipher_* functions should
not be called in atomic context.

The bugs can be reproduced via the AF_ALG interface by trying to
encrypt/decrypt sufficiently large buffers (at least 64 KiB) using the
VMX implementations of 'cbc(aes)' or 'xts(aes)'. Such operations then
trigger BUG in crypto_yield():

[  891.863680] BUG: sleeping function called from invalid context at 
include/crypto/algapi.h:424
[  891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: kcapi-enc
[  891.864739] 1 lock held by kcapi-enc/12347:
[  891.864811]  #0: f5d42c46 (sk_lock-AF_ALG){+.+.}, at: 
skcipher_recvmsg+0x50/0x530
[  891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted 
4.19.0-0.rc0.git3.1.fc30.ppc64le #1
[  891.865251] Call Trace:
[  891.865340] [c003387578c0] [c0d67ea4] dump_stack+0xe8/0x164 
(unreliable)
[  891.865511] [c00338757910] [c0172a58] ___might_sleep+0x2f8/0x310
[  891.865679] [c00338757990] [c06bff74] 
blkcipher_walk_done+0x374/0x4a0
[  891.865825] [c003387579e0] [d7e73e70] 
p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto]
[  891.865993] [c00338757ad0] [c06c0ee0] 
skcipher_encrypt_blkcipher+0x60/0x80
[  891.866128] [c00338757b10] [c06ec504] 
skcipher_recvmsg+0x424/0x530
[  891.866283] [c00338757bd0] [c0b00654] sock_recvmsg+0x74/0xa0
[  891.866403] [c00338757c10] [c0b00f64] ___sys_recvmsg+0xf4/0x2f0
[  891.866515] [c00338757d90] [c0b02bb8] __sys_recvmsg+0x68/0xe0
[  891.866631] [c00338757e30] [c000bbe4] system_call+0x5c/0x70

Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module")
Fixes: c07f5d3da643 ("crypto: vmx - Adding support for XTS")
Cc: sta...@vger.kernel.org
Signed-off-by: Ondrej Mosnacek 
Signed-off-by: Herbert Xu 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/crypto/vmx/aes_cbc.c | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
index 9506e8693c81..d8ef1147b344 100644
--- a/drivers/crypto/vmx/aes_cbc.c
+++ b/drivers/crypto/vmx/aes_cbc.c
@@ -111,24 +111,23 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc *desc,
ret = crypto_blkcipher_encrypt(_desc, dst, src,
   nbytes);
} else {
-   preempt_disable();
-   pagefault_disable();
-   enable_kernel_altivec();
-   enable_kernel_vsx();
-
blkcipher_walk_init(, dst, src, nbytes);
ret = blkcipher_walk_virt(desc, );
while ((nbytes = walk.nbytes)) {
+   preempt_disable();
+   pagefault_disable();
+   enable_kernel_vsx();
+   enable_kernel_altivec();
aes_p8_cbc_encrypt(walk.src.virt.addr,
   walk.dst.virt.addr,
   nbytes & AES_BLOCK_MASK,
   >enc_key, walk.iv, 1);
+   pagefault_enable();
+   preempt_enable();
+
nbytes &= AES_BLOCK_SIZE - 1;
ret = blkcipher_walk_done(desc, , nbytes);
}
-
-   pagefault_enable();
-   preempt_enable();
}
 
return ret;
@@ -152,24 +151,23 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc *desc,
ret = crypto_blkcipher_decrypt(_desc, dst, src,
   nbytes);
} else {
-   preempt_disable();
-   pagefault_disable();
-   enable_kernel_altivec();
-   enable_kernel_vsx();
-
blkcipher_walk_init(, dst, src, nbytes);
ret = blkcipher_walk_virt(desc, );
while ((nbytes = walk.nbytes)) {
+   preempt_disable();
+   pagefault_disable();
+   enable_kernel_vsx();
+   enable_kernel_altivec();
aes_p8_cbc_encrypt(walk.src.virt.addr,
   walk.dst.virt.addr,
   nbytes & AES_BLOCK_MASK,
   >dec_key, walk.iv, 0);
+   pagefault_enable();
+   preempt_enable();
+
nbytes &= AES_BLOCK_SIZE - 1;
ret = blkcipher_walk_done(desc, , nbytes);
}
-
-   pagefault_enable();
-   

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-09-10 Thread Christoph Hellwig
On Thu, Sep 06, 2018 at 07:09:09PM -0500, Jiandi An wrote:
> For virtio device we have to pass in iommu_platform=true flag for
> this to set the VIRTIO_F_IOMMU_PLATFORM flag. But for example
> QEMU has the use of iommu_platform attribute disabled for virtio-gpu
> device.  So would also like to move towards not having to specify
> the VIRTIO_F_IOMMU_PLATFORM flag.

Specifying VIRTIO_F_IOMMU_PLATFORM is the right thing for your 
platform given that you can't directly use physical addresses.
Please fix qemu so that virtio-gpu works with VIRTIO_F_IOMMU_PLATFORM.

Also just as I said for the power folks: you should really work with
the qemu folks that VIRTIO_F_IOMMU_PLATFORM (or whatever we call the
properly documented flag) can be set by default, and no pointless
performance overhead is implied by having a sane and simple
implementation.


[PATCH] powerpc: fix csum_ipv6_magic() on little endian platforms

2018-09-10 Thread Christophe Leroy
On little endian platforms, csum_ipv6_magic() keeps len and proto in
CPU byte order. This generates bad results, leading to ICMPv6 packets
from other hosts being dropped by powerpc64le platforms.

In order to fix this, len and proto should be converted to network
byte order, i.e. big-endian byte order. However, checksumming 0x12345678
and 0x56341278 produces the exact same result, so it is enough to
rotate the sum of len and proto by 1 byte.
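
(A quick user-space check of the equivalence claim above; this is just an
illustration of the 16-bit one's-complement sum that IP checksums use, not
the kernel helper:)

	#include <stdio.h>
	#include <stdint.h>

	static uint16_t csum16(const uint16_t *w, int n)
	{
		uint32_t sum = 0;

		while (n--)
			sum += *w++;
		while (sum >> 16)		/* fold carries back in */
			sum = (sum & 0xffff) + (sum >> 16);
		return (uint16_t)sum;
	}

	int main(void)
	{
		uint16_t a[] = { 0x1234, 0x5678 };	/* 0x12345678 */
		uint16_t b[] = { 0x5634, 0x1278 };	/* 0x56341278 */

		/* prints the same value (0x68ac) twice */
		printf("%#x %#x\n", csum16(a, 2), csum16(b, 2));
		return 0;
	}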

PPC32 only supports big endian, so the fix is needed for PPC64 only.

Fixes: e9c4943a107b ("powerpc: Implement csum_ipv6_magic in assembly")
Reported-by: Jianlin Shi 
Reported-by: Xin Long 
Cc:  # 4.18+
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/lib/checksum_64.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S
index 886ed94b9c13..2a68c43e13f5 100644
--- a/arch/powerpc/lib/checksum_64.S
+++ b/arch/powerpc/lib/checksum_64.S
@@ -443,6 +443,9 @@ _GLOBAL(csum_ipv6_magic)
addcr0, r8, r9
ld  r10, 0(r4)
ld  r11, 8(r4)
+#ifndef CONFIG_CPU_BIG_ENDIAN
+   rotldi  r5, r5, 8
+#endif
adder0, r0, r10
add r5, r5, r7
adder0, r0, r11
-- 
2.13.3



Re: [RFC PATCH v1 00/17] ban the use of _PAGE_XXX flags outside platform specific code

2018-09-10 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> On 09/06/2018 09:58 AM, Aneesh Kumar K.V wrote:
>> Christophe Leroy  writes:
>> 
>>> Today flags like for instance _PAGE_RW or _PAGE_USER are used through
>>> common parts of code.
>>> Using those directly in common parts of code has proven to lead to
>>> mistakes or misbehaviour, because their use is not always as trivial
>>> as one could think.
>>>
>>> For instance, (flags & _PAGE_USER) == 0 isn't enough to tell
>>> that a page is a kernel page, because some targets are using
>>> _PAGE_PRIVILEDGED and not _PAGE_USER, so the test has to be
>>> (flags & (_PAGE_USER | _PAGE_PRIVILEDGED)) == _PAGE_PRIVILEDGED
>>> This has two (bad) consequences:
>>>
>>>   - All targets must define every bit, even the unsupported ones,
>>> leading to a lot of useless #define _PAGE_XXX 0
>>>   - If someone forgets to take into account all possible _PAGE_XXX bits
>>> for the case, we can get unexpected behaviour on some targets.
>>>
>>> This becomes even more complex when we come to using _PAGE_RW.
>>> Testing (flags & _PAGE_RW) is not enough to test whether a page
>>> if writable or not, because:
>>>
>>>   - Some targets have _PAGE_RO instead, which has to be unset to tell
>>> a page is writable
>>>   - Some targets have _PAGE_R and _PAGE_W, in which case
>>> _PAGE_RW = _PAGE_R | _PAGE_W
>>>   - Even knowing whether a page is readable is not always trivial because:
>>> - Some targets requires to check that _PAGE_R is set to ensure page
>>> is readable
>>> - Some targets requires to check that _PAGE_NA is not set
>>> - Some targets requires to check that _PAGE_RO or _PAGE_RW is set
>>>
>>> Etc 
>>>
>>> In order to work around all those issues and minimise the risks of errors,
>>> this serie aims at removing all use of _PAGE_XXX flags from powerpc code
>>> and always use pte_xxx() and pte_mkxxx() accessors instead. Those accessors
>>> are then defined in target specific parts of the kernel code.
>> 
>> The series is really good. It also helps code readability. There are a few
>> things where I am not sure there is a way to reduce the overhead:
>> 
>> -access = _PAGE_EXEC;
>> +access = pte_val(pte_mkexec(__pte(0)));
>> 
>> Considering we have multiple big endian to little endian coversion there
>> for book3s 64.
>
> Thanks for the review.
>
> For the above, I propose the following:
>
> diff --git a/arch/powerpc/mm/hash_utils_64.c 
> b/arch/powerpc/mm/hash_utils_64.c
> index f23a89d8e4ce..904ac9c84ea5 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -1482,7 +1482,7 @@ static bool should_hash_preload(struct mm_struct 
> *mm, unsigned long ea)
>   #endif
>
>   void hash_preload(struct mm_struct *mm, unsigned long ea,
> -   unsigned long access, unsigned long trap)
> +   bool is_exec, unsigned long trap)
>   {
>   int hugepage_shift;
>   unsigned long vsid;
> @@ -1490,6 +1490,7 @@ void hash_preload(struct mm_struct *mm, unsigned 
> long ea,
>   pte_t *ptep;
>   unsigned long flags;
>   int rc, ssize, update_flags = 0;
> + unsigned long access = is_exec ? _PAGE_EXEC : 0;


I guess it will be better if we do

unsigned long access = _PAGE_PRESENT | _PAGE_READ

if (is_exec)
   access |= _PAGE_EXEC.

That will also bring it closer to __hash_page. I agree that we should
always find _PAGE_PRESENT and _PAGE_READ set, because we just handled
the page fault.

-aneesh




Re: [PATCH] powerpc/powernv: Make possible for user to force a full ipl cec reboot

2018-09-10 Thread Vaibhav Jain
Thanks for looking into this patch Stewart

Stewart Smith  writes:

> We're about to introduce an MPIPL reboot type (to take a firmware
> assisted kdump style thing), and maybe we should have a reboot type to
> force attempting a fast-reboot, and this makes me wonder if we should add
> those in now?
I will probably let Vasant and others answer that.


> If the reboot type isn't supported, what should be the behaviour? Reboot
> the default way or don't reboot at all?
Yes, I have addressed that in v3 of this patch at
http://patchwork.ozlabs.org/patch/967248/. In case the reboot type isn't
supported or there is an error invoking it, the patch falls back to
calling opal_cec_reboot().

-- 
Vaibhav Jain 
Linux Technology Center, IBM India Pvt. Ltd.