Re: [PATCH kernel 2/2] powerpc/powernv: Define PHB4 type and enable sketchy bypass on POWER9

2018-06-17 Thread Benjamin Herrenschmidt
On Mon, 2018-06-18 at 12:13 +1000, Alexey Kardashevskiy wrote:
> On Sat, 16 Jun 2018 11:05:19 +1000
> Benjamin Herrenschmidt  wrote:
> 
> > On Fri, 2018-06-01 at 18:10 +1000, Alexey Kardashevskiy wrote:
> > > These are found in POWER9 chips. Right now these PHBs have unknown type
> > > so changing it to PHB4 won't make much of a difference except enabling
> > > sketchy bypass for POWER9 as this does below.  
> > 
> > And that will break on multi-chip systems since P9 doesn't have the
> > memory contiguous (it has the chip ID in the top bits).
> 
> 
> This did not break mine and it is hard to see why it would break at all
> if we use 1G pages and the maximum we need to cover is 48 bits (this
> is what we are trying to support here - all these gpus, right?), or is
> it more now? If so, I have posted v2 of tce multilevel dynamic
> allocation which helps with enormous tce tables.

The whole point of sketchy bypass is to deal with devices with a small
number of DMA bits... most Radeons have 40, for example. So that won't
work terribly well.
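
To make the address-width problem concrete, here is a minimal sketch in C.
It reuses the comparison from the quoted patch (mask must exceed the top of
memory plus 4GB). Everything else is an assumption for illustration:
CHIP_ID_SHIFT is a hypothetical position for the chip ID field, and
top_of_chip() and fits_sketchy_bypass() are made-up helper names, not kernel
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical position of the chip ID field in a physical address. */
#define CHIP_ID_SHIFT 45

/* End of RAM on a chip that holds `size` bytes of memory. */
static uint64_t top_of_chip(unsigned int chip_id, uint64_t size)
{
	assert(size > 0);
	return ((uint64_t)chip_id << CHIP_ID_SHIFT) + size;
}

/*
 * Same shape as the check in the quoted patch:
 * dma_mask > memory_hotplug_max() + (1ULL << 32)
 */
static bool fits_sketchy_bypass(uint64_t mem_max, unsigned int dma_bits)
{
	uint64_t mask;

	assert(dma_bits >= 1 && dma_bits <= 64);
	mask = dma_bits == 64 ? ~0ULL : (1ULL << dma_bits) - 1;

	return mask > mem_max + (1ULL << 32);
}
```

With 16GB on chip 0 only, a 40-bit device passes the check. As soon as a
second chip places RAM above bit 45 (under the assumed layout), the same
40-bit device fails it, while a 48-bit device still passes.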

Cheers,
Ben.

> 
> 
> > 
> > Russell is working on a different implementation that should be much
> > > more immune to the system physical memory layout.
> > 
> > > Signed-off-by: Alexey Kardashevskiy 
> > > ---
> > >  arch/powerpc/platforms/powernv/pci.h  | 1 +
> > >  arch/powerpc/platforms/powernv/pci-ioda.c | 5 -
> > >  2 files changed, 5 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/powerpc/platforms/powernv/pci.h 
> > > b/arch/powerpc/platforms/powernv/pci.h
> > > index eada4b6..1408247 100644
> > > --- a/arch/powerpc/platforms/powernv/pci.h
> > > +++ b/arch/powerpc/platforms/powernv/pci.h
> > > @@ -23,6 +23,7 @@ enum pnv_phb_model {
> > >   PNV_PHB_MODEL_UNKNOWN,
> > >   PNV_PHB_MODEL_P7IOC,
> > >   PNV_PHB_MODEL_PHB3,
> > > + PNV_PHB_MODEL_PHB4,
> > >   PNV_PHB_MODEL_NPU,
> > >   PNV_PHB_MODEL_NPU2,
> > >  };
> > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> > > b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > index 9239142..66c2804 100644
> > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > @@ -1882,7 +1882,8 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev 
> > > *pdev, u64 dma_mask)
> > >   if (dma_mask >> 32 &&
> > >   dma_mask > (memory_hotplug_max() + (1ULL << 32)) &&
> > >   pnv_pci_ioda_pe_single_vendor(pe) &&
> > > - phb->model == PNV_PHB_MODEL_PHB3) {
> > > + (phb->model == PNV_PHB_MODEL_PHB3 ||
> > > +  phb->model == PNV_PHB_MODEL_PHB4)) {
> > >   /* Configure the bypass mode */
> > >   rc = pnv_pci_ioda_dma_64bit_bypass(pe);
> > >   if (rc)
> > > @@ -3930,6 +3931,8 @@ static void __init pnv_pci_init_ioda_phb(struct 
> > > device_node *np,
> > >   phb->model = PNV_PHB_MODEL_P7IOC;
> > >   else if (of_device_is_compatible(np, "ibm,power8-pciex"))
> > >   phb->model = PNV_PHB_MODEL_PHB3;
> > > + else if (of_device_is_compatible(np, "ibm,power9-pciex"))
> > > + phb->model = PNV_PHB_MODEL_PHB4;
> > >   else if (of_device_is_compatible(np, "ibm,power8-npu-pciex"))
> > >   phb->model = PNV_PHB_MODEL_NPU;
> > >   else if (of_device_is_compatible(np, "ibm,power9-npu-pciex"))  
> 
> 
> 
> --
> Alexey


Re: linux-next: build failure in Linus' tree

2018-06-17 Thread Stephen Rothwell
Hi all,

On Tue, 12 Jun 2018 12:26:40 +1000 Stephen Rothwell wrote:
>
> Building Linus' tree, today's linux-next build (powerpc allyesconfig)
> failed like this:
> 
> ld: net/bpfilter/bpfilter_umh.o: compiled for a little endian system and target is big endian
> ld: failed to merge target specific data of file net/bpfilter/bpfilter_umh.o
> 
> This has come to light since I started using a native compiler (i.e. one
> that can build executables, not just the kernel) for my PowerPC builds
> on a powerpcle host.
> 
> I have switched back to my limited compiler.

Any progress on this?
-- 
Cheers,
Stephen Rothwell




Re: [powerpc/powervmc]kernel BUG at arch/powerpc/mm/pgtable-book3s64.c:414!

2018-06-17 Thread Aneesh Kumar K.V

On 06/17/2018 01:22 PM, Venkat Rao B wrote:
> On Friday 15 June 2018 07:14 PM, Michael Ellerman wrote:
> > vrbagal1 writes:
> >
> > > Hi,
> > >
> > > Observing kernel bug followed by kernel oops and system reboots, while
> > > running kselftest on Power8 LPAR.
> > >
> > > Machine Details: Power8 LPAR
> > > Git Tree:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > Commit ID: f5b7769eb0400ec5217a47e41148a9f816ca1f9f
> > > Kernel version: 4.17.0-autotest
> > > GCC version: (gcc version 6.3.1 20161221 (Red Hat 6.3.1-1) (GCC))
> >
> > This is fixed by your patch Aneesh?
> >
> > http://patchwork.ozlabs.org/patch/929325/
>
> Yes, this patch fixes the issue.

I missed adding Reported-by: Venkat Rao B and Tested-by: Venkat Rao B.

Michael,

Can you add that when applying the patch?

-aneesh



Re: [PATCH kernel 2/2] powerpc/powernv: Define PHB4 type and enable sketchy bypass on POWER9

2018-06-17 Thread Alexey Kardashevskiy
On Sat, 16 Jun 2018 11:05:19 +1000
Benjamin Herrenschmidt  wrote:

> On Fri, 2018-06-01 at 18:10 +1000, Alexey Kardashevskiy wrote:
> > These are found in POWER9 chips. Right now these PHBs have unknown type
> > so changing it to PHB4 won't make much of a difference except enabling
> > sketchy bypass for POWER9 as this does below.  
> 
> And that will break on multi-chip systems since P9 doesn't have the
> memory contiguous (it has the chip ID in the top bits).


This did not break mine and it is hard to see why it would break at all
if we use 1G pages and the maximum we need to cover is 48 bits (this
is what we are trying to support here - all these gpus, right?), or is
it more now? If so, I have posted v2 of tce multilevel dynamic
allocation which helps with enormous tce tables.



> 
> Russell is working on a different implementation that should be much
> more immune to the system physical memory layout.
> 
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> >  arch/powerpc/platforms/powernv/pci.h  | 1 +
> >  arch/powerpc/platforms/powernv/pci-ioda.c | 5 -
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/pci.h 
> > b/arch/powerpc/platforms/powernv/pci.h
> > index eada4b6..1408247 100644
> > --- a/arch/powerpc/platforms/powernv/pci.h
> > +++ b/arch/powerpc/platforms/powernv/pci.h
> > @@ -23,6 +23,7 @@ enum pnv_phb_model {
> > PNV_PHB_MODEL_UNKNOWN,
> > PNV_PHB_MODEL_P7IOC,
> > PNV_PHB_MODEL_PHB3,
> > +   PNV_PHB_MODEL_PHB4,
> > PNV_PHB_MODEL_NPU,
> > PNV_PHB_MODEL_NPU2,
> >  };
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> > b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index 9239142..66c2804 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -1882,7 +1882,8 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev 
> > *pdev, u64 dma_mask)
> > if (dma_mask >> 32 &&
> > dma_mask > (memory_hotplug_max() + (1ULL << 32)) &&
> > pnv_pci_ioda_pe_single_vendor(pe) &&
> > -   phb->model == PNV_PHB_MODEL_PHB3) {
> > +   (phb->model == PNV_PHB_MODEL_PHB3 ||
> > +phb->model == PNV_PHB_MODEL_PHB4)) {
> > /* Configure the bypass mode */
> > rc = pnv_pci_ioda_dma_64bit_bypass(pe);
> > if (rc)
> > @@ -3930,6 +3931,8 @@ static void __init pnv_pci_init_ioda_phb(struct 
> > device_node *np,
> > phb->model = PNV_PHB_MODEL_P7IOC;
> > else if (of_device_is_compatible(np, "ibm,power8-pciex"))
> > phb->model = PNV_PHB_MODEL_PHB3;
> > +   else if (of_device_is_compatible(np, "ibm,power9-pciex"))
> > +   phb->model = PNV_PHB_MODEL_PHB4;
> > else if (of_device_is_compatible(np, "ibm,power8-npu-pciex"))
> > phb->model = PNV_PHB_MODEL_NPU;
> > else if (of_device_is_compatible(np, "ibm,power9-npu-pciex"))  



--
Alexey


Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"

2018-06-17 Thread Eric Dumazet



On 06/17/2018 03:27 AM, Andreas Schwab wrote:

> 
> That doesn't change anything.

OK, thanks !

Oh this is silly, please try :

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c642304f178ce0a4e1358d59e45032a39f76fb3f..54dd9c18ecad817812898d6f335e1794a07dabbe 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1845,10 +1845,9 @@ EXPORT_SYMBOL(___pskb_trim);
 int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
 {
 	if (skb->ip_summed == CHECKSUM_COMPLETE) {
-		int delta = skb->len - len;
+		__wsum csumdiff = skb_checksum(skb, len, skb->len - len, 0);
 
-		skb->csum = csum_sub(skb->csum,
-				     skb_checksum(skb, len, delta, 0));
+		skb->csum = csum_block_sub(skb->csum, csumdiff, len);
 	}
 	return __pskb_trim(skb, len);
 }




[PATCH] powerpc: wii: Remove outdated comment about memory fixups

2018-06-17 Thread Jonathan Neuschäfer
The workaround has been removed. What stays is just code to find the
memory hole so the BATs can be configured properly in the function below.

Fixes: 57deb8fea01f ("powerpc/wii: Don't rely on the reserved memory hack")
Signed-off-by: Jonathan Neuschäfer 
---
 arch/powerpc/platforms/embedded6xx/wii.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/wii.c b/arch/powerpc/platforms/embedded6xx/wii.c
index fc00d82691e1..a133b70bfe6f 100644
--- a/arch/powerpc/platforms/embedded6xx/wii.c
+++ b/arch/powerpc/platforms/embedded6xx/wii.c
@@ -68,16 +68,6 @@ void __init wii_memory_fixups(void)
 {
 	struct memblock_region *p = memblock.memory.regions;
 
-	/*
-	 * This is part of a workaround to allow the use of two
-	 * discontinuous RAM ranges on the Wii, even if this is
-	 * currently unsupported on 32-bit PowerPC Linux.
-	 *
-	 * We coalesce the two memory ranges of the Wii into a
-	 * single range, then create a reservation for the "hole"
-	 * between both ranges.
-	 */
-
 	BUG_ON(memblock.memory.cnt != 2);
 	BUG_ON(!page_aligned(p[0].base) || !page_aligned(p[1].base));
 
-- 
2.17.1



Re: [PATCH] powerpc/64s: Report SLB multi-hit rather than parity error

2018-06-17 Thread Nicholas Piggin
On Fri, 15 Jun 2018 21:37:15 +1000
Michael Ellerman  wrote:

> Nicholas Piggin  writes:
> > On Wed, 13 Jun 2018 23:24:14 +1000
> > Michael Ellerman  wrote:
> >  
> >> When we take an SLB multi-hit on bare metal, we see both the multi-hit
> >> and parity error bits set in DSISR. The user manuals indicate this is
> >> expected to always happen on Power8, whereas on Power9 they say a
> >> multi-hit will "usually" also cause a parity error.
> >> 
> >> We decide what to do based on the various error tables in mce_power.c,
> >> and because we process them in order and only report the first, we
> >> currently always report a parity error but not the multi-hit, eg:
> >> 
> >>   Severe Machine check interrupt [Recovered]
> >> Initiator: CPU
> >> Error type: SLB [Parity]
> >>   Effective address: c00d4300
> >> 
> >> Although this is correct, it leaves the user wondering why they got a
> >> parity error. It would be clearer instead if we reported the
> >> multi-hit because that is more likely to be simply a software bug,
> >> whereas a true parity error is possibly an indication of a bad core.
> >> 
> >> We can do that simply by reordering the error tables so that multi-hit
> >> appears before parity. That doesn't affect the error recovery at all,
> >> because we flush the SLB either way.  
> >
> > Yeah this is a good idea. I wonder if there are any other conditions
> > like this that should be reordered.  
> 
> Yeah good point, this one just caught my eye because I was testing it.
> Ideally it wouldn't matter and we could actually report multiple, but
> that would be a bit of a bigger change.

Yep this patch looks fine for a minimal fix.
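
The ordering effect can be sketched in a few lines of C. This is not the
mce_power.c code: the bit values, struct err_entry and first_match() are
hypothetical, and only the "process in order, report the first hit"
behaviour described above is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values; the real DSISR layout is not reproduced here. */
#define ERR_BIT_PARITY   0x1u
#define ERR_BIT_MULTIHIT 0x2u

struct err_entry {
	uint32_t mask;		/* bits to test in the status word */
	const char *name;	/* what gets reported */
};

/* Return the name of the first entry whose bits are set, or NULL. */
static const char *first_match(const struct err_entry *tbl, size_t n,
			       uint32_t status)
{
	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++)
		if (status & tbl[i].mask)
			return tbl[i].name;
	return NULL;
}

/* Table order before the change: parity wins when both bits are set. */
static const struct err_entry parity_first[] = {
	{ ERR_BIT_PARITY,   "SLB [Parity]" },
	{ ERR_BIT_MULTIHIT, "SLB [Multihit]" },
};

/* Table order after the change: multi-hit wins when both bits are set. */
static const struct err_entry multihit_first[] = {
	{ ERR_BIT_MULTIHIT, "SLB [Multihit]" },
	{ ERR_BIT_PARITY,   "SLB [Parity]" },
};
```

With both bits set, the first table reports parity and the second reports
multi-hit; a status word with only the parity bit still reports parity
with either table.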

> 
> > I think the i-side should not have to be changed here because it
> > matches the value not bits, so that shouldn't matter.  
> 
> Ah OK, will check.
> 
> > A bit of a shame we don't report i/d side, and ideally we'd be able
> > to report multiple conditions. The reporting APIs really want to be
> > massaged a bit, but for now this is a good step.  
> 
> Ah snap, yep, more detail & multiple conditions would be nice.
> 
> I don't really understand the way we do the reporting now. The
> struct machine_check_event is all carefully laid out with reserved
> fields and a version number and everything as if it's an ABI. But AFAICS
> it's purely internal to the kernel.
> 
> And then we have struct mce_error_info, but that's a separate thing and
> struct machine_check_event doesn't contain one of them?

Yeah I noticed that too a while back, was it an old OPAL API or maybe a
proposed new API that was never implemented? I would like to end up
doing most MCE decoding in firmware at some point, but I don't think
it's worth keeping this existing ABI thing around for it.

Thanks,
Nick


[PATCH kernel v2 5/6] powerpc/powernv: Rework TCE level allocation

2018-06-17 Thread Alexey Kardashevskiy
This moves the actual page allocation to a separate function which is going
to be reused later in on-demand TCE allocation.

While we are at it, remove the unnecessary level size round-up as the caller
does this already.
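
As a quick sanity check of the round-up removal, the sketch below compares
the old and new order computations. The function names are made up for
illustration; the point is only that the two agree whenever the caller
passes a shift of at least PAGE_SHIFT, which is the precondition this
change relies on.

```c
#include <assert.h>

/* Old computation: round the level size up to at least one page. */
static unsigned int order_rounded(unsigned int shift, unsigned int page_shift)
{
	unsigned int s = shift > page_shift ? shift : page_shift;

	return s - page_shift;
}

/* New computation: relies on the caller passing shift >= page_shift. */
static unsigned int order_plain(unsigned int shift, unsigned int page_shift)
{
	assert(shift >= page_shift);
	return shift - page_shift;
}
```

The two only differ for a shift below the page shift, where the old code
clamped the order to 0 and the new code must never be called.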

Reviewed-by: David Gibson 
Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 30 +--
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index f14b282..36c2eb0 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -31,6 +31,23 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
 	tbl->it_type = TCE_PCI;
 }
 
+static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
+{
+	struct page *tce_mem = NULL;
+	__be64 *addr;
+
+	tce_mem = alloc_pages_node(nid, GFP_KERNEL, shift - PAGE_SHIFT);
+	if (!tce_mem) {
+		pr_err("Failed to allocate a TCE memory, level shift=%d\n",
+				shift);
+		return NULL;
+	}
+	addr = page_address(tce_mem);
+	memset(addr, 0, 1UL << shift);
+
+	return addr;
+}
+
 static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx)
 {
 	__be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base;
@@ -165,21 +182,12 @@ static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int nid, unsigned int shift,
 		unsigned int levels, unsigned long limit,
 		unsigned long *current_offset, unsigned long *total_allocated)
 {
-	struct page *tce_mem = NULL;
 	__be64 *addr, *tmp;
-	unsigned int order = max_t(unsigned int, shift, PAGE_SHIFT) -
-			PAGE_SHIFT;
-	unsigned long allocated = 1UL << (order + PAGE_SHIFT);
+	unsigned long allocated = 1UL << shift;
 	unsigned int entries = 1UL << (shift - 3);
 	long i;
 
-	tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
-	if (!tce_mem) {
-		pr_err("Failed to allocate a TCE memory, order=%d\n", order);
-		return NULL;
-	}
-	addr = page_address(tce_mem);
-	memset(addr, 0, allocated);
+	addr = pnv_alloc_tce_level(nid, shift);
 	*total_allocated += allocated;
 
 	--levels;
-- 
2.11.0



[PATCH kernel v2 3/6] KVM: PPC: Make iommu_table::it_userspace big endian

2018-06-17 Thread Alexey Kardashevskiy
We are going to reuse multilevel TCE code for the userspace copy of
the TCE table and since it is big endian, let's make the copy big endian
too.

Reviewed-by: David Gibson 
Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h|  2 +-
 arch/powerpc/kvm/book3s_64_vio.c| 11 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c | 10 +-
 drivers/vfio/vfio_iommu_spapr_tce.c | 19 +--
 4 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 20febe0..803ac70 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -117,7 +117,7 @@ struct iommu_table {
unsigned long *it_map;   /* A simple allocation bitmap for now */
unsigned long  it_page_shift;/* table iommu page size */
struct list_head it_group_list;/* List of iommu_table_group_link */
-   unsigned long *it_userspace; /* userspace view of the table */
+   __be64 *it_userspace; /* userspace view of the table */
struct iommu_table_ops *it_ops;
struct kref it_kref;
 };
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 80ead38..1dbca4b 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -378,19 +378,19 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
 {
struct mm_iommu_table_group_mem_t *mem = NULL;
const unsigned long pgsize = 1ULL << tbl->it_page_shift;
-   unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+   __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
 
if (!pua)
/* it_userspace allocation might be delayed */
return H_TOO_HARD;
 
-   mem = mm_iommu_lookup(kvm->mm, *pua, pgsize);
+   mem = mm_iommu_lookup(kvm->mm, be64_to_cpu(*pua), pgsize);
if (!mem)
return H_TOO_HARD;
 
mm_iommu_mapped_dec(mem);
 
-   *pua = 0;
+   *pua = cpu_to_be64(0);
 
return H_SUCCESS;
 }
@@ -437,7 +437,8 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct 
iommu_table *tbl,
enum dma_data_direction dir)
 {
long ret;
-   unsigned long hpa, *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+   unsigned long hpa;
+   __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
struct mm_iommu_table_group_mem_t *mem;
 
if (!pua)
@@ -464,7 +465,7 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct 
iommu_table *tbl,
if (dir != DMA_NONE)
kvmppc_tce_iommu_mapped_dec(kvm, tbl, entry);
 
-   *pua = ua;
+   *pua = cpu_to_be64(ua);
 
return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 635f3ca..18109f3 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -200,7 +200,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
 {
struct mm_iommu_table_group_mem_t *mem = NULL;
const unsigned long pgsize = 1ULL << tbl->it_page_shift;
-   unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+   __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
 
if (!pua)
/* it_userspace allocation might be delayed */
@@ -210,13 +210,13 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm 
*kvm,
if (WARN_ON_ONCE_RM(!pua))
return H_HARDWARE;
 
-   mem = mm_iommu_lookup_rm(kvm->mm, *pua, pgsize);
+   mem = mm_iommu_lookup_rm(kvm->mm, be64_to_cpu(*pua), pgsize);
if (!mem)
return H_TOO_HARD;
 
mm_iommu_mapped_dec(mem);
 
-   *pua = 0;
+   *pua = cpu_to_be64(0);
 
return H_SUCCESS;
 }
@@ -268,7 +268,7 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, 
struct iommu_table *tbl,
 {
long ret;
unsigned long hpa = 0;
-   unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+   __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
struct mm_iommu_table_group_mem_t *mem;
 
if (!pua)
@@ -302,7 +302,7 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, 
struct iommu_table *tbl,
if (dir != DMA_NONE)
kvmppc_rm_tce_iommu_mapped_dec(kvm, tbl, entry);
 
-   *pua = ua;
+   *pua = cpu_to_be64(ua);
 
return 0;
 }
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index 451284e0..8283a4a 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -231,7 +231,7 @@ static long tce_iommu_userspace_view_alloc(struct 
iommu_table *tbl,
decrement_locked_vm(mm, cb >> PAGE_SHIFT);
return -ENOMEM;
}
-   tbl->it_userspace = uas;
+   tbl->it_userspace = (__be64 *) uas;
 
return 0;
 }
@@ -490,20 +490,20 @@ static void tce_iommu_unuse_page_v2(struct 

[PATCH kernel v2 1/6] powerpc/powernv: Remove useless wrapper

2018-06-17 Thread Alexey Kardashevskiy
This gets rid of a useless wrapper around
pnv_pci_ioda2_table_free_pages().

Reviewed-by: David Gibson 
Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 29f798c..d4c60b6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2206,11 +2206,6 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
 	pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
 }
 
-static void pnv_ioda2_table_free(struct iommu_table *tbl)
-{
-	pnv_pci_ioda2_table_free_pages(tbl);
-}
-
 static struct iommu_table_ops pnv_ioda2_iommu_ops = {
 	.set = pnv_ioda2_tce_build,
 #ifdef CONFIG_IOMMU_API
@@ -2219,7 +2214,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
 #endif
 	.clear = pnv_ioda2_tce_free,
 	.get = pnv_tce_get,
-	.free = pnv_ioda2_table_free,
+	.free = pnv_pci_ioda2_table_free_pages,
 };
 
 static int pnv_pci_ioda_dev_dma_weight(struct pci_dev *dev, void *data)
-- 
2.11.0



[PATCH kernel v2 0/6] powerpc/powernv/iommu: Optimize memory use

2018-06-17 Thread Alexey Kardashevskiy


This patchset aims to reduce actual memory use for guests with
sparse memory. The pseries guest uses dynamic DMA windows to map
the entire guest RAM but it only actually maps onlined memory,
which may not be contiguous. I hit this when I tried passing
through the NVLink2-connected GPU RAM of an NVIDIA V100: trying to
map this RAM at the same offset as in the real hardware
forced me to rework how I handle these windows.

This moves the userspace-to-host-physical translation table
(iommu_table::it_userspace) from the VFIO TCE IOMMU subdriver to
the platform code and reuses the already existing multilevel
TCE table code which we have for the hardware tables.
Finally, in 6/6 I switch to on-demand allocation so we do not
allocate huge chunks of the table if we do not have to;
there is some math in 6/6.

Changes:
v2:
* bugfix and error handling in 6/6


Please comment. Thanks.



Alexey Kardashevskiy (6):
  powerpc/powernv: Remove useless wrapper
  powerpc/powernv: Move TCE manupulation code to its own file
  KVM: PPC: Make iommu_table::it_userspace big endian
  powerpc/powernv: Add indirect levels to it_userspace
  powerpc/powernv: Rework TCE level allocation
  powerpc/powernv/ioda: Allocate indirect TCE levels on demand

 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/include/asm/iommu.h  |  11 +-
 arch/powerpc/platforms/powernv/pci.h  |  44 ++-
 arch/powerpc/kvm/book3s_64_vio.c  |  11 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c   |  18 +-
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 395 ++
 arch/powerpc/platforms/powernv/pci-ioda.c | 192 ++---
 arch/powerpc/platforms/powernv/pci.c  | 158 ---
 drivers/vfio/vfio_iommu_spapr_tce.c   |  65 +
 9 files changed, 482 insertions(+), 414 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/pci-ioda-tce.c

-- 
2.11.0



[PATCH kernel v2 6/6] powerpc/powernv/ioda: Allocate indirect TCE levels on demand

2018-06-17 Thread Alexey Kardashevskiy
At the moment we allocate the entire TCE table, twice (hardware part and
userspace translation cache). This normally works as we normally have
contiguous memory and the guest will map the entire RAM for 64bit DMA.

However if we have sparse RAM (one example is a memory device), then
we will allocate TCEs which will never be used as the guest only maps
actual memory for DMA. If it is a single level TCE table, there is nothing
we can really do but if it is a multilevel table, we can skip allocating
TCEs we know we won't need.

This adds the ability to allocate only the first level, saving memory.

This changes iommu_table::free() to avoid allocating an extra level;
iommu_table::set() will do this when needed.

This adds @alloc parameter to iommu_table::exchange() to tell the callback
if it can allocate an extra level; the flag is set to "false" for
the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
H_TOO_HARD.
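
The role of the flag can be sketched with a toy two-level table in plain
C. This is a simplification, not the pnv_tce() code: struct two_level,
entry_ptr() and LEVEL_BITS are hypothetical, calloc() stands in for the
kernel page allocator, and the first level holds plain pointers rather
than hardware-format entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define LEVEL_BITS 9			/* hypothetical: 512 entries per level */
#define LEVEL_SIZE (1UL << LEVEL_BITS)

struct two_level {
	uint64_t *leaf[LEVEL_SIZE];	/* first level: pointers to leaves */
};

/*
 * Return a pointer to entry idx. If the leaf is missing, allocate it only
 * when alloc is true; otherwise return NULL so the caller can bail out,
 * the way a real-mode handler would return H_TOO_HARD.
 */
static uint64_t *entry_ptr(struct two_level *t, unsigned long idx, bool alloc)
{
	unsigned long hi = idx >> LEVEL_BITS;
	unsigned long lo = idx & (LEVEL_SIZE - 1);

	assert(t != NULL && hi < LEVEL_SIZE);
	if (!t->leaf[hi]) {
		if (!alloc)
			return NULL;
		t->leaf[hi] = calloc(LEVEL_SIZE, sizeof(uint64_t));
		if (!t->leaf[hi])
			return NULL;	/* allocation failure looks the same */
	}
	return &t->leaf[hi][lo];
}
```

A lookup with alloc=false never allocates, so it is safe in a context that
cannot allocate. A later lookup with alloc=true fills in the missing leaf,
after which alloc=false lookups of the same range succeed.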

This still requires the entire table to be counted in mm::locked_vm.

To be conservative, this only does on-demand allocation when
the userspace cache table is requested, which is the case for VFIO.

The example math for a system replicating a powernv setup with NVLink2
in a guest:
16GB RAM mapped at 0x0
128GB GPU RAM window (16GB of actual RAM) mapped at 0x2440

the table to cover that all with 64K pages takes:
(((0x2440 + 0x20) >> 16)*8)>>20 = 4556MB

If we allocate only necessary TCE levels, we will only need:
(((0x4 + 0x4) >> 16)*8)>>20 = 4MB (plus some for indirect
levels).
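
The arithmetic can be replayed with a small helper. The hex constants used
in the usage note below are assumptions: the addresses in this message look
truncated, so the values were picked to match the stated 16GB of RAM and
128GB window. flat_table_mb() is a made-up name for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* MB needed for a flat table of 8-byte TCEs covering [0, top). */
static uint64_t flat_table_mb(uint64_t top, unsigned int page_shift)
{
	assert(page_shift < 64);
	return ((top >> page_shift) * 8) >> 20;
}
```

Assuming the GPU window starts at 0x244000000000 and is 0x2000000000 bytes
long, flat_table_mb(0x244000000000 + 0x2000000000, 16) gives 4656, which is
close to but not the same as the figure quoted above, so treat the assumed
base address with care. For the 16GB + 16GB that are actually mapped,
flat_table_mb(0x400000000 + 0x400000000, 16) gives 4.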

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* fixed bug in cleanup path which forced the entire table to be
allocated right before destroying
* added memory allocation error handling pnv_tce()
---
 arch/powerpc/include/asm/iommu.h  |  7 ++-
 arch/powerpc/platforms/powernv/pci.h  |  6 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c   |  4 +-
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 69 ---
 arch/powerpc/platforms/powernv/pci-ioda.c |  8 ++--
 drivers/vfio/vfio_iommu_spapr_tce.c   |  2 +-
 6 files changed, 69 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 4bdcf22..daa3ee5 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -70,7 +70,7 @@ struct iommu_table_ops {
unsigned long *hpa,
enum dma_data_direction *direction);
 
-   __be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
+   __be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
 #endif
void (*clear)(struct iommu_table *tbl,
long index, long npages);
@@ -122,10 +122,13 @@ struct iommu_table {
__be64 *it_userspace; /* userspace view of the table */
struct iommu_table_ops *it_ops;
struct kref it_kref;
+   int it_nid;
 };
 
+#define IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry) \
+   ((tbl)->it_ops->useraddrptr((tbl), (entry), false))
 #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
-   ((tbl)->it_ops->useraddrptr((tbl), (entry)))
+   ((tbl)->it_ops->useraddrptr((tbl), (entry), true))
 
 /* Pure 2^n version of get_order */
 static inline __attribute_const__
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 5e02408..1fa5590 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -267,8 +267,10 @@ extern int pnv_tce_build(struct iommu_table *tbl, long 
index, long npages,
unsigned long attrs);
 extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
 extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
-   unsigned long *hpa, enum dma_data_direction *direction);
-extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index);
+   unsigned long *hpa, enum dma_data_direction *direction,
+   bool alloc);
+extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index,
+   bool alloc);
 extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
 
 extern long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index db0490c..05b4865 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -200,7 +200,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
 {
struct mm_iommu_table_group_mem_t *mem = NULL;
const unsigned long pgsize = 1ULL << tbl->it_page_shift;
-   __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+   __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry);
 
if (!pua)
/* it_userspace allocation might be delayed */
@@ -264,7 +264,7 @@ static long 

[PATCH kernel v2 4/6] powerpc/powernv: Add indirect levels to it_userspace

2018-06-17 Thread Alexey Kardashevskiy
We want to support sparse memory and therefore huge chunks of DMA windows
do not need to be mapped. If a DMA window is big enough to require 2 or more
indirect levels, and the DMA window is used to map all RAM (which is
the default case for a 64bit window), we can actually save some memory by
not allocating TCEs for regions which we are not going to map anyway.

The hardware tables already support indirect levels but we also keep
a host-physical-to-userspace translation array which is allocated by
vmalloc() and is a flat array which might use quite some memory.

This converts it_userspace from a vmalloc'ed array to a multilevel table.

As the format becomes platform dependent, this replaces the direct access
to it_userspace with an iommu_table_ops::useraddrptr hook which returns
a pointer to the userspace copy of a TCE; a future extension will return
NULL if the level was not allocated.

This should not change non-KVM handling of TCE tables and it_userspace
will not be allocated for non-KVM tables.

Reviewed-by: David Gibson 
Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h  |  6 +--
 arch/powerpc/platforms/powernv/pci.h  |  3 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c   |  8 
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 65 +--
 arch/powerpc/platforms/powernv/pci-ioda.c | 31 ++---
 drivers/vfio/vfio_iommu_spapr_tce.c   | 46 ---
 6 files changed, 81 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 803ac70..4bdcf22 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -69,6 +69,8 @@ struct iommu_table_ops {
long index,
unsigned long *hpa,
enum dma_data_direction *direction);
+
+   __be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
 #endif
void (*clear)(struct iommu_table *tbl,
long index, long npages);
@@ -123,9 +125,7 @@ struct iommu_table {
 };
 
 #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
-   ((tbl)->it_userspace ? \
-   &((tbl)->it_userspace[(entry) - (tbl)->it_offset]) : \
-   NULL)
+   ((tbl)->it_ops->useraddrptr((tbl), (entry)))
 
 /* Pure 2^n version of get_order */
 static inline __attribute_const__
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index f507baf..5e02408 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -268,11 +268,12 @@ extern int pnv_tce_build(struct iommu_table *tbl, long 
index, long npages,
 extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
 extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
unsigned long *hpa, enum dma_data_direction *direction);
+extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index);
 extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
 
 extern long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
__u32 page_shift, __u64 window_size, __u32 levels,
-   struct iommu_table *tbl);
+   bool alloc_userspace_copy, struct iommu_table *tbl);
 extern void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
 
 extern long pnv_pci_link_table_and_group(int node, int num,
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 18109f3..db0490c 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -206,10 +206,6 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
/* it_userspace allocation might be delayed */
return H_TOO_HARD;
 
-   pua = (void *) vmalloc_to_phys(pua);
-   if (WARN_ON_ONCE_RM(!pua))
-   return H_HARDWARE;
-
mem = mm_iommu_lookup_rm(kvm->mm, be64_to_cpu(*pua), pgsize);
if (!mem)
return H_TOO_HARD;
@@ -282,10 +278,6 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, 
struct iommu_table *tbl,
if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, &hpa)))
return H_HARDWARE;
 
-   pua = (void *) vmalloc_to_phys(pua);
-   if (WARN_ON_ONCE_RM(!pua))
-   return H_HARDWARE;
-
if (WARN_ON_ONCE_RM(mm_iommu_mapped_inc(mem)))
return H_CLOSED;
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c 
b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index 700ceb1..f14b282 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -31,9 +31,9 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
tbl->it_type = TCE_PCI;
 }
 
-static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
+static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx)

[PATCH kernel v2 2/6] powerpc/powernv: Move TCE manipulation code to its own file

2018-06-17 Thread Alexey Kardashevskiy
Right now we have allocation code in pci-ioda.c and traversing code in
pci.c; let's keep them together. However, both files are big enough
already, so let's move this code to a new file.

While we are at it, move the code which links IOMMU table groups to
IOMMU tables, as it is not specific to any PNV PHB model.

This also puts the exported symbols from the new file together.

This fixes several warnings from checkpatch.pl like this:
"WARNING: Prefer 'unsigned int' to bare use of 'unsigned'".

As this is almost cut-n-paste, there should be no behavioral change.

Reviewed-by: David Gibson 
Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/pci.h  |  41 ++--
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 313 ++
 arch/powerpc/platforms/powernv/pci-ioda.c | 146 
 arch/powerpc/platforms/powernv/pci.c  | 158 -
 5 files changed, 340 insertions(+), 320 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/pci-ioda-tce.c

diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 703a350..b540ce8e 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -6,7 +6,7 @@ obj-y   += opal-msglog.o opal-hmi.o 
opal-power.o opal-irqchip.o
 obj-y  += opal-kmsg.o opal-powercap.o opal-psr.o 
opal-sensor-groups.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
-obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
+obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
 obj-$(CONFIG_CXL_BASE) += pci-cxl.o
 obj-$(CONFIG_EEH)  += eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 1408247..f507baf 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -202,13 +202,6 @@ struct pnv_phb {
 };
 
 extern struct pci_ops pnv_pci_ops;
-extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
-   unsigned long uaddr, enum dma_data_direction direction,
-   unsigned long attrs);
-extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
-extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
-   unsigned long *hpa, enum dma_data_direction *direction);
-extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
 
 void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
unsigned char *log_buff);
@@ -218,14 +211,6 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
  int where, int size, u32 val);
 extern struct iommu_table *pnv_pci_table_alloc(int nid);
 
-extern long pnv_pci_link_table_and_group(int node, int num,
-   struct iommu_table *tbl,
-   struct iommu_table_group *table_group);
-extern void pnv_pci_unlink_table_and_group(struct iommu_table *tbl,
-   struct iommu_table_group *table_group);
-extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
- void *tce_mem, u64 tce_size,
- u64 dma_offset, unsigned page_shift);
 extern void pnv_pci_init_ioda_hub(struct device_node *np);
 extern void pnv_pci_init_ioda2_phb(struct device_node *np);
 extern void pnv_pci_init_npu_phb(struct device_node *np);
@@ -273,4 +258,30 @@ extern void pnv_cxl_cx4_teardown_msi_irqs(struct pci_dev 
*pdev);
 /* phb ops (cxl switches these when enabling the kernel api on the phb) */
 extern const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops;
 
+/* pci-ioda-tce.c */
+#define POWERNV_IOMMU_DEFAULT_LEVELS   1
+#define POWERNV_IOMMU_MAX_LEVELS   5
+
+extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
+   unsigned long uaddr, enum dma_data_direction direction,
+   unsigned long attrs);
+extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
+extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
+   unsigned long *hpa, enum dma_data_direction *direction);
+extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
+
+extern long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
+   __u32 page_shift, __u64 window_size, __u32 levels,
+   struct iommu_table *tbl);
+extern void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
+
+extern long pnv_pci_link_table_and_group(int node, int num,
+   struct iommu_table *tbl,
+   struct iommu_table_group *table_group);
+extern void pnv_pci_unlink_table_and_group(struct iommu_table *tbl,
+   struct iommu_table_group *table_group);
+extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
+   void *tce_mem, u64 tce_size,
+   u64 dma_offset, 

Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"

2018-06-17 Thread Andreas Schwab
On Jun 16 2018, Eric Dumazet  wrote:

> I would try something like :
>
> Basically do not bother using CHECKSUM_COMPLETE for small frames that might 
> have been padded.
>
> Since we need to bring one cache line in eth_type_trans() and further header 
> processing,
> computing the checksum in software will be almost free anyway.
>
> diff --git a/drivers/net/ethernet/sun/sungem.c 
> b/drivers/net/ethernet/sun/sungem.c
> index 
> 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..071039f211a8a33153e888bd4014314ba5e686a4
>  100644
> --- a/drivers/net/ethernet/sun/sungem.c
> +++ b/drivers/net/ethernet/sun/sungem.c
> @@ -855,9 +855,11 @@ static int gem_rx(struct gem *gp, int work_to_do)
> skb = copy_skb;
> }
>  
> -   csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
> -   skb->csum = csum_unfold(csum);
> -   skb->ip_summed = CHECKSUM_COMPLETE;
> +   if (len > ETH_ZLEN) {
> +   csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
> +   skb->csum = csum_unfold(csum);
> +   skb->ip_summed = CHECKSUM_COMPLETE;
> +   }
> skb->protocol = eth_type_trans(skb, gp->dev);
>  
> napi_gro_receive(&gp->napi, skb);

That doesn't change anything.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
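
For readers following the thread, the idea in the quoted suggestion (only
trust the hardware checksum for frames longer than the minimum Ethernet
size, since shorter ones may have been padded) can be sketched in plain
userspace C. This is only an illustration under stated assumptions: the
demo_* names are hypothetical, and the constants simply mirror the
kernel's ETH_ZLEN (60), CHECKSUM_NONE (0) and CHECKSUM_COMPLETE (2). It
is not the sungem driver code.

```c
#include <assert.h>

/* Hypothetical stand-ins mirroring the kernel values, for illustration only. */
#define DEMO_ETH_ZLEN          60 /* minimum Ethernet frame length, without FCS */
#define DEMO_CHECKSUM_NONE      0 /* stack must verify the checksum in software */
#define DEMO_CHECKSUM_COMPLETE  2 /* device supplied a checksum over the frame */

/*
 * Decide how a received frame of 'len' bytes should be marked.
 * Frames at or below the minimum length may carry padding that the
 * hardware included in its sum, so they fall back to software checksumming.
 */
static int demo_rx_csum_mode(unsigned int len)
{
	if (len > DEMO_ETH_ZLEN)
		return DEMO_CHECKSUM_COMPLETE;
	return DEMO_CHECKSUM_NONE;
}

/* Small usage example: a minimum-size frame versus a full-size one. */
static void demo_rx_csum_usage(void)
{
	assert(demo_rx_csum_mode(60) == DEMO_CHECKSUM_NONE);
	assert(demo_rx_csum_mode(1500) == DEMO_CHECKSUM_COMPLETE);
}
```

As Andreas reports, this change did not make the problem go away on his
machine, so the sketch only documents the proposal, not a fix.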


Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"

2018-06-17 Thread Andreas Schwab
On Jun 16 2018, Mathieu Malaterre  wrote:

> That's odd since it seems to only affect g4+sungem user. None of the
> ppc64 seems to be having it.

I'm also seeing it on a PowerMac G5.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH v2] mm: convert return type of handle_mm_fault() caller to vm_fault_t

2018-06-17 Thread Souptick Joarder
Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.

Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")

In this patch, all callers of handle_mm_fault() are changed
to store its result in a vm_fault_t, and those that return it
now return vm_fault_t as well.
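
As a rough illustration of the distinction described above (a VM_FAULT
code is not an errno), here is a minimal userspace sketch. Everything
prefixed demo_ or DEMO_ is a hypothetical stand-in, the typedef is a
simplification of the kernel's, and the bit values are made up for the
example rather than taken from the kernel headers.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-in for the kernel type; illustration only. */
typedef unsigned int vm_fault_t;

/* Hypothetical fault bits; not the kernel's VM_FAULT_* values. */
#define DEMO_VM_FAULT_OOM    0x1u
#define DEMO_VM_FAULT_SIGBUS 0x2u

/* Stand-in for handle_mm_fault(): returns fault bits, never -errno. */
static vm_fault_t demo_handle_mm_fault(unsigned long address)
{
	if (address == 0)
		return DEMO_VM_FAULT_SIGBUS;
	return 0;
}

/*
 * A caller keeps the result in a vm_fault_t and converts it to an errno
 * explicitly, instead of mixing both kinds of value in one int.
 */
static int demo_fault_to_errno(vm_fault_t fault)
{
	if (fault & DEMO_VM_FAULT_OOM)
		return -ENOMEM;
	if (fault & DEMO_VM_FAULT_SIGBUS)
		return -EFAULT;
	return 0;
}

/* Usage: the fault code and the errno live in differently typed variables. */
static void demo_fault_usage(void)
{
	vm_fault_t fault = demo_handle_mm_fault(0);
	int err = demo_fault_to_errno(fault);

	assert(fault == DEMO_VM_FAULT_SIGBUS);
	assert(err == -EFAULT);
}
```

Keeping the two in separate variables is the same pattern as the
"int fault, ret;" to "int ret; vm_fault_t fault;" splits in the diff below.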

Signed-off-by: Souptick Joarder 
---
v2: Fixed kbuild error

 arch/alpha/mm/fault.c |  3 ++-
 arch/arc/mm/fault.c   |  4 +++-
 arch/arm/mm/fault.c   |  7 ---
 arch/arm64/mm/fault.c |  6 +++---
 arch/hexagon/mm/vm_fault.c|  2 +-
 arch/ia64/mm/fault.c  |  2 +-
 arch/m68k/mm/fault.c  |  4 ++--
 arch/microblaze/mm/fault.c|  2 +-
 arch/mips/mm/fault.c  |  2 +-
 arch/nds32/mm/fault.c |  2 +-
 arch/nios2/mm/fault.c |  2 +-
 arch/openrisc/mm/fault.c  |  2 +-
 arch/parisc/mm/fault.c|  2 +-
 arch/powerpc/include/asm/copro.h  |  4 +++-
 arch/powerpc/mm/copro_fault.c |  2 +-
 arch/powerpc/mm/fault.c   |  7 ---
 arch/powerpc/platforms/cell/spufs/fault.c |  2 +-
 arch/riscv/mm/fault.c |  3 ++-
 arch/s390/mm/fault.c  | 13 -
 arch/sh/mm/fault.c|  4 ++--
 arch/sparc/mm/fault_32.c  |  3 ++-
 arch/sparc/mm/fault_64.c  |  3 ++-
 arch/um/kernel/trap.c |  2 +-
 arch/unicore32/mm/fault.c |  9 +
 arch/x86/mm/fault.c   |  5 +++--
 arch/xtensa/mm/fault.c|  2 +-
 drivers/iommu/amd_iommu_v2.c  |  2 +-
 drivers/iommu/intel-svm.c |  4 +++-
 drivers/misc/cxl/fault.c  |  2 +-
 drivers/misc/ocxl/link.c  |  3 ++-
 mm/hmm.c  |  8 
 mm/ksm.c  |  2 +-
 32 files changed, 69 insertions(+), 51 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index cd3c572..2a979ee 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -87,7 +87,8 @@
struct vm_area_struct * vma;
struct mm_struct *mm = current->mm;
const struct exception_table_entry *fixup;
-   int fault, si_code = SEGV_MAPERR;
+   int si_code = SEGV_MAPERR;
+   vm_fault_t fault;
siginfo_t info;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index a0b7bd6..3a18d33 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -66,7 +67,8 @@ void do_page_fault(unsigned long address, struct pt_regs 
*regs)
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
siginfo_t info;
-   int fault, ret;
+   int ret;
+   vm_fault_t fault;
int write = regs->ecr_cause & ECR_C_PROTV_STORE;  /* ST/EX */
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index b75eada..758abcb 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -219,12 +219,12 @@ static inline bool access_error(unsigned int fsr, struct 
vm_area_struct *vma)
return vma->vm_flags & mask ? false : true;
 }
 
-static int __kprobes
+static vm_fault_t __kprobes
 __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
unsigned int flags, struct task_struct *tsk)
 {
struct vm_area_struct *vma;
-   int fault;
+   vm_fault_t fault;
 
vma = find_vma(mm, addr);
fault = VM_FAULT_BADMAP;
@@ -259,7 +259,8 @@ static inline bool access_error(unsigned int fsr, struct 
vm_area_struct *vma)
 {
struct task_struct *tsk;
struct mm_struct *mm;
-   int fault, sig, code;
+   int sig, code;
+   vm_fault_t fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
if (notify_page_fault(regs, fsr))
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 2af3dd8..8da263b 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -371,12 +371,12 @@ static void do_bad_area(unsigned long addr, unsigned int 
esr, struct pt_regs *re
 #define VM_FAULT_BADMAP0x01
 #define VM_FAULT_BADACCESS 0x02
 
-static int __do_page_fault(struct mm_struct *mm, unsigned long addr,
+static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr,
   unsigned int mm_flags, unsigned long vm_flags,
   struct task_struct *tsk)
 {
struct vm_area_struct *vma;
-   int fault;
+   

Re: [powerpc/powervmc]kernel BUG at arch/powerpc/mm/pgtable-book3s64.c:414!

2018-06-17 Thread Venkat Rao B

On Friday 15 June 2018 07:14 PM, Michael Ellerman wrote:
> vrbagal1  writes:
>
>> Hi,
>>
>> Observing kernel bug followed by kernel oops and system reboots, while
>> running kselftest on Power8 LPAR.
>>
>> Machine Details: Power8 LPAR
>> Git Tree:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> Commit ID: f5b7769eb0400ec5217a47e41148a9f816ca1f9f
>> Kernel version: 4.17.0-autotest
>> GCC version: (gcc version 6.3.1 20161221 (Red Hat 6.3.1-1) (GCC))
>
> This is fixed by your patch Aneesh?
>
> http://patchwork.ozlabs.org/patch/929325/

Yes, this patch fixes the issue.

Regards,
Venkat

> cheers