[PATCH] powerpc/8xx: add missing header in 8xx_mmu.c

2018-10-17 Thread Christophe Leroy
arch/powerpc/mm/8xx_mmu.c:174:6: error: no previous prototype for ‘set_context’ [-Werror=missing-prototypes]
 void set_context(unsigned long id, pgd_t *pgd)

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/8xx_mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 36484a2ef915..64ee7597380e 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -13,6 +13,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 
-- 
2.13.3
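Background on the warning: gcc's -Wmissing-prototypes fires when a non-static function is defined without a prior declaration in scope. The sketch below shows the fix pattern. `demo_set_context()` and its parameter types are hypothetical stand-ins, and the declaration that the patch pulls in through the new #include is written inline here.

```c
/*
 * Hypothetical stand-in for set_context(). In the real fix the prototype
 * comes from the header the patch includes; here it is written inline.
 */
unsigned long demo_set_context(unsigned long id, unsigned long pgd);

/*
 * Because the declaration above is in scope, building this file with
 * gcc -Wmissing-prototypes -Werror emits no diagnostic for it.
 */
unsigned long demo_set_context(unsigned long id, unsigned long pgd)
{
	/* Placeholder body for illustration only. */
	return id ^ pgd;
}
```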



[PATCH] powerpc/mm: remove unused variable

2018-10-17 Thread Christophe Leroy
In file included from ./include/linux/hugetlb.h:445:0,
 from arch/powerpc/kernel/setup-common.c:37:
./arch/powerpc/include/asm/hugetlb.h: In function ‘huge_ptep_clear_flush’:
./arch/powerpc/include/asm/hugetlb.h:154:8: error: variable ‘pte’ set but not used [-Werror=unused-but-set-variable]

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hugetlb.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 2d00cc530083..cb812b131a37 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -148,8 +148,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep)
 {
-	pte_t pte;
-	pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+	huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
 	flush_hugetlb_page(vma, addr);
 }
 
-- 
2.13.3
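The pattern behind -Wunused-but-set-variable, sketched with hypothetical stand-ins (`struct demo_pte`, `demo_get_and_clear()`, `demo_clear_flush()`) rather than the real hugetlb helpers: when only the side effect of a call matters, call it and drop the assignment, as the patch does.

```c
/* Hypothetical stand-in for pte_t. */
struct demo_pte {
	unsigned long val;
};

/* Like huge_ptep_get_and_clear(): clears the entry, returns the old value. */
static struct demo_pte demo_get_and_clear(struct demo_pte *ptep)
{
	struct demo_pte old = *ptep;

	ptep->val = 0;
	return old;
}

/*
 * Writing "pte = demo_get_and_clear(ptep);" with pte never read would
 * trigger -Wunused-but-set-variable. Calling the helper for its side
 * effect alone keeps the behaviour and silences the warning.
 */
void demo_clear_flush(struct demo_pte *ptep)
{
	demo_get_and_clear(ptep);
	/* A TLB flush (flush_hugetlb_page() in the kernel) would follow here. */
}
```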



Re: MPC8321 boot failure

2018-10-17 Thread Christophe LEROY

Hi,

I can now confirm that the boot failure is due to the absence of commit 
8183d99f4a22 ("powerpc/lib/feature-fixups: use raw_patch_instruction()")


Greg, could you please apply that patch to 4.14 stable?

Thanks
Christophe

On 17/10/2018 at 18:36, Christophe LEROY wrote:

Hi,

Yes, I discovered the same issue today on MPC8321E. I plan to look at it
more closely tomorrow morning (Paris time).


I think we are missing commit 8183d99f4a22c2abbc543847a588df3666ef0c0c. I
didn't realise it when we applied the series to 4.14: without that patch,
patch_instruction() is called too early.


If you have the opportunity to test it now, you are welcome to; otherwise
I'll test it tomorrow.


Christophe

On 17/10/2018 at 17:18, David Gounaris wrote:
Hello, I ran into trouble when I upgraded to Linux kernel 4.14.76 on
boards with MPC8321.



The symptom I see is that the boot process loops: no printouts are seen
from the Linux kernel, and it seems like the board resets.



When I revert the following commits, it works again:

af1a8101794dfea897290e057f61086dabfe6c91, powerpc/lib: fix book3s/32 boot failure due to code patching
609fbeddb24c4035d24fc32d82dc08b30ae3dfc0, powerpc: Avoid code patching freed init sections


Any ideas of how to continue?

BR / David Gounaris





Re: [PATCH] powerpc: Add missing include

2018-10-17 Thread Christophe LEROY




On 17/10/2018 at 21:25, Mathieu Malaterre wrote:

In commit 88b0fe175735 ("powerpc: Add show_user_instructions()") the
function show_user_instructions was added.

This commit adds an include of header file  to provide
the missing function prototype. Silence the following gcc warning
(treated as error with W=1):

   arch/powerpc/kernel/process.c:1302:6: error: no previous prototype for ‘show_user_instructions’ [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre 


This is already fixed, see 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=c9386bfd37d37f29588de9ea9add455510049c33


Christophe


---
  arch/powerpc/kernel/process.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index bb6ac471a784..1c64491e9702 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -65,6 +65,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 



Re: [PATCH 1/5] powerpc/64s: Kernel Hypervisor Restricted Access Prevention

2018-10-17 Thread Russell Currey
On Wed, 2018-10-17 at 22:30 +1100, Michael Ellerman wrote:
> Russell Currey  writes:
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 7b1693adff2a..090f72cbb02d 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -286,6 +286,9 @@ BEGIN_FTR_SECTION
> > HMT_MEDIUM_LOW
> >  END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
> >  
> > +   /* headed back to userspace, so unlock the AMR */
> > +   UNLOCK_AMR(r2)
> > +
> 
> This one needs an ifdef, or preferably an empty version in a header
> for non-book3s 64, otherwise we get:
> 
>   arch/powerpc/kernel/entry_64.S: Assembler messages:
>   arch/powerpc/kernel/entry_64.S:290: Error: unrecognized opcode: `unlock_amr(%r2)'
>   scripts/Makefile.build:405: recipe for target 'arch/powerpc/kernel/entry_64.o' failed
> 
> That's a corenet64-ish defconfig.

Yep, sorry.  I knew it wouldn't build on non-64s but I just wanted to
get the main part out there so people could start looking at it.  Will
fix.

- Russell

> 
> cheers
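Michael's suggestion (an empty version of the macro for platforms without the feature) is the usual header-stub pattern. The real macro is assembler used from entry_64.S; the sketch below shows the same idea with the C preprocessor. `DEMO_KHRAP`, `demo_amr` and `demo_return_to_user()` are hypothetical names.

```c
/* Pretend special-purpose register; ~0UL stands for "locked". */
static unsigned long demo_amr = ~0UL;

#ifdef DEMO_KHRAP
/* Feature builds: clear the register to open user access. */
#define UNLOCK_AMR(reg)		do { (reg) = 0; demo_amr = (reg); } while (0)
#else
/* Other platforms: an empty stub, so call sites need no #ifdef. */
#define UNLOCK_AMR(reg)		do { (void)(reg); } while (0)
#endif

unsigned long demo_return_to_user(void)
{
	unsigned long r2 = 123;

	UNLOCK_AMR(r2);		/* compiles on every configuration */
	return demo_amr;
}
```

Built without `DEMO_KHRAP`, the call site still compiles and the simulated register is left untouched.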



Re: [PATCH 1/5] powerpc/64s: Kernel Hypervisor Restricted Access Prevention

2018-10-17 Thread Russell Currey
On Wed, 2018-10-17 at 22:59 +1000, Nicholas Piggin wrote:
> On Wed, 17 Oct 2018 17:44:19 +1100
> Russell Currey  wrote:
> 
> > Kernel Hypervisor Restricted Access Prevention (KHRAP) utilises a
> > feature of the Radix MMU which disallows read and write access to
> > userspace addresses.  By utilising this, the kernel is prevented from
> > accessing user data from outside of trusted paths that perform proper
> > safety checks, such as copy_{to/from}_user() and friends.
> > 
> > Userspace access is disabled from early boot and is only enabled when:
> > 
> > - exiting the kernel and entering userspace
> > - performing an operation like copy_{to/from}_user()
> > - context switching to a process that has access enabled
> > 
> > and similarly, access is disabled again when exiting userspace and
> > entering the kernel.
> > 
> > This feature has a slight performance impact which I roughly measured
> > to be 4% slower (performing 1GB of 1 byte read()/write() syscalls), and
> > is gated behind the CONFIG_PPC_RADIX_KHRAP option for
> > performance-critical builds.
> > 
> > This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y)
> > and performing the following:
> > 
> > echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT
> > 
> > if enabled, this should send SIGSEGV to the thread.
> > 
> > Signed-off-by: Russell Currey 
> > ---
> > More detailed benchmarks soon; there are more optimisations to do here
> > as well.
> 
> Nice, this turned out to be a lot neater than I feared! Good stuff.
> 
> > @@ -240,6 +240,22 @@ BEGIN_FTR_SECTION_NESTED(941)					\
> >  	mtspr	SPRN_PPR,ra;						\
> >  END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,941)
> >  
> > +#define LOCK_AMR(reg)							\
> > +BEGIN_MMU_FTR_SECTION_NESTED(69)					\
> > +	LOAD_REG_IMMEDIATE(reg,AMR_LOCKED);				\
> > +	isync;								\
> > +	mtspr	SPRN_AMR,reg;						\
> > +	isync;								\
> > +END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KHRAP,MMU_FTR_RADIX_KHRAP,69)
> > +
> > +#define UNLOCK_AMR(reg)							\
> > +BEGIN_MMU_FTR_SECTION_NESTED(420)					\
> > +	li	reg,0;							\
> > +	isync;								\
> > +	mtspr	SPRN_AMR,reg;						\
> > +	isync;								\
> > +END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KHRAP,MMU_FTR_RADIX_KHRAP,420)
> 
> I wonder if you can skip the first isync on the way in and the second
> isync on the way out because the interrupt and return should be
> context synchronizing. Might not make a difference though.

Ben thought we wouldn't need at least one of them, but it's
implementation dependent, so there might be some concern with future
chips actually needing both isyncs or something.  There weren't any
consequences to leaving out isyncs, so I'll do some quick benchmarking to
see whether leaving one out gives any meaningful speedup.

> 
> What do you think about making the name match the C code a bit more.
> Like AMR_LOCK_USER_ACCESS()?

That is a good idea.

> 
> Thanks,
> Nick
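The lock/unlock discipline described in the quoted commit message can be modelled in plain C to make the invariant explicit: user access is locked by default and opened only inside trusted copy routines. This is a software simulation with hypothetical names (`demo_user_access_locked`, `demo_copy_from_user()`, and so on); the real feature is enforced by the Radix MMU through the AMR, not by a flag check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_EFAULT 14

/* Simulated AMR state: user access is locked from early boot. */
static bool demo_user_access_locked = true;

static void demo_unlock_user_access(void)
{
	demo_user_access_locked = false;
}

static void demo_lock_user_access(void)
{
	demo_user_access_locked = true;
}

/*
 * An unchecked kernel dereference of a user pointer. With KHRAP the MMU
 * faults this access; the model just reports -EFAULT instead.
 */
int demo_kernel_read_user(const char *uptr, char *out)
{
	if (demo_user_access_locked)
		return -DEMO_EFAULT;
	*out = *uptr;
	return 0;
}

/*
 * Trusted path in the style of copy_from_user(): open the window, copy,
 * close it again. Returns the number of bytes not copied (always 0 here).
 */
size_t demo_copy_from_user(void *to, const void *from, size_t n)
{
	demo_unlock_user_access();
	memcpy(to, from, n);
	demo_lock_user_access();
	return 0;
}
```

The stray read fails both before and after the copy, because the window is closed again as soon as the trusted routine finishes.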



Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2

2018-10-17 Thread Alistair Popple
Hi Alexey,

> > wouldn't you also need to do that somewhere? Unless the driver
> > does it at startup?
> 
> VFIO performs GPU reset so I'd expect the GPUs to flush its caches
> without any software interactions. Am I hoping for too much here?

Sadly you are. It's not the GPU caches that need flushing, it's the CPU caches. 
This needs to happen as part of the reset sequence, so I guess you would need 
to add it to the VFIO driver.

- Alistair

> >>> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy wrote:
>  Ping?
>  
>  On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
> > The skiboot firmware has a hot reset handler which fences the NVIDIA
> > V100 GPU RAM on Witherspoons and makes accesses no-op instead of
> > throwing HMIs:
> > https://github.com/open-power/skiboot/commit/fca2b2b839a67
> > 
> > Now we are going to pass V100 via VFIO which most certainly involves
> > KVM guests which are often terminated without getting a chance to
> > offline GPU RAM so we end up with a running machine with misconfigured
> > memory. Accessing this memory produces hardware management interrupts
> > (HMI) which bring the host down.
> > 
> > To suppress HMIs, this wires up this hot reset hook to
> > vfio_pci_disable() via pci_disable_device() which switches NPU2 to a
> > safe mode and prevents HMIs.
> > 
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> > Changes:
> > v2:
> > * updated the commit log
> > ---
> > 
> >  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index cde7102..e37b9cc 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
> >  		pnv_ioda_release_pe(pe);
> >  }
> >  
> > +static void pnv_npu_disable_device(struct pci_dev *pdev)
> > +{
> > +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> > +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> > +
> > +	if (eehpe && eeh_ops && eeh_ops->reset)
> > +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> > +}
> > +
> >  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> >  {
> >  	struct pnv_phb *phb = hose->private_data;
> > @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
> >  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> >  	.dma_set_mask		= pnv_npu_dma_set_mask,
> >  	.shutdown		= pnv_pci_ioda_shutdown,
> > +	.disable_device		= pnv_npu_disable_device,
> >  };
> >  
> >  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
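The patch relies on an optional hook in the PHB's `pci_controller_ops`; per the commit message, the hot reset is reached from vfio_pci_disable() via pci_disable_device(). Below is a simplified model of that optional-callback dispatch. The `demo_*` types are hypothetical stand-ins for the kernel structures, and a counter stands in for `eeh_ops->reset(eehpe, EEH_RESET_HOT)`.

```c
#include <stddef.h>

struct demo_dev;

/* Ops table with an optional hook, in the style of pci_controller_ops. */
struct demo_controller_ops {
	void (*disable_device)(struct demo_dev *dev);
};

struct demo_dev {
	const struct demo_controller_ops *ops;
	int hot_resets;		/* counts simulated EEH hot resets */
};

/* Counterpart of pnv_npu_disable_device(): request a hot reset. */
static void demo_npu_disable_device(struct demo_dev *dev)
{
	dev->hot_resets++;
}

const struct demo_controller_ops demo_npu_ops = {
	.disable_device = demo_npu_disable_device,
};

/* A controller that does not implement the hook. */
const struct demo_controller_ops demo_plain_ops = {
	.disable_device = NULL,
};

/* Generic disable path: invoke the hook only when the platform set one. */
void demo_disable_device(struct demo_dev *dev)
{
	if (dev->ops && dev->ops->disable_device)
		dev->ops->disable_device(dev);
}
```

Keeping the hook optional means only the NPU controller ops gain the reset behaviour; other PHBs keep a no-op disable path.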




Re: [PATCH kernel 3/3] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver

2018-10-17 Thread Alexey Kardashevskiy



On 18/10/2018 08:52, Alex Williamson wrote:
> On Wed, 17 Oct 2018 12:19:20 +1100
> Alexey Kardashevskiy  wrote:
> 
>> On 17/10/2018 06:08, Alex Williamson wrote:
>>> On Mon, 15 Oct 2018 20:42:33 +1100
>>> Alexey Kardashevskiy  wrote:
>>>   
>>>> POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
>>>> pluggable PCIe devices but implement PCIe links for config space and MMIO.
>>>> In addition to that the GPUs are interconnected to each other and also
>>>> have direct links to the P9 CPU. The links are NVLink2 and provide direct
>>>> access to the system RAM for GPUs via NPU (an NVLink2 "proxy" on P9 chip).
>>>> These systems also support ATS (address translation services) which is
>>>> a part of the NVLink2 protocol. Such GPUs also share on-board RAM
>>>> (16GB in tested config) to the system via the same NVLink2 so a CPU has
>>>> cache-coherent access to a GPU RAM.
>>>>
>>>> This exports GPU RAM to the userspace as a new PCI region. This
>>>> preregisters the new memory as device memory as it might be used for DMA.
>>>> This inserts pfns from the fault handler as the GPU memory is not onlined
>>>> until the NVIDIA driver is loaded and trained the links so doing this
>>>> earlier produces low level errors which we fence in the firmware so
>>>> it does not hurt the host system but still better to avoid.
>>>>
>>>> This exports ATSD (Address Translation Shootdown) register of NPU which
>>>> allows the guest to invalidate TLB. The register conveniently occupies
>>>> a single 64k page. Since NPU maps the GPU memory, it has a "tgt" property
>>>> (which is an abbreviated host system bus address). This exports the "tgt"
>>>> as a capability so the guest can program it into the GPU so the GPU can
>>>> know how to route DMA traffic.
>>>
>>> I'm not really following what "tgt" is and why it's needed.  Is the GPU
>>> memory here different than the GPU RAM region above?  Why does the user
>>> need the host system bus address of this "tgt" thing?  Are we not able
>>> to relocate it in guest physical address space, does this shootdown
>>> only work in the host physical address space and therefore we need this
>>> offset?  Please explain, I'm confused.  
>>
>>
>> This "tgt" is made of:
>> - "memory select" (bits 45, 46)
>> - "group select" (bits 43, 44)
>> - "chip select" (bit 42)
>> - chip internal address (bits 0..41)
>>
>> These are internal to GPU and this is where GPU RAM is mapped into the
>> GPU's real space, this fits 46 bits.
>>
>> On POWER9 CPU the bits are different and higher so the same memory is
>> mapped higher on P9 CPU. Just because we can map it higher, I guess.
>>
>> So it is not exactly the address but this provides the exact physical
>> location of the memory.
>>
>> We have a group of 3 interconnected GPUs, they got their own
>> memory/group/chip numbers. The GPUs use ATS service to translate
>> userspace to physical (host or guest) addresses. Now a GPU needs to know
>> which specific link to use for a specific physical address, in other
>> words what this physical address belongs to - a CPU or one of GPUs. This
>> is when "tgt" is used by the GPU hardware.
> 
> Clear as mud ;) 

/me is sad. I hope Piotr explained it better...


> So tgt, provided by the npu2 capability of the ATSD
> region of the NPU tells the GPU (a completely separate device) how to
> route to its own RAM via its NVLink interface?  How can one tgt
> indicate the routing for multiple interfaces?

This NVLink DMA uses direct host physical addresses (no IOMMU, no
filtering) which come from ATS. So unless we tell the GPU its own
address range on the host CPU, it will route traffic via the CPU. The
driver can also discover the NVLink topology and tell each GPU the
physical addresses of its peer GPUs.
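Using the bit positions quoted earlier in this email (memory select 45-46, group select 43-44, chip select 42, chip-internal address 0-41), a "tgt" value can be split with shifts and masks. A minimal sketch: the struct and function names are hypothetical, and the layout is taken from the thread rather than from NVIDIA documentation.

```c
#include <stdint.h>

/* Fields of a "tgt" value as described in the thread (hypothetical names). */
struct demo_tgt {
	unsigned int mem_sel;	/* bits 45..46 */
	unsigned int group_sel;	/* bits 43..44 */
	unsigned int chip_sel;	/* bit 42 */
	uint64_t offset;	/* chip-internal address, bits 0..41 */
};

/* Extract 'width' bits of v starting at bit 'lo'. */
static inline uint64_t demo_tgt_bits(uint64_t v, unsigned int lo,
				     unsigned int width)
{
	return (v >> lo) & ((UINT64_C(1) << width) - 1);
}

struct demo_tgt demo_tgt_decode(uint64_t tgt)
{
	struct demo_tgt f;

	f.mem_sel = (unsigned int)demo_tgt_bits(tgt, 45, 2);
	f.group_sel = (unsigned int)demo_tgt_bits(tgt, 43, 2);
	f.chip_sel = (unsigned int)demo_tgt_bits(tgt, 42, 1);
	f.offset = demo_tgt_bits(tgt, 0, 42);
	return f;
}
```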



> 
>> A GPU could run all the DMA traffic via the system bus indeed, just not
>> as fast.
>>
>> I am also struggling here and adding an Nvidia person in cc: (I should
>> have done that when I posted the patches, my bad) to correct when/if I
>> am wrong.
>>
>>
>>
>>>
>>>> For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to
>>>> know LPID (a logical partition ID or a KVM guest hardware ID in other
>>>> words) and PID (a memory context ID of a userspace process, not to be
>>>> confused with a linux pid). This assigns a GPU to LPID in the NPU and
>>>> this is why this adds a listener for KVM on an IOMMU group. A PID comes
>>>> via NVLink from a GPU and NPU uses a PID wildcard to pass it through.
>>>>
>>>> This requires coherent memory and ATSD to be available on the host as
>>>> the GPU vendor only supports configurations with both features enabled
>>>> and other configurations are known not to work. Because of this and
>>>> because of the ways the features are advertised to the host system
>>>> (which is a device tree with very platform specific properties),
>>>> this requires enabled POWERNV platform.
>>>>
>>>> This hardcodes the NVLink2 support for specific vendor and device IDs
>>>> as there is no 

Re: [PATCH kernel 3/3] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver

2018-10-17 Thread Piotr Jaroszyński

On 10/17/18 2:52 PM, Alex Williamson wrote:

On Wed, 17 Oct 2018 12:19:20 +1100
Alexey Kardashevskiy  wrote:


On 17/10/2018 06:08, Alex Williamson wrote:

On Mon, 15 Oct 2018 20:42:33 +1100
Alexey Kardashevskiy  wrote:
   

POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
pluggable PCIe devices but implement PCIe links for config space and MMIO.
In addition to that the GPUs are interconnected to each other and also
have direct links to the P9 CPU. The links are NVLink2 and provide direct
access to the system RAM for GPUs via NPU (an NVLink2 "proxy" on P9 chip).
These systems also support ATS (address translation services) which is
a part of the NVLink2 protocol. Such GPUs also share on-board RAM
(16GB in tested config) to the system via the same NVLink2 so a CPU has
cache-coherent access to a GPU RAM.

This exports GPU RAM to the userspace as a new PCI region. This
preregisters the new memory as device memory as it might be used for DMA.
This inserts pfns from the fault handler as the GPU memory is not onlined
until the NVIDIA driver is loaded and trained the links so doing this
earlier produces low level errors which we fence in the firmware so
it does not hurt the host system but still better to avoid.

This exports ATSD (Address Translation Shootdown) register of NPU which
allows the guest to invalidate TLB. The register conveniently occupies
a single 64k page. Since NPU maps the GPU memory, it has a "tgt" property
(which is an abbreviated host system bus address). This exports the "tgt"
as a capability so the guest can program it into the GPU so the GPU can
know how to route DMA traffic.


I'm not really following what "tgt" is and why it's needed.  Is the GPU
memory here different than the GPU RAM region above?  Why does the user
need the host system bus address of this "tgt" thing?  Are we not able
to relocate it in guest physical address space, does this shootdown
only work in the host physical address space and therefore we need this
offset?  Please explain, I'm confused.



This "tgt" is made of:
- "memory select" (bits 45, 46)
- "group select" (bits 43, 44)
- "chip select" (bit 42)
- chip internal address (bits 0..41)

These are internal to GPU and this is where GPU RAM is mapped into the
GPU's real space, this fits 46 bits.

On POWER9 CPU the bits are different and higher so the same memory is
mapped higher on P9 CPU. Just because we can map it higher, I guess.

So it is not exactly the address but this provides the exact physical
location of the memory.

We have a group of 3 interconnected GPUs, they got their own
memory/group/chip numbers. The GPUs use ATS service to translate
userspace to physical (host or guest) addresses. Now a GPU needs to know
which specific link to use for a specific physical address, in other
words what this physical address belongs to - a CPU or one of GPUs. This
is when "tgt" is used by the GPU hardware.


Clear as mud ;)  So tgt, provided by the npu2 capability of the ATSD
region of the NPU tells the GPU (a completely separate device) how to
route to its own RAM via its NVLink interface?  How can one tgt
indicate the routing for multiple interfaces?


The tgt addresses are read by the GPU driver for each GPU from the 
device tree properties and are used to program routing for all the GPUs 
in the VM. Each GPU needs to know:
1) Its own address range so that it can route ATS accesses to it 
directly to its RAM.
2) All direct peer GPUs (connected with nvlink directly) address ranges 
so that it can route accesses to peers through links going directly to 
those peers.

3) Everything else gets routed to the links going to the CPU.

Anticipating a question about the security implications of allowing the 
guest to configure this routing, no, this is not a problem:
1) If a range gets misprogrammed causing an access to be routed to the 
CPU nvlink incorrectly, the CPU still receives the full physical address 
of the access and can drop or re-route it correctly.
2) If a range gets misprogrammed causing an access to go to a peer GPU 
incorrectly, the guest can corrupt memory of that peer GPU, but it fully 
owns that GPU anyway.
3) If a range gets misprogrammed causing an access to go to local GPU 
memory incorrectly, the guest can corrupt that memory, but it fully owns 
it anyway.
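Piotr's three routing rules amount to a simple lookup. A minimal sketch, assuming each address range is a base/size pair (an assumption: the real GPU routing registers are not described in the thread) and using hypothetical names. Anything no range claims falls through to the CPU link, which per point 1 of the security notes can still drop or re-route it.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_route { DEMO_ROUTE_LOCAL_RAM, DEMO_ROUTE_PEER_LINK, DEMO_ROUTE_CPU_LINK };

struct demo_range {
	uint64_t base;
	uint64_t size;
};

/* Per-GPU routing state, as programmed by the guest driver (hypothetical). */
struct demo_gpu_routing {
	struct demo_range self;		/* rule 1: own RAM */
	const struct demo_range *peers;	/* rule 2: directly linked peers */
	size_t npeers;
};

static int demo_in_range(const struct demo_range *r, uint64_t a)
{
	return a >= r->base && a - r->base < r->size;
}

/* Pick the link for physical address pa; report the peer index if any. */
enum demo_route demo_route_access(const struct demo_gpu_routing *g,
				  uint64_t pa, size_t *peer_idx)
{
	size_t i;

	if (demo_in_range(&g->self, pa))
		return DEMO_ROUTE_LOCAL_RAM;
	for (i = 0; i < g->npeers; i++) {
		if (demo_in_range(&g->peers[i], pa)) {
			if (peer_idx)
				*peer_idx = i;
			return DEMO_ROUTE_PEER_LINK;
		}
	}
	return DEMO_ROUTE_CPU_LINK;	/* rule 3: everything else */
}
```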


Thanks,
Piotr





A GPU could run all the DMA traffic via the system bus indeed, just not
as fast.

I am also struggling here and adding an Nvidia person in cc: (I should
have done that when I posted the patches, my bad) to correct when/if I
am wrong.





For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to
know LPID (a logical partition ID or a KVM guest hardware ID in other
words) and PID (a memory context ID of a userspace process, not to be
confused with a linux pid). This assigns a GPU to LPID in the NPU and
this is why this adds a listener for KVM on an IOMMU group. A PID comes
via NVLink from a GPU and NPU uses a PID wildcard to pass it through.


Re: [PATCH kernel 3/3] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver

2018-10-17 Thread Alex Williamson
On Wed, 17 Oct 2018 12:19:20 +1100
Alexey Kardashevskiy  wrote:

> On 17/10/2018 06:08, Alex Williamson wrote:
> > On Mon, 15 Oct 2018 20:42:33 +1100
> > Alexey Kardashevskiy  wrote:
> >   
> >> POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
> >> pluggable PCIe devices but implement PCIe links for config space and MMIO.
> >> In addition to that the GPUs are interconnected to each other and also
> >> have direct links to the P9 CPU. The links are NVLink2 and provide direct
> >> access to the system RAM for GPUs via NPU (an NVLink2 "proxy" on P9 chip).
> >> These systems also support ATS (address translation services) which is
> >> a part of the NVLink2 protocol. Such GPUs also share on-board RAM
> >> (16GB in tested config) to the system via the same NVLink2 so a CPU has
> >> cache-coherent access to a GPU RAM.
> >>
> >> This exports GPU RAM to the userspace as a new PCI region. This
> >> preregisters the new memory as device memory as it might be used for DMA.
> >> This inserts pfns from the fault handler as the GPU memory is not onlined
> >> until the NVIDIA driver is loaded and trained the links so doing this
> >> earlier produces low level errors which we fence in the firmware so
> >> it does not hurt the host system but still better to avoid.
> >>
> >> This exports ATSD (Address Translation Shootdown) register of NPU which
> >> allows the guest to invalidate TLB. The register conveniently occupies
> >> a single 64k page. Since NPU maps the GPU memory, it has a "tgt" property
> >> (which is an abbreviated host system bus address). This exports the "tgt"
> >> as a capability so the guest can program it into the GPU so the GPU can
> >> know how to route DMA traffic.
> > 
> > I'm not really following what "tgt" is and why it's needed.  Is the GPU
> > memory here different than the GPU RAM region above?  Why does the user
> > need the host system bus address of this "tgt" thing?  Are we not able
> > to relocate it in guest physical address space, does this shootdown
> > only work in the host physical address space and therefore we need this
> > offset?  Please explain, I'm confused.  
> 
> 
> This "tgt" is made of:
> - "memory select" (bits 45, 46)
> - "group select" (bits 43, 44)
> - "chip select" (bit 42)
> - chip internal address (bits 0..41)
> 
> These are internal to GPU and this is where GPU RAM is mapped into the
> GPU's real space, this fits 46 bits.
> 
> On POWER9 CPU the bits are different and higher so the same memory is
> mapped higher on P9 CPU. Just because we can map it higher, I guess.
> 
> So it is not exactly the address but this provides the exact physical
> location of the memory.
> 
> We have a group of 3 interconnected GPUs, they got their own
> memory/group/chip numbers. The GPUs use ATS service to translate
> userspace to physical (host or guest) addresses. Now a GPU needs to know
> which specific link to use for a specific physical address, in other
> words what this physical address belongs to - a CPU or one of GPUs. This
> is when "tgt" is used by the GPU hardware.

Clear as mud ;)  So tgt, provided by the npu2 capability of the ATSD
region of the NPU tells the GPU (a completely separate device) how to
route to its own RAM via its NVLink interface?  How can one tgt
indicate the routing for multiple interfaces?

> A GPU could run all the DMA traffic via the system bus indeed, just not
> as fast.
> 
> I am also struggling here and adding an Nvidia person in cc: (I should
> have done that when I posted the patches, my bad) to correct when/if I
> am wrong.
> 
> 
> 
> >
> >> For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to
> >> know LPID (a logical partition ID or a KVM guest hardware ID in other
> >> words) and PID (a memory context ID of an userspace process, not to be
> >> confused with a linux pid). This assigns a GPU to LPID in the NPU and
> >> this is why this adds a listener for KVM on an IOMMU group. A PID comes
> >> via NVLink from a GPU and NPU uses a PID wildcard to pass it through.
> >>
> >> This requires coherent memory and ATSD to be available on the host as
> >> the GPU vendor only supports configurations with both features enabled
> >> and other configurations are known not to work. Because of this and
> >> because of the ways the features are advertised to the host system
> >> (which is a device tree with very platform specific properties),
> >> this requires enabled POWERNV platform.
> >>
> >> This hardcodes the NVLink2 support for specific vendor and device IDs
> >> as there is no reliable way of knowing about coherent memory and ATS
> >> support. The GPU has a unique vendor PCIe capability 0x23 but it was
> >> confirmed that it does not provide required information (and it is still
> >> undisclosed what it actually does).
> >>
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >>  drivers/vfio/pci/Makefile   |   1 +
> >>  drivers/vfio/pci/vfio_pci_private.h |   2 +
> >>  include/uapi/linux/vfio.h   |  

Re: [PATCH v4 01/18] of: overlay: add tests to validate kfrees from overlay removal

2018-10-17 Thread Alan Tull
On Mon, Oct 15, 2018 at 9:39 PM  wrote:

Hi Frank,

>
> From: Frank Rowand 
>
> Add checks:
>   - attempted kfree due to refcount reaching zero before overlay
>     is removed
>   - properties linked to an overlay node when the node is removed
>   - node refcount > one during node removal in a changeset destroy,
>     if the node was created by the changeset
>
> After applying this patch, several validation warnings will be
> reported from the devicetree unittest during boot due to
> pre-existing devicetree bugs. The warnings will be similar to:
>
>   OF: ERROR: of_node_release() overlay node /testcase-data/overlay-node/test-bus/test-unittest11/test-unittest111 contains unexpected properties
>   OF: ERROR: memory leak - destroy cset entry: attach overlay node /testcase-data-2/substation@100/hvac-medium-2 expected refcount 1 instead of 2.  of_node_get() / of_node_put() are unbalanced for this node.
>
> Signed-off-by: Frank Rowand 
> ---
> Changes since v3:
>   - Add expected value of refcount for destroy cset entry error.  Also
> explain the cause of the error.
>
>  drivers/of/dynamic.c | 29 +
>  drivers/of/overlay.c |  1 +
>  include/linux/of.h   | 15 ++-
>  3 files changed, 40 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index f4f8ed9b5454..24c97b7a050f 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -330,6 +330,25 @@ void of_node_release(struct kobject *kobj)
> if (!of_node_check_flag(node, OF_DYNAMIC))
> return;
>
> +	if (of_node_check_flag(node, OF_OVERLAY)) {
> +
> +		if (!of_node_check_flag(node, OF_OVERLAY_FREE_CSET)) {
> +			/* premature refcount of zero, do not free memory */
> +			pr_err("ERROR: memory leak %s() overlay node %pOF before free overlay changeset\n",
> +			       __func__, node);
> +			return;
> +		}
> +
> +		/*
> +		 * If node->properties non-empty then properties were added
> +		 * to this node either by different overlay that has not
> +		 * yet been removed, or by a non-overlay mechanism.
> +		 */
> +		if (node->properties)
> +			pr_err("ERROR: %s() overlay node %pOF contains unexpected properties\n",
> +			       __func__, node);
> +	}
> +
> property_list_free(node->properties);
> property_list_free(node->deadprops);
>
> @@ -434,6 +453,16 @@ struct device_node *__of_node_dup(const struct device_node *np,
>
>  static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
>  {
> +	if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> +	    of_node_check_flag(ce->np, OF_OVERLAY)) {
> +		if (kref_read(&ce->np->kobj.kref) > 1) {
> +			pr_err("ERROR: memory leak - destroy cset entry: attach overlay node %pOF expected refcount 1 instead of %d.  of_node_get() / of_node_put() are unbalanced for this node.\n",
> +			       ce->np, kref_read(&ce->np->kobj.kref));

Still testing as much as I have time to do.

I'm hitting this error message once when removing an overlay that adds
several child nodes.  The only node I get the message for was a node
that added a fixed-clock (the other nodes didn't trigger the error).
Then even if I edited all the rest of the overlay DTS and removed all
other child nodes and all references to the clock from other nodes, I
still got the error.

Removing dtbo: 1-socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dtb
[   72.032270] OF: ERROR: memory leak - destroy cset entry: attach overlay node /soc/base_fpga_region/clk_0 expected refcount 1 instead of 2.  of_node_get() / of_node_put() are unbalanced for this node.

Here's the very stripped down overlay:

/dts-v1/;
/plugin/;
/ {
	fragment@0 {
		target-path = "/soc/base_fpga_region";
		#address-cells = <1>;
		#size-cells = <1>;

		__overlay__ {
			external-fpga-config;

			#address-cells = <1>;
			#size-cells = <1>;

			clk_0: clk_0 {
				compatible = "fixed-clock";
				#clock-cells = <0>;
				clock-frequency = <1>;  /* 100.00 MHz */
				clock-output-names = "clk_0-clk";
			};
		};
	};
};

I'll look at it some more tomorrow and try to figure out what's
special about this node.

Alan

> +		} else {
> +			of_node_set_flag(ce->np, OF_OVERLAY_FREE_CSET);
> +		}
> +	}
> +
>  	of_node_put(ce->np);
>  	list_del(&ce->node);
>  	kfree(ce);
> diff --git a/drivers/of/overlay.c 
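The two checks in the quoted diff interact through the OF_OVERLAY_FREE_CSET flag: the changeset destroy sets it only when the refcount is exactly 1, and of_node_release() refuses to free an overlay node without it. A simplified model follows. The `demo_*` names are hypothetical, and a plain int replaces the kobject kref. It reproduces the case Alan reports, where clk_0 still has refcount 2 at destroy time.

```c
#include <stdio.h>

/* Hypothetical stand-ins for OF_OVERLAY and OF_OVERLAY_FREE_CSET. */
#define DEMO_OF_OVERLAY			(1u << 0)
#define DEMO_OF_OVERLAY_FREE_CSET	(1u << 1)

struct demo_node {
	unsigned int flags;
	int refcount;	/* stands in for kref_read(&np->kobj.kref) */
	int nprops;	/* properties still attached to the node */
};

/* Shape of the __of_changeset_entry_destroy() check; returns 1 on leak. */
int demo_cset_entry_destroy(struct demo_node *np, int is_attach)
{
	int leak = 0;

	if (is_attach && (np->flags & DEMO_OF_OVERLAY)) {
		if (np->refcount > 1) {
			fprintf(stderr, "memory leak: expected refcount 1 instead of %d\n",
				np->refcount);
			leak = 1;
		} else {
			np->flags |= DEMO_OF_OVERLAY_FREE_CSET;
		}
	}
	np->refcount--;		/* the of_node_put() that follows */
	return leak;
}

/* Shape of the of_node_release() check; returns 1 if memory may be freed. */
int demo_node_release(const struct demo_node *np)
{
	if (np->flags & DEMO_OF_OVERLAY) {
		if (!(np->flags & DEMO_OF_OVERLAY_FREE_CSET))
			return 0;	/* premature refcount of zero: do not free */
		if (np->nprops)
			fprintf(stderr, "overlay node contains unexpected properties\n");
	}
	return 1;
}
```

With a balanced refcount the node is marked and later freed; with an extra reference the leak is reported and a later release leaves the memory alone.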

Re: [PATCH v4 00/18] of: overlay: validation checks, subsequent fixes

2018-10-17 Thread Alan Tull
On Tue, Oct 16, 2018 at 10:08 PM Frank Rowand  wrote:
>
> On 10/16/18 02:47, Michael Ellerman wrote:
> > frowand.l...@gmail.com writes:
> >
> >> From: Frank Rowand 
> >>
> >> Add checks to (1) overlay apply process and (2) memory freeing
> >> triggered by overlay release.  The checks are intended to detect
> >> possible memory leaks and invalid overlays.
> >>
> >> The checks revealed bugs in existing code.  Fixed the bugs.
> >>
> >> While fixing bugs, noted other issues, which are fixed in
> >> separate patches.
> >>
> >> *  Powerpc folks: I was not able to test the patches that
> >> *  directly impact Powerpc systems that use dynamic
> >> *  devicetree.  Please review that code carefully and
> >> *  test.  The specific patches are: 03/16, 04/16, 07/16
> >
> > Hi Frank,
> >
> > Do you have this series in a git tree somewhere?
> >
> > I tried applying it on top of linux-next but hit some conflicts which I
> > couldn't easily resolve.
> >
> > cheers
> >
>
>
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frowand/linux.git
>
> $ git checkout v4.19-rc1--kfree_validate--v4
>
> $ git log --oneline v4.19-rc1..
> 2ba1b7d353dd of: unittest: initialize args before calling of_*parse_*()
> 4f9108209f79 of: unittest: find overlays[] entry by name instead of index
> 353403c76ff8 of: unittest: allow base devicetree to have symbol metadata
> 8fc37e04a01b of: overlay: set node fields from properties when add new overlay n
> 05d5df0e5151 of: unittest: remove unused of_unittest_apply_overlay() argument
> 8c021cba757a of: overlay: check prevents multiple fragments touching same proper
> 797a6f66e039 of: overlay: check prevents multiple fragments add or delete same n
> c385e25a040d of: overlay: test case of two fragments adding same node
> c88fd240f0e0 of: overlay: make all pr_debug() and pr_err() messages unique
> 1028a215d32a of: overlay: validate overlay properties #address-cells and #size-c
> f1a97ef74ce4 of: overlay: reorder fields in struct fragment
> ffe78cf7a1fb of: dynamic: change type of of_{at,de}tach_node() to void
> 5f5ff8ec0c0c of: overlay: do not duplicate properties from overlay for new nodes
> 06e72dcb2bb0 of: overlay: use prop add changeset entry for property in new nodes
> a02f8d326a08 powerpc/pseries: add of_node_put() in dlpar_detach_node()
> e203be664330 of: overlay: add missing of_node_get() in __of_attach_node_sysfs
> 8eb46208e7c8 of: overlay: add missing of_node_put() after add new node to change
> b22067db7cf9 of: overlay: add tests to validate kfrees from overlay removal

That branch is a real time saver, thanks!

Alan


[PATCH v07 5/5] migration/memory: Support 'ibm,dynamic-memory-v2'

2018-10-17 Thread Michael Bringmann
migration/memory: This patch adds recognition for changes to the
associativity of memory blocks described by 'ibm,dynamic-memory-v2'.
If the associativity of an LMB has changed, it should be readded to
the system in order to update local and general kernel data structures.
This patch builds upon previous enhancements that scan the device-tree
"ibm,dynamic-memory" properties using the base LMB array, and a copy
derived from the updated properties.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 6856010..03c5e49 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -1187,7 +1187,8 @@ static int pseries_memory_notifier(struct notifier_block *nb,
err = pseries_remove_mem_node(rd->dn);
break;
case OF_RECONFIG_UPDATE_PROPERTY:
-   if (!strcmp(rd->prop->name, "ibm,dynamic-memory")) {
+   if (!strcmp(rd->prop->name, "ibm,dynamic-memory") ||
+   !strcmp(rd->prop->name, "ibm,dynamic-memory-v2")) {
struct drmem_lmb_info *dinfo =
drmem_lmbs_init(rd->prop);
if (!dinfo)



[PATCH v07 4/5] migration/memory: Evaluate LMB assoc changes

2018-10-17 Thread Michael Bringmann
migration/memory: This patch adds code that recognizes changes to
the associativity of memory blocks described by the device-tree
properties in order to drive equivalent 'hotplug' operations to
update local and general kernel data structures to reflect those
changes.  These differences may include:

* Evaluate 'ibm,dynamic-memory' properties when processing the
  updated device-tree properties of the system during Post Migration
  events (migration_store).  The new functionality looks for changes
  to the aa_index values for each drc_index/LMB to identify any memory
  blocks that should be readded.

* In an LPAR migration scenario, the "ibm,associativity-lookup-arrays"
  property may change.  In the event that a row of the array differs,
  locate all assigned memory blocks with that 'aa_index' and 're-add'
  them to the system memory block data structures.  In the process of
  the 're-add', the system routines will update the corresponding entry
  for the memory in the LMB structures and any other relevant kernel
  data structures.
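As a rough illustration of the second case above, the sketch below flags every LMB whose `aa_index` selects a lookup-array row that differs between the old and the new property. It is a userspace model, not the patch's code: `struct assoc_table` loosely mirrors the `assoc_arrays` layout (with cells assumed already converted to host order), and `struct lmb_assoc_sketch` and `mark_changed_rows()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened lookup table: n_arrays rows of array_sz cells. */
struct assoc_table {
	uint32_t n_arrays;
	uint32_t array_sz;
	const uint32_t *arrays;
};

struct lmb_assoc_sketch {
	uint32_t aa_index;	/* row of the lookup table this LMB uses */
	bool needs_readd;	/* set when that row changed */
};

/* Mark every LMB whose associativity row differs between old and new. */
static unsigned int mark_changed_rows(const struct assoc_table *old_t,
				      const struct assoc_table *new_t,
				      struct lmb_assoc_sketch *lmbs, size_t n_lmbs)
{
	unsigned int marked = 0;
	size_t row_bytes = new_t->array_sz * sizeof(uint32_t);

	for (size_t i = 0; i < n_lmbs; i++) {
		uint32_t idx = lmbs[i].aa_index;
		bool changed;

		if (idx >= new_t->n_arrays)
			continue;	/* index not in the new table: nothing to compare */
		/* A missing old row or a different row width counts as a change. */
		changed = idx >= old_t->n_arrays ||
			  old_t->array_sz != new_t->array_sz ||
			  memcmp(&old_t->arrays[idx * old_t->array_sz],
				 &new_t->arrays[idx * new_t->array_sz],
				 row_bytes) != 0;
		if (changed) {
			lmbs[i].needs_readd = true;
			marked++;
		}
	}
	return marked;
}
```

Every LMB sharing a changed row is marked, so one modified row can trigger many re-adds.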

A number of previous extensions made to the DRMEM code for scanning
device-tree properties and creating LMB arrays are used here to
ensure that the resulting code is simpler and more usable:

* Use new paired list iterator for the DRMEM LMB info arrays to find
  differences in old and new versions of properties.
* Use new iterator for copies of the DRMEM info arrays to evaluate
  completely new structures.
* Combine common code for parsing and evaluating memory description
  properties based on the DRMEM LMB array model to greatly simplify
  extension from the older property 'ibm,dynamic-memory' to the new
  property model of 'ibm,dynamic-memory-v2'.
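A minimal sketch of the paired-walk idea in the first bullet above: a walker hands a callback one entry from the current array and the matching entry from the updated copy, and the callback marks LMBs whose `aa_index` moved. Everything here is hypothetical (`walk_pairs()`, `mark_if_moved()`, `struct lmb_pair_sketch`). It assumes both arrays list the same LMBs in the same order, and stopping the walk on a non-zero return is a choice of this sketch rather than a documented property of `walk_drmem_lmbs_pairs()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct lmb_pair_sketch {
	uint32_t drc_index;
	uint32_t aa_index;
	bool marked;
};

/* Hypothetical pair walker: calls func on entries at the same position. */
static int walk_pairs(struct lmb_pair_sketch *cur,
		      const struct lmb_pair_sketch *oth, size_t n,
		      int (*func)(struct lmb_pair_sketch *,
				  const struct lmb_pair_sketch *, void *),
		      void *data)
{
	for (size_t i = 0; i < n; i++) {
		int rc = func(&cur[i], &oth[i], data);

		if (rc)
			return rc;	/* non-zero stops the walk early */
	}
	return 0;
}

/* Callback: mark an LMB when its associativity index changed. */
static int mark_if_moved(struct lmb_pair_sketch *cur,
			 const struct lmb_pair_sketch *oth, void *data)
{
	unsigned int *count = data;

	if (cur->drc_index != oth->drc_index)
		return -1;	/* arrays are not aligned; give up */
	if (cur->aa_index != oth->aa_index) {
		cur->marked = true;
		(*count)++;
	}
	return 0;
}
```

Passing the counter through `data` keeps the walker generic; other comparisons only need a different callback.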

For support, add a new pseries hotplug action for DLPAR operations,
PSERIES_HP_ELOG_ACTION_READD_MULTIPLE.  It is a variant of the READD
operation which performs the action upon multiple instances of the
resource at one time.  The operation is to be triggered by device-tree
analysis of updates by RTAS events analyzed by 'migration_store' during
post-migration processing.  It will be used for memory updates,
initially.

Signed-off-by: Michael Bringmann 
---
Changes in v06:
  -- Rebase to powerpc next branch to account for recent code changes.
  -- Fix prototype problem when CONFIG_MEMORY_HOTPLUG not defined.
Changes in v05:
  -- Move common structure from numa.c + hotplug-memory.c to header file.
  -- Clarify some comments.
  -- Use walk_drmem_lmbs_pairs and callback instead of local loop
Changes in v04:
  -- Move dlpar_memory_readd_multiple() function definition and use
 into previous patch along with action constant definition.
  -- Correct spacing in patch
Changes in v03:
  -- Modify the code that parses the memory affinity attributes to
 mark relevant DRMEM LMB array entries using the internal_flags
 mechanism instead of generating unique hotplug actions for each
 memory block to be readded.  The change is intended to both
 simplify the code, and to require fewer resources on systems
 with huge amounts of memory.
  -- Save up notice about all LMB entries until the end of the
 'migration_store' operation at which point a single action is
 queued to scan the entire DRMEM array.
  -- Add READD_MULTIPLE function for memory that scans the DRMEM
 array to identify multiple entries that were marked previously.
 The corresponding memory blocks are to be readded to the system
 to update relevant data structures outside of the powerpc-
 specific code.
  -- Change dlpar_memory_pmt_changes_action to directly queue worker
 to pseries work queue.
---
 arch/powerpc/include/asm/topology.h |7 +
 arch/powerpc/mm/numa.c  |6 -
 arch/powerpc/platforms/pseries/hotplug-memory.c |  207 +++
 arch/powerpc/platforms/pseries/mobility.c   |3 
 arch/powerpc/platforms/pseries/pseries.h|8 +
 5 files changed, 186 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index a4a718d..fbe03df 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -135,5 +135,12 @@ static inline void shared_proc_topology_init(void) {}
 #endif
 #endif
 
+
+struct assoc_arrays {
+   u32 n_arrays;
+   u32 array_sz;
+   const __be32 *arrays;
+};
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_TOPOLOGY_H */
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 693ae1c..f1e7287 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -368,12 +368,6 @@ static unsigned long read_n_cells(int n, const __be32 **buf)
return result;
 }
 
-struct assoc_arrays {
-   u32 n_arrays;
-   u32 array_sz;
-   const __be32 *arrays;
-};
-
 /*
  * Retrieve and validate the list of associativity arrays for drconf
  * memory from the ibm,associativity-lookup-arrays property of the
diff --git 

[PATCH v07 3/5] migration/memory: Add hotplug READD_MULTIPLE

2018-10-17 Thread Michael Bringmann
migration/memory: This patch adds a new pseries hotplug action
for CPU and memory operations, PSERIES_HP_ELOG_ACTION_READD_MULTIPLE.
This is a variant of the READD operation which performs the action
upon multiple instances of the resource at one time.  The operation
is to be triggered by device-tree analysis of updates by RTAS events
analyzed by 'migration_store' during post-migration processing.  It
will be used for memory updates, initially.
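The mark-then-sweep flow can be modelled in a few lines. This is an illustrative sketch, not the patch's code: `struct lmb_readd_sketch` and `readd_marked()` are hypothetical, and `readd_one()` is a stub returning a canned result where the real code would call `dlpar_memory_readd_helper()`. Any failure is reported as a single `-EIO`, as in v07. A `bool` is used here instead of OR-ing return codes, because OR-ing two different negative errno values yields a meaningless number.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct lmb_readd_sketch {
	unsigned int drc_index;
	bool marked;		/* set by an earlier device-tree comparison */
	int readd_result;	/* canned result for the stubbed re-add */
};

/* Stub standing in for the real remove-then-add of one LMB. */
static int readd_one(struct lmb_readd_sketch *lmb)
{
	return lmb->readd_result;
}

/*
 * Re-add every marked LMB, clear the mark whether or not it succeeded,
 * and report a single -EIO if any re-add failed.
 */
static int readd_marked(struct lmb_readd_sketch *lmbs, size_t n)
{
	bool failed = false;

	for (size_t i = 0; i < n; i++) {
		if (!lmbs[i].marked)
			continue;
		if (readd_one(&lmbs[i]))
			failed = true;
		lmbs[i].marked = false;
	}
	return failed ? -EIO : 0;
}
```

Clearing the mark even on failure keeps a later pass from retrying the same LMB indefinitely; whether that is desirable is a policy choice.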

Signed-off-by: Michael Bringmann 
---
Changes in v07:
  -- Provide more useful return value from dlpar_memory_readd_multiple
Changes in v05:
  -- Provide dlpar_memory_readd_helper routine to compress some common code
Changes in v04:
  -- Move init of 'lmb->internal_flags' in init_drmem_v2_lmbs to
 previous patch.
  -- Pull in implementation of dlpar_memory_readd_multiple() to go
 with operation flag.
---
 arch/powerpc/include/asm/rtas.h |1 
 arch/powerpc/platforms/pseries/hotplug-memory.c |   47 ---
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 0183e95..cc00451 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -333,6 +333,7 @@ struct pseries_hp_errorlog {
 #define PSERIES_HP_ELOG_ACTION_ADD 1
 #define PSERIES_HP_ELOG_ACTION_REMOVE  2
 #define PSERIES_HP_ELOG_ACTION_READD   3
+#define PSERIES_HP_ELOG_ACTION_READD_MULTIPLE  4
 
 #define PSERIES_HP_ELOG_ID_DRC_NAME1
 #define PSERIES_HP_ELOG_ID_DRC_INDEX   2
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2b796da..c44c6a6 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -507,6 +507,19 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
return rc;
 }
 
+static int dlpar_memory_readd_helper(struct drmem_lmb *lmb)
+{
+   int rc;
+
+   rc = dlpar_remove_lmb(lmb);
+   if (!rc) {
+   rc = dlpar_add_lmb(lmb);
+   if (rc)
+   dlpar_release_drc(lmb->drc_index);
+   }
+   return rc;
+}
+
 static int dlpar_memory_readd_by_index(u32 drc_index)
 {
struct drmem_lmb *lmb;
@@ -519,12 +532,7 @@ static int dlpar_memory_readd_by_index(u32 drc_index)
for_each_drmem_lmb(lmb) {
if (lmb->drc_index == drc_index) {
lmb_found = 1;
-   rc = dlpar_remove_lmb(lmb);
-   if (!rc) {
-   rc = dlpar_add_lmb(lmb);
-   if (rc)
-   dlpar_release_drc(lmb->drc_index);
-   }
+   rc = dlpar_memory_readd_helper(lmb);
break;
}
}
@@ -541,6 +549,26 @@ static int dlpar_memory_readd_by_index(u32 drc_index)
return rc;
 }
 
+static int dlpar_memory_readd_multiple(void)
+{
+   struct drmem_lmb *lmb;
+   int rc = 0;
+
+   pr_info("Attempting to update multiple LMBs\n");
+
+   for_each_drmem_lmb(lmb) {
+   if (drmem_lmb_update(lmb)) {
+   rc |= dlpar_memory_readd_helper(lmb);
+   drmem_remove_lmb_update(lmb);
+   }
+   }
+
+   if (rc)
+   return -EIO;
+
+   return rc;
+}
+
 static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 {
struct drmem_lmb *lmb, *start_lmb, *end_lmb;
@@ -641,6 +669,10 @@ static int dlpar_memory_readd_by_index(u32 drc_index)
 {
return -EOPNOTSUPP;
 }
+static int dlpar_memory_readd_multiple(void)
+{
+   return -EOPNOTSUPP;
+}
 
 static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 {
@@ -918,6 +950,9 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
drc_index = hp_elog->_drc_u.drc_index;
rc = dlpar_memory_readd_by_index(drc_index);
break;
+   case PSERIES_HP_ELOG_ACTION_READD_MULTIPLE:
+   rc = dlpar_memory_readd_multiple();
+   break;
default:
pr_err("Invalid action (%d) specified\n", hp_elog->action);
rc = -EINVAL;



[PATCH v07 2/5] powerpc/drmem: Add internal_flags feature

2018-10-17 Thread Michael Bringmann
powerpc/drmem: Add internal_flags field to each LMB to allow
marking of kernel software-specific operations that need not
be exported to other users.  For instance, if information about
selected LMBs needs to be maintained for subsequent passes
through the system, it can be encoded into the LMB array itself
without requiring the allocation and maintenance of additional
data structures.
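A self-contained sketch of the mark/test/clear cycle that the new helpers provide, using a trimmed-down stand-in struct. `struct lmb_sketch`, `LMBINT_UPDATE` and the helper names are hypothetical. Keeping the bit in its own field means a later pass can find marked LMBs without ever modifying the firmware-visible `flags`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trimmed-down stand-in for struct drmem_lmb (only the fields used here). */
struct lmb_sketch {
	uint32_t drc_index;
	uint32_t flags;			/* firmware-visible flags, never touched below */
	uint32_t internal_flags;	/* kernel-only bookkeeping bits */
};

#define LMBINT_UPDATE 0x0001	/* plays the role of DRMEM_LMBINT_UPDATE */

static void mark_update(struct lmb_sketch *lmb)
{
	lmb->internal_flags |= LMBINT_UPDATE;
}

static void clear_update(struct lmb_sketch *lmb)
{
	lmb->internal_flags &= ~LMBINT_UPDATE;
}

static bool needs_update(const struct lmb_sketch *lmb)
{
	return lmb->internal_flags & LMBINT_UPDATE;
}
```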

Signed-off-by: Michael Bringmann 
---
Changes in v04:
  -- Add another initialization of 'lmb->internal_flags' to
 init_drmem_v2_lmbs.
---
 arch/powerpc/include/asm/drmem.h |   18 ++
 arch/powerpc/mm/drmem.c  |3 +++
 2 files changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index cfe8598..dbb3e6c 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -17,6 +17,7 @@ struct drmem_lmb {
u32 drc_index;
u32 aa_index;
u32 flags;
+   u32 internal_flags;
 };
 
 struct drmem_lmb_info {
@@ -94,6 +95,23 @@ static inline bool drmem_lmb_reserved(struct drmem_lmb *lmb)
return lmb->flags & DRMEM_LMB_RESERVED;
 }
 
+#define DRMEM_LMBINT_UPDATE0x0001
+
+static inline void drmem_mark_lmb_update(struct drmem_lmb *lmb)
+{
+   lmb->internal_flags |= DRMEM_LMBINT_UPDATE;
+}
+
+static inline void drmem_remove_lmb_update(struct drmem_lmb *lmb)
+{
+   lmb->internal_flags &= ~DRMEM_LMBINT_UPDATE;
+}
+
+static inline bool drmem_lmb_update(struct drmem_lmb *lmb)
+{
+   return lmb->internal_flags & DRMEM_LMBINT_UPDATE;
+}
+
 u64 drmem_lmb_memory_max(void);
 void __init walk_drmem_lmbs(struct device_node *dn,
void (*func)(struct drmem_lmb *, const __be32 **));
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index ded9dbf..f199fe5 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -207,6 +207,7 @@ static void read_drconf_v1_cell(struct drmem_lmb *lmb,
 
lmb->aa_index = of_read_number(p++, 1);
lmb->flags = of_read_number(p++, 1);
+   lmb->internal_flags = 0;
 
*prop = p;
 }
@@ -265,6 +266,7 @@ static void __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
 
lmb.aa_index = dr_cell.aa_index;
lmb.flags = dr_cell.flags;
+   lmb.internal_flags = 0;
 
func(&lmb, &usm);
}
@@ -441,6 +443,7 @@ static void init_drmem_v2_lmbs(const __be32 *prop,
 
lmb->aa_index = dr_cell.aa_index;
lmb->flags = dr_cell.flags;
+   lmb->internal_flags = 0;
}
}
 }



[PATCH v07 1/5] powerpc/drmem: Export 'dynamic-memory' loader

2018-10-17 Thread Michael Bringmann
powerpc/drmem: Export many of the functions of DRMEM to parse
"ibm,dynamic-memory" and "ibm,dynamic-memory-v2" during hotplug
operations and for Post Migration events.

Also modify the DRMEM initialization code to allow it to:

* Be called after system initialization
* Provide a separate user copy of the LMB array that it produces
* Free the user copy upon request

In addition, a couple of changes were made to make the creation
of additional copies of the LMB array more useful, including:

* Add iterator function to work through a pair of drmem_info arrays
  with a callback function to apply specific tests.
* Modify DRMEM code to replace usages of dt_root_addr_cells, and
  dt_mem_next_cell, as these are only available at first boot.
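For reference, a value spanning several 32-bit device-tree cells is decoded most-significant cell first, which is the pattern `of_read_number()` follows. The userspace sketch below mirrors how the patched `read_drconf_v1_cell()` handles the address: read `n_root_addr_cells` cells, then advance the cursor past them. `read_cells()` and `read_base_addr()` are hypothetical names, and the cells are assumed to be raw big-endian words.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* ntohl(): big-endian to host order */

/* Combine ncells big-endian 32-bit cells into one value, MSB cell first. */
static uint64_t read_cells(const uint32_t *cell, int ncells)
{
	uint64_t r = 0;

	while (ncells--)
		r = (r << 32) | ntohl(*cell++);
	return r;
}

/* Read the base address using a cached #address-cells, then skip past it. */
static uint64_t read_base_addr(const uint32_t **prop, int n_root_addr_cells)
{
	const uint32_t *p = *prop;
	uint64_t base = read_cells(p, n_root_addr_cells);

	p += n_root_addr_cells;
	*prop = p;
	return base;
}
```

Caching the cell count at init time is what lets this run after boot, when the flat-tree helpers are gone.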

Signed-off-by: Michael Bringmann 
---
Changes in v05:
  -- Add walk_drmem_lmbs_pairs to replace macro for_each_pair_lmb
---
 arch/powerpc/include/asm/drmem.h |   13 +
 arch/powerpc/mm/drmem.c  |   96 ++
 2 files changed, 89 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 7c1d8e7..cfe8598 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -35,6 +35,11 @@ struct drmem_lmb_info {
&drmem_info->lmbs[0],   \
&drmem_info->lmbs[drmem_info->n_lmbs - 1])
 
+#define for_each_dinfo_lmb(dinfo, lmb) \
+   for_each_drmem_lmb_in_range((lmb),  \
+   &dinfo->lmbs[0],\
+   &dinfo->lmbs[dinfo->n_lmbs - 1])
+
 /*
  * The of_drconf_cell_v1 struct defines the layout of the LMB data
  * specified in the ibm,dynamic-memory device tree property.
@@ -94,6 +99,14 @@ void __init walk_drmem_lmbs(struct device_node *dn,
void (*func)(struct drmem_lmb *, const __be32 **));
 int drmem_update_dt(void);
 
+struct drmem_lmb_info *drmem_lmbs_init(struct property *prop);
+void drmem_lmbs_free(struct drmem_lmb_info *dinfo);
+int walk_drmem_lmbs_pairs(struct drmem_lmb_info *dinfo_oth,
+ int (*func)(struct drmem_lmb *cnt,
+   struct drmem_lmb *oth,
+   void *data),
+ void *data);
+
 #ifdef CONFIG_PPC_PSERIES
 void __init walk_drmem_lmbs_early(unsigned long node,
void (*func)(struct drmem_lmb *, const __be32 **));
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 3f18036..ded9dbf 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -20,6 +20,7 @@
 
 static struct drmem_lmb_info __drmem_info;
 struct drmem_lmb_info *drmem_info = &__drmem_info;
+static int n_root_addr_cells;
 
 u64 drmem_lmb_memory_max(void)
 {
@@ -193,12 +194,13 @@ int drmem_update_dt(void)
return rc;
 }
 
-static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
+static void read_drconf_v1_cell(struct drmem_lmb *lmb,
   const __be32 **prop)
 {
const __be32 *p = *prop;
 
-   lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
+   lmb->base_addr = of_read_number(p, n_root_addr_cells);
+   p += n_root_addr_cells;
lmb->drc_index = of_read_number(p++, 1);
 
p++; /* skip reserved field */
@@ -209,7 +211,7 @@ static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
*prop = p;
 }
 
-static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
+static void __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
void (*func)(struct drmem_lmb *, const __be32 **))
 {
struct drmem_lmb lmb;
@@ -225,13 +227,14 @@ static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
}
 }
 
-static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
+static void read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
   const __be32 **prop)
 {
const __be32 *p = *prop;
 
dr_cell->seq_lmbs = of_read_number(p++, 1);
-   dr_cell->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
+   dr_cell->base_addr = of_read_number(p, n_root_addr_cells);
+   p += n_root_addr_cells;
dr_cell->drc_index = of_read_number(p++, 1);
dr_cell->aa_index = of_read_number(p++, 1);
dr_cell->flags = of_read_number(p++, 1);
@@ -239,7 +242,7 @@ static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
*prop = p;
 }
 
-static void __init __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
+static void __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
void (*func)(struct drmem_lmb *, const __be32 **))
 {
struct of_drconf_cell_v2 dr_cell;
@@ -275,6 +278,9 @@ void __init walk_drmem_lmbs_early(unsigned long node,
const __be32 *prop, *usm;
int len;
 
+   

[PATCH v07 0/5] powerpc/migration: Affinity fix for memory

2018-10-17 Thread Michael Bringmann
The migration of LPARs across Power systems affects many attributes
including that of the associativity of memory blocks.  The patches
in this set execute when a system is coming up fresh upon a migration
target.  They are intended to:

* Recognize changes to the associativity of memory recorded in
  internal data structures when compared to the latest copies in
  the device tree (e.g. ibm,dynamic-memory, ibm,dynamic-memory-v2).
* Recognize changes to the associativity mapping (e.g.
  ibm,associativity-lookup-arrays), locate all assigned memory blocks
  corresponding to each changed row, and re-add all such blocks.
* Generate calls to other code layers to reset the data structures
  related to associativity of memory.
* Re-register the 'changed' entities into the target system.
  Re-registration of memory blocks mostly entails acting as if they
  have been newly hot-added into the target system.

This code builds upon features introduced in a previous patch set
that updates CPUs for affinity changes that may occur during LPM.

Signed-off-by: Michael Bringmann 

Michael Bringmann (5):
  powerpc/drmem: Export 'dynamic-memory' loader
  powerpc/drmem: Add internal_flags feature
  migration/memory: Add hotplug flags READD_MULTIPLE
  migration/memory: Evaluate LMB assoc changes
  migration/memory: Support 'ibm,dynamic-memory-v2'
---
Changes in v07:
  -- Provide more useful return value from dlpar_memory_readd_multiple
Changes in v06:
  -- Rebase to powerpc next branch to account for recent code changes.
  -- Fix prototype problem when CONFIG_MEMORY_HOTPLUG not defined.
Changes in v05:
  -- Add walk_drmem_lmbs_pairs to replace macro for_each_pair_lmb
  -- Use walk_drmem_lmbs_pairs and callback instead of local loop
  -- Provide dlpar_memory_readd_helper routine to compress some common code
  -- Move common structure from numa.c + hotplug-memory.c to header file.
  -- Clarify some comments.
Changes in v04:
  -- Move dlpar_memory_readd_multiple() to patch with new ACTION
 constant.
  -- Move init of 'lmb->internal_flags' in init_drmem_v2_lmbs to
 patch with other references to flag.
  -- Correct spacing in one of the patches
Changes in v03:
  -- Change operation to tag changed LMBs in DRMEM array instead of
 queuing a potentially huge number of structures.
  -- Added another hotplug queue event for CPU/memory operations
  -- Added internal_flags feature to DRMEM
  -- Improve the patch description language for the patch set.
  -- Revise patch set to queue worker for memory association
 updates directly to pseries worker queue.



[PATCH] powerpc: Add missing include

2018-10-17 Thread Mathieu Malaterre
In commit 88b0fe175735 ("powerpc: Add show_user_instructions()") the
function show_user_instructions was added.

This commit adds an include of header file  to provide
the missing function prototype. Silence the following gcc warning
(treated as error with W=1):

  arch/powerpc/kernel/process.c:1302:6: error: no previous prototype for ‘show_user_instructions’ [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre 
---
 arch/powerpc/kernel/process.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index bb6ac471a784..1c64491e9702 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -65,6 +65,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
-- 
2.11.0



Re: MPC8321 boot failure

2018-10-17 Thread Christophe LEROY

Hi,

Yes, I discovered the same issue today on MPC8321E; I plan to look at it
more closely tomorrow morning (Paris time).


I think we are missing commit 8183d99f4a22c2abbc543847a588df3666ef0c0c.
I didn't realise it when we applied the series to 4.14:
patch_instruction() is called too early without that patch.


If you have the opportunity to test it now, please do; otherwise I'll
test it tomorrow.


Christophe

Le 17/10/2018 à 17:18, David Gounaris a écrit :
Hello, I ran into trouble when I upgraded boards with MPC8321 to Linux
kernel 4.14.76.



The symptom is that the boot process loops and no printouts appear
from the Linux kernel. It seems like the board resets.



When I revert the following commits it works again.

af1a8101794dfea897290e057f61086dabfe6c91, powerpc/lib: fix book3s/32 boot failure due to code patching
609fbeddb24c4035d24fc32d82dc08b30ae3dfc0, powerpc: Avoid code patching freed init sections


Any ideas of how to continue?

BR / David Gounaris





Re: [PATCH v06 3/5] migration/memory: Add hotplug READD_MULTIPLE

2018-10-17 Thread Michael Bringmann
On 10/16/2018 07:48 PM, Michael Ellerman wrote:
> Michael Bringmann  writes:
>> On 10/16/2018 02:57 PM, Tyrel Datwyler wrote:
>>> On 10/15/2018 05:39 PM, Michael Ellerman wrote:
>>>> Michael Bringmann  writes:
>>>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>>> index 2b796da..9c76345 100644
>>>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>>> @@ -541,6 +549,23 @@ static int dlpar_memory_readd_by_index(u32 drc_index)
>>>>>  	return rc;
>>>>>  }
>>>>>
>>>>> +static int dlpar_memory_readd_multiple(void)
>>>>> +{
>>>>> +	struct drmem_lmb *lmb;
>>>>> +	int rc;
>>>>> +
>>>>> +	pr_info("Attempting to update multiple LMBs\n");
>>>>> +
>>>>> +	for_each_drmem_lmb(lmb) {
>>>>> +		if (drmem_lmb_update(lmb)) {
>>>>> +			rc = dlpar_memory_readd_helper(lmb);
>>>>> +			drmem_remove_lmb_update(lmb);
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +	return rc;
>>>>> +}
>>>>
>>>> This leaves rc potentially uninitialised.
>>>>
>>>> What should the result be in that case, -EINVAL ?
>>>
>>> On another note if there are multiple LMBs to update the value of rc only 
>>> reflects the final dlpar_memory_readd_helper() call.
>>
>> Correct.  But that is what happens when we compress common code
>> between two disparate uses i.e. updating memory association after
>> a migration event with no reporting mechanism other than the console
>> log, vs re-adding a single LMB by index for the purposes of DLPAR / drmgr.
>>
>> I could discard the return value from dlpar_memory_readd_helper entirely
>> in this function and just return 0, but in my experience, once errors start
>> to occur in memory dlpar ops, they tend to keep on occurring, so I was
>> returning the last one.  We could also make the code smart enough to
>> capture and return the first/last non-zero return code.  I didn't believe
>> that the frequency of errors for this operation warranted the overhead.
> 
> The actual error value is probably not very relevant.
> 
> But dropping errors entirely is almost always a bad idea.
> 
> So I think you should at least return an error if any error occurred,
> that way at least an error will be returned up to the caller(s).
> 
> Something like:
> 
>   int rc;
> 
>   rc = 0;
>   for_each_drmem_lmb(lmb) {
>   if (drmem_lmb_update(lmb)) {
>   rc |= dlpar_memory_readd_helper(lmb);
>   drmem_remove_lmb_update(lmb);
>   }
>   }
> 
>   if (rc)
>   return -EIO;

Okay.

> 
> cheers
> 

Thanks.

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



Re: [PATCH] powerpc/book3e: redefine pte_mkprivileged() and pte_mkuser()

2018-10-17 Thread Aneesh Kumar K.V

On 10/17/18 6:33 PM, Christophe Leroy wrote:

Book3e defines both _PAGE_USER and _PAGE_PRIVILEGED, so the nohash
default pte_mkprivileged() and pte_mkuser() are not usable.

This patch redefines them for book3e.

In theory, only pte_mkprivileged() needs to be redefined because
_PAGE_USER includes _PAGE_PRIVILEGED, but it is less confusing
to redefine both.

Fixes: a0da4bc166f2 ("powerpc/mm: Allow platforms to redefine some helpers")


 Reviewed-by: Aneesh Kumar K.V 


Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/nohash/pte-book3e.h | 16 
  1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
index 58eef8cb569d..fa1451e15b4e 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -109,5 +109,21 @@
  #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER)
  #define PAGE_READONLY_X   __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
  
+#ifndef __ASSEMBLY__

+static inline pte_t pte_mkprivileged(pte_t pte)
+{
+   return __pte((pte_val(pte) & ~_PAGE_USER) | _PAGE_PRIVILEGED);
+}
+
+#define pte_mkprivileged pte_mkprivileged
+
+static inline pte_t pte_mkuser(pte_t pte)
+{
+   return __pte((pte_val(pte) & ~_PAGE_PRIVILEGED) | _PAGE_USER);
+}
+
+#define pte_mkuser pte_mkuser
+#endif /* __ASSEMBLY__ */
+
  #endif /* __KERNEL__ */
  #endif /*  _ASM_POWERPC_NOHASH_PTE_BOOK3E_H */





[PATCH] powerpc/book3e: redefine pte_mkprivileged() and pte_mkuser()

2018-10-17 Thread Christophe Leroy
Book3e defines both _PAGE_USER and _PAGE_PRIVILEGED, so the nohash
default pte_mkprivileged() and pte_mkuser() are not usable.

This patch redefines them for book3e.

In theory, only pte_mkprivileged() needs to be redefined because
_PAGE_USER includes _PAGE_PRIVILEGED, but it is less confusing
to redefine both.
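The bit arithmetic can be checked in isolation. In this sketch the bit values are hypothetical; the only property taken from the text is that the user bits include the privileged bit. `mkprivileged_generic()` assumes the nohash default simply clears `_PAGE_USER`, which would also drop the supervisor bit that book3e needs to keep.

```c
#include <stdint.h>

/* Hypothetical bit positions; only the relationship between them matters. */
#define BAP_SR		0x04u			/* supervisor read */
#define BAP_UR		0x08u			/* user read */
#define PAGE_PRIV	(BAP_SR)		/* privileged access */
#define PAGE_USER	(BAP_UR | BAP_SR)	/* user mapping also grants supervisor */

/* Assumed generic default: just drop the user bits. */
static uint32_t mkprivileged_generic(uint32_t pte)
{
	return pte & ~PAGE_USER;
}

/* book3e variants from the patch, applied to a plain integer PTE. */
static uint32_t mkprivileged_book3e(uint32_t pte)
{
	return (pte & ~PAGE_USER) | PAGE_PRIV;
}

static uint32_t mkuser_book3e(uint32_t pte)
{
	return (pte & ~PAGE_PRIV) | PAGE_USER;
}
```

Because `PAGE_USER` contains `PAGE_PRIV`, `mkuser_book3e(pte)` always equals `pte | PAGE_USER`, which is why only `pte_mkprivileged()` strictly needs redefining.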

Fixes: a0da4bc166f2 ("powerpc/mm: Allow platforms to redefine some helpers")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/pte-book3e.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
index 58eef8cb569d..fa1451e15b4e 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -109,5 +109,21 @@
 #define PAGE_READONLY  __pgprot(_PAGE_BASE | _PAGE_USER)
 #define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
 
+#ifndef __ASSEMBLY__
+static inline pte_t pte_mkprivileged(pte_t pte)
+{
+   return __pte((pte_val(pte) & ~_PAGE_USER) | _PAGE_PRIVILEGED);
+}
+
+#define pte_mkprivileged pte_mkprivileged
+
+static inline pte_t pte_mkuser(pte_t pte)
+{
+   return __pte((pte_val(pte) & ~_PAGE_PRIVILEGED) | _PAGE_USER);
+}
+
+#define pte_mkuser pte_mkuser
+#endif /* __ASSEMBLY__ */
+
 #endif /* __KERNEL__ */
 #endif /*  _ASM_POWERPC_NOHASH_PTE_BOOK3E_H */
-- 
2.13.3



Re: [PATCH 1/5] powerpc/64s: Kernel Hypervisor Restricted Access Prevention

2018-10-17 Thread Nicholas Piggin
On Wed, 17 Oct 2018 17:44:19 +1100
Russell Currey  wrote:

> Kernel Hypervisor Restricted Access Prevention (KHRAP) utilises a feature
> of the Radix MMU which disallows read and write access to userspace
> addresses.  By utilising this, the kernel is prevented from accessing
> user data from outside of trusted paths that perform proper safety checks,
> such as copy_{to/from}_user() and friends.
> 
> Userspace access is disabled from early boot and is only enabled when:
> 
>   - exiting the kernel and entering userspace
>   - performing an operation like copy_{to/from}_user()
>   - context switching to a process that has access enabled
> 
> and similarly, access is disabled again when exiting userspace and entering
> the kernel.
> 
> This feature has a slight performance impact which I roughly measured to be
> 4% slower (performing 1GB of 1 byte read()/write() syscalls), and is gated
> behind the CONFIG_PPC_RADIX_KHRAP option for performance-critical builds.
> 
> This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and
> performing the following:
> 
>   echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT
> 
> if enabled, this should send SIGSEGV to the thread.
> 
> Signed-off-by: Russell Currey 
> ---
> More detailed benchmarks soon, there's more optimisations here as well.
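A purely software model of the access window described in the quoted commit message, with a global flag standing in for the AMR. It touches no real hardware; `user_access_locked`, `direct_user_read()` and `copy_from_user_model()` are hypothetical names, and the `-1` return stands in for the fault that lkdtm's `ACCESS_USERSPACE` test provokes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Models the lock: true means kernel accesses to user memory would fault. */
static bool user_access_locked = true;	/* locked from "early boot" */

static void lock_user_access(void)   { user_access_locked = true; }
static void unlock_user_access(void) { user_access_locked = false; }

/* An untrusted direct dereference; returns -1 where the MMU would fault. */
static int direct_user_read(const unsigned char *uaddr, unsigned char *out)
{
	if (user_access_locked)
		return -1;
	*out = *uaddr;
	return 0;
}

/* Trusted path: open the window only around the copy, then close it. */
static size_t copy_from_user_model(void *dst, const void *usrc, size_t n)
{
	unlock_user_access();
	memcpy(dst, usrc, n);
	lock_user_access();
	return 0;	/* bytes left uncopied, as copy_from_user() reports */
}
```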

Nice, this turned out to be a lot neater than I feared! Good stuff.

> @@ -240,6 +240,22 @@ BEGIN_FTR_SECTION_NESTED(941)			\
>  	mtspr	SPRN_PPR,ra;					\
>  END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,941)
>  
> +#define LOCK_AMR(reg)						\
> +BEGIN_MMU_FTR_SECTION_NESTED(69)				\
> +	LOAD_REG_IMMEDIATE(reg,AMR_LOCKED);			\
> +	isync;							\
> +	mtspr	SPRN_AMR,reg;					\
> +	isync;							\
> +END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KHRAP,MMU_FTR_RADIX_KHRAP,69)
> +
> +#define UNLOCK_AMR(reg)						\
> +BEGIN_MMU_FTR_SECTION_NESTED(420)				\
> +	li	reg,0;						\
> +	isync;							\
> +	mtspr	SPRN_AMR,reg;					\
> +	isync;							\
> +END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KHRAP,MMU_FTR_RADIX_KHRAP,420)

I wonder if you can skip the first isync on the way in and the second
isync on the way out because the interrupt and return should be context
synchronizing. Might not make a difference though.

What do you think about making the name match the C code a bit more.
Like AMR_LOCK_USER_ACCESS()?

Thanks,
Nick


Re: Crash on FSL Book3E due to pte_pgprot()? (was Re: [PATCH v3 12/24] powerpc/mm: use pte helpers in generic code)

2018-10-17 Thread Aneesh Kumar K.V

On 10/17/18 4:42 PM, Christophe Leroy wrote:



On 10/17/2018 10:32 AM, Michael Ellerman wrote:

Christophe Leroy  writes:

On 10/17/2018 12:59 AM, Michael Ellerman wrote:

...

The question is what's the right way to fix it? Should pte_pgprot() not
be filtering those bits out on book3e?


I think we should not use pte_pgprot() for that, then. What about the
fix below?


Thanks, that almost works.

pte_mkprivileged() also needs to not strip _PAGE_BAP_SR.


Oops, I missed it, although I knew about it. Patch below.

From: Christophe Leroy 
Date: Wed, 17 Oct 2018 10:46:24 +
Subject: [PATCH] powerpc/book3e: redefine pte_mkprivileged() and pte_mkuser()
To: Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman
Cc: linux-ker...@vger.kernel.org, linuxppc-dev@lists.ozlabs.org

Book3e defines both _PAGE_USER and _PAGE_PRIVILEGED, so the nohash
default pte_mkprivileged() and pte_mkuser() are not usable.

This patch redefines them for book3e.

Fixes: a0da4bc166f2 ("powerpc/mm: Allow platforms to redefine some helpers")

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/nohash/pte-book3e.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
index 58eef8cb569d..fb4297dff3e2 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -109,5 +109,19 @@
  #define PAGE_READONLY    __pgprot(_PAGE_BASE | _PAGE_USER)
  #define PAGE_READONLY_X    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)

+static inline pte_t pte_mkprivileged(pte_t pte)
+{
+    return __pte((pte_val(pte) & ~_PAGE_USER) | _PAGE_PRIVILEGED);
+}
+
+#define pte_mkprivileged pte_mkprivileged
+
+static inline pte_t pte_mkuser(pte_t pte)
+{
+    return __pte((pte_val(pte) & ~_PAGE_PRIVILEGED) | _PAGE_USER);
+}
+
+#define pte_mkuser pte_mkuser
+


I was build testing a similar patch. We would need to put #ifndef 
__ASSEMBLY__ around it.
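A minimal user-space sketch of what that looks like: the two book3e helpers from the patch, wrapped in `#ifndef __ASSEMBLY__` so that a `.S` file including the header never sees C code. The `pte_t`, `__pte()` and `pte_val()` stand-ins and the `_PAGE_BAP_SR`/`_PAGE_BAP_UR` values are assumptions for illustration, not copied from the kernel headers; the point is that `pte_mkprivileged()` drops user read while keeping `_PAGE_BAP_SR`.

```c
#ifndef __ASSEMBLY__
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's pte_t helpers (assumption). */
typedef struct { uint64_t pte; } pte_t;
#define __pte(x)	((pte_t){ (x) })
#define pte_val(p)	((p).pte)

/* Assumed book3e-style access bits, for illustration only. */
#define _PAGE_BAP_SR		0x000004ULL	/* supervisor read */
#define _PAGE_BAP_UR		0x000008ULL	/* user read */
#define _PAGE_USER		(_PAGE_BAP_UR | _PAGE_BAP_SR)
#define _PAGE_PRIVILEGED	(_PAGE_BAP_SR)

static inline pte_t pte_mkprivileged(pte_t pte)
{
	/* Clear both read bits, then give supervisor read back. */
	return __pte((pte_val(pte) & ~_PAGE_USER) | _PAGE_PRIVILEGED);
}
#define pte_mkprivileged pte_mkprivileged

static inline pte_t pte_mkuser(pte_t pte)
{
	/* Clear supervisor-only marking, then set user and supervisor read. */
	return __pte((pte_val(pte) & ~_PAGE_PRIVILEGED) | _PAGE_USER);
}
#define pte_mkuser pte_mkuser
#endif /* !__ASSEMBLY__ */
```

With these definitions, applying `pte_mkprivileged()` to a `_PAGE_USER` entry leaves only `_PAGE_BAP_SR` set, so the supervisor can still read the page.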




-aneesh



Re: [PATCH v5 21/22] powerpc/nohash32: allow setting GUARDED attribute in the PMD directly

2018-10-17 Thread Christophe LEROY




Le 25/09/2018 à 18:51, Christophe Leroy a écrit :

On the 8xx, the GUARDED attribute of the pages is managed in the
L1 entry. Therefore, to avoid having to copy it into the L1 entry
at each TLB miss, we have to set it in the PMD.

In order to allow this, this patch splits the VM alloc space in two
parts, one for VM alloc and non Guarded IO, and one for Guarded IO.
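A rough sketch of the idea, assuming the area is split at a PMD-aligned boundary so that every address covered by one PMD falls entirely in the guarded or the non-guarded part. All addresses and `_PMD_*_SKETCH` bit values are hypothetical; `make_pmd()` only mirrors the `is_g` choice between `pmd_populate_kernel()` and `pmd_populate_kernel_g()` in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical split of the vmalloc/ioremap area (addresses invented). */
#define VMALLOC_START_SKETCH	0xc9000000UL
#define IOREMAP_G_BASE_SKETCH	0xf0000000UL	/* start of the guarded part */
#define IOREMAP_TOP_SKETCH	0xfe000000UL

/* Hypothetical PMD (L1 entry) bits. */
#define _PMD_PRESENT_SKETCH	0x001UL
#define _PMD_GUARDED_SKETCH	0x010UL

/* Guarded IO is only ever mapped in the upper part of the area. */
static bool va_is_guarded(unsigned long va)
{
	return va >= IOREMAP_G_BASE_SKETCH && va < IOREMAP_TOP_SKETCH;
}

/*
 * Build the L1 entry pointing at a page table at physical address pt_pa.
 * GUARDED is decided once here, from the VA region, instead of being
 * copied into the L1 entry on each TLB miss.
 */
static unsigned long make_pmd(unsigned long pt_pa, unsigned long va)
{
	unsigned long pmd = pt_pa | _PMD_PRESENT_SKETCH;

	if (va_is_guarded(va))
		pmd |= _PMD_GUARDED_SKETCH;
	return pmd;
}
```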

Signed-off-by: Christophe Leroy 


I'm not too happy with this part. I think I'll drop it for now and
rework it in a future series into something more generic using mm slices.


Christophe


---
  arch/powerpc/include/asm/book3s/32/pgalloc.h |  2 +-
  arch/powerpc/include/asm/book3s/32/pgtable.h |  2 ++
  arch/powerpc/include/asm/nohash/32/pgalloc.h | 19 --
  arch/powerpc/include/asm/nohash/32/pgtable.h | 19 --
  arch/powerpc/mm/dump_linuxpagetables.c   | 21 +--
  arch/powerpc/mm/mem.c|  7 
  arch/powerpc/mm/pgtable_32.c | 52 +---
  arch/powerpc/platforms/Kconfig.cputype   |  2 ++
  8 files changed, 112 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 711a8b84e3ee..9097cfd4ce43 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -139,7 +139,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
pgtable_free_tlb(tlb, page_address(table), 0);
  }
  
-static inline pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
+static inline pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va, bool is_g)
  {
if (!pmd_present(*pmdp)) {
pte_t *ptep = __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE));
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 7a8a590f6b4c..28001d5eaa89 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -156,6 +156,8 @@ static inline bool pte_user(pte_t pte)
  #define IOREMAP_TOP   KVIRT_TOP
  #endif
  
+#define IOREMAP_BASE	VMALLOC_START
+
  /*
   * Just any arbitrary offset to the start of the vmalloc VM area: the
   * current 16MB value just means that there will be a 64MB "hole" after the
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 77c09bef3122..bfb26c385dac 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -60,6 +60,14 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
*pmdp = __pmd(__pa(pte) | _PMD_PRESENT);
  }
  
+#ifdef CONFIG_PPC_PMD_GUARDED
+static inline void pmd_populate_kernel_g(struct mm_struct *mm, pmd_t *pmdp,
+pte_t *pte)
+{
+   *pmdp = __pmd(__pa(pte) | _PMD_PRESENT | _PMD_GUARDED);
+}
+#endif
+
  static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pte_page)
  {
@@ -84,6 +92,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
  #define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
  #endif
  
+#ifndef CONFIG_PPC_PMD_GUARDED
+#define pmd_populate_kernel_g  pmd_populate_kernel
+#endif
+
  static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
  unsigned long address)
  {
@@ -151,7 +163,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
pgtable_free_tlb(tlb, table, 0);
  }
  
-static inline pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
+static inline pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va, bool is_g)
  {
if (!pmd_present(*pmdp)) {
pte_t *ptep = __va(memblock_alloc(PTE_FRAG_SIZE, PTE_FRAG_SIZE));
@@ -164,7 +176,10 @@ static inline pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
else
memset(ptep, 0, PTE_FRAG_SIZE);
  
-		pmd_populate_kernel(_mm, pmdp, ptep);
+   if (is_g)
+   pmd_populate_kernel_g(_mm, pmdp, ptep);
+   else
+   pmd_populate_kernel(_mm, pmdp, ptep);
}
return pte_offset_kernel(pmdp, va);
  }
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index dc82c10383d5..fccc5620a988 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -84,9 +84,14 @@ extern int icache_44x_need_flush;
   * virtual space that goes below PKMAP and FIXMAP
   */
  #ifdef CONFIG_HIGHMEM
-#define KVIRT_TOP  PKMAP_BASE
+#define _KVIRT_TOP PKMAP_BASE
  #else
-#define KVIRT_TOP  (0xfe00UL)  /* for now, could be FIXMAP_BASE ? */
+#define _KVIRT_TOP (0xfe00UL)  /* for now, could be 

Re: [PATCH 1/5] powerpc/64s: Kernel Hypervisor Restricted Access Prevention

2018-10-17 Thread Michael Ellerman
Russell Currey  writes:
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 7b1693adff2a..090f72cbb02d 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -286,6 +286,9 @@ BEGIN_FTR_SECTION
>   HMT_MEDIUM_LOW
>  END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>  
> + /* headed back to userspace, so unlock the AMR */
> + UNLOCK_AMR(r2)
> +

This one needs an ifdef, or preferably an empty version in a header for
non-book3s 64; otherwise we get:

  arch/powerpc/kernel/entry_64.S: Assembler messages:
  arch/powerpc/kernel/entry_64.S:290: Error: unrecognized opcode: `unlock_amr(%r2)'
  scripts/Makefile.build:405: recipe for target 'arch/powerpc/kernel/entry_64.o' failed

That's a corenet64-ish defconfig.
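The "empty version in a header" approach is the usual pattern: define the real helper only where the feature exists and an empty one everywhere else, so callers need no #ifdef. A C sketch of the same pattern follows; `CONFIG_PPC_KHRAP_SKETCH`, the simulated `amr` variable and the `AMR_LOCKED_SKETCH` value are all hypothetical stand-ins for the real config symbol and SPRN_AMR.

```c
#include <assert.h>

/* Hypothetical stand-ins: a simulated AMR and its "locked" pattern. */
#define AMR_LOCKED_SKETCH	0xc000000000000000ULL
static unsigned long long amr = AMR_LOCKED_SKETCH;

#ifdef CONFIG_PPC_KHRAP_SKETCH
/* Real versions: only built when the feature is configured in. */
static inline void unlock_user_access(void) { amr = 0; }
static inline void lock_user_access(void) { amr = AMR_LOCKED_SKETCH; }
#else
/* Empty versions: callers compile unchanged on every other platform. */
static inline void unlock_user_access(void) { }
static inline void lock_user_access(void) { }
#endif
```

In an assembly header the empty case is simply `#define UNLOCK_AMR(reg)` with no body, which lets entry_64.S keep its unconditional `UNLOCK_AMR(r2)`.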

cheers


Re: Crash on FSL Book3E due to pte_pgprot()? (was Re: [PATCH v3 12/24] powerpc/mm: use pte helpers in generic code)

2018-10-17 Thread Christophe Leroy




On 10/17/2018 10:32 AM, Michael Ellerman wrote:

Christophe Leroy  writes:

On 10/17/2018 12:59 AM, Michael Ellerman wrote:

...

The question is what's the right way to fix it? Should pte_pgprot() not
be filtering those bits out on book3e?


I think we should not use pte_pgprot() for that, then. What about the
fix below?


Thanks, that almost works.

pte_mkprivileged() also needs to not strip _PAGE_BAP_SR.


Oops, I missed it although I knew about it. Patch below.

From: Christophe Leroy 
Date: Wed, 17 Oct 2018 10:46:24 +
Subject: [PATCH] powerpc/book3e: redefine pte_mkprivileged() and pte_mkuser()
To: Benjamin Herrenschmidt , Paul Mackerras 
, Michael Ellerman 

Cc: linux-ker...@vger.kernel.org, linuxppc-dev@lists.ozlabs.org

Book3e defines both _PAGE_USER and _PAGE_PRIVILEGED, so the nohash
default pte_mkprivileged() and pte_mkuser() are not usable.

This patch redefines them for book3e.

Fixes: a0da4bc166f2 ("powerpc/mm: Allow platforms to redefine some helpers")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/pte-book3e.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
index 58eef8cb569d..fb4297dff3e2 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -109,5 +109,19 @@
 #define PAGE_READONLY  __pgprot(_PAGE_BASE | _PAGE_USER)
 #define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)

+static inline pte_t pte_mkprivileged(pte_t pte)
+{
+   return __pte((pte_val(pte) & ~_PAGE_USER) | _PAGE_PRIVILEGED);
+}
+
+#define pte_mkprivileged pte_mkprivileged
+
+static inline pte_t pte_mkuser(pte_t pte)
+{
+   return __pte((pte_val(pte) & ~_PAGE_PRIVILEGED) | _PAGE_USER);
+}
+
+#define pte_mkuser pte_mkuser
+
 #endif /* __KERNEL__ */
 #endif /*  _ASM_POWERPC_NOHASH_PTE_BOOK3E_H */
--
2.13.3






But there's also a use of pte_pgprot() in mm/memory.c, and I think that
is also broken now that we don't add PAGE_KERNEL back in.

Aneesh is going to do a patch to make pte_pgprot() only mask the PFN
which is what other arches do.


Yes I saw it, that's ok for me.

Christophe



cheers


From: Christophe Leroy 
Date: Wed, 17 Oct 2018 05:56:25 +
Subject: [PATCH] powerpc/mm: don't use pte_pgprot() in ioremap_prot()

pte_pgprot() filters out some required flags like _PAGE_PRESENT.

This patch replaces pte_pgprot() by __pgprot(pte_val())
in ioremap_prot()

Fixes: 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code")
Signed-off-by: Christophe Leroy 
---
   arch/powerpc/mm/pgtable_32.c | 3 ++-
   arch/powerpc/mm/pgtable_64.c | 4 ++--
   2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 5877f5aa8f5d..a606e2f4937b 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -122,7 +122,8 @@ ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
pte = pte_exprotect(pte);
pte = pte_mkprivileged(pte);

-   return __ioremap_caller(addr, size, pte_pgprot(pte), __builtin_return_address(0));
+   return __ioremap_caller(addr, size, __pgprot(pte_val(pte)),
+   __builtin_return_address(0));
   }
   EXPORT_SYMBOL(ioremap_prot);

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index fb1375c07e8c..836bf436cabb 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -245,8 +245,8 @@ void __iomem * ioremap_prot(phys_addr_t addr, unsigned long size,
pte = pte_mkprivileged(pte);

if (ppc_md.ioremap)
-   return ppc_md.ioremap(addr, size, pte_pgprot(pte), caller);
-   return __ioremap_caller(addr, size, pte_pgprot(pte), caller);
+   return ppc_md.ioremap(addr, size, __pgprot(pte_val(pte)), caller);
+   return __ioremap_caller(addr, size, __pgprot(pte_val(pte)), caller);
   }


--
2.13.3
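To make the failure mode concrete, here is a user-space sketch comparing a filtering `pte_pgprot()` with the `__pgprot(pte_val(pte))` the patch passes down instead. The type stand-ins are simplified, and `SKETCH_PROT_BITS` is a hypothetical mask chosen only so that it drops the same 0x201 seen in the trace (0x241215 becoming 0x241014); it is not the kernel's real mask.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types and accessors (assumption). */
typedef struct { uint64_t pte; } pte_t;
typedef struct { uint64_t pgprot; } pgprot_t;
#define pte_val(p)	((p).pte)
#define pgprot_val(p)	((p).pgprot)
#define __pgprot(x)	((pgprot_t){ (x) })

#define _PAGE_PRESENT	0x000001ULL	/* values quoted in the thread */
#define _PAGE_PSIZE_4K	0x000200ULL

/* Hypothetical mask: keeps everything except PRESENT and PSIZE_4K. */
#define SKETCH_PROT_BITS	(~(_PAGE_PRESENT | _PAGE_PSIZE_4K))

/* Stand-in for the filtering pte_pgprot() used before this fix. */
static pgprot_t filtering_pte_pgprot(pte_t pte)
{
	return __pgprot(pte_val(pte) & SKETCH_PROT_BITS);
}

/* What the fix passes down instead: every bit of the PTE value. */
static pgprot_t raw_pgprot(pte_t pte)
{
	return __pgprot(pte_val(pte));
}
```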


Re: Crash on FSL Book3E due to pte_pgprot()? (was Re: [PATCH v3 12/24] powerpc/mm: use pte helpers in generic code)

2018-10-17 Thread Michael Ellerman
Christophe Leroy  writes:
> On 10/17/2018 12:59 AM, Michael Ellerman wrote:
...
>> The question is what's the right way to fix it? Should pte_pgprot() not
>> be filtering those bits out on book3e?
>
> I think we should not use pte_pgprot() for that, then. What about the
> fix below?

Thanks, that almost works.

pte_mkprivileged() also needs to not strip _PAGE_BAP_SR.


But there's also a use of pte_pgprot() in mm/memory.c, and I think that
is also broken now that we don't add PAGE_KERNEL back in.

Aneesh is going to do a patch to make pte_pgprot() only mask the PFN
which is what other arches do.
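Masking only the PFN could look roughly like the sketch below. A PFN shift of 24 is an assumption, picked because it is consistent with the pte value 0xff800241215 in the trace (PFN 0xff800 sitting above the flags 0x241215); `make_pte()` and `pte_pgprot_pfn_only()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the PFN field in a 64-bit book3e PTE. */
#define PTE_RPN_SHIFT_SKETCH	24
#define PTE_FLAGS_MASK_SKETCH	((UINT64_C(1) << PTE_RPN_SHIFT_SKETCH) - 1)

/* Hypothetical helper: place a PFN above the flag bits. */
static uint64_t make_pte(uint64_t pfn, uint64_t flags)
{
	return (pfn << PTE_RPN_SHIFT_SKETCH) | (flags & PTE_FLAGS_MASK_SKETCH);
}

/* pte_pgprot() variant that strips only the PFN and keeps every flag. */
static uint64_t pte_pgprot_pfn_only(uint64_t pte)
{
	return pte & PTE_FLAGS_MASK_SKETCH;
}
```

Because nothing below the PFN field is filtered, `_PAGE_PRESENT` and `_PAGE_PSIZE_4K` survive the round trip.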

cheers

> From: Christophe Leroy 
> Date: Wed, 17 Oct 2018 05:56:25 +
> Subject: [PATCH] powerpc/mm: don't use pte_pgprot() in ioremap_prot()
>
> pte_pgprot() filters out some required flags like _PAGE_PRESENT.
>
> This patch replaces pte_pgprot() by __pgprot(pte_val())
> in ioremap_prot()
>
> Fixes: 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code")
> Signed-off-by: Christophe Leroy 
> ---
>   arch/powerpc/mm/pgtable_32.c | 3 ++-
>   arch/powerpc/mm/pgtable_64.c | 4 ++--
>   2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 5877f5aa8f5d..a606e2f4937b 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -122,7 +122,8 @@ ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
>   pte = pte_exprotect(pte);
>   pte = pte_mkprivileged(pte);
>
> - return __ioremap_caller(addr, size, pte_pgprot(pte), __builtin_return_address(0));
> + return __ioremap_caller(addr, size, __pgprot(pte_val(pte)),
> + __builtin_return_address(0));
>   }
>   EXPORT_SYMBOL(ioremap_prot);
>
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index fb1375c07e8c..836bf436cabb 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -245,8 +245,8 @@ void __iomem * ioremap_prot(phys_addr_t addr, unsigned long size,
>   pte = pte_mkprivileged(pte);
>
>   if (ppc_md.ioremap)
> - return ppc_md.ioremap(addr, size, pte_pgprot(pte), caller);
> - return __ioremap_caller(addr, size, pte_pgprot(pte), caller);
> + return ppc_md.ioremap(addr, size, __pgprot(pte_val(pte)), caller);
> + return __ioremap_caller(addr, size, __pgprot(pte_val(pte)), caller);
>   }
>
>
> -- 
> 2.13.3


Re: [PATCH 1/5] powerpc/64s: Kernel Hypervisor Restricted Access Prevention

2018-10-17 Thread kbuild test robot
Hi Russell,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on next-20181016]
[cannot apply to v4.19-rc8]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Russell-Currey/powerpc-64s-Kernel-Hypervisor-Restricted-Access-Prevention/20181017-153543
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-storcenter_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   In file included from include/linux/uaccess.h:14:0,
from net/core/datagram.c:40:
   arch/powerpc/include/asm/uaccess.h: In function 'unlock_user_access':
>> arch/powerpc/include/asm/uaccess.h:69:6: error: implicit declaration of 
>> function 'mmu_has_feature'; did you mean 'firmware_has_feature'? 
>> [-Werror=implicit-function-declaration]
 if (mmu_has_feature(MMU_FTR_RADIX_KHRAP)) {
 ^~~
 firmware_has_feature
>> arch/powerpc/include/asm/uaccess.h:69:22: error: 'MMU_FTR_RADIX_KHRAP' 
>> undeclared (first use in this function); did you mean 'CPU_FTR_CAN_NAP'?
 if (mmu_has_feature(MMU_FTR_RADIX_KHRAP)) {
 ^~~
 CPU_FTR_CAN_NAP
   arch/powerpc/include/asm/uaccess.h:69:22: note: each undeclared identifier 
is reported only once for each function it appears in
   arch/powerpc/include/asm/uaccess.h: In function 'lock_user_access':
   arch/powerpc/include/asm/uaccess.h:83:22: error: 'MMU_FTR_RADIX_KHRAP' 
undeclared (first use in this function); did you mean 'CPU_FTR_CAN_NAP'?
 if (mmu_has_feature(MMU_FTR_RADIX_KHRAP)) {
 ^~~
 CPU_FTR_CAN_NAP
   In file included from include/linux/mm_types.h:18:0,
from include/linux/mm.h:17,
from net/core/datagram.c:41:
   arch/powerpc/include/asm/mmu.h: At top level:
>> arch/powerpc/include/asm/mmu.h:209:20: error: conflicting types for 
>> 'mmu_has_feature'
static inline bool mmu_has_feature(unsigned long feature)
   ^~~
   In file included from include/linux/uaccess.h:14:0,
from net/core/datagram.c:40:
   arch/powerpc/include/asm/uaccess.h:69:6: note: previous implicit declaration 
of 'mmu_has_feature' was here
 if (mmu_has_feature(MMU_FTR_RADIX_KHRAP)) {
 ^~~
   cc1: some warnings being treated as errors
--
   In file included from include/linux/uaccess.h:14:0,
from include/linux/crypto.h:26,
from include/crypto/skcipher.h:16,
from include/crypto/chacha20.h:9,
from lib/chacha20.c:17:
   arch/powerpc/include/asm/uaccess.h: In function 'unlock_user_access':
>> arch/powerpc/include/asm/uaccess.h:69:6: error: implicit declaration of 
>> function 'mmu_has_feature'; did you mean 'firmware_has_feature'? 
>> [-Werror=implicit-function-declaration]
 if (mmu_has_feature(MMU_FTR_RADIX_KHRAP)) {
 ^~~
 firmware_has_feature
>> arch/powerpc/include/asm/uaccess.h:69:22: error: 'MMU_FTR_RADIX_KHRAP' 
>> undeclared (first use in this function); did you mean 'CPU_FTR_CAN_NAP'?
 if (mmu_has_feature(MMU_FTR_RADIX_KHRAP)) {
 ^~~
 CPU_FTR_CAN_NAP
   arch/powerpc/include/asm/uaccess.h:69:22: note: each undeclared identifier 
is reported only once for each function it appears in
   arch/powerpc/include/asm/uaccess.h: In function 'lock_user_access':
   arch/powerpc/include/asm/uaccess.h:83:22: error: 'MMU_FTR_RADIX_KHRAP' 
undeclared (first use in this function); did you mean 'CPU_FTR_CAN_NAP'?
 if (mmu_has_feature(MMU_FTR_RADIX_KHRAP)) {
 ^~~
 CPU_FTR_CAN_NAP
   cc1: some warnings being treated as errors
--
   In file included from include/linux/uaccess.h:14:0,
from arch/powerpc/kernel/module.c:25:
   arch/powerpc/include/asm/uaccess.h: In function 'unlock_user_access':
>> arch/powerpc/include/asm/uaccess.h:69:6: error: implicit declaration of 
>> function 'mmu_has_feature'; did you mean 'firmware_has_feature'? 
>> [-Werror=implicit-function-declaration]
 if (mmu_has_feature(MMU_FTR_RADIX_KHRAP)) {
 ^~~
 firmware_has_feature
>> arch/powerpc/include/asm/uaccess.h:69:22: error: 'MMU_FTR_RADIX_KHRAP'

Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)

2018-10-17 Thread Vlastimil Babka
On 10/16/18 9:43 PM, Joel Fernandes wrote:
> On Tue, Oct 16, 2018 at 01:29:52PM +0200, Vlastimil Babka wrote:
>> On 10/16/18 12:33 AM, Joel Fernandes wrote:
>>> On Mon, Oct 15, 2018 at 02:42:09AM -0700, Christoph Hellwig wrote:
 On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote:
> Android needs to mremap large regions of memory during memory management
> related operations.

 Just curious: why?
>>>
>>> In Android we have a requirement of moving a large (up to a GB now, but may
>>> grow bigger in future) memory range from one location to another.
>>
>> I think Christoph's "why?" was about the requirement, not why it hurts
>> applications. I admit I'm now also curious :)
> 
> This issue was discovered when we wanted to be able to move the physical
> pages of a memory range to another location quickly so that, after the
> application threads are resumed, UFFDIO_REGISTER_MODE_MISSING userfaultfd
> faults can be received on the original memory range. The actual operations
> performed on the memory range are beyond the scope of this discussion. The
> user threads continue to refer to the old address which will now fault. The
> reason we want to retain the old memory range and receive faults there is to
> avoid the need to fix the addresses all over the address space of the threads
> after we finish with performing operations on them in the fault handlers, so
> we mremap it and receive faults at the old addresses.
> 
> Does that answer your question?

Yes, interesting, thanks!

Vlastimil

> thanks,
> 
> - Joel
> 
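The first half of the scheme Joel describes can be sketched in user space: move a populated anonymous range with `mremap()`, then put a fresh, empty mapping back at the old address. In his description the old range would then be registered with userfaultfd (`UFFDIO_REGISTER_MODE_MISSING`) so that accesses fault into a handler; that step is deliberately left out here, and `move_and_refill()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Move a populated range to a new address, then map empty anonymous
 * memory at the old address. Returns 0 on success, -1 on failure.
 * len must be a multiple of the page size.
 */
static int move_and_refill(size_t len)
{
	char *old = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	/* Reserve a destination so MREMAP_FIXED has a known target. */
	char *dst = mmap(NULL, len, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (old == MAP_FAILED || dst == MAP_FAILED)
		return -1;
	memset(old, 0xab, len);

	/* Move the existing pages to dst instead of copying their contents. */
	char *moved = mremap(old, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	if (moved != dst)
		return -1;

	/* The old range is now unmapped; put an empty mapping back there. */
	char *again = mmap(old, len, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (again != old)
		return -1;

	int ok = moved[0] == (char)0xab && moved[len - 1] == (char)0xab &&
		 again[0] == 0;
	munmap(moved, len);
	munmap(again, len);
	return ok ? 0 : -1;
}
```

Calling `move_and_refill(1 << 20)` moves 1 MiB; the larger the region, the more the cost of `mremap()` is dominated by moving page tables, which is what the patch series speeds up.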



Re: Crash on FSL Book3E due to pte_pgprot()? (was Re: [PATCH v3 12/24] powerpc/mm: use pte helpers in generic code)

2018-10-17 Thread Christophe LEROY




Le 17/10/2018 à 11:39, Aneesh Kumar K.V a écrit :

Christophe Leroy  writes:


On 10/17/2018 12:59 AM, Michael Ellerman wrote:

Christophe Leroy  writes:


Get rid of platform specific _PAGE_ in powerpc common code and
use helpers instead.

mm/dump_linuxpagetables.c will be handled separately

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
   arch/powerpc/include/asm/book3s/32/pgtable.h |  9 +++--
   arch/powerpc/include/asm/nohash/32/pgtable.h | 12 
   arch/powerpc/include/asm/nohash/pgtable.h|  3 +--
   arch/powerpc/mm/pgtable.c| 21 +++--
   arch/powerpc/mm/pgtable_32.c | 15 ---
   arch/powerpc/mm/pgtable_64.c | 14 +++---
   arch/powerpc/xmon/xmon.c | 12 +++-
   7 files changed, 41 insertions(+), 45 deletions(-)


So turns out this patch *also* breaks my p5020ds :)

Even with patch 4 merged, see next.

It's the same crash:

pcieport 2000:00:00.0: AER enabled with IRQ 480
Unable to handle kernel paging request for data at address 
0x88008008
Faulting instruction address: 0xc00192cc
Oops: Kernel access of bad area, sig: 11 [#1]
BE SMP NR_CPUS=24 CoreNet Generic
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc7x-g98c847323b3a #1
NIP:  c00192cc LR: c05d0f9c CTR: 0010
REGS: c000f31bb400 TRAP: 0300   Not tainted  
(4.19.0-rc3-gcc7x-g98c847323b3a)
MSR:  80029000   CR: 24000224  XER: 
DEAR: 88008008 ESR: 0080 IRQMASK: 0
GPR00: c05d0f84 c000f31bb688 c117dc00 88008008
GPR04:  0040 0ffbff241010 c000f31b8000
GPR08:  0010  c12d4710
GPR12: 84000422 c12ff000 c0002774 
GPR16:    
GPR20:    
GPR24:   88008008 c00089a8
GPR28: c000f3576400 c000f3576410 0040 c12ecc98
NIP [c00192cc] ._memset_io+0x6c/0x9c
LR [c05d0f9c] .fsl_qman_probe+0x198/0x928
Call Trace:
[c000f31bb688] [c05d0f84] .fsl_qman_probe+0x180/0x928 
(unreliable)
[c000f31bb728] [c06432ec] .platform_drv_probe+0x60/0xb4
[c000f31bb7a8] [c064083c] .really_probe+0x294/0x35c
[c000f31bb848] [c0640d2c] .__driver_attach+0x148/0x14c
[c000f31bb8d8] [c063d7dc] .bus_for_each_dev+0xb0/0x118
[c000f31bb988] [c063ff28] .driver_attach+0x34/0x4c
[c000f31bba08] [c063f648] .bus_add_driver+0x174/0x2bc
[c000f31bbaa8] [c06418bc] .driver_register+0x90/0x180
[c000f31bbb28] [c0643270] .__platform_driver_register+0x60/0x7c
[c000f31bbba8] [c0ee2a70] .fsl_qman_driver_init+0x24/0x38
[c000f31bbc18] [c00023fc] .do_one_initcall+0x64/0x2b8
[c000f31bbcf8] [c0e9f480] .kernel_init_freeable+0x3a8/0x494
[c000f31bbda8] [c0002798] .kernel_init+0x24/0x148
[c000f31bbe28] [c9e8] .ret_from_kernel_thread+0x58/0x70
Instruction dump:
4e800020 2ba50003 40dd003c 3925fffc 5488402e 7929f082 7d082378 39290001
550a801e 7d2903a6 7d4a4378 794a0020 <9143> 38630004 4200fff8 70a50003


Comparing a working vs broken kernel, it seems to boil down to the fact
that we're filtering out more PTE bits now that we use pte_pgprot() in
ioremap_prot().

With the old code we get:
ioremap_prot: addr 0xff80 flags 0x241215
ioremap_prot: addr 0xff80 flags 0x241215
map_kernel_page: ea 0x88008008 pa 0xff80 pte 0xff800241215


And now we get:
ioremap_prot: addr 0xff80 flags 0x241215 pte 0x241215
ioremap_prot: addr 0xff80 pte 0x241215
ioremap_prot: addr 0xff80 prot 0x241014
map_kernel_page: ea 0x88008008 pa 0xff80 pte 0xff800241014

So we're losing 0x201, which for nohash book3e is:

#define _PAGE_PRESENT   0x01 /* software: pte contains a translation */
#define _PAGE_PSIZE_4K  0x000200
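A quick way to confirm which bits disappeared is to keep only the bits that are set in the working value but clear in the broken one. The two constants are the book3e defines quoted above; `lost_bits()` is just an illustrative helper.

```c
#include <assert.h>
#include <stdint.h>

#define _PAGE_PRESENT	0x000001ULL	/* software: pte contains a translation */
#define _PAGE_PSIZE_4K	0x000200ULL

/* Bits set in the working flags but missing from the broken prot. */
static uint64_t lost_bits(uint64_t before, uint64_t after)
{
	return before & ~after;
}
```

Applied to the flags from the trace, `lost_bits(0x241215, 0x241014)` yields 0x201, exactly `_PAGE_PRESENT | _PAGE_PSIZE_4K`.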


I haven't worked out if it's one or both of those that matter.


At least missing _PAGE_PRESENT is an issue I believe.


The question is what's the right way to fix it? Should pte_pgprot() not
be filtering those bits out on book3e?


I think we should not use pte_pgprot() for that, then. What about the
fix below?

Christophe

From: Christophe Leroy 
Date: Wed, 17 Oct 2018 05:56:25 +
Subject: [PATCH] powerpc/mm: don't use pte_pgprot() in ioremap_prot()

pte_pgprot() filters out some required flags like _PAGE_PRESENT.

This patch replaces pte_pgprot() by __pgprot(pte_val())
in ioremap_prot()

Fixes: 

Re: Crash on FSL Book3E due to pte_pgprot()? (was Re: [PATCH v3 12/24] powerpc/mm: use pte helpers in generic code)

2018-10-17 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> On 10/17/2018 12:59 AM, Michael Ellerman wrote:
>> Christophe Leroy  writes:
>> 
>>> Get rid of platform specific _PAGE_ in powerpc common code and
>>> use helpers instead.
>>>
>>> mm/dump_linuxpagetables.c will be handled separately
>>>
>>> Reviewed-by: Aneesh Kumar K.V 
>>> Signed-off-by: Christophe Leroy 
>>> ---
>>>   arch/powerpc/include/asm/book3s/32/pgtable.h |  9 +++--
>>>   arch/powerpc/include/asm/nohash/32/pgtable.h | 12 
>>>   arch/powerpc/include/asm/nohash/pgtable.h|  3 +--
>>>   arch/powerpc/mm/pgtable.c| 21 +++--
>>>   arch/powerpc/mm/pgtable_32.c | 15 ---
>>>   arch/powerpc/mm/pgtable_64.c | 14 +++---
>>>   arch/powerpc/xmon/xmon.c | 12 +++-
>>>   7 files changed, 41 insertions(+), 45 deletions(-)
>> 
>> So turns out this patch *also* breaks my p5020ds :)
>> 
>> Even with patch 4 merged, see next.
>> 
>> It's the same crash:
>> 
>>pcieport 2000:00:00.0: AER enabled with IRQ 480
>>Unable to handle kernel paging request for data at address 
>> 0x88008008
>>Faulting instruction address: 0xc00192cc
>>Oops: Kernel access of bad area, sig: 11 [#1]
>>BE SMP NR_CPUS=24 CoreNet Generic
>>Modules linked in:
>>CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc7x-g98c847323b3a 
>> #1
>>NIP:  c00192cc LR: c05d0f9c CTR: 0010
>>REGS: c000f31bb400 TRAP: 0300   Not tainted  
>> (4.19.0-rc3-gcc7x-g98c847323b3a)
>>MSR:  80029000   CR: 24000224  XER: 
>>DEAR: 88008008 ESR: 0080 IRQMASK: 0
>>GPR00: c05d0f84 c000f31bb688 c117dc00 88008008
>>GPR04:  0040 0ffbff241010 c000f31b8000
>>GPR08:  0010  c12d4710
>>GPR12: 84000422 c12ff000 c0002774 
>>GPR16:    
>>GPR20:    
>>GPR24:   88008008 c00089a8
>>GPR28: c000f3576400 c000f3576410 0040 c12ecc98
>>NIP [c00192cc] ._memset_io+0x6c/0x9c
>>LR [c05d0f9c] .fsl_qman_probe+0x198/0x928
>>Call Trace:
>>[c000f31bb688] [c05d0f84] .fsl_qman_probe+0x180/0x928 
>> (unreliable)
>>[c000f31bb728] [c06432ec] .platform_drv_probe+0x60/0xb4
>>[c000f31bb7a8] [c064083c] .really_probe+0x294/0x35c
>>[c000f31bb848] [c0640d2c] .__driver_attach+0x148/0x14c
>>[c000f31bb8d8] [c063d7dc] .bus_for_each_dev+0xb0/0x118
>>[c000f31bb988] [c063ff28] .driver_attach+0x34/0x4c
>>[c000f31bba08] [c063f648] .bus_add_driver+0x174/0x2bc
>>[c000f31bbaa8] [c06418bc] .driver_register+0x90/0x180
>>[c000f31bbb28] [c0643270] 
>> .__platform_driver_register+0x60/0x7c
>>[c000f31bbba8] [c0ee2a70] .fsl_qman_driver_init+0x24/0x38
>>[c000f31bbc18] [c00023fc] .do_one_initcall+0x64/0x2b8
>>[c000f31bbcf8] [c0e9f480] .kernel_init_freeable+0x3a8/0x494
>>[c000f31bbda8] [c0002798] .kernel_init+0x24/0x148
>>[c000f31bbe28] [c9e8] .ret_from_kernel_thread+0x58/0x70
>>Instruction dump:
>>4e800020 2ba50003 40dd003c 3925fffc 5488402e 7929f082 7d082378 39290001
>>550a801e 7d2903a6 7d4a4378 794a0020 <9143> 38630004 4200fff8 70a50003
>> 
>> 
>> Comparing a working vs broken kernel, it seems to boil down to the fact
>> that we're filtering out more PTE bits now that we use pte_pgprot() in
>> ioremap_prot().
>> 
>> With the old code we get:
>>ioremap_prot: addr 0xff80 flags 0x241215
>>ioremap_prot: addr 0xff80 flags 0x241215
>>map_kernel_page: ea 0x88008008 pa 0xff80 pte 0xff800241215
>> 
>> 
>> And now we get:
>>ioremap_prot: addr 0xff80 flags 0x241215 pte 0x241215
>>ioremap_prot: addr 0xff80 pte 0x241215
>>ioremap_prot: addr 0xff80 prot 0x241014
>>map_kernel_page: ea 0x88008008 pa 0xff80 pte 0xff800241014
>> 
>> So we're losing 0x201, which for nohash book3e is:
>> 
>>#define _PAGE_PRESENT 0x01 /* software: pte contains a 
>> translation */
>>#define _PAGE_PSIZE_4K0x000200
>> 
>> 
>> I haven't worked out if it's one or both of those that matter.
>
> At least missing _PAGE_PRESENT is an issue I believe.
>> 
>> The question is what's the right way to fix it? Should pte_pgprot() not
>> be filtering those bits out on book3e?
>
> I think we should not use pte_pgprot() for that, then. What about the
> fix below?
>
> Christophe
>
> From: Christophe Leroy 
> Date: Wed, 17 Oct 2018 05:56:25 +
> Subject: 

Re: help

2018-10-17 Thread Madhavan Srinivasan




On Wednesday 17 October 2018 02:03 PM, Lorenzo Chelini wrote:

Hi All,

I am a PhD student at IBM Zurich. I am playing around with the new POWER9 servers.
I am interested in plotting a roofline model for a given application, but I
need to measure the traffic to and from memory.
Ideally, I would like to measure the traffic at the memory controller level.


Yes, you can get that data using the Nest IMC counters, which perf
supports. Here is a usage example:

$ perf stat -e nest_mcs01_imc/PM_MCS01_128B_RD_DISP_PORT01/ -I 1000 --per-socket


The above command reports all read traffic through a specific memory
controller/port. The "-I" option reads the counter every second, and
"--per-socket" presents the data per socket.


The "perf list nest_mcs" command lists all supported events for the
memory controller.
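Turning those counts into roofline inputs is simple arithmetic. The sketch below assumes, based only on the event name, that each count of PM_MCS01_128B_RD_DISP_PORT01 corresponds to one 128-byte read; that assumption, and the helper names, are illustrative. Write traffic would need the matching write events from `perf list nest_mcs`.

```c
#include <assert.h>

/* Assumption: one event count == one 128-byte read at the controller. */
#define BYTES_PER_COUNT 128.0

/* Read bandwidth in bytes/s from a counter delta over interval_s seconds. */
static double read_bandwidth(double count, double interval_s)
{
	return count * BYTES_PER_COUNT / interval_s;
}

/* Arithmetic intensity (FLOP/byte): the roofline x-axis. */
static double arithmetic_intensity(double flops, double bytes)
{
	return flops / bytes;
}

/* Attainable performance: min(peak compute, intensity * peak bandwidth). */
static double roofline(double peak_flops, double ai, double peak_bw)
{
	double mem_bound = ai * peak_bw;

	return mem_bound < peak_flops ? mem_bound : peak_flops;
}
```

With "-I 1000", each printed count covers one second, so `read_bandwidth(count, 1.0)` gives bytes per second for that interval.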



Maddy



Do you know if this is possible using perf? If so, which performance
counters should I query to estimate the memory traffic?

Thanks for your time.
Looking forward to hearing from you.
Best regards,
Lorenzo Chelini





help

2018-10-17 Thread Lorenzo Chelini


Hi All,

I am a PhD student at IBM Zurich. I am playing around with the new POWER9 servers.
I am interested in plotting a roofline model for a given application, but I
need to measure the traffic to and from memory.
Ideally, I would like to measure the traffic at the memory controller level.

Do you know if this is possible using perf? If so, which performance
counters should I query to estimate the memory traffic?

Thanks for your time.
Looking forward to hearing from you.
Best regards,
Lorenzo Chelini



Re: move bus (PCI, PCMCIA, EISA, rapidio) config to drivers/ v2

2018-10-17 Thread Geert Uytterhoeven
Hi Christoph,

On Wed, Oct 17, 2018 at 10:03 AM Christoph Hellwig  wrote:
> currently every architecture that wants to provide one of the common
> peripheral buses needs to add some boilerplate code and include the
> right Kconfig files.   This series instead just selects the presence
> (when needed) and then handles everything in the bus-specific
> Kconfig file under drivers/.
>
> Changes since v1:
>  - rename all HAS_* Kconfig symbols to HAVE_*
>  - drop the CONFIG_PCI_QSPAN option entirely
>  - drop duplicate select from powerpc
>  - restore missing selection of PCI_MSI for riscv
>  - update x86 and riscv defconfigs to include PCI
>  - actually include drivers/eisa/Kconfig
>  - adjust some capitalizations

Thanks for the update!

Please use "git format-patch -v --cover" to prepare patch series
for sending with git-send-email.

  "-v" to prefix all patches with version number ,
  "--cover" to have a "[PATCH 0/]" prefix in the cover letter.

Thanks!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 8/8] kconfig: remove CONFIG_MCA leftovers

2018-10-17 Thread Christoph Hellwig
On Tue, Oct 16, 2018 at 02:20:23PM +0900, Masahiro Yamada wrote:
> On Sun, Oct 14, 2018 at 12:11 AM Christoph Hellwig  wrote:
> >
> > Signed-off-by: Christoph Hellwig 
> > ---
> 
> 
> Can you use "powerpc:" or something
> for the subject line?
> 
> I'd like to see "kconfig:" only for patches
> that touch the scripts/kconfig/ directory.

Sorry, I missed this for v2.  Will fix it up for the next version
or let you fix it up if there isn't one.


move bus (PCI, PCMCIA, EISA, rapidio) config to drivers/ v2

2018-10-17 Thread Christoph Hellwig
Hi all,

currently every architecture that wants to provide one of the common
peripheral buses needs to add some boilerplate code and include the
right Kconfig files.   This series instead just selects the presence
(when needed) and then handles everything in the bus-specific
Kconfig file under drivers/.

Changes since v1:
 - rename all HAS_* Kconfig symbols to HAVE_*
 - drop the CONFIG_PCI_QSPAN option entirely
 - drop duplicate select from powerpc
 - restore missing selection of PCI_MSI for riscv
 - update x86 and riscv defconfigs to include PCI
 - actually include drivers/eisa/Kconfig
 - adjust some capitalizations


[PATCH 8/8] kconfig: remove CONFIG_MCA leftovers

2018-10-17 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Acked-by: Thomas Gleixner 
---
 arch/powerpc/Kconfig | 4 
 drivers/scsi/Kconfig | 6 +++---
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f2001fff14d1..f3ec13765639 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -944,10 +944,6 @@ config FSL_GTM
help
  Freescale General-purpose Timers support
 
-# Yes MCA RS/6000s exist but Linux-PPC does not currently support any
-config MCA
-   bool
-
 config PCI_DOMAINS
def_bool PCI
 
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 7c097006c54d..d3734c54aec9 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -535,7 +535,7 @@ config SCSI_HPTIOP
 
 config SCSI_BUSLOGIC
tristate "BusLogic SCSI support"
-   depends on (PCI || ISA || MCA) && SCSI && ISA_DMA_API && VIRT_TO_BUS
+   depends on (PCI || ISA) && SCSI && ISA_DMA_API && VIRT_TO_BUS
---help---
  This is support for BusLogic MultiMaster and FlashPoint SCSI Host
  Adapters. Consult the SCSI-HOWTO, available from
@@ -1142,12 +1142,12 @@ config SCSI_LPFC_DEBUG_FS
 
 config SCSI_SIM710
tristate "Simple 53c710 SCSI support (Compaq, NCR machines)"
-   depends on (EISA || MCA) && SCSI
+   depends on EISA && SCSI
select SCSI_SPI_ATTRS
---help---
  This driver is for NCR53c710 based SCSI host adapters.
 
- It currently supports Compaq EISA cards and NCR MCA cards
+ It currently supports Compaq EISA cards.
 
 config SCSI_DC395x
tristate "Tekram DC395(U/UW/F) and DC315(U) SCSI support"
-- 
2.19.1



[PATCH 3/8] powerpc: PCI_MSI needs PCI

2018-10-17 Thread Christoph Hellwig
Various powerpc boards select the PCI_MSI config option without selecting
PCI, which can produce configurations that do not compile if the PCI
option (enabled by default) is disabled.  Explicitly select PCI to ensure
we always have valid configs.
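The failure mode follows from how Kconfig's "select" works: it forces a symbol on without checking that symbol's own "depends on" line. The C sketch below is a toy model of that rule, not kconfig code; the struct and function names are hypothetical, and it assumes PCI_MSI depends on PCI as in drivers/pci/Kconfig.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical names) of the two symbols involved. */
struct board_config {
	bool pci;      /* CONFIG_PCI */
	bool pci_msi;  /* CONFIG_PCI_MSI, assumed to "depends on PCI" */
};

/* Apply a board's selects on top of the user's answer for PCI. */
static struct board_config resolve(bool user_pci, bool board_selects_pci)
{
	struct board_config c;

	/* "select PCI_MSI": forced on, its "depends on PCI" is not checked. */
	c.pci_msi = true;
	/* "select PCI" (the fix) overrides the user; without it the user
	 * may switch the default-y PCI option off. */
	c.pci = board_selects_pci ? true : user_pci;
	return c;
}

/* Valid only if every enabled symbol has its dependencies met. */
static bool config_is_valid(struct board_config c)
{
	return !c.pci_msi || c.pci;
}
```

Without "select PCI", turning PCI off leaves PCI_MSI enabled with an unmet dependency; with it, the combination cannot arise.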

Signed-off-by: Christoph Hellwig 
Acked-by: Thomas Gleixner 
---
 arch/powerpc/platforms/40x/Kconfig | 1 +
 arch/powerpc/platforms/44x/Kconfig | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/platforms/40x/Kconfig b/arch/powerpc/platforms/40x/Kconfig
index 60254a321a91..d5361e63e0bb 100644
--- a/arch/powerpc/platforms/40x/Kconfig
+++ b/arch/powerpc/platforms/40x/Kconfig
@@ -33,6 +33,7 @@ config KILAUEA
select 405EX
select PPC40x_SIMPLE
select PPC4xx_PCI_EXPRESS
+   select PCI
select PCI_MSI
select PPC4xx_MSI
help
diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
index a6011422b861..70856a213663 100644
--- a/arch/powerpc/platforms/44x/Kconfig
+++ b/arch/powerpc/platforms/44x/Kconfig
@@ -24,6 +24,7 @@ config BLUESTONE
default n
select PPC44x_SIMPLE
select APM821xx
+   select PCI
select PCI_MSI
select PPC4xx_MSI
select PPC4xx_PCI_EXPRESS
@@ -78,6 +79,7 @@ config KATMAI
select 440SPe
select PCI
select PPC4xx_PCI_EXPRESS
+   select PCI
select PCI_MSI
select PPC4xx_MSI
help
@@ -219,6 +221,7 @@ config AKEBONO
select SWIOTLB
select 476FPE
select PPC4xx_PCI_EXPRESS
+   select PCI
select PCI_MSI
select PPC4xx_HSTA_MSI
select I2C
-- 
2.19.1



[PATCH 1/8] aha152x: rename the PCMCIA define

2018-10-17 Thread Christoph Hellwig
We plan to enable building the PCMCIA core and drivers everywhere, and
the unprefixed PCMCIA macro name clashes with some arch headers.
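To illustrate the clash, here is a small, self-contained C sketch. It assumes a hypothetical arch header that defines a macro named PCMCIA for its own purposes. With the old guard, the plain ISA build of aha152x.c would then silently compile the PCMCIA-only paths (AUTOCONF forced, IRQ_MIN/IRQ_MAX of 0/16, no io[] module parameters). The prefixed guard only fires when aha152x_core.c requests it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for an architecture header that defines its own
 * PCMCIA macro (name reused, value made up for illustration).
 */
#define PCMCIA 0x1

/*
 * Simulate the plain ISA build of aha152x.c: the PCMCIA wrapper
 * (aha152x_core.c) is not involved, so AHA152X_PCMCIA stays undefined.
 */

/* Old guard: also fires on the unrelated arch macro above. */
static bool old_guard_selects_pcmcia_path(void)
{
#if defined(PCMCIA)
	return true;
#else
	return false;
#endif
}

/* New guard: fires only when aha152x_core.c defined AHA152X_PCMCIA. */
static bool new_guard_selects_pcmcia_path(void)
{
#if defined(AHA152X_PCMCIA)
	return true;
#else
	return false;
#endif
}
```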

Signed-off-by: Christoph Hellwig 
Acked-by: Thomas Gleixner 
---
 drivers/scsi/aha152x.c | 14 +++---
 drivers/scsi/pcmcia/aha152x_core.c |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/aha152x.c b/drivers/scsi/aha152x.c
index 4d7b0e0adbf7..301b3cad15f8 100644
--- a/drivers/scsi/aha152x.c
+++ b/drivers/scsi/aha152x.c
@@ -269,7 +269,7 @@ static LIST_HEAD(aha152x_host_list);
 /* DEFINES */
 
 /* For PCMCIA cards, always use AUTOCONF */
-#if defined(PCMCIA) || defined(MODULE)
+#if defined(AHA152X_PCMCIA) || defined(MODULE)
 #if !defined(AUTOCONF)
 #define AUTOCONF
 #endif
@@ -297,7 +297,7 @@ CMD_INC_RESID(struct scsi_cmnd *cmd, int inc)
 
 #define DELAY_DEFAULT 1000
 
-#if defined(PCMCIA)
+#if defined(AHA152X_PCMCIA)
 #define IRQ_MIN 0
 #define IRQ_MAX 16
 #else
@@ -328,7 +328,7 @@ MODULE_AUTHOR("Jürgen Fischer");
 MODULE_DESCRIPTION(AHA152X_REVID);
 MODULE_LICENSE("GPL");
 
-#if !defined(PCMCIA)
+#if !defined(AHA152X_PCMCIA)
 #if defined(MODULE)
 static int io[] = {0, 0};
 module_param_hw_array(io, int, ioport, NULL, 0);
@@ -391,7 +391,7 @@ static struct isapnp_device_id id_table[] = {
 MODULE_DEVICE_TABLE(isapnp, id_table);
 #endif /* ISAPNP */
 
-#endif /* !PCMCIA */
+#endif /* !AHA152X_PCMCIA */
 
 static struct scsi_host_template aha152x_driver_template;
 
@@ -863,7 +863,7 @@ void aha152x_release(struct Scsi_Host *shpnt)
if (shpnt->irq)
free_irq(shpnt->irq, shpnt);
 
-#if !defined(PCMCIA)
+#if !defined(AHA152X_PCMCIA)
if (shpnt->io_port)
release_region(shpnt->io_port, IO_RANGE);
 #endif
@@ -2924,7 +2924,7 @@ static struct scsi_host_template aha152x_driver_template = {
.slave_alloc= aha152x_adjust_queue,
 };
 
-#if !defined(PCMCIA)
+#if !defined(AHA152X_PCMCIA)
 static int setup_count;
 static struct aha152x_setup setup[2];
 
@@ -3392,4 +3392,4 @@ static int __init aha152x_setup(char *str)
 __setup("aha152x=", aha152x_setup);
 #endif
 
-#endif /* !PCMCIA */
+#endif /* !AHA152X_PCMCIA */
diff --git a/drivers/scsi/pcmcia/aha152x_core.c b/drivers/scsi/pcmcia/aha152x_core.c
index dba3716511c5..24b89228b241 100644
--- a/drivers/scsi/pcmcia/aha152x_core.c
+++ b/drivers/scsi/pcmcia/aha152x_core.c
@@ -1,3 +1,3 @@
-#define PCMCIA 1
+#define AHA152X_PCMCIA 1
 #define AHA152X_STAT 1
 #include "aha152x.c"
-- 
2.19.1



[PATCH 4/8] PCI: consolidate PCI config entry in drivers/pci

2018-10-17 Thread Christoph Hellwig
There is no good reason to duplicate the PCI menu in every architecture.
Instead, provide a selectable HAVE_PCI symbol that indicates availability
of PCI support, and then handle the rest in drivers/pci.

Note that for powerpc we now select HAVE_PCI globally instead of the
convoluted mess of conditional or non-conditional support per board,
similar to what we do e.g. on x86.  For alpha, PCI is selected for the
non-Jensen configs, as it was the default before and a lot of code does
not compile without PCI enabled.  On other architectures with limited
PCI support that wasn't as complicated, I've left the selection as-is.
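Here is a rough C model of the resulting split, using alpha as the example. The architecture only states whether PCI can exist (HAVE_PCI) and can optionally force it on, while the user-visible option lives in drivers/pci. The drivers/pci/Kconfig hunk is not shown here, so the sketch assumes its PCI entry depends on HAVE_PCI. All names in the code are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the symbols involved; not kconfig's real evaluator. */
struct arch_config {
	bool have_pci;     /* architecture selects HAVE_PCI */
	bool selects_pci;  /* architecture also does "select PCI" */
	bool user_pci;     /* answer to the "PCI support" prompt, if shown */
};

/* Value CONFIG_PCI ends up with under this model. */
static bool pci_enabled(const struct arch_config *c)
{
	if (c->selects_pci)
		return true;        /* select forces the symbol on */
	if (!c->have_pci)
		return false;       /* assumed "depends on HAVE_PCI": prompt hidden */
	return c->user_pci;     /* otherwise the user decides */
}

/* alpha after this patch: both selects are conditional on !ALPHA_JENSEN. */
static struct arch_config alpha_config(bool alpha_jensen, bool user_pci)
{
	struct arch_config c = {
		.have_pci = !alpha_jensen,
		.selects_pci = !alpha_jensen,
		.user_pci = user_pci,
	};
	return c;
}
```

For an architecture that selects HAVE_PCI but not PCI, the prompt is shown and the user's answer decides.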

Signed-off-by: Christoph Hellwig 
Acked-by: Thomas Gleixner 
Acked-by: Bjorn Helgaas 
---
 arch/alpha/Kconfig | 15 ++---
 arch/arc/Kconfig   | 20 
 arch/arc/plat-axs10x/Kconfig   |  2 +-
 arch/arc/plat-hsdk/Kconfig |  2 +-
 arch/arm/Kconfig   | 19 ++--
 arch/arm/mach-ks8695/Kconfig   | 10 +++---
 arch/arm/mach-pxa/Kconfig  |  2 +-
 arch/arm64/Kconfig | 10 +-
 arch/hexagon/Kconfig   |  3 --
 arch/ia64/Kconfig  |  9 +-
 arch/m68k/Kconfig.bus  | 11 ---
 arch/m68k/Kconfig.cpu  |  1 +
 arch/microblaze/Kconfig|  6 +---
 arch/mips/Kconfig  | 43 +-
 arch/mips/alchemy/Kconfig  |  6 ++--
 arch/mips/ath25/Kconfig|  2 +-
 arch/mips/ath79/Kconfig|  8 ++---
 arch/mips/bcm63xx/Kconfig  | 14 -
 arch/mips/lantiq/Kconfig   |  2 +-
 arch/mips/loongson64/Kconfig   |  6 ++--
 arch/mips/pmcs-msp71xx/Kconfig | 10 +++---
 arch/mips/ralink/Kconfig   |  8 ++---
 arch/mips/sibyte/Kconfig   | 10 +++---
 arch/mips/txx9/Kconfig |  8 ++---
 arch/mips/vr41xx/Kconfig   |  8 ++---
 arch/parisc/Kconfig|  1 +
 arch/powerpc/Kconfig   | 25 ---
 arch/powerpc/platforms/44x/Kconfig |  1 -
 arch/powerpc/platforms/512x/Kconfig|  1 -
 arch/powerpc/platforms/52xx/Kconfig|  1 -
 arch/powerpc/platforms/83xx/Kconfig|  1 -
 arch/powerpc/platforms/85xx/Kconfig|  1 -
 arch/powerpc/platforms/86xx/Kconfig|  2 --
 arch/powerpc/platforms/Kconfig |  1 -
 arch/powerpc/platforms/Kconfig.cputype |  2 --
 arch/powerpc/platforms/ps3/Kconfig |  1 -
 arch/riscv/Kconfig | 18 ++-
 arch/s390/Kconfig  | 23 +-
 arch/sh/Kconfig| 19 ++--
 arch/sh/boards/Kconfig | 30 +-
 arch/sparc/Kconfig | 15 +
 arch/um/Kconfig|  3 --
 arch/unicore32/Kconfig | 11 +--
 arch/x86/Kconfig   | 12 +--
 arch/x86/configs/i386_defconfig|  1 +
 arch/x86/configs/x86_64_defconfig  |  1 +
 arch/xtensa/Kconfig| 16 +-
 arch/xtensa/configs/common_defconfig   |  1 +
 arch/xtensa/configs/iss_defconfig  |  2 +-
 drivers/Kconfig|  4 +++
 drivers/parisc/Kconfig | 11 ---
 drivers/pci/Kconfig| 12 +++
 52 files changed, 133 insertions(+), 318 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 5b4f88363453..bb89924c0361 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -6,6 +6,8 @@ config ALPHA
select ARCH_MIGHT_HAVE_PC_SERIO
select ARCH_NO_PREEMPT
select ARCH_USE_CMPXCHG_LOCKREF
+   select HAVE_PCI if !ALPHA_JENSEN
+   select PCI if !ALPHA_JENSEN
select HAVE_AOUT
select HAVE_IDE
select HAVE_OPROFILE
@@ -15,6 +17,7 @@ config ALPHA
select NEED_SG_DMA_LENGTH
select VIRT_TO_BUS
select GENERIC_IRQ_PROBE
+   select GENERIC_PCI_IOMAP if PCI
select AUTO_IRQ_AFFINITY if SMP
select GENERIC_IRQ_SHOW
select ARCH_WANT_IPC_PARSE_VERSION
@@ -319,17 +322,6 @@ config ISA_DMA_API
bool
default y
 
-config PCI
-   bool
-   depends on !ALPHA_JENSEN
-   select GENERIC_PCI_IOMAP
-   default y
-   help
- Find out whether you have a PCI motherboard. PCI is the name of a
- bus system, i.e. the way the CPU talks to the other stuff inside
- your box. Other bus systems are ISA, EISA, MicroChannel (MCA) or
- VESA. If you have PCI, say Y, otherwise N.
-
 config PCI_DOMAINS
bool
default y
@@ -681,7 +673,6 @@ config HZ
default 1200 if HZ_1200
default 1024
 
-source "drivers/pci/Kconfig"
 source "drivers/eisa/Kconfig"
 
 source "drivers/pcmcia/Kconfig"
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index a045f3086047..55a6953e9239 100644
--- a/arch/arc/Kconfig
+++ 

[PATCH 5/8] pcmcia: allow PCMCIA support independent of the architecture

2018-10-17 Thread Christoph Hellwig
There is nothing architecture-specific in the PCMCIA core, so allow
building it everywhere.  The actual host controllers will depend on ISA,
PCI or a specific SoC.

Signed-off-by: Christoph Hellwig 
Acked-by: Dominik Brodowski 
Acked-by: Thomas Gleixner 
---
 arch/alpha/Kconfig | 2 --
 arch/arm/Kconfig   | 2 --
 arch/ia64/Kconfig  | 2 --
 arch/m68k/Kconfig.bus  | 2 --
 arch/mips/Kconfig  | 2 --
 arch/powerpc/Kconfig   | 2 --
 arch/sh/Kconfig| 2 --
 arch/sparc/Kconfig | 2 --
 arch/unicore32/Kconfig | 6 --
 arch/x86/Kconfig   | 2 --
 arch/xtensa/Kconfig| 2 --
 drivers/Kconfig| 1 +
 drivers/parisc/Kconfig | 2 --
 drivers/pcmcia/Kconfig | 1 +
 14 files changed, 2 insertions(+), 28 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index bb89924c0361..96f02268ea16 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -675,8 +675,6 @@ config HZ
 
 source "drivers/eisa/Kconfig"
 
-source "drivers/pcmcia/Kconfig"
-
 config SRM_ENV
tristate "SRM environment through procfs"
depends on PROC_FS
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 516105df6c71..60e37b9a715d 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1256,8 +1256,6 @@ config PCI_HOST_ITE8152
default y
select DMABOUNCE
 
-source "drivers/pcmcia/Kconfig"
-
 endmenu
 
 menu "Kernel Features"
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 89da763d7c17..704ff5922ce0 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -553,8 +553,6 @@ config PCI_DOMAINS
 config PCI_SYSCALL
def_bool PCI
 
-source "drivers/pcmcia/Kconfig"
-
 endmenu
 
 endif
diff --git a/arch/m68k/Kconfig.bus b/arch/m68k/Kconfig.bus
index 8cb0604b195b..9d0a3a23d50e 100644
--- a/arch/m68k/Kconfig.bus
+++ b/arch/m68k/Kconfig.bus
@@ -68,6 +68,4 @@ if !MMU
 config ISA_DMA_API
 def_bool !M5272
 
-source "drivers/pcmcia/Kconfig"
-
 endif
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 09b93d5a55cb..18eeb66c6d99 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -3090,8 +3090,6 @@ config ZONE_DMA
 config ZONE_DMA32
bool
 
-source "drivers/pcmcia/Kconfig"
-
 config HAS_RAPIDIO
bool
default n
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8f5a49d11385..6430aafe712f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -963,8 +963,6 @@ config PCI_8260
select PPC_INDIRECT_PCI
default y
 
-source "drivers/pcmcia/Kconfig"
-
 config HAS_RAPIDIO
bool
default n
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 2ff6855811a5..ce9487139155 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -861,8 +861,6 @@ config MAPLE
 config PCI_DOMAINS
bool
 
-source "drivers/pcmcia/Kconfig"
-
 endmenu
 
 menu "Power management options (EXPERIMENTAL)"
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index fc311b8dc46b..0198f96528fc 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -509,8 +509,6 @@ config SPARC_GRPCI2
help
  Say Y here to include the GRPCI2 Host Bridge Driver.
 
-source "drivers/pcmcia/Kconfig"
-
 config SUN_OPENPROMFS
tristate "Openprom tree appears in /proc/openprom"
help
diff --git a/arch/unicore32/Kconfig b/arch/unicore32/Kconfig
index 601dcad2560e..d7750e7c7ccb 100644
--- a/arch/unicore32/Kconfig
+++ b/arch/unicore32/Kconfig
@@ -118,12 +118,6 @@ config UNICORE_FPU_F64
 
 endmenu
 
-menu "Bus support"
-
-source "drivers/pcmcia/Kconfig"
-
-endmenu
-
 menu "Kernel Features"
 
 source "kernel/Kconfig.hz"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5816e20a3ff9..fda01408b596 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2810,8 +2810,6 @@ config AMD_NB
def_bool y
depends on CPU_SUP_AMD && PCI
 
-source "drivers/pcmcia/Kconfig"
-
 config RAPIDIO
tristate "RapidIO support"
depends on PCI
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index f057c16a48a5..c18ceaab7860 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -517,8 +517,6 @@ config FORCE_MAX_ZONEORDER
  This config option is actually maximum order plus one. For example,
  a value of 11 means that the largest free memory block is 2^10 pages.
 
-source "drivers/pcmcia/Kconfig"
-
 config PLATFORM_WANT_DEFAULT_MEM
def_bool n
 
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 059573823387..58ee88c36cf5 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -5,6 +5,7 @@ menu "Device Drivers"
 
 source "drivers/amba/Kconfig"
 source "drivers/pci/Kconfig"
+source "drivers/pcmcia/Kconfig"
 
 
 source "drivers/base/Kconfig"
diff --git a/drivers/parisc/Kconfig b/drivers/parisc/Kconfig
index 5bbfea1a019c..1a55763d1245 100644
--- a/drivers/parisc/Kconfig
+++ b/drivers/parisc/Kconfig
@@ -92,8 +92,6 @@ config IOMMU_SBA
depends on PCI_LBA
default PCI_LBA
 
-source "drivers/pcmcia/Kconfig"
-
 endmenu
 
 menu "PA-RISC specific drivers"
diff --git 

[PATCH 6/8] rapidio: consolidate RAPIDIO config entry in drivers/rapidio

2018-10-17 Thread Christoph Hellwig
There is no good reason to duplicate the RAPIDIO menu in various
architectures.  Instead, provide a selectable HAVE_RAPIDIO symbol
that indicates native availability of RAPIDIO support, and then handle
the rest in drivers/rapidio.  This also means we now provide support
for PCI(e)-to-RapidIO bridges on every architecture instead of a
limited subset.

Signed-off-by: Christoph Hellwig 
Acked-by: Thomas Gleixner 
---
 arch/mips/Kconfig   | 15 +--
 arch/powerpc/Kconfig| 15 +--
 arch/powerpc/platforms/85xx/Kconfig |  8 
 arch/powerpc/platforms/86xx/Kconfig |  4 ++--
 arch/x86/Kconfig| 10 --
 drivers/Kconfig |  1 +
 drivers/rapidio/Kconfig | 11 +++
 7 files changed, 20 insertions(+), 44 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 18eeb66c6d99..96198f8375e1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -894,7 +894,7 @@ config CAVIUM_OCTEON_SOC
bool "Cavium Networks Octeon SoC based boards"
select CEVT_R4K
select ARCH_HAS_PHYS_TO_DMA
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
select PHYS_ADDR_T_64BIT
select SYS_SUPPORTS_64BIT_KERNEL
select SYS_SUPPORTS_BIG_ENDIAN
@@ -3090,19 +3090,6 @@ config ZONE_DMA
 config ZONE_DMA32
bool
 
-config HAS_RAPIDIO
-   bool
-   default n
-
-config RAPIDIO
-   tristate "RapidIO support"
-   depends on HAS_RAPIDIO || PCI
-   help
- If you say Y here, the kernel will include drivers and
- infrastructure code to support RapidIO interconnect devices.
-
-source "drivers/rapidio/Kconfig"
-
 endmenu
 
 config TRAD_SIGNALS
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6430aafe712f..ee28bb22732b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -963,27 +963,14 @@ config PCI_8260
select PPC_INDIRECT_PCI
default y
 
-config HAS_RAPIDIO
-   bool
-   default n
-
-config RAPIDIO
-   tristate "RapidIO support"
-   depends on HAS_RAPIDIO || PCI
-   help
- If you say Y here, the kernel will include drivers and
- infrastructure code to support RapidIO interconnect devices.
-
 config FSL_RIO
bool "Freescale Embedded SRIO Controller support"
-   depends on RAPIDIO = y && HAS_RAPIDIO
+   depends on RAPIDIO = y && HAVE_RAPIDIO
default "n"
---help---
  Include support for RapidIO controller on Freescale embedded
  processors (MPC8548, MPC8641, etc).
 
-source "drivers/rapidio/Kconfig"
-
 endmenu
 
 config NONSTATIC_KERNEL
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index 20867a23f3f2..1c6bb9180d70 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -65,7 +65,7 @@ config MPC85xx_CDS
bool "Freescale MPC85xx CDS"
select DEFAULT_UIMAGE
select PPC_I8259
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
help
  This option enables support for the MPC85xx CDS board
 
@@ -73,7 +73,7 @@ config MPC85xx_MDS
bool "Freescale MPC85xx MDS"
select DEFAULT_UIMAGE
select PHYLIB if NETDEVICES
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
select SWIOTLB
help
  This option enables support for the MPC85xx MDS board
@@ -218,7 +218,7 @@ config PPA8548
help
  This option enables support for the Prodrive PPA8548 board.
select DEFAULT_UIMAGE
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
 
 config GE_IMP3A
bool "GE Intelligent Platforms IMP3A"
@@ -276,7 +276,7 @@ config CORENET_GENERIC
select SWIOTLB
select GPIOLIB
select GPIO_MPC8XXX
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
select PPC_EPAPR_HV_PIC
help
  This option enables support for the FSL CoreNet based boards.
diff --git a/arch/powerpc/platforms/86xx/Kconfig b/arch/powerpc/platforms/86xx/Kconfig
index 87220554dd6f..badd9d6ba1ef 100644
--- a/arch/powerpc/platforms/86xx/Kconfig
+++ b/arch/powerpc/platforms/86xx/Kconfig
@@ -15,7 +15,7 @@ config MPC8641_HPCN
select PPC_I8259
select DEFAULT_UIMAGE
select FSL_ULI1575 if PCI
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
select SWIOTLB
help
  This option enables support for the MPC8641 HPCN board.
@@ -57,7 +57,7 @@ config GEF_SBC610
select MMIO_NVRAM
select GPIOLIB
select GE_FPGA
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
help
  This option enables support for the GE SBC610.
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fda01408b596..6fe3740018f6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2810,16 +2810,6 @@ config AMD_NB
def_bool y
depends on CPU_SUP_AMD && PCI
 
-config RAPIDIO
-   tristate "RapidIO 

[PATCH 2/8] powerpc: remove CONFIG_PCI_QSPAN

2018-10-17 Thread Christoph Hellwig
This option isn't actually used anywhere.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/Kconfig | 9 -
 1 file changed, 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a80669209155..e8c8970248bc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -955,7 +955,6 @@ config PCI
bool "PCI support" if PPC_PCI_CHOICE
default y if !40x && !CPM2 && !PPC_8xx && !PPC_83xx \
&& !PPC_85xx && !PPC_86xx && !GAMECUBE_COMMON
-   default PCI_QSPAN if PPC_8xx
select GENERIC_PCI_IOMAP
help
  Find out whether your system includes a PCI bus. PCI is the name of
@@ -969,14 +968,6 @@ config PCI_DOMAINS
 config PCI_SYSCALL
def_bool PCI
 
-config PCI_QSPAN
-   bool "QSpan PCI"
-   depends on PPC_8xx
-   select PPC_I8259
-   help
- Say Y here if you have a system based on a Motorola 8xx-series
- embedded processor with a QSPAN PCI interface, otherwise say N.
-
 config PCI_8260
bool
depends on PCI && 8260
-- 
2.19.1



[PATCH 7/8] eisa: consolidate EISA Kconfig entry in drivers/eisa

2018-10-17 Thread Christoph Hellwig
Let architectures opt into EISA support by selecting HAVE_EISA and
handle everything else in drivers/eisa.

Signed-off-by: Christoph Hellwig 
Acked-by: Thomas Gleixner 
---
 arch/alpha/Kconfig | 10 +++---
 arch/arm/Kconfig   | 16 +---
 arch/mips/Kconfig  | 31 +--
 arch/powerpc/Kconfig   |  3 ---
 arch/x86/Kconfig   | 19 +--
 drivers/Kconfig|  1 +
 drivers/eisa/Kconfig   | 21 -
 drivers/parisc/Kconfig | 11 +--
 8 files changed, 32 insertions(+), 80 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 96f02268ea16..779e25255d78 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -6,6 +6,9 @@ config ALPHA
select ARCH_MIGHT_HAVE_PC_SERIO
select ARCH_NO_PREEMPT
select ARCH_USE_CMPXCHG_LOCKREF
+   select HAVE_EISA if ALPHA_GENERIC || ALPHA_JENSEN || ALPHA_ALCOR || \
+   ALPHA_MIKASA || ALPHA_SABLE || ALPHA_LYNX || \
+   ALPHA_NORITAKE || ALPHA_RAWHIDE
select HAVE_PCI if !ALPHA_JENSEN
select PCI if !ALPHA_JENSEN
select HAVE_AOUT
@@ -518,11 +521,6 @@ config ALPHA_SRM
 
  If unsure, say N.
 
-config EISA
-   bool
-   depends on ALPHA_GENERIC || ALPHA_JENSEN || ALPHA_ALCOR || ALPHA_MIKASA || ALPHA_SABLE || ALPHA_LYNX || ALPHA_NORITAKE || ALPHA_RAWHIDE
-   default y
-
 config ARCH_MAY_HAVE_PC_FDC
def_bool y
 
@@ -673,8 +671,6 @@ config HZ
default 1200 if HZ_1200
default 1024
 
-source "drivers/eisa/Kconfig"
-
 config SRM_ENV
tristate "SRM environment through procfs"
depends on PROC_FS
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 60e37b9a715d..c90a1a4d6079 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -59,6 +59,7 @@ config ARM
select HAVE_ARCH_TRACEHOOK
select HAVE_ARM_SMCCC if CPU_V7
select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
+   select HAVE_EISA
select HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
select HAVE_DEBUG_KMEMLEAK
@@ -162,21 +163,6 @@ config HAVE_PROC_CPU
 config NO_IOPORT_MAP
bool
 
-config EISA
-   bool
-   ---help---
- The Extended Industry Standard Architecture (EISA) bus was
- developed as an open alternative to the IBM MicroChannel bus.
-
- The EISA bus provided some of the features of the IBM MicroChannel
- bus while maintaining backward compatibility with cards made for
- the older ISA bus.  The EISA bus saw limited use between 1988 and
- 1995 when it was made obsolete by the PCI bus.
-
- Say Y here if you are building a kernel for an EISA-based machine.
-
- Otherwise, say N.
-
 config SBUS
bool
 
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 96198f8375e1..7cf58031a43e 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -23,6 +23,7 @@ config MIPS
select GENERIC_CPU_AUTOPROBE
select GENERIC_IRQ_PROBE
select GENERIC_IRQ_SHOW
+   select GENERIC_ISA_DMA if EISA
select GENERIC_LIB_ASHLDI3
select GENERIC_LIB_ASHRDI3
select GENERIC_LIB_CMPDI2
@@ -72,6 +73,7 @@ config MIPS
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING_GEN if 64BIT || !SMP
select IRQ_FORCED_THREADING
+   select ISA if EISA
select MODULES_USE_ELF_RELA if MODULES && 64BIT
select MODULES_USE_ELF_REL if MODULES
select PCI_DOMAINS if PCI
@@ -634,7 +636,7 @@ config SGI_IP22
select CSRC_R4K
select DEFAULT_SGI_PARTITION
select DMA_NONCOHERENT
-   select HW_HAS_EISA
+   select HAVE_EISA
select I8253
select I8259
select IP22_CPU_SCACHE
@@ -699,7 +701,7 @@ config SGI_IP28
select DMA_NONCOHERENT
select GENERIC_ISA_DMA_SUPPORT_BROKEN
select IRQ_MIPS_CPU
-   select HW_HAS_EISA
+   select HAVE_EISA
select I8253
select I8259
select SGI_HAS_I8042
@@ -842,8 +844,8 @@ config SNI_RM
select DEFAULT_SGI_PARTITION if CPU_BIG_ENDIAN
select DMA_NONCOHERENT
select GENERIC_ISA_DMA
+   select HAVE_EISA
select HAVE_PCSPKR_PLATFORM
-   select HW_HAS_EISA
select HAVE_PCI
select IRQ_MIPS_CPU
select I8253
@@ -2991,9 +2993,6 @@ config MIPS_AUTO_PFN_OFFSET
 
 menu "Bus options (PCI, PCMCIA, EISA, ISA, TC)"
 
-config HW_HAS_EISA
-   bool
-
 config HT_PCI
bool "Support for HT-linked PCI"
default y
@@ -3027,26 +3026,6 @@ config PCI_DRIVERS_LEGACY
 config ISA
bool
 
-config EISA
-   bool "EISA support"
-   depends on HW_HAS_EISA
-   select ISA
-   select GENERIC_ISA_DMA
-   ---help---
- The Extended Industry Standard Architecture (EISA) bus was
- developed as an open alternative to the IBM MicroChannel bus.
-
- The EISA bus provided some 

Re: [PATCH 4.18 086/135] KVM: PPC: Book3S HV: Dont use compound_order to determine host mapping size

2018-10-17 Thread Greg Kroah-Hartman
On Wed, Oct 17, 2018 at 09:32:25AM +1100, Paul Mackerras wrote:
> On Tue, Oct 16, 2018 at 07:05:16PM +0200, Greg Kroah-Hartman wrote:
> > 4.18-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Nicholas Piggin 
> > 
> > [ Upstream commit 71d29f43b6332badc5598c656616a62575e83342 ]
> 
> If you take 71d29f43b633 then you also need 6579804c4317 ("KVM: PPC:
> Book3S HV: Avoid crash from THP collapse during radix page fault",
> 2018-10-04).

Thanks, now queued up.

greg k-h


[PATCH 3/5] powerpc/lib: checksum KHRAP support

2018-10-17 Thread Russell Currey
Wrap the checksumming code in KHRAP locks and unlocks.

Signed-off-by: Russell Currey 
---
 arch/powerpc/lib/checksum_wrappers.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/checksum_wrappers.c b/arch/powerpc/lib/checksum_wrappers.c
index a0cb63fb76a1..695460a29c9f 100644
--- a/arch/powerpc/lib/checksum_wrappers.c
+++ b/arch/powerpc/lib/checksum_wrappers.c
@@ -26,7 +26,7 @@
 __wsum csum_and_copy_from_user(const void __user *src, void *dst,
   int len, __wsum sum, int *err_ptr)
 {
-   unsigned int csum;
+   unsigned int csum, amr = unlock_user_access();
 
might_sleep();
 
@@ -60,6 +60,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
}
 
 out:
+   lock_user_access(amr);
return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_from_user);
@@ -67,7 +68,7 @@ EXPORT_SYMBOL(csum_and_copy_from_user);
 __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
 __wsum sum, int *err_ptr)
 {
-   unsigned int csum;
+   unsigned int csum, amr = unlock_user_access();
 
might_sleep();
 
@@ -97,6 +98,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
}
 
 out:
+   lock_user_access(amr);
return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_to_user);
-- 
2.19.1



[PATCH 5/5] powerpc/64s: Document that PPC supports nosmap

2018-10-17 Thread Russell Currey
Signed-off-by: Russell Currey 
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a5ad67d5cb16..8f78e75965f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2764,7 +2764,7 @@
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings
 
-   nosmap  [X86]
+   nosmap  [X86,PPC]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.
 
-- 
2.19.1



[PATCH 2/5] powerpc/futex: KHRAP support for futex ops

2018-10-17 Thread Russell Currey
Wrap the futex operations in KHRAP locks and unlocks.
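Both futex helpers follow a pattern that can be sketched in plain C. Everything here is a hypothetical software stand-in: fake_amr, fake_unlock_user_access(), fake_lock_user_access() and the value of FAKE_AMR_LOCKED are not kernel APIs. The point is the ordering. The access_ok()-style check returns before anything is unlocked, and every path after the unlock restores the saved value. As in the futex hunks, the saved value is kept in an unsigned long, since the AMR is a 64-bit register.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical software copy of the AMR; the locked value is made up. */
#define FAKE_AMR_LOCKED 0x3UL

static unsigned long fake_amr = FAKE_AMR_LOCKED;

/* Open user access and hand back the previous state. */
static unsigned long fake_unlock_user_access(void)
{
	unsigned long old = fake_amr;
	fake_amr = 0;           /* 0 = user access allowed */
	return old;
}

/* Restore whatever state was saved by the matching unlock. */
static void fake_lock_user_access(unsigned long old)
{
	fake_amr = old;
}

/* Shape of futex_atomic_cmpxchg_inatomic() after the patch. */
static int fake_cmpxchg(unsigned int *uval, unsigned int *uaddr, bool addr_ok,
			unsigned int oldval, unsigned int newval)
{
	unsigned long amr;
	unsigned int prev;

	if (!addr_ok)
		return -EFAULT;         /* nothing unlocked yet, nothing to undo */

	amr = fake_unlock_user_access();
	prev = *uaddr;              /* stands in for the lwarx/stwcx. loop */
	if (prev == oldval)
		*uaddr = newval;
	*uval = prev;
	fake_lock_user_access(amr); /* every path past the unlock relocks */
	return 0;
}
```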

Signed-off-by: Russell Currey 
---
 arch/powerpc/include/asm/futex.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
index 94542776a62d..e0f4227cfd32 100644
--- a/arch/powerpc/include/asm/futex.h
+++ b/arch/powerpc/include/asm/futex.h
@@ -34,7 +34,9 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
u32 __user *uaddr)
 {
int oldval = 0, ret;
+   unsigned long amr;
 
+   amr = unlock_user_access();
pagefault_disable();
 
switch (op) {
@@ -62,6 +64,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
if (!ret)
*oval = oldval;
 
+   lock_user_access(amr);
return ret;
 }
 
@@ -71,10 +74,12 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 {
int ret = 0;
u32 prev;
+   unsigned long amr;
 
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;
 
+   amr = unlock_user_access();
 __asm__ __volatile__ (
 PPC_ATOMIC_ENTRY_BARRIER
 "1: lwarx   %1,0,%3 # futex_atomic_cmpxchg_inatomic\n\
@@ -95,6 +100,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 : "cc", "memory");
 
*uval = prev;
+   lock_user_access(amr);
 return ret;
 }
 
-- 
2.19.1



[PATCH 4/5] powerpc/64s: Disable KHRAP with nosmap option

2018-10-17 Thread Russell Currey
KHRAP is similar to SMAP on x86 platforms, so implement support for
the same kernel parameter.
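The option handling in the diff below comes down to two flags and a feature mask. This C sketch mirrors parse_nosmap() and the masking added to mmu_early_init_devtree(). The FAKE_ bit values and the struct are hypothetical stand-ins, not the kernel's definitions. In the real patch, disable_khrap also starts out true when CONFIG_PPC_RADIX_KHRAP is not set.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's MMU_FTR_* constants. */
#define FAKE_MMU_FTR_TYPE_RADIX  (1UL << 0)
#define FAKE_MMU_FTR_RADIX_KHRAP (1UL << 1)

struct boot_opts {
	bool disable_radix;  /* set by "disable_radix" */
	bool disable_khrap;  /* set by "nosmap" */
};

/* Like parse_nosmap(): ignore any argument and never report an error. */
static int fake_parse_nosmap(struct boot_opts *o, const char *arg)
{
	(void)arg;
	o->disable_khrap = true;
	return 0;
}

/* KHRAP relies on radix, so disabling radix also clears the KHRAP bit. */
static unsigned long apply_opts(unsigned long features, const struct boot_opts *o)
{
	if (o->disable_radix)
		features &= ~FAKE_MMU_FTR_TYPE_RADIX;
	if (o->disable_radix || o->disable_khrap)
		features &= ~FAKE_MMU_FTR_RADIX_KHRAP;
	return features;
}
```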

Signed-off-by: Russell Currey 
---
 arch/powerpc/mm/init_64.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 7a9886f98b0c..10182ce3b94f 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -312,6 +312,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 
 #ifdef CONFIG_PPC_BOOK3S_64
 static bool disable_radix = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT);
+static bool disable_khrap = !IS_ENABLED(CONFIG_PPC_RADIX_KHRAP);
 
 static int __init parse_disable_radix(char *p)
 {
@@ -328,6 +329,18 @@ static int __init parse_disable_radix(char *p)
 }
 early_param("disable_radix", parse_disable_radix);
 
+static int __init parse_nosmap(char *p)
+{
+   /*
+* nosmap is an existing option on x86 where it doesn't return -EINVAL
+* if the parameter is set to something, so even though it's different
+* to disable_radix, don't return an error for compatibility.
+*/
+   disable_khrap = true;
+   return 0;
+}
+early_param("nosmap", parse_nosmap);
+
 /*
  * If we're running under a hypervisor, we need to check the contents of
  * /chosen/ibm,architecture-vec-5 to see if the hypervisor is willing to do
@@ -381,6 +394,8 @@ void __init mmu_early_init_devtree(void)
/* Disable radix mode based on kernel command line. */
if (disable_radix)
cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+   if (disable_radix || disable_khrap)
+   cur_cpu_spec->mmu_features &= ~MMU_FTR_RADIX_KHRAP;
 
/*
 * Check /chosen/ibm,architecture-vec-5 if running as a guest.
-- 
2.19.1



[PATCH 1/5] powerpc/64s: Kernel Hypervisor Restricted Access Prevention

2018-10-17 Thread Russell Currey
Kernel Hypervisor Restricted Access Prevention (KHRAP) utilises a feature
of the Radix MMU which disallows read and write access to userspace
addresses.  By utilising this, the kernel is prevented from accessing
user data from outside of trusted paths that perform proper safety checks,
such as copy_{to/from}_user() and friends.

Userspace access is disabled from early boot and is only enabled when:

- exiting the kernel and entering userspace
- performing an operation like copy_{to/from}_user()
- context switching to a process that has access enabled

and similarly, access is disabled again when exiting userspace and entering
the kernel.
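The lifecycle described above can be simulated in a few lines of C. This is a toy model with hypothetical names (sim_amr, sim_enter_kernel() and so on). It borrows only two facts from the patch: UNLOCK_AMR writes 0 to the AMR, and LOCK_AMR writes AMR_LOCKED on exception entry from userspace. The locked value is illustrative, because the reg.h hunk that defines AMR_LOCKED is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SIM_AMR_LOCKED 0x3UL   /* illustrative value only */

static unsigned long sim_amr;

static void sim_enter_kernel(void) { sim_amr = SIM_AMR_LOCKED; } /* like LOCK_AMR */
static void sim_exit_to_user(void) { sim_amr = 0; }              /* like UNLOCK_AMR */

static unsigned long sim_unlock(void)
{
	unsigned long old = sim_amr;
	sim_amr = 0;
	return old;
}

static void sim_lock(unsigned long old) { sim_amr = old; }

/* Returns false where the real MMU would raise a fault. */
static bool sim_kernel_store_to_user(char *uptr, char v)
{
	if (sim_amr != 0)
		return false;
	*uptr = v;
	return true;
}

/* Trusted path: like copy_to_user(), open the window only for the copy. */
static bool sim_copy_to_user(char *udst, const char *src, size_t n)
{
	unsigned long amr = sim_unlock();
	bool ok = true;

	for (size_t i = 0; i < n && ok; i++)
		ok = sim_kernel_store_to_user(&udst[i], src[i]);
	sim_lock(amr);
	return ok;
}
```

A stray kernel store to a user buffer fails while locked, but the same store inside sim_copy_to_user() succeeds and leaves the AMR locked again afterwards.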

This feature has a slight performance impact, which I roughly measured at
4% (performing 1GB of 1-byte read()/write() syscalls), and it is gated
behind the CONFIG_PPC_RADIX_KHRAP option so performance-critical builds
can leave it out.

This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and
performing the following:

echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT

If KHRAP is enabled, this should send SIGSEGV to the thread.

Signed-off-by: Russell Currey 
---
More detailed benchmarks soon, there's more optimisations here as well.

 arch/powerpc/include/asm/exception-64s.h | 17 +++
 arch/powerpc/include/asm/mmu.h   |  7 +++
 arch/powerpc/include/asm/reg.h   |  1 +
 arch/powerpc/include/asm/uaccess.h   | 63 +---
 arch/powerpc/kernel/dt_cpu_ftrs.c|  4 ++
 arch/powerpc/kernel/entry_64.S   |  7 +++
 arch/powerpc/mm/fault.c  |  9 
 arch/powerpc/mm/pgtable-radix.c  |  2 +
 arch/powerpc/mm/pkeys.c  |  7 ++-
 arch/powerpc/platforms/Kconfig.cputype   | 15 ++
 10 files changed, 122 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..3b84a8050bae 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -240,6 +240,22 @@ BEGIN_FTR_SECTION_NESTED(941)  \
mtspr   SPRN_PPR,ra;\
 END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,941)
 
+#define LOCK_AMR(reg)  \
+BEGIN_MMU_FTR_SECTION_NESTED(69)   \
+   LOAD_REG_IMMEDIATE(reg,AMR_LOCKED); \
+   isync;  \
+   mtspr   SPRN_AMR,reg;   \
+   isync;  \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KHRAP,MMU_FTR_RADIX_KHRAP,69)
+
+#define UNLOCK_AMR(reg)    \
+BEGIN_MMU_FTR_SECTION_NESTED(420)  \
+   li  reg,0;  \
+   isync;  \
+   mtspr   SPRN_AMR,reg;   \
+   isync;  \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_KHRAP,MMU_FTR_RADIX_KHRAP,420)
+
 /*
  * Get an SPR into a register if the CPU has the given feature
  */
@@ -500,6 +516,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
beq 4f; /* if from kernel mode  */ \
ACCOUNT_CPU_USER_ENTRY(r13, r9, r10);  \
SAVE_PPR(area, r9);\
+   LOCK_AMR(r9);  \
 4: EXCEPTION_PROLOG_COMMON_2(area)\
EXCEPTION_PROLOG_COMMON_3(n)   \
ACCOUNT_STOLEN_TIME
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index eb20eb3b8fb0..504c8bfa2f9d 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -107,6 +107,10 @@
  */
 #define MMU_FTR_1T_SEGMENT ASM_CONST(0x4000)
 
+/* Supports KHRAP (key 0 controlling userspace addresses) on radix
+ */
+#define MMU_FTR_RADIX_KHRAP    ASM_CONST(0x8000)
+
 /* MMU feature bit sets for various CPUs */
 #define MMU_FTRS_DEFAULT_HPTE_ARCH_V2  \
MMU_FTR_HPTE_TABLE | MMU_FTR_PPCAS_ARCH_V2
@@ -143,6 +147,9 @@ enum {
MMU_FTR_KERNEL_RO | MMU_FTR_68_BIT_VA |
 #ifdef CONFIG_PPC_RADIX_MMU
MMU_FTR_TYPE_RADIX |
+#endif
+#ifdef CONFIG_PPC_RADIX_KHRAP
+   MMU_FTR_RADIX_KHRAP |
 #endif
0,
 };
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 640a4d818772..8aa3540fbedc 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -246,6 +246,7 @@
 #define 

[v9 7/7] dt-bindings: fsl-qdma: Add NXP Layerscape qDMA controller bindings

2018-10-17 Thread Peng Ma
Document the devicetree bindings for the NXP Layerscape qDMA controller,
which can be found on NXP QorIQ Layerscape SoCs.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
Reviewed-by: Rob Herring 
---
change in v9:
- add required properties such as interrupts, block-number, block-offset,
etc.

 Documentation/devicetree/bindings/dma/fsl-qdma.txt |   57 
 1 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/dma/fsl-qdma.txt

diff --git a/Documentation/devicetree/bindings/dma/fsl-qdma.txt b/Documentation/devicetree/bindings/dma/fsl-qdma.txt
new file mode 100644
index 000..283372a
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/fsl-qdma.txt
@@ -0,0 +1,57 @@
+NXP Layerscape SoC qDMA Controller
+==
+
+This device follows the generic DMA bindings defined in dma/dma.txt.
+
+Required properties:
+
+- compatible:  Must be one of
+"fsl,ls1021a-qdma": for LS1021A Board
+"fsl,ls1043a-qdma": for LS1043A Board
+"fsl,ls1046a-qdma": for LS1046A Board
+- reg: Should contain the register's base address and length.
+- interrupts:  Should contain a reference to the interrupt used by this
+   device.
+- interrupt-names: Should contain interrupt names:
+"qdma-queue0": the block0 interrupt
+"qdma-queue1": the block1 interrupt
+"qdma-queue2": the block2 interrupt
+"qdma-queue3": the block3 interrupt
+"qdma-error":  the error interrupt
+- fsl,dma-queues:  Should contain the number of queues supported.
+- dma-channels:    Number of DMA channels supported.
+- block-number:    The number of virtual blocks.
+- block-offset:    The offset between consecutive virtual blocks.
+- status-sizes:    The status queue size of each virtual block.
+- queue-sizes:     The command queue sizes of each virtual block, with one
+   entry per queue.
+
+Optional properties:
+
+- dma-channels:    Number of DMA channels supported by the controller.
+- big-endian:      If present, the registers and hardware scatter/gather
+   descriptors of the qDMA are big-endian; otherwise they are
+   little-endian.
+
+Examples:
+
+   qdma: dma-controller@839 {
+   compatible = "fsl,ls1021a-qdma";
+   reg = <0x0 0x8388000 0x0 0x1000>, /* Controller regs */
+ <0x0 0x8389000 0x0 0x1000>, /* Status regs */
+ <0x0 0x838a000 0x0 0x2000>; /* Block regs */
+   interrupts = ,
+,
+;
+   interrupt-names = "qdma-error",
+   "qdma-queue0", "qdma-queue1";
+   dma-channels = <8>;
+   block-number = <2>;
+   block-offset = <0x1000>;
+   fsl,dma-queues = <2>;
+   status-sizes = <64>;
+   queue-sizes = <64 64>;
+   big-endian;
+   };
+
+DMA clients must use the format described in the dma/dma.txt file.
-- 
1.7.1
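The `block-number` and `block-offset` properties in the binding above describe how the virtual blocks are laid out. The following is a sketch under an assumption the binding text does not spell out: block `i`'s registers are taken to start at the "Block regs" `reg` entry plus `i * block-offset`. Under that assumption, the ls1021a example would place its two blocks at 0x838a000 and 0x838b000. `qdma_block_base()` and `qdma_queue_irq_name()` are hypothetical helpers, not functions from the driver.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: base address of virtual block `idx`, assuming the
 * blocks sit at `block_offset` strides from the "Block regs" reg entry.
 * Returns 0 (used here as an "invalid" sentinel) if idx is out of range.
 */
static uint64_t qdma_block_base(uint64_t block_regs, uint32_t block_offset,
				uint32_t block_number, uint32_t idx)
{
	if (idx >= block_number)
		return 0;
	return block_regs + (uint64_t)block_offset * idx;
}

/* Builds the interrupt-names entry for block `idx`, e.g. "qdma-queue1". */
static int qdma_queue_irq_name(char *buf, size_t len, uint32_t idx)
{
	return snprintf(buf, len, "qdma-queue%u", (unsigned int)idx);
}
```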



[v9 6/7] arm64: dts: ls1046a: add qdma device tree nodes

2018-10-17 Thread Peng Ma
Add the qDMA device tree node for LS1046A devices.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
---
change in v9:
- add interrupts for each virtual block 
- add block-number
- add block-offset

 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index ef83786..2a48d9b 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -704,6 +704,27 @@
< 0 0 4  GIC_SPI 154 IRQ_TYPE_LEVEL_HIGH>;
};
 
+   qdma: dma-controller@838 {
+   compatible = "fsl,ls1046a-qdma", "fsl,ls1021a-qdma";
+   reg = <0x0 0x838 0x0 0x1000>, /* Controller regs */
+ <0x0 0x839 0x0 0x1>, /* Status regs */
+ <0x0 0x83a 0x0 0x4>; /* Block regs */
+   interrupts = <0 153 0x4>,
+<0 39 0x4>,
+<0 40 0x4>,
+<0 41 0x4>,
+<0 42 0x4>;
+   interrupt-names = "qdma-error", "qdma-queue0",
+   "qdma-queue1", "qdma-queue2", "qdma-queue3";
+   dma-channels = <8>;
+   block-number = <1>;
+   block-offset = <0x1>;
+   fsl,dma-queues = <2>;
+   status-sizes = <64>;
+   queue-sizes = <64 64>;
+   big-endian;
+   };
+
};
 
reserved-memory {
-- 
1.7.1



[v9 5/7] arm64: dts: ls1043a: add qdma device tree nodes

2018-10-17 Thread Peng Ma
Add the qDMA device tree node for LS1043A devices.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
---
change in v9:
- add interrupts for each virtual block 
- add block-number
- add block-offset

 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 7881e3d..e798c4c 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -734,6 +734,28 @@
< 0 0 3  0 156 0x4>,
< 0 0 4  0 157 0x4>;
};
+
+   qdma: dma-controller@838 {
+   compatible = "fsl,ls1021a-qdma", "fsl,ls1043a-qdma";
+   reg = <0x0 0x838 0x0 0x1000>, /* Controller regs */
+ <0x0 0x839 0x0 0x1>, /* Status regs */
+ <0x0 0x83a 0x0 0x4>; /* Block regs */
+   interrupts = <0 153 0x4>,
+<0 39 0x4>,
+<0 40 0x4>,
+<0 41 0x4>,
+<0 42 0x4>;
+   interrupt-names = "qdma-error", "qdma-queue0",
+   "qdma-queue1", "qdma-queue2", "qdma-queue3";
+   dma-channels = <8>;
+   block-number = <1>;
+   block-offset = <0x1>;
+   fsl,dma-queues = <2>;
+   status-sizes = <64>;
+   queue-sizes = <64 64>;
+   big-endian;
+   };
+
};
 
firmware {
-- 
1.7.1



[v9 3/7] dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs

2018-10-17 Thread Peng Ma
The NXP Queue DMA controller (qDMA) on Layerscape SoCs supports channel
virtualization by allowing DMA jobs to be enqueued into different
command queues.

Note that this module depends on NXP DPAA.

Signed-off-by: Wen He 
Signed-off-by: Jiaheng Fan 
Signed-off-by: Peng Ma 
---
change in v9:
- add multi-block support for each core's qDMA engine
- remove the remaining SG mode code to clean up the qDMA driver
- align the Scatter/Gather table to 64 bytes to fix an erratum
- add new internal functions to improve the structure of the driver
and do some other cleanups
- remove useless headers
- reformat some badly formatted code
- use dma_cookie_status() as the status function

 drivers/dma/Kconfig|   13 +
 drivers/dma/Makefile   |1 +
 drivers/dma/fsl-qdma.c | 1257 
 3 files changed, 1271 insertions(+), 0 deletions(-)
 create mode 100644 drivers/dma/fsl-qdma.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index dacf3f4..50e19d7 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -218,6 +218,19 @@ config FSL_EDMA
  multiplexing capability for DMA request sources(slot).
  This module can be found on Freescale Vybrid and LS-1 SoCs.
 
+config FSL_QDMA
+   tristate "NXP Layerscape qDMA engine support"
+   depends on ARM || ARM64
+   select DMA_ENGINE
+   select DMA_VIRTUAL_CHANNELS
+   select DMA_ENGINE_RAID
+   select ASYNC_TX_ENABLE_CHANNEL_SWITCH
+   help
+ Support the NXP Layerscape qDMA engine with command queue and legacy mode.
+ Channel virtualization is supported through enqueuing of DMA jobs to,
+ or dequeuing DMA jobs from, different work queues.
+ This module can be found on NXP Layerscape SoCs.
+
 config FSL_RAID
 tristate "Freescale RAID engine Support"
 depends on FSL_SOC && !ASYNC_TX_ENABLE_CHANNEL_SWITCH
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index c91702d..2d1b586 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -32,6 +32,7 @@ obj-$(CONFIG_DW_DMAC_CORE) += dw/
 obj-$(CONFIG_EP93XX_DMA) += ep93xx_dma.o
 obj-$(CONFIG_FSL_DMA) += fsldma.o
 obj-$(CONFIG_FSL_EDMA) += fsl-edma.o
+obj-$(CONFIG_FSL_QDMA) += fsl-qdma.o
 obj-$(CONFIG_FSL_RAID) += fsl_raid.o
 obj-$(CONFIG_HSU_DMA) += hsu/
 obj-$(CONFIG_IMG_MDC_DMA) += img-mdc-dma.o
diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c
new file mode 100644
index 000..404869e
--- /dev/null
+++ b/drivers/dma/fsl-qdma.c
@@ -0,0 +1,1257 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright 2018 NXP
+
+/*
+ * Driver for NXP Layerscape Queue Direct Memory Access Controller
+ *
+ * Author:
+ *  Wen He 
+ *  Jiaheng Fan 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "virt-dma.h"
+#include "fsldma.h"
+
+/* Register related definition */
+#define FSL_QDMA_DMR   0x0
+#define FSL_QDMA_DSR   0x4
+#define FSL_QDMA_DEIER 0xe00
+#define FSL_QDMA_DEDR  0xe04
+#define FSL_QDMA_DECFDW0R  0xe10
+#define FSL_QDMA_DECFDW1R  0xe14
+#define FSL_QDMA_DECFDW2R  0xe18
+#define FSL_QDMA_DECFDW3R  0xe1c
+#define FSL_QDMA_DECFQIDR  0xe30
+#define FSL_QDMA_DECBR 0xe34
+
+#define FSL_QDMA_BCQMR(x)  (0xc0 + 0x100 * (x))
+#define FSL_QDMA_BCQSR(x)  (0xc4 + 0x100 * (x))
+#define FSL_QDMA_BCQEDPA_SADDR(x)  (0xc8 + 0x100 * (x))
+#define FSL_QDMA_BCQDPA_SADDR(x)   (0xcc + 0x100 * (x))
+#define FSL_QDMA_BCQEEPA_SADDR(x)  (0xd0 + 0x100 * (x))
+#define FSL_QDMA_BCQEPA_SADDR(x)   (0xd4 + 0x100 * (x))
+#define FSL_QDMA_BCQIER(x) (0xe0 + 0x100 * (x))
+#define FSL_QDMA_BCQIDR(x) (0xe4 + 0x100 * (x))
+
+#define FSL_QDMA_SQDPAR    0x80c
+#define FSL_QDMA_SQEPAR    0x814
+#define FSL_QDMA_BSQMR 0x800
+#define FSL_QDMA_BSQSR 0x804
+#define FSL_QDMA_BSQICR    0x828
+#define FSL_QDMA_CQMR  0xa00
+#define FSL_QDMA_CQDSCR1   0xa08
+#define FSL_QDMA_CQDSCR2   0xa0c
+#define FSL_QDMA_CQIER 0xa10
+#define FSL_QDMA_CQEDR 0xa14
+#define FSL_QDMA_SQCCMR    0xa20
+
+/* Registers for bit and genmask */
+#define FSL_QDMA_CQIDR_SQT BIT(15)
+#define QDMA_CCDF_FOTMAT   BIT(29)
+#define QDMA_CCDF_SER  BIT(30)
+#define QDMA_SG_FIN    BIT(30)
+#define QDMA_SG_LEN_MASK   GENMASK(29, 0)
+#define QDMA_CCDF_MASK GENMASK(28, 20)
+
+#define FSL_QDMA_DEDR_CLEAR    GENMASK(31, 0)
+#define FSL_QDMA_BCQIDR_CLEAR  GENMASK(31, 0)
+#define FSL_QDMA_DEIER_CLEAR   GENMASK(31, 0)
+
+#define FSL_QDMA_BCQIER_CQTIE  BIT(15)
+#define FSL_QDMA_BCQIER_CQPEIE 
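The `FSL_QDMA_BCQ*(x)` macros above repeat each command queue's register set at a 0x100 stride. The sketch below copies two of those definitions verbatim and shows the resulting offsets. It assumes that `x` is the command-queue index. `qdma_bcqmr_addr()` is a hypothetical helper, not part of the driver.

```c
#include <stdint.h>

/* Copied from the driver above: per-queue registers repeat every 0x100. */
#define FSL_QDMA_BCQMR(x)	(0xc0 + 0x100 * (x))
#define FSL_QDMA_BCQSR(x)	(0xc4 + 0x100 * (x))

/* Hypothetical helper: absolute address of queue x's mode register. */
static uintptr_t qdma_bcqmr_addr(uintptr_t block_base, unsigned int x)
{
	return block_base + FSL_QDMA_BCQMR(x);
}
```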

[v9 4/7] arm: dts: ls1021a: add qdma device tree nodes

2018-10-17 Thread Peng Ma
Add the qDMA device tree node for LS1021A devices.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
---
change in v9:
- add interrupts for each virtual block 
- add block-number
- add block-offset

 arch/arm/boot/dts/ls1021a.dtsi |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
index f184905..0b910e7 100644
--- a/arch/arm/boot/dts/ls1021a.dtsi
+++ b/arch/arm/boot/dts/ls1021a.dtsi
@@ -806,5 +806,25 @@
#size-cells = <1>;
ranges = <0x0 0x0 0x1001 0x1>;
};
+
+   qdma: dma-controller@839 {
+   compatible = "fsl,ls1021a-qdma";
+   reg = <0x0 0x8388000 0x0 0x1000>, /* Controller regs */
+ <0x0 0x8389000 0x0 0x1000>, /* Status regs */
+ <0x0 0x838a000 0x0 0x2000>; /* Block regs */
+   interrupts = ,
+,
+;
+   interrupt-names = "qdma-error",
+   "qdma-queue0", "qdma-queue1";
+   dma-channels = <8>;
+   block-number = <1>;
+   block-offset = <0x1000>;
+   fsl,dma-queues = <2>;
+   status-sizes = <64>;
+   queue-sizes = <64 64>;
+   big-endian;
+   };
+
};
 };
-- 
1.7.1



[v9 2/7] dmaengine: fsldma: Adding macro FSL_DMA_IN/OUT implement for ARM platform

2018-10-17 Thread Peng Ma
This patch adds FSL_DMA_IN/OUT macro implementations for the ARM platform.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
---
change in v9:
- rewrite fsl_ioread64() and fsl_ioread64be() for better readability

 drivers/dma/fsldma.h |   59 +
 1 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
index 982845b..f635bc1 100644
--- a/drivers/dma/fsldma.h
+++ b/drivers/dma/fsldma.h
@@ -196,39 +196,60 @@ struct fsldma_chan {
 #define to_fsl_desc(lh) container_of(lh, struct fsl_desc_sw, node)
 #define tx_to_fsl_desc(tx) container_of(tx, struct fsl_desc_sw, async_tx)
 
+#ifdef CONFIG_PPC
+#define fsl_ioread32(p)    in_le32(p)
+#define fsl_ioread32be(p)  in_be32(p)
+#define fsl_iowrite32(v, p)    out_le32(p, v)
+#define fsl_iowrite32be(v, p)  out_be32(p, v)
+
 #ifndef __powerpc64__
-static u64 in_be64(const u64 __iomem *addr)
+static u64 fsl_ioread64(const u64 __iomem *addr)
 {
-   return ((u64)in_be32((u32 __iomem *)addr) << 32) |
-   (in_be32((u32 __iomem *)addr + 1));
+   const u32 __iomem *p = (const u32 __iomem *)addr;
+
+   return ((u64)in_le32(p + 1) << 32) | in_le32(p);
 }
 
-static void out_be64(u64 __iomem *addr, u64 val)
+static void fsl_iowrite64(u64 val, u64 __iomem *addr)
 {
-   out_be32((u32 __iomem *)addr, val >> 32);
-   out_be32((u32 __iomem *)addr + 1, (u32)val);
+   out_le32((u32 __iomem *)addr + 1, val >> 32);
+   out_le32((u32 __iomem *)addr, (u32)val);
 }
 
-/* There is no asm instructions for 64 bits reverse loads and stores */
-static u64 in_le64(const u64 __iomem *addr)
+static u64 fsl_ioread64be(const u64 __iomem *addr)
 {
-   return ((u64)in_le32((u32 __iomem *)addr + 1) << 32) |
-   (in_le32((u32 __iomem *)addr));
+   const u32 __iomem *p = (const u32 __iomem *)addr;
+
+   return ((u64)in_be32(p) << 32) | in_be32(p + 1);
 }
 
-static void out_le64(u64 __iomem *addr, u64 val)
+static void fsl_iowrite64be(u64 val, u64 __iomem *addr)
 {
-   out_le32((u32 __iomem *)addr + 1, val >> 32);
-   out_le32((u32 __iomem *)addr, (u32)val);
+   out_be32((u32 __iomem *)addr, val >> 32);
+   out_be32((u32 __iomem *)addr + 1, (u32)val);
 }
 #endif
+#endif
 
-#define FSL_DMA_IN(fsl_chan, addr, width)  \
-   (((fsl_chan)->feature & FSL_DMA_BIG_ENDIAN) ?   \
-   in_be##width(addr) : in_le##width(addr))
-#define FSL_DMA_OUT(fsl_chan, addr, val, width)\
-   (((fsl_chan)->feature & FSL_DMA_BIG_ENDIAN) ?   \
-   out_be##width(addr, val) : out_le##width(addr, val))
+#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
+#define fsl_ioread32(p)    ioread32(p)
+#define fsl_ioread32be(p)  ioread32be(p)
+#define fsl_iowrite32(v, p)    iowrite32(v, p)
+#define fsl_iowrite32be(v, p)  iowrite32be(v, p)
+#define fsl_ioread64(p)    ioread64(p)
+#define fsl_ioread64be(p)  ioread64be(p)
+#define fsl_iowrite64(v, p)    iowrite64(v, p)
+#define fsl_iowrite64be(v, p)  iowrite64be(v, p)
+#endif
+
+#define FSL_DMA_IN(fsl_dma, addr, width)   \
+   (((fsl_dma)->feature & FSL_DMA_BIG_ENDIAN) ?\
+   fsl_ioread##width##be(addr) : fsl_ioread##width(addr))
+
+#define FSL_DMA_OUT(fsl_dma, addr, val, width) \
+   (((fsl_dma)->feature & FSL_DMA_BIG_ENDIAN) ?\
+   fsl_iowrite##width##be(val, addr) : fsl_iowrite \
+   ##width(val, addr))
 
 #define DMA_TO_CPU(fsl_chan, d, width) \
(((fsl_chan)->feature & FSL_DMA_BIG_ENDIAN) ?   \
-- 
1.7.1
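On 32-bit PowerPC there is no single 64-bit accessor, so the helpers above build a 64-bit access from two 32-bit accesses. The low and high words are swapped depending on endianness. The sketch below models the read side on an ordinary byte buffer instead of `__iomem`. `read_le32()`, `read_be32()`, `model_ioread64()` and `model_ioread64be()` are hypothetical stand-ins for the kernel accessors. The widening cast before the shift matters, as in the code the patch removes: shifting a 32-bit value left by 32 is undefined in C.

```c
#include <stdint.h>

/* Hypothetical byte-level readers standing in for in_le32()/in_be32(). */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint32_t read_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Little-endian register: low word at offset 0, high word at offset 4. */
static uint64_t model_ioread64(const uint8_t *p)
{
	uint32_t lo = read_le32(p);
	uint32_t hi = read_le32(p + 4);

	return (uint64_t)hi << 32 | lo;
}

/* Big-endian register: high word at offset 0, low word at offset 4. */
static uint64_t model_ioread64be(const uint8_t *p)
{
	uint32_t hi = read_be32(p);
	uint32_t lo = read_be32(p + 4);

	return (uint64_t)hi << 32 | lo;
}
```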



[v9 1/7] dmaengine: fsldma: Replace DMA_IN/OUT by FSL_DMA_IN/OUT

2018-10-17 Thread Peng Ma
From: Wen He 

This patch renames the DMA_IN/OUT macros to FSL_DMA_IN/OUT so that they
can serve as standard accessor macros for NXP DMA drivers.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
---
change in v9:
- no changes

 drivers/dma/fsldma.c |   16 
 drivers/dma/fsldma.h |4 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 1117b51..39871e0 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -53,42 +53,42 @@
 
 static void set_sr(struct fsldma_chan *chan, u32 val)
 {
-   DMA_OUT(chan, &chan->regs->sr, val, 32);
+   FSL_DMA_OUT(chan, &chan->regs->sr, val, 32);
 }
 
 static u32 get_sr(struct fsldma_chan *chan)
 {
-   return DMA_IN(chan, &chan->regs->sr, 32);
+   return FSL_DMA_IN(chan, &chan->regs->sr, 32);
 }
 
 static void set_mr(struct fsldma_chan *chan, u32 val)
 {
-   DMA_OUT(chan, &chan->regs->mr, val, 32);
+   FSL_DMA_OUT(chan, &chan->regs->mr, val, 32);
 }
 
 static u32 get_mr(struct fsldma_chan *chan)
 {
-   return DMA_IN(chan, &chan->regs->mr, 32);
+   return FSL_DMA_IN(chan, &chan->regs->mr, 32);
 }
 
 static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr)
 {
-   DMA_OUT(chan, &chan->regs->cdar, addr | FSL_DMA_SNEN, 64);
+   FSL_DMA_OUT(chan, &chan->regs->cdar, addr | FSL_DMA_SNEN, 64);
 }
 
 static dma_addr_t get_cdar(struct fsldma_chan *chan)
 {
-   return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
+   return FSL_DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
 }
 
 static void set_bcr(struct fsldma_chan *chan, u32 val)
 {
-   DMA_OUT(chan, &chan->regs->bcr, val, 32);
+   FSL_DMA_OUT(chan, &chan->regs->bcr, val, 32);
 }
 
 static u32 get_bcr(struct fsldma_chan *chan)
 {
-   return DMA_IN(chan, &chan->regs->bcr, 32);
+   return FSL_DMA_IN(chan, &chan->regs->bcr, 32);
 }
 
 /*
diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
index 4787d48..982845b 100644
--- a/drivers/dma/fsldma.h
+++ b/drivers/dma/fsldma.h
@@ -223,10 +223,10 @@ static void out_le64(u64 __iomem *addr, u64 val)
 }
 #endif
 
-#define DMA_IN(fsl_chan, addr, width)  \
+#define FSL_DMA_IN(fsl_chan, addr, width)  \
(((fsl_chan)->feature & FSL_DMA_BIG_ENDIAN) ?   \
in_be##width(addr) : in_le##width(addr))
-#define DMA_OUT(fsl_chan, addr, val, width)\
+#define FSL_DMA_OUT(fsl_chan, addr, val, width)\
(((fsl_chan)->feature & FSL_DMA_BIG_ENDIAN) ?   \
out_be##width(addr, val) : out_le##width(addr, val))
 
-- 
1.7.1



Re: Crash on FSL Book3E due to pte_pgprot()? (was Re: [PATCH v3 12/24] powerpc/mm: use pte helpers in generic code)

2018-10-17 Thread Christophe Leroy




On 10/17/2018 12:59 AM, Michael Ellerman wrote:

Christophe Leroy  writes:


Get rid of platform specific _PAGE_ in powerpc common code and
use helpers instead.

mm/dump_linuxpagetables.c will be handled separately

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/book3s/32/pgtable.h |  9 +++--
  arch/powerpc/include/asm/nohash/32/pgtable.h | 12 
  arch/powerpc/include/asm/nohash/pgtable.h|  3 +--
  arch/powerpc/mm/pgtable.c| 21 +++--
  arch/powerpc/mm/pgtable_32.c | 15 ---
  arch/powerpc/mm/pgtable_64.c | 14 +++---
  arch/powerpc/xmon/xmon.c | 12 +++-
  7 files changed, 41 insertions(+), 45 deletions(-)


So turns out this patch *also* breaks my p5020ds :)

Even with patch 4 merged, see next.

It's the same crash:

   pcieport 2000:00:00.0: AER enabled with IRQ 480
   Unable to handle kernel paging request for data at address 0x88008008
   Faulting instruction address: 0xc00192cc
   Oops: Kernel access of bad area, sig: 11 [#1]
   BE SMP NR_CPUS=24 CoreNet Generic
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc7x-g98c847323b3a #1
   NIP:  c00192cc LR: c05d0f9c CTR: 0010
   REGS: c000f31bb400 TRAP: 0300   Not tainted  (4.19.0-rc3-gcc7x-g98c847323b3a)
   MSR:  80029000   CR: 24000224  XER: 
   DEAR: 88008008 ESR: 0080 IRQMASK: 0
   GPR00: c05d0f84 c000f31bb688 c117dc00 88008008
   GPR04:  0040 0ffbff241010 c000f31b8000
   GPR08:  0010  c12d4710
   GPR12: 84000422 c12ff000 c0002774 
   GPR16:    
   GPR20:    
   GPR24:   88008008 c00089a8
   GPR28: c000f3576400 c000f3576410 0040 c12ecc98
   NIP [c00192cc] ._memset_io+0x6c/0x9c
   LR [c05d0f9c] .fsl_qman_probe+0x198/0x928
   Call Trace:
   [c000f31bb688] [c05d0f84] .fsl_qman_probe+0x180/0x928 (unreliable)
   [c000f31bb728] [c06432ec] .platform_drv_probe+0x60/0xb4
   [c000f31bb7a8] [c064083c] .really_probe+0x294/0x35c
   [c000f31bb848] [c0640d2c] .__driver_attach+0x148/0x14c
   [c000f31bb8d8] [c063d7dc] .bus_for_each_dev+0xb0/0x118
   [c000f31bb988] [c063ff28] .driver_attach+0x34/0x4c
   [c000f31bba08] [c063f648] .bus_add_driver+0x174/0x2bc
   [c000f31bbaa8] [c06418bc] .driver_register+0x90/0x180
   [c000f31bbb28] [c0643270] .__platform_driver_register+0x60/0x7c
   [c000f31bbba8] [c0ee2a70] .fsl_qman_driver_init+0x24/0x38
   [c000f31bbc18] [c00023fc] .do_one_initcall+0x64/0x2b8
   [c000f31bbcf8] [c0e9f480] .kernel_init_freeable+0x3a8/0x494
   [c000f31bbda8] [c0002798] .kernel_init+0x24/0x148
   [c000f31bbe28] [c9e8] .ret_from_kernel_thread+0x58/0x70
   Instruction dump:
   4e800020 2ba50003 40dd003c 3925fffc 5488402e 7929f082 7d082378 39290001
   550a801e 7d2903a6 7d4a4378 794a0020 <9143> 38630004 4200fff8 70a50003


Comparing a working vs broken kernel, it seems to boil down to the fact
that we're filtering out more PTE bits now that we use pte_pgprot() in
ioremap_prot().

With the old code we get:
   ioremap_prot: addr 0xff80 flags 0x241215
   ioremap_prot: addr 0xff80 flags 0x241215
   map_kernel_page: ea 0x88008008 pa 0xff80 pte 0xff800241215


And now we get:
   ioremap_prot: addr 0xff80 flags 0x241215 pte 0x241215
   ioremap_prot: addr 0xff80 pte 0x241215
   ioremap_prot: addr 0xff80 prot 0x241014
   map_kernel_page: ea 0x88008008 pa 0xff80 pte 0xff800241014

So we're losing 0x201, which for nohash book3e is:

   #define _PAGE_PRESENT    0x01 /* software: pte contains a translation */
   #define _PAGE_PSIZE_4K   0x000200
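The lost bits can be checked directly from the two flag values in the log above (0x241215 before, 0x241014 after). The following is a minimal sketch. The `PTE_*` names are shortened copies of the book3e values quoted above, and `lost_bits()` is a hypothetical helper.

```c
#define PTE_PRESENT  0x000001UL /* _PAGE_PRESENT on nohash book3e */
#define PTE_PSIZE_4K 0x000200UL /* _PAGE_PSIZE_4K */

/* Bits set in the old PTE flags but missing from the new prot value. */
static unsigned long lost_bits(unsigned long before, unsigned long after)
{
	return before & ~after;
}
```

Here `lost_bits(0x241215, 0x241014)` yields 0x201, which is exactly `PTE_PRESENT | PTE_PSIZE_4K`.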


I haven't worked out if it's one or both of those that matter.


I believe at least the missing _PAGE_PRESENT is an issue.


The question is what's the right way to fix it? Should pte_pgprot() not
be filtering those bits out on book3e?


I think we should not use pte_pgprot() for that, then. What about the
fix below?


Christophe

From: Christophe Leroy 
Date: Wed, 17 Oct 2018 05:56:25 +
Subject: [PATCH] powerpc/mm: don't use pte_pgprot() in ioremap_prot()

pte_pgprot() filters out some required flags like _PAGE_PRESENT.

This patch replaces pte_pgprot() with __pgprot(pte_val())
in ioremap_prot().

Fixes: 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable_32.c | 3