Re: [PATCH v5 0/3] LLVM/Clang fixes for a few defconfigs

2019-11-27 Thread Nathan Chancellor
On Thu, Nov 28, 2019 at 03:59:07PM +1100, Michael Ellerman wrote:
> Nick Desaulniers  writes:
> > Hi Michael,
> > Do you have feedback for Nathan? Rebasing these patches is becoming a
> > nuisance for our CI, and we would like to keep building PPC w/ Clang.
> 
> Sorry just lost in the flood of patches.
> 
> Merged now.
> 
> cheers

Thank you very much for picking them up :)

Cheers,
Nathan


Re: [PATCH v3 4/8] powerpc/vdso32: inline __get_datapage()

2019-11-27 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 22/11/2019 à 07:38, Michael Ellerman a écrit :
>> Michael Ellerman  writes:
>>> Christophe Leroy  writes:
 __get_datapage() is only a few instructions to retrieve the
 address of the page where the kernel stores data to the VDSO.

 By inlining this function into its users, a bl/blr pair and
 a mflr/mtlr pair are avoided, plus a few reg moves.

 The improvement is noticeable (about 55 nsec/call on an 8xx)

 vdsotest before the patch:
 gettimeofday:vdso: 731 nsec/call
 clock-gettime-realtime-coarse:vdso: 668 nsec/call
 clock-gettime-monotonic-coarse:vdso: 745 nsec/call

 vdsotest after the patch:
 gettimeofday:vdso: 677 nsec/call
 clock-gettime-realtime-coarse:vdso: 613 nsec/call
 clock-gettime-monotonic-coarse:vdso: 690 nsec/call

 Signed-off-by: Christophe Leroy 
>>>
>>> This doesn't build with gcc 4.6.3:
>>>
>>>/linux/arch/powerpc/kernel/vdso32/gettimeofday.S: Assembler messages:
>>>/linux/arch/powerpc/kernel/vdso32/gettimeofday.S:41: Error: unsupported relocation against __kernel_datapage_offset
>>>/linux/arch/powerpc/kernel/vdso32/gettimeofday.S:86: Error: unsupported relocation against __kernel_datapage_offset
>>>/linux/arch/powerpc/kernel/vdso32/gettimeofday.S:213: Error: unsupported relocation against __kernel_datapage_offset
>>>/linux/arch/powerpc/kernel/vdso32/gettimeofday.S:247: Error: unsupported relocation against __kernel_datapage_offset
>>>make[4]: *** [arch/powerpc/kernel/vdso32/gettimeofday.o] Error 1
>> 
>> Actually I guess it's binutils, which is v2.22 in this case.
>> 
>> Needed this:
>> 
>> diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
>> b/arch/powerpc/include/asm/vdso_datapage.h
>> index 12785f72f17d..0048db347ddf 100644
>> --- a/arch/powerpc/include/asm/vdso_datapage.h
>> +++ b/arch/powerpc/include/asm/vdso_datapage.h
>> @@ -117,7 +117,7 @@ extern struct vdso_data *vdso_data;
>>   .macro get_datapage ptr, tmp
>>  bcl 20, 31, .+4
>>  mflr\ptr
>> -addi\ptr, \ptr, __kernel_datapage_offset - (.-4)
>> +addi\ptr, \ptr, (__kernel_datapage_offset - (.-4))@l
>>  lwz \tmp, 0(\ptr)
>>  add \ptr, \tmp, \ptr
>>   .endm
>> 
>
> Are you still planning to get this series merged? Do you need any
> help / rebase / re-spin?

Not sure. I'll possibly send a 2nd pull request next week with it
included.

cheers


Re: [PATCH v11 0/7] KVM: PPC: Driver to manage pages of secure guest

2019-11-27 Thread Bharata B Rao
On Mon, Nov 25, 2019 at 08:36:24AM +0530, Bharata B Rao wrote:
> Hi,
> 
> This is the next version of the patchset that adds required support
> in the KVM hypervisor to run secure guests on PEF-enabled POWER platforms.
> 

Here is a fix for the issue Hugh identified with the usage of ksm_madvise()
in this patchset. It applies on top of this patchset.


From 8a4d769bf4c61f921c79ce68923be3c403bd5862 Mon Sep 17 00:00:00 2001
From: Bharata B Rao 
Date: Thu, 28 Nov 2019 09:31:54 +0530
Subject: [PATCH 1/1] KVM: PPC: Book3S HV: Take write mmap_sem when calling
 ksm_madvise

In order to prevent the device private pages (that correspond to
pages of the secure guest) from participating in KSM merging, H_SVM_PAGE_IN
calls ksm_madvise() under the read version of mmap_sem. However, ksm_madvise()
needs to be called under the write lock; fix this.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 29 -
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index f24ac3cfb34c..2de264fc3156 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -46,11 +46,10 @@
  *
  * Locking order
  *
- * 1. srcu_read_lock(&kvm->srcu) - Protects KVM memslots
- * 2. down_read(&kvm->mm->mmap_sem) - find_vma, migrate_vma_pages and helpers
- * 3. mutex_lock(&kvm->arch.uvmem_lock) - protects read/writes to uvmem slots
- *   thus acting as sync-points
- *   for page-in/out
+ * 1. kvm->srcu - Protects KVM memslots
+ * 2. kvm->mm->mmap_sem - find_vma, migrate_vma_pages and helpers, ksm_madvise
+ * 3. kvm->arch.uvmem_lock - protects read/writes to uvmem slots thus acting
+ *  as sync-points for page-in/out
  */
 
 /*
@@ -344,7 +343,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
 static int
 kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, unsigned long gpa, struct kvm *kvm,
-  unsigned long page_shift)
+  unsigned long page_shift, bool *downgrade)
 {
unsigned long src_pfn, dst_pfn = 0;
struct migrate_vma mig;
@@ -360,8 +359,15 @@ kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned 
long start,
	mig.src = &src_pfn;
	mig.dst = &dst_pfn;
 
+   /*
+* We come here with mmap_sem write lock held just for
+* ksm_madvise(), otherwise we only need read mmap_sem.
+* Hence downgrade to read lock once ksm_madvise() is done.
+*/
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
  MADV_UNMERGEABLE, &vma->vm_flags);
+   downgrade_write(&kvm->mm->mmap_sem);
+   *downgrade = true;
if (ret)
return ret;
 
@@ -456,6 +462,7 @@ unsigned long
 kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 unsigned long flags, unsigned long page_shift)
 {
+   bool downgrade = false;
unsigned long start, end;
struct vm_area_struct *vma;
int srcu_idx;
@@ -476,7 +483,7 @@ kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 
ret = H_PARAMETER;
	srcu_idx = srcu_read_lock(&kvm->srcu);
-   down_read(&kvm->mm->mmap_sem);
+   down_write(&kvm->mm->mmap_sem);
 
start = gfn_to_hva(kvm, gfn);
if (kvm_is_error_hva(start))
@@ -492,12 +499,16 @@ kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
if (!vma || vma->vm_start > start || vma->vm_end < end)
goto out_unlock;
 
-   if (!kvmppc_svm_page_in(vma, start, end, gpa, kvm, page_shift))
+   if (!kvmppc_svm_page_in(vma, start, end, gpa, kvm, page_shift,
+   &downgrade))
ret = H_SUCCESS;
 out_unlock:
	mutex_unlock(&kvm->arch.uvmem_lock);
 out:
-   up_read(&kvm->mm->mmap_sem);
+   if (downgrade)
+   up_read(&kvm->mm->mmap_sem);
+   else
+   up_write(&kvm->mm->mmap_sem);
	srcu_read_unlock(&kvm->srcu, srcu_idx);
return ret;
 }
-- 
2.21.0
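For readers following the locking change, the write-then-downgrade control flow above can be modeled in plain userspace C. This is a toy sketch, not kernel code: the "semaphore" only tracks which mode it is held in, and the `toy_` names are hypothetical stand-ins for the kernel primitives used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for mmap_sem: records whether it is held for read or
 * write, so the control flow of the patch can be exercised. */
enum lock_state { UNLOCKED, READ_HELD, WRITE_HELD };

struct toy_sem { enum lock_state state; };

static void toy_down_write(struct toy_sem *s)
{
	assert(s->state == UNLOCKED);
	s->state = WRITE_HELD;
}

static void toy_downgrade_write(struct toy_sem *s)
{
	assert(s->state == WRITE_HELD);
	s->state = READ_HELD;
}

static void toy_up_read(struct toy_sem *s)
{
	assert(s->state == READ_HELD);
	s->state = UNLOCKED;
}

static void toy_up_write(struct toy_sem *s)
{
	assert(s->state == WRITE_HELD);
	s->state = UNLOCKED;
}

/* Mirrors kvmppc_svm_page_in(): only the ksm_madvise() step needs the
 * write lock, so the lock is downgraded right after it and the caller
 * is told so via *downgrade. */
static int toy_page_in(struct toy_sem *sem, bool *downgrade, int madvise_ret)
{
	int ret = madvise_ret;		/* stands in for ksm_madvise() */

	toy_downgrade_write(sem);	/* the rest only needs the read lock */
	*downgrade = true;
	return ret;
}

/* Mirrors kvmppc_h_svm_page_in(): release according to the flag. */
static enum lock_state toy_h_page_in(int madvise_ret)
{
	struct toy_sem sem = { UNLOCKED };
	bool downgrade = false;

	toy_down_write(&sem);
	toy_page_in(&sem, &downgrade, madvise_ret);
	if (downgrade)
		toy_up_read(&sem);
	else
		toy_up_write(&sem);
	return sem.state;
}
```

The flag exists because once toy_page_in() has downgraded, the caller must pair the release with up_read() rather than up_write(); the model asserts on any mismatched pairing.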



Re: [PATCH v5 0/3] LLVM/Clang fixes for a few defconfigs

2019-11-27 Thread Michael Ellerman
Nick Desaulniers  writes:
> Hi Michael,
> Do you have feedback for Nathan? Rebasing these patches is becoming a
> nuisance for our CI, and we would like to keep building PPC w/ Clang.

Sorry just lost in the flood of patches.

Merged now.

cheers

> On Mon, Nov 18, 2019 at 8:57 PM Nathan Chancellor
>  wrote:
>>
>> Hi all,
>>
>> This series includes a set of fixes for LLVM/Clang when building
>> a few defconfigs (powernv, ppc44x, and pseries are the ones that our
>> CI configuration tests [1]). The first patch fixes pseries_defconfig,
>> which has never worked in mainline. The second and third patches fix
>> issues with all of these configs due to internal changes to LLVM, which
>> point out issues with the kernel.
>>
>> These have been broken since July/August, it would be nice to get these
>> reviewed and applied. Please let me know what I can do to get these
>> applied soon so we can stop applying them out of tree.
>>
>> [1]: https://github.com/ClangBuiltLinux/continuous-integration
>>
>> Previous versions:
>>
>> v3: 
>> https://lore.kernel.org/lkml/20190911182049.77853-1-natechancel...@gmail.com/
>>
>> v4: 
>> https://lore.kernel.org/lkml/20191014025101.18567-1-natechancel...@gmail.com/
>>
>> Cheers,
>> Nathan
>>
>>
>
>
> -- 
> Thanks,
> ~Nick Desaulniers


Re: [Very RFC 45/46] powernv/pci: Remove requirement for a pdn in config accessors

2019-11-27 Thread Alexey Kardashevskiy



On 20/11/2019 12:28, Oliver O'Halloran wrote:
> :toot:
> 
> Signed-off-by: Oliver O'Halloran 


Squash it into 26/46 "powernv/pci: Remove pdn from
pnv_pci_cfg_{read|write}". Thanks,


> ---
>  arch/powerpc/platforms/powernv/pci.c | 10 --
>  1 file changed, 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci.c 
> b/arch/powerpc/platforms/powernv/pci.c
> index 0eeea8652426..6383dcfec606 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -750,17 +750,12 @@ static int pnv_pci_read_config(struct pci_bus *bus,
>  unsigned int devfn,
>  int where, int size, u32 *val)
>  {
> - struct pci_dn *pdn;
>   struct pnv_phb *phb = pci_bus_to_pnvhb(bus);
>   u16 bdfn = bus->number << 8 | devfn;
>   struct eeh_dev *edev;
>   int ret;
>  
>   *val = 0x;
> - pdn = pci_get_pdn_by_devfn(bus, devfn);
> - if (!pdn)
> - return PCIBIOS_DEVICE_NOT_FOUND;
> -
>   edev = pnv_eeh_find_edev(phb, bdfn);
>   if (!pnv_eeh_pre_cfg_check(edev))
>   return PCIBIOS_DEVICE_NOT_FOUND;
> @@ -781,16 +776,11 @@ static int pnv_pci_write_config(struct pci_bus *bus,
>   unsigned int devfn,
>   int where, int size, u32 val)
>  {
> - struct pci_dn *pdn;
>   struct pnv_phb *phb = pci_bus_to_pnvhb(bus);
>   u16 bdfn = bus->number << 8 | devfn;
>   struct eeh_dev *edev;
>   int ret;
>  
> - pdn = pci_get_pdn_by_devfn(bus, devfn);
> - if (!pdn)
> - return PCIBIOS_DEVICE_NOT_FOUND;
> -
>   edev = pnv_eeh_find_edev(phb, bdfn);
>   if (!pnv_eeh_pre_cfg_check(edev))
>   return PCIBIOS_DEVICE_NOT_FOUND;
> 

-- 
Alexey


Re: [PATCH 1/1] powerpc/kvm/book3s: Fixes possible 'use after release' of kvm

2019-11-27 Thread Paul Mackerras
On Tue, Nov 26, 2019 at 02:52:12PM -0300, Leonardo Bras wrote:
> Fixes a possible 'use after free' of the kvm variable.
> It calls mutex_unlock(&kvm->lock) after possibly freeing the variable
> with kvm_put_kvm(kvm).

Comments below...

> diff --git a/arch/powerpc/kvm/book3s_64_vio.c 
> b/arch/powerpc/kvm/book3s_64_vio.c
> index 5834db0a54c6..a402ead833b6 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -316,14 +316,13 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  
>   if (ret >= 0)
>   list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
> - else
> - kvm_put_kvm(kvm);
>  
>   mutex_unlock(&kvm->lock);
>  
>   if (ret >= 0)
>   return ret;
>  
> + kvm_put_kvm(kvm);

There isn't a potential use-after-free here.  We are relying on the
property that the release function (kvm_vm_release) cannot be called
in parallel with this function.  The reason is that this function
(kvm_vm_ioctl_create_spapr_tce) is handling an ioctl on a kvm VM file
descriptor.  That means that a userspace process has the file
descriptor still open.  The code that implements the close() system
call makes sure that no thread is still executing inside any system
call that is using the same file descriptor before calling the file
descriptor's release function (in this case, kvm_vm_release).  That
means that this kvm_put_kvm() call here cannot make the reference
count go to zero.
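Paul's argument can be illustrated with a toy userspace refcount model (hypothetical code, not the real kvm implementation): the open VM fd contributes one reference for the whole duration of the ioctl, so the balanced get/put inside the ioctl can never be the final put.

```c
#include <assert.h>

/* Toy model of kvm refcounting. The open VM file descriptor holds one
 * reference, and close() cannot run while an ioctl on that fd is still
 * executing, so that reference is pinned for the whole ioctl. */
struct toy_kvm {
	int users;	/* reference count */
	int released;	/* set when the last reference is dropped */
};

static void toy_kvm_get(struct toy_kvm *kvm)
{
	kvm->users++;
}

static void toy_kvm_put(struct toy_kvm *kvm)
{
	if (--kvm->users == 0)
		kvm->released = 1;	/* the real code would free kvm here */
}

/* Mirrors the failure path of kvm_vm_ioctl_create_spapr_tce(): take a
 * reference for the would-be new fd, fail, and drop it again. Because
 * the caller's fd reference is still held, the count cannot hit zero. */
static void toy_failing_ioctl(struct toy_kvm *kvm)
{
	toy_kvm_get(kvm);
	/* ... anon fd creation fails ... */
	toy_kvm_put(kvm);
}
```

Moving the put around inside the ioctl (as the patch under discussion does) therefore changes readability, not correctness.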

>   kfree(stt);
>   fail_acct:
>   account_locked_vm(current->mm, kvmppc_stt_pages(npages), false);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 13efc291b1c7..f37089b60d09 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2744,10 +2744,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, 
> u32 id)
>   /* Now it's all set up, let userspace reach it */
>   kvm_get_kvm(kvm);
>   r = create_vcpu_fd(vcpu);
> - if (r < 0) {
> - kvm_put_kvm(kvm);
> + if (r < 0)
>   goto unlock_vcpu_destroy;
> - }
>  
>   kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
>  
> @@ -2771,6 +2769,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, 
> u32 id)
>   mutex_lock(&kvm->lock);
>   kvm->created_vcpus--;
>   mutex_unlock(&kvm->lock);
> + if (r < 0)
> + kvm_put_kvm(kvm);
>   return r;
>  }

Once again we are inside an ioctl on the kvm VM file descriptor, so
the reference count cannot go to zero.

> @@ -3183,10 +3183,10 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
>   kvm_get_kvm(kvm);
>   ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR | 
> O_CLOEXEC);
>   if (ret < 0) {
> - kvm_put_kvm(kvm);
>   mutex_lock(&kvm->lock);
>   list_del(&dev->vm_node);
>   mutex_unlock(&kvm->lock);
> + kvm_put_kvm(kvm);
>   ops->destroy(dev);
>   return ret;
>   }

Same again here.

Paul.


Re: [Very RFC 44/46] powerpc/pci: Don't set pdn->pe_number when applying the weird P8 NVLink PE hack

2019-11-27 Thread Alexey Kardashevskiy



On 20/11/2019 12:28, Oliver O'Halloran wrote:
> P8 needs to shove four GPUs into three PEs for $reasons. Remove the
> pdn->pe_assignment done there since we just use the pe_rmap[] now.


Reviewed-by: Alexey Kardashevskiy 




> 
> Signed-off-by: Oliver O'Halloran 
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 2a9201306543..eceff27357e5 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1183,7 +1183,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct 
> pci_dev *npu_pdev)
>   long rid;
>   struct pnv_ioda_pe *pe;
>   struct pci_dev *gpu_pdev;
> - struct pci_dn *npu_pdn;
>   struct pnv_phb *phb = pci_bus_to_pnvhb(npu_pdev->bus);
>  
>   /*
> @@ -1210,9 +1209,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct 
> pci_dev *npu_pdev)
>   dev_info(&npu_pdev->dev,
>   "Associating to existing PE %x\n", pe_num);
>   pci_dev_get(npu_pdev);
> - npu_pdn = pci_get_pdn(npu_pdev);
> - rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
> - npu_pdn->pe_number = pe_num;
> +
> + rid = npu_pdev->bus->number << 8 | npu_pdev->devfn;
>   phb->ioda.pe_rmap[rid] = pe->pe_number;
>  
>   /* Map the PE to this link */
> 

-- 
Alexey


Re: [Very RFC 43/46] powernv/pci: Do not set pdn->pe_number for NPU/CAPI devices

2019-11-27 Thread Alexey Kardashevskiy
cc: Greg.


On 20/11/2019 12:28, Oliver O'Halloran wrote:
> The only thing we need the pdn for in this function is setting the pe_number
> field, which we don't use anymore. Fix the weird refcounting behaviour while
> we're here.
> 
> Signed-off-by: Oliver O'Halloran 
> ---
> Either Fred, or Reza also fixed this in some patch lately and that'll 
> probably get
> merged before this one does.
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 27 +--
>  1 file changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 45d940730c30..2a9201306543 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1066,16 +1066,13 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
> *dev, int offset)
>  static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>  {
>   struct pnv_phb *phb = pci_bus_to_pnvhb(dev->bus);
> - struct pci_dn *pdn = pci_get_pdn(dev);
> - struct pnv_ioda_pe *pe;
> + struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
>  
> - if (!pdn) {
> - pr_err("%s: Device tree node not associated properly\n",
> -pci_name(dev));
> + /* Already has a PE assigned? huh? */
> + if (pe) {
> + WARN_ON(1);
>   return NULL;
>   }
> - if (pdn->pe_number != IODA_INVALID_PE)
> - return NULL;
>  
>   pe = pnv_ioda_alloc_pe(phb);
>   if (!pe) {
> @@ -1084,29 +1081,25 @@ static struct pnv_ioda_pe 
> *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>   return NULL;
>   }
>  
> - /* NOTE: We get only one ref to the pci_dev for the pdn, not for the
> -  * pointer in the PE data structure, both should be destroyed at the
> -  * same time. However, this needs to be looked at more closely again
> -  * once we actually start removing things (Hotplug, SR-IOV, ...)
> + /*
> +  * NB: We **do not** hold a pci_dev ref for pe->pdev.
>*
> -  * At some point we want to remove the PDN completely anyways
> +  * The pci_dev's release function cleans up the ioda_pe state, so:
> +  *  a) We can't take a ref otherwise the release function is never 
> called
> +  *  b) The pe->pdev pointer will always point to valid pci_dev (or NULL)
>*/
> - pci_dev_get(dev);
> - pdn->pe_number = pe->pe_number;
>   pe->flags = PNV_IODA_PE_DEV;
>   pe->pdev = dev;
>   pe->pbus = NULL;
>   pe->mve_number = -1;
> - pe->rid = dev->bus->number << 8 | pdn->devfn;
> + pe->rid = dev->bus->number << 8 | dev->devfn;
>  
>   pe_info(pe, "Associated device to PE\n");
>  
>   if (pnv_ioda_configure_pe(phb, pe)) {
>   /* XXX What do we do here ? */
>   pnv_ioda_free_pe(pe);
> - pdn->pe_number = IODA_INVALID_PE;
>   pe->pdev = NULL;
> - pci_dev_put(dev);
>   return NULL;
>   }
>  
> 

-- 
Alexey


[PATCH] powerpc: add link stack flush mitigation status in debugfs.

2019-11-27 Thread Michal Suchanek
The link stack flush status is not visible in debugfs. It can be enabled
even when count cache flush is disabled. Add a separate file for its
status.

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kernel/security.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 7d4b2080a658..56dce4798a4d 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -446,14 +446,26 @@ static int count_cache_flush_get(void *data, u64 *val)
return 0;
 }
 
+static int link_stack_flush_get(void *data, u64 *val)
+{
+   *val = link_stack_flush_enabled;
+
+   return 0;
+}
+
 DEFINE_DEBUGFS_ATTRIBUTE(fops_count_cache_flush, count_cache_flush_get,
 count_cache_flush_set, "%llu\n");
+DEFINE_DEBUGFS_ATTRIBUTE(fops_link_stack_flush, link_stack_flush_get,
+count_cache_flush_set, "%llu\n");
 
 static __init int count_cache_flush_debugfs_init(void)
 {
debugfs_create_file_unsafe("count_cache_flush", 0600,
   powerpc_debugfs_root, NULL,
   &fops_count_cache_flush);
+   debugfs_create_file_unsafe("link_stack_flush", 0600,
+  powerpc_debugfs_root, NULL,
+  &fops_link_stack_flush);
return 0;
 }
 device_initcall(count_cache_flush_debugfs_init);
-- 
2.23.0



[PATCH] selftests/powerpc: Use write_pmc instead of count_pmc to reset PMCs at the end of ebb selftests

2019-11-27 Thread Desnes A. Nunes do Rosario
By using count_pmc() to reset the PMC instead of write_pmc(), an extra
count is performed on ebb_state.stats.pmc_count[PMC_INDEX(pmc)], beyond
the value accounted by ebb_state.stats.ebb_count in the main test loops.
This extra pmc_count makes a few tests fail occasionally on PowerVM systems
with high workloads, such as cycles_test shown hereafter, where the
ebb_count is occasionally above the upper limit due to this extra count.

Moreover, this is also indicated by extra PMC1 trace_log on the output of
a few tests:

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
>> [30]: register SPRN_PMC1  = 0x451e
PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==
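The accounting difference between the two helpers can be sketched with a toy userspace model (hypothetical `toy_` helpers; the real count_pmc()/write_pmc() live in the selftest harness and touch hardware SPRs). With the failing run's numbers, the 0x15e residual is exactly the overshoot reported above:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the selftest accounting. count_pmc() reads the counter,
 * adds what it read to the accounted total, and re-arms the register;
 * write_pmc() only sets the register. Using count_pmc() to *reset*
 * therefore books residual counts accumulated after the main loop
 * stopped checking the limit. */
struct toy_pmc {
	uint64_t pmc;		/* residual hardware count */
	uint64_t accounted;	/* total compared against the upper limit */
};

static void toy_count_pmc(struct toy_pmc *s, uint64_t sample_period)
{
	s->accounted += s->pmc;		/* extra counts get booked */
	s->pmc = sample_period;
}

static void toy_write_pmc(struct toy_pmc *s, uint64_t sample_period)
{
	s->pmc = sample_period;		/* reset only, nothing booked */
}
```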

Signed-off-by: Desnes A. Nunes do Rosario 
---
 .../powerpc/pmu/ebb/back_to_back_ebbs_test.c |  2 +-
 .../testing/selftests/powerpc/pmu/ebb/cycles_test.c  |  2 +-
 .../powerpc/pmu/ebb/cycles_with_freeze_test.c|  2 +-
 .../powerpc/pmu/ebb/cycles_with_mmcr2_test.c |  2 +-
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c|  2 +-
 .../powerpc/pmu/ebb/ebb_on_willing_child_test.c  |  2 +-
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c  |  2 +-
 .../selftests/powerpc/pmu/ebb/multi_counter_test.c   | 12 ++--
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c |  2 +-
 .../selftests/powerpc/pmu/ebb/pmae_handling_test.c   |  2 +-
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c  |  2 +-
 11 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
index a2d7b0e3dca9..f133ab425f10 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
@@ -91,7 +91,7 @@ int back_to_back_ebbs(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
index bc893813483e..14a399a64729 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
@@ -42,7 +42,7 @@ int cycles(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index dcd351d20328..0f2089f6f82c 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -99,7 +99,7 @@ int cycles_with_freeze(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
index 94c99c12c0f2..a8f3bee04cd8 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
@@ -71,7 +71,7 @@ int cycles_with_mmcr2(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index dfbc5c3ad52d..bf6f25dfcf7b 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -396,7 +396,7 @@ int ebb_child(union pipe read_pipe, union pipe write_pipe)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git 
a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
index ca2f7d729155..513812cdcca1 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
@@ -38,7 +38,7 @@ static int victim_child(union pipe read_pipe, union pipe 
write_pipe)

Re: [PATCH 1/1] powerpc/kvm/book3s: Fixes possible 'use after release' of kvm

2019-11-27 Thread Paolo Bonzini
On 26/11/19 18:52, Leonardo Bras wrote:
> Fixes a possible 'use after free' of the kvm variable.
> It calls mutex_unlock(&kvm->lock) after possibly freeing the variable
> with kvm_put_kvm(kvm).
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 3 +--
>  virt/kvm/kvm_main.c  | 8 
>  2 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c 
> b/arch/powerpc/kvm/book3s_64_vio.c
> index 5834db0a54c6..a402ead833b6 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -316,14 +316,13 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  
>   if (ret >= 0)
>   list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
> - else
> - kvm_put_kvm(kvm);
>  
>   mutex_unlock(&kvm->lock);
>  
>   if (ret >= 0)
>   return ret;
>  
> + kvm_put_kvm(kvm);
>   kfree(stt);
>   fail_acct:
>   account_locked_vm(current->mm, kvmppc_stt_pages(npages), false);

This part is a good change, as it makes the code clearer.  The
virt/kvm/kvm_main.c bits, however, are not necessary as explained by Sean.

Paolo

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 13efc291b1c7..f37089b60d09 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2744,10 +2744,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, 
> u32 id)
>   /* Now it's all set up, let userspace reach it */
>   kvm_get_kvm(kvm);
>   r = create_vcpu_fd(vcpu);
> - if (r < 0) {
> - kvm_put_kvm(kvm);
> + if (r < 0)
>   goto unlock_vcpu_destroy;
> - }
>  
>   kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
>  
> @@ -2771,6 +2769,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, 
> u32 id)
>   mutex_lock(&kvm->lock);
>   kvm->created_vcpus--;
>   mutex_unlock(&kvm->lock);
> + if (r < 0)
> + kvm_put_kvm(kvm);
>   return r;
>  }
>  
> @@ -3183,10 +3183,10 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
>   kvm_get_kvm(kvm);
>   ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR | 
> O_CLOEXEC);
>   if (ret < 0) {
> - kvm_put_kvm(kvm);
>   mutex_lock(&kvm->lock);
>   list_del(&dev->vm_node);
>   mutex_unlock(&kvm->lock);
> + kvm_put_kvm(kvm);
>   ops->destroy(dev);
>   return ret;
>   }
> 



RE: [PATCH 09/14] powerpc/vas: Update CSB and notify process for fault CRBs

2019-11-27 Thread Haren Myneni



"Linuxppc-dev" 
wrote on 11/27/2019 12:46:09 AM:
> >
> > +static void notify_process(pid_t pid, u64 fault_addr)
> > +{
> > +   int rc;
> > +   struct kernel_siginfo info;
> > +
> > +   memset(&info, 0, sizeof(info));
> > +
> > +   info.si_signo = SIGSEGV;
> > +   info.si_errno = EFAULT;
> > +   info.si_code = SEGV_MAPERR;
> > +
> > +   info.si_addr = (void *)fault_addr;
> > +   rcu_read_lock();
> > +   rc = kill_pid_info(SIGSEGV, &info, find_vpid(pid));
> > +   rcu_read_unlock();
> > +
> > +   pr_devel("%s(): pid %d kill_proc_info() rc %d\n", __func__, pid,
rc);
> > +}
>
> Shouldn't this use force_sig_fault_to_task instead?
>
> > +   /*
> > +* User space passed invalid CSB address, Notify process with
> > +* SEGV signal.
> > +*/
> > +   tsk = get_pid_task(window->pid, PIDTYPE_PID);
> > +   /*
> > +* Send window will be closed after processing all NX requests
> > +* and process exits after closing all windows. In multi-thread
> > +* applications, thread may not exists, but does not close FD
> > +* (means send window) upon exit. Parent thread (tgid) can use
> > +* and close the window later.
> > +*/
> > +   if (tsk) {
> > +  if (tsk->flags & PF_EXITING)
> > + task_exit = 1;
> > +  put_task_struct(tsk);
> > +  pid = vas_window_pid(window);
>
> The pid is later used for sending the signal again, why not keep the
> reference?

Sorry, we are not dropping the PID reference here; that happens only when
the window is closed. If the task for this PID is not available, we look
for the tgid in the case of a multi-threaded process.
>
> > +   } else {
> > +  pid = vas_window_tgid(window);
> > +
> > +  rcu_read_lock();
> > +  tsk = find_task_by_vpid(pid);
> > +  if (!tsk) {
> > + rcu_read_unlock();
> > + return;
> > +  }
> > +  if (tsk->flags & PF_EXITING)
> > + task_exit = 1;
> > +  rcu_read_unlock();
>
> Why does this not need a reference to the task, but the other one does?

A window is opened with open() and ioctl(fd), and is closed either by an
explicit close(fd) or by releasing the FD during process exit.

A process closes all of its open windows when it exits, so we do not need
to keep the reference for that case. In the multi-threaded case, a child
thread can open a window but does not release the FD when it exits; the
parent thread (tgid) can continue to use this window and closes it upon
its own exit. So we take a reference to the PID, in case the pid belongs
to a child thread, to make sure the pid is not reused until the window is
closed.

We take the pid reference during window open and release it when closing
the window.

Thanks
Haren



>


RE: [PATCH 03/14] powerpc/vas: Define nx_fault_stamp in coprocessor_request_block

2019-11-27 Thread Haren Myneni


"Linuxppc-dev" 
wrote on 11/27/2019 12:30:55 AM:

>
> > +#define crb_csb_addr(c)  __be64_to_cpu(c->csb_addr)
> > +#define crb_nx_fault_addr(c)   __be64_to_cpu(c->stamp.nx.fault_storage_addr)
> > +#define crb_nx_flags(c)  c->stamp.nx.flags
> > +#define crb_nx_fault_status(c)   c->stamp.nx.fault_status
>
> Except for crb_nx_fault_addr all these macros are unused, and
> crb_nx_fault_addr probably makes more sense open coded in the only
> caller.

Thanks, My mistake, code got changed and forgot to remove unused macros.

>
> Also please don't use the __ prefixed byte swap helpers in any driver
> or arch code.
>
> > +
> > +static inline uint32_t crb_nx_pswid(struct coprocessor_request_block *crb)
> > +{
> > +   return __be32_to_cpu(crb->stamp.nx.pswid);
> > +}
>
> Same here.  Also not sure what the point of the helper is except for
> obsfucating the code.
>


RE: [PATCH 02/14] Revert "powerpc/powernv: remove the unused vas_win_paste_addr and vas_win_id functions"

2019-11-27 Thread Haren Myneni


"Linuxppc-dev" 
wrote on 11/27/2019 12:28:10 AM:
>
> On Tue, Nov 26, 2019 at 05:03:27PM -0800, Haren Myneni wrote:
> >
> > This reverts commit 452d23c0f6bd97f2fd8a9691fee79b76040a0feb.
> >
> > User space send windows (NX GZIP compression) need vas_win_paste_addr()
> > to mmap window paste address and vas_win_id() to get window ID when
> > window address is given.
>
> Even with your full series applied vas_win_paste_addr is entirely
> unused, and vas_win_id is only used once in the same file it is defined.

Thanks for the review.
vas_win_paste_addr() will be used in the NX compression driver, which I am
planning to post soon. Can I add this change later as part of that series?

>
> So instead of this patch you should just open code vas_win_id in
> init_winctx_for_txwin.
>
> > +static inline u32 encode_pswid(int vasid, int winid)
> > +{
> > +   u32 pswid = 0;
> > +
> > +   pswid |= vasid << (31 - 7);
> > +   pswid |= winid;
> > +
> > +   return pswid;
>
> This can be simplified down to:
>
>return (u32)winid | (vasid << (31 - 7));
>
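The two formulations are bit-for-bit identical; a quick standalone check (illustrative values only — real vasid/winid values come from the VAS hardware configuration):

```c
#include <assert.h>
#include <stdint.h>

/* Original formulation from the reverted patch. */
static uint32_t encode_pswid(int vasid, int winid)
{
	uint32_t pswid = 0;

	pswid |= vasid << (31 - 7);
	pswid |= winid;

	return pswid;
}

/* The suggested one-line equivalent: winid occupies the low bits and
 * vasid is shifted into bits 31-24 (big-endian bit 7 and up). */
static uint32_t encode_pswid_short(int vasid, int winid)
{
	return (uint32_t)winid | (vasid << (31 - 7));
}
```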


Re: [PATCH 1/1] powerpc/kvm/book3s: Fixes possible 'use after release' of kvm

2019-11-27 Thread Leonardo Bras
On Wed, 2019-11-27 at 17:40 +0100, Paolo Bonzini wrote:
> >   
> >if (ret >= 0)
> >list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
> > - else
> > - kvm_put_kvm(kvm);
> >   
> >mutex_unlock(&kvm->lock);
> >   
> >if (ret >= 0)
> >return ret;
> >   
> > + kvm_put_kvm(kvm);
> >kfree(stt);
> >fail_acct:
> >account_locked_vm(current->mm, kvmppc_stt_pages(npages), false);
> 
> This part is a good change, as it makes the code clearer.  The
> virt/kvm/kvm_main.c bits, however, are not necessary as explained by Sean.
> 

Thanks!
So, like this patch?
https://lkml.org/lkml/2019/11/7/763

Best regards,

Leonardo




Re: [PATCH v3 0/2] Replace current->mm by kvm->mm on powerpc/kvm

2019-11-27 Thread Leonardo Bras
Result of Travis-CI testing the change:
https://travis-ci.org/LeoBras/linux-ppc/builds/617712012




Re: Bug 205201 - Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M

2019-11-27 Thread Christian Zigotzky

On 27 November 2019 at 07:56 am, Mike Rapoport wrote:


Maybe we'll simply force bottom up allocation before calling
swiotlb_init()? Anyway, it's the last memblock allocation.


diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 62f74b1b33bd..771e6cf7e2b9 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -286,14 +286,15 @@ void __init mem_init(void)
/*
 * book3s is limited to 16 page sizes due to encoding this in
 * a 4-bit field for slices.
 */
BUILD_BUG_ON(MMU_PAGE_COUNT > 16);
  
  #ifdef CONFIG_SWIOTLB

+   memblock_set_bottom_up(true);
swiotlb_init(0);
  #endif
  
  	high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);

set_max_mapnr(max_pfn);
memblock_free_all();
  
  

Hello Mike,

I tested the latest Git kernel with your new patch today. My PCI TV card 
works without any problems.


Thanks,
Christian


Re: [PATCH v4 2/2] powerpc/irq: inline call_do_irq() and call_do_softirq()

2019-11-27 Thread Christophe Leroy




Le 27/11/2019 à 15:59, Segher Boessenkool a écrit :

On Wed, Nov 27, 2019 at 02:50:30PM +0100, Christophe Leroy wrote:

So what do we do ? We just drop the "r2" clobber ?


You have to make sure your asm code works for all ABIs.  This is quite
involved if you do a call to an external function.  The compiler does
*not* see this call, so you will have to make sure that all that the
compiler and linker do will work, or prevent some of those things (say,
inlining of the function containing the call).


But the whole purpose of the patch is to inline the call to __do_irq() 
in order to avoid the trampoline function.





Otherwise, to be on the safe side we can just save r2 in a local var
before the bl and restore it after. I guess it won't collapse CPU time
on a performant PPC64.


That does not fix everything.  The called function requires a specific
value in r2 on entry.


Uh ... but there is nothing like that when using the existing 
call_do_irq(). How does GCC know that call_do_irq() has the same TOC as 
__do_irq() ?




So all this needs verification.  Hopefully you can get away with just
not clobbering r2 (and not adding a nop after the bl), sure.  But this
needs to be checked.

Changing control flow inside inline assembler always is problematic.
Another problem in this case (on all ABIs) is that the compiler does
not see you call __do_irq.  Again, you can probably get away with that
too, but :-)


Anyway it sees I reference it, as it is in input arguments. Isn't it 
enough ?


Christophe


Re: [PATCH v4 2/2] powerpc/irq: inline call_do_irq() and call_do_softirq()

2019-11-27 Thread Segher Boessenkool
On Wed, Nov 27, 2019 at 02:50:30PM +0100, Christophe Leroy wrote:
> So what do we do ? We just drop the "r2" clobber ?

You have to make sure your asm code works for all ABIs.  This is quite
involved if you do a call to an external function.  The compiler does
*not* see this call, so you will have to make sure that all that the
compiler and linker do will work, or prevent some of those things (say,
inlining of the function containing the call).

> Otherwise, to be on the safe side we can just save r2 in a local var 
> before the bl and restore it after. I guess it won't collapse CPU time 
> on a performant PPC64.

That does not fix everything.  The called function requires a specific
value in r2 on entry.

So all this needs verification.  Hopefully you can get away with just
not clobbering r2 (and not adding a nop after the bl), sure.  But this
needs to be checked.

Changing control flow inside inline assembler always is problematic.
Another problem in this case (on all ABIs) is that the compiler does
not see you call __do_irq.  Again, you can probably get away with that
too, but :-)


Segher


Re: [PATCH v3 4/8] powerpc/vdso32: inline __get_datapage()

2019-11-27 Thread Christophe Leroy

Hi Michael,

Le 22/11/2019 à 07:38, Michael Ellerman a écrit :

Michael Ellerman  writes:

Christophe Leroy  writes:

__get_datapage() is only a few instructions to retrieve the
address of the page where the kernel stores data to the VDSO.

By inlining this function into its users, a bl/blr pair and
a mflr/mtlr pair is avoided, plus a few reg moves.

The improvement is noticeable (about 55 nsec/call on an 8xx)

vdsotest before the patch:
gettimeofday:vdso: 731 nsec/call
clock-gettime-realtime-coarse:vdso: 668 nsec/call
clock-gettime-monotonic-coarse:vdso: 745 nsec/call

vdsotest after the patch:
gettimeofday:vdso: 677 nsec/call
clock-gettime-realtime-coarse:vdso: 613 nsec/call
clock-gettime-monotonic-coarse:vdso: 690 nsec/call

Signed-off-by: Christophe Leroy 


This doesn't build with gcc 4.6.3:

   /linux/arch/powerpc/kernel/vdso32/gettimeofday.S: Assembler messages:
   /linux/arch/powerpc/kernel/vdso32/gettimeofday.S:41: Error: unsupported 
relocation against __kernel_datapage_offset
   /linux/arch/powerpc/kernel/vdso32/gettimeofday.S:86: Error: unsupported 
relocation against __kernel_datapage_offset
   /linux/arch/powerpc/kernel/vdso32/gettimeofday.S:213: Error: unsupported 
relocation against __kernel_datapage_offset
   /linux/arch/powerpc/kernel/vdso32/gettimeofday.S:247: Error: unsupported 
relocation against __kernel_datapage_offset
   make[4]: *** [arch/powerpc/kernel/vdso32/gettimeofday.o] Error 1


Actually I guess it's binutils, which is v2.22 in this case.

Needed this:

diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
b/arch/powerpc/include/asm/vdso_datapage.h
index 12785f72f17d..0048db347ddf 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -117,7 +117,7 @@ extern struct vdso_data *vdso_data;
  .macro get_datapage ptr, tmp
bcl 20, 31, .+4
	mflr	\ptr
-	addi	\ptr, \ptr, __kernel_datapage_offset - (.-4)
+	addi	\ptr, \ptr, (__kernel_datapage_offset - (.-4))@l
lwz \tmp, 0(\ptr)
add \ptr, \tmp, \ptr
  .endm



Are you still planning to getting this series merged ? Do you need any 
help / rebase / re-spin ?


Christophe


Re: [PATCH v1 1/4] powerpc/fixmap: don't clear fixmap area in paging_init()

2019-11-27 Thread Christophe Leroy




Le 26/11/2019 à 02:13, Michael Ellerman a écrit :

On Thu, 2019-09-12 at 13:49:41 UTC, Christophe Leroy wrote:

fixmap is intended to map things permanently like the IMMR region on
FSL SOC (8xx, 83xx, ...), so don't clear it when initialising paging()

Signed-off-by: Christophe Leroy 


Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f2bb86937d86ebcb0e52f95b6d19aba1d850e601



Hi,

What happened ?

It looks like it is gone in today's powerpc next.

Christophe


Re: [PATCH v4 2/2] powerpc/irq: inline call_do_irq() and call_do_softirq()

2019-11-27 Thread Christophe Leroy




Le 25/11/2019 à 15:25, Segher Boessenkool a écrit :

On Mon, Nov 25, 2019 at 09:32:23PM +1100, Michael Ellerman wrote:

Segher Boessenkool  writes:

+static inline void call_do_irq(struct pt_regs *regs, void *sp)
+{
+   register unsigned long r3 asm("r3") = (unsigned long)regs;
+
+   /* Temporarily switch r1 to sp, call __do_irq() then restore r1 */
+   asm volatile(
+   "  "PPC_STLU"1, %2(%1);\n"
+   "  mr  1, %1;\n"
+   "  bl  %3;\n"
+   "  "PPC_LL"  1, 0(1);\n" :
+   "+r"(r3) :
+   "b"(sp), "i"(THREAD_SIZE - STACK_FRAME_OVERHEAD), "i"(__do_irq) :
+   "lr", "xer", "ctr", "memory", "cr0", "cr1", "cr5", "cr6", "cr7",
+   "r0", "r2", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12");
+}


If we add a nop after the bl, so the linker could insert a TOC restore,
then I don't think there's any circumstance under which we expect this
to actually clobber r2, is there?


That is mostly correct.


That's the standard I aspire to :P


If call_do_irq was a no-inline function, there would not be problems.

What TOC does __do_irq require in r2 on entry, and what will be there
when it returns?


The kernel TOC, and also the kernel TOC, unless something's gone wrong
or I'm missing something.


If that is the case, we can just do the bl, no nop at all?  And that works
for all of our ABIs.

If we can be certain that we have the kernel TOC in r2 on entry to
call_do_irq, that is!  (Or it establishes it itself).


So what do we do ? We just drop the "r2" clobber ?

Otherwise, to be on the safe side we can just save r2 in a local var 
before the bl and restore it after. I guess it won't collapse CPU time 
on a performant PPC64.


Christophe


[GIT PULL] y2038: syscall implementation cleanups

2019-11-27 Thread Arnd Bergmann
The following changes since commit a99d8080aaf358d5d23581244e5da23b35e340b9:

  Linux 5.4-rc6 (2019-11-03 14:07:26 -0800)

are available in the Git repository at:

  git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground.git
tags/y2038-cleanups-5.5

for you to fetch changes up to b111df8447acdeb4b9220f99d5d4b28f83eb56ad:

  y2038: alarm: fix half-second cut-off (2019-11-25 21:52:35 +0100)


y2038: syscall implementation cleanups

This is a series of cleanups for the y2038 work, mostly intended
for namespace cleaning: the kernel defines the traditional
time_t, timeval and timespec types that often lead to y2038-unsafe
code. Even though the unsafe usage is mostly gone from the kernel,
having the types and associated functions around means that we
can still grow new users, and that we may be missing conversions
to safe types that actually matter.

There are still a number of driver specific patches needed to
get the last users of these types removed, those have been
submitted to the respective maintainers.

Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-a...@arndb.de/
Signed-off-by: Arnd Bergmann 


Arnd Bergmann (26):
  y2038: remove CONFIG_64BIT_TIME
  y2038: add __kernel_old_timespec and __kernel_old_time_t
  y2038: vdso: change timeval to __kernel_old_timeval
  y2038: vdso: change timespec to __kernel_old_timespec
  y2038: vdso: change time_t to __kernel_old_time_t
  y2038: vdso: nds32: open-code timespec_add_ns()
  y2038: vdso: powerpc: avoid timespec references
  y2038: ipc: remove __kernel_time_t reference from headers
  y2038: stat: avoid 'time_t' in 'struct stat'
  y2038: uapi: change __kernel_time_t to __kernel_old_time_t
  y2038: rusage: use __kernel_old_timeval
  y2038: syscalls: change remaining timeval to __kernel_old_timeval
  y2038: socket: remove timespec reference in timestamping
  y2038: socket: use __kernel_old_timespec instead of timespec
  y2038: make ns_to_compat_timeval use __kernel_old_timeval
  y2038: elfcore: Use __kernel_old_timeval for process times
  y2038: timerfd: Use timespec64 internally
  y2038: time: avoid timespec usage in settimeofday()
  y2038: itimer: compat handling to itimer.c
  y2038: use compat_{get,set}_itimer on alpha
  y2038: move itimer reset into itimer.c
  y2038: itimer: change implementation to timespec64
  y2038: allow disabling time32 system calls
  y2038: fix typo in powerpc vdso "LOPART"
  y2038: ipc: fix x32 ABI breakage
  y2038: alarm: fix half-second cut-off

 arch/Kconfig  |  11 +-
 arch/alpha/kernel/osf_sys.c   |  67 +--
 arch/alpha/kernel/syscalls/syscall.tbl|   4 +-
 arch/ia64/kernel/asm-offsets.c|   2 +-
 arch/mips/include/uapi/asm/msgbuf.h   |   6 +-
 arch/mips/include/uapi/asm/sembuf.h   |   4 +-
 arch/mips/include/uapi/asm/shmbuf.h   |   6 +-
 arch/mips/include/uapi/asm/stat.h |  16 +--
 arch/mips/kernel/binfmt_elfn32.c  |   4 +-
 arch/mips/kernel/binfmt_elfo32.c  |   4 +-
 arch/nds32/kernel/vdso/gettimeofday.c |  61 +-
 arch/parisc/include/uapi/asm/msgbuf.h |   6 +-
 arch/parisc/include/uapi/asm/sembuf.h |   4 +-
 arch/parisc/include/uapi/asm/shmbuf.h |   6 +-
 arch/powerpc/include/asm/asm-prototypes.h |   3 +-
 arch/powerpc/include/asm/vdso_datapage.h  |   6 +-
 arch/powerpc/include/uapi/asm/msgbuf.h|   6 +-
 arch/powerpc/include/uapi/asm/sembuf.h|   4 +-
 arch/powerpc/include/uapi/asm/shmbuf.h|   6 +-
 arch/powerpc/include/uapi/asm/stat.h  |   2 +-
 arch/powerpc/kernel/asm-offsets.c |  18 ++-
 arch/powerpc/kernel/syscalls.c|   4 +-
 arch/powerpc/kernel/time.c|   5 +-
 arch/powerpc/kernel/vdso32/gettimeofday.S |   6 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S |   8 +-
 arch/sparc/include/uapi/asm/msgbuf.h  |   6 +-
 arch/sparc/include/uapi/asm/sembuf.h  |   4 +-
 arch/sparc/include/uapi/asm/shmbuf.h  |   6 +-
 arch/sparc/include/uapi/asm/stat.h|  24 ++--
 arch/sparc/vdso/vclock_gettime.c  |  36 +++---
 arch/x86/entry/vdso/vclock_gettime.c  |   6 +-
 arch/x86/entry/vsyscall/vsyscall_64.c |   4 +-
 arch/x86/include/uapi/asm/msgbuf.h|   6 +-
 arch/x86/include/uapi/asm/sembuf.h|   4 +-
 arch/x86/include/uapi/asm/shmbuf.h|   6 +-
 arch/x86/um/vdso/um_vdso.c|  12 +-
 fs/aio.c  |   2 +-
 fs/binfmt_elf.c   |  12 +-
 fs/binfmt_elf_fdpic.c |  12 +-
 fs/compat_binfmt_elf.c|   4 +-
 fs/select.c   |  10 +-
 fs/timerfd.c  |  14 +--
 fs/utimes.c   |   8 +-
 include/linux/compat.h

Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Greg Kurz
On Wed, 27 Nov 2019 10:47:45 +0100
Frederic Barrat  wrote:

> 
> 
> Le 27/11/2019 à 10:33, Greg Kurz a écrit :
> > On Wed, 27 Nov 2019 10:10:13 +0100
> > Frederic Barrat  wrote:
> > 
> >>
> >>
> >> Le 27/11/2019 à 09:24, Greg Kurz a écrit :
> >>> On Wed, 27 Nov 2019 18:09:40 +1100
> >>> Alexey Kardashevskiy  wrote:
> >>>
> 
> 
>  On 20/11/2019 12:28, Oliver O'Halloran wrote:
> > The comment here implies that we don't need to take a ref to the pci_dev
> > because the ioda_pe will always have one. This implies that the current
> > expectation is that the pci_dev for an NPU device will *never* be torn
> > down since the ioda_pe having a ref to the device will prevent the
> > release function from being called.
> >
> > In other words, the desired behaviour here appears to be leaking a ref.
> >
> > Nice!
> 
> 
>  There is a history: https://patchwork.ozlabs.org/patch/1088078/
> 
>  We did not fix anything in particular then, we do not seem to be fixing
>  anything now (in other words - we cannot test it in a normal natural
>  way). I'd drop this one.
> 
> >>>
> >>> Yeah, I didn't fix anything at the time. Just reverted to the ref
> >>> count behavior we had before:
> >>>
> >>> https://patchwork.ozlabs.org/patch/829172/
> >>>
> >>> Frederic recently posted his take on the same topic from the OpenCAPI
> >>> point of view:
> >>>
> >>> http://patchwork.ozlabs.org/patch/1198947/
> >>>
> >>> He seems to indicate the NPU devices as the real culprit because
> >>> nobody ever cared for them to be removable. Fixing that seems to be
> >>> a chore nobody really wants to address obviously... :-\
> >>
> >>
> >> I had taken a stab at not leaking a ref for the nvlink devices and do
> >> the proper thing regarding ref counting (i.e. fixing all the callers of
> >> get_pci_dev() to drop the reference when they were done). With that, I
> >> could see that the ref count of the nvlink devices could drop to 0
> >> (calling remove for the device in /sys) and that the devices could go away.
> >>
> >> But then, I realized it's not necessarily desirable at this point. There
> >> are several comments in the code saying the npu devices (for nvlink)
> >> don't go away, there's no device release callback defined when it seems
> >> there should be, at least to handle releasing PEs All in all, it
> >> seems that some work would be needed. And if it hasn't been required by
> >> now...
> >>
> > 
> > If everyone is ok with leaking a reference in the NPU case, I guess
> > this isn't a problem. But if we move forward with Oliver's patch, a
> > pci_dev_put() would be needed for OpenCAPI, correct ?
> 
> 
> No, these code paths are nvlink-only.
> 

Oh yes indeed. Then this patch and yours fit well together :)

>Fred
> 
> 
> 
> >> Fred
> >>
> >>
> 
> 
> >
> > Signed-off-by: Oliver O'Halloran 
> > ---
> >arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
> >1 file changed, 3 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> > index 72d3749da02c..2eb6e6d45a98 100644
> > --- a/arch/powerpc/platforms/powernv/npu-dma.c
> > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> > @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn)
> > break;
> >
> > /*
> > -* pci_get_domain_bus_and_slot() increased the reference count of
> > -* the PCI device, but callers don't need that actually as the PE
> > -* already holds a reference to the device. Since callers aren't
> > -* aware of the reference count change, call pci_dev_put() now to
> > -* avoid leaks.
> > +* NB: for_each_pci_dev() elevates the pci_dev refcount.
> > +* Caller is responsible for dropping the ref when it's
> > +* finished with it.
> >  */
> > -   if (pdev)
> > -   pci_dev_put(pdev);
> > -
> > return pdev;
> >}
> >
> >
> 
> >>>
> >>
> > 
> 



Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Greg Kurz
On Wed, 27 Nov 2019 10:10:13 +0100
Frederic Barrat  wrote:

> 
> 
> Le 27/11/2019 à 09:24, Greg Kurz a écrit :
> > On Wed, 27 Nov 2019 18:09:40 +1100
> > Alexey Kardashevskiy  wrote:
> > 
> >>
> >>
> >> On 20/11/2019 12:28, Oliver O'Halloran wrote:
> >>> The comment here implies that we don't need to take a ref to the pci_dev
> >>> because the ioda_pe will always have one. This implies that the current
> >>> expectation is that the pci_dev for an NPU device will *never* be torn
> >>> down since the ioda_pe having a ref to the device will prevent the
> >>> release function from being called.
> >>>
> >>> In other words, the desired behaviour here appears to be leaking a ref.
> >>>
> >>> Nice!
> >>
> >>
> >> There is a history: https://patchwork.ozlabs.org/patch/1088078/
> >>
> >> We did not fix anything in particular then, we do not seem to be fixing
> >> anything now (in other words - we cannot test it in a normal natural
> >> way). I'd drop this one.
> >>
> > 
> > Yeah, I didn't fix anything at the time. Just reverted to the ref
> > count behavior we had before:
> > 
> > https://patchwork.ozlabs.org/patch/829172/
> > 
> > Frederic recently posted his take on the same topic from the OpenCAPI
> > point of view:
> > 
> > http://patchwork.ozlabs.org/patch/1198947/
> > 
> > He seems to indicate the NPU devices as the real culprit because
> > nobody ever cared for them to be removable. Fixing that seems to be
> > a chore nobody really wants to address obviously... :-\
> 
> 
> I had taken a stab at not leaking a ref for the nvlink devices and do 
> the proper thing regarding ref counting (i.e. fixing all the callers of 
> get_pci_dev() to drop the reference when they were done). With that, I 
> could see that the ref count of the nvlink devices could drop to 0 
> (calling remove for the device in /sys) and that the devices could go away.
> 
> But then, I realized it's not necessarily desirable at this point. There 
> are several comments in the code saying the npu devices (for nvlink) 
> don't go away, there's no device release callback defined when it seems 
> there should be, at least to handle releasing PEs All in all, it 
> seems that some work would be needed. And if it hasn't been required by 
> now...
> 

If everyone is ok with leaking a reference in the NPU case, I guess
this isn't a problem. But if we move forward with Oliver's patch, a
pci_dev_put() would be needed for OpenCAPI, correct ?

>Fred
> 
> 
> >>
> >>
> >>>
> >>> Signed-off-by: Oliver O'Halloran 
> >>> ---
> >>>   arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
> >>>   1 file changed, 3 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> >>> b/arch/powerpc/platforms/powernv/npu-dma.c
> >>> index 72d3749da02c..2eb6e6d45a98 100644
> >>> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> >>> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> >>> @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node 
> >>> *dn)
> >>>   break;
> >>>   
> >>>   /*
> >>> -  * pci_get_domain_bus_and_slot() increased the reference count of
> >>> -  * the PCI device, but callers don't need that actually as the PE
> >>> -  * already holds a reference to the device. Since callers aren't
> >>> -  * aware of the reference count change, call pci_dev_put() now to
> >>> -  * avoid leaks.
> >>> +  * NB: for_each_pci_dev() elevates the pci_dev refcount.
> >>> +  * Caller is responsible for dropping the ref when it's
> >>> +  * finished with it.
> >>>*/
> >>> - if (pdev)
> >>> - pci_dev_put(pdev);
> >>> -
> >>>   return pdev;
> >>>   }
> >>>   
> >>>
> >>
> > 
> 



[PATCH 1/3] powerpc/pseries: Account for SPURR ticks on idle CPUs

2019-11-27 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

On PSeries LPARs, to compute the utilization, tools such as lparstat
need to know the [S]PURR ticks when the CPUs were busy or idle.

In the pseries cpuidle driver, we keep track of the idle PURR ticks in
the VPA variable "wait_state_cycles". This patch extends the support
to account for the idle SPURR ticks.

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/kernel/idle.c|  2 ++
 drivers/cpuidle/cpuidle-pseries.c | 28 +---
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index a36fd05..708ec68 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -33,6 +33,8 @@
 unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;
 EXPORT_SYMBOL(cpuidle_disable);
 
+DEFINE_PER_CPU(u64, idle_spurr_cycles);
+
 static int __init powersave_off(char *arg)
 {
ppc_md.power_save = NULL;
diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 74c2479..45e2be4 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -30,11 +30,14 @@ struct cpuidle_driver pseries_idle_driver = {
 static struct cpuidle_state *cpuidle_state_table __read_mostly;
 static u64 snooze_timeout __read_mostly;
 static bool snooze_timeout_en __read_mostly;
+DECLARE_PER_CPU(u64, idle_spurr_cycles);
 
-static inline void idle_loop_prolog(unsigned long *in_purr)
+static inline void idle_loop_prolog(unsigned long *in_purr,
+   unsigned long *in_spurr)
 {
ppc64_runlatch_off();
*in_purr = mfspr(SPRN_PURR);
+   *in_spurr = mfspr(SPRN_SPURR);
/*
 * Indicate to the HV that we are idle. Now would be
 * a good time to find other work to dispatch.
@@ -42,13 +45,16 @@ static inline void idle_loop_prolog(unsigned long *in_purr)
get_lppaca()->idle = 1;
 }
 
-static inline void idle_loop_epilog(unsigned long in_purr)
+static inline void idle_loop_epilog(unsigned long in_purr,
+   unsigned long in_spurr)
 {
u64 wait_cycles;
+   u64 *idle_spurr_cycles_ptr = this_cpu_ptr(&idle_spurr_cycles);
 
wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
wait_cycles += mfspr(SPRN_PURR) - in_purr;
get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
+   *idle_spurr_cycles_ptr += mfspr(SPRN_SPURR) - in_spurr;
get_lppaca()->idle = 0;
 
ppc64_runlatch_on();
@@ -58,12 +64,12 @@ static int snooze_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
 {
-   unsigned long in_purr;
+   unsigned long in_purr, in_spurr;
u64 snooze_exit_time;
 
set_thread_flag(TIF_POLLING_NRFLAG);
 
-   idle_loop_prolog(&in_purr);
+   idle_loop_prolog(&in_purr, &in_spurr);
local_irq_enable();
snooze_exit_time = get_tb() + snooze_timeout;
 
@@ -87,7 +93,7 @@ static int snooze_loop(struct cpuidle_device *dev,
 
local_irq_disable();
 
-   idle_loop_epilog(in_purr);
+   idle_loop_epilog(in_purr, in_spurr);
 
return index;
 }
@@ -113,9 +119,9 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
 {
-   unsigned long in_purr;
+   unsigned long in_purr, in_spurr;
 
-   idle_loop_prolog(&in_purr);
+   idle_loop_prolog(&in_purr, &in_spurr);
get_lppaca()->donate_dedicated_cpu = 1;
 
HMT_medium();
@@ -124,7 +130,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
local_irq_disable();
get_lppaca()->donate_dedicated_cpu = 0;
 
-   idle_loop_epilog(in_purr);
+   idle_loop_epilog(in_purr, in_spurr);
 
return index;
 }
@@ -133,9 +139,9 @@ static int shared_cede_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
 {
-   unsigned long in_purr;
+   unsigned long in_purr, in_spurr;
 
-   idle_loop_prolog(&in_purr);
+   idle_loop_prolog(&in_purr, &in_spurr);
 
/*
 * Yield the processor to the hypervisor.  We return if
@@ -147,7 +153,7 @@ static int shared_cede_loop(struct cpuidle_device *dev,
check_and_cede_processor();
 
local_irq_disable();
-   idle_loop_epilog(in_purr);
+   idle_loop_epilog(in_purr, in_spurr);
 
return index;
 }
-- 
1.9.4



[PATCH 0/3] pseries: Track and expose idle PURR and SPURR ticks

2019-11-27 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

On PSeries LPARs, data center planners desire a more accurate view of
system utilization per resource, such as CPU, in order to plan system
capacity requirements better. Such accuracy can be obtained by reading
the PURR/SPURR registers for CPU resource utilization.

Tools such as lparstat which are used to compute the utilization need
to know the [S]PURR ticks when the CPU was busy or idle. The [S]PURR
counters are already exposed through sysfs.  We already account for
PURR ticks when we go to idle so that we can update the VPA area. This
patchset extends support to account for SPURR ticks when idle, and
expose both via per-cpu sysfs files.

These patches are required for enhancement to the lparstat utility
that compute the CPU utilization based on PURR and SPURR which can be
found here :
https://groups.google.com/forum/#!topic/powerpc-utils-devel/fYRo69xO9r4

Gautham R. Shenoy (3):
  powerpc/pseries: Account for SPURR ticks on idle CPUs
  powerpc/sysfs: Show idle_purr and idle_spurr for every CPU
  Documentation: Document sysfs interfaces purr, spurr, idle_purr,
idle_spurr

 Documentation/ABI/testing/sysfs-devices-system-cpu | 39 ++
 arch/powerpc/kernel/idle.c |  2 ++
 arch/powerpc/kernel/sysfs.c| 32 ++
 drivers/cpuidle/cpuidle-pseries.c  | 28 ++--
 4 files changed, 90 insertions(+), 11 deletions(-)

-- 
1.9.4



[PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

2019-11-27 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

On Pseries LPARs, to calculate utilization, we need to know the
[S]PURR ticks when the CPUs were busy or idle.

The total PURR and SPURR ticks are already exposed via the per-cpu
sysfs files /sys/devices/system/cpu/cpuX/purr and
/sys/devices/system/cpu/cpuX/spurr.

This patch adds support for exposing the idle PURR and SPURR ticks via
/sys/devices/system/cpu/cpuX/idle_purr and
/sys/devices/system/cpu/cpuX/idle_spurr.

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/kernel/sysfs.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 80a676d..42ade55 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
 }
 static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
 
+static ssize_t idle_purr_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, dev);
+   unsigned int cpuid = cpu->dev.id;
+   struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
+   u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
+
+   return sprintf(buf, "%llx\n", idle_purr_cycles);
+}
+static DEVICE_ATTR_RO(idle_purr);
+
+DECLARE_PER_CPU(u64, idle_spurr_cycles);
+static ssize_t idle_spurr_show(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, dev);
+   unsigned int cpuid = cpu->dev.id;
+   u64 *idle_spurr_cycles_ptr = per_cpu_ptr(&idle_spurr_cycles, cpuid);
+
+   return sprintf(buf, "%llx\n", *idle_spurr_cycles_ptr);
+}
+static DEVICE_ATTR_RO(idle_spurr);
+
+static void create_idle_purr_spurr_sysfs_entry(struct device *cpudev)
+{
+   device_create_file(cpudev, &dev_attr_idle_purr);
+   device_create_file(cpudev, &dev_attr_idle_spurr);
+}
+
 static int __init topology_init(void)
 {
int cpu, r;
@@ -1067,6 +1097,8 @@ static int __init topology_init(void)
register_cpu(c, cpu);
 
device_create_file(&c->dev, &dev_attr_physical_id);
+   if (firmware_has_feature(FW_FEATURE_SPLPAR))
+   create_idle_purr_spurr_sysfs_entry(&c->dev);
}
}
r = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/topology:online",
-- 
1.9.4



[PATCH 3/3] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr

2019-11-27 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Add documentation for the following sysfs interfaces:
/sys/devices/system/cpu/cpuX/purr
/sys/devices/system/cpu/cpuX/spurr
/sys/devices/system/cpu/cpuX/idle_purr
/sys/devices/system/cpu/cpuX/idle_spurr

Signed-off-by: Gautham R. Shenoy 
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index fc20cde..ecd23fb 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -574,3 +574,42 @@ Description:   Secure Virtual Machine
If 1, it means the system is using the Protected Execution
Facility in POWER9 and newer processors. i.e., it is a Secure
Virtual Machine.
+
+What:  /sys/devices/system/cpu/cpuX/purr
+Date:  Apr 2005
+Contact:   Linux for PowerPC mailing list 
+Description:   PURR ticks for this CPU since the system boot.
+
+   The Processor Utilization Resources Register (PURR) is
+   a 64-bit counter which provides an estimate of the
+   resources used by the CPU thread. The contents of this
+   register increases monotonically. This sysfs interface
+   exposes the number of PURR ticks for cpuX.
+
+What:  /sys/devices/system/cpu/cpuX/spurr
+Date:  Dec 2006
+Contact:   Linux for PowerPC mailing list 
+Description:   SPURR ticks for this CPU since the system boot.
+
+   The Scaled Processor Utilization Resources Register
+   (SPURR) is a 64-bit counter that provides a frequency
+   invariant estimate of the resources used by the CPU
+   thread. The contents of this register increases
+   monotonically. This sysfs interface exposes the number
+   of SPURR ticks for cpuX.
+
+What:  /sys/devices/system/cpu/cpuX/idle_purr
+Date:  Nov 2019
+Contact:   Linux for PowerPC mailing list 
+Description:   PURR ticks for cpuX when it was idle.
+
+   This sysfs interface exposes the number of PURR ticks
+   for cpuX when it was idle.
+
+What:  /sys/devices/system/cpu/cpuX/idle_spurr
+Date:  Nov 2019
+Contact:   Linux for PowerPC mailing list 
+Description:   SPURR ticks for cpuX when it was idle.
+
+   This sysfs interface exposes the number of SPURR ticks
+   for cpuX when it was idle.
-- 
1.9.4



Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Greg Kurz
On Wed, 27 Nov 2019 20:40:00 +1100
"Oliver O'Halloran"  wrote:

> On Wed, Nov 27, 2019 at 8:34 PM Greg Kurz  wrote:
> >
> >
> > If everyone is ok with leaking a reference in the NPU case, I guess
> > this isn't a problem. But if we move forward with Oliver's patch, a
> > pci_dev_put() would be needed for OpenCAPI, correct ?
> 
> Yes, but I think that's fair enough. By convention it's the callers
> responsibility to drop the ref when it calls a function that returns a
> refcounted object. Doing anything else creates a race condition since
> the object's count could drop to zero before the caller starts using
> it.
> 

Sure, you're right, especially with Frederic's patch that drops
the pci_dev_get(dev) in pnv_ioda_setup_dev_PE().

> Oliver



Re: [Y2038] [PATCH 07/23] y2038: vdso: powerpc: avoid timespec references

2019-11-27 Thread Arnd Bergmann
On Thu, Nov 21, 2019 at 5:25 PM Christophe Leroy
 wrote:
> Arnd Bergmann  a écrit :
> > On Wed, Nov 20, 2019 at 11:43 PM Ben Hutchings
> >  wrote:
> >>
> >> On Fri, 2019-11-08 at 22:07 +0100, Arnd Bergmann wrote:
> >> > @@ -192,7 +190,7 @@ V_FUNCTION_BEGIN(__kernel_time)
> >> >   bl  __get_datapage@local
> >> >   mr  r9, r3  /* datapage ptr in r9 */
> >> >
> >> > - lwz r3,STAMP_XTIME+TSPEC_TV_SEC(r9)
> >> > + lwz r3,STAMP_XTIME_SEC+LOWPART(r9)
> >>
> >> "LOWPART" should be "LOPART".
> >>
> >
> > Thanks, fixed both instances in a patch on top now. I considered folding
> > it into the original patch, but as it's close to the merge window I'd
> > rather not rebase it, and this way I also give you credit for
> > finding the bug.
>
> Take care, might conflict with
> https://github.com/linuxppc/linux/commit/5e381d727fe8834ca5a126f510194a7a4ac6dd3a

Sorry for my late reply. I see this commit and no other variant of it has
made it into linux-next by now, so I assume this is not getting sent for v5.5
and it's not stopping me from sending my own pull request.

Please let me know if I missed something and this will cause problems.

On a related note: are you still working on the generic lib/vdso support for
powerpc? Without that, future libc implementations that use 64-bit time_t
will have to use the slow clock_gettime64 syscall instead of the vdso,
which has a significant performance impact.

   Arnd


[PATCH v2 rebase 34/34] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.

2019-11-27 Thread Michal Suchanek
Signed-off-by: Michal Suchanek 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9d3a5c54a41d..4d2a43542c83 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12774,6 +12774,8 @@ F:  arch/*/kernel/*/perf_event*.c
 F: arch/*/kernel/*/*/perf_event*.c
 F: arch/*/include/asm/perf_event.h
 F: arch/*/kernel/perf_callchain.c
+F: arch/*/perf/*
+F: arch/*/perf/*/*
 F: arch/*/events/*
 F: arch/*/events/*/*
 F: tools/perf/
-- 
2.23.0



[PATCH v2 rebase 33/34] powerpc/perf: split callchain.c by bitness

2019-11-27 Thread Michal Suchanek
Building callchain.c with !COMPAT proved quite ugly with all the
defines. Splitting out the 32bit and 64bit parts looks better.

No code change intended.

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/perf/Makefile   |   5 +-
 arch/powerpc/perf/callchain.c| 362 +--
 arch/powerpc/perf/callchain.h|  20 ++
 arch/powerpc/perf/callchain_32.c | 197 +
 arch/powerpc/perf/callchain_64.c | 178 +++
 5 files changed, 400 insertions(+), 362 deletions(-)
 create mode 100644 arch/powerpc/perf/callchain.h
 create mode 100644 arch/powerpc/perf/callchain_32.c
 create mode 100644 arch/powerpc/perf/callchain_64.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index c155dcbb8691..53d614e98537 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -1,6 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-$(CONFIG_PERF_EVENTS)  += callchain.o perf_regs.o
+obj-$(CONFIG_PERF_EVENTS)  += callchain.o callchain_$(BITS).o perf_regs.o
+ifdef CONFIG_COMPAT
+obj-$(CONFIG_PERF_EVENTS)  += callchain_32.o
+endif
 
 obj-$(CONFIG_PPC_PERF_CTRS)+= core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)  += ppc970-pmu.o power5-pmu.o \
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index b9fc2f297f30..dd5051015008 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -15,11 +15,9 @@
 #include 
 #include 
 #include 
-#ifdef CONFIG_COMPAT
-#include "../kernel/ppc32.h"
-#endif
 #include 
 
+#include "callchain.h"
 
 /*
  * Is sp valid as the address of the next kernel stack frame after prev_sp?
@@ -102,364 +100,6 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
*entry, struct pt_regs *re
}
 }
 
-static inline int valid_user_sp(unsigned long sp)
-{
-   bool is_64 = !is_32bit_task();
-
-   if (!sp || (sp & (is_64 ? 7 : 3)) || sp > STACK_TOP - (is_64 ? 32 : 16))
-   return 0;
-   return 1;
-}
-
-#ifdef CONFIG_PPC64
-/*
- * On 64-bit we don't want to invoke hash_page on user addresses from
- * interrupt context, so if the access faults, we read the page tables
- * to find which page (if any) is mapped and access it directly.
- */
-static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
-{
-   int ret = -EFAULT;
-   pgd_t *pgdir;
-   pte_t *ptep, pte;
-   unsigned shift;
-   unsigned long addr = (unsigned long) ptr;
-   unsigned long offset;
-   unsigned long pfn, flags;
-   void *kaddr;
-
-   pgdir = current->mm->pgd;
-   if (!pgdir)
-   return -EFAULT;
-
-   local_irq_save(flags);
-   ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
-   if (!ptep)
-   goto err_out;
-   if (!shift)
-   shift = PAGE_SHIFT;
-
-   /* align address to page boundary */
-   offset = addr & ((1UL << shift) - 1);
-
-   pte = READ_ONCE(*ptep);
-   if (!pte_present(pte) || !pte_user(pte))
-   goto err_out;
-   pfn = pte_pfn(pte);
-   if (!page_is_ram(pfn))
-   goto err_out;
-
-   /* no highmem to worry about here */
-   kaddr = pfn_to_kaddr(pfn);
-   memcpy(buf, kaddr + offset, nb);
-   ret = 0;
-err_out:
-   local_irq_restore(flags);
-   return ret;
-}
-
-static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
-{
-   if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
-   ((unsigned long)ptr & 7))
-   return -EFAULT;
-
-   pagefault_disable();
-   if (!__get_user_inatomic(*ret, ptr)) {
-   pagefault_enable();
-   return 0;
-   }
-   pagefault_enable();
-
-   return read_user_stack_slow(ptr, ret, 8);
-}
-
-/*
- * 64-bit user processes use the same stack frame for RT and non-RT signals.
- */
-struct signal_frame_64 {
-   chardummy[__SIGNAL_FRAMESIZE];
-   struct ucontext uc;
-   unsigned long   unused[2];
-   unsigned inttramp[6];
-   struct siginfo  *pinfo;
-   void*puc;
-   struct siginfo  info;
-   charabigap[288];
-};
-
-static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
-{
-   if (nip == fp + offsetof(struct signal_frame_64, tramp))
-   return 1;
-   if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
-   nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
-   return 1;
-   return 0;
-}
-
-/*
- * Do some sanity checking on the signal frame pointed to by sp.
- * We check the pinfo and puc pointers in the frame.
- */
-static int sane_signal_64_frame(unsigned long sp)
-{
-   struct signal_frame_64 __user *sf;
-   unsigned long pinfo, puc;
-
-   sf = (struct signal_frame_64 __user *) sp;
-   if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
-   read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
-   

[PATCH v2 rebase 31/34] powerpc/64: make buildable without CONFIG_COMPAT

2019-11-27 Thread Michal Suchanek
There are numerous references to 32bit functions in generic and 64bit
code, so ifdef them out.

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/include/asm/thread_info.h | 4 ++--
 arch/powerpc/kernel/Makefile   | 6 +++---
 arch/powerpc/kernel/entry_64.S | 2 ++
 arch/powerpc/kernel/signal.c   | 3 +--
 arch/powerpc/kernel/syscall_64.c   | 6 ++
 arch/powerpc/kernel/vdso.c | 3 ++-
 arch/powerpc/perf/callchain.c  | 8 +++-
 7 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 8e1d0195ac36..c128d8a48ea3 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned int 
flags)
return (ti->local_flags & flags) != 0;
 }
 
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_COMPAT
 #define is_32bit_task()(test_thread_flag(TIF_32BIT))
 #else
-#define is_32bit_task()(1)
+#define is_32bit_task()(IS_ENABLED(CONFIG_PPC32))
 #endif
 
 #if defined(CONFIG_PPC64)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 72ba4622fc2c..0270f4b440a5 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -41,16 +41,16 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
 endif
 
 obj-y  := cputable.o ptrace.o syscalls.o \
-  irq.o align.o signal_32.o pmc.o vdso.o \
+  irq.o align.o signal_$(BITS).o pmc.o vdso.o \
   process.o systbl.o idle.o \
   signal.o sysfs.o cacheinfo.o time.o \
   prom.o traps.o setup-common.o \
   udbg.o misc.o io.o misc_$(BITS).o \
   of_platform.o prom_parse.o
-obj-$(CONFIG_PPC64)+= setup_64.o sys_ppc32.o \
-  signal_64.o ptrace32.o \
+obj-$(CONFIG_PPC64)+= setup_64.o \
   paca.o nvram_64.o firmware.o note.o \
   syscall_64.o
+obj-$(CONFIG_COMPAT)   += sys_ppc32.o ptrace32.o signal_32.o
 obj-$(CONFIG_VDSO32)   += vdso32/
 obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 00173cc904ef..c339a984958f 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -52,8 +52,10 @@
 SYS_CALL_TABLE:
.tc sys_call_table[TC],sys_call_table
 
+#ifdef CONFIG_COMPAT
 COMPAT_SYS_CALL_TABLE:
.tc compat_sys_call_table[TC],compat_sys_call_table
+#endif
 
 /* This value is used to mark exception frames on the stack. */
 exception_marker:
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 60436432399f..61678cb0e6a1 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
sigset_t *oldset = sigmask_to_save();
struct ksignal ksig = { .sig = 0 };
int ret;
-   int is32 = is_32bit_task();
 
BUG_ON(tsk != current);
 
@@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
 
	rseq_signal_deliver(&ksig, tsk->thread.regs);
 
-   if (is32) {
+   if (is_32bit_task()) {
if (ksig.ka.sa.sa_flags & SA_SIGINFO)
ret = handle_rt_signal32(, oldset, tsk);
else
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 62f44c3072f3..783deda66866 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -18,7 +18,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, 
long);
 
 long system_call_exception(long r3, long r4, long r5, long r6, long r7, long 
r8, unsigned long r0, struct pt_regs *regs)
 {
-   unsigned long ti_flags;
syscall_fn f;
 
if (IS_ENABLED(CONFIG_PPC_BOOK3S))
@@ -65,8 +64,7 @@ long system_call_exception(long r3, long r4, long r5, long 
r6, long r7, long r8,
 
__hard_irq_enable();
 
-   ti_flags = current_thread_info()->flags;
-   if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
+   if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
/*
 * We use the return value of do_syscall_trace_enter() as the
 * syscall number. If the syscall was rejected for any reason
@@ -82,7 +80,7 @@ long system_call_exception(long r3, long r4, long r5, long 
r6, long r7, long r8,
/* May be faster to do array_index_nospec? */
barrier_nospec();
 
-   if (unlikely(ti_flags & _TIF_32BIT)) {
+   if (unlikely(is_32bit_task())) {
f = (void 

[PATCH v2 rebase 32/34] powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.

2019-11-27 Thread Michal Suchanek
On bigendian ppc64 it is common to have 32bit legacy binaries, but much
less so on littleendian.

Signed-off-by: Michal Suchanek 
Reviewed-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e446bb5b3f8d..fabae186eea7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -267,8 +267,9 @@ config PANIC_TIMEOUT
default 180
 
 config COMPAT
-   bool
-   default y if PPC64
+   bool "Enable support for 32bit binaries"
+   depends on PPC64
+   default y if !CPU_LITTLE_ENDIAN
select COMPAT_BINFMT_ELF
select ARCH_WANT_OLD_COMPAT_IPC
select COMPAT_OLD_SIGACTION
-- 
2.23.0



[PATCH v2 rebase 30/34] powerpc/perf: consolidate valid_user_sp

2019-11-27 Thread Michal Suchanek
Merge the 32bit and 64bit version.

Halve the check constants on 32bit.

Use STACK_TOP since it is defined.

Passing is_64 is now redundant since is_32bit_task() is used to
determine which callchain variant should be used. Use STACK_TOP and
is_32bit_task() directly.

This removes a page from the valid 32bit area on 64bit:
 #define TASK_SIZE_USER32 (0x0000000100000000UL - (1 * PAGE_SIZE))
 #define STACK_TOP_USER32 TASK_SIZE_USER32

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/perf/callchain.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index c6c4c609cc14..a22a19975a19 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -102,6 +102,15 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
*entry, struct pt_regs *re
}
 }
 
+static inline int valid_user_sp(unsigned long sp)
+{
+   bool is_64 = !is_32bit_task();
+
+   if (!sp || (sp & (is_64 ? 7 : 3)) || sp > STACK_TOP - (is_64 ? 32 : 16))
+   return 0;
+   return 1;
+}
+
 #ifdef CONFIG_PPC64
 /*
  * On 64-bit we don't want to invoke hash_page on user addresses from
@@ -165,13 +174,6 @@ static int read_user_stack_64(unsigned long __user *ptr, 
unsigned long *ret)
return read_user_stack_slow(ptr, ret, 8);
 }
 
-static inline int valid_user_sp(unsigned long sp, int is_64)
-{
-   if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
-   return 0;
-   return 1;
-}
-
 /*
  * 64-bit user processes use the same stack frame for RT and non-RT signals.
  */
@@ -230,7 +232,7 @@ static void perf_callchain_user_64(struct 
perf_callchain_entry_ctx *entry,
 
while (entry->nr < entry->max_stack) {
fp = (unsigned long __user *) sp;
-   if (!valid_user_sp(sp, 1) || read_user_stack_64(fp, &next_sp))
+   if (!valid_user_sp(sp) || read_user_stack_64(fp, &next_sp))
return;
 if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
return;
@@ -279,13 +281,6 @@ static inline void perf_callchain_user_64(struct 
perf_callchain_entry_ctx *entry
 {
 }
 
-static inline int valid_user_sp(unsigned long sp, int is_64)
-{
-   if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
-   return 0;
-   return 1;
-}
-
 #define __SIGNAL_FRAMESIZE32   __SIGNAL_FRAMESIZE
 #define sigcontext32   sigcontext
 #define mcontext32 mcontext
@@ -428,7 +423,7 @@ static void perf_callchain_user_32(struct 
perf_callchain_entry_ctx *entry,
 
while (entry->nr < entry->max_stack) {
fp = (unsigned int __user *) (unsigned long) sp;
-   if (!valid_user_sp(sp, 0) || read_user_stack_32(fp, &next_sp))
+   if (!valid_user_sp(sp) || read_user_stack_32(fp, &next_sp))
return;
 if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
return;
-- 
2.23.0



[PATCH v2 rebase 29/34] powerpc/perf: consolidate read_user_stack_32

2019-11-27 Thread Michal Suchanek
There are two almost identical copies for 32bit and 64bit.

The function is used only in 32bit code, which will be split out in the
next patch, so consolidate it into one function.

Signed-off-by: Michal Suchanek 
Reviewed-by: Christophe Leroy 
---
 arch/powerpc/perf/callchain.c | 59 +++
 1 file changed, 25 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 35d542515faf..c6c4c609cc14 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -165,22 +165,6 @@ static int read_user_stack_64(unsigned long __user *ptr, 
unsigned long *ret)
return read_user_stack_slow(ptr, ret, 8);
 }
 
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
-{
-   if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-   ((unsigned long)ptr & 3))
-   return -EFAULT;
-
-   pagefault_disable();
-   if (!__get_user_inatomic(*ret, ptr)) {
-   pagefault_enable();
-   return 0;
-   }
-   pagefault_enable();
-
-   return read_user_stack_slow(ptr, ret, 4);
-}
-
 static inline int valid_user_sp(unsigned long sp, int is_64)
 {
 if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
@@ -285,25 +269,9 @@ static void perf_callchain_user_64(struct 
perf_callchain_entry_ctx *entry,
 }
 
 #else  /* CONFIG_PPC64 */
-/*
- * On 32-bit we just access the address and let hash_page create a
- * HPTE if necessary, so there is no need to fall back to reading
- * the page tables.  Since this is called at interrupt level,
- * do_page_fault() won't treat a DSI as a page fault.
- */
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
 {
-   int rc;
-
-   if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-   ((unsigned long)ptr & 3))
-   return -EFAULT;
-
-   pagefault_disable();
-   rc = __get_user_inatomic(*ret, ptr);
-   pagefault_enable();
-
-   return rc;
+   return 0;
 }
 
 static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx 
*entry,
@@ -326,6 +294,29 @@ static inline int valid_user_sp(unsigned long sp, int 
is_64)
 
 #endif /* CONFIG_PPC64 */
 
+/*
+ * On 32-bit we just access the address and let hash_page create a
+ * HPTE if necessary, so there is no need to fall back to reading
+ * the page tables.  Since this is called at interrupt level,
+ * do_page_fault() won't treat a DSI as a page fault.
+ */
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+{
+   int rc;
+
+   if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
+   ((unsigned long)ptr & 3))
+   return -EFAULT;
+
+   pagefault_disable();
+   rc = __get_user_inatomic(*ret, ptr);
+   pagefault_enable();
+
+   if (IS_ENABLED(CONFIG_PPC64) && rc)
+   return read_user_stack_slow(ptr, ret, 4);
+   return rc;
+}
+
 /*
  * Layout for non-RT signal frames
  */
-- 
2.23.0



[PATCH v2 rebase 28/34] powerpc: move common register copy functions from signal_32.c to signal.c

2019-11-27 Thread Michal Suchanek
These functions are required for 64bit as well.

Signed-off-by: Michal Suchanek 
Reviewed-by: Christophe Leroy 
---
 arch/powerpc/kernel/signal.c| 141 
 arch/powerpc/kernel/signal_32.c | 140 ---
 2 files changed, 141 insertions(+), 140 deletions(-)

diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index e6c30cee6abf..60436432399f 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -18,12 +18,153 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
 #include "signal.h"
 
+#ifdef CONFIG_VSX
+unsigned long copy_fpr_to_user(void __user *to,
+  struct task_struct *task)
+{
+   u64 buf[ELF_NFPREG];
+   int i;
+
+   /* save FPR copy to local buffer then write to the thread_struct */
+   for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+   buf[i] = task->thread.TS_FPR(i);
+   buf[i] = task->thread.fp_state.fpscr;
+   return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
+}
+
+unsigned long copy_fpr_from_user(struct task_struct *task,
+void __user *from)
+{
+   u64 buf[ELF_NFPREG];
+   int i;
+
+   if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
+   return 1;
+   for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+   task->thread.TS_FPR(i) = buf[i];
+   task->thread.fp_state.fpscr = buf[i];
+
+   return 0;
+}
+
+unsigned long copy_vsx_to_user(void __user *to,
+  struct task_struct *task)
+{
+   u64 buf[ELF_NVSRHALFREG];
+   int i;
+
+   /* save FPR copy to local buffer then write to the thread_struct */
+   for (i = 0; i < ELF_NVSRHALFREG; i++)
+   buf[i] = task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET];
+   return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
+}
+
+unsigned long copy_vsx_from_user(struct task_struct *task,
+void __user *from)
+{
+   u64 buf[ELF_NVSRHALFREG];
+   int i;
+
+   if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
+   return 1;
+   for (i = 0; i < ELF_NVSRHALFREG ; i++)
+   task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+   return 0;
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+unsigned long copy_ckfpr_to_user(void __user *to,
+ struct task_struct *task)
+{
+   u64 buf[ELF_NFPREG];
+   int i;
+
+   /* save FPR copy to local buffer then write to the thread_struct */
+   for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+   buf[i] = task->thread.TS_CKFPR(i);
+   buf[i] = task->thread.ckfp_state.fpscr;
+   return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
+}
+
+unsigned long copy_ckfpr_from_user(struct task_struct *task,
+ void __user *from)
+{
+   u64 buf[ELF_NFPREG];
+   int i;
+
+   if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
+   return 1;
+   for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+   task->thread.TS_CKFPR(i) = buf[i];
+   task->thread.ckfp_state.fpscr = buf[i];
+
+   return 0;
+}
+
+unsigned long copy_ckvsx_to_user(void __user *to,
+ struct task_struct *task)
+{
+   u64 buf[ELF_NVSRHALFREG];
+   int i;
+
+   /* save FPR copy to local buffer then write to the thread_struct */
+   for (i = 0; i < ELF_NVSRHALFREG; i++)
+   buf[i] = task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET];
+   return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
+}
+
+unsigned long copy_ckvsx_from_user(struct task_struct *task,
+ void __user *from)
+{
+   u64 buf[ELF_NVSRHALFREG];
+   int i;
+
+   if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
+   return 1;
+   for (i = 0; i < ELF_NVSRHALFREG ; i++)
+   task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+   return 0;
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#else
+inline unsigned long copy_fpr_to_user(void __user *to,
+ struct task_struct *task)
+{
+   return __copy_to_user(to, task->thread.fp_state.fpr,
+ ELF_NFPREG * sizeof(double));
+}
+
+inline unsigned long copy_fpr_from_user(struct task_struct *task,
+   void __user *from)
+{
+   return __copy_from_user(task->thread.fp_state.fpr, from,
+ ELF_NFPREG * sizeof(double));
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+inline unsigned long copy_ckfpr_to_user(void __user *to,
+struct task_struct *task)
+{
+   return __copy_to_user(to, task->thread.ckfp_state.fpr,
+ ELF_NFPREG * sizeof(double));
+}
+

[PATCH v2 rebase 27/34] powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro

2019-11-27 Thread Michal Suchanek
This partially reverts commit caf6f9c8a326 ("asm-generic: Remove
unneeded __ARCH_WANT_SYS_LLSEEK macro")

When CONFIG_COMPAT is disabled on ppc64 the kernel does not build.

There is resistance to both removing the llseek syscall from the 64bit
syscall tables and building the llseek interface unconditionally.

Link: https://lore.kernel.org/lkml/20190828151552.ga16...@infradead.org/
Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/

Signed-off-by: Michal Suchanek 
Reviewed-by: Arnd Bergmann 
---
 arch/powerpc/include/asm/unistd.h | 1 +
 fs/read_write.c   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index b0720c7c3fcf..700fcdac2e3c 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -31,6 +31,7 @@
 #define __ARCH_WANT_SYS_SOCKETCALL
 #define __ARCH_WANT_SYS_FADVISE64
 #define __ARCH_WANT_SYS_GETPGRP
+#define __ARCH_WANT_SYS_LLSEEK
 #define __ARCH_WANT_SYS_NICE
 #define __ARCH_WANT_SYS_OLD_GETRLIMIT
 #define __ARCH_WANT_SYS_OLD_UNAME
diff --git a/fs/read_write.c b/fs/read_write.c
index 5bbf587f5bc1..89aa2701dbeb 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -331,7 +331,8 @@ COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, 
compat_off_t, offset, unsigned i
 }
 #endif
 
-#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT)
+#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) || \
+   defined(__ARCH_WANT_SYS_LLSEEK)
 SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
unsigned long, offset_low, loff_t __user *, result,
unsigned int, whence)
-- 
2.23.0



[PATCH v2 rebase 26/34] powerpc/64: system call: Fix sparse warning about missing declaration

2019-11-27 Thread Michal Suchanek
Sparse warns about missing declarations for these functions:

+arch/powerpc/kernel/syscall_64.c:108:23: warning: symbol 
'syscall_exit_prepare' was not declared. Should it be static?
+arch/powerpc/kernel/syscall_64.c:18:6: warning: symbol 'system_call_exception' 
was not declared. Should it be static?
+arch/powerpc/kernel/syscall_64.c:200:23: warning: symbol 
'interrupt_exit_user_prepare' was not declared. Should it be static?
+arch/powerpc/kernel/syscall_64.c:288:23: warning: symbol 
'interrupt_exit_kernel_prepare' was not declared. Should it be static?

Add declarations for them.

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/include/asm/asm-prototypes.h | 6 ++
 arch/powerpc/kernel/syscall_64.c  | 1 +
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 399ca63196e4..841746357833 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -96,6 +96,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, 
fd_set __user *exp, s
 unsigned long __init early_init(unsigned long dt_ptr);
 void __init machine_init(u64 dt_ptr);
 #endif
+#ifdef CONFIG_PPC64
+long system_call_exception(long r3, long r4, long r5, long r6, long r7, long 
r8, unsigned long r0, struct pt_regs *regs);
+notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs 
*regs);
+notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, 
unsigned long msr);
+notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, 
unsigned long msr);
+#endif
 
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
  u32 len_high, u32 len_low);
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index d00cfc4a39a9..62f44c3072f3 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -1,4 +1,5 @@
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.23.0



[PATCH v2 rebase 25/34] powerpc/64s/exception: remove lite interrupt return

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

The difference between lite and regular returns is that the regular case
restores all NVGPRs, whereas lite skips that. This is quite clumsy,
though: most interrupts want the NVGPRs saved for debugging, not to
modify in the caller, so the NVGPRs restore is not necessary most of
the time. Restore NVGPRs explicitly for one case that requires it,
and move everything else over to avoiding the restore unless the
interrupt return demands it (e.g., handling a signal).

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S   |  4 
 arch/powerpc/kernel/exceptions-64s.S | 21 +++--
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index b2e68f5ca8f7..00173cc904ef 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -452,10 +452,6 @@ _GLOBAL(fast_interrupt_return)
 
.balign IFETCH_ALIGN_BYTES
 _GLOBAL(interrupt_return)
-   REST_NVGPRS(r1)
-
-   .balign IFETCH_ALIGN_BYTES
-_GLOBAL(interrupt_return_lite)
ld  r4,_MSR(r1)
andi.   r0,r4,MSR_PR
beq kernel_interrupt_return
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 269edd1460be..1bccc869ebd3 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1507,7 +1507,7 @@ EXC_COMMON_BEGIN(hardware_interrupt_common)
RUNLATCH_ON
addir3,r1,STACK_FRAME_OVERHEAD
bl  do_IRQ
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM hardware_interrupt
 
@@ -1694,7 +1694,7 @@ EXC_COMMON_BEGIN(decrementer_common)
RUNLATCH_ON
addir3,r1,STACK_FRAME_OVERHEAD
bl  timer_interrupt
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM decrementer
 
@@ -1785,7 +1785,7 @@ EXC_COMMON_BEGIN(doorbell_super_common)
 #else
bl  unknown_exception
 #endif
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM doorbell_super
 
@@ -2183,7 +2183,7 @@ EXC_COMMON_BEGIN(h_doorbell_common)
 #else
bl  unknown_exception
 #endif
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM h_doorbell
 
@@ -2213,7 +2213,7 @@ EXC_COMMON_BEGIN(h_virt_irq_common)
RUNLATCH_ON
addir3,r1,STACK_FRAME_OVERHEAD
bl  do_IRQ
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM h_virt_irq
 
@@ -2260,7 +2260,7 @@ EXC_COMMON_BEGIN(performance_monitor_common)
RUNLATCH_ON
addir3,r1,STACK_FRAME_OVERHEAD
bl  performance_monitor_exception
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM performance_monitor
 
@@ -3013,7 +3013,7 @@ do_hash_page:
 cmpdi  r3,0/* see if __hash_page succeeded */
 
/* Success */
-   beq interrupt_return_lite   /* Return from exception on success */
+   beq interrupt_return/* Return from exception on success */
 
/* Error */
blt-13f
@@ -3027,10 +3027,11 @@ do_hash_page:
 handle_page_fault:
 11:andis.  r0,r5,DSISR_DABRMATCH@h
bne-handle_dabr_fault
+   bl  save_nvgprs
addir3,r1,STACK_FRAME_OVERHEAD
bl  do_page_fault
cmpdi   r3,0
-   beq+interrupt_return_lite
+   beq+interrupt_return
mr  r5,r3
addir3,r1,STACK_FRAME_OVERHEAD
ld  r4,_DAR(r1)
@@ -3045,9 +3046,9 @@ handle_dabr_fault:
bl  do_break
/*
 * do_break() may have changed the NV GPRS while handling a breakpoint.
-* If so, we need to restore them with their updated values. Don't use
-* interrupt_return_lite here.
+* If so, we need to restore them with their updated values.
 */
+   REST_NVGPRS(r1)
b   interrupt_return
 
 
-- 
2.23.0



[PATCH v2 rebase 24/34] powerpc/64s: interrupt return in C

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

Implement the bulk of interrupt return logic in C. The asm return code
must handle a few cases: restoring full GPRs, and emulating stack store.

The asm return code is moved into 64e for now. The new logic has made
allowance for 64e, but I don't have a full environment that works well
to test it, and even booting in emulated qemu is not great for stress
testing. 64e shouldn't be too far off working with this, given a bit
more testing and auditing of the logic.

This is slightly faster on a POWER9 (page fault speed increases about
1.1%), probably due to reduced mtmsrd.

Signed-off-by: Nicholas Piggin 
[ms: Move the FP restore functions to restore_math. They are not used
anywhere else and when restore_math is not built gcc warns about them
being unused.
Add asm/context_tracking.h include to exceptions-64e.S for SCHEDULE_USER
definition.]
Signed-off-by: Michal Suchanek 
---
 .../powerpc/include/asm/book3s/64/kup-radix.h |  10 +
 arch/powerpc/include/asm/switch_to.h  |   6 +
 arch/powerpc/kernel/entry_64.S| 475 --
 arch/powerpc/kernel/exceptions-64e.S  | 255 +-
 arch/powerpc/kernel/exceptions-64s.S  | 119 ++---
 arch/powerpc/kernel/process.c |  89 ++--
 arch/powerpc/kernel/syscall_64.c  | 157 +-
 arch/powerpc/kernel/vector.S  |   2 +-
 8 files changed, 623 insertions(+), 490 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 07058edc5970..762afbed4762 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -60,6 +60,12 @@
 #include 
 #include 
 
+static inline void kuap_restore_amr(struct pt_regs *regs)
+{
+   if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   mtspr(SPRN_AMR, regs->kuap);
+}
+
 static inline void kuap_check_amr(void)
 {
if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && 
mmu_has_feature(MMU_FTR_RADIX_KUAP))
@@ -110,6 +116,10 @@ static inline bool bad_kuap_fault(struct pt_regs *regs, 
bool is_write)
"Bug: %s fault blocked by AMR!", is_write ? "Write" : 
"Read");
 }
 #else /* CONFIG_PPC_KUAP */
+static inline void kuap_restore_amr(struct pt_regs *regs)
+{
+}
+
 static inline void kuap_check_amr(void)
 {
 }
diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 476008bc3d08..b867b58b1093 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -23,7 +23,13 @@ extern void switch_booke_debug_regs(struct debug_reg 
*new_debug);
 
 extern int emulate_altivec(struct pt_regs *);
 
+#ifdef CONFIG_PPC_BOOK3S_64
 void restore_math(struct pt_regs *regs);
+#else
+static inline void restore_math(struct pt_regs *regs)
+{
+}
+#endif
 
 void restore_tm_state(struct pt_regs *regs);
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 15bc2a872a76..b2e68f5ca8f7 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -16,6 +16,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -279,7 +280,7 @@ flush_count_cache:
  * state of one is saved on its kernel stack.  Then the state
  * of the other is restored from its kernel stack.  The memory
  * management hardware is updated to the second process's state.
- * Finally, we can return to the second process, via ret_from_except.
+ * Finally, we can return to the second process, via interrupt_return.
  * On entry, r3 points to the THREAD for the current task, r4
  * points to the THREAD for the new task.
  *
@@ -433,408 +434,150 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
addir1,r1,SWITCH_FRAME_SIZE
blr
 
-   .align  7
-_GLOBAL(ret_from_except)
-   ld  r11,_TRAP(r1)
-   andi.   r0,r11,1
-   bne ret_from_except_lite
-   REST_NVGPRS(r1)
-
-_GLOBAL(ret_from_except_lite)
+#ifdef CONFIG_PPC_BOOK3S
/*
-* Disable interrupts so that current_thread_info()->flags
-* can't change between when we test it and when we return
-* from the interrupt.
-*/
-#ifdef CONFIG_PPC_BOOK3E
-   wrteei  0
-#else
-   li  r10,MSR_RI
-   mtmsrd  r10,1 /* Update machine state */
-#endif /* CONFIG_PPC_BOOK3E */
+* If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not
+* touched, AMR not set, no exit work created, then this can be used.
+*/
+   .balign IFETCH_ALIGN_BYTES
+_GLOBAL(fast_interrupt_return)
+   ld  r4,_MSR(r1)
+   andi.   r0,r4,MSR_PR
+   bne .Lfast_user_interrupt_return
+   andi.   r0,r4,MSR_RI
+   bne+.Lfast_kernel_interrupt_return
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+   bl  unrecoverable_exception
+   b   . /* should not get here */
 
-   ld  r9, PACA_THREAD_INFO(r13)
-   ld  r3,_MSR(r1)
-#ifdef 

[PATCH v2 rebase 23/34] powerpc/64: system call implement the bulk of the logic in C

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

System call entry and particularly exit code is beyond the limit of what
is reasonable to implement in asm.

This conversion moves all conditional branches out of the asm code,
except for the case that all GPRs should be restored at exit.

Null syscall test is about 5% faster after this patch, because the exit
work is handled under local_irq_disable, and the hard mask and pending
interrupt replay is handled after that, which avoids games with MSR.

Signed-off-by: Nicholas Piggin 
[ms: add endian conversion for dtl_idx]
Signed-off-by: Michal Suchanek 

v3:
- Fix !KUAP build [mpe]
- Fix BookE build/boot [mpe]
- Don't trace irqs with MSR[RI]=0
- Don't allow syscall_exit_prepare to be ftraced, because function
  graph tracing which traces exits barfs after the IRQ state is
  prepared for kernel exit.
- Fix BE syscall table to use normal function descriptors now that they
  are called from C.
- Comment syscall_exit_prepare.
---
 arch/powerpc/include/asm/asm-prototypes.h |  11 -
 .../powerpc/include/asm/book3s/64/kup-radix.h |  14 +-
 arch/powerpc/include/asm/cputime.h|  24 ++
 arch/powerpc/include/asm/hw_irq.h |   4 +
 arch/powerpc/include/asm/ptrace.h |   3 +
 arch/powerpc/include/asm/signal.h |   3 +
 arch/powerpc/include/asm/switch_to.h  |   5 +
 arch/powerpc/include/asm/time.h   |   3 +
 arch/powerpc/kernel/Makefile  |   3 +-
 arch/powerpc/kernel/entry_64.S| 337 +++---
 arch/powerpc/kernel/signal.h  |   2 -
 arch/powerpc/kernel/syscall_64.c  | 195 ++
 arch/powerpc/kernel/systbl.S  |   9 +-
 13 files changed, 300 insertions(+), 313 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_64.c

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 8561498e653c..399ca63196e4 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -103,14 +103,6 @@ long sys_switch_endian(void);
 notrace unsigned int __check_irq_replay(void);
 void notrace restore_interrupts(void);
 
-/* ptrace */
-long do_syscall_trace_enter(struct pt_regs *regs);
-void do_syscall_trace_leave(struct pt_regs *regs);
-
-/* process */
-void restore_math(struct pt_regs *regs);
-void restore_tm_state(struct pt_regs *regs);
-
 /* prom_init (OpenFirmware) */
 unsigned long __init prom_init(unsigned long r3, unsigned long r4,
   unsigned long pp,
@@ -121,9 +113,6 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4,
 void __init early_setup(unsigned long dt_ptr);
 void early_setup_secondary(void);
 
-/* time */
-void accumulate_stolen_time(void);
-
 /* misc runtime */
 extern u64 __bswapdi2(u64);
 extern s64 __lshrdi3(s64, int);
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index f254de956d6a..07058edc5970 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -3,6 +3,7 @@
 #define _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
 
 #include 
+#include 
 
#define AMR_KUAP_BLOCK_READ	UL(0x4000)
 #define AMR_KUAP_BLOCK_WRITE   UL(0x8000)
@@ -56,7 +57,14 @@
 
 #ifdef CONFIG_PPC_KUAP
 
-#include 
+#include 
+#include 
+
+static inline void kuap_check_amr(void)
+{
+	if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
+}
 
 /*
  * We support individually allowing read or write, but we don't support nesting
@@ -101,6 +109,10 @@ static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write)
		(regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
		"Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
 }
+#else /* CONFIG_PPC_KUAP */
+static inline void kuap_check_amr(void)
+{
+}
 #endif /* CONFIG_PPC_KUAP */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index 2431b4ada2fa..c43614cffaac 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -60,6 +60,30 @@ static inline void arch_vtime_task_switch(struct task_struct *prev)
 }
 #endif
 
+static inline void account_cpu_user_entry(void)
+{
+   unsigned long tb = mftb();
+   struct cpu_accounting_data *acct = get_accounting(current);
+
+   acct->utime += (tb - acct->starttime_user);
+   acct->starttime = tb;
+}
+static inline void account_cpu_user_exit(void)
+{
+   unsigned long tb = mftb();
+   struct cpu_accounting_data *acct = get_accounting(current);
+
+   acct->stime += (tb - acct->starttime);
+   acct->starttime_user = tb;
+}
+
 #endif /* __KERNEL__ */
+#else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
+static inline void 

[PATCH v2 rebase 22/34] powerpc/64: system call remove non-volatile GPR save optimisation

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

powerpc has an optimisation where interrupts avoid saving the
non-volatile (or callee saved) registers to the interrupt stack frame if
they are not required.

Two problems with this are that an interrupt does not always know
whether it will need non-volatiles; and if it does need them, they can
only be saved from the entry-scoped asm code (because we don't control
what the C compiler does with these registers).

system calls are the most difficult: some system calls always require
all registers (e.g., fork, to copy regs into the child).  Sometimes
registers are only required under certain conditions (e.g., tracing,
signal delivery). These cases require ugly logic in the call chains
(e.g., ppc_fork), and require a lot of logic to be implemented in asm.

So remove the optimisation for system calls, and always save NVGPRs on
entry. Modern high performance CPUs are not so sensitive, because the
stores are dense in cache and can be hidden by other expensive work in
the syscall path -- the null syscall selftests benchmark on POWER9 is
not slowed (124.40ns before and 123.64ns after, i.e., within the noise).

Other interrupts retain the NVGPR optimisation for now.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S   | 72 +---
 arch/powerpc/kernel/syscalls/syscall.tbl | 22 +---
 2 files changed, 28 insertions(+), 66 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6467bdab8d40..5a3e0b5c9ad1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -98,13 +98,14 @@ END_BTB_FLUSH_SECTION
std r11,_XER(r1)
std r11,_CTR(r1)
std r9,GPR13(r1)
+   SAVE_NVGPRS(r1)
	mflr	r10
/*
 * This clears CR0.SO (bit 28), which is the error indication on
 * return from this system call.
 */
rldimi  r2,r11,28,(63-28)
-   li  r11,0xc01
+   li  r11,0xc00
std r10,_LINK(r1)
std r11,_TRAP(r1)
std r3,ORIG_GPR3(r1)
@@ -323,7 +324,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 /* Traced system call support */
 .Lsyscall_dotrace:
-   bl  save_nvgprs
	addi	r3,r1,STACK_FRAME_OVERHEAD
bl  do_syscall_trace_enter
 
@@ -408,7 +408,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
mtmsrd  r10,1
 #endif /* CONFIG_PPC_BOOK3E */
 
-   bl  save_nvgprs
	addi	r3,r1,STACK_FRAME_OVERHEAD
bl  do_syscall_trace_leave
b   ret_from_except
@@ -442,62 +441,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 _ASM_NOKPROBE_SYMBOL(system_call_common);
 _ASM_NOKPROBE_SYMBOL(system_call_exit);
 
-/* Save non-volatile GPRs, if not already saved. */
-_GLOBAL(save_nvgprs)
-   ld  r11,_TRAP(r1)
-   andi.   r0,r11,1
-   beqlr-
-   SAVE_NVGPRS(r1)
-   clrrdi  r0,r11,1
-   std r0,_TRAP(r1)
-   blr
-_ASM_NOKPROBE_SYMBOL(save_nvgprs);
-
-   
-/*
- * The sigsuspend and rt_sigsuspend system calls can call do_signal
- * and thus put the process into the stopped state where we might
- * want to examine its user state with ptrace.  Therefore we need
- * to save all the nonvolatile registers (r14 - r31) before calling
- * the C code.  Similarly, fork, vfork and clone need the full
- * register state on the stack so that it can be copied to the child.
- */
-
-_GLOBAL(ppc_fork)
-   bl  save_nvgprs
-   bl  sys_fork
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc_vfork)
-   bl  save_nvgprs
-   bl  sys_vfork
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc_clone)
-   bl  save_nvgprs
-   bl  sys_clone
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc_clone3)
-   bl  save_nvgprs
-   bl  sys_clone3
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc32_swapcontext)
-   bl  save_nvgprs
-   bl  compat_sys_swapcontext
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc64_swapcontext)
-   bl  save_nvgprs
-   bl  sys_swapcontext
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc_switch_endian)
-   bl  save_nvgprs
-   bl  sys_switch_endian
-   b   .Lsyscall_exit
-
 _GLOBAL(ret_from_fork)
bl  schedule_tail
REST_NVGPRS(r1)
@@ -516,6 +459,17 @@ _GLOBAL(ret_from_kernel_thread)
li  r3,0
b   .Lsyscall_exit
 
+/* Save non-volatile GPRs, if not already saved. */
+_GLOBAL(save_nvgprs)
+   ld  r11,_TRAP(r1)
+   andi.   r0,r11,1
+   beqlr-
+   SAVE_NVGPRS(r1)
+   clrrdi  r0,r11,1
+   std r0,_TRAP(r1)
+   blr
+_ASM_NOKPROBE_SYMBOL(save_nvgprs);
+
 #ifdef CONFIG_PPC_BOOK3S_64
 
 #define FLUSH_COUNT_CACHE  \
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 43f736ed47f2..d899bcb5343e 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -9,7 +9,9 @@
 

[PATCH v2 rebase 21/34] powerpc/64s/exception: soft nmi interrupt should not use ret_from_except

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

The soft nmi handler does not reconcile interrupt state, so it should
not return via the normal ret_from_except path. Return like other NMIs,
using the EXCEPTION_RESTORE_REGS macro.

This becomes important when the scv interrupt is implemented, which
must handle soft-masked interrupts that have r13 set to something other
than the PACA -- returning to kernel in this case must restore r13.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 38bc66b95516..af1264cd005f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -2740,7 +2740,11 @@ EXC_COMMON_BEGIN(soft_nmi_common)
bl  save_nvgprs
	addi	r3,r1,STACK_FRAME_OVERHEAD
bl  soft_nmi_interrupt
-   b   ret_from_except
+   /* Clear MSR_RI before setting SRR0 and SRR1. */
+   li  r9,0
+   mtmsrd  r9,1
+   EXCEPTION_RESTORE_REGS hsrr=0
+   RFI_TO_KERNEL
 
 #endif /* CONFIG_PPC_WATCHDOG */
 
-- 
2.23.0



[PATCH v2 rebase 20/34] powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is supported

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

Apart from SRESET, MCE, and syscall (hcall variant), the SRR type
interrupts are not escalated to hypervisor mode, so delivered to the OS.

When running PR KVM, the OS is the hypervisor, and the guest runs with
MSR[PR]=1, so these interrupts must test if a guest was running when
interrupted. These tests are required at the real-mode entry points
because the PR KVM host runs with LPCR[AIL]=0.

In HV KVM and nested HV KVM, the guest always receives these interrupts,
so there is no need for the host to make this test. So remove the tests
if PR KVM is not configured.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 65 ++--
 1 file changed, 62 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 2f50587392aa..38bc66b95516 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -214,9 +214,36 @@ do_define_int n
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
- * If hv is possible, interrupts come into to the hv version
- * of the kvmppc_interrupt code, which then jumps to the PR handler,
- * kvmppc_interrupt_pr, if the guest is a PR guest.
+ * All interrupts which set HSRR registers, as well as SRESET and MCE and
+ * syscall when invoked with "sc 1" switch to MSR[HV]=1 (HVMODE) to be taken,
+ * so they all generally need to test whether they were taken in guest context.
+ *
+ * Note: SRESET and MCE may also be sent to the guest by the hypervisor, and be
+ * taken with MSR[HV]=0.
+ *
+ * Interrupts which set SRR registers (with the above exceptions) do not
+ * elevate to MSR[HV]=1 mode, though most can be taken when running with
+ * MSR[HV]=1  (e.g., bare metal kernel and userspace). So these interrupts do
+ * not need to test whether a guest is running because they get delivered to
+ * the guest directly, including nested HV KVM guests.
+ *
+ * The exception is PR KVM, where the guest runs with MSR[PR]=1 and the host
+ * runs with MSR[HV]=0, so the host takes all interrupts on behalf of the
+ * guest. PR KVM runs with LPCR[AIL]=0 which causes interrupts to always be
+ * delivered to the real-mode entry point, therefore such interrupts only test
+ * KVM in their real mode handlers, and only when PR KVM is possible.
+ *
+ * Interrupts that are taken in MSR[HV]=0 and escalate to MSR[HV]=1 are always
+ * delivered in real-mode when the MMU is in hash mode because the MMU
+ * registers are not set appropriately to translate host addresses. In nested
+ * radix mode these can be delivered in virt-mode as the host translations are
+ * used implicitly (see: effective LPID, effective PID).
+ */
+
+/*
+ * If an interrupt is taken while a guest is running, it is immediately routed
+ * to KVM to handle. If both HV and PR KVM are possible, KVM interrupts go first
+ * to kvmppc_interrupt_hv, which handles the PR guest case.
  */
 #define kvmppc_interrupt kvmppc_interrupt_hv
 #else
@@ -1258,8 +1285,10 @@ INT_DEFINE_BEGIN(data_access)
IVEC=0x300
IDAR=1
IDSISR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_SKIP=1
IKVM_REAL=1
+#endif
 INT_DEFINE_END(data_access)
 
 EXC_REAL_BEGIN(data_access, 0x300, 0x80)
@@ -1306,8 +1335,10 @@ INT_DEFINE_BEGIN(data_access_slb)
IAREA=PACA_EXSLB
IRECONCILE=0
IDAR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_SKIP=1
IKVM_REAL=1
+#endif
 INT_DEFINE_END(data_access_slb)
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
@@ -1357,7 +1388,9 @@ INT_DEFINE_BEGIN(instruction_access)
IISIDE=1
IDAR=1
IDSISR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
+#endif
 INT_DEFINE_END(instruction_access)
 
 EXC_REAL_BEGIN(instruction_access, 0x400, 0x80)
@@ -1396,7 +1429,9 @@ INT_DEFINE_BEGIN(instruction_access_slb)
IRECONCILE=0
IISIDE=1
IDAR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
+#endif
 INT_DEFINE_END(instruction_access_slb)
 
 EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80)
@@ -1488,7 +1523,9 @@ INT_DEFINE_BEGIN(alignment)
IVEC=0x600
IDAR=1
IDSISR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
+#endif
 INT_DEFINE_END(alignment)
 
 EXC_REAL_BEGIN(alignment, 0x600, 0x100)
@@ -1518,7 +1555,9 @@ EXC_COMMON_BEGIN(alignment_common)
  */
 INT_DEFINE_BEGIN(program_check)
IVEC=0x700
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
+#endif
 INT_DEFINE_END(program_check)
 
 EXC_REAL_BEGIN(program_check, 0x700, 0x100)
@@ -1581,7 +1620,9 @@ EXC_COMMON_BEGIN(program_check_common)
 INT_DEFINE_BEGIN(fp_unavailable)
IVEC=0x800
IRECONCILE=0
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
+#endif
 INT_DEFINE_END(fp_unavailable)
 
 EXC_REAL_BEGIN(fp_unavailable, 0x800, 0x100)
@@ -1643,7 +1684,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 INT_DEFINE_BEGIN(decrementer)
IVEC=0x900

[PATCH v2 rebase 19/34] powerpc/64s/exception: add more comments for interrupt handlers

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

A few of the non-standard handlers are left uncommented. Some more
description could be added to some.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 391 ---
 1 file changed, 353 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ef37d0ab6594..2f50587392aa 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -121,26 +121,26 @@ name:
 /*
  * Interrupt code generation macros
  */
-#define IVEC   .L_IVEC_\name\()
-#define IHSRR  .L_IHSRR_\name\()
-#define IHSRR_IF_HVMODE	.L_IHSRR_IF_HVMODE_\name\()
-#define IAREA  .L_IAREA_\name\()
-#define IVIRT  .L_IVIRT_\name\()
-#define IISIDE .L_IISIDE_\name\()
-#define IDAR   .L_IDAR_\name\()
-#define IDSISR .L_IDSISR_\name\()
-#define ISET_RI	.L_ISET_RI_\name\()
-#define IBRANCH_TO_COMMON  .L_IBRANCH_TO_COMMON_\name\()
-#define IREALMODE_COMMON   .L_IREALMODE_COMMON_\name\()
-#define IMASK  .L_IMASK_\name\()
-#define IKVM_SKIP  .L_IKVM_SKIP_\name\()
-#define IKVM_REAL  .L_IKVM_REAL_\name\()
+#define IVEC	.L_IVEC_\name\()	/* Interrupt vector address */
+#define IHSRR  .L_IHSRR_\name\()   /* Sets SRR or HSRR registers */
+#define IHSRR_IF_HVMODE	.L_IHSRR_IF_HVMODE_\name\() /* HSRR if HV else SRR */
+#define IAREA  .L_IAREA_\name\()   /* PACA save area */
+#define IVIRT  .L_IVIRT_\name\()   /* Has virt mode entry point */
+#define IISIDE .L_IISIDE_\name\()  /* Uses SRR0/1 not DAR/DSISR */
+#define IDAR   .L_IDAR_\name\()/* Uses DAR (or SRR0) */
+#define IDSISR .L_IDSISR_\name\()  /* Uses DSISR (or SRR1) */
+#define ISET_RI	.L_ISET_RI_\name\() /* Run common code w/ MSR[RI]=1 */
+#define IBRANCH_TO_COMMON	.L_IBRANCH_TO_COMMON_\name\() /* ENTRY branch to common */
+#define IREALMODE_COMMON	.L_IREALMODE_COMMON_\name\() /* Common runs in realmode */
+#define IMASK  .L_IMASK_\name\()   /* IRQ soft-mask bit */
+#define IKVM_SKIP  .L_IKVM_SKIP_\name\()   /* Generate KVM skip handler */
+#define IKVM_REAL  .L_IKVM_REAL_\name\()   /* Real entry tests KVM */
 #define __IKVM_REAL(name)  .L_IKVM_REAL_ ## name
-#define IKVM_VIRT  .L_IKVM_VIRT_\name\()
-#define ISTACK .L_ISTACK_\name\()
+#define IKVM_VIRT  .L_IKVM_VIRT_\name\()   /* Virt entry tests KVM */
+#define ISTACK .L_ISTACK_\name\()  /* Set regular kernel stack */
 #define __ISTACK(name) .L_ISTACK_ ## name
-#define IRECONCILE .L_IRECONCILE_\name\()
-#define IKUAP  .L_IKUAP_\name\()
+#define IRECONCILE .L_IRECONCILE_\name\()  /* Do RECONCILE_IRQ_STATE */
+#define IKUAP  .L_IKUAP_\name\()   /* Do KUAP lock */
 
 #define INT_DEFINE_BEGIN(n)\
 .macro int_define_ ## n name
@@ -759,6 +759,39 @@ __start_interrupts:
 EXC_VIRT_NONE(0x4000, 0x100)
 
 
+/**
+ * Interrupt 0x100 - System Reset Interrupt (SRESET aka NMI).
+ * This is a non-maskable, asynchronous interrupt always taken in real-mode.
+ * It is caused by:
+ * - Wake from power-saving state, on powernv.
+ * - An NMI from another CPU, triggered by firmware or hypercall.
+ * - As crash/debug signal injected from BMC, firmware or hypervisor.
+ *
+ * Handling:
+ * Power-save wakeup is the only performance critical path, so this is
+ * determined quickly as possible first. In this case volatile registers
+ * can be discarded and SPRs like CFAR don't need to be read.
+ *
+ * If not a powersave wakeup, then it's run as a regular interrupt, however
+ * it uses its own stack and PACA save area to preserve the regular kernel
+ * environment for debugging.
+ *
+ * This interrupt is not maskable, so triggering it when MSR[RI] is clear,
+ * or SCRATCH0 is in use, etc. may cause a crash. It's also not entirely
+ * correct to switch to virtual mode to run the regular interrupt handler
+ * because it might be interrupted when the MMU is in a bad state (e.g., SLB
+ * is clear).
+ *
+ * FWNMI:
+ * PAPR specifies a "fwnmi" facility which sends the sreset to a different
+ * entry point with a different register set up. Some hypervisors will
+ * send the sreset to 0x100 in the guest if it is not fwnmi capable.
+ *
+ * KVM:
+ * Unlike most SRR interrupts, this may be taken by the host while executing
+ * in a guest, so a KVM test is required. KVM will pull the CPU out of guest
+ * mode and then raise the sreset.
+ */
 INT_DEFINE_BEGIN(system_reset)
IVEC=0x100
IAREA=PACA_EXNMI
@@ -834,6 +867,7 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake)
  * Vectors for the FWNMI option.  Share common code.
  */
 TRAMP_REAL_BEGIN(system_reset_fwnmi)
+   /* XXX: fwnmi guest could run a nested/PR guest, so why no test?  */

[PATCH v2 rebase 18/34] powerpc/64s/exception: Clean up SRR specifiers

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

Remove more magic numbers and replace with nicely named bools.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 68 +---
 1 file changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 9494403b9586..ef37d0ab6594 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -105,11 +105,6 @@ name:
ori reg,reg,(ABS_ADDR(label))@l;\
addis   reg,reg,(ABS_ADDR(label))@h
 
-/* Exception register prefixes */
-#define EXC_HV_OR_STD  2 /* depends on HVMODE */
-#define EXC_HV 1
-#define EXC_STD0
-
 /*
  * Branch to label using its 0xC000 address. This results in instruction
  * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned
@@ -128,6 +123,7 @@ name:
  */
 #define IVEC   .L_IVEC_\name\()
 #define IHSRR  .L_IHSRR_\name\()
+#define IHSRR_IF_HVMODE	.L_IHSRR_IF_HVMODE_\name\()
 #define IAREA  .L_IAREA_\name\()
 #define IVIRT  .L_IVIRT_\name\()
 #define IISIDE .L_IISIDE_\name\()
@@ -159,7 +155,10 @@ do_define_int n
.error "IVEC not defined"
.endif
.ifndef IHSRR
-   IHSRR=EXC_STD
+   IHSRR=0
+   .endif
+   .ifndef IHSRR_IF_HVMODE
+   IHSRR_IF_HVMODE=0
.endif
.ifndef IAREA
IAREA=PACA_EXGEN
@@ -257,7 +256,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld  r9,IAREA+EX_R9(r13)
ld  r10,IAREA+EX_R10(r13)
/* HSRR variants have the 0x2 bit added to their trap number */
-   .if IHSRR == EXC_HV_OR_STD
+   .if IHSRR_IF_HVMODE
BEGIN_FTR_SECTION
ori r12,r12,(IVEC + 0x2)
FTR_SECTION_ELSE
@@ -278,7 +277,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld  r10,IAREA+EX_R10(r13)
ld  r11,IAREA+EX_R11(r13)
ld  r12,IAREA+EX_R12(r13)
-   .if IHSRR == EXC_HV_OR_STD
+   .if IHSRR_IF_HVMODE
BEGIN_FTR_SECTION
b   kvmppc_skip_Hinterrupt
FTR_SECTION_ELSE
@@ -403,7 +402,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
stw r10,IAREA+EX_DSISR(r13)
.endif
 
-   .if IHSRR == EXC_HV_OR_STD
+   .if IHSRR_IF_HVMODE
BEGIN_FTR_SECTION
mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
@@ -485,7 +484,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
.abort "Bad maskable vector"
.endif
 
-   .if IHSRR == EXC_HV_OR_STD
+   .if IHSRR_IF_HVMODE
BEGIN_FTR_SECTION
bne masked_Hinterrupt
FTR_SECTION_ELSE
@@ -618,12 +617,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
  */
-.macro EXCEPTION_RESTORE_REGS hsrr
+.macro EXCEPTION_RESTORE_REGS hsrr=0
/* Move original SRR0 and SRR1 into the respective regs */
ld  r9,_MSR(r1)
-   .if \hsrr == EXC_HV_OR_STD
-   .error "EXC_HV_OR_STD Not implemented for EXCEPTION_RESTORE_REGS"
-   .endif
.if \hsrr
mtspr   SPRN_HSRR1,r9
.else
@@ -898,7 +894,7 @@ EXC_COMMON_BEGIN(system_reset_common)
ld  r10,SOFTE(r1)
stb r10,PACAIRQSOFTMASK(r13)
 
-   EXCEPTION_RESTORE_REGS EXC_STD
+   EXCEPTION_RESTORE_REGS
RFI_TO_USER_OR_KERNEL
 
GEN_KVM system_reset
@@ -952,7 +948,7 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi)
lhz r12,PACA_IN_MCE(r13);   \
	subi	r12,r12,1;	\
sth r12,PACA_IN_MCE(r13);   \
-   EXCEPTION_RESTORE_REGS EXC_STD
+   EXCEPTION_RESTORE_REGS
 
 EXC_COMMON_BEGIN(machine_check_early_common)
/*
@@ -1321,7 +1317,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 INT_DEFINE_BEGIN(hardware_interrupt)
IVEC=0x500
-   IHSRR=EXC_HV_OR_STD
+   IHSRR_IF_HVMODE=1
IMASK=IRQS_DISABLED
IKVM_REAL=1
IKVM_VIRT=1
@@ -1490,7 +1486,7 @@ EXC_COMMON_BEGIN(decrementer_common)
 
 INT_DEFINE_BEGIN(hdecrementer)
IVEC=0x980
-   IHSRR=EXC_HV
+   IHSRR=1
ISTACK=0
IRECONCILE=0
IKVM_REAL=1
@@ -1732,7 +1728,7 @@ EXC_COMMON_BEGIN(single_step_common)
 
 INT_DEFINE_BEGIN(h_data_storage)
IVEC=0xe00
-   IHSRR=EXC_HV
+   IHSRR=1
IDAR=1
IDSISR=1
IKVM_SKIP=1
@@ -1764,7 +1760,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
 
 INT_DEFINE_BEGIN(h_instr_storage)
IVEC=0xe20
-   IHSRR=EXC_HV
+   IHSRR=1
IKVM_REAL=1
IKVM_VIRT=1
 INT_DEFINE_END(h_instr_storage)
@@ -1787,7 +1783,7 @@ EXC_COMMON_BEGIN(h_instr_storage_common)
 
 INT_DEFINE_BEGIN(emulation_assist)
   

[PATCH v2 rebase 17/34] powerpc/64s/exception: re-inline some handlers

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

The reduction in interrupt entry size allows some handlers to be
re-inlined.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 7a234e6d7bf5..9494403b9586 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1186,7 +1186,7 @@ INT_DEFINE_BEGIN(data_access)
 INT_DEFINE_END(data_access)
 
 EXC_REAL_BEGIN(data_access, 0x300, 0x80)
-   GEN_INT_ENTRY data_access, virt=0, ool=1
+   GEN_INT_ENTRY data_access, virt=0
 EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
GEN_INT_ENTRY data_access, virt=1
@@ -1216,7 +1216,7 @@ INT_DEFINE_BEGIN(data_access_slb)
 INT_DEFINE_END(data_access_slb)
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
-   GEN_INT_ENTRY data_access_slb, virt=0, ool=1
+   GEN_INT_ENTRY data_access_slb, virt=0
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
GEN_INT_ENTRY data_access_slb, virt=1
@@ -1472,7 +1472,7 @@ INT_DEFINE_BEGIN(decrementer)
 INT_DEFINE_END(decrementer)
 
 EXC_REAL_BEGIN(decrementer, 0x900, 0x80)
-   GEN_INT_ENTRY decrementer, virt=0, ool=1
+   GEN_INT_ENTRY decrementer, virt=0
 EXC_REAL_END(decrementer, 0x900, 0x80)
 EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80)
GEN_INT_ENTRY decrementer, virt=1
-- 
2.23.0



[PATCH v2 rebase 16/34] powerpc/64s/exception: hdecrementer avoid touching the stack

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

The hdec interrupt handler is reported to sometimes fire in Linux if
KVM leaves it pending after a guest exits. This is harmless, so there
is a no-op handler for it.

The interrupt handler currently uses the regular kernel stack. Change
this to avoid touching the stack entirely.

This should be the last place where the regular Linux stack can be
accessed with asynchronous interrupts (including PMI) soft-masked.
It might be possible to take advantage of this invariant, e.g., to
context switch the kernel stack SLB entry without clearing MSR[EE].

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/time.h  |  1 -
 arch/powerpc/kernel/exceptions-64s.S | 25 -
 arch/powerpc/kernel/time.c   |  9 -
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 08dbe3e6831c..e0107495c4de 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -24,7 +24,6 @@ extern struct clock_event_device decrementer_clockevent;
 
 
 extern void generic_calibrate_decr(void);
-extern void hdec_interrupt(struct pt_regs *regs);
 
 /* Some sane defaults: 125 MHz timebase, 1GHz processor */
 extern unsigned long ppc_proc_freq;
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 9fa71d51ecf4..7a234e6d7bf5 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1491,6 +1491,8 @@ EXC_COMMON_BEGIN(decrementer_common)
 INT_DEFINE_BEGIN(hdecrementer)
IVEC=0x980
IHSRR=EXC_HV
+   ISTACK=0
+   IRECONCILE=0
IKVM_REAL=1
IKVM_VIRT=1
 INT_DEFINE_END(hdecrementer)
@@ -1502,11 +1504,24 @@ EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80)
GEN_INT_ENTRY hdecrementer, virt=1
 EXC_VIRT_END(hdecrementer, 0x4980, 0x80)
 EXC_COMMON_BEGIN(hdecrementer_common)
-   GEN_COMMON hdecrementer
-   bl  save_nvgprs
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-   bl  hdec_interrupt
-   b   ret_from_except
+   __GEN_COMMON_ENTRY hdecrementer
+   /*
+* Hypervisor decrementer interrupts not caught by the KVM test
+* shouldn't occur but are sometimes left pending on exit from a KVM
+* guest.  We don't need to do anything to clear them, as they are
+* edge-triggered.
+*
+* Be careful to avoid touching the kernel stack.
+*/
+   ld  r10,PACA_EXGEN+EX_CTR(r13)
+   mtctr   r10
+   mtcrf   0x80,r9
+   ld  r9,PACA_EXGEN+EX_R9(r13)
+   ld  r10,PACA_EXGEN+EX_R10(r13)
+   ld  r11,PACA_EXGEN+EX_R11(r13)
+   ld  r12,PACA_EXGEN+EX_R12(r13)
+   ld  r13,PACA_EXGEN+EX_R13(r13)
+   HRFI_TO_KERNEL
 
GEN_KVM hdecrementer
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 968ae97382b4..e4572d67cc76 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -663,15 +663,6 @@ void timer_broadcast_interrupt(void)
 }
 #endif
 
-/*
- * Hypervisor decrementer interrupts shouldn't occur but are sometimes
- * left pending on exit from a KVM guest.  We don't need to do anything
- * to clear them, as they are edge-triggered.
- */
-void hdec_interrupt(struct pt_regs *regs)
-{
-}
-
 #ifdef CONFIG_SUSPEND
 static void generic_suspend_disable_irqs(void)
 {
-- 
2.23.0



[PATCH v2 rebase 15/34] powerpc/64s/exception: trim unused arguments from KVMTEST macro

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index abf26db36427..9fa71d51ecf4 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -224,7 +224,7 @@ do_define_int n
 #define kvmppc_interrupt kvmppc_interrupt_pr
 #endif
 
-.macro KVMTEST name, hsrr, n
+.macro KVMTEST name
lbz r10,HSTATE_IN_GUEST(r13)
cmpwi   r10,0
bne \name\()_kvm
@@ -293,7 +293,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 .endm
 
 #else
-.macro KVMTEST name, hsrr, n
+.macro KVMTEST name
 .endm
 .macro GEN_KVM name
 .endm
@@ -437,7 +437,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 DEFINE_FIXED_SYMBOL(\name\()_common_real)
 \name\()_common_real:
.if IKVM_REAL
-   KVMTEST \name IHSRR IVEC
+   KVMTEST \name
.endif
 
ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
@@ -460,7 +460,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 \name\()_common_virt:
.if IKVM_VIRT
-   KVMTEST \name IHSRR IVEC
+   KVMTEST \name
 1:
.endif
.endif /* IVIRT */
@@ -1595,7 +1595,7 @@ INT_DEFINE_END(system_call)
GET_PACA(r13)
std r10,PACA_EXGEN+EX_R10(r13)
INTERRUPT_TO_KERNEL
-	KVMTEST system_call EXC_STD 0xc00 /* uses r10, branch to system_call_kvm */
+   KVMTEST system_call /* uses r10, branch to system_call_kvm */
mfctr   r9
 #else
mr  r9,r13
-- 
2.23.0



[PATCH v2 rebase 13/34] powerpc/64s/exception: remove confusing IEARLY option

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

Replace IEARLY=1 and IEARLY=2 with IBRANCH_TO_COMMON, which controls if
the entry code branches to a common handler; and IREALMODE_COMMON,
which controls whether the common handler should remain in real mode.

These special cases no longer avoid loading the SRR registers, there
is no point as most of them load the registers immediately anyway.
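
As a rough illustration (not part of the patch), the two flags combine
in the generated entry code like this, per the hunks below:

```asm
	/* GEN_INT_ENTRY: branch to the common handler only on request */
	.if IBRANCH_TO_COMMON
	GEN_BRANCH_TO_COMMON \name \virt
	.endif

	/* GEN_BRANCH_TO_COMMON: with IREALMODE_COMMON the handler is
	 * entered via its 0xc000... address and stays in real mode */
	.if IREALMODE_COMMON
	LOAD_HANDLER(r10, \name\()_common)
	mtctr	r10
	bctr
	.endif
```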

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 48 ++--
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 7db76e7be0aa..716a95ba814f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -174,7 +174,8 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IDAR   .L_IDAR_\name\()
 #define IDSISR .L_IDSISR_\name\()
 #define ISET_RI.L_ISET_RI_\name\()
-#define IEARLY .L_IEARLY_\name\()
+#define IBRANCH_TO_COMMON  .L_IBRANCH_TO_COMMON_\name\()
+#define IREALMODE_COMMON   .L_IREALMODE_COMMON_\name\()
 #define IMASK  .L_IMASK_\name\()
 #define IKVM_SKIP  .L_IKVM_SKIP_\name\()
 #define IKVM_REAL  .L_IKVM_REAL_\name\()
@@ -218,8 +219,15 @@ do_define_int n
.ifndef ISET_RI
ISET_RI=1
.endif
-   .ifndef IEARLY
-   IEARLY=0
+   .ifndef IBRANCH_TO_COMMON
+   IBRANCH_TO_COMMON=1
+   .endif
+   .ifndef IREALMODE_COMMON
+   IREALMODE_COMMON=0
+   .else
+   .if ! IBRANCH_TO_COMMON
+   .error "IREALMODE_COMMON=1 but IBRANCH_TO_COMMON=0"
+   .endif
.endif
.ifndef IMASK
IMASK=0
@@ -353,6 +361,11 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
  */
 
 .macro GEN_BRANCH_TO_COMMON name, virt
+   .if IREALMODE_COMMON
+   LOAD_HANDLER(r10, \name\()_common)
+   mtctr   r10
+   bctr
+   .else
.if \virt
 #ifndef CONFIG_RELOCATABLE
b   \name\()_common_virt
@@ -366,6 +379,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
mtctr   r10
bctr
.endif
+   .endif
 .endm
 
 .macro GEN_INT_ENTRY name, virt, ool=0
@@ -421,11 +435,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
stw r10,IAREA+EX_DSISR(r13)
.endif
 
-   .if IEARLY == 2
-   /* nothing more */
-   .elseif IEARLY
-   BRANCH_TO_C000(r11, \name\()_common)
-   .else
.if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
@@ -441,6 +450,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
mfspr   r11,SPRN_SRR0   /* save SRR0 */
mfspr   r12,SPRN_SRR1   /* and SRR1 */
.endif
+
+   .if IBRANCH_TO_COMMON
GEN_BRANCH_TO_COMMON \name \virt
.endif
 
@@ -926,6 +937,7 @@ INT_DEFINE_BEGIN(machine_check_early)
IVEC=0x200
IAREA=PACA_EXMC
IVIRT=0 /* no virt entry point */
+   IREALMODE_COMMON=1
/*
 * MSR_RI is not enabled, because PACA_EXMC is being used, so a
 * nested machine check corrupts it. machine_check_common enables
@@ -933,7 +945,6 @@ INT_DEFINE_BEGIN(machine_check_early)
 */
ISET_RI=0
ISTACK=0
-   IEARLY=1
IDAR=1
IDSISR=1
IRECONCILE=0
@@ -973,9 +984,6 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi)
EXCEPTION_RESTORE_REGS EXC_STD
 
 EXC_COMMON_BEGIN(machine_check_early_common)
-   mfspr   r11,SPRN_SRR0
-   mfspr   r12,SPRN_SRR1
-
/*
 * Switch to mc_emergency stack and handle re-entrancy (we limit
 * the nested MCE upto level 4 to avoid stack overflow).
@@ -1822,7 +1830,7 @@ EXC_COMMON_BEGIN(emulation_assist_common)
 INT_DEFINE_BEGIN(hmi_exception_early)
IVEC=0xe60
IHSRR=EXC_HV
-   IEARLY=1
+   IREALMODE_COMMON=1
ISTACK=0
IRECONCILE=0
IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */
@@ -1842,8 +1850,6 @@ EXC_REAL_END(hmi_exception, 0xe60, 0x20)
 EXC_VIRT_NONE(0x4e60, 0x20)
 
 EXC_COMMON_BEGIN(hmi_exception_early_common)
-   mfspr   r11,SPRN_HSRR0  /* Save HSRR0 */
-   mfspr   r12,SPRN_HSRR1  /* Save HSRR1 */
mr  r10,r1  /* Save r1 */
ld  r1,PACAEMERGSP(r13) /* Use emergency stack for realmode */
subir1,r1,INT_FRAME_SIZE/* alloc stack frame*/
@@ -2169,29 +2175,23 @@ EXC_VIRT_NONE(0x5400, 0x100)
 INT_DEFINE_BEGIN(denorm_exception)
IVEC=0x1500
IHSRR=EXC_HV
-   IEARLY=2
+   IBRANCH_TO_COMMON=0
IKVM_REAL=1
 INT_DEFINE_END(denorm_exception)
 
 EXC_REAL_BEGIN(denorm_exception, 0x1500, 0x100)
GEN_INT_ENTRY denorm_exception, virt=0
 #ifdef CONFIG_PPC_DENORMALISATION
-   mfspr   r10,SPRN_HSRR1
-   andis.  

[PATCH v2 rebase 14/34] powerpc/64s/exception: remove the SPR saving patch code macros

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

These are used infrequently enough that they don't provide much help,
so inline them.
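
For example, a helper such as OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR)
becomes the open-coded feature section (illustrative sketch; the mfspr
is patched out at boot on CPUs lacking the feature):

```asm
BEGIN_FTR_SECTION
	mfspr	r9,SPRN_PPR
END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
```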

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 82 ++--
 1 file changed, 28 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 716a95ba814f..abf26db36427 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -110,46 +110,6 @@ name:
 #define EXC_HV 1
 #define EXC_STD0
 
-/*
- * PPR save/restore macros used in exceptions-64s.S
- * Used for P7 or later processors
- */
-#define SAVE_PPR(area, ra) \
-BEGIN_FTR_SECTION_NESTED(940)  \
-   ld  ra,area+EX_PPR(r13);/* Read PPR from paca */\
-   std ra,_PPR(r1);\
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940)
-
-#define RESTORE_PPR_PACA(area, ra) \
-BEGIN_FTR_SECTION_NESTED(941)  \
-   ld  ra,area+EX_PPR(r13);\
-   mtspr   SPRN_PPR,ra;\
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,941)
-
-/*
- * Get an SPR into a register if the CPU has the given feature
- */
-#define OPT_GET_SPR(ra, spr, ftr)  \
-BEGIN_FTR_SECTION_NESTED(943)  \
-   mfspr   ra,spr; \
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
-/*
- * Set an SPR from a register if the CPU has the given feature
- */
-#define OPT_SET_SPR(ra, spr, ftr)  \
-BEGIN_FTR_SECTION_NESTED(943)  \
-   mtspr   spr,ra; \
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
-/*
- * Save a register to the PACA if the CPU has the given feature
- */
-#define OPT_SAVE_REG_TO_PACA(offset, ra, ftr)  \
-BEGIN_FTR_SECTION_NESTED(943)  \
-   std ra,offset(r13); \
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
 /*
  * Branch to label using its 0xC000 address. This results in instruction
  * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned
@@ -278,18 +238,18 @@ do_define_int n
cmpwi   r10,KVM_GUEST_MODE_SKIP
beq 89f
.else
-BEGIN_FTR_SECTION_NESTED(947)
+BEGIN_FTR_SECTION
ld  r10,IAREA+EX_CFAR(r13)
std r10,HSTATE_CFAR(r13)
-END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947)
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
.endif
 
ld  r10,PACA_EXGEN+EX_CTR(r13)
mtctr   r10
-BEGIN_FTR_SECTION_NESTED(948)
+BEGIN_FTR_SECTION
ld  r10,IAREA+EX_PPR(r13)
std r10,HSTATE_PPR(r13)
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld  r11,IAREA+EX_R11(r13)
ld  r12,IAREA+EX_R12(r13)
std r12,HSTATE_SCRATCH0(r13)
@@ -386,10 +346,14 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
SET_SCRATCH0(r13)   /* save r13 */
GET_PACA(r13)
std r9,IAREA+EX_R9(r13) /* save r9 */
-   OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR)
+BEGIN_FTR_SECTION
+   mfspr   r9,SPRN_PPR
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
HMT_MEDIUM
std r10,IAREA+EX_R10(r13)   /* save r10 - r12 */
-   OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
+BEGIN_FTR_SECTION
+   mfspr   r10,SPRN_CFAR
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
.if \ool
.if !\virt
b   tramp_real_\name
@@ -402,8 +366,12 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
.endif
 
-   OPT_SAVE_REG_TO_PACA(IAREA+EX_PPR, r9, CPU_FTR_HAS_PPR)
-   OPT_SAVE_REG_TO_PACA(IAREA+EX_CFAR, r10, CPU_FTR_CFAR)
+BEGIN_FTR_SECTION
+   std r9,IAREA+EX_PPR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+BEGIN_FTR_SECTION
+   std r10,IAREA+EX_CFAR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
INTERRUPT_TO_KERNEL
mfctr   r10
std r10,IAREA+EX_CTR(r13)
@@ -558,7 +526,10 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
.endif
beq 101f/* if from kernel mode  */
ACCOUNT_CPU_USER_ENTRY(r13, r9, r10)
-   SAVE_PPR(IAREA, r9)
+BEGIN_FTR_SECTION
+   ld  r9,IAREA+EX_PPR(r13)/* Read PPR from paca   */
+   std r9,_PPR(r1)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 101:
.else
.if IKUAP
@@ -598,10 +569,10 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
std r10,_DSISR(r1)
  

[PATCH v2 rebase 10/34] powerpc/64s/exception: move real->virt switch into the common handler

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

The real mode interrupt entry points currently use rfid to branch to
the common handler in virtual mode. This is a significant amount of
code, and forces other code (notably the KVM test) to live in the
real mode handler.

In the interest of minimising the amount of code that runs unrelocated,
move the switch to virt mode into the common code, and do it with
mtmsrd, which avoids clobbering SRRs (although the post-KVMTEST
performance of real-mode interrupt handlers is not a big concern these
days).

This requires CTR to always be saved (real-mode needs to reach 0xc...)
but that's not a huge impact these days. It could be optimized away in
future.
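
The core of the switch, as it appears in the common entry code below,
is simply (sketch, not the full sequence):

```asm
	ld	r10,PACAKMSR(r13)	/* MSR value for the kernel */
	mtmsrd	r10			/* MSR[IR|DR] on; SRR0/SRR1 left
					 * intact, unlike rfid which
					 * consumes them */
```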

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h |   4 -
 arch/powerpc/kernel/exceptions-64s.S | 251 ++-
 2 files changed, 109 insertions(+), 146 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 33f4f72eb035..47bd4ea0837d 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -33,11 +33,7 @@
 #include 
 
 /* PACA save area size in u64 units (exgen, exmc, etc) */
-#if defined(CONFIG_RELOCATABLE)
 #define EX_SIZE10
-#else
-#define EX_SIZE9
-#endif
 
 /*
  * maximum recursive depth of MCE exceptions
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b8588618cdc3..5803ce3b9404 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -32,16 +32,10 @@
 #define EX_CCR 52
 #define EX_CFAR56
 #define EX_PPR 64
-#if defined(CONFIG_RELOCATABLE)
 #define EX_CTR 72
 .if EX_SIZE != 10
.error "EX_SIZE is wrong"
 .endif
-#else
-.if EX_SIZE != 9
-   .error "EX_SIZE is wrong"
-.endif
-#endif
 
 /*
  * Following are fixed section helper macros.
@@ -124,22 +118,6 @@ name:
 #define EXC_HV 1
 #define EXC_STD0
 
-#if defined(CONFIG_RELOCATABLE)
-/*
- * If we support interrupts with relocation on AND we're a relocatable kernel,
- * we need to use CTR to get to the 2nd level handler.  So, save/restore it
- * when required.
- */
-#define SAVE_CTR(reg, area)mfctr   reg ;   std reg,area+EX_CTR(r13)
-#define GET_CTR(reg, area) ld  reg,area+EX_CTR(r13)
-#define RESTORE_CTR(reg, area) ld  reg,area+EX_CTR(r13) ; mtctr reg
-#else
-/* ...else CTR is unused and in register. */
-#define SAVE_CTR(reg, area)
-#define GET_CTR(reg, area) mfctr   reg
-#define RESTORE_CTR(reg, area)
-#endif
-
 /*
  * PPR save/restore macros used in exceptions-64s.S
  * Used for P7 or later processors
@@ -199,6 +177,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IVEC   .L_IVEC_\name\()
 #define IHSRR  .L_IHSRR_\name\()
 #define IAREA  .L_IAREA_\name\()
+#define IVIRT  .L_IVIRT_\name\()
 #define IISIDE .L_IISIDE_\name\()
 #define IDAR   .L_IDAR_\name\()
 #define IDSISR .L_IDSISR_\name\()
@@ -232,6 +211,9 @@ do_define_int n
.ifndef IAREA
IAREA=PACA_EXGEN
.endif
+   .ifndef IVIRT
+   IVIRT=1
+   .endif
.ifndef IISIDE
IISIDE=0
.endif
@@ -325,7 +307,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 * outside the head section. CONFIG_RELOCATABLE KVM expects CTR
 * to be saved in HSTATE_SCRATCH1.
 */
-   mfctr   r9
+   ld  r9,IAREA+EX_CTR(r13)
std r9,HSTATE_SCRATCH1(r13)
__LOAD_FAR_HANDLER(r9, kvmppc_interrupt)
mtctr   r9
@@ -362,101 +344,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 .endm
 #endif
 
-.macro INT_SAVE_SRR_AND_JUMP label, hsrr, set_ri
-   ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
-   .if ! \set_ri
-   xorir10,r10,MSR_RI  /* Clear MSR_RI */
-   .endif
-   .if \hsrr == EXC_HV_OR_STD
-   BEGIN_FTR_SECTION
-   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
-   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
-   mtspr   SPRN_HSRR1,r10
-   FTR_SECTION_ELSE
-   mfspr   r11,SPRN_SRR0   /* save SRR0 */
-   mfspr   r12,SPRN_SRR1   /* and SRR1 */
-   mtspr   SPRN_SRR1,r10
-   ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-   .elseif \hsrr
-   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
-   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
-   mtspr   SPRN_HSRR1,r10
-   .else
-   mfspr   r11,SPRN_SRR0   /* save SRR0 */
-   mfspr   r12,SPRN_SRR1   /* and SRR1 */
-   mtspr   SPRN_SRR1,r10
-   .endif
-   LOAD_HANDLER(r10, \label\())
-   .if \hsrr == EXC_HV_OR_STD
-   BEGIN_FTR_SECTION
-   mtspr   SPRN_HSRR0,r10
-   HRFI_TO_KERNEL
-   FTR_SECTION_ELSE
-   mtspr   

[PATCH v2 rebase 12/34] powerpc/64s/exception: move KVM test to common code

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

This allows more code to be moved out of unrelocated regions. The system
call KVMTEST is changed to be open-coded and remain in the tramp area to
avoid having to move it to entry_64.S. The custom nature of the system
call entry code means the hcall case can be made more streamlined than
regular interrupt handlers.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S| 239 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  11 --
 arch/powerpc/kvm/book3s_segment.S   |   7 -
 3 files changed, 119 insertions(+), 138 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index fbc3fbb293f7..7db76e7be0aa 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -44,7 +44,6 @@
  * EXC_VIRT_BEGIN/END  - virt (AIL), unrelocated exception vectors
  * TRAMP_REAL_BEGIN- real, unrelocated helpers (virt may call these)
  * TRAMP_VIRT_BEGIN- virt, unreloc helpers (in practice, real can use)
- * TRAMP_KVM_BEGIN - KVM handlers, these are put into real, unrelocated
  * EXC_COMMON  - After switching to virtual, relocated mode.
  */
 
@@ -74,13 +73,6 @@ name:
 #define TRAMP_VIRT_BEGIN(name) \
FIXED_SECTION_ENTRY_BEGIN(virt_trampolines, name)
 
-#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
-#define TRAMP_KVM_BEGIN(name)  \
-   TRAMP_VIRT_BEGIN(name)
-#else
-#define TRAMP_KVM_BEGIN(name)
-#endif
-
 #define EXC_REAL_NONE(start, size) \
	FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##unused, start, size); \
	FIXED_SECTION_ENTRY_END_LOCATION(real_vectors, exc_real_##start##_##unused, start, size)
@@ -271,6 +263,9 @@ do_define_int n
 .endm
 
 .macro GEN_KVM name
+   .balign IFETCH_ALIGN_BYTES
+\name\()_kvm:
+
.if IKVM_SKIP
cmpwi   r10,KVM_GUEST_MODE_SKIP
beq 89f
@@ -281,13 +276,18 @@ BEGIN_FTR_SECTION_NESTED(947)
 END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947)
.endif
 
+   ld  r10,PACA_EXGEN+EX_CTR(r13)
+   mtctr   r10
 BEGIN_FTR_SECTION_NESTED(948)
ld  r10,IAREA+EX_PPR(r13)
std r10,HSTATE_PPR(r13)
 END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
-   ld  r10,IAREA+EX_R10(r13)
+   ld  r11,IAREA+EX_R11(r13)
+   ld  r12,IAREA+EX_R12(r13)
std r12,HSTATE_SCRATCH0(r13)
sldir12,r9,32
+   ld  r9,IAREA+EX_R9(r13)
+   ld  r10,IAREA+EX_R10(r13)
/* HSRR variants have the 0x2 bit added to their trap number */
.if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
@@ -300,29 +300,16 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.else
ori r12,r12,(IVEC)
.endif
-
-#ifdef CONFIG_RELOCATABLE
-   /*
-* KVM requires __LOAD_FAR_HANDLER because kvmppc_interrupt lives
-* outside the head section. CONFIG_RELOCATABLE KVM expects CTR
-* to be saved in HSTATE_SCRATCH1.
-*/
-   ld  r9,IAREA+EX_CTR(r13)
-   std r9,HSTATE_SCRATCH1(r13)
-   __LOAD_FAR_HANDLER(r9, kvmppc_interrupt)
-   mtctr   r9
-   ld  r9,IAREA+EX_R9(r13)
-   bctr
-#else
-   ld  r9,IAREA+EX_R9(r13)
b   kvmppc_interrupt
-#endif
-
 
.if IKVM_SKIP
 89:mtocrf  0x80,r9
+   ld  r10,PACA_EXGEN+EX_CTR(r13)
+   mtctr   r10
ld  r9,IAREA+EX_R9(r13)
ld  r10,IAREA+EX_R10(r13)
+   ld  r11,IAREA+EX_R11(r13)
+   ld  r12,IAREA+EX_R12(r13)
.if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
b   kvmppc_skip_Hinterrupt
@@ -407,11 +394,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
mfctr   r10
std r10,IAREA+EX_CTR(r13)
mfcrr9
-
-   .if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT)
-   KVMTEST \name IHSRR IVEC
-   .endif
-
std r11,IAREA+EX_R11(r13)
std r12,IAREA+EX_R12(r13)
 
@@ -475,6 +457,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 .macro __GEN_COMMON_ENTRY name
 DEFINE_FIXED_SYMBOL(\name\()_common_real)
 \name\()_common_real:
+   .if IKVM_REAL
+   KVMTEST \name IHSRR IVEC
+   .endif
+
ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
/* MSR[RI] is clear iff using SRR regs */
.if IHSRR == EXC_HV_OR_STD
@@ -487,9 +473,17 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
mtmsrd  r10
 
.if IVIRT
+   .if IKVM_VIRT
+   b   1f /* skip the virt test coming from real */
+   .endif
+
.balign IFETCH_ALIGN_BYTES
 DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 \name\()_common_virt:
+   .if IKVM_VIRT
+   KVMTEST \name IHSRR IVEC
+1:
+   .endif
.endif /* IVIRT */
 .endm
 
@@ -848,8 +842,6 @@ 

[PATCH v2 rebase 11/34] powerpc/64s/exception: move soft-mask test to common code

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

As well as moving code out of the unrelocated vectors, this allows the
masked handlers to be moved to common code, and allows the soft_nmi
handler to be generated more like a regular handler.
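
With the soft-mask test in the common body, soft_nmi can be declared
like any other handler (sketch drawn from the hunks below):

```asm
INT_DEFINE_BEGIN(soft_nmi)
	IVEC=0x900	/* shares the decrementer vector */
	ISTACK=0	/* runs on the emergency stack instead */
INT_DEFINE_END(soft_nmi)

EXC_COMMON_BEGIN(soft_nmi_common)
	/* ... switch to the emergency stack, then: */
	__GEN_COMMON_BODY soft_nmi
```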

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 106 +--
 1 file changed, 49 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 5803ce3b9404..fbc3fbb293f7 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -411,36 +411,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT)
KVMTEST \name IHSRR IVEC
.endif
-   .if IMASK
-   lbz r10,PACAIRQSOFTMASK(r13)
-   andi.   r10,r10,IMASK
-   /* Associate vector numbers with bits in paca->irq_happened */
-   .if IVEC == 0x500 || IVEC == 0xea0
-   li  r10,PACA_IRQ_EE
-   .elseif IVEC == 0x900
-   li  r10,PACA_IRQ_DEC
-   .elseif IVEC == 0xa00 || IVEC == 0xe80
-   li  r10,PACA_IRQ_DBELL
-   .elseif IVEC == 0xe60
-   li  r10,PACA_IRQ_HMI
-   .elseif IVEC == 0xf00
-   li  r10,PACA_IRQ_PMI
-   .else
-   .abort "Bad maskable vector"
-   .endif
-
-   .if IHSRR == EXC_HV_OR_STD
-   BEGIN_FTR_SECTION
-   bne masked_Hinterrupt
-   FTR_SECTION_ELSE
-   bne masked_interrupt
-   ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-   .elseif IHSRR
-   bne masked_Hinterrupt
-   .else
-   bne masked_interrupt
-   .endif
-   .endif
 
std r11,IAREA+EX_R11(r13)
std r12,IAREA+EX_R12(r13)
@@ -524,6 +494,37 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 .endm
 
 .macro __GEN_COMMON_BODY name
+   .if IMASK
+   lbz r10,PACAIRQSOFTMASK(r13)
+   andi.   r10,r10,IMASK
+   /* Associate vector numbers with bits in paca->irq_happened */
+   .if IVEC == 0x500 || IVEC == 0xea0
+   li  r10,PACA_IRQ_EE
+   .elseif IVEC == 0x900
+   li  r10,PACA_IRQ_DEC
+   .elseif IVEC == 0xa00 || IVEC == 0xe80
+   li  r10,PACA_IRQ_DBELL
+   .elseif IVEC == 0xe60
+   li  r10,PACA_IRQ_HMI
+   .elseif IVEC == 0xf00
+   li  r10,PACA_IRQ_PMI
+   .else
+   .abort "Bad maskable vector"
+   .endif
+
+   .if IHSRR == EXC_HV_OR_STD
+   BEGIN_FTR_SECTION
+   bne masked_Hinterrupt
+   FTR_SECTION_ELSE
+   bne masked_interrupt
+   ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
+   .elseif IHSRR
+   bne masked_Hinterrupt
+   .else
+   bne masked_interrupt
+   .endif
+   .endif
+
.if ISTACK
andi.   r10,r12,MSR_PR  /* See if coming from user  */
mr  r10,r1  /* Save r1  */
@@ -2343,18 +2344,10 @@ EXC_VIRT_NONE(0x5800, 0x100)
 
 #ifdef CONFIG_PPC_WATCHDOG
 
-#define MASKED_DEC_HANDLER_LABEL 3f
-
-#define MASKED_DEC_HANDLER(_H) \
-3: /* soft-nmi */  \
-   std r12,PACA_EXGEN+EX_R12(r13); \
-   GET_SCRATCH0(r10);  \
-   std r10,PACA_EXGEN+EX_R13(r13); \
-   mfspr   r11,SPRN_SRR0;  /* save SRR0 */ \
-   mfspr   r12,SPRN_SRR1;  /* and SRR1 */  \
-   LOAD_HANDLER(r10, soft_nmi_common); \
-   mtctr   r10;\
-   bctr
+INT_DEFINE_BEGIN(soft_nmi)
+   IVEC=0x900
+   ISTACK=0
+INT_DEFINE_END(soft_nmi)
 
 /*
  * Branch to soft_nmi_interrupt using the emergency stack. The emergency
@@ -2366,19 +2359,16 @@ EXC_VIRT_NONE(0x5800, 0x100)
  * and run it entirely with interrupts hard disabled.
  */
 EXC_COMMON_BEGIN(soft_nmi_common)
+   mfspr   r11,SPRN_SRR0
mr  r10,r1
ld  r1,PACAEMERGSP(r13)
subir1,r1,INT_FRAME_SIZE
-   __ISTACK(decrementer)=0
-   __GEN_COMMON_BODY decrementer
+   __GEN_COMMON_BODY soft_nmi
bl  save_nvgprs
addir3,r1,STACK_FRAME_OVERHEAD
bl  soft_nmi_interrupt
b   ret_from_except
 
-#else /* CONFIG_PPC_WATCHDOG */
-#define MASKED_DEC_HANDLER_LABEL 2f /* normal return */
-#define MASKED_DEC_HANDLER(_H)
 #endif /* CONFIG_PPC_WATCHDOG */
 
 /*
@@ -2397,7 +2387,6 @@ masked_Hinterrupt:
.else
 

[PATCH v2 rebase 09/34] powerpc/64s/exception: Add ISIDE option

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

Rather than using DAR=2 to select the i-side registers, add an
explicit option.
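
The flag replaces the magic IDAR=2/IDSISR=2 values; the common code
then selects the source explicitly (sketch, per the hunks below):

```asm
	.if IDAR
	.if IISIDE
	ld	r10,_NIP(r1)		/* i-side: fault address is the NIP */
	.else
	ld	r10,IAREA+EX_DAR(r13)	/* d-side: DAR saved at entry */
	.endif
	std	r10,_DAR(r1)
	.endif
```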

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index bef0c2eee7dc..b8588618cdc3 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -199,6 +199,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IVEC   .L_IVEC_\name\()
 #define IHSRR  .L_IHSRR_\name\()
 #define IAREA  .L_IAREA_\name\()
+#define IISIDE .L_IISIDE_\name\()
 #define IDAR   .L_IDAR_\name\()
 #define IDSISR .L_IDSISR_\name\()
 #define ISET_RI.L_ISET_RI_\name\()
@@ -231,6 +232,9 @@ do_define_int n
.ifndef IAREA
IAREA=PACA_EXGEN
.endif
+   .ifndef IISIDE
+   IISIDE=0
+   .endif
.ifndef IDAR
IDAR=0
.endif
@@ -542,7 +546,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 */
GET_SCRATCH0(r10)
std r10,IAREA+EX_R13(r13)
-   .if IDAR == 1
+   .if IDAR && !IISIDE
.if IHSRR
mfspr   r10,SPRN_HDAR
.else
@@ -550,7 +554,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
std r10,IAREA+EX_DAR(r13)
.endif
-   .if IDSISR == 1
+   .if IDSISR && !IISIDE
.if IHSRR
mfspr   r10,SPRN_HDSISR
.else
@@ -625,16 +629,18 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
std r9,GPR11(r1)
std r10,GPR12(r1)
std r11,GPR13(r1)
+
.if IDAR
-   .if IDAR == 2
+   .if IISIDE
ld  r10,_NIP(r1)
.else
ld  r10,IAREA+EX_DAR(r13)
.endif
std r10,_DAR(r1)
.endif
+
.if IDSISR
-   .if IDSISR == 2
+   .if IISIDE
ld  r10,_MSR(r1)
lis r11,DSISR_SRR1_MATCH_64S@h
and r10,r10,r11
@@ -643,6 +649,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
std r10,_DSISR(r1)
.endif
+
 BEGIN_FTR_SECTION_NESTED(66)
ld  r10,IAREA+EX_CFAR(r13)
std r10,ORIG_GPR3(r1)
@@ -1311,8 +1318,9 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 INT_DEFINE_BEGIN(instruction_access)
IVEC=0x400
-   IDAR=2
-   IDSISR=2
+   IISIDE=1
+   IDAR=1
+   IDSISR=1
IKVM_REAL=1
 INT_DEFINE_END(instruction_access)
 
@@ -1341,7 +1349,8 @@ INT_DEFINE_BEGIN(instruction_access_slb)
IVEC=0x480
IAREA=PACA_EXSLB
IRECONCILE=0
-   IDAR=2
+   IISIDE=1
+   IDAR=1
IKVM_REAL=1
 INT_DEFINE_END(instruction_access_slb)
 
-- 
2.23.0



[PATCH v2 rebase 08/34] powerpc/64s/exception: Remove old INT_KVM_HANDLER

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 55 +---
 1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f318869607db..bef0c2eee7dc 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -266,15 +266,6 @@ do_define_int n
.endif
 .endm
 
-.macro INT_KVM_HANDLER name, vec, hsrr, area, skip
-   TRAMP_KVM_BEGIN(\name\()_kvm)
-   KVM_HANDLER \vec, \hsrr, \area, \skip
-.endm
-
-.macro GEN_KVM name
-   KVM_HANDLER IVEC, IHSRR, IAREA, IKVM_SKIP
-.endm
-
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
@@ -293,35 +284,35 @@ do_define_int n
bne \name\()_kvm
 .endm
 
-.macro KVM_HANDLER vec, hsrr, area, skip
-   .if \skip
+.macro GEN_KVM name
+   .if IKVM_SKIP
cmpwi   r10,KVM_GUEST_MODE_SKIP
beq 89f
.else
 BEGIN_FTR_SECTION_NESTED(947)
-   ld  r10,\area+EX_CFAR(r13)
+   ld  r10,IAREA+EX_CFAR(r13)
std r10,HSTATE_CFAR(r13)
 END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947)
.endif
 
 BEGIN_FTR_SECTION_NESTED(948)
-   ld  r10,\area+EX_PPR(r13)
+   ld  r10,IAREA+EX_PPR(r13)
std r10,HSTATE_PPR(r13)
 END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
-   ld  r10,\area+EX_R10(r13)
+   ld  r10,IAREA+EX_R10(r13)
std r12,HSTATE_SCRATCH0(r13)
sldir12,r9,32
/* HSRR variants have the 0x2 bit added to their trap number */
-   .if \hsrr == EXC_HV_OR_STD
+   .if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
-   ori r12,r12,(\vec + 0x2)
+   ori r12,r12,(IVEC + 0x2)
FTR_SECTION_ELSE
-   ori r12,r12,(\vec)
+   ori r12,r12,(IVEC)
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-   .elseif \hsrr
-   ori r12,r12,(\vec + 0x2)
+   .elseif IHSRR
+   ori r12,r12,(IVEC + 0x2)
.else
-   ori r12,r12,(\vec)
+   ori r12,r12,(IVEC)
.endif
 
 #ifdef CONFIG_RELOCATABLE
@@ -334,25 +325,25 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
std r9,HSTATE_SCRATCH1(r13)
__LOAD_FAR_HANDLER(r9, kvmppc_interrupt)
mtctr   r9
-   ld  r9,\area+EX_R9(r13)
+   ld  r9,IAREA+EX_R9(r13)
bctr
 #else
-   ld  r9,\area+EX_R9(r13)
+   ld  r9,IAREA+EX_R9(r13)
b   kvmppc_interrupt
 #endif
 
 
-   .if \skip
+   .if IKVM_SKIP
 89:mtocrf  0x80,r9
-   ld  r9,\area+EX_R9(r13)
-   ld  r10,\area+EX_R10(r13)
-   .if \hsrr == EXC_HV_OR_STD
+   ld  r9,IAREA+EX_R9(r13)
+   ld  r10,IAREA+EX_R10(r13)
+   .if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
b   kvmppc_skip_Hinterrupt
FTR_SECTION_ELSE
b   kvmppc_skip_interrupt
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-   .elseif \hsrr
+   .elseif IHSRR
b   kvmppc_skip_Hinterrupt
.else
b   kvmppc_skip_interrupt
@@ -363,7 +354,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 #else
 .macro KVMTEST name, hsrr, n
 .endm
-.macro KVM_HANDLER name, vec, hsrr, area, skip
+.macro GEN_KVM name
 .endm
 #endif
 
@@ -1640,6 +1631,12 @@ EXC_VIRT_NONE(0x4b00, 0x100)
  * without saving, though xer is not a good idea to use, as hardware may
  * interpret some bits so it may be costly to change them.
  */
+INT_DEFINE_BEGIN(system_call)
+   IVEC=0xc00
+   IKVM_REAL=1
+   IKVM_VIRT=1
+INT_DEFINE_END(system_call)
+
 .macro SYSTEM_CALL virt
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
/*
@@ -1733,7 +1730,7 @@ TRAMP_KVM_BEGIN(system_call_kvm)
SET_SCRATCH0(r10)
std r9,PACA_EXGEN+EX_R9(r13)
mfcrr9
-   KVM_HANDLER 0xc00, EXC_STD, PACA_EXGEN, 0
+   GEN_KVM system_call
 #endif
 
 
-- 
2.23.0



[PATCH v2 rebase 07/34] powerpc/64s/exception: Remove old INT_COMMON macro

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 51 +---
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 17e4aaf6ed42..f318869607db 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -591,8 +591,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
  * If stack=0, then the stack is already set in r1, and r1 is saved in r10.
  * PPR save and CPU accounting is not done for the !stack case (XXX why not?)
  */
-.macro INT_COMMON vec, area, stack, kuap, reconcile, dar, dsisr
-   .if \stack
+.macro GEN_COMMON name
+   .if ISTACK
andi.   r10,r12,MSR_PR  /* See if coming from user  */
mr  r10,r1  /* Save r1  */
subir1,r1,INT_FRAME_SIZE/* alloc frame on kernel stack  */
@@ -609,54 +609,54 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
std r0,GPR0(r1) /* save r0 in stackframe*/
std r10,GPR1(r1)/* save r1 in stackframe*/
 
-   .if \stack
-   .if \kuap
+   .if ISTACK
+   .if IKUAP
kuap_save_amr_and_lock r9, r10, cr1, cr0
.endif
beq 101f/* if from kernel mode  */
ACCOUNT_CPU_USER_ENTRY(r13, r9, r10)
-   SAVE_PPR(\area, r9)
+   SAVE_PPR(IAREA, r9)
 101:
.else
-   .if \kuap
+   .if IKUAP
kuap_save_amr_and_lock r9, r10, cr1
.endif
.endif
 
/* Save original regs values from save area to stack frame. */
-   ld  r9,\area+EX_R9(r13) /* move r9, r10 to stackframe   */
-   ld  r10,\area+EX_R10(r13)
+   ld  r9,IAREA+EX_R9(r13) /* move r9, r10 to stackframe   */
+   ld  r10,IAREA+EX_R10(r13)
std r9,GPR9(r1)
std r10,GPR10(r1)
-   ld  r9,\area+EX_R11(r13)/* move r11 - r13 to stackframe */
-   ld  r10,\area+EX_R12(r13)
-   ld  r11,\area+EX_R13(r13)
+   ld  r9,IAREA+EX_R11(r13)/* move r11 - r13 to stackframe */
+   ld  r10,IAREA+EX_R12(r13)
+   ld  r11,IAREA+EX_R13(r13)
std r9,GPR11(r1)
std r10,GPR12(r1)
std r11,GPR13(r1)
-   .if \dar
-   .if \dar == 2
+   .if IDAR
+   .if IDAR == 2
ld  r10,_NIP(r1)
.else
-   ld  r10,\area+EX_DAR(r13)
+   ld  r10,IAREA+EX_DAR(r13)
.endif
std r10,_DAR(r1)
.endif
-   .if \dsisr
-   .if \dsisr == 2
+   .if IDSISR
+   .if IDSISR == 2
ld  r10,_MSR(r1)
lis r11,DSISR_SRR1_MATCH_64S@h
and r10,r10,r11
.else
-   lwz r10,\area+EX_DSISR(r13)
+   lwz r10,IAREA+EX_DSISR(r13)
.endif
std r10,_DSISR(r1)
.endif
 BEGIN_FTR_SECTION_NESTED(66)
-   ld  r10,\area+EX_CFAR(r13)
+   ld  r10,IAREA+EX_CFAR(r13)
std r10,ORIG_GPR3(r1)
 END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
-   GET_CTR(r10, \area)
+   GET_CTR(r10, IAREA)
std r10,_CTR(r1)
std r2,GPR2(r1) /* save r2 in stackframe*/
SAVE_4GPRS(3, r1)   /* save r3 - r6 in stackframe   */
@@ -668,26 +668,22 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
mfspr   r11,SPRN_XER/* save XER in stackframe   */
std r10,SOFTE(r1)
std r11,_XER(r1)
-   li  r9,(\vec)+1
+   li  r9,(IVEC)+1
std r9,_TRAP(r1)/* set trap number  */
li  r10,0
ld  r11,exception_marker@toc(r2)
std r10,RESULT(r1)  /* clear regs->result   */
std r11,STACK_FRAME_OVERHEAD-16(r1) /* mark the frame   */
 
-   .if \stack
+   .if ISTACK
ACCOUNT_STOLEN_TIME
.endif
 
-   .if \reconcile
+   .if IRECONCILE
RECONCILE_IRQ_STATE(r10, r11)
.endif
 .endm
 
-.macro GEN_COMMON name
-   INT_COMMON IVEC, IAREA, ISTACK, IKUAP, IRECONCILE, IDAR, IDSISR
-.endm
-
 /*
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
@@ -2400,7 +2396,8 @@ EXC_COMMON_BEGIN(soft_nmi_common)
mr  r10,r1
ld  r1,PACAEMERGSP(r13)
subir1,r1,INT_FRAME_SIZE
-   INT_COMMON 0x900, PACA_EXGEN, 0, 1, 1, 0, 0
+   __ISTACK(decrementer)=0
+   GEN_COMMON decrementer
bl  save_nvgprs
addir3,r1,STACK_FRAME_OVERHEAD
bl  soft_nmi_interrupt
-- 
2.23.0



[PATCH v2 rebase 06/34] powerpc/64s/exception: Remove old INT_ENTRY macro

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 68 
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f70c9fb2566a..17e4aaf6ed42 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -482,13 +482,13 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
  * - Fall through and continue executing in real, unrelocated mode.
  *   This is done if early=2.
  */
-.macro INT_HANDLER name, vec, ool=0, early=0, virt=0, hsrr=0, area=PACA_EXGEN, ri=1, dar=0, dsisr=0, bitmask=0, kvm=0
+.macro GEN_INT_ENTRY name, virt, ool=0
SET_SCRATCH0(r13)   /* save r13 */
GET_PACA(r13)
-   std r9,\area\()+EX_R9(r13)  /* save r9 */
+   std r9,IAREA+EX_R9(r13) /* save r9 */
OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR)
HMT_MEDIUM
-   std r10,\area\()+EX_R10(r13)/* save r10 - r12 */
+   std r10,IAREA+EX_R10(r13)   /* save r10 - r12 */
OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
.if \ool
.if !\virt
@@ -502,47 +502,47 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
.endif
 
-   OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR)
-   OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR)
+   OPT_SAVE_REG_TO_PACA(IAREA+EX_PPR, r9, CPU_FTR_HAS_PPR)
+   OPT_SAVE_REG_TO_PACA(IAREA+EX_CFAR, r10, CPU_FTR_CFAR)
INTERRUPT_TO_KERNEL
-   SAVE_CTR(r10, \area\())
+   SAVE_CTR(r10, IAREA)
mfcrr9
-   .if \kvm
-   KVMTEST \name \hsrr \vec
+   .if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT)
+   KVMTEST \name IHSRR IVEC
.endif
-   .if \bitmask
+   .if IMASK
lbz r10,PACAIRQSOFTMASK(r13)
-   andi.   r10,r10,\bitmask
+   andi.   r10,r10,IMASK
/* Associate vector numbers with bits in paca->irq_happened */
-   .if \vec == 0x500 || \vec == 0xea0
+   .if IVEC == 0x500 || IVEC == 0xea0
li  r10,PACA_IRQ_EE
-   .elseif \vec == 0x900
+   .elseif IVEC == 0x900
li  r10,PACA_IRQ_DEC
-   .elseif \vec == 0xa00 || \vec == 0xe80
+   .elseif IVEC == 0xa00 || IVEC == 0xe80
li  r10,PACA_IRQ_DBELL
-   .elseif \vec == 0xe60
+   .elseif IVEC == 0xe60
li  r10,PACA_IRQ_HMI
-   .elseif \vec == 0xf00
+   .elseif IVEC == 0xf00
li  r10,PACA_IRQ_PMI
.else
.abort "Bad maskable vector"
.endif
 
-   .if \hsrr == EXC_HV_OR_STD
+   .if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
bne masked_Hinterrupt
FTR_SECTION_ELSE
bne masked_interrupt
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-   .elseif \hsrr
+   .elseif IHSRR
bne masked_Hinterrupt
.else
bne masked_interrupt
.endif
.endif
 
-   std r11,\area\()+EX_R11(r13)
-   std r12,\area\()+EX_R12(r13)
+   std r11,IAREA+EX_R11(r13)
+   std r12,IAREA+EX_R12(r13)
 
/*
 * DAR/DSISR, SCRATCH0 must be read before setting MSR[RI],
@@ -550,47 +550,39 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 * not recoverable if they are live.
 */
GET_SCRATCH0(r10)
-   std r10,\area\()+EX_R13(r13)
-   .if \dar == 1
-   .if \hsrr
+   std r10,IAREA+EX_R13(r13)
+   .if IDAR == 1
+   .if IHSRR
mfspr   r10,SPRN_HDAR
.else
mfspr   r10,SPRN_DAR
.endif
-   std r10,\area\()+EX_DAR(r13)
+   std r10,IAREA+EX_DAR(r13)
.endif
-   .if \dsisr == 1
-   .if \hsrr
+   .if IDSISR == 1
+   .if IHSRR
mfspr   r10,SPRN_HDSISR
.else
mfspr   r10,SPRN_DSISR
.endif
-   stw r10,\area\()+EX_DSISR(r13)
+   stw r10,IAREA+EX_DSISR(r13)
.endif
 
-   .if \early == 2
+   .if IEARLY == 2
/* nothing more */
-   .elseif \early
+   .elseif IEARLY
mfctr   r10 /* save ctr, even for !RELOCATABLE */
BRANCH_TO_C000(r11, \name\()_common)
.elseif !\virt
-   INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri
+   INT_SAVE_SRR_AND_JUMP \name\()_common, IHSRR, ISET_RI
.else
-   INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr
+   INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, IHSRR
.endif
.if \ool
.popsection

[PATCH v2 rebase 05/34] powerpc/64s/exception: Move all interrupt handlers to new style code gen macros

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

Aside from label names and BUG line numbers, the generated code change
is an additional HMI KVM handler added for the "late" KVM handler,
because early and late HMI generation is achieved by defining two
different interrupt types.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 556 ---
 1 file changed, 418 insertions(+), 138 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index cefe2e9a9e05..f70c9fb2566a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -206,8 +206,10 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IMASK  .L_IMASK_\name\()
 #define IKVM_SKIP  .L_IKVM_SKIP_\name\()
 #define IKVM_REAL  .L_IKVM_REAL_\name\()
+#define __IKVM_REAL(name)  .L_IKVM_REAL_ ## name
 #define IKVM_VIRT  .L_IKVM_VIRT_\name\()
 #define ISTACK .L_ISTACK_\name\()
+#define __ISTACK(name) .L_ISTACK_ ## name
 #define IRECONCILE .L_IRECONCILE_\name\()
 #define IKUAP  .L_IKUAP_\name\()
 
@@ -570,7 +572,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
/* nothing more */
.elseif \early
mfctr   r10 /* save ctr, even for !RELOCATABLE */
-   BRANCH_TO_C000(r11, \name\()_early_common)
+   BRANCH_TO_C000(r11, \name\()_common)
.elseif !\virt
INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri
.else
@@ -843,6 +845,19 @@ __start_interrupts:
 EXC_VIRT_NONE(0x4000, 0x100)
 
 
+INT_DEFINE_BEGIN(system_reset)
+   IVEC=0x100
+   IAREA=PACA_EXNMI
+   /*
+* MSR_RI is not enabled, because PACA_EXNMI and nmi stack is
+* being used, so a nested NMI exception would corrupt it.
+*/
+   ISET_RI=0
+   ISTACK=0
+   IRECONCILE=0
+   IKVM_REAL=1
+INT_DEFINE_END(system_reset)
+
 EXC_REAL_BEGIN(system_reset, 0x100, 0x100)
 #ifdef CONFIG_PPC_P7_NAP
/*
@@ -880,11 +895,8 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 #endif
 
-   INT_HANDLER system_reset, 0x100, area=PACA_EXNMI, ri=0, kvm=1
+   GEN_INT_ENTRY system_reset, virt=0
/*
-* MSR_RI is not enabled, because PACA_EXNMI and nmi stack is
-* being used, so a nested NMI exception would corrupt it.
-*
 * In theory, we should not enable relocation here if it was disabled
 * in SRR1, because the MMU may not be configured to support it (e.g.,
 * SLB may have been cleared). In practice, there should only be a few
@@ -893,7 +905,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 */
 EXC_REAL_END(system_reset, 0x100, 0x100)
 EXC_VIRT_NONE(0x4100, 0x100)
-INT_KVM_HANDLER system_reset 0x100, EXC_STD, PACA_EXNMI, 0
+TRAMP_KVM_BEGIN(system_reset_kvm)
+   GEN_KVM system_reset
 
 #ifdef CONFIG_PPC_P7_NAP
 TRAMP_REAL_BEGIN(system_reset_idle_wake)
@@ -908,8 +921,8 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake)
  * Vectors for the FWNMI option.  Share common code.
  */
 TRAMP_REAL_BEGIN(system_reset_fwnmi)
-   /* See comment at system_reset exception, don't turn on RI */
-   INT_HANDLER system_reset, 0x100, area=PACA_EXNMI, ri=0
+   __IKVM_REAL(system_reset)=0
+   GEN_INT_ENTRY system_reset, virt=0
 
 #endif /* CONFIG_PPC_PSERIES */
 
@@ -929,7 +942,7 @@ EXC_COMMON_BEGIN(system_reset_common)
mr  r10,r1
ld  r1,PACA_NMI_EMERG_SP(r13)
subir1,r1,INT_FRAME_SIZE
-   INT_COMMON 0x100, PACA_EXNMI, 0, 1, 0, 0, 0
+   GEN_COMMON system_reset
bl  save_nvgprs
/*
 * Set IRQS_ALL_DISABLED unconditionally so arch_irqs_disabled does
@@ -971,23 +984,46 @@ EXC_COMMON_BEGIN(system_reset_common)
RFI_TO_USER_OR_KERNEL
 
 
-EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
-   INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, dar=1, dsisr=1
+INT_DEFINE_BEGIN(machine_check_early)
+   IVEC=0x200
+   IAREA=PACA_EXMC
/*
 * MSR_RI is not enabled, because PACA_EXMC is being used, so a
 * nested machine check corrupts it. machine_check_common enables
 * MSR_RI.
 */
+   ISET_RI=0
+   ISTACK=0
+   IEARLY=1
+   IDAR=1
+   IDSISR=1
+   IRECONCILE=0
+   IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */
+INT_DEFINE_END(machine_check_early)
+
+INT_DEFINE_BEGIN(machine_check)
+   IVEC=0x200
+   IAREA=PACA_EXMC
+   ISET_RI=0
+   IDAR=1
+   IDSISR=1
+   IKVM_SKIP=1
+   IKVM_REAL=1
+INT_DEFINE_END(machine_check)
+
+EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
+   GEN_INT_ENTRY machine_check_early, virt=0
 EXC_REAL_END(machine_check, 0x200, 0x100)
 EXC_VIRT_NONE(0x4200, 0x100)
 
 #ifdef CONFIG_PPC_PSERIES
 TRAMP_REAL_BEGIN(machine_check_fwnmi)
/* See comment at machine_check exception, don't turn on RI */
-   INT_HANDLER machine_check, 0x200, 

[PATCH v2 rebase 04/34] powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

These don't provide a large amount of code sharing. Removing them
makes code easier to shuffle around. For example, some of the common
instructions will be moved into the common code gen macro.

No generated code change.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 160 ---
 1 file changed, 117 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 087df86d03ff..cefe2e9a9e05 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -757,28 +757,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
 #define FINISH_NAP
 #endif
 
-#define EXC_COMMON(name, realvec, hdlr)	\
-   EXC_COMMON_BEGIN(name); \
-   INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ; \
-   bl  save_nvgprs;\
-   addir3,r1,STACK_FRAME_OVERHEAD; \
-   bl  hdlr;   \
-   b   ret_from_except
-
-/*
- * Like EXC_COMMON, but for exceptions that can occur in the idle task and
- * therefore need the special idle handling (finish nap and runlatch)
- */
-#define EXC_COMMON_ASYNC(name, realvec, hdlr)  \
-   EXC_COMMON_BEGIN(name); \
-   INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ; \
-   FINISH_NAP; \
-   RUNLATCH_ON;\
-   addir3,r1,STACK_FRAME_OVERHEAD; \
-   bl  hdlr;   \
-   b   ret_from_except_lite
-
-
 /*
  * There are a few constraints to be concerned with.
  * - Real mode exceptions code/data must be located at their physical location.
@@ -1349,7 +1327,13 @@ EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100)
	INT_HANDLER hardware_interrupt, 0x500, virt=1, hsrr=EXC_HV_OR_STD, bitmask=IRQS_DISABLED, kvm=1
 EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100)
 INT_KVM_HANDLER hardware_interrupt, 0x500, EXC_HV_OR_STD, PACA_EXGEN, 0
-EXC_COMMON_ASYNC(hardware_interrupt_common, 0x500, do_IRQ)
+EXC_COMMON_BEGIN(hardware_interrupt_common)
+   INT_COMMON 0x500, PACA_EXGEN, 1, 1, 1, 0, 0
+   FINISH_NAP
+   RUNLATCH_ON
+   addir3,r1,STACK_FRAME_OVERHEAD
+   bl  do_IRQ
+   b   ret_from_except_lite
 
 
 EXC_REAL_BEGIN(alignment, 0x600, 0x100)
@@ -1455,7 +1439,13 @@ EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80)
INT_HANDLER decrementer, 0x900, virt=1, bitmask=IRQS_DISABLED
 EXC_VIRT_END(decrementer, 0x4900, 0x80)
 INT_KVM_HANDLER decrementer, 0x900, EXC_STD, PACA_EXGEN, 0
-EXC_COMMON_ASYNC(decrementer_common, 0x900, timer_interrupt)
+EXC_COMMON_BEGIN(decrementer_common)
+   INT_COMMON 0x900, PACA_EXGEN, 1, 1, 1, 0, 0
+   FINISH_NAP
+   RUNLATCH_ON
+   addir3,r1,STACK_FRAME_OVERHEAD
+   bl  timer_interrupt
+   b   ret_from_except_lite
 
 
 EXC_REAL_BEGIN(hdecrementer, 0x980, 0x80)
@@ -1465,7 +1455,12 @@ EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80)
INT_HANDLER hdecrementer, 0x980, virt=1, hsrr=EXC_HV, kvm=1
 EXC_VIRT_END(hdecrementer, 0x4980, 0x80)
 INT_KVM_HANDLER hdecrementer, 0x980, EXC_HV, PACA_EXGEN, 0
-EXC_COMMON(hdecrementer_common, 0x980, hdec_interrupt)
+EXC_COMMON_BEGIN(hdecrementer_common)
+   INT_COMMON 0x980, PACA_EXGEN, 1, 1, 1, 0, 0
+   bl  save_nvgprs
+   addir3,r1,STACK_FRAME_OVERHEAD
+   bl  hdec_interrupt
+   b   ret_from_except
 
 
 EXC_REAL_BEGIN(doorbell_super, 0xa00, 0x100)
@@ -1475,11 +1470,17 @@ EXC_VIRT_BEGIN(doorbell_super, 0x4a00, 0x100)
INT_HANDLER doorbell_super, 0xa00, virt=1, bitmask=IRQS_DISABLED
 EXC_VIRT_END(doorbell_super, 0x4a00, 0x100)
 INT_KVM_HANDLER doorbell_super, 0xa00, EXC_STD, PACA_EXGEN, 0
+EXC_COMMON_BEGIN(doorbell_super_common)
+   INT_COMMON 0xa00, PACA_EXGEN, 1, 1, 1, 0, 0
+   FINISH_NAP
+   RUNLATCH_ON
+   addir3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_PPC_DOORBELL
-EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, doorbell_exception)
+   bl  doorbell_exception
 #else
-EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, unknown_exception)
+   bl  unknown_exception
 #endif
+   b   ret_from_except_lite
 
 
 EXC_REAL_NONE(0xb00, 0x100)
@@ -1623,7 +1624,12 @@ EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100)
INT_HANDLER single_step, 0xd00, virt=1
 EXC_VIRT_END(single_step, 0x4d00, 0x100)
 INT_KVM_HANDLER single_step, 0xd00, EXC_STD, PACA_EXGEN, 0
-EXC_COMMON(single_step_common, 0xd00, single_step_exception)
+EXC_COMMON_BEGIN(single_step_common)
+   INT_COMMON 0xd00, PACA_EXGEN, 1, 1, 1, 0, 0
+   bl  save_nvgprs
+   

[PATCH v2 rebase 03/34] powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE parameters

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

No generated code change.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 595e215515e9..087df86d03ff 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -204,6 +204,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define ISET_RI.L_ISET_RI_\name\()
 #define IEARLY .L_IEARLY_\name\()
 #define IMASK  .L_IMASK_\name\()
+#define IKVM_SKIP  .L_IKVM_SKIP_\name\()
 #define IKVM_REAL  .L_IKVM_REAL_\name\()
 #define IKVM_VIRT  .L_IKVM_VIRT_\name\()
 #define ISTACK .L_ISTACK_\name\()
@@ -243,6 +244,9 @@ do_define_int n
.ifndef IMASK
IMASK=0
.endif
+   .ifndef IKVM_SKIP
+   IKVM_SKIP=0
+   .endif
.ifndef IKVM_REAL
IKVM_REAL=0
.endif
@@ -265,6 +269,10 @@ do_define_int n
KVM_HANDLER \vec, \hsrr, \area, \skip
 .endm
 
+.macro GEN_KVM name
+   KVM_HANDLER IVEC, IHSRR, IAREA, IKVM_SKIP
+.endm
+
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
@@ -1226,6 +1234,7 @@ INT_DEFINE_BEGIN(data_access)
IVEC=0x300
IDAR=1
IDSISR=1
+   IKVM_SKIP=1
IKVM_REAL=1
 INT_DEFINE_END(data_access)
 
@@ -1235,7 +1244,8 @@ EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
GEN_INT_ENTRY data_access, virt=1
 EXC_VIRT_END(data_access, 0x4300, 0x80)
-INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1
+TRAMP_KVM_BEGIN(data_access_kvm)
+   GEN_KVM data_access
 EXC_COMMON_BEGIN(data_access_common)
GEN_COMMON data_access
ld  r4,_DAR(r1)
-- 
2.23.0



[PATCH v2 rebase 02/34] powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE parameters

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

No generated code change.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 0be6d8c34536..595e215515e9 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -206,6 +206,9 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IMASK  .L_IMASK_\name\()
 #define IKVM_REAL  .L_IKVM_REAL_\name\()
 #define IKVM_VIRT  .L_IKVM_VIRT_\name\()
+#define ISTACK .L_ISTACK_\name\()
+#define IRECONCILE .L_IRECONCILE_\name\()
+#define IKUAP  .L_IKUAP_\name\()
 
 #define INT_DEFINE_BEGIN(n)\
 .macro int_define_ ## n name
@@ -246,6 +249,15 @@ do_define_int n
.ifndef IKVM_VIRT
IKVM_VIRT=0
.endif
+   .ifndef ISTACK
+   ISTACK=1
+   .endif
+   .ifndef IRECONCILE
+   IRECONCILE=1
+   .endif
+   .ifndef IKUAP
+   IKUAP=1
+   .endif
 .endm
 
 .macro INT_KVM_HANDLER name, vec, hsrr, area, skip
@@ -670,6 +682,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
.endif
 .endm
 
+.macro GEN_COMMON name
+   INT_COMMON IVEC, IAREA, ISTACK, IKUAP, IRECONCILE, IDAR, IDSISR
+.endm
+
 /*
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
@@ -1221,13 +1237,7 @@ EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
 EXC_VIRT_END(data_access, 0x4300, 0x80)
 INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1
 EXC_COMMON_BEGIN(data_access_common)
-   /*
-* Here r13 points to the paca, r9 contains the saved CR,
-* SRR0 and SRR1 are saved in r11 and r12,
-* r9 - r13 are saved in paca->exgen.
-* EX_DAR and EX_DSISR have saved DAR/DSISR
-*/
-   INT_COMMON 0x300, PACA_EXGEN, 1, 1, 1, 1, 1
+   GEN_COMMON data_access
ld  r4,_DAR(r1)
ld  r5,_DSISR(r1)
 BEGIN_MMU_FTR_SECTION
-- 
2.23.0



[PATCH v2 rebase 00/34] exception cleanup, syscall in C and !COMPAT

2019-11-27 Thread Michal Suchanek
Hello,

This is a merge of https://patchwork.ozlabs.org/cover/1162376/ (except the
two last experimental patches) and
https://patchwork.ozlabs.org/patch/1162079/ rebased on top of master.

There was a minor conflict in the Makefile in the latter series.

Refreshed the patchset to fix a build error on ppc32 and ppc64e.

Rebased on top of powerpc/merge.

Thanks

Michal

Michal Suchanek (9):
  powerpc/64: system call: Fix sparse warning about missing declaration
  powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
  powerpc: move common register copy functions from signal_32.c to
signal.c
  powerpc/perf: consolidate read_user_stack_32
  powerpc/perf: consolidate valid_user_sp
  powerpc/64: make buildable without CONFIG_COMPAT
  powerpc/64: Make COMPAT user-selectable disabled on littleendian by
default.
  powerpc/perf: split callchain.c by bitness
  MAINTAINERS: perf: Add pattern that matches ppc perf to the perf
entry.

Nicholas Piggin (25):
  powerpc/64s/exception: Introduce INT_DEFINE parameter block for code
generation
  powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE
parameters
  powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE
parameters
  powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros
  powerpc/64s/exception: Move all interrupt handlers to new style code
gen macros
  powerpc/64s/exception: Remove old INT_ENTRY macro
  powerpc/64s/exception: Remove old INT_COMMON macro
  powerpc/64s/exception: Remove old INT_KVM_HANDLER
  powerpc/64s/exception: Add ISIDE option
  powerpc/64s/exception: move real->virt switch into the common handler
  powerpc/64s/exception: move soft-mask test to common code
  powerpc/64s/exception: move KVM test to common code
  powerpc/64s/exception: remove confusing IEARLY option
  powerpc/64s/exception: remove the SPR saving patch code macros
  powerpc/64s/exception: trim unused arguments from KVMTEST macro
  powerpc/64s/exception: hdecrementer avoid touching the stack
  powerpc/64s/exception: re-inline some handlers
  powerpc/64s/exception: Clean up SRR specifiers
  powerpc/64s/exception: add more comments for interrupt handlers
  powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is
supported
  powerpc/64s/exception: soft nmi interrupt should not use
ret_from_except
  powerpc/64: system call remove non-volatile GPR save optimisation
  powerpc/64: system call implement the bulk of the logic in C
  powerpc/64s: interrupt return in C
  powerpc/64s/exception: remove lite interrupt return

 MAINTAINERS   |2 +
 arch/powerpc/Kconfig  |5 +-
 arch/powerpc/include/asm/asm-prototypes.h |   17 +-
 .../powerpc/include/asm/book3s/64/kup-radix.h |   24 +-
 arch/powerpc/include/asm/cputime.h|   24 +
 arch/powerpc/include/asm/exception-64s.h  |4 -
 arch/powerpc/include/asm/hw_irq.h |4 +
 arch/powerpc/include/asm/ptrace.h |3 +
 arch/powerpc/include/asm/signal.h |3 +
 arch/powerpc/include/asm/switch_to.h  |   11 +
 arch/powerpc/include/asm/thread_info.h|4 +-
 arch/powerpc/include/asm/time.h   |4 +-
 arch/powerpc/include/asm/unistd.h |1 +
 arch/powerpc/kernel/Makefile  |9 +-
 arch/powerpc/kernel/entry_64.S|  880 ++--
 arch/powerpc/kernel/exceptions-64e.S  |  255 ++-
 arch/powerpc/kernel/exceptions-64s.S  | 1937 -
 arch/powerpc/kernel/process.c |   89 +-
 arch/powerpc/kernel/signal.c  |  144 +-
 arch/powerpc/kernel/signal.h  |2 -
 arch/powerpc/kernel/signal_32.c   |  140 --
 arch/powerpc/kernel/syscall_64.c  |  349 +++
 arch/powerpc/kernel/syscalls/syscall.tbl  |   22 +-
 arch/powerpc/kernel/systbl.S  |9 +-
 arch/powerpc/kernel/time.c|9 -
 arch/powerpc/kernel/vdso.c|3 +-
 arch/powerpc/kernel/vector.S  |2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |   11 -
 arch/powerpc/kvm/book3s_segment.S |7 -
 arch/powerpc/perf/Makefile|5 +-
 arch/powerpc/perf/callchain.c |  370 +---
 arch/powerpc/perf/callchain.h |   20 +
 arch/powerpc/perf/callchain_32.c  |  197 ++
 arch/powerpc/perf/callchain_64.c  |  178 ++
 fs/read_write.c   |3 +-
 35 files changed, 2798 insertions(+), 1949 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_64.c
 create mode 100644 arch/powerpc/perf/callchain.h
 create mode 100644 arch/powerpc/perf/callchain_32.c
 create mode 100644 arch/powerpc/perf/callchain_64.c

-- 
2.23.0



[PATCH v2 rebase 01/34] powerpc/64s/exception: Introduce INT_DEFINE parameter block for code generation

2019-11-27 Thread Michal Suchanek
From: Nicholas Piggin 

The code generation macro arguments are difficult to read, and
defaults can't easily be used.

This introduces a block where parameters can be set for interrupt
handler code generation by the subsequent macros, and adds the first
generation macro for interrupt entry.

One interrupt handler is converted to the new macros to demonstrate
the change; the rest will be converted all at once.

No generated code change.

Signed-off-by: Nicholas Piggin 
---
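[Editorial note: the set-a-default-unless-the-block-defined-it scheme that
do_define_int implements with ".ifndef" has a direct C-preprocessor analogue.
The sketch below is purely illustrative, not kernel code; the parameter names
mirror the assembler macros, but the #ifndef mechanism is the C equivalent of
the assembler's .ifndef tests.]

```c
#include <assert.h>

/* An interrupt "definition block" sets only the parameters it cares
 * about; everything else falls back to a default, mirroring the
 * .ifndef defaults applied by do_define_int in exceptions-64s.S. */
#define IVEC 0x300      /* the block must provide the vector */
#define IDAR 1          /* this handler wants DAR saved */

/* Defaults, applied only when the block did not set the parameter. */
#ifndef IHSRR
#define IHSRR 0         /* EXC_STD */
#endif
#ifndef ISET_RI
#define ISET_RI 1
#endif
#ifndef IDAR
#define IDAR 0          /* not reached: the block already set IDAR */
#endif
```

As with the assembler version, a parameter that the block omits silently takes
its default, so each handler definition only spells out what is unusual about it.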
 arch/powerpc/kernel/exceptions-64s.S | 77 ++--
 1 file changed, 73 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 46508b148e16..0be6d8c34536 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -193,6 +193,61 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
mtctr   reg;\
bctr
 
+/*
+ * Interrupt code generation macros
+ */
+#define IVEC   .L_IVEC_\name\()
+#define IHSRR  .L_IHSRR_\name\()
+#define IAREA  .L_IAREA_\name\()
+#define IDAR   .L_IDAR_\name\()
+#define IDSISR .L_IDSISR_\name\()
+#define ISET_RI.L_ISET_RI_\name\()
+#define IEARLY .L_IEARLY_\name\()
+#define IMASK  .L_IMASK_\name\()
+#define IKVM_REAL  .L_IKVM_REAL_\name\()
+#define IKVM_VIRT  .L_IKVM_VIRT_\name\()
+
+#define INT_DEFINE_BEGIN(n)\
+.macro int_define_ ## n name
+
+#define INT_DEFINE_END(n)  \
+.endm ;	\
+int_define_ ## n n ;   \
+do_define_int n
+
+.macro do_define_int name
+   .ifndef IVEC
+   .error "IVEC not defined"
+   .endif
+   .ifndef IHSRR
+   IHSRR=EXC_STD
+   .endif
+   .ifndef IAREA
+   IAREA=PACA_EXGEN
+   .endif
+   .ifndef IDAR
+   IDAR=0
+   .endif
+   .ifndef IDSISR
+   IDSISR=0
+   .endif
+   .ifndef ISET_RI
+   ISET_RI=1
+   .endif
+   .ifndef IEARLY
+   IEARLY=0
+   .endif
+   .ifndef IMASK
+   IMASK=0
+   .endif
+   .ifndef IKVM_REAL
+   IKVM_REAL=0
+   .endif
+   .ifndef IKVM_VIRT
+   IKVM_VIRT=0
+   .endif
+.endm
+
 .macro INT_KVM_HANDLER name, vec, hsrr, area, skip
TRAMP_KVM_BEGIN(\name\()_kvm)
KVM_HANDLER \vec, \hsrr, \area, \skip
@@ -474,7 +529,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 */
GET_SCRATCH0(r10)
std r10,\area\()+EX_R13(r13)
-   .if \dar
+   .if \dar == 1
.if \hsrr
mfspr   r10,SPRN_HDAR
.else
@@ -482,7 +537,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
std r10,\area\()+EX_DAR(r13)
.endif
-   .if \dsisr
+   .if \dsisr == 1
.if \hsrr
mfspr   r10,SPRN_HDSISR
.else
@@ -506,6 +561,14 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
 .endm
 
+.macro GEN_INT_ENTRY name, virt, ool=0
+   .if ! \virt
+   INT_HANDLER \name, IVEC, \ool, IEARLY, \virt, IHSRR, IAREA, ISET_RI, IDAR, IDSISR, IMASK, IKVM_REAL
+   .else
+   INT_HANDLER \name, IVEC, \ool, IEARLY, \virt, IHSRR, IAREA, ISET_RI, IDAR, IDSISR, IMASK, IKVM_VIRT
+   .endif
+.endm
+
 /*
  * On entry r13 points to the paca, r9-r13 are saved in the paca,
  * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and
@@ -1143,12 +1206,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
bl  unrecoverable_exception
b   .
 
+INT_DEFINE_BEGIN(data_access)
+   IVEC=0x300
+   IDAR=1
+   IDSISR=1
+   IKVM_REAL=1
+INT_DEFINE_END(data_access)
 
 EXC_REAL_BEGIN(data_access, 0x300, 0x80)
-   INT_HANDLER data_access, 0x300, ool=1, dar=1, dsisr=1, kvm=1
+   GEN_INT_ENTRY data_access, virt=0, ool=1
 EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
-   INT_HANDLER data_access, 0x300, virt=1, dar=1, dsisr=1
+   GEN_INT_ENTRY data_access, virt=1
 EXC_VIRT_END(data_access, 0x4300, 0x80)
 INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1
 EXC_COMMON_BEGIN(data_access_common)
-- 
2.23.0



Re: [Very RFC 35/46] powernv/pci: Remove open-coded PE lookup in pnv_pci_release_device

2019-11-27 Thread Oliver O'Halloran
On Wed, Nov 27, 2019 at 4:24 PM Alexey Kardashevskiy  wrote:
>
>
>
> On 20/11/2019 12:28, Oliver O'Halloran wrote:
> > Signed-off-by: Oliver O'Halloran 
> > ---
> >  arch/powerpc/platforms/powernv/pci-ioda.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index 4f38652c7cd7..8525642b1256 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -3562,14 +3562,14 @@ static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
> >  static void pnv_pci_release_device(struct pci_dev *pdev)
> >  {
> >   struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
> > + struct pnv_ioda_pe *pe = pnv_ioda_get_pe(pdev);
> >   struct pci_dn *pdn = pci_get_pdn(pdev);
> > - struct pnv_ioda_pe *pe;
> >
> >   /* The VF PE state is torn down when sriov_disable() is called */
> >   if (pdev->is_virtfn)
> >   return;
> >
> > - if (!pdn || pdn->pe_number == IODA_INVALID_PE)
> > + if (WARN_ON(!pe))
>
>
> Is that WARN_ON because there is always a PE - from upstream bridge or a

The device should always belong to a PE. If it doesn't (at this point)
then something deeply strange has happened.

> reserved one?

If it's associated with the reserved PE the rmap is set to
IODA_PE_INVALID, so would return NULL and we'd hit the WARN_ON(). I
think that's ok though since PE assignment should always succeed. If
it failed, or we're tearing down the device before we got to the point
of assigning a PE then there's probably a bug.
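[Editorial note: the lookup behaviour described above can be sketched with a
toy model. The names below (toy_get_pe, rmap, pe_table) are illustrative
stand-ins, not the real pci-ioda.c data structures: a device whose
reverse-map entry is invalid yields a NULL PE, which the release path then
catches with WARN_ON.]

```c
#include <assert.h>
#include <stddef.h>

#define IODA_INVALID_PE (-1)

struct pnv_ioda_pe { int pe_number; };

/* Toy PE table and reverse map: device index -> PE number,
 * or IODA_INVALID_PE when no PE was ever assigned. */
static struct pnv_ioda_pe pe_table[4] = {
    { .pe_number = 0 }, { .pe_number = 1 },
    { .pe_number = 2 }, { .pe_number = 3 },
};
static int rmap[2] = { 2, IODA_INVALID_PE };

/* Simplified stand-in for pnv_ioda_get_pe(): returns NULL when the
 * device maps to the reserved/invalid entry. */
static struct pnv_ioda_pe *toy_get_pe(int dev_idx)
{
    int pe_number = rmap[dev_idx];

    if (pe_number == IODA_INVALID_PE)
        return NULL;
    return &pe_table[pe_number];
}
```

Under this model, hitting the NULL case in the release path means PE
assignment either failed or never happened, which is why a warning there is
reasonable.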


Re: [PATCH v2 29/35] powerpc/perf: remove current_is_64bit()

2019-11-27 Thread Michal Suchánek
On Wed, Nov 27, 2019 at 06:41:09AM +0100, Christophe Leroy wrote:
> 
> 
> Le 26/11/2019 à 21:13, Michal Suchanek a écrit :
> > Since commit ed1cd6deb013 ("powerpc: Activate CONFIG_THREAD_INFO_IN_TASK")
> > current_is_64bit() is equivalent to !is_32bit_task().
> > Remove the redundant function.
> > 
> > Link: https://github.com/linuxppc/issues/issues/275
> > Link: https://lkml.org/lkml/2019/9/12/540
> > 
> > Fixes: linuxppc#275
> > Suggested-by: Christophe Leroy 
> > Signed-off-by: Michal Suchanek 
> 
> This change is already in powerpc/next, see 
> https://github.com/linuxppc/linux/commit/42484d2c0f82b666292faf6668c77b49a3a04bc0

Right, needs rebase.

Thanks

Michal
> 
> Christophe
> 
> > ---
> >   arch/powerpc/perf/callchain.c | 17 +
> >   1 file changed, 1 insertion(+), 16 deletions(-)
> > 
> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> > index c84bbd4298a0..35d542515faf 100644
> > --- a/arch/powerpc/perf/callchain.c
> > +++ b/arch/powerpc/perf/callchain.c
> > @@ -284,16 +284,6 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> > }
> >   }
> > -static inline int current_is_64bit(void)
> > -{
> > -   /*
> > -* We can't use test_thread_flag() here because we may be on an
> > -* interrupt stack, and the thread flags don't get copied over
> > -* from the thread_info on the main stack to the interrupt stack.
> > -*/
> > -   return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> > -}
> > -
> >   #else  /* CONFIG_PPC64 */
> >   /*
> >* On 32-bit we just access the address and let hash_page create a
> > @@ -321,11 +311,6 @@ static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry
> >   {
> >   }
> > -static inline int current_is_64bit(void)
> > -{
> > -   return 0;
> > -}
> > -
> >   static inline int valid_user_sp(unsigned long sp, int is_64)
> >   {
> > if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
> > @@ -486,7 +471,7 @@ static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> >   void
> >   perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
> >   {
> > -   if (current_is_64bit())
> > +   if (!is_32bit_task())
> > perf_callchain_user_64(entry, regs);
> > else
> > perf_callchain_user_32(entry, regs);
> > 
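
[Editorial note: the equivalence the removed helper relied on can be shown
with a minimal sketch. This is an illustrative model, not the kernel
implementation: once the flags live in the task structure
(CONFIG_THREAD_INFO_IN_TASK), testing TIF_32BIT is safe from any stack, so
the wrapper reduces to a plain negation.]

```c
#include <assert.h>

#define TIF_32BIT 4

/* Toy task structure: after THREAD_INFO_IN_TASK the flags live here,
 * not in a thread_info on the (possibly interrupt) stack. */
struct toy_task {
    unsigned long flags;
};

static struct toy_task current_task;

static int is_32bit_task(void)
{
    return !!(current_task.flags & (1UL << TIF_32BIT));
}

/* The removed helper was just the negation of the test above. */
static int current_is_64bit(void)
{
    return !is_32bit_task();
}
```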


Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Frederic Barrat




Le 27/11/2019 à 10:33, Greg Kurz a écrit :

On Wed, 27 Nov 2019 10:10:13 +0100
Frederic Barrat  wrote:




Le 27/11/2019 à 09:24, Greg Kurz a écrit :

On Wed, 27 Nov 2019 18:09:40 +1100
Alexey Kardashevskiy  wrote:




On 20/11/2019 12:28, Oliver O'Halloran wrote:

The comment here implies that we don't need to take a ref to the pci_dev
because the ioda_pe will always have one. This implies that the current
expectation is that the pci_dev for an NPU device will *never* be torn
down since the ioda_pe having a ref to the device will prevent the
release function from being called.

In other words, the desired behaviour here appears to be leaking a ref.

Nice!



There is a history: https://patchwork.ozlabs.org/patch/1088078/

We did not fix anything in particular then, we do not seem to be fixing
anything now (in other words - we cannot test it in a normal natural
way). I'd drop this one.



Yeah, I didn't fix anything at the time. Just reverted to the ref
count behavior we had before:

https://patchwork.ozlabs.org/patch/829172/

Frederic recently posted his take on the same topic from the OpenCAPI
point of view:

http://patchwork.ozlabs.org/patch/1198947/

He seems to indicate the NPU devices as the real culprit because
nobody ever cared for them to be removable. Fixing that seems to be
a chore nobody really wants to address obviously... :-\



I had taken a stab at not leaking a ref for the nvlink devices and do
the proper thing regarding ref counting (i.e. fixing all the callers of
get_pci_dev() to drop the reference when they were done). With that, I
could see that the ref count of the nvlink devices could drop to 0
(calling remove for the device in /sys) and that the devices could go away.

But then, I realized it's not necessarily desirable at this point. There
are several comments in the code saying the npu devices (for nvlink)
don't go away, there's no device release callback defined when it seems
there should be, at least to handle releasing PEs... All in all, it
seems that some work would be needed. And if it hasn't been required by
now...



If everyone is ok with leaking a reference in the NPU case, I guess
this isn't a problem. But if we move forward with Oliver's patch, a
pci_dev_put() would be needed for OpenCAPI, correct ?



No, these code paths are nvlink-only.

  Fred




Fred







Signed-off-by: Oliver O'Halloran 
---
   arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
   1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 72d3749da02c..2eb6e6d45a98 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn)
break;
   
   	/*

-* pci_get_domain_bus_and_slot() increased the reference count of
-* the PCI device, but callers don't need that actually as the PE
-* already holds a reference to the device. Since callers aren't
-* aware of the reference count change, call pci_dev_put() now to
-* avoid leaks.
+* NB: for_each_pci_dev() elevates the pci_dev refcount.
+* Caller is responsible for dropping the ref when it's
+* finished with it.
 */
-   if (pdev)
-   pci_dev_put(pdev);
-
return pdev;
   }
   













Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Oliver O'Halloran
On Wed, Nov 27, 2019 at 8:34 PM Greg Kurz  wrote:
>
>
> If everyone is ok with leaking a reference in the NPU case, I guess
> this isn't a problem. But if we move forward with Oliver's patch, a
> pci_dev_put() would be needed for OpenCAPI, correct ?

Yes, but I think that's fair enough. By convention it's the caller's
responsibility to drop the ref when it calls a function that returns a
refcounted object. Doing anything else creates a race condition since
the object's count could drop to zero before the caller starts using
it.
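
[Editorial note: the convention described above — a lookup returns with an
elevated refcount and the caller drops it with a put — can be sketched with a
toy counter. The names here (toy_dev, toy_get_dev, toy_put_dev) are
illustrative stand-ins for pci_dev, pci_get_domain_bus_and_slot() and
pci_dev_put(); real kernel code uses kref/atomic counting.]

```c
#include <assert.h>

struct toy_dev {
    int refcount;
};

/* One long-lived reference, e.g. the one the PE holds. */
static struct toy_dev the_dev = { .refcount = 1 };

/* Stand-in for a lookup like pci_get_domain_bus_and_slot(): the
 * returned pointer comes with its own reference, so the object cannot
 * vanish between the lookup and the caller's use of it. */
static struct toy_dev *toy_get_dev(void)
{
    the_dev.refcount++;
    return &the_dev;
}

/* Stand-in for pci_dev_put(): the caller drops its reference when done. */
static void toy_put_dev(struct toy_dev *dev)
{
    dev->refcount--;
}
```

Dropping the reference inside the lookup, as the old get_pci_dev() comment
suggested, would reopen exactly the race Oliver describes: the count could
reach zero before the caller ever touches the object.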

Oliver


Re: [PATCH 02/14] Revert "powerpc/powernv: remove the unused vas_win_paste_addr and vas_win_id functions"

2019-11-27 Thread Christoph Hellwig
On Wed, Nov 27, 2019 at 01:20:36AM -0800, Haren Myneni wrote:
> Thanks for the review.
> vas_win_paste_addr() will be used in the NX compression driver, and I am planning to
> post this series soon. Can I add this change later as part of this series?

Please only add core functionality and exports with the actual users.


Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Frederic Barrat




Le 27/11/2019 à 09:24, Greg Kurz a écrit :

On Wed, 27 Nov 2019 18:09:40 +1100
Alexey Kardashevskiy  wrote:




On 20/11/2019 12:28, Oliver O'Halloran wrote:

The comment here implies that we don't need to take a ref to the pci_dev
because the ioda_pe will always have one. This implies that the current
expectation is that the pci_dev for an NPU device will *never* be torn
down since the ioda_pe having a ref to the device will prevent the
release function from being called.

In other words, the desired behaviour here appears to be leaking a ref.

Nice!



There is a history: https://patchwork.ozlabs.org/patch/1088078/

We did not fix anything in particular then, we do not seem to be fixing
anything now (in other words - we cannot test it in a normal natural
way). I'd drop this one.



Yeah, I didn't fix anything at the time. Just reverted to the ref
count behavior we had before:

https://patchwork.ozlabs.org/patch/829172/

Frederic recently posted his take on the same topic from the OpenCAPI
point of view:

http://patchwork.ozlabs.org/patch/1198947/

He seems to point at the NPU devices as the real culprit because
nobody ever cared for them to be removable. Fixing that seems to be
a chore nobody really wants to address, obviously... :-\



I had taken a stab at not leaking a ref for the nvlink devices and doing 
the proper thing regarding ref counting (i.e. fixing all the callers of 
get_pci_dev() to drop the reference when they were done). With that, I 
could see that the ref count of the nvlink devices could drop to 0 
(calling remove for the device in /sys) and that the devices could go away.


But then, I realized it's not necessarily desirable at this point. There 
are several comments in the code saying the npu devices (for nvlink) 
don't go away, there's no device release callback defined when it seems 
there should be, at least to handle releasing PEs... All in all, it 
seems that some work would be needed. And if it hasn't been required by 
now...


  Fred







Signed-off-by: Oliver O'Halloran 
---
  arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
  1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index 72d3749da02c..2eb6e6d45a98 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn)
break;
  
  	/*

-* pci_get_domain_bus_and_slot() increased the reference count of
-* the PCI device, but callers don't need that actually as the PE
-* already holds a reference to the device. Since callers aren't
-* aware of the reference count change, call pci_dev_put() now to
-* avoid leaks.
+* NB: for_each_pci_dev() elevates the pci_dev refcount.
+* Caller is responsible for dropping the ref when it's
+* finished with it.
 */
-   if (pdev)
-   pci_dev_put(pdev);
-
return pdev;
  }
  


Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Greg Kurz
On Wed, 27 Nov 2019 18:09:40 +1100
Alexey Kardashevskiy  wrote:

> 
> 
> On 20/11/2019 12:28, Oliver O'Halloran wrote:
> > The comment here implies that we don't need to take a ref to the pci_dev
> > because the ioda_pe will always have one. This implies that the current
> > expectation is that the pci_dev for an NPU device will *never* be torn
> > down since the ioda_pe having a ref to the device will prevent the
> > release function from being called.
> > 
> > In other words, the desired behaviour here appears to be leaking a ref.
> > 
> > Nice!
> 
> 
> There is a history: https://patchwork.ozlabs.org/patch/1088078/
> 
> We did not fix anything in particular then, we do not seem to be fixing
> anything now (in other words - we cannot test it in a normal natural
> way). I'd drop this one.
> 

Yeah, I didn't fix anything at the time. Just reverted to the ref
count behavior we had before:

https://patchwork.ozlabs.org/patch/829172/

Frederic recently posted his take on the same topic from the OpenCAPI
point of view:

http://patchwork.ozlabs.org/patch/1198947/

He seems to point at the NPU devices as the real culprit because
nobody ever cared for them to be removable. Fixing that seems to be
a chore nobody really wants to address, obviously... :-\

> 
> 
> > 
> > Signed-off-by: Oliver O'Halloran 
> > ---
> >  arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
> >  1 file changed, 3 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> > b/arch/powerpc/platforms/powernv/npu-dma.c
> > index 72d3749da02c..2eb6e6d45a98 100644
> > --- a/arch/powerpc/platforms/powernv/npu-dma.c
> > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> > @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node 
> > *dn)
> > break;
> >  
> > /*
> > -* pci_get_domain_bus_and_slot() increased the reference count of
> > -* the PCI device, but callers don't need that actually as the PE
> > -* already holds a reference to the device. Since callers aren't
> > -* aware of the reference count change, call pci_dev_put() now to
> > -* avoid leaks.
> > +* NB: for_each_pci_dev() elevates the pci_dev refcount.
> > +* Caller is responsible for dropping the ref when it's
> > +* finished with it.
> >  */
> > -   if (pdev)
> > -   pci_dev_put(pdev);
> > -
> > return pdev;
> >  }
> >  
> > 
> 



Re: Bug 205201 - Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M

2019-11-27 Thread Christoph Hellwig
On Wed, Nov 27, 2019 at 08:56:25AM +0200, Mike Rapoport wrote:
> Maybe we'll simply force bottom up allocation before calling
> swiotlb_init()? Anyway, it's the last memblock allocation.

That should work, but I don't think it is the proper fix.  The underlying
issue here is that ZONE_DMA/DMA32 sizing is something that needs to
be propagated to memblock and dma-direct, as it is based on addressing
limitations.  But our zone initialization is such a mess that we
can't just reuse a variable.  Nicolas has started to clean some of
this up, but we need to clean that whole zone initialization mess up
a lot more.


Re: [PATCH] powerpc/32: drop unused ISA_DMA_THRESHOLD

2019-11-27 Thread Christoph Hellwig
On Mon, Nov 25, 2019 at 11:20:33AM +0200, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> The ISA_DMA_THRESHOLD variable is set by several platforms but never
> referenced.
> Remove it.

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 09/14] powerpc/vas: Update CSB and notify process for fault CRBs

2019-11-27 Thread Christoph Hellwig
>  
> +static void notify_process(pid_t pid, u64 fault_addr)
> +{
> + int rc;
> + struct kernel_siginfo info;
> +
> + memset(&info, 0, sizeof(info));
> +
> + info.si_signo = SIGSEGV;
> + info.si_errno = EFAULT;
> + info.si_code = SEGV_MAPERR;
> +
> + info.si_addr = (void *)fault_addr;
> + rcu_read_lock();
> + rc = kill_pid_info(SIGSEGV, &info, find_vpid(pid));
> + rcu_read_unlock();
> +
> + pr_devel("%s(): pid %d kill_proc_info() rc %d\n", __func__, pid, rc);
> +}

Shouldn't this use force_sig_fault_to_task instead?
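For reference, the pattern Christoph is suggesting would look roughly
like the sketch below (assuming the v5.x signal API; this is not the
actual patch, and note that force_sig_fault_to_task() fills in si_signo,
si_code and si_addr itself but leaves si_errno at 0, unlike the EFAULT
the original sets):

```c
/* Hedged sketch: deliver SEGV_MAPERR directly to the task that owns
 * the window, instead of hand-building a kernel_siginfo and going
 * through kill_pid_info(). */
static void notify_process(struct task_struct *tsk, u64 fault_addr)
{
	force_sig_fault_to_task(SIGSEGV, SEGV_MAPERR,
				(void __user *)fault_addr, tsk);
}
```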

> + /*
> +  * User space passed invalid CSB address, Notify process with
> +  * SEGV signal.
> +  */
> + tsk = get_pid_task(window->pid, PIDTYPE_PID);
> + /*
> +  * Send window will be closed after processing all NX requests
> +  * and process exits after closing all windows. In multi-thread
> +  * applications, thread may not exists, but does not close FD
> +  * (means send window) upon exit. Parent thread (tgid) can use
> +  * and close the window later.
> +  */
> + if (tsk) {
> + if (tsk->flags & PF_EXITING)
> + task_exit = 1;
> + put_task_struct(tsk);
> + pid = vas_window_pid(window);

The pid is later used for sending the signal again, why not keep the
reference?

> + } else {
> + pid = vas_window_tgid(window);
> +
> + rcu_read_lock();
> + tsk = find_task_by_vpid(pid);
> + if (!tsk) {
> + rcu_read_unlock();
> + return;
> + }
> + if (tsk->flags & PF_EXITING)
> + task_exit = 1;
> + rcu_read_unlock();

Why does this not need a reference to the task, but the other one does?


Re: [PATCH 06/14] powerpc/vas: Setup fault handler per VAS instance

2019-11-27 Thread Christoph Hellwig
>  
> +struct task_struct *fault_handler;
> +
> +void vas_wakeup_fault_handler(int virq, void *arg)
> +{
> + struct vas_instance *vinst = arg;
> +
> + atomic_inc(&vinst->pending_fault);
> + wake_up(&vinst->fault_wq);
> +}
> +
> +/*
> + * Fault handler thread for each VAS instance and process fault CRBs.
> + */
> +static int fault_handler_func(void *arg)
> +{
> + struct vas_instance *vinst = (struct vas_instance *)arg;
> +
> + do {
> + if (signal_pending(current))
> + flush_signals(current);
> +
> + wait_event_interruptible(vinst->fault_wq,
> + atomic_read(&vinst->pending_fault) ||
> + kthread_should_stop());
> +
> + if (kthread_should_stop())
> + break;
> +
> + atomic_dec(&vinst->pending_fault);
> + } while (!kthread_should_stop());
> +
> + return 0;
> +}

Please use threaded interrupts instead of reinventing them badly.
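The threaded-interrupt replacement Christoph has in mind would look
something like this sketch (names such as fault_thread_fn and
process_fault_crbs are illustrative, not from the patch):

```c
/* The top half stays empty (NULL handler); the IRQ core spawns a
 * kernel thread that runs fault_thread_fn in process context whenever
 * the interrupt fires, replacing the hand-rolled kthread + wait-queue
 * + atomic counter machinery. */
static irqreturn_t fault_thread_fn(int virq, void *arg)
{
	struct vas_instance *vinst = arg;

	process_fault_crbs(vinst);	/* hypothetical worker */
	return IRQ_HANDLED;
}

/* at setup time: */
rc = request_threaded_irq(vinst->virq, NULL, fault_thread_fn,
			  IRQF_ONESHOT, "vas-fault", vinst);
```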


Re: [PATCH 05/14] powerpc/vas: Setup fault window per VAS instance

2019-11-27 Thread Christoph Hellwig
> +/*
> + * We do not remove VAS instances. The following functions are needed
> + * when VAS hotplug is supported.
> + */
> +#if 0

Please don't add dead code to the kernel tree.


Re: [PATCH 04/14] powerpc/vas: Setup IRQ mapping and register port for each window

2019-11-27 Thread Christoph Hellwig
> +static irqreturn_t vas_irq_handler(int virq, void *data)
> +{
> + struct vas_instance *vinst = data;
> +
> + pr_devel("VAS %d: virq %d\n", vinst->vas_id, virq);
> +
> + return IRQ_HANDLED;
> +}

An empty interrupt handler is rather pointless.  It later grows code,
but adding it without that is a bad idea.  Please squash the patches
into sensible chunks.


Re: [PATCH 03/14] powerpc/vas: Define nx_fault_stamp in coprocessor_request_block

2019-11-27 Thread Christoph Hellwig
> +#define crb_csb_addr(c)  __be64_to_cpu(c->csb_addr)
> +#define crb_nx_fault_addr(c) __be64_to_cpu(c->stamp.nx.fault_storage_addr)
> +#define crb_nx_flags(c)  c->stamp.nx.flags
> +#define crb_nx_fault_status(c)   c->stamp.nx.fault_status

Except for crb_nx_fault_addr all these macros are unused, and
crb_nx_fault_addr probably makes more sense open coded in the only
caller.

Also please don't use the __ prefixed byte swap helpers in any driver
or arch code.

> +
> +static inline uint32_t crb_nx_pswid(struct coprocessor_request_block *crb)
> +{
> + return __be32_to_cpu(crb->stamp.nx.pswid);
> +}

Same here.  Also not sure what the point of the helper is except for
obfuscating the code.


Re: [PATCH 02/14] Revert "powerpc/powernv: remove the unused vas_win_paste_addr and vas_win_id functions"

2019-11-27 Thread Christoph Hellwig
On Tue, Nov 26, 2019 at 05:03:27PM -0800, Haren Myneni wrote:
> 
> This reverts commit 452d23c0f6bd97f2fd8a9691fee79b76040a0feb.
> 
> User space send windows (NX GZIP compression) need vas_win_paste_addr()
> to mmap window paste address and vas_win_id() to get window ID when
> window address is given.

Even with your full series applied vas_win_paste_addr is entirely
unused, and vas_win_id is only used once in the same file it is defined.

So instead of this patch you should just open code vas_win_id in
init_winctx_for_txwin.

> +static inline u32 encode_pswid(int vasid, int winid)
> +{
> + u32 pswid = 0;
> +
> + pswid |= vasid << (31 - 7);
> + pswid |= winid;
> +
> + return pswid;

This can be simplified down to:

return (u32)winid | (vasid << (31 - 7));