date:20200415


Hi,

Le 16/04/2020 à 00:06, Segher Boessenkool a écrit :

Hi!

On Wed, Apr 15, 2020 at 09:20:26AM +, Christophe Leroy wrote:

At the time being, __put_user()/__get_user() and friends only use
register indirect with immediate index addressing, with the index
set to 0. Ex:

lwz reg1, 0(reg2)


This is called a "D-form" instruction, or sometimes "offset addressing".
Don't talk about an "index", it confuses things, because the *other*
kind is called "indexed" already, also in the ISA docs!  (X-form, aka
indexed addressing, [reg+reg], where D-form does [reg+imm], and both
forms can do [reg]).


In the "Programming Environments Manual for 32-Bit Implementations of 
the PowerPC™ Architecture", they list the following addressing modes:


Load and store operations have three categories of effective address 
generation that depend on the

operands specified:
• Register indirect with immediate index mode
• Register indirect with index mode
• Register indirect mode





Give the compiler the opportunity to use other adressing modes
whenever possible, to get more optimised code.


Great :-)


--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -114,7 +114,7 @@ extern long __put_user_bad(void);
   */
  #define __put_user_asm(x, addr, err, op)  \
__asm__ __volatile__(   \
-   "1:" op " %1,0(%2)   # put_user\n"  \
+   "1:" op "%U2%X2 %1,%2# put_user\n"  \
"2:\n"\
".section .fixup,\"ax\"\n"  \
"3:li %0,%3\n"\
@@ -122,7 +122,7 @@ extern long __put_user_bad(void);
".previous\n" \
EX_TABLE(1b, 3b)\
: "=r" (err)  \
-   : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
+   : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))


%Un on an "m" operand doesn't do much: you need to make it "m<>" if you
want pre-modify ("update") insns to be generated.  (You then will want
to make sure that operand is used in a way GCC can understand; since it
is used only once here, that works fine).


Ah ? Indeed I got the idea from include/asm/io.h where there is:

#define DEF_MMIO_IN_D(name, size, insn) \
static inline u##size name(const volatile u##size __iomem *addr)\
{   \
u##size ret;\
__asm__ __volatile__("sync;"#insn"%U1%X1 %0,%1;twi 0,%0,0;isync"\
: "=r" (ret) : "m" (*addr) : "memory");   \
return ret; \
}

It should be "m<>" there as well ?




@@ -130,8 +130,8 @@ extern long __put_user_bad(void);
  #else /* __powerpc64__ */
  #define __put_user_asm2(x, addr, err) \
__asm__ __volatile__(   \
-   "1:stw %1,0(%2)\n"\
-   "2:stw %1+1,4(%2)\n"  \
+   "1:stw%U2%X2 %1,%2\n" \
+   "2:stw%U2%X2 %L1,%L2\n"   \
"3:\n"\
".section .fixup,\"ax\"\n"  \
"4:li %0,%3\n"\
@@ -140,7 +140,7 @@ extern long __put_user_bad(void);
EX_TABLE(1b, 4b)\
EX_TABLE(2b, 4b)\
: "=r" (err)  \
-   : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
+   : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))


Here, it doesn't work.  You don't want two consecutive update insns in
any case.  Easiest is to just not use "m<>", and then, don't use %Un
(which won't do anything, but it is confusing).


Can't we leave the Un on the second stw ?



Same for the reads.

Rest looks fine, and update should be good with that fixed as said.

Reviewed-by: Segher Boessenkool 


Segher



Thanks for the review
Christophe

Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

Yes, kzalloc() would clean the allocated areas and the init of remaining array
elements are redundant. I will remove the block in v3.

>> > +  dev_err(>dev, "error no valid uio-map configured\n");
>> > +  ret = -EINVAL;
>> > +  goto err_info_free_internel;
>> > +  }
>> > +
>> > +  info->version = "0.1.0";
>> 
>> Could you define some DRIVER_VERSION in the top of the file next to 
>> DRIVER_NAME instead of hard coding in the middle on a function ?
>
>That's what v1 had, and Greg KH said to remove it.  I'm guessing that he
>thought it was the common-but-pointless practice of having the driver print a
>version number that never gets updated, rather than something the UIO API
>(unfortunately, compared to a feature query interface) expects.  That said,
>I'm not sure what the value is of making it a macro since it should only be
>used once, that use is self documenting, it isn't tunable, etc.  Though if
>this isn't a macro, UIO_NAME also shouldn't be (and if it is made a macro
>again, it should be UIO_VERSION, not DRIVER_VERSION).
>
>Does this really need a three-part version scheme?  What's wrong with a
>version of "1", to be changed to "2" in the hopefully-unlikely event that the
>userspace API changes?  Assuming UIO is used for this at all, which doesn't
>seem like a great fit to me.
>
>-Scott
>

As Scott mentioned, the version define as necessity by uio core but actually
useless for us here(and for many other type of devices I guess). So maybe the 
better
way is to set it optionally, but this belong first to uio core.

For the cache-sram uio driver, I will define an UIO_VERSION micro as a 
compromise
fit all wonders, no confusing as Greg first mentioned.

>> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
>> +{   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
>> +{},
>> +};
>
>NACK
>
>The device tree describes the hardware, not what driver you want to bind the
>hardware to, or how you want to allocate the resources.  And even if defining
>nodes for sram allocation were the right way to go, why do you have a separate
>compatible for each chip when you're just describing software configuration?
>
>Instead, have module parameters that take the sizes and alignments you'd like
>to allocate and expose to userspace.  Better still would be some sort of
>dynamic allocation (e.g. open a fd, ioctl to set the requested size/alignment,
>if it succeeds you can mmap it, and when the fd is closed the region is
>freed).
>
>-Scott
>

Can not agree more. But what if I want to define more than one cache-sram uio 
devices?
How about use the device tree for pseudo uio cache-sram driver?

static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
{   .compatible = "uio,cache-sram", },
{},
};

Thanks,
Wenhu

[PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function

2020-04-15 Thread Tianjia Zhang

In earlier versions of kvm, 'kvm_run' is an independent structure
and is not included in the vcpu structure. At present, 'kvm_run'
is already included in the vcpu structure, so the parameter
'kvm_run' is redundant.

This patch simplify the function definition, removes the extra
'kvm_run' parameter, and extract it from the 'kvm_vcpu' structure
if necessary.

Signed-off-by: Tianjia Zhang 
---

v2 change:
  remove 'kvm_run' parameter and extract it from 'kvm_vcpu'

 arch/mips/kvm/mips.c   |  3 ++-
 arch/powerpc/kvm/powerpc.c |  3 ++-
 arch/s390/kvm/kvm-s390.c   |  3 ++-
 arch/x86/kvm/x86.c | 11 ++-
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/arm/arm.c |  6 +++---
 virt/kvm/kvm_main.c|  2 +-
 7 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 8f05dd0a0f4e..ec24adf4857e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -439,8 +439,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
*vcpu,
return -ENOIOCTLCMD;
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *run = vcpu->run;
int r = -EINTR;
 
vcpu_load(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index e15166b0a16d..7e24691e138a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1764,8 +1764,9 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
return r;
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *run = vcpu->run;
int r;
 
vcpu_load(vcpu);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 19a81024fe16..443af3ead739 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4333,8 +4333,9 @@ static void store_regs(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
store_regs_fmt2(vcpu, kvm_run);
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *kvm_run = vcpu->run;
int rc;
 
if (kvm_run->immediate_exit)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bf2ecafd027..a0338e86c90f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8707,8 +8707,9 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
trace_kvm_fpu(0);
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *kvm_run = vcpu->run;
int r;
 
vcpu_load(vcpu);
@@ -8726,18 +8727,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
r = -EAGAIN;
if (signal_pending(current)) {
r = -EINTR;
-   vcpu->run->exit_reason = KVM_EXIT_INTR;
+   kvm_run->exit_reason = KVM_EXIT_INTR;
++vcpu->stat.signal_exits;
}
goto out;
}
 
-   if (vcpu->run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) {
+   if (kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) {
r = -EINVAL;
goto out;
}
 
-   if (vcpu->run->kvm_dirty_regs) {
+   if (kvm_run->kvm_dirty_regs) {
r = sync_regs(vcpu);
if (r != 0)
goto out;
@@ -8767,7 +8768,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
 
 out:
kvm_put_guest_fpu(vcpu);
-   if (vcpu->run->kvm_valid_regs)
+   if (kvm_run->kvm_valid_regs)
store_regs(vcpu);
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6d58beb65454..1e17ef719595 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -866,7 +866,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state);
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg);
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);
 
 int kvm_arch_init(void *opaque);
 void kvm_arch_exit(void);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..f5390ac2165b 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -639,7 +639,6 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  * @vcpu:  The VCPU pointer
- * @run:   The kvm_run structure pointer used for userspace

[PATCH] KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions

2020-04-15 Thread Paul Mackerras

Since cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT
page fault handler", it's been possible in fairly rare circumstances to
load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a
guest on a POWER8 host.

Because that case wasn't checked for, we could misinterpret the non-present
PTE as being a cache-inhibited PTE.  That could mismatch with the
corresponding hash PTE, which would cause the function to fail with -EFAULT
a little further down.  That would propagate up to the KVM_RUN ioctl()
generally causing the KVM userspace (usually qemu) to fall over.

This addresses the problem by catching that case and returning to the guest
instead, letting it fault again, and retrying the whole page fault from
the beginning.

For completeness, this fixes the radix page fault handler in the same
way.  For radix this didn't cause any obvious misbehaviour, because we
ended up putting the non-present PTE into the guest's partition-scoped
page tables, leading immediately to another hypervisor data/instruction
storage interrupt, which would go through the page fault path again
and fix things up.

Fixes: cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page 
fault handler"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402
Reported-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
This is a reworked version of the patch David Gibson sent recently,
with the fix applied to the radix case as well. The commit message
is mostly stolen from David's patch.

 arch/powerpc/kvm/book3s_64_mmu_hv.c| 9 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 9 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 3aecec8..20b7dce 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -604,18 +604,19 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
 */
local_irq_disable();
ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, );
+   pte = __pte(0);
+   if (ptep)
+   pte = *ptep;
+   local_irq_enable();
/*
 * If the PTE disappeared temporarily due to a THP
 * collapse, just return and let the guest try again.
 */
-   if (!ptep) {
-   local_irq_enable();
+   if (!pte_present(pte)) {
if (page)
put_page(page);
return RESUME_GUEST;
}
-   pte = *ptep;
-   local_irq_enable();
hpa = pte_pfn(pte) << PAGE_SHIFT;
pte_size = PAGE_SIZE;
if (shift)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 134fbc1..7bf94ba 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -815,18 +815,19 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 */
local_irq_disable();
ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, );
+   pte = __pte(0);
+   if (ptep)
+   pte = *ptep;
+   local_irq_enable();
/*
 * If the PTE disappeared temporarily due to a THP
 * collapse, just return and let the guest try again.
 */
-   if (!ptep) {
-   local_irq_enable();
+   if (!pte_present(pte)) {
if (page)
put_page(page);
return RESUME_GUEST;
}
-   pte = *ptep;
-   local_irq_enable();
 
/* If we're logging dirty pages, always map single pages */
large_enable = !(memslot->flags & KVM_MEM_LOG_DIRTY_PAGES);
-- 
2.7.4

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Florian Weimer

* Rich Felker:

> My preference would be that it work just like the i386 AT_SYSINFO
> where you just replace "int $128" with "call *%%gs:16" and the kernel
> provides a stub in the vdso that performs either scv or the old
> mechanism with the same calling convention.

The i386 mechanism has received some criticism because it provides an
effective means to redirect execution flow to anyone who can write to
the TCB.  I am not sure if it makes sense to copy it.

Re: [PATCH v2] powerpc/setup_64: Set cache-line-size based on cache-block-size

2020-04-15 Thread Chris Packham

Hi All,

On Wed, 2020-03-25 at 16:18 +1300, Chris Packham wrote:
> If {i,d}-cache-block-size is set and {i,d}-cache-line-size is not,
> use
> the block-size value for both. Per the devicetree spec cache-line-
> size
> is only needed if it differs from the block size.
> 
> Signed-off-by: Chris Packham 
> ---
> It looks as though the bsizep = lsizep is not required per the spec
> but it's
> probably safer to retain it.
> 
> Changes in v2:
> - Scott pointed out that u-boot should be filling in the cache
> properties
>   (which it does). But it does not specify a cache-line-size because
> it
>   provides a cache-block-size and the spec says you don't have to if
> they are
>   the same. So the error is in the parsing not in the devicetree
> itself.
> 

Ping? This thread went kind of quiet.

>  arch/powerpc/kernel/setup_64.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/setup_64.c
> b/arch/powerpc/kernel/setup_64.c
> index e05e6dd67ae6..dd8a238b54b8 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -516,6 +516,8 @@ static bool __init parse_cache_info(struct
> device_node *np,
>   lsizep = of_get_property(np, propnames[3], NULL);
>   if (bsizep == NULL)
>   bsizep = lsizep;
> + if (lsizep == NULL)
> + lsizep = bsizep;
>   if (lsizep != NULL)
>   lsize = be32_to_cpu(*lsizep);
>   if (bsizep != NULL)

Re: [PATCH v2, 1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

From: Scott Wood 

>> +bool "32-bit kernel"
>
>Why make that user selectable ?
>
>Either a kernel is 64-bit or it is 32-bit. So having PPC64 user 
>selectable is all we need.
>
>And what is the link between this change and the description in the log ?
>
>>  default y if !PPC64
>>  select KASAN_VMALLOC if KASAN && MODULES
>>   
>> @@ -15,6 +15,7 @@ config PPC_BOOK3S_32
>>  bool
>>   
>>   menu "Processor support"
>> +
>
>Why adding this space ?
>
>>   choice
>>  prompt "Processor Type"
>>  depends on PPC32
>> @@ -211,9 +212,9 @@ config PPC_BOOK3E
>>  depends on PPC_BOOK3E_64
>>   
>>   config E500
>> +bool "e500 Support"
>>  select FSL_EMB_PERFMON
>>  select PPC_FSL_BOOK3E
>> -bool
>
>Why make this user-selectable ? This is already selected by the 
>processors requiring it, ie 8500, e5500 and e6500.
>
>Is there any other case where we need E500 ?
>
>And again, what's the link between this change and the description in 
>the log ?
>
>
>>   
>>   config PPC_E500MC
>>  bool "e500mc Support"
>> 
>
>Christophe

Hi, Scott, Christophe!

I find that I did not get the point well of the defferences between
configurability and selectability(maybe words I created) of Kconfig items.

You are right that FSL_85XX_CACHE_SRAM should only be selected by a caller
but never enable it seperately.

Same answer for the comments from Christophe. I will drop this patch in v3.

Thanks,
Wenhu

Re: CVE-2020-11669: Linux kernel 4.10 to 5.1: powerpc: guest can cause DoS on POWER9 KVM hosts

2020-04-15 Thread Paul Mackerras

On Wed, Apr 15, 2020 at 04:03:29PM +0200, Michal Suchánek wrote:
> On Wed, Apr 15, 2020 at 10:52:53PM +1000, Andrew Donnellan wrote:
> > The Linux kernel for powerpc from v4.10 to v5.1 has a bug where the
> > Authority Mask Register (AMR), Authority Mask Override Register (AMOR) and
> > User Authority Mask Override Register (UAMOR) are not correctly saved and
> > restored when the CPU is going into/coming out of idle state.
> > 
> > On POWER9 CPUs, this means that a CPU may return from idle with the AMR
> > value of another thread on the same core.
> > 
> > This allows a trivial Denial of Service attack against KVM hosts, by booting
> > a guest kernel which makes use of the AMR, such as a v5.2 or later kernel
> > with Kernel Userspace Access Prevention (KUAP) enabled.
> > 
> > The guest kernel will set the AMR to prevent userspace access, then the
> > thread will go idle. At a later point, the hardware thread that the guest
> > was using may come out of idle and start executing in the host, without
> > restoring the host AMR value. The host kernel can get caught in a page fault
> > loop, as the AMR is unexpectedly causing memory accesses to fail in the
> > host, and the host is eventually rendered unusable.
> 
> Hello,
> 
> shouldn't the kernel restore the host registers when leaving the guest?

It does.  That's not the bug.

> I recall some code exists for handling the *AM*R when leaving guest. Can
> the KVM guest enter idle without exiting to host?

No, we currently never execute the "stop" instruction in guest context.

The bug occurs when a thread that is in the host goes idle and
executes the stop instruction to go to a power-saving state, while
another thread is executing inside a guest.  Hardware loses the first
thread's AMR while it is stopped, and as it happens, it is possible
for the first thread to wake up with the contents of its AMR equal to
the other thread's AMR.  This can happen even if the first thread has
never executed in the guest.

The kernel needs to save and restore AMR (among other registers)
across the stop instruction because of this hardware behaviour.
We missed the AMR initially, which is what led to this vulnerability.

Paul.

[PATCH] KVM: PPC: Handle non-present PTEs in kvmppc_book3s_hv_page_fault()

2020-04-15 Thread David Gibson

Since cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT
page fault handler", it's been possible in fairly rare circumstances to
load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a
guest on a POWER8 host.

Because that case wasn't checked for, we could misinterpret the non-present
PTE as being a cache-inhibited PTE.  That could mismatch with the
corresponding hash PTE, which would cause the function to fail with -EFAULT
a little further down.  That would propagate up to the KVM_RUN ioctl()
generally causing the KVM userspace (usually qemu) to fall over.

This addresses the problem by catching that case and returning to the guest
instead.

Fixes: cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page 
fault handler"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402
Suggested-by: Paul Mackerras 
Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6404df613ea3..394fca8e630a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -616,6 +616,11 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
}
pte = *ptep;
local_irq_enable();
+   if (!pte_present(pte)) {
+   if (page)
+   put_page(page);
+   return RESUME_GUEST;
+   }
hpa = pte_pfn(pte) << PAGE_SHIFT;
pte_size = PAGE_SIZE;
if (shift)
-- 
2.25.2

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

Excerpts from Rich Felker's message of April 16, 2020 1:03 pm:
> On Thu, Apr 16, 2020 at 12:53:31PM +1000, Nicholas Piggin wrote:
>> > Not to mention the dcache line to access
>> > __hwcap or whatever, and the icache lines to setup access TOC-relative
>> > access to it. (Of course you could put a copy of its value in TLS at a
>> > fixed offset, which would somewhat mitigate both.)
>> > 
>> >> And finally, the HWCAP test can eventually go away in future. A vdso
>> >> call can not.
>> > 
>> > We support nearly arbitrarily old kernels (with limited functionality)
>> > and hardware (with full functionality) and don't intend for that to
>> > change, ever. But indeed glibc might want too eventually drop the
>> > check.
>> 
>> Ah, cool. Any build-time flexibility there?
>> 
>> We may or may not be getting a new ABI that will use instructions not 
>> supported by old processors.
>> 
>> https://sourceware.org/legacy-ml/binutils/2019-05/msg00331.html
>> 
>> Current ABI continues to work of course and be the default for some 
>> time, but building for new one would give some opportunity to drop
>> such support for old procs, at least for glibc.
> 
> What does "new ABI" entail to you? In the terminology I use with musl,
> "new ABI" and "new ISA level" are different things. You can compile
> (explicit -march or compiler default) binaries that won't run on older
> cpus due to use of new insns etc., but we consider it the same ABI if
> you can link code for an older/baseline ISA level with the
> newer-ISA-level object files, i.e. if the interface surface for
> linkage remains compatible. We also try to avoid gratuitous
> proliferation of different ABIs unless there's a strong underlying
> need (like addition of softfloat ABIs for archs that usually have FPU,
> or vice versa).

Yeah it will be a new ABI type that also requires a new ISA level.
As far as I know (and I'm not on the toolchain side) there will be
some call compatibility between the two, so it may be fine to
continue with existing ABI for libc. But it just something that
comes to mind as a build-time cutover where we might be able to
assume particular features.

> In principle the same could be done for kernels except it's a bigger
> silent gotcha (possible ENOSYS in places where it shouldn't be able to
> happen rather than a trapping SIGILL or similar) and there's rarely
> any serious performance or size benefit to dropping support for older
> kernels.

Right, I don't think it'd be a huge problem whatever way we go,
compared with the cost of the system call.

Thanks,
Nick

Re: [PATCH] papr/scm: Add bad memory ranges to nvdimm bad ranges

2020-04-15 Thread Vaibhav Jain

Hi Santosh,

Some review comments below.

Santosh Sivaraj  writes:

> Subscribe to the MCE notification and add the physical address which
> generated a memory error to nvdimm bad range.
>
> Signed-off-by: Santosh Sivaraj 
> ---
>
> This patch depends on "powerpc/mce: Add MCE notification chain" [1].
>
> Unlike the previous series[2], the patch adds badblock registration only for
> pseries scm driver. Handling badblocks for baremetal (powernv) PMEM will be 
> done
> later and if possible get the badblock handling as a common code.
>
> [1] 
> https://lore.kernel.org/linuxppc-dev/20200330071219.12284-1-ganes...@linux.ibm.com/
> [2] 
> https://lore.kernel.org/linuxppc-dev/20190820023030.18232-1-sant...@fossix.org/
>
> arch/powerpc/platforms/pseries/papr_scm.c | 96 ++-
>  1 file changed, 95 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index 0b4467e378e5..5012cbf4606e 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -12,6 +12,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include 
>  
> @@ -39,8 +41,12 @@ struct papr_scm_priv {
>   struct resource res;
>   struct nd_region *region;
>   struct nd_interleave_set nd_set;
> + struct list_head region_list;
>  };
>  
> +LIST_HEAD(papr_nd_regions);
> +DEFINE_MUTEX(papr_ndr_lock);
> +
>  static int drc_pmem_bind(struct papr_scm_priv *p)
>  {
>   unsigned long ret[PLPAR_HCALL_BUFSIZE];
> @@ -372,6 +378,10 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>   dev_info(dev, "Region registered with target node %d and online 
> node %d",
>target_nid, online_nid);
>  
> + mutex_lock(_ndr_lock);
> + list_add_tail(>region_list, _nd_regions);
> + mutex_unlock(_ndr_lock);
> +
>   return 0;
>  
>  err: nvdimm_bus_unregister(p->bus);
> @@ -379,6 +389,68 @@ err: nvdimm_bus_unregister(p->bus);
>   return -ENXIO;
>  }
>  
> +void papr_scm_add_badblock(struct nd_region *region, struct nvdimm_bus *bus,
> +u64 phys_addr)
> +{
> + u64 aligned_addr = ALIGN_DOWN(phys_addr, L1_CACHE_BYTES);
> +
> + if (nvdimm_bus_add_badrange(bus, aligned_addr, L1_CACHE_BYTES)) {
> + pr_err("Bad block registration for 0x%llx failed\n", phys_addr);
> + return;
> + }
> +
> + pr_debug("Add memory range (0x%llx - 0x%llx) as bad range\n",
> +  aligned_addr, aligned_addr + L1_CACHE_BYTES);
> +
> + nvdimm_region_notify(region, NVDIMM_REVALIDATE_POISON);
> +}
> +
> +static int handle_mce_ue(struct notifier_block *nb, unsigned long val,
> +  void *data)
> +{
> + struct machine_check_event *evt = data;
> + struct papr_scm_priv *p;
> + u64 phys_addr;
> + bool found = false;
> +
> + if (evt->error_type != MCE_ERROR_TYPE_UE)
> + return NOTIFY_DONE;
> +
> + if (list_empty(_nd_regions))
> + return NOTIFY_DONE;
> +
> + phys_addr = evt->u.ue_error.physical_address +
> + (evt->u.ue_error.effective_address & ~PAGE_MASK);
Though it seems that you are trying to get the actual physical address
from the page aligned evt->u.ue_error.physical_address, it would be nice
if you could put a comment as to why you are doing this seemingly wierd
math with real and effective addresses here.
> +
> + if (!evt->u.ue_error.physical_address_provided ||
> + !is_zone_device_page(pfn_to_page(phys_addr >> PAGE_SHIFT)))
> + return NOTIFY_DONE;
> +
> + /* mce notifier is called from a process context, so mutex is safe */
> + mutex_lock(_ndr_lock);
> + list_for_each_entry(p, _nd_regions, region_list) {
> + struct resource res = p->res;
> +
> + if (phys_addr >= res.start && phys_addr <= res.end) {
> + found = true;
> + break;
> + }
> + }
> +
> + mutex_unlock(_ndr_lock);
> +
> + if (!found)
> + return NOTIFY_DONE;
> +
> + papr_scm_add_badblock(p->region, p->bus, phys_addr);
I see a possible race between papr_scm_add_badblock() and
papr_scm_remove() as a bad block may be reported just remove a region is
disabled. Would recomment calling papr_scm_bad_block() in context of
papr_ndr_lock.

> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block mce_ue_nb = {
> + .notifier_call = handle_mce_ue
> +};
> +
>  static int papr_scm_probe(struct platform_device *pdev)
>  {
>   struct device_node *dn = pdev->dev.of_node;
> @@ -476,6 +548,10 @@ static int papr_scm_remove(struct platform_device *pdev)
>  {
>   struct papr_scm_priv *p = platform_get_drvdata(pdev);
>  
> + mutex_lock(_ndr_lock);
> + list_del(&(p->region_list));
> + mutex_unlock(_ndr_lock);
> +
>   nvdimm_bus_unregister(p->bus);
>   drc_pmem_unbind(p);
>

Re: [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c

2020-04-15 Thread Frank Rowand

On 4/8/20 10:22 AM, Frank Rowand wrote:
> Hi Michael,
> 
> On 4/7/20 10:13 PM, Michael Ellerman wrote:
>> bugzilla-dae...@bugzilla.kernel.org writes:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=206203
>>>
>>> Erhard F. (erhar...@mailbox.org) changed:
>>>
>>>What|Removed |Added
>>> 
>>>  Attachment #286801|0   |1
>>> is obsolete||
>>>
>>> --- Comment #10 from Erhard F. (erhar...@mailbox.org) ---
>>> Created attachment 288189
>>>   --> https://bugzilla.kernel.org/attachment.cgi?id=288189=edit
>>> kmemleak output (kernel 5.6.2, Talos II)
>>
>> These are all in or triggered by the of unittest code AFAICS.
>> Content of the log reproduced below.
>>
>> Frank/Rob, are these memory leaks expected?
> 
> Thanks for the report.  I'll look at each one.

Only one of the leaks was expected.  I have patches to fix the
unexpected leaks and to remove the expected leak so that the
kmemleak report of it will not have to be checked again.

I expect to send the patch series tomorrow (Thursday).

-Frank

> 
> -Frank
> 
> 
>>
>> cheers
>>
>>
>> unreferenced object 0xc007eb89ca58 (size 192):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 32 bytes):
>> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00  ..!8
>> c0 00 00 07 ec 97 80 08 00 00 00 00 00 00 00 00  
>>   backtrace:
>> [<07b50c76>] .__of_node_dup+0x38/0x1c0
>> [] .of_unittest_changeset+0x13c/0xa20
>> [<925a8013>] .of_unittest+0x1ba0/0x3778
>> [] .do_one_initcall+0x7c/0x420
>> [] .kernel_init_freeable+0x318/0x3d8
>> [<01b957ee>] .kernel_init+0x14/0x168
>> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
>> unreferenced object 0xc007ec978008 (size 8):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 8 bytes):
>> 6e 31 00 6b 6b 6b 6b a5  n1..
>>   backtrace:
>> [] .kstrdup+0x44/0xb0
>> [] .__of_node_dup+0x50/0x1c0
>> [] .of_unittest_changeset+0x13c/0xa20
>> [<925a8013>] .of_unittest+0x1ba0/0x3778
>> [] .do_one_initcall+0x7c/0x420
>> [] .kernel_init_freeable+0x318/0x3d8
>> [<01b957ee>] .kernel_init+0x14/0x168
>> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
>> unreferenced object 0xc007eb89e318 (size 192):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 32 bytes):
>> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00  ..!8
>> c0 00 00 07 ec 97 ab 08 00 00 00 00 00 00 00 00  
>>   backtrace:
>> [<07b50c76>] .__of_node_dup+0x38/0x1c0
>> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20
>> [<925a8013>] .of_unittest+0x1ba0/0x3778
>> [] .do_one_initcall+0x7c/0x420
>> [] .kernel_init_freeable+0x318/0x3d8
>> [<01b957ee>] .kernel_init+0x14/0x168
>> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
>> unreferenced object 0xc007ec97ab08 (size 8):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 8 bytes):
>> 6e 32 00 6b 6b 6b 6b a5  n2..
>>   backtrace:
>> [] .kstrdup+0x44/0xb0
>> [] .__of_node_dup+0x50/0x1c0
>> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20
>> [<925a8013>] .of_unittest+0x1ba0/0x3778
>> [] .do_one_initcall+0x7c/0x420
>> [] .kernel_init_freeable+0x318/0x3d8
>> [<01b957ee>] .kernel_init+0x14/0x168
>> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
>> unreferenced object 0xc007eb89e528 (size 192):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 32 bytes):
>> c0 00 00 07 ec 97 bd d8 00 00 00 00 00 00 00 00  
>> c0 00 00 07 ec 97 b3 18 00 00 00 00 00 00 00 00  
>>   backtrace:
>> [<07b50c76>] .__of_node_dup+0x38/0x1c0
>> [] .of_unittest_changeset+0x1ec/0xa20
>> [<925a8013>] .of_unittest+0x1ba0/0x3778
>> [] .do_one_initcall+0x7c/0x420
>> [] .kernel_init_freeable+0x318/0x3d8
>> [<01b957ee>] .kernel_init+0x14/0x168
>> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
>> unreferenced object 0xc007ec97b318 (size 8):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 8 bytes):
>> 6e 32 31 00 6b 6b 6b a5  n21.kkk.
>>   backtrace:
>> [] .kstrdup+0x44/0xb0
>>

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

On Thu, Apr 16, 2020 at 12:53:31PM +1000, Nicholas Piggin wrote:
> > Not to mention the dcache line to access
> > __hwcap or whatever, and the icache lines to setup access TOC-relative
> > access to it. (Of course you could put a copy of its value in TLS at a
> > fixed offset, which would somewhat mitigate both.)
> > 
> >> And finally, the HWCAP test can eventually go away in future. A vdso
> >> call can not.
> > 
> > We support nearly arbitrarily old kernels (with limited functionality)
> > and hardware (with full functionality) and don't intend for that to
> > change, ever. But indeed glibc might want too eventually drop the
> > check.
> 
> Ah, cool. Any build-time flexibility there?
> 
> We may or may not be getting a new ABI that will use instructions not 
> supported by old processors.
> 
> https://sourceware.org/legacy-ml/binutils/2019-05/msg00331.html
> 
> Current ABI continues to work of course and be the default for some 
> time, but building for new one would give some opportunity to drop
> such support for old procs, at least for glibc.

What does "new ABI" entail to you? In the terminology I use with musl,
"new ABI" and "new ISA level" are different things. You can compile
(explicit -march or compiler default) binaries that won't run on older
cpus due to use of new insns etc., but we consider it the same ABI if
you can link code for an older/baseline ISA level with the
newer-ISA-level object files, i.e. if the interface surface for
linkage remains compatible. We also try to avoid gratuitous
proliferation of different ABIs unless there's a strong underlying
need (like addition of softfloat ABIs for archs that usually have FPU,
or vice versa).

In principle the same could be done for kernels except it's a bigger
silent gotcha (possible ENOSYS in places where it shouldn't be able to
happen rather than a trapping SIGILL or similar) and there's rarely
any serious performance or size benefit to dropping support for older
kernels.

Rich

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

Excerpts from Rich Felker's message of April 16, 2020 12:35 pm:
> On Thu, Apr 16, 2020 at 12:24:16PM +1000, Nicholas Piggin wrote:
>> >> > Likewise, it's not useful to have different error return mechanisms
>> >> > because the caller just has to branch to support both (or the
>> >> > kernel-provided stub just has to emulate one for it; that could work
>> >> > if you really want to change the bad existing convention).
>> >> > 
>> >> > Thoughts?
>> >> 
>> >> The existing convention has to change somewhat because of the clobbers,
>> >> so I thought we could change the error return at the same time. I'm
>> >> open to not changing it and using CR0[SO], but others liked the idea.
>> >> Pro: it matches sc and vsyscall. Con: it's different from other common
>> >> archs. Performnce-wise it would really be a wash -- cost of conditional
>> >> branch is not the cmp but the mispredict.
>> > 
>> > If you do the branch on hwcap at each syscall, then you significantly
>> > increase code size of every syscall point, likely turning a bunch of
>> > trivial functions that didn't need stack frames into ones that do. You
>> > also potentially make them need a TOC pointer. Making them all just do
>> > an indirect call unconditionally (with pointer in TLS like i386?) is a
>> > lot more efficient in code size and at least as good for performance.
>> 
>> I disagree. Doing the long vdso indirect call *necessarily* requires
>> touching a new icache line, and even a new TLB entry. Indirect branches
> 
> The increase in number of icache lines from the branch at every
> syscall point is far greater than the use of a single extra icache
> line shared by all syscalls.

That's true, I was thinking of a single function that does the test and 
calls syscalls, which might be the fair comparison.

> Not to mention the dcache line to access
> __hwcap or whatever, and the icache lines to setup access TOC-relative
> access to it. (Of course you could put a copy of its value in TLS at a
> fixed offset, which would somewhat mitigate both.)
> 
>> And finally, the HWCAP test can eventually go away in future. A vdso
>> call can not.
> 
> We support nearly arbitrarily old kernels (with limited functionality)
> and hardware (with full functionality) and don't intend for that to
> change, ever. But indeed glibc might want too eventually drop the
> check.

Ah, cool. Any build-time flexibility there?

We may or may not be getting a new ABI that will use instructions not 
supported by old processors.

https://sourceware.org/legacy-ml/binutils/2019-05/msg00331.html

Current ABI continues to work of course and be the default for some 
time, but building for new one would give some opportunity to drop
such support for old procs, at least for glibc.

> 
>> If you really want to select with an indirect branch rather than
>> direct conditional, you can do that all within the library.
> 
> OK. It's a little bit more work if that's not the interface the kernel
> will give us, but it's no big deal.

Okay.

Thanks,
Nick

Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB

2020-04-15 Thread Oliver O'Halloran

On Thu, Apr 16, 2020 at 12:34 PM Oliver O'Halloran  wrote:
>
> On Thu, Apr 16, 2020 at 11:27 AM Alexey Kardashevskiy  wrote:
> >
> > Anyone? Is it totally useless or wrong approach? Thanks,
>
> I wouldn't say it's either, but I still hate it.
>
> The 4GB mode being per-PHB makes it difficult to use unless we force
> that mode on 100% of the time which I'd prefer not to do. Ideally
> devices that actually support 64bit addressing (which is most of them)
> should be able to use no-translate mode when possible since a) It's
> faster, and b) It frees up room in the TCE cache devices that actually
> need them. I know you've done some testing with 100G NICs and found
> the overhead was fine, but IMO that's a bad test since it's pretty
> much the best-case scenario since all the devices on the PHB are in
> the same PE. The PHB's TCE cache only hits when the TCE matches the
> DMA bus address and the PE number for the device so in a multi-PE
> environment there's a lot of potential for TCE cache trashing. If
> there was one or two PEs under that PHB it's probably not going to
> matter, but if you have an NVMe rack with 20 drives it starts to look
> a bit ugly.
>
> That all said, it might be worth doing this anyway since we probably
> want the software infrastructure in place to take advantage of it.
> Maybe expand the command line parameters to allow it to be enabled on
> a per-PHB basis rather than globally.

Since we're on the topic

I've been thinking the real issue we have is that we're trying to pick
an "optimal" IOMMU config at a point where we don't have enough
information to work out what's actually optimal. The IOMMU config is
done on a per-PE basis, but since PEs may contain devices with
different DMA masks (looking at you wierd AMD audio function) we're
always going to have to pick something conservative as the default
config for TVE#0 (64k, no bypass mapping) since the driver will tell
us what the device actually supports long after the IOMMU configuation
is done. What we really want is to be able to have separate IOMMU
contexts for each device, or at the very least a separate context for
the crippled devices.

We could allow a per-device IOMMU context by extending the Master /
Slave PE thing to cover DMA in addition to MMIO. Right now we only use
slave PEs when a device's MMIO BARs extend over multiple m64 segments.
When that happens an MMIO error causes the PHB to freezes the PE
corresponding to one of those segments, but not any of the others. To
present a single "PE" to the EEH core we check the freeze status of
each of the slave PEs when the EEH core does a PE status check and if
any of them are frozen, we freeze the rest of them too. When a driver
sets a limited DMA mask we could move that device to a seperate slave
PE so that it has it's own IOMMU context taylored to its DMA
addressing limits.

Thoughts?

Oliver

Re: [PATCH v2 4/4] mm/vmalloc: Hugepage vmalloc mappings

Excerpts from Will Deacon's message of April 15, 2020 8:47 pm:
> Hi Nick,
> 
> On Mon, Apr 13, 2020 at 10:53:03PM +1000, Nicholas Piggin wrote:
>> For platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmap mappings,
>> have vmalloc attempt to allocate PMD-sized pages first, before falling back
>> to small pages. Allocations which use something other than PAGE_KERNEL
>> protections are not permitted to use huge pages yet, not all callers expect
>> this (e.g., module allocations vs strict module rwx).
>> 
>> This gives a 6x reduction in dTLB misses for a `git diff` (of linux), from
>> 45600 to 6500 and a 2.2% reduction in cycles on a 2-node POWER9.
> 
> I wonder if it's worth extending vmap() to handle higher order pages in
> a similar way? That might be helpful for tracing PMUs such as Arm SPE,
> where the CPU streams tracing data out to a virtually addressed buffer
> (see rb_alloc_aux_page()).

Yeah it becomes pretty trivial to do that with VM_HUGE_PAGES after
this patch, I have something to do it but no callers ready yet, if
you have an easy one we can add it.

>> This can result in more internal fragmentation and memory overhead for a
>> given allocation. It can also cause greater NUMA unbalance on hashdist
>> allocations.
>> 
>> There may be other callers that expect small pages under vmalloc but use
>> PAGE_KERNEL, I'm not sure if it's feasible to catch them all. An
>> alternative would be a new function or flag which enables large mappings,
>> and use that in callers.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  include/linux/vmalloc.h |   2 +
>>  mm/vmalloc.c| 135 +---
>>  2 files changed, 102 insertions(+), 35 deletions(-)
>> 
>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 291313a7e663..853b82eac192 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -24,6 +24,7 @@ struct notifier_block; /* in notifier.h */
>>  #define VM_UNINITIALIZED0x0020  /* vm_struct is not fully 
>> initialized */
>>  #define VM_NO_GUARD 0x0040  /* don't add guard page */
>>  #define VM_KASAN0x0080  /* has allocated kasan shadow 
>> memory */
>> +#define VM_HUGE_PAGES   0x0100  /* may use huge pages */
> 
> Please can you add a check for this in the arm64 change_memory_common()
> code? Other architectures might need something similar, but we need to
> forbid changing memory attributes for portions of the huge page.

Yeah good idea, I can look about adding some more checks.

> 
> In general, I'm a bit wary of software table walkers tripping over this.
> For example, I don't think apply_to_existing_page_range() can handle
> huge mappings at all, but the one user (KASAN) only ever uses page mappings
> so it's ok there.

Right, I have something to warn for apply to page range (and looking
at adding support for bigger pages). It doesn't even have a test and
warn at the moment which isn't good practice IMO so we should add one
even without huge vmap.

> 
>> @@ -2325,9 +2356,11 @@ static struct vm_struct *__get_vm_area_node(unsigned 
>> long size,
>>  if (unlikely(!size))
>>  return NULL;
>>  
>> -if (flags & VM_IOREMAP)
>> -align = 1ul << clamp_t(int, get_count_order_long(size),
>> -   PAGE_SHIFT, IOREMAP_MAX_ORDER);
>> +if (flags & VM_IOREMAP) {
>> +align = max(align,
>> +1ul << clamp_t(int, get_count_order_long(size),
>> +   PAGE_SHIFT, IOREMAP_MAX_ORDER));
>> +}
> 
> 
> I don't follow this part. Please could you explain why you're potentially
> aligning above IOREMAP_MAX_ORDER? It doesn't seem to follow from the rest
> of the patch.

Trying to remember. If the caller asks for a particular alignment we 
shouldn't reduce it. Should put it in another patch.

Thanks,
Nick

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

On Thu, Apr 16, 2020 at 12:24:16PM +1000, Nicholas Piggin wrote:
> >> > Likewise, it's not useful to have different error return mechanisms
> >> > because the caller just has to branch to support both (or the
> >> > kernel-provided stub just has to emulate one for it; that could work
> >> > if you really want to change the bad existing convention).
> >> > 
> >> > Thoughts?
> >> 
> >> The existing convention has to change somewhat because of the clobbers,
> >> so I thought we could change the error return at the same time. I'm
> >> open to not changing it and using CR0[SO], but others liked the idea.
> >> Pro: it matches sc and vsyscall. Con: it's different from other common
> >> archs. Performnce-wise it would really be a wash -- cost of conditional
> >> branch is not the cmp but the mispredict.
> > 
> > If you do the branch on hwcap at each syscall, then you significantly
> > increase code size of every syscall point, likely turning a bunch of
> > trivial functions that didn't need stack frames into ones that do. You
> > also potentially make them need a TOC pointer. Making them all just do
> > an indirect call unconditionally (with pointer in TLS like i386?) is a
> > lot more efficient in code size and at least as good for performance.
> 
> I disagree. Doing the long vdso indirect call *necessarily* requires
> touching a new icache line, and even a new TLB entry. Indirect branches

The increase in number of icache lines from the branch at every
syscall point is far greater than the use of a single extra icache
line shared by all syscalls. Not to mention the dcache line to access
__hwcap or whatever, and the icache lines to setup access TOC-relative
access to it. (Of course you could put a copy of its value in TLS at a
fixed offset, which would somewhat mitigate both.)

> And finally, the HWCAP test can eventually go away in future. A vdso
> call can not.

We support nearly arbitrarily old kernels (with limited functionality)
and hardware (with full functionality) and don't intend for that to
change, ever. But indeed glibc might want too eventually drop the
check.

> If you really want to select with an indirect branch rather than
> direct conditional, you can do that all within the library.

OK. It's a little bit more work if that's not the interface the kernel
will give us, but it's no big deal.

Rich

Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB

2020-04-15 Thread Oliver O'Halloran

On Thu, Apr 16, 2020 at 11:27 AM Alexey Kardashevskiy  wrote:
>
> Anyone? Is it totally useless or wrong approach? Thanks,

I wouldn't say it's either, but I still hate it.

The 4GB mode being per-PHB makes it difficult to use unless we force
that mode on 100% of the time which I'd prefer not to do. Ideally
devices that actually support 64bit addressing (which is most of them)
should be able to use no-translate mode when possible since a) It's
faster, and b) It frees up room in the TCE cache devices that actually
need them. I know you've done some testing with 100G NICs and found
the overhead was fine, but IMO that's a bad test since it's pretty
much the best-case scenario since all the devices on the PHB are in
the same PE. The PHB's TCE cache only hits when the TCE matches the
DMA bus address and the PE number for the device so in a multi-PE
environment there's a lot of potential for TCE cache trashing. If
there was one or two PEs under that PHB it's probably not going to
matter, but if you have an NVMe rack with 20 drives it starts to look
a bit ugly.

That all said, it might be worth doing this anyway since we probably
want the software infrastructure in place to take advantage of it.
Maybe expand the command line parameters to allow it to be enabled on
a per-PHB basis rather than globally.

Oliver

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

Excerpts from Rich Felker's message of April 16, 2020 10:48 am:
> On Thu, Apr 16, 2020 at 10:16:54AM +1000, Nicholas Piggin wrote:
>> Excerpts from Rich Felker's message of April 16, 2020 8:55 am:
>> > On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote:
>> >> I would like to enable Linux support for the powerpc 'scv' instruction,
>> >> as a faster system call instruction.
>> >> 
>> >> This requires two things to be defined: Firstly a way to advertise to 
>> >> userspace that kernel supports scv, and a way to allocate and advertise
>> >> support for individual scv vectors. Secondly, a calling convention ABI
>> >> for this new instruction.
>> >> 
>> >> Thanks to those who commented last time, since then I have removed my
>> >> answered questions and unpopular alternatives but you can find them
>> >> here
>> >> 
>> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html
>> >> 
>> >> Let me try one more with a wider cc list, and then we'll get something
>> >> merged. Any questions or counter-opinions are welcome.
>> >> 
>> >> System Call Vectored (scv) ABI
>> >> ==
>> >> 
>> >> The scv instruction is introduced with POWER9 / ISA3, it comes with an
>> >> rfscv counter-part. The benefit of these instructions is performance
>> >> (trading slower SRR0/1 with faster LR/CTR registers, and entering the
>> >> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR 
>> >> updates. The scv instruction has 128 interrupt entry points (not enough 
>> >> to cover the Linux system call space).
>> >> 
>> >> The proposal is to assign scv numbers very conservatively and allocate 
>> >> them as individual HWCAP features as we add support for more. The zero 
>> >> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'.
>> >> 
>> >> Advertisement
>> >> 
>> >> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a
>> >> SIGILL in current environments. Linux has defined a HWCAP2 bit 
>> >> PPC_FEATURE2_SCV for SCV support, but does not set it.
>> >> 
>> >> When scv instruction support and the scv 0 vector for system calls are 
>> >> added, PPC_FEATURE2_SCV will indicate support for these. Other vectors 
>> >> should not be used without future HWCAP bits indicating support, which is
>> >> how we will allocate them. (Should unallocated ones generate SIGILL, or
>> >> return -ENOSYS in r3?)
>> >> 
>> >> Calling convention
>> >> 
>> >> The proposal is for scv 0 to provide the standard Linux system call ABI 
>> >> with the following differences from sc convention[1]:
>> >> 
>> >> - LR is to be volatile across scv calls. This is necessary because the 
>> >>   scv instruction clobbers LR. From previous discussion, this should be 
>> >>   possible to deal with in GCC clobbers and CFI.
>> >> 
>> >> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>> >>   kernel system call exit to avoid restoring the CR register (although 
>> >>   we probably still would anyway to avoid information leak).
>> >> 
>> >> - Error handling: I think the consensus has been to move to using negative
>> >>   return value in r3 rather than CR0[SO]=1 to indicate error, which 
>> >> matches
>> >>   most other architectures and is closer to a function call.
>> >> 
>> >> The number of scratch registers (r9-r12) at kernel entry seems 
>> >> sufficient that we don't have any costly spilling, patch is here[2].  
>> >> 
>> >> [1] 
>> >> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
>> >> [2] 
>> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840..html
>> > 
>> > My preference would be that it work just like the i386 AT_SYSINFO
>> > where you just replace "int $128" with "call *%%gs:16" and the kernel
>> > provides a stub in the vdso that performs either scv or the old
>> > mechanism with the same calling convention. Then if the kernel doesn't
>> > provide it (because the kernel is too old) libc would have to provide
>> > its own stub that uses the legacy method and matches the calling
>> > convention of the one the kernel is expected to provide.
>> 
>> I'm not sure if that's necessary. That's done on x86-32 because they
>> select different sequences to use based on the CPU running and if the host
>> kernel is 32 or 64 bit. Sure they could in theory have a bunch of HWCAP
>> bits and select the right sequence in libc as well I suppose.
> 
> It's not just a HWCAP. It's a contract between the kernel and
> userspace to support a particular calling convention that's not
> exposed except as the public entry point the kernel exports via
> AT_SYSINFO.

Right.

>> > Note that any libc that actually makes use of the new functionality is
>> > not going to be able to make clobbers conditional on support for it;
>> > branching around different clobbers is going to defeat any gains vs
>> > always just treating anything clobbered by either method as clobbered.
>> 
>> Well it would have to test HWCAP and

Re: [PATCH 01/34] docs: filesystems: fix references for doc files there

2020-04-15 Thread Joseph Qi




On 2020/4/15 22:32, Mauro Carvalho Chehab wrote:
> Several files there were renamed to ReST. Fix the broken
> references.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/ABI/stable/sysfs-devices-node   | 2 +-
>  Documentation/ABI/testing/procfs-smaps_rollup | 2 +-
>  Documentation/admin-guide/cpu-load.rst| 2 +-
>  Documentation/admin-guide/nfs/nfsroot.rst | 2 +-
>  Documentation/driver-api/driver-model/device.rst  | 2 +-
>  Documentation/driver-api/driver-model/overview.rst| 2 +-
>  Documentation/filesystems/dax.txt | 2 +-
>  Documentation/filesystems/dnotify.txt | 2 +-
>  Documentation/filesystems/ramfs-rootfs-initramfs.rst  | 2 +-
>  Documentation/powerpc/firmware-assisted-dump.rst  | 2 +-
>  Documentation/process/adding-syscalls.rst | 2 +-
>  .../translations/it_IT/process/adding-syscalls.rst| 2 +-
>  Documentation/translations/zh_CN/filesystems/sysfs.txt| 6 +++---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   | 2 +-
>  fs/Kconfig| 2 +-
>  fs/Kconfig.binfmt | 2 +-
>  fs/adfs/Kconfig   | 2 +-
>  fs/affs/Kconfig   | 2 +-
>  fs/afs/Kconfig| 6 +++---
>  fs/bfs/Kconfig| 2 +-
>  fs/cramfs/Kconfig | 2 +-
>  fs/ecryptfs/Kconfig   | 2 +-
>  fs/fat/Kconfig| 8 
>  fs/fuse/Kconfig   | 2 +-
>  fs/fuse/dev.c | 2 +-
>  fs/hfs/Kconfig| 2 +-
>  fs/hpfs/Kconfig   | 2 +-
>  fs/isofs/Kconfig  | 2 +-
>  fs/namespace.c| 2 +-
>  fs/notify/inotify/Kconfig | 2 +-
>  fs/ntfs/Kconfig   | 2 +-
>  fs/ocfs2/Kconfig  | 2 +-

For ocfs2 part,
Acked-by: Joseph Qi 

>  fs/overlayfs/Kconfig  | 6 +++---
>  fs/proc/Kconfig   | 4 ++--
>  fs/romfs/Kconfig  | 2 +-
>  fs/sysfs/dir.c| 2 +-
>  fs/sysfs/file.c   | 2 +-
>  fs/sysfs/mount.c  | 2 +-
>  fs/sysfs/symlink.c| 2 +-
>  fs/sysv/Kconfig   | 2 +-
>  fs/udf/Kconfig| 2 +-
>  include/linux/relay.h | 2 +-
>  include/linux/sysfs.h | 2 +-
>  kernel/relay.c| 2 +-
>  44 files changed, 54 insertions(+), 54 deletions(-)
> 
> diff --git a/Documentation/ABI/stable/sysfs-devices-node 
> b/Documentation/ABI/stable/sysfs-devices-node
> index df8413cf1468..484fc04bcc25 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -54,7 +54,7 @@ Date:   October 2002
>  Contact: Linux Memory Management list 
>  Description:
>   Provides information about the node's distribution and memory
> - utilization. Similar to /proc/meminfo, see 
> Documentation/filesystems/proc.txt
> + utilization. Similar to /proc/meminfo, see 
> Documentation/filesystems/proc.rst
>  
>  What:/sys/devices/system/node/nodeX/numastat
>  Date:October 2002
> diff --git a/Documentation/ABI/testing/procfs-smaps_rollup 
> b/Documentation/ABI/testing/procfs-smaps_rollup
> index 274df44d8b1b..046978193368 100644
> --- a/Documentation/ABI/testing/procfs-smaps_rollup
> +++ b/Documentation/ABI/testing/procfs-smaps_rollup
> @@ -11,7 +11,7 @@ Description:
>   Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
>   are not present in /proc/pid/smaps.  These fields represent
>   the sum of the Pss field of each type (anon, file, shmem).
> - For more details, see Documentation/filesystems/proc.txt
> + For more details, see Documentation/filesystems/proc.rst
>   and the procfs man page.
>  
>   Typical output looks like this:
> diff --git a/Documentation/admin-guide/cpu-load.rst 
> b/Documentation/admin-guide/cpu-load.rst
> index 2d01ce43d2a2..ebdecf864080 100644
> --- a/Documentation/admin-guide/cpu-load.rst
> +++

Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB

2020-04-15 Thread Alexey Kardashevskiy

Anyone? Is it totally useless or wrong approach? Thanks,


On 08/04/2020 19:43, Alexey Kardashevskiy wrote:
> 
> 
> On 23/03/2020 18:53, Alexey Kardashevskiy wrote:
>> Here is an attempt to support bigger DMA space for devices
>> supporting DMA masks less than 59 bits (GPUs come into mind
>> first). POWER9 PHBs have an option to map 2 windows at 0
>> and select a windows based on DMA address being below or above
>> 4GB.
>>
>> This adds the "iommu=iommu_bypass" kernel parameter and
>> supports VFIO+pseries machine - current this requires telling
>> upstream+unmodified QEMU about this via
>> -global spapr-pci-host-bridge.dma64_win_addr=0x1
>> or per-phb property. 4/4 advertises the new option but
>> there is no automation around it in QEMU (should it be?).
>>
>> For now it is either 1<<59 or 4GB mode; dynamic switching is
>> not supported (could be via sysfs).
>>
>> This is a rebased version of
>> https://lore.kernel.org/kvm/20191202015953.127902-1-...@ozlabs.ru/
>>
>> The main change since v1 is that now it is 7 patches with
>> clearer separation of steps.
>>
>>
>> This is based on 6c90b86a745a "Merge tag 'mmc-v5.6-rc6' of 
>> git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc"
>>
>> Please comment. Thanks.
> 
> Ping?
> 
> 
>>
>>
>>
>> Alexey Kardashevskiy (7):
>>   powerpc/powernv/ioda: Move TCE bypass base to PE
>>   powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>>   powerpc/powernv/ioda: Allow smaller TCE table levels
>>   powerpc/powernv/phb4: Use IOMMU instead of bypassing
>>   powerpc/iommu: Add a window number to
>> iommu_table_group_ops::get_table_size
>>   powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>>   vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
>>
>>  arch/powerpc/include/asm/iommu.h  |   3 +
>>  arch/powerpc/include/asm/opal-api.h   |   9 +-
>>  arch/powerpc/include/asm/opal.h   |   2 +
>>  arch/powerpc/platforms/powernv/pci.h  |   4 +-
>>  include/uapi/linux/vfio.h |   2 +
>>  arch/powerpc/platforms/powernv/npu-dma.c  |   1 +
>>  arch/powerpc/platforms/powernv/opal-call.c|   2 +
>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 234 ++
>>  drivers/vfio/vfio_iommu_spapr_tce.c   |  17 +-
>>  10 files changed, 213 insertions(+), 65 deletions(-)
>>
> 

-- 
Alexey

Re: [PATCH 1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

2020-04-15 Thread kbuild test robot

Hi Wang,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on char-misc/char-misc-testing staging/staging-testing 
v5.7-rc1 next-20200415]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Wang-Wenhu/drivers-uio-new-driver-uio_fsl_85xx_cache_sram/20200416-040633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All error/warnings (new ones prefixed by >>):

   WARNING: unmet direct dependencies detected for ARCH_32BIT_OFF_T
   Depends on !64BIT
   Selected by
   - PPC && PPC32
   In file included from include/linux/atomic-fallback.h:1185,
   from include/linux/atomic.h:74,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
>> include/asm-generic/atomic64.h:14:3: error: conflicting types for 
>> 'atomic64_t'
   14 | } atomic64_t;
   | ^~
   In file included from include/linux/page-flags.h:9,
   from kernel/bounds.c:10:
   include/linux/types.h:178:3: note: previous declaration of 'atomic64_t' was 
here
   178 | } atomic64_t;
   | ^~
   In file included from include/linux/atomic-fallback.h:1185,
   from include/linux/atomic.h:74,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
>> include/asm-generic/atomic64.h:18:12: error: conflicting types for 
>> 'atomic64_read'
   18 | extern s64 atomic64_read(const atomic64_t
   | ^
   In file included from include/linux/atomic.h:7,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
   arch/powerpc/include/asm/atomic.h:300:23: note: previous definition of 
'atomic64_read' was here
   300 | static __inline__ s64 atomic64_read(const atomic64_t
   | ^
   In file included from include/linux/atomic-fallback.h:1185,
   from include/linux/atomic.h:74,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
>> include/asm-generic/atomic64.h:19:13: error: conflicting types for 
>> 'atomic64_set'
   19 | extern void atomic64_set(atomic64_t s64 i);
   | ^~~~
   In file included from include/linux/atomic.h:7,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
   arch/powerpc/include/asm/atomic.h:309:24: note: previous definition of 
'atomic64_set' was here
   309 | static __inline__ void atomic64_set(atomic64_t s64 i)
   | ^~~~
   In file included from include/linux/atomic-fallback.h:1185,
   from include/linux/atomic.h:74,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
>> include/asm-generic/atomic64.h:32: warning: "ATOMIC64_OPS" redefined
   32 | #define ATOMIC64_OPS(op) ATOMIC64_OP(op) ATOMIC64_OP_RETURN(op) 
ATOMIC64_FETCH_OP(op)
   |
   In file included from include/linux/atomic.h:7,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
   arch/powerpc/include/asm/atomic.h:380: note: this is the location of the 
previous definition
   380 | #define ATOMIC64_OPS(op, asm_op) |
   In file included from include/linux/atomic-fallback.h:1185,
   from include/linux/atomic.h:74,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
>> include/asm-generic/atomic64.h:24:14: error: conflicting types for 
>> 'atomic64_add'
   24 | extern void atomic64_##op(s64 a, atomic64_t
   | ^
>> include/asm-generic/atomic64.h:32:26: note: in expansion of macro 
>> 'ATOMIC64_OP'
   32 | #define ATOMIC64_OPS(op) ATOMIC64_OP(op) ATOMIC64_OP_RETURN(op) 
ATOMIC64_FETCH_OP(op)
   | ^~~
>> include/asm-generic/atomic64.h:34:1: note: in expansion of macro 
>> 'ATOMIC64_OPS'
   34 | ATOMIC

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

On Thu, Apr 16, 2020 at 10:16:54AM +1000, Nicholas Piggin wrote:
> Excerpts from Rich Felker's message of April 16, 2020 8:55 am:
> > On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote:
> >> I would like to enable Linux support for the powerpc 'scv' instruction,
> >> as a faster system call instruction.
> >> 
> >> This requires two things to be defined: Firstly a way to advertise to 
> >> userspace that kernel supports scv, and a way to allocate and advertise
> >> support for individual scv vectors. Secondly, a calling convention ABI
> >> for this new instruction.
> >> 
> >> Thanks to those who commented last time, since then I have removed my
> >> answered questions and unpopular alternatives but you can find them
> >> here
> >> 
> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html
> >> 
> >> Let me try one more with a wider cc list, and then we'll get something
> >> merged. Any questions or counter-opinions are welcome.
> >> 
> >> System Call Vectored (scv) ABI
> >> ==
> >> 
> >> The scv instruction is introduced with POWER9 / ISA3, it comes with an
> >> rfscv counter-part. The benefit of these instructions is performance
> >> (trading slower SRR0/1 with faster LR/CTR registers, and entering the
> >> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR 
> >> updates. The scv instruction has 128 interrupt entry points (not enough 
> >> to cover the Linux system call space).
> >> 
> >> The proposal is to assign scv numbers very conservatively and allocate 
> >> them as individual HWCAP features as we add support for more. The zero 
> >> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'.
> >> 
> >> Advertisement
> >> 
> >> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a
> >> SIGILL in current environments. Linux has defined a HWCAP2 bit 
> >> PPC_FEATURE2_SCV for SCV support, but does not set it.
> >> 
> >> When scv instruction support and the scv 0 vector for system calls are 
> >> added, PPC_FEATURE2_SCV will indicate support for these. Other vectors 
> >> should not be used without future HWCAP bits indicating support, which is
> >> how we will allocate them. (Should unallocated ones generate SIGILL, or
> >> return -ENOSYS in r3?)
> >> 
> >> Calling convention
> >> 
> >> The proposal is for scv 0 to provide the standard Linux system call ABI 
> >> with the following differences from sc convention[1]:
> >> 
> >> - LR is to be volatile across scv calls. This is necessary because the 
> >>   scv instruction clobbers LR. From previous discussion, this should be 
> >>   possible to deal with in GCC clobbers and CFI.
> >> 
> >> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
> >>   kernel system call exit to avoid restoring the CR register (although 
> >>   we probably still would anyway to avoid information leak).
> >> 
> >> - Error handling: I think the consensus has been to move to using negative
> >>   return value in r3 rather than CR0[SO]=1 to indicate error, which matches
> >>   most other architectures and is closer to a function call.
> >> 
> >> The number of scratch registers (r9-r12) at kernel entry seems 
> >> sufficient that we don't have any costly spilling, patch is here[2].  
> >> 
> >> [1] 
> >> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
> >> [2] 
> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840..html
> > 
> > My preference would be that it work just like the i386 AT_SYSINFO
> > where you just replace "int $128" with "call *%%gs:16" and the kernel
> > provides a stub in the vdso that performs either scv or the old
> > mechanism with the same calling convention. Then if the kernel doesn't
> > provide it (because the kernel is too old) libc would have to provide
> > its own stub that uses the legacy method and matches the calling
> > convention of the one the kernel is expected to provide.
> 
> I'm not sure if that's necessary. That's done on x86-32 because they
> select different sequences to use based on the CPU running and if the host
> kernel is 32 or 64 bit. Sure they could in theory have a bunch of HWCAP
> bits and select the right sequence in libc as well I suppose.

It's not just a HWCAP. It's a contract between the kernel and
userspace to support a particular calling convention that's not
exposed except as the public entry point the kernel exports via
AT_SYSINFO.

> > Note that any libc that actually makes use of the new functionality is
> > not going to be able to make clobbers conditional on support for it;
> > branching around different clobbers is going to defeat any gains vs
> > always just treating anything clobbered by either method as clobbered.
> 
> Well it would have to test HWCAP and patch in or branch to two 
> completely different sequences including register save/restores yes.
> You could have the same asm and matching clobbers to put the sequence
> inline

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

Excerpts from Rich Felker's message of April 16, 2020 8:55 am:
> On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote:
>> I would like to enable Linux support for the powerpc 'scv' instruction,
>> as a faster system call instruction.
>> 
>> This requires two things to be defined: Firstly a way to advertise to 
>> userspace that kernel supports scv, and a way to allocate and advertise
>> support for individual scv vectors. Secondly, a calling convention ABI
>> for this new instruction.
>> 
>> Thanks to those who commented last time, since then I have removed my
>> answered questions and unpopular alternatives but you can find them
>> here
>> 
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html
>> 
>> Let me try one more with a wider cc list, and then we'll get something
>> merged. Any questions or counter-opinions are welcome.
>> 
>> System Call Vectored (scv) ABI
>> ==
>> 
>> The scv instruction is introduced with POWER9 / ISA3, it comes with an
>> rfscv counter-part. The benefit of these instructions is performance
>> (trading slower SRR0/1 with faster LR/CTR registers, and entering the
>> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR 
>> updates. The scv instruction has 128 interrupt entry points (not enough 
>> to cover the Linux system call space).
>> 
>> The proposal is to assign scv numbers very conservatively and allocate 
>> them as individual HWCAP features as we add support for more. The zero 
>> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'.
>> 
>> Advertisement
>> 
>> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a
>> SIGILL in current environments. Linux has defined a HWCAP2 bit 
>> PPC_FEATURE2_SCV for SCV support, but does not set it.
>> 
>> When scv instruction support and the scv 0 vector for system calls are 
>> added, PPC_FEATURE2_SCV will indicate support for these. Other vectors 
>> should not be used without future HWCAP bits indicating support, which is
>> how we will allocate them. (Should unallocated ones generate SIGILL, or
>> return -ENOSYS in r3?)
>> 
>> Calling convention
>> 
>> The proposal is for scv 0 to provide the standard Linux system call ABI 
>> with the following differences from sc convention[1]:
>> 
>> - LR is to be volatile across scv calls. This is necessary because the 
>>   scv instruction clobbers LR. From previous discussion, this should be 
>>   possible to deal with in GCC clobbers and CFI.
>> 
>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>>   kernel system call exit to avoid restoring the CR register (although 
>>   we probably still would anyway to avoid information leak).
>> 
>> - Error handling: I think the consensus has been to move to using negative
>>   return value in r3 rather than CR0[SO]=1 to indicate error, which matches
>>   most other architectures and is closer to a function call.
>> 
>> The number of scratch registers (r9-r12) at kernel entry seems 
>> sufficient that we don't have any costly spilling, patch is here[2].  
>> 
>> [1] 
>> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
>> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html
> 
> My preference would be that it work just like the i386 AT_SYSINFO
> where you just replace "int $128" with "call *%%gs:16" and the kernel
> provides a stub in the vdso that performs either scv or the old
> mechanism with the same calling convention. Then if the kernel doesn't
> provide it (because the kernel is too old) libc would have to provide
> its own stub that uses the legacy method and matches the calling
> convention of the one the kernel is expected to provide.

I'm not sure if that's necessary. That's done on x86-32 because they
select different sequences to use based on the CPU running and if the host
kernel is 32 or 64 bit. Sure they could in theory have a bunch of HWCAP
bits and select the right sequence in libc as well I suppose.

> Note that any libc that actually makes use of the new functionality is
> not going to be able to make clobbers conditional on support for it;
> branching around different clobbers is going to defeat any gains vs
> always just treating anything clobbered by either method as clobbered.

Well it would have to test HWCAP and patch in or branch to two 
completely different sequences including register save/restores yes.
You could have the same asm and matching clobbers to put the sequence
inline and then you could patch the one sc/scv instruction I suppose.

A bit of logic to select between them doesn't defeat gains though,
it's about 90 cycle improvement which is a handful of branch mispredicts 
so it really is an improvement. Eventually userspace will stop 
supporting the old variant too.

> Likewise, it's not useful to have different error return mechanisms
> because the caller just has to branch to support both (or the
> kernel-provided

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote:
> I would like to enable Linux support for the powerpc 'scv' instruction,
> as a faster system call instruction.
> 
> This requires two things to be defined: Firstly a way to advertise to 
> userspace that kernel supports scv, and a way to allocate and advertise
> support for individual scv vectors. Secondly, a calling convention ABI
> for this new instruction.
> 
> Thanks to those who commented last time, since then I have removed my
> answered questions and unpopular alternatives but you can find them
> here
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html
> 
> Let me try one more with a wider cc list, and then we'll get something
> merged. Any questions or counter-opinions are welcome.
> 
> System Call Vectored (scv) ABI
> ==
> 
> The scv instruction is introduced with POWER9 / ISA3, it comes with an
> rfscv counter-part. The benefit of these instructions is performance
> (trading slower SRR0/1 with faster LR/CTR registers, and entering the
> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR 
> updates. The scv instruction has 128 interrupt entry points (not enough 
> to cover the Linux system call space).
> 
> The proposal is to assign scv numbers very conservatively and allocate 
> them as individual HWCAP features as we add support for more. The zero 
> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'.
> 
> Advertisement
> 
> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a
> SIGILL in current environments. Linux has defined a HWCAP2 bit 
> PPC_FEATURE2_SCV for SCV support, but does not set it.
> 
> When scv instruction support and the scv 0 vector for system calls are 
> added, PPC_FEATURE2_SCV will indicate support for these. Other vectors 
> should not be used without future HWCAP bits indicating support, which is
> how we will allocate them. (Should unallocated ones generate SIGILL, or
> return -ENOSYS in r3?)
> 
> Calling convention
> 
> The proposal is for scv 0 to provide the standard Linux system call ABI 
> with the following differences from sc convention[1]:
> 
> - LR is to be volatile across scv calls. This is necessary because the 
>   scv instruction clobbers LR. From previous discussion, this should be 
>   possible to deal with in GCC clobbers and CFI.
> 
> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>   kernel system call exit to avoid restoring the CR register (although 
>   we probably still would anyway to avoid information leak).
> 
> - Error handling: I think the consensus has been to move to using negative
>   return value in r3 rather than CR0[SO]=1 to indicate error, which matches
>   most other architectures and is closer to a function call.
> 
> The number of scratch registers (r9-r12) at kernel entry seems 
> sufficient that we don't have any costly spilling, patch is here[2].  
> 
> [1] 
> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html

My preference would be that it work just like the i386 AT_SYSINFO
where you just replace "int $128" with "call *%%gs:16" and the kernel
provides a stub in the vdso that performs either scv or the old
mechanism with the same calling convention. Then if the kernel doesn't
provide it (because the kernel is too old) libc would have to provide
its own stub that uses the legacy method and matches the calling
convention of the one the kernel is expected to provide.

Note that any libc that actually makes use of the new functionality is
not going to be able to make clobbers conditional on support for it;
branching around different clobbers is going to defeat any gains vs
always just treating anything clobbered by either method as clobbered.
Likewise, it's not useful to have different error return mechanisms
because the caller just has to branch to support both (or the
kernel-provided stub just has to emulate one for it; that could work
if you really want to change the bad existing convention).

Thoughts?

Rich

Re: [PATCH v2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-04-15 Thread Segher Boessenkool

Hi!

On Wed, Apr 15, 2020 at 09:25:59AM +, Christophe Leroy wrote:
> +#define __put_user_goto(x, ptr, label) \
> + __put_user_nocheck_goto((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), 
> label)

This line gets too long, can you break it up somehow?

> +#define __put_user_asm_goto(x, addr, label, op)  \
> + asm volatile goto(  \
> + "1: " op "%U1%X1 %0,%1  # put_user\n"   \
> + EX_TABLE(1b, %l2)   \
> + :   \
> + : "r" (x), "m" (*addr)  \
> + :   \
> + : label)

Same "%Un" problem as in the other patch.  You could use "m<>" here,
but maybe just dropping "%Un" is better.

> +#ifdef __powerpc64__
> +#define __put_user_asm2_goto(x, ptr, label)  \
> + __put_user_asm_goto(x, ptr, label, "std")
> +#else /* __powerpc64__ */
> +#define __put_user_asm2_goto(x, addr, label) \
> + asm volatile goto(  \
> + "1: stw%U1%X1 %0, %1\n" \
> + "2: stw%U1%X1 %L0, %L1\n"   \
> + EX_TABLE(1b, %l2)   \
> + EX_TABLE(2b, %l2)   \
> + :   \
> + : "r" (x), "m" (*addr)  \
> + :   \
> + : label)
> +#endif /* __powerpc64__ */

Here, you should drop it for sure.

Rest looks fine.

Reviewed-by: Segher Boessenkool 


Segher

Re: [PATCH] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-04-15 Thread Segher Boessenkool

Hi!

On Wed, Apr 15, 2020 at 09:20:26AM +, Christophe Leroy wrote:
> At the time being, __put_user()/__get_user() and friends only use
> register indirect with immediate index addressing, with the index
> set to 0. Ex:
> 
>   lwz reg1, 0(reg2)

This is called a "D-form" instruction, or sometimes "offset addressing".
Don't talk about an "index", it confuses things, because the *other*
kind is called "indexed" already, also in the ISA docs!  (X-form, aka
indexed addressing, [reg+reg], where D-form does [reg+imm], and both
forms can do [reg]).

> Give the compiler the opportunity to use other adressing modes
> whenever possible, to get more optimised code.

Great :-)

> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -114,7 +114,7 @@ extern long __put_user_bad(void);
>   */
>  #define __put_user_asm(x, addr, err, op) \
>   __asm__ __volatile__(   \
> - "1: " op " %1,0(%2) # put_user\n"   \
> + "1: " op "%U2%X2 %1,%2  # put_user\n"   \
>   "2:\n"  \
>   ".section .fixup,\"ax\"\n"  \
>   "3: li %0,%3\n" \
> @@ -122,7 +122,7 @@ extern long __put_user_bad(void);
>   ".previous\n"   \
>   EX_TABLE(1b, 3b)\
>   : "=r" (err)\
> - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
> + : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))

%Un on an "m" operand doesn't do much: you need to make it "m<>" if you
want pre-modify ("update") insns to be generated.  (You then will want
to make sure that operand is used in a way GCC can understand; since it
is used only once here, that works fine).

> @@ -130,8 +130,8 @@ extern long __put_user_bad(void);
>  #else /* __powerpc64__ */
>  #define __put_user_asm2(x, addr, err)\
>   __asm__ __volatile__(   \
> - "1: stw %1,0(%2)\n" \
> - "2: stw %1+1,4(%2)\n"   \
> + "1: stw%U2%X2 %1,%2\n"  \
> + "2: stw%U2%X2 %L1,%L2\n"\
>   "3:\n"  \
>   ".section .fixup,\"ax\"\n"  \
>   "4: li %0,%3\n" \
> @@ -140,7 +140,7 @@ extern long __put_user_bad(void);
>   EX_TABLE(1b, 4b)\
>   EX_TABLE(2b, 4b)\
>   : "=r" (err)\
> - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
> + : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))

Here, it doesn't work.  You don't want two consecutive update insns in
any case.  Easiest is to just not use "m<>", and then, don't use %Un
(which won't do anything, but it is confusing).

Same for the reads.

Rest looks fine, and update should be good with that fixed as said.

Reviewed-by: Segher Boessenkool 

Segher

Powerpc Linux 'scv' system call ABI proposal take 2

I would like to enable Linux support for the powerpc 'scv' instruction,
as a faster system call instruction.

This requires two things to be defined: Firstly a way to advertise to 
userspace that kernel supports scv, and a way to allocate and advertise
support for individual scv vectors. Secondly, a calling convention ABI
for this new instruction.

Thanks to those who commented last time, since then I have removed my
answered questions and unpopular alternatives but you can find them
here

https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html

Let me try one more with a wider cc list, and then we'll get something
merged. Any questions or counter-opinions are welcome.

System Call Vectored (scv) ABI
==

The scv instruction is introduced with POWER9 / ISA3, it comes with an
rfscv counter-part. The benefit of these instructions is performance
(trading slower SRR0/1 with faster LR/CTR registers, and entering the
kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR 
updates. The scv instruction has 128 interrupt entry points (not enough 
to cover the Linux system call space).

The proposal is to assign scv numbers very conservatively and allocate 
them as individual HWCAP features as we add support for more. The zero 
vector ('scv 0') will be used for normal system calls, equivalent to 'sc'.

Advertisement

Linux has not enabled FSCR[SCV] yet, so the instruction will cause a
SIGILL in current environments. Linux has defined a HWCAP2 bit 
PPC_FEATURE2_SCV for SCV support, but does not set it.

When scv instruction support and the scv 0 vector for system calls are 
added, PPC_FEATURE2_SCV will indicate support for these. Other vectors 
should not be used without future HWCAP bits indicating support, which is
how we will allocate them. (Should unallocated ones generate SIGILL, or
return -ENOSYS in r3?)

Calling convention

The proposal is for scv 0 to provide the standard Linux system call ABI 
with the following differences from sc convention[1]:

- LR is to be volatile across scv calls. This is necessary because the 
  scv instruction clobbers LR. From previous discussion, this should be 
  possible to deal with in GCC clobbers and CFI.

- CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
  kernel system call exit to avoid restoring the CR register (although 
  we probably still would anyway to avoid information leak).

- Error handling: I think the consensus has been to move to using negative
  return value in r3 rather than CR0[SO]=1 to indicate error, which matches
  most other architectures and is closer to a function call.

The number of scratch registers (r9-r12) at kernel entry seems 
sufficient that we don't have any costly spilling, patch is here[2].  

[1] 
https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html

Re: [PATCH v5 0/6] implement KASLR for powerpc/fsl_booke/64

On Mon, 2020-03-30 at 10:20 +0800, Jason Yan wrote:
> This is a try to implement KASLR for Freescale BookE64 which is based on
> my earlier implementation for Freescale BookE32:
> 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718=*
> 
> The implementation for Freescale BookE64 is similar as BookE32. One
> difference is that Freescale BookE64 set up a TLB mapping of 1G during
> booting. Another difference is that ppc64 needs the kernel to be
> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
> it 64K-aligned. This can save some code to creat another TLB map at
> early boot. The disadvantage is that we only have about 1G/64K = 16384
> slots to put the kernel in.
> 
> KERNELBASE
> 
>   64K |--> kernel <--|
>|  |  |
> +--+--+--++--+--+--+--+--+--+--+--+--++--+--+
> |  |  |  ||  |  |  |  |  |  |  |  |  ||  |  |
> +--+--+--++--+--+--+--+--+--+--+--+--++--+--+
> | |1G
> |->   offset<-|
> 
>   kernstart_virt_addr
> 
> I'm not sure if the slot numbers is enough or the design has any
> defects. If you have some better ideas, I would be happy to hear that.
> 
> Thank you all.
> 
> v4->v5:
>   Fix "-Werror=maybe-uninitialized" compile error.
>   Fix typo "similar as" -> "similar to".
> v3->v4:
>   Do not define __kaslr_offset as a fixed symbol. Reference __run_at_load
> and
> __kaslr_offset by symbol instead of magic offsets.
>   Use IS_ENABLED(CONFIG_PPC32) instead of #ifdef CONFIG_PPC32.
>   Change kaslr-booke32 to kaslr-booke in index.rst
>   Switch some instructions to 64-bit.
> v2->v3:
>   Fix build error when KASLR is disabled.
> v1->v2:
>   Add __kaslr_offset for the secondary cpu boot up.
> 
> Jason Yan (6):
>   powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
> kaslr_early_init()
>   powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
>   powerpc/fsl_booke/64: implement KASLR for fsl_booke64
>   powerpc/fsl_booke/64: do not clear the BSS for the second pass
>   powerpc/fsl_booke/64: clear the original kernel if randomized
>   powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
> and add 64bit part
> 
>  Documentation/powerpc/index.rst   |  2 +-
>  .../{kaslr-booke32.rst => kaslr-booke.rst}| 35 ++-
>  arch/powerpc/Kconfig  |  2 +-
>  arch/powerpc/kernel/exceptions-64e.S  | 23 +
>  arch/powerpc/kernel/head_64.S | 13 +++
>  arch/powerpc/kernel/setup_64.c|  3 +
>  arch/powerpc/mm/mmu_decl.h| 23 +++--
>  arch/powerpc/mm/nohash/kaslr_booke.c  | 91 +--
>  8 files changed, 147 insertions(+), 45 deletions(-)
>  rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
> 

Acked-by: Scott Wood 

-Scott

Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

On Wed, 2020-04-15 at 18:52 +0200, Christophe Leroy wrote:
> 
> Le 15/04/2020 à 17:24, Wang Wenhu a écrit :
> > +
> > +   if (uiomem >= >mem[MAX_UIO_MAPS]) {
> 
> I'd prefer
>   if (uiomem - info->mem >= MAX_UIO_MAPS) {
> 
> > +   dev_warn(>dev, "more than %d uio-maps for
> > device.\n",
> > +MAX_UIO_MAPS);
> > +   break;
> > +   }
> > +   }
> > +
> > +   while (uiomem < >mem[MAX_UIO_MAPS]) {
> 
> I'd prefer
> 
>   while (uiomem - info->mem < MAX_UIO_MAPS) {
> 

I wouldn't.  You're turning a simple comparison into a division and a
comparison (if the compiler doesn't optimize it back into the original form),
and making it less clear in the process.

Of course, working with array indices to begin with instead of incrementing a
pointer would be more idiomatic.

> > +   uiomem->size = 0;
> > +   ++uiomem;
> > +   }
> > +
> > +   if (info->mem[0].size == 0) {
> 
> Is there any point in doing all the clearing loop above if it's to bail 
> out here ?
> 
> Wouldn't it be cleaner to do the test above the clearing loop, by just 
> checking whether uiomem is still equal to info->mem ?

There's no point doing the clearing at all, since the array was allocated with
kzalloc().

> > +   dev_err(>dev, "error no valid uio-map configured\n");
> > +   ret = -EINVAL;
> > +   goto err_info_free_internel;
> > +   }
> > +
> > +   info->version = "0.1.0";
> 
> Could you define some DRIVER_VERSION in the top of the file next to 
> DRIVER_NAME instead of hard coding in the middle on a function ?

That's what v1 had, and Greg KH said to remove it.  I'm guessing that he
thought it was the common-but-pointless practice of having the driver print a
version number that never gets updated, rather than something the UIO API
(unfortunately, compared to a feature query interface) expects.  That said,
I'm not sure what the value is of making it a macro since it should only be
used once, that use is self documenting, it isn't tunable, etc.  Though if
this isn't a macro, UIO_NAME also shouldn't be (and if it is made a macro
again, it should be UIO_VERSION, not DRIVER_VERSION).

Does this really need a three-part version scheme?  What's wrong with a
version of "1", to be changed to "2" in the hopefully-unlikely event that the
userspace API changes?  Assuming UIO is used for this at all, which doesn't
seem like a great fit to me.

-Scott

Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

On Wed, 2020-04-15 at 08:24 -0700, Wang Wenhu wrote:
> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
> + {   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
> + {   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
> + {},
> +};

NACK

The device tree describes the hardware, not what driver you want to bind the
hardware to, or how you want to allocate the resources.  And even if defining
nodes for sram allocation were the right way to go, why do you have a separate
compatible for each chip when you're just describing software configuration?

Instead, have module parameters that take the sizes and alignments you'd like
to allocate and expose to userspace.  Better still would be some sort of
dynamic allocation (e.g. open a fd, ioctl to set the requested size/alignment,
if it succeeds you can mmap it, and when the fd is closed the region is
freed).

-Scott

Re: [PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

On Wed, 2020-04-15 at 08:24 -0700, Wang Wenhu wrote:
> Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
> could be configured and used as a piece of SRAM which is hignly
> friendly for some user level application performances.
> 
> Cc: Greg Kroah-Hartman 
> Cc: Christophe Leroy 
> Cc: Scott Wood 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Wang Wenhu 
> ---
> Changes since v1:
>  * None
> ---
>  arch/powerpc/platforms/85xx/Kconfig| 2 +-
>  arch/powerpc/platforms/Kconfig.cputype | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/85xx/Kconfig
> b/arch/powerpc/platforms/85xx/Kconfig
> index fa3d29dcb57e..6debb4f1b9cc 100644
> --- a/arch/powerpc/platforms/85xx/Kconfig
> +++ b/arch/powerpc/platforms/85xx/Kconfig
> @@ -17,7 +17,7 @@ if FSL_SOC_BOOKE
>  if PPC32
>  
>  config FSL_85XX_CACHE_SRAM
> - bool
> + bool "Freescale 85xx Cache-Sram"
>   select PPC_LIB_RHEAP
>   help
> When selected, this option enables cache-sram support

NACK

As discussed before, the driver that uses this API should "select" this
symbol.

-Scott

Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram





Le 15/04/2020 à 17:24, Wang Wenhu a écrit :

A driver for freescale 85xx platforms to access the Cache-Sram form
user level. This is extremely helpful for some user-space applications
that require high performance memory accesses.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
Changes since v1:
  * Addressed comments of Greg K-H
  * Moved kfree(info->name) into uio_info_free_internal()
---
  drivers/uio/Kconfig   |   8 ++
  drivers/uio/Makefile  |   1 +
  drivers/uio/uio_fsl_85xx_cache_sram.c | 182 ++
  3 files changed, 191 insertions(+)
  create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 202ee81cfc2b..afd38ec13de0 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -105,6 +105,14 @@ config UIO_NETX
  To compile this driver as a module, choose M here; the module
  will be called uio_netx.
  
+config UIO_FSL_85XX_CACHE_SRAM

+   tristate "Freescale 85xx Cache-Sram driver"
+   depends on FSL_85XX_CACHE_SRAM


Is there any point having FSL_85XX_CACHE_SRAM without this ?

Should it be the other way round, leave FSL_85XX_CACHE_SRAM unselectable 
by user, and this driver select FSL_85XX_CACHE_SRAM instead of depending 
on it ?



+   help
+ Generic driver for accessing the Cache-Sram form user level. This
+ is extremely helpful for some user-space applications that require
+ high performance memory accesses.
+
  config UIO_FSL_ELBC_GPCM
tristate "eLBC/GPCM driver"
depends on FSL_LBC
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index c285dd2a4539..be2056cffc21 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)+= uio_netx.o
  obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
  obj-$(CONFIG_UIO_MF624) += uio_mf624.o
  obj-$(CONFIG_UIO_FSL_ELBC_GPCM)   += uio_fsl_elbc_gpcm.o
+obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)  += uio_fsl_85xx_cache_sram.o
  obj-$(CONFIG_UIO_HV_GENERIC)  += uio_hv_generic.o
diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c 
b/drivers/uio/uio_fsl_85xx_cache_sram.c
new file mode 100644
index ..fb6903fdaddb
--- /dev/null
+++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
+ * Copyright (C) 2020 Wang Wenhu 
+ * All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_NAME"uio_fsl_85xx_cache_sram"
+#define UIO_NAME   "uio_cache_sram"
+
+static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
+   {   .compatible = "uio,fsl,p2020-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p2010-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1020-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1011-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1013-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1022-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1021-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1012-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1025-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1016-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1024-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1015-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1010-l2-cache-controller",},
+   {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",  },
+   {},
+};
+
+static void uio_info_free_internal(struct uio_info *info)
+{
+   struct uio_mem *uiomem = >mem[0];
+
+   while (uiomem < >mem[MAX_UIO_MAPS]) {
+   if (uiomem->size) {
+   mpc85xx_cache_sram_free(uiomem->internal_addr);
+   kfree(uiomem->name);
+   }
+   uiomem++;
+   }
+
+   kfree(info->name);
+}
+
+static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev)
+{
+   struct device_node *parent = pdev->dev.of_node;
+   struct device_node *node = NULL;
+   struct uio_info *info;
+   struct uio_mem *uiomem;
+   const char *dt_name;
+   u32 mem_size;
+   u32 align;


Align is not used outside of the for loop, it should be declared in the 
loop block.



+   void *virt;


Same for virt


+

Re: [PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable





Le 15/04/2020 à 17:24, Wang Wenhu a écrit :

Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
could be configured and used as a piece of SRAM which is hignly
friendly for some user level application performances.


It looks like following patches are fixing errors generated by selecting 
FSL_85XX_CACHE_SRAM.


So this patch should go after the patches which fixes the errors, ie it 
should be patch 4 in the series.


Christophe

Re: [PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable





Le 15/04/2020 à 17:24, Wang Wenhu a écrit :

Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
could be configured and used as a piece of SRAM which is hignly
friendly for some user level application performances.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
Changes since v1:
  * None
---
  arch/powerpc/platforms/85xx/Kconfig| 2 +-
  arch/powerpc/platforms/Kconfig.cputype | 5 +++--
  2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index fa3d29dcb57e..6debb4f1b9cc 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -17,7 +17,7 @@ if FSL_SOC_BOOKE
  if PPC32
  
  config FSL_85XX_CACHE_SRAM

-   bool
+   bool "Freescale 85xx Cache-Sram"
select PPC_LIB_RHEAP
help
  When selected, this option enables cache-sram support
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c3c1902135c..1921e9a573e8 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -1,6 +1,6 @@
  # SPDX-License-Identifier: GPL-2.0
  config PPC32
-   bool
+   bool "32-bit kernel"


Why make that user selectable ?

Either a kernel is 64-bit or it is 32-bit. So having PPC64 user 
selectable is all we need.


And what is the link between this change and the description in the log ?


default y if !PPC64
select KASAN_VMALLOC if KASAN && MODULES
  
@@ -15,6 +15,7 @@ config PPC_BOOK3S_32

bool
  
  menu "Processor support"

+


Why adding this space ?


  choice
prompt "Processor Type"
depends on PPC32
@@ -211,9 +212,9 @@ config PPC_BOOK3E
depends on PPC_BOOK3E_64
  
  config E500

+   bool "e500 Support"
select FSL_EMB_PERFMON
select PPC_FSL_BOOK3E
-   bool


Why make this user-selectable ? This is already selected by the 
processors requiring it, ie 8500, e5500 and e6500.


Is there any other case where we need E500 ?

And again, what's the link between this change and the description in 
the log ?



  
  config PPC_E500MC

bool "e500mc Support"



Christophe

Re: [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching

2020-04-15 Thread Christopher M Riedl

> On April 15, 2020 3:45 AM Christophe Leroy  wrote:
> 
>  
> Le 15/04/2020 à 07:11, Christopher M Riedl a écrit :
> >> On March 24, 2020 11:25 AM Christophe Leroy  
> >> wrote:
> >>
> >>   
> >> Le 23/03/2020 à 05:52, Christopher M. Riedl a écrit :
> >>> Currently, code patching a STRICT_KERNEL_RWX exposes the temporary
> >>> mappings to other CPUs. These mappings should be kept local to the CPU
> >>> doing the patching. Use the pre-initialized temporary mm and patching
> >>> address for this purpose. Also add a check after patching to ensure the
> >>> patch succeeded.
> >>>
> >>> Based on x86 implementation:
> >>>
> >>> commit b3fd8e83ada0
> >>> ("x86/alternatives: Use temporary mm for text poking")
> >>>
> >>> Signed-off-by: Christopher M. Riedl 
> >>> ---
> >>>arch/powerpc/lib/code-patching.c | 128 ++-
> >>>1 file changed, 57 insertions(+), 71 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/lib/code-patching.c 
> >>> b/arch/powerpc/lib/code-patching.c
> >>> index 18b88ecfc5a8..f156132e8975 100644
> >>> --- a/arch/powerpc/lib/code-patching.c
> >>> +++ b/arch/powerpc/lib/code-patching.c
> >>> @@ -19,6 +19,7 @@
> >>>#include 
> >>>#include 
> >>>#include 
> >>> +#include 
> >>>
> >>>static int __patch_instruction(unsigned int *exec_addr, unsigned int 
> >>> instr,
> >>>  unsigned int *patch_addr)
> >>> @@ -65,99 +66,79 @@ void __init poking_init(void)
> >>>   pte_unmap_unlock(ptep, ptl);
> >>>}
> >>>
> >>> -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> >>> -
> >>> -static int text_area_cpu_up(unsigned int cpu)
> >>> -{
> >>> - struct vm_struct *area;
> >>> -
> >>> - area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> >>> - if (!area) {
> >>> - WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> >>> - cpu);
> >>> - return -1;
> >>> - }
> >>> - this_cpu_write(text_poke_area, area);
> >>> -
> >>> - return 0;
> >>> -}
> >>> -
> >>> -static int text_area_cpu_down(unsigned int cpu)
> >>> -{
> >>> - free_vm_area(this_cpu_read(text_poke_area));
> >>> - return 0;
> >>> -}
> >>> -
> >>> -/*
> >>> - * Run as a late init call. This allows all the boot time patching to be 
> >>> done
> >>> - * simply by patching the code, and then we're called here prior to
> >>> - * mark_rodata_ro(), which happens after all init calls are run. Although
> >>> - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and 
> >>> we judge
> >>> - * it as being preferable to a kernel that will crash later when someone 
> >>> tries
> >>> - * to use patch_instruction().
> >>> - */
> >>> -static int __init setup_text_poke_area(void)
> >>> -{
> >>> - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> >>> - "powerpc/text_poke:online", text_area_cpu_up,
> >>> - text_area_cpu_down));
> >>> -
> >>> - return 0;
> >>> -}
> >>> -late_initcall(setup_text_poke_area);
> >>> +struct patch_mapping {
> >>> + spinlock_t *ptl; /* for protecting pte table */
> >>> + struct temp_mm temp_mm;
> >>> +};
> >>>
> >>>/*
> >>> * This can be called for kernel text or a module.
> >>> */
> >>> -static int map_patch_area(void *addr, unsigned long text_poke_addr)
> >>> +static int map_patch(const void *addr, struct patch_mapping 
> >>> *patch_mapping)
> >>
> >> Why change the name ?
> >>
> > 
> > It's not really an "area" anymore.
> > 
> >>>{
> >>> - unsigned long pfn;
> >>> - int err;
> >>> + struct page *page;
> >>> + pte_t pte, *ptep;
> >>> + pgprot_t pgprot;
> >>>
> >>>   if (is_vmalloc_addr(addr))
> >>> - pfn = vmalloc_to_pfn(addr);
> >>> + page = vmalloc_to_page(addr);
> >>>   else
> >>> - pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> >>> + page = virt_to_page(addr);
> >>>
> >>> - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
> >>> + if (radix_enabled())
> >>> + pgprot = __pgprot(pgprot_val(PAGE_KERNEL));
> >>> + else
> >>> + pgprot = PAGE_SHARED;
> >>
> >> Can you explain the difference between radix and non radix ?
> >>
> >> Why PAGE_KERNEL for a page that is mapped in userspace ?
> >>
> >> Why do you need to do __pgprot(pgprot_val(PAGE_KERNEL)) instead of just
> >> using PAGE_KERNEL ?
> >>
> > 
> > On hash there is a manual check which prevents setting _PAGE_PRIVILEGED for
> > kernel to userspace access in __hash_page - hence we cannot access the 
> > mapping
> > if the page is mapped PAGE_KERNEL on hash. However, I would like to use
> > PAGE_KERNEL here as well and am working on understanding why this check is
> > done in hash and if this can change. On radix this works just fine.
> > 
> > The page is mapped PAGE_KERNEL because the address is technically a 
> > userspace
> > address - but only to keep the mapping local to this CPU doing the patching.
> > PAGE_KERNEL makes it clear both in intent and protection that this is a 
> > kernel
> > mapping.
> > 
> > I think the

Re: [RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching

2020-04-15 Thread Christopher M Riedl

> On April 15, 2020 4:12 AM Christophe Leroy  wrote:
> 
>  
> Le 15/04/2020 à 07:16, Christopher M Riedl a écrit :
> >> On March 26, 2020 9:42 AM Christophe Leroy  wrote:
> >>
> >>   
> >> This patch fixes the RFC series identified below.
> >> It fixes three points:
> >> - Failure with CONFIG_PPC_KUAP
> >> - Failure to write do to lack of DIRTY bit set on the 8xx
> >> - Inadequaly complex WARN post verification
> >>
> >> However, it has an impact on the CPU load. Here is the time
> >> needed on an 8xx to run the ftrace selftests without and
> >> with this series:
> >> - Without CONFIG_STRICT_KERNEL_RWX ==> 38 seconds
> >> - With CONFIG_STRICT_KERNEL_RWX==> 40 seconds
> >> - With CONFIG_STRICT_KERNEL_RWX + this series  ==> 43 seconds
> >>
> >> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
> >> Signed-off-by: Christophe Leroy 
> >> ---
> >>   arch/powerpc/lib/code-patching.c | 5 -
> >>   1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/powerpc/lib/code-patching.c 
> >> b/arch/powerpc/lib/code-patching.c
> >> index f156132e8975..4ccff427592e 100644
> >> --- a/arch/powerpc/lib/code-patching.c
> >> +++ b/arch/powerpc/lib/code-patching.c
> >> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct 
> >> patch_mapping *patch_mapping)
> >>}
> >>   
> >>pte = mk_pte(page, pgprot);
> >> +  pte = pte_mkdirty(pte);
> >>set_pte_at(patching_mm, patching_addr, ptep, pte);
> >>   
> >>init_temp_mm(_mapping->temp_mm, patching_mm);
> >> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, 
> >> unsigned int instr)
> >>(offset_in_page((unsigned long)addr) /
> >>sizeof(unsigned int));
> >>   
> >> +  allow_write_to_user(patch_addr, sizeof(instr));
> >>__patch_instruction(addr, instr, patch_addr);
> >> +  prevent_write_to_user(patch_addr, sizeof(instr));
> >>
> > 
> > On radix we can map the page with PAGE_KERNEL protection which ends up
> > setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
> > ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.
> > 
> > Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
> > the __patch_instruction() with the allow_/prevent_write_to_user() KUAP 
> > things
> > because this is a temporary kernel mapping which really isn't userspace in
> > the usual sense.
> 
> On the 8xx, that's pretty different.
> 
> The PTE doesn't control whether a page is user page or a kernel page. 
> The only thing that is set in the PTE is whether a page is linked to a 
> given PID or not.
> PAGE_KERNEL tells that the page can be addressed with any PID.
> 
> The user access right is given by a kind of zone, which is in the PGD 
> entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0. 
> Every pages below PAGE_OFFSET are defined as belonging to zone 1.
> 
> By default, zone 0 can only be accessed by kernel, and zone 1 can only 
> be accessed by user. When kernel wants to access zone 1, it temporarily 
> changes properties of zone 1 to allow both kernel and user accesses.
> 
> So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel 
> must unlock it to access it.
> 
> 
> And this is more or less the same on hash/32. This is managed by segment 
> registers. One segment register corresponds to a 256Mbytes area. Every 
> pages below PAGE_OFFSET can only be read by default by kernel. Only user 
> can write if the PTE allows it. When the kernel needs to write at an 
> address below PAGE_OFFSET, it must change the segment properties in the 
> corresponding segment register.
> 
> So, for both cases, if we want to have it local to a task while still 
> allowing kernel access, it means we have to define a new special area 
> between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone.
> 
> That looks complex to me for a small benefit, especially as 8xx is not 
> SMP and neither are most of the hash/32 targets.
> 

Agreed. So I guess the solution is to differentiate between radix/non-radix
and use PAGE_SHARED for non-radix along with the KUAP functions when KUAP
is enabled. Hmm, I need to think about this some more, especially if it's
acceptable to temporarily map kernel text as PAGE_SHARED for patching. Do
you see any obvious problems on 8xx and hash/32 w/ using PAGE_SHARED?

I don't necessarily want to drop the local mm patching idea for non-radix
platforms since that means we would have to maintain two implementations.

> Christophe

Re: Linux-next POWER9 NULL pointer NIP since 1st Apr.

2020-04-15 Thread Qian Cai




> On Apr 10, 2020, at 3:20 PM, Qian Cai  wrote:
> 
> 
> 
>> On Apr 9, 2020, at 10:14 AM, Steven Rostedt  wrote:
>> 
>> On Thu, 9 Apr 2020 06:06:35 -0400
>> Qian Cai  wrote:
>> 
> I’ll go to bisect some more but it is going to take a while.
> 
> $ git log --oneline 4c205c84e249..8e99cf91b99b
> 8e99cf91b99b tracing: Do not allocate buffer in trace_find_next_entry() 
> in atomic
> 2ab2a0924b99 tracing: Add documentation on set_ftrace_notrace_pid and 
> set_event_notrace_pid
> ebed9628f5c2 selftests/ftrace: Add test to test new set_event_notrace_pid 
> file
> ed8839e072b8 selftests/ftrace: Add test to test new 
> set_ftrace_notrace_pid file
> 276836260301 tracing: Create set_event_notrace_pid to not trace tasks  
 
> b3b1e6ededa4 ftrace: Create set_ftrace_notrace_pid to not trace tasks
> 717e3f5ebc82 ftrace: Make function trace pid filtering a bit more exact  
 
 If it is affecting function tracing, it is probably one of the above two
 commits.  
>>> 
>>> OK, it was narrowed down to one of those messed with mcount here,
>> 
>> Thing is, nothing here touches mcount.
> 
> Yes, you are right. I went back to test the commit just before the 5.7-trace 
> merge request,
> I did reproduce there. The thing is that this bastard could take more 6-hour 
> to happen,
> so my previous attempt did not wait long enough. Back to the square one…

OK, I starts to test all commits up to 12 hours. The progess on far is,

BAD: v5.6-rc1
GOOD: v5.5
GOOD: 153b5c566d30 Merge tag 'microblaze-v5.6-rc1' of 
git://git.monstr.eu/linux-2.6-microblaze

The next step I’ll be testing,

71c3a888cbca Merge tag 'powerpc-5.6-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

IF that is BAD, the merge request is the culprit. I can see a few commits are 
more related that others.

5290ae2b8e5f powerpc/64: Use {SAVE,REST}_NVGPRS macros
ed0bc98f8cbe powerpc/64s: Reimplement power4_idle code in C

Does it ring any bell yet?

Re: [PATCH] lib/mpi: Fix building for powerpc with clang

2020-04-15 Thread Herbert Xu

On Mon, Apr 13, 2020 at 12:50:42PM -0700, Nathan Chancellor wrote:
> 0day reports over and over on an powerpc randconfig with clang:
> 
> lib/mpi/generic_mpih-mul1.c:37:13: error: invalid use of a cast in a
> inline asm context requiring an l-value: remove the cast or build with
> -fheinous-gnu-extensions
> 
> Remove the superfluous casts, which have been done previously for x86
> and arm32 in commit dea632cadd12 ("lib/mpi: fix build with clang") and
> commit 7b7c1df2883d ("lib/mpi/longlong.h: fix building with 32-bit
> x86").
> 
> Reported-by: kbuild test robot 
> Link: https://github.com/ClangBuiltLinux/linux/issues/991
> Signed-off-by: Nathan Chancellor 

Acked-by: Herbert Xu 
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[PATCH v2, 4/5] powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr

Include "linux/of_address.h" to fix the compile error for
mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_cache_sram.c.

  CC  arch/powerpc/sysdev/fsl_85xx_l2ctlr.o
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function ‘mpc85xx_l2ctlr_of_probe’:
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit declaration of 
function ‘of_iomap’; did you mean ‘pci_iomap’? 
[-Werror=implicit-function-declaration]
  l2ctlr = of_iomap(dev->dev.of_node, 0);
   ^~~~
   pci_iomap
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes pointer 
from integer without a cast [-Werror=int-conversion]
  l2ctlr = of_iomap(dev->dev.of_node, 0);
 ^
cc1: all warnings being treated as errors
scripts/Makefile.build:267: recipe for target 
'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed
make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu 
---
Changes since v1:
 * None
---
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c 
b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
index 2d0af0c517bb..7533572492f0 100644
--- a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
+++ b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "fsl_85xx_cache_ctlr.h"
-- 
2.17.1

[PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

A driver for freescale 85xx platforms to access the Cache-Sram form
user level. This is extremely helpful for some user-space applications
that require high performance memory accesses.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
Changes since v1:
 * Addressed comments of Greg K-H
 * Moved kfree(info->name) into uio_info_free_internal()
---
 drivers/uio/Kconfig   |   8 ++
 drivers/uio/Makefile  |   1 +
 drivers/uio/uio_fsl_85xx_cache_sram.c | 182 ++
 3 files changed, 191 insertions(+)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 202ee81cfc2b..afd38ec13de0 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -105,6 +105,14 @@ config UIO_NETX
  To compile this driver as a module, choose M here; the module
  will be called uio_netx.
 
+config UIO_FSL_85XX_CACHE_SRAM
+   tristate "Freescale 85xx Cache-Sram driver"
+   depends on FSL_85XX_CACHE_SRAM
+   help
+ Generic driver for accessing the Cache-Sram form user level. This
+ is extremely helpful for some user-space applications that require
+ high performance memory accesses.
+
 config UIO_FSL_ELBC_GPCM
tristate "eLBC/GPCM driver"
depends on FSL_LBC
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index c285dd2a4539..be2056cffc21 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)+= uio_netx.o
 obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
 obj-$(CONFIG_UIO_MF624) += uio_mf624.o
 obj-$(CONFIG_UIO_FSL_ELBC_GPCM)+= uio_fsl_elbc_gpcm.o
+obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)  += uio_fsl_85xx_cache_sram.o
 obj-$(CONFIG_UIO_HV_GENERIC)   += uio_hv_generic.o
diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c 
b/drivers/uio/uio_fsl_85xx_cache_sram.c
new file mode 100644
index ..fb6903fdaddb
--- /dev/null
+++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
+ * Copyright (C) 2020 Wang Wenhu 
+ * All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_NAME"uio_fsl_85xx_cache_sram"
+#define UIO_NAME   "uio_cache_sram"
+
+static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
+   {   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
+   {},
+};
+
+static void uio_info_free_internal(struct uio_info *info)
+{
+   struct uio_mem *uiomem = >mem[0];
+
+   while (uiomem < >mem[MAX_UIO_MAPS]) {
+   if (uiomem->size) {
+   mpc85xx_cache_sram_free(uiomem->internal_addr);
+   kfree(uiomem->name);
+   }
+   uiomem++;
+   }
+
+   kfree(info->name);
+}
+
+static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev)
+{
+   struct device_node *parent = pdev->dev.of_node;
+   struct device_node *node = NULL;
+   struct uio_info *info;
+   struct uio_mem *uiomem;
+   const char *dt_name;
+   u32 mem_size;
+   u32 align;
+   void *virt;
+   phys_addr_t phys;
+   int ret = -ENODEV;
+
+   /* alloc uio_info for one device */
+   info = kzalloc(sizeof(*info), GFP_KERNEL);
+   if (!info) {
+   ret = -ENOMEM;
+   goto err_out;
+   }
+
+   /* get optional uio name */
+   if (of_property_read_string(parent, "uio_name", _name))
+   dt_name =

[PATCH v2, 2/5] powerpc: sysdev: fix compile error for fsl_85xx_cache_sram

Include linux/io.h into fsl_85xx_cache_sram.c to fix the
implicit-declaration compile error when building Cache-Sram.

arch/powerpc/sysdev/fsl_85xx_cache_sram.c: In function ‘instantiate_cache_sram’:
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:26: error: implicit declaration of 
function ‘ioremap_coherent’; did you mean ‘bitmap_complement’? 
[-Werror=implicit-function-declaration]
  cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
  ^~~~
  bitmap_complement
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:24: error: assignment makes 
pointer from integer without a cast [-Werror=int-conversion]
  cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
^
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:123:2: error: implicit declaration of 
function ‘iounmap’; did you mean ‘roundup’? 
[-Werror=implicit-function-declaration]
  iounmap(cache_sram->base_virt);
  ^~~
  roundup
cc1: all warnings being treated as errors

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: WANG Wenhu 
---
Changes since v1:
 * None
---
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c 
b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
index f6c665dac725..be3aef4229d7 100644
--- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
+++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fsl_85xx_cache_ctlr.h"
 
-- 
2.17.1

[PATCH v2,0/5] drivers: uio: new driver uio_fsl_85xx_cache_sram

This series add a new uio driver for freescale 85xx platforms to
access the Cache-Sram form user level. This is extremely helpful
for the user-space applications that require high performance memory
accesses.

It fixes the compile errors and warning of the hardware level drivers
and implements the uio driver in uio_fsl_85xx_cache_sram.c.

Changes since v1:
 * Addressed comments of Greg K-H
 * Moved kfree(info->name) into uio_info_free_internal()

Wang Wenhu (5):
  powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
  powerpc: sysdev: fix compile error for fsl_85xx_cache_sram
  powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram
  powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr
  drivers: uio: new driver for fsl_85xx_cache_sram

 arch/powerpc/platforms/85xx/Kconfig   |   2 +-
 arch/powerpc/platforms/Kconfig.cputype|   5 +-
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c |   3 +-
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c |   1 +
 drivers/uio/Kconfig   |   8 +
 drivers/uio/Makefile  |   1 +
 drivers/uio/uio_fsl_85xx_cache_sram.c | 182 ++
 7 files changed, 198 insertions(+), 4 deletions(-)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

-- 
2.17.1

[PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
could be configured and used as a piece of SRAM which is hignly
friendly for some user level application performances.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
Changes since v1:
 * None
---
 arch/powerpc/platforms/85xx/Kconfig| 2 +-
 arch/powerpc/platforms/Kconfig.cputype | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index fa3d29dcb57e..6debb4f1b9cc 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -17,7 +17,7 @@ if FSL_SOC_BOOKE
 if PPC32
 
 config FSL_85XX_CACHE_SRAM
-   bool
+   bool "Freescale 85xx Cache-Sram"
select PPC_LIB_RHEAP
help
  When selected, this option enables cache-sram support
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c3c1902135c..1921e9a573e8 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 config PPC32
-   bool
+   bool "32-bit kernel"
default y if !PPC64
select KASAN_VMALLOC if KASAN && MODULES
 
@@ -15,6 +15,7 @@ config PPC_BOOK3S_32
bool
 
 menu "Processor support"
+
 choice
prompt "Processor Type"
depends on PPC32
@@ -211,9 +212,9 @@ config PPC_BOOK3E
depends on PPC_BOOK3E_64
 
 config E500
+   bool "e500 Support"
select FSL_EMB_PERFMON
select PPC_FSL_BOOK3E
-   bool
 
 config PPC_E500MC
bool "e500mc Support"
-- 
2.17.1

[PATCH v2, 3/5] powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram

Function instantiate_cache_sram should not be linked into the init
section for its caller mpc85xx_l2ctlr_of_probe is none-__init.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu 

Warning information:
  MODPOST vmlinux.o
WARNING: modpost: vmlinux.o(.text+0x1e540): Section mismatch in reference from 
the function mpc85xx_l2ctlr_of_probe() to the function 
.init.text:instantiate_cache_sram()
The function mpc85xx_l2ctlr_of_probe() references
the function __init instantiate_cache_sram().
This is often because mpc85xx_l2ctlr_of_probe lacks a __init
annotation or the annotation of instantiate_cache_sram is wrong.
---
Changes since v1:
 * None
---
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c 
b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
index be3aef4229d7..3de5ac8382c0 100644
--- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
+++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
@@ -68,7 +68,7 @@ void mpc85xx_cache_sram_free(void *ptr)
 }
 EXPORT_SYMBOL(mpc85xx_cache_sram_free);
 
-int __init instantiate_cache_sram(struct platform_device *dev,
+int instantiate_cache_sram(struct platform_device *dev,
struct sram_parameters sram_params)
 {
int ret = 0;
-- 
2.17.1

[PATCH] powerpc/uaccess: Don't set KUAP by default on book3s/32

On book3s/32, KUAP is an heavy process as it requires to
determine which segments are impacted and unlock/lock
each of them.

And since the implementation of user_access_begin/end, it
is even worth for the time being because unlike __get_user(),
user_access_begin doesn't make difference between read and write
and unlocks access also for read allthought that's unneeded
on book3s/32.

As shown by the size of a kernel built with KUAP and one without,
the overhead is 64k bytes of code. As a comparison a similar
build on an 8xx has an overhead of only 8k bytes of code.

   textdata bss dec hex filename
7230416 1425868  837376 9493660  90dc9c vmlinux.kuap6xx
7165012 1425548  837376 9427936  8fdbe0 vmlinux.nokuap6xx
6519796 1960028  477464 8957288  88ad68 vmlinux.kuap8xx
6511664 1959864  477464 8948992  888d00 vmlinux.nokuap8xx

Until a more optimised KUAP is implemented on book3s/32,
don't select it by default.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/Kconfig.cputype | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c3c1902135c..0c7151c98b56 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -389,7 +389,7 @@ config PPC_HAVE_KUAP
 config PPC_KUAP
bool "Kernel Userspace Access Protection"
depends on PPC_HAVE_KUAP
-   default y
+   default y if !PPC_BOOK3S_32
help
  Enable support for Kernel Userspace Access Protection (KUAP)
 
-- 
2.25.0

[PATCH] powerpc/uaccess: Don't set KUEP by default on book3s/32

On book3s/32, KUEP is an heavy process as it requires to
set/unset the NX bit in each of the 12 user segments
everytime the kernel is entered/exited from/to user space.

Don't select KUEP by default on book3s/32.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/Kconfig.cputype | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c7151c98b56..11412078e732 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -377,7 +377,7 @@ config PPC_HAVE_KUEP
 config PPC_KUEP
bool "Kernel Userspace Execution Prevention"
depends on PPC_HAVE_KUEP
-   default y
+   default y if !PPC_BOOK3S_32
help
  Enable support for Kernel Userspace Execution Prevention (KUEP)
 
-- 
2.25.0

[PATCH 01/34] docs: filesystems: fix references for doc files there

2020-04-15 Thread Mauro Carvalho Chehab

Several files there were renamed to ReST. Fix the broken
references.

Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/ABI/stable/sysfs-devices-node   | 2 +-
 Documentation/ABI/testing/procfs-smaps_rollup | 2 +-
 Documentation/admin-guide/cpu-load.rst| 2 +-
 Documentation/admin-guide/nfs/nfsroot.rst | 2 +-
 Documentation/driver-api/driver-model/device.rst  | 2 +-
 Documentation/driver-api/driver-model/overview.rst| 2 +-
 Documentation/filesystems/dax.txt | 2 +-
 Documentation/filesystems/dnotify.txt | 2 +-
 Documentation/filesystems/ramfs-rootfs-initramfs.rst  | 2 +-
 Documentation/powerpc/firmware-assisted-dump.rst  | 2 +-
 Documentation/process/adding-syscalls.rst | 2 +-
 .../translations/it_IT/process/adding-syscalls.rst| 2 +-
 Documentation/translations/zh_CN/filesystems/sysfs.txt| 6 +++---
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   | 2 +-
 fs/Kconfig| 2 +-
 fs/Kconfig.binfmt | 2 +-
 fs/adfs/Kconfig   | 2 +-
 fs/affs/Kconfig   | 2 +-
 fs/afs/Kconfig| 6 +++---
 fs/bfs/Kconfig| 2 +-
 fs/cramfs/Kconfig | 2 +-
 fs/ecryptfs/Kconfig   | 2 +-
 fs/fat/Kconfig| 8 
 fs/fuse/Kconfig   | 2 +-
 fs/fuse/dev.c | 2 +-
 fs/hfs/Kconfig| 2 +-
 fs/hpfs/Kconfig   | 2 +-
 fs/isofs/Kconfig  | 2 +-
 fs/namespace.c| 2 +-
 fs/notify/inotify/Kconfig | 2 +-
 fs/ntfs/Kconfig   | 2 +-
 fs/ocfs2/Kconfig  | 2 +-
 fs/overlayfs/Kconfig  | 6 +++---
 fs/proc/Kconfig   | 4 ++--
 fs/romfs/Kconfig  | 2 +-
 fs/sysfs/dir.c| 2 +-
 fs/sysfs/file.c   | 2 +-
 fs/sysfs/mount.c  | 2 +-
 fs/sysfs/symlink.c| 2 +-
 fs/sysv/Kconfig   | 2 +-
 fs/udf/Kconfig| 2 +-
 include/linux/relay.h | 2 +-
 include/linux/sysfs.h | 2 +-
 kernel/relay.c| 2 +-
 44 files changed, 54 insertions(+), 54 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-devices-node 
b/Documentation/ABI/stable/sysfs-devices-node
index df8413cf1468..484fc04bcc25 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -54,7 +54,7 @@ Date: October 2002
 Contact:   Linux Memory Management list 
 Description:
Provides information about the node's distribution and memory
-   utilization. Similar to /proc/meminfo, see 
Documentation/filesystems/proc.txt
+   utilization. Similar to /proc/meminfo, see 
Documentation/filesystems/proc.rst
 
 What:  /sys/devices/system/node/nodeX/numastat
 Date:  October 2002
diff --git a/Documentation/ABI/testing/procfs-smaps_rollup 
b/Documentation/ABI/testing/procfs-smaps_rollup
index 274df44d8b1b..046978193368 100644
--- a/Documentation/ABI/testing/procfs-smaps_rollup
+++ b/Documentation/ABI/testing/procfs-smaps_rollup
@@ -11,7 +11,7 @@ Description:
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
are not present in /proc/pid/smaps.  These fields represent
the sum of the Pss field of each type (anon, file, shmem).
-   For more details, see Documentation/filesystems/proc.txt
+   For more details, see Documentation/filesystems/proc.rst
and the procfs man page.
 
Typical output looks like this:
diff --git a/Documentation/admin-guide/cpu-load.rst 
b/Documentation/admin-guide/cpu-load.rst
index 2d01ce43d2a2..ebdecf864080 100644
--- a/Documentation/admin-guide/cpu-load.rst
+++ b/Documentation/admin-guide/cpu-load.rst
@@ -105,7 +105,7 @@ References
 --
 
 - http://lkml.org/lkml/2007/2/12/6
-- Documentation/filesystems/proc.txt (1.8)
+- Documentation/filesystems/proc.rst (1.8)
 
 
 Thanks
diff --git a/Documentation/admin-guide/nfs/nfsroot.rst

[PATCH 29/34] docs: filesystems: convert spufs/spufs.txt to ReST

2020-04-15 Thread Mauro Carvalho Chehab

This file is at groff output format. Manually convert it to
ReST format, trying to preserve a similar output after parsed.

Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/filesystems/spufs/index.rst |  1 +
 .../spufs/{spufs.txt => spufs.rst}| 59 +--
 MAINTAINERS   |  2 +-
 3 files changed, 30 insertions(+), 32 deletions(-)
 rename Documentation/filesystems/spufs/{spufs.txt => spufs.rst} (95%)

diff --git a/Documentation/filesystems/spufs/index.rst 
b/Documentation/filesystems/spufs/index.rst
index 39553c6ebefd..939cf59a7d9e 100644
--- a/Documentation/filesystems/spufs/index.rst
+++ b/Documentation/filesystems/spufs/index.rst
@@ -8,4 +8,5 @@ SPU Filesystem
 .. toctree::
:maxdepth: 1
 
+   spufs
spu_create
diff --git a/Documentation/filesystems/spufs/spufs.txt 
b/Documentation/filesystems/spufs/spufs.rst
similarity index 95%
rename from Documentation/filesystems/spufs/spufs.txt
rename to Documentation/filesystems/spufs/spufs.rst
index caf36aaae804..8a42859bb100 100644
--- a/Documentation/filesystems/spufs/spufs.txt
+++ b/Documentation/filesystems/spufs/spufs.rst
@@ -1,12 +1,18 @@
-SPUFS(2)   Linux Programmer's Manual  SPUFS(2)
+.. SPDX-License-Identifier: GPL-2.0
 
+=
+spufs
+=
 
+Name
+
 
-NAME
spufs - the SPU file system
 
 
-DESCRIPTION
+Description
+===
+
The SPU file system is used on PowerPC machines that implement the Cell
Broadband Engine Architecture in order to access Synergistic  Processor
Units (SPUs).
@@ -21,7 +27,9 @@ DESCRIPTION
ally add or remove files.
 
 
-MOUNT OPTIONS
+Mount Options
+=
+
uid=
   set the user owning the mount point, the default is 0 (root).
 
@@ -29,7 +37,9 @@ MOUNT OPTIONS
   set the group owning the mount point, the default is 0 (root).
 
 
-FILES
+Files
+=
+
The files in spufs mostly follow the standard behavior for regular sys-
tem  calls like read(2) or write(2), but often support only a subset of
the operations supported on regular file systems. This list details the
@@ -125,14 +135,12 @@ FILES
   space is available for writing.
 
 
-   /mbox_stat
-   /ibox_stat
-   /wbox_stat
+   /mbox_stat, /ibox_stat, /wbox_stat
Read-only files that contain the length of the current queue, i.e.  how
many  words  can  be  read  from  mbox or ibox or how many words can be
written to wbox without blocking.  The files can be read only in 4-byte
units  and  return  a  big-endian  binary integer number.  The possible
-   operations on an open *box_stat file are:
+   operations on an open ``*box_stat`` file are:
 
read(2)
   If a count smaller than four is requested, read returns  -1  and
@@ -143,12 +151,7 @@ FILES
   in EAGAIN.
 
 
-   /npc
-   /decr
-   /decr_status
-   /spu_tag_mask
-   /event_mask
-   /srr0
+   /npc, /decr, /decr_status, /spu_tag_mask, /event_mask, /srr0
Internal  registers  of  the SPU. The representation is an ASCII string
with the numeric value of the next instruction to  be  executed.  These
can  be  used in read/write mode for debugging, but normal operation of
@@ -157,17 +160,14 @@ FILES
 
The contents of these files are:
 
+   === ===
npc Next Program Counter
-
decrSPU Decrementer
-
decr_status Decrementer Status
-
spu_tag_maskMFC tag mask for SPU DMA
-
event_mask  Event mask for SPU interrupts
-
srr0Interrupt Return address register
+   === ===
 
 
The   possible   operations   on   an   open  npc,  decr,  decr_status,
@@ -206,8 +206,7 @@ FILES
   from the data buffer, updating the value of the fpcr register.
 
 
-   /signal1
-   /signal2
+   /signal1, /signal2
The two signal notification channels of an SPU.  These  are  read-write
files  that  operate  on  a 32 bit word.  Writing to one of these files
triggers an interrupt on the SPU.  The  value  written  to  the  signal
@@ -233,8 +232,7 @@ FILES
   file.
 
 
-   /signal1_type
-   /signal2_type
+   /signal1_type, /signal2_type
These two files change the behavior of the signal1 and signal2  notifi-
cation  files.  The  contain  a numerical ASCII string which is read as
either "1" or "0".  In mode 0 (overwrite), the  hardware  replaces  the
@@ -259,18 +257,17 @@ FILES
   the previous setting.
 
 
-EXAMPLES
+Examples
+
/etc/fstab entry
   none  /spu  spufs gid=spu   00
 
 
-AUTHORS
+Authors
+===
Arnd  Bergmann  ,  Mark  Nutter ,
Ulrich Weigand 
 
-SEE ALSO
+See Also
+

[PATCH 00/34] fs: convert remaining docs to ReST file format

2020-04-15 Thread Mauro Carvalho Chehab

This patch series convert the remaining files under Documentation/filesystems
to the ReST file format. It is based on linux-next (next-20200414).

PS.: I opted to add mainly ML from the output of get_maintainers.pl to the c/c
list of patch 00/34, because  otherwise the number of c/c would be too many,
with would very likely cause ML servers to reject it.

The results of those changes (together with other changes from my pending
doc patches) are available at:

   https://www.infradead.org/~mchehab/kernel_docs/filesystems/index.html

Mauro Carvalho Chehab (34):
  docs: filesystems: fix references for doc files there
  docs: filesystems: convert caching/object.txt to ReST
  docs: filesystems: convert caching/fscache.txt to ReST format
  docs: filesystems: caching/netfs-api.txt: convert it to ReST
  docs: filesystems: caching/operations.txt: convert it to ReST
  docs: filesystems: caching/cachefiles.txt: convert to ReST
  docs: filesystems: caching/backend-api.txt: convert it to ReST
  docs: filesystems: convert cifs/cifsroot.rst to ReST
  docs: filesystems: convert configfs.txt to ReST
  docs: filesystems: convert automount-support.txt to ReST
  docs: filesystems: convert coda.txt to ReST
  docs: filesystems: convert dax.txt to ReST
  docs: filesystems: convert devpts.txt to ReST
  docs: filesystems: convert dnotify.txt to ReST
  docs: filesystems: convert fiemap.txt to ReST
  docs: filesystems: convert files.txt to ReST
  docs: filesystems: convert fuse-io.txt to ReST
  docs: filesystems: convert gfs2-glocks.txt to ReST
  docs: filesystems: convert locks.txt to ReST
  docs: filesystems: convert mandatory-locking.txt to ReST
  docs: filesystems: convert mount_api.txt to ReST
  docs: filesystems: rename path-lookup.txt file
  docs: filesystems: convert path-walking.txt to ReST
  docs: filesystems: convert quota.txt to ReST
  docs: filesystems: convert seq_file.txt to ReST
  docs: filesystems: convert sharedsubtree.txt to ReST
  docs: filesystems: split spufs.txt into 3 separate files
  docs: filesystems: convert spufs/spu_create.txt to ReST
  docs: filesystems: convert spufs/spufs.txt to ReST
  docs: filesystems: convert spufs/spu_run.txt to ReST
  docs: filesystems: convert sysfs-pci.txt to ReST
  docs: filesystems: convert sysfs-tagging.txt to ReST
  docs: filesystems: convert xfs-delayed-logging-design.txt to ReST
  docs: filesystems: convert xfs-self-describing-metadata.txt to ReST

 Documentation/ABI/stable/sysfs-devices-node   |2 +-
 Documentation/ABI/testing/procfs-smaps_rollup |2 +-
 Documentation/admin-guide/cpu-load.rst|2 +-
 Documentation/admin-guide/ext4.rst|2 +-
 Documentation/admin-guide/nfs/nfsroot.rst |2 +-
 Documentation/admin-guide/sysctl/kernel.rst   |2 +-
 .../driver-api/driver-model/device.rst|2 +-
 .../driver-api/driver-model/overview.rst  |2 +-
 ...ount-support.txt => automount-support.rst} |   23 +-
 .../{backend-api.txt => backend-api.rst}  |  165 +-
 .../{cachefiles.txt => cachefiles.rst}|  139 +-
 Documentation/filesystems/caching/fscache.rst |  565 ++
 Documentation/filesystems/caching/fscache.txt |  448 -
 Documentation/filesystems/caching/index.rst   |   14 +
 .../caching/{netfs-api.txt => netfs-api.rst}  |  172 +-
 .../caching/{object.txt => object.rst}|   43 +-
 .../{operations.txt => operations.rst}|   45 +-
 .../cifs/{cifsroot.txt => cifsroot.rst}   |   56 +-
 Documentation/filesystems/coda.rst| 1670 
 Documentation/filesystems/coda.txt| 1676 -
 .../{configfs/configfs.txt => configfs.rst}   |  129 +-
 .../filesystems/{dax.txt => dax.rst}  |   11 +-
 Documentation/filesystems/devpts.rst  |   36 +
 Documentation/filesystems/devpts.txt  |   26 -
 .../filesystems/{dnotify.txt => dnotify.rst}  |   13 +-
 Documentation/filesystems/ext2.rst|2 +-
 .../filesystems/{fiemap.txt => fiemap.rst}|  133 +-
 .../filesystems/{files.txt => files.rst}  |   15 +-
 .../filesystems/{fuse-io.txt => fuse-io.rst}  |6 +
 .../{gfs2-glocks.txt => gfs2-glocks.rst}  |  147 +-
 Documentation/filesystems/index.rst   |   26 +
 .../filesystems/{locks.txt => locks.rst}  |   14 +-
 ...tory-locking.txt => mandatory-locking.rst} |   25 +-
 .../{mount_api.txt => mount_api.rst}  |  329 ++--
 .../{path-lookup.txt => path-walking.rst} |   88 +-
 Documentation/filesystems/porting.rst |2 +-
 Documentation/filesystems/proc.rst|2 +-
 .../filesystems/{quota.txt => quota.rst}  |   41 +-
 .../filesystems/ramfs-rootfs-initramfs.rst|2 +-
 .../{seq_file.txt => seq_file.rst}|   61 +-
 .../{sharedsubtree.txt => sharedsubtree.rst}  |  394 ++--
 Documentation/filesystems/spufs/index.rst |   13 +
 .../filesystems/spufs/spu_create.rst  |  131 ++
 Documentation/filesystems/spufs/spu_run.rst   |  138 ++
 .../{spufs.txt =>

Re: [PATCH v6 6/7] ASoC: dt-bindings: fsl_easrc: Add document for EASRC

2020-04-15 Thread Rob Herring

On Tue, Apr 14, 2020 at 9:56 PM Shengjiu Wang  wrote:
>
> Hi Rob
>
> On Tue, Apr 14, 2020 at 11:49 PM Rob Herring  wrote:
> >
> > On Wed, Apr 01, 2020 at 04:45:39PM +0800, Shengjiu Wang wrote:
> > > EASRC (Enhanced Asynchronous Sample Rate Converter) is a new
> > > IP module found on i.MX8MN.
> > >
> > > Signed-off-by: Shengjiu Wang 
> > > ---
> > >  .../devicetree/bindings/sound/fsl,easrc.yaml  | 101 ++
> > >  1 file changed, 101 insertions(+)
> > >  create mode 100644 Documentation/devicetree/bindings/sound/fsl,easrc.yaml
> > >
> > > diff --git a/Documentation/devicetree/bindings/sound/fsl,easrc.yaml 
> > > b/Documentation/devicetree/bindings/sound/fsl,easrc.yaml
> > > new file mode 100644
> > > index ..14ea60084420
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/sound/fsl,easrc.yaml
> > > @@ -0,0 +1,101 @@
> > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/sound/fsl,easrc.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > +
> > > +title: NXP Asynchronous Sample Rate Converter (ASRC) Controller
> > > +
> > > +maintainers:
> > > +  - Shengjiu Wang 
> > > +
> > > +properties:
> > > +  $nodename:
> > > +pattern: "^easrc@.*"
> > > +
> > > +  compatible:
> > > +const: fsl,imx8mn-easrc
> > > +
> > > +  reg:
> > > +maxItems: 1
> > > +
> > > +  interrupts:
> > > +maxItems: 1
> > > +
> > > +  clocks:
> > > +items:
> > > +  - description: Peripheral clock
> > > +
> > > +  clock-names:
> > > +items:
> > > +  - const: mem
> > > +
> > > +  dmas:
> > > +maxItems: 8
> > > +
> > > +  dma-names:
> > > +items:
> > > +  - const: ctx0_rx
> > > +  - const: ctx0_tx
> > > +  - const: ctx1_rx
> > > +  - const: ctx1_tx
> > > +  - const: ctx2_rx
> > > +  - const: ctx2_tx
> > > +  - const: ctx3_rx
> > > +  - const: ctx3_tx
> > > +
> > > +  firmware-name:
> > > +allOf:
> > > +  - $ref: /schemas/types.yaml#/definitions/string
> > > +  - const: imx/easrc/easrc-imx8mn.bin
> > > +description: The coefficient table for the filters
> > > +
> > > +  fsl,asrc-rate:
> >
> > fsl,asrc-rate-hz
>
> Can we keep "fsl,asrc-rate", because I want this property
> align with the one in fsl,asrc.txt.  These two asrc modules
> can share same property name.

Oh, yes.

So with the example fixed:

Reviewed-by: Rob Herring

Re: [PATCH 5/5] drivers: uio: new driver for fsl_85xx_cache_sram>On Wed, Apr 15, 2020 at 05:33:46AM -0700, Wang Wenhu wrote:

Hi, Greg k-h!
Thank you for you fast reply. All the comments will
be addressed with v2 soon. Detailed explanations are
just below specific comment.

>> A driver for freescale 85xx platforms to access the Cache-Sram form
>> user level. This is extremely helpful for some user-space applications
>> that require high performance memory accesses.
>> 
>> Cc: Greg Kroah-Hartman 
>> Cc: Christophe Leroy 
>> Cc: Scott Wood 
>> Cc: Michael Ellerman 
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Wang Wenhu 
>> ---
>>  drivers/uio/Kconfig   |   8 ++
>>  drivers/uio/Makefile  |   1 +
>>  drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++
>>  3 files changed, 204 insertions(+)
>>  create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c
>> 
>> diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
>> index 202ee81cfc2b..afd38ec13de0 100644
>> --- a/drivers/uio/Kconfig
>> +++ b/drivers/uio/Kconfig
>> @@ -105,6 +105,14 @@ config UIO_NETX
>>To compile this driver as a module, choose M here; the module
>>will be called uio_netx.
>>  
>> +config UIO_FSL_85XX_CACHE_SRAM
>> +tristate "Freescale 85xx Cache-Sram driver"
>> +depends on FSL_85XX_CACHE_SRAM
>> +help
>> +  Generic driver for accessing the Cache-Sram form user level. This
>> +  is extremely helpful for some user-space applications that require
>> +  high performance memory accesses.
>> +
>>  config UIO_FSL_ELBC_GPCM
>>  tristate "eLBC/GPCM driver"
>>  depends on FSL_LBC
>> diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
>> index c285dd2a4539..be2056cffc21 100644
>> --- a/drivers/uio/Makefile
>> +++ b/drivers/uio/Makefile
>> @@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX) += uio_netx.o
>>  obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
>>  obj-$(CONFIG_UIO_MF624) += uio_mf624.o
>>  obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o
>> +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)   += uio_fsl_85xx_cache_sram.o
>>  obj-$(CONFIG_UIO_HV_GENERIC)+= uio_hv_generic.o
>> diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c 
>> b/drivers/uio/uio_fsl_85xx_cache_sram.c
>> new file mode 100644
>> index ..e11202dd5b93
>> --- /dev/null
>> +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
>> @@ -0,0 +1,195 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
>> + * Copyright (C) 2020 Wang Wenhu 
>> + * All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License version 2 as published
>> + * by the Free Software Foundation.
>
>Nit, you don't need this sentance anymore now that you have the SPDX
>line above
>
Got, I will delete it with v2.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define DRIVER_VERSION  "0.1.0"
>
>Don't do DRIVER_VERSIONs, they never work once the code is in the kernel
>tree.
>
>> +#define DRIVER_NAME "uio_fsl_85xx_cache_sram"
>
>KBUILD_MODNAME?

Yes, and sorry for that I did not get what should have been done?

>
>> +#define UIO_NAME"uio_cache_sram"
>> +
>> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
>> +{   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
>> +{},
>> +};
>> +
>> +static void uio_info_free_internal(struct uio_info *info)
>> +{
>> +struct uio_mem *uiomem = >mem[0];
>> +
>> +while (uiomem < >mem[MAX_UIO_MAPS]) {
>> +if (uiomem->size) {
>> +mpc85xx_cache_sram_free(uiomem->internal_addr);
>> +kfree(uiomem->name);
>> +

Re: CVE-2020-11669: Linux kernel 4.10 to 5.1: powerpc: guest can cause DoS on POWER9 KVM hosts

2020-04-15 Thread Michal Suchánek

On Wed, Apr 15, 2020 at 10:52:53PM +1000, Andrew Donnellan wrote:
> The Linux kernel for powerpc from v4.10 to v5.1 has a bug where the
> Authority Mask Register (AMR), Authority Mask Override Register (AMOR) and
> User Authority Mask Override Register (UAMOR) are not correctly saved and
> restored when the CPU is going into/coming out of idle state.
> 
> On POWER9 CPUs, this means that a CPU may return from idle with the AMR
> value of another thread on the same core.
> 
> This allows a trivial Denial of Service attack against KVM hosts, by booting
> a guest kernel which makes use of the AMR, such as a v5.2 or later kernel
> with Kernel Userspace Access Prevention (KUAP) enabled.
> 
> The guest kernel will set the AMR to prevent userspace access, then the
> thread will go idle. At a later point, the hardware thread that the guest
> was using may come out of idle and start executing in the host, without
> restoring the host AMR value. The host kernel can get caught in a page fault
> loop, as the AMR is unexpectedly causing memory accesses to fail in the
> host, and the host is eventually rendered unusable.

Hello,

shouldn't the kernel restore the host registers when leaving the guest?

I recall some code exists for handling the *AM*R when leaving guest. Can
the KVM guest enter idle without exiting to host?

Thanks

Michal

[PATCH] i2c: powermac: Simplify reading the "reg" and "i2c-address" property

2020-04-15 Thread Aishwarya R

>> Use of_property_read_u32 to read the "reg" and "i2c-address" property
>> instead of using of_get_property to check the return values.
>>
>> Signed-off-by: Aishwarya R 

> This is quite a fragile driver. Have you tested it on HW?

This change is not tested with the Hardware.
But of_property_read_u32 is better here than generic of_get_property.
This make sure that value read properly independent of system endianess.

Re: [PATCH 4.19] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle

2020-04-15 Thread Greg KH

On Wed, Apr 15, 2020 at 10:40:05PM +1000, Andrew Donnellan wrote:
> From: Michael Ellerman 
> 
> commit 53a712bae5dd919521a58d7bad773b949358add0 upstream.
> 
> In order to implement KUAP (Kernel Userspace Access Protection) on
> Power9 we will be using the AMR, and therefore indirectly the
> UAMOR/AMOR.
> 
> So save/restore these regs in the idle code.
> 
> Signed-off-by: Michael Ellerman 
> [ajd: Backport to 4.19 tree, CVE-2020-11669]
> Signed-off-by: Andrew Donnellan 
> ---
>  arch/powerpc/kernel/idle_book3s.S | 27 +++
>  1 file changed, 23 insertions(+), 4 deletions(-)

This and the 4.14 patch now queued up, thanks.

greg k-h

CVE-2020-11669: Linux kernel 4.10 to 5.1: powerpc: guest can cause DoS on POWER9 KVM hosts

2020-04-15 Thread Andrew Donnellan

The Linux kernel for powerpc from v4.10 to v5.1 has a bug where the
Authority Mask Register (AMR), Authority Mask Override Register (AMOR)
and User Authority Mask Override Register (UAMOR) are not correctly
saved and restored when the CPU is going into/coming out of idle state.

On POWER9 CPUs, this means that a CPU may return from idle with the AMR
value of another thread on the same core.

This allows a trivial Denial of Service attack against KVM hosts, by
booting a guest kernel which makes use of the AMR, such as a v5.2 or
later kernel with Kernel Userspace Access Prevention (KUAP) enabled.

The guest kernel will set the AMR to prevent userspace access, then the
thread will go idle. At a later point, the hardware thread that the
guest was using may come out of idle and start executing in the host,
without restoring the host AMR value. The host kernel can get caught in
a page fault loop, as the AMR is unexpectedly causing memory accesses to
fail in the host, and the host is eventually rendered unusable.

The fix is to correctly save and restore the AMR in the idle state
handling code.

The bug does not affect POWER8 or earlier Power CPUs.

CVE-2020-11669 has been assigned.

The bug has already been fixed upstream in kernels v5.2 onwards, by [0].

Fixes have been submitted for inclusion in upstream stable kernel trees
for v4.19[1] and v4.14[2].

The bug is already fixed in Red Hat Enterprise Linux 8 kernels from
4.18.0-147 onwards - see RHSA-2019:3517[3].

Thanks to David Gibson of Red Hat for the initial bug report.

[0]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53a712bae5dd919521a58d7bad773b949358add0

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208661.html

[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208660.html

[3] https://access.redhat.com/errata/RHSA-2019:3517

--
Andrew Donnellan OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited

Re: [PATCH 5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Greg KH

On Wed, Apr 15, 2020 at 05:33:46AM -0700, Wang Wenhu wrote:
> A driver for freescale 85xx platforms to access the Cache-Sram form
> user level. This is extremely helpful for some user-space applications
> that require high performance memory accesses.
> 
> Cc: Greg Kroah-Hartman 
> Cc: Christophe Leroy 
> Cc: Scott Wood 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Wang Wenhu 
> ---
>  drivers/uio/Kconfig   |   8 ++
>  drivers/uio/Makefile  |   1 +
>  drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++
>  3 files changed, 204 insertions(+)
>  create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c
> 
> diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
> index 202ee81cfc2b..afd38ec13de0 100644
> --- a/drivers/uio/Kconfig
> +++ b/drivers/uio/Kconfig
> @@ -105,6 +105,14 @@ config UIO_NETX
> To compile this driver as a module, choose M here; the module
> will be called uio_netx.
>  
> +config UIO_FSL_85XX_CACHE_SRAM
> + tristate "Freescale 85xx Cache-Sram driver"
> + depends on FSL_85XX_CACHE_SRAM
> + help
> +   Generic driver for accessing the Cache-Sram form user level. This
> +   is extremely helpful for some user-space applications that require
> +   high performance memory accesses.
> +
>  config UIO_FSL_ELBC_GPCM
>   tristate "eLBC/GPCM driver"
>   depends on FSL_LBC
> diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
> index c285dd2a4539..be2056cffc21 100644
> --- a/drivers/uio/Makefile
> +++ b/drivers/uio/Makefile
> @@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)  += uio_netx.o
>  obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
>  obj-$(CONFIG_UIO_MF624) += uio_mf624.o
>  obj-$(CONFIG_UIO_FSL_ELBC_GPCM)  += uio_fsl_elbc_gpcm.o
> +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)+= uio_fsl_85xx_cache_sram.o
>  obj-$(CONFIG_UIO_HV_GENERIC) += uio_hv_generic.o
> diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c 
> b/drivers/uio/uio_fsl_85xx_cache_sram.c
> new file mode 100644
> index ..e11202dd5b93
> --- /dev/null
> +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
> @@ -0,0 +1,195 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
> + * Copyright (C) 2020 Wang Wenhu 
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.

Nit, you don't need this sentance anymore now that you have the SPDX
line above

> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DRIVER_VERSION   "0.1.0"

Don't do DRIVER_VERSIONs, they never work once the code is in the kernel
tree.

> +#define DRIVER_NAME  "uio_fsl_85xx_cache_sram"

KBUILD_MODNAME?

> +#define UIO_NAME "uio_cache_sram"
> +
> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
> + {   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
> + {   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
> + {},
> +};
> +
> +static void uio_info_free_internal(struct uio_info *info)
> +{
> + struct uio_mem *uiomem = >mem[0];
> +
> + while (uiomem < >mem[MAX_UIO_MAPS]) {
> + if (uiomem->size) {
> + mpc85xx_cache_sram_free(uiomem->internal_addr);
> + kfree(uiomem->name);
> + }
> + uiomem++;
> + }
> +}
> +
> +static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev)
> +{
> + struct device_node *parent = pdev->dev.of_node;
> + struct device_node *node = NULL;
> + struct uio_info *info;
> + struct

[PATCH 4.19] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle

2020-04-15 Thread Andrew Donnellan

From: Michael Ellerman 

commit 53a712bae5dd919521a58d7bad773b949358add0 upstream.

In order to implement KUAP (Kernel Userspace Access Protection) on
Power9 we will be using the AMR, and therefore indirectly the
UAMOR/AMOR.

So save/restore these regs in the idle code.

Signed-off-by: Michael Ellerman 
[ajd: Backport to 4.19 tree, CVE-2020-11669]
Signed-off-by: Andrew Donnellan 
---
 arch/powerpc/kernel/idle_book3s.S | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 36178000a2f2..4a860d3b9229 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -170,8 +170,11 @@ core_idle_lock_held:
bne-core_idle_lock_held
blr
 
-/* Reuse an unused pt_regs slot for IAMR */
+/* Reuse some unused pt_regs slots for AMR/IAMR/UAMOR/UAMOR */
+#define PNV_POWERSAVE_AMR  _TRAP
 #define PNV_POWERSAVE_IAMR _DAR
+#define PNV_POWERSAVE_UAMOR_DSISR
+#define PNV_POWERSAVE_AMOR RESULT
 
 /*
  * Pass requested state in r3:
@@ -205,8 +208,16 @@ pnv_powersave_common:
SAVE_NVGPRS(r1)
 
 BEGIN_FTR_SECTION
+   mfspr   r4, SPRN_AMR
mfspr   r5, SPRN_IAMR
+   mfspr   r6, SPRN_UAMOR
+   std r4, PNV_POWERSAVE_AMR(r1)
std r5, PNV_POWERSAVE_IAMR(r1)
+   std r6, PNV_POWERSAVE_UAMOR(r1)
+BEGIN_FTR_SECTION_NESTED(42)
+   mfspr   r7, SPRN_AMOR
+   std r7, PNV_POWERSAVE_AMOR(r1)
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
mfcrr5
@@ -935,12 +946,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
REST_GPR(2, r1)
 
 BEGIN_FTR_SECTION
-   /* IAMR was saved in pnv_powersave_common() */
+   /* These regs were saved in pnv_powersave_common() */
+   ld  r4, PNV_POWERSAVE_AMR(r1)
ld  r5, PNV_POWERSAVE_IAMR(r1)
+   ld  r6, PNV_POWERSAVE_UAMOR(r1)
+   mtspr   SPRN_AMR, r4
mtspr   SPRN_IAMR, r5
+   mtspr   SPRN_UAMOR, r6
+BEGIN_FTR_SECTION_NESTED(42)
+   ld  r7, PNV_POWERSAVE_AMOR(r1)
+   mtspr   SPRN_AMOR, r7
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
/*
-* We don't need an isync here because the upcoming mtmsrd is
-* execution synchronizing.
+* We don't need an isync here after restoring IAMR because the upcoming
+* mtmsrd is execution synchronizing.
 */
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
-- 
2.20.1

[PATCH 4.14] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle

2020-04-15 Thread Andrew Donnellan

From: Michael Ellerman 

commit 53a712bae5dd919521a58d7bad773b949358add0 upstream.

In order to implement KUAP (Kernel Userspace Access Protection) on
Power9 we will be using the AMR, and therefore indirectly the
UAMOR/AMOR.

So save/restore these regs in the idle code.

Signed-off-by: Michael Ellerman 
[ajd: Backport to 4.14 tree, CVE-2020-11669]
Signed-off-by: Andrew Donnellan 
---
 arch/powerpc/kernel/idle_book3s.S | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 74fc20431082..01b823bdb49c 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -163,8 +163,11 @@ core_idle_lock_held:
bne-core_idle_lock_held
blr
 
-/* Reuse an unused pt_regs slot for IAMR */
+/* Reuse some unused pt_regs slots for AMR/IAMR/UAMOR/UAMOR */
+#define PNV_POWERSAVE_AMR  _TRAP
 #define PNV_POWERSAVE_IAMR _DAR
+#define PNV_POWERSAVE_UAMOR_DSISR
+#define PNV_POWERSAVE_AMOR RESULT
 
 /*
  * Pass requested state in r3:
@@ -198,8 +201,16 @@ pnv_powersave_common:
SAVE_NVGPRS(r1)
 
 BEGIN_FTR_SECTION
+   mfspr   r4, SPRN_AMR
mfspr   r5, SPRN_IAMR
+   mfspr   r6, SPRN_UAMOR
+   std r4, PNV_POWERSAVE_AMR(r1)
std r5, PNV_POWERSAVE_IAMR(r1)
+   std r6, PNV_POWERSAVE_UAMOR(r1)
+BEGIN_FTR_SECTION_NESTED(42)
+   mfspr   r7, SPRN_AMOR
+   std r7, PNV_POWERSAVE_AMOR(r1)
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
mfcrr5
@@ -951,12 +962,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
REST_GPR(2, r1)
 
 BEGIN_FTR_SECTION
-   /* IAMR was saved in pnv_powersave_common() */
+   /* These regs were saved in pnv_powersave_common() */
+   ld  r4, PNV_POWERSAVE_AMR(r1)
ld  r5, PNV_POWERSAVE_IAMR(r1)
+   ld  r6, PNV_POWERSAVE_UAMOR(r1)
+   mtspr   SPRN_AMR, r4
mtspr   SPRN_IAMR, r5
+   mtspr   SPRN_UAMOR, r6
+BEGIN_FTR_SECTION_NESTED(42)
+   ld  r7, PNV_POWERSAVE_AMOR(r1)
+   mtspr   SPRN_AMOR, r7
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
/*
-* We don't need an isync here because the upcoming mtmsrd is
-* execution synchronizing.
+* We don't need an isync here after restoring IAMR because the upcoming
+* mtmsrd is execution synchronizing.
 */
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
-- 
2.20.1

Applied "ASoC: fsl_micfil: Omit superfluous error message in fsl_micfil_probe()" to the asoc tree

2020-04-15 Thread Mark Brown

The patch

   ASoC: fsl_micfil: Omit superfluous error message in fsl_micfil_probe()

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 83b35f4586e235bfb785a7947b555ad8f3d96887 Mon Sep 17 00:00:00 2001
From: Tang Bin 
Date: Wed, 15 Apr 2020 12:45:13 +0800
Subject: [PATCH] ASoC: fsl_micfil: Omit superfluous error message in
 fsl_micfil_probe()

In the function fsl_micfil_probe(), when get irq failed, the function
platform_get_irq() logs an error message, so remove redundant message here.

Signed-off-by: Tang Bin 
Signed-off-by: Shengju Zhang 
Link: 
https://lore.kernel.org/r/20200415044513.17492-1-tang...@cmss.chinamobile.com
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/fsl_micfil.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl_micfil.c b/sound/soc/fsl/fsl_micfil.c
index f7f2d29f1bfe..e73bd6570a08 100644
--- a/sound/soc/fsl/fsl_micfil.c
+++ b/sound/soc/fsl/fsl_micfil.c
@@ -702,10 +702,8 @@ static int fsl_micfil_probe(struct platform_device *pdev)
for (i = 0; i < MICFIL_IRQ_LINES; i++) {
micfil->irq[i] = platform_get_irq(pdev, i);
dev_err(>dev, "GET IRQ: %d\n", micfil->irq[i]);
-   if (micfil->irq[i] < 0) {
-   dev_err(>dev, "no irq for node %s\n", pdev->name);
+   if (micfil->irq[i] < 0)
return micfil->irq[i];
-   }
}
 
if (of_property_read_bool(np, "fsl,shared-interrupt"))
-- 
2.20.1

[PATCH 4/5] powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr

Include "linux/of_address.h" to fix the compile error for
mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_cache_sram.c.

  CC  arch/powerpc/sysdev/fsl_85xx_l2ctlr.o
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function ‘mpc85xx_l2ctlr_of_probe’:
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit declaration of 
function ‘of_iomap’; did you mean ‘pci_iomap’? 
[-Werror=implicit-function-declaration]
  l2ctlr = of_iomap(dev->dev.of_node, 0);
   ^~~~
   pci_iomap
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes pointer 
from integer without a cast [-Werror=int-conversion]
  l2ctlr = of_iomap(dev->dev.of_node, 0);
 ^
cc1: all warnings being treated as errors
scripts/Makefile.build:267: recipe for target 
'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed
make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu 
---
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c 
b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
index 2d0af0c517bb..7533572492f0 100644
--- a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
+++ b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "fsl_85xx_cache_ctlr.h"
-- 
2.17.1

[PATCH 5/5] drivers: uio: new driver for fsl_85xx_cache_sram

A driver for freescale 85xx platforms to access the Cache-Sram form
user level. This is extremely helpful for some user-space applications
that require high performance memory accesses.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
 drivers/uio/Kconfig   |   8 ++
 drivers/uio/Makefile  |   1 +
 drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++
 3 files changed, 204 insertions(+)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 202ee81cfc2b..afd38ec13de0 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -105,6 +105,14 @@ config UIO_NETX
  To compile this driver as a module, choose M here; the module
  will be called uio_netx.
 
+config UIO_FSL_85XX_CACHE_SRAM
+   tristate "Freescale 85xx Cache-Sram driver"
+   depends on FSL_85XX_CACHE_SRAM
+   help
+ Generic driver for accessing the Cache-Sram form user level. This
+ is extremely helpful for some user-space applications that require
+ high performance memory accesses.
+
 config UIO_FSL_ELBC_GPCM
tristate "eLBC/GPCM driver"
depends on FSL_LBC
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index c285dd2a4539..be2056cffc21 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)+= uio_netx.o
 obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
 obj-$(CONFIG_UIO_MF624) += uio_mf624.o
 obj-$(CONFIG_UIO_FSL_ELBC_GPCM)+= uio_fsl_elbc_gpcm.o
+obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)  += uio_fsl_85xx_cache_sram.o
 obj-$(CONFIG_UIO_HV_GENERIC)   += uio_hv_generic.o
diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c 
b/drivers/uio/uio_fsl_85xx_cache_sram.c
new file mode 100644
index ..e11202dd5b93
--- /dev/null
+++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
+ * Copyright (C) 2020 Wang Wenhu 
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_VERSION "0.1.0"
+#define DRIVER_NAME"uio_fsl_85xx_cache_sram"
+#define UIO_NAME   "uio_cache_sram"
+
+static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
+   {   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
+   {},
+};
+
+static void uio_info_free_internal(struct uio_info *info)
+{
+   struct uio_mem *uiomem = >mem[0];
+
+   while (uiomem < >mem[MAX_UIO_MAPS]) {
+   if (uiomem->size) {
+   mpc85xx_cache_sram_free(uiomem->internal_addr);
+   kfree(uiomem->name);
+   }
+   uiomem++;
+   }
+}
+
+static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev)
+{
+   struct device_node *parent = pdev->dev.of_node;
+   struct device_node *node = NULL;
+   struct uio_info *info;
+   struct uio_mem *uiomem;
+   const char *dt_name;
+   u32 mem_size;
+   u32 align;
+   void *virt;
+   phys_addr_t phys;
+   int ret = -ENODEV;
+
+   /* alloc uio_info for one device */
+   info = kzalloc(sizeof(*info), GFP_KERNEL);
+   if (!info) {
+   dev_err(>dev, "kzalloc uio_info failed\n");
+   ret = -ENOMEM;
+   goto

[PATCH 3/5] powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram

Function instantiate_cache_sram should not be linked into the init
section for its caller mpc85xx_l2ctlr_of_probe is none-__init.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu 

Warning information:
  MODPOST vmlinux.o
WARNING: modpost: vmlinux.o(.text+0x1e540): Section mismatch in reference from 
the function mpc85xx_l2ctlr_of_probe() to the function 
.init.text:instantiate_cache_sram()
The function mpc85xx_l2ctlr_of_probe() references
the function __init instantiate_cache_sram().
This is often because mpc85xx_l2ctlr_of_probe lacks a __init
annotation or the annotation of instantiate_cache_sram is wrong.
---
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c 
b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
index be3aef4229d7..3de5ac8382c0 100644
--- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
+++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
@@ -68,7 +68,7 @@ void mpc85xx_cache_sram_free(void *ptr)
 }
 EXPORT_SYMBOL(mpc85xx_cache_sram_free);
 
-int __init instantiate_cache_sram(struct platform_device *dev,
+int instantiate_cache_sram(struct platform_device *dev,
struct sram_parameters sram_params)
 {
int ret = 0;
-- 
2.17.1

[PATCH 2/5] powerpc: sysdev: fix compile error for fsl_85xx_cache_sram

Include linux/io.h into fsl_85xx_cache_sram.c to fix the
implicit-declaration compile error when building Cache-Sram.

arch/powerpc/sysdev/fsl_85xx_cache_sram.c: In function ‘instantiate_cache_sram’:
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:26: error: implicit declaration of 
function ‘ioremap_coherent’; did you mean ‘bitmap_complement’? 
[-Werror=implicit-function-declaration]
  cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
  ^~~~
  bitmap_complement
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:24: error: assignment makes 
pointer from integer without a cast [-Werror=int-conversion]
  cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
^
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:123:2: error: implicit declaration of 
function ‘iounmap’; did you mean ‘roundup’? 
[-Werror=implicit-function-declaration]
  iounmap(cache_sram->base_virt);
  ^~~
  roundup
cc1: all warnings being treated as errors

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu 
---
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c 
b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
index f6c665dac725..be3aef4229d7 100644
--- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
+++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fsl_85xx_cache_ctlr.h"
 
-- 
2.17.1

[PATCH 0/5] drivers: uio: new driver uio_fsl_85xx_cache_sram

This series add a new uio driver for freescale 85xx platforms to
access the Cache-Sram form user level. This is extremely helpful
for the user-space applications that require high performance memory
accesses.

It fixes the compile errors and warning of the hardware level drivers
and implements the uio driver in uio_fsl_85xx_cache_sram.c.

Wang Wenhu (5):
  powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
  powerpc: sysdev: fix compile error for fsl_85xx_cache_sram
  powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram
  powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr
  drivers: uio: new driver for fsl_85xx_cache_sram

 arch/powerpc/platforms/85xx/Kconfig   |   2 +-
 arch/powerpc/platforms/Kconfig.cputype|   5 +-
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c |   3 +-
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c |   1 +
 drivers/uio/Kconfig   |   8 +
 drivers/uio/Makefile  |   1 +
 drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++
 7 files changed, 211 insertions(+), 4 deletions(-)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

-- 
2.17.1

[PATCH 1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
could be configured and used as a piece of SRAM which is hignly
friendly for some user level application performances.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
 arch/powerpc/platforms/85xx/Kconfig| 2 +-
 arch/powerpc/platforms/Kconfig.cputype | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index fa3d29dcb57e..6debb4f1b9cc 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -17,7 +17,7 @@ if FSL_SOC_BOOKE
 if PPC32
 
 config FSL_85XX_CACHE_SRAM
-   bool
+   bool "Freescale 85xx Cache-Sram"
select PPC_LIB_RHEAP
help
  When selected, this option enables cache-sram support
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c3c1902135c..1921e9a573e8 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 config PPC32
-   bool
+   bool "32-bit kernel"
default y if !PPC64
select KASAN_VMALLOC if KASAN && MODULES
 
@@ -15,6 +15,7 @@ config PPC_BOOK3S_32
bool
 
 menu "Processor support"
+
 choice
prompt "Processor Type"
depends on PPC32
@@ -211,9 +212,9 @@ config PPC_BOOK3E
depends on PPC_BOOK3E_64
 
 config E500
+   bool "e500 Support"
select FSL_EMB_PERFMON
select PPC_FSL_BOOK3E
-   bool
 
 config PPC_E500MC
bool "e500mc Support"
-- 
2.17.1

[PATCH AUTOSEL 4.9 06/21] powerpc/maple: Fix declaration made after definition

From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index b7f937563827d..d1fee2d35b49c 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -299,23 +299,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -372,3 +355,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1

[PATCH AUTOSEL 4.14 10/30] powerpc/maple: Fix declaration made after definition

From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index b7f937563827d..d1fee2d35b49c 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -299,23 +299,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -372,3 +355,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1

[PATCH AUTOSEL 4.19 10/40] powerpc/maple: Fix declaration made after definition

From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index b7f937563827d..d1fee2d35b49c 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -299,23 +299,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -372,3 +355,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1

[PATCH AUTOSEL 5.4 32/84] powerpc/maple: Fix declaration made after definition

From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index 9cd6f3e1000b3..09a0594350b69 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -294,23 +294,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -367,3 +350,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1

[PATCH AUTOSEL 5.4 31/84] powerpc/prom_init: Pass the "os-term" message to hypervisor

From: Alexey Kardashevskiy 

[ Upstream commit 74bb84e5117146fa73eb9d01305975c53022b3c3 ]

The "os-term" RTAS calls has one argument with a message address of OS
termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill
args correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200312074404.87293-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/prom_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index eba9d4ee4baf6..689664cd4e79b 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1761,6 +1761,9 @@ static void __init prom_rtas_os_term(char *str)
if (token == 0)
prom_panic("Could not get token for ibm,os-term\n");
os_term_args.token = cpu_to_be32(token);
+   os_term_args.nargs = cpu_to_be32(1);
+   os_term_args.nret = cpu_to_be32(1);
+   os_term_args.args[0] = cpu_to_be32(__pa(str));
prom_rtas_hcall((uint64_t)_term_args);
 }
 #endif /* CONFIG_PPC_SVM */
-- 
2.20.1

[PATCH AUTOSEL 5.4 21/84] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests

From: Michael Roth 

[ Upstream commit 1f50cc1705350a4697923203fedd7d8fb1087fe2 ]

The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
via the guest/nested hypervisor.

  ./run-tests.sh -v
  ...
  TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 
2,threads=2 -machine cap-htm=on -append "h_cede_tm"
  FAIL h_cede_tm (2 tests, 1 unexpected failures)

While the test relates to transactional memory instructions, the actual
failure is due to the return code of the H_CEDE hypercall, which is
reported as 224 instead of 0. This happens even when no TM instructions
are issued.

224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
is where the caller expects the return code to be placed upon return.

In the case of guest running under a nested hypervisor, issuing H_CEDE
causes a return from H_ENTER_NESTED. In this case H_CEDE is
specially-handled immediately rather than later in
kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to
set the return code for the caller, hence why kvm-unit-test sees the
224 return code and reports an error.

Guest kernels generally don't check the return value of H_CEDE, so
that likely explains why this hasn't caused issues outside of
kvm-unit-tests so far.

Fix this by setting r3 to 0 after we finish processing the H_CEDE.

RHBZ: 1778556

Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when 
nested")
Cc: linuxppc-...@ozlabs.org
Cc: David Gibson 
Cc: Paul Mackerras 
Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 36abbe3c346df..e2183fed947d4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3623,6 +3623,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
kvmppc_nested_cede(vcpu);
+   kvmppc_set_gpr(vcpu, 3, 0);
trap = 0;
}
} else {
-- 
2.20.1

[PATCH AUTOSEL 5.5 046/106] powerpc/maple: Fix declaration made after definition

From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index 9cd6f3e1000b3..09a0594350b69 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -294,23 +294,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -367,3 +350,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1

[PATCH AUTOSEL 5.5 045/106] powerpc/prom_init: Pass the "os-term" message to hypervisor

From: Alexey Kardashevskiy 

[ Upstream commit 74bb84e5117146fa73eb9d01305975c53022b3c3 ]

The "os-term" RTAS calls has one argument with a message address of OS
termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill
args correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200312074404.87293-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/prom_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 577345382b23f..673f13b87db13 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1773,6 +1773,9 @@ static void __init prom_rtas_os_term(char *str)
if (token == 0)
prom_panic("Could not get token for ibm,os-term\n");
os_term_args.token = cpu_to_be32(token);
+   os_term_args.nargs = cpu_to_be32(1);
+   os_term_args.nret = cpu_to_be32(1);
+   os_term_args.args[0] = cpu_to_be32(__pa(str));
prom_rtas_hcall((uint64_t)_term_args);
 }
 #endif /* CONFIG_PPC_SVM */
-- 
2.20.1

[PATCH AUTOSEL 5.5 031/106] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests

From: Michael Roth 

[ Upstream commit 1f50cc1705350a4697923203fedd7d8fb1087fe2 ]

The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
via the guest/nested hypervisor.

  ./run-tests.sh -v
  ...
  TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 
2,threads=2 -machine cap-htm=on -append "h_cede_tm"
  FAIL h_cede_tm (2 tests, 1 unexpected failures)

While the test relates to transactional memory instructions, the actual
failure is due to the return code of the H_CEDE hypercall, which is
reported as 224 instead of 0. This happens even when no TM instructions
are issued.

224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
is where the caller expects the return code to be placed upon return.

In the case of guest running under a nested hypervisor, issuing H_CEDE
causes a return from H_ENTER_NESTED. In this case H_CEDE is
specially-handled immediately rather than later in
kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to
set the return code for the caller, hence why kvm-unit-test sees the
224 return code and reports an error.

Guest kernels generally don't check the return value of H_CEDE, so
that likely explains why this hasn't caused issues outside of
kvm-unit-tests so far.

Fix this by setting r3 to 0 after we finish processing the H_CEDE.

RHBZ: 1778556

Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when 
nested")
Cc: linuxppc-...@ozlabs.org
Cc: David Gibson 
Cc: Paul Mackerras 
Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ef6aa63b071b3..a1d793b96d2b7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3628,6 +3628,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
kvmppc_nested_cede(vcpu);
+   kvmppc_set_gpr(vcpu, 3, 0);
trap = 0;
}
} else {
-- 
2.20.1

[PATCH AUTOSEL 5.6 053/129] powerpc/prom_init: Pass the "os-term" message to hypervisor

From: Alexey Kardashevskiy 

[ Upstream commit 74bb84e5117146fa73eb9d01305975c53022b3c3 ]

The "os-term" RTAS calls has one argument with a message address of OS
termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill
args correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200312074404.87293-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/prom_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 577345382b23f..673f13b87db13 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1773,6 +1773,9 @@ static void __init prom_rtas_os_term(char *str)
if (token == 0)
prom_panic("Could not get token for ibm,os-term\n");
os_term_args.token = cpu_to_be32(token);
+   os_term_args.nargs = cpu_to_be32(1);
+   os_term_args.nret = cpu_to_be32(1);
+   os_term_args.args[0] = cpu_to_be32(__pa(str));
prom_rtas_hcall((uint64_t)_term_args);
 }
 #endif /* CONFIG_PPC_SVM */
-- 
2.20.1

[PATCH AUTOSEL 5.6 054/129] powerpc/maple: Fix declaration made after definition

From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index 6f019df37916f..15b2c6eb506d0 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -291,23 +291,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -364,3 +347,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1

[PATCH AUTOSEL 5.6 039/129] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests

From: Michael Roth 

[ Upstream commit 1f50cc1705350a4697923203fedd7d8fb1087fe2 ]

The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
via the guest/nested hypervisor.

  ./run-tests.sh -v
  ...
  TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 
2,threads=2 -machine cap-htm=on -append "h_cede_tm"
  FAIL h_cede_tm (2 tests, 1 unexpected failures)

While the test relates to transactional memory instructions, the actual
failure is due to the return code of the H_CEDE hypercall, which is
reported as 224 instead of 0. This happens even when no TM instructions
are issued.

224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
is where the caller expects the return code to be placed upon return.

In the case of guest running under a nested hypervisor, issuing H_CEDE
causes a return from H_ENTER_NESTED. In this case H_CEDE is
specially-handled immediately rather than later in
kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to
set the return code for the caller, hence why kvm-unit-test sees the
224 return code and reports an error.

Guest kernels generally don't check the return value of H_CEDE, so
that likely explains why this hasn't caused issues outside of
kvm-unit-tests so far.

Fix this by setting r3 to 0 after we finish processing the H_CEDE.

RHBZ: 1778556

Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when 
nested")
Cc: linuxppc-...@ozlabs.org
Cc: David Gibson 
Cc: Paul Mackerras 
Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2cefd071b8483..c0c43a7338304 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3616,6 +3616,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
kvmppc_nested_cede(vcpu);
+   kvmppc_set_gpr(vcpu, 3, 0);
trap = 0;
}
} else {
-- 
2.20.1

Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line

2020-04-15 Thread Alexey Kardashevskiy

On 15/04/2020 16:18, Christoph Hellwig wrote:
> On Wed, Apr 15, 2020 at 12:26:04PM +1000, Alexey Kardashevskiy wrote:
>> May be this is correct and allowed (no idea) but removing exported
>> symbols at least deserves a mention in the commit log, does not it?
>>
>> The rest of the series is fine and works. Thanks,
> 
> Maybe I can throw in a line, but the point is that dma_direct_*
> was exported as dma_* called them inline.  Now dma_* is out of line
> and exported instead, which always was the actual API.

They become inline in 2/4.

And the fact they were exported leaves possibility that there is a
driver somewhere relying on these symbols or distro kernel won't build
because the symbol disappeared from exports (I do not know what KABI
guarantees or if mainline kernel cares). I do not care in particular but
some might, a line separated with empty lines in the commit log would do.

-- 
Alexey

Re: [PATCH] i2c: powermac: Simplify reading the "reg" and "i2c-address" property

2020-04-15 Thread Wolfram Sang

On Wed, Apr 08, 2020 at 03:33:53PM +0530, Aishwarya R wrote:
> Use of_property_read_u32 to read the "reg" and "i2c-address" property
> instead of using of_get_property to check the return values.
> 
> Signed-off-by: Aishwarya R 

This is quite a fragile driver. Have you tested it on HW?



signature.asc
Description: PGP signature

Re: [PATCH v2 4/4] mm/vmalloc: Hugepage vmalloc mappings

2020-04-15 Thread Will Deacon

Hi Nick,

On Mon, Apr 13, 2020 at 10:53:03PM +1000, Nicholas Piggin wrote:
> For platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmap mappings,
> have vmalloc attempt to allocate PMD-sized pages first, before falling back
> to small pages. Allocations which use something other than PAGE_KERNEL
> protections are not permitted to use huge pages yet, not all callers expect
> this (e.g., module allocations vs strict module rwx).
> 
> This gives a 6x reduction in dTLB misses for a `git diff` (of linux), from
> 45600 to 6500 and a 2.2% reduction in cycles on a 2-node POWER9.

I wonder if it's worth extending vmap() to handle higher order pages in
a similar way? That might be helpful for tracing PMUs such as Arm SPE,
where the CPU streams tracing data out to a virtually addressed buffer
(see rb_alloc_aux_page()).

> This can result in more internal fragmentation and memory overhead for a
> given allocation. It can also cause greater NUMA unbalance on hashdist
> allocations.
> 
> There may be other callers that expect small pages under vmalloc but use
> PAGE_KERNEL, I'm not sure if it's feasible to catch them all. An
> alternative would be a new function or flag which enables large mappings,
> and use that in callers.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  include/linux/vmalloc.h |   2 +
>  mm/vmalloc.c| 135 +---
>  2 files changed, 102 insertions(+), 35 deletions(-)
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 291313a7e663..853b82eac192 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -24,6 +24,7 @@ struct notifier_block;  /* in notifier.h */
>  #define VM_UNINITIALIZED 0x0020  /* vm_struct is not fully 
> initialized */
>  #define VM_NO_GUARD  0x0040  /* don't add guard page */
>  #define VM_KASAN 0x0080  /* has allocated kasan shadow 
> memory */
> +#define VM_HUGE_PAGES0x0100  /* may use huge pages */

Please can you add a check for this in the arm64 change_memory_common()
code? Other architectures might need something similar, but we need to
forbid changing memory attributes for portions of the huge page.

In general, I'm a bit wary of software table walkers tripping over this.
For example, I don't think apply_to_existing_page_range() can handle
huge mappings at all, but the one user (KASAN) only ever uses page mappings
so it's ok there.

> @@ -2325,9 +2356,11 @@ static struct vm_struct *__get_vm_area_node(unsigned 
> long size,
>   if (unlikely(!size))
>   return NULL;
>  
> - if (flags & VM_IOREMAP)
> - align = 1ul << clamp_t(int, get_count_order_long(size),
> -PAGE_SHIFT, IOREMAP_MAX_ORDER);
> + if (flags & VM_IOREMAP) {
> + align = max(align,
> + 1ul << clamp_t(int, get_count_order_long(size),
> +PAGE_SHIFT, IOREMAP_MAX_ORDER));
> + }


I don't follow this part. Please could you explain why you're potentially
aligning above IOREMAP_MAX_ORDER? It doesn't seem to follow from the rest
of the patch.

Cheers,

Will

Re: [PATCH v2] Fix: buffer overflow during hvc_alloc().

2020-04-15 Thread Greg KH

On Tue, Apr 14, 2020 at 10:15:03PM +0300, and...@daynix.com wrote:
> From: Andrew Melnychenko 
> 
> If there is a lot(more then 16) of virtio-console devices
> or virtio_console module is reloaded
> - buffers 'vtermnos' and 'cons_ops' are overflowed.
> In older kernels it overruns spinlock which leads to kernel freezing:
> https://bugzilla.redhat.com/show_bug.cgi?id=1786239
> 
> To reproduce the issue, you can try simple script that
> loads/unloads module. Something like this:
> while [ 1 ]
> do
>   modprobe virtio_console
>   sleep 2
>   modprobe -r virtio_console
>   sleep 2
> done
> 
> Description of problem:
> Guest get 'Call Trace' when loading module "virtio_console"
> and unloading it frequently - clearly reproduced on kernel-4.18.0:
> 
> [   81.498208] [ cut here ]
> [   81.499263] pvqspinlock: lock 0x92080020 has corrupted value 
> 0xc0774ca0!
> [   81.501000] WARNING: CPU: 0 PID: 785 at 
> kernel/locking/qspinlock_paravirt.h:500 
> __pv_queued_spin_unlock_slowpath+0xc0/0xd0
> [   81.503173] Modules linked in: virtio_console fuse xt_CHECKSUM 
> ipt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nft_objref 
> nf_conntrack_tftp tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 
> nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct 
> nf_tables_set nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 
> nft_chain_route_ipv6 nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 
> nf_nat_ipv4 nf_nat nf_conntrack nft_chain_route_ipv4 ip6_tables nft_compat 
> ip_set nf_tables nfnetlink sunrpc bochs_drm drm_vram_helper ttm 
> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_piix4 
> pcspkr crct10dif_pclmul crc32_pclmul joydev ghash_clmulni_intel ip_tables xfs 
> libcrc32c sd_mod sg ata_generic ata_piix virtio_net libata crc32c_intel 
> net_failover failover serio_raw virtio_scsi dm_mirror dm_region_hash dm_log 
> dm_mod [last unloaded: virtio_console]
> [   81.517019] CPU: 0 PID: 785 Comm: kworker/0:2 Kdump: loaded Not tainted 
> 4.18.0-167.el8.x86_64 #1
> [   81.518639] Hardware name: Red Hat KVM, BIOS 
> 1.12.0-5.scrmod+el8.2.0+5159+d8aa4d83 04/01/2014
> [   81.520205] Workqueue: events control_work_handler [virtio_console]
> [   81.521354] RIP: 0010:__pv_queued_spin_unlock_slowpath+0xc0/0xd0
> [   81.522450] Code: 07 00 48 63 7a 10 e8 bf 64 f5 ff 66 90 c3 8b 05 e6 cf d6 
> 01 85 c0 74 01 c3 8b 17 48 89 fe 48 c7 c7 38 4b 29 91 e8 3a 6c fa ff <0f> 0b 
> c3 0f 0b 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48
> [   81.525830] RSP: 0018:b51a01ffbd70 EFLAGS: 00010282
> [   81.526798] RAX:  RBX: 0010 RCX: 
> 
> [   81.528110] RDX: 9e66f1826480 RSI: 9e66f1816a08 RDI: 
> 9e66f1816a08
> [   81.529437] RBP: 9153ff10 R08: 026c R09: 
> 0053
> [   81.530732] R10:  R11: b51a01ffbc18 R12: 
> 9e66cd682200
> [   81.532133] R13: 9153ff10 R14: 9e6685569500 R15: 
> 9e66cd682000
> [   81.533442] FS:  () GS:9e66f180() 
> knlGS:
> [   81.534914] CS:  0010 DS:  ES:  CR0: 80050033
> [   81.535971] CR2: 5624c55b14d0 CR3: 0003a023c000 CR4: 
> 003406f0
> [   81.537283] Call Trace:
> [   81.537763]  __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
> [   81.539011]  .slowpath+0x9/0xe
> [   81.539585]  hvc_alloc+0x25e/0x300
> [   81.540237]  init_port_console+0x28/0x100 [virtio_console]
> [   81.541251]  handle_control_message.constprop.27+0x1c4/0x310 
> [virtio_console]
> [   81.542546]  control_work_handler+0x70/0x10c [virtio_console]
> [   81.543601]  process_one_work+0x1a7/0x3b0
> [   81.544356]  worker_thread+0x30/0x390
> [   81.545025]  ? create_worker+0x1a0/0x1a0
> [   81.545749]  kthread+0x112/0x130
> [   81.546358]  ? kthread_flush_work_fn+0x10/0x10
> [   81.547183]  ret_from_fork+0x22/0x40
> [   81.547842] ---[ end trace aa97649bd16c8655 ]---
> [   83.546539] general protection fault:  [#1] SMP NOPTI
> [   83.547422] CPU: 5 PID: 3225 Comm: modprobe Kdump: loaded Tainted: G   
>  W- -  - 4.18.0-167.el8.x86_64 #1
> [   83.549191] Hardware name: Red Hat KVM, BIOS 
> 1.12.0-5.scrmod+el8.2.0+5159+d8aa4d83 04/01/2014
> [   83.550544] RIP: 0010:__pv_queued_spin_lock_slowpath+0x19a/0x2a0
> [   83.551504] Code: c4 c1 ea 12 41 be 01 00 00 00 4c 8d 6d 14 41 83 e4 03 8d 
> 42 ff 49 c1 e4 05 48 98 49 81 c4 40 a5 02 00 4c 03 24 c5 60 48 34 91 <49> 89 
> 2c 24 b8 00 80 00 00 eb 15 84 c0 75 0a 41 0f b6 54 24 14 84
> [   83.554449] RSP: 0018:b51a0323fdb0 EFLAGS: 00010202
> [   83.555290] RAX: 301c RBX: 92080020 RCX: 
> 0001
> [   83.556426] RDX: 301d RSI:  RDI: 
> 
> [   83.557556] RBP: 9e66f196a540 R08: 028a R09: 
> 9e66d2757788
> [   83.558688] R10:  R11:

[PATCH] powerpc/8xx: Reduce time spent in allow_user_access() and friends

To enable/disable kernel access to user space, the 8xx has to
modify the properties of access group 1. This is done by writing
predefined values into SPRN_Mx_AP registers.

As of today, a __put_user() gives:

0d64 :
 d64:   3d 20 4f ff lis r9,20479
 d68:   61 29 ff ff ori r9,r9,65535
 d6c:   7d 3a c3 a6 mtspr   794,r9
 d70:   39 20 00 00 li  r9,0
 d74:   90 83 00 00 stw r4,0(r3)
 d78:   3d 20 6f ff lis r9,28671
 d7c:   61 29 ff ff ori r9,r9,65535
 d80:   7d 3a c3 a6 mtspr   794,r9
 d84:   4e 80 00 20 blr

Because only groups 0 and 1 are used, the definition of
groups 2 to 15 doesn't matter.
By setting unused bits to 0 instead on 1, one instruction is
removed for each lock and unlock action:

0d5c :
 d5c:   3d 20 40 00 lis r9,16384
 d60:   7d 3a c3 a6 mtspr   794,r9
 d64:   39 20 00 00 li  r9,0
 d68:   90 83 00 00 stw r4,0(r3)
 d6c:   3d 20 60 00 lis r9,24576
 d70:   7d 3a c3 a6 mtspr   794,r9
 d74:   4e 80 00 20 blr

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index 76af5b0cb16e..6aa3464a88ed 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -37,16 +37,16 @@
  * Therefore, we define 2 APG groups. lsb is _PMD_USER
  * 0 => Kernel => 01 (all accesses performed according to page definition)
  * 1 => User => 00 (all accesses performed as supervisor iaw page definition)
- * 2-16 => NA => 11 (all accesses performed as user iaw page definition)
+ * 2-15 => Not Used
  */
-#define MI_APG_INIT0x4fff
+#define MI_APG_INIT0x4000
 
 /*
  * 0 => Kernel => 01 (all accesses performed according to page definition)
  * 1 => User => 10 (all accesses performed according to swaped page definition)
- * 2-16 => NA => 11 (all accesses performed as user iaw page definition)
+ * 2-15 => Not Used
  */
-#define MI_APG_KUEP0x6fff
+#define MI_APG_KUEP0x6000
 
 /* The effective page number register.  When read, contains the information
  * about the last instruction TLB miss.  When MI_RPN is written, bits in
@@ -117,16 +117,16 @@
  * Therefore, we define 2 APG groups. lsb is _PMD_USER
  * 0 => Kernel => 01 (all accesses performed according to page definition)
  * 1 => User => 00 (all accesses performed as supervisor iaw page definition)
- * 2-16 => NA => 11 (all accesses performed as user iaw page definition)
+ * 2-15 => Not Used
  */
-#define MD_APG_INIT0x4fff
+#define MD_APG_INIT0x4000
 
 /*
  * 0 => No user => 01 (all accesses performed according to page definition)
  * 1 => User => 10 (all accesses performed according to swaped page definition)
- * 2-16 => NA => 11 (all accesses performed as user iaw page definition)
+ * 2-15 => Not Used
  */
-#define MD_APG_KUAP0x6fff
+#define MD_APG_KUAP0x6000
 
 /* The effective page number register.  When read, contains the information
  * about the last instruction TLB miss.  When MD_RPN is written, bits in
-- 
2.25.0

[PATCH v2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

unsafe_put_user() is designed to take benefit of 'asm goto'.

Instead of using the standard __put_user() approach and branch
based on the returned error, use 'asm goto' and make the
exception code branch directly to the error label. There is
no code anymore in the fixup section.

This change significantly simplifies functions using
unsafe_put_user()

Small exemple of the benefit with the following code:

struct test {
u32 item1;
u16 item2;
u8 item3;
u64 item4;
};

int set_test_to_user(struct test __user *test, u32 item1, u16 item2, u8 item3, 
u64 item4)
{
unsafe_put_user(item1, >item1, failed);
unsafe_put_user(item2, >item2, failed);
unsafe_put_user(item3, >item3, failed);
unsafe_put_user(item4, >item4, failed);
return 0;
failed:
return -EFAULT;
}

Before the patch:

0be8 :
 be8:   39 20 00 00 li  r9,0
 bec:   90 83 00 00 stw r4,0(r3)
 bf0:   2f 89 00 00 cmpwi   cr7,r9,0
 bf4:   40 9e 00 38 bne cr7,c2c 
 bf8:   b0 a3 00 04 sth r5,4(r3)
 bfc:   2f 89 00 00 cmpwi   cr7,r9,0
 c00:   40 9e 00 2c bne cr7,c2c 
 c04:   98 c3 00 06 stb r6,6(r3)
 c08:   2f 89 00 00 cmpwi   cr7,r9,0
 c0c:   40 9e 00 20 bne cr7,c2c 
 c10:   90 e3 00 08 stw r7,8(r3)
 c14:   91 03 00 0c stw r8,12(r3)
 c18:   21 29 00 00 subfic  r9,r9,0
 c1c:   7d 29 49 10 subfe   r9,r9,r9
 c20:   38 60 ff f2 li  r3,-14
 c24:   7d 23 18 38 and r3,r9,r3
 c28:   4e 80 00 20 blr
 c2c:   38 60 ff f2 li  r3,-14
 c30:   4e 80 00 20 blr

 <.fixup>:
...
  b8:   39 20 ff f2 li  r9,-14
  bc:   48 00 00 00 b   bc <.fixup+0xbc>
bc: R_PPC_REL24 .text+0xbf0
  c0:   39 20 ff f2 li  r9,-14
  c4:   48 00 00 00 b   c4 <.fixup+0xc4>
c4: R_PPC_REL24 .text+0xbfc
  c8:   39 20 ff f2 li  r9,-14
  cc:   48 00 00 00 b   cc <.fixup+0xcc>
cc: R_PPC_REL24 .text+0xc08
  d0:   39 20 ff f2 li  r9,-14
  d4:   48 00 00 00 b   d4 <.fixup+0xd4>
d4: R_PPC_REL24 .text+0xc18

 <__ex_table>:
...
a0: R_PPC_REL32 .text+0xbec
a4: R_PPC_REL32 .fixup+0xb8
a8: R_PPC_REL32 .text+0xbf8
ac: R_PPC_REL32 .fixup+0xc0
b0: R_PPC_REL32 .text+0xc04
b4: R_PPC_REL32 .fixup+0xc8
b8: R_PPC_REL32 .text+0xc10
bc: R_PPC_REL32 .fixup+0xd0
c0: R_PPC_REL32 .text+0xc14
c4: R_PPC_REL32 .fixup+0xd0

After the patch:

0be8 :
 be8:   90 83 00 00 stw r4,0(r3)
 bec:   b0 a3 00 04 sth r5,4(r3)
 bf0:   98 c3 00 06 stb r6,6(r3)
 bf4:   90 e3 00 08 stw r7,8(r3)
 bf8:   91 03 00 0c stw r8,12(r3)
 bfc:   38 60 00 00 li  r3,0
 c00:   4e 80 00 20 blr
 c04:   38 60 ff f2 li  r3,-14
 c08:   4e 80 00 20 blr

 <__ex_table>:
...
a0: R_PPC_REL32 .text+0xbe8
a4: R_PPC_REL32 .text+0xc04
a8: R_PPC_REL32 .text+0xbec
ac: R_PPC_REL32 .text+0xc04
b0: R_PPC_REL32 .text+0xbf0
b4: R_PPC_REL32 .text+0xc04
b8: R_PPC_REL32 .text+0xbf4
bc: R_PPC_REL32 .text+0xc04
c0: R_PPC_REL32 .text+0xbf8
c4: R_PPC_REL32 .text+0xc04

Signed-off-by: Christophe Leroy 
---
v2:
- Grouped most __goto() macros together
- Removed stuff in .fixup section, referencing the error label
directly from the extable
- Using more flexible addressing in asm.
---
 arch/powerpc/include/asm/uaccess.h | 61 +-
 1 file changed, 52 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index dee71e9c7618..5d323e4f2ce1 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -93,12 +93,12 @@ static inline int __access_ok(unsigned long addr, unsigned 
long size,
 #define __get_user(x, ptr) \
__get_user_nocheck((x), (ptr), sizeof(*(ptr)), true)
 #define __put_user(x, ptr) \
-   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), true)
+   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
+#define __put_user_goto(x, ptr, label) \
+   __put_user_nocheck_goto((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), 
label)
 
 #define __get_user_allowed(x, ptr) \
__get_user_nocheck((x), (ptr), sizeof(*(ptr)), false)
-#define __put_user_allowed(x, ptr) \
-   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), 
false)
 
 #define __get_user_inatomic(x, ptr) \

[PATCH] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

At the time being, __put_user()/__get_user() and friends only use
register indirect with immediate index addressing, with the index
set to 0. Ex:

lwz reg1, 0(reg2)

Give the compiler the opportunity to use other adressing modes
whenever possible, to get more optimised code.

Hereunder is a small exemple:

struct test {
u32 item1;
u16 item2;
u8 item3;
u64 item4;
};

int set_test_user(struct test __user *from, struct test __user *to, int idx)
{
int err;
u32 item1;
u16 item2;
u8 item3;
u64 item4;

err = __get_user(item1, >item1);
err |= __get_user(item2, >item2);
err |= __get_user(item3, >item3);
err |= __get_user(item4, >item4);

err |= __put_user(item1, >item1);
err |= __put_user(item2, >item2);
err |= __put_user(item3, >item3);
err |= __put_user(item4, >item4);

return err;
}

Before the patch:

0df0 :
 df0:   94 21 ff f0 stwur1,-16(r1)
 df4:   39 40 00 00 li  r10,0
 df8:   93 c1 00 08 stw r30,8(r1)
 dfc:   93 e1 00 0c stw r31,12(r1)
 e00:   7d 49 53 78 mr  r9,r10
 e04:   80 a3 00 00 lwz r5,0(r3)
 e08:   38 e3 00 04 addir7,r3,4
 e0c:   7d 46 53 78 mr  r6,r10
 e10:   a0 e7 00 00 lhz r7,0(r7)
 e14:   7d 29 33 78 or  r9,r9,r6
 e18:   39 03 00 06 addir8,r3,6
 e1c:   7d 46 53 78 mr  r6,r10
 e20:   89 08 00 00 lbz r8,0(r8)
 e24:   7d 29 33 78 or  r9,r9,r6
 e28:   38 63 00 08 addir3,r3,8
 e2c:   7d 46 53 78 mr  r6,r10
 e30:   83 c3 00 00 lwz r30,0(r3)
 e34:   83 e3 00 04 lwz r31,4(r3)
 e38:   7d 29 33 78 or  r9,r9,r6
 e3c:   7d 43 53 78 mr  r3,r10
 e40:   90 a4 00 00 stw r5,0(r4)
 e44:   7d 29 1b 78 or  r9,r9,r3
 e48:   38 c4 00 04 addir6,r4,4
 e4c:   7d 43 53 78 mr  r3,r10
 e50:   b0 e6 00 00 sth r7,0(r6)
 e54:   7d 29 1b 78 or  r9,r9,r3
 e58:   38 e4 00 06 addir7,r4,6
 e5c:   7d 43 53 78 mr  r3,r10
 e60:   99 07 00 00 stb r8,0(r7)
 e64:   7d 23 1b 78 or  r3,r9,r3
 e68:   38 84 00 08 addir4,r4,8
 e6c:   93 c4 00 00 stw r30,0(r4)
 e70:   93 e4 00 04 stw r31,4(r4)
 e74:   7c 63 53 78 or  r3,r3,r10
 e78:   83 c1 00 08 lwz r30,8(r1)
 e7c:   83 e1 00 0c lwz r31,12(r1)
 e80:   38 21 00 10 addir1,r1,16
 e84:   4e 80 00 20 blr

After the patch:

0dbc :
 dbc:   39 40 00 00 li  r10,0
 dc0:   7d 49 53 78 mr  r9,r10
 dc4:   80 03 00 00 lwz r0,0(r3)
 dc8:   7d 48 53 78 mr  r8,r10
 dcc:   a1 63 00 04 lhz r11,4(r3)
 dd0:   7d 29 43 78 or  r9,r9,r8
 dd4:   7d 48 53 78 mr  r8,r10
 dd8:   88 a3 00 06 lbz r5,6(r3)
 ddc:   7d 29 43 78 or  r9,r9,r8
 de0:   7d 48 53 78 mr  r8,r10
 de4:   80 c3 00 08 lwz r6,8(r3)
 de8:   80 e3 00 0c lwz r7,12(r3)
 dec:   7d 29 43 78 or  r9,r9,r8
 df0:   7d 43 53 78 mr  r3,r10
 df4:   90 04 00 00 stw r0,0(r4)
 df8:   7d 29 1b 78 or  r9,r9,r3
 dfc:   7d 43 53 78 mr  r3,r10
 e00:   b1 64 00 04 sth r11,4(r4)
 e04:   7d 29 1b 78 or  r9,r9,r3
 e08:   7d 43 53 78 mr  r3,r10
 e0c:   98 a4 00 06 stb r5,6(r4)
 e10:   7d 23 1b 78 or  r3,r9,r3
 e14:   90 c4 00 08 stw r6,8(r4)
 e18:   90 e4 00 0c stw r7,12(r4)
 e1c:   7c 63 53 78 or  r3,r3,r10
 e20:   4e 80 00 20 blr

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/uaccess.h | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 2f500debae21..dee71e9c7618 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -114,7 +114,7 @@ extern long __put_user_bad(void);
  */
 #define __put_user_asm(x, addr, err, op)   \
__asm__ __volatile__(   \
-   "1: " op " %1,0(%2) # put_user\n"   \
+   "1: " op "%U2%X2 %1,%2  # put_user\n"   \
"2:\n"  \
".section .fixup,\"ax\"\n"  \
"3: li %0,%3\n" \
@@ -122,7 +122,7 @@ extern long __put_user_bad(void);
".previous\n"   \
EX_TABLE(1b, 3b)\
: "=r" (err)\
-   : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
+   : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))
 
 #ifdef __powerpc64__
 #define __put_user_asm2(x, ptr, retval)\
@@ -130,8 +130,8 @@ extern long

Re: [RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching





Le 15/04/2020 à 07:16, Christopher M Riedl a écrit :

On March 26, 2020 9:42 AM Christophe Leroy  wrote:

  
This patch fixes the RFC series identified below.

It fixes three points:
- Failure with CONFIG_PPC_KUAP
- Failure to write do to lack of DIRTY bit set on the 8xx
- Inadequaly complex WARN post verification

However, it has an impact on the CPU load. Here is the time
needed on an 8xx to run the ftrace selftests without and
with this series:
- Without CONFIG_STRICT_KERNEL_RWX  ==> 38 seconds
- With CONFIG_STRICT_KERNEL_RWX ==> 40 seconds
- With CONFIG_STRICT_KERNEL_RWX + this series   ==> 43 seconds

Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
Signed-off-by: Christophe Leroy 
---
  arch/powerpc/lib/code-patching.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index f156132e8975..4ccff427592e 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping 
*patch_mapping)
}
  
  	pte = mk_pte(page, pgprot);

+   pte = pte_mkdirty(pte);
set_pte_at(patching_mm, patching_addr, ptep, pte);
  
  	init_temp_mm(_mapping->temp_mm, patching_mm);

@@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, 
unsigned int instr)
(offset_in_page((unsigned long)addr) /
sizeof(unsigned int));
  
+	allow_write_to_user(patch_addr, sizeof(instr));

__patch_instruction(addr, instr, patch_addr);
+   prevent_write_to_user(patch_addr, sizeof(instr));



On radix we can map the page with PAGE_KERNEL protection which ends up
setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.

Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things
because this is a temporary kernel mapping which really isn't userspace in
the usual sense.


On the 8xx, that's pretty different.

The PTE doesn't control whether a page is user page or a kernel page. 
The only thing that is set in the PTE is whether a page is linked to a 
given PID or not.

PAGE_KERNEL tells that the page can be addressed with any PID.

The user access right is given by a kind of zone, which is in the PGD 
entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0. 
Every pages below PAGE_OFFSET are defined as belonging to zone 1.


By default, zone 0 can only be accessed by kernel, and zone 1 can only 
be accessed by user. When kernel wants to access zone 1, it temporarily 
changes properties of zone 1 to allow both kernel and user accesses.


So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel 
must unlock it to access it.



And this is more or less the same on hash/32. This is managed by segment 
registers. One segment register corresponds to a 256Mbytes area. Every 
pages below PAGE_OFFSET can only be read by default by kernel. Only user 
can write if the PTE allows it. When the kernel needs to write at an 
address below PAGE_OFFSET, it must change the segment properties in the 
corresponding segment register.


So, for both cases, if we want to have it local to a task while still 
allowing kernel access, it means we have to define a new special area 
between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone.


That looks complex to me for a small benefit, especially as 8xx is not 
SMP and neither are most of the hash/32 targets.


Christophe

Re: [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching