Re: [PATCH v5 4/6] powerpc/pseries: Consolidate different NUMA distance update code paths

2021-07-26 Thread David Gibson
On Tue, Jul 27, 2021 at 09:02:33AM +0530, Aneesh Kumar K.V wrote:
> David Gibson  writes:
> 
> > On Thu, Jul 22, 2021 at 12:37:46PM +0530, Aneesh Kumar K.V wrote:
> >> David Gibson  writes:
> >> 
> >> > On Mon, Jun 28, 2021 at 08:41:15PM +0530, Aneesh Kumar K.V wrote:
> 
> 
> 
> > 
> >> >
> >> >> +   nid = of_read_number([index], 1);
> >> >> +
> >> >> +   if (nid == 0x || nid >= nr_node_ids)
> >> >> +   nid = default_nid;
> >> >> +   if (nid > 0 && affinity_form == FORM1_AFFINITY) {
> >> >> +   int i;
> >> >> +   const __be32 *associativity;
> >> >> +
> >> >> +   index = lmb->aa_index * aa.array_sz;
> >> >> +   associativity = [index];
> >> >> +   /*
> >> >> +* lookup array associativity entries have 
> >> >> different format
> >> >> +* There is no length of the array as the first 
> >> >> element.
> >> >
> >> > The difference is very small, and this is not a hot path.  Couldn't
> >> > you reduce a chunk of code by prepending aa.array_sz, then re-using
> >> > __initialize_form1_numa_distance.  Or even making
> >> > __initialize_form1_numa_distance() take the length as a parameter.
> >> 
> >> The changes are small but confusing w.r.t how we look at the
> >> associativity-lookup-arrays. The way we interpret associativity array
> >> and associativity lookup array using primary_domain_index is different.
> >> Hence the '-1' in the node lookup here.
> >
> > They're really not, though.  It's exactly the same interpretation of
> > the associativity array itself - it's just that one of them has the
> > array prepended with a (redundant) length.  So you can make
> > __initialize_form1_numa_distance() work on the "bare" associativity
> > array, with a given length.  Here you call it with aa.array_sz as the
> > length, and in the other place you call it with prop[0] as the length.
> >
> >> 
> >>index = lmb->aa_index * aa.array_sz + primary_domain_index - 1;
> >>nid = of_read_number([index], 1);
> >> 
> >> 
> >> >
> >> >> +*/
> >> >> +   for (i = 0; i < max_associativity_domain_index; 
> >> >> i++) {
> >> >> +   const __be32 *entry;
> >> >> +
> >> >> +   entry = 
> >> >> [be32_to_cpu(distance_ref_points[i]) - 1];
> >> >
> >> > Does anywhere verify that distance_ref_points[i] <= aa.array_size for
> >> > every i?
> >> 
> >> We do check for 
> >> 
> >>if (primary_domain_index <= aa.array_sz &&
> >
> > Right, but that doesn't check the other distance_ref_points entries.
> > Not that there's any reason to have extra entries with Form2, but we
> > still don't want stray array accesses.
> 
> This is how the change looks. I am not convinced this makes it simpler.

It's not, but that's because the lookup_array_assoc flag is not needed...

> I will add that as the last patch and we can drop that if we find that
> not helpful? 
> 
> modified   arch/powerpc/mm/numa.c
> @@ -171,20 +171,31 @@ static void unmap_cpu_from_node(unsigned long cpu)
>  }
>  #endif /* CONFIG_HOTPLUG_CPU || CONFIG_PPC_SPLPAR */
>  
> -/*
> - * Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA
> - * info is found.
> - */
> -static int associativity_to_nid(const __be32 *associativity)
> +static int __associativity_to_nid(const __be32 *associativity,
> +   bool lookup_array_assoc,
> +   int max_array_index)
>  {
>   int nid = NUMA_NO_NODE;
> + int index;
>  
>   if (!numa_enabled)
>   goto out;
> + /*
> +  * ibm,associativity-lookup-array doesn't have element
> +  * count at the start of the associativity. Hence
> +  * decrement the primary_domain_index when used with
> +  * lookup-array associativity.
> +  */
> + if (lookup_array_assoc)
> + index = primary_domain_index - 1;
> + else {
> + index = primary_domain_index;
> + max_array_index = of_read_number(associativity, 1);
> + }
> + if (index > max_array_index)
> + goto out;

So, the associativity-array-with-length is exactly a length, followed
by an associativity-array-without-length.  What I was suggesting is
you make this function only take an
associativity-array-without-length, with the length passed separately.

Where you want to use it on an associativity-array-with-length, stored
in __be32 *awl, you just invoke it as:
associativity_to_nid(awl + 1, of_read_number(awl, 1));
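
The idea above can be sketched in plain C. This is only an illustration
under stated assumptions, not the kernel code: the arrays are host-endian
uint32_t rather than __be32, the demo_* names and the DEMO_INVALID_NID
value are hypothetical, and primary_domain_index / nr_node_ids are passed
as parameters instead of being globals. It also shows one possible way to
check every distance_ref_points entry against the array length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NO_NODE     (-1)     /* stand-in for NUMA_NO_NODE */
#define DEMO_INVALID_NID 0xffffu  /* assumption: placeholder "invalid node" marker */

/*
 * Works on a "bare" associativity array (no leading length word).
 * primary_domain_index is 1-based, as in the with-length layout.
 */
static int demo_assoc_to_nid(const uint32_t *assoc, uint32_t len,
                             uint32_t primary_domain_index,
                             uint32_t nr_node_ids)
{
    uint32_t nid;

    assert(assoc != NULL);

    if (primary_domain_index == 0 || primary_domain_index > len)
        return DEMO_NO_NODE;

    nid = assoc[primary_domain_index - 1];
    if (nid == DEMO_INVALID_NID || nid >= nr_node_ids)
        return DEMO_NO_NODE;

    return (int)nid;
}

/* Array-with-length: the first word is the length, the rest is the bare array. */
static int demo_assoc_with_len_to_nid(const uint32_t *awl,
                                      uint32_t primary_domain_index,
                                      uint32_t nr_node_ids)
{
    return demo_assoc_to_nid(awl + 1, awl[0], primary_domain_index,
                             nr_node_ids);
}

/* Returns 1 only if every (1-based) reference point fits inside the array. */
static int demo_ref_points_in_range(const uint32_t *ref, size_t n, uint32_t len)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (ref[i] == 0 || ref[i] > len)
            return 0;
    return 1;
}
```

With this shape the lookup-array caller would pass aa.array_sz as the
length, and no flag is needed to distinguish the two layouts.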

> - if (of_read_number(associativity, 1) >= primary_domain_index)
> - nid = of_read_number([primary_domain_index], 1);
> -
> + nid = of_read_number([index], 1);
>   /* POWER4 LPAR uses 0x as invalid node */
>   if (nid == 0x || nid >= nr_node_ids)
>   nid = NUMA_NO_NODE;
> @@ 

[PATCH] tests/nvdimm/ndtest: Simulate nvdimm health, DSC and smart-inject

2021-07-26 Thread Shivaprasad G Bhat
The 'papr_scm' module and 'papr' implementation in libndctl supports
PDSMs for reporting PAPR NVDIMM health, its dirty-shutdown-count and
injecting smart-error. This patch adds support for those PDSMs in
ndtest module so that PDSM specific paths in libndctl can be exercised.

Signed-off-by: Shivaprasad G Bhat 
---
The patch depends on the PAPR PDSM smart-inject payload definitions
added with the patch - 
https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg191337.html

 tools/testing/nvdimm/test/ndtest.c |  149 
 tools/testing/nvdimm/test/ndtest.h |7 ++
 2 files changed, 156 insertions(+)

diff --git a/tools/testing/nvdimm/test/ndtest.c 
b/tools/testing/nvdimm/test/ndtest.c
index 00ec2c213061..6622e8adbd11 100644
--- a/tools/testing/nvdimm/test/ndtest.c
+++ b/tools/testing/nvdimm/test/ndtest.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../watermark.h"
 #include "nfit_test.h"
@@ -49,6 +50,10 @@ static struct ndtest_dimm dimm_group1[] = {
.uuid_str = "1e5c75d2-b618-11ea-9aa3-507b9ddc0f72",
.physical_id = 0,
.num_formats = 2,
+   .flags = PAPR_PMEM_HEALTH_NON_CRITICAL,
+   .extension_flags = PDSM_DIMM_DSC_VALID | 
PDSM_DIMM_HEALTH_RUN_GAUGE_VALID,
+   .dimm_fuel_gauge = 95,
+   .dimm_dsc = 42,
},
{
.size = DIMM_SIZE,
@@ -56,6 +61,10 @@ static struct ndtest_dimm dimm_group1[] = {
.uuid_str = "1c4d43ac-b618-11ea-be80-507b9ddc0f72",
.physical_id = 1,
.num_formats = 2,
+   .flags = PAPR_PMEM_HEALTH_NON_CRITICAL,
+   .extension_flags = PDSM_DIMM_DSC_VALID | 
PDSM_DIMM_HEALTH_RUN_GAUGE_VALID,
+   .dimm_fuel_gauge = 95,
+   .dimm_dsc = 42,
},
{
.size = DIMM_SIZE,
@@ -63,6 +72,10 @@ static struct ndtest_dimm dimm_group1[] = {
.uuid_str = "a9f17ffc-b618-11ea-b36d-507b9ddc0f72",
.physical_id = 2,
.num_formats = 2,
+   .flags = PAPR_PMEM_HEALTH_NON_CRITICAL,
+   .extension_flags = PDSM_DIMM_DSC_VALID | 
PDSM_DIMM_HEALTH_RUN_GAUGE_VALID,
+   .dimm_fuel_gauge = 95,
+   .dimm_dsc = 42,
},
{
.size = DIMM_SIZE,
@@ -70,6 +83,10 @@ static struct ndtest_dimm dimm_group1[] = {
.uuid_str = "b6b83b22-b618-11ea-8aae-507b9ddc0f72",
.physical_id = 3,
.num_formats = 2,
+   .flags = PAPR_PMEM_HEALTH_NON_CRITICAL,
+   .extension_flags = PDSM_DIMM_DSC_VALID | 
PDSM_DIMM_HEALTH_RUN_GAUGE_VALID,
+   .dimm_fuel_gauge = 95,
+   .dimm_dsc = 42,
},
{
.size = DIMM_SIZE,
@@ -297,6 +314,103 @@ static int ndtest_get_config_size(struct ndtest_dimm 
*dimm, unsigned int buf_len
return 0;
 }
 
+static int ndtest_pdsm_health(struct ndtest_dimm *dimm,
+   union nd_pdsm_payload *payload,
+   unsigned int buf_len)
+{
+   struct nd_papr_pdsm_health *health = &payload->health;
+
+   if (buf_len < sizeof(health))
+   return -EINVAL;
+
+   health->extension_flags = 0;
+   health->dimm_unarmed = !!(dimm->flags & PAPR_PMEM_UNARMED_MASK);
+   health->dimm_bad_shutdown = !!(dimm->flags & 
PAPR_PMEM_BAD_SHUTDOWN_MASK);
+   health->dimm_bad_restore = !!(dimm->flags & PAPR_PMEM_BAD_RESTORE_MASK);
+   health->dimm_health = PAPR_PDSM_DIMM_HEALTHY;
+
+   if (dimm->flags & PAPR_PMEM_HEALTH_FATAL)
+   health->dimm_health = PAPR_PDSM_DIMM_FATAL;
+   else if (dimm->flags & PAPR_PMEM_HEALTH_CRITICAL)
+   health->dimm_health = PAPR_PDSM_DIMM_CRITICAL;
+   else if (dimm->flags & PAPR_PMEM_HEALTH_UNHEALTHY ||
+dimm->flags & PAPR_PMEM_HEALTH_NON_CRITICAL)
+   health->dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
+
+   health->extension_flags = 0;
+   if (dimm->extension_flags & PDSM_DIMM_HEALTH_RUN_GAUGE_VALID) {
+   health->dimm_fuel_gauge = dimm->dimm_fuel_gauge;
+   health->extension_flags |= PDSM_DIMM_HEALTH_RUN_GAUGE_VALID;
+   }
+   if (dimm->extension_flags & PDSM_DIMM_DSC_VALID) {
+   health->dimm_dsc = dimm->dimm_dsc;
+   health->extension_flags |= PDSM_DIMM_DSC_VALID;
+   }
+
+   return 0;
+}
+
+static void smart_notify(struct ndtest_dimm *dimm)
+{
+   struct device *bus = dimm->dev->parent;
+
+   if (!(dimm->flags & PAPR_PMEM_HEALTH_NON_CRITICAL) ||
+   (dimm->flags & PAPR_PMEM_BAD_SHUTDOWN_MASK)) {
+   device_lock(bus);
+   /* send smart notification */
+   if (dimm->notify_handle)
+   sysfs_notify_dirent(dimm->notify_handle);
+   device_unlock(bus);
+   }
+}
+
+static int 

Re: [PATCH v5 4/6] powerpc/pseries: Consolidate different NUMA distance update code paths

2021-07-26 Thread Aneesh Kumar K.V
David Gibson  writes:

> On Thu, Jul 22, 2021 at 12:37:46PM +0530, Aneesh Kumar K.V wrote:
>> David Gibson  writes:
>> 
>> > On Mon, Jun 28, 2021 at 08:41:15PM +0530, Aneesh Kumar K.V wrote:



> 
>> >
>> >> + nid = of_read_number([index], 1);
>> >> +
>> >> + if (nid == 0x || nid >= nr_node_ids)
>> >> + nid = default_nid;
>> >> + if (nid > 0 && affinity_form == FORM1_AFFINITY) {
>> >> + int i;
>> >> + const __be32 *associativity;
>> >> +
>> >> + index = lmb->aa_index * aa.array_sz;
>> >> + associativity = [index];
>> >> + /*
>> >> +  * lookup array associativity entries have different 
>> >> format
>> >> +  * There is no length of the array as the first element.
>> >
>> > The difference is very small, and this is not a hot path.  Couldn't
>> > you reduce a chunk of code by prepending aa.array_sz, then re-using
>> > __initialize_form1_numa_distance.  Or even making
>> > __initialize_form1_numa_distance() take the length as a parameter.
>> 
>> The changes are small but confusing w.r.t how we look at the
>> associativity-lookup-arrays. The way we interpret associativity array
>> and associativity lookup array using primary_domain_index is different.
>> Hence the '-1' in the node lookup here.
>
> They're really not, though.  It's exactly the same interpretation of
> the associativity array itself - it's just that one of them has the
> array prepended with a (redundant) length.  So you can make
> __initialize_form1_numa_distance() work on the "bare" associativity
> array, with a given length.  Here you call it with aa.array_sz as the
> length, and in the other place you call it with prop[0] as the length.
>
>> 
>>  index = lmb->aa_index * aa.array_sz + primary_domain_index - 1;
>>  nid = of_read_number([index], 1);
>> 
>> 
>> >
>> >> +  */
>> >> + for (i = 0; i < max_associativity_domain_index; i++) {
>> >> + const __be32 *entry;
>> >> +
>> >> + entry = 
>> >> [be32_to_cpu(distance_ref_points[i]) - 1];
>> >
>> > Does anywhere verify that distance_ref_points[i] <= aa.array_size for
>> > every i?
>> 
>> We do check for 
>> 
>>  if (primary_domain_index <= aa.array_sz &&
>
> Right, but that doesn't check the other distance_ref_points entries.
> Not that there's any reason to have extra entries with Form2, but we
> still don't want stray array accesses.

This is how the change looks. I am not convinced this makes it simpler.
I will add that as the last patch and we can drop that if we find that
not helpful? 

modified   arch/powerpc/mm/numa.c
@@ -171,20 +171,31 @@ static void unmap_cpu_from_node(unsigned long cpu)
 }
 #endif /* CONFIG_HOTPLUG_CPU || CONFIG_PPC_SPLPAR */
 
-/*
- * Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA
- * info is found.
- */
-static int associativity_to_nid(const __be32 *associativity)
+static int __associativity_to_nid(const __be32 *associativity,
+ bool lookup_array_assoc,
+ int max_array_index)
 {
int nid = NUMA_NO_NODE;
+   int index;
 
if (!numa_enabled)
goto out;
+   /*
+* ibm,associativity-lookup-array doesn't have element
+* count at the start of the associativity. Hence
+* decrement the primary_domain_index when used with
+* lookup-array associativity.
+*/
+   if (lookup_array_assoc)
+   index = primary_domain_index - 1;
+   else {
+   index = primary_domain_index;
+   max_array_index = of_read_number(associativity, 1);
+   }
+   if (index > max_array_index)
+   goto out;
 
-   if (of_read_number(associativity, 1) >= primary_domain_index)
-   nid = of_read_number([primary_domain_index], 1);
-
+   nid = of_read_number([index], 1);
/* POWER4 LPAR uses 0x as invalid node */
if (nid == 0x || nid >= nr_node_ids)
nid = NUMA_NO_NODE;
@@ -192,6 +203,15 @@ static int associativity_to_nid(const __be32 
*associativity)
return nid;
 }
 
+/*
+ * Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA
+ * info is found.
+ */
+static inline int associativity_to_nid(const __be32 *associativity)
+{
+   return __associativity_to_nid(associativity, false, 0);
+}
+
 static int __cpu_form2_relative_distance(__be32 *cpu1_assoc, __be32 
*cpu2_assoc)
 {
int dist;
@@ -295,19 +315,38 @@ int of_node_to_nid(struct device_node *device)
 }
 EXPORT_SYMBOL(of_node_to_nid);
 
-static void __initialize_form1_numa_distance(const __be32 *associativity)
+static void __initialize_form1_numa_distance(const __be32 *associativity,
+bool lookup_array_assoc,
+int 

Re: [PATCH v5 2/2] KVM: PPC: Book3S HV: Stop forwarding all HFUs to L1

2021-07-26 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of July 27, 2021 6:17 am:
> If the nested hypervisor has no access to a facility because it has
> been disabled by the host, it should also not be able to see the
> Hypervisor Facility Unavailable that arises from one of its guests
> trying to access the facility.
> 
> This patch turns a HFU that happened in L2 into a Hypervisor Emulation
> Assistance interrupt and forwards it to L1 for handling. The ones that
> happened because L1 explicitly disabled the facility for L2 are still
> let through, along with the corresponding Cause bits in the HFSCR.
> 
> Signed-off-by: Fabiano Rosas 
> Reviewed-by: Nicholas Piggin 
> ---
>  arch/powerpc/kvm/book3s_hv_nested.c | 32 +++--
>  1 file changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
> b/arch/powerpc/kvm/book3s_hv_nested.c
> index 8215dbd4be9a..d544b092b49a 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -99,7 +99,7 @@ static void byteswap_hv_regs(struct hv_guest_state *hr)
>   hr->dawrx1 = swab64(hr->dawrx1);
>  }
>  
> -static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
> +static void save_hv_return_state(struct kvm_vcpu *vcpu,
>struct hv_guest_state *hr)
>  {
>   struct kvmppc_vcore *vc = vcpu->arch.vcore;
> @@ -118,7 +118,7 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, 
> int trap,
>   hr->pidr = vcpu->arch.pid;
>   hr->cfar = vcpu->arch.cfar;
>   hr->ppr = vcpu->arch.ppr;
> - switch (trap) {
> + switch (vcpu->arch.trap) {
>   case BOOK3S_INTERRUPT_H_DATA_STORAGE:
>   hr->hdar = vcpu->arch.fault_dar;
>   hr->hdsisr = vcpu->arch.fault_dsisr;
> @@ -128,9 +128,29 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, 
> int trap,
>   hr->asdr = vcpu->arch.fault_gpa;
>   break;
>   case BOOK3S_INTERRUPT_H_FAC_UNAVAIL:
> - hr->hfscr = ((~HFSCR_INTR_CAUSE & hr->hfscr) |
> -  (HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
> - break;
> + {
> + u8 cause = vcpu->arch.hfscr >> 56;

Can this be u64 just to help gcc?

> +
> + WARN_ON_ONCE(cause >= BITS_PER_LONG);
> +
> + if (!(hr->hfscr & (1UL << cause))) {
> + hr->hfscr = ((~HFSCR_INTR_CAUSE & hr->hfscr) |
> +  (HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
> + break;
> + }
> +
> + /*
> +  * We have disabled this facility, so it does not
> +  * exist from L1's perspective. Turn it into a HEAI.
> +  */
> + vcpu->arch.trap = BOOK3S_INTERRUPT_H_EMUL_ASSIST;
> + kvmppc_load_last_inst(vcpu, INST_GENERIC, &vcpu->arch.emul_inst);

Hmm, this doesn't handle kvmppc_load_last_inst failure. Other code tends
to just resume the guest and retry in this case. Can we do that here?

> +
> + /* Don't leak the cause field */
> + hr->hfscr &= ~HFSCR_INTR_CAUSE;

This hunk also remains -- shouldn't change HFSCR for HEA, only HFAC.

Thanks,
Nick
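
For readers following the review, the decision being discussed can be
sketched as a small standalone function. This is a simplified model, not
the kernel code: the demo_* names are hypothetical, the cause field is
assumed to be the top byte of the register (suggested by the ">> 56" in
the patch), and the cause is held in a 64-bit variable as suggested in the
review. Treating an out-of-range cause as an emulation-assist case is an
arbitrary choice of this sketch. The emulation-assist path leaves the
register value untouched here, since the review questions clearing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the interrupt cause field occupies the top byte. */
#define DEMO_INTR_CAUSE (0xffULL << 56)

enum demo_trap { DEMO_TRAP_HFU, DEMO_TRAP_HEAI };

/*
 * hw_hfscr: value seen by the host when the facility interrupt hit.
 * l1_hfscr: value L1 asked for on behalf of L2; updated with the cause
 *           bits only when the interrupt is forwarded as an HFU.
 */
static enum demo_trap demo_classify_hfu(uint64_t hw_hfscr, uint64_t *l1_hfscr)
{
    uint64_t cause = hw_hfscr >> 56;

    assert(l1_hfscr != NULL);

    /* L1 itself disabled the facility for L2: forward the HFU as is. */
    if (cause < 64 && !(*l1_hfscr & (1ULL << cause))) {
        *l1_hfscr = (*l1_hfscr & ~DEMO_INTR_CAUSE) |
                    (hw_hfscr & DEMO_INTR_CAUSE);
        return DEMO_TRAP_HFU;
    }

    /* The host disabled it: L1 should see an emulation assist instead. */
    return DEMO_TRAP_HEAI;
}
```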



Re: Linux kernel: powerpc: KVM guest to host memory corruption

2021-07-26 Thread Michael Ellerman
Michael Ellerman  writes:
> The Linux kernel for powerpc since v3.10 has a bug which allows a malicious
> KVM guest to corrupt host memory.
>
> In the handling of the H_RTAS hypercall, args.rets is made to point into the
> args.args buffer which is located on the stack:
>
>   args.rets = &args.args[be32_to_cpu(args.nargs)];
>
> However args.nargs has not been range checked. That allows the guest to point
> args.rets anywhere up to +16GB from args.args.
>
> The guest does not have control of what is written to args.rets, it is always
> (u32)-3, because subsequent code does check nargs. Additionally the guest will
> be killed as a result of the nargs being out of range, so a given guest only
> has a single shot at corrupting memory.
>
> Only machines using Linux as the hypervisor, aka. KVM or bare metal, are
> affected by the bug.
>
> The bug was introduced in:
>
> 8e591cb72047 ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls")
>
> Which was first released in v3.10.
>
> The upstream fix is:
>
>   f62f3c20647e ("KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow")
>
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f62f3c20647ebd5fb6ecb8f0b477b9281c44c10a
>
> Which will be included in the v5.14 release.

This has been assigned CVE-2021-37576.

cheers
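
As an illustration of the class of bug described above, here is a minimal
sketch of range checking an index before forming a pointer from it. It is
not the upstream fix: the structure, the demo_* names and the buffer size
of 16 are assumptions made for the example, and values are host-endian.
The "+16GB" figure follows from a 32-bit index scaled by 4-byte elements
(2^32 * 4 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NARGS_MAX 16  /* assumption: size of args[] in this sketch */

struct demo_rtas_args {
    uint32_t token;
    uint32_t nargs;                 /* guest controlled */
    uint32_t nret;
    uint32_t args[DEMO_NARGS_MAX];
    uint32_t *rets;                 /* must point inside args[] */
};

/*
 * Point rets just past the input arguments, but only after checking that
 * nargs stays inside the buffer. Returns 0 on success, -1 if nargs is out
 * of range (rets is left untouched in that case).
 */
static int demo_fixup_rets(struct demo_rtas_args *a)
{
    uint32_t nargs;

    assert(a != NULL);

    nargs = a->nargs;
    if (nargs >= DEMO_NARGS_MAX)
        return -1;

    a->rets = &a->args[nargs];
    return 0;
}
```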


RE: [PATCH v3] soc: fsl: qe: convert QE interrupt controller to platform_device

2021-07-26 Thread Leo Li



> -Original Message-
> From: Maxim Kochetkov 
> Sent: Monday, July 26, 2021 12:22 AM
> To: linuxppc-dev@lists.ozlabs.org
> Cc: linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org;
> sarava...@google.com; Leo Li ; Qiang Zhao
> ; gre...@linuxfoundation.org; Maxim Kochetkov
> ; kernel test robot ; Dan Carpenter
> 
> Subject: [PATCH v3] soc: fsl: qe: convert QE interrupt controller to
> platform_device
> 
> Since 5.13 QE's ucc nodes can't get interrupts from devicetree:
> 
>   ucc@2000 {
>   cell-index = <1>;
>   reg = <0x2000 0x200>;
>   interrupts = <32>;
>   interrupt-parent = <>;
>   };
> 
> Now fw_devlink expects the driver to create and probe a struct device for
> the interrupt controller.
> 
> So let's convert this driver to a simple platform_device with probe().
> Also use platform_get_ and devm_ family functions to get/allocate resources
> and drop unused .compatible = "qeic".
> 
> [1] -
> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com
> Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")
> Fixes: ea718c699055 ("Revert "Revert "driver core: Set fw_devlink=on by
> default""")
> Signed-off-by: Maxim Kochetkov 
> Reported-by: kernel test robot 
> Reported-by: Dan Carpenter 
> ---
> Changes in v3:
>  - use .compatible = "qeic" again (Li Yang  asks to keep
> it)
> 
> Changes in v2:
>  - use devm_ family functions to allocate mem/resources
>  - use platform_get_ family functions to get resources/irqs
>  - drop unused .compatible = "qeic"
> 
>  drivers/soc/fsl/qe/qe_ic.c | 75 ++
>  1 file changed, 44 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c index
> 3f711c1a0996..54cabd2605dd 100644
> --- a/drivers/soc/fsl/qe/qe_ic.c
> +++ b/drivers/soc/fsl/qe/qe_ic.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -404,41 +405,40 @@ static void qe_ic_cascade_muxed_mpic(struct
> irq_desc *desc)
>   chip->irq_eoi(>irq_data);
>  }
> 
> -static void __init qe_ic_init(struct device_node *node)
> +static int qe_ic_init(struct platform_device *pdev)
>  {
> + struct device *dev = &pdev->dev;
>   void (*low_handler)(struct irq_desc *desc);
>   void (*high_handler)(struct irq_desc *desc);
>   struct qe_ic *qe_ic;
> - struct resource res;
> - u32 ret;
> + struct resource *res;
> + struct device_node *node = pdev->dev.of_node;
> 
> - ret = of_address_to_resource(node, 0, &res);
> - if (ret)
> - return;
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (res == NULL) {
> + dev_err(dev, "no memory resource defined\n");
> + return -ENODEV;
> + }
> 
> - qe_ic = kzalloc(sizeof(*qe_ic), GFP_KERNEL);
> + qe_ic = devm_kzalloc(dev, sizeof(*qe_ic), GFP_KERNEL);
>   if (qe_ic == NULL)
> - return;
> + return -ENOMEM;
> 
> - qe_ic->irqhost = irq_domain_add_linear(node, NR_QE_IC_INTS,
> -_ic_host_ops, qe_ic);
> - if (qe_ic->irqhost == NULL) {
> - kfree(qe_ic);
> - return;
> + qe_ic->regs = devm_ioremap(dev, res->start, resource_size(res));
> + if (qe_ic->regs == NULL) {
> + dev_err(dev, "failed to ioremap() registers\n");
> + return -ENODEV;
>   }
> 
> - qe_ic->regs = ioremap(res.start, resource_size(&res));
> -
>   qe_ic->hc_irq = qe_ic_irq_chip;
> 
> - qe_ic->virq_high = irq_of_parse_and_map(node, 0);
> - qe_ic->virq_low = irq_of_parse_and_map(node, 1);
> + qe_ic->virq_high = platform_get_irq(pdev, 0);
> + qe_ic->virq_low = platform_get_irq(pdev, 1);
> 
> - if (!qe_ic->virq_low) {
> - printk(KERN_ERR "Failed to map QE_IC low IRQ\n");
> - kfree(qe_ic);
> - return;
> + if (qe_ic->virq_low < 0) {
> + return -ENODEV;
>   }
> +
>   if (qe_ic->virq_high != qe_ic->virq_low) {
>   low_handler = qe_ic_cascade_low;
>   high_handler = qe_ic_cascade_high;
> @@ -447,6 +447,13 @@ static void __init qe_ic_init(struct device_node
> *node)
>   high_handler = NULL;
>   }
> 
> + qe_ic->irqhost = irq_domain_add_linear(node, NR_QE_IC_INTS,
> +_ic_host_ops, qe_ic);
> + if (qe_ic->irqhost == NULL) {
> + dev_err(dev, "failed to add irq domain\n");
> + return 

[PATCH v5 2/2] KVM: PPC: Book3S HV: Stop forwarding all HFUs to L1

2021-07-26 Thread Fabiano Rosas
If the nested hypervisor has no access to a facility because it has
been disabled by the host, it should also not be able to see the
Hypervisor Facility Unavailable that arises from one of its guests
trying to access the facility.

This patch turns a HFU that happened in L2 into a Hypervisor Emulation
Assistance interrupt and forwards it to L1 for handling. The ones that
happened because L1 explicitly disabled the facility for L2 are still
let through, along with the corresponding Cause bits in the HFSCR.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_nested.c | 32 +++--
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 8215dbd4be9a..d544b092b49a 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -99,7 +99,7 @@ static void byteswap_hv_regs(struct hv_guest_state *hr)
hr->dawrx1 = swab64(hr->dawrx1);
 }
 
-static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
+static void save_hv_return_state(struct kvm_vcpu *vcpu,
 struct hv_guest_state *hr)
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
@@ -118,7 +118,7 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int 
trap,
hr->pidr = vcpu->arch.pid;
hr->cfar = vcpu->arch.cfar;
hr->ppr = vcpu->arch.ppr;
-   switch (trap) {
+   switch (vcpu->arch.trap) {
case BOOK3S_INTERRUPT_H_DATA_STORAGE:
hr->hdar = vcpu->arch.fault_dar;
hr->hdsisr = vcpu->arch.fault_dsisr;
@@ -128,9 +128,29 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, 
int trap,
hr->asdr = vcpu->arch.fault_gpa;
break;
case BOOK3S_INTERRUPT_H_FAC_UNAVAIL:
-   hr->hfscr = ((~HFSCR_INTR_CAUSE & hr->hfscr) |
-(HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
-   break;
+   {
+   u8 cause = vcpu->arch.hfscr >> 56;
+
+   WARN_ON_ONCE(cause >= BITS_PER_LONG);
+
+   if (!(hr->hfscr & (1UL << cause))) {
+   hr->hfscr = ((~HFSCR_INTR_CAUSE & hr->hfscr) |
+(HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
+   break;
+   }
+
+   /*
+* We have disabled this facility, so it does not
+* exist from L1's perspective. Turn it into a HEAI.
+*/
+   vcpu->arch.trap = BOOK3S_INTERRUPT_H_EMUL_ASSIST;
+   kvmppc_load_last_inst(vcpu, INST_GENERIC, &vcpu->arch.emul_inst);
+
+   /* Don't leak the cause field */
+   hr->hfscr &= ~HFSCR_INTR_CAUSE;
+
+   fallthrough;
+   }
case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
hr->heir = vcpu->arch.emul_inst;
break;
@@ -368,7 +388,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
delta_spurr = vcpu->arch.spurr - l2_hv.spurr;
delta_ic = vcpu->arch.ic - l2_hv.ic;
delta_vtb = vc->vtb - l2_hv.vtb;
-   save_hv_return_state(vcpu, vcpu->arch.trap, &l2_hv);
+   save_hv_return_state(vcpu, &l2_hv);
 
/* restore L1 state */
vcpu->arch.nested = NULL;
-- 
2.29.2



[PATCH v5 1/2] KVM: PPC: Book3S HV: Sanitise vcpu registers in nested path

2021-07-26 Thread Fabiano Rosas
As one of the arguments of the H_ENTER_NESTED hypercall, the nested
hypervisor (L1) prepares a structure containing the values of various
hypervisor-privileged registers with which it wants the nested guest
(L2) to run. Since the nested HV runs in supervisor mode it needs the
host to write to these registers.

To stop a nested HV manipulating this mechanism and using a nested
guest as a proxy to access a facility that has been made unavailable
to it, we have a routine that sanitises the values of the HV registers
before copying them into the nested guest's vcpu struct.

However, when coming out of the guest the values are copied as they
were back into L1 memory, which means that any sanitisation we did
during guest entry will be exposed to L1 after H_ENTER_NESTED returns.

This patch alters this sanitisation to have effect on the vcpu->arch
registers directly before entering and after exiting the guest,
leaving the structure that is copied back into L1 unchanged (except
when we really want L1 to access the value, e.g. the Cause bits of
HFSCR).
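
The approach described above can be modelled roughly as follows. This is a
simplified sketch, not the patch itself: the demo_* types and names are
hypothetical, the masks are passed in as parameters, the cause field is
assumed to be the top byte of HFSCR, and the extra hardware-dependent LPCR
filtering is left out. The point it shows is that the structure supplied
by L1 is only read (it is const), while the filtered values land in the
vcpu state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the interrupt cause field occupies the top byte. */
#define DEMO_INTR_CAUSE (0xffULL << 56)

struct demo_guest_state { uint64_t lpcr, hfscr; };  /* what L1 passes in */
struct demo_vcpu        { uint64_t lpcr, hfscr; };  /* what L2 runs with */

static void demo_load_l2_regs(struct demo_vcpu *vcpu,
                              const struct demo_guest_state *l2_hv,
                              uint64_t host_lpcr,
                              uint64_t l1_lpcr_mask,
                              uint64_t l1_hfscr)
{
    assert(vcpu != NULL && l2_hv != NULL);

    /* L1 may only change the LPCR bits in l1_lpcr_mask; the rest is the host's. */
    vcpu->lpcr = (host_lpcr & ~l1_lpcr_mask) | (l2_hv->lpcr & l1_lpcr_mask);

    /* L1 cannot enable for L2 what it lacks itself; keep the cause field. */
    vcpu->hfscr = l2_hv->hfscr & (DEMO_INTR_CAUSE | l1_hfscr);
}
```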

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_nested.c | 94 ++---
 1 file changed, 46 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 8543ad538b0c..8215dbd4be9a 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -105,7 +105,6 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int 
trap,
struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
hr->dpdes = vc->dpdes;
-   hr->hfscr = vcpu->arch.hfscr;
hr->purr = vcpu->arch.purr;
hr->spurr = vcpu->arch.spurr;
hr->ic = vcpu->arch.ic;
@@ -128,55 +127,17 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, 
int trap,
case BOOK3S_INTERRUPT_H_INST_STORAGE:
hr->asdr = vcpu->arch.fault_gpa;
break;
+   case BOOK3S_INTERRUPT_H_FAC_UNAVAIL:
+   hr->hfscr = ((~HFSCR_INTR_CAUSE & hr->hfscr) |
+(HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
+   break;
case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
hr->heir = vcpu->arch.emul_inst;
break;
}
 }
 
-/*
- * This can result in some L0 HV register state being leaked to an L1
- * hypervisor when the hv_guest_state is copied back to the guest after
- * being modified here.
- *
- * There is no known problem with such a leak, and in many cases these
- * register settings could be derived by the guest by observing behaviour
- * and timing, interrupts, etc., but it is an issue to consider.
- */
-static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
-{
-   struct kvmppc_vcore *vc = vcpu->arch.vcore;
-   u64 mask;
-
-   /*
-* Don't let L1 change LPCR bits for the L2 except these:
-*/
-   mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
-   LPCR_LPES | LPCR_MER;
-
-   /*
-* Additional filtering is required depending on hardware
-* and configuration.
-*/
-   hr->lpcr = kvmppc_filter_lpcr_hv(vcpu->kvm,
-   (vc->lpcr & ~mask) | (hr->lpcr & mask));
-
-   /*
-* Don't let L1 enable features for L2 which we've disabled for L1,
-* but preserve the interrupt cause field.
-*/
-   hr->hfscr &= (HFSCR_INTR_CAUSE | vcpu->arch.hfscr);
-
-   /* Don't let data address watchpoint match in hypervisor state */
-   hr->dawrx0 &= ~DAWRX_HYP;
-   hr->dawrx1 &= ~DAWRX_HYP;
-
-   /* Don't let completed instruction address breakpt match in HV state */
-   if ((hr->ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER)
-   hr->ciabr &= ~CIABR_PRIV;
-}
-
-static void restore_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
+static void restore_hv_regs(struct kvm_vcpu *vcpu, const struct hv_guest_state 
*hr)
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
@@ -288,6 +249,43 @@ static int kvmhv_write_guest_state_and_regs(struct 
kvm_vcpu *vcpu,
 sizeof(struct pt_regs));
 }
 
+static void load_l2_hv_regs(struct kvm_vcpu *vcpu,
+   const struct hv_guest_state *l2_hv,
+   const struct hv_guest_state *l1_hv, u64 *lpcr)
+{
+   struct kvmppc_vcore *vc = vcpu->arch.vcore;
+   u64 mask;
+
+   restore_hv_regs(vcpu, l2_hv);
+
+   /*
+* Don't let L1 change LPCR bits for the L2 except these:
+*/
+   mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
+   LPCR_LPES | LPCR_MER;
+
+   /*
+* Additional filtering is required depending on hardware
+* and configuration.
+*/
+   *lpcr = kvmppc_filter_lpcr_hv(vcpu->kvm,
+ (vc->lpcr & ~mask) | (*lpcr & mask));
+
+   /*
+* Don't 

[PATCH v5 0/2] KVM: PPC: Book3S HV: Nested guest state sanitising changes

2021-07-26 Thread Fabiano Rosas
This series aims to stop contaminating the l2_hv structure with bits
that might have come from L1 state.

Patch 1 makes l2_hv read-only (mostly). It is now only changed when we
explicitly want to pass information to L1.

Patch 2 makes sure that L1 is not forwarded HFU interrupts when the
host has decided to disable any facilities (theoretical for now, since
HFSCR bits are always the same between L1/Ln).

Changes since v4:
- moved setting of the Cause bits under BOOK3S_INTERRUPT_H_FAC_UNAVAIL.

v4:

- now passing lpcr separately into load_l2_hv_regs to solve the
  conflict with commit a19b70abc69a ("KVM: PPC: Book3S HV: Nested move
  LPCR sanitising to sanitise_hv_regs");

- patch 2 now forwards a HEAI instead of injecting a Program.

https://lkml.kernel.org/r/2021071240.2384655-1-faro...@linux.ibm.com

v3:

- removed the sanitise functions;
- moved the entry code into a new load_l2_hv_regs and the exit code
  into the existing save_hv_return_state;
- new patch: removes the cause bits when L0 has disabled the
  corresponding facility.

https://lkml.kernel.org/r/20210415230948.3563415-1-faro...@linux.ibm.com

v2:

- made the change more generic, not only applies to hfscr anymore;
- sanitisation is now done directly on the vcpu struct, l2_hv is left
  unchanged.

https://lkml.kernel.org/r/20210406214645.3315819-1-faro...@linux.ibm.com

v1:
https://lkml.kernel.org/r/20210305231055.2913892-1-faro...@linux.ibm.com

Fabiano Rosas (2):
  KVM: PPC: Book3S HV: Sanitise vcpu registers in nested path
  KVM: PPC: Book3S HV: Stop forwarding all HFUs to L1

 arch/powerpc/kvm/book3s_hv_nested.c | 118 
 1 file changed, 68 insertions(+), 50 deletions(-)

-- 
2.29.2



Re: [PATCH] powerpc/stacktrace: Include linux/delay.h

2021-07-26 Thread Gabriel Paubert
On Mon, Jul 26, 2021 at 05:42:43PM +0200, Michal Suchanek wrote:
> commit 7c6986ade69e ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()")
> introduces udelay() call without including the linux/delay.h header.
> This may happen to work on master but the header that declares the
> function should be included nonetheless.
> 
> Fixes: 7c6986ade69e ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()")
> Signed-off-by: Michal Suchanek 
> ---
>  arch/powerpc/kernel/stacktrace.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/kernel/stacktrace.c 
> b/arch/powerpc/kernel/stacktrace.c
> index 2b0d04a1b7d2..a17ac10f86b1 100644
> --- a/arch/powerpc/kernel/stacktrace.c
> +++ b/arch/powerpc/kernel/stacktrace.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/delay.h>
>  #include 
>  #include 
>  #include 

Hmm, I believe we try to keep the list of includes sorted in
alphabetical order.

Gabriel
 



[PATCH] cpufreq:powernv: Fix init_chip_info initialization in numa=off

2021-07-26 Thread Pratik R. Sampat
In the numa=off kernel command-line configuration, init_chip_info() loops
over the number of chips and attempts to copy the cpumask of that node,
which is NULL for all iterations after the first chip.

Hence, store the cpu mask for each chip instead of deriving the cpumask from
the node while populating the "chips" struct array, and copy that to
chips[i].mask.

Cc: sta...@vger.kernel.org
Fixes: 053819e0bf84 ("cpufreq: powernv: Handle throttling due to Pmax capping 
at chip level")
Signed-off-by: Pratik R. Sampat 
Reported-by: Shirisha Ganta 
---
 drivers/cpufreq/powernv-cpufreq.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 005600cef273..8ec10d9aed8f 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -1046,12 +1046,20 @@ static int init_chip_info(void)
unsigned int *chip;
unsigned int cpu, i;
unsigned int prev_chip_id = UINT_MAX;
+   cpumask_t *chip_cpu_mask;
int ret = 0;
 
chip = kcalloc(num_possible_cpus(), sizeof(*chip), GFP_KERNEL);
if (!chip)
return -ENOMEM;
 
+   /* Allocate a chip cpu mask large enough to fit mask for all chips */
+   chip_cpu_mask = kcalloc(32, sizeof(cpumask_t), GFP_KERNEL);
+   if (!chip_cpu_mask) {
+   ret = -ENOMEM;
+   goto free_and_return;
+   }
+
for_each_possible_cpu(cpu) {
unsigned int id = cpu_to_chip_id(cpu);
 
@@ -1059,22 +1067,25 @@ static int init_chip_info(void)
prev_chip_id = id;
chip[nr_chips++] = id;
}
+   cpumask_set_cpu(cpu, &chip_cpu_mask[nr_chips-1]);
}
 
chips = kcalloc(nr_chips, sizeof(struct chip), GFP_KERNEL);
if (!chips) {
ret = -ENOMEM;
-   goto free_and_return;
+   goto out_chip_cpu_mask;
}
 
for (i = 0; i < nr_chips; i++) {
chips[i].id = chip[i];
-   cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
+   cpumask_copy(&chips[i].mask, &chip_cpu_mask[i]);
INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
for_each_cpu(cpu, &chips[i].mask)
per_cpu(chip_info, cpu) =  &chips[i];
}
 
+out_chip_cpu_mask:
+   kfree(chip_cpu_mask);
 free_and_return:
kfree(chip);
return ret;
-- 
2.31.1
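
For readers following the fix above, here is a minimal userspace sketch of
the same idea: walk the CPUs once, start a new chip entry whenever the chip
id changes, and record each CPU in the mask of the chip it was seen on, so
that no per-node lookup is needed afterwards. The names build_chip_masks,
struct chip_entry and MAX_CHIPS, and the use of a 64-bit integer as the mask,
are assumptions made for illustration; the kernel code uses cpumask_t and
cpu_to_chip_id() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHIPS 32	/* hypothetical bound, mirrors the 32 used in the patch */

struct chip_entry {
	unsigned int id;	/* chip id reported for the CPUs in this entry */
	uint64_t mask;		/* bit n set => CPU n belongs to this chip */
};

/*
 * cpu_chip_id[cpu] is a stand-in for cpu_to_chip_id(cpu).
 * Like the loop in the patch, this assumes the CPUs of one chip are
 * numbered contiguously. Returns the number of chips found, or -1 if
 * the sketch's limits (64 CPUs, MAX_CHIPS chips) are exceeded.
 */
int build_chip_masks(const unsigned int *cpu_chip_id, size_t nr_cpus,
		     struct chip_entry *chips)
{
	unsigned int prev_chip_id = UINT32_MAX;
	int nr_chips = 0;

	assert(cpu_chip_id != NULL && chips != NULL);

	if (nr_cpus > 64)
		return -1;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		unsigned int id = cpu_chip_id[cpu];

		if (id != prev_chip_id) {
			if (nr_chips == MAX_CHIPS)
				return -1;
			prev_chip_id = id;
			chips[nr_chips].id = id;
			chips[nr_chips].mask = 0;
			nr_chips++;
		}
		/* Record the CPU against the chip currently being filled. */
		chips[nr_chips - 1].mask |= UINT64_C(1) << cpu;
	}
	return nr_chips;
}
```

Because the mask is filled in the same pass that discovers the chips, the
result does not depend on NUMA node information being available.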



[PATCH] powerpc/stacktrace: Include linux/delay.h

2021-07-26 Thread Michal Suchanek
commit 7c6986ade69e ("powerpc/stacktrace: Fix spurious "stale" traces in 
raise_backtrace_ipi()")
introduces udelay() call without including the linux/delay.h header.
This may happen to work on master but the header that declares the
function should be included nonetheless.

Fixes: 7c6986ade69e ("powerpc/stacktrace: Fix spurious "stale" traces in 
raise_backtrace_ipi()")
Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kernel/stacktrace.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 2b0d04a1b7d2..a17ac10f86b1 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include <linux/delay.h>
 #include 
 #include 
 #include 
-- 
2.26.2



[RFC PATCH 1/4] powerpc: Optimize register usage for esr register

2021-07-26 Thread sxwjean
From: Xiongwei Song 

Create an anonymous union for the dsisr and esr registers so that we can
reference esr to get the exception detail when CONFIG_4xx=y or CONFIG_BOOKE=y.
Otherwise, reference dsisr. This makes the code clearer.

Signed-off-by: Xiongwei Song 
---
 arch/powerpc/include/asm/ptrace.h  |  5 -
 arch/powerpc/include/uapi/asm/ptrace.h |  5 -
 arch/powerpc/kernel/process.c  |  2 +-
 arch/powerpc/kernel/ptrace/ptrace.c|  2 ++
 arch/powerpc/kernel/traps.c|  2 +-
 arch/powerpc/mm/fault.c| 16 ++--
 arch/powerpc/platforms/44x/machine_check.c |  4 ++--
 arch/powerpc/platforms/4xx/machine_check.c |  2 +-
 8 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 3e5d470a6155..c252d04b1206 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -44,7 +44,10 @@ struct pt_regs
 #endif
unsigned long trap;
unsigned long dar;
-   unsigned long dsisr;
+   union {
+   unsigned long dsisr;
+   unsigned long esr;
+   };
unsigned long result;
};
};
diff --git a/arch/powerpc/include/uapi/asm/ptrace.h 
b/arch/powerpc/include/uapi/asm/ptrace.h
index 7004cfea3f5f..e357288b5f34 100644
--- a/arch/powerpc/include/uapi/asm/ptrace.h
+++ b/arch/powerpc/include/uapi/asm/ptrace.h
@@ -53,7 +53,10 @@ struct pt_regs
/* N.B. for critical exceptions on 4xx, the dar and dsisr
   fields are overloaded to hold srr0 and srr1. */
unsigned long dar;  /* Fault registers */
-   unsigned long dsisr;/* on 4xx/Book-E used for ESR */
+   union {
+   unsigned long dsisr;/* on Book-S used for DSISR */
+   unsigned long esr;  /* on 4xx/Book-E used for ESR */
+   };
unsigned long result;   /* Result of a system call */
 };
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 185beb290580..f74af8f9133c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1499,7 +1499,7 @@ static void __show_regs(struct pt_regs *regs)
trap == INTERRUPT_DATA_STORAGE ||
trap == INTERRUPT_ALIGNMENT) {
if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
-   pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, 
regs->dsisr);
+   pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, 
regs->esr);
else
pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, 
regs->dsisr);
}
diff --git a/arch/powerpc/kernel/ptrace/ptrace.c 
b/arch/powerpc/kernel/ptrace/ptrace.c
index 0a0a33eb0d28..00789ad2c4a3 100644
--- a/arch/powerpc/kernel/ptrace/ptrace.c
+++ b/arch/powerpc/kernel/ptrace/ptrace.c
@@ -375,6 +375,8 @@ void __init pt_regs_check(void)
 offsetof(struct user_pt_regs, dar));
BUILD_BUG_ON(offsetof(struct pt_regs, dsisr) !=
 offsetof(struct user_pt_regs, dsisr));
+   BUILD_BUG_ON(offsetof(struct pt_regs, esr) !=
+offsetof(struct user_pt_regs, esr));
BUILD_BUG_ON(offsetof(struct pt_regs, result) !=
 offsetof(struct user_pt_regs, result));
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index dfbce527c98e..2164f5705a0b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -562,7 +562,7 @@ static inline int check_io_access(struct pt_regs *regs)
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
 /* On 4xx, the reason for the machine check or program exception
is in the ESR. */
-#define get_reason(regs)   ((regs)->dsisr)
+#define get_reason(regs)   ((regs)->esr)
 #define REASON_FP  ESR_FP
 #define REASON_ILLEGAL (ESR_PIL | ESR_PUO)
 #define REASON_PRIVILEGED  ESR_PPR
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a8d0ce85d39a..62953d4e7c93 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -541,7 +541,11 @@ static __always_inline void __do_page_fault(struct pt_regs 
*regs)
 {
long err;
 
-   err = ___do_page_fault(regs, regs->dar, regs->dsisr);
+   if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
+   err = ___do_page_fault(regs, regs->dar, regs->esr);
+   else
+   err = ___do_page_fault(regs, regs->dar, regs->dsisr);
+
if (unlikely(err))
bad_page_fault(regs, err);
 }
@@ -567,7 +571,15 @@ NOKPROBE_SYMBOL(hash__do_page_fault);
  */
 static void __bad_page_fault(struct pt_regs *regs, int sig)
 {
-   int is_write = page_fault_is_write(regs->dsisr);
+   unsigned long err_reg;
+   int is_write;
+
+   if 

[RFC PATCH 3/4] powerpc: Optimize register usage for dear register

2021-07-26 Thread sxwjean
From: Xiongwei Song 

Create an anonymous union for the dar and dear registers so that we can
reference dear to get the effective address when CONFIG_4xx=y or
CONFIG_BOOKE=y. Otherwise, reference dar. This makes the code clearer.

Signed-off-by: Xiongwei Song 
---
 arch/powerpc/include/asm/ptrace.h  | 5 -
 arch/powerpc/include/uapi/asm/ptrace.h | 5 -
 arch/powerpc/kernel/process.c  | 2 +-
 arch/powerpc/kernel/ptrace/ptrace.c| 2 ++
 arch/powerpc/kernel/traps.c| 5 -
 arch/powerpc/mm/fault.c| 2 +-
 6 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index c252d04b1206..fa725e3238c2 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -43,7 +43,10 @@ struct pt_regs
unsigned long mq;
 #endif
unsigned long trap;
-   unsigned long dar;
+   union {
+   unsigned long dar;
+   unsigned long dear;
+   };
union {
unsigned long dsisr;
unsigned long esr;
diff --git a/arch/powerpc/include/uapi/asm/ptrace.h 
b/arch/powerpc/include/uapi/asm/ptrace.h
index e357288b5f34..9ae150fb4c4b 100644
--- a/arch/powerpc/include/uapi/asm/ptrace.h
+++ b/arch/powerpc/include/uapi/asm/ptrace.h
@@ -52,7 +52,10 @@ struct pt_regs
unsigned long trap; /* Reason for being here */
/* N.B. for critical exceptions on 4xx, the dar and dsisr
   fields are overloaded to hold srr0 and srr1. */
-   unsigned long dar;  /* Fault registers */
+   union {
+   unsigned long dar;  /* Fault registers */
+   unsigned long dear;
+   };
union {
unsigned long dsisr;/* on Book-S used for DSISR */
unsigned long esr;  /* on 4xx/Book-E used for ESR */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index f74af8f9133c..50436b52c213 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1499,7 +1499,7 @@ static void __show_regs(struct pt_regs *regs)
trap == INTERRUPT_DATA_STORAGE ||
trap == INTERRUPT_ALIGNMENT) {
if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
-   pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, 
regs->esr);
+   pr_cont("DEAR: "REG" ESR: "REG" ", regs->dear, 
regs->esr);
else
pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, 
regs->dsisr);
}
diff --git a/arch/powerpc/kernel/ptrace/ptrace.c 
b/arch/powerpc/kernel/ptrace/ptrace.c
index 00789ad2c4a3..969dca8b0718 100644
--- a/arch/powerpc/kernel/ptrace/ptrace.c
+++ b/arch/powerpc/kernel/ptrace/ptrace.c
@@ -373,6 +373,8 @@ void __init pt_regs_check(void)
 offsetof(struct user_pt_regs, trap));
BUILD_BUG_ON(offsetof(struct pt_regs, dar) !=
 offsetof(struct user_pt_regs, dar));
+   BUILD_BUG_ON(offsetof(struct pt_regs, dear) !=
+offsetof(struct user_pt_regs, dear));
BUILD_BUG_ON(offsetof(struct pt_regs, dsisr) !=
 offsetof(struct user_pt_regs, dsisr));
BUILD_BUG_ON(offsetof(struct pt_regs, esr) !=
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2164f5705a0b..0796630d3d23 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1609,7 +1609,10 @@ DEFINE_INTERRUPT_HANDLER(alignment_exception)
}
 bad:
if (user_mode(regs))
-   _exception(sig, regs, code, regs->dar);
+   if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
+   _exception(sig, regs, code, regs->dear);
+   else
+   _exception(sig, regs, code, regs->dar);
else
bad_page_fault(regs, sig);
 }
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 62953d4e7c93..3db6b39a1178 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -542,7 +542,7 @@ static __always_inline void __do_page_fault(struct pt_regs 
*regs)
long err;
 
if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
-   err = ___do_page_fault(regs, regs->dar, regs->esr);
+   err = ___do_page_fault(regs, regs->dear, regs->esr);
else
err = ___do_page_fault(regs, regs->dar, regs->dsisr);
 
-- 
2.30.2
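
The effect of the anonymous unions in this series can be checked outside the
kernel. The following is a small C11 sketch, not the kernel's struct pt_regs:
the struct fault_regs and the helper fault_address() are hypothetical, and
the is_booke flag stands in for the
IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE) test. It illustrates that
both names of each pair share one storage slot, so the layout is unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down register frame; only the aliased fields matter. */
struct fault_regs {
	unsigned long trap;
	union {
		unsigned long dar;	/* Book3S name */
		unsigned long dear;	/* 4xx/Book-E name */
	};
	union {
		unsigned long dsisr;	/* Book3S name */
		unsigned long esr;	/* 4xx/Book-E name */
	};
	unsigned long result;
};

/* Compile-time checks, in the spirit of the BUILD_BUG_ON()s in the patch. */
static_assert(offsetof(struct fault_regs, dar) ==
	      offsetof(struct fault_regs, dear), "dar and dear must alias");
static_assert(offsetof(struct fault_regs, dsisr) ==
	      offsetof(struct fault_regs, esr), "dsisr and esr must alias");
static_assert(sizeof(struct fault_regs) == 4 * sizeof(unsigned long),
	      "the unions must not grow the struct");

/* Pick the name that matches the platform; both read the same storage. */
unsigned long fault_address(const struct fault_regs *regs, int is_booke)
{
	return is_booke ? regs->dear : regs->dar;
}
```

Since the two branches read the same memory, the choice of name only affects
readability, which is the stated goal of the series.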



[RFC PATCH 2/4] powerpc/64e: Get esr offset with _ESR macro

2021-07-26 Thread sxwjean
From: Xiongwei Song 

Use _ESR to get the offset of the esr register in pt_regs for 64e CPUs.

Signed-off-by: Xiongwei Song 
---
 arch/powerpc/kernel/asm-offsets.c|  2 +-
 arch/powerpc/kernel/exceptions-64e.S | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index a47eefa09bcb..f4ebc435fd78 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -287,6 +287,7 @@ int main(void)
STACK_PT_REGS_OFFSET(_XER, xer);
STACK_PT_REGS_OFFSET(_DAR, dar);
STACK_PT_REGS_OFFSET(_DSISR, dsisr);
+   STACK_PT_REGS_OFFSET(_ESR, esr);
STACK_PT_REGS_OFFSET(ORIG_GPR3, orig_gpr3);
STACK_PT_REGS_OFFSET(RESULT, result);
STACK_PT_REGS_OFFSET(_TRAP, trap);
@@ -298,7 +299,6 @@ int main(void)
 * we use them to hold SRR0 and SRR1.
 */
STACK_PT_REGS_OFFSET(_DEAR, dar);
-   STACK_PT_REGS_OFFSET(_ESR, dsisr);
 #else /* CONFIG_PPC64 */
STACK_PT_REGS_OFFSET(SOFTE, softe);
STACK_PT_REGS_OFFSET(_PPR, ppr);
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 1401787b0b93..bf8c4c2f98ea 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -546,7 +546,7 @@ __end_interrupts:
mfspr   r14,SPRN_DEAR
mfspr   r15,SPRN_ESR
std r14,_DAR(r1)
-   std r15,_DSISR(r1)
+   std r15,_ESR(r1)
ld  r14,PACA_EXGEN+EX_R14(r13)
ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x300)
@@ -559,7 +559,7 @@ __end_interrupts:
li  r15,0
mr  r14,r10
std r14,_DAR(r1)
-   std r15,_DSISR(r1)
+   std r15,_ESR(r1)
ld  r14,PACA_EXGEN+EX_R14(r13)
ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x400)
@@ -576,7 +576,7 @@ __end_interrupts:
mfspr   r14,SPRN_DEAR
mfspr   r15,SPRN_ESR
std r14,_DAR(r1)
-   std r15,_DSISR(r1)
+   std r15,_ESR(r1)
ld  r14,PACA_EXGEN+EX_R14(r13)
ld  r15,PACA_EXGEN+EX_R15(r13)
EXCEPTION_COMMON(0x600)
@@ -587,7 +587,7 @@ __end_interrupts:
NORMAL_EXCEPTION_PROLOG(0x700, BOOKE_INTERRUPT_PROGRAM,
PROLOG_ADDITION_1REG)
mfspr   r14,SPRN_ESR
-   std r14,_DSISR(r1)
+   std r14,_ESR(r1)
ld  r14,PACA_EXGEN+EX_R14(r13)
EXCEPTION_COMMON(0x700)
addir3,r1,STACK_FRAME_OVERHEAD
@@ -1058,7 +1058,7 @@ bad_stack_book3e:
mfspr   r10,SPRN_DEAR
mfspr   r11,SPRN_ESR
std r10,_DAR(r1)
-   std r11,_DSISR(r1)
+   std r11,_ESR(r1)
std r0,GPR0(r1);/* save r0 in stackframe */ \
std r2,GPR2(r1);/* save r2 in stackframe */ \
SAVE_4GPRS(3, r1);  /* save r3 - r6 in stackframe */\
-- 
2.30.2



[RFC PATCH 4/4] powerpc/64e: Get dear offset with _DEAR macro

2021-07-26 Thread sxwjean
From: Xiongwei Song 

Use _DEAR to get the offset of the dear register in pt_regs for 64e CPUs.

Signed-off-by: Xiongwei Song 
---
 arch/powerpc/kernel/asm-offsets.c| 13 +++--
 arch/powerpc/kernel/exceptions-64e.S |  8 
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index f4ebc435fd78..8357d5fcd09e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -286,23 +286,16 @@ int main(void)
STACK_PT_REGS_OFFSET(_CCR, ccr);
STACK_PT_REGS_OFFSET(_XER, xer);
STACK_PT_REGS_OFFSET(_DAR, dar);
+   STACK_PT_REGS_OFFSET(_DEAR, dear);
STACK_PT_REGS_OFFSET(_DSISR, dsisr);
STACK_PT_REGS_OFFSET(_ESR, esr);
STACK_PT_REGS_OFFSET(ORIG_GPR3, orig_gpr3);
STACK_PT_REGS_OFFSET(RESULT, result);
STACK_PT_REGS_OFFSET(_TRAP, trap);
-#ifndef CONFIG_PPC64
-   /*
-* The PowerPC 400-class & Book-E processors have neither the DAR
-* nor the DSISR SPRs. Hence, we overload them to hold the similar
-* DEAR and ESR SPRs for such processors.  For critical interrupts
-* we use them to hold SRR0 and SRR1.
-*/
-   STACK_PT_REGS_OFFSET(_DEAR, dar);
-#else /* CONFIG_PPC64 */
+#ifdef CONFIG_PPC64
STACK_PT_REGS_OFFSET(SOFTE, softe);
STACK_PT_REGS_OFFSET(_PPR, ppr);
-#endif /* CONFIG_PPC64 */
+#endif
 
 #ifdef CONFIG_PPC_PKEY
STACK_PT_REGS_OFFSET(STACK_REGS_AMR, amr);
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index bf8c4c2f98ea..221e085e8c8c 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -545,7 +545,7 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
mfspr   r14,SPRN_DEAR
mfspr   r15,SPRN_ESR
-   std r14,_DAR(r1)
+   std r14,_DEAR(r1)
std r15,_ESR(r1)
ld  r14,PACA_EXGEN+EX_R14(r13)
ld  r15,PACA_EXGEN+EX_R15(r13)
@@ -558,7 +558,7 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
li  r15,0
mr  r14,r10
-   std r14,_DAR(r1)
+   std r14,_DEAR(r1)
std r15,_ESR(r1)
ld  r14,PACA_EXGEN+EX_R14(r13)
ld  r15,PACA_EXGEN+EX_R15(r13)
@@ -575,7 +575,7 @@ __end_interrupts:
PROLOG_ADDITION_2REGS)
mfspr   r14,SPRN_DEAR
mfspr   r15,SPRN_ESR
-   std r14,_DAR(r1)
+   std r14,_DEAR(r1)
std r15,_ESR(r1)
ld  r14,PACA_EXGEN+EX_R14(r13)
ld  r15,PACA_EXGEN+EX_R15(r13)
@@ -1057,7 +1057,7 @@ bad_stack_book3e:
std r11,_CCR(r1)
mfspr   r10,SPRN_DEAR
mfspr   r11,SPRN_ESR
-   std r10,_DAR(r1)
+   std r10,_DEAR(r1)
std r11,_ESR(r1)
std r0,GPR0(r1);/* save r0 in stackframe */ \
std r2,GPR2(r1);/* save r2 in stackframe */ \
-- 
2.30.2



Re: [PATCH v3 0/5] powerpc: apm82181: adding customer devices

2021-07-26 Thread Andy Shevchenko
On Sat, Jul 24, 2021 at 12:08:30AM +0200, Christian Lamparter wrote:
> On 23/07/2021 21:19, Andy Shevchenko wrote:
> > On Sun, Sep 06, 2020 at 12:06:10AM +0200, Christian Lamparter wrote:
> > > I've been holding on to these devices dts' for a while now.
> > > But ever since the recent purge of the PPC405, I'm feeling
> > > the urge to move forward.
> > > 
> > > The devices in question have been running with OpenWrt since
> > > around 2016/2017. Back then it was linux v4.4 and required
> > > many out-of-tree patches (for WIFI, SATA, CRYPTO...), that
> > > since have been integrated. So, there's nothing else in the
> > > way I think.
> > > 
> > > A patch that adds the Meraki vendor-prefix has been sent
> > > separately, as there's also the Meraki MR32 that I'm working
> > > on as well. Here's the link to the patch:
> > > 
> > > 
> > > Now, I've looked around in the arch/powerpc for recent .dts
> > > and device submissions to get an understanding of what is
> > > required.
> > > From the looks of it, it seems like every device gets a
> > > skeleton defconfig and a CONFIG_$DEVICE symbol (Like:
> > > CONFIG_MERAKI_MR24, CONFIG_WD_MYBOOKLIVE).
> > > 
> > > Will this be the case? Or would it make sense to further
> > > unite the Bluestone, MR24 and MBL under a common CONFIG_APM82181
> > > and integrate the BLUESTONE device's defconfig into it as well?
> > > (I've stumbled across the special machine compatible
> > > handling of ppc in the Documentation/devicetree/usage-model.rst
> > > already.)
> > 
> > I haven't found any traces of this to be applied. What is the status of this
> > patch series? And what is the general state of affairs for the PPC44x?
> 
> 
> My best guess is: It's complicated. While there was a recent big
> UPSET EVENT regarding the My Book Live (MBL) that affected "hundreds"
> and "thousands": "An unpleasant surprise for My Book Live owners"
> (). Sadly this wasn't getting any
> traction.
> 
> I can tell that the mentioned Cisco Meraki MR32 (Broadcom ARM SoC)
> got merged. So this is off the plate.
> 
> But APM821xx sadly went nowhere. One reason being that I haven't
> yet posted a V4, V5 and so on...

I will help with testing if needed, please continue this, it's helpful!

> In theory, for v4 I would have liked to know how to handle the
> kConfig aspect of the series: Would it be "OK" to have a
> single CONFIG_APM82181/CONFIG_APM821XX symbol or should there
> be a CONFIG_MBL the CONFIG_MR24 (CONFIG_WNDR4700 and CONFIG_MX60W
> in the future)?

No idea. Not a PPC maintainer here.

> As for the MBL: Well, If you (or any one else) is interested in
> having a more up-to-date Debian. Then I have something:
> 
> A while back, I made a "build.sh". This will build a
> "out-of-the-box" Debian unstable/SID powerpc system image.
> This includes sensible NAS defaults + programs as well as
> a Cockpit Web-GUI. But also makes it easily possible to do
> the DTBs development on the latest vanilla (5.14-rc2 as of
> the time of writing this) kernel for the
> MyBook Live Single and Duo:
> 
> 

Thanks for the pointer.

> I can't really make one for the MR24 though. Its 32MiB NAND
> makes it difficult to install anything else than OpenWrt
> (and get some use out of the device).

Not interested in MR24, up to you.

> So, how to proceed?

At least send a v4 :-)

-- 
With Best Regards,
Andy Shevchenko




[PATCH v3] soc: fsl: qe: convert QE interrupt controller to platform_device

2021-07-26 Thread Maxim Kochetkov
Since 5.13 QE's ucc nodes can't get interrupts from devicetree:

ucc@2000 {
cell-index = <1>;
reg = <0x2000 0x200>;
interrupts = <32>;
interrupt-parent = <>;
};

Now fw_devlink expects the driver to create and probe a struct device
for the interrupt controller.

So let's convert this driver to a simple platform_device with probe().
Also use the platform_get_ and devm_ families of functions to get/allocate
resources and drop unused .compatible = "qeic".

[1] - 
https://lore.kernel.org/lkml/CAGETcx9PiX==mlxb9po8myyk6u2vhpvwtmsa5nkd-ywh5xh...@mail.gmail.com
Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")
Fixes: ea718c699055 ("Revert "Revert "driver core: Set fw_devlink=on by 
default""")
Signed-off-by: Maxim Kochetkov 
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
---
Changes in v3:
 - use .compatible = "qeic" again (Li Yang asks to keep it)
 
Changes in v2:
 - use devm_ family functions to allocate mem/resources
 - use platform_get_ family functions to get resources/irqs
 - drop unused .compatible = "qeic"

 drivers/soc/fsl/qe/qe_ic.c | 75 ++
 1 file changed, 44 insertions(+), 31 deletions(-)

diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c
index 3f711c1a0996..54cabd2605dd 100644
--- a/drivers/soc/fsl/qe/qe_ic.c
+++ b/drivers/soc/fsl/qe/qe_ic.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -404,41 +405,40 @@ static void qe_ic_cascade_muxed_mpic(struct irq_desc 
*desc)
chip->irq_eoi(&desc->irq_data);
 }
 
-static void __init qe_ic_init(struct device_node *node)
+static int qe_ic_init(struct platform_device *pdev)
 {
+   struct device *dev = &pdev->dev;
void (*low_handler)(struct irq_desc *desc);
void (*high_handler)(struct irq_desc *desc);
struct qe_ic *qe_ic;
-   struct resource res;
-   u32 ret;
+   struct resource *res;
+   struct device_node *node = pdev->dev.of_node;
 
-   ret = of_address_to_resource(node, 0, &res);
-   if (ret)
-   return;
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (res == NULL) {
+   dev_err(dev, "no memory resource defined\n");
+   return -ENODEV;
+   }
 
-   qe_ic = kzalloc(sizeof(*qe_ic), GFP_KERNEL);
+   qe_ic = devm_kzalloc(dev, sizeof(*qe_ic), GFP_KERNEL);
if (qe_ic == NULL)
-   return;
+   return -ENOMEM;
 
-   qe_ic->irqhost = irq_domain_add_linear(node, NR_QE_IC_INTS,
-  &qe_ic_host_ops, qe_ic);
-   if (qe_ic->irqhost == NULL) {
-   kfree(qe_ic);
-   return;
+   qe_ic->regs = devm_ioremap(dev, res->start, resource_size(res));
+   if (qe_ic->regs == NULL) {
+   dev_err(dev, "failed to ioremap() registers\n");
+   return -ENODEV;
}
 
-   qe_ic->regs = ioremap(res.start, resource_size(&res));
-
qe_ic->hc_irq = qe_ic_irq_chip;
 
-   qe_ic->virq_high = irq_of_parse_and_map(node, 0);
-   qe_ic->virq_low = irq_of_parse_and_map(node, 1);
+   qe_ic->virq_high = platform_get_irq(pdev, 0);
+   qe_ic->virq_low = platform_get_irq(pdev, 1);
 
-   if (!qe_ic->virq_low) {
-   printk(KERN_ERR "Failed to map QE_IC low IRQ\n");
-   kfree(qe_ic);
-   return;
+   if (qe_ic->virq_low < 0) {
+   return -ENODEV;
}
+
if (qe_ic->virq_high != qe_ic->virq_low) {
low_handler = qe_ic_cascade_low;
high_handler = qe_ic_cascade_high;
@@ -447,6 +447,13 @@ static void __init qe_ic_init(struct device_node *node)
high_handler = NULL;
}
 
+   qe_ic->irqhost = irq_domain_add_linear(node, NR_QE_IC_INTS,
+  &qe_ic_host_ops, qe_ic);
+   if (qe_ic->irqhost == NULL) {
+   dev_err(dev, "failed to add irq domain\n");
+   return -ENODEV;
+   }
+
qe_ic_write(qe_ic->regs, QEIC_CICR, 0);
 
irq_set_handler_data(qe_ic->virq_low, qe_ic);
@@ -456,20 +463,26 @@ static void __init qe_ic_init(struct device_node *node)
irq_set_handler_data(qe_ic->virq_high, qe_ic);
irq_set_chained_handler(qe_ic->virq_high, high_handler);
}
+   return 0;
 }
+static const struct of_device_id qe_ic_ids[] = {
+   { .compatible = "fsl,qe-ic"},
+   { .compatible = "qeic"},
+   {},
+};
 
-static int __init qe_ic_of_init(void)
+static struct platform_driver qe_ic_driver =
 {
-   struct device_node *np;
+   .driver = {
+   .name   = "qe-ic",
+   .of_match_table = qe_ic_ids,
+   },
+   .probe  = qe_ic_init,
+};
 
-   np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic");
-   if (!np) {
-   np = of_find_node_by_type(NULL, "qeic");
-   

Linux kernel: powerpc: KVM guest to host memory corruption

2021-07-26 Thread Michael Ellerman
The Linux kernel for powerpc since v3.10 has a bug which allows a malicious
KVM guest to corrupt host memory.

In the handling of the H_RTAS hypercall, args.rets is made to point into the
args.args buffer which is located on the stack:

args.rets = &args.args[be32_to_cpu(args.nargs)];

However args.nargs has not been range checked. That allows the guest to point
args.rets anywhere up to +16GB from args.args.

The guest does not have control of what is written to args.rets, it is always
(u32)-3, because subsequent code does check nargs. Additionally the guest will
be killed as a result of the nargs being out of range, so a given guest only
has a single shot at
corrupting memory.
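
To make the arithmetic concrete: nargs is a 32-bit count of 4-byte cells, so
an unchecked value can index up to (2^32 - 1) * 4 bytes, just under 16GB, past
the start of args.args. The sketch below is a userspace illustration of
validating the count before forming the pointer. It is not the upstream fix;
the struct layout, the 16-entry array size, the helper names set_rets() and
max_unchecked_offset(), and the use of host-endian fields are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_ARGS 16	/* assumed size of the argument buffer */

/* Hypothetical, host-endian stand-in for the hypercall argument block. */
struct rtas_args_sketch {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	uint32_t args[NR_ARGS];
	uint32_t *rets;
};

/* Largest byte offset reachable from args[] by an unchecked 32-bit nargs. */
uint64_t max_unchecked_offset(void)
{
	return (uint64_t)UINT32_MAX * sizeof(uint32_t);
}

/*
 * Point rets just past the input arguments, but only after checking that
 * both the inputs and the return slots fit inside args[].
 * Returns 0 on success, -1 (with rets set to NULL) if out of range.
 */
int set_rets(struct rtas_args_sketch *a)
{
	assert(a != NULL);

	if (a->nargs >= NR_ARGS || a->nret > NR_ARGS - a->nargs) {
		a->rets = NULL;
		return -1;
	}
	a->rets = &a->args[a->nargs];
	return 0;
}
```

The key point is ordering: the range check happens before the pointer is
derived, so a bad count never produces an out-of-bounds address.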

Only machines using Linux as the hypervisor, aka. KVM or bare metal, are
affected by the bug.

The bug was introduced in:

  8e591cb72047 ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls")

Which was first released in v3.10.

The upstream fix is:

  f62f3c20647e ("KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow")

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f62f3c20647ebd5fb6ecb8f0b477b9281c44c10a

Which will be included in the v5.14 release.

cheers


Re: [PATCH v1 23/55] KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs

2021-07-26 Thread kernel test robot
Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.14-rc3 next-20210723]
[cannot apply to powerpc/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/KVM-PPC-Book3S-HV-P9-entry-exit-optimisations/20210726-115329
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
ff1176468d368232b684f75e82563369208bc371
config: powerpc64-randconfig-r024-20210726 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
c63dbd850182797bc4b76124d08e1c320ab2365d)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/d173e4690cf13578686dbbce48e1f81e925b96af
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Nicholas-Piggin/KVM-PPC-Book3S-HV-P9-entry-exit-optimisations/20210726-115329
git checkout d173e4690cf13578686dbbce48e1f81e925b96af
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir 
ARCH=powerpc SHELL=/bin/bash arch/powerpc/kernel/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   arch/powerpc/include/asm/io-defs.h:45:1: error: performing pointer 
arithmetic on a null pointer has undefined behavior 
[-Werror,-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c),
   ^~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 
'DEF_PCI_AC_NORET'
   __do_##name al; \
   ^~
   :121:1: note: expanded from here
   __do_insw
   ^
   arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw'
   #define __do_insw(p, b, n)  readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
  ~^
   In file included from arch/powerpc/kernel/process.c:28:
   In file included from include/linux/init_task.h:9:
   In file included from include/linux/ftrace.h:10:
   In file included from include/linux/trace_recursion.h:5:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:47:1: error: performing pointer 
arithmetic on a null pointer has undefined behavior 
[-Werror,-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
   ^~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 
'DEF_PCI_AC_NORET'
   __do_##name al; \
   ^~
   :123:1: note: expanded from here
   __do_insl
   ^
   arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl'
   #define __do_insl(p, b, n)  readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
  ~^
   In file included from arch/powerpc/kernel/process.c:28:
   In file included from include/linux/init_task.h:9:
   In file included from include/linux/ftrace.h:10:
   In file included from include/linux/trace_recursion.h:5:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:49:1: error: performing pointer 
arithmetic on a null pointer has undefined behavior 
[-Werror,-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
   ^~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 
'DEF_PCI_AC_NORET'
   __do_##name al; \
   ^~
   :125:1: note: expanded from here
   __do_outsb
   ^
   arch/p

Re: [PATCH v1 23/55] KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs

2021-07-26 Thread kernel test robot
Hi Nicholas,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.14-rc3 next-20210723]
[cannot apply to powerpc/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/KVM-PPC-Book3S-HV-P9-entry-exit-optimisations/20210726-115329
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
ff1176468d368232b684f75e82563369208bc371
config: powerpc-randconfig-r022-20210726 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
c63dbd850182797bc4b76124d08e1c320ab2365d)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/0day-ci/linux/commit/d173e4690cf13578686dbbce48e1f81e925b96af
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nicholas-Piggin/KVM-PPC-Book3S-HV-P9-entry-exit-optimisations/20210726-115329
        git checkout d173e4690cf13578686dbbce48e1f81e925b96af
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   arch/powerpc/kernel/process.c:612:33: error: no member named 'tm_tfhar' in 'struct thread_struct'
   current->thread.tm_tfhar = mfspr(SPRN_TFHAR);
   ~~~ ^
   arch/powerpc/kernel/process.c:613:33: error: no member named 'tm_tfiar' in 'struct thread_struct'
   current->thread.tm_tfiar = mfspr(SPRN_TFIAR);
   ~~~ ^
   arch/powerpc/kernel/process.c:614:33: error: no member named 'tm_texasr' in 'struct thread_struct'
   current->thread.tm_texasr = mfspr(SPRN_TEXASR);
   ~~~ ^
>> arch/powerpc/kernel/process.c:596:6: warning: no previous prototype for function 'save_user_regs_kvm' [-Wmissing-prototypes]
   void save_user_regs_kvm(void)
^
   arch/powerpc/kernel/process.c:596:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void save_user_regs_kvm(void)
   ^
   static 
>> arch/powerpc/kernel/process.c:611:16: warning: shift count >= width of type [-Wshift-count-overflow]
   if (usermsr & MSR_TM) {
 ^~
   arch/powerpc/include/asm/reg.h:115:17: note: expanded from macro 'MSR_TM'
   #define MSR_TM  __MASK(MSR_TM_LG)   /* Transactional Mem Available */
   ^
   arch/powerpc/include/asm/reg.h:66:23: note: expanded from macro '__MASK'
   #define __MASK(X)   (1UL<<(X))
   ^ ~~~
   arch/powerpc/kernel/process.c:615:47: warning: shift count >= width of type [-Wshift-count-overflow]
   current->thread.regs->msr &= ~MSR_TM;
 ^~
   arch/powerpc/include/asm/reg.h:115:17: note: expanded from macro 'MSR_TM'
   #define MSR_TM  __MASK(MSR_TM_LG)   /* Transactional Mem Available */
   ^
   arch/powerpc/include/asm/reg.h:66:23: note: expanded from macro '__MASK'
   #define __MASK(X)   (1UL<<(X))
   ^ ~~~
   3 warnings and 3 errors generated.


vim +/save_user_regs_kvm +596 arch/powerpc/kernel/process.c

   595  
 > 596  void save_user_regs_kvm(void)
   597  {
   598  unsigned long usermsr;
   599  
   600  if (!current->thread.regs)
   601  return;
   602  
   603  usermsr = current->thread.regs->msr;
   604  
   605  if (usermsr & MSR_FP)
   606  save_fpu(current);
   607  
   608  if (usermsr & MSR_VEC)
   609  save_altivec(current);
   610  
 > 611  if (usermsr & MSR_TM) {
   612  current->thread.tm_tfhar = mfspr(SPRN_TFHAR);
   613  current->thread.tm_tfiar = mfspr(SPRN_TFIAR);
   614  current->thread.tm_texasr = mfspr(SPRN_TEXASR);
   615  current->thread.regs->msr &= ~MSR_TM;
   616  }
   617  }
   618  EXPORT_SYMBOL_GPL(save_user_regs_kvm);
   619  
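
A note on the two -Wshift-count-overflow warnings in the listing above,
offered as a minimal sketch and not as the fix for this patch. The
diagnostics suggest a 32-bit build, where unsigned long is 32 bits wide, so
(1UL << MSR_TM_LG) shifts by a count that is not smaller than the width of
the type. The standalone C below models that situation. It assumes that
MSR_TM_LG is 32; EXAMPLE_MSR_TM_LG, example_mask(), example_has_tm() and
example_selfcheck() are hypothetical names used only for illustration and do
not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit index of the TM-available bit, taken to be 32. */
#define EXAMPLE_MSR_TM_LG 32

/*
 * Hypothetical helper: build a single-bit mask for a register that is
 * width_bits wide. Returns 0 when the bit does not fit, instead of
 * shifting by a count >= the width (undefined behaviour in C).
 */
static uint64_t example_mask(unsigned int bit, unsigned int width_bits)
{
	if (bit >= width_bits || bit >= 64)
		return 0;
	return (uint64_t)1 << bit;
}

/* Hypothetical helper: test the TM bit in an MSR value of the given width. */
static int example_has_tm(uint64_t msr, unsigned int width_bits)
{
	return (msr & example_mask(EXAMPLE_MSR_TM_LG, width_bits)) != 0;
}

/* Small self-check showing the 32-bit and 64-bit cases side by side. */
static void example_selfcheck(void)
{
	/* 32-bit register: bit 32 does not exist, so the mask is empty. */
	assert(example_mask(EXAMPLE_MSR_TM_LG, 32) == 0);
	/* 64-bit register: bit 32 is a valid mask. */
	assert(example_mask(EXAMPLE_MSR_TM_LG, 64) == 0x100000000ULL);
}
```

The point of the sketch is only that a test of bit 32 is meaningless on a
32-bit register, which is consistent with the 'tm_tfhar', 'tm_tfiar' and
'tm_texasr' members also being absent in this configuration. How the
TM-specific block should be guarded in the actual patch is left to the
author.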

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip