Re: [PATCH v2 2/4] powerpc/selftests/perf-hwbreak: Coalesce event creation code

2021-04-11 Thread Ravi Bangoria




On 4/9/21 12:49 PM, Daniel Axtens wrote:

Hi Ravi,


The perf-hwbreak selftest opens a hw-breakpoint event at multiple places
with the same code repeated. Coalesce that code into a function.

Signed-off-by: Ravi Bangoria 
---
  .../selftests/powerpc/ptrace/perf-hwbreak.c   | 78 +--


This doesn't simplify things very much for the code as it stands now,
but I think your next patch adds a bunch of calls to these functions, so
I agree that it makes sense to consolidate them now.


Right. This helps in the next patch.


  1 file changed, 38 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
index c1f324afdbf3..bde475341c8a 100644
--- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
@@ -34,28 +34,46 @@
  
  #define DAWR_LENGTH_MAX ((0x3f + 1) * 8)
  
-static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,

- int cpu, int group_fd,
- unsigned long flags)
+static void perf_event_attr_set(struct perf_event_attr *attr,
+   __u32 type, __u64 addr, __u64 len,
+   bool exclude_user)
  {
-   attr->size = sizeof(*attr);
-   return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+   memset(attr, 0, sizeof(struct perf_event_attr));
+   attr->type   = PERF_TYPE_BREAKPOINT;
+   attr->size   = sizeof(struct perf_event_attr);
+   attr->bp_type= type;
+   attr->bp_addr= addr;
+   attr->bp_len = len;
+   attr->exclude_kernel = 1;
+   attr->exclude_hv = 1;
+   attr->exclude_guest  = 1;


Only 1 of the calls to perf sets exclude_{kernel,hv,guest} - I assume
there's no issue with setting them always but I wanted to check.


True. There is no issue in setting them always, as this test does all
loads/stores in userspace, so all events will be recorded irrespective
of the exclude_{kernel,hv,guest} settings. (A usage sketch of the
consolidated helper follows at the end of this message.)




+   attr->exclude_user   = exclude_user;
+   attr->disabled   = 1;
  }
  
-	/* setup counters */

-   memset(&attr, 0, sizeof(attr));
-   attr.disabled = 1;
-   attr.type = PERF_TYPE_BREAKPOINT;
-   attr.bp_type = readwriteflag;
-   attr.bp_addr = (__u64)ptr;
-   attr.bp_len = sizeof(int);
-   if (arraytest)
-   attr.bp_len = DAWR_LENGTH_MAX;
-   attr.exclude_user = exclude_user;
-   break_fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+   break_fd = perf_process_event_open_exclude_user(readwriteflag, 
(__u64)ptr,
+   arraytest ? DAWR_LENGTH_MAX : sizeof(int),
+   exclude_user);


checkpatch doesn't like this very much:

CHECK: Alignment should match open parenthesis
#103: FILE: tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c:115:
+   break_fd = perf_process_event_open_exclude_user(readwriteflag, 
(__u64)ptr,
+   arraytest ? DAWR_LENGTH_MAX : sizeof(int),


Sure, will fix it.



Apart from that, this seems good but I haven't checked in super fine
detail just yet :)


Thanks for the review.
-Ravi
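
To make the consolidation concrete, here is a hedged sketch of how the new
helper pair fits together. The wrapper name is taken from the call site in
the diff above; its exact body in the posted patch may differ, and the usual
perf_event / hw_breakpoint / syscall headers are assumed:

static int perf_process_event_open_exclude_user(__u32 type, __u64 addr,
						__u64 len, bool exclude_user)
{
	struct perf_event_attr attr;

	/* Consolidated attribute setup from the diff above */
	perf_event_attr_set(&attr, type, addr, len, exclude_user);
	/* pid = 0 (self), cpu = -1 (any), no group leader, no flags */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}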


Re: [PATCH v2 1/4] powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR

2021-04-11 Thread Ravi Bangoria



On 4/9/21 12:22 PM, Daniel Axtens wrote:

Hi Ravi,


Add selftests to test multiple active DAWRs with ptrace interface.


It would be good if somewhere (maybe in the cover letter) you explain
what DAWR stands for and where to find more information about it. I
found the Power ISA v3.1 Book 3 Chapter 9 very helpful.


Sure. Will add the details in the v2 cover letter.


Apart from that, I don't have any specific comments about this patch. It
looks good to me, it seems to do what it says, and there are no comments
from checkpatch. It is a bit sparse in terms of comments but it is
consistent with the rest of the file so I can't really complain there :)

Reviewed-by: Daniel Axtens 


Thanks
Ravi
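
For readers who want to see the interface under test, here is a hedged
sketch of arming a watchpoint through the powerpc ptrace interface, using
the uapi definitions from asm/ptrace.h. This is an illustration only, not
code from the patch:

#include <sys/ptrace.h>
#include <sys/types.h>
#include <asm/ptrace.h>		/* struct ppc_hw_breakpoint */

/* Arm a read/write watchpoint on [addr, addr + len) in a traced child.
 * Returns a non-negative slot number on success; with a second DAWR,
 * two such requests can be active at the same time. */
static long set_hw_watchpoint(pid_t child, void *addr, unsigned int len)
{
	struct ppc_hw_breakpoint info = {
		.version	= 1,
		.trigger_type	= PPC_BREAKPOINT_TRIGGER_RW,
		.addr_mode	= PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE,
		.condition_mode	= PPC_BREAKPOINT_CONDITION_NONE,
		.addr		= (__u64)(unsigned long)addr,
		.addr2		= (__u64)(unsigned long)addr + len,
	};

	return ptrace(PPC_PTRACE_SETHWDEBUG, child, 0, &info);
}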


Re: [PATCH] powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC

2021-04-11 Thread Nicholas Piggin
Excerpts from Athira Rajeev's message of April 9, 2021 10:53 pm:
> 
> 
>> On 09-Apr-2021, at 6:38 AM, Nicholas Piggin  wrote:
>> 
> Hi Nick,
> 
> Thanks for checking the patch and sharing review comments.
> 
>> I was going to nitpick "overflown" here as something birds do, but some
>> sources say overflown is okay for the past tense.
>> 
>> You could use "overflowed" for that, but I understand the issue with the
>> word: you are talking about counters that are currently in an "overflow"
>> state, but the overflow occurred in the past and is not still happening,
>> so "overflowing" doesn't exactly fit either.
>> 
>> "overflown" kind of works; for some reason you can kind of use it for
>> the present tense!
> 
> Ok sure. Yes, the counter is currently in an “overflow” state.
> 
>> 
>> Excerpts from Athira Rajeev's message of April 7, 2021 12:47 am:
>>> Running the perf fuzzer showed the below in dmesg logs:
>>> "Can't find PMC that caused IRQ"
>>> 
>>> This means a PMU exception happened, but none of the PMCs (Performance
>>> Monitor Counters) was found to be overflown. There are some corner cases
>>> that clear the PMCs after the PMI gets masked. In such cases, the perf
>>> interrupt handler will not find the active PMC values that had caused
>>> the overflow, which leads to this message while replaying.
>>> 
>>> Case 1: PMU Interrupt happens during replay of other interrupts and
>>> counter values gets cleared by PMU callbacks before replay:
>>> 
>>> During replay of interrupts like timer, __do_irq and doorbell exception, we
>>> conditionally enable interrupts via may_hard_irq_enable(). This could
>>> potentially create a window to generate a PMI. Since irq soft mask is set
>>> to ALL_DISABLED, the PMI will get masked here.
>> 
>> I wonder if may_hard_irq_enable shouldn't enable if PMI is soft
>> disabled. And also maybe replay should not set ALL_DISABLED if
>> there are no PMI interrupts pending.
>> 
>> Still, I think those are a bit more tricky and might take a while
>> to get right or just not be worth while, so I think your patch is
>> fine.
> 
> Ok Nick.
>> 
>>> We could get IPIs run before the
>>> perf interrupt is replayed, and the PMU events could be deleted or stopped.
>>> This will change the PMU SPR values and reset the counters. Snippet of
>>> ftrace log showing PMU callbacks invoked in "__do_irq":
>>> 
>>> -0 [051] dns. 132025441306354: __do_irq <-call_do_irq
>>> -0 [051] dns. 132025441306430: irq_enter <-__do_irq
>>> -0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
>>> -0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
>>> <<>>
>>> -0 [051] dnH. 132025441307770: 
>>> generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
>>> -0 [051] dnH. 132025441307839: flush_smp_call_function_queue 
>>> <-smp_ipi_demux_relaxed
>>> -0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
>>> -0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
>>> -0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
>>> -0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
>>> -0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
>>> -0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
>>> -0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
>>> -0 [051] dnH. 132025441308787: power_pmu_event_idx 
>>> <-perf_event_update_userpage
>>> -0 [051] dnH. 132025441308859: rcu_read_unlock_strict 
>>> <-perf_event_update_userpage
>>> -0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
>>> <<>>
>>> -0 [051] dnH. 132025441311108: irq_exit <-__do_irq
>>> -0 [051] dns. 132025441311319: performance_monitor_exception 
>>> <-replay_soft_interrupts
>>> 
>>> Case 2: PMI's masked during local_* operations, example local_add.
>>> If the local_add operation happens within a local_irq_save, replay of
>>> PMI will be during local_irq_restore. Similar to case 1, this could
>>> also create a window before replay where PMU events gets deleted or
>>> stopped.
>> 
>> Here as well perhaps PMIs should be replayed if they are unmasked
>> even if other interrupts are still masked. Again that might be more
>> complexity than it's worth.
> Ok..
> 
>> 
>>> 
>>> The patch adds a fix to update the PMU callback functions (del, stop,
>>> enable) to check for a pending perf interrupt. If there is an overflown
>>> PMC and a pending perf interrupt indicated in the PACA, clear the PMI bit
>>> in the PACA to drop that sample. In case of power_pmu_del, also clear the
>>> MMCR0 PMAO bit, which otherwise could lead to spurious interrupts in some
>>> corner cases. Example: a timer after power_pmu_del will re-enable
>>> interrupts since the PMI is cleared, and trigger a PMI again since the
>>> PMAO bit is still set.
>>> 
>>> We can't just replay PMI any time. Hence this approach is preferred rather
>>> than replaying PMI before resetting overflown PMC. Patch also documents
>>> core-book3s on a race condition which can trigger these PMC messages during
>>> idle path in PowerNV.
>>> 
>>> Fixes: f442d004806e ("powerpc/64s: Add 
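
The core of the approach, as a hedged sketch: the helper name and body
follow what eventually landed in mainline's asm/hw_irq.h, and the code in
this posting may differ. PMU callbacks that reset an overflown PMC first
drop the PMI latched in the PACA, so interrupt replay will not fire for a
counter that no longer shows an overflow:

static inline void clear_pmi_irq_pending(void)
{
	/*
	 * Invoked from PMU callbacks (del/stop/enable): a PMI recorded
	 * in the PACA while soft-masked belongs to a sample whose PMC
	 * is about to be reset, so drop it before it can be replayed.
	 */
	get_paca()->irq_happened &= ~PACA_IRQ_PMI;
}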

[PATCH v1 12/12] KVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR

2021-04-11 Thread Nicholas Piggin
Rather than clear the HV bit from the MSR at guest entry, make it clear
that the hypervisor does not allow the guest to set the bit.

The HV clear is kept in guest entry for now, but a future patch will
warn if it is set.

Acked-by: Paul Mackerras 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 4 ++--
 arch/powerpc/kvm/book3s_hv_nested.c  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 41cb03d0bde4..7a0e33a9c980 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -662,8 +662,8 @@ static void kvmppc_end_cede(struct kvm_vcpu *vcpu)
 
 void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
 {
-   /* Guest must always run with ME enabled. */
-   msr = msr | MSR_ME;
+   /* Guest must always run with ME enabled, HV disabled. */
+   msr = (msr | MSR_ME) & ~MSR_HV;
 
/*
 * Check for illegal transactional state bit combination
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index fb03085c902b..60724f674421 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -344,8 +344,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
vcpu->arch.nested_vcpu_id = l2_hv.vcpu_token;
vcpu->arch.regs = l2_regs;
 
-   /* Guest must always run with ME enabled. */
-   vcpu->arch.shregs.msr = vcpu->arch.regs.msr | MSR_ME;
+   /* Guest must always run with ME enabled, HV disabled. */
+   vcpu->arch.shregs.msr = (vcpu->arch.regs.msr | MSR_ME) & ~MSR_HV;
 
	sanitise_hv_regs(vcpu, &l2_hv);
	restore_hv_regs(vcpu, &l2_hv);
-- 
2.23.0



[PATCH v1 11/12] KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR

2021-04-11 Thread Nicholas Piggin
Rather than add the ME bit to the MSR at guest entry, make it clear
that the hypervisor does not allow the guest to clear the bit.

The ME set is kept in guest entry for now, but a future patch will
warn if it's not present.

Acked-by: Paul Mackerras 
Reviewed-by: Daniel Axtens 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 3 +++
 arch/powerpc/kvm/book3s_hv_nested.c  | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 158d309b42a3..41cb03d0bde4 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -662,6 +662,9 @@ static void kvmppc_end_cede(struct kvm_vcpu *vcpu)
 
 void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
 {
+   /* Guest must always run with ME enabled. */
+   msr = msr | MSR_ME;
+
/*
 * Check for illegal transactional state bit combination
 * and if we find it, force the TS field to a safe state.
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index d14fe32f167b..fb03085c902b 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -343,7 +343,9 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
vcpu->arch.nested = l2;
vcpu->arch.nested_vcpu_id = l2_hv.vcpu_token;
vcpu->arch.regs = l2_regs;
-   vcpu->arch.shregs.msr = vcpu->arch.regs.msr;
+
+   /* Guest must always run with ME enabled. */
+   vcpu->arch.shregs.msr = vcpu->arch.regs.msr | MSR_ME;
 
	sanitise_hv_regs(vcpu, &l2_hv);
	restore_hv_regs(vcpu, &l2_hv);
-- 
2.23.0



[PATCH v1 10/12] powerpc/64s: remove KVM SKIP test from instruction breakpoint handler

2021-04-11 Thread Nicholas Piggin
The code being executed in KVM_GUEST_MODE_SKIP is hypervisor code with
MSR[IR]=0, so the faults of concern are the d-side ones caused by access
to guest context by the hypervisor.

Instruction breakpoint interrupts are not a concern here. It's unlikely
any good would come of causing breaks in this code, but skipping the
instruction that caused it won't help matters (e.g., skip the mtmsr that
sets MSR[DR]=0 or clears KVM_GUEST_MODE_SKIP).

 [Paul notes: "the 0x1300 interrupt was dropped from the architecture a
  long time ago and is not generated by P7, P8, P9 or P10." So add a
  comment about this in the handler code while we're here. ]

Acked-by: Paul Mackerras 
Reviewed-by: Daniel Axtens 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index a0515cb829c2..358cd4b0c08e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -2549,11 +2549,16 @@ EXC_REAL_NONE(0x1200, 0x100)
 EXC_VIRT_NONE(0x5200, 0x100)
 #endif
 
-
+/**
+ * Interrupt 0x1300 - Instruction Address Breakpoint Interrupt.
+ * This has been removed from the ISA before 2.01, which is the earliest
+ * 64-bit BookS ISA supported, however the G5 / 970 implements this
+ * interrupt with a non-architected feature available through the support
+ * processor interface.
+ */
 INT_DEFINE_BEGIN(instruction_breakpoint)
IVEC=0x1300
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
-   IKVM_SKIP=1
IKVM_REAL=1
 #endif
 INT_DEFINE_END(instruction_breakpoint)
-- 
2.23.0



[PATCH v1 09/12] powerpc/64s: Remove KVM handler support from CBE_RAS interrupts

2021-04-11 Thread Nicholas Piggin
Cell does not support KVM.

Acked-by: Paul Mackerras 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 8082b690e874..a0515cb829c2 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -2530,8 +2530,6 @@ EXC_VIRT_NONE(0x5100, 0x100)
 INT_DEFINE_BEGIN(cbe_system_error)
IVEC=0x1200
IHSRR=1
-   IKVM_SKIP=1
-   IKVM_REAL=1
 INT_DEFINE_END(cbe_system_error)
 
 EXC_REAL_BEGIN(cbe_system_error, 0x1200, 0x100)
@@ -2701,8 +2699,6 @@ EXC_COMMON_BEGIN(denorm_exception_common)
 INT_DEFINE_BEGIN(cbe_maintenance)
IVEC=0x1600
IHSRR=1
-   IKVM_SKIP=1
-   IKVM_REAL=1
 INT_DEFINE_END(cbe_maintenance)
 
 EXC_REAL_BEGIN(cbe_maintenance, 0x1600, 0x100)
@@ -2754,8 +2750,6 @@ EXC_COMMON_BEGIN(altivec_assist_common)
 INT_DEFINE_BEGIN(cbe_thermal)
IVEC=0x1800
IHSRR=1
-   IKVM_SKIP=1
-   IKVM_REAL=1
 INT_DEFINE_END(cbe_thermal)
 
 EXC_REAL_BEGIN(cbe_thermal, 0x1800, 0x100)
-- 
2.23.0



[PATCH v1 08/12] KVM: PPC: Book3S HV: Fix CONFIG_SPAPR_TCE_IOMMU=n default hcalls

2021-04-11 Thread Nicholas Piggin
This config option causes the warning in init_default_hcalls to fire
because the TCE handlers are in the default hcall list but not
implemented.

Acked-by: Paul Mackerras 
Reviewed-by: Daniel Axtens 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b88df175aa76..4a532410e128 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5412,8 +5412,10 @@ static unsigned int default_hcall_list[] = {
H_READ,
H_PROTECT,
H_BULK_REMOVE,
+#ifdef CONFIG_SPAPR_TCE_IOMMU
H_GET_TCE,
H_PUT_TCE,
+#endif
H_SET_DABR,
H_SET_XDABR,
H_CEDE,
-- 
2.23.0



[PATCH v1 07/12] KVM: PPC: Book3S HV: remove unused kvmppc_h_protect argument

2021-04-11 Thread Nicholas Piggin
The va argument is not used in the function or set by its asm caller,
so remove it to be safe.

Acked-by: Paul Mackerras 
Reviewed-by: Daniel Axtens 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_ppc.h  | 3 +--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 8aacd76bb702..9531b1c1b190 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -767,8 +767,7 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long 
flags,
  unsigned long pte_index, unsigned long avpn);
 long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu);
 long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
-  unsigned long pte_index, unsigned long avpn,
-  unsigned long va);
+  unsigned long pte_index, unsigned long avpn);
 long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long pte_index);
 long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 88da2764c1bb..7af7c70f1468 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -673,8 +673,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 }
 
 long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
- unsigned long pte_index, unsigned long avpn,
- unsigned long va)
+ unsigned long pte_index, unsigned long avpn)
 {
struct kvm *kvm = vcpu->kvm;
__be64 *hpte;
-- 
2.23.0



[PATCH v1 06/12] KVM: PPC: Book3S HV: Remove redundant mtspr PSPB

2021-04-11 Thread Nicholas Piggin
This SPR is set to 0 twice when exiting the guest.

Acked-by: Paul Mackerras 
Suggested-by: Fabiano Rosas 
Reviewed-by: Daniel Axtens 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 70c6e9c27eb7..b88df175aa76 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3790,7 +3790,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
mtspr(SPRN_DSCR, host_dscr);
mtspr(SPRN_TIDR, host_tidr);
mtspr(SPRN_IAMR, host_iamr);
-   mtspr(SPRN_PSPB, 0);
 
if (host_amr != vcpu->arch.amr)
mtspr(SPRN_AMR, host_amr);
-- 
2.23.0



[PATCH v1 05/12] KVM: PPC: Book3S HV: Prevent radix guests setting LPCR[TC]

2021-04-11 Thread Nicholas Piggin
Prevent radix guests setting LPCR[TC]. This bit only applies to hash
partitions.

Reviewed-by: Alexey Kardashevskiy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3de8a1f89a7d..70c6e9c27eb7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1645,6 +1645,10 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct 
kvm_vcpu *vcpu,
  */
 unsigned long kvmppc_filter_lpcr_hv(struct kvm *kvm, unsigned long lpcr)
 {
+   /* LPCR_TC only applies to HPT guests */
+   if (kvm_is_radix(kvm))
+   lpcr &= ~LPCR_TC;
+
/* On POWER8 and above, userspace can modify AIL */
if (!cpu_has_feature(CPU_FTR_ARCH_207S))
lpcr &= ~LPCR_AIL;
-- 
2.23.0



[PATCH v1 04/12] KVM: PPC: Book3S HV: Disallow LPCR[AIL] to be set to 1 or 2

2021-04-11 Thread Nicholas Piggin
These are already disallowed by H_SET_MODE from the guest, also disallow
these by updating LPCR directly.

AIL modes can affect the host interrupt behaviour while the guest LPCR
value is set, so filter it here too.

Acked-by: Paul Mackerras 
Suggested-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 268e31c7e49c..3de8a1f89a7d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -803,7 +803,10 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
vcpu->arch.dawrx1 = value2;
return H_SUCCESS;
case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE:
-   /* KVM does not support mflags=2 (AIL=2) */
+   /*
+* KVM does not support mflags=2 (AIL=2) and AIL=1 is reserved.
+* Keep this in synch with kvmppc_filter_guest_lpcr_hv.
+*/
if (mflags != 0 && mflags != 3)
return H_UNSUPPORTED_FLAG_START;
return H_TOO_HARD;
@@ -1645,6 +1648,8 @@ unsigned long kvmppc_filter_lpcr_hv(struct kvm *kvm, 
unsigned long lpcr)
/* On POWER8 and above, userspace can modify AIL */
if (!cpu_has_feature(CPU_FTR_ARCH_207S))
lpcr &= ~LPCR_AIL;
+   if ((lpcr & LPCR_AIL) != LPCR_AIL_3)
+   lpcr &= ~LPCR_AIL; /* LPCR[AIL]=1/2 is disallowed */
 
/*
 * On POWER9, allow userspace to enable large decrementer for the
-- 
2.23.0



[PATCH v1 03/12] KVM: PPC: Book3S HV: Add a function to filter guest LPCR bits

2021-04-11 Thread Nicholas Piggin
Guest LPCR depends on hardware type, and future changes will add
restrictions based on errata and guest MMU mode. Move this logic
to a common function and use it for the cases where the guest
wants to update its LPCR (or the LPCR of a nested guest).

This also adds a warning in other places that set or update LPCR
if we try to set something that would have been disallowed by
the filter, as a sanity check.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_book3s.h |  2 +
 arch/powerpc/kvm/book3s_hv.c  | 68 ---
 arch/powerpc/kvm/book3s_hv_nested.c   |  8 +++-
 3 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 2f5f919f6cd3..c58121508157 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -258,6 +258,8 @@ extern long kvmppc_hv_get_dirty_log_hpt(struct kvm *kvm,
 extern void kvmppc_harvest_vpa_dirty(struct kvmppc_vpa *vpa,
struct kvm_memory_slot *memslot,
unsigned long *map);
+extern unsigned long kvmppc_filter_lpcr_hv(struct kvm *kvm,
+   unsigned long lpcr);
 extern void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
unsigned long mask);
 extern void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 208a053c9adf..268e31c7e49c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1635,6 +1635,35 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct 
kvm_vcpu *vcpu,
return 0;
 }
 
+/*
+ * Enforce limits on guest LPCR values based on hardware availability,
+ * guest configuration, and possibly hypervisor support and security
+ * concerns.
+ */
+unsigned long kvmppc_filter_lpcr_hv(struct kvm *kvm, unsigned long lpcr)
+{
+   /* On POWER8 and above, userspace can modify AIL */
+   if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+   lpcr &= ~LPCR_AIL;
+
+   /*
+* On POWER9, allow userspace to enable large decrementer for the
+* guest, whether or not the host has it enabled.
+*/
+   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   lpcr &= ~LPCR_LD;
+
+   return lpcr;
+}
+
+static void verify_lpcr(struct kvm *kvm, unsigned long lpcr)
+{
+   if (lpcr != kvmppc_filter_lpcr_hv(kvm, lpcr)) {
+   WARN_ONCE(1, "lpcr 0x%lx differs from filtered 0x%lx\n",
+ lpcr, kvmppc_filter_lpcr_hv(kvm, lpcr));
+   }
+}
+
 static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
bool preserve_top32)
 {
@@ -1643,6 +1672,23 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 
new_lpcr,
u64 mask;
 
	spin_lock(&vc->lock);
+
+   /*
+* Userspace can only modify
+* DPFD (default prefetch depth), ILE (interrupt little-endian),
+* TC (translation control), AIL (alternate interrupt location),
+* LD (large decrementer).
+* These are subject to restrictions from kvmppc_filter_lpcr_hv().
+*/
+   mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD;
+
+   /* Broken 32-bit version of LPCR must not clear top bits */
+   if (preserve_top32)
+   mask &= 0xFFFFFFFF;
+
+   new_lpcr = kvmppc_filter_lpcr_hv(kvm,
+   (vc->lpcr & ~mask) | (new_lpcr & mask));
+
/*
 * If ILE (interrupt little-endian) has changed, update the
 * MSR_LE bit in the intr_msr for each vcpu in this vcore.
@@ -1661,25 +1707,8 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 
new_lpcr,
}
}
 
-   /*
-* Userspace can only modify DPFD (default prefetch depth),
-* ILE (interrupt little-endian) and TC (translation control).
-* On POWER8 and POWER9 userspace can also modify AIL (alt. interrupt 
loc.).
-*/
-   mask = LPCR_DPFD | LPCR_ILE | LPCR_TC;
-   if (cpu_has_feature(CPU_FTR_ARCH_207S))
-   mask |= LPCR_AIL;
-   /*
-* On POWER9, allow userspace to enable large decrementer for the
-* guest, whether or not the host has it enabled.
-*/
-   if (cpu_has_feature(CPU_FTR_ARCH_300))
-   mask |= LPCR_LD;
+   vc->lpcr = new_lpcr;
 
-   /* Broken 32-bit version of LPCR must not clear top bits */
-   if (preserve_top32)
-   mask &= 0xFFFFFFFF;
-   vc->lpcr = (vc->lpcr & ~mask) | (new_lpcr & mask);
	spin_unlock(&vc->lock);
 }
 
@@ -4644,8 +4673,10 @@ void kvmppc_update_lpcr(struct kvm *kvm, unsigned long 
lpcr, unsigned long mask)
struct kvmppc_vcore *vc = kvm->arch.vcores[i];
if (!vc)
continue;
+
	spin_lock(&vc->lock);
vc->lpcr = (vc->lpcr & ~mask) | lpcr;
+ 

[PATCH v1 02/12] KVM: PPC: Book3S HV: Nested move LPCR sanitising to sanitise_hv_regs

2021-04-11 Thread Nicholas Piggin
This will get a bit more complicated in future patches. Move it
into the helper function.

This change allows the L1 hypervisor to determine some of the LPCR
bits that the L0 is using to run it, which could be a privilege
violation (LPCR is HV-privileged), although the same problem exists
now for HFSCR for example. Discussion of the HV privilege issue is
ongoing and can be resolved with a later change.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_nested.c | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 0cd0e7aad588..3060e5deffc8 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -132,8 +132,27 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, 
int trap,
}
 }
 
+/*
+ * This can result in some L0 HV register state being leaked to an L1
+ * hypervisor when the hv_guest_state is copied back to the guest after
+ * being modified here.
+ *
+ * There is no known problem with such a leak, and in many cases these
+ * register settings could be derived by the guest by observing behaviour
+ * and timing, interrupts, etc., but it is an issue to consider.
+ */
 static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
 {
+   struct kvmppc_vcore *vc = vcpu->arch.vcore;
+   u64 mask;
+
+   /*
+* Don't let L1 change LPCR bits for the L2 except these:
+*/
+   mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
+   LPCR_LPES | LPCR_MER;
+   hr->lpcr = (vc->lpcr & ~mask) | (hr->lpcr & mask);
+
/*
 * Don't let L1 enable features for L2 which we've disabled for L1,
 * but preserve the interrupt cause field.
@@ -271,8 +290,6 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
u64 hv_ptr, regs_ptr;
u64 hdec_exp;
s64 delta_purr, delta_spurr, delta_ic, delta_vtb;
-   u64 mask;
-   unsigned long lpcr;
 
if (vcpu->kvm->arch.l1_ptcr == 0)
return H_NOT_AVAILABLE;
@@ -321,9 +338,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
vcpu->arch.nested_vcpu_id = l2_hv.vcpu_token;
vcpu->arch.regs = l2_regs;
vcpu->arch.shregs.msr = vcpu->arch.regs.msr;
-   mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
-   LPCR_LPES | LPCR_MER;
-   lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask);
+
	sanitise_hv_regs(vcpu, &l2_hv);
	restore_hv_regs(vcpu, &l2_hv);
 
@@ -335,7 +350,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
r = RESUME_HOST;
break;
}
-   r = kvmhv_run_single_vcpu(vcpu, hdec_exp, lpcr);
+   r = kvmhv_run_single_vcpu(vcpu, hdec_exp, l2_hv.lpcr);
} while (is_kvmppc_resume_guest(r));
 
/* save L2 state for return */
-- 
2.23.0



[PATCH v1 00/12] minor KVM fixes and cleanups

2021-04-11 Thread Nicholas Piggin
Here is the first batch of patches, extracted from the KVM C conversion
series, plus one new fix (host CTRL not restored) since v6 was posted.

Please consider for merging.

Thanks,
Nick

Nicholas Piggin (12):
  KVM: PPC: Book3S HV P9: Restore host CTRL SPR after guest exit
  KVM: PPC: Book3S HV: Nested move LPCR sanitising to sanitise_hv_regs
  KVM: PPC: Book3S HV: Add a function to filter guest LPCR bits
  KVM: PPC: Book3S HV: Disallow LPCR[AIL] to be set to 1 or 2
  KVM: PPC: Book3S HV: Prevent radix guests setting LPCR[TC]
  KVM: PPC: Book3S HV: Remove redundant mtspr PSPB
  KVM: PPC: Book3S HV: remove unused kvmppc_h_protect argument
  KVM: PPC: Book3S HV: Fix CONFIG_SPAPR_TCE_IOMMU=n default hcalls
  powerpc/64s: Remove KVM handler support from CBE_RAS interrupts
  powerpc/64s: remove KVM SKIP test from instruction breakpoint handler
  KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR
  KVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR

 arch/powerpc/include/asm/kvm_book3s.h |  2 +
 arch/powerpc/include/asm/kvm_ppc.h|  3 +-
 arch/powerpc/kernel/exceptions-64s.S  | 15 +++--
 arch/powerpc/kvm/book3s_hv.c  | 85 ---
 arch/powerpc/kvm/book3s_hv_builtin.c  |  3 +
 arch/powerpc/kvm/book3s_hv_nested.c   | 37 +---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |  3 +-
 7 files changed, 109 insertions(+), 39 deletions(-)

-- 
2.23.0



[PATCH v1 01/12] KVM: PPC: Book3S HV P9: Restore host CTRL SPR after guest exit

2021-04-11 Thread Nicholas Piggin
The host CTRL (runlatch) value is not restored after guest exit. The
host CTRL should always be 1 except in CPU idle code, so this can result
in the host running with runlatch clear, and potentially switching to
a different vCPU which then runs with runlatch clear as well.

This has little effect on P9 machines: CTRL is only responsible for some
PMU counter logic in the host, so other than corner cases of software
relying on that, or explicitly reading the runlatch value (Linux does
not appear to be affected, but it's possible non-Linux guests could be),
there should be no execution correctness problem, though it could be
used as a covert channel between guests.

There may be microcontrollers, firmware or monitoring tools that sample
the runlatch value out-of-band, however since the register is writable
by guests, these values would (should) not be relied upon for correct
operation of the host, so suboptimal performance or incorrect reporting
should be the worst problem.

Fixes: 95a6432ce9038 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path 
on P9 for radix guests")
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 13bad6bf4c95..208a053c9adf 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3728,7 +3728,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vcpu->arch.dec_expires = dec + tb;
vcpu->cpu = -1;
vcpu->arch.thread_cpu = -1;
+   /* Save guest CTRL register, set runlatch to 1 */
vcpu->arch.ctrl = mfspr(SPRN_CTRLF);
+   if (!(vcpu->arch.ctrl & 1))
+   mtspr(SPRN_CTRLT, vcpu->arch.ctrl | 1);
 
vcpu->arch.iamr = mfspr(SPRN_IAMR);
vcpu->arch.pspb = mfspr(SPRN_PSPB);
-- 
2.23.0



Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-11 Thread Matthew Wilcox
On Sun, Apr 11, 2021 at 11:33:18AM +0100, Matthew Wilcox wrote:
> Basically, we have three aligned dwords here.  We can either alias with
> @flags and the first word of @lru, or the second word of @lru and @mapping,
> or @index and @private.  @flags is a non-starter.  If we use @mapping,
> then you have to set it to NULL before you free it, and I'm not sure
> how easy that will be for you.  If that's trivial, then we could use
> the layout:
> 
>   unsigned long _pp_flags;
>   unsigned long pp_magic;
>   union {
>   dma_addr_t dma_addr;/* might be one or two words */
>   unsigned long _pp_align[2];
>   };
>   unsigned long pp_pfmemalloc;
>   unsigned long xmi;

I forgot about the munmap path.  That calls zap_page_range() which calls
set_page_dirty() which calls page_mapping().  If we use page->mapping,
that's going to get interpreted as an address_space pointer.

*sigh*.  Foiled at every turn.

I'm kind of inclined towards using two (or more) bits for PageSlab as
we discussed here:

https://lore.kernel.org/linux-mm/01000163efe179fe-d6270c58-eaba-482f-a6bd-334667250ef7-000...@email.amazonses.com/

so we have PageKAlloc that's true for PageSlab, PagePool, PageDMAPool,
PageVMalloc, PageFrag and maybe a few other kernel-internal allocations.

(see also here:)
https://lore.kernel.org/linux-mm/20180518194519.3820-18-wi...@infradead.org/


Re: [PATCH 02/16] powerpc/vas: Make VAS API powerpc platform independent

2021-04-11 Thread Haren Myneni


Christophe,
Thanks for your comments. Please see below for my responses.

On Sun, 2021-04-11 at 10:49 +0200, Christophe Leroy wrote:
> 
> On 11/04/2021 at 02:31, Haren Myneni wrote:
> > Using the same /dev/crypto/nx-gzip interface for both powerNV and
> > pseries. So this patch moves the VAS API to a powerpc platform independent
> > directory. The actual functionality is not changed in this patch.
> 
> This patch seems to do a lot more than moving VAS API to independent
> directory. A more detailed 
> description would help.

Actually the functionality is not changed in this patch. I moved the
common interface code (needed for both powerNV and powerVM platforms)
to arch/powerpc/kernel and added hooks to invoke powerNV specific
functions (the underlying code is not changed). I will add some more
description.

> 
> And it is not something defined in the powerpc architecture I think,
> so it should
> remain in some common platform related directory.
> 
> > Signed-off-by: Haren Myneni 
> > ---
> >   arch/powerpc/Kconfig  | 15 +
> >   arch/powerpc/include/asm/vas.h| 22 ++-
> >   arch/powerpc/kernel/Makefile  |  1 +
> >   .../{platforms/powernv => kernel}/vas-api.c   | 64 ++--
> > --
> >   arch/powerpc/platforms/powernv/Kconfig| 14 
> >   arch/powerpc/platforms/powernv/Makefile   |  2 +-
> >   arch/powerpc/platforms/powernv/vas-window.c   | 66
> > +++
> >   7 files changed, 140 insertions(+), 44 deletions(-)
> >   rename arch/powerpc/{platforms/powernv => kernel}/vas-api.c (83%)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 386ae12d8523..7aa1fbf7c1dc 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -478,6 +478,21 @@ config PPC_UV
> >   
> >   If unsure, say "N".
> >   
> > +config PPC_VAS
> > +   bool "IBM Virtual Accelerator Switchboard (VAS)"
> > +   depends on PPC_POWERNV && PPC_64K_PAGES
> > +   default y
> > +   help
> > + This enables support for IBM Virtual Accelerator Switchboard
> > (VAS).
> 
> IIUC it is a functionality in a coprocessor of some IBM processors.
> Something similar in principle to 
> the communication coprocessors we find in Freescale processors.
> 
> It is not a generic functionality that is part of the powerpc architecture,
> so I don't think this belongs in
> arch/powerpc/Kconfig
> 
> I think it should go in arch/powerpc/platforms/Kconfig
> 
> Or maybe in drivers/soc/ibm/ ?

VAS (Virtual Accelerator Switchboard) is on the powerpc chip and allows
userspace / kernel to communicate with accelerators (right now only the
Nest (NX) accelerator is supported). VAS can also provide other SW
functionalities such as fast thread wakeup and fast memory copy (using
copy/paste instructions). So this VAS API was introduced to support any
accelerator and other SW functionalities in the future.

VAS is introduced on P9 and will be available in all future powerpc
chips.

I will move the config changes to arch/powerpc/platforms/Kconfig, but I
thought it would be easier to manage the Kconfig changes if they were in
the same directory.

Thanks
Haren
> 
> 
> > +
> > + VAS allows accelerators in co-processors like NX-GZIP and NX-
> > 842
> > + to be accessible to kernel subsystems and user processes.
> > + VAS adapters are found in POWER9 and later based systems.
> > + The user mode NX-GZIP support is added on P9 for powerNV and
> > on
> > + P10 for powerVM.
> > +
> > + If unsure, say "N".
> > +
> >   config LD_HEAD_STUB_CATCH
> > bool "Reserve 256 bytes to cope with linker stubs in HEAD text"
> > if EXPERT
> > depends on PPC64
> > diff --git a/arch/powerpc/include/asm/vas.h
> > b/arch/powerpc/include/asm/vas.h
> > index 41f73fae7ab8..6bbade60d8f4 100644
> > --- a/arch/powerpc/include/asm/vas.h
> > +++ b/arch/powerpc/include/asm/vas.h
> > @@ -5,6 +5,8 @@
> >   
> >   #ifndef _ASM_POWERPC_VAS_H
> >   #define _ASM_POWERPC_VAS_H
> > +#include 
> > +
> >   
> >   struct vas_window;
> >   
> > @@ -48,6 +50,16 @@ enum vas_cop_type {
> > VAS_COP_TYPE_MAX,
> >   };
> >   
> > +/*
> > + * User space window operations used for powernv and powerVM
> > + */
> > +struct vas_user_win_ops {
> > +   struct vas_window * (*open_win)(struct vas_tx_win_open_attr *,
> > +   enum vas_cop_type);
> > +   u64 (*paste_addr)(void *);
> > +   int (*close_win)(void *);
> > +};
> > +
> >   /*
> >* Receive window attributes specified by the (in-kernel) owner
> > of window.
> >*/
> > @@ -161,6 +173,9 @@ int vas_copy_crb(void *crb, int offset);
> >* assumed to be true for NX windows.
> >*/
> >   int vas_paste_crb(struct vas_window *win, int offset, bool re);
> > +int vas_register_api_powernv(struct module *mod, enum vas_cop_type
> > cop_type,
> > +const char *name);
> > +void vas_unregister_api_powernv(void);
> >   
> >   /*
> >* Register / unregister coprocessor type to VAS API which will
> > be exported
> > @@ -170,8 +185,9 
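
To illustrate how a coprocessor driver plugs into the relocated API, here is
a minimal hedged sketch using the registration hooks declared in the vas.h
hunk above. The GZIP cop type is assumed from the full enum vas_cop_type,
and module boilerplate is elided:

static int __init nx_gzip_init(void)
{
	/* Exposes /dev/crypto/nx-gzip backed by the powerNV window ops */
	return vas_register_api_powernv(THIS_MODULE, VAS_COP_TYPE_GZIP,
					"nx-gzip");
}

static void __exit nx_gzip_exit(void)
{
	vas_unregister_api_powernv();
}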

Re: Bogus struct page layout on 32-bit

2021-04-11 Thread Matthew Wilcox
On Sat, Apr 10, 2021 at 09:10:47PM +0200, Arnd Bergmann wrote:
> On Sat, Apr 10, 2021 at 4:44 AM Matthew Wilcox  wrote:
> > +   dma_addr_t dma_addr __packed;
> > };
> > struct {/* slab, slob and slub */
> > union {
> >
> > but I don't know if GCC is smart enough to realise that dma_addr is now
> > on an 8 byte boundary and it can use a normal instruction to access it,
> > or whether it'll do something daft like use byte loads to access it.
> >
> > We could also do:
> >
> > +   dma_addr_t dma_addr __packed __aligned(sizeof(void 
> > *));
> >
> > and I see pahole, at least sees this correctly:
> >
> > struct {
> > long unsigned int _page_pool_pad; /* 4 4 */
> > dma_addr_t dma_addr 
> > __attribute__((__aligned__(4))); /* 8 8 */
> > } __attribute__((__packed__)) 
> > __attribute__((__aligned__(4)));
> >
> > This presumably affects any 32-bit architecture with a 64-bit phys_addr_t
> > / dma_addr_t.  Advice, please?
> 
> I've tried out what gcc would make of this:  https://godbolt.org/z/aTEbxxbG3
> 
> struct page {
> short a;
> struct {
> short b;
> long long c __attribute__((packed, aligned(2)));
> } __attribute__((packed));
> } __attribute__((aligned(8)));
> 
> In this structure, 'c' is clearly aligned to eight bytes, and gcc does
> realize that
> it is safe to use the 'ldrd' instruction for 32-bit arm, which is forbidden on
> struct members with less than 4 byte alignment. However, it also complains
> that passing a pointer to 'c' into a function that expects a 'long long' is 
> not
> allowed because alignof(c) is only '2' here.
> 
> (I used 'short' here because I having a 64-bit member misaligned by four
> bytes wouldn't make a difference to the instructions on Arm, or any other
> 32-bit architecture I can think of, regardless of the ABI requirements).

So ... we could do this:

+++ b/include/linux/types.h
@@ -140,7 +140,7 @@ typedef u64 blkcnt_t;
  * so they don't care about the size of the actual bus addresses.
  */
 #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
-typedef u64 dma_addr_t;
+typedef u64 __attribute__((aligned(sizeof(void *)))) dma_addr_t;
 #else
 typedef u32 dma_addr_t;
 #endif

but I'm a little scared that this might have unintended consequences.
And Jesper points out that a big-endian 64-bit dma_addr_t can impersonate
a PageTail page, and we should solve that problem while we're at it.
So I don't think we should do this, but thought I should mention it as
a possibility.


Re: sysctl: setting key "net.core.bpf_jit_enable": Invalid argument

2021-04-11 Thread Paul Menzel

Dear Christophe,


On 11.04.21 at 18:23, Christophe Leroy wrote:


On 11/04/2021 at 13:09, Paul Menzel wrote:


Related to * [CVE-2021-29154] Linux kernel incorrect computation of 
branch displacements in BPF JIT compiler can be abused to execute 
arbitrary code in Kernel mode* [1], on the POWER8 system IBM S822LC 
with self-built Linux 5.12.0-rc5+, I am unable to disable 
`bpf_jit_enable`.


    $ /sbin/sysctl net.core.bpf_jit_enable
    net.core.bpf_jit_enable = 1
    $ sudo /sbin/sysctl -w net.core.bpf_jit_enable=0
    sysctl: setting key "net.core.bpf_jit_enable": Invalid argument

It works on an x86 with Debian sid/unstable and Linux 5.10.26-1.


Maybe you have selected CONFIG_BPF_JIT_ALWAYS_ON in your self-built 
kernel ?


config BPF_JIT_ALWAYS_ON
 bool "Permanently enable BPF JIT and remove BPF interpreter"
 depends on BPF_SYSCALL && HAVE_EBPF_JIT && BPF_JIT
 help
   Enables BPF JIT and removes BPF interpreter to avoid
   speculative execution of BPF instructions by the interpreter


Thank you. Indeed. In contrast to Debian, Ubuntu’s Linux configuration 
selects that option, and I copied that.


$ grep _BPF_JIT /boot/config-5.8.0-49-generic
/boot/config-5.8.0-49-generic:CONFIG_BPF_JIT_ALWAYS_ON=y
/boot/config-5.8.0-49-generic:CONFIG_BPF_JIT_DEFAULT_ON=y
/boot/config-5.8.0-49-generic:CONFIG_BPF_JIT=y

I wonder if there is a way to better integrate that option into
`/proc/sys`, so it’s clear that it’s always enabled.



Kind regards,

Paul


Re: [PATCH v6 3/9] riscv: locks: Introduce ticket-based spinlock implementation

2021-04-11 Thread Guo Ren
On Mon, Apr 12, 2021 at 12:02 AM Guo Ren  wrote:
>
> On Wed, Mar 31, 2021 at 10:32 PM  wrote:
> >
> > From: Guo Ren 
> >
> > This patch introduces a ticket lock implementation for riscv, along the
> > same lines as the implementation for arch/arm & arch/csky.
> >
> > We still use qspinlock as default.
> >
> > Signed-off-by: Guo Ren 
> > Cc: Peter Zijlstra 
> > Cc: Anup Patel 
> > Cc: Arnd Bergmann 
> > ---
> >  arch/riscv/Kconfig  |  7 ++-
> >  arch/riscv/include/asm/spinlock.h   | 84 +
> >  arch/riscv/include/asm/spinlock_types.h | 17 +
> >  3 files changed, 107 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 67cc65ba1ea1..34d0276f01d5 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -34,7 +34,7 @@ config RISCV
> > select ARCH_WANT_FRAME_POINTERS
> > select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> > select ARCH_USE_QUEUED_RWLOCKS
> > -   select ARCH_USE_QUEUED_SPINLOCKS
> > +   select ARCH_USE_QUEUED_SPINLOCKSif !RISCV_TICKET_LOCK
> > select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> > select CLONE_BACKWARDS
> > select CLINT_TIMER if !MMU
> > @@ -344,6 +344,11 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> > def_bool y
> > depends on NUMA
> >
> > +config RISCV_TICKET_LOCK
> > +   bool "Ticket-based spin-locking"
> > +   help
> > + Say Y here to use ticket-based spin-locking.
> > +
> >  config RISCV_ISA_C
> > bool "Emit compressed instructions when building Linux"
> > default y
> > diff --git a/arch/riscv/include/asm/spinlock.h 
> > b/arch/riscv/include/asm/spinlock.h
> > index a557de67a425..90b7eaa950cf 100644
> > --- a/arch/riscv/include/asm/spinlock.h
> > +++ b/arch/riscv/include/asm/spinlock.h
> > @@ -7,7 +7,91 @@
> >  #ifndef _ASM_RISCV_SPINLOCK_H
> >  #define _ASM_RISCV_SPINLOCK_H
> >
> > +#ifdef CONFIG_RISCV_TICKET_LOCK
> > +#ifdef CONFIG_32BIT
> > +#define __ASM_SLLIW "slli\t"
> > +#define __ASM_SRLIW "srli\t"
> > +#else
> > +#define __ASM_SLLIW "slliw\t"
> > +#define __ASM_SRLIW "srliw\t"
> > +#endif
> > +
> > +/*
> > + * Ticket-based spin-locking.
> > + */
> > +static inline void arch_spin_lock(arch_spinlock_t *lock)
> > +{
> > +   arch_spinlock_t lockval;
> > +   u32 tmp;
> > +
> > +   asm volatile (
> > +   "1: lr.w%0, %2  \n"
> > +   "   mv  %1, %0  \n"
> > +   "   addw%0, %0, %3  \n"
> > +   "   sc.w%0, %0, %2  \n"
> > +   "   bnez%0, 1b  \n"
> > +   : "=&r" (tmp), "=&r" (lockval), "+A" (lock->lock)
> > +   : "r" (1 << TICKET_NEXT)
> > +   : "memory");
> > +
> > +   smp_cond_load_acquire(&lock->tickets.owner,
> > +   VAL == lockval.tickets.next);
It's wrong; below is the fixup:
>
> diff --git a/arch/csky/include/asm/spinlock.h 
> b/arch/csky/include/asm/spinlock.h
> index fe98ad8ece51..2be627ceb9df 100644
> --- a/arch/csky/include/asm/spinlock.h
> +++ b/arch/csky/include/asm/spinlock.h
> @@ -27,7 +27,8 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
> : "r"(p), "r"(ticket_next)
> : "cc");
>
> -   smp_cond_load_acquire(&lock->tickets.owner,
> +   if (lockval.tickets.owner != lockval.tickets.next)
> +   smp_cond_load_acquire(&lock->tickets.owner,
> VAL == lockval.tickets.next);
eh... plus __smp_acquire_fence:

   if (lockval.tickets.owner != lockval.tickets.next)
   smp_cond_load_acquire(&lock->tickets.owner,
VAL == lockval.tickets.next);
   else
   __smp_acquire_fence();

> > +}
> > +
> > +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> > +{
> > +   u32 tmp, contended, res;
> > +
> > +   do {
> > +   asm volatile (
> > +   "   lr.w%0, %3  \n"
> > +   __ASM_SRLIW"%1, %0, %5  \n"
> > +   __ASM_SLLIW"%2, %0, %5  \n"
> > +   "   or  %1, %2, %1  \n"
> > +   "   li  %2, 0   \n"
> > +   "   sub %1, %1, %0  \n"
> > +   "   bnez%1, 1f  \n"
> > +   "   addw%0, %0, %4  \n"
> > +   "   sc.w%2, %0, %3  \n"
> > +   "1: \n"
> > +   : "=&r" (tmp), "=&r" (contended), "=&r" (res),
> > + "+A" (lock->lock)
> > +   : "r" (1 << TICKET_NEXT), "I" (TICKET_NEXT)
> > +   : "memory");
> > +   } while (res);
> > +
> > +   if (!contended)
> > +   __atomic_acquire_fence();
> > +
> > +   return !contended;
> > +}
> > +
> > +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > +{
> > +   
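
Putting the two corrections together, a hedged C sketch of the intended
lock-acquire flow (the ticket-increment helper is hypothetical, standing in
for the lr.w/sc.w loop above): spin-wait only when the ticket just taken is
not already the owner; otherwise only the acquire fence is needed.

static inline void arch_spin_lock(arch_spinlock_t *lock)
{
	/* Hypothetical helper: atomically bump tickets.next and return
	 * the old lock value, as the lr.w/sc.w loop above does. */
	arch_spinlock_t lockval = ticket_fetch_inc_next(lock);

	if (lockval.tickets.owner != lockval.tickets.next)
		smp_cond_load_acquire(&lock->tickets.owner,
				      VAL == lockval.tickets.next);
	else
		__smp_acquire_fence();
}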

[PATCH] powerpc/signal32: Fix build failure with CONFIG_SPE

2021-04-11 Thread Christophe Leroy
Add a missing fault exit label in unsafe_copy_from_user() in order to
avoid the following build failure with CONFIG_SPE:

  CC  arch/powerpc/kernel/signal_32.o
arch/powerpc/kernel/signal_32.c: In function 'restore_user_regs':
arch/powerpc/kernel/signal_32.c:565:36: error: macro "unsafe_copy_from_user" 
requires 4 arguments, but only 3 given
  565 |   ELF_NEVRREG * sizeof(u32));
  |^
In file included from ./include/linux/uaccess.h:11,
 from ./include/linux/sched/task.h:11,
 from ./include/linux/sched/signal.h:9,
 from ./include/linux/rcuwait.h:6,
 from ./include/linux/percpu-rwsem.h:7,
 from ./include/linux/fs.h:33,
 from ./include/linux/huge_mm.h:8,
 from ./include/linux/mm.h:707,
 from arch/powerpc/kernel/signal_32.c:17:
./arch/powerpc/include/asm/uaccess.h:428: note: macro "unsafe_copy_from_user" 
defined here
  428 | #define unsafe_copy_from_user(d, s, l, e) \
  |
arch/powerpc/kernel/signal_32.c:564:3: error: 'unsafe_copy_from_user' 
undeclared (first use in this function); did you mean 'raw_copy_from_user'?
  564 |   unsafe_copy_from_user(current->thread.evr, >mc_vregs,
  |   ^
  |   raw_copy_from_user
arch/powerpc/kernel/signal_32.c:564:3: note: each undeclared identifier is 
reported only once for each function it appears in
make[3]: *** [arch/powerpc/kernel/signal_32.o] Error 1

Signed-off-by: Christophe Leroy 
Fixes: 627b72bee84d ("powerpc/signal32: Convert restore_[tm]_user_regs() to 
user access block")
Reported-by: kernel test robot 
Reported-by: Guenter Roeck 
---
 arch/powerpc/kernel/signal_32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 23fdb364b511..d489ccea2ab3 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -562,7 +562,7 @@ static long restore_user_regs(struct pt_regs *regs,
if (msr & MSR_SPE) {
/* restore spe registers from the stack */
unsafe_copy_from_user(current->thread.evr, >mc_vregs,
- ELF_NEVRREG * sizeof(u32));
+ ELF_NEVRREG * sizeof(u32), failed);
current->thread.used_spe = true;
} else if (current->thread.used_spe)
memset(current->thread.evr, 0, ELF_NEVRREG * sizeof(u32));
-- 
2.25.0
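
For context, the fourth argument added here is the goto label taken when an
access inside the user access block faults. A hedged, simplified sketch of
the surrounding pattern in restore_user_regs():

	if (!user_read_access_begin(sr, sizeof(*sr)))
		return 1;

	/* unsafe_* accessors jump to 'failed' if the user access faults */
	unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs,
			      ELF_NEVRREG * sizeof(u32), failed);

	user_read_access_end();
	return 0;

failed:
	user_read_access_end();
	return 1;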



Re: sysctl: setting key "net.core.bpf_jit_enable": Invalid argument

2021-04-11 Thread Christophe Leroy




On 11/04/2021 at 13:09, Paul Menzel wrote:

Dear Linux folks,


Related to * [CVE-2021-29154] Linux kernel incorrect computation of branch displacements in BPF JIT 
compiler can be abused to execute arbitrary code in Kernel mode* [1], on the POWER8 system IBM 
S822LC with self-built Linux 5.12.0-rc5+, I am unable to disable `bpf_jit_enable`.


    $ /sbin/sysctl net.core.bpf_jit_enable
    net.core.bpf_jit_enable = 1
    $ sudo /sbin/sysctl -w net.core.bpf_jit_enable=0
    sysctl: setting key "net.core.bpf_jit_enable": Invalid argument

It works on an x86 with Debian sid/unstable and Linux 5.10.26-1.


Maybe you have selected CONFIG_BPF_JIT_ALWAYS_ON in your self-built kernel ?

config BPF_JIT_ALWAYS_ON
bool "Permanently enable BPF JIT and remove BPF interpreter"
depends on BPF_SYSCALL && HAVE_EBPF_JIT && BPF_JIT
help
  Enables BPF JIT and removes BPF interpreter to avoid
  speculative execution of BPF instructions by the interpreter


Christophe


Re: [PATCH v6 3/9] riscv: locks: Introduce ticket-based spinlock implementation

2021-04-11 Thread Guo Ren
On Wed, Mar 31, 2021 at 10:32 PM  wrote:
>
> From: Guo Ren 
>
> This patch introduces a ticket lock implementation for riscv, along the
> same lines as the implementation for arch/arm & arch/csky.
>
> We still use qspinlock as default.
>
> Signed-off-by: Guo Ren 
> Cc: Peter Zijlstra 
> Cc: Anup Patel 
> Cc: Arnd Bergmann 
> ---
>  arch/riscv/Kconfig  |  7 ++-
>  arch/riscv/include/asm/spinlock.h   | 84 +
>  arch/riscv/include/asm/spinlock_types.h | 17 +
>  3 files changed, 107 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 67cc65ba1ea1..34d0276f01d5 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -34,7 +34,7 @@ config RISCV
> select ARCH_WANT_FRAME_POINTERS
> select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> select ARCH_USE_QUEUED_RWLOCKS
> -   select ARCH_USE_QUEUED_SPINLOCKS
> +   select ARCH_USE_QUEUED_SPINLOCKSif !RISCV_TICKET_LOCK
> select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> select CLONE_BACKWARDS
> select CLINT_TIMER if !MMU
> @@ -344,6 +344,11 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> def_bool y
> depends on NUMA
>
> +config RISCV_TICKET_LOCK
> +   bool "Ticket-based spin-locking"
> +   help
> + Say Y here to use ticket-based spin-locking.
> +
>  config RISCV_ISA_C
> bool "Emit compressed instructions when building Linux"
> default y
> diff --git a/arch/riscv/include/asm/spinlock.h 
> b/arch/riscv/include/asm/spinlock.h
> index a557de67a425..90b7eaa950cf 100644
> --- a/arch/riscv/include/asm/spinlock.h
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -7,7 +7,91 @@
>  #ifndef _ASM_RISCV_SPINLOCK_H
>  #define _ASM_RISCV_SPINLOCK_H
>
> +#ifdef CONFIG_RISCV_TICKET_LOCK
> +#ifdef CONFIG_32BIT
> +#define __ASM_SLLIW "slli\t"
> +#define __ASM_SRLIW "srli\t"
> +#else
> +#define __ASM_SLLIW "slliw\t"
> +#define __ASM_SRLIW "srliw\t"
> +#endif
> +
> +/*
> + * Ticket-based spin-locking.
> + */
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> +   arch_spinlock_t lockval;
> +   u32 tmp;
> +
> +   asm volatile (
> +   "1: lr.w%0, %2  \n"
> +   "   mv  %1, %0  \n"
> +   "   addw%0, %0, %3  \n"
> +   "   sc.w%0, %0, %2  \n"
> +   "   bnez%0, 1b  \n"
> +   : "=&r" (tmp), "=&r" (lockval), "+A" (lock->lock)
> +   : "r" (1 << TICKET_NEXT)
> +   : "memory");
> +
> +   smp_cond_load_acquire(&lock->tickets.owner,
> +   VAL == lockval.tickets.next);
It's wrong; below is the fixup:

diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h
index fe98ad8ece51..2be627ceb9df 100644
--- a/arch/csky/include/asm/spinlock.h
+++ b/arch/csky/include/asm/spinlock.h
@@ -27,7 +27,8 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
: "r"(p), "r"(ticket_next)
: "cc");

-   smp_cond_load_acquire(&lock->tickets.owner,
+   if (lockval.tickets.owner != lockval.tickets.next)
+   smp_cond_load_acquire(&lock->tickets.owner,
VAL == lockval.tickets.next);
> +}
> +
> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> +{
> +   u32 tmp, contended, res;
> +
> +   do {
> +   asm volatile (
> +   "   lr.w%0, %3  \n"
> +   __ASM_SRLIW"%1, %0, %5  \n"
> +   __ASM_SLLIW"%2, %0, %5  \n"
> +   "   or  %1, %2, %1  \n"
> +   "   li  %2, 0   \n"
> +   "   sub %1, %1, %0  \n"
> +   "   bnez%1, 1f  \n"
> +   "   addw%0, %0, %4  \n"
> +   "   sc.w%2, %0, %3  \n"
> +   "1: \n"
> +   : "=&r" (tmp), "=&r" (contended), "=&r" (res),
> + "+A" (lock->lock)
> +   : "r" (1 << TICKET_NEXT), "I" (TICKET_NEXT)
> +   : "memory");
> +   } while (res);
> +
> +   if (!contended)
> +   __atomic_acquire_fence();
> +
> +   return !contended;
> +}
> +
> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> +   smp_store_release(&lock->tickets.owner, lock->tickets.owner + 1);
> +}
> +
> +static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> +{
> +   return lock.tickets.owner == lock.tickets.next;
> +}
> +
> +static inline int arch_spin_is_locked(arch_spinlock_t *lock)
> +{
> +   return !arch_spin_value_unlocked(READ_ONCE(*lock));
> +}
> +
> +static inline int arch_spin_is_contended(arch_spinlock_t *lock)
> +{
> +   struct __raw_tickets tickets = READ_ONCE(lock->tickets);
> +
> +   return (tickets.next - tickets.owner) > 1;
> +}
> +#define 

Re: [PATCH v6 4/9] csky: locks: Optimize coding convention

2021-04-11 Thread Guo Ren
On Wed, Mar 31, 2021 at 10:32 PM  wrote:
>
> From: Guo Ren 
>
>  - Using smp_cond_load_acquire in arch_spin_lock by Peter's
>advice.
>  - Using __smp_acquire_fence in arch_spin_trylock
>  - Using smp_store_release in arch_spin_unlock
>
> All above are just coding conventions and won't affect the
> function.
>
> TODO in smp_cond_load_acquire for architecture:
>  - current csky only has:
>lr.w val, <p0>
>sc.w <p0>, val2
>(Any other stores to p0 will make sc.w fail)
>
>  - But smp_cond_load_acquire needs:
>lr.w val, <p0>
>wfe
>(Any stores to p0 will send the event to let wfe retire)
>
> Signed-off-by: Guo Ren 
> Link: 
> https://lore.kernel.org/linux-riscv/caahsdy1jhlufwu7rucaq+ruwrbks2ksdva7eprt8--4zfof...@mail.gmail.com/T/#m13adac285b7f51f4f879a5d6b65753ecb1a7524e
> Cc: Peter Zijlstra 
> Cc: Arnd Bergmann 
> ---
>  arch/csky/include/asm/spinlock.h | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/arch/csky/include/asm/spinlock.h 
> b/arch/csky/include/asm/spinlock.h
> index 69f5aa249c5f..69677167977a 100644
> --- a/arch/csky/include/asm/spinlock.h
> +++ b/arch/csky/include/asm/spinlock.h
> @@ -26,10 +26,8 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
> : "r"(p), "r"(ticket_next)
> : "cc");
>
> -   while (lockval.tickets.next != lockval.tickets.owner)
> -   lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
> -
> -   smp_mb();
> +   smp_cond_load_acquire(&lock->tickets.owner,
> +   VAL == lockval.tickets.next);
It's wrong; we should determine lockval before the next read.

Fixup:

diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h
index fe98ad8ece51..2be627ceb9df 100644
--- a/arch/csky/include/asm/spinlock.h
+++ b/arch/csky/include/asm/spinlock.h
@@ -27,7 +27,8 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
: "r"(p), "r"(ticket_next)
: "cc");

-   smp_cond_load_acquire(&lock->tickets.owner,
+   if (lockval.tickets.owner != lockval.tickets.next)
+   smp_cond_load_acquire(&lock->tickets.owner,
VAL == lockval.tickets.next);

>  }
>
>  static inline int arch_spin_trylock(arch_spinlock_t *lock)
> @@ -55,15 +53,14 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
> } while (!res);
>
> if (!contended)
> -   smp_mb();
> +   __smp_acquire_fence();
>
> return !contended;
>  }
>
>  static inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
> -   smp_mb();
> -   WRITE_ONCE(lock->tickets.owner, lock->tickets.owner + 1);
> +   smp_store_release(&lock->tickets.owner, lock->tickets.owner + 1);
>  }
>
>  static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> --
> 2.17.1
>


-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


Re: [PATCH 1/2] powerpc: syscalls: switch to generic syscalltbl.sh

2021-04-11 Thread Masahiro Yamada
Hi Michael,

On Tue, Mar 2, 2021 at 12:31 AM Masahiro Yamada  wrote:
>
> Many architectures duplicate similar shell scripts.
>
> This commit converts powerpc to use scripts/syscalltbl.sh. This also
> unifies syscall_table_32.h and syscall_table_c32.h.
>
> Signed-off-by: Masahiro Yamada 


Could you check this series?

Thanks.
Masahiro


> ---
>
>  arch/powerpc/include/asm/Kbuild |  1 -
>  arch/powerpc/kernel/syscalls/Makefile   | 22 +++--
>  arch/powerpc/kernel/syscalls/syscalltbl.sh  | 36 -
>  arch/powerpc/kernel/systbl.S|  5 ++-
>  arch/powerpc/platforms/cell/spu_callbacks.c |  2 +-
>  5 files changed, 10 insertions(+), 56 deletions(-)
>  delete mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
>
> diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
> index e1f9b4ea1c53..bcf95ce0964f 100644
> --- a/arch/powerpc/include/asm/Kbuild
> +++ b/arch/powerpc/include/asm/Kbuild
> @@ -1,7 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  generated-y += syscall_table_32.h
>  generated-y += syscall_table_64.h
> -generated-y += syscall_table_c32.h
>  generated-y += syscall_table_spu.h
>  generic-y += export.h
>  generic-y += kvm_types.h
> diff --git a/arch/powerpc/kernel/syscalls/Makefile 
> b/arch/powerpc/kernel/syscalls/Makefile
> index 9e3be295dbba..df21c731c806 100644
> --- a/arch/powerpc/kernel/syscalls/Makefile
> +++ b/arch/powerpc/kernel/syscalls/Makefile
> @@ -7,7 +7,7 @@ _dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)') 
>   \
>
>  syscall := $(src)/syscall.tbl
>  syshdr := $(srctree)/$(src)/syscallhdr.sh
> -systbl := $(srctree)/$(src)/syscalltbl.sh
> +systbl := $(srctree)/scripts/syscalltbl.sh
>
>  quiet_cmd_syshdr = SYSHDR  $@
>cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@'   \
> @@ -16,10 +16,7 @@ quiet_cmd_syshdr = SYSHDR  $@
>'$(syshdr_offset_$(basetarget))'
>
>  quiet_cmd_systbl = SYSTBL  $@
> -  cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@'   \
> -  '$(systbl_abis_$(basetarget))'   \
> -  '$(systbl_abi_$(basetarget))'\
> -  '$(systbl_offset_$(basetarget))'
> +  cmd_systbl = $(CONFIG_SHELL) $(systbl) --abis $(abis) $< $@
>
>  syshdr_abis_unistd_32 := common,nospu,32
>  $(uapi)/unistd_32.h: $(syscall) $(syshdr) FORCE
> @@ -29,30 +26,21 @@ syshdr_abis_unistd_64 := common,nospu,64
>  $(uapi)/unistd_64.h: $(syscall) $(syshdr) FORCE
> $(call if_changed,syshdr)
>
> -systbl_abis_syscall_table_32 := common,nospu,32
> -systbl_abi_syscall_table_32 := 32
> +$(kapi)/syscall_table_32.h: abis := common,nospu,32
>  $(kapi)/syscall_table_32.h: $(syscall) $(systbl) FORCE
> $(call if_changed,systbl)
>
> -systbl_abis_syscall_table_64 := common,nospu,64
> -systbl_abi_syscall_table_64 := 64
> +$(kapi)/syscall_table_64.h: abis := common,nospu,64
>  $(kapi)/syscall_table_64.h: $(syscall) $(systbl) FORCE
> $(call if_changed,systbl)
>
> -systbl_abis_syscall_table_c32 := common,nospu,32
> -systbl_abi_syscall_table_c32 := c32
> -$(kapi)/syscall_table_c32.h: $(syscall) $(systbl) FORCE
> -   $(call if_changed,systbl)
> -
> -systbl_abis_syscall_table_spu := common,spu
> -systbl_abi_syscall_table_spu := spu
> +$(kapi)/syscall_table_spu.h: abis := common,spu
>  $(kapi)/syscall_table_spu.h: $(syscall) $(systbl) FORCE
> $(call if_changed,systbl)
>
>  uapisyshdr-y   += unistd_32.h unistd_64.h
>  kapisyshdr-y   += syscall_table_32.h   \
>syscall_table_64.h   \
> -  syscall_table_c32.h  \
>syscall_table_spu.h
>
>  uapisyshdr-y   := $(addprefix $(uapi)/, $(uapisyshdr-y))
> diff --git a/arch/powerpc/kernel/syscalls/syscalltbl.sh 
> b/arch/powerpc/kernel/syscalls/syscalltbl.sh
> deleted file mode 100644
> index f7393a7b18aa..000000000000
> --- a/arch/powerpc/kernel/syscalls/syscalltbl.sh
> +++ /dev/null
> @@ -1,36 +0,0 @@
> -#!/bin/sh
> -# SPDX-License-Identifier: GPL-2.0
> -
> -in="$1"
> -out="$2"
> -my_abis=`echo "($3)" | tr ',' '|'`
> -my_abi="$4"
> -offset="$5"
> -
> -emit() {
> -   t_nxt="$1"
> -   t_nr="$2"
> -   t_entry="$3"
> -
> -   while [ $t_nxt -lt $t_nr ]; do
> -   printf "__SYSCALL(%s,sys_ni_syscall)\n" "${t_nxt}"
> -   t_nxt=$((t_nxt+1))
> -   done
> -   printf "__SYSCALL(%s,%s)\n" "${t_nxt}" "${t_entry}"
> -}
> -
> -grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
> -   nxt=0
> -   if [ -z "$offset" ]; then
> -   offset=0
> -   fi
> -
> -   while read nr abi name entry compat ; do
> -   if [ "$my_abi" = "c32" ] && [ ! -z "$compat" ]; then
> -   emit $((nxt+offset)) $((nr+offset)) $compat
> -   else
> -   emit $((nxt+offset)) $((nr+offset)) $entry
> -   fi
> -   
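
For anyone comparing the two generators: both emit one table line per
syscall number. A sketch of the difference for a hypothetical syscall.tbl
entry (number and names illustrative):

	/*
	 * Input line in syscall.tbl:
	 *   102  common  socketcall  sys_socketcall  compat_sys_socketcall
	 *
	 * The old powerpc script with my_abi=c32 emitted:
	 *   __SYSCALL(102,compat_sys_socketcall)
	 *
	 * The generic scripts/syscalltbl.sh instead emits:
	 *   __SYSCALL_WITH_COMPAT(102, sys_socketcall, compat_sys_socketcall)
	 *
	 * so a consumer that wants the former c32 behaviour defines, e.g.:
	 */
	#define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, compat)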

sysctl: setting key "net.core.bpf_jit_enable": Invalid argument

2021-04-11 Thread Paul Menzel

Dear Linux folks,


Related to *[CVE-2021-29154] Linux kernel incorrect computation of
branch displacements in BPF JIT compiler can be abused to execute
arbitrary code in Kernel mode* [1]: on the POWER8 system IBM S822LC,
running self-built Linux 5.12.0-rc5+, I am unable to disable
`bpf_jit_enable`.


$ /sbin/sysctl net.core.bpf_jit_enable
net.core.bpf_jit_enable = 1
$ sudo /sbin/sysctl -w net.core.bpf_jit_enable=0
sysctl: setting key "net.core.bpf_jit_enable": Invalid argument

It works on an x86 with Debian sid/unstable and Linux 5.10.26-1.
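
For what it's worth, this looks like CONFIG_BPF_JIT_ALWAYS_ON=y in the
self-built config: with that option the sysctl range is pinned to 1, so
writing 0 fails with EINVAL. Paraphrasing the table entry in
net/core/sysctl_net_core.c (field values from memory, treat as a sketch):

	{
		.procname	= "bpf_jit_enable",
		.data		= &bpf_jit_enable,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax_bpf_enable,
	#ifdef CONFIG_BPF_JIT_ALWAYS_ON
		.extra1		= SYSCTL_ONE,	/* min = 1: JIT can't be disabled */
		.extra2		= SYSCTL_ONE,	/* max = 1 */
	#else
		.extra1		= SYSCTL_ZERO,
		.extra2		= &two,		/* 0 = off, 1 = on, 2 = debug */
	#endif
	},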


Kind regards,

Paul


[1]: https://seclists.org/oss-sec/2021/q2/12


Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-11 Thread Matthew Wilcox
On Sun, Apr 11, 2021 at 11:43:07AM +0200, Jesper Dangaard Brouer wrote:
> On Sat, 10 Apr 2021 21:52:45 +0100
> "Matthew Wilcox (Oracle)"  wrote:
> 
> > 32-bit architectures which expect 8-byte alignment for 8-byte integers
> > and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct
> > page inadvertently expanded in 2019.  When the dma_addr_t was added,
> > it forced the alignment of the union to 8 bytes, which inserted a 4 byte
> > gap between 'flags' and the union.
> > 
> > We could fix this by telling the compiler to use a smaller alignment
> > for the dma_addr, but that seems a little fragile.  Instead, move the
> > 'flags' into the union.  That causes dma_addr to shift into the same
> > bits as 'mapping', so it would have to be cleared on free.  To avoid
> > this, insert three words of padding and use the same bits as ->index
> > and ->private, neither of which have to be cleared on free.
> > 
> > Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
> > Signed-off-by: Matthew Wilcox (Oracle) 
> > ---
> >  include/linux/mm_types.h | 38 ++
> >  1 file changed, 26 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 6613b26a8894..45c563e9b50e 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -68,16 +68,22 @@ struct mem_cgroup;
> >  #endif
> >  
> >  struct page {
> > -   unsigned long flags;/* Atomic flags, some possibly
> > -* updated asynchronously */
> > /*
> > -* Five words (20/40 bytes) are available in this union.
> > -* WARNING: bit 0 of the first word is used for PageTail(). That
> > -* means the other users of this union MUST NOT use the bit to
> > +* This union is six words (24 / 48 bytes) in size.
> > +* The first word is reserved for atomic flags, often updated
> > +* asynchronously.  Use the PageFoo() macros to access it.  Some
> > +* of the flags can be reused for your own purposes, but the
> > +* word as a whole often contains other information and overwriting
> > +* it will cause functions like page_zone() and page_node() to stop
> > +* working correctly.
> > +*
> > +* Bit 0 of the second word is used for PageTail(). That
> > +* means the other users of this union MUST leave the bit zero to
> >  * avoid collision and false-positive PageTail().
> >  */
> > union {
> > struct {/* Page cache and anonymous pages */
> > +   unsigned long flags;
> > /**
> >  * @lru: Pageout list, eg. active_list protected by
> >  * lruvec->lru_lock.  Sometimes used as a generic list
> > @@ -96,13 +102,14 @@ struct page {
> > unsigned long private;
> > };
> > struct {/* page_pool used by netstack */
> > -   /**
> > -* @dma_addr: might require a 64-bit value even on
> > -* 32-bit architectures.
> > -*/
> > -   dma_addr_t dma_addr;
> 
> The original intent of placing member @dma_addr here is that it overlaps
> with @lru (type struct list_head), which contains two pointers.  Thus, in
> case of CONFIG_ARCH_DMA_ADDR_T_64BIT=y on 32-bit architectures it would
> use both pointers.
> 
> Thinking more about this, the design is flawed: bit 0 of the first
> word, used for compound pages (see PageTail and @compound_head), is
> reserved.  We knew DMA addresses were aligned, thus we thought this
> satisfied that need.  BUT for DMA_ADDR_T_64BIT=y on a 32-bit arch the
> first word will contain the "upper" part of the DMA addr, which I don't
> think gives this guarantee.
> 
> I guess nobody is using this combination?!?  I thought we added this
> to satisfy the TI (Texas Instruments) driver cpsw (code in
> drivers/net/ethernet/ti/cpsw*).  Thus, I assumed it was in use?

It may be in use, but we've got away with it?  It's relatively rare
that this is going to bite us.  I think what has to happen is:

page is mapped to userspace
task calls get_user_pages_fast(), loads the PTE

page is unmapped & freed
page is reallocated to the page_pool
page is DMA mapped to an address that happens to have that bit set

task looks for the compound_head() of that PTE, and attempts to bump
the refcount.  *oops*

If it has happened, would it have turned into a bug report?
If we had seen such a bug report, would we have noticed it?
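
(For anyone tracing the last step: compound_head() keys off bit 0,
roughly as below, paraphrasing include/linux/page-flags.h, so a DMA
address with bit 0 set in that word gets chased as a tail-page pointer.)

	static inline struct page *compound_head(struct page *page)
	{
		unsigned long head = READ_ONCE(page->compound_head);

		/* bit 0 set means "tail page"; the rest points to the head */
		if (unlikely(head & 1))
			return (struct page *)(head - 1);
		return page;
	}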

> > +   unsigned long _pp_flags;
> > +   unsigned long pp_magic;
> > +   unsigned long xmi;
> 
> Matteo noticed; I think the intent is that we can store xdp_mem_info in @xmi.

Yep.

> > +   unsigned long _pp_mapping_pad;
> > +   dma_addr_t dma_addr;/* might be one or two words */
> > };
> 
> Could you explain your intent here?
> I worry about @index.

Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-11 Thread Jesper Dangaard Brouer
On Sat, 10 Apr 2021 21:52:45 +0100
"Matthew Wilcox (Oracle)"  wrote:

> 32-bit architectures which expect 8-byte alignment for 8-byte integers
> and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct
> page inadvertently expanded in 2019.  When the dma_addr_t was added,
> it forced the alignment of the union to 8 bytes, which inserted a 4 byte
> gap between 'flags' and the union.
> 
> We could fix this by telling the compiler to use a smaller alignment
> for the dma_addr, but that seems a little fragile.  Instead, move the
> 'flags' into the union.  That causes dma_addr to shift into the same
> bits as 'mapping', so it would have to be cleared on free.  To avoid
> this, insert three words of padding and use the same bits as ->index
> and ->private, neither of which have to be cleared on free.
> 
> Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---
>  include/linux/mm_types.h | 38 ++
>  1 file changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6613b26a8894..45c563e9b50e 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -68,16 +68,22 @@ struct mem_cgroup;
>  #endif
>  
>  struct page {
> - unsigned long flags;/* Atomic flags, some possibly
> -  * updated asynchronously */
>   /*
> -  * Five words (20/40 bytes) are available in this union.
> -  * WARNING: bit 0 of the first word is used for PageTail(). That
> -  * means the other users of this union MUST NOT use the bit to
> +  * This union is six words (24 / 48 bytes) in size.
> +  * The first word is reserved for atomic flags, often updated
> +  * asynchronously.  Use the PageFoo() macros to access it.  Some
> +  * of the flags can be reused for your own purposes, but the
> +  * word as a whole often contains other information and overwriting
> +  * it will cause functions like page_zone() and page_node() to stop
> +  * working correctly.
> +  *
> +  * Bit 0 of the second word is used for PageTail(). That
> +  * means the other users of this union MUST leave the bit zero to
>* avoid collision and false-positive PageTail().
>*/
>   union {
>   struct {/* Page cache and anonymous pages */
> + unsigned long flags;
>   /**
>* @lru: Pageout list, eg. active_list protected by
>* lruvec->lru_lock.  Sometimes used as a generic list
> @@ -96,13 +102,14 @@ struct page {
>   unsigned long private;
>   };
>   struct {/* page_pool used by netstack */
> - /**
> -  * @dma_addr: might require a 64-bit value even on
> -  * 32-bit architectures.
> -  */
> - dma_addr_t dma_addr;

The original intent of placing member @dma_addr here is that it overlaps
with @lru (type struct list_head), which contains two pointers.  Thus, in
case of CONFIG_ARCH_DMA_ADDR_T_64BIT=y on 32-bit architectures it would
use both pointers.

Thinking more about this, the design is flawed: bit 0 of the first
word, used for compound pages (see PageTail and @compound_head), is
reserved.  We knew DMA addresses were aligned, thus we thought this
satisfied that need.  BUT for DMA_ADDR_T_64BIT=y on a 32-bit arch the
first word will contain the "upper" part of the DMA addr, which I don't
think gives this guarantee.

I guess nobody is using this combination?!?  I thought we added this
to satisfy the TI (Texas Instruments) driver cpsw (code in
drivers/net/ethernet/ti/cpsw*).  Thus, I assumed it was in use?


> + unsigned long _pp_flags;
> + unsigned long pp_magic;
> + unsigned long xmi;

Matteo noticed; I think the intent is that we can store xdp_mem_info in @xmi.

> + unsigned long _pp_mapping_pad;
> + dma_addr_t dma_addr;/* might be one or two words */
>   };

Could you explain your intent here?
I worry about @index.

As I mentioned in the other thread[1], the netstack uses page_is_pfmemalloc()
(code copy-pasted below my signature), which implies that the member @index
has to be kept intact. In the above, I'm unsure @index stays untouched.

[1] https://lore.kernel.org/lkml/20210410082158.79ad09a6@carbon/
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

/*
 * Return true only if the page has been allocated with
 * ALLOC_NO_WATERMARKS and the low watermark was not
 * met implying that the system is under some pressure.
 */
static inline bool page_is_pfmemalloc(const struct page *page)
{
/*
 * Page index cannot be this large so this must be
 * a pfmemalloc page.
 */
return page->index == -1UL;
}

Re: [PATCH 02/16] powerpc/vas: Make VAS API powerpc platform independent

2021-04-11 Thread Christophe Leroy




Le 11/04/2021 à 02:31, Haren Myneni a écrit :


Using the same /dev/crypto/nx-gzip interface for both powerNV and
pseries. So this patch moves the VAS API to a powerpc platform-independent
directory. The actual functionality is not changed in this patch.


This patch seems to do a lot more than moving the VAS API to an independent
directory. A more detailed description would help.


And it is not something defined in the powerpc architecture I think, so it
should remain in some common platform-related directory.



Signed-off-by: Haren Myneni 
---
  arch/powerpc/Kconfig  | 15 +
  arch/powerpc/include/asm/vas.h| 22 ++-
  arch/powerpc/kernel/Makefile  |  1 +
  .../{platforms/powernv => kernel}/vas-api.c   | 64 ++
  arch/powerpc/platforms/powernv/Kconfig| 14 
  arch/powerpc/platforms/powernv/Makefile   |  2 +-
  arch/powerpc/platforms/powernv/vas-window.c   | 66 +++
  7 files changed, 140 insertions(+), 44 deletions(-)
  rename arch/powerpc/{platforms/powernv => kernel}/vas-api.c (83%)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 386ae12d8523..7aa1fbf7c1dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -478,6 +478,21 @@ config PPC_UV
  
  	  If unsure, say "N".
  
+config PPC_VAS

+   bool "IBM Virtual Accelerator Switchboard (VAS)"
+   depends on PPC_POWERNV && PPC_64K_PAGES
+   default y
+   help
+ This enables support for IBM Virtual Accelerator Switchboard (VAS).


IIUC this is functionality in a coprocessor of some IBM processors, similar
in principle to the communication coprocessors we find in Freescale
processors.


It is not generic functionality that is part of the powerpc architecture, so
I don't think this belongs in arch/powerpc/Kconfig.


I think it should go in arch/powerpc/platforms/Kconfig

Or maybe in drivers/soc/ibm/?



+
+ VAS allows accelerators in co-processors like NX-GZIP and NX-842
+ to be accessible to kernel subsystems and user processes.
+ VAS adapters are found in POWER9 and later based systems.
+ The user mode NX-GZIP support is added on P9 for powerNV and on
+ P10 for powerVM.
+
+ If unsure, say "N".
+
  config LD_HEAD_STUB_CATCH
bool "Reserve 256 bytes to cope with linker stubs in HEAD text" if 
EXPERT
depends on PPC64
diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index 41f73fae7ab8..6bbade60d8f4 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -5,6 +5,8 @@
  
  #ifndef _ASM_POWERPC_VAS_H

  #define _ASM_POWERPC_VAS_H
+#include 
+
  
  struct vas_window;
  
@@ -48,6 +50,16 @@ enum vas_cop_type {

VAS_COP_TYPE_MAX,
  };
  
+/*

+ * User space window operations used for powernv and powerVM
+ */
+struct vas_user_win_ops {
+   struct vas_window * (*open_win)(struct vas_tx_win_open_attr *,
+   enum vas_cop_type);
+   u64 (*paste_addr)(void *);
+   int (*close_win)(void *);
+};
+
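
A platform would then plug its callbacks in when registering; a sketch,
with the callback names invented for illustration:

	static struct vas_user_win_ops pseries_vops = {
		.open_win	= plpar_vas_open_win,	/* hypothetical */
		.paste_addr	= plpar_vas_paste_addr,	/* hypothetical */
		.close_win	= plpar_vas_close_win,	/* hypothetical */
	};

	/* e.g. from the NX-GZIP driver init: */
	rc = vas_register_coproc_api(THIS_MODULE, VAS_COP_TYPE_GZIP,
				     "nx-gzip", &pseries_vops);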
  /*
   * Receive window attributes specified by the (in-kernel) owner of window.
   */
@@ -161,6 +173,9 @@ int vas_copy_crb(void *crb, int offset);
   * assumed to be true for NX windows.
   */
  int vas_paste_crb(struct vas_window *win, int offset, bool re);
+int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type,
+const char *name);
+void vas_unregister_api_powernv(void);
  
  /*

   * Register / unregister coprocessor type to VAS API which will be exported
@@ -170,8 +185,9 @@ int vas_paste_crb(struct vas_window *win, int offset, bool 
re);
   * Only NX GZIP coprocessor type is supported now, but this API can be
   * used for others in future.
   */
-int vas_register_api_powernv(struct module *mod, enum vas_cop_type cop_type,
-const char *name);
-void vas_unregister_api_powernv(void);
+int vas_register_coproc_api(struct module *mod, enum vas_cop_type cop_type,
+   const char *name,
+   struct vas_user_win_ops *vops);
+void vas_unregister_coproc_api(void);
  
  #endif /* __ASM_POWERPC_VAS_H */

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 6084fa499aa3..205d8f12bd36 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -118,6 +118,7 @@ obj-$(CONFIG_PPC_UDBG_16550)+= legacy_serial.o 
udbg_16550.o
  obj-$(CONFIG_STACKTRACE)  += stacktrace.o
  obj-$(CONFIG_SWIOTLB) += dma-swiotlb.o
  obj-$(CONFIG_ARCH_HAS_DMA_SET_MASK) += dma-mask.o
+obj-$(CONFIG_PPC_VAS)  += vas-api.o
  
  pci64-$(CONFIG_PPC64)		+= pci_dn.o pci-hotplug.o isa-bridge.o

  obj-$(CONFIG_PCI) += pci_$(BITS).o $(pci64-y) \
diff --git a/arch/powerpc/platforms/powernv/vas-api.c 
b/arch/powerpc/kernel/vas-api.c
similarity index 83%
rename from arch/powerpc/platforms/powernv/vas-api.c
rename 

Re: [PATCH v2 10/14] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper

2021-04-11 Thread Leonardo Bras
On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> 
> On 12/09/2020 03:07, Leonardo Bras wrote:
> > Cc: linuxppc-dev@lists.ozlabs.org, linux-ker...@vger.kernel.org,
> > 
> > Add a new helper _iommu_table_setparms(), and use it in
> > iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
> > code.
> > 
> > Also, setting tbl->it_ops was happening outsite iommu_table_setparms*(),
> > so move it to the new helper. Since we need the iommu_table_ops to be
> > declared before used, move iommu_table_lpar_multi_ops and
> > iommu_table_pseries_ops to before their respective iommu_table_setparms*().
> > 
> > The tce_exchange_pseries() also had to be moved up, since it's used in
> > iommu_table_lpar_multi_ops.xchg_no_kill.
> 
> 
> Use forward declarations (preferred) or make a separate patch for moving 
> chunks (I do not see much point).

Fixed :)

> > @@ -509,8 +559,13 @@ static void iommu_table_setparms(struct pci_controller 
> > *phb,
> >     const unsigned long *basep;
> >     const u32 *sizep;

> > -   node = phb->dn;
> > +   /* Test if we are going over 2GB of DMA space */
> > 
> > 
> > 
> > +   if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
> > +   udbg_printf("PCI_DMA: Unexpected number of IOAs under this 
> > PHB.\n");
> > +   panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> > +   }
> 
> 
> s/0x80000000ul/2*SZ_1G/

Done!

> 
> but more to the point - why this check? QEMU can create windows at 0 and 
> as big as the VM requested. And I am pretty sure I can construct QEMU 
> command line such as it won't have MMIO32 at all and a 4GB default DMA 
> window.
> 

Oh, the diff was a little strange here. I did not add this snippet, it
was already in that function, but since I created the helper, the diff
made it look like I introduced this piece of code.
Please take a look at the diff snippet below. (These same lines were
already there.)

> > @@ -519,33 +574,25 @@ static void iommu_table_setparms(struct 
> > pci_controller *phb,
> >     return;
> >     }
> >   
> > -   tbl->it_base = (unsigned long)__va(*basep);
> > 
> > 
> > 
> > +   _iommu_table_setparms(tbl, phb->bus->number, 0, 
> > phb->dma_window_base_cur,
> > + phb->dma_window_size, IOMMU_PAGE_SHIFT_4K,
> > + (unsigned long)__va(*basep), 
> > &iommu_table_pseries_ops);
> > if (!is_kdump_kernel())
> > 
> > 
> > 
> >     memset((void *)tbl->it_base, 0, *sizep);
> > 
> > -   tbl->it_busno = phb->bus->number;
> > -   tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> > -
> > -   /* Units of tce entries */
> > -   tbl->it_offset = phb->dma_window_base_cur >> tbl->it_page_shift;
> > -
> > -   /* Test if we are going over 2GB of DMA space */
> > -   if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
> > -   udbg_printf("PCI_DMA: Unexpected number of IOAs under this 
> > PHB.\n");
> > -   panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> > -   }
> > -
> >     phb->dma_window_base_cur += phb->dma_window_size;
> > -
> > -   /* Set the tce table size - measured in entries */
> > -   tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
> > -
> > -   tbl->it_index = 0;
> > -   tbl->it_blocksize = 16;
> > -   tbl->it_type = TCE_PCI;
> >   }
> >   

Thanks for reviewing, Alexey!




Re: [PATCH 00/16] Enable VAS and NX-GZIP support on powerVM

2021-04-11 Thread Christophe Leroy




Le 11/04/2021 à 02:27, Haren Myneni a écrit :


This patch series enables VAS / NX-GZIP on powerVM which allows
the user space to do copy/paste with the same existing interface
that is available on powerNV.


Can you explain (here and in patch 1 at least) what VAS and NX mean?
Is that Vector Addition System? Is that Virtual Address Space?
(https://en.wikipedia.org/wiki/VAS)



VAS Enablement:
- Get all VAS capabilities using H_QUERY_VAS_CAPABILITIES that are
   available in the hypervisor. These capabilities tell the OS which
   credit types are available (such as Default and Quality of
   Service (QoS)), and give specific capabilities for each credit
   type: maximum window credits, maximum LPAR credits, target credits
   in that partition (varies from max LPAR credits based on DLPAR
   operations), and whether user mode COPY/PASTE is supported.
- Register LPAR VAS operations such as open window, get paste
   address, and close window with the current VAS user space API.
- Open window operation - Use H_ALLOCATE_VAS_WINDOW HCALL to open
   window and H_MODIFY_VAS_WINDOW HCALL to setup the window with LPAR
   PID and etc.
- mmap to paste address returned in H_ALLOCATE_VAS_WINDOW HCALL
- To close a window, the H_DEALLOCATE_VAS_WINDOW HCALL is used to
   close it in the hypervisor.

NX Enablement:
- Get NX capabilities from the hypervisor, which provides the maximum
   buffer length in a single GZIP request and the recommended minimum
   compression / decompression lengths.
- Register to VAS to enable user space VAS API

Main feature differences with powerNV implementation:
- Each VAS window will be configured with a number of credits, which
   means that many requests can be issued simultaneously on that
   window. On powerNV, 1K credits are configured per window,
   whereas on powerVM the hypervisor allows 1 credit per window
   at present.
- The hypervisor introduced 2 different types of credits: Default,
   which uses a normal priority FIFO, and Quality of Service (QoS),
   which uses a high priority FIFO. On powerVM, VAS/NX HW resources
   are shared across LPARs. The total number of credits available on
   a system depends on the cores configured. We may see more credits
   assigned across the system than the NX HW resources can handle. So
   to avoid NX HW contention, pHyp introduced QoS credits, which can
   be configured by the system administrator with the HMC API. The
   total number of available default credits on an LPAR then varies
   based on the QoS credits configured.
- On powerNV, windows are allocated on a specific VAS instance
   and user space can select a VAS instance with the open window
   ioctl. Since VAS instances can be shared across partitions on
   powerVM, the hypervisor manages window allocations on different
   VAS instances. So H_ALLOCATE_VAS_WINDOW allows selection by domain
   identifiers (H_HOME_NODE_ASSOCIATIVITY values by cpu). By default
   the hypervisor selects a VAS instance close to the CPU resources
   that the partition uses. So vas_id in the ioctl interface is ignored
   on powerVM, except vas_id=-1, which is used to allocate a window
   based on the CPU the process is executing on. This option is needed
   for process affinity to a NUMA node.

   Existing applications linked with libnxz should work as
   long as the job request length is restricted to
   req_max_processed_len.

   Tested the following patches on P10 successfully with test cases
   given: https://github.com/libnxz/power-gzip

   Note: The hypervisor supports user mode NX from p10 onwards. Linux
supports user mode VAS/NX on P10 only with radix page tables.
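
In user-space terms, the flow this series preserves looks roughly like
the sketch below (same VAS_TX_WIN_OPEN ioctl as on powerNV; error
handling omitted and details illustrative):

	int fd = open("/dev/crypto/nx-gzip", O_RDWR);

	struct vas_tx_win_open_attr attr = {
		.version = 1,
		.vas_id  = -1,	/* pick a VAS close to the running CPU */
	};
	ioctl(fd, VAS_TX_WIN_OPEN, &attr);

	/* map the paste address, then build CRBs and copy/paste them */
	void *paste = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			   MAP_SHARED, fd, 0);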

Patches 1- 4:   Make the code that is needed for both powerNV and
 powerVM powerpc platform independent.
Patch5: Modify vas-window struct to support both and the
 related changes.
Patch 6:Define HCALL and the related VAS/NXGZIP specific
 structs.
Patch 7:Define QoS credit flag in window open ioctl
Patch 8:Implement Allocate, Modify and Deallocate HCALLs
Patch 9:Retrieve VAS capabilities from the hypervisor
Patch 10:   Implement window operations and integrate with API
Patch 11:   Setup IRQ and NX fault handling
Patch 12:   Add sysfs interface to expose VAS capabilities
Patch 13 - 14:  Make the code common to add NX-GZIP enablement
Patch 15:   Get NX capabilities from the hypervisor
Patch 16:   Add sysfs interface to expose NX capabilities

Haren Myneni (16):
   powerpc/powernv/vas: Rename register/unregister functions
   powerpc/vas: Make VAS API powerpc platform independent
   powerpc/vas: Create take/drop task reference functions
   powerpc/vas: Move update_csb/dump_crb to platform independent
   powerpc/vas:  Define and use common vas_window struct
   powerpc/pseries/vas: Define VAS/NXGZIP HCALLs and structs
   powerpc/vas: Define QoS credit flag to allocate window
   powerpc/pseries/vas: Implement allocate/modify/deallocate HCALLS
   powerpc/pseries/vas: Implement to get 

Re: [PATCH v2 09/14] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()

2021-04-11 Thread Leonardo Bras
On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> > 
> >     dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
> > - create.liobn, dn);
> > +   create.liobn, dn);
> 
> 
> Unrelated. If you think the spaces/tabs thing needs to be fixed, make it 
> a separate patch and do all these changes there at once.

Sorry, it was some issue with my editor / diff. 
I removed those changes for the next version.

> > -out_free_prop:
> > +out_prop_free:
> 
> 
> Really? :) s/out_prop_del/out_del_prop/ maybe? The fewer unrelated
> changes the better.

I changed all the labels I added to the out_* naming, which I think
keeps them consistent with the existing labels.


Thanks for reviewing!
Leonardo Bras



Re: [PATCH 15/16] crypto/nx: Get NX capabilities for GZIP coprocessor type

2021-04-11 Thread kernel test robot
Hi Haren,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on cryptodev/master crypto/master v5.12-rc6 
next-20210409]
[cannot apply to scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Haren-Myneni/Enable-VAS-and-NX-GZIP-support-on-powerVM/20210411-084631
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/3dc0fb58cbf2543e4f3cb016ef3ed475a975f3c9
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Haren-Myneni/Enable-VAS-and-NX-GZIP-support-on-powerVM/20210411-084631
git checkout 3dc0fb58cbf2543e4f3cb016ef3ed475a975f3c9
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   powerpc64-linux-ld: drivers/crypto/nx/nx-common-pseries.o: in function 
`.nx842_pseries_init':
>> nx-common-pseries.c:(.init.text+0x184): undefined reference to 
>> `.plpar_vas_query_capabilities'
>> powerpc64-linux-ld: nx-common-pseries.c:(.init.text+0x28c): undefined 
>> reference to `.plpar_vas_query_capabilities'
>> powerpc64-linux-ld: nx-common-pseries.c:(.init.text+0x488): undefined 
>> reference to `.vas_register_api_pseries'
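
A common way to resolve this class of link error is to export the symbols
or provide fallbacks when the pseries VAS code is not built in; sketched
below with hypothetical signatures:

	/* e.g. in arch/powerpc/include/asm/vas.h */
	#ifdef CONFIG_PPC_PSERIES
	int plpar_vas_query_capabilities(u64 hcall, u8 type, u64 buf);
	#else
	static inline int plpar_vas_query_capabilities(u64 hcall, u8 type,
						       u64 buf)
	{
		return -ENXIO;
	}
	/* likewise for the register/unregister entry points */
	#endif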

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH 14/16] crypto/nx: Register and unregister VAS interface

2021-04-11 Thread kernel test robot
Hi Haren,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on cryptodev/master crypto/master v5.12-rc6 
next-20210409]
[cannot apply to scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Haren-Myneni/Enable-VAS-and-NX-GZIP-support-on-powerVM/20210411-084631
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/8650d30be9c22ee8fdc59063b993bfbafe88a328
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Haren-Myneni/Enable-VAS-and-NX-GZIP-support-on-powerVM/20210411-084631
git checkout 8650d30be9c22ee8fdc59063b993bfbafe88a328
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: ".vas_unregister_api_pseries" 
>> [drivers/crypto/nx/nx-compress-pseries.ko] undefined!
>> ERROR: modpost: ".vas_register_api_pseries" 
>> [drivers/crypto/nx/nx-compress-pseries.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH v2 05/14] powerpc/kernel/iommu: Add new iommu_table_in_use() helper

2021-04-11 Thread Leonardo Bras
Hello Alexey, thanks for the feedback!

On Tue, 2020-09-29 at 13:57 +1000, Alexey Kardashevskiy wrote:
> 
> On 12/09/2020 03:07, Leonardo Bras wrote:
> > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> > index ffb2637dc82b..c838da3d8f32 100644
> > --- a/arch/powerpc/kernel/iommu.c
> > +++ b/arch/powerpc/kernel/iommu.c
> > @@ -655,34 +655,21 @@ static void iommu_table_reserve_pages(struct 
> > iommu_table *tbl,
> >     if (tbl->it_offset == 0)
> >     set_bit(0, tbl->it_map);
> >   
> > 
> > 
> > 
> > +   /* Check if res_start..res_end is a valid range in the table */
> > +   if (res_start >= res_end || res_start < tbl->it_offset ||
> > +   res_end > (tbl->it_offset + tbl->it_size)) {
> > +   tbl->it_reserved_start = tbl->it_offset;
> > +   tbl->it_reserved_end = tbl->it_offset;
> 
> 
> This silently ignores an overlap between the reserved area and the
> window, which does not seem right.

Humm, that makes sense.
Would it be better to do something like this?

/* Clamp the requested reserved range to the table bounds */
if (res_start < tbl->it_offset)
	res_start = tbl->it_offset;

if (res_end > (tbl->it_offset + tbl->it_size))
	res_end = tbl->it_offset + tbl->it_size;

/* Nothing left after clamping: record an empty reservation */
if (res_start >= res_end) {
	tbl->it_reserved_start = tbl->it_offset;
	tbl->it_reserved_end = tbl->it_offset;
	return;
}


> > +   return;
> > +   }
> > +
> >     tbl->it_reserved_start = res_start;
> >     tbl->it_reserved_end = res_end;

> >   - /* Check if res_start..res_end isn't empty and overlaps the table */
> > -   if (res_start && res_end &&
> > -   (tbl->it_offset + tbl->it_size < res_start ||
> > -res_end < tbl->it_offset))
> > -   return;
> > -
> >     for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
> >     set_bit(i - tbl->it_offset, tbl->it_map);
> >   }
> > +bool iommu_table_in_use(struct iommu_table *tbl)
> > +{
> > +   unsigned long start = 0, end;
> > +
> > +   /* ignore reserved bit0 */
> > +   if (tbl->it_offset == 0)
> > +   start = 1;
> > +   end = tbl->it_reserved_start - tbl->it_offset;
> > +   if (find_next_bit(tbl->it_map, end, start) != end)
> > +   return true;
> > +
> > +   start = tbl->it_reserved_end - tbl->it_offset;
> > +   end = tbl->it_size;
> > +   return find_next_bit(tbl->it_map, end, start) != end;
> > +
> 
> Unnecessary empty line.

Sure, removing. 
Thanks!
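
As a usage sketch, the helper lets a caller refuse to tear down or resize
a window that still has live mappings (caller context hypothetical):

	if (iommu_table_in_use(tbl)) {
		pr_warn("%s: DMA window still in use, not releasing it\n",
			__func__);
		return -EBUSY;
	}
	/* safe to free or resize the window from here */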