[PATCH] powerpc/pseries: Fix regression while building external modules
With commit c9f3401313a5 ("powerpc: Always enable queued spinlocks for 64s,
disable for others"), CONFIG_PPC_QUEUED_SPINLOCKS is always enabled on
ppc64le, and external modules that use spinlock APIs fail to build:

  ERROR: modpost: GPL-incompatible module XXX.ko uses GPL-only symbol 'shared_processor'

Before the above commit, such modules built without any issue. The problem
is also not seen on other architectures. It can be worked around by enabling
CONFIG_UNINLINE_SPIN_UNLOCK in the config; however, CONFIG_UNINLINE_SPIN_UNLOCK
is not enabled by default and is only selected under certain conditions, e.g.
when CONFIG_DEBUG_SPINLOCK is set in the kernel config.

A minimal reproducer:

  #include <linux/module.h>
  #include <linux/spinlock.h>

  spinlock_t spLock;

  static int __init spinlock_test_init(void)
  {
          spin_lock_init(&spLock);
          spin_lock(&spLock);
          spin_unlock(&spLock);
          return 0;
  }

  static void __exit spinlock_test_exit(void)
  {
          printk("spinlock_test unloaded\n");
  }

  module_init(spinlock_test_init);
  module_exit(spinlock_test_exit);
  MODULE_DESCRIPTION("spinlock_test");
  MODULE_LICENSE("non-GPL");
  MODULE_AUTHOR("Srikar Dronamraju");

Given that spinlocks are one of the basic facilities for module code, this
effectively makes it impossible to build/load almost any non-GPL module on
ppc64le.

This was first reported at https://github.com/openzfs/zfs/issues/11172

Currently shared_processor is exported as a GPL-only symbol. Fix this for
parity with other architectures by exposing shared_processor to non-GPL
modules too.
Fixes: 14c73bd344da ("powerpc/vcpu: Assume dedicated processors as non-preempt")
Fixes: c9f3401313a5 ("powerpc: Always enable queued spinlocks for 64s, disable for others")
Reported-by: marc.c.dio...@gmail.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: marc.c.dio...@gmail.com
Cc: jfor...@redhat.com
Cc: yaday...@in.ibm.com
Signed-off-by: Srikar Dronamraju
---
 arch/powerpc/platforms/pseries/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 754e493b7c05..0338f481c12b 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -77,7 +77,7 @@
 #include "../../../../drivers/pci/pci.h"

 DEFINE_STATIC_KEY_FALSE(shared_processor);
-EXPORT_SYMBOL_GPL(shared_processor);
+EXPORT_SYMBOL(shared_processor);

 int CMO_PrPSP = -1;
 int CMO_SecPSP = -1;

base-commit: adf3c31e18b765ea24eba7b0c1efc076b8ee3d55
--
2.18.2
RE: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
(My apologies for however IBM's email client munges this)

> I heard it is going to be in Go 1.16.7, but I do not know much about Go.
> Maybe the folks in Cc can chime in.

We have backports primed and ready for the next point release. They are
waiting on the release manager to cherry-pick them.

I think we were aware that our VDSO usage may have exploited some
peculiarities in how the ppc64 version was constructed (i.e. hand-written
assembly which just didn't happen to clobber R30). Go up to this point has
only used the vdso function __kernel_clock_gettime; it is the only entry
point which would need to explicitly avoid R30 for Go's sake.

Paul M.
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
> On 28-Jul-2021, at 11:05 PM, Nathan Chancellor wrote:
>
> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
>> linux-next fails to boot on Power server (POWER8/POWER9). Following traces
>> are seen during boot
>>
>> [0.010799] software IO TLB: tearing down default memory pool
>> [0.010805] [ cut here ]
>> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>> [0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
…….
>
> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That

Indeed. Thanks Nathan. Bisect points to this commit.
Reverting the commit allows the kernel to boot.

Thanks
-Sachin

> series just keeps on giving... Adding some people from that thread to
> this one. Original thread:
> https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/
>
> [1]: https://github.com/openSUSE/kernel-source/raw/master/config/ppc64le/default
>
> Cheers,
> Nathan
[PATCH v2 2/2] selftests: Skip TM tests on synthetic TM implementations
Transactional Memory was removed from the architecture in ISA v3.1. For
threads running in P8/P9 compatibility mode on P10 a synthetic TM
implementation is provided. In this implementation, tbegin. always sets
cr0 eq, meaning the abort handler is always called. This is not an issue,
as users of TM are expected to have a fallback non-transactional way to
make forward progress in the abort handler. The TEXASR indicates if a
transaction failure is due to a synthetic implementation.

Some of the TM self tests need a non-degenerate TM implementation for
their testing to be meaningful, so check for a synthetic implementation
and skip the test if so.

Signed-off-by: Jordan Niethe
---
v2: Added checking for synthetic implementation to more tests
---
 .../selftests/powerpc/ptrace/ptrace-tm-gpr.c  |  1 +
 .../powerpc/ptrace/ptrace-tm-spd-gpr.c        |  1 +
 .../powerpc/ptrace/ptrace-tm-spd-tar.c        |  1 +
 .../powerpc/ptrace/ptrace-tm-spd-vsx.c        |  1 +
 .../selftests/powerpc/ptrace/ptrace-tm-spr.c  |  1 +
 .../selftests/powerpc/ptrace/ptrace-tm-tar.c  |  1 +
 .../selftests/powerpc/ptrace/ptrace-tm-vsx.c  |  1 +
 .../selftests/powerpc/signal/signal_tm.c      |  1 +
 tools/testing/selftests/powerpc/tm/tm-exec.c  |  1 +
 tools/testing/selftests/powerpc/tm/tm-fork.c  |  1 +
 .../testing/selftests/powerpc/tm/tm-poison.c  |  1 +
 .../selftests/powerpc/tm/tm-resched-dscr.c    |  1 +
 .../powerpc/tm/tm-signal-context-chk-fpu.c    |  1 +
 .../powerpc/tm/tm-signal-context-chk-gpr.c    |  1 +
 .../powerpc/tm/tm-signal-context-chk-vmx.c    |  1 +
 .../powerpc/tm/tm-signal-context-chk-vsx.c    |  1 +
 .../powerpc/tm/tm-signal-pagefault.c          |  1 +
 .../powerpc/tm/tm-signal-sigreturn-nt.c       |  1 +
 .../selftests/powerpc/tm/tm-signal-stack.c    |  1 +
 .../selftests/powerpc/tm/tm-sigreturn.c       |  1 +
 .../testing/selftests/powerpc/tm/tm-syscall.c |  2 +-
 tools/testing/selftests/powerpc/tm/tm-tar.c   |  1 +
 tools/testing/selftests/powerpc/tm/tm-tmspr.c |  1 +
 tools/testing/selftests/powerpc/tm/tm-trap.c  |  1 +
 .../selftests/powerpc/tm/tm-unavailable.c     |  1 +
 .../selftests/powerpc/tm/tm-vmx-unavail.c     |  1 +
 .../testing/selftests/powerpc/tm/tm-vmxcopy.c |  1 +
 tools/testing/selftests/powerpc/tm/tm.h       | 36 +++
 28 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
index 7df7100a29be..67ca297c5cca 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
@@ -113,6 +113,7 @@ int ptrace_tm_gpr(void)
 	int ret, status;

 	SKIP_IF(!have_htm());
+	SKIP_IF(htm_is_synthetic());
 	shm_id = shmget(IPC_PRIVATE, sizeof(int) * 2, 0777|IPC_CREAT);
 	pid = fork();
 	if (pid < 0) {
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
index 8706bea5d015..6f2bce1b6c5d 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
@@ -119,6 +119,7 @@ int ptrace_tm_spd_gpr(void)
 	int ret, status;

 	SKIP_IF(!have_htm());
+	SKIP_IF(htm_is_synthetic());
 	shm_id = shmget(IPC_PRIVATE, sizeof(int) * 3, 0777|IPC_CREAT);
 	pid = fork();
 	if (pid < 0) {
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c
index 2ecfa1158e2b..e112a34fbe59 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c
@@ -129,6 +129,7 @@ int ptrace_tm_spd_tar(void)
 	int ret, status;

 	SKIP_IF(!have_htm());
+	SKIP_IF(htm_is_synthetic());
 	shm_id = shmget(IPC_PRIVATE, sizeof(int) * 3, 0777|IPC_CREAT);
 	pid = fork();
 	if (pid == 0)
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-vsx.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-vsx.c
index 6f7fb51f0809..40133d49fe39 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-vsx.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-vsx.c
@@ -129,6 +129,7 @@ int ptrace_tm_spd_vsx(void)
 	int ret, status, i;

 	SKIP_IF(!have_htm());
+	SKIP_IF(htm_is_synthetic());
 	shm_id = shmget(IPC_PRIVATE, sizeof(int) * 3, 0777|IPC_CREAT);

 	for (i = 0; i < 128; i++) {
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c
index 068bfed2e606..880ba6a29a48 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c
@@ -114,6 +114,7 @@ int ptrace_tm_spr(void)
 	int ret, status;

 	SKIP_IF(!have_htm());
+	SKIP_IF(htm_is_synthetic());
 	shm_id =
[PATCH v2 1/2] selftests/powerpc: Add missing clobbered register to ptrace TM tests
ISA v3.1 removes TM but includes a synthetic implementation for backwards
compatibility. With this implementation, the tests ptrace-tm-spd-gpr and
ptrace-tm-gpr should never be able to make any forward progress and should
eventually be killed by the timeout. Instead, on a P10 running in P9 mode,
ptrace_tm_gpr fails like so:

  test: ptrace_tm_gpr
  tags: git_version:unknown
  Starting the child
  ...
  ...
  GPR[27]: 1 Expected: 2
  GPR[28]: 1 Expected: 2
  GPR[29]: 1 Expected: 2
  GPR[30]: 1 Expected: 2
  GPR[31]: 1 Expected: 2
  [FAIL] Test FAILED on line 98
  failure: ptrace_tm_gpr
  selftests: ptrace-tm-gpr [FAIL]

The problem is in the inline assembly of the child. r0 is loaded with a
value in the child's transaction abort handler, but this register is not
included in the clobbers list. This means it is possible that this
statement:

  cptr[1] = 0;

which is meant to signal the parent to wait, may actually use the value
placed into r0 by the inline assembly and incorrectly signal the parent
to continue.

By inspection the same problem is present in ptrace-tm-spd-gpr.

Adding r0 to the clobbers list makes the test fail correctly, via a
timeout, on a P10 running in P8/P9 compatibility mode.
Suggested-by: Michael Neuling
Signed-off-by: Jordan Niethe
---
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c     | 2 +-
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
index 82f7bdc2e5e6..7df7100a29be 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
@@ -57,7 +57,7 @@ void tm_gpr(void)
 		: [gpr_1]"i"(GPR_1), [gpr_2]"i"(GPR_2),
 		[sprn_texasr] "i" (SPRN_TEXASR), [flt_1] "b" (&a),
 		[flt_2] "b" (&b), [cptr1] "b" (&cptr[1])
-		: "memory", "r7", "r8", "r9", "r10",
+		: "memory", "r0", "r7", "r8", "r9", "r10",
 		"r11", "r12", "r13", "r14", "r15", "r16",
 		"r17", "r18", "r19", "r20", "r21", "r22",
 		"r23", "r24", "r25", "r26", "r27", "r28",
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
index ad65be6e8e85..8706bea5d015 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
@@ -65,7 +65,7 @@ void tm_spd_gpr(void)
 		: [gpr_1]"i"(GPR_1), [gpr_2]"i"(GPR_2), [gpr_4]"i"(GPR_4),
 		[sprn_texasr] "i" (SPRN_TEXASR), [flt_1] "b" (&a),
 		[flt_4] "b" (&d)
-		: "memory", "r5", "r6", "r7",
+		: "memory", "r0", "r5", "r6", "r7",
 		"r8", "r9", "r10", "r11", "r12",
 		"r13", "r14", "r15", "r16", "r17",
 		"r18", "r19", "r20", "r21", "r22",
 		"r23", "r24", "r25", "r26", "r27",
 		"r28", "r29", "r30", "r31"
--
2.25.1
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
Excerpts from Nathan Chancellor's message of July 29, 2021 3:35 am:
> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
>> linux-next fails to boot on Power server (POWER8/POWER9). Following traces
>> are seen during boot
>>
>> [0.010799] software IO TLB: tearing down default memory pool
>> [0.010805] [ cut here ]
>> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>> [0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
>> [0.010816] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> [0.010820] Modules linked in:
>> [0.010824] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc3-next-20210727 #1
>> [0.010830] NIP: c0032cfc LR: c000c764 CTR: c000c670
>> [0.010834] REGS: c3603b10 TRAP: 0700 Not tainted (5.14.0-rc3-next-20210727)
>> [0.010838] MSR: 80029033 CR: 28000222 XER: 0002
>> [0.010848] CFAR: c000c760 IRQMASK: 3
>> [0.010848] GPR00: c000c764 c3603db0 c29bd000 0001
>> [0.010848] GPR04: 0a68 0400 c3603868
>> [0.010848] GPR08: 0003
>> [0.010848] GPR12: c0001ec9ee80 c0012a28
>> [0.010848] GPR16:
>> [0.010848] GPR20:
>> [0.010848] GPR24: f134 c3603868
>> [0.010848] GPR28: 0400 0a68 c202e9c0 c3603e80
>> [0.010896] NIP [c0032cfc] system_call_exception+0x8c/0x2e0
>> [0.010901] LR [c000c764] system_call_common+0xf4/0x258
>> [0.010907] Call Trace:
>> [0.010909] [c3603db0] [c016a6dc] calculate_sigpending+0x4c/0xe0 (unreliable)
>> [0.010915] [c3603e10] [c000c764] system_call_common+0xf4/0x258
>> [0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8
>> [0.010926] NIP: c0092dec LR: c0114fc8 CTR:
>> [0.010930] REGS: c3603e80 TRAP: 0c00 Not tainted (5.14.0-rc3-next-20210727)
>> [0.010934] MSR: 80009033 CR: 28000222 XER:
>> [0.010943] IRQMASK: 0
>> [0.010943] GPR00: c202e9c0 c3603b00 c29bd000 f134
>> [0.010943] GPR04: 0a68 0400 c3603868
>> [0.010943] GPR08:
>> [0.010943] GPR12: c0001ec9ee80 c0012a28
>> [0.010943] GPR16:
>> [0.010943] GPR20:
>> [0.010943] GPR24: c20033c4 c110afc0 c2081950 c3277d40
>> [0.010943] GPR28: ca68 0400 000d
>> [0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8
>> [0.010993] LR [c0114fc8] set_memory_encrypted+0x38/0x60
>> [0.010999] --- interrupt: c00
>> [0.011001] [c3603b00] [c000c764] system_call_common+0xf4/0x258 (unreliable)
>> [0.011008] Instruction dump:
>> [0.011011] 694a0003 312a 7d495110 0b0a 6000 6000 e87f0108 68690002
>> [0.011019] 7929ffe2 0b09 68634000 786397e2 <0b03> e93f0138 792907e0 0b09
>> [0.011029] ---[ end trace a20ad55589efcb10 ]---
>> [0.012297]
>> [1.012304] Kernel panic - not syncing: Fatal exception
>>
>> next-20210723 was good. The boot failure seems to have been introduced with
>> next-20210726.
>>
>> I have attached the boot log.
>
> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
> series just keeps on giving... Adding some people from that thread to
> this one. Original thread:
> https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/

This is because powerpc's set_memory_encrypted makes an ultracall but it
does not exist on that processor.

x86's set_memory_encrypted/decrypted have:

	/* Nothing to do if memory encryption is not active */
	if (!mem_encrypt_active())
		return 0;

Probably powerpc should just do that too.

Thanks,
Nick
Re: [PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()
On Thu, 29 Jul 2021 10:00:51 +0800
Kefeng Wang wrote:

> On 2021/7/28 23:28, Steven Rostedt wrote:
> > On Wed, 28 Jul 2021 16:13:18 +0800
> > Kefeng Wang wrote:
> >
> >> The is_kernel[_text]() function check the address whether or not
> >> in kernel[_text] ranges, also they will check the address whether
> >> or not in gate area, so use better name.
> > Do you know what a gate area is?
> >
> > Because I believe gate area is kernel text, so the rename just makes it
> > redundant and more confusing.
>
> Yes, the gate area (eg, vectors part on ARM32, similar on x86/ia64) is
> kernel text.
>
> I want to keep the 'basic' section boundaries check, which only checks
> the start/end of sections, all in section.h; could we use 'generic' or
> 'basic' or 'core' in the naming?
>
> * is_kernel_generic_data() --- comes from core_kernel_data() in kernel.h
> * is_kernel_generic_text()
>
> The old helpers could remain unchanged, any suggestion, thanks.

Because it looks like the check of just being in the range of "_stext"
to "_end" is just an internal helper, why not do what we do all over the
kernel, and just prefix the function with a couple of underscores, which
denote that it is internal?

	__is_kernel_text()

Then you have:

	static inline int is_kernel_text(unsigned long addr)
	{
		if (__is_kernel_text(addr))
			return 1;
		return in_gate_area_no_mm(addr);
	}

-- Steve
Re: [PATCH v5 2/2] KVM: PPC: Book3S HV: Stop forwarding all HFUs to L1
Excerpts from Fabiano Rosas's message of July 28, 2021 12:36 am:
> Nicholas Piggin writes:
>
>> Excerpts from Fabiano Rosas's message of July 27, 2021 6:17 am:
>>> If the nested hypervisor has no access to a facility because it has
>>> been disabled by the host, it should also not be able to see the
>>> Hypervisor Facility Unavailable that arises from one of its guests
>>> trying to access the facility.
>>>
>>> This patch turns a HFU that happened in L2 into a Hypervisor Emulation
>>> Assistance interrupt and forwards it to L1 for handling. The ones that
>>> happened because L1 explicitly disabled the facility for L2 are still
>>> let through, along with the corresponding Cause bits in the HFSCR.
>>>
>>> Signed-off-by: Fabiano Rosas
>>> Reviewed-by: Nicholas Piggin
>>> ---
>>>  arch/powerpc/kvm/book3s_hv_nested.c | 32 +++-
>>>  1 file changed, 26 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
>>> index 8215dbd4be9a..d544b092b49a 100644
>>> --- a/arch/powerpc/kvm/book3s_hv_nested.c
>>> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
>>> @@ -99,7 +99,7 @@ static void byteswap_hv_regs(struct hv_guest_state *hr)
>>>  	hr->dawrx1 = swab64(hr->dawrx1);
>>>  }
>>>
>>> -static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
>>> +static void save_hv_return_state(struct kvm_vcpu *vcpu,
>>>  				 struct hv_guest_state *hr)
>>>  {
>>>  	struct kvmppc_vcore *vc = vcpu->arch.vcore;
>>> @@ -118,7 +118,7 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
>>>  	hr->pidr = vcpu->arch.pid;
>>>  	hr->cfar = vcpu->arch.cfar;
>>>  	hr->ppr = vcpu->arch.ppr;
>>> -	switch (trap) {
>>> +	switch (vcpu->arch.trap) {
>>>  	case BOOK3S_INTERRUPT_H_DATA_STORAGE:
>>>  		hr->hdar = vcpu->arch.fault_dar;
>>>  		hr->hdsisr = vcpu->arch.fault_dsisr;
>>> @@ -128,9 +128,29 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
>>>  		hr->asdr = vcpu->arch.fault_gpa;
>>>  		break;
>>>  	case BOOK3S_INTERRUPT_H_FAC_UNAVAIL:
>>> -		hr->hfscr = ((~HFSCR_INTR_CAUSE & hr->hfscr) |
>>> -			     (HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
>>> -		break;
>>> +	{
>>> +		u8 cause = vcpu->arch.hfscr >> 56;
>>
>> Can this be u64 just to help gcc?
>>
>
> Yes.
>
>>> +
>>> +		WARN_ON_ONCE(cause >= BITS_PER_LONG);
>>> +
>>> +		if (!(hr->hfscr & (1UL << cause))) {
>>> +			hr->hfscr = ((~HFSCR_INTR_CAUSE & hr->hfscr) |
>>> +				     (HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
>>> +			break;
>>> +		}
>>> +
>>> +		/*
>>> +		 * We have disabled this facility, so it does not
>>> +		 * exist from L1's perspective. Turn it into a HEAI.
>>> +		 */
>>> +		vcpu->arch.trap = BOOK3S_INTERRUPT_H_EMUL_ASSIST;
>>> +		kvmppc_load_last_inst(vcpu, INST_GENERIC,
>>> +				      &vcpu->arch.emul_inst);
>>
>> Hmm, this doesn't handle kvmppc_load_last_inst failure. Other code tends
>> to just resume guest and retry in this case. Can we do that here?
>>
>
> Not at this point. The other code does that inside
> kvmppc_handle_exit_hv, which is called from kvmhv_run_single_vcpu. And
> since we're changing the interrupt, I cannot load the last instruction
> at kvmppc_handle_nested_exit because at that point this is still an HFU.
>
> Unless I do it anyway at the HFU handler and put a comment explaining
> the situation.

Yeah I think it would be better to move this logic to the nested exit
handler.

Thanks,
Nick
Re: [PATCH] ibmvfc: fix command state accounting and stale response detection
On Fri, 16 Jul 2021 14:52:20 -0600, Tyrel Datwyler wrote:

> Prior to commit 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside
> the host/queue lock") responses to commands were completed sequentially
> with the host lock held, such that a command had a basic binary state of
> active or free. It was therefore a simple affair of ensuring the
> associated ibmvfc_event to a VIOS response was valid by testing that it
> was not already free. The lock relaxation work to complete commands
> outside the lock inadvertently made it a trinary command state, such
> that a command is either in flight, received and being completed, or
> completed and now free. This breaks the stale command detection logic,
> as a command may still be marked active and have been placed on the
> delayed completion list when a second stale response for the same
> command arrives. This can lead to double completions and list
> corruption. This issue was exposed by a recent VIOS regression where a
> missing memory barrier could occasionally result in the ibmvfc client
> receiving a duplicate response for the same command.
>
> [...]

Applied to 5.14/scsi-fixes, thanks!

[1/1] ibmvfc: fix command state accounting and stale response detection
      https://git.kernel.org/mkp/scsi/c/73bfdf707d01

--
Martin K. Petersen
Oracle Linux Engineering
Re: [PATCH v2 2/7] kallsyms: Fix address-checks for kernel related range
On 2021/7/28 22:46, Steven Rostedt wrote:
> On Wed, 28 Jul 2021 16:13:15 +0800
> Kefeng Wang wrote:
>
>> The is_kernel_inittext/is_kernel_text/is_kernel function should not
>> include the end address (the labels _einittext, _etext and _end) when
>> checking the address range; the issue has existed since Linux v2.6.12.
>>
>> Cc: Arnd Bergmann
>> Cc: Sergey Senozhatsky
>> Cc: Petr Mladek
>> Acked-by: Sergey Senozhatsky
>> Reviewed-by: Petr Mladek
>> Signed-off-by: Kefeng Wang
>
> Reviewed-by: Steven Rostedt (VMware)
>
> -- Steve

Thanks.
Re: [PATCH v2 6/7] sections: Add new is_kernel() and is_kernel_text()
On 2021/7/28 23:32, Steven Rostedt wrote:
> On Wed, 28 Jul 2021 16:13:19 +0800
> Kefeng Wang wrote:
>
>> @@ -64,8 +64,7 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr)
>>  int notrace core_kernel_text(unsigned long addr)
>>  {
>> -	if (addr >= (unsigned long)_stext &&
>> -	    addr < (unsigned long)_etext)
>> +	if (is_kernel_text(addr))
>
> Perhaps this was a bug, and these functions should be checking the gate
> area as well, as that is part of kernel text.

Ok, I would fix this if patch5 is reviewed well.

> -- Steve

>>  		return 1;
>>
>>  	if (system_state < SYSTEM_RUNNING &&
>> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
>> index 884a950c7026..88f5b0c058b7 100644
>> --- a/mm/kasan/report.c
>> +++ b/mm/kasan/report.c
>> @@ -235,7 +235,7 @@ static void describe_object(struct kmem_cache *cache, void *object,
>>  static inline bool kernel_or_module_addr(const void *addr)
>>  {
>> -	if (addr >= (void *)_stext && addr < (void *)_end)
>> +	if (is_kernel((unsigned long)addr))
>>  		return true;
>>
>>  	if (is_module_address((unsigned long)addr))
>>  		return true;
Re: [PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()
On 2021/7/28 23:28, Steven Rostedt wrote:
> On Wed, 28 Jul 2021 16:13:18 +0800
> Kefeng Wang wrote:
>
>> The is_kernel[_text]() function check the address whether or not
>> in kernel[_text] ranges, also they will check the address whether
>> or not in gate area, so use better name.
>
> Do you know what a gate area is?
>
> Because I believe gate area is kernel text, so the rename just makes it
> redundant and more confusing.
>
> -- Steve

Yes, the gate area (eg, vectors part on ARM32, similar on x86/ia64) is
kernel text.

I want to keep the 'basic' section boundaries check, which only checks
the start/end of sections, all in section.h; could we use 'generic' or
'basic' or 'core' in the naming?

* is_kernel_generic_data() --- comes from core_kernel_data() in kernel.h
* is_kernel_generic_text()

The old helpers could remain unchanged, any suggestion, thanks.
Re: [PATCH] arch: Kconfig: clean up obsolete use of HAVE_IDE
On 7/28/21 12:21 PM, Lukas Bulwahn wrote:
> The arch-specific Kconfig files use HAVE_IDE to indicate if IDE is
> supported.
>
> As IDE support and the HAVE_IDE config vanishes with commit b7fb14d3ac63
> ("ide: remove the legacy ide driver"), there is no need to mention
> HAVE_IDE in all those arch-specific Kconfig files.
>
> The issue was identified with ./scripts/checkkconfigsymbols.py.

Thanks, let's queue this for 5.14 to avoid any future conflicts with it.

--
Jens Axboe
Re: [PATCH] arch: Kconfig: clean up obsolete use of HAVE_IDE
On 7/28/21 11:21 AM, Lukas Bulwahn wrote:
> The arch-specific Kconfig files use HAVE_IDE to indicate if IDE is
> supported.
>
> As IDE support and the HAVE_IDE config vanishes with commit b7fb14d3ac63
> ("ide: remove the legacy ide driver"), there is no need to mention
> HAVE_IDE in all those arch-specific Kconfig files.
>
> The issue was identified with ./scripts/checkkconfigsymbols.py.
>
> Fixes: b7fb14d3ac63 ("ide: remove the legacy ide driver")
> Suggested-by: Randy Dunlap
> Signed-off-by: Lukas Bulwahn

Acked-by: Randy Dunlap

Thanks.

> ---
>  arch/alpha/Kconfig            | 1 -
>  arch/arm/Kconfig              | 6 --
>  arch/arm/mach-davinci/Kconfig | 1 -
>  arch/h8300/Kconfig.cpu        | 1 -
>  arch/ia64/Kconfig             | 1 -
>  arch/m68k/Kconfig             | 1 -
>  arch/mips/Kconfig             | 1 -
>  arch/parisc/Kconfig           | 1 -
>  arch/powerpc/Kconfig          | 1 -
>  arch/sh/Kconfig               | 1 -
>  arch/sparc/Kconfig            | 1 -
>  arch/x86/Kconfig              | 1 -
>  arch/xtensa/Kconfig           | 1 -
>  13 files changed, 18 deletions(-)

--
~Randy
[PATCH] arch: Kconfig: clean up obsolete use of HAVE_IDE
The arch-specific Kconfig files use HAVE_IDE to indicate if IDE is
supported.

As IDE support and the HAVE_IDE config vanishes with commit b7fb14d3ac63
("ide: remove the legacy ide driver"), there is no need to mention
HAVE_IDE in all those arch-specific Kconfig files.

The issue was identified with ./scripts/checkkconfigsymbols.py.

Fixes: b7fb14d3ac63 ("ide: remove the legacy ide driver")
Suggested-by: Randy Dunlap
Signed-off-by: Lukas Bulwahn
---
 arch/alpha/Kconfig            | 1 -
 arch/arm/Kconfig              | 6 --
 arch/arm/mach-davinci/Kconfig | 1 -
 arch/h8300/Kconfig.cpu        | 1 -
 arch/ia64/Kconfig             | 1 -
 arch/m68k/Kconfig             | 1 -
 arch/mips/Kconfig             | 1 -
 arch/parisc/Kconfig           | 1 -
 arch/powerpc/Kconfig          | 1 -
 arch/sh/Kconfig               | 1 -
 arch/sparc/Kconfig            | 1 -
 arch/x86/Kconfig              | 1 -
 arch/xtensa/Kconfig           | 1 -
 13 files changed, 18 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 77d3280dc678..a6d4c2f744e3 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -14,7 +14,6 @@ config ALPHA
 	select PCI_SYSCALL if PCI
 	select HAVE_AOUT
 	select HAVE_ASM_MODVERSIONS
-	select HAVE_IDE
 	select HAVE_PCSPKR_PLATFORM
 	select HAVE_PERF_EVENTS
 	select NEED_DMA_MAP_STATE
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 82f908fa5676..2fb7012c3246 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -95,7 +95,6 @@ config ARM
 	select HAVE_FUNCTION_TRACER if !XIP_KERNEL
 	select HAVE_GCC_PLUGINS
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7)
-	select HAVE_IDE if PCI || ISA || PCMCIA
 	select HAVE_IRQ_TIME_ACCOUNTING
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZ4
@@ -361,7 +360,6 @@ config ARCH_FOOTBRIDGE
 	bool "FootBridge"
 	select CPU_SA110
 	select FOOTBRIDGE
-	select HAVE_IDE
 	select NEED_MACH_IO_H if !MMU
 	select NEED_MACH_MEMORY_H
 	help
@@ -430,7 +428,6 @@ config ARCH_PXA
 	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIO_PXA
 	select GPIOLIB
-	select HAVE_IDE
 	select IRQ_DOMAIN
 	select PLAT_PXA
 	select SPARSE_IRQ
@@ -446,7 +443,6 @@ config ARCH_RPC
 	select ARM_HAS_SG_CHAIN
 	select CPU_SA110
 	select FIQ
-	select HAVE_IDE
 	select HAVE_PATA_PLATFORM
 	select ISA_DMA_API
 	select LEGACY_TIMER_TICK
@@ -469,7 +465,6 @@ config ARCH_SA1100
 	select CPU_SA1100
 	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIOLIB
-	select HAVE_IDE
 	select IRQ_DOMAIN
 	select ISA
 	select NEED_MACH_MEMORY_H
@@ -505,7 +500,6 @@ config ARCH_OMAP1
 	select GENERIC_IRQ_CHIP
 	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIOLIB
-	select HAVE_IDE
 	select HAVE_LEGACY_CLK
 	select IRQ_DOMAIN
 	select NEED_MACH_IO_H if PCCARD
diff --git a/arch/arm/mach-davinci/Kconfig b/arch/arm/mach-davinci/Kconfig
index de11030748d0..1d3aef84287d 100644
--- a/arch/arm/mach-davinci/Kconfig
+++ b/arch/arm/mach-davinci/Kconfig
@@ -9,7 +9,6 @@ menuconfig ARCH_DAVINCI
 	select PM_GENERIC_DOMAINS_OF if PM && OF
 	select REGMAP_MMIO
 	select RESET_CONTROLLER
-	select HAVE_IDE
 	select PINCTRL_SINGLE

 if ARCH_DAVINCI
diff --git a/arch/h8300/Kconfig.cpu b/arch/h8300/Kconfig.cpu
index 2b9cbaf41cd0..e4467d40107d 100644
--- a/arch/h8300/Kconfig.cpu
+++ b/arch/h8300/Kconfig.cpu
@@ -44,7 +44,6 @@ config H8300_H8MAX
 	bool "H8MAX"
 	select H83069
 	select RAMKERNEL
-	select HAVE_IDE
 	help
 	  H8MAX Evaluation Board Support
 	  More Information. (Japanese Only)
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index cf425c2c63af..4993c7ac7ff6 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -25,7 +25,6 @@ config IA64
 	select HAVE_ASM_MODVERSIONS
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_EXIT_THREAD
-	select HAVE_IDE
 	select HAVE_KPROBES
 	select HAVE_KRETPROBES
 	select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 96989ad46f66..d632a1d576f9 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -23,7 +23,6 @@ config M68K
 	select HAVE_DEBUG_BUGVERBOSE
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_HAS_NO_UNALIGNED
 	select HAVE_FUTEX_CMPXCHG if MMU && FUTEX
-	select HAVE_IDE
 	select HAVE_MOD_ARCH_SPECIFIC
 	select HAVE_UID16
 	select MMU_GATHER_NO_RANGE if MMU
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index cee6087cd686..6dfb27d531dd 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -71,7 +71,6 @@ config MIPS
 	select HAVE_FUNCTION_TRACER
 	select HAVE_GCC_PLUGINS
 	select HAVE_GENERIC_VDSO
-	select HAVE_IDE
 	select HAVE_IOREMAP
[PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings
On POWER10 systems, the "ibm,thread-groups" property "2" indicates the cpus in thread-group share both L2 and L3 caches. Hence, use cache_property = 2 itself to find both the L2 and L3 cache siblings. Hence, create a new thread_group_l3_cache_map to keep list of L3 siblings, but fill the mask using same property "2" array. Signed-off-by: Parth Shah --- arch/powerpc/include/asm/smp.h | 3 ++ arch/powerpc/kernel/cacheinfo.c | 3 ++ arch/powerpc/kernel/smp.c | 66 ++--- 3 files changed, 51 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h index 1259040cc3a4..7ef1cd8168a0 100644 --- a/arch/powerpc/include/asm/smp.h +++ b/arch/powerpc/include/asm/smp.h @@ -35,6 +35,7 @@ extern int *chip_id_lookup_table; DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map); DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map); +DECLARE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map); #ifdef CONFIG_SMP @@ -144,6 +145,7 @@ extern int cpu_to_core_id(int cpu); extern bool has_big_cores; extern bool thread_group_shares_l2; +extern bool thread_group_shares_l3; #define cpu_smt_mask cpu_smt_mask #ifdef CONFIG_SCHED_SMT @@ -198,6 +200,7 @@ extern void __cpu_die(unsigned int cpu); #define hard_smp_processor_id()get_hard_smp_processor_id(0) #define smp_setup_cpu_maps() #define thread_group_shares_l2 0 +#define thread_group_shares_l3 0 static inline void inhibit_secondary_onlining(void) {} static inline void uninhibit_secondary_onlining(void) {} static inline const struct cpumask *cpu_sibling_mask(int cpu) diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c index 20d91693eac1..cf1be75b7833 100644 --- a/arch/powerpc/kernel/cacheinfo.c +++ b/arch/powerpc/kernel/cacheinfo.c @@ -469,6 +469,9 @@ static int get_group_id(unsigned int cpu_id, int level) else if (thread_group_shares_l2 && level == 2) return cpumask_first(per_cpu(thread_group_l2_cache_map, cpu_id)); + else if (thread_group_shares_l3 && level == 3) + 
		return cpumask_first(per_cpu(thread_group_l3_cache_map,
+					     cpu_id));
 	return -1;
 }
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a7fcac44a8e2..f2abd88e0c25 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -78,6 +78,7 @@ struct task_struct *secondary_current;
 bool has_big_cores;
 bool coregroup_enabled;
 bool thread_group_shares_l2;
+bool thread_group_shares_l3;
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
@@ -101,7 +102,7 @@ enum {
 #define MAX_THREAD_LIST_SIZE 8
 #define THREAD_GROUP_SHARE_L1 1
-#define THREAD_GROUP_SHARE_L2 2
+#define THREAD_GROUP_SHARE_L2_L3 2
 struct thread_groups {
 	unsigned int property;
 	unsigned int nr_groups;
@@ -131,6 +132,12 @@ DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
  */
 DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+/*
+ * On P10, thread_group_l3_cache_map for each CPU is equal to the
+ * thread_group_l2_cache_map
+ */
+DEFINE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map);
+
 /* SMP operations for this machine */
 struct smp_ops_t *smp_ops;
@@ -889,19 +896,41 @@ static struct thread_groups *__init get_thread_groups(int cpu,
 	return tg;
 }
+static int update_mask_from_threadgroup(cpumask_var_t *mask, struct thread_groups *tg, int cpu, int cpu_group_start)
+{
+	int first_thread = cpu_first_thread_sibling(cpu);
+	int i;
+
+	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
+
+	for (i = first_thread; i < first_thread + threads_per_core; i++) {
+		int i_group_start = get_cpu_thread_group_start(i, tg);
+
+		if (unlikely(i_group_start == -1)) {
+			WARN_ON_ONCE(1);
+			return -ENODATA;
+		}
+
+		if (i_group_start == cpu_group_start)
+			cpumask_set_cpu(i, *mask);
+	}
+
+	return 0;
+}
+
 static int __init init_thread_group_cache_map(int cpu, int cache_property)
 {
-	int first_thread = cpu_first_thread_sibling(cpu);
-	int i, cpu_group_start = -1, err = 0;
+	int cpu_group_start = -1, err = 0;
 	struct thread_groups *tg = NULL;
	cpumask_var_t *mask = NULL;

 	if (cache_property != THREAD_GROUP_SHARE_L1 &&
-	    cache_property != THREAD_GROUP_SHARE_L2)
+	    cache_property != THREAD_GROUP_SHARE_L2_L3)
 		return -EINVAL;

 	tg = get_thread_groups(cpu, cache_property, &err);
+
 	if (!tg)
 		return err;
@@ -912,25 +941,18 @@ static int __init init_thread_group_cache_map(int cpu, int cache_property)
 		return -ENODATA;
 	}
-	if (cache_property == THREAD_GROUP
[PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()
From: "Gautham R. Shenoy"

The helper function get_shared_cpu_map() was added in 'commit 500fe5f550ec ("powerpc/cacheinfo: Report the correct shared_cpu_map on big-cores")' and subsequently expanded upon in 'commit 0be47634db0b ("powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache")' in order to help report the correct groups of threads sharing these caches on big-core systems, where groups of threads within a core can share different sets of caches.

Now that powerpc/cacheinfo is aware of the "ibm,thread-groups" property, cache->shared_cpu_map contains the correct set of thread-siblings sharing the cache. Hence we no longer need the function get_shared_cpu_map(); this patch removes it. We also remove the helper function index_dir_to_cpu(), which was only called by get_shared_cpu_map().

With these functions removed, we still see the correct cache-sibling map/list for L1 and L2 caches on systems with L1 and L2 caches distributed among groups of threads in a core.

With this patch, on an SMT8 POWER10 system where the L1 and L2 caches are split between the two groups of threads in a core, for CPUs 8,9 the L1-Data, L1-Instruction, L2 and L3 cache CPU sibling lists are as follows:

$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15

$ ppc64_cpu --smt=4
$ grep .
/sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11

$ ppc64_cpu --smt=2
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9

$ ppc64_cpu --smt=1
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8

Signed-off-by: Gautham R.
Shenoy
---
 arch/powerpc/kernel/cacheinfo.c | 41 +
 1 file changed, 1 insertion(+), 40 deletions(-)

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 5a6925d87424..20d91693eac1 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -675,45 +675,6 @@ static ssize_t level_show(struct kobject *k, struct kobj_attribute *attr, char *
 static struct kobj_attribute cache_level_attr =
 	__ATTR(level, 0444, level_show, NULL);

-static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
-{
-	struct kobject *index_dir_kobj = &index->kobj;
-	struct kobject *cache_dir_kobj = index_dir_kobj->parent;
-	struct kobject *cpu_dev_kobj = cache_dir_kobj->parent;
-	struct device *dev = kobj_to_dev(cpu_dev_kobj);
-
-	return dev->id;
-}
-
-/*
- * On big-core systems, each core has two groups of CPUs each of which
- * has its own L1-cache. The thread-siblings which share l1-cache with
- * @cpu can be obtained via cpu_smallcore_mask().
- *
- * On some big-core systems, the L2 cache is shared only between some
- * groups of siblings. This is already parsed and encoded in
- * cpu_l2_cache_mask().
- *
- * TODO: cache_lookup_or_instantiate() needs to be made aware of the
- *	 "ibm,thread-groups" property so that cache->shared_cpu_map
- *	 reflects the correct siblings on platforms that have this
- *	 device-tree property. This helper function is only a stop-gap
- *	 solution so that we report the correct siblings to the
- *	 userspace via sysfs.
- */
[PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id
From: "Gautham R. Shenoy"

Currently the cacheinfo code on powerpc indexes the "cache" objects (modelling the L1/L2/L3 caches) where the key is the device-tree node corresponding to that cache. On some of the POWER server platforms, thread-groups within the core share different sets of caches (e.g. on SMT8 POWER9 systems, threads 0,2,4,6 of a core share one L1 cache and threads 1,3,5,7 of the same core share another L1 cache). On such platforms, there is a single device-tree node corresponding to that cache, and the cache-configuration within the threads of the core is indicated via the "ibm,thread-groups" device-tree property.

Since the current code is not aware of the "ibm,thread-groups" property, on the aforementioned systems the cacheinfo code still treats all the threads in the core as sharing the cache because of the single device-tree node (in the earlier example, the cacheinfo code would say that CPUs 0-7 share the L1 cache).

In this patch, we make the powerpc cacheinfo code aware of the "ibm,thread-groups" property. We index the "cache" objects by the key-pair (device-tree node, thread-group id). For any CPU X, for a given level of cache, the thread-group id is defined to be the first CPU in the "ibm,thread-groups" cache-group containing CPU X. For levels of cache which are not represented in the "ibm,thread-groups" property, the thread-group id is -1.

Signed-off-by: Gautham R. Shenoy
[parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map" and "thread_group_l2_cache_map" to get rid of the compile error.]
Signed-off-by: Parth Shah
---
 arch/powerpc/include/asm/smp.h  |  3 ++
 arch/powerpc/kernel/cacheinfo.c | 80 -
 arch/powerpc/kernel/smp.c       |  4 +-
 3 files changed, 63 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 03b3d010cbab..1259040cc3a4 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -33,6 +33,9 @@ extern bool coregroup_enabled;
 extern int cpu_to_chip_id(int cpu);
 extern int *chip_id_lookup_table;

+DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
+DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+
 #ifdef CONFIG_SMP
 struct smp_ops_t {
diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 6f903e9aa20b..5a6925d87424 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -120,6 +120,7 @@ struct cache {
 	struct cpumask shared_cpu_map; /* online CPUs using this cache */
 	int type;                      /* split cache disambiguation */
 	int level;                     /* level not explicit in device tree */
+	int group_id;                  /* id of the group of threads that share this cache */
 	struct list_head list;         /* global list of cache objects */
 	struct cache *next_local;      /* next cache of >= level */
 };
@@ -142,22 +143,24 @@ static const char *cache_type_string(const struct cache *cache)
 }

 static void cache_init(struct cache *cache, int type, int level,
-		       struct device_node *ofnode)
+		       struct device_node *ofnode, int group_id)
 {
 	cache->type = type;
 	cache->level = level;
 	cache->ofnode = of_node_get(ofnode);
+	cache->group_id = group_id;
 	INIT_LIST_HEAD(&cache->list);
 	list_add(&cache->list, &cache_list);
 }

-static struct cache *new_cache(int type, int level, struct device_node *ofnode)
+static struct cache *new_cache(int type, int level,
+			       struct device_node *ofnode, int group_id)
 {
 	struct cache *cache;

 	cache = kzalloc(sizeof(*cache), GFP_KERNEL);
 	if (cache)
-		cache_init(cache, type, level, ofnode);
+		cache_init(cache, type, level, ofnode, group_id);
	return cache;
 }
@@ -309,20 +312,24 @@ static struct cache *cache_find_first_sibling(struct cache *cache)
 		return cache;

 	list_for_each_entry(iter, &cache_list, list)
-		if (iter->ofnode == cache->ofnode && iter->next_local == cache)
+		if (iter->ofnode == cache->ofnode &&
+		    iter->group_id == cache->group_id &&
+		    iter->next_local == cache)
 			return iter;

 	return cache;
 }

-/* return the first cache on a local list matching node */
-static struct cache *cache_lookup_by_node(const struct device_node *node)
+/* return the first cache on a local list matching node and thread-group id */
+static struct cache *cache_lookup_by_node_group(const struct device_node *node,
+						int group_id)
 {
 	struct cache *cache = NULL;
 	struct cache *iter;

 	list_for_each_entry(iter, &cache_list, list) {
-		if (iter->ofnode != node)
+		if (iter->ofnode != node ||
+		    iter->group_i
[PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property
Changes from v1 -> v2:
- Based on Gautham's comments, use a separate thread_group_l3_cache_map and modify the parsing code to build the cache_map for L3. This keeps the cache_map building code isolated from the parsing code.

v1 can be found at:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-June/230680.html

On a POWER10 big-core system, the L3 cache reflected by sysfs contains all the CPUs in the big-core.

grep . /sys/devices/system/cpu/cpu0/cache/index*/shared_cpu_list
/sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list:0-7

In the above example, CPU 0 observes CPUs 0-7 in the L3 (index3) cache, which is not correct, as only the CPUs in the small core share the L3 cache. The "ibm,thread-groups" property contains the value "2" to indicate that the CPUs share both the L2 and L3 caches. This patch-set uses this property to reflect the correct L3 topology to a cache-object.

After applying this patch-set, the topology looks like:

$> ppc64_cpu --smt=8
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9,11,13,15

$> ppc64_cpu --smt=4
$> grep .
/sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9,11

$> ppc64_cpu --smt=2
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9

$> ppc64_cpu --smt=1
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8

Patches Organization:
=====================
This patch-set series is based on top of v5.14-rc2.

- Patch 1-2: Add functionality to introduce awareness for "ibm,thread-groups". The original (not merged) posted version can be found at:
  https://lore.kernel.org/linuxppc-dev/1611041780-8640-1-git-send-email-...@linux.vnet.ibm.co
- Patch 3: Use the existing L2 cache_map to detect L3 cache siblings.

Gautham R.
Shenoy (2):
  powerpc/cacheinfo: Lookup cache by dt node and thread-group id
  powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()

Parth Shah (1):
  powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings

 arch/powerpc/include/asm/smp.h  |   6 ++
 arch/powerpc/kernel/cacheinfo.c | 124
 arch/powerpc/kernel/smp.c       |  70 --
 3 files changed, 115 insertions(+), 85 deletions(-)

-- 
2.26.3
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote: > linux-next fails to boot on Power server (POWER8/POWER9). Following traces > are seen during boot > > [0.010799] software IO TLB: tearing down default memory pool > [0.010805] [ cut here ] > [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98! > [0.010812] Oops: Exception in kernel mode, sig: 5 [#1] > [0.010816] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries > [0.010820] Modules linked in: > [0.010824] CPU: 1 PID: 1 Comm: swapper/0 Not tainted > 5.14.0-rc3-next-20210727 #1 > [0.010830] NIP: c0032cfc LR: c000c764 CTR: > c000c670 > [0.010834] REGS: c3603b10 TRAP: 0700 Not tainted > (5.14.0-rc3-next-20210727) > [0.010838] MSR: 80029033 CR: 28000222 > XER: 0002 > [0.010848] CFAR: c000c760 IRQMASK: 3 > [0.010848] GPR00: c000c764 c3603db0 c29bd000 > 0001 > [0.010848] GPR04: 0a68 0400 c3603868 > > [0.010848] GPR08: > 0003 > [0.010848] GPR12: c0001ec9ee80 c0012a28 > > [0.010848] GPR16: > > [0.010848] GPR20: > > [0.010848] GPR24: f134 > c3603868 > [0.010848] GPR28: 0400 0a68 c202e9c0 > c3603e80 > [0.010896] NIP [c0032cfc] system_call_exception+0x8c/0x2e0 > [0.010901] LR [c000c764] system_call_common+0xf4/0x258 > [0.010907] Call Trace: > [0.010909] [c3603db0] [c016a6dc] > calculate_sigpending+0x4c/0xe0 (unreliable) > [0.010915] [c3603e10] [c000c764] > system_call_common+0xf4/0x258 > [0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8 > [0.010926] NIP: c0092dec LR: c0114fc8 CTR: > > [0.010930] REGS: c3603e80 TRAP: 0c00 Not tainted > (5.14.0-rc3-next-20210727) > [0.010934] MSR: 80009033 CR: 28000222 > XER: > [0.010943] IRQMASK: 0 > [0.010943] GPR00: c202e9c0 c3603b00 c29bd000 > f134 > [0.010943] GPR04: 0a68 0400 c3603868 > > [0.010943] GPR08: > > [0.010943] GPR12: c0001ec9ee80 c0012a28 > > [0.010943] GPR16: > > [0.010943] GPR20: > > [0.010943] GPR24: c20033c4 c110afc0 c2081950 > c3277d40 > [0.010943] GPR28: ca68 0400 > 000d > [0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8 > [0.010993] LR 
[c0114fc8] set_memory_encrypted+0x38/0x60 > [0.010999] --- interrupt: c00 > [0.011001] [c3603b00] [c000c764] > system_call_common+0xf4/0x258 (unreliable) > [0.011008] Instruction dump: > [0.011011] 694a0003 312a 7d495110 0b0a 6000 6000 e87f0108 > 68690002 > [0.011019] 7929ffe2 0b09 68634000 786397e2 <0b03> e93f0138 > 792907e0 0b09 > [0.011029] ---[ end trace a20ad55589efcb10 ]--- > [0.012297] > [1.012304] Kernel panic - not syncing: Fatal exception > > next-20210723 was good. The boot failure seems to have been introduced with > next-20210726. > > I have attached the boot log. I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That series just keeps on giving... Adding some people from that thread to this one. Original thread: https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/ [1]: https://github.com/openSUSE/kernel-source/raw/master/config/ppc64le/default Cheers, Nathan
Re: [PATCH v5 1/6] kexec: move locking into do_kexec_load
Arnd Bergmann writes: > From: Arnd Bergmann > > The locking is the same between the native and compat version of > sys_kexec_load(), so it can be done in the common implementation > to reduce duplication. Acked-by: "Eric W. Biederman" > > Co-developed-by: Eric Biederman > Co-developed-by: Christoph Hellwig > Signed-off-by: Arnd Bergmann > --- > kernel/kexec.c | 44 > 1 file changed, 16 insertions(+), 28 deletions(-) > > diff --git a/kernel/kexec.c b/kernel/kexec.c > index c82c6c06f051..9c7aef8f4bb6 100644 > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -110,6 +110,17 @@ static int do_kexec_load(unsigned long entry, unsigned > long nr_segments, > unsigned long i; > int ret; > > + /* > + * Because we write directly to the reserved memory region when loading > + * crash kernels we need a mutex here to prevent multiple crash kernels > + * from attempting to load simultaneously, and to prevent a crash kernel > + * from loading over the top of a in use crash kernel. > + * > + * KISS: always take the mutex. 
> + */ > + if (!mutex_trylock(&kexec_mutex)) > + return -EBUSY; > + > if (flags & KEXEC_ON_CRASH) { > dest_image = &kexec_crash_image; > if (kexec_crash_image) > @@ -121,7 +132,8 @@ static int do_kexec_load(unsigned long entry, unsigned > long nr_segments, > if (nr_segments == 0) { > /* Uninstall image */ > kimage_free(xchg(dest_image, NULL)); > - return 0; > + ret = 0; > + goto out_unlock; > } > if (flags & KEXEC_ON_CRASH) { > /* > @@ -134,7 +146,7 @@ static int do_kexec_load(unsigned long entry, unsigned > long nr_segments, > > ret = kimage_alloc_init(&image, entry, nr_segments, segments, flags); > if (ret) > - return ret; > + goto out_unlock; > > if (flags & KEXEC_PRESERVE_CONTEXT) > image->preserve_context = 1; > @@ -171,6 +183,8 @@ static int do_kexec_load(unsigned long entry, unsigned > long nr_segments, > arch_kexec_protect_crashkres(); > > kimage_free(image); > +out_unlock: > + mutex_unlock(&kexec_mutex); > return ret; > } > > @@ -247,21 +261,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, > unsigned long, nr_segments, > ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT)) > return -EINVAL; > > - /* Because we write directly to the reserved memory > - * region when loading crash kernels we need a mutex here to > - * prevent multiple crash kernels from attempting to load > - * simultaneously, and to prevent a crash kernel from loading > - * over the top of a in use crash kernel. > - * > - * KISS: always take the mutex. 
> - */ > - if (!mutex_trylock(&kexec_mutex)) > - return -EBUSY; > - > result = do_kexec_load(entry, nr_segments, segments, flags); > > - mutex_unlock(&kexec_mutex); > - > return result; > } > > @@ -301,21 +302,8 @@ COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry, > return -EFAULT; > } > > - /* Because we write directly to the reserved memory > - * region when loading crash kernels we need a mutex here to > - * prevent multiple crash kernels from attempting to load > - * simultaneously, and to prevent a crash kernel from loading > - * over the top of a in use crash kernel. > - * > - * KISS: always take the mutex. > - */ > - if (!mutex_trylock(&kexec_mutex)) > - return -EBUSY; > - > result = do_kexec_load(entry, nr_segments, ksegments, flags); > > - mutex_unlock(&kexec_mutex); > - > return result; > } > #endif
Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features
On Wed, Jul 28, 2021 at 02:17:27PM +0100, Christoph Hellwig wrote:
> So common checks obviously make sense, but I really hate the stupid
> multiplexer. Having one well-documented helper per feature is much
> easier to follow.

We had that in x86 - it was called cpu_has_xxx, where xxx is the feature bit. It didn't scale with the sheer amount of feature bits that kept getting added, so we do cpu_feature_enabled(X86_FEATURE_XXX) now.

The idea behind this is very similar - those protected guest flags will only grow in the couple-of-tens range - at least - so having a multiplexer is a lot simpler, I'd say, than having a couple of tens of helpers. And those PATTR flags should have good, readable names, btw.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v5 2/6] kexec: avoid compat_alloc_user_space
Arnd Bergmann writes: > From: Arnd Bergmann > > kimage_alloc_init() expects a __user pointer, so compat_sys_kexec_load() > uses compat_alloc_user_space() to convert the layout and put it back > onto the user space caller stack. > > Moving the user space access into the syscall handler directly actually > makes the code simpler, as the conversion for compat mode can now be > done on kernel memory. Acked-by: "Eric W. Biederman" > > Co-developed-by: Eric Biederman > Co-developed-by: Christoph Hellwig > Link: https://lore.kernel.org/lkml/ypbtsu4gx6pl7%2...@infradead.org/ > Link: https://lore.kernel.org/lkml/m1y2cbzmnw@fess.ebiederm.org/ > Signed-off-by: Arnd Bergmann > --- > kernel/kexec.c | 61 +- > 1 file changed, 25 insertions(+), 36 deletions(-) > > diff --git a/kernel/kexec.c b/kernel/kexec.c > index 9c7aef8f4bb6..b5e40f069768 100644 > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -19,26 +19,9 @@ > > #include "kexec_internal.h" > > -static int copy_user_segment_list(struct kimage *image, > - unsigned long nr_segments, > - struct kexec_segment __user *segments) > -{ > - int ret; > - size_t segment_bytes; > - > - /* Read in the segments */ > - image->nr_segments = nr_segments; > - segment_bytes = nr_segments * sizeof(*segments); > - ret = copy_from_user(image->segment, segments, segment_bytes); > - if (ret) > - ret = -EFAULT; > - > - return ret; > -} > - > static int kimage_alloc_init(struct kimage **rimage, unsigned long entry, >unsigned long nr_segments, > - struct kexec_segment __user *segments, > + struct kexec_segment *segments, >unsigned long flags) > { > int ret; > @@ -58,10 +41,8 @@ static int kimage_alloc_init(struct kimage **rimage, > unsigned long entry, > return -ENOMEM; > > image->start = entry; > - > - ret = copy_user_segment_list(image, nr_segments, segments); > - if (ret) > - goto out_free_image; > + image->nr_segments = nr_segments; > + memcpy(image->segment, segments, nr_segments * sizeof(*segments)); > > if (kexec_on_panic) { > /* Enable special 
crash kernel control page alloc policy. */ > @@ -104,7 +85,7 @@ static int kimage_alloc_init(struct kimage **rimage, > unsigned long entry, > } > > static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > - struct kexec_segment __user *segments, unsigned long flags) > + struct kexec_segment *segments, unsigned long flags) > { > struct kimage **dest_image, *image; > unsigned long i; > @@ -250,7 +231,8 @@ static inline int kexec_load_check(unsigned long > nr_segments, > SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments, > struct kexec_segment __user *, segments, unsigned long, flags) > { > - int result; > + struct kexec_segment *ksegments; > + unsigned long result; > > result = kexec_load_check(nr_segments, flags); > if (result) > @@ -261,7 +243,12 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, > unsigned long, nr_segments, > ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT)) > return -EINVAL; > > - result = do_kexec_load(entry, nr_segments, segments, flags); > + ksegments = memdup_user(segments, nr_segments * sizeof(ksegments[0])); > + if (IS_ERR(ksegments)) > + return PTR_ERR(ksegments); > + > + result = do_kexec_load(entry, nr_segments, ksegments, flags); > + kfree(ksegments); > > return result; > } > @@ -273,7 +260,7 @@ COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry, > compat_ulong_t, flags) > { > struct compat_kexec_segment in; > - struct kexec_segment out, __user *ksegments; > + struct kexec_segment *ksegments; > unsigned long i, result; > > result = kexec_load_check(nr_segments, flags); > @@ -286,24 +273,26 @@ COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, > entry, > if ((flags & KEXEC_ARCH_MASK) == KEXEC_ARCH_DEFAULT) > return -EINVAL; > > - ksegments = compat_alloc_user_space(nr_segments * sizeof(out)); > + ksegments = kmalloc_array(nr_segments, sizeof(ksegments[0]), > + GFP_KERNEL); > + if (!ksegments) > + return -ENOMEM; > + > for (i = 0; i < nr_segments; i++) { > result = copy_from_user(&in, 
&segments[i], sizeof(in)); > if (result) > - return -EFAULT; > + goto fail; > > - out.buf = compat_ptr(in.buf); > - out.bufsz = in.bufsz; > - out.mem = in.mem; > - out.memsz = in.memsz; > - > - result = copy_to_user(&ksegments
Re: [PATCH v2 6/7] sections: Add new is_kernel() and is_kernel_text()
On Wed, 28 Jul 2021 16:13:19 +0800 Kefeng Wang wrote:

> @@ -64,8 +64,7 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr)
>
>  int notrace core_kernel_text(unsigned long addr)
>  {
> -	if (addr >= (unsigned long)_stext &&
> -	    addr < (unsigned long)_etext)
> +	if (is_kernel_text(addr))

Perhaps this was a bug, and these functions should be checking the gate area as well, as that is part of kernel text.

-- Steve

>  		return 1;
>
>  	if (system_state < SYSTEM_RUNNING &&
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index 884a950c7026..88f5b0c058b7 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -235,7 +235,7 @@ static void describe_object(struct kmem_cache *cache, void *object,
>
>  static inline bool kernel_or_module_addr(const void *addr)
>  {
> -	if (addr >= (void *)_stext && addr < (void *)_end)
> +	if (is_kernel((unsigned long)addr))
>  		return true;
>  	if (is_module_address((unsigned long)addr))
>  		return true;
> --
Re: [PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()
On Wed, 28 Jul 2021 16:13:18 +0800 Kefeng Wang wrote:

> The is_kernel[_text]() function check the address whether or not
> in kernel[_text] ranges, also they will check the address whether
> or not in gate area, so use better name.

Do you know what a gate area is? Because I believe gate area is kernel text, so the rename just makes it redundant and more confusing.

-- Steve
Re: [PATCH v2 2/7] kallsyms: Fix address-checks for kernel related range
On Wed, 28 Jul 2021 16:13:15 +0800 Kefeng Wang wrote:

> The is_kernel_inittext/is_kernel_text/is_kernel function should not
> include the end address (the labels _einittext, _etext and _end) when
> checking the address range; the issue exists since Linux v2.6.12.
>
> Cc: Arnd Bergmann
> Cc: Sergey Senozhatsky
> Cc: Petr Mladek
> Acked-by: Sergey Senozhatsky
> Reviewed-by: Petr Mladek
> Signed-off-by: Kefeng Wang

Reviewed-by: Steven Rostedt (VMware)

-- Steve

> ---
>  include/linux/kallsyms.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
> index 2a241e3f063f..b016c62f30a6 100644
> --- a/include/linux/kallsyms.h
> +++ b/include/linux/kallsyms.h
> @@ -27,21 +27,21 @@ struct module;
>  static inline int is_kernel_inittext(unsigned long addr)
>  {
>  	if (addr >= (unsigned long)_sinittext
> -	    && addr <= (unsigned long)_einittext)
> +	    && addr < (unsigned long)_einittext)
>  		return 1;
>  	return 0;
>  }
>
>  static inline int is_kernel_text(unsigned long addr)
>  {
> -	if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
> +	if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
>  		return 1;
>  	return in_gate_area_no_mm(addr);
>  }
>
>  static inline int is_kernel(unsigned long addr)
>  {
> -	if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
> +	if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
>  		return 1;
>  	return in_gate_area_no_mm(addr);
>  }
Re: [PATCH 02/11] x86/sev: Add an x86 version of prot_guest_has()
On Tue, Jul 27, 2021 at 05:26:05PM -0500, Tom Lendacky via iommu wrote:
> Introduce an x86 version of the prot_guest_has() function. This will be
> used in the more generic x86 code to replace vendor specific calls like
> sev_active(), etc.
>
> While the name suggests this is intended mainly for guests, it will
> also be used for host memory encryption checks in place of sme_active().
>
> The amd_prot_guest_has() function does not use EXPORT_SYMBOL_GPL for the
> same reasons previously stated when changing sme_active(), sev_active and

None of that applies here as none of the callers get pulled into random macros. The only case of that is sme_me_mask through sme_mask, but that's not something this series replaces as far as I can tell.
Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features
On Tue, Jul 27, 2021 at 05:26:04PM -0500, Tom Lendacky via iommu wrote:
> In prep for other protected virtualization technologies, introduce a
> generic helper function, prot_guest_has(), that can be used to check
> for specific protection attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks
> to the code (e.g. if (sev_active() || tdx_active())).

So common checks obviously make sense, but I really hate the stupid multiplexer. Having one well-documented helper per feature is much easier to follow.

> +#define PATTR_MEM_ENCRYPT		0	/* Encrypted memory */
> +#define PATTR_HOST_MEM_ENCRYPT		1	/* Host encrypted memory */
> +#define PATTR_GUEST_MEM_ENCRYPT	2	/* Guest encrypted memory */
> +#define PATTR_GUEST_PROT_STATE		3	/* Guest encrypted state */

The kerneldoc comments on these individual helpers will give you plenty of space to properly document what they indicate and what a (potential) caller should do based on them. Something the above comments completely fail to.
Re: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
Dear Michael,

On 28.07.21 at 14:43, Michael Ellerman wrote:
> Paul Menzel writes:
>> Am 28.07.21 um 01:14 schrieb Benjamin Herrenschmidt:
>>> On Tue, 2021-07-27 at 10:45 +0200, Paul Menzel wrote:
>>>> On ppc64le Go 1.16.2 from Ubuntu 21.04 terminates with a segmentation
>>>> fault [1], and it might be related to *[release-branch.go1.16] runtime:
>>>> fix crash during VDSO calls on PowerPC* [2], conjecturing that commit
>>>> ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
>>>> added in Linux 5.11 causes this.
>>>>
>>>> If this is indeed the case, this would be a regression in userspace. Is
>>>> there a generic fix or should the change be reverted?
>>>
>>> From the look at the links you posted, this appears to be completely
>>> broken assumptions by Go that some registers don't change while calling
>>> what essentially are external library functions *while inside those
>>> functions* (ie in this case from a signal handler).
>>>
>>> I suppose it would be possible to build the VDSO with gcc arguments to
>>> make it not use r30, but that's just gross...
>>
>> Thank you for looking into this. No idea, if it falls under Linux’ no
>> regression policy or not.
>
> Reluctantly yes, I think it does. Though it would have been good if it
> had been reported to us sooner.
>
> It looks like that Go fix is only committed to master, and neither of
> the latest Go 1.16 or 1.15 releases contain the fix? ie. there's no way
> for a user to get a working version of Go other than building master?

I heard it is going to be in Go 1.16.7, but I do not know much about Go. Maybe the folks in Cc can chime in.

> I'll see if we can work around it in the kernel. Are you able to test a
> kernel patch if I send you one?

Yes, I could test a Linux kernel patch on ppc64le (POWER 8) running Ubuntu 21.04.

Kind regards,

Paul
Re: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
Paul Menzel writes: > Am 28.07.21 um 01:14 schrieb Benjamin Herrenschmidt: >> On Tue, 2021-07-27 at 10:45 +0200, Paul Menzel wrote: > >>> On ppc64le Go 1.16.2 from Ubuntu 21.04 terminates with a segmentation >>> fault [1], and it might be related to *[release-branch.go1.16] runtime: >>> fix crash during VDSO calls on PowerPC* [2], conjecturing that commit >>> ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.) >>> added in Linux 5.11 causes this. >>> >>> If this is indeed the case, this would be a regression in userspace. Is >>> there a generic fix or should the change be reverted? >> >> From the look at the links you posted, this appears to be completely >> broken assumptions by Go that some registers don't change while calling >> what essentially are external library functions *while inside those >> functions* (ie in this case from a signal handler). >> >> I suppose it would be possible to build the VDSO with gcc arguments to >> make it not use r30, but that's just gross... > > Thank you for looking into this. No idea, if it falls under Linux’ no > regression policy or not. Reluctantly yes, I think it does. Though it would have been good if it had been reported to us sooner. It looks like that Go fix is only committed to master, and neither of the latest Go 1.16 or 1.15 releases contain the fix? ie. there's no way for a user to get a working version of Go other than building master? I'll see if we can work around it in the kernel. Are you able to test a kernel patch if I send you one? cheers
[PATCH v2 0/1] cpufreq:powernv: Fix init_chip_info initialization in numa=off
v1: https://lkml.org/lkml/2021/7/26/1509 Changelog v1-->v2: Based on comments from Gautham, 1. Included a #define for MAX_NR_CHIPS instead of hardcoding the allocation. Pratik R. Sampat (1): cpufreq:powernv: Fix init_chip_info initialization in numa=off drivers/cpufreq/powernv-cpufreq.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) -- 2.31.1
[PATCH v2 1/1] cpufreq:powernv: Fix init_chip_info initialization in numa=off
In the numa=off kernel command-line configuration, init_chip_info() loops over the number of chips and attempts to copy the cpumask of that node, which is NULL for all iterations after the first chip.

Hence, store the cpumask for each chip while populating the "chips" struct array, instead of deriving it from the node, and copy that into chips[i].mask.

Cc: sta...@vger.kernel.org
Fixes: 053819e0bf84 ("cpufreq: powernv: Handle throttling due to Pmax capping at chip level")
Signed-off-by: Pratik R. Sampat
Reported-by: Shirisha Ganta
Reviewed-by: Gautham R. Shenoy
---
 drivers/cpufreq/powernv-cpufreq.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 005600cef273..5f0e7c315e49 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -36,6 +36,7 @@
 #define MAX_PSTATE_SHIFT	32
 #define LPSTATE_SHIFT	48
 #define GPSTATE_SHIFT	56
+#define MAX_NR_CHIPS	32

 #define MAX_RAMP_DOWN_TIME	5120
 /*
@@ -1046,12 +1047,20 @@ static int init_chip_info(void)
 	unsigned int *chip;
 	unsigned int cpu, i;
 	unsigned int prev_chip_id = UINT_MAX;
+	cpumask_t *chip_cpu_mask;
 	int ret = 0;

 	chip = kcalloc(num_possible_cpus(), sizeof(*chip), GFP_KERNEL);
 	if (!chip)
 		return -ENOMEM;

+	/* Allocate a chip cpu mask large enough to fit mask for all chips */
+	chip_cpu_mask = kcalloc(MAX_NR_CHIPS, sizeof(cpumask_t), GFP_KERNEL);
+	if (!chip_cpu_mask) {
+		ret = -ENOMEM;
+		goto free_and_return;
+	}
+
 	for_each_possible_cpu(cpu) {
 		unsigned int id = cpu_to_chip_id(cpu);

@@ -1059,22 +1068,25 @@ static int init_chip_info(void)
 			prev_chip_id = id;
 			chip[nr_chips++] = id;
 		}
+		cpumask_set_cpu(cpu, &chip_cpu_mask[nr_chips-1]);
 	}

 	chips = kcalloc(nr_chips, sizeof(struct chip), GFP_KERNEL);
 	if (!chips) {
 		ret = -ENOMEM;
-		goto free_and_return;
+		goto out_chip_cpu_mask;
 	}

 	for (i = 0; i < nr_chips; i++) {
 		chips[i].id = chip[i];
-		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
+		cpumask_copy(&chips[i].mask, &chip_cpu_mask[i]);
 		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
 		for_each_cpu(cpu, &chips[i].mask)
 			per_cpu(chip_info, cpu) = &chips[i];
 	}

+out_chip_cpu_mask:
+	kfree(chip_cpu_mask);
 free_and_return:
 	kfree(chip);
 	return ret;
--
2.31.1
Re: [PATCH 00/11] Implement generic prot_guest_has() helper function
Am 28.07.21 um 00:26 schrieb Tom Lendacky: This patch series provides a generic helper function, prot_guest_has(), to replace the sme_active(), sev_active(), sev_es_active() and mem_encrypt_active() functions. It is expected that as new protected virtualization technologies are added to the kernel, they can all be covered by a single function call instead of a collection of specific function calls all called from the same locations. The powerpc and s390 patches have been compile tested only. Can the folks copied on this series verify that nothing breaks for them. As GPU driver dev I'm only one end user of this, but at least from the high level point of view that makes totally sense to me. Feel free to add an Acked-by: Christian König . We could run that through the AMD GPU unit tests, but I fear we actually don't test on a system with SEV/SME active. Going to raise that on our team call today. Regards, Christian. Cc: Andi Kleen Cc: Andy Lutomirski Cc: Ard Biesheuvel Cc: Baoquan He Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christian Borntraeger Cc: Daniel Vetter Cc: Dave Hansen Cc: Dave Young Cc: David Airlie Cc: Heiko Carstens Cc: Ingo Molnar Cc: Joerg Roedel Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Michael Ellerman Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Thomas Zimmermann Cc: Vasily Gorbik Cc: VMware Graphics Cc: Will Deacon --- Patches based on: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master commit 79e920060fa7 ("Merge branch 'WIP/fixes'") Tom Lendacky (11): mm: Introduce a function to check for virtualization protection features x86/sev: Add an x86 version of prot_guest_has() powerpc/pseries/svm: Add a powerpc version of prot_guest_has() x86/sme: Replace occurrences of sme_active() with prot_guest_has() x86/sev: Replace occurrences of sev_active() with prot_guest_has() x86/sev: Replace occurrences of sev_es_active() with prot_guest_has() treewide: Replace the use of mem_encrypt_active() with prot_guest_has() 
mm: Remove the now unused mem_encrypt_active() function x86/sev: Remove the now unused mem_encrypt_active() function powerpc/pseries/svm: Remove the now unused mem_encrypt_active() function s390/mm: Remove the now unused mem_encrypt_active() function arch/Kconfig | 3 ++ arch/powerpc/include/asm/mem_encrypt.h | 5 -- arch/powerpc/include/asm/protected_guest.h | 30 +++ arch/powerpc/platforms/pseries/Kconfig | 1 + arch/s390/include/asm/mem_encrypt.h| 2 - arch/x86/Kconfig | 1 + arch/x86/include/asm/kexec.h | 2 +- arch/x86/include/asm/mem_encrypt.h | 13 + arch/x86/include/asm/protected_guest.h | 27 ++ arch/x86/kernel/crash_dump_64.c| 4 +- arch/x86/kernel/head64.c | 4 +- arch/x86/kernel/kvm.c | 3 +- arch/x86/kernel/kvmclock.c | 4 +- arch/x86/kernel/machine_kexec_64.c | 19 +++ arch/x86/kernel/pci-swiotlb.c | 9 ++-- arch/x86/kernel/relocate_kernel_64.S | 2 +- arch/x86/kernel/sev.c | 6 +-- arch/x86/kvm/svm/svm.c | 3 +- arch/x86/mm/ioremap.c | 16 +++--- arch/x86/mm/mem_encrypt.c | 60 +++--- arch/x86/mm/mem_encrypt_identity.c | 3 +- arch/x86/mm/pat/set_memory.c | 3 +- arch/x86/platform/efi/efi_64.c | 9 ++-- arch/x86/realmode/init.c | 8 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 4 +- drivers/gpu/drm/drm_cache.c| 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c| 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_msg.c| 6 +-- drivers/iommu/amd/init.c | 7 +-- drivers/iommu/amd/iommu.c | 3 +- drivers/iommu/amd/iommu_v2.c | 3 +- drivers/iommu/iommu.c | 3 +- fs/proc/vmcore.c | 6 +-- include/linux/mem_encrypt.h| 4 -- include/linux/protected_guest.h| 37 + kernel/dma/swiotlb.c | 4 +- 36 files changed, 218 insertions(+), 104 deletions(-) create mode 100644 arch/powerpc/include/asm/protected_guest.h create mode 100644 arch/x86/include/asm/protected_guest.h create mode 100644 include/linux/protected_guest.h
Re: [PATCH] virtio-console: avoid DMA from vmalloc area
On 2021/7/28 5:01 PM, Arnd Bergmann wrote:
> On Wed, Jul 28, 2021 at 10:28 AM Xianting Tian wrote:
>> 在 2021/7/28 下午3:25, Arnd Bergmann 写道:
>> I checked several hvc backends, like drivers/tty/hvc/hvc_riscv_sbi.c,
>> drivers/tty/hvc/hvc_iucv.c, drivers/tty/hvc/hvc_rtas.c, they don't use dma.
>>
>> I not finished all hvc backends check yet. But I think even if all hvc
>> backends don't use dma currently, it is still possible that the hvc
>> backend using dma will be added in the furture.
>>
>> So I agree with you it should better be fixed in the hvc framework,
>> solve the issue in the first place.
>
> Ok, sounds good to me, no need to check more backends then.
>
> I see the hvc-console driver is listed as 'Odd Fixes' in the maintainer
> list, with nobody assigned other than the ppc kernel list (added to Cc).
>
> Once you come up with a fix in hvc_console.c, please send that to the
> tty maintainers, the ppc list and me, and I'll review it.

OK, thanks, I will submit the patch ASAP :)

>        Arnd
Re: [PATCH] virtio-console: avoid DMA from vmalloc area
On Wed, Jul 28, 2021 at 10:28 AM Xianting Tian wrote: > 在 2021/7/28 下午3:25, Arnd Bergmann 写道: > > I checked several hvc backends, like drivers/tty/hvc/hvc_riscv_sbi.c, > drivers/tty/hvc/hvc_iucv.c, drivers/tty/hvc/hvc_rtas.c, they don't use dma. > > I not finished all hvc backends check yet. But I think even if all hvc > backends don't use dma currently, it is still possible that the hvc > backend using dma will be added in the furture. > > So I agree with you it should better be fixed in the hvc framework, > solve the issue in the first place. Ok, sounds good to me, no need to check more backends then. I see the hvc-console driver is listed as 'Odd Fixes' in the maintainer list, with nobody assigned other than the ppc kernel list (added to Cc). Once you come up with a fix in hvc_console.c, please send that to the tty maintainers, the ppc list and me, and I'll review it. Arnd
Re: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
Dear Benjamin,

On 28.07.21 at 01:14, Benjamin Herrenschmidt wrote:
> On Tue, 2021-07-27 at 10:45 +0200, Paul Menzel wrote:
>> On ppc64le Go 1.16.2 from Ubuntu 21.04 terminates with a segmentation
>> fault [1], and it might be related to *[release-branch.go1.16] runtime:
>> fix crash during VDSO calls on PowerPC* [2], conjecturing that commit
>> ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
>> added in Linux 5.11 causes this.
>>
>> If this is indeed the case, this would be a regression in userspace. Is
>> there a generic fix or should the change be reverted?
>
> From the look at the links you posted, this appears to be completely
> broken assumptions by Go that some registers don't change while calling
> what essentially are external library functions *while inside those
> functions* (ie in this case from a signal handler).
>
> I suppose it would be possible to build the VDSO with gcc arguments to
> make it not use r30, but that's just gross...

Thank you for looking into this. No idea if it falls under Linux’ no regression policy or not.

Kind regards,

Paul
[PATCH] mm/pkeys: Remove unused parameter in arch_set_user_pkey_access
The arch_set_user_pkey_access function never uses its first parameter (struct task_struct *tsk). It is only able to set the pkey permissions for the current task as implemented, and existing kernel code only passes "current" to arch_set_user_pkey_access. So remove the ambiguous parameter to make the code clean. Signed-off-by: Jiashuo Liang --- arch/powerpc/include/asm/pkeys.h | 8 +++- arch/powerpc/mm/book3s64/pkeys.c | 3 +-- arch/x86/include/asm/pkeys.h | 12 arch/x86/kernel/fpu/xstate.c | 3 +-- arch/x86/mm/pkeys.c | 3 +-- include/linux/pkeys.h| 3 +-- mm/mprotect.c| 2 +- 7 files changed, 12 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h index 59a2c7dbc78f..e905b2ab31e2 100644 --- a/arch/powerpc/include/asm/pkeys.h +++ b/arch/powerpc/include/asm/pkeys.h @@ -143,10 +143,8 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma, return __arch_override_mprotect_pkey(vma, prot, pkey); } -extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val); -static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val) +extern int __arch_set_user_pkey_access(int pkey, unsigned long init_val); +static inline int arch_set_user_pkey_access(int pkey, unsigned long init_val) { if (!mmu_has_feature(MMU_FTR_PKEY)) return -EINVAL; @@ -160,7 +158,7 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, if (pkey == 0) return init_val ? 
-EINVAL : 0; - return __arch_set_user_pkey_access(tsk, pkey, init_val); + return __arch_set_user_pkey_access(pkey, init_val); } static inline bool arch_pkeys_enabled(void) diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c index a2d9ad138709..dc77c0a27291 100644 --- a/arch/powerpc/mm/book3s64/pkeys.c +++ b/arch/powerpc/mm/book3s64/pkeys.c @@ -333,8 +333,7 @@ static inline void init_iamr(int pkey, u8 init_bits) * Set the access rights in AMR IAMR and UAMOR registers for @pkey to that * specified in @init_val. */ -int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val) +int __arch_set_user_pkey_access(int pkey, unsigned long init_val) { u64 new_amr_bits = 0x0ul; u64 new_iamr_bits = 0x0ul; diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index 5c7bcaa79623..26d872bdee49 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -11,8 +11,7 @@ */ #define arch_max_pkey() (cpu_feature_enabled(X86_FEATURE_OSPKE) ? 
16 : 1) -extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val); +extern int arch_set_user_pkey_access(int pkey, unsigned long init_val); static inline bool arch_pkeys_enabled(void) { @@ -43,8 +42,7 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma, return __arch_override_mprotect_pkey(vma, prot, pkey); } -extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val); +extern int __arch_set_user_pkey_access(int pkey, unsigned long init_val); #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | VM_PKEY_BIT3) @@ -120,10 +118,8 @@ int mm_pkey_free(struct mm_struct *mm, int pkey) return 0; } -extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val); -extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val); +extern int arch_set_user_pkey_access(int pkey, unsigned long init_val); +extern int __arch_set_user_pkey_access(int pkey, unsigned long init_val); static inline int vma_pkey(struct vm_area_struct *vma) { diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index c8def1b7f8fb..565de4a49c0a 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -912,8 +912,7 @@ EXPORT_SYMBOL_GPL(get_xsave_addr); * This will go out and modify PKRU register to set the access * rights for @pkey to @init_val. */ -int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val) +int arch_set_user_pkey_access(int pkey, unsigned long init_val) { u32 old_pkru, new_pkru_bits = 0; int pkey_shift; diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index e44e938885b7..fafc10ea7cf1 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -42,8 +42,7 @@ int __execute_only_pkey(struct mm_struct *mm) * Set up PKRU so that it denies access for everything * other than execution.
[PATCH v2 6/7] sections: Add new is_kernel() and is_kernel_text()
The new is_kernel() check the kernel address ranges, and the new is_kernel_text() check the kernel text section ranges. Then use them to make some code clear. Cc: Arnd Bergmann Cc: Andrey Ryabinin Signed-off-by: Kefeng Wang --- include/asm-generic/sections.h | 27 +++ include/linux/kallsyms.h | 4 ++-- kernel/extable.c | 3 +-- mm/kasan/report.c | 2 +- 4 files changed, 31 insertions(+), 5 deletions(-) diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 4f2f32aa2b7a..6b143637ab88 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -170,6 +170,20 @@ static inline bool is_kernel_rodata(unsigned long addr) addr < (unsigned long)__end_rodata; } +/** + * is_kernel_text - checks if the pointer address is located in the + * .text section + * + * @addr: address to check + * + * Returns: true if the address is located in .text, false otherwise. + */ +static inline bool is_kernel_text(unsigned long addr) +{ + return addr >= (unsigned long)_stext && + addr < (unsigned long)_etext; +} + /** * is_kernel_inittext - checks if the pointer address is located in the *.init.text section @@ -184,4 +198,17 @@ static inline bool is_kernel_inittext(unsigned long addr) addr < (unsigned long)_einittext; } +/** + * is_kernel - checks if the pointer address is located in the kernel range + * + * @addr: address to check + * + * Returns: true if the address is located in kernel range, false otherwise. 
+ */ +static inline bool is_kernel(unsigned long addr) +{ + return addr >= (unsigned long)_stext && + addr < (unsigned long)_end; +} + #endif /* _ASM_GENERIC_SECTIONS_H_ */ diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 4f501ac9c2c2..897d5720884f 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -26,14 +26,14 @@ struct module; static inline int is_kernel_text_or_gate_area(unsigned long addr) { - if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext)) + if (is_kernel_text(addr)) return 1; return in_gate_area_no_mm(addr); } static inline int is_kernel_or_gate_area(unsigned long addr) { - if (addr >= (unsigned long)_stext && addr < (unsigned long)_end) + if (is_kernel(addr)) return 1; return in_gate_area_no_mm(addr); } diff --git a/kernel/extable.c b/kernel/extable.c index 98ca627ac5ef..0ba383d850ff 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -64,8 +64,7 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr) int notrace core_kernel_text(unsigned long addr) { - if (addr >= (unsigned long)_stext && - addr < (unsigned long)_etext) + if (is_kernel_text(addr)) return 1; if (system_state < SYSTEM_RUNNING && diff --git a/mm/kasan/report.c b/mm/kasan/report.c index 884a950c7026..88f5b0c058b7 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -235,7 +235,7 @@ static void describe_object(struct kmem_cache *cache, void *object, static inline bool kernel_or_module_addr(const void *addr) { - if (addr >= (void *)_stext && addr < (void *)_end) + if (is_kernel((unsigned long)addr)) return true; if (is_module_address((unsigned long)addr)) return true; -- 2.26.2
[PATCH v2 3/7] sections: Move and rename core_kernel_data() to is_kernel_core_data()
Move core_kernel_data() into sections.h and rename it to is_kernel_core_data(), also make it return bool value, then update all the callers. Cc: Arnd Bergmann Cc: Steven Rostedt Cc: Ingo Molnar Cc: "David S. Miller" Signed-off-by: Kefeng Wang --- include/asm-generic/sections.h | 14 ++ include/linux/kernel.h | 1 - kernel/extable.c | 18 -- kernel/trace/ftrace.c | 2 +- net/sysctl_net.c | 2 +- 5 files changed, 16 insertions(+), 21 deletions(-) diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 817309e289db..26ed9fc9b4e3 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -142,6 +142,20 @@ static inline bool init_section_intersects(void *virt, size_t size) return memory_intersects(__init_begin, __init_end, virt, size); } +/** + * is_kernel_core_data - checks if the pointer address is located in the + * .data section + * + * @addr: address to check + * + * Returns: true if the address is located in .data, false otherwise. + */ +static inline bool is_kernel_core_data(unsigned long addr) +{ + return addr >= (unsigned long)_sdata && + addr < (unsigned long)_edata; +} + /** * is_kernel_rodata - checks if the pointer address is located in the *.rodata section diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 1b2f0a7e00d6..0622418bafbc 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -230,7 +230,6 @@ extern char *next_arg(char *args, char **param, char **val); extern int core_kernel_text(unsigned long addr); extern int init_kernel_text(unsigned long addr); -extern int core_kernel_data(unsigned long addr); extern int __kernel_text_address(unsigned long addr); extern int kernel_text_address(unsigned long addr); extern int func_ptr_is_kernel_text(void *ptr); diff --git a/kernel/extable.c b/kernel/extable.c index b0ea5eb0c3b4..da26203841d4 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -82,24 +82,6 @@ int notrace core_kernel_text(unsigned long addr) return 0; } -/** - * 
core_kernel_data - tell if addr points to kernel data - * @addr: address to test - * - * Returns true if @addr passed in is from the core kernel data - * section. - * - * Note: On some archs it may return true for core RODATA, and false - * for others. But will always be true for core RW data. - */ -int core_kernel_data(unsigned long addr) -{ - if (addr >= (unsigned long)_sdata && - addr < (unsigned long)_edata) - return 1; - return 0; -} - int __kernel_text_address(unsigned long addr) { if (kernel_text_address(addr)) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index e6fb3e6e1ffc..d01ca1cb2d5f 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -323,7 +323,7 @@ int __register_ftrace_function(struct ftrace_ops *ops) if (!ftrace_enabled && (ops->flags & FTRACE_OPS_FL_PERMANENT)) return -EBUSY; - if (!core_kernel_data((unsigned long)ops)) + if (!is_kernel_core_data((unsigned long)ops)) ops->flags |= FTRACE_OPS_FL_DYNAMIC; add_ftrace_ops(&ftrace_ops_list, ops); diff --git a/net/sysctl_net.c b/net/sysctl_net.c index f6cb0d4d114c..4b45ed631eb8 100644 --- a/net/sysctl_net.c +++ b/net/sysctl_net.c @@ -144,7 +144,7 @@ static void ensure_safe_net_sysctl(struct net *net, const char *path, addr = (unsigned long)ent->data; if (is_module_address(addr)) where = "module"; - else if (core_kernel_data(addr)) + else if (is_kernel_core_data(addr)) where = "kernel"; else continue; -- 2.26.2
[PATCH v2 7/7] powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper
Use the is_kernel_text() and is_kernel_inittext() helpers to simplify the code, and drop the etext, _stext, _sinittext and _einittext declarations, which are already declared in sections.h.

Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kefeng Wang
---
 arch/powerpc/mm/pgtable_32.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index dcf5ecca19d9..13c798308c2e 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -33,8 +33,6 @@

 #include

-extern char etext[], _stext[], _sinittext[], _einittext[];
-
 static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data;

 notrace void __init early_ioremap_init(void)
@@ -104,14 +102,13 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
 {
 	unsigned long v, s;
 	phys_addr_t p;
-	int ktext;
+	bool ktext;

 	s = offset;
 	v = PAGE_OFFSET + s;
 	p = memstart_addr + s;
 	for (; s < top; s += PAGE_SIZE) {
-		ktext = ((char *)v >= _stext && (char *)v < etext) ||
-			((char *)v >= _sinittext && (char *)v < _einittext);
+		ktext = (is_kernel_text(v) || is_kernel_inittext(v));
 		map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
 		v += PAGE_SIZE;
 		p += PAGE_SIZE;
--
2.26.2
[PATCH v2 4/7] sections: Move is_kernel_inittext() into sections.h
The is_kernel_inittext() and init_kernel_text() are with same functionality, let's just keep is_kernel_inittext() and move it into sections.h, then update all the callers. Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Arnd Bergmann Cc: x...@kernel.org Signed-off-by: Kefeng Wang --- arch/x86/kernel/unwind_orc.c | 2 +- include/asm-generic/sections.h | 14 ++ include/linux/kallsyms.h | 8 include/linux/kernel.h | 1 - kernel/extable.c | 12 ++-- 5 files changed, 17 insertions(+), 20 deletions(-) diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index a1202536fc57..d92ec2ced059 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -175,7 +175,7 @@ static struct orc_entry *orc_find(unsigned long ip) } /* vmlinux .init slow lookup: */ - if (init_kernel_text(ip)) + if (is_kernel_inittext(ip)) return __orc_find(__start_orc_unwind_ip, __start_orc_unwind, __stop_orc_unwind_ip - __start_orc_unwind_ip, ip); diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 26ed9fc9b4e3..4f2f32aa2b7a 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -170,4 +170,18 @@ static inline bool is_kernel_rodata(unsigned long addr) addr < (unsigned long)__end_rodata; } +/** + * is_kernel_inittext - checks if the pointer address is located in the + *.init.text section + * + * @addr: address to check + * + * Returns: true if the address is located in .init.text, false otherwise. 
+ */ +static inline bool is_kernel_inittext(unsigned long addr) +{ + return addr >= (unsigned long)_sinittext && + addr < (unsigned long)_einittext; +} + #endif /* _ASM_GENERIC_SECTIONS_H_ */ diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index b016c62f30a6..8a9d329c927c 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -24,14 +24,6 @@ struct cred; struct module; -static inline int is_kernel_inittext(unsigned long addr) -{ - if (addr >= (unsigned long)_sinittext - && addr < (unsigned long)_einittext) - return 1; - return 0; -} - static inline int is_kernel_text(unsigned long addr) { if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext)) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 0622418bafbc..d4ba46cf4737 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -229,7 +229,6 @@ extern bool parse_option_str(const char *str, const char *option); extern char *next_arg(char *args, char **param, char **val); extern int core_kernel_text(unsigned long addr); -extern int init_kernel_text(unsigned long addr); extern int __kernel_text_address(unsigned long addr); extern int kernel_text_address(unsigned long addr); extern int func_ptr_is_kernel_text(void *ptr); diff --git a/kernel/extable.c b/kernel/extable.c index da26203841d4..98ca627ac5ef 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -62,14 +62,6 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr) return e; } -int init_kernel_text(unsigned long addr) -{ - if (addr >= (unsigned long)_sinittext && - addr < (unsigned long)_einittext) - return 1; - return 0; -} - int notrace core_kernel_text(unsigned long addr) { if (addr >= (unsigned long)_stext && @@ -77,7 +69,7 @@ int notrace core_kernel_text(unsigned long addr) return 1; if (system_state < SYSTEM_RUNNING && - init_kernel_text(addr)) + is_kernel_inittext(addr)) return 1; return 0; } @@ -94,7 +86,7 @@ int __kernel_text_address(unsigned long addr) 
* Since we are after the module-symbols check, there's * no danger of address overlap: */ - if (init_kernel_text(addr)) + if (is_kernel_inittext(addr)) return 1; return 0; } -- 2.26.2
[PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()
The is_kernel[_text]() function check the address whether or not in kernel[_text] ranges, also they will check the address whether or not in gate area, so use better name. Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Sami Tolvanen Cc: Nathan Chancellor Cc: Arnd Bergmann Cc: b...@vger.kernel.org Signed-off-by: Kefeng Wang --- arch/x86/net/bpf_jit_comp.c | 2 +- include/linux/kallsyms.h| 8 kernel/cfi.c| 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 333650b9372a..c87d0dd4370d 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -372,7 +372,7 @@ static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, void *old_addr, void *new_addr) { - if (!is_kernel_text((long)ip) && + if (!is_kernel_text_or_gate_area((long)ip) && !is_bpf_text_address((long)ip)) /* BPF poking in modules is not supported */ return -EINVAL; diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 8a9d329c927c..4f501ac9c2c2 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -24,14 +24,14 @@ struct cred; struct module; -static inline int is_kernel_text(unsigned long addr) +static inline int is_kernel_text_or_gate_area(unsigned long addr) { if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext)) return 1; return in_gate_area_no_mm(addr); } -static inline int is_kernel(unsigned long addr) +static inline int is_kernel_or_gate_area(unsigned long addr) { if (addr >= (unsigned long)_stext && addr < (unsigned long)_end) return 1; @@ -41,9 +41,9 @@ static inline int is_kernel(unsigned long addr) static inline int is_ksym_addr(unsigned long addr) { if (IS_ENABLED(CONFIG_KALLSYMS_ALL)) - return is_kernel(addr); + return is_kernel_or_gate_area(addr); - return is_kernel_text(addr) || is_kernel_inittext(addr); + return is_kernel_text_or_gate_area(addr) || is_kernel_inittext(addr); } static 
inline void *dereference_symbol_descriptor(void *ptr) diff --git a/kernel/cfi.c b/kernel/cfi.c index e17a56639766..e7d90eff4382 100644 --- a/kernel/cfi.c +++ b/kernel/cfi.c @@ -282,7 +282,7 @@ static inline cfi_check_fn find_check_fn(unsigned long ptr) { cfi_check_fn fn = NULL; - if (is_kernel_text(ptr)) + if (is_kernel_text_or_gate_area(ptr)) return __cfi_check; /* -- 2.26.2
[PATCH v2 2/7] kallsyms: Fix address-checks for kernel related range
The is_kernel_inittext()/is_kernel_text()/is_kernel() functions should not include the end addresses (the labels _einittext, _etext and _end) when checking the address range; the issue has existed since Linux v2.6.12.

Cc: Arnd Bergmann
Cc: Sergey Senozhatsky
Cc: Petr Mladek
Acked-by: Sergey Senozhatsky
Reviewed-by: Petr Mladek
Signed-off-by: Kefeng Wang
---
 include/linux/kallsyms.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 2a241e3f063f..b016c62f30a6 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -27,21 +27,21 @@ struct module;
 static inline int is_kernel_inittext(unsigned long addr)
 {
 	if (addr >= (unsigned long)_sinittext
-	    && addr <= (unsigned long)_einittext)
+	    && addr < (unsigned long)_einittext)
 		return 1;
 	return 0;
 }
 
 static inline int is_kernel_text(unsigned long addr)
 {
-	if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
+	if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
 		return 1;
 	return in_gate_area_no_mm(addr);
 }
 
 static inline int is_kernel(unsigned long addr)
 {
-	if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
+	if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
 		return 1;
 	return in_gate_area_no_mm(addr);
 }
-- 
2.26.2
[PATCH v2 1/7] kallsyms: Remove arch specific text and data check
After commit 4ba66a976072 ("arch: remove blackfin port"), the arch-specific text/data checks are no longer needed.

Cc: Arnd Bergmann
Signed-off-by: Kefeng Wang
---
 include/asm-generic/sections.h | 16 ----------------
 include/linux/kallsyms.h       |  3 +--
 kernel/locking/lockdep.c       |  3 ---
 3 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index d16302d3eb59..817309e289db 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -64,22 +64,6 @@ extern __visible const void __nosave_begin, __nosave_end;
 #define dereference_kernel_function_descriptor(p) ((void *)(p))
 #endif
 
-/* random extra sections (if any).  Override
- * in asm/sections.h */
-#ifndef arch_is_kernel_text
-static inline int arch_is_kernel_text(unsigned long addr)
-{
-	return 0;
-}
-#endif
-
-#ifndef arch_is_kernel_data
-static inline int arch_is_kernel_data(unsigned long addr)
-{
-	return 0;
-}
-#endif
-
 /*
  * Check if an address is part of freed initmem. This is needed on architectures
  * with virt == phys kernel mapping, for code that wants to check if an address
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 6851c2313cad..2a241e3f063f 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -34,8 +34,7 @@ static inline int is_kernel_inittext(unsigned long addr)
 
 static inline int is_kernel_text(unsigned long addr)
 {
-	if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) ||
-	    arch_is_kernel_text(addr))
+	if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
 		return 1;
 	return in_gate_area_no_mm(addr);
 }
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index bf1c00c881e4..64b17e995108 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -803,9 +803,6 @@ static int static_obj(const void *obj)
 	if ((addr >= start) && (addr < end))
 		return 1;
 
-	if (arch_is_kernel_data(addr))
-		return 1;
-
 	/*
 	 * in-kernel percpu var?
 	 */
-- 
2.26.2
[PATCH v2 0/7] sections: Unify kernel sections range check and use
There are three header files (kallsyms.h, kernel.h and sections.h) which contain kernel sections range checks; let's do some cleanup and unify them.

1. clean up the arch-specific text/data checks and fix the address boundary checks in kallsyms.h
2. move all the basic/core kernel range check functions into sections.h
3. update all the callers, and use the helpers in sections.h to simplify the code

After this series, we have 5 APIs for kernel sections range checking in sections.h:

* is_kernel_core_data()	--- comes from core_kernel_data() in kernel.h
* is_kernel_rodata()	--- already in sections.h
* is_kernel_text()	--- comes from kallsyms.h
* is_kernel_inittext()	--- comes from kernel.h and kallsyms.h
* is_kernel()		--- comes from kallsyms.h

Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: io...@lists.linux-foundation.org
Cc: b...@vger.kernel.org

v2:
- add Acked-by/Reviewed-by tags to patch 2, and drop the inappropriate Fixes tag
- keep 'core' when checking kernel data, as suggested by Steven Rostedt; rename is_kernel_data() to is_kernel_core_data()
- drop patch 8, which is merged
- drop patch 9, which is resent independently

v1: https://lore.kernel.org/linux-arch/20210626073439.150586-1-wangkefeng.w...@huawei.com

Kefeng Wang (7):
  kallsyms: Remove arch specific text and data check
  kallsyms: Fix address-checks for kernel related range
  sections: Move and rename core_kernel_data() to is_kernel_core_data()
  sections: Move is_kernel_inittext() into sections.h
  kallsyms: Rename is_kernel() and is_kernel_text()
  sections: Add new is_kernel() and is_kernel_text()
  powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper

 arch/powerpc/mm/pgtable_32.c   |  7 +---
 arch/x86/kernel/unwind_orc.c   |  2 +-
 arch/x86/net/bpf_jit_comp.c    |  2 +-
 include/asm-generic/sections.h | 71 ++
 include/linux/kallsyms.h       | 21 +++---
 include/linux/kernel.h         |  2 -
 kernel/cfi.c                   |  2 +-
 kernel/extable.c               | 33 ++--
 kernel/locking/lockdep.c       |  3 --
 kernel/trace/ftrace.c          |  2 +-
 mm/kasan/report.c              |  2 +-
 net/sysctl_net.c               |  2 +-
 12 files changed, 72 insertions(+), 77 deletions(-)

-- 
2.26.2
[powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
linux-next fails to boot on a Power server (POWER8/POWER9). The following traces are seen during boot:

[0.010799] software IO TLB: tearing down default memory pool
[0.010805] [ cut here ]
[0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
[0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
[0.010816] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[0.010820] Modules linked in:
[0.010824] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc3-next-20210727 #1
[0.010830] NIP: c0032cfc LR: c000c764 CTR: c000c670
[0.010834] REGS: c3603b10 TRAP: 0700 Not tainted (5.14.0-rc3-next-20210727)
[0.010838] MSR: 80029033 CR: 28000222 XER: 0002
[0.010848] CFAR: c000c760 IRQMASK: 3
[0.010848] GPR00: c000c764 c3603db0 c29bd000 0001
[0.010848] GPR04: 0a68 0400 c3603868
[0.010848] GPR08: 0003
[0.010848] GPR12: c0001ec9ee80 c0012a28
[0.010848] GPR16:
[0.010848] GPR20:
[0.010848] GPR24: f134 c3603868
[0.010848] GPR28: 0400 0a68 c202e9c0 c3603e80
[0.010896] NIP [c0032cfc] system_call_exception+0x8c/0x2e0
[0.010901] LR [c000c764] system_call_common+0xf4/0x258
[0.010907] Call Trace:
[0.010909] [c3603db0] [c016a6dc] calculate_sigpending+0x4c/0xe0 (unreliable)
[0.010915] [c3603e10] [c000c764] system_call_common+0xf4/0x258
[0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8
[0.010926] NIP: c0092dec LR: c0114fc8 CTR:
[0.010930] REGS: c3603e80 TRAP: 0c00 Not tainted (5.14.0-rc3-next-20210727)
[0.010934] MSR: 80009033 CR: 28000222 XER:
[0.010943] IRQMASK: 0
[0.010943] GPR00: c202e9c0 c3603b00 c29bd000 f134
[0.010943] GPR04: 0a68 0400 c3603868
[0.010943] GPR08:
[0.010943] GPR12: c0001ec9ee80 c0012a28
[0.010943] GPR16:
[0.010943] GPR20:
[0.010943] GPR24: c20033c4 c110afc0 c2081950 c3277d40
[0.010943] GPR28: ca68 0400 000d
[0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8
[0.010993] LR [c0114fc8] set_memory_encrypted+0x38/0x60
[0.010999] --- interrupt: c00
[0.011001] [c3603b00] [c000c764] system_call_common+0xf4/0x258 (unreliable)
[0.011008] Instruction dump:
[0.011011] 694a0003 312a 7d495110 0b0a 6000 6000 e87f0108 68690002
[0.011019] 7929ffe2 0b09 68634000 786397e2 <0b03> e93f0138 792907e0 0b09
[0.011029] ---[ end trace a20ad55589efcb10 ]---
[0.012297]
[1.012304] Kernel panic - not syncing: Fatal exception

next-20210723 was good. The boot failure seems to have been introduced with next-20210726. I have attached the boot log.

Thanks
-Sachin

[0.00] hash-mmu: Page sizes from device-tree:
[0.00] hash-mmu: base_shift=12: shift=12, sllp=0x, avpnm=0x, tlbiel=1, penc=0
[0.00] hash-mmu: base_shift=12: shift=16, sllp=0x, avpnm=0x, tlbiel=1, penc=7
[0.00] hash-mmu: base_shift=12: shift=24, sllp=0x, avpnm=0x, tlbiel=1, penc=56
[0.00] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x, tlbiel=1, penc=1
[0.00] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x, tlbiel=1, penc=8
[0.00] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x0001, tlbiel=0, penc=0
[0.00] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x07ff, tlbiel=0, penc=3
[0.00] Enabling pkeys with max key count 31
[0.00] Activating Kernel Userspace Execution Prevention
[0.00] Activating Kernel Userspace Access Prevention
[0.00] Using 1TB segments
[0.00] hash-mmu: Initializing hash mmu with SLB
[0.00] Linux version 5.14.0-rc3-next-20210727 (r...@ltczz304-lp7.aus.stglabs.ibm.com) (gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), GNU ld version 2.30-93.el8) #1 SMP Wed Jul 28 01:12:04 EDT 2021
[0.00] Found initrd at 0xc558:0xca67e
[PATCH V2] powerpc/fadump: register for fadump as early as possible
Crash recovery (fadump) is set up in userspace by some service. This service rebuilds the initrd with dump capture capability, if it is not already dump capture capable, before proceeding to register for firmware-assisted dump (echo 1 > /sys/kernel/fadump/registered). But arming the kernel with crash recovery support does not have to wait for userspace configuration. So, register for fadump during setup itself. This can at worst lead to a scenario where /proc/vmcore is ready after a crash but the initrd does not know how/where to offload it, which is still better than not having a /proc/vmcore at all due to incomplete configuration in userspace at the time of crash.

Commit 0823c68b054b ("powerpc/fadump: re-register firmware-assisted dump if already registered") ensures this change does not break userspace.

Signed-off-by: Hari Bathini
---
Changes in V2:
* Updated the changelog with a bit more explanation about the userspace issue with/without this change.
* Added a comment in the code for why the setup_fadump() initcall is changed from subsys_initcall() to subsys_initcall_sync().

 arch/powerpc/kernel/fadump.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b990075285f5..2911aefdf594 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1637,13 +1637,20 @@ int __init setup_fadump(void)
 		if (fw_dump.ops->fadump_process(&fw_dump) < 0)
 			fadump_invalidate_release_mem();
 	}
-	/* Initialize the kernel dump memory structure for FAD registration. */
-	else if (fw_dump.reserve_dump_area_size)
+	/* Initialize the kernel dump memory structure and register with f/w */
+	else if (fw_dump.reserve_dump_area_size) {
 		fw_dump.ops->fadump_init_mem_struct(&fw_dump);
+		register_fadump();
+	}
 
 	return 1;
 }
-subsys_initcall(setup_fadump);
+/*
+ * Replace subsys_initcall() with subsys_initcall_sync() as there is dependency
+ * with crash_save_vmcoreinfo_init() to ensure vmcoreinfo initialization is done
+ * before registering with f/w.
+ */
+subsys_initcall_sync(setup_fadump);
 #else /* !CONFIG_PRESERVE_FA_DUMP */
 
 /* Scan the Firmware Assisted dump configuration details. */