Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
> On 29-Jul-2021, at 9:43 PM, Will Deacon wrote:
>
> On Wed, Jul 28, 2021 at 10:35:34AM -0700, Nathan Chancellor wrote:
>> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
>>> next-20210723 was good. The boot failure seems to have been introduced with
>>> next-20210726.
>>>
>>> I have attached the boot log.
>>
>> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
>> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
>> series just keeps on giving...
>
> Yes, but look how handy our new print is!
>
> [0.010799] software IO TLB: tearing down default memory pool
> [0.010805] [ cut here ]
> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>
> Following Nick's suggestion, the diff below should help? I don't have a
> relevant box on which I can test it though.
>

Thanks for the fix. This fixes the reported problem for me.
Tested successfully on both PowerVM LPAR as well as bare metal environment.

Reported-by: Sachin Sant
Tested-by: Sachin Sant

> Will
>
> --->8
>
> diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c
> index 1d829e257996..87f001b4c4e4 100644
> --- a/arch/powerpc/platforms/pseries/svm.c
> +++ b/arch/powerpc/platforms/pseries/svm.c
> @@ -63,6 +63,9 @@ void __init svm_swiotlb_init(void)
>
>  int set_memory_encrypted(unsigned long addr, int numpages)
>  {
> +	if (!mem_encrypt_active())
> +		return 0;
> +
>  	if (!PAGE_ALIGNED(addr))
>  		return -EINVAL;
>
> @@ -73,6 +76,9 @@ int set_memory_encrypted(unsigned long addr, int numpages)
>
>  int set_memory_decrypted(unsigned long addr, int numpages)
>  {
> +	if (!mem_encrypt_active())
> +		return 0;
> +
>  	if (!PAGE_ALIGNED(addr))
>  		return -EINVAL;
>
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
On 7/29/2021 9:35 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Jul 29, 2021 at 05:13:36PM +0100, Will Deacon wrote:
>> On Wed, Jul 28, 2021 at 10:35:34AM -0700, Nathan Chancellor wrote:
>>> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
>>>> next-20210723 was good. The boot failure seems to have been introduced with
>>>> next-20210726.
>>>>
>>>> I have attached the boot log.
>>>
>>> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
>>> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
>>> series just keeps on giving...
>
> Low-level across platform do that. And thank you for testing it and
> finding this bug. Please let me know if the patch works so I can add it
> in the patch series.

That was not meant to sound as sarcastic as it did so my apologies for
that :(

Will's patch looks good to me in QEMU, I do not have a bare metal POWER
system to test it on.

Tested-by: Nathan Chancellor

>> Yes, but look how handy our new print is! :)
>>
>> [0.010799] software IO TLB: tearing down default memory pool
>> [0.010805] [ cut here ]
>> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>>
>> Following Nick's suggestion, the diff below should help? I don't have a
>> relevant box on which I can test it though.
>>
>> Will
>>
>> --->8
>>
>> diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c
>> index 1d829e257996..87f001b4c4e4 100644
>> --- a/arch/powerpc/platforms/pseries/svm.c
>> +++ b/arch/powerpc/platforms/pseries/svm.c
>> @@ -63,6 +63,9 @@ void __init svm_swiotlb_init(void)
>>
>>  int set_memory_encrypted(unsigned long addr, int numpages)
>>  {
>> +	if (!mem_encrypt_active())
>> +		return 0;
>> +
>>  	if (!PAGE_ALIGNED(addr))
>>  		return -EINVAL;
>>
>> @@ -73,6 +76,9 @@ int set_memory_encrypted(unsigned long addr, int numpages)
>>
>>  int set_memory_decrypted(unsigned long addr, int numpages)
>>  {
>> +	if (!mem_encrypt_active())
>> +		return 0;
>> +
>>  	if (!PAGE_ALIGNED(addr))
>>  		return -EINVAL;
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
On Thu, Jul 29, 2021 at 05:13:36PM +0100, Will Deacon wrote:
> On Wed, Jul 28, 2021 at 10:35:34AM -0700, Nathan Chancellor wrote:
> > On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
> > > next-20210723 was good. The boot failure seems to have been introduced
> > > with next-20210726.
> > >
> > > I have attached the boot log.
> >
> > I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> > commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
> > series just keeps on giving...

Low-level across platform do that. And thank you for testing it and
finding this bug. Please let me know if the patch works so I can add it
in the patch series.

>
> Yes, but look how handy our new print is! :)
>
> [0.010799] software IO TLB: tearing down default memory pool
> [0.010805] [ cut here ]
> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>
> Following Nick's suggestion, the diff below should help? I don't have a
> relevant box on which I can test it though.
>
> Will
>
> --->8
>
> diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c
> index 1d829e257996..87f001b4c4e4 100644
> --- a/arch/powerpc/platforms/pseries/svm.c
> +++ b/arch/powerpc/platforms/pseries/svm.c
> @@ -63,6 +63,9 @@ void __init svm_swiotlb_init(void)
>
>  int set_memory_encrypted(unsigned long addr, int numpages)
>  {
> +	if (!mem_encrypt_active())
> +		return 0;
> +
>  	if (!PAGE_ALIGNED(addr))
>  		return -EINVAL;
>
> @@ -73,6 +76,9 @@ int set_memory_encrypted(unsigned long addr, int numpages)
>
>  int set_memory_decrypted(unsigned long addr, int numpages)
>  {
> +	if (!mem_encrypt_active())
> +		return 0;
> +
>  	if (!PAGE_ALIGNED(addr))
>  		return -EINVAL;
>
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
On Wed, Jul 28, 2021 at 10:35:34AM -0700, Nathan Chancellor wrote:
> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
> > next-20210723 was good. The boot failure seems to have been introduced with
> > next-20210726.
> >
> > I have attached the boot log.
>
> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
> series just keeps on giving...

Yes, but look how handy our new print is!

[0.010799] software IO TLB: tearing down default memory pool
[0.010805] [ cut here ]
[0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!

Following Nick's suggestion, the diff below should help? I don't have a
relevant box on which I can test it though.

Will

--->8

diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c
index 1d829e257996..87f001b4c4e4 100644
--- a/arch/powerpc/platforms/pseries/svm.c
+++ b/arch/powerpc/platforms/pseries/svm.c
@@ -63,6 +63,9 @@ void __init svm_swiotlb_init(void)
 
 int set_memory_encrypted(unsigned long addr, int numpages)
 {
+	if (!mem_encrypt_active())
+		return 0;
+
 	if (!PAGE_ALIGNED(addr))
 		return -EINVAL;
 
@@ -73,6 +76,9 @@ int set_memory_encrypted(unsigned long addr, int numpages)
 
 int set_memory_decrypted(unsigned long addr, int numpages)
 {
+	if (!mem_encrypt_active())
+		return 0;
+
 	if (!PAGE_ALIGNED(addr))
 		return -EINVAL;
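For illustration only, a minimal userspace sketch of the patched pair. Apart from the names set_memory_encrypted(), set_memory_decrypted(), mem_encrypt_active() and PAGE_ALIGNED(), everything here is a hypothetical stand-in: the svm_active flag, the ultracalls counter and the share_stub()/unshare_stub() helpers replace the real ultracalls, and the 64K page size is assumed from the "PAGE_SIZE=64K" line in the oops. It highlights one consequence of where the new check sits: it runs before PAGE_ALIGNED(), so outside a secure VM even an unaligned address now returns 0 instead of -EINVAL.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption: 64K pages, as in the "PAGE_SIZE=64K" line of the oops. */
#define PAGE_SIZE 65536UL
/* Userspace stand-in for the kernel's PAGE_ALIGNED(). */
#define PAGE_ALIGNED(addr) (((addr) & (PAGE_SIZE - 1)) == 0)

/* Hypothetical flag: true only when running as a secure VM. */
static bool svm_active;

/* Stand-in for mem_encrypt_active(): just reports the flag. */
static bool mem_encrypt_active(void)
{
	return svm_active;
}

/* Hypothetical ultracall stubs: they only count how often they are called. */
static int ultracalls;

static void share_stub(unsigned long pfn, int npages)
{
	(void)pfn;
	(void)npages;
	ultracalls++;
}

static void unshare_stub(unsigned long pfn, int npages)
{
	(void)pfn;
	(void)npages;
	ultracalls++;
}

int set_memory_encrypted(unsigned long addr, int numpages)
{
	/* The check added by the patch: nothing to do outside a secure VM. */
	if (!mem_encrypt_active())
		return 0;

	if (!PAGE_ALIGNED(addr))
		return -EINVAL;

	unshare_stub(addr / PAGE_SIZE, numpages);
	return 0;
}

int set_memory_decrypted(unsigned long addr, int numpages)
{
	/* Same early return for the decrypt direction. */
	if (!mem_encrypt_active())
		return 0;

	if (!PAGE_ALIGNED(addr))
		return -EINVAL;

	share_stub(addr / PAGE_SIZE, numpages);
	return 0;
}
```

With svm_active false no stub is ever reached, whatever the address; with it true the alignment check and the (stubbed) call behave as before the patch.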
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
> On 28-Jul-2021, at 11:05 PM, Nathan Chancellor wrote:
>
> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
>> linux-next fails to boot on Power server (POWER8/POWER9). Following traces
>> are seen during boot
>>
>> [0.010799] software IO TLB: tearing down default memory pool
>> [0.010805] [ cut here ]
>> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>> [0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
>> …….
>
> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That

Indeed. Thanks Nathan.
Bisect points to this commit. Reverting the commit allows the kernel to boot.

Thanks
-Sachin

> series just keeps on giving... Adding some people from that thread to
> this one. Original thread:
> https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/
>
> [1]: https://github.com/openSUSE/kernel-source/raw/master/config/ppc64le/default
>
> Cheers,
> Nathan
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
Excerpts from Nathan Chancellor's message of July 29, 2021 3:35 am:
> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
>> linux-next fails to boot on Power server (POWER8/POWER9). Following traces
>> are seen during boot
>>
>> [0.010799] software IO TLB: tearing down default memory pool
>> [0.010805] [ cut here ]
>> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>> [0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
>> [...]
>> [0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8
>> [...]
>> [0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8
>> [0.010993] LR [c0114fc8] set_memory_encrypted+0x38/0x60
>> [...]
>> [1.012304] Kernel panic - not syncing: Fatal exception
>>
>> next-20210723 was good. The boot failure seems to have been introduced with
>> next-20210726.
>>
>> I have attached the boot log.
>
> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
> series just keeps on giving... Adding some people from that thread to
> this one. Original thread:
> https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/

This is because powerpc's set_memory_encrypted makes an ultracall but it
does not exist on that processor.

x86's set_memory_encrypted/decrypted have

	/* Nothing to do if memory encryption is not active */
	if (!mem_encrypt_active())
		return 0;

Probably powerpc should just do that too.

Thanks,
Nick
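To make the diagnosis above concrete, here is a hypothetical userspace model of the failing path; it is not kernel code. teardown_pool() stands in for the swiotlb teardown that, per the boot log, runs just before the crash in set_memory_encrypted(), and ultracall_stub() only counts calls, separately tallying those that would have been issued with no ultravisor present. The guarded variant follows the x86 pattern quoted above; the flag, counters, address and size are all illustrative assumptions.

```c
#include <stdbool.h>

/* Hypothetical: true only when running as a secure VM under an ultravisor. */
static bool svm_active;

/* Stand-in for mem_encrypt_active(): reports the flag. */
static bool mem_encrypt_active(void)
{
	return svm_active;
}

/* Stand-in for an ultracall. total_ultracalls counts every call;
 * unsupported_ultracalls counts calls made with no ultravisor present,
 * which is the case that ends in the BUG on a plain POWER8/POWER9 system. */
static int total_ultracalls;
static int unsupported_ultracalls;

static void ultracall_stub(void)
{
	total_ultracalls++;
	if (!svm_active)
		unsupported_ultracalls++;
}

/* Behaviour before the fix: always issues the ultracall. */
static int set_memory_encrypted_unguarded(unsigned long addr, int numpages)
{
	(void)addr;
	(void)numpages;
	ultracall_stub();
	return 0;
}

/* x86-style guard: nothing to do if memory encryption is not active. */
static int set_memory_encrypted_guarded(unsigned long addr, int numpages)
{
	(void)addr;
	(void)numpages;
	if (!mem_encrypt_active())
		return 0;
	ultracall_stub();
	return 0;
}

/* Hypothetical model of a pool teardown that converts the pool back to
 * "encrypted" before freeing it. Address and page count are arbitrary. */
static int teardown_pool(int (*set_enc)(unsigned long addr, int numpages))
{
	return set_enc(0x10000UL, 16);
}
```

On a non-secure system the unguarded teardown records one unsupported ultracall, while the guarded one returns early; inside a secure VM the guarded path still makes the call.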
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
> linux-next fails to boot on Power server (POWER8/POWER9). Following traces
> are seen during boot
>
> [0.010799] software IO TLB: tearing down default memory pool
> [0.010805] [ cut here ]
> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
> [0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
> [...]
> [0.010993] LR [c0114fc8] set_memory_encrypted+0x38/0x60
> [...]
> [1.012304] Kernel panic - not syncing: Fatal exception
>
> next-20210723 was good. The boot failure seems to have been introduced with
> next-20210726.
>
> I have attached the boot log.

I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
series just keeps on giving... Adding some people from that thread to
this one. Original thread:
https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/

[1]: https://github.com/openSUSE/kernel-source/raw/master/config/ppc64le/default

Cheers,
Nathan
[powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
linux-next fails to boot on Power server (POWER8/POWER9). Following traces
are seen during boot

[0.010799] software IO TLB: tearing down default memory pool
[0.010805] [ cut here ]
[0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
[0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
[0.010816] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[0.010820] Modules linked in:
[0.010824] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc3-next-20210727 #1
[0.010830] NIP: c0032cfc LR: c000c764 CTR: c000c670
[0.010834] REGS: c3603b10 TRAP: 0700 Not tainted (5.14.0-rc3-next-20210727)
[0.010838] MSR: 80029033 CR: 28000222 XER: 0002
[0.010848] CFAR: c000c760 IRQMASK: 3
[0.010848] GPR00: c000c764 c3603db0 c29bd000 0001
[0.010848] GPR04: 0a68 0400 c3603868
[0.010848] GPR08: 0003
[0.010848] GPR12: c0001ec9ee80 c0012a28
[0.010848] GPR16:
[0.010848] GPR20:
[0.010848] GPR24: f134 c3603868
[0.010848] GPR28: 0400 0a68 c202e9c0 c3603e80
[0.010896] NIP [c0032cfc] system_call_exception+0x8c/0x2e0
[0.010901] LR [c000c764] system_call_common+0xf4/0x258
[0.010907] Call Trace:
[0.010909] [c3603db0] [c016a6dc] calculate_sigpending+0x4c/0xe0 (unreliable)
[0.010915] [c3603e10] [c000c764] system_call_common+0xf4/0x258
[0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8
[0.010926] NIP: c0092dec LR: c0114fc8 CTR:
[0.010930] REGS: c3603e80 TRAP: 0c00 Not tainted (5.14.0-rc3-next-20210727)
[0.010934] MSR: 80009033 CR: 28000222 XER:
[0.010943] IRQMASK: 0
[0.010943] GPR00: c202e9c0 c3603b00 c29bd000 f134
[0.010943] GPR04: 0a68 0400 c3603868
[0.010943] GPR08:
[0.010943] GPR12: c0001ec9ee80 c0012a28
[0.010943] GPR16:
[0.010943] GPR20:
[0.010943] GPR24: c20033c4 c110afc0 c2081950 c3277d40
[0.010943] GPR28: ca68 0400 000d
[0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8
[0.010993] LR [c0114fc8] set_memory_encrypted+0x38/0x60
[0.010999] --- interrupt: c00
[0.011001] [c3603b00] [c000c764] system_call_common+0xf4/0x258 (unreliable)
[0.011008] Instruction dump:
[0.011011] 694a0003 312a 7d495110 0b0a 6000 6000 e87f0108 68690002
[0.011019] 7929ffe2 0b09 68634000 786397e2 <0b03> e93f0138 792907e0 0b09
[0.011029] ---[ end trace a20ad55589efcb10 ]---
[0.012297]
[1.012304] Kernel panic - not syncing: Fatal exception

next-20210723 was good. The boot failure seems to have been introduced with
next-20210726.

I have attached the boot log.

Thanks
-Sachin

[0.00] hash-mmu: Page sizes from device-tree:
[0.00] hash-mmu: base_shift=12: shift=12, sllp=0x, avpnm=0x, tlbiel=1, penc=0
[0.00] hash-mmu: base_shift=12: shift=16, sllp=0x, avpnm=0x, tlbiel=1, penc=7
[0.00] hash-mmu: base_shift=12: shift=24, sllp=0x, avpnm=0x, tlbiel=1, penc=56
[0.00] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x, tlbiel=1, penc=1
[0.00] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x, tlbiel=1, penc=8
[0.00] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x0001, tlbiel=0, penc=0
[0.00] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x07ff, tlbiel=0, penc=3
[0.00] Enabling pkeys with max key count 31
[0.00] Activating Kernel Userspace Execution Prevention
[0.00] Activating Kernel Userspace Access Prevention
[0.00] Using 1TB segments
[0.00] hash-mmu: Initializing hash mmu with SLB
[0.00] Linux version 5.14.0-rc3-next-20210727 (r...@ltczz304-lp7.aus.stglabs.ibm.com) (gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), GNU ld version 2.30-93.el8) #1 SMP Wed Jul 28 01:12:04 EDT 2021
[0.00] Found initrd at