Re: [sparc64] kernel panic from running a program in userspace

2021-06-19 Thread Colin Ian King
Hi,

I suspect this issue was fixed with the following commit:

commit e5e8b80d352ec999d2bba3ea584f541c83f4ca3f
Author: Rob Gardner 
Date:   Sun Feb 28 22:48:16 2021 -0700

sparc64: Fix opcode filtering in handling of no fault loads

Colin

On 19/06/2021 09:24, Anatoly Pugachev wrote:
> Hello!
> 
> Getting the following in logs:
> (reproducible with almost every run, tried different kernel as well -
> debian packaged 5.10.0-7-sparc64-smp )
> 
> [  863.344843] stress-ng[593992]: bad register window fault: SP
> fcd023ff (orig_sp fcd01c00) TPC fff80001000237fc O7
> fff800010003e008
> [  890.782498] CPU[4]: SUN4V mondo timeout, cpu(5) made no forward
> progress after 51 retries. Total target cpus(7).
> [  890.782539] CPU[3]: SUN4V mondo timeout, cpu(5) made no forward
> progress after 51 retries. Total target cpus(7).
> [  890.782590] Kernel panic - not syncing: SUN4V mondo timeout panic
> [  890.782664] CPU: 4 PID: 480951 Comm: stress-ng Tainted: G
>  E 5.13.0-rc6 #229
> [  890.782713] Call Trace:
> [  890.782733] [<00c806c8>] panic+0xf4/0x2d4
> [  890.782773] [<0043f3a8>] hypervisor_xcall_deliver+0x288/0x320
> [  890.782816] [<0043efb8>] xcall_deliver+0xf8/0x120
> [  890.782860] [<00440518>] smp_flush_tlb_page+0x38/0x60
> [  890.782898] [<0044ee44>] flush_tlb_pending+0x64/0xa0
> [  890.782938] [<0044f1c4>] arch_leave_lazy_mmu_mode+0x24/0x40
> [  890.782977] [<00651b4c>] copy_pte_range+0x5ac/0x860
> [  890.783013] [<00655974>] copy_pud_range+0x1f4/0x260
> [  890.783049] [<00655b2c>] copy_page_range+0x14c/0x1c0
> [  890.783083] [<004613b4>] dup_mmap+0x374/0x4a0
> [  890.783123] [<00461530>] dup_mm+0x50/0x200
> [  890.783157] [<00462384>] copy_process+0x704/0x1280
> [  890.783196] [<004631a8>] kernel_clone+0x88/0x380
> [  890.783231] [<0042d170>] sparc_clone+0xb0/0xe0
> [  890.783274] [<00406274>] linux_sparc_syscall+0x34/0x44
> [  890.784106] CPU[7]: SUN4V mondo timeout, cpu(5) made no forward
> progress after 52 retries. Total target cpus(7).
> [  890.784119] CPU[6]: SUN4V mondo timeout, cpu(5) made no forward
> progress after 53 retries. Total target cpus(7).
> [  890.784876] Press Stop-A (L1-A) from sun keyboard or send break
> [  890.784876] twice on console to return to the boot prom
> [  890.784897] ---[ end Kernel panic - not syncing: SUN4V mondo
> timeout panic ]---
> 
> (and machine halt)
> 
> after running stress-ng :
> 
> stress-ng.git$ ./stress-ng --verbose --timeout 10m --opcode -1
> stress-ng: debug: [480950] stress-ng 0.12.10 g27f90a2276bd
> stress-ng: debug: [480950] system: Linux ttip 5.13.0-rc6 #229 SMP Tue
> Jun 15 12:30:23 MSK 2021 sparc64
> stress-ng: debug: [480950] RAM total: 7.8G, RAM free: 7.0G, swap free: 768.7M
> stress-ng: debug: [480950] 8 processors online, 256 processors configured
> stress-ng: info:  [480950] dispatching hogs: 8 opcode
> stress-ng: debug: [480950] cache allocate: using cache maximum level L2
> stress-ng: debug: [480950] cache allocate: shared cache buffer size: 128K
> stress-ng: debug: [480950] starting stressors
> stress-ng: debug: [480951] stress-ng-opcode: started [480951] (instance 0)
> stress-ng: debug: [480952] stress-ng-opcode: started [480952] (instance 1)
> stress-ng: debug: [480953] stress-ng-opcode: started [480953] (instance 2)
> stress-ng: debug: [480955] stress-ng-opcode: started [480955] (instance 3)
> stress-ng: debug: [480957] stress-ng-opcode: started [480957] (instance 4)
> stress-ng: debug: [480959] stress-ng-opcode: started [480959] (instance 5)
> stress-ng: debug: [480961] stress-ng-opcode: started [480961] (instance 6)
> stress-ng: debug: [480950] 8 stressors started
> stress-ng: debug: [480963] stress-ng-opcode: started [480963] (instance 7)
> *** stack smashing detected ***: terminated
> *** stack smashing detected ***: terminated
> *** stack smashing detected ***: terminated
> *** stack smashing detected ***: terminated
> *** stack smashing detected ***: terminated
> *** stack smashing detected ***: terminated
> *** stack smashing detected ***: terminated
> Inconsistency detected by ld.so: dl-runtime.c: 80: _dl_fixup:
> Assertion `ELFW(R_TYPE)(reloc->r_info) == ELF_MACHINE_JMP_SLOT'
> failed!
> *** stack smashing detected ***: terminated
> munmap_chunk(): invalid pointer
> *** stack smashing detected ***: terminated
> *** stack smashing detected ***: terminated
> *** stack smashing detected ***: terminated
> *** stack smashing detected ***: terminated
> *** stack smashing detected ***: terminated
> Inconsistency detected by ld.so: : 422: Assertion `�' failed!
> *** stack smashing detected ***: terminated
> 
> 
> Machine is my testing LDOM (virtual machine), installed and running
> the latest sparc4 debian sid (unstable).
> 



Re: [sparc64] kernel panic from running a program in userspace

2021-06-19 Thread Anatoly Pugachev
On Sat, Jun 19, 2021 at 12:31 PM Colin Ian King
 wrote:
>
> Hi,
>
> I suspect this issue was fixed with the following commit:
>
> commit e5e8b80d352ec999d2bba3ea584f541c83f4ca3f
> Author: Rob Gardner 
> Date:   Sun Feb 28 22:48:16 2021 -0700
>
> sparc64: Fix opcode filtering in handling of no fault loads

Colin,

yes, but I believe that it was quite a different kernel bug.
Besides, my current kernel test is based on git kernel 5.13.0-rc6
(released last monday), which already includes the mentioned 'opcode'
fix.

> > stress-ng.git$ ./stress-ng --verbose --timeout 10m --opcode -1
> > stress-ng: debug: [480950] stress-ng 0.12.10 g27f90a2276bd
> > stress-ng: debug: [480950] system: Linux ttip 5.13.0-rc6 #229 SMP Tue
> > Jun 15 12:30:23 MSK 2021 sparc64
> > stress-ng: debug: [480950] RAM total: 7.8G, RAM free: 7.0G, swap free: 
> > 768.7M
> > stress-ng: debug: [480950] 8 processors online, 256 processors configured
> > stress-ng: info:  [480950] dispatching hogs: 8 opcode