Public bug reported:

== Comment: #0 - Harish Sriram <hasri...@in.ibm.com> - 2018-07-31 03:50:07 ==
--Problem Description-- 
RCU stalls and soft lockups observed when stressing Ubuntu 18.04.1

Contact Information = hasri...@in.ibm.com

---Issue observed---
[ 1196.813220] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1196.813241]  0-....: (19 ticks this GP) idle=966/140000000000000/0 
softirq=11580/11580 fqs=1552 
[ 1196.813249]  (detected by 24, t=5252 jiffies, g=11722, c=11721, q=1061088)
[ 1196.813282] Task dump for CPU 0:
[ 1196.813285] stress-ng-dev   R  running task        0 46323  33635 0x00042004
[ 1196.813294] Call Trace:
[ 1196.813310] [c000002c75ad7b10] [c0000000018bd940] log_first_seq+0x0/0x8 
(unreliable)
[ 1198.508930] kauditd_printk_skb: 3 callbacks suppressed
[ 1198.508938] audit: type=1400 audit(1533020002.449:312): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="/usr/bin/pulseaudio-eg" 
pid=12813 comm="stress-ng-appar"
[ 1198.508954] audit: type=1400 audit(1533020002.449:313): apparmor="STATUS" 
operation="profile_load" profile="unconfined" 
name="/usr/bin/pulseaudio-eg///usr/lib/pulseaudio/pulse/gconf-helper" pid=12813 
comm="stress-ng-appar"
[ 1199.361719] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... 
145-... 159-... } 5489 jiffies s: 173 root: 0x201/.
[ 1199.361742] blocking rcu_node structures: l=1:0-15:0x1/. l=1:144-159:0x8002/.
[ 1199.361749] Task dump for CPU 0:
[ 1199.361752] stress-ng-dev   R  running task        0 46323  33635 0x00042004
[ 1199.361757] Call Trace:
[ 1199.361769] [c000002c75ad7b10] [c0000000018bd940] log_first_seq+0x0/0x8 
(unreliable)
[ 1199.361777] Task dump for CPU 145:
[ 1199.361779] migration/145   R  running task        0   883      2 0x00000804
[ 1199.361783] Call Trace:
[ 1199.361787] [c000002ff0f5fa40] [c000002ff0f5fb00] 0xc000002ff0f5fb00 
(unreliable)
[ 1199.361791] Task dump for CPU 159:
[ 1199.361792] migration/159   R  running task        0   967      2 0x00000804
[ 1199.361796] Call Trace:
[ 1199.361799] [c000002d78a47a40] [c000002d78a47b00] 0xc000002d78a47b00 
(unreliable)
[ 1199.787698] audit: type=1400 audit(1533020003.985:314): apparmor="STATUS" 
operation="profile_replace" profile="unconfined" name="/usr/bin/pulseaudio-eg" 
pid=12813 comm="stress-ng-appar"
[ 1200.781159] watchdog: BUG: soft lockup - CPU#145 stuck for 23s! 
[migration/145:883]
[ 1200.781163] Modules linked in: snd_seq snd_seq_device snd_timer snd 
soundcore kvm_hv kvm_pr kvm camellia_generic cast6_generic cast_common 
serpent_generic vhost_vsock vmw_vsock_virtio_transport_common vsock 
twofish_generic twofish_common vhost_net vhost tap hci_vhci bluetooth 
ecdh_generic lrw userio algif_skcipher binfmt_misc tgr192 wp512 rmd320 
unix_diag sctp rmd256 rmd160 rmd128 md4 dccp_ipv4 algif_hash dccp af_alg 
powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq 
leds_powernv uio ibmpowernv powernv_rng vmx_crypto sch_fq_codel ib_iser rdma_cm 
iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs zstd_compress 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
[ 1200.781321]  raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage 
crct10dif_vpmsum crc32c_vpmsum tg3 ipr
[ 1200.781353] CPU: 145 PID: 883 Comm: migration/145 Not tainted 
4.15.0-29-generic #31-Ubuntu
[ 1200.781359] NIP:  c000000000206594 LR: c00000000020699c CTR: c000000000206470
[ 1200.781364] REGS: c000002ff0f5f9e0 TRAP: 0901   Not tainted  
(4.15.0-29-generic)
[ 1200.781366] MSR:  9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]>  CR: 
28002222  XER: 20000000
[ 1200.781392] CFAR: c0000000002065a4 SOFTE: 1 
               GPR00: c00000000020699c c000002ff0f5fc60 c0000000016eaf00 
0000000000000000 
               GPR04: 0000000000000001 0000002ffbf90000 0000009dde82957a 
0000000000000000 
               GPR08: c00000000fae3b00 0000000000000001 c000000000d432f8 
0000000000000bdf 
               GPR12: 0000000000000000 c00000000fae3b00 
[ 1200.781453] NIP [c000000000206594] multi_cpu_stop+0x124/0x1f0
[ 1200.781461] LR [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
[ 1200.781462] Call Trace:
[ 1200.781475] [c000002ff0f5fc60] [c000002ff0f5fd40] 0xc000002ff0f5fd40 
(unreliable)
[ 1200.781487] [c000002ff0f5fcb0] [c00000000020699c] 
cpu_stopper_thread+0xfc/0x1f0
[ 1200.781503] [c000002ff0f5fd60] [c000000000143ae0] 
smpboot_thread_fn+0x250/0x290
[ 1200.781510] [c000002ff0f5fdc0] [c00000000013d728] kthread+0x1a8/0x1b0
[ 1200.781522] [c000002ff0f5fe30] [c00000000000b658] 
ret_from_kernel_thread+0x5c/0x84
[ 1200.781525] Instruction dump:
[ 1200.781531] 409e001c 813d0020 815d0010 39290001 915e0000 7c2004ac 913d0020 
2b9f0004 
[ 1200.781551] 419e003c 7fe9fb78 7c210b78 7c421378 <83fd0020> 7f9f4840 409eff74 
2b890001 
[ 1200.905158] watchdog: BUG: soft lockup - CPU#159 stuck for 22s! 
[migration/159:967]
[ 1200.905161] Modules linked in: snd_seq snd_seq_device snd_timer snd 
soundcore kvm_hv kvm_pr kvm camellia_generic cast6_generic cast_common 
serpent_generic vhost_vsock vmw_vsock_virtio_transport_common vsock 
twofish_generic twofish_common vhost_net vhost tap hci_vhci bluetooth 
ecdh_generic lrw userio algif_skcipher binfmt_misc tgr192 wp512 rmd320 
unix_diag sctp rmd256 rmd160 rmd128 md4 dccp_ipv4 algif_hash dccp af_alg 
powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq 
leds_powernv uio ibmpowernv powernv_rng vmx_crypto sch_fq_codel ib_iser rdma_cm 
iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs zstd_compress 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
[ 1200.905290]  raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage 
crct10dif_vpmsum crc32c_vpmsum tg3 ipr
[ 1200.905316] CPU: 159 PID: 967 Comm: migration/159 Tainted: G             L   
4.15.0-29-generic #31-Ubuntu
[ 1200.905320] NIP:  c000000000206594 LR: c00000000020699c CTR: c000000000206470
[ 1200.905326] REGS: c000002d78a479e0 TRAP: 0901   Tainted: G             L    
(4.15.0-29-generic)
[ 1200.905327] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 
28002822  XER: 20000000
[ 1200.905345] CFAR: c0000000002065a4 SOFTE: 1 
               GPR00: c00000000020699c c000002d78a47c60 c0000000016eaf00 
0000000000000000 
               GPR04: 0000000000000001 0000002ffc310000 0000009dd8be8a16 
0000000000000000 
               GPR08: c00000000faed500 0000000000000001 c000000000d432f8 
0000000000000b97 
               GPR12: 0000000000000000 c00000000faed500 
[ 1200.905383] NIP [c000000000206594] multi_cpu_stop+0x124/0x1f0
[ 1200.905387] LR [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
[ 1200.905391] Call Trace:
[ 1200.905398] [c000002d78a47c60] [c000002d78a47d40] 0xc000002d78a47d40 
(unreliable)
[ 1200.905405] [c000002d78a47cb0] [c00000000020699c] 
cpu_stopper_thread+0xfc/0x1f0
[ 1200.905413] [c000002d78a47d60] [c000000000143ae0] 
smpboot_thread_fn+0x250/0x290
[ 1200.905418] [c000002d78a47dc0] [c00000000013d728] kthread+0x1a8/0x1b0
[ 1200.905426] [c000002d78a47e30] [c00000000000b658] 
ret_from_kernel_thread+0x5c/0x84
[ 1200.905429] Instruction dump:
[ 1200.905433] 409e001c 813d0020 815d0010 39290001 915e0000 7c2004ac 913d0020 
2b9f0004 
[ 1200.905445] 419e003c 7fe9fb78 7c210b78 7c421378 <83fd0020> 7f9f4840 409eff74 
2b890001 


---uname output---
# uname -a
Linux lep8d 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:37:15 UTC 2018 
ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Power 8 BML/Tuleta

----Additional Info-----
RCU stalls and soft lockups lead to hard lockups, but the CPU becomes unstuck 
after the hard lockup.

dmesg is attached.
sosreport will be attached.

Reproducible : 90%

---Steps to Reproduce---
1. wget https://github.com/ColinIanKing/stress-ng/archive/master.zip
2. unzip master.zip; cd stress-ng-master;
3. make; make install;
4. Run the following command multiple times
stress-ng --all <nr_cpus>  --vm-bytes 80%  --aggressive --maximize --oomable  
--timeout 300  --verify  --syslog  --metrics  --times
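
Step 4 can be scripted as a loop. This is only a minimal sketch: it assumes
that `nproc` gives the value intended for <nr_cpus> and that five runs count
as "multiple times". The names stress_args, RUN_STRESS and RUNS are
illustrative and not part of the original report.

```shell
#!/bin/sh
# stress_args: hypothetical helper that prints the step 4 options for a given CPU count.
stress_args() {
    printf '%s\n' "--all $1 --vm-bytes 80% --aggressive --maximize --oomable --timeout 300 --verify --syslog --metrics --times"
}

# Guarded so that sourcing this file does not start a stress run by accident.
if [ "${RUN_STRESS:-0}" = "1" ]; then
    runs="${RUNS:-5}"   # assumption: the report only says "multiple times"
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i of $runs"
        # Unquoted on purpose: the option string must be split into words.
        stress-ng $(stress_args "$(nproc)") || echo "run $i exited non-zero"
        i=$((i + 1))
    done
fi
```

Saved to a file, it would be started with RUN_STRESS=1 set in the environment;
watching dmesg in a second terminal shows the stall messages as they appear.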

---Expected---
Test should not cause any lockup or crash.

== Comment: #1 - Harish Sriram <hasri...@in.ibm.com> - 2018-07-31
03:50:49 ==


== Comment: #5 - SRIKAR DRONAMRAJU <srikar.dronamr...@in.ibm.com> - 2018-08-15 
02:16:49 ==
> (unreliable)
> [ 1199.361777] Task dump for CPU 145:
> [ 1199.361779] migration/145   R  running task        0   883      2
> 0x00000804
> [ 1199.361783] Call Trace:
> [ 1199.361787] [c000002ff0f5fa40] [c000002ff0f5fb00] 0xc000002ff0f5fb00
> (unreliable)
> [ 1199.361791] Task dump for CPU 159:
> [ 1199.361792] migration/159   R  running task        0   967      2
> 0x00000804
> [ 1199.361796] Call Trace:
> [ 1199.361799] [c000002d78a47a40] [c000002d78a47b00] 0xc000002d78a47b00
> (unreliable)
> [ 1199.787698] audit: type=1400 audit(1533020003.985:314): apparmor="STATUS"
> operation="profile_replace" profile="unconfined"
> name="/usr/bin/pulseaudio-eg" pid=12813 comm="stress-ng-appar"
> [ 1200.781159] watchdog: BUG: soft lockup - CPU#145 stuck for 23s!
> [migration/145:883]
> [ 1200.781163] Modules linked in: snd_seq snd_seq_device snd_timer snd
> soundcore kvm_hv kvm_pr kvm camellia_generic cast6_generic cast_common
> serpent_generic vhost_vsock vmw_vsock_virtio_transport_common vsock
> twofish_generic twofish_common vhost_net vhost tap hci_vhci bluetooth
> ecdh_generic lrw userio algif_skcipher binfmt_misc tgr192 wp512 rmd320
> unix_diag sctp rmd256 rmd160 rmd128 md4 dccp_ipv4 algif_hash dccp af_alg
> powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq
> leds_powernv uio ibmpowernv powernv_rng vmx_crypto sch_fq_codel ib_iser
> rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi
> scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure
> scsi_transport_sas btrfs zstd_compress raid10 raid456 async_raid6_recov
> async_memcpy async_pq async_xor async_tx xor
> [ 1200.781321]  raid6_pq libcrc32c raid1 raid0 multipath linear uas
> usb_storage crct10dif_vpmsum crc32c_vpmsum tg3 ipr
> [ 1200.781353] CPU: 145 PID: 883 Comm: migration/145 Not tainted
> 4.15.0-29-generic #31-Ubuntu
> [ 1200.781359] NIP:  c000000000206594 LR: c00000000020699c CTR:
> c000000000206470
> [ 1200.781364] REGS: c000002ff0f5f9e0 TRAP: 0901   Not tainted 
> (4.15.0-29-generic)
> [ 1200.781366] MSR:  9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]>  CR:
> 28002222  XER: 20000000
> [ 1200.781392] CFAR: c0000000002065a4 SOFTE: 1 
>                GPR00: c00000000020699c c000002ff0f5fc60 c0000000016eaf00
> 0000000000000000 
>                GPR04: 0000000000000001 0000002ffbf90000 0000009dde82957a
> 0000000000000000 
>                GPR08: c00000000fae3b00 0000000000000001 c000000000d432f8
> 0000000000000bdf 
>                GPR12: 0000000000000000 c00000000fae3b00 
> [ 1200.781453] NIP [c000000000206594] multi_cpu_stop+0x124/0x1f0
> [ 1200.781461] LR [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
> [ 1200.781462] Call Trace:
> [ 1200.781475] [c000002ff0f5fc60] [c000002ff0f5fd40] 0xc000002ff0f5fd40
> (unreliable)
> [ 1200.781487] [c000002ff0f5fcb0] [c00000000020699c]
> cpu_stopper_thread+0xfc/0x1f0
> [ 1200.781503] [c000002ff0f5fd60] [c000000000143ae0]
> smpboot_thread_fn+0x250/0x290
> [ 1200.781510] [c000002ff0f5fdc0] [c00000000013d728] kthread+0x1a8/0x1b0
> [ 1200.781522] [c000002ff0f5fe30] [c00000000000b658]
> ret_from_kernel_thread+0x5c/0x84
> [ 1200.781525] Instruction dump:
> [ 1200.781531] 409e001c 813d0020 815d0010 39290001 915e0000 7c2004ac
> 913d0020 2b9f0004 
> [ 1200.781551] 419e003c 7fe9fb78 7c210b78 7c421378 <83fd0020> 7f9f4840
> 409eff74 2b890001 


2610e88 stop_machine: Disable preemption after queueing stopper threads
9fb8d5d stop_machine: Disable preemption when waking two stopper threads
0b26351 stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock

These 3 commits are missing, which could be the reason we are seeing these
traces.
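
One way to check a kernel tree for the three commits is to match on their
subject lines, since backported commits normally carry different hashes than
the abbreviated upstream ones listed here. A rough sketch, assuming the
subjects were kept unchanged in the backport; missing_subjects is a
hypothetical helper name:

```shell
#!/bin/sh
# missing_subjects: hypothetical helper. Reads commit subjects on stdin and
# prints each of the three subjects from this comment that is NOT found.
missing_subjects() {
    log=$(cat)
    for s in \
        "stop_machine: Disable preemption after queueing stopper threads" \
        "stop_machine: Disable preemption when waking two stopper threads" \
        "stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock"
    do
        # -F matches the subject as a fixed string, -q suppresses output.
        printf '%s\n' "$log" | grep -qF -- "$s" || printf '%s\n' "$s"
    done
}

# Usage, assuming the current directory is a git checkout of the kernel under test:
#   git log --format=%s | missing_subjects
```

An empty result would suggest all three subjects are present in that tree; it
does not prove the backports are complete or correct.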

== Comment: #13 - Harish Sriram <hasri...@in.ibm.com> - 2018-08-16
10:29:21 ==

** Affects: kernel-package (Ubuntu)
     Importance: Undecided
     Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
         Status: New


** Tags: architecture-ppc64le bugnameltc-170060 severity-high 
targetmilestone-inin1804

** Tags added: architecture-ppc64le bugnameltc-170060 severity-high
targetmilestone-inin1804

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1855679

Title:
  Rcu stalls and Soft-lockups observed on stressing Ubuntu 18 04 1

