Re: Oops while running systemtap on the p6 machine against the kernel version 2.6.36-rc7-git3
divya writes:

> While running systemtap tests on the p6 machine, against the kernel version
> 2.6.36-rc7-git3
> Oops occurred, here are the call traces
>
> BUG: spinlock bad magic on CPU#6, stapio/20398
> -- 0:conmux-control -- time-stamp -- Oct/13/10 2:49:18 --res
> lock: c0fcfa18, .magic: , .owner:/-1, .owner_cpu: 0
> [...]

jistone committed some timing-related changes last night. Would you mind trying a new build on your ppc box? It seems as though there is a concurrency problem with the timer shutdown process. If the problem still occurs, we might need to instrument that part of the kernel and/or runtime code to figure it out.

- FChE
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
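To make the suspected shutdown race concrete, here is a minimal userspace sketch in Python. It is not SystemTap or kernel code. It models a timer whose callback re-arms itself (in the reported trace, `__stp_time_timer_callback` calls `mod_timer`) and a shutdown path that must guarantee no further firing once it returns. The class name `RearmingTimer`, the `_stopping` flag and the locking scheme are assumptions made for illustration only; they are not taken from the SystemTap runtime.

```python
import threading
import time


class RearmingTimer:
    """Userspace model of a periodic timer whose callback re-arms itself."""

    def __init__(self, interval, work):
        self._interval = interval
        self._work = work
        self._lock = threading.Lock()
        self._stopping = False
        self._timer = None

    def _arm_locked(self):
        # Caller must hold self._lock.
        self._timer = threading.Timer(self._interval, self._callback)
        self._timer.daemon = True
        self._timer.start()

    def start(self):
        with self._lock:
            self._stopping = False
            self._arm_locked()

    def _callback(self):
        self._work()
        # Re-arm only if shutdown has not begun. Checking the flag under
        # the same lock that stop() takes closes the window in which a
        # running callback could re-arm a timer that stop() has already
        # cancelled.
        with self._lock:
            if not self._stopping:
                self._arm_locked()

    def stop(self):
        with self._lock:
            self._stopping = True
            timer = self._timer
        if timer is not None:
            timer.cancel()  # no effect if the callback is already running
            timer.join()    # wait for an in-flight callback to finish


ticks = []
periodic = RearmingTimer(0.01, lambda: ticks.append(time.monotonic()))
periodic.start()
time.sleep(0.1)
periodic.stop()
count_at_stop = len(ticks)
time.sleep(0.05)
print(count_at_stop, len(ticks))  # the two counts should be equal
```

Without the `_stopping` check under the lock, a callback that is already running when `stop()` is called could arm a fresh timer after the cancel, which is the kind of window a shutdown path has to close.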
Re: Oops while running systemtap on the p6 machine against the kernel version 2.6.36-rc7-git3
On Friday 15 October 2010 02:16 AM, Frank Ch. Eigler wrote:

> divya writes:
>
> > While running systemtap tests on the p6 machine, against the kernel
> > version 2.6.36-rc7-git3 Oops occurred, here are the call traces
>
> Did the oops happen during a systemtap module startup vs. operation
> vs. shutdown? stap -V version string?
>
> > BUG: spinlock bad magic on CPU#6, stapio/20398
> > -- 0:conmux-control -- time-stamp -- Oct/13/10 2:49:18 --res
> > lock: c0fcfa18, .magic: , .owner:/-1, .owner_cpu: 0
> > Call Trace:
> > [c001effbfab0] [c0011934] .show_stack+0x6c/0x16c (unreliable)
> > [c001effbfb60] [c02c9274] .spin_bug+0xb0/0xd4
> > [c001effbfbf0] [c02c953c] .do_raw_spin_lock+0x48/0x184
> > [c001effbfc90] [c054af78] ._raw_spin_lock+0x10/0x24
> > [c001effbfd00] [d3015908] .__stp_time_timer_callback+0x94/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
> > [...]
> > kernel BUG at kernel/timer.c:681!
> > Oops: Exception in kernel mode, sig: 5 [#1]
> > SMP NR_CPUS=1024 NUMA pSeries
> > [...]
> > [c001effbfc50] [c001effbfd00] 0xc001effbfd00 (unreliable)
> > [c001effbfd00] [d301597c] .__stp_time_timer_callback+0x108/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
> > [c001effbfdc0] [c009c2f8] .run_timer_softirq+0x1d8/0x2a8
>
> We have had occasional problems in the past with something like this:
> http://sourceware.org/PR10651, but it never was tracked down to a
> systemtap bug per se, as opposed to suspicions that the kernel was not
> satisfying one of its guarantees w.r.t. del_timer_sync().
>
> - FChE

Sorry Frank for the late reply.

The Oops occurred during the execution of the systemtap tests.
The stap version is: Snapshot: 89e2abb

Attached are the systemtap test execution logs.

Thanks,
Divya

10/13 15:06:44 DEBUG| utils:0053| Running 'which gcc'
10/13 15:06:44 DEBUG| utils:0085| /usr/bin/gcc
10/13 15:06:44 INFO | test:0256| Test started. Number of iterations: 1
10/13 15:06:44 INFO | test:0259| Executing iteration 1 of 1
10/13 15:06:44 DEBUG| utils:0053| Running 'PATH=/usr/local/autobench/autotest/deps/systemtap/systemtap/bin:/usr/local/autobench/autotest/deps/dejagnu/dejagnu/bin:$PATH make installcheck'
10/13 15:06:44 DEBUG| utils:0085| make check-DEJAGNU RUNTESTFLAGS=" --tool_opts \'install \'"
10/13 15:06:44 DEBUG| utils:0085| make[1]: Entering directory `/usr/local/autobench/autotest/tests/systemtap/test'
10/13 15:06:44 DEBUG| utils:0085| srcdir=`CDPATH="${ZSH_VERSION+.}:" && cd /usr/local/autobench/autotest/tests/systemtap/src/testsuite && pwd`; export srcdir; \
10/13 15:06:44 DEBUG| utils:0085| EXPECT=expect; export EXPECT; \
10/13 15:06:44 DEBUG| utils:0085| runtest="env SYSTEMTAP_TESTAPPS= SYSTEMTAP_RUNTIME=/usr/local/autobench/autotest/deps/systemtap/systemtap/share/systemtap/runtime SYSTEMTAP_TAPSET=/usr/local/autobench/autotest/deps/systemtap/systemtap/share/systemtap/tapset LD_LIBRARY_PATH=/usr/local/autobench/autotest/deps/systemtap/systemtap/lib/systemtap CRASH_LIBDIR=/usr/local/autobench/autotest/deps/systemtap/systemtap/lib/systemtap PATH=/usr/local/autobench/autotest/deps/systemtap/systemtap/bin:$PATH SYSTEMTAP_PATH=/usr/local/autobench/autotest/deps/systemtap/systemtap/bin SYSTEMTAP_INCLUDES=/usr/local/autobench/autotest/deps/systemtap/systemtap/include PKGLIBDIR=/usr/local/autobench/autotest/deps/systemtap/systemtap/libexec/systemtap /usr/local/autobench/autotest/tests/systemtap/src/testsuite/execrc runtest"; \
10/13 15:06:44 DEBUG| utils:0085| if /bin/sh -c "$runtest --version" > /dev/null 2>&1; then \
10/13 15:06:44 DEBUG| utils:0085| exit_status=0; l='systemtap'; for tool in $l; do \
10/13 15:06:44 DEBUG| utils:0085| if $runtest --tool $tool --tool_opts \'\' --srcdir $srcdir --tool_opts \'install \'; \
10/13 15:06:44 DEBUG| utils:0085| then :; else exit_status=1; fi; \
10/13 15:06:44 DEBUG| utils:0085| done; \
10/13 15:06:44 DEBUG| utils:0085| else echo "WARNING: could not find \`runtest'" 1>&2; :;\
10/13 15:06:44 DEBUG| utils:0085| fi; \
10/13 15:06:44 DEBUG| utils:0085| exit $exit_status
10/13 15:06:45 ERROR| utils:0085| WARNING: Couldn't find the global config file.
10/13 15:06:45 DEBUG| utils:0085| kernel location:
10/13 15:06:45 DEBUG| utils:0085| kernel version: 2.6.36-rc7-git3-autotest
10/13 15:06:45 DEBUG| utils:0085| systemtap location: /usr/local/autobench/autotest/deps/systemtap/systemtap/bin/stap
10/13 15:06:45 DEBUG| utils:0085| systemtap version: version 1.4/0.148 non-git sources
10/13 15:06:45 DEBUG| utils:0085| gcc location: /usr/bin/gcc
10/13 15:06:45 DEBUG| utils:0085| gcc version: gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291]
10/13 15:06:46 DEBUG| utils:0085| Test Run By root on Wed Oct 13 15:06:46 2010
10/13 15:06:46 DEBUG| utils:0085| Native configuration is powerpc64-unknown-linux-gnu
10/13 15:06:46 DEBUG| utils:0085|
10
Re: Oops while running systemtap on the p6 machine against the kernel version 2.6.36-rc7-git3
divya writes:

> While running systemtap tests on the p6 machine, against the kernel
> version 2.6.36-rc7-git3 Oops occurred, here are the call traces

Did the oops happen during a systemtap module startup vs. operation vs. shutdown? stap -V version string?

> BUG: spinlock bad magic on CPU#6, stapio/20398
> -- 0:conmux-control -- time-stamp -- Oct/13/10 2:49:18 --res
> lock: c0fcfa18, .magic: , .owner:/-1, .owner_cpu: 0
> Call Trace:
> [c001effbfab0] [c0011934] .show_stack+0x6c/0x16c (unreliable)
> [c001effbfb60] [c02c9274] .spin_bug+0xb0/0xd4
> [c001effbfbf0] [c02c953c] .do_raw_spin_lock+0x48/0x184
> [c001effbfc90] [c054af78] ._raw_spin_lock+0x10/0x24
> [c001effbfd00] [d3015908] .__stp_time_timer_callback+0x94/0x13c
> [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
> [...]
> kernel BUG at kernel/timer.c:681!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> [...]
> [c001effbfc50] [c001effbfd00] 0xc001effbfd00 (unreliable)
> [c001effbfd00] [d301597c] .__stp_time_timer_callback+0x108/0x13c
> [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
> [c001effbfdc0] [c009c2f8] .run_timer_softirq+0x1d8/0x2a8

We have had occasional problems in the past with something like this: http://sourceware.org/PR10651, but it never was tracked down to a systemtap bug per se, as opposed to suspicions that the kernel was not satisfying one of its guarantees w.r.t. del_timer_sync().

- FChE
Oops while running systemtap on the p6 machine against the kernel version 2.6.36-rc7-git3
While running systemtap tests on the p6 machine, against the kernel version 2.6.36-rc7-git3 Oops occurred, here are the call traces

BUG: spinlock bad magic on CPU#6, stapio/20398
-- 0:conmux-control -- time-stamp -- Oct/13/10 2:49:18 --res
lock: c0fcfa18, .magic: , .owner:/-1, .owner_cpu: 0
Call Trace:
[c001effbfab0] [c0011934] .show_stack+0x6c/0x16c (unreliable)
[c001effbfb60] [c02c9274] .spin_bug+0xb0/0xd4
[c001effbfbf0] [c02c953c] .do_raw_spin_lock+0x48/0x184
[c001effbfc90] [c054af78] ._raw_spin_lock+0x10/0x24
[c001effbfd00] [d3015908] .__stp_time_timer_callback+0x94/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
[c001effbfdc0] [c009c2f8] .run_timer_softirq+0x1d8/0x2a8
[c001effbfec0] [c00952d0] .__do_softirq+0xe4/0x1b4
[c001effbff90] [c002a7a8] .call_do_softirq+0x14/0x24
[c0010af1f560] [c000dde4] .do_softirq+0x88/0xf0
[c0010af1f600] [c0095030] .irq_exit+0x50/0xac
[c0010af1f680] [c0027660] .timer_interrupt+0x110/0x13c
[c0010af1f710] [c0003718] decrementer_common+0x118/0x180
--- Exception: 901 at .smp_call_function_many+0x284/0x2d0
    LR = .smp_call_function_many+0x268/0x2d0
[c0010af1fae0] [c00c55c4] .smp_call_function+0x3c/0x54
[c0010af1fb60] [c0094cdc] .on_each_cpu+0x24/0x84
[c0010af1fc00] [d3016738] ._stp_ctl_write_cmd+0x3b0/0x9c8 [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
[c0010af1fce0] [c01585f4] .vfs_write+0xd0/0x1b8
[c0010af1fd80] [c01587e4] .SyS_write+0x58/0xa0
[c0010af1fe30] [c00085b4] syscall_exit+0x0/0x40
[ cut here ]
kernel BUG at kernel/timer.c:681!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
last sysfs file: /sys/module/stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798/sections/__param
Modules linked in: stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798 ipv6 fuse loop dm_mod ibmveth sg sr_mod cdrom sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod [last unloaded: stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
NIP: c009d090 LR: d301597c CTR: c009cfb0
REGS: c001effbf9d0 TRAP: 0700 Not tainted (2.6.36-rc7-git3-autotest)
MSR: 80029032 CR: 28000482 XER: 0002
TASK = c00103422410[20398] 'stapio' THREAD: c0010af1c000 CPU: 6
GPR00: 0001 c001effbfc50 c0a31660 c0fcfa48
GPR04: 0001c8da 0070 0002
GPR08: c0ac7bf8 0006 c009cfb0
GPR12: d3017090 cf600f00 10008fd8 10008ff8
GPR16: c0a91180 0001
GPR20: c0010edb9030 c0010edb9430 c0010edb9830 c0010edb9c30
GPR24: 0001 001167a5b590dc41 c0fcfa18 0001c8da
GPR28: c0fcfa48 d301597c c09a6470 0001c8da
NIP [c009d090] .mod_timer+0xe0/0x24c
LR [d301597c] .__stp_time_timer_callback+0x108/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
Call Trace:
[c001effbfc50] [c001effbfd00] 0xc001effbfd00 (unreliable)
[c001effbfd00] [d301597c] .__stp_time_timer_callback+0x108/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
[c001effbfdc0] [c009c2f8] .run_timer_softirq+0x1d8/0x2a8
[c001effbfec0] [c00952d0] .__do_softirq+0xe4/0x1b4
[c001effbff90] [c002a7a8] .call_do_softirq+0x14/0x24
[c0010af1f560] [c000dde4] .do_softirq+0x88/0xf0
[c0010af1f600] [c0095030] .irq_exit+0x50/0xac
[c0010af1f680] [c0027660] .timer_interrupt+0x110/0x13c
[c0010af1f710] [c0003718] decrementer_common+0x118/0x180
--- Exception: 901 at .smp_call_function_many+0x284/0x2d0
    LR = .smp_call_function_many+0x268/0x2d0
[c0010af1fae0] [c00c55c4] .smp_call_function+0x3c/0x54
[c0010af1fb60] [c0094cdc] .on_each_cpu+0x24/0x84
[c0010af1fc00] [d3016738] ._stp_ctl_write_cmd+0x3b0/0x9c8 [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
[c0010af1fce0] [c01585f4] .vfs_write+0xd0/0x1b8
[c0010af1fd80] [c01587e4] .SyS_write+0x58/0xa0
[c0010af1fe30] [c00085b4] syscall_exit+0x0/0x40
Instruction dump:
7ffb4878 f9210070 e93e8078 8009 2f80 419e0010 7fa4eb78 7f83e378 4bfffcb1 e81c0020 7c74 7800d182<0b00> 7f83e378 38810070 3b40
Kernel panic - not syncing: Fatal exception in interrupt
Call Trace:
[c001effbf5b0] [c0011934] .show_stack+0x6c/0x16c (unreliable)
[c001effbf660] [c0552094] .panic+0x9c/0x204
[c001effbf700] [c0028974] .die+0x268/0x2ac
[c001effbf7a0] [c0028ca8] ._exception+0x88/0x174
[c001effbf960] [c0004f8c] program_check_common+0x10c/0x180
--- Exception: 700 at .mod_timer+0xe0/0x24c
    LR = .__stp_time_timer_callb
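As a reading aid for the traces in this report, the following Python sketch picks out the frames that belong to a loadable module, i.e. those followed by a bracketed module name. The regular expression and the helper name `module_frames` are assumptions made for illustration and are not part of any kernel or SystemTap tooling; the sample lines are copied from the report.

```python
import re

# One frame per line: "[stack] [address] symbol+0xOFF/0xSIZE [module]".
# The trailing "[module]" is present only for frames inside a module.
FRAME_RE = re.compile(
    r"\[(?P<sp>[0-9a-f]+)\] \[(?P<ip>[0-9a-f]+)\] "
    r"(?P<sym>\.?\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<mod>\w+)\])?"
)


def module_frames(trace):
    """Return (symbol, module) pairs for frames that name a module."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group("mod"):
            # Drop the leading dot that ppc64 function symbols carry.
            frames.append((match.group("sym").lstrip("."), match.group("mod")))
    return frames


SAMPLE = """\
[c001effbfc90] [c054af78] ._raw_spin_lock+0x10/0x24
[c001effbfd00] [d3015908] .__stp_time_timer_callback+0x94/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
[c001effbfdc0] [c009c2f8] .run_timer_softirq+0x1d8/0x2a8
[c0010af1fb60] [c0094cdc] .on_each_cpu+0x24/0x84
[c0010af1fc00] [d3016738] ._stp_ctl_write_cmd+0x3b0/0x9c8 [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
[c0010af1fce0] [c01585f4] .vfs_write+0xd0/0x1b8
"""

for symbol, module in module_frames(SAMPLE):
    print(symbol, module)
```

On the sample lines this lists two module frames: the timer callback running from the timer softirq, and `_stp_ctl_write_cmd` further down the same stack, which is the write from `stapio` that the timer interrupt preempted.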