Re: Oops while running systemtap on the p6 machine against the kernel version 2.6.36-rc7-git3

2010-11-03 Thread Frank Ch. Eigler

divya  writes:

> While running the systemtap tests on the p6 machine against kernel version
> 2.6.36-rc7-git3, an Oops occurred; here is the call trace:
>
>  BUG: spinlock bad magic on CPU#6, stapio/20398
> -- 0:conmux-control -- time-stamp -- Oct/13/10  2:49:18 --res
>  lock: c0fcfa18, .magic: , .owner:/-1, .owner_cpu: 0
> [...]

jistone committed some timing-related changes last night.  Would you
mind trying a new build on your ppc box?

It seems as though there is a concurrency problem with the timer
shutdown process.  If the problem still occurs, we might need to
instrument that part of the kernel and/or runtime code to figure it
out.

- FChE
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Oops while running systemtap on the p6 machine against the kernel version 2.6.36-rc7-git3

2010-10-25 Thread divya

On Friday 15 October 2010 02:16 AM, Frank Ch. Eigler wrote:

> divya  writes:
>
> > While running the systemtap tests on the p6 machine against the kernel
> > version 2.6.36-rc7-git3, an Oops occurred; here is the call trace:
>
> Did the oops happen during a systemtap module startup vs. operation
> vs. shutdown?  stap -V version string?
>
> >  BUG: spinlock bad magic on CPU#6, stapio/20398
> > -- 0:conmux-control -- time-stamp -- Oct/13/10  2:49:18 --res
> >  lock: c0fcfa18, .magic: , .owner:/-1, .owner_cpu: 0
> > Call Trace:
> > [c001effbfab0] [c0011934] .show_stack+0x6c/0x16c (unreliable)
> > [c001effbfb60] [c02c9274] .spin_bug+0xb0/0xd4
> > [c001effbfbf0] [c02c953c] .do_raw_spin_lock+0x48/0x184
> > [c001effbfc90] [c054af78] ._raw_spin_lock+0x10/0x24
> > [c001effbfd00] [d3015908] .__stp_time_timer_callback+0x94/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
> > [...]
> > kernel BUG at kernel/timer.c:681!
> > Oops: Exception in kernel mode, sig: 5 [#1]
> > SMP NR_CPUS=1024 NUMA pSeries
> > [...]
> > [c001effbfc50] [c001effbfd00] 0xc001effbfd00 (unreliable)
> > [c001effbfd00] [d301597c] .__stp_time_timer_callback+0x108/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
> > [c001effbfdc0] [c009c2f8] .run_timer_softirq+0x1d8/0x2a8
>
> We have had occasional problems in the past with something like this:
> http://sourceware.org/PR10651, but it never was tracked down to a
> systemtap bug per se, as opposed to suspicions that the kernel was not
> satisfying one of its guarantees w.r.t. del_timer_sync().
>
> - FChE


Sorry for the late reply, Frank.
The Oops occurred during the execution of the systemtap tests.

The stap version is:
 Snapshot: 89e2abb

Attached are the systemtap test execution logs

Thanks
Divya



10/13 15:06:44 DEBUG| utils:0053| Running 'which gcc'
10/13 15:06:44 DEBUG| utils:0085| /usr/bin/gcc
10/13 15:06:44 INFO |  test:0256| Test started. Number of iterations: 1
10/13 15:06:44 INFO |  test:0259| Executing iteration 1 of 1
10/13 15:06:44 DEBUG| utils:0053| Running 'PATH=/usr/local/autobench/autotest/deps/systemtap/systemtap/bin:/usr/local/autobench/autotest/deps/dejagnu/dejagnu/bin:$PATH make installcheck'
10/13 15:06:44 DEBUG| utils:0085| make  check-DEJAGNU RUNTESTFLAGS=" --tool_opts \'install \'"
10/13 15:06:44 DEBUG| utils:0085| make[1]: Entering directory `/usr/local/autobench/autotest/tests/systemtap/test'
10/13 15:06:44 DEBUG| utils:0085| srcdir=`CDPATH="${ZSH_VERSION+.}:" && cd /usr/local/autobench/autotest/tests/systemtap/src/testsuite && pwd`; export srcdir; \
10/13 15:06:44 DEBUG| utils:0085|   EXPECT=expect; export EXPECT; \
10/13 15:06:44 DEBUG| utils:0085|   runtest="env SYSTEMTAP_TESTAPPS= SYSTEMTAP_RUNTIME=/usr/local/autobench/autotest/deps/systemtap/systemtap/share/systemtap/runtime SYSTEMTAP_TAPSET=/usr/local/autobench/autotest/deps/systemtap/systemtap/share/systemtap/tapset LD_LIBRARY_PATH=/usr/local/autobench/autotest/deps/systemtap/systemtap/lib/systemtap CRASH_LIBDIR=/usr/local/autobench/autotest/deps/systemtap/systemtap/lib/systemtap PATH=/usr/local/autobench/autotest/deps/systemtap/systemtap/bin:$PATH SYSTEMTAP_PATH=/usr/local/autobench/autotest/deps/systemtap/systemtap/bin SYSTEMTAP_INCLUDES=/usr/local/autobench/autotest/deps/systemtap/systemtap/include PKGLIBDIR=/usr/local/autobench/autotest/deps/systemtap/systemtap/libexec/systemtap /usr/local/autobench/autotest/tests/systemtap/src/testsuite/execrc runtest"; \
10/13 15:06:44 DEBUG| utils:0085|   if /bin/sh -c "$runtest --version" > /dev/null 2>&1; then \
10/13 15:06:44 DEBUG| utils:0085| exit_status=0; l='systemtap'; for tool in $l; do \
10/13 15:06:44 DEBUG| utils:0085|   if $runtest  --tool $tool --tool_opts \'\' --srcdir $srcdir --tool_opts \'install \'; \
10/13 15:06:44 DEBUG| utils:0085|   then :; else exit_status=1; fi; \
10/13 15:06:44 DEBUG| utils:0085| done; \
10/13 15:06:44 DEBUG| utils:0085|   else echo "WARNING: could not find \`runtest'" 1>&2; :;\
10/13 15:06:44 DEBUG| utils:0085|   fi; \
10/13 15:06:44 DEBUG| utils:0085|   exit $exit_status
10/13 15:06:45 ERROR| utils:0085| WARNING: Couldn't find the global config file.
10/13 15:06:45 DEBUG| utils:0085| kernel location: 
10/13 15:06:45 DEBUG| utils:0085| kernel version: 2.6.36-rc7-git3-autotest
10/13 15:06:45 DEBUG| utils:0085| systemtap location: /usr/local/autobench/autotest/deps/systemtap/systemtap/bin/stap
10/13 15:06:45 DEBUG| utils:0085| systemtap version: version 1.4/0.148 non-git sources
10/13 15:06:45 DEBUG| utils:0085| gcc location: /usr/bin/gcc
10/13 15:06:45 DEBUG| utils:0085| gcc version: gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291]
10/13 15:06:46 DEBUG| utils:0085| Test Run By root on Wed Oct 13 15:06:46 2010
10/13 15:06:46 DEBUG| utils:0085| Native configuration is powerpc64-unknown-linux-gnu
10/13 15:06:46 DEBUG| utils:0085| 
10

Re: Oops while running systemtap on the p6 machine against the kernel version 2.6.36-rc7-git3

2010-10-14 Thread Frank Ch. Eigler

divya  writes:

> While running the systemtap tests on the p6 machine against the kernel
> version 2.6.36-rc7-git3, an Oops occurred; here is the call trace:

Did the oops happen during a systemtap module startup vs. operation
vs. shutdown?  stap -V version string?
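
One rough way to see which systemtap runtime frames are involved is to
pull the symbol names out of a trace such as the one quoted below.  This
is only a sketch: the helper names parse_frames and stap_frames are made
up for illustration, and the `_stp_` / `__stp_` prefixes are an
assumption based on the symbols that appear in this particular trace.

```python
import re

# Matches frames of the form
#   [stack-pointer] [address] .symbol+0xOFFSET/0xSIZE
# The leading dot on the symbol (ppc64 function entry) is optional.
FRAME_RE = re.compile(
    r"\[(?P<sp>[0-9a-f]+)\]\s+\[(?P<ip>[0-9a-f]+)\]\s+"
    r"\.?(?P<symbol>[A-Za-z_]\w*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(trace_text):
    """Return one dict per call-trace frame found in trace_text."""
    frames = []
    for line in trace_text.splitlines():
        line = line.lstrip("> ").strip()  # drop email quote markers
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "symbol": match.group("symbol"),
                "offset": int(match.group("offset"), 16),
                "size": int(match.group("size"), 16),
            })
    return frames


def stap_frames(frames, prefixes=("_stp_", "__stp_")):
    """Keep frames whose symbol starts with an assumed systemtap prefix."""
    return [f for f in frames if f["symbol"].startswith(prefixes)]


SAMPLE = """\
> [c001effbfc90] [c054af78] ._raw_spin_lock+0x10/0x24
> [c001effbfd00] [d3015908] .__stp_time_timer_callback+0x94/0x13c
> [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
> [c001effbfdc0] [c009c2f8] .run_timer_softirq+0x1d8/0x2a8
"""

if __name__ == "__main__":
    for frame in stap_frames(parse_frames(SAMPLE)):
        print(frame["symbol"], hex(frame["offset"]), hex(frame["size"]))
```

The wrapped module-name line and frames without a symbol (for example a
bare address marked "(unreliable)") are skipped rather than guessed at.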

>  BUG: spinlock bad magic on CPU#6, stapio/20398
> -- 0:conmux-control -- time-stamp -- Oct/13/10  2:49:18 --res
>  lock: c0fcfa18, .magic: , .owner:/-1, .owner_cpu: 0
> Call Trace:
> [c001effbfab0] [c0011934] .show_stack+0x6c/0x16c (unreliable)
> [c001effbfb60] [c02c9274] .spin_bug+0xb0/0xd4
> [c001effbfbf0] [c02c953c] .do_raw_spin_lock+0x48/0x184
> [c001effbfc90] [c054af78] ._raw_spin_lock+0x10/0x24
> [c001effbfd00] [d3015908] .__stp_time_timer_callback+0x94/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
> [...]
> kernel BUG at kernel/timer.c:681!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> [...]
> [c001effbfc50] [c001effbfd00] 0xc001effbfd00 (unreliable)
> [c001effbfd00] [d301597c] .__stp_time_timer_callback+0x108/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
> [c001effbfdc0] [c009c2f8] .run_timer_softirq+0x1d8/0x2a8


We have had occasional problems in the past with something like this:
http://sourceware.org/PR10651, but it never was tracked down to a
systemtap bug per se, as opposed to suspicions that the kernel was not
satisfying one of its guarantees w.r.t. del_timer_sync().
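
The guarantee usually meant here is the one stated in the kernel's own
comment on del_timer_sync(): once it returns, the timer is not queued
and its handler is not running on any CPU.  If tracing were added around
the timer shutdown path, the recorded order of events could be checked
roughly as in the sketch below.  The event names and the late_callbacks
helper are hypothetical, chosen only for illustration; they are not part
of systemtap or the kernel.

```python
def late_callbacks(events):
    """Return indices of 'callback' events seen after del_timer_sync returned.

    events is an ordered list of strings taken from a (hypothetical) trace:
      'arm'                    - the timer was started or re-started
      'callback'               - the timer handler began running
      'del_timer_sync_return'  - del_timer_sync() returned to its caller
    """
    deleted = False
    late = []
    for index, event in enumerate(events):
        if event == "del_timer_sync_return":
            deleted = True
        elif event == "arm":
            deleted = False  # a fresh start makes later callbacks legitimate
        elif event == "callback" and deleted:
            late.append(index)  # handler ran after deletion was reported
    return late


if __name__ == "__main__":
    trace = ["arm", "callback", "del_timer_sync_return", "callback"]
    print(late_callbacks(trace))
```

A non-empty result would point at a handler running after shutdown, but
it would not by itself say whether the runtime or the kernel is at fault.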

- FChE


Oops while running systemtap on the p6 machine against the kernel version 2.6.36-rc7-git3

2010-10-14 Thread divya

While running the systemtap tests on the p6 machine against kernel version
2.6.36-rc7-git3, an Oops occurred; here is the call trace:

 BUG: spinlock bad magic on CPU#6, stapio/20398
-- 0:conmux-control -- time-stamp -- Oct/13/10  2:49:18 --res
 lock: c0fcfa18, .magic: , .owner:/-1, .owner_cpu: 0
Call Trace:
[c001effbfab0] [c0011934] .show_stack+0x6c/0x16c (unreliable)
[c001effbfb60] [c02c9274] .spin_bug+0xb0/0xd4
[c001effbfbf0] [c02c953c] .do_raw_spin_lock+0x48/0x184
[c001effbfc90] [c054af78] ._raw_spin_lock+0x10/0x24
[c001effbfd00] [d3015908] .__stp_time_timer_callback+0x94/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
[c001effbfdc0] [c009c2f8] .run_timer_softirq+0x1d8/0x2a8
[c001effbfec0] [c00952d0] .__do_softirq+0xe4/0x1b4
[c001effbff90] [c002a7a8] .call_do_softirq+0x14/0x24
[c0010af1f560] [c000dde4] .do_softirq+0x88/0xf0
[c0010af1f600] [c0095030] .irq_exit+0x50/0xac
[c0010af1f680] [c0027660] .timer_interrupt+0x110/0x13c
[c0010af1f710] [c0003718] decrementer_common+0x118/0x180
--- Exception: 901 at .smp_call_function_many+0x284/0x2d0
LR = .smp_call_function_many+0x268/0x2d0
[c0010af1fae0] [c00c55c4] .smp_call_function+0x3c/0x54
[c0010af1fb60] [c0094cdc] .on_each_cpu+0x24/0x84
[c0010af1fc00] [d3016738] ._stp_ctl_write_cmd+0x3b0/0x9c8 [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
[c0010af1fce0] [c01585f4] .vfs_write+0xd0/0x1b8
[c0010af1fd80] [c01587e4] .SyS_write+0x58/0xa0
[c0010af1fe30] [c00085b4] syscall_exit+0x0/0x40
------------[ cut here ]------------
kernel BUG at kernel/timer.c:681!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
last sysfs file: /sys/module/stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798/sections/__param
Modules linked in: stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798 ipv6 fuse loop dm_mod ibmveth sg sr_mod cdrom sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod [last unloaded: stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
NIP: c009d090 LR: d301597c CTR: c009cfb0
REGS: c001effbf9d0 TRAP: 0700   Not tainted  (2.6.36-rc7-git3-autotest)
MSR: 80029032   CR: 28000482  XER: 0002
TASK = c00103422410[20398] 'stapio' THREAD: c0010af1c000 CPU: 6
GPR00: 0001 c001effbfc50 c0a31660 c0fcfa48
GPR04: 0001c8da 0070 0002 
GPR08:  c0ac7bf8 0006 c009cfb0
GPR12: d3017090 cf600f00 10008fd8 10008ff8
GPR16:  c0a91180  0001
GPR20: c0010edb9030 c0010edb9430 c0010edb9830 c0010edb9c30
GPR24: 0001 001167a5b590dc41 c0fcfa18 0001c8da
GPR28: c0fcfa48 d301597c c09a6470 0001c8da
NIP [c009d090] .mod_timer+0xe0/0x24c
LR [d301597c] .__stp_time_timer_callback+0x108/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
Call Trace:
[c001effbfc50] [c001effbfd00] 0xc001effbfd00 (unreliable)
[c001effbfd00] [d301597c] .__stp_time_timer_callback+0x108/0x13c [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
[c001effbfdc0] [c009c2f8] .run_timer_softirq+0x1d8/0x2a8
[c001effbfec0] [c00952d0] .__do_softirq+0xe4/0x1b4
[c001effbff90] [c002a7a8] .call_do_softirq+0x14/0x24
[c0010af1f560] [c000dde4] .do_softirq+0x88/0xf0
[c0010af1f600] [c0095030] .irq_exit+0x50/0xac
[c0010af1f680] [c0027660] .timer_interrupt+0x110/0x13c
[c0010af1f710] [c0003718] decrementer_common+0x118/0x180
--- Exception: 901 at .smp_call_function_many+0x284/0x2d0
LR = .smp_call_function_many+0x268/0x2d0
[c0010af1fae0] [c00c55c4] .smp_call_function+0x3c/0x54
[c0010af1fb60] [c0094cdc] .on_each_cpu+0x24/0x84
[c0010af1fc00] [d3016738] ._stp_ctl_write_cmd+0x3b0/0x9c8 [stap_75ce6f84d34f8665c9a6b8e27fb9ea95_818798]
[c0010af1fce0] [c01585f4] .vfs_write+0xd0/0x1b8
[c0010af1fd80] [c01587e4] .SyS_write+0x58/0xa0
[c0010af1fe30] [c00085b4] syscall_exit+0x0/0x40
Instruction dump:
7ffb4878 f9210070 e93e8078 8009 2f80 419e0010 7fa4eb78 7f83e378
4bfffcb1 e81c0020 7c74 7800d182<0b00>  7f83e378 38810070 3b40
Kernel panic - not syncing: Fatal exception in interrupt
Call Trace:
[c001effbf5b0] [c0011934] .show_stack+0x6c/0x16c (unreliable)
[c001effbf660] [c0552094] .panic+0x9c/0x204
[c001effbf700] [c0028974] .die+0x268/0x2ac
[c001effbf7a0] [c0028ca8] ._exception+0x88/0x174
[c001effbf960] [c0004f8c] program_check_common+0x10c/0x180
--- Exception: 700 at .mod_timer+0xe0/0x24c
LR = .__stp_time_timer_callb