Hello, Libor!

Every once in a while, when I run ttcp, I get a kernel NULL pointer
dereference from SDP.

I compiled the ttcp.aio test with 

gcc -I../../../linux-kernel/infiniband/ulp/sdp ttcp.aio.c -O2 -o ttcp.aio.x -laio

I run ttcp on the server as

./ttcp.aio.x -r -l 100 -a 10

and the client as

./ttcp.aio.x -t -l 100 -n 100 -a 10 11.4.8.155

I repeated this test several times. Some runs printed messages like

ttcp-t: Event error <-32> <5275648>

and others did not.
It was the server that finally crashed.
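
Since the failure only shows up after several runs, the repetition can be scripted. The following is a minimal sketch, not part of the original test setup: repeat_run is a hypothetical helper, and it assumes that ttcp.aio.x returns a non-zero exit status on failure and that the server side is restarted for every client run, neither of which I have verified.

```shell
#!/bin/sh
# Hypothetical helper (not part of ttcp): run a command up to COUNT times
# and stop at the first run that exits with a non-zero status.
# usage: repeat_run COUNT CMD [ARGS...]
repeat_run() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@"; then
            echo "run $i failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count runs succeeded"
}
```

On the client this could be invoked as, for example, repeat_run 50 ./ttcp.aio.x -t -l 100 -n 100 -a 10 11.4.8.155 (the count of 50 is arbitrary).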

My kernel is 2.6.11 + latest openib svn (rev 2171).

The log file leading to the crash is below:

Apr 18 17:34:11 swlab155 kernel:  ERR: : IOCB <0> cancel <0> flag <0040> size <1:0:1>
Apr 18 17:34:22 swlab155 kernel:  ERR: : IOCB <0> cancel <0> flag <0040> size <100:0:100>
Apr 18 17:34:41 swlab155 kernel:  ERR: : VMA lock <528000:100> error <-12> <1:8:8>
Apr 18 17:34:41 swlab155 kernel:  ERR: : VMA lock <52c000:100> error <-12> <1:8:8>
Apr 18 17:34:49 swlab155 kernel:  ERR: : VMA lock <528000:100> error <-12> <1:8:8>
Apr 18 17:34:49 swlab155 kernel:  ERR: : VMA lock <52c000:100> error <-12> <1:8:8>
Apr 18 17:34:59 swlab155 kernel:  ERR: : VMA lock <528000:100> error <-12> <1:8:8>
Apr 18 17:34:59 swlab155 kernel:  ERR: : VMA lock <52c000:100> error <-12> <1:8:8>
Apr 18 17:34:59 swlab155 kernel: WARN: : Unexpected conn state. conn <9> state <ff01:fd01>
Apr 18 17:35:22 swlab155 kernel:  ERR: : IOCB <0> cancel <0> flag <0040> size <100:0:100>
Apr 18 17:35:34 swlab155 last message repeated 5 times
Apr 18 17:35:44 swlab155 kernel:  ERR: : VMA lock <528000:100> error <-12> <1:8:8>
Apr 18 17:35:44 swlab155 kernel:  ERR: : VMA lock <52c000:100> error <-12> <1:8:8>
Apr 18 17:35:52 swlab155 kernel:  ERR: : VMA lock <528000:100> error <-12> <1:8:8>
Apr 18 17:35:52 swlab155 kernel:  ERR: : VMA lock <52c000:100> error <-12> <1:8:8>
Apr 18 17:35:52 swlab155 kernel: WARN: : Cancel read with no IOCB. <2:0:00000005>
Apr 18 17:35:52 swlab155 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000038 RIP:
Apr 18 17:35:52 swlab155 kernel: <ffffffff80389f5e>{_spin_lock_irqsave+9}
Apr 18 17:35:52 swlab155 kernel: PGD 15cb56067 PUD 15cbb4067 PMD 0
Apr 18 17:35:52 swlab155 kernel: Oops: 0002 [1] SMP
Apr 18 17:35:52 swlab155 kernel: CPU 0
Apr 18 17:35:52 swlab155 kernel: Modules linked in: ib_sdp ib_cm ib_ipoib ib_sa ib_umad ib_mthca ib_mad ib_core
Apr 18 17:35:52 swlab155 kernel: Pid: 6, comm: events/0 Not tainted 2.6.11-openib
Apr 18 17:35:52 swlab155 kernel: RIP: 0010:[_spin_lock_irqsave+9/27] <ffffffff80389f5e>{_spin_lock_irqsave+9}
Apr 18 17:35:52 swlab155 kernel: RIP: 0010:[<ffffffff80389f5e>] <ffffffff80389f5e>{_spin_lock_irqsave+9}
Apr 18 17:35:52 swlab155 kernel: RSP: 0000:ffff8100dfe9fe08  EFLAGS: 00010092
Apr 18 17:35:52 swlab155 kernel: RAX: 0000000000000064 RBX: 0000000000000000 RCX: ffff81015c596528
Apr 18 17:35:52 swlab155 kernel: RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000038
Apr 18 17:35:52 swlab155 kernel: RBP: ffff81014dd23080 R08: ffff8100dfe9e000 R09: 0000000000000000
Apr 18 17:35:52 swlab155 kernel: R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000068
Apr 18 17:35:52 swlab155 kernel: R13: 0000000000000064 R14: 0000000000000038 R15: 0000000000000000
Apr 18 17:35:52 swlab155 kernel: FS:  0000000000000000(0000) GS:ffffffff80522c80(0000) knlGS:0000000000000000
Apr 18 17:35:52 swlab155 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Apr 18 17:35:52 swlab155 kernel: CR2: 0000000000000038 CR3: 000000015c626000 CR4: 00000000000006e0
Apr 18 17:35:52 swlab155 kernel: Process events/0 (pid: 6, threadinfo ffff8100dfe9e000, task ffff8100dff02750)
Apr 18 17:35:52 swlab155 kernel: Stack: 0000000000000292 ffffffff8018663b 0000000000000286 ffff81014dec1680
Apr 18 17:35:52 swlab155 kernel:        ffff81014dec1718 ffff8100dffa2000 ffff81014dec1680 0000000000000292
Apr 18 17:35:52 swlab155 kernel:        ffffffff8804a8c2 ffffffff8804a956
Apr 18 17:35:52 swlab155 kernel: Call Trace:<ffffffff8018663b>{aio_complete+129} <ffffffff8804a8c2>{:ib_sdp:do_iocb_complete+0}
Apr 18 17:35:52 swlab155 kernel:        <ffffffff8804a956>{:ib_sdp:do_iocb_complete+148} <ffffffff80140b1f>{worker_thread+476}
Apr 18 17:35:52 swlab155 kernel:        <ffffffff8012d10b>{default_wake_function+0} <ffffffff8012d10b>{default_wake_function+0}
Apr 18 17:35:52 swlab155 kernel:        <ffffffff80140943>{worker_thread+0} <ffffffff80144a12>{kthread+206}
Apr 18 17:35:52 swlab155 kernel:        <ffffffff8010dc43>{child_rip+8} <ffffffff80144944>{kthread+0}
Apr 18 17:35:52 swlab155 kernel:        <ffffffff8010dc3b>{child_rip+0}
Apr 18 17:35:52 swlab155 kernel:
Apr 18 17:35:52 swlab155 kernel: Code: f0 fe 0f 0f 88 8b 01 00 00 48 8b 04 24 48 83 c4 08 c3 fa f0
Apr 18 17:35:52 swlab155 kernel: RIP <ffffffff80389f5e>{_spin_lock_irqsave+9} RSP <ffff8100dfe9fe08>
Apr 18 17:35:52 swlab155 kernel: CR2: 0000000000000038
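
For reading the numeric codes in the log, a small decoding sketch may help. It assumes that the values in "error <-12>" and "Event error <-32>" are negated Linux errno numbers (12 is ENOMEM, 32 is EPIPE) and that the "Oops: 0002" field is the x86 page-fault error code, whose low three bits mean protection/not-present, write/read and user/kernel. The helper names errno_name and decode_oops are made up for illustration.

```shell
#!/bin/sh
# Map the (assumed) negated errno values seen in the log to Linux names.
# Only the two values that appear in this report are covered.
errno_name() {
    case "${1#-}" in
        12) echo ENOMEM ;;
        32) echo EPIPE ;;
        *)  echo unknown ;;
    esac
}

# Decode the low three bits of an x86 page-fault error code given in hex.
decode_oops() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then p="protection violation"; else p="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then w="write"; else w="read"; fi
    if [ $((code & 4)) -ne 0 ]; then u="user mode"; else u="kernel mode"; fi
    echo "$w, $u, $p"
}

errno_name -12      # prints ENOMEM
errno_name -32      # prints EPIPE
decode_oops 0002    # prints: write, kernel mode, page not present
```

Read this way, the oops is a kernel-mode write to address 0x38, which matches RDI = 0x38 on entry to _spin_lock_irqsave in the register dump.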

-- 
MST - Michael S. Tsirkin
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general