On Tue, 2005-09-27 at 12:53, James Lentini wrote:
> On Tue, 27 Sep 2005, Hal Rosenstock wrote:
>
> > > Since we don't check for a kmalloc failure in DT_Tdep_PT_Printf, this
> > > oops occurs:
> > >
> > > > Sep 26 10:29:30 hal kernel: Unable to handle kernel NULL pointer
> > > > dereference at virtual address 00000004
> > >
> > > I've checked in the patch below to fix that, but this is not the root
> > > of the problem.
> >
> > I'll try it with the patch and let you know how it behaves. When it
> > still runs out of memory will it fail more gracefully ? I understand it
> > won't fix the root cause of running out of memory.
>
> It should behave more gracefully. Thanks for testing.
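
For anyone following the thread, the kind of guard discussed above (checking the allocation result before using the buffer) might look roughly like the userspace sketch below. This is not the kdapltest source or the actual patch: sketch_alloc, sketch_printf, simulate_oom and SKETCH_BUF_LEN are hypothetical names, and malloc stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_BUF_LEN 256   /* hypothetical buffer size */

/* Hypothetical switch used to simulate an allocation failure. */
static int simulate_oom = 0;

/* Stand-in for an allocator that may return NULL under memory pressure. */
static void *sketch_alloc(size_t n)
{
    return simulate_oom ? NULL : malloc(n);
}

/*
 * Format a message into a temporary heap buffer and print it.
 * Returns the value reported by vsnprintf, or -1 if the buffer
 * could not be allocated.
 */
static int sketch_printf(const char *fmt, ...)
{
    va_list ap;
    char *buf;
    int n;

    assert(fmt != NULL);

    buf = sketch_alloc(SKETCH_BUF_LEN);
    if (buf == NULL) {
        /* Without this check, the vsnprintf below would write through NULL. */
        fputs("sketch_printf: out of memory\n", stderr);
        return -1;
    }

    va_start(ap, fmt);
    n = vsnprintf(buf, SKETCH_BUF_LEN, fmt, ap);
    va_end(ap);

    fputs(buf, stdout);
    free(buf);
    return n;
}
```

With a check like this the caller sees an error return and a log message instead of an oops. It does nothing about why memory ran out in the first place.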

That seems better but I still see the following:

Sep 28 09:33:07 hal kernel: teback:0 unstable:0 free:420 slab:29838 mapped:28019 pagetables:487
Sep 28 09:33:07 hal kernel: DMA free:1008kB min:128kB low:160kB high:192kB active:3560kB inactive:1596kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 28 09:33:07 hal kernel: lowmem_reserve[]: 0 240 240
Sep 28 09:33:07 hal kernel: Normal free:672kB min:1920kB low:2400kB high:2880kB active:90152kB inactive:25992kB present:245760kB pages_scanned:91 all_unreclaimable? no
Sep 28 09:33:07 hal kernel: lowmem_reserve[]: 0 0 0
Sep 28 09:33:07 hal kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 28 09:33:07 hal kernel: lowmem_reserve[]: 0 0 0
Sep 28 09:33:07 hal kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB
Sep 28 09:33:07 hal kernel: Normal: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 672kB
Sep 28 09:33:07 hal kernel: HighMem: empty
Sep 28 09:33:07 hal kernel: Swap cache: add 19875, delete 15824, find 4973/5982, race 0+0
Sep 28 09:33:07 hal kernel: Free swap = 483892kB
Sep 28 09:33:07 hal kernel: Total swap = 522104kB
Sep 28 09:33:07 hal kernel: Free swap: 483892kB
Sep 28 09:33:09 hal kernel: 65536 pages of RAM
Sep 28 09:33:10 hal kernel: 0 pages of HIGHMEM
Sep 28 09:33:10 hal kernel: 1533 reserved pages
Sep 28 09:33:11 hal kernel: 47248 pages shared
Sep 28 09:33:11 hal kernel: 4051 pages swap cached
Sep 28 09:33:11 hal kernel: 0 pages dirty
Sep 28 09:33:11 hal kernel: 0 pages writeback
Sep 28 09:33:11 hal kernel: 28019 pages mapped
Sep 28 09:33:11 hal kernel: 29838 pages slab
Sep 28 09:33:11 hal kernel: 487 pages pagetables
Sep 28 09:33:11 hal kernel: DT_Tdep_PT_Printf: out of memory
Sep 28 09:33:11 hal kernel: DT_Mdep_Thread_: page allocation failure. order:0, mode:0x20
Sep 28 09:33:11 hal kernel: [<c014d512>] __alloc_pages+0x2f2/0x490
Sep 28 09:33:11 hal kernel: [<c0151001>] kmem_getpages+0x31/0xb0
Sep 28 09:33:11 hal kernel: [<c0152a79>] cache_grow+0x139/0x360
Sep 28 09:33:11 hal kernel: [<c0153251>] cache_alloc_refill+0x151/0x340
Sep 28 09:33:11 hal kernel: [<d0ace21a>] DT_handle_send_op+0x2fa/0x400 [kdapltest]
Sep 28 09:33:11 hal kernel: [<c0153b44>] __kmalloc+0xb4/0xf0
Sep 28 09:33:11 hal kernel: [<d0ad86d5>] DT_Mdep_Malloc+0x25/0x60 [kdapltest]
Sep 28 09:33:11 hal kernel: [<d0ad9566>] DT_Tdep_PT_Printf+0x16/0x1d0 [kdapltest]
Sep 28 09:33:11 hal kernel: [<d0acc9f8>] DT_Transaction_Run+0x2c8/0xb60 [kdapltest]
Sep 28 09:33:11 hal kernel: [<d0ad888d>] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
Sep 28 09:33:11 hal kernel: [<d0ad888d>] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
Sep 28 09:33:11 hal kernel: [<d0acb918>] DT_Transaction_Main+0x1388/0x21a0 [kdapltest]
Sep 28 09:33:11 hal kernel: [<c0113e3d>] __change_page_attr+0x2d/0x170
Sep 28 09:33:11 hal kernel: [<c0152eb6>] cache_free_debugcheck+0x196/0x2d0
Sep 28 09:33:11 hal kernel: [<d0ad878f>] DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest]
Sep 28 09:33:11 hal kernel: [<d0ad8770>] DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest]
Sep 28 09:33:11 hal kernel: [<c0100f75>] kernel_thread_helper+0x5/0x10
Sep 28 09:33:11 hal kernel: DMA per-cpu:
Sep 28 09:33:11 hal kernel: cpu 0 hot: low 2, high 6, batch 1 used:2
Sep 28 09:33:11 hal kernel: cpu 0 cold: low 0, high 2, batch 1 used:1
Sep 28 09:33:11 hal kernel: Normal per-cpu:
Sep 28 09:33:11 hal kernel: cpu 0 hot: low 62, high 186, batch 31 used:92
Sep 28 09:33:11 hal kernel: cpu 0 cold: low 0, high 62, batch 31 used:34
Sep 28 09:33:11 hal kernel: HighMem per-cpu: empty
Sep 28 09:33:11 hal kernel: Free pages: 1680kB (0kB HighMem)
Sep 28 09:33:11 hal kernel: Active:23428 inactive:6897 dirty:0 writeback:0 unstable:0 free:420 slab:29838 mapped:28019 pagetables:487
Sep 28 09:33:11 hal kernel: DMA free:1008kB min:128kB low:160kB high:192kB active:3560kB inactive:1596kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 28 09:33:11 hal kernel: lowmem_reserve[]: 0 240 240
Sep 28 09:33:11 hal kernel: Normal free:672kB min:1920kB low:2400kB high:2880kB active:90152kB inactive:25992kB present:245760kB pages_scanned:91 all_unreclaimable? no
Sep 28 09:33:11 hal kernel: lowmem_reserve[]: 0 0 0
Sep 28 09:33:11 hal kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 28 09:33:11 hal kernel: lowmem_reserve[]: 0 0 0
Sep 28 09:33:11 hal kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB
Sep 28 09:33:11 hal kernel: Normal: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 672kB
Sep 28 09:33:11 hal kernel: HighMem: empty
Sep 28 09:33:11 hal kernel: Swap cache: add 19875, delete 15824, find 4973/5982, race 0+0
Sep 28 09:33:11 hal kernel: Free swap = 483892kB
Sep 28 09:33:11 hal kernel: Total swap = 522104kB
Sep 28 09:33:11 hal kernel: Free swap: 483892kB
Sep 28 09:33:11 hal kernel: 65536 pages of RAM
Sep 28 09:33:11 hal kernel: 0 pages of HIGHMEM
Sep 28 09:33:12 hal kernel: 1533 reserved pages
Sep 28 09:33:12 hal kernel: 47248 pages shared
Sep 28 09:33:12 hal kernel: 4051 pages swap cached
Sep 28 09:33:12 hal kernel: 0 pages dirty
Sep 28 09:33:12 hal kernel: 0 pages writeback
Sep 28 09:33:12 hal kernel: 28019 pages mapped
Sep 28 09:33:12 hal kernel: 29838 pages slab
Sep 28 09:33:12 hal kernel: 487 pages pagetables
Sep 28 09:33:12 hal kernel: DT_Tdep_PT_Printf: out of memory
Sep 28 09:33:12 hal kernel: DT_Mdep_Thread_: page allocation failure. order:0, mode:0x20
Sep 28 09:33:12 hal kernel: [<c014d512>] __alloc_pages+0x2f2/0x490
Sep 28 09:33:12 hal kernel: [<c0151001>] kmem_getpages+0x31/0xb0
Sep 28 09:33:12 hal kernel: [<c0152a79>] cache_grow+0x139/0x360
Sep 28 09:33:12 hal kernel: [<c022981b>] vscnprintf+0x2b/0x40
Sep 28 09:33:12 hal kernel: [<c0153251>] cache_alloc_refill+0x151/0x340
Sep 28 09:33:12 hal kernel: [<c0153b44>] __kmalloc+0xb4/0xf0
Sep 28 09:33:12 hal kernel: [<d0ad86d5>] DT_Mdep_Malloc+0x25/0x60 [kdapltest]
Sep 28 09:33:12 hal kernel: [<d0ad86d5>] DT_Mdep_Malloc+0x25/0x60 [kdapltest]
Sep 28 09:33:12 hal kernel: <of memory
Sep 28 09:33:12 hal kernel: DT_Tdep_PT_Printf: out of memory
Sep 28 09:33:12 hal last message repeated 439 times

Also, I don't understand why:

kdapltest -T T -s <IP> -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR

would work and

kdapltest -T T -s <IP> -D mthca0a -d -i 10000 -w 8 client SR server SR

would fail. It seems the former is more strenuous (everything same but 2
threads and less iterations).

-- Hal
