Hi all,
I have a system running bacula backup for about a year with no problems.
The last two to three weeks i suddenly had occasional system crashes
(complete hangs where I need to reset).
I checked the syslog for any clues and found this error popping up several
times during backup:
[198937.504359] swapper: page allocation failure. order:0, mode:0x20
[198937.504359] Pid: 0, comm: swapper Not tainted 2.6.26-2-amd64 #1
[198937.504359]
[198937.504359] Call Trace:
[198937.504359] IRQ [80276c15]
__alloc_pages_internal+0x3a6/0x3bf
[198937.504359] [802957ec] kmem_getpages+0x96/0x15f
[198937.504359] [80295e7c] fallback_alloc+0x16b/0x1e1
[198937.504359] [80295a59] kmem_cache_alloc_node+0x105/0x138
[198937.504359] [803b6771] __alloc_skb+0x3c/0x12d
[198937.504359] [803b7707] __netdev_alloc_skb+0x29/0x43
[198937.504359] [a00bd4ee]
:e1000e:e1000_alloc_rx_buffers+0x83/0x1db
[198937.504359] [a00bd34f] :e1000e:e1000_clean_rx_irq+0x29f/0x2da
[198937.504359] [a00bbf74] :e1000e:e1000_clean+0x88/0x270
[198937.504359] [803bd297] net_rx_action+0xab/0x1da
[198937.504359] [802393a1] __do_softirq+0x5c/0xd1
[198937.504359] [8020d2dc] call_softirq+0x1c/0x28
[198937.504359] [8020f3e8] do_softirq+0x3c/0x81
[198937.504359] [802392ff] irq_exit+0x3f/0x85
[198937.504359] [8020f648] do_IRQ+0xb9/0xd9
[198937.504359] [80212b57] mwait_idle+0x0/0x4d
[198937.504359] [8020c47d] ret_from_intr+0x0/0x19
[198937.504359] EOI [8021a857] lapic_next_event+0x0/0x13
[198937.504359] [80212b98] mwait_idle+0x41/0x4d
[198937.504359] [8020ad04] cpu_idle+0x8e/0xb8
[198937.504359]
[198937.504359] Mem-info:
[198937.504359] Node 0 DMA per-cpu:
[198937.504359] CPU0: hi:0, btch: 1 usd: 0
[198937.504359] CPU1: hi:0, btch: 1 usd: 0
[198937.504359] CPU2: hi:0, btch: 1 usd: 0
[198937.504359] CPU3: hi:0, btch: 1 usd: 0
[198937.504359] Node 0 DMA32 per-cpu:
[198937.504359] CPU0: hi: 186, btch: 31 usd: 63
[198937.504359] CPU1: hi: 186, btch: 31 usd: 129
[198937.504359] CPU2: hi: 186, btch: 31 usd: 40
[198937.504359] CPU3: hi: 186, btch: 31 usd: 49
[198937.504359] Node 0 Normal per-cpu:
[198937.504359] CPU0: hi: 186, btch: 31 usd: 63
[198937.504359] CPU1: hi: 186, btch: 31 usd: 142
[198937.504359] CPU2: hi: 186, btch: 31 usd: 99
[198937.504359] CPU3: hi: 186, btch: 31 usd: 62
[198937.504359] Active:134041 inactive:3826671 dirty:20050 writeback:2
unstable:0
[198937.504359] free:17166 slab:124810 mapped:3927 pagetables:815 bounce:0
[198937.504359] Node 0 DMA free:11652kB min:8kB low:8kB high:12kB active:0kB
inactive:0kB present:10720kB pages_scanned:0 all_unreclaimable? yes
[198937.504359] lowmem_reserve[]: 0 3255 16133 16133
[198937.504359] Node 0 DMA32 free:52600kB min:3276kB low:4092kB high:4912kB
active:16220kB inactive:2819100kB present:664kB pages_scanned:0
all_unreclaimable? no
[198937.504359] lowmem_reserve[]: 0 0 12877 12877
[198937.504359] Node 0 Normal free:4412kB min:12968kB low:16208kB
high:19452kB active:519944kB inactive:12487584kB present:13186560kB
pages_scanned:0 all_unreclaimable? no
[198937.504359] lowmem_reserve[]: 0 0 0 0
[198937.504359] Node 0 DMA: 3*4kB 5*8kB 3*16kB 3*32kB 3*64kB 0*128kB 2*256kB
1*512kB 2*1024kB 0*2048kB 2*4096kB = 11652kB
[198937.504359] Node 0 DMA32: 74*4kB 32*8kB 33*16kB 307*32kB 161*64kB
151*128kB 23*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 52568kB
[198937.504359] Node 0 Normal: 0*4kB 1*8kB 0*16kB 0*32kB 0*64kB 2*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 4360kB
[198937.504359] 3849584 total pagecache pages
[198937.504359] Swap cache: add 366679, delete 356085, find 103268/139352
[198937.504359] Free swap = 3746948kB
[198937.504359] Total swap = 4096564kB
[198937.504359] 4194304 pages of RAM
[198937.504359] 75804 reserved pages
[198937.504359] 3821940 pages shared
[198937.504359] 10594 pages swap cached
swapper: page allocation failure sounds to me like the system is trying to
swap and that fails. I had 4Gb RAM and tried upgrading it to 16Gb (to see if
there was a RAM shortage) but to no avail. I keep getting these messages.
It seems like it can handle a bunch of these messages before the system
really hangs.
I tried upgrading bacula from version 3.0.2 to 5.0.2 (thought maybe of a
memory leak) also to no avail.
Ran dbcheck (no errors), ran mysql repair table (no errors).
I noticed the File database was kinda getting large (~4GB) so I reconfigured
mysql with the my-huge.cnf config also to no avail.
Is there anything i'm overlooking?
What else could be the cause of this behaviour?
I'm using debian lenny 2.6.26-2-amd64 kernel with bacula 5.0.2 (from the
backports repository)
I'm backing up ~50 clients daily incremental and weekly full.
Kind regards,
Caspar Smit
Systemengineer
True Bit Resources B.V.
Ampèrestraat 13E
1446