On 2/27/2012 10:17 AM, Donny Brooks wrote:
On 2/27/2012 10:15 AM, Donny Brooks wrote:
Here recently we have been having a major problem with high load on
our SOGo server. It will go from a normal load to unresponsive in
less than 5 minutes. If I don't catch it in time and stop/start the
services it will require I power the server off and restart it. The
last lines in the sogo.log before it hangs is this:
Feb 27 10:05:50 sogod [1877]: [WARN] <0x0xf8fc7a8[WOWatchDogChild]>
pid 3184 has been hanging in the same request for 1 minutes
Feb 27 10:05:54 sogod [1877]: [WARN] <0x0xf8db168[WOWatchDogChild]>
pid 3185 has been hanging in the same request for 1 minutes
Feb 27 10:05:54 sogod [1877]: [WARN] <0x0xf8fbae8[WOWatchDogChild]>
pid 3217 has been hanging in the same request for 1 minutes
Feb 27 10:05:54 sogod [1877]: [WARN] <0x0xf8f01e8[WOWatchDogChild]>
pid 1994 has been hanging in the same request for 4 minutes
Feb 27 10:05:55 sogod [1877]: [WARN] <0x0xf926298[WOWatchDogChild]>
pid 3054 has been hanging in the same request for 1 minutes
I have set the worker count to 8 (prefork setting) and the SvMemLimit
to 256. The machine has 6 cpu cores at 2.0Ghz each and I have given
it 6GB ram. I am at a loss. Any input would be great. Just really
looks bad that we did not have these issues until now, just when we
roll it out in production...figures huh.
Donny B.
Sorry forgot to say this is a Xen pv domu running Centos 5.7. Nothing
wrong with any of the other virtual machines except this one.
Ok, so this time I actually got the trace on the console when it died.
It appears that the sogod process is running out of memory, then eats up
the swap, and eventually crashes the machine. Here is what is displayed
on the console:
stopINFO: task kjournald:383 blocked for more than 120 seconds.ervice
memcached s
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff880001030860 0 383 7 405 354
(L-TLB)
ffff880179fc7dd0 0000000000000246 ffff880179fc7d60 ffffffff80262851
000000000000000a ffff88017ef970c0 ffff88017e8d47a0 0000000000001713
ffff88017ef972a8 ffffffff80215e2e
Call Trace:
[<ffffffff80262851>] __wait_on_bit+0x60/0x6e
[<ffffffff80215e2e>] sync_buffer+0x0/0x3f
[<ffffffff80288deb>] dequeue_task+0x18/0x37
[<ffffffff80288e32>] deactivate_task+0x28/0x5f
[<ffffffff8026ea05>] monotonic_clock+0x35/0x7b
[<ffffffff80263929>] _spin_lock_irqsave+0x9/0x14
[<ffffffff8803362d>] :jbd:journal_commit_transaction+0x173/0x10ae
[<ffffffff8029d756>] autoremove_wake_function+0x0/0x2e
[<ffffffff8024cc85>] try_to_del_timer_sync+0x7f/0x88
[<ffffffff8803765b>] :jbd:kjournald+0xc1/0x213
[<ffffffff8029d756>] autoremove_wake_function+0x0/0x2e
[<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
[<ffffffff8803759a>] :jbd:kjournald+0x0/0x213
[<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
[<ffffffff802339b7>] kthread+0xfe/0x132
[<ffffffff8025fb2c>] child_rip+0xa/0x12
[<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
[<ffffffff802338b9>] kthread+0x0/0x132
[<ffffffff8025fb22>] child_rip+0x0/0x12
sogod invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Call Trace:
[<ffffffff802c128f>] out_of_memory+0x8b/0x203
[<ffffffff8020faa8>] __alloc_pages+0x27f/0x308
[<ffffffff80213a19>] __do_page_cache_readahead+0xc8/0x1af
[<ffffffff8021428a>] filemap_nopage+0x14c/0x360
[<ffffffff80208e91>] __handle_mm_fault+0x444/0x144f
[<ffffffff8020622a>] hypercall_page+0x22a/0x1000
[<ffffffff80266d94>] do_page_fault+0xf72/0x131b
[<ffffffff8026ea05>] monotonic_clock+0x35/0x7b
[<ffffffff80261df5>] thread_return+0x6c/0x113
[<ffffffff8025f82b>] error_exit+0x0/0x6e
DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
cpu 2 hot: high 0, batch 1 used:0
cpu 2 cold: high 0, batch 1 used:0
cpu 3 hot: high 0, batch 1 used:0
cpu 3 cold: high 0, batch 1 used:0
cpu 4 hot: high 0, batch 1 used:0
cpu 4 cold: high 0, batch 1 used:0
cpu 5 hot: high 0, batch 1 used:0
cpu 5 cold: high 0, batch 1 used:0
DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:19
cpu 0 cold: high 62, batch 15 used:17
cpu 1 hot: high 186, batch 31 used:30
cpu 1 cold: high 62, batch 15 used:58
cpu 2 hot: high 186, batch 31 used:29
cpu 2 cold: high 62, batch 15 used:10
cpu 3 hot: high 186, batch 31 used:29
cpu 3 cold: high 62, batch 15 used:46
cpu 4 hot: high 186, batch 31 used:30
cpu 4 cold: high 62, batch 15 used:55
cpu 5 hot: high 186, batch 31 used:30
cpu 5 cold: high 62, batch 15 used:60
Normal per-cpu:
cpu 0 hot: high 186, batch 31 used:51
cpu 0 cold: high 62, batch 15 used:32
cpu 1 hot: high 186, batch 31 used:21
cpu 1 cold: high 62, batch 15 used:47
cpu 2 hot: high 186, batch 31 used:30
cpu 2 cold: high 62, batch 15 used:53
cpu 3 hot: high 186, batch 31 used:1
cpu 3 cold: high 62, batch 15 used:59
cpu 4 hot: high 186, batch 31 used:30
cpu 4 cold: high 62, batch 15 used:55
cpu 5 hot: high 186, batch 31 used:6
cpu 5 cold: high 62, batch 15 used:43
HighMem per-cpu: empty
Free pages: 21296kB (0kB HighMem)
Active:754455 inactive:752969 dirty:0 writeback:0 unstable:0 free:5324
slab:4880 mapped-file:2012 mapped-anon:1507934 pagetables:11784
DMA free:3296kB min:12kB low:12kB high:16kB active:0kB inactive:0kB
present:9084kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 4024 6052 6052
DMA32 free:14704kB min:6612kB low:8264kB high:9916kB active:2015296kB
inactive:1992804kB present:4120800kB pages_scanned:7209383
all_unreclaimable? yes
lowmem_reserve[]: 0 0 2027 2027
Normal free:3296kB min:3332kB low:4164kB high:4996kB active:1002524kB
inactive:1019072kB present:2076560kB pages_scanned:8197357
all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 2*8kB 1*16kB 2*32kB 2*64kB 2*128kB 1*256kB 1*512kB 2*1024kB
0*2048kB 0*4096kB = 3296kB
DMA32: 24*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB
0*1024kB 1*2048kB 3*4096kB = 14704kB
Normal: 20*4kB 8*8kB 51*16kB 17*32kB 0*64kB 0*128kB 1*256kB 1*512kB
1*1024kB 0*2048kB 0*4096kB = 3296kB
HighMem: empty
2769 pagecache pages
Swap cache: add 925839, delete 925152, find 194321/239169, race 0+383
Free swap = 0kB
Total swap = 2096472kB
Out of memory: Killed process 4047, UID 986, (sogod).
sogod: page allocation failure. order:0, mode:0x201d2
Call Trace:
[<ffffffff8020fb18>] __alloc_pages+0x2ef/0x308
[<ffffffff80213a19>] __do_page_cache_readahead+0xc8/0x1af
[<ffffffff8021428a>] filemap_nopage+0x14c/0x360
[<ffffffff80208e91>] __handle_mm_fault+0x444/0x144f
[<ffffffff8020622a>] hypercall_page+0x22a/0x1000
[<ffffffff80266d94>] do_page_fault+0xf72/0x131b
[<ffffffff8026ea05>] monotonic_clock+0x35/0x7b
[<ffffffff80261df5>] thread_return+0x6c/0x113
[<ffffffff8025f82b>] error_exit+0x0/0x6e
DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
cpu 2 hot: high 0, batch 1 used:0
cpu 2 cold: high 0, batch 1 used:0
cpu 3 hot: high 0, batch 1 used:0
cpu 3 cold: high 0, batch 1 used:0
cpu 4 hot: high 0, batch 1 used:0
cpu 4 cold: high 0, batch 1 used:0
cpu 5 hot: high 0, batch 1 used:0
cpu 5 cold: high 0, batch 1 used:0
DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:19
cpu 0 cold: high 62, batch 15 used:20
cpu 1 hot: high 186, batch 31 used:30
cpu 1 cold: high 62, batch 15 used:58
cpu 2 hot: high 186, batch 31 used:29
cpu 2 cold: high 62, batch 15 used:11
cpu 3 hot: high 186, batch 31 used:29
cpu 3 cold: high 62, batch 15 used:47
cpu 4 hot: high 186, batch 31 used:30
cpu 4 cold: high 62, batch 15 used:59
cpu 5 hot: high 186, batch 31 used:30
cpu 5 cold: high 62, batch 15 used:61
Normal per-cpu:
cpu 0 hot: high 186, batch 31 used:53
cpu 0 cold: high 62, batch 15 used:32
cpu 1 hot: high 186, batch 31 used:21
cpu 1 cold: high 62, batch 15 used:47
cpu 2 hot: high 186, batch 31 used:30
cpu 2 cold: high 62, batch 15 used:53
cpu 3 hot: high 186, batch 31 used:1
cpu 3 cold: high 62, batch 15 used:59
cpu 4 hot: high 186, batch 31 used:30
cpu 4 cold: high 62, batch 15 used:55
cpu 5 hot: high 186, batch 31 used:6
cpu 5 cold: high 62, batch 15 used:43
HighMem per-cpu: empty
Free pages: 21296kB (0kB HighMem)
Active:761008 inactive:746379 dirty:0 writeback:0 unstable:0 free:5324
slab:4878 mapped-file:2006 mapped-anon:1507929 pagetables:11784
DMA free:3296kB min:12kB low:12kB high:16kB active:0kB inactive:0kB
present:9084kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 4024 6052 6052
DMA32 free:14704kB min:6612kB low:8264kB high:9916kB active:2036812kB
inactive:1971388kB present:4120800kB pages_scanned:9995769
all_unreclaimable? yes
lowmem_reserve[]: 0 0 2027 2027
Normal free:3296kB min:3332kB low:4164kB high:4996kB active:1007220kB
inactive:1014128kB present:2076560kB pages_scanned:8600508
all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 2*8kB 1*16kB 2*32kB 2*64kB 2*128kB 1*256kB 1*512kB 2*1024kB
0*2048kB 0*4096kB = 3296kB
DMA32: 24*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB
0*1024kB 1*2048kB 3*4096kB = 14704kB
Normal: 20*4kB 8*8kB 51*16kB 17*32kB 0*64kB 0*128kB 1*256kB 1*512kB
1*1024kB 0*2048kB 0*4096kB = 3296kB
HighMem: empty
2758 pagecache pages
Swap cache: add 925839, delete 925157, find 194321/239169, race 0+383
Free swap = 0kB
Total swap = 2096472kB
Out of memory: Killed process 2015, UID 986, (sogod).
--
[email protected]
https://inverse.ca/sogo/lists