On 2/27/2012 10:17 AM, Donny Brooks wrote:
On 2/27/2012 10:15 AM, Donny Brooks wrote:
Here recently we have been having a major problem with high load on our SOGo server. It will go from a normal load to unresponsive in less than 5 minutes. If I don't catch it in time and stop/start the services it will require I power the server off and restart it. The last lines in the sogo.log before it hangs is this:

Feb 27 10:05:50 sogod [1877]: [WARN] <0x0xf8fc7a8[WOWatchDogChild]> pid 3184 has been hanging in the same request for 1 minutes Feb 27 10:05:54 sogod [1877]: [WARN] <0x0xf8db168[WOWatchDogChild]> pid 3185 has been hanging in the same request for 1 minutes Feb 27 10:05:54 sogod [1877]: [WARN] <0x0xf8fbae8[WOWatchDogChild]> pid 3217 has been hanging in the same request for 1 minutes Feb 27 10:05:54 sogod [1877]: [WARN] <0x0xf8f01e8[WOWatchDogChild]> pid 1994 has been hanging in the same request for 4 minutes Feb 27 10:05:55 sogod [1877]: [WARN] <0x0xf926298[WOWatchDogChild]> pid 3054 has been hanging in the same request for 1 minutes


I have set the worker count to 8 (prefork setting) and the SvMemLimit to 256. The machine has 6 cpu cores at 2.0Ghz each and I have given it 6GB ram. I am at a loss. Any input would be great. Just really looks bad that we did not have these issues until now, just when we roll it out in production...figures huh.

Donny B.
Sorry forgot to say this is a Xen pv domu running Centos 5.7. Nothing wrong with any of the other virtual machines except this one.


Ok, so this time I actually got the trace on the console when it died. It appears that the sogod process is running out of memory, then eats up the swap, and eventually crashes the machine. Here is what is displayed on the console:


stopINFO: task kjournald:383 blocked for more than 120 seconds.ervice memcached s
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff880001030860 0 383 7 405 354 (L-TLB)
 ffff880179fc7dd0  0000000000000246  ffff880179fc7d60  ffffffff80262851
 000000000000000a  ffff88017ef970c0  ffff88017e8d47a0  0000000000001713
 ffff88017ef972a8  ffffffff80215e2e
Call Trace:
 [<ffffffff80262851>] __wait_on_bit+0x60/0x6e
 [<ffffffff80215e2e>] sync_buffer+0x0/0x3f
 [<ffffffff80288deb>] dequeue_task+0x18/0x37
 [<ffffffff80288e32>] deactivate_task+0x28/0x5f
 [<ffffffff8026ea05>] monotonic_clock+0x35/0x7b
 [<ffffffff80263929>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8803362d>] :jbd:journal_commit_transaction+0x173/0x10ae
 [<ffffffff8029d756>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8024cc85>] try_to_del_timer_sync+0x7f/0x88
 [<ffffffff8803765b>] :jbd:kjournald+0xc1/0x213
 [<ffffffff8029d756>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8803759a>] :jbd:kjournald+0x0/0x213
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff802339b7>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff802338b9>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12

sogod invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff802c128f>] out_of_memory+0x8b/0x203
 [<ffffffff8020faa8>] __alloc_pages+0x27f/0x308
 [<ffffffff80213a19>] __do_page_cache_readahead+0xc8/0x1af
 [<ffffffff8021428a>] filemap_nopage+0x14c/0x360
 [<ffffffff80208e91>] __handle_mm_fault+0x444/0x144f
 [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
 [<ffffffff80266d94>] do_page_fault+0xf72/0x131b
 [<ffffffff8026ea05>] monotonic_clock+0x35/0x7b
 [<ffffffff80261df5>] thread_return+0x6c/0x113
 [<ffffffff8025f82b>] error_exit+0x0/0x6e

DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
cpu 2 hot: high 0, batch 1 used:0
cpu 2 cold: high 0, batch 1 used:0
cpu 3 hot: high 0, batch 1 used:0
cpu 3 cold: high 0, batch 1 used:0
cpu 4 hot: high 0, batch 1 used:0
cpu 4 cold: high 0, batch 1 used:0
cpu 5 hot: high 0, batch 1 used:0
cpu 5 cold: high 0, batch 1 used:0
DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:19
cpu 0 cold: high 62, batch 15 used:17
cpu 1 hot: high 186, batch 31 used:30
cpu 1 cold: high 62, batch 15 used:58
cpu 2 hot: high 186, batch 31 used:29
cpu 2 cold: high 62, batch 15 used:10
cpu 3 hot: high 186, batch 31 used:29
cpu 3 cold: high 62, batch 15 used:46
cpu 4 hot: high 186, batch 31 used:30
cpu 4 cold: high 62, batch 15 used:55
cpu 5 hot: high 186, batch 31 used:30
cpu 5 cold: high 62, batch 15 used:60
Normal per-cpu:
cpu 0 hot: high 186, batch 31 used:51
cpu 0 cold: high 62, batch 15 used:32
cpu 1 hot: high 186, batch 31 used:21
cpu 1 cold: high 62, batch 15 used:47
cpu 2 hot: high 186, batch 31 used:30
cpu 2 cold: high 62, batch 15 used:53
cpu 3 hot: high 186, batch 31 used:1
cpu 3 cold: high 62, batch 15 used:59
cpu 4 hot: high 186, batch 31 used:30
cpu 4 cold: high 62, batch 15 used:55
cpu 5 hot: high 186, batch 31 used:6
cpu 5 cold: high 62, batch 15 used:43
HighMem per-cpu: empty
Free pages:       21296kB (0kB HighMem)
Active:754455 inactive:752969 dirty:0 writeback:0 unstable:0 free:5324 slab:4880 mapped-file:2012 mapped-anon:1507934 pagetables:11784 DMA free:3296kB min:12kB low:12kB high:16kB active:0kB inactive:0kB present:9084kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 4024 6052 6052
DMA32 free:14704kB min:6612kB low:8264kB high:9916kB active:2015296kB inactive:1992804kB present:4120800kB pages_scanned:7209383 all_unreclaimable? yes
lowmem_reserve[]: 0 0 2027 2027
Normal free:3296kB min:3332kB low:4164kB high:4996kB active:1002524kB inactive:1019072kB present:2076560kB pages_scanned:8197357 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 2*8kB 1*16kB 2*32kB 2*64kB 2*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 0*4096kB = 3296kB DMA32: 24*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 3*4096kB = 14704kB Normal: 20*4kB 8*8kB 51*16kB 17*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3296kB
HighMem: empty
2769 pagecache pages
Swap cache: add 925839, delete 925152, find 194321/239169, race 0+383
Free swap  = 0kB
Total swap = 2096472kB
Out of memory: Killed process 4047, UID 986, (sogod).
sogod: page allocation failure. order:0, mode:0x201d2

Call Trace:
 [<ffffffff8020fb18>] __alloc_pages+0x2ef/0x308
 [<ffffffff80213a19>] __do_page_cache_readahead+0xc8/0x1af
 [<ffffffff8021428a>] filemap_nopage+0x14c/0x360
 [<ffffffff80208e91>] __handle_mm_fault+0x444/0x144f
 [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
 [<ffffffff80266d94>] do_page_fault+0xf72/0x131b
 [<ffffffff8026ea05>] monotonic_clock+0x35/0x7b
 [<ffffffff80261df5>] thread_return+0x6c/0x113
 [<ffffffff8025f82b>] error_exit+0x0/0x6e

DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
cpu 2 hot: high 0, batch 1 used:0
cpu 2 cold: high 0, batch 1 used:0
cpu 3 hot: high 0, batch 1 used:0
cpu 3 cold: high 0, batch 1 used:0
cpu 4 hot: high 0, batch 1 used:0
cpu 4 cold: high 0, batch 1 used:0
cpu 5 hot: high 0, batch 1 used:0
cpu 5 cold: high 0, batch 1 used:0
DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:19
cpu 0 cold: high 62, batch 15 used:20
cpu 1 hot: high 186, batch 31 used:30
cpu 1 cold: high 62, batch 15 used:58
cpu 2 hot: high 186, batch 31 used:29
cpu 2 cold: high 62, batch 15 used:11
cpu 3 hot: high 186, batch 31 used:29
cpu 3 cold: high 62, batch 15 used:47
cpu 4 hot: high 186, batch 31 used:30
cpu 4 cold: high 62, batch 15 used:59
cpu 5 hot: high 186, batch 31 used:30
cpu 5 cold: high 62, batch 15 used:61
Normal per-cpu:
cpu 0 hot: high 186, batch 31 used:53
cpu 0 cold: high 62, batch 15 used:32
cpu 1 hot: high 186, batch 31 used:21
cpu 1 cold: high 62, batch 15 used:47
cpu 2 hot: high 186, batch 31 used:30
cpu 2 cold: high 62, batch 15 used:53
cpu 3 hot: high 186, batch 31 used:1
cpu 3 cold: high 62, batch 15 used:59
cpu 4 hot: high 186, batch 31 used:30
cpu 4 cold: high 62, batch 15 used:55
cpu 5 hot: high 186, batch 31 used:6
cpu 5 cold: high 62, batch 15 used:43
HighMem per-cpu: empty
Free pages:       21296kB (0kB HighMem)
Active:761008 inactive:746379 dirty:0 writeback:0 unstable:0 free:5324 slab:4878 mapped-file:2006 mapped-anon:1507929 pagetables:11784 DMA free:3296kB min:12kB low:12kB high:16kB active:0kB inactive:0kB present:9084kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 4024 6052 6052
DMA32 free:14704kB min:6612kB low:8264kB high:9916kB active:2036812kB inactive:1971388kB present:4120800kB pages_scanned:9995769 all_unreclaimable? yes
lowmem_reserve[]: 0 0 2027 2027
Normal free:3296kB min:3332kB low:4164kB high:4996kB active:1007220kB inactive:1014128kB present:2076560kB pages_scanned:8600508 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 2*8kB 1*16kB 2*32kB 2*64kB 2*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 0*4096kB = 3296kB DMA32: 24*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 3*4096kB = 14704kB Normal: 20*4kB 8*8kB 51*16kB 17*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3296kB
HighMem: empty
2758 pagecache pages
Swap cache: add 925839, delete 925157, find 194321/239169, race 0+383
Free swap  = 0kB
Total swap = 2096472kB
Out of memory: Killed process 2015, UID 986, (sogod).
--
[email protected]
https://inverse.ca/sogo/lists

Reply via email to