On 07/17/2012 11:46 AM, Stephan Wiesand wrote:
On Jul 17, 2012, at 19:22 , Orion Poplawski wrote:
Our SL6.2 KVM and NFS/backup server has been crashing frequently recently
(starting around Fri 13th - yikes!) with "Kernel panic - Out of memory and no
killable processes". The server has 48GB of RAM and 2GB of swap, with only
about 15GB dedicated to VM guests. I've tried bumping vm.min_free_kbytes up
to 262144 to no avail. Nothing strange is written to the logs before the crash.
Hmm, I suppose bumping up min_free_kbytes might be making things worse?
Happening with both 2.6.32-220.23.1 and 2.6.32-279.1.1.
Anyone else seeing this?
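For context, a min_free_kbytes change like the one mentioned above (262144 KB, i.e. 256 MB kept free for the kernel's atomic/emergency allocations) is typically applied along these lines; generic commands, not taken from the thread:

```shell
# 262144 KB is 256 MB reserved for atomic/emergency allocations:
echo $((262144 / 1024))   # -> 256 (MB equivalent)
# Apply at runtime (needs root):
sysctl -w vm.min_free_kbytes=262144
# Read back the current value:
sysctl -n vm.min_free_kbytes
# Persist across reboots by adding to /etc/sysctl.conf:
#   vm.min_free_kbytes = 262144
```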
Not on our KVM servers (which don't have any other duties though), which have
been running -220.23.1 for three weeks.
Any other ideas?
Is swap space sufficient?
It was 2GB, but barely used; the system should have more than enough RAM.
I've upped it to 8GB.
Have you modified vm.overcommit_* ? Doing so may help turn the panics into
allocation failures that can be handled.
Haven't modified them:
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.nr_overcommit_hugepages = 0
I suppose:
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
would limit total committed memory to about 46.4GB (8GB swap plus 80% of
48GB RAM), which should be safe. I might try that next.
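With vm.overcommit_memory=2 the kernel enforces CommitLimit = swap + RAM * overcommit_ratio/100. A quick check of that arithmetic, and of the live limit the kernel computes (generic commands; the sysctl values are the ones proposed above, not yet applied):

```shell
# CommitLimit = swap + RAM * ratio/100 = 8 + 48 * 0.80 = 46.4 GB:
awk 'BEGIN { printf "%.1f GB\n", 8 + 48 * 80 / 100 }'
# After setting vm.overcommit_memory=2 and vm.overcommit_ratio=80,
# the kernel reports the resulting limit and the current commit charge:
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
```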
Do any slab pools keep growing to an unusual size?
Here's what I have shortly after reboot. I'll keep watching it.
Active / Total Objects (% used) : 1500116 / 1526912 (98.2%)
Active / Total Slabs (% used) : 37344 / 37481 (99.6%)
Active / Total Caches (% used) : 134 / 204 (65.7%)
Active / Total Size (% used) : 147024.04K / 152389.66K (96.5%)
Minimum / Average / Maximum Object : 0.02K / 0.10K / 4096.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
123540 123283 99% 0.19K 6177 20 24708K size-192
236059 235736 99% 0.06K 4001 59 16004K ksm_rmap_item
484128 484094 99% 0.02K 3362 144 13448K avtab_node
203 203 100% 32.12K 203 1 12992K kmem_cache
341936 341360 99% 0.03K 3053 112 12212K size-32
14136 14109 99% 0.58K 2356 6 9424K inode_cache
71595 71595 100% 0.10K 1935 37 7740K buffer_head
31500 31078 98% 0.19K 1575 20 6300K dentry
10857 10857 100% 0.55K 1551 7 6204K radix_tree_node
5140 4839 94% 1.00K 1285 4 5140K size-1024
4480 4462 99% 1.00K 1120 4 4480K ext4_inode_cache
5772 5684 98% 0.62K 962 6 3848K proc_inode_cache
24300 24269 99% 0.14K 900 27 3600K sysfs_dir_cache
1558 1348 86% 2.00K 779 2 3116K size-2048
13794 13421 97% 0.20K 726 19 2904K vm_area_struct
1074 1050 97% 2.59K 358 3 2864K task_struct
699 699 100% 4.00K 699 1 2796K size-4096
975 951 97% 2.06K 325 3 2600K sighand_cache
4536 3262 71% 0.50K 567 8 2268K size-512
17 17 100% 128.00K 17 1 2176K size-131072
27401 27229 99% 0.07K 517 53 2068K selinux_inode_security
2255 2232 98% 0.78K 451 5 1804K shmem_inode_cache
22007 21365 97% 0.06K 373 59 1492K size-64
10950 9234 84% 0.12K 365 30 1460K size-128
22 22 100% 64.00K 22 1 1408K size-65536
326 326 100% 4.00K 326 1 1304K biovec-256
5720 4125 72% 0.19K 286 20 1144K filp
1020 985 96% 1.00K 255 4 1020K signal_cache
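The listing above is slabtop output; a similar ranking can be pulled straight from /proc/slabinfo by approximating each cache's object footprint (a sketch, assuming the standard slabinfo column layout where field 3 is total objects and field 4 is the object size in bytes; this approximates, but doesn't exactly match, slabtop's CACHE SIZE column):

```shell
# Rank slab caches by approximate object footprint (num_objs * objsize),
# skipping the two /proc/slabinfo header lines:
awk 'NR > 2 { print int($3 * $4 / 1024) "K", $1 }' /proc/slabinfo \
    | sort -rn | head -20
```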
After some disk activity I'm at:
Active / Total Objects (% used) : 4829537 / 4855308 (99.5%)
Active / Total Slabs (% used) : 163899 / 163988 (99.9%)
Active / Total Caches (% used) : 132 / 204 (64.7%)
Active / Total Size (% used) : 630344.49K / 634988.70K (99.3%)
Minimum / Average / Maximum Object : 0.02K / 0.13K / 4096.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
2918893 2918736 99% 0.10K 78889 37 315556K buffer_head
112616 112599 99% 1.00K 28154 4 112616K ext4_inode_cache
98637 98563 99% 0.55K 14091 7 56364K radix_tree_node
165540 165060 99% 0.19K 8277 20 33108K dentry
123520 123363 99% 0.19K 6176 20 24704K size-192
236059 235736 99% 0.06K 4001 59 16004K ksm_rmap_item
484128 484094 99% 0.02K 3362 144 13448K avtab_node
203 203 100% 32.12K 203 1 12992K kmem_cache
342384 341570 99% 0.03K 3057 112 12228K size-32
139019 138470 99% 0.07K 2623 53 10492K selinux_inode_security
Still watching it...
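One way to watch which pools keep growing, as asked above, is to diff two /proc/slabinfo snapshots taken some time apart (a generic sketch; the file names and the one-hour interval are illustrative):

```shell
# Snapshot cache name and total object bytes, sorted so join works:
awk 'NR > 2 { print $1, $3 * $4 }' /proc/slabinfo | sort > /tmp/slab.before
sleep 3600    # let the workload run for a while
awk 'NR > 2 { print $1, $3 * $4 }' /proc/slabinfo | sort > /tmp/slab.after
# Print per-cache growth in KB, fastest growers first:
join /tmp/slab.before /tmp/slab.after \
    | awk '{ printf "%.0f KB %s\n", ($3 - $2) / 1024, $1 }' \
    | sort -rn | head
```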
--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA, Boulder Office FAX: 303-415-9702
3380 Mitchell Lane [email protected]
Boulder, CO 80301 http://www.nwra.com