On 07/17/2012 11:46 AM, Stephan Wiesand wrote:
On Jul 17, 2012, at 19:22 , Orion Poplawski wrote:

Our SL6.2 KVM and nfs/backup server has been crashing frequently recently 
(starting around Fri 13th - yikes!) with Kernel panic - Out of memory and no 
killable processes.  The server has 48GB ram, 2GB swap, only about 15GB 
dedicated to VM guests.  I've tried bumping up vm.min_free_kbytes to 262144 to 
no avail.  Nothing strange is getting written to the logs before the crash.

Hmm, I suppose bumping up min_free_kbytes might be making things worse?

Happening with both 2.6.32-220.23.1 and 2.6.32-279.1.1.

Anyone else seeing this?

Not on our KVM servers (which don't have any other duties though), which have 
been running -220.23.1 for three weeks.

  Any other ideas?

Is swap space sufficient?

It was 2GB, but barely used. The system should have way more RAM than needed. Upped to 8GB.

Have you modified vm.overcommit_* ? Doing so may help turn the panics into 
allocation failures that can be handled.
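
To see how close a box is to the commit limit before changing anything, one option is to compare the CommitLimit and Committed_AS fields in /proc/meminfo. Below is a rough Python sketch; the helper names parse_meminfo and commit_headroom_kb are made up for illustration and are not part of any existing tool. Note that the kernel only enforces CommitLimit when vm.overcommit_memory = 2, although the field is always reported.

```python
# Hypothetical helpers (names invented for this sketch).

def parse_meminfo(text):
    """Return {field: value in kB} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name:   12345 kB" style lines with a numeric value.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def commit_headroom_kb(info):
    """kB that could still be committed before reaching CommitLimit."""
    return info["CommitLimit"] - info["Committed_AS"]


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("CommitLimit  (kB):", info["CommitLimit"])
    print("Committed_AS (kB):", info["Committed_AS"])
    print("Headroom     (kB):", commit_headroom_kb(info))
```

A negative headroom under the default heuristic mode (vm.overcommit_memory = 0) would suggest that switching to strict accounting could make allocations start failing right away.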


Haven't modified them:

vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.nr_overcommit_hugepages = 0

I suppose:

vm.overcommit_memory = 2
vm.overcommit_ratio = 80

would limit total committed memory to about 46.4GB (8GB swap + 80% of 48GB RAM), which should be safe. I might try that next.
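
For reference, the 46.4GB figure follows from the strict-accounting formula CommitLimit = swap + (RAM - huge pages) * overcommit_ratio / 100, using the new 8GB of swap. A small Python sketch of that arithmetic, where the function name commit_limit_gb is made up for illustration:

```python
# Hypothetical helper (name invented for this sketch).

def commit_limit_gb(ram_gb, swap_gb, overcommit_ratio, hugepages_gb=0.0):
    """Commit limit in GB when vm.overcommit_memory = 2.

    Memory reserved for huge pages does not count toward the limit.
    """
    return swap_gb + (ram_gb - hugepages_gb) * overcommit_ratio / 100.0


if __name__ == "__main__":
    # 48GB RAM, 8GB swap, vm.overcommit_ratio = 80
    print(round(commit_limit_gb(48, 8, 80), 1))
    # Same box with the original 2GB of swap
    print(round(commit_limit_gb(48, 2, 80), 1))
```

With the numbers from this thread the first call gives 46.4; with the original 2GB of swap the limit would have been 40.4.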

Do any slab pools keep growing, to an unusual size?


Here's what I have shortly after reboot.  I'll keep watching it.

 Active / Total Objects (% used)    : 1500116 / 1526912 (98.2%)
 Active / Total Slabs (% used)      : 37344 / 37481 (99.6%)
 Active / Total Caches (% used)     : 134 / 204 (65.7%)
 Active / Total Size (% used)       : 147024.04K / 152389.66K (96.5%)
 Minimum / Average / Maximum Object : 0.02K / 0.10K / 4096.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
123540 123283  99%    0.19K   6177       20     24708K size-192
236059 235736  99%    0.06K   4001       59     16004K ksm_rmap_item
484128 484094  99%    0.02K   3362      144     13448K avtab_node
   203    203 100%   32.12K    203        1     12992K kmem_cache
341936 341360  99%    0.03K   3053      112     12212K size-32
 14136  14109  99%    0.58K   2356        6      9424K inode_cache
 71595  71595 100%    0.10K   1935       37      7740K buffer_head
 31500  31078  98%    0.19K   1575       20      6300K dentry
 10857  10857 100%    0.55K   1551        7      6204K radix_tree_node
  5140   4839  94%    1.00K   1285        4      5140K size-1024
  4480   4462  99%    1.00K   1120        4      4480K ext4_inode_cache
  5772   5684  98%    0.62K    962        6      3848K proc_inode_cache
 24300  24269  99%    0.14K    900       27      3600K sysfs_dir_cache
  1558   1348  86%    2.00K    779        2      3116K size-2048
 13794  13421  97%    0.20K    726       19      2904K vm_area_struct
  1074   1050  97%    2.59K    358        3      2864K task_struct
   699    699 100%    4.00K    699        1      2796K size-4096
   975    951  97%    2.06K    325        3      2600K sighand_cache
  4536   3262  71%    0.50K    567        8      2268K size-512
    17     17 100%  128.00K     17        1      2176K size-131072
 27401  27229  99%    0.07K    517       53      2068K selinux_inode_security
  2255   2232  98%    0.78K    451        5      1804K shmem_inode_cache
 22007  21365  97%    0.06K    373       59      1492K size-64
 10950   9234  84%    0.12K    365       30      1460K size-128
    22     22 100%   64.00K     22        1      1408K size-65536
   326    326 100%    4.00K    326        1      1304K biovec-256
  5720   4125  72%    0.19K    286       20      1144K filp
  1020    985  96%    1.00K    255        4      1020K signal_cache

After some disk activity I'm at:

 Active / Total Objects (% used)    : 4829537 / 4855308 (99.5%)
 Active / Total Slabs (% used)      : 163899 / 163988 (99.9%)
 Active / Total Caches (% used)     : 132 / 204 (64.7%)
 Active / Total Size (% used)       : 630344.49K / 634988.70K (99.3%)
 Minimum / Average / Maximum Object : 0.02K / 0.13K / 4096.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
2918893 2918736  99%    0.10K  78889       37    315556K buffer_head
112616 112599  99%    1.00K  28154        4    112616K ext4_inode_cache
 98637  98563  99%    0.55K  14091        7     56364K radix_tree_node
165540 165060  99%    0.19K   8277       20     33108K dentry
123520 123363  99%    0.19K   6176       20     24704K size-192
236059 235736  99%    0.06K   4001       59     16004K ksm_rmap_item
484128 484094  99%    0.02K   3362      144     13448K avtab_node
   203    203 100%   32.12K    203        1     12992K kmem_cache
342384 341570  99%    0.03K   3057      112     12228K size-32
139019 138470  99%    0.07K   2623       53     10492K selinux_inode_security

Still watching it...
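
To make comparing snapshots like the two above less tedious, something along these lines could diff two saved slabtop outputs and list the caches that grew the most. This is only a sketch: the function names and the 10MB threshold are arbitrary choices for illustration, and it assumes the eight-column row layout shown above with cache sizes reported in K.

```python
# Hypothetical helpers (names invented for this sketch).

def parse_slabtop(text):
    """Map cache name -> cache size in K from slabtop-style data rows."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows: OBJS ACTIVE USE OBJ_SIZE SLABS OBJ/SLAB CACHE_SIZE NAME
        # Summary and header lines fail one of these checks and are skipped.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        cache_size = fields[6]
        if not (cache_size.endswith("K") and cache_size[:-1].isdigit()):
            continue
        sizes[fields[7]] = int(cache_size[:-1])
    return sizes


def growth(before, after, min_kb=10240):
    """Return (name, old_kb, new_kb, delta_kb) for caches that grew by
    at least min_kb, largest growth first."""
    rows = []
    for name, new in after.items():
        old = before.get(name, 0)
        if new - old >= min_kb:
            rows.append((name, old, new, new - old))
    return sorted(rows, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    import sys
    # Usage: script.py before.txt after.txt (files saved from slabtop output)
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for name, old, new, delta in growth(parse_slabtop(f1.read()),
                                            parse_slabtop(f2.read())):
            print(f"{name:28s} {old:>10d}K -> {new:>10d}K (+{delta}K)")
```

Run against the two listings above, this would flag buffer_head, ext4_inode_cache, radix_tree_node and dentry, which fits ordinary page cache and inode growth after disk activity.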

--
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder Office                  FAX: 303-415-9702
3380 Mitchell Lane                       [email protected]
Boulder, CO 80301                   http://www.nwra.com
