2013/5/12 Mihai Rotaru <[email protected]>:
> Hi,
>
> I am facing a rather bizarre problem in which I cannot figure out where
> about 20 GB of memory "disappear". I tested with RedHat 6.4 (default
> 2.6.x kernel) and with Debian 7.0 (default 3.x kernel), and the problem
> shows up on both distributions.
>
> I am running an application on a dedicated server with 64 GB of RAM.
> 5 million clients connect to the application over TCP and stay
> connected.
>
> The kernel uses 16 GB of memory for SLAB for the 5 million sockets
> (see "slabtop" and Slab in "/proc/meminfo").
>
> The application uses around 25 GB (see RES in "top" and Active in
> "/proc/meminfo").
>
> So: 25 GB used by the application + 16 GB used by the kernel (slab)
> = 41 GB. Out of the 64 GB total, that should leave about 23 GB of free
> memory. Yet only 2 GB remain free, as reported by "vmstat" and "top".
> I tested this (by trying to connect more clients), and the real free
> memory is indeed what "vmstat" and "top" report, i.e. 2 GB.
>
> Does anyone have any idea where those roughly 20 GB "disappeared" to?
> More precisely: what is using those 20 GB of RAM, and why does Linux
> not report them, or, if it does report them, where? Below is the output
> of top, slabtop, vmstat, and /proc/meminfo.
>
> Thanks a lot,
> Mihai
>
> -bash-4.1# more /proc/meminfo
> MemTotal:       65956340 kB
> MemFree:         2867416 kB
> Buffers:           23972 kB
> Cached:            75244 kB
> SwapCached:            0 kB
> Active:         25330980 kB
> Inactive:          67204 kB
> Active(anon):   25299084 kB
> Inactive(anon):       80 kB
> Active(file):      31896 kB
> Inactive(file):    67124 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:      33046520 kB
> SwapFree:       33046520 kB
> Dirty:                12 kB
> Writeback:             0 kB
> AnonPages:      25298960 kB
> Mapped:            14084 kB
> Shmem:               204 kB
> Slab:           17095636 kB
> SReclaimable:    5011828 kB
> SUnreclaim:     12083808 kB
> KernelStack:        2568 kB
> PageTables:        51892 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:    66024688 kB
> Committed_AS:   27313564 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:      457356 kB
> VmallocChunk:   34325066836 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:  25171968 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> DirectMap4k:        5604 kB
> DirectMap2M:     2078720 kB
> DirectMap1G:    65011712 kB
>
> -bash-4.1# slabtop
>
>  Active / Total Objects (% used)    : 30118719 / 30135494 (99.9%)
>  Active / Total Slabs (% used)      : 4269889 / 4269910 (100.0%)
>  Active / Total Caches (% used)     : 99 / 181 (54.7%)
>  Active / Total Size (% used)       : 15391683.09K / 15394406.68K (100.0%)
>  Minimum / Average / Maximum Object : 0.02K / 0.51K / 4096.00K
>
>     OBJS  ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
>  5012780 5012730  99%    0.19K  250639       20   1002556K dentry
>  5001540 5001512  99%    0.19K  250077       20   1000308K filp
>  5000285 5000025  99%    0.07K   94345       53    377380K eventpoll_pwq
>  5000165 5000161  99%    0.69K 1000033        5   4000132K sock_inode_cache
>  5000160 5000025  99%    0.12K  166672       30    666688K eventpoll_epi
>  5000018 5000012  99%    1.81K 2500009        2  10000036K TCPv6
>    20832   20334  97%    0.03K     186      112       744K size-32
>    14337   14197  99%    0.14K     531       27      2124K sysfs_dir_cache
>    13570   10962  80%    0.06K     230       59       920K size-64
>     8843    8820  99%    0.10K     239       37       956K buffer_head
>     8268    7879  95%    0.07K     156       53       624K selinux_inode_security
>
> -bash-4.1# top
>
> top - 17:53:33 up 6:57, 5 users, load average: 0.00, 0.03, 0.07
> Tasks: 255 total, 1 running, 254 sleeping, 0 stopped, 0 zombie
> Cpu(s): 10.8%us, 8.1%sy, 0.0%ni, 74.4%id, 0.0%wa, 0.0%hi, 6.7%si, 0.0%st
> Mem:  65956340k total, 63087348k used,  2868992k free,    24020k buffers
> Swap: 33046520k total,        0k used, 33046520k free,    75244k cached
>
>   PID USER PR NI  VIRT RES  SHR S  %CPU %MEM      TIME COMMAND
> 32676 root 20  0 29.9g 24g 9916 S 252.1 38.4 149:44.21 java
> ...
>
> -bash-4.1# vmstat 5
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>  r  b swpd    free  buff cache   si   so   bi   bo     in     cs us sy id wa st
>  3  0    0 2869620 24036 75244    0    0    0    0     65     57  3  3 93  0  0
>  0  0    0 2868628 24036 75244    0    0    0    0 101608 154075  7  9 84  0  0
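
For reference, the gap Mihai describes can be computed directly from the
/proc/meminfo dump above. A minimal sketch in Python (the choice of fields
to sum is my own assumption, not an exhaustive kernel accounting; in
particular, VmallocUsed on these kernels may include mappings that are not
backed by RAM):

    # Sum the major consumers /proc/meminfo does report and show how much
    # of MemTotal is left unaccounted for. All values are in kB.
    def meminfo():
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":")
                info[key] = int(rest.split()[0])   # drop the "kB" suffix
        return info

    m = meminfo()
    accounted = (m["MemFree"] + m["Buffers"] + m["Cached"] + m["AnonPages"]
                 + m["Slab"] + m["KernelStack"] + m["PageTables"]
                 + m["VmallocUsed"])
    gap_kb = m["MemTotal"] - accounted
    print("unaccounted: %.1f GiB" % (gap_kb / 1024.0 / 1024.0))

With the figures posted above, this prints roughly 19.2 GiB unaccounted,
which matches the "missing" ~20 GB.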
There is also one page kept for each active socket. If I remember correctly, those pages are not accounted for in /proc/meminfo. If you have 5M active connections, that can add up to 20G. There is a change in newer kernel versions where those pages are no longer kept per socket but per process. You can try 3.8 and see whether the situation improves.

sorin
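
sorin's figure is easy to sanity-check. A back-of-the-envelope sketch under
his stated assumption (one cached page per established socket; as far as I
can tell the change he refers to is the per-task page fragment allocator
that replaced the per-socket page around kernel 3.7):

    # One page per established TCP socket, as sorin describes for older
    # kernels; 5 million connected clients as in the original post.
    PAGE_SIZE_KB = 4            # typical 4 KiB page on x86-64
    SOCKETS = 5 * 1000 * 1000

    hidden_kb = SOCKETS * PAGE_SIZE_KB
    print("%.1f GiB" % (hidden_kb / 1024.0 / 1024.0))   # -> 19.1 GiB

5,000,000 sockets x 4 KiB is about 19.1 GiB, which lines up almost exactly
with the unaccounted ~20 GB computed from /proc/meminfo above.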
