Re: Likely mem leak in 3.7

2012-12-01 Thread James Cloos
I've extensively tested 2844a48706e5 (tip at the time I compiled) for
the last few days and have been unable to reproduce.

This bug appears to be fixed.

Thanks.

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6


Re: Likely mem leak in 3.7

2012-11-17 Thread James Cloos
> "DR" == David Rientjes  writes:

I had to reboot to get some work done.

In order to re-create the missing-RAM symptom, I had to use a btrfs fs
for the temp files for a few emerge(1) runs.  It seems that the RAM
which was used to cache those temp files is not recovered when the
files are deleted?

Right now cached is about 4G smaller than typical on previous kernels.

I also notice that there are ten 1G maps (end of /proc/meminfo); I
wonder whether those end up used inefficiently?
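
Roughly, what I'm watching is something like this (a sketch; it assumes
the 1G maps are the DirectMap1G line and that the emerge temp files live
under /var/tmp/portage, which may differ elsewhere):

grep -E 'MemFree|^Cached|DirectMap' /proc/meminfo    # before
rm -rf /var/tmp/portage/*                            # drop the emerge temp files
sync
grep -E 'MemFree|^Cached|DirectMap' /proc/meminfo    # after: Cached should drop and MemFree grow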

I compared the output below to a capture on 3.6 about 1¾ hours after
boot; the numbers do not seem much different, although the btrfs slabs
have different names.

DR> echo m > /proc/sysrq-trigger

[95432.729187] SysRq : Show Memory
[95432.729192] Mem-Info:
[95432.729194] Node 0 DMA per-cpu:
[95432.729196] CPU0: hi:0, btch:   1 usd:   0
[95432.729198] CPU1: hi:0, btch:   1 usd:   0
[95432.729199] CPU2: hi:0, btch:   1 usd:   0
[95432.729200] CPU3: hi:0, btch:   1 usd:   0
[95432.729201] Node 0 DMA32 per-cpu:
[95432.729203] CPU0: hi:  186, btch:  31 usd: 157
[95432.729205] CPU1: hi:  186, btch:  31 usd: 184
[95432.729206] CPU2: hi:  186, btch:  31 usd: 173
[95432.729207] CPU3: hi:  186, btch:  31 usd: 181
[95432.729208] Node 0 Normal per-cpu:
[95432.729210] CPU0: hi:  186, btch:  31 usd: 158
[95432.729211] CPU1: hi:  186, btch:  31 usd: 157
[95432.729212] CPU2: hi:  186, btch:  31 usd:  89
[95432.729214] CPU3: hi:  186, btch:  31 usd: 122
[95432.729218] active_anon:965802 inactive_anon:174825 isolated_anon:0
 active_file:162783 inactive_file:223765 isolated_file:0
 unevictable:0 dirty:4486 writeback:315 unstable:0
 free:178576 slab_reclaimable:113370 slab_unreclaimable:8715
 mapped:949576 shmem:940282 pagetables:21020 bounce:0
 free_cma:0
[95432.729221] Node 0 DMA free:15864kB min:88kB low:108kB high:132kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15880kB 
mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB 
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[95432.729226] lowmem_reserve[]: 0 3168 11682 11682
[95432.729229] Node 0 DMA32 free:376380kB min:18300kB low:22872kB high:27448kB 
active_anon:1551176kB inactive_anon:345756kB active_file:486140kB 
inactive_file:361936kB unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:3244136kB mlocked:0kB dirty:1368kB writeback:88kB mapped:1883492kB 
shmem:1880344kB slab_reclaimable:157768kB slab_unreclaimable:3944kB 
kernel_stack:496kB pagetables:23592kB unstable:0kB bounce:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[95432.729235] lowmem_reserve[]: 0 0 8514 8514
[95432.729237] Node 0 Normal free:322060kB min:49188kB low:61484kB high:73780kB 
active_anon:2312032kB inactive_anon:353544kB active_file:164992kB 
inactive_file:533124kB unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:8718776kB mlocked:0kB dirty:16576kB writeback:1172kB mapped:1914812kB 
shmem:1880784kB slab_reclaimable:295712kB slab_unreclaimable:30900kB 
kernel_stack:3552kB pagetables:60488kB unstable:0kB bounce:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[95432.729242] lowmem_reserve[]: 0 0 0 0
[95432.729245] Node 0 DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 
0*512kB 1*1024kB 1*2048kB 3*4096kB = 15864kB
[95432.729252] Node 0 DMA32: 36923*4kB 15958*8kB 4599*16kB 741*32kB 27*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 376428kB
[95432.729258] Node 0 Normal: 35337*4kB 93927*8kB 46306*16kB 19127*32kB 
6764*64kB 2788*128kB 1236*256kB 539*512kB 273*1024kB 84*2048kB 201*4096kB = 
4902748kB
[95432.729265] 1329636 total pagecache pages
[95432.729267] 2823 pages in swap cache
[95432.729268] Swap cache stats: add 46261, delete 43438, find 709513/710665
[95432.729269] Free swap  = 3540840kB
[95432.729270] Total swap = 3670008kB
[95432.780343] 3080176 pages RAM
[95432.780345] 68408 pages reserved
[95432.780346] 7950465 pages shared
[95432.780347] 418322 pages non-shared

DR> zgrep CONFIG_SL[AU]B /proc/config.gz

CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLABINFO=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set

DR> cat /proc/slabinfo

slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ext4_groupinfo_1k     64     64    128   32    1 : tunables    0    0    0 : slabdata      2      2      0
fat_inode_cache       48     48    680   24    4 : tunables    0    0    0 : slabdata      2      2      0
fat_cache              0      0     40  102    1 : tunables    0    0    0 : slabdata      0      0      0
UDPLITEv6              0      0   1088   30    8 : tunables    0    0    0 : slabdata      0      0      0
UDPv6                120    120   1088   30    8 : tunables

Re: Likely mem leak in 3.7

2012-11-16 Thread David Rientjes
On Thu, 15 Nov 2012, James Cloos wrote:

> The kernel does not log anything relevant to this.
> 

Can you do the following as root:

dmesg -c > /dev/null
echo m > /proc/sysrq-trigger
dmesg > foo

and send foo inline in your reply?

> Slabinfo gives some odd output.  It seems to think there are negative
> quantities of some slabs:
> 
> Name                 Objects Objsize    Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
> :at-016                 5632      16    90.1K 18446744073709551363/0/275  256 0   0 100 *a
> :t-048                  3386      48   249.8K 18446744073709551558/22/119  85 0  36  65 *
> :t-120                  1022     120   167.9K 18446744073709551604/14/53   34 0  34  73 *
> blkdev_requests          182     376   122.8K 18446744073709551604/7/27    21 1  46  55
> ext4_io_end              348    1128   393.2K 18446744073709551588/0/40    29 3   0  99 a
> 

Can you send the output of

zgrep CONFIG_SL[AU]B /proc/config.gz
cat /proc/slabinfo


Likely mem leak in 3.7

2012-11-15 Thread James Cloos
Starting with 3.7-rc1, my workstation seems to lose RAM.

Up until (and including) 3.6, used-(buffers+cached) was roughly the same
as sum(rss) (taking shared into account).  Now there is an approx 6G gap.
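
Roughly, the comparison is something like this (a sketch; it assumes
procps ps, and sum(rss) double-counts shared pages, so only a gap in
the other direction matters):

awk '/^(MemTotal|MemFree|Buffers|Cached):/ {v[$1]=$2}
     END {printf "%d MiB used-(buffers+cached)\n",
          (v["MemTotal:"]-v["MemFree:"]-v["Buffers:"]-v["Cached:"])/1024}' /proc/meminfo
ps -eo rss= | awk '{s+=$1} END {printf "%d MiB sum(rss)\n", s/1024}'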

When the box first starts, it is clearly less swappy than with <= 3.6; I
can't tell whether that is related.  The reduced swappiness persists.

It seems to get worse when I update packages (the box runs Gentoo).  The
portage tree and overlays are on btrfs filesystems, as is /var/log (all
with compression, except for the distfiles fs).  The compilations
themselves are done in a tmpfs.  I CCed l-b because of that apparent
correlation.

My postgres db is on xfs (tested faster) and has a 3G shared segment,
but that recovers when the pg process is stopped; neither of those seems
to be implicated.

There are also several ext4 partitions, including / and /home.

Cgroups are configured, and openrc does put everything it starts into
its own directory under /sys/fs/cgroup/openrc.  But top(1) shows all
of the processes, and its idea of free mem does change with pg's use
of its shared segment.  So it doesn't *look* like the RAM is hiding
in some cgroup.
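
(If the memory controller were mounted, per-group usage could be checked
directly; something like this, assuming the usual /sys/fs/cgroup/memory
mount point rather than my openrc-only hierarchy, would show any group
holding a lot of RAM:)

for f in /sys/fs/cgroup/memory/*/memory.usage_in_bytes; do
    [ -r "$f" ] && printf '%14s  %s\n' "$(cat "$f")" "$f"
done 2>/dev/null | sort -n | tail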

The kernel does not log anything relevant to this.

Slabinfo gives some odd output.  It seems to think there are negative
quantities of some slabs; the huge Slabs counts below are 2^64 minus a
small number, i.e. negative values printed as unsigned 64-bit integers:

Name                 Objects Objsize    Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
:at-016                 5632      16    90.1K 18446744073709551363/0/275  256 0   0 100 *a
:t-048                  3386      48   249.8K 18446744073709551558/22/119  85 0  36  65 *
:t-120                  1022     120   167.9K 18446744073709551604/14/53   34 0  34  73 *
blkdev_requests          182     376   122.8K 18446744073709551604/7/27    21 1  46  55
ext4_io_end              348    1128   393.2K 18446744073709551588/0/40    29 3   0  99 a

The largest entries it reports are:

Name                 Objects Objsize    Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
ext4_inode_cache       38448     864   106.1M 3201/566/39      37 3  17  31 a
:at-104               316429     104    36.5M 8840/3257/92     39 0  36  89 *a
btrfs_inode            13271     984    35.7M 1078/0/14        33 3   0  36 a
radix_tree_node        43785     560    34.7M 2075/1800/45     28 2  84  70 a
dentry                 64281     192    14.3M 3439/1185/55     21 0  33  86 a
proc_inode_cache       15695     608    12.1M 693/166/51       26 2  22  78 a
inode_cache            10730     544     6.0M 349/0/21         29 2   0  96 a
task_struct              628    5896     4.3M 123/23/10         5 3  17  84

The total Space is much smaller than the missing RAM.

The only other difference I see is that one process has left behind
several score zombies.  It is structured as a parent with several
worker kids, but the kids stay zombie even when the parent process
is stopped and restarted.  wchan shows that they are stuck in exit.
Their normal rss isn't enough to account for the missing RAM, even
if it isn't reclaimed.  (Not to mention, RAM != brains. :)
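
(For reference, they can be listed together with their wait channel with
something like this, assuming procps ps:)

ps -eo stat,pid,ppid,wchan:20,comm | awk '$1 ~ /^Z/'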

I haven't tried bisecting because of the time it takes to confirm the
problem (several hours of uptime).  I've only compiled (each of) the
rc tags, so v3.6 is the last known good and v3.7-rc1 is the first
known bad.
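
If it does come to bisecting, it would start from those two points,
roughly:

git bisect start
git bisect bad v3.7-rc1
git bisect good v3.6
# build, boot, run for several hours to confirm, then mark the result:
git bisect good    # or: git bisect bad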

If there is anything that I missed, please let me know!

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6