On Friday 22 July 2011 11:15:41, Francesc Alted wrote:
> Here is a message that bounced (probably from an unsubscribed address).
> 
> ----------------------------------------------------------------------
> From: Dave LeBlanc <david.lebl...@gmail.com>
> To: pytables-users@lists.sourceforge.net
> Date: Wednesday 19:59:33
> 
> Hi all, we're looking to use PyTables in a scenario where we
> open/read/write/close files a lot within a long-running interpreter
> session. We're seeing memory usage steadily increase even for simple
> operations.
> 
> For example, just opening and closing a tables file in a loop will
> cause memory to increase. To illustrate this, I've taken the
> check_leaks.py script in $PyTables/tables/tests and changed the
> following line:
> 
> from:
>     var2 = tables.StringCol(length=1, pos=2)
> to:
>     var2 = tables.StringCol(1, pos=2)
> 
> This lets the script run again against the newer PyTables API.
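> 
> (For reference, in the newer API the itemsize is simply the first
> positional argument of StringCol. A minimal description using it might
> look like the following; the class name and the integer column are just
> illustrative, not taken from check_leaks.py.)
> 
>     import tables
> 
>     class Record(tables.IsDescription):
>         var1 = tables.Int32Col(pos=1)      # hypothetical integer column
>         var2 = tables.StringCol(1, pos=2)  # itemsize=1, passed positionally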
> 
> 
> Writing:
> 
> Running check_leaks.py like this creates a file, foo.h5:
> 
>     python check_leaks.py -t -i 15 -w foo.h5 > write-stats.txt
> 
> 
> 
> Reading:
> 
> We can now run the read test:
> 
>     python check_leaks.py -t -i 50 -r foo.h5 > read-stats.txt
> 
> 
> 
> Just opening and closing:
> 
> If you comment out some lines in the read test so that it just opens
> and closes the file:
> 
> def read_table(file, nchildren, niter):
>     for i in range(niter):
>         fileh = tables.openFile(file, mode="r")
>         #for child in range(nchildren):
>         #    node = fileh.getNode(fileh.root, 'table' + str(child))
>         #    klass = node._v_attrs.CLASS
>         #    data = node[:]  # Read data
>         #    #print "data-->", data
>         #show_mem("After reading data. Iter %s" % i)
>         fileh.close()
>         show_mem("After close")
> 
> 
> Then run the check_leaks.py script again:
> 
>     python check_leaks.py -t -i 90000 -r foo.h5 > read-stats.txt
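> 
> (show_mem isn't shown above; it presumably just prints the Vm* fields
> from /proc/self/status. A rough, Linux-only sketch of such a helper, so
> the output below is easier to interpret:)
> 
> import os
> 
> def show_mem(explain):
>     """Print selected Vm* fields from /proc/self/status (Linux only)."""
>     wanted = ("VmSize", "VmRSS", "VmData", "VmStk", "VmExe", "VmLib")
>     values = {}
>     for line in open("/proc/%d/status" % os.getpid()):
>         key = line.split(":")[0]
>         if key in wanted:
>             values[key] = line.split(":", 1)[1].strip()
>     print "Memory usage: ******* %s *******" % explain
>     for key in wanted:
>         print "%s: %s" % (key, values.get(key, "n/a"))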
> 
> 
> 
> For just the read case, the first few lines (after it warms up) look
> like:
> 
> 
> Memory usage: ******* After close *******
> VmSize:   89044 kB      *VmRSS:   14036 kB*
> VmData:   74348 kB      VmStk:     200 kB
> VmExe:     1920 kB      VmLib:   11348 kB
> WallClock time: 0.024069070816   Delta time: 0.00594902038574
> 
> 
> 
> After 9000 loops, it looks like:
> 
> Memory usage: ******* After close *******
> VmSize:   92604 kB      *VmRSS:   17672 kB*
> VmData:   77908 kB      VmStk:     200 kB
> VmExe:     1920 kB      VmLib:   11348 kB
> WallClock time: 51.3589119911   Delta time: 0.00462102890015
> 
> 
> That's about 3 MB of growth just from opening and closing files.
> Things get worse for reading and writing, but I'm very surprised to
> see this from open/close alone. I'll be digging into this, but I'd
> appreciate it if anyone could provide some insight into what's
> consuming this memory.

I wouldn't say that's a leak.  Many of the building blocks of PyTables 
(the Python interpreter, the HDF5 library, PyTables itself) have 
internal caches that grow in capacity as they warm up.  3 MB after 
9000 loops is a really low figure, and you should not expect it to 
grow much beyond that.
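
A quick way to convince yourself of that is to sample RSS every few 
thousand iterations and check that the growth flattens out rather than 
increasing linearly. A rough sketch (Linux-only, reading 
/proc/self/status; this is not part of check_leaks.py, just an 
illustration):

import os
import tables

def rss_kb():
    """Return the current VmRSS of this process in kB (Linux only)."""
    for line in open("/proc/%d/status" % os.getpid()):
        if line.startswith("VmRSS:"):
            return int(line.split()[1])

samples = []
for i in range(90000):
    fileh = tables.openFile("foo.h5", mode="r")
    fileh.close()
    if i % 10000 == 0:
        samples.append(rss_kb())

# With cache warm-up the per-block deltas shrink towards zero; a real
# leak would keep adding a roughly constant amount per block.
print [samples[j + 1] - samples[j] for j in range(len(samples) - 1)]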

However, my experience is somewhat different.  For example, for reading I get:

[clip]
Memory usage: ******* After reading data. Iter 37 *******
VmSize:  194488 kB      VmRSS:   44752 kB
VmData:   76020 kB      VmStk:     212 kB
VmExe:     1352 kB      VmLib:   15332 kB
WallClock time: 45.2275369167   Delta time: 1.1934440136
[clip]
Memory usage: ******* After reading data. Iter 117 *******
VmSize:  194488 kB      VmRSS:   44896 kB
VmData:   76020 kB      VmStk:     212 kB
VmExe:     1352 kB      VmLib:   15332 kB
WallClock time: 141.416234016   Delta time: 1.16887283325
[clip]

That is, VmSize/VmData does not increase at all across these 80 
iterations (from iteration 37 to 117), which is a good sign that no 
leak is developing here.

Just to be sure, I've run the check under valgrind; here is the result 
for 3 iterations:

==14010== LEAK SUMMARY:
==14010==    definitely lost: 0 bytes in 0 blocks
==14010==    indirectly lost: 0 bytes in 0 blocks
==14010==      possibly lost: 237,683 bytes in 333 blocks
==14010==    still reachable: 8,705,393 bytes in 3,144 blocks
==14010==         suppressed: 0 bytes in 0 blocks

So, I think you should have no worries: PyTables does not leak (at least 
in these scenarios :).
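
(The exact valgrind invocation isn't shown above; for anyone who wants 
to reproduce the check, something along these lines should work, using 
CPython's valgrind-python.supp suppression file to hide the 
interpreter's own pymalloc false positives:)

    valgrind --leak-check=full \
             --suppressions=valgrind-python.supp \
             python check_leaks.py -t -i 3 -r foo.h5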

-- 
Francesc Alted
