On Fri, 2009-09-18 at 17:07 +0200, Francesc Alted wrote:
> On Friday 18 September 2009 16:09:58, David Fokkema wrote:
> > Hi list,
> >
> > I'm not sure what this is... I've written a minimal script which shows
> > the following problem: fill up a table with 10 million rows, which costs
> > almost no memory. Then, do the following query:
> >
> > r = data.root.events.col('event_id')
> >
> > which brings memory usage up from 14 MB to 99 MB. Do it again, and
> > memory usage climbs further by tens of MB, which are freed after
> > the query finishes.
> 
> This is expected.  While the query is executing, the results are kept in
> a new NumPy array.  When the query finishes, the new NumPy object is bound
> to the `r` variable, and the old NumPy object previously bound to `r` is
> released.

Ah, yes, of course. Interestingly, it seems that sys.getsizeof doesn't
report the size of the NumPy array's data, but only the small array object
itself? It returns 40 bytes, nothing else.
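
A quick check (just a sketch; np is the usual NumPy import alias) confirms
where the real size lives:

import sys
import numpy as np

a = np.zeros(10000000, dtype=np.uint64)
print a.nbytes          # 80000000: the size of the data buffer
print sys.getsizeof(a)  # just the ndarray struct; the buffer isn't
                        # counted, at least with the NumPy I have here

So a.nbytes is the number to look at for the actual data.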

> > Instead, try the following query:
> >
> > r = [x['event_id'] for x in data.root.events]
> >
> > which brings memory usage from 14 MB to 296 MB. Do it again, which
> > brings memory usage up to 528 MB.
> 
> Expected again.  In this case, you are getting the column as a Python list, 
> and this takes *far* more space than a regular NumPy array.

OK, but surely not _that_ much space? I end up with a list of 10 million
values (Python longs) that came from a UInt64Col, so they should take up
about 8 bytes each: let's say 80 million bytes, assuming Python doesn't
optimize the small numbers, plus some overhead for the list itself. Now,
sys.getsizeof returns about 40 megabytes, which is roughly what I'd expect.
However, that's nowhere near the 282 MB taken up by Python.
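
For what it's worth, a per-element check (a rough sketch; exact sizes
depend on the build):

import sys

x = 9999999L            # a typical element from the list
print sys.getsizeof(x)  # size of one Python long object
print sys.getsizeof([]) # just the empty-list header

Note that getsizeof on the list only counts the pointer slots, not the
10 million long objects they point to, so the real footprint could well
be several times the 40 MB.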

> > Del-ing objects and imports doesn't clean up memory...
> 
> It should.  How are you deleting objects, and how do you determine that
> memory is not being released?

Ah, let's see:

This script:

import tables

class Event(tables.IsDescription):
    event_id = tables.UInt64Col()
    ext_timestamp = tables.UInt64Col(dflt=9999)
    other_value = tables.UInt64Col(dflt=9999)

def create_tables():
    # Fill a table with 10 million rows; this costs almost no memory.
    data = tables.openFile('test.h5', 'w', 'PyTables Test')
    data.createTable('/', 'events', Event, 'Test Events')

    table = data.root.events
    tablerow = table.row
    for i in xrange(10000000):
        tablerow['event_id'] = i
        tablerow.append()
    table.flush()

    data.close()

def test_query():
    # Build a plain Python list of 10 million longs from the column.
    data = tables.openFile('test.h5', 'r')
    r = [x['event_id'] for x in data.root.events]
    data.close()
    return r
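
For comparison, here is a variant that returns the column as a single
NumPy array instead of a Python list (a sketch, using the same Table.col
call as in my first query):

def test_query_numpy():
    data = tables.openFile('test.h5', 'r')
    r = data.root.events.col('event_id')  # one contiguous uint64 array
    data.close()
    return r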

And this is my log:

>>> from test_tables import *
>>> create_tables()

(now test.h5 is 230 MB in size and Python uses 19 MB)

>>> r = test_query()

(now Python uses 293 MB)

>>> import sys
>>> sys.getsizeof(r)
40764028

(which is only 40 MB, right? That's something I can live with, ;-) )

>>> dir()
['Event', '__builtins__', '__doc__', '__name__', '__package__',
'create_tables', 'r', 'sys', 'tables', 'test_query']
>>> del Event
>>> del create_tables
>>> del r
>>> del tables
>>> del test_query
>>> del sys
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']

(Python still uses 293 MB...)

So... is this strange? test_query closes the file, so there shouldn't be
anything floating around related to that... However, could there be
something in the C code that mallocs but never frees memory?
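
One thing I can still try before blaming the C code (just a guess on my
part): force a garbage collection and re-check the process size:

import gc

gc.collect()  # break any lingering reference cycles

If the number still doesn't drop, maybe CPython's own allocator is simply
keeping the freed arenas around for reuse instead of returning them to
the OS?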

Best regards,

David

