I created an HDF5 file with PyTables which contains around 29,000
tables with around 31,000 rows each.
I am trying to create a caching table in the same HDF5 file which
contains a subset of the rows from those 29,000 tables.

I wrote a script which iterates through each of the 29,000 tables,
retrieves a subset of its rows, and then writes that subset to the
caching table, appending the rows one by one.
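
Stripped down, the loop looks roughly like this (the file, node and
column names are placeholders, and walkNodes/readWhere just stand in
for however the real script iterates and selects):

    import tables

    h5 = tables.openFile('data.h5', mode='a')      # placeholder filename
    cache = h5.getNode('/cache')                    # the caching table
    crow = cache.row

    # walk over the ~29,000 source tables
    for src in h5.walkNodes('/data', classname='Table'):
        subset = src.readWhere('score > 0.5')       # placeholder condition
        for r in subset:
            for name in cache.colnames:
                crow[name] = r[name]
            crow.append()                           # one row at a time
        cache.flush()

    h5.close()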
The first couple of thousand tables are processed really quickly
(around 5-8 tables per second). However, the longer the script runs,
the slower it becomes (down to about 1 table per second).

Does anyone know why this is the case? (PyTables' LRU node cache, maybe?)
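
In case it matters, I currently open the file with the default
parameters. One thing I could try is enlarging the node cache when
opening the file (I am assuming here that parameters such as
NODE_CACHE_SLOTS can be overridden via openFile keyword arguments):

    import tables

    # assumption: parameters from tables/parameters.py (e.g. NODE_CACHE_SLOTS)
    # can be overridden per file through openFile keyword arguments
    h5 = tables.openFile('data.h5', mode='a', NODE_CACHE_SLOTS=4096)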

Right now I write row by row using row.append().
Would it be faster to build the data in memory and then append it to
the table as a whole block?
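
In other words, something like this instead (a rough sketch; subset
would be the record array read from one source table, with the same
description as the caching table):

    # row-by-row (what I do now)
    crow = cache.row
    for r in subset:
        for name in cache.colnames:
            crow[name] = r[name]
        crow.append()
    cache.flush()

    # versus appending the subset as one block
    cache.append(subset)    # subset: record array compatible with the table
    cache.flush()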

thanks in advance

Ümit
