On Mon, Jan 16, 2012 at 12:43 PM, Ümit Seren <uemit.se...@gmail.com> wrote:

> I created an HDF5 file with PyTables which contains around 29,000
> tables with around 31k rows each.
> I am trying to create a caching table in the same HDF5 file which
> contains a subset of those 29,000 tables.
>
> I wrote a script which basically iterates through each of the 29,000
> tables, retrieves a subset, and then writes it to the caching table.
> Basically it goes through the subset and then adds the rows from the
> subset one by one to the caching table.
> The first couple of thousand tables are processed really quickly
> (around 5-8 tables per second or so). However, the longer the script
> runs the slower it becomes (down to 1 table per second).
>
> Does anyone know why this is the case? (LRU cache, maybe?)
>
> Right now I write row by row using row.append().
> Is it faster to create the dataset in memory and then write it as a
> whole block to the table?
>

Yes.  In general, the more you can read / write in one go, the better the
performance.  There is per-call overhead in both Python and HDF5, so a
single block append avoids paying that overhead once per row.
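
For illustration, here is a minimal sketch of the two approaches.  The file
name, table name, and the single Int32 column are made up, and it uses the
current PyTables open_file / create_table API:

    import numpy as np
    import tables

    with tables.open_file("cache_demo.h5", mode="w") as h5:

        class Record(tables.IsDescription):
            value = tables.Int32Col()

        cache = h5.create_table("/", "cache", Record)
        data = np.arange(100000, dtype=np.int32)

        # Slow: one Row.append() call per row.
        row = cache.row
        for v in data:
            row["value"] = v
            row.append()
        cache.flush()

        # Faster: build the block in memory and append it in one call.
        # (Here it appends the same data a second time, just to show both
        # call styles against one table.)
        block = np.empty(len(data), dtype=cache.dtype)
        block["value"] = data
        cache.append(block)
        cache.flush()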

However, the 5-8x slowdown over the course of the run is a little
disconcerting.  Do you have a demonstration script that you can share?
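
For reference, a hypothetical skeleton of the loop you describe might look
like the following -- the group path, the selection condition, and the
assumption that the caching table shares the source tables' description are
all invented here:

    import tables

    with tables.open_file("data.h5", mode="a") as h5:
        # Caching table assumed to already exist with the same description
        # as the source tables; the path and condition are made up.
        cache = h5.root.cache

        for table in h5.walk_nodes("/tables", classname="Table"):
            subset = table.read_where("score > 0.5")  # structured array
            if len(subset):
                cache.append(subset)  # one append per source table, not per row
        cache.flush()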

Be Well
Anthony


>
> thanks in advance
>
> Ümit
>
>
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
