Hi, I'm trying to find the most efficient way to read multiple fields from my table. I noticed that readWhere is really slow:
    In [158]: %timeit A = array([row[field] for row in t.root.collocations.where(cond)], dtype='int64')
    1 loops, best of 3: 649 ms per loop

    In [159]: %timeit B = t.root.collocations.readWhere(cond, field=field)
    1 loops, best of 3: 7.99 s per loop

    In [160]: (A==B).all()
    Out[160]: True

Why is readWhere so slow?

Now, I really have multiple fields. Neither readWhere, nor row.__getitem__, nor the void returned by row.fetch_all_fields supports fetching more than one field at once, as a recarray does. Is this by design? What is the preferred way to fetch multiple fields? I would like to get an ndarray with the requested fields, while of course maintaining the dtype.

If I write:

    In [218]: %timeit A = [[row[f] for f in fields] for row in t.root.collocations.where(cond)]
    1 loops, best of 3: 1.19 s per loop

I lose the dtype. I could do the slower and more memory-intensive

    In [217]: %timeit A = [row.fetch_all_fields() for row in t.root.collocations.where(cond)]
    1 loops, best of 3: 1.56 s per loop

but then I really need

    In [235]: %timeit A = [row.fetch_all_fields() for row in t.root.collocations.where(cond)]; A = array(A, dtype=A[0].dtype)[fields]
    1 loops, best of 3: 2.08 s per loop

which is faster and less ugly than

    In [351]: %timeit A = [tuple(row[f] for f in fields) for row in t.root.collocations.where(cond)]; A = array(A, dtype=dtype(zip(fields, (t.root.collocations.dtype[f] for f in fields))))
    1 loops, best of 3: 3.07 s per loop

but probably needs more memory. What is the best way to go here? Note that my toy example table has only 1.5 million rows, but in production use it will be closer to several hundred million rows.

-- 
Gerrit Holl
PhD student at Department of Space Science, Luleå University of Technology, Kiruna, Sweden
http://www.sat.ltu.se/members/gerrit/
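For reference, the two dtype-preserving variants compared in the timings (fetch all fields and then select a list of fields, versus building tuples of only the requested fields against a hand-built sub-dtype) can be reproduced in plain NumPy. This is a minimal sketch under stated assumptions: it does not use PyTables at all, the names `table`, `mask`, and the field names are hypothetical stand-ins for t.root.collocations, cond, and fields, and it says nothing about why readWhere is slow.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in for the rows of t.root.collocations: a plain NumPy
# structured array with fields of different dtypes (synthetic, not real data).
table = np.zeros(10, dtype=[("lat", "f4"), ("lon", "f4"), ("time", "i8"), ("flag", "u1")])
table["time"] = np.arange(10)
table["lat"] = np.linspace(-45.0, 45.0, 10)

# Hypothetical stand-in for the query condition `cond`: a boolean mask.
mask = table["time"] % 2 == 0

fields = ["lat", "time"]

# Sub-dtype holding only the requested fields, each keeping its original
# dtype (the same idea as dtype(zip(fields, ...)) in In [351]).
sub_dtype = np.dtype([(f, table.dtype[f]) for f in fields])

# Variant 1 (like In [235]): take full records, then select several fields.
# Multi-field indexing returns a view that keeps the layout of the full
# record, so repack_fields() converts it into a compact, independent copy.
full = table[mask]
selected = rfn.repack_fields(full[fields])

# Variant 2 (like In [351]): tuples of only the requested fields, converted once.
rows = [tuple(r[f] for f in fields) for r in table[mask]]
from_tuples = np.array(rows, dtype=sub_dtype)

print(selected.dtype)
print(np.array_equal(selected, from_tuples))
```

Both variants end up with the same packed structured array; the repack_fields() step matters mainly for memory, since without it each selected row would still occupy the itemsize of the full record.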
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users