Hi,

I'm trying to find the most efficient way to read multiple fields from
my table. I noticed that readWhere is really slow:

In [158]: %timeit A = array([row[field] for row in
t.root.collocations.where(cond)], dtype='int64')
1 loops, best of 3: 649 ms per loop

In [159]: %timeit B = t.root.collocations.readWhere(cond, field=field)
1 loops, best of 3: 7.99 s per loop

In [160]: (A==B).all()
Out[160]: True

Why is readWhere so slow?

Now, I really have multiple fields. Neither readWhere, nor
row.__getitem__, nor the void returned by row.fetch_all_fields
supports fetching more than one field as a recarray does. Is this by
design? What is the preferred way to fetch multiple fields? I would
like to get an ndarray with the requested fields, of course maintaining
the dtype. If I write:

In [218]: %timeit A = [[row[f] for f in fields] for row in
t.root.collocations.where(cond)]
1 loops, best of 3: 1.19 s per loop

I lose the dtype.

I could do the slower and more memory-intensive

In [217]: %timeit A = [row.fetch_all_fields() for row in
t.root.collocations.where(cond)]
1 loops, best of 3: 1.56 s per loop

but then I really need

In [235]: %timeit A = [row.fetch_all_fields() for row in
t.root.collocations.where(cond)]; A = array(A,
dtype=A[0].dtype)[fields]
1 loops, best of 3: 2.08 s per loop
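
The field-selection step at the end of that expression can be illustrated on a plain NumPy structured array. This is only a sketch: the array `table` and its field names are made up for the example and stand in for the rows read from the real table, and `repack_fields` assumes a reasonably recent NumPy (1.16 or later).

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in for the rows read from the table.
full_dtype = np.dtype([("lat", "f8"), ("lon", "f8"), ("time", "i8"), ("flag", "i1")])
table = np.zeros(10, dtype=full_dtype)
table["time"] = np.arange(10)

fields = ["lat", "time"]

# Indexing with a list of names keeps each field's original type.
subset = table[fields]

# In recent NumPy the result is a view that still spans the full record;
# repack_fields copies it into a compact layout with only the requested fields.
packed = rfn.repack_fields(subset)
```

The repacked copy is what saves memory here; the plain `table[fields]` result may still reference the full-width records.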

which is faster and less ugly than

In [351]: %timeit A = [tuple(row[f] for f in fields) for row in
t.root.collocations.where(cond)]; A = array(A, dtype=dtype(zip(fields,
(t.root.collocations.dtype[f] for f in fields))))
1 loops, best of 3: 3.07 s per loop

but probably needs more memory.
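
One possible way to keep the tuple-based variant from building an intermediate list is to feed the generator straight into `numpy.fromiter` with a structured sub-dtype. The sketch below uses a small in-memory structured array as a hypothetical stand-in for the table, and a plain Python filter in place of `where(cond)`; the field names are invented for the example, and it has not been timed against the real table.

```python
import numpy as np

# Hypothetical stand-in for the table: a small in-memory structured array.
full_dtype = np.dtype([("lat", "f8"), ("lon", "f8"), ("time", "i8"), ("flag", "i1")])
table = np.zeros(10, dtype=full_dtype)
table["time"] = np.arange(10)
table["lat"] = np.linspace(-45.0, 45.0, 10)

fields = ["lat", "time"]

# Sub-dtype that keeps each requested field's original type.
# A list is used rather than zip(), which is a lazy iterator on Python 3.
sub_dtype = np.dtype([(f, full_dtype[f]) for f in fields])

# Generator standing in for iterating over where(cond):
# yields one tuple of the requested fields per matching row.
matching = (tuple(row[f] for f in fields) for row in table if row["time"] >= 5)

# fromiter consumes the generator directly, so no list of tuples is held in memory.
selected = np.fromiter(matching, dtype=sub_dtype)
```

If the number of matching rows is known beforehand, passing it as `count=` lets `fromiter` allocate the output once instead of growing it.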

What is the best way to go here?

Note that my toy example table has only 1.5 million rows, but in
production use it will be closer to several hundred million rows.

-- 
Gerrit Holl
PhD student at Department of Space Science, Luleå University of
Technology, Kiruna, Sweden
http://www.sat.ltu.se/members/gerrit/

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
