Francesc Alted <faltet <at> pytables.org> writes:

> Hi Jon Olav,
> 
> On Tuesday 07 July 2009 13:44:30 Jon Olav Vik wrote:
> > The problem in brief: Why does it take 20-40 seconds to extract a table
> > column of 200000 integers? The code snippet in question is:
> >
> > with pt.openFile(filename) as f:
> >     vlarrayrow = f.root.gp.cols.vlarrayrow[:]
> 
> Quick answer: when your dataset fits in the OS filesystem memory cache, the
> retrieval is very fast.  If not, your disk speed is the upper limit.
[...]
> As you may have noticed, I've counted the *total* size of the table as being
> read instead of the size of only one single column.  This is because the table
> is organized row-wise on-disk, and you need to read *all* the columns for
> accessing just one.  The only solution to avoid this is to implement a
> column-wise table, which I'd like to implement in the near future (but not
> there yet).

Thanks for the clarification!

Then it seems I should have a much more fine-grained structure in my HDF5 file, 
grouping in tables only what will normally be needed together. In particular, 
an index column is a horrible idea. That's sad, I kind of liked the compact 
description of related data in a single table.
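
To convince myself of the cost, here is a minimal sketch in plain Python (no
PyTables involved). The record layout is hypothetical, not my real table: one
4-byte integer plus ten 8-byte floats per row. It only counts how many bytes
have to be scanned to get the integer column out of each layout.

```python
import struct

# Hypothetical record layout: one int32 column plus ten float64 columns.
ROW = struct.Struct("<i10d")
NROWS = 1000

# Row-wise layout: all fields of a row are stored next to each other.
row_wise = b"".join(ROW.pack(i, *([0.0] * 10)) for i in range(NROWS))

# Column-wise layout: the integer column is stored on its own.
int_column = struct.pack("<%di" % NROWS, *range(NROWS))


def column_from_rows(buf):
    """Extract the integer column; every full record has to be scanned."""
    return [rec[0] for rec in ROW.iter_unpack(buf)]


def column_from_column(buf):
    """Extract the integer column from its own contiguous buffer."""
    return list(struct.unpack("<%di" % (len(buf) // 4), buf))


bytes_scanned_row_wise = len(row_wise)    # ROW.size bytes per row
bytes_scanned_col_wise = len(int_column)  # 4 bytes per row
```

With this made-up layout the row-wise scan touches 21 times as many bytes as
the column-wise one, which is the kind of gap that would hurt once the table
no longer fits in the filesystem cache.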

I guess a column-oriented table would be equivalent to a collection of separate
tables, and would just make whole-row access slower than column access. On the
other hand, filesystems do allow lots and lots of simultaneous file handles.
Having a file handle for each column might allow one to piece together full
rows quite fast.

How about navigating multi-dimensional arrays? Is that very much slower along 
directions other than the contiguous one?
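
For a plain contiguous, row-major (C order) array, at least, the byte distance
between neighbouring elements depends heavily on the axis. The sketch below
only computes those distances; the shape and the 8-byte item size are
assumptions for illustration, and chunked HDF5 datasets may well behave
differently.

```python
def c_order_strides(shape, itemsize):
    """Byte step per axis for a contiguous row-major (C order) array."""
    strides = []
    step = itemsize
    # The last axis is contiguous; each earlier axis steps over all later ones.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


shape = (1000, 200)  # hypothetical 2-D array of 8-byte floats
strides = c_order_strides(shape, 8)
```

Here a step along the last axis moves 8 bytes, while a step along the first
axis jumps over a whole row of 200 items, so walking down a "column" touches
widely separated places in the file.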

Actually, I'm beginning to yearn for the old system of separate files in a 
dedicated directory. Easier to process in parallel, copy only the parts I'll be 
working on, not everything gets corrupted if one piece fails, easier to 
sync, ... I feel my faith as an HDF5 missionary shaking 8-/


Best regards,
Jon Olav



_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users