Francesc Alted <faltet <at> pytables.org> writes:
> Hi Jon Olav,
>
> On Tuesday 07 July 2009 13:44:30 Jon Olav Vik wrote:
> > The problem in brief: Why does it take 20-40 seconds to extract a table
> > column of 200000 integers? The code snippet in question is:
> >
> > with pt.openFile(filename) as f:
> >     vlarrayrow = f.root.gp.cols.vlarrayrow[:]
>
> Quick answer: when your dataset fits in the OS filesystem memory cache,
> retrieval is very fast. If not, you are limited to your disk speed at most.
[...]
> As you may have noticed, I've counted the *total* size of the table as being
> read instead of the size of only one single column. This is because the table
> is organized row-wise on disk, and you need to read *all* the columns to
> access just one. The only way to avoid this is to implement a column-wise
> table, which I'd like to do in the near future (but it is not there yet).

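To make the row-wise point concrete, here is a minimal sketch in plain Python. It does not use PyTables at all: the record layout (one int32 plus three float64 fields) and all function names are made up for illustration, and it only counts bytes in an in-memory buffer, so it says nothing about HDF5 chunking or caching. It just shows that pulling one field out of row-wise storage means walking over every record, while a separately stored column is much smaller.

```python
import struct

# Hypothetical record layout (an assumption, not the real table):
# one little-endian int32 field followed by three float64 fields.
ROW_FORMAT = "<i3d"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 4 + 3 * 8 bytes, no padding


def pack_row_wise(rows):
    """Store complete records back to back, as a row-oriented table does."""
    return b"".join(struct.pack(ROW_FORMAT, *row) for row in rows)


def read_index_row_wise(buf):
    """Extract the int32 field; every record in the buffer must be visited.

    Returns the values and the number of bytes that had to be scanned.
    """
    values = [record[0] for record in struct.iter_unpack(ROW_FORMAT, buf)]
    return values, len(buf)


def pack_index_column_wise(rows):
    """Store only the int32 field contiguously, as a separate array would."""
    return struct.pack("<%di" % len(rows), *(row[0] for row in rows))


def read_index_column_wise(buf):
    """Read back the contiguous int32 column and the bytes scanned."""
    count = len(buf) // 4
    return list(struct.unpack("<%di" % count, buf)), len(buf)


# Same number of rows as in the question; the float values are arbitrary.
rows = [(i, 0.5 * i, 1.5 * i, 2.5 * i) for i in range(200000)]

_, row_bytes = read_index_row_wise(pack_row_wise(rows))
_, col_bytes = read_index_column_wise(pack_index_column_wise(rows))
print("row-wise bytes scanned:   ", row_bytes)
print("column-wise bytes scanned:", col_bytes)
```

With this made-up layout the integer column is only a small fraction of each record, so the wider the other columns are, the worse the row-wise case gets for single-column reads.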
Thanks for the clarification! Then it seems I should have a much more
fine-grained structure in my HDF5 file, grouping in tables only what will
normally be needed together. In particular, an index column is a horrible idea.

That's sad; I rather liked the compact description of related data in a single
table. I guess a column-oriented table would be equivalent to a collection of
separate tables, and would just slow down whole-row access instead of column
access. On the other hand, filesystems do allow lots and lots of simultaneous
file handles. Having a file handle for each column might allow one to piece
together full rows quite fast.

How about navigating multi-dimensional arrays? Is that very much slower along
directions other than the contiguous one?

Actually, I'm beginning to yearn for the old system of separate files in a
dedicated directory: easier to process in parallel, I can copy only the parts
I'll be working on, not everything gets corrupted if one piece fails, easier
to sync, ...

I feel my faith as an HDF5 missionary shaking 8-/

Best regards,
Jon Olav