On Wednesday 08 July 2009 16:43:08 Jon Olav Vik wrote:
> Thanks for the clarification!
>
> Then it seems I should have a much more fine-grained structure in my HDF5
> file, grouping in tables only what will normally be needed together. In
> particular, an index column is a horrible idea. That's sad, I kind of liked
> the compact description of related data in a single table.

Why a horrible idea? Indexing in Pro implements indexes as separate columns,
and that works *very* fast. It is more costly from an implementation POV, but
that's secondary when speed is critical.

> I guess a column-oriented table would be equivalent to a collection of
> separate tables, and just suffer slowdown for whole-row access than column
> access. On the other hand, filesystems do allow lots and lots of
> simultaneous file handles. Having a file handle for each column might allow
> one to piece together full rows quite fast.
>
> How about navigating multi-dimensional arrays? Is that very much slower
> along directions other than the contiguous one?

Yes, that's a very well-known fact for data access in general. It holds true
even for memory access:

In [29]: a = np.ones(10000*10000).reshape(10000,10000)

In [30]: timeit -n1 -r1 a[1,:].sum()
1 loops, best of 1: 78.9 µs per loop

In [31]: timeit -n1 -r1 a[:, 1].sum()
1 loops, best of 1: 1.11 ms per loop    # more than 10x slower

so it matters all the more when using disks (they have more latency).

> Actually, I'm beginning to yearn for the old system of separate files in a
> dedicated directory. Easier to process in parallel, copy only the parts
> I'll be working on, not everything gets corrupted if one piece fails,
> easier to sync, ... I feel my faith as an HDF5 missionary shaking 8-/

I doubt that implementing everything in separate files would be any easier.
Nor would it be faster, unless you re-implement all the I/O buffering and
internal caches that PyTables/HDF5 provides. But YMMV ;-)

-- 
Francesc Alted
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
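
A minimal sketch of why the row and column timings in the session above differ, assuming NumPy's default C (row-major) layout and float64 elements. The array is smaller than the 10000x10000 one in the session so it runs quickly, and the variable names are illustrative only. It inspects strides and contiguity rather than timing anything, since timings vary by machine.

```python
import numpy as np

# Smaller than the array in the session above, so this runs quickly.
n = 1000
a = np.ones(n * n).reshape(n, n)  # float64, C (row-major) order by default

# Strides are the number of bytes to step to reach the next element
# along each axis: (n * 8, 8) for float64 in C order.
row_stride, col_stride = a.strides

row = a[1, :]  # n adjacent elements: one contiguous block of memory
col = a[:, 1]  # n elements, each n * 8 bytes apart

row_is_contiguous = row.flags["C_CONTIGUOUS"]
col_is_contiguous = col.flags["C_CONTIGUOUS"]

# Copying to Fortran (column-major) order makes columns the contiguous
# direction instead, at the cost of making rows strided.
f = np.asfortranarray(a)
fcol_is_contiguous = f[:, 1].flags["C_CONTIGUOUS"]

print(a.strides, row.strides, col.strides)
print(row_is_contiguous, col_is_contiguous, fcol_is_contiguous)
```

Reading a column touches one element per row-sized stride, so far more memory has to be pulled through the caches for the same number of values. The same reasoning applies on disk, where each jump costs even more.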