On Wednesday 08 July 2009 16:43:08 Jon Olav Vik wrote:
> Thanks for the clarification!
>
> Then it seems I should have a much more fine-grained structure in my HDF5
> file, grouping in tables only what will normally be needed together. In
> particular, an index column is a horrible idea. That's sad, I kind of liked
> the compact description of related data in a single table.

Why is it a horrible idea?  PyTables Pro implements indexes as separate columns 
and that works *very* fast.  It is more costly from an implementation POV, but 
that's secondary when speed is critical.

> I guess a column-oriented table would be equivalent to a collection of
> separate tables, and just be slower for whole-row access than for column
> access. On the other hand, filesystems do allow lots and lots of
> simultaneous file handles. Having a file handle for each column might allow
> one to piece together full rows quite fast.
>
> How about navigating multi-dimensional arrays? Is that very much slower
> along directions other than the contiguous one?

Yes, that's a very well known fact for data access in general.  That holds 
true even for memory access:

In [29]: a = np.ones(10000*10000).reshape(10000,10000)

In [30]: timeit -n1 -r1 a[1,:].sum()
1 loops, best of 1: 78.9 µs per loop

In [31]: timeit -n1 -r1 a[:, 1].sum()
1 loops, best of 1: 1.11 ms per loop   # more than 10x slower

so it applies all the more to disks, which have much higher latency.
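
A minimal sketch of why the two directions differ, assuming NumPy's default C
(row-major) layout and 8-byte floats. The array size, the variable names and
the repeat counts are illustrative choices, not taken from the session above,
and the timings will vary by machine:

```python
import timeit

import numpy as np

# Smaller than the 10000x10000 array above so the script runs quickly.
n = 2000
a = np.ones((n, n))  # float64, C order (row-major) by default

row = a[1, :]  # consecutive elements are adjacent in memory
col = a[:, 1]  # consecutive elements are one full row (n * 8 bytes) apart

print("row strides:", row.strides, "contiguous:", row.flags["C_CONTIGUOUS"])
print("col strides:", col.strides, "contiguous:", col.flags["C_CONTIGUOUS"])

# Best of 5 runs, 100 calls each; only the relative difference matters.
t_row = min(timeit.repeat(row.sum, number=100, repeat=5))
t_col = min(timeit.repeat(col.sum, number=100, repeat=5))
print(f"row sum: {t_row:.6f} s, column sum: {t_col:.6f} s per 100 calls")

# A Fortran-ordered copy makes the column direction the contiguous one.
f = np.asfortranarray(a)
print("F-order column contiguous:", f[:, 1].flags["C_CONTIGUOUS"])
```

The strides show the cause directly: walking a row touches adjacent bytes,
while walking a column jumps a whole row's worth of bytes per element, so
each step is likely to land in a different cache line.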

> Actually, I'm beginning to yearn for the old system of separate files in a
> dedicated directory. Easier to process in parallel, copy only the parts
> I'll be working on, not everything gets corrupted if one piece fails,
> easier to sync, ... I feel my faith as an HDF5 missionary shaking 8-/

I doubt that implementing everything in separate files would be any easier.   
Nor would it be faster, unless you re-implemented all the I/O buffering and 
internal caches that PyTables/HDF5 provides.  But YMMV ;-)

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users