On Friday 15 May 2009 21:05:22 David Fokkema wrote:
> > Apparently there is no provision in HDF5 for compressing actual data in
> > variable length arrays.  However, if this is a must for you you can
> > always compress the data manually before writing it to disk, and
> > decompress it after the reading process.
>
> Hmmm... that's a shame. Is there really no provision for it or is it
> just hard to set up? I'll have to think this over, then.

Well, Quincey Koziol, one of the lead developers of HDF5, also answered the 
question above:

"""
Yes, George is correct.  The VL data is stored in a "global heap" in  
the file, which is not compressed.  Someday, we would like to switch  
the storage for VL data in datasets (and probably attributes) to use  
the new "fractal heap" code.  We'll also probably use one fractal heap  
per dataset, instead of sharing the VL data for all datasets in one  
centralized location.
"""

So, it seems like they want to address this issue, but the solution is not 
here yet.

> I do need
> compression, because I'm basically storing parts of a terabyte dataset
> on my Eee pc, with which I'm very happy because of its weight and
> easy-to-travel-with design, but is a bit underpowered for real world
> data analysis. That may rule out compression because of CPU cycles, now
> that I think about it, :-/ Well, I'll try compression, serializing and
> storing as a string.

Curiously enough, I've lately been working on a high-performance compressor for 
binary data that could be useful here.  It leverages the vector capabilities of 
Intel and AMD processors (in particular, the SSE2 instruction set, which has 
been present in AMD CPUs since the Athlon 64 and in Intel ones since the 
Pentium 4, including the Atom ;).  Integrating this compressor with the VLArray 
(perhaps via a new pseudo atom) is definitely possible, and could be very 
interesting in many situations.  I'll think more about this...
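
In the meantime, the manual route discussed earlier (serialize, compress, 
store the result as a string) can be sketched roughly as follows.  This is only 
an illustration using the standard pickle and zlib modules; the helper names 
pack() and unpack() are made up for this example, and the compression level is 
an arbitrary choice.  The resulting bytes could then be stored as one row of a 
VLArray that holds strings (for instance one created with a VLStringAtom), and 
unpacked after reading the row back.

```python
import pickle
import zlib


def pack(obj, level=6):
    """Serialize obj with pickle, then compress the bytes with zlib.

    pack() is a hypothetical helper name, not part of PyTables or HDF5.
    """
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(raw, level)


def unpack(blob):
    """Inverse of pack(): decompress the bytes, then unpickle them."""
    return pickle.loads(zlib.decompress(blob))


# One variable-length "row": repetitive data, so it should compress well.
row = list(range(1000)) * 10
blob = pack(row)          # bytes that could be stored as a string row
restored = unpack(blob)   # what you would do after reading the row back
```

A lower level (e.g. 1) trades compression ratio for speed, which may matter 
on a small CPU like the one in the Eee PC.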

Cheers,

-- 
Francesc Alted

"One would expect people to feel threatened by the 'giant
brains or machines that think'.  In fact, the frightening
computer becomes less frightening if it is used only to
simulate a familiar noncomputer."

-- Edsger W. Dijkstra


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
