On Friday 15 May 2009 21:05:22 David Fokkema wrote:
> > Apparently there is no provision in HDF5 for compressing actual data in
> > variable length arrays. However, if this is a must for you you can
> > always compress the data manually before writing it to disk, and
> > decompress it after the reading process.
>
> Hmmm... that's a shame. Is there really no provision for it or is it
> just hard to set up? I'll have to think this over, then.
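
The manual approach mentioned in the quoted text (compress each row before writing, decompress it after reading) could look roughly like the sketch below. It is only an illustration using the standard library: the helper names `pack_row` and `unpack_row` are hypothetical, the choice of pickle plus zlib is an assumption, and a plain Python list stands in for the on-disk container.

```python
import pickle
import zlib


def pack_row(values, level=6):
    """Serialize one variable-length row and compress it to a byte string."""
    raw = pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(raw, level)


def unpack_row(blob):
    """Reverse pack_row: decompress first, then deserialize.

    Note: pickle should only be used on data you wrote yourself.
    """
    return pickle.loads(zlib.decompress(blob))


# Rows of different lengths, as in a variable-length array.
rows = [[1, 2, 3], [7] * 1000, []]

# A plain list stands in for the file here (assumption, not PyTables code).
stored = [pack_row(r) for r in rows]
restored = [unpack_row(b) for b in stored]
```

Each packed row is an opaque byte string, so with PyTables it could presumably be appended to a VLArray built on a `VLStringAtom`. A lower `level` trades compression ratio for fewer CPU cycles, which may matter on an underpowered machine.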
Well, Quincey Koziol, one of the lead developers of HDF5, also answered
this question:

"""
Yes, George is correct. The VL data is stored in a "global heap" in the
file, which is not compressed. Someday, we would like to switch the
storage for VL data in datasets (and probably attributes) to use the new
"fractal heap" code. We'll also probably use one fractal heap per
dataset, instead of sharing the VL data for all datasets in one
centralized location.
"""

So it seems they want to address this issue, but the solution is not
here yet.

> I do need
> compression, because I'm basically storing parts of a terabyte dataset
> on my Eee pc, with which I'm very happy because of its weight and
> easy-to-travel-with design, but is a bit underpowered for real world
> data analysis. That may rule out compression because of CPU cycles, now
> that I think about it, :-/ Well, I'll try compression, serializing and
> storing as a string.

Curiously enough, I have lately been working on a high-performance
compressor for binary data that could be useful here. It leverages the
vector capabilities of Intel and AMD processors (in particular, the SSE2
instruction set, which has been present in AMD CPUs since the Athlon 64
and in Intel ones since the Pentium 4, including the Atom ;). Integrating
this compressor with the VLArray (perhaps via a new pseudo atom) is
definitely possible, and could be very interesting in many situations.
I'll think more about this...

Cheers,

--
Francesc Alted

"One would expect people to feel threatened by the 'giant brains or
machines that think'. In fact, the frightening computer becomes less
frightening if it is used only to simulate a familiar noncomputer."

-- Edsger W. Dijkstra