Hi Hans,

On Wed, 2007-01-24 at 13:13 +0000, Hans Fangohr wrote:
> Hi pytables creators and users,
> 
> firstly congratulations and thank you for providing pytables; it seems
> to be an excellent interface to hdf5.

Glad that you think so. See my comments intertwined in your message.

> 
> I have started using pytables and am not sure what is the best
> approach for my problem. In short, I'd like to save time dependent
> finite element data with good compression. I have scanned the web and
> the mailing lists but couldn't find a good answer.
> 
> 1. What would I like to achieve?
> 
> I'd like to save time dependent finite element mesh data with good
> compression.
> 
> In more detail, we have the following data structures (for example, in
> 3D):
> 
> The mesh itself:
> 
> -the positions of the mesh nodes (Nx3 float array for N nodes)
> -the indices of nodes making up simplices (Mx4 int array for M simplices)
> -the region to which each simplex belongs (Mx1 int array for M simplices)
> 
> The mesh is not time dependent, and needs to be stored only once.
> 
> The actual time dependent data for one 3D vector field (here with
> linear basis functions) has the structure
> 
> -Nx3 float array for every time step plus
> -some small extra data (such as the 'time' for this time step)
> 
> 2. What options have I considered?
> 
> It seems wise to store the mesh together with the data (in the same h5
> file). One file layout could therefore be:
> 
> A group
> 
> /mesh
> 
> with array nodes
> 
> /mesh/positions
> /mesh/simplices
> /mesh/regions
> 
> and another group
> 
> /data/
> 
> with a table node
> 
> /data/field1
> 
> which keeps the time dependent data. The table would have an
> Array(Nx3) column for the vector field, and an extra column for the time
> (and some columns for other observables I didn't explain here).
> 
> So far this structure feels like quite a canonical choice -- however,
> if you think this could be done better, then please let me know!
> 
> My question is geared towards how to achieve good compression for this
> type of data (see below). I am happy to choose between lzo, zlib and
> bzip2; their different advantages and disadvantages are well explained
> in the manual. It is not so clear to me when to use CArrays and when
> not.
> 
> 3. What partial results do I have?
> 
> I have written some code and run a few tests with the pytable-data
> structure as outlined above.
> 
> In particular, I have only saved the mesh in the file (i.e. no /data
> group for now), and compared using compressed CArrays (one for
> /mesh/positions, one for /mesh/simplices and one for /mesh/regions)
> against using normal Arrays and associating a compression filter with
> the group /mesh
> (like this:
>  myfilter = tables.Filters(complevel=5,complib='zlib')
>  meshgroup = f.createGroup('/','mesh',filters=myfilter)
> )
> 
> While saving the data (for a particular example) in a text-based file
> (not from pytables) needs 680 kB, using compressed CArrays takes only
> 110 kB. Compressing the mesh group (and therefore implicitly all
> sub-nodes) requires 250 kB.

I think you are a bit misled here. Array objects don't support
compression at all (we have to stress this more in the docs). So when
using Arrays, you need less space than with ASCII files because HDF5
uses a *binary* representation on disk, and only because of this.

So, if you need compression, you will need to use any Leaf container in
PyTables other than Array objects (which are meant for quick-and-dirty
management of NumPy arrays).

> 
> 4. What is the question, then??
> 
> So clearly the CArray wins, and in some way I can see where that comes
> from: I have to specify the chunk size, so the system knows what it
> should look at and try to compress together.
> 
> It appears that I cannot tell the compression filter for the group
> /mesh how to chunk the data when I create the group, correct?
> Therefore, CArrays are my best bet. (Are they?)
> 
> Coming to my main question: it would appear that -- if CArrays give
> the best compression for the mesh data -- I would like to have a
> _table_ of CArrays for my /data/field1 node to store the time
> dependent data most efficiently.
> 
> Is this possible? (I couldn't find an example for creating CArrays in
> tables.)
> 
> What I have done for now, is to associate a compression filter with
> the whole table /data/field1, but from the partial results obtained
> for the /mesh/* nodes, this seems to be not the most space efficient
> approach.

Table objects also support compression. It is possible that your data
is less compressible than the mesh, and this can be the reason why you
see worse compression on Tables. Another possibility is that Tables are
made of heterogeneous data, which normally offers fewer opportunities
for achieving good compression ratios than homogeneous containers
(CArray, EArray or VLArray).

As always, your goals together with a good deal of experiments will
hopefully lead you to an optimal solution.

> 
> I realise this is quite a long email; I hope it will be of some use to
> other people having the same question at some later point. I explained
> my data structure in detail to make sure I am thinking about the right
> approach.

Don't worry. It's always nice to see new users coming with their own
problems.

Hope that helps,

-- 
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works, 
www.carabos.com   |  I haven't tested it. -- Donald Knuth

