Hi pytables creators and users,

Firstly, congratulations and thank you for providing pytables; it seems
to be an excellent interface to hdf5.

I have started using pytables and am not sure what the best approach
for my problem is. In short, I'd like to save time-dependent finite
element data with good compression. I have searched the web and the
mailing lists but couldn't find a good answer.

1. What would I like to achieve?

I'd like to save time-dependent finite element mesh data with good
compression.

In more detail, we have the following data structures (for example, in 3D):

The mesh itself:

- the positions of the mesh nodes (Nx3 float array for N nodes)
- the indices of the nodes making up the simplices (Mx4 int array for M simplices)
- the region to which each simplex belongs (Mx1 int array for M simplices)

The mesh is not time-dependent and needs to be stored only once.

The actual time-dependent data for one 3D vector field (here with
linear basis functions) has the structure

- an Nx3 float array for every time step, plus
- some small extra data (such as the 'time' for this time step)

2. What options have I considered?

It seems wise to store the mesh together with the data (in the same h5
file). One file layout could therefore be:

A group

/mesh

with array nodes

/mesh/positions
/mesh/simplices
/mesh/regions

and another group

/data/

with a table node

/data/field1

which keeps the time dependent data. The table would have an
Array(Nx3) column for the vector field, and an extra column for the time
(and some columns for other observables I didn't explain here).
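
To make the layout concrete, here is a minimal sketch of it. It assumes
a recent pytables (method names open_file, create_group, create_array,
create_table); the helper write_layout, the column names "time" and
"field", the mesh sizes and the placeholder field values are invented
for the example, and no compression is applied yet.

```python
import os
import tempfile

import numpy as np
import tables


def write_layout(path, positions, simplices, regions, n_steps):
    """Write the mesh once and append one table row per time step."""
    n_nodes = positions.shape[0]

    # Row description: a scalar time plus one (N, 3) array per row.
    # Built as a dict because the column shape depends on N.
    description = {
        "time": tables.Float64Col(pos=0),
        "field": tables.Float64Col(shape=(n_nodes, 3), pos=1),
    }

    with tables.open_file(path, mode="w") as f:
        mesh = f.create_group("/", "mesh")
        f.create_array(mesh, "positions", positions)
        f.create_array(mesh, "simplices", simplices)
        f.create_array(mesh, "regions", regions)

        data = f.create_group("/", "data")
        table = f.create_table(data, "field1", description)

        row = table.row
        for step in range(n_steps):
            row["time"] = 0.1 * step
            # Placeholder field values; real data would come from the solver.
            row["field"] = np.full((n_nodes, 3), float(step))
            row.append()
        table.flush()


# Small random mesh, purely for illustration.
rng = np.random.default_rng(0)
n_nodes, n_simplices = 50, 120
positions = rng.random((n_nodes, 3))
simplices = rng.integers(0, n_nodes, size=(n_simplices, 4))
regions = rng.integers(1, 3, size=(n_simplices, 1))

layout_path = os.path.join(tempfile.mkdtemp(), "layout_sketch.h5")
write_layout(layout_path, positions, simplices, regions, n_steps=4)

# Read back a few things to check the structure.
with tables.open_file(layout_path, mode="r") as f:
    stored_shape = f.root.mesh.positions.shape
    n_rows = f.root.data.field1.nrows
    last_field = f.root.data.field1[3]["field"]
```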

So far this structure feels like a fairly canonical choice. However,
if you think this could be done better, then please let me know!

My question is about how to achieve good compression for this type of
data (see below). I am happy to choose between lzo, zlib and bzip2;
their different advantages and disadvantages are well explained in the
manual. It is less clear to me when to use CArrays and when not.

3. What partial results do I have?

I have written some code and run a few tests with the pytables data
structure outlined above.

In particular, I have saved only the mesh in the file (i.e. no /data
group for now), and compared using compressed CArrays (one for
/mesh/positions, one for /mesh/simplices and one for /mesh/regions)
against using normal Arrays and associating a compression filter with
the group /mesh, like this:

 myfilter = tables.Filters(complevel=5, complib='zlib')
 meshgroup = f.createGroup('/', 'mesh', filters=myfilter)

While saving the data (for a particular example) in a text-based file
(not from pytables) needs 680 kB, using compressed CArrays takes only
110 kB. Compressing the mesh group (and therefore implicitly all
sub-nodes) requires 250 kB.
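
A minimal sketch of how such a comparison could be reproduced. It is not
the code used for the numbers above: the mesh here is synthetic, the
helper write_mesh and all variable names are made up for illustration,
and it uses the current pytables method names (create_group,
create_array, create_carray) rather than createGroup. The sizes it
reports will depend on the data.

```python
import os
import tempfile

import numpy as np
import tables

zlib5 = tables.Filters(complevel=5, complib="zlib")


def write_mesh(path, arrays, use_carray):
    """Store each array under /mesh, as CArray or as plain Array."""
    with tables.open_file(path, mode="w") as f:
        # In both variants the group carries the compression filter.
        mesh = f.create_group("/", "mesh", filters=zlib5)
        for name, values in arrays.items():
            if use_carray:
                # Compressed, chunked array; chunk shape chosen by pytables.
                f.create_carray(mesh, name, obj=values, filters=zlib5)
            else:
                # Plain Array, relying on the group filter only.
                f.create_array(mesh, name, values)
    return os.path.getsize(path)


# Synthetic stand-in for a real mesh (assumption, not measured data).
n_nodes, n_simplices = 20000, 60000
synthetic_mesh = {
    "positions": np.linspace(0.0, 1.0, n_nodes * 3).reshape(n_nodes, 3),
    "simplices": (np.arange(n_simplices * 4) % n_nodes).reshape(n_simplices, 4),
    "regions": np.ones((n_simplices, 1), dtype=np.int64),
}

tmpdir = tempfile.mkdtemp()
size_array = write_mesh(os.path.join(tmpdir, "mesh_array.h5"),
                        synthetic_mesh, use_carray=False)
size_carray = write_mesh(os.path.join(tmpdir, "mesh_carray.h5"),
                         synthetic_mesh, use_carray=True)

print("Array  + group filter:", size_array, "bytes")
print("CArray + zlib filter :", size_carray, "bytes")
```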

4. What is the question, then?

So clearly the CArray wins, and in some way I can see where that comes
from: I have to specify the chunk size, so the system knows what it
should look at and try to compress together.

It appears that I cannot tell the compression filter for the group
/mesh how to chunk the data when I create the group, correct?
Therefore, CArrays are my best bet. (Are they?)

Coming to my main question: if CArrays give the best compression for
the mesh data, then I would like to have a _table_ of CArrays for my
/data/field1 node, to store the time-dependent data most efficiently.

Is this possible? (I couldn't find an example for creating CArrays in
tables.)

What I have done for now is to associate a compression filter with
the whole table /data/field1, but judging from the partial results
obtained for the /mesh/* nodes, this does not seem to be the most
space-efficient approach.
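
For reference, a sketch of what associating a filter with the whole
table can look like, assuming a recent pytables in which create_table
accepts both a filters and a chunkshape argument. The mesh size, the
column names "time" and "field", the choice of one row per chunk and the
zero-filled field values are all illustrative assumptions; whether this
chunking actually helps compression would have to be measured.

```python
import os
import tempfile

import numpy as np
import tables

n_nodes = 1000  # hypothetical mesh size for the sketch

# One row per time step: a scalar time plus an (N, 3) field.
description = {
    "time": tables.Float64Col(pos=0),
    "field": tables.Float64Col(shape=(n_nodes, 3), pos=1),
}
table_filter = tables.Filters(complevel=5, complib="zlib")

table_path = os.path.join(tempfile.mkdtemp(), "table_sketch.h5")
with tables.open_file(table_path, mode="w") as f:
    table = f.create_table(
        "/data", "field1", description,
        filters=table_filter,
        chunkshape=(1,),      # one time step (row) per compressed chunk
        createparents=True,   # creates the /data group on the fly
    )
    row = table.row
    for step in range(5):
        row["time"] = 0.1 * step
        row["field"] = np.zeros((n_nodes, 3))  # placeholder values
        row.append()
    table.flush()

    used_chunkshape = table.chunkshape
    used_complib = table.filters.complib
    stored_rows = table.nrows
```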

I realise this is quite a long email; I hope it will be of some use to
other people having the same question at some later point. I explained
my data structure in detail to make sure I am thinking about the right
approach.

Many thanks for any answers, advice, corrections etc in advance,

Hans




_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users