Hi Francesc,

<snip>

> > 3. What partial results do I have?
> >
> > I have written some code and run a few tests with the pytable-data
> > structure as outlined above.
> >
> > In particular, I have only saved the mesh in the file (i.e. no /data
> > group for now), and compared using compressed CArrays (one for
> > /mesh/positions, one for /mesh/simplices and one for /mesh/regions)
> > against using normal Arrays and associating a compression filter
> > with the group /mesh
> > (like this:
> >  myfilter = tables.Filters(complevel=5, complib='zlib')
> >  meshgroup = f.createGroup('/', 'mesh', filters=myfilter)
> > )
> >
> > While saving the data (for a particular example) to a text-based file
> > (not from PyTables) needs 680kB, using compressed CArrays takes only
> > 110k. Compressing the mesh group (and therefore implicitly all
> > sub-nodes) requires 250k.
>
> I think you are a bit misled here. Array objects don't support
> compression at all (we have to stress this more in the docs). So, when
> using Arrays, you require less space than with ASCII files only because
> HDF5 uses a *binary* representation on disk.

Ah, that's interesting; thank you.

> So, if you need compression, you will need to use any Leaf
> container in PyTables other than Array objects (which are meant for
> quick and dirty management of numpy arrays).

Okay, that makes sense. Arrays are indeed much easier to use, as you
don't have to declare their size (shape) when you create them.
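For my own understanding, here is a minimal sketch of that difference
(file names and data are invented; note that current PyTables spells the
calls open_file/create_array/create_carray, whereas the version discussed
in this thread used openFile/createArray/createCArray):

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical data: highly compressible zeros.
data = np.zeros((1000, 100))
filters = tables.Filters(complevel=5, complib='zlib')

tmpdir = tempfile.mkdtemp()
array_path = os.path.join(tmpdir, 'array.h5')
carray_path = os.path.join(tmpdir, 'carray.h5')

# Plain Array: the filters attached to the group have no effect,
# so the data is stored as uncompressed binary.
with tables.open_file(array_path, 'w') as f:
    g = f.create_group('/', 'mesh', filters=filters)
    f.create_array(g, 'positions', data)

# CArray: chunked storage, so the zlib filter actually applies.
with tables.open_file(carray_path, 'w') as f:
    ca = f.create_carray('/', 'positions',
                         tables.Float64Atom(), data.shape,
                         filters=filters)
    ca[:] = data

array_size = os.path.getsize(array_path)
carray_size = os.path.getsize(carray_path)
print(array_size, carray_size)
```

On this all-zero data the CArray file should come out far smaller than
the Array file, which stays at roughly the raw binary size.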

> > 4. What is the question, then??
> >
> > So clearly the CArray wins, and in some way I can see where that comes
> > from: I have to specify the chunk size, so the system knows which
> > pieces of data it should try to compress together.
> >
> > It appears that I cannot tell the compression filter for the group
> > /mesh how to chunk the data when I create the group, correct?
> > Therefore, CArrays are my best bet. (Are they?)
> >
> > Coming to my main question: if CArrays give the best compression for
> > the mesh data, then I would like to have a _table_ of CArrays for my
> > /data/field1 node, to store the time-dependent data most efficiently.
> >
> > Is this possible? (I couldn't find an example for creating CArrays in
> > tables.)
> >
> > What I have done for now is to associate a compression filter with
> > the whole table /data/field1, but from the partial results obtained
> > for the /mesh/* nodes, this seems not to be the most space-efficient
> > approach.
>
> Table objects also support compression. It is possible that your data
> is less compressible than the mesh, and this may be the reason why you
> see less compression on Tables. Another possibility is that Tables are
> made of heterogeneous data, which normally offers fewer chances of
> achieving good compression ratios than homogeneous containers
> (CArray, EArray or VLArray).

With the new information (that arrays don't compress), I have done
some more tests saving just the 'mesh' (as outlined in my last email).

Reminder of previous findings:

- 680k in ASCII format
- 110k using 3 compressed CArrays in /mesh
- 250k using 3 Arrays in /mesh (which is just the binary data)

New results:

- 250k writing the 3 arrays as one row in a table (consistent with the line above)
- 114k writing the 3 arrays as one row in a compressed table

So it turns out that using 3 compressed CArrays gives pretty much the
same compression as writing the 3 arrays (in one row) into a compressed
table (the difference is less than 4%, and will possibly get smaller for
larger files).

That's good news, as I can write my main data in a compressed table
(which is what I thought I wanted to do).
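For reference, this is roughly how such a one-row compressed table can be
built (the shapes are invented for illustration; current PyTables spells
the calls open_file/create_table, while the version in this thread used
openFile/createTable):

```python
import os
import tempfile

import numpy as np
import tables


# One row holding three array-valued columns; shapes are made up.
class MeshRow(tables.IsDescription):
    positions = tables.Float64Col(shape=(100, 3))
    simplices = tables.Int32Col(shape=(200, 4))
    regions = tables.Int32Col(shape=(200,))


filters = tables.Filters(complevel=5, complib='zlib')
path = os.path.join(tempfile.mkdtemp(), 'mesh.h5')

with tables.open_file(path, 'w') as f:
    # The filters apply to the whole table, rows included.
    table = f.create_table('/', 'mesh', MeshRow, filters=filters)
    row = table.row
    row['positions'] = np.zeros((100, 3))
    row['simplices'] = np.zeros((200, 4), dtype=np.int32)
    row['regions'] = np.zeros(200, dtype=np.int32)
    row.append()

# Read it back to check the row arrived.
with tables.open_file(path) as f:
    nrows = f.root.mesh.nrows
```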

Three more questions out of curiosity (if I may):

In this piece of code (where points, simplex and simplexregion are
numpy arrays):
----------------------
myfilter = tables.Filters(complevel=5, complib='zlib')

points_chunk = tables.Float64Atom(shape=points.shape)
points_carray = f.createCArray(meshgroup, 'points', points.shape,
                               points_chunk, filters=myfilter)
points_carray[:] = points

simplex_chunk = tables.Int32Atom(shape=simplex.shape)
simplex_carray = f.createCArray(meshgroup, 'simplices', simplex.shape,
                                simplex_chunk, filters=myfilter)
simplex_carray[:] = simplex

simplexregion_chunk = tables.Int32Atom(shape=simplexregion.shape)
simplexregion_carray = f.createCArray(meshgroup, 'simplicesregions',
                                      simplexregion.shape,
                                      simplexregion_chunk,
                                      filters=myfilter)
simplexregion_carray[:] = simplexregion
------------------------

(i) am I correct to assume that each of the three CArrays is
compressed individually (if you see what I mean)?

(ii) if so, and I wanted to compress the second and third CArrays
together (because I know that they contain similar data and should
compress well jointly), could I do this somehow using CArrays (if that
question makes sense)?

(iii) when compressing tables, is there any way to tell the
compression routine about the 'chunk size' to use?
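As a quick plain-zlib sanity check for (ii), independent of PyTables
(the data here is artificial): compressing two similar buffers together
can beat compressing them separately, because the compressor can reuse
matches across the boundary.

```python
import random
import zlib

# Deterministic pseudo-random buffer: essentially incompressible
# on its own.
rng = random.Random(42)
a = bytes(rng.randrange(256) for _ in range(20000))
# Second buffer identical to the first, mimicking two arrays with
# very similar content.
b = a

# Separately (one filter per CArray) vs. together (both arrays
# stored in one compressed chunk).
separately = len(zlib.compress(a, 5)) + len(zlib.compress(b, 5))
together = len(zlib.compress(a + b, 5))
print(separately, together)
```

Here the joint compression should need roughly half the space, since the
second buffer is found verbatim in zlib's 32kB window.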



> Hope that helps,

It does indeed.

Best wishes,

Hans


>
> --
> Francesc Altet    |  Be careful about using the following code --
> Carabos Coop. V.  |  I've only proven that it works,
> www.carabos.com   |  I haven't tested it. -- Donald Knuth

--
Hans Fangohr
School of Engineering Sciences
University of Southampton
Phone: +44 (0) 238059 8345

Email: [EMAIL PROTECTED]
http://www.soton.ac.uk/~fangohr




