On Friday 15 May 2009 15:40:16, David Fokkema wrote:
> Hi list,
>
> I don't get this (using pytables 2.1.1):
>
> In [1]: import tables
>
> In [2]: data = tables.openFile('data_new.h5', 'w')
>
> In [3]: data.createVLArray(data.root, 'nosee',
> tables.Int32Atom())
> Out[3]:
> /nosee (VLArray(0,)) ''
>   atom = Int32Atom(shape=(), dflt=0)
>   byteorder = 'little'
>   nrows = 0
>   flavor = 'numpy'
>
> In [4]: data.createVLArray(data.root, 'see', tables.Int32Atom(),
> filters=tables.Filters(complevel=1))
> Out[4]:
> /see (VLArray(0,), shuffle, zlib(1)) ''
>   atom = Int32Atom(shape=(), dflt=0)
>   byteorder = 'little'
>   nrows = 0
>   flavor = 'numpy'
>
> In [5]: a = 1000000 * [200]
>
> In [6]: for i in range(50):
>    ...:     data.root.see.append(a)
>    ...:
>    ...:
>
> In [7]: data.flush()
>
> And looking at the file:
>
> 191M 2009-05-15 15:37 data_new.h5
>
> Writing the same data to the uncompressed array adds another 191 MB to the
> file. So I really see no compression at all. I also tried zlib(9). Why are my
> arrays not compressed? The repetitive values seem like a perfect
> candidate for compression.

Yes, I can reproduce this.  Well, at least it seems that PyTables is setting 
the filters correctly.  For the 'see' dataset h5ls -v is reporting:

    Chunks:    {2048} 32768 bytes
    Storage:   800 logical bytes, 391 allocated bytes, 204.60% utilization
    Filter-0:  shuffle-2 OPT {16}
    Filter-1:  deflate-1 OPT {1}
    Type:      variable length of
                   native int

which clearly demonstrates that the filters are correctly installed in the HDF5 
pipeline :-\

This definitely seems to be an HDF5 issue.  To tell the truth, I've never seen 
good compression ratios in VLArrays (although I never thought that compression 
was completely nonexistent!).
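
For reference, here is a rough sanity check of the numbers above, using only
the Python standard library (no PyTables or HDF5 involved).  It is a sketch
under stated assumptions, not a diagnosis: the 16 bytes per row is my reading
of the {16} reported for the shuffle filter, and the interpretation that the
800 "logical bytes" are per-row references rather than the integers
themselves is a guess that this thread does not confirm.

```python
import struct
import zlib

ROWS = 50                  # number of appends in the session above
VALUES_PER_ROW = 1000000   # len(a)
ITEM_SIZE = 4              # bytes per Int32Atom element
REF_SIZE = 16              # ASSUMPTION: bytes stored per row inside the chunk

# Raw integer payload written by the 50 appends.
payload_bytes = ROWS * VALUES_PER_ROW * ITEM_SIZE
payload_mib = payload_bytes / 2.0 ** 20

# Size of the chunked dataset if it only holds one reference per row.
reference_bytes = ROWS * REF_SIZE

# One row serialized as little-endian int32, compressed with plain zlib
# at level 1.  No shuffle step here, so this is only a rough stand-in
# for what shuffle + zlib(1) could achieve on the actual values.
raw_row = struct.pack('<i', 200) * VALUES_PER_ROW
compressed_row = zlib.compress(raw_row, 1)

print("payload: %d bytes (about %.0f MiB)" % (payload_bytes, payload_mib))
print("per-row references: %d bytes" % reference_bytes)
print("one row: %d raw bytes, %d bytes after zlib(1)"
      % (len(raw_row), len(compressed_row)))
```

The payload size agrees with the 191M file and the reference size agrees with
the 800 logical bytes from h5ls, so if the assumption holds, the filters only
ever see those 800 bytes while the integers are stored somewhere the pipeline
does not reach.  The last line shows how small a single row gets when the
values themselves are compressed.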

I'll try to report this to the hdf-forum list and get back to you.

Cheers,

-- 
Francesc Alted

"One would expect people to feel threatened by the 'giant
brains or machines that think'.  In fact, the frightening
computer becomes less frightening if it is used only to
simulate a familiar noncomputer."

-- Edsger W. Dijkstra


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
