On 5/25/13 5:06 PM, Andreas Hilboll wrote:
> On 25.05.2013 14:27, Andreas Hilboll wrote:
>> Hi,
>> the netcdf4-python project
>> (http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4.Dataset-class.html#createVariable)
>> supports a "least_significant_digit" attribute when creating a
>> variable/array. This leads to a truncation of the array data before
>> storing it to disk
>> (https://code.google.com/p/netcdf4-python/source/browse/trunk/netCDF4_utils.py#26),
>> which makes zlib compression more effective.
>> My question: Is the same true when I compress the array data with blosc?
>> Will I get significant compression improvements when truncating my data
>> before storing it in pytables?
> Actually, I can now answer my own question: Yes, it does save some
> space. As a test, I created a file with two 5760x2880x12 arrays of dtype
> float32. The data values all lie within +-1E17. When I
> truncate the input values to 1E11 (least_significant_digit=-11), I
> get about 20% space reduction:
> -rw-r--r-- 1 andreas andreas 418M Mai 25 16:47 satdb_blosc9-11.h5
> -rw-r--r-- 1 andreas andreas 578M Mai 25 16:34 satdb_blosc9.h5
> Would you guys be interested in having this as an optional filter? If
> so, I'd be happy to submit a PR for this.
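
The truncation discussed in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name quantize and the helper compressed_size are hypothetical and are not part of PyTables; the rounding rule (round to a multiple of a power of two no coarser than 10**-least_significant_digit) is my understanding of what netcdf4-python does; and zlib from the standard library stands in for Blosc so that the example needs nothing beyond NumPy.

```python
import zlib

import numpy as np


def quantize(data, least_significant_digit):
    """Round `data` to a precision of about 10**-least_significant_digit.

    Hypothetical helper, not a PyTables API. Values are rounded to the
    nearest multiple of a power of two, so the discarded low-order
    mantissa bits become zeros that a compressor can exploit.
    """
    # Power-of-two scale whose step (1/scale) does not exceed the
    # requested decimal precision.
    bits = np.ceil(np.log2(10.0 ** least_significant_digit))
    scale = 2.0 ** bits
    # Compute in float64, then return to the input dtype.
    rounded = np.around(scale * data.astype(np.float64)) / scale
    return rounded.astype(data.dtype)


def compressed_size(arr):
    """Size in bytes of the array buffer after zlib compression (level 9)."""
    return len(zlib.compress(arr.tobytes(), 9))


# Synthetic float32 data in the +-1E17 range mentioned in the message.
rng = np.random.default_rng(0)
data = rng.uniform(-1e17, 1e17, size=100_000).astype(np.float32)
truncated = quantize(data, least_significant_digit=-11)

print("raw bytes after zlib:      ", compressed_size(data))
print("truncated bytes after zlib:", compressed_size(truncated))
```

How much space is saved depends heavily on the data and on the compressor, so the figures printed here should not be expected to match the file sizes reported above.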

Yeah, quantize used to be in the netcdf3 module in old versions of 
PyTables (with the introduction of netcdf4-python this was removed).  
But it would be interesting to have it around again.  It would be nice 
if you could contribute the PR, together with some docs (a small tutorial 
would be really great).

For efficiency, the place for this filter would be inside Blosc, but 
that's another story :)


Francesc Alted

Pytables-users mailing list