On 25.05.2013 14:27, Andreas Hilboll wrote:
> Hi,
> 
> the netcdf4-python project
> (http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4.Dataset-class.html#createVariable)
> supports a "least_significant_digit" attribute when creating a
> variable/array. This leads to a truncation of the array data before
> storing it to disk
> (https://code.google.com/p/netcdf4-python/source/browse/trunk/netCDF4_utils.py#26),
> which makes zlib compression more effective.
> 
> My question: Is the same true when I compress the array data with blosc?
> Will I get significant compression improvements when truncating my data
> before storing it in pytables?

Actually, I can now answer my own question: Yes, it does save some
space. As a test, I created a file with two 5760x2880x12 arrays of dtype
float32. The data values all lie between -1E17 and +1E17. When I
truncate the input values to 1E11 (least_significant_digit=-11), I
get about 28% space reduction (578M down to 418M):

-rw-r--r-- 1 andreas andreas 418M Mai 25 16:47 satdb_blosc9-11.h5
-rw-r--r-- 1 andreas andreas 578M Mai 25 16:34 satdb_blosc9.h5
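
For readers who want to try the effect without the full data set, here is a minimal sketch of the truncation idea. It is not the netcdf4-python code, only the same general approach (scale by a power of two, round, scale back), and it uses the standard-library zlib as a stand-in for blosc. The helper names `to_float32`, `quantize` and `compressed_size`, the random test data and the sample size are all assumptions made for illustration.

```python
import math
import random
import struct
import zlib


def to_float32(values):
    """Round Python floats to the nearest float32, mimicking dtype float32."""
    fmt = f"<{len(values)}f"
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))


def quantize(values, least_significant_digit):
    """Round each value to a multiple of a power of two.

    The step is chosen so that it is no coarser than
    10**-least_significant_digit. Hypothetical helper, for illustration only.
    """
    bits = math.ceil(math.log2(10.0 ** least_significant_digit))
    scale = 2.0 ** bits
    # Scaling by a power of two is exact, so only round() loses information.
    return [round(v * scale) / scale for v in values]


def compressed_size(values):
    """Size in bytes of the zlib-compressed float32 representation."""
    raw_bytes = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw_bytes, 9))


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed test data: uniform noise in +-1E17, as in the description above.
    data = to_float32([rng.uniform(-1e17, 1e17) for _ in range(20000)])
    truncated = quantize(data, -11)
    print("original :", compressed_size(data), "bytes")
    print("truncated:", compressed_size(truncated), "bytes")
```

Rounding to a power-of-two step zeroes the trailing mantissa bits of each float32, which is what gives the compressor something to work with. The actual gain depends on the data and on the compressor, so the numbers printed by this sketch should not be compared directly with the file sizes above.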

Would you guys be interested in having this as an optional filter? If
so, I'd be happy to submit a PR for this.


-- Andreas.

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
