On Wednesday 10 November 2010 15:41:50 Gerrit Holl wrote:
> Hi,
> 
> On 10 November 2010 14:39, Francesc Alted <fal...@pytables.org> wrote:
> > On Wednesday 10 November 2010 12:08:34 Francesc Alted wrote:
> >> On Tuesday 09 November 2010 16:38:10 Francesc Alted wrote:
> >> > Hmm, this seems a problem with Blosc indeed.  Could you put the
> >> > compressed datafile (for example, noaa18_2008_zlib3.h5) in a
> >> > public place so that I can see what's going on?
> >> 
> >> Thanks Gerrit for your sample file.  I've looked at the issue, but
> >> it looks like a (complex) inefficiency in the Lempel-Ziv part of
> >> Blosc (BloscLZ).  I've no time now to look into this, so I
> >> created a ticket:
> >> 
> >> http://blosc.pytables.org/trac/ticket/7
> >> 
> >> Hope this can be addressed in the near future.
> > 
> > Hmm, I realized what's going on.  The fix is in PyTables trunk now
> > (or the std-2.2 branch, if you prefer).  Can you give it a try?
> 
> I tested for the reduced version below:
> 
> -rw-r--r-- 1 gerrit students 3.2M Wednesday 10-11-2010 15:36:20
> noaa18_2008_blosc1_reduced_newversion.h5
> -rw-r--r-- 1 gerrit students 3.2M Wednesday 10-11-2010 15:36:20
> noaa18_2008_blosc2_reduced_newversion.h5
> -rw-r--r-- 1 gerrit students 2.8M Wednesday 10-11-2010 15:36:20
> noaa18_2008_blosc3_reduced_newversion.h5
> -rw-r--r-- 1 gerrit students 670K Wednesday 10-11-2010 15:36:20
> noaa18_2008_blosc4_reduced_newversion.h5
> -rw-r--r-- 1 gerrit students 670K Wednesday 10-11-2010 15:36:20
> noaa18_2008_blosc5_reduced_newversion.h5
> -rw-r--r-- 1 gerrit students 636K Wednesday 10-11-2010 15:36:21
> noaa18_2008_blosc6_reduced_newversion.h5
> -rw-r--r-- 1 gerrit students 640K Wednesday 10-11-2010 15:36:21
> noaa18_2008_blosc7_reduced_newversion.h5
> -rw-r--r-- 1 gerrit students 638K Wednesday 10-11-2010 15:36:21
> noaa18_2008_blosc8_reduced_newversion.h5
> -rw-r--r-- 1 gerrit students 640K Wednesday 10-11-2010 15:36:21
> noaa18_2008_blosc9_reduced_newversion.h5
> 
> 7, 8 and 9 are still larger than 6, but now only very slightly.
> Write speeds were similar; I didn't test read speeds, but higher
> compression levels are usually slower.  So maybe level 6 (which I'm
> using now) is still optimal.  Is this expected?

Yes, this is expected now.  Blosc needs small chunksizes in order to be 
fast, and beyond 64 KB blocks you will generally get only marginal 
improvements (or even a decline, as in this case).  OTOH, Blosc can 
still reach better compression ratios with chunks beyond 64 KB, but only 
for typesizes < 256 bytes.  Your typesizes are around 300 bytes, so 
this additional boost in compression ratio cannot be achieved here.
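
The kind of level-by-level size comparison quoted above can be sketched in a 
few lines of Python.  Note that this is only an illustration: it uses zlib 
from the standard library as a stand-in for Blosc, the payload is made up, 
and the 1% tolerance and the helper names are assumptions of this sketch, 
not anything taken from PyTables or Blosc.

```python
import struct
import zlib


def compressed_sizes(payload, levels=range(1, 10)):
    """Return {level: compressed size in bytes} for each compression level."""
    return {level: len(zlib.compress(payload, level)) for level in levels}


def best_level(sizes, tolerance=0.01):
    """Pick the lowest level whose size is within `tolerance` of the smallest."""
    smallest = min(sizes.values())
    for level in sorted(sizes):
        if sizes[level] <= smallest * (1 + tolerance):
            return level


# Hypothetical payload: 20000 little-endian 32-bit integers with a short period.
payload = b"".join(struct.pack("<i", 1000 + (i % 97)) for i in range(20000))

sizes = compressed_sizes(payload)
for level, size in sizes.items():
    print(level, size)
print("lowest level within 1% of the best:", best_level(sizes))
```

Choosing the lowest level that is close enough to the best size follows the 
same reasoning as staying with level 6 above: once the higher levels stop 
shrinking the data, there is little reason to pay for them.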

> One side note: I forgot to update numexpr before recompiling
> pytables, and pytables only complained at run time, not at compile
> time.  It would be nice if pytables checked such essential
> dependencies at compile or install time.

I don't agree.  Numexpr is not a compile-time requirement, only a 
run-time one.  If the installed version is older than the recommended 
one, a message is issued.  Hmm, now that I think about it, I suppose it 
would be better to issue a true Python warning instead of a plain print.  
Added a ticket:

http://pytables.org/trac/ticket/311
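
To make the difference between a plain print and a true Python warning 
concrete, here is a minimal sketch using the standard `warnings` module.  
The function names and the version numbers are hypothetical and chosen only 
for illustration; this is not the code in PyTables.

```python
import warnings


def parse_version(text):
    """Turn '1.4.1' into (1, 4, 1); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_min_version(name, found, recommended):
    """Issue a UserWarning if `found` is older than `recommended`.

    Returns True when the version is recent enough, False otherwise.
    """
    if parse_version(found) < parse_version(recommended):
        warnings.warn(
            "%s version %s is older than the recommended %s"
            % (name, found, recommended),
            UserWarning,
            stacklevel=2,
        )
        return False
    return True


# Hypothetical version numbers, for illustration only.
check_min_version("numexpr", "1.3", "1.4.1")
```

Unlike a print, a warning issued this way can be silenced, logged, or turned 
into an error by the user through the usual warning filters.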

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
