Thanks for your prompt reply.
[EMAIL PROTECTED] wrote on 08/24/2007 06:57:23 AM:
> Hi Elias,
>
> On Thursday 23 August 2007, you wrote:
> > Francesc,
> >
> > Here's my setup:
> > -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> > PyTables version: 1.3
> > HDF5 version: 1.6.5
> > numarray version: 1.5.1
> > Zlib version: 1.2.1
> > BZIP2 version: 1.0.2 (30-Dec-2001)
> > Python version: 2.4.3 (#1, Apr 21 2006, 14:31:08)
> > [GCC 3.3.3 (SuSE Linux)]
> > Platform: linux2-x86_64
> > Byte-ordering: little
^^^^^^ <--Notice this
Well, since my platform is little-endian and the problem file is also
little-endian, I ignored this. I suspect 'h5import' was somehow creating
big-endian files on my little-endian machine, but I have not verified
this.
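For what it's worth, here's how I'd check the on-disk byte order of each
array (a minimal sketch; 'suspect.h5' is a made-up name, and the
'byteorder' attribute of leaves is per the PyTables docs, so verify it
exists on 1.3):

import tables

f = tables.openFile('suspect.h5', mode='r')   # hypothetical file name
for leaf in f.walkNodes('/', classname='Leaf'):
    # Leaf.byteorder reports the on-disk byte order: 'little' or 'big'
    print leaf._v_pathname, leaf.byteorder
f.close()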
I tried using 'ptrepack' on the 'new' file to remove the shuffle filter
and did not notice any improvement. On the other hand, when I created a
test file *without* shuffle and then used ptrepack to *add* the shuffle
filter, I got some improvement (these are *much* smaller files; the
commands are sketched after the timings):
$ python test_finder.py
Testing file noshuffle.h5
GRID 121731
fh.find_gpfb('121731') took 1.73 sec
Found 3 results for your search
fh.find('1121910', gpf=True) took 0.192 sec
Testing file repacked.h5
GRID 121731
fh.find_gpfb('121731') took 0.989 sec
Found 3 results for your search
fh.find('1121910', gpf=True) took 0.0993 sec
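For reference, the commands to strip and then re-add the shuffle filter
were along these lines (file names invented; the
--complevel/--complib/--shuffle flags are as documented for ptrepack in
PyTables 2.0, so check 'ptrepack -h' on 1.3):

$ ptrepack --complevel=1 --complib=zlib --shuffle=0 new.h5:/ noshuffle.h5:/
$ ptrepack --complevel=1 --complib=zlib --shuffle=1 noshuffle.h5:/ repacked.h5:/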
Perhaps these cases are not stressful enough to draw conclusions about the
shuffle filter for the full-size file. Also, I failed to mention that my
'new' file was actually created with a file.copyNode() call after deleting
and recreating a bad node. I'm planning to rebuild this file, and I'll try
it both with and without shuffle.
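For context, the rebuild looked roughly like this (a sketch; node and
file names are invented, and passing 'filters' through copyNode is per
the PyTables 2.0 docs, so it may differ on 1.3):

import tables

f = tables.openFile('grids.h5', mode='r+')    # hypothetical file
f.removeNode('/grids', 'bad_node')            # drop the bad node
# Copy the rebuilt node into place; an explicit Filters makes the
# shuffle setting of the copied data deliberate rather than inherited.
f.copyNode('/scratch/rebuilt_node', newparent='/grids', newname='bad_node',
           filters=tables.Filters(complevel=1, complib='zlib', shuffle=False))
f.close()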
As an aside, it seems that 'ptrepack' doesn't worry about byte order. I
tried it on one of my 'old' big-endian files, and not only did it take
forever to complete, it corrupted most of the data (a byte-ordering
problem, I assume).
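If that's the case, a workaround may be to read each array back through
PyTables (HDF5 converts to the native byte order on read) and write it to
a fresh file. A minimal sketch, with invented file names and assuming
plain Array nodes at the top level:

import tables

src = tables.openFile('old_bigendian.h5', mode='r')
dst = tables.openFile('old_native.h5', mode='w')
for leaf in src.listNodes('/', classname='Array'):
    data = leaf.read()   # data comes back in native byte order
    dst.createArray(dst.root, leaf.name, data, title=leaf.title)
src.close()
dst.close()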
>
> So, you may want to try PyTables 2.0 or, if you want to stick with 1.3,
> try disabling the shuffle filter (at the expense of reducing the
> compression effectiveness) when creating the 'new' arrays. My
> recommendation, though, is that you switch to 2.0, as there are more
> optimizations (like using numpy natively, among others) that can help
> improve your times still more.
Well, I have 2.0 built but not installed. I'm reluctant to risk breaking
my production codebase, so I have to proceed cautiously. However, this
will definitely motivate me to upgrade!