On Friday, 31 July 2009, at 15:53 +0200, Kim Hansen wrote:
> >> > Please, try again with the above directives,
> >>
> >> Ah, OK, stupid me.
> >>
> >> OK, as I indicated in a previous post, that is actually what I asked
> >> about there, and in fact I had already tried to copy lzo1.dll to
> >>
> >>     C:\Python25\Lib\site-packages\tables
> >>
> >> but there the test gave the same warning that it could not find LZO.
> >>
> >> However, upon moving it to C:\Windows\System32 (man, I really dislike
> >> fiddling with DLLs in that dir), it was able to find it when doing
> >> "import tables; tables.test()".
> >>
> >> I don't understand why it could not find it in the \tables folder.
> >
> > Neither do I. Perhaps somebody who knows Windows and its intricacies
> > better can shed more light here.
> >
> > Well, at least you finally have LZO support in PyTables.
>
> Yes, the important point is that it is possible to make it work now.
>
> My application is write once, read many, and the data sizes are large,
> so LZO caught my attention while reading the Optimization tips chapter:
> its decompression should be very fast, and the extra processing could be
> more than compensated for by avoiding a bottleneck in the file I/O speed
> of the HDDs.
>
> So I wrote myself a little test program with some test data whose
> entropy (compressibility) is approximately similar to my real data:
>
> import os
> from stat import ST_SIZE
> import time
>
> import tables as tb
> import numpy as np
>
> total_size = 10 ** 10
> chunk_size = 5 * 10 ** 5
> complib = 'lzo'
> max_comp_lvl = 9
> h5name = 'd:/test.h5'
> dtype = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4')])
> recs = total_size / dtype.itemsize
> recs_per_chunk = chunk_size / dtype.itemsize
> test_data_chunk = np.empty(recs_per_chunk, dtype=dtype).view(np.recarray)
> test_data_chunk.x[:] = 1.0 + np.random.standard_normal(recs_per_chunk)
> test_data_chunk.y[:] = 1000.0 + np.random.standard_normal(recs_per_chunk)
> test_data_chunk.z[:] = 1000000000.0 + np.random.standard_normal(recs_per_chunk)
> print "Testing HDF5 write, read performance of offset Gaussian noise " \
>     "data for %d bytes in chunks of %d bytes using %s compression:" % \
>     (total_size, chunk_size, complib)
> for complevel in xrange(max_comp_lvl + 1):
>     start_time = time.time()
>     filters = tb.Filters(complib=complib, complevel=complevel)
>     h5 = tb.openFile(h5name, mode='w', filters=filters)
>     test_tbl = h5.createTable(h5.root, "test", np.empty(0, dtype=dtype),
>                               expectedrows=recs, chunkshape=recs_per_chunk)
>     bytes_written = 0
>     while bytes_written < total_size:
>         test_tbl.append(test_data_chunk)
>         bytes_written += chunk_size
>     h5.close()
>     elapsed = time.time() - start_time
>     data_rate = 1.0 * total_size / elapsed
>     print "Write test with compression level %d: %6.1f MB/s" % \
>         (complevel, data_rate * 1.0e-6)
>     h5compression = 1.0 * total_size / os.stat(h5name)[ST_SIZE]
>     print "HDF5 file compressed by 1:%5.3f" % h5compression
>     start_time = time.time()
>     h5 = tb.openFile(h5name, mode='r')
>     test_tbl = h5.root.test
>     for start in xrange(0, recs, recs_per_chunk):
>         test_tbl.read(start, start + recs_per_chunk)
>     h5.close()
>     elapsed = time.time() - start_time
>     data_rate = 1.0 * total_size / elapsed
>     print "Read test with compression level %d: %6.1f MB/s" % \
>         (complevel, data_rate * 1.0e-6)
>     os.remove(h5name)
>
> With these results:
>
> Testing HDF5 write, read performance of offset Gaussian noise data for
> 10000000000 bytes in chunks of 500000 bytes using lzo compression:
> Write test with compression level 0: 70.0 MB/s
> HDF5 file compressed by 1:1.000
> Read test with compression level 0: 121.3 MB/s
> Write test with compression level 1: 64.3 MB/s
> HDF5 file compressed by 1:1.983
> Read test with compression level 1: 138.3 MB/s
> Write test with compression level 2: 65.6 MB/s
> HDF5 file compressed by 1:1.983
> Read test with compression level 2: 139.2 MB/s
> Write test with compression level 3: 65.8 MB/s
> HDF5 file compressed by 1:1.983
> Read test with compression level 3: 141.3 MB/s
> Write test with compression level 4: 65.9 MB/s
> HDF5 file compressed by 1:1.983
> Read test with compression level 4: 155.5 MB/s
> Write test with compression level 5: 65.7 MB/s
> HDF5 file compressed by 1:1.983
> Read test with compression level 5: 140.4 MB/s
> Write test with compression level 6: 65.9 MB/s
> HDF5 file compressed by 1:1.983
> Read test with compression level 6: 142.7 MB/s
> Write test with compression level 7: 64.7 MB/s
> HDF5 file compressed by 1:1.983
> Read test with compression level 7: 138.2 MB/s
> Write test with compression level 8: 65.0 MB/s
> HDF5 file compressed by 1:1.983
> Read test with compression level 8: 136.4 MB/s
> Write test with compression level 9: 64.6 MB/s
> HDF5 file compressed by 1:1.983
> Read test with compression level 9: 135.8 MB/s
>
> I see that for a compression level of about 4, the write speed only goes
> down from 70 MB/s to 66 MB/s, while the read speed increases from
> 121 MB/s to 155 MB/s (or 28%). Actually, I had hoped for a larger
> relative increase in the read speed based on what I saw in the
> Optimization tips chapter.
>
> Are there tricks for making it even faster, or have I done stupid
> things in my test code?
There are several. One is to tune your chunk size (i.e. try both larger
and smaller chunks). Another is to disable the shuffle filter (pass
shuffle=False when instantiating Filters()); a short sketch of both
follows at the end of this message.

> Not a big issue though, as the 155 MB/s is really good enough for my
> application. Curiously, the compression ratio is independent of the
> compression level.

Yes, the LZO support in PyTables only uses the LZO1X algorithm, so
choosing different compression levels makes no difference. If you want
more speed, you will have to wait until Blosc is ready (at the expense
of reduced compression ratios, that is ;-).

Francesc
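For reference, here is a minimal sketch of the two suggestions above
(disabling shuffle and trying several chunk sizes), reusing the
PyTables 2.x calls from the quoted script. The file name, the candidate
chunk sizes and the reduced row count are illustrative placeholders
only, and the whichLibVersion('lzo') line at the top is just a sanity
check that the LZO library was actually detected (it returns None when
it was not); timings will of course depend entirely on the machine.

import os
import time

import numpy as np
import tables as tb

# Sanity check: returns None when LZO support was not detected.
print "LZO version info:", tb.whichLibVersion('lzo')

dtype = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4')])
recs = 10 ** 7                                  # shorter run than the full benchmark
chunk_candidates = [2 ** 13, 2 ** 15, 2 ** 17]  # rows per chunk to try

# shuffle=True is the default; disabling it skips the byte-shuffle pass,
# which sometimes pays off for hard-to-compress (noisy) float data.
filters = tb.Filters(complib='lzo', complevel=1, shuffle=False)

for rows_per_chunk in chunk_candidates:
    # Noisy chunk, similar in spirit to the data in the quoted script.
    data = np.empty(rows_per_chunk, dtype=dtype).view(np.recarray)
    data.x[:] = 1.0 + np.random.standard_normal(rows_per_chunk)
    data.y[:] = 1000.0 + np.random.standard_normal(rows_per_chunk)
    data.z[:] = 1000000000.0 + np.random.standard_normal(rows_per_chunk)

    # Write pass with the chosen chunkshape.
    start_time = time.time()
    h5 = tb.openFile('chunk_test.h5', mode='w', filters=filters)
    tbl = h5.createTable(h5.root, 'test', np.empty(0, dtype=dtype),
                         expectedrows=recs, chunkshape=rows_per_chunk)
    written = 0
    while written < recs:
        tbl.append(data)
        written += rows_per_chunk
    h5.close()
    nbytes = written * dtype.itemsize
    print "chunkshape=%6d rows, write: %6.1f MB/s" % \
        (rows_per_chunk, 1.0e-6 * nbytes / (time.time() - start_time))

    # Read pass, chunk by chunk.
    start_time = time.time()
    h5 = tb.openFile('chunk_test.h5', mode='r')
    tbl = h5.root.test
    for start in xrange(0, tbl.nrows, rows_per_chunk):
        tbl.read(start, start + rows_per_chunk)
    h5.close()
    print "chunkshape=%6d rows, read:  %6.1f MB/s" % \
        (rows_per_chunk, 1.0e-6 * nbytes / (time.time() - start_time))
    os.remove('chunk_test.h5')

The same loop can be repeated with shuffle=True to see whether the
shuffle pass helps or hurts for this particular kind of data.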