>> > Please, try again with the above directives,
>>
>> Ah, OK, stupid me.
>>
>> OK, as I indicated in a previous post, that is actually what I asked
>> about there, and I had also already tried copying lzo1.dll to
>> C:\Python25\Lib\site-packages\tables
>>
>> but there the test gave the same warning that it could not find LZO
>>
>> However, upon moving it to C:\Windows\System32 (man, I really dislike
>> fiddling with DLLs in that dir) it was able to find it when doing an
>> import tables; tables.test()
>>
>> I don't understand why it could not find it in the \tables folder
>
> Neither do I.  Perhaps somebody who knows Windows and its intricacies
> better can shed more light here.
>
> Well, at least you finally have LZO support in PyTables.
>
Yes, the important point is that it works now.
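
By the way, one workaround I have not verified yet (just a sketch): since
Windows also searches the directories on PATH when loading DLLs, prepending
the folder that contains lzo1.dll to PATH before importing tables might let
it be found without copying it into System32:

import os
# Unverified sketch: put the folder holding lzo1.dll on PATH before
# importing tables, so that the Windows DLL loader can find it there.
lzo_dir = r"C:\Python25\Lib\site-packages\tables"   # wherever lzo1.dll lives
os.environ["PATH"] = lzo_dir + os.pathsep + os.environ.get("PATH", "")
import tables
tables.test()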

In my application the data is written once and read many times, and the
data sizes are large, so LZO caught my attention when reading the
Optimization tips chapter: it should be very fast at decompression, and
the extra processing could be more than compensated for by avoiding a
bottleneck in the file I/O speed of the HDDs.

So I wrote myself a little test program with some test data whose
entropy (compressibility) is roughly similar to that of my real data:

import os
from stat import ST_SIZE
import time

import tables as tb
import numpy as np

total_size = 10 ** 10
chunk_size = 5 * 10 ** 5
complib = 'lzo'
max_comp_lvl = 9
h5name = 'd:/test.h5'
dtype = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4')])
recs = total_size / dtype.itemsize
recs_per_chunk = chunk_size / dtype.itemsize
# one chunk of synthetic data: Gaussian noise around different offsets,
# roughly matching the compressibility of the real data
test_data_chunk = np.empty(recs_per_chunk, dtype=dtype).view(np.recarray)
test_data_chunk.x[:] = 1.0 + np.random.standard_normal(recs_per_chunk)
test_data_chunk.y[:] = 1000.0 + np.random.standard_normal(recs_per_chunk)
test_data_chunk.z[:] = 1000000000.0 + np.random.standard_normal(recs_per_chunk)
print "Testing hd5f write, read performance of offset Gaussian noise
data for %d bytes in chunks of %d bytes using %s compression:" %\
      (total_size, chunk_size, complib)
for complevel in xrange(max_comp_lvl + 1):
    start_time = time.time()
    # build the LZO filter at the current compression level
    filters = tb.Filters(complib=complib, complevel=complevel)
    h5 = tb.openFile(h5name, mode='w', filters=filters)
    test_tbl = h5.createTable(h5.root, "test", np.empty(0, dtype=dtype),
                              expectedrows=recs, chunkshape=recs_per_chunk)
    # append the same chunk repeatedly until total_size bytes are written
    bytes_written = 0
    while bytes_written < total_size:
        test_tbl.append(test_data_chunk)
        bytes_written += chunk_size
    h5.close()
    elapsed = time.time() - start_time
    data_rate = 1.0 * total_size / elapsed
    print "Write test with compression level %d: %6.1f MB/s" %
(complevel, data_rate * 1.0e-6)
    h5compression = 1.0 * total_size / os.stat(h5name)[ST_SIZE]
    print "HDF5 file compressed by 1:%5.3f" % h5compression
    start_time = time.time()
    # read the table back in chunk-sized slices
    h5 = tb.openFile(h5name, mode='r')
    test_tbl = h5.root.test
    for start in xrange(0, recs, recs_per_chunk):
        test_tbl.read(start, start + recs_per_chunk)
    h5.close()
    elapsed = time.time() - start_time
    data_rate = 1.0 * total_size / elapsed
    print "Read test with compression level %d: %6.1f MB/s" %
(complevel, data_rate * 1.0e-6)
    os.remove(h5name)

With these results:

Testing HDF5 write, read performance of offset Gaussian noise data for
10000000000 bytes in chunks of 500000 bytes using lzo compression:
Write test with compression level 0:   70.0 MB/s
HDF5 file compressed by 1:1.000
Read test with compression level 0:  121.3 MB/s
Write test with compression level 1:   64.3 MB/s
HDF5 file compressed by 1:1.983
Read test with compression level 1:  138.3 MB/s
Write test with compression level 2:   65.6 MB/s
HDF5 file compressed by 1:1.983
Read test with compression level 2:  139.2 MB/s
Write test with compression level 3:   65.8 MB/s
HDF5 file compressed by 1:1.983
Read test with compression level 3:  141.3 MB/s
Write test with compression level 4:   65.9 MB/s
HDF5 file compressed by 1:1.983
Read test with compression level 4:  155.5 MB/s
Write test with compression level 5:   65.7 MB/s
HDF5 file compressed by 1:1.983
Read test with compression level 5:  140.4 MB/s
Write test with compression level 6:   65.9 MB/s
HDF5 file compressed by 1:1.983
Read test with compression level 6:  142.7 MB/s
Write test with compression level 7:   64.7 MB/s
HDF5 file compressed by 1:1.983
Read test with compression level 7:  138.2 MB/s
Write test with compression level 8:   65.0 MB/s
HDF5 file compressed by 1:1.983
Read test with compression level 8:  136.4 MB/s
Write test with compression level 9:   64.6 MB/s
HDF5 file compressed by 1:1.983
Read test with compression level 9:  135.8 MB/s


I see that at compression level 4 the write speed only drops from
70 MB/s to 66 MB/s, while the read speed increases from 121 MB/s to
155 MB/s (about 28%). I had actually hoped for a larger relative
increase in the read speed, based on what I saw in the Optimization
tips chapter.

Are there tricks for making it even faster, or have I done something
stupid in my test code? Not a big issue though, as 155 MB/s is really
good enough for my application. Curiously, the compression ratio is
independent of the compression level.
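
One knob I have not experimented with yet is the HDF5 chunk size, which the
Optimization tips chapter suggests can affect throughput. Just as an untested
sketch (reusing the names from the script above; the factor of 8 is only a
guess, not a tuned value), I could make the chunks several times larger than
the 500 kB pieces I append:

# Untested sketch: same table, but with HDF5 chunks 8x larger than the
# chunks that get appended (the factor is a guess, not a tuned value).
bigger_chunkshape = 8 * recs_per_chunk
filters = tb.Filters(complib=complib, complevel=4)
h5 = tb.openFile(h5name, mode='w', filters=filters)
test_tbl = h5.createTable(h5.root, "test", np.empty(0, dtype=dtype),
                          expectedrows=recs, chunkshape=bigger_chunkshape)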

Cheers,

Kim
