On 12/5/12 7:55 PM, Alvaro Tejero Cantero wrote:
> My system was benched for reads and writes with Blosc [1]:
>
> with pt.openFile(paths.braw(block), 'r') as handle:
>     pt.setBloscMaxThreads(1)
>     %timeit a = handle.root.raw.c042[:]
>     pt.setBloscMaxThreads(6)
>     %timeit a = handle.root.raw.c042[:]
>     pt.setBloscMaxThreads(11)
>     %timeit a = handle.root.raw.c042[:]
>     print handle.root.raw._v_attrs.FILTERS
>     print handle.root.raw.c042.__sizeof__()
>     print handle.root.raw.c042
>
> gives
>
> 1 loops, best of 3: 483 ms per loop
> 1 loops, best of 3: 782 ms per loop
> 1 loops, best of 3: 663 ms per loop
> Filters(complevel=5, complib='blosc', shuffle=True, fletcher32=False)
> 104
> /raw/c042 (CArray(303390000,), shuffle, blosc(5)) ''
>
> I can't understand what is going on, for the life of me. These
> datasets use int16 atoms, and at Blosc complevel=5 they used to
> compress by a factor of about 2. Even at such low compression ratios
> there should be big differences between single- and multi-threaded
> reads.
>
> Do you have any clue?
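(For reference, the quoted benchmark written as a standalone script; a
sketch assuming PyTables 2.x and the same /raw/c042 node. The file name
is hypothetical, since paths.braw(block) is not shown in the thread:)

    import time
    import tables as pt

    FILENAME = "block.h5"   # hypothetical; the original uses paths.braw(block)

    def bench_read(node, nthreads, repeat=3):
        # Best wall-clock time (in s) over `repeat` full sequential reads,
        # with Blosc limited to `nthreads` decompression threads.
        pt.setBloscMaxThreads(nthreads)
        times = []
        for _ in range(repeat):
            t0 = time.time()
            node[:]          # read the whole CArray into memory
            times.append(time.time() - t0)
        return min(times)

    with pt.openFile(FILENAME, 'r') as handle:
        node = handle.root.raw.c042
        for n in (1, 6, 11):
            print "%2d Blosc threads: %.3f s" % (n, bench_read(node, n))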
Yeah, welcome to the wonderful art of fine tuning. Fortunately we have
a machine here that is pretty much identical to yours (hey, your
computer was too good in the Blosc benchmarks to ignore :), so I can
reproduce your issue:

In [3]: a = ((np.random.rand(3e8))*100).astype('i2')

In [4]: f = tb.openFile("test.h5", "w")

In [5]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
                             filters=tb.Filters(5, complib="blosc"))

In [6]: act[:] = a

In [7]: f.flush()

In [8]: ll test.h5
-rw-rw-r-- 1 faltet 301719914 Dec  6 04:55 test.h5

This random set of numbers is close to your array in size (~3e8
elements) and has a similar compression factor (~2x). Now the timings
(using 6 threads by default):

In [9]: timeit act[:]
1 loops, best of 3: 441 ms per loop

In [11]: tb.setBloscMaxThreads(1)
Out[11]: 6

In [12]: timeit act[:]
1 loops, best of 3: 347 ms per loop

So yeah, that might seem a bit disappointing. It turns out that the
default chunksize in PyTables is tuned to balance between sequential
and random reads. If you want to optimize for sequential reads only
(apparently that is what you are after, right?), it normally helps to
increase the chunksize. For example, after some quick trials I found
that a chunksize of 2 MB is pretty optimal for sequential access:

In [44]: f.removeNode(f.root.act)

In [45]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
                              filters=tb.Filters(5, complib="blosc"),
                              chunkshape=(2**20,))

In [46]: act[:] = a

In [47]: tb.setBloscMaxThreads(1)
Out[47]: 6

In [48]: timeit act[:]
1 loops, best of 3: 334 ms per loop

In [49]: tb.setBloscMaxThreads(3)
Out[49]: 1

In [50]: timeit act[:]
1 loops, best of 3: 298 ms per loop

In [51]: tb.setBloscMaxThreads(6)
Out[51]: 3

In [52]: timeit act[:]
1 loops, best of 3: 303 ms per loop

Also, we see here that the sweet spot is at 3 threads, not more (don't
ask me why). However, that does not mean that Blosc cannot work faster
on this machine; in fact it can:

In [59]: import blosc

In [60]: sa = a.tostring()

In [61]: ac2 = blosc.compress(sa, 2, clevel=5)

In [62]: blosc.set_nthreads(6)
Out[62]: 6

In [64]: timeit a2 = blosc.decompress(ac2)
10 loops, best of 3: 80.7 ms per loop

In [65]: blosc.set_nthreads(1)
Out[65]: 6

In [66]: timeit a2 = blosc.decompress(ac2)
1 loops, best of 3: 249 ms per loop

So a pure in-memory Blosc decompression is only about 4x faster than
PyTables + Blosc, and in this case the latter reaches an excellent mark
of ~2 GB/s, which is really good for a read-from-disk operation. Note
that a plain memcpy() on this machine is just about as fast:

In [36]: timeit a.copy()
1 loops, best of 3: 294 ms per loop

While I'm at it, I'm curious how other compressors perform in this
scenario:

In [6]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
                             filters=tb.Filters(5, complib="lzo"),
                             chunkshape=(2**20,))

In [7]: act[:] = a

In [8]: f.flush()

In [9]: ll test.h5   # compression ratio very close to Blosc
-rw-rw-r-- 1 faltet 302769510 Dec  6 05:23 test.h5

In [10]: timeit act[:]
1 loops, best of 3: 1.13 s per loop

So the read time with LZO is more than 3x slower than with Blosc.
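(As an aside, the "quick trials" for choosing the chunksize mentioned
above can be scripted; a minimal sketch, assuming PyTables 2.x, with
illustrative file and node names:)

    import time
    import numpy as np
    import tables as tb

    # Same synthetic data as in the session above (~600 MB of int16).
    a = (np.random.rand(int(3e8)) * 100).astype('i2')

    f = tb.openFile("chunktest.h5", "w")
    for chunklen in (2**18, 2**19, 2**20, 2**21):  # 0.5, 1, 2, 4 MB of int16
        act = f.createCArray(f.root, 'act_%d' % chunklen, tb.Int16Atom(),
                             a.shape, filters=tb.Filters(5, complib="blosc"),
                             chunkshape=(chunklen,))
        act[:] = a
        f.flush()
        t0 = time.time()
        act[:]                    # full sequential read (may hit the OS cache)
        print "chunkshape=(%d,): %.3f s" % (chunklen, time.time() - t0)
    f.close()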
And a similar thing happens with zlib:

In [12]: f.close()

In [13]: f = tb.openFile("test.h5", "w")

In [14]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
                              filters=tb.Filters(1, complib="zlib"),
                              chunkshape=(2**20,))

In [15]: act[:] = a

In [16]: f.flush()

In [17]: ll test.h5   # the compression ratio is somewhat better
-rw-rw-r-- 1 faltet 254821296 Dec  6 05:26 test.h5

In [18]: timeit act[:]
1 loops, best of 3: 2.24 s per loop

which is more than 6x slower than Blosc (although the compression
ratio is a bit better).

And just for completeness, let's see how fast carray (the package, not
the CArray object in PyTables) can go for a chunked array in memory:

In [19]: import carray as ca

In [20]: ac3 = ca.carray(a, chunklen=2**20, cparams=ca.cparams(5))

In [21]: ac3
Out[21]:
carray((300000000,), int16)
  nbytes: 572.20 MB; cbytes: 289.56 MB; ratio: 1.98
  cparams := cparams(clevel=5, shuffle=True)
[59 34 36 ..., 21 58 50]

In [22]: timeit ac3[:]
1 loops, best of 3: 254 ms per loop

In [23]: ca.set_nthreads(1)
Out[23]: 6

In [24]: timeit ac3[:]
1 loops, best of 3: 282 ms per loop

So, at 254 ms, carray in memory is only marginally faster than PyTables
on disk (~298 ms). Now with a carray object on disk:

In [27]: acd = ca.carray(a, chunklen=2**20, cparams=ca.cparams(5),
                         rootdir="test")

In [28]: acd
Out[28]:
carray((300000000,), int16)
  nbytes: 572.20 MB; cbytes: 289.56 MB; ratio: 1.98
  cparams := cparams(clevel=5, shuffle=True)
  rootdir := 'test'
[59 34 36 ..., 21 58 50]

In [30]: ca.set_nthreads(6)
Out[30]: 1

In [31]: timeit acd[:]
1 loops, best of 3: 317 ms per loop

In [32]: ca.set_nthreads(1)
Out[32]: 6

In [33]: timeit acd[:]
1 loops, best of 3: 361 ms per loop

The times in this case are a bit larger than with PyTables (317 ms vs
298 ms), which says a lot about how efficiently I/O is implemented in
the HDF5/PyTables stack.

--
Francesc Alted
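(For completeness, the compressor comparison in this thread can be run
end to end with a single script; a minimal sketch, assuming PyTables
2.x with the Blosc, LZO and zlib filters available, and an illustrative
file name. The complevels follow the thread: 5 for blosc and lzo, 1 for
zlib:)

    import os
    import time
    import numpy as np
    import tables as tb

    # Same synthetic int16 data as in the thread (~2x compressible).
    a = (np.random.rand(int(3e8)) * 100).astype('i2')

    for complib, complevel in (("blosc", 5), ("lzo", 5), ("zlib", 1)):
        f = tb.openFile("cmp.h5", "w")
        act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
                             filters=tb.Filters(complevel, complib=complib),
                             chunkshape=(2**20,))
        act[:] = a
        f.flush()
        t0 = time.time()
        act[:]                    # full sequential read
        print "%-5s  file: %d bytes  read: %.2f s" % (
            complib, os.path.getsize("cmp.h5"), time.time() - t0)
        f.close()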