Hi braingateway,

On Thursday 14 October 2010 00:45:05, braingateway wrote:

> Hi everyone,
>
> I used to work with numpy.memmap. The speed was roughly OK for me, but
> I always need to save the corresponding metadata (such as variable
> names, variable shapes, experiment descriptions, etc.) into a separate
> file, which is a very bad approach when I have lots of data files and
> change their names from time to time. I have heard a lot of amazing
> things about PyTables recently. It sounds like a perfect match for my
> application: it is based on HDF5, can be compressed with Blosc, and
> promises even faster I/O than numpy.memmap. So I decided to shift my
> project to PyTables. When I tried the official benchmark code
> (poly.py), it seemed OK; at least without compression the I/O speed is
> faster than numpy.memmap. However, when I tried to dig a little bit
> deeper, I got problems immediately.
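Before going into the speed issues: the metadata problem you describe is
exactly what HDF5 attributes are for. You can attach the variable names,
shapes, experiment descriptions and so on directly to the array node, so
everything lives in the same file and survives renaming. A minimal
sketch (the file, node and attribute names are just examples):

import numpy as np
import tables

# Create a file with one array and hang the metadata off the same node.
fileh = tables.openFile('experiment.h5', mode='w')
arr = fileh.createCArray(fileh.root, 'signals',
                         tables.Float64Atom(), shape=(1000, 24))
arr[:] = np.random.rand(1000, 24)

# Arbitrary metadata stored as HDF5 attributes of the node itself.
arr.attrs.variable_names = ['ch%02d' % i for i in range(24)]
arr.attrs.description = 'test recording, 2010-10-14'
fileh.close()

# Later on, the metadata travels with the data, whatever the file name.
fileh = tables.openFile('experiment.h5', mode='r')
print(fileh.root.signals.attrs.description)
fileh.close()

Now, about the speed figures themselves: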
Mmh, you rather meant *performance* problems probably :-)

> I did several different experiments to get familiar with the
> performance characteristics of PyTables. First, I tried to just read
> data chunks (smaller than (1E+6, 24)) into RAM from a random location
> in a larger data file containing (3E+6, 24) random float64 numbers,
> about 549 MB. For each reading operation, I obtained the average speed
> from 10 experiments. It took numpy.memmap 56 ms to read a 1E+6-long
> single column, and 73 ms to read a (1E+6, 24) data chunk. PyTables
> (with chunkshape (65536, 24), complib = None) scored 1470 ms for
> (1E+6,) and 257 ms for (1E+6, 24). The standard deviations of all the
> results are always very low, which suggests the performance is stable.

I've been reading your code, and you are accessing your data
column-wise, instead of row-wise. In the C world (and hence Python,
NumPy, PyTables...) you want to make sure that you access data by row,
not by column, to get maximum performance. For an explanation of why,
see:

https://portal.g-node.org/python-autumnschool/_media/materials/starving_cpus/starvingcpus.pdf

and specifically slides 23 and 31.

> Surprisingly, PyTables is 3 times slower than numpy.memmap. I thought
> maybe PyTables would show better, or at least the same, performance as
> numpy.memmap when I need to stream data to disk and there is some
> calculation involved. So for the next test, I used the same expr as
> the official benchmark code (poly.py) to operate on the entire array
> and stream the result onto disk. On average, numpy.memmap + numexpr
> took 1.5 s to finish the calculation, but PyTables took 9.0 s. Then I
> started to think this might be because I used the wrong chunkshape for
> PyTables. So I did all the tests again with chunkshape = None, which
> lets PyTables decide its own optimized chunkshape (1365, 24). The
> results are actually much worse than with the bigger chunkshape,
> except for reading (1E+6,) data into RAM, which takes 225 ms compared
> to 1470 ms with the bigger chunkshape. It took 358 ms to read a chunk
> of size (1E+6, 24) into RAM, and 14 s to finish the expr calculation.
> In all the tests, PyTables uses far less RAM (<300 MB) than
> numpy.memmap (around 1 GB).

PyTables should not use as much as 300 MB for your problem. You are
probably speaking about virtual memory, but you should measure the
amount of *resident* memory instead.

> I am almost sure there is something I did wrong that makes PyTables so
> slow. So if you could give me some hints, I would highly appreciate
> your assistance. I have attached my test code and results.

Another thing about your "performance problems" when using compression
is that you are running your benchmarks on completely random data, and
in that case compression is rather useless. Make sure that you use real
data for your benchmarks. If it is compressible, things might change a
lot.

BTW, in order to make your messages more readable, it would help if you
could make proper use of paragraphing. You know, trying to read a big
paragraph of 40 lines is not exactly easy.

Cheers,

--
Francesc Alted
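PS: In case it helps, here is a minimal sketch of what I mean by
accessing the data row-wise. The file and node names are made up; adapt
them to your own script:

import tables

fileh = tables.openFile('data.h5', mode='r')
arr = fileh.root.data          # e.g. a (3E+6, 24) float64 CArray

# Column-wise: this gathers one element from each of 1e6 rows, so every
# chunk containing those rows has to be read (and decompressed) from
# disk just to extract a single column.
col = arr[:1000000, 0]

# Row-wise: this reads complete rows, which map onto a handful of
# contiguous chunks, so far less data is touched per element returned.
rows = arr[1000000:1001000, :]

# If you really need long per-column traversals, consider storing the
# array transposed (shape (24, 3E+6)) so each column becomes a row.
fileh.close()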
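PS2: To look at *resident* rather than virtual memory, something as
simple as this (standard library only, Unix; on Linux ru_maxrss is
reported in kilobytes) should be enough:

import resource

def peak_resident_mb():
    # Peak resident set size of the current process, in MB.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

# Call it before and after your reads to see how much data was actually
# brought into physical memory.
print(peak_resident_mb())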
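PS3: And regarding compression, you can convince yourself quickly that
random data is the worst case. A sketch (again, the names are made up)
comparing random data with something more regular:

import os
import numpy as np
import tables

filters = tables.Filters(complevel=5, complib='blosc')

def compressed_size_mb(data, fname):
    # Write `data` into a Blosc-compressed CArray and report file size.
    fileh = tables.openFile(fname, mode='w')
    carr = fileh.createCArray(fileh.root, 'data', tables.Float64Atom(),
                              shape=data.shape, filters=filters)
    carr[:] = data
    fileh.close()
    return os.path.getsize(fname) / float(2**20)

random_data = np.random.rand(int(1e6), 24)            # incompressible
smooth_data = np.arange(24e6).reshape(int(1e6), 24)   # very compressible

print(compressed_size_mb(random_data, 'random.h5'))  # close to the raw ~183 MB
print(compressed_size_mb(smooth_data, 'smooth.h5'))  # much smaller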