Hi braingateway,

On Thursday 14 October 2010 00:45:05, braingateway wrote:
> Hi everyone,
> 
> 
> I used to work with numpy.memmap; the speed was roughly OK for me,
> but I always needed to save the corresponding metadata (such as
> variable names, variable shapes, experiment descriptions, etc.) in a
> separate file, which is a very bad approach when I have lots of data
> files and change their names from time to time. I have heard many
> amazing things about PyTables recently. It sounds like a perfect
> match for my application: it is based on HDF5, can be compressed by
> Blosc, and offers even faster I/O than numpy.memmap. So I decided to
> shift my project to PyTables. When I tried the official benchmark
> code (poly.py), it seemed OK; at least without compression the I/O
> speed is faster than numpy.memmap. However, when I tried to dig a
> little bit deeper, I ran into problems immediately.

Mmh, you probably meant *performance* problems :-)

> I did several different experiments to get familiar with the
> performance characteristics of PyTables. First, I tried to just read
> data chunks (smaller than (1E+6,24)) into RAM from a random location
> in a larger data file containing (3E+6,24) random float64 numbers,
> about 549 MB. For each read operation, I obtained the average speed
> from 10 experiments. It took numpy.memmap 56 ms to read a 1E+6-long
> single column, and 73 ms to read a (1E+6,24) data chunk. PyTables
> (with chunkshape (65536, 24) and complib = None) scored 1470 ms for
> (1E+6,) and 257 ms for (1E+6,24). The standard deviations of all the
> results were always very low, which suggests the performance is
> stable.

I've been reading your code, and you are accessing your data column-
wise instead of row-wise.  In the C world (and hence Python, NumPy, 
PyTables...) you want to make sure that you access data by row, not 
by column, to get maximum performance.  For an explanation of why, 
see:

https://portal.g-node.org/python-autumnschool/_media/materials/starving_cpus/starvingcpus.pdf

and specifically slides 23 and 31.
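
To make the difference concrete, here is a minimal sketch using the 
current PyTables API (the file name and the fill data are made up; the 
chunkshape is the one from your test):

import numpy as np
import tables

# Build a chunked array like the one in the benchmark (3E+6 x 24
# float64, chunkshape (65536, 24)); only the first chunk is filled
# here to keep the demo short.
with tables.open_file("bench.h5", mode="w") as f:
    arr = f.create_carray(f.root, "data", atom=tables.Float64Atom(),
                          shape=(int(3e6), 24), chunkshape=(65536, 24))
    arr[:65536, :] = np.random.rand(65536, 24)

with tables.open_file("bench.h5", mode="r") as f:
    data = f.root.data
    rows = data[0:65536, :]   # row-wise: reads each chunk once, contiguously
    col = data[:, 0]          # column-wise: *every* chunk along axis 0 must
                              # be read just to extract one value per row

The row-wise slice touches exactly the chunks it needs; the column-wise 
slice forces HDF5 to read (and, with compression, decompress) all the 
chunks in the file.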

> Surprisingly, PyTables is 3 times slower than numpy.memmap. I
> thought maybe PyTables would show better, or at least the same,
> performance as numpy.memmap when I need to stream data to disk and
> there is some calculation involved. So for the next test, I used the
> same expr as the official benchmark code (poly.py) to operate on the
> entire array and streamed the result to disk. On average,
> numpy.memmap+numexpr took 1.5 s to finish the calculation, but
> PyTables took 9.0 s. Then I started to think this might be because I
> used the wrong chunkshape for PyTables. So I ran all the tests again
> with chunkshape = None, which lets PyTables decide its own optimized
> chunkshape (1365, 24). The results were actually much worse than
> with the bigger chunkshape, except for reading (1E+6,) data into
> RAM, which took 225 ms compared to 1470 ms with the bigger
> chunkshape. It took 358 ms to read a chunk of size (1E+6,24) into
> RAM, and 14 s to finish the expr calculation. In all the tests,
> PyTables used far less RAM (<300 MB) than numpy.memmap (around 1 GB).

PyTables should not need as much as 300 MB for your problem.  You are 
probably looking at virtual memory; you should measure the amount of 
*resident* memory instead.
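
For what it's worth, a minimal sketch of how to check that from inside 
the benchmark itself (stdlib only; the resource module is Unix-only):

import resource

# Peak resident set size of this process so far.  Note the units:
# ru_maxrss is reported in kilobytes on Linux, but in bytes on macOS.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak resident set size: %.1f MB" % (peak_kb / 1024.0))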

> I am almost sure there is something I did wrong that makes PyTables
> so slow. So if you could give me some hints, I would highly
> appreciate your assistance. I have attached my test code and
> results.

Another thing about your "performance problems" when using compression: 
you are running your benchmarks with completely random data, and in 
that case compression is rather useless.  Make sure that you use real 
data for your benchmarks.  If it is compressible, things might change a 
lot, as the sketch below shows.
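
A quick illustration (array contents and file names are made up for the 
demo; assumes Blosc is available in your PyTables build).  Random 
float64 noise has essentially no redundancy, so Blosc cannot shrink it, 
while smooth, correlated data compresses well, and smaller chunks on 
disk also mean faster reads:

import os
import numpy as np
import tables

filters = tables.Filters(complevel=5, complib="blosc")

noise = np.random.rand(int(1e6), 24)                      # incompressible
smooth = np.sin(np.linspace(0, 100, int(1e6)))[:, None] * np.ones(24)

for name, data in (("noise", noise), ("smooth", smooth)):
    with tables.open_file(name + ".h5", mode="w") as f:
        f.create_carray(f.root, "data", obj=data, filters=filters)
    print(name, os.path.getsize(name + ".h5"), "bytes")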

BTW, in order to make your messages more readable, it would help if you 
made proper use of paragraphs.  You know, trying to read a single 
paragraph of 40 lines is not exactly easy.

Cheers,

-- 
Francesc Alted
