On Wed, Jul 29, 2009 at 8:24 AM, Francesc Alted <fal...@pytables.org> wrote:
> On Thursday 23 July 2009 21:17:17, you wrote:
>> The slow part of our code is a single line of Python:
>> `numpy.dot(matrix, vector)`. `matrix` is a tall (thousands by 50)
>> PyTables Array or CArray. `vector` is a row sliced out of it (1 by
>> 50). PyTables performs extremely poorly in this case (70 seconds
>> compared to 0.04 seconds for NumPy ndarrays). Here is the profiler
>> output (%prun), Python 2.5, NumPy 1.3:
>>
>>  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>  285592   25.140    0.000   25.140    0.000 {method '_g_readSlice' of 'tables.hdf5Extension.Array' objects}
>>  285592   15.211    0.000   27.501    0.000 array.py:407(_interpret_indexing)
>>  285592    3.439    0.000   66.873    0.000 array.py:486(__getitem__)
>>  285592    3.164    0.000    8.295    0.000 leaf.py:425(_processRange)
>>  856776    2.902    0.000    4.487    0.000 utils.py:66(idx2long)
>>       1    2.876    2.876   69.749   69.749 {numpy.core.multiarray.dot}
>>  285592    2.719    0.000   27.859    0.000 array.py:565(_readSlice)
>> 1142368    2.257    0.000    2.257    0.000 utils.py:44(is_idx)
>> 1427963    1.782    0.000    1.782    0.000 {len}
>>  285592    1.636    0.000    4.492    0.000 flavor.py:344(conv_to_numpy)
>>  285592    1.261    0.000    1.261    0.000 numeric.py:180(asarray)
>>  285592    1.232    0.000    5.724    0.000 flavor.py:110(array_of_flavor2)
>>  571186    1.130    0.000    1.130    0.000 {isinstance}
>>  285592    1.069    0.000    2.329    0.000 flavor.py:353(_conv_numpy_to_numpy)
>>  285592    0.920    0.000    6.644    0.000 flavor.py:130(flavor_to_flavor)
>>  285592    0.896    0.000    7.540    0.000 flavor.py:150(internal_to_flavor)
>>  285592    0.644    0.000    0.644    0.000 {tables.utilsExtension.getIndices}
>>  285592    0.535    0.000    0.535    0.000 leaf.py:243(<lambda>)
>>  285593    0.527    0.000    0.527    0.000 {hasattr}
>>  285592    0.411    0.000    0.411    0.000 {method 'append' of 'list' objects}
>>       1    0.000    0.000   69.750   69.750 labeled_view.py:242(dot)
>>       5    0.000    0.000    0.000    0.000 array.py:123(_getnrows)
>>       1    0.000    0.000    0.000    0.000 labeled_view.py:93(__init__)
>>
>> (labeled_view.py is the only part that's our code, and it only calls
>> numpy.core.multiarray.dot once.)
>
> Yes, this kind of performance is expected. The problem here is that you are
> treating PyTables objects as if they were regular NumPy matrices, but they
> are fundamentally different beasts. In particular, accessing a single
> element of an array is much slower with PyTables for several reasons, but
> the most important one is that you have to go through several layers
> (indexing layers, HDF5 layer, filesystem layer...) in order to get a single
> element, and this is *slow*.
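For concreteness, here is a minimal reproduction of the slow pattern, for anyone following along. It assumes the PyTables 2.x API (`openFile`/`createCArray`); the file name, node name, and shape are illustrative, not our real data, and the row count is shrunk so the slow path finishes quickly:

import numpy as np
import tables

# Illustrative sizes only; our real matrix is thousands of rows by 50.
fileh = tables.openFile('demo.h5', mode='w')
matrix = fileh.createCArray(fileh.root, 'matrix',
                            tables.Float64Atom(), shape=(20000, 50))
matrix[:] = np.random.rand(20000, 50)
vector = matrix[0]          # one row, comes back as an in-memory ndarray

# Slow: dot falls back to calling __getitem__ on the PyTables leaf
# once per row, paying the indexing/HDF5/filesystem cost every time.
slow = np.dot(matrix, vector)

# Fast: one bulk read into memory, then BLAS does the rest.
fast = np.dot(matrix[:], vector)

fileh.close()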
Indeed. The Python layer takes a big chunk: about 65% of the time. But the C code still takes the remaining 35%, because it is doing memory allocation, HDF5 bookkeeping, and filesystem calls for every element.

> To overcome this, what you need is to *block* the access to your data so
> that you can retrieve a significant amount of data (a block) on each
> access. This is the path followed by the recently introduced `tables.Expr`
> in 2.2b1. However, `tables.Expr` is mostly for operating on arrays
> element-wise, and hence it does not support linear algebra operations yet.
> As I've said in another message, you could implement a *blocked* algorithm
> yourself for doing matrix-matrix multiplications on top of PyTables to get
> maximum performance. By using this approach I'd say that you could get
> similar (if not better) performance than with numpy.memmap.

This is exactly the sort of operation that is tricky to implement efficiently, and it falls very naturally into the NumPy abstraction that PyTables already presents. So while I could write a one-off solution this time (and special-case PyTables in several places in my code), it really belongs in Array's NumPy flavor abstraction. (A rough sketch of the blocked approach is in the P.S. below.)

I suspect `dot` is slow because, lacking a specialized method, it simply calls `__getitem__` once per row (285,592 times in the profile above). But looking at
http://docs.scipy.org/doc/numpy-1.3.x/reference/c-api.types-and-structures.html#dotfunc
and numpy/core/blasdot/_dotblas.c, it appears possible to write a custom dot function for your data type. I won't have a chance to try this until mid-September at the earliest, and even then my funding/task allocation is very uncertain, so if this is an important performance point for anyone else (and I suspect it is), someone else will need to get it started.

> For example, Ulrich Drepper explains how to get good performance for
> matrix-matrix multiplication on today's computers by using blocked memory
> access:
>
> http://people.redhat.com/drepper/summit08-memperf.pdf
>
> This is for operating on data in memory, but the technique is the same for
> data on disk. For a general introduction to the blocking technique, you may
> want to have a look at my presentation at the last EuroSciPy conference:
>
> http://www.euroscipy.org/presentations/slides/slides_alted.pdf

I understand these presentations, but I doubt that most of the scientists to whom I've recommended PyTables would understand them well enough to implement a blocked algorithm themselves. Hence the importance of presenting a consistent abstraction with predictable performance.

Regards,
-Ken
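P.S. In case it helps anyone else in the meantime, here is a rough sketch of the blocked matrix-vector product Francesc describes. The function name and default block size are made up for illustration, and the block size should really be tuned to the leaf's chunk shape and the available memory:

import numpy as np

def blocked_dot(leaf, vector, blocksize=10000):
    # Multiply a tall on-disk PyTables leaf by an in-memory vector,
    # reading `blocksize` rows per HDF5 access instead of one element
    # at a time, and letting BLAS handle each in-memory block.
    nrows = leaf.shape[0]
    out = np.empty(nrows, dtype=leaf.atom.dtype)
    for start in range(0, nrows, blocksize):
        stop = min(start + blocksize, nrows)
        out[start:stop] = np.dot(leaf[start:stop], vector)
    return out

# e.g.: result = blocked_dot(fileh.root.matrix, vector)

Each iteration pays the indexing/HDF5/filesystem overhead once per block rather than once per element, which is the whole point of the blocking technique in the slides above.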