On Wed, Jul 29, 2009 at 8:24 AM, Francesc Alted<fal...@pytables.org> wrote:
> On Thursday 23 July 2009 21:17:17, you wrote:
>> The slow part of our code is a single line of Python:
>> `numpy.dot(matrix, vector)`. `matrix` is a tall (thousands by 50)
>> PyTables Array or CArray. `vector` is a row sliced out of it (1 by
>> 50). PyTables performs extremely poorly in this case (70 seconds
>> compared to 0.04 seconds for NumPy ndarrays). Here is the profiler
>> output (%prun), Python 2.5, NumPy 1.3:
>>
>>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>   285592   25.140    0.000   25.140    0.000 {method '_g_readSlice' of 'tables.hdf5Extension.Array' objects}
>>   285592   15.211    0.000   27.501    0.000 array.py:407(_interpret_indexing)
>>   285592    3.439    0.000   66.873    0.000 array.py:486(__getitem__)
>>   285592    3.164    0.000    8.295    0.000 leaf.py:425(_processRange)
>>   856776    2.902    0.000    4.487    0.000 utils.py:66(idx2long)
>>        1    2.876    2.876   69.749   69.749 {numpy.core.multiarray.dot}
>>   285592    2.719    0.000   27.859    0.000 array.py:565(_readSlice)
>>  1142368    2.257    0.000    2.257    0.000 utils.py:44(is_idx)
>>  1427963    1.782    0.000    1.782    0.000 {len}
>>   285592    1.636    0.000    4.492    0.000 flavor.py:344(conv_to_numpy)
>>   285592    1.261    0.000    1.261    0.000 numeric.py:180(asarray)
>>   285592    1.232    0.000    5.724    0.000 flavor.py:110(array_of_flavor2)
>>   571186    1.130    0.000    1.130    0.000 {isinstance}
>>   285592    1.069    0.000    2.329    0.000 flavor.py:353(_conv_numpy_to_numpy)
>>   285592    0.920    0.000    6.644    0.000 flavor.py:130(flavor_to_flavor)
>>   285592    0.896    0.000    7.540    0.000 flavor.py:150(internal_to_flavor)
>>   285592    0.644    0.000    0.644    0.000 {tables.utilsExtension.getIndices}
>>   285592    0.535    0.000    0.535    0.000 leaf.py:243(<lambda>)
>>   285593    0.527    0.000    0.527    0.000 {hasattr}
>>   285592    0.411    0.000    0.411    0.000 {method 'append' of 'list' objects}
>>        1    0.000    0.000   69.750   69.750 labeled_view.py:242(dot)
>>        5    0.000    0.000    0.000    0.000 array.py:123(_getnrows)
>>        1    0.000    0.000    0.000    0.000 labeled_view.py:93(__init__)
>>
>> (labeled_view.py is the only part that's our code, and it only calls
>> numpy.core.multiarray.dot once.)
>
> Yes, this kind of performance is expected.  The problem here is that you are
> treating PyTables objects as if they were regular NumPy matrices, but they
> are fundamentally different beasts.  In particular, accessing a single
> element of an array is much slower with PyTables for several reasons, the
> most important being that every access has to go through several layers
> (indexing layers, HDF5 layer, filesystem layer...) to fetch a single
> element, and this is *slow*.

Indeed. The Python layer takes the biggest chunk: about 65% of the time. But
C code still accounts for the remaining 35%, spent on memory allocation,
HDF5 bookkeeping, and filesystem calls.

> To overcome this, what you need is to *block* access to your data so that
> you retrieve a significant amount of data (a block) on each access.  This
> is the path followed by the recently introduced `tables.Expr` in 2.2b1.
> However, `tables.Expr` is mostly for operating on arrays element-wise, and
> hence does not support linear algebra operations yet.  As I've said in
> another message, you could implement a *blocked* algorithm yourself for
> doing matrix-matrix multiplications on top of PyTables to get maximum
> performance.  Using this approach, I'd say you could get performance
> similar to (if not better than) numpy.memmap.
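For concreteness, here is a rough sketch of what such a blocked
matrix-vector product might look like (untested on my end; `matrix` can be
any row-sliceable array, e.g. a tables.Array or CArray, and `block_rows`
would need tuning against the HDF5 chunk shape):

```python
import numpy as np

def blocked_dot(matrix, vector, block_rows=1024):
    """Multiply a tall on-disk (n x m) array by an in-memory length-m
    vector, reading `block_rows` rows per slice.  Each slice is one
    large read, so the per-access PyTables/HDF5 overhead is amortized
    over block_rows * m elements instead of being paid per element."""
    n = matrix.shape[0]
    pieces = []
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # matrix[start:stop] pulls a whole block into memory at once;
        # the actual multiply then runs at NumPy speed.
        pieces.append(np.dot(matrix[start:stop], vector))
    return np.concatenate(pieces)
```

The same loop works unchanged on an in-memory ndarray, which makes it easy
to test the logic before pointing it at an HDF5 file.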

This is the sort of operation that is tricky to get both correct and
efficient. And it falls very naturally into the NumPy abstraction that
PyTables already presents. So while I could theoretically write a one-off
solution this time (and special-case PyTables in several places in my
code), it really belongs in Array's NumPy flavor abstraction.

I guess `dot` is slow because, lacking a specialized method, it simply
calls __getitem__ for each element. But looking at
http://docs.scipy.org/doc/numpy-1.3.x/reference/c-api.types-and-structures.html#dotfunc
and numpy/core/blasdot/_dotblas.c, it appears possible to write a custom
dot function for your data type. I won't have a chance to try this until
mid-September at the earliest, and even then my funding/task allocation is
very uncertain, so if this is an important performance point for anyone
else (which I suspect it is), someone else will need to get it started.
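In the meantime, the simplest workaround when the matrix fits in RAM is to
materialize the whole leaf as an ndarray with a single slice before calling
dot. A sketch (assuming `matrix[:]` returns a NumPy array, which it does
for the default numpy flavor):

```python
import numpy as np

def dot_in_memory(matrix, vector):
    """Read the whole on-disk array in one slice, then let NumPy's
    dot run entirely in memory.  One large read replaces the hundreds
    of thousands of single-element reads seen in the profile."""
    return np.dot(np.asarray(matrix[:]), vector)
```

For a thousands-by-50 float64 matrix this is only a few megabytes, so the
one-shot read is cheap compared to per-element access.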

> For example, Ulrich Drepper explains how to get good performance for matrix-
> matrix multiplication with today's computers by using memory block access:
>
> http://people.redhat.com/drepper/summit08-memperf.pdf
>
> This is for operating with data in-memory, but the technique is the same for
> data on-disk.  For a general introduction to the blocking technique, you may
> want to have a look at my presentation in the last EuroSciPy conference:
>
> http://www.euroscipy.org/presentations/slides/slides_alted.pdf

I understand these presentations, but I doubt that most scientists to whom
I've recommended PyTables will understand them well enough to implement
blocking themselves. Hence the importance of presenting a consistent
abstraction with predictable performance.

Regards,
-Ken

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
