On Thu, 2010-04-01 at 11:30 -0700, M Trumpis wrote:
[clip]
> Actually I realized later that the main slow-down comes from the fact
> that my array was strided in fortran order (ascending strides). But
> from the point of view of a ufunc that is operating independently at
> each value, why would it need to respect striding?
Correct.

There has been discussion about improving ufunc performance by optimizing the memory access pattern.

The main issue in your case is that the output array is in C order, so it is *not* possible to access both the input and the output arrays in the optimal order. Fixing this requires allowing ufuncs to allocate output arrays in non-C order. This needs a design decision that has not been made so far. I'd be for this; I don't think it can break anything.

The second issue is that no single access pattern is optimal for every case on all processor cache layouts. This forces one to use heuristics to choose the access pattern, which are not so simple to get right and would usually require some information about the processor's cache architecture.

(Some code has even been written along these lines, though mostly addressing the reduction:

http://github.com/pv/numpy-work/tree/ticket/1143-speedup-reduce
http://projects.scipy.org/numpy/ticket/1143

It is not production quality so far, and the non-C output order would definitely help here too.)

-- 
Pauli Virtanen

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
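To make the layout point concrete, here is a minimal sketch in Python, assuming a current NumPy. It shows the strides of a Fortran-ordered array next to those of its C-ordered copy. It then sidesteps a C-ordered output by preallocating an output with matching layout and passing it through the ufunc's `out` argument. The helper `apply_in_matching_order` is hypothetical and only chooses between plain C and Fortran order; it is not the heuristic access-pattern selection discussed above.

```python
import numpy as np


def apply_in_matching_order(ufunc, a):
    """Hypothetical helper: evaluate a unary ufunc into a preallocated
    output whose memory layout matches the input, so both arrays can be
    walked in the same (contiguous) order.

    Assumes the ufunc's output dtype equals a.dtype (true e.g. for
    np.sin on float64 input)."""
    # Use Fortran order only when the input is Fortran- but not
    # C-contiguous; 1-d arrays are both, and C order suits them.
    if a.flags.f_contiguous and not a.flags.c_contiguous:
        order = "F"
    else:
        order = "C"
    out = np.empty(a.shape, dtype=a.dtype, order=order)
    ufunc(a, out=out)
    return out


# A Fortran-ordered 2-d array: the first axis has the smallest stride.
a = np.asfortranarray(np.random.rand(200, 300))
print(a.strides)                  # (8, 1600) for float64
print(a.copy(order="C").strides)  # (2400, 8)

res = apply_in_matching_order(np.sin, a)
print(res.flags.f_contiguous)     # True
```

The layout choice here is deliberately simple: it handles only contiguous inputs and assumes the output dtype matches the input, whereas a general solution inside the ufunc machinery would also have to cope with arbitrary strides and a cache-dependent traversal order.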