On 06.06.2014 04:18, Sturla Molden wrote:
> On 05/06/14 22:51, Nathaniel Smith wrote:
>
>> This gets evaluated as:
>>
>> tmp1 = a + b
>> tmp2 = tmp1 + c
>> result = tmp2 / c
>>
>> All these temporaries are very expensive. Suppose that a, b, c are
>> arrays with N bytes each, and N is large. For simple arithmetic like
>> this, then costs are dominated by memory access. Allocating an N byte
>> array requires the kernel to clear the memory, which incurs N bytes of
>> memory traffic.
>
> It seems to be the case that a large portion of the run-time in Python
> code using NumPy can be spent in the kernel zeroing pages (which the
> kernel does for security reasons).
>
> I think this can also be seen as a 'malloc problem'. It comes about
> because each new NumPy array starts with a fresh buffer allocated by
> malloc. Perhaps buffers can be reused?
>
> Sturla
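For concreteness, the temporaries described above can be elided by hand today with numpy's out= arguments; a minimal sketch (variable names are illustrative only):

```python
import numpy as np

N = 1_000_000
a = np.ones(N)
b = np.ones(N)
c = np.full(N, 2.0)

# Naive evaluation: each intermediate lands in a fresh buffer,
# exactly the tmp1/tmp2 sequence quoted above.
result = (a + b + c) / c

# Hand-elided version: reuse one buffer via out= arguments, so only
# a single allocation (and one round of page zeroing) is paid.
tmp = np.add(a, b)           # the only new buffer
np.add(tmp, c, out=tmp)      # in place, no fresh allocation
np.divide(tmp, c, out=tmp)   # in place

assert np.allclose(result, tmp)
```

The point of the discussion is that users should not have to write the second form themselves.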
Caching memory inside of numpy would indeed solve this issue too. There has
even been a paper written on this, which contains more serious benchmarks
than the Laplace case (which runs on very old hardware; also, its in-place
and out-of-place variants do not actually compute the same thing: one
computes array / scalar, the other array * (1 / scalar)):

hiperfit.dk/pdf/Doubling.pdf
"The result is an improvement of as much as 2.29 times speedup, on
average 1.32 times speedup across a benchmark suite of 15 applications"

The problem with this approach is that it is already difficult enough to
handle memory in numpy. Having a cache that potentially stores gigabytes
of memory out of the user's sight would just make things worse. This would
not be needed if we could come up with a way for Python to help numpy
elide the temporaries.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org