Dear scikit-learners,
During the last sprint we spotted an efficiency issue with `numpy.dot` in
NumPy versions < 1.8. Apparently, `dot` allocates additional copies so that it
can pass suitable input to the underlying BLAS `gemm` function, which expects a
Fortran-contiguous memory layout for both matrices.
For example, if a 1 GB rectangular matrix is multiplied by a small square
matrix, the peak memory use is not 2 GB (input plus result) but 3 GB, which is
certainly not ideal.
See also the section `Linear Algebra on Large Arrays` at
http://wiki.scipy.org/PerformanceTips for a detailed description of the
problem and its solution.
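To see where the extra gigabyte comes from, here is a scaled-down illustration
of the arithmetic. It assumes float64 data and assumes that the old `dot` makes
one Fortran-ordered copy of the large operand; the names `X` and `W` and the
sizes are made up for the example. It also shows why transposing is cheap: the
transpose of a C-contiguous array is already Fortran-contiguous and is only a
view.

```python
import numpy as np

# Scaled-down stand-in for the "1 GB" example: a tall float64 matrix X
# (~12.8 MB) times a small square matrix W.
X = np.ones((100_000, 16))  # C-contiguous, NumPy's default order
W = np.ones((16, 16))

# Forcing Fortran order on X allocates a second buffer as large as X.
X_f = np.asfortranarray(X)
copy_allocated = not np.shares_memory(X, X_f)

# X @ W has the same shape, and hence the same size, as X.
result_nbytes = X.shape[0] * W.shape[1] * X.itemsize

# Input + Fortran-ordered copy + result: about 3x the input size.
peak_with_copy = X.nbytes + X_f.nbytes + result_nbytes
print(peak_with_copy / X.nbytes)  # 3.0

# The transpose of a C-contiguous array is already Fortran-contiguous and
# is a view, so handing X.T to BLAS needs no new memory.
transpose_is_free_view = X.T.flags.f_contiguous and np.shares_memory(X, X.T)
```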
To tackle this problem we recently created a wrapper function called
`fast_dot`. It checks the data layout and transposes the data if necessary,
while passing the matching transpose arguments to the `gemm` function, so that
the requested product is still computed. On the related branch the function is
only used if the installed NumPy version is < 1.8; otherwise the call is
dispatched to `np.dot`.
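A minimal sketch of the idea, assuming SciPy's BLAS wrappers
(`scipy.linalg.get_blas_funcs`) and 2-D float32/float64 inputs. The names
`fast_dot_sketch`, `_as_gemm_operand` and `_numpy_older_than` are hypothetical;
this is not the code on the branch, which may handle more cases. A C-contiguous
operand is handed to `gemm` as its (free) transpose together with a transpose
flag, so `gemm` never needs a Fortran-ordered copy.

```python
import numpy as np
from scipy.linalg import get_blas_funcs


def _as_gemm_operand(M):
    """Return (array, trans_flag) so that gemm sees a Fortran-contiguous
    array without a copy: a C-contiguous M is passed as M.T with trans=1."""
    if M.flags.f_contiguous:
        return M, 0
    if M.flags.c_contiguous:
        return M.T, 1  # a view, no new memory
    return None, None  # e.g. a strided slice: no shortcut available


def fast_dot_sketch(A, B):
    """Hypothetical layout-aware A @ B for 2-D float arrays."""
    if (A.ndim != 2 or B.ndim != 2 or A.dtype != B.dtype
            or A.dtype not in (np.float32, np.float64)):
        return np.dot(A, B)
    a, trans_a = _as_gemm_operand(A)
    b, trans_b = _as_gemm_operand(B)
    if a is None or b is None:
        return np.dot(A, B)
    gemm = get_blas_funcs("gemm", (a, b))
    # gemm computes alpha * op(a) @ op(b), where op(x) is x or x.T depending
    # on the trans flag, so here op(a) == A and op(b) == B.
    return gemm(1.0, a, b, trans_a=trans_a, trans_b=trans_b)


def _numpy_older_than(major, minor):
    """Compare the installed NumPy version against (major, minor)."""
    parts = np.__version__.split(".")
    return (int(parts[0]), int(parts[1])) < (major, minor)


# Only use the workaround where it is needed; otherwise fall back to np.dot.
dot = fast_dot_sketch if _numpy_older_than(1, 8) else np.dot
```

Dispatching once at import time keeps the per-call cost on newer NumPy
versions at zero, since `dot` is then simply `np.dot`.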
As far as I can tell, this fixes the issue on all builds I've seen in recent
weeks. In many cases it also speeds up the computation; on my machine with
NumPy < 1.8 it even speeds up the test suite, if only by a few seconds.
Before we feel comfortable merging this new functionality, additional
benchmarks and reviews would be highly appreciated.
I'm currently at EuroSciPy, so don't hesitate to contact me if you have any
questions.
Cheers,
Denis
For details, see PR #2288.
For convenience, here's a representative plot:
https://www.dropbox.com/s/na0zehig5eolx3x/fast_dot_profile.png
To benchmark and create a similar plot, please run this script
https://gist.github.com/dengemann/6094449
with this memory profiler
https://github.com/fabianp/memory_profiler
as follows:

    mprof run --python run_profile_fast_dot.py
    mprof plot
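If the gist is unavailable, a minimal stand-in script such as the one below can
be profiled the same way. It is a hypothetical sketch, not the gist above; the
file name `profile_dot_sketch.py` and the `run` function are made up, and you
would swap the `np.dot` call for the implementation you want to compare.

```python
# profile_dot_sketch.py: hypothetical stand-in, NOT the gist linked above.
import time

import numpy as np


def run(n=100_000, k=50, pause=0.5):
    """Allocate a tall C-ordered matrix and multiply it by a small square one.

    The pauses create plateaus in the mprof memory-over-time plot, so the
    input, the product and any temporary copy are easier to tell apart.
    """
    X = np.random.rand(n, k)  # ~40 MB of float64 for the defaults
    W = np.random.rand(k, k)
    time.sleep(pause)
    Y = np.dot(X, W)  # swap in the function under test here
    time.sleep(pause)
    return Y


if __name__ == "__main__":
    run()
```

Run it with `mprof run --python profile_dot_sketch.py` followed by `mprof plot`.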
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general