Sturla Molden <sturla.mol...@gmail.com>
wrote:

> Yes there is! We can make a very cache friendly loop by iterating over
> DTRSV instead of using DTRTRS as this example does. That also avoids
> temporary variables, and we can multithread the call to DTRSV. 

Just a tiny comment on this: it might be easier for the LAPACK library to
use SIMD extensions (SSE2, SSE3, AVX) if we call DTRTRS. So we need to
benchmark with relevant matrix sizes to see which strategy is better:
one favors SIMD, the other favors cache.
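
To make the two call patterns concrete, here is a rough pure-Python sketch.
It does not call BLAS or LAPACK: solve_lower_vec is a hypothetical stand-in
for a single DTRSV call, solve_all_at_once is a stand-in for one DTRTRS call
with several right-hand sides, and the matrix and right-hand sides are made
up. It only shows that both strategies compute the same solution while
walking memory in a different order.

```python
def solve_lower_vec(L, b):
    """Forward substitution for one right-hand side (stand-in for DTRSV)."""
    n = len(L)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for j in range(i):
            s -= L[i][j] * x[j]
        x[i] = s / L[i][i]
    return x


def solve_by_columns(L, B_cols):
    """Strategy 1: loop over right-hand sides, one vector solve each."""
    return [solve_lower_vec(L, col) for col in B_cols]


def solve_all_at_once(L, B_cols):
    """Strategy 2: a single call that sweeps all right-hand sides per row
    (stand-in for DTRTRS)."""
    n = len(L)
    k = len(B_cols)
    X = [[0.0] * n for _ in range(k)]
    for i in range(n):
        for c in range(k):
            s = B_cols[c][i]
            for j in range(i):
                s -= L[i][j] * X[c][j]
            X[c][i] = s / L[i][i]
    return X


# Made-up lower-triangular system; right-hand sides are stored column by
# column, as they would be in a Fortran-ordered array.
L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
B_cols = [[2.0, 7.0, 32.0],
          [4.0, 5.0, 15.0]]

X_loop = solve_by_columns(L, B_cols)
X_block = solve_all_at_once(L, B_cols)
print(X_loop == X_block)
```

Which of the two is faster with a real BLAS/LAPACK is exactly what the
benchmark has to tell us; this sketch says nothing about speed.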

The best way to time this is to put performance timers inside the Cython
code:

 - Windows: QueryPerformanceCounter
 - Mac OS X: mach_absolute_time
 - Linux: clock_gettime with CLOCK_MONOTONIC (CLOCK_REALTIME can jump if
   the system clock is adjusted during the run)
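
For a quick first measurement from the Python side, before wiring the
platform timers into the Cython code, something like the following might do.
It is a minimal sketch: time_call and busy_work are hypothetical names, and
busy_work is only a placeholder for the solver call being benchmarked. It
uses time.perf_counter, which the Python documentation describes as the
highest-resolution clock available for measuring short durations.

```python
import time


def time_call(func, *args, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def busy_work(n):
    """Placeholder workload; replace with the routine being timed."""
    total = 0
    for i in range(n):
        total += i * i
    return total


best = time_call(busy_work, 10000)
print(f"best of 5 runs: {best:.6f} s")
```

Taking the minimum over several runs reduces noise from other processes.
Note that this includes the Python call overhead, which is why timers inside
the Cython code are still the better option for small matrices.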

Sturla


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general