Sturla Molden <sturla.mol...@gmail.com> wrote:

> Yes there is! We can make a very cache-friendly loop by iterating over
> DTRSV instead of using DTRTRS as this example does. That also avoids
> temporary variables, and we can multithread the call to DTRSV.

Just a tiny comment on this: it might be easier for the LAPACK library to use SIMD extensions (SSE2, SSE3, AVX) if we use DTRTRS. So we need to benchmark with relevant matrix sizes to see which strategy is better. DTRTRS is likely the better choice for SIMD, while looping over DTRSV is likely the better choice for cache.

The best way to time this is to put performance timers inside the Cython code:

- Windows: QueryPerformanceCounter
- Mac OS X: mach_absolute_time
- Linux: clock_gettime with CLOCK_REALTIME

Sturla

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
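
The timing approach described in the message above can be sketched in plain Python before moving the timers into Cython. This is only a rough illustration under stated assumptions: `time_call` and `solve_lower_triangular` are hypothetical helper names, the pure-Python forward substitution is a stand-in workload rather than a call to DTRSV or DTRTRS, and `time.perf_counter_ns` is used as a portable high-resolution timer in place of the platform-specific calls listed in the message. It is a monotonic clock, so unlike CLOCK_REALTIME it should not be affected by system clock adjustments.

```python
import time


def time_call(func, *args, repeats=5):
    """Return the best elapsed time in seconds over `repeats` runs of func(*args)."""
    best_ns = None
    for _ in range(repeats):
        start = time.perf_counter_ns()
        func(*args)
        elapsed = time.perf_counter_ns() - start
        # Keep the minimum: it is the run least disturbed by other processes.
        if best_ns is None or elapsed < best_ns:
            best_ns = elapsed
    return best_ns / 1e9


def solve_lower_triangular(L, b):
    """Hypothetical stand-in workload: solve L x = b by forward substitution."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for j in range(i):
            s -= L[i][j] * x[j]
        x[i] = s / L[i][i]
    return x


if __name__ == "__main__":
    # Benchmark over several matrix sizes, since the better strategy may
    # depend on whether the problem fits in cache.
    for n in (50, 100, 200):
        L = [[2.0 if j == i else (1.0 if j < i else 0.0) for j in range(n)]
             for i in range(n)]
        b = [1.0] * n
        print(n, time_call(solve_lower_triangular, L, b))
```

To compare the two strategies, the same `time_call` wrapper would be applied to each candidate routine over the matrix sizes that matter in practice. Taking the minimum over several repeats is one common way to reduce noise from other processes.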