Re: [Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-24 Thread Javier Martínez-López
That is great, thanks! I do not have the mkl module (it isn't free, right?) but with your script the calculation is approx. 10 times faster than in R. Is there a way to increase performance using Cython, BLAS and LAPACK? Could you possibly show some examples of how to do it? Thank you very much

Re: [Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-24 Thread Vlad Niculae
If you're affiliated with a university, Anaconda has free academic licenses that include MKL and their optimized builds.
Vlad
On Mon Feb 24 09:22:07 2014, Javier Martínez-López wrote:
> That is great, thanks! I do not have the mkl module (it isn't free, right?) but with your script the

Re: [Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-24 Thread Javier Martínez-López
Thank you very much! Already working! However, in contrast to the results obtained by Sturla, I get the fastest result with the Cholesky, parallel method:
bash-4.1$ python mahalk.py
Similar result to scipy.spatial.distance.mahalanobis: true
Similar results with and without parallel execution: true
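The mahalk.py script itself is not shown in this listing, so the following is only a minimal sketch of what a Cholesky-based Mahalanobis computation can look like, not Sturla's code. The helper name mahalanobis_cholesky and the random test data are assumptions made for illustration, and the parallel part is left out. It factors the covariance as L L^T, solves one triangular system for all samples at once, and checks the result against scipy.spatial.distance.mahalanobis, in the spirit of the first check printed above.

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.spatial.distance import mahalanobis


def mahalanobis_cholesky(X, mean, cov):
    """Mahalanobis distance of each row of X from `mean` (hypothetical helper)."""
    # Factor cov = L @ L.T with L lower triangular.
    L = np.linalg.cholesky(cov)
    # Solve L @ Z = (X - mean).T for every sample in a single call.
    Z = solve_triangular(L, (X - mean).T, lower=True)
    # The squared distance of sample i is the squared norm of column i of Z.
    return np.sqrt(np.sum(Z * Z, axis=0))


# Illustrative data only: 1000 samples with 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)

d_chol = mahalanobis_cholesky(X, mean, cov)

# Reference values from SciPy, one vector at a time, using the explicit inverse.
VI = np.linalg.inv(cov)
d_ref = np.array([mahalanobis(x, mean, VI) for x in X])
similar = np.allclose(d_chol, d_ref)
```

The Cholesky route avoids forming the inverse covariance explicitly, which is usually the numerically safer choice.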

Re: [Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-24 Thread Sturla Molden
Javier Martínez-López javi.martinez.lo...@gmail.com wrote:
> That is great, thanks! I do not have the mkl module (it isn't free, right?) but with your script the calculation is approx. 10 times faster than in R.
Great! By the way, using OpenBLAS will be fast as well. On Mac OS X Mavericks

Re: [Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-24 Thread Sturla Molden
Sturla Molden sturla.mol...@gmail.com wrote:
> Yes there is! We can make a very cache friendly loop by iterating over DTRSV instead of using DTRTRS as this example does. That also avoids temporary variables, and we can multithread the call to DTRSV.
Just a tiny comment on this: It might be

Re: [Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-24 Thread Sturla Molden
Sturla Molden sturla.mol...@gmail.com wrote: Just a tiny comment on this: It might be easier for the LAPACK library to use SIMD extensions (SSE2, SSE3, AVX) if we use DTRTRS. So we need to benchmark with relevant matrix sizes to see which strategy is better. One is better for SIMD, the
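The two strategies discussed here can be put side by side in a rough sketch. It uses scipy.linalg.solve_triangular as a stand-in for both: once with a matrix right-hand side (the DTRTRS-style call) and once per vector in a loop (the DTRSV-style iteration). The matrix sizes and variable names are assumptions for illustration. A pure Python loop is dominated by interpreter overhead, so this only shows that the two approaches agree; the actual benchmark would need the loop in Cython or another compiled language.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Illustrative sizes only; benchmark with the matrix sizes of the real problem.
rng = np.random.default_rng(0)
n_samples, n_features = 2000, 8
A = rng.normal(size=(n_features, n_features))
cov = A @ A.T + n_features * np.eye(n_features)  # symmetric positive definite
L = np.linalg.cholesky(cov)
diffs = rng.normal(size=(n_samples, n_features))  # rows stand in for x - mean

# Strategy 1: a single triangular solve with a matrix right-hand side.
# This needs a temporary of the same size as the data.
Z = solve_triangular(L, diffs.T, lower=True)
d_batch = np.sqrt(np.sum(Z * Z, axis=0))

# Strategy 2: one triangular solve per vector, so only one small
# temporary vector is alive at any time.
d_loop = np.empty(n_samples)
for i in range(n_samples):
    z = solve_triangular(L, diffs[i], lower=True)
    d_loop[i] = np.sqrt(z @ z)

same = np.allclose(d_batch, d_loop)
```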

Re: [Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-24 Thread Sturla Molden
On 24/02/14 16:11, Javier Martínez-López wrote:
> I am sorry Sturla but I am new in python and I cannot follow you on this... hopefully someone else will! I am now combining the R and the Python code and trying to scale up the process! Thank you very much and cheers, Javier
This would be

[Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-21 Thread Javier Martínez-López
Hello, I am quite new to Python, so I am sorry if I am asking something too simple: I am trying to speed up raster processing with Python and I want to use the sklearn.metrics.pairwise.pairwise_distances function with the mahalanobis distance metric and n_jobs=-1 in order to speed up the
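For reference, a call of the kind described in this question might look like the sketch below. The array names, shapes and random data are assumptions for illustration; the original raster data is not shown. The inverse covariance is passed explicitly as VI, and the result is compared with a plain serial scipy.spatial.distance.cdist call.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics.pairwise import pairwise_distances

# Hypothetical data: 500 raster cells with 4 bands, and 20 reference samples.
rng = np.random.default_rng(0)
pixels = rng.normal(size=(500, 4))
reference = rng.normal(size=(20, 4))

# Inverse covariance of the reference samples, used by the metric.
VI = np.linalg.inv(np.cov(reference, rowvar=False))

# n_jobs=-1 asks scikit-learn to use all available cores.
D_parallel = pairwise_distances(
    pixels, reference, metric="mahalanobis", VI=VI, n_jobs=-1
)

# Serial reference computed directly with SciPy.
D_serial = cdist(pixels, reference, metric="mahalanobis", VI=VI)
```

Whether n_jobs=-1 actually helps depends on the problem size, since splitting the work has its own overhead.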

Re: [Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-21 Thread Sturla Molden
On 21/02/14 14:07, Javier Martínez-López wrote:
> Hello, I am quite new to python, so I am sorry if I am asking something too simple: I am trying to speed up a raster processing with python and I want to use the sklearn.metrics.pairwise.pairwise_distances module with the mahalanobis distance