That is great, thanks! I do not have the mkl module (it isn't free,
right?) but with your script the calculation is approx. 10 times
faster than in R. Is there a way to increase performance using Cython,
BLAS and LAPACK? Could you possibly show some examples of how to do
it?

Thank you very much again and cheers,

Javier

On Sat, Feb 22, 2014 at 12:20 AM, Sturla Molden <sturla.mol...@gmail.com> wrote:
> On 21/02/14 14:07, Javier Martínez-López wrote:
>> Hello,
>>
>> I am quite new to python, so I am sorry if I am asking something too
>> simple: I am trying to speed up a raster processing with python and I
>> want to use the "sklearn.metrics.pairwise.pairwise_distances" module
>> with the mahalanobis distance metric and the " n_jobs=-1" in order to
>> speed up the process using all processors. However, when I run it, it
>> only uses one processor... while if I use the euclidean distance all
>> processors are used... where is the problem? am I doing something
>> wrong?
>>
>> Thank you and cheers,
>>
>> Javier
>
> It does not seem to be parallelized with joblib. It is just deferred to
> scipy.spatial.distance.mahalanobis.
>
> Assuming X is an ndata x nparam array, m is the mean vector and S is the
> covariance matrix with Cholesky factor S = L L^T (so the Mahalanobis
> distance of x is the Euclidean norm of solve(L, x - m)), you can compute
> Mahalanobis distances in parallel like so:
>
> import numpy as np
> from scipy.linalg import cholesky, solve_triangular
> from joblib import Parallel, delayed  # sklearn.externals.joblib in older scikit-learn
> from multiprocessing import cpu_count
>
> def _schedule(n, nproc):
>      """ guided scheduler """
>      start = 0
>      size = (n - start) // nproc
>      while size > 100:
>          yield slice(start,start+size)
>          start += size
>          size = (n - start) // nproc
>      yield slice(start, n)
>      return
>
> def _mahalanobis_distances(m, L, X):
>      cX = X - m[np.newaxis,:]
>      tmp = solve_triangular(L, cX.T, lower=True).T
>      tmp **= 2
>      return np.sqrt(tmp.sum(axis=1))
>
> def mahalanobis_distances(m, S, X, parallel=True):
>      L = cholesky(S, lower=True)
>      n = X.shape[0]
>      if parallel:
>          nproc = cpu_count()
>          res = (Parallel(n_jobs=-1)
>                 (delayed(_mahalanobis_distances)
>                   (m, L, X[s,:])
>                     for s in _schedule(n,nproc)))
>          return np.hstack(res)
>      else:
>          return _mahalanobis_distances(m, L, X)
>
>
> # scipy.spatial.distance.mahalanobis for comparison
>
> from scipy.spatial import distance
>
> def _mahalanobis_distances_scipy(m, SI, X):
>      n = X.shape[0]
>      mahal = np.zeros(n)
>      for i in range(X.shape[0]):
>          x = X[i,:]
>          mahal[i] = distance.mahalanobis(x,m,SI)
>      return mahal
>
> def mahalanobis_distances_scipy(m, S, X, parallel=True):
>      SI = np.linalg.inv(S)
>      n = X.shape[0]
>      if parallel:
>          nproc = cpu_count()
>          res = (Parallel(n_jobs=-1)
>                 (delayed(_mahalanobis_distances_scipy)
>                  (m, SI, X[s,:])
>                    for s in _schedule(n,nproc)))
>          return np.hstack(res)
>      else:
>          return _mahalanobis_distances_scipy(m, SI, X)
>
>
>
> if __name__ == "__main__":
>
>      ## verify accuracy
>
>      X = np.random.randn(300000*16).reshape((300000,16))
>      m = np.mean(X, axis=0)
>      S = np.cov(X, rowvar=False)
>
>      mahal_scipy = mahalanobis_distances_scipy(m, S, X)
>      mahal_cholesky = mahalanobis_distances(m, S, X)
>
>      print("Similar result to scipy.spatial.distance.mahalanobis: %s" %
>            ('true' if np.allclose(mahal_scipy - mahal_cholesky, 0)
>             else 'false',))
>
>      mahal = (mahalanobis_distances(m, S, X, parallel=True)
>         - mahalanobis_distances(m, S, X, parallel=False)
>         + mahalanobis_distances_scipy(m, S, X, parallel=True)
>         - mahalanobis_distances_scipy(m, S, X, parallel=False))
>
>      print("Similar results with and without parallel execution: %s" %
>         ('true' if np.allclose(mahal,0) else 'false',))
>
>      # timing
>      from time import perf_counter as clock  # time.clock was removed in Python 3.8
>
>      print("Example timings:")
>
>      t0 = clock()
>      mahalanobis_distances(m, S, X, parallel=True)
>      t1 = clock()
>      print("cholesky, parallel: %f ms" % (1000*(t1-t0),))
>
>      t0 = clock()
>      mahalanobis_distances(m, S, X, parallel=False)
>      t1 = clock()
>      print("cholesky, single process: %f ms" % (1000*(t1-t0),))
>
>      t0 = clock()
>      mahalanobis_distances_scipy(m, S, X, parallel=True)
>      t1 = clock()
>      print("scipy, parallel: %f ms" % (1000*(t1-t0),))
>
>      t0 = clock()
>      mahalanobis_distances_scipy(m, S, X, parallel=False)
>      t1 = clock()
>      print("scipy, single process: %f ms" % (1000*(t1-t0),))
>
>
> This gives me:
>
> $ python mahal.py
> Similar result to scipy.spatial.distance.mahalanobis: true
> Similar results with and without parallel execution: true
> Example timings:
> cholesky, parallel: 122.138000 ms
> cholesky, single process: 159.878000 ms
> scipy, parallel: 308.829000 ms
> scipy, single process: 6744.384000 ms
>
>
> Actually, there is a strange interaction with MKL threading here:
>
>      try:
>         import mkl
>         mkl.set_num_threads(1)
>      except ImportError:
>         pass
>
> Now I get these timings:
>
> $ python mahal.py
> Similar result to scipy.spatial.distance.mahalanobis: true
> Similar results with and without parallel execution: true
> Example timings:
> cholesky, parallel: 120.784000 ms
> cholesky, single process: 84.593000 ms
> scipy, parallel: 129.127000 ms
> scipy, single process: 6749.786000 ms
>
>
> The version that uses a single MKL thread and a single process is
> actually the fastest.
>
> I could probably get even better results using Cython, BLAS and LAPACK,
> but this can suffice for now.
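>
> For reference, here is a rough sketch of what calling BLAS/LAPACK more
> directly could look like from Python, using scipy's low-level wrappers
> (scipy.linalg.lapack.dpotrf for the Cholesky factorization and
> scipy.linalg.blas.dtrsm for the triangular solve). This is only a sketch
> of the idea, not a tuned implementation:

```python
import numpy as np
from scipy.linalg.lapack import dpotrf   # LAPACK Cholesky factorization
from scipy.linalg.blas import dtrsm      # BLAS level-3 triangular solve

def mahalanobis_distances_blas(m, S, X):
    # factor S = L * L.T; info != 0 means S is not positive definite
    L, info = dpotrf(S, lower=1, clean=1)
    if info != 0:
        raise np.linalg.LinAlgError("covariance matrix is not positive definite")
    cX = X - m[np.newaxis, :]
    # solve L * tmp = cX.T in one BLAS call; tmp is nparam x ndata
    tmp = dtrsm(1.0, L, cX.T, lower=1)
    # the Euclidean norm of each column is the Mahalanobis distance
    return np.sqrt((tmp * tmp).sum(axis=0))
```

> The result should match the solve_triangular version above; the point is
> only that dpotrf and dtrsm skip some of the checking and copying done by
> the higher-level wrappers.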
>
>
> Sturla

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
