On 21/02/14 14:07, Javier Martínez-López wrote:
> Hello,
>
> I am quite new to Python, so I am sorry if I am asking something too
> simple: I am trying to speed up a raster processing job with Python and
> I want to use the "sklearn.metrics.pairwise.pairwise_distances" function
> with the Mahalanobis distance metric and "n_jobs=-1" in order to speed
> up the process using all processors. However, when I run it, it only
> uses one processor... while if I use the Euclidean distance all
> processors are used... Where is the problem? Am I doing something
> wrong?
>
> Thank you and cheers,
>
> Javier

It does not seem to be parallelized with joblib; it is just deferred to
scipy.spatial.distance.mahalanobis. Assuming X is an ndata-by-nparam
array, m is the mean vector and S is the covariance matrix, you can
compute Mahalanobis distances in parallel like so:

import numpy as np
from scipy.linalg import cholesky, solve_triangular
from sklearn.externals.joblib import Parallel, delayed
from multiprocessing import cpu_count

def _schedule(n, nproc):
    """Guided scheduler: yield successively smaller slices of range(n)."""
    start = 0
    size = (n - start) // nproc
    while size > 100:
        yield slice(start, start + size)
        start += size
        size = (n - start) // nproc
    yield slice(start, n)

def _mahalanobis_distances(m, L, X):
    # Solve L z = (x - m) for each row; the Mahalanobis distance is ||z||.
    cX = X - m[np.newaxis, :]
    tmp = solve_triangular(L, cX.T, lower=True).T
    tmp **= 2
    return np.sqrt(tmp.sum(axis=1))

def mahalanobis_distances(m, S, X, parallel=True):
    L = cholesky(S, lower=True)
    n = X.shape[0]
    if parallel:
        nproc = cpu_count()
        res = (Parallel(n_jobs=-1)
               (delayed(_mahalanobis_distances)(m, L, X[s, :])
                for s in _schedule(n, nproc)))
        return np.hstack(res)
    else:
        return _mahalanobis_distances(m, L, X)

# scipy.spatial.distance.mahalanobis for comparison
from scipy.spatial import distance

def _mahalanobis_distances_scipy(m, SI, X):
    n = X.shape[0]
    mahal = np.zeros(n)
    for i in xrange(n):
        x = X[i, :]
        mahal[i] = distance.mahalanobis(x, m, SI)
    return mahal

def mahalanobis_distances_scipy(m, S, X, parallel=True):
    SI = np.linalg.inv(S)
    n = X.shape[0]
    if parallel:
        nproc = cpu_count()
        res = (Parallel(n_jobs=-1)
               (delayed(_mahalanobis_distances_scipy)(m, SI, X[s, :])
                for s in _schedule(n, nproc)))
        return np.hstack(res)
    else:
        return _mahalanobis_distances_scipy(m, SI, X)

if __name__ == "__main__":

    # verify accuracy
    X = np.random.randn(300000 * 16).reshape((300000, 16))
    m = np.mean(X, axis=0)
    S = np.cov(X, rowvar=False)
    mahal_scipy = mahalanobis_distances_scipy(m, S, X)
    mahal_cholesky = mahalanobis_distances(m, S, X)
    print("Similar result to scipy.spatial.distance.mahalanobis: %s"
          % ('true' if np.allclose(mahal_scipy - mahal_cholesky, 0) else 'false',))

    mahal = (mahalanobis_distances(m, S, X, parallel=True)
             - mahalanobis_distances(m, S, X, parallel=False)
             + mahalanobis_distances_scipy(m, S, X, parallel=True)
             - mahalanobis_distances_scipy(m, S, X, parallel=False))
    print("Similar results with and without parallel execution: %s"
          % ('true' if np.allclose(mahal, 0) else 'false',))

    # timing
    from time import clock
    print("Example timings:")

    t0 = clock()
    mahalanobis_distances(m, S, X, parallel=True)
    t1 = clock()
    print("cholesky, parallel: %f ms" % (1000 * (t1 - t0),))

    t0 = clock()
    mahalanobis_distances(m, S, X, parallel=False)
    t1 = clock()
    print("cholesky, single process: %f ms" % (1000 * (t1 - t0),))

    t0 = clock()
    mahalanobis_distances_scipy(m, S, X, parallel=True)
    t1 = clock()
    print("scipy, parallel: %f ms" % (1000 * (t1 - t0),))

    t0 = clock()
    mahalanobis_distances_scipy(m, S, X, parallel=False)
    t1 = clock()
    print("scipy, single process: %f ms" % (1000 * (t1 - t0),))
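A quick word on why the triangular solve above works: with the Cholesky
factorization S = L L^T, solving L z = x - m gives
z^T z = (x - m)^T S^-1 (x - m), so the Mahalanobis distance is just the
Euclidean norm of z. A minimal sketch checking this on a single vector
(the names below are illustrative, not part of the script above):

import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.RandomState(0)
X = rng.randn(1000, 4)
m = X.mean(axis=0)
S = np.cov(X, rowvar=False)

x = X[0]
L = cholesky(S, lower=True)
z = solve_triangular(L, x - m, lower=True)  # solves L z = x - m
d_chol = np.sqrt(np.dot(z, z))              # ||z||
d_direct = np.sqrt(np.dot(x - m, np.linalg.solve(S, x - m)))
print(np.allclose(d_chol, d_direct))        # True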
Running the script gives me:

$ python mahal.py
Similar result to scipy.spatial.distance.mahalanobis: true
Similar results with and without parallel execution: true
Example timings:
cholesky, parallel: 122.138000 ms
cholesky, single process: 159.878000 ms
scipy, parallel: 308.829000 ms
scipy, single process: 6744.384000 ms

Actually, there is a strange interaction with MKL threading here:

try:
    import mkl
    mkl.set_num_threads(1)
except ImportError:
    pass

Now I get these timings:

$ python mahal.py
Similar result to scipy.spatial.distance.mahalanobis: true
Similar results with and without parallel execution: true
Example timings:
cholesky, parallel: 120.784000 ms
cholesky, single process: 84.593000 ms
scipy, parallel: 129.127000 ms
scipy, single process: 6749.786000 ms

The version that uses a single MKL thread and a single process is
actually the fastest. I could probably get even better results using
Cython, BLAS and LAPACK, but this will suffice for now.

Sturla
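P.S. The mkl module used above only ships with certain Python
distributions (e.g. Anaconda or Intel's). If it is not available, the
same effect can usually be had by capping the BLAS thread count through
environment variables before NumPy is first imported. A sketch, assuming
an MKL- or OpenBLAS-backed NumPy:

import os
# must be set before numpy/scipy are imported
os.environ["MKL_NUM_THREADS"] = "1"       # Intel MKL
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS
os.environ["OMP_NUM_THREADS"] = "1"       # generic OpenMP fallback
import numpy as np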