On 21/02/14 14:07, Javier Martínez-López wrote:
> Hello,
>
> I am quite new to python, so I am sorry if I am asking something too
> simple: I am trying to speed up a raster processing with python and I
> want to use the "sklearn.metrics.pairwise.pairwise_distances" module
> with the mahalanobis distance metric and the " n_jobs=-1" in order to
> speed up the process using all processors. However, when I run it, it
> only uses one processor... while if I use the euclidean distance all
> processors are used... where is the problem? am I doing something
> wrong?
>
> Thank you and cheers,
>
> Javier
It does not seem to be parallelized with joblib at all. The call is
just deferred to scipy.spatial.distance.mahalanobis.
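For reference, this is roughly the call in question (a sketch only; I
am assuming pairwise_distances forwards extra keyword arguments, such
as scipy's inverse covariance matrix VI, on to scipy):

import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

X = np.random.randn(1000, 16)
VI = np.linalg.inv(np.cov(X, rowvar=False))
D = pairwise_distances(X, metric='mahalanobis', VI=VI, n_jobs=-1)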
Assuming X is an ndata x nparam array, m is the mean vector and S is
the covariance matrix, the Mahalanobis distance is
d(x) = sqrt((x - m)^T S^-1 (x - m)). With the Cholesky factorization
S = L L^T this is just the 2-norm of L^-1 (x - m), so you can compute
the distances for all rows in parallel like so:
import numpy as np
from scipy.linalg import cholesky, solve_triangular
# in newer scikit-learn, use: from joblib import Parallel, delayed
from sklearn.externals.joblib import Parallel, delayed
from multiprocessing import cpu_count
def _schedule(n, nproc):
    """'Guided' scheduler: yield row chunks of decreasing size."""
    start = 0
    size = (n - start) // nproc
    while size > 100:
        yield slice(start, start + size)
        start += size
        size = (n - start) // nproc
    yield slice(start, n)
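# For illustration (toy numbers, my own example): list(_schedule(1000, 4))
# yields [slice(0, 250), slice(250, 437), slice(437, 577),
# slice(577, 682), slice(682, 1000)] -- the chunks shrink so workers
# that finish early can pick up the remaining slack.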
def _mahalanobis_distances(m, L, X):
    # with S = L L^T: d(x)^2 = (x-m)^T S^-1 (x-m) = ||L^-1 (x-m)||^2
    cX = X - m[np.newaxis, :]
    tmp = solve_triangular(L, cX.T, lower=True).T
    tmp **= 2
    return np.sqrt(tmp.sum(axis=1))
def mahalanobis_distances(m, S, X, parallel=True):
    L = cholesky(S, lower=True)
    n = X.shape[0]
    if parallel:
        nproc = cpu_count()
        res = (Parallel(n_jobs=-1)
               (delayed(_mahalanobis_distances)(m, L, X[s, :])
                for s in _schedule(n, nproc)))
        return np.hstack(res)
    else:
        return _mahalanobis_distances(m, L, X)
# scipy.spatial.distance.mahalanobis for comparison
from scipy.spatial import distance
def _mahalanobis_distances_scipy(m, SI, X):
    n = X.shape[0]
    mahal = np.zeros(n)
    for i in range(n):  # one Python-level call per row
        x = X[i, :]
        mahal[i] = distance.mahalanobis(x, m, SI)
    return mahal
def mahalanobis_distances_scipy(m, S, X, parallel=True):
    SI = np.linalg.inv(S)
    n = X.shape[0]
    if parallel:
        nproc = cpu_count()
        res = (Parallel(n_jobs=-1)
               (delayed(_mahalanobis_distances_scipy)(m, SI, X[s, :])
                for s in _schedule(n, nproc)))
        return np.hstack(res)
    else:
        return _mahalanobis_distances_scipy(m, SI, X)
if __name__ == "__main__":
    # verify accuracy
    X = np.random.randn(300000 * 16).reshape((300000, 16))
    m = np.mean(X, axis=0)
    S = np.cov(X, rowvar=False)
    mahal_scipy = mahalanobis_distances_scipy(m, S, X)
    mahal_cholesky = mahalanobis_distances(m, S, X)
    print("Similar result to scipy.spatial.distance.mahalanobis: %s" %
          ('true' if np.allclose(mahal_scipy - mahal_cholesky, 0) else
           'false',))
    mahal = (mahalanobis_distances(m, S, X, parallel=True)
             - mahalanobis_distances(m, S, X, parallel=False)
             + mahalanobis_distances_scipy(m, S, X, parallel=True)
             - mahalanobis_distances_scipy(m, S, X, parallel=False))
    print("Similar results with and without parallel execution: %s" %
          ('true' if np.allclose(mahal, 0) else 'false',))
    # timing
    from time import perf_counter  # time.clock was removed in Python 3.8
    print("Example timings:")
    t0 = perf_counter()
    mahalanobis_distances(m, S, X, parallel=True)
    t1 = perf_counter()
    print("cholesky, parallel: %f ms" % (1000 * (t1 - t0),))
    t0 = perf_counter()
    mahalanobis_distances(m, S, X, parallel=False)
    t1 = perf_counter()
    print("cholesky, single process: %f ms" % (1000 * (t1 - t0),))
    t0 = perf_counter()
    mahalanobis_distances_scipy(m, S, X, parallel=True)
    t1 = perf_counter()
    print("scipy, parallel: %f ms" % (1000 * (t1 - t0),))
    t0 = perf_counter()
    mahalanobis_distances_scipy(m, S, X, parallel=False)
    t1 = perf_counter()
    print("scipy, single process: %f ms" % (1000 * (t1 - t0),))
This gives me:
$ python mahal.py
Similar result to scipy.spatial.distance.mahalanobis: true
Similar results with and without parallel execution: true
Example timings:
cholesky, parallel: 122.138000 ms
cholesky, single process: 159.878000 ms
scipy, parallel: 308.829000 ms
scipy, single process: 6744.384000 ms
Actually, there is a strange interaction with MKL's internal threading
here; capping MKL at a single thread avoids oversubscribing the cores:

try:
    import mkl
    mkl.set_num_threads(1)
except ImportError:
    pass
Now I get these timings:
$ python mahal.py
Similar result to scipy.spatial.distance.mahalanobis: true
Similar results with and without parallel execution: true
Example timings:
cholesky, parallel: 120.784000 ms
cholesky, single process: 84.593000 ms
scipy, parallel: 129.127000 ms
scipy, single process: 6749.786000 ms
The version that uses a single MKL thread and a single process is
actually the fastest.
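If the mkl module is not available (I believe it ships with some
MKL-enabled distributions, e.g. Anaconda), a sketch of an alternative
is to cap the thread count from the environment before NumPy is first
imported:

import os
os.environ['MKL_NUM_THREADS'] = '1'  # MKL reads this when it initializes
import numpy as np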
I could probably get even better results using Cython, BLAS and LAPACK,
but this should suffice for now.
Sturla