Hello folks, I've just added an n_jobs option to the pairwise_distances and pairwise_kernels functions. This works by breaking down the pairwise matrix into "n_jobs" even slices and doing the computations in parallel.
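To make the idea concrete, here is a minimal, hypothetical sketch of the slicing approach (not the actual scikit-learn implementation; the helper name and the direct joblib import are assumptions, though Parallel and hstack are the pieces mentioned above):

    import numpy as np
    from joblib import Parallel, delayed
    from sklearn.metrics.pairwise import euclidean_distances

    def parallel_pairwise_sketch(X, Y=None, func=euclidean_distances, n_jobs=4):
        """Compute func(X, Y) by splitting Y into n_jobs roughly even slices."""
        if Y is None:
            Y = X
        # Split the row indices of Y into n_jobs roughly even slices.
        slices = np.array_split(np.arange(Y.shape[0]), n_jobs)
        # Each worker computes the block of distances between X and one slice of Y.
        blocks = Parallel(n_jobs=n_jobs)(delayed(func)(X, Y[s]) for s in slices)
        # Glue the column blocks back into the full (n_samples_X, n_samples_Y) matrix.
        return np.hstack(blocks)

Splitting Y (rather than X) makes each worker's result a contiguous column block of the final matrix, so a single hstack reassembles it.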
On the USPS dataset (n_samples=7291, n_features=257), I got the following timings:

    sparse, n_jobs=1: 30.92
    sparse, n_jobs=4: 10.17
    dense,  n_jobs=1:  7.64
    dense,  n_jobs=4:  4.75

I also added a benchmark using random data in benchmarks/bench_plot_parallel_pairwise.py. Overall, the parallel speed-up seems worth the extra memory copying implied by the use of "hstack" and "Parallel", especially on larger datasets. On smaller ones, using only one core may be slightly faster.

For simplicity, I prefer not to add an n_jobs option to the individual metric functions (euclidean_distances, rbf_kernel, ...). Instead, just use pairwise_distances(X, Y, metric="...", n_jobs=...) if you want to do the computation in parallel.

We can now expose an n_jobs parameter in the estimators that use pairwise metrics (neighbors, kernel PCA, ...). I'll leave that to the maintainers of the respective modules.

Mathieu
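For illustration, a short usage sketch of the option described above, written with the current sklearn import paths; the data and parameter values are arbitrary placeholders:

    import numpy as np
    from sklearn.metrics import pairwise_distances
    from sklearn.metrics.pairwise import pairwise_kernels

    X = np.random.RandomState(0).rand(1000, 50)

    # Parallel Euclidean distances: the matrix is computed in n_jobs slices.
    D = pairwise_distances(X, metric="euclidean", n_jobs=4)

    # The same option works for kernels, e.g. an RBF kernel.
    K = pairwise_kernels(X, metric="rbf", gamma=0.1, n_jobs=4)

    print(D.shape, K.shape)  # (1000, 1000) (1000, 1000)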
