Hi Mathieu!
Do you see any use case in which the distance calculation is
performed inside an "outer" loop?
As a general rule of thumb, I think it's better to parallelize
at higher levels (the outermost iteration loops). This is generally:
- more efficient
- cleaner for the API (no need to push an n_jobs parameter down)
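
To make the "parallelize the outer loop" idea concrete, here is a minimal
sketch. The outer loop over several metrics is a made-up example, not
something taken from the code base; the inner pairwise_distances call stays
serial and receives no n_jobs argument.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.metrics import pairwise_distances

# Hypothetical outer loop: one full distance matrix per metric.
rng = np.random.RandomState(0)
X = rng.rand(50, 5)
metrics = ["euclidean", "manhattan", "cosine"]

# Parallelism lives at the outer level only: each job runs one complete,
# serial pairwise_distances call.
matrices = Parallel(n_jobs=2)(
    delayed(pairwise_distances)(X, metric=m) for m in metrics
)
by_metric = dict(zip(metrics, matrices))
```

Whether this beats parallelizing inside the distance computation depends on
how many outer iterations there are and how large each one is.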
What do you think?
Paolo
On Wed, Jan 25, 2012 at 4:16 PM, Mathieu Blondel <[email protected]> wrote:
> Hello folks,
>
> I've just added an n_jobs option to the pairwise_distances and
> pairwise_kernels functions. This works by breaking down the pairwise
> matrix into "n_jobs" even slices and doing the computations in
> parallel.
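
For readers following along, the slicing scheme described above might look
roughly like the sketch below. This is an illustration only, not the actual
implementation: sliced_pairwise_distances is a hypothetical helper name, and
slicing along the rows of Y is an assumption.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.metrics import pairwise_distances
from sklearn.utils import gen_even_slices


def sliced_pairwise_distances(X, Y, metric="euclidean", n_jobs=2):
    """Hypothetical helper: compute the distance matrix one column block per job."""
    # Split the rows of Y into n_jobs roughly even, contiguous slices.
    slices = list(gen_even_slices(Y.shape[0], n_jobs))
    # Each job computes the distances from all of X to one slice of Y.
    blocks = Parallel(n_jobs=n_jobs)(
        delayed(pairwise_distances)(X, Y[s], metric=metric) for s in slices
    )
    # Stitch the column blocks back together (this is an extra memory copy).
    return np.hstack(blocks)


rng = np.random.RandomState(0)
X = rng.rand(60, 8)
Y = rng.rand(45, 8)
D_parallel = sliced_pairwise_distances(X, Y, n_jobs=2)
D_serial = pairwise_distances(X, Y)
```

The final hstack is where the result blocks get copied into one matrix, so
the parallel version trades some memory traffic for the use of several cores.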
>
> On the USPS dataset (n_samples=7291, n_features=257), I got the
> following results:
>
> sparse, n_jobs=1: 30.92
> sparse, n_jobs=4: 10.17
>
> dense, n_jobs=1: 7.64
> dense, n_jobs=4: 4.75
>
> I also added a bench using random data in
> benchmarks/bench_plot_parallel_pairwise.py. Overall, it seems that the
> memory copying implied by the use of "hstack" and "Parallel" is worth
> the price, especially on larger datasets. On smaller ones, using only
> one core may be slightly faster.
>
> For simplicity, I prefer not to add an n_jobs option to the individual
> metric functions (euclidean_distances, rbf_kernel, ...). Instead, just
> use pairwise_distances(X, Y, metric="...", n_jobs=...) if you want to
> do parallel computation.
>
> We can now expose an n_jobs attribute on the estimators that use
> pairwise metrics (neighbors, kernel pca, ...). I'll leave that to the
> maintainers of the respective modules.
>
> Mathieu
>
>
> ------------------------------------------------------------------------------
> Keep Your Developer Skills Current with LearnDevNow!
> The most comprehensive online learning library for Microsoft developers
> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
> Metro Style Apps, more. Free future releases when you subscribe now!
> http://p.sf.net/sfu/learndevnow-d2d
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>