Re: [Scikit-learn-general] n_jobs in GridSearch

Paul . Czodrowski Fri, 26 Oct 2012 05:23:38 -0700

Dear SciKitters,

> >
> > I was wondering if I properly defined the grid search in the case of an
> > SVM:
> >
> > "
> > # code snippet
> > tuned_parameters = [{'kernel': ['linear'],'C': [1,10,100,1000]}]
> > scores = [ ('precision', precision_score), ('recall', recall_score),]
> > for score_name, score_func in scores:
> >   clf = GridSearchCV(SVC(C=1), tuned_parameters,
> >                      score_func=score_func, n_jobs=20)
> > "
>


Here is more code - and yes, I do output the
"
for score_name, score_func in scores:
    print "# Tuning hyper-parameters for %s" % score_name
    print
    clf = GridSearchCV(SVC(C=1), tuned_parameters,
                       score_func=score_func, n_jobs=1)
    clf.fit(trainDescrs, trainActs, cv=5,n_jobs=1)
    print "Best parameters set found on development set:"
    print
    print clf.best_estimator_
    print
    print "Grid scores on development set:"
    print
    # cv_scores: per-fold scores (renamed so it does not shadow the outer `scores` list)
    for params, mean_score, cv_scores in clf.grid_scores_:
        print "%0.3f (+/-%0.03f) for %r" % (
            mean_score, cv_scores.std() / 2, params)
    print
"

> Why do you loop on score without storing the `clf.best_params_` of
> each iteration? There is no point in running a grid search for finding
> the optimal parameter values if you don't even look at the parameters
> and average score values in the end :)
>
> Also: what is your problem?
My problem is that the job with parallelisation takes ages, whereas on a
single CPU it runs in 0.66 seconds.
Now we come to the point that such a computation does not make sense at
all (see the end of your email). But on a larger dataset, the parallelised
version also takes ages, and I simply don't understand this behavior.


>
> What do you expect to get?
At least the same speed as with the single CPU calculations. But maybe my
use case is not appropriate for such a study.

What would one expect when tuned_parameters looks like this:
"
[{'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 {'C': [1, 10, 100, 1000], 'kernel': ['linear']}]
"
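As a rough guide to the amount of work such a grid implies, the number of model fits can be counted by hand. The sketch below is plain Python 3 and does not call scikit-learn; the helper name `count_candidates` is made up for illustration, and `n_folds = 5` is taken from the `cv=5` used in the code above. It ignores the extra refit on the full training set.

```python
def count_candidates(param_grid):
    """Count the parameter combinations in a list-of-dicts grid."""
    total = 0
    for grid in param_grid:
        n = 1
        for values in grid.values():
            n *= len(values)  # every value is combined with every other key's values
        total += n            # the dicts in the list are searched one after another
    return total


tuned_parameters = [
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
]

n_folds = 5  # assumption: cv=5, as in the fit call above
n_candidates = count_candidates(tuned_parameters)
n_fits = n_candidates * n_folds  # fits per scoring function, excluding the final refit

print("candidates:", n_candidates)
print("fits per score function:", n_fits)
```

Since the loop above runs one grid search per scoring function, the total is twice that for the two scores used here.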

When using "n_jobs=1", I do not get any output in the first iteration step.
Here, I use a slightly larger dataset (300 compounds

>
> What do you get when running this on your data?
>
> > My data set is rather small - for debugging purposes, it just contains 10
> > training and 10 testing molecules with 120 numerical descriptors each. I'm
> > trying to resolve a binary classification with the help of an SVM.
>
> There is probably no point in parallelizing the grid search of such a
> small problem. How long does a single `SVC(C=1).fit(X, y)` take? If it's
> less than a couple of seconds you should not bother with multi
> processing and just leave `n_jobs=1` (i.e. its default value).

Agreed.
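
One way to check the single-fit time suggested above is a small timing helper. This is only a sketch in Python 3: `time_single_fit` and `demo` are hypothetical names, the random matrix is a stand-in for the real descriptors (10 molecules with 120 descriptors each), and the scikit-learn and NumPy imports are deferred into `demo` so that the helper itself works with any object that has a `fit(X, y)` method.

```python
import time


def time_single_fit(estimator, X, y, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls to fit."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        estimator.fit(X, y)
        best = min(best, time.perf_counter() - start)
    return best


def demo():
    """Time SVC(C=1).fit on random stand-in data (not the real descriptors)."""
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    X = rng.rand(10, 120)      # 10 samples x 120 features, placeholder values
    y = np.array([0, 1] * 5)   # balanced binary labels, placeholder values
    return time_single_fit(SVC(C=1), X, y)
```

Calling `demo()` returns the measured time; if it is far below a couple of seconds, the cost of starting worker processes is likely to outweigh any gain from `n_jobs > 1`.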


Thanks for your feedback,
Paul
