I think I have to rephrase my post, since I discovered some awkward
behaviour with SVC and the linear kernel - it is exactly this kernel that
takes ages on my dataset.
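
As a side note, in case the linear kernel itself is the bottleneck:
LinearSVC is usually much faster than SVC(kernel='linear') on larger
datasets, since it is based on liblinear rather than libsvm. A minimal
sketch, with dummy data standing in for my trainDescrs / trainActs:

"
# Minimal sketch: LinearSVC as a faster stand-in for SVC(kernel='linear').
# Dummy data is used only so the snippet runs on its own;
# trainDescrs / trainActs would take its place.
import numpy as np
from sklearn.svm import LinearSVC

X = np.random.rand(200, 50)
y = np.random.randint(0, 2, 200)

clf = LinearSVC(C=1)
clf.fit(X, y)
print clf.score(X, y)
"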

The "rbf" kernel, for example, runs perfectly, and so does the GridSearch! :)
Excluding the "linear" kernel from the GridSearch now gives the following:

"
# Tuning hyper-parameters for precision

Best parameters set found on development set:

SVC(C=1, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel=rbf, probability=False, shrinking=True, tol=0.001, verbose=False)

Grid scores on development set:

0.000 (+/-0.000) for {'kernel': 'rbf', 'C': 1}
0.000 (+/-0.000) for {'kernel': 'rbf', 'C': 10}
0.000 (+/-0.000) for {'kernel': 'rbf', 'C': 100}
0.000 (+/-0.000) for {'kernel': 'rbf', 'C': 1000}
0.000 (+/-0.000) for {'kernel': 'sigmoid', 'C': 1}
0.000 (+/-0.000) for {'kernel': 'sigmoid', 'C': 10}
0.000 (+/-0.000) for {'kernel': 'sigmoid', 'C': 100}
0.000 (+/-0.000) for {'kernel': 'sigmoid', 'C': 1000}

# Tuning hyper-parameters for recall

Best parameters set found on development set:

SVC(C=1, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel=rbf, probability=False, shrinking=True, tol=0.001, verbose=False)

Grid scores on development set:

0.000 (+/-0.000) for {'kernel': 'rbf', 'C': 1}
0.000 (+/-0.000) for {'kernel': 'rbf', 'C': 10}
0.000 (+/-0.000) for {'kernel': 'rbf', 'C': 100}
0.000 (+/-0.000) for {'kernel': 'rbf', 'C': 1000}
0.000 (+/-0.000) for {'kernel': 'sigmoid', 'C': 1}
0.000 (+/-0.000) for {'kernel': 'sigmoid', 'C': 10}
0.000 (+/-0.000) for {'kernel': 'sigmoid', 'C': 100}
0.000 (+/-0.000) for {'kernel': 'sigmoid', 'C': 1000}
"

using this code:
"
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import precision_score, recall_score

tuned_parameters = [{'kernel': ['rbf'], 'C': [1, 10, 100, 1000]},
                    {'kernel': ['sigmoid'], 'C': [1, 10, 100, 1000]}]
scores = [('precision', precision_score), ('recall', recall_score)]

for score_name, score_func in scores:
    print "# Tuning hyper-parameters for %s" % score_name
    print
    # cv and n_jobs belong to the GridSearchCV constructor, not to fit()
    clf = GridSearchCV(SVC(C=1), tuned_parameters,
                       score_func=score_func, cv=5, n_jobs=1)
    clf.fit(trainDescrs, trainActs)
    print "Best parameters set found on development set:"
    print
    print clf.best_estimator_
    print
    print "Grid scores on development set:"
    print
    # renamed to cv_scores so the outer 'scores' list is not shadowed
    for params, mean_score, cv_scores in clf.grid_scores_:
        print "%0.3f (+/-%0.03f) for %r" % (
            mean_score, cv_scores.std() / 2, params)
    print
"


Question:
The statistics look bad - every score is 0.000 (+/-0.000). Am I outputting
anything incorrectly?
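
One check I could run (a minimal sketch, assuming the same trainDescrs /
trainActs as above): score a single SVC by hand. If this already prints
0.0, the scores themselves are zero and the reporting loop is not the
culprit.

"
# Sanity-check sketch (assumes the same trainDescrs / trainActs as above):
# train one SVC on half the data and compute precision on the other half.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import precision_score

X = np.asarray(trainDescrs)
y = np.asarray(trainActs)
n = len(y) // 2                      # simple, unstratified half/half split
clf = SVC(kernel='rbf', C=1).fit(X[:n], y[:n])
print precision_score(y[n:], clf.predict(X[n:]))
"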

Apologies:
Sorry for having misled you with my first posting. Hopefully I have not
discouraged anyone from answering further...
