[Scikit-learn-general] cross-validation

Su, Jian, Ph.D. Thu, 12 Dec 2013 12:40:24 -0800

Hello,

I am using a pipeline and a grid search to find the best hyperparameters, with 
the code at the end of this post.


I have two questions:
1. Even though I set random_state=0, the results are not the same every time. 
How can I find the "truth"?
0.867933723197 {'clf__bootstrap': False, 'clf__max_depth': 10, 
'features__univ_select__k': 6, 'clf__n_estimators': 14, 
'features__pca__n_components': 3}
0.888569974774 {'clf__bootstrap': True, 'clf__max_depth': 9, 
'features__univ_select__k': 5, 'clf__n_estimators': 13, 
'features__pca__n_components': 3}
0.885452499713 {'clf__bootstrap': True, 'clf__max_depth': 7, 
'features__univ_select__k': 6, 'clf__n_estimators': 13, 
'features__pca__n_components': 3}

2. To evaluate the classifier, I should use a separate dataset other than X, 
right?
grid_search.score(X_extra, y_extra)
If so, since X and X_extra together are all the data I have, the way I split 
them into X and X_extra will affect the evaluation, right?
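
On question 1, here is a minimal sketch of one possible cause, not a diagnosis of the actual runs. The parameter names in the output above (clf__bootstrap, clf__n_estimators) look like those of a tree ensemble such as RandomForestClassifier, which draws its own random numbers unless it is given a random_state. That is an assumption; the post does not say which classifier produced those scores. The data below is synthetic (make_classification), the helper mean_auc is hypothetical, and the imports follow recent scikit-learn releases, where ShuffleSplit lives in sklearn.model_selection and takes n_splits.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic stand-in data; the post's X and y are not available.
X_demo, y_demo = make_classification(n_samples=200, n_features=8,
                                     random_state=0)

# Seeded splitter, as in the post: the train/test partitions are fixed.
cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)


def mean_auc(seed):
    """Mean cross-validated ROC AUC for a forest built with the given seed."""
    clf = RandomForestClassifier(n_estimators=14, max_depth=10,
                                 random_state=seed)
    return cross_val_score(clf, X_demo, y_demo,
                           scoring="roc_auc", cv=cv).mean()


# Same estimator seed twice: the splits and the trees are both fixed.
seeded_a = mean_auc(0)
seeded_b = mean_auc(0)

# No estimator seed: the splits are fixed but the trees are not,
# so these two values can differ from each other and between runs.
unseeded_a = mean_auc(None)
unseeded_b = mean_auc(None)

print(seeded_a, seeded_b)
print(unseeded_a, unseeded_b)
```

If the estimator is the source of the variation, passing a fixed random_state to it inside the pipeline, in addition to the one on the splitter, should make the grid search repeatable.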

Thank you,
Jian


>>>>>>>>>>>>>>>>>>>>
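# Imports for the scikit-learn API of the time (sklearn.cross_validation,
# sklearn.grid_search); X and y are assumed to be loaded already.
import numpy as np
from sklearn import cross_validation, preprocessing
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.grid_search import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# Standardize features to zero mean and unit variance.
X = preprocessing.scale(X)
n_samples, n_features = np.shape(X)

# Concatenate PCA components with the k best univariate features.
pca = PCA(n_components=2)
selection = SelectKBest(k=3)
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
svm = SVC(C=1)
pipeline = Pipeline([("features", combined_features), ("svm", svm)])

param_grid = dict(features__pca__n_components=[1, 2, 3, 4, 5],
                  features__univ_select__k=[1, 2, 3, 4, 5, 6],
                  svm__C=[0.1, 0.3, 1, 3, 10, 30],
                  svm__gamma=[0.01, 0.03, 0.1, 0.3, 1],
                  svm__kernel=['rbf', 'linear'])

# Three random 70/30 train/test splits, seeded for repeatability.
cv = cross_validation.ShuffleSplit(n_samples, n_iter=3, test_size=0.3,
                                   random_state=0)
grid_search = GridSearchCV(pipeline, param_grid=param_grid, scoring='roc_auc',
                           cv=cv, refit=True, n_jobs=-1)
grid_search.fit(X, y)
print grid_search.best_score_, grid_search.best_params_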
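
On question 2, one common approach can be sketched as follows: hold out part of the data before the grid search, tune on the rest, and score the refitted best pipeline on the held-out part. Repeating this over several split seeds gives a rough idea of how much the evaluation depends on the particular split. This is a sketch under assumptions, not the poster's setup: the data is synthetic, the names X_all, y_all, "select" and held_out_scores are hypothetical, the grid is much smaller than the one above, and the imports follow recent scikit-learn releases (sklearn.model_selection).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for all available data (X plus X_extra in the post).
X_all, y_all = make_classification(n_samples=300, n_features=8,
                                   random_state=0)

pipeline = Pipeline([("select", SelectKBest(k=3)), ("svm", SVC())])
param_grid = {"select__k": [2, 4, 6], "svm__C": [0.1, 1, 10]}

held_out_scores = []
for seed in range(5):
    # Hold out 30% of the data that the grid search never sees.
    X_train, X_test, y_train, y_test = train_test_split(
        X_all, y_all, test_size=0.3, stratify=y_all, random_state=seed)
    search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)
    search.fit(X_train, y_train)
    # score() applies the 'roc_auc' scorer to the refitted best pipeline.
    held_out_scores.append(search.score(X_test, y_test))

# The spread across seeds shows how sensitive the estimate is to the split.
print(np.mean(held_out_scores), np.std(held_out_scores))
```

The standard deviation across seeds is only a rough indication of split sensitivity, since the held-out sets overlap between repetitions.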
