Hi Tim,

By default, cross_val_score uses StratifiedKFold(shuffle=False) to create the train/test folds, while train_test_split uses ShuffleSplit. The discrepancy you observe might therefore come from the shuffling, from the stratification of the labels, or from both.
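
To make the difference between the two splitters concrete, here is a minimal sketch that prints the share of positive labels in each test fold. It assumes a recent scikit-learn, where the splitters live in sklearn.model_selection and take n_splits (the 2015 API used sklearn.cross_validation and n_folds). The toy labels are hypothetical and deliberately sorted by class.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, StratifiedKFold

# Hypothetical toy labels, sorted by class: 90 negatives followed by 30 positives.
y = np.array([0] * 90 + [1] * 30)
X = np.arange(len(y)).reshape(-1, 1)


def positive_rates(splitter):
    """Return the fraction of positive labels in each test fold of `splitter`."""
    return [float(y[test].mean()) for _, test in splitter.split(X, y)]


# Stratified, unshuffled folds: every test fold keeps the overall class ratio.
stratified_rates = positive_rates(StratifiedKFold(n_splits=3, shuffle=False))

# Random splits without stratification: the class ratio can drift per split.
shuffled_rates = positive_rates(ShuffleSplit(n_splits=3, test_size=40, random_state=0))

print("StratifiedKFold:", stratified_rates)
print("ShuffleSplit:   ", shuffled_rates)
```

On data like this the stratified folds all carry the same class ratio, while the shuffled splits need not, which is one way the two procedures can end up scoring differently.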

Can you set the cv parameter in cross_val_score to

- KFold(n_folds=3, shuffle=True)
- KFold(n_folds=3, shuffle=False)
- StratifiedKFold(n_folds=3, shuffle=True)
- StratifiedKFold(n_folds=3, shuffle=False)

and then try to determine in which cases the scores are consistent?

Cheers,
Gilles

On 19 February 2015 at 08:17, Tim Head <beta...@gmail.com> wrote:
> Hello,
>
> I was comparing scores from CV with a score obtained from training on a
> subset of the data used in the CV and get very different answers. This
> surprised me, should I be? If not how do I understand how/why this happens?
>
> I run:
>
> scores = cross_validation.cross_val_score(clf, X_dev, y_dev,
>                                           scoring="roc_auc", n_jobs=6)
>
> and get three scores around 0.77.
>
> Then I split X_dev with train_test_split(test_size=0.33) and retrain my
> classifier on the training part and evaluate the score on the training. Now
> the score is around 0.70.
>
> I thought that the second part, training the classifier on X_train, would be
> similar to one of the splits that cross validation comes up with. If the
> score between the three CV splits varied a lot more then I would not be
> surprised, but the variation is pretty small compared to the difference
> between the CV scores and training on 2/3 of X_dev.
>
> The (full) code is here:
> https://gist.github.com/betatim/822785858d15a92aeafb
>
> Surely-overlooking-something,
> T
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
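
The four-way comparison suggested in the reply could look roughly like the sketch below. Several things here are assumptions rather than part of the thread: it targets a recent scikit-learn (sklearn.model_selection, n_splits instead of n_folds), it uses KFold for the unstratified cases because KFold accepts a shuffle flag whereas ShuffleSplit always shuffles, and clf, X_dev and y_dev are hypothetical stand-ins built from synthetic data, not the objects from the gist.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

# Hypothetical stand-ins for X_dev, y_dev and clf from the original post.
X_dev, y_dev = make_classification(
    n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=0
)
clf = LogisticRegression(max_iter=1000)

# The four splitters differ only in shuffling and stratification.
candidates = {
    "KFold(shuffle=True)": KFold(n_splits=3, shuffle=True, random_state=0),
    "KFold(shuffle=False)": KFold(n_splits=3, shuffle=False),
    "StratifiedKFold(shuffle=True)": StratifiedKFold(
        n_splits=3, shuffle=True, random_state=0
    ),
    "StratifiedKFold(shuffle=False)": StratifiedKFold(n_splits=3, shuffle=False),
}

# Score the same classifier under each splitter and print the per-fold AUCs.
results = {}
for name, cv in candidates.items():
    results[name] = cross_val_score(clf, X_dev, y_dev, scoring="roc_auc", cv=cv)
    print(f"{name:32s} {results[name].round(3)}")
```

If the scores only shift when shuffle changes, the ordering of the rows is the likely culprit; if they shift between KFold and StratifiedKFold, the class balance per fold is the more likely one.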