Hi Tim,

By default, cross_val_score uses StratifiedKFold(shuffle=False) to
create the train/test folds, while train_test_split uses ShuffleSplit.
The discrepancy you observe might therefore come from the shuffling,
the stratification of the labels, or both.
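
To illustrate the shuffling half of this, here is a small pure-Python
sketch (not scikit-learn's implementation) of what can happen when the
rows are ordered by label: consecutive, unshuffled folds end up with very
different class proportions than folds cut from shuffled indices. The toy
labels and the helper names are assumptions made up for this example.

```python
import random

# Hypothetical toy labels, sorted by class: 60 negatives followed by 30 positives.
labels = [0] * 60 + [1] * 30


def contiguous_folds(n_samples, n_folds):
    """Split indices 0..n_samples-1 into consecutive blocks, without shuffling."""
    fold_size = n_samples // n_folds
    return [list(range(i * fold_size, (i + 1) * fold_size)) for i in range(n_folds)]


def shuffled_folds(n_samples, n_folds, seed=0):
    """Shuffle the indices first, then cut them into consecutive blocks."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // n_folds
    return [indices[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]


def positive_fraction(fold):
    """Fraction of samples in the fold whose label is 1."""
    return sum(labels[i] for i in fold) / len(fold)


unshuffled = [positive_fraction(f) for f in contiguous_folds(len(labels), 3)]
shuffled = [positive_fraction(f) for f in shuffled_folds(len(labels), 3)]

print("unshuffled folds:", unshuffled)  # [0.0, 0.0, 1.0] for these sorted labels
print("shuffled folds:  ", shuffled)    # mixed classes in each fold
```

With sorted labels, two of the unshuffled test folds contain a single
class, so a score such as roc_auc would not even be defined on them. Real
data is rarely this extreme, but any ordering in the rows can have a
milder version of the same effect.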

Can you set the cv parameter in cross_val_score to
- KFold(len(y_dev), n_folds=3, shuffle=True)
- KFold(len(y_dev), n_folds=3, shuffle=False)
- StratifiedKFold(y_dev, n_folds=3, shuffle=True)
- StratifiedKFold(y_dev, n_folds=3, shuffle=False)
and then try to determine in which cases the scores are consistent?
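
A rough sketch of that comparison follows. It is not taken from your
gist: it assumes the newer sklearn.model_selection API (where n_folds is
spelled n_splits and the splitters no longer take the data size or labels
in the constructor), and the synthetic data and LogisticRegression are
stand-ins for X_dev, y_dev and clf. KFold is used for the non-stratified
cases because ShuffleSplit takes no shuffle argument.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

# Stand-in data and classifier (assumptions; replace with X_dev, y_dev and clf).
X_dev, y_dev = make_classification(n_samples=300, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)

# The four combinations of stratification and shuffling.
splitters = {
    "KFold, shuffle=True": KFold(n_splits=3, shuffle=True, random_state=0),
    "KFold, shuffle=False": KFold(n_splits=3, shuffle=False),
    "StratifiedKFold, shuffle=True": StratifiedKFold(
        n_splits=3, shuffle=True, random_state=0
    ),
    "StratifiedKFold, shuffle=False": StratifiedKFold(n_splits=3, shuffle=False),
}

# Score the same classifier under each splitter and print the per-fold AUCs.
results = {}
for name, cv in splitters.items():
    results[name] = cross_val_score(clf, X_dev, y_dev, scoring="roc_auc", cv=cv)
    print(name, results[name].round(3))
```

If the scores only move when shuffle changes, the row order is the likely
culprit; if they only move between KFold and StratifiedKFold, look at the
class balance in each fold.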

Cheers,
Gilles

On 19 February 2015 at 08:17, Tim Head <beta...@gmail.com> wrote:
> Hello,
>
> I was comparing scores from CV with a score obtained from training on a
> subset of the data used in the CV, and I get very different answers. This
> surprised me; should it have? If not, how do I understand how/why this happens?
>
> I run:
>
> scores = cross_validation.cross_val_score(clf, X_dev, y_dev,
> scoring="roc_auc", n_jobs=6)
>
> and get three scores around 0.77.
>
> Then I split X_dev with train_test_split(test_size=0.33), retrain my
> classifier on the training part and evaluate the score on the test part. Now
> the score is around 0.70.
>
> I thought that the second part, training the classifier on X_train, would be
> similar to one of the splits that cross validation comes up with. If the
> score between the three CV splits varied a lot more then I would not be
> surprised, but the variation is pretty small compared to the difference
> between the CV scores and training on 2/3 of X_dev.
>
> The (full) code is here:
> https://gist.github.com/betatim/822785858d15a92aeafb
>
> Surely-overlooking-something,
> T
>
>
> ------------------------------------------------------------------------------
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> with Interactivity, Sharing, Native Excel Exports, App Integration & more
> Get technology previously reserved for billion-dollar corporations, FREE
> http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
