Indeed, I observed a wide discrepancy in test scores between standard
KFold and StratifiedKFold, as used by cross_val_score, on approximately
class-balanced yet non-iid data: namely the digits dataset from
sklearn. I think this is a bug.

The digits dataset is non-iid because the samples come from 13 human
writers with specific writing styles, and consecutive samples
apparently are more likely to come from the same writer. I could not
confirm this: I emailed the original author of the optdigits dataset,
who did not keep the authorship metadata for individual samples.
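To make the non-iid argument concrete, here is a pure-Python sketch under purely hypothetical assumptions: 13 writers who each contribute a contiguous block of 20 samples, labels cycling through the 10 digits, and 5 folds. It compares contiguous folds (like unshuffled KFold) with a round-robin-within-class assignment, which is one way a stratified splitter can break up consecutive samples; it is not claimed to be scikit-learn's exact algorithm. It measures how often a test sample's writer also appears in the training folds, i.e. how much writer identity can leak between train and test.

```python
from collections import defaultdict

N_WRITERS = 13            # from the message: 13 human writers
SAMPLES_PER_WRITER = 20   # hypothetical, for illustration only
N_CLASSES = 10
N_FOLDS = 5

# Hypothetical ordering: each writer contributes a contiguous block of
# samples, cycling through the ten digit classes.
writers = [w for w in range(N_WRITERS) for _ in range(SAMPLES_PER_WRITER)]
labels = [i % N_CLASSES for i in range(len(writers))]
n = len(writers)


def contiguous_folds(n, k):
    """Consecutive indices per fold; the first n % k folds get one extra sample."""
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds = []
    for f, size in enumerate(sizes):
        folds.extend([f] * size)
    return folds


def interleaved_stratified_folds(labels, k):
    """Round-robin within each class: the j-th sample of a class goes to fold j % k."""
    seen = defaultdict(int)
    folds = []
    for y in labels:
        folds.append(seen[y] % k)
        seen[y] += 1
    return folds


def shared_writer_fraction(folds, writers, k):
    """Fraction of test samples whose writer also appears in the training folds."""
    shared = 0
    for f in range(k):
        train_writers = {w for w, g in zip(writers, folds) if g != f}
        shared += sum(1 for w, g in zip(writers, folds) if g == f and w in train_writers)
    return shared / len(writers)


kf = contiguous_folds(n, N_FOLDS)
sk = interleaved_stratified_folds(labels, N_FOLDS)
print(f"contiguous folds:  {shared_writer_fraction(kf, writers, N_FOLDS):.2f}")  # 0.31
print(f"interleaved folds: {shared_writer_fraction(sk, writers, N_FOLDS):.2f}")  # 1.00
```

With contiguous folds, only writers whose block straddles a fold boundary are shared; with the interleaved assignment, every writer lands in at least two folds, so a classifier can exploit writer-specific style and report optimistic scores.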

-- 
Olivier

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
