Indeed, I observed a wide discrepancy in test scores between standard KFold and StratifiedKFold, as used by cross_val_score, on approximately class-balanced yet non-iid data, namely the digits dataset shipped with sklearn. I think this is a bug.
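A minimal sketch of how such a comparison could be reproduced, assuming the current `sklearn.model_selection` API (older releases exposed these classes under `sklearn.cross_validation`). The classifier (`SVC(gamma=0.001)`), the 5-fold setting and the extra shuffled KFold variant are illustrative assumptions, not the setup from the original report. The shuffled variant is included to check whether sample ordering, rather than stratification itself, drives the gap. Note that when `cv` is an integer and the estimator is a classifier, `cross_val_score` already uses StratifiedKFold.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# The bundled digits data, kept in its original sample order.
X, y = load_digits(return_X_y=True)

# Assumption: any reasonable classifier shows the effect; SVC is just an example.
clf = SVC(gamma=0.001)

cv_strategies = {
    # Contiguous blocks of samples, in dataset order.
    "KFold": KFold(n_splits=5),
    # Same splitter, but samples are shuffled before being cut into folds.
    "KFold (shuffled)": KFold(n_splits=5, shuffle=True, random_state=0),
    # Preserves class proportions in each fold.
    "StratifiedKFold": StratifiedKFold(n_splits=5),
}

results = {}
for name, cv in cv_strategies.items():
    scores = cross_val_score(clf, X, y, cv=cv)
    results[name] = scores
    print(f"{name:18s} mean={scores.mean():.3f} std={scores.std():.3f}")
```

A large gap between the unshuffled KFold and the other two rows would point at the ordering of the samples rather than at the class balance of the folds.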
The digits dataset is non-iid: the samples stem from 13 human writers with specific writing styles, and apparently consecutive samples are more likely to stem from the same writer (although I emailed the original author of the optdigits dataset, and they did not keep the authorship metadata for individual samples).

-- 
Olivier

Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
