I took my example from classification for didactic purposes. My hypothesis still holds: splitting the data creates anti-correlations between train and test (a depletion effect).
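A minimal sketch of that depletion effect (assuming numpy/scipy and a bare "predict the training mean" estimator, not the thread's SVR): under leave-one-out, each training set is missing exactly the value it will be scored against, so even this trivial predictor comes out perfectly anti-correlated with the truth.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.RandomState(0)
y = rng.randn(16)  # a tiny sample, like the 16 training observations

# Leave-one-out with a "predict the training mean" estimator: each
# held-out value has been removed from the mean it is compared against.
preds = np.array([np.delete(y, i).mean() for i in range(y.size)])

print(pearsonr(y, preds)[0])  # -1.0 (up to float error): perfect anti-correlation

A real regressor is not this degenerate, but with only 16 training points the same depletion mechanism can plausibly push test-set correlations negative.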
Basically, you shouldn't work with datasets that small.

Gaël

Sent from my phone, please excuse typos and briefness.

On Sep 26, 2017, at 18:51, Thomas Evangelidis <teva...@gmail.com> wrote:

> I have very small training sets (10-50 observations). Currently, I am
> working with 16 observations for training and 25 for validation (external
> test set). And I am doing regression, not classification (hence the SVR
> instead of SVC).
>
> On 26 September 2017 at 18:21, Gael Varoquaux
> <gael.varoqu...@normalesup.org> wrote:
>
>> Hypothesis: you have a very small dataset, and when you leave out data
>> you create a distribution shift between the train and the test. A
>> simplified example: 20 samples, 10 of class a, 10 of class b. A
>> leave-one-out cross-validation will create a training set with 10
>> samples of one class and 9 of the other, and the test set is composed
>> of the class that is in the minority on the train set.
>>
>> G
>>
>> On Tue, Sep 26, 2017 at 06:10:39PM +0200, Thomas Evangelidis wrote:
>>
>>> Greetings,
>>>
>>> I don't know if anyone has encountered this before, but sometimes I
>>> get anti-correlated predictions from the SVR that I am training.
>>> Namely, the Pearson's R and Kendall's tau are negative when I compare
>>> the predictions on the external test set with the true values.
>>> However, the SVR predictions on the training set are positively
>>> correlated with the experimental values, and hence I can't think of a
>>> way to know in advance whether the trained SVR will produce
>>> anti-correlated predictions so that I could flip their sign and avoid
>>> the disaster. Here is an example of what I mean:
>>>
>>> Training set predictions:      R=0.452422,  tau=0.333333
>>> External test set predictions: R=-0.537420, tau=-0.300000
>>>
>>> Obviously, in a real-case scenario where I wouldn't have the external
>>> test set, I would have picked the worst observations instead of the
>>> best ones. Does anybody have any idea how I could prevent this?
>>>
>>> thanks in advance
>>> Thomas
>>
>> --
>> Gael Varoquaux
>> Researcher, INRIA Parietal
>> NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette, France
>> Phone: ++ 33-1-69-08-79-68
>> http://gael-varoquaux.info  http://twitter.com/GaelVaroquaux
>
> --
> ======================================================================
> Dr Thomas Evangelidis
> Post-doctoral Researcher
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/2S049, 62500 Brno, Czech Republic
>
> email: tev...@pharm.uoa.gr
>        teva...@gmail.com
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
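The simplified classification example quoted above is easy to check numerically; a minimal sketch, assuming scikit-learn's LeaveOneOut and two balanced classes of 10 samples (dummy features, since only the labels matter here):

import numpy as np
from collections import Counter
from sklearn.model_selection import LeaveOneOut

y = np.array(["a"] * 10 + ["b"] * 10)  # 20 samples, 10 per class
X = np.zeros((20, 1))                  # dummy features; only labels matter

for train_idx, test_idx in LeaveOneOut().split(X, y):
    counts = Counter(y[train_idx])          # e.g. {'a': 10, 'b': 9}
    minority = min(counts, key=counts.get)  # the class with 9 samples
    # The held-out sample always belongs to the training minority class:
    assert y[test_idx[0]] == minority

So any classifier that leans toward the training majority class will be systematically wrong on the held-out sample, which is exactly the anti-correlation described above.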