I have very small training sets (10-50 observations). Currently, I am working with 16 observations for training and 25 for validation (the external test set). Also, I am doing regression, not classification (hence SVR rather than SVC).
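A minimal sketch of the leave-one-out effect described in the quoted reply below, carried over to regression. Everything in it is an assumption made for illustration: the 16 targets are random numbers, and the model is a mean-only predictor rather than the SVR and data discussed in this thread. It only shows that leaving data out of a very small set can by itself yield anti-correlated predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=16)  # made-up targets; 16 matches the training-set size above

# Leave-one-out prediction of a mean-only model: each held-out value is
# predicted by the mean of the remaining n - 1 values.
n = y.size
loo_pred = (y.sum() - y) / (n - 1)

# loo_pred is a decreasing linear function of y, so the correlation is -1.
r = np.corrcoef(y, loo_pred)[0, 1]
print(f"Pearson R, held-out values vs. LOO predictions: {r:.6f}")

# Classification analogue of the quoted example: 10 samples of class a (0)
# and 10 of class b (1), predicted by the majority class of the training fold.
labels = np.array([0] * 10 + [1] * 10)
hits = 0
for i in range(labels.size):
    train = np.delete(labels, i)
    majority = np.bincount(train, minlength=2).argmax()
    hits += int(majority == labels[i])
loo_accuracy = hits / labels.size
print(f"LOO accuracy of a majority-class predictor: {loo_accuracy:.2f}")
```

In both cases the held-out sample is systematically on the opposite side of what the training fold suggests, so the mean-only regressor is perfectly anti-correlated and the majority-class predictor is always wrong. A real SVR will not behave this extremely, but the same pull can appear when the training set is this small.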
On 26 September 2017 at 18:21, Gael Varoquaux <gael.varoqu...@normalesup.org> wrote:

> Hypothesis: you have a very small dataset and when you leave out data,
> you create a distribution shift between the train and the test. A
> simplified example: 20 samples, 10 class a, 10 class b. A leave-one-out
> cross-validation will create a training set of 10 samples of one class, 9
> samples of the other, and the test set is composed of the class that is
> minority on the train set.
>
> G
>
> On Tue, Sep 26, 2017 at 06:10:39PM +0200, Thomas Evangelidis wrote:
> > Greetings,
> >
> > I don't know if anyone encountered this before, but sometimes I get
> > anti-correlated predictions by the SVR that I am training. Namely, the
> > Pearson's R and Kendall's tau are negative when I compare the predictions
> > on the external test set with the true values. However, the SVR
> > predictions on the training set have positive correlations with the
> > experimental values and hence I can't think of a way to know in advance
> > if the trained SVR will produce anti-correlated predictions in order to
> > change their sign and avoid the disaster. Here is an example of what I
> > mean:
> >
> > Training set predictions: R=0.452422, tau=0.333333
> > External test set predictions: R=-0.537420, tau=-0.300000
> >
> > Obviously, in a real case scenario where I wouldn't have the external
> > test set I would have used the worst observation instead of the best
> > ones. Has anybody any idea about how I could prevent this?
> >
> > thanks in advance
> > Thomas
>
> --
> Gael Varoquaux
> Researcher, INRIA Parietal
> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
> Phone: ++ 33-1-69-08-79-68
> http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
======================================================================
Dr Thomas Evangelidis
Post-doctoral Researcher
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/2S049,
62500 Brno, Czech Republic

email: tev...@pharm.uoa.gr
       teva...@gmail.com

website: https://sites.google.com/site/thomasevangelidishomepage/