It may be worth switching to LOOCV, since you probably have a pessimistic bias here due to the small training set size (a bootstrap sample contains, asymptotically, only about 63.2% of the unique observations, and your splits train on just 21 of the 42). I would try both linear and nonlinear models. Instead of adding more features, you could also try eliminating some via L1 regularization, feature selection, or feature extraction, in addition to trying different algorithms such as random forests, Gaussian processes, RBF-kernel SVM regression, and so forth.
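A rough sketch of what I mean, not a recipe: it compares a few of these models under leave-one-out CV with scikit-learn. The data below is random placeholder data with the same shape as yours (42 x 4), not your real values, and the hyperparameters (C, n_estimators, the inner cv=5 for the lasso) are arbitrary choices of mine. Since each LOOCV fold holds out a single observation, a per-fold correlation is undefined, so the held-out predictions are pooled and correlated with the targets once per model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data (ASSUMPTION): 42 observations, 4 features, synthetic target.
# Replace X and y with the real VDWAALS/EEL/EGB/ESURF columns and Expr.
rng = np.random.RandomState(0)
n = 42
X = rng.normal(size=(n, 4))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Expected fraction of unique observations in a bootstrap sample of size n;
# it tends to 1 - 1/e (about 0.632) as n grows.
unique_frac = 1 - (1 - 1 / n) ** n
print(f"expected unique fraction for n={n}: {unique_frac:.3f}")

# Candidate models; scaling sits inside the pipeline so that it is
# re-fit on each training fold and never sees the held-out sample.
models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "lasso (L1)": make_pipeline(StandardScaler(), LassoCV(cv=5)),
    "rbf svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0)),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # One held-out prediction per observation, each from a model
    # trained on the other n - 1 observations.
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    # Pearson correlation between pooled held-out predictions and targets.
    results[name] = np.corrcoef(y, pred)[0, 1]
    print(f"{name:>14s}: R = {results[name]:.3f}")
```

With only 42 observations the LOOCV estimate will still have high variance, so I would treat differences between models as suggestive rather than conclusive.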
> On Oct 1, 2016, at 10:59 AM, Thomas Evangelidis <teva...@gmail.com> wrote:
>
> Dear scikit-learn users and developers,
>
> I have a dataset consisting of 42 observation (molnames) and 4 variables
> (VDWAALS, EEL, EGB, ESURF) with which I want to make a predictive model that
> estimates the experimental value (Expr). I tried multivariate linear
> regression using 10,000 bootstrap repeats each time using 21 observations for
> training and the rest 21 for testing, but the average correlation was only
> R = 0.1727 +- 0.19779.
>
> molname        VDWAALS    EEL        EGB        ESURF     Expr
> CHEMBL108457   -20.4848   -96.5826    23.4584   -5.4045   -7.27193
> CHEMBL388269   -50.3860    28.9403   -51.5147   -6.4061   -6.8022
> CHEMBL244078   -49.1466   -21.9869    17.7999   -6.4588   -6.61742
> CHEMBL244077   -53.4365   -32.8943    34.8723   -7.0384   -6.61742
> CHEMBL396772   -51.4111   -34.4904    36.0326   -6.5443   -5.82207
> ........
>
> I would like your advice about what other machine learning algorithm I could
> try with these data. E.g. can I make a decision tree or the observations and
> variable are too few to avoid overfitting? I could include more variables but
> the observations will always remain 42.
>
> I would greatly appreciate any advice!
>
> Thomas
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn