Hi Thomas,

A number of people I've learned from have given me the following
"recipe", which I hold to loosely.

1. Start with Random Forest - it should give you a good baseline for
   predictive capacity.

2. Let's say you don't care about interpretability, but only care about
   predictive value. Keep tweaking the RF parameters (use grid search +
   cross validation), or switch to gradient boosting.

3. Let's say you do care about interpretability. Use the RF's
   feature_importances_ to pick out the features that matter most for
   prediction, then try linear regression on just those. You may also
   want to multiply those features together to get their "interaction"
   products. (This uses RF as a feature selection method.)

Beyond this, I am sure more "expert" types will be able to chime in, and
also correct me if I've said anything wrong here.

Cheers
Eric

On Sat, Oct 1, 2016 at 10:59 AM, Thomas Evangelidis <[email protected]> wrote:

> Dear scikit-learn users and developers,
>
> I have a dataset consisting of 42 observations (molnames) and 4
> variables (VDWAALS, EEL, EGB, ESURF) with which I want to make a
> predictive model that estimates the experimental value (Expr). I tried
> multivariate linear regression using 10,000 bootstrap repeats, each
> time using 21 observations for training and the remaining 21 for
> testing, but the average correlation was only R = 0.1727 +- 0.19779.
>
> molname       VDWAALS   EEL       EGB       ESURF    Expr
> CHEMBL108457  -20.4848  -96.5826   23.4584  -5.4045  -7.27193
> CHEMBL388269  -50.3860   28.9403  -51.5147  -6.4061  -6.8022
> CHEMBL244078  -49.1466  -21.9869   17.7999  -6.4588  -6.61742
> CHEMBL244077  -53.4365  -32.8943   34.8723  -7.0384  -6.61742
> CHEMBL396772  -51.4111  -34.4904   36.0326  -6.5443  -5.82207
> ........
>
> I would like your advice about what other machine learning algorithm I
> could try with these data. E.g. can I make a decision tree, or are the
> observations and variables too few to avoid overfitting? I could
> include more variables but the observations will always remain 42.
>
> I would greatly appreciate any advice!
>
> Thomas
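
A rough sketch of the three-step recipe at the top of this message, in
case it helps make it concrete. Everything specific here is an
assumption made for illustration: the data is random synthetic data
with the same shape as the dataset described (42 rows, 4 columns), not
the real measurements, and the parameter grid, the 5-fold cross
validation, and the choice to keep the top two features are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (NOT the real measurements): 42 rows, 4 features.
rng = np.random.RandomState(0)
feature_names = ["VDWAALS", "EEL", "EGB", "ESURF"]
X = rng.normal(size=(42, 4))
y = 2.0 * X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=42)

# Step 1: random forest baseline, scored by 5-fold cross-validated R^2.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
baseline_r2 = cross_val_score(rf, X, y, cv=5, scoring="r2").mean()

# Step 2: tune a few RF parameters with grid search + cross validation.
# The grid below is an arbitrary illustrative choice.
param_grid = {"max_depth": [2, 4, None], "min_samples_leaf": [1, 3, 5]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X, y)

# Step 3: use the tuned forest's feature_importances_ as a feature
# selector, keeping the two highest-ranked features (arbitrary cutoff).
importances = search.best_estimator_.feature_importances_
top = np.argsort(importances)[::-1][:2]
selected = [feature_names[i] for i in top]

# Linear regression on the selected features plus their product
# (the "interaction" term). Resulting columns: a, b, a*b.
interactions = PolynomialFeatures(
    degree=2, interaction_only=True, include_bias=False
)
X_lin = interactions.fit_transform(X[:, top])
linear_r2 = cross_val_score(
    LinearRegression(), X_lin, y, cv=5, scoring="r2"
).mean()

print("RF baseline R^2:", baseline_r2)
print("Best RF params:", search.best_params_)
print("Selected features:", selected)
print("Linear + interaction R^2:", linear_r2)
```

One caveat on the sketch: the features are selected using all 42 rows
before the linear model is cross-validated, so the last score is
optimistic. With this few observations it would be safer to put the
selection step inside the cross-validation loop (for example in a
Pipeline) and to treat every score as a rough indication only.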
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
