Re: [scikit-learn] suggested machine learning algorithm

Sebastian Raschka Sat, 01 Oct 2016 13:00:35 -0700

Maybe it’s worth switching to LOOCV, since you may have a bit of a pessimistic 
bias here due to the small training set size (a bootstrap sample contains, 
asymptotically, only about 63.2% of the unique observations for training). I 
would try both linear and nonlinear models. Instead of adding more features, 
maybe also try to eliminate some via L1 regularization, feature selection, or 
feature extraction, in addition to trying different algorithms such as random 
forests, Gaussian processes, RBF-kernel SVM regression, and so forth.
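
As a rough illustration of the suggestion above, here is a minimal sketch that compares a few model families under leave-one-out cross-validation with scikit-learn. The data are a synthetic stand-in for the 42 x 4 table (the real values are not reproduced here), and the hyperparameters (alpha=0.1, C=1.0, 100 trees, the RBF + WhiteKernel kernel) are placeholder assumptions that would need tuning. Pooling the held-out predictions and reporting their Pearson correlation with the targets is just one possible way to summarize LOOCV.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# ASSUMPTION: synthetic stand-in for the real data
# (42 observations; 4 features such as VDWAALS, EEL, EGB, ESURF; target Expr).
rng = np.random.default_rng(0)
X = rng.normal(size=(42, 4))
y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.5, size=42)

# Candidate models; all hyperparameters are untuned placeholders.
models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "lasso (L1)": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "RBF SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0)),
    "Gaussian process": make_pipeline(
        StandardScaler(),
        GaussianProcessRegressor(
            kernel=RBF() + WhiteKernel(), normalize_y=True, random_state=0
        ),
    ),
}


def loocv_correlation(model, X, y):
    """Pearson R between targets and pooled leave-one-out predictions."""
    # Each observation is predicted by a model fit on the other n - 1.
    preds = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return np.corrcoef(y, preds)[0, 1]


scores = {name: loocv_correlation(m, X, y) for name, m in models.items()}
for name, r in scores.items():
    print(f"{name:>16s}: R = {r:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the held-out observation never influences the preprocessing.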



> On Oct 1, 2016, at 10:59 AM, Thomas Evangelidis <[email protected]> wrote:
> 
> Dear scikit-learn users and developers,
> 
> I have a dataset consisting of 42 observations (molnames) and 4 variables 
> (VDWAALS, EEL, EGB, ESURF), with which I want to build a predictive model that 
> estimates the experimental value (Expr). I tried multivariate linear 
> regression with 10,000 bootstrap repeats, each time using 21 observations for 
> training and the remaining 21 for testing, but the average correlation was 
> only R = 0.1727 +- 0.19779.
> 
> 
> molname          VDWAALS       EEL       EGB     ESURF      Expr
> CHEMBL108457    -20.4848  -96.5826   23.4584   -5.4045  -7.27193
> CHEMBL388269    -50.3860   28.9403  -51.5147   -6.4061   -6.8022
> CHEMBL244078    -49.1466  -21.9869   17.7999   -6.4588  -6.61742
> CHEMBL244077    -53.4365  -32.8943   34.8723   -7.0384  -6.61742
> CHEMBL396772    -51.4111  -34.4904   36.0326   -6.5443  -5.82207
> ........
> 
> I would like your advice about what other machine learning algorithms I could 
> try with these data. E.g., can I build a decision tree, or are the observations 
> and variables too few to avoid overfitting? I could include more variables, but 
> the number of observations will always remain 42.
> 
> I would greatly appreciate any advice!
> 
> Thomas
> 
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn

