Hi Thomas,

An example of such a "dummy" meta-regressor can be seen in NNScore, which is a protein-ligand scoring function (one of Sebastian's suggestions). A meta-class is implemented in the Open Drug Discovery Toolkit [here: https://github.com/oddt/oddt/blob/master/oddt/scoring/__init__.py#L200], along with the also-suggested RF-Score and a few other methods you might find useful.
Actually, what NNScore does is train 1000 MLPRegressors and pick the 20 best scored on the PDBbind test set. The ensemble prediction is the mean prediction of those best models.

----
Pozdrawiam, | Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-01-11 21:16 GMT+01:00 Sebastian Raschka <se.rasc...@gmail.com>:
> Hi, Thomas,
>
> I was just reading through a recent preprint (Protein-Ligand Scoring with
> Convolutional Neural Networks, https://arxiv.org/abs/1612.02751), and I
> thought that it may be related to your task and maybe interesting or even
> useful for your work.
> Also check out references 13, 21, 22, and 24, where they talk about
> alternative (the more classic) representations of protein-ligand complexes
> or interactions as inputs to either random forests or multi-layer
> perceptrons.
>
> Best,
> Sebastian
>
> > On Jan 10, 2017, at 7:46 AM, Thomas Evangelidis <teva...@gmail.com> wrote:
> >
> > Jacob,
> >
> > The features are not 6000. I train 2 MLPRegressors from two types of
> > data; both refer to the same dataset (35 molecules in total), but each one
> > contains a different type of information. The first data set consists of 60
> > features. I tried 100 different random states and measured the average |R|
> > using leave-20%-out cross-validation. Below are the results from the
> > first data set:
> >
> > RandomForestRegressor: |R| = 0.389018243545 +- 0.252891783658
> > LASSO: |R| = 0.247411754937 +- 0.232325286471
> > GradientBoostingRegressor: |R| = 0.324483769202 +- 0.211778410841
> > MLPRegressor: |R| = 0.540528696597 +- 0.255714448793
> >
> > The second type of data consists of 456 features.
> > Below are the results for these too:
> >
> > RandomForestRegressor: |R| = 0.361562548904 +- 0.234872385318
> > LASSO: |R| = 3.27752711304e-16 +- 2.60800139195e-16
> > GradientBoostingRegressor: |R| = 0.328087138161 +- 0.229588427086
> > MLPRegressor: |R| = 0.455473342507 +- 0.24579081197
> >
> > In the end I want to combine models created from these data types using
> > a meta-estimator (that was my original question). The combination with the
> > highest |R| (0.631851796403 +- 0.247911204514) was produced by an SVR that
> > combined the best MLPRegressor from data type 1 and the best MLPRegressor
> > from data type 2.
> >
> > On 10 January 2017 at 01:36, Jacob Schreiber <jmschreibe...@gmail.com> wrote:
> > Even with a single layer with 10 neurons you're still trying to train
> > over 6000 parameters using ~30 samples. Dropout is a concept common in
> > neural networks, but it doesn't appear to be in sklearn's implementation of
> > MLPs. Early stopping based on validation performance isn't an "extra" step
> > for reducing overfitting; it's basically a required step for neural
> > networks. It seems like you have a validation sample of ~6 data points... I'm
> > still very skeptical of that giving you proper results for a complex model.
> > Will this larger dataset be of exactly the same data? Just taking another
> > unrelated dataset and showing that an MLP can learn it doesn't mean it will
> > work for your specific data. Can you post the actual results from using
> > LASSO, RandomForestRegressor, GradientBoostingRegressor, and MLP?
> >
> > On Mon, Jan 9, 2017 at 4:21 PM, Stuart Reynolds <stu...@stuartreynolds.net> wrote:
> > If you don't have a large dataset, you can still do leave-one-out cross
> > validation.
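The evaluation protocols mentioned here, Thomas's leave-20%-out over many random states and Stuart's leave-one-out suggestion, can both be expressed with scikit-learn splitters. A rough sketch on synthetic data (Lasso stands in for any of the regressors in the thread; the 35-sample size mirrors the dataset described above, but everything else is made up for illustration):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut, ShuffleSplit, cross_val_predict

# Synthetic stand-in for the 35-molecule, 60-feature dataset
X, y = make_regression(n_samples=35, n_features=60, noise=10.0, random_state=0)

# Leave-20%-out, repeated over several random states; average |R| over splits
rs = []
for state in range(20):  # the email used 100 random states
    train_idx, test_idx = next(
        ShuffleSplit(n_splits=1, test_size=0.2, random_state=state).split(X)
    )
    model = Lasso(alpha=1.0).fit(X[train_idx], y[train_idx])
    r, _ = pearsonr(y[test_idx], model.predict(X[test_idx]))
    rs.append(abs(r))
print("leave-20%%-out |R|: %.3f +- %.3f" % (np.mean(rs), np.std(rs)))

# Leave-one-out: each sample is a test set exactly once;
# correlate the pooled out-of-fold predictions with the targets
pred = cross_val_predict(Lasso(alpha=1.0), X, y, cv=LeaveOneOut())
r_loo, _ = pearsonr(y, pred)
```

With only 35 samples, leave-one-out uses the data most efficiently, while the repeated 20% splits give a spread estimate (the +- values quoted above).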
> > On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis <teva...@gmail.com> wrote:
> >
> > Jacob & Sebastian,
> >
> > I think the best way to find out whether my modeling approach works is to
> > find a larger dataset and split it into two parts: the first will be used
> > as a training/cross-validation set and the second as a test set, like in a
> > real case scenario.
> >
> > Regarding the MLPRegressor regularization, below is my optimum setup:
> >
> > MLPRegressor(random_state=random_state, max_iter=400, early_stopping=True,
> >              validation_fraction=0.2, alpha=10, hidden_layer_sizes=(10,))
> >
> > This means only one hidden layer with a maximum of 10 neurons, alpha=10 for
> > L2 regularization, and early stopping to terminate training if the validation
> > score is not improving. I think this is a quite simple model. My final
> > predictor is an SVR that combines 2 MLPRegressors, each one trained with a
> > different type of input data.
> >
> > @Sebastian
> > You have mentioned dropout again but I could not find it in the docs:
> > http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor
> >
> > Maybe you are referring to another MLPRegressor implementation? I have
> > seen a while ago another implementation you had on GitHub. Can you clarify
> > which one you recommend and why?
> >
> > Thank you both for your hints!
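The stacked setup described in this message, an SVR on top of two MLPRegressors trained on different feature sets, might be sketched roughly as below. The data and the 60/456 feature split are synthetic; out-of-fold predictions are used to build the meta-features, which is one common way to limit leakage (the email does not specify how the base-model predictions were obtained):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Two views of the same 35 molecules: 60 and 456 features respectively
X, y = make_regression(n_samples=35, n_features=516, noise=5.0, random_state=0)
X1, X2 = X[:, :60], X[:, 60:]

def base_mlp():
    # Hyperparameters quoted from the email above
    return MLPRegressor(random_state=0, max_iter=400, early_stopping=True,
                        validation_fraction=0.2, alpha=10,
                        hidden_layer_sizes=(10,))

# Out-of-fold predictions of each base model become the meta-features
meta_X = np.column_stack([
    cross_val_predict(base_mlp(), X1, y, cv=5),
    cross_val_predict(base_mlp(), X2, y, cv=5),
])

# The SVR meta-estimator combines the two base predictions
meta = SVR().fit(meta_X, y)
```

At prediction time each MLPRegressor would be refit on the full training data and its two outputs fed to `meta.predict`.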
> > best,
> > Thomas
> >
> > --
> > ======================================================================
> > Thomas Evangelidis
> > Research Specialist
> > CEITEC - Central European Institute of Technology
> > Masaryk University
> > Kamenice 5/A35/1S081,
> > 62500 Brno, Czech Republic
> >
> > email: tev...@pharm.uoa.gr
> >        teva...@gmail.com
> >
> > website: https://sites.google.com/site/thomasevangelidishomepage/
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn@python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
>
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
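As a footnote, the NNScore-style scheme described at the top of the thread (train many MLPRegressors, keep the 20 best on a held-out set, report the mean of their predictions) can be sketched as follows. The counts are scaled down and the data is synthetic, so treat this as an illustration rather than the ODDT implementation:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data; NNScore itself trains on PDBbind complexes
X, y = make_regression(n_samples=200, n_features=60, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train many MLPs that differ only in their random initialization
# (NNScore trains 1000; 15 here to keep the sketch cheap)
models = [
    MLPRegressor(hidden_layer_sizes=(10,), max_iter=500, random_state=seed)
    .fit(X_train, y_train)
    for seed in range(15)
]

# Keep the k best scored on the held-out set (NNScore keeps 20)
k = 5
best = sorted(models, key=lambda m: m.score(X_test, y_test), reverse=True)[:k]

# The ensemble prediction is the mean prediction of the selected models
ensemble_pred = np.mean([m.predict(X_test) for m in best], axis=0)
```

Selecting on the same set used for the final evaluation leaks information, so in practice the selection set and the reporting set should be distinct.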