Hi Thomas,

I was just reading a recent preprint (Protein-Ligand Scoring with 
Convolutional Neural Networks, https://arxiv.org/abs/1612.02751), and I thought 
it might be related to your task and perhaps interesting or even useful for 
your work.
Also check out references 13, 21, 22, and 24, where they discuss alternative 
(more classic) representations of protein-ligand complexes or interactions 
as inputs to either random forests or multi-layer perceptrons.

Best,
Sebastian


> On Jan 10, 2017, at 7:46 AM, Thomas Evangelidis <teva...@gmail.com> wrote:
> 
> Jacob,
> 
> The features are not 6000. I train two MLPRegressors on two types of data; 
> both refer to the same dataset (35 molecules in total), but each contains a 
> different type of information. The first dataset consists of 60 features. I 
> tried 100 different random states and measured the average |R| using 
> leave-20%-out cross-validation. Below are the results from the first dataset:
> 
> RandomForestRegressor: |R|= 0.389018243545 +- 0.252891783658
> LASSO: |R|= 0.247411754937 +- 0.232325286471
> GradientBoostingRegressor: |R|= 0.324483769202 +- 0.211778410841
> MLPRegressor: |R|= 0.540528696597 +- 0.255714448793
> 
> The second dataset consists of 456 features. Below are the results for 
> these, too:
> 
> RandomForestRegressor: |R|= 0.361562548904 +- 0.234872385318
> LASSO: |R|= 3.27752711304e-16 +- 2.60800139195e-16
> GradientBoostingRegressor: |R|= 0.328087138161 +- 0.229588427086
> MLPRegressor: |R|= 0.455473342507 +- 0.24579081197
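> 
> For reference, the evaluation protocol I describe above can be sketched 
> roughly as follows (the dataset here is synthetic and the estimator is a 
> placeholder; only the repeated leave-20%-out / average-|R| logic is the 
> point):

```python
# Sketch of the protocol: average |R| (absolute Pearson correlation)
# over 100 random leave-20%-out splits, one split per random state.
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ShuffleSplit

# Stand-in for the real data: 35 molecules, 60 features.
X, y = make_regression(n_samples=35, n_features=60, noise=10.0, random_state=0)

scores = []
for rs in range(100):
    cv = ShuffleSplit(n_splits=1, test_size=0.2, random_state=rs)
    train, test = next(cv.split(X))
    model = RandomForestRegressor(random_state=rs).fit(X[train], y[train])
    r, _ = pearsonr(y[test], model.predict(X[test]))
    scores.append(abs(r))

print(np.mean(scores), "+-", np.std(scores))
```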
> 
> 
> In the end I want to combine models created from these data types using a 
> meta-estimator (that was my original question). The combination with the 
> highest |R| (0.631851796403 +- 0.247911204514) was produced by an SVR that 
> combined the best MLPRegressor from data type 1 and the best MLPRegressor 
> from data type 2.
> 
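> In sketch form, the meta-estimator idea looks something like this (feature 
> sets, hyperparameters, and the 5-fold scheme below are illustrative, not my 
> exact setup; out-of-fold predictions keep the SVR from training on fitted 
> values):

```python
# Stacking sketch: two base MLPRegressors, one per feature set, whose
# out-of-fold predictions become the two inputs of an SVR meta-model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Synthetic stand-ins for the two feature sets (60 and 456 features).
X1, y = make_regression(n_samples=35, n_features=60, noise=5.0, random_state=1)
X2, _ = make_regression(n_samples=35, n_features=456, noise=5.0, random_state=2)

mlp1 = MLPRegressor(hidden_layer_sizes=(10,), alpha=10, max_iter=400,
                    random_state=0)
mlp2 = MLPRegressor(hidden_layer_sizes=(10,), alpha=10, max_iter=400,
                    random_state=0)

# Out-of-fold predictions of each base model on its own feature set.
p1 = cross_val_predict(mlp1, X1, y, cv=5)
p2 = cross_val_predict(mlp2, X2, y, cv=5)

# The SVR sees only the two base-model predictions as features.
meta = SVR().fit(np.column_stack([p1, p2]), y)
```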
> On 10 January 2017 at 01:36, Jacob Schreiber <jmschreibe...@gmail.com> wrote:
> Even with a single layer with 10 neurons you're still trying to train over 
> 6000 parameters using ~30 samples. Dropout is a common concept in neural 
> networks, but it doesn't appear to be in sklearn's implementation of MLPs. 
> Early stopping based on validation performance isn't an "extra" step for 
> reducing overfitting; it's basically a required step for neural networks. It 
> seems like you have a validation sample of ~6 data points. I'm still very 
> skeptical that this will give you proper results for a complex model. Will 
> this larger dataset consist of exactly the same kind of data? Just taking 
> another unrelated dataset and showing that an MLP can learn it doesn't mean 
> it will work for your specific data. Can you post the actual results from 
> using LASSO, RandomForestRegressor, GradientBoostingRegressor, and MLP?
> 
> On Mon, Jan 9, 2017 at 4:21 PM, Stuart Reynolds <stu...@stuartreynolds.net> 
> wrote:
> If you don't have a large dataset, you can still do leave-one-out 
> cross-validation.
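> 
> A minimal sketch of what that would look like here (the data and the Lasso 
> base model are placeholders; with n samples you get n splits, each holding 
> out a single point, and one |R| computed from all held-out predictions):

```python
# Leave-one-out cross-validation: every sample is predicted by a model
# trained on all the other samples.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut, cross_val_predict

X, y = make_regression(n_samples=35, n_features=60, noise=10.0, random_state=0)

# One held-out prediction per sample, 35 fits in total.
pred = cross_val_predict(Lasso(alpha=1.0), X, y, cv=LeaveOneOut())
r = np.corrcoef(y, pred)[0, 1]
print(abs(r))
```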
> 
> On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis <teva...@gmail.com> wrote:
> 
> Jacob & Sebastian,
> 
> I think the best way to find out whether my modeling approach works is to 
> find a larger dataset and split it into two parts: the first will be used as 
> the training/cross-validation set and the second as a test set, as in a 
> real-world scenario.
> 
> Regarding the MLPRegressor regularization, below is my optimum setup:
> 
> MLPRegressor(random_state=random_state, max_iter=400, early_stopping=True, 
> validation_fraction=0.2, alpha=10, hidden_layer_sizes=(10,))
> 
> This means only one hidden layer with at most 10 neurons, alpha=10 for L2 
> regularization, and early stopping to terminate training if the validation 
> score is not improving. I think this is quite a simple model. My final 
> predictor is an SVR that combines the 2 MLPRegressors, each one trained on a 
> different type of input data.
> 
> @Sebastian
> You have mentioned dropout again but I could not find it in the docs:
> http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor
> 
> Maybe you are referring to another MLPRegressor implementation? A while ago 
> I saw another implementation you had on GitHub. Can you clarify which one 
> you recommend, and why?
> 
> 
> Thank you both for your hints!
> 
> best
> Thomas
> 
> 
> 
> -- 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ======================================================================
> 
> 
> Thomas Evangelidis
> 
> 
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081, 
> 62500 Brno, Czech Republic 
> 
> email: tev...@pharm.uoa.gr
> 
> 
>               teva...@gmail.com
> 
> 
> 
> website:
> 
> https://sites.google.com/site/thomasevangelidishomepage/
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -- 
> ======================================================================
> Thomas Evangelidis
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081, 
> 62500 Brno, Czech Republic 
> 
> email: tev...@pharm.uoa.gr
>               teva...@gmail.com
> 
> website: https://sites.google.com/site/thomasevangelidishomepage/
> 
> 
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
