Re: [scikit-learn] meta-estimator for multiple MLPRegressor

Thomas Evangelidis Tue, 10 Jan 2017 04:49:01 -0800

Jacob,

The number of features is not 6000. I train two MLPRegressors on two types
of data. Both describe the same dataset (35 molecules in total), but each
contains a different type of information. The first data type consists of
60 features. I tried 100 different random states and measured the average
|R| using leave-20%-out cross-validation. Below are the results for the
first data type:


RandomForestRegressor: |R|= 0.389018243545 +- 0.252891783658
LASSO: |R|= 0.247411754937 +- 0.232325286471
GradientBoostingRegressor: |R|= 0.324483769202 +- 0.211778410841
MLPRegressor: |R|= 0.540528696597 +- 0.255714448793
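
For anyone who wants to reproduce this kind of evaluation, here is a minimal
sketch of the protocol described above. It assumes one random 80/20 split
per random state and takes |R| to be the absolute Pearson correlation
between observed and predicted values on the held-out 20%. The helper names
and the synthetic data are hypothetical stand-ins, not my actual script or
molecules.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.neural_network import MLPRegressor


def abs_pearson_r(y_true, y_pred):
    """|R| between observed and predicted values; 0.0 if either is constant."""
    if np.std(y_true) == 0 or np.std(y_pred) == 0:
        return 0.0
    return abs(np.corrcoef(y_true, y_pred)[0, 1])


def repeated_holdout_abs_r(make_model, X, y, n_repeats=100, test_size=0.2):
    """Mean and std of |R| over n_repeats random leave-20%-out splits."""
    scores = []
    for seed in range(n_repeats):
        split = ShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
        train_idx, test_idx = next(split.split(X))
        model = make_model(seed).fit(X[train_idx], y[train_idx])
        scores.append(abs_pearson_r(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic stand-in for a 35 x 60 feature matrix (NOT the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(35, 60))
y_demo = X_demo[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=35)

# Only 5 repeats here to keep the demo quick; the post used 100.
mean_r, std_r = repeated_holdout_abs_r(
    lambda seed: MLPRegressor(random_state=seed, max_iter=400,
                              early_stopping=True, validation_fraction=0.2,
                              alpha=10, hidden_layer_sizes=(10,)),
    X_demo, y_demo, n_repeats=5)
print("MLPRegressor: |R|= %s +- %s" % (mean_r, std_r))
```

Passing a factory (`make_model`) rather than a fitted estimator lets the
same random state drive both the split and the model initialisation.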

The second type of data consists of 456 features. Below are the results for
this data type:

RandomForestRegressor: |R|= 0.361562548904 +- 0.234872385318
LASSO: |R|= 3.27752711304e-16 +- 2.60800139195e-16
GradientBoostingRegressor: |R|= 0.328087138161 +- 0.229588427086
MLPRegressor: |R|= 0.455473342507 +- 0.24579081197
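
To put the feature counts in terms of trainable parameters, the arithmetic
for a fully connected network can be sketched as below. This assumes a
single regression output and one bias per unit, with the
hidden_layer_sizes=(10,) setting from my setup quoted further down; the
helper name is hypothetical.

```python
def mlp_param_count(n_features, hidden_layer_sizes=(10,), n_outputs=1):
    """Weights plus biases of a fully connected MLP."""
    sizes = [n_features, *hidden_layer_sizes, n_outputs]
    # Each layer contributes n_in * n_out weights and n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


print(mlp_param_count(60))   # 60*10 + 10 + 10*1 + 1 = 621
print(mlp_param_count(456))  # 456*10 + 10 + 10*1 + 1 = 4581
```

Both counts are below 6000, but both still exceed the 35 available samples
by a wide margin.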


In the end I want to combine models created from these data types using a
meta-estimator (that was my original question). The combination with the
highest |R| (0.631851796403 +- 0.247911204514) was produced by an SVR that
combined the best MLPRegressor from data type 1 and the best MLPRegressor
from data type 2.
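
One way such a combination could be set up is sketched below. It is an
illustration under assumptions, not necessarily the exact procedure behind
the number above: the SVR is trained on out-of-fold predictions of the two
MLPRegressors (5-fold, via cross_val_predict), the SVR uses default
hyperparameters, and the function names and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR


def make_mlp(seed):
    # Same hyperparameters as the MLPRegressor setup quoted in this thread.
    return MLPRegressor(random_state=seed, max_iter=400, early_stopping=True,
                        validation_fraction=0.2, alpha=10,
                        hidden_layer_sizes=(10,))


def fit_stacked_svr(X1, X2, y, seed=0):
    """Fit one MLP per data type and an SVR on their out-of-fold predictions."""
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    mlp1, mlp2 = make_mlp(seed), make_mlp(seed)
    # Out-of-fold predictions: the SVR never sees a prediction made by an
    # MLP that was trained on that same row.
    oof = np.column_stack([
        cross_val_predict(mlp1, X1, y, cv=cv),
        cross_val_predict(mlp2, X2, y, cv=cv),
    ])
    meta = SVR().fit(oof, y)
    # Refit the base models on all rows for use at prediction time.
    mlp1.fit(X1, y)
    mlp2.fit(X2, y)
    return mlp1, mlp2, meta


def predict_stacked(mlp1, mlp2, meta, X1, X2):
    base = np.column_stack([mlp1.predict(X1), mlp2.predict(X2)])
    return meta.predict(base)


# Synthetic stand-ins for the two feature sets (NOT the real data).
rng = np.random.default_rng(0)
X1_demo = rng.normal(size=(35, 60))
X2_demo = rng.normal(size=(35, 456))
y_demo = X1_demo[:, 0] + X2_demo[:, 0] + rng.normal(scale=0.1, size=35)

models = fit_stacked_svr(X1_demo, X2_demo, y_demo)
y_hat = predict_stacked(*models, X1_demo, X2_demo)
```

Any |R| reported for the combined model should itself come from an outer
hold-out split that neither the MLPRegressors nor the SVR has seen.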





On 10 January 2017 at 01:36, Jacob Schreiber <[email protected]>
wrote:

> Even with a single layer with 10 neurons you're still trying to train over
> 6000 parameters using ~30 samples. Dropout is a concept common in neural
> networks, but doesn't appear to be in sklearn's implementation of MLPs.
> Early stopping based on validation performance isn't an "extra" step for
> reducing overfitting, it's basically a required step for neural networks.
> It seems like you have a validation sample of ~6 datapoints. I'm still
> very skeptical of that giving you proper results for a complex model. Will
> this larger dataset be of exactly the same data? Just taking another
> unrelated dataset and showing that an MLP can learn it doesn't mean it will
> work for your specific data. Can you post the actual results from using
> LASSO, RandomForestRegressor, GradientBoostingRegressor, and MLP?
>
> On Mon, Jan 9, 2017 at 4:21 PM, Stuart Reynolds <[email protected]> wrote:
>
>> If you don't have a large dataset, you can still do leave-one-out
>> cross-validation.
>>
>> On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis <[email protected]>
>> wrote:
>>
>>>
>>> Jacob & Sebastian,
>>>
>>> I think the best way to find out if my modeling approach works is to
>>> find a larger dataset and split it into two parts: the first will be used
>>> as the training/cross-validation set and the second as a test set, as in
>>> a real-case scenario.
>>>
>>> Regarding the MLPRegressor regularization, below is my optimum setup:
>>>
>>> MLPRegressor(random_state=random_state, max_iter=400,
>>> early_stopping=True, validation_fraction=0.2, alpha=10,
>>> hidden_layer_sizes=(10,))
>>>
>>>
>>> This means only one hidden layer with 10 neurons, alpha=10 for L2
>>> regularization, and early stopping to terminate training if the
>>> validation score is not improving. I think this is quite a simple model.
>>> My final predictor is an SVR that combines 2 MLPRegressors, each one
>>> trained with a different type of input data.
>>>
>>> @Sebastian
>>> You have mentioned dropout again but I could not find it in the docs:
>>> http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor
>>>
>>> Maybe you are referring to another MLPRegressor implementation? A while
>>> ago I saw another implementation of yours on GitHub. Can you clarify
>>> which one you recommend and why?
>>>
>>>
>>> Thank you both for your hints!
>>>
>>> best
>>> Thomas
>>>
>>>
>>> --
>>>
>>> ======================================================================
>>>
>>> Thomas Evangelidis
>>>
>>> Research Specialist
>>> CEITEC - Central European Institute of Technology
>>> Masaryk University
>>> Kamenice 5/A35/1S081,
>>> 62500 Brno, Czech Republic
>>>
>>> email: [email protected]
>>>
>>>           [email protected]
>>>
>>> website: https://sites.google.com/site/thomasevangelidishomepage/
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> [email protected]
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>


-- 

======================================================================

Thomas Evangelidis

Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: [email protected]

          [email protected]


website: https://sites.google.com/site/thomasevangelidishomepage/
