Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-08 Thread Joel Nothman
Btw, I may have been unclear in the discussion of overfitting. For
*training* the meta-estimator in stacking, it's standard to do something
like cross_val_predict on your training set to produce its input features.
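
A minimal sketch of that scheme, with placeholder base models and an SVR
meta-estimator on synthetic data (all choices here are illustrative, not
prescriptive):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import cross_val_predict
    from sklearn.neural_network import MLPRegressor
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=200, n_features=20, random_state=0)
    base_models = [MLPRegressor(random_state=i, max_iter=2000) for i in range(3)]

    # Out-of-fold predictions of each base model become the meta-estimator's
    # input features, so it never trains on in-sample (fit-time) predictions.
    meta_features = np.column_stack(
        [cross_val_predict(m, X, y, cv=5) for m in base_models])
    meta_model = SVR().fit(meta_features, y)

    # The base models themselves are then refit on the full training set.
    for m in base_models:
        m.fit(X, y)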

On 8 January 2017 at 22:42, Thomas Evangelidis  wrote:

> Sebastian and Jacob,
>
> Regarding overfitting, Lasso, Ridge regression and ElasticNet have poor
> performance on my data; MLPRegressors are way superior. On another note,
> the MLPRegressor class has some means to control overfitting, such as the
> alpha parameter for the L2 regularization (maybe setting it to a high
> value?), the number of neurons in the hidden layers (lowering
> hidden_layer_sizes?), or even early_stopping=True. Wouldn't these be
> sufficient to be on the safe side?
>
> Once more I want to highlight something I wrote previously that might have
> been overlooked: the resulting MLPRegressors will be applied to new
> datasets that *ARE VERY SIMILAR TO THE TRAINING DATA*. In other words the
> application of the models will be strictly confined to their applicability
> domain. Wouldn't that be sufficient to not worry about model overfitting
> too much?
>
>
>
>
>
> On 8 January 2017 at 11:53, Sebastian Raschka 
> wrote:
>
>> Like to train an SVR to combine the predictions of the top 10%
>> MLPRegressors using the same data that were used for training of the
>> MLPRegressors? Wouldn't that lead to overfitting?
>>
>>
>> It could, but you don't need to use the same data that you used for
>> training to fit the meta-estimator. As is commonly done in stacking
>> with cross-validation, you can train the MLPs on training folds and pass
>> the predictions from a held-out fold to the meta-estimator. But then you'd
>> have to retrain your MLPs, and it sounded like you wanted to avoid that.
>>
>> I am currently on mobile and only browsed through the thread briefly, but
>> I agree with the others that it sounds like your model(s) may have too much
>> capacity for such a small dataset -- it can be tricky to fit the parameters
>> without overfitting. In any case, if you want to do the stacking, I'd
>> probably insert a k-fold CV between the MLPs and the meta-estimator.
>> However, I'd definitely also recommend simpler models as an alternative.
>>
>> Best,
>> Sebastian
>>
>> On Jan 7, 2017, at 4:36 PM, Thomas Evangelidis  wrote:
>>
>>
>>
>> On 7 January 2017 at 21:20, Sebastian Raschka 
>> wrote:
>>
>>> Hi, Thomas,
>>> sorry, I overlooked the regression part …
>>> This would be a bit trickier; I am not sure what a good strategy for
>>> averaging regression outputs would be. However, if you just want to compute
>>> the average, you could do something like
>>> np.mean(np.asarray([r.predict(X) for r in list_of_your_mlps]), axis=0)
>>>
>>> However, it may be better to use stacking, and use the output of
>>> r.predict(X) as meta features to train a model based on these?
>>>
>>
>> ​Like to train an SVR to combine the predictions of the top 10%
>> MLPRegressors using the same data that were used for training of the
>> MLPRegressors? Wouldn't that lead to overfitting?
>> ​
>>
>>
>>>
>>> Best,
>>> Sebastian
>>>
>>> > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis 
>>> wrote:
>>> >
>>> > Hi Sebastian,
>>> >
>>> > Thanks, I will try it in another classification problem I have.
>>> However, this time I am using regressors not classifiers.
>>> >
>>> > On Jan 7, 2017 19:28, "Sebastian Raschka" 
>>> wrote:
>>> > Hi, Thomas,
>>> >
>>> > the VotingClassifier can combine different models via majority voting
>>> amongst their predictions. Unfortunately, it refits the classifiers
>>> (after cloning them). I think we implemented it this way to make it
>>> compatible with GridSearch and so forth. However, I have a version of the
>>> estimator that you can initialize with “refit=False” to avoid refitting,
>>> if it helps.
>>> http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers
>>> >
>>> > Best,
>>> > Sebastian
>>> >
>>> >
>>> >
>>> > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis 
>>> wrote:
>>> > >
>>> > > Greetings,
>>> > >
>>> > > I have trained many MLPRegressors using different random_state values
>>> and estimated the R^2 using cross-validation. Now I want to combine the top
>>> 10% of them in order to get more accurate predictions. Is there a
>>> meta-estimator that can take as input a few precomputed MLPRegressors and
>>> give consensus predictions? Can the BaggingRegressor do this job using
>>> MLPRegressors as input?
>>> > >
>>> > > Thanks in advance for any hint.
>>> > > Thomas
>>> > >
>>> > >
>>> > > --
>>> > > 
>>> ==
>>> > > Thomas Evangelidis
>>> > > Research Specialist
>>> > > CEITEC - Central European Institute of Technology
>>> > > Masaryk University
>>> > > Kamenice 5/A35/1S081,
>>> 

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-08 Thread Thomas Evangelidis
Sebastian and Jacob,

Regarding overfitting, Lasso, Ridge regression and ElasticNet have poor
performance on my data; MLPRegressors are way superior. On another note,
the MLPRegressor class has some means to control overfitting, such as the
alpha parameter for the L2 regularization (maybe setting it to a high
value?), the number of neurons in the hidden layers (lowering
hidden_layer_sizes?), or even early_stopping=True. Wouldn't these be
sufficient to be on the safe side?
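
For illustration only, a configuration along those lines might look like this
(the values are arbitrary, not recommendations):

    from sklearn.neural_network import MLPRegressor

    reg = MLPRegressor(
        hidden_layer_sizes=(10,),   # fewer neurons -> lower model capacity
        alpha=1.0,                  # stronger L2 penalty (default is 0.0001)
        early_stopping=True,        # stop when validation score stops improving
        validation_fraction=0.1,    # training fraction held out for that check
        max_iter=2000,
        random_state=0,
    )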

Once more I want to highlight something I wrote previously that might have
been overlooked: the resulting MLPRegressors will be applied to new
datasets that *ARE VERY SIMILAR TO THE TRAINING DATA*. In other words the
application of the models will be strictly confined to their applicability
domain. Wouldn't that be sufficient to not worry about model overfitting
too much?





On 8 January 2017 at 11:53, Sebastian Raschka  wrote:

> Like to train an SVR to combine the predictions of the top 10%
> MLPRegressors using the same data that were used for training of the
> MLPRegressors? Wouldn't that lead to overfitting?
>
>
> It could, but you don't need to use the same data that you used for
> training to fit the meta-estimator. As is commonly done in stacking
> with cross-validation, you can train the MLPs on training folds and pass
> the predictions from a held-out fold to the meta-estimator. But then you'd
> have to retrain your MLPs, and it sounded like you wanted to avoid that.
>
> I am currently on mobile and only browsed through the thread briefly, but
> I agree with the others that it sounds like your model(s) may have too much
> capacity for such a small dataset -- it can be tricky to fit the parameters
> without overfitting. In any case, if you want to do the stacking, I'd
> probably insert a k-fold CV between the MLPs and the meta-estimator.
> However, I'd definitely also recommend simpler models as an alternative.
>
> Best,
> Sebastian
>
> On Jan 7, 2017, at 4:36 PM, Thomas Evangelidis  wrote:
>
>
>
> On 7 January 2017 at 21:20, Sebastian Raschka 
> wrote:
>
>> Hi, Thomas,
>> sorry, I overlooked the regression part …
>> This would be a bit trickier; I am not sure what a good strategy for
>> averaging regression outputs would be. However, if you just want to compute
>> the average, you could do something like
>> np.mean(np.asarray([r.predict(X) for r in list_of_your_mlps]), axis=0)
>>
>> However, it may be better to use stacking, and use the output of
>> r.predict(X) as meta features to train a model based on these?
>>
>
> ​Like to train an SVR to combine the predictions of the top 10%
> MLPRegressors using the same data that were used for training of the
> MLPRegressors? Wouldn't that lead to overfitting?
> ​
>
>
>>
>> Best,
>> Sebastian
>>
>> > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis 
>> wrote:
>> >
>> > Hi Sebastian,
>> >
>> > Thanks, I will try it in another classification problem I have.
>> However, this time I am using regressors not classifiers.
>> >
>> > On Jan 7, 2017 19:28, "Sebastian Raschka"  wrote:
>> > Hi, Thomas,
>> >
>> > the VotingClassifier can combine different models via majority voting
>> amongst their predictions. Unfortunately, it refits the classifiers
>> (after cloning them). I think we implemented it this way to make it
>> compatible with GridSearch and so forth. However, I have a version of the
>> estimator that you can initialize with “refit=False” to avoid refitting,
>> if it helps.
>> http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers
>> >
>> > Best,
>> > Sebastian
>> >
>> >
>> >
>> > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis 
>> wrote:
>> > >
>> > > Greetings,
>> > >
>> > > I have trained many MLPRegressors using different random_state values
>> and estimated the R^2 using cross-validation. Now I want to combine the top
>> 10% of them in order to get more accurate predictions. Is there a
>> meta-estimator that can take as input a few precomputed MLPRegressors and
>> give consensus predictions? Can the BaggingRegressor do this job using
>> MLPRegressors as input?
>> > >
>> > > Thanks in advance for any hint.
>> > > Thomas
>> > >
>> > >
>> > > --
>> > > 
>> ==
>> > > Thomas Evangelidis
>> > > Research Specialist
>> > > CEITEC - Central European Institute of Technology
>> > > Masaryk University
>> > > Kamenice 5/A35/1S081,
>> > > 62500 Brno, Czech Republic
>> > >
>> > > email: tev...@pharm.uoa.gr
>> > >   teva...@gmail.com
>> > >
>> > > website: https://sites.google.com/site/thomasevangelidishomepage/
>> > >
>> > >
>> > > ___
>> > > scikit-learn mailing list
>> > > scikit-learn@python.org
>> > > https://mail.python.org/mailman/listinfo/scikit-learn
>> >
>> > 

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-08 Thread Sebastian Raschka
> Like to train an SVR to combine the predictions of the top 10% MLPRegressors 
> using the same data that were used for training of the MLPRegressors? 
> Wouldn't that lead to overfitting?

It could, but you don't need to use the same data that you used for training to
fit the meta-estimator. As is commonly done in stacking with cross-validation,
you can train the MLPs on training folds and pass the predictions from a
held-out fold to the meta-estimator. But then you'd have to retrain your MLPs,
and it sounded like you wanted to avoid that.

I am currently on mobile and only browsed through the thread briefly, but I
agree with the others that it sounds like your model(s) may have too much
capacity for such a small dataset -- it can be tricky to fit the parameters
without overfitting. In any case, if you want to do the stacking, I'd probably
insert a k-fold CV between the MLPs and the meta-estimator. However, I'd
definitely also recommend simpler models as an alternative.
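
A rough sketch of that fold scheme, with invented names and toy data, might
look like:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import KFold
    from sklearn.neural_network import MLPRegressor
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=200, n_features=20, random_state=0)
    mlps = [MLPRegressor(random_state=i, max_iter=2000) for i in range(3)]

    # Out-of-fold predictions: one column per MLP, filled fold by fold.
    oof = np.zeros((len(y), len(mlps)))
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                     random_state=0).split(X):
        for j, mlp in enumerate(mlps):
            mlp.fit(X[train_idx], y[train_idx])          # retrained per fold
            oof[test_idx, j] = mlp.predict(X[test_idx])  # held-out predictions

    # The meta-estimator only sees predictions on data the MLPs did not fit.
    meta = SVR().fit(oof, y)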

Best,
Sebastian

> On Jan 7, 2017, at 4:36 PM, Thomas Evangelidis  wrote:
> 
> 
> 
>> On 7 January 2017 at 21:20, Sebastian Raschka  wrote:
>> Hi, Thomas,
>> sorry, I overlooked the regression part …
>> This would be a bit trickier; I am not sure what a good strategy for
>> averaging regression outputs would be. However, if you just want to compute
>> the average, you could do something like
>> np.mean(np.asarray([r.predict(X) for r in list_of_your_mlps]), axis=0)
>> 
>> However, it may be better to use stacking, and use the output of 
>> r.predict(X) as meta features to train a model based on these?
> 
> ​Like to train an SVR to combine the predictions of the top 10% MLPRegressors 
> using the same data that were used for training of the MLPRegressors? 
> Wouldn't that lead to overfitting?
> ​ 
>> 
>> Best,
>> Sebastian
>> 
>> > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis  wrote:
>> >
>> > Hi Sebastian,
>> >
>> > Thanks, I will try it in another classification problem I have. However, 
>> > this time I am using regressors not classifiers.
>> >
>> > On Jan 7, 2017 19:28, "Sebastian Raschka"  wrote:
>> > Hi, Thomas,
>> >
>> > the VotingClassifier can combine different models via majority voting
>> > amongst their predictions. Unfortunately, it refits the classifiers
>> > (after cloning them). I think we implemented it this way to make it
>> > compatible with GridSearch and so forth. However, I have a version of the
>> > estimator that you can initialize with “refit=False” to avoid refitting,
>> > if it helps.
>> > http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers
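
For reference, a minimal sketch along the lines of the linked example (the
classifiers and data are arbitrary; "refit" is the parameter name given in that
documentation page):

    from mlxtend.classifier import EnsembleVoteClassifier
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    clf1 = LogisticRegression(random_state=1).fit(X, y)
    clf2 = GaussianNB().fit(X, y)

    # refit=False: the pre-fitted classifiers are used as-is instead of being
    # cloned and refitted; fit() is still called once to set up the ensemble.
    eclf = EnsembleVoteClassifier(clfs=[clf1, clf2], voting='soft', refit=False)
    eclf.fit(X, y)
    print(eclf.predict(X[:5]))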
>> >
>> > Best,
>> > Sebastian
>> >
>> >
>> >
>> > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis  
>> > > wrote:
>> > >
>> > > Greetings,
>> > >
>> > > I have trained many MLPRegressors using different random_state values and
>> > > estimated the R^2 using cross-validation. Now I want to combine the top
>> > > 10% of them in order to get more accurate predictions. Is there a
>> > > meta-estimator that can take as input a few precomputed MLPRegressors and
>> > > give consensus predictions? Can the BaggingRegressor do this job using
>> > > MLPRegressors as input?
>> > >
>> > > Thanks in advance for any hint.
>> > > Thomas
>> > >
>> > >
>> > > --
>> > > ==
>> > > Thomas Evangelidis
>> > > Research Specialist
>> > > CEITEC - Central European Institute of Technology
>> > > Masaryk University
>> > > Kamenice 5/A35/1S081,
>> > > 62500 Brno, Czech Republic
>> > >
>> > > email: tev...@pharm.uoa.gr
>> > >   teva...@gmail.com
>> > >
>> > > website: https://sites.google.com/site/thomasevangelidishomepage/
>> > >
>> > >
>> > > ___
>> > > scikit-learn mailing list
>> > > scikit-learn@python.org
>> > > https://mail.python.org/mailman/listinfo/scikit-learn
>> >
>> > ___
>> > scikit-learn mailing list
>> > scikit-learn@python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>> 
>> ___
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
> 
> 
> 
> -- 
> ==
> Thomas Evangelidis
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081, 
> 62500 Brno, Czech Republic 
> 
> email: tev...@pharm.uoa.gr
>   teva...@gmail.com
> 
> website: https://sites.google.com/site/thomasevangelidishomepage/
> 
> ___
> 

Re: [scikit-learn] Roc curve from multilabel classification has slope

2017-01-08 Thread Roman Yurchak
José, I might be misunderstanding something, but wouldn't it make more
sense to plot one ROC curve for every class in your result (using all
samples at once), as opposed to plotting it for every training sample as
you are doing now? Cf. the example below,

http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
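
Roughly, the per-class version could look like the following sketch (random
placeholder labels and scores stand in for the binarized y and the Keras
predicted probabilities, just so the snippet runs):

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    rng = np.random.RandomState(0)
    y_true = rng.randint(0, 2, size=(100, 5))        # placeholder labels
    y_score = 0.6 * y_true + 0.4 * rng.rand(100, 5)  # placeholder probabilities

    # One ROC curve per class, computed over all samples at once.
    fpr, tpr, roc_auc = {}, {}, {}
    for i in range(y_true.shape[1]):
        fpr[i], tpr[i], _ = roc_curve(y_true[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])
    print(roc_auc)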

Roman

On 08/01/17 01:42, Jacob Schreiber wrote:
> Slope usually means there are ties in your predictions. Check your
> dataset to see if you have repeated predicted values (possibly 1 or 0).
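
(As a quick illustration with made-up scores: tied values collapse into a
single point on the curve, and the straight segment drawn to that point is the
slope.)

    import numpy as np
    from sklearn.metrics import roc_curve

    y_true = np.array([0, 0, 1, 1, 0, 1])
    y_score = np.array([1.0, 0.2, 1.0, 1.0, 1.0, 0.4])  # several ties at 1.0

    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print(fpr, tpr)  # the tied block yields one point, hence a sloped segment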
> 
> On Sat, Jan 7, 2017 at 4:32 PM, José Ismael Fernández Martínez
> > wrote:
> 
> But it is not a scikit-learn classifier, it is a Keras classifier for which,
> in the functional API, predict returns probabilities.
> What I don't understand is why my plot of the ROC curve has a slope,
> since I call roc_curve passing the actual label as y_true and the
> output of the classifier (score probabilities) as y_score for every
> element tested.
> 
> 
> 
> Sent from my iPhone
> On Jan 7, 2017, at 4:04 PM, Joel Nothman  > wrote:
> 
>> predict method should not return probabilities in scikit-learn
>> classifiers. predict_proba should.
>>
>> On 8 January 2017 at 07:52, José Ismael Fernández Martínez
>> >
>> wrote:
>>
>> Hi, I have a multilabel classifier written in Keras from which
>> I want to compute AUC and plot a ROC curve for every element
>> classified from my test set.
>>
>> 
>>
>> Everything seems fine, except that some elements have a ROC
>> curve that has a slope, as follows:
>>
>> [image: ROC curve with a sloped segment]
>>
>> I don't know how to
>> interpret the slope in such cases.
>>
>> Basically my workflow goes as follows: I have a
>> pre-trained Keras model instance, and I have the
>> features X and the binarized labels y. Every element
>> in y is an array of length 1000; as it is a multilabel
>> classification problem, each element in y might contain many
>> 1s, indicating that the element belongs to multiple classes.
>> So I used the built-in binary_crossentropy loss, and the
>> outputs of the model prediction are score probabilities. Then I
>> plot the ROC curve as follows.
>>
>>
>> The predict method returns probabilities, as I'm using the
>> functional API of Keras.
>>
>> Does anyone know why my ROC curves look like this?
>>
>>
>> Ismael
>>
>>
>>
>> Sent from my iPhone
>>
>> ___
>> scikit-learn mailing list
>> scikit-learn@python.org 
>> https://mail.python.org/mailman/listinfo/scikit-learn
>> 
>>
>>
>> ___
>> scikit-learn mailing list
>> scikit-learn@python.org 
>> https://mail.python.org/mailman/listinfo/scikit-learn
>> 
> 
> ___
> scikit-learn mailing list
> scikit-learn@python.org 
> https://mail.python.org/mailman/listinfo/scikit-learn
> 
> 
> 
> 
> 
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
> 

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn