> Once more I want to highlight something I wrote previously but might have
> been overlooked. The resulting MLPRegressors will be applied to new datasets
> that ARE VERY SIMILAR TO THE TRAINING DATA. In other words the application of
> the models will be strictly confined to their applicability domain. Wouldn't
> that be sufficient to not worry about model overfitting too much?
If you have a very small dataset and a very large number of features, I'd
always be careful with (or avoid) models that have a high capacity. However,
it is really hard to answer this question because we don't know much about
your training and evaluation approach. If you didn't do much hyperparameter
tuning and cross-validation for model selection, and if you set aside a
sufficiently large portion as an independent test set that you only looked at
once and got a good performance on it, you may be lucky and a complex MLP may
generalize well. However, as others have said, it is really hard to get an MLP
right (i.e., one that does not just memorize the training data) if n_samples
is small and n_features is large. And for n_features > n_samples, that may be
very, very hard.

> like controlling the alpha parameter for the L2 regularization (maybe setting
> it to a high value?) or the number of neurons in the hidden layers (lowering
> the hidden_layer_sizes?) or even "early_stopping=True"

As a rule of thumb, the higher the capacity, the higher the chance of
overfitting. So yes, this could help a little bit. You probably also want to
try dropout instead of L2 regularization (or in addition to it), which usually
has a stronger regularizing effect, especially if you have a very large set of
redundant features. I can't remember the exact paper, but I read about an
approach where the authors set a max-norm constraint on the weights in
combination with dropout, e.g. ||w||_2 < constant, which worked even better
than dropout alone (the constant becomes another hyperparameter to tune,
though).

Best,
Sebastian
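To make those knobs concrete, here is a minimal sketch using scikit-learn's
MLPRegressor; the synthetic data and the specific values (layer size, alpha,
validation fraction) are placeholders to show the idea, not tuned
recommendations:

    from sklearn.datasets import make_regression
    from sklearn.neural_network import MLPRegressor

    # Synthetic stand-in for a small, wide dataset (n_samples < n_features).
    X, y = make_regression(n_samples=50, n_features=200, noise=10.0,
                           random_state=0)

    mlp = MLPRegressor(
        hidden_layer_sizes=(20,),  # fewer hidden neurons -> lower capacity
        alpha=1.0,                 # much stronger L2 penalty than the 0.0001 default
        early_stopping=True,       # hold out part of the training data ...
        validation_fraction=0.2,   # ... and stop when its score stops improving
        max_iter=2000,
        random_state=0,
    )
    mlp.fit(X, y)

Note that dropout and max-norm weight constraints are not exposed by
MLPRegressor itself; to try those you would have to switch to a library such
as Keras or PyTorch.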
> On Jan 9, 2017, at 1:21 PM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:
>
> Thomas, it can be difficult to fine-tune L1/L2 regularization in the case
> where n_parameters >>> n_samples *and* n_features >> n_samples. If your
> samples are very similar to the training data, why are simpler models not
> working well?
>
> On Sun, Jan 8, 2017 at 8:08 PM, Joel Nothman <joel.noth...@gmail.com> wrote:
>
> Btw, I may have been unclear in the discussion of overfitting. For *training*
> the meta-estimator in stacking, it's standard to do something like
> cross_val_predict on your training set to produce its input features.
>
> On 8 January 2017 at 22:42, Thomas Evangelidis <teva...@gmail.com> wrote:
>
> Sebastian and Jacob,
>
> Regarding overfitting: Lasso, Ridge regression, and ElasticNet have poor
> performance on my data; MLPRegressors are way superior. On another note, the
> MLPRegressor class has some parameters to control overfitting, like the alpha
> parameter for the L2 regularization (maybe setting it to a high value?), the
> number of neurons in the hidden layers (lowering hidden_layer_sizes?), or
> even "early_stopping=True". Wouldn't these be sufficient to be on the safe
> side?
>
> Once more I want to highlight something I wrote previously but might have
> been overlooked. The resulting MLPRegressors will be applied to new datasets
> that ARE VERY SIMILAR TO THE TRAINING DATA. In other words the application of
> the models will be strictly confined to their applicability domain. Wouldn't
> that be sufficient to not worry about model overfitting too much?
>
> On 8 January 2017 at 11:53, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>
>> Like to train an SVR to combine the predictions of the top 10% MLPRegressors
>> using the same data that were used for training of the MLPRegressors?
>> Wouldn't that lead to overfitting?
>
> It could, but you don't need to use the same data that you used for training
> to fit the meta-estimator. As is commonly done in stacking with
> cross-validation, you can train the MLPs on training folds and pass the
> predictions from a test fold to the meta-estimator, but then you would have
> to retrain your MLPs, and it sounded like you wanted to avoid that.
>
> I am currently on mobile and have only browsed through the thread briefly,
> but I agree with the others that your model(s) may have too much capacity
> for such a small dataset -- it can be tricky to fit the parameters without
> overfitting. In any case, if you do want to do the stacking, I'd probably
> insert a k-fold CV between the MLPs and the meta-estimator. However, I'd
> definitely also recommend simpler models as an alternative.
>
> Best,
> Sebastian
>
> On Jan 7, 2017, at 4:36 PM, Thomas Evangelidis <teva...@gmail.com> wrote:
>
>> On 7 January 2017 at 21:20, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>>
>> Hi Thomas,
>>
>> sorry, I overlooked the regression part. This would be a bit trickier; I am
>> not sure what a good strategy for averaging regression outputs would be.
>> However, if you just want to compute the average, you could do something
>> like
>>
>>     np.mean(np.asarray([r.predict(X) for r in list_of_your_mlps]), axis=0)
>>
>> However, it may be better to use stacking, and use the output of
>> r.predict(X) as meta-features to train a model based on these.
>>
>> Like to train an SVR to combine the predictions of the top 10%
>> MLPRegressors using the same data that were used for training of the
>> MLPRegressors? Wouldn't that lead to overfitting?
>>
>> Best,
>> Sebastian
>>
>> > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis <teva...@gmail.com> wrote:
>> >
>> > Hi Sebastian,
>> >
>> > Thanks, I will try it in another classification problem I have. However,
>> > this time I am using regressors, not classifiers.
>> >
>> > On Jan 7, 2017 19:28, "Sebastian Raschka" <se.rasc...@gmail.com> wrote:
>> >
>> > Hi Thomas,
>> >
>> > the VotingClassifier can combine different models per majority voting
>> > amongst their predictions. Unfortunately, it refits the classifiers
>> > (after cloning them); I think we implemented it this way to make it
>> > compatible with GridSearch and so forth. However, I have a version of the
>> > estimator that you can initialize with "refit=False" to avoid refitting,
>> > if that helps:
>> > http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers
>> >
>> > Best,
>> > Sebastian
>> >
>> > > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis <teva...@gmail.com> wrote:
>> > >
>> > > Greetings,
>> > >
>> > > I have trained many MLPRegressors using different random_state values
>> > > and estimated the R^2 using cross-validation. Now I want to combine the
>> > > top 10% of them to get more accurate predictions. Is there a
>> > > meta-estimator that can take as input a few precomputed MLPRegressors
>> > > and give consensus predictions? Can the BaggingRegressor do this job
>> > > using MLPRegressors as input?
>> > >
>> > > Thanks in advance for any hint.
>> > >
>> > > Thomas
>> > >
>> > > --
>> > > ======================================================================
>> > > Thomas Evangelidis
>> > > Research Specialist
>> > > CEITEC - Central European Institute of Technology
>> > > Masaryk University
>> > > Kamenice 5/A35/1S081,
>> > > 62500 Brno, Czech Republic
>> > >
>> > > email: tev...@pharm.uoa.gr
>> > >        teva...@gmail.com
>> > >
>> > > website: https://sites.google.com/site/thomasevangelidishomepage/
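To tie the suggestions in this thread together, here is a minimal sketch of
the cross-validated stacking that Joel and Sebastian describe above:
out-of-fold predictions from a few MLPRegressors serve as meta-features for an
SVR. The synthetic data, the number of base MLPs, and all hyperparameter
values are placeholders, and note that cross_val_predict does refit the base
MLPs on each fold:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import cross_val_predict
    from sklearn.neural_network import MLPRegressor
    from sklearn.svm import SVR

    # Synthetic stand-in for the real training data.
    X, y = make_regression(n_samples=100, n_features=50, noise=10.0,
                           random_state=0)

    # Base regressors -- here simply MLPs that differ in their random seed.
    base_mlps = [MLPRegressor(hidden_layer_sizes=(20,), alpha=1.0,
                              max_iter=2000, random_state=seed)
                 for seed in (0, 1, 2)]

    # Out-of-fold predictions: each column holds one MLP's cross-validated
    # predictions, so the meta-estimator never sees a prediction made on data
    # that the MLP was fitted on.
    meta_features = np.column_stack(
        [cross_val_predict(mlp, X, y, cv=5) for mlp in base_mlps])

    # Meta-estimator trained on the stacked out-of-fold predictions.
    meta = SVR(kernel="rbf", C=1.0)
    meta.fit(meta_features, y)

    # For new data: refit the base MLPs on the full training set, stack their
    # predictions, and let the SVR produce the consensus prediction.
    for mlp in base_mlps:
        mlp.fit(X, y)
    X_new = X[:5]  # placeholder for genuinely new, similar samples
    consensus = meta.predict(
        np.column_stack([mlp.predict(X_new) for mlp in base_mlps]))

If refitting the MLPs is off the table, the simple averaging quoted above
(np.mean over the fitted regressors' predictions along axis 0) is the
fallback, at the cost of not learning how to weight the individual models.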
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn