Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-16 Thread Maciek Wójcikowski
Hi Thomas, An example of such a "dummy" meta-regressor can be seen in NNScore, which is a protein-ligand scoring function (one of Sebastian's suggestions). A meta-class is implemented in the Open Drug Discovery Toolkit [here: https://github.com/oddt/oddt/blob/master/oddt/scoring/__init__.py#L200], along with …
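
For readers who don't want to dig through the ODDT source, a minimal sketch of such an averaging meta-regressor (the class and attribute names here are illustrative, not ODDT's actual API):

    import numpy as np
    from sklearn.base import BaseEstimator, RegressorMixin

    class MeanRegressor(BaseEstimator, RegressorMixin):
        """Average the predictions of already-fitted base regressors."""
        def __init__(self, fitted_regressors):
            self.fitted_regressors = fitted_regressors

        def fit(self, X, y=None):
            # The base regressors are assumed to be pre-fitted,
            # so there is nothing to train here.
            return self

        def predict(self, X):
            # Unweighted mean over the base models, one value per sample.
            return np.mean([r.predict(X) for r in self.fitted_regressors],
                           axis=0)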

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-11 Thread Sebastian Raschka
Hi, Thomas, I was just reading through a recent preprint (Protein-Ligand Scoring with Convolutional Neural Networks, https://arxiv.org/abs/1612.02751), and I thought it may be related to your task and maybe interesting or even useful for your work. Also check out references 13, 21, 22, and 24 …

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-10 Thread Thomas Evangelidis
Stuart, I didn't see LASSO performing well, especially with the second type of data. The alpha parameter probably needs adjustment, e.g. with LassoCV. I don't know if you have read my previous messages on this thread, so I quote my settings for MLPRegressor again: MLPRegressor(random_state=random_state, …
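
A minimal sketch of the alpha adjustment Thomas mentions, on synthetic stand-in data of the same shape as described in the thread (35 samples, 60 features):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoCV

    # Synthetic stand-in for the 35-sample, 60-feature data in the thread.
    X, y = make_regression(n_samples=35, n_features=60, noise=0.5,
                           random_state=0)

    # Search a log-spaced alpha grid by cross-validation instead of
    # relying on Lasso's default alpha=1.0.
    lasso = LassoCV(alphas=np.logspace(-4, 1, 50), cv=5).fit(X, y)
    print("selected alpha:", lasso.alpha_)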

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-10 Thread Stuart Reynolds
Thomas, Jacob's point is important -- it's not the number of features that matters, it's the number of free parameters. As the number of free parameters increases, the space of representable functions grows to the point where the cost function is minimized by having a single parameter explain each …
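
Stuart's point is easy to check by counting weights and biases; for the architecture discussed elsewhere in the thread (600 features, one hidden layer of 10 neurons, one output):

    def mlp_param_count(n_features, hidden_layer_sizes, n_outputs=1):
        """Count weights + biases in a fully connected MLP."""
        sizes = [n_features] + list(hidden_layer_sizes) + [n_outputs]
        # Each layer contributes (inputs + 1 bias) * outputs parameters.
        return sum((a + 1) * b for a, b in zip(sizes[:-1], sizes[1:]))

    print(mlp_param_count(600, (10,)))  # 6021 free parameters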

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-10 Thread Thomas Evangelidis
Jacob, the features are not 6000. I train 2 MLPRegressors on two types of data; both refer to the same dataset (35 molecules in total), but each contains a different type of information. The first dataset consists of 60 features. I tried 100 different random states and measured the average |R| using …
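
A sketch of how such a mean |R| over many random-state runs could be computed (the data here are dummy stand-ins, not Thomas's):

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    y_true = rng.normal(size=35)
    # One prediction vector per random_state run (dummy stand-ins).
    preds_per_run = [y_true + rng.normal(scale=0.5, size=35)
                     for _ in range(100)]

    # pearsonr returns (r, p-value); average the absolute correlations.
    abs_r = [abs(pearsonr(y_true, y_pred)[0]) for y_pred in preds_per_run]
    print("mean |R|:", np.mean(abs_r))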

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-09 Thread Jacob Schreiber
Even with a single hidden layer of 10 neurons you're still trying to train over 6000 parameters using ~30 samples. Dropout is a common concept in neural networks, but it doesn't appear to be in sklearn's implementation of MLPs. Early stopping based on validation performance isn't an "extra" step for reducing …
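
sklearn's MLPs do expose the validation-based early stopping Jacob mentions directly; a minimal sketch:

    from sklearn.neural_network import MLPRegressor

    # Hold out 20% of the training data internally and stop training
    # when the validation score stops improving.
    mlp = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                       early_stopping=True, validation_fraction=0.2,
                       random_state=0)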

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-09 Thread Stuart Reynolds
If you don't have a large dataset, you can still do leave-one-out cross-validation. On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis wrote: > Jacob & Sebastian, I think the best way to find out if my modeling approach works is to find a larger dataset, split it into two parts; the first …
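
A sketch of leave-one-out evaluation on synthetic data of the size discussed in the thread; note that R^2 is undefined on a single-sample test fold, so an error metric is used instead:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    X, y = make_regression(n_samples=35, n_features=60, random_state=0)

    # One fold per sample: each fit uses all but one observation.
    scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_absolute_error")
    print("mean absolute error:", -scores.mean())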

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-09 Thread Thomas Evangelidis
Jacob & Sebastian, I think the best way to find out if my modeling approach works is to find a larger dataset, split it into two parts, the first one will be used as training/cross-validation set and the second as a test set, like in a real case scenario. Regarding the MLPRegressor regularization

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-09 Thread Sebastian Raschka
> Once more I want to highlight something I wrote previously that might have been overlooked. The resulting MLPRegressors will be applied to new datasets that ARE VERY SIMILAR TO THE TRAINING DATA. In other words, the application of the models will be strictly confined to their applicability domain …

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-09 Thread Jacob Schreiber
Thomas, it can be difficult to fine-tune L1/L2 regularization in the case where n_parameters >>> n_samples ~and~ n_features >> n_samples. If your new samples are very similar to the training data, why are simpler models not working well? On Sun, Jan 8, 2017 at 8:08 PM, Joel Nothman wrote: > Btw, I …

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-08 Thread Joel Nothman
Btw, I may have been unclear in the discussion of overfitting. For *training* the meta-estimator in stacking, it's standard to do something like cross_val_predict on your training set to produce its input features. On 8 January 2017 at 22:42, Thomas Evangelidis wrote: > Sebastian and Jacob, …
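
A minimal sketch of Joel's suggestion, with synthetic data and three base MLPs (the counts and sizes are illustrative):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import cross_val_predict
    from sklearn.neural_network import MLPRegressor
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=40, n_features=60, random_state=0)

    # Out-of-fold predictions from each base model become the
    # meta-estimator's input features, so it never trains on
    # predictions the base models made for their own training folds.
    base = [MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                         random_state=s) for s in range(3)]
    meta_X = np.column_stack([cross_val_predict(m, X, y, cv=5)
                              for m in base])
    meta = SVR().fit(meta_X, y)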

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-08 Thread Thomas Evangelidis
Sebastian and Jacob, Regarding overfitting: Lasso, Ridge regression and ElasticNet all perform poorly on my data; MLPRegressors are way superior. On another note, the MLPRegressor class has some parameters to control overfitting, like the alpha parameter for L2 regularization (maybe set …
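
A sketch of tuning that alpha by cross-validated grid search (grid values and data are illustrative):

    from sklearn.datasets import make_regression
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=40, n_features=60, random_state=0)

    # Try a range of L2 penalties instead of the default alpha=0.0001.
    grid = GridSearchCV(
        MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                     random_state=0),
        param_grid={"alpha": [1e-4, 1e-3, 1e-2, 1e-1, 1.0]},
        cv=5)
    grid.fit(X, y)
    print("best alpha:", grid.best_params_["alpha"])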

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-08 Thread Sebastian Raschka
> Like to train an SVR to combine the predictions of the top 10% of MLPRegressors, using the same data that were used for training the MLPRegressors? Wouldn't that lead to overfitting? It could, but you don't need to use the same data that you used for training to fit the meta-estimator. Like …
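
A sketch of Sebastian's alternative: fit the base MLPs on one part of the data and the SVR meta-estimator on their predictions for a part they never saw (all sizes illustrative):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=40, n_features=60, random_state=0)
    X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.3,
                                                      random_state=0)

    mlps = [MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                         random_state=s).fit(X_base, y_base)
            for s in range(3)]

    # The meta-estimator sees only predictions for held-out samples.
    Z = np.column_stack([m.predict(X_meta) for m in mlps])
    svr = SVR().fit(Z, y_meta)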

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Jacob Schreiber
This is an aside to your original question, but as someone who has dealt with similar data in bioinformatics (gene expression, specifically), I think you should tread -very- carefully if you have such a small sample set and more dimensions than samples. MLPs are already prone to overfit and …

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Thomas Evangelidis
On 8 January 2017 at 00:04, Jacob Schreiber wrote: > If you have such a small number of observations (with a much higher-dimensional feature space), then why do you think you can accurately train not just a single MLP, but an ensemble of them, without overfitting dramatically? Because the observations …

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Jacob Schreiber
If you have such a small number of observations (with a much higher-dimensional feature space), then why do you think you can accurately train not just a single MLP, but an ensemble of them, without overfitting dramatically? On Sat, Jan 7, 2017 at 2:26 PM, Thomas Evangelidis wrote: > Regarding the evaluation, …

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Thomas Evangelidis
Regarding the evaluation, I use the leave-20%-out cross-validation method. I cannot leave more out because my datasets are very small, between 30 and 40 observations, each with 600 features. Is there a limit to the number of MLPRegressors I can combine with stacking, considering my small datasets …
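
A sketch of that repeated leave-20%-out scheme; with only 30-40 observations, repeating the random split many times stabilizes the estimate:

    from sklearn.datasets import make_regression
    from sklearn.model_selection import ShuffleSplit, cross_val_score
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=35, n_features=600, random_state=0)

    # 50 random 80/20 train/test splits of the same small dataset.
    cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
    scores = cross_val_score(
        MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                     random_state=0),
        X, y, cv=cv, scoring="r2")
    print("mean R^2:", scores.mean())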

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Joel Nothman
> There is no problem, in general, with overfitting, as long as your evaluation of an estimator's performance isn't biased towards the training set. We've not talked about evaluation. …

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Joel Nothman
On 8 January 2017 at 08:36, Thomas Evangelidis wrote: > On 7 January 2017 at 21:20, Sebastian Raschka wrote: >> Hi, Thomas, sorry, I overlooked the regression part … This would be a bit trickier; I am not sure what a good strategy for averaging regression outputs would be. However …

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Thomas Evangelidis
On 7 January 2017 at 21:20, Sebastian Raschka wrote: > Hi, Thomas, sorry, I overlooked the regression part … This would be a bit trickier; I am not sure what a good strategy for averaging regression outputs would be. However, if you just want to compute the average, you could do something like …

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Sebastian Raschka
Hi, Thomas, sorry, I overlooked the regression part … This would be a bit trickier; I am not sure what a good strategy for averaging regression outputs would be. However, if you just want to compute the average, you could do something like np.mean(np.asarray([r.predict(X) for r in list_of_your_mlps]), axis=0) …
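
Sebastian's one-liner, expanded into a runnable sketch on synthetic data; note the axis=0, without which np.mean would collapse all predictions to a single scalar:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=40, n_features=60, random_state=0)
    list_of_your_mlps = [MLPRegressor(hidden_layer_sizes=(10,),
                                      max_iter=2000,
                                      random_state=s).fit(X, y)
                         for s in range(5)]

    # axis=0 averages across models, keeping one prediction per sample.
    y_pred = np.mean(np.asarray([r.predict(X) for r in list_of_your_mlps]),
                     axis=0)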

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Thomas Evangelidis
Hi Sebastian, Thanks, I will try it on another classification problem I have. However, this time I am using regressors, not classifiers. On Jan 7, 2017 19:28, "Sebastian Raschka" wrote: > Hi, Thomas, > the VotingClassifier can combine different models per majority voting amongst their predictions …

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Sebastian Raschka
Hi, Thomas, the VotingClassifier can combine different models by majority voting amongst their predictions. Unfortunately, it refits the classifiers (after cloning them). I think we implemented it this way to make it compatible with GridSearch and so forth. However, I have a version of the …
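
The refitting is also easy to sidestep outside the class; a minimal sketch of hard majority voting over already-fitted classifiers (this is not the code Sebastian refers to, just one way to do it):

    import numpy as np
    from scipy import stats

    def majority_vote(fitted_classifiers, X):
        """Hard majority vote over already-fitted classifiers, no refits."""
        preds = np.asarray([clf.predict(X) for clf in fitted_classifiers])
        # Most common label per sample, taken across the classifier axis.
        return stats.mode(preds, axis=0).mode.ravel()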

[scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Thomas Evangelidis
Greetings, I have trained many MLPRegressors using different random_state values and estimated the R^2 using cross-validation. Now I want to combine the top 10% of them in order to get more accurate predictions. Is there a meta-estimator that can take as input a few precomputed MLPRegressors and give …
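
A sketch of the selection step Thomas describes, on synthetic data (the combination step is what the replies above discuss):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=40, n_features=60, random_state=0)

    # Train 100 MLPs differing only in random_state, score each by
    # cross-validated R^2, and keep the best 10 for later combination.
    mlps = [MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                         random_state=s) for s in range(100)]
    cv_r2 = [cross_val_score(m, X, y, cv=5, scoring="r2").mean()
             for m in mlps]
    top10 = [mlps[i].fit(X, y) for i in np.argsort(cv_r2)[-10:]]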