On Wed, Jul 25, 2018 at 12:36:55PM +0200, Benoît Presles wrote:
> Do you think the problems I have can come from correlated features? Indeed,
> in my dataset I have some highly correlated features.

Yes, in general selecting features conditionally on others is very hard
when features are highly correlated.

> Do you think this could explain why I don't get reproducible and consistent
> results?

Yes.

> Thanks for your help,
> Ben

> On 24/07/2018 at 23:44, bthirion wrote:
> > Univariate screening is somewhat hackish too, but much more stable --
> > and cheap.
> > Best,
> > Bertrand

> > On 24/07/2018 23:33, Benoît Presles wrote:
> > > So you think that I cannot get reproducible and consistent results
> > > with this method?
> > > If you would avoid RFE, which method do you suggest to find the best
> > > features?
> > > Ben

> > > On 24/07/2018 at 21:34, Gael Varoquaux wrote:
> > > > On Tue, Jul 24, 2018 at 08:43:27PM +0200, Benoît Presles wrote:
> > > > > 3. With C=1, it seems that I have the same results at each run for all
> > > > > solvers (liblinear, sag and saga), however the ranking is not the same
> > > > > between the solvers.

> > > > Your problem is probably ill-conditioned, hence the specific weights on
> > > > the features are not stable. There isn't a good answer to ordering
> > > > features, they are degenerate.

> > > > In general, I would avoid RFE, it is a hack, and can easily lead to
> > > > these problems.

> > > > Gaël

> > > > > Thanks for your help,
> > > > > Ben

> > > > > PS1: I checked and n_iter_ seems to be always lower than max_iter.
> > > > > PS2: my data is scaled, I am using "StandardScaler".

> > > > > On 24/07/2018 at 20:33, Andreas Mueller wrote:
> > > > > > On 07/24/2018 02:07 PM, Benoît Presles wrote:
> > > > > > > I did the same tests as before adding fit_intercept=False and:
> > > > > > > 1. I have got the same problem as before, i.e. when I execute the
> > > > > > > RFE multiple times I don't get the same ranking each time.
> > > > > > > 2. When I change the solver to 'sag'
> > > > > > > (classifier_RFE=LogisticRegression(C=1e9, verbose=1,
> > > > > > > max_iter=10000, fit_intercept=False, solver='sag')), it seems
> > > > > > > that I get the same ranking at each run. This is not the case
> > > > > > > with the 'saga' solver.
> > > > > > > The ranking is not the same between the solvers.
> > > > > > > 3. With C=1, it seems that I have the same results at each run for
> > > > > > > all solvers (liblinear, sag and saga), however the ranking is not
> > > > > > > the same between the solvers.
> > > > > > > How can I get reproducible and consistent results?

> > > > > > Did you scale your data? If not, saga and sag will basically fail.

> > > > > > _______________________________________________
> > > > > > scikit-learn mailing list
> > > > > > scikit-learn@python.org
> > > > > > https://mail.python.org/mailman/listinfo/scikit-learn

-- 
    Gael Varoquaux
    Senior Researcher, INRIA Parietal
    NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
    Phone: ++ 33-1-69-08-79-68
    http://gael-varoquaux.info
    http://twitter.com/GaelVaroquaux
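
For readers who want to try the univariate screening suggested in this thread, here is a minimal sketch, not code from any of the posters. It assumes a synthetic dataset from make_classification as a stand-in for Ben's data, uses the ANOVA F-score (f_classif) as the univariate statistic, and picks an arbitrary correlation threshold of 0.9 and k=5; all of these are illustrative choices. It ranks features one at a time, which involves no iterative solver, and also lists highly correlated feature pairs, since those are the ones whose relative order should not be over-interpreted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (an assumption for illustration).
# n_redundant adds linear combinations of the informative features,
# so some columns are correlated with each other.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=4,
    random_state=0,
)
X = StandardScaler().fit_transform(X)


def univariate_ranking(X, y):
    """Return feature indices ordered from highest to lowest ANOVA F-score."""
    scores, _pvalues = f_classif(X, y)
    return np.argsort(-scores, kind="stable")


# The score of each feature is computed independently of the others,
# so repeated calls on the same data give the same ranking.
ranking_a = univariate_ranking(X, y)
ranking_b = univariate_ranking(X, y)

# Feature pairs whose absolute Pearson correlation exceeds the
# (arbitrary) threshold of 0.9.
corr = np.corrcoef(X, rowvar=False)
rows, cols = np.triu_indices_from(corr, k=1)
mask = np.abs(corr[rows, cols]) > 0.9
correlated_pairs = list(zip(rows[mask].tolist(), cols[mask].tolist()))

# The same screening through the scikit-learn selector API (k is arbitrary).
selector = SelectKBest(f_classif, k=5).fit(X, y)
selected = selector.get_support(indices=True)

print("ranking:", ranking_a.tolist())
print("highly correlated pairs:", correlated_pairs)
print("selected by SelectKBest:", selected.tolist())
```

Univariate scores ignore interactions between features, so this trades some modelling power for a ranking that does not depend on the solver or on the conditioning of the problem.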