We do issue a convergence warning. Can you check n_iter_ to be sure that the solver actually converged to the stated tolerance, and did not simply stop because it hit max_iter?
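In case it helps, here is a minimal sketch of how one can make any ConvergenceWarning explicit and inspect n_iter_ (useful because, as discussed further down the thread, saga may not always raise the warning). The dataset and parameter values below are only illustrative, not taken from Ben's script:

import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Illustrative data only; substitute your own X, y.
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_redundant=0, random_state=2)

clf = LogisticRegression(solver='saga', penalty='none',
                         max_iter=1000, tol=1e-6)

# Record any ConvergenceWarning emitted during fit instead of letting it
# pass silently.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    clf.fit(X, y)

if any(issubclass(w.category, ConvergenceWarning) for w in caught):
    print("fit raised a ConvergenceWarning")

# If n_iter_ equals max_iter, the solver most likely stopped before
# reaching the requested tolerance.
print("n_iter_:", clf.n_iter_, "max_iter:", clf.max_iter)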
On Wed, 8 Jan 2020 at 20:53, Benoît Presles <benoit.pres...@u-bourgogne.fr> wrote:

> Dear sklearn users,
>
> I still have some issues concerning logistic regression.
> I compared sklearn with three different solvers (lbfgs, saga, liblinear)
> against statsmodels on the same simulated data.
>
> When everything goes well, I get the same results between lbfgs, saga,
> liblinear and statsmodels. When everything goes wrong, all the results
> are different.
>
> In fact, when everything goes wrong, statsmodels gives me a convergence
> warning (Warning: Maximum number of iterations has been exceeded.
> Current function value: inf  Iterations: 20000) plus an error
> (numpy.linalg.LinAlgError: Singular matrix).
>
> Why doesn't sklearn tell me anything? How can I know that I have
> convergence issues with sklearn?
>
> Thanks for your help,
> Best regards,
> Ben
>
> --------------------------------------------
>
> Here is the code I used to generate synthetic data:
>
> from sklearn.datasets import make_classification
> from sklearn.model_selection import StratifiedShuffleSplit
> from sklearn.preprocessing import StandardScaler
> from sklearn.linear_model import LogisticRegression
> import statsmodels.api as sm
> #
> RANDOM_SEED = 2
> #
> X_sim, y_sim = make_classification(n_samples=200,
>                                    n_features=20,
>                                    n_informative=10,
>                                    n_redundant=0,
>                                    n_repeated=0,
>                                    n_classes=2,
>                                    n_clusters_per_class=1,
>                                    random_state=RANDOM_SEED,
>                                    shuffle=False)
> #
> sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>                              random_state=RANDOM_SEED)
> for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>     X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
>     y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
>     ss = StandardScaler()
>     X_split_train = ss.fit_transform(X_split_train)
>     X_split_test = ss.transform(X_split_test)
>     #
>     classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                           verbose=0, random_state=RANDOM_SEED,
>                                           C=1e9, solver='lbfgs', penalty='none',
>                                           tol=1e-6)
>     classifier_lbfgs.fit(X_split_train, y_split_train)
>     print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
>     print(classifier_lbfgs.intercept_)
>     print(classifier_lbfgs.coef_)
>     #
>     classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                          verbose=0, random_state=RANDOM_SEED,
>                                          C=1e9, solver='saga', penalty='none',
>                                          tol=1e-6)
>     classifier_saga.fit(X_split_train, y_split_train)
>     print('classifier saga iter:', classifier_saga.n_iter_)
>     print(classifier_saga.intercept_)
>     print(classifier_saga.coef_)
>     #
>     classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                               verbose=0, random_state=RANDOM_SEED,
>                                               C=1e9, solver='liblinear', penalty='l2',
>                                               tol=1e-6)
>     classifier_liblinear.fit(X_split_train, y_split_train)
>     print('classifier liblinear iter:', classifier_liblinear.n_iter_)
>     print(classifier_liblinear.intercept_)
>     print(classifier_liblinear.coef_)
>     # statsmodels
>     logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
>     logit_res = logit.fit(maxiter=20000)
>     print("Coef statsmodels")
>     print(logit_res.params)
>
>
> On 11/10/2019 15:42, Andreas Mueller wrote:
>
> On 10/10/19 1:14 PM, Benoît Presles wrote:
>
> Thanks for your answers.
> On my real data, I do not have so many samples. I have a bit more than
> 200 samples in total, and I would also like to get some results with
> unpenalized logistic regression.
> What do you suggest? Should I switch to the lbfgs solver?
>
> Yes.
>
> Am I sure that with this solver I will not have any convergence issues
> and will always get the correct result? Indeed, I did not get any
> convergence warning with saga, so I thought everything was fine. I only
> noticed issues when I decided to test several solvers. Without comparing
> the results across solvers, how can I be sure that the optimisation went
> well? Shouldn't scikit-learn warn the user somehow if that is not the case?
>
> We should attempt to warn in the SAGA solver if it doesn't converge.
> That it doesn't raise a convergence warning should probably be
> considered a bug. It uses the maximum weight change as a stopping
> criterion right now. We could probably compute the dual objective once
> at the end to see whether we converged, right? Or is that not possible
> with SAGA? If not, we might want to caution that no convergence warning
> will be raised.
>
> At last, I was using saga because I also wanted to do some feature
> selection by using the l1 penalty, which is not supported by lbfgs...
>
> You can use liblinear then.
>
> Best regards,
> Ben
>
> On 09/10/2019 at 23:39, Guillaume Lemaître wrote:
>
> Oops, I did not see Roman's answer. Sorry about that. It comes back to
> the same conclusion :)
>
> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître <g.lemaitr...@gmail.com> wrote:
>
>> Uhm, actually increasing to 10000 samples solves the convergence issue.
>> Most probably SAGA is not designed to work with such a small sample size.
>>
>> On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître <g.lemaitr...@gmail.com> wrote:
>>
>>> I slightly changed the benchmark so that it uses a pipeline, and I
>>> plotted the coefficients:
>>>
>>> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>>>
>>> I only see one of the 10 splits where SAGA is not converging; otherwise
>>> the coefficients look very close (I don't attach the figure here but
>>> they can be plotted using the snippet).
>>> So apart from this second split, the other differences seem to be
>>> numerical instability.
>>>
>>> Where I have some concern is regarding the convergence rate of SAGA,
>>> but I have no intuition to know whether this is normal or not.
>>>
>>> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak <rth.yurc...@gmail.com> wrote:
>>>
>>>> Ben,
>>>>
>>>> I can confirm your results with penalty='none' and C=1e9. In both
>>>> cases, you are running a mostly unpenalized logistic regression.
>>>> Usually that's less numerically stable than with a small
>>>> regularization, depending on the data collinearity.
>>>>
>>>> Running that same code with
>>>>  - a larger penalty (smaller C values)
>>>>  - or a larger number of samples
>>>> yields for me the same coefficients (up to some tolerance).
>>>>
>>>> You can also see that SAGA convergence is not good from the fact that
>>>> it needs 196000 epochs/iterations to converge.
>>>>
>>>> Actually, I have often seen convergence issues with SAG on small
>>>> datasets (in unit tests), not fully sure why.
>>>>
>>>> --
>>>> Roman
>>>>
>>>> On 09/10/2019 22:10, serafim loukas wrote:
>>>> > The predictions across solvers are exactly the same when I run the
>>>> > code. I am using version 0.21.3. What is yours?
>>>> >
>>>> > In [13]: import sklearn
>>>> >
>>>> > In [14]: sklearn.__version__
>>>> > Out[14]: '0.21.3'
>>>> >
>>>> > Serafeim
>>>> >
>>>> >> On 9 Oct 2019, at 21:44, Benoît Presles
>>>> >> <benoit.pres...@u-bourgogne.fr> wrote:
>>>> >>
>>>> >> (y_pred_lbfgs == y_pred_saga).all() == False

--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
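To illustrate the two suggestions quoted above, keeping a small amount of l2 regularization (a smaller C) for numerical stability, and using liblinear with an l1 penalty for feature selection, here is a minimal sketch. The data and the C values are illustrative assumptions, not recommendations from the thread:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data, similar in size to the simulated example above.
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_redundant=0, random_state=2)

# Lightly l2-penalized fit: a much smaller C than 1e9 keeps the problem
# well-conditioned, so the solvers tend to agree.
penalized = make_pipeline(StandardScaler(),
                          LogisticRegression(solver='lbfgs', penalty='l2',
                                             C=1.0, max_iter=10000))
penalized.fit(X, y)

# l1-penalized fit with liblinear, which drives some coefficients to
# exactly zero and can therefore be used for feature selection.
sparse = make_pipeline(StandardScaler(),
                       LogisticRegression(solver='liblinear', penalty='l1',
                                          C=1.0, max_iter=10000))
sparse.fit(X, y)
print("non-zero coefficients:",
      (sparse.named_steps['logisticregression'].coef_ != 0).sum())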