Re: [scikit-learn] logistic regression results are not stable between solvers

Andreas Mueller Wed, 08 Jan 2020 12:56:15 -0800

Hi Ben.

Liblinear and l-bfgs might both converge but to different solutions,given that the intercept is penalized.There is also problems with ill-conditioned problems that are hard todetect.My impression of SAGA was that the convergence checks are too loose andwe should improve them.Have you checked the objective of the l-bfgs and liblinear solvers? Withill-conditioned data the objectives could be similar with differentsolutions.

It's not intended for scikit-learn to warn about ill-conditionedproblems, I think, only convergence issues.


Hth,
Andy


On 1/8/20 3:31 PM, Benoît Presles wrote:

With lbfgs n_iter_ = 48, with saga n_iter_ = 326581, with liblinearn_iter_ = 64.



On 08/01/2020 21:18, Guillaume Lemaître wrote:

We issue convergence warning. Can you check n_iter to be sure thatyou did not convergence to the stated convergence?

On Wed, 8 Jan 2020 at 20:53, Benoît Presles<[email protected]<mailto:[email protected]>> wrote:


    Dear sklearn users,

    I still have some issues concerning logistic regression.
    I did compare on the same data (simulated data) sklearn with
    three different solvers (lbfgs, saga, liblinear) and statsmodels.

    When everything goes well, I get the same results between lbfgs,
    saga, liblinear and statsmodels. When everything goes wrong, all
    the results are different.

    In fact, when everything goes wrong, statsmodels gives me a
    convergence warning (Warning: Maximum number of iterations has
    been exceeded. Current function value: inf Iterations: 20000) +
    an error (numpy.linalg.LinAlgError: Singular matrix).

    Why sklearn does not tell me anything? How can I know that I have
    convergence issues with sklearn?


    Thanks for your help,
    Best regards,
    Ben

    --------------------------------------------

    Here is the code I used to generate synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedShuffleSplit
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    import statsmodels.api as sm
    #
    RANDOM_SEED = 2
    #
    X_sim, y_sim = make_classification(n_samples=200,
                               n_features=20,
                               n_informative=10,
                               n_redundant=0,
                               n_repeated=0,
                               n_classes=2,
                               n_clusters_per_class=1,
                               random_state=RANDOM_SEED,
                               shuffle=False)
    #
    sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
    random_state=RANDOM_SEED)
    for train_index_split, test_index_split in sss.split(X_sim, y_sim):
        X_split_train, X_split_test = X_sim[train_index_split],
    X_sim[test_index_split]
        y_split_train, y_split_test = y_sim[train_index_split],
    y_sim[test_index_split]
        ss = StandardScaler()
        X_split_train = ss.fit_transform(X_split_train)
        X_split_test = ss.transform(X_split_test)
        #
        classifier_lbfgs = LogisticRegression(fit_intercept=True,
    max_iter=20000000, verbose=0, random_state=RANDOM_SEED, C=1e9,
                                        solver='lbfgs',
    penalty='none', tol=1e-6)
        classifier_lbfgs.fit(X_split_train, y_split_train)
        print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
        print(classifier_lbfgs.intercept_)
        print(classifier_lbfgs.coef_)
        #
        classifier_saga = LogisticRegression(fit_intercept=True,
    max_iter=20000000, verbose=0, random_state=RANDOM_SEED, C=1e9,
                                        solver='saga',
    penalty='none', tol=1e-6)
        classifier_saga.fit(X_split_train, y_split_train)
        print('classifier saga iter:', classifier_saga.n_iter_)
        print(classifier_saga.intercept_)
        print(classifier_saga.coef_)
        #
        classifier_liblinear = LogisticRegression(fit_intercept=True,
    max_iter=20000000, verbose=0, random_state=RANDOM_SEED,
                                             C=1e9,
    solver='liblinear', penalty='l2', tol=1e-6)
        classifier_liblinear.fit(X_split_train, y_split_train)
        print('classifier liblinear iter:', classifier_liblinear.n_iter_)
        print(classifier_liblinear.intercept_)
        print(classifier_liblinear.coef_)
        # statsmodels
        logit = sm.Logit(y_split_train,
    sm.tools.add_constant(X_split_train))
        logit_res = logit.fit(maxiter=20000)
        print("Coef statsmodels")
        print(logit_res.params)



    On 11/10/2019 15:42, Andreas Mueller wrote:



    On 10/10/19 1:14 PM, Benoît Presles wrote:


    Thanks for your answers.

    On my real data, I do not have so many samples. I have a bit
    more than 200 samples in total and I also would like to get
    some results with unpenalized logisitic regression.
    What do you suggest? Should I switch to the lbfgs solver?

    Yes.

    Am I sure that with this solver I will not have any convergence
    issue and always get the good result? Indeed, I did not get any
    convergence warning with saga, so I thought everything was
    fine. I noticed some issues only when I decided to test several
    solvers. Without comparing the results across solvers, how to
    be sure that the optimisation goes well? Shouldn't scikit-learn
    warn the user somehow if it is not the case?

    We should attempt to warn in the SAGA solver if it doesn't
    converge. That it doesn't raise a convergence warning should
    probably be considered a bug.
    It uses the maximum weight change as a stopping criterion right now.
    We could probably compute the dual objective once in the end to
    see if we converged, right? Or is that not possible with SAGA?
    If not, we might want to caution that no convergence warning
    will be raised.


    At last, I was using saga because I also wanted to do some
    feature selection by using l1 penalty which is not supported by
    lbfgs...

    You can use liblinear then.


    Best regards,
    Ben


    Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :

    Ups I did not see the answer of Roman. Sorry about that. It is
    coming back to the same conclusion :)

    On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître
    <[email protected] <mailto:[email protected]>> wrote:

        Uhm actually increasing to 10000 samples solve the
        convergence issue.
        SAGA is not designed to work with a so small sample size
        most probably.

        On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître
        <[email protected] <mailto:[email protected]>>
        wrote:

            I slightly change the bench such that it uses pipeline
            and plotted the coefficient:

            https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386

            I only see one of the 10 splits where SAGA is not
            converging, otherwise the coefficients
            look very close (I don't attach the figure here but
            they can be plotted using the snippet).
            So apart from this second split, the other differences
            seems to be numerical instability.

            Where I have some concern is regarding the convergence
            rate of SAGA but I have no
            intuition to know if this is normal or not.

            On Wed, 9 Oct 2019 at 23:22, Roman Yurchak
            <[email protected] <mailto:[email protected]>>
            wrote:

                Ben,

                I can confirm your results with penalty='none' and
                C=1e9. In both cases,
                you are running a mostly unpenalized logisitic
                regression. Usually
                that's less numerically stable than with a small
                regularization,
                depending on the data collinearity.

                Running that same code with
                  - larger penalty ( smaller C values)
                  - or larger number of samples
                  yields for me the same coefficients (up to some
                tolerance).

                You can also see that SAGA convergence is not good
                by the fact that it
                needs 196000 epochs/iterations to converge.

                Actually, I have often seen convergence issues
                with SAG on small
                datasets (in unit tests), not fully sure why.

--Roman


                On 09/10/2019 22:10, serafim loukas wrote:
                > The predictions across solver are exactly the
                same when I run the code.
                > I am using 0.21.3 version. What is yours?
                >
                >
                > In [13]: import sklearn
                >
                > In [14]: sklearn.__version__
                > Out[14]: '0.21.3'
                >
                >
                > Serafeim
                >
                >
                >
                >> On 9 Oct 2019, at 21:44, Benoît Presles
                <[email protected]
                <mailto:[email protected]>
                >> <mailto:[email protected]
                <mailto:[email protected]>>> wrote:
                >>
                >> (y_pred_lbfgs==y_pred_saga).all() == False
                >
                >
                > _______________________________________________
                > scikit-learn mailing list
                > [email protected]
                <mailto:[email protected]>
                >
                https://mail.python.org/mailman/listinfo/scikit-learn
                >

                _______________________________________________
                scikit-learn mailing list
                [email protected]
                <mailto:[email protected]>
                https://mail.python.org/mailman/listinfo/scikit-learn

--Guillaume Lemaitre

            Scikit-learn @ Inria Foundation
            https://glemaitre.github.io/

--Guillaume Lemaitre

        Scikit-learn @ Inria Foundation
        https://glemaitre.github.io/

--Guillaume Lemaitre

    Scikit-learn @ Inria Foundation
    https://glemaitre.github.io/

    _______________________________________________
    scikit-learn mailing list
    [email protected]  <mailto:[email protected]>
    https://mail.python.org/mailman/listinfo/scikit-learn


    _______________________________________________
    scikit-learn mailing list
    [email protected]  <mailto:[email protected]>
    https://mail.python.org/mailman/listinfo/scikit-learn



    _______________________________________________
    scikit-learn mailing list
    [email protected]  <mailto:[email protected]>
    https://mail.python.org/mailman/listinfo/scikit-learn

    _______________________________________________
    scikit-learn mailing list
    [email protected] <mailto:[email protected]>
    https://mail.python.org/mailman/listinfo/scikit-learn



--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn


_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn

Re: [scikit-learn] logistic regression results are not stable between solvers

Reply via email to