Dear scikit-learn users,
I did what you suggested (see the code below), and I still do not get the
same results between solvers: neither the predictions nor the coefficients
match. (A small check to quantify the coefficient gap is sketched after the code below.)
Best regards,
Ben
Here is the new source code:
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
#
RANDOM_SEED = 2
#
X_sim, y_sim = make_classification(n_samples=400,
                                   n_features=45,
                                   n_informative=10,
                                   n_redundant=0,
                                   n_repeated=0,
                                   n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED,
                                   shuffle=False)
#
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)
for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)
    #
    classifier_lbfgs = LogisticRegression(fit_intercept=True,
                                          max_iter=20000000, verbose=0,
                                          random_state=RANDOM_SEED, C=1e9,
                                          solver='lbfgs', penalty='none',
                                          tol=1e-6)
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
    print(classifier_lbfgs.coef_)
    classifier_saga = LogisticRegression(fit_intercept=True,
                                         max_iter=20000000, verbose=0,
                                         random_state=RANDOM_SEED, C=1e9,
                                         solver='saga', penalty='none',
                                         tol=1e-6)
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)
    print(classifier_saga.coef_)
    #
    y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
    y_pred_saga = classifier_saga.predict(X_split_test)
    #
    if not (y_pred_lbfgs == y_pred_saga).all():
        print('lbfgs does not give the same results as saga :-( !')
        exit(1)
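For reference, the kind of check I have in mind to quantify the gap is
sketched below (only a sketch; it assumes the fitted classifier_lbfgs and
classifier_saga objects from the code above, uses plain numpy, and the
atol value is just an arbitrary illustrative threshold):

import numpy as np

# Sketch: quantify how far apart the two fitted solutions are, assuming
# classifier_lbfgs and classifier_saga were fitted as in the code above.
coef_gap = np.abs(classifier_lbfgs.coef_ - classifier_saga.coef_)
print('max |coefficient difference|:', coef_gap.max())
print('coefficients close (atol=1e-3)?',
      np.allclose(classifier_lbfgs.coef_, classifier_saga.coef_, atol=1e-3))
print('intercept difference:',
      classifier_lbfgs.intercept_ - classifier_saga.intercept_)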
On 09/10/2019 at 20:25, Guillaume Lemaître wrote:
Could you generate more samples, set the penalty to none, reduce the tolerance,
and check the coefficients instead of the predictions? This is just to be sure
that the difference is not only a numerical error.
Sent from my phone - sorry for being brief and for potential misspellings.
Original Message
From: benoit.pres...@u-bourgogne.fr
Sent: 8 October 2019 20:27
To: scikit-learn@python.org
Reply to: scikit-learn@python.org
Subject: [scikit-learn] logistic regression results are not stable between
solvers
Dear scikit-learn users,
I am using logistic regression to make some predictions. On my own data,
I do not get the same results between solvers. I managed to reproduce
this issue on synthetic data (see the code below).
All solvers seem to converge (n_iter_ < max_iter), so why do I get
different results?
If the results are not stable between solvers, which one should I choose?
(A sketch of a more systematic comparison across solvers follows the code below.)
Best regards,
Ben
------------------------------------------
Here is the code I used to generate synthetic data:
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
#
RANDOM_SEED = 2
#
X_sim, y_sim = make_classification(n_samples=200,
                                   n_features=45,
                                   n_informative=10,
                                   n_redundant=0,
                                   n_repeated=0,
                                   n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED,
                                   shuffle=False)
#
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)
for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)
    #
    classifier_lbfgs = LogisticRegression(fit_intercept=True,
                                          max_iter=20000000, verbose=1,
                                          random_state=RANDOM_SEED, C=1e9,
                                          solver='lbfgs')
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
    classifier_saga = LogisticRegression(fit_intercept=True,
                                         max_iter=20000000, verbose=1,
                                         random_state=RANDOM_SEED, C=1e9,
                                         solver='saga')
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)
    #
    y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
    y_pred_saga = classifier_saga.predict(X_split_test)
    #
    if not (y_pred_lbfgs == y_pred_saga).all():
        print('lbfgs does not give the same results as saga :-( !')
        exit()
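As a side note, a more systematic comparison across solvers could look like
the sketch below (only a sketch, nothing I have settled on). It assumes the
X_split_train, y_split_train, X_split_test and RANDOM_SEED variables defined
above, and solver names available in recent scikit-learn versions:

from sklearn.linear_model import LogisticRegression

# Sketch: fit the same (effectively unpenalised, C=1e9) problem with several
# solvers and count pairwise prediction disagreements on the test split.
solvers = ['lbfgs', 'newton-cg', 'sag', 'saga']
predictions = {}
for solver_name in solvers:
    clf = LogisticRegression(fit_intercept=True, C=1e9, max_iter=20000000,
                             tol=1e-6, solver=solver_name,
                             random_state=RANDOM_SEED)
    clf.fit(X_split_train, y_split_train)
    predictions[solver_name] = clf.predict(X_split_test)
    print(solver_name, 'iterations:', clf.n_iter_)

for i, first in enumerate(solvers):
    for second in solvers[i + 1:]:
        n_differ = int((predictions[first] != predictions[second]).sum())
        print(first, 'vs', second, '-> differing test predictions:', n_differ)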
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn