Re: [scikit-learn] Inconsistent Logistic Regression fit results

Andreas Mueller Mon, 15 Aug 2016 15:20:49 -0700

Hm that looks kinda convoluted.
Why don't you just do


    df_train, df_test, y_train, y_test = train_test_split(logreg_x, logreg_y,
                                                          random_state=0)


?
What version of scikit-learn are you using?

Also, you are modifying the inputs. Can you try to do the same but
pass a copy of the input dataframe to the method each time?
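A minimal sketch of both suggestions combined, assuming a current scikit-learn (where `train_test_split` lives in `sklearn.model_selection`; releases from 2016 also had it in `sklearn.cross_validation`). The synthetic data from `make_classification` and the column names `f0`..`f17` are hypothetical stand-ins for `df_save`, which isn't available here. The function splits features and labels together and never adds a column to the caller's frame, and each call gets its own copy anyway:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split


def log_run(logreg_x, logreg_y):
    """Fit and score without adding a column to the caller's DataFrame."""
    # Split features and labels in one call; random_state=0 fixes the shuffle.
    x_train, x_test, y_train, y_test = train_test_split(
        logreg_x, logreg_y, random_state=0)
    log_reg_fit = LogisticRegression(class_weight='balanced', tol=1e-9,
                                     random_state=0).fit(x_train, y_train)
    predicted = log_reg_fit.predict(x_test)
    return [cohen_kappa_score(y_test, predicted),
            accuracy_score(y_test, predicted)]


# Hypothetical stand-in data with the same shape as df_save (240 rows x 18 columns).
X, y = make_classification(n_samples=240, n_features=18, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(18)])

before = df.copy()
first = log_run(df.copy(), y)   # each call gets its own copy of the input
second = log_run(df.copy(), y)
print(first, second)
```

If `first` and `second` still differ on the real data, the input mutation can at least be ruled out as the cause.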


On 08/15/2016 06:00 PM, Chris Cameron wrote:

Sebastian,

That doesn’t do it. With the function:

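from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

def log_run(logreg_x, logreg_y):
    # Attach the labels so they are shuffled together with the features.
    logreg_x['pass_fail'] = logreg_y
    df_train, df_test = train_test_split(logreg_x, random_state=0)
    # Pull the labels back out and drop them from the feature frames.
    y_train = df_train.pass_fail.values
    y_test = df_test.pass_fail.values
    del df_train['pass_fail']
    del df_test['pass_fail']
    # Same as before, but now with a fixed random_state for the classifier.
    log_reg_fit = LogisticRegression(class_weight='balanced',
                                     tol=0.000000001,
                                     random_state=0).fit(df_train, y_train)
    predicted = log_reg_fit.predict(df_test)
    accuracy = accuracy_score(y_test, predicted)
    kappa = cohen_kappa_score(y_test, predicted)
    return [kappa, accuracy]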


I’m still seeing:
log_run(df_save, y)
Out[7]: [-0.054421768707483005, 0.48333333333333334]

log_run(df_save, y)
Out[8]: [0.042553191489361743, 0.55000000000000004]

log_run(df_save, y)
Out[9]: [0.042553191489361743, 0.55000000000000004]

log_run(df_save, y)
Out[10]: [0.027777777777777728, 0.53333333333333333]


Chris

On Aug 15, 2016, at 3:42 PM, [email protected] wrote:

Hi, Chris,
have you set the random seed to a specific, constant integer value? Note that
the default in LogisticRegression is random_state=None. Setting it to some
arbitrary number like 123 may help if you haven't done so yet.

Best,
Sebastian
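A small sketch of the check Sebastian describes: fit several times with the same integer seed and compare the learned coefficients. It assumes `solver='liblinear'`, which was the default in 2016 and is the solver whose documentation mentions a random number generator (newer scikit-learn releases default to `lbfgs`). The data are synthetic stand-ins, and `fit_coef` is a hypothetical helper name:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data; the real df_save is not available here.
X, y = make_classification(n_samples=240, n_features=18, random_state=0)


def fit_coef(seed):
    """Fit once with a fixed seed and return the coefficient array."""
    clf = LogisticRegression(solver='liblinear', tol=1e-9, random_state=seed)
    return clf.fit(X, y).coef_


coefs = [fit_coef(123) for _ in range(5)]
# True only if every repeated fit produced exactly the same coefficients.
identical = all(np.array_equal(coefs[0], c) for c in coefs[1:])
print(identical)
```

Comparing `coef_` directly is stricter than comparing kappa or accuracy, because tiny coefficient differences can be hidden by (or amplified through) thresholded predictions.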

On Aug 15, 2016, at 5:27 PM, Chris Cameron <[email protected]> wrote:

Hi all,

Using the same X and y values sklearn.linear_model.LogisticRegression.fit() is 
providing me with inconsistent results.

The documentation for sklearn.linear_model.LogisticRegression states that "It
is thus not uncommon, to have slightly different results for the same input data." I
am experiencing this; however, the suggested fix of using a smaller "tol" parameter
isn't giving me a consistent fit.

The code I’m using:

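from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

def log_run(logreg_x, logreg_y):
    # Attach the labels so they are shuffled together with the features.
    logreg_x['pass_fail'] = logreg_y
    df_train, df_test = train_test_split(logreg_x, random_state=0)
    # Pull the labels back out and drop them from the feature frames.
    y_train = df_train.pass_fail.values
    y_test = df_test.pass_fail.values
    del df_train['pass_fail']
    del df_test['pass_fail']
    log_reg_fit = LogisticRegression(class_weight='balanced',
                                     tol=0.000000001).fit(df_train, y_train)
    predicted = log_reg_fit.predict(df_test)
    accuracy = accuracy_score(y_test, predicted)
    kappa = cohen_kappa_score(y_test, predicted)

    return [kappa, accuracy]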


I've gone out of my way to make sure the train and test data are the same for each
run, so I don't think any random shuffling should be going on.
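One way to confirm that the split really is identical on every run is to compare the row index labels that land in each part. This is a sketch on a hypothetical 240-row frame standing in for `df_save`; `split_indices` is an illustrative helper name. With the default `test_size` of 0.25, 240 rows split into 180 training and 60 test rows:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 240-row frame standing in for df_save.
df = pd.DataFrame({"a": np.arange(240), "b": np.arange(240) * 0.5})


def split_indices(frame):
    """Return the index labels of the train and test rows."""
    df_train, df_test = train_test_split(frame, random_state=0)
    # DataFrame inputs keep their index, so the labels identify the rows.
    return df_train.index.tolist(), df_test.index.tolist()


runs = [split_indices(df) for _ in range(3)]
same_split = all(r == runs[0] for r in runs)
print(same_split, len(runs[0][0]), len(runs[0][1]))
```

If `same_split` holds on the real data too, the remaining variation has to come from the fit itself rather than from the split.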

Example output:
---
log_run(df_save, y)
Out[32]: [0.027777777777777728, 0.53333333333333333]

log_run(df_save, y)
Out[33]: [0.027777777777777728, 0.53333333333333333]

log_run(df_save, y)
Out[34]: [0.11347517730496456, 0.58333333333333337]

log_run(df_save, y)
Out[35]: [0.042553191489361743, 0.55000000000000004]

log_run(df_save, y)
Out[36]: [-0.07407407407407407, 0.51666666666666672]

log_run(df_save, y)
Out[37]: [0.042553191489361743, 0.55000000000000004]

A little information on the problem DataFrame:
---
len(df_save)
Out[40]: 240

len(df_save.columns)
Out[41]: 18


If I omit this particular column, the kappa no longer fluctuates:

df_save['abc'].head()
Out[42]:
0    0.026316
1    0.333333
2    0.015152
3    0.010526
4    0.125000
Name: abc, dtype: float64
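To put a number on "no longer fluctuates", a sketch like the following repeats the same fit several times and collects the distinct kappa values, with and without the column. A result set of size 1 means the runs were reproducible. The helpers `kappa_once` and `distinct_kappas`, the synthetic data, and the column names are all hypothetical stand-ins; only the column name `abc` comes from the original post:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split


def kappa_once(frame, labels):
    """One split-fit-score run with every seed fixed."""
    x_train, x_test, y_train, y_test = train_test_split(
        frame, labels, random_state=0)
    clf = LogisticRegression(class_weight='balanced', random_state=0)
    return cohen_kappa_score(y_test, clf.fit(x_train, y_train).predict(x_test))


def distinct_kappas(frame, labels, n_runs=10):
    """Set of kappa values over n_runs identical calls (size 1 = stable)."""
    return {kappa_once(frame.copy(), labels) for _ in range(n_runs)}


# Hypothetical stand-in data: 17 generic columns plus one named 'abc'.
X, y = make_classification(n_samples=240, n_features=18, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(17)] + ["abc"])

with_abc = distinct_kappas(df, y)
without_abc = distinct_kappas(df.drop(columns="abc"), y)
print(len(with_abc), len(without_abc))
```

On real data, a result like `with_abc` holding several values while `without_abc` holds one would narrow the problem down to that column's interaction with the solver.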


Does anyone have ideas on how I can figure this out? Is there some 
randomness/shuffling still going on I missed?


Thanks!
Chris
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
