Re: [scikit-learn] Inconsistent Logistic Regression fit results

Chris Cameron Mon, 15 Aug 2016 15:02:35 -0700

Sebastian,

That doesn’t fix it. Even with random_state=0 passed to LogisticRegression in this function:


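from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split


def log_run(logreg_x, logreg_y):
    # Attach the labels so that train_test_split keeps rows and labels aligned.
    # Note: this also adds a 'pass_fail' column to the caller's DataFrame.
    logreg_x['pass_fail'] = logreg_y
    df_train, df_test = train_test_split(logreg_x, random_state=0)
    y_train = df_train.pass_fail.values
    y_test = df_test.pass_fail.values
    # Remove the label column so it is not used as a feature.
    del df_train['pass_fail']
    del df_test['pass_fail']
    log_reg_fit = LogisticRegression(class_weight='balanced',
                                     tol=0.000000001,
                                     random_state=0).fit(df_train, y_train)
    predicted = log_reg_fit.predict(df_test)
    accuracy = accuracy_score(y_test, predicted)
    kappa = cohen_kappa_score(y_test, predicted)

    return [kappa, accuracy]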

I’m still seeing:
log_run(df_save, y)
Out[7]: [-0.054421768707483005, 0.48333333333333334]

log_run(df_save, y)
Out[8]: [0.042553191489361743, 0.55000000000000004]

log_run(df_save, y)
Out[9]: [0.042553191489361743, 0.55000000000000004]

log_run(df_save, y)
Out[10]: [0.027777777777777728, 0.53333333333333333]
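One thing worth ruling out: log_run assigns logreg_x['pass_fail'], so it writes into the DataFrame that is passed in. A minimal sketch for confirming that the input really is identical before each call is to fingerprint it. frame_fingerprint and add_label_column are hypothetical helpers, and the random DataFrame only stands in for df_save.

```python
import hashlib

import numpy as np
import pandas as pd


def frame_fingerprint(df):
    """Return a SHA-256 digest of a DataFrame's column names, dtypes, index and values."""
    h = hashlib.sha256()
    h.update(repr(list(df.columns)).encode())
    h.update(repr(list(df.dtypes.astype(str))).encode())
    h.update(pd.util.hash_pandas_object(df, index=True).values.tobytes())
    return h.hexdigest()


def add_label_column(features, labels):
    # Hypothetical helper mimicking the first line of log_run:
    # it writes into the caller's DataFrame in place.
    features['pass_fail'] = labels
    return features


# Random stand-in for df_save (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((240, 3)), columns=['a', 'b', 'abc'])
y = rng.integers(0, 2, size=240)

before = frame_fingerprint(df)
add_label_column(df, y)
after_first = frame_fingerprint(df)
add_label_column(df, y)
after_second = frame_fingerprint(df)

print(before == after_first)        # False: the first call added a column
print(after_first == after_second)  # True: later calls leave it unchanged
```

Taking the fingerprint of df_save right before each log_run call would show whether the data itself is drifting between runs.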


Chris

> On Aug 15, 2016, at 3:42 PM, [email protected] wrote:
> 
> Hi, Chris,
> Have you set the random seed to a specific, constant integer value? Note that
> the default in LogisticRegression is random_state=None. Setting it to some
> arbitrary number like 123 may help if you haven’t done so yet.
> 
> Best,
> Sebastian
> 
> 
> 
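Sebastian's suggestion can be checked in isolation, away from the DataFrame handling: fit the same estimator twice with the same integer random_state and compare the learned parameters. A minimal sketch, assuming synthetic data from make_classification as a stand-in for df_save and naming solver='liblinear' explicitly, since liblinear was LogisticRegression's default solver at the time; fitted_params is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for df_save (assumption): 240 rows, 17 features.
X, y = make_classification(n_samples=240, n_features=17, random_state=0)


def fitted_params(seed):
    """Fit once and return the coefficients and intercept as one flat array."""
    model = LogisticRegression(solver='liblinear', class_weight='balanced',
                               tol=1e-9, random_state=seed).fit(X, y)
    return np.concatenate([model.coef_.ravel(), model.intercept_])


first = fitted_params(123)
second = fitted_params(123)
# With an integer seed, two fits on identical data should agree exactly.
print(np.array_equal(first, second))
```

If this prints True but log_run still varies, the variation is coming from somewhere other than the estimator's seed.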
>> On Aug 15, 2016, at 5:27 PM, Chris Cameron wrote:
>> 
>> Hi all,
>> 
>> Using the same X and y values, sklearn.linear_model.LogisticRegression.fit()
>> is giving me inconsistent results.
>> 
>> The documentation for sklearn.linear_model.LogisticRegression states that
>> “It is thus not uncommon, to have slightly different results for the same
>> input data.” I am experiencing this; however, the suggested fix of using a
>> smaller “tol” parameter isn’t giving me a consistent fit.
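The documentation's suggested fix only helps if tol is what actually stops the solver. With a very small tol, the fit may instead stop at max_iter (100 by default). A minimal sketch for checking which limit was hit, assuming synthetic data from make_classification and the liblinear solver; iterations_used is a hypothetical helper.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for df_save (assumption).
X, y = make_classification(n_samples=240, n_features=17, random_state=0)


def iterations_used(tol):
    """Return (iterations run, max_iter) for one liblinear fit at this tol."""
    with warnings.catch_warnings():
        # A very small tol often triggers ConvergenceWarning; silence it here.
        warnings.simplefilter('ignore', ConvergenceWarning)
        model = LogisticRegression(solver='liblinear', class_weight='balanced',
                                   tol=tol, random_state=0).fit(X, y)
    # For liblinear, n_iter_ holds the largest iteration count across classes.
    return int(model.n_iter_.max()), model.max_iter


for tol in (1e-4, 1e-9):
    n_iter, max_iter = iterations_used(tol)
    # n_iter equal to max_iter suggests the iteration cap, not tol, ended the fit.
    print(f"tol={tol:g}: {n_iter} of {max_iter} iterations")
```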
>> 
>> The code I’m using:
>> 
>> def log_run(logreg_x, logreg_y):
>>   logreg_x['pass_fail'] = logreg_y
>>   df_train, df_test = train_test_split(logreg_x, random_state=0)
>>   y_train = df_train.pass_fail.as_matrix()
>>   y_test = df_test.pass_fail.as_matrix()
>>   del(df_train['pass_fail'])
>>   del(df_test['pass_fail'])
>>   log_reg_fit = LogisticRegression(class_weight='balanced',
>>                                    tol=0.000000001).fit(df_train, y_train)
>>   predicted = log_reg_fit.predict(df_test)
>>   accuracy = accuracy_score(y_test, predicted)
>>   kappa = cohen_kappa_score(y_test, predicted)
>> 
>>   return [kappa, accuracy]
>> 
>> 
>> I’ve gone out of my way to be sure the test and train data is the same for 
>> each run, so I don’t think there should be random shuffling going on.
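The claim that the split is the same on every run can be checked directly: with a fixed integer random_state, train_test_split should return the same row labels each time. A minimal sketch, with a hypothetical 240-row DataFrame standing in for df_save:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df_save with 240 rows.
df = pd.DataFrame({'x': np.arange(240)})

train_a, test_a = train_test_split(df, random_state=0)
train_b, test_b = train_test_split(df, random_state=0)

# Same random_state, same input: the same row labels land in each split.
print(train_a.index.equals(train_b.index) and test_a.index.equals(test_b.index))
```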
>> 
>> Example output:
>> ---
>> log_run(df_save, y)
>> Out[32]: [0.027777777777777728, 0.53333333333333333]
>> 
>> log_run(df_save, y)
>> Out[33]: [0.027777777777777728, 0.53333333333333333]
>> 
>> log_run(df_save, y)
>> Out[34]: [0.11347517730496456, 0.58333333333333337]
>> 
>> log_run(df_save, y)
>> Out[35]: [0.042553191489361743, 0.55000000000000004]
>> 
>> log_run(df_save, y)
>> Out[36]: [-0.07407407407407407, 0.51666666666666672]
>> 
>> log_run(df_save, y)
>> Out[37]: [0.042553191489361743, 0.55000000000000004]
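Each output pair is [kappa, accuracy]. Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from the class frequencies, kappa = (p_o - p_e) / (1 - p_e), so a model whose accuracy sits close to p_e scores near zero and slightly below zero when it falls short of p_e, as in the -0.074 run. A minimal sketch checking a hand-rolled version against cohen_kappa_score on made-up labels; manual_kappa is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix


def manual_kappa(y_true, y_pred):
    """Cohen's kappa from a confusion matrix: (p_o - p_e) / (1 - p_e)."""
    cm = confusion_matrix(y_true, y_pred).astype(float)
    n = cm.sum()
    p_o = np.trace(cm) / n                          # observed agreement
    p_e = cm.sum(axis=0) @ cm.sum(axis=1) / n ** 2  # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)


# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
print(manual_kappa(y_true, y_pred), cohen_kappa_score(y_true, y_pred))
```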
>> 
>> A little information on the problem DataFrame:
>> ---
>> len(df_save)
>> Out[40]: 240
>> 
>> len(df_save.columns)
>> Out[41]: 18
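With 240 rows and the default split proportion (test_size=0.25 when neither size is given), the test set has 60 rows. Every accuracy in this thread is therefore a whole number of correct predictions out of 60, and a single changed prediction moves accuracy by 1/60, about 0.017. A minimal sketch of that arithmetic, using the accuracies from the runs shown in the thread:

```python
from sklearn.model_selection import train_test_split

# 240 rows, as in len(df_save); the default split holds out 25% for testing.
train_idx, test_idx = train_test_split(list(range(240)), random_state=0)
n_test = len(test_idx)
print(len(train_idx), n_test)  # 180 60

# Accuracies from the runs in this thread, converted to counts out of n_test.
reported = [0.48333333333333334, 0.55000000000000004, 0.53333333333333333,
            0.58333333333333337, 0.51666666666666672]
correct = [round(acc * n_test) for acc in reported]
print(correct)  # [29, 33, 32, 35, 31]
```

So the fluctuating runs differ by only a handful of test-set predictions.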
>> 
>> 
>> If I omit this particular column, the kappa no longer fluctuates:
>> 
>> df_save['abc'].head()
>> Out[42]: 
>> 0    0.026316
>> 1    0.333333
>> 2    0.015152
>> 3    0.010526
>> 4    0.125000
>> Name: abc, dtype: float64
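The column-omission experiment can be written as a small ablation: run the identical fit several times with and without one column and count how many distinct kappa values come out. A minimal sketch, assuming synthetic data in which 'abc' is simply the name given to one column; distinct_kappas is a hypothetical helper.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split


def distinct_kappas(X, y, n_runs=5):
    """Fit the same model n_runs times on one fixed split; return the set of kappas."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    kappas = set()
    for _ in range(n_runs):
        model = LogisticRegression(solver='liblinear', class_weight='balanced',
                                   tol=1e-9, random_state=0)
        predicted = model.fit(X_train, y_train).predict(X_test)
        kappas.add(cohen_kappa_score(y_test, predicted))
    return kappas


# Synthetic stand-in for df_save (assumption); 'abc' is just one column's name.
X, y = make_classification(n_samples=240, n_features=17, random_state=0)
df = pd.DataFrame(X, columns=[f'f{i}' for i in range(16)] + ['abc'])

with_col = distinct_kappas(df, y)
without_col = distinct_kappas(df.drop(columns='abc'), y)
print(len(with_col), len(without_col))
```

Run on the real df_save, the two counts would reproduce the observation above in a compact, shareable form.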
>> 
>> 
>> Does anyone have ideas on how I can figure this out? Is there some 
>> randomness/shuffling still going on I missed?
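One way to gauge how much variation the solver's own randomness could account for is to hold the split fixed and sweep random_state over several seeds, counting the distinct prediction vectors. A minimal sketch on synthetic stand-in data, assuming the liblinear solver:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df_save (assumption).
X, y = make_classification(n_samples=240, n_features=17, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One prediction vector per seed; more than one distinct vector means the
# solver's seed (not the data split) is changing the predictions.
predictions = set()
for seed in range(10):
    model = LogisticRegression(solver='liblinear', class_weight='balanced',
                               tol=1e-9, random_state=seed)
    predictions.add(tuple(model.fit(X_train, y_train).predict(X_test)))
print(len(predictions))
```

A count of 1 means the seed does not affect predictions on that data; a larger count means results depend on the seed, which is when fixing it, as Sebastian suggested, matters.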
>> 
>> 
>> Thanks!
>> Chris
>> _______________________________________________
>> scikit-learn mailing list
>> [email protected]
>> https://mail.python.org/mailman/listinfo/scikit-learn
> 
