We have something we are not understanding.

    clf2 = SGDClassifier(loss='log', penalty='l2', shuffle=True,
                         max_iter=10, tol=.00001, early_stopping=True,
                         validation_fraction=0.2, n_iter_no_change=2,
                         verbose=0, random_state=1)
    clf2.fit(X_train, y_train)
    clf2.n_iter_

The result of the last line is ALWAYS n_iter_no_change + 1 (in this case 3; if we set n_iter_no_change=10, it ends at 11). No matter how we try to slow things down, it appears that early stopping kicks in at epoch 1. We've played with the learning rate, tolerance, etc. to make sure our problem isn't actually being solved in one epoch (which does seem dubious).

I even ran this manually and scored the accuracy each epoch (after enabling warm_start=True and setting max_iter=1):

    for i in range(5):
        clf2.fit(X_train, y_train)
        p = clf2.predict(X_test)
        print(accuracy_score(p, y_test))

    0.9748226138704509
    0.987182421606775
    0.9881742580300603
    0.9879453727016099
    0.991760128175784

So it seems there is some accuracy improvement to be had, however small. We're stumped as to what is going on and could use some wiser minds to explain this behavior.
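In case it helps, here is a minimal self-contained sketch of what we're running. The synthetic data from make_classification and the train_test_split stand in for our real data (an assumption, since our actual dataset isn't shown above), and we re-create clf2 without early stopping for the manual loop:

    # Minimal reproduction sketch. make_classification is a stand-in
    # for our real data, not the dataset we actually used.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=10000, n_features=20, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=1)

    # Early-stopping run: n_iter_ always comes back as n_iter_no_change + 1.
    # (Note: loss='log' was renamed to 'log_loss' in newer scikit-learn.)
    clf2 = SGDClassifier(loss='log', penalty='l2', shuffle=True,
                         max_iter=10, tol=.00001, early_stopping=True,
                         validation_fraction=0.2, n_iter_no_change=2,
                         verbose=0, random_state=1)
    clf2.fit(X_train, y_train)
    print(clf2.n_iter_)  # 3 for us, i.e. n_iter_no_change + 1

    # Manual epoch-by-epoch loop: one pass per fit() call via warm_start.
    # We turn early stopping off here (an assumption about the cleanest
    # way to run the loop) so every fit() does exactly one full epoch.
    clf2 = SGDClassifier(loss='log', penalty='l2', shuffle=True,
                         max_iter=1, tol=.00001, warm_start=True,
                         verbose=0, random_state=1)
    for i in range(5):
        clf2.fit(X_train, y_train)
        p = clf2.predict(X_test)
        print(accuracy_score(p, y_test))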