You are calculating recall, not accuracy.
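To make the distinction concrete, here is a minimal sketch that recomputes the metrics from the counts in your own output quoted below. It assumes only those four printed numbers. The variable names are mine and are not part of scikit-learn.

```python
# Counts copied from the output in the quoted message.
total = 60000          # all training samples
actual_5s = 5421       # samples whose true label is 5
predicted_5s = 3863    # samples the classifier labelled as 5
true_pos = 3574        # predicted 5 and actually 5

# Derive the remaining cells of the confusion matrix.
false_pos = predicted_5s - true_pos   # predicted 5, actually not 5
false_neg = actual_5s - true_pos      # actual 5s the classifier missed
true_neg = total - true_pos - false_pos - false_neg

# Recall: fraction of the actual 5s that were found (what the script printed).
recall = true_pos / actual_5s
# Precision: fraction of the predicted 5s that were right.
precision = true_pos / predicted_5s
# Accuracy: fraction of ALL samples classified correctly, 5s and non-5s alike.
accuracy = (true_pos + true_neg) / total

print(f"recall:    {recall:.4f}")
print(f"precision: {precision:.4f}")
print(f"accuracy:  {accuracy:.4f}")
```

The recall comes out at about 0.659, which is the 65.9% your script labels "Accuracy". The accuracy is about 0.964, in line with the cross-validation scores. Note that this figure is still measured on the training data, while cross_val_score evaluates on held-out folds, so the two need not match exactly. With the label arrays at hand, sklearn.metrics.accuracy_score(y_train_5, y_train_5_pred) and sklearn.metrics.recall_score(y_train_5, y_train_5_pred) compute the same quantities directly.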
On Sun, 10 Mar 2019 at 05:36, Rajnish kamboj <rajnishk7.i...@gmail.com> wrote:
>
> Hi
>
> I have recently started machine learning and it is my first query regarding
> prediction accuracy.
>
> There is difference in prediction accuracy using SGDClassifier and Cross
> validation scores.
>
> import numpy as np
> from sklearn.datasets import fetch_openml
> from sklearn.linear_model import SGDClassifier
>
> mnist = fetch_openml('mnist_784', version=1, cache=True)
> X, y = mnist['data'], mnist['target']
> X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
> shuffled_index = np.random.permutation(60000) # shuffle the 0 - 60000 range
> X_train, y_train = X_train[shuffled_index], y_train[shuffled_index]
>
> y_train_5 = (y_train == '5')
> y_test_5 = (y_test == '5')
>
> sgd_clf = SGDClassifier(random_state=42, tol=1e-3, max_iter=1000)
> sgd_clf.fit(X_train, y_train_5)
>
> # Predicting for all 5s
> print("####### PREDICTION STATS ##############")
> y_train_5_pred = sgd_clf.predict(X_train)
>
> print("Total y_train_5 [False|True both]]:", len(y_train_5))
> print("Total y_train_5 [Only 5s]:", sum(y_train_5))
>
> # some other digit may be predicted as 5 and some 5s may be predicted as not 5
> print("Predicted 5s:", sum(y_train_5_pred))
>
> correctly_predicted = sum(np.logical_and(y_train_5_pred, y_train_5))
> print("Correct Predicted", correctly_predicted)
> print("Accuracy:", correctly_predicted/sum(y_train_5) * 100)
>
> from sklearn.model_selection import cross_val_score
> cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring='accuracy')
>
> MY Output
>
> ####### PREDICTION STATS ##############
> Total y_train_5 [False|True both]]: 60000
> Total y_train_5 [Only 5s]: 5421
> Predicted 5s: 3863
> Correct Predicted 3574
> Accuracy: 65.9287954251983
> array([0.9323 , 0.96805, 0.9641 ])
> #######################################
>
> So as per my observation there is a difference, why?
>
> SGDCLassifier is ~65.92% accurate
> cross_val_score are ~95%
>
> Am I comparing it in wrong way? OR I am missing something?
>
>
> Thanks
>
> Rajnish
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn