Dietterich wrote quite a comprehensive paper on this issue:
[1] T. G. Dietterich. Approximate statistical tests for comparing supervised
classification learning algorithms. Neural Computation, 10(7):1895–1923, 1998.
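For what it's worth, one of the tests discussed in that paper is McNemar's test, which works on the paired per-example predictions of the two systems rather than on their aggregate scores. Below is a minimal sketch of it in Python, assuming you have the gold labels and both systems' predictions for the same test examples. The function name mcnemar_test and its arguments are made up for illustration. Note that it compares per-example correctness (error rate), so by itself it says nothing about F1.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (with continuity correction) on paired predictions.

    Illustrative helper, not taken from any library.
    Returns (chi_square_statistic, p_value).
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all three sequences must have the same length")

    # Count only the discordant pairs:
    #   only_a: system A is right and system B is wrong
    #   only_b: system B is right and system A is wrong
    only_a = 0
    only_b = 0
    for gold, a, b in zip(y_true, pred_a, pred_b):
        a_ok = (a == gold)
        b_ok = (b == gold)
        if a_ok and not b_ok:
            only_a += 1
        elif b_ok and not a_ok:
            only_b += 1

    discordant = only_a + only_b
    if discordant == 0:
        # The systems never disagree in correctness: no evidence of a difference.
        return 0.0, 1.0

    # Continuity-corrected statistic, chi-square with 1 degree of freedom
    # under the null hypothesis that both systems have the same error rate.
    stat = (abs(only_a - only_b) - 1) ** 2 / discordant

    # Survival function of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

You would call it as mcnemar_test(gold_labels, predictions_a, predictions_b) and compare the returned p-value to your chosen significance level. Only the examples where exactly one system is correct affect the result.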
I am not sure whether the paper's analysis applies to error metrics other than
"misclassification e
Hi folks,
I have a question that I could not find a proper answer to. Suppose
I have two different systems, A and B, applied to the same dataset and using
different algorithms. Each system achieves a specific F-measure (F1), such as:
System A: 88% F1
System B: 89.6% F1
I want to see if the diffe