AUC is independent of threshold; the confusion matrix is not. If every score for the positive class is greater than every score for the negative class, you get AUC = 1.00. That does not mean, however, that all the positive scores are > 0.5 and all the negative scores are < 0.5. It only means that there is *some* threshold that would give perfect classification on the data set you used. Note also that this value was computed on the training set.
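
To see the distinction concretely, here is a minimal sketch (Python with scikit-learn rather than Mahout; the scores are made up for illustration). Every positive example outscores every negative one, so AUC = 1.0, yet one positive falls below the default 0.5 cutoff and shows up as a misprediction in the confusion matrix:

    from sklearn.metrics import roc_auc_score

    y_true   = [0, 0, 0, 1, 1, 1]
    y_scores = [0.10, 0.20, 0.40, 0.45, 0.70, 0.90]  # all positives > all negatives

    # AUC is 1.0: some threshold (e.g. 0.42) separates the classes perfectly
    print(roc_auc_score(y_true, y_scores))  # 1.0

    # but thresholding at 0.5 misses the positive scored 0.45
    y_pred = [1 if s > 0.5 else 0 for s in y_scores]
    print(y_pred)  # [0, 0, 0, 0, 1, 1] -> one error, analogous to your confusion matrix
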
On Sun, May 22, 2011 at 2:53 PM, Mat Kelcey <[email protected]> wrote:

> Hi,
>
> I'm working through some examples from Mahout in Action and have got a
> strange result.
>
> mat@matpc:~/dev/mahout$ bin/mahout trainlogistic --input donut.csv
> --output ./model --target color --categories 2 --predictors x y a b c
> --types numeric --features 20 --passes 100 --rate 50
> ...
> mat@matpc:~/dev/mahout$ bin/mahout runlogistic --input donut.csv
> --model ./model --auc --confusion
> ...
> AUC = 1.00
> confusion: [[27.0, 1.0], [0.0, 12.0]]
> entropy: [[-0.1, -1.5], [-4.0, -0.2]]
> ...
>
> How can I have AUC = 1.00 when there was a misprediction?
>
> cheers,
> mat
