Dear R Helpers,

I am having difficulty understanding how to use the penalty matrix for the ordROC function in package 'nonbinROC'.
The documentation says that the values of the penalty matrix code the penalty function L[i,j], in which 0 <= L[i,j] <= 1 for j > i. It gives an example: if we have an ordered response with 4 categories, we might wish to penalise larger misclassifications more heavily, so there is (for example) a penalty of 0 for correct classifications, 0.25 for misclassifying by one category, 0.5 for misclassifying by two categories and 1.0 for misclassifying by three categories. I wanted to use this sort of penalty, but with equal distances between the 4 categories (0, 1/3, 2/3, 1).

However, I found that if I simply rescale the penalty matrix (e.g. divide it by 2), keeping the distances between categories equal, then the estimate of overall accuracy increases, even though the pairwise accuracies do not change. In effect, by scaling the penalty matrix down I can push the overall accuracy estimate as close to 1 as I like. So I'd like to ask: what, if any, are the constraints on the scaling?

Here is working code that illustrates my difficulty:

set.seed(1)
gldstd=round(runif(200)*4)
gldstd[gldstd==0]=4
table(gldstd)
pred1=gldstd*rnorm(200, mean=1, sd=2/gldstd)
boxplot(pred1~gldstd)

library(nonbinROC)
gldstd=ordered(gldstd)
ordered_penalty = matrix(c(0,0,0,0,1/3,0,0,0,2/3,1/3,0,0,1,2/3,1/3,0), nrow = 4)
constant_penalty = matrix(c(0,0,0,0,1,0,0,0,1,1,0,0,1,1,1,0), nrow = 4)

# first using the constant_penalty (default) matrix
ordROC(gldstd,pred1, penalty=constant_penalty)
# now re-scaling the constant_penalty matrix by 1/2
ordROC(gldstd,pred1, penalty=constant_penalty/2)
# now using an ordered penalty matrix
ordROC(gldstd,pred1,penalty=ordered_penalty)
# now re-scaling the ordered_penalty matrix by 1/2
ordROC(gldstd,pred1,penalty=ordered_penalty/2)

Here is the resulting output:

> # first using the constant_penalty (default) matrix
> ordROC(gldstd,pred1, penalty=constant_penalty)
$`Pairwise Accuracy`
    Pair  Estimate Standard.Error
1 1 vs 2 0.6420973     0.05667060
2 1 vs 3 0.7083172     0.05293275
3 1 vs 4 0.8500507     0.04149018
4 2 vs 3 0.6081169     0.05438183
5 2 vs 4 0.7950680     0.04736247
6 3 vs 4 0.7025974     0.05371362

$`Penalty Matrix`
  1 2 3 4
1 0 1 1 1
2 0 0 1 1
3 0 0 0 1
4 0 0 0 0

$`Overall Accuracy`
   Estimate Standard.Error
1 0.7074935     0.02764736

> # now re-scaling the constant_penalty matrix by 1/2
> ordROC(gldstd,pred1, penalty=constant_penalty/2)
$`Pairwise Accuracy`
    Pair  Estimate Standard.Error
1 1 vs 2 0.6420973     0.05667060
2 1 vs 3 0.7083172     0.05293275
3 1 vs 4 0.8500507     0.04149018
4 2 vs 3 0.6081169     0.05438183
5 2 vs 4 0.7950680     0.04736247
6 3 vs 4 0.7025974     0.05371362

$`Penalty Matrix`
  1   2   3   4
1 0 0.5 0.5 0.5
2 0 0.0 0.5 0.5
3 0 0.0 0.0 0.5
4 0 0.0 0.0 0.0

$`Overall Accuracy`
   Estimate Standard.Error
1 0.8537467     0.01382368   <== larger estimate of overall accuracy (cf. 0.707 for the original constant_penalty matrix)

> # now using an ordered penalty matrix
> ordROC(gldstd,pred1,penalty=ordered_penalty)
$`Pairwise Accuracy`
    Pair  Estimate Standard.Error
1 1 vs 2 0.6420973     0.05667060
2 1 vs 3 0.7083172     0.05293275
3 1 vs 4 0.8500507     0.04149018
4 2 vs 3 0.6081169     0.05438183
5 2 vs 4 0.7950680     0.04736247
6 3 vs 4 0.7025974     0.05371362

$`Penalty Matrix`
  1         2         3         4
1 0 0.3333333 0.6666667 1.0000000
2 0 0.0000000 0.3333333 0.6666667
3 0 0.0000000 0.0000000 0.3333333
4 0 0.0000000 0.0000000 0.0000000

$`Overall Accuracy`
   Estimate Standard.Error
1 0.8616933     0.01622384

> # now re-scaling the ordered_penalty matrix by 1/2
> ordROC(gldstd,pred1,penalty=ordered_penalty/2)
$`Pairwise Accuracy`
    Pair  Estimate Standard.Error
1 1 vs 2 0.6420973     0.05667060
2 1 vs 3 0.7083172     0.05293275
3 1 vs 4 0.8500507     0.04149018
4 2 vs 3 0.6081169     0.05438183
5 2 vs 4 0.7950680     0.04736247
6 3 vs 4 0.7025974     0.05371362

$`Penalty Matrix`
  1         2         3         4
1 0 0.1666667 0.3333333 0.5000000
2 0 0.0000000 0.1666667 0.3333333
3 0 0.0000000 0.0000000 0.1666667
4 0 0.0000000 0.0000000 0.0000000

$`Overall Accuracy`
   Estimate Standard.Error
1 0.9308467     0.00811192   <== larger estimate of overall accuracy (cf. 0.862 for the original ordered_penalty matrix)

I can see that penalising misclassifications between categories less heavily might produce better overall accuracy. However, if I use a constant penalty matrix with all its non-zero (upper-triangle) values very close to zero, then the overall accuracy approaches 1. It seems counter-intuitive to me that the estimate of overall accuracy against an ordinal gold standard should depend on the absolute scale of the penalty matrix rather than only on the relative sizes of the penalties. So I would like to ask:

(a) should the penalty matrix always contain at least one penalty with a value of 1, and/or
(b) should there be some other constraint on the sum of the penalties in the matrix (e.g. should the matrix sum to some multiple of the number of categories), or
(c) is one free to use arbitrarily scaled penalty matrices when estimating accuracy against an ordinal gold standard?

Thanks in advance for your help,
Jonathan Williams
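The scaling effect described above can be made concrete with a short base-R sketch. It rebuilds the two penalty matrices from the post with `outer()` (the formula L[i,j] = (j - i)/(K - 1) for the equal-step matrix is an assumed generalisation of the 0, 1/3, 2/3, 1 pattern to K categories), and then uses only the Overall Accuracy estimates and standard errors printed above to compare the full and halved matrices. The variable names are illustrative; the code does not call nonbinROC and assumes nothing about how ordROC weights the pairwise accuracies internally.

```r
# Sketch (base R only): rebuild the penalty matrices from the post and
# compare the printed overall-accuracy results for full vs halved matrices.

K <- 4  # number of ordered categories in gldstd

# ASSUMED generalisation of the 0, 1/3, 2/3, 1 pattern:
# L[i, j] = (j - i) / (K - 1) for j > i, and 0 otherwise
equal_step_penalty <- outer(seq_len(K), seq_len(K),
                            function(i, j) pmax(j - i, 0) / (K - 1))
# Constant penalty: L[i, j] = 1 for j > i, and 0 otherwise
unit_penalty <- outer(seq_len(K), seq_len(K),
                      function(i, j) as.numeric(j > i))

# Both should match the hand-typed matrices in the post (prints TRUE twice)
print(all.equal(equal_step_penalty,
                matrix(c(0,0,0,0,1/3,0,0,0,2/3,1/3,0,0,1,2/3,1/3,0), nrow = 4)))
print(all.equal(unit_penalty,
                matrix(c(0,0,0,0,1,0,0,0,1,1,0,0,1,1,1,0), nrow = 4)))

# Overall-accuracy estimates and standard errors copied from the output above
overall <- c(constant = 0.7074935, constant_half = 0.8537467,
             ordered  = 0.8616933, ordered_half  = 0.9308467)
se      <- c(constant = 0.02764736, constant_half = 0.01382368,
             ordered  = 0.01622384, ordered_half  = 0.00811192)

# Factor by which (1 - estimate) and the SE shrink when the matrix is halved
loss_ratio <- c(
  constant = (1 - overall[["constant"]]) / (1 - overall[["constant_half"]]),
  ordered  = (1 - overall[["ordered"]])  / (1 - overall[["ordered_half"]])
)
se_ratio <- c(
  constant = se[["constant"]] / se[["constant_half"]],
  ordered  = se[["ordered"]]  / se[["ordered_half"]]
)
print(round(loss_ratio, 4))
print(round(se_ratio, 4))
```

Both ratios come out at 2 to four decimal places: halving the penalty matrix halves both 1 - estimate and its standard error, while the pairwise accuracies stay the same. That is what one would expect if the penalties enter the overall estimate linearly, but this is an inference from these four printed results only, not from the package documentation, and it does not by itself settle which scaling (if any) is the intended one.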