[R] scaling of nonbinROC penalties

Jonathan Williams Fri, 18 Jan 2013 07:25:21 -0800

Dear R Helpers

I am having difficulty understanding how to use the penalty matrix for the 
ordROC function in package 'nonbinROC'.


The documentation says that the values of the penalty matrix code the 
penalty function L[i,j], in which 0 <= L[i,j] <= 1 for j > i. It gives an 
example: for an ordered response with 4 categories, we might wish to penalise 
larger misclassifications more, with (for example) 0 penalty for correct 
classifications, 0.25 for misclassifying by one category, 0.5 for 
misclassifying by two categories and 1.0 for misclassifying by three 
categories. I wanted to use this sort of penalty, but with equal distances 
between the 4 categories (0, 1/3, 2/3, 1). However, I found that if I simply 
multiply the penalty matrix by a constant less than 1, keeping the distances 
between categories equal, then the estimate of overall accuracy increases. In 
effect I can achieve any value for accuracy, including unity, by re-scaling 
the penalty matrix. So I'd like to ask: what, if any, are the constraints on 
the scaling process?
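
For reference, an equally spaced penalty matrix of this kind can be built 
programmatically rather than typed out entry by entry. This is only a minimal 
sketch: `n_cat`, `step` and `equal_step_penalty` are names made up for 
illustration, and it assumes the upper-triangle layout (penalties only where 
j > i) described in the documentation.

```r
n_cat <- 4  # number of ordered categories (illustrative name)

# step[i, j] = j - i, the number of categories between i and j
step <- outer(seq_len(n_cat), seq_len(n_cat), function(i, j) j - i)

# keep only j > i (upper triangle); the diagonal is already 0
step[lower.tri(step)] <- 0

# scale so the largest misclassification (n_cat - 1 steps) has penalty 1
equal_step_penalty <- step / (n_cat - 1)
equal_step_penalty
```

Dividing by `n_cat - 1` is one possible choice of scale; whether that choice 
is required is exactly what I am asking about.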

Here is working code that illustrates my difficulty:

set.seed(1); gldstd=round(runif(200)*4); gldstd[gldstd==0]=4; table(gldstd)
pred1=gldstd*rnorm(200, mean=1, sd=2/gldstd); boxplot(pred1~gldstd)
library(nonbinROC); gldstd=ordered(gldstd)
ordered_penalty = matrix(c(0,0,0,0,1/3,0,0,0,2/3,1/3,0,0,1,2/3,1/3,0), nrow = 4)
constant_penalty = matrix(c(0,0,0,0,1,0,0,0,1,1,0,0,1,1,1,0), nrow = 4)

# first using the constant_penalty (default) matrix
ordROC(gldstd,pred1, penalty=constant_penalty)

# now re-scaling the constant_penalty matrix by 1/2
ordROC(gldstd,pred1, penalty=constant_penalty/2)

# now using an ordered penalty matrix
ordROC(gldstd,pred1,penalty=ordered_penalty)

# now re-scaling the ordered_penalty matrix by 1/2
ordROC(gldstd,pred1,penalty=ordered_penalty/2)

Here is the resulting output:
> # first using the constant_penalty (default) matrix
> ordROC(gldstd,pred1, penalty=constant_penalty)
$`Pairwise Accuracy`
    Pair  Estimate Standard.Error
1 1 vs 2 0.6420973     0.05667060
2 1 vs 3 0.7083172     0.05293275
3 1 vs 4 0.8500507     0.04149018
4 2 vs 3 0.6081169     0.05438183
5 2 vs 4 0.7950680     0.04736247
6 3 vs 4 0.7025974     0.05371362

$`Penalty Matrix`
  1 2 3 4
1 0 1 1 1
2 0 0 1 1
3 0 0 0 1
4 0 0 0 0

$`Overall Accuracy`
   Estimate Standard.Error
1 0.7074935     0.02764736

> # now re-scaling the constant_penalty matrix by 1/2
> ordROC(gldstd,pred1, penalty=constant_penalty/2)
$`Pairwise Accuracy`
    Pair  Estimate Standard.Error
1 1 vs 2 0.6420973     0.05667060
2 1 vs 3 0.7083172     0.05293275
3 1 vs 4 0.8500507     0.04149018
4 2 vs 3 0.6081169     0.05438183
5 2 vs 4 0.7950680     0.04736247
6 3 vs 4 0.7025974     0.05371362

$`Penalty Matrix`
  1   2   3   4
1 0 0.5 0.5 0.5
2 0 0.0 0.5 0.5
3 0 0.0 0.0 0.5
4 0 0.0 0.0 0.0

$`Overall Accuracy`
   Estimate Standard.Error
1 0.8537467     0.01382368   <== larger estimate of overall accuracy (cf. 0.707 for the original constant_penalty matrix)

> # now using an ordered penalty matrix
> ordROC(gldstd,pred1,penalty=ordered_penalty)
$`Pairwise Accuracy`
    Pair  Estimate Standard.Error
1 1 vs 2 0.6420973     0.05667060
2 1 vs 3 0.7083172     0.05293275
3 1 vs 4 0.8500507     0.04149018
4 2 vs 3 0.6081169     0.05438183
5 2 vs 4 0.7950680     0.04736247
6 3 vs 4 0.7025974     0.05371362

$`Penalty Matrix`
  1         2         3         4
1 0 0.3333333 0.6666667 1.0000000
2 0 0.0000000 0.3333333 0.6666667
3 0 0.0000000 0.0000000 0.3333333
4 0 0.0000000 0.0000000 0.0000000

$`Overall Accuracy`
   Estimate Standard.Error
1 0.8616933     0.01622384

> # now re-scaling the ordered_penalty matrix by 1/2
> ordROC(gldstd,pred1,penalty=ordered_penalty/2)
$`Pairwise Accuracy`
    Pair  Estimate Standard.Error
1 1 vs 2 0.6420973     0.05667060
2 1 vs 3 0.7083172     0.05293275
3 1 vs 4 0.8500507     0.04149018
4 2 vs 3 0.6081169     0.05438183
5 2 vs 4 0.7950680     0.04736247
6 3 vs 4 0.7025974     0.05371362

$`Penalty Matrix`
  1         2         3         4
1 0 0.1666667 0.3333333 0.5000000
2 0 0.0000000 0.1666667 0.3333333
3 0 0.0000000 0.0000000 0.1666667
4 0 0.0000000 0.0000000 0.0000000

$`Overall Accuracy`
   Estimate Standard.Error
1 0.9308467     0.00811192   <== larger estimate of overall accuracy (cf. 0.862 for the original ordered_penalty matrix)

I can see that penalising differences between categories less might produce 
better overall accuracy. However, if I use a constant penalty matrix with all 
off-diagonal values very close to zero, then the overall accuracy approaches 1. 
It seems counter-intuitive to me that the estimate of overall accuracy for an 
ordinal gold standard should depend on the absolute values of the penalty 
matrix.
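
The printed estimates are at least consistent with a simple linear rule: 
halving every penalty halves the distance of the overall accuracy from 1, and 
halves its standard error. Here is a minimal check that uses only numbers 
copied from the output above. `rescaled_accuracy` is a hypothetical helper, 
and the form 1 - k * (1 - accuracy) is my assumption about how the overall 
estimate responds to a scale factor k, not something taken from the package 
documentation.

```r
# Overall accuracy estimates and standard errors copied from the output above
acc_constant <- 0.7074935; se_constant <- 0.02764736
acc_ordered  <- 0.8616933; se_ordered  <- 0.01622384

# Hypothetical helper: predicted overall accuracy after multiplying the whole
# penalty matrix by k, ASSUMING accuracy = 1 - (penalty-weighted loss)
rescaled_accuracy <- function(acc, k) 1 - k * (1 - acc)

# Predictions for k = 1/2; compare with the printed 0.8537467 and 0.9308467
pred_constant_half <- rescaled_accuracy(acc_constant, 0.5)
pred_ordered_half  <- rescaled_accuracy(acc_ordered, 0.5)
print(c(pred_constant_half, pred_ordered_half))

# Standard errors scaled by the same k; compare with 0.01382368 and 0.00811192
print(c(se_constant, se_ordered) * 0.5)
```

If this rule holds in general, a scale factor k close to 0 pushes the estimate 
towards 1 whatever the pairwise accuracies are.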

So, I would like to ask: (a) should the penalty matrix always contain at least 
one penalty with a value of 1; and/or (b) should there be any other constraint 
on the sum of penalties in the matrix (e.g. should the matrix sum to some 
multiple of the number of categories); or (c) is one free to use 
arbitrarily scaled penalty matrices when estimating accuracy against an ordinal 
gold standard?

Thanks, in advance, for your help,

Jonathan Williams


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
