[R] selecting only corresponding categories from a confusion matrix

2010-11-29 Thread drflxms
Dear R colleagues,

as a result of my calculations regarding the inter-observer-variability
in bronchoscopy, I get a confusion matrix like the following:

       0   1 1001 1010  11
0    609  11   54   36   6
1      1   2    6    0   2
10    14   0    0    8   4
100    4   0    0    0   0
1000  23   7   12   10   5
1001   0   0    4    0   0
1010   4   0    0    3   0
1011   1   0    1    0   2
11     0   0    3    3   1
110    1   0    0    0   0
1100   2   0    0    0   0
1110   1   0    0    0   0

The first column lists the categories found among observers; the top
row lists the categories found by the reference (gold standard).
I am looking for a way (a general algorithm) to extract a data.frame
with only the corresponding categories among observers and reference
from the above confusion matrix. "Corresponding" here means that a
category has been chosen by both observers and reference.
In this example the corresponding categories are simply all categories
chosen by the reference (0, 1, 1001, 1010, 11), but in general there
may also be categories that are found by the reference only (and not
among observers, i.e. not in the first column).
So the solution data.frame for the above example would look like:

       0   1 1001 1010  11
0    609  11   54   36   6
1      1   2    6    0   2
1001   0   0    4    0   0
1010   4   0    0    3   0
11     0   0    3    3   1

All categories found among observers only were omitted.

If the solution algorithm also included a method to list the omitted
categories and to count their number as well as the number of omitted
cases, it would be just perfect for me.

I'd be happy to read from you soon! Thanks in advance for any kind of
help with this.
Greetings from snowy Munich, Felix

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] selecting only corresponding categories from a confusion matrix

2010-11-29 Thread David Winsemius


On Nov 29, 2010, at 8:32 AM, drflxms wrote:


> Dear R colleagues,
>
> as a result of my calculations regarding the inter-observer-variability
> in bronchoscopy, I get a confusion matrix like the following:
>
>        0   1 1001 1010  11
> 0    609  11   54   36   6
> 1      1   2    6    0   2
> 10    14   0    0    8   4
> 100    4   0    0    0   0
> 1000  23   7   12   10   5
> 1001   0   0    4    0   0
> 1010   4   0    0    3   0
> 1011   1   0    1    0   2
> 11     0   0    3    3   1
> 110    1   0    0    0   0
> 1100   2   0    0    0   0
> 1110   1   0    0    0   0
>
> The first column lists the categories found among observers; the top
> row lists the categories found by the reference (gold standard).
> I am looking for a way (a general algorithm) to extract a data.frame
> with only the corresponding categories among observers and reference
> from the above confusion matrix. "Corresponding" here means that a
> category has been chosen by both observers and reference.
> In this example the corresponding categories are simply all categories
> chosen by the reference (0, 1, 1001, 1010, 11), but in general there
> may also be categories that are found by the reference only (and not
> among observers, i.e. not in the first column).
> So the solution data.frame for the above example would look like:
>
>        0   1 1001 1010  11
> 0    609  11   54   36   6
> 1      1   2    6    0   2
> 1001   0   0    4    0   0
> 1010   4   0    0    3   0
> 11     0   0    3    3   1


I wasn't able to follow the confusing, er, confusion matrix
explanation, but it appears from a comparison of the input and output
that you just want the rows whose names are also column names:


 mtx[colnames(mtx), ]
       0  1 1001 1010 11
0    609 11   54   36  6
1      1  2    6    0  2
1001   0  0    4    0  0
1010   4  0    0    3  0
11     0  0    3    3  1

 # and the omitted

 mtx[!rownames(mtx) %in% colnames(mtx), ]
      0 1 1001 1010 11
10   14 0    0    8  4
100   4 0    0    0  0
1000 23 7   12   10  5
1011  1 0    1    0  2
110   1 0    0    0  0
1100  2 0    0    0  0
1110  1 0    0    0  0

 # and their number:

 NROW(mtx[!rownames(mtx) %in% colnames(mtx), ])
[1] 7
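
One caveat with indexing by colnames(mtx) directly: if a reference category never occurs among the observers (the general case mentioned in the question), mtx[colnames(mtx), ] stops with a "subscript out of bounds" error. A minimal sketch of a more defensive variant follows, assuming the row and column names are character strings. The matrix values are read off the listing in this thread and should be treated as an assumption, and the names `common` and `corresponding` are illustrative only.

```r
# Rebuild the example matrix (rows: observers, columns: reference).
# The counts are an assumption reconstructed from the thread's listing.
mtx <- matrix(
  c(609, 11, 54, 36, 6,
      1,  2,  6,  0, 2,
     14,  0,  0,  8, 4,
      4,  0,  0,  0, 0,
     23,  7, 12, 10, 5,
      0,  0,  4,  0, 0,
      4,  0,  0,  3, 0,
      1,  0,  1,  0, 2,
      0,  0,  3,  3, 1,
      1,  0,  0,  0, 0,
      2,  0,  0,  0, 0,
      1,  0,  0,  0, 0),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    c("0", "1", "10", "100", "1000", "1001", "1010", "1011",
      "11", "110", "1100", "1110"),
    c("0", "1", "1001", "1010", "11")
  )
)

# Categories chosen by both observers (row names) and reference (column names).
common <- intersect(rownames(mtx), colnames(mtx))

# Index rows and columns by the shared names only, so a category that is
# missing on either side is dropped instead of raising an error.
# drop = FALSE keeps a matrix even when only one category is shared.
corresponding <- mtx[common, common, drop = FALSE]
print(corresponding)
```

For this example the result matches the output of mtx[colnames(mtx), ] above, because every reference category also appears among the observers.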




> All categories found among observers only were omitted.
>
> If the solution algorithm also included a method to list the omitted
> categories and to count their number as well as the number of omitted
> cases, it would be just perfect for me.
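
The request to count the omitted cases, and not only the omitted categories, can be sketched as below. This is one possible approach rather than a definitive answer: it assumes "omitted cases" means the sum of all counts in the dropped observer rows, and it reuses the matrix values as read off the listing in this thread (an assumption). The names `omitted_rows`, `omitted`, `cases_per_category` and `n_omitted_cases` are illustrative.

```r
# Example matrix (rows: observers, columns: reference); the counts are an
# assumption reconstructed from the thread's listing.
mtx <- matrix(
  c(609, 11, 54, 36, 6,
      1,  2,  6,  0, 2,
     14,  0,  0,  8, 4,
      4,  0,  0,  0, 0,
     23,  7, 12, 10, 5,
      0,  0,  4,  0, 0,
      4,  0,  0,  3, 0,
      1,  0,  1,  0, 2,
      0,  0,  3,  3, 1,
      1,  0,  0,  0, 0,
      2,  0,  0,  0, 0,
      1,  0,  0,  0, 0),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    c("0", "1", "10", "100", "1000", "1001", "1010", "1011",
      "11", "110", "1100", "1110"),
    c("0", "1", "1001", "1010", "11")
  )
)

# Observer categories that the reference never used.
omitted_rows <- setdiff(rownames(mtx), colnames(mtx))

# The dropped part of the matrix; drop = FALSE keeps it a matrix
# even if only a single row is omitted.
omitted <- mtx[omitted_rows, , drop = FALSE]

# Number of omitted categories.
n_omitted_categories <- length(omitted_rows)

# Cases lost per omitted category, and in total.
cases_per_category <- rowSums(omitted)
n_omitted_cases <- sum(omitted)

print(omitted_rows)
print(n_omitted_categories)
print(cases_per_category)
print(n_omitted_cases)
```

Reference categories that never occur among the observers could be listed the same way with setdiff(colnames(mtx), rownames(mtx)); in this example that set is empty.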


David Winsemius, MD
West Hartford, CT
