[R] Subset rows over multiple columns

Doran, Harold Thu, 13 Apr 2006 11:32:51 -0700

I have a data frame where I need to subset certain rows before I compute
the mean of another variable. However, the value that I need to subset
by is found in multiple columns. For example, in the data below the
value R0000160 is found in the first and second columns (itd_1 and
itd_45).  These data are student responses to multiple choice test items
from a computer adaptive test. So, the variable itd_1 denotes that item
i was presented to student k in position t, and the variables
righta_a and righta_b denote a correct (1) or incorrect (0) response to
that item when it was presented.


My goal is to get the p-value (mean of the binary variable) for each
item irrespective of when it was presented to the student.

So, in the sample case below, I would use all elements in righta_a
(except for the second to last) and then only the second to last value
in righta_b.

> tail(tt)
         itd_1   itd_45 righta_a righta_b
18407 R0000160 R0208470        1        0
18412 R0000160 R0238140        0        1
18417 R0000160 R0259690        1        1
18422 R0000160 R0000730        1        1
18450 R0113750 R0000160        1        1
18456 R0000160 R0238690        0        1
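
For concreteness, the subsetting described above can be sketched on just the six rows shown. The data frame is rebuilt by hand from the tail() output, and the names target, responses and p_value are only illustrative; none of this is from the real data beyond those six rows.

```r
# Rebuild the six rows shown by tail(tt); the real tt is much larger.
tt <- data.frame(
  itd_1    = c("R0000160", "R0000160", "R0000160", "R0000160", "R0113750", "R0000160"),
  itd_45   = c("R0208470", "R0238140", "R0259690", "R0000730", "R0000160", "R0238690"),
  righta_a = c(1, 0, 1, 1, 1, 0),
  righta_b = c(0, 1, 1, 1, 1, 1),
  row.names = c("18407", "18412", "18417", "18422", "18450", "18456"),
  stringsAsFactors = FALSE
)

target <- "R0000160"  # illustrative name for the item of interest

# Responses to the target item, taken from whichever position it appeared in:
# righta_a where the item is in itd_1, righta_b where it is in itd_45.
responses <- c(tt$righta_a[tt$itd_1 == target],
               tt$righta_b[tt$itd_45 == target])

# Proportion correct for the target item across both positions.
p_value <- mean(responses)
```

This handles one item at a time, so it would have to be repeated (or looped) for every item code in the data.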

One thing I can envision doing is using reshape so that itd_1 and
itd_45 would be in the "long" format. This would stack itd_1 and
itd_45 in a single column, and likewise righta_a and righta_b, and
I could then use tapply to get what I need without any subsetting.
That is

testScores <- reshape(tt, idvar='id', varying=list(c('itd_1', 'itd_45'),
c('righta_a', 'righta_b')), v.names=c('item','answer'),
timevar='item_position', direction='long')

with(testScores, tapply(answer, item, mean))

Or I could get

with(testScores, tapply(answer, list(item, item_position), mean))

The only problem here is that I have some duplicate IDs in the data and
reshape doesn't like turning data on its head in that situation, so I
would need to tinker with those first. 
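
One possible way around the duplicate IDs, sketched under assumptions: add a column that is unique per row and hand that to reshape as idvar instead of the original id. The column name row_id is hypothetical, and the id values below are made up purely to show a duplicate; the item and answer columns are the six rows shown earlier.

```r
# Six rows from tail(tt) plus a made-up id column containing a duplicate.
tt <- data.frame(
  id       = c(1, 1, 2, 3, 4, 5),  # made-up values; id 1 is duplicated
  itd_1    = c("R0000160", "R0000160", "R0000160", "R0000160", "R0113750", "R0000160"),
  itd_45   = c("R0208470", "R0238140", "R0259690", "R0000730", "R0000160", "R0238690"),
  righta_a = c(1, 0, 1, 1, 1, 0),
  righta_b = c(0, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical surrogate key: one distinct value per row.
tt$row_id <- seq_len(nrow(tt))

# Same reshape call as above, but keyed on row_id rather than id.
testScores <- reshape(tt, idvar = "row_id",
                      varying = list(c("itd_1", "itd_45"),
                                     c("righta_a", "righta_b")),
                      v.names = c("item", "answer"),
                      timevar = "item_position", direction = "long")

# Proportion correct per item, irrespective of position.
p_values <- with(testScores, tapply(answer, item, mean))
```

The original id column is carried along unchanged, so it is still available afterwards if responses need to be traced back to students.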

So, I have what I think would be a solution, but I wonder if there is
another way that preserves the data in this "wide" format and still
gets the estimates I need. Maybe it is just easier to use reshape. Any
suggestions?
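
A minimal sketch of one way to leave tt in the wide format, assuming only the four columns shown: concatenate the item columns and the response columns into two parallel vectors and pass those to tapply. The names items, answers, position, p_values and p_by_position are illustrative, and the data frame is the six sample rows rebuilt by hand.

```r
# Six rows from tail(tt), rebuilt by hand.
tt <- data.frame(
  itd_1    = c("R0000160", "R0000160", "R0000160", "R0000160", "R0113750", "R0000160"),
  itd_45   = c("R0208470", "R0238140", "R0259690", "R0000730", "R0000160", "R0238690"),
  righta_a = c(1, 0, 1, 1, 1, 0),
  righta_b = c(0, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# as.character() guards against the item columns being factors, in which
# case c() could combine integer codes rather than the item labels.
items   <- c(as.character(tt$itd_1), as.character(tt$itd_45))
answers <- c(tt$righta_a, tt$righta_b)

# Which column each stacked element came from, in the same order as above.
position <- rep(c("itd_1", "itd_45"), each = nrow(tt))

# Proportion correct per item, irrespective of position.
p_values <- tapply(answers, items, mean)

# Proportion correct per item and position (NA where a combination is absent).
p_by_position <- tapply(answers, list(items, position), mean)
```

Because nothing is keyed on an ID here, duplicate IDs in the data should not matter for this calculation; tt itself is never modified.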

Harold
Windows XP
R 2.2.1


