I have a data frame where I need to subset certain rows before I compute
the mean of another variable. However, the value that I need to subset
by is found in multiple columns. For example, in the data below the
value R0000160 is found in the first and second columns (itd_1 and
itd_45). These data are student responses to multiple choice test items
from a computer adaptive test. The variables itd_1 and itd_45 record
which item was presented to a student in a given position, and the
variables righta_a and righta_b record a correct (1) or incorrect (0)
response to the item in itd_1 and itd_45, respectively.
My goal is to get the p-value (mean of the binary variable) for each
item irrespective of when it was presented to the student.
So, in the sample case below, for item R0000160 I would use all
elements of righta_a except the second to last, plus only the
second-to-last value of righta_b.
> tail(tt)
         itd_1   itd_45 righta_a righta_b
18407 R0000160 R0208470        1        0
18412 R0000160 R0238140        0        1
18417 R0000160 R0259690        1        1
18422 R0000160 R0000730        1        1
18450 R0113750 R0000160        1        1
18456 R0000160 R0238690        0        1
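To make the target concrete, here is a minimal sketch of that calculation for the six rows shown, using only base R. The data frame is rebuilt by hand from the printed output, and the names item, resp, and p_value are only illustrative.

```r
# Rebuild the six rows printed above (row names kept as in the output).
tt <- data.frame(
  itd_1    = c("R0000160", "R0000160", "R0000160", "R0000160", "R0113750", "R0000160"),
  itd_45   = c("R0208470", "R0238140", "R0259690", "R0000730", "R0000160", "R0238690"),
  righta_a = c(1, 0, 1, 1, 1, 0),
  righta_b = c(0, 1, 1, 1, 1, 1),
  row.names = c("18407", "18412", "18417", "18422", "18450", "18456"),
  stringsAsFactors = FALSE
)

item <- "R0000160"  # illustrative: the item of interest

# Collect the responses to the item wherever it appeared:
# righta_a when the item is in itd_1, righta_b when it is in itd_45.
resp <- c(tt$righta_a[tt$itd_1 == item], tt$righta_b[tt$itd_45 == item])

# The item p-value is the mean of those binary responses.
p_value <- mean(resp)
```

This works for one item at a time, which is why I am looking for something that covers all items at once.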
One thing I can envision doing is using reshape to put the data in
"long" format. This would stack itd_1 and itd_45 in a single column,
and likewise righta_a and righta_b, and I could then use tapply to get
what I need without any subsetting. That is:
testScores <- reshape(tt, idvar='id',
                      varying=list(c('itd_1', 'itd_45'),
                                   c('righta_a', 'righta_b')),
                      v.names=c('item', 'answer'),
                      timevar='item_position', direction='long')
with(testScores, tapply(answer, item, mean))
Or I could get the means by item and position:
with(testScores, tapply(answer, list(item, item_position), mean))
The only problem here is that I have some duplicate IDs in the data and
reshape doesn't like turning data on its head in that situation, so I
would need to tinker with those first.
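One way I could tinker with the duplicates is sketched below, under the assumption that a plain row counter is an acceptable stand-in for the ID. The id values and the row_id column are hypothetical and made up for illustration; the other columns repeat the six rows shown above.

```r
# Six sample rows plus a hypothetical student id that contains a duplicate (103).
tt <- data.frame(
  id       = c(101, 102, 103, 103, 104, 105),  # hypothetical ids
  itd_1    = c("R0000160", "R0000160", "R0000160", "R0000160", "R0113750", "R0000160"),
  itd_45   = c("R0208470", "R0238140", "R0259690", "R0000730", "R0000160", "R0238690"),
  righta_a = c(1, 0, 1, 1, 1, 0),
  righta_b = c(0, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Assumption: a unique row counter can serve as idvar in place of the duplicated id.
tt$row_id <- seq_len(nrow(tt))

# Stack the item columns and the response columns into long format.
testScores <- reshape(tt, idvar = "row_id",
                      varying = list(c("itd_1", "itd_45"),
                                     c("righta_a", "righta_b")),
                      v.names = c("item", "answer"),
                      timevar = "item_position", direction = "long")

# Mean of the binary response for each item, regardless of position.
p_by_item <- with(testScores, tapply(answer, item, mean))
```

The original id column is carried along unchanged, so nothing is lost by keying the reshape on row_id.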
So, I have what I think would be a solution, but I wonder whether there
is another way to keep the data in this "wide" format and still get the
estimates I need. Maybe it is just easier to use reshape. Any
suggestions?
Harold
Windows XP
R 2.2.1