Hello,
I am relatively new to R, and I am trying to select the last
observation within a group, where the group is defined by two
variables. One of the variables is a date.
In the example below, C3 varies within C2, which varies within C1. I
need to select the last value of C3 for each of the 4 C1/C2
combinations: 1x, 1y, 2x, and 2y. In my real dataset, C2 is a date
(mm/dd/yy).
C1 C2 C3
1 x 1
1 x 2
1 y 1
1 y 2
2 x 1
2 x 2
2 y 1
2 y 2
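In the spirit of the posting guide's request for reproducible code, here is a minimal sketch that rebuilds the table above as a data frame. Because the real C2 holds mm/dd/yy dates, this version uses two hypothetical date strings (01/15/09 and 02/20/09 standing in for x and y) and converts them with as.Date(), so that later sorting and grouping compare real dates rather than text.

```r
# Rebuild the example table; the two dates are hypothetical stand-ins
# for the "x" and "y" values of C2 in the post.
mydata <- data.frame(
  C1 = c(1, 1, 1, 1, 2, 2, 2, 2),
  C2 = c("01/15/09", "01/15/09", "02/20/09", "02/20/09",
         "01/15/09", "01/15/09", "02/20/09", "02/20/09"),
  C3 = c(1, 2, 1, 2, 1, 2, 1, 2)
)

# Parse the mm/dd/yy strings into Date objects (%y maps 00-68 to 20xx).
mydata$C2 <- as.Date(mydata$C2, format = "%m/%d/%y")
str(mydata)
```

Converting first matters because mm/dd/yy text does not sort chronologically across years: as text, "12/31/08" sorts after "01/15/09" even though it is the earlier date.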
I have found code (from the UCLA R FAQs and this list's archives) for
selecting the last observation when a group is defined by ONE variable
(e.g., C1):
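# split mydata by C1 and keep the last row of each piece
last <- by(mydata, mydata$C1, tail, n = 1)
# stack the one-row data frames back into a single data frame
lastd <- do.call("rbind", as.list(last))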
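Applied to the example table, the one-variable recipe can be checked with a short sketch like this one (mydata is rebuilt with C2 as the letters from the post so the block runs on its own):

```r
mydata <- data.frame(
  C1 = c(1, 1, 1, 1, 2, 2, 2, 2),
  C2 = c("x", "x", "y", "y", "x", "x", "y", "y"),
  C3 = c(1, 2, 1, 2, 1, 2, 1, 2)
)

# by() splits the rows by C1 and calls tail(piece, n = 1) on each piece,
# so each element of 'last' is a one-row data frame.
last <- by(mydata, mydata$C1, tail, n = 1)

# rbind() the one-row data frames back into a single data frame.
lastd <- do.call("rbind", as.list(last))
lastd
```

Only one row per value of C1 survives (the 2y row for each), which is why a second grouping variable is needed to get 1x and 2x as well.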
The by() function does not seem to accept two variables in its INDICES
argument:
last <- by(mydata, mydata$C1 mydata$C2, tail, n = 1)  # THIS DOESN'T WORK
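For what it's worth, by() documents INDICES as "a factor or a list of factors", so two grouping variables can be passed together inside list(); the attempt above fails mainly because "mydata$C1 mydata$C2" is not valid R syntax. A minimal sketch on the example data, with C2 as letters:

```r
mydata <- data.frame(
  C1 = c(1, 1, 1, 1, 2, 2, 2, 2),
  C2 = c("x", "x", "y", "y", "x", "x", "y", "y"),
  C3 = c(1, 2, 1, 2, 1, 2, 1, 2)
)

# Pass both grouping variables as a list; by() then calls tail() once
# for each C1/C2 combination present in the data.
last <- by(mydata, list(mydata$C1, mydata$C2), tail, n = 1)

# Combinations absent from the data come back as NULL, and rbind() drops them.
lastd <- do.call("rbind", as.list(last))
lastd
```

As with the one-variable version, tail() takes the last row in the data's current order, so the data should be sorted (e.g., by date) first if it is not already.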
I tried creating a new variable C1*C2, but I think this is risky
because the product may not be unique for every combination of C1 and
C2 (for example, 1*6 and 2*3 both equal 6), and my dataset is very
large.
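A combined key may not be needed at all. One alternative (a different technique from by()) is duplicated() on the two grouping columns with fromLast = TRUE: it is FALSE only for the last occurrence of each C1/C2 pair, so negating it keeps one row per group. The sketch below sorts first so that "last" means the largest C3 within each group; with real dates, C2 would be converted with as.Date() before sorting.

```r
mydata <- data.frame(
  C1 = c(1, 1, 1, 1, 2, 2, 2, 2),
  C2 = c("x", "x", "y", "y", "x", "x", "y", "y"),
  C3 = c(1, 2, 1, 2, 1, 2, 1, 2)
)

# Sort so the row we want sits last within each C1/C2 group.
mydata <- mydata[order(mydata$C1, mydata$C2, mydata$C3), ]

# duplicated(..., fromLast = TRUE) is FALSE only for the last occurrence of
# each C1/C2 pair, so negating it keeps exactly that row for every group.
lastd <- mydata[!duplicated(mydata[, c("C1", "C2")], fromLast = TRUE), ]
lastd
```

If an explicit key is still wanted, paste(mydata$C1, mydata$C2, sep = "_") gives a distinct string for each combination as long as the separator never occurs in the values, unlike a product. This approach also avoids building one small data frame per group, which may help on a very large dataset.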
Thank you for the help,
______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.