I have a data set with about 6 million rows and 50 columns. It is a mixture of dates, factors, and numerics.
What I am trying to accomplish can be seen with the following simplified data, which is given as dput output below. > head(myData) mydate gender mygroup id 1 2012-03-25 F A 1 2 2005-05-23 F B 2 3 2005-09-08 F B 2 4 2005-12-07 F B 2 5 2006-02-26 F C 2 6 2006-05-13 F C 2 For each id, I want to count the number of changes of the variable 'mygroup' that occur. For example, id=1 has 0 changes because it is observed only once. id=2 has 2 changes (B to C, and C to D). I also need to calculate the total observation time for each id using the variable mydate. In the end, I am trying to have a new data set in which each row has an id, days observed, number of changes, and gender. I made some simple summaries using data.table and plyr, but I'm stuck on this reformatting. Thanks for your help. myData <- structure(list(mydate = c("2012-03-25", "2005-05-23", "2005-09-08", "2005-12-07", "2006-02-26", "2006-05-13", "2006-09-01", "2006-12-12", "2006-02-19", "2006-05-03", "2006-04-23", "2007-12-08", "2011-03-19", "2007-12-20", "2008-06-15", "2008-12-16", "2009-06-07", "2009-10-09", "2010-01-28", "2007-06-05"), gender = c("F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M", "M", "M"), mygroup = c("A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D", "D", "D", "A", "A", "A", "B", "B", "B", "A"), id = c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L)), .Names = c("mydate", "gender", "mygroup", "id"), class = "data.frame", row.names = c(NA, -20L )) > sessionInfo() R version 2.13.1 (2011-07-08) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.