Hi,

I have a data set in which the variable 'dose' is time-varying. Currently,
the data set is in a long format, with one row per time unit of follow-up
for each individual ("Id"). It looks like this:


orig.data <- cbind(Id = c(rep(1, 4), rep(2, 5)), time = c(1:4, 1:5),
                   dose = c(1, 1, 1, 0, 1, 0, 1, 1, 0))

orig.data
      Id time dose
 [1,]  1    1    1
 [2,]  1    2    1
 [3,]  1    3    1
 [4,]  1    4    0
 [5,]  2    1    1
 [6,]  2    2    0
 [7,]  2    3    1
 [8,]  2    4    1
 [9,]  2    5    0

What I would like to do is convert the data set into an interval format,
by which I mean a data set in which each row has a 'Start' and a 'Stop'
value giving the first and last time unit of a stretch during which 'dose'
is constant. For example, my orig.data example would become:

int.data <- cbind(Id = c(rep(1, 2), rep(2, 4)), Start = c(1, 4, 1, 2, 3, 5),
                  Stop = c(3, 4, 1, 2, 4, 5), dose = c(1, 0, 1, 0, 1, 0))

int.data
     Id Start Stop dose
[1,]  1     1    3    1
[2,]  1     4    4    0
[3,]  2     1    1    1
[4,]  2     2    2    0
[5,]  2     3    4    1
[6,]  2     5    5    0
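One way to pin down the definition: if each interval covers the time units
Start through Stop inclusive, expanding every row of int.data back to one
row per time unit should reproduce orig.data exactly. A minimal base R
sketch of that round-trip check, assuming inclusive Start/Stop and whole
time units (the name `back` is just for illustration):

```r
# Interval data from the example (one row per constant-dose stretch)
int.data <- cbind(Id = c(1, 1, 2, 2, 2, 2), Start = c(1, 4, 1, 2, 3, 5),
                  Stop = c(3, 4, 1, 2, 4, 5), dose = c(1, 0, 1, 0, 1, 0))

# Number of time units covered by each interval (Start and Stop inclusive)
len <- int.data[, "Stop"] - int.data[, "Start"] + 1

# Repeat each interval's Id and dose once per covered time unit, and
# generate the sequence Start:Stop for every interval
back <- cbind(Id   = rep(int.data[, "Id"], len),
              time = unlist(mapply(seq, int.data[, "Start"], int.data[, "Stop"],
                                   SIMPLIFY = FALSE)),
              dose = rep(int.data[, "dose"], len))
back
```

If the intervals are correct, `back` has the same nine rows as orig.data.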

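Basically, this means collapsing consecutive rows that have the same "Id"
and "dose" into a single row, and creating "Start" and "Stop" to index the
time. Only consecutive rows are merged: for Id 2, dose is 1 at times 1, 3
and 4, which gives two separate intervals (1 to 1, and 3 to 4).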
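The collapsing step can be done without loops by flagging the rows where a
new run starts. A minimal base R sketch, assuming the rows are sorted by Id
and then time with no gaps in follow-up (the vectors `first` and `last` are
illustrative names only):

```r
orig.data <- cbind(Id = c(rep(1, 4), rep(2, 5)), time = c(1:4, 1:5),
                   dose = c(1, 1, 1, 0, 1, 0, 1, 1, 0))

# Assumes rows are sorted by Id, then time, with no gaps in time
n    <- nrow(orig.data)
id   <- orig.data[, "Id"]
tm   <- orig.data[, "time"]
dose <- orig.data[, "dose"]

# TRUE where a new run starts: the first row, or a row whose Id or dose
# differs from the previous row
first <- c(TRUE, id[-1] != id[-n] | dose[-1] != dose[-n])
# TRUE where a run ends: the row just before each run start, plus the last row
last <- c(first[-1], TRUE)

int.data <- cbind(Id = id[first], Start = tm[first],
                  Stop = tm[last], dose = dose[first])
int.data
```

Everything here is vectorized, so the cost grows roughly linearly with the
number of rows.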

While I could write a clumsy routine with multiple loops to do this, it
would be inefficient and would not scale to large data sets.

I wonder if people know of a function that would reshape my data set from
'long' to 'interval'?
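For reuse, the same idea can be wrapped in a function built on base R's
rle(), applied separately to each Id. This is only a sketch:
long_to_interval() is a hypothetical name, not a function from R or any
package, and it assumes time increases by one unit per row within each Id
and that the value column has no missing values.

```r
# Hypothetical helper (not from any package): collapse consecutive rows with
# the same id and value into Start/Stop intervals using base R's rle().
long_to_interval <- function(data, id = "Id", time = "time", value = "dose") {
  data <- as.data.frame(data)
  # Sort by id, then time, so that runs are consecutive within each id
  data <- data[order(data[[id]], data[[time]]), ]
  pieces <- lapply(split(data, data[[id]]), function(d) {
    r     <- rle(d[[value]])        # run lengths of the value within this id
    stop  <- cumsum(r$lengths)      # row index where each run ends
    start <- stop - r$lengths + 1   # row index where each run begins
    data.frame(Id = d[[id]][start], Start = d[[time]][start],
               Stop = d[[time]][stop], dose = r$values)
  })
  out <- do.call(rbind, pieces)
  names(out) <- c(id, "Start", "Stop", value)  # restore the caller's names
  rownames(out) <- NULL
  out
}

orig.data <- cbind(Id = c(rep(1, 4), rep(2, 5)), time = c(1:4, 1:5),
                   dose = c(1, 1, 1, 0, 1, 0, 1, 1, 0))
out <- long_to_interval(orig.data)
out
```

Here `out` should hold the same six intervals as int.data, as a data frame.
The split()/lapply() step still loops over Ids internally, so for very large
data the fully vectorized run-start approach may be faster; the data.table
package also provides rleid() for generating run identifiers.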

Best,

MP


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
