[R] counting events by subject with black out windows

2011-11-18 Thread Chris Conner
I have a large dataset that includes subjects (ID), dates and events that need 
to be counted.  Not every date includes an event, and I need to count only one 
event per 30 days, per subject.  So in essence, I need to create a 30-day black 
out period for each subject, during which no further event can be counted.  The 
reason is that a rule has been set up whereby a subject can only be counted 
once per 30-day period (the 30-day window includes the day the event of 
interest is counted).
 
The solution should count only the following events per subject (per the 
30-day blackout rule):
 
ID Date 
auto1 1/1/2010 
auto2 2/12/2010 
auto2 4/21/2011 
auto3 3/1/2010 
auto3 5/3/2010 
 
I have created a multistep process to do this, but it is extremely clumsy 
(detailed below).  I have to believe that one of you has a much more elegant 
solution.  Thank you all in advance for any help.
 
## example data
data1 <- structure(list(ID = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L), .Label = c("", "auto1", "auto2", "auto3"), class = "factor"),
Date = structure(c(14610, 14610, 14627, 14680, 14652, 14660, 14725, 15085,
15086, 14642, 14669, 14732, 14747, 14749), class = "Date"), event = c(1L, 1L,
1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L)), .Names = c("ID",
"Date", "event"), class = "data.frame", row.names = c(NA, -14L))
## remove non events
data2 <- data1[data1$event == 1, ]
library(doBy)
## create a table of first events
step1 <- summaryBy(Date ~ ID, data = data2, FUN = min)
step1$Date30 <- step1$Date.min + 30
step2 <- merge(data2, step1, by.x = "ID", by.y = "ID")
## use an ifelse to essentially remove any events that shouldn't be counted
step2$event <- ifelse(as.numeric(step2$Date) >= step2$Date.min &
                      as.numeric(step2$Date) <= step2$Date30, 0, step2$event)
## basically repeat steps above until I get an error (no more events)
data3 <- step2[step2$event == 1, ]
data3 <- data3[, 1:3]
step3 <- summaryBy(Date ~ ID, data = data3, FUN = min)
step3$Date30 <- step3$Date.min + 30
step4 <- merge(data3, step3, by.x = "ID", by.y = "ID")
step4$event <- ifelse(as.numeric(step4$Date) >= step4$Date.min &
                      as.numeric(step4$Date) <= step4$Date30, 0, step4$event)
## then I rbind the keepers
## in this case steps 1 and 3 above
final <- rbind(step1, step3)
## then reformat
final <- final[, 1:2]
final$Date.min <- as.Date(final$Date.min, origin = "1970-01-01")
## again, extremely clumsy, but it works...  HELP! :)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] counting events by subject with black out windows

2011-11-18 Thread Dennis Murphy
Hi:

Here's a Q & D solution that could be improved. It uses the plyr
package. Starting from your data1 data frame,

library('plyr')
dseq <- seq(as.Date('2010-01-01'), as.Date('2011-06-05'), by = '30 days')
# Use the cut() function to create a factor whose levels are demarcated
# by the dates in dseq:
# See ?cut for labeling options
data1[['tf']] <- cut(data1$Date, dseq)
ddply(subset(data1, event == 1L), .(tf),  summarise, Date.min = min(Date))

          tf   Date.min
1 2010-01-01 2010-01-01
2 2010-01-31 2010-02-12
3 2010-05-01 2010-05-03
4 2011-03-27 2011-04-21

The value of tf is the left endpoint of the time interval.
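
As a small illustration of that labelling, here is a sketch on a few made-up dates (x and the shortened dseq below are assumptions for the example, not the posted data). It also shows why Mar. 1 falls at the end of the interval that starts on Jan. 31:

```r
# Four 30-day break points starting from 2010-01-01
dseq <- seq(as.Date("2010-01-01"), by = "30 days", length.out = 4)

# A few illustrative dates (hypothetical, chosen to sit near the breaks)
x <- as.Date(c("2010-01-01", "2010-01-30", "2010-01-31", "2010-03-01"))

# cut() on Dates uses left-closed intervals by default (right = FALSE),
# and each level is labelled with the left endpoint of its interval
tf <- cut(x, dseq)
data.frame(x = x, tf = as.character(tf))
```

The first two dates land in the interval labelled 2010-01-01 and the last two in the one labelled 2010-01-31.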

This isn't your desired output in two respects: (1) summarise won't
carry along extra variables, so ID gets dropped; (2) you have
2010-03-01 as the first date of a 30-day period, but according to the
way I defined the 30-day intervals, Mar. 1 is the last day of an
interval, so that's why it's not included [2010-2-12 precedes it]. You
can always change the definitions. If you group by months instead, you
get the output you expected.
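
If the blackout has to roll forward from each counted event within a subject, rather than follow fixed bins, one possible approach is a small loop per ID. This is only a sketch: keep_after_blackout() is a hypothetical helper, the data frame is rebuilt from the posted values with data.frame() for readability, and it assumes (following the ifelse in the original post) that an event is blocked through day Date + 30 inclusive.

```r
# Posted example data, rebuilt with data.frame() (same values as data1)
data1 <- data.frame(
  ID = rep(c("auto1", "auto2", "auto3"), times = c(4, 5, 5)),
  Date = as.Date(c(14610, 14610, 14627, 14680,
                   14652, 14660, 14725, 15085, 15086,
                   14642, 14669, 14732, 14747, 14749),
                 origin = "1970-01-01"),
  event = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: given one subject's event dates, keep the earliest,
# then skip every date up to and including (kept date + window), and repeat.
keep_after_blackout <- function(dates, window = 30) {
  dates <- sort(unique(dates))
  keep <- logical(length(dates))
  last_kept <- as.Date(NA)
  for (i in seq_along(dates)) {
    if (is.na(last_kept) || dates[i] > last_kept + window) {
      keep[i] <- TRUE
      last_kept <- dates[i]
    }
  }
  dates[keep]
}

# Drop non-events, then apply the helper separately to each subject
events <- data1[data1$event == 1L, ]
kept <- lapply(split(events$Date, events$ID), keep_after_blackout)

# Reassemble into a data frame that carries ID along
result <- data.frame(
  ID = rep(names(kept), times = lengths(kept)),
  Date = do.call(c, unname(kept)),
  stringsAsFactors = FALSE
)
result
```

On the posted data this keeps the five ID/Date pairs listed in the original message. If the window should instead end on day 29 after the counted event, pass window = 29.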

Hope this is enough to get you started..
Dennis



On Fri, Nov 18, 2011 at 3:22 PM, Chris Conner connerpha...@yahoo.com wrote:


