Sorry, there was an error in the seq statement. Here is the corrected version.

date.grouping <- function(d) {
  # for each "yyyy-mm" date in d, return the "yyyy-mm" that begins the
  # 6 month period containing it; periods are measured from the minimum date
  dates <- as.Date(paste(as.character(d), "01", sep = "-"))
  # breaks every 6 months from the earliest date; running 186 days past the
  # latest date guarantees that the last period is closed off
  breaks <- seq(from = min(dates), to = max(dates) + 186, by = "6 months")
  format(as.Date(cut(dates, breaks, right = FALSE)), "%Y-%m")
}

patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
patients2 <- as.data.frame( patients2 )

summary(patients2)

boxplot(patients2)
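Note that date.grouping measures the 6-month periods from the earliest date in the whole data set, so the columns of patients2 are calendar periods. The question asks for periods counted from each patient's own index (first) encounter. A minimal sketch of that variant, assuming dates are "yyyy-mm" strings as in the sample data, numbers each patient's windows 0, 1, 2, ... with ave(). The helper month.index and the names window and patients3 are hypothetical, introduced only for this illustration.

```r
# Sample data from the question (ID, "yyyy-mm" month of encounter, cost)
patients <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  date = c("2001-01", "2001-01", "2001-03", "2001-04", "2001-03", "2002-05"),
  cost = c(200.00, 123.94, 100.23, 150.34, 296.34, 156.36),
  stringsAsFactors = FALSE
)

# hypothetical helper: month count for each "yyyy-mm" string,
# e.g. "2001-03" -> 2001 * 12 + 2
month.index <- function(d) {
  parts <- do.call(rbind, strsplit(as.character(d), "-"))
  12 * as.numeric(parts[, 1]) + as.numeric(parts[, 2]) - 1
}

# 6-month window number measured from each patient's own first (index)
# encounter: 0 = months 0-5 after index, 1 = months 6-11, and so on
patients$window <- with(patients,
  ave(month.index(date), ID, FUN = function(m) (m - min(m)) %/% 6))

# total cost per patient per window; NA where a patient had no encounters
patients3 <- as.data.frame(with(patients,
  tapply(cost, list(ID, window), sum)))

sapply(patients3, median, na.rm = TRUE)  # median total cost per window
boxplot(patients3)
```

A window in which no patient has an encounter (window 1 in the sample) gets no column, and a patient with no encounters in a window gets NA rather than 0; whether such a window should count as zero cost is an analysis decision.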



--- Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
>
>Try this. The function takes a vector of dates of the form yyyy-mm and returns
>a character vector of the same form in which each date is replaced by the
>first month of the 6 month period containing it. The 6 month periods are
>measured from the minimum date.
>
>date.grouping <- function(d) {
>  # for ea date in d calculate date beginning 6 month period which contains it
>  mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nr=2)
>  f <- function(x) do.call( "ISOdate", as.list(x) )
>  POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1)
>  breaks <- c(seq(from=min(POSIXct.dates), along=POSIXct.dates, by="6 mo"), Inf)
>  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" )
>}
>
>patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
>patients2 <- as.data.frame( patients2 )
>
>summary(patients2)
>
>boxplot(patients2)
>
>
>
>--- Ricardo Pietrobon <[EMAIL PROTECTED]> wrote:
>>Hi,
>>
>>
>>I am new to R, coming from a few years of using Stata. I've been racking my
>>brain and checking several R and S references over the last few days
>>trying to solve this data management problem: I have a data set with a
>>unique patient identifier that is repeated across multiple rows, a variable
>>with the month of the patient encounter, and a continuous variable for the
>>cost of individual encounters. The data look like this:
>>
>>ID    date            cost
>>1     "2001-01"       200.00
>>1     "2001-01"       123.94
>>1     "2001-03"       100.23
>>1     "2001-04"       150.34
>>2     "2001-03"       296.34
>>2     "2002-05"       156.36
>>
>>
>>I would like to obtain the median and boxplots of the total cost of the
>>encounters in the first six months after the index encounter (the
>>patient's first encounter) for each patient, then the mean and median of
>>the costs from 6 to 12 months after the index encounter, and so on. Note
>>that patient 1 has two encounters in the index month, which makes it more
>>difficult to define a single row for the index encounter.
>>
>>Any help would be appreciated,
>>
>>
>>Ricardo
>>
>>
>>Ricardo Pietrobon, MD
>>Assistant Professor of Surgery
>>Duke University Medical Center
>>Durham, NC 27710 US
>>
>>______________________________________________
>>[EMAIL PROTECTED] mailing list
>>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
