Sorry, but there was an error in the seq statement: the breaks should run to=max(POSIXct.dates) rather than along=POSIXct.dates. Here it is again.

date.grouping <- function(d) {
  # for each date in d, calculate the date beginning the 6 month period which contains it
  mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nr=2)
  f <- function(x) do.call( "ISOdate", as.list(x) )
  POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1)
  breaks <- c(seq(from=min(POSIXct.dates), to=max(POSIXct.dates), by="6 mo"), Inf)
  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" )
}

patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
patients2 <- as.data.frame( patients2 )

summary(patients2)

boxplot(patients2)


--- Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
>
>Try this. The function takes a vector of dates of the form yyyy-mm and
>produces a new character vector of dates of the same form except the
>output date is the beginning of the 6 month period in which the input
>date lies. The 6 month intervals are measured from the minimum date.
>
>date.grouping <- function(d) {
>  # for ea date in d calculate date beginning 6 month period which contains it
>  mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nr=2)
>  f <- function(x) do.call( "ISOdate", as.list(x) )
>  POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1)
>  breaks <- c(seq(from=min(POSIXct.dates), along=POSIXct.dates, by="6 mo"), Inf)
>  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=T )), "%Y-%m" )
>}
>
>patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
>patients2 <- as.data.frame( patients2 )
>
>summary(patients2)
>
>boxplot(patients2)
>
>
>--- Ricardo Pietrobon <[EMAIL PROTECTED]> wrote:
>>Hi,
>>
>>I am new to R, coming from a few years using Stata. I've been twisting my
>>brain and checking several R and S references over the last few days to
>>try to solve this data management problem: I have a data set with a unique
>>patient identifier that is repeated along multiple rows, a variable with
>>month of patient encounter, and a continuous variable for cost of
>>individual encounters. The data looks like this:
>>
>>ID  date       cost
>>1   "2001-01"  200.00
>>1   "2001-01"  123.94
>>1   "2001-03"  100.23
>>1   "2001-04"  150.34
>>2   "2001-03"  296.34
>>2   "2002-05"  156.36
>>
>>I would like to obtain the median costs and boxplots for the sum of
>>encounters happening in the first six months after the index encounter
>>(first patient encounter) for each patient, then the mean and median costs
>>for the costs happening from 6 to 12 months after the index encounter, and
>>so on. Notice that the first ID has two encounters during the index date,
>>making it more difficult to define a single row with the index encounter.
>>
>>Any help would be appreciated,
>>
>>Ricardo
>>
>>Ricardo Pietrobon, MD
>>Assistant Professor of Surgery
>>Duke University Medical Center
>>Durham, NC 27710 US

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
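
For anyone comparing the two versions of the seq call, here is a minimal sketch of how to= and along= differ. It is an illustration only, not the function from the thread: it assumes Date objects instead of the POSIXct values used above, uses findInterval in place of cut, and period.start is a hypothetical helper name. The two dates are those of patient 2 in the sample data.

```r
# Hypothetical helper (not from the thread): label each date with the start
# of the break interval that contains it, formatted as yyyy-mm.
period.start <- function(d, breaks) {
  # findInterval treats the last break as the start of an open-ended interval
  idx <- findInterval(as.numeric(d), as.numeric(breaks))
  format(breaks[idx], "%Y-%m")
}

# Assumption: yyyy-mm strings are turned into Dates on the 1st of the month
d <- as.Date(paste(c("2001-03", "2002-05"), "01", sep = "-"))

# to= makes the breaks span the range of the data
breaks.to <- seq(from = min(d), to = max(d), by = "6 months")

# along= makes as many breaks as there are dates, whatever their range
breaks.along <- seq(from = min(d), along.with = d, by = "6 months")

labels.to    <- period.start(d, breaks.to)
labels.along <- period.start(d, breaks.along)

print(labels.to)     # "2001-03" "2002-03"
print(labels.along)  # "2001-03" "2001-09"
```

With only two dates, along= yields two breaks, so the 2002-05 encounter is assigned to the period starting 2001-09 instead of the one starting 2002-03.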