TheRealJimShady wrote:

> Hi Peter,
>
> Yes, I did miss an 'e' from the first 'not' in the brackets at the end
> of the message, sorry.
>
> Thanks for that code, but when I use it, it creates a new column
> called csum which simply contains the values of the variable x, i.e.
> it just duplicates the values from x into the new column. I guess
> this means that the grouping isn't occurring correctly?
>
> James
>
> On 23 November 2012 13:57, Peter Dalgaard wrote:
>>
>> On Nov 23, 2012, at 12:04, TheRealJimShady wrote:
>>
>>> Hi Arun & everyone,
>>>
>>> Thank you very much for your helpful suggestions. I've been working
>>> through them, but have realised that my data is a little more
>>> complicated than I said and that the solutions you've kindly provided
>>> don't work. The problem is that there is more than one day of data for
>>> each person. It looks like this:
>>>
>>> id  x  date
>>> 1   5  2012-06-05 12:01
>>> 1  10  2012-06-05 12:02
>>> 1  45  2012-06-05 12:03
>>> 2   5  2012-06-05 12:01
>>> 2   3  2012-06-05 12:03
>>> 2   2  2012-06-05 12:05
>>> 3   5  2012-06-05 12:03
>>> 3   5  2012-06-05 12:04
>>> 3   8  2012-06-05 12:05
>>> 1   5  2012-06-08 13:01
>>> 1   9  2012-06-08 13:02
>>> 1   3  2012-06-08 13:03
>>> 2   0  2012-06-08 13:15
>>> 2   1  2012-06-08 13:18
>>> 2   8  2012-06-08 13:20
>>> 2   4  2012-06-08 13:21
>>> 3   6  2012-06-08 13:15
>>> 3   2  2012-06-08 13:16
>>> 3   7  2012-06-08 13:17
>>> 3   2  2012-06-08 13:18
>>>
>>> So what I need to do is something like this (in pseudocode, anyway):
>>>
>>> - Order the data by the id field and then the date field
>>> - Add a new variable called cumsum
>>> - Calculate this variable as the cumulative value of x, but grouping
>>>   by the id and date (note date, not date and time).
>>
>> Did you miss an 'e' on the first 'not'? Otherwise, I'm confused.
>>
>> Why do people always forget about ave()? I'd try
>>
>> newdata <- transform(mydata, csum = ave(x, id, as.Date(date), FUN = cumsum))
>>
>> You still need to sort, of course, at least by date (incl. time). Check
>> carefully whether the time zone is an issue for as.Date --- you may need
>> the tz argument.
>>
>> Also notice that I shy away from using an R function name as a variable
>> name. This is mostly superstition these days, but better safe than sorry.
>>
>>> Thank you
>>>
>>> James
>>>
>>> On 23 November 2012 03:54, arun kirshna wrote:
>>>> Hi,
>>>> No problem.
>>>> One more method if you wanted to try:
>>>>
>>>> library(data.table)
>>>> dat2 <- data.table(dat1)
>>>> dat2[, list(x, time, Cumsum = cumsum(x)), by = list(id)]
>>>> #     id  x  time Cumsum
>>>> #  1:  1  5 12:01      5
>>>> #  2:  1 14 12:02     19
>>>> #  3:  1  6 12:03     25
>>>> #  4:  1  3 12:04     28
>>>> #  5:  2 98 12:01     98
>>>> #  6:  2 23 12:02    121
>>>> #  7:  2  1 12:03    122
>>>> #  8:  2  4 12:04    126
>>>> #  9:  3  5 12:01      5
>>>> # 10:  3 65 12:02     70
>>>> # 11:  3 23 12:03     93
>>>> # 12:  3 23 12:04    116
>>>>
>>>> A.K.
>>>>
>>>> ----- Original Message -----
>>>> From: TheRealJimShady <[hidden email]>
>>>> To: [hidden email]
>>>> Sent: Thursday, November 22, 2012 12:27 PM
>>>> Subject: Re: [R] Using cumsum with 'group by' ?
>>>>
>>>> Thank you very much, I will try these tomorrow morning.
>>>>
>>>> On 22 November 2012 17:25, arun kirshna wrote:
>>>>> Hi,
>>>>> You can do this in many ways:
>>>>>
>>>>> dat1 <- read.table(text = "
>>>>> id time x
>>>>> 1 12:01 5
>>>>> 1 12:02 14
>>>>> 1 12:03 6
>>>>> 1 12:04 3
>>>>> 2 12:01 98
>>>>> 2 12:02 23
>>>>> 2 12:03 1
>>>>> 2 12:04 4
>>>>> 3 12:01 5
>>>>> 3 12:02 65
>>>>> 3 12:03 23
>>>>> 3 12:04 23
>>>>> ", sep = "", header = TRUE, stringsAsFactors = FALSE)
>>>>>
>>>>> dat1$Cumsum <- ave(dat1$x, dat1$id, FUN = cumsum)
>>>>> # or
>>>>> unlist(tapply(dat1$x, dat1$id, FUN = cumsum), use.names = FALSE)
>>>>> # [1]   5  19  25  28  98 121 122 126   5  70  93 116
>>>>> # or
>>>>> library(plyr)
>>>>> ddply(dat1, .(id), function(x) cumsum(x[3]))[, 2]
>>>>> # [1]   5  19  25  28  98 121 122 126   5  70  93 116
>>>>>
>>>>> head(dat1)
>>>>> #   id  time  x Cumsum
>>>>> # 1  1 12:01  5      5
>>>>> # 2  1 12:02 14     19
>>>>> # 3  1 12:03  6     25
>>>>> # 4  1 12:04  3     28
>>>>> # 5  2 12:01 98     98
>>>>> # 6  2 12:02 23    121
>>>>>
>>>>> A.K.
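[An aside on the data.table method quoted above: in current versions of data.table the grouped cumulative sum is more commonly written with `:=`, which adds the column to the table by reference instead of returning a new table. A minimal sketch, with arun's example data re-created inline so it runs standalone (assumes the data.table package is installed):]

```r
library(data.table)

# Re-create arun's example data inline so the sketch is self-contained.
dat2 <- data.table(id   = rep(1:3, each = 4),
                   time = rep(c("12:01", "12:02", "12:03", "12:04"), times = 3),
                   x    = c(5, 14, 6, 3, 98, 23, 1, 4, 5, 65, 23, 23))

# `:=` adds the grouped cumulative sum as a new column by reference,
# rather than returning a copy as the list(...) form does.
dat2[, Cumsum := cumsum(x), by = id]

dat2$Cumsum
# [1]   5  19  25  28  98 121 122 126   5  70  93 116
```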
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501

Peter Dalgaard's suggestion works for me ...

lines <- "id x date time
1 5 2012-06-05 12:01
1 10 2012-06-05 12:02
1 45 2012-06-05 12:03
2 5 2012-06-05 12:01
2 3 2012-06-05 12:03
2 2 2012-06-05 12:05
3 5 2012-06-05 12:03
3 5 2012-06-05 12:04
3 8 2012-06-05 12:05
1 5 2012-06-08 13:01
1 9 2012-06-08 13:02
1 3 2012-06-08 13:03
2 0 2012-06-08 13:15
2 1 2012-06-08 13:18
2 8 2012-06-08 13:20
2 4 2012-06-08 13:21
3 6 2012-06-08 13:15
3 2 2012-06-08 13:16
3 7 2012-06-08 13:17
3 2 2012-06-08 13:18
"

# read in the data
dat1 <- read.table(textConnection(lines), header = TRUE, stringsAsFactors = FALSE)

# build the csum variable, restarting the cumulative sum for each id on each day
newdata <- transform(dat1, csum = ave(x, id, as.Date(date), FUN = cumsum))

# order the data (not really necessary here, as it is already sorted)
newdata.ord <- newdata[order(newdata$id, newdata$date, newdata$time), ]

Or have I misinterpreted your request?
HTH
Pete (B not D)

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
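[A closing footnote on the csum-equals-x symptom James reported: one plausible cause, sketched below with made-up data in base R only, is grouping on a value that still carries the time of day. Every timestamp is then unique, each group holds a single row, and the cumulative sum of one value is just that value. Converting to the day with as.Date() restores the intended groups:]

```r
# Hypothetical four-row example: one id, two days, date-times stored as text.
dat <- data.frame(id       = c(1, 1, 1, 1),
                  x        = c(5, 10, 45, 5),
                  datetime = c("2012-06-05 12:01", "2012-06-05 12:02",
                               "2012-06-05 12:03", "2012-06-08 13:01"),
                  stringsAsFactors = FALSE)

# Grouping on the full date-time: every timestamp is unique, so each group
# holds one row and cumsum() just echoes x back -- the reported symptom.
ave(dat$x, dat$id, dat$datetime, FUN = cumsum)
# [1]  5 10 45  5

# Grouping on the day: the cumulative sum runs within 2012-06-05 and
# restarts on 2012-06-08, which is the behaviour James asked for.
ave(dat$x, dat$id, as.Date(dat$datetime), FUN = cumsum)
# [1]  5 15 60  5
```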