Hi,
Comparing some of the solutions:

library(plyr)  # needed for ddply() and mutate()
set.seed(25)
subid<- sample(30:50,22e5,replace=TRUE)
set.seed(27)
year<- sample(1990:2012,22e5,replace=TRUE)
set.seed(35)
var1<- sample(c(1,3,5,7),22e5,replace=TRUE)
df2<- data.frame(subid,year,var1)
df2<- df2[order(df2$subid,df2$year),]

system.time(res<-subset(ddply(df2,.(subid),mutate,delta=c(FALSE,var1[-1]!=var1[-length(var1)])),delta)[,-4])
#   user  system elapsed
#  8.036   0.132   8.188

system.time(res2<-df2[ as.logical( ave( df2$var1, df2$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ])
#   user  system elapsed
#  1.220   0.000   1.222

system.time(res3<-df2[with(df2,unlist(tapply(var1,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),])
#   user  system elapsed
#  1.729   0.000   1.730

identical(res2,res3)
#[1] TRUE
row.names(res)<-1:nrow(res)
row.names(res2)<-1:nrow(res2)
identical(res,res2)
#[1] TRUE

Judging by the numbers above, half an hour seems a bit extreme.
A.K.


David: row 6 (subid 47, year 1999, var 1) should not be included in the output because we are trying to detect changes within each subid. 1999 was the first year for subject 47, and changes have to be detected after that year; hence we were using ddply to group. Your solution ran very fast, as expected.

AK: I have a large dataset and your solution is taking too long. In fact, I had to kill it after half an hour on a 22K-row dataset.

Thanks for the suggestions.
-ST


----- Original Message -----
From: David Winsemius <dwinsem...@comcast.net>
To: arun <smartpink...@yahoo.com>
Cc: R help <r-help@r-project.org>
Sent: Tuesday, June 4, 2013 11:13 AM
Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data


On Jun 3, 2013, at 9:51 PM, arun wrote:

> If it is grouped by "subid" (that would explain the difference in the number of changes):
>
> subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4]
> #   subid year var
> #3     36 2003   3
> #7     47 2001   3
> #9     47 2005   1
> #10    47 2007   3
> A.K.
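For comparison, here is a minimal sketch that is not one of the solutions posted in this thread. It avoids calling a function once per group: because the data are sorted by subid and then year, a row marks a within-subject change exactly when its value differs from the row directly above it and that row has the same subid. The helper name `flag_changes` and its `id`/`value` arguments are hypothetical, and it assumes the data frame is already sorted (as df1 and df2 are). It has not been timed here.

```r
# Hypothetical helper, not one of the solutions posted in this thread.
# Assumes df is already sorted by the id column and then by year.
flag_changes <- function(df, id = "subid", value = "var") {
  ids  <- df[[id]]
  vals <- df[[value]]
  n    <- length(ids)
  # Compare each row 2..n with the row directly above it.
  same_subject <- ids[-1] == ids[-n]
  changed      <- vals[-1] != vals[-n]
  # The first row never counts, and a change only counts when the previous
  # row belongs to the same subject. which() drops any NA comparisons.
  df[which(c(FALSE, same_subject & changed)), , drop = FALSE]
}

df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))
flag_changes(df1)  # rows 3, 7, 9 and 10 of df1; row 6 is excluded
```

On the simulated data above, the call would be `flag_changes(df2, value = "var1")`, which should select the same rows as `res2`, since both compare each value only with the previous value of the same subid.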

I'm not sure why the first one returns 0/1 numbers instead of logicals from the ave() call, but the second version works:

> df1[ ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ), ]
    subid year var
1      36 1999   1
1.1    36 1999   1
1.2    36 1999   1
1.3    36 1999   1

> ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]))
 [1] 0 0 1 0 0 0 1 0 1 1

Perhaps one of the single-item groups sabotaged my simple function.

> df1[ as.logical( ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ]
   subid year var
3     36 2003   3
7     47 2001   3
9     47 2005   1
10    47 2007   3

-- 
David.
>
>
> ----- Original Message -----
> From: David Winsemius <dwinsem...@comcast.net>
> To: arun <smartpink...@yahoo.com>
> Cc: R help <r-help@r-project.org>
> Sent: Tuesday, June 4, 2013 12:37 AM
> Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
>
>
> On Jun 3, 2013, at 7:10 PM, arun wrote:
>
>> Hi,
>> Maybe this helps:
>> res1<-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),]
>> res1
>> #   subid year var
>> #3     36 2003   3
>> #7     47 2001   3
>> #9     47 2005   1
>> #10    47 2007   3
>> #or
>> library(plyr)
>> subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4]
>> #   subid year var
>> #3     36 2003   3
>> #7     47 2001   3
>> #9     47 2005   1
>> #10    47 2007   3
>> A.K.
>>
> It's pretty simple with logical indexing:
>
>> df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ]
>    subid year var
> 3     36 2003   3
> 6     47 1999   1
> 7     47 2001   3
> 9     47 2005   1
> 10    47 2007   3
>
>
> When I count the number of changes in the value of var, I get 5. Not sure why you are both leaving out row 6.
>
> --
> David.
>>
>>
>> I need to output a data frame of the rows where var changes value.
>>
>> df1 <- data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3))
>>
>>    subid year var
>> 1     36 1999   1
>> 2     36 2001   1
>> 3     36 2003   3
>> 4     36 2005   3
>> 5     36 2007   3
>> 6     47 1999   1
>> 7     47 2001   3
>> 8     47 2003   3
>> 9     47 2005   1
>> 10    47 2007   3
>>
>> I need:
>> 36 2003 3
>> 47 2001 3
>> 47 2005 1
>> 47 2007 3
>>
>> I am trying to use ddply over subid with the diff function, but it is not working quite right.
>>
>>> dd <- ddply(df1,.(subid),summarize,delta=diff(var) != 0)
>>> dd
>>    subid delta
>> 1     36 FALSE
>> 2     36  TRUE
>> 3     36 FALSE
>> 4     36 FALSE
>> 5     47  TRUE
>> 6     47 FALSE
>> 7     47  TRUE
>> 8     47  TRUE
>>
>> I would appreciate any help on this.
>> Thank you!
>> -ST
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA

David Winsemius
Alameda, CA, USA
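On David's question of why the unwrapped ave() call selected row 1 four times: single-item groups cannot be the cause, because each subid in df1 has five rows. The likelier explanation is type coercion. Base R's ave() splits a copy of its first argument by group and assigns each group's FUN result back into that copy, so the result keeps the type of x. With a numeric var, the TRUE/FALSE flags come back as 0/1 doubles. Indexing a data frame with numbers selects rows by position: each 0 is dropped and each 1 picks row 1 again. The sketch below only reproduces the thread's own example to show this.

```r
# Same data as in the original question.
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

# FUN returns logicals, but ave() writes them back into a copy of df1$var,
# which is double, so the flags are coerced to 0/1.
flags <- ave(df1$var, df1$subid,
             FUN = function(x) c(FALSE, x[-1] != x[-length(x)]))
typeof(flags)   # "double"
flags           # 0 0 1 0 0 0 1 0 1 1

# Numeric indexing: zeros are dropped, and each 1 selects row 1 again,
# which is why row 1 came back four times (as 1, 1.1, 1.2, 1.3).
df1[flags, ]

# Converting back to logical restores selection by flag: rows 3, 7, 9, 10.
df1[as.logical(flags), ]
```

The tapply() version does not hit this, because unlist() of the per-group logical vectors stays logical.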