Re: [R] obtaining first and last record for rows with same identifier
Francisco J. Zagmutt wrote:
> If you want to obtain a data frame you can use the functions head and tail like:
>
> dat = data.frame(id=rep(1:5,3), num=rnorm(15), num2=rnorm(15))  # Creates data frame with id
> last = do.call("rbind", by(dat, dat$id, tail, 1))   # Selects the last observation for each id
> first = do.call("rbind", by(dat, dat$id, head, 1))  # Selects the first observation for each id
> newdat = rbind(first, last)                         # Joins data
> newdat = newdat[order(newdat$id),]                  # Sorts data by id
>
> Notice that rownames will give you the original row location of the observations selected.
>
> I hope this helps
>
> Francisco
. . .

You might also look at section 4.3 of
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] obtaining first and last record for rows with same identifier
If you want to obtain a data frame you can use the functions head and tail like:

dat = data.frame(id=rep(1:5,3), num=rnorm(15), num2=rnorm(15))  # Creates data frame with id
last = do.call("rbind", by(dat, dat$id, tail, 1))   # Selects the last observation for each id
first = do.call("rbind", by(dat, dat$id, head, 1))  # Selects the first observation for each id
newdat = rbind(first, last)                         # Joins data
newdat = newdat[order(newdat$id),]                  # Sorts data by id

Notice that rownames will give you the original row location of the observations selected.

I hope this helps

Francisco

> From: Berton Gunter <[EMAIL PROTECTED]>
> To: "'Sean Davis'" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
> CC: "'rhelp'"
> Subject: RE: [R] obtaining first and last record for rows with same identifier
> Date: Tue, 24 May 2005 12:17:58 -0700
>
> I think by() is simpler:
>
> by(yourframe,factor(yourframe$patid),function(x)x[c(1,nrow(x)),])
>
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
>
> "The business of the statistician is to catalyze the scientific learning process."
> - George E. P. Box
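One case the head/tail approach does not cover is a patient with a single record: head and tail then return the same row, so it appears twice after the rbind. The following is only a sketch on made-up data (the `labs` frame and its values are hypothetical, with column names borrowed from the original question) showing one way to sort by date first and drop the repeated row.

```r
# Hypothetical lab data; patient 3 has only one record.
labs <- data.frame(
  patid    = c(1, 1, 1, 2, 2, 3),
  labdate  = as.Date(c("2005-01-03", "2005-02-10", "2005-03-15",
                       "2005-01-20", "2005-04-02", "2005-02-28")),
  labvalue = c(5.1, 4.8, 5.6, 7.2, 6.9, 3.3)
)

# Sort so that "first" and "last" mean earliest and latest labdate.
labs <- labs[order(labs$patid, labs$labdate), ]

# Same head/tail idiom as in the message above.
first <- do.call("rbind", by(labs, labs$patid, head, 1))
last  <- do.call("rbind", by(labs, labs$patid, tail, 1))

both <- rbind(first, last)
both <- both[order(both$patid, both$labdate), ]

# Drop the repeated row for patients with a single record.
both <- both[!duplicated(both[, c("patid", "labdate")]), ]
both
```

The duplicate check here assumes that patid plus labdate identifies a record; if a patient can have several labs on the same day, a different key would be needed.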
RE: [R] obtaining first and last record for rows with same identifier
I think by() is simpler:

by(yourframe,factor(yourframe$patid),function(x)x[c(1,nrow(x)),])

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process."
- George E. P. Box

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sean Davis
> Sent: Tuesday, May 24, 2005 11:38 AM
> To: [EMAIL PROTECTED]
> Cc: rhelp
> Subject: Re: [R] obtaining first and last record for rows with same identifier
>
> If you have your data.frame ordered by the patid, you can use the
> function rle in combination with cumsum.  As a vector example:
>
> > a <- rep(c('a','b','c'),10)
> > a
>  [1] "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a"
> [20] "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c"
> > b <- a[order(a)]
> > b
>  [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "b" "b"
> [20] "b" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"
> > l <- rle(b)$length
> > cbind(l,cumsum(l),cumsum(l)-l+1)
>       l
> [1,] 10 10  1
> [2,] 10 20 11
> [3,] 10 30 21
> > # use the line below to get the length of the block of the dataframe,
> > # the start, and then end indices
> > cbind(l,cumsum(l)-l+1,cumsum(l))
>       l
> [1,] 10  1 10
> [2,] 10 11 20
> [3,] 10 21 30
>
> Sean
>
> On May 24, 2005, at 2:27 PM, [EMAIL PROTECTED] wrote:
>
> > I have a dataframe that contains fields such as patid, labdate, labvalue.
> > The same patid may show up in multiple rows because of lab
> > measurements on multiple days.  Is there a simple way to obtain just
> > the first and last record for each patient, or do I need to write some
> > code that performs that.
> >
> > Thanks,
> > Steven
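For reference, by() returns one small data frame per patient rather than a single data frame. A minimal sketch of collecting those pieces into one frame, assuming a made-up `labs` frame (hypothetical data, already ordered by date within patient) and using unique() so that a patient with one record is not listed twice:

```r
# Hypothetical lab data, already in date order within each patient.
labs <- data.frame(
  patid    = c("A", "A", "A", "B", "B", "C"),
  labdate  = as.Date(c("2005-01-03", "2005-02-10", "2005-03-15",
                       "2005-01-20", "2005-04-02", "2005-02-28")),
  labvalue = c(5.1, 4.8, 5.6, 7.2, 6.9, 3.3)
)

# First and last row per patient; unique() keeps a single-row
# patient from being selected twice.
ends <- by(labs, factor(labs$patid),
           function(x) x[unique(c(1, nrow(x))), ])

# Stack the per-patient pieces into one data frame.
first_last <- do.call("rbind", ends)
first_last
```

Without the do.call step, the result prints group by group, which is often enough for a quick look.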
Re: [R] obtaining first and last record for rows with same identifier
If you have your data.frame ordered by the patid, you can use the
function rle in combination with cumsum.  As a vector example:

> a <- rep(c('a','b','c'),10)
> a
 [1] "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a"
[20] "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c"
> b <- a[order(a)]
> b
 [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "b" "b"
[20] "b" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"
> l <- rle(b)$length
> cbind(l,cumsum(l),cumsum(l)-l+1)
      l
[1,] 10 10  1
[2,] 10 20 11
[3,] 10 30 21
> # use the line below to get the length of the block of the dataframe,
> # the start, and then end indices
> cbind(l,cumsum(l)-l+1,cumsum(l))
      l
[1,] 10  1 10
[2,] 10 11 20
[3,] 10 21 30

Sean

On May 24, 2005, at 2:27 PM, [EMAIL PROTECTED] wrote:

> I have a dataframe that contains fields such as patid, labdate, labvalue.
> The same patid may show up in multiple rows because of lab
> measurements on multiple days.  Is there a simple way to obtain just
> the first and last record for each patient, or do I need to write some
> code that performs that.
>
> Thanks,
> Steven
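The vector example stops at the start and end indices of each block. A sketch of carrying that through to the data frame itself, assuming a made-up `labs` frame (hypothetical data, with column names taken from the question): sort by patid and labdate, compute the block boundaries with rle and cumsum, then index the rows.

```r
# Hypothetical lab data in arbitrary order; patient 3 has one record.
labs <- data.frame(
  patid    = c(2, 1, 1, 3, 2, 1),
  labdate  = as.Date(c("2005-04-02", "2005-01-03", "2005-03-15",
                       "2005-02-28", "2005-01-20", "2005-02-10")),
  labvalue = c(6.9, 5.1, 5.6, 3.3, 7.2, 4.8)
)

# rle() only sees runs of adjacent equal values, so sort first.
labs <- labs[order(labs$patid, labs$labdate), ]

# Block lengths per patient (the full component name is "lengths").
len   <- rle(as.character(labs$patid))$lengths
end   <- cumsum(len)        # last row of each block
start <- end - len + 1      # first row of each block

# Interleave start/end indices; unique() handles one-row blocks.
idx <- unique(as.vector(rbind(start, end)))
first_last <- labs[idx, ]
first_last
```

The as.character() call is there because rle() expects a plain atomic vector and would reject a factor patid.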