Re: [R] obtaining first and last record for rows with same identifier
Francisco J. Zagmutt wrote:

> If you want to obtain a data frame you can use the functions head and
> tail like:
>
> dat=data.frame(id=rep(1:5,3),num=rnorm(15), num2=rnorm(15)) #Creates data frame with id
> last=do.call(rbind,by(dat,dat$id,tail,1)) #Selects the last observation for each id
> first=do.call(rbind,by(dat,dat$id,head,1)) #Selects the first observation for each id
> newdat=rbind(first,last) #Joins data
> newdat=newdat[order(newdat$id),] #sorts data by id
>
> Notice that rownames will give you the original row location of the
> observations selected
>
> I hope this helps
>
> Francisco

. . .

You might also look at section 4.3 of
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf

-- 
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] obtaining first and last record for rows with same identifier
If you have your data.frame ordered by the patid, you can use the function rle in combination with cumsum. As a vector example:

    > a <- rep(c('a','b','c'),10)
    > a
     [1] "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a"
    [20] "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c"
    > b <- a[order(a)]
    > b
     [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "b" "b"
    [20] "b" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"
    > l <- rle(b)$length
    > cbind(l,cumsum(l),cumsum(l)-l+1)
          l
    [1,] 10 10  1
    [2,] 10 20 11
    [3,] 10 30 21
    > # use the line below to get the length of the block of the dataframe,
    > # the start, and then end indices
    > cbind(l,cumsum(l)-l+1,cumsum(l))
          l
    [1,] 10  1 10
    [2,] 10 11 20
    [3,] 10 21 30

Sean

On May 24, 2005, at 2:27 PM, [EMAIL PROTECTED] wrote:

> I have a dataframe that contains fields such as patid, labdate,
> labvalue. The same patid may show up in multiple rows because of lab
> measurements on multiple days. Is there a simple way to obtain just the
> first and last record for each patient, or do I need to write some code
> that performs that.
>
> Thanks,
>
> Steven
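Applied to the original question, the start and end indices computed above can be used directly as row indices. The following is only a minimal sketch: the data frame `labs` and its values are made up for illustration, with column names taken from the question, and the `unique()` step is an addition for patients with a single record.

```r
# Hypothetical example data; column names follow the original question.
labs <- data.frame(
  patid    = c(2, 1, 2, 1, 3, 2),
  labdate  = as.Date(c("2005-01-10", "2005-01-03", "2005-02-01",
                       "2005-03-15", "2005-01-20", "2005-01-25")),
  labvalue = c(5.1, 4.8, 5.6, 4.2, 6.0, 5.3)
)

# rle() only finds runs of adjacent equal values, so sort by patient
# (and by date within patient) first.
labs <- labs[order(labs$patid, labs$labdate), ]

# as.vector() turns a factor id into a character vector, which rle() accepts.
len   <- rle(as.vector(labs$patid))$lengths  # number of rows per patient
last  <- cumsum(len)                         # position of each patient's last row
first <- last - len + 1                      # position of each patient's first row

# unique() keeps a patient with a single record from appearing twice.
first_last <- labs[unique(sort(c(first, last))), ]
print(first_last)
```

Because the indices are positions in the sorted data frame, the sort has to happen before `rle()` is called, not after.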
RE: [R] obtaining first and last record for rows with same identifier
I think by() is simpler:

    by(yourframe,factor(yourframe$patid),function(x)x[c(1,nrow(x)),])

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Sean Davis
> Sent: Tuesday, May 24, 2005 11:38 AM
> To: [EMAIL PROTECTED]
> Cc: rhelp
> Subject: Re: [R] obtaining first and last record for rows with same identifier
>
> If you have your data.frame ordered by the patid, you can use the
> function rle in combination with cumsum.
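Note that by() returns one piece per patient rather than a single data frame, and that `c(1,nrow(x))` selects the same row twice when a patient has only one record. A sketch of one way to handle both points, assuming a made-up data frame `labs` that is already sorted by date within patient (the `unique()` and `do.call(rbind, ...)` steps are additions, not part of the one-liner above):

```r
# Hypothetical example data, already ordered by labdate within patid.
labs <- data.frame(
  patid    = c(1, 1, 2, 2, 2, 3),
  labdate  = as.Date(c("2005-01-03", "2005-03-15", "2005-01-10",
                       "2005-01-25", "2005-02-01", "2005-01-20")),
  labvalue = c(4.8, 4.2, 5.1, 5.3, 5.6, 6.0)
)

# First and last row of each patient's block; unique() collapses
# c(1, 1) to 1 for a patient with a single record.
pieces <- by(labs, factor(labs$patid),
             function(x) x[unique(c(1, nrow(x))), ])

# Bind the per-patient pieces back into one data frame.
first_last <- do.call(rbind, pieces)
print(first_last)
```

The row names of the result are generated by rbind() and may differ between R versions, so it is safer to rely on the `patid` column than on the row names.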
RE: [R] obtaining first and last record for rows with same identifier
If you want to obtain a data frame you can use the functions head and tail like:

    dat=data.frame(id=rep(1:5,3),num=rnorm(15), num2=rnorm(15)) #Creates data frame with id
    last=do.call(rbind,by(dat,dat$id,tail,1)) #Selects the last observation for each id
    first=do.call(rbind,by(dat,dat$id,head,1)) #Selects the first observation for each id
    newdat=rbind(first,last) #Joins data
    newdat=newdat[order(newdat$id),] #sorts data by id

Notice that rownames will give you the original row location of the observations selected

I hope this helps

Francisco

>From: Berton Gunter [EMAIL PROTECTED]
>To: 'Sean Davis' [EMAIL PROTECTED], [EMAIL PROTECTED]
>CC: 'rhelp' r-help@stat.math.ethz.ch
>Subject: RE: [R] obtaining first and last record for rows with same identifier
>Date: Tue, 24 May 2005 12:17:58 -0700
>
>I think by() is simpler:
>
>by(yourframe,factor(yourframe$patid),function(x)x[c(1,nrow(x)),])
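For the lab data in the original question, the same head/tail idea might look like the sketch below. The data frame `labs` and its values are invented for illustration, and putting the first and last record side by side with merge() is a design choice of this sketch rather than something proposed in the thread.

```r
# Hypothetical example data; column names follow the original question.
labs <- data.frame(
  patid    = c(2, 1, 2, 1, 3, 2),
  labdate  = as.Date(c("2005-01-10", "2005-01-03", "2005-02-01",
                       "2005-03-15", "2005-01-20", "2005-01-25")),
  labvalue = c(5.1, 4.8, 5.6, 4.2, 6.0, 5.3)
)

# head() and tail() pick rows by position, so order by date within patient
# to make "first" and "last" mean earliest and latest lab date.
labs <- labs[order(labs$patid, labs$labdate), ]

first <- do.call(rbind, by(labs, labs$patid, head, 1))  # earliest record per patient
last  <- do.call(rbind, by(labs, labs$patid, tail, 1))  # latest record per patient

# One row per patient, with the first and last record side by side.
first_last <- merge(first, last, by = "patid",
                    suffixes = c(".first", ".last"))
print(first_last)
```

With this layout a patient who has only one record simply shows the same values in the `.first` and `.last` columns, so no duplicate rows need to be removed.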