Re: [R] obtaining first and last record for rows with same identifier

2005-05-25 Thread Frank E Harrell Jr

Francisco J. Zagmutt wrote:
If you want to obtain a data frame you can use the functions head and 
tail like:


# Create a data frame with an id column
dat = data.frame(id = rep(1:5, 3), num = rnorm(15), num2 = rnorm(15))
# Select the last observation for each id
last = do.call("rbind", by(dat, dat$id, tail, 1))
# Select the first observation for each id
first = do.call("rbind", by(dat, dat$id, head, 1))

newdat = rbind(first, last)          # join the two
newdat = newdat[order(newdat$id), ]  # sort by id

Notice that rownames will give you the original row location of the 
observations selected


I hope this helps

Francisco


. . .

You might also look at section 4.3 of
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf
--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] obtaining first and last record for rows with same identifier

2005-05-24 Thread Francisco J. Zagmutt
If you want to obtain a data frame you can use the functions head and tail 
like:


# Create a data frame with an id column
dat = data.frame(id = rep(1:5, 3), num = rnorm(15), num2 = rnorm(15))
# Select the last observation for each id
last = do.call("rbind", by(dat, dat$id, tail, 1))
# Select the first observation for each id
first = do.call("rbind", by(dat, dat$id, head, 1))

newdat = rbind(first, last)          # join the two
newdat = newdat[order(newdat$id), ]  # sort by id

Notice that rownames will give you the original row location of the 
observations selected
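
Note that head and tail pick the first and last rows in row order, so for the lab data in the original question the frame would presumably need to be sorted by patid and labdate first. A minimal sketch, assuming a made-up data frame called labs (the column names come from the question; the values and the name labs are invented for illustration):

```r
# Hypothetical lab data; values are invented for illustration.
labs <- data.frame(
  patid    = c(2, 1, 2, 1, 2, 3),
  labdate  = as.Date(c("2005-03-01", "2005-01-10", "2005-01-05",
                       "2005-02-20", "2005-02-11", "2005-04-02")),
  labvalue = c(5.1, 4.2, 6.3, 4.8, 5.9, 7.0)
)

# Sort by patient, then by date, so that row order within each
# patient is chronological.
labs <- labs[order(labs$patid, labs$labdate), ]

# First and last row per patient, using the same head/tail idiom.
first <- do.call("rbind", by(labs, labs$patid, head, 1))
last  <- do.call("rbind", by(labs, labs$patid, tail, 1))
```

A patient with a single record (patid 3 here) ends up in both first and last, so stacking the two would repeat that row.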


I hope this helps

Francisco





RE: [R] obtaining first and last record for rows with same identifier

2005-05-24 Thread Berton Gunter

I think by() is simpler:

 by(yourframe,factor(yourframe$patid),function(x)x[c(1,nrow(x)),])
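
by() returns a list-like object with one piece per patient rather than a single data frame. The sketch below stacks those pieces with do.call("rbind", ...); the example data and the use of unique() are additions for illustration, not part of the one-liner above. Without unique(), a patient with only one record would be returned twice, because c(1, nrow(x)) is then c(1, 1).

```r
# Hypothetical data frame standing in for `yourframe`; values are invented.
yourframe <- data.frame(
  patid    = c(1, 1, 1, 2, 3, 3),
  labvalue = c(4.2, 4.8, 5.0, 6.3, 7.0, 7.4)
)

# Rows 1 and nrow(x) of each patient's block; unique() keeps a patient
# with a single record from being returned twice.
pieces <- by(yourframe, factor(yourframe$patid),
             function(x) x[unique(c(1, nrow(x))), ])

# Stack the per-patient pieces into one data frame.
firstlast <- do.call("rbind", pieces)
```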



-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 



Re: [R] obtaining first and last record for rows with same identifier

2005-05-24 Thread Sean Davis
If you have your data.frame ordered by the patid, you can use the 
function rle in combination with cumsum.  As a vector example:


> a <- rep(c('a','b','c'),10)
> a
 [1] "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a"
[20] "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c"
> b <- a[order(a)]
> b
 [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "b" "b"
[20] "b" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"
> l <- rle(b)$lengths
> cbind(l,cumsum(l),cumsum(l)-l+1)
      l
[1,] 10 10  1
[2,] 10 20 11
[3,] 10 30 21

# use the line below to get the length of each block of the data frame,
# followed by its start and end indices
> cbind(l,cumsum(l)-l+1,cumsum(l))
      l
[1,] 10  1 10
[2,] 10 11 20
[3,] 10 21 30
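
The start and end indices can then be used to subset the data frame itself. A minimal sketch, assuming a made-up data frame df that is already ordered by patid (the name df and its values are invented for illustration):

```r
# Hypothetical data frame, already ordered by patid; values are invented.
df <- data.frame(
  patid    = c("a", "a", "a", "b", "c", "c"),
  labvalue = c(4.2, 4.8, 5.0, 6.3, 7.0, 7.4)
)

len   <- rle(as.character(df$patid))$lengths  # rows per patient
end   <- cumsum(len)                          # index of each patient's last row
start <- end - len + 1                        # index of each patient's first row

# unique() drops the repeated index for patients with a single row.
firstlast <- df[sort(unique(c(start, end))), ]
```

This relies on the rows for each patient being contiguous, which is why the data frame has to be ordered by patid beforehand.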

Sean


On May 24, 2005, at 2:27 PM, [EMAIL PROTECTED] wrote:

I have a dataframe that contains fields such as patid, labdate, 
labvalue.
The same patid may show up in multiple rows because of lab 
measurements on multiple days.  Is there a simple way to obtain just 
the first and last record for each patient, or do I need to write some 
code that performs that.


Thanks,
Steven
