Re: [R] obtaining first and last record for rows with same identifier

2005-05-25 Thread Frank E Harrell Jr

Francisco J. Zagmutt wrote:
If you want to obtain a data frame, you can use the functions head and
tail like this:


dat=data.frame(id=rep(1:5,3),num=rnorm(15), num2=rnorm(15)) # Creates a data frame with an id column
last=do.call(rbind,by(dat,dat$id,tail,1)) # Selects the last observation for each id
first=do.call(rbind,by(dat,dat$id,head,1)) # Selects the first observation for each id

newdat=rbind(first,last) # Joins the data
newdat=newdat[order(newdat$id),] # Sorts the data by id

Notice that the row names will give you the original row location of the
selected observations.


I hope this helps

Francisco


. . .

You might also look at section 4.3 of
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf
--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] obtaining first and last record for rows with same identifier

2005-05-24 Thread Sean Davis
If you have your data.frame ordered by the patid, you can use the 
function rle in combination with cumsum.  As a vector example:


> a <- rep(c('a','b','c'),10)
> a
 [1] "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a"
[20] "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c"
> b <- a[order(a)]
> b
 [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "b" "b"
[20] "b" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"
> l <- rle(b)$lengths
> cbind(l,cumsum(l),cumsum(l)-l+1)
      l      
[1,] 10 10  1
[2,] 10 20 11
[3,] 10 30 21

# use the line below to get the length of each block of the data frame, the start index, and the end index

> cbind(l,cumsum(l)-l+1,cumsum(l))
      l      
[1,] 10  1 10
[2,] 10 11 20
[3,] 10 21 30
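The example above works on a bare vector. A minimal sketch of the same rle/cumsum idea applied to a data frame like the one in the question follows. The data frame `labs` and its values are hypothetical, and the rows are first sorted by patid and then labdate (an assumption about what "first" and "last" should mean), because the rle approach only works when each patient's rows are contiguous.

```r
# Hypothetical lab data with the columns named in the question
labs <- data.frame(
  patid    = c(101, 101, 101, 102, 103, 103),
  labdate  = as.Date(c("2005-01-03", "2005-02-10", "2005-03-15",
                       "2005-01-20", "2005-02-01", "2005-04-12")),
  labvalue = c(5.1, 4.8, 5.6, 6.2, 3.9, 4.4)
)

# rle() needs each patient's rows to be contiguous, so sort first
labs <- labs[order(labs$patid, labs$labdate), ]

# Length of each run of identical patid values
l <- rle(labs$patid)$lengths
end_idx   <- cumsum(l)        # row index of the last record in each block
start_idx <- end_idx - l + 1  # row index of the first record in each block

first_rec <- labs[start_idx, ]
last_rec  <- labs[end_idx, ]
```

One caveat: rle() rejects factors ("'x' must be a vector of an atomic type"), so if patid is stored as a factor, pass as.character(labs$patid) instead.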


Sean


On May 24, 2005, at 2:27 PM, [EMAIL PROTECTED] wrote:

I have a data frame that contains fields such as patid, labdate, and
labvalue.
The same patid may show up in multiple rows because of lab
measurements on multiple days.  Is there a simple way to obtain just
the first and last record for each patient, or do I need to write some
code to do that?


Thanks,
Steven





RE: [R] obtaining first and last record for rows with same identifier

2005-05-24 Thread Berton Gunter

I think by() is simpler:

 by(yourframe,factor(yourframe$patid),function(x)x[c(1,nrow(x)),])
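by() returns an object of class "by" with one piece per patient, not a single data frame. A minimal sketch of stacking the pieces with do.call(rbind, ...), the same idiom used with head and tail elsewhere in this thread, is below. It adds one tweak that is an assumption about the desired behaviour: unique() around the row indices, because with c(1, nrow(x)) a patient who has only one record would get that row twice. The data frame `labs` is hypothetical.

```r
# Hypothetical lab data: patient 102 has only one record
labs <- data.frame(
  patid    = c(101, 101, 101, 102, 103, 103),
  labvalue = c(5.1, 4.8, 5.6, 6.2, 3.9, 4.4)
)

# First and last row per patient; unique() stops row 1 being
# repeated when a patient has a single row (nrow(x) == 1)
res <- by(labs, factor(labs$patid),
          function(x) x[unique(c(1, nrow(x))), ])

# Stack the per-patient data frames into one data frame
first_last <- do.call(rbind, res)
```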



-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
The business of the statistician is to catalyze the scientific learning
process.  - George E. P. Box
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Sean Davis
 Sent: Tuesday, May 24, 2005 11:38 AM
 To: [EMAIL PROTECTED]
 Cc: rhelp
 Subject: Re: [R] obtaining first and last record for rows 
 with same identifier
 




RE: [R] obtaining first and last record for rows with same identifier

2005-05-24 Thread Francisco J. Zagmutt
If you want to obtain a data frame, you can use the functions head and tail
like this:


dat=data.frame(id=rep(1:5,3),num=rnorm(15), num2=rnorm(15)) # Creates a data frame with an id column
last=do.call(rbind,by(dat,dat$id,tail,1)) # Selects the last observation for each id
first=do.call(rbind,by(dat,dat$id,head,1)) # Selects the first observation for each id

newdat=rbind(first,last) # Joins the data
newdat=newdat[order(newdat$id),] # Sorts the data by id

Notice that the row names will give you the original row location of the
selected observations.
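A different base-R technique, not used in this thread, is duplicated(): !duplicated(id) flags the first row of each id in the current row order, and duplicated(id, fromLast = TRUE) does the same scanning from the bottom, so its negation flags the last row. A minimal sketch on the same kind of simulated data follows. Because rows are only subset and never rebuilt, the row names are the original row positions, and an id with a single row is kept once rather than twice.

```r
set.seed(1)  # values are random; only the row selection matters here
dat <- data.frame(id = rep(1:5, 3), num = rnorm(15), num2 = rnorm(15))

# TRUE for the first occurrence of each id, scanning top to bottom
is_first <- !duplicated(dat$id)
# TRUE for the last occurrence of each id, scanning bottom to top
is_last <- !duplicated(dat$id, fromLast = TRUE)

# Keep rows that are first or last (a row that is both is kept once)
newdat <- dat[is_first | is_last, ]
newdat <- newdat[order(newdat$id), ]  # sort by id
rownames(newdat)  # original row positions of the selected rows
```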


I hope this helps

Francisco



From: Berton Gunter [EMAIL PROTECTED]
To: 'Sean Davis' [EMAIL PROTECTED], [EMAIL PROTECTED]
CC: 'rhelp' r-help@stat.math.ethz.ch
Subject: RE: [R] obtaining first and last record for rows with same 
identifier

Date: Tue, 24 May 2005 12:17:58 -0700

