Re: [R] the first and last observation for each subject

2009-01-06 Thread William Dunlap
Just in case anyone is still interested, here are some comparisons of the time it takes to compute grouped medians via sapply(split(x,group),median) and gm(x,group), which uses the trick used by rle() to find the first and last entries in each group. Which method is fastest depends on the nature

Re: [R] the first and last observation for each subject

2009-01-05 Thread hadley wickham
Another application of that technique can be used to quickly compute medians by groups:
gm <- function(x, group){ # medians by group: sapply(split(x,group),median)
   o <- order(group, x)
   group <- group[o]
   x <- x[o]
   changes <- group[-1] != group[-length(group)]
   first <- which(c(TRUE,
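The snippet above is cut off, so here is a minimal sketch of how the sort-and-find-run-boundaries idea can finish the job. `gm_sketch` is a hypothetical name and our own completion, not the code posted in the thread. It sorts by group and then by value, marks where each run of equal group labels starts and ends, and averages the middle element(s) of each run.

```r
# gm_sketch: hypothetical completion of the grouped-median trick
# (our own code, not the gm() posted to the list)
gm_sketch <- function(x, group) {
  # Sort by group, and by value within each group
  o <- order(group, x)
  group <- group[o]
  x <- x[o]
  n <- length(group)
  # TRUE wherever the group label differs from the previous element
  changes <- group[-1] != group[-n]
  first <- which(c(TRUE, changes))  # start of each run of equal labels
  last <- which(c(changes, TRUE))   # end of each run
  # Middle position(s) of each run; lo == hi when the run has odd length
  lo <- floor((first + last) / 2)
  hi <- ceiling((first + last) / 2)
  med <- (x[lo] + x[hi]) / 2
  names(med) <- as.character(group[first])
  med
}

# Compare with the straightforward per-group version on random data
set.seed(1)
x <- rnorm(1000)
group <- sample(letters[1:5], 1000, replace = TRUE)
all.equal(gm_sketch(x, group), sapply(split(x, group), median))
```

This version replaces one median() call per group with a single sort plus vectorized indexing. That is presumably why this style of approach pulls ahead when there are many groups, as the timings later in the thread suggest.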

Re: [R] the first and last observation for each subject

2009-01-05 Thread William Dunlap
Argh, the 'sapply(...)' in the function was in the initial comment,
gm <- function(x, group){ # medians by group: sapply(split(x,group),median)
but someone's mailer put a newline before the sapply
gm <- function(x, group){ # medians by group:
sapply(split(x,group),median)
so it got
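An R comment runs only to the end of its line, so a line break inserted by a mailer can turn commented-out text into a real statement. The sketch below parses both layouts of the function header from the message. The function bodies are hypothetical stand-ins that just return `x`. It then counts the statements inside the braces.

```r
# Intended layout: the sapply() text sits inside the comment
intended <- "function(x, group){ # medians by group: sapply(split(x,group),median)
  x
}"
# Wrapped layout: a newline before sapply() makes it executable code
wrapped <- "function(x, group){ # medians by group:
sapply(split(x,group),median)
  x
}"
f_intended <- eval(parse(text = intended))
f_wrapped <- eval(parse(text = wrapped))
# body() of a braced function is a call to `{`; its length is 1 + number of statements
length(body(f_intended))  # 2
length(body(f_wrapped))   # 3: the wrapped sapply() became a statement
```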

Re: [R] the first and last observation for each subject

2009-01-05 Thread William Dunlap
-----Original Message-----
From: hadley wickham [mailto:h.wick...@gmail.com]
Sent: Sunday, January 04, 2009 8:56 PM
To: William Dunlap
Cc: gallon...@gmail.com; R help
Subject: Re: [R] the first and last observation for each subject

library(plyr) # ddply is for splitting up data

Re: [R] the first and last observation for each subject

2009-01-05 Thread Kingsford Jones
Here are some more timings of Bill's function. Although in this example sapply has a clear performance advantage for smaller numbers of groups (k), gm is substantially faster for k > 1000:
gm <- function(x, group){ # medians by group:
   o <- order(group, x)
   group <- group[o]
   x <- x[o]

Re: [R] the first and last observation for each subject

2009-01-05 Thread Kingsford Jones
Whoops -- I left the group size unchanged, so k became greater than the length of the group vector. When I increase the size to 1e7, sapply is faster until k reaches 1e6. Warning: this takes a while (particularly on my machine, which seems to be using just 1 of its 2 CPUs)
for(k in

Re: [R] the first and last observation for each subject

2009-01-04 Thread William Dunlap
[R] the first and last observation for each subject hadley wickham h.wickham at gmail.com Fri Jan 2 14:52:42 CET 2009 On Fri, Jan 2, 2009 at 3:20 AM, gallon li gallon.li at gmail.com wrote: I have the following data ID x y time 1 10 20 0 1 10 30 1 1 10 40 2 2 12 23 0 2 12

[R] the first and last observation for each subject

2009-01-02 Thread gallon li
I have the following data

ID  x  y time
 1 10 20    0
 1 10 30    1
 1 10 40    2
 2 12 23    0
 2 12 25    1
 2 12 28    2
 2 12 38    3
 3  5 10    0
 3  5 15    2
.

x is time invariant, ID is the subject id number, y is changing over time. I want to find out the difference between the first and last observed y value for each

Re: [R] the first and last observation for each subject

2009-01-02 Thread Carlos J. Gil Bellosta
Hello, First, order your data by ID and time. The columns you want in your output data frame are then unique(ID), tapply( x, ID, function( z ) z[ 1 ] ) and tapply( y, ID, function( z ) z[ length( z ) ] - z[ 1 ] ). Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Fri,
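The recipe above can be assembled into a runnable sketch on the question's data. The data values are typed in from the question, and `mydata` and `out` are illustrative names. Sorting by ID makes unique(ID) line up with tapply(), whose results come back in sorted ID order.

```r
# Example data from the question
mydata <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)
# Step 1: order by subject, then by time within subject
mydata <- mydata[order(mydata$ID, mydata$time), ]
# Step 2: one row per subject; last y minus first y within each ID
out <- with(mydata, data.frame(
  ID = unique(ID),
  x  = tapply(x, ID, function(z) z[1]),
  y  = tapply(y, ID, function(z) z[length(z)] - z[1])
))
out
```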

Re: [R] the first and last observation for each subject

2009-01-02 Thread Gabor Grothendieck
Try this:
Lines <- "ID x y time
1 10 20 0
1 10 30 1
1 10 40 2
2 12 23 0
2 12 25 1
2 12 28 2
2 12 38 3
3 5 10 0
3 5 15 2"
DF <- read.table(textConnection(Lines), header = TRUE)
aggregate(DF[3], DF[1:2], function(x) tail(x, 1) - head(x, 1))
  ID  x  y
1  3  5  5
2  1 10 20
3  2

Re: [R] the first and last observation for each subject

2009-01-02 Thread Jorge Ivan Velez
Dear Gallon, Assuming that your data is called mydata, something like this should do the job:
newdf <- data.frame(
   ID = unique(mydata$ID),
   x = unique(mydata$x),
   y = with(mydata, tapply(y, ID, function(m) tail(m,1) - head(m,1)))
)
newdf
HTH, Jorge On Fri, Jan
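One caveat: unique(mydata$x) lines up with unique(mydata$ID) only when every subject has a different x. If two subjects shared an x value, the columns would have different lengths and data.frame() would fail. The sketch below is an alternative under the assumption that the data can be sorted by ID and time. It picks each subject's first and last rows with duplicated(), which is a row-level version of the "first and last entry of each run" idea discussed elsewhere in the thread. The names `first_rows` and `last_rows` are illustrative.

```r
# Example data from the question
mydata <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)
mydata <- mydata[order(mydata$ID, mydata$time), ]
# First row of each subject: this ID has not appeared earlier
first_rows <- mydata[!duplicated(mydata$ID), ]
# Last row of each subject: this ID does not appear again later
last_rows <- mydata[!duplicated(mydata$ID, fromLast = TRUE), ]
# Both subsets are in the same ID order because the data are sorted by ID
newdf <- data.frame(ID = first_rows$ID, x = first_rows$x,
                    y = last_rows$y - first_rows$y)
newdf
```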

Re: [R] the first and last observation for each subject

2009-01-02 Thread hadley wickham
On Fri, Jan 2, 2009 at 3:20 AM, gallon li gallon...@gmail.com wrote: I have the following data ID x y time 1 10 20 0 1 10 30 1 1 10 40 2 2 12 23 0 2 12 25 1 2 12 28 2 2 12 38 3 3 5 10 0 3 5 15 2 . x is time invariant, ID is the subject id number, y is changing over time. I

Re: [R] the first and last observation for each subject

2009-01-02 Thread Frank E Harrell Jr
Here is a fast approach using the Hmisc package's summarize function.
g <- function(w) {
  time <- w[,'time']; y <- w[,'y']
  c(y[which.min(time)], y[which.max(time)])}
with(DF, summarize(DF, ID, g, stat.name=c('first','last')))
  ID first last
1  1    20   40
2  2    23   38
3  3    10   15
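If Hmisc is not available, the same per-subject function can be driven with base R's split() and sapply(). This is a sketch, not Hmisc's summarize(). It assumes the question's data in a data frame named `DF`, and only the columns g() uses are included. Because first and last are chosen with which.min() and which.max() on time, the rows do not need to be sorted.

```r
# Example data from the question (only the columns g() needs)
DF <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)
# Per-subject function as in the post: y at the earliest and at the latest time
g <- function(w) {
  time <- w[, "time"]
  y <- w[, "y"]
  c(first = y[which.min(time)], last = y[which.max(time)])
}
# split() the rows by ID, apply g() to each piece, then put subjects in rows
res <- t(sapply(split(DF, DF$ID), g))
res
```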

Re: [R] the first and last observation for each subject

2009-01-02 Thread Stavros Macrakis
I think there's a pretty simple solution here, though probably not the most efficient:
t(sapply(split(a,a$ID), function(q) with(q,c(ID=unique(ID),x=unique(x),y=max(y)-min(y)))))
Using 'unique' instead of min or [[1]] has the advantage that if x is in fact not time-invariant, this gives an

Re: [R] the first and last observation for each subject

2009-01-02 Thread Carlos J. Gil Bellosta
Hello, Is y=max(y)-min(y) truly what you want below? Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Fri, 2009-01-02 at 13:16 -0500, Stavros Macrakis wrote: I think there's a pretty simple solution here, though probably not the most efficient:
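The distinction matters whenever y is not monotone in time. max(y) - min(y) measures the spread of y, while the question asks for the last observed y minus the first. In the small sketch below, the rise-then-fall series is hypothetical. For the question's own data, y increases within every subject, so the two formulas happen to agree there.

```r
# A hypothetical subject whose y rises and then falls, listed in time order
y <- c(20, 40, 30)
range_diff <- max(y) - min(y)      # spread of y, ignores time order: 20
last_first <- y[length(y)] - y[1]  # change from first to last observation: 10
c(range_diff = range_diff, last_first = last_first)

# The question's subjects (y in time order): both formulas agree for each
ys <- list(c(20, 30, 40), c(23, 25, 28, 38), c(10, 15))
agree <- sapply(ys, function(v) (max(v) - min(v)) == (v[length(v)] - v[1]))
agree
```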