Note that ddply is a heavyweight solution, and as your data gets larger you may find that using it for small tasks like this hurts performance.

Also, "df" is already the name of a function in the standard stats package
(the density of the F distribution) that you might actually want to use
someday, and reusing the name for a data frame is likely to confuse anyone
reading your code.
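
To illustrate the naming point, here is a minimal sketch you can run in a fresh session. The small data frame is made up for the illustration and is not your data.

```r
# In a fresh session, `df` is already the F-distribution density from stats.
is.function(stats::df)

# Assigning a data frame to the same name masks it for ordinary variable lookup.
df <- data.frame(x = 1:3)        # hypothetical data, for illustration only
masked_class <- class(df)        # "data.frame"

# R skips non-function objects when it looks up a name as a function,
# so the density is still reachable; the cost is mostly reader confusion.
still_a_function <- get("df", mode = "function")

# Clean up so later code sees the original function again.
rm(df)
```

So nothing breaks immediately, but a reader who sees df( ... ) and df[ ... ] in the same script has to work out which "df" is meant each time.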

existingdf <- read.csv( text=
"storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s1,2008-08-07 21:18:00,9.061
s1,2008-08-07 21:19:00,12.082
s1,2008-08-07 21:20:00,15.102
s1,2008-08-07 21:21:00,18.123
s1,2008-08-07 21:22:00,11.143
s1,2008-08-07 21:23:00,0.000
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
s2,2010-10-05 21:02:00,3.625
s2,2010-10-05 21:03:00,5.437
s2,2010-10-05 21:04:00,7.249
s2,2010-10-05 21:05:00,9.061
s2,2010-10-05 21:06:00,0.874
s2,2010-10-05 21:07:00,0.000
", as.is=TRUE )

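library(plyr)
# plyr solution: ddply splits existingdf by storm, applies the function to
# each piece, and stacks the results back into one data frame
newdf <- ddply( existingdf
              , "storm"
              , function( DF ) {
                  # number the rows of this storm 1, 2, ..., nrow( DF )
                  transform( DF
                           , duration=seq.int( length.out=nrow( DF ) ) )
                }
              )

# base R solution: ave applies cumsum to a vector of ones within each
# storm, giving the same running row count without splitting the data frame
newdf2 <- transform( existingdf
                   , duration=ave( rep( 1, nrow(existingdf) )
                                 , storm
                                 , FUN=cumsum ) )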
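
Both solutions above give a row counter that starts at 1, which matches your desired output. Your description also mentions a duration that "runs from zero", so here is a hedged sketch of one other reading: elapsed minutes since the first record of each storm. The column name elapsed_min, the UTC time zone, and the shortened stand-in data are my assumptions, not something from your post.

```r
# Shortened stand-in with the same columns as existingdf, so this runs on its own.
stormdf <- data.frame(
  storm  = c("s1", "s1", "s1", "s2", "s2"),
  Q_time = c("2008-08-07 21:15:00", "2008-08-07 21:16:00", "2008-08-07 21:17:00",
             "2010-10-05 21:00:00", "2010-10-05 21:01:00"),
  Q      = c(0.000, 3.020, 6.041, 0.000, 1.812),
  stringsAsFactors = FALSE
)

# Row counter within each storm (same result as the ave/cumsum approach).
stormdf$duration <- ave(seq_along(stormdf$storm), stormdf$storm, FUN = seq_along)

# Elapsed minutes since the first record of each storm.
# Assumption: the timestamps can be treated as UTC.
tm <- as.POSIXct(stormdf$Q_time, tz = "UTC")
stormdf$elapsed_min <- ave(as.numeric(tm), stormdf$storm,
                           FUN = function(x) (x - min(x)) / 60)

# stormdf$duration    is 1 2 3 1 2
# stormdf$elapsed_min is 0 1 2 0 1
```

The elapsed-time version still gives sensible values if a storm has missing or irregularly spaced records, where a plain row count would not.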


On Wed, 16 Apr 2014, Steve E. wrote:

Dear R Community,

I am having some trouble with a task that I hope you might be able to help
with. I have a dataset that includes the time and corresponding stream
discharge from numerous storms (example of structure with simplified data
below). I would like to produce a field that details the duration of each
storm, where each storm is a subset of the data and the duration runs from
zero to end for each unique storm. I have been trying to accomplish this
with ddply but to no avail as I am unable to provide ddply (e.g., below)
with the length of the storm (i.e., subset of data). Thank you in advance,
any help would be appreciated.


existing df:
storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s1,2008-08-07 21:18:00,9.061
s1,2008-08-07 21:19:00,12.082
s1,2008-08-07 21:20:00,15.102
s1,2008-08-07 21:21:00,18.123
s1,2008-08-07 21:22:00,11.143
s1,2008-08-07 21:23:00,0.000
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
s2,2010-10-05 21:02:00,3.625
s2,2010-10-05 21:03:00,5.437
s2,2010-10-05 21:04:00,7.249
s2,2010-10-05 21:05:00,9.061
s2,2010-10-05 21:06:00,0.874
s2,2010-10-05 21:07:00,0.000

desired df:
storm,Q_time,Q, duration
s1,2008-08-07 21:15:00,0.000,1
s1,2008-08-07 21:16:00,3.020,2
s1,2008-08-07 21:17:00,6.041,3
s1,2008-08-07 21:18:00,9.061,4
s1,2008-08-07 21:19:00,12.082,5
s1,2008-08-07 21:20:00,15.102,6
s1,2008-08-07 21:21:00,18.123,7
s1,2008-08-07 21:22:00,11.143,8
s1,2008-08-07 21:23:00,0.000,9
s2,2010-10-05 21:00:00,0.000,1
s2,2010-10-05 21:01:00,1.812,2
s2,2010-10-05 21:02:00,3.625,3
s2,2010-10-05 21:03:00,5.437,4
s2,2010-10-05 21:04:00,7.249,5
s2,2010-10-05 21:05:00,9.061,6
s2,2010-10-05 21:06:00,0.874,7
s2,2010-10-05 21:07:00,0.000,8

I have been trying variations of the following statement, but I cannot seem
to get the length of the subset correct as I receive an error of the type
'Error: arguments imply differing number of rows: 2401, 0'.

newdf <- ddply(df, "storm", transform, FUN = function(x)
{duration=seq(from=1, by=1, length.out=nrow(x))})

I would really like to get a handle on ddply in this instance as it will be
quite helpful for many other similar calculations that I need to do with
this dataset.

Thanks again,
Stevan




--
View this message in context: 
http://r.789695.n4.nabble.com/help-incorporating-data-subset-lengths-in-function-with-ddply-tp4688926.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
