Re: [R] exercise in frustration: applying a function to subsamples

Erik Iverson Mon, 12 Jul 2010 12:21:52 -0700

Your code is not reproducible. Can you come up with a small example showing the crux of your data structures/problem that we can all run in our R sessions? You're likely to get much higher quality responses this way.


Ted Byers wrote:

From the documentation I have found, it seems that one of the functions from
package plyr, or a combination of functions like split and lapply would
allow me to have a really short R script to analyze all my data (I have
reduced it to a couple hundred thousand records with about half a dozen
records).


I get the same result from ddply and split/lapply:

ddply(moreinfo,c("m_id","sale_year","sale_week"),
+       function(df) data.frame(res = fitdist(df$elapsed_time,"exp"),est =
res$estimate,sd = res$sd))
Error in fitdist(df$elapsed_time, "exp") :
  data must be a numeric vector of length greater than 1

and

lapply(split(moreinfo,list(moreinfo$m_id,moreinfo$sale_year,moreinfo$sale_week)),
+       function(df) fitdist(df$elapsed_time,"exp"))
Error in fitdist(df$elapsed_time, "exp") :
  data must be a numeric vector of length greater than 1


Now, in retrospect, unless I misunderstood the properties of a data.frame, I
suppose a data.frame might not have been entirely appropriate as the m_id
samples start and end on very different dates, but I would have thought a
list data structure should have been able to handle that.  It would seem
that split is making groups that have the same start and end dates (or that
if, for example, I have sale data for precisely the last year, split would
insist on both 2009 and 2010 having weeks from 0 through 52 instead of just
the weeks in each year that actually have data: 26 through 52 for last year
and 1 through 25 for this year).  I don't see how else the data passed to
fitdist could have a sample size of 0.
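
A minimal sketch of the behaviour described above, using made-up data (the `toy` data frame and its values are assumptions, not the real `moreinfo`). When `split()` is given a list of factors it builds a group for every combination of levels, so year/week pairs that never occur come back as zero-row data frames; `drop = TRUE` discards them. A group with a single row would presumably still trip the "length greater than 1" check, so the sketch also filters on group size. The exponential rate is estimated as 1 / mean(x), a stand-in for `fitdist()` so that the example runs in base R.

```r
# Toy data: 2009 only has weeks 51-52, 2010 only has weeks 1-2 (hypothetical values)
toy <- data.frame(
  m_id         = c(1, 1, 1, 1, 1, 1, 1),
  sale_year    = c(2009, 2009, 2009, 2009, 2010, 2010, 2010),
  sale_week    = c(51, 51, 52, 52, 1, 1, 2),
  elapsed_time = c(1.2, 0.7, 3.1, 0.4, 2.2, 0.9, 1.5)
)

# Default: every m_id x year x week combination becomes a group, even empty ones
all_groups <- split(toy, list(toy$m_id, toy$sale_year, toy$sale_week))
sizes_all  <- sapply(all_groups, nrow)

# drop = TRUE keeps only the combinations that actually occur in the data
groups <- split(toy, list(toy$m_id, toy$sale_year, toy$sale_week), drop = TRUE)
sizes  <- sapply(groups, nrow)

# Keep groups with at least two observations before fitting anything
fit_ok <- groups[sizes > 1]

# Stand-in for fitdist(): the exponential MLE of the rate is 1 / mean(x)
rates <- sapply(fit_ok, function(df) 1 / mean(df$elapsed_time))
```

Inspecting `sizes_all` against `sizes` shows which groups exist only because of the level combinations, and `fit_ok` is what could then be passed to `lapply()` with the real fitting function.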

I'd appreciate understanding how to resolve this.  However, it isn't a show
stopper as it now seems trivial to just break it out into a loop (followed
by a lapply/split combo using only sale year and sale month).

While I am asking, is there a better way to split such temporally ordered
data into weekly samples that respect the year in which the sample is
taken as well as the week in which it is taken?
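
One possible approach, sketched under the assumption that each record carries a `Date` column (called `sale_date` here, a hypothetical name): derive the year and the week of the year with `format()`, paste them into a single key, and split on that key. Because the key is built from the data itself, only year/week pairs that actually occur become groups. Note that `"%U"` numbers weeks from 00 to 53 with Sunday as the first day of the week; other week conventions would need a different format code.

```r
# Hypothetical sale dates spanning a year boundary
sales <- data.frame(
  sale_date    = as.Date(c("2009-12-28", "2009-12-30", "2010-01-04", "2010-01-05")),
  elapsed_time = c(1.2, 0.7, 2.2, 0.9)
)

# Calendar year and week of the year (weeks start on Sunday, numbered 00-53)
sales$sale_year <- format(sales$sale_date, "%Y")
sales$sale_week <- format(sales$sale_date, "%U")

# One key per year/week; only keys present in the data become groups
sales$year_week <- paste(sales$sale_year, sales$sale_week, sep = "-")
by_week <- split(sales$elapsed_time, sales$year_week)
```

Each element of `by_week` is then a plain numeric vector for one week of one year, which can be checked for length before fitting.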

Thanks

Ted

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

