On Nov 20, 2013, at 11:35 AM, Trevor Walker wrote:

> I often work with tree data that is sampled with probability proportional
> to size, which presents a special challenge when describing the frequency
> distribution. For example, R functions like quantile() and fitdistr()
> expect each observation to have equal sample probability. As a workaround,
> I have been "exploding"/"mushrooming" my data based on the appropriate
> expansion factors. However, this can take a LONG TIME and I am reaching
> out for more efficient suggestions, particularly for the quantile()
> function. Example of my workaround:
>
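The expansion step can be skipped if the quantiles are read off the weighted empirical CDF directly, using trees per acre as the weights. Below is a minimal base-R sketch of that idea. `weighted_quantile` is a hypothetical helper name, not a function from any package, and it assumes non-negative weights. It returns the smallest observed value whose cumulative weight share reaches each probability, so with whole-number weights it should agree with quantile(..., type = 1) on the exploded data rather than with the interpolating default (type = 7).

```r
# Hypothetical helper (not from any package): quantiles of x in which
# observation i carries weight w[i], read off the weighted empirical CDF.
weighted_quantile <- function(x, w, probs = c(0, 0.25, 0.5, 0.75, 1)) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_w <- cumsum(w) / sum(w)       # weighted ECDF at each sorted x
  cum_w[length(cum_w)] <- 1         # guard against floating-point shortfall
  # index of the smallest x whose cumulative weight share reaches p
  idx <- vapply(probs, function(p) which(cum_w >= p)[1], integer(1))
  x[idx]
}

# Same sample design as in the quoted example (basal area factor of 20)
set.seed(1)
trees.df <- data.frame(Diameter = rnorm(10, mean = 10, sd = 2))
trees.df$TreesPerAcre <- 20 / (trees.df$Diameter^2 * pi / 576)

# Weighted percentiles without building the exploded vector
weighted_quantile(trees.df$Diameter, trees.df$TreesPerAcre)

# If the exploded vector is still wanted, rep() builds it without loops
explodeFactor <- 10
exploded <- rep(trees.df$Diameter,
                times = round(trees.df$TreesPerAcre * explodeFactor))
quantile(exploded, type = 1)
```

The two results can differ slightly because the exploded version rounds the expansion factors to whole trees, while the weighted version uses them as they are.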
The 'Hmisc' package has a `wtd.quantile` function. I seem to remember that it
might have been borrowed from the quantreg package.

> # trees.df represents random sample with probability proportional to size
> # (of diameter) using "basal area factor" of 20
> trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2),
>                        TreesPerAcre=numeric(10))
> trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576)  # expansion factor for each observation
>
> # to obtain percentiles that are weighted by trees per acre, "explode" diameter data
> explodeFactor <- 10  # represents ten acres
> treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor ))
> explodedDiameters.df <- data.frame(Diameter=numeric(treeCount))
> k=0  # initialize counter k
> for (i in 1:length(trees.df$Diameter)){
>   for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){
>     k <- k +1
>     explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i]
>   }
> }
>
> quantile(explodedDiameters.df$Diameter)  # appropriate percentiles (for trees per acre)
> quantile(trees.df$Diameter)  # percentiles biased upwards
>
> Trevor Walker

--
David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.