Hi Nate,

Here is my 2 cents worth after coming in late to this discussion.

The fact that your data are proportions is important as it suggests how the data may vary. Do you have the numerator and denominator used to calculate the proportions? If so then I would suggest that you should be performing a binomial GLM with these data.
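A minimal sketch of what such a binomial GLM could look like in R, assuming the counts are available. The data frame `trap`, its columns (`bipedal`, `total`) and the covariate `moon` are invented purely for illustration and are not Nate's data:

```r
# Hypothetical nightly counts: bipedal rodents caught (numerator) out of
# all captures (denominator), plus a made-up covariate for the example.
trap <- data.frame(
  bipedal = c(2, 0, 5, 3, 8, 1, 6, 4),
  total   = c(50, 40, 45, 60, 55, 35, 48, 52),
  moon    = c(0.1, 0.9, 0.2, 0.5, 0.0, 0.8, 0.3, 0.6)
)

# Two-column response: successes and failures on each night.
fit <- glm(cbind(bipedal, total - bipedal) ~ moon,
           family = binomial, data = trap)

# Coefficients are on the logit scale.
summary(fit)$coefficients

# Fitted proportions, back on the [0, 1] scale.
fitted(fit)
```

Nights with zero captures of bipedal rodents enter the model as they are, so no constant needs to be added before fitting.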

If you don't have these data, or are disinclined to use them for some reason (why?), then I would strongly suggest considering an asin(sqrt(p)) transformation, where p is in [0,1]. There is some justification for this: namely, the transformation approximately stabilises the variance of a binomial proportion. That is, it makes the use of unweighted least squares more appropriate but, of course, the distributional assumptions leading to tests of significance etc. may still require checking.
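The variance argument for asin(sqrt(p)) can be checked with a small simulation. This is only a sketch: the number of trials per observation (`n`) and the true proportions (`p_true`) are assumed values chosen to resemble the range under discussion.

```r
set.seed(1)
n      <- 50                          # assumed trials per observation
p_true <- c(0.05, 0.10, 0.20, 0.35)   # assumed true proportions

# For each true proportion, simulate observed proportions and compare the
# variance on the raw scale with the variance after asin(sqrt(.)).
sim <- sapply(p_true, function(p) {
  p_hat <- rbinom(10000, size = n, prob = p) / n
  c(var_raw = var(p_hat), var_asin = var(asin(sqrt(p_hat))))
})
colnames(sim) <- p_true
sim

# Large-sample variance on the transformed scale, free of p
1 / (4 * n)
```

On the raw scale the variance follows p(1 - p)/n and so changes several-fold across these proportions, while on the transformed scale it stays much closer to 1/(4n). The approximation is rougher when n * p is small.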

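The log transformation has a similar motivation, but for a different situation. It stabilises the variance when the standard deviation is proportional to the mean (a constant coefficient of variation). For Poisson data, where the variance equals the mean, the variance-stabilising transformation is the square root.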

I find it interesting that these pieces of info were passed down to me by my PhD supervisor, who (like Carsten's supervisor) was right about so many things.

HTH,

Scott

Nate Upham wrote:
Thanks very much indeed Carsten and Philippe!
Lots to consider. I should have specified this before, but the variable with zero values that I would like to log (ln) transform consists of many small values. It ranges between 0.00 and 0.35, since it is the proportional abundance of bipedal rodents captured on a given night of trapping:

Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050, 0.160, 0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110, 0.280, 0.170, 0.000, 0.230, 0.140, 0.340, 0.000, 0.000, 0.000, 0.150, 0.020, 0.093, 0.065, 0.043, 0.030, 0.030, 0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140, 0.025, 0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010, 0.050, 0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057, 0.107, 0.110, 0.190, 0.110, 0.055, 0.030, 0.091, 0.090, 0.020, 0.350, 0.200, 0.177, 0.350)

From your "rules of thumb" advice, it sounds like adding 1 to these data through log1p() might be quite distorting to the analyses. This would deal with the issue of zero values (log(0 + 1) = 0), but a small positive value such as 0.01 would go from log(0.01) = -4.605 to log(0.01 + 1) = 0.00995. Adding 0.5 is only slightly better (log(0.01 + 0.5) = -0.6733). Should I assume that this effect will "even out" over all values since the log(x + 1) transformation is applied to the entire variable?
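The arithmetic above can be reproduced directly. In this small sketch the vector `x` is an illustrative handful of values spanning the observed range, and the third offset (0.0035) is borrowed from option 1 further down:

```r
x       <- c(0, 0.01, 0.05, 0.35)   # illustrative values in the observed range
offsets <- c(1, 0.5, 0.0035)        # constants c to try in log(x + c)

# One column of transformed values per offset.
shifted <- sapply(offsets, function(c0) log(x + c0))
dimnames(shifted) <- list(x = x, c = offsets)
round(shifted, 4)

# Spread (max - min) of the transformed values under each offset.
apply(shifted, 2, function(v) diff(range(v)))
```

Over [0, 0.35] log(x + 1) is close to x itself, so with c = 1 the variable is barely transformed at all, whereas a small c stretches the low end of the scale considerably.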

Or, is it best to go with one of these alternatives for the c in log(x +c):
1.  c <- signif(0.5*sort(unique(Y))[2], 2)   #c=0.0035
2.  c <- (quantile(Y)[2]^2)/quantile(Y)[4]   #c=0.0048
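For reference, both constants can be recomputed from the Y vector given above. This is a sketch that repeats Y so it runs on its own. It uses min(Y[Y > 0]) in place of sort(unique(Y))[2], which gives the same value here only because Y contains zeros:

```r
Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050, 0.160, 0.130,
       0.150, 0.040, 0.080, 0.130, 0.150, 0.110, 0.280, 0.170, 0.000, 0.230,
       0.140, 0.340, 0.000, 0.000, 0.000, 0.150, 0.020, 0.093, 0.065, 0.043,
       0.030, 0.030, 0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140, 0.025,
       0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010, 0.050, 0.010,
       0.000, 0.043, 0.087, 0.040, 0.020, 0.057, 0.107, 0.110, 0.190, 0.110,
       0.055, 0.030, 0.091, 0.090, 0.020, 0.350, 0.200, 0.177, 0.350)

# Option 1: half of the smallest non-zero value, to 2 significant digits.
c1 <- signif(0.5 * min(Y[Y > 0]), 2)

# Option 2: (first quartile)^2 / third quartile, with R's default
# quantile definition (type 7).
q  <- unname(quantile(Y, probs = c(0.25, 0.75)))
c2 <- q[1]^2 / q[2]

c(c1 = c1, c2 = round(c2, 4))   # c1 = 0.0035, c2 = 0.0048
```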

Does anyone have English references for alternatives 1 or 2?
This is super helpful, many thanks!
--Nate


On Jun 24, 2009, at 6:59 AM, Matthew Landis wrote:

Many thanks to Carsten, Philippe, and Nate for a very informative and entertaining discussion of something I have always wondered about, having heard suggestions for both approaches. At least now I have a better understanding of the rationale for each!

Matt

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthew Landis
Dept. Biology
Middlebury College
Middlebury VT 05753
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

_________________________________
Nathan S. Upham
Ph.D. student
Committee on Evolutionary Biology
University of Chicago
1025 E. 57th St., Culver 402
Chicago, IL 60637
nsup...@uchicago.edu
_________________________________


_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

--
Scott Foster
CSIRO Mathematical and Information Sciences
GPO Box 1538
Castray Esplanade
Hobart 7001
Tasmania Australia

Phone:     (03) 6232 5178
Fax:       (03) 6232 5000
Email:     scott.fos...@csiro.au