Hi Nate,
Here are my two cents' worth, coming in late to this discussion.
The fact that your data are proportions is important, as it suggests how
the data are likely to vary. Do you have the numerators and denominators
used to calculate the proportions? If so, I would suggest fitting a
binomial GLM to these data.
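As an illustration, here is a minimal R sketch of such a model. It assumes hypothetical per-night counts n_bipedal (bipedal rodents caught) and n_total (all rodents caught) plus a made-up covariate moon. None of these are in the data as posted, so the values are simulated:

```r
# Hypothetical data: counts per trap night and an invented covariate.
set.seed(1)
n_nights  <- 40
moon      <- runif(n_nights)                  # hypothetical covariate in [0, 1]
n_total   <- rpois(n_nights, 30) + 1          # at least one capture per night
p_true    <- plogis(-2.5 + 1.5 * moon)        # true proportion, linear on the logit scale
n_bipedal <- rbinom(n_nights, size = n_total, prob = p_true)

# Binomial GLM with a two-column (successes, failures) response, so nights
# with more captures automatically carry more weight.
fit <- glm(cbind(n_bipedal, n_total - n_bipedal) ~ moon, family = binomial)
summary(fit)

# Equivalent formulation: the proportion as response, the totals as weights.
fit2 <- glm(n_bipedal / n_total ~ moon, family = binomial, weights = n_total)
```

The weighting by n_total is exactly the information that is lost once the counts are reduced to proportions.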
If you don't have these counts, or are disinclined to use them for some
reason (why?), then I would strongly suggest considering an asin(sqrt(p))
transformation, where p is in [0,1]. There is some justification for
this: the transformation stabilises the variance of a binomial
proportion. That is, it makes unweighted least squares more appropriate,
but, of course, the distributional assumptions behind tests of
significance etc. may still need checking.
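A quick simulation sketch shows the effect. It uses an arbitrary, hypothetical 200 trials behind each proportion. The raw variance of the observed proportion changes with p, while the variance of asin(sqrt(p)) stays near 1/(4n), the delta-method approximation:

```r
# Simulation sketch: does asin(sqrt(p)) stabilise the variance of a binomial proportion?
set.seed(42)
n_trials <- 200                            # hypothetical number of trials per proportion
p_grid   <- c(0.05, 0.10, 0.20, 0.35, 0.50)
n_sims   <- 20000

res <- t(sapply(p_grid, function(p) {
  phat <- rbinom(n_sims, size = n_trials, prob = p) / n_trials
  c(p        = p,
    var_raw  = var(phat),                  # approx. p(1 - p)/n, so it changes with p
    var_asin = var(asin(sqrt(phat))))      # approx. 1/(4n), roughly constant in p
}))
print(signif(res, 3))
cat("delta-method value 1/(4n):", 1 / (4 * n_trials), "\n")
```

Note that asin(sqrt(0)) is 0. Unlike log(), this transformation therefore needs no added constant to cope with zeros.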
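The log transformation has a similar motivation, but for a different
situation: it stabilises the variance when the standard deviation is
proportional to the mean (a constant coefficient of variation). For
Poisson data, the variance-stabilising transformation is the square root.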
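The same kind of simulation can be used to check which transformation suits which mean-variance relationship. This sketch uses arbitrary means and a gamma shape of 4, both chosen only for illustration. For Poisson counts, the variance of sqrt(y) settles near 1/4 whatever the mean, while the variance of log(y + 1) keeps shrinking as the mean grows. For gamma data, whose standard deviation is proportional to the mean, log(y) has a constant variance of trigamma(4), about 0.284:

```r
# Simulation sketch: which transformation stabilises which mean-variance relationship?
set.seed(7)
means  <- c(5, 10, 20, 50, 100)   # arbitrary means to compare
n_sims <- 20000

res <- t(sapply(means, function(mu) {
  y <- rpois(n_sims, mu)                          # Poisson: variance = mean
  g <- rgamma(n_sims, shape = 4, scale = mu / 4)  # gamma: mean mu, sd = mu / 2
  c(mean      = mu,
    pois_var  = var(y),
    pois_sqrt = var(sqrt(y)),      # approx. 1/4 for moderate-to-large means
    pois_log1 = var(log(y + 1)),   # approx. mu / (mu + 1)^2, shrinks as the mean grows
    gamma_log = var(log(g)))       # trigamma(4), about 0.284, whatever the mean
}))
print(signif(res, 3))
```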
I find it interesting that these pieces of info were passed down to me
by my PhD supervisor, who (like Carsten's supervisor) was right about so
many things.
HTH,
Scott
Nate Upham wrote:
Thanks very much indeed, Carsten and Philippe!
Lots to consider. I should have specified this before, but the
variable with zero values that I would like to log (ln) transform
consists of many small values. The values range from 0.00 to 0.35,
because this variable is the proportional abundance (0.35 = 35%) of
bipedal rodents captured on a given night of trapping:
Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050, 0.160,
0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110, 0.280, 0.170, 0.000,
0.230, 0.140, 0.340, 0.000, 0.000, 0.000, 0.150, 0.020, 0.093, 0.065,
0.043, 0.030, 0.030, 0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140,
0.025, 0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010, 0.050,
0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057, 0.107, 0.110, 0.190,
0.110, 0.055, 0.030, 0.091, 0.090, 0.020, 0.350, 0.200, 0.177, 0.350)
From your "rules of thumb" advice, it sounds like adding 1 to these
data through log1p() might be quite distorting to the analyses. This
would deal with the issue of zero values (log(0 + 1) = 0), but small
positive values such as 0.01 would go from -4.605 under log(x) to
0.00995 under log(x + 1). Adding 0.5 is only slightly better
(log(0.01 + 0.5) = -0.6733).
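The figures above are easy to reproduce in R. The sketch also checks how far log1p(x) departs from x itself over an assumed grid covering the 0 to 0.35 range of Y. log1p() is base R and computes log(1 + x) accurately for small x:

```r
x <- 0.01
log(x)          # about -4.605
log1p(x)        # about 0.00995, i.e. log(x + 1)
log(x + 0.5)    # about -0.6733

# Over the observed range, log(x + 1) stays close to the identity,
# since log1p(x) = x - x^2/2 + ... for small x.
grid    <- seq(0, 0.35, by = 0.005)
max_gap <- max(abs(log1p(grid) - grid))   # largest gap occurs at x = 0.35
max_gap
cor(grid, log1p(grid))
```

In other words, on data this small, log(x + 1) barely transforms the values at all.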
Should I assume that this effect will "even out" over all values since
the log(x+1) transformation is applied to the entire variable?
Or is it best to go with one of these alternatives for the c in
log(x + c):
1. c <- signif(0.5 * sort(unique(Y))[2], 2)  # half the smallest non-zero value; c = 0.0035
2. c <- quantile(Y)[2]^2 / quantile(Y)[4]    # (1st quartile)^2 / (3rd quartile); c = 0.0048
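As a check, both candidate constants can be recomputed from the Y vector above. Here c1 and c2 are hypothetical names used in place of c, so that the base function c() stays unmasked. The expression min(Y[Y > 0]) equals sort(unique(Y))[2] only because Y contains zeros. The last lines sketch what log(x + c) does to a zero and to the smallest non-zero value, 0.007, under each constant:

```r
Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050, 0.160,
       0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110, 0.280, 0.170, 0.000,
       0.230, 0.140, 0.340, 0.000, 0.000, 0.000, 0.150, 0.020, 0.093, 0.065,
       0.043, 0.030, 0.030, 0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140,
       0.025, 0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010, 0.050,
       0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057, 0.107, 0.110, 0.190,
       0.110, 0.055, 0.030, 0.091, 0.090, 0.020, 0.350, 0.200, 0.177, 0.350)

# Alternative 1: half the smallest non-zero value, rounded to 2 significant digits
c1 <- signif(0.5 * min(Y[Y > 0]), 2)

# Alternative 2: (first quartile)^2 / (third quartile), R's default quantile type 7
q  <- quantile(Y, c(0.25, 0.75), names = FALSE)
c2 <- q[1]^2 / q[2]

c(n = length(Y), zeros = sum(Y == 0), c1 = c1, c2 = c2)

# log(x + c) for a zero and for the smallest non-zero value, per constant
sapply(c(c1 = c1, c2 = c2, c_half = 0.5, c_one = 1),
       function(cc) log(c(0, 0.007) + cc))
```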
Does anyone have English-language references for alternatives 1 or 2?
This is super helpful, many thanks!
--Nate
On Jun 24, 2009, at 6:59 AM, Matthew Landis wrote:
Many thanks to Carsten, Philippe, and Nate for a very informative
and entertaining discussion of something I have always wondered
about, having heard suggestions for both approaches. At least now I
have a better understanding of the rationale for each!
Matt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthew Landis
Dept. Biology
Middlebury College
Middlebury VT 05753
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
_________________________________
Nathan S. Upham
Ph.D. student
Committee on Evolutionary Biology
University of Chicago
1025 E. 57th St., Culver 402
Chicago, IL 60637
nsup...@uchicago.edu
_________________________________
_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
--
Scott Foster
CSIRO Mathematical and Information Sciences
GPO Box 1538
Castray Esplanade
Hobart 7001
Tasmania
Australia
Phone: (03) 6232 5178
Fax: (03) 6232 5000
Email: scott.fos...@csiro.au