Re: [R-sig-eco] Log transforming zero value data

Scott Wed, 24 Jun 2009 16:23:40 -0700

Hi Nate,

Here is my 2 cents worth after coming in late to this discussion.

The fact that your data are proportions is important as it suggests how the data may vary. Do you have the numerator and denominator used to calculate the proportions? If so then I would suggest that you should be performing a binomial GLM with these data.
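As a minimal sketch of what such a binomial GLM could look like in R: the counts, the `habitat` covariate and all object names below are invented for illustration and are not Nate's data. The point is only that the model is fitted to (successes, failures) pairs, so nights with zero bipedal captures need no special handling.

```r
# Hypothetical trapping data: every number here is simulated, not real.
set.seed(1)
n_nights <- 30
total    <- rpois(n_nights, lambda = 40) + 1      # all rodents caught per night (denominator)
habitat  <- runif(n_nights)                        # made-up covariate
p_true   <- plogis(-3 + 2 * habitat)               # assumed true proportion bipedal
bipedal  <- rbinom(n_nights, size = total, prob = p_true)  # bipedal rodents caught (numerator)

# Binomial GLM on the two-column response cbind(successes, failures)
fit <- glm(cbind(bipedal, total - bipedal) ~ habitat, family = binomial)
coef(fit)   # intercept and slope on the logit scale
```

Because the denominators enter the likelihood, nights with many captures carry more weight than nights with few, which a transformation of the bare proportions cannot do.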

If you don't have these data, or are disinclined to use them for some reason (why?), then I would strongly suggest considering an asin(sqrt(p)) transformation, where p is in [0,1]. There is some justification for this: namely, that this transformation stabilises the variance of a binomial variable. That is, it makes the use of unweighted least squares more appropriate but, of course, the distributional assumptions leading to tests of significance etc. may still require checking.
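A small simulation can illustrate the variance-stabilising property. This is only a sketch under assumed values: the number of trials `n` and the true proportions in `p` are arbitrary choices, and the approximation gets worse when n * p is small.

```r
set.seed(42)
n <- 100                      # assumed number of trials behind each proportion
p <- c(0.10, 0.20, 0.35)      # assumed true proportions

# 10000 simulated proportions for each value of p (one column per p)
sims <- sapply(p, function(pr) rbinom(10000, size = n, prob = pr) / n)

raw_var <- apply(sims, 2, var)               # changes with p, like p * (1 - p) / n
asn_var <- apply(asin(sqrt(sims)), 2, var)   # close to 1 / (4 * n) for each p

round(rbind(raw = raw_var, arcsine = asn_var), 5)
```

On the raw scale the variance more than doubles across these proportions, while on the asin(sqrt(p)) scale it stays nearly constant.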

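The log transformation has similar motivation, but for a different situation. It is the variance-stabilising transformation for data whose standard deviation is proportional to the mean; for Poisson data, where the variance equals the mean, the square root plays that role.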

I find it interesting that these pieces of info were passed down to me by my PhD supervisor, who (like Carsten's supervisor) was right about so many things.


HTH,

Scott

Nate Upham wrote:

Thanks very much indeed Carsten and Philippe!
Lots to consider. I should have specified this before, but the variable with zero values that I would like to log (ln) transform does consist of many small values. The range is between 0.00 and 0.35, since this variable is the proportional abundance of bipedal rodents captured on a given night of trapping:
Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050,
       0.160, 0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110,
       0.280, 0.170, 0.000, 0.230, 0.140, 0.340, 0.000, 0.000,
       0.000, 0.150, 0.020, 0.093, 0.065, 0.043, 0.030, 0.030,
       0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140, 0.025,
       0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010,
       0.050, 0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057,
       0.107, 0.110, 0.190, 0.110, 0.055, 0.030, 0.091, 0.090,
       0.020, 0.350, 0.200, 0.177, 0.350)
From your "rules of thumb" advice, it sounds like adding 1 to this data through log1p() might be quite distorting to the analyses. This would deal with the issue of zero values (log(0+1)=0), but small positive values such as 0.01 would go from -4.605 under log(x) to 0.00995 under log(x+1). Adding 0.5 is only slightly better (log(0.01+0.5) = -0.6733). Should I assume that this effect will "even out" over all values since the log(x+1) transformation is applied to the entire variable?
Or, is it best to go with one of these alternatives for the c in log(x+c):
1.  c <- signif(0.5*sort(unique(Y))[2], 2)   #c=0.0035
2.  c <- (quantile(Y)[2]^2)/quantile(Y)[4]   #c=0.0048
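For reference, the arithmetic in this message can be reproduced as follows. This is only a sketch: `c1`, `c2`, `consts` and `shifted` are names introduced here (chosen to avoid masking R's `c()` function), and alternative 1 relies on Y containing at least one zero, so that the second-smallest unique value is the smallest non-zero one.

```r
Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050,
       0.160, 0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110,
       0.280, 0.170, 0.000, 0.230, 0.140, 0.340, 0.000, 0.000,
       0.000, 0.150, 0.020, 0.093, 0.065, 0.043, 0.030, 0.030,
       0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140, 0.025,
       0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010,
       0.050, 0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057,
       0.107, 0.110, 0.190, 0.110, 0.055, 0.030, 0.091, 0.090,
       0.020, 0.350, 0.200, 0.177, 0.350)

# Alternative 1: half the smallest non-zero value, to 2 significant digits
c1 <- signif(0.5 * sort(unique(Y))[2], 2)

# Alternative 2: first quartile squared, divided by the third quartile
q  <- unname(quantile(Y))     # min, Q1, median, Q3, max
c2 <- q[2]^2 / q[4]

# Where a zero and a small positive value (0.01) end up under each constant
consts  <- c(alt1 = c1, alt2 = c2, half = 0.5, one = 1)
shifted <- sapply(consts, function(k) log(c(0, 0.01) + k))
rownames(shifted) <- c("x = 0", "x = 0.01")
round(shifted, 4)
```

With these data c1 is 0.0035 and c2 is about 0.0048, matching the values quoted above. The table makes the trade-off visible: the small constants keep 0.01 close to log(0.01) = -4.605 but place the zeros far out in the lower tail, whereas c = 1 compresses both values to almost the same point.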

Does anyone have English references for alternatives 1 or 2?
This is super helpful, many thanks!
--Nate


On Jun 24, 2009, at 6:59 AM, Matthew Landis wrote:
Many thanks to Carsten, Philippe, and Nate for a very informative and entertaining discussion of something I have always wondered about, having heard suggestions for both approaches. At least now I have a better understanding of the rationale for each!
Matt

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthew Landis
Dept. Biology
Middlebury College
Middlebury VT 05753
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
_________________________________
Nathan S. Upham
Ph.D. student
Committee on Evolutionary Biology
University of Chicago
1025 E. 57th St., Culver 402
Chicago, IL 60637
nsup...@uchicago.edu
_________________________________






_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--
Scott Foster
CSIRO Mathematical and Information Sciences
GPO Box 1538
Castray Esplanade
Hobart 7001

Tasmania, Australia


Phone:     (03) 6232 5178
Fax:       (03) 6232 5000
Email:     scott.fos...@csiro.au
