[R] unexpected result in glm (family=poisson) for data with an only zero response in one factor

2006-09-13 Thread Antonin Ferry
Dear members,
here is my trouble: My data consists of counts of trapped insects in different 
attractive traps. I usually use GLMs with a poisson error distribution to find 
out the differences between my traitments (and to look at other factor 
effects). But for some dataset where one traitment contains only zeros, GLM 
with poisson family fail to find any difference between this particular 
traitment and anyother one (even with traitment that have trapped a lot of 
insects). GLMs with gaussian family does not seem to have the same problem but 
GLMs with binomial family does.
I'm not sure if it is a statistical problem or if it comes from R... in the 
latter case I think some solution exists (perhaps in the options of the glm() 
function ?).
Thank you for your help.


Here I figure out an exemple to past in the console:

## START 
##
# Take a data set of counts for two traitments, one containing only zeros
A=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
B=c(1,0,0,0,2,1,0,0,1,2,0,0,0,1,2,2,0,1,1,0,1,0,2,1,1,0,1,2,0,1,0,1,1,1,0,1,1,1,0,1)
traitment=c(rep(A,40),rep(B,40))
response=c(A,B)
mydata=data.frame(traitment ,response)


# Make a GLM on this dataset , with family=poisson

 g=glm(response~traitment, data=mydata, family=poisson)
 anova.glm(g,test=Chisq)
# There is an effect of the traitment ...

 summary(g)
# But traitment A does not differ from traitment B ! ! ! (the pvalue is always 
close from 1 in such cases)

# Now if you replace only one zero of the A reponse to 1, the GLM works 
properly:
 mydata[1,2]=1
 g=glm(response~traitment, data=mydata, family=poisson)
 anova.glm(g,test=Chisq)
 summary(g)
#
  END ##



Antonin Ferry (PhD)

Laboratoire d'Ecobiologie des Insectes Parasitoides
http://www.parasitoides.univ-rennes1.fr
Université de Renes1, FRANCE

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unexpected result in glm (family=poisson) for data with an only zero response in one factor

2006-09-13 Thread vito muggeo
Dear Antonin,
It is a statistical problem: the well-known monotone likelihood.

In this case ML estimate does not exist (or equals infinity) and Wald 
approximations (ob which SE are based) does not hold.

However LRT is valid and provides reliable results.

As far as I know, the only software dealing with monotone likelihood 
problems in loglinear models is LogXact by cytel corporation.

best,
vito


Antonin Ferry wrote:
 Dear members,
 here is my trouble: My data consists of counts of trapped insects in 
 different attractive traps. I usually use GLMs with a poisson error 
 distribution to find out the differences between my traitments (and to look 
 at other factor effects). But for some dataset where one traitment contains 
 only zeros, GLM with poisson family fail to find any difference between this 
 particular traitment and anyother one (even with traitment that have trapped 
 a lot of insects). GLMs with gaussian family does not seem to have the same 
 problem but GLMs with binomial family does.
 I'm not sure if it is a statistical problem or if it comes from R... in the 
 latter case I think some solution exists (perhaps in the options of the glm() 
 function ?).
 Thank you for your help.
 
 
 Here I figure out an exemple to past in the console:
 
 ## START 
 ##
 # Take a data set of counts for two traitments, one containing only zeros
 A=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
 B=c(1,0,0,0,2,1,0,0,1,2,0,0,0,1,2,2,0,1,1,0,1,0,2,1,1,0,1,2,0,1,0,1,1,1,0,1,1,1,0,1)
 traitment=c(rep(A,40),rep(B,40))
 response=c(A,B)
 mydata=data.frame(traitment ,response)
 
 
 # Make a GLM on this dataset , with family=poisson
 
  g=glm(response~traitment, data=mydata, family=poisson)
  anova.glm(g,test=Chisq)
 # There is an effect of the traitment ...
 
  summary(g)
 # But traitment A does not differ from traitment B ! ! ! (the pvalue is 
 always close from 1 in such cases)
 
 # Now if you replace only one zero of the A reponse to 1, the GLM works 
 properly:
  mydata[1,2]=1
  g=glm(response~traitment, data=mydata, family=poisson)
  anova.glm(g,test=Chisq)
  summary(g)
 #
   END ##
 
 
 
 Antonin Ferry (PhD)
 
 Laboratoire d'Ecobiologie des Insectes Parasitoides
 http://www.parasitoides.univ-rennes1.fr
 Université de Renes1, FRANCE
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 

Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli'
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 6626240
fax: 091 485726/485612

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] unexpected result in glm (family=poisson) for data with an only zero response in one factor

2006-09-13 Thread John Maindonald
The Wald statistics that are returned as z value can be a very  
rough approximation.  The standard error is radically different, on a  
logarithmic scale, between log(mu) = -20.30 [the best glm() managed  
in approximating -infinity] and log(mu) + log(a) = -0.29.  It is  
actually worse than might appear; the SE=2457.38 is an approximation  
to infinity!  The phenomenon is an extreme version, now with a  
poisson error model, of the Hauck-Donner effect (Modern Applied  
Statistics with S, 4th edn, pp.197-199) that occurs with binomial  
data.  Use the result from the anova likelihood ratio test, where the  
approximations that are involved are usually much better behaved (but  
it would not hurt to do a simulation as a check.)

There is an example of this phenomenon with a poisson error model in  
Subsection 8.4.2 (the same subsection number both for the 1st edn and  
the forthcoming 2nd edn) of Data Analysis  Graphics using R, CUP,  
2003 and 2006. Install and attach the DAAG package and try

example(moths)

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473fax  : +61 2(6125)5549
Centre for Mathematics  Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

 Dear members,
 here is my trouble: My data consists of counts of trapped insects  
 in different attractive traps. I usually use GLMs with a poisson  
 error distribution to find out the differences between my  
 traitments (and to look at other factor effects). But for some  
 dataset where one traitment contains only zeros, GLM with poisson  
 family fail to find any difference between this particular  
 traitment and anyother one (even with traitment that have trapped a  
 lot of insects). GLMs with gaussian family does not seem to have  
 the same problem but GLMs with binomial family does.
 I'm not sure if it is a statistical problem or if it comes from  
 R... in the latter case I think some solution exists (perhaps in  
 the options of the glm() function ?).
 Thank you for your help.


 Here I figure out an exemple to past in the console:

 ## START  
 ## 
 
 # Take a data set of counts for two traitments, one containing only  
 zeros
 A=c 
 (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 
 ,0,0,0,0,0)
 B=c 
 (1,0,0,0,2,1,0,0,1,2,0,0,0,1,2,2,0,1,1,0,1,0,2,1,1,0,1,2,0,1,0,1,1,1,0 
 ,1,1,1,0,1)
 traitment=c(rep(A,40),rep(B,40))
 response=c(A,B)
 mydata=data.frame(traitment ,response)


 # Make a GLM on this dataset , with family=poisson

  g=glm(response~traitment, data=mydata, family=poisson)
  anova.glm(g,test=Chisq)
 # There is an effect of the traitment ...

  summary(g)
 # But traitment A does not differ from traitment B ! ! ! (the  
 pvalue is always close from 1 in such cases)

 # Now if you replace only one zero of the A reponse to 1, the GLM  
 works properly:
  mydata[1,2]=1
  g=glm(response~traitment, data=mydata, family=poisson)
  anova.glm(g,test=Chisq)
  summary(g)
 ## 
 ###  END ##



 Antonin Ferry (PhD)

 Laboratoire d'Ecobiologie des Insectes Parasitoides
 http://www.parasitoides.univ-rennes1.fr
 Université de Renes1, FRANCE

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] unexpected result in glm (family=poisson) for data with an only zero response in one factor

2006-09-13 Thread John Maindonald
PS.  In part, the problem is with the use of the log link, arising  
because the limit of log(mu), as mu goes to 0, is minus infinity.   
This is not an appropriate scale on which to represent a fitted value  
that is zero.  The estimated SE for a fitted value of zero should be  
0.  You will get a more sensible answer if you set family=poisson 
(link=sqrt)

g - glm(response~traitment, data=mydata, family=poisson(link=sqrt))
  summary(g)

Call:
glm(formula = response ~ traitment, family = poisson(link = sqrt),
 data = mydata)

Deviance Residuals:
Min  1Q  Median  3Q Max
-1.225e+00  -2.730e-05  -2.730e-05   2.745e-01   1.193e+00

Coefficients:
  Estimate Std. Error  z value Pr(|z|)
(Intercept) 0.193  0.0790569 0.0002441
traitmentB  0.8660061  0.11180347.746  9.5e-15

(Dispersion parameter for poisson family taken to be 1)

 Null deviance: 75.485  on 79  degrees of freedom
Residual deviance: 33.896  on 78  degrees of freedom
AIC: 89.579

Number of Fisher Scoring iterations: 14


John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473fax  : +61 2(6125)5549
Centre for Mathematics  Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

 Dear members,
 here is my trouble: My data consists of counts of trapped insects  
 in different attractive traps. I usually use GLMs with a poisson  
 error distribution to find out the differences between my  
 traitments (and to look at other factor effects). But for some  
 dataset where one traitment contains only zeros, GLM with poisson  
 family fail to find any difference between this particular  
 traitment and anyother one (even with traitment that have trapped a  
 lot of insects). GLMs with gaussian family does not seem to have  
 the same problem but GLMs with binomial family does.
 I'm not sure if it is a statistical problem or if it comes from  
 R... in the latter case I think some solution exists (perhaps in  
 the options of the glm() function ?).
 Thank you for your help.


 Here I figure out an exemple to past in the console:

 ## START  
 ## 
 
 # Take a data set of counts for two traitments, one containing only  
 zeros
 A=c 
 (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 
 ,0,0,0,0,0)
 B=c 
 (1,0,0,0,2,1,0,0,1,2,0,0,0,1,2,2,0,1,1,0,1,0,2,1,1,0,1,2,0,1,0,1,1,1,0 
 ,1,1,1,0,1)
 traitment=c(rep(A,40),rep(B,40))
 response=c(A,B)
 mydata=data.frame(traitment ,response)


 # Make a GLM on this dataset , with family=poisson

  g=glm(response~traitment, data=mydata, family=poisson)
  anova.glm(g,test=Chisq)
 # There is an effect of the traitment ...

  summary(g)
 # But traitment A does not differ from traitment B ! ! ! (the  
 pvalue is always close from 1 in such cases)

 # Now if you replace only one zero of the A reponse to 1, the GLM  
 works properly:
  mydata[1,2]=1
  g=glm(response~traitment, data=mydata, family=poisson)
  anova.glm(g,test=Chisq)
  summary(g)
 ## 
 ###  END ##



 Antonin Ferry (PhD)

 Laboratoire d'Ecobiologie des Insectes Parasitoides
 http://www.parasitoides.univ-rennes1.fr
 Université de Renes1, FRANCE

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.