On Wed, Jan 26, 2005 at 01:39:36AM +0100, Peter Dalgaard wrote: > Thomas Lumley <[EMAIL PROTECTED]> writes: > > > On Tue, 25 Jan 2005, Florian Menzel wrote: > > > > > Hello all, > > > I found a weird result of the GLM function that seems > > > to be a bug. > > > > No, the problem is that you are using the Wald test when the mle is > > infinite, which is always going to be unreliable. It's even worse > > because you are using data that couldn't really have come from a > > Poisson distribution (because for a=1 you have mean 3 and variance 0). > > Actually, that's rather immaterial.
Right. For the Hauck[sic]-Donner effect. This two-sample problem admits a two-dimensional sufficient statistic, the totals in the two groups. In this example Xtot = 0, Ytot = 24. So the problem is reduced to: Given two independent Poisson observations X = 0 and Y = 24, can they possibly come from the same distribution? Thomas' observation is relevant from a practical point of view; by reducing data to sufficient statistics, we lose the possibility to check model assumptions, here the iid (within groups) Poisson distribution. > Try it with > > b=c(rep(0,8),rmultinom(1,24,rep(1/8,8))) This is the conditional distribution, given the sufficient statistic. > or even > > b=c(rep(0,8),rpois(8,3)) This is the unconditional distribution, given that the parameter is equal to the mle. > (in the former case, notice that the LRT is constant). because the LRT statistic is a function of data only through the sufficient statistic. What is the "correct" p-value in this case? The exact p-value for the LRT is hard to calculate (the distribution of the test statistic under the null depends on thrue, common, unknown parameter), but the conditional test, given X+Y, (has certain optimality properties) is very easy to perform. The p-value is 2 * dbinom(0, 24, 0.5) = 1.192093e-07, compared to the LRT 8.017e-09. Who cares? But suppose the observed value of sufficient statistic was (0, 5). Then the two p-values are 0.0625 and 0.00847, respectively, and one should care. Besides the binomial test, you can also use a bootstrap approach, by simulating iid poisson pairs with mean (X+Y)/2 and calculate the LRT statistic. For (X, Y) = (0, 5), this results in a p-value of about 0.018 (one million replicates). So, it seems as if the conditional test is too conservative, and the asymptotics are far too optimistic. I _think_ this is "common knowledge". -- G�ran Brostr�m tel: +46 90 786 5223 Department of Statistics fax: +46 90 786 6614 Ume� University http://www.stat.umu.se/egna/gb/ SE-90187 Ume�, Sweden e-mail: [EMAIL PROTECTED] ______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
