Re: [R] strange fisher.test result
On Mon, 2 Apr 2007, [EMAIL PROTECTED] wrote:

> From the above, the marginal totals for his 2x2 table
>
>     a  b     16  8
>     c  d  =  15 24
>
> are (rows then columns) 24, 39, 31, 32. These fixed marginals mean that
> the whole table is determined by the value of a. The following function
> P.FX() computes the probabilities of all possible tables, conditional on
> the marginal totals (it is much more transparent than the code for the
> same purpose in fisher.test()):

As this example has shown, 2x2 tables are a nice opportunity for illustrating how the ordering of the sample space affects inference (because you can actually see the whole sample space). I used something like this as a term project in an introductory R class, where we wrote code to compute the probabilities for all outcomes conditional on one margin, and used this to get (conservative) exact versions of all the popular tests in 2x2 tables. It's interesting to do things like compare the rejection regions and power under various alternatives for the exact versions of the likelihood ratio test and Fisher's test. We didn't get as far as confidence intervals, but the code is at http://faculty.washington.edu/tlumley/b514/exacttest.R with .Rd files at http://faculty.washington.edu/tlumley/b514/man/

[credits: this is all based on ideas from Scott Emerson]

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
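The conditional distribution described above is the (central) hypergeometric, so the probabilities of all tables with these margins can also be obtained directly from R's dhyper() rather than by writing a function. A sketch, using the margins of this particular table (row totals 24 and 39, column-1 total 31):

```r
## With all four margins fixed, the table is determined by a = cell (1,1).
## Under odds ratio 1, a is hypergeometric: 31 draws from an urn with
## 24 "row 1" balls and 39 "row 2" balls.
a <- 0:24                    # possible values of a
p <- dhyper(a, 24, 39, 31)   # P(a | margins) under the null

## probability of the observed table, a = 16:
dhyper(16, 24, 39, 31)       # about 0.0202, matching the listing below
```

This reproduces the P column of the P.FX() output later in the thread.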
Re: [R] strange fisher.test result
Thomas Lumley wrote:

> As this example has shown, 2x2 tables are a nice opportunity for
> illustrating how the ordering of the sample space affects inference
> (because you can actually see the whole sample space). [...] We didn't
> get as far as confidence intervals, but the code is at
> http://faculty.washington.edu/tlumley/b514/exacttest.R with .Rd files
> at http://faculty.washington.edu/tlumley/b514/man/

The effect is already visible with binomial tests. In fact the last exercise in the section on categorical data in Introductory Statistics with R currently reads (the \Answer section is not in the actual book -- yet):

  Make a plot of the two-sided $p$ value for testing that the
  probability parameter is $x$ when the observations are 3 successes
  in 15 trials, for $x$ varying from 0 to 1 in steps of 0.001.
  Explain what makes the definition of a two-sided confidence
  interval difficult.

\Answer The curve shows substantial discontinuities where probability mass is shifted from one tail to the other, and also a number of local minima.
A confidence region could be defined as those $p$ against which there is no significant evidence at level $\alpha$; but for some $\alpha$, that set is not an interval.

  p <- seq(0, 1, 0.001)
  pval <- sapply(p, function(p) binom.test(3, 15, p = p)$p.value)
  plot(p, pval, type = "l")
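The "not an interval" claim can be checked numerically. The following sketch (my addition, not from the book) looks for interior local minima of the p-value curve; choosing a level alpha just at such a dip gives an accepted set of $p$ values with a hole in it:

```r
p    <- seq(0, 1, 0.001)
pval <- sapply(p, function(p) binom.test(3, 15, p = p)$p.value)

## interior grid points where the p-value dips below both neighbours
n   <- length(pval)
mid <- 2:(n - 1)
dip <- mid[pval[mid] < pval[mid - 1] & pval[mid] < pval[mid + 1]]

## at alpha = the smaller neighbouring p-value, both neighbours are
## accepted but the dip itself is rejected: the region is not an interval
i     <- dip[1]
alpha <- min(pval[i - 1], pval[i + 1])
acc   <- pval >= alpha
c(acc[i - 1], acc[i], acc[i + 1])   # TRUE FALSE TRUE
```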
Re: [R] strange fisher.test result
On 31-Mar-07 13:36:04, Williams Scott wrote:

> A simple question - using the following Fisher's test it appears that
> the P value is significant, but the CI includes 1. Is this result
> correct?
>
> data.50p10min <- matrix(c(16,15, 8, 24), nrow=2)
> fisher.test(data.50p10min)
>
>         Fisher's Exact Test for Count Data
>
> data:  data.50p10min
> p-value = 0.03941
> alternative hypothesis: true odds ratio is not equal to 1
> 95 percent confidence interval:
>   0.9810441 10.7771597
> sample estimates:
> odds ratio
>   3.138456

Apologies to Scott for overlooking, in my previous response, that he had in fact given his data in the line

  data.50p10min <- matrix(c(16,15, 8, 24), nrow=2)

(and to Rolf Turner for pointing this out to me nicely, in a private mail). I shall denote this by

  X0 <- matrix(c(16,15, 8, 24), nrow=2)

I've done a bit of investigating with the help of R, to try to provide a well-formed explanation of the reasons underlying Scott's observation and his question. From the above, the marginal totals for his 2x2 table

    a  b     16  8
    c  d  =  15 24

are (rows then columns) 24, 39, 31, 32. These fixed marginals mean that the whole table is determined by the value of a.
The following function P.FX() computes the probabilities of all possible tables, conditional on the marginal totals (it is much more transparent than the code for the same purpose in fisher.test()):

  P.FX <- function(margs, R) {
    n  <- margs[1] + margs[2]
    a  <- (0:n)
    b  <- margs[1] - a
    c  <- margs[3] - a
    d  <- n - a - b - c
    ## keep only the tables with all four cells non-negative
    ix <- (a >= 0) & (b >= 0) & (c >= 0) & (d >= 0)
    a <- a[ix]; b <- b[ix]; c <- c[ix]; d <- d[ix]
    P <- (R^a) / (exp(lgamma(a+1)) * exp(lgamma(b+1)) *
                  exp(lgamma(c+1)) * exp(lgamma(d+1)))
    P <- P / sum(P)
    return(cbind(a, b, c, d, P))
  }

where 'margs' is the set of marginal totals (in the same row-col order as above), and R = (alpha*delta)/(beta*gamma) is the odds-ratio of the cell probabilities

    alpha  beta
    gamma  delta

Hence the probabilities for the Null Hypothesis R=1 are

  P <- P.FX(c(24,39,31,32), 1)
  P
          a  b  c  d            P
   [1,]   0 24 31  8 6.714279e-11
   [2,]   1 23 30  9 5.550471e-09
   [3,]   2 22 29 10 1.914912e-07
   [4,]   3 21 28 11 3.702164e-06
   [5,]   4 20 27 12 4.535151e-05
   [6,]   5 19 26 13 3.767664e-04
   [7,]   6 18 25 14 2.215745e-03
   [8,]   7 17 24 15 9.496050e-03
   [9,]   8 16 23 16 3.026866e-02
  [10,]   9 15 22 17 7.280305e-02
  [11,]  10 14 21 18 1.334723e-01
  [12,]  11 13 20 19 1.877552e-01
  [13,]  12 12 19 20 2.034015e-01
  [14,]  13 11 18 21 1.698738e-01
  [15,]  14 10 17 22 1.092046e-01
  [16,]  15  9 16 23 5.381095e-02
  [17,]  16  8 15 24 2.017911e-02
  [18,]  17  7 14 25 5.697630e-03
  [19,]  18  6 13 26 1.193093e-03
  [20,]  19  5 12 27 1.814060e-04
  [21,]  20  4 11 28 1.943636e-05
  [22,]  21  3 10 29 1.404269e-06
  [23,]  22  2  9 30 6.383041e-08
  [24,]  23  1  8 31 1.611427e-09
  [25,]  24  0  7 32 1.678570e-11

Extract a and P by

  a <- P[,1]; p <- P[,5]

Reminder: The 2-sided Fisher test p-value is

  fisher.test(X0)$p.value
  [1] 0.03940996

Looking at the code for fisher.test() (which takes some thought!), the values of 'a' are (effectively) considered in order of decreasing probability, and added in until the observed value of 'a' is about to be included; the resulting p-value is the sum of the probabilities of the 'a' values not included (amongst which is the observed value itself):

  iy <- order(p, decreasing=TRUE)
  cbind(a=a[iy], p=p[iy], pval=rev(cumsum(rev(p[iy]))))
          a            p         pval
   [1,]  12 2.034015e-01 1.000000e+00
   [2,]  11 1.877552e-01 7.965985e-01
   [3,]  13 1.698738e-01 6.088432e-01
   [4,]  10 1.334723e-01 4.389695e-01
   [5,]  14 1.092046e-01 3.054972e-01
   [6,]   9 7.280305e-02 1.962926e-01
   [7,]  15 5.381095e-02 1.234896e-01
   [8,]   8 3.026866e-02 6.967862e-02
   [9,]  16 2.017911e-02 3.940996e-02   ## Observed value
  [10,]   7 9.496050e-03 1.923085e-02
  [11,]  17 5.697630e-03 9.734798e-03
  [12,]   6 2.215745e-03 4.037168e-03
  [13,]  18 1.193093e-03 1.821423e-03
  [14,]   5 3.767664e-04 6.283293e-04
  [15,]  19 1.814060e-04 2.515629e-04
  [16,]   4 4.535151e-05 7.015687e-05
  [17,]  20 1.943636e-05 2.480536e-05
  [18,]   3 3.702164e-06 5.369000e-06
  [19,]  21 1.404269e-06 1.666837e-06
  [20,]   2 1.914912e-07 2.625675e-07
  [21,]  22 6.383041e-08 7.107624e-08
  [22,]   1 5.550471e-09 7.245826e-09
  [23,]  23 1.611427e-09 1.695355e-09
  [24,]   0 6.714279e-11 8.392849e-11
  [25,]  24 1.678570e-11 1.678570e-11

Here it can be seen that, with "extreme" meaning towards the bottom of the above listing, the probability of a result at least as extreme as the observed value a=16 is 3.940996e-02, the value actually returned by fisher.test() (see above). The right-hand column gives the upwards cumulative sum of the probabilities in the middle column.

Now for the 95% 2-sided Confidence Interval. Finding this is a somewhat different matter, since the above 2-sided approach involves ordering the outcomes 2-sidedly, whereas finding the confidence limits involves a 1-sided approach for each limit. For the upper limit of the 95% CI, we seek
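The "sum the probabilities no larger than that of the observed table" rule can be checked directly against dhyper(), without P.FX(). A sketch, using a small relative tolerance in the floating-point comparison as fisher.test() itself does:

```r
a     <- 0:24
p     <- dhyper(a, 24, 39, 31)   # null probabilities, conditional on margins
p.obs <- dhyper(16, 24, 39, 31)  # probability of the observed table (a = 16)

## two-sided p-value: total probability of tables no more probable
## than the observed one (tolerance guards against rounding error)
sum(p[p <= p.obs * (1 + 1e-7)])  # about 0.03941, as fisher.test() reports
```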
[R] strange fisher.test result
A simple question - using the following Fisher's test it appears that the P value is significant, but the CI includes 1. Is this result correct?

  data.50p10min <- matrix(c(16,15, 8, 24), nrow=2)
  fisher.test(data.50p10min)

        Fisher's Exact Test for Count Data

  data:  data.50p10min
  p-value = 0.03941
  alternative hypothesis: true odds ratio is not equal to 1
  95 percent confidence interval:
    0.9810441 10.7771597
  sample estimates:
  odds ratio
    3.138456

Thanks for comments

Scott

Dr Scott Williams MD
Peter MacCallum Cancer Centre
Melbourne, Australia
Re: [R] strange fisher.test result
On 31-Mar-07 13:36:04, Williams Scott wrote:

> A simple question - using the following Fisher's test it appears that
> the P value is significant, but the CI includes 1. Is this result
> correct?
>
> data.50p10min <- matrix(c(16,15, 8, 24), nrow=2)
> fisher.test(data.50p10min)
>
>         Fisher's Exact Test for Count Data
>
> data:  data.50p10min
> p-value = 0.03941
> alternative hypothesis: true odds ratio is not equal to 1
> 95 percent confidence interval:
>   0.9810441 10.7771597
> sample estimates:
> odds ratio
>   3.138456
>
> Thanks for comments
> Scott

Remember that the distribution on which the Fisher Exact Test is based is a discrete distribution, so what you might expect based on experience with continuous distributions will not always be the case. In the above case, I can see a possible explanation. The p-value is the probability of the set of data (more extreme to either side) such that the probability of (say) the element in row 1, col 1 of the 2x2 table exceeding an upper critical value is less than or equal to 0.025, and the same probability 0.025 for being less than a lower critical value. On the other hand, the lower confidence bound for the OR is the smallest value of OR that would not be rejected by a 1-sided test at P-value 0.025, and the upper confidence bound for the OR is the largest value of OR that would not be rejected by a corresponding 1-sided test at P-value 0.025, on the given data. There is no reason, with a discrete distribution, for the confidence interval results for OR to agree exactly with the significance test of OR=1.

More than that I cannot say without further information. If you would post the 2x2 table of your data, it would be possible to be more specific.

Best wishes, Ted.
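This "inverting one-sided tests" description can be checked numerically: fisher.test() accepts a hypothesised odds ratio via its 'or' argument, so each confidence limit is the OR at which the corresponding one-sided p-value equals 0.025. A sketch (the uniroot() search brackets are rough guesses for this table):

```r
x <- matrix(c(16, 15, 8, 24), nrow = 2)

## lower limit: the OR at which the one-sided "greater" test has p = 0.025
lo <- uniroot(function(or)
        fisher.test(x, or = or, alternative = "greater")$p.value - 0.025,
      interval = c(0.1, 3), tol = 1e-8)$root

## upper limit: the OR at which the one-sided "less" test has p = 0.025
hi <- uniroot(function(or)
        fisher.test(x, or = or, alternative = "less")$p.value - 0.025,
      interval = c(3, 50), tol = 1e-8)$root

c(lo, hi)   # close to fisher.test(x)$conf.int: 0.9810441 10.7771597
```

So the interval straddles 1 precisely because each endpoint comes from a separate one-sided criterion, not from the two-sided ordering used for the p-value.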
E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 31-Mar-07  Time: 15:05:09
Re: [R] strange fisher.test result
The two-sided test of odds-ratio = 1 is not necessarily (nor in this case) the same thing as an equal-tailed confidence interval: see the help page comment

  Two-sided tests are based on the probabilities of the tables, and
  take as 'more extreme' all tables with probabilities less than or
  equal to that of the observed table, the p-value being the sum of
  such probabilities.

  fisher.test(data.50p10min, alternative="greater", conf.level=0.975)

        Fisher's Exact Test for Count Data

  data:  data.50p10min
  p-value = 0.02727
  alternative hypothesis: true odds ratio is greater than 1
  97.5 percent confidence interval:
   0.9810441       Inf
  sample estimates:
  odds ratio
    3.138456

which is not significant at 2.5%, and the two-tailed p-value is not double it. There are other ways to compute confidence intervals, but R's fisher.test() gives the 97.5% lower and upper confidence limits.

On Sat, 31 Mar 2007, Williams Scott wrote:

> A simple question - using the following Fisher's test it appears that
> the P value is significant, but the CI includes 1. Is this result
> correct?
>
> data.50p10min <- matrix(c(16,15, 8, 24), nrow=2)
> fisher.test(data.50p10min)
>
>         Fisher's Exact Test for Count Data
>
> data:  data.50p10min
> p-value = 0.03941
> alternative hypothesis: true odds ratio is not equal to 1
> 95 percent confidence interval:
>   0.9810441 10.7771597
> sample estimates:
> odds ratio
>   3.138456

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272866 (PA)
Oxford OX1 3TG, UK               Fax: +44 1865 272595
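The mismatch can be seen by putting the p-values side by side. A sketch; the values in the comments are the ones quoted in the output above:

```r
x <- matrix(c(16, 15, 8, 24), nrow = 2)

p.two     <- fisher.test(x)$p.value                           # 0.03941
p.greater <- fisher.test(x, alternative = "greater")$p.value  # 0.02727

## the two-sided p-value is not double the one-sided one, so a
## "significant" two-sided test can coexist with a 95% CI containing 1
c(p.two, 2 * p.greater)
```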