Re: [R] Basis of fisher.test
On 13-Jan-06 Prof Brian Ripley wrote:
> On Thu, 12 Jan 2006 [EMAIL PROTECTED] wrote:
>> [...]
>> ?fisher.test says only:
>
> [The following is not a quote from a current version of R.]
>
>> In the one-sided 2 by 2 cases, p-values are obtained directly
>> using the hypergeometric distribution. Otherwise, computations
>> are based on a C version of the FORTRAN subroutine FEXACT which
>> implements the network developed by Mehta and Patel (1986) and
>> improved by Clarkson, Fan and Joe (1993). The FORTRAN code can be
>> obtained from URL: http://www.netlib.org/toms/643.
>
> No, it *also* says
>
>   Two-sided tests are based on the probabilities of the tables,
>   and take as 'more extreme' all tables with probabilities less
>   than or equal to that of the observed table, the p-value being
>   the sum of such probabilities.
>
> which answers the question (there are only two-sided tests for
> such tables).

Thanks for the above information, which is indeed the definitive
straightforward answer to my question!

(Not sure that I quite agree with the two-sided terminology, though,
since the ranking is unidirectional based on decreasing probability,
and the P-value is that of the least-probability tail -- i.e.
analogous to the large (-2*loglik) tail of a likelihood-ratio test --
which I've always visualised as a 1-tailed test (despite the fact that
the other tail can on occasion be indicative of a fit too good to be
true).)

> Now, what does the posting guide say about stating the R version
> and updating before posting?

Well, I plead that in practice there is necessarily a grey area here!
My quotation was from ?fisher.test in R-2.1.0beta of 2004/04/08, the
most recent version installed on any of my machines. Admittedly a bit
behind the times, but not grossly; and that help page has not changed
in this respect since the earliest version I have installed, which is
R-1.2.3 of 2001/04/26.

Contents of help pages can change overnight as R evolves. While it is
better to be up-to-date than behind the times (even slightly), there
is a compromise to be struck between upgrading to the latest R every
time one has a question which might be answered thereby, or going
on-line to read the latest PDF documentation from CRAN, on the one
hand, and on the other asking a straightforward question to the list.

Thanks again, and best wishes,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jan-06  Time: 08:55:11
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Basis of fisher.test
On Fri, 13 Jan 2006 [EMAIL PROTECTED] wrote:

> On 13-Jan-06 Prof Brian Ripley wrote:
>> On Thu, 12 Jan 2006 [EMAIL PROTECTED] wrote:
>>> [...]
>>> ?fisher.test says only:
>>
>> [The following is not a quote from a current version of R.]
>>
>>> In the one-sided 2 by 2 cases, p-values are obtained directly
>>> using the hypergeometric distribution. Otherwise, computations
>>> are based on a C version of the FORTRAN subroutine FEXACT which
>>> implements the network developed by Mehta and Patel (1986) and
>>> improved by Clarkson, Fan and Joe (1993). The FORTRAN code can be
>>> obtained from URL: http://www.netlib.org/toms/643.
>>
>> No, it *also* says
>>
>>   Two-sided tests are based on the probabilities of the tables,
>>   and take as 'more extreme' all tables with probabilities less
>>   than or equal to that of the observed table, the p-value being
>>   the sum of such probabilities.
>>
>> which answers the question (there are only two-sided tests for
>> such tables).
>
> Thanks for the above information, which is indeed the definitive
> straightforward answer to my question!
>
> (Not sure that I quite agree with the two-sided terminology, though,
> since the ranking is unidirectional based on decreasing probability,
> and the P-value is that of the least-probability tail -- i.e.
> analogous to the large (-2*loglik) tail of a likelihood-ratio test --
> which I've always visualised as a 1-tailed test (despite the fact
> that the other tail can on occasion be indicative of a fit too good
> to be true).)

As statistics is usually taught, significance tests are always
one-tailed. The two-sided t-test is one-tailed, the test statistic
being |T|. In any case, the `two-sided' is part of the arguments given
to the function, so this para is just using the already-established
terminology.

>> Now, what does the posting guide say about stating the R version
>> and updating before posting?
>
> Well, I plead that in practice there is necessarily a grey area here!
> My quotation was from ?fisher.test in R-2.1.0beta of 2004/04/08, the
> most recent version installed on any of my machines. Admittedly a bit
> behind the times, but not grossly; and that help page has not changed
> in this respect since the earliest version I have installed, which is
> R-1.2.3 of 2001/04/26.
>
> Contents of help pages can change overnight as R evolves. While it is
> better to be up-to-date than behind the times (even slightly), there
> is a compromise to be struck between upgrading to the latest R every
> time one has a question which might be answered thereby, or going
> on-line to read the latest PDF documentation from CRAN, on the one
> hand, and on the other asking a straightforward question to the list.

Well, if you had given the R version number the problem would have
been much more obvious.

> Thanks again, and best wishes,
> Ted.
>
> E-Mail: (Ted Harding) [EMAIL PROTECTED]
> Fax-to-email: +44 (0)870 094 0861
> Date: 13-Jan-06  Time: 08:55:11
> -- XFMail --

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
[R] Basis of fisher.test
I want to ascertain the basis of the table ranking, i.e. the meaning
of extreme, in Fisher's Exact Test as implemented in 'fisher.test',
when applied to RxC tables which are larger than 2x2.

One can summarise a strategy for the test as

1) For each table compatible with the margins of the observed table,
   compute the probability of this table conditional on the marginal
   totals.

2) Rank the possible tables in order of a measure of discrepancy
   between the table and the null hypothesis of no association.

3) Locate the observed table, and compute the sum of the
   probabilities, computed in (1), for this table and more extreme
   tables in the sense of the ranking in (2).

The question is: what measure of discrepancy is used in 'fisher.test'
corresponding to stage (2)?

(There are in principle several possibilities, e.g. value of a Pearson
chi-squared, large values being discrepant; the probability calculated
in (1), small values being discrepant; ... )

?fisher.test says only:

  In the one-sided 2 by 2 cases, p-values are obtained directly
  using the hypergeometric distribution. Otherwise, computations
  are based on a C version of the FORTRAN subroutine FEXACT which
  implements the network developed by Mehta and Patel (1986) and
  improved by Clarkson, Fan and Joe (1993). The FORTRAN code can be
  obtained from URL: http://www.netlib.org/toms/643.

I have had a look at this FORTRAN code, and cannot ascertain it from
the code itself. However, there is a Comment to the effect:

  c  PRE - Table p-value. (Output)
  c        PRE is the probability of a more extreme table, where
  c        'extreme' is in a probabilistic sense.

which suggests that the tables are ranked in order of their
probabilities as computed in (1).

Can anyone confirm definitively what goes on?

With thanks,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 12-Jan-06  Time: 20:19:02
-- XFMail --
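The three-step strategy in the message above can be sketched in Python (used here only for illustration; the thread itself concerns R's fisher.test). The helper names tables_with_margins, table_probability and exact_p_value are hypothetical. The brute-force enumeration is not the FEXACT network algorithm; it simply assumes that the discrepancy measure in step (2) is the table probability itself, which is the rule the replies in this thread confirm, and it is practical for small tables only.

```python
from fractions import Fraction
from math import factorial, prod


def _compositions(total, caps):
    """Yield lists x with sum(x) == total and 0 <= x[j] <= caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield [total]
        return
    for x in range(min(total, caps[0]) + 1):
        for tail in _compositions(total - x, caps[1:]):
            yield [x] + tail


def tables_with_margins(row_sums, col_sums):
    """Yield every non-negative integer table with the given margins."""
    if len(row_sums) == 1:
        # The last row is forced: it takes whatever each column has left.
        yield [list(col_sums)]
        return
    for first_row in _compositions(row_sums[0], col_sums):
        remaining = [c - x for c, x in zip(col_sums, first_row)]
        for rest in tables_with_margins(row_sums[1:], remaining):
            yield [first_row] + rest


def table_probability(table):
    """Step (1): probability of a table conditional on its margins."""
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    num = prod(factorial(r) for r in row_sums) * prod(factorial(c) for c in col_sums)
    den = factorial(sum(row_sums)) * prod(factorial(x) for r in table for x in r)
    return Fraction(num, den)


def exact_p_value(observed):
    """Steps (2)-(3): sum the probabilities of all tables that are
    no more probable than the observed one (assumed ranking rule)."""
    row_sums = [sum(r) for r in observed]
    col_sums = [sum(c) for c in zip(*observed)]
    p_obs = table_probability(observed)
    candidates = tables_with_margins(row_sums, col_sums)
    return sum(p for p in map(table_probability, candidates) if p <= p_obs)


# Example 2x3 table (made-up counts, for illustration only).
observed = [[3, 1, 0], [0, 2, 3]]
print(float(exact_p_value(observed)))
```

Exact fractions are used so that ties between table probabilities are decided without floating-point error. Swapping the condition `p <= p_obs` for a comparison on another statistic (for example a Pearson chi-squared value) would give one of the alternative rankings mentioned in the question.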
Re: [R] Basis of fisher.test
(Ted Harding) [EMAIL PROTECTED] writes:

> I want to ascertain the basis of the table ranking, i.e. the meaning
> of extreme, in Fisher's Exact Test as implemented in 'fisher.test',
> when applied to RxC tables which are larger than 2x2.
>
> One can summarise a strategy for the test as
>
> 1) For each table compatible with the margins of the observed table,
>    compute the probability of this table conditional on the marginal
>    totals.
>
> 2) Rank the possible tables in order of a measure of discrepancy
>    between the table and the null hypothesis of no association.
>
> 3) Locate the observed table, and compute the sum of the
>    probabilities, computed in (1), for this table and more extreme
>    tables in the sense of the ranking in (2).
>
> The question is: what measure of discrepancy is used in 'fisher.test'
> corresponding to stage (2)?
>
> (There are in principle several possibilities, e.g. value of a
> Pearson chi-squared, large values being discrepant; the probability
> calculated in (1), small values being discrepant; ... )
>
> ?fisher.test says only:
>
>   In the one-sided 2 by 2 cases, p-values are obtained directly
>   using the hypergeometric distribution. Otherwise, computations
>   are based on a C version of the FORTRAN subroutine FEXACT which
>   implements the network developed by Mehta and Patel (1986) and
>   improved by Clarkson, Fan and Joe (1993). The FORTRAN code can be
>   obtained from URL: http://www.netlib.org/toms/643.
>
> I have had a look at this FORTRAN code, and cannot ascertain it from
> the code itself. However, there is a Comment to the effect:
>
>   c  PRE - Table p-value. (Output)
>   c        PRE is the probability of a more extreme table, where
>   c        'extreme' is in a probabilistic sense.
>
> which suggests that the tables are ranked in order of their
> probabilities as computed in (1).
>
> Can anyone confirm definitively what goes on?

To my knowledge, it is the table probability, according to the
hypergeometric distribution, i.e. the probability of the table given
the marginals, which can be translated to sampling a+b balls without
replacement from a box with a+c white and b+d black balls. Playing
around with dhyper should be instructive.

(You're right that the two-sided p values are obtained by summing all
smaller or equal table probabilities. This is the traditional way, but
there are alternatives, e.g. tail balancing.)

--
   O__        Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ ---  Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) --  University of Copenhagen   Denmark  Ph:  (+45) 35327918
 ~~ -         ([EMAIL PROTECTED])                 FAX: (+45) 35327907
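The urn description in the reply above can be tried out numerically for a 2x2 table [[a, b], [c, d]]. What follows is a minimal Python sketch, not R: the function dhyper defined here is a hypothetical helper that only mimics the argument order of R's dhyper(x, m, n, k) (m white balls, n black balls, k drawn), two_sided_p is likewise an illustrative name, and the relative tolerance used for ties is an assumption of this sketch.

```python
from math import comb


def dhyper(x, m, n, k):
    """P(X = x): number of white balls among k drawn without replacement
    from a box holding m white and n black balls."""
    if x < max(0, k - n) or x > min(k, m):
        return 0.0
    return comb(m, x) * comb(n, k - x) / comb(m + n, k)


def two_sided_p(a, b, c, d):
    """Sum the probabilities of all 2x2 tables with the same margins
    that are no more probable than the observed table."""
    m, n, k = a + c, b + d, a + b       # white, black, number drawn
    lo, hi = max(0, k - n), min(k, m)   # feasible values of the top-left cell
    probs = [dhyper(x, m, n, k) for x in range(lo, hi + 1)]
    p_obs = dhyper(a, m, n, k)
    # Assumed small relative tolerance so floating-point ties count as equal.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


# Table [[3, 1], [1, 3]]: the top-left cell can be 0..4 with
# probabilities 1/70, 16/70, 36/70, 16/70, 1/70.
print(dhyper(3, 4, 4, 4))       # 16/70, about 0.2286
print(two_sided_p(3, 1, 1, 3))  # (1 + 16 + 16 + 1)/70, about 0.4857
```

In this example only the most probable table (top-left cell equal to 2) is excluded from the sum, which shows how the "smaller or equal probability" rule collects tables from both ends of the range of the cell count.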
Re: [R] Basis of fisher.test
On Thu, 12 Jan 2006 [EMAIL PROTECTED] wrote:

> I want to ascertain the basis of the table ranking, i.e. the meaning
> of extreme, in Fisher's Exact Test as implemented in 'fisher.test',
> when applied to RxC tables which are larger than 2x2.
>
> One can summarise a strategy for the test as
>
> 1) For each table compatible with the margins of the observed table,
>    compute the probability of this table conditional on the marginal
>    totals.
>
> 2) Rank the possible tables in order of a measure of discrepancy
>    between the table and the null hypothesis of no association.
>
> 3) Locate the observed table, and compute the sum of the
>    probabilities, computed in (1), for this table and more extreme
>    tables in the sense of the ranking in (2).
>
> The question is: what measure of discrepancy is used in 'fisher.test'
> corresponding to stage (2)?
>
> (There are in principle several possibilities, e.g. value of a
> Pearson chi-squared, large values being discrepant; the probability
> calculated in (1), small values being discrepant; ... )
>
> ?fisher.test says only:

[The following is not a quote from a current version of R.]

>   In the one-sided 2 by 2 cases, p-values are obtained directly
>   using the hypergeometric distribution. Otherwise, computations
>   are based on a C version of the FORTRAN subroutine FEXACT which
>   implements the network developed by Mehta and Patel (1986) and
>   improved by Clarkson, Fan and Joe (1993). The FORTRAN code can be
>   obtained from URL: http://www.netlib.org/toms/643.

No, it *also* says

  Two-sided tests are based on the probabilities of the tables,
  and take as 'more extreme' all tables with probabilities less
  than or equal to that of the observed table, the p-value being
  the sum of such probabilities.

which answers the question (there are only two-sided tests for such
tables).

Now, what does the posting guide say about stating the R version and
updating before posting?

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595