Re: [R] strange fisher.test result

2007-04-03 Thread Thomas Lumley
On Mon, 2 Apr 2007, [EMAIL PROTECTED] wrote:

 From the above, the marginal totals for his 2x2 table

  a   b   =   16    8

  c   d       15   24

 are (rows then columns) 24,39,31,32

 These fixed marginals mean that the whole table is determined
 by the value of a. The following function P.FX() computes the
 probabilities of all possible tables, conditional on the
 marginal totals (it is much more transparent than the code
 for the same purpose in fisher.test()):

As this example has shown, 2x2 tables are a nice opportunity for 
illustrating how the ordering of the sample space affects inference 
(because you can actually see the whole sample space).

I used something like this as a term project in an introductory R class, 
where we wrote code to compute the probabilities for all outcomes 
conditional on one margin, and used this to get (conservative) exact 
versions of all the popular tests in 2x2 tables.  It's interesting to do 
things like compare the rejection regions and power under various 
alternatives for the exact versions of the likelihood ratio test and 
Fisher's test.  We didn't get as far as confidence intervals, but the code 
is at
http://faculty.washington.edu/tlumley/b514/exacttest.R
with .Rd files at
http://faculty.washington.edu/tlumley/b514/man/

[credits: this is all based on ideas from Scott Emerson]
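
To give the flavour (this is only a rough sketch of the idea, not the code at
the URL above): with the row totals n1, n2 fixed, the sample space is every
pair (a, c) of row-wise success counts, and a conservative exact p-value for
a chosen statistic -- here the likelihood ratio -- can be taken as the
supremum over the nuisance success probability of the probability of a
result at least as extreme as the one observed.

loglik <- function(x, n, p) {
  ## binomial log-likelihood kernel, with the convention 0*log(0) = 0
  ifelse(x == 0, 0, x * log(p)) + ifelse(x == n, 0, (n - x) * log(1 - p))
}
lrt.stat <- function(a, c, n1, n2) {
  ## likelihood ratio statistic for H0: p1 == p2
  p1 <- a/n1; p2 <- c/n2; p0 <- (a + c)/(n1 + n2)
  2 * (loglik(a, n1, p1) + loglik(c, n2, p2) -
       loglik(a, n1, p0) - loglik(c, n2, p0))
}
exact.lrt.pvalue <- function(a.obs, c.obs, n1, n2,
                             p.grid = seq(0.001, 0.999, by = 0.001)) {
  tab  <- expand.grid(a = 0:n1, c = 0:n2)        # the whole sample space
  stat <- lrt.stat(tab$a, tab$c, n1, n2)
  extreme <- stat >= lrt.stat(a.obs, c.obs, n1, n2) - 1e-8
  ## for each nuisance p, P(result at least as extreme); report the supremum
  max(sapply(p.grid, function(p)
    sum(dbinom(tab$a[extreme], n1, p) * dbinom(tab$c[extreme], n2, p))))
}
exact.lrt.pvalue(16, 15, 24, 39)   # the 16/24 vs 15/39 table from this thread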


-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strange fisher.test result

2007-04-03 Thread Peter Dalgaard
Thomas Lumley wrote:
 On Mon, 2 Apr 2007, [EMAIL PROTECTED] wrote:
   
 From the above, the marginal totals for his 2x2 table
   
  a   b   =   16    8

  c   d       15   24

 are (rows then columns) 24,39,31,32

 These fixed marginals mean that the whole table is determined
 by the value of a. The following function P.FX() computes the
 probabilities of all possible tables, conditional on the
 marginal totals (it is much more transparent than the code
 for the same purpose in fisher.test()):
 

 As this example has shown, 2x2 tables are a nice opportunity for 
 illustrating how the ordering of the sample space affects inference 
 (because you can actually see the whole sample space).

 I used something like this as a term project in an introductory R class, 
 where we wrote code to compute the probabilities for all outcomes 
 conditional on one margin, and used this to get (conservative) exact 
 versions of all the popular tests in 2x2 tables.  It's interesting to do 
 things like compare the rejection regions and power under various 
 alternatives for the exact versions of the likelihood ratio test and 
 Fisher's test.  We didn't get as far as confidence intervals, but the code 
 is at
 http://faculty.washington.edu/tlumley/b514/exacttest.R
 with .Rd files at 
 http://faculty.washington.edu/tlumley/b514/man/
   
The effect is already visible with binomial tests. In fact the last 
exercise in the section on categorical data in Introductory Statistics 
with R currently reads (the \Answer section is not in the actual book -- 
yet):

 Make a plot of the two-sided $p$ value for
  testing that the probability parameter is $x$ when the observations
  are 3 successes in 15 trials, for $x$ varying from 0 to 1 in steps of
  0.001. Explain what makes the definition of a two-sided confidence
  interval difficult.
 
  \Answer The curve shows substantial discontinuities where
  probability mass is shifted from one tail to the other, and also a
  number of local minima. A confidence region could be defined as
  those $p$ that there is no significant evidence against at level
  $\alpha$, but for some $\alpha$, that is not an interval.

   p <- seq(0,1,0.001)
   pval <- sapply(p, function(p) binom.test(3, 15, p=p)$p.value)
   plot(p, pval, type="l")
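
A small follow-up sketch (not from the book; it reuses p and pval from the
code above): find levels alpha at which the "confidence region" -- the set
of p against which there is no significant evidence at level alpha -- fails
to be an interval.

   is.interval <- function(alpha) {
     acc <- p[pval >= alpha]     # the p's not rejected at level alpha
     all(diff(acc) <= 0.0015)    # TRUE if no gap wider than one grid step
   }
   alphas <- seq(0.01, 0.5, by = 0.01)
   alphas[!sapply(alphas, is.interval)]  # levels at which the region has gaps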



Re: [R] strange fisher.test result

2007-04-02 Thread Ted Harding
On 31-Mar-07 13:36:04, Williams Scott wrote:
 A simple question - using the following fishers test it appears that
 the P value is significant, but the CI includes 1. Is this result
 correct?
 
 data.50p10min <- matrix(c(16,15, 8, 24),nrow=2)
 
 fisher.test(data.50p10min)
 
 Fisher's Exact Test for Count Data
 data:  data.50p10min 
 p-value = 0.03941
 
 alternative hypothesis: true odds ratio is not equal to 1 
 
 95 percent confidence interval:
   0.9810441 10.7771597 
 
 sample estimates:
 odds ratio 
   3.138456 

Apologies to Scott for overlooking, in my previous response,
that he had in fact given his data in the line

  data.50p10min <- matrix(c(16,15, 8, 24),nrow=2)

(and to Rolf Turner for pointing this out to me nicely, in
a private mail).

I shall denote this by X0 <- matrix(c(16,15, 8, 24),nrow=2)

I've done a bit of investigating with the help of R, to try
to provide a well-formed explanation of reasons underlying
Scott's observation and his question.

From the above, the marginal totals for his 2x2 table

  a   b   =   16    8

  c   d       15   24

are (rows then columns) 24,39,31,32

These fixed marginals mean that the whole table is determined
by the value of a. The following function P.FX() computes the
probabilities of all possible tables, conditional on the
marginal totals (it is much more transparent than the code
for the same purpose in fisher.test()):

P.FX <- function(margs, R){
  ## margs: marginal totals (row1, row2, col1, col2); R: odds ratio
  n  <- margs[1]+margs[2]
  a  <- (0:n)
  b  <- margs[1]-a
  c  <- margs[3]-a
  d  <- n-a-b-c
  ix <- (a>=0)&(b>=0)&(c>=0)&(d>=0)   # keep only feasible tables
  a <- a[ix]; b <- b[ix]; c <- c[ix]; d <- d[ix]

  P <- (R^a)/(exp(lgamma(a+1))*exp(lgamma(b+1))*
              exp(lgamma(c+1))*exp(lgamma(d+1)))
  P <- P/sum(P)
  return(cbind(a,b,c,d,P))
}

where 'margs' is the set of marginal totals (in the same row-col
order as above), and R = (alpha*delta)/(beta*gamma) is the
odds-ratio of the cell probabilities

  alpha   beta

  gamma   delta

Hence the probabilities for the Null Hypothesis R=1 are

 P <- P.FX(c(24,39,31,32), 1)
 P
       a  b  c  d            P
 [1,]  0 24 31  8 6.714279e-11
 [2,]  1 23 30  9 5.550471e-09
 [3,]  2 22 29 10 1.914912e-07
 [4,]  3 21 28 11 3.702164e-06
 [5,]  4 20 27 12 4.535151e-05
 [6,]  5 19 26 13 3.767664e-04
 [7,]  6 18 25 14 2.215745e-03
 [8,]  7 17 24 15 9.496050e-03
 [9,]  8 16 23 16 3.026866e-02
[10,]  9 15 22 17 7.280305e-02
[11,] 10 14 21 18 1.334723e-01
[12,] 11 13 20 19 1.877552e-01
[13,] 12 12 19 20 2.034015e-01
[14,] 13 11 18 21 1.698738e-01
[15,] 14 10 17 22 1.092046e-01
[16,] 15  9 16 23 5.381095e-02
[17,] 16  8 15 24 2.017911e-02
[18,] 17  7 14 25 5.697630e-03
[19,] 18  6 13 26 1.193093e-03
[20,] 19  5 12 27 1.814060e-04
[21,] 20  4 11 28 1.943636e-05
[22,] 21  3 10 29 1.404269e-06
[23,] 22  2  9 30 6.383041e-08
[24,] 23  1  8 31 1.611427e-09
[25,] 24  0  7 32 1.678570e-11

Extract a and P by

a <- P[,1]; p <- P[,5]

Reminder: The 2-sided Fisher test p-value is
fisher.test(X0)$p.value
[1] 0.03940996

Looking at the code for fisher.test() (which takes some thought!),
the values of 'a' are (effectively) considered in order of
decreasing probability, and added in until the observed value
of 'a' is about to be included; the resulting p-value is the sum
of the probabilities of the 'a' values not included (amongst
which is the observed value itself):

iy <- order(p, decreasing=TRUE)
cbind(a=a[iy], p=p[iy], pval=rev(cumsum(rev(p[iy]))))

         a            p         pval
 [1,]   12 2.034015e-01 1.000000e+00
 [2,]   11 1.877552e-01 7.965985e-01
 [3,]   13 1.698738e-01 6.088432e-01
 [4,]   10 1.334723e-01 4.389695e-01
 [5,]   14 1.092046e-01 3.054972e-01
 [6,]9 7.280305e-02 1.962926e-01
 [7,]   15 5.381095e-02 1.234896e-01
 [8,]8 3.026866e-02 6.967862e-02
 [9,]   16 2.017911e-02 3.940996e-02 ## Observed value
[10,]7 9.496050e-03 1.923085e-02
[11,]   17 5.697630e-03 9.734798e-03
[12,]6 2.215745e-03 4.037168e-03
[13,]   18 1.193093e-03 1.821423e-03
[14,]5 3.767664e-04 6.283293e-04
[15,]   19 1.814060e-04 2.515629e-04
[16,]4 4.535151e-05 7.015687e-05
[17,]   20 1.943636e-05 2.480536e-05
[18,]3 3.702164e-06 5.369000e-06
[19,]   21 1.404269e-06 1.666837e-06
[20,]2 1.914912e-07 2.625675e-07
[21,]   22 6.383041e-08 7.107624e-08
[22,]1 5.550471e-09 7.245826e-09
[23,]   23 1.611427e-09 1.695355e-09
[24,]0 6.714279e-11 8.392849e-11
[25,]   24 1.678570e-11 1.678570e-11

Here it can be seen that, with "extreme" meaning "towards the bottom
of the above listing", the probability of a result at least as extreme
as the observed value a=16 is 3.940996e-02, the value actually
returned by fisher.test() (see above).
The right-hand column gives the upwards cumulative sum of
the probabilities in the middle column.
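
Equivalently (a quick check, reusing the a and p extracted above): the
two-sided p-value is the total probability of all tables whose probability
does not exceed that of the observed table, which is the definition given
in ?fisher.test:

sum(p[p <= p[a == 16]])
## [1] 0.03940996  -- agreeing with fisher.test(X0)$p.value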

Now for the 95% 2-sided Confidence Interval. Finding this is a
somewhat different matter, since the above 2-sided approach
involves ordering the outcomes 2-sidedly, whereas finding the
confidence limits involves a 1-sided approach for each limit.

For the upper limit of the 95% CI, we seek 
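
The archived message breaks off at this point, but as a sketch of one way
the exact limits can be computed (reusing P.FX() above, and assuming the
convention fisher.test() uses for its exact limits: the lower 95% limit is
the R for which P(a >= 16 | R) = 0.025, and the upper limit is the R for
which P(a <= 16 | R) = 0.025):

P.ge <- function(R){ PR <- P.FX(c(24,39,31,32), R); sum(PR[PR[,"a"] >= 16, "P"]) }
P.le <- function(R){ PR <- P.FX(c(24,39,31,32), R); sum(PR[PR[,"a"] <= 16, "P"]) }
uniroot(function(R) P.ge(R) - 0.025, c(0.01, 1))$root   # ~ 0.981  (lower limit)
uniroot(function(R) P.le(R) - 0.025, c(1, 100))$root    # ~ 10.78  (upper limit)

The same duality can be checked from fisher.test() itself: at the reported
lower limit, the one-sided "greater" p-value is (up to the root-finding
tolerance) exactly 0.025:

fisher.test(X0, or = fisher.test(X0)$conf.int[1],
            alternative = "greater")$p.value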

[R] strange fisher.test result

2007-03-31 Thread Williams Scott
A simple question - using the following Fisher's exact test it appears that
the P-value is significant, but the CI includes 1. Is this result correct?

 

 data.50p10min <- matrix(c(16,15, 8, 24),nrow=2)

 fisher.test(data.50p10min)

 

Fisher's Exact Test for Count Data

 

data:  data.50p10min 

p-value = 0.03941

alternative hypothesis: true odds ratio is not equal to 1 

95 percent confidence interval:

  0.9810441 10.7771597 

sample estimates:

odds ratio 

  3.138456 

 

Thanks for comments

 

Scott 

___

Dr Scott Williams MD

Peter MacCallum Cancer Centre

Melbourne, Australia



Re: [R] strange fisher.test result

2007-03-31 Thread Ted Harding
On 31-Mar-07 13:36:04, Williams Scott wrote:
 A simple question - using the following fishers test it appears that
 the P value is significant, but the CI includes 1. Is this result
 correct?
 
 data.50p10min <- matrix(c(16,15, 8, 24),nrow=2)
 
 fisher.test(data.50p10min)
 
 Fisher's Exact Test for Count Data
 data:  data.50p10min 
 
 p-value = 0.03941
 
 alternative hypothesis: true odds ratio is not equal to 1 
 
 95 percent confidence interval:
   0.9810441 10.7771597 
 
 sample estimates:
 odds ratio 
   3.138456 
 
 Thanks for comments
 Scott

Remember that the distribution on which the Fisher Exact Test is
based is a discrete distribution, so what you might expect based
on experience with continuous distributions will not always be
the case.

In the above case, I can see a possible explanation. A two-sided test
at level 0.05 can be thought of in terms of critical values for (say)
the element in row 1, col 1 of the 2x2 table: an upper critical value
which it exceeds with probability at most 0.025, and a lower critical
value below which it falls with probability at most 0.025; the p-value
is the probability of the set of outcomes at least as extreme, to
either side, as the one observed.

On the other hand, the lower confidence bound for the OR is
the smallest value of OR that would not be rejected by a 1-sided
test at level 0.025, and the upper confidence bound for the OR
is the largest value of OR that would not be rejected by the
corresponding 1-sided test at level 0.025, on the given data.

There is no reason, with a discrete distribution, for the confidence
interval results for OR to agree exactly with the significance test
of OR=1.

More than that I cannot say without further information.

If you would post the 2x2 table of your data, it would be possible
to be more specific.

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 31-Mar-07   Time: 15:05:09
-- XFMail --



Re: [R] strange fisher.test result

2007-03-31 Thread Prof Brian Ripley
The two-sided test of odds-ratio=1 is not necessarily (and in this case is
not) equivalent to an equal-tailed confidence interval: see the help page
comment

  Two-sided tests are based on the probabilities of the tables, and
  take as 'more extreme' all tables with probabilities less than or
  equal to that of the observed table, the p-value being the sum of
  such probabilities.

 fisher.test(data.50p10min, alternative="greater", conf.level=0.975)

 Fisher's Exact Test for Count Data

data:  data.50p10min
p-value = 0.02727
alternative hypothesis: true odds ratio is greater than 1
97.5 percent confidence interval:
  0.9810441   Inf
sample estimates:
odds ratio
   3.138456

which is not significant at 2.5%, and the two-tailed p-value is not double 
it.  There are other ways to compute confidence intervals, but R's 
fisher.test() gives the 97.5% lower and upper confidence limits.
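
The quoted definition can be checked directly (a quick sketch, using
dhyper() for the conditional null distribution of the (1,1) cell given
the margins):

a    <- 0:24                      # possible values of the (1,1) cell
prob <- dhyper(a, 31, 32, 24)     # P(cell = a | margins), under odds ratio 1
sum(prob[prob <= dhyper(16, 31, 32, 24)])   # 0.03941, the two-sided p-value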


On Sat, 31 Mar 2007, Williams Scott wrote:

 A simple question - using the following fishers test it appears that the P 
 value is significant, but the CI includes 1. Is this result correct?



 data.50p10min <- matrix(c(16,15, 8, 24),nrow=2)

 fisher.test(data.50p10min)



Fisher's Exact Test for Count Data



 data:  data.50p10min

 p-value = 0.03941

 alternative hypothesis: true odds ratio is not equal to 1

 95 percent confidence interval:

  0.9810441 10.7771597

 sample estimates:

 odds ratio

  3.138456


-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
