[R] Is it safe? Cochran etc

2004-10-09 Thread Dan Bolser

I have the following contingency table

dat <- matrix(c(1,506,13714,878702),nr=2)

And I want to test if there is an association between events

A:{a,not(a)} and B:{b,not(b)}

        |   b | not(b) |
--------+-----+--------+
 a      |   1 |  13714 |
--------+-----+--------+
 not(a) | 506 | 878702 |
--------+-----+--------+

I am worried that prop.test and chisq.test are not valid given the low
counts and low probabilities associated with 'success' in each category.

Is it safe to use them, and what is the alternative? (Given that
fisher.test can't handle this data... hold the phone...)

I just found that fisher.test can handle this data if the test is
one-tailed rather than two-tailed.

I don't understand the difference between chisq.test, prop.test and
fisher.test when the hybrid=1 option is used for the fisher.test.

I was using the binomial distribution to test the 'extremity' of the
observed data, but now I think I know why that is inappropriate. However,
with the binomial (and its approximation) at least I know what I am
doing, and I can do it in Perl easily...

Generally, how should I calculate fisher.test in Perl (i.e. what are its
principles)? When is it safe to approximate Fisher's test with the
chi-squared test?
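
For reference, a minimal sketch of the principle behind the one-sided test: with all row and column totals held fixed, the count in cell (a, b) follows a hypergeometric distribution under independence, so the one-tailed p-value is a hypergeometric tail probability. The variable names below are illustrative and not from the thread; the only R functions used are phyper() and fisher.test().

```r
# 2x2 table from the question: rows are a / not(a), columns are b / not(b)
dat <- matrix(c(1, 506, 13714, 878702), nrow = 2)

x <- dat[1, 1]        # observed count in cell (a, b)
m <- sum(dat[, 1])    # column total for b
n <- sum(dat[, 2])    # column total for not(b)
k <- sum(dat[1, ])    # row total for a

# P(X <= x) when k items are drawn without replacement
# from m "b" items and n "not(b)" items
p_less <- phyper(x, m, n, k)

# The one-sided exact test should give the same value
p_fisher <- fisher.test(dat, alternative = "less")$p.value
c(hypergeometric = p_less, fisher = p_fisher)
```

The same tail sum could presumably be ported to Perl by adding up hypergeometric probabilities computed from log-factorials, although that port is not shown here.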

I cannot get insight into this problem...

How come if I do...

dat <- matrix(c(50,60,100,100),nr=2)

prop.test(dat)$p.value
chisq.test(dat)$p.value
fisher.test(dat)$p.value

I get 

[1] 0.5173269
[1] 0.5173269
[1] 0.4771358

When I looked at the binomial distribution and the normal approximation
thereof with similar counts I never had a p-value difference > 0.004

I am so fed up with this problem :(

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Is it safe? Cochran etc

2004-10-09 Thread Dan Bolser

Why can't I just use the log odds? Does the standard error of the log
odds depend on a similar chi-squared assumption?





Re: [R] Is it safe? Cochran etc

2004-10-09 Thread Frederico Zanqueta Poleto
Dan,
I don't know the theory behind this hybrid option or what the Cochran
conditions consist of.

However, even if you suppose that the asymptotic distribution is not
very accurate because one of your cells has a count of only 1, there is
still a fairly strong association between A and B. This can be seen with
conservative methods such as the Yates continuity correction or the
Wald/Neyman test of the log odds (which usually rejects the null
hypothesis of no association less readily than the Pearson/score test
and the likelihood ratio test, in that order).
Both procedures inflate the p-values, but not enough to change your
conclusion, as you can see from:

chisq.test(dat,correct=FALSE)

   Pearson's Chi-squared test

data:  dat
X-squared = 6.0115, df = 1, p-value = 0.01421

chisq.test(dat)

   Pearson's Chi-squared test with Yates' continuity correction

data:  dat
X-squared = 5.1584, df = 1, p-value = 0.02313

# Wald test of null log odds
1-pchisq( (log(878702/(13714*506))^2)/(1+1/878702+1/13714+1/506) ,1)
[1] 0.03898049

The book Categorical Data Analysis by Agresti (2002) has an ample
discussion of tests like this in chapters 1 (basics and one sample)
and 3 (two variables). You may look there if you still have doubts about
these tests.
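
The Wald one-liner above can be unpacked step by step. This is only a sketch with illustrative variable names (log_or, var_log_or, wald, p_wald), and it uses the usual large-sample variance of the log odds ratio, the sum of the reciprocals of the four cell counts. Because that variance formula is itself an asymptotic approximation, the standard error of the log odds is not free of the large-sample assumption either.

```r
# 2x2 table from the thread: rows are a / not(a), columns are b / not(b)
dat <- matrix(c(1, 506, 13714, 878702), nrow = 2)

# Sample log odds ratio: log((n11 * n22) / (n12 * n21))
log_or <- log((dat[1, 1] * dat[2, 2]) / (dat[1, 2] * dat[2, 1]))

# Large-sample variance: sum of reciprocals of the four cell counts
var_log_or <- sum(1 / dat)

# Wald chi-squared statistic on 1 degree of freedom and its p-value
wald   <- log_or^2 / var_log_or
p_wald <- pchisq(wald, df = 1, lower.tail = FALSE)
p_wald
```

The result should match the p-value of about 0.039 printed above, since it is the same calculation written out in steps.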

Sincerely,
--
Frederico Zanqueta Poleto
[EMAIL PROTECTED]
--
An approximate answer to the right problem is worth a good deal more than an exact answer 
to an approximate problem. J. W. Tukey



Re: [R] Is it safe? Cochran etc

2004-10-09 Thread Kjetil Brinchmann Halvorsen
Dan Bolser wrote:
I am worried that prop.test and chisq.test are not valid given the low
counts and low probabilities associated with 'success' in each category.
 

 test <- matrix( c(1,506, 13714, 878702), 2,2)
 test
     [,1]   [,2]
[1,]    1  13714
[2,]  506 878702

 chisq.test(test)

   Pearson's Chi-squared test with Yates' continuity correction

data:  test
X-squared = 5.1584, df = 1, p-value = 0.02313

 chisq.test(test,sim=TRUE)

   Pearson's Chi-squared test with simulated p-value (based on 2000
   replicates)

data:  test
X-squared = 6.0115, df = NA, p-value = 0.02099

 # To rule out simulation uncertainty
 chisq.test(test,sim=TRUE, B=20)

   Pearson's Chi-squared test with simulated p-value (based on 20
   replicates)

data:  test
X-squared = 6.0115, df = NA, p-value = 0.01634

 N <- sum(test)
 rows <- rowSums(test)
 cols <- colSums(test)
 E <- rows %o% cols/N
 E
           [,1]      [,2]
[1,]   7.787351  13707.21
[2,] 499.212649 878708.79
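
As a cross-check on the hand calculation above, the same expected counts are available from the object that chisq.test() returns, in its `expected` component. A short sketch:

```r
test <- matrix(c(1, 506, 13714, 878702), 2, 2)

# Expected counts under independence, as computed inside chisq.test()
E <- chisq.test(test)$expected
E

# Smallest expected count, to compare against the rule-of-thumb threshold of 5
min(E)
```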

None of the expected counts is less than 5, an often-used rule of thumb
(which might even be too conservative). Just to check on the distribution:

 rows <- round(rows)
 cols <- round(cols)
 pvals <- sapply(r2dtable(10, rows, cols), function(x) chisq.test(x)$p.value)
 hist(pvals)
 # not very good approximation to uniform histogram, but:
 sum(pvals < 0.05)/10
[1] 0.03068
 sum(pvals < 0.01)/10
[1] 0.00669

So the true levels are not very far from those calculated by the
chi-squared approximation, and it seems safe to use it.

All of this is to show that with R you are no longer dependent on old
rules of thumb: you can investigate for yourself.

Kjetil
 


--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
  --  Mahdi Elmandjra