[R] Chi-Square Test Disagreement

2008-11-26 Thread Andrew Choens
I was asked by my boss to do an analysis on a large data set, and I am
trying to convince him to let me use R rather than SPSS. I think Sweave
could make my life much, much easier. To get me a little closer to this
goal, I ran my analysis through R and SPSS and compared the resulting
values. In all but one case, they were the same. Given the matrix

     [,1] [,2]
[1,]  110  358
[2,]   71  312
[3,]   29  139
[4,]   31   77
[5,]   13   32

This is the output from R:
 chisq.test(test29)

Pearson's Chi-squared test

data:  test29
X-squared = 9.593, df = 4, p-value = 0.04787

But SPSS gives a p-value of .051 for the same data. It's a small but
important difference. I played around, rescaled things, and tried
different values for B, but I could never get R to reach .051.

I'd like to know which program is correct: R or SPSS? I know this is a
biased place to ask such a question. I would also appreciate any input
that will help me use R more effectively. The difference could be the
result of my own ignorance.
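To make the comparison concrete, it may help to see what `chisq.test()` computes for this table. The sketch below is an illustration, not code from the thread: it rebuilds the matrix from the post and computes Pearson's statistic directly from the margins. Each expected count is (row total * column total) / N, X^2 is the sum of (O - E)^2 / E over all cells, and the result is referred to a chi-squared distribution with (r - 1)(c - 1) = 4 degrees of freedom. No continuity correction is involved, because `chisq.test()` applies Yates' correction only to 2 x 2 tables.

```r
# Rebuild the 5 x 2 table from the post (rows as printed above)
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)

# Expected counts under independence: E[i, j] = row total i * column total j / N
N <- sum(test29)
E <- outer(rowSums(test29), colSums(test29)) / N

# Pearson's statistic: sum over all cells of (O - E)^2 / E
X2 <- sum((test29 - E)^2 / E)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df <- (nrow(test29) - 1) * (ncol(test29) - 1)

# Upper-tail probability from the chi-squared distribution
p <- pchisq(X2, df = df, lower.tail = FALSE)

print(c(X2 = X2, df = df, p = p))

# Cross-check against the built-in test
all.equal(X2, unname(chisq.test(test29)$statistic))
```

The printed values should match the `X-squared = 9.593, df = 4, p-value = 0.04787` line in the R output above.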

thanks
--andy

-- 
Insert something humorous here.  :-)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Chuck Cleland
On 11/26/2008 9:51 AM, Andrew Choens wrote:
 [...]
 But, the same data in SPSS generates a p value of .051. It's a small but
 important difference. [...]
 
 I'd like to know which program is correct - R or SPSS? [...]

  The SPSS p-value of .051 is for the Likelihood Ratio Chi-squared test, not
Pearson's.  For Pearson's Chi-squared test in SPSS (16.0.2), I get
p=0.04787, so the results do match if you do the same Chi-squared test.
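Chuck's explanation can be checked without SPSS. The sketch below is an illustration rather than code from the thread: it computes the likelihood-ratio statistic G^2 = 2 * sum(O * log(O / E)) from the same expected counts and refers it to the same 4 degrees of freedom. Base R's `chisq.test()` reports only Pearson's statistic, so G^2 is computed by hand here; this simple form assumes every observed count is positive, which holds for this table. For these data G^2 comes out at roughly 9.46, with an upper-tail p-value of about 0.0505, in line with the .051 that SPSS prints for its Likelihood Ratio row.

```r
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)

# Pearson's test as Andy ran it; the result also carries the expected counts
pearson <- chisq.test(test29)
E <- pearson$expected

# Likelihood-ratio (G^2) statistic: 2 * sum(O * log(O / E)).
# This simple form assumes every observed count is > 0, which holds here.
G2 <- 2 * sum(test29 * log(test29 / E))
df <- (nrow(test29) - 1) * (ncol(test29) - 1)
p_lr <- pchisq(G2, df = df, lower.tail = FALSE)

print(c(pearson_p = pearson$p.value, lr_stat = G2, lr_p = p_lr))
```

Both statistics test the same hypothesis of independence and are asymptotically equivalent; they simply differ a little in a finite sample like this one.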


-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Berwin A Turlach
G'day Andy,

On Wed, 26 Nov 2008 14:51:50 +
Andrew Choens [EMAIL PROTECTED] wrote:

 I was asked by my boss to do an analysis on a large data set, and I am
 trying to convince him to let me use R rather than SPSS. 

Very laudable of you. :)

 This is the output from R:
  chisq.test(test29)
 
   Pearson's Chi-squared test
 
 data:  test29
 X-squared = 9.593, df = 4, p-value = 0.04787
 
 But, the same data in SPSS generates a p value of .051. It's a small
 but important difference. 

Chuck has already explained the reason for this small difference.  I just
take issue with it being an important difference.  In my opinion, this
difference is not important at all.  It would only matter to people who
still stick to arbitrary cut-off points that are mainly due to
historical coincidences and the lack of computing power at those times
in history.  If somebody tells you that this difference is important,
ask him or her whether he or she is willing to finance a room full of
calculators for you (in the sense of Pearson's time) and whether he or
she wants you to do all your calculations and analyses with these
calculators in future.  Alternatively, you could ask the person whether
he or she would like the anaesthetist at his or her next operation to
use chloroform, given his or her nostalgic penchant for outdated
rituals/methods.

 I played around and rescaled things, and tried different values for
 B, but I never could get R to reach .051.

Well, I have no problem when using simulated p-values to get something
close to 0.051; look at the last try.  The second one might also be
noteworthy.  Unfortunately, I didn't save the seed beforehand.

 test29 <- matrix(c(110,358,71,312,29,139,31,77,13,32), byrow=TRUE, ncol=2)
 test29
     [,1] [,2]
[1,]  110  358
[2,]   71  312
[3,]   29  139
[4,]   31   77
[5,]   13   32
 chisq.test(test29, simul=TRUE)

Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.04798

 chisq.test(test29, simul=TRUE)

Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.05697

 chisq.test(test29, simul=TRUE, B=2)

Pearson's Chi-squared test with simulated p-value (based on
2 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.0463

 chisq.test(test29, simul=TRUE, B=2)

Pearson's Chi-squared test with simulated p-value (based on
2 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.0499

 chisq.test(test29, simul=TRUE, B=2)

Pearson's Chi-squared test with simulated p-value (based on
2 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.0486

 chisq.test(test29, simul=TRUE, B=2)

Pearson's Chi-squared test with simulated p-value (based on
2 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.05125
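Berwin's remark about the seed can be made concrete. With `simulate.p.value = TRUE`, `chisq.test()` samples tables with the same row and column totals as the observed one and reports (1 + number of simulated statistics at least as large as the observed one) / (B + 1), so each run carries Monte Carlo error of roughly sqrt(p * (1 - p) / B). Near p = 0.05 with the default B = 2000 that is about 0.0049, which is enough to account for the spread between the two default runs above (0.04798 and 0.05697). By the same formula, B = 2 could only ever yield 1/3, 2/3 or 1, so the replicate count shown in the last four runs has presumably lost digits in the archive. The sketch below is illustrative only; the seed value and B = 20000 are arbitrary choices, not the settings Berwin used.

```r
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)

B <- 20000  # replicate count chosen for illustration only

# Fix the RNG state so the simulated p-value can be reproduced later
set.seed(20081126)  # arbitrary, hypothetical seed
sim1 <- chisq.test(test29, simulate.p.value = TRUE, B = B)

# Same seed and same B: the simulation is repeated exactly
set.seed(20081126)
sim2 <- chisq.test(test29, simulate.p.value = TRUE, B = B)

p <- sim1$p.value
mc_se <- sqrt(p * (1 - p) / B)  # approximate Monte Carlo standard error

print(c(p_value = p, mc_se = mc_se))
identical(sim1$p.value, sim2$p.value)
```

Saving the seed (or the `sim1` object) before a run is what makes a simulated p-value reportable; increasing B shrinks the Monte Carlo error but never changes which test is being carried out.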


Cheers,

Berwin

=== Full address =
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7         e-mail: [EMAIL PROTECTED]
Singapore 117546                    http://www.stat.nus.edu.sg/~statba



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Andrew Choens
On Thu, 2008-11-27 at 00:46 +0800, Berwin A Turlach wrote:
 Chuck has already explained the reason for this small difference.  I
 just take issue with it being an important difference.  [...]  If
 somebody tells you that this difference is important, ask him or her
 whether he or she is willing to finance a room full of calculators for
 you [...]

Yes, he did, and when I realized the source of my confusion I was
appropriately chastised. I felt a bit of a fool. Of course, I should
compare apples to apples; oranges are another thing entirely.

As to the importance of the difference, I am of two minds. On the one
hand I fully agree with you. It is an anachronistic approach. On the
other hand we don't all have the pleasure of working in a math
department where such subtleties are well understood.

I work for a consulting firm that advises state and local governments
(USA). I personally do try to expand my understanding of statistics and
math (I do not have a degree in math), but my clients do not. When I'm
working with someone from the government, it is sometimes easier to
simply tell them that relationship x is significant at a given level.
Although I doubt they could really explain the details, they have some
basic understanding of what I am talking about. Subtleties are
sometimes lost on our public servants.

And, since I do work for government, if I ask for a roomful of
calculators, I might just get them. And really, what am I going to do
with a roomful of calculators?

--andy




Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Ted Harding
On 26-Nov-08 17:57:52, Andrew Choens wrote:
 [...]
 And, since I do work for government, if I ask for a roomful of
 calculators, I might just get them. And really, what am I going
 to do with a roomful of calculators?
 
 --andy
 Insert something humorous here.  :-)

Next time the launch of an incoming nuclear strike is detected,
set them to work as follows (following Karl Pearson's historical
precedent):

  Anti-aircraft guns all day long: Computing for the
 Ministry of Munitions
 JUNE BARROW GREEN (Open University)
   From January 1917 until March 1918 Pearson and his
   staff of mathematicians and human computers at the
   Drapers Biometric Laboratory worked tirelessly on
   the computing of ballistic charts, high-angle range
   tables and fuze-scales for AV Hill of the Anti-Aircraft
   Experimental Section. Things did not always go smoothly
   -- Pearson did not take kindly to the calculations of
   his staff being questioned -- and Hill sometimes had
   to work hard to keep the peace.

If you have enough of them (and Pearson undoubtedly did, so you
can quote that in your requisition request), then you might just
get the answer in time!

[ The above excerpted from http://tinyurl.com/6byoub ]

Good luck!
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Nov-08   Time: 18:35:25
-- XFMail --



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Andrew Choens

 Next time the launch of an incoming nuclear strike is detected,
 set them to work as follows (following Karl Pearson's historical
 precedent):
 [...]

That is absolutely classic.

