[R] Chi Square with two tab-delimited text files

2007-02-26 Thread Carina Brehony
Hi,

I want to do a chi square test and I have two tab delimited text files with
Expected and Observed values to compare.  Each file contains only the values
and is 48 rows by 116 columns.  I have managed to do something with them,
but I don't think it is right as I got a p value of 1.  In this case I used
the read.table() function to read the values from the files, but I don't
know if this was right.

> x=read.table(file="C:/Program Files/R/R-2.2.1/Projects/Stats EU/Expected input.txt")
> y=read.table(file="C:/Program Files/R/R-2.2.1/Projects/Stats EU/Observed input.txt")
> chisq.test(x,y)

        Pearson's Chi-squared test

data:  x
X-squared = 4.4602, df = 5405, p-value = 1

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(x, y)

Maybe the scan() function is more correct?  Using this I got:

> x=scan(file="C:/Program Files/R/R-2.2.1/Projects/Stats EU/Observed input.txt")
Read 5568 items
> y=scan(file="C:/Program Files/R/R-2.2.1/Projects/Stats EU/Expected input.txt")
Read 5568 items
> chisq.test(x,y)

        Pearson's Chi-squared test

data:  x and y
X-squared = 172306.4, df = 13880, p-value < 2.2e-16

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(x, y)

Any help would be much appreciated.

Regards,
Carina



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Chi Square with two tab-delimited text files

2007-02-26 Thread Petr Klasterecky
Carina Brehony wrote:
> Hi,
> I want to do a chi square test and I have two tab delimited text files with
> Expected and Observed values to compare.  Each file contains only the values
<snip>

There are a lot of chi^2 tests, most of them compare observed (O) and
expected (E) quantities, and it is not clear which one you want to use.
I'd guess a goodness of fit test, but who knows? See ?chisq.test and the
examples given there. It also tells you that the y argument is ignored if
x is a matrix (that's probably the reason why you get different results
using read.table and scan).
Petr
-- 
Petr Klasterecky
Dept. of Probability and Statistics
Charles University in Prague
Czech Republic
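
The point about the ignored y argument can be checked on a small case. The
following is only a sketch: the two 3 x 4 tables are made-up numbers, not the
poster's data, and the temporary files stand in for the two tab-delimited
input files. It relies on the documented behaviour of chisq.test(): a data
frame from read.table() is converted to a matrix and treated as a contingency
table on its own, while two plain vectors from scan() are turned into factors
and cross-tabulated.

```r
# Hypothetical 3 x 4 tables of counts (illustration only, not the thread's data).
expected <- matrix(c(10, 20, 30, 40,
                     15, 25, 35, 45,
                     20, 30, 40, 50), nrow = 3, byrow = TRUE)
observed <- matrix(c(12, 18, 33, 37,
                     14, 27, 31, 48,
                     22, 29, 42, 47), nrow = 3, byrow = TRUE)

# Write them as headerless tab-delimited files, like the files described above.
f_exp <- tempfile(fileext = ".txt")
f_obs <- tempfile(fileext = ".txt")
write.table(expected, f_exp, sep = "\t", row.names = FALSE, col.names = FALSE)
write.table(observed, f_obs, sep = "\t", row.names = FALSE, col.names = FALSE)

# read.table() returns a data frame; chisq.test() converts it to a matrix
# and then ignores the second argument.
x_df <- read.table(f_exp, sep = "\t")
y_df <- read.table(f_obs, sep = "\t")
with_y    <- suppressWarnings(chisq.test(x_df, y_df))
without_y <- suppressWarnings(chisq.test(x_df))
same_stat <- isTRUE(all.equal(unname(with_y$statistic),
                              unname(without_y$statistic)))

# scan() returns a plain numeric vector; chisq.test() then cross-tabulates
# the distinct values of the two vectors, which is a different test again.
x_vec <- scan(f_exp, quiet = TRUE)
y_vec <- scan(f_obs, quiet = TRUE)
vec_test <- suppressWarnings(chisq.test(x_vec, y_vec))

print(same_stat)
print(unname(with_y$parameter))    # (rows - 1) * (columns - 1) of the table
print(unname(vec_test$parameter))  # based on the number of distinct values
```

Neither call compares observed with expected counts cell by cell, which is
consistent with the two very different results reported in the first message.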



Re: [R] Chi Square with two tab-delimited text files

2007-02-26 Thread Carina Brehony
Yes, I would like to do a goodness-of-fit test.



Re: [R] Chi Square with two tab-delimited text files

2007-02-26 Thread David Barron
It's a bit difficult to advise without knowing what the rows and
columns represent, but why not just calculate the statistic yourself,
given that you already have observed and expected values?  For
example:

chi2 <- sum((y-x)^2/x)
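
A minimal sketch of that hand calculation, extended to a p-value, might look
like the following. The counts are hypothetical, the names observed and
expected are placeholders, and the degrees of freedom assume a simple
goodness-of-fit setting in which no parameters were estimated from the data.
Note that any cell with an expected value of zero makes the division
undefined.

```r
# Hypothetical counts for three categories (illustration only).
expected <- c(20, 30, 50)
observed <- c(25, 28, 47)

# Pearson statistic: sum over cells of (O - E)^2 / E.
chi2 <- sum((observed - expected)^2 / expected)

# Assumption: df = number of cells - 1, i.e. nothing was estimated from the data.
df <- length(observed) - 1
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

# The same test via chisq.test(), passing the expected proportions as p.
builtin <- chisq.test(observed, p = expected / sum(expected))

print(chi2)
print(p_value)
print(builtin)
```

The built-in call only matches the hand calculation when the observed and
expected totals agree, because chisq.test() rescales p by the observed total.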






-- 
=
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP



Re: [R] Chi Square with two tab-delimited text files

2007-02-26 Thread Carina Brehony
Hi,
The files look like the excerpt below. The rows and columns are genetic
types, e.g. row 1 is type 4 and column 1 is type A, and each cell holds a
count. So for the row1:column1 cell there are 78 type 4/type A combinations.
I hope this makes sense!



78  500 18  6   0   4   0   1   6
1   1   0   0   0   1   0   0   0   0
0   1   0   0   0   0   0   2   1   0
0   0   1   0   0   0   0   23  0   0
0   7   0   0   7   0   0   0   6   0
8   0   0   0   0   0   0   14  0   0
0   0   0   0   0   0   0   5   0   0
0   0   0   0   45  0   0   0   0   0
0   0   0   0   0   0   0   3   0   40
0   0   0   0   0   0   0   0   0   0
0   0   0   12  0   0   0   0   8   4
0   0   0   0   0   0   etc...  







Re: [R] Chi Square with two tab-delimited text files

2007-02-26 Thread David Barron
In that case, you can just ignore the expected values and use the
observed values in the chisq.test.  The reason you got a p value of 1
before is that the second argument was ignored, and so you did a
chi square test on the expected values alone.

If you have loaded the observed values into y using read.table
as in your first example, then just use chisq.test(y).  But you should
notice that you have a lot of zero cells and so probably lots of small
expected values, which is a problem for the chi square test.
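
One way to see how serious the small expected values are is to look at the
expected counts that chisq.test() computes from the row and column totals.
This is a sketch on a made-up 4 x 4 table loosely modelled on the excerpt
above, not on the real 48 x 116 data; the threshold of 5 is the same one
chisq.test() uses when it issues its warning.

```r
# Hypothetical sparse table of counts (illustration only).
obs <- matrix(c(78, 5,  0, 1,
                 1, 1,  0, 0,
                 0, 1,  2, 1,
                 0, 0, 23, 0), nrow = 4, byrow = TRUE)

# Test of independence on the observed counts alone.
res <- suppressWarnings(chisq.test(obs))

# Expected count in each cell under independence:
# row total * column total / grand total.
by_hand <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# How many cells fall below 5, and what share of the table that is.
n_small    <- sum(res$expected < 5)
prop_small <- mean(res$expected < 5)

print(round(res$expected, 2))
print(n_small)
print(prop_small)
```

When most cells fall below the threshold, the chi-squared approximation to
the p-value is unreliable, which is what the warning message is reporting.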





Re: [R] Chi Square with two tab-delimited text files

2007-02-26 Thread Carina Brehony
Hi,
Thanks for the input.  I have tried the test again just using the Observed
values and the read.table() function and get this:

data:  y 
X-squared = NaN, df = 5405, p-value = NA

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(y)


So it doesn't seem to like it!  I guess the zeroes are a problem for it.  Is
there a way around this? Do I need to have the totals of each column and
row in the file also?

Thanks,
Carina
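
The NaN can be reproduced on a small made-up table: a row or column that is
entirely zero has an expected count of zero in every cell, so the statistic
contains 0/0 terms. The sketch below shows one possible workaround, not a
recommendation from the thread: drop the all-zero rows and columns, then ask
chisq.test() for a Monte Carlo p-value through its simulate.p.value argument
instead of relying on the chi-squared approximation. The table and the
choice of B = 2000 replicates are arbitrary. Row and column totals do not
need to be in the file, since chisq.test() computes them from the counts.

```r
# Hypothetical table with an all-zero row (2) and an all-zero column (2).
obs <- matrix(c(8, 0,  0, 14,
                0, 0,  0,  0,
                0, 0,  0,  5,
                3, 0, 12,  4), nrow = 4, byrow = TRUE)

# Zero margins give zero expected counts, hence a NaN statistic.
raw <- suppressWarnings(chisq.test(obs))

# Keep only rows and columns that contain at least one observation.
keep_rows <- rowSums(obs) > 0
keep_cols <- colSums(obs) > 0
trimmed <- obs[keep_rows, keep_cols, drop = FALSE]

# Monte Carlo p-value; set.seed() only makes the sketch repeatable.
set.seed(1)
sim <- chisq.test(trimmed, simulate.p.value = TRUE, B = 2000)

print(raw$statistic)
print(dim(trimmed))
print(sim)
```

Whether dropping empty types is appropriate depends on what the rows and
columns mean in the study, so this is only a starting point.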
