[R] Chi Square with two tab-delimited text files
Hi,

I want to do a chi-square test and I have two tab-delimited text files with Expected and Observed values to compare. Each file contains only the values and is 48 rows by 116 columns. I have managed to do something with them, but I don't think it is right, as I got a p-value of 1. In this case I used the read.table() function to read the values from the files, but I don't know if this was right.

  x = read.table(file="C:/Program Files/R/R-2.2.1/Projects/Stats EU/Expected input.txt")
  y = read.table(file="C:/Program Files/R/R-2.2.1/Projects/Stats EU/Observed input.txt")
  chisq.test(x, y)

          Pearson's Chi-squared test

  data:  x
  X-squared = 4.4602, df = 5405, p-value = 1

  Warning message:
  Chi-squared approximation may be incorrect in: chisq.test(x, y)

Maybe the scan() function is more correct? Using this I got:

  x = scan(file="C:/Program Files/R/R-2.2.1/Projects/Stats EU/Observed input.txt")
  Read 5568 items
  y = scan(file="C:/Program Files/R/R-2.2.1/Projects/Stats EU/Expected input.txt")
  Read 5568 items
  chisq.test(x, y)

          Pearson's Chi-squared test

  data:  x and y
  X-squared = 172306.4, df = 13880, p-value < 2.2e-16

  Warning message:
  Chi-squared approximation may be incorrect in: chisq.test(x, y)

Any help would be much appreciated.

Regards,
Carina

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Chi Square with two tab-delimited text files
Carina Brehony wrote:
> Hi, I want to do a chi-square test and I have two tab-delimited text files with Expected and Observed values to compare. Each file contains only the values
> [snip]

There are a lot of chi^2 tests; most of them compare O and E (observed and expected) quantities, and it is not clear which one you want to use. I'd guess a goodness-of-fit test, but who knows? See ?chisq.test and the examples given there. The help page also tells you that the y argument is ignored if x is a matrix (that's probably the reason why you get different results using read.table and scan).

Petr
--
Petr Klasterecky
Dept. of Probability and Statistics
Charles University in Prague
Czech Republic
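To illustrate the point about the ignored y argument, here is a minimal sketch with small made-up matrices (the values are hypothetical, not the poster's data). It assumes current chisq.test() behaviour: a matrix (or a data frame, which is converted to a matrix) is treated as a contingency table and y is dropped, whereas two plain vectors, such as scan() returns, are converted to factors and cross-tabulated.

```r
# Toy matrices (hypothetical values, not the poster's data).
x <- matrix(c(12, 5, 9, 7, 14, 10), nrow = 2)
y <- matrix(c(3, 8, 6, 11, 4, 9), nrow = 2)

# x is a matrix, so it is treated as a contingency table and y is ignored.
res_xy <- chisq.test(x, y)
res_x  <- chisq.test(x)

# Plain vectors are handled differently: both are converted to factors and
# cross-tabulated, so the test runs on the table of distinct values rather
# than on the counts themselves. The sparse table triggers the usual
# approximation warning, which is suppressed here.
res_vec <- suppressWarnings(chisq.test(as.vector(x), as.vector(y)))

print(res_xy$statistic)   # same statistic as res_x
print(res_x$statistic)
print(res_vec$parameter)  # degrees of freedom of the cross-tabulation
```

Neither call is a comparison of observed against expected values, which is consistent with the two very different results reported in the original post.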
Re: [R] Chi Square with two tab-delimited text files
Yes, I would like to do a goodness-of-fit test.
Re: [R] Chi Square with two tab-delimited text files
It's a bit difficult to advise without knowing what the rows and columns represent, but why not just calculate the statistic yourself, given that you already have observed and expected values? For example, with the expected values in x and the observed values in y (as in your read.table example):

  chi2 <- sum((y-x)^2/x)

--
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP
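The hand calculation suggested here can be sketched as follows. The observed counts are made up, and the expected counts are derived under independence of rows and columns purely so that the result can be checked against chisq.test(); that choice, and the degrees of freedom (r - 1)(c - 1) that go with it, are assumptions. Expected values obtained some other way may call for different degrees of freedom, and any expected value of zero makes the sum undefined.

```r
# Hypothetical observed counts (not the poster's data).
observed <- matrix(c(12, 5, 9, 7, 14, 10), nrow = 2)

# Assumption: expected counts come from row and column totals
# (independence of rows and columns).
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E.
chi2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom for an r x c table under independence.
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value from the chi-square distribution.
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

print(c(chi2 = chi2, df = df, p_value = p_value))
```

For a table larger than 2 x 2, chisq.test(observed) applies no continuity correction, so its statistic and p-value should agree with the values computed by hand.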
Re: [R] Chi Square with two tab-delimited text files
Hi,

The files look like the sample below. The rows and columns are counts of genetic types, e.g. row 1 is type 4 and column 1 is type A, so the row 1, column 1 cell says there are 78 type 4/type A combinations. I hope this makes sense!

78 500 18 6 0 4 0 1 6 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 1 0 0 0 1 0 0 0 0 23 0 0 0 7 0 0 7 0 0 0 6 0 8 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 45 0 0 0 0 0 0 0 0 0 0 0 0 3 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 8 4 0 0 0 0 0 0 etc...
Re: [R] Chi Square with two tab-delimited text files
In that case, you can just ignore the expected values and use the observed values in chisq.test. The reason you got a p-value of 1 before is that the second argument was ignored, so you did a chi-square test on the expected values alone. If you have loaded the observed values into a matrix y using read.table as in your first example, then just use chisq.test(y). But note that you have a lot of zero cells and so probably lots of small expected values, which is a problem for the chi-square test.

--
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP
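As a rough sketch of this advice, the following reads a small tab-delimited file of counts, runs chisq.test() on it, and counts how many expected values fall below 5 (a common rule of thumb, and the threshold behind the "approximation may be incorrect" warning). The file is a temporary one written by the example itself, and the counts in it are hypothetical.

```r
# Write a small tab-delimited file of hypothetical counts (no header row).
toy <- matrix(c(78, 50, 18, 6,
                23, 30,  7, 9,
                14,  5, 12, 8), nrow = 3, byrow = TRUE)
tmp <- tempfile(fileext = ".txt")
write.table(toy, tmp, sep = "\t", row.names = FALSE, col.names = FALSE)

# read.table() returns a data frame; convert it to a matrix explicitly.
y <- as.matrix(read.table(tmp, sep = "\t", header = FALSE))

# Test of independence on the observed table only.
# The small-expected-count warning is suppressed and checked by hand below.
res <- suppressWarnings(chisq.test(y))

# Number of cells whose expected count is below 5.
n_small <- sum(res$expected < 5)

print(res)
print(n_small)
```

With a 48 by 116 table containing many zeros, n_small is likely to be large, which is when the chi-square approximation becomes unreliable.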
Re: [R] Chi Square with two tab-delimited text files
Hi,

Thanks for the input. I have tried the test again using just the Observed values and the read.table() function, and I get this:

  data:  y
  X-squared = NaN, df = 5405, p-value = NA

  Warning message:
  Chi-squared approximation may be incorrect in: chisq.test(y)

So it doesn't seem to like it! I guess the zeroes are a problem for it. Is there another way around this? Do I also need to have the totals of each column and row in the file?

Thanks,
Carina
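The thread does not record an answer to this question. One likely cause of the NaN, offered here as an assumption rather than a diagnosis of the actual files, is that at least one row or column sums to zero: its expected counts are then zero and the statistic contains 0/0 terms. A minimal sketch with hypothetical counts shows the effect, one way to drop empty rows and columns, and the simulate.p.value option of chisq.test(), which avoids relying on the chi-square approximation for sparse tables.

```r
# Hypothetical counts with one all-zero row and one all-zero column.
obs <- matrix(c(78, 50, 0, 6,
                23, 30, 0, 9,
                 0,  0, 0, 0,
                14,  5, 0, 8), nrow = 4, byrow = TRUE)

# Zero margins give zero expected counts, so the statistic is NaN.
res_raw <- suppressWarnings(chisq.test(obs))

# Drop rows and columns whose totals are zero.
trimmed <- obs[rowSums(obs) > 0, colSums(obs) > 0, drop = FALSE]
res_trim <- suppressWarnings(chisq.test(trimmed))

# Monte Carlo p-value instead of the chi-square approximation.
set.seed(42)  # only to make the simulated p-value reproducible
res_sim <- chisq.test(trimmed, simulate.p.value = TRUE, B = 2000)

print(res_raw$statistic)
print(res_trim)
print(res_sim)
```

Row and column totals do not need to be stored in the file: chisq.test() computes them from the table, and including them as extra cells would distort the test.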