Re: [R] Chi-square test: Specifying expected proportions for two way table
Perhaps I misunderstand, but ?chisq.test explicitly says:

"If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of x must be non-negative integers. Otherwise, x and y must be vectors or factors of the same length; cases with missing values are removed, the objects are coerced to factors, and the contingency table is computed from these. Then Pearson's chi-squared test is performed of the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals."

Moreover, expected counts are one component of the returned result (see the "Value" section). Proportions can of course then easily be obtained if so desired.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Sat, Oct 16, 2021 at 8:52 AM Miloš Žarković wrote:
> Hi,
>
> Is there a function where I can specify expected proportions for the
> two-way table to calculate the Chi-square test? chisq.test allows
> specifying only the one-way table.
> Otherwise, I will have to write the function, but I never trust myself
> not to make a mess programming.
>
> Thanks,
> Miloš

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Chi-square test: Specifying expected proportions for two way table
Hi,

Is there a function where I can specify expected proportions for the two-way table to calculate the Chi-square test? chisq.test allows specifying only the one-way table. Otherwise, I will have to write the function, but I never trust myself not to make a mess programming.

Thanks,
Miloš

Miloš Žarković
Professor of Internal Medicine
School of Medicine, University of Belgrade
Clinic of Endocrinology, Clinical Centre of Serbia
11000 Belgrade
PAK 112113
Serbia
Phone +381 11 3639 724
email milos.zarko...@med.bg.ac.rs
milos.zarko...@gmail.com
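One possible way to get what the question asks for, without writing a new test function, is to flatten the two-way table into a vector and pass the flattened expected proportions through the p= argument of chisq.test(). The sketch below is only an illustration: the counts and proportions are made up, and the names observed and expected_p are hypothetical. It assumes the proportions are fully specified in advance (nothing estimated from the data), in which case the degrees of freedom are the number of cells minus one rather than (r-1)(c-1).

```r
# Hypothetical 2 x 3 table of observed counts (illustrative numbers only)
observed <- matrix(c(20, 30, 25,
                     15, 35, 25),
                   nrow = 2, byrow = TRUE)

# Hypothetical expected cell proportions, same layout; they must sum to 1
expected_p <- matrix(c(0.15, 0.20, 0.15,
                       0.15, 0.20, 0.15),
                     nrow = 2, byrow = TRUE)

# Flatten both matrices in the same (column-major) order and run the
# goodness-of-fit form of the test against the given proportions
fit <- chisq.test(as.vector(observed), p = as.vector(expected_p))

# Cross-check: the Pearson statistic computed by hand
expected_n <- sum(observed) * expected_p
x2_manual  <- sum((observed - expected_n)^2 / expected_n)

print(fit)
print(x2_manual)
```

The manual statistic is there only as a sanity check that flattening did not misalign cells and proportions.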
Re: [R] Chi-square test
Dear David and John,

Thank you for your replies. Indeed I'm using the ape and nlme packages. Here it is:

> fit<-gls(fcl~mass+activity+agility,correlation=corBrownian(phy=tree),data=df,method="ML",weights=varFixed(~vf))
> Anova(fit)
Analysis of Deviance Table (Type II tests)

Response: fcl
         Df  Chisq Pr(>Chisq)
mass      1 0.1756     0.6752
activity  2 0.5549     0.7577
agility   4 3.2903     0.5105

Anyway, I have the help I was looking for. Thank you very much.

Best regards,
Sérgio.

----- Original message -----
> From: "Fox, John" <j...@mcmaster.ca>
> To: "Sergio Ferreira Cardoso" <sergio.ferreira-card...@umontpellier.fr>
> Cc: "R-help list" <r-help@r-project.org>
> Sent: Saturday, 21 January 2017 6:09:22
> Subject: Re: [R] Chi-square test

> Dear Sergio,
>
> You appear to have asked this question twice on r-help.
>
> Anova() has no specific method for "gls" models (I assume, though you don't
> say so, that the model is fit by gls() in the nlme package), but the default
> method works and provides Wald chi-square tests for terms in the model. I
> don't understand the model formula x ~ 1 + 2 + 3 + x, however, and so I have
> no idea what gls() would do with this model, other than report an error.
> Perhaps you can show us the output or, better yet, provide a reproducible
> example.
>
> As a general matter, for 1-df terms in an additive model, the 1-df chi-square
> values reported by Anova() will simply be the squares of the corresponding
> Wald statistics (labelled "t" I believe) reported in the summary of the model.
> Although the p-value is from the upper tail of the chi-square distribution,
> the test is inherently two-sided.
>
> Best,
> John
>
> -
> John Fox, Professor
> McMaster University
> Hamilton, Ontario, Canada
> Web: http::/socserv.mcmaster.ca/jfox
>
>> On Jan 20, 2017, at 8:36 AM, Sergio Ferreira Cardoso
>> <sergio.ferreira-card...@umontpellier.fr> wrote:
>>
>> Dear all,
>>
>> Anova() from the car package returns Chi-square statistics when I'm testing
>> the significance of a multivariate gls model
>> gls(x~1+2+3+x,corBrownian(phy=tree), ...).
>> Is this Chi-square a two-sided test?
>>
>> Thank you.
>>
>> Best,
>> Sérgio.
Re: [R] Chi-square test
> On Jan 20, 2017, at 7:36 AM, Sergio Ferreira Cardoso wrote:
>
> Dear all,
>
> Anova() from the car package returns Chi-square statistics when I'm testing
> the significance of a multivariate gls model
> gls(x~1+2+3+x,corBrownian(phy=tree), ...).
> Is this Chi-square a two-sided test?

If you explain what you mean by a "2-sided test" we might be able to help. It's unlikely that the author set up the test so that it would fail when the fit was so good that the chi-square statistic was very small, but it's also likely that departures from the implicit hypothesis of all the coefficients being identically 0 would have raised the chi-square statistic away from zero.

David.

> Thank you.
>
> Best,
> Sérgio.

David Winsemius, MD
Alameda, CA, USA
Re: [R] Chi-square test
Dear Sergio,

You appear to have asked this question twice on r-help.

Anova() has no specific method for "gls" models (I assume, though you don't say so, that the model is fit by gls() in the nlme package), but the default method works and provides Wald chi-square tests for terms in the model. I don't understand the model formula x ~ 1 + 2 + 3 + x, however, and so I have no idea what gls() would do with this model, other than report an error. Perhaps you can show us the output or, better yet, provide a reproducible example.

As a general matter, for 1-df terms in an additive model, the 1-df chi-square values reported by Anova() will simply be the squares of the corresponding Wald statistics (labelled "t" I believe) reported in the summary of the model. Although the p-value is from the upper tail of the chi-square distribution, the test is inherently two-sided.

Best,
John

-
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
Web: http::/socserv.mcmaster.ca/jfox

> On Jan 20, 2017, at 8:36 AM, Sergio Ferreira Cardoso wrote:
>
> Dear all,
>
> Anova() from the car package returns Chi-square statistics when I'm testing
> the significance of a multivariate gls model
> gls(x~1+2+3+x,corBrownian(phy=tree), ...).
> Is this Chi-square a two-sided test?
>
> Thank you.
>
> Best,
> Sérgio.
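The point about the upper tail of the chi-square distribution being a two-sided test can be checked numerically in base R. This is a minimal sketch rather than anything taken from car or nlme: the Wald statistic z below is a made-up value, and the comparison uses the normal reference distribution, whose square is chi-square with 1 df.

```r
# Hypothetical Wald statistic for a single coefficient (illustrative value)
z <- 1.7

# Two-sided p-value from the standard normal distribution
p_two_sided <- 2 * pnorm(-abs(z))

# Upper-tail p-value of the squared statistic on 1 df
p_chisq <- pchisq(z^2, df = 1, lower.tail = FALSE)

# The sign of z is lost when squaring, so positive and negative departures
# from zero both land in the upper tail of the chi-square distribution
p_chisq_neg <- pchisq((-z)^2, df = 1, lower.tail = FALSE)

print(c(p_two_sided, p_chisq, p_chisq_neg))

# The 1-df row reported in the thread (Chisq = 0.1756) gives about 0.6752
p_mass <- pchisq(0.1756, df = 1, lower.tail = FALSE)
print(round(p_mass, 4))
```

All three p-values agree, which is the sense in which the 1-df chi-square test is "inherently two-sided".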
[R] Chi-square test
Dear all,

Anova() from the car package returns Chi-square statistics when I'm testing the significance of a multivariate gls model gls(x~1+2+3+x,corBrownian(phy=tree), ...). Is this Chi-square a two-sided test?

Thank you.

Best,
Sérgio.
[R] Chi-square test
Hello,

If the vector of observed frequencies is:

f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)

and the vector of probabilities is:

p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)

The sum of the probabilities is equal to one. But when I try to do the Chi-square test, I get this error: probabilities must sum to one.

Does anybody know the reason?

Best Regards,
pari
Re: [R] Chi-square test
On 20-02-2015, at 19:05, pari hesabi statistic...@hotmail.com wrote:

> Hello,
> If the vector of observed frequencies is:
> f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
> and the vector of probabilities is:
> p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)
> The sum of the probabilities is equal to one. But when I try to do the
> Chi-square test, I get this error: probabilities must sum to one.

Print sum(p11)-1.

> Does anybody know the reason?

R FAQ 7.31 (http://cran.r-project.org/doc/FAQ/R-FAQ.html)

Berend

> Best Regards,
> pari
Re: [R] Chi-square test
On Feb 20, 2015, at 10:05 AM, pari hesabi wrote:

> Hello,
> If the vector of observed frequencies is:
> f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
> and the vector of probabilities is:
> p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)
> The sum of the probabilities is equal to one.

Well, the sum is close to 1.0 but not exact. There's a simple fix:

sum(p11)==1
[1] FALSE
sum( p11/sum(p11) )==1
[1] TRUE

> But when I try to do the Chi-square test, I get this error:
> probabilities must sum to one.
> Does anybody know the reason?

Numerical accuracy. See R-FAQ 7.31

--
David Winsemius
Alameda, CA, USA
Re: [R] Chi-square test
And probably why chisq.test has the rescale.p= argument. Your second problem with small expected values can be handled with simulate.p.value=.

> chisq.test(f, p=p11)
Error in chisq.test(f, p = p11) : probabilities must sum to 1.
> 1-sum(p11)
[1] 4.3036e-08
> chisq.test(f, p=p11, rescale.p=TRUE)

        Chi-squared test for given probabilities

data:  f
X-squared = 7.6268, df = 14, p-value = 0.9078

Warning message:
In chisq.test(f, p = p11, rescale.p = TRUE) :
  Chi-squared approximation may be incorrect
> chisq.test(f, p=p11, rescale.p=TRUE, simulate.p.value=TRUE)

        Chi-squared test for given probabilities with simulated p-value (based on 2000 replicates)

data:  f
X-squared = 7.6268, df = NA, p-value = 0.7996

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Berend Hasselman
Sent: Friday, February 20, 2015 12:13 PM
To: pari hesabi
Cc: r-help@r-project.org
Subject: Re: [R] Chi-square test

On 20-02-2015, at 19:05, pari hesabi statistic...@hotmail.com wrote:

> Hello,
> If the vector of observed frequencies is:
> f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
> and the vector of probabilities is:
> p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)
> The sum of the probabilities is equal to one. But when I try to do the
> Chi-square test, I get this error: probabilities must sum to one.

Print sum(p11)-1.

> Does anybody know the reason?

R FAQ 7.31 (http://cran.r-project.org/doc/FAQ/R-FAQ.html)

Berend

> Best Regards,
> pari
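The two fixes suggested in this thread (dividing by the sum, and rescale.p=TRUE) can be put side by side in a short script. The vectors are the ones posted in the question; the names gap, p_fixed, fit_renorm and fit_rescale are hypothetical, and suppressWarnings() is used only to keep the small-expected-count warning out of the way. The posted probabilities are rounded to seven significant digits, so their sum falls short of 1 by roughly 4.3e-08, which evidently exceeds whatever tolerance chisq.test() applies.

```r
f <- c(0, 0, 0, 2, 3, 6, 17, 15, 21, 21, 14, 10, 5, 1, 5)
p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03,
         2.886977e-02, 6.546229e-02, 1.124083e-01, 1.525880e-01,
         1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02,
         5.009534e-02, 2.645857e-02, 0.0205403)

# How far the posted probabilities are from summing to one
gap <- 1 - sum(p11)
print(gap)

# The unadjusted call fails, as reported in the thread
unadjusted <- try(chisq.test(f, p = p11), silent = TRUE)
print(inherits(unadjusted, "try-error"))

# Fix 1: renormalise by hand before calling chisq.test()
p_fixed    <- p11 / sum(p11)
fit_renorm <- suppressWarnings(chisq.test(f, p = p_fixed))

# Fix 2: let chisq.test() rescale the probabilities itself
fit_rescale <- suppressWarnings(chisq.test(f, p = p11, rescale.p = TRUE))

print(fit_renorm$statistic)
print(fit_rescale$statistic)
```

Both routes give the same statistic; the warning about the chi-squared approximation still applies because several expected counts are very small.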
[R] chi-square test
Dear useRs of R,

I have two datasets (TT and SS) and I wanted to see whether my data are uniformly distributed or not. I tested this with a chi-square test; the results are given at the end. The p-value appears to be significant, but I can't interpret the results, or why it says "In chisq.test(TT) : Chi-squared approximation may be incorrect".

###
dput(TT)
structure(list(clc5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.26, 0.14, 0, 0.44, 0.26, 0, 0, 0, 0, 0, 0, 0.11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.17, 0.16, 0.56, 0, 1.49, 0, 0.64, 0.79, 0.66, 0, 0, 0.17, 0, 0, 0, 0, 0.56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.43, 0.41, 0, 0.5, 0.44, 0, 0, 0, 0, 0.09, 0.46, 0, 0.27, 0.45, 0.15, 0.31, 0.16, 0.44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.07, 0, 0, 0, 0, 0, 0.06, 0, 0.09, 0.07, 0, 0, 7.89, 0, 0.22, 0.29, 0.33, 0.27, 0, 0.36, 0.41, 0, 0, 0, 0, 0.55, 0.81, 0, 0.09, 0.13, 0.28, 0, 0, 0), quota_massima = c(1167L, 1167L, 4572L, 3179L, 3141L, 585L, 585L, 876L, 876L, 1678L, 2667L, 1369L, 1369L, 1369L, 1381L, 1381L, 1381L, 1381L, 2284L, 410L, 2109L, 2507L, 2579L, 2507L, 1436L, 3234L, 3234L, 3234L, 3234L, 2792L, 2569L, 2569L, 2569L, 1669L, 4743L, 4743L, 4743L, 3403L, 3197L, 3267L, 3583L, 3583L, 3583L, 2584L, 2584L, 2579L, 1241L, 1241L, 4174L, 3006L, 3197L, 2366L, 2618L, 2670L, 4487L, 3196L, 3196L, 2107L, 2107L, 2427L, 1814L, 2622L, 1268L, 1268L, 1268L, 3885L, 3885L, 3092L, 3234L, 2625L, 2625L, 3760L, 4743L, 3707L, 3760L, 4743L, 3760L, 3885L, 3760L, 4743L, 2951L, 782L, 2957L, 3343L, 2697L, 2697L, 3915L, 2277L, 1678L, 1678L, 3197L, 2957L, 2957L, 2957L, 4530L, 4530L, 4530L, 2131L, 3618L, 3618L, 3335L, 2512L, 2390L, 1616L, 3526L, 3197L, 3197L, 2625L, 2622L, 3197L, 3197L, 2622L, 2622L, 2622L, 368L, 4572L, 3953L, 863L, 3716L, 3716L, 3716L, 2697L, 2697L, 1358L)), .Names = c("clc5", "quota_massima"), class = "data.frame", row.names = c(NA, -124L))

chisq.test(TT)

        Pearson's Chi-squared test

data:  TT
X-squared = 411.5517, df = 123, p-value < 2.2e-16

Warning message:
In chisq.test(TT) : Chi-squared approximation may be incorrect

###
dput(SS)
structure(list(NDVIanno = c(0.57, 0.536, 0.082, 0.262, 0.209, 0.539, 0.536, 0.543, 0.588, 0.599, 0.397, 0.63, 0.616, 0.644, 0.579, 0.597, 0.617, 0.622, 0.548, 0.528, 0.541, 0.436, 0.509, 0.467, 0.534, 0.412, 0.324, 0.299, 0.41, 0.462, 0.427, 0.456, 0.508, 0.581, 0.242, 0.291, 0.324, 0.28, 0.291, 0.305, 0.365, 0.338, 0.399, 0.516, 0.357, 0.558, 0.605, 0.638, 0.191, 0.377, 0.325, 0.574, 0.458, 0.426, 0.188, 0.412, 0.464, 0.568, 0.582, 0.494, 0.598, 0.451, 0.577, 0.572, 0.602, 0.321, 0.38, 0.413, 0.427, 0.55, 0.437, 0.481, 0.425, 0.234, 0.466, 0.464, 0.491, 0.463, 0.489, 0.435, 0.267, 0.564, 0.256, 0.156, 0.476, 0.498, 0.122, 0.508, 0.582, 0.615, 0.409, 0.356, 0.284, 0.285, 0.444, 0.303, 0.478, 0.557, 0.345, 0.408, 0.347, 0.498, 0.534, 0.576, 0.361, 0.495, 0.502, 0.553, 0.519, 0.504, 0.53, 0.547, 0.559, 0.505, 0.557, 0.377, 0.36, 0.613, 0.452, 0.397, 0.277, 0.42, 0.443, 0.62), delta_z = c(211L, 171L, 925L, 534L, 498L, 50L, 53L, 331L, 135L, 456L, 850L, 288L, 286L, 233L, 342L, 274L, 184L, 198L, 312L, 67L, 476L, 676L, 349L, 873L, 65L, 963L, 553L, 474L, 948L, 1082L, 616L, 704L, 814L, 450L, 865L, 987L, 1265L, 720L, 565L, 652L, 941L, 822L, 1239L, 929L, 477L, 361L, 199L, 203L, 642L, 788L, 818L, 450L, 703L, 760L, 711L, 1015L, 1351L, 195L, 511L, 617L, 296L, 604L, 381L, 389L, 287L, 1043L, 1465L, 963L, 1125L, 582L, 662L, 1424L, 1762L, 575L, 1477L, 1364L, 1236L, 1483L, 1201L, 1644L, 498L, 142L, 510L, 482L, 811L, 788L, 466L, 626L, 461L, 350L, 1177L, 826L, 575L, 568L, 916L, 767L, 1017L, 532L, 1047L, 1370L, 902L, 686L, 703L, 440L, 1016L, 1148L, 1089L, 753L, 650L, 1065L, 568L, 712L, 762L, 636L, 79L, 1092L, 955L, 158L, 1524L, 1145L, 673L, 513L, 596L, 239L)), .Names = c("NDVIanno", "delta_z"), class = "data.frame", row.names = c(NA, -124L))

chisq.test(SS)

        Pearson's Chi-squared test

data:  SS
X-squared = 72.8115, df = 123, p-value = 0.

Warning message:
In chisq.test(SS) : Chi-squared approximation may be incorrect
#

Kindly guide me through like you always did :)

Thanks in advance,
Eliza
Re: [R] chi-square test
On 09/15/2014 10:57 AM, eliza botto wrote:
> Dear useRs of R,
>
> I have two datasets (TT and SS) and I wanted to see whether my data are
> uniformly distributed or not. I tested this with a chi-square test; the
> results are given at the end. The p-value appears to be significant, but
> I can't interpret the results, or why it says "In chisq.test(TT) :
> Chi-squared approximation may be incorrect".
>
> [dput(TT) and dput(SS) output snipped]
>
> chisq.test(TT)
> Pearson's Chi-squared test
> data: TT
> X-squared = 411.5517, df = 123, p-value < 2.2e-16
> Warning message: In chisq.test(TT) : Chi-squared approximation may be incorrect
>
> chisq.test(SS)
> Pearson's Chi-squared test
> data: SS
> X-squared = 72.8115, df = 123, p-value = 0.
> Warning message: In chisq.test(SS) : Chi-squared approximation may be incorrect
>
> Kindly guide me through like you always did :)
>
> Thanks in advance,
> Eliza

You are using a Chi-squared test on a 124x2 matrix of values (not all integers) and many are zeros. The expected frequencies for many cells are very small (near zero, less than 1), hence the warning message. More importantly, does this application of the
Re: [R] chi-square test
Rick's question is a good one. It is unlikely that the results will be informative, but from a technical standpoint, you can estimate the p value using the simulate.p.value=TRUE argument to chisq.test().

> chisq.test(TT, simulate.p.value=TRUE)

        Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)

data:  TT
X-squared = 7919.632, df = NA, p-value = 0.0004998

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rick Bilonick
Sent: Monday, September 15, 2014 10:18 AM
To: r-help@r-project.org
Subject: Re: [R] chi-square test

On 09/15/2014 10:57 AM, eliza botto wrote:
> Dear useRs of R,
>
> I have two datasets (TT and SS) and I wanted to see whether my data are
> uniformly distributed or not. I tested this with a chi-square test; the
> results are given at the end. The p-value appears to be significant, but
> I can't interpret the results, or why it says "In chisq.test(TT) :
> Chi-squared approximation may be incorrect".
>
> [dput(TT) and dput(SS) output snipped]
>
> chisq.test(TT)
> Pearson's Chi-squared test
> data: TT
> X-squared = 411.5517, df = 123, p-value < 2.2e-16
> Warning message: In chisq.test(TT) : Chi-squared approximation may be incorrect
>
> chisq.test(SS)
> Pearson's Chi-squared test
> data: SS
> X-squared = 72.8115, df = 123, p-value = 0.
> Warning message
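For readers wondering what a chi-squared check of uniformity might look like when the input is a column of measurements rather than a table of counts, here is one possible sketch. It is not what anyone in the thread ran: the data are synthetic (drawn with runif), the number of classes is an arbitrary choice, and the names x, breaks, counts and fit are hypothetical. The idea is to bin one variable into equal-width classes and test the class counts against equal probabilities, which is the default for a one-way chisq.test().

```r
set.seed(42)                       # reproducible synthetic data
x <- runif(200, min = 0, max = 1)  # stand-in for one measured variable

# Bin into 5 equal-width classes over the assumed range [0, 1]
breaks <- seq(0, 1, length.out = 6)
counts <- table(cut(x, breaks = breaks, include.lowest = TRUE))

# One-way chi-squared test; with no p= argument every class is
# assigned the same probability, i.e. the uniform hypothesis
fit <- chisq.test(counts)

print(counts)
print(fit$expected)   # 200 / 5 = 40 per class under uniformity
print(fit)
```

With 40 expected observations per class the approximation warning does not arise. A binning-free alternative in base R would be ks.test(x, "punif", 0, 1), again assuming the range is known in advance.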
[R] chi square test
I'm doing the chi square test in R, see the code below:

row1 <- c(27,17,13,21,80,24,35,41,18,51) #Category A (1-10) counts
row2 <- c(27,11,26,13,30,28,17,30,10,21) #Category B (1-10) counts
data.table <- rbind(row1,row2)
data.table

then:

chisq.test(data.table)

This gives me the chi figure, degrees of freedom and p value. But how do I get the results of individual cells? And has anyone got any good ideas on a decent post hoc test to the chi square test?

Thanks all in advance,
Dave Clark
Re: [R] chi square test
On Mon, Jun 17, 2013 at 9:14 AM, Dave Clark d...@mailbox.co.uk wrote:
> I'm doing the chi square test in R, see the code below:
>
> row1 <- c(27,17,13,21,80,24,35,41,18,51) #Category A (1-10) counts
> row2 <- c(27,11,26,13,30,28,17,30,10,21) #Category B (1-10) counts
> data.table <- rbind(row1,row2)
> data.table
>
> then:
>
> chisq.test(data.table)
>
> This gives me the chi figure, degrees of freedom and p value. But how do I
> get the results of individual cells?

What do you mean 'results of individual cells'? As documented in ?chisq.test, you might be looking for one or more of

data.table$observed
data.table$expected
data.table$residuals
data.table$stdres

Pick your poison. ;-)

MW

> And has anyone got any good ideas on a decent post hoc test to the chi
> square test?
>
> Thanks all in advance,
> Dave Clark
Re: [R] chi square test
On Jun 17, 2013, at 10:36 , R. Michael Weylandt wrote:

> On Mon, Jun 17, 2013 at 9:14 AM, Dave Clark d...@mailbox.co.uk wrote:
>> I'm doing the chi square test in R, see the code below:
>>
>> row1 <- c(27,17,13,21,80,24,35,41,18,51) #Category A (1-10) counts
>> row2 <- c(27,11,26,13,30,28,17,30,10,21) #Category B (1-10) counts
>> data.table <- rbind(row1,row2)
>> data.table
>>
>> then:
>>
>> chisq.test(data.table)
>>
>> This gives me the chi figure, degrees of freedom and p value. But how do I
>> get the results of individual cells?
>
> What do you mean 'results of individual cells'? As documented in
> ?chisq.test, you might be looking for one or more of
>
> data.table$observed
> data.table$expected
> data.table$residuals
> data.table$stdres
>
> Pick your poison. ;-)
>
> MW

Replace with chisq.test(data.table)$observed, etc.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] chi square test
On Mon, Jun 17, 2013 at 10:07 AM, peter dalgaard pda...@gmail.com wrote:
> On Jun 17, 2013, at 10:36 , R. Michael Weylandt wrote:
>> What do you mean by 'results of individual cells'? As documented in ?chisq.test, you might be looking for one or more of
>>
>> data.table$observed
>> data.table$expected
>> data.table$residuals
>> data.table$stdres
>>
>> Pick your poison. ;-)
>>
>> MW
>
> Replace with chisq.test(data.table)$observed, etc.

D'Oh! Thanks for that, Peter.

MW
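For readers following the thread, here is a minimal sketch of Peter's correction using the counts from Dave's post. The per-cell components belong to the object returned by chisq.test(), not to the input matrix. The names tab and res are illustrative and do not appear in the thread, and looking at the standardized residuals is only an informal way to see which cells drive the result, not a formal post hoc procedure.

```r
# Counts copied from Dave's message
row1 <- c(27, 17, 13, 21, 80, 24, 35, 41, 18, 51)  # Category A (1-10) counts
row2 <- c(27, 11, 26, 13, 30, 28, 17, 30, 10, 21)  # Category B (1-10) counts
tab <- rbind(row1, row2)   # 'tab' is an illustrative name

# Keep the test result; its components hold the per-cell values
res <- chisq.test(tab)

res$observed    # the input counts
res$expected    # expected counts under independence
res$residuals   # Pearson residuals: (observed - expected) / sqrt(expected)
res$stdres      # standardized residuals
```

Storing the result once avoids rerunning the test for every component, which is what chisq.test(data.table)$observed, chisq.test(data.table)$expected, and so on would do.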
Re: [R] Chi-Square test and survey results
gheine wrote on 10/11/2011 02:31:46 PM:
> An organization has asked me to comment on the validity of their recent all-employee survey. Survey responses, by geographic region, compared with the total number of employees in each region, were as follows:
>
> ByRegion
>           All.Employees Survey.Respondents
> Region_1            735                142
> Region_2            500                 83
> Region_3            897                 78
> Region_4            717                133
> Region_5            167                 48
> Region_6            309                  0
> Region_7            806                125
> Region_8            627                122
> Region_9            858                177
> Region_10           851                160
> Region_11           336                 52
> Region_12          1823                312
> Region_13            80                  9
> Region_14           774                121
> Region_15           561                 24
> Region_16           834                134
>
> How well does the survey represent the employee population? Chi-square test says, not very well:
>
> chisq.test(ByRegion)
>
>         Pearson's Chi-squared test
>
> data:  ByRegion
> X-squared = 163.6869, df = 15, p-value < 2.2e-16
>
> By striking three under-represented regions (3, 6, and 15), we get a more reasonable, although still not convincing, result:
>
> chisq.test(ByRegion[setdiff(1:16,c(3,6,15)),])
>
>         Pearson's Chi-squared test
>
> data:  ByRegion[setdiff(1:16, c(3, 6, 15)), ]
> X-squared = 22.5643, df = 12, p-value = 0.03166

You can't simply eliminate the three regions with the fewest respondents (3, 6, and 15). These are the three largest contributors to the chi-squared statistic, precisely because fewer people in those regions were surveyed than expected. In addition, more people in regions 1, 5, and 9 were surveyed than expected. This should be clear in a bar chart. And the resulting chi-squared test confirms this.

Jean

> This poses several questions:
>
> 1) Looking at a side-by-side barchart (proportion of responses vs. proportion of employees, per region), the pattern of survey responses appears, visually, to match fairly well the pattern of employees. Is this a case where we trust the numbers and not the picture?
>
> 2) Part of the problem, ironically, is that there were too many responses to the survey. If we had only one-tenth the responses, but in the same proportions by region, the chi-square statistic would look much better (though with a warning about possible inaccuracy):
>
> data:  data.frame(ByRegion$All.Employees, 0.1 * (ByRegion$Survey.Respondents))
> X-squared = 17.5912, df = 15, p-value = 0.2848
>
> Is there a way of reconciling a large response rate with an unrepresentative response profile? Or is the bad news that the survey will give very precise results about a very ill-specified sub-population? (Of course, I would put it in softer terms, like "you need to assess the degree of homogeneity across different regions".)
>
> 3) Is chi-squared really the right measure of how representative the survey is?
>
> Thanks for any help you can give - hope these questions make sense -
>
> George H.
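Jean's point about which regions contribute most can be checked with a short sketch. Note the assumptions: this is a goodness-of-fit test of the respondent counts against the employee proportions (the p argument of chisq.test), which is not the two-column test George ran, and the counts are read from the table as reconstructed above. The names all_emp, resp, gof, contrib and top3 are illustrative.

```r
# Counts read from George's table (Regions 1 to 16)
all_emp <- c(735, 500, 897, 717, 167, 309, 806, 627,
             858, 851, 336, 1823, 80, 774, 561, 834)
resp <- c(142, 83, 78, 133, 48, 0, 125, 122,
          177, 160, 52, 312, 9, 121, 24, 134)
names(resp) <- paste0("Region_", 1:16)

# Goodness-of-fit: are respondents spread over regions like employees are?
gof <- chisq.test(resp, p = all_emp / sum(all_emp))

# Per-region contribution to the statistic: (observed - expected)^2 / expected
contrib <- (gof$observed - gof$expected)^2 / gof$expected
sort(round(contrib, 1), decreasing = TRUE)

# The three largest contributors
top3 <- names(sort(contrib, decreasing = TRUE))[1:3]
top3
```

Under these assumptions the three largest contributions come from Regions 3, 6 and 15, in line with Jean's remark, so dropping them removes exactly the evidence of non-representativeness.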
Re: [R] Chi-Square test and survey results
George,

Perhaps the site of the RISQ project (Representativity indicators for Survey Quality) might be of use: http://www.risq-project.eu/ . They also provide R code to calculate their indicators.

HTH,
Jan
Re: [R] Chi-Square test and survey results
The chisq.test function expects a contingency table: one column should hold the count of respondents and the other the count of non-respondents. Your second column looks like the total number of employees rather than the non-respondents, so your data is wrong to begin with.

A significant chi-square here just means that the proportion responding differs in some of the regions; it does not tell you whether the sample is representative or not. What is more important (and not in the data or the standard tests) is whether there is a relationship between why someone chose to respond and the outcomes of interest.

If you are concerned about different proportions responding, you could use post-stratification to correct for the inequality when computing other summaries or tests. Region 6 will still give you problems: you will need to make some assumptions, possibly combining it with another region that is similar.

Throwing away data is rarely, if ever, beneficial.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
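Greg's two suggestions can be sketched as follows, again using the counts read from George's table. This is only an illustration under stated assumptions: the names all_emp, resp, tab, res and w are made up here, and the weights shown are the simplest form of post-stratification weight (population share divided by sample share), not necessarily the scheme Greg had in mind.

```r
# Counts read from George's table (Regions 1 to 16)
all_emp <- c(735, 500, 897, 717, 167, 309, 806, 627,
             858, 851, 336, 1823, 80, 774, 561, 834)
resp <- c(142, 83, 78, 133, 48, 0, 125, 122,
          177, 160, 52, 312, 9, 121, 24, 134)

# Proper contingency table: respondents vs. non-respondents per region
tab <- cbind(Respondents = resp, NonRespondents = all_emp - resp)
rownames(tab) <- paste0("Region_", 1:16)

# Tests whether the response rate is the same in every region
res <- chisq.test(tab)
res

# Simple post-stratification weights: population share / sample share
w <- (all_emp / sum(all_emp)) / (resp / sum(resp))
names(w) <- rownames(tab)
round(w, 2)   # Region_6 is Inf because it has no respondents
```

The infinite weight for Region 6 is the problem Greg mentions: with no respondents there is nothing to reweight, so that region has to be handled by an explicit assumption such as merging it with a similar region.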
[R] Chi-Square test and survey results
An organization has asked me to comment on the validity of their recent all-employee survey. Survey responses, by geographic region, compared with the total number of employees in each region, were as follows:

ByRegion
          All.Employees Survey.Respondents
Region_1            735                142
Region_2            500                 83
Region_3            897                 78
Region_4            717                133
Region_5            167                 48
Region_6            309                  0
Region_7            806                125
Region_8            627                122
Region_9            858                177
Region_10           851                160
Region_11           336                 52
Region_12          1823                312
Region_13            80                  9
Region_14           774                121
Region_15           561                 24
Region_16           834                134

How well does the survey represent the employee population? Chi-square test says, not very well:

chisq.test(ByRegion)

        Pearson's Chi-squared test

data:  ByRegion
X-squared = 163.6869, df = 15, p-value < 2.2e-16

By striking three under-represented regions (3, 6, and 15), we get a more reasonable, although still not convincing, result:

chisq.test(ByRegion[setdiff(1:16,c(3,6,15)),])

        Pearson's Chi-squared test

data:  ByRegion[setdiff(1:16, c(3, 6, 15)), ]
X-squared = 22.5643, df = 12, p-value = 0.03166

This poses several questions:

1) Looking at a side-by-side barchart (proportion of responses vs. proportion of employees, per region), the pattern of survey responses appears, visually, to match fairly well the pattern of employees. Is this a case where we trust the numbers and not the picture?

2) Part of the problem, ironically, is that there were too many responses to the survey. If we had only one-tenth the responses, but in the same proportions by region, the chi-square statistic would look much better (though with a warning about possible inaccuracy):

data:  data.frame(ByRegion$All.Employees, 0.1 * (ByRegion$Survey.Respondents))
X-squared = 17.5912, df = 15, p-value = 0.2848

Is there a way of reconciling a large response rate with an unrepresentative response profile? Or is the bad news that the survey will give very precise results about a very ill-specified sub-population? (Of course, I would put it in softer terms, like "you need to assess the degree of homogeneity across different regions".)

3) Is chi-squared really the right measure of how representative the survey is?

Thanks for any help you can give - hope these questions make sense -

George H.
Re: [R] Chi square test on data frame
Hi

r-help-boun...@r-project.org wrote on 17.08.2011 21:07:43:
> Dear Michael,
>
> Thanks a lot for your reply and for your help. I was struggling so much, but your suggestion showed me a path to the solution of my problem. I have tried your code on my data frame stepwise and it looks fine to me. But when I tried the chi-square test:
>
> res=chisq.test(y1[id],p=y2[id],rescale.p=T)
>
>         Chi-squared test for given probabilities
>
> data:  y1[id]
> X-squared = NaN, df = 19997, p-value = NA
>
> Warning message:
> In chisq.test(y1[id], p = y2[id], rescale.p = T) :
>   Chi-squared approximation may be incorrect

Check what Y1[id] is. Split Yn into lists

l1 <- split(Y1[id], rep(1:6, each=2))
l2 <- split(Y2[id], rep(1:6, each=2))

and do mapply on those lists. But the result is rather silly, as Michael pointed out.

mapply(chisq.test, l1, l2, SIMPLIFY=F)

or, to get only the p-values,

lapply(mapply(chisq.test, l1, l2, SIMPLIFY=F), "[", 3)

Regards
Petr

> It is not giving a p-value. Then I checked the observed and expected values: it is taking all the numbers into consideration. But as I mentioned earlier, I want a p-value for each row, and therefore the degrees of freedom will be 1.
>
> Example: I have a data frame with 8 columns:
>
>   V1 V2 V3 V4 W1 W2 W3 W4
> 1  0 84 22 10  0 84  0  0
> 2 35 84  0  0 22 84  0  0
> 3  0  0  0 48  0  0  0 48
> 4  0 48  0  0  0 48  0  0
> 5  0 84  0  0  0 84  0  0
> 6  0  0  0 48  0  0  0 48
>
> For the first row, the two largest values are 84 (in V2) and 22 (in V3), so these are considered the observed values. Since the largest values are in V2 and V3, we have to pick the expected values from W2 and W3, which are 84 and 0. I know that for a chi-square test the values should not be 0, but we will ignore the warning.
>
> It should then generate a p-value for the next row, taking 35 and 84 (V1 and V2) as observed and 22 and 84 (W1 and W2) as expected. So here it will do a chi-square test for all 6 rows and will generate 6 p-values. My data frame has a lot of rows (approx. ).
>
> Can you please help me with this.
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
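The df = 19997 in the output above follows from how chisq.test treats a single vector: all k values are taken as the categories of one goodness-of-fit test, giving k - 1 degrees of freedom, so 19998 values were pooled into one test. A small sketch with made-up numbers (only the first pair, 84/35 against 84/22, comes from row 2 of the example data; the names obs, expd, row_id, pooled and per_row are illustrative) shows the difference between the pooled call and a per-row call:

```r
# Three hypothetical rows, two observed and two expected values each
obs  <- c(84, 35, 48, 52, 30, 70)
expd <- c(84, 22, 40, 60, 50, 50)
row_id <- rep(1:3, each = 2)   # which row each value belongs to

# Pooled: one test over all six values, df = 6 - 1 = 5
pooled <- chisq.test(obs, p = expd, rescale.p = TRUE)
pooled$parameter

# Per row: one test per pair of values, df = 1 each
per_row <- mapply(
  function(o, e) chisq.test(o, p = e, rescale.p = TRUE)$p.value,
  split(obs, row_id),
  split(expd, row_id)
)
per_row
```

Unlike Petr's mapply(chisq.test, l1, l2), which passes the second list as the y argument, this variant passes the expected values through p. Zeros among the expected values are one plausible source of the NaN statistic in the pooled run, since a cell with observed 0 and expected 0 contributes 0/0.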
Re: [R] Chi square test on data frame
If your data is d1:

temp <- apply(d1[,1:4], 1, order, decreasing=TRUE)[1:2,]
temp <- rbind(temp, temp+4)
result <- sapply(1:nrow(d1), function(i) chisq.test(matrix(as.matrix(d1[i,temp[,i]]), ncol=2)))

Uwe Ligges
Re: [R] Chi square test on data frame
My reservations about the methodology aside, it's probably not a bad idea to include an error-checking line for the case when the probability of the second event is 0 (and so, unsurprisingly, the chi-sq test rejects the null hypothesis), to handle things like the first row. Try this:

Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0, 0, 0, 0,
    10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L), .Dimnames = list(c("1",
    "2", "3", "4", "5", "6"), c("V1", "V2", "V3", "V4", "W1", "W2", "W3", "W4")))

Fnc <- function(Y) {
    Y1 = Y[1:4]
    Y2 = Y[-(1:4)]
    id = order(Y1,decreasing=T)[1:2]
    if (any(Y2[id]==0)) {return(NA)}
    p = chisq.test(Y1[id],p = Y2[id]/sum(Y2[id]))$p.value
    return(p)
}

Res = apply(Y,1,Fnc)

# You could even pre-calculate the indices to speed it up a little

id = apply(Y[,1:4],1,order,decreasing=T)[1:2,]
Y.Id = cbind(Y,t(id))

Fnc2 <- function(Y) {
    Y1 = Y[1:4]
    Y2 = Y[5:8]
    id = Y[9:10]
    if (any(Y2[id]==0)) {return(NA)}
    p = chisq.test(Y1[id],p = Y2[id]/sum(Y2[id]))$p.value
    return(p)
}

Res2 = apply(Y.Id,1,Fnc2)

identical(Res,Res2)
TRUE

Hope this helps,

Michael
[R] Chi square test on data frame
Is there anyone who can help me with a chi-square test on a data frame? I have been struggling for the last 2 days. I will be very thankful to you.

Dear all,

I have been working on this problem for many hours but did not find any solution. I have a data frame with 8 columns:

  V1 V2 V3 V4 W1 W2 W3 W4
1  0 84 22 10  0 84  0  0
2 35 84  0  0 22 84  0  0
3  0  0  0 48  0  0  0 48
4  0 48  0  0  0 48  0  0
5  0 84  0  0  0 84  0  0
6  0  0  0 48  0  0  0 48

From the first four columns, for each row, I have to take the two largest values; these two values will be considered the observed values. From the last four columns we get the expected values. So I have to perform a chi-square test for each row to get p-values.

For example, for the first row, the two largest values are 84 (in V2) and 22 (in V3), so these are considered the observed values. Since the largest values are in V2 and V3, we have to pick the expected values from W2 and W3, which are 84 and 0. I know that for a chi-square test the values should not be 0, but we will ignore the warning. Now that we have the observed as well as the expected values, we have to perform the chi-square test to get a p-value for each row in a new column.

So far I have been trying to return the indices of the two largest values with

sort.int(df,index.return=TRUE)$ix[c(4,3)]

but it does not accept a data frame.

Can you please give me some idea how to do this? It is very tricky, and after studying a lot I am still not able to do it. Please help.

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
Re: [R] Chi square test on data frame
I think everything below is right, but it's all a little helter-skelter so take it with a grain of salt:

First things first, make your data with dput() for the list.

Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0, 0, 0, 0,
    10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L), .Dimnames = list(c("1",
    "2", "3", "4", "5", "6"), c("V1", "V2", "V3", "V4", "W1", "W2", "W3", "W4")))

Now,

Y1 = Y[,1:4]
Y2 = Y[,-(1:4)]

id = apply(Y1,1,order,decreasing=T)[1:2,]
# This has the columns you want in each row, but it's not directly appropriate for subsetting
# Specifically, the problem is that the row information is implicit in where the col index is in id
# We directly extract and force into a 2-col matrix that gives rows and columns for each data point

id = cbind(as.vector(col(id)),as.vector(id))

Now you can take Y1[id] as the observed values and Y2[id] as the expected.

But, to be honest, it sounds like you have more problems in using a chi-sq test than anything else. Beyond all the zeros, you should note that you always have #obs = #expected because Y1= Y2. I'll leave that up to you though.

Hope this helps, and please make sure you can take my code apart piece by piece to understand it: there's some odd data manipulation that takes advantage of R's way of coercing matrices to vectors, and if your actual data isn't like the provided example, you may have to modify it.

Michael Weylandt
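The indexing step in Michael's code relies on a base R feature: a two-column numeric matrix used as an index picks one element per (row, column) pair. A small sketch on the first three rows of the V columns from the example; the names top2 and idx are illustrative.

```r
# First three rows of columns V1..V4 from the example data
Y1 <- matrix(c(0, 35, 0,    # V1
               84, 84, 0,   # V2
               22, 0, 0,    # V3
               10, 0, 48),  # V4
             nrow = 3, dimnames = list(NULL, paste0("V", 1:4)))

# apply() returns one column per row of Y1; keep the first two entries,
# i.e. the column positions of the two largest values in each row
top2 <- apply(Y1, 1, order, decreasing = TRUE)[1:2, ]

# col(top2) gives the row of Y1 each entry came from; pair it with the
# column position to build (row, column) index pairs
idx <- cbind(row = as.vector(col(top2)), col = as.vector(top2))

# One element per (row, column) pair: the two largest values of each row
Y1[idx]
```

The same idx applied to the matrix of W columns would pull the matching expected values, which is why Michael builds the index once and reuses it for Y1 and Y2.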
Re: [R] Chi square test on data frame
Dear Michael,

Thanks a lot for your reply and for your help. I was struggling so much, but your suggestion showed me a path to the solution of my problem. I have tried your code on my data frame stepwise and it looks fine to me. But when I tried the chi-square test:

  res=chisq.test(y1[id],p=y2[id],rescale.p=T)

          Chi-squared test for given probabilities

  data:  y1[id]
  X-squared = NaN, df = 19997, p-value = NA

  Warning message:
  In chisq.test(y1[id], p = y2[id], rescale.p = T) :
    Chi-squared approximation may be incorrect

it does not give a p-value. I then checked the observed and expected values: it is taking all the numbers into consideration at once. But as I mentioned earlier, I want a p-value for each row, and therefore the degrees of freedom will be 1.

Example: I have a data frame with 8 columns:

     V1 V2 V3 V4 W1 W2 W3 W4
  1   0 84 22 10  0 84  0  0
  2  35 84  0  0 22 84  0  0
  3   0  0  0 48  0  0  0 48
  4   0 48  0  0  0 48  0  0
  5   0 84  0  0  0 84  0  0
  6   0  0  0 48  0  0  0 48

For the first row, the two largest values are 84 (in V2) and 22 (in V3), so these are considered the observed values. Since the largest values are in V2 and V3, we have to pick the expected values from W2 and W3, which are 84 and 0. I know that for a chi-square test the values should not be 0, but we will ignore the warning. It should then generate a p-value for the next row, taking 35 and 84 (V1 and V2) as observed and 22 and 84 (W1 and W2) as expected. So here it will do a chi-square test for all 6 rows and generate 6 p-values. My data frame has a lot of rows (approx. ).

Can you please help me with this?

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

From: R. Michael Weylandt [michael.weyla...@gmail.com]
Sent: Wednesday, August 17, 2011 7:11 PM
To: Bansal, Vikas
Cc: r-help@r-project.org
Subject: Re: [R] Chi square test on data frame

I think everything below is right, but it's all a little helter-skelter, so take it with a grain of salt.

First things first, make your data with dput() for the list:

  Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0, 0, 0, 0,
      10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L), .Dimnames = list(
      c("1", "2", "3", "4", "5", "6"),
      c("V1", "V2", "V3", "V4", "W1", "W2", "W3", "W4")))

Now,

  Y1 = Y[,1:4]
  Y2 = Y[,-(1:4)]
  id = apply(Y1,1,order,decreasing=T)[1:2,]
  # This has the columns you want in each row, but it's not directly appropriate for subsetting.
  # Specifically, the problem is that the row information is implicit in where the col index is in id.
  # We directly extract and force into a 2-column matrix that gives rows and columns for each data point.
  id = cbind(as.vector(col(id)),as.vector(id))

Now you can take Y1[id] as the observed values and Y2[id] as the expected.

But, to be honest, it sounds like you have more problems in using a chi-sq test than anything else. Beyond all the zeros, you should note that you always have #obs = #expected because Y1 = Y2. I'll leave that up to you though.

Hope this helps, and please make sure you can take my code apart piece by piece to understand it: there's some odd data manipulation that takes advantage of R's way of coercing matrices to vectors, and if your actual data isn't like the provided example, you may have to modify it.

Michael Weylandt

On Wed, Aug 17, 2011 at 10:26 AM, Bansal, Vikas vikas.ban...@kcl.ac.uk wrote:

> Is there anyone who can help me with a chi-square test on a data frame? I have been struggling for the last 2 days. I will be very thankful to you.
>
> Dear all,
>
> I have been working on this problem for many hours but did not find any solution. I have a data frame with 8 columns:
>
>      V1 V2 V3 V4 W1 W2 W3 W4
>   1   0 84 22 10  0 84  0  0
>   2  35 84  0  0 22 84  0  0
>   3   0  0  0 48  0  0  0 48
>   4   0 48  0  0  0 48  0  0
>   5   0 84  0  0  0 84  0  0
>   6   0  0  0 48  0  0  0 48
>
> From the first four columns, for each row I have to take the two largest values, and these two values will be considered the observed values. From the last four columns we will get the expected values. So I have to perform a chi-square test for each row to get p-values.
>
> Example for the first row: the two largest values are 84 (in V2) and 22 (in V3), so these are considered the observed values. Since the largest values are in V2 and V3, we have to pick expected values from
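A minimal sketch of the per-row test described in this thread, built on the example data only. It is not Michael's code: the helper name `row_pvalue` is hypothetical, the statistic is computed by hand instead of through chisq.test(), and returning NA when a selected expected value is zero is an assumption of this sketch (the poster preferred to ignore the warning; with a zero expected count the Pearson statistic is undefined or infinite).

```r
# Example data reconstructed from the dput() output in the thread
Y <- matrix(c(0, 35, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              22, 0, 0, 0, 0, 0,   10, 0, 48, 0, 0, 48,
              0, 22, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              0, 0, 0, 0, 0, 0,    0, 0, 48, 0, 0, 48),
            nrow = 6,
            dimnames = list(as.character(1:6),
                            c("V1", "V2", "V3", "V4", "W1", "W2", "W3", "W4")))
obs_all <- Y[, 1:4]   # columns V1..V4: observed counts
exp_all <- Y[, 5:8]   # columns W1..W4: values used as expected weights

# Hypothetical helper: p-value for row i, using the two largest observed values
row_pvalue <- function(i) {
  top2 <- order(obs_all[i, ], decreasing = TRUE)[1:2]  # columns of the two largest
  o <- obs_all[i, top2]                                # observed counts
  w <- exp_all[i, top2]                                # matching expected weights
  if (any(w == 0)) return(NA_real_)  # assumption: no test when an expected value is 0
  e <- sum(o) * w / sum(w)           # rescale weights to the observed total
  stat <- sum((o - e)^2 / e)         # Pearson statistic
  pchisq(stat, df = 1, lower.tail = FALSE)  # two cells, so 1 degree of freedom
}

pvals <- vapply(seq_len(nrow(Y)), row_pvalue, numeric(1))
pvals
```

For rows where both selected expected values are positive, this should agree with chisq.test(o, p = w, rescale.p = TRUE) applied to that row alone, which is the call Michael's indexing was meant to feed.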
[R] Chi square test on data frame
Dear all,

I have been working on this problem for many hours but did not find any solution. I have a data frame with 8 columns:

     V1 V2 V3 V4 W1 W2 W3 W4
  1   0 84 22 10  0 84  0  0
  2  35 84  0  0 22 84  0  0
  3   0  0  0 48  0  0  0 48
  4   0 48  0  0  0 48  0  0
  5   0 84  0  0  0 84  0  0
  6   0  0  0 48  0  0  0 48

From the first four columns, for each row I have to take the two largest values, and these two values will be considered the observed values. From the last four columns we will get the expected values. So I have to perform a chi-square test for each row to get p-values.

Example for the first row: the two largest values are 84 (in V2) and 22 (in V3), so these are considered the observed values. Since the largest values are in V2 and V3, we have to pick the expected values from W2 and W3, which are 84 and 0. I know that for a chi-square test the values should not be 0, but we will ignore the warning. Now that we have the observed as well as the expected values, we have to perform a chi-square test to get the p-value for each row in a new column.

So far I have been trying to return the indices of the two largest values with

  sort.int(df,index.return=TRUE)$ix[c(4,3)]

but it does not accept a data frame.

Can you please give me some idea how to do this? It is very tricky, and after studying a lot I am still not able to do it. Please help.

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
[R] Chi square test of proportions in 2 groups of different sizes
Hello!

Very sorry for a probably very simple question - I looked but did not find an answer in the archives.

I have a table "counts" (below) that shows counts by Option within each of my 2 groups. However, my groups have different sizes (N1=255 and N2=68). Table "prop" shows the resulting proportions within each group. I would like to compare the proportions in the 2 groups using a chi-square test. However, I am not sure how to do it, because chisq.test(counts) does not take into account the fact that the sizes of the groups are different.

Any hint is greatly appreciated. Thank you!
Dimitri

  G1counts <- matrix(c(54,76,125), ncol = 1)
  G2counts <- matrix(c(14,19,35), ncol = 1)
  counts <- cbind(G1counts,G2counts)
  colnames(counts) <- c("Group1","Group2"); rownames(counts) <- c("Option1","Option2","Option3")
  N1=255
  N2=68
  Ns=c(N1,N2)
  prop1 <- G1counts/N1
  prop2 <- G2counts/N2
  prop <- cbind(prop1,prop2)
  colnames(prop) <- c("Group1","Group2"); rownames(prop) <- c("Option1","Option2","Option3")
  (Ns);(counts);(prop);sum(prop)
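No reply to this post appears in this part of the archive, so the following is only a sketch of one way to look at the question, not an answer given on the list. For a matrix argument, chisq.test() derives the expected counts from the row and column margins of the table, so the unequal column totals (255 and 68) already enter the calculation. The object names below are taken from the post.

```r
# Counts by option (rows) and group (columns), as in the post
counts <- matrix(c(54, 76, 125, 14, 19, 35), ncol = 2,
                 dimnames = list(c("Option1", "Option2", "Option3"),
                                 c("Group1", "Group2")))

res <- chisq.test(counts)   # Pearson test on the 3 x 2 table

res$expected                # expected counts built from the margins
colSums(res$expected)       # column totals of the expected counts match the group sizes
prop.table(counts, margin = 2)  # within-group proportions, same as the "prop" table
res
```

The expected counts in each column sum to that group's size, which is how the different group sizes are taken into account.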
[R] Chi-Square Test Disagreement
I was asked by my boss to do an analysis on a large data set, and I am trying to convince him to let me use R rather than SPSS. I think Sweave could make my life much, much easier. To get me a little closer to this goal, I ran my analysis through R and SPSS and compared the resulting values. In all but one case, they were the same. Given the matrix

       [,1] [,2]
  [1,]  110  358
  [2,]   71  312
  [3,]   29  139
  [4,]   31   77
  [5,]   13   32

this is the output from R:

  chisq.test(test29)

          Pearson's Chi-squared test

  data:  test29
  X-squared = 9.593, df = 4, p-value = 0.04787

But the same data in SPSS generates a p-value of .051. It's a small but important difference. I played around and rescaled things, and tried different values for B, but I never could get R to reach .051.

I'd like to know which program is correct - R or SPSS? I know, this is a biased place to ask such a question. I also appreciate all input that will help me use R more effectively. The difference could be the result of my own ignorance.

thanks
--andy

--
Insert something humorous here. :-)
Re: [R] Chi-Square Test Disagreement
On 11/26/2008 9:51 AM, Andrew Choens wrote:

> I was asked by my boss to do an analysis on a large data set, and I am trying to convince him to let me use R rather than SPSS. I think Sweave could make my life much, much easier. To get me a little closer to this goal, I ran my analysis through R and SPSS and compared the resulting values. In all but one case, they were the same. Given the matrix
>
>        [,1] [,2]
>   [1,]  110  358
>   [2,]   71  312
>   [3,]   29  139
>   [4,]   31   77
>   [5,]   13   32
>
> this is the output from R:
>
>   chisq.test(test29)
>
>           Pearson's Chi-squared test
>
>   data:  test29
>   X-squared = 9.593, df = 4, p-value = 0.04787
>
> But the same data in SPSS generates a p-value of .051. It's a small but important difference. I played around and rescaled things, and tried different values for B, but I never could get R to reach .051.
>
> I'd like to know which program is correct - R or SPSS? I know, this is a biased place to ask such a question. I also appreciate all input that will help me use R more effectively. The difference could be the result of my own ignorance.

The SPSS p-value is for the Likelihood Ratio Chi-squared test, not Pearson's. For Pearson's Chi-squared test in SPSS (16.0.2), I get p=0.04787, so the results do match if you do the same Chi-squared test.

> thanks
> --andy

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
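Chuck's point can be checked in R by computing the likelihood-ratio statistic by hand. This is a sketch that assumes the standard textbook definition G = 2 * sum(O * log(O / E)), with the same expected counts and degrees of freedom as the Pearson test; the variable names are illustrative, and SPSS's exact computation is not shown in the thread.

```r
# The table from the original post
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)

pearson <- chisq.test(test29)     # Pearson's chi-squared test
E <- pearson$expected             # expected counts under independence

# Likelihood-ratio (G) statistic; valid here because no observed count is zero
G   <- 2 * sum(test29 * log(test29 / E))
dof <- (nrow(test29) - 1) * (ncol(test29) - 1)
p_lr <- pchisq(G, df = dof, lower.tail = FALSE)

c(pearson = pearson$p.value, likelihood_ratio = p_lr)
```

The two p-values come from different statistics on the same table, so a small gap between them is expected and does not indicate an error in either program.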
Re: [R] Chi-Square Test Disagreement
G'day Andy,

On Wed, 26 Nov 2008 14:51:50 + Andrew Choens [EMAIL PROTECTED] wrote:

> I was asked by my boss to do an analysis on a large data set, and I am trying to convince him to let me use R rather than SPSS.

Very laudable of you. :)

> This is the output from R:
>
>   chisq.test(test29)
>
>           Pearson's Chi-squared test
>
>   data:  test29
>   X-squared = 9.593, df = 4, p-value = 0.04787
>
> But the same data in SPSS generates a p-value of .051. It's a small but important difference.

Chuck has already explained the reason for this small difference. I just take issue with it being an important difference. In my opinion, this difference is not important at all. It would only be important to people who are still sticking to arbitrary cut-off points that are mainly due to historical coincidences and the lack of computing power at those times in history.

If somebody tells you that this difference is important, ask him or her whether he or she will be willing to finance you a room full of calculators (in the sense of Pearson's time) and whether he or she wants you to do all your calculations and analyses with these calculators in future. Alternatively, you could ask the person whether he or she would like the anaesthetist during his or her next operation to use chloroform, given his or her nostalgic penchant for out-dated rituals/methods.

> I played around and rescaled things, and tried different values for B, but I never could get R to reach .051.

Well, I have no problem when using simulated p-values to get something close to 0.051; look at the last try. The second one might also be noteworthy. Unfortunately, I didn't save the seed beforehand.

  test29 <- matrix(c(110,358,71,312,29,139,31,77,13,32), byrow=TRUE, ncol=2)
  test29
       [,1] [,2]
  [1,]  110  358
  [2,]   71  312
  [3,]   29  139
  [4,]   31   77
  [5,]   13   32

  chisq.test(test29, simul=TRUE)

          Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)

  data:  test29
  X-squared = 9.593, df = NA, p-value = 0.04798

  chisq.test(test29, simul=TRUE)

          Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)

  data:  test29
  X-squared = 9.593, df = NA, p-value = 0.05697

  chisq.test(test29, simul=TRUE, B=2)

          Pearson's Chi-squared test with simulated p-value (based on 2 replicates)

  data:  test29
  X-squared = 9.593, df = NA, p-value = 0.0463

  chisq.test(test29, simul=TRUE, B=2)

          Pearson's Chi-squared test with simulated p-value (based on 2 replicates)

  data:  test29
  X-squared = 9.593, df = NA, p-value = 0.0499

  chisq.test(test29, simul=TRUE, B=2)

          Pearson's Chi-squared test with simulated p-value (based on 2 replicates)

  data:  test29
  X-squared = 9.593, df = NA, p-value = 0.0486

  chisq.test(test29, simul=TRUE, B=2)

          Pearson's Chi-squared test with simulated p-value (based on 2 replicates)

  data:  test29
  X-squared = 9.593, df = NA, p-value = 0.05125

Cheers,

        Berwin

=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7         e-mail: [EMAIL PROTECTED]
Singapore 117546                    http://www.stat.nus.edu.sg/~statba
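Since Berwin mentions not having saved the seed, here is a short sketch of how a simulated p-value can be made repeatable. The seed value 2008 is arbitrary and chosen only for illustration; `simulate.p.value` is the full name of the argument abbreviated as `simul` in the session above.

```r
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)

# Fixing the seed before each call makes the Monte Carlo p-value reproducible
set.seed(2008)   # arbitrary seed, for illustration only
first <- chisq.test(test29, simulate.p.value = TRUE, B = 2000)

set.seed(2008)
second <- chisq.test(test29, simulate.p.value = TRUE, B = 2000)

identical(first$p.value, second$p.value)   # same seed, same simulated p-value

# Without resetting the seed, a further call will generally give a slightly
# different value, which is the run-to-run spread visible in the session above
third <- chisq.test(test29, simulate.p.value = TRUE, B = 2000)
c(first = first$p.value, third = third$p.value)
```

The spread between runs shrinks as B grows, so a larger number of replicates is the usual remedy when the simulated value sits close to a threshold someone cares about.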
Re: [R] Chi-Square Test Disagreement
On Thu, 2008-11-27 at 00:46 +0800, Berwin A Turlach wrote:

> Chuck has already explained the reason for this small difference. I just take issue with it being an important difference. In my opinion, this difference is not important at all. It would only be important to people who are still sticking to arbitrary cut-off points that are mainly due to historical coincidences and the lack of computing power at those times in history.
>
> If somebody tells you that this difference is important, ask him or her whether he or she will be willing to finance you a room full of calculators (in the sense of Pearson's time) and whether he or she wants you to do all your calculations and analyses with these calculators in future. Alternatively, you could ask the person whether he or she would like the anaesthetist during his or her next operation to use chloroform, given his or her nostalgic penchant for out-dated rituals/methods.

Yes he did, and when I realized the source of my confusion I was appropriately chastised. I felt like a bit of a fool. Of course, I should try comparing apples to apples. Oranges are another thing entirely.

As to the importance of the difference, I am of two minds. On the one hand, I fully agree with you. It is an anachronistic approach. On the other hand, we don't all have the pleasure of working in a math department where such subtleties are well understood.

I work for a consulting firm that advises state and local governments (USA). I personally do try to expand my understanding of statistics and math (I do not have a degree in math), but my clients do not. When I'm working with someone from the government, it is sometimes easier to simply tell them that relationship x is significant at a certain level of certainty. Although I doubt they could really explain the details, they have some basic understanding of what I am talking about. Subtleties are sometimes lost on our public servants.

And, since I do work for government, if I ask for a roomful of calculators, I might just get them. And really, what am I going to do with a roomful of calculators?

--andy

--
Insert something humorous here. :-)
Re: [R] Chi-Square Test Disagreement
On 26-Nov-08 17:57:52, Andrew Choens wrote:

> [...]
> And, since I do work for government, if I ask for a roomful of calculators, I might just get them. And really, what am I going to do with a roomful of calculators?
>
> --andy
> Insert something humorous here. :-)

Next time the launch of an incoming nuclear strike is detected, set them to work as follows (following Karl Pearson's historical precedent):

  Anti-aircraft guns all day long: Computing for the Ministry of Munitions
  JUNE BARROW GREEN (Open University)

  From January 1917 until March 1918 Pearson and his staff of mathematicians and human computers at the Drapers Biometric Laboratory worked tirelessly on the computing of ballistic charts, high-angle range tables and fuze-scales for AV Hill of the Anti-Aircraft Experimental Section. Things did not always go smoothly -- Pearson did not take kindly to the calculations of his staff being questioned -- and Hill sometimes had to work hard to keep the peace.

If you have enough of them (and Pearson undoubtedly did, so you can quote that in your requisition request), then you might just get the answer in time!

[ The above excerpted from http://tinyurl.com/6byoub ]

Good luck!
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Nov-08  Time: 18:35:25
-- XFMail --
Re: [R] Chi-Square Test Disagreement
> Next time the launch of an incoming nuclear strike is detected, set them to work as follows (following Karl Pearson's historical precedent):
>
>   Anti-aircraft guns all day long: Computing for the Ministry of Munitions
>   JUNE BARROW GREEN (Open University)
>
>   From January 1917 until March 1918 Pearson and his staff of mathematicians and human computers at the Drapers Biometric Laboratory worked tirelessly on the computing of ballistic charts, high-angle range tables and fuze-scales for AV Hill of the Anti-Aircraft Experimental Section. Things did not always go smoothly -- Pearson did not take kindly to the calculations of his staff being questioned -- and Hill sometimes had to work hard to keep the peace.
>
> If you have enough of them (and Pearson undoubtedly did, so you can quote that in your requisition request), then you might just get the answer in time!
>
> [ The above excerpted from http://tinyurl.com/6byoub ]
>
> Good luck!
> Ted.

That is absolutely classic.

--
Insert something humorous here. :-)
Re: [R] chi-square test
On Tuesday 08 April 2008 17:04:16 Roslina Zakaria wrote:

> Hi R-users,
>
> I would like to find the goodness of fit using the chi-square test for my data below:
>
> xobs = observed data, xtwe = predicted data using tweedie, xgam = predicted data using gamma
>
>   xobs <- c(223,46,12,5,7,17)
>   xtwe <- c(217.33,39,14,18.33,6.67,14.67)
>   xgam <- c(224.67,37.33,12.33,15.33,5.33,15)
>   chisq.test(xobs, xtwe = xtwe, rescale.p = TRUE)
>   Error in chisq.test(xobs, xtwe = xtwe, rescale.p = TRUE) :
>     unused argument(s) (xtwe = c(217.33, 39, 14, 18.33, 6.67, 14.67))
>
>   chisq.test(x, p = p, rescale.p = TRUE)
>
> I'm not sure what's wrong with it. Thank you so much for your help.

Try this instead:

  chisq.test(xobs, p=xtwe, rescale.p = TRUE)

The argument has to be named p; xtwe is not an argument name that chisq.test() knows, which is what the "unused argument" error is saying. The help file might be a little obscure.

JWDougherty
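A short sketch applying the corrected call to both sets of predicted values from the post. One caveat that is not raised in the thread: chisq.test() uses length(x) - 1 degrees of freedom and makes no adjustment for parameters estimated when fitting the Tweedie or gamma model, so the reported p-values are best treated as rough guides. The names `fit_twe` and `fit_gam` are illustrative.

```r
xobs <- c(223, 46, 12, 5, 7, 17)                   # observed counts
xtwe <- c(217.33, 39, 14, 18.33, 6.67, 14.67)      # predicted, Tweedie
xgam <- c(224.67, 37.33, 12.33, 15.33, 5.33, 15)   # predicted, gamma

# The predicted values go in through p; rescale.p = TRUE turns them into
# probabilities that sum to 1
fit_twe <- chisq.test(xobs, p = xtwe, rescale.p = TRUE)
fit_gam <- chisq.test(xobs, p = xgam, rescale.p = TRUE)

# Side-by-side summary of the two goodness-of-fit tests
rbind(tweedie = c(X2 = unname(fit_twe$statistic), df = unname(fit_twe$parameter),
                  p = fit_twe$p.value),
      gamma   = c(X2 = unname(fit_gam$statistic), df = unname(fit_gam$parameter),
                  p = fit_gam$p.value))
```

The `expected` component of each result holds the rescaled expected counts, which is a quick way to confirm that the predicted values were interpreted as intended.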
[R] chi-square test
Hi R-users,

I would like to find the goodness of fit using the chi-square test for my data below:

xobs = observed data, xtwe = predicted data using tweedie, xgam = predicted data using gamma

  xobs <- c(223,46,12,5,7,17)
  xtwe <- c(217.33,39,14,18.33,6.67,14.67)
  xgam <- c(224.67,37.33,12.33,15.33,5.33,15)
  chisq.test(xobs, xtwe = xtwe, rescale.p = TRUE)
  Error in chisq.test(xobs, xtwe = xtwe, rescale.p = TRUE) :
    unused argument(s) (xtwe = c(217.33, 39, 14, 18.33, 6.67, 14.67))

  chisq.test(x, p = p, rescale.p = TRUE)

I'm not sure what's wrong with it. Thank you so much for your help.