Re: [R] Chi-square test: Specifying expected proportions for two way table

2021-10-16 Thread Bert Gunter
Perhaps I misunderstand, but ?chisq.test explicitly says:

"If x is a matrix with at least two rows and columns, it is taken as a
two-dimensional contingency table: the entries of x must be
non-negative integers. Otherwise, x and y must be vectors or factors
of the same length; cases with missing values are removed, the objects
are coerced to factors, and the contingency table is computed from
these. Then Pearson's chi-squared test is performed of the null
hypothesis that the joint distribution of the cell counts in a
2-dimensional contingency table is the product of the row and column
marginals."

Moreover, expected counts are one component of the returned result
(see the "value" section). Proportions can of course easily then be
obtained if so desired.
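
As a minimal sketch of the point above (the counts are made up for illustration and are not from this thread), the expected counts under independence sit in the `expected` component of the result, and `prop.table()` converts them to proportions:

```r
# Hypothetical 2 x 3 table of counts (illustrative only)
obs <- matrix(c(20, 30, 25,
                15, 35, 25), nrow = 2, byrow = TRUE)

res <- chisq.test(obs)

# Expected counts under independence: (row total * column total) / n
res$expected

# The same expectations expressed as proportions of the grand total
exp_prop <- prop.table(res$expected)
exp_prop
```

Note that these proportions are estimated from the margins of the data; they are not proportions supplied by the user.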


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



On Sat, Oct 16, 2021 at 8:52 AM Miloš Žarković  wrote:
>
> Hi,
>
> Is there a function where I can specify expected proportions for a
> two-way table to calculate the chi-square test? chisq.test allows
> specifying them only for a one-way table. Otherwise, I will have to
> write the function myself, but I never trust myself not to make a mess
> when programming.
>
> Thanks,
>
> Miloš
>
> Miloš Žarković
> Professor of Internal Medicine
> School of Medicine, University of Belgrade
> Clinic of Endocrinology, Clinical Centre of Serbia
> 11000 Belgrade
> PAK 112113
> Serbia
> Phone +381 11 3639 724
> email milos.zarko...@med.bg.ac.rs
> milos.zarko...@gmail.com
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



[R] Chi-square test: Specifying expected proportions for two way table

2021-10-16 Thread Miloš Žarković
Hi,

Is there a function where I can specify expected proportions for a
two-way table to calculate the chi-square test? chisq.test allows
specifying them only for a one-way table. Otherwise, I will have to
write the function myself, but I never trust myself not to make a mess
when programming.

Thanks,

Miloš

Miloš Žarković
Professor of Internal Medicine
School of Medicine, University of Belgrade
Clinic of Endocrinology, Clinical Centre of Serbia
11000 Belgrade
PAK 112113
Serbia
Phone +381 11 3639 724
email milos.zarko...@med.bg.ac.rs
milos.zarko...@gmail.com
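
One possible way to approach the question above, sketched under the assumption that the null proportions for every cell are fully specified in advance (not estimated from the data): flatten both the table and the proportions and use the one-way (goodness-of-fit) form of chisq.test. The names `obs` and `p0` and all values are hypothetical.

```r
# Hypothetical observed 2 x 3 table and hypothesised cell proportions
obs <- matrix(c(20, 30, 25,
                15, 35, 25), nrow = 2, byrow = TRUE)
p0  <- matrix(c(0.15, 0.20, 0.15,
                0.10, 0.25, 0.15), nrow = 2, byrow = TRUE)

# Flatten both in the same (column-major) order so cells line up,
# then run the goodness-of-fit form of the test
res <- chisq.test(as.vector(obs), p = as.vector(p0))

res$statistic   # Pearson X-squared against the specified proportions
res$parameter   # df = number of cells - 1, since nothing is estimated
res$p.value
```

Because no margins are estimated, the degrees of freedom are the number of cells minus one, which differs from the (r - 1)(c - 1) used by the usual test of independence.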



Re: [R] Chi-square test

2017-01-23 Thread Sergio Ferreira Cardoso
Dear David and John,

Thank you for your replies. Indeed I'm using ape and nlme packages. Here it is:

> fit <- gls(fcl ~ mass + activity + agility, correlation = corBrownian(phy = tree),
+            data = df, method = "ML", weights = varFixed(~vf))
> Anova(fit)
Analysis of Deviance Table (Type II tests)

Response: fcl
         Df  Chisq Pr(>Chisq)
mass      1 0.1756     0.6752
activity  2 0.5549     0.7577
agility   4 3.2903     0.5105

Anyway, I have the help I was looking for. Thank you very much.

Best regards,
Sérgio.

- Original message -
> From: "Fox, John" <j...@mcmaster.ca>
> To: "Sergio Ferreira Cardoso" <sergio.ferreira-card...@umontpellier.fr>
> Cc: "R-help list" <r-help@r-project.org>
> Sent: Saturday, 21 January 2017 6:09:22
> Subject: Re: [R] Chi-square test

> Dear Sergio,
>
> You appear to have asked this question twice on r-help.
>
> Anova() has no specific method for “gls” models (I assume, though you don’t
> say so, that the model is fit by gls() in the nlme package), but the default
> method works and provides Wald chi-square tests for terms in the model. I
> don’t understand the model formula x ~ 1 + 2 + 3 + x, however, and so I have
> no idea what gls() would do with this model, other than report an error.
> Perhaps you can show us the output or, better yet, provide a reproducible
> example.
>
> As a general matter, for 1-df terms in an additive model, the 1-df chi-square
> values reported by Anova() will simply be the squares of the corresponding
> Wald statistics (labelled “t” I believe) reported in the summary of the
> model. Although the p-value is from the upper tail of the chi-square
> distribution, the test is inherently two-sided.
> 
> Best,
> John
> 
> -
> John Fox, Professor
> McMaster University
> Hamilton, Ontario, Canada
> Web: http://socserv.mcmaster.ca/jfox
> 
>> On Jan 20, 2017, at 8:36 AM, Sergio Ferreira Cardoso
>> <sergio.ferreira-card...@umontpellier.fr> wrote:
>> 
>> Dear all,
>> 
>> Anova() for .car package retrieves Chi-square statistics when I'm testing a
>> model the significance of a multivariate .gls model
>> gls(x~1+2+3+x,corBrownian(phy=tree), ...).
>> Is this Chi-square a two-sided test?
>> 
>> Thank you.
>> 
>> Best,
>> Sérgio.
>> 

Re: [R] Chi-square test

2017-01-20 Thread David Winsemius

> On Jan 20, 2017, at 7:36 AM, Sergio Ferreira Cardoso 
>  wrote:
> 
> Dear all, 
> 
> Anova() for .car package retrieves Chi-square statistics when I'm testing a 
> model the significance of a multivariate .gls model 
> gls(x~1+2+3+x,corBrownian(phy=tree), ...). 
> Is this Chi-square a two-sided test? 
If you explain what you mean by a "2-sided test" we might be able to help. It’s 
unlikely that the author set up the test so that it would fail when the fit was 
so good that the chi-square statistic was very small, but it’s also likely that 
departures from the implicit hypothesis of all the coefficients being identically 
0 would have raised the chi-square statistic away from zero.

— 
David.
> 
> Thank you. 
> 
> Best, 
> Sérgio. 
> 

David Winsemius, MD
Alameda, CA, USA


Re: [R] Chi-square test

2017-01-20 Thread Fox, John
Dear Sergio,

You appear to have asked this question twice on r-help.

Anova() has no specific method for “gls” models (I assume, though you don’t say 
so, that the model is fit by gls() in the nlme package), but the default method 
works and provides Wald chi-square tests for terms in the model. I don’t 
understand the model formula x ~ 1 + 2 + 3 + x, however, and so I have no idea 
what gls() would do with this model, other than report an error. Perhaps you 
can show us the output — or, better yet, provide a reproducible example.

As a general matter, for 1-df terms in an additive model, the 1-df chi-square 
values reported by Anova() will simply be the squares of the corresponding Wald 
statistics (labelled “t” I believe) reported in the summary of the model. 
Although the p-value is from the upper tail of the chi-square distribution, the 
test is inherently two-sided.
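
The relationship described above can be checked numerically in base R. This is a minimal sketch with a made-up Wald statistic `z` (it is not taken from the model in this thread, and it uses the normal reference distribution rather than a t distribution): squaring the statistic and taking the upper tail of a 1-df chi-square gives the same p-value as the two-sided normal test.

```r
# Hypothetical Wald statistic for a 1-df term (illustrative value)
z <- 1.7

# Upper-tail p-value of the squared statistic on a 1-df chi-square
p_chisq <- pchisq(z^2, df = 1, lower.tail = FALSE)

# Two-sided p-value of the unsquared statistic on the standard normal
p_two_sided <- 2 * pnorm(-abs(z))

# Squaring discards the sign, so both tails of z map to the upper
# tail of the chi-square distribution
all.equal(p_chisq, p_two_sided)
```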

Best,
 John

-
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox

> On Jan 20, 2017, at 8:36 AM, Sergio Ferreira Cardoso 
>  wrote:
> 
> Dear all, 
> 
> Anova() for .car package retrieves Chi-square statistics when I'm testing a 
> model the significance of a multivariate .gls model 
> gls(x~1+2+3+x,corBrownian(phy=tree), ...). 
> Is this Chi-square a two-sided test? 
> 
> Thank you. 
> 
> Best, 
> Sérgio. 
> 

[R] Chi-square test

2017-01-20 Thread Sergio Ferreira Cardoso
Dear all, 

Anova() from the car package returns chi-square statistics when I test the 
significance of terms in a multivariate gls model, 
gls(x~1+2+3+x,corBrownian(phy=tree), ...). 
Is this chi-square test two-sided? 

Thank you. 

Best, 
Sérgio. 


[R] Chi-square test

2015-02-20 Thread pari hesabi
Hello,
If the vector of observed frequencies is:
f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
and the vector of probabilities is:
p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02,
         6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01,
         1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)
The sum of the probabilities is equal to one. But when I want to do the
chi-square test, I get this error: probabilities must sum to one.
Does anybody know the reason?
Best Regards,
pari
  

Re: [R] Chi-square test

2015-02-20 Thread Berend Hasselman

> On 20-02-2015, at 19:05, pari hesabi statistic...@hotmail.com wrote:
>
> Hello,
> If the vector of observed frequencies is:
> f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
> and the vector of probability: p11 <- c(7.577864e-06, 1.999541e-04,
> 1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02, 1.124083e-01,
> 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02,
> 5.009534e-02, 2.645857e-02, 0.0205403)
> The sum of the probabilities is equal to one. But when I want to do the
> Chi-square test, I get this error: probabilities must sum to one.

print  sum(p11)-1

> Does anybody know the reason?

R FAQ 7.31  (http://cran.r-project.org/doc/FAQ/R-FAQ.html)

Berend

> Best Regards,
> pari
 


Re: [R] Chi-square test

2015-02-20 Thread David Winsemius

On Feb 20, 2015, at 10:05 AM, pari hesabi wrote:

> Hello,
> If the vector of observed frequencies is:
> f <- c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
> and the vector of probability: p11 <- c(7.577864e-06, 1.999541e-04,
> 1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02, 1.124083e-01,
> 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02,
> 5.009534e-02, 2.645857e-02, 0.0205403)
> The sum of the probabilities is equal to one.

Well, the sum is close to 1.0 but not exact. There's a simple fix:

> sum(p11)==1
[1] FALSE
> sum( p11/sum(p11) )==1
[1] TRUE

> But when I want to do the Chi-square test, I get this error:
> probabilities must sum to one.
> Does anybody know the reason?

Numerical accuracy. See R-FAQ 7.31
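
A runnable sketch of the fix described above, using the vectors from the original post. The helper names `gap`, `err`, and `p_fixed` are introduced here for illustration only. The rounded probabilities fall short of 1 by a small amount, and renormalising them removes the error.

```r
# Data as posted in the thread
f <- c(0, 0, 0, 2, 3, 6, 17, 15, 21, 21, 14, 10, 5, 1, 5)
p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02,
         6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01,
         1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)

# The rounded probabilities do not sum exactly to 1
gap <- 1 - sum(p11)
gap

# Capture the error message instead of stopping the script
err <- tryCatch(chisq.test(f, p = p11),
                error = function(e) conditionMessage(e))
err

# Renormalise so the probabilities sum to 1, then rerun the test.
# suppressWarnings() hides the small-expected-count warning.
p_fixed <- p11 / sum(p11)
res <- suppressWarnings(chisq.test(f, p = p_fixed))
res$statistic
```

Passing `rescale.p = TRUE` to chisq.test performs the same renormalisation internally.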

-- 

David Winsemius
Alameda, CA, USA



Re: [R] Chi-square test

2015-02-20 Thread David L Carlson
And probably why chisq.test has the rescale.p= argument. Your second problem 
with small expected values can be handled with simulate.p.value=.

> chisq.test(f, p=p11)
Error in chisq.test(f, p = p11) : probabilities must sum to 1.
> 1-sum(p11)
[1] 4.3036e-08
> chisq.test(f, p=p11, rescale.p=TRUE)

        Chi-squared test for given probabilities

data:  f
X-squared = 7.6268, df = 14, p-value = 0.9078

Warning message:
In chisq.test(f, p = p11, rescale.p = TRUE) :
  Chi-squared approximation may be incorrect
> chisq.test(f, p=p11, rescale.p=TRUE, simulate.p.value=TRUE)

        Chi-squared test for given probabilities with simulated p-value (based
        on 2000 replicates)

data:  f
X-squared = 7.6268, df = NA, p-value = 0.7996

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



[R] chi-square test

2014-09-15 Thread eliza botto
Dear useRs of R,
I have two datasets (TT and SS) and I wanted to see whether my data are
uniformly distributed. I tested this with a chi-square test; the results are
given at the end. The p-value apparently carries significant importance, but I
can't interpret the results or understand why it says "In chisq.test(TT) :
Chi-squared approximation may be incorrect".
###
> dput(TT)
structure(list(clc5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.26, 0.14, 0, 0.44, 
0.26, 0, 0, 0, 0, 0, 0, 0.11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.17, 0.16, 
0.56, 0, 1.49, 0, 0.64, 0.79, 0.66, 0, 0, 0.17, 0, 0, 0, 0, 0.56, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0.43, 0.41, 0, 0.5, 0.44, 0, 0, 0, 0, 0.09, 0.46, 0, 0.27, 
0.45, 0.15, 0.31, 0.16, 0.44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.07, 0, 0, 0, 0, 
0, 0.06, 0, 0.09, 0.07, 0, 0, 7.89, 0, 0.22, 0.29, 0.33, 0.27, 0, 0.36, 0.41, 
0, 0, 0, 0, 0.55, 0.81, 0, 0.09, 0.13, 0.28, 0, 0, 0), quota_massima = c(1167L, 
1167L, 4572L, 3179L, 3141L, 585L, 585L, 876L, 876L, 1678L, 2667L, 1369L, 1369L, 
1369L, 1381L, 1381L, 1381L, 1381L, 2284L, 410L, 2109L, 2507L, 2579L, 2507L, 
1436L, 3234L, 3234L, 3234L, 3234L, 2792L, 2569L, 2569L, 2569L, 1669L, 4743L, 
4743L, 4743L, 3403L, 3197L, 3267L, 3583L, 3583L, 3583L, 2584L, 2584L, 2579L, 
1241L, 1241L, 4174L, 3006L, 3197L, 2366L, 2618L, 2670L, 4487L, 3196L, 3196L, 
2107L, 2107L, 2427L, 1814L, 2622L, 1268L, 1268L, 1268L,
3885L, 3885L, 3092L, 3234L, 2625L, 2625L, 3760L, 4743L, 3707L, 3760L, 4743L, 
3760L, 3885L, 3760L, 4743L, 2951L, 782L, 2957L, 3343L, 2697L, 2697L, 3915L, 
2277L, 1678L, 1678L, 3197L, 2957L, 2957L, 2957L, 4530L, 4530L, 4530L, 2131L, 
3618L, 3618L, 3335L, 2512L, 2390L, 1616L, 3526L, 3197L, 3197L, 2625L, 2622L, 
3197L, 3197L, 2622L, 2622L, 2622L, 368L, 4572L, 3953L, 863L, 3716L, 3716L, 
3716L, 2697L, 2697L, 1358L)), .Names = c("clc5", "quota_massima"), class = 
"data.frame", row.names = c(NA, -124L))

> chisq.test(TT)
Pearson's Chi-squared test
data:  TT
X-squared = 411.5517, df = 123, p-value < 2.2e-16
Warning message:
In chisq.test(TT) : Chi-squared approximation may be incorrect 
###
> dput(SS)
structure(list(NDVIanno = c(0.57, 0.536, 0.082, 0.262, 0.209, 0.539, 0.536, 
0.543, 0.588, 0.599, 0.397, 0.63, 0.616, 0.644, 0.579, 0.597, 0.617, 0.622, 
0.548, 0.528, 0.541, 0.436, 0.509, 0.467, 0.534, 0.412, 0.324, 0.299, 0.41, 
0.462, 0.427, 0.456, 0.508, 0.581, 0.242, 0.291, 0.324, 0.28, 0.291, 0.305, 
0.365, 0.338, 0.399, 0.516, 0.357, 0.558, 0.605, 0.638, 0.191, 0.377, 0.325, 
0.574, 0.458, 0.426, 0.188, 0.412, 0.464, 0.568, 0.582, 0.494, 0.598, 0.451, 
0.577, 0.572, 0.602, 0.321, 0.38, 0.413, 0.427, 0.55, 0.437, 0.481, 0.425, 
0.234, 0.466, 0.464, 0.491, 0.463, 0.489, 0.435, 0.267, 0.564, 0.256, 0.156, 
0.476, 0.498, 0.122, 0.508, 0.582, 0.615, 0.409, 0.356, 0.284, 0.285, 0.444, 
0.303, 0.478, 0.557, 0.345, 0.408, 0.347, 0.498, 0.534, 0.576, 0.361, 0.495, 
0.502, 0.553, 0.519, 0.504, 0.53, 0.547, 0.559, 0.505, 0.557, 0.377, 0.36, 
0.613, 0.452, 0.397, 0.277, 0.42, 0.443, 0.62), delta_z = c(211L, 171L, 925L, 
534L, 498L, 50L, 53L, 331L, 135L, 456L, 850L, 288L, 286L, 233L, 342L, 274L,
184L, 198L, 312L, 67L, 476L, 676L, 349L, 873L, 65L, 963L, 553L, 474L, 
948L, 1082L, 616L, 704L, 814L, 450L, 865L, 987L, 1265L, 720L, 565L, 652L, 941L, 
822L, 1239L, 929L, 477L, 361L, 199L, 203L, 642L, 788L, 818L, 450L, 703L, 760L, 
711L, 1015L, 1351L, 195L, 511L, 617L, 296L, 604L, 381L, 389L, 287L, 1043L, 
1465L, 963L, 1125L, 582L, 662L, 1424L, 1762L, 575L, 1477L, 1364L, 1236L, 1483L, 
1201L, 1644L, 498L, 142L, 510L, 482L, 811L, 788L, 466L, 626L, 461L, 350L, 
1177L, 826L, 575L, 568L, 916L, 767L, 1017L, 532L, 1047L, 1370L, 902L, 686L, 
703L, 440L, 1016L, 1148L, 1089L, 753L, 650L, 1065L, 568L, 712L, 762L, 636L, 
79L, 1092L, 955L, 158L, 1524L, 1145L, 673L, 513L, 596L, 239L)), .Names = 
c("NDVIanno", "delta_z"), class = "data.frame", row.names = c(NA, -124L))
> chisq.test(SS)
Pearson's Chi-squared test
data:  SS
X-squared = 72.8115, df = 123, p-value = 0.
Warning message:
In chisq.test(SS) : Chi-squared approximation may be incorrect
#
Kindly guide me through like you always did :)
thanks in advance,


Eliza 


Re: [R] chi-square test

2014-09-15 Thread Rick Bilonick

On 09/15/2014 10:57 AM, eliza botto wrote:

You are using a Chi-squared test on a 124x2 matrix of values (not all 
integers) and many are zeros. The expected frequencies for many cells 
are very small (near zero, less than 1) hence the warning message. More 
importantly, does this application of the 

Re: [R] chi-square test

2014-09-15 Thread David L Carlson
Rick's question is a good one. It is unlikely that the results will be 
informative, but from a technical standpoint, you can estimate the p value 
using the simulate.p.value=TRUE argument to chisq.test().

> chisq.test(TT, simulate.p.value=TRUE)

Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)

data:  TT
X-squared = 7919.632, df = NA, p-value = 0.0004998
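
To illustrate both the warning and the simulated p-value on something small, here is a sketch with a made-up sparse table (the counts in `tab` are hypothetical and unrelated to TT or SS). When expected counts are small the chi-square approximation is flagged as unreliable; `simulate.p.value = TRUE` replaces it with a Monte Carlo p-value.

```r
# Hypothetical sparse 4 x 2 table of counts (illustrative only)
tab <- matrix(c(3, 1,
                0, 2,
                1, 4,
                0, 1), nrow = 4, byrow = TRUE)

# Asymptotic test; suppressWarnings() hides
# "Chi-squared approximation may be incorrect"
asym <- suppressWarnings(chisq.test(tab))
min(asym$expected)   # smallest expected count, far below the usual rule of 5

# Monte Carlo p-value instead of the chi-square approximation
set.seed(1)          # arbitrary seed so the simulation is repeatable
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

sim$statistic        # same X-squared statistic as the asymptotic test
sim$parameter        # NA: no degrees of freedom are used
sim$p.value
```

This only addresses the reliability of the p-value; whether a contingency-table test is a sensible way to assess uniformity of continuous measurements is a separate question.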

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


[R] chi square test

2013-06-17 Thread Dave Clark
I'm doing the chi-square test in R; see the code below:

> row1 <- c(27,17,13,21,80,24,35,41,18,51) #Category A (1-10) counts
> row2 <- c(27,11,26,13,30,28,17,30,10,21) #Category B (1-10) counts
> data.table <- rbind(row1,row2)
> data.table

then:
> chisq.test(data.table)

This gives me the chi-square statistic, degrees of freedom and p-value.

But how do I get the results for individual cells?

And has anyone got any good ideas for a decent post hoc test to follow the
chi-square test?

Thanks all in advance, 

 

Dave Clark




Re: [R] chi square test

2013-06-17 Thread R. Michael Weylandt
On Mon, Jun 17, 2013 at 9:14 AM, Dave Clark d...@mailbox.co.uk wrote:
> I`m doing the chi square test in R, see below code:
>
> row1 <- c(27,17,13,21,80,24,35,41,18,51) #Category A (1-10) counts
> row2 <- c(27,11,26,13,30,28,17,30,10,21) #Category B (1-10) counts
> data.table <- rbind(row1,row2)
> data.table
>
> then:
> chisq.test(data.table)
>
> This gives me the chi figure, degrees of freedom and p value.
>
> But how do I get the results of individual cells?

What do you mean 'results of individual cells'? As documented in
?chisq.test, you might be looking for one or more of

data.table$observed
data.table$expected
data.table$residuals
data.table$stdres

Pick your poison. ;-)

MW


> And has anyone got any good ideas on a decent post hoc test to the chi
> square test.
>
> Thanks all in advance,
>
> Dave Clark




Re: [R] chi square test

2013-06-17 Thread peter dalgaard

On Jun 17, 2013, at 10:36, R. Michael Weylandt wrote:

> On Mon, Jun 17, 2013 at 9:14 AM, Dave Clark <d...@mailbox.co.uk> wrote:
>> I'm doing the chi square test in R, see below code:
>>
>> row1 <- c(27,17,13,21,80,24,35,41,18,51) #Category A (1-10) counts
>> row2 <- c(27,11,26,13,30,28,17,30,10,21) #Category B (1-10) counts
>> data.table <- rbind(row1,row2)
>> data.table
>>
>> then:
>> chisq.test(data.table)
>>
>> This gives me the chi figure, degrees of freedom and p value.
>>
>> But how do I get the results of individual cells?
>
> What do you mean 'results of individual cells'? As documented in
> ?chisq.test, you might be looking for one or more of
>
> data.table$observed
> data.table$expected
> data.table$residuals
> data.table$stdres
>
> Pick your poison. ;-)
>
> MW

Replace with chisq.test(data.table)$observed, etc.
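
A minimal sketch of that correction, using the counts from the original post.
The components live on the object returned by chisq.test(), not on the input
matrix. The names `counts` and `res` are illustrative and do not appear in the
thread; the component names are the ones listed in ?chisq.test.

```r
# Counts from the original post: two categories across ten groups
row1 <- c(27, 17, 13, 21, 80, 24, 35, 41, 18, 51)  # Category A (1-10) counts
row2 <- c(27, 11, 26, 13, 30, 28, 17, 30, 10, 21)  # Category B (1-10) counts
counts <- rbind(row1, row2)    # illustrative name for the 2 x 10 table

# Store the test result instead of only printing it
res <- chisq.test(counts)

res$observed    # the counts that were tested
res$expected    # expected counts under independence of rows and columns
res$residuals   # Pearson residuals: (observed - expected) / sqrt(expected)
res$stdres      # standardized residuals
```

Storing the result once also avoids re-running the test for every component.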


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] chi square test

2013-06-17 Thread R. Michael Weylandt
On Mon, Jun 17, 2013 at 10:07 AM, peter dalgaard <pda...@gmail.com> wrote:
>
> On Jun 17, 2013, at 10:36, R. Michael Weylandt wrote:
>
>> What do you mean 'results of individual cells'? As documented in
>> ?chisq.test, you might be looking for one or more of
>>
>> data.table$observed
>> data.table$expected
>> data.table$residuals
>> data.table$stdres
>>
>> Pick your poison. ;-)
>>
>> MW
>
> Replace with chisq.test(data.table)$observed, etc.


D'Oh! Thanks for that, Peter.

MW



Re: [R] Chi-Square test and survey results

2011-10-12 Thread Jean V Adams
gheine wrote on 10/11/2011 02:31:46 PM:
>
> An organization has asked me to comment on the validity of their
> recent all-employee survey.  Survey responses, by geographic region,
> compared with the total number of employees in each region, were as follows:
>
> > ByRegion
>           All.Employees Survey.Respondents
> Region_1            735                142
> Region_2            500                 83
> Region_3            897                 78
> Region_4            717                133
> Region_5            167                 48
> Region_6            309                  0
> Region_7            806                125
> Region_8            627                122
> Region_9            858                177
> Region_10           851                160
> Region_11           336                 52
> Region_12          1823                312
> Region_13            80                  9
> Region_14           774                121
> Region_15           561                 24
> Region_16           834                134
>
> How well does the survey represent the employee population?
> Chi-square test says, not very well:
>
> > chisq.test(ByRegion)
>
>         Pearson's Chi-squared test
>
> data:  ByRegion
> X-squared = 163.6869, df = 15, p-value < 2.2e-16
>
> By striking three under-represented regions (3,6, and 15), we get
> a more reasonable, although still not convincing, result:
>
> > chisq.test(ByRegion[setdiff(1:16,c(3,6,15)),])
>
>         Pearson's Chi-squared test
>
> data:  ByRegion[setdiff(1:16, c(3, 6, 15)), ]
> X-squared = 22.5643, df = 12, p-value = 0.03166


You can't simply eliminate the three regions with the fewest respondents 
(3, 6, and 15).  These are the three largest contributors to the 
chi-squared statistic, precisely because fewer people in those regions 
were surveyed than expected.  In addition, more people in regions 1, 5, 
and 9 were surveyed than expected.  This should be clear in a bar chart. 
And the resulting chi-squared test confirms this.
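
One way to check which regions drive the statistic is to sum the squared
Pearson residuals by row. The sketch below re-enters the table exactly as it
was posted (total employees in one column, respondents in the other); the
names `contrib` and `diff_resp` are illustrative and not from the thread.

```r
# The table as posted: employees and survey respondents per region
ByRegion <- data.frame(
  All.Employees = c(735, 500, 897, 717, 167, 309, 806, 627,
                    858, 851, 336, 1823, 80, 774, 561, 834),
  Survey.Respondents = c(142, 83, 78, 133, 48, 0, 125, 122,
                         177, 160, 52, 312, 9, 121, 24, 134),
  row.names = paste0("Region_", 1:16)
)

res <- chisq.test(ByRegion)

# Contribution of each region to X-squared (squared Pearson residuals, summed by row)
contrib <- sort(rowSums(res$residuals^2), decreasing = TRUE)
head(contrib, 3)

# Observed minus expected respondents: negative means fewer than expected
diff_resp <- res$observed[, 2] - res$expected[, 2]
round(sort(diff_resp), 1)
```

The contributions add up to the X-squared value, so dropping the largest ones
is guaranteed to shrink the statistic whether or not the survey is
representative.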

Jean




Re: [R] Chi-Square test and survey results

2011-10-12 Thread Jan van der Laan

George,

Perhaps the site of the RISQ project (Representativity indicators for  
Survey Quality) might be of use: http://www.risq-project.eu/ . They  
also provide R-code to calculate their indicators.


HTH,
Jan





Re: [R] Chi-Square test and survey results

2011-10-12 Thread Greg Snow
The chisq.test function expects a contingency table: one column should hold
the count of respondents and the other the count of non-respondents. Yours
appears to hold the total number of employees instead of the non-respondents,
so the input is wrong to begin with.  A significant chi-square here just
means that the proportion responding differs among some of the regions; it
does not tell you whether the sample is representative.  What matters more
(and is not in the data or in standard tests) is whether there is a
relationship between why someone chose to respond and the outcomes of
interest.
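
A minimal sketch of the table layout described above, assuming every
respondent is also counted in the employee total for the same region. The
object names (`resp_table`, `response_rate`) are illustrative and not from the
thread.

```r
# Counts as posted in the original message
all_employees <- c(735, 500, 897, 717, 167, 309, 806, 627,
                   858, 851, 336, 1823, 80, 774, 561, 834)
respondents   <- c(142, 83, 78, 133, 48, 0, 125, 122,
                   177, 160, 52, 312, 9, 121, 24, 134)

# Contingency table: respondents versus non-respondents in each region
# (assumption: respondents are a subset of the employees listed)
resp_table <- cbind(Respondents    = respondents,
                    NonRespondents = all_employees - respondents)
rownames(resp_table) <- paste0("Region_", 1:16)

# Tests whether the response rate is the same in every region
res <- chisq.test(resp_table)
res

# Response rate per region; Region_6 has no respondents at all
response_rate <- respondents / all_employees
round(response_rate, 3)
```

A small p-value from this table says only that response rates differ by
region; it says nothing about whether respondents and non-respondents would
have answered differently.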

If you are concerned about different proportions responding, you could use
post-stratification to correct for the inequality when computing other
summaries or tests.  Region 6 will still give you problems, though: you will
need to make some assumptions, possibly combining it with another region that
is similar.

Throwing away data is rarely, if ever, beneficial.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111




[R] Chi-Square test and survey results

2011-10-11 Thread gheine

An organization has asked me to comment on the validity of their
recent all-employee survey.  Survey responses, by geographic region,
compared with the total number of employees in each region, were as follows:

> ByRegion
          All.Employees Survey.Respondents
Region_1            735                142
Region_2            500                 83
Region_3            897                 78
Region_4            717                133
Region_5            167                 48
Region_6            309                  0
Region_7            806                125
Region_8            627                122
Region_9            858                177
Region_10           851                160
Region_11           336                 52
Region_12          1823                312
Region_13            80                  9
Region_14           774                121
Region_15           561                 24
Region_16           834                134

How well does the survey represent the employee population?
Chi-square test says, not very well:

> chisq.test(ByRegion)

        Pearson's Chi-squared test

data:  ByRegion
X-squared = 163.6869, df = 15, p-value < 2.2e-16

By striking three under-represented regions (3, 6, and 15), we get
a more reasonable, although still not convincing, result:

> chisq.test(ByRegion[setdiff(1:16,c(3,6,15)),])

        Pearson's Chi-squared test

data:  ByRegion[setdiff(1:16, c(3, 6, 15)), ]
X-squared = 22.5643, df = 12, p-value = 0.03166

This poses several questions:

1)  Looking at a side-by-side barchart (proportion of responses vs.
proportion of employees, per region), the pattern of survey responses
appears, visually, to match fairly well the pattern of employees.  Is
this a case where we trust the numbers and not the picture?

2) Part of the problem, ironically, is that there were too many responses
to the survey.  If we had only one-tenth the responses, but in the same
proportions by region, the chi-square statistic would look much better
(though with a warning about possible inaccuracy):

data:  data.frame(ByRegion$All.Employees, 0.1 * (ByRegion$Survey.Respondents))
X-squared = 17.5912, df = 15, p-value = 0.2848

Is there a way of reconciling a large response rate with an unrepresentative
response profile?  Or is the bad news that the survey will give very precise
results about a very ill-specified sub-population?

(Of course, I would put it in softer terms, like "you need to assess the
degree of homogeneity across different regions".)

3) Is Chi-squared really the right measure of how representative the
survey is?

Thanks for any help you can give - hope these questions make sense -

George H.



Re: [R] Chi square test on data frame

2011-08-18 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org wrote on 17.08.2011 21:07:43:

> Dear Michael,
>
> Thanks a lot for your reply and for your help. I was struggling so much
> but your suggestion showed me a path to the solution of my problem. I have
> tried your code on my data frame step wise and it looks fine to me. But
> when I tried the chi square test-
>
> res=chisq.test(y1[id],p=y2[id],rescale.p=T)
>
> Chi-squared test for given probabilities
>
> data:  y1[id]
> X-squared = NaN, df = 19997, p-value = NA
>
> Warning message:
> In chisq.test(y1[id], p = y2[id], rescale.p = T) :
>   Chi-squared approximation may be incorrect

Check what Y1[id] is.

Split each Yn into a list with one element per row:
l1 <- split(Y1[id], rep(1:6, each=2))
l2 <- split(Y2[id], rep(1:6, each=2))

Then use mapply on those lists. But the result is rather silly, as Michael
pointed out.

mapply(chisq.test, l1, l2, SIMPLIFY=F)

or, to get only the p values,

lapply(mapply(chisq.test, l1, l2, SIMPLIFY=F), "[", 3)

Regards
Petr

 

Re: [R] Chi square test on data frame

2011-08-18 Thread Uwe Ligges

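If your data is d1:

# column positions of the two largest values among V1..V4, one column per row of d1
temp <- apply(d1[,1:4], 1, order, decreasing=TRUE)[1:2,]
# append the positions of the matching W columns (offset by 4)
temp <- rbind(temp, temp+4)
# one test per row on a 2x2 matrix: first column observed (V), second expected (W)
result <- sapply(1:nrow(d1), function(i)
chisq.test(matrix(as.matrix(d1[i,temp[,i]]), ncol=2)))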

Uwe Ligges




Re: [R] Chi square test on data frame

2011-08-18 Thread R. Michael Weylandt
My reservations about the methodology aside, it's probably not a bad idea to
include an error checking line for the case when the probability of the
second event is 0 (and so, unsurprisingly, the chi-sq test rejects the null
hypothesis), which is what happens in rows like the first one.

Try this:

Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
"V2", "V3", "V4", "W1", "W2", "W3", "W4")))

# Y is one row of the matrix: V1..V4 followed by W1..W4
Fnc <- function(Y) {
Y1 = Y[1:4]
Y2 = Y[-(1:4)]
# positions of the two largest observed values
id = order(Y1,decreasing=TRUE)[1:2]
# skip rows where an expected value is 0
if (any(Y2[id]==0)) {return(NA)}
p = chisq.test(Y1[id],p = Y2[id]/sum(Y2[id]))$p.value
return(p)
}
Res = apply(Y,1,Fnc)

#You could even pre-calculate the indices to speed it up a little

id = apply(Y[,1:4],1,order,decreasing=TRUE)[1:2,]
Y.Id = cbind(Y,t(id))

# same as Fnc, but reads the pre-computed positions from columns 9 and 10
Fnc2 <- function(Y) {
Y1 = Y[1:4]
Y2 = Y[5:8]
id = Y[9:10]
if (any(Y2[id]==0)) {return(NA)}
p = chisq.test(Y1[id],p = Y2[id]/sum(Y2[id]))$p.value
return(p)
}
Res2 = apply(Y.Id,1,Fnc2)

> identical(Res,Res2)
TRUE

Hope this helps,

Michael


[R] Chi square test on data frame

2011-08-17 Thread Bansal, Vikas
Is there anyone who can help me with a chi square test on a data frame? I have
been struggling for the last 2 days. I will be very thankful to you.

Dear all,

I have been working on this problem for many hours but did not find any
solution.
I have a data frame with 8 columns-
  V1 V2 V3 V4 W1 W2 W3 W4
1  0 84 22 10  0 84  0  0
2 35 84  0  0 22 84  0  0
3  0  0  0 48  0  0  0 48
4  0 48  0  0  0 48  0  0
5  0 84  0  0  0 84  0  0
6  0  0  0 48  0  0  0 48

From the first four columns, for each row, I have to take the two largest
values, and these two values will be considered the observed values. From the
last four columns we get the expected values. So I have to perform a chi
square test for each row to get p values.

Example for the first row-

The two largest values are 84 (in V2) and 22 (in V3), so these are considered
the observed values. Now, if the largest values are in V2 and V3, we have to
pick the expected values from W2 and W3, which are 84 and 0. I know that for
the chi square test values should not be 0, but we will ignore the warning.
Now that we have observed as well as expected values, we have to perform the
chi square test to get p values for each row in a new column.


So far I was working on returning the index of the two largest values with-
sort.int(df,index.return=TRUE)$ix[c(4,3)]
 but it does not accept a data frame.

Can you please give me some idea how to do this? It is very tricky and,
after studying a lot, I am not able to do it. Please help.



Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London


Re: [R] Chi square test on data frame

2011-08-17 Thread R. Michael Weylandt
I think everything below is right, but it's all a little helter-skelter so
take it with a grain of salt:

First things first: when posting to the list, share your data with dput().

Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
"V2", "V3", "V4", "W1", "W2", "W3", "W4")))

Now,

Y1 = Y[,1:4]
Y2 = Y[,-(1:4)]

id = apply(Y1,1,order,decreasing=T)[1:2,]
# This has the columns you want in each row, but it's not directly
# appropriate for subsetting.
# Specifically, the problem is that the row information is implicit in where
# the col index is in id.
# We directly extract and force into a 2-col matrix that gives rows and
# columns for each data point
id = cbind(as.vector(col(id)),as.vector(id))

Now you can take

Y1[id] as the observed values and Y2[id] as the expected.

But, to be honest, it sounds like you have more problems in using a chi-sq
test than anything else. Beyond all the zeros, you should note that you
always have #obs = #expected because Y1= Y2. I'll leave that up to you
though.

Hope this helps and please make sure you can take my code apart piece by
piece to understand it: there's some odd data manipulation that takes
advantage of R's way of coercing matrices to vectors and if your actual data
isn't like the provided example, you may have to modify.
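
The indexing trick can be taken apart on a smaller matrix. The sketch below
uses a made-up 3 x 4 matrix `m` (not from the thread) and the same steps as
the code above; `top2` and `idx` are illustrative names.

```r
# Small illustrative matrix, filled column by column:
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12
m <- matrix(1:12, nrow = 3)

# For each row, the column positions of its two largest entries.
# apply() returns one column per row of m, so top2 is 2 x nrow(m).
top2 <- apply(m, 1, order, decreasing = TRUE)[1:2, ]

# Column j of top2 belongs to row j of m, so col(top2) recovers the row
# numbers. Binding them gives a two-column (row, col) index matrix.
idx <- cbind(as.vector(col(top2)), as.vector(top2))

# Indexing a matrix by a two-column matrix picks one cell per index row
m[idx]
# [1] 10  7 11  8 12  9

# Regroup the picked values, two per original row
split(m[idx], rep(seq_len(nrow(m)), each = 2))
```

Because `m[idx]` is one flat vector, passing it straight to chisq.test()
treats all rows as a single sample; the values have to be regrouped per row
before testing.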

Michael Weylandt



Re: [R] Chi square test on data frame

2011-08-17 Thread Bansal, Vikas
Dear Michael,

Thanks a lot for your reply and for your help. I was struggling so much, but
your suggestion showed me a path to the solution of my problem. I have tried
your code on my data frame stepwise and it looks fine to me. But when I tried
the chi-square test:

res=chisq.test(y1[id],p=y2[id],rescale.p=T)

Chi-squared test for given probabilities

data:  y1[id] 
X-squared = NaN, df = 19997, p-value = NA

Warning message:
In chisq.test(y1[id], p = y2[id], rescale.p = T) :
  Chi-squared approximation may be incorrect

It is not giving a p-value. I then checked the observed and expected values:
it is taking all the numbers into consideration at once. But as I mentioned
earlier, I want a p-value for each row, and therefore the degrees of freedom
will be 1. Example:

I have a data frame with 8 columns:
  V1 V2 V3 V4 W1 W2 W3 W4
1  0 84 22 10  0 84  0  0
2 35 84  0  0 22 84  0  0
3  0  0  0 48  0  0  0 48
4  0 48  0  0  0 48  0  0
5  0 84  0  0  0 84  0  0
6  0  0  0 48  0  0  0 48

An example for the first row:

The two largest values are 84 (in V2) and 22 (in V3), so these are considered
the observed values. Since the largest values are in V2 and V3, we have to pick
the expected values from W2 and W3, which are 84 and 0. I know that for a
chi-square test the values should not be 0, but we will ignore the warning.

It should then generate a p-value for the next row, taking 35 and 84 (V1 and
V2) as observed and 22 and 84 (W1 and W2) as expected. So here it will do a
chi-square test for all 6 rows and generate 6 p-values. My data frame has a
lot of rows (approx. ).

Can you please help me with this.
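
A minimal sketch of the per-row test described above, assuming the 6-row example data and a hypothetical helper named row_pvalue (not from the thread). It runs one goodness-of-fit test per row on the two largest observed cells, so each test has 1 degree of freedom. Rows whose second expected value is 0 give a degenerate result (a p-value of 0, or NaN when the matching observed value is also 0), which is the problem with zeros already mentioned in the thread.

```r
# Example data from the thread: V1..V4 hold observed counts, W1..W4 expected.
Y <- matrix(c( 0, 35,  0,  0,  0,  0,   # V1
              84, 84,  0, 48, 84,  0,   # V2
              22,  0,  0,  0,  0,  0,   # V3
              10,  0, 48,  0,  0, 48,   # V4
               0, 22,  0,  0,  0,  0,   # W1
              84, 84,  0, 48, 84,  0,   # W2
               0,  0,  0,  0,  0,  0,   # W3
               0,  0, 48,  0,  0, 48),  # W4
            nrow = 6,
            dimnames = list(as.character(1:6),
                            c("V1", "V2", "V3", "V4", "W1", "W2", "W3", "W4")))
obs_all <- Y[, 1:4]
exp_all <- Y[, 5:8]

# Hypothetical helper: chi-square goodness-of-fit test for a single row,
# using the two largest observed values and the matching expected values.
row_pvalue <- function(i) {
  top2 <- order(obs_all[i, ], decreasing = TRUE)[1:2]
  o <- obs_all[i, top2]
  e <- exp_all[i, top2]
  # Without any positive observed or expected count no test is possible.
  if (sum(o) == 0 || sum(e) == 0) return(NA_real_)
  # rescale.p = TRUE turns the expected counts into probabilities.
  suppressWarnings(chisq.test(o, p = e, rescale.p = TRUE)$p.value)
}

pvals <- vapply(seq_len(nrow(Y)), row_pvalue, numeric(1))
pvals
```

The result could be attached as a new column with something like cbind(Y, p = pvals). Looping over rows keeps every test separate, instead of pooling all cells into one test as in the call above that reported df = 19997.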



Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

From: R. Michael Weylandt [michael.weyla...@gmail.com]
Sent: Wednesday, August 17, 2011 7:11 PM
To: Bansal, Vikas
Cc: r-help@r-project.org
Subject: Re: [R] Chi square test on data frame

I think everything below is right, but it's all a little helter-skelter so take 
it with a grain of salt:

First things first, make your data with dput() for the list.

Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
"V2", "V3", "V4", "W1", "W2", "W3", "W4")))

Now,

Y1 = Y[,1:4]
Y2 = Y[,-(1:4)]

id = apply(Y1,1,order,decreasing=TRUE)[1:2,]
# This has the columns you want in each row, but it's not directly
# appropriate for subsetting.
# Specifically, the problem is that the row information is implicit in
# where the col index is in id.
# We directly extract and force into a 2-column matrix that gives rows
# and columns for each data point.
id = cbind(as.vector(col(id)),as.vector(id))

Now you can take

Y1[id] as the observed values and Y2[id] as the expected.

But, to be honest, it sounds like you have more problems in using a chi-sq test 
than anything else. Beyond all the zeros, you should note that you always have 
#obs = #expected because Y1= Y2. I'll leave that up to you though.

Hope this helps and please make sure you can take my code apart piece by piece 
to understand it: there's some odd data manipulation that takes advantage of 
R's way of coercing matrices to vectors and if your actual data isn't like the 
provided example, you may have to modify.
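
To make the indexing trick above easier to take apart, here is a small sketch on just the first two rows of the observed data. The names obs, top2 and idx are illustrative only. It relies on two documented R behaviours: apply() over rows returns one column per row of the input, and a two-column matrix used as an index picks out individual (row, column) cells.

```r
# First two rows of the observed columns from the example.
obs <- rbind(c(0, 84, 22, 10),
             c(35, 84, 0, 0))

# Column j of top2 holds the column indices of the two largest values
# in row j of obs (apply() over rows returns its results as columns).
top2 <- apply(obs, 1, order, decreasing = TRUE)[1:2, ]

# col(top2) recovers which row of obs each entry belongs to, so the
# two-column matrix idx lists (row, column) pairs.
idx <- cbind(as.vector(col(top2)), as.vector(top2))

# Indexing with a two-column matrix extracts one cell per pair.
obs[idx]
```

Here idx contains the pairs (1,2), (1,3), (2,2), (2,1), so obs[idx] returns 84, 22, 84, 35: the two largest values of row 1 followed by those of row 2.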

Michael Weylandt

On Wed, Aug 17, 2011 at 10:26 AM, Bansal, Vikas 
vikas.ban...@kcl.ac.uk wrote:
Is there anyone who can help me with a chi-square test on a data frame? I have
been struggling for the last 2 days. I will be very thankful to you.

Dear all,

I have been working on this problem for many hours but did not find any
solution.
I have a data frame with 8 columns:
  V1 V2 V3 V4 W1 W2 W3 W4
1  0 84 22 10  0 84  0  0
2 35 84  0  0 22 84  0  0
3  0  0  0 48  0  0  0 48
4  0 48  0  0  0 48  0  0
5  0 84  0  0  0 84  0  0
6  0  0  0 48  0  0  0 48

From the first four columns, for each row I have to take the two largest
values, and these two values will be considered the observed values. The
expected values come from the last four columns. So I have to perform a
chi-square test for each row to get p-values.

An example for the first row:

The two largest values are 84 (in V2) and 22 (in V3), so these are considered
the observed values. Since the largest values are in V2 and V3, we have to pick
the expected values from

[R] Chi square test on data frame

2011-08-16 Thread Bansal, Vikas
Dear all,

I have been working on this problem for many hours but did not find any
solution.
I have a data frame with 8 columns:
  V1 V2 V3 V4 W1 W2 W3 W4
1  0 84 22 10  0 84  0  0
2 35 84  0  0 22 84  0  0
3  0  0  0 48  0  0  0 48
4  0 48  0  0  0 48  0  0
5  0 84  0  0  0 84  0  0
6  0  0  0 48  0  0  0 48

From the first four columns, for each row I have to take the two largest
values, and these two values will be considered the observed values. The
expected values come from the last four columns. So I have to perform a
chi-square test for each row to get p-values.

An example for the first row:

The two largest values are 84 (in V2) and 22 (in V3), so these are considered
the observed values. Since the largest values are in V2 and V3, we have to pick
the expected values from W2 and W3, which are 84 and 0. I know that for a
chi-square test the values should not be 0, but we will ignore the warning.
Now that we have observed as well as expected values, we have to perform a
chi-square test to get a p-value for each row in a new column.


So far I have been trying to return the indices of the two largest values with
sort.int(df,index.return=TRUE)$ix[c(4,3)]
but it does not accept a data frame.

Can you please give me some idea of how to do this? It is very tricky and,
after studying a lot, I am still not able to do it. Please help.



Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London


[R] Chi square test of proportions in 2 groups of different sizes

2011-02-10 Thread Dimitri Liakhovitski
Hello!

Very sorry for a probably very simple question - I looked but did not
find an answer in the archives.
I have a table counts (below) that shows counts by Option within
each of my 2 groups. However, my groups have different sizes (N1=255
and N2=68). Table prop shows the resulting proportions within each
group.
I would like to compare the proportions in 2 groups using Chi Square
test. However, I am not sure how to do it because chisq.test(counts)
does not take into account the fact that the sizes of the groups are
different.
Any hint is greatly appreciated.
Thank you!
Dimitri

G1counts <- matrix(c(54,76,125), ncol = 1)
G2counts <- matrix(c(14,19,35), ncol = 1)
counts <- cbind(G1counts,G2counts)
colnames(counts) <- c("Group1","Group2")
rownames(counts) <- c("Option1","Option2","Option3")

N1=255
N2=68
Ns=c(N1,N2)

prop1 <- G1counts/N1
prop2 <- G2counts/N2
prop <- cbind(prop1,prop2)
colnames(prop) <- c("Group1","Group2")
rownames(prop) <- c("Option1","Option2","Option3")

(Ns);(counts);(prop);sum(prop)
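
A hedged sketch, not a reply from the thread: when chisq.test() is given a two-way table of counts, the expected counts are built from the row and column totals, so the different group sizes (255 and 68, which are exactly the column sums here) enter the test automatically. The object name res is illustrative.

```r
# Counts by option within each group, as in the post above.
counts <- cbind(Group1 = c(54, 76, 125),
                Group2 = c(14, 19, 35))
rownames(counts) <- c("Option1", "Option2", "Option3")

# Pearson's chi-squared test on the 3 x 2 contingency table.
res <- chisq.test(counts)

# Expected counts are (row total * column total) / grand total, so each
# column of expected counts sums to that group's size.
res$expected
colSums(res$expected)

# Within-group proportions, for comparison with the 'prop' table.
prop.table(counts, margin = 2)

res$statistic
res$parameter   # degrees of freedom: (3 - 1) * (2 - 1)
res$p.value
```

Passing the proportions themselves to chisq.test() would discard the sample sizes, so the counts are the appropriate input.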



[R] Chi-Square Test Disagreement

2008-11-26 Thread Andrew Choens
I was asked by my boss to do an analysis on a large data set, and I am
trying to convince him to let me use R rather than SPSS. I think Sweave
could make my life much much easier. To get me a little closer to this
goal, I ran my analysis through R and SPSS and compared the resulting
values. In all but one case, they were the same. Given the matrix

     [,1] [,2]
[1,]  110  358
[2,]   71  312
[3,]   29  139
[4,]   31   77
[5,]   13   32

This is the output from R:
 chisq.test(test29)

Pearson's Chi-squared test

data:  test29
X-squared = 9.593, df = 4, p-value = 0.04787

But, the same data in SPSS generates a p value of .051. It's a small but
important difference. I played around and rescaled things, and tried
different values for B, but I never could get R to reach .051.

I'd like to know which program is correct - R or SPSS? I know, this is a
biased place to ask such a question. I also appreciate all input that
will help me use R more effectively. The difference could be the result
of my own ignorance.

thanks
--andy

-- 
Insert something humorous here.  :-)



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Chuck Cleland
On 11/26/2008 9:51 AM, Andrew Choens wrote:
 I was asked by my boss to do an analysis on a large data set, and I am
 trying to convince him to let me use R rather than SPSS. I think Sweave
 could make my life much much easier. To get me a little closer to this
 goal, I ran my analysis through R and SPSS and compared the resulting
 values. In all but one case, they were the same. Given the matrix
 
      [,1] [,2]
 [1,]  110  358
 [2,]   71  312
 [3,]   29  139
 [4,]   31   77
 [5,]   13   32
 
 This is the output from R:
 chisq.test(test29)
 
   Pearson's Chi-squared test
 
 data:  test29
 X-squared = 9.593, df = 4, p-value = 0.04787
 
 But, the same data in SPSS generates a p value of .051. It's a small but
 important difference. I played around and rescaled things, and tried
 different values for B, but I never could get R to reach .051.
 
 I'd like to know which program is correct - R or SPSS? I know, this is a
 biased place to ask such a question. I also appreciate all input that
 will help me use R more effectively. The difference could be the result
 of my own ignorance.

  The SPSS p-value is for the Likelihood Ratio Chi-squared test, not
Pearson's.  For Pearson's Chi-squared test in SPSS (16.0.2), I get
p=0.04787, so the results do match if you do the same Chi-squared test.
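
For readers who want to see both tests side by side in R, here is a minimal sketch. The likelihood ratio statistic is computed by hand as G = 2 * sum(O * log(O / E)), which is valid here because every observed count is positive. The names E, G, df and p_lr are illustrative.

```r
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)

# Pearson's chi-squared test, as reported by R in the original post.
pearson <- chisq.test(test29)

# Likelihood ratio chi-squared test, using the same expected counts.
E    <- pearson$expected
G    <- 2 * sum(test29 * log(test29 / E))
df   <- (nrow(test29) - 1) * (ncol(test29) - 1)
p_lr <- pchisq(G, df, lower.tail = FALSE)

c(pearson = pearson$p.value, likelihood_ratio = p_lr)
```

The first value should agree with the 0.04787 above, and the second should be close to the .051 that SPSS reported.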

 thanks
 --andy 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Berwin A Turlach
G'day Andy,

On Wed, 26 Nov 2008 14:51:50 +
Andrew Choens [EMAIL PROTECTED] wrote:

 I was asked by my boss to do an analysis on a large data set, and I am
 trying to convince him to let me use R rather than SPSS. 

Very laudable of you. :)

 This is the output from R:
  chisq.test(test29)
 
   Pearson's Chi-squared test
 
 data:  test29
 X-squared = 9.593, df = 4, p-value = 0.04787
 
 But, the same data in SPSS generates a p value of .051. It's a small
 but important difference. 

Chuck explained already the reason for this small difference.  I just
take issue about it being an important difference.  In my opinion, this
difference is not important at all.  It would only be important to
people who are still sticking to arbitrary cut-off points that are
mainly due to historical coincidences and the lack of computing power
at that time in history.  If somebody tells you that this difference
is important, ask him or her whether he or she will be willing to
finance you a room full of calculators (in the sense of Pearson's time)
and whether he or she wants you to do all your calculations and analyses
with these calculators in future.  Alternatively, you could ask the
person whether he or she would like the anaesthetist during his or her
next operation to use chloroform given his or her nostalgic penchant for
out-dated rituals/methods.

 I played around and rescaled things, and tried different values for
 B, but I never could get R to reach .051.

Well, I have no problem when using simulated p-values to get something
close to 0.051; look at the last try.  The second one might also be
noteworthy.  Unfortunately, I didn't save the seed beforehand.

 test29 <- matrix(c(110,358,71,312,29,139,31,77,13,32), byrow=TRUE, ncol=2)
 test29
     [,1] [,2]
[1,]  110  358
[2,]   71  312
[3,]   29  139
[4,]   31   77
[5,]   13   32
 chisq.test(test29, simul=TRUE)

Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.04798

 chisq.test(test29, simul=TRUE)

Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.05697

 chisq.test(test29, simul=TRUE, B=2)

Pearson's Chi-squared test with simulated p-value (based on
2 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.0463

 chisq.test(test29, simul=TRUE, B=2)

Pearson's Chi-squared test with simulated p-value (based on
2 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.0499

 chisq.test(test29, simul=TRUE, B=2)

Pearson's Chi-squared test with simulated p-value (based on
2 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.0486

 chisq.test(test29, simul=TRUE, B=2)

Pearson's Chi-squared test with simulated p-value (based on
2 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.05125
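
Since the seed was not saved, the simulated p-values above cannot be reproduced exactly. A minimal sketch of how such a run could be made repeatable, assuming an arbitrary seed value chosen only for illustration:

```r
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)

# Fix the random number generator state before each simulation so the
# Monte Carlo p-value can be reproduced. The seed value is arbitrary.
set.seed(20081126)
sim1 <- chisq.test(test29, simulate.p.value = TRUE, B = 2000)

set.seed(20081126)
sim2 <- chisq.test(test29, simulate.p.value = TRUE, B = 2000)

# Same seed and same B give the same simulated p-value.
identical(sim1$p.value, sim2$p.value)
sim1$p.value
```

The statistic itself does not depend on the simulation. Only the p-value varies from run to run, and a larger B reduces that variation.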


Cheers,

Berwin

=== Full address =
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7          e-mail: [EMAIL PROTECTED]
Singapore 117546                    http://www.stat.nus.edu.sg/~statba



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Andrew Choens
On Thu, 2008-11-27 at 00:46 +0800, Berwin A Turlach wrote:
 Chuck explained already the reason for this small difference.  I just
 take issue about it being an important difference.  In my opinion,
 this difference is not important at all.  It would only be important
 to people who are still sticking to arbitrary cut-off points that are
 mainly due to historical coincidences and the lack of computing power
 at that time in history.  If somebody tells you that this difference
 is important, ask him or her whether he or she will be willing to
 finance you a room full of calculators (in the sense of Pearson's
 time) and whether he or she wants you to do all your calculations and
 analyses with these calculators in future.  Alternatively, you could
 ask the person whether he or she would like the anaesthetist during
 his or her next operation to use chloroform given his or her nostalgic
 penchant for out-dated rituals/methods.

Yes he did and when I realized the source of my confusion I was
appropriately chastised. I felt like a bit of a fool. Of course, I
should try comparing apples to apples. Oranges are another thing
entirely.

As to the importance of the difference, I am of two minds. On the one
hand I fully agree with you. It is an anachronistic approach. On the
other hand we don't all have the pleasure of working in a math
department where such subtleties are well understood.

I work for a consulting firm that advises state and local governments
(USA). I personally do try to expand my understanding of statistics and
math (I do not have a degree in math), but my clients do not. When I'm
working with someone from the government, it is sometimes easier to
simply tell them that relationship x is significant at a certain level
of certainty. Although I doubt they could really explain the details,
they have some basic understanding of what I am talking about.
Subtleties are sometimes lost on our public servants.

And, since I do work for government, if I ask for a roomful of
calculators, I might just get them. And really, what am I going to do
with a roomful of calculators?

--andy


-- 
Insert something humorous here.  :-)



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Ted Harding
On 26-Nov-08 17:57:52, Andrew Choens wrote:
 [...]
 And, since I do work for government, if I ask for a roomful of
 calculators, I might just get them. And really, what am I going
 to do with a roomful of calculators?
 
 --andy
 Insert something humorous here.  :-)

Next time the launch of an incoming nuclear strike is detected,
set them to work as follows (following Karl Pearson's historical
precedent):

  Anti-aircraft guns all day long: Computing for the
 Ministry of Munitions
 JUNE BARROW GREEN (Open University)
   From January 1917 until March 1918 Pearson and his
   staff of mathematicians and human computers at the
   Drapers Biometric Laboratory worked tirelessly on
   the computing of ballistic charts, high-angle range
   tables and fuze-scales for AV Hill of the Anti-Aircraft
   Experimental Section. Things did not always go smoothly
   -- Pearson did not take kindly to the calculations of
   his staff being questioned -- and Hill sometimes had
   to work hard to keep the peace.

If you have enough of them (and Pearson undoubtedly did, so you
can quote that in your requisition request), then you might just
get the answer in time!

[ The above excerpted from http://tinyurl.com/6byoub ]

Good luck!
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Nov-08   Time: 18:35:25
-- XFMail --



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Andrew Choens

 Next time the launch of an incoming nuclear strike is detected,
 set them to work as follows (following Karl Pearson's historical
 precedent):
 
   Anti-aircraft guns all day long: Computing for the
  Ministry of Munitions
  JUNE BARROW GREEN (Open University)
From January 1917 until March 1918 Pearson and his
staff of mathematicians and human computers at the
Drapers Biometric Laboratory worked tirelessly on
the computing of ballistic charts, high-angle range
tables and fuze-scales for AV Hill of the Anti-Aircraft
Experimental Section. Things did not always go smoothly
-- Pearson did not take kindly to the calculations of
his staff being questioned -- and Hill sometimes had
to work hard to keep the peace.
 
 If you have enough of them (and Pearson undoubtedly did, so you
 can quote that in your requisition request), then you might just
 get the answer in time!
 
 [ The above excerpted from http://tinyurl.com/6byoub ]
 
 Good luck!
 Ted.
 

That is absolutely classic.


-- 
Insert something humorous here.  :-)



Re: [R] chi-square test

2008-04-09 Thread J Dougherty
On Tuesday 08 April 2008 17:04:16 Roslina Zakaria wrote:
 Hi R-users,
 I would like to find the goodness of fit using Chi-square test for my data
 below: xobs=observed data, xtwe=predicted data using tweedie,
 xgam=predicted data using gamma

  xobs <- c(223,46,12,5,7,17)
  xtwe <- c(217.33,39,14,18.33,6.67,14.67)
  xgam <- c(224.67,37.33,12.33,15.33,5.33,15)
 
  chisq.test(xobs, xtwe = xtwe, rescale.p = TRUE)

 Error in chisq.test(xobs, xtwe = xtwe, rescale.p = TRUE) :
   unused argument(s) (xtwe = c(217.33, 39, 14, 18.33, 6.67, 14.67))
 chisq.test(x, p = p, rescale.p = TRUE)
 I'm not sure what's wrong with it. 
 Thank you so much for your help.


Try this instead:

chisq.test(xobs, p=xtwe, rescale.p = TRUE)

The help file might be a little obscure.
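
A short sketch of why the corrected call works, using the data from the question: chisq.test() has no argument called xtwe, so the predicted values must be passed through the p argument, and rescale.p = TRUE divides them by their sum to turn them into probabilities. The object name fit is illustrative.

```r
xobs <- c(223, 46, 12, 5, 7, 17)
xtwe <- c(217.33, 39, 14, 18.33, 6.67, 14.67)

# Goodness-of-fit test of the observed counts against the Tweedie
# predictions, rescaled to probabilities.
fit <- chisq.test(xobs, p = xtwe, rescale.p = TRUE)

fit$expected    # sum(xobs) * xtwe / sum(xtwe)
fit$statistic   # sum((xobs - expected)^2 / expected)
fit$parameter   # length(xobs) - 1 degrees of freedom
fit$p.value
```

Note that chisq.test() always uses length(x) - 1 degrees of freedom here. It does not know that the predictions came from a fitted model, so whether that is appropriate for the Tweedie or gamma fits is a separate question.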

JWDougherty



[R] chi-square test

2008-04-08 Thread Roslina Zakaria
Hi R-users,
I would like to find the goodness of fit using Chi-square test for my data below:
xobs=observed data, xtwe=predicted data using tweedie, xgam=predicted data 
using gamma
 xobs <- c(223,46,12,5,7,17)
 xtwe <- c(217.33,39,14,18.33,6.67,14.67)
 xgam <- c(224.67,37.33,12.33,15.33,5.33,15)

 chisq.test(xobs, xtwe = xtwe, rescale.p = TRUE)
Error in chisq.test(xobs, xtwe = xtwe, rescale.p = TRUE) : 
  unused argument(s) (xtwe = c(217.33, 39, 14, 18.33, 6.67, 14.67))
chisq.test(x, p = p, rescale.p = TRUE)
I'm not sure what's wrong with it. 
Thank you so much for your help.
