"Gary Collins" <[EMAIL PROTECTED]> writes: > looking at the help page/code in STATA for tetrachoric, it says it > estimates the tetrachoric correlation via the approximation suggested > by Edwards & Edwards (1984), "Approximating the tetrachoric > correlation", Biometrics, 40(2): 563. > > that is, > > (alpha (pi/4) - 1) / (alpha^(pi/4)+1), where alpha is ad/bc > > i.e. > > alpha=(522 * 22)/(34 * 54) > > (alpha^(pi/4)-1) / (alpha^(pi/4)+1) > [1] 0.6168851
...and the approximation is obviously quite far off the mark in this
case (0.6169 against the ML estimate of 0.5172 that polychor gives for
the same table). Presumably (I'm lazy) the approximation holds for the
odds ratio alpha close to 1 (rho close to 0) and/or marginal
distributions close to 50:50. There's a Stata package "polychoric"
which claims to do things more accurately, referred to at

http://www.ats.ucla.edu/STAT/stata/faq/tetrac.htm

(I believe I mentioned this before, but possibly in a private mail to
Janet which never reached r-help.)

> HTH
>
> Gary
>
> On 25/06/06, John Fox <[EMAIL PROTECTED]> wrote:
> > Dear Janet,
> >
> > A good thing to do when different software gives different answers is
> > to check each against known results. I'm away from home, and don't have
> > all of the examples that I used to check polychor(), but I dug up the
> > following. The polychor() function produces output that agrees with
> > both of these sources. How does Stata do?
> >
> > > # example from Drasgow (1988), pp. 69-74 in Kotz and Johnson,
> > > # Encyclopedia of statistical sciences. Vol. 7.
> > > tab
> >      [,1] [,2] [,3]
> > [1,]   58   52    1
> > [2,]   26   58    3
> > [3,]    8   12    9
> >
> > > polychor(tab, std.err=TRUE)
> >
> > Polychoric Correlation, 2-step est. = 0.42 (0.07474)
> > Test of bivariate normality: Chisquare = 11.55, df = 3, p = 0.009078
> >
> > > polychor(tab, ML=TRUE, std.err=TRUE)
> >
> > Polychoric Correlation, ML est. = 0.4191 (0.07616)
> > Test of bivariate normality: Chisquare = 11.54, df = 3, p = 0.009157
> >
> >   Row Thresholds
> >   Threshold Std.Err.
> > 1  -0.02988  0.08299
> > 2   1.13300  0.10630
> >
> >
> >   Column Thresholds
> >   Threshold Std.Err.
> > 1   -0.2422  0.08361
> > 2    1.5940  0.13720
> >
> > > tab # example from Brown (1977) Applied Statistics, 26:343-351.
> >      [,1] [,2]
> > [1,] 1562   42
> > [2,]  383   94
> >
> > > polychor(tab)
> > [1] 0.595824
> >
> >
> > Regards,
> >  John
> >
> > On Fri, 23 Jun 2006 14:33:31 -0700
> >  Janet Rosenbaum <[EMAIL PROTECTED]> wrote:
> > > Peter --- Thanks for pointing out the omitted information.
The > > > hazards > > > of attempting to be brief. > > > > > > In R, I am using polychor(vec1, vec2, std.err=T) and have used both > > > the > > > ML and 2 step estimates, which give virtually identical answers. I > > > am > > > explicitly using only the 632 complete cases in R to make sure > > > missing > > > data is handled the same way as in stata. > > > > > > Here's my data: > > > > > > 522 54 > > > 34 22 > > > > > > > polychor(v1, v2, std.err=T, ML=T) > > > > > > Polychoric Correlation, ML est. = 0.5172 (0.08048) > > > Test of bivariate normality: Chisquare = 8.063e-06, df = 0, p = NaN > > > > > > Row Thresholds > > > Threshold Std.Err. > > > 1 1.349 0.07042 > > > > > > > > > Column Thresholds > > > Threshold Std.Err. > > > 1 1.174 0.06458 > > > Warning message: > > > NaNs produced in: pchisq(q, df, lower.tail, log.p) > > > > > > In stata, I get: > > > > > > . tetrachoric t1_v19a ct1_ix17 > > > > > > Tetrachoric correlations (N=632) > > > > > > ---------------------------------- > > > Variable | t1_v19a ct1_ix17 > > > -------------+-------------------- > > > t1_v19a | 1 > > > ct1_ix17 | .6169 1 > > > ---------------------------------- > > > > > > Thanks for your help. > > > > > > Janet > > > > > > > > > > > > Peter Dalgaard wrote: > > > > Janet Rosenbaum <[EMAIL PROTECTED]> writes: > > > > > > > >> I hope someone here knows the answer to this since it will save me > > > from > > > >> delving deep into documentation. > > > >> > > > >> Based on 22 pairs of vectors, I have noticed that tetrachoric > > > >> correlation coefficients in stata are almost uniformly higher than > > > those > > > >> in R, sometimes dramatically so (TCC=.61 in stata, .51 in R; .51 > > > in > > > >> stata, .39 in R). Stata's estimate is higher than R's in 20 out > > > of 22 > > > >> computations, although the estimates always fall within the 95% CI > > > for > > > >> the TCC calculated by R. > > > >> > > > >> Do stata and R calculate TCC in dramatically different ways? 
> > > >> Is the handling of missing data perhaps different? Any thoughts?
> > > >>
> > > >> Btw, I am sending this question only to the R-help list.
> > > >
> > > >
> > > > A bit more information seems necessary:
> > > >
> > > > - tetrachoric correlations depend on 4 numbers, so you should be
> > > >   able to give a direct example
> > > >
> > > > - you're not telling us how you calculate the TCC in R. This is not
> > > >   obvious (package polycor?).
> > > >
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907
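For contrast with the Edwards approximation, here is a minimal Python sketch of what a full tetrachoric estimate does for a 2x2 table: take the thresholds from the marginal proportions (inverse normal CDF), then choose rho so that a standard bivariate normal reproduces the upper-left cell proportion. Because a 2x2 table leaves no degrees of freedom (df = 0 in Janet's polychor output), this two-step solution also fits all four cells exactly, which is consistent with her ML and 2-step estimates agreeing. This is an illustration, not polycor's code: the names `tetrachoric`, `bvn_cdf` and `bvn_density` are hypothetical, the bivariate normal CDF is obtained by integrating its density over the correlation (using dPhi2/drho = phi2) with Simpson's rule, and the root is found by bisection.

```python
import math
from statistics import NormalDist


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    s = 1.0 - r * r
    q = (h * h - 2.0 * r * h * k + k * k) / (2.0 * s)
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(s))


def bvn_cdf(h, k, rho, n=200):
    """P(X <= h, Y <= k) for a standard bivariate normal with correlation rho.

    Uses dPhi2/drho = phi2, so Phi2(h, k; rho) equals Phi(h) * Phi(k)
    plus the integral of phi2(h, k; r) for r from 0 to rho, evaluated
    with composite Simpson's rule (n must be even).
    """
    step = rho / n
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * bvn_density(h, k, i * step)
    return norm_cdf(h) * norm_cdf(k) + total * step / 3.0


def tetrachoric(a, b, c, d, iters=60):
    """Hypothetical helper: tetrachoric correlation of [[a, b], [c, d]].

    Assumes no zero margins. Returns (rho, row_threshold, column_threshold).
    """
    n = a + b + c + d
    h = NormalDist().inv_cdf((a + b) / n)  # row threshold
    k = NormalDist().inv_cdf((a + c) / n)  # column threshold
    target = a / n  # observed proportion in the upper-left cell
    lo, hi = -0.9999, 0.9999
    for _ in range(iters):  # bisection: bvn_cdf increases with rho
        mid = 0.5 * (lo + hi)
        if bvn_cdf(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), h, k


if __name__ == "__main__":
    rho, h, k = tetrachoric(522, 54, 34, 22)  # Janet's table
    print(f"rho = {rho:.4f}, thresholds = {h:.3f}, {k:.3f}")
```

On Janet's table the sketch should land close to polychor's ML estimate of 0.5172, with thresholds near 1.349 and 1.174; on Brown's table (1562, 42, 383, 94) it should come out close to polychor's 0.595824. The Edwards approximation gives 0.6169 for Janet's table, which matches the Stata figure.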
