I made a mistake in my earlier email. I have corrected the error by dropping "na.omit" from data.frame().
I have added the newly corrected correlations and the output below:
------------------------------------------------------------------------------
#
> nrow(sdi)
[1] 65613
> print(corridor1[65600:65613])
 [1] C C C C F
 [6] F F F B B
[11] F F B B
Levels: B C D E A F
> print(corridor2[65600:65613])
 [1] 4 4 4 4 2 2 2 2 1 1 2 2 1 1
> summary(corridor1)
    B     C     D     E     A     F
15092 13456  6652  1611  1796 27006
> summary(corridor2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.0     1.0     2.0     2.3     3.0     5.0
> summary(as.numeric(as.factor(corridor1))-as.numeric(as.factor(corridor1)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0       0       0       0       0       0
> table(corridor1,corridor2)
         corridor2
corridor1     0     1     2     3     4     5
        B     0 15092     0     0     0     0
        C     0     0     0     0 13456     0
        D     0     0     0  6652     0     0
        E     0     0     0     0     0  1611
        A  1796     0     0     0     0     0
        F     0     0 27006     0     0     0
>
------------------------------------------------------------------------------

The correlation coefficients still differ, as the following results show.
Are there any functions or packages for a categorical correlation?

> cor(jh1_1, corridor1)
[1] 0.02753303
> cor(jh1_1, as.factor(corridor2))
[1] -0.3682788

Thanks for your kindness,

Kum

On 12 Oct 2006 10:25:33 +0200, Peter Dalgaard <[EMAIL PROTECTED]> wrote:
> "Kum-Hoe Hwang" <[EMAIL PROTECTED]> writes:
>
> > Howdy Gurus !
> >
> > I have a different correlation result from the same data. The
> > "corridor1" string variable is expressed
> > as a number like the "corridor2" number variable.
> > --------------------------------------------------------------------------
> > > levels(corridor1)
> > [1] "A" "B" "C" "D" "E" "F"
> > > levels(as.factor(corridor2))
> > [1] "0" "1" "2" "3" "4"
> > >
> > --------------------------------------------------------------------------
> > I have the correlation results followings using cor() function.
> > --------------------------------------------------------------------------
> > > cor(jh1_1, as.factor(corridor1))
> > [1] 0.01528538
> > > cor(jh1_1, as.factor(corridor2))
> > [1] -0.4972571
> > --------------------------------------------------------------------------
> > I do not know why the above correlation coefficients used the same data
> > are different.
> > They are 0.015 from as.factor(corridor1), -0.497 from as.factor(corridor2).
> > The string variable "corridor1" is the same category data with the
> > variable corridor2.
> > The difference is that "A" is replaced with "0", "B" with "1", "C"
> > with "2", .....
> >
> > Could you tell me why they are different, and which correlation
> > coefficient is correct?
>
> One thing that strikes me is that corridor1 has 6 levels and corridor2
> has 5...
>
> In general correlations are not expected to work on factors so I'd be
> explicit about taking as.numeric(). A glance at
> table(corridor1,corridor2) should be informative too, as would a
> summary(as.numeric(as.factor(corridor1))-as.numeric(as.factor(corridor1)))
>
> --
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
>

--
Kum-Hoe Hwang, Ph.D.
Phone : 82-31-250-3516
Email : [EMAIL PROTECTED]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.