I am trying to do some verification across a large dataset, cuData, that has 24 columns. Column 24 (similarity) is the outcome, 0 or 1, and the other 23 columns are the features.

I do this:

verificationglm.model <- glm(formula = similarity ~ ., family=binomial, data=cuData[1:1000,])

and produce the model:

> summary(verificationglm.model)

Call:
glm(formula = similarity ~ ., family = binomial, data = cuData[1:1000, ])

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.3885  -0.8943  -0.2918   0.8851   2.7025

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)           26.3112869 21.2229690   1.240 0.215066
length                -0.6249415  0.1906254  -3.278 0.001044 **
meanPitch             -0.0110389  0.0053083  -2.080 0.037565 *
minimumPitch           0.0002689  0.0024290   0.111 0.911845
maximumPitch          -0.0013454  0.0038149  -0.353 0.724326
meanF1                -0.0362153  0.0112499  -3.219 0.001286 **
meanF2                 0.0016765  0.0115335   0.145 0.884430
meanF3                 0.0073960  0.0076235   0.970 0.331964
meanF4                 0.0063015  0.0016820   3.746 0.000179 ***
meanF5                -0.0022535  0.0024885  -0.906 0.365153
ratioF2ToF1           -1.2322825  7.0036532  -0.176 0.860334
ratioF3ToF1           -4.9643148  4.5973552  -1.080 0.280222
jitter                -8.7535283 14.5273818  -0.603 0.546806
shimmer                1.6706067  2.6327972   0.635 0.525731
percentUnvoicedFrames -0.4863219  1.1638115  -0.418 0.676042
numberOfVoiceBreaks   -0.0335636  0.0634956  -0.529 0.597086
percentOfVoiceBreaks  -2.9353239  0.8945600  -3.281 0.001033 **
meanIntensity         -0.2931293  0.3355314  -0.874 0.382321
minimumIntensity       0.0689654  0.1531059   0.450 0.652392
maximumIntensity       0.2186570  0.2510906   0.871 0.383848
ratioIntensity        -8.1777871 13.1676287  -0.621 0.534565
noSyllsIntensity       0.1714826  0.0695021   2.467 0.013614 *
speakingRate          -0.3564808  0.1507373  -2.365 0.018034 *
startSpeech           -1.3537348  6.7337461  -0.201 0.840669
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1384.0  on 999  degrees of freedom
Residual deviance: 1084.7  on 976  degrees of freedom
AIC: 1132.7

Number of Fisher Scoring iterations: 5

>

Now I want to use the model to predict on a different part of the dataset. I try this, and get my prediction:

> pred <- predict(verificationglm.model, cuData[1001:2000,1:23])
> pred
        1001         1002         1003         1004         1005         1006         1007
-0.495901722 -2.406349629 -0.911082179 -0.965869553 -0.488695693 -1.849622304 -1.637722247
        1008         1009         1010         1011         1012         1013         1014
-1.148952722 -0.191538278 -1.511895046 -2.989036645 -2.775775622  0.603852124 -0.838613048
        1015         1016         1017         1018         1019         1020         1021
-0.434259674 -2.004230065 -0.234829011  1.666502334  2.039631718 -0.592192326  1.667700087
        1022         1023         1024         1025         1026         1027         1028
 0.104644531  1.748724399  0.391461247  1.356898357  1.468154760  1.090708994  1.071487227
        1029         1030         1031         1032         1033         1034         1035
 0.720596788  2.378350706 -0.128248232  0.969373318  0.315142756  1.372108172 -2.399517898
        1036         1037         1038         1039         1040         1041         1042
-0.684530171  0.761198819 -1.298372615  1.185368711 -1.148974059  0.358234433  0.671495255
        1043         1044         1045         1046         1047         1048         1049
 0.683771224  0.663767266  2.009012643  0.196591464  2.063417812  0.823472345  0.696638161
[runs on to 2000]

However, I then want to check for classAgreement (an e1071 package function). First I want a table. I do this:

> t = table(pred,cuData[1001:2000,24])
> t

pred                0 1
  -8.90070098980106 0 1
  -8.0484071844879  0 1
  -7.79298548775523 1 0
  -7.18338330609013 1 0
[runs on]

when I expect this

    0 1
  0 ? ?
  1 ? ?

with the ?’s being some count.

When I look at my slice of cuData it looks like this:

> cuData[1001:2000,24]
  [1] 1 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 0 1 1 1 1 0 1 1 1 1 0 0 0 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 0 0 0 1 1 0
 [75] 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0
[112] 0 1 1 1 0 1 0 1 1 0 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[149] 1 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0
[186] 0 1 0 0 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 0 1
[223] 0 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0
[260] 1 0 0 0 0 1 1 1 0 1 1 1 1 0 1 0 0 1 0 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0
[297] 1 1 0 1 1 1 1 1 1 0 1 1 0 0 1 0 0 1 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 0
[334] 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 0 0 1 1 0 1 0 1 0
[371] 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 1 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0 1 0 1 1 0 1
[408] 1 1 0 0 0 0 1 0 1 1 1 1 [etc]

so it looks like a different layout from my pred. Does anyone know how to make these two compatible so table() will work?

Thanks.
Stephen
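
For reference, a minimal sketch of one way the two vectors could be made compatible. It runs on simulated data, not cuData: the data frame simData, its columns x1 and x2, and the 0.5 cutoff are all assumptions made for illustration. predict() on a glm returns values on the link (log-odds) scale by default, so table() treats every distinct number as its own row. Asking for type = "response" gives probabilities, which can then be cut into 0/1 classes before tabulating.

```r
# Simulated stand-in for cuData (hypothetical: two features, one 0/1 outcome).
set.seed(1)
n <- 2000
simData <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
simData$similarity <- rbinom(n, 1, plogis(0.5 + 1.5 * simData$x1 - simData$x2))

train <- simData[1:1000, ]
test  <- simData[1001:2000, ]

fit <- glm(similarity ~ ., family = binomial, data = train)

# type = "response" returns probabilities; the default (type = "link")
# returns log-odds, one distinct value per row.
prob <- predict(fit, newdata = test, type = "response")

# Assumed cutoff of 0.5; another cutoff may suit the problem better.
predClass <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual    <- factor(test$similarity, levels = c(0, 1))

# 2 x 2 table of predicted class against observed class.
confusion <- table(predicted = predClass, actual = actual)
print(confusion)

# If e1071 is installed, agreement statistics can be computed from the table.
if (requireNamespace("e1071", quietly = TRUE)) {
  print(e1071::classAgreement(confusion))
}
```

Fixing the factor levels to 0 and 1 on both sides keeps the table 2 x 2 even if one class never appears among the predictions.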

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html