By default, predict.glm returns predictions on the scale of the
linear predictor (for a binomial model with the default logit link,
that is the log-odds). If, in a logistic regression, you want the
predictions to be on the response scale [0,1], use
x <- predict(logistic.model, medians, type="response")
for example. See ?predict.glm for details.
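As a minimal sketch of the difference, using simulated data (the variables x1, x2 and y, and the object names fit and newdat, are made up for illustration and are not from the original post):

```r
# Simulated data; all names here are hypothetical, not the poster's.
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(1 + 0.8 * d$x1 - 0.5 * d$x2))

fit <- glm(y ~ ., family = binomial, data = d)

# One-row data frame holding the median of each predictor
newdat <- data.frame(x1 = median(d$x1), x2 = median(d$x2))

eta <- predict(fit, newdat)                     # default type = "link": log-odds
p   <- predict(fit, newdat, type = "response")  # probability in [0, 1]

# The two scales are related by the inverse logit, plogis()
all.equal(unname(plogis(eta)), unname(p))
```

The link-scale value eta can be any real number; only p is constrained to lie between 0 and 1.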
Cheers,
Simon.
Hi
I am working on corpora of automatically recognized utterances, looking
for features that predict error in the hypothesis the recognizer is
proposing.
I am using the glm functions to do logistic regression. I do this type
of thing:
logistic.model = glm(formula = similarity ~ ., family = binomial,
                     data = data)
and end up with a model:
summary(logistic.model)
Call:
glm(formula = similarity ~ ., family = binomial, data = data)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.1599   0.2334   0.3307   0.4486   1.2471
Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)           11.1923783  4.6536898   2.405  0.01617 *
length                -0.3529775  0.2416538  -1.461  0.14410
meanPitch             -0.0203590  0.0064752  -3.144  0.00167 **
minimumPitch           0.0257213  0.0053092   4.845 1.27e-06 ***
maximumPitch          -0.0003454  0.0030008  -0.115  0.90838
meanF1                 0.0137880  0.0047035   2.931  0.00337 **
meanF2                 0.0040238  0.0041684   0.965  0.33439
meanF3                -0.0075497  0.0026751  -2.822  0.00477 **
meanF4                -0.0005362  0.0007443  -0.720  0.47123
meanF5                -0.0001560  0.0003936  -0.396  0.69187
ratioF2ToF1            0.2668678  2.8926149   0.092  0.92649
ratioF3ToF1            1.7339087  1.7655757   0.982  0.32607
jitter                -5.2571384 10.8043359  -0.487  0.62656
shimmer               -2.3040826  3.0581950  -0.753  0.45120
percentUnvoicedFrames  0.1959342  1.3041689   0.150  0.88058
numberOfVoiceBreaks   -0.1022074  0.0823266  -1.241  0.21443
percentOfVoiceBreaks  -0.0590097  1.2580202  -0.047  0.96259
meanIntensity         -0.0765124  0.0612008  -1.250  0.21123
minimumIntensity       0.1037980  0.0331899   3.127  0.00176 **
maximumIntensity      -0.0389995  0.0430368  -0.906  0.36484
ratioIntensity        -2.0329346  1.2420286  -1.637  0.10168
noSyllsIntensity       0.1157678  0.0947699   1.222  0.22187
startSpeech            0.0155578  0.1343117   0.116  0.90778
speakingRate          -0.2583315  0.1648337  -1.567  0.11706
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2462.3 on 4310 degrees of freedom
Residual deviance: 2209.5 on 4287 degrees of freedom
AIC: 2257.5
Number of Fisher Scoring iterations: 6
I have seen models where almost all the features are significant at
the 0.001 level, but I accept that I could improve my model by
normalizing some of the features (some are left skewed and I understand
that I will get a better fit by taking their logs, for example).
What really worries me is that the logistic function produces
predictions that appear to fall well outside 0 to 1.
If I make a dataset of the medians of the above features and use my
logistic.model on it, it produces a
figure of:
> x = predict(logistic.model, medians)
x
[1] 2.82959
which is well outside the range of 0 to 1.
The actual distribution of all the predictions is:
summary(pred)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -1.516   2.121   2.720   2.731   3.341   6.387
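For reference, these values look like they are on the link (log-odds) scale, which is what predict() returns for a glm unless type is set. A small sketch, assuming the default logit link and using only the numbers quoted above (the object names are made up for illustration), maps them onto the probability scale:

```r
# Values copied from the post; assumed to be on the link (log-odds) scale.
x_median <- 2.82959
pred_summary <- c(Min = -1.516, Q1 = 2.121, Median = 2.720,
                  Q3 = 3.341, Max = 6.387)
# The mean is left out on purpose: the mean of the probabilities is not
# the inverse logit of the mean log-odds, whereas quantiles do carry over
# because the inverse logit is monotone.

# Inverse logit: p = 1 / (1 + exp(-eta)); plogis() computes the same thing.
p_median  <- 1 / (1 + exp(-x_median))
p_summary <- plogis(pred_summary)

round(p_median, 3)   # roughly 0.94
round(p_summary, 3)
```

Every converted value falls strictly between 0 and 1, as a probability should.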
I can get the model to give some sort of prediction by doing this:
pred = predict(logistic.model, data)
pred[pred <= 1.5] = 0
pred[pred > 1.5] = 1
t = table(pred, data[,24])
t
pred    0    1
   0  102  253
   1  255 3701
classAgreement(t)
$diag
[1] 0.8821619
$kappa
[1] 0.2222949
$rand
[1] 0.7920472
$crand
[1] 0.1913888
but as you can see I am using a break point well outside the range 0 to
1 and the kappa is rather low (I think).
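To put the 1.5 break point in context: assuming the predictions are log-odds from the default logit link, the following sketch converts that cutoff to a probability and recomputes the agreement statistics from the table above in base R. The names tab, po, pe and kappa_hat are illustrative; classAgreement itself comes from the e1071 package and is not called here.

```r
# A cutoff of 1.5 on the link scale corresponds to this probability:
cut_prob <- plogis(1.5)          # about 0.82

# Confusion table copied from the post (rows: predicted, columns: observed).
# matrix() fills column by column, so the first column is c(102, 255).
tab <- matrix(c(102, 255, 253, 3701), nrow = 2,
              dimnames = list(pred = c("0", "1"), obs = c("0", "1")))

n  <- sum(tab)
po <- sum(diag(tab)) / n                        # observed agreement
pe <- sum(rowSums(tab) * colSums(tab)) / n^2    # agreement expected by chance
kappa_hat <- (po - pe) / (1 - pe)               # Cohen's kappa

round(c(diag = po, kappa = kappa_hat), 4)
```

The values of po and kappa_hat should agree with the $diag and $kappa entries shown above. A cutoff of 0.5 on the probability scale would correspond to 0 on the link scale, since qlogis(0.5) is 0.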
I am a bit of a novice in this, and the results worry me.
Can anyone comment if the results look strange, or if they know I am
doing something wrong?
Stephen
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
--
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
Visiting Fellow
School of Botany & Zoology
The Australian National University
Canberra ACT 0200
Australia
T: +61 2 6125 8057 email: [EMAIL PROTECTED]
F: +61 2 6125 5573
CRICOS Provider # 00120C