predict.glm by default produces predictions on the scale of the linear predictor, which for a logistic regression is the log-odds. If you want the predictions on the response (probability) scale [0, 1], use

x <- predict(logistic.model, medians, type="response")

for example. See ?predict.glm for details.
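A minimal sketch of the relationship Simon describes, on simulated data (the names `feature`, `y` and `fit` are hypothetical stand-ins, not Stephen's corpus). The default `type = "link"` returns the log-odds; `type = "response"` applies the inverse logit 1 / (1 + exp(-eta)), which base R provides as `plogis()`:

```r
set.seed(42)
n <- 500
feature <- rnorm(n)                          # hypothetical predictor
y <- rbinom(n, 1, plogis(1 + 2 * feature))   # simulated 0/1 outcome
fit <- glm(y ~ feature, family = binomial)

eta <- predict(fit)                     # default: linear predictor (log-odds)
p   <- predict(fit, type = "response")  # probabilities

range(eta)   # any real number
range(p)     # always strictly between 0 and 1
all.equal(p, plogis(eta))               # inverse logit applied by hand
all.equal(p, fit$family$linkinv(eta))   # same thing, via the family object
```

Applied to the median case further down, `plogis(2.82959)` is roughly 0.944.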

Cheers,

Simon.



Hi

I am working on corpora of automatically recognized utterances, looking
for features that predict error in the hypothesis the recognizer is
proposing.
I am using the glm function to do logistic regression. I do this type
of thing:

 logistic.model = glm(formula = similarity ~ ., family = binomial, data = data)

and end up with a model:

 summary(logistic.model)

Call:
glm(formula = similarity ~ ., family = binomial, data = data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.1599   0.2334   0.3307   0.4486   1.2471
Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)           11.1923783  4.6536898   2.405  0.01617 *
length                -0.3529775  0.2416538  -1.461  0.14410
meanPitch             -0.0203590  0.0064752  -3.144  0.00167 **
minimumPitch           0.0257213  0.0053092   4.845 1.27e-06 ***
maximumPitch          -0.0003454  0.0030008  -0.115  0.90838
meanF1                 0.0137880  0.0047035   2.931  0.00337 **
meanF2                 0.0040238  0.0041684   0.965  0.33439
meanF3                -0.0075497  0.0026751  -2.822  0.00477 **
meanF4                -0.0005362  0.0007443  -0.720  0.47123
meanF5                -0.0001560  0.0003936  -0.396  0.69187
ratioF2ToF1            0.2668678  2.8926149   0.092  0.92649
ratioF3ToF1            1.7339087  1.7655757   0.982  0.32607
jitter                -5.2571384 10.8043359  -0.487  0.62656
shimmer               -2.3040826  3.0581950  -0.753  0.45120
percentUnvoicedFrames  0.1959342  1.3041689   0.150  0.88058
numberOfVoiceBreaks   -0.1022074  0.0823266  -1.241  0.21443
percentOfVoiceBreaks  -0.0590097  1.2580202  -0.047  0.96259
meanIntensity         -0.0765124  0.0612008  -1.250  0.21123
minimumIntensity       0.1037980  0.0331899   3.127  0.00176 **
maximumIntensity      -0.0389995  0.0430368  -0.906  0.36484
ratioIntensity        -2.0329346  1.2420286  -1.637  0.10168
noSyllsIntensity       0.1157678  0.0947699   1.222  0.22187
startSpeech            0.0155578  0.1343117   0.116  0.90778
speakingRate          -0.2583315  0.1648337  -1.567  0.11706
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2462.3  on 4310  degrees of freedom
Residual deviance: 2209.5  on 4287  degrees of freedom
AIC: 2257.5

Number of Fisher Scoring iterations: 6
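The figures in this summary can be cross-checked by hand. For 0/1 (Bernoulli) responses the deviance equals -2 times the log-likelihood, so AIC is the residual deviance plus twice the number of coefficients (24 here: the intercept plus 23 features). The drop from null to residual deviance is a likelihood-ratio statistic for the whole model on 4310 - 4287 = 23 degrees of freedom. A sketch using only the printed numbers:

```r
null_dev  <- 2462.3   # Null deviance, 4310 df
resid_dev <- 2209.5   # Residual deviance, 4287 df
k <- 24               # intercept + 23 predictors

# For 0/1 responses, deviance = -2 * log-likelihood, so AIC = deviance + 2k
aic <- resid_dev + 2 * k
aic                   # should equal the printed AIC of 2257.5

# Likelihood-ratio test of the fitted model against the intercept-only model
lr_stat <- null_dev - resid_dev
lr_df   <- 4310 - 4287
pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
```

With the fitted object available, `anova(update(logistic.model, . ~ 1), logistic.model, test = "Chisq")` should perform the same overall comparison directly.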


I have seen models where almost all the features are significant at the
0.001 level, but I accept that I could improve my model by
normalizing some of the features (some are left skewed, and I understand
that I will get a better fit by taking their logs, for example).
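Log transforms compress a long right tail (a mass of small values with a few very large ones), and `log()` needs strictly positive values. A sketch on simulated log-normal data; the feature `x` and the helper `skewness()` are hypothetical, since base R has no built-in skewness function:

```r
set.seed(1)
# Simple moment-based sample skewness (hypothetical helper)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

x <- rlnorm(1000)   # strongly right-skewed, strictly positive feature
c(raw = skewness(x), logged = skewness(log(x)))

# In a model formula the transform can be written inline,
# e.g. replace a term x with log(x)
```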

What really worries me is that my logistic model produces
predictions that appear to fall well outside the range 0 to 1.

If I make a dataset of the medians of the above features and apply my
logistic.model to it, it produces a figure of:

 > x = predict(logistic.model, medians)
 x
[1] 2.82959


which is well outside the range of 0 to 1.
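One way to build a one-row data frame of medians and score it on both scales, sketched with simulated data (the columns `f1`, `f2` and the simulated `data` are hypothetical, not Stephen's features):

```r
set.seed(7)
n <- 400
data <- data.frame(f1 = rnorm(n, 100, 15), f2 = runif(n))
data$similarity <- rbinom(n, 1, plogis(-2 + 0.03 * data$f1 + data$f2))
logistic.model <- glm(similarity ~ ., family = binomial, data = data)

# One row holding the median of every predictor column
predictors <- setdiff(names(data), "similarity")
medians <- as.data.frame(lapply(data[predictors], median))

link <- predict(logistic.model, medians)                     # log-odds
prob <- predict(logistic.model, medians, type = "response")  # probability
c(link = unname(link), prob = unname(prob))
```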

The actual distribution of all the predictions is:

 summary(pred)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -1.516   2.121   2.720   2.731   3.341   6.387
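Because the inverse logit is monotone increasing, the minimum, quartiles, median and maximum of `pred` map directly onto the probability scale; the mean does not, since the average of transformed values differs from the transform of the average. Applying `plogis()` to the quantiles copied from the summary above:

```r
link_q <- c(Min = -1.516, Q1 = 2.121, Median = 2.720, Q3 = 3.341, Max = 6.387)
round(plogis(link_q), 3)

# For the mean, transform the predictions first, e.g.
# mean(predict(logistic.model, data, type = "response"))
```

This puts the fitted probabilities between about 0.18 and 0.998, with a median near 0.94.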


I can get the model to give some sort of prediction by doing this:

 pred = predict(logistic.model, data)
 pred[pred <= 1.5] = 0
 pred[pred > 1.5] = 1
 t = table(pred, data[,24])
 t
pred    0    1
   0  102  253
   1  255 3701

 classAgreement(t)
$diag
[1] 0.8821619

$kappa
[1] 0.2222949

$rand
[1] 0.7920472

$crand
[1] 0.1913888



but as you can see I am using a break point well outside the range 0 to
1 and the kappa is rather low (I think).
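Both points in this paragraph can be checked directly. Since `plogis()` is monotone increasing, a cut-off of 1.5 on the log-odds scale selects exactly the same cases as a cut-off of `plogis(1.5)`, about 0.82, on the probability scale; likewise a probability cut-off of 0.5 corresponds to log-odds 0. And `classAgreement()` (from the e1071 package) can be reproduced by hand: `diag` is the proportion of agreeing cases, and Cohen's kappa rescales it against the agreement expected by chance from the row and column totals. A sketch with simulated link values (`eta` is hypothetical) and the counts copied from the table above, assuming its columns are the observed labels in `data[,24]`:

```r
set.seed(3)
# Part 1: the same threshold expressed on the two scales
eta <- rnorm(1000, mean = 2.7, sd = 1)            # hypothetical link-scale predictions
by_link <- as.integer(eta > 1.5)                  # cut-off on the log-odds scale
by_prob <- as.integer(plogis(eta) > plogis(1.5))  # same cut-off as a probability
identical(by_link, by_prob)
plogis(1.5)   # probability equivalent of log-odds 1.5
qlogis(0.5)   # log-odds equivalent of probability 0.5

# Part 2: agreement statistics from the 2 x 2 table (rows = pred)
tab <- matrix(c(102, 255, 253, 3701), nrow = 2,
              dimnames = list(pred = c("0", "1"), observed = c("0", "1")))
n  <- sum(tab)
po <- sum(diag(tab)) / n                        # observed agreement ($diag)
pe <- sum(rowSums(tab) * colSums(tab)) / n^2    # agreement expected by chance
k_cohen <- (po - pe) / (1 - pe)                 # Cohen's kappa ($kappa)
round(c(diag = po, kappa = k_cohen), 4)
```

These reproduce the printed $diag (0.8822) and $kappa (0.2223). For context, 3954 of the 4311 observed labels (about 92%) are 1, so always predicting 1 would agree slightly more often than the 88% achieved here; kappa discounts that chance agreement, which is why it is low. Which cut-off is appropriate depends on the relative cost of the two kinds of error in the table.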

I am a bit of a novice in this, and the results worry me.
Can anyone comment on whether the results look strange, or tell me if I
am doing something wrong?

Stephen



______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


--
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
Visiting Fellow
School of Botany & Zoology
The Australian National University
Canberra ACT 0200
Australia

T: +61 2 6125 8057  email: [EMAIL PROTECTED]
F: +61 2 6125 5573

CRICOS Provider # 00120C
