Hi Nick,

Thanks a lot for your quick response, and sorry for my delayed reply because of the time difference between us.
(11/03/25 22:40), Nick Sabbe wrote:
> > I haven't read all of your code, but at first read, it seems right.
> >
> > With regard to your questions:
> > 1. Am I doing it correctly or not?
> > Seems OK, as I said. You could use some more standard code to convert your
> > data to a matrix, but essentially the results should be the same.
> > Also, lambda.min may be a tad too optimistic: to correct for the reuse of
> > data in crossvalidation, one normally uses the minus-one-SE trick (I think
> > this is described in the helpfile for cv.glmnet, and that is also present in
> > the cv.glmnet return value (lambda.1se if I'm not mistaken))

Actually,

fit.1cv$lambda.min
[1] 0.036
fit.1cv$lambda.1se
[1] 0.121

coef(fit.1cv, s=fit.1cv$lambda.1se)
16 x 1 sparse Matrix of class "dgCMatrix"
                     1
(Intercept) -1.1707519
V1           0.0000000
V2           0.0000000
V3           0.0000000
V4           0.0000000
V5           0.0000000
V6           0.0000000
V7           0.0000000
V8           0.2865651
V9           0.0000000
V10          0.0000000
V11          0.0000000
V12          0.0000000
V13          0.0000000
V14          0.0000000
V15          0.0000000

Do you mean that I should select the one-parameter model (only V8 included) shown above?

> > 2. Which model, I mean lasso or elastic net, should be selected? And why?
> > Both models chose the same variables but different coefficient values.
> > You may want to read 'The Elements of Statistical Learning' to find some
> > info on the advantages of ridge/lasso/elnet compared. Lasso should work fine
> > in this relatively low-dimensional setting, although it depends on the
> > correlation structure of your covariates.
> > Depending on your goals, you may want to refit a standard logistic
> > regression with only the variables selected by the lasso: this avoids the
> > downward bias that is in (just about) every penalized regression.
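For completeness, here is a self-contained sketch of that one-SE workflow. The data are simulated (so the objects x, y, and fit.cv are only illustrative, not my actual data or fits), but the extraction of lambda.min, lambda.1se, and the sparser coefficient vector is the standard cv.glmnet usage:

```r
## Sketch of lasso + cross-validation with the one-standard-error rule.
## Simulated stand-in data: 104 observations, 15 covariates, outcome
## driven mainly by V8 (mimicking the structure of my real data).
library(glmnet)

set.seed(1)
x <- matrix(rnorm(104 * 15), ncol = 15)
y <- rbinom(104, 1, plogis(-1 + 0.5 * x[, 8]))

## alpha = 1 gives the lasso; alpha < 1 would give the elastic net
fit.cv <- cv.glmnet(x, y, family = "binomial", alpha = 1)

fit.cv$lambda.min               # lambda minimizing CV deviance (often optimistic)
fit.cv$lambda.1se               # largest lambda within 1 SE of that minimum
coef(fit.cv, s = "lambda.1se")  # sparser model; compare with s = "lambda.min"
```

By construction lambda.1se >= lambda.min, so the 1se model is at least as sparse.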
fit1.glm <- glm(outcome ~ x1+x2+x3+x4, family="binomial", data=MyData)
summary(fit1.glm)

Call:
glm(formula = outcome ~ x1+x2+x3+x4, family = "binomial", data = MyData)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.806  -0.626  -0.438  -0.191   2.304

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.855      0.393   -4.72  2.3e-06 ***
x1             1.037      0.558    1.86   0.0630 .
x2             1.031      0.326    3.16   0.0016 **
x3            -0.591      0.323   -1.83   0.0678 .
x4             0.347      0.293    1.18   0.2368
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 114.717  on 103  degrees of freedom
Residual deviance:  87.178  on  99  degrees of freedom
AIC: 97.18

Number of Fisher Scoring iterations: 5

library(epicalc)
logistic.display(fit1.glm)

Logistic regression predicting postopDWI_HI2

          crude OR(95%CI)   adj. OR(95%CI)    P(Wald's test)  P(LR-test)
x1 1 vs 0 2.45 (0.98,6.13)  2.82 (0.95,8.42)  0.063           0.059
x2        2.76 (1.6,4.77)   2.8 (1.48,5.31)   0.002           < 0.001
x3        0.64 (0.36,1.14)  0.55 (0.29,1.04)  0.068           0.051
x4        2.07 (1.28,3.34)  1.41 (0.8,2.51)   0.237           0.234

Log-likelihood = -43.5889
No. of observations = 104
AIC value = 97.1777

The AUC of this glm model is 0.81, which indicates good predictive ability. My next question, however, is which coefficient values I should present: those from the glmnet model or those from the glm model? Is it O.K. to select variables by glmnet and then present the coefficients (odds ratios) from glm?

> > 3. Is it O.K. to calculate the odds ratio by exp(coefficients)? And how can
> > you calculate a 95% confidence interval for the odds ratio?
> > Or is a 95% CI meaningless in this kind of analysis?
> > At this time, confidence intervals for lasso/elnet in GLM settings are an
> > open problem (the reason being that the L1 penalty is not differentiable).
> > Some 'solutions' exist (bootstrap, for one), but they have all been shown to
> > have (statistical) properties that make them - at the least - doubtful. I
> > know, because I'm working on this.
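For the unpenalized refit, at least, exponentiating the coefficients and the Wald interval limits is the standard way to get odds ratios and 95% CIs (this is what epicalc's logistic.display reports). A minimal sketch on simulated stand-in data, since my real MyData and fit1.glm are not reproducible here:

```r
## Odds ratios and Wald 95% CIs from an ordinary (unpenalized) logistic fit.
## Simulated data standing in for MyData; coefficients are illustrative only.
set.seed(2)
d <- data.frame(x1 = rbinom(104, 1, 0.3),
                x2 = rnorm(104), x3 = rnorm(104), x4 = rnorm(104))
d$outcome <- rbinom(104, 1, plogis(-1.8 + d$x1 + d$x2 - 0.6 * d$x3))

fit <- glm(outcome ~ x1 + x2 + x3 + x4, family = binomial, data = d)

## exp() maps log-odds scale to odds-ratio scale; confint.default gives
## Wald intervals, exp(confint(fit)) would give profile-likelihood ones
or.table <- cbind(OR = exp(coef(fit)),
                  exp(confint.default(fit)))
round(or.table, 2)
```

(As Nick notes, this reasoning does not carry over to the penalized glmnet coefficients, for which valid intervals are an open problem.)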
> > Short answer: there is no way to do this (at this time).

Well, thank you for a clear-cut answer. :-)

> > HTH (and hang on there in Japan),
> > Nick Sabbe

Your answers helped me very much. Actually, I live in the western part of Japan and fortunately there was no damage in my neighborhood from the earthquake. Now we are worried about the nuclear power plant...

Best regards,
KH

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.