Re: [R] Package for .632 (and .632+) bootstrap and the cross-validation of ROC Parameters
Suppose I have training data my.train and testing data my.test, and I want to calculate the bootstrap error rate for a logistic model. My wrapper function for prediction is:

    pred.glm <- function(object, newdata) {
        ret <- as.factor(ifelse(predict.glm(object, newdata, type='response') < 0.4, 0, 1))
        return(ret)
    }

What I can't understand is this: if I want to calculate the misclassification error for my testing data, what should the data argument be in the following call?

    errorest(RES ~ ., data=???, model=glm, estimator="boot", predict=pred.glm, est.para=control.errorest(nboot = 10))

Using my.test I got the following error:

    Error in predict(mymodel, newdata = outbootdata) : unused argument(s) (newdata = list(RES = c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0), CAT01 = c(4, 4, 2, 4, 4, 4, 4, 4, 4, 2, 1, 2, 2, 4, 4, 4, 1, 1, 2, 2, 1, 4, 1, 4, 1, 4, 2, 4, 1, 4, 2, 3, 1, 1, 3, 3, 4, 2, 4, 2, 1, 2, 2, 1, 1,

Please reply...

Frank E Harrell Jr wrote:
> spime wrote:
>> Hi users,
>>
>> I need to calculate the .632 (and .632+) bootstrap and the cross-validation of the area under the curve (AUC) to compare my models. Is there any package for this? I know about 'ipred', and with it I can calculate misclassification errors. Please help. It's urgent.
>
> See the validate* functions in the Design package. Note that some simulations (see http://biostat.mc.vanderbilt.edu/rms) indicate that the advantages of .632 and .632+ over the ordinary bootstrap are highly dependent on the choice of the accuracy measure being validated. The bootstrap variants seem to have advantages mainly if an improper, inefficient, discontinuous scoring rule such as the percent classified correct is used.
>
> --
> Frank E Harrell Jr
> Professor and Chair, School of Medicine
> Department of Biostatistics, Vanderbilt University

--
View this message in context: http://www.nabble.com/Package-for-.632-%28and-.632%2B%29-bootstrap-and-the-cross-validation-of-ROC-Parameters-tf4068544.html#a11578129
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
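For readers following this thread, here is a minimal base-R sketch of the .632 bootstrap estimate of the misclassification error for a logistic model. It is not the ipred or Design implementation. The data are simulated, the names my.data, classify and boot632 are hypothetical, and the out-of-bag error is averaged per bootstrap replicate, which is a simplification of the leave-one-out bootstrap. The 0.4 cutoff mirrors the wrapper in the question.

```r
set.seed(632)
n <- 200
# Simulated stand-in for the poster's data (assumption: binary RES, numeric predictors)
my.data <- data.frame(NUM01 = rnorm(n), NUM02 = rnorm(n))
my.data$RES <- rbinom(n, 1, plogis(my.data$NUM01 - 0.5 * my.data$NUM02))

# Predict class 1 when the fitted probability is at least the cutoff
classify <- function(fit, newdata, cutoff = 0.4) {
  as.integer(predict(fit, newdata = newdata, type = "response") >= cutoff)
}

boot632 <- function(data, nboot = 50, cutoff = 0.4) {
  # Apparent (resubstitution) error: fit and evaluate on the same data
  full.fit <- glm(RES ~ ., family = binomial, data = data)
  err.app <- mean(classify(full.fit, data, cutoff) != data$RES)

  err.oob <- numeric(nboot)
  for (b in seq_len(nboot)) {
    idx <- sample(nrow(data), replace = TRUE)       # bootstrap sample
    oob <- setdiff(seq_len(nrow(data)), idx)        # rows not drawn
    fit.b <- glm(RES ~ ., family = binomial, data = data[idx, ])
    err.oob[b] <- mean(classify(fit.b, data[oob, ], cutoff) != data$RES[oob])
  }
  err.boot <- mean(err.oob)

  # .632 estimator: weighted average of apparent and out-of-bag error
  c(apparent = err.app, oob = err.boot,
    b632 = 0.368 * err.app + 0.632 * err.boot)
}

est <- boot632(my.data)
print(est)
```

Note that a resampling estimate like this is computed on a single data set. If a separate test set is available, the plain test error is just the mean disagreement between predicted and observed classes on that set.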
[R] Package for .632 (and .632+) bootstrap and the cross-validation of ROC Parameters
Hi users,

I need to calculate the .632 (and .632+) bootstrap and the cross-validation of the area under the curve (AUC) to compare my models. Is there any package for this? I know about 'ipred', and with it I can calculate misclassification errors. Please help. It's urgent.

--
View this message in context: http://www.nabble.com/Package-for-.632-%28and-.632%2B%29-bootstrap-and-the-cross-validation-of-ROC-Parameters-tf4068544.html#a11561405
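As a rough illustration of what cross-validating the AUC involves, the sketch below uses base R only. The data are simulated, and the names d, auc, fold and cv.auc are hypothetical. It pools the held-out predictions from 10 folds and computes one AUC from them; averaging per-fold AUCs is an equally common choice.

```r
set.seed(11)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x1 - d$x2))

# AUC as the probability that a random positive scores higher than a
# random negative (ties count one half)
auc <- function(y, p) {
  p1 <- p[y == 1]
  p0 <- p[y == 0]
  mean(outer(p1, p0, ">") + 0.5 * outer(p1, p0, "=="))
}

k <- 10
fold <- sample(rep(seq_len(k), length.out = n))  # random fold labels
p.cv <- numeric(n)
for (f in seq_len(k)) {
  fit <- glm(y ~ x1 + x2, family = binomial, data = d[fold != f, ])
  # Predict only the rows that were held out of this fit
  p.cv[fold == f] <- predict(fit, newdata = d[fold == f, ], type = "response")
}

cv.auc <- auc(d$y, p.cv)
print(cv.auc)
```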
[R] two basic question regarding model selection in GAM
Question #1

Model selection in GAM can be done by using:

1. step.gam {gam}: a directional stepwise search
2. gam {mgcv}: smoothness estimation using the GCV or UBRE/AIC criterion

Suppose my model starts as an additive model (linear part + spline part). Using gam() {mgcv} I got the estimated degrees of freedom (edf) for the smoothing splines. Now I want to use the functional form of my model, with the estimated degrees of freedom, in step.gam() {gam} to search for a better model. But mgcv's gam() masks the one from the gam package, so I cannot use the gam package after loading mgcv. Is there any way to unload mgcv?

Question #2

Suppose I have three models:

M1. GAM with thin plate regression splines (TPRS)
M2. GAM with cubic regression splines (CRS)
M3. GAM with some TPRS and some CRS

To choose the best model among the three, can I use their GCV/AIC/UBRE criteria?

--
View this message in context: http://www.nabble.com/two-basic-question-regarding-model-selection-in-GAM-tf3963362.html#a11248016
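One way to sidestep the masking is to call each function through its namespace (mgcv::gam, gam::gam) instead of attaching both packages; detach("package:mgcv") also removes an attached package from the search path. The sketch below shows the namespace form and an AIC comparison of the three basis choices on simulated data. The data and the names dat, m1, m2, m3 and aic_table are hypothetical, and whether AIC is the right criterion for this comparison is left open.

```r
# Only run if mgcv is installed (it ships with standard R distributions)
if (requireNamespace("mgcv", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  dat <- data.frame(x1 = runif(n), x2 = runif(n))
  dat$y <- rbinom(n, 1, plogis(sin(2 * pi * dat$x1) + dat$x2))

  # Namespace-qualified calls: no ambiguity about which gam() is used.
  # With the gam package installed, gam::gam(...) would select the other one.
  m1 <- mgcv::gam(y ~ s(x1, bs = "tp") + s(x2, bs = "tp"),
                  family = binomial, data = dat)   # all thin plate
  m2 <- mgcv::gam(y ~ s(x1, bs = "cr") + s(x2, bs = "cr"),
                  family = binomial, data = dat)   # all cubic regression
  m3 <- mgcv::gam(y ~ s(x1, bs = "tp") + s(x2, bs = "cr"),
                  family = binomial, data = dat)   # mixed

  # Same response and same data, so the AIC values are on a common scale
  aic_table <- AIC(m1, m2, m3)
  print(aic_table)
}
```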
[R] How to hide axis interval values in a plot
plot(cars)

This shows a plot with the axis tick values (x-axis: 5-25; y-axis: 0-120). I want to hide these values. Is there any way?

--
View this message in context: http://www.nabble.com/How-to-hide-axis-interval-values-in-a-plot-tf3960418.html#a11238540
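The thread does not record which answer the poster received. One common approach in base graphics, sketched here, is to suppress the axes with the xaxt and yaxt graphical parameters and, if tick marks are still wanted, redraw them without labels. The variable ticks is only there to show where the marks fall.

```r
# Suppress both axes entirely (the box around the plot is kept)
plot(cars, xaxt = "n", yaxt = "n")

# Optionally put the tick marks back, but without their values
ticks <- axTicks(1)                      # default tick positions on the x-axis
axis(1, at = ticks, labels = FALSE)
axis(2, labels = FALSE)
```

Using plot(cars, axes = FALSE) is a related option; it also drops the surrounding box, which box() can restore.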
Re: [R] How to hide axis interval values in a plot
Thanks, I got my answer.

spime wrote:
> plot(cars)
>
> This shows a plot with the axis tick values (x-axis: 5-25; y-axis: 0-120). I want to hide these values. Is there any way?

--
View this message in context: http://www.nabble.com/How-to-hide-axis-interval-values-in-a-plot-tf3960418.html#a11240427
[R] BIC and Hosmer-Lemeshow statistic for logistic regression
I haven't found any helpful thread. How can I calculate the BIC and the Hosmer-Lemeshow statistic for a logistic regression model? I have used glm for the logistic fit.

--
View this message in context: http://www.nabble.com/BIC-and-Hosmer-Lemeshow-statistic-for-logistic-regression-tf3945943.html#a11193273
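A minimal base-R sketch of both quantities for a glm fit follows. The data are simulated, and hosmer_lemeshow is a hypothetical helper written for this note, not a function from any package. It uses the usual grouping into g quantile bins of the fitted probabilities with g - 2 degrees of freedom.

```r
set.seed(42)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x1 + 0.5 * d$x2))
fit <- glm(y ~ x1 + x2, family = binomial, data = d)

# BIC: built in, or by hand as -2 logLik + log(n) * (number of parameters)
bic_builtin <- BIC(fit)
bic_manual <- -2 * as.numeric(logLik(fit)) +
  log(nobs(fit)) * attr(logLik(fit), "df")

# Hosmer-Lemeshow: compare observed and expected event counts within
# groups defined by quantiles of the fitted probabilities
hosmer_lemeshow <- function(y, p, g = 10) {
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp <- cut(p, breaks = breaks, include.lowest = TRUE)
  obs <- tapply(y, grp, sum)      # observed events per group
  expd <- tapply(p, grp, sum)     # expected events per group
  n_g <- tapply(y, grp, length)   # group sizes
  stat <- sum((obs - expd)^2 / (expd * (1 - expd / n_g)))
  df <- length(obs) - 2
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

hl <- hosmer_lemeshow(d$y, fitted(fit))
print(c(BIC = bic_builtin))
print(hl)
```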
Re: [R] BIC and Hosmer-Lemeshow statistic for logistic regression
Is there a Windows version of the Design package?

Frank E Harrell Jr wrote:
> spime wrote:
>> I haven't found any helpful thread. How can I calculate the BIC and the Hosmer-Lemeshow statistic for a logistic regression model? I have used glm for the logistic fit.
>
> See the Design package's lrm function and residuals.lrm for a better GOF test.
>
> --
> Frank E Harrell Jr
> Professor and Chair, School of Medicine
> Department of Biostatistics, Vanderbilt University

--
View this message in context: http://www.nabble.com/BIC-and-Hosmer-Lemeshow-statistic-for-logistic-regression-tf3945943.html#a11195410
[R] Loading problem with R2HTML package
I have downloaded the latest version of R2HTML (v1.54) for a 64-bit Windows PC. My R version is 2.5.0. My problem arises when I want to install SciViews-R, which needs the R2HTML package.

    library(R2HTML)
    Error in `parent.env<-`(`*tmp*`, value = NULL) : use of NULL environment is defunct
    Error: package/namespace load failed for 'R2HTML'

Any remedy?

Regards

--
View this message in context: http://www.nabble.com/Loading-problem-with-R2HTML-package-tf3938384.html#a11170223
Re: [R] Error using mgcv package
Dear Mr. Oksanen,

First of all, thanks for your reply. I have solved this problem in the following way. My data consists of some categorical predictors (CAT..) and some numerical variables (NUM..) that take only the values {0,1} or {0,1,2,3}. When applying GAM, I simply did not use splines for those variables. I came to this decision because, when I tested the same data in S-PLUS, I got an error regarding the applicability of the s(...) function to predictors with fewer than 4 different values. I don't know whether gam() in S-PLUS and gam() in mgcv (R) are the same or not. Anyway, thanks for your kind reply.

Bye

Jari Oksanen wrote:
> spime <sabya23 at gmail.com> writes:
>> Hi all,
>>
>> I need a solution to the following problem. The following error appears when I use the mgcv package to fit a GAM, but the same formula works fine in the gam package.
>>
>>     model.gam <- gam(formula = RES ~ CAT01+s(NUM01,5)+CAT02+CAT03+s(NUM02,5)+CAT04+
>>         CAT05+s(NUM03,5)+CAT06+CAT07+s(NUM04,5)+CAT08+s(NUM05,5)+CAT09+
>>         CAT10+s(NUM06,5)+CAT11+NUM07+CAT12+CAT13,
>>         family = binomial(link = logit), data = train.data,na.action = na.exclude,
>>         control = list(epsilon = 0.001,bf.epsilon = 0.001, maxit = 50,
>>         bf.maxit = 10, trace = F))
>>     Error in terms.formula(reformulate(term[i])) : invalid model formula in ExtractVars
>
> It seems that nobody answered this (in public). It seems that function s() in mgcv is defined as:
>
>     s(..., k = -1, fx = FALSE, bs = "tp", m = 0, by = NA)
>
> (as you can see by reading its help, ?s). The function definition starts with ..., and after the three dots you cannot use positional arguments; you must give the full argument name. Try replacing s(NUM01, 5) with s(NUM01, k=5). See also the help in mgcv (?s pointing to ?choose.k) for interpreting the argument 'k', which is not directly degrees of freedom. There may be other problems, but this probably fixes the one you reported above.
>
> cheers, jari oksanen
>
>> And after deleting the df's:
>>
>>     model.gam <- gam(formula = RES ~ CAT01+s(NUM01)+CAT02+CAT03+s(NUM02)+CAT04+
>>         CAT05+s(NUM03)+CAT06+CAT07+s(NUM04)+CAT08+s(NUM05)+CAT09+
>>         CAT10+s(NUM06)+CAT11+NUM07+CAT12+CAT13,
>>         family = binomial(link = logit), data = train.data)
>>     Error in smooth.construct.tp.smooth.spec(object, data, knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom
>>
>> Can anybody shed some light on this case? Thanks in advance.

--
View this message in context: http://www.nabble.com/Error-using-mgcv-package-tf3900783.html#a11075667
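The two points in this exchange (name the k argument, and do not smooth variables with only a handful of distinct values) can be sketched as follows. The data are simulated, and the column names only imitate the poster's; n.unique is a hypothetical helper variable. The exact rule mgcv applies when checking unique values is not reproduced here, so counting distinct values is just a quick screening step.

```r
if (requireNamespace("mgcv", quietly = TRUE)) {
  set.seed(3)
  n <- 300
  train.data <- data.frame(
    NUM01 = runif(n),                                  # continuous: fine to smooth
    NUM07 = sample(0:3, n, replace = TRUE),            # only 4 distinct values
    CAT01 = factor(sample(1:4, n, replace = TRUE))
  )
  train.data$RES <- rbinom(
    n, 1, plogis(sin(3 * train.data$NUM01) + 0.3 * train.data$NUM07)
  )

  # Screen the numeric predictors: how many distinct values does each have?
  n.unique <- sapply(train.data[c("NUM01", "NUM07")],
                     function(v) length(unique(v)))
  print(n.unique)

  # k must be passed by name in mgcv's s(); NUM07 enters linearly
  model.gam <- mgcv::gam(RES ~ CAT01 + s(NUM01, k = 5) + NUM07,
                         family = binomial(link = logit), data = train.data)
  print(sum(model.gam$edf))   # total effective degrees of freedom
}
```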
[R] Error using mgcv package
Hi all,

I need a solution to the following problem. The following error appears when I use the mgcv package to fit a GAM, but the same formula works fine in the gam package.

    model.gam <- gam(formula = RES ~ CAT01+s(NUM01,5)+CAT02+CAT03+s(NUM02,5)+CAT04+
        CAT05+s(NUM03,5)+CAT06+CAT07+s(NUM04,5)+CAT08+s(NUM05,5)+CAT09+
        CAT10+s(NUM06,5)+CAT11+NUM07+CAT12+CAT13,
        family = binomial(link = logit), data = train.data,na.action = na.exclude,
        control = list(epsilon = 0.001,bf.epsilon = 0.001, maxit = 50,
        bf.maxit = 10, trace = F))
    Error in terms.formula(reformulate(term[i])) : invalid model formula in ExtractVars

And after deleting the df's:

    model.gam <- gam(formula = RES ~ CAT01+s(NUM01)+CAT02+CAT03+s(NUM02)+CAT04+
        CAT05+s(NUM03)+CAT06+CAT07+s(NUM04)+CAT08+s(NUM05)+CAT09+
        CAT10+s(NUM06)+CAT11+NUM07+CAT12+CAT13,
        family = binomial(link = logit), data = train.data)
    Error in smooth.construct.tp.smooth.spec(object, data, knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom

Can anybody shed some light on this case? Thanks in advance.

--
View this message in context: http://www.nabble.com/Error-using-mgcv-package-tf3900783.html#a11058255
[R] Coding categorical variables in mixed environment
Hi R users,

Suppose we have the following data for a regression model:

AGE: numerical
SEX: male/female, categorical
COLOR: {blue, green, pink}, categorical
RESPONSE: yes/no, categorical

    AGE  SEX  COLOR  RESPONSE
    10   M    BLUE   Y
    12   M    GREEN  N
    13   F    PINK   Y
    11   M    BLUE   Y
    13   M    GREEN  N
    09   F    GREEN  N
    15   F    BLUE   Y
    11   F    PINK   Y
    12   M    PINK   N
    14   M    GREEN  N

I want to code the categorical data as {male = 1, female = 2}, {blue = 1, green = 2, pink = 3}, {yes = 1, no = 0} and finally get the new table. How can I do this? Waiting for a reply. Thanks in advance.

Bye

--
View this message in context: http://www.nabble.com/Coding-categorical-variables-in-mixed-environment-tf3896721.html#a11046822
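The recoding asked for here can be sketched in base R by fixing the level order of each factor and converting it to integer codes. The data frame below is typed in from the table in the post; the object names dat and coded are hypothetical.

```r
dat <- data.frame(
  AGE = c(10, 12, 13, 11, 13, 9, 15, 11, 12, 14),
  SEX = c("M", "M", "F", "M", "M", "F", "F", "F", "M", "M"),
  COLOR = c("BLUE", "GREEN", "PINK", "BLUE", "GREEN",
            "GREEN", "BLUE", "PINK", "PINK", "GREEN"),
  RESPONSE = c("Y", "N", "Y", "Y", "N", "N", "Y", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

# The order given in 'levels' determines the integer code: first level = 1
coded <- data.frame(
  AGE = dat$AGE,
  SEX = as.integer(factor(dat$SEX, levels = c("M", "F"))),
  COLOR = as.integer(factor(dat$COLOR, levels = c("BLUE", "GREEN", "PINK"))),
  RESPONSE = as.integer(dat$RESPONSE == "Y")   # yes = 1, no = 0
)
print(coded)
```

For fitting a regression, keeping SEX and COLOR as factors is usually preferable to integer codes, since R then builds the dummy variables itself and does not treat COLOR as a numeric scale.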
[R] Determination of % of misclassification
Hi R-users,

Suppose I have a two-class discrimination problem and I am using logistic regression for the classification.

    model.logit <- glm(formula=RES~NUM01+NUM02+NUM03+NUM04, family=binomial(link=logit), data=train.data)
    predict.logit <- predict.glm(model.logit, newdata=test.data, type='response', se.fit=FALSE)
    predict.logit

I have two questions:

1. Suppose our training data consists of 700 observations and the testing set of 300. How can I determine the number of misclassifications from the predicted values and the fitted values?
2. How can I determine the AUC from the ROC curve, and also the threshold value?

Waiting for a reply. Thanks in advance.

Bye

--
View this message in context: http://www.nabble.com/Determination-of---of-misclassification-tf3899598.html#a11055026
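Both questions can be illustrated with a short base-R sketch. The data are simulated with a 700/300 split, the 0.5 cutoff is an arbitrary assumption, and choosing a threshold by the Youden index (sensitivity + specificity - 1) is only one of several possible rules. The helper auc is hypothetical and uses the rank (Mann-Whitney) form of the AUC.

```r
set.seed(123)
n <- 1000
all.data <- data.frame(NUM01 = rnorm(n), NUM02 = rnorm(n))
all.data$RES <- rbinom(n, 1, plogis(0.8 * all.data$NUM01 - 0.5 * all.data$NUM02))
train.data <- all.data[1:700, ]
test.data <- all.data[701:1000, ]

model.logit <- glm(RES ~ NUM01 + NUM02, family = binomial, data = train.data)
p.test <- predict(model.logit, newdata = test.data, type = "response")

# 1. Misclassifications at a chosen cutoff (0.5 is an assumption)
cutoff <- 0.5
pred.class <- as.integer(p.test >= cutoff)
conf.mat <- table(observed = test.data$RES, predicted = pred.class)
n.miss <- sum(pred.class != test.data$RES)
err.rate <- n.miss / nrow(test.data)

# 2a. AUC from the ranks of the predicted probabilities
auc <- function(y, p) {
  r <- rank(p)                       # average ranks handle ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc.test <- auc(test.data$RES, p.test)

# 2b. Threshold maximizing the Youden index over all observed cut points
cuts <- sort(unique(p.test))
youden <- sapply(cuts, function(cc) {
  sens <- mean(p.test[test.data$RES == 1] >= cc)
  spec <- mean(p.test[test.data$RES == 0] < cc)
  sens + spec - 1
})
best.cut <- cuts[which.max(youden)]

print(conf.mat)
print(c(misclassified = n.miss, error.rate = err.rate,
        AUC = auc.test, threshold = best.cut))
```

The same table() call applied to fitted(model.logit) and train.data$RES gives the training-set counts.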
[R] How to partition sample space
Hi R-users,

I need your help with the following problem. Suppose we have a regression problem with 25 predictor variables measured on 1000 individuals. I want to divide the data matrix (1000 x 25) into two partitions, for training (70%) and testing (30%). To do this, I want to sample 70% of the rows into a training matrix and put the remaining 30% into a testing matrix, using pseudorandom numbers (for future analysis). I need an efficient solution that generates both matrices in minimal time.

Thanks in advance.

Sabyasachi

--
View this message in context: http://www.nabble.com/How-to-partition-sample-space-tf3888059.html#a11021390
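A minimal sketch of such a split: draw the training row indices once with sample() and use negative indexing for the rest. The 1000 x 25 data frame below is simulated, and the object names are hypothetical. Setting the seed makes the partition reproducible.

```r
set.seed(2007)
n <- 1000
full <- as.data.frame(matrix(rnorm(n * 25), nrow = n))  # stand-in for the real data

# Sample 70% of the row indices without replacement
train.idx <- sample(n, size = round(0.7 * n))

train.data <- full[train.idx, ]    # 70% for training
test.data <- full[-train.idx, ]    # remaining 30% for testing

print(dim(train.data))
print(dim(test.data))
```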
[R] GLMM for unbalanced data
Hi friends,

I need some help regarding a generalized linear mixed model for unbalanced data.

1. Is there any package for applying Monte Carlo Newton-Raphson (MCNR) or Monte Carlo EM (MCEM) to estimate fixed and random effects?

2. My data is unbalanced (groups have unequal numbers of observations), and the random-effect design matrix does not contain only 1's but some function of x (the predictors), e.g.,

    z = [ avg(x1jk)  0          0    0    0    0
          0          avg(x2jk)  0    0    0    0
          ...        ...        ...  ...  ...  ...
          0          0          0    0    0    avg(x6jk) ]

where avg(xijk) = an (n_k x 1) column vector of the average of the jth measurement available for the ith subject in group k, and n_k is the number of observations in the kth group.

Is it possible to apply glmmPQL or any other package in this situation? If so, kindly tell me how.

Thanks in advance. Waiting for a reply.

--
View this message in context: http://www.nabble.com/GLMM-for-unbalanced-data-tf3762358.html#a10635105
[R] Creating contingency table from mixed data
Hi,

I am new to R. Please help me with the following case. I have this data in hand: http://www.nabble.com/file/8225/Data.txt Data.txt

There are some categorical (binary and nominal) and continuous variables. How can I get a generic R x C contingency table from this data? My main objective is to find the count in each cell and the mean of the continuous variables in each cell.

Please reply. Thanks in advance.

--
View this message in context: http://www.nabble.com/Creating-contingency-table-from-mixed-data-tf3698055.html#a10341180
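Since the linked Data.txt is not reproduced here, the sketch below uses a small simulated data frame with hypothetical column names (sex, color, age). It shows the two pieces asked for: cell counts with table() and cell means of a continuous variable with tapply(), plus a long-format summary with aggregate().

```r
set.seed(7)
d <- data.frame(
  sex = sample(c("F", "M"), 60, replace = TRUE),
  color = sample(c("blue", "green", "pink"), 60, replace = TRUE),
  age = round(rnorm(60, mean = 40, sd = 10))
)

# R x C table of counts for two categorical variables
counts <- table(d$sex, d$color)

# Mean of a continuous variable within each cell of the same table
cell.means <- tapply(d$age, list(d$sex, d$color), mean)

# Long format: one row per cell, with count and mean side by side
long <- aggregate(age ~ sex + color, data = d,
                  FUN = function(v) c(n = length(v), mean = mean(v)))

print(counts)
print(round(cell.means, 1))
print(long)
```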