Re: [R] Package for .632 (and .632+) bootstrap and the cross-validation of ROC Parameters

2007-07-13 Thread spime

Suppose I have

Training data: my.train
Testing data: my.test

I want to calculate the bootstrap error rate for a logistic model. My wrapper
function for prediction is:

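pred.glm <- function(object, newdata) {
  # predicted probabilities P(RES = 1) from the fitted logistic model
  prob <- predict.glm(object, newdata, type = 'response')
  # class 0 below the 0.4 cut-off, class 1 otherwise; fixed levels keep
  # both classes even if only one of them is predicted
  ret <- factor(ifelse(prob < 0.4, 0, 1), levels = c(0, 1))
  return(ret)
}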

But I can't work out what should be passed as data in the following call if I
want to calculate the misclassification error for my testing data:

errorest(RES ~ ., data = ???, model = glm, estimator = "boot", predict = pred.glm,
   est.para = control.errorest(nboot = 10))

Using my.test, I got the following error:

Error in predict(mymodel, newdata = outbootdata) : 
unused argument(s) (newdata = list(RES = c(1, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1,
1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
0), CAT01 = c(4, 4, 2, 4, 4, 4, 4, 4, 4, 2, 1, 2, 2, 4, 4, 4, 1, 1, 2, 2, 1,
4, 1, 4, 1, 4, 2, 4, 1, 4, 2, 3, 1, 1, 3, 3, 4, 2, 4, 2, 1, 2, 2, 1, 1, 
 

Please reply.
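For reference: ipred's errorest() resamples the data frame passed as data (the error above shows it predicting on a bootstrap out-of-bag set), so that argument would normally be the learning sample, e.g. my.train, not my.test; for a separate test set the error is simply the share of wrong predictions on it. The base-R sketch below reproduces the idea without ipred, assuming a 0/1 response and the 0.4 cut-off of pred.glm. The simulated data frame sim is a hypothetical stand-in for my.train. It averages the out-of-bag error over replicates (Efron's leave-one-out bootstrap averages per observation instead) and then forms the .632 estimate; it is not ipred's implementation.

```r
set.seed(1)

# Hypothetical stand-in for my.train: 0/1 response RES and two predictors
n   <- 200
sim <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
sim$RES <- rbinom(n, 1, plogis(0.8 * sim$X1 - 0.5 * sim$X2))

# Same rule as pred.glm: class 0 if P(RES = 1) < 0.4, else class 1
classify <- function(fit, newdata, cutoff = 0.4) {
  prob <- predict(fit, newdata = newdata, type = "response")
  ifelse(prob < cutoff, 0, 1)
}
misclass <- function(y, yhat) mean(y != yhat)

# Apparent (resubstitution) error: fit and evaluate on the same data
fit_all <- glm(RES ~ ., family = binomial, data = sim)
err_app <- misclass(sim$RES, classify(fit_all, sim))

# Out-of-bag bootstrap error: fit on a bootstrap sample and
# evaluate on the observations that were not drawn into it
nboot   <- 50
err_oob <- numeric(nboot)
for (b in seq_len(nboot)) {
  idx  <- sample(n, n, replace = TRUE)
  oob  <- setdiff(seq_len(n), idx)
  fitb <- glm(RES ~ ., family = binomial, data = sim[idx, ])
  err_oob[b] <- misclass(sim$RES[oob], classify(fitb, sim[oob, ]))
}
err_boot <- mean(err_oob)

# .632 estimate: weighted mix of apparent and out-of-bag error
err_632 <- 0.368 * err_app + 0.632 * err_boot
c(apparent = err_app, boot = err_boot, `.632` = err_632)
```

With ipred itself, the corresponding call would keep data = my.train and add family = binomial (extra arguments are passed on to the model function); estimator = "632plus" selects the .632+ variant.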






Frank E Harrell Jr wrote:
 
 spime wrote:
 
 Hi users,
 
 I need to calculate the .632 (and .632+) bootstrap and cross-validation
 estimates of the area under the curve (AUC) to compare my models. Is there
 a package for this? I know about 'ipred', and with it I can calculate
 misclassification errors.
 
 Please help. It's urgent. 
 
 See the validate* functions in the Design package.
 
 Note that some simulations (see http://biostat.mc.vanderbilt.edu/rms) 
 indicate that the advantages of .632 and .632+ over the ordinary 
 bootstrap are highly dependent on the choice of the accuracy measure 
 being validated.  The bootstrap variants seem to have advantages mainly 
 if an improper, inefficient, discontinuous scoring rule such as the 
 percent classified correct is used.
 
 -- 
 Frank E Harrell Jr   Professor and Chair   School of Medicine
   Department of Biostatistics   Vanderbilt University
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/Package-for-.632-%28and-.632%2B%29-bootstrap-and-the-cross-validation-of-ROC-Parameters-tf4068544.html#a11578129
Sent from the R help mailing list archive at Nabble.com.



[R] Package for .632 (and .632+) bootstrap and the cross-validation of ROC Parameters

2007-07-12 Thread spime


Hi users,

I need to calculate the .632 (and .632+) bootstrap and cross-validation
estimates of the area under the curve (AUC) to compare my models. Is there
a package for this? I know about 'ipred', and with it I can calculate
misclassification errors.

Please help. It's urgent. 
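Since Harrell's reply points to the ordinary bootstrap with a continuous accuracy measure, here is a base-R sketch of bootstrap optimism correction for the AUC of a glm logistic fit, in the spirit of the validate functions (which report Somers' Dxy = 2(C - 0.5) rather than C). The data are simulated and hypothetical, and the AUC is computed as the Mann-Whitney statistic; this is an illustration under those assumptions, not the Design package's code.

```r
set.seed(2)
n <- 300
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))  # hypothetical data
d$RES <- rbinom(n, 1, plogis(-0.3 + 1.0 * d$X1 + 0.5 * d$X2))

# AUC (concordance index C) as the Mann-Whitney statistic; ties count 1/2
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

form    <- RES ~ X1 + X2 + X3
fit     <- glm(form, family = binomial, data = d)
auc_app <- auc(d$RES, fitted(fit))              # apparent AUC

B <- 200
optimism <- numeric(B)
for (b in seq_len(B)) {
  idx  <- sample(n, n, replace = TRUE)
  fitb <- glm(form, family = binomial, data = d[idx, ])
  # AUC of the bootstrap model on its own bootstrap sample ...
  auc_in  <- auc(d$RES[idx], fitted(fitb))
  # ... minus its AUC on the original sample
  auc_out <- auc(d$RES, predict(fitb, newdata = d, type = "response"))
  optimism[b] <- auc_in - auc_out
}
auc_corrected <- auc_app - mean(optimism)
dxy_corrected <- 2 * (auc_corrected - 0.5)      # Somers' Dxy on the same scale as validate()
c(apparent = auc_app, corrected = auc_corrected, Dxy = dxy_corrected)
```

Cross-validated AUC follows the same pattern with folds in place of bootstrap samples.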
-- 
View this message in context: 
http://www.nabble.com/Package-for-.632-%28and-.632%2B%29-bootstrap-and-the-cross-validation-of-ROC-Parameters-tf4068544.html#a11561405
Sent from the R help mailing list archive at Nabble.com.



[R] two basic question regarding model selection in GAM

2007-06-22 Thread spime

Question #1
Model selection in GAM can be done by using:
1. step.gam {gam} : A directional stepwise search
2. gam {mgcv} : Smoothness estimation using GCV or UBRE/AIC criterion

Suppose my model starts as an additive model (linear part + spline part).
Using gam() {mgcv} I got the estimated degrees of freedom (edf) for the
smoothing splines. Now I want to use the functional form of my model, with
these estimated degrees of freedom, in step.gam() {gam} to search for a better
model.

However, mgcv masks gam: once mgcv is loaded, I cannot use gam() from the gam
package. Is there any way to stop mgcv from masking it?

Question #2
Suppose I have three models:
M1. GAM with thin plate regression spline(TPRS)
M2. GAM with cubic regression spline(CRS)
M3. GAM with some TPRS and CRS

To choose the best model among the three, can I use their GCV/UBRE/AIC
scores?
-- 
View this message in context: 
http://www.nabble.com/two-basic-question-regarding-model-selection-in-GAM-tf3963362.html#a11248016
Sent from the R help mailing list archive at Nabble.com.



[R] How to hide axis interval values in a plot

2007-06-21 Thread spime



plot(cars)

This shows a plot with numeric tick labels on the axes (x-axis: 5-25;
y-axis: 0-120). I want to hide these values. Is there any way?
-- 
View this message in context: 
http://www.nabble.com/How-to-hide-axis-interval-values-in-a-plot-tf3960418.html#a11238540
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to hide axis interval values in a plot

2007-06-21 Thread spime

Thanks, I got my answer.



spime wrote:
 
 
 
plot(cars)
 
 This shows a plot with numeric tick labels on the axes (x-axis: 5-25;
 y-axis: 0-120). I want to hide these values. Is there any way?
 

-- 
View this message in context: 
http://www.nabble.com/How-to-hide-axis-interval-values-in-a-plot-tf3960418.html#a11240427
Sent from the R help mailing list archive at Nabble.com.



[R] BIC and Hosmer-Lemeshow statistic for logistic regression

2007-06-19 Thread spime


I haven't found any helpful thread. How can I calculate the BIC and the
Hosmer-Lemeshow statistic for a logistic regression model? I have used glm
for the logistic fit.
-- 
View this message in context: 
http://www.nabble.com/BIC-and-Hosmer-Lemeshow-statistic-for-logistic-regression-tf3945943.html#a11193273
Sent from the R help mailing list archive at Nabble.com.



Re: [R] BIC and Hosmer-Lemeshow statistic for logistic regression

2007-06-19 Thread spime


Is there a Windows version of the Design package?






Frank E Harrell Jr wrote:
 
 spime wrote:
 
 I haven't found any helpful thread. How can I calculate the BIC and the
 Hosmer-Lemeshow statistic for a logistic regression model? I have used
 glm for the logistic fit.
 
 See the Design package's lrm function and residuals.lrm for a better GOF 
 test.
 
 
 
 -- 
 Frank E Harrell Jr   Professor and Chair   School of Medicine
   Department of Biostatistics   Vanderbilt University
 
 
 

-- 
View this message in context: 
http://www.nabble.com/BIC-and-Hosmer-Lemeshow-statistic-for-logistic-regression-tf3945943.html#a11195410
Sent from the R help mailing list archive at Nabble.com.



[R] Loading problem with R2HTML package

2007-06-17 Thread spime


I have downloaded the latest version of R2HTML (v1.54) for a 64-bit Windows
PC. My R version is 2.5.0. The problem arises when I want to install
SciViews-R, which needs the R2HTML package.


 library(R2HTML)
Error in `parent.env-`(`*tmp*`, value = NULL) : 
use of NULL environment is defunct
Error: package/namespace load failed for 'R2HTML'

Any remedy?

Regards


-- 
View this message in context: 
http://www.nabble.com/Loading-problem-with-R2HTML-package-tf3938384.html#a11170223
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Error using mgcv package

2007-06-12 Thread spime

Dear Mr. Oksanen,

First of all, thanks for your reply. I have solved the problem as follows.
My data contain some categorical predictors (CAT..) and also some numerical
variables (NUM..) that take only the values {0,1} or {0,1,2,3}. When fitting
the GAM, I simply did not use splines for those variables. I came to this
decision because, when I tested the same data in S-PLUS, I got an error saying
that s(...) cannot be applied to predictors with fewer than 4 distinct values.
I don't know whether gam() in S-PLUS and gam() in mgcv (R) are the same or
not. Anyway, thanks for your kind reply.

bye


Jari Oksanen wrote:
 
 spime sabya23 at gmail.com writes:
 
 
 
 Hi all,
 
 I need a solution to the following problem. The following error appears
 when I use the mgcv package to fit a GAM, but the same formula works
 fine in the gam package.
 
 model.gam <- gam(formula = RES ~
     CAT01+s(NUM01,5)+CAT02+CAT03+s(NUM02,5)+CAT04+
     CAT05+s(NUM03,5)+CAT06+CAT07+s(NUM04,5)+CAT08+s(NUM05,5)+CAT09+
     CAT10+s(NUM06,5)+CAT11+NUM07+CAT12+CAT13,
     family = binomial(link = logit), data = train.data,
     na.action = na.exclude,
     control = list(epsilon = 0.001, bf.epsilon = 0.001, maxit = 50,
     bf.maxit = 10, trace = FALSE))
 
 Error in terms.formula(reformulate(term[i])) : 
 invalid model formula in ExtractVars
 
 It seems that nobody answered this (in public). 
 
 It seems that function s() in mgcv is defined as:
 
 s(..., k = -1, fx = FALSE, bs = "tp", m = 0, by = NA)
 
 (as you can see by reading its help page, ?s). The function definition
 starts with ..., and arguments after the three dots cannot be matched by
 position: you must give the full argument name. Try replacing s(NUM01, 5)
 with s(NUM01, k=5). See also the help in mgcv (?s, pointing to ?choose.k)
 on interpreting the argument 'k', which is not directly the degrees of
 freedom.
 
 There may be other problems, but this probably fixes the one you reported
 above.
 
 cheers, jari oksanen
 
 And after deleting the df's:
 
 model.gam <- gam(formula = RES ~
     CAT01+s(NUM01)+CAT02+CAT03+s(NUM02)+CAT04+
     CAT05+s(NUM03)+CAT06+CAT07+s(NUM04)+CAT08+s(NUM05)+CAT09+
     CAT10+s(NUM06)+CAT11+NUM07+CAT12+CAT13,
     family = binomial(link = logit), data = train.data)
 
 Error in smooth.construct.tp.smooth.spec(object, data, knots) : 
 A term has fewer unique covariate combinations than specified
 maximum degrees of freedom
 
 Can anybody shed some light on this?
 
 Thanks in advance.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Error-using-mgcv-package-tf3900783.html#a11075667
Sent from the R help mailing list archive at Nabble.com.



[R] Error using mgcv package

2007-06-11 Thread spime

Hi all,

I need a solution to the following problem. The following error appears
when I use the mgcv package to fit a GAM, but the same formula works fine in
the gam package.

model.gam <- gam(formula = RES ~
    CAT01+s(NUM01,5)+CAT02+CAT03+s(NUM02,5)+CAT04+
    CAT05+s(NUM03,5)+CAT06+CAT07+s(NUM04,5)+CAT08+s(NUM05,5)+CAT09+
    CAT10+s(NUM06,5)+CAT11+NUM07+CAT12+CAT13,
    family = binomial(link = logit), data = train.data, na.action = na.exclude,
    control = list(epsilon = 0.001, bf.epsilon = 0.001, maxit = 50,
                   bf.maxit = 10, trace = FALSE))

Error in terms.formula(reformulate(term[i])) : 
invalid model formula in ExtractVars

And after deleting the df's:

model.gam <- gam(formula = RES ~ CAT01+s(NUM01)+CAT02+CAT03+s(NUM02)+CAT04+
    CAT05+s(NUM03)+CAT06+CAT07+s(NUM04)+CAT08+s(NUM05)+CAT09+
    CAT10+s(NUM06)+CAT11+NUM07+CAT12+CAT13,
    family = binomial(link = logit), data = train.data)

Error in smooth.construct.tp.smooth.spec(object, data, knots) : 
A term has fewer unique covariate combinations than specified
maximum degrees of freedom
 

Can anybody shed some light on this?

Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/Error-using-mgcv-package-tf3900783.html#a11058255
Sent from the R help mailing list archive at Nabble.com.



[R] Coding categorical variables in mixed environment

2007-06-10 Thread spime

Hi R users,

Suppose we have the following data for a regression model:

AGE: numerical
SEX: male/female, categorical
COLOR: {blue, green, pink}, categorical
RESPONSE: yes/no, categorical

AGE  SEX  COLOR  RESPONSE
10   M    BLUE   Y
12   M    GREEN  N
13   F    PINK   Y
11   M    BLUE   Y
13   M    GREEN  N
09   F    GREEN  N
15   F    BLUE   Y
11   F    PINK   Y
12   M    PINK   N
14   M    GREEN  N

I want to code the categorical data as {male = 1, female = 2}, {blue = 1,
green = 2, pink = 3}, {yes = 1, no = 0} and finally get the new table.

How can I do this?
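A base-R sketch that types in the ten rows above and recodes them with match(), whose position in the lookup vector gives the requested code, plus a logical comparison for the 0/1 response. Note that for modelling, keeping SEX and COLOR as factors is usually preferable: lm() or glm() would treat an integer COLOR = 1, 2, 3 as a numeric trend rather than three categories.

```r
d <- data.frame(
  AGE      = c(10, 12, 13, 11, 13, 9, 15, 11, 12, 14),
  SEX      = c("M", "M", "F", "M", "M", "F", "F", "F", "M", "M"),
  COLOR    = c("BLUE", "GREEN", "PINK", "BLUE", "GREEN",
               "GREEN", "BLUE", "PINK", "PINK", "GREEN"),
  RESPONSE = c("Y", "N", "Y", "Y", "N", "N", "Y", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

coded <- data.frame(
  AGE      = d$AGE,
  SEX      = match(d$SEX, c("M", "F")),                  # male = 1, female = 2
  COLOR    = match(d$COLOR, c("BLUE", "GREEN", "PINK")), # blue = 1, green = 2, pink = 3
  RESPONSE = as.integer(d$RESPONSE == "Y")               # yes = 1, no = 0
)
coded
```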

Waiting for a reply. Thanks in advance.

bye

 
-- 
View this message in context: 
http://www.nabble.com/Coding-categorical-variables-in-mixed-environment-tf3896721.html#a11046822
Sent from the R help mailing list archive at Nabble.com.



[R] Determination of % of misclassification

2007-06-10 Thread spime

Hi R-users,

Suppose I have a two-class discrimination problem and I am using logistic
regression for the classification.

model.logit <- glm(formula = RES ~ NUM01 + NUM02 + NUM03 + NUM04,
                   family = binomial(link = logit), data = train.data)
predict.logit <- predict.glm(model.logit, newdata = test.data,
                             type = 'response', se.fit = FALSE)
predict.logit

I have two questions:

1. Suppose the training data consist of 700 observations and the testing set
of 300. How can I determine the number of misclassifications from the
predicted values (testing set) and the fitted values (training set)?

2. How can I determine the AUC from the ROC curve, and also a threshold value?
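A base-R sketch covering both questions on simulated, hypothetical data split 700/300 like the post. Misclassifications are counted at a 0.5 cut-off; the AUC is computed as the Mann-Whitney statistic (the probability that a randomly chosen 1 scores higher than a randomly chosen 0). The threshold shown maximises Youden's J = sensitivity + specificity - 1 on the training data, which is only one of several possible criteria; choosing it on the test set would make the test error optimistic.

```r
set.seed(5)
n   <- 1000
dat <- data.frame(NUM01 = rnorm(n), NUM02 = rnorm(n),
                  NUM03 = rnorm(n), NUM04 = rnorm(n))      # hypothetical data
dat$RES <- rbinom(n, 1, plogis(-0.2 + 0.9 * dat$NUM01 - 0.6 * dat$NUM03))
idx        <- sample(n, 700)
train.data <- dat[idx, ]
test.data  <- dat[-idx, ]

model.logit <- glm(RES ~ NUM01 + NUM02 + NUM03 + NUM04,
                   family = binomial(link = logit), data = train.data)
p_train <- fitted(model.logit)                              # fitted values
p_test  <- predict(model.logit, newdata = test.data, type = "response")

# 1. Misclassifications at a 0.5 cut-off
cutoff    <- 0.5
conf_test <- table(observed  = test.data$RES,
                   predicted = as.integer(p_test >= cutoff))
n_wrong_train <- sum(train.data$RES != as.integer(p_train >= cutoff))
n_wrong_test  <- sum(test.data$RES  != as.integer(p_test  >= cutoff))
err_test      <- n_wrong_test / nrow(test.data)

# 2. AUC as the Mann-Whitney statistic; ties count 1/2
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc_test <- auc(test.data$RES, p_test)

# Threshold maximising Youden's J, chosen on the training data
thr  <- sort(unique(p_train))
sens <- sapply(thr, function(t) mean(p_train[train.data$RES == 1] >= t))
spec <- sapply(thr, function(t) mean(p_train[train.data$RES == 0] <  t))
best_thr      <- thr[which.max(sens + spec - 1)]
err_test_best <- mean(test.data$RES != as.integer(p_test >= best_thr))

conf_test
c(n_wrong_test = n_wrong_test, err_test = err_test,
  auc_test = auc_test, best_thr = best_thr)
```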

Waiting for a reply,

Thanks in advance,

bye
-- 
View this message in context: 
http://www.nabble.com/Determination-of---of-misclassification-tf3899598.html#a11055026
Sent from the R help mailing list archive at Nabble.com.



[R] How to partition sample space

2007-06-08 Thread spime

Hi R-users,

I need your help with the following problem. Suppose we have a regression
problem with 25 predictor variables measured on 1000 individuals. I want to
divide the data matrix (1000 x 25) into two partitions, for training (70%)
and testing (30%). To do this, I sample 70% of the rows into a training
matrix and the remaining 30% into a testing matrix using pseudorandom
numbers (for future analysis).

I need an efficient solution so that both matrices can be generated in
minimal time.
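A minimal base-R sketch, assuming a numeric matrix X standing in for the data (a data frame is indexed the same way): one call to sample() draws the training row indices without replacement, and negative indexing gives the complementary test rows, so no loop is needed. set.seed() makes the split reproducible.

```r
set.seed(123)                                # reproducible split
X <- matrix(rnorm(1000 * 25), nrow = 1000)   # hypothetical 1000 x 25 data matrix

n_train   <- round(0.7 * nrow(X))
train_idx <- sample(nrow(X), n_train)        # row indices, without replacement
train <- X[train_idx, , drop = FALSE]        # 70% of the rows
test  <- X[-train_idx, , drop = FALSE]       # the remaining 30%
```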

Thanks in advance.

Sabyasachi
-- 
View this message in context: 
http://www.nabble.com/How-to-partition-sample-space-tf3888059.html#a11021390
Sent from the R help mailing list archive at Nabble.com.



[R] GLMM for unbalanced data

2007-05-15 Thread spime

Hi friends,

I need some help regarding generalized linear mixed models for unbalanced
data.

1. Is there any package that applies Monte-Carlo Newton-Raphson (MCNR) or
Monte-Carlo EM (MCEM) to estimate fixed and random effects?

2. My data are unbalanced (groups have unequal numbers of observations), and
the random-effects design matrix does not contain only 1's but some function
of x (the predictors), e.g.,


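$$
z = \begin{pmatrix}
\operatorname{avg}(x_{1jk}) & 0 & 0 & 0 & 0 & 0 \\
0 & \operatorname{avg}(x_{2jk}) & 0 & 0 & 0 & 0 \\
\vdots & & \ddots & & & \vdots \\
0 & 0 & 0 & 0 & 0 & \operatorname{avg}(x_{6jk})
\end{pmatrix}
$$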

where avg(x_ijk) is an (n_k x 1) column vector of the averages of the jth
measurement available for the ith subject in group k, and n_k is the number
of observations in the kth group.

Is it possible to apply glmmPQL or any other package in this situation? If
so, kindly tell me how.

Thanks in advance.
Waiting for a reply.


  
-- 
View this message in context: 
http://www.nabble.com/GLMM-for-unbalanced-data-tf3762358.html#a10635105
Sent from the R help mailing list archive at Nabble.com.



[R] Creating contingency table from mixed data

2007-05-05 Thread spime

Hi,

I am new to R. Please help me with the following case.

I have data in hand:
http://www.nabble.com/file/8225/Data.txt Data.txt 

There are some categorical (binary and nominal) and continuous variables.

How can I get a generic R x C contingency table from this table? My main
objective is to find the count in each cell and the mean of the continuous
variables in each cell.

Please reply.

Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/Creating-contingency-table-from-mixed-data-tf3698055.html#a10341180
Sent from the R help mailing list archive at Nabble.com.
