[R] logistic regression
Greetings,

I am working on a logistic regression model in R, which is a relatively new program for me, and I am struggling with the code. In searching Google for 'logistic regression diagnostics' I came across Elizabeth Brown's Lecture 14 from her Winter 2004 Biostatistics 515 course (http://courses.washington.edu/b515/l14.pdf). I found most of the code very helpful, but I am struggling with the lines that calculate the observed and expected values in the 10 groups created by the cut function. I get error messages when trying to create the E and O matrices: R won't accept the assignment using fitc==j, and it won't calculate the sum. I am wondering whether someone might be able to offer me some assistance; my search of the archives was not fruitful.

Here is the code that I adapted from the lecture notes:

fit <- fitted(glm.lyme)
fitc <- cut(fit, br = c(0, quantile(fit, p = seq(.1, .9, .1)), 1))
t <- table(fitc)
fitc <- cut(fit, br = c(0, quantile(fit, p = seq(.1, .9, .1)), 1), labels = F)
t <- table(fitc)

# Calculate observed and expected values in each group
E <- matrix(0, nrow = 10, ncol = 2)
O <- matrix(0, nrow = 10, ncol = 2)
for (j in 1:10) {
  E[j, 2] <- sum(fit[fitc == j])
  E[j, 1] <- sum((1 - fit)[fitc == j])
  O[j, 2] <- sum(pcdata$lymdis[fitc == j])
  O[j, 1] <- sum((1 - pcdata$lymdis)[fitc == j])
}

Here is the error message:

Error in Summary.factor(..., na.rm = na.rm) : sum not meaningful for factors

I understand what it means; I just can't figure out how to get around it, or how to get the output printed in table form. Thank you in advance for any assistance.

Mary Sullivan
Re: [R] logistic regression
Mary,

The 10-group approach results in a low-resolution and fairly arbitrary calibration curve. It is also the basis of the original Hosmer-Lemeshow goodness-of-fit statistic, which has been superseded by the Hosmer et al. single-degree-of-freedom GOF test that does not require any binning. The Design package handles both. Do ?calibrate.lrm, ?residuals.lrm, ?lrm for details.

Frank Harrell

Sullivan, Mary M wrote:
> Greetings, I am working on a logistic regression model in R and I am
> struggling with the code [...]

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
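A minimal sketch of the Design-based workflow Frank describes. The model call is an assumption (Mary's post does not show the formula behind glm.lyme, so `lymdis ~ .` on pcdata is a placeholder):

library(Design)  # provides lrm(), calibrate(), residuals.lrm()

# refit with lrm, keeping design matrix and response for resampling
f <- lrm(lymdis ~ ., data = pcdata, x = TRUE, y = TRUE)

# bootstrap-overfitting-corrected, smooth (unbinned) calibration curve
cal <- calibrate(f, B = 200)
plot(cal)

# single-d.f. unbinned goodness-of-fit test via residuals.lrm
resid(f, type = "gof")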
Re: [R] logistic regression
Maybe try making sure the data is numeric:

fac.to.num <- function(x) as.numeric(as.character(x))

On 26-Jul-07, at 9:34 AM, Sullivan, Mary M wrote:
> Greetings, I am working on a logistic regression model in R and I am
> struggling with the code [...]

--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University
Website: http://memetic.ca

The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less.
- Piet Hein
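Putting the two replies together, a sketch of the fix, assuming (as the error message implies) that pcdata$lymdis is stored as a factor with levels "0" and "1":

# convert the factor outcome to numeric 0/1 before summing
lymdis.num <- as.numeric(as.character(pcdata$lymdis))

E <- matrix(0, nrow = 10, ncol = 2)
O <- matrix(0, nrow = 10, ncol = 2)
for (j in 1:10) {
  E[j, 2] <- sum(fit[fitc == j])         # expected events in group j
  E[j, 1] <- sum((1 - fit)[fitc == j])   # expected non-events
  O[j, 2] <- sum(lymdis.num[fitc == j])        # observed events
  O[j, 1] <- sum((1 - lymdis.num)[fitc == j])  # observed non-events
}

# print observed and expected side by side in table form
out <- cbind(O, E)
colnames(out) <- c("obs.0", "obs.1", "exp.0", "exp.1")
out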
Re: [R] logistic regression and dummy variable coding
Bingshan Li wrote:
> Hi All, Now it works. Thanks for all your answers and the explanations
> are very clear. Bingshan

But note that you are not using R correctly unless you are doing a simulation and have some special speed issues. Let the model functions do all this for you.

Frank

On Jun 28, 2007, at 7:44 PM, Seyed Reza Jafarzadeh wrote:
> NewVar <- relevel(factor(OldVar), ref = "b") should create a dummy
> variable, and change the reference category for the model. [...]

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] logistic regression and dummy variable coding
Hi Frank,

I do not quite get you. What do you mean by simulation and speed issues? I do not see why they have to be considered in logistic regression. Thanks.

Bingshan

From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED]]
Sent: Fri 6/29/2007 7:40 AM
To: Li, Bingshan
Cc: Seyed Reza Jafarzadeh; r-help@stat.math.ethz.ch
Subject: Re: [R] logistic regression and dummy variable coding

> But note that you are not using R correctly unless you are doing a
> simulation and have some special speed issues. Let the model functions
> do all this for you. Frank [...]
Re: [R] logistic regression and dummy variable coding
Li, Bingshan wrote:
> Hi Frank, I do not quite get you. What do you mean by simulation and
> speed issues? I do not see why they have to be considered in logistic
> regression.

Exactly. So don't use techniques that are only needed when such issues do have to be considered.

> From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED]]
> Subject: Re: [R] logistic regression and dummy variable coding
> But note that you are not using R correctly unless you are doing a
> simulation and have some special speed issues. Let the model functions
> do all this for you. Frank [...]

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
[R] logistic regression and dummy variable coding
Hello everyone,

I have a variable with several categories and I want to convert this into dummy variables and do logistic regression on it. I used model.matrix to create dummy variables, but it always picked the smallest one as the reference. For example,

model.matrix(~ ., data = as.data.frame(letters[1:5]))

will code 'a' as '0 0 0 0'. But I want to code another category as reference, say 'b'. How do I do it in R using model.matrix? Is there another way to do it if model.matrix has no such functionality?

Thanks!
Re: [R] logistic regression and dummy variable coding
On Thu, 2007-06-28 at 18:16 -0500, Bingshan Li wrote:
> Hello everyone, I have a variable with several categories and I want to
> convert this into dummy variables and do logistic regression on it. [...]

See ?relevel.

Note that this (creating dummy variables) will be done automatically in R's modeling functions, which default to treatment contrasts on factors. model.matrix() is used internally by model functions such as glm(). For example, using a single factor:

> FL <- factor(letters[1:5])
> FL
[1] a b c d e
Levels: a b c d e

> contrasts(FL)
  b c d e
a 0 0 0 0
b 1 0 0 0
c 0 1 0 0
d 0 0 1 0
e 0 0 0 1

> FL.b <- relevel(FL, "b")
> FL.b
[1] a b c d e
Levels: b a c d e

> contrasts(FL.b)
  a c d e
b 0 0 0 0
a 1 0 0 0
c 0 1 0 0
d 0 0 1 0
e 0 0 0 1

See ?contrasts and the Statistical Models section in An Introduction to R.

HTH,

Marc Schwartz
Re: [R] logistic regression and dummy variable coding
NewVar <- relevel(factor(OldVar), ref = "b")

should create a dummy variable, and change the reference category for the model.

Reza

On 6/28/07, Bingshan Li [EMAIL PROTECTED] wrote:
> Hello everyone, I have a variable with several categories and I want to
> convert this into dummy variables and do logistic regression on it. [...]
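A quick sketch of how the releveled factor changes the fitted parameterization. The data here are made up purely for illustration (the toy outcome is unrelated to the predictor; only the choice of reference level matters):

set.seed(1)
OldVar <- sample(letters[1:3], 100, replace = TRUE)  # categories a, b, c
y <- rbinom(100, 1, 0.4)                             # toy binary outcome

# default: 'a' (first level) is the reference category
fit.a <- glm(y ~ factor(OldVar), family = binomial)

# after releveling, 'b' is the reference, so the coefficients are
# log-odds ratios of a and c versus b
NewVar <- relevel(factor(OldVar), ref = "b")
fit.b <- glm(y ~ NewVar, family = binomial)

coef(fit.a)
coef(fit.b)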
Re: [R] logistic regression and dummy variable coding
Hi All,

Now it works. Thanks for all your answers; the explanations are very clear.

Bingshan

On Jun 28, 2007, at 7:44 PM, Seyed Reza Jafarzadeh wrote:
> NewVar <- relevel(factor(OldVar), ref = "b") should create a dummy
> variable, and change the reference category for the model. Reza [...]
[R] R logistic regression - comparison with SPSS
Dear R-list members,

I have been a user of SPSS for a few years and am quite new to R. I read the documentation and tried samples, but I have some problems obtaining results for a logistic regression under R. The following SPSS script:

LOGISTIC REGRESSION vir
 /METHOD = FSTEP(LR) d007 d008 d009 d010 d011 d012 d013 d014 d015 d016
   d017 d018 d069 d072 d073
 /SAVE = PRED COOK SRESID
 /CLASSPLOT
 /PRINT = GOODFIT CI(95)
 /CRITERIA = PIN(.10) POUT(.10) ITERATE(40) CUT(.5) .

predicts vir (value 0 or 1) from my parameters d007 to d073. It gives me the parameters to retain in the logistic equation and the intercept. The calculation is made from a set of about 1,000 cases. I have been unable to translate it with success to R, and I would like to check whether I can obtain the same results as with SPSS. Can someone help me translate it to R? I would be most grateful.

I thank you. Best regards.

--
Alain Reymond
CEIA
Bd Saint-Michel 119
1040 Bruxelles
Tel: +32 2 736 04 58
Fax: +32 2 736 58 02
PGPId : 0xEFB06E2E
Re: [R] R logistic regression - comparison with SPSS
Alain Reymond wrote:
> Dear R-list members, I have been a user of SPSS for a few years and am
> quite new to R. [...] Can someone help me translate it to R?

If all the variables you mention are available in a data frame, e.g. virdf, then you can fit a logistic regression model by

mymodel <- glm(vir ~ d007 + d008 + d009 + d010 + d011 + d012 + d013 +
               d014 + d015 + d016 + d017 + d018 + d069 + d072 + d073,
               data = virdf, family = binomial)

or

mymodel <- glm(vir ~ ., data = virdf, family = binomial)

if there are no variables other than those mentioned above in the virdf data frame.

Contrary to SPSS, you need not specify in advance what you would like as output. Everything useful is stored in the model object (here: mymodel), which can then be used to further investigate the model in many ways:

summary(mymodel)
anova(mymodel, test = "Chisq")
plot(mymodel)

See ?summary.glm, ?anova.glm etc. For stepwise variable selection (not necessarily corresponding to FSTEP(LR)), see ?step, or ?add1 to do it `by hand'.

HTH,
Tobias

P.S. You can find an introduction to R specifically targeted at (SAS and) SPSS users here: http://oit.utk.edu/scc/RforSASSPSSusers.pdf

--
Tobias Verbeke - Consultant
Business & Decision Benelux
Rue de la révolution 8
1000 Brussels - BELGIUM
+32 499 36 33 15
[EMAIL PROTECTED]
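To mimic SPSS's forward stepwise selection more closely, a sketch using step() with an explicit scope. Note this is only an approximation of FSTEP(LR): step() selects on AIC rather than the likelihood-ratio PIN/POUT criteria, so the retained variables may differ:

# start from the intercept-only model and allow forward steps
null.model <- glm(vir ~ 1, data = virdf, family = binomial)
full.scope <- vir ~ d007 + d008 + d009 + d010 + d011 + d012 + d013 +
              d014 + d015 + d016 + d017 + d018 + d069 + d072 + d073

fwd <- step(null.model, scope = full.scope, direction = "forward")
summary(fwd)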
[R] Logistic Regression Question: Risk Threshold
Hi,

I am working on problem 2 of Chapter 8 in Data Analysis and Graphics Using R and don't know how to approach the second half of the question:

In the data set head.injury (an artificial one of 3121 patients, similar to a subset of the data analyzed in Stiell et al., 2001), obtain a logistic regression model relating clinically.important.brain.injury to other variables. Patients whose risk is sufficiently high will be sent for CT (computed tomography). Using a risk threshold of 0.025 (2.5%), turn the result into a decision rule for use of CT.

This is what I have so far:

> names(head.injury)
 [1] "age.65"                            "amnesia.before"
 [3] "basal.skull.fracture"              "GCS.decrease"
 [5] "GCS.13"                            "GCS.15.2hours"
 [7] "high.risk"                         "loss.of.consciousness"
 [9] "open.skull.fracture"               "vomiting"
[11] "clinically.important.brain.injury"

> attach(head.injury)
> head.glm <- glm(clinically.important.brain.injury ~ ., family = binomial,
+                 data = head.injury)
> summary(head.glm)

Call:
glm(formula = clinically.important.brain.injury ~ ., family = binomial,
    data = head.injury)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2774  -0.3511  -0.2095  -0.1489   3.0028

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)            -4.4972     0.1629 -27.611  < 2e-16 ***
age.65                  1.3734     0.1827   7.518 5.56e-14 ***
amnesia.before          0.6893     0.1725   3.996 6.45e-05 ***
basal.skull.fracture    1.9620     0.2064   9.504  < 2e-16 ***
GCS.decrease           -0.2688     0.3680  -0.730 0.465152
GCS.13                  1.0613     0.2820   3.764 0.000168 ***
GCS.15.2hours           1.9408     0.1663  11.669  < 2e-16 ***
high.risk               1.1115     0.1591   6.984 2.86e-12 ***
loss.of.consciousness   0.9554     0.1959   4.877 1.08e-06 ***
open.skull.fracture     0.6304     0.3151   2.001 0.045424 *
vomiting                1.2334     0.1961   6.290 3.17e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1741.6  on 3120  degrees of freedom
Residual deviance: 1201.3  on 3110  degrees of freedom
AIC: 1223.3

Number of Fisher Scoring iterations: 6

How do I assess which patients have a high risk level, and how does the risk threshold play into that?

Thanks in advance,
Diana
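One way to see how the threshold enters, as a sketch: the fitted probabilities are each patient's estimated risk, and the decision rule is simply "send for CT when that risk is at least 0.025":

risk <- predict(head.glm, type = "response")  # estimated P(injury) per patient

# decision rule: CT scan whenever estimated risk >= 2.5%
send.for.CT <- risk >= 0.025

# how the rule performs against the observed outcomes
table(send.for.CT, head.injury$clinically.important.brain.injury)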
[R] Logistic Regression
Dear R-Helpers,

I want to perform a logistic regression on my dataset (y). I used the following code:

logistic <- glm(formula = interest_variable ~ ., family = binomial(link = logit),
                data = y)

This ran correctly. Then I wanted to develop the logistic regression with three different methods:

- forward
- backward
- stepwise

I used these procedures:

forward <- step(logistic, direction = "forward")
backward <- step(logistic, direction = "backward")
stepwise <- step(logistic, direction = "both")

These also ran correctly, but I obtained the same results with the three different procedures, so I thought I had made some mistakes. My questions are: Is what I did correct? Is it correct that three different methods return the same results? If I made some mistakes, what is the correct way to perform the three different logistic regressions?

Thank you in advance.

Sergio Della Franca.
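A plausible explanation, offered as an assumption since Sergio's output is not shown: step() starting from the full model has nothing to add in the forward direction, so the forward search just returns the starting model, and if no single term's removal lowers the AIC, backward and both do too. Forward selection is normally started from the null model with an explicit scope:

null <- glm(interest_variable ~ 1, family = binomial(link = logit), data = y)

# forward: start empty, allow terms up to the full model
forward  <- step(null, scope = formula(logistic), direction = "forward")

# backward: start from the full model (as before)
backward <- step(logistic, direction = "backward")

# stepwise: start empty but allow moves in both directions
stepwise <- step(null, scope = formula(logistic), direction = "both")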
[R] logistic regression
Dear All,

I would like to fit, and know the R2 of, the following presence/absence data:

x <- 1:10
y <- c(0,0,0,0,0,0,0,1,1,1)

I tried to use clogit (survival package) but it didn't work. Any idea?

miltinho
Re: [R] logistic regression
On 15-Mar-07 17:03:50, Milton Cezar Ribeiro wrote:
> Dear All, I would like to fit, and know the R2 of, the following
> presence/absence data:
> x <- 1:10
> y <- c(0,0,0,0,0,0,0,1,1,1)
> I tried to use clogit (survival package) but it didn't work. Any idea?
> miltinho

You are trying to fit the equation

  P[y = 1 ; x] = exp((x-a)/b) / (1 + exp((x-a)/b))

to the data

  x = 1 2 3 4 5 6 7 8 9 10
  y = 0 0 0 0 0 0 0 1 1 1

by what amounts to a maximum-likelihood method, i.e. one which chooses the parameter values to maximize the probability of the observed values of y (given the values of x).

The maximum probability possible is 1, so if you can find parameters which make P[y = 1] = 0 for x = 1, 2, ..., 7 and P[y = 1] = 1 for x = 8, 9, 10, then you have done it. This will be approximated as closely as you please for any value of a between 7 and 8, and sufficiently small values of b, since for such parameter values P[y = 1 ; x] -> 0 for x < a, and -> 1 for x > a.

You therefore have a solution which is both indeterminate (any a such that 7 < a < 8) and singular (b -> 0). So it will defeat standard estimation methods. That is the source of your problem.

In a more general context, this is an instance of the linear separation problem in logistic regression (and similar methods, such as probit analysis). Basically, this situation implies that, according to the data, there is a perfect prediction for the results. There is no well-defined way of dealing with it; any approach starts from the proposition "this perfect prediction is not a reasonable result in the context of my data", and continues by following up what you think should be meant by "not a reasonable result". What this is likely to mean would be on the lines of "b should not be that small", which then imposes upon you the need to be more specific about how small b may reasonably be. Then carry on from there (perhaps by fixing the value of b at different reasonable levels, and simply fitting a for each value of b).

Hoping this helps ... but I'm wondering how it happens that you have such data ... ??

Best wishes,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 15-Mar-07  Time: 19:38:51
-- XFMail --
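To see the separation in action, a small sketch: glm will drive the slope toward infinity (b -> 0 on Ted's (x-a)/b scale) and warn that fitted probabilities of 0 or 1 occurred:

x <- 1:10
y <- c(0,0,0,0,0,0,0,1,1,1)

fit <- glm(y ~ x, family = binomial)
# Warning: fitted probabilities numerically 0 or 1 occurred

coef(fit)     # huge slope, i.e. 1/b blowing up
fitted(fit)   # essentially 0 for x <= 7 and 1 for x >= 8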
Re: [R] logistic regression TRY LOGISTF
If Ted is right, then one work-around is to use Firth's method of penalized log-likelihood. The technique was originally intended to reduce small-sample bias, but it has now been extended to deal with complete and quasi-separation problems.

I believe the library is called logistf, but I haven't had a chance to try it. I know the SAS version (called the fl macro) works fine.

Reference -- http://www.meduniwien.ac.at/user/georg.heinze/techreps/tr2_2004.pdf

Hope this helps,
Jeff Miller
University of Florida
AlphaPoint05, Inc.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Ted Harding
Sent: Thursday, March 15, 2007 2:39 PM
To: R-help
Subject: Re: [R] logistic regression

> You are trying to fit the equation
>   P[y = 1 ; x] = exp((x-a)/b) / (1 + exp((x-a)/b))
> [...] You therefore have a solution which is both indeterminate (any a
> such that 7 < a < 8) and singular (b -> 0). So it will defeat standard
> estimation methods. That is the source of your problem. [...]
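A minimal sketch of the logistf route, assuming the logistf package is installed; it fits Firth's bias-reduced logistic regression, which yields finite estimates even under complete separation:

library(logistf)

x <- 1:10
y <- c(0,0,0,0,0,0,0,1,1,1)

fit <- logistf(y ~ x)
summary(fit)   # finite coefficients with penalized-likelihood CIs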
[R] Logistic regression for drugs Interactions
I have the model below, for which I run a logistic regression including the interaction term (NSAID * Diuretic):

fit1 <- glm(resp ~ nsaid + diuretic + I(nsaid * diuretic), family = binomial,
            data = w)

NSAID  Diuretic  Present  Absent
0      0             185    6527
0      1              53    1444
1      0              42    1293
1      1              25     253

                   Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)        -3.56335     0.07456  -47.794   < 2e-16 ***
NSAID               0.13630     0.17361    0.785   0.43242
Diuretic            0.25847     0.15849    1.631   0.10293
I(NSAID*Diuretic)   0.85407     0.30603    2.791   0.00526 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The odds ratio is exp(0.85407) = 2.35 times higher when NSAID is present in addition to Diuretic.

The odds ratio of Nausea when on Diuretic is exp(0.25847) = 1.29, and the odds ratio of Nausea when on NSAID is exp(0.13630) = 1.14. Normally, when we want the odds ratio of Nausea when a patient is on both drugs, we multiply 1.29 * 1.14 = 1.48 (is this correct? do we multiply or do we add?). But since the interaction term is significant, we take that into account. Does that mean that the odds ratio for both drugs is exp(0.25847) * exp(0.13630) * exp(0.85407) = 3.486297? Or do we use additions?

Thanks.

--
View this message in context: http://www.nabble.com/Logistic-regression-for-drugs-Interactions-tf3404506.html#a9482343
Sent from the R help mailing list archive at Nabble.com.
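For reference, the arithmetic on the log-odds scale, using the estimates above: coefficients add, so odds ratios multiply, and both ways of writing it give the same number:

b.nsaid    <- 0.13630
b.diuretic <- 0.25847
b.inter    <- 0.85407

# without the interaction term, the both-drugs OR would just be the
# product of the two main-effect ORs
exp(b.nsaid + b.diuretic)             # = exp(b.nsaid) * exp(b.diuretic) ~ 1.48

# with the interaction included, the OR for both drugs versus neither is
exp(b.nsaid + b.diuretic + b.inter)   # ~ 3.49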
Re: [R] logistic regression on contingency table
Bingshan Li <bli1 at bcm.tmc.edu> writes:
> I am wondering if there is a way in R to fit logistic regression on a
> contingency table. If I have original data, I can transform the data
> into a design matrix and then call glm to fit the regression. But now I
> have a 2x3 contingency table with the first row for response 0 and the
> second row for response 1, and the columns are 3 levels of the predictor
> variable. The 3 levels are not ordinal, though, and indicator variables
> would be more appropriate.

From the documentation of glm:

"For binomial and quasibinomial families the response can also be specified as a factor (when the first level denotes failure and all others success) or as a two-column matrix with the columns giving the numbers of successes and failures."

Dieter Menne
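A sketch of that two-column-matrix form for a 2x3 table like Bingshan's; the counts here are made up for illustration (rows are response 0 and 1, columns the three unordered predictor levels):

counts <- matrix(c(10,  4,
                   20, 12,
                    8, 15),
                 nrow = 2,
                 dimnames = list(resp = c("0", "1"),
                                 level = c("A", "B", "C")))

level     <- factor(c("A", "B", "C"))  # unordered, so treatment contrasts apply
successes <- counts["1", ]             # response-1 counts per level
failures  <- counts["0", ]             # response-0 counts per level

fit <- glm(cbind(successes, failures) ~ level, family = binomial)
summary(fit)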
Re: [R] logistic regression on contingency table
On Mon, 2007-03-05 at 15:31 +0000, Dieter Menne wrote:
> Bingshan Li <bli1 at bcm.tmc.edu> writes:
> > I am wondering if there is a way in R to fit logistic regression on a
> > contingency table. [...]
> From the documentation of glm: "For binomial and quasibinomial families
> the response can also be specified as a factor ... or as a two-column
> matrix with the columns giving the numbers of successes and failures."
> Dieter Menne

Just to expand on Dieter's comments, one trick to taking this approach is to coerce the contingency table you are starting with to a data frame, and then specify a 'weights' argument to glm().

Taking some dummy example data in a 2D contingency table:

> TAB
   X
Y   A B C
  0 1 9 2
  1 3 3 2

So we have X (IV) and Y (Response). Now, coerce TAB to a data frame. See ?as.data.frame.table and ?xtabs, which reverses the process back to a contingency table:

> DFT <- as.data.frame(TAB)
> DFT
  Y X Freq
1 0 A    1
2 1 A    3
3 0 B    9
4 1 B    3
5 0 C    2
6 1 C    2

As an FYI, this gets us back to 'TAB':

> xtabs(Freq ~ ., DFT)
   X
Y   A B C
  0 1 9 2
  1 3 3 2

Now create the model, using 'Freq' for the case weights:

> fit <- glm(Y ~ X, weights = Freq, data = DFT, family = binomial)
> summary(fit)

Call:
glm(formula = Y ~ X, family = binomial, data = DFT, weights = Freq)

Deviance Residuals:
     1       2       3       4       5       6
-1.665   1.314  -2.276   2.884  -1.665   1.665

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    1.099      1.155   0.951   0.3414
XB            -2.197      1.333  -1.648   0.0994 .
XC            -1.099      1.528  -0.719   0.4720
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 26.920  on 5  degrees of freedom
Residual deviance: 23.540  on 3  degrees of freedom
AIC: 29.54

Number of Fisher Scoring iterations: 4

An alternative to the above is to use a function that I posted in the past week for another query, called expand.dft():

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # replicate each row of the table-as-data-frame 'Freq' times
  DF <- sapply(1:nrow(x),
               function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)
  DF <- subset(do.call(rbind, DF), select = -Freq)

  # restore column types after the round trip through character
  for (i in 1:ncol(DF)) {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}

This takes the data frame table 'DFT' from above and converts it back to the raw observations:

> DF <- expand.dft(DFT)
> DF
    Y X
1   0 A
2   1 A
2.1 1 A
2.2 1 A
3   0 B
3.1 0 B
3.2 0 B
3.3 0 B
3.4 0 B
3.5 0 B
3.6 0 B
3.7 0 B
3.8 0 B
4   1 B
4.1 1 B
4.2 1 B
5   0 C
5.1 0 C
6   1 C
6.1 1 C

As an FYI, this gets us back to 'TAB':

> table(DF)
   X
Y   A B C
  0 1 9 2
  1 3 3 2

So now we can use the normal approach for glm():

> fit2 <- glm(Y ~ X, data = DF, family = binomial)
> summary(fit2)

Call:
glm(formula = Y ~ X, family = binomial, data = DF)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6651  -0.7585  -0.7585   0.8632   1.6651

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    1.099      1.155   0.951   0.3414
XB            -2.197      1.333  -1.648   0.0994 .
XC            -1.099      1.528  -0.719   0.4720
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 26.920  on 19  degrees of freedom
Residual deviance: 23.540  on 17  degrees of freedom
AIC: 29.54

Number of Fisher Scoring iterations: 4

Note of course that the degrees of freedom differ, though the null and residual deviances and the AIC are the same. Taking the latter approach will of course enable you to make subsequent manipulations on the raw data if you wish.

HTH,

Marc Schwartz
[R] logistic regression on contingency table
Hi folks,

I am wondering if there is a way in R to fit logistic regression on a contingency table. If I have original data, I can transform the data into a design matrix and then call glm to fit the regression. But now I have a 2x3 contingency table with the first row for response 0 and the second row for response 1, and the columns are 3 levels of the predictor variable. The 3 levels are not ordinal, though, and indicator variables would be more appropriate.

Thanks a lot!

Bingshan
[R] Logistic regression model + precision/recall
Hi,

I am using the logistic regression model lrm (Design).

Right now I was using the Area Under the Curve (AUC) for testing my model. But now I have to calculate the precision/recall of the model on test cases. For lrm, precision and recall would simply be defined with the help of the two terms below:

True Positive (TP) - number of test cases of class 1 that are given probability >= 0.5.
False Positive (FP) - number of test cases of class 0 that are given probability >= 0.5.

Precision = TP / (TP + FP)
Recall = TP / (number of positive samples in the test data)

Any help is appreciated. I can write long code with for loops and all, but is there any in-built function, or just a few commands, that would do the task?

regards,
Nitin
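No loops are needed; a sketch using vectorized comparisons. The names f, testdata, and cy are placeholders (the post does not show the actual fit or data):

probs <- predict(f, newdata = testdata, type = "fitted")  # f is an lrm fit
truth <- testdata$cy                                      # 0/1 test labels

pred.pos <- probs >= 0.5
TP <- sum(pred.pos & truth == 1)   # class 1 predicted positive
FP <- sum(pred.pos & truth == 0)   # class 0 predicted positive

precision <- TP / (TP + FP)
recall    <- TP / sum(truth == 1)
c(precision = precision, recall = recall)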
Re: [R] Logistic regression model + precision/recall
nitin jindal wrote:
> Hi, I am using the logistic regression model lrm (Design). Right now I
> was using the Area Under the Curve (AUC) for testing my model. [...]
> True Positive (TP) - number of test cases of class 1 that are given
> probability >= 0.5.

Why 0.5?

> Precision = TP / (TP + FP)
> Recall = TP / (number of positive samples in the test data)

Those are improper scoring rules that can be tricked. If the outcome is rare (say 0.02 incidence) you could just predict that no one will have the outcome and be correct 0.98 of the time. I suggest validating the model for discrimination (e.g., AUC) and calibration.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] Logistic regression model + precision/recall
On 1/24/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
> Why 0.5?

The probability has to be adjusted based on some hit and trial. I just mentioned it as an example.

> Those are improper scoring rules that can be tricked. If the outcome is
> rare (say 0.02 incidence) you could just predict that no one will have
> the outcome and be correct 0.98 of the time. I suggest validating the
> model for discrimination (e.g., AUC) and calibration.

I just have to calculate precision/recall for the rare outcome. If the positive outcome is rare (say 0.02 incidence) and I predict it to be negative all the time, my recall would be 0, which is bad. So precision and recall can take care of skewed data.
Re: [R] Logistic regression model + precision/recall
nitin jindal wrote:
> The probability has to be adjusted based on some hit and trial. I just
> mentioned it as an example.

Using a cutoff is not a good idea unless the utility (loss) function is discontinuous and is the same for every subject (in the medical field utilities are almost never constant). And if you are using the data to find the cutoff, this will require bootstrapping to penalize for the cutoff not being pre-specified.

> I just have to calculate precision/recall for the rare outcome. If the
> positive outcome is rare (say 0.02 incidence) and I predict it to be
> negative all the time, my recall would be 0, which is bad. So precision
> and recall can take care of skewed data.

No, that is not clear. The overall classification error would only be 0.02 in that case. It is true, though, that one of the two conditional probabilities would not be good.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] Logistic regression model + precision/recall
On 1/24/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
> Using a cutoff is not a good idea unless the utility (loss) function is
> discontinuous and is the same for every subject [...] And if you are
> using the data to find the cutoff, this will require bootstrapping to
> penalize for the cutoff not being pre-specified.

Thanks for this info. If I still have to use a cutoff, I will do bootstrapping. I don't know any alternative for computing precision/recall for a logistic regression model.

> No, that is not clear. The overall classification error would only be
> 0.02 in that case. It is true, though, that one of the two conditional
> probabilities would not be good.

I forgot to mention that for my data the overall classification error is not of concern; I am only interested in precision/recall for the rare outcome.

nitin
Re: [R] Logistic regression model + precision/recall
Hi. Thanks a lot. I will try that.

nitin

On 1/24/07, Tobias Sing [EMAIL PROTECTED] wrote:
> Maybe ROCR might help you. You can visualize the precision/recall
> trade-off across the range of all cutoffs. Assuming your numerical
> predictions are in 'scores' and the true class labels are in 'classes':
>
> library(ROCR)
> pred <- prediction(scores, classes)
> perf <- performance(pred, 'rec', 'prec')
> plot(perf)
>
> HTH,
> Tobias
>
> --
> Tobias Sing
> Computational Biology and Applied Algorithmics
> Max Planck Institute for Informatics
> Saarbrucken, Germany
> Phone: +49 681 9325 315
> Fax: +49 681 9325 399
> http://www.tobiassing.net
Re: [R] logistic regression model + Cross-Validation
Why not use lda {MASS}? It has a CV=TRUE option, though it does leave-one-out. Or use randomForest. If you have to use lrm, then the following code might help:

n.fold <- 5     # 5-fold cv
n.sample <- 50  # assumed 50 samples
s <- sample(1:n.fold, size = n.sample, replace = TRUE)
for (i in 1:n.fold) {
  # create your training data and validation data for each fold
  trn <- YOURWHOLEDATAFRAME[s != i, ]
  val <- YOURWHOLEDATAFRAME[s == i, ]
  # now do your own modeling using lrm, e.g. (placeholder formula):
  # f <- lrm(cy ~ x1 + x2, data = trn)
  # p <- predict(f, newdata = val, type = "fitted")  # per-record probabilities
}

HTH,
weiwei

On 1/21/07, nitin jindal [EMAIL PROTECTED] wrote:
> If validate.lrm does not have this option, does any other function have
> it? I will certainly look into your advice on cross validation. [...]

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?" "No, I did not. But I believed..."
---Matrix III
[R] logistic regression model + Cross-Validation
Hi,

I am trying to cross-validate a logistic regression model. I am using the logistic regression model (lrm) of package Design.

f <- lrm(cy ~ x1 + x2, x = TRUE, y = TRUE)
val <- validate.lrm(f, method = "cross", B = 5)

My class cy has values 0 and 1. The val variable will give me indicators like slope and AUC. But I also need the vector of predicted values of the class variable cy for each record during cross-validation, so that I can manually look at the results. So, is there any way to get those probabilities assigned to each class?

regards,
Nitin
Re: [R] logistic regression model + Cross-Validation
nitin jindal wrote:
> Hi, I am trying to cross-validate a logistic regression model. I am
> using the logistic regression model (lrm) of package Design.
>
> f <- lrm(cy ~ x1 + x2, x = TRUE, y = TRUE)
> val <- validate.lrm(f, method = "cross", B = 5)

val <- validate(f, ...)   # .lrm not needed

> My class cy has values 0 and 1. [...] is there any way to get those
> probabilities assigned to each class?

No, validate.lrm does not have that option. Manually looking at the results will not be easy when you do enough cross-validations. A single 5-fold cross-validation does not provide accurate estimates. Either use the bootstrap or repeat k-fold cross-validation between 20 and 50 times. k is often 10, but the optimum value may not be 10. Code for averaging repeated cross-validations is in http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf along with simulations of the bootstrap vs. a few cross-validation methods for binary logistic models.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] logistic regression model + Cross-Validation
If validate.lrm does not have this option, does any other function have it? I will certainly look into your advice on cross-validation. Thanks.

nitin

On 1/21/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
> No, validate.lrm does not have that option. Manually looking at the
> results will not be easy when you do enough cross-validations. A single
> 5-fold cross-validation does not provide accurate estimates. Either use
> the bootstrap or repeat k-fold cross-validation between 20 and 50 times.
> [...] Code for averaging repeated cross-validations is in
> http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf
> along with simulations of the bootstrap vs. a few cross-validation
> methods for binary logistic models.
Re: [R] logistic regression model + Cross-Validation
nitin jindal wrote:
> If validate.lrm does not have this option, does any other function have
> it? I will certainly look into your advice on cross-validation. Thanks.

Not that I know of, but it is easy to program.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
[R] logistic regression packages
Hi All:

I'm testing a set of data classification algorithms from this paper (www.stat.wisc.edu/~loh/treeprogs/quest1.7/mach1317.pdf). I couldn't find these algorithms in R packages:

1. LOG: polytomous logistic regression (there was one in the MASS library: multinom. But after I updated the MASS library, multinom was lost.)
2. POL: the POLYCLASS algorithm. There is an S-Plus package (polyclass library) for this algorithm, so there should be a corresponding package in R, but I haven't found it so far.

Any advice is appreciated.

Best,

Feng
Re: [R] logistic regression packages
1. multinom is in the nnet package
2. There is a polyclass function in package polspline

On 10/01/07, Feng Qiu [EMAIL PROTECTED] wrote: Hi All: I'm testing a set of data classification algorithms from this paper ...

--
David Barron
Said Business School, University of Oxford, Park End Street, Oxford OX1 1HP
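[Editorial note: a sketch of both suggestions. The data frame dat, factor response cls and predictors x1, x2 are hypothetical names, and the class coding expected by polyclass should be checked against its help page:]

library(nnet)       # multinom lives here, not in MASS
fit.log <- multinom(cls ~ x1 + x2, data = dat)   # polytomous logistic regression

library(polspline)  # POLYCLASS
# polyclass takes integer class codes and a covariate matrix; see
# ?polyclass for the exact coding it requires
fit.pol <- polyclass(as.numeric(dat$cls), as.matrix(dat[, c("x1", "x2")]))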
Re: [R] logistic regression packages
Hi David: Thanks for your information. Two further questions:

1. I found out that multinom is not doing polytomous logistic regression; do you know which function does this?
2. The polyclass in polspline does polychotomous regression, while I'm looking for polytomous regression. Do you think these two are similar in terms of prediction?

Best, Feng

- Original Message - From: David Barron [EMAIL PROTECTED] To: r-help r-help@stat.math.ethz.ch Sent: Wednesday, January 10, 2007 12:14 PM Subject: Re: [R] logistic regression packages

1. multinom is in the nnet package
2. There is a polyclass function in package polspline
[R] logistic regression in R - changing defaults
Hello, I was hoping for some advice on changing 2 defaults in a logistic regression.

1. It looks like the first category is the reference category? In the following syntax 'where' has 4 levels; how can I make the reference category the third category?

model <- glm(cbind(sucesses, failures) ~ where + who + firstep + dxnarrow + age + sex + medyear,
             family = binomial, data = life.use)
model

2. Is it possible to round results to 4 decimal points? If so, what syntax is required?

Any assistance is appreciated, Bob Green
Re: [R] logistic regression in R - changing defaults
an option is to use ?relevel(), e.g.,

life.use$where. <- relevel(life.use$where, 3)
model <- glm(cbind(sucesses, failures) ~ where. + who + firstep + dxnarrow + age + sex + medyear,
             family = binomial, data = life.use)

print(model, digits = 1)
print(model, digits = 2)
print(model, digits = 3)
print(model, digits = 4)

I hope it helps.

Best, Dimitris

--
Dimitris Rizopoulos, Ph.D. Student, Biostatistical Centre, School of Public Health, Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899, Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm
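[Editorial note: a caveat on question 2: the digits argument to print() controls significant digits rather than decimal places. For exactly 4 decimal places, one sketch using the model object above is to round the extracted pieces:]

round(coef(model), 4)                  # coefficients to 4 decimal places
round(summary(model)$coefficients, 4)  # the full coefficient table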
[R] logistic regression with a sample missing subjects with a value of an independent variable
Dear R-help, I am trying to run a logistic regression analysis using the R function glm, with the family parameter set to binomial, in order to use a logistic regression model. I have 70 samples. The dependent variable has two levels (0 and 1) and one of the independent variables also has two levels (0 and 1). The variables associate in the way shown in the table:

               Dependent
                 0    1
Independent 0   55   10
            1    0    5

This gives a strong association evaluated by the Fisher test (p-value = 0.0002481), but the logistic regression gives a p-value of 0.99 with very high values of the estimate and standard error (-19.27 and 1769.26, respectively). Is there any way (another function, a different setting of a parameter) to perform logistic regression analysis with these data in R? Thank you.

Gabriele Stocco
University of Trieste
Re: [R] logistic regression with a sample missing subjects with a value of an independent variable
On Sat, 21 Oct 2006, Gabriele Stocco wrote: Dear R-help, I am trying to run a logistic regression analysis using the R function glm ... but the logistic regression gives a p-value of 0.99 with very high values of the estimate and standard error (-19.27 and 1769.26, respectively).

Please see the comment at the bottom of this message, as your claims are not supported by any code.

Is there any way (another function, a different setting of a parameter) to perform logistic regression analysis with these data in R?

fit <- glm(matrix(c(55,0,10,5), 2, 2) ~ factor(c(0,1)), binomial())
fit0 <- glm(matrix(c(55,0,10,5), 2, 2) ~ 1, binomial())
anova(fit0, fit, test="Chisq")

  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1         1     16.929
2         0  2.208e-10  1   16.929 3.880e-05

is a reasonable way to do this. Beware the Hauck-Donner phenomenon (see e.g. MASS, the book) for t-tests of coefficients, although I do not get the values you quote. Since the expected values are low, you should not take the p value too seriously.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595
[R] logistic regression enquiry
I am hoping for some advice regarding the following scenario. I have data from 26 studies that I wanted to analyse using logistic regression. Given the data was from studies and not individuals, I was unsure how I would perform this in R.

When analysed in SPSS, weighting was used so that each study was included twice. Where use occurred, a value of 1 was assigned and was weighted by the value for the variable positive. In the second instance, where use did not occur, a value of 0 was assigned and was weighted by the variable negative. Taking the first study below, use = 1 would be weighted by 28 and use = 0 would be weighted by 117 (the total study N = 145). If this data is scrambled in transmission I can send a csv version.

My questions:
1. is change required to the data format, or is modification of the usual code required?
2. what is the best R package for performing logistic regression?

Any assistance is much appreciated, regards, Bob Green

In SPSS, each study was included twice and weighted, though I am unsure how I would format the data in R:

medyear where who dxbroad firstep standardage sex positive negative
89 3 2 1 0 0 31.571 28 117.00
98 2 2 1 0 1 48.062 15 72.00
98 4 1 1 0 0 45.261 42 57.00
89 3 0 1 0 1 28.763 19 48.00
99 2 2 1 0 1 34.773 27 73.00
88 3 0 1 0 1 30.658 26 57.00
94 1 1 1 0 1 36.381 70 124.00
96 3 1 1 0 1 40.057 27 40.00
96 2 2 1 0 1 33.164 9 41.00
88 2 0 1 1 0 29.547 30 202.00
98 1 2 0 0 1 39.360 246 734.00
97 4 0 0 0 1 38.467 17 85.00
92 3 0 1 0 1 34.367 15 127.00
88 2 0 1 0 1 46 9 90.00
85 3 0 1 0 1 30.364 58 87.00
94 3 0 1 0 1 38.847 47 126.00
88 3 0 1 0 1 33.854 25 134.00
92 3 0 1 1 1 67 157.00
90 3 0 1 1 1 26.052 17 101.00
90 3 0 1 0 0 39 32.00
90 2 0 1 0 1 36.138 10 173.00
90 2 0 1 0 1 38.953 64 383.00
97 2 0 1 0 1 31.561 12 52.00
99 1 1 1 0 1 25 56.00
100 4 1 1 0 1 45.062 46 270.00
101 2 0 1 0 1 32.4100 33 92.00
Re: [R] logistic regression enquiry
Dear Bob,

If I follow this properly, this is just a binomial logistic regression where, for each of your 26 observations, you have a certain number of successes (positive instances) and failures (negative ones). Fitting a binomial logit model is easily accommodated by the glm() function. Probably the simplest approach is to prepare one observation for each study which has the number of successes and failures for that study, and put these together as the columns of a two-column matrix on the left-hand side of the model; on the right-hand side you'd have the usual specification for a model formula. Thus, something like

model <- glm(cbind(successes, failures) ~ ..., family = binomial, data = Dataset)

(where the ... represents the RHS of your model) should do the trick.

I hope this helps, John

On Sun, 04 Jun 2006 20:39:29 +1000 Bob Green [EMAIL PROTECTED] wrote: I am hoping for some advice regarding the following scenario. I have data from 26 studies that I wanted to analyse using logistic regression. ...

John Fox
Department of Sociology, McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
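[Editorial note: applied to Bob's study-level table, John's recipe looks roughly like this. A sketch only: the file name is hypothetical, it assumes the table has been saved with the column names shown above, and the covariates on the right-hand side are merely illustrative:]

studies <- read.csv("studies.csv")   # hypothetical file holding the posted table
model <- glm(cbind(positive, negative) ~ medyear + where + who,
             family = binomial, data = studies)
summary(model)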
Re: [R] Logistic Regression - Results?
On Wed, 24 May 2006, Wojciech Gryc wrote: Hi, I use SPSS at work and have R installed both at work and on my home machine. I've been trying to do some logistic regressions in R and SPSS, but the results I'm getting are different. I've followed a few R tutorials, and with most of them, I get the following instructions:

result <- glm(z ~ x + y, family = binomial(logit))

In the case above, with three variables (z being dependent). In SPSS, I'm told to use Analyze -> Regression -> Binary Logistic, where I put x, y in Covariates and z in Dependent. Note that my values for x and y are either 1 or 0. The results I get from these two tests are different, however, and I was wondering why. Am I choosing the wrong commands? If not, why are the results different? Any help would be greatly appreciated, and please note that I have a limited amount of stats knowledge.

If you show us an example and both outputs we may be able to help. `the results I'm getting are different' covers too many possibilities. (Please see the posting guide for how to prepare a question like this.)

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595
Re: [R] Logistic Regression - Results?
Are you sure you are using the same contrasts in SPSS? You could have supplied us with your SPSS syntax.

Marwan

--
Marwan Khawaja http://staff.aub.edu.lb/~mk36

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Wojciech Gryc Sent: Thursday, May 25, 2006 12:41 AM To: r-help@stat.math.ethz.ch Subject: [R] Logistic Regression - Results?

Hi, I use SPSS at work and have R installed both at work and on my home machine. I've been trying to do some logistic regressions in R and SPSS, but the results I'm getting are different. ...
[R] Logistic Regression - Results?
Hi, I use SPSS at work and have R installed both at work and on my home machine. I've been trying to do some logistic regressions in R and SPSS, but the results I'm getting are different. I've followed a few R tutorials, and with most of them, I get the following instructions:

result <- glm(z ~ x + y, family = binomial(logit))

In the case above, with three variables (z being dependent). In SPSS, I'm told to use Analyze -> Regression -> Binary Logistic, where I put x, y in Covariates and z in Dependent. Note that my values for x and y are either 1 or 0.

The results I get from these two tests are different, however, and I was wondering why. Am I choosing the wrong commands? If not, why are the results different? Any help would be greatly appreciated, and please note that I have a limited amount of stats knowledge.

Thanks, Wojciech

--
Five Minutes to Midnight: Youth on human rights and current affairs
http://www.fiveminutestomidnight.org/
[R] logistic regression with interactive parameters
Hello, I need to know more about logistic regression with interactive parameters. Could you provide documents, links, etc.?

kind regards

--
Ahmet Temiz, Geological Eng.
General Directorate of Disaster Affairs, Earthquake Research Department
Phone: +90 (312) 287 89 51 or (312) 287 26 80/1547, Fax: +90 (312) 287 89 51
E-mail: [EMAIL PROTECTED]
www.deprem.gov.tr
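[Editorial note: if by "interactive parameters" Ahmet means interaction terms, R's formula interface expresses them directly. A minimal sketch with hypothetical names (data frame dat, binary outcome y, predictors a and b):]

fit <- glm(y ~ a * b, family = binomial, data = dat)  # expands to a + b + a:b
summary(fit)   # the a:b row is the interaction parameter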
Re: [R] logistic regression model with non-integer weights
Michael Dewey wrote: At 17:12 09/04/06, Ramón Casero Cañas wrote: I am not sure what the problem you really want to solve is, but it seems that a) abnormality is rare, b) the logistic regression predicts it to be rare. If you want a prediction system, why not try different cut-offs (other than 0.5 on the probability scale) and perhaps plot sensitivity and specificity to help to choose a cut-off?

Thanks for your suggestions, Michael. It took me some time to figure out how to do this in R (as trivial as it may be for others). Some comments about what I've done follow, in case anyone is interested.

The problem is a) abnormality is rare (prevalence = 14%) and b) there is not much difference in the independent variable between abnormal and normal cases. So the logistic regression model predicts that P(abnormal) <= 0.4. I got confused by this, as I expected a cut-off point of P = 0.5 to decide between normal/abnormal. But you are right in that another cut-off point can be chosen. For a cut-off of e.g. P(abnormal) = 0.15, sensitivity = 65% and specificity = 52%. They are pretty bad, although for clinical purposes I would say that positive/negative predictive values are more interesting. But then PPV = 19% and NPV = 90%, which isn't great.

As an overall test of how good the model is for classification I have computed the area under the ROC, following your suggestion of using sensitivity and specificity. I couldn't find how to do this directly in R, so I implemented it myself (it's not difficult, but I'm new here). I tried the package ROCR, but apparently it doesn't cover binary outcomes. The area under the ROC is 0.64, so I would say that even though the model seems to fit the data, it just doesn't allow acceptable discrimination, no matter what the cut-off point.

I have also studied the effect of low prevalence. For this, I used the ran.gen option of the boot function (package boot) to define a function that resamples the data so that it balances abnormal and normal cases. A logistic regression model is fitted to each replicate, in a parametric bootstrap, and thus I compute the bias of the estimates of the model coefficients, beta0 and beta1. This shows very small bias for beta1, but a rather large bias for beta0. So I would say that prevalence has an effect on beta0 but not beta1. This is good, because a common measure like the odds ratio depends only on beta1.

Cheers,

--
Ramón Casero Cañas
http://www.robots.ox.ac.uk/~rcasero/wiki
http://www.robots.ox.ac.uk/~rcasero/blog
Re: [R] logistic regression model with non-integer weights
On Sun, 2006-04-16 at 19:10 +0100, Ramón Casero Cañas wrote: As an overall test of how good the model is for classification I have computed the area under the ROC ... I couldn't find how to do this directly in R, so I implemented it myself. ...

The Epi package has the function ROC that draws the ROC curve and computes the AUC, among other things.

Rick B.
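[Editorial note: a minimal sketch of that route, using the variable names from Ramón's model; the formula interface shown here follows the Epi documentation, and the exact arguments should be checked against ?ROC:]

library(Epi)
# draws the ROC curve and reports the area under it
ROC(form = ABN ~ TR, plot = "ROC")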
Re: [R] logistic regression model with non-integer weights
Ramón Casero Cañas wrote: I got confused by this, as I expected a cut-off point of P = 0.5 to decide between normal/abnormal. But you are right in that another cut-off point can be chosen. ...

This makes me think you are trying to go against maximum likelihood to optimize an improper criterion. Forcing a single cutpoint to be chosen seems to be at the heart of your problem. There's nothing wrong with using probabilities and letting the utility possessor make the final decision.

--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics, School of Medicine, Vanderbilt University
Re: [R] logistic regression model with non-integer weights
Frank E Harrell Jr wrote: This makes me think you are trying to go against maximum likelihood to optimize an improper criterion. Forcing a single cutpoint to be chosen seems to be at the heart of your problem. There's nothing wrong with using probabilities and letting the utility possessor make the final decision.

I agree, and in fact I was thinking along those lines, but I also needed a way of evaluating how good the model is at discriminating between abnormal and normal cases, as opposed to e.g. GOF. The only way I know of is using the area under the ROC (thus setting cut-off points), which also followed neatly from Michael Dewey's comments. Any alternatives would be welcome :)

--
Ramón Casero Cañas
http://www.robots.ox.ac.uk/~rcasero/wiki
http://www.robots.ox.ac.uk/~rcasero/blog
Re: [R] logistic regression model with non-integer weights
Ramón Casero Cañas wrote: ... I also needed a way of evaluating how good the model is at discriminating between abnormal and normal cases, as opposed to e.g. GOF. The only way I know of is using the area under the ROC (thus setting cut-off points) ... Any alternatives would be welcome :)

To get the ROC area you don't need to do any of that, and as you indicated, it is a good discrimination measure. The lrm function in the Design package gives it to you automatically (the C index), and you can also get it with the Hmisc package's somers2 and rcorr.cens functions. The ROC area is highly related to the Wilcoxon 2-sample test statistic for comparing cases and non-cases.

--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics, School of Medicine, Vanderbilt University
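[Editorial note: a sketch of both routes Frank mentions, assuming fitted probabilities phat and the logical outcome ABN from the thread:]

library(Hmisc)
somers2(phat, as.numeric(ABN))   # "C" is the area under the ROC curve

# the same area via the Wilcoxon/rank identity Frank refers to
r  <- rank(phat)
n1 <- sum(ABN); n0 <- sum(!ABN)
(sum(r[ABN]) - n1 * (n1 + 1) / 2) / (n1 * n0)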
Re: [R] logistic regression model with non-integer weights
At 17:12 09/04/06, Ramón Casero Cañas wrote:

I have not seen a reply to this so far; apologies if I missed something.

When fitting a logistic regression model using weights I get the following warning:

data.model.w <- glm(ABN ~ TR, family = binomial(logit), weights = WEIGHT)
Warning message: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)

Details follow ***

I have a binary dependent variable of abnormality ABN = T, F, T, T, F, F, F... and a continuous predictor TR = 1.962752 1.871123 1.893543 1.685001 2.121500, ... As the number of abnormal cases (ABN==T) is only 14%, and there is large overlap between abnormal and normal cases, the logistic regression found by glm is always much closer to the normal cases than to the abnormal cases. In particular, the probability of abnormal is at most 0.4.

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.7607     0.7196   1.057   0.2905
TR2          -1.4853     0.4328  -3.432   0.0006 ***
---

I would like to compensate for the fact that the a priori probability of abnormal cases is so low. I have created a weight vector

I am not sure what the problem you really want to solve is, but it seems that a) abnormality is rare, b) the logistic regression predicts it to be rare. If you want a prediction system, why not try different cut-offs (other than 0.5 on the probability scale) and perhaps plot sensitivity and specificity to help to choose a cut-off?

WEIGHT <- ABN
WEIGHT[ABN == TRUE]  <- 1 / na / 2
WEIGHT[ABN == FALSE] <- 1 / nn / 2

so that all weights add up to 1, where ``na'' is the number of abnormal cases and ``nn'' is the number of normal cases. That is, normal cases have less weight in the model fitting because there are so many. But then I get the warning message at the beginning of this email, and I suspect that I'm doing something wrong. Must weights be integers, or at least greater than one?

Regards,

--
Ramón Casero Cañas
http://www.robots.ox.ac.uk/~rcasero/wiki
http://www.robots.ox.ac.uk/~rcasero/blog

Michael Dewey
http://www.aghmed.fsnet.co.uk
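[Editorial note: a sketch of the sensitivity/specificity plot Michael suggests, assuming an unweighted fit of the model from the question, with variable names as in the thread:]

m    <- glm(ABN ~ TR, family = binomial(logit))
phat <- fitted(m)
cuts <- seq(0.05, 0.95, by = 0.05)
sens <- sapply(cuts, function(cc) mean(phat[ABN]  >= cc))   # true positive rate
spec <- sapply(cuts, function(cc) mean(phat[!ABN] <  cc))   # true negative rate
matplot(cuts, cbind(sens, spec), type = "l", lty = 1:2,
        xlab = "cut-off on P(abnormal)", ylab = "proportion")
legend("right", legend = c("sensitivity", "specificity"), lty = 1:2)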
[R] logistic regression model with non-integer weights
When fitting a logistic regression model using weights I get the following warning:

data.model.w <- glm(ABN ~ TR, family = binomial(logit), weights = WEIGHT)
Warning message: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)

Details follow ***

I have a binary dependent variable of abnormality ABN = T, F, T, T, F, F, F... and a continuous predictor TR = 1.962752 1.871123 1.893543 1.685001 2.121500, ... As the number of abnormal cases (ABN==T) is only 14%, and there is large overlap between abnormal and normal cases, the logistic regression found by glm is always much closer to the normal cases than to the abnormal cases. In particular, the probability of abnormal is at most 0.4.

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.7607     0.7196   1.057   0.2905
TR2          -1.4853     0.4328  -3.432   0.0006 ***
---

I would like to compensate for the fact that the a priori probability of abnormal cases is so low. I have created a weight vector

WEIGHT <- ABN
WEIGHT[ABN == TRUE]  <- 1 / na / 2
WEIGHT[ABN == FALSE] <- 1 / nn / 2

so that all weights add up to 1, where ``na'' is the number of abnormal cases and ``nn'' is the number of normal cases. That is, normal cases have less weight in the model fitting because there are so many. But then I get the warning message at the beginning of this email, and I suspect that I'm doing something wrong. Must weights be integers, or at least greater than one?

Regards,

--
Ramón Casero Cañas
http://www.robots.ox.ac.uk/~rcasero/wiki
http://www.robots.ox.ac.uk/~rcasero/blog
[R] Logistic regression - confidence intervals
Please forgive a rather naïve question... Could someone please give a quick explanation for the differences in conf intervals achieved via confint.glm (based on profile likelihoods) and the intervals achieved using the Design library? For example, the intervals in the following two outputs are different.

library(Design)
x = rnorm(100)
y = gl(2, 50)
d = data.frame(x = x, y = y)
dd = datadist(d); options(datadist = 'dd')
m1 = lrm(y ~ x, data = d)
summary(m1)

m2 = glm(y ~ x, family = binomial, data = d)
confint(m2)

I have spent time trying to figure this out via the archives, but have not had much luck.

Regards, Stephen
Re: [R] Logistic regression model selection with overdispersed/autocorrelated data
Thanks for pointing out the aod package and the beta-binomial logistic models, Renaud. While I see how beta-binomial models could be applied to some of our other analyses, I don't see how they can be used in our habitat selection analysis, where individual locations are coded as 0 or 1 rather than proportions.

GEE models (geeglm from geepack) could be used for our analyses. Even though these models are fit using maximum likelihood estimation, they do not solve our model selection problem. Beta-coefficients from gee, glm, glmm's, and lrm are nearly identical. The only thing that varies is the variance-covariance matrix and the resulting standard errors. Consequently, the deviances should be similar because the predicted values (p) are calculated from the beta-coefficients. For an individual data point, the log-likelihood = y * log(p) + (1 - y) * log(1 - p), and the deviance = -2 * sum(log-likelihoods). Consequently, the difference in deviance between two models is amplified by autocorrelated data and causes models to be overparameterized when using AIC or likelihood ratio tests.

I am curious how others select models with autocorrelated data. Thanks for your help, Jesse

Renaud Lancelot wrote (31/01/2006): If you're not interested in fitting caribou-specific responses, you can use beta-binomial logistic models. There are several packages available for this purpose on CRAN, among which aod. Because these models are fitted using maximum-likelihood methods, you can use AIC (or other information criteria) to compare different models. Best, Renaud
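[Editorial note: Jesse's deviance bookkeeping in R form, as a check. A sketch: y is the 0/1 outcome vector and p the fitted probabilities from any of the fits he lists:]

ll  <- y * log(p) + (1 - y) * log(1 - p)   # per-observation log-likelihood
dev <- -2 * sum(ll)                        # matches deviance(fit) for a binary glm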
Re: [R] Logistic regression - confidence intervals
On Wed, 8 Feb 2006, Cox, Stephen wrote: Could someone please give a quick explanation for the differences in conf intervals achieved via confint.glm (based on profile likelihoods) and the intervals achieved using the Design library? ...

Well, the Design library is not giving you confidence intervals for parameters, is it? (Since there is no summary method for lrm it is a long haul to find out what it is giving you, which I leave to you.)

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595
Re: [R] Logistic regression - confidence intervals
Cox, Stephen wrote: Could someone please give a quick explanation for the differences in conf intervals achieved via confint.glm (based on profile likelihoods) and the intervals achieved using the Design library? ...

Design uses Wald (large-sample normality of parameter estimates) based confidence intervals. These are good for most situations, but profile confidence intervals are preferred. Someday I'll make Design do those. One advantage of Wald statistics is that they extend readily to cluster sampling (e.g., using cluster sandwich covariance estimators) and other complications (e.g., adjustment of variances for multiple imputation), whereas likelihood ratio statistics do not (unless e.g. you have an explicit model for the correlation structure or other facets of the model).

Also note that confint is probably giving a confidence interval for a one-unit change in x whereas summary.Design is computing an interquartile-range effect (the difference in x-values is shown in the summary output).

When posting a nice simulated example it's best to do set.seed(something) so everyone will get the same data.

Frank

--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics, School of Medicine, Vanderbilt University
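[Editorial note: to see the two interval types side by side on Stephen's m2 (a sketch; confint.default is the stock Wald interval in base R):]

set.seed(1)                       # per Frank's advice, make the example reproducible
x <- rnorm(100)
y <- gl(2, 50)
m2 <- glm(y ~ x, family = binomial)
confint.default(m2)   # Wald: estimate +/- 1.96 * SE
confint(m2)           # profile likelihood (via MASS)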
Re: [R] Logistic regression model selection with overdispersed/autocorrelated data
If you're not interested in fitting caribou-specific responses, you can use beta-binomial logistic models. There are several packages available for this purpose on CRAN, among which aod. Because these models are fitted using maximum-likelihood methods, you can use AIC (or other information criteria) to compare different models.

Best, Renaud

2006/1/30, [EMAIL PROTECTED] [EMAIL PROTECTED]: I am creating habitat selection models for caribou and other species with data collected from GPS collars. ...

--
Renaud LANCELOT
Département Elevage et Médecine Vétérinaire (EMVT) du CIRAD, Directeur adjoint chargé des affaires scientifiques
CIRAD, Animal Production and Veterinary Medicine Department, Deputy director for scientific affairs
Campus international de Baillarguet, TA 30 / B (Bât. B, Bur. 214), 34398 Montpellier Cedex 5 - France
Tél +33 (0)4 67 59 37 17, Secr. +33 (0)4 67 59 39 04, Fax +33 (0)4 67 59 37 95
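[Editorial note: a sketch of that suggestion with aod's betabin. All names are hypothetical: a study-level data frame dat with success counts y out of n trials and a covariate x; the second formula gives the overdispersion structure, and argument names and the AIC extractor should be checked against the aod documentation:]

library(aod)
fit <- betabin(cbind(y, n - y) ~ x, ~ 1, data = dat)  # ~ 1: one common overdispersion parameter
summary(fit)
AIC(fit)   # information criterion for comparing candidate models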
Re: [R] Logistic regression model selection with overdispersed/autocorrelated data
[EMAIL PROTECTED] wrote: I am creating habitat selection models for caribou and other species with data collected from GPS collars. ...

Frank E Harrell Jr wrote: If you must do non-subject-matter-driven model selection, look at the fastbw function in Design, which will use the cluster bootstrap variance matrix. Frank

Thanks for the tip. I didn't know that the fastbw function could account for the clustered variance. For others, the code to run such a model with the Design library would be:

model.1 <- lrm(y ~ x1 + x2 + x3 + x4, data = data, x = T, y = T)  # create model
model.2 <- bootcov(model.1, cluster = data$animal, B = 1)         # calculate robust variance matrix
fastbw(model.2)                                                   # backward step-wise selection

Later we will examine individual caribou responses to trails (subject-specific model selection). For this we plan to use mixed effects models (lmer). Is this what you would also recommend? I look forward to reading the new edition of your book when it is published.

Jesse
[R] Logistic regression model selection with overdispersed/autocorrelated data
I am creating habitat selection models for caribou and other species with data collected from GPS collars. In my current situation the radio-collars recorded the locations of 30 caribou every 6 hours. I am then comparing resources used at caribou locations to random locations using logistic regression (standard habitat analysis). The data is therefore highly autocorrelated, and this causes Type I error two ways – small standard errors around beta-coefficients and over-parameterization during model selection. Robust standard errors are easily calculated by block-bootstrapping the data using “animal” as a cluster with the Design library; however, I haven’t found a satisfactory solution for model selection. A couple of options are:

1. Using QAIC, where the deviance is divided by a variance inflation factor (Burnham & Anderson). However, this VIF can vary greatly depending on the data set and the set of covariates used in the global model.
2. Manual forward stepwise regression using both changes in deviance and robust p-values for the beta-coefficients.

I have been looking for a solution to this problem for a couple of years and would appreciate any advice.

Jesse
Re: [R] Logistic regression model selection with overdispersed/autocorrelated data
[EMAIL PROTECTED] wrote: I am creating habitat selection models for caribou and other species with data collected from GPS collars. ... I have been looking for a solution to this problem for a couple of years and would appreciate any advice. Jesse

If you must do non-subject-matter-driven model selection, look at the fastbw function in Design, which will use the cluster bootstrap variance matrix.

Frank

--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics, School of Medicine, Vanderbilt University
Re: [R] Logistic regression to select genes and estimate cutoff point?
You could take a look at www.bioconductor.org; limma would be a good starting point.

hth, ido
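[Editorial note: for orientation, the usual limma workflow looks roughly like this. A sketch with hypothetical names: exprs.mat is a genes-by-samples expression matrix and tumor a 0/1 indicator of tumour samples:]

library(limma)
design <- model.matrix(~ tumor)    # intercept + tumour-vs-normal contrast
fit <- lmFit(exprs.mat, design)    # gene-wise linear models
fit <- eBayes(fit)                 # moderated (empirical-Bayes) t-statistics
topTable(fit, coef = 2)            # top-ranked differential genes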
Re: [R] Logistic regression to select genes and estimate cutoff point?
Lingsheng Dong wrote: Hi, all, I am new to R and even to statistics. ... I need to use microarray data to select several diagnostic genes between normal samples and tumor samples, and use these genes to predict unknown samples. ...

Just a comment: Do you not have a statistician to work with at your institution? You are new to statistics and are asking a question that would be very difficult to deal with for someone with a PhD in statistics and 20 years of experience. Some of the issues involved are multiple comparisons, false discovery rate, shrinkage, array geometry effects, nonparametric vs. parametric statistics, stability of selected genes, discovery validation, ...

--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics, School of Medicine, Vanderbilt University
[R] Logistic regression to select genes and estimate cutoff point?
Hi, all, I am new to R and even to statistics, and am not sure if the question has an answer, but I couldn't find a straightforward answer in the help mailing list. I need to use microarray data to select several diagnostic genes between normal samples and tumor samples, and use these genes to predict unknown samples. Since the sample size is so small and the data doesn't follow a normal distribution, I am thinking of using logistic regression instead of Student's t test to select genes. To make the problem simpler, I assume each gene is independent of the others, without interactions.

My question is how I should build up the model: one model for each gene, or a multiple-variable model including all genes? Which test compares the discrimination power of each gene? I am thinking it is the Wald statistic for the multiple-variable model and maximum likelihood for the single-gene models. Am I correct? To estimate the cutoff point, I guess the answer is the gene expression when p = 0.5 in the model. Am I on the right track? Any suggestion is appreciated! Thanks a lot.

Lingsheng
[R] logistic regression with constrained coefficients?
I am trying to automatically construct a distance function from a training set in order to use it to cluster another data set. The variables are nominal. One variable is a class variable having two values; it is kept separate from the others. I have a method which constructs a distance matrix for the levels of a nominal variable in the context of the other variables. I want to construct a linear combination of these which gives me a distance between whole cases that is well associated with the class variable, in that a large combined distance between two cases means they most likely belong to different classes. So from my training set I construct a set of (d1(x1,y1), ..., dn(xn,yn), x_class != y_class) rows bound together as a data frame (actually I construct it by columns), and then the obvious thing to try was

glm(different.class ~ ., family = binomial(), data = distance.frame)

The thing is that this gives me both positive and negative coefficients, whereas the linear combination is only guaranteed to be a metric if the coefficients are all non-negative. There are four fairly obvious ways to deal with that:

(1) Just force the negative coefficients to 0 and hope. This turns out to work rather well, but still...
(2) Keep all the coefficients but take max(0, linear combination of distances). This turns out to work rather well, but still...
(3) Drop the variables with negative coefficients from the model, refit, and iterate until no negative coefficients remain. This can hardly be said to work; sometimes nearly all the variables are dropped.
(4) Use a version of glm() that will let me constrain the coefficients to be non-negative.

I *have* searched the R-help archives, and I see that the question about logistic regression with constrained coefficients has come up before, but it didn't really get a satisfactory answer. I've also searched the documentation of more contributed packages than I could possibly understand. There is obviously some way to do this using R's general non-linear optimisation functions. However, I don't know how to formulate logistic regression that way.

This whole thing is heuristic. I am not hell-bent on (ab?)using logistic regression this way. It was just an obvious thing to try. Suggestions for other means to the same end will be welcome.
Re: [R] logistic regression with constrained coefficients?
On Fri, 9 Dec 2005, Richard A. O'Keefe wrote: I am trying to automatically construct a distance function from a training set in order to use it to cluster another data set. ... There is obviously some way to do this using R's general non-linear optimisation functions. However, I don't know how to formulate logistic regression that way.

There is a worked example in MASS (the book) p. 445, including adding constraints.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595
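[Editorial note: along the lines of option (4), logistic regression can be written as a likelihood maximisation and handed to a box-constrained optimiser. A sketch, not the MASS example: X is the matrix of per-variable distances and y the 0/1 different-class indicator, with the intercept left unconstrained:]

negll <- function(beta, X, y) {
  eta <- beta[1] + drop(X %*% beta[-1])
  -sum(y * eta - log1p(exp(eta)))        # negative binomial-logit log-likelihood
}
p <- ncol(X)
fit <- optim(rep(0, p + 1), negll, X = X, y = y,
             method = "L-BFGS-B", lower = c(-Inf, rep(0, p)))
fit$par   # intercept first, then non-negative distance weights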
[R] Logistic Regression using glm
Hello everyone, I am currently teaching an intermediate stats. course at UCSD Extension using R. We are using Venables and Ripley as the primary text for the course, with Freund and Wilson's Statistical Methods as a secondary reference. I recently gave a homework assignment on logistic regression, and I had a question about glm. Let n be the number of trials, p be the estimated sample proportion, and w be the standard binomial weights n*p*(1-p). If you perform

output <- glm(p ~ x, family = binomial, weights = n)

you get a different result than if you perform the logit transformation manually on p and perform

output <- lm(logit(p) ~ x, weights = w)

where logit(p) is either obtained from R with qlogis(p) or from a manual computation of ln(p/(1-p)). The difference seems to me to be too large to be roundoff error. The only thing I can guess is that the application of the weights in glm is different than in a manual computation. Can anyone explain the difference in results? Daniel Pick Principal Daniel Pick Scientific Software Consulting San Diego, CA E-Mail: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Logistic Regression using glm
You're fitting two different models. The latter is saying: logit(p) = x + e, where e is a normal error, so that logit(p) is normal. lm fits a Linear Model, which uses normal error. The former says that p is Bernoulli; and p ~ Bernoulli does not imply logit(p) is normal. A Generalized Linear Model has different options for specifying the random component. Agresti's Categorical Data Analysis lays out the details very well. -Original Message- From: [EMAIL PROTECTED] On Behalf Of Daniel Pick Sent: Tuesday, October 11, 2005 12:22 PM To: r-help@stat.math.ethz.ch Subject: [R] Logistic Regression using glm [original message quoted in full above] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Logistic Regression using glm
One of these is modelling the mean of the logit of p, the other is modelling the logit of the mean of p. They aren't the same. -thomas On Tue, 11 Oct 2005, Daniel Pick wrote: [original message quoted in full above] Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
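To see the disagreement concretely, a small simulated sketch (all names invented): glm() maximises the binomial likelihood for the counts, while lm() does weighted least squares on the empirical logits, which additionally requires 0 < p < 1 in every group.

set.seed(2)
x <- 1:10
n <- rep(50, 10)                                  # trials per group
p <- rbinom(10, n, plogis(-2 + 0.4 * x)) / n      # observed proportions
w <- n * p * (1 - p)                              # standard binomial weights

coef(glm(p ~ x, family = binomial, weights = n))  # ML on the binomial counts
coef(lm(qlogis(p) ~ x, weights = w))              # weighted LS on logit(p); close, not equal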
Re: [R] logistic regression with nominal predictors
This sounds to me like a great research project that could be answered relatively easily with a Monte Carlo study. An excellent mathematician might be able to produce simple theoretical limits on the error from using ranks or normal scores, but such limits would likely be much wider than one could get in a typical case using Monte Carlo. And Monte Carlo in R is normally fairly easy even for quite complicated situations. spencer graves Ramón Casero Cañas wrote: [original message quoted in full; see the post below] -- Spencer Graves, PhD Senior Development Engineer PDF Solutions, Inc. 333 West San Carlos Street Suite 700 San Jose, CA 95110, USA [EMAIL PROTECTED] www.pdf.com http://www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] logistic regression with nominal predictors
(Sorry for obvious mistakes, as I am quite a newbie with no Statistics background). My question is going to be what is the gain of logistic regression over odds ratios when none of the input variables is continuous. My experiment: Outcome: ordinal scale, ``quality'' (QUA=1,2,3). Predictors: ``segment'' (SEG) and ``stress'' (STR). SEG is nominal scale with 24 levels, and STR is dichotomous (0,1). Considering the outcome continuous, two-way ANOVA with

aov(as.integer(QUA) ~ SEG * STR)

doesn't find evidence of interaction between SEG and STR, and they are significant on their own. This is the result that we would expect from clinical knowledge. I use

xtabs(~QUA+SEG, data=data2.df, subset=STR==0)
xtabs(~QUA+SEG, data=data2.df, subset=STR==1)

for the contingency tables. There are zero cells, and for some values of SEG, there is only one non-zero cell, i.e. some values of SEG determine the output with certainty. So initially I was thinking of a proportional odds logistic regression model, but following Hosmer and Lemeshow [1], zero cells are problematic. So I take out of the data table the deterministic values of SEG, and I pool QUA=2 and QUA=3, and now I have a dichotomous outcome (QUA = Good/Bad) and no zero cells. The following model doesn't find evidence of interaction

glm(QUA ~ STR * SEG, data=data3.df, family=binomial)

so I go for

glm(QUA ~ STR + SEG, data=data3.df, family=binomial)

(I suppose that what glm does is to create design variables for SEG, where 0 0 ... 0 is for the first value of SEG, 1 0 ... 0 for the second value, 0 1 0 ... 0 for the third, etc).

Coefficients:
              Estimate  Std. Error    z value  Pr(>|z|)
(Intercept) -1.085e+00   1.933e-01     -5.614  1.98e-08 ***
STR.L        2.112e-01   6.373e-02      3.314  0.000921 ***
SEGP2C.MI   -9.869e-01   3.286e-01     -3.004  0.002669 **
SEGP2C.AI   -1.306e+00   3.585e-01     -3.644  0.000269 ***
SEGP2C.AA   -1.743e+00   4.123e-01     -4.227  2.37e-05 ***
[shortened]
SEGP4C.ML   -5.657e-01   2.990e-01     -1.892  0.058485 .
SEGP4C.BL   -2.908e-16   2.734e-01  -1.06e-15  1.00
SEGSAX.MS    1.092e-01   2.700e-01      0.405  0.685772
SEGSAX.MAS  -5.441e-16   2.734e-01  -1.99e-15  1.00
SEGSAX.MA    7.130e-01   2.582e-01      2.761  0.005758 **
SEGSAX.ML    1.199e+00   2.565e-01      4.674  2.96e-06 ***
SEGSAX.MP    1.313e+00   2.570e-01      5.108  3.26e-07 ***
SEGSAX.MI    8.865e-01   2.569e-01      3.451  0.000558 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3462.0 on 3123 degrees of freedom
Residual deviance: 3012.6 on 3101 degrees of freedom
AIC: 3058.6
Number of Fisher Scoring iterations: 6

Even though some coefficients have no evidence of statistical significance, the model requires them from a clinical point of view. At this point, the question would be how to interpret these results, and what advantage they offer over odds ratios. From [1] I can understand that in the case of a dichotomous and a continuous predictor, you can adjust for the continuous variable. But when all predictors are dichotomous (due to the design variables), I don't quite see the effect of adjustment. Wouldn't it be better just to split the data in two groups (STR=0 and STR=1), and instead of using logistic regression, use odds ratios for each value of SEG? Cheers, Ramón. [1] D.W. Hosmer and S. Lemeshow. ``Applied Logistic Regression''. John Wiley. 2000. -- Ramón Casero Cañas web: http://www.robots.ox.ac.uk/~rcasero/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
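As a small illustration of the point at issue (invented data): with a dichotomous outcome and one dichotomous predictor, exp() of the glm coefficient is exactly the 2x2-table odds ratio, so the regression only starts to add something once further terms (here SEG) turn it into an adjusted odds ratio.

set.seed(5)
qua <- rbinom(200, 1, 0.4)                 # outcome, Good/Bad
str <- rbinom(200, 1, 0.5)                 # stress, 0/1
tab <- table(str, qua)
(tab[2, 2] * tab[1, 1]) / (tab[1, 2] * tab[2, 1])    # sample odds ratio
exp(coef(glm(qua ~ str, family = binomial))["str"])  # identical to the table OR

With SEG in the model as well, exp(coef) for STR becomes an odds ratio adjusted for segment, which stratified 2x2 tables give only one segment at a time.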
Re: [R] Logistic regression with constrained parameter ??
On Sat, 27 Aug 2005, Benn Fine wrote: I want to fit a logistic regression model under a specified null hypothesis, say Ho: Beta_k = 1. Is there a way to constrain a parameter in glm.fit?? You should be calling glm(). Then you can use offset() in your formula. (You can also use it as an argument, but please don't as it is only partially supported.) [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html PLEASE do (and notice what it says about HTML mail). -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
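A hedged sketch of the suggestion, with made-up data: to test H0: beta_2 = 1, put 1 * x2 into an offset so it enters the linear predictor with a fixed coefficient, then compare against the unconstrained fit with a likelihood-ratio test.

set.seed(3)
x1 <- rnorm(100); x2 <- rnorm(100)
y  <- rbinom(100, 1, plogis(0.5 + x1 + x2))

fit0 <- glm(y ~ x1 + offset(1 * x2), family = binomial)  # coefficient of x2 fixed at 1
fit1 <- glm(y ~ x1 + x2, family = binomial)              # coefficient of x2 free
anova(fit0, fit1, test = "Chisq")                        # LR test of H0: beta_2 = 1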
[R] logistic regression: categorical value, and multinomial
I have two questions: 1. If I want to do a binomial logit, how to handle the categorical response variable? Data for the response variables are not numerical, but text. 2. What if I want to do a multinomial logit, still with categorical response variable? The variable has 5 non-numerical response levels, I have to do it with a multinomial logit. Any input is highly appreciated! Thanks! Ed __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression: categorical value, and multinomial
d <- data.frame(y=sample(letters[1:2],100,T), x=rnorm(100))
head(d,10)
   y           x
1  b  0.55915620
2  b  0.87575380
3  b -0.13093156
4  b  0.75925729
5  b  0.40233427
6  b  1.34685918
7  a  1.10487752
8  a -2.27456596
9  a  1.65919787
10 b  0.05095611
glm(y~x, data=d, family=binomial)

Call: glm(formula = y ~ x, family = binomial, data = d)

Coefficients:
(Intercept)        x
     0.2771   0.5348

Degrees of Freedom: 99 Total (i.e. Null); 98 Residual
Null Deviance: 136.1
Residual Deviance: 129.5  AIC: 133.5

=== 2005-07-28 00:22:55 you wrote in your message: === [original message quoted above] = = = = = = = = = = = = = = = = = = = = 2005-07-28 -- Department of Sociology Fudan University Blog:http://sociology.yculblog.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression: categorical value, and multinomial
Dear Ed, See ?glm for fitting binomial logit models, and ?multinom (in the nnet package) for multinomial logit models. Neither function will handle a character (text) variable as the response, but you could easily convert the variable to a factor. John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] On Behalf Of Haibo Huang Sent: Wednesday, July 27, 2005 11:23 AM To: r-help@stat.math.ethz.ch Subject: [R] logistic regression: categorical value, and multinomial [original message quoted in full above] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
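A minimal sketch of that advice with invented data: convert the character response to a factor, after which glm() handles the two-level case and nnet's multinom() the five-level one.

library(nnet)
d <- data.frame(resp = sample(c("yes", "no"), 200, replace = TRUE),
                x = rnorm(200), stringsAsFactors = FALSE)
d$resp <- factor(d$resp)                   # first level ("no") is the reference

bin.fit <- glm(resp ~ x, family = binomial, data = d)   # binomial logit

d$resp5 <- factor(sample(letters[1:5], 200, replace = TRUE))
mul.fit <- multinom(resp5 ~ x, data = d)   # 5-level multinomial logit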
[R] logistic regression asymptote problem
R-helpers, I have a question about logistic regressions. Consider a case where you have binary data that reaches an asymptote that is not 1, maybe it's 0.5. Can I still use a logistic regression to fit a curve to this data? If so, how can I do this in R? As far as I can figure out, using a logit link function assumes that the asymptote is at y = 1. An example. Consider the following data:

tmp <- structure(list(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
  yes = c(0, 0, 0, 2, 1, 14, 24, 15, 23, 18, 22, 20, 14, 17),
  no = c(94, 101, 95, 80, 81, 63, 51, 56, 30, 38, 31, 18, 21, 20)),
  .Names = c("x", "yes", "no"),
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"),
  class = "data.frame")

where x is the independent variable, and yes and no are counts of events. Plotting the data you can see that the data seem to reach an asymptote at around y=0.5. Using glm to fit a logistic regression it is easily seen that it does not fit well.

tmp.glm <- glm(cbind(yes,no) ~ x, data = tmp, family = binomial(link = "logit"))
plot(tmp.glm$fitted, type = "l", ylim = c(0,1))
par(new=T)
plot(tmp$yes / (tmp$yes + tmp$no), ylim = c(0,1))

Any suggestions would be greatly appreciated. Cheers, Kevin -- Kevin J Emerson Center for Ecology and Evolutionary Biology 1210 University of Oregon University of Oregon Eugene, OR 97403 [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression asymptote problem
I saw a standard overdispersed binomial. In particular, I saw NO evidence of saturation at 0.5 or anything below 1. I did the following:

tmp$N <- tmp$yes + tmp$no
with(tmp, plot(x, yes))
with(tmp, plot(x, yes/N))
tmp.glm <- glm(cbind(yes,no) ~ x, data = tmp, family = binomial(link = "logit"))
tmp.glmq <- glm(cbind(yes,no) ~ x, data = tmp, family = quasibinomial(link = "logit"))
summary(tmp.glm)
summary(tmp.glmq)
plot(tmp.glm)
plot(tmp.glmq)
# Test the statistical significance of the Dispersion parameter
pchisq(summary(tmp.glmq)$dispersion*12, 12, lower=FALSE)

hope this helps. spencer graves Kevin J Emerson wrote: [original message quoted in full above] -- Spencer Graves, PhD Senior Development Engineer PDF Solutions, Inc. 333 West San Carlos Street Suite 700 San Jose, CA 95110, USA [EMAIL PROTECTED] www.pdf.com http://www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
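If an estimated upper asymptote is wanted anyway, one possibility (a sketch, not something endorsed in this thread) is to model p(x) = c * plogis(a + b*x) and fit all three parameters by maximum likelihood with optim(), reusing the tmp data frame from the original post:

negll <- function(par) {
  c. <- plogis(par[1])                  # asymptote kept inside (0, 1)
  p  <- c. * plogis(par[2] + par[3] * tmp$x)
  -sum(dbinom(tmp$yes, tmp$yes + tmp$no, p, log = TRUE))
}
fit <- optim(c(0, -5, 1), negll)        # starting values chosen by eye
plogis(fit$par[1])                      # estimated asymptote; near 0.5 for these data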
Re: [R] logistic regression - using polys and products of features
On Fri, 17 Jun 2005 07:39:30 +1000 Stephen Choularton [EMAIL PROTECTED] wrote: [question quoted in full; see the original post below] What do the warnings mean when I do this:

logistic.model = glm(similarity ~ . + . ^ 2, family=binomial, data = cData[3001:3800,])
Warning messages:
1: Algorithm did not converge in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,
2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,

.^2 means all the 2-order and 1-order terms, so .+.^2 is meaningless.

-- Department of Sociology Fudan University, Shanghai Blog:http://sociology.yculblog.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] logistic regression - using polys and products of features
Hi I can get all my features by doing this:

logistic.model = glm(similarity ~ ., family=binomial, data = cData[3001:3800,])

I can get the product of all my features by this:

logistic.model = glm(similarity ~ . ^ 2, family=binomial, data = cData[3001:3800,])

I don't seem to be able to get polys by doing this:

logistic.model = glm(similarity ~ poly(.,2), family=binomial, data = cData[3001:3800,])
Error in poly(., 2) : Object . not found

How can I get polys? What do the warnings mean when I do this:

logistic.model = glm(similarity ~ . + . ^ 2, family=binomial, data = cData[3001:3800,])
Warning messages:
1: Algorithm did not converge in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,
2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,

How can I do this?

logistic.model = glm(similarity ~ . + . ^ 2 + poly(.,2), family=binomial, data = cData[3001:3800,])

Thanks Stephen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
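Since poly() cannot expand the '.' shorthand, one workaround is to paste the formula together over the predictor names. A sketch assuming every column of cData other than similarity is numeric:

preds <- setdiff(names(cData), "similarity")
rhs   <- paste(sprintf("poly(%s, 2)", preds), collapse = " + ")
f     <- as.formula(paste("similarity ~", rhs))   # similarity ~ poly(v1,2) + poly(v2,2) + ...
logistic.model <- glm(f, family = binomial, data = cData[3001:3800, ])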
Re: [R] Logistic regression with more than two choices
Dear all R-users, I am a new user of R and I am trying to build a discrete choice model (with more than two alternatives A, B, C and D) using logistic regression. I have data that describes the observed choice probabilities and some background information. An example below describes the data:

Sex  Age  pr(A)  pr(B)  pr(C)  pr(D)
  1   11    0.5    0.5      0      0
  1   40      1      0      0      0
  0   34      0      0      0      1
  0   64    0.1    0.5    0.2    0.2
...

You can use multinom(). Here is an example. For example, let this matrix be analyzed:

male  female  aborted  factor
  10      12        1     1.2
  14      14        4     1.3
  15      12        3     1.4

The data are to be entered in a text file like this:

output  factor   n
m          1.2  10
f          1.2  12
a          1.2   1
m          1.3  14
f          1.3  14
a          1.3   4
m          1.4  15
f          1.4  12
a          1.4   3

library(nnet)
dt.plr <- multinom(output ~ factor, data=dt, weights=n, maxit=1000)
dt.pr1 <- predict(dt.plr, type="probs")
dt.pr1

I have been able to model a case with only two alternatives A and not A by using glm(). I do not know what functions are available to estimate such a model with more than two alternatives. multinom() is one possibility, but it only allows the use of binary 0/1-data instead of observed probabilities. Did I understand this correctly? Additionally, I am willing to use different independent variables for the different alternatives in the model. Formally, I mean that:

Pr(A) = exp(uA)/(exp(uA)+exp(uB)+exp(uC)+exp(uD))
Pr(B) = exp(uB)/(exp(uA)+exp(uB)+exp(uC)+exp(uD))
...

where uA, uB, uC and uD are linear functions with different independent variables, e.g. uA=alpha_A1*Age, uB=alpha_B1*Sex. Do you know how to estimate this type of models in R? I don't think it is possible... (at least simply, without writing all the script!) Note that I don't understand where the residual deviance from multinom() comes from. I can't find the logic. Marc -- __ Marc Girondot, Pr Laboratoire Ecologie, Systématique et Evolution Equipe de Conservation des Populations et des Communautés CNRS, ENGREF et Université Paris-Sud 11, UMR 8079 Bâtiment 362 91405 Orsay Cedex, France Tel: 33 1 (0)1.69.15.72.30 Fax: 33 1 (0)1 69 15 56 96 e-mail: [EMAIL PROTECTED] Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html Skype: girondot Fax in US: 1-425-732-6934 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
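One hedged way to feed observed choice probabilities to multinom() is to expand each case into one row per alternative, weighted by its probability (multiplied by a case weight, if one is available). The data below are invented to mirror the table in the question:

library(nnet)
probs <- data.frame(Sex = c(1, 1, 0, 0), Age = c(11, 40, 34, 64),
                    A = c(0.5, 1, 0, 0.1), B = c(0.5, 0, 0, 0.5),
                    C = c(0, 0, 0, 0.2),   D = c(0, 0, 1, 0.2))
long <- reshape(probs, varying = c("A", "B", "C", "D"), v.names = "w",
                timevar = "choice", times = c("A", "B", "C", "D"),
                direction = "long")          # one row per case x alternative
long$choice <- factor(long$choice)
fit <- multinom(choice ~ Sex + Age, weights = w, data = long, subset = w > 0)

The alternative-specific utilities uA, uB, ... with different regressors per alternative are a different (conditional logit) model, which is indeed beyond what multinom() fits.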
[R] logistic regression (glm binary)
Hi I am looking for a couple of pointers using glm (family = binary). 1. I want to add all the products of my predictive features as additional features (and I have 23 of them). Is there some easy way to add them? 2. I want to drop each feature in turn and get the most significant, then drop two and get the next most significant, etc. Is there some function that allows me to do this? Thanks Stephen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] logistic regression (glm binary)
-Original Message- From: [EMAIL PROTECTED] On Behalf Of Stephen Choularton Sent: Tuesday, June 07, 2005 11:49 PM To: R Help Subject: [R] logistic regression (glm binary) Hi I am looking for a couple of pointers using glm (family = binary).

Do you mean binomial instead of binary?

1. I want to add all the products of my predictive features as additional features (and I have 23 of them). Is there some easy way to add them?

Probably something along the line:

dat <- data.frame(y=sample(0:1, 100, replace=TRUE), matrix(runif(300), ncol=3))
fm <- glm(y ~ .^2, family=binomial, data=dat)
summary(fm)

Call:
glm(formula = y ~ .^2, family = binomial, data = dat)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.654  -1.175   0.608   1.116   1.651

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.264      1.536   2.125   0.0336 *
X1            -3.379      2.026  -1.668   0.0953 .
X2            -4.659      2.244  -2.077   0.0378 *
X3            -3.531      2.060  -1.714   0.0865 .
X1:X2          4.535      2.775   1.634   0.1022
X1:X3          2.123      2.639   0.804   0.4212
X2:X3          4.315      2.746   1.571   0.1161
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)
Null deviance: 137.99 on 99 degrees of freedom
Residual deviance: 131.84 on 93 degrees of freedom
AIC: 145.84
Number of Fisher Scoring iterations: 3

2. I want to drop each feature in turn and get the most significant, then drop two and get the next most significant, etc. Is there some function that allows me to do this?

Not that I know of, and most likely for a very, very good reason... Andy

Thanks Stephen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
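For question 2, the closest off-the-shelf tool (with the same caveat about stepwise selection) is drop1(), which refits the model dropping one term at a time; using the fm from the example above:

drop1(fm, test = "Chisq")   # single-term deletions with likelihood-ratio tests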
[R] logistic regression
Hi I am working on corpora of automatically recognized utterances, looking for features that predict error in the hypothesis the recognizer is proposing. I am using the glm functions to do logistic regression. I do this type of thing:

logistic.model = glm(formula = similarity ~ ., family = binomial, data = data)

and end up with a model:

summary(logistic.model)

Call: glm(formula = similarity ~ ., family = binomial, data = data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.1599   0.2334   0.3307   0.4486   1.2471

Coefficients:
                        Estimate  Std. Error z value Pr(>|z|)
(Intercept)           11.1923783   4.6536898   2.405  0.01617 *
length                -0.3529775   0.2416538  -1.461  0.14410
meanPitch             -0.0203590   0.0064752  -3.144  0.00167 **
minimumPitch           0.0257213   0.0053092   4.845 1.27e-06 ***
maximumPitch          -0.0003454   0.0030008  -0.115  0.90838
meanF1                 0.0137880   0.0047035   2.931  0.00337 **
meanF2                 0.0040238   0.0041684   0.965  0.33439
meanF3                -0.0075497   0.0026751  -2.822  0.00477 **
meanF4                -0.0005362   0.0007443  -0.720  0.47123
meanF5                -0.0001560   0.0003936  -0.396  0.69187
ratioF2ToF1            0.2668678   2.8926149   0.092  0.92649
ratioF3ToF1            1.7339087   1.7655757   0.982  0.32607
jitter                -5.2571384  10.8043359  -0.487  0.62656
shimmer               -2.3040826   3.0581950  -0.753  0.45120
percentUnvoicedFrames  0.1959342   1.3041689   0.150  0.88058
numberOfVoiceBreaks   -0.1022074   0.0823266  -1.241  0.21443
percentOfVoiceBreaks  -0.0590097   1.2580202  -0.047  0.96259
meanIntensity         -0.0765124   0.0612008  -1.250  0.21123
minimumIntensity       0.1037980   0.0331899   3.127  0.00176 **
maximumIntensity      -0.0389995   0.0430368  -0.906  0.36484
ratioIntensity        -2.0329346   1.2420286  -1.637  0.10168
noSyllsIntensity       0.1157678   0.0947699   1.222  0.22187
startSpeech            0.0155578   0.1343117   0.116  0.90778
speakingRate          -0.2583315   0.1648337  -1.567  0.11706
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2462.3 on 4310 degrees of freedom
Residual deviance: 2209.5 on 4287 degrees of freedom
AIC: 2257.5
Number of Fisher Scoring iterations: 6

I have seen models where almost all the features are showing one in a thousand significance but I accept that I could improve my model by normalizing some of the features (some are left skewed and I understand that I will get a better fit by taking their logs, for example). What really worries me is that the logistic function produces predictions that appear to fall well outside 0 to 1. If I make a dataset of the medians of the above features and use my logistic.model on it, it produces a figure of:

x = predict(logistic.model, medians)
x
[1] 2.82959

which is well outside the range of 0 to 1. The actual distribution of all the predictions is:

summary(pred)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -1.516   2.121   2.720   2.731   3.341   6.387

I can get the model to give some sort of prediction by doing this:

pred = predict(logistic.model, data)
pred[pred <= 1.5] = 0
pred[pred > 1.5] = 1
t = table(pred, data[,24])
t
pred    0    1
   0  102  253
   1  255 3701

classAgreement(t)
$diag
[1] 0.8821619
$kappa
[1] 0.949
$rand
[1] 0.7920472
$crand
[1] 0.1913888

but as you can see I am using a break point well outside the range 0 to 1 and the kappa is rather low (I think). I am a bit of a novice in this, and the results worry me. Can anyone comment if the results look strange, or if they know I am doing something wrong? Stephen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression
predict.glm by default produces predictions on the scale of the linear predictors. If, in a logistic regression, you want the predictions to be on the response scale [0,1], use

x <- predict(logistic.model, medians, type="response")

for example. See ?predict.glm for details. Cheers, Simon. [original message quoted in full above] -- Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat. Visiting Fellow School of Botany & Zoology The Australian National University Canberra ACT 0200 Australia T: +61 2 6125 8057 email: [EMAIL PROTECTED] F: +61 2 6125 5573 CRICOS Provider # 00120C __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
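A small self-contained illustration of the two scales (invented data): the response scale is simply the inverse logit of the link scale.

set.seed(4)
x <- rnorm(100); y <- rbinom(100, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)

lp <- predict(fit, type = "link")        # linear predictor, unbounded
pr <- predict(fit, type = "response")    # probabilities in (0, 1)
all.equal(pr, plogis(lp))                # TRUE: response = inverse logit of link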
[R] logistic regression: differential importance of regressors
Hi, All. I have a logistic regression model that I have run. The question came up: which of these regressors is more important than another? (I'm using Design)

Logistic Regression Model
lrm(formula = iconicgesture ~ ST + SSP + magnitude + Condition + Expertise, data = d)

            Coef     S.E.    Wald Z  P
Intercept   -3.2688  0.2854  -11.45  0.0000
ST           2.0871  0.2730    7.64  0.0000
SSP          0.7454  0.3031    2.46  0.0139
magnitude   -0.9905  0.6284   -1.58  0.1150
Condition    0.9506  0.2932    3.24  0.0012
Expertise    0.8508  0.2654    3.21  0.0013

The real question is that, since both ST and SSP load significantly into the model, how do I show that ST has a bigger/smaller/similar effect than SSP? thanks in advance! greg __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression: differential importance of regressors
Greg Trafton wrote: [original message quoted in full above] One thing you can do is to compute what proportion of the total likelihood ratio chi-square statistic is due to each variable by removing one at a time and looking at the difference in Model L.R. (assuming both have the same observations missing). Note that you are making heavy linearity assumptions. You can also use the bootstrap to get a confidence interval on the rank of the chi-square statistic a variable has among all competing chi-squares. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
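A hedged sketch of the first suggestion using plain glm() and nested fits; the names mirror the lrm() call above, with d being the poster's data frame:

full <- glm(iconicgesture ~ ST + SSP + magnitude + Condition + Expertise,
            family = binomial, data = d)
no.ST  <- update(full, . ~ . - ST)    # refit without ST
no.SSP <- update(full, . ~ . - SSP)   # refit without SSP

## likelihood-ratio chi-square attributable to each variable (1 d.f. each):
anova(no.ST,  full, test = "Chisq")
anova(no.SSP, full, test = "Chisq")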
[R] logistic regression weights problem
Hi All, I have a problem with weighted logistic regression. I have a number of SNPs and a case/control scenario, but not all genotypes are as guaranteed as others, so I am using weights to downsample the importance of individuals whose genotype has been heavily inferred. My data is quite big, but with a dummy example:

status <- c(1,1,1,0,0)
SNPs <- matrix( c(1,0,1,0,0,0,0,1,0,1,0,1,0,1,1), ncol = 3)
weight <- c(0.2, 0.1, 1, 0.8, 0.7)
glm(status ~ SNPs, weights = weight, family = binomial)

Call: glm(formula = status ~ SNPs, family = binomial, weights = weight)

Coefficients:
(Intercept)    SNPs1    SNPs2    SNPs3
     -2.079   42.282  -18.964       NA

Degrees of Freedom: 4 Total (i.e. Null); 2 Residual
Null Deviance: 3.867
Residual Deviance: 0.6279  AIC: 6.236
Warning messages:
1: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)
2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,

NB I do not get warning (2) for my data so I'll completely disregard it. Warning (1) looks suspiciously like a multiplication of my C/C status by the weights... what exactly is glm doing with the weight vector? In any case, how would I go about weighting my individuals in a logistic regression? Regards, Federico Calboli -- Federico C. F. Calboli Department of Epidemiology and Public Health Imperial College, St Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression weights problem
On Wed, 13 Apr 2005, Federico Calboli wrote: [dummy example quoted in full above] what exactly is glm doing with the weight vector?

Using it in the GLM definition. If you specify 0 <= y_i <= 1 and weights a_i, this is how you specify Binomial(a_i, a_i y_i). Look up any book on GLMs and see what it says about the binomial. E.g. MASS4 pp. 184, 190.

In any case, how would I go about weighting my individuals in a logistic regression?

Use the cbind(yes, no) form of specification. Note though that the `weights' in a GLM are case weights and not arbitrary downweighting factors, and aspects of the output (e.g. AIC, anova) depend on this. A different implementation of (differently) weighted GLM is svyglm() in package 'survey'. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression weights problem
On Wed, 2005-04-13 at 17:42 +0100, Prof Brian Ripley wrote: Use the cbind(yes, no) form of specification. Note though that the `weights' in a GLM are case weights and not arbitrary downweighting factors and aspects of the output (e.g. AIC, anova) depend on this. A different implementation of (differently) weighted GLM is svyglm() in package 'survey'.

I tried to use cbind() on a slightly modified dummy set to get rid of all warnings and that's what I got:

status <- c(1,1,1,0,0)
SNPs <- matrix( c(1,0,1,0,0,1,0,1,0,1,0,1,0,1,1), ncol = 3)
weight <- c(0.2, 0.1, 1, 0.8, 0.7)

### using cbind()
glm(cbind(status, 1-status) ~ SNPs, weights = weight, family = binomial)

Call: glm(formula = cbind(status, 1 - status) ~ SNPs, family = binomial, weights = weight)

Coefficients:
(Intercept)    SNPs1    SNPs2    SNPs3
     -2.079   43.132  -19.487       NA

Degrees of Freedom: 4 Total (i.e. Null); 2 Residual
Null Deviance: 3.867
Residual Deviance: 0.6279  AIC: 6.236

### NOT using cbind()
glm(status ~ SNPs, weights = weight, family = binomial)

Call: glm(formula = status ~ SNPs, family = binomial, weights = weight)

Coefficients:
(Intercept)    SNPs1    SNPs2    SNPs3
     -2.079   42.944  -19.366       NA

Degrees of Freedom: 4 Total (i.e. Null); 2 Residual
Null Deviance: 3.867
Residual Deviance: 0.6279  AIC: 6.236
Warning message: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)

### The anova() call of the cbind() model seems happy:

mod <- glm(cbind(status, 1-status) ~ SNPs, weights = weight, family = binomial)
anova(mod, test = "Chi")

Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(status, 1 - status)
Terms added sequentially (first to last)
      Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                      4     3.8673
SNPs   2   3.2394         2     0.6279    0.1980

### The real data model does not show warnings when I use cbind() but it still shows warnings when I call anova() on the model:

anova(glm(cbind(sta, 1-sta) ~ (X1 + X2 + X3 + X4 + X5) * breed, family = binomial, weights = igf$we, igf), test = "Chi")

Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(sta, 1 - sta)
Terms added sequentially (first to last)
          Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                        330     372.89
X1         1     0.14       329     372.75      0.71
X2         1     0.26       328     372.49      0.61
X3         1 0.001121       327     372.49      0.97
X4         1     2.63       326     369.86      0.10
X5         1     1.87       325     367.99      0.17
breed      3     5.41       322     362.58      0.14
X1:breed   3     2.98       319     359.60      0.39
X2:breed   3     1.21       316     358.39      0.75
X3:breed   2     1.32       314     357.08      0.52
X4:breed   3     1.75       311     355.33      0.63
X5:breed   2     2.38       309     352.95      0.30

Warning messages: "non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)" (repeated 10 times)

Why such inconsistency? Regards, Federico Calboli -- Federico C. F. Calboli Department of Epidemiology and Public Health Imperial College, St Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
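For what it is worth, the warning goes away only when the two-column response holds genuine integer counts, i.e. when the weights really are case weights. A sketch with invented integer replicate counts n in place of the fractional certainty weights:

status <- c(1, 1, 1, 0, 0)
SNPs   <- matrix(c(1,0,1,0,0,1,0,1,0,1,0,1,0,1,1), ncol = 3)
n      <- c(2, 1, 10, 8, 7)                          # integer counts, not 0-1 weights
yes    <- status * n                                 # successes per covariate pattern
glm(cbind(yes, n - yes) ~ SNPs, family = binomial)   # fits silently

Fractional "certainty" weights are not case weights, which is why the warning keeps coming back; as noted above, svyglm() in the survey package is one principled alternative.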
Re: [R] Logistic regression goodness of fit tests
Trevor Wiens wrote: On Thu, 10 Mar 2005 16:19:41 -0600 Frank E Harrell Jr [EMAIL PROTECTED] wrote: The goodness of fit test only works on prespecified models. It is not valid when stepwise variable selection is used (unless perhaps you use alpha=0.5). Perhaps I'm blind, but I can't find any reference to an alpha parameter on the help page for stepAIC. It runs fine however when I set the parameter and produces a model with the same right hand variables as without. Can you tell me what it is? T What I mean is the effective significance level for keeping a variable in the model. Using AIC for one degree of freedom variables is effectively using an alpha of 0.16 if I recall properly. But I hope you got the point that resid(fit,'gof') as with most goodness of fit tests assumes prespecified models. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Logistic regression goodness of fit tests
On Fri, 11 Mar 2005 07:32:30 -0500 Frank E Harrell Jr [EMAIL PROTECTED] wrote: What I mean is the effective significance level for keeping a variable in the model. Using AIC for one degree of freedom variables is effectively using an alpha of 0.16 if I recall properly. But I hope you got the point that resid(fit,'gof') as with most goodness of fit tests assumes prespecified models. Yes. I understand. Late last night someone replied similarly off the list but I didn't realize they hadn't posted to the list until this morning, otherwise I would have posted a note that my question had been resolved. Sorry for taking your time. T -- Trevor Wiens [EMAIL PROTECTED] The significant problems that we face cannot be solved at the same level of thinking we were at when we created them. (Albert Einstein) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Logistic regression goodness of fit tests
I was unsure of what suitable goodness-of-fit tests existed in R for logistic regression. After searching the R-help archive I found that the Design package's models and resid() could be used to calculate this as follows:

d <- datadist(mydataframe)
options(datadist = 'd')
fit <- lrm(response ~ predictor1 + predictor2..., data=mydataframe, x=T, y=T)
resid(fit, 'gof')

I set up a script to first use glm to create models and then use stepAIC to determine the optimal model. I used this instead of fastbw because I found the AIC values to be completely different and the final models didn't always match. Then my script takes the reduced model formula and recreates it using lrm as above. Now the problem is that for some models I run into an error to which I can find no reference whatsoever on the mailing list or on the web. It is as follows:

test.lrm <- lrm(cclo ~ elev + aspect + cti_var + planar + feat_div + loamy + sands + sandy + wet + slr_mean, data=datamatrix, x = T, y = T)
singular information matrix in lrm.fit (rank= 10 ). Offending variable(s): slr_mean
Error in j:(j + params[i] - 1) : NA/NaN argument

Now if I add the singularity criterion and make the value smaller than the default of 1E-7 (to 1E-9, or to 1E-12, which is the default in calibrate), it works. Why is that? Not being a statistician but a biogeographer using regression as a tool, I don't really understand what is happening here. Does changing the tol variable change how I should interpret goodness-of-fit results or other evaluations of the models created? I've included a summary of the data below (in case it might be helpful) with all variables in the data frame as it was easier than selecting out the ones used in the model. Thanks in advance. T -- Trevor Wiens [EMAIL PROTECTED] The significant problems that we face cannot be solved at the same level of thinking we were at when we created them. (Albert Einstein)

summary(datamatrix)
     siteid          block          recordyear        cclo
 564-125:   5   Min.   :1.000   Min.   :2000   Min.   :0.0000
 564-130:   5   1st Qu.:2.000   1st Qu.:2001   1st Qu.:1.0000
 564-135:   5   Median :3.000   Median :2002   Median :1.0000
 564-140:   5   Mean   :3.042   Mean   :2002   Mean   :0.7509
 564-145:   5   3rd Qu.:4.000   3rd Qu.:2003   3rd Qu.:1.0000
 564-150:   5   Max.   :5.000   Max.   :2004   Max.   :1.0000
 (Other) :1098

      elev             slope            aspect          slr_mean
 Min.   :0.0000   Min.   :0.1499   Min.   :0.0000   Min.   :7681
 1st Qu.:0.0000   1st Qu.:0.5876   1st Qu.:0.0000   1st Qu.:7852
 Median :1.0000   Median :0.9195   Median :0.0000   Median :7877
 Mean   :0.6259   Mean   :1.2523   Mean   :0.2482   Mean   :7871
 3rd Qu.:1.0000   3rd Qu.:1.6694   3rd Qu.:0.0000   3rd Qu.:7892
 Max.   :1.0000   Max.   :5.3366   Max.   :1.0000   Max.   :7981

      cti            cti_var           planar          feat_div
 Min.   :7.157   Min.   :0.4497   Min.   :0.0000   Min.   :1.000
 1st Qu.:7.651   1st Qu.:0.6187   1st Qu.:1.0000   1st Qu.:2.000
 Median :7.720   Median :0.8495   Median :1.0000   Median :3.000
 Mean   :7.763   Mean   :0.9542   Mean   :0.8254   Mean   :3.379
 3rd Qu.:7.822   3rd Qu.:1.1918   3rd Qu.:1.0000   3rd Qu.:4.000
 Max.   :8.769   Max.   :2.5615   Max.   :1.0000   Max.   :6.000

    chop_san           loamy            sands            sandy
 Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000
 Median :0.00000   Median :0.0000   Median :0.0000   Median :0.0000
 Mean   :0.05762   Mean   :0.3094   Mean   :0.3236   Mean   :0.1099
 3rd Qu.:0.00000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000
 Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000

      wet           timesinceburn         ndvi             evi
 Min.   :0.00000   Min.   :  1.00   Min.   :0.1140   Min.   :0.1041
 1st Qu.:0.00000   1st Qu.:100.00   1st Qu.:0.2973   1st Qu.:0.1667
 Median :0.00000   Median :100.00   Median :0.3342   Median :0.2027
 Mean   :0.01950   Mean   : 87.84   Mean   :0.3629   Mean   :0.2184
 3rd Qu.:0.00000   3rd Qu.:100.00   3rd Qu.:0.4463   3rd Qu.:0.2711
 Max.   :1.00000   Max.   :100.00   Max.   :0.5932   Max.   :0.4788

     msavi2              fc              gdd             precip
 Min.   :0.09156   Min.   :0.1552   Min.   :380.6   Min.   : 50.04
 1st Qu.:0.14936   1st Qu.:0.3246   1st Qu.:492.8   1st Qu.: 76.17
 Median :0.18257   Median :0.4082   Median :500.8   Median : 85.50
 Mean   :0.19653   Mean   :0.4398   Mean   :476.4   Mean   : 94.35
 3rd Qu.:0.24626   3rd Qu.:0.5630   3rd Qu.:501.6   3rd Qu.: 95.16
 Max.   :0.33258   Max.   :0.6996   Max.   [output truncated in the archive]
Re: [R] Logistic regression goodness of fit tests
On Thu, 10 Mar 2005, Trevor Wiens wrote: [original message quoted in full above] From one geographer to another, and being prepared to bow to better-founded explanations, you seem to have included a variable - the offending variable slr_mean - that is very highly correlated with another. Making the tolerance tighter says that you are prepared to take the risk of confounding your results. You've already been fishing for right hand side variables anyway, so your results are somewhat prejudiced, aren't they? I think you may also like to review which of the right hand side variables should be treated as factors rather than numeric (looking at the summary suggests that many are factors), and perhaps the dependent variable too, although lrm() seems to take care of this if you haven't. [remainder of the original message quoted] -- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Breiviksveien 40, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93 e-mail: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Logistic regression goodness of fit tests
Trevor Wiens wrote: [original message quoted in full above] The goodness of fit test only works on prespecified models. It is not valid when stepwise variable selection is used (unless perhaps you use alpha=0.5). -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
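For completeness, the binning-free test this thread relies on (le Cessie-van Houwelingen, via resid(fit, 'gof')) on a prespecified model, sketched with invented data; Design is the package named here, continued in later years as rms:

library(Design)   # library(rms) in current R
set.seed(6)
x <- rnorm(200); y <- rbinom(200, 1, plogis(x))
fit <- lrm(y ~ x, x = TRUE, y = TRUE)   # x=TRUE, y=TRUE needed for residuals
resid(fit, "gof")                       # single-d.f. goodness-of-fit test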
Re: [R] Logistic regression goodness of fit tests
On Thu, 10 Mar 2005 22:36:09 +0100 (CET) Roger Bivand [EMAIL PROTECTED]
wrote:

> From one geographer to another, and being prepared to bow to
> better-founded explanations: you seem to have included a variable - the
> offending variable slr_mean - that is very highly correlated with
> another. Making the tolerance tighter says that you are prepared to take
> the risk of confounding your results. You've already been fishing for
> right-hand-side variables anyway, so your results are somewhat
> prejudiced, aren't they? I think you may also like to review which of
> the right-hand-side variables should be treated as factors rather than
> numeric (the summary suggests that many are factors), and perhaps the
> dependent variable too, although lrm() seems to take care of this if you
> haven't.

Thanks for your informative reply. The nature of the research is habitat
selection for 15 species of grassland birds (my masters project). The
response here is presence of Chestnut-collared Longspur (cclo). I very
carefully reviewed the variables for collinearity, and none of them showed
any difficulty except in a few cases, which led me to split what would
otherwise have been one reasonable model into two. Just out of interest,
however, I did run the global model, and this problem didn't occur. Based
on your comments, that seems to indicate I'm seeing an interaction effect,
not a result of two closely correlated variables.

I don't think I've been fishing. I selected variables for inclusion in
competing models based on ecologically reasonable criteria. I did examine
relationships between species occurrence and static variables such as
DEM-derived variables, to see whether the data supported including all the
variables that, on ecological grounds, should explain the birds'
distribution. I have included some variables in spite of weak statistical
relationships, based on a paper by Anderson, Burnham and Thompson in the
Journal of Wildlife Management, which discusses how ecologically
significant factors can contribute to a model through interaction effects,
helping explain the response variable even though they are not
statistically significant individually. So I've tried to avoid fishing,
and instead simply to select the most parsimonious models from the set of
candidate models for each species.

Based on your advice, once I've selected my top candidate models I'll
re-run them at lower tolerances and only keep models that pass at that
level. Alternatively, I could simply reject models that don't pass at lower
tolerances. I do find it curious, however, that they run fine using glm
(family = binomial) without complaint.

Thanks again.

T
-- 
Trevor Wiens
[EMAIL PROTECTED]

The significant problems that we face cannot be solved at the same level of
thinking we were at when we created them. (Albert Einstein)
Re: [R] Logistic regression goodness of fit tests
On Thu, 10 Mar 2005 16:19:41 -0600 Frank E Harrell Jr [EMAIL PROTECTED]
wrote:

> The goodness of fit test only works on prespecified models. It is not
> valid when stepwise variable selection is used (unless perhaps you use
> alpha = 0.5).

Perhaps I'm blind, but I can't find any reference to an alpha parameter on
the help page for stepAIC. It runs fine, however, when I set the parameter,
and produces a model with the same right-hand-side variables as without.
Can you tell me what it is?

T
-- 
Trevor Wiens
[EMAIL PROTECTED]

The significant problems that we face cannot be solved at the same level of
thinking we were at when we created them. (Albert Einstein)
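For what it's worth, stepAIC() indeed has no alpha argument (extra named
arguments are silently absorbed, which is why the call still runs); the
knob that plays that role is the penalty multiplier k passed through to
extractAIC(). For a single-degree-of-freedom term, AIC-style selection
keeps the term when its chi-square exceeds k, so a retention level of
alpha = 0.5 corresponds roughly to k = qchisq(0.5, 1, lower.tail = FALSE),
about 0.455, while the default k = 2 is roughly alpha = 0.157. A hedged
sketch, assuming a previously fitted glm called full.glm:

    library(MASS)
    ## Express an alpha = 0.5 retention level as an AIC penalty k:
    ## a 1-df term is kept when chisq > k, i.e. when
    ## p < pchisq(k, 1, lower.tail = FALSE).
    k50 <- qchisq(0.5, df = 1, lower.tail = FALSE)   # ~ 0.455
    step50 <- stepAIC(full.glm, k = k50)             # full.glm: hypothetical fit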
RE: [R] logistic regression and 3PL model
Dear Mike,

You appear to have misinterpreted Brian Ripley's advice. The logitreg()
function that you've included in your message hasn't been modified to
specify a lower bound (higher than 0) for the probability of success. The
lower argument, which gets passed through to nlminb(), specifies lower
bounds for the parameters, not for the fitted probabilities.

You also apparently have sent the S-PLUS version of the function. To get
this to run in R, you should replace the call to nlminb() with one to
optim(), and you'll have to modify the local function gmin() slightly.

I hope this helps,
 John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140 x23604
http://socserv.mcmaster.ca/jfox

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Michael Y. Lau
Sent: Saturday, February 19, 2005 9:46 PM
To: r-help@stat.math.ethz.ch
Subject: [R] logistic regression and 3PL model

Hello colleagues,

This is a follow-up to a question I posed in November regarding an analysis
I was working on. Thank you to Dr. Brian Ripley and Dr. John Fox for
helping me out at that time.

I am conducting a logistic regression on a data set of psi (ESP) ganzfeld
trials. The response variable is binary (correct/incorrect), with a 25%
guessing base rate. Dr. Ripley suggested that I modify the maximum-
likelihood fitting of a binomial logistic regression presented in MASS
(p. 445):

    logitreg <- function(x, y, wt = rep(1, length(y)), intercept = T,
                         start = rep(0, p), ...)
    {
      fmin <- function(beta, X, y, w) {
        p <- plogis(X %*% beta)
        -sum(2 * w * ifelse(y, log(p), log(1 - p)))
      }
      gmin <- function(beta, X, y, w) {
        eta <- X %*% beta; p <- plogis(eta)
        -2 * (w * dlogis(eta) * ifelse(y, 1/p, -1/(1 - p))) %*% X
      }
      if (is.null(dim(x))) dim(x) <- c(length(x), 1)
      dn <- dimnames(x)[[2]]
      if (!length(dn)) dn <- paste("Var", 1:ncol(x), sep = "")
      p <- ncol(x) + intercept
      if (intercept) { x <- cbind(1, x); dn <- c("(Intercept)", dn) }
      if (is.factor(y)) y <- (unclass(y) != 1)
      fit <- nlminb(start, fmin, gmin, X = x, y = y, w = wt,
                    method = "BFGS", ...)
      names(fit$par) <- dn
      cat("\nCoefficients:\n"); print(fit$par)
      # R: use fit$value and fit$convergence
      cat("\nResidual Deviance:", format(fit$objective), "\n")
      cat("\nConvergence message:", fit$message, "\n")
      invisible(fit)
    }

I have been using lower = .25 in the call to logitreg:

    options(contrasts = c("contr.treatment", "contr.poly"))
    X <- model.matrix(longhit ~ age*gender, data = data)[, -1]
    logitreg(X, data$longhit, lower = .25)

However, this is giving me .25 as the value for the intercept and all the
regression weights. Am I correcting for the guessing base rate correctly?
Any suggestions or pointers would be greatly appreciated. Thank you in
advance.

Mike Lau
-- 
Michael Y. Lau, M.A.
118 Haggar Hall
Department of Psychology
University of Notre Dame
Notre Dame, IN 46556
(574) 631-1904
[EMAIL PROTECTED]
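The substantive fix John's reply points at is to change the likelihood
itself so that fitted probabilities are floored at the guessing rate,
rather than bounding the coefficients. A minimal R sketch of that idea
follows; it is an editor's illustration under stated assumptions, not code
from the thread, and logitreg3pl is a hypothetical name:

    ## Logistic regression with a fixed guessing floor (3PL-style lower
    ## asymptote). Fitted success probability is guess + (1-guess)*plogis(eta),
    ## so it can never fall below the base rate. Uses optim's numeric gradient.
    logitreg3pl <- function(x, y, wt = rep(1, length(y)),
                            guess = 0.25, intercept = TRUE, ...) {
      fmin <- function(beta, X, y, w) {
        p <- guess + (1 - guess) * plogis(X %*% beta)  # floored probability
        -sum(2 * w * ifelse(y, log(p), log(1 - p)))    # -2 * log-likelihood
      }
      if (is.null(dim(x))) dim(x) <- c(length(x), 1)
      if (is.null(colnames(x))) colnames(x) <- paste0("Var", seq_len(ncol(x)))
      if (intercept) x <- cbind("(Intercept)" = 1, x)
      if (is.factor(y)) y <- (unclass(y) != 1)
      fit <- optim(rep(0, ncol(x)), fmin, X = x, y = y, w = wt,
                   method = "BFGS", ...)
      names(fit$par) <- colnames(x)
      fit
    }

With the poster's design matrix, logitreg3pl(X, data$longhit) would then
estimate slopes above a fixed 25% floor instead of clamping the
coefficients themselves at 0.25.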
[R] logistic regression and 3PL model
Hello colleagues,

This is a follow-up to a question I posed in November regarding an analysis
I was working on. Thank you to Dr. Brian Ripley and Dr. John Fox for
helping me out at that time.

I am conducting a logistic regression on a data set of psi (ESP) ganzfeld
trials. The response variable is binary (correct/incorrect), with a 25%
guessing base rate. Dr. Ripley suggested that I modify the maximum-
likelihood fitting of a binomial logistic regression presented in MASS
(p. 445):

    logitreg <- function(x, y, wt = rep(1, length(y)), intercept = T,
                         start = rep(0, p), ...)
    {
      fmin <- function(beta, X, y, w) {
        p <- plogis(X %*% beta)
        -sum(2 * w * ifelse(y, log(p), log(1 - p)))
      }
      gmin <- function(beta, X, y, w) {
        eta <- X %*% beta; p <- plogis(eta)
        -2 * (w * dlogis(eta) * ifelse(y, 1/p, -1/(1 - p))) %*% X
      }
      if (is.null(dim(x))) dim(x) <- c(length(x), 1)
      dn <- dimnames(x)[[2]]
      if (!length(dn)) dn <- paste("Var", 1:ncol(x), sep = "")
      p <- ncol(x) + intercept
      if (intercept) { x <- cbind(1, x); dn <- c("(Intercept)", dn) }
      if (is.factor(y)) y <- (unclass(y) != 1)
      fit <- nlminb(start, fmin, gmin, X = x, y = y, w = wt,
                    method = "BFGS", ...)
      names(fit$par) <- dn
      cat("\nCoefficients:\n"); print(fit$par)
      # R: use fit$value and fit$convergence
      cat("\nResidual Deviance:", format(fit$objective), "\n")
      cat("\nConvergence message:", fit$message, "\n")
      invisible(fit)
    }

I have been using lower = .25 in the call to logitreg:

    options(contrasts = c("contr.treatment", "contr.poly"))
    X <- model.matrix(longhit ~ age*gender, data = data)[, -1]
    logitreg(X, data$longhit, lower = .25)

However, this is giving me .25 as the value for the intercept and all the
regression weights. Am I correcting for the guessing base rate correctly?
Any suggestions or pointers would be greatly appreciated. Thank you in
advance.

Mike Lau
-- 
Michael Y. Lau, M.A.
118 Haggar Hall
Department of Psychology
University of Notre Dame
Notre Dame, IN 46556
(574) 631-1904
[EMAIL PROTECTED]
[R] logistic regression
Hi,

I'm using the glm function to do logistic regression, and I want to know
whether there is some kind of R-squared for this function in order to check
the model. Thank you.
Re: [R] logistic regression
Terry Elrod posted functions for (pseudo-)R^2 on S-news back in 1998:

http://www.biostat.wustl.edu/archives/html/s-news/1998-02/msg00267.html

I am not sure that it performs well as a check of the model, so you might
want to investigate that further.

[EMAIL PROTECTED] wrote:
> I'm using the glm function to do logistic regression, and I want to know
> whether there is some kind of R-squared for this function in order to
> check the model.

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894
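A common quick alternative is McFadden's pseudo-R^2, which can be computed
directly from any binomial glm; a minimal sketch with hypothetical names
(df, y, x1, x2):

    ## McFadden's pseudo-R^2: 1 - (residual deviance / null deviance)
    fit <- glm(y ~ x1 + x2, family = binomial, data = df)
    1 - fit$deviance / fit$null.deviance

Note that this is only one of several definitions; lrm() in the Design
package, for instance, reports a Nagelkerke R^2 in its printout.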
[R] logistic regression
Helene,

You should read up on AIC, deviance, and deviance residuals, then look at
the summary() of your model.

Dan

Dr. Daniel P. Bebber
Department of Plant Sciences
University of Oxford
OX1 3RB, UK

On Tue, 8 Feb 2005, [EMAIL PROTECTED] wrote:
> Hi, I'm using the glm function to do logistic regression, and I want to
> know whether there is some kind of R-squared for this function in order
> to check the model. Thank you.
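Concretely, the quantities Dan mentions are each one call away for a fitted
binomial glm; a sketch, where fit is a hypothetical fitted model object:

    summary(fit)                        # coefficients, null vs residual deviance
    AIC(fit)                            # Akaike information criterion
    residuals(fit, type = "deviance")   # deviance residuals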