Re: [R] Binary response GLM Question

2011-06-05 Thread casperyc
Hi Josh,

Thank you for your reply.

The reason I think of Y (0 or 1) here as p is that each observation is just
a Bernoulli trial, so in this case the binomial n = 1. And yet R still fits
it (with the logit link). I know the expression for the logit link, so I
assumed I could take y as p in my problem. Maybe I am wrong; I will read
some more background and try to work it out.

For now, I can only think of Bernoulli trials, and I need to use glm. Do I
need to find the correct response (the link function)?

Can any of you point me in the right direction, or to some R examples or
reference books?

Thanks!

--
View this message in context: 
http://r.789695.n4.nabble.com/Binary-response-GLM-Question-tp3574478p3574620.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Binary response GLM Question

2011-06-04 Thread casperyc
Hi all,

I have a problem with binary response data in GLM fitting.
The problem is that y takes only the values 1 or 0, and if I use the logit
link, it is the log of the odds, p/(1-p). In my situation, I think y is p,
so sometimes the odds are 0 and sometimes they are 1/0, which is (should be)
undefined. I wonder how R fits the glm?

The FULL detail of this exercise is as follows:
--
The data here are concerned with whether people default on a loan taken from
a particular bank, at identical interest rates and for a fixed period.
The information on each individual is their sex (male or female), their
income (in pounds), whether the person is a home owner or not, their age (in
years), and the amount of the loan (in pounds).

The information recorded is whether the individual defaulted on the loan or
not. Study the data and try to understand the relation between a person's
characteristics and defaulting. Specifically, what is your estimated
probability that a female aged 42, who is not a home owner, has an income of
23,500, and took a loan of 12,000, defaults on the loan?

The table holding the data has headings as follows:

m/f: male=1, female=0
age: age in years
home: home=1 is a home owner, home=0 is not a home owner
inc: income
loan: amount of loan
def: default=1, non-default=0.

--

my R code

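# Read the data and name the columns
Q3=read.table("tabl3.dat")
colnames(Q3)=c("Sex","Age","Home","Inc","Loan","Def")
# Treat the 0/1 codes as categorical
Q3$Sex=as.factor(Q3$Sex)
Q3$Home=as.factor(Q3$Home)
Q3$Def=as.factor(Q3$Def)

# Logistic regression: binomial family with the logit link
Q3.mod=glm(Def~Sex+Age+Home+Inc+Loan,data=Q3,family=binomial(logit))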

I don't really understand HOW R actually fits the model if there is a 1/0
that it has to calculate.
It does give me some results, but they don't feel quite right to me.
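
For the specific question in the exercise (a female aged 42, not a home owner, income 23,500, loan 12,000), a fitted model of this form can be turned into a probability with predict(). The following is only a sketch: the data are simulated with made-up coefficients because tabl3.dat is not reproduced here, and `new_person` and `p_hat` are names introduced for illustration.

```r
# Simulated stand-in for tabl3.dat (hypothetical values, NOT the real data)
set.seed(42)
n <- 300
Q3 <- data.frame(
  Sex  = factor(rbinom(n, 1, 0.5)),           # male = 1, female = 0
  Age  = round(runif(n, 20, 65)),
  Home = factor(rbinom(n, 1, 0.5)),           # home owner = 1
  Inc  = round(runif(n, 10000, 50000)),
  Loan = round(runif(n, 1000, 20000))
)
# Made-up default mechanism, used only to generate a 0/1 response
eta <- -1 + 0.0002 * Q3$Loan - 0.0001 * Q3$Inc + 0.5 * (Q3$Home == "0")
Q3$Def <- factor(rbinom(n, 1, plogis(eta)))   # default = 1

# Same model formula as in the post
Q3.mod <- glm(Def ~ Sex + Age + Home + Inc + Loan,
              data = Q3, family = binomial(link = "logit"))

# The person described in the exercise; factor levels must match the data
new_person <- data.frame(
  Sex  = factor("0", levels = levels(Q3$Sex)),
  Age  = 42,
  Home = factor("0", levels = levels(Q3$Home)),
  Inc  = 23500,
  Loan = 12000
)

# type = "response" returns the probability, not the log odds
p_hat <- predict(Q3.mod, newdata = new_person, type = "response")
print(p_hat)
```

With the real data, only the first block would change (reading tabl3.dat instead of simulating); the number printed here has no meaning for the actual exercise.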

Now,

if I use the empirical logit, which has a 0.5 correction,
log( (y+0.5)/(1+0.5-y) ), as the response and then regress it on the
explanatory variables, I get some estimated probabilities of 0.49* (when you
transform the log odds back to p), whereas the previous model gives 0.
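
A small check of what the empirical logit does to ungrouped 0/1 data may help explain fitted values near 0.5. The empirical logit is normally applied to grouped counts, log((y+0.5)/(n-y+0.5)); with n = 1 it can take only two values, as this sketch shows (the variable name `emp_logit` is introduced here for illustration).

```r
# With n = 1 the response is either 0 or 1
y <- c(0, 1)

# Empirical logit with the 0.5 correction
emp_logit <- log((y + 0.5) / (1 - y + 0.5))
print(emp_logit)          # -log(3) and log(3)

# Back-transforming the two possible values to the probability scale
print(plogis(emp_logit))  # 0.25 and 0.75
```

So a least-squares fit to this response is regressing a variable that only ever equals -log(3) or log(3), which is a different model from the binomial glm, and the two need not agree.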

Am I wrong in the first place to think that the response is y=default?
How should I approach this?

Thanks!


DATA is attached.

http://r.789695.n4.nabble.com/file/n3574478/tabl3.dat tabl3.dat 

--
View this message in context: 
http://r.789695.n4.nabble.com/Binary-response-GLM-Question-tp3574478p3574478.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Binary response GLM Question

2011-06-04 Thread Joshua Wiley
Hi,

Y is not the same as P.  P is the conditional probability that Y = 1 given
the data matrix.  So theoretically, P can take on any value in [0, 1],
which means the odds can lie anywhere in [0, +infinity], not just 0 or
undefined.  In logistic regression, the logit link is pretty standard,
so I do not think you would need to use the empirical logit link.
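
To spell out the distinction in the usual textbook notation (the symbols x_i for the covariate vector and beta for the coefficients are introduced here, they are not from the original post): each y_i is a Bernoulli draw with success probability p_i, and the logit link is applied to p_i, never to y_i itself.

```latex
\begin{aligned}
y_i \mid x_i &\sim \mathrm{Bernoulli}(p_i), \qquad
\log\frac{p_i}{1-p_i} = x_i^{\top}\beta, \\
\ell(\beta) &= \sum_{i=1}^{n}\bigl[\, y_i \log p_i + (1-y_i)\log(1-p_i) \,\bigr]
 = \sum_{i=1}^{n}\bigl[\, y_i\, x_i^{\top}\beta - \log\bigl(1+e^{x_i^{\top}\beta}\bigr) \bigr].
\end{aligned}
```

The observed 0s and 1s enter the log-likelihood only as multipliers, so no log(0) or 1/0 has to be evaluated as long as the fitted p_i stay strictly between 0 and 1.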

I am not sure how much detail you want when you ask how R fits the
glm.  It uses an iterative algorithm.  If you are willing to spend the
time to work through the code, you can learn a lot: just type
binomial at the console (no quotes, no () after it) and the source for
the binomial family will print to the console, so you can look through
the logit link code.  That gets passed off to glm() to use to fit the
model.  For a more general explanation of the process, I would
get a book or look online for information on logistic regression or
maximum likelihood estimation.
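
As a rough illustration of the iterative algorithm, here is a minimal sketch of iteratively reweighted least squares (IRLS) for a logit model, compared against glm(). It runs on simulated data with made-up coefficients, not the loan data, and it is a simplified version of the idea rather than the actual glm.fit code.

```r
# Simulated data (hypothetical; not the loan data from this thread)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

X <- cbind(1, x)              # design matrix with an intercept column
beta <- c(0, 0)               # starting values

for (iter in 1:25) {
  eta <- drop(X %*% beta)     # linear predictor (log odds)
  p   <- plogis(eta)          # fitted probabilities, inside (0, 1)
  w   <- p * (1 - p)          # IRLS weights
  z   <- eta + (y - p) / w    # working response
  # Weighted least-squares step: solve (X' W X) beta = X' W z
  beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

# Compare with the built-in fit
fit <- glm(y ~ x, family = binomial(link = "logit"))
print(cbind(irls = beta, glm = coef(fit)))
```

Note that y appears only in the working response through (y - p); the logit is always evaluated at the fitted p, which is why 0/1 responses cause no division by zero.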

Cheers,

Josh

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
