On 12.11.2010 20:11, Marc Schwartz wrote:
You are not creating your data set properly.
Your 'mat' is:
mat
   column1 column2
1        1       0
2        1       0
3        0       1
4        0       0
5        1       1
6        1       0
7        1       0
8        0       1
9        0       0
10       1       1
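A likely reading of why the printout looks like this, offered as a sketch rather than a definitive diagnosis: only the first unnamed vector reaches matrix() as data, so the outcome values are recycled across both columns and the predictor values never enter the matrix. The snippet below reproduces the printed matrix from the outcome vector alone, assuming a row-wise fill. The names y, x, recycled and intended are introduced here for illustration only.

```r
# Outcome and predictor vectors as given in the thread
y <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 1)
x <- c(5, 4, 1, 6, 3, 6, 5, 3, 7, 9)

# Assumption: filling a 10 x 2 matrix row by row from y alone (recycled
# twice) reproduces the 'mat' printed above.
recycled <- matrix(y, nrow = 10, ncol = 2, byrow = TRUE,
                   dimnames = list(as.character(1:10),
                                   c("column1", "column2")))
print(recycled)

# Linear indexing walks down the columns, so these are the two columns of
# the recycled matrix, not the original y and x.
recycled[1:10]    # column1
recycled[11:20]   # column2

# One way to build the intended two-column matrix
intended <- cbind(column1 = y, column2 = x)
print(intended)
```

Every entry of the recycled matrix is 0 or 1, which is a quick sign that the predictor (values up to 9) is missing from it.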
What you really want is:
DF <- data.frame(y = c(1,0,1,0,0,1,0,0,1,1), x = c(5,4,1,6,3,6,5,3,7,9))
Actually it is in general safer to have a factor y rather than numeric y
for classification tasks.
Best,
Uwe
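To illustrate the point about a factor response, here is a minimal sketch using the same ten observations. The names DF_num, DF_fac, fit_num and fit_fac are made up for this example. With a two-level factor, glm() treats the first level as failure and the other level as success, so the fit should match the numeric 0/1 version while the class labels stay explicit.

```r
# Same data as in the thread, once with numeric y and once with factor y
DF_num <- data.frame(y = c(1, 0, 1, 0, 0, 1, 0, 0, 1, 1),
                     x = c(5, 4, 1, 6, 3, 6, 5, 3, 7, 9))
DF_fac <- transform(DF_num, y = factor(y, levels = c(0, 1)))

fit_num <- glm(y ~ x, data = DF_num, family = binomial)
fit_fac <- glm(y ~ x, data = DF_fac, family = binomial)

# The first level ("0") is the failure class; "1" is modeled as success
levels(DF_fac$y)

# Both codings should lead to the same coefficients
all.equal(coef(fit_num), coef(fit_fac))
```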
DF
   y x
1  1 5
2  0 4
3  1 1
4  0 6
5  0 3
6  1 6
7  0 5
8  0 3
9  1 7
10 1 9
MOD <- glm(y ~ x, data = DF, family = binomial)
summary(MOD)

Call:
glm(formula = y ~ x, family = binomial, data = DF)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3353  -1.0229  -0.1239   0.9956   1.7477

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.6118     1.7833  -0.904    0.366
x             0.3293     0.3383   0.973    0.330

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 13.863  on 9  degrees of freedom
Residual deviance: 12.767  on 8  degrees of freedom
AIC: 16.767

Number of Fisher Scoring iterations: 4
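As a worked check on the coefficient table, the z value is the estimate divided by its standard error, and the two-sided p-value comes from the standard normal distribution. The sketch below refits the model from this message and recomputes both columns by hand. The names tab, z and p are introduced here for illustration.

```r
# Refit the model exactly as in the message above
DF <- data.frame(y = c(1, 0, 1, 0, 0, 1, 0, 0, 1, 1),
                 x = c(5, 4, 1, 6, 3, 6, 5, 3, 7, 9))
MOD <- glm(y ~ x, data = DF, family = binomial)

# Coefficient table as a numeric matrix
tab <- coef(summary(MOD))

# Wald statistic and its two-sided normal p-value, recomputed by hand
z <- tab[, "Estimate"] / tab[, "Std. Error"]
p <- 2 * pnorm(-abs(z))

# Side-by-side comparison with the values reported by summary()
cbind(z_by_hand = z, z_reported = tab[, "z value"],
      p_by_hand = p, p_reported = tab[, "Pr(>|z|)"])
```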
HTH,
Marc Schwartz
On Nov 12, 2010, at 12:56 PM, Benjamin Godlove wrote:
I think it is likely I am missing something. Here is a very simple example:
R code:
mat <- matrix(nrow = 10, ncol = 2, c(1,0,1,0,0,1,0,0,1,1),
              c(5,4,1,6,3,6,5,3,7,9),
              dimnames = list(c(1,2,3,4,5,6,7,8,9,10),
                              c("column1","column2")))
g <- glm(mat[1:10] ~ mat[11:20], family = binomial(link = logit))
g$converged
SAS code:
data mat;
input col1 col2;
datalines;
1 5
0 4
1 1
0 6
0 3
1 6
0 5
0 3
1 7
1 9
;
proc logistic data=mat descending;
model col1 = col2 / link=logit;
run;
SAS output (in case you don't have access to SAS):
Convergence criterion satisfied
            Estimate      SE
Intercept    -1.6118  1.7833
col2          0.3293  0.3383
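One thing worth confirming when comparing the two programs is which outcome level is being modeled. The descending option in the SAS call makes PROC LOGISTIC model the probability that col1 = 1, which is also what glm() does for a numeric 0/1 response. As a sketch of what happens when the levels are swapped (fit_one and fit_zero are illustrative names), modeling the other level flips the sign of every coefficient and leaves the standard errors unchanged:

```r
# Data from the SAS datalines above
col1 <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 1)
col2 <- c(5, 4, 1, 6, 3, 6, 5, 3, 7, 9)

# Model P(col1 = 1), the analogue of PROC LOGISTIC with 'descending'
fit_one <- glm(col1 ~ col2, family = binomial)

# Model P(col1 = 0) by flipping the response
fit_zero <- glm(I(1 - col1) ~ col2, family = binomial)

# Coefficients differ only in sign
cbind(event_is_1 = coef(fit_one), event_is_0 = coef(fit_zero))

# Standard errors are the same either way
cbind(se_event_1 = sqrt(diag(vcov(fit_one))),
      se_event_0 = sqrt(diag(vcov(fit_zero))))
```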
Of course, with an example this small, it is not so surprising that the two
methods differ; and they hardly differ by a single SE. But as the datasets
get larger, the difference is more pronounced. Let me know if you would
like me to send you a large dataset. I get the feeling I am doing something
wrong in R, so please let me know what you think.
Thank you!
Ben Godlove
On Thu, Nov 11, 2010 at 1:59 PM, Albyn Jones<jo...@reed.edu> wrote:
Do you have factors (categorical variables) in the model? It could be
just a parameterization difference.
albyn
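To make the parameterization point concrete, here is a sketch using R's built-in mtcars data rather than anything from the thread. R codes factors with treatment (reference-level) contrasts by default, whereas the CLASS statement in PROC LOGISTIC defaults to effect coding, which corresponds to contr.sum in R. The two codings describe the same fitted model with different coefficients, so the intercepts can look far apart even though the predictions agree. The names d, fit_trt and fit_sum are illustrative.

```r
# Built-in example data; cyl is turned into a three-level factor
d <- transform(mtcars, cyl = factor(cyl))

# R default: treatment contrasts (first level is the reference)
fit_trt <- glm(am ~ cyl, data = d, family = binomial)

# Effect (sum-to-zero) coding, comparable to the SAS CLASS default
fit_sum <- glm(am ~ cyl, data = d, family = binomial,
               contrasts = list(cyl = "contr.sum"))

# Different coefficients, including the intercept ...
coef(fit_trt)
coef(fit_sum)

# ... but the same fitted probabilities
all.equal(fitted(fit_trt), fitted(fit_sum), tolerance = 1e-6)
```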
On Thu, Nov 11, 2010 at 12:41:03PM -0500, Benjamin Godlove wrote:
Dear R developers,
I have noticed a discrepancy between the coefficients returned by R's glm()
for logistic regression and SAS's PROC LOGISTIC. I am using dist = binomial
and link = logit for both R and SAS. I believe R uses IRLS whereas SAS uses
Fisher's scoring, but the difference is something like 100 SE on the
intercept. What accounts for such a huge difference?
Thank you for your time.
Ben Godlove
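On the IRLS versus Fisher scoring question: for the logit link, which is the canonical link for the binomial family, the two are the same iteration, so the choice of algorithm alone should not move the estimates. The sketch below runs Fisher scoring by hand on the ten-observation example posted in the follow-up message and compares the result with glm(). The loop, the stopping tolerance and the variable names are assumptions made for illustration, not either program's actual implementation.

```r
# Ten-observation example from the follow-up message
y <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 1)
x <- c(5, 4, 1, 6, 3, 6, 5, 3, 7, 9)

X <- cbind(Intercept = 1, x = x)   # design matrix
b <- c(0, 0)                       # starting values (an arbitrary choice)

for (iter in 1:50) {
  p     <- drop(plogis(X %*% b))   # fitted probabilities
  W     <- p * (1 - p)             # binomial variance weights
  score <- crossprod(X, y - p)     # gradient of the log-likelihood
  info  <- crossprod(X, X * W)     # Fisher information X'WX
  step  <- drop(solve(info, score))
  b     <- b + step
  if (max(abs(step)) < 1e-10) break
}

# Standard errors from the inverse information of the final pass
se <- sqrt(diag(solve(info)))

fit <- glm(y ~ x, family = binomial)
rbind(by_hand = b, glm = coef(fit))
rbind(by_hand = se, glm = coef(summary(fit))[, "Std. Error"])
```

If the hand-rolled loop and glm() agree, a gap on the order of 100 SE is more likely to come from the data or the parameterization than from the fitting algorithm.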
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Albyn Jones
Reed College
jo...@reed.edu