Re: [R] glm with family binomial and link identity

2022-04-28 Thread Martin Maechler
TLDR:  No, there was no such change in R 4.2.0

> Ralf Goertz 
> on Wed, 27 Apr 2022 10:27:33 +0200 writes:

> Hi,
> I just noticed that (with my version 4.2.0) it is no longer possible to
> use glm with family=binomial(link=identity). Why is that? It was
> possible with 4.0.x as a colleague of mine just confirmed. 

That's not correct I think:

I've now tried your example in R versions 3.5.x, 4.0.5, 4.1.3, and 4.2.0
and it gave the same error everywhere:

d <- data.frame(y  = c( 0, 1,  0, 1),
trt= c( 0, 0,  1, 1),
w  = c(97, 3, 90, 10))
r0 <- try(
glm(y ~ trt, weights = w, data=d, family = binomial(link = identity)) 
)
## Error in binomial(link = identity) : 
##   link "identity" not available for binomial family; available links are 
‘logit’, ‘probit’, ‘cloglog’, ‘cauchit’, ‘log’

And indeed, binomial() has started with

> head(binomial)
  
1 function (link = "logit")   
2 {   
3 linktemp <- substitute(link)
4 if (!is.character(linktemp))
5 linktemp <- deparse(linktemp)   
6 okLinks <- c("logit", "probit", "cloglog", "cauchit", "log")
> 

*forever*.

It is slightly interesting (to me at least) that if you use a string,
you can use all valid arguments for make.link() and not just
those that are declared "ok" by the code.

HOWEVER, this all has been documented for a long time,
in  ?family == ?binomial

  Note:

 The ‘link’ and ‘variance’ arguments have rather awkward semantics
 for back-compatibility.  The recommended way is to supply them as
 quoted character strings, but they can also be supplied unquoted
 (as names or expressions).  Additionally, they can be supplied as
 a length-one character vector giving the name of one of the
 options, or as a list (for ‘link’, of class ‘"link-glm"’).  The
 restrictions apply only to links given as names: when given as a
 character string all the links known to make.link are accepted.



 



> After all it is useful to compute risk differences with factors.

at least it was neand

> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm with family binomial and link identity

2022-04-28 Thread Ralf Goertz
> Am Wed, 27 Apr 2022 10:27:33 +0200 schrieb Ralf Goertz
> 
> > Hi,
> >
> > I just noticed that (with my version 4.2.0) it is no longer possible
> > to use glm with family=binomial(link=identity). Why is that? It was
> > possible with 4.0.x as a colleague of mine just confirmed. After all
> > it is useful to compute risk differences with factors.  
> 
> Sorry about the noise. It still works with "identity" but it has to be
> quoted (whereas "log" and "logit" also work unquoted).

On the other hand why do I get this following error at all?

Error in binomial(link = identity) : 
  link "identity" not available for binomial family; available links are 
‘logit’, ‘probit’, ‘cloglog’, ‘cauchit’, ‘log’

This is a bit misleading since identity is a legitimate link function
and works perfectly when quoted. The results of that call are in
agreement with what you get with the risk difference approach (for
convenience computed using the meta package)

> d=data.frame(y=c(0, 1, 0, 1), trt=c(0, 0, 1, 1), w=c(97, 3, 90, 10))
> d
  y trt  w
1 0   0 97
2 1   0  3
3 0   1 90
4 1   1 10

> res=glm(y~trt,data=d,weights=w,family=binomial(link="identity"))
> summary(res)

Call: glm(formula = y ~ trt, family = binomial(link = "identity"), data
= dd, weights = w)

Deviance Residuals: 
 1   2   3   4  
-2.431   4.587  -4.355   6.786  

Coefficients:
Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.030000.01706   1.759   0.0786 .
trt  0.070000.03451   2.028   0.0425 *
…

> confint.default(res)[2,]
  2.5 %  97.5 % 
0.002359942 0.137640058 

This is exactly the same as 

> meta::metabin(10,100,3,100,sm="RD")
Number of observations: o = 200
Number of events: e = 13

 RD   95%-CIz p-value
 0.0700 [0.0024; 0.1376] 2.03  0.0425

So why the error?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm with family binomial and link identity

2022-04-27 Thread Ralf Goertz
Am Wed, 27 Apr 2022 10:27:33 +0200
schrieb Ralf Goertz :

> Hi,
>
> I just noticed that (with my version 4.2.0) it is no longer possible
> to use glm with family=binomial(link=identity). Why is that? It was
> possible with 4.0.x as a colleague of mine just confirmed. After all
> it is useful to compute risk differences with factors.

Sorry about the noise. It still works with "identity" but it has to be
quoted (whereas "log" and "logit" also work unquoted).

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm with family binomial and link identity

2022-04-27 Thread Ralf Goertz
Hi,

I just noticed that (with my version 4.2.0) it is no longer possible to
use glm with family=binomial(link=identity). Why is that? It was
possible with 4.0.x as a colleague of mine just confirmed. After all it
is useful to compute risk differences with factors.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM model with spatialspillover on categorical variables

2020-06-04 Thread Bert Gunter
You should post on r-sig-geo, the list devoted to spatial data analysis,
rather than here.


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Jun 4, 2020 at 12:17 PM Lena Fehlhaber 
wrote:

> I did a regression analysis with categorical data with a glm model
> approach, which worked fine. I have longitude and latitude coordinates for
> each observation and I want to add their geographic spillover effect to the
> model.
>
> My sample data is structured:
>
> Index DV IVI IVII IVIII IVIV Long Lat
>  1  0  2  1  3  -12  -17.8  12
>  2  0  1  1  6  112  11  -122
>  3  1  3  6  1  91  57  53
>
> with regression eq. DV ~ IVI + IVII + IVIII + IVIV
>
> That mentioned, I assume that the nearer regions are, the more it may
> influence my dependant variable. I found several approaches for spatial
> regression models, but not for categorical data. I tried to use existing
> libraries and functions, such as spdep's lagsarlm, glmmfields, spatialreg,
> gstat, geoRglm and many more (I used this list as a reference:
> https://cran.r-project.org/web/views/Spatial.html ). For numeric values, I
> am able to do spatial regression, but for categorical values, I struggle.
> The data structure is the following:
>
> library(dplyr)
> data <- data %>%
>   mutate(
> DV = as.factor(DV),
> IVI = as.factor(IVI),
> IVII = as.factor(IVII),
> IVIII = as.factor(IVIII),
> IVIV = as.numeric(IVIV),
> longitude = as.numeric(longitude),
> latitude = as.numeric(latitude)
>   )
>
> My dependant variable (0|1) as well as my independant variables are
> categorical and it would be no use to transform them, of course. I want to
> have an other glm model in the end, but with spatial spillover effects
> included. The libraries I tested so far can't handle categorical data. Any
> leads/ideas would be greatly appreciated.
>
> Thanks a lot.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM model with spatialspillover on categorical variables

2020-06-04 Thread Lena Fehlhaber
I did a regression analysis with categorical data with a glm model
approach, which worked fine. I have longitude and latitude coordinates for
each observation and I want to add their geographic spillover effect to the
model.

My sample data is structured:

Index DV IVI IVII IVIII IVIV Long Lat
 1  0  2  1  3  -12  -17.8  12
 2  0  1  1  6  112  11  -122
 3  1  3  6  1  91  57  53

with regression eq. DV ~ IVI + IVII + IVIII + IVIV

That mentioned, I assume that the nearer regions are, the more it may
influence my dependant variable. I found several approaches for spatial
regression models, but not for categorical data. I tried to use existing
libraries and functions, such as spdep's lagsarlm, glmmfields, spatialreg,
gstat, geoRglm and many more (I used this list as a reference:
https://cran.r-project.org/web/views/Spatial.html ). For numeric values, I
am able to do spatial regression, but for categorical values, I struggle.
The data structure is the following:

library(dplyr)
data <- data %>%
  mutate(
DV = as.factor(DV),
IVI = as.factor(IVI),
IVII = as.factor(IVII),
IVIII = as.factor(IVIII),
IVIV = as.numeric(IVIV),
longitude = as.numeric(longitude),
latitude = as.numeric(latitude)
  )

My dependant variable (0|1) as well as my independant variables are
categorical and it would be no use to transform them, of course. I want to
have an other glm model in the end, but with spatial spillover effects
included. The libraries I tested so far can't handle categorical data. Any
leads/ideas would be greatly appreciated.

Thanks a lot.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Model Summary

2018-10-16 Thread Marc Schwartz via R-help



> On Oct 16, 2018, at 12:33 PM, Neslin, Scott A. 
>  wrote:
> 
> R-Help:
> 
> We are working with your GLM R package.  The Summary(Model) now gets printed 
> by the program as one object and we want to put the coefficient columns into 
> Excel.  We took an initial stab at this by counting the number of characters 
> occupied by each column.  But we have now learned that the number of 
> characters in a column depends on the length of the variable names, so is not 
> a constant number (e.g., 54 characters to a line).
> 
> We therefore ask, is it possible for us to get the Summary(Model) column by 
> column, i.e., a separate object for each column?  That way we could assemble 
> an Excel table easily rather than having to count the number of characters.
> 
> Is this possible for us to do by ourselves?  Or could you modify the package 
> in some way?
> 
> We appreciate your attention.  Thank you!
> 
> Scott Neslin
> Prasad Vana
> 
> Dartmouth College


Hi,

Presuming that you are talking about the glm() function, as there is no GLM 
package as far as I can see, R model objects have a structure that can be 
viewed using the str() function. The help for this function can be viewed using:

   ?str

You can then use:

  str(summary(YourModelObject))

That will give you some insights into the model object components and that same 
str() function is valuable for investigating other objects as well.

That being said, R's model objects typically have 'extractor' functions to make 
it easy to obtain commonly used components of the model object, which can be 
complicated.

The R manual "An Introduction to R", has a section on some of these:


https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Generic-functions-for-extracting-model-information
 


Thus, for example, using:

  coef(summary(YourModelObject))

will return the matrix of coefficients and related parameters from the summary 
model object.

Once you have that matrix object, you can write it out to a CSV file using 
?write.csv, where the CSV file can then be opened with or imported into Excel.

So the steps might be along the lines of:

  my.coef <- coef(summary(YourModelObject))

  write.csv(my.coef, file = "MyCoefficients.csv")


The R Data Import/Export manual:

  https://cran.r-project.org/doc/manuals/r-release/R-data.html 


has some insights into pathways for getting data to and from R, including some 
packages that can directly write Excel files. You may wish to review that 
manual as well.

Regards,

Marc Schwartz



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Model Summary

2018-10-16 Thread Amit Mittal
You can just use

'slotNames(modelname)
This will return sub objects for which names can be extracted

Eg

slotNames(modelname)
[1] "mfit" "model"
names(modelname@mfit)
names(modelname@model)
Will return all objects within the model including coed car R cover vcov as
applicable and you can store per choice

However within the column they will still be text

BR

On Tue 16 Oct, 2018, 10:23 PM Duncan Murdoch, 
wrote:

> On 16/10/2018 12:33 PM, Neslin, Scott A. wrote:
> > R-Help:
> >
> > We are working with your GLM R package.  The Summary(Model) now gets
> printed by the program as one object and we want to put the coefficient
> columns into Excel.  We took an initial stab at this by counting the number
> of characters occupied by each column.  But we have now learned that the
> number of characters in a column depends on the length of the variable
> names, so is not a constant number (e.g., 54 characters to a line).
>
> There is no GLM R package as far as I know.  There's a glm() function in
> the stats package.  Is that what you meant?
>
> What would probably be best is if you showed us a simple example of what
> you are doing, and then referred to the results from that when saying
> what you want to extract.
>
> Duncan Murdoch
>
> >
> > We therefore ask, is it possible for us to get the Summary(Model) column
> by column, i.e., a separate object for each column?  That way we could
> assemble an Excel table easily rather than having to count the number of
> characters.
> >
> > Is this possible for us to do by ourselves?  Or could you modify the
> package in some way?
> >
> > We appreciate your attention.  Thank you!
> >
> > Scott Neslin
> > Prasad Vana
> >
> > Dartmouth College
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-- 

__

Amit Mittal
Pursuing Ph.D. in Finance and Accounting
Indian Institute of Management, Lucknow
Visit my SSRN author page:
http://ssrn.com/author=2665511
* Top 10% Downloaded Author on SSRN
Mob: +91 7525023664

This message has been sent from a mobile device. I may contact you again.

_

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Model Summary

2018-10-16 Thread Peter Langfelder
The coefficients are best obtained as summary(Model)$coefficients.
This is a matrix can than be saved as a csv file and opened in excel
or other spreadsheet software.

HTH,

Peter
On Tue, Oct 16, 2018 at 9:44 AM Neslin, Scott A.
 wrote:
>
> R-Help:
>
> We are working with your GLM R package.  The Summary(Model) now gets printed 
> by the program as one object and we want to put the coefficient columns into 
> Excel.  We took an initial stab at this by counting the number of characters 
> occupied by each column.  But we have now learned that the number of 
> characters in a column depends on the length of the variable names, so is not 
> a constant number (e.g., 54 characters to a line).
>
> We therefore ask, is it possible for us to get the Summary(Model) column by 
> column, i.e., a separate object for each column?  That way we could assemble 
> an Excel table easily rather than having to count the number of characters.
>
> Is this possible for us to do by ourselves?  Or could you modify the package 
> in some way?
>
> We appreciate your attention.  Thank you!
>
> Scott Neslin
> Prasad Vana
>
> Dartmouth College
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Model Summary

2018-10-16 Thread Duncan Murdoch

On 16/10/2018 12:33 PM, Neslin, Scott A. wrote:

R-Help:

We are working with your GLM R package.  The Summary(Model) now gets printed by 
the program as one object and we want to put the coefficient columns into 
Excel.  We took an initial stab at this by counting the number of characters 
occupied by each column.  But we have now learned that the number of characters 
in a column depends on the length of the variable names, so is not a constant 
number (e.g., 54 characters to a line).


There is no GLM R package as far as I know.  There's a glm() function in 
the stats package.  Is that what you meant?


What would probably be best is if you showed us a simple example of what 
you are doing, and then referred to the results from that when saying 
what you want to extract.


Duncan Murdoch



We therefore ask, is it possible for us to get the Summary(Model) column by 
column, i.e., a separate object for each column?  That way we could assemble an 
Excel table easily rather than having to count the number of characters.

Is this possible for us to do by ourselves?  Or could you modify the package in 
some way?

We appreciate your attention.  Thank you!

Scott Neslin
Prasad Vana

Dartmouth College

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM Model Summary

2018-10-16 Thread Neslin, Scott A.
R-Help:

We are working with your GLM R package.  The Summary(Model) now gets printed by 
the program as one object and we want to put the coefficient columns into 
Excel.  We took an initial stab at this by counting the number of characters 
occupied by each column.  But we have now learned that the number of characters 
in a column depends on the length of the variable names, so is not a constant 
number (e.g., 54 characters to a line).

We therefore ask, is it possible for us to get the Summary(Model) column by 
column, i.e., a separate object for each column?  That way we could assemble an 
Excel table easily rather than having to count the number of characters.

Is this possible for us to do by ourselves?  Or could you modify the package in 
some way?

We appreciate your attention.  Thank you!

Scott Neslin
Prasad Vana

Dartmouth College

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm package - Negative binomial regression model - Error

2018-02-26 Thread Paula Couto
Thank  you so much, Thierry!!
I will try that now and see if that solves the issue
Bests,
Paula


On Feb 26, 2018 03:02, "Thierry Onkelinx"  wrote:

Dear Paula,

There are probably missing observations in your data set. Read the
na.action part of the glm help file. na.exclude is most likely what you are
looking for.

Best regards,


ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkel...@inbo.be
Havenlaan 88 
bus 73, 1000 Brussel
www.inbo.be


///
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

///



2018-02-26 2:07 GMT+01:00 Paula Couto :

> HI there
>
> I am running this model in negative binomial regression, using glm.
> I had no problems with running the model with a set of data, but now that
> i'm trying to run if for new one.  I always have this same error when
> running the regression:
>
> >
> > #Run Regression
> > x=cbind(factor2ind(d$year),factor2ind(d$month_week))
> >
> > out<- glm(cbind(influenza, n_sample) ~ x, family=quasibinomial,
> > data=d)
> >
> > d$prop<-out$fitted.values
>
> Error in `$<-.data.frame`(`*tmp*`, prop, value = c(0.0486530542835839,  :
>   replacement has 208 rows, data has 365
>
> > d$n_p1<-d$prop*d$factor*10
> >
> > obs<-aggregate(d$prop, by = list(d$month_week), FUN=summary)
> > pred<-aggregate(d$n_p1, by = list(d$month_week), FUN=summary)
> >
>
> By the way, I previously prepared the data set  and defined that:
>d$factor<-sapply(d$year,f)
> > d$n_sample<-(d$n_muestras*d$factor*10)
> > d$prop<-(d$influenza/d$n_sample)
>
> But I still don't understand why it keeps saying that dataframe has less
> replacements than rows.
> Could anybody help me with this?
>  Many thankss!!!
> P
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posti
> ng-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm package - Negative binomial regression model - Error

2018-02-26 Thread Thierry Onkelinx
Dear Paula,

There are probably missing observations in your data set. Read the
na.action part of the glm help file. na.exclude is most likely what you are
looking for.

Best regards,


ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkel...@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///



2018-02-26 2:07 GMT+01:00 Paula Couto :

> HI there
>
> I am running this model in negative binomial regression, using glm.
> I had no problems with running the model with a set of data, but now that
> i'm trying to run if for new one.  I always have this same error when
> running the regression:
>
> >
> > #Run Regression
> > x=cbind(factor2ind(d$year),factor2ind(d$month_week))
> >
> > out<- glm(cbind(influenza, n_sample) ~ x, family=quasibinomial,
> > data=d)
> >
> > d$prop<-out$fitted.values
>
> Error in `$<-.data.frame`(`*tmp*`, prop, value = c(0.0486530542835839,  :
>   replacement has 208 rows, data has 365
>
> > d$n_p1<-d$prop*d$factor*10
> >
> > obs<-aggregate(d$prop, by = list(d$month_week), FUN=summary)
> > pred<-aggregate(d$n_p1, by = list(d$month_week), FUN=summary)
> >
>
> By the way, I previously prepared the data set  and defined that:
>d$factor<-sapply(d$year,f)
> > d$n_sample<-(d$n_muestras*d$factor*10)
> > d$prop<-(d$influenza/d$n_sample)
>
> But I still don't understand why it keeps saying that dataframe has less
> replacements than rows.
> Could anybody help me with this?
>  Many thankss!!!
> P
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm package - Negative binomial regression model - Error

2018-02-25 Thread Paula Couto
HI there

I am running this model in negative binomial regression, using glm.
I had no problems with running the model with a set of data, but now that
i'm trying to run if for new one.  I always have this same error when
running the regression:

>
> #Run Regression
> x=cbind(factor2ind(d$year),factor2ind(d$month_week))
>
> out<- glm(cbind(influenza, n_sample) ~ x, family=quasibinomial,
> data=d)
>
> d$prop<-out$fitted.values

Error in `$<-.data.frame`(`*tmp*`, prop, value = c(0.0486530542835839,  :
  replacement has 208 rows, data has 365

> d$n_p1<-d$prop*d$factor*10
>
> obs<-aggregate(d$prop, by = list(d$month_week), FUN=summary)
> pred<-aggregate(d$n_p1, by = list(d$month_week), FUN=summary)
>

By the way, I previously prepared the data set  and defined that:
   d$factor<-sapply(d$year,f)
> d$n_sample<-(d$n_muestras*d$factor*10)
> d$prop<-(d$influenza/d$n_sample)

But I still don't understand why it keeps saying that dataframe has less
replacements than rows.
Could anybody help me with this?
 Many thankss!!!
P

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm$effects

2018-01-12 Thread Bert Gunter
See ?effects

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Jan 12, 2018 at 10:44 AM, Cade, Brian  wrote:

> I know I must be missing something obvious, but checking help and googling
> a bit did not turn up a useable answer.  When I've estimated a glm() model
> object (my example is with just identity link with gaussian family so I
> could have used lm() instead), one of the terms returned in the model
> object is listed as $effects.  What are these quantities?  I have not been
> able to relate them to the $coefficients, $fitted.values, $resid, or $y
> completely.  For example, if I estimate the simplest model with just an
> intercept term, the $effects for the n observations differ from $y by a
> constant quantity except for one of the values.
>
> Brian
>
> Brian S. Cade, PhD
>
> U. S. Geological Survey
> Fort Collins Science Center
> 2150 Centre Ave., Bldg. C
> Fort Collins, CO  80526-8818
>
> email:  ca...@usgs.gov 
> tel:  970 226-9326
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm$effects

2018-01-12 Thread Cade, Brian
I know I must be missing something obvious, but checking help and googling
a bit did not turn up a useable answer.  When I've estimated a glm() model
object (my example is with just identity link with gaussian family so I
could have used lm() instead), one of the terms returned in the model
object is listed as $effects.  What are these quantities?  I have not been
able to relate them to the $coefficients, $fitted.values, $resid, or $y
completely.  For example, if I estimate the simplest model with just an
intercept term, the $effects for the n observations differ from $y by a
constant quantity except for one of the values.

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov 
tel:  970 226-9326

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm and stepAIC selects too many effects

2017-06-06 Thread Marc Girondot via R-help

This is a question at the border between stats and r.

When I do a glm with many potential effects, and select a model using 
stepAIC, many independent variables are selected even if there are no 
relationship between dependent variable and the effects (all are random 
numbers).


Do someone has a solution to prevent this effect ? Is it related to 
Bonferoni correction ?


Is there is a ratio of independent vs number of observations that is 
safe for stepAIC ?


Thanks

Marc

Example of code. When 2 independent variables are included, no effect is 
selected, when 11 are included, 7 to 8 are selected.


x <- rnorm(15, 15, 2)
A <- rnorm(15, 20, 5)
B <- rnorm(15, 20, 5)
C <- rnorm(15, 20, 5)
D <- rnorm(15, 20, 5)
E <- rnorm(15, 20, 5)
F <- rnorm(15, 20, 5)
G <- rnorm(15, 20, 5)
H <- rnorm(15, 20, 5)
I <- rnorm(15, 20, 5)
J <- rnorm(15, 20, 5)
K <- rnorm(15, 20, 5)

df <- data.frame(x=x, A=A, B=B, C=C, D=D,
 E=E, F=F, G=G, H=H, I=I, J=J,
 K=K)

G1 <- glm(formula = x ~ A + B,
 data=df, family = gaussian(link = "identity"))

g1 <- stepAIC(G1)

summary(g1)

G2 <- glm(formula = x ~ A + B + C + D + E + F + G + H + I + J + K,
 data=df, family = gaussian(link = "identity"))

g2 <- stepAIC(G2)

summary(g2)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R glm function ignores some predictor variables

2017-03-28 Thread peter dalgaard

> On 27 Mar 2017, at 17:23 , Gabrielle Perron  
> wrote:
> 
> Hi,
> 
> 
> This is my first time using this mailing list. I have looked at the posting 
> guide, but please do let me know if I should be doing something differently.


Avoid sending in HTML. It's not really bad here except for excessive 
inter-paragraph spacing, but the autoconversion to plain text can make posts 
almost unreadable.

---
It looks like some of your predictors are equal to linear combinations of other 
predictors. E.g., if you have dummies for mutually exclusive groups, they will 
sum to a vector of ones, which is already in the model for the intercept term, 
so one must be removed. (R has a better interface to this sort of thing, using 
factor variables, but that is another story.) With many predictors, this can 
happen in less obvious ways as well.

For plain multiple regression, I think most would use lm() and not glm(). It 
shouldn't make much of a difference, but some details may differ.

-pd

> Here is my question, I apologize in advance for not being able to provide 
> example data, I am using very large tables, and what I am trying to do works 
> fine with simpler examples, so providing example data cannot help. It has 
> always worked for me until now. So I am just trying to get your ideas on what 
> might be the issue. But if there is any way I could provide more information, 
> do let me know.
> 
> 
> So, I have a vector corresponding to a response variable and a table of 
> predictor variables. The response vector is numeric, the predictor variables 
> (columns of the table) are in the binary format (0s and 1s).
> 
> 
> I am running the glm function (multivariate linear regression) using the 
> response vector and the table of predictors:
> 
> 
>fit <- glm(response ~ as.matrix(predictors), na.action=na.exclude)
> 
>coeff <- as.vector(coef(summary(fit))[,4])[-1]
> 
> 
> When I have been doing that in the past, I would extract the vector of 
> regression coefficient to use it for further analysis.
> 
> 
> The problem is that now the regression returns a vector of coefficients which 
> is missing some values. Essentially some predictor variables are not 
> attributed a coefficient at all by glm. But there are no error messages.
> 
> 
> The summary of the model looks normal, but some predictor variables are 
> missing like I mentioned. Most other predictors have assigned data 
> (coefficient, pvalue, etc.).
> 
> About 30 predictors are missing from the model, over 200.
> 
> 
> I have tried using different response variables (vectors), but I am getting 
> the same issue, although the missing predictors vary depending on the 
> response vector...
> 
> 
> Any ideas on what might be going on? I think this can happen if some 
> variables have 0 variance, but I have checked that. There are also no NA 
> values and no missing values in the tables.
> 
> 
> What could cause glm to ignore/remove some predictor variables?
> 
> 
> Any suggestion is welcome!
> 
> 
> Thank you,
> 
> 
> Gabrielle
> 
> 
> 
> 
> 
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R glm function ignores some predictor variables

2017-03-28 Thread Jim Lemon
Hi Gabrielle,
With that number of binary predictors it would be no surprise if some
were linear combinations of others.

Jim


On Tue, Mar 28, 2017 at 2:23 AM, Gabrielle Perron
 wrote:
> Hi,
>
>
> This is my first time using this mailing list. I have looked at the posting 
> guide, but please do let me know if I should be doing something differently.
>
>
> Here is my question, I apologize in advance for not being able to provide 
> example data, I am using very large tables, and what I am trying to do works 
> fine with simpler examples, so providing example data cannot help. It has 
> always worked for me until now. So I am just trying to get your ideas on what 
> might be the issue. But if there is any way I could provide more information, 
> do let me know.
>
>
> So, I have a vector corresponding to a response variable and a table of 
> predictor variables. The response vector is numeric, the predictor variables 
> (columns of the table) are in the binary format (0s and 1s).
>
>
> I am running the glm function (multivariate linear regression) using the 
> response vector and the table of predictors:
>
>
> fit <- glm(response ~ as.matrix(predictors), na.action=na.exclude)
>
> coeff <- as.vector(coef(summary(fit))[,4])[-1]
>
>
> When I have been doing that in the past, I would extract the vector of 
> regression coefficient to use it for further analysis.
>
>
> The problem is that now the regression returns a vector of coefficients which 
> is missing some values. Essentially some predictor variables are not 
> attributed a coefficient at all by glm. But there are no error messages.
>
>
> The summary of the model looks normal, but some predictor variables are 
> missing like I mentioned. Most other predictors have assigned data 
> (coefficient, pvalue, etc.).
>
> About 30 predictors are missing from the model, over 200.
>
>
> I have tried using different response variables (vectors), but I am getting 
> the same issue, although the missing predictors vary depending on the 
> response vector...
>
>
> Any ideas on what might be going on? I think this can happen if some 
> variables have 0 variance, but I have checked that. There are also no NA 
> values and no missing values in the tables.
>
>
> What could cause glm to ignore/remove some predictor variables?
>
>
> Any suggestion is welcome!
>
>
> Thank you,
>
>
> Gabrielle
>
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R glm function ignores some predictor variables

2017-03-28 Thread Gabrielle Perron
Hi,


This is my first time using this mailing list. I have looked at the posting 
guide, but please do let me know if I should be doing something differently.


Here is my question, I apologize in advance for not being able to provide 
example data, I am using very large tables, and what I am trying to do works 
fine with simpler examples, so providing example data cannot help. It has 
always worked for me until now. So I am just trying to get your ideas on what 
might be the issue. But if there is any way I could provide more information, 
do let me know.


So, I have a vector corresponding to a response variable and a table of 
predictor variables. The response vector is numeric, the predictor variables 
(columns of the table) are in the binary format (0s and 1s).


I am running the glm function (multivariate linear regression) using the 
response vector and the table of predictors:


fit <- glm(response ~ as.matrix(predictors), na.action=na.exclude)

coeff <- as.vector(coef(summary(fit))[,4])[-1]


When I have been doing that in the past, I would extract the vector of 
regression coefficient to use it for further analysis.


The problem is that now the regression returns a vector of coefficients which 
is missing some values. Essentially some predictor variables are not attributed 
a coefficient at all by glm. But there are no error messages.


The summary of the model looks normal, but some predictor variables are missing 
like I mentioned. Most other predictors have assigned data (coefficient, 
pvalue, etc.).

About 30 predictors are missing from the model, over 200.


I have tried using different response variables (vectors), but I am getting the 
same issue, although the missing predictors vary depending on the response 
vector...


Any ideas on what might be going on? I think this can happen if some variables 
have 0 variance, but I have checked that. There are also no NA values and no 
missing values in the tables.


What could cause glm to ignore/remove some predictor variables?


Any suggestion is welcome!


Thank you,


Gabrielle







[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM and POST HOC test INTERPRETATION

2017-02-08 Thread Bert Gunter
Your questions are basically statistical and therefore OT here,
although some kind soul may respond. I would strongly suggest that you
consult with a local statistical expert, as you seem to be out of your
depth statistically.

Cheers,
Bert




Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Feb 8, 2017 at 4:08 PM, CHIRIBOGA Xavier
 wrote:
> Dear colleagues,
>
>
> I am analyzing a data set of 68 values (integers). In some treatments 
> (exactly 6) the values are "zero". Because I record 0 in my measurement (or 
> really a small value below zero)
>
> My experiment is designed in such a way that I record values for 6 treatments 
> at 2 times. Replicates are different in each combination time-treatment.
>
> I am running a GLM , poisson distribution, for ANOVA I used Chisq, and for 
> the POST HOC test I used Tukey.
>
> I try to detect if interaction is significant, so I build the script:
> expresion~time*treatment
>
> Effects of time, treatment are interaction are significant. However, when I 
> run the script for Tukey comparisons, I only get 15 comparisons. Of course I 
> cannot interpret that:
>
> these comparisons are the same for Time 1 and Time 2, since there is a 
> significant effect of time. Moreover, I got a warning message : covariate 
> interactions found. I dont know if I am doing right? I dont know what to do?
>
>
> Thank you for your help,
>
>
> Xavier
>
> PhD Student
>
> University of Neuchatel
>
>
> lm3=glm(expresion~time*treatment,family="poisson")
>> summary(lm3)
>
> Call:
> glm(formula = expresion ~ time * treatment, family = "poisson")
>
> Deviance Residuals:
> Min   1Q   Median   3Q  Max
> -5.3796  -1.4523  -0.6642   1.2277   6.3909
>
> Coefficients:
>   Estimate Std. Error z value Pr(>|z|)
> (Intercept)2.099640.29508   7.115 1.12e-12 ***
> time   0.202940.19255   1.054 0.291895
> treatmentCHA0+Db  -0.170040.36180  -0.470 0.638356
> treatmentDb1.689520.37624   4.490 7.11e-06 ***
> treatmentHEALTHY   0.840350.50340   1.669 0.095049 .
> treatmentPCL   0.320720.37950   0.845 0.398041
> treatmentPCL+Db0.543650.34047   1.597 0.110320
> time:treatmentCHA0+Db  0.873140.22626   3.859 0.000114 ***
> time:treatmentDb  -0.828030.26539  -3.120 0.001808 **
> time:treatmentHEALTHY -1.369870.38318  -3.575 0.000350 ***
> time:treatmentPCL  0.084740.24635   0.344 0.730851
> time:treatmentPCL+Db   0.392440.21521   1.824 0.068217 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for poisson family taken to be 1)
>
> Null deviance: 1173.05  on 66  degrees of freedom
> Residual deviance:  403.07  on 55  degrees of freedom
> AIC: 707.95
>
> Number of Fisher Scoring iterations: 5
>
>
>> anova(lm3,test="Chisq")
> Analysis of Deviance Table
>
> Model: poisson, link: log
>
> Response: expresion
>
> Terms added sequentially (first to last)
>
>
>Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
> NULL  661173.05
> time1   100.55651072.50 < 2.2e-16 ***
> treatment   5   561.6960 510.81 < 2.2e-16 ***
> time:treatment  5   107.7555 403.07 < 2.2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>> summary(glht(lm3, mcp(treatment="Tukey")))
>
>  Simultaneous Tests for General Linear Hypotheses
>
> Multiple Comparisons of Means: Tukey Contrasts
>
>
> Fit: glm(formula = expresion ~ time * treatment, family = "poisson")
>
> Linear Hypotheses:
>Estimate Std. Error z value Pr(>|z|)
> CHA0+Db - CHA0 == 0 -0.1700 0.3618  -0.470   0.9970
> Db - CHA0 == 0   1.6895 0.3762   4.490   <0.001 ***
> HEALTHY - CHA0 == 0  0.8404 0.5034   1.669   0.5402
> PCL - CHA0 == 0  0.3207 0.3795   0.845   0.9568
> PCL+Db - CHA0 == 0   0.5437 0.3405   1.597   0.5892
> Db - CHA0+Db == 01.8596 0.3135   5.931   <0.001 ***
> HEALTHY - CHA0+Db == 0   1.0104 0.4584   2.204   0.2266
> PCL - CHA0+Db == 0   0.4908 0.3174   1.546   0.6231
> PCL+Db - CHA0+Db == 00.7137 0.2696   2.648   0.0817 .
> HEALTHY - Db == 0   -0.8492 0.4699  -1.807   0.4491
> PCL - Db == 0   -1.3688 0.3338  -4.101   <0.001 ***
> PCL+Db - Db == 0-1.1459 0.2887  -3.969   <0.001 ***
> PCL - HEALTHY == 0  -0.5196 0.4725  -1.100   0.8764
> PCL+Db - HEALTHY == 0   -0.2967 0.4418  -0.672   0.9842
> PCL+Db - PCL == 00.2229 0.2929   0.761   0.9725
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> (Adjusted p values reported -- single-step method)
>
> Warning message:
> In mcp2matrix(model, linfct = linfct) 

[R] GLM and POST HOC test INTERPRETATION

2017-02-08 Thread CHIRIBOGA Xavier
Dear colleagues,


I am analyzing a data set of 68 values (integers). In some treatments (exactly 
6) the values are "zero". Because I record 0 in my measurement (or really a 
small value below zero)

My experiment is designed in such a way that I record values for 6 treatments 
at 2 times. Replicates are different in each combination time-treatment.

I am running a GLM , poisson distribution, for ANOVA I used Chisq, and for the 
POST HOC test I used Tukey.

I try to detect if interaction is significant, so I build the script:
expresion~time*treatment

Effects of time, treatment are interaction are significant. However, when I run 
the script for Tukey comparisons, I only get 15 comparisons. Of course I cannot 
interpret that:

these comparisons are the same for Time 1 and Time 2, since there is a 
significant effect of time. Moreover, I got a warning message : covariate 
interactions found. I dont know if I am doing right? I dont know what to do?


Thank you for your help,


Xavier

PhD Student

University of Neuchatel


lm3=glm(expresion~time*treatment,family="poisson")
> summary(lm3)

Call:
glm(formula = expresion ~ time * treatment, family = "poisson")

Deviance Residuals:
Min   1Q   Median   3Q  Max
-5.3796  -1.4523  -0.6642   1.2277   6.3909

Coefficients:
  Estimate Std. Error z value Pr(>|z|)
(Intercept)2.099640.29508   7.115 1.12e-12 ***
time   0.202940.19255   1.054 0.291895
treatmentCHA0+Db  -0.170040.36180  -0.470 0.638356
treatmentDb1.689520.37624   4.490 7.11e-06 ***
treatmentHEALTHY   0.840350.50340   1.669 0.095049 .
treatmentPCL   0.320720.37950   0.845 0.398041
treatmentPCL+Db0.543650.34047   1.597 0.110320
time:treatmentCHA0+Db  0.873140.22626   3.859 0.000114 ***
time:treatmentDb  -0.828030.26539  -3.120 0.001808 **
time:treatmentHEALTHY -1.369870.38318  -3.575 0.000350 ***
time:treatmentPCL  0.084740.24635   0.344 0.730851
time:treatmentPCL+Db   0.392440.21521   1.824 0.068217 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 1173.05  on 66  degrees of freedom
Residual deviance:  403.07  on 55  degrees of freedom
AIC: 707.95

Number of Fisher Scoring iterations: 5


> anova(lm3,test="Chisq")
Analysis of Deviance Table

Model: poisson, link: log

Response: expresion

Terms added sequentially (first to last)


   Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL  661173.05
time1   100.55651072.50 < 2.2e-16 ***
treatment   5   561.6960 510.81 < 2.2e-16 ***
time:treatment  5   107.7555 403.07 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


> summary(glht(lm3, mcp(treatment="Tukey")))

 Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: glm(formula = expresion ~ time * treatment, family = "poisson")

Linear Hypotheses:
   Estimate Std. Error z value Pr(>|z|)
CHA0+Db - CHA0 == 0 -0.1700 0.3618  -0.470   0.9970
Db - CHA0 == 0   1.6895 0.3762   4.490   <0.001 ***
HEALTHY - CHA0 == 0  0.8404 0.5034   1.669   0.5402
PCL - CHA0 == 0  0.3207 0.3795   0.845   0.9568
PCL+Db - CHA0 == 0   0.5437 0.3405   1.597   0.5892
Db - CHA0+Db == 01.8596 0.3135   5.931   <0.001 ***
HEALTHY - CHA0+Db == 0   1.0104 0.4584   2.204   0.2266
PCL - CHA0+Db == 0   0.4908 0.3174   1.546   0.6231
PCL+Db - CHA0+Db == 00.7137 0.2696   2.648   0.0817 .
HEALTHY - Db == 0   -0.8492 0.4699  -1.807   0.4491
PCL - Db == 0   -1.3688 0.3338  -4.101   <0.001 ***
PCL+Db - Db == 0-1.1459 0.2887  -3.969   <0.001 ***
PCL - HEALTHY == 0  -0.5196 0.4725  -1.100   0.8764
PCL+Db - HEALTHY == 0   -0.2967 0.4418  -0.672   0.9842
PCL+Db - PCL == 00.2229 0.2929   0.761   0.9725
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

Warning message:
In mcp2matrix(model, linfct = linfct) :
  covariate interactions found -- default contrast might be inappropriate


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM HELP NEEDED!

2017-02-01 Thread David Winsemius

> On Feb 1, 2017, at 5:28 AM, CHIRIBOGA Xavier  
> wrote:
> 
> Dear colleagues,
> 
> 
> I am trying to perform a GLM. I tried again without using attach()...but 
> still is not working.
> 
> Do you have any idea to help me?
> 
> 
> Thank you again,
> 
> 
> Xavier
> 
> 
> 
> a <- read.table(file.choose(), h<-T)
>> head(a)
>  time treatment transinduc
> 11   CHA0+Db 1,0768
> 21   CHA0+Db 1,0706
> 31   CHA0+Db 1,0752
> 41   CHA0+Db 1,0689
> 51   CHA0+Db 1,1829
> 61PCL+Db 1,1423
>> summary(a)
>  time treatmenttransinduc
> Min.   :1.000   CHA0   :10   1,0488 : 6
> 1st Qu.:1.000   CHA0+Db: 9   1,0724 : 4
> Median :1.000   Db : 9   1,0752 : 3
> Mean   :1.433   HEALTHY:15   1,0954 : 3
> 3rd Qu.:2.000   PCL:10   1,0001 : 2
> Max.   :2.000   PCL+Db :14   1,0005 : 2
>  (Other):47
>> m1<-glm(a$transinduc~a$time*a$treatment,data=a,family="poisson")
> Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' 
> family") :
>  valor ausente donde TRUE/FALSE es necesario
> Adem�s: Warning message:
> In Ops.factor(y, 0) : '<' not meaningful for factors

Learning to read R error messages carefully and for full meaning is an 
essential step toward full mastery of this wonderful gift from the R Core.

The business about "negative values" is a complete distraction. That was the 
consequent of an if statement and was only in there to show you where to look 
in the function if you were so inclined . The error is actually being thrown 
much earlier in parsing that statement by the "<" operator inside the `if` 
statement. The "real" error message that says:

> valor ausente donde TRUE/FALSE es necesario

Or in English:

 missing value where TRUE/FALSE needed


Examine this code:

> x <- factor(1:5)
> y<- 1:5
> glm( x ~ y, family="poisson")
Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' 
family") : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(y, 0) : ‘<’ not meaningful for factors

Because the "<" operator is not defined for factors the result that is passed 
to `if` is of length 0. Setting the factor variable on the RHS and using the 
integer values on hte LHS succeeds.


> glm( y ~ x, family="poisson")

Call:  glm(formula = y ~ x, family = "poisson")

Coefficients:
(Intercept)   x2   x3   x4   x5  
  4.676e-116.931e-011.099e+001.386e+001.609e+00  

Degrees of Freedom: 4 Total (i.e. Null);  0 Residual
Null Deviance:  3.591 
Residual Deviance: 6.661e-16AIC: 24.35

Duncan Murdoch points out that fractional values in the LHS of a formula for 
Poisson regression will not be accepted (since the poisson distribution is 
discrete), and if you do in fact need Poisson regression that you would need to 
use the quasi-binomial family.

On the other hand ... If those were counts in the thousands and needed to be 
converted to "whole numbers", you might need to convert the factor values to 
numeric with:

a$transinduc <- as.numeric( gsub( "[,]", "", a$transinduc) )

-- 
David.
> 
>   [[alternative HTML version deleted]]

Rhelp is a plain text mailing list.

> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] GLM HELP NEEDED!

2017-02-01 Thread Duncan Murdoch

On 01/02/2017 8:28 AM, CHIRIBOGA Xavier wrote:

Dear colleagues,


I am trying to perform a GLM. I tried again without using attach()...but still 
is not working.

Do you have any idea to help me?


Thank you again,


Xavier



a <- read.table(file.choose(), h<-T)


The "h<-T" argument doesn't make sense.  You are just lucky that it 
worked here. For arguments you almost always use "=".


It's also a bad idea to use "T" as an abbreviation for "TRUE", but that 
won't always cause problems.



head(a)

  time treatment transinduc
11   CHA0+Db 1,0768
21   CHA0+Db 1,0706
31   CHA0+Db 1,0752
41   CHA0+Db 1,0689
51   CHA0+Db 1,1829
61PCL+Db 1,1423


I assume that transinduc is supposed to be real numbers, with comma as 
the decimal separator.  You need to use dec = "," in read.table to read 
that properly.



summary(a)

  time treatmenttransinduc
 Min.   :1.000   CHA0   :10   1,0488 : 6
 1st Qu.:1.000   CHA0+Db: 9   1,0724 : 4
 Median :1.000   Db : 9   1,0752 : 3
 Mean   :1.433   HEALTHY:15   1,0954 : 3
 3rd Qu.:2.000   PCL:10   1,0001 : 2
 Max.   :2.000   PCL+Db :14   1,0005 : 2
  (Other):47

m1<-glm(a$transinduc~a$time*a$treatment,data=a,family="poisson")

Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' 
family") :
  valor ausente donde TRUE/FALSE es necesario
Adem�s: Warning message:
In Ops.factor(y, 0) : '<' not meaningful for factors


It doesn't make sense to use fractional values like 1.0488 in Poisson 
regression.


Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] GLM HELP NEEDED!

2017-02-01 Thread CHIRIBOGA Xavier
Dear colleagues,


I am trying to perform a GLM. I tried again without using attach()...but still 
is not working.

Do you have any idea to help me?


Thank you again,


Xavier



a <- read.table(file.choose(), h<-T)
> head(a)
  time treatment transinduc
11   CHA0+Db 1,0768
21   CHA0+Db 1,0706
31   CHA0+Db 1,0752
41   CHA0+Db 1,0689
51   CHA0+Db 1,1829
61PCL+Db 1,1423
> summary(a)
  time treatmenttransinduc
 Min.   :1.000   CHA0   :10   1,0488 : 6
 1st Qu.:1.000   CHA0+Db: 9   1,0724 : 4
 Median :1.000   Db : 9   1,0752 : 3
 Mean   :1.433   HEALTHY:15   1,0954 : 3
 3rd Qu.:2.000   PCL:10   1,0001 : 2
 Max.   :2.000   PCL+Db :14   1,0005 : 2
  (Other):47
> m1<-glm(a$transinduc~a$time*a$treatment,data=a,family="poisson")
Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' 
family") :
  valor ausente donde TRUE/FALSE es necesario
Adem�s: Warning message:
In Ops.factor(y, 0) : '<' not meaningful for factors


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] GLM output problem

2016-09-01 Thread Anderson Eduardo
Embarrassing but that's true. I wrote 'binamial' instead of 'binomial'. I
tried now with the correct spelling and everything is ok, in fact.


> summary(GLM)

Call:
glm(formula = model, family = binomial(link = logit))

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -18.575010.7646  -1.726   0.0844 .
x 5.0403 2.7757   1.816   0.0694 .
I(x^2)   -0.2845 0.1558  -1.826   0.0679 .



Thank you all.

Anderson Eduardo


2016-09-01 5:14 GMT-03:00 peter dalgaard :

> >> And use the parameters returned by GLM to contruct an equation for the
> >> regression model:
> >>
> >> model.eq = -0.446078 + 0.267673*x - 0.014577*I(x^2)
> >
> > ## Not what I got with your data. I got:
> >
> > Coefficients:
> > (Intercept)x   I(x^2)
> >   -18.5750   5.0403  -0.2845
> >
> >
> > I suspect you had some other x,y variables lying around when you
> > defined your model.
>
>
> More likely, the family= specification got lost and gaussian family
> implied:
>
> > glm(model)
>
> Call:  glm(formula = model)
>
> Coefficients:
> (Intercept)x   I(x^2)
>-0.44608  0.26767 -0.01458
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com
>
>
>
>
>
>
>
>
>
>


-- 
Anderson A. Eduardo
--
Lattes  | Researcher ID
 | Google Acadêmico
 | Site

--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] GLM output problem

2016-09-01 Thread peter dalgaard
>> And use the parameters returned by GLM to contruct an equation for the
>> regression model:
>> 
>> model.eq = -0.446078 + 0.267673*x - 0.014577*I(x^2)
> 
> ## Not what I got with your data. I got:
> 
> Coefficients:
> (Intercept)x   I(x^2)
>   -18.5750   5.0403  -0.2845
> 
> 
> I suspect you had some other x,y variables lying around when you
> defined your model.


More likely, the family= specification got lost and gaussian family implied:

> glm(model)

Call:  glm(formula = model)

Coefficients:
(Intercept)x   I(x^2)  
   -0.44608  0.26767 -0.01458  

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM output problem

2016-08-31 Thread Bert Gunter
Inline.

-- Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Aug 31, 2016 at 10:03 AM, Anderson Eduardo
 wrote:
> Hello
>
> I have started to work with GLM and I am facing the following problem:
>
> If I take:
>
> y = c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
> x = 1:18
>
> model = y ~x + I(x^2)
> GLM = glm(model, family=binamial(link = logit))
>
> And use the parameters returned by GLM to contruct an equation for the
> regression model:
>
> model.eq = -0.446078 + 0.267673*x - 0.014577*I(x^2)

## Not what I got with your data. I got:

Coefficients:
(Intercept)x   I(x^2)
   -18.5750   5.0403  -0.2845


I suspect you had some other x,y variables lying around when you
defined your model.

-- Bert

>
> And backtransform it from the logit to the natural scale (using the inverse
> link-function for this case):
>
> model.proj = exp(model.eq)/(1+exp(model.eq))
>
> the plot for model.proj~x is not the same of the plot for fitted(GLM)~x
> (see the output attached).
>
> Why is this happening? Can someone help me?
>
> Regards,
>
> Anderson Eduardo
>
>  --
> Anderson A. Eduardo
> --
> Lattes  | Researcher ID
>  | Google Acadêmico
>  | Site
> 
> --
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] GLM output problem

2016-08-31 Thread Anderson Eduardo
Hello

I have started to work with GLM and I am facing the following problem:

If I take:

y = c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
x = 1:18

model = y ~x + I(x^2)
GLM = glm(model, family=binamial(link = logit))

And use the parameters returned by GLM to contruct an equation for the
regression model:

model.eq = -0.446078 + 0.267673*x - 0.014577*I(x^2)

And backtransform it from the logit to the natural scale (using the inverse
link-function for this case):

model.proj = exp(model.eq)/(1+exp(model.eq))

the plot for model.proj~x is not the same of the plot for fitted(GLM)~x
(see the output attached).

Why is this happening? Can someone help me?

Regards,

Anderson Eduardo

 --
Anderson A. Eduardo
--
Lattes  | Researcher ID
 | Google Acadêmico
 | Site

--
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] GLM: include observations with missing explanatory variables

2015-12-29 Thread Frank van Berkum
Hi all,
My problem is the following. 
Suppose I have a dataset with observations Y and explanatory variables X1, ..., 
Xn, and suppose one of these explanatory variables is geographical area (of 
which there are ten, j=1,...,10).  For some observations I know the area, but 
for others it is unknown and therefore record as NA.
I want to estimate a model of the form Y[i] ~ Poisson( lambda[i] ) with 
log(lambda[i]) = constant + \sum_j I[!is.na(area[i])] * I[area[i]==j] * beta[j]
In words: we estimate a constant for all observations and a factor for each 
area. If it is unknown what the area is, we only include the constant. 
When estimating this model using glm(), the records with is.na(area[i]) are 
'deleted' from the dataset, and this I do not want. I had hoped that the model 
as described above could be estimated using the function I() (interpret as), 
but so far my attempts have not succeeded. 
Any help on how to approach this is kindly appreciated.
Kind regards,
Frank van Berkum  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM: include observations with missing explanatory variables

2015-12-29 Thread David Winsemius

> On Dec 29, 2015, at 5:33 AM, Frank van Berkum  
> wrote:
> 
> Hi all,
> My problem is the following. 
> Suppose I have a dataset with observations Y and explanatory variables X1, 
> ..., Xn, and suppose one of these explanatory variables is geographical area 
> (of which there are ten, j=1,...,10).  For some observations I know the area, 
> but for others it is unknown and therefore record as NA.
> I want to estimate a model of the form Y[i] ~ Poisson( lambda[i] ) with 
> log(lambda[i]) = constant + \sum_j I[!is.na(area[i])] * I[area[i]==j] * 
> beta[j]
> In words: we estimate a constant for all observations and a factor for each 
> area. If it is unknown what the area is, we only include the constant. 
> When estimating this model using glm(), the records with is.na(area[i]) are 
> 'deleted' from the dataset, and this I do not want. I had hoped that the 
> model as described above could be estimated using the function I() (interpret 
> as), but so far my attempts have not succeeded. 
> Any help on how to approach this is kindly appreciated.
> Kind regards,
> Frank van Berkum
>   [[alternative HTML version deleted]]

I don't understand why you don't just recode the NA's to "unknown" and redo the 
model. This code is untested, but I think demonstrates the two steps needed: 
add a level and then recode the NA values assuming this factor is named `X6`:

dat$X6 <- factor(dat$X6, levels=c( levels(dat$X6), "unknown") )
dat$X6[ is.na(dat$X6) ] <- "unknown"


(I think this might be is a bit quicker than Rolf's approach.)


In R the default contrasts are "treatment" (which is different than the 
contrasts you describe) and so each area will be referenced to the first area 
(and the first level of all other factors) in the lexical ordering of area 
names. This ordering and the contrast type can be changed. The are many 
postings on rhelp over the years demonstrating how to do this.

Study these and the basics of R before reposting and if you do so, then post in 
plain text and include a small example constructed in R:

?factor
?C
?constrasts
?contr.sum

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM: include observations with missing explanatory variables

2015-12-29 Thread Rolf Turner


On 30/12/15 02:33, Frank van Berkum wrote:


Hi all, My problem is the following. Suppose I have a dataset with
observations Y and explanatory variables X1, ..., Xn, and suppose one
of these explanatory variables is geographical area (of which there
are ten, j=1,...,10).  For some observations I know the area, but for
others it is unknown and therefore record as NA. I want to estimate a
model of the form Y[i] ~ Poisson( lambda[i] ) with log(lambda[i]) =
constant + \sum_j I[!is.na(area[i])] * I[area[i]==j] * beta[j] In
words: we estimate a constant for all observations and a factor for
each area. If it is unknown what the area is, we only include the
constant. When estimating this model using glm(), the records with
is.na(area[i]) are 'deleted' from the dataset, and this I do not
want. I had hoped that the model as described above could be
estimated using the function I() (interpret as), but so far my
attempts have not succeeded. Any help on how to approach this is
kindly appreciated.


As Deep Thought was heard to remark: "Hm.  Tricky."

After pondering for a while it seems to me that you want to make NA the 
reference level of the factor "geographical area".


I.e. convert the NA values to a level of that factor and make it the 
*first* level.


E.g.:

set.seed(42)
GA <- factor(sample(LETTERS[1:10],200,TRUE))
GA[sample(1:200,10)] <- NA
tmp <- as.character(GA)
tmp[is.na(tmp)] <- "unknown"
GA <- factor(tmp,levels=c("unknown",LETTERS[1:10]))

y <- rpois(200,20) # Artificial response.
fit <- glm(y ~ GA,family=poisson)

Note that if you set

z <- predict(fit)

(this gives predictions on the scale of the linear predictor)
then the entries of z corresponding to "unknown" are equal to the 
intercept coefficient of fit.


I can't quite get my head around whether this is *really* accomplishing 
what you want --- but it's a thought.


cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm help

2015-08-21 Thread Joaquín Aldabe
Thankyou all by the comments and sorry for not sending in the adequate
format. I don't have the chance to make txt archives as open office doesn't
do it. I'm attaching the data in excel. Please let me know if this is ok.

The graph that is wierd to me is the BBSA vs AMGP. It is supposed that AMGP
has a positive effect on BBSA, but the model fit shows a negative slope. In
other forum I was told to plot each variable with fixed values of the other
two variables, and make different curves for each variable.

Thanks again for your interest and help.

All the best,
Joaquín

2015-08-19 12:54 GMT-03:00 Joaquín Aldabe joaquin.ald...@gmail.com:

 Dear All, I´m running a glm with poisson errors and have a doubt when
 ploting the predicted values. One of my variables has a positive slope in
 the summary output, but when I plot the predicted values on the original
 plot it draws a line with negative slope. I appreciate your comments on
 this and any other aspect of the analysis.

 Attached is the script and data in different formats just in case.

 Thanks in advanced,

 Joaquín.


 --
 *Joaquín Aldabe*

 *Grupo Biodiversidad, Ambiente y Sociedad*
 Centro Universitario de la Región Este, Universidad de la República
 Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha

 *Departamento de Conservación*
 Aves Uruguay
 BirdLife International
 Canelones 1164, Montevideo

 https://sites.google.com/site/joaquin.aldabe
 https://sites.google.com/site/perfilprofesionaljoaquinaldabe




-- 
*Joaquín Aldabe*

*Grupo Biodiversidad, Ambiente y Sociedad*
Centro Universitario de la Región Este, Universidad de la República
Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha

*Departamento de Conservación*
Aves Uruguay
BirdLife International
Canelones 1164, Montevideo

https://sites.google.com/site/joaquin.aldabe
https://sites.google.com/site/perfilprofesionaljoaquinaldabe
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm help

2015-08-21 Thread Bert Gunter
Inline.

-- Bert
Bert Gunter

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
   -- Clifford Stoll


On Thu, Aug 20, 2015 at 10:47 PM, Peter Langfelder
peter.langfel...@gmail.com wrote:
 On Thu, Aug 20, 2015 at 10:04 PM, Bert Gunter bgunter.4...@gmail.com wrote:

 I noticed you made two data-frames, ‘my4s' and ‘my4S'. The `my4S` was built 
 with `cbind` which would create a matrix (probably a character matrix) 
 rather than a data frame.

 False. There is a data.frame method for cbind that returns a data
 frame. Don't know the specifics here, though.


 True, but does not apply here, i.e., David is correct. cbind will
 return a data frame if the first argument is a data frame. In the OP
 case, the first argument was a vector and hence cbind gives a matrix,

False again.

class(cbind(a=1:5,b=data.frame(a=letters[1:5],b=3:7)))

[1] data.frame

##First argument a vector, but data frame is returned. Please consult
?cbind -- especially the data frame section -- for details.

Again, I don't know the specifics here, and you and David may still
well be right for what the OP did. I am only trying to correct what
appear to me to be incorrect statements about the data.frame method of
cbind (or rbind). Apologies if I have misinterpreted.

Cheers,
Bert



 of mode character if any of the inputs were character. Here's a
 short demo:

 a = data.frame(a1 = 1:10)
 # First argument a data frame, so the results is also a data frame  :
 class(cbind(a, b = 11:20))
 [1] data.frame
 # First argument is a vector, so the result is a matrix:
 class(cbind(a$a1, b = 11:20))
 [1] matrix
 mode(cbind(a$a1, b = 11:20))
 [1] numeric
 mode(cbind(a$a1, b = letters[11:20]))
 [1] character

 Peter

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm help

2015-08-21 Thread Peter Langfelder
Thanks for the correction, I learned something new.

Peter

On Fri, Aug 21, 2015 at 7:32 AM, Bert Gunter bgunter.4...@gmail.com wrote:
 Inline.

 -- Bert
 Bert Gunter

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
-- Clifford Stoll


 On Thu, Aug 20, 2015 at 10:47 PM, Peter Langfelder
 peter.langfel...@gmail.com wrote:
 On Thu, Aug 20, 2015 at 10:04 PM, Bert Gunter bgunter.4...@gmail.com wrote:

 I noticed you made two data-frames, ‘my4s' and ‘my4S'. The `my4S` was 
 built with `cbind` which would create a matrix (probably a character 
 matrix) rather than a data frame.

 False. There is a data.frame method for cbind that returns a data
 frame. Don't know the specifics here, though.


 True, but does not apply here, i.e., David is correct. cbind will
 return a data frame if the first argument is a data frame. In the OP
 case, the first argument was a vector and hence cbind gives a matrix,

 False again.

 class(cbind(a=1:5,b=data.frame(a=letters[1:5],b=3:7)))

 [1] data.frame

 ##First argument a vector, but data frame is returned. Please consult
 ?cbind -- especially the data frame section -- for details.

 Again, I don't know the specifics here, and you and David may still
 well be right for what the OP did. I am only trying to correct what
 appear to me to be incorrect statements about the data.frame method of
 cbind (or rbind). Apologies if I have misinterpreted.

 Cheers,
 Bert



 of mode character if any of the inputs were character. Here's a
 short demo:

 a = data.frame(a1 = 1:10)
 # First argument a data frame, so the results is also a data frame  :
 class(cbind(a, b = 11:20))
 [1] data.frame
 # First argument is a vector, so the result is a matrix:
 class(cbind(a$a1, b = 11:20))
 [1] matrix
 mode(cbind(a$a1, b = 11:20))
 [1] numeric
 mode(cbind(a$a1, b = letters[11:20]))
 [1] character

 Peter

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm help

2015-08-21 Thread David Winsemius
.csv format is still not accepted by the server. When I say it needs to be a 
.txt file   I mean it it needs to be a .txt file. You need to change its 
extension to .txt to prevent your mail client from labeling it as csv which is 
a different type even though I, too, would have thought they should both be 
accepted.

I do now see that the last argument to cbind was a dataframe and Bert's 
comments there were correct. I did not see that tiny little `my4s` way out at 
at the end of all those other arguments. 
-- 
David.

On Aug 21, 2015, at 10:06 AM, Joaquín Aldabe wrote:

 Thanks. Here is in csv format.
 Cheers,
 Joaquín.
 
 2015-08-21 12:49 GMT-03:00 Don McKenzie d...@uw.edu:
 
 You can save to .csv from OpenOffice.
 
 Sent from my iPad
 
 On Aug 21, 2015, at 4:45 AM, Joaquín Aldabe joaquin.ald...@gmail.com
 wrote:
 
 Thankyou all by the comments and sorry for not sending in the adequate
 format. I don't have the chance to make txt archives as open office
 doesn't
 do it. I'm attaching the data in excel. Please let me know if this is ok.
 
 The graph that is wierd to me is the BBSA vs AMGP. It is supposed that
 AMGP
 has a positive effect on BBSA, but the model fit shows a negative slope.
 In
 other forum I was told to plot each variable with fixed values of the
 other
 two variables, and make different curves for each variable.
 
 Thanks again for your interest and help.
 
 All the best,
 Joaquín
 
 2015-08-19 12:54 GMT-03:00 Joaquín Aldabe joaquin.ald...@gmail.com:
 
 Dear All, I´m running a glm with poisson errors and have a doubt when
 ploting the predicted values. One of my variables has a positive slope
 in
 the summary output, but when I plot the predicted values on the original
 plot it draws a line with negative slope. I appreciate your comments on
 this and any other aspect of the analysis.
 
 Attached is the script and data in different formats just in case.
 
 Thanks in advanced,
 
 Joaquín.
 
 
 --
 *Joaquín Aldabe*
 
 *Grupo Biodiversidad, Ambiente y Sociedad*
 Centro Universitario de la Región Este, Universidad de la República
 Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha
 
 *Departamento de Conservación*
 Aves Uruguay
 BirdLife International
 Canelones 1164, Montevideo
 
 https://sites.google.com/site/joaquin.aldabe
 https://sites.google.com/site/perfilprofesionaljoaquinaldabe
 
 
 
 
 --
 *Joaquín Aldabe*
 
 *Grupo Biodiversidad, Ambiente y Sociedad*
 Centro Universitario de la Región Este, Universidad de la República
 Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha
 
 *Departamento de Conservación*
 Aves Uruguay
 BirdLife International
 Canelones 1164, Montevideo
 
 https://sites.google.com/site/joaquin.aldabe
 https://sites.google.com/site/perfilprofesionaljoaquinaldabe
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 -- 
 *Joaquín Aldabe*
 
 *Grupo Biodiversidad, Ambiente y Sociedad*
 Centro Universitario de la Región Este, Universidad de la República
 Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha
 
 *Departamento de Conservación*
 Aves Uruguay
 BirdLife International
 Canelones 1164, Montevideo
 
 https://sites.google.com/site/joaquin.aldabe
 https://sites.google.com/site/perfilprofesionaljoaquinaldabe
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm help

2015-08-21 Thread Joaquín Aldabe
Thanks. Here is in csv format.
Cheers,
Joaquín.

2015-08-21 12:49 GMT-03:00 Don McKenzie d...@uw.edu:

 You can save to .csv from OpenOffice.

 Sent from my iPad

  On Aug 21, 2015, at 4:45 AM, Joaquín Aldabe joaquin.ald...@gmail.com
 wrote:
 
  Thankyou all by the comments and sorry for not sending in the adequate
  format. I don't have the chance to make txt archives as open office
 doesn't
  do it. I'm attaching the data in excel. Please let me know if this is ok.
 
  The graph that is wierd to me is the BBSA vs AMGP. It is supposed that
 AMGP
  has a positive effect on BBSA, but the model fit shows a negative slope.
 In
  other forum I was told to plot each variable with fixed values of the
 other
  two variables, and make different curves for each variable.
 
  Thanks again for your interest and help.
 
  All the best,
  Joaquín
 
  2015-08-19 12:54 GMT-03:00 Joaquín Aldabe joaquin.ald...@gmail.com:
 
  Dear All, I´m running a glm with poisson errors and have a doubt when
  ploting the predicted values. One of my variables has a positive slope
 in
  the summary output, but when I plot the predicted values on the original
  plot it draws a line with negative slope. I appreciate your comments on
  this and any other aspect of the analysis.
 
  Attached is the script and data in different formats just in case.
 
  Thanks in advanced,
 
  Joaquín.
 
 
  --
  *Joaquín Aldabe*
 
  *Grupo Biodiversidad, Ambiente y Sociedad*
  Centro Universitario de la Región Este, Universidad de la República
  Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha
 
  *Departamento de Conservación*
  Aves Uruguay
  BirdLife International
  Canelones 1164, Montevideo
 
  https://sites.google.com/site/joaquin.aldabe
  https://sites.google.com/site/perfilprofesionaljoaquinaldabe
 
 
 
 
  --
  *Joaquín Aldabe*
 
  *Grupo Biodiversidad, Ambiente y Sociedad*
  Centro Universitario de la Región Este, Universidad de la República
  Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha
 
  *Departamento de Conservación*
  Aves Uruguay
  BirdLife International
  Canelones 1164, Montevideo
 
  https://sites.google.com/site/joaquin.aldabe
  https://sites.google.com/site/perfilprofesionaljoaquinaldabe
  __
  R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.




-- 
*Joaquín Aldabe*

*Grupo Biodiversidad, Ambiente y Sociedad*
Centro Universitario de la Región Este, Universidad de la República
Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha

*Departamento de Conservación*
Aves Uruguay
BirdLife International
Canelones 1164, Montevideo

https://sites.google.com/site/joaquin.aldabe
https://sites.google.com/site/perfilprofesionaljoaquinaldabe
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm help

2015-08-20 Thread David Winsemius

 On Aug 19, 2015, at 8:54 AM, Joaquín Aldabe joaquin.ald...@gmail.com wrote:
 
 Dear All, I´m running a glm with poisson errors and have a doubt when
 ploting the predicted values. One of my variables has a positive slope in
 the summary output, but when I plot the predicted values on the original
 plot it draws a line with negative slope. I appreciate your comments on
 this and any other aspect of the analysis.
 
 Attached is the script and data in different formats just in case.

The script in .txt format was accepted by the server. The data was apparently 
not in a .txt format, so it was rejected.

You made three plots, so when you reply with data using the correct format 
(which is plain text and generally need to have an extension .txt so that mail 
client will lable as MIME-text) , you should be more specific about which plots 
are not exhibiting the expected features. 

I noticed you made two data-frames, ‘my4s' and ‘my4S'. The `my4S` was built 
with `cbind` which would create a matrix (probably a character matrix) rather 
than a data frame. I’m wondering you you inadvertently constructed a 
data-object whose structure was different than you imagined? It might have 
gotten coerced back to a dataframe with undesirable concsequences.

— 
David.

 
 Thanks in advanced,
 
 Joaquín.
 
 
 -- 
 *Joaquín Aldabe*
 
 *Grupo Biodiversidad, Ambiente y Sociedad*
 Centro Universitario de la Región Este, Universidad de la República
 Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha
 
 *Departamento de Conservación*
 Aves Uruguay
 BirdLife International
 Canelones 1164, Montevideo
 
 https://sites.google.com/site/joaquin.aldabe
 https://sites.google.com/site/perfilprofesionaljoaquinaldabe
 glm forum consult.txt__
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm help

2015-08-20 Thread Bert Gunter
Inline.

Cheers,
Bert
Bert Gunter

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
   -- Clifford Stoll


On Thu, Aug 20, 2015 at 9:06 PM, David Winsemius dwinsem...@comcast.net wrote:

 On Aug 19, 2015, at 8:54 AM, Joaquín Aldabe joaquin.ald...@gmail.com wrote:

 Dear All, I´m running a glm with poisson errors and have a doubt when
 ploting the predicted values. One of my variables has a positive slope in
 the summary output, but when I plot the predicted values on the original
 plot it draws a line with negative slope. I appreciate your comments on
 this and any other aspect of the analysis.

 Attached is the script and data in different formats just in case.

 The script in .txt format was accepted by the server. The data was apparently 
 not in a .txt format, so it was rejected.

 You made three plots, so when you reply with data using the correct format 
 (which is plain text and generally need to have an extension .txt so that 
 mail client will lable as MIME-text) , you should be more specific about 
 which plots are not exhibiting the expected features.

 I noticed you made two data-frames, ‘my4s' and ‘my4S'. The `my4S` was built 
 with `cbind` which would create a matrix (probably a character matrix) rather 
 than a data frame.

False. There is a data.frame method for cbind that returns a data
frame. Don't know the specifics here, though.

Cheers,
Bert


 I’m wondering you you inadvertently constructed a data-object whose
structure was different than you imagined? It might have gotten
coerced back to a dataframe with undesirable concsequences.

 —
 David.


 Thanks in advanced,

 Joaquín.


 --
 *Joaquín Aldabe*

 *Grupo Biodiversidad, Ambiente y Sociedad*
 Centro Universitario de la Región Este, Universidad de la República
 Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha

 *Departamento de Conservación*
 Aves Uruguay
 BirdLife International
 Canelones 1164, Montevideo

 https://sites.google.com/site/joaquin.aldabe
 https://sites.google.com/site/perfilprofesionaljoaquinaldabe
 glm forum consult.txt__
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm help

2015-08-20 Thread Peter Langfelder
On Thu, Aug 20, 2015 at 10:04 PM, Bert Gunter bgunter.4...@gmail.com wrote:

 I noticed you made two data-frames, ‘my4s' and ‘my4S'. The `my4S` was built 
 with `cbind` which would create a matrix (probably a character matrix) 
 rather than a data frame.

 False. There is a data.frame method for cbind that returns a data
 frame. Don't know the specifics here, though.


True, but does not apply here, i.e., David is correct. cbind will
return a data frame if the first argument is a data frame. In the OP
case, the first argument was a vector and hence cbind gives a matrix,
of mode character if any of the inputs were character. Here's a
short demo:

 a = data.frame(a1 = 1:10)
# First argument a data frame, so the results is also a data frame  :
 class(cbind(a, b = 11:20))
[1] data.frame
# First argument is a vector, so the result is a matrix:
 class(cbind(a$a1, b = 11:20))
[1] matrix
 mode(cbind(a$a1, b = 11:20))
[1] numeric
 mode(cbind(a$a1, b = letters[11:20]))
[1] character

Peter

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] glm help

2015-08-19 Thread Joaquín Aldabe
Dear All, I´m running a glm with poisson errors and have a doubt when
ploting the predicted values. One of my variables has a positive slope in
the summary output, but when I plot the predicted values on the original
plot it draws a line with negative slope. I appreciate your comments on
this and any other aspect of the analysis.

Attached is the script and data in different formats just in case.

Thanks in advanced,

Joaquín.


-- 
*Joaquín Aldabe*

*Grupo Biodiversidad, Ambiente y Sociedad*
Centro Universitario de la Región Este, Universidad de la República
Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha

*Departamento de Conservación*
Aves Uruguay
BirdLife International
Canelones 1164, Montevideo

https://sites.google.com/site/joaquin.aldabe
https://sites.google.com/site/perfilprofesionaljoaquinaldabe
my4-read.table(file.choose(), header=T, dec=,)
my4s-as.data.frame(scale(my4[,c(6,10,12,13,16)], center=T, scale=T))
my4S-cbind(BBSA=my4$BBSA, 
Field_name=my4$Field_name,Grassland_type=my4$Grassland_type, Flood=my4$Flood, 
Year=my4$Year,my4s)

m2.glm=glm(BBSA~AMGP+Distance_to_lagoon+Grass_height, family=quasipoisson, 
data=my4S
summary(m2.glm)
#residuales
par(mfrow=c(2,2))
plot(m2.glm)#pretty good

#predict
amgp=seq(-0.51, 5.44, length.out=100)
grass=seq(-0.85,2.83, length.out=100)
dist=seq(-1.59,1.53, length.out=100)
newdata=data.frame(AMGP=amgp, Distance_to_lagoon=dist,Grass_height=grass)
pred.m2.glm=exp(predict(m2.glm,newdata))
plot(BBSA~Grass_height, data=my4S)
lines(newdata$Grass_height,pred.m2.glm)
plot(BBSA~AMGP, data=my4S)
lines(newdata$AMGP,pred.m2.glm)#why negative slope if summary output is positive
plot(BBSA~Distance_to_lagoon, data=my4S)
lines(newdata$Distance_to_lagoon, pred.m2.glm)__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm help - final predictor variable NA

2015-07-22 Thread Ben Bolker
matthewjones43 matthew.jones at kellogg.ox.ac.uk writes:

 
 Hi, I am not a statistician and so I am sure whatever it is I 
 am doing wrong
 must be an obvious error for those who are...Basically I can
  not understand
 why I get NA for variable 'CDSTotal' when running a glm? 
 Does anyone have an
 idea of why this might be happening?

It might be useful to look at 
http://stackoverflow.com/questions/7337761/
 linear-regression-na-estimate-just-for-last-coefficient/7341074#7341074

(broken URL).  You are overfitting the model by including
a predictor that can be expressed as a linear combination of
other predictors, and R is trying to handle it automatically.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm help - final predictor variable NA

2015-07-21 Thread matthewjones43
Hi, I am not a statistician and so I am sure whatever it is I am doing wrong
must be an obvious error for those who are...Basically I can not understand
why I get NA for variable 'CDSTotal' when running a glm? Does anyone have an
idea of why this might be happening?

Call:  glm(formula = cbind(SRAS - 26, 182 - SRAS) ~ Age + Gender + LOC + 
PC + Stability + CDSTotal, family = binomial, data = Controlgroup)

Coefficients:
(Intercept)  Age   Gender  LOC   PCStability  
  -2.575071 0.009148 0.354143 0.018295-0.011317 0.090759  
   CDSTotal  
 NA  

Degrees of Freedom: 64 Total (i.e. Null);  59 Residual
Null Deviance:  2015 
Residual Deviance: 1264 AIC: 1614

Thanks
Matthew



--
View this message in context: 
http://r.789695.n4.nabble.com/glm-help-final-predictor-variable-NA-tp4710161.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm help - final predictor variable NA

2015-07-21 Thread Don McKenzie

 On Jul 21, 2015, at 7:30 PM, Rolf Turner r.tur...@auckland.ac.nz wrote:
 
 
 Psigh!  Why do people think that it is perfectly OK to undertake statistical 
 analyses without knowing or understanding any statistics?
 (I guess it's slightly less dangerous than undertaking to do your own wiring 
 without knowing anything about being an electrician, but still ….)

Fortune?




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm help - final predictor variable NA

2015-07-21 Thread Rolf Turner


Psigh!  Why do people think that it is perfectly OK to undertake 
statistical analyses without knowing or understanding any statistics?
(I guess it's slightly less dangerous than undertaking to do your own 
wiring without knowing anything about being an electrician, but still )


However, to stop venting and answer your question:  It is because 
CDSTotal is perfectly confounded (in the given design) with the other 
predictors. That is, CDSTotal is exactly equal to a linear combination 
of the other predictors (and the constant 1).


Try:

lm(CDSTotal ~ Age + Gender + LOC + PC + Stability, data=Controlgroup)

and you will find that the error sum of squares is zero (to within 
numerical tolerance).


cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

On 22/07/15 06:56, matthewjones43 wrote:


Hi, I am not a statistician and so I am sure whatever it is I am doing wrong
must be an obvious error for those who are...Basically I can not understand
why I get NA for variable 'CDSTotal' when running a glm? Does anyone have an
idea of why this might be happening?

Call:  glm(formula = cbind(SRAS - 26, 182 - SRAS) ~ Age + Gender + LOC +
 PC + Stability + CDSTotal, family = binomial, data = Controlgroup)

Coefficients:
(Intercept)  Age   Gender  LOC   PCStability
   -2.575071 0.009148 0.354143 0.018295-0.011317 0.090759
CDSTotal
  NA

Degrees of Freedom: 64 Total (i.e. Null);  59 Residual
Null Deviance:  2015
Residual Deviance: 1264 AIC: 1614


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM: What is a good way for dealing with new factor levels in the test set?

2015-05-04 Thread thuksu
For anyone who is looking for an answer to this in the future...

I went for imputation.  It's a way of filling in missing variables based
off of what you see elsewhere in the data.

Myself, I simply took a sample of the categorical from the rest of the test
set.  Some may argue that this is erroneous, as I simply don't know anything
about the new categorical in the test set, and I should throw it away. 
However, my results are going to be aggregated later, and this lets me do
some central limit theorem hand waving.



--
View this message in context: 
http://r.789695.n4.nabble.com/GLM-What-is-a-good-way-for-dealing-with-new-factor-levels-in-the-test-set-tp4706621p4706772.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM: What is a good way for dealing with new factor levels in the test set?

2015-04-30 Thread Jim Lemon
Hi thuksu,
Would defining the factor in your training set with all the levels
that occur in the test set solve the problem? That is, there would be
at least one factor level in the training set even though there were
no instances of that factor.

Jim


On Thu, Apr 30, 2015 at 8:05 AM, thuksu t...@huksu.com wrote:
 My training set and my test set have some factor levels that are
 different  It's rare, but it occurs.

 What is a good way for dealing with this?

 I don't want to throw away the entire row from the data frame, because there
 is some valuable information in there.

 Is there some way to say something like use the weighted average
 coefficient level for this factor?



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/GLM-What-is-a-good-way-for-dealing-with-new-factor-levels-in-the-test-set-tp4706621.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM: What is a good way for dealing with new factor levels in the test set?

2015-04-30 Thread thuksu
Hi, Thanks for the reply!

I did try this...

# res is a data frame
levels(res$mytypeid.f) - c(levels(res$mytypeid.f),mynewlevel)
logreg - glm(yesno ~ mytypeid.f + amount, data=res, family=binomial)
exp(coef(logreg)) 
# this result shows that the new level is not included in the regression. 
it's probably automatically removed.


I think what I want to do is identify new levels that are not in the
training set, and prune those from the test set.  Then I would be using the
dummy variable by default, which I think is the average, from reading
this:
http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm

Problem is, I'm not sure how to do that...



--
View this message in context: 
http://r.789695.n4.nabble.com/GLM-What-is-a-good-way-for-dealing-with-new-factor-levels-in-the-test-set-tp4706621p4706644.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM: What is a good way for dealing with new factor levels in the test set?

2015-04-29 Thread thuksu
My training set and my test set have some factor levels that are
different  It's rare, but it occurs.

What is a good way for dealing with this?

I don't want to throw away the entire row from the data frame, because there
is some valuable information in there.

Is there some way to say something like use the weighted average
coefficient level for this factor?



--
View this message in context: 
http://r.789695.n4.nabble.com/GLM-What-is-a-good-way-for-dealing-with-new-factor-levels-in-the-test-set-tp4706621.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM course in Palm Cove

2015-04-23 Thread Highland Statistics Ltd

Apologies for cross-posting


We would like to announce the following statistics course in Palm Cove, 
Australia.


Course1:  GLM with R (Bayesian and frequentist)
Location: Palm Cove, Australia
Date:   11-14 August 2015
Price:   475 GBP
Course website: http://www.highstat.com/statscourse.htm
Course flyer: 
http://www.highstat.com/Courses/Flyers/Flyer2015_08PalmCoveI.pdf



Keywords:
Bayesian statistics, MCMC and JAGS. Overdispersion and solutions. 
Poisson, negative binomial, Bernoulli, binomial,
beta, gamma, inverse Gaussian, lognormal, and binomial distributions. 
GLMs for count data and continuous data. Underdispersion. Truncated 
data. Power analysis.



Kind regards,

Alain Zuur

--
Dr. Alain F. Zuur

First author of:
1. Beginner's Guide to GAMM with R (2014).
2. Beginner's Guide to GLM and GLMM with R (2013).
3. Beginner's Guide to GAM with R (2012).
4. Zero Inflated Models and GLMM with R (2012).
5. A Beginner's Guide to R (2009).
6. Mixed effects models and extensions in ecology with R (2009).
7. Analysing Ecological Data (2007).

Highland Statistics Ltd.
9 St Clair Wynd
UK - AB41 6DZ Newburgh
Tel:   0044 1358 788177
Email: highs...@highstat.com
URL:   www.highstat.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm in hdlm?

2015-03-19 Thread Stanislav Aggerwal
I am following the example in the vignette for hdlm (p. 19), but I cannot
get it to to fit a logistic. For those who don't know the package, it
allows one to fit high dimensional data where the number of variables may
exceed the number of cases.

library(hdlm)

LMFUN - function(x,y) return(glm(y ~ x, family=binomial(link=logit)))
FUNCVFIT - function(x,y) return(cv.glmnet(x, y, family='binomial'))

set.seed(1234)
xx-matrix(runif(20*4),20,4)  #20 cases, 4 variables
xx[,1]-xx[,1]+1:20
yy-c(0,0,0,1,0,1,1,0,1,0,1,0,1,0,1,1,1,1,1,1)

#ordinary glms are fitted with no problems with yy either factor or numeric
fit1-glm(as.factor(yy)~xx,family=binomial)
fit2-glm(yy~xx,family=binomial)

fit3-hdlm(as.factor(yy) ~ xx, LMFUN = LMFUN, FUNCVFIT = FUNCVFIT)

This produces the error:

Error in { :
  task 1 failed - (list) object cannot be coerced to type 'double'
In addition: There were 11 warnings (use warnings() to see them)
=

fit4-hdlm(yy ~ xx, LMFUN = LMFUN, FUNCVFIT = FUNCVFIT)

This produces:

Error in { :
  task 1 failed - (list) object cannot be coerced to type 'double'
In addition: Warning messages:
1: Option grouped=FALSE enforced in cv.glmnet, since  3 observations per
fold
2: Option grouped=FALSE enforced in cv.glmnet, since  3 observations per
fold
3: Option grouped=FALSE enforced in cv.glmnet, since  3 observations per
fold
4: Option grouped=FALSE enforced in cv.glmnet, since  3 observations per
fold
5: Option grouped=FALSE enforced in cv.glmnet, since  3 observations per
fold
6: Option grouped=FALSE enforced in cv.glmnet, since  3 observations per
fold
7: Option grouped=FALSE enforced in cv.glmnet, since  3 observations per
fold
8: Option grouped=FALSE enforced in cv.glmnet, since  3 observations per
fold
9: Option grouped=FALSE enforced in cv.glmnet, since  3 observations per
fold
10: Option grouped=FALSE enforced in cv.glmnet, since  3 observations per
fold
=

Please tell me how to fit the glm in hdlm. Thanks very much for any help.

Stan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM for proportional data

2015-01-24 Thread Bert Gunter
wrong list.

Post on stats.stackexchange.com

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Sat, Jan 24, 2015 at 1:53 AM, Mauricio Gomes
almeida.go...@yahoo.com.br wrote:

 Dears,
 I'd like asking your help to understand a statistical issue from my data set. 
 I ran a GLM with proportional data, using a binomial distribution. However, 
 I've found underdispersion in my model and I don't know how to deal with 
 that. I'm aware that a solution for overdispersion is fit a model using a 
 quasibinomial distribution, but I couldn't find in the literature a solution 
 to my problem. I'd really thank if you can help me.
 Cheers,
 Mauricio


 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm - rgamma vector as predictor

2014-10-08 Thread Wim Kreinen
 I have a strange question concerning the fit of a Gamma generalized linear
model with glm (and further using gamma.shape to measure the shape
parameter).

Actually, I started with rgamma to generate some random vectors because I
wanted to play around with various conditions (and become familiar with
gamma shape). But before you can start with gamma shape you need to have a
glm object.

Assuming

gamma.random - rgamma (1000,1.5)

is my random vector.

How do I create a glm object if I only have one vector? I guess, my random
vector is the predictor and the response is the probability (in the sense
of a pdf). Can anybody give me a hint how the syntax is?

glm (? ~ gamma.random, family=Gamma)

Thanks Wim

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm - rgamma vector as predictor

2014-10-08 Thread Ben Bolker
Wim Kreinen wkreinen at gmail.com writes:



[snip]

 
 Actually, I started with rgamma to generate some random vectors because I
 wanted to play around with various conditions (and become familiar with
 gamma shape). But before you can start with gamma shape you need to have a
 glm object.
 
 Assuming
 
 gamma.random - rgamma (1000,1.5)
 
 is my random vector.
 
 How do I create a glm object if I only have one vector? I guess, my random
 vector is the predictor and the response is the probability (in the sense
 of a pdf). Can anybody give me a hint how the syntax is?

 [snip]

I think you're looking for

  MASS::gamma.shape(glm(gamma.random ~ 1, family=Gamma))

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Help

2014-09-04 Thread peter dalgaard
I think you are looking for

~ Region + Region:Helpers - 1

a.k.a. 

~ Region/Helpers - 1

Notice that these are actually the same model as your glm3 (and also as 
~Region*Helpers), only the parametrization differs. The latter includes an 
overall Helpers term so that the interaction coefficients should be read as 
differences in slope. (With default treatment contrasts, the Helpers term would 
be the slope for the first region and the interactions are differences in slope 
compared to the first region).

-pd

On 03 Sep 2014, at 17:17 , Kathy Haapala ka...@haapi.mn.org wrote:

 Hi all,
 
 I have a large set of data that looks something like this, although
 this data frame is much smaller and includes made up numbers to make
 my question easier.
 
 x.df - data.frame(Region = c(A, A, A, A, A, B, B, B, B, 
 B, B, C, C, C, C), Group_ID = c(1:15), No_Offspring = c(3, 0, 4, 
 2, 1, 0, 3, 4, 3, 2, 2, 5, 4, 1, 3), M_Offspring = c(2, 0, 2, 1, 0, 0, 1, 1, 
 2, 0, 1, 3, 2, 1, 1), F_Offspring = c(1, 0, 2, 1, 1, 0, 2, 3, 1, 2, 1, 2, 2, 
 0, 2), No_Helpers = c(5, 0, 2, 1, 0, 1, 3, 4, 2, 3, 2, 3, 4, 0, 0))
 
 x.df
   Region Group_ID No_Offspring M_Offspring F_Offspring No_Helpers
 1   A13   2   1  5
 2   A20   0   0  0
 3   A34   2   2  2
 4   A42   1   1  1
 5   A51   0   1  0
 6   B60   0   0  1
 7   B73   1   2  3
 8   B84   1   3  4
 9   B93   2   1  2
 10  B   102   0   2  3
 11  B   112   1   1  2
 12  C   125   3   2  3
 13  C   134   2   2  4
 14  C   141   1   0  0
 15  C   153   1   2  0
 
 I have been using GLMs to determine if the number of helpers
 (No_Helpers) has an effect on the sex ratio of the offspring. Here's
 the GLM I have been using:
 
 prop.male - x.df$M_Offspring/x.df$No_Offspring
 glm = glm(prop.male~No_Helpers,binomial,data=x.df)
 
 However, now I'd like to fit a model with region-specific regressions
 and see if this has more support than the model without
 region-specificity. So, I'd like one model that generates a regression
 for each region (A, B,  C).
 
 I've tried treating No_Helpers and Region as covariates:
 glm2 = glm(prop.male~No_Helpers+Region-1,binomial,data=x.df)
 which includes region-specificity in the intercepts, but not the
 entire regression,
 and as interaction terms:
 glm3 = glm(prop.male~No_Helpers*Region-1,binomial,data=x.df)
 which also does not give me an intercept and slope for each region.
 
 I'm not sure how else to adjust the formula, or if the adjustment
 should be somewhere else in the GLM call.
 
 Thanks in advance for your help.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM Help

2014-09-03 Thread Kathy Haapala
Hi all,

I have a large set of data that looks something like this, although
this data frame is much smaller and includes made up numbers to make
my question easier.

 x.df - data.frame(Region = c(A, A, A, A, A, B, B, B, B, 
 B, B, C, C, C, C), Group_ID = c(1:15), No_Offspring = c(3, 0, 4, 
 2, 1, 0, 3, 4, 3, 2, 2, 5, 4, 1, 3), M_Offspring = c(2, 0, 2, 1, 0, 0, 1, 1, 
 2, 0, 1, 3, 2, 1, 1), F_Offspring = c(1, 0, 2, 1, 1, 0, 2, 3, 1, 2, 1, 2, 2, 
 0, 2), No_Helpers = c(5, 0, 2, 1, 0, 1, 3, 4, 2, 3, 2, 3, 4, 0, 0))

 x.df
   Region Group_ID No_Offspring M_Offspring F_Offspring No_Helpers
1   A13   2   1  5
2   A20   0   0  0
3   A34   2   2  2
4   A42   1   1  1
5   A51   0   1  0
6   B60   0   0  1
7   B73   1   2  3
8   B84   1   3  4
9   B93   2   1  2
10  B   102   0   2  3
11  B   112   1   1  2
12  C   125   3   2  3
13  C   134   2   2  4
14  C   141   1   0  0
15  C   153   1   2  0

I have been using GLMs to determine if the number of helpers
(No_Helpers) has an effect on the sex ratio of the offspring. Here's
the GLM I have been using:

 prop.male - x.df$M_Offspring/x.df$No_Offspring
 glm = glm(prop.male~No_Helpers,binomial,data=x.df)

However, now I'd like to fit a model with region-specific regressions
and see if this has more support than the model without
region-specificity. So, I'd like one model that generates a regression
for each region (A, B,  C).

I've tried treating No_Helpers and Region as covariates:
 glm2 = glm(prop.male~No_Helpers+Region-1,binomial,data=x.df)
which includes region-specificity in the intercepts, but not the
entire regression,
and as interaction terms:
 glm3 = glm(prop.male~No_Helpers*Region-1,binomial,data=x.df)
which also does not give me an intercept and slope for each region.

I'm not sure how else to adjust the formula, or if the adjustment
should be somewhere else in the GLM call.

Thanks in advance for your help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM using truncated lognormal distribution

2014-04-26 Thread Marc Girondot

Dear honorable list-members,

I know how to fit a truncated lognormal distribution (or Gaussian) 
(example here:
http://max2.ese.u-psud.fr/epc/conservation/Girondot/Publications/Blog_r/Entrees/2012/5/24_Adjust_a_truncated_lognormal_distribution.html 
) but I would like to use it in the context of glm.


Rather than using family=gaussian(), ideally I would like to have a 
family=truncated_gaussian().
I see using fix(gaussian) how is organized the gaussian() function. It 
is not 100% clear now but I think I could manage to change it to do a 
family=truncated_gaussian().


But before to do it, perhaps it exists already.

I find the package truncnorm but it does not do this function.

Thanks a lot for any advice,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM vs GAM

2014-03-14 Thread Simon Wood



I am wondering whether anyone could explain what'd be the difference between 
running a 'generalized additive regression' versus 'generalized linear 
regression' with splines.
The smooth terms in mgcv::gam are represented using *penalized* 
regression splines, with the degree of penalization selected as part of 
fitting. Using glm with ns the smooth terms are represented using 
unpenalized regression splines with the smoothness controlled by how 
many knots you chose (none in your example, so the ns terms were 
actually straight lines).


best,
Simon



Are they same models theoretically? My apologies if this is a silly question. 
Any comments or direction to references will be highly appreciated.

Thanks in advance,

Ehsan


#
set.seed(545)
require(mgcv)
n - 200
x1 - c(rnorm(n), 1+rnorm(n))
x2 - sqrt(c(rnorm(n,4),rnorm(n,6)))
y - c(rep(0,n), rep(1,n))
#
# GAM version
#
r1 - gam(y~s(x1, bs = cr)+s(x2, bs = cr),family=binomial)
pr1 - predict(r1, type='response')
summary(pr1)
hist(pr1)
#
# GLM version
#
r2 - glm(y~ns(x1)+ns(x2),family=binomial)
pr2 - predict(r2, type='response')
summary(pr2)
hist(pr2)
#
# Results
#

summary(pr1)

  Min.   1st Qu.Median  Mean   3rd Qu.  Max.
0.394 0.0550200 0.5027000 0.500 0.9322000 1.000

summary(pr2)

  Min.   1st Qu.Median  Mean   3rd Qu.  Max.
0.403 0.0573300 0.5229000 0.500 0.9159000 0.9992000 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603   http://people.bath.ac.uk/sw283

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM vs GAM

2014-03-13 Thread Ehsan Karim
Dear R-list,

I am wondering whether anyone could explain what'd be the difference between 
running a 'generalized additive regression' versus 'generalized linear 
regression' with splines. 

Are they same models theoretically? My apologies if this is a silly question. 
Any comments or direction to references will be highly appreciated.

Thanks in advance,

Ehsan


#
set.seed(545)
require(mgcv)
n - 200
x1 - c(rnorm(n), 1+rnorm(n))
x2 - sqrt(c(rnorm(n,4),rnorm(n,6)))
y - c(rep(0,n), rep(1,n))
#
# GAM version
#
r1 - gam(y~s(x1, bs = cr)+s(x2, bs = cr),family=binomial)
pr1 - predict(r1, type='response')
summary(pr1)
hist(pr1)
#
# GLM version
#
r2 - glm(y~ns(x1)+ns(x2),family=binomial)
pr2 - predict(r2, type='response')
summary(pr2)
hist(pr2)
#
# Results
#
 summary(pr1)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.394 0.0550200 0.5027000 0.500 0.9322000 1.000 
 summary(pr2)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.403 0.0573300 0.5229000 0.500 0.9159000 0.9992000 
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM with Numeric and Factor as an Input

2014-02-25 Thread Lorenzo Isella

Dear All,
Please consider the snippet at the end of the email.
It is representative of the problems I am experiencing.
I am trying to use glm (without using the formula interface because the  
original data is quite large) to model the response in a case where the  
predictors are a mix of numbers and factors.
In the end, I always end up with an error message, despite having tried  
different choices for the family parameter.
Maybe I am missing the obvious, but can anyone run glm with a combination  
of numbers and factors?

Any help is appreciated.
Cheers

Lorenzo




###
set.seed(1234)

x - rnorm(1000)
dim(x) - c(100,10)
x - as.data.frame(x)
names(x) - LETTERS[seq(10)]

x$J - round(x$J)

x$J - as.factor(x$J)

y - x$A
x - subset(x, select=-c(A))

model - glm.fit(x,y## , family=gaussian
   )

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM with Numeric and Factor as an Input

2014-02-25 Thread Rolf Turner

On 26/02/14 01:40, Lorenzo Isella wrote:

Dear All,
Please consider the snippet at the end of the email.
It is representative of the problems I am experiencing.
I am trying to use glm (without using the formula interface because the
original data is quite large) to model the response in a case where the
predictors are a mix of numbers and factors.
In the end, I always end up with an error message, despite having tried
different choices for the family parameter.
Maybe I am missing the obvious, but can anyone run glm with a
combination of numbers and factors?
Any help is appreciated.
Cheers

Lorenzo




###
set.seed(1234)

x - rnorm(1000)
dim(x) - c(100,10)
x - as.data.frame(x)
names(x) - LETTERS[seq(10)]

x$J - round(x$J)

x$J - as.factor(x$J)

y - x$A
x - subset(x, select=-c(A))

model - glm.fit(x,y## , family=gaussian)


From the help for glm.fit:


For glm.fit: x is a ***design*** matrix of dimension n * p, and y is
a vector of observations of length n.


(Emphasis mine.)

So if you want to/insist on using glm.fit() rather than glm() you will 
have construct your own design matrix.  I.e. replace
each factor column by k-1 columns of dummy variables (where k is the 
number of levels of the given factor).  Note that x should really be a 
*matrix*, not a data frame although it seems that data frames (all of 
whose columns are numeric) get coerced to matrices so it doesn't matter 
much.


cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM weights for the Poisson family

2014-02-06 Thread Milan Bouchet-Valat
I think you should have a look at svyglm() from the survey package.


My two cents


Le mercredi 05 février 2014 à 14:41 +1300, Rolf Turner a écrit :
 You should direct your inquiry to R-help, not to me personally.  I am 
 taking the liberty of cc-ing my reply back to the list.
 
 I really haven't the time at the moment to think the issue through 
 thoroughly, but off the top of my head:  If you are going to use 
 weighted log likelihoods then any comparison of models that you engage 
 in should involve the *same* weights, otherwise you doing the good old 
 apples-with-oranges thing.
 
 So yes, the weights will change the log-likelihood and the AIC.  And so 
 they should.  And if you use AIC to compare models which are 
 meaningfully comparable (id est, have the same weights) this is not a 
 problem.
 
 As I say, this is off the top of my head.  Others older (???) and wiser 
 than I may correct me.
 
 cheers,
 
 Rolf Turner
 
 On 05/02/14 11:56, IamRandom wrote:
  I am trying to do weighted Poisson regression.  I have count data.
 
  Simple example:
  set.seed(50)
  x=seq(0,1,length=100)
  y=numeric(100)
  y[seq(1,100,by=2)]=round(exp(1.5*x[seq(1,100,by=2)]+rnorm(50,0,.1)),0)
  y[seq(2,100,by=2)]=round(exp(1.5*x[seq(1,100,by=2)]+rnorm(50,0,1)),0)
  weigh1=numeric(100)
  weigh1[seq(1,100,by=2)]=rep(5,50)
  weigh1[seq(2,100,by=2)]=rep(1,50)
 
 
  The -2*loglikelihood of both of these regressions is the same with lm,
  which makes sense. The scaling of the weights does not affect the
  log-likelihood.
   -2*logLik( lm(y~x, weights=weigh1))[1]
   -2*logLik( lm(y~x, weights=weigh1/3))[1]
 
  The -2*loglikelihood of these two regressions are different with glm:
-2*logLik(glm(y~x, family=poisson, weights=weigh1))[1]
-2*logLik(glm(y~x, family=poisson, weights=weigh1/3))[1]
 
  This means that the AIC and other model comparison techniques with this
  weighted Poisson regression are dependent on the scaling of the
  weights.  So I assume I misunderstand what the weights are doing in
  the glm function.
 
  -Tracy
 
 
 
  On 2/4/2014 12:56 PM, Rolf Turner wrote:
 
  On 04/02/14 20:12, IamRandom wrote:
 
  I am running a simple example of GLM.  If I include weights when
  family=poisson then the weights are calculated iteratively and
  $weights and $prior.weights return different values.  The $prior.weights
  are what I supplied and $weights are the posterior weights of the
  IWLS.  If I include weights with family=gaussian then the weights are
  static and $weights and $prior.weights return the same values; it seems
  to ignore IWLS algorithm procedure.  I really want the family=poisson
  to behave like the family=gaussian and use the static weights.
  Thoughts?
 
  As far as I understand things, your desideratum makes no sense. The
  prior weights and the just-plain-weights are very different creatures.
  The reason they wind up being the same for the gaussian family is that
  for the gaussian family the likelihood is maximized by least squares;
  there is no need for iteration or for re-weighting.
 
  The poisson family cannot behave like the gaussian family because for
  the poisson family (or any family *other* than gaussian) iteration is
  necessary in order to maximize the likelihood.
 
  You might get some insight into what's going on if you were to read
  Annette Dobson's book An Introduction to Generalized Linear Models
  (Chapman and Hall, 1990).
 
  cheers,
 
  Rolf Turner
 
 
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM weights for the Poisson family

2014-02-04 Thread David Winsemius

On Feb 3, 2014, at 11:12 PM, IamRandom wrote:

 I am running a simple example of GLM.  If I include weights when 
 family=poisson then the weights are calculated iteratively and $weights and 
 $prior.weights return different values.  The $prior.weights are what I 
 supplied and $weights are the posterior weights of the IWLS.  If I include 
 weights with family=gaussian then the weights are static and $weights and 
 $prior.weights return the same values; it seems to ignore IWLS algorithm 
 procedure.

I don't think so. I think it's just starting where you specify and proceeding 
normally from there.

  I really want the family=poisson to behave like the family=gaussian and 
 use the static weights.  Thoughts?

I was under the impression there is no need to iterate for family=gaussian. 
If my understanding is correct only one iteration gets done.

Maybe you should say what you are trying to accomplish.

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM weights for the Poisson family

2014-02-04 Thread Rolf Turner


On 04/02/14 20:12, IamRandom wrote:


I am running a simple example of GLM.  If I include weights when
family=poisson then the weights are calculated iteratively and
$weights and $prior.weights return different values.  The $prior.weights
are what I supplied and $weights are the posterior weights of the
IWLS.  If I include weights with family=gaussian then the weights are
static and $weights and $prior.weights return the same values; it seems
to ignore IWLS algorithm procedure.  I really want the family=poisson
to behave like the family=gaussian and use the static weights.  Thoughts?


As far as I understand things, your desideratum makes no sense.  The 
prior weights and the just-plain-weights are very different creatures.
The reason they wind up being the same for the gaussian family is that 
for the gaussian family the likelihood is maximized by least squares; 
there is no need for iteration or for re-weighting.


The poisson family cannot behave like the gaussian family because for 
the poisson family (or any family *other* than gaussian) iteration is 
necessary in order to maximize the likelihood.


You might get some insight into what's going on if you were to read 
Annette Dobson's book An Introduction to Generalized Linear Models

(Chapman and Hall, 1990).

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM weights for the Poisson family

2014-02-04 Thread Rolf Turner


You should direct your inquiry to R-help, not to me personally.  I am 
taking the liberty of cc-ing my reply back to the list.


I really haven't the time at the moment to think the issue through 
thoroughly, but off the top of my head:  If you are going to use 
weighted log likelihoods then any comparison of models that you engage 
in should involve the *same* weights, otherwise you doing the good old 
apples-with-oranges thing.


So yes, the weights will change the log-likelihood and the AIC.  And so 
they should.  And if you use AIC to compare models which are 
meaningfully comparable (id est, have the same weights) this is not a 
problem.


As I say, this is off the top of my head.  Others older (???) and wiser 
than I may correct me.


cheers,

Rolf Turner

On 05/02/14 11:56, IamRandom wrote:

I am trying to do weighted Poisson regression.  I have count data.

Simple example:
set.seed(50)
x=seq(0,1,length=100)
y=numeric(100)
y[seq(1,100,by=2)]=round(exp(1.5*x[seq(1,100,by=2)]+rnorm(50,0,.1)),0)
y[seq(2,100,by=2)]=round(exp(1.5*x[seq(1,100,by=2)]+rnorm(50,0,1)),0)
weigh1=numeric(100)
weigh1[seq(1,100,by=2)]=rep(5,50)
weigh1[seq(2,100,by=2)]=rep(1,50)


The -2*loglikelihood of both of these regressions is the same with lm,
which makes sense. The scaling of the weights does not affect the
log-likelihood.
 -2*logLik( lm(y~x, weights=weigh1))[1]
 -2*logLik( lm(y~x, weights=weigh1/3))[1]

The -2*loglikelihood of these two regressions are different with glm:
  -2*logLik(glm(y~x, family=poisson, weights=weigh1))[1]
  -2*logLik(glm(y~x, family=poisson, weights=weigh1/3))[1]

This means that the AIC and other model comparison techniques with this
weighted Poisson regression are dependent on the scaling of the
weights.  So I assume I misunderstand what the weights are doing in
the glm function.

-Tracy



On 2/4/2014 12:56 PM, Rolf Turner wrote:


On 04/02/14 20:12, IamRandom wrote:


I am running a simple example of GLM.  If I include weights when
family=poisson then the weights are calculated iteratively and
$weights and $prior.weights return different values.  The $prior.weights
are what I supplied and $weights are the posterior weights of the
IWLS.  If I include weights with family=gaussian then the weights are
static and $weights and $prior.weights return the same values; it seems
to ignore IWLS algorithm procedure.  I really want the family=poisson
to behave like the family=gaussian and use the static weights.
Thoughts?


As far as I understand things, your desideratum makes no sense. The
prior weights and the just-plain-weights are very different creatures.
The reason they wind up being the same for the gaussian family is that
for the gaussian family the likelihood is maximized by least squares;
there is no need for iteration or for re-weighting.

The poisson family cannot behave like the gaussian family because for
the poisson family (or any family *other* than gaussian) iteration is
necessary in order to maximize the likelihood.

You might get some insight into what's going on if you were to read
Annette Dobson's book An Introduction to Generalized Linear Models
(Chapman and Hall, 1990).

cheers,

Rolf Turner







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM weights for the Poisson family

2014-02-03 Thread IamRandom
I am running a simple example of GLM.  If I include weights when 
family=poisson then the weights are calculated iteratively and 
$weights and $prior.weights return different values.  The $prior.weights 
are what I supplied and $weights are the posterior weights of the 
IWLS.  If I include weights with family=gaussian then the weights are 
static and $weights and $prior.weights return the same values; it seems 
to ignore IWLS algorithm procedure.  I really want the family=poisson 
to behave like the family=gaussian and use the static weights.  Thoughts?


-Tracy Holsclaw
tholscla at uci.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM: Defining non-constant variance for (gaussian) family

2013-10-07 Thread Christoph Häni
Dear all,

I want to fit some observations Y to a set of predictor variables X_i. (and
proceed with model selection with support of the second-order AIC
(AICc)...)
I (think I) know that the distribution of Y[i] is Gaussian and has a
variance, which is proportional to its value Y[i]. Say:

x1 - rnorm(1:100)*2
x2 - runif(1:100)*0.5
etc...

Y - exp(3.2 + 0.45*x1 - 0.5*x2) + rnorm(100,0,(exp(3.2 + 0.45*x1 -
0.5*x2))^0.5)

So, I thought, I could change the family object as following:

GaussVar - gaussian(log)

GaussVar$variance - function(mu) mu
GaussVar$dev.resids - function(y, mu, wt) wt * ((y - mu)^2/mu)
GaussVar$aic - function (y, n, mu, wt, dev){
nobs - length(y);-2 * sum(dnorm(y,mu,sqrt(dev*mu/nobs), log = TRUE) * wt)
+ 2
}

And the resulting Regression:

reg - glm(Y~x1+x2,family=GaussVar)

Is this a valid modification of the family object?
Am I missing something?
Is the calculation of the AIC complete rubbish?

I have to admit, I'm very new to the topic of regression, so please be
kind! :-)

Thanks in advance,
Christoph

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM result output..

2013-09-15 Thread Lutfor Rahman
Thanks for that. Still I am a bit confused. Please advice me.
Now, I have got minimal adequate model keeping all the those significant
predictors in the model which is shown below:
Coefficients:
   Estimate Std. Error z value Pr(|z|)
(Intercept)  5.846747   0.987461   5.921  3.2e-09 ***
orgmatter -0.886985   0.235347  -3.769 0.000164 ***
baresoil-0.935106   0.293838  -3.182 0.001461 **
orgmatter:moistcont   0.009452   0.002759   3.426 0.000612 ***
baresoil:moistcont 0.025640   0.009698   2.644 0.008194 **
wood10:grass100.007433   0.003187   2.333 0.019667 *
grass10:rdnet100.004822   0.001563   3.085 0.002036 **
wood10:rdnet10-0.045485   0.016890  -2.693 0.007081 **

But when I do anova test of this minimal adequate model, only
baresoil:moistcont, grass10:rdnet, wood10:rdnet10 were found significant.

Df Deviance Resid. Df Resid. Dev Pr(Chi)
NULL   17 36.167
orgmatter1   2.426016 33.741 0.119334
baresoil 1   1.087115 32.654 0.297104
orgmatter:moistcont  1   2.561114 30.093 0.109526
baresoil:moistcont   1   8.297613 21.795 0.003970 **
wood10:grass10   1   0.018412 21.777 0.892042
grass10:rdnet10  1   5.452011 16.325 0.019546 *
wood10:rdnet10   1   8.156510  8.168 0.004291 **

So, when I report the outcome of this model, should I show summary
significance values or anova significance value (chi-square).

Regards
Lutfor





On Fri, Sep 13, 2013 at 7:42 PM, David Winsemius dwinsem...@comcast.netwrote:


 On Sep 13, 2013, at 9:38 AM, Lutfor Rahman wrote:

  Dear forum members,

 Please help me understanding significance value when GLM done in r.

 After doing minimal adequate model, I have found a number of independent
 values  which are significant. But doing their anova significant values
 are
 different. Please find my result following. Which significant values
 should
 I use.


 glm(formula = richness ~ moistcont + orgmatter + baresoil + grass10 +
wood10 + rdnet10 + moistcont:orgmatter + moistcont:baresoil +
grass10:wood10 + grass10:rdnet10 + wood10:rdnet10, family = poisson,
data = data)

 Deviance Residuals:
 Min1QMedian3Q   Max
 -1.19112  -0.33682   0.09813   0.32808   0.70509

 Coefficients:
 Estimate Std. Error z value Pr(|z|)
 (Intercept) 11.384447   4.014170   2.836  0.00457 **
 moistcont   -0.095813   0.084995  -1.127  0.25962
 orgmatter   -1.810116   0.613688  -2.950  0.00318 **
 baresoil-1.636707   0.559129  -2.927  0.00342 **
 grass10 -0.018979   0.065647  -0.289  0.77250
 wood10   0.150683   0.128386   1.174  0.24053
 rdnet10 -0.011448   0.068090  -0.168  0.86648
 moistcont:orgmatter  0.025698   0.011521   2.231  0.02571 *
 moistcont:baresoil   0.044110   0.015799   2.792  0.00524 **
 grass10:wood10   0.010740   0.006498   1.653  0.09838 .
 grass10:rdnet10  0.011013   0.004412   2.496  0.01255 *
 wood10:rdnet10  -0.088297   0.027120  -3.256  0.00113 **


 The only p-value I would have expected to be the same would have been the
 last one in the avova output:

 Df Deviance Resid. Df Resid. Dev Pr(Chi)
 .

 wood10:rdnet10   1  10.7812 6  3.928 0.001025 **


 And that particular p-value is not far off from the 0.00113 value reported
 in the model summary. The other p-values are not of the same sort. The
 p-values above are basically reporting the significance of removing
 single predictors or interactions from the full model. The anova reported
 below is perfoming sequential addition of terms to a NULL model as well as
 doing a different test:  LR tests instead of Wald statistics.

 --
 David.


  ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 (Dispersion parameter for poisson family taken to be 1)

Null deviance: 36.1673  on 17  degrees of freedom
 Residual deviance:  3.9276  on  6  degrees of freedom
 AIC: 97.893

 Number of Fisher Scoring iterations: 4

  anova(data1, test=Chisq)

 Analysis of Deviance Table

 Model: poisson, link: log

 Response: richness

 Terms added sequentially (first to last)


Df Deviance Resid. Df Resid. Dev Pr(Chi)
 NULL   17 36.167
 moistcont1   8.632216 27.535 0.003303 **
 orgmatter1   2.124415 25.411 0.144966
 baresoil 1   0.002914 25.408 0.956986
 grass10  1   1.525113 23.883 0.216842
 wood10   1   3.695212 20.187 0.054570 .
 rdnet10  1   0.000111 20.187 0.990564
 moistcont:orgmatter  1   2.048210 18.139 0.152381
 moistcont:baresoil   1   2.8730 9 15.266 

Re: [R] GLM result output..

2013-09-15 Thread David Winsemius

On Sep 15, 2013, at 2:15 AM, Lutfor Rahman wrote:

 Thanks for that. Still I am a bit confused. Please advice me. 
 Now, I have got minimal adequate model keeping all the those significant 
 predictors in the model which is shown below:
 Coefficients:
Estimate Std. Error z value Pr(|z|)
 (Intercept)  5.846747   0.987461   5.921  3.2e-09 ***
 orgmatter -0.886985   0.235347  -3.769 0.000164 ***
 baresoil-0.935106   0.293838  -3.182 0.001461 ** 
 orgmatter:moistcont   0.009452   0.002759   3.426 0.000612 ***
 baresoil:moistcont 0.025640   0.009698   2.644 0.008194 ** 
 wood10:grass100.007433   0.003187   2.333 0.019667 *  
 grass10:rdnet100.004822   0.001563   3.085 0.002036 ** 
 wood10:rdnet10-0.045485   0.016890  -2.693 0.007081 ** 
 
 But when I do anova test of this minimal adequate model, only 
 baresoil:moistcont, grass10:rdnet, wood10:rdnet10 were found significant. 
 
 Df Deviance Resid. Df Resid. Dev Pr(Chi)   
 NULL   17 36.167
 orgmatter1   2.426016 33.741 0.119334   
 baresoil 1   1.087115 32.654 0.297104   
 orgmatter:moistcont  1   2.561114 30.093 0.109526   
 baresoil:moistcont   1   8.297613 21.795 0.003970 **
 wood10:grass10   1   0.018412 21.777 0.892042   
 grass10:rdnet10  1   5.452011 16.325 0.019546 * 
 wood10:rdnet10   1   8.156510  8.168 0.004291 **
 
 So, when I report the outcome of this model, should I show summary 
 significance values or anova significance value (chi-square).

I don't think you should report anything. You need a consultation.  

Rhelp is established to offer technical advice about running R code.

From the Posting Guide:

Questions about statistics: The R mailing lists are primarily intended for 
questions and discussion about the R software. However, questions about 
statistical methodology are sometimes posted. If the question is well-asked and 
of interest to someone on the list, it may elicit an informative up-to-date 
answer. See also the Usenet groups sci.stat.consult (applied statistics and 
consulting) and sci.stat.math (mathematical stat and probability).

Basic statistics and classroom homework: R-help is not intended for these.

-- 
David.
 
 Regards
 Lutfor   
 
 
 
 
 
 On Fri, Sep 13, 2013 at 7:42 PM, David Winsemius dwinsem...@comcast.net 
 wrote:
 
 On Sep 13, 2013, at 9:38 AM, Lutfor Rahman wrote:
 
 Dear forum members,
 
 Please help me understanding significance value when GLM done in r.
 
 After doing minimal adequate model, I have found a number of independent
 values  which are significant. But doing their anova significant values are
 different. Please find my result following. Which significant values should
 I use.
 
 
 glm(formula = richness ~ moistcont + orgmatter + baresoil + grass10 +
wood10 + rdnet10 + moistcont:orgmatter + moistcont:baresoil +
grass10:wood10 + grass10:rdnet10 + wood10:rdnet10, family = poisson,
data = data)
 
 Deviance Residuals:
 Min1QMedian3Q   Max
 -1.19112  -0.33682   0.09813   0.32808   0.70509
 
 Coefficients:
 Estimate Std. Error z value Pr(|z|)
 (Intercept) 11.384447   4.014170   2.836  0.00457 **
 moistcont   -0.095813   0.084995  -1.127  0.25962
 orgmatter   -1.810116   0.613688  -2.950  0.00318 **
 baresoil-1.636707   0.559129  -2.927  0.00342 **
 grass10 -0.018979   0.065647  -0.289  0.77250
 wood10   0.150683   0.128386   1.174  0.24053
 rdnet10 -0.011448   0.068090  -0.168  0.86648
 moistcont:orgmatter  0.025698   0.011521   2.231  0.02571 *
 moistcont:baresoil   0.044110   0.015799   2.792  0.00524 **
 grass10:wood10   0.010740   0.006498   1.653  0.09838 .
 grass10:rdnet10  0.011013   0.004412   2.496  0.01255 *
 wood10:rdnet10  -0.088297   0.027120  -3.256  0.00113 **
 
 The only p-value I would have expected to be the same would have been the 
 last one in the avova output:
 
Df Deviance Resid. Df Resid. Dev Pr(Chi)
 .
 
 wood10:rdnet10   1  10.7812 6  3.928 0.001025 **
 
 And that particular p-value is not far off from the 0.00113 value reported in 
 the model summary. The other p-values are not of the same sort. The p-values 
 above are basically reporting the significance of removing single 
 predictors or interactions from the full model. The anova reported below is 
 perfoming sequential addition of terms to a NULL model as well as doing a 
 different test:  LR tests instead of Wald statistics.
 
 -- 
 David.
 
 
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 (Dispersion parameter for poisson family taken to be 1)
 
Null deviance: 36.1673  on 17  degrees of freedom
 Residual deviance:  

Re: [R] GLM result output..

2013-09-13 Thread David Winsemius


On Sep 13, 2013, at 9:38 AM, Lutfor Rahman wrote:


Dear forum members,

Please help me understanding significance value when GLM done in r.

After doing minimal adequate model, I have found a number of  
independent
values  which are significant. But doing their anova significant  
values are
different. Please find my result following. Which significant values  
should

I use.


glm(formula = richness ~ moistcont + orgmatter + baresoil + grass10 +
   wood10 + rdnet10 + moistcont:orgmatter + moistcont:baresoil +
   grass10:wood10 + grass10:rdnet10 + wood10:rdnet10, family =  
poisson,

   data = data)

Deviance Residuals:
Min1QMedian3Q   Max
-1.19112  -0.33682   0.09813   0.32808   0.70509

Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept) 11.384447   4.014170   2.836  0.00457 **
moistcont   -0.095813   0.084995  -1.127  0.25962
orgmatter   -1.810116   0.613688  -2.950  0.00318 **
baresoil-1.636707   0.559129  -2.927  0.00342 **
grass10 -0.018979   0.065647  -0.289  0.77250
wood10   0.150683   0.128386   1.174  0.24053
rdnet10 -0.011448   0.068090  -0.168  0.86648
moistcont:orgmatter  0.025698   0.011521   2.231  0.02571 *
moistcont:baresoil   0.044110   0.015799   2.792  0.00524 **
grass10:wood10   0.010740   0.006498   1.653  0.09838 .
grass10:rdnet10  0.011013   0.004412   2.496  0.01255 *
wood10:rdnet10  -0.088297   0.027120  -3.256  0.00113 **


The only p-value I would have expected to be the same would have been  
the last one in the avova output:



   Df Deviance Resid. Df Resid. Dev Pr(Chi)
.
wood10:rdnet10   1  10.7812 6  3.928 0.001025 **


And that particular p-value is not far off from the 0.00113 value  
reported in the model summary. The other p-values are not of the same  
sort. The p-values above are basically reporting the significance of  
removing single predictors or interactions from the full model. The  
anova reported below is perfoming sequential addition of terms to a  
NULL model as well as doing a different test:  LR tests instead of  
Wald statistics.


--
David.



---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

   Null deviance: 36.1673  on 17  degrees of freedom
Residual deviance:  3.9276  on  6  degrees of freedom
AIC: 97.893

Number of Fisher Scoring iterations: 4


anova(data1, test=Chisq)

Analysis of Deviance Table

Model: poisson, link: log

Response: richness

Terms added sequentially (first to last)


   Df Deviance Resid. Df Resid. Dev Pr(Chi)
NULL   17 36.167
moistcont1   8.632216 27.535 0.003303 **
orgmatter1   2.124415 25.411 0.144966
baresoil 1   0.002914 25.408 0.956986
grass10  1   1.525113 23.883 0.216842
wood10   1   3.695212 20.187 0.054570 .
rdnet10  1   0.000111 20.187 0.990564
moistcont:orgmatter  1   2.048210 18.139 0.152381
moistcont:baresoil   1   2.8730 9 15.266 0.090076 .
grass10:wood10   1   0.1431 8 15.123 0.705247
grass10:rdnet10  1   0.4141 7 14.709 0.519883
wood10:rdnet10   1  10.7812 6  3.928 0.001025 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM result output..

2013-09-13 Thread Lutfor Rahman
Dear forum members,

Please help me understanding significance value when GLM done in r.

After doing minimal adequate model, I have found a number of independent
values  which are significant. But doing their anova significant values are
different. Please find my result following. Which significant values should
I use.


glm(formula = richness ~ moistcont + orgmatter + baresoil + grass10 +
wood10 + rdnet10 + moistcont:orgmatter + moistcont:baresoil +
grass10:wood10 + grass10:rdnet10 + wood10:rdnet10, family = poisson,
data = data)

Deviance Residuals:
 Min1QMedian3Q   Max
-1.19112  -0.33682   0.09813   0.32808   0.70509

Coefficients:
 Estimate Std. Error z value Pr(|z|)
(Intercept) 11.384447   4.014170   2.836  0.00457 **
moistcont   -0.095813   0.084995  -1.127  0.25962
orgmatter   -1.810116   0.613688  -2.950  0.00318 **
baresoil-1.636707   0.559129  -2.927  0.00342 **
grass10 -0.018979   0.065647  -0.289  0.77250
wood10   0.150683   0.128386   1.174  0.24053
rdnet10 -0.011448   0.068090  -0.168  0.86648
moistcont:orgmatter  0.025698   0.011521   2.231  0.02571 *
moistcont:baresoil   0.044110   0.015799   2.792  0.00524 **
grass10:wood10   0.010740   0.006498   1.653  0.09838 .
grass10:rdnet10  0.011013   0.004412   2.496  0.01255 *
wood10:rdnet10  -0.088297   0.027120  -3.256  0.00113 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 36.1673  on 17  degrees of freedom
Residual deviance:  3.9276  on  6  degrees of freedom
AIC: 97.893

Number of Fisher Scoring iterations: 4

 anova(data1, test=Chisq)
Analysis of Deviance Table

Model: poisson, link: log

Response: richness

Terms added sequentially (first to last)


Df Deviance Resid. Df Resid. Dev Pr(Chi)
NULL   17 36.167
moistcont1   8.632216 27.535 0.003303 **
orgmatter1   2.124415 25.411 0.144966
baresoil 1   0.002914 25.408 0.956986
grass10  1   1.525113 23.883 0.216842
wood10   1   3.695212 20.187 0.054570 .
rdnet10  1   0.000111 20.187 0.990564
moistcont:orgmatter  1   2.048210 18.139 0.152381
moistcont:baresoil   1   2.8730 9 15.266 0.090076 .
grass10:wood10   1   0.1431 8 15.123 0.705247
grass10:rdnet10  1   0.4141 7 14.709 0.519883
wood10:rdnet10   1  10.7812 6  3.928 0.001025 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM with Binomial Distribution

2013-07-24 Thread Bob O'Hara
The model is including the extra terms, but they are folded into, well,
somewhere. The -1 only removes the intercept from the Time effect: the
other factors still have to be contrasts to something. Doesn't Crawley's
book explain this? If not, I wrote a paper a few years ago that might help:
http://www.sekj.org/PDF/anz46-free/anz46-124.pdf

Bob


On 23 July 2013 20:53, Leif Rasmuson rasmu...@uoregon.edu wrote:

 Hi All,

 I am working on re-analyzing per a reviewers request.

 The goal of the project was to determine if the presence of predatory
 fishes caused female crabs to delay the release of larvae. Number of
 releases were recorded at three time periods: 1 hour before the simulated
 tide, 3 hours after the simulated tide and 6 hours after the tide.
 Predators were introduced at high tide and removed just following the 3
 hour observation. Thus we have two categorical variables with three levels
 each. Time: pre-introduction, after introduction and after removal.
 Predator: no-predator, predator of adults and predator of larvae.

 The number of females was not consistent between trials so my initial
 method was to use the SRH extension of the Kruskal wallace test to examine
 the percent of of larvae released.

 A reviewer suggested that I use a GLM with a binomial distribution and
 logit link to analyze the larval release patterns. I used Crawley's book to
 analyze the data based on proportions. See code below. The problem I am
 running into is that the model is not including the predator control and
 interaction is excluding the predator control and the introduction time
 periods. Is this just the nature of using a binomial distribution (and/or
 our small sample size) or is there a way to force R to include all the
 factors and run all the interactions?

 Thanks,
 Leif

 N=GLM$NumFemale-GLM$**NumFemaleRelease
 y=GLM$NumFemaleRelease
 rv -cbind(y,N)

  model1=glm(y~GLM$Time*GLM$**Treatment-1,family=binomial)
  summary(model1)

 Call:
 glm(formula = y ~ GLM$Time * GLM$Treatment - 1, family = binomial)

 Deviance Residuals:
 Min 1Q Median 3Q Max
 -1.6024 -0.7146 -0.4284 -0.2512 3.9885

 Coefficients:
 Estimate Std. Error z value Pr(|z|)
 GLM$TimeAfterIntroduction -1.8289 0.2297 -7.963 1.68e-15 ***
 GLM$TimeAfterRemoval -3.6571 0.5064 -7.222 5.14e-13 ***
 GLM$TimePreIntro -5.0626 1.0029 -5.048 4.46e-07 ***
 GLM$TreatmentPlanktivore -0.0123 0.3211 -0.038 0.969
 GLM$TreatmentPredator -0.1589 0.3311 -0.480 0.631
 GLM$TimeAfterRemoval:GLM$**TreatmentPlanktivore 0.5339 0.7132 0.749 0.454
 GLM$TimePreIntro:GLM$**TreatmentPlanktivore 0.6561 1.2708 0.516 0.606
 GLM$TimeAfterRemoval:GLM$**TreatmentPredator -0.1791 0.8399 -0.213 0.831
 GLM$TimePreIntro:GLM$**TreatmentPredator 1.7496 1.1496 1.522 0.128
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 (Dispersion parameter for binomial family taken to be 1)

 Null deviance: 1720.2 on 270 degrees of freedom
 Residual deviance: 258.3 on 261 degrees of freedom
 AIC: 378.23

 Number of Fisher Scoring iterations: 6

 T

 --
 Leif K. Rasmuson

 Doctoral Candidate
 Oregon Institute of Marine Biology
 Phone: (253)961-1763
 E-Mail: rasmu...@uoregon.edu

 º`·.¸¸.·´¯`·...¸º


 __**
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/**
 posting-guide.html http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://occamstypewriter.org/boboh/
Journal of Negative Results - EEB: www.jnr-eeb.org

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM with Binomial Distribution

2013-07-23 Thread Leif Rasmuson

Hi All,

I am working on re-analyzing per a reviewers request.

The goal of the project was to determine if the presence of predatory 
fishes caused female crabs to delay the release of larvae. Number of 
releases were recorded at three time periods: 1 hour before the 
simulated tide, 3 hours after the simulated tide and 6 hours after the 
tide. Predators were introduced at high tide and removed just following 
the 3 hour observation. Thus we have two categorical variables with 
three levels each. Time: pre-introduction, after introduction and after 
removal. Predator: no-predator, predator of adults and predator of larvae.


The number of females was not consistent between trials so my initial 
method was to use the SRH extension of the Kruskal wallace test to 
examine the percent of of larvae released.


A reviewer suggested that I use a GLM with a binomial distribution and 
logit link to analyze the larval release patterns. I used Crawley's book 
to analyze the data based on proportions. See code below. The problem I 
am running into is that the model is not including the predator control 
and interaction is excluding the predator control and the introduction 
time periods. Is this just the nature of using a binomial distribution 
(and/or our small sample size) or is there a way to force R to include 
all the factors and run all the interactions?


Thanks,
Leif

N=GLM$NumFemale-GLM$NumFemaleRelease
y=GLM$NumFemaleRelease
rv -cbind(y,N)

 model1=glm(y~GLM$Time*GLM$Treatment-1,family=binomial)
 summary(model1)

Call:
glm(formula = y ~ GLM$Time * GLM$Treatment - 1, family = binomial)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.6024 -0.7146 -0.4284 -0.2512 3.9885

Coefficients:
Estimate Std. Error z value Pr(|z|)
GLM$TimeAfterIntroduction -1.8289 0.2297 -7.963 1.68e-15 ***
GLM$TimeAfterRemoval -3.6571 0.5064 -7.222 5.14e-13 ***
GLM$TimePreIntro -5.0626 1.0029 -5.048 4.46e-07 ***
GLM$TreatmentPlanktivore -0.0123 0.3211 -0.038 0.969
GLM$TreatmentPredator -0.1589 0.3311 -0.480 0.631
GLM$TimeAfterRemoval:GLM$TreatmentPlanktivore 0.5339 0.7132 0.749 0.454
GLM$TimePreIntro:GLM$TreatmentPlanktivore 0.6561 1.2708 0.516 0.606
GLM$TimeAfterRemoval:GLM$TreatmentPredator -0.1791 0.8399 -0.213 0.831
GLM$TimePreIntro:GLM$TreatmentPredator 1.7496 1.1496 1.522 0.128
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1720.2 on 270 degrees of freedom
Residual deviance: 258.3 on 261 degrees of freedom
AIC: 378.23

Number of Fisher Scoring iterations: 6

T

--
Leif K. Rasmuson

Doctoral Candidate
Oregon Institute of Marine Biology
Phone: (253)961-1763
E-Mail: rasmu...@uoregon.edu

º`·.¸¸.·´¯`·...¸º


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm - change offset to avoid NA?

2013-07-16 Thread Hermann Norpois
I did not think of something like try. I thouht that there should always
be a value if I do a logistic regression but sometimes the values are far
from being meaningful. So there is a cut-off. My plan was to change the
cut-off.
Thanks
Johannes


2013/7/15 Bert Gunter gunter.ber...@gene.com

 I think what you want is

 ?try  ##or
 ?tryCatch

 ## The second is more flexible but slightly more complicated.

 to trap the error and perhaps refit the model without interaction?

 Cheers,
 Bert

 On Mon, Jul 15, 2013 at 10:45 AM, Hermann Norpois hnorp...@gmail.com
 wrote:
  Hello,
 
  I use glm within a function testing for the appearence of the coexistence
  of (minor allels in a subset of)  snps. And then I extract the
  Pr(|z|)-value for the interaction. Principally it works but sometimes
 the
  function stops because this value for the interaction  is NA. For
  instance, this is the case in the following example:
 
  lz - glm(trait~rs7572685*rs10520302, data=mus, family=binomial)
  summary (lz)
  ...
  Coefficients: (1 not defined because of singularities)
   Estimate Std. Error z value Pr(|z|)
  (Intercept)   0.056140.16782   0.3350.738
  rs7572685 0.490410.41437   1.1830.237
  rs105203020.492690.43514   1.1320.258
  rs7572685:rs10520302   NA NA  NA   NA
  ...
 
  I would prefer some values instead of NA (though it does not make any
 sense
  in terms of interpretation) for the sake of the smooth running of my
  function. How is this done? I guess I have to change the offset but I
 dont
  understand how.
  Thanks
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:

 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm - change offset to avoid NA?

2013-07-16 Thread David Winsemius

On Jul 16, 2013, at 8:52 AM, Hermann Norpois wrote:

 I did not think of something like try. I thouht that there should always
 be a value if I do a logistic regression but sometimes the values are far
 from being meaningful. So there is a cut-off.

That seems implausilbe. I think you are just seeing the effect of a random 
selection creating a data situation where the interaction term is aliased with 
the others, in which case `glm` will set the coefficient to NA.

 My plan was to change the
 cut-off.

You said there was an error, but you should have said that the results were 
unexpected. You can create an error-condition when this occurs with the 
`stop` function. And then do whatever remedial work is needed.

-- 
David.

 Thanks
 Johannes
 
 
 2013/7/15 Bert Gunter gunter.ber...@gene.com
 
 I think what you want is
 
 ?try  ##or
 ?tryCatch
 
 ## The second is more flexible but slightly more complicated.
 
 to trap the error and perhaps refit the model without interaction?
 
 Cheers,
 Bert
 
 On Mon, Jul 15, 2013 at 10:45 AM, Hermann Norpois hnorp...@gmail.com
 wrote:
 Hello,
 
 I use glm within a function testing for the appearence of the coexistence
 of (minor allels in a subset of)  snps. And then I extract the
 Pr(|z|)-value for the interaction. Principally it works but sometimes
 the
 function stops because this value for the interaction  is NA. For
 instance, this is the case in the following example:
 
 lz - glm(trait~rs7572685*rs10520302, data=mus, family=binomial)
 summary (lz)
 ...
 Coefficients: (1 not defined because of singularities)
 Estimate Std. Error z value Pr(|z|)
 (Intercept)   0.056140.16782   0.3350.738
 rs7572685 0.490410.41437   1.1830.237
 rs105203020.492690.43514   1.1320.258
 rs7572685:rs10520302   NA NA  NA   NA
 ...
 
 I would prefer some values instead of NA (though it does not make any
 sense
 in terms of interpretation) for the sake of the smooth running of my
 function. How is this done? I guess I have to change the offset but I
 dont
 understand how.
 Thanks
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 --
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 Internal Contact Info:
 Phone: 467-7374
 Website:
 
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm - change offset to avoid NA?

2013-07-15 Thread Hermann Norpois
Hello,

I use glm within a function testing for the appearence of the coexistence
of (minor allels in a subset of)  snps. And then I extract the
Pr(|z|)-value for the interaction. Principally it works but sometimes the
function stops because this value for the interaction  is NA. For
instance, this is the case in the following example:

lz - glm(trait~rs7572685*rs10520302, data=mus, family=binomial)
 summary (lz)
...
Coefficients: (1 not defined because of singularities)
 Estimate Std. Error z value Pr(|z|)
(Intercept)   0.056140.16782   0.3350.738
rs7572685 0.490410.41437   1.1830.237
rs105203020.492690.43514   1.1320.258
rs7572685:rs10520302   NA NA  NA   NA
...

I would prefer some values instead of NA (though it does not make any sense
in terms of interpretation) for the sake of the smooth running of my
function. How is this done? I guess I have to change the offset but I dont
understand how.
Thanks

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm - change offset to avoid NA?

2013-07-15 Thread Bert Gunter
I think what you want is

?try  ##or
?tryCatch

## The second is more flexible but slightly more complicated.

to trap the error and perhaps refit the model without interaction?

Cheers,
Bert

On Mon, Jul 15, 2013 at 10:45 AM, Hermann Norpois hnorp...@gmail.com wrote:
 Hello,

 I use glm within a function testing for the appearence of the coexistence
 of (minor allels in a subset of)  snps. And then I extract the
 Pr(|z|)-value for the interaction. Principally it works but sometimes the
 function stops because this value for the interaction  is NA. For
 instance, this is the case in the following example:

 lz - glm(trait~rs7572685*rs10520302, data=mus, family=binomial)
 summary (lz)
 ...
 Coefficients: (1 not defined because of singularities)
  Estimate Std. Error z value Pr(|z|)
 (Intercept)   0.056140.16782   0.3350.738
 rs7572685 0.490410.41437   1.1830.237
 rs105203020.492690.43514   1.1320.258
 rs7572685:rs10520302   NA NA  NA   NA
 ...

 I would prefer some values instead of NA (though it does not make any sense
 in terms of interpretation) for the sake of the smooth running of my
 function. How is this done? I guess I have to change the offset but I dont
 understand how.
 Thanks

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm and lm can't find weights

2013-03-13 Thread Dimitri Liakhovitski
Thanks a lot, Marc!
Dimitri

On Mon, Mar 11, 2013 at 4:28 PM, Marc Schwartz marc_schwa...@me.com wrote:


 On Mar 11, 2013, at 1:46 PM, Dimitri Liakhovitski 
 dimitri.liakhovit...@gmail.com wrote:

  Hello, and apologies for not providing an example. However, my question
 is
  more general.
 
  I have a lengthy function. This function is using another internal
 function
  that modifies the data frame I am reading in. This internal function is
  using the command model.frame (with data and weights inside) and returns
 a
  data frame I am using for further analyses.
  However, when I try to run my function (which has an lm or a glm
  commmand)
  using that new data frame and separately defined weights, I get an error:
  can't find your weights object.
  This happens despite the fact that I actually print the weights object
  right before the glm command. The object is there - I can see it using
  ls(). I checked and rechecked - there are no typos.
  Interestingly, this happens only when I run it as a function.
 
  When I rename my arguments, go inside the function and run it line by
 line
  - I don't get this problem. Clearly, something is happening with my
 weights
  in the function environment. I was thinking - can it be that once I've
 used
  model.frame - everything else - like glm and lm - is confused as to what
  the weights are and doesn't want to take the weights I hand over to it
 but
  is looking for them elsewhere?
 
  Thank you!
  Dimitri


 Many discussions on this over the years, along with the 'subset' argument.
 From the Details section of ?lm:

   All of weights, subset and offset are evaluated in the same way as
 variables in formula,
   that is first in data and then in the environment of formula.

 You might also review:

   http://developer.r-project.org/nonstandard-eval.pdf

 This post by Thomas from 2006 may be helpful:

   http://tolstoy.newcastle.edu.au/R/devel/06/06/5869.html

 and the reply from Peter also:

   http://tolstoy.newcastle.edu.au/R/devel/06/06/5868.html

 Regards,

 Marc Schwartz




-- 
Dimitri Liakhovitski

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm and lm can't find weights

2013-03-11 Thread Dimitri Liakhovitski
Hello, and apologies for not providing an example. However, my question is
more general.

I have a lengthy function. This function is using another internal function
that modifies the data frame I am reading in. This internal function is
using the command model.frame (with data and weights inside) and returns a
data frame I am using for further analyses.
However, when I try to run my function (which has an lm or a glm  commmand)
using that new data frame and separately defined weights, I get an error:
can't find your weights object.
This happens despite the fact that I actually print the weights object
right before the glm command. The object is there - I can see it using
ls(). I checked and rechecked - there are no typos.
Interestingly, this happens only when I run it as a function.

When I rename my arguments, go inside the function and run it line by line
- I don't get this problem. Clearly, something is happening with my weights
in the function environment. I was thinking - can it be that once I've used
model.frame - everything else - like glm and lm - is confused as to what
the weights are and doesn't want to take the weights I hand over to it but
is looking for them elsewhere?

Thank you!
Dimitri

-- 
Dimitri Liakhovitski
gfk.com http://marketfusionanalytics.com/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm and lm can't find weights

2013-03-11 Thread Marc Schwartz

On Mar 11, 2013, at 1:46 PM, Dimitri Liakhovitski 
dimitri.liakhovit...@gmail.com wrote:

 Hello, and apologies for not providing an example. However, my question is
 more general.
 
 I have a lengthy function. This function is using another internal function
 that modifies the data frame I am reading in. This internal function is
 using the command model.frame (with data and weights inside) and returns a
 data frame I am using for further analyses.
 However, when I try to run my function (which has an lm or a glm  commmand)
 using that new data frame and separately defined weights, I get an error:
 can't find your weights object.
 This happens despite the fact that I actually print the weights object
 right before the glm command. The object is there - I can see it using
 ls(). I checked and rechecked - there are no typos.
 Interestingly, this happens only when I run it as a function.
 
 When I rename my arguments, go inside the function and run it line by line
 - I don't get this problem. Clearly, something is happening with my weights
 in the function environment. I was thinking - can it be that once I've used
 model.frame - everything else - like glm and lm - is confused as to what
 the weights are and doesn't want to take the weights I hand over to it but
 is looking for them elsewhere?
 
 Thank you!
 Dimitri


Many discussions on this over the years, along with the 'subset' argument. From 
the Details section of ?lm:

  All of weights, subset and offset are evaluated in the same way as variables 
in formula, 
  that is first in data and then in the environment of formula.

You might also review:

  http://developer.r-project.org/nonstandard-eval.pdf

This post by Thomas from 2006 may be helpful:

  http://tolstoy.newcastle.edu.au/R/devel/06/06/5869.html

and the reply from Peter also:

  http://tolstoy.newcastle.edu.au/R/devel/06/06/5868.html

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm poisson and quasipoisson

2013-02-01 Thread Ivan-K
I am sure, that this is not a pure Poisson! Huge overdispersion!
You get inflated confidence intervals! 
(although, the point estimates of the regression coefficients stay the same)
Try to look for the causes of overdispersion! It may be geteroscedastisity?
What is the nature of the response, is it the positive integers?
Perhaps in your model still missing something important predictors?
Or just you can try the Gamma or log-normal distr.

Ivan, IPAE UB RAS




--
View this message in context: 
http://r.789695.n4.nabble.com/glm-poisson-and-quasipoisson-tp4657234p4657310.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm poisson and quasipoisson

2013-02-01 Thread ilai
On Thu, Jan 31, 2013 at 2:13 PM, Wim Kreinen wkrei...@gmail.com wrote:

 Hello,

 I have a question about modelling via  glm.


I think you are way off track. Either the data, glm, or both, are not what
you think they are.


 I have a dataset

skn300.tab - structure(list(n = 1:97, freq = c(0L, 0L, 0L, 0L, 1L, 7L, 40L,
100L, 276L, 543L, 952L, 1414L, 1853L, 2199L, 2435L, 2270L, 2042L,
1679L, 1386L, 1108L, 922L, 792L, 642L, 597L, 453L, 424L, 370L,
297L, 278L, 218L, 208L, 172L, 174L, 149L, 124L, 98L, 98L, 67L,
78L, 67L, 46L, 34L, 31L, 42L, 34L, 21L, 28L, 18L, 18L, 18L, 10L,
19L, 6L, 9L, 10L, 6L, 6L, 5L, 3L, 9L, 4L, 3L, 4L, 5L, 2L, 6L,
4L, 2L, 2L, 3L, 3L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 1L),
kum = c(0L, 0L, 0L, 0L, 1L, 8L, 48L, 148L, 424L, 967L, 1919L,
L, 5186L, 7385L, 9820L, 12090L, 14132L, 15811L, 17197L,
18305L, 19227L, 20019L, 20661L, 21258L, 21711L, 22135L, 22505L,
22802L, 23080L, 23298L, 23506L, 23678L, 23852L, 24001L, 24125L,
24223L, 24321L, 24388L, 24466L, 24533L, 24579L, 24613L, 24644L,
24686L, 24720L, 24741L, 24769L, 24787L, 24805L, 24823L, 24833L,
24852L, 24858L, 24867L, 24877L, 24883L, 24889L, 24894L, 24897L,
24906L, 24910L, 24913L, 24917L, 24922L, 24924L, 24930L, 24934L,
24936L, 24938L, 24941L, 24944L, 24944L, 24944L, 24944L, 24944L,
24946L, 24947L, 24947L, 24947L, 24947L, 24947L, 24947L, 24948L,
24948L, 24948L, 24949L, 24951L, 24952L, 24952L, 24952L, 24952L,
24952L, 24954L, 24954L, 24954L, 24954L, 24955L)), .Names = c(n,
freq, kum), row.names = c(NA, -97L), class = data.frame)


 that looks like as if it where poisson distributed (actually I would
 appreciate that) but it isnt


plot(skn300.tab)

My guess, we are looking at the pdf and cdf (maybe even of a Poisson
process), but not at any data that lends itself to a (generalized) linear
model. Consult a statistician, post on stackexchange, read about
regression, or better define your actual R problem here, demonstrating this
is not homework - see the posting guide.

Cheers




 because  mean unequals var.


  mean (x)
 [1] 901.7827
  var (x)
 [1] 132439.3


 Anyway, I tried to model it via poisson and quasipoisson. Actually, just to
 get an impression how glm works. But I dont know how to interprete the
 data. Of course this is the case because my knowledge concerning logistic
 regressions is rather limited. Hoping there is somebody with mercy I would
 like to understand which parameters are important, e.g. which paramter
 might give me a hint that a poisson model is a bad idea. For hints
 concerning some tutorials  about reading glm-output I would appreciate as
 well.

 Thanks
 Wim


  skn300.glmp - glm (freq~n, data=skn300.tab, family=poisson)
  summary (skn300.glmp)

 Call:
 glm(formula = freq ~ n, family = poisson, data = skn300.tab)

 Deviance Residuals:
 Min   1Q   Median   3Q  Max
 -51.332   -9.383   -6.599   -3.959   55.111

 Coefficients:
   Estimate Std. Error z value Pr(|z|)
 (Intercept)  7.2374375  0.0093285   775.8   2e-16 ***
 n   -0.0539424  0.0003699  -145.8   2e-16 ***
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 (Dispersion parameter for poisson family taken to be 1)

 Null deviance: 71731  on 96  degrees of freedom
 Residual deviance: 37383  on 95  degrees of freedom
 AIC: 37800

 Number of Fisher Scoring iterations: 6

 
  skn300.glmq - glm (freq~n, data=skn300.tab, family=quasipoisson)
  summary (skn300.glmq)

 Call:
 glm(formula = freq ~ n, family = quasipoisson, data = skn300.tab)

 Deviance Residuals:
 Min   1Q   Median   3Q  Max
 -51.332   -9.383   -6.599   -3.959   55.111

 Coefficients:
  Estimate Std. Error t value Pr(|t|)
 (Intercept)  7.237438   0.186381  38.831   2e-16 ***
 n   -0.053942   0.007391  -7.298  8.8e-11 ***
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 (Dispersion parameter for quasipoisson family taken to be 399.1874)

 Null deviance: 71731  on 96  degrees of freedom
 Residual deviance: 37383  on 95  degrees of freedom
 AIC: NA

 Number of Fisher Scoring iterations: 6


   dput (skn300.tab)
 structure(list(n = 1:97, freq = c(0L, 0L, 0L, 0L, 1L, 7L, 40L,
 100L, 276L, 543L, 952L, 1414L, 1853L, 2199L, 2435L, 2270L, 2042L,
 1679L, 1386L, 1108L, 922L, 792L, 642L, 597L, 453L, 424L, 370L,
 297L, 278L, 218L, 208L, 172L, 174L, 149L, 124L, 98L, 98L, 67L,
 78L, 67L, 46L, 34L, 31L, 42L, 34L, 21L, 28L, 18L, 18L, 18L, 10L,
 19L, 6L, 9L, 10L, 6L, 6L, 5L, 3L, 9L, 4L, 3L, 4L, 5L, 2L, 6L,
 4L, 2L, 2L, 3L, 3L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L,
 1L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 1L),
 kum = c(0L, 0L, 0L, 0L, 1L, 8L, 48L, 148L, 424L, 967L, 1919L,
 L, 5186L, 7385L, 9820L, 12090L, 14132L, 15811L, 17197L,
 18305L, 19227L, 20019L, 20661L, 21258L, 21711L, 22135L, 22505L,
 22802L, 23080L, 23298L, 23506L, 23678L, 23852L, 24001L, 

[R] glm poisson and quasipoisson

2013-01-31 Thread Wim Kreinen
Hello,

I have a question about modelling via  glm. I have a dataset (see dput)
that looks like as if it where poisson distributed (actually I would
appreciate that) but it isnt because  mean unequals var.


 mean (x)
[1] 901.7827
 var (x)
[1] 132439.3


Anyway, I tried to model it via poisson and quasipoisson. Actually, just to
get an impression how glm works. But I dont know how to interprete the
data. Of course this is the case because my knowledge concerning logistic
regressions is rather limited. Hoping there is somebody with mercy I would
like to understand which parameters are important, e.g. which paramter
might give me a hint that a poisson model is a bad idea. For hints
concerning some tutorials  about reading glm-output I would appreciate as
well.

Thanks
Wim


 skn300.glmp - glm (freq~n, data=skn300.tab, family=poisson)
 summary (skn300.glmp)

Call:
glm(formula = freq ~ n, family = poisson, data = skn300.tab)

Deviance Residuals:
Min   1Q   Median   3Q  Max
-51.332   -9.383   -6.599   -3.959   55.111

Coefficients:
  Estimate Std. Error z value Pr(|z|)
(Intercept)  7.2374375  0.0093285   775.8   2e-16 ***
n   -0.0539424  0.0003699  -145.8   2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 71731  on 96  degrees of freedom
Residual deviance: 37383  on 95  degrees of freedom
AIC: 37800

Number of Fisher Scoring iterations: 6


 skn300.glmq - glm (freq~n, data=skn300.tab, family=quasipoisson)
 summary (skn300.glmq)

Call:
glm(formula = freq ~ n, family = quasipoisson, data = skn300.tab)

Deviance Residuals:
Min   1Q   Median   3Q  Max
-51.332   -9.383   -6.599   -3.959   55.111

Coefficients:
 Estimate Std. Error t value Pr(|t|)
(Intercept)  7.237438   0.186381  38.831   2e-16 ***
n   -0.053942   0.007391  -7.298  8.8e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 399.1874)

Null deviance: 71731  on 96  degrees of freedom
Residual deviance: 37383  on 95  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 6


  dput (skn300.tab)
structure(list(n = 1:97, freq = c(0L, 0L, 0L, 0L, 1L, 7L, 40L,
100L, 276L, 543L, 952L, 1414L, 1853L, 2199L, 2435L, 2270L, 2042L,
1679L, 1386L, 1108L, 922L, 792L, 642L, 597L, 453L, 424L, 370L,
297L, 278L, 218L, 208L, 172L, 174L, 149L, 124L, 98L, 98L, 67L,
78L, 67L, 46L, 34L, 31L, 42L, 34L, 21L, 28L, 18L, 18L, 18L, 10L,
19L, 6L, 9L, 10L, 6L, 6L, 5L, 3L, 9L, 4L, 3L, 4L, 5L, 2L, 6L,
4L, 2L, 2L, 3L, 3L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 1L),
kum = c(0L, 0L, 0L, 0L, 1L, 8L, 48L, 148L, 424L, 967L, 1919L,
L, 5186L, 7385L, 9820L, 12090L, 14132L, 15811L, 17197L,
18305L, 19227L, 20019L, 20661L, 21258L, 21711L, 22135L, 22505L,
22802L, 23080L, 23298L, 23506L, 23678L, 23852L, 24001L, 24125L,
24223L, 24321L, 24388L, 24466L, 24533L, 24579L, 24613L, 24644L,
24686L, 24720L, 24741L, 24769L, 24787L, 24805L, 24823L, 24833L,
24852L, 24858L, 24867L, 24877L, 24883L, 24889L, 24894L, 24897L,
24906L, 24910L, 24913L, 24917L, 24922L, 24924L, 24930L, 24934L,
24936L, 24938L, 24941L, 24944L, 24944L, 24944L, 24944L, 24944L,
24946L, 24947L, 24947L, 24947L, 24947L, 24947L, 24947L, 24948L,
24948L, 24948L, 24949L, 24951L, 24952L, 24952L, 24952L, 24952L,
24952L, 24954L, 24954L, 24954L, 24954L, 24955L)), .Names = c(n,
freq, kum), row.names = c(NA, -97L), class = data.frame)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Modelling help needed

2013-01-14 Thread Oliver Keyes
So, with R you use object[int,int] to select the rows and columns you want
to highlight. ([rows,columns]); what you've done here is asked it to apply
to rows 1 to 2 (1:2), across all columns. You'll want [,3:4] to specify two
particular columns. I'm not familiar enough with glm itself to provide any
further guidance (I'll leave that to other readers) but when it comes to
automation, you might want to look at the mapply function (type ?mapply) or
the plyr package, particularly ddply (?ddply). With ddply you can (for
example) split by year and week, execute a function within ddply on
specified variables, and then recombine into one dataset consisting of
Year, Week and Relationship.

Assuming it works - like I said, not particularly familiar with glm :)

On Mon, Jan 14, 2013 at 12:08 AM, bhatmb mansi.bhatna...@alumni.utoronto.ca
 wrote:

 Hi Everyone,

 I am new to R and am figuring my way around it. I am trying to determine
 the
 relationship between A B, for each week of the year.

 My dataset looks like:
 YearWeekA  B
 19821   11.3   198.53
 19822   14.4309.00
 19823   23.2325.49

 When i tried to run glm on just the first entry using [1,] i got the error:
 Error in nmod2$R4.2w[1, ] : incorrect number of dimensions

 My edited  code for glm is: abundmod1-glm(R4.2w~N_4,data=nmod1[1:2,])

 Does this code still just use the information from the first week? Also
 since I have to run it for more than 10 years worth of data for 15 weeks,
 is
 there a faster way?

 Thank you so much in advance!

 Mansi



 --
 View this message in context:
 http://r.789695.n4.nabble.com/GLM-Modelling-help-needed-tp4655438.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM Modelling help needed

2013-01-13 Thread bhatmb
Hi Everyone,

I am new to R and am figuring my way around it. I am trying to determine the
relationship between A B, for each week of the year.

My dataset looks like:
YearWeekA  B
19821   11.3   198.53
19822   14.4309.00
19823   23.2325.49

When i tried to run glm on just the first entry using [1,] i got the error:
Error in nmod2$R4.2w[1, ] : incorrect number of dimensions

My edited  code for glm is: abundmod1-glm(R4.2w~N_4,data=nmod1[1:2,]) 

Does this code still just use the information from the first week? Also
since I have to run it for more than 10 years worth of data for 15 weeks, is
there a faster way?

Thank you so much in advance!

Mansi



--
View this message in context: 
http://r.789695.n4.nabble.com/GLM-Modelling-help-needed-tp4655438.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM ERROR

2012-12-22 Thread farnoosh sheikhi


 Hello,

I'm fitting a logistic regression as follows and I have this error In 
predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type ==  :
  prediction from a rank-deficient fit may be misleading

Codes:
##Training 

fit- glm(Y.Binary.training~., family=binomial(link=logit), data=training)

pred-predict(fit, training, type=response)
pred
pred.train-cbind(pred, Y.Binary.training)
preds-prediction(as.numeric(pred),as.numeric(Y.Binary.training))
perf- performance(preds, auc)
perf    I get a Roc Curve of 1 which is surprising!

##Testing
pred-predict(fit, testing, type=response)
pred
pred.test-cbind(pred, Y.Binary.testing)
preds-prediction(as.numeric(pred),as.numeric(Y.Binary.testing))
perf- performance(preds, auc)
perf    I also get a ROC curve of 1...!!!

I'm wondering if there is something wrong in my codes .
Thanks a lot and happy holidays!
 

Best,Farnoosh Sheikhi
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm - predict logistic regression - entering the betas manually.

2012-12-11 Thread Greg Snow
It may be overkill, but you can specify the model pieces using the offset
function in the model, then the predictions work out (at least for my
simple trial case).  Something like:

fit - glm( y ~ 0+offset(-1 + 2*x), family=binomial, data=data.frame(y=0,
x=0) )
predict( fit, newdata=data.frame(x=seq(-1,1, length.out=25)),
type='response' )



On Mon, Dec 10, 2012 at 7:26 PM, Raffaello Vardavas
r_varda...@hotmail.comwrote:


 Dear All,

 I know this may be a trivial question.

 In the past I have used glm to make logistic regressions on data. The
 output creates an object with the results of the logistic regression. This
 object can then be used to make predictions.

 Great.

 I have a different problem. I need to make predictions from a logistic
 regression taken from a paper. Thus I need to (by hand) enter the reported
 odds ratios, compute the betas and enter these into an object in order to
 use the predict.

 Sure, I can write a function myself (the logit function) to make these
 predictions. But I was wondering how one can do this by creating a glm
 output object by entering this  manually and then use the predict.

 I couldn't find any helpful site.

 Thanks.

 RAff.


 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm - predict logistic regression - entering the betas manually.

2012-12-10 Thread Raffaello Vardavas

Dear All,

I know this may be a trivial question.

In the past I have used glm to make logistic regressions on data. The output 
creates an object with the results of the logistic regression. This object can 
then be used to make predictions.

Great.

I have a different problem. I need to make predictions from a logistic 
regression taken from a paper. Thus I need to (by hand) enter the reported odds 
ratios, compute the betas and enter these into an object in order to use the 
predict.

Sure, I can write a function myself (the logit function) to make these 
predictions. But I was wondering how one can do this by creating a glm output 
object by entering this  manually and then use the predict.

I couldn't find any helpful site.

Thanks.

RAff.
 
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Coding Issue

2012-11-29 Thread Craig P O'Connell

Thank you both for the extensive amount of help!  I am sorry it has taken me a 
bit to respond, but i've been trying to plug away at this.  I still have a 
few questions, if you don't mind giving me some pointers: 

Here is the fake data again: 


treatment 

feeding 

avoid 

noavoid (all visits which didn’t result in avoidance) 


Control 

nofeeding 

1 

357 


Control 

chum 

2 

292 


Control 

Satiation 

4 

186 


Proc. Control 

nofeeding 

15 

291 


Proc. Control 

chum 

25 

288 


Proc. Control 

Satiation 

17 

140 


Magnet 

nofeeding 

87 

224 


Magnet 

Chum 

34 

229 


Magnet 

Satiation 

46 

151 


Here is the coding: 

model1 - glm(cbind(avoid, noavoid) ~ treatment, family=binomial,  data=avoid) 
summary(model1) 

glm(formula = cbind(avoid, noavoid )~treatment , family = binomial, 
data = avoid) 


 #Predicting avoid, no avoid from just treatment 


model2 - glm(cbind(avoid, noavoid) ~ feeding, family=binomial, data=avoid) 
summary(model2) 




  glm(formula = cbind(avoid, noavoid )~feeding , family = binomial, 
data = avoid) 

#Predicting avoid, no avoid from just feeding 




model3-glm(cbind(avoid,noavoid)~treatment*feeding,family=binomial,data=avoid) 

summary(model3) 

glm(formula = cbind(avoid, noavoid )~treatment*feeding , family = binomial, 
data = avoid) 
#Predicting avoid,noavoid from the interaction between treatment and feeding 

However, when I run all the models, I notice that my control data is not 
incorporated in the output.  I just receive my procedural control and magnet 
data.  I cannot figure what is causing this. 

Secondly, when I run model 3 I am not receiving what I anticipated.  I thought 
I would get the following output data: 

treatmentmag                  
treatmentproc                 
feedingnofeed                    
feedingsat                        
treatmentmag:feedingnofeed        
treatmentproc:feedingnofeed       
treatmentmag:feedingsat          
treatmentproc:feedingsat 

What happened to the following:  treatmentcon, feedingchum, 
treatmentcon:feedingnofeed, treatmentcon:feedingsat, treatmentmag:feedingchum, 
treatmentproc:feedingchum, treatmentcon:feedingchum ?   I have a strange 
feeling that the issue is arising from my family=binomial term.  Any 
feedback would be greatly appreciated. 

Kind Regards, 

Craig 


- Original Message -

From: Steve Lianoglou mailinglist.honey...@gmail.com 
To: David Winsemius dwinsem...@comcast.net 
Cc: Craig P O'Connell coconne...@umassd.edu, r-help@r-project.org 
Sent: Tuesday, November 27, 2012 11:54:48 PM 
Subject: Re: [R] GLM Coding Issue 

Hi, 

On Tuesday, November 27, 2012, David Winsemius wrote: 
[snip] 



blockquote
`cbind`-ing doesn't make much sense here. What is your target (y) 
variable here? are you trying to predict `avoid` or `noavoid` status? 



Sorry, Steve. It does make sense. See : 

?glm  # First paragraph of Details. 
/blockquote



Indeed ... I've tried to,send a follow up email salvaging my bad call with some 
hopefully useful tidbits, but it matched some headers and is stuck in the 
mailman queue. It might come through eventually. 


Don't be sorry, though ... I learned something new :-) 


Still, I do apologize for the flawed advice re: the cbind-ing thing 


-Steve 



-- 
Steve Lianoglou 
Graduate Student: Computational Systems Biology 
 | Memorial Sloan-Kettering Cancer Center 
 | Weill Medical College of Cornell University 
Contact Info: http://cbio.mskcc.org/~lianos/contact 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm convergence warning

2012-11-27 Thread Jean V Adams
It could be that for some levels of your independent factor variables (WS, 
SS), the response is either all zeroes or all ones.  Or, for your 
continuous independent variables (DV, DS), there is a clean break between 
the zeroes and ones.  For example, if all the CIDs are one when DS = 18 
but all of the CIDs are zero when DS =20, then there is no single best 
fit for a logistic model to that relation, the curve could be straight up 
steep, or gradual and shallow.

Do you get convergence if you fit these subset models?

glm(CID ~ WS, data=kimu, family=binomial)
glm(CID ~ SS, data=kimu, family=binomial)
glm(CID ~ DV, data=kimu, family=binomial)
glm(CID ~ DS, data=kimu, family=binomial)

Can you see the problem when you plot the data?
attach(kimu)
plot(WS, CID)
plot(SS, CID)
plot(DV, CID)
plot(DS, CID)

Jean



Anne Schaefer annelsch...@gmail.com wrote on 11/26/2012 08:46:26 PM:
 
 Hello,
 
 When I run the following glm model:
 
 modelresult=glm(CID~WS+SS+DV+DS, data=kimu, family=binomial)
 
 I get the following warning messages:
 
 1: glm.fit: algorithm did not converge
 2: glm.fit: fitted probabilities numerically 0 or 1 occurred
 
 What I am trying to do is model my response variable (CID: correct bird
 identification) as a function of the predictor variables weather state
 (WS), sea state (SS), distance from the vessel (DV) and duration of the
 sighting (DS). I defined both sea state and weather state as factors 
with
 three levels (0, 1, or 2). Distance of the vessel values are 100, 80, 
60,
 40, and 20. Duration of the sighting ranges from 0 to 58 seconds.
 
 The output R is giving me is:
 
 Deviance Residuals:
Min  1Q  Median  3Q Max
 -3.562e-05  -2.100e-08   2.100e-08   2.100e-08   3.632e-05
 
 Coefficients:
   Estimate Std. Error z value Pr(|z|)
 (Intercept) -2.000e+02  1.067e+06   0.0001.000
 WSf1 7.744e+01  9.086e+04   0.0010.999
 WSf2 1.285e+01  6.199e+04   0.0001.000
 SSf1-1.042e+02  1.683e+05  -0.0011.000
 SSf2-1.859e+02  1.432e+05  -0.0010.999
 DV   6.770e-01  9.394e+03   0.0001.000
 DS   9.822e+00  1.884e+04   0.0011.000
 
 
 What do the warning messages mean? Can I still use coefficient estimates
 and standard error values?
 
 Thank you!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM Coding Issue

2012-11-27 Thread Craig P O'Connell


Dear all, 

   I am having a recurring problem when I attempt to conduct a GLM.  Here is 
what I am attempting (with fake data): 
First, I created a txt file, changed the directory in R (to the proper folder 
containing the file) and loaded the file: 

#avoid-read.table(avoid.txt,header=TRUE);avoid 
#  treatment feeding avoid noavoid 
#1   control  nofeed     1     357 
#2   control    feed     2     292 
#3   control     sat     4     186 
#4      proc  nofeed    15     291 
#5      proc    feed    25     288 
#6      proc     sat    17     140 
#7       mag  nofeed    87     224 
#8       mag    feed    34     229 
#9       mag     sat    46     151 

I then try to attach(avoid) the data, but continue to get an error message ( 
The following object(s) are masked _by_ .GlobalEnv :), so to fix this, I do the 
following: 

#newavoid-avoid 
#newavoid                (does this do anything?) 

Lastly, I have several GLM's I wanted to conduct.  Please see the following: 

#model1-glm(cbind(avoid, noavoid)~treatment,data=,family=binomial) 

#model2=glm(cbind(avoid, noavoid)~feeding, familiy=binomial) 

#model3=glm(cbind(avoid, noavoid)~treatment+feeding, familiy=binomial) 

After running model1, I receive the error message Error in 
model.frame.default(formula = cbind(avoid, noavoid) ~ treatment,  : 
  invalid type (list) for variable 'cbind(avoid, noavoid)'.  

It would be greatly appreciated if somebody can help me with my coding, as you 
can see I am a novice but doing my best to learn.  I figured if I can get 
model1 to run, I should be able to figure out the rest of my models.  

Kind Regards! 



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Coding Issue

2012-11-27 Thread Steve Lianoglou
Hi,

Comments inline:

On Tue, Nov 27, 2012 at 1:00 PM, Craig P O'Connell
coconne...@umassd.edu wrote:


 Dear all,

I am having a recurring problem when I attempt to conduct a GLM.  Here is 
 what I am attempting (with fake data):
 First, I created a txt file, changed the directory in R (to the proper folder 
 containing the file) and loaded the file:

 #avoid-read.table(avoid.txt,header=TRUE);avoid
 #  treatment feeding avoid noavoid
 #1   control  nofeed 1 357
 #2   controlfeed 2 292
 #3   control sat 4 186
 #4  proc  nofeed15 291
 #5  procfeed25 288
 #6  proc sat17 140
 #7   mag  nofeed87 224
 #8   magfeed34 229
 #9   mag sat46 151

 I then try to attach(avoid) the data, but continue to get an error message 
 ( The following object(s) are masked _by_ .GlobalEnv :), so to fix this, I do 
 the following:

 #newavoid-avoid
 #newavoid(does this do anything?)

It essentially makes a copy of `avoid` to `newavoid` -- what did you
want it to do?

That having been said, a good rule of thumb is to never use `attach`,
so let's avoid it for now.

 Lastly, I have several GLM's I wanted to conduct.  Please see the following:

 #model1-glm(cbind(avoid, noavoid)~treatment,data=,family=binomial)

 #model2=glm(cbind(avoid, noavoid)~feeding, familiy=binomial)

 #model3=glm(cbind(avoid, noavoid)~treatment+feeding, familiy=binomial)

`cbind`-ing doesn't make much sense here. What is your target (y)
variable here? are you trying to predict `avoid` or `noavoid` status?

Let's assume you were predicting `noavoid` from just `treatment` and
`feeding` (I guess you have more data (rows) than you show), you would
build a model like so:

R model - glm(noavoid ~ treatment + feeding, binomial, avoid)

Or to be explicit about the parameters:

R model - glm(noavoid ~ treatment + feeding, family=binomial, data=avoid)


 It would be greatly appreciated if somebody can help me with my coding, as 
 you can see I am a novice but doing my best to learn.  I figured if I can get 
 model1 to run, I should be able to figure out the rest of my models.

Since you're just getting started, maybe it would be helpful for
people writing documentation/tutorials/whatever what needs to be
explained better.

For instance, I'm curious why you thought to `cbind` in your first glm
call, which was:

model1-glm(cbind(avoid, noavoid)~treatment,data=,family=binomial)

What did you think `cbind`-ing was accomplishing for you? Is there an
example somewhere that's doing that as the first parameter to a `glm`
call?

Also, why just have `data=nothing`?

I'm not criticizing, just trying to better understand.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Coding Issue

2012-11-27 Thread Peter Ehlers


On 2012-11-27 14:34, Steve Lianoglou wrote:

Hi,

Comments inline:

On Tue, Nov 27, 2012 at 1:00 PM, Craig P O'Connell
coconne...@umassd.edu wrote:



Dear all,

I am having a recurring problem when I attempt to conduct a GLM.  Here is 
what I am attempting (with fake data):
First, I created a txt file, changed the directory in R (to the proper folder 
containing the file) and loaded the file:

#avoid-read.table(avoid.txt,header=TRUE);avoid
#  treatment feeding avoid noavoid
#1   control  nofeed 1 357
#2   controlfeed 2 292
#3   control sat 4 186
#4  proc  nofeed15 291
#5  procfeed25 288
#6  proc sat17 140
#7   mag  nofeed87 224
#8   magfeed34 229
#9   mag sat46 151

I then try to attach(avoid) the data, but continue to get an error message ( 
The following object(s) are masked _by_ .GlobalEnv :), so to fix this, I do the following:

#newavoid-avoid
#newavoid(does this do anything?)


It essentially makes a copy of `avoid` to `newavoid` -- what did you
want it to do?

That having been said, a good rule of thumb is to never use `attach`,
so let's avoid it for now.


Lastly, I have several GLM's I wanted to conduct.  Please see the following:

#model1-glm(cbind(avoid, noavoid)~treatment,data=,family=binomial)

#model2=glm(cbind(avoid, noavoid)~feeding, familiy=binomial)

#model3=glm(cbind(avoid, noavoid)~treatment+feeding, familiy=binomial)


`cbind`-ing doesn't make much sense here. What is your target (y)
variable here? are you trying to predict `avoid` or `noavoid` status?

Let's assume you were predicting `noavoid` from just `treatment` and
`feeding` (I guess you have more data (rows) than you show), you would
build a model like so:

R model - glm(noavoid ~ treatment + feeding, binomial, avoid)

Or to be explicit about the parameters:

R model - glm(noavoid ~ treatment + feeding, family=binomial, data=avoid)



It would be greatly appreciated if somebody can help me with my coding, as you 
can see I am a novice but doing my best to learn.  I figured if I can get 
model1 to run, I should be able to figure out the rest of my models.


Since you're just getting started, maybe it would be helpful for
people writing documentation/tutorials/whatever what needs to be
explained better.

For instance, I'm curious why you thought to `cbind` in your first glm
call, which was:

model1-glm(cbind(avoid, noavoid)~treatment,data=,family=binomial)

What did you think `cbind`-ing was accomplishing for you? Is there an
example somewhere that's doing that as the first parameter to a `glm`
call?



Steve:
re a matrix response: see MASS (the book, 4ed) page 191; also found
in the ch07.R file in the /library/MASS/scripts folder. I seem to
recall that this is mentioned somewhere in the docs, but put my finger
on it now.

One additional comment about the analysis: overdispersion might be
a problem.

Peter Ehlers

[...]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Coding Issue

2012-11-27 Thread David Winsemius


On Nov 27, 2012, at 3:34 PM, Steve Lianoglou wrote:


Hi,

Comments inline:

On Tue, Nov 27, 2012 at 1:00 PM, Craig P O'Connell
coconne...@umassd.edu wrote:



Dear all,

  I am having a recurring problem when I attempt to conduct a GLM.   
Here is what I am attempting (with fake data):
First, I created a txt file, changed the directory in R (to the  
proper folder containing the file) and loaded the file:


#avoid-read.table(avoid.txt,header=TRUE);avoid
#  treatment feeding avoid noavoid
#1   control  nofeed 1 357
#2   controlfeed 2 292
#3   control sat 4 186
#4  proc  nofeed15 291
#5  procfeed25 288
#6  proc sat17 140
#7   mag  nofeed87 224
#8   magfeed34 229
#9   mag sat46 151

I then try to attach(avoid) the data, but continue to get an  
error message ( The following object(s) are masked  
_by_ .GlobalEnv :), so to fix this, I do the following:


That was not an error message but rather a warning message. (You are  
asked to copy the full text.)




#newavoid-avoid
#newavoid(does this do anything?)


It essentially makes a copy of `avoid` to `newavoid` -- what did you
want it to do?

That having been said, a good rule of thumb is to never use `attach`,
so let's avoid it for now.

Lastly, I have several GLM's I wanted to conduct.  Please see the  
following:


#model1-glm(cbind(avoid, noavoid)~treatment,data=,family=binomial)

#model2=glm(cbind(avoid, noavoid)~feeding, familiy=binomial)

#model3=glm(cbind(avoid, noavoid)~treatment+feeding,  
familiy=binomial)


`cbind`-ing doesn't make much sense here. What is your target (y)
variable here? are you trying to predict `avoid` or `noavoid` status?


Sorry, Steve. It does make sense. See :

?glm  # First paragraph of Details.




Let's assume you were predicting `noavoid` from just `treatment` and
`feeding` (I guess you have more data (rows) than you show), you would
build a model like so:

R model - glm(noavoid ~ treatment + feeding, binomial, avoid)

Or to be explicit about the parameters:

R model - glm(noavoid ~ treatment + feeding, family=binomial,  
data=avoid)



It would be greatly appreciated if somebody can help me with my  
coding, as you can see I am a novice but doing my best to learn.  I  
figured if I can get model1 to run, I should be able to figure out  
the rest of my models.


Since you're just getting started, maybe it would be helpful for
people writing documentation/tutorials/whatever what needs to be
explained better.

For instance, I'm curious why you thought to `cbind` in your first glm
call, which was:

model1-glm(cbind(avoid, noavoid)~treatment,data=,family=binomial)

What did you think `cbind`-ing was accomplishing for you? Is there an
example somewhere that's doing that as the first parameter to a `glm`
call?

Also, why just have `data=nothing`?

I'm not criticizing, just trying to better understand.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Coding Issue

2012-11-27 Thread Steve Lianoglou
Hi,

On Tuesday, November 27, 2012, David Winsemius wrote:
[snip]

 `cbind`-ing doesn't make much sense here. What is your target (y)
 variable here? are you trying to predict `avoid` or `noavoid` status?


 Sorry, Steve. It does make sense. See :

 ?glm  # First paragraph of Details.


Indeed ... I've tried to,send a follow up email salvaging my bad call with
some hopefully useful tidbits, but it matched some headers and is stuck
in the mailman queue. It might come through eventually.

Don't be sorry, though ... I learned something new :-)

Still, I do apologize for the flawed advice re: the cbind-ing thing

-Steve



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM Coding Issue

2012-11-27 Thread Steve Lianoglou
Hi Peter,

On Tue, Nov 27, 2012 at 8:05 PM, Peter Ehlers ehl...@ucalgary.ca wrote:

 On 2012-11-27 14:34, Steve Lianoglou wrote:
[snip]
 Steve:
 re a matrix response: see MASS (the book, 4ed) page 191; also found
 in the ch07.R file in the /library/MASS/scripts folder. I seem to
 recall that this is mentioned somewhere in the docs, but put my finger
 on it now.

Indeed -- thanks for the pointer.

Well then ... let me try to recover and at least offer some decent
advice to Craig's original post.

I'd still recommend avoiding the use of `attach` and favor passing in
your data data.fame using the `data` param of the call to `glm`.

I'd also call the `avoid` data.frame something else, to avoid
potential confusion (if only in your mind) with the column `avoid`
inside the `avoid` data.frame.

Anyhow, when you pass in your data as param, things work ok:

R model1 - glm(cbind(avoid, noavoid) ~ treatment, binomial, avoid)
R summary(model1)

Call:
glm(formula = cbind(avoid, noavoid) ~ treatment, family = binomial,
data = avoid)

Deviance Residuals:
Min   1Q   Median   3Q  Max
-3.6480  -1.3350   0.4297   1.5706   2.6200

...

OK ... well, hope that helped somewhat.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm convergence warning

2012-11-26 Thread Anne Schaefer
Hello,

When I run the following glm model:

modelresult=glm(CID~WS+SS+DV+DS, data=kimu, family=binomial)

I get the following warning messages:

1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred

What I am trying to do is model my response variable (CID: correct bird
identification) as a function of the predictor variables weather state
(WS), sea state (SS), distance from the vessel (DV) and duration of the
sighting (DS). I defined both sea state and weather state as factors with
three levels (0, 1, or 2). Distance of the vessel values are 100, 80, 60,
40, and 20. Duration of the sighting ranges from 0 to 58 seconds.

The output R is giving me is:

Deviance Residuals:
   Min  1Q  Median  3Q Max
-3.562e-05  -2.100e-08   2.100e-08   2.100e-08   3.632e-05

Coefficients:
  Estimate Std. Error z value Pr(|z|)
(Intercept) -2.000e+02  1.067e+06   0.0001.000
WSf1 7.744e+01  9.086e+04   0.0010.999
WSf2 1.285e+01  6.199e+04   0.0001.000
SSf1-1.042e+02  1.683e+05  -0.0011.000
SSf2-1.859e+02  1.432e+05  -0.0010.999
DV   6.770e-01  9.394e+03   0.0001.000
DS   9.822e+00  1.884e+04   0.0011.000


What do the warning messages mean? Can I still use coefficient estimates
and standard error values?

Thank you!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm fitting routine and convergence

2012-11-06 Thread Christopher A. Simon
What kind of special magic does glm have?

I'm working on a logistic regression solver (L-BFGS) in c and I've been
using glm to check my results.  I came across a data set that has a very
high condition number (the data matrix transpose the data matrix) that when
running my solver does not converge, but the same data set with glm was
converging ( I love R :) ).  I noticed that glm using IWLS to solve the MLE
problem I also noticed that the results from glm suggest that glm checks
for complete separation for variables. Besides this check for variable
separation is glm doing anything else besides a straight implementation to
IWLS that would allow it to converge for a near ill-posed data set? Is it
re-starting in some intelligent way?

My apologies if this is not the right place to post this message (wasn't
sure if I should post here or in r-dev).

Thanks,

-Chris

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm fitting routine and convergence

2012-11-06 Thread David Winsemius

On Nov 6, 2012, at 1:44 PM, Christopher A. Simon wrote:

 What kind of special magic does glm have?
 
 I'm working on a logistic regression solver (L-BFGS) in c and I've been
 using glm to check my results.  I came across a data set that has a very
 high condition number (the data matrix transpose the data matrix) that when
 running my solver does not converge, but the same data set with glm was
 converging ( I love R :) ).  I noticed that glm using IWLS to solve the MLE
 problem I also noticed that the results from glm suggest that glm checks
 for complete separation for variables. Besides this check for variable
 separation is glm doing anything else besides a straight implementation to
 IWLS that would allow it to converge for a near ill-posed data set?

I do not think that is a sufficiently precise description to support an up/down 
vote.

 Is it
 re-starting in some intelligent way?
 
 My apologies if this is not the right place to post this message (wasn't
 sure if I should post here or in r-dev).

You should ask how you are handling your matrix operations. Look at the code, 
especially glm.fit. I think you will find that key function call is 

fit - .Call(C_Cdqrls, x[good, , drop = FALSE] * 
w, z * w, min(1e-07, control$epsilon/1000))

-- 

David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM and WLS

2012-11-05 Thread frespider
Hi 

I am running some simulation in  R after resampling from a huge data set 500
columns and 86759 rows and fit these two model and calculate R-square 
fit - lm(Y~X,weights = Weights)
bhat - coef(fit)
Yhat - X%*%bhat 
SST - sum((Y - mean(Y))^2)  
SSE - sum((Y - Yhat)^2)   
MSE - SSE/(n-pc)  
aRsqr - 1-(((n-1)*SSE)/((n-pc)*SST)) 

and the GLM Model

fitGLM - glm(Y~X,family=Gamma(link = log))

Yhat -exp( X%*%bhat )
SST - sum((Y - mean(Y))^2)  
SSE - sum((Y - Yhat)^2)   
MSE - SSE/(n-pc)  
aRsqr - 1-(((n-1)*SSE)/((n-pc)*SST)) 

I am getting a negative R-square which I don't understand why? Can some one
let me know why and if I can investigate that how I do that.

Thanks



--
View this message in context: 
http://r.789695.n4.nabble.com/GLM-and-WLS-tp4648531.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   3   4   >