Marc Schwartz wrote:
> on 08/26/2008 07:31 PM (Ted Harding) wrote:
>> On 26-Aug-08 23:49:37, hadley wickham wrote:
>>> On Tue, Aug 26, 2008 at 6:45 PM, Ted Harding
>>> <[EMAIL PROTECTED]> wrote:
>>>> Hi Folks,
>>>> This tip is probably lurking somewhere already, but I've just
>>>> discovered it the hard way, so it is probably worth passing
>>>> on for the benefit of those who might otherwise hack their
>>>> way along the same path.
>>>>
>>>> Say (for example) you want to do a logistic regression of a
>>>> binary response Y on variables X1, X2, X3, X4:
>>>>
>>>>   GLM <- glm(Y ~ X1 + X2 + X3 + X4, family = binomial)
>>>>
>>>> Say there are 1000 cases in the data. Because of missing values
>>>> (NAs) in the variables, the number of complete cases retained
>>>> for the regression is, say, 600. glm() does this automatically.
>>>>
>>>> QUESTION: Which cases are they?
>>>>
>>>> You can of course find out "by hand" on the lines of
>>>>
>>>>   ix <- which( (!is.na(Y))&(!is.na(X1))&...&(!is.na(X4)) )
>>>>
>>>> but one feels that GLM already knows -- so how to get it to talk?
>>>>
>>>> ANSWER: (e.g.)
>>>>
>>>>   ix <- as.integer(names(GLM$fit))
>>>
>>> Alternatively, you can use:
>>>
>>>   attr(GLM$model, "na.action")
>>>
>>> Hadley
>>
>> Thanks! I can see that it works -- though understanding how
>> requires a deeper knowledge of "R internals". However, since
>> you've approached it from that direction, simply
>>
>>   GLM$model
>>
>> is a dataframe of the retained cases (with corresponding
>> row-names), all variables at once, and that is possibly an
>> even simpler approach!
>
> Or just use:
>
>   model.frame(ModelObject)
>
> as the extractor function... :-)
>
> Another 'a priori' approach would be to use na.omit() or one of its
> brethren on the data frame before creating the model. Which function is
> used depends upon how 'na.action' is set.
>
> The returned value, or more specifically the 'na.action' attribute as
> appropriate, would yield information similar to Hadley's approach
> relative to which records were excluded.
>
> For example, using the simple data frame in ?na.omit:
>
> DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))
>
> > DF
>   x  y
> 1 1  0
> 2 2 10
> 3 3 NA
>
> DF.na <- na.omit(DF)
>
> > DF.na
>   x  y
> 1 1  0
> 2 2 10
>
> > attr(DF.na, "na.action")
> 3
> 3
> attr(,"class")
> [1] "omit"
>
> So you can see that record 3 was removed from the original data frame
> due to the NA for 'y'.

Also notice the possibility of
(g)lm(....., na.action=na.exclude)

as in

library(ISwR); attach(thuesen)
fit <- lm(short.velocity ~ blood.glucose, na.action=na.exclude)
which(is.na(fitted(fit))) # 16

This is often recommendable anyway, e.g. in case you want to plot
residuals against original predictors.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])              FAX: (+45) 35327907
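
For anyone wanting to check the suggestions in this thread against each other, here is a minimal sketch on simulated data. It assumes the default setting options(na.action = "na.omit"); the data frame `d`, its values, and the positions of the NAs are made up purely for illustration and are not from the original posts.

```r
# Hypothetical data: 30 cases, with NAs planted in rows 3, 7 and 12
set.seed(1)
d <- data.frame(Y  = rbinom(30, 1, 0.5),
                X1 = rnorm(30),
                X2 = rnorm(30))
d$Y[3]         <- NA
d$X1[c(7, 12)] <- NA

fit <- glm(Y ~ X1 + X2, family = binomial, data = d)

# Retained cases, three ways that should agree
kept1 <- as.integer(names(fitted(fit)))            # from the fitted values
kept2 <- as.integer(rownames(model.frame(fit)))    # from the model frame
kept3 <- which(complete.cases(d[, c("Y", "X1", "X2")]))  # "by hand"

# Dropped cases, from the "omit" object stored with the fit
dropped <- as.integer(na.action(fit))

# With na.exclude, fitted() is padded back to full length with NAs,
# so the dropped cases show up as the NA positions
fit2 <- glm(Y ~ X1 + X2, family = binomial, data = d,
            na.action = na.exclude)
dropped2 <- unname(which(is.na(fitted(fit2))))

dropped    # cases 3, 7 and 12
```

The na.exclude variant keeps residuals(fit2) and fitted(fit2) aligned with the rows of `d`, which is what makes plotting residuals against the original predictors straightforward.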