On 5/27/07, Robert A. LaBudde <[EMAIL PROTECTED]> wrote:
> As I was working through elementary examples, I was using dataset
> "plasma" of package "HSAUR".
>
> In performing a logistic regression of the data, and making the
> diagnostic plots (R-2.5.0)
>
> data(plasma,package='HSAUR')
> plasma_1<- glm(ESR ~ fibrinogen * globulin, data=plasma, family=binomial())
> layout(matrix(1:4,nrow=2))
> plot(plasma_1)
>
> I find that data points corresponding to rownames 17 and 23 are
> outliers and high leverage.
>
> I would then like to perform a fit without these two rows.
>
> In principle this should be easy, using an update() with subset=-c(17,23).
>
> The problem is that the rownames in this dataset are not ordered,
> and, in fact, the relevant rows are 30 and 31, not 17 and 23.
>
> This brings up the following (elementary?) questions:
>
> 1. How do you reference rows in "subset=" for which you know the
> rownames, but not the row numbers?

Use a logical vector. This one is TRUE for the two rows in question, so
negate it with ! to drop them in subset=:

rownames(plasma) %in% c(17, 23)

>
> 2. How do you discover the rows corresponding to particular
> rownames? (Using plasma[rownames(plasma)==17,] shows the data, but
> NOT the row number!) (Probably the same answer as in Q. 1 above.)

which(rownames(plasma) %in% c(17, 23))  # 30, 31

>
> 3. How do you sort (order) the rows of an existing data frame so that
> the rownames are in order?

Row names are stored as character strings, so convert them to numbers
before ordering:

plasma[order(as.numeric(rownames(plasma))), ]
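
For anyone who wants to try the three answers without installing HSAUR, here is a small sketch on a made-up data frame. The names toy, drop_names, keep, pos and toy_sorted are hypothetical, and lm() is used only to keep the example small; glm() takes the same subset= argument.

```r
# Toy stand-in for plasma (not the real data): the row names are
# deliberately out of order, as they are in the original question.
toy <- data.frame(x = 1:6,
                  y = c(0, 1, 0, 1, 1, 0),
                  row.names = c("5", "17", "2", "23", "9", "1"))

drop_names <- c("17", "23")

# Q1: logical vector for subset=, TRUE for the rows to keep
keep <- !(rownames(toy) %in% drop_names)

# Q2: row positions of the named rows
pos <- which(rownames(toy) %in% drop_names)

# Q3: reorder the rows so the row names are in numeric order
toy_sorted <- toy[order(as.numeric(rownames(toy))), ]

# Refit without the two rows, as update() would be used on plasma_1
fit_all <- lm(y ~ x, data = toy)
fit_sub <- update(fit_all, subset = keep)
```

Here pos holds the positions (2 and 4), not the row names, and fit_sub is fitted on the remaining four rows.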