Hi Holger,

On Sep 14, 2009, at 10:57 AM, Hollix wrote:


Hi folks,

I created a subset of a dataframe (i.e., selected only men):

subdata <- subset(data,data$gender==1)

After a residual diagnostic of a regression analysis, I detected three
outliers:

linmod <- lm(y ~ x, data=subdata)
plot(linmod)

Say, the cases 11,22, and 33 were outliers.

Here comes the problem: When I want to exclude these three cases in a
further regression analysis,
- for instance with linmod2 <- lm(y[-c(11,22,33)] ~ x[-c(11,22,33)],
data=subdata) - it does not work.

I suspect that your x matrix is probably a 2d matrix, so you might need to do:

R> lm(y[-c(11,22,33)] ~ x[-c(11,22,33),])

Note the trailing comma after the -c() vector when indexing into x!
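To make the effect of that comma concrete, here is a small sketch on made-up data. The 40 x 2 predictor matrix, the response, and the name `drop` are assumptions for illustration only, not your actual data.

```r
# Toy data (hypothetical): 40 observations, 2 predictors stored in a matrix
set.seed(1)
x <- matrix(rnorm(80), ncol = 2)
y <- rnorm(40)
drop <- c(11, 22, 33)

# With the trailing comma, whole rows are dropped and x stays a matrix
x_rows <- x[-drop, ]
dim(x_rows)

# Without the comma, the matrix is indexed as a plain vector, so only
# three single elements are removed and the matrix shape is lost
x_vec <- x[-drop]
length(x_vec)

# Response and predictors now have the same number of rows
fit <- lm(y[-drop] ~ x[-drop, ])
```

If x is an ordinary vector (for example a single data.frame column), the trailing comma is not needed and would in fact give an error.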

Perhaps you can just remove those rows from your data and keep your formula "clean", like so?

R> linmod2 <- lm(y ~ x, data=subdata[-c(11,22,33),])
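A self-contained sketch of that approach, using simulated data in place of yours (the column values and the 40/40 gender split are assumptions), might look like this. Note that the negative integers are positions within subdata.

```r
# Simulated stand-in for the original data (hypothetical values)
set.seed(42)
data <- data.frame(gender = rep(c(1, 2), each = 40), x = rnorm(80))
data$y <- 2 * data$x + rnorm(80)

# Keep only the men, then fit the model on the subset
subdata <- subset(data, gender == 1)
linmod  <- lm(y ~ x, data = subdata)

# Refit without the rows at positions 11, 22 and 33 of subdata;
# the formula itself stays unchanged
linmod2 <- lm(y ~ x, data = subdata[-c(11, 22, 33), ])

nobs(linmod)   # number of rows used in the first fit
nobs(linmod2)  # three fewer
```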

I guess this has something to do with this strange "row.names" vector which has been added to the dataframe when creating the subset. I find it very strange why R gives the case numbers in the diagnostics but then doesn't allow me to use these numbers for further exclusion.

Hmm ... not sure what you mean, but this won't get in your way if you are using integers to index into your data.frame.

Can anybody tell me:
1. what this row.names vector is
2. How I can refer to cases after creating a subset (e.g., in order to
exclude them).

Refer to them by their position in the data.frame as you would if you didn't create a subset.
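One caveat worth checking: as far as I know, the numbers printed next to points by plot(linmod) are row names, and subset() keeps the row names of the original data.frame. After subsetting, the label "11" therefore need not sit at position 11. The sketch below uses made-up data (the gender pattern and values are assumptions) to show the difference and two ways of handling it.

```r
# Made-up data: every fourth row is a woman, so men keep non-contiguous row names
set.seed(1)
data <- data.frame(gender = ifelse(seq_len(60) %% 4 == 0, 2, 1), x = rnorm(60))
data$y <- 1 + 2 * data$x + rnorm(60)
subdata <- subset(data, gender == 1)

# Row names are labels carried over from 'data', not positions in 'subdata'
outliers <- c("11", "22", "33")
pos <- match(outliers, rownames(subdata))
pos   # positions of those labels inside subdata

# Option 1: drop the cases by label rather than by position
clean   <- subdata[!(rownames(subdata) %in% outliers), ]
linmod2 <- lm(y ~ x, data = clean)

# Option 2: reset the row names right after subsetting, so that labels
# shown in later diagnostics agree with positions again
rownames(subdata) <- NULL
```

With option 2 you would refit and re-run the diagnostics after the reset, since the case numbers from the earlier plot still refer to the old labels.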

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
