[R] Problem with factor state when subset()ing a data.frame

Roger Leigh Thu, 08 Feb 2007 14:29:16 -0800

Hi folks,

I am running into a problem when calling subset() on a large
data.frame.  One of the columns contains strings which are used as
factors.  R seems to automatically factor the column when the
data.frame is constructed, and this does not appear to get updated when I
create a subset of the table.


A minimal testcase to demonstrate the problem follows:


sample <- data.frame(c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
                     c(5,3,5,3,6,7,8,3,2,6))
names(sample) <- c("ID", "Value")

print(sample)

sample.filtered <- subset(sample, ID != "B", select=c(ID, Value))
# Or sample.filtered <- subset(sample, ID != "B", select=c(ID, Value), drop=T)

print(sample.filtered)

plot(sample.filtered)
plot(sample.filtered$Value ~ sample.filtered$ID)

print(levels(sample.filtered$ID))
print(levels(factor(sample.filtered$ID)))

plot(sample.filtered$Value ~ factor(sample.filtered$ID))


Am I doing something wrong here, or is this an R bug?  How can I get
the new data.frame to update its factor levels, so that the plot no
longer shows the empty "phantom" levels?  (I also need to remove the
unused levels for other analyses, and re-factoring each column "by
hand" seems a little redundant.)
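
For reference, here is a minimal sketch of two common ways to discard
unused levels after subsetting. It is an illustration, not part of the
original test case. Assumptions: the variable names `refactored` and
`dropped` are made up for this example; `stringsAsFactors = TRUE` is
spelled out because R >= 4.0.0 no longer converts strings to factors by
default; and `droplevels()` requires R 2.12.0 or later.

```r
# Rebuild the test case with the ID column explicitly stored as a factor.
sample <- data.frame(ID = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
                     Value = c(5, 3, 5, 3, 6, 7, 8, 3, 2, 6),
                     stringsAsFactors = TRUE)

sample.filtered <- subset(sample, ID != "B")

# subset() keeps the original level set, so "B" is still listed here
# even though no remaining row uses it.
print(levels(sample.filtered$ID))

# Option 1: re-factor a single column, which recomputes its levels
# from the values actually present.
refactored <- sample.filtered
refactored$ID <- factor(refactored$ID)
print(levels(refactored$ID))

# Option 2: droplevels() removes unused levels from every factor
# column of the data.frame in one call.
dropped <- droplevels(sample.filtered)
print(levels(dropped$ID))
```

Option 1 is what the last plot() call in the test case does inline;
option 2 may be more convenient when several columns are factors.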


Kind regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
