[R] Selecting rows from a DF where the value in a selected column matches any element of a vector.

Andrew Hoerner Sat, 12 Apr 2014 05:38:27 -0700

Dear Folks--
I have a file with 3 million-odd rows of data from the 2007 U.S. Economic
Census. I am trying to pare it down to a subset of rows that both (1) has
any one of a vector of NAICS economic sector codes, and (2) also has any
one of a vector of geographic ID codes.


Here is the code I am trying to use.

ECwork  <-  EC07_A1[ any(EC07_A1$GEO_ID == c("01000US", "04000US06", "33000US488",
"31000US41860", "31400US4186036084", "05000US06001", "E6000US0600153000")) &
      any(EC07_A1$SECTOR == c("32", "33", "42", "44", "45", "51", "54", "61", "71",
"81")), ]

I get back the following warning:

Warning message:
In EC07_A1$SECTOR == c("32", "33", "42", "44", "45", "51", "54",  :
  longer object length is not a multiple of shorter object length

I see what R is doing.  Instead of comparing each element of the column
SECTOR to the row vector of codes, and returning a logical vector of the
length of SECTOR with rows marked as TRUE that match any of the codes, it
is lining my code list up with SECTOR as a column vector and doing
element-by-element testing, and then recycling the code list over three
million rows. But I am not sure how to make it do what I want -- test the
sector code in each row against the vector of codes I am looking for. I
would be grateful if anyone could suggest an alternative that would achieve
my ends.
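
To make the behaviour described above concrete, here is a small sketch on
made-up toy vectors (the names sector and codes are hypothetical stand-ins,
not the Census data). It shows the positional recycling that == does, and,
for contrast, the set-membership test that base R's %in% operator performs.

```r
# Toy stand-ins (hypothetical values, not taken from the Census file)
sector <- c("32", "99", "33", "42", "32")
codes  <- c("32", "33")

# `==` pairs elements by position and recycles the shorter vector, so it
# compares "32"=="32", "99"=="33", "33"=="32", "42"=="33", "32"=="32".
# Because 5 is not a multiple of 2, this also emits the
# "longer object length is not a multiple of shorter object length" warning.
eq <- sector == codes

# any() then collapses the whole result to a single TRUE/FALSE, and a
# single TRUE used as a row index selects every row.
any_eq <- any(eq)

# `%in%` instead asks, for each element of sector, whether it appears
# anywhere in codes, and returns one logical per element of sector.
in_codes <- sector %in% codes

print(eq)
print(any_eq)
print(in_codes)
```

Note that the "33" in the third position is missed by == only because of
where it happens to line up against the recycled code list.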

Oh, and I would add, if there is a way of doing this correctly with
the extract function [], I would like to know what it is. If not, I guess
I'd like to know that too.
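
For reference, one possible way to do this with [] is sketched below,
assuming the goal is "GEO_ID is any of these codes AND SECTOR is any of those
codes". The data frame EC07_toy and the vectors geo_keep and sector_keep are
hypothetical miniatures invented for illustration (only the column names
come from the real data); %in% is the base R matching operator.

```r
# Hypothetical miniature of EC07_A1: made-up rows, real column names
EC07_toy <- data.frame(
  GEO_ID = c("01000US", "04000US06", "04000US48", "01000US"),
  SECTOR = c("32", "99", "33", "42"),
  stringsAsFactors = FALSE
)

# Hypothetical lists of codes to keep
geo_keep    <- c("01000US", "04000US06")
sector_keep <- c("32", "33", "42")

# One logical value per row: TRUE only where both columns match a wanted code
keep <- EC07_toy$GEO_ID %in% geo_keep & EC07_toy$SECTOR %in% sector_keep

# A row-length logical vector inside [ , ] keeps exactly the TRUE rows
ECwork_toy <- EC07_toy[keep, ]
print(ECwork_toy)
```

The key difference from the attempt above is that there is no any() wrapped
around the tests, so the index stays one logical per row rather than a
single value.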

Sincerely, Andrew Hoerner

-- 
J. Andrew Hoerner
Director, Sustainable Economics Program
Redefining Progress
(510) 507-4820


