Re: [R] grep(pattern = each element of a vector) ?

2013-09-12 Thread arun
Hi,
res <- ddply(.data=df1,
             .variables='Taxa',
             .fun=transform,
             Class=find.class(Taxa))
#Warning messages:
#1: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used
#2: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used
#3: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used

Maybe it is better to modify the function:
find.class <- function(x) df2[grep(unique(x), df2$Taxa), 'Class']
res1 <- ddply(.data=df1,
              .variables='Taxa',
              .fun=transform,
              Class=find.class(Taxa)) #no warnings

#though it doesn't have any effect in the end result.
 identical(res,res1) 
#[1] TRUE
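
A small base-R sketch of why unique() silences the warning, assuming each group handed to transform() repeats a single name. The vector x below is a hypothetical stand-in for one such group, NA rows are left out for simplicity, and stringsAsFactors = FALSE is my assumption to keep Taxa as character:

```r
# Lookup table from the thread, without the NA row.
df2 <- data.frame(Taxa = c("blue", "red"), Class = c("Z", "HI"),
                  stringsAsFactors = FALSE)

# Hypothetical stand-in for one group: the same name, once per row.
x <- c("blue", "blue", "blue")

# grep() takes a single pattern; x has three elements, unique(x) has one.
n_patterns <- length(unique(x))

# Modified lookup from this message: collapse the group before matching.
find.class <- function(x) df2[grep(unique(x), df2$Taxa), "Class"]
cls <- find.class(x)
print(cls)
```

Because every element of the group is identical, using only the first one gives the same answer, which is why res and res1 come out identical.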


A.K.





----- Original Message -----
From: Allen, Joel <allen.j...@epa.gov>
To: Beaulieu, Jake <beaulieu.j...@epa.gov>; r-help@r-project.org <r-help@r-project.org>
Cc: Farrar, David <farrar.da...@epa.gov>; Green, Hyatt <green.hy...@epa.gov>; McManus, Michael <mcmanus.mich...@epa.gov>; Wahman, David <wahman.da...@epa.gov>
Sent: Thursday, September 12, 2013 2:49 PM
Subject: Re: [R] grep(pattern = each element of a vector) ?

Jake,
You can use the plyr library or some form of apply.  If you are on a 64-bit 
system you can multithread and it goes much faster.

Something like this (for 32-bit):
require(plyr)
df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue', 'red', NA, 'blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))

#function to do the lookup
find.class <- function(x) df2[grep(x, df2$Taxa), 'Class']

ddply(.data=df1,
      .variables='Taxa',
      .fun=transform,
      Class=find.class(Taxa))

Joel

From: Beaulieu, Jake
Sent: Thursday, September 12, 2013 12:06 PM
To: r-help@r-project.org
Cc: Wahman, David; Farrar, David; Allen, Joel; Green, Hyatt; McManus, Michael
Subject: grep(pattern = each element of a vector) ?

Hi,

I have a large dataframe that contains species names.  I have a second 
dataframe that contains species names and some additional info, called 'Class', 
about each species.  I would like to match the species name in the first data 
frame with the 'Class' information contained in the second.  Since the species 
names are often formatted differently between the data sets, merge doesn't work 
well.  grep does the trick, but the function needs to be called separately for 
each observation in the first data frame.  I put grep into a loop, but this is 
too slow.  Is there a way to run grep repeatedly without resorting to a loop?  
Possibly something in the apply family?

  df1 <- data.frame(Taxa = c('blue', 'red', NA))
  df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))

  index <- NULL
  for (i in 1:length(df1$Taxa)) {
    index[i] <- grep(df1$Taxa[1], df2$Taxa)
  }
  index

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

==
Jake J. Beaulieu, PhD
US Environmental Protection Agency
National Risk Management Research Lab
26 W. Martin Luther King Drive
Cincinnati, OH 45268
USA
513-569-7842  (desk)
513-487-2511 (fax)
beaulieu.j...@epa.gov


    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[R] grep(pattern = each element of a vector) ?

2013-09-12 Thread Beaulieu, Jake
Hi,

I have a large dataframe that contains species names.  I have a second 
dataframe that contains species names and some additional info, called 'Class', 
about each species.  I would like to match the species name in the first data 
frame with the 'Class' information contained in the second.  Since the species 
names are often formatted differently between the data sets, merge doesn't work 
well.  grep does the trick, but the function needs to be called separately for 
each observation in the first data frame.  I put grep into a loop, but this is 
too slow.  Is there a way to run grep repeatedly without resorting to a loop?  
Possibly something in the apply family?

  df1 <- data.frame(Taxa = c('blue', 'red', NA))
  df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))

  index <- NULL
  for (i in 1:length(df1$Taxa)) {
    index[i] <- grep(df1$Taxa[1], df2$Taxa)
  }
  index
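
One loop-free way to get such an index, sketched in base R. It runs grep() once per distinct name and then expands the result back to every row, using each name in turn (the loop as posted passes df1$Taxa[1] on every pass). The helper first_match is hypothetical, and several choices are my assumptions rather than anything stated in the post: stringsAsFactors = FALSE, literal matching via fixed = TRUE (drop it if the names need regular expressions), taking only the first hit, and leaving NA names unmatched:

```r
df1 <- data.frame(Taxa = c("blue", "red", NA), stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c("blue", "red", NA), Class = c("Z", "HI", "A"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: position of the first element of `candidates` that
# contains `pattern`, or NA when the pattern is missing or nothing matches.
first_match <- function(pattern, candidates) {
  if (is.na(pattern)) return(NA_integer_)
  hits <- grep(pattern, candidates, fixed = TRUE)
  if (length(hits) == 0L) NA_integer_ else hits[1L]
}

# Call grep() once per distinct name rather than once per row.
taxa <- unique(df1$Taxa)
lookup <- vapply(taxa, first_match, integer(1), candidates = df2$Taxa,
                 USE.NAMES = FALSE)

# Expand the per-name result back to one entry per row of df1.
index <- lookup[match(df1$Taxa, taxa)]
print(index)
print(df2$Class[index])
```

On a large data frame with many repeated species names, most of the saving comes from the unique() step, since the number of grep() calls drops from the number of rows to the number of distinct names.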

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

==
Jake J. Beaulieu, PhD
US Environmental Protection Agency
National Risk Management Research Lab
26 W. Martin Luther King Drive
Cincinnati, OH 45268
USA
513-569-7842  (desk)
513-487-2511 (fax)
beaulieu.j...@epa.gov


[[alternative HTML version deleted]]



Re: [R] grep(pattern = each element of a vector) ?

2013-09-12 Thread Allen, Joel
Jake,
You can use the plyr library or some form of apply.  If you are on a 64-bit 
system you can multithread and it goes much faster.

Something like this (for 32-bit):
require(plyr)
df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue', 'red', NA, 'blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))

#function to do the lookup
find.class <- function(x) df2[grep(x, df2$Taxa), 'Class']

ddply(.data=df1,
  .variables='Taxa',
  .fun=transform,
  Class=find.class(Taxa))

Joel
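
For comparison, the "some form of apply" route might look roughly like the following base-R split/lapply/rbind sketch. It is not the plyr call above, only a similar split-apply-combine pattern. The NA rows are left out because split() drops them, stringsAsFactors = FALSE is my assumption, and find.class here collapses each group with unique() so that grep() receives a single pattern:

```r
df1 <- data.frame(Taxa = c("blue", "red", "blue", "red", "blue", "red"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c("blue", "red"), Class = c("Z", "HI"),
                  stringsAsFactors = FALSE)

# Lookup for one group: collapse the repeated name, then match it in df2.
find.class <- function(x) df2[grep(unique(x), df2$Taxa), "Class"]

# Split df1 by Taxa, add a Class column to each piece, then stack the pieces.
pieces <- lapply(split(df1, df1$Taxa), function(piece) {
  piece$Class <- find.class(piece$Taxa)
  piece
})
res <- do.call(rbind, pieces)
rownames(res) <- NULL
print(res)
```

As with ddply, the rows come back grouped by Taxa rather than in their original order.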
