subject:"\[R\] grep"

Re: [R] grep

2022-07-10 Thread John Fox


Dear Steven,

Beyond ?regex, the Wikipedia article on regular expressions is quite helpful and
not too long.


I hope this helps,
 John

On 2022-07-10 9:43 p.m., Steven T. Yen wrote:
Thanks Jeff. It works. If there is a good reference I should read
(besides ?grep) I'd be glad to have it.


On 7/11/2022 9:30 AM, Jeff Newmiller wrote:

grep( "^(z|x)\\.", jj, value = TRUE )

or

grep( r"(^(z|x)\.)", jj, value = TRUE )


On July 10, 2022 6:08:45 PM PDT, "Steven T. Yen" wrote:
Dear,
Below, jj contains character strings starting with “z.” and
“x.”. I want to grep all that contain either “z.” or “x.”. I had to
grep “z.” and “x.” separately and then tack the results together. Is
there a convenient grep option that would grep strings with either
“z.” or “x.”? Thank you!



jj<-names(v$est); jj
 [1] "z.one" "z.liberal" "z.conserv" "z.dem" "z.rep" "z.realinc"
 [7] "x.one" "x.liberal" "x.conserv" "x.dem" "x.rep" "x.realinc"
[13] "mu1_1" "mu2_1" "rho"

j1<-grep("z.",jj,value=TRUE); j1
[1] "z.one" "z.liberal" "z.conserv" "z.dem" "z.rep" "z.realinc"

j2<-grep("x.",jj,value=TRUE); j2
[1] "x.one" "x.liberal" "x.conserv" "x.dem" "x.rep" "x.realinc"

j<-c(j1,j2); j
 [1] "z.one" "z.liberal" "z.conserv" "z.dem" "z.rep" "z.realinc"
 [7] "x.one" "x.liberal" "x.conserv" "x.dem" "x.rep" "x.realinc"


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/


Re: [R] grep

2022-07-10 Thread Andrew Simmons

?regex is a nice starting point; it has plenty of detail on metacharacters
and character classes. If you need more advanced features, you will
probably have to look at the Perl regex documentation, which I believe is
linked from ?regex.

I might try something like

grep("^[xz]\\.", jj, value = TRUE)

or

grep("^[xz][.]", jj, value = TRUE)

or if you'd consider a different function,

startsWith(jj, "x.") | startsWith(jj, "z.")
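
For comparison, here is a small sketch of the approaches mentioned in this thread, run on a stand-in vector. The real jj comes from names(v$est), which is not available here, so the values are copied from the printed output in the question; the extra entry "zebra" is a hypothetical addition, included only to show why the dot should be escaped.

```r
# Stand-in for names(v$est); "zebra" is a hypothetical extra entry.
jj <- c("z.one", "z.liberal", "x.one", "x.dem", "mu1_1", "rho", "zebra")

# An unescaped "." matches any character, so "zebra" slips through.
loose <- grep("z.", jj, value = TRUE)

# Anchored pattern with a literal dot: character class plus escaped dot.
strict <- grep("^[xz]\\.", jj, value = TRUE)

# Same result without regular expressions. startsWith() returns a logical
# vector, so it is used here to subset jj.
prefix <- jj[startsWith(jj, "x.") | startsWith(jj, "z.")]

print(loose)
print(strict)
print(identical(strict, prefix))
```

On the original data the loose and strict patterns happen to agree, which is probably why the separate grep("z.") and grep("x.") calls appeared to work.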

On Sun, Jul 10, 2022, 21:49 Steven T. Yen  wrote:

> Thanks Jeff. It works. If there is a good reference I should read
> (besides ?grep) I'd be glad to have it.
>
> On 7/11/2022 9:30 AM, Jeff Newmiller wrote:
> > grep( "^(z|x)\\.", jj, value = TRUE )
> >
> > or
> >
> > grep( r"(^(z|x)\.)", jj, value = TRUE )
> >
> >
> > On July 10, 2022 6:08:45 PM PDT, "Steven T. Yen" 
> wrote:
> >> Dear, Below, jj contains character strings starting with “z.” and “x.”.
> I want to grep all that contain either “z.” or “x.”. I had to grep “z.” and
> “x.” separately and then tack the result together. Is there a convenient
> grep option that would grep strings with either “z.” or “x.”. Thank you!
> >>
> >>> jj<-names(v$est); jj
> >>  [1] "z.one" "z.liberal" "z.conserv" "z.dem" "z.rep" "z.realinc"
> >>  [7] "x.one" "x.liberal" "x.conserv" "x.dem" "x.rep" "x.realinc"
> >> [13] "mu1_1" "mu2_1" "rho"
> >>> j1<-grep("z.",jj,value=TRUE); j1
> >> [1] "z.one" "z.liberal" "z.conserv" "z.dem" "z.rep" "z.realinc"
> >>> j2<-grep("x.",jj,value=TRUE); j2
> >> [1] "x.one" "x.liberal" "x.conserv" "x.dem" "x.rep" "x.realinc"
> >>> j<-c(j1,j2); j
> >>  [1] "z.one" "z.liberal" "z.conserv" "z.dem" "z.rep" "z.realinc"
> >>  [7] "x.one" "x.liberal" "x.conserv" "x.dem" "x.rep" "x.realinc"


Re: [R] R grep question

2021-05-28 Thread Bert Gunter

FWIW:

I think Jim makes an excellent point -- regex's really aren't the right
tool for this sort of thing (imho); matching is.

Note also that if one is willing to live with a logical response (better,
again imho), then the ifelse() can of course be dispensed with:

> CRC$MMR.gene<-CRC$gene.all %in% match_strings
> CRC$MMR.gene
[1]  TRUE FALSE  TRUE FALSE
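
One caveat worth illustrating: %in% tests whole-string equality, whereas grepl() tests for a substring. The sketch below uses made-up gene.all values (the multi-gene entry "MLH1;MSH6" is an assumption, not data from the thread) to show where the two differ.

```r
# Hypothetical gene.all values; "MLH1;MSH6" is a made-up multi-gene entry.
gene.all <- c("MLH1", "MSL1", "MSH2", "MLH1;MSH6")
match_strings <- c("MLH1", "MSH2")

exact   <- gene.all %in% match_strings    # whole-string equality
partial <- grepl("MLH1|MSH2", gene.all)   # substring match

print(exact)
print(partial)
```

If each cell holds exactly one gene name, the two agree and %in% is the simpler choice; if cells can hold several names, only the substring test picks up the last entry.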

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, May 27, 2021 at 8:35 PM Jim Lemon  wrote:

> Hi Kai,
> You may find %in% easier than grep when multiple matches are needed:
>
> match_strings<-c("MLH1","MSH2")
> CRC<-data.frame(gene.all=c("MLH1","MSL1","MSH2","MCC3"))
> CRC$MMR.gene<-ifelse(CRC$gene.all %in% match_strings,"Yes","No")
>
> Composing your match strings before applying %in% may be more flexible
> if you have more than one selection to make.
>
> On Fri, May 28, 2021 at 1:57 AM Marc Schwartz via R-help
>  wrote:
> >
> > Hi,
> >
> > A quick clarification:
> >
> > The regular expression is a single quoted character vector, not a
> > character vector on either side of the | operator:
> >
> > "MLH1|MSH2"
> >
> > not:
> >
> > "MLH1"|"MSH2"
> >
> > The | is treated as a special character within the regular expression.
> > See ?regex.
> >
> > grep(), when value = FALSE, returns the index of the match within the
> > source vector, while when value = TRUE, returns the found character
> > entries themselves.
> >
> > Thus, you need to be sure that your ifelse() incantation is matching the
> > correct values.
> >
> > In the case of grepl(), it returns TRUE or FALSE, as Rui noted, thus:
> >
> >CRC$MMR.gene <- ifelse(grepl("MLH1|MSH2",CRC$gene.all), "Yes", "No")
> >
> > should work.
> >
> > Regards,
> >
> > Marc Schwartz
> >
> >
> > Kai Yang via R-help wrote on 5/27/21 11:23 AM:
> > > Hi Rui, thank you for your suggestion.
> > > But when I try the solution, I get the message below:
> > >
> > > Error in "MLH1" | "MSH2" : operations are possible only for numeric,
> > > logical or complex types
> > >
> > > Does this mean grepl cannot work on a character field?
> > > Thanks, Kai
> > >
> > > On Thursday, May 27, 2021, 01:37:58 AM PDT, Rui Barradas wrote:
> > >
> > >   Hello,
> > >
> > > ifelse needs a logical condition, not the value. Try grepl.
> > >
> > >
> > > CRC$MMR.gene <- ifelse(grepl("MLH1"|"MSH2",CRC$gene.all), "Yes", "No")
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > > Às 05:29 de 27/05/21, Kai Yang via R-help escreveu:
> > >> Hi List,
> > >> I wrote the code to create a new variable:
> > >> CRC$MMR.gene<-ifelse(grep("MLH1"|"MSH2",CRC$gene.all,value=T),"Yes","No")
> > >>
> > >> I need to create an MMR.gene column in the CRC data frame: if the gene.all
> > >> column contains MLH1 or MSH2, then MMR.gene = Yes; if not, MMR.gene = No.
> > >>
> > >> But the code doesn't work for me. Can anyone tell me how to fix the code?
> > >>
> > >> Thank you,
> > >>
> > >> Kai


Re: [R] R grep question

2021-05-27 Thread Jim Lemon

Hi Kai,
You may find %in% easier than grep when multiple matches are needed:

match_strings<-c("MLH1","MSH2")
CRC<-data.frame(gene.all=c("MLH1","MSL1","MSH2","MCC3"))
CRC$MMR.gene<-ifelse(CRC$gene.all %in% match_strings,"Yes","No")

Composing your match strings before applying %in% may be more flexible
if you have more than one selection to make.

On Fri, May 28, 2021 at 1:57 AM Marc Schwartz via R-help
 wrote:
>
> Hi,
>
> A quick clarification:
>
> The regular expression is a single quoted character vector, not a
> character vector on either side of the | operator:
>
> "MLH1|MSH2"
>
> not:
>
> "MLH1"|"MSH2"
>
> The | is treated as a special character within the regular expression.
> See ?regex.
>
> grep(), when value = FALSE, returns the index of the match within the
> source vector, while when value = TRUE, returns the found character
> entries themselves.
>
> Thus, you need to be sure that your ifelse() incantation is matching the
> correct values.
>
> In the case of grepl(), it returns TRUE or FALSE, as Rui noted, thus:
>
>CRC$MMR.gene <- ifelse(grepl("MLH1|MSH2",CRC$gene.all), "Yes", "No")
>
> should work.
>
> Regards,
>
> Marc Schwartz
>
>
> Kai Yang via R-help wrote on 5/27/21 11:23 AM:
> > Hi Rui, thank you for your suggestion.
> > But when I try the solution, I get the message below:
> >
> > Error in "MLH1" | "MSH2" : operations are possible only for numeric,
> > logical or complex types
> >
> > Does this mean grepl cannot work on a character field?
> > Thanks, Kai
> >
> > On Thursday, May 27, 2021, 01:37:58 AM PDT, Rui Barradas wrote:
> >
> >   Hello,
> >
> > ifelse needs a logical condition, not the value. Try grepl.
> >
> >
> > CRC$MMR.gene <- ifelse(grepl("MLH1"|"MSH2",CRC$gene.all), "Yes", "No")
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > Às 05:29 de 27/05/21, Kai Yang via R-help escreveu:
> >> Hi List,
> >> I wrote the code to create a new variable:
> >> CRC$MMR.gene<-ifelse(grep("MLH1"|"MSH2",CRC$gene.all,value=T),"Yes","No")
> >>
> >>
> >> I need to create an MMR.gene column in the CRC data frame: if the gene.all
> >> column contains MLH1 or MSH2, then MMR.gene = Yes; if not, MMR.gene = No.
> >>
> >> But, the code doesn't work for me. Can anyone tell how to fix the code?
> >>
> >> Thank you,
> >>
> >> Kai

Re: [R] R grep question

2021-05-27 Thread Marc Schwartz via R-help


Hi,

A quick clarification:

The regular expression is a single quoted character vector, not a 
character vector on either side of the | operator:


"MLH1|MSH2"

not:

"MLH1"|"MSH2"

The | is treated as a special character within the regular expression. 
See ?regex.


grep(), when value = FALSE, returns the index of the match within the 
source vector, while when value = TRUE, returns the found character 
entries themselves.


Thus, you need to be sure that your ifelse() incantation is matching the 
correct values.


In the case of grepl(), it returns TRUE or FALSE, as Rui noted, thus:

  CRC$MMR.gene <- ifelse(grepl("MLH1|MSH2",CRC$gene.all), "Yes", "No")

should work.
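
To make the difference between the three return types concrete, here is a minimal sketch using the four gene names from Jim Lemon's example as a stand-in for CRC$gene.all (the real column is not shown in the thread).

```r
# Stand-in for CRC$gene.all.
gene.all <- c("MLH1", "MSL1", "MSH2", "MCC3")

idx  <- grep("MLH1|MSH2", gene.all)                # integer positions of matches
vals <- grep("MLH1|MSH2", gene.all, value = TRUE)  # the matching strings
hit  <- grepl("MLH1|MSH2", gene.all)               # one TRUE/FALSE per element

# Only the grepl() result has the same length as the input, which is what
# ifelse() needs to build a column for every row.
MMR.gene <- ifelse(hit, "Yes", "No")

print(idx)
print(vals)
print(MMR.gene)
```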

Regards,

Marc Schwartz


Kai Yang via R-help wrote on 5/27/21 11:23 AM:

Hi Rui, thank you for your suggestion.
But when I try the solution, I get the message below:

Error in "MLH1" | "MSH2" : operations are possible only for numeric, logical
or complex types

Does this mean grepl cannot work on a character field?
Thanks, Kai

On Thursday, May 27, 2021, 01:37:58 AM PDT, Rui Barradas wrote:
  
  Hello,


ifelse needs a logical condition, not the value. Try grepl.


CRC$MMR.gene <- ifelse(grepl("MLH1"|"MSH2",CRC$gene.all), "Yes", "No")


Hope this helps,

Rui Barradas

Às 05:29 de 27/05/21, Kai Yang via R-help escreveu:

Hi List,
I wrote the code to create a new variable:
CRC$MMR.gene<-ifelse(grep("MLH1"|"MSH2",CRC$gene.all,value=T),"Yes","No")
   


I need to create an MMR.gene column in the CRC data frame: if the gene.all column
contains MLH1 or MSH2, then MMR.gene = Yes; if not, MMR.gene = No.

But, the code doesn't work for me. Can anyone tell how to fix the code?

Thank you,

Kai



Re: [R] R grep question

2021-05-27 Thread Kai Yang via R-help

Hi Rui, thank you for your suggestion.
But when I try the solution, I get the message below:

Error in "MLH1" | "MSH2" : operations are possible only for numeric, logical
or complex types

Does this mean grepl cannot work on a character field?
Thanks, Kai

On Thursday, May 27, 2021, 01:37:58 AM PDT, Rui Barradas wrote:
 
 Hello,

ifelse needs a logical condition, not the value. Try grepl.


CRC$MMR.gene <- ifelse(grepl("MLH1"|"MSH2",CRC$gene.all), "Yes", "No")


Hope this helps,

Rui Barradas

Às 05:29 de 27/05/21, Kai Yang via R-help escreveu:
> Hi List,
> I wrote the code to create a new variable:
> CRC$MMR.gene<-ifelse(grep("MLH1"|"MSH2",CRC$gene.all,value=T),"Yes","No")
>  
> 
> I need to create an MMR.gene column in the CRC data frame: if the gene.all
> column contains MLH1 or MSH2, then MMR.gene = Yes; if not, MMR.gene = No.
> 
> But, the code doesn't work for me. Can anyone tell how to fix the code?
> 
> Thank you,
> 
> Kai

Re: [R] R grep question

2021-05-27 Thread Rui Barradas


Hello,

ifelse needs a logical condition, not the value. Try grepl.


CRC$MMR.gene <- ifelse(grepl("MLH1"|"MSH2",CRC$gene.all), "Yes", "No")


Hope this helps,

Rui Barradas

Às 05:29 de 27/05/21, Kai Yang via R-help escreveu:

Hi List,
I wrote the code to create a new variable:
CRC$MMR.gene<-ifelse(grep("MLH1"|"MSH2",CRC$gene.all,value=T),"Yes","No")
  


I need to create an MMR.gene column in the CRC data frame: if the gene.all column
contains MLH1 or MSH2, then MMR.gene = Yes; if not, MMR.gene = No.

But, the code doesn't work for me. Can anyone tell how to fix the code?

Thank you,

Kai

Re: [R] R grep question

2021-05-27 Thread Jeff Newmiller

Post in plain text

Use grepl

On May 26, 2021 9:29:10 PM PDT, Kai Yang via R-help  
wrote:
>Hi List,
>I wrote the code to create a new variable:
>CRC$MMR.gene<-ifelse(grep("MLH1"|"MSH2",CRC$gene.all,value=T),"Yes","No")
> 
>
>I need to create an MMR.gene column in the CRC data frame: if the gene.all
>column contains MLH1 or MSH2, then MMR.gene = Yes; if not, MMR.gene = No.
>
>But, the code doesn't work for me. Can anyone tell how to fix the code?
>
>Thank you,
>
>Kai

-- 
Sent from my phone. Please excuse my brevity.


[R] R grep question

2021-05-26 Thread Kai Yang via R-help

Hi List,
I wrote the code to create a new variable:
CRC$MMR.gene<-ifelse(grep("MLH1"|"MSH2",CRC$gene.all,value=T),"Yes","No")
 

I need to create an MMR.gene column in the CRC data frame: if the gene.all column
contains MLH1 or MSH2, then MMR.gene = Yes; if not, MMR.gene = No.

But, the code doesn't work for me. Can anyone tell how to fix the code?

Thank you,

Kai

Re: [R] grep

2021-05-09 Thread Rui Barradas


Hello,

Maybe instead of a loop, vectorize with logical indices.


i1 <- is.na(jindex)
if(any(!i1)){
  # is.numeric() tests the whole vector and returns a single TRUE/FALSE
  if(!is.numeric(jindex)){
    words <- jindex[!i1]
    pattern <- paste(words, collapse = "|")
    jindex <- grep(pattern = pattern, x.label, value = FALSE)
  } else {
    jindex <- jindex[!i1]
  }
  x.label <- x.label[jindex]
}


Or even simpler, if jindex is only ever NA or a character vector:

if(any(!i1) && !is.numeric(jindex)){
  words <- jindex[!i1]
  pattern <- paste(words, collapse = "|")
  jindex <- grep(pattern = pattern, x.label, value = FALSE)
  x.label <- x.label[jindex]
}
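
The same logic can also be wrapped in a small function, which makes the three cases (all NA, numeric positions, character words) easier to test. This is only a sketch: the name resolve_index is hypothetical, and treating an all-NA jindex as "keep everything" is an assumption based on the jindex=NA default in the question.

```r
# Hypothetical helper: turn jindex (NA, numeric positions, or words) into
# integer positions within x.label.
resolve_index <- function(jindex, x.label) {
  keep <- !is.na(jindex)
  if (!any(keep)) {
    return(seq_along(x.label))   # nothing requested: keep everything
  }
  if (is.character(jindex)) {
    pattern <- paste(jindex[keep], collapse = "|")
    return(grep(pattern, x.label))
  }
  as.integer(jindex[keep])
}

x.label <- c("x1.one", "x1.black", "x1.conserv", "x2.black")
a     <- resolve_index(NA, x.label)
b     <- resolve_index(c(2, 4), x.label)
c_idx <- resolve_index(c("black", "conserv"), x.label)

print(x.label[c_idx])
```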


Hope this helps,

Rui Barradas

Às 02:54 de 09/05/21, Steven Yen escreveu:

Thanks to Rui, Jeff, and Bert. Their replies are all very useful.
Somewhat related is the following, in which jindex is a numeric or
alphanumeric vector in a function that starts with


try<-function(, jindex=NA)

In the if block, the first line tries to determine whether the
vector jindex is NA;
the second line tries to determine whether the elements of vector
jindex are all non-numeric.


I was not sure how, so I tried to judge by the first element of jindex. Is
there a better way? Thanks.


   if (!is.na(jindex[1])){        # like to improve this line
     if(!is.numeric(jindex)[1]){  # like to improve this line
       words  <-jindex
       pattern<-paste(words,collapse="|")
       jindex <-grep(pattern=pattern,x.label,value=FALSE)
     }
     jj<-jindex; x.label<-x.label[jj]
   }

On 2021/5/9 上午 03:02, Rui Barradas wrote:

Hello,

The pattern can be assembled with paste(., collapse = "|").
With the same vector of names, nms:


words <- c("black","conserv")
pattern <- paste(words, collapse = "|")
grep(pattern = pattern, nms, value = TRUE)
#[1] "x1.black"   "x1.conserv" "x2.black"   "x2.conserv"


Hope this helps,

Rui Barradas

Às 18:20 de 08/05/21, Jeff Newmiller escreveu:
Regular expression patterns are not vectorized... only the data to be 
searched are. Use one of the many websites dedicated to tutoring 
regular expressions to learn how they work. (Using function names 
like "names" as data names is bad practice.)


nms <- c( "x1.one", "x1.black", "x1.othrrace", "x1.moddkna", 
"x1.conserv", "x1.nstrprty", "x1.strrep", "x1.sevngprt", "x2.one", 
"x2.black", "x2.othrrace", "x2.moddkna", "x2.conserv", "x2.nstrprty", 
"x2.strrep", "x2.sevngprt" )


grep( "black|conserv", nms, value = TRUE )

On May 8, 2021 10:00:12 AM PDT, Steven Yen  wrote:

Below, the first command simply creates a list of 16 names (labels)
which can be ignored.

In the 2nd and 3rd commands, I am able to identify names containing
"black".

In line 4, I am trying to identify names containing "black" or
"conserv"
but obviously it does not work. Can someone help? Thanks.


names<-names(tp.nohs$estimate)[c(1:8,58:65)]; names

 [1] "x1.one"      "x1.black"    "x1.othrrace" "x1.moddkna"  "x1.conserv"  "x1.nstrprty"
 [7] "x1.strrep"   "x1.sevngprt" "x2.one"      "x2.black"    "x2.othrrace" "x2.moddkna"
[13] "x2.conserv"  "x2.nstrprty" "x2.strrep"   "x2.sevngprt"

grep("black",names,value=TRUE)

[1] "x1.black" "x2.black"

grep("black",names,value=FALSE)

[1]  2 10

grep(c("black","conserv"),names,value=TRUE)

[1] "x1.black" "x2.black"
Warning message:
In grep(c("black", "conserv"), names, value = TRUE) :
   argument 'pattern' has length > 1 and only the first element will be
used


Re: [R] grep

2021-05-08 Thread Steven Yen


Thanks to Rui, Jeff, and Bert. Their replies are all very useful.
Somewhat related is the following, in which jindex is a numeric or
alphanumeric vector in a function that starts with


try<-function(, jindex=NA)

In the if block, the first line tries to determine whether the
vector jindex is NA;
the second line tries to determine whether the elements of vector
jindex are all non-numeric.


I was not sure how, so I tried to judge by the first element of jindex. Is
there a better way? Thanks.


  if (!is.na(jindex[1])){        # like to improve this line
    if(!is.numeric(jindex)[1]){  # like to improve this line
      words  <-jindex
      pattern<-paste(words,collapse="|")
      jindex <-grep(pattern=pattern,x.label,value=FALSE)
    }
    jj<-jindex; x.label<-x.label[jj]
  }

On 2021/5/9 上午 03:02, Rui Barradas wrote:

Hello,

The pattern can be assembled with paste(., collapse = "|").
With the same vector of names, nms:


words <- c("black","conserv")
pattern <- paste(words, collapse = "|")
grep(pattern = pattern, nms, value = TRUE)
#[1] "x1.black"   "x1.conserv" "x2.black"   "x2.conserv"


Hope this helps,

Rui Barradas

Às 18:20 de 08/05/21, Jeff Newmiller escreveu:
Regular expression patterns are not vectorized... only the data to be 
searched are. Use one of the many websites dedicated to tutoring 
regular expressions to learn how they work. (Using function names 
like "names" as data names is bad practice.)


nms <- c( "x1.one", "x1.black", "x1.othrrace", "x1.moddkna", 
"x1.conserv", "x1.nstrprty", "x1.strrep", "x1.sevngprt", "x2.one", 
"x2.black", "x2.othrrace", "x2.moddkna", "x2.conserv", "x2.nstrprty", 
"x2.strrep", "x2.sevngprt" )


grep( "black|conserv", nms, value = TRUE )

On May 8, 2021 10:00:12 AM PDT, Steven Yen  wrote:

Below, the first command simply creates a list of 16 names (labels)
which can be ignored.

In the 2nd and 3rd commands, I am able to identify names containing
"black".

In line 4, I am trying to identify names containing "black" or
"conserv"
but obviously it does not work. Can someone help? Thanks.


names<-names(tp.nohs$estimate)[c(1:8,58:65)]; names

 [1] "x1.one"      "x1.black"    "x1.othrrace" "x1.moddkna"  "x1.conserv"  "x1.nstrprty"
 [7] "x1.strrep"   "x1.sevngprt" "x2.one"      "x2.black"    "x2.othrrace" "x2.moddkna"
[13] "x2.conserv"  "x2.nstrprty" "x2.strrep"   "x2.sevngprt"

grep("black",names,value=TRUE)

[1] "x1.black" "x2.black"

grep("black",names,value=FALSE)

[1]  2 10

grep(c("black","conserv"),names,value=TRUE)

[1] "x1.black" "x2.black"
Warning message:
In grep(c("black", "conserv"), names, value = TRUE) :
   argument 'pattern' has length > 1 and only the first element will be
used


Re: [R] grep

2021-05-08 Thread Rui Barradas


Hello,

The pattern can be assembled with paste(., collapse = "|").
With the same vector of names, nms:


words <- c("black","conserv")
pattern <- paste(words, collapse = "|")
grep(pattern = pattern, nms, value = TRUE)
#[1] "x1.black"   "x1.conserv" "x2.black"   "x2.conserv"
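
One thing to watch when pasting words into a pattern: any regex metacharacter in a word (such as ".") keeps its special meaning. The sketch below contrasts the pasted pattern with literal matching via fixed = TRUE; the vectors are made up for illustration, and "x1_black" is a hypothetical name added to expose the difference.

```r
# Made-up names; "x1_black" is a hypothetical entry that should NOT match "x1.b".
nms <- c("x1.black", "x1_black", "x1.conserv", "x2.one")
words <- c("x1.b", "conserv")   # hypothetical search terms containing "."

# Regex alternation: "." is a wildcard, so "x1_black" also matches.
as_regex <- grepl(paste(words, collapse = "|"), nms)

# Literal matching: one grepl(fixed = TRUE) call per word, combined with OR.
as_fixed <- Reduce(`|`, lapply(words, grepl, x = nms, fixed = TRUE))

print(nms[as_regex])
print(nms[as_fixed])
```

For plain words like "black" and "conserv" the two give the same answer, so the pasted pattern is fine for the data in this thread.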


Hope this helps,

Rui Barradas

Às 18:20 de 08/05/21, Jeff Newmiller escreveu:

Regular expression patterns are not vectorized... only the data to be searched are. Use 
one of the many websites dedicated to tutoring regular expressions to learn how they 
work. (Using function names like "names" as data names is bad practice.)

nms <- c( "x1.one", "x1.black", "x1.othrrace", "x1.moddkna", "x1.conserv", "x1.nstrprty", "x1.strrep", "x1.sevngprt", "x2.one", "x2.black", 
"x2.othrrace", "x2.moddkna", "x2.conserv", "x2.nstrprty", "x2.strrep", "x2.sevngprt" )

grep( "black|conserv", nms, value = TRUE )

On May 8, 2021 10:00:12 AM PDT, Steven Yen  wrote:

Below, the first command simply creates a list of 16 names (labels)
which can be ignored.

In the 2nd and 3rd commands, I am able to identify names containing
"black".

In line 4, I am trying to identify names containing "black" or
"conserv"
but obviously it does not work. Can someone help? Thanks.


names<-names(tp.nohs$estimate)[c(1:8,58:65)]; names

 [1] "x1.one"      "x1.black"    "x1.othrrace" "x1.moddkna"  "x1.conserv"  "x1.nstrprty"
 [7] "x1.strrep"   "x1.sevngprt" "x2.one"      "x2.black"    "x2.othrrace" "x2.moddkna"
[13] "x2.conserv"  "x2.nstrprty" "x2.strrep"   "x2.sevngprt"

grep("black",names,value=TRUE)

[1] "x1.black" "x2.black"

grep("black",names,value=FALSE)

[1]  2 10

grep(c("black","conserv"),names,value=TRUE)

[1] "x1.black" "x2.black"
Warning message:
In grep(c("black", "conserv"), names, value = TRUE) :
   argument 'pattern' has length > 1 and only the first element will be
used


Re: [R] grep

2021-05-08 Thread Jeff Newmiller

Regular expression patterns are not vectorized... only the data to be searched 
are. Use one of the many websites dedicated to tutoring regular expressions to 
learn how they work. (Using function names like "names" as data names is bad 
practice.)

nms <- c( "x1.one", "x1.black", "x1.othrrace", "x1.moddkna", "x1.conserv", 
"x1.nstrprty", "x1.strrep", "x1.sevngprt", "x2.one", "x2.black", "x2.othrrace", 
"x2.moddkna", "x2.conserv", "x2.nstrprty", "x2.strrep", "x2.sevngprt" )

grep( "black|conserv", nms, value = TRUE )
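
If it is useful to know which pattern matched which name, one option is to apply grepl() once per pattern and inspect the resulting matrix. This is a sketch on a shortened version of the nms vector above, not a replacement for the single alternation pattern.

```r
# Shortened version of the nms vector, for illustration.
nms <- c("x1.one", "x1.black", "x1.conserv", "x2.black", "x2.conserv")
words <- c("black", "conserv")

# One column per pattern, one row per element of nms.
hits <- sapply(words, grepl, x = nms)

# A name is kept if any pattern matched it.
any_hit <- rowSums(hits) > 0
result <- nms[any_hit]

print(hits)
print(result)
```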

On May 8, 2021 10:00:12 AM PDT, Steven Yen  wrote:
>Below, the first command simply creates a list of 16 names (labels)
>which can be ignored.
>
>In the 2nd and 3rd commands, I am able to identify names containing
>"black".
>
>In line 4, I am trying to identify names containing "black" or
>"conserv" 
>but obviously it does not work. Can someone help? Thanks.
>
> > names<-names(tp.nohs$estimate)[c(1:8,58:65)]; names
> [1] "x1.one"      "x1.black"    "x1.othrrace" "x1.moddkna"  "x1.conserv"  "x1.nstrprty"
> [7] "x1.strrep"   "x1.sevngprt" "x2.one"      "x2.black"    "x2.othrrace" "x2.moddkna"
>[13] "x2.conserv"  "x2.nstrprty" "x2.strrep"   "x2.sevngprt"
> > grep("black",names,value=TRUE)
>[1] "x1.black" "x2.black"
> > grep("black",names,value=FALSE)
>[1]  2 10
> > grep(c("black","conserv"),names,value=TRUE)
>[1] "x1.black" "x2.black"
>Warning message:
>In grep(c("black", "conserv"), names, value = TRUE) :
>  argument 'pattern' has length > 1 and only the first element will be
>used

-- 
Sent from my phone. Please excuse my brevity.


Re: [R] grep

2021-05-08 Thread David Winsemius

On 5/8/21 10:00 AM, Steven Yen wrote:
Below, the first command simply creates a list of 16 names (labels)
which can be ignored.

In the 2nd and 3rd commands, I am able to identify names containing 
"black".

In line 4, I am trying to identify names containing "black" or 
"conserv" but obviously it does not work. Can someone help? Thanks.

> names<-names(tp.nohs$estimate)[c(1:8,58:65)]; names
 [1] "x1.one"      "x1.black"    "x1.othrrace" "x1.moddkna"  "x1.conserv"  "x1.nstrprty"
 [7] "x1.strrep"   "x1.sevngprt" "x2.one"      "x2.black"    "x2.othrrace" "x2.moddkna"
[13] "x2.conserv"  "x2.nstrprty" "x2.strrep"   "x2.sevngprt"
> grep("black",names,value=TRUE)
[1] "x1.black" "x2.black"
> grep("black",names,value=FALSE)
[1]  2 10
> grep(c("black","conserv"),names,value=TRUE)
[1] "x1.black" "x2.black"
Warning message:
In grep(c("black", "conserv"), names, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be 
used

Try using the regex alternation operator (the vertical bar, AKA "pipe")
inside a single pattern string:

grep("black|conserv", names, value=TRUE)

--

David.


[R] grep

2021-05-08 Thread Steven Yen

Below, the first command simply creates a list of 16 names (labels)
which can be ignored.


In the 2nd and 3rd commands, I am able to identify names containing "black".

In line 4, I am trying to identify names containing "black" or "conserv" 
but obviously it does not work. Can someone help? Thanks.


> names<-names(tp.nohs$estimate)[c(1:8,58:65)]; names
 [1] "x1.one"      "x1.black"    "x1.othrrace" "x1.moddkna"  "x1.conserv"  "x1.nstrprty"
 [7] "x1.strrep"   "x1.sevngprt" "x2.one"      "x2.black"    "x2.othrrace" "x2.moddkna"
[13] "x2.conserv"  "x2.nstrprty" "x2.strrep"   "x2.sevngprt"
> grep("black",names,value=TRUE)
[1] "x1.black" "x2.black"
> grep("black",names,value=FALSE)
[1]  2 10
> grep(c("black","conserv"),names,value=TRUE)
[1] "x1.black" "x2.black"
Warning message:
In grep(c("black", "conserv"), names, value = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used


Re: [R] grep

2018-07-16 Thread Ista Zahn

grep("(^| )ABHD14A( ;|$)",xgen, value = TRUE)

maybe.
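
For comparison, here is a sketch that runs the pattern above next to a split-and-compare alternative. It assumes the members are separated by a semicolon with optional spaces around it, as in the posted example; the names `hits_regex`, `parts` and `hits_split` are illustrative.

```r
xgen <- c("XYZ", "ABHD-ACY1 ; ABHD14A", "ABHD14AXX", "ABHD14A ; YYY")
ga <- "ABHD14A"

# Regex approach: the target must be preceded by start-of-string or a space,
# and followed by " ;" or end-of-string
hits_regex <- grep(paste0("(^| )", ga, "( ;|$)"), xgen, value = TRUE)

# Alternative: split each element on ";" (with optional spaces) and test
# whether the target equals one of the pieces exactly
parts <- strsplit(xgen, " *; *")
hits_split <- xgen[vapply(parts, function(p) ga %in% p, logical(1))]

print(hits_regex)
print(hits_split)
```

The split version avoids having to think about regex metacharacters in `ga`, at the cost of an extra pass over the data.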

On Mon, Jul 16, 2018 at 1:46 PM, Brian Smith  wrote:
> Hi,
>
> I was trying to find a pattern ("ABHD14A") in a character string ('xgen' in
> example below) using grepl. Note that the individual members may be
> separated by a semi-colon.
>
> The correct answer should return:
>
> "ABHD-ACY1 ; ABHD14A" "ABHD14A ; YYY"
>
> I have tried three approaches, but still seem a bit off. Attempt 2 below
> gets closest, but it also returns a hit where my pattern is a substring.
> Here is my code:
>
> ===
>
>
>   xgen <- c("XYZ","ABHD-ACY1 ; ABHD14A","ABHD14AXX","ABHD14A ; YYY")
>   ga <- "ABHD14A"
>
>   # 1.
>   kx <- grepl(paste0("^",ga,"$"),xgen)
>   xgen[kx]
>
>   # 2.
>   ky <- grepl(ga,xgen)
>   xgen[ky]
>
>
> ==
>
> What do I need to add/change in #2 above?
>
> many thanks!
>

[R] grep

2018-07-16 Thread Brian Smith

Hi,

I was trying to find a pattern ("ABHD14A") in a character string ('xgen' in
example below) using grepl. Note that the individual members may be
separated by a semi-colon.

The correct answer should return:

"ABHD-ACY1 ; ABHD14A" "ABHD14A ; YYY"

I have tried three approaches, but still seem a bit off. Attempt 2 below
gets closest, but it also returns a hit where my pattern is a substring.
Here is my code:

===


  xgen <- c("XYZ","ABHD-ACY1 ; ABHD14A","ABHD14AXX","ABHD14A ; YYY")
  ga <- "ABHD14A"

  # 1.
  kx <- grepl(paste0("^",ga,"$"),xgen)
  xgen[kx]

  # 2.
  ky <- grepl(ga,xgen)
  xgen[ky]


==

What do I need to add/change in #2 above?

many thanks!


Re: [R] Grep command

2016-05-19 Thread Joyce Robbins

I'm not sure you need grep:

> all %in% some
[1]  TRUE FALSE  TRUE FALSE FALSE  TRUE
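
A small sketch of why %in% and grep are not interchangeable in general: %in% tests whole-string equality, while grepl() finds the pattern anywhere in the string. The vector is renamed `all_animals` here (my choice, to avoid masking base::all), and the second vector `words` is made up for illustration.

```r
all_animals <- c("ants", "birds", "cats", "dogs", "elks", "fox")
some <- c("ants", "cats", "fox")

# Whole-string equality: TRUE only where an element equals one of `some`
exact <- all_animals %in% some

# Substring matching can hit more than intended
words <- c("ants", "pants", "ant")
partial <- grepl("ant", words)   # matches all three
whole   <- words %in% "ant"      # matches only the exact string

print(exact)
print(partial)
print(whole)
```

With the data in this thread the two approaches agree, because every search term is a complete element of the vector.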

On Thu, May 19, 2016 at 7:58 PM, MacQueen, Don  wrote:

> Start with:
>
> > all <- c("ants","birds","cats","dogs","elks","fox")
> > all[grep('ants|cats|fox',all)]
> [1] "ants" "cats" "fox"
>
> Then construct the first arg to grep:
>
> > some <- c("ants","cats","fox")
> > all[ grep( paste(some,collapse='|') , all)]
> [1] "ants" "cats" "fox"
>
>
>
> --
> Don MacQueen
>
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
>
>
>
>
>
> On 5/19/16, 4:09 PM, "R-help on behalf of Steven Yen"
>  wrote:
>
> >What is a good way to grep multiple strings (say in a vector)? In the
> >following, I grep ants, cats, and fox separately and concatenate them,
> >is there a way to grep the trio in one action? Thanks.
> >
> >all<-c("ants","birds","cats","dogs","elks","fox"); all
> >[1] "ants"  "birds" "cats"  "dogs"  "elks"  "fox"
> >some<-c("ants","cats","fox"); some
> >[1] "ants" "cats" "fox"
> >j<-c(
> >   grep(some[1],all,value=F),
> >   grep(some[2],all,value=F),
> >   grep(some[3],all,value=F)); j; all[j]
> >[1] 1 3 6
> >[1] "ants" "cats" "fox"
> >
> >

Re: [R] Grep command

2016-05-19 Thread MacQueen, Don

Start with:

> all <- c("ants","birds","cats","dogs","elks","fox")
> all[grep('ants|cats|fox',all)]
[1] "ants" "cats" "fox"

Then construct the first arg to grep:

> some <- c("ants","cats","fox")
> all[ grep( paste(some,collapse='|') , all)]
[1] "ants" "cats" "fox"
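
Two caveats with the paste(..., collapse='|') construction, sketched below under made-up data: search terms containing regex metacharacters such as "." or "+" are interpreted as operators, and the alternation matches substrings unless it is anchored. The vectors `x`, `terms` and `animals` are hypothetical.

```r
x <- c("a.b", "axb", "c+d", "ccd")
terms <- c("a.b", "c+d")

# Naive alternation: "." and "+" act as regex operators here
naive <- grep(paste(terms, collapse = "|"), x, value = TRUE)

# Literal matching: one fixed = TRUE search per term, combined with OR
literal_hit <- Reduce(`|`, lapply(terms, grepl, x = x, fixed = TRUE))
literal <- x[literal_hit]

# Whole-string matching: wrap the alternation in ^( ... )$
animals <- c("ants", "foxes", "fox")
whole <- grep(paste0("^(", paste(c("ants", "cats", "fox"), collapse = "|"), ")$"),
              animals, value = TRUE)

print(naive)
print(literal)
print(whole)
```

Here the naive pattern picks up "axb" and "ccd" and misses the literal "c+d", which is usually not what is wanted.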



-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 5/19/16, 4:09 PM, "R-help on behalf of Steven Yen"
 wrote:

>What is a good way to grep multiple strings (say in a vector)? In the
>following, I grep ants, cats, and fox separately and concatenate them,
>is there a way to grep the trio in one action? Thanks.
>
>all<-c("ants","birds","cats","dogs","elks","fox"); all
>[1] "ants"  "birds" "cats"  "dogs"  "elks"  "fox"
>some<-c("ants","cats","fox"); some
>[1] "ants" "cats" "fox"
>j<-c(
>   grep(some[1],all,value=F),
>   grep(some[2],all,value=F),
>   grep(some[3],all,value=F)); j; all[j]
>[1] 1 3 6
>[1] "ants" "cats" "fox"
>
>

Re: [R] Grep command

2016-05-19 Thread David Winsemius


> On May 19, 2016, at 4:09 PM, Steven Yen  wrote:
> 
> What is a good way to grep multiple strings (say in a vector)? In the 
> following, I grep ants, cats, and fox separately and concatenate them, 
> is there a way to grep the trio in one action? Thanks.
> 
> all<-c("ants","birds","cats","dogs","elks","fox"); all
> [1] "ants"  "birds" "cats"  "dogs"  "elks"  "fox"
> some<-c("ants","cats","fox"); some
> [1] "ants" "cats" "fox"
> j<-c(
>   grep(some[1],all,value=F),
>   grep(some[2],all,value=F),
>   grep(some[3],all,value=F)); j; all[j]
> [1] 1 3 6
> [1] "ants" "cats" "fox"

j <- grep( paste0( some, collapse="|") , all ); j; all[j]
#--
[1] 1 3 6
[1] "ants" "cats" "fox" 

-- 

David Winsemius
Alameda, CA, USA


Re: [R] Grep command

2016-05-19 Thread Peter Langfelder

I use my own functions multiGrep and multiGrepl:

multiGrep = function(patterns, x, ..., sort = TRUE, invert = FALSE)
{
  if (invert)
  {
    # multiIntersect is a helper not shown in this post; presumably it
    # intersects all elements of a list
    out = multiIntersect(lapply(patterns, grep, x, ..., invert = TRUE))
  } else
    out = unique(unlist(lapply(patterns, grep, x, ..., invert = FALSE)));
  if (sort) out = sort(out);
  out;
}

multiGrepl = function(patterns, x, ...)
{
  mat = do.call(cbind, lapply(patterns, function(p)
                                as.numeric(grepl(p, x, ...))));
  rowSums(mat)>0;
}

> multiGrep(some, all)
[1] 1 3 6

> multiGrepl(some, all)
[1]  TRUE FALSE  TRUE FALSE FALSE  TRUE

> multiGrep(some, all, invert = TRUE)
[1] 2 4 5
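
Since multiIntersect is not shown, here is a self-contained sketch of the same idea that uses base R's Reduce(intersect, ...) in its place. That substitution is my assumption about what the helper does, and the name `multiGrepSketch` is made up; this is not Peter's actual code.

```r
# Sketch only: Reduce(intersect, ...) stands in for the unshown multiIntersect
multiGrepSketch <- function(patterns, x, ..., invert = FALSE) {
  # One grep per pattern; each element holds the matching indices
  idx <- lapply(patterns, grep, x, ..., invert = invert)
  # invert: keep indices that match none of the patterns (intersection of
  # the per-pattern non-matches); otherwise take the union of matches
  out <- if (invert) Reduce(intersect, idx) else unique(unlist(idx))
  sort(out)
}

animals <- c("ants", "birds", "cats", "dogs", "elks", "fox")
some <- c("ants", "cats", "fox")

matched   <- multiGrepSketch(some, animals)
unmatched <- multiGrepSketch(some, animals, invert = TRUE)

print(matched)
print(unmatched)
```

Unlike a single pasted alternation, this keeps each pattern separate, so extra arguments such as fixed = TRUE can be passed through `...` to every grep call.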

Peter


On Thu, May 19, 2016 at 4:09 PM, Steven Yen  wrote:
> What is a good way to grep multiple strings (say in a vector)? In the
> following, I grep ants, cats, and fox separately and concatenate them,
> is there a way to grep the trio in one action? Thanks.
>
> all<-c("ants","birds","cats","dogs","elks","fox"); all
> [1] "ants"  "birds" "cats"  "dogs"  "elks"  "fox"
> some<-c("ants","cats","fox"); some
> [1] "ants" "cats" "fox"
> j<-c(
>grep(some[1],all,value=F),
>grep(some[2],all,value=F),
>grep(some[3],all,value=F)); j; all[j]
> [1] 1 3 6
> [1] "ants" "cats" "fox"
>
>

[R] Grep command

2016-05-19 Thread Steven Yen

What is a good way to grep multiple strings (say in a vector)? In the 
following, I grep ants, cats, and fox separately and concatenate them, 
is there a way to grep the trio in one action? Thanks.

all<-c("ants","birds","cats","dogs","elks","fox"); all
[1] "ants"  "birds" "cats"  "dogs"  "elks"  "fox"
some<-c("ants","cats","fox"); some
[1] "ants" "cats" "fox"
j<-c(
   grep(some[1],all,value=F),
   grep(some[2],all,value=F),
   grep(some[3],all,value=F)); j; all[j]
[1] 1 3 6
[1] "ants" "cats" "fox"



Re: [R] Grep command

2016-05-04 Thread William Dunlap via R-help

No matter how expert you are at writing regular expressions,
it is important to list which sorts of strings you want matched
and which you do not want matched.  Saying you want to match
"age" but not "age2" leads to lots of possibilities.  Saying how
you want to categorize each string in a vector of strings like
the following would narrow things down.
   c("age", "ages ago", "age 60", "An aged man", "page", "Age", "age1",
     "age2",  "dark age", "the aGE")
From such a list, make a good verbal description of the rule you
are thinking of and someone will be able to translate that into a regular
expression (or say that regular expressions cannot do the job).
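
One way to carry out that exercise is to tabulate how a few candidate patterns classify the test vector. The sketch below does this; the four candidate patterns and their labels are my own illustrations, not rules proposed in this thread.

```r
strs <- c("age", "ages ago", "age 60", "An aged man", "page", "Age", "age1",
          "age2", "dark age", "the aGE")

# Candidate rules (illustrative): anywhere, whole string, at the end,
# and as a separate word
candidates <- c(substring = "age", whole = "^age$",
                at_end = "age$", word = "\\bage\\b")

# One logical column per candidate pattern, one row per test string
tab <- sapply(candidates, grepl, x = strs, perl = TRUE)
rownames(tab) <- strs

print(tab)
```

Reading across a row shows which rules accept a given string, which makes it easier to say in words which rule is actually wanted.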


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, May 4, 2016 at 9:59 AM, David Winsemius 
wrote:

>
> > On May 3, 2016, at 11:16 PM, Jeff Newmiller 
> wrote:
> >
> > Yes, but the answer is likely to depend on the actual patterns of
> > strings in your real data, so the sooner you go find a book or tutorial on
> > regular expressions the better.  This is decidedly not R specific and there
> > are already lots of resources out there.
> >
> > Given the example you provide,  the pattern "age$" should work. However,
> > that is probably not sufficiently selective for a practical data set so
> > start learning to fish (design regex patterns) yourself.
>
> @ Steven;
>
> As is almost always the case I agree with Jeff. I found that reading Rhelp
> and attempting to answer regex-questions was the best method to learn them.
> In particular I found the postings by Gabor Grothendieck very helpful in
> getting some degree of competence in this area. I see that his grep-related
> postings still exceed my grep postings and I assure you that his will be
> more sophisticated than my efforts. I recommend the MarkMail Rhelp mirror
> interface as very useful in "mining" Rhelp for knowledge:
>
> Gabor Grothendieck answers with either 'grep' or 'regex' in their body:
>
>
> http://markmail.org/search/?q=list%3Aorg.r-project.r-help+list%3Agrep+list%3Aregex+from%3A%22Gabor+Grothendieck
>
> --
> Happy searching;
> David.
>
>
> > --
> > Sent from my phone. Please excuse my brevity.
> >
> > On May 3, 2016 10:45:42 PM PDT, Steven Yen  wrote:
> >> Dear all
> >> In the grep command below, is there a way to identify only "age" and
> >> not "age2"? In other words, I would like to grep "age" and "age2"
> >> separately, one at a time. Thanks.
> >>
> >> x<-c("abc","def","rst","xyz","age","age2")
> >> x
> >>
> >> [1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"
> >>
> >> grep("age2",x)
> >>
> >> [1] 6
> >>
> >> grep("age",x) # I need to grab "age" only, not "age2"
> >>
> >> [1] 5 6
> >>
>
> David Winsemius
> Alameda, CA, USA
>

Re: [R] Grep command

2016-05-04 Thread David Winsemius

> On May 3, 2016, at 11:16 PM, Jeff Newmiller  wrote:
> 
> Yes, but the answer is likely to depend on the actual patterns of strings in 
> your real data, so the sooner you go find a book or tutorial on regular 
> expressions the better.  This is decidedly not R specific and there are 
> already lots of resources out there.
> 
> Given the example you provide,  the pattern "age$" should work. However, that 
> is probably not sufficiently selective for a practical data set so start 
> learning to fish (design regex patterns) yourself. 

@ Steven;

As is almost always the case I agree with Jeff. I found that reading Rhelp and 
attempting to answer regex-questions was the best method to learn them. In 
particular I found the postings by Gabor Grothendieck very helpful in getting 
some degree of competence in this area. I see that his grep-related postings 
still exceed my grep postings and I assure you that his will be more 
sophisticated than my efforts. I recommend the MarkMail Rhelp mirror interface 
as very useful in "mining" Rhelp for knowledge:

Gabor Grothendieck answers with either 'grep' or 'regex' in their body:

http://markmail.org/search/?q=list%3Aorg.r-project.r-help+list%3Agrep+list%3Aregex+from%3A%22Gabor+Grothendieck

-- 
Happy searching;
David.

> -- 
> Sent from my phone. Please excuse my brevity.
> 
> On May 3, 2016 10:45:42 PM PDT, Steven Yen  wrote:
>> Dear all
>> In the grep command below, is there a way to identify only "age" and
>> not "age2"? In other words, I would like to grep "age" and "age2"
>> separately, one at a time. Thanks.
>> 
>> x<-c("abc","def","rst","xyz","age","age2")
>> x
>> 
>> [1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"
>> 
>> grep("age2",x)
>> 
>> [1] 6
>> 
>> grep("age",x) # I need to grab "age" only, not "age2"
>> 
>> [1] 5 6
>> 

David Winsemius
Alameda, CA, USA


Re: [R] Grep command

2016-05-04 Thread Doran, Harold

You asked this question yesterday and received responses to it. Is there
a reason it has been reposted?

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Steven Yen
Sent: Wednesday, May 04, 2016 1:46 AM
To: r-help <r-help@r-project.org>
Subject: [R] Grep command

Dear all
In the grep command below, is there a way to identify only "age" and not 
"age2"? In other words, I would like to grep "age" and "age2"
separately, one at a time. Thanks.

x<-c("abc","def","rst","xyz","age","age2")
x

[1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"

grep("age2",x)

[1] 6

grep("age",x) # I need to grab "age" only, not "age2"

[1] 5 6


Re: [R] Grep command

2016-05-04 Thread Niels Jespersen

> x <- c("abc","def","rst","xyz","age","age2")
> grep("^age$", x)
[1] 5
> grep("^age2$", x)
[1] 6
> 
>

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Steven Yen
Sent: 4 May 2016 07:46
To: r-help
Subject: [R] Grep command

Dear all
In the grep command below, is there a way to identify only "age" and not 
"age2"? In other words, I would like to grep "age" and "age2"
separately, one at a time. Thanks.

x<-c("abc","def","rst","xyz","age","age2")
x

[1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"

grep("age2",x)

[1] 6

grep("age",x) # I need to grab "age" only, not "age2"

[1] 5 6


Re: [R] Grep command

2016-05-04 Thread Jeff Newmiller

Yes, but the answer is likely to depend on the actual patterns of strings in 
your real data, so the sooner you go find a book or tutorial on regular 
expressions the better.  This is decidedly not R specific and there are already 
lots of resources out there.

Given the example you provide,  the pattern "age$" should work. However, that 
is probably not sufficiently selective for a practical data set so start 
learning to fish (design regex patterns) yourself. 
-- 
Sent from my phone. Please excuse my brevity.

On May 3, 2016 10:45:42 PM PDT, Steven Yen  wrote:
>Dear all
>In the grep command below, is there a way to identify only "age" and
>not "age2"? In other words, I would like to grep "age" and "age2"
>separately, one at a time. Thanks.
>
>x<-c("abc","def","rst","xyz","age","age2")
>x
>
>[1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"
>
>grep("age2",x)
>
>[1] 6
>
>grep("age",x) # I need to grab "age" only, not "age2"
>
>[1] 5 6
>

Re: [R] Grep command

2016-05-04 Thread Omar André Gonzáles Díaz

Hi Steven,

grep uses regular expressions, so you can use this:

- grep("age$",x): it says: match "a", then "g", then "e", and then the end of
the string.  The "$" means the string must end right there, so "age2" is not
matched.

> grep("age$",x)
[1] 5
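
A short sketch of the difference between anchoring only the end ("age$") and anchoring both ends ("^age$"). The extra strings "page" and "dark age" in `y` are made up to show where the two patterns diverge.

```r
x <- c("abc", "def", "rst", "xyz", "age", "age2")

end_only  <- grep("age$", x)    # "age" at the end of the string
both_ends <- grep("^age$", x)   # the whole string is exactly "age"

# With more varied data the two patterns no longer agree
y <- c(x, "page", "dark age")
end_only_y  <- grep("age$", y, value = TRUE)
both_ends_y <- grep("^age$", y, value = TRUE)

print(end_only_y)
print(both_ends_y)
```

On the posted vector both patterns return index 5; the difference only shows up once strings that merely end in "age" are present.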

2016-05-04 1:02 GMT-05:00 Jim Lemon :

> Hi Steven,
> If this is just a one-off, you could do this:
>
> grepl("age",x) & nchar(x)<4
>
> returning a logical vector containing TRUE for "age" but not "age2"
>
> Jim
>
>
> On Wed, May 4, 2016 at 3:45 PM, Steven Yen  wrote:
> > Dear all
> > In the grep command below, is there a way to identify only "age" and
> > not "age2"? In other words, I would like to grep "age" and "age2"
> > separately, one at a time. Thanks.
> >
> > x<-c("abc","def","rst","xyz","age","age2")
> > x
> >
> > [1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"
> >
> > grep("age2",x)
> >
> > [1] 6
> >
> > grep("age",x) # I need to grab "age" only, not "age2"
> >
> > [1] 5 6
> >

Re: [R] Grep command

2016-05-04 Thread Jim Lemon

Hi Steven,
If this is just a one-off, you could do this:

grepl("age",x) & nchar(x)<4

returning a logical vector containing TRUE for "age" but not "age2"
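
A quick check of the expression above on the posted data. A string that contains "age" and has fewer than four characters can only be "age" itself, so for this vector the result coincides with a plain equality test; the names `by_length` and `by_equality` are illustrative.

```r
x <- c("abc", "def", "rst", "xyz", "age", "age2")

# Contains "age" AND is shorter than four characters
by_length <- grepl("age", x) & nchar(x) < 4

# Plain equality gives the same logical vector on this data
by_equality <- x == "age"

# which() converts the logical vector to positions, like grep()
pos <- which(by_length)

print(by_length)
print(pos)
```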

Jim


On Wed, May 4, 2016 at 3:45 PM, Steven Yen  wrote:
> Dear all
> In the grep command below, is there a way to identify only "age" and
> not "age2"? In other words, I would like to grep "age" and "age2"
> separately, one at a time. Thanks.
>
> x<-c("abc","def","rst","xyz","age","age2")
> x
>
> [1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"
>
> grep("age2",x)
>
> [1] 6
>
> grep("age",x) # I need to grab "age" only, not "age2"
>
> [1] 5 6
>

[R] Grep command

2016-05-03 Thread Steven Yen

Dear all
In the grep command below, is there a way to identify only "age" and
not "age2"? In other words, I would like to grep "age" and "age2"
separately, one at a time. Thanks.

x<-c("abc","def","rst","xyz","age","age2")
x

[1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"

grep("age2",x)

[1] 6

grep("age",x) # I need to grab "age" only, not "age2"

[1] 5 6


Re: [R] grep command

2016-05-03 Thread Hervé Pagès


On 05/03/2016 06:05 AM, Jeff Newmiller wrote:
> Isn't that just an inefficient way to do
>
> "age" == x
>
> ?

Yep, it's an inefficient way to do which(x == "age").

H.
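
A sketch comparing the two forms, with an NA appended to the posted vector (my addition) to show one place where they behave differently when used for subsetting.

```r
x <- c("abc", "def", "rst", "xyz", "age", "age2", NA)

i_grep  <- grep("^age$", x)    # positions whose whole string is "age"
i_which <- which(x == "age")   # same positions; which() drops the NA

# Subsetting with the raw logical vector keeps an NA element,
# whereas subsetting by position does not
by_logical  <- x[x == "age"]
by_position <- x[i_which]

print(i_grep)
print(by_logical)
print(by_position)
```

For exact matches the equality test also avoids any surprises from regex metacharacters in the search string.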



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319


Re: [R] grep command

2016-05-03 Thread Jeff Newmiller

Isn't that just an inefficient way to do

"age" == x

?
-- 
Sent from my phone. Please excuse my brevity.

On May 3, 2016 3:57:05 AM PDT, Ivan Calandra  
wrote:
>What about?
>
>grep("^age$", x)
>
>Ivan
>
>--
>Ivan Calandra, PhD
>Scientific Mediator
>University of Reims Champagne-Ardenne
>GEGENAA - EA 3795
>CREA - 2 esplanade Roland Garros
>51100 Reims, France
>+33(0)3 26 77 36 89
>ivan.calan...@univ-reims.fr
>--
>https://www.researchgate.net/profile/Ivan_Calandra
>https://publons.com/author/705639/
>
>Le 03/05/2016 à 12:38, Steven Yen a écrit :
>> Dear all
>> In the grep command below, is there a way to identify only "age" and
>> not "age2"? Thanks.
>>
>>> x<-c("abc","def","rst","xyz","age","age2")
>>> x
>> [1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"
>>> grep("age2",x)
>> [1] 6
>>> grep("age",x) # I need to grab "age" only, not "age2"
>> [1] 5 6
>>
>> Also, I post message to r-help@r-project.org and that's subject to
>> approval by the list moderator. Am I sending it to the wrong address?
>>

Re: [R] grep command

2016-05-03 Thread Ivan Calandra

Oh, and regarding the moderator approval, I guess it's because you're a 
new user to the list.


Ivan

--
Ivan Calandra, PhD
Scientific Mediator
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/

Le 03/05/2016 à 12:38, Steven Yen a écrit :

Dear all
In the grep command below, is there a way to identify only "age" and
not "age2"? Thanks.


x<-c("abc","def","rst","xyz","age","age2")
x

[1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"

grep("age2",x)

[1] 6

grep("age",x) # I need to grab "age" only, not "age2"

[1] 5 6

Also, I post message to r-help@r-project.org and that's subject to
approval by the list moderator. Am I sending it to the wrong address?


Re: [R] grep command

2016-05-03 Thread Ivan Calandra


What about?

grep("^age$", x)

Ivan

--
Ivan Calandra, PhD
Scientific Mediator
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/

Le 03/05/2016 à 12:38, Steven Yen a écrit :

Dear all
In the grep command below, is there a way to identify only "age" and
not "age2"? Thanks.


x<-c("abc","def","rst","xyz","age","age2")
x

[1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"

grep("age2",x)

[1] 6

grep("age",x) # I need to grab "age" only, not "age2"

[1] 5 6

Also, I post message to r-help@r-project.org and that's subject to
approval by the list moderator. Am I sending it to the wrong address?


[R] grep command

2016-05-03 Thread Steven Yen

Dear all
In the grep command below, is there a way to identify only "age" and
not "age2"? Thanks.

> x<-c("abc","def","rst","xyz","age","age2")
> x
[1] "abc"  "def"  "rst"  "xyz"  "age"  "age2"
> grep("age2",x)
[1] 6
> grep("age",x) # I need to grab "age" only, not "age2"
[1] 5 6

Also, I posted this message to r-help@r-project.org and it was held for
approval by the list moderator. Am I sending it to the wrong address?


Re: [R] Grep Help

2016-02-22 Thread Boris Steipe

I see numerous backticks in your code, not quotes. "`" and "'" are not the 
same. Backticks are not string delimiters.
As for valid names: look at the help page for make.names().
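
A small sketch of how the "X" prefix arises and how non-syntactic names can be used. The inline CSV text is a made-up stand-in for the gift-card file, and the variable names are illustrative.

```r
# Header row of purely numeric column labels (hypothetical stand-in data)
csv_text <- "2006,2007\n146.20,156.24"

# By default read.csv() passes the header through make.names()
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the header exactly as written
kept_names <- names(read.csv(text = csv_text, check.names = FALSE))

# make.names() on its own shows the conversion rule
syntactic <- make.names(c("2006", "my var"))

# Non-syntactic names need backticks with $, or a string with [[ ]]
df <- read.csv(text = csv_text, check.names = FALSE)
same <- identical(df$`2006`, df[["2006"]])

print(default_names)
print(kept_names)
print(syntactic)
```

A name such as 2006 is not syntactically valid in R, which is why df2$2006 fails to parse while the backtick and [[ ]] forms work.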


HTH,
Boris



On Feb 22, 2016, at 1:32 PM, Burhan ul haq  wrote:

> Hi,
> 
> # 1) I have read in a CSV file
> 
> df = read.csv(file="GiftCards - v1.csv",stringsAsFactors=FALSE)
> head(df)
> str(df)
> 
> # 2) converted to a tbl_df
> df2 = tbl_df(df)
> 
> # 3) fixed the names to remove leading "X" character
> n = names(df2)
> n2 = gsub(pattern="^\\w","\\1",n)
> names(df2) = n2
> 
> # 4) somehow the col names are character strings, requiring me to use
> quotes:
> df2$`2006` instead of df2$2006 # ---> PROBLEM 1
> 
> 
> # 5) I need to remove the leading $ sign followed by spaces to extract
> values. The problem is # it could be a two or three digit number. I am able
> to retrieve two digits correctly, but miss # out on the leading third digit.
> df2$`2006`= gsub("^(.+)([0-9]{2,3}\\.[0-9]{2})","\\2",df2$`2006`) # -->
> Problem 2
> 
> # 6) dump for the data frame
> 
> df2 <-
> structure(list(`2006` = structure(c(3L, 2L, 1L), .Label = c("$
> 24.81",
> "$ 39.16", "$   146.20"), class = "factor"), `2007` = structure(c(3L,
> 2L, 1L), .Label = c("$   26.25", "$ 41.95", "$   156.24"
> ), class = "factor"), `2008` = structure(c(3L, 2L, 1L), .Label = c("$
> 24.92",
> "$ 40.54", "$   147.33"), class = "factor"), `2009` = structure(c(3L,
> 2L, 1L), .Label = c("$   23.63", "$ 39.80", "$   139.91"
> ), class = "factor"), `2010` = structure(c(3L, 2L, 1L), .Label = c("$
> 24.78",
> "$ 41.48", "$   145.61"), class = "factor"), `2011` = structure(c(3L,
> 2L, 1L), .Label = c("$   27.80", "$ 43.23", "$   155.43"
> ), class = "factor"), `2012` = structure(c(3L, 2L, 1L), .Label = c("$
> 28.79",
> "$ 43.75", "$   156.86"), class = "factor"), `2013` = structure(c(3L,
> 2L, 1L), .Label = c("$   29.80", "$ 45.16", "$   163.16"
> ), class = "factor")), .Names = c("2006", "2007", "2008", "2009",
> "2010", "2011", "2012", "2013"), class = c("tbl_df", "tbl", "data.frame"
> ), row.names = c(NA, -3L))
> 
> 
> 
> Thanks for the help
> 
> 
> Br /
> 


[R] Grep Help

2016-02-22 Thread Burhan ul haq

Hi,

# 1) I have read in a CSV file

df = read.csv(file="GiftCards - v1.csv",stringsAsFactors=FALSE)
head(df)
str(df)

# 2) converted to a tbl_df
df2 = tbl_df(df)

# 3) fixed the names to remove leading "X" character
n = names(df2)
n2 = gsub(pattern="^\\w","\\1",n)
names(df2) = n2

# 4) somehow the col names are character strings, requiring me to use
# quotes:
df2$`2006` instead of df2$2006 # ---> PROBLEM 1


# 5) I need to remove the leading $ sign followed by spaces to extract
# values. The problem is it could be a two or three digit number. I am able
# to retrieve two digits correctly, but miss out on the leading third digit.
df2$`2006`= gsub("^(.+)([0-9]{2,3}\\.[0-9]{2})","\\2",df2$`2006`) # --> Problem 2
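
A possible explanation for Problem 2, offered as a sketch rather than a definitive diagnosis: the leading (.+) is greedy, so it takes as much as it can and leaves only the minimum two digits for [0-9]{2,3}. Removing the dollar sign and the spaces directly avoids the issue. The vector x below is a hypothetical stand-in for one column. The columns in the dump are factors, so as.character() would be needed on them first.

```r
# Hypothetical stand-in for one column of dollar amounts
x <- c("$   24.81", "$ 39.16", "$   146.20")

# The original pattern: the greedy (.+) also eats the first of three digits
greedy <- gsub("^(.+)([0-9]{2,3}\\.[0-9]{2})", "\\2", x)
print(greedy)

# Strip a leading "$" plus any following whitespace, then convert to numeric
values <- as.numeric(sub("^\\$\\s*", "", x))
print(values)
```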

# 6) dump for the data frame

df2 <-
structure(list(`2006` = structure(c(3L, 2L, 1L), .Label = c("$
24.81",
"$ 39.16", "$   146.20"), class = "factor"), `2007` = structure(c(3L,
2L, 1L), .Label = c("$   26.25", "$ 41.95", "$   156.24"
), class = "factor"), `2008` = structure(c(3L, 2L, 1L), .Label = c("$
24.92",
"$ 40.54", "$   147.33"), class = "factor"), `2009` = structure(c(3L,
2L, 1L), .Label = c("$   23.63", "$ 39.80", "$   139.91"
), class = "factor"), `2010` = structure(c(3L, 2L, 1L), .Label = c("$
24.78",
"$ 41.48", "$   145.61"), class = "factor"), `2011` = structure(c(3L,
2L, 1L), .Label = c("$   27.80", "$ 43.23", "$   155.43"
), class = "factor"), `2012` = structure(c(3L, 2L, 1L), .Label = c("$
28.79",
"$ 43.75", "$   156.86"), class = "factor"), `2013` = structure(c(3L,
2L, 1L), .Label = c("$   29.80", "$ 45.16", "$   163.16"
), class = "factor")), .Names = c("2006", "2007", "2008", "2009",
"2010", "2011", "2012", "2013"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L))



Thanks for the help


Br /


Re: [R] grep/regexp

2015-12-01 Thread David Winsemius

> On Dec 1, 2015, at 2:47 PM, Ravi Varadhan  wrote:
> 
> Hi,
> I would appreciate some help with using grep().  I have a bunch of variables 
> in a data frame and I would like to select some of them using grep.  Here is 
> an example of what I am trying to do:
> 
> vars <- c("Fr_I_total", "Fr_I_percent_of_CD4", 
> "Ki.67_in_Fr_I_percent_of_Fr_I", "Fr_II_percent_of_CD4", 
> "Ki.67_in_Fr_II_percent_of_Fr_II")
> 
> From the above vector, I would like to select those variables beginning with 
> `Fr' and containing `percent' in them.   In other words, I would like to get 
> the variables "Fr_I_percent_of_CD4" and "Fr_II_percent_of_CD4".
> 
> How can I use grep() to do this?

> grep("^Fr.*percent", vars, value=TRUE)
[1] "Fr_I_percent_of_CD4"  "Fr_II_percent_of_CD4"
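
For readers who want to run the answer above, here is a self-contained sketch using the vector from the question. The names one_step and two_step are only illustrative. The second form splits the single pattern into two simpler tests, which some people find easier to read.

```r
vars <- c("Fr_I_total", "Fr_I_percent_of_CD4",
          "Ki.67_in_Fr_I_percent_of_Fr_I", "Fr_II_percent_of_CD4",
          "Ki.67_in_Fr_II_percent_of_Fr_II")

# One pattern: "^Fr" anchors at the start, ".*" allows anything in between
one_step <- grep("^Fr.*percent", vars, value = TRUE)

# Two simpler tests combined with &
two_step <- vars[grepl("^Fr", vars) & grepl("percent", vars, fixed = TRUE)]

print(one_step)
print(identical(one_step, two_step))
```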

> More generally, are there any good online resources with examples like this 
> for the use of grep() and regexp() in R?  I didn't find the help pages for 
> these very user-friendly.
> 

There are several interactive regex websites where you can get commented and 
tested solutions. They have the disadvantage that they don’t use the extra 
backslashes that R requires since both R and regex use backslashes as escape 
characters.
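
A short sketch of the backslash point, using a made-up pattern (pat) and test strings (x) that are not from the thread: a pattern copied from a regex website needs each backslash doubled when typed as an R string literal.

```r
# What you would type on a regex-testing website: Ki\.\d+
# What you type in an R string literal: each backslash is doubled
pat <- "Ki\\.\\d+"

writeLines(pat)     # shows the pattern as the regex engine receives it
print(nchar(pat))   # each doubled backslash is stored as one character

x <- c("Ki.67_in_Fr_I", "Kix67", "Fr_I_total")
print(grepl(pat, x))
```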

I learned regex from R's ?regex page and by watching Gabor Grothendieck’s 
postings to Rhelp. (I’ve probably gone over the ?regex page twenty or thirty times.)

These days there are a great many regex questions and answers on Stack Overflow, 
many of them again written by Maestro Grothendieck.

— 

David Winsemius
Alameda, CA, USA


[R] grep/regexp

2015-12-01 Thread Ravi Varadhan

Hi,
I would appreciate some help with using grep().  I have a bunch of variables in 
a data frame and I would like to select some of them using grep.  Here is an 
example of what I am trying to do:

vars <- c("Fr_I_total", "Fr_I_percent_of_CD4", "Ki.67_in_Fr_I_percent_of_Fr_I", 
"Fr_II_percent_of_CD4", "Ki.67_in_Fr_II_percent_of_Fr_II")

From the above vector, I would like to select those variables beginning with 
`Fr' and containing `percent' in them.   In other words, I would like to get 
the variables "Fr_I_percent_of_CD4" and "Fr_II_percent_of_CD4".

How can I use grep() to do this?

More generally, are there any good online resources with examples like this for 
the use of grep() and regexp() in R?  I didn't find the help pages for these 
very user-friendly.

Thank you very much,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor,  Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619



[R] Grep out columns using a list of strings

2015-05-08 Thread Kate Ignatius

Hi,

I have a list of 150 strings, say, ap:

aajkss
dfghjk
sdfghk
...
xxcvvn


And I would like to grep out these strings from column names in
another file, af.   I've tried the following but none seem to work:

aps <- af[,grep(ap, colnames(af), value=TRUE)]
aps <- af[,grep(ap, colnames(af), value=FIXED)]
aps <- af[,grep(as.character(list(ap),colnames(af))]

and also aps <- unique (grep(ap, colnames(af))

Is there another way I can do this - maybe without using grep?

Thanks!

Kate.


Re: [R] Grep out columns using a list of strings

2015-05-08 Thread Boris Steipe

How about %in% ?


# preparing something that looks like what I think your data looks like:
ap <- c("aajkss", "dfghjk", "sdfghk", "xxcvvn")
af <- matrix(1:10, nrow=2)
colnames(af) <- c("aajkss", "b", "c", "dfghjk", "e")

# doing what I think you need done:
ap[ap %in% colnames(af)]
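
The expression above returns the matching names. If the goal is the columns themselves, one possible extension (a sketch on the same toy data; the variable names keep, aps and by_substring are illustrative) is to use the logical vector as a column index, with drop = FALSE so that a single match still gives a matrix. The last lines show a substring-based alternative, assuming the strings contain no regex metacharacters.

```r
ap <- c("aajkss", "dfghjk", "sdfghk", "xxcvvn")
af <- matrix(1:10, nrow = 2)
colnames(af) <- c("aajkss", "b", "c", "dfghjk", "e")

# Exact-name matching: TRUE for every column whose name is listed in ap
keep <- colnames(af) %in% ap
aps <- af[, keep, drop = FALSE]
print(aps)

# Substring matching instead: join the strings into one alternation pattern
by_substring <- grep(paste(ap, collapse = "|"), colnames(af), value = TRUE)
print(by_substring)
```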


Cheers,
B.

(PS. a reproducible example saves us all time and unnecessary effort. :-)





On May 8, 2015, at 3:50 PM, Kate Ignatius kate.ignat...@gmail.com wrote:

 Hi,
 
 I have a list of 150 strings, say, ap,:
 
 aajkss
 dfghjk
 sdfghk
 ...
 xxcvvn
 
 
 And I would like to grep out these strings from column names in
 another file, af.   I've tried the following but none seem to work:
 
 aps <- af[,grep(ap, colnames(af), value=TRUE)]
 aps <- af[,grep(ap, colnames(af), value=FIXED)]
 aps <- af[,grep(as.character(list(ap),colnames(af))]
 
 and also aps <- unique (grep(ap, colnames(af))
 
 Is there another way I can do this - maybe without using grep?
 
 Thanks!
 
 Kate.
 


[R] grep won't work finding one column

2014-10-14 Thread Kate Ignatius

I'm having an issue with grep:

I have numerous columns that end with .at... when I use grep like so:

df[,grep(".at",colnames(df))]

it works fine.  When I have one column that ends with .at, it does not
work.  Why is that?  As this is a loop with a varying number of columns
ending in .at I would like some code that would work with 1 to n
number of columns.

Is there something more optimal than grep?

Thanks!


Re: [R] grep won't work finding one column

2014-10-14 Thread John McKown

On Tue, Oct 14, 2014 at 9:23 AM, Kate Ignatius kate.ignat...@gmail.com wrote:
 I'm having an issue with grep:

 I have numerous columns that end with .at... when I use grep like so:

 df[,grep(".at",colnames(df))]

 it works fine.  When I have one column that ends with .at, it does not
 work.  Why is that?  As this is loop with varying number of columns
 ending in .at I would like some code that would work with 1 to n
 number of columns.

 Is there something more optimal than grep?

 Thanks!

I can't answer your direct question. But do you realize that your code
does not match your words? The grep shown does not _only_ match columns
whose names end with the characters '.at'. It matches all column names
which contain any character followed by the characters "at". To do the
match with only columns whose names end with the characters ".at", you
need: grep("\.at$",colnames(df)).

You might want to post an example which fails. Just to be complete, be
sure to use the dput() function so that it is easy for members of the
group to cut'n'paste to get your data into our own R workspace.

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown


Re: [R] grep won't work finding one column

2014-10-14 Thread Kate Ignatius

For example,

DF will usually have numerous columns with sample1.at sample1.dp
sample1.fg sample2.at sample2.dp sample2.fg and so on

I'm running this code in R as part of a shell script which runs over
several different file sizes so sometimes it will come across a file
with one sample in it: i.e. sample1: when the R code runs through this
file... trying to grep out  the sample1.at column does not work and
it will halt and stop.

Here is some sample data... say I want to get out the AT_ only column


Sample_1 AT_1
A/A RR
G/G AA
T/T AA
G/A RA
G/G RR
C/C AA
C/C AA
C/T RA
A/A AA
T/G RA

it will have a problem grepping out this single column.

On Tue, Oct 14, 2014 at 10:38 AM, John McKown
john.archie.mck...@gmail.com wrote:
 On Tue, Oct 14, 2014 at 9:23 AM, Kate Ignatius kate.ignat...@gmail.com 
 wrote:
 I'm having an issue with grep:

 I have numerous columns that end with .at... when I use grep like so:

 df[,grep(".at",colnames(df))]

 it works fine.  When I have one column that ends with .at, it does not
 work.  Why is that?  As this is loop with varying number of columns
 ending in .at I would like some code that would work with 1 to n
 number of columns.

 Is there something more optimal than grep?

 Thanks!

 I can't answer your direct question. But do you realize that your code
 does not match your words? The grep show does not _only_ match columns
 who name end with the characters '.at'. It matches all column names
 which contain any character followed by the characters at. To do the
 match with only columns whose names end with the characters .at, you
 need: grep("\.at$",colnames(df)).

 You might want to post an example which fails. Just to be complete, be
 sure to use the dput() function so that it is easy for members of the
 group to cut'n'paste to get your data into our own R workspace.

 --
 There is nothing more pleasant than traveling and meeting new people!
 Genghis Khan

 Maranatha! 
 John McKown


Re: [R] grep won't work finding one column

2014-10-14 Thread Jeff Newmiller

Your question is missing a reproducible example, and you don't say how it does 
not work, so we cannot tell what is going on.

Two things do come to mind, though.

A) Data frame subsets with only one column by default return a vector, which is 
a different type of object than a single-column data frame. You would need to 
read ?"[.data.frame" about the "drop" argument if you wanted to consistently 
get a data frame from this expression.

B) The period is a wildcard in regular expressions. If you expect to limit your 
search to literal ".at" at the end of the name then you should use the search 
pattern "\\.at$" instead (the first backslash allows the second one to be stored 
by R in the string, and the second one is the only one seen by grep, which it 
reads as making the period not act like a wildcard). You really should read 
about regular expressions before using them. There are many tutorials on the 
web about this topic.
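
Both points can be checked on a toy data frame. The following is a minimal sketch. The data frame "one" and its column names are invented to resemble the poster's description and are not her actual data.

```r
# Toy data frame with a single ".at" column (names are illustrative)
one <- data.frame(sample1.at = 1:3, sample1.dp = 4:6, sample1.fg = 7:9)

# B) an unescaped "." matches any character, so ".at" also hits "format"
loose <- grepl(".at", c("sample1.at", "format"))
strict <- grepl("\\.at$", c("sample1.at", "format"))
print(loose)
print(strict)

# A) with one matching column, [ , j] drops to a vector by default
cols <- grep("\\.at$", colnames(one))
as_vector <- one[, cols]
as_frame <- one[, cols, drop = FALSE]
print(class(as_vector))
print(class(as_frame))
```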

---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On October 14, 2014 7:23:55 AM PDT, Kate Ignatius kate.ignat...@gmail.com 
wrote:
I'm having an issue with grep:

I have numerous columns that end with .at... when I use grep like so:

df[,grep(".at",colnames(df))]

it works fine.  When I have one column that ends with .at, it does not
work.  Why is that?  As this is loop with varying number of columns
ending in .at I would like some code that would work with 1 to n
number of columns.

Is there something more optimal than grep?

Thanks!


Re: [R] grep won't work finding one column

2014-10-14 Thread Ivan Calandra


Shouldn't it be
grep("\\.at$",colnames(df))
with double back slash?

Ivan

--
Ivan Calandra
University of Reims Champagne-Ardenne
GEGENA² - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra

Le 14/10/14 16:38, John McKown a écrit :

On Tue, Oct 14, 2014 at 9:23 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

I'm having an issue with grep:

I have numerous columns that end with .at... when I use grep like so:

df[,grep(".at",colnames(df))]

it works fine.  When I have one column that ends with .at, it does not
work.  Why is that?  As this is loop with varying number of columns
ending in .at I would like some code that would work with 1 to n
number of columns.

Is there something more optimal than grep?

Thanks!

I can't answer your direct question. But do you realize that your code
does not match your words? The grep show does not _only_ match columns
who name end with the characters '.at'. It matches all column names
which contain any character followed by the characters at. To do the
match with only columns whose names end with the characters .at, you
need: grep("\.at$",colnames(df)).

You might want to post an example which fails. Just to be complete, be
sure to use the dput() function so that it is easy for members of the
group to cut'n'paste to get your data into our own R workspace.




Re: [R] grep won't work finding one column

2014-10-14 Thread John McKown

"AT" and "at" are not the same. If you want a case-insensitive compare
for the characters "at" you need the ignore.case=TRUE added. E.g.:

df[,grep(".at",colnames(df),ignore.case=TRUE)]

That should match the column name you gave. Which does not match your
initial description which said "ending with .at". That has an embedded
"AT". So I am still a bit confused about your needs.
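
One caveat worth checking before relying on this, shown as a small sketch with the two column names from the sample data (nms, dot_hits and prefix_hits are illustrative names): the pattern ".at" requires some character in front of "at", so a name that begins with "AT" would not match even with ignore.case=TRUE. A pattern anchored on the prefix would.

```r
nms <- c("Sample_1", "AT_1")

# ".at" needs some character before "at", so a name starting with "AT" fails
dot_hits <- grep(".at", nms, ignore.case = TRUE, value = TRUE)
print(dot_hits)

# Anchoring on the prefix instead picks up "AT_1" regardless of case
prefix_hits <- grep("^at_", nms, ignore.case = TRUE, value = TRUE)
print(prefix_hits)
```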

On Tue, Oct 14, 2014 at 9:55 AM, Kate Ignatius kate.ignat...@gmail.com wrote:
 For example,

 DF will usually have numerous columns with sample1.at sample1.dp
 sample1.fg sample2.at sample2.dp sample2.fg and so on

 I'm running this code in R as part of a shell script which runs over
 several different file sizes so sometimes it will come across a file
 with one sample in it: i.e. sample1: when the R code runs through this
 file... trying to grep out  the sample1.at column does not work and
 it will halt and stop.

 Here is some sample data... say I want to get out the AT_ only column


 Sample_1 AT_1
 A/A RR
 G/G AA
 T/T AA
 G/A RA
 G/G RR
 C/C AA
 C/C AA
 C/T RA
 A/A AA
 T/G RA

 it will have a problem grepping out this single column.

 On Tue, Oct 14, 2014 at 10:38 AM, John McKown
 john.archie.mck...@gmail.com wrote:
 On Tue, Oct 14, 2014 at 9:23 AM, Kate Ignatius kate.ignat...@gmail.com 
 wrote:
 I'm having an issue with grep:

 I have numerous columns that end with .at... when I use grep like so:

 df[,grep(".at",colnames(df))]

 it works fine.  When I have one column that ends with .at, it does not
 work.  Why is that?  As this is loop with varying number of columns
 ending in .at I would like some code that would work with 1 to n
 number of columns.

 Is there something more optimal than grep?

 Thanks!

 I can't answer your direct question. But do you realize that your code
 does not match your words? The grep show does not _only_ match columns
 who name end with the characters '.at'. It matches all column names
 which contain any character followed by the characters at. To do the
 match with only columns whose names end with the characters .at, you
 need: grep("\.at$",colnames(df)).

 You might want to post an example which fails. Just to be complete, be
 sure to use the dput() function so that it is easy for members of the
 group to cut'n'paste to get your data into our own R workspace.

 --
 There is nothing more pleasant than traveling and meeting new people!
 Genghis Khan

 Maranatha! 
 John McKown



-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown


Re: [R] grep won't work finding one column

2014-10-14 Thread John McKown

You're right. I don't use regexps in R very much. In most other
languages, a single \ is needed. The R parser is different and I
forgot. Thanks for the heads up.

On Tue, Oct 14, 2014 at 10:01 AM, Ivan Calandra
ivan.calan...@univ-reims.fr wrote:
 Shouldn't it be
 grep("\\.at$",colnames(df))
 with double back slash?

 Ivan

 --
 Ivan Calandra
 University of Reims Champagne-Ardenne
 GEGENA² - EA 3795
 CREA - 2 esplanade Roland Garros
 51100 Reims, France
 +33(0)3 26 77 36 89
 ivan.calan...@univ-reims.fr
 https://www.researchgate.net/profile/Ivan_Calandra

 Le 14/10/14 16:38, John McKown a écrit :

 On Tue, Oct 14, 2014 at 9:23 AM, Kate Ignatius kate.ignat...@gmail.com
 wrote:

 I'm having an issue with grep:

 I have numerous columns that end with .at... when I use grep like so:

 df[,grep(".at",colnames(df))]

 it works fine.  When I have one column that ends with .at, it does not
 work.  Why is that?  As this is loop with varying number of columns
 ending in .at I would like some code that would work with 1 to n
 number of columns.

 Is there something more optimal than grep?

 Thanks!

 I can't answer your direct question. But do you realize that your code
 does not match your words? The grep show does not _only_ match columns
 who name end with the characters '.at'. It matches all column names
 which contain any character followed by the characters at. To do the
 match with only columns whose names end with the characters .at, you
 need: grep("\.at$",colnames(df)).

 You might want to post an example which fails. Just to be complete, be
 sure to use the dput() function so that it is easy for members of the
 group to cut'n'paste to get your data into our own R workspace.





-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown


Re: [R] grep won't work finding one column

2014-10-14 Thread Kate Ignatius

In the sense - it does not work.  it works when there are 50 samples
in the file, but it does not work when there is one.

The usual headings are:  sample1.at sample1.dp
sample1.fg sample2.at sample2.dp sample2.fg and so on to a max of
sample50.at sample50.dp sample50.fg

Using this greps out all the .at columns perfectly:

df[,grep(".at",colnames(df))]

When I come across a file when there is one sample:

sample1.at sample1.dp sample1.fg

Using this:

df[,grep(".at",colnames(df))]

returns nothing.

Oh - AT/at was just an example... that's not my problem...



On Tue, Oct 14, 2014 at 10:57 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:
 Your question is missing a reproducible example, and you don't say how it 
 does not work, so we cannot tell what is going on.

 Two things do come to mind, though.

 A) Data frame subsets with only one column by default return a vector, which 
 is a different type of object than a single-column data frame. You would need 
 to read ?"[.data.frame" about the "drop" argument if you wanted to 
 consistently get a data frame from this expression.

 B) The period is a wildcard in regular expressions. If you expect to limit 
 your search to literal ".at" at the end of the name then you should use the 
 search pattern "\\.at$" instead (the first backslash allows the second one to be 
 stored by R in the string, and the second one is the only one seen by grep, 
 which it reads as making the period not act like a wildcard). You really 
 should read about regular expressions before using them. There are many 
 tutorials on the web about this topic.

 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
   Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 ---
 Sent from my phone. Please excuse my brevity.

 On October 14, 2014 7:23:55 AM PDT, Kate Ignatius kate.ignat...@gmail.com 
 wrote:
I'm having an issue with grep:

I have numerous columns that end with .at... when I use grep like so:

df[,grep(".at",colnames(df))]

it works fine.  When I have one column that ends with .at, it does not
work.  Why is that?  As this is loop with varying number of columns
ending in .at I would like some code that would work with 1 to n
number of columns.

Is there something more optimal than grep?

Thanks!


Re: [R] grep won't work finding one column

2014-10-14 Thread Rolf Turner


On 15/10/14 04:09, Kate Ignatius wrote:

In the sense - it does not work.  it works when there are 50 samples
in the file, but it does not work when there is one.

The usual headings are:  sample1.at sample1.dp
sample1.fg sample2.at sample2.dp sample2.fg and so on to a max of
sample50.at sample50.dp sample50.fg

using this greps out all the .at columns perfectly:

df[,grep(".at",colnames(df))]

When I come across a file when there is one sample:

sample1.at sample1.dp sample1.fg

Using this:

df[,grep(".at",colnames(df))]

returns nothing.

Oh - AT/at was just an example... thats not my problem...


You are being (deliberately?) obtuse.

It's *all* your problem.  You have to be precise when working with 
computers and when providing examples.  Don't build examples with 
confusing red herrings.


Your assertion that df[,grep(".at",colnames(df))] "returns nothing" is 
simply ***INCORRECT***.  It works just fine.  See the (tidy, completely 
reproducible) example in the attached file kate.txt.


Note that, with a single ".at" column in your data frame, what is 
returned is ***NOT*** a data frame but rather a vector.  If you want a 
(one-column) data frame you need to use "drop=FALSE" in your 
subscripting call.
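
Besides drop=FALSE, another option that may suit a loop over files with 1 to n matching columns is list-style indexing (single brackets, no comma), which returns a data frame whatever the number of selected columns. This is a sketch on invented data. The ".zz" pattern is a deliberately non-matching, hypothetical suffix used to show the zero-column case.

```r
d1 <- as.data.frame(matrix(1, ncol = 3, nrow = 10))
names(d1) <- c("sample1.at", "sample1.dp", "sample1.fg")

# List-style indexing: no comma, so the result stays a data frame
at_cols <- grep("\\.at$", names(d1))
one_col <- d1[at_cols]
print(dim(one_col))

# With no matching column the result is a zero-column data frame
none <- d1[grep("\\.zz$", names(d1))]
print(dim(none))
```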


You need to study up on R and learn how it works (read the Introduction 
to R) and stop going off half-cocked.


cheers,

Rolf Turner

P.S.  It is a ***bad*** idea to use "df" as the name of a data frame. 
The string "df" is the name of a *function* in base R (it is the 
probability density function for the F distribution).  Although R is 
clever enough to distinguish functions from data objects in *most* 
circumstances, at the very least confusion could arise.


R. T.

--
Rolf Turner
Technical Editor ANZJS
#
# Check it out.
#

# Data frame with one .at column.
d1 <- as.data.frame(matrix(1,ncol=3,nrow=10))
n1 <- c("sample1.at","sample1.dp","sample1.g")
names(d1) <- n1

# Data frame with many .at columns.
d2 <- as.data.frame(matrix(1,ncol=50,nrow=10))
set.seed(42)
n2 <- paste("sample",1:50,sample(c(".at",".dp",".fg"),50,TRUE),sep="")
names(d2) <- n2

# Extract the .at columns.
print(d1[,grep(".at",colnames(d1))])
print(d2[,grep(".at",colnames(d2))])

[R] grep for multiple pattern?

2014-02-13 Thread Rainer M Krug

Hi

I want to search for multiple patterns as grep is doing for a single
pattern, but this obviously does not work:

> grep("an", month.name)
[1] 1
> grep("em", month.name)
[1]  9 11 12
> grep("eb", month.name)
[1] 2
> grep(c("an", "em", "eb"), month.name)
[1] 1
Warning message:
In grep(c("an", "em", "eb"), month.name) :
  argument 'pattern' has length > 1 and only the first element will be used


Is there an equivalent which returns the positions as grep is doing, but
not using the strict full-string matching of match()?

I could obviously do:

> unlist( sapply(pat, grep, month.name ) )
 an em1 em2 em3  eb
  1   9  11  12   2

but is there a more compact command I am missing?

Thanks,

Rainer

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug




Re: [R] grep for multiple pattern?

2014-02-13 Thread PIKAL Petr

Hi

Maybe I am missing something but isn't  this

which(letters %in% c("a", "x"))

what you want?
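
A note on the suggestion above, with a sketch: %in% compares whole strings, much like match(), so it finds nothing when the patterns are substrings of month names. One common way to get grep-style positions for several patterns at once is to join them with "|" into a single alternation. This assumes the patterns contain no regex metacharacters. The names combined and hits are illustrative.

```r
pat <- c("an", "em", "eb")

# %in% (like match()) compares whole strings, so nothing matches here
whole <- which(month.name %in% pat)
print(whole)

# Joining the patterns with "|" gives a single alternation regex
combined <- paste(pat, collapse = "|")
hits <- grep(combined, month.name)
print(hits)
print(month.name[hits])
```

Unlike the sapply() version in the question, the positions come back in the order of month.name and are not labelled by pattern.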

Petr

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Rainer M Krug
 Sent: Thursday, February 13, 2014 3:43 PM
 To: R-help@r-project.org
 Subject: [R] grep for multiple pattern?

 Hi

 I want to search for multiple pattern as grep is doing for a single
 pattern, but this obviously not work:

 > grep("an", month.name)
 [1] 1
 > grep("em", month.name)
 [1]  9 11 12
 > grep("eb", month.name)
 [1] 2
 > grep(c("an", "em", "eb"), month.name)
 [1] 1
 Warning message:
 In grep(c("an", "em", "eb"), month.name) :
   argument 'pattern' has length > 1 and only the first element will be
 used
 

 Is there an equivalent which returns the positions as grep is doing,
 but not using the strict full-string matching of match()?

 I could obviously do:

 > unlist( sapply(pat, grep, month.name ) )
  an em1 em2 em3  eb
   1   9  11  12   2

 but is there a more compact command I am missing?

 Thanks,

 Rainer

 --
 Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
 Biology, UCT), Dipl. Phys. (Germany)

 Centre of Excellence for Invasion Biology Stellenbosch University South
 Africa

 Tel :   +33 - (0)9 53 10 27 44
 Cell:   +33 - (0)6 85 62 59 98
 Fax :   +33 - (0)9 58 10 27 44

 Fax (D):+49 - (0)3 21 21 25 22 44

 email:  rai...@krugs.de

 Skype:  RMkrug




Re: [R] grep for multiple pattern?

2014-02-13 Thread Marc Schwartz


On Feb 13, 2014, at 8:43 AM, Rainer M Krug rai...@krugs.de wrote:

 Hi
 
 I want to search for multiple pattern as grep is doing for a single
 pattern, but this obviously not work:
 
 grep("an", month.name)
 [1] 1
 grep("em", month.name)
 [1]  9 11 12
 grep("eb", month.name)
 [1] 2
 grep(c("an", "em", "eb"), month.name)
 [1] 1
 Warning message:
 In grep(c("an", "em", "eb"), month.name) :
   argument 'pattern' has length > 1 and only the first element will be used
 
 
 Is there an equivalent which returns the positions as grep is doing, but
 not using the strict full-string matching of match()?
 
 I could obviously do:
 
 unlist( sapply(pat, grep, month.name ) )
 an em1 em2 em3  eb
  1   9  11  12   2
 
 but is there a more compact command I am missing?
 
 Thanks,
 
 Rainer


The vertical bar '|' acts as a logical 'or' operator in regular expressions:

 grep("an|em|eb", month.name)
[1]  1  2  9 11 12

 grep("an|em|eb", month.name, value = TRUE)
[1] "January"   "February"  "September" "November"  "December"


Regards,

Marc Schwartz
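
A minimal sketch of the alternation approach, using only base R and the built-in month.name vector. The names idx, hits and separate are illustrative and do not come from the thread.

```r
# '|' means "or" in R's default (extended) regular expressions,
# so one pattern can stand in for several separate grep() calls.
idx  <- grep("an|em|eb", month.name)                 # positions of matching elements
hits <- grep("an|em|eb", month.name, value = TRUE)   # the matching strings themselves

# The combined call finds the same positions as the three single-pattern calls.
separate <- sort(unique(c(grep("an", month.name),
                          grep("em", month.name),
                          grep("eb", month.name))))
identical(idx, separate)
```

Unlike the unlist(sapply(...)) variant, the combined pattern returns positions in the order of month.name and reports each position once, even if several alternatives match the same element.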


Re: [R] grep for multiple pattern?

2014-02-13 Thread jim holtman

use the | in regular expressions:

 grep("an|em|eb", month.name)


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] grep for multiple pattern?

2014-02-13 Thread Rainer M Krug


On 02/13/14, 17:23 , jim holtman wrote:
 use the | in regular expressions:
 
  grep("an|em|eb", month.name)

Thanks - again a reason to learn regexp.

Cheers,

Rainer

 
 

Re: [R] grep for multiple pattern?

2014-02-13 Thread Prof Brian Ripley


On 13/02/2014 16:25, Rainer M Krug wrote:


On 02/13/14, 17:23 , jim holtman wrote:

use the | in regular expressions:

  grep("an|em|eb", month.name)


Thanks - again a reason to learn regexp.


Note though that this is an *extended* regex.  They are the default in R, but 
not for grep, sed, ...


Another thing to watch out for is that GNU grep allows (a\|b\|c) in 'basic' 
regexps -- but the POSIX standard does not, and nor do other 
implementations.  The authors of the graphviz configure code (shipped 
with Rgraphviz) did not know this and wasted other people's resources.
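
A small sketch of how the meaning of '|' depends on the matching mode inside R itself. The vector x is made up for illustration; the only grep() argument used beyond the defaults is fixed = TRUE.

```r
x <- c("a", "b", "c", "a|b")

# Default: extended regular expression, so '|' is alternation.
default_idx <- grep("a|b", x)

# fixed = TRUE: the pattern is a literal string and '|' has no special meaning.
fixed_idx <- grep("a|b", x, fixed = TRUE)

# Escaping the bar also makes it literal within a regular expression.
escaped_idx <- grep("a\\|b", x)

list(default = default_idx, fixed = fixed_idx, escaped = escaped_idx)
```

The default call matches every element containing "a" or "b", whereas the fixed and escaped calls match only the element that contains the three-character string "a|b".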








--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595


Re: [R] grep for multiple pattern?

2014-02-13 Thread Keith Jewell


On 13/02/2014 15:51, Marc Schwartz wrote:




The vertical bar '|' acts as a logical 'or' operator in regular expressions:

 grep("an|em|eb", month.name)
[1]  1  2  9 11 12

 grep("an|em|eb", month.name, value = TRUE)
[1] "January"   "February"  "September" "November"  "December"


Regards,

Marc Schwartz


and if you want your patterns in a vector:
 pat <- c("an", "em", "eb")
 grep(paste(pat, collapse="|"), month.name)
[1]  1  2  9 11 12
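
A sketch that extends the paste(collapse = "|") idea to also show which pattern matched where. The names any_idx and per_pat are illustrative, and the sketch assumes the patterns contain no regex metacharacters that would need escaping.

```r
pat <- c("an", "em", "eb")

# One combined pattern: positions matching any element of pat.
any_idx <- grep(paste(pat, collapse = "|"), month.name)

# One logical column per pattern, to see which pattern matched which month.
per_pat <- vapply(pat, grepl, logical(length(month.name)), x = month.name)
rownames(per_pat) <- month.name

# Rows with at least one TRUE are the same positions as any_idx.
identical(unname(which(rowSums(per_pat) > 0)), any_idx)
```

The matrix form costs one grepl() call per pattern, but it keeps the information about which pattern produced each hit, which the single combined pattern discards.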


Re: [R] grep(pattern = each element of a vector) ?

2013-09-12 Thread arun

Hi,
res <- ddply(.data=df1,
             .variables='Taxa',
             .fun=transform,
             Class=find.class(Taxa))
#Warning messages:
#1: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used
#2: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used
#3: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used

Maybe it is better to modify the function:
find.class <- function(x) df2[grep(unique(x),df2$Taxa),'Class']
res1 <- ddply(.data=df1,
              .variables='Taxa',
              .fun=transform,
              Class=find.class(Taxa)) #no warnings

#though it doesn't have any effect on the end result.
identical(res,res1)
#[1] TRUE


A.K.






[R] grep(pattern = each element of a vector) ?

2013-09-12 Thread Beaulieu, Jake

Hi,

I have a large data frame that contains species names.  I have a second 
data frame that contains species names and some additional info, called 'Class', 
about each species.  I would like to match the species names in the first data 
frame with the 'Class' information contained in the second.  Since the species 
names are often formatted differently between the data sets, merge doesn't work 
well.  grep does the trick, but the function needs to be called separately for 
each observation in the first data frame.  I put grep into a loop, but this is 
too slow.  Is there a way to run grep repeatedly without resorting to a loop?  
Possibly something in the apply family?

  df1 <- data.frame(Taxa = c('blue', 'red', NA))
  df2 <- data.frame(Taxa = c( 'blue', 'red', NA), Class = c('Z', 'HI', 'A'))

  index <- NULL
  for (i in 1:length(df1$Taxa)) {
    index[i] <- grep(df1$Taxa[i], df2$Taxa)
  }
  index

 sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

==
Jake J. Beaulieu, PhD
US Environmental Protection Agency
National Risk Management Research Lab
26 W. Martin Luther King Drive
Cincinnati, OH 45268
USA
513-569-7842  (desk)
513-487-2511 (fax)
beaulieu.j...@epa.gov



Re: [R] grep(pattern = each element of a vector) ?

2013-09-12 Thread Allen, Joel

Jake,
You can use the plyr library or some form of apply.  If you are on a 64bit 
system you can multithread and it goes much faster.

Something like this (for 32-bit):
require(plyr)
df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue', 'red', NA, 'blue', 'red', 
NA))
df2 <- data.frame(Taxa = c( 'blue', 'red', NA), Class = c('Z', 'HI', 'A'))

#function to do the lookup
find.class <- function(x) df2[grep(x, df2$Taxa),'Class']

ddply(.data=df1,
      .variables='Taxa',
      .fun=transform,
      Class=find.class(Taxa))

Joel
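
A base-R sketch of the same lookup without plyr, under some assumptions: the data are made up ("dark blue" stands in for a differently formatted species name), first_match is a hypothetical helper, names are compared as literal substrings (fixed = TRUE), and only the first hit is kept.

```r
df1 <- data.frame(Taxa = c("blue", "red", NA, "blue"), stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c("dark blue", "red", NA),
                  Class = c("Z", "HI", "A"), stringsAsFactors = FALSE)

# Hypothetical helper: index of the first element of `targets` that contains
# `name` as a literal substring, or NA if `name` is NA or nothing matches.
first_match <- function(name, targets) {
  if (is.na(name)) return(NA_integer_)
  hit <- grep(name, targets, fixed = TRUE)
  if (length(hit) == 0L) NA_integer_ else hit[1L]
}

# One grep() call per row of df1, always returning exactly one integer.
index <- vapply(df1$Taxa, first_match, integer(1), targets = df2$Taxa,
                USE.NAMES = FALSE)
df1$Class <- df2$Class[index]
df1
```

vapply() still calls grep() once per row, so this mainly makes the loop safer (a fixed return type, explicit NA handling) rather than necessarily faster. Running the lookup on unique(df1$Taxa) and mapping back with match() would reduce the number of grep() calls when names repeat.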


[R] Grep functions output

2013-08-05 Thread Lívio Cipriano

Hi,

I'm writing some R scripts and I would like to grab outputs from R functions 
to control if tests. For example, one function outputs something like "p-value = 
0.0765" and I want to program the following pseudo code in R:

sig = grep pvalue

if (sig < 0.05)
a()
else
b()

Should I use the grep function of R 
(http://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html)
or is there another easy way to do it?

Regards

Lívio Cipriano


Re: [R] Grep functions output

2013-08-05 Thread jim holtman

try this:

 x <- "p-value = 0.0765"
 sig <- as.numeric(sub(".*=(.*)", "\\1", x))

 sig
[1] 0.0765







-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] Grep functions output

2013-08-05 Thread Jeff Newmiller

If you were to follow the recommendations in the footer of this email, you 
might get some better options than using grep.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Grep functions output

2013-08-05 Thread Lívio Cipriano

On 05 August 2013 11:11:22 Phil Spector wrote:
 but most functions in R that provide p-values make it possible to
 extract the p-value from the result of the function call without
 using any text

Thanks for your answer. In fact it's the simple way to do it.

Regards

Lívio Cipriano
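
A sketch of extracting the p-value directly from the result object, as suggested above, instead of parsing printed text. It assumes the test returns an object of class "htest" (as t.test() does); the data are made up and the messages stand in for the a() and b() of the pseudo code.

```r
set.seed(1)                       # made-up data, seeded for reproducibility
x <- rnorm(20)
y <- rnorm(20, mean = 0.5)

res <- t.test(x, y)               # returns an object of class "htest"
sig <- res$p.value                # the p-value as a plain number

if (sig < 0.05) {
  message("significant at the 5% level")      # stands in for a()
} else {
  message("not significant at the 5% level")  # stands in for b()
}
```

For other functions, str(res) or names(res) shows which components are available; model summaries often store p-values in a coefficient table rather than in a p.value element.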


[R] grep help (character ommission)

2013-05-01 Thread Johannes Graumann

Hello,

Banging my head against a wall here ... can anyone light the way to a 
pattern modification that would make the following TRUE?

identical(
  grep(
    "^Intensity\\s[^HL]",
    c("Intensity","Intensity L", "Intensity H", "Intensity Rep1")),
  as.integer(c(1,4)))

Thank you for your time.

Sincerely, Joh


Re: [R] grep help (character ommission)

2013-05-01 Thread jim holtman

try this:

> identical(
+   grep(
+     "^Intensity *[HL]",
+     c("Intensity","Intensity L", "Intensity H", "Intensity Rep1"),
+     invert = TRUE),
+   as.integer(c(1,4)))
[1] TRUE








-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] grep help (character ommission)

2013-05-01 Thread Rui Barradas


Hello,

The following pattern seems to do it.

grep("^Intensity$|^Intensity\\s[^HL]",
     c("Intensity","Intensity L", "Intensity H", "Intensity Rep1"))


Hope this helps,

Rui Barradas
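
A sketch of why the original pattern misses the bare "Intensity", together with one further alternative that was not proposed in the thread: a negative lookahead, which needs perl = TRUE. The names x, orig_idx and idx are illustrative.

```r
x <- c("Intensity", "Intensity L", "Intensity H", "Intensity Rep1")

# The original pattern requires a whitespace character *and* one more character
# after "Intensity", so the bare "Intensity" cannot match.
orig_idx <- grep("^Intensity\\s[^HL]", x)

# Negative lookahead (PCRE only, hence perl = TRUE): match "Intensity" at the
# start unless it is followed by whitespace and then H or L.
idx <- grep("^Intensity(?!\\s[HL])", x, perl = TRUE)
identical(idx, as.integer(c(1, 4)))
```

Note that the lookahead version, like the other patterns in this thread, also rejects any label whose second word merely starts with H or L.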


Re: [R] grep help (character ommission)

2013-05-01 Thread arun



Hi,
You could also do:
vec1 <- c("Intensity","Intensity L", "Intensity H", "Intensity Rep1")
identical(setdiff(seq_along(vec1),grep("H|L",vec1)),as.integer(c(1,4)))
#[1] TRUE
A.K.



Re: [R] Grep with wildcards across multiple columns

2013-03-15 Thread Bush, Daniel P. DPI

I think the way I set up my sample data without any explanation confused things 
slightly. These data might make things clearer:

# Create fake data
df <- data.frame(code   = c(rep(1001, 8), rep(1002, 8)),
                 year   = rep(c(rep(2011, 4), rep(2012, 4)), 2),
                 fund   = rep(c("10E", "27E", "27E", "29E"), 4),
                 func   = rep(c(11, 122000, 214000, 158000), 4),
                 obj    = rep(c(100, 100, 210, 220), 4),
                 amount = round(rnorm(16, 5, 1)))

These are financial data with a hierarchical account structure where a zero 
represents a summary account that rolls up all the accounts at subsequent 
digits (e.g. 10 rolls up 11, 122000, 158000, etc.). I was trying to do 
two things with the search parameters: turn zeroes into question marks, and 
duplicate the functionality of a SQL query using those question marks as 
wildcards:

# Set parameters
par.fund <- "20E"; par.func <- "10"; par.obj <- "000"
par.fund <- glob2rx(gsub("0", "?", par.fund))
par.func <- glob2rx(gsub("0", "?", par.func))
par.obj <- glob2rx(gsub("0", "?", par.obj))
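
A short sketch of what the zero-to-wildcard conversion produces, using a single made-up parameter and a made-up fund vector. glob2rx() is in the utils package, which is attached by default.

```r
# Turn zeros into '?' wildcards, then convert the glob to a regular expression.
glob <- gsub("0", "?", "10E", fixed = TRUE)   # each "0" becomes the glob wildcard "?"
rx   <- glob2rx(glob)                         # anchored regex; "?" becomes "."

fund <- c("10E", "27E", "1AE", "10EX")        # made-up values
matched <- grepl(rx, fund)
data.frame(fund, matched)
```

Because glob2rx() anchors the pattern at both ends, "10EX" is rejected even though it starts with a matching prefix.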

Fortunately, Bill's suggestion to use the intersect function worked just 
fine--since intersect accepts only two arguments, I had to nest a pair of 
statements:

# Solution: Use a pair of nested intersects
dt2 <- dt[intersect(intersect(grep(par.fund, fund), grep(par.func, func)),
                    grep(par.obj, obj)),
          sum(amount), by=c('code', 'year')]
df2 <- ddply(df[intersect(intersect(grep(par.fund, df$fund),
                                    grep(par.func, df$func)),
                          grep(par.obj, df$obj)), ],
             .(code, year), summarize, amount = sum(amount))

Thanks for your ideas!

DB

Daniel Bush | School Finance Consultant 
School Financial Services | Wis. Dept. of Public Instruction 
daniel.bush -at- dpi.wi.gov | 608-267-9212

-Original Message-
From: William Dunlap [mailto:wdun...@tibco.com] 
Sent: Thursday, March 14, 2013 5:49 PM
To: Bush, Daniel P. DPI; 'r-help@r-project.org'
Subject: RE: Grep with wildcards across multiple columns

grep(pattern, textVector) returns the integer indices of the elements of 
textVector that match the pattern.  E.g.,
   grep("T", c("One","Two","Three","Four"))
  [1] 2 3

The '&' operator naturally operates on logical vectors of the same length (if 
you give it numbers it silently converts 0 to FALSE and other numbers to TRUE).

The two don't fit together.  You could use grepl(), which returns a logical 
vector the length of textVector, as in
   grepl(p1,v1) & grepl(p2,v2)
to figure out which entries in the table have v1 matching p1 and v2 matching p2.

Or, you could use
  intersect(grep(p1,v1), grep(p2,v2))
if you want to stick with integer indices.
if you want to stick with integer indices.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
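
A sketch contrasting the two approaches on made-up vectors; v1, v2, p1 and p2 follow the placeholder names used above, while keep is illustrative.

```r
v1 <- c("10E", "10E", "27E", "10E")       # made-up columns
v2 <- c("110", "122", "114", "214")
p1 <- "^1.E$"
p2 <- "^11.$"

# grep() gives positions; the two results can differ in length, so combining
# them with '&' is not meaningful.
grep(p1, v1)
grep(p2, v2)

# grepl() gives one TRUE/FALSE per element, which '&' combines row by row.
keep <- grepl(p1, v1) & grepl(p2, v2)

# intersect() on the positions selects the same rows.
identical(which(keep), intersect(grep(p1, v1), grep(p2, v2)))
```

The logical form is usually easier to extend, since further conditions (including non-regex ones such as year == 2011) can be added with another '&'.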


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Bush, Daniel P. DPI
 Sent: Thursday, March 14, 2013 2:43 PM
 To: 'r-help@r-project.org'
 Subject: [R] Grep with wildcards across multiple columns
 
 I have a fairly large data set with six variables set up like the following 
 dummy:
 
 # Create fake data
 df <- data.frame(code   = c(rep(1001, 8), rep(1002, 8)),
                  year   = rep(c(rep(2011, 4), rep(2012, 4)), 2),
                  fund   = rep(c("10E", "10E", "10E", "27E"), 4),
                  func   = rep(c(11, 122000, 214000, 158000), 4),
                  obj    = rep(100, 16),
                  amount = round(rnorm(16, 5, 1)))
 
 What I would like to do is sum the amount variable by code and year, 
 filtering rows using different wildcard searches in each of three 
 columns: 1?E in fund, 1?? in func, and ??? in obj. I'm OK turning 
 these into regular expressions:
 
 # Set parameters
 par.fund <- "10E"; par.func <- "10"; par.obj <- "000"
 par.fund <- glob2rx(gsub("0", "?", par.fund))
 par.func <- glob2rx(gsub("0", "?", par.func))
 par.obj <- glob2rx(gsub("0", "?", par.obj))
 
 The problem occurs when I try to apply multiple greps across columns. 
 I'd prefer to use data.table since it's so much faster than plyr and I 
 have 159 different sets of parameters to run through, but I get the same 
 error setting it up either way:
 
 # Doesn't work
 library(data.table)
 dt <- data.table(df)
 eval(parse(text=paste(
   "dt2 <- dt[", "grep('", par.fund, "', fund) & ",
   "grep('", par.func, "', func) & grep('", par.obj, "', obj)",
   ", sum(amount), by=c('code', 'year')]" , sep="")))
 # Warning message:
 #   In grep("^1.E$", fund) & grep("^1.$", func) :
 #   longer object length is not a multiple of shorter object length
 
 # Also doesn't work
 library(plyr)
 eval(parse(text=paste(
   "df2 <- ddply(df[grep('", par.fund, "', df$fund) & ",
   "grep('", par.func, "', df$func) & grep('", par.obj, "', df$obj), ]",
   ", .(code, year), summarize, amount = sum(amount))" , sep="")))
 # Warning message:
 #   In grep("^1.E$", df$fund) & grep("^1.$", df$func) :
 #   longer object length is not a multiple of shorter object length
 
 Clearly

Re: [R] Grep with wildcards across multiple columns

2013-03-15 Thread arun



Hi,

You could try this for multiple intersect:

 dt[Reduce(function(...) intersect(...), 
list(grep(par.fund,fund),grep(par.func,func),grep(par.obj,obj))),sum(amount),by=c('code','year')]
#   code year V1
#1: 1001 2011 123528
#2: 1001 2012  97362
#3: 1002 2011 103811
#4: 1002 2012  97179
 dt[intersect(intersect(grep(par.fund, fund), grep(par.func, func)),
 grep(par.obj, obj)),
   sum(amount), by=c('code', 'year')]
 #  code year V1
#1: 1001 2011 123528
#2: 1001 2012  97362
#3: 1002 2011 103811
#4: 1002 2012  97179
A.K.



- Original Message -
From: Bush,  Daniel P.   DPI daniel.b...@dpi.wi.gov
To: 'r-help@r-project.org' r-help@r-project.org
Cc: 'William Dunlap' wdun...@tibco.com; 'smartpink...@yahoo.com' 
smartpink...@yahoo.com; 'djmu...@gmail.com' djmu...@gmail.com
Sent: Friday, March 15, 2013 10:06 AM
Subject: RE: Grep with wildcards across multiple columns

I think the way I set up my sample data without any explanation confused things 
slightly. These data might make things clearer:

# Create fake data
df <- data.frame(code   = c(rep(1001, 8), rep(1002, 8)),
                 year   = rep(c(rep(2011, 4), rep(2012, 4)), 2),
                 fund   = rep(c("10E", "27E", "27E", "29E"), 4),
                 func   = rep(c("11", "122000", "214000", "158000"), 4),
                 obj    = rep(c("100", "100", "210", "220"), 4),
                 amount = round(rnorm(16, 5, 1)))

These are financial data with a hierarchical account structure where a zero 
represents a summary account that rolls up all the accounts at subsequent 
digits (e.g. 10 rolls up 11, 122000, 158000, etc.). I was trying to do 
two things with the search parameters: turn zeroes into question marks, and 
duplicate the functionality of a SQL query using those question marks as 
wildcards:

# Set parameters
par.fund <- "20E"; par.func <- "10"; par.obj <- "000"
par.fund <- glob2rx(gsub("0", "?", par.fund))
par.func <- glob2rx(gsub("0", "?", par.func))
par.obj <- glob2rx(gsub("0", "?", par.obj))

Fortunately, Bill's suggestion to use the intersect function worked just 
fine--since intersect accepts only two arguments, I had to nest a pair of 
statements:

# Solution: Use a pair of nested intersects
dt2 <- dt[intersect(intersect(grep(par.fund, fund), grep(par.func, func)),
                    grep(par.obj, obj)),
          sum(amount), by=c('code', 'year')]
df2 <- ddply(df[intersect(intersect(grep(par.fund, df$fund),
                                    grep(par.func, df$func)),
                          grep(par.obj, df$obj)), ],
             .(code, year), summarize, amount = sum(amount))

Thanks for your ideas!

DB

Daniel Bush | School Finance Consultant 
School Financial Services | Wis. Dept. of Public Instruction 
daniel.bush -at- dpi.wi.gov | 608-267-9212


[R] Grep with wildcards across multiple columns

2013-03-14 Thread Bush, Daniel P. DPI

I have a fairly large data set with six variables set up like the following 
dummy:

# Create fake data
df <- data.frame(code   = c(rep(1001, 8), rep(1002, 8)),
                 year   = rep(c(rep(2011, 4), rep(2012, 4)), 2),
                 fund   = rep(c("10E", "10E", "10E", "27E"), 4),
                 func   = rep(c("11", "122000", "214000", "158000"), 4),
                 obj    = rep("100", 16),
                 amount = round(rnorm(16, 5, 1)))

What I would like to do is sum the amount variable by code and year, filtering 
rows using different wildcard searches in each of three columns: 1?E in fund, 
1?? in func, and ??? in obj. I'm OK turning these into regular 
expressions:

# Set parameters
par.fund <- "10E"; par.func <- "10"; par.obj <- "000"
par.fund <- glob2rx(gsub("0", "?", par.fund))
par.func <- glob2rx(gsub("0", "?", par.func))
par.obj <- glob2rx(gsub("0", "?", par.obj))
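
To make the wildcard-to-regex step concrete, here is a minimal sketch of what that conversion produces. The template "10E" and the test strings are only illustrative values, not real account codes.

```r
# Illustrative template: "0" stands for "any single character".
fund_template <- "10E"

# Step 1: turn each "0" into the glob wildcard "?".
fund_glob <- gsub("0", "?", fund_template, fixed = TRUE)

# Step 2: turn the glob into an anchored regular expression.
fund_rx <- utils::glob2rx(fund_glob)
print(fund_rx)

# The regex is anchored at both ends, so only three-character
# strings of the form 1<any>E match.
hits <- grepl(fund_rx, c("10E", "17E", "27E", "110E"))
print(hits)
```

Because glob2rx() anchors the pattern with ^ and $, a longer code such as "110E" is not picked up by accident.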

The problem occurs when I try to apply multiple greps across columns. I'd 
prefer to use data.table since it's so much faster than plyr and I have 159 
different sets of parameters to run through, but I get the same error setting 
it up either way:

# Doesn't work
library(data.table)
dt <- data.table(df)
eval(parse(text=paste(
  "dt2 <- dt[", "grep('", par.fund, "', fund) & ",
  "grep('", par.func, "', func) & grep('", par.obj, "', obj)",
  ", sum(amount), by=c('code', 'year')]" , sep="")))
# Warning message:
#   In grep("^1.E$", fund) & grep("^1.$", func) :
#   longer object length is not a multiple of shorter object length

# Also doesn't work
library(plyr)
eval(parse(text=paste(
  "df2 <- ddply(df[", "grep('", par.fund, "', df$fund) & ",
  "grep('", par.func, "', df$func) & grep('", par.obj, "', df$obj), ]",
  ", .(code, year), summarize, amount = sum(amount))" , sep="")))
# Warning message:
#   In grep("^1.E$", df$fund) & grep("^1.$", df$func) :
#   longer object length is not a multiple of shorter object length

Clearly, the problem is how I'm trying to combine greps in subsetting rows, but
I haven't been able to find a solution that works. Any thoughts, preferably
something that works with data.table?

DB

Daniel Bush
School Finance Consultant
School Financial Services
Wisconsin Department of Public Instruction
PO Box 7841 | Madison, WI 53707-7841
daniel.bush -at- dpi.wi.gov | sfs.dpi.wi.gov
Ph: 608-267-9212 | Fax: 608-266-2840





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Grep with wildcards across multiple columns

2013-03-14 Thread arun

HI,

Not sure whether this helps.
If you take out the grep(,par.obj,..), it works without any warning.
eval(parse(text=paste(
  "dt2 <- dt[", "grep('", par.fund, "', fund) & ",
  "grep('", par.func, "', func)",
  ", sum(amount), by=c('code', 'year')]" , sep="")))
 dt[grep('^1.E$',fund) & grep('^1.$',func),sum(amount),by=c('code','year')]
#   code year V1
#1: 1001 2011 185482
#2: 1001 2012 189367
#3: 1002 2011 238098
#4: 1002 2012 211499
aggregate(amount~code+year,data=df,sum)
#  code year amount
#1 1001 2011 185482
#2 1002 2011 238098
#3 1001 2012 189367
#4 1002 2012 211499

In the df you provided, there is only one value of obj.
levels(df$obj)
#[1] "100"
A.K.





Re: [R] Grep with wildcards across multiple columns

2013-03-14 Thread William Dunlap

grep(pattern, textVector) returns the integer indices of the elements
of textVector that match the pattern.  E.g.,
   > grep("T", c("One","Two","Three","Four"))
  [1] 2 3

The '&' operator naturally operates on logical vectors of the same length.
(If you give it numbers, it silently converts 0 to FALSE and other numbers
to TRUE.)

The two don't fit together.  You could use grepl(), which returns a logical
vector the length of textVector, as in
   grepl(p1,v1) & grepl(p2,v2)
to figure which entries in the table have v1 matching p1 and v2 matching p2.

Or, you could use
  intersect(grep(p1,v1), grep(p2,v2))
if you want to stick with integer indices.

By the way, the eval(parse(text=paste(...))) business is a good way to make
hard-to-read code and hard-to-read means hard-to-fix.  Just write out the
expression.
   > paste(
  +   "dt2 <- dt[", "grep('", par.fund, "', fund) & ",
  +   "grep('", par.func, "', func) & grep('", par.obj, "', obj)",
  +   ", sum(amount), by=c('code', 'year')]" , sep="")
  [1] "dt2 <- dt[grep('^1.E$', fund) & grep('^1.$', func) & grep('^...$', obj), sum(amount), by=c('code', 'year')]"
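
The difference between the two approaches can be sketched on a small made-up table. The data frame d, its values, and the two patterns below are assumptions for illustration only, not the poster's data.

```r
# Made-up data standing in for the real table.
d <- data.frame(
  fund   = c("10E", "10E", "27E", "17E", "10E"),
  obj    = c("100", "210", "100", "100", "220"),
  amount = c(5, 7, 11, 13, 17),
  stringsAsFactors = FALSE
)

# grep() returns row positions. Here the two results have lengths 4 and 3,
# so joining them with & would recycle values and warn about object lengths.
fund_pos <- grep("^1.E$", d$fund)
obj_pos  <- grep("^1..$", d$obj)

# grepl() returns one TRUE/FALSE per row, which is what & expects.
keep_logical <- grepl("^1.E$", d$fund) & grepl("^1..$", d$obj)

# intersect() picks the positions present in both grep() results.
keep_index <- intersect(fund_pos, obj_pos)

# Both forms select the same rows.
total_logical <- sum(d$amount[keep_logical])
total_index   <- sum(d$amount[keep_index])
print(c(total_logical, total_index))
```

With more than two conditions, the grepl() form extends by adding another `&` term, while the index form needs nested intersect() calls or Reduce(intersect, list(...)).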

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



[R] grep txt file names from html

2012-10-31 Thread chuck.01

Sorry, I know I should read a little 1st about this, but I am actually just
helping somebody really quick and need help too. 

I want to grep all of the names of the .txt files mentioned on this html web
page:

http://www.epa.gov/emap/remap/html/three/data/index.html

Thanks ahead of time.




Re: [R] grep txt file names from html

2012-10-31 Thread Sarah Goslee

they're all of the form

http.*txt

but the best way to grep them (by which I assume you mean extract the
file names from the page source) depends on what you plan to do with them,
and what sort of output you expect.

It isn't even clear whether you plan to do this in R.

Sarah


On Wed, Oct 31, 2012 at 12:56 PM, chuck.01 <charliethebrow...@gmail.com> wrote:

 Sorry, I know I should read a little 1st about this, but I am actually just
 helping somebody really quick and need help too.

 I want to grep all of the names of the .txt files mentioned on this html
 web
 page:

 http://www.epa.gov/emap/remap/html/three/data/index.html

 Thanks ahead of time.


-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] grep txt file names from html

2012-10-31 Thread David Winsemius


On Oct 31, 2012, at 9:56 AM, chuck.01 wrote:

 Sorry, I know I should read a little 1st about this, but I am actually just
 helping somebody really quick and need help too. 
 
 I want to grep all of the names of the .txt files mentioned on this html web
 page:
 
 http://www.epa.gov/emap/remap/html/three/data/index.html


This shows code that will identify lines in that source page containing URLs
that end in '.txt':

> lines <- readLines(con=url("http://www.epa.gov/emap/remap/html/three/data/index.html"))
Warning message:
In readLines(con = url("http://www.epa.gov/emap/remap/html/three/data/index.html")) :
  incomplete final line found on 'http://www.epa.gov/emap/remap/html/three/data/index.html'
# You can generally ignore that warning.

> length(grep('\\"http://([./A-Za-z]){1+}\\.txt', lines) )
[1] 11

Should be fairly straightforward to remove the preceding and trailing material.

> sub('(^.*\\")(http://([./A-Za-z]){1+}\\.txt)(.*$)', "\\2",
+     lines[ grep('\\"http://([./A-Za-z]){1+}\\.txt', lines) ] )
 [1] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/benthic/benmet.txt"
 [2] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/benthic/bencnt.txt"
 [3] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/location/watchr.txt"
 [4] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/location/habbest.txt"
 [5] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/design/sdesign.txt"
 [6] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt"
 [7] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/fish/fshmet.txt"
 [8] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/fish/fshcnt.txt"
 [9] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/fish/fshnam.txt"
[10] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/tissue/ftmet.txt"
[11] "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/tissue/ftorg.txt"
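
An alternative that extracts the matched text directly, rather than trimming around it with sub(), is regmatches() with gregexpr(). The sketch below runs on a few made-up HTML lines; the example.org URLs and the pattern are assumptions, not taken from the EPA page.

```r
# Made-up page-source lines standing in for the readLines() result.
html <- c(
  '<a href="http://example.org/data/9396/benthic/benmet.txt">metrics</a>',
  '<p>No link on this line.</p>',
  '<a href="http://example.org/data/9396/fish/fshcnt.txt">counts</a> and <a href="http://example.org/notes.html">notes</a>'
)

# "http://", then a run of characters that cannot end the attribute
# (no double quote, no whitespace), then a literal ".txt".
pattern <- "http://[^\"[:space:]]+\\.txt"

# gregexpr() locates every match on each line; regmatches() pulls out the
# matched text; unlist() flattens the per-line results into one vector.
links <- unlist(regmatches(html, gregexpr(pattern, html)))
print(links)
```

This also copes with several links on one line, which a single sub() call per line would not.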


 


David Winsemius, MD
Alameda, CA, USA


Re: [R] grep txt file names from html

2012-10-31 Thread chuck.01

Sorry Sarah. 
I want to store them as a vector for use later.  

so, similar to this:

links <-
c("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/benthic/benmet.txt",
"http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/location/watchr.txt",
"http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt")





[R] grep and XML

2012-04-16 Thread Simon Kiss

Hi all:
I struggle a lot scraping web data. I still haven't got a handle on the XML 
package. 
I'd like to get particular exchange rates from this table: 
https://raw.github.com/currencybot/open-exchange-rates/master/latest.json
This is the code that I'm working with:
library(RCurl)
library(XML)

txt<-getURL("https://raw.github.com/currencybot/open-exchange-rates/master/latest.json")
txt<-htmlParse(txt, asText=TRUE)
txt<-  getNodeSet(txt, '//p')
So, I can get the node properly, but then, if I try something like this:
grep(c('USD'), txt)

I get: 
integer(0)

Can anyone suggest a way forward?
Yours, Simon Kiss

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606


Re: [R] grep and XML

2012-04-16 Thread Henrique Dallazuanna

Try this:

library(rjson)
j <- fromJSON(file =
'https://raw.github.com/currencybot/open-exchange-rates/master/latest.json')
j$rates$USD




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O


[R] grep TRUE/FALSE

2012-01-25 Thread Ana

Is there a way to get a TRUE/FALSE condition as result, when using grep ?

c="The file will be updated"

if grep("updated",c)==1 then I get TRUE

but

How can I deal with grep("new",c) ? it gives integer(0)


Thanks in advance.


Re: [R] grep TRUE/FALSE

2012-01-25 Thread Uwe Ligges


Yes, use grepl() rather than grep().

Uwe Ligges
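
A short sketch of the difference, reusing the sentence from the question. The variable names are illustrative only.

```r
msg <- "The file will be updated"

# grepl() always returns TRUE or FALSE, one value per input string.
has_updated <- grepl("updated", msg)
has_new     <- grepl("new", msg)

# grep() returns integer(0) when nothing matches, so a test such as
# grep("new", msg) == 1 yields logical(0), which if() cannot use.
# If grep() must be used, test the length of its result instead.
has_new_via_grep <- length(grep("new", msg)) > 0

print(c(has_updated, has_new, has_new_via_grep))
```

Either form is then safe inside if (...), because each yields exactly one TRUE or FALSE.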



[R] grep fixed (?) in 2.14

2011-11-03 Thread Stephen Sefick

#This is probably due to my incomplete understanding of grep, but the below
#code has been working for some time to search for .R with anything in front
#of it and return a list of scripts to source.  Likely, the syntax for the
#grep statement has been wrong all along.

scripts2source <- (c("/home/ssefick/R_scripts/Convert_package.R",
"/home/ssefick/R_scripts/Convert_R_CODE",
"/home/ssefick/R_scripts/CV.R",
"/home/ssefick/R_scripts/cvs.out.R",
"/home/ssefick/R_scripts/database_connect",
"/home/ssefick/R_scripts/database_connect_package.R",
"/home/ssefick/R_scripts/exit_db.R",
"/home/ssefick/R_scripts/exit.R",
"/home/ssefick/R_scripts/hourly_zoo.R",
"/home/ssefick/R_scripts/model_diag.R",
"/home/ssefick/R_scripts/not_numeric.R",
"/home/ssefick/R_scripts/num_ecol_package.R",
"/home/ssefick/R_scripts/NumEcolR_scripts",
"/home/ssefick/R_scripts/old_scripts_DELETE_AFTER_DECEMBER",
"/home/ssefick/R_scripts/only_numeric_dataframe.R",
"/home/ssefick/R_scripts/only_numeric.R",
"/home/ssefick/R_scripts/PCA.ve.R",
"/home/ssefick/R_scripts/poster_ggplot2_theme.R",
"/home/ssefick/R_scripts/pressure_transducer_package.R",
"/home/ssefick/R_scripts/Pressure_Transducer_R_CODE",
"/home/ssefick/R_scripts/publication_ggplot2_theme.R",
"/home/ssefick/R_scripts/r2test.R",
"/home/ssefick/R_scripts/recession_constant_package.R",
"/home/ssefick/R_scripts/recession_constant_R_CODE",
"/home/ssefick/R_scripts/serdp_name_split.R",
"/home/ssefick/R_scripts/setup_R.R",
"/home/ssefick/R_scripts/USGS.R"))

scripts2source[grep("*.R", scripts2source)]

#here is my problem;  I would like these to be removed.
scripts2source[c(2,5,13,14,20,24)]

#Thanks for all of your help in advance
#kindest regards,

#Stephen Sefick


Re: [R] grep fixed (?) in 2.14

2011-11-03 Thread Sarah Goslee

Hi,

. and * both mean something different in regular expressions than in
command-line wildcards. You need:

grep(".*\\.R$", scripts2source)

which parses to
.    - any character
*    - any number of times
\\.  - an actual . escaped as R requires
R    - the R denoting a script
$    - at the end of the string

> grep(".*\\.R$", scripts2source)
 [1]  1  3  4  6  7  8  9 10 11 12 15 16 17 18 19 21 22 23 25 26 27
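
To see the effect of the escape and the anchor, here is a small sketch on four names modelled on the list above (the short vector is an assumption for illustration). It also shows glob2rx(), which builds the regex from a shell-style wildcard.

```r
# A few names modelled on the original vector, without the directory part.
files <- c("CV.R", "Convert_R_CODE", "cvs.out.R", "NumEcolR_scripts")

# Unescaped "." matches any character and nothing anchors the end,
# so every name containing <any character>R matches.
loose <- grep(".R", files, value = TRUE)

# Escaped dot plus "$": only names that really end in ".R".
strict <- grep("\\.R$", files, value = TRUE)

# glob2rx() converts the wildcard "*.R" to an equivalent anchored regex.
from_glob <- grep(utils::glob2rx("*.R"), files, value = TRUE)

print(loose)
print(strict)
print(from_glob)
```

The leading ".*" in the pattern above is harmless but optional, since grep() already searches anywhere in the string.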

Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] grep fixed (?) in 2.14

2011-11-03 Thread jim holtman

your syntax is wrong, you need:

 scripts2source[grep("*\\.R$", scripts2source)]

Notice the '\\.' to escape the special meaning of '.', and the $ to
anchor to the end of the line.


On Thu, Nov 3, 2011 at 8:54 AM, Stephen Sefick sas0...@auburn.edu wrote:
 #This is probably due to my incomplete understanding of grep, but the below
 code has been working for some time to
 #search for .R with anything in front of it and return a list of scripts to
 source.  Likely, the syntax for the
 #grep statement has been wrong all along.

 scripts2source <- (c("/home/ssefick/R_scripts/Convert_package.R",
 "/home/ssefick/R_scripts/Convert_R_CODE",
 "/home/ssefick/R_scripts/CV.R", "/home/ssefick/R_scripts/cvs.out.R",
 "/home/ssefick/R_scripts/database_connect",
 "/home/ssefick/R_scripts/database_connect_package.R",
 "/home/ssefick/R_scripts/exit_db.R", "/home/ssefick/R_scripts/exit.R",
 "/home/ssefick/R_scripts/hourly_zoo.R",
 "/home/ssefick/R_scripts/model_diag.R",
 "/home/ssefick/R_scripts/not_numeric.R",
 "/home/ssefick/R_scripts/num_ecol_package.R",
 "/home/ssefick/R_scripts/NumEcolR_scripts",
 "/home/ssefick/R_scripts/old_scripts_DELETE_AFTER_DECEMBER",
 "/home/ssefick/R_scripts/only_numeric_dataframe.R",
 "/home/ssefick/R_scripts/only_numeric.R",
 "/home/ssefick/R_scripts/PCA.ve.R",
 "/home/ssefick/R_scripts/poster_ggplot2_theme.R",
 "/home/ssefick/R_scripts/pressure_transducer_package.R",
 "/home/ssefick/R_scripts/Pressure_Transducer_R_CODE",
 "/home/ssefick/R_scripts/publication_ggplot2_theme.R",
 "/home/ssefick/R_scripts/r2test.R",
 "/home/ssefick/R_scripts/recession_constant_package.R",
 "/home/ssefick/R_scripts/recession_constant_R_CODE",
 "/home/ssefick/R_scripts/serdp_name_split.R",
 "/home/ssefick/R_scripts/setup_R.R",
 "/home/ssefick/R_scripts/USGS.R"))

 scripts2source[grep("*.R", scripts2source)]

 #here is my problem;  I would like these to be removed.
 scripts2source[c(2,5,13,14,20,24)]

 #Thanks for all of your help in advance
 #kindest regards,

 #Stephen Sefick





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] grep fixed (?) in 2.14

2011-11-03 Thread Stephen Sefick

That did the trick.  I have read about regular expressions often, and 
sometimes I get them right and sometimes I don't.  Is there a good 
reference resource that anyone could suggest?  Thanks for all of the help.


Stephen Sefick




[R] grep lines before or after pattern matched?

2011-07-11 Thread Simon Kiss

Dear colleagues,
I have a series of newspaper articles in a text file, downloaded from a text 
file.  They look as follows:

Document 1 of 100
\n
\n
\n
Newspaper Name
\n
\n
Day Date

I have a series of grep scripts that can extract the date and convert it to a 
date object, but I can't figure out how to grep the newspaper name.  There is 
no field ID attached to those lines. The best I can come up with would be to 
have the program grab the four lines following each match of the pattern 
"Document [0-9]".  There is an argument to grep in Unix that can do this 
(grep -A4 'pattern' infile > outfile), but I don't know if there is an 
equivalent argument in R.

Any thoughts?
Yours, Simon Kiss
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Joshua Wiley

Dear Simon,

Maybe I don't understand properly... if you are doing this in R, can't
you just pick the line you want?

Josh

## print your data to clipboard
cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file = "clipboard")
## read data in, and only select the 4th line to pass to grep()
grep("pattern", x = readLines("clipboard")[4])


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Simon Kiss

Hi Josh,
Sorry for the insufficient introduction. This might work, but I'm not sure.
The file that I have includes up to 100 documents (Document 1, Document 2, 
Document 3 ... Document 100) with the newspaper name following 4 lines below 
each Document number.
I'm using readLines to get the text file into R and then trying to use grep to 
get the newspaper name for each record. But your idea of indexing the text 
object read into R with the line number where the newspaper name is found is a 
good one.  I'll just have to come up with a loop to tell R to get the 4th, 8th, 
12th, 16th line, etc. 
I'll see if I can get that to work.
Simon
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Joshua Wiley

If you know you can find the start of the document (say that line
always starts with "Document..."), then:

grep("Document+.", yourfile, value = FALSE) + 4

should give you the index of the line 4 lines after each line where
"Document" occurred.  No loop needed :)
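
A minimal sketch of that idea, roughly the R counterpart of grep -A4. The sample text and the name `lines` are made up for illustration; in practice `lines` would come from readLines() on the real file, and the pattern "^Document [0-9]+" is an assumption about how the header lines look.

```r
# Made-up stand-in for the result of readLines() on the real file
lines <- c("Document 1 of 2", "", "", "", "The Daily Example", "", "",
           "Monday July 11",
           "Document 2 of 2", "", "", "", "The Evening Sample", "", "",
           "Tuesday July 12")

# Line numbers of the header lines
start <- grep("^Document [0-9]+", lines)

# The newspaper name sits a fixed number of lines below each header
name_idx <- start + 4

# Drop any index that would run past the end of the file
name_idx <- name_idx[name_idx <= length(lines)]

newspaper <- lines[name_idx]
print(newspaper)
```

The bounds check is not part of the suggestion above; it only guards against a truncated last record.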

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Simon Kiss

Josh, that's amazing. Is there any way to have it grab two different lines 
after the grep, say the second and the fourth line? There's some other 
information in the text file I'd like to grab.  I could do two separate 
commands, but I'd like to know if this could be done in one command...
Simon Kiss

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Joshua Wiley

Try this (untested as I'm on my iPhone now):

index <- grep("Document+.", yourfile, value = FALSE)
index <- c(index + 2, index + 4)

You just need to make sure you avoid recycling, e.g.,

1:10 + c(2, 4) # not what you want

If you want a sufficient number of lines that manually writing index + becomes 
cumbersome, you could use something like:

as.vector(sapply(c(2, 4), "+", e2 = index))

HTH,

Josh
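
To see why recycling matters here, a small sketch with made-up match positions; `index` below is hard-coded and merely stands in for the result of a grep() call, and an anonymous function is used in place of "+" purely for readability.

```r
# Made-up line numbers of three matches, standing in for a grep() result
index <- c(1, 11, 21)

# Explicit version: all "+2" lines first, then all "+4" lines
explicit <- c(index + 2, index + 4)

# Same result, generalised to any set of offsets
offsets <- c(2, 4)
general <- as.vector(sapply(offsets, function(k) index + k))

# Recycling instead pairs offsets with positions element by element:
# 1 + 2, 2 + 4, 3 + 2, 4 + 4, ...
recycled <- 1:10 + c(2, 4)

print(explicit)
print(general)
print(recycled)
```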


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Bert Gunter

Simon:

Basic basic stuff (not grep -- the stuff thereafter). Please read the
docs, especially the tutorial, "An Intro to R".

... and Josh's solution can be shortened to (as he knows):

index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)

-- Bert

-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

--

Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Joshua Wiley

On Jul 11, 2011, at 12:00, Bert Gunter gunter.ber...@gene.com wrote:

 Simon:
 
 Basic basic stuff (not grep -- the stuff thereafter) . Please read the
 docs, especially the tutorial,  An Intro to R.
 
 ... and Josh's solution can be shortened to (as he knows):
 
 index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)
 

Really?  Won't the 2 and 4 get recycled so that every other element returned 
from grep will have 2 or 4 added instead of 2 *and* 4?

My understanding is that Simon has a single file with, for example, "Document 1" 
on line 1, "Document 2" on line 301, etc. And he wants both the 2nd and 4th lines 
after each document, so lines 3, 5, 303, 305, but just doing + c(2,4) would only 
give 3, 305.

Josh


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread David Winsemius



On Jul 11, 2011, at 3:33 PM, Joshua Wiley wrote:


On Jul 11, 2011, at 12:00, Bert Gunter gunter.ber...@gene.com wrote:


Simon:

Basic basic stuff (not grep -- the stuff thereafter). Please read the
docs, especially the tutorial, "An Intro to R".

... and Josh's solution can be shortened to (as he knows):

index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)



Really?  Won't the 2 and 4 get recycled so that every other element
returned from grep will have 2 or 4 added instead of 2 *and* 4?

My understanding is that Simon has a single file with, for example,
"Document 1" on line 1, "Document 2" on line 301, etc. And he wants both
the 2nd and 4th lines after each document, so lines 3, 5, 303, 305,
but just doing + c(2,4) would only give 3, 305.


So:

rep(index, each=2) + c(2,4)

--
David.
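
The recycling issue discussed above can be seen on a small made-up vector. This is only a sketch: the contents of yourfile, the stricter pattern "^Document [0-9]+" and the names recycled, paired, offsets and general are assumptions for illustration, not part of the original posts.

```r
# Toy stand-in for the text read with readLines(); "Document" headers sit on
# lines 1 and 9 (an assumption for illustration only).
yourfile <- c("Document 1 of 100", "", "a2", "", "Paper A", "", "", "Mon",
              "Document 2 of 100", "", "b2", "", "Paper B", "", "", "Tue")
index <- grep("^Document [0-9]+", yourfile)

# Recycling pairs the first match with +2 and the second with +4 only.
recycled <- index + c(2, 4)

# Repeating each match once per offset gives every match both offsets.
paired <- rep(index, each = 2) + c(2, 4)

# The same idea for any number of offsets, without writing index + ... by hand.
offsets <- c(2, 4)
general <- as.vector(outer(offsets, index, "+"))

# Pull the selected lines out of the text.
yourfile[paired]
```

With rep(index, each = 2) the result is ordered match by match; c(index + 2, index + 4) returns the same line numbers grouped by offset instead.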




Josh


-- Bert

On Mon, Jul 11, 2011 at 11:19 AM, Joshua Wiley jwiley.ps...@gmail.com 
 wrote:

Try this (untested as I'm on my iPhone now):

index <- grep("Document+.", yourfile, value = FALSE)
index <- c(index + 2, index + 4)

You just need to make sure you avoid recycling, e.g.,

1:10 + c(2, 4) # not what you want

If you want a sufficient number of lines that manually writing
index + becomes cumbersome, you could use something like:

as.vector(sapply(c(2, 4), "+", e2 = index))

HTH,

Josh

On Jul 11, 2011, at 11:09, Simon Kiss sjk...@gmail.com wrote:

Josh, that's amazing. Is there any way to have it grab two  
different lines after the grep, say the second and the fourth  
line? There's some other information in the text file I'd like to  
grab.  I could do two separate commands, but I'd like to know if  
this could be done in one command...

Simon Kiss
On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:


If you know you can find the start of the document (say that line
always starts with Document...), then:

grep("Document+.", yourfile, value = FALSE) + 4

should give you the line number 4 lines after each line where Document
occurred.  No loop needed :)

On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss sjk...@gmail.com  
wrote:

Hi Josh,
Sorry for the insufficient introduction. This might work, but  
I'm not sure.
The file that I have includes up to 100 documents (Document 1,
Document 2, Document 3 ... Document 100) with the newspaper name
following 4 lines below each Document number.
I'm using readLines to get the text file into R and then trying
to use grep to get the newspaper name for each record. But your
idea of indexing the text object read into R with the line
number where the newspaper name is found is a good one.  I'll
just have to come up with a loop to tell R to get the 4th, 8th,
12th, 16th, line, etc.

I'll see if I can get that to work.
Simon
On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:


Dear Simon,

Maybe I don't understand properly... if you are doing this in R, can't
you just pick the line you want?

Josh

## print your data to clipboard
cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date",
    file = "clipboard")
## read data in, and only select the 4th line to pass to grep()
grep(pattern, x = readLines("clipboard")[4])


On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss sjk...@gmail.com  
wrote:

Dear colleagues,
I have a series of newspaper articles in a text file,  
downloaded from a text file.  They look as follows:


Document 1 of 100
\n
\n
\n
Newspaper Name
\n
\n
Day Date

I have a series of grep scripts that can extract the date and  
convert it to a date object, but I can't figure out how to  
grep the newspaper name.  There is no field ID attached to  
those lines. The best I can come up with would be to have the  
program grep the four lines following matching the pattern  
"Document [0-9]".  There is an argument to grep in unix
that can do this ... grep -A4 'pattern' infile > outfile, but I
don't know if there is an equivalent argument in R.







David Winsemius, MD
West Hartford, CT


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Bert Gunter

Josh:

(assuming you have interpreted correctly) You are *absolutely* right
-- I did not read carefully enough.

Does

index <- matrix(rep(grep("Document+.", yourfile, value = FALSE), each = 3)
                + c(0,2,4), ncol = 3, byrow = TRUE)

do it for you?

Sheepishly,

Bert
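
A small sketch of what the matrix form produces and how it could be used to pull out the lines themselves. The sample text, the pattern "^Document [0-9]+" and the names txt, hits, idx and records are assumptions made up for illustration.

```r
# Made-up text in which each record starts with a "Document" header line.
txt <- c("Document 1 of 100", "", "a2", "", "Paper A", "", "", "Mon",
         "Document 2 of 100", "", "b2", "", "Paper B", "", "", "Tue")
hits <- grep("^Document [0-9]+", txt)

# One row per match; columns hold the header line and the lines 2 and 4 below it.
idx <- matrix(rep(hits, each = 3) + c(0, 2, 4), ncol = 3, byrow = TRUE)

# Drop any record whose offsets would run past the end of the text.
idx <- idx[idx[, 3] <= length(txt), , drop = FALSE]

# Look the line numbers up in the text, keeping the same row/column layout.
records <- matrix(txt[as.vector(idx)], ncol = 3,
                  dimnames = list(NULL, c("header", "plus2", "plus4")))
records
```

Keeping one row per document makes it easy to turn the result into a data frame later, which the flat vector of line numbers does not.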




On Mon, Jul 11, 2011 at 12:33 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 On Jul 11, 2011, at 12:00, Bert Gunter gunter.ber...@gene.com wrote:

 Simon:

 Basic basic stuff (not grep -- the stuff thereafter) . Please read the
 docs, especially the tutorial,  An Intro to R.

 ... and Josh's solution can be shortened to (as he knows):

 index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)


 Really?  Won't the 2 and 4 get recycled so that every other element returned 
 from grep will have 2 or 4 added instead of 2 *and* 4?

 My understanding is that Simon has a single file with for example Document 1 
 on line 1 Document 2 on line 301 etc. And he wants both the 2nd and 4th lines 
 after each document, so lines 3, 5, 303, 305 but just doing + c(2,4) would 
 only give 3, 305.

 Josh

 -- Bert

 On Mon, Jul 11, 2011 at 11:19 AM, Joshua Wiley jwiley.ps...@gmail.com 
 wrote:
 Try this (untested as I'm on my iPhone now):

 index <- grep("Document+.", yourfile, value = FALSE)
 index <- c(index + 2, index + 4)

 You just need to make sure you avoid recycling, e.g.,

 1:10 + c(2, 4) # not what you want

 If you want a sufficient number of lines that manually writing index + 
 becomes cumbersome, you could use something like:

 as.vector(sapply(c(2, 4), "+", e2 = index))

 HTH,

 Josh

 On Jul 11, 2011, at 11:09, Simon Kiss sjk...@gmail.com wrote:

 Josh, that's amazing. Is there any way to have it grab two different lines 
 after the grep, say the second and the fourth line? There's some other 
 information in the text file I'd like to grab.  I could do two separate 
 commands, but I'd like to know if this could be done in one command...
 Simon Kiss
 On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:

 If you know you can find the start of the document (say that line
 always starts with Document...), then:

 grep("Document+.", yourfile, value = FALSE) + 4

 should give you the line number 4 lines after each line where Document
 occurred.  No loop needed :)

 On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss sjk...@gmail.com wrote:
 Hi Josh,
 Sorry for the insufficient introduction. This might work, but I'm not 
 sure.
 The file that I have includes up to 100 documents (Document 1, Document
 2, Document 3 ... Document 100) with the newspaper name following 4 lines
 below each Document number.
 I'm using readLines to get the text file into R and then trying to use
 grep to get the newspaper name for each record. But your idea of
 indexing the text object read into R with the line number where the
 newspaper name is found is a good one.  I'll just have to come up with a
 loop to tell R to get the 4th, 8th, 12th, 16th, line, etc.
 I'll see if I can get that to work.
 Simon
 On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:

 Dear Simon,

 Maybe I don't understand properly... if you are doing this in R, can't
 you just pick the line you want?

 Josh

 ## print your data to clipboard
 cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date",
     file = "clipboard")
 ## read data in, and only select the 4th line to pass to grep()
 grep(pattern, x = readLines("clipboard")[4])


 On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss sjk...@gmail.com wrote:
 Dear colleagues,
 I have a series of newspaper articles in a text file, downloaded from 
 a text file.  They look as follows:

 Document 1 of 100
 \n
 \n
 \n
 Newspaper Name
 \n
 \n
 Day Date

 I have a series of grep scripts that can extract the date and convert 
 it to a date object, but I can't figure out how to grep the newspaper 
 name.  There is no field ID attached to those lines. The best I can 
 come up with would be to have the program grep the four lines 
 following matching the pattern "Document [0-9]".  There is an
 argument to grep in unix that can do this ... grep -A4 'pattern'
 infile > outfile, but I don't know if there is an equivalent argument in
 R.

 Any thoughts.
 Yours, Simon Kiss

Re: [R] grep pattern

2011-05-25 Thread jim holtman

try this using strsplit:

> x <- round(runif(10)*100000, digits=0)
> y <- as.Date(x, origin="1970-01-01")
> str(y)
Class 'Date'  num [1:10] 26551 37212 57285 90821 20168 ...
> y1 <- as.character(y)
> str(y1)
 chr [1:10] "2042-09-11" "2071-11-19" "2126-11-04" "2218-08-30"
"2025-03-21" "2215-12-22" ...
> x <- strsplit(y1, '-')
> x[1:3]
[[1]]
[1] "2042" "09"   "11"

[[2]]
[1] "2071" "11"   "19"

[[3]]
[1] "2126" "11"   "04"

> x.1 <- sapply(x, '[', 3)
> str(x.1)
 chr [1:10] "11" "19" "04" "30" "21" "22" "24" "03" "31" "02"
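
For comparison, a short sketch of the strsplit approach next to two alternatives that avoid the intermediate list. The three fixed dates and the names day_split, day_format and day_substr are assumptions chosen so the result is reproducible; they are not from the original post.

```r
# Three fixed dates instead of random ones, so the result is reproducible.
y  <- as.Date(c("2042-09-11", "2071-11-19", "2126-11-04"))
y1 <- as.character(y)

# strsplit approach: split on "-" and keep the third piece of each element.
day_split <- sapply(strsplit(y1, "-", fixed = TRUE), "[", 3)

# Ask the Date object for its day directly; no string handling needed.
day_format <- format(y, "%d")

# Fixed positions also work, because yyyy-mm-dd always has the day at 9-10.
day_substr <- substr(y1, 9, 10)

day_split
```

All three return character vectors; wrap the result in as.integer() if numeric days are wanted.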



On Tue, May 24, 2011 at 10:19 AM, Kang Min ngokang...@gmail.com wrote:
 I have another question -

 I'd like to extract dates from a vector of yyyy-mm-dd, so I just want
 the dd.

 x <- round(runif(10)*100000, digits=0)
 y <- as.Date(x, origin="1970-01-01")

 I tried this based on the code that Jim provided, but it just printed
 the whole date. I think I just need to tweak it a little, but haven't
 been able to figure it out.

 y[grep("[[:digit:]]{2}$", y)]

 Thanks.
 Kang Min

 On May 23, 7:22 am, jim holtman jholt...@gmail.com wrote:
 If you want to only match names of length 6, you will have to use
 this pattern:

  > x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ", "ZAAZ", "ZAZ",
  +     "ZZAZ", "ZRITEZ")
  > # match exactly values of length 6
  > len6 <- "^Z[[:alpha:]]{4}Z$"
  > grep(len6, x)
  [1] 2 5 9

 On Sun, May 22, 2011 at 5:10 PM, Kang Min ngokang...@gmail.com wrote:
  Thanks!

  On May 21, 7:09 am, David Winsemius dwinsem...@comcast.net wrote:
  On May 20, 2011, at 11:57 AM, Kang Min wrote:

   Hi all,

   I'm trying to subset a pattern in a vector. Each argument has 6
   letters, and I need those that start with Z and end with Z.

   e.g.
   x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")

   I've looked up other discussions but still can't seem to find the
   answer.

  You may need to study the regex page a bit longer

  the ^ is the beginning of a string
  .+ will match an arbitrarily long string of anything
  and $ indicates the end of a string

   > x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")
   > grep("^Z.+Z$", x)
  [1] 2 5
   > grep("^Z.+Z$", x, value=TRUE)
  [1] "ZFHJKZ" "ZKFLPZ"

   Thanks.
   Kangmin

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


Re: [R] grep pattern

2011-05-24 Thread Kang Min

I have another question -

I'd like to extract dates from a vector of yyyy-mm-dd, so I just want
the dd.

x <- round(runif(10)*100000, digits=0)
y <- as.Date(x, origin="1970-01-01")

I tried this based on the code that Jim provided, but it just printed
the whole date. I think I just need to tweak it a little, but haven't
been able to figure it out.

y[grep("[[:digit:]]{2}$", y)]

Thanks.
Kang Min
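
The whole date is printed because grep() only reports which elements match; it does not cut the matching part out of them. A minimal sketch of the difference, using two made-up dates and assumed names (yc, whole, dd, dd_sub):

```r
# Two made-up dates, converted to text first.
y  <- as.Date(c("2011-05-24", "2011-06-03"))
yc <- as.character(y)

# grep() returns the positions of matching elements, so indexing with it
# returns those elements unchanged.
whole <- yc[grep("[[:digit:]]{2}$", yc)]

# regexpr() locates the match inside each string and regmatches() extracts it.
dd <- regmatches(yc, regexpr("[[:digit:]]{2}$", yc))

# Alternatively, replace the whole string by the captured final two digits.
dd_sub <- sub("^.*-([[:digit:]]{2})$", "\\1", yc)

dd
```

Note that regmatches() silently drops elements with no match, so sub() may be the safer choice when the result must stay aligned with the input.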

On May 23, 7:22 am, jim holtman jholt...@gmail.com wrote:
 If you want to only match names of length 6, you will have to use
 this pattern:

  > x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ", "ZAAZ", "ZAZ",
  +     "ZZAZ", "ZRITEZ")
  > # match exactly values of length 6
  > len6 <- "^Z[[:alpha:]]{4}Z$"
  > grep(len6, x)
  [1] 2 5 9

 On Sun, May 22, 2011 at 5:10 PM, Kang Min ngokang...@gmail.com wrote:
  Thanks!

  On May 21, 7:09 am, David Winsemius dwinsem...@comcast.net wrote:
  On May 20, 2011, at 11:57 AM, Kang Min wrote:

   Hi all,

   I'm trying to subset a pattern in a vector. Each argument has 6
   letters, and I need those that start with Z and end with Z.

   e.g.
   x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")

   I've looked up other discussions but still can't seem to find the
   answer.

  You may need to study the regex page a bit longer

  the ^ is the beginning of a string
  .+ will match an arbitrarily long string of anything
  and $ indicates the end of a string

   > x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")
   > grep("^Z.+Z$", x)
  [1] 2 5
   > grep("^Z.+Z$", x, value=TRUE)
  [1] "ZFHJKZ" "ZKFLPZ"

   Thanks.
   Kangmin


Re: [R] grep pattern

2011-05-22 Thread Kang Min

Thanks!

On May 21, 7:09 am, David Winsemius dwinsem...@comcast.net wrote:
 On May 20, 2011, at 11:57 AM, Kang Min wrote:

  Hi all,

  I'm trying to subset a pattern in a vector. Each argument has 6
  letters, and I need those that start with Z and end with Z.

  e.g.
  x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")

  I've looked up other discussions but still can't seem to find the
  answer.

 You may need to study the regex page a bit longer

 the ^ is the beginning of a string
 .+ will match an arbitrarily long string of anything
 and $ indicates the end of a string

  > x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")
  > grep("^Z.+Z$", x)
 [1] 2 5
  > grep("^Z.+Z$", x, value=TRUE)
 [1] "ZFHJKZ" "ZKFLPZ"



  Thanks.
  Kangmin


Re: [R] grep pattern

2011-05-22 Thread jim holtman

If you want to only match names of length 6, you will have to use this pattern:

> x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ", "ZAAZ", "ZAZ",
+ "ZZAZ", "ZRITEZ")
> # match exactly values of length 6
> len6 <- "^Z[[:alpha:]]{4}Z$"
> grep(len6, x)
[1] 2 5 9
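
To see why the length has to be part of the pattern, the open-ended pattern from earlier in the thread can be run on the same vector. This is a sketch only; the names any_len, len6_hits and no_regex are assumptions, and the last line shows a regex-free alternative that needs R 3.3.0 or later for startsWith() and endsWith().

```r
x <- c("ZFHSJK", "ZFHJKZ", "ZIOPWE", "ZLKJSD", "ZKFLPZ", "ZAAZ", "ZAZ",
       "ZZAZ", "ZRITEZ")

# ".+" accepts any number of characters, so the shorter strings match too.
any_len <- grep("^Z.+Z$", x)

# Exactly four letters between the two Z's restricts matches to length 6.
len6_hits <- grep("^Z[[:alpha:]]{4}Z$", x)

# The same selection without a regular expression.
no_regex <- which(nchar(x) == 6 & startsWith(x, "Z") & endsWith(x, "Z"))

x[len6_hits]
```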



On Sun, May 22, 2011 at 5:10 PM, Kang Min ngokang...@gmail.com wrote:
 Thanks!

 On May 21, 7:09 am, David Winsemius dwinsem...@comcast.net wrote:
 On May 20, 2011, at 11:57 AM, Kang Min wrote:

  Hi all,

  I'm trying to subset a pattern in a vector. Each argument has 6
  letters, and I need those that start with Z and end with Z.

  e.g.
  x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")

  I've looked up other discussions but still can't seem to find the
  answer.

 You may need to study the regex page a bit longer

 the ^ is the beginning of a string
 .+ will match an arbitrarily long string of anything
 and $ indicates the end of a string

  > x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")
  > grep("^Z.+Z$", x)
 [1] 2 5
  > grep("^Z.+Z$", x, value=TRUE)
 [1] "ZFHJKZ" "ZKFLPZ"



  Thanks.
  Kangmin

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


[R] grep pattern

2011-05-20 Thread Kang Min

Hi all,

I'm trying to subset a pattern in a vector. Each argument has 6
letters, and I need those that start with Z and end with Z.

e.g.
 x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")

I've looked up other discussions but still can't seem to find the
answer.

Thanks.
Kangmin
