I'd suggest doing it with multiple regular expressions -- you could construct a single regular expression for this, but I expect it would get quite complicated and possibly very slow.
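
For comparison, here is a rough sketch of what the single-expression route might look like. It assumes an R version that provides grepl() and Perl-style lookaheads via perl = TRUE; the names `lines_demo` and `all_three` and the sample strings are made up for illustration and are not part of the original question.

```r
# Hypothetical sample lines; only the first two contain all three words.
lines_demo <- c("thomas uses a program called perl",
                "perl is a program that thomas uses",
                "thomas uses perl")

# Each (?=.*word) is a zero-width lookahead anchored at the start of the line:
# it succeeds only if 'word' occurs somewhere later in that line.
# All three lookaheads must succeed for the line to match.
all_three <- "^(?=.*thomas)(?=.*perl)(?=.*program)"

# perl = TRUE selects the PCRE engine, which is needed for lookaheads.
has_all <- grepl(all_three, lines_demo, perl = TRUE)
has_all
```

This handles "all of the words", but "at least two of the three" would need an alternation over every pair of words, which is where a single expression starts to get unwieldy.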

The expression for "y" in the example below tabulates how many words matched for each line (i.e., line 2 matched 1 word, line 3 matched 3 words, and line 4 matched 2 words).


> x <- readLines("clipboard", -1)
> x
[1] "Is there a way to use regular expressions to capture two or more words in a "
[2] "sentence? For example, I wish to to find all the lines that have the words \"thomas\", "
[3] "\"perl\", and \"program\", such as \"thomas uses a program called perl\", or \"perl is a "
[4] "program that thomas uses\", etc."
> sapply(c("perl","program","thomas"), function(re) grep(re, x))
$perl
[1] 3

$program
[1] 3 4

$thomas
[1] 2 3 4

> unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=F)
[1] 3 3 4 2 3 4
> y <- table(unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=F))
> y

2 3 4
1 3 2
> which(y>=2)
3 4
2 3
>
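
The same counts can probably be had a little more directly. The sketch below assumes plain substring matching is what is wanted, as in the transcript above (so "program" would also match "programs"). It builds a logical matrix with grepl(), one column per word, and sums across rows. The names `words`, `hits`, and `n_matched` are made up for illustration, and `x` simply retypes the four lines read from the clipboard.

```r
# The four lines from the transcript above, retyped so the block is self-contained.
x <- c("Is there a way to use regular expressions to capture two or more words in a ",
       "sentence? For example, I wish to to find all the lines that have the words \"thomas\", ",
       "\"perl\", and \"program\", such as \"thomas uses a program called perl\", or \"perl is a ",
       "program that thomas uses\", etc.")
words <- c("perl", "program", "thomas")

# One row per line, one column per word; TRUE where the word occurs in the line.
# cbind() keeps this a matrix even if x has only one element.
hits <- do.call(cbind, lapply(words, grepl, x = x))

# Number of distinct words found on each line (lines with no match get 0).
n_matched <- rowSums(hits)

at_least_two <- which(n_matched >= 2)              # lines with two or more of the words
all_words    <- which(n_matched == length(words))  # lines with every word
```

Because n_matched has one entry per line, which() returns line numbers directly here. In the table-based version, the line numbers are the names printed by which(y>=2) (3 and 4), not the values underneath them (2 and 3), which are positions within the table.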

hope this helps,

Tony Plate

At Monday 05:59 PM 7/12/2004, Sangick Jeon wrote:


Hi,

Is there a way to use regular expressions to capture two or more words in a
sentence? For example, I wish to find all the lines that have the words "thomas",
"perl", and "program", such as "thomas uses a program called perl", or "perl is a
program that thomas uses", etc.


I'm sure this is a very easy task, and I would greatly appreciate any help. Thanks!

Sangick

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
