No, Lucene does not have a built-in query that uses regular expressions. It's trivial to write a custom Query class, similar to WildcardQuery, that does regular expression searching. In fact, I've created one and am contributing it to Lucene as soon as I can (slowly but surely).
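A minimal sketch of the idea, assuming it follows the WildcardQuery pattern: walk the sorted term dictionary, keep the terms that fully match a java.util.regex.Pattern, and OR the survivors together (my understanding is that WildcardQuery rewrites into a BooleanQuery of the matching terms). This does NOT use the Lucene API. The in-memory TreeSet stands in for the index's term enumeration, and the class and method names (RegexTermExpansion, expand, literalPrefix) are hypothetical. The literal-prefix trick lets the scan seek to the first candidate term and stop early instead of testing every term in the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class RegexTermExpansion {

    // Longest literal string every match must start with ("" if unknown).
    static String literalPrefix(String regex) {
        if (regex.indexOf('|') >= 0) return ""; // alternation can bypass any prefix
        String meta = ".^$*+?()[]{}|\\";
        int i = 0;
        while (i < regex.length() && meta.indexOf(regex.charAt(i)) < 0) i++;
        String prefix = regex.substring(0, i);
        // "ab*c": the quantifier makes the last literal optional, so drop it.
        if (i < regex.length() && "*?{".indexOf(regex.charAt(i)) >= 0 && !prefix.isEmpty()) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        return prefix;
    }

    // Returns the terms (in sorted order) that the whole regex matches.
    // Hypothetical stand-in: a real Query would enumerate the index's terms.
    static List<String> expand(NavigableSet<String> terms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        String prefix = literalPrefix(regex);
        List<String> matches = new ArrayList<>();
        // Seek to the first term >= prefix; all prefixed terms are contiguous.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) break; // past the prefix range: stop
            if (pattern.matcher(term).matches()) matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
                "lucene", "lucid", "nutch", "regex", "regular", "wildcard"));
        System.out.println(expand(terms, "luc.*"));        // [lucene, lucid]
        System.out.println(expand(terms, "reg(ex|ular)")); // [regex, regular]
    }
}
```

Each returned term would then become one optional clause in the rewritten query, so a very broad pattern can expand into many clauses; that is the same trade-off wildcard queries have.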

As for how that would work with Nutch: once Lucene or your own code has this capability, you'd need to integrate it into Nutch as a plugin.

    Erik

On 31 Oct 2005, at 11:45, Rajan, Renuka wrote:

Hello all

I am a newbie to Nutch and Lucene and am experimenting with this
combination to 'scrape' web pages.  To this end, I need to use regular
expressions in combination with Lucene to search the pages fetched by
Nutch.

Does Lucene support regular expression searches? The book 'Lucene in
Action' covers wildcard queries but says little about regex.

Thanks in advance for your help
Renuka

-----Original Message-----
From: Zaheed Haque [mailto:[EMAIL PROTECTED]
Sent: Monday, October 31, 2005 9:58 AM
To: [email protected]
Subject: Re: Jira - Nutch 48 - did you mean patch

:-)

Yep! Works Great!

/Z

On 10/31/05, Byron Miller <[EMAIL PROTECTED]> wrote:

Brain fart: I meant mozdex.com, using slashdot.org as an
example.

http://www.mozdex.com/search.jsp?query=slashdt

Try that one.

--- Zaheed Haque <[EMAIL PROTECTED]> wrote:


I just tried

http://slashdot.org/search.pl?query=slashdt

It doesn't work! Or maybe the URL above is not correct?

Cheers
Zaheed

On 10/31/05, Byron Miller <[EMAIL PROTECTED]> wrote:

I got this to work this evening. There was a problem with the
patch on the system I was working on.

Feel free to check it out on slashdot.org. You can try
searching for "slashdt" and it should recommend the correct site :)

-byron

--- Byron Miller <[EMAIL PROTECTED]> wrote:


Anyone using this patch?

http://issues.apache.org/jira/browse/NUTCH-48

I would like to incorporate this, but I'm not having much
luck getting the patch to apply over the svn release
(branch .7).

-byron