No, Lucene does not have a built-in query that uses regular
expressions. It is straightforward, though, to write a custom Query
class modeled on WildcardQuery that does regular expression
searching. In fact, I've written one and will contribute it to
Lucene as soon as I can (slowly but surely).
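To make the idea concrete, here is a minimal sketch of the approach in
plain Java, with no Lucene dependency. It assumes the same strategy a
wildcard-style query uses: walk the terms in the index and keep the
documents whose terms match the pattern. The class RegexTermSketch and
its methods are hypothetical names made up for this illustration. They
are not Lucene or Nutch API and not the code mentioned above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical toy index for illustration only; not a Lucene class. */
public class RegexTermSketch {
    // term -> ids of the documents containing it
    // (a stand-in for a real index's term dictionary)
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Lowercase the text, split it on non-word characters, index each term. */
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Return ids of documents with at least one term that fully matches the regex. */
    public Set<Integer> search(String regex) {
        Pattern pattern = Pattern.compile(regex);
        Set<Integer> hits = new TreeSet<>();
        // Enumerate every indexed term and test it against the pattern.
        for (Map.Entry<String, Set<Integer>> entry : postings.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                hits.addAll(entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        RegexTermSketch index = new RegexTermSketch();
        index.add(1, "Nutch fetches web pages");
        index.add(2, "Lucene indexes and searches text");
        index.add(3, "Regular expressions match patterns");
        System.out.println(index.search("(nutch|lucene)")); // prints [1, 2]
        System.out.println(index.search("search.*"));       // prints [2]
    }
}
```

Note that the pattern is matched against individual terms, not against
the raw page text, so a regex spanning several words will not match in
this sketch. Scanning every term is also linear in the size of the term
dictionary, which is the main cost to keep in mind with this approach.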
As for how that would work with Nutch - you'd need to integrate it
using a plugin once Lucene or your custom code has this capability.
Erik
On 31 Oct 2005, at 11:45, Rajan, Renuka wrote:
Hello all
I am a newbie to Nutch and Lucene and am experimenting with this
combination to 'scrape' web pages. To this end, I need to use regular
expressions in combination with Lucene to search the pages fetched by
Nutch.
Does Lucene support regular expressions? The book 'Lucene in Action'
talks about wildcard queries but not so much about regex. Does Lucene
support regex searches?
Thanks in advance for your help
Renuka
-----Original Message-----
From: Zaheed Haque [mailto:[EMAIL PROTECTED]
Sent: Monday, October 31, 2005 9:58 AM
To: [email protected]
Subject: Re: Jira - Nutch 48 - did you mean patch
:-)
Yep! Works Great!
/Z
On 10/31/05, Byron Miller <[EMAIL PROTECTED]> wrote:
brainfart, meant mozdex.com using slashdot.org as an example
http://www.mozdex.com/search.jsp?query=slashdt
Try that one.
--- Zaheed Haque <[EMAIL PROTECTED]> wrote:
I just tried
http://slashdot.org/search.pl?query=slashdt
doesn't work! or maybe the URL above is not correct?
Cheers
Zaheed
On 10/31/05, Byron Miller <[EMAIL PROTECTED]> wrote:
I got this to work this evening.. was a problem with
patch on the system i was working on..
feel free to check it out on slashdot.org.. you can
try an example of searching for "slashdt" and it
should recommend the good site :)
-byron
--- Byron Miller <[EMAIL PROTECTED]> wrote:
Anyone using this patch?
http://issues.apache.org/jira/browse/NUTCH-48
I would like to incorporate this, but not having much
luck getting the patch to install over svn release
(branch .7)
-byron