Sandeep Natarajan wrote:
Hi,
We are using Nutch to build a recommendation engine for the Oregon State University library. This is a research project.
I have a question about Nutch. I couldn't find any information on searching with the "-inurl:" option in Nutch.
Could you please tell me if Nutch supports this?
Not out of the box, but it wouldn't be too difficult to implement as a pair of index/query plugins. Please take a look at the index-more plugin.
Each Nutch/Lucene document already contains a "url" field, which is indexed, tokenized and stored, so you would translate "inurl"-type queries into a Lucene phrase query on this field. The trick, however, is to remember that by default NutchAnalyzer produces tokens like "http http-www www nutch org", i.e. it combines some of the common terms to reduce their frequency. Your phrase query plugin would need to take this into account.
So perhaps it would be easier to just add another field for the URL, analyzed with a simple "letters and digits" Lucene analyzer, and run the phrase query against that field.
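
A minimal sketch of what such a "letters and digits" field would look like, written in plain Java with no Nutch or Lucene dependency. The class and method names (InurlSketch, tokenize, matchesInurl) are hypothetical and only illustrate the idea: the URL is split into lowercase runs of letters and digits, and an "inurl" query matches when its tokens appear consecutively and in order, which is roughly what a phrase query on that field would check. A real plugin would use Lucene's own analyzer and query classes instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InurlSketch {

    // Split text into lowercase runs of letters and digits. Every other
    // character (":", "/", ".", "-", ...) only acts as a separator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Phrase semantics (assumed here): the query tokens must occur in the
    // URL tokens consecutively and in the same order.
    static boolean matchesInurl(String url, String queryText) {
        List<String> query = tokenize(queryText);
        if (query.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokenize(url), query) >= 0;
    }

    public static void main(String[] args) {
        String url = "http://www.nutch.org/docs/index.html";
        System.out.println(tokenize(url));
        System.out.println(matchesInurl(url, "nutch.org"));
        System.out.println(matchesInurl(url, "org.nutch"));
    }
}
```

Because the query text goes through the same tokenizer as the URL, no combined tokens such as "http-www" appear on either side, so the query plugin has nothing extra to account for.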
--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
