Re: [Nutch-dev] Comments on summaries and score tuning

Doug Cutting Fri, 14 May 2004 11:02:49 -0700

Luke Baker wrote:

I think I've seen Doug say that the query options were meant to be simple at first and hopefully add more advanced things later. One thing related to this that I would be very interested in using would be something equivalent to the [inurl:] query that Google provides.

I intend to implement this soon.

Over the next month I hope to add to Nutch:

- an extensible mechanism for handling different content types. Initially I'll just implement text/html and perhaps text/plain, but it should be easy to add handlers for, say, application/pdf or application/msword.

- an extensible mechanism for determining what Lucene fields are indexed with each page. For HTML documents, this will be able to access a DOM tree for the page, and hence be able to index arbitrary metadata.

- a query extension mechanism. My idea here is that the query parser should make clauses prefixed by a keyword and a colon (e.g., "inurl:") accessible to an extensible query translator. So you should be able to, for example, easily add a method that, when "inurl:" is present in a query, adds a clause to the query that searches just the url field. In fact, that'll probably be the built-in demonstration of the extension mechanism. But one can also use this to query metadata indexed with the above mechanism.
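To make the proposed query extension point concrete, here is a minimal sketch, assuming a registry that maps a prefix such as "inurl" to a translator object. Everything in it is hypothetical and is not Nutch's actual API: the names QueryTranslator, Clause, register and parse, and the field names "content" and "url", are all invented for illustration. Plain Clause records stand in for real Lucene query clauses so the example runs without any dependencies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryExtensionSketch {

    /** Hypothetical stand-in for a Lucene clause: the field to search and the term to match. */
    public record Clause(String field, String term) {
        @Override
        public String toString() { return field + ":" + term; }
    }

    /** Hypothetical extension point: turns the value after "prefix:" into query clauses. */
    public interface QueryTranslator {
        void translate(String value, List<Clause> out);
    }

    private final Map<String, QueryTranslator> translators = new HashMap<>();

    /** Registers a translator for a prefix such as "inurl" (without the colon). */
    public void register(String prefix, QueryTranslator translator) {
        translators.put(prefix, translator);
    }

    /**
     * Splits the query on whitespace. A "prefix:value" token with a registered prefix is
     * handed to that prefix's translator; any other token becomes a plain clause against
     * an assumed default "content" field.
     */
    public List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue;               // empty query yields no clauses
            int colon = token.indexOf(':');
            if (colon > 0) {
                QueryTranslator t = translators.get(token.substring(0, colon));
                if (t != null) {
                    t.translate(token.substring(colon + 1), clauses);
                    continue;
                }
            }
            clauses.add(new Clause("content", token));  // unknown prefixes are left as plain terms
        }
        return clauses;
    }

    public static void main(String[] args) {
        QueryExtensionSketch parser = new QueryExtensionSketch();
        // The "inurl:" demonstration: restrict the term to the url field only.
        parser.register("inurl", (value, out) -> out.add(new Clause("url", value)));
        System.out.println(parser.parse("nutch inurl:apache")); // prints [content:nutch, url:apache]
    }
}
```

The same registry would let a plugin expose indexed metadata, e.g. registering a hypothetical "lang" prefix that emits a clause against a "lang" field, without the core parser knowing about it.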

Doug


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
