I think I've seen Doug say that the query options were meant to be simple at first, with more advanced features hopefully added later. One related feature I would be very interested in using is something equivalent to the [inurl:] query that Google provides.
I intend to implement this soon.
Over the next month I hope to add to Nutch:
- an extensible mechanism for handling different content types. Initially I'll just implement text/html and perhaps text/plain, but it should be easy to add handlers for, say, application/pdf or application/msword.
- an extensible mechanism for determining what Lucene fields are indexed with each page. For HTML documents, this will be able to access a DOM tree for the page, and hence be able to index arbitrary metadata.
- a query extension mechanism. My idea here is that the query parser should make clauses following colons (e.g., "inurl:") in a query accessible to an extensible query translator. So you should be able to, for example, easily add a method that, when "inurl:" is present in a query, adds a clause to the query that searches just the url field. In fact, that'll probably be the built-in demonstration of the extension mechanism. But one can also use this to query metadata indexed with the above mechanism.
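A minimal sketch of how such a query translator might look, in plain Java with no Lucene dependency. Every name here (QueryExtensionSketch, ClauseHandler, Clause, register, translate, and the default field "content") is hypothetical and chosen only for illustration; none of it is Nutch's actual API. Only the "inurl:" prefix and the url field come from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryExtensionSketch {

    /** One translated clause: the index field to search and the term to match. */
    static final class Clause {
        final String field;
        final String term;

        Clause(String field, String term) {
            this.field = field;
            this.term = term;
        }

        @Override
        public String toString() {
            return field + ":" + term;
        }
    }

    /** Hypothetical extension point: turns the text after "prefix:" into a clause. */
    interface ClauseHandler {
        Clause translate(String value);
    }

    /** Assumed name of the field searched when no known prefix is present. */
    static final String DEFAULT_FIELD = "content";

    private final Map<String, ClauseHandler> handlers = new HashMap<>();

    /** Registers a handler for clauses of the form "prefix:value". */
    void register(String prefix, ClauseHandler handler) {
        handlers.put(prefix, handler);
    }

    /** Splits the query on whitespace and routes each token to its handler. */
    List<Clause> translate(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query yields a single empty token
            }
            int colon = token.indexOf(':');
            ClauseHandler handler =
                colon > 0 ? handlers.get(token.substring(0, colon)) : null;
            if (handler != null && colon < token.length() - 1) {
                // Known prefix with a non-empty value: let the extension decide.
                clauses.add(handler.translate(token.substring(colon + 1)));
            } else {
                // No colon, unknown prefix, or empty value: search the default field.
                clauses.add(new Clause(DEFAULT_FIELD, token));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        QueryExtensionSketch translator = new QueryExtensionSketch();
        // The "inurl:" demonstration: search just the url field.
        translator.register("inurl", value -> new Clause("url", value));
        System.out.println(translator.translate("nutch inurl:apache"));
    }
}
```

A handler for metadata indexed by the field mechanism could presumably be registered the same way, with the prefix mapped to whatever field name that metadata was indexed under.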
Doug
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
