Hi Andrzej,
I just tried your PruneIndexTool and I'm a little bit confused:
Hi there,
I will be committing a newer version of the tool soon (tomorrow or on Monday).
- using the webfrontend I have a query "wordA wordB wordC" which returns 2 results with different URLs.
One very important point: PruneIndexTool uses a DIFFERENT query syntax from the Nutch web frontend. The tool expects Lucene QueryParser syntax; please see the javadoc comments for an example.
- Then I tried to remove the pages using PruneIndex: content: "wordA wordB wordC"
First of all, there must be no space between the field name, the colon, and the query term. I assume that's just a transcription error and not the real query...
Anyway, this query matches only documents that contain "wordA wordB wordC" as an exact phrase in the content field. That's probably not what you wanted... you probably wanted something like:
content:(wordA OR wordB OR wordC)
Am I right?
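
To make the difference between the two queries concrete, here is a small sketch in plain Java. It is NOT Lucene code and does not use PruneIndexTool: the class QuerySemantics and its methods are hypothetical names made up for this illustration, and the whitespace tokenizer is only a stand-in for a real analyzer. It just mimics what "exact phrase" versus "any of these terms" means for a piece of content.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustration only: a hypothetical helper, not part of Lucene or Nutch.
final class QuerySemantics {

    // Lowercase and split on whitespace; a crude stand-in for an analyzer.
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Mimics content:"wordA wordB wordC":
    // all terms must occur consecutively and in the given order.
    static boolean matchesPhrase(String content, String phrase) {
        List<String> doc = tokenize(content);
        List<String> terms = tokenize(phrase);
        if (terms.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(doc, terms) >= 0;
    }

    // Mimics content:(wordA OR wordB OR wordC):
    // a single matching term anywhere in the content is enough.
    static boolean matchesAny(String content, String... terms) {
        List<String> doc = tokenize(content);
        for (String term : terms) {
            if (doc.contains(term.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String page = "wordA some text wordC then wordB";
        // The words are all present, but not as one consecutive phrase.
        System.out.println(matchesPhrase(page, "wordA wordB wordC")); // false
        System.out.println(matchesAny(page, "wordA", "wordB", "wordC")); // true
    }
}
```

So a page that contains all three words scattered through its text is hit by the OR form but not by the phrase form.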
--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
