Hi,
I think I found my mistake:
content:"wordA wordB"
returns the two pages, and these are the results highlighted in the web frontend.
content:"wordA wordB wordC"
returns the same results (without showing "wordC" on the results page). The first result has a score above 2000 and the second a score of 0.11!
ok - thanks!
Michael
Michael Nebel wrote:
Hi Andrzej,
(I was too stupid to set the correct subject last time, so I am correcting it now.)
- Using the web frontend, I have a query "wordA wordB wordC" which returns 2 results with different URLs.
A very important thing is that PruneIndexTool uses a DIFFERENT syntax for queries than the Nutch web frontend. The syntax for the tool is Lucene QueryParser syntax - please see the javadoc comments for an example.
As far as I can tell, I used the Lucene query syntax, but that did not help either. Am I looking at the wrong page? http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
BTW: have you implemented a "not" query? I want to remove all documents with 'lang:(NOT "en" and NOT "de")', but the output does not look right. I have French and Danish pages in my test system. Queries like
lang:"fr" or lang:"da"
or
lang:"fr" or lang:"da"
show them.
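
A possible explanation for the "not" query, offered as a sketch rather than a diagnosis: in Lucene, a query made up only of prohibited (NOT) clauses selects nothing, because NOT removes documents from a positively matched set and here there is no such set. The Lucene QueryParser also expects the operators AND, OR and NOT in upper case. The toy model below is plain Python, not the Lucene API; the `docs` table and the `select` helper are made up for illustration.

```python
# Toy model (not Lucene): document id -> value of the "lang" field.
docs = {1: "en", 2: "de", 3: "fr", 4: "da"}

def select(positive, negative):
    """Return ids matched by the positive values minus those hit by NOT values."""
    matched = {d for d, lang in docs.items() if lang in positive}
    excluded = {d for d, lang in docs.items() if lang in negative}
    return matched - excluded

# Only NOT clauses: there is no positive set to subtract from.
only_not = select(set(), {"en", "de"})

# Positive clauses for the languages to delete.
fr_or_da = select({"fr", "da"}, set())
```

Under this reading, listing the unwanted languages positively, as in the lang:"fr" / lang:"da" queries above, is the form that selects documents.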
- Then I tried to remove the pages using PruneIndex: content: "wordA wordB wordC"
First of all, there must be no space between the field name, colon, and the query term. I assume it's just a transcription error, and not the real query...
I had no spaces, but I just tried it: with spaces I get the same result ?-0 Checking the fields, the result really is the same with and without spaces.
Anyway, this query means that you want to match all documents, which contain "wordA wordB wordC" as an exact phrase in the content field. Probably not what you wanted... you probably wanted something like:
No - I want to delete the page that contains all three words. The web frontend shows two different results (in fact it's a special test case I built :-).
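
To illustrate the difference between the two readings: in Lucene QueryParser syntax, content:"wordA wordB wordC" is a phrase query, while +content:wordA +content:wordB +content:wordC requires all three words in any order. This is only an illustration of that distinction, not the example from the quoted message. The sketch is plain Python, not the Lucene API; the helper names and sample pages are made up, and it splits on whitespace instead of using a real analyzer.

```python
def has_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively, in order, in `text`."""
    words, target = text.split(), phrase.split()
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))

def has_all_terms(text, terms):
    """True if every term occurs somewhere in `text`, in any order."""
    words = set(text.split())
    return all(term in words for term in terms)

# Hypothetical test pages.
page_a = "wordA wordB wordC"
page_b = "wordC some wordA filler wordB"

phrase_hits = [has_phrase(p, "wordA wordB wordC") for p in (page_a, page_b)]
all_term_hits = [has_all_terms(p, ["wordA", "wordB", "wordC"]) for p in (page_a, page_b)]
```

Here only page_a satisfies the phrase query, while both pages satisfy the all-terms query.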
Concerning the example of your original posting:
content:wordA +url:"abc"
returns the same results as
url:"abc"
The wordA makes no difference.
+content:"wordA" +url:"abc"
works as expected.
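
This behaviour matches Lucene's usual clause semantics: a clause without "+" is optional, and once a query contains a required clause, optional clauses influence only the score, not which documents match. The sketch below is a toy model in plain Python, not the Lucene API; the `search` helper, the sample pages, the substring matching and the one-point-per-match scoring are all assumptions made for illustration.

```python
def search(docs, required=(), optional=()):
    """Toy model: required clauses filter, optional clauses only add to the score."""
    hits = []
    for name, fields in docs.items():
        def hit(clause):
            field, term = clause
            return term in fields.get(field, "")
        if required and not all(hit(c) for c in required):
            continue  # a required clause is missing
        if not required and not any(hit(c) for c in optional):
            continue  # with no required clause, at least one optional must match
        score = sum(1 for c in list(required) + list(optional) if hit(c))
        hits.append((name, score))
    return sorted(hits, key=lambda h: -h[1])

# Hypothetical index contents.
docs = {
    "p1": {"content": "wordA wordB", "url": "http://example.com/abc/1"},
    "p2": {"content": "other text", "url": "http://example.com/abc/2"},
    "p3": {"content": "wordA", "url": "http://example.com/xyz"},
}

mixed = search(docs, required=[("url", "abc")], optional=[("content", "wordA")])
url_only = search(docs, required=[("url", "abc")])
both_required = search(docs, required=[("content", "wordA"), ("url", "abc")])
```

In this model `mixed` and `url_only` contain the same pages (p1 and p2), differing only in score, while `both_required` contains p1 alone.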
I think I'm making a mistake, but which one?
Regards
Michael
PS.: But the tool itself is a great idea! Thanks!
------------------------------------------------------- This SF.Net email is sponsored by: InterSystems CACHE FREE OODBMS DOWNLOAD - A multidimensional database that combines robust object and relational technologies, making it a perfect match for Java, C++,COM, XML, ODBC and JDBC. www.intersystems.com/match8 _______________________________________________ Nutch-developers mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/nutch-developers
