I had to edit DateQueryFilter.java (in
src/plugin/query-more/src/java/org/apache/nutch/searcher/more/DateQueryFilter.java)
so that a query consisting of only a date clause would return results.
The relevant line is:
rangeQuery.setBoost(0.0f); // trigger filterization
I changed 0.0f to 1.0f.
More generally, I learned that it doesn't matter whether a query works in
Lucene; there has to be support for it somewhere in the Nutch query code.
I made the same change to TypeQueryFilter.java.
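To illustrate why the boost value matters, here is a toy model in plain Java. It is not the Nutch or Lucene code: Clause, split and canReturnHits are made-up names, and the rule it encodes (a clause with boost 0.0f is pulled out of the scored query and applied only as a filter) is an assumption based on the "trigger filterization" comment. Under that assumption, a query made of nothing but a zero-boost date clause has no scored part left, so it cannot match anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterizationSketch {

    /** Stand-in for a required query clause; not a Lucene or Nutch class. */
    static final class Clause {
        final String field;
        final String term;
        final float boost;

        Clause(String field, String term, float boost) {
            this.field = field;
            this.term = term;
            this.boost = boost;
        }
    }

    /** Holds the two halves of a query after the assumed optimizer pass. */
    static final class Split {
        final List<Clause> scored = new ArrayList<Clause>();
        final List<Clause> filters = new ArrayList<Clause>();
    }

    /** Assumed rule: a boost of exactly 0.0f turns a clause into a filter. */
    static Split split(List<Clause> clauses) {
        Split result = new Split();
        for (Clause c : clauses) {
            if (c.boost == 0.0f) {
                result.filters.add(c);
            } else {
                result.scored.add(c);
            }
        }
        return result;
    }

    /** Assumption: filters only narrow scored hits, so an empty scored part yields nothing. */
    static boolean canReturnHits(List<Clause> clauses) {
        return !split(clauses).scored.isEmpty();
    }

    public static void main(String[] args) {
        List<Clause> dateOnlyOld = Arrays.asList(new Clause("date", "20060801", 0.0f));
        List<Clause> dateOnlyNew = Arrays.asList(new Clause("date", "20060801", 1.0f));
        List<Clause> mixed = Arrays.asList(
                new Clause("content", "nutch", 1.0f),
                new Clause("date", "20060801", 0.0f));

        System.out.println(canReturnHits(dateOnlyOld)); // false
        System.out.println(canReturnHits(dateOnlyNew)); // true
        System.out.println(canReturnHits(mixed));       // true
    }
}
```

In this model the mixed query still works with the original 0.0f boost, which would fit a date clause being fine alongside other terms but not on its own.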
I also added a TitleQueryFilter, since I found that there wasn't any code
for querying the title field. All I did was copy URLQueryFilter.java and replace
super("url"); with super("title");
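The shape of that change can be sketched as follows. So that it compiles on its own, this uses a made-up FieldFilterStandIn class in place of the real Nutch base class (which I believe is FieldQueryFilter, but treat that as an assumption); the stand-in only records which field a filter claims. The two subclasses differ in nothing but the string passed to super(...).

```java
public class TitleFilterSketch {

    /** Hypothetical stand-in for the Nutch base class; it only remembers the field name. */
    static abstract class FieldFilterStandIn {
        private final String field;

        protected FieldFilterStandIn(String field) {
            this.field = field;
        }

        String getField() {
            return field;
        }

        /** True if this filter claims clauses written as field:term. */
        boolean handles(String clauseField) {
            return field.equals(clauseField);
        }
    }

    /** Mirrors the existing filter: the field name is the only thing it sets. */
    static class URLQueryFilter extends FieldFilterStandIn {
        URLQueryFilter() {
            super("url");
        }
    }

    /** The copy, with "url" replaced by "title". */
    static class TitleQueryFilter extends FieldFilterStandIn {
        TitleQueryFilter() {
            super("title");
        }
    }

    public static void main(String[] args) {
        FieldFilterStandIn title = new TitleQueryFilter();
        System.out.println(title.getField());        // title
        System.out.println(title.handles("title"));  // true
        System.out.println(title.handles("url"));    // false
    }
}
```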
HTH.
Ben
On 8/2/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
I am unable to query fields in my index in the way that has been
suggested. I used Luke to examine my index, and the following fields
exist:
anchor, boost, content, contentLength, date, digest, host, lastModified,
primaryType, segment, site, subType, title, type, url
However, when I do a search using one of the fields, followed by a
colon, an incorrect result is returned. I used Luke to find the top term
in the date field which is '20060801'. I then searched using the
following query:
date: 20060801
Unfortunately, nothing was returned. The correct plugins are enabled,
here is an excerpt from my nutch-site.xml:
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|oo|pdf|msword|mspowerpoint|rtf|zip)|index-(basic|more)|query-(more|site|stemmer|url)|summary-basic|scoring-opic</value>
<description>Regular expression naming plugin directory names to
include. Any plugin not matching this expression is excluded.
In any case you need at least include the nutch-extensionpoints plugin. By
default Nutch includes crawling just HTML and plain text via HTTP,
and basic indexing and search plugins.
</description>
</property>
Any ideas? I'm not the only one having this problem; I saw an earlier
mailing list post about it but couldn't find any resolution. Thanks,
Matt