If it's showing up using Luke, the indexing filter is probably fine.
You can try putting print statements into the query-filter. Print
out both the input query and the output query, and see if
the numbers are being filtered out somewhere.
You might want to see what's happening in Query.java in the
parse method also.
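
Roughly what I mean by the print statements, as a standalone sketch. This
is not Nutch code: QueryTrace, traced and dropNumericTokens are made-up
names, and the "filter" is a plain string-to-string function standing in
for whatever step you suspect. The idea is only to log what goes into a
step and what comes out, so you can see where the number disappears.

```java
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Nutch: wraps a query-rewriting step
// so that its input and output are printed side by side.
class QueryTrace {

    // Returns a wrapper that logs the step's input and output to stderr.
    static UnaryOperator<String> traced(String name, UnaryOperator<String> step) {
        return in -> {
            String out = step.apply(in);
            System.err.println(name + " in=[" + in + "] out=[" + out + "]");
            return out;
        };
    }

    // Assumed misbehaving step, for illustration only: drops every
    // whitespace-separated token that consists solely of digits.
    static String dropNumericTokens(String query) {
        StringBuilder sb = new StringBuilder();
        for (String tok : query.trim().split("\\s+")) {
            if (tok.matches("\\d+")) {
                continue; // the number is silently lost here
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(tok);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UnaryOperator<String> step = traced("suspectFilter", QueryTrace::dropNumericTokens);
        // The logged output shows whether 20060801 survived the step.
        step.apply("date: 20060801 test");
    }
}
```

In the real code you would put the equivalent print lines around the
query filter and the parse method, and compare the two strings.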
Howie
Howie,
I inspected my index using Luke and 20060801 shows up several times in
the index. I'm unable to query pretty much any field. Several people seem
to be having the same problem. Does anyone know what's going on?
This is one of the last things I have to resolve to have Nutch deployed
successfully at my organization. Unfortunately, Friday is my last day. Can
anyone offer any assistance??
Thanks,
Matt
Howie Wang wrote:
I think that I have problems querying for numbers and
words with digits in them. Now that I think of it, is it
possible it has something to do with the stemming in
either the query filter or indexing? In either case, I would
print out the text that is being indexed and the phrases
added to the query. You could also use Luke to inspect
your index and see whether 20060801 shows up anywhere.
Howie
I tried looking for a page that had the date 20060801 and the text "test"
in the page. I tried the following:
date: 20060801 test
and
date 20060721-20060803 test
Neither worked. Any ideas?
Matt
Matthew Holt wrote:
Thanks Jake,
However, it seems to me that it makes most sense that a query should
return all pages that match the query, instead of acting as a content
filter. However, I know it's something easy to suggest when you're not
having to implement it, so just a suggestion.
Matt
Vanderdray, Jacob wrote:
Try querying with both the date and something you'd expect to find in
the content. The field query filter is just a filter. It only
restricts your results to things that match the basic query and have the
contents you require in the field. So if you query for "date:2006080
text" you'll be searching for documents that contain "text" in one of
the default query fields and have the value 2006080 in the date field.
Leaving out text in that example would essentially be asking for
nothing in the default fields and 2006080 in the date field which is
why it doesn't return any results.
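
To make that concrete, here is a toy model of the behaviour described
above. It is not the Nutch implementation: FieldFilterModel, doc and
search are invented for illustration, documents are plain maps, and the
"default fields" are reduced to a single content field. The point is
only that the field clause narrows the hits of the basic query, so an
empty basic query leaves nothing to narrow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; all names here are hypothetical.
class FieldFilterModel {

    // Builds a document with a content field and a date field.
    static Map<String, String> doc(String content, String date) {
        Map<String, String> d = new HashMap<>();
        d.put("content", content);
        d.put("date", date);
        return d;
    }

    // Returns the indexes of documents whose content contains `term` as a
    // whole word AND whose date field equals `date` (null means no date filter).
    // An empty term models "nothing in the default fields" and matches nothing.
    static List<Integer> search(List<Map<String, String>> docs, String term, String date) {
        List<Integer> hits = new ArrayList<>();
        if (term == null || term.isEmpty()) {
            return hits; // no basic query, so the filter has nothing to restrict
        }
        for (int i = 0; i < docs.size(); i++) {
            Map<String, String> d = docs.get(i);
            List<String> words = Arrays.asList(d.getOrDefault("content", "").split("\\s+"));
            boolean matchesBasicQuery = words.contains(term);
            boolean passesFieldFilter = date == null || date.equals(d.get("date"));
            if (matchesBasicQuery && passesFieldFilter) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
                doc("this is a test page", "20060801"),
                doc("another test page", "20060715"));
        System.out.println(search(docs, "test", "20060801")); // term plus date filter
        System.out.println(search(docs, "", "20060801"));     // date filter alone
    }
}
```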
Hope that helps,
Jake.
-----Original Message-----
From: Matthew Holt [mailto:[EMAIL PROTECTED]
Sent: Wed 8/2/2006 4:58 PM
To: [email protected]
Subject: Querying Fields
I am unable to query fields in my index in the method that has been
suggested. I used Luke to examine my index and the following field
types exist:
anchor, boost, content, contentLength, date, digest, host,
lastModified, primaryType, segment, site, subType, title, type, url
However, when I do a search using one of the fields, followed by a
colon, an incorrect result is returned. I used Luke to find the top
term in the date field which is '20060801'. I then searched using the
following query:
date: 20060801
Unfortunately, nothing was returned. The correct plugins are enabled,
here is an excerpt from my nutch-site.xml:
<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|oo|pdf|msword|mspowerpoint|rtf|zip)|index-(basic|more)|query-(more|site|stemmer|url)|summary-basic|scoring-opic</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>
Any ideas? I'm not the only one having this problem; I saw an
earlier mailing list post but couldn't find any resolution... Thanks,
Matt