I had the following set in nutch-site.xml during the crawl:

<property>
 <name>plugin.includes</name>
<value>protocol-(httpclient|http|file|ftp|file)|urlfilter-regex|parse-(text|html|js|msword|pdf|rss|ext)|index-(basic|more)|query-(basic|site|url|more)</value>
 <description>Regular expression naming plugin directory names to
 include.  Any plugin not matching this expression is excluded.  By
 default Nutch includes crawling just HTML and plain text via HTTP,
 and basic indexing and search plugins.
 </description>
</property>
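
As a quick sanity check, the expression above can be tested outside Nutch to confirm that it really names the index-more and query-more plugin directories. The snippet below is only a sketch: it assumes Nutch compares the whole plugin directory name against the expression, and the helper name is_included is made up for illustration. It uses Python's re module rather than Java's regex engine, which should behave the same for a simple alternation like this one.

```python
import re

# Value of plugin.includes, copied from the property above.
PLUGIN_INCLUDES = (
    r"protocol-(httpclient|http|file|ftp|file)|urlfilter-regex|"
    r"parse-(text|html|js|msword|pdf|rss|ext)|index-(basic|more)|"
    r"query-(basic|site|url|more)"
)


def is_included(plugin_dir: str) -> bool:
    """Hypothetical helper: True if the whole directory name matches.

    Assumption: Nutch requires a full match of the directory name,
    not just a match somewhere inside it.
    """
    return re.fullmatch(PLUGIN_INCLUDES, plugin_dir) is not None


if __name__ == "__main__":
    for name in ("index-basic", "index-more", "query-more", "parse-pdf"):
        print(name, is_included(name))
```

If both index-more and query-more come back as included, the expression itself is probably not the problem.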

Is there anywhere else I need to enable index-more and query-more? Also (sorry if this is a dumb question), how do I re-index the segments?

Thanks,

Ed.

Edward Quick wrote:
Hi,

Should type: and date: queries work with the search.jsp program?
I'm using Nutch 0.7, and crawled the intranet at work. String searches work fine, but I want to test out the new features added by John Xing in the changelog (http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/CHANGES.txt?rev=1.48) for 0.7.

When I search on something like:

news type:pdf

or

news type:application/pdf

I don't get any results, although I would expect some because all our news docs are in PDF format.

You probably forgot to enable the index-more and query-more plugins. After you do this, you need to re-index your segments.


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


