You need to enable two plugins, parse-pdf and parse-msword.
Open ${NUTCH_HOME}/conf/nutch-site.xml and edit the "plugin.includes"
property so that its value lists both of them, for example:
<property>
<name>plugin.includes</name>
<value>protocol-(httpclient|file)|urlfilter-(regex)|parse-(text|html|js|pdf|msword)|index-(basic)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
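
If you want to check the value before restarting a crawl, here is a rough Python sketch. It assumes that Nutch treats plugin.includes as a regular expression that must match a plugin id in full; I believe that is how it works, but I have not verified it against the 0.9 source. The helper name is_included is made up for this illustration and is not part of Nutch.

```python
import re

# Value copied from the plugin.includes example above, joined onto one line.
PLUGIN_INCLUDES = (
    "protocol-(httpclient|file)|urlfilter-(regex)|"
    "parse-(text|html|js|pdf|msword)|index-(basic)|"
    "query-(basic|site|url)|summary-basic|scoring-opic|"
    "urlnormalizer-(pass|regex|basic)"
)


def is_included(plugin_id, pattern=PLUGIN_INCLUDES):
    """Return True if the whole plugin id matches the pattern.

    Hypothetical helper: full-match semantics are an assumption
    about how Nutch applies plugin.includes.
    """
    return re.fullmatch(pattern, plugin_id) is not None


if __name__ == "__main__":
    # parse-rss is not listed in the pattern, so it should report False.
    for plugin_id in ("parse-pdf", "parse-msword", "parse-html", "parse-rss"):
        print(plugin_id, is_included(plugin_id))
```

If parse-pdf or parse-msword reports False for your own value, the usual cause is a missing "|" or a stray line break inside the value.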
On Tue, 2008-07-08 at 09:55 +0800, 宫照 wrote:
> hi everybody,
>
> I set up nutch-0.9, and I can search txt and html files on the local
> system. Now I want to search pdf and msword files. Can you tell me how
> to do that?
>
> BR,
>
> mingkong