You need to turn on two plugins, parse-pdf and parse-msword. Open your
${NUTCH_HOME}/conf/nutch-site.xml and add them to the "plugin.includes"
property.

For example:

<!-- parse-(...|pdf|msword) enables the PDF and MS Word parsers -->
<property>
        <name>plugin.includes</name>
        <value>protocol-(httpclient|file)|urlfilter-(regex)|parse-(text|html|js|pdf|msword)|index-(basic)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
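For context, here is a minimal sketch of a complete nutch-site.xml. It assumes the stock layout, where site-specific overrides of nutch-default.xml sit inside a single <configuration> root element. The comments are illustrative only. The value is kept on one line because it is a regular expression that plugin ids are matched against, and a line break or stray spaces inside it could keep some plugins from being selected.

```xml
<?xml version="1.0"?>

<!-- Site-specific overrides of nutch-default.xml -->
<configuration>

  <!-- Plugin ids are matched against this regular expression.
       "parse-(text|html|js|pdf|msword)" selects parse-text, parse-html,
       parse-js, parse-pdf and parse-msword. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-(httpclient|file)|urlfilter-(regex)|parse-(text|html|js|pdf|msword)|index-(basic)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>

</configuration>
```

After changing the file, re-crawl (or re-fetch and re-parse) so that the PDF and Word documents are actually parsed and indexed.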


On Tue, 2008-07-08 at 09:55 +0800, 宫照 wrote:
> Hi everybody,
> 
> I set up nutch-0.9, and I can search txt and html files on my local
> system. Now I want to search PDF and MS Word files too. Can you tell me
> how to do this?
> 
> BR,
> 
> mingkong
