Thank you, Kevin Chen, for your tips. I can now parse PDF and Word documents.
However, when I click "cached" in the search results, the page shows a message
like this:
The cached content has mime type "application/pdf", click this
link<./servlet/cached?idx=0&id=55>to download it directly.
I would like the cached view to show the document content, as Google does. Does anybody know how to do this?
2008/7/8 kevin chen <[EMAIL PROTECTED]>:
> You need to turn on two plugins, parse-pdf and parse-msword.
> Open your ${NUTCH_HOME}/conf/nutch-site.xml and change the
> "plugin.includes" property.
>
> For example:
>
> <property>
> <name>plugin.includes</name>
> <value>protocol-(httpclient|file)|urlfilter-(regex)|parse-(text|html|js|pdf|msword)|index-(basic)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
>
>
> On Tue, 2008-07-08 at 09:55 +0800, 宫照 wrote:
> > Hi everybody,
> >
> > I set up nutch-0.9, and I can search txt and html files on my local system. Now I
> > want to search PDF and MS Word files as well. Can you tell me how to do that?
> >
> > BR,
> >
> > mingkong
>
>