Thank you, Kevin Chen, for your tips. I can now parse PDF and Word documents.

However, when I click "cached" in the search results, the page shows a
message like this:

The cached content has mime type "application/pdf", click this
link <./servlet/cached?idx=0&id=55> to download it directly.

I want the cached result to be displayed the way Google does it. Does anybody
know how to do that?



2008/7/8 kevin chen <[EMAIL PROTECTED]>:

> You need to turn on two plugins, parse-pdf and parse-msword.
> Look at your ${NUTCH_HOME}/conf/nutch-site.xml and change the
> "plugin.includes" property,
>
> for example:
>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-(httpclient|file)|urlfilter-(regex)|parse-(text|html|js|pdf|msword)|index-(basic)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
>
>
> On Tue, 2008-07-08 at 09:55 +0800, 宫照 wrote:
> > hi everybody,
> >
> > I set up nutch-0.9, and I can search txt and html files on my local
> > system. Now I want to search PDF and MS Word files as well. Can you tell
> > me how to do that?
> >
> > BR,
> >
> > mingkong
>
>
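
The "plugin.includes" value quoted above is a regular expression over plugin ids. As a rough sanity check, the sketch below tests which ids such an expression would accept. It assumes that a plugin is enabled only when its whole id matches the expression, and that the value is written on one line without the breaks introduced by mail wrapping. The helper name is_included is hypothetical and is not part of Nutch.

```python
import re

# Value of plugin.includes from the quoted snippet, joined onto one line.
PLUGIN_INCLUDES = (
    "protocol-(httpclient|file)|urlfilter-(regex)|"
    "parse-(text|html|js|pdf|msword)|index-(basic)|"
    "query-(basic|site|url)|summary-basic|scoring-opic|"
    "urlnormalizer-(pass|regex|basic)"
)


def is_included(plugin_id: str, includes: str = PLUGIN_INCLUDES) -> bool:
    """Return True if the whole plugin id matches the includes expression.

    Assumption: whole-id matching is how the expression is applied.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    # parse-rss is used here only as an id that the expression does not list.
    for plugin_id in ("parse-pdf", "parse-msword", "parse-html", "parse-rss"):
        print(plugin_id, is_included(plugin_id))
```

If parse-pdf or parse-msword does not pass a check like this against your own value, stray whitespace or a line break inside the <value> element is one thing to look for.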
