How can I change the cached page so that it reads from <segment>/parse_text
instead of <segment>/content?

On 5/31/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
Hi,

On 5/31/07, Manoharam Reddy <[EMAIL PROTECTED]> wrote:
> Some questions regarding plugin.includes
>
> 1. I find a "parse-oo" in the plugins folder. What is that for?

Plugin parse-oo has something to do with parsing OpenOffice.org
documents; I am not sure exactly what.

>
> 2. I have enabled "parse-pdf" by including it in "plugin.includes" in
> nutch-site.xml. PDF pages now appear in the search results, but when I
> visit the cached page of a result, it shows a message like this:
>
> The cached content has mime type "application/pdf", click this link to
> download it directly.
>
> Is it not possible to display the parsed content of the PDF instead of
> this message?
>

As its name implies, the cached page shows the URL's raw content :). What
you want to see is its parse text. Nutch doesn't do this out of the box, but
it is simple to change it so that it reads from <segment>/parse_text instead
of <segment>/content.
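A minimal sketch of that change, assuming the Nutch 0.9-era search web app, where cached.jsp gets the raw bytes with NutchBean.getContent(details) and NutchBean.getParseText(details).getText() returns the text stored in <segment>/parse_text (check both against your Nutch version). CachedEntry, render and escapeHtml below are hypothetical stand-ins so the decision logic runs on its own without Nutch on the classpath: HTML keeps being shown from the raw content, anything else shows the extracted parse text instead of the "download it directly" message.

```java
import java.nio.charset.StandardCharsets;

public class CachedView {

    // Hypothetical holder for one hit's cached data (not a Nutch class):
    // rawContent mirrors <segment>/content, parseText mirrors <segment>/parse_text.
    static final class CachedEntry {
        final String contentType;
        final byte[] rawContent;
        final String parseText;

        CachedEntry(String contentType, byte[] rawContent, String parseText) {
            this.contentType = contentType;
            this.rawContent = rawContent;
            this.parseText = parseText;
        }
    }

    // Minimal HTML escaping so extracted text is displayed literally, not as markup.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Decide what the cached page shows for one hit.
    static String render(CachedEntry e) {
        if (e.contentType != null && e.contentType.startsWith("text/html")) {
            // HTML is still shown from the raw fetched bytes, as before.
            // Assumes UTF-8 for brevity; a real page would use the detected encoding.
            return new String(e.rawContent, StandardCharsets.UTF_8);
        }
        // Non-HTML (e.g. application/pdf): show the parser's extracted text
        // instead of the "click this link to download it directly" message.
        return "<pre>" + escapeHtml(e.parseText) + "</pre>";
    }

    public static void main(String[] args) {
        CachedEntry pdf = new CachedEntry("application/pdf",
                "%PDF-1.4".getBytes(StandardCharsets.US_ASCII),
                "Extracted PDF text");
        CachedEntry html = new CachedEntry("text/html; charset=utf-8",
                "<p>Hello</p>".getBytes(StandardCharsets.UTF_8),
                "Hello");
        System.out.println(render(pdf));   // prints: <pre>Extracted PDF text</pre>
        System.out.println(render(html));  // prints: <p>Hello</p>
    }
}
```

Escaping matters here because PDF text can contain characters such as < or & that the browser would otherwise treat as markup; wrapping it in <pre> keeps the extracted line breaks readable.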

--
Doğacan Güney
