Hi,

On 5/31/07, Manoharam Reddy <[EMAIL PROTECTED]> wrote:
Some confusion regarding plugin.includes

1. I find a "parse-oo" in the plugins folder. What is that for?

The parse-oo plugin has something to do with parsing OpenOffice.org
documents; I am not sure what exactly.


2. I have enabled "parse-pdf" by including it in "plugin.includes" of
nutch-site.xml. The pages now come up in the search results. But when I
visit the cached page of a result, it shows a message like this:

The cached content has mime type "application/pdf", click this link to
download it directly.

Is it not possible to display the parsed content of the PDF instead of
this message?


As its name implies, the cached page shows the URL's raw content :).
What you want to see is its parse text. Nutch doesn't do this, but it is
simple to change it so that it reads from <segment>/parse_text instead
of <segment>/content.
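
To make the idea concrete, here is a minimal sketch of the decision the cached page would have to make. It is not Nutch code: the class CachedView, the helper sourceDir, the "text/" check and the segment name are all hypothetical, and it only picks which segment subdirectory to read from. Wiring it into the actual cached page is left out.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class CachedView {
    // Hypothetical helper (not part of Nutch): choose which segment
    // subdirectory the cached page should read from.
    // Assumption: types starting with "text/" can be shown as-is from
    // "content"; anything else (e.g. application/pdf) falls back to the
    // extracted text stored under "parse_text".
    static Path sourceDir(Path segment, String mimeType) {
        boolean displayable = mimeType != null && mimeType.startsWith("text/");
        return segment.resolve(displayable ? "content" : "parse_text");
    }

    public static void main(String[] args) {
        // Hypothetical segment location, for illustration only.
        Path segment = Paths.get("crawl", "segments", "20070531000000");
        System.out.println(sourceDir(segment, "text/html"));
        System.out.println(sourceDir(segment, "application/pdf"));
    }
}
```

Falling back only for non-text types keeps HTML pages looking the way they do now; if you always want the parsed text, drop the check and resolve "parse_text" unconditionally.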

--
Doğacan Güney
