Bradford Stephens wrote:
Greetings,
IIRC, Nutch (which uses Lucene for document indexing) handles different
document types via parsing plugins. So if you have a plugin for PDF parsing
(I believe there is one), then you should be able to do what you want.
Cheers,
Bradford
On Thu, Feb 26, 2009 at 11:40
> ...h search the text within the
> image and then catalog the text as part of that PDF document?
>
>
> *Does Nutch index content for .PDF image on text format?*
>