RE: [Nutch-dev] Alternative Content Types

Sullivan, Sean C - MWT Tue, 07 Dec 2004 08:44:07 -0800

The code is in CVS:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/parse-pdf/

________________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Mike Richmond
Sent: Tuesday, December 07, 2004 8:39 AM
To: [EMAIL PROTECTED]
Subject: [Nutch-dev] Alternative Content Types

To Whom It May Concern:
I am a Java developer looking to get involved with a project. I came across 
your site and noticed that there is a lot of attention paid to PDF parsing. 
I'm curious why PDF file parsing has not yet been added to Nutch. There seem 
to be a number of open source (GPL'd) PDF parsers:
PDFBox (http://pdfbox.org)
XPDF (http://www.foolabs.com/xpdf/)
Pdftohtml (http://pdftohtml.sourceforge.net)
Etc...

Is there a reason that these are not used, or are you just waiting for someone 
to implement it?

Regards,

Mike Richmond

