The code is in CVS: http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/parse-pdf/

________________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Richmond
Sent: Tuesday, December 07, 2004 8:39 AM
To: [EMAIL PROTECTED]
Subject: [Nutch-dev] Alternative Content Types

To Whom It May Concern:

I am a Java developer looking to get involved with a project. I came across your site and noticed that there is a lot of attention paid to PDF parsing. I'm curious why PDF file parsing has not yet been added to Nutch. There seem to be a number of open source (GPL'd) PDF parsers:

PDFBox (http://pdfbox.org)
XPDF (http://www.foolabs.com/xpdf/)
Pdftohtml (http://pdftohtml.sourceforge.net)
Etc...

Is there a reason that these are not used, or are you just waiting for someone to implement it?

Regards,
Mike Richmond
