Is there any development under way?
I saw a message in August about an Excel parser, but no other news since then...
Sullivan, Sean C - MWT wrote:
The code is in CVS
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/parse-pdf/
________________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Richmond
Sent: Tuesday, December 07, 2004 8:39 AM
To: [EMAIL PROTECTED]
Subject: [Nutch-dev] Alternative Content Types
To Whom It May Concern:

I am a Java developer looking to get involved with a project. I came across your site and noticed that there is a lot of attention paid to PDF parsing. I'm curious why PDF file parsing has not yet been added to Nutch. There seem to be a number of open source (GPL'd) PDF parsers:

PDFBox (http://pdfbox.org)
XPDF (http://www.foolabs.com/xpdf/)
Pdftohtml (http://pdftohtml.sourceforge.net)
Etc...

Is there a reason that these are not used, or are you just waiting for someone to implement it?
Regards,
Mike Richmond
-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now. http://productguide.itmanagersjournal.com/
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
