What about the Excel and PowerPoint file formats?
Is there any development under way?
I saw a message in August about an Excel parser, but no news since then...


Sullivan, Sean C - MWT wrote:

The code is in CVS:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/parse-pdf/


________________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Richmond
Sent: Tuesday, December 07, 2004 8:39 AM
To: [EMAIL PROTECTED]
Subject: [Nutch-dev] Alternative Content Types

To Whom It May Concern:
I am a Java developer looking to get involved with a project.  I came across 
your site and noticed that there is a lot of attention paid to PDF parsing.  
I'm curious why PDF file parsing has not yet been added to Nutch.  There seem 
to be a number of open source (GPL'd) PDF parsers:
PDFBox (http://pdfbox.org)
XPDF (http://www.foolabs.com/xpdf/)
Pdftohtml (http://pdftohtml.sourceforge.net)
Etc...

Is there a reason that these are not used, or are you just waiting for someone 
to implement it?


Regards,

Mike Richmond



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now. http://productguide.itmanagersjournal.com/
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers




