To Whom It May Concern:

I am a Java developer looking to get involved with a project.  I came across your site and noticed that there is a lot of attention paid to PDF parsing.  I’m curious why PDF file parsing has not yet been added to Nutch.  There seem to be a number of open source (GPL’d) PDF parsers:

PDFBox (http://pdfbox.org)

XPDF (http://www.foolabs.com/xpdf/)

Pdftohtml (http://pdftohtml.sourceforge.net)

Etc…

Is there a reason that none of these is used, or are you just waiting for someone to implement PDF parsing?

Regards,

Mike Richmond
