To Whom It May Concern:

I am a Java developer looking to get involved with a project. I came across
your site and noticed that there is a lot of attention paid to PDF parsing.
I’m curious why PDF file parsing has not yet been added to Nutch. There seem
to be a number of open source (GPL’d) PDF parsers:

- PDFBox (http://pdfbox.org)
- XPDF (http://www.foolabs.com/xpdf/)
- Pdftohtml (http://pdftohtml.sourceforge.net)
- Etc…

Is there a reason that these are not used, or are you just waiting for
someone to implement it?

Regards,
Mike Richmond

- [Nutch-dev] Alternative Content Types Mike Richmond
- Re: [Nutch-dev] Alternative Content Types Luke Baker
- Re: [Nutch-dev] Alternative Content Types John X
- RE: [Nutch-dev] Alternative Content Types Sullivan, Sean C - MWT
- Re: [Nutch-dev] Alternative Content Types Stéphane Lagraulet
- RE: [Nutch-dev] Alternative Content Types Sullivan, Sean C - MWT
- Re: [Nutch-dev] Alternative Content Types Stéphane Lagraulet
- Re: [Nutch-dev] Alternative Content Ty... John X
- Re: [Nutch-dev] Alternative Conten... Stéphane Lagraulet
- Re: [Nutch-dev] Alternative C... Stéphane Lagraulet
