On Tue, Mar 20, 2012 at 6:40 AM, Lee <[email protected]> wrote:
> CF9 has CFPDF which can extract text, does OpenBD have anything similar?
> the cfdocument write up in the manual says 'no documentation available'.
>
OpenBD ships with both PDFBox and iText, and both of them can extract text from PDFs. We just haven't implemented a CFPDF tag yet. Some background on both libraries:
http://www.danielspangler.com/2009/01/pdf-text-extraction-in-java.html

If you want to use CFEXECUTE and write the text from the PDF out to a file, you can use PDFBox's ExtractText command-line utility:
http://pdfbox.apache.org/commandlineutilities/ExtractText.html

If you want to read the text from a PDF into a variable, you can use PDFTextStripper:
http://pdfbox.apache.org/userguide/text_extraction.html
http://pdfbox.apache.org/apidocs/org/apache/pdfbox/util/PDFTextStripper.html

Since PDFBox and iText both ship with OpenBD, with either one you'd just use CreateObject("java" ...) to create instances of the necessary PDFBox or iText classes and go from there.

jPedal (which is what CF uses under the hood for a lot of its PDF functionality) also does this. jPedal doesn't ship with OpenBD, but it's available under the LGPL, so depending on what you're doing with it, it could be used for free:
http://www.jpedal.org/support_Extraction.php

Hope that helps. If you need a specific example of how to do this in OpenBD, I can put one together later today.

-- 
Matthew Woodward
[email protected]
http://blog.mattwoodward.com
identi.ca / Twitter: @mpwoodward

Please do not send me proprietary file formats such as Word, PowerPoint, etc. as attachments.
http://www.gnu.org/philosophy/no-word-attachments.html

-- 
online documentation: http://openbd.org/manual/
google+ hints/tips: https://plus.google.com/115990347459711259462
http://groups.google.com/group/openbd?hl=en
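A rough sketch of the PDFTextStripper route, written in plain Java with reflection so that it mirrors what CreateObject("java" ...) does in CFML: look up a class by name, instantiate it, and call its methods. Assumptions: the class names are the PDFBox 1.x ones from the API docs linked above (PDFBox 2.x moved the stripper to org.apache.pdfbox.text.PDFTextStripper), the PDFBox jar is on the classpath at run time, and PdfTextSketch, onClasspath and extractText are hypothetical helper names for illustration, not part of OpenBD or PDFBox.

```java
import java.io.File;
import java.lang.reflect.Method;

class PdfTextSketch {
    // PDFBox 1.x class names, as in the API docs linked above.
    // Assumption: PDFBox 2.x moved the stripper to org.apache.pdfbox.text.PDFTextStripper.
    static final String DOCUMENT_CLASS = "org.apache.pdfbox.pdmodel.PDDocument";
    static final String STRIPPER_CLASS = "org.apache.pdfbox.util.PDFTextStripper";

    // Hypothetical helper: true if the named class can be loaded from the classpath.
    static boolean onClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Hypothetical helper: loads the PDF, returns PDFTextStripper.getText(document),
    // and always closes the document. Classes are looked up by name, much like
    // CreateObject("java", ...) does in CFML.
    static String extractText(File pdf, String documentClass, String stripperClass) throws Exception {
        Class<?> docCls = Class.forName(documentClass);
        Class<?> stripCls = Class.forName(stripperClass);
        // Static PDDocument.load(File) returns the parsed document.
        Object doc = docCls.getMethod("load", File.class).invoke(null, pdf);
        try {
            Object stripper = stripCls.getConstructor().newInstance();
            Method getText = stripCls.getMethod("getText", docCls);
            return (String) getText.invoke(stripper, doc);
        } finally {
            docCls.getMethod("close").invoke(doc);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java PdfTextSketch <file.pdf>");
            return;
        }
        if (!onClasspath(DOCUMENT_CLASS) || !onClasspath(STRIPPER_CLASS)) {
            System.err.println("PDFBox 1.x classes not found on the classpath");
            return;
        }
        System.out.println(extractText(new File(args[0]), DOCUMENT_CLASS, STRIPPER_CLASS));
    }
}
```

Run it with the PDFBox jar on the classpath and a PDF path as the only argument; without PDFBox it prints a message instead of throwing. Looking the classes up by name keeps the sketch compilable against the plain JDK, at the cost of the type checking a direct import would give you.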
