--- Richard Gaskin <[EMAIL PROTECTED]> wrote:
> Is there an easy way to do this in script?
>
> --
> Richard Gaskin
> Fourth World Media Corporation
Hi Richard et al,

Extracting text from a PDF file is possible, and can indeed be done via scripting, though not for all files until you've climbed the decompression, decryption and decoding mountains. But that's actually just the start of it: PDF is just about the worst text file format in history. Even after stripping out the intermingled styling and positioning instructions, you're left with a bunch of strings which may not necessarily be in the correct order.

The applications out there that convert PDF to Word files have a lot in common with Optical Character Recognition (OCR) applications, which attempt to convert scanned images to text: both apply algorithms to "collate" the pieces of text into a collection of words and paragraphs. Heck, even Adobe Reader, Apple Preview and other PDF viewers have to "best-guess" what text makes up a sentence when you use the text selection tool.

Granted, a good number of files can be read sequentially and churn out the strings in a reasonably effective order - but all bets are off if you take a random document that came out of graphically-oriented tools where people play around with layers and filter effects.

Sorry to disappoint you,

Jan Schenkel.

Quartam Reports & PDF Library for Revolution
<http://www.quartam.com>

=====
"As we grow older, we grow both wiser and more foolish at the same time." (La Rochefoucauld)

_______________________________________________
use-revolution mailing list
http://lists.runrev.com/mailman/listinfo/use-revolution
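To make the "collate" step above a bit more concrete, here is a minimal Python sketch of the kind of best-guessing involved. It assumes the hard part is already done: the strings have been decompressed and decoded, and each one comes with a position and width in points. The `Fragment` type, the `collate` function and both tolerance values are made up for illustration and are not taken from any particular PDF tool or library.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: float      # left edge of the string, in points
    y: float      # baseline; PDF user space has y growing upward
    width: float  # rendered width of the string, in points
    text: str


def collate(fragments, line_tolerance=2.0, space_gap=1.0):
    """Guess a reading order: group fragments into lines, then sort each line."""
    lines = []  # each entry is [baseline_y, [fragments on that line]]
    # Visit fragments from the top of the page down.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])

    out = []
    for _, frags in lines:
        frags.sort(key=lambda f: f.x)  # left to right within a line
        pieces = [frags[0].text]
        for prev, cur in zip(frags, frags[1:]):
            # Insert a space only when there is a visible horizontal gap.
            gap = cur.x - (prev.x + prev.width)
            pieces.append((" " if gap > space_gap else "") + cur.text)
        out.append("".join(pieces))
    return "\n".join(out)


# Fragments deliberately listed out of order, as they may appear in a file.
sample = [
    Fragment(130, 700, 30, "world"),
    Fragment(72, 680, 40, "Second"),
    Fragment(72, 700, 25, "Hel"),
    Fragment(97, 700, 28, "lo,"),
    Fragment(116, 680, 20, "line"),
]
print(collate(sample))
```

A simple heuristic like this copes with single-column text, but it would likely fall apart on multi-column layouts, rotated text, or the layered documents mentioned above, which is exactly why the real converters need much more elaborate guessing.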
