On Wed, 07 Apr 2010 20:54:41 +0100
Rob Clement <[email protected]> wrote:

>To extract the text from a pdf - why not simply scan in the text using
>a document scanner set to OCR and not to scan pictures. It would save 
>trying to do it another way.

Earlier in this thread there was a mention of printing to CUPS-PDF. I
have done so on a few occasions, and in each case the text was
converted to outlines, i.e., drawn shapes rather than extractable
text. I just wanted to add that fact to the thread, lest someone think
that printing to CUPS-PDF is the equivalent of exporting from OOo or
Scribus.

As for running OCR on a printout of the file, that is generally not
necessary. Most OCR programs these days can work directly on a bitmap.
So just render the PDF to an image format and then OCR the image. No
need to waste paper or even own a printer or scanner.
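
For anyone who wants to script the PDF-to-image-to-OCR route, here is a
rough sketch. It is only an illustration, not something I have tuned:
it assumes the pdftoppm tool (from poppler-utils) and the tesseract OCR
program are installed, and the function names, the "page" prefix, and
the 300 dpi default are my own choices.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, prefix, dpi=300):
    # pdftoppm writes one PNG per page, named <prefix>-<page>.png
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image, out_base):
    # tesseract appends ".txt" to the output base name on its own
    return ["tesseract", str(image), str(out_base)]


def pdf_to_text(pdf, workdir):
    # Hypothetical helper: render every page to PNG, OCR each page,
    # and return the recognized text of all pages joined together.
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_cmd(pdf, workdir / "page"), check=True)
    pages = []
    for image in sorted(workdir.glob("page-*.png")):
        subprocess.run(ocr_cmd(image, image.with_suffix("")), check=True)
        pages.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Calling pdf_to_text("input.pdf", "ocr-work") would leave the page
images and per-page text files in the ocr-work directory. A higher dpi
tends to help recognition at the cost of larger images.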

Having said that, OCR programs are far from perfect. They save typing,
but you'll still have to spend some time cleaning up the errors. And
you will typically lose design formatting, e.g., indents.
