Re: PDF image support

Jeffrey Ratcliffe Sat, 18 Jul 2009 00:47:43 -0700

2009/7/18 Thomas Breuel <[email protected]>:
>>> Many PDFs are just collections of scanned page images.  In those
>>> cases, the best thing to do is to extract the page images and hand
>>> them to OCRopus directly.  If those images contain OCR text from
>>> Distiller, that, too, is potentially useful and it would be good to
>>> extract that so that OCRopus can combine it with its own results.
>> I don't do that yet. I render, rather than extracting images. It might not be
>> hard to detect this though. We'd also need to decide how to match up page
>> images and text.
>
> Match up in what sense?


I embed the OCR output behind the scanned image so that, to some extent,
you can highlight, copy and paste the text in the correct places.
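
As a rough illustration of the idea (not necessarily how it is done here), one
common way to put selectable text under a page image is PDF text rendering
mode 3, which draws text invisibly. The sketch below only builds the
content-stream operators for that text layer. The function name, the
(text, bounding box) word format, the font resource name F1 and the
top-left pixel coordinate convention are all assumptions made for the example;
it also assumes plain ASCII text and leaves writing the actual PDF objects
(page, font, image) to whatever PDF library is in use.

```python
def invisible_text_ops(words, dpi, page_height_px, font="F1"):
    """Build PDF content-stream operators placing each OCR word invisibly.

    words: iterable of (text, (x0, y0, x1, y1)) with pixel boxes measured
           from the top-left corner of the page image (assumed format).
    dpi:   resolution of the page image, used to convert pixels to points.
    """
    scale = 72.0 / dpi            # PDF user space is 72 points per inch
    ops = ["BT", "3 Tr"]          # begin text; render mode 3 = invisible
    for text, (x0, y0, x1, y1) in words:
        size = (y1 - y0) * scale  # use the box height as the font size
        x = x0 * scale
        # Flip the y axis: images start at the top, PDF pages at the bottom.
        y = (page_height_px - y1) * scale
        # Escape the characters that are special inside a PDF literal string.
        escaped = (text.replace("\\", "\\\\")
                       .replace("(", "\\(")
                       .replace(")", "\\)"))
        ops.append(f"/{font} {size:.2f} Tf")
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")  # absolute position
        ops.append(f"({escaped}) Tj")
    ops.append("ET")              # end text
    return "\n".join(ops)


if __name__ == "__main__":
    # One word on a 300 dpi US Letter scan (2550 x 3300 pixels).
    print(invisible_text_ops([("Hello", (300, 300, 600, 350))],
                             dpi=300, page_height_px=3300))
```

Because the text is never painted, it does not matter whether it is drawn
before or after the image; how well selection lines up with the scan depends
on how accurate the OCR bounding boxes are.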

Regards

Jeff

