Re: re ocropus wrapper question (pdfimages)

Thomas Breuel Thu, 01 Jan 2009 17:40:25 -0800

On Thu, Jan 1, 2009 at 13:20, James Cloos <[email protected]> wrote:


> All of the pbm images looked just right for OCR-ing.
> [...]
> So, with those LuraTech PDFs, if you run pdfimages and then drop
> everything except the .pbm files, you should have usable images for
> doing OCR.


There is no guarantee that the MRC mask images represent a reasonable
binarization of the input image, or that their content will remain constant
or predictable across documents.

The most reliable way of performing OCR on those images is to treat the MRC
compression inside the PDF as a black box and render the images at 300 dpi.
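
A rough sketch of that approach, not a definitive recipe: assuming the
pdftoppm tool from Xpdf or Poppler is installed, a small Python wrapper
could rasterize every page at 300 dpi and hand the results to the OCR
system. The function names, the "page" file prefix, and the choice of
grayscale output are assumptions made for illustration only.

```python
import subprocess
from pathlib import Path


def build_render_command(pdf_path, out_prefix, dpi=300):
    """Return the pdftoppm argument list that rasterizes every page of pdf_path."""
    # -r sets the resolution in dpi; -gray writes grayscale PGM files, so
    # binarization is left to the OCR system instead of the PDF's mask layer.
    return ["pdftoppm", "-r", str(dpi), "-gray", str(pdf_path), str(out_prefix)]


def render_pages(pdf_path, out_dir, dpi=300):
    """Render all pages into out_dir and return the generated image paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / "page"  # hypothetical prefix; pdftoppm appends -<page>.pgm
    # check=True raises CalledProcessError if pdftoppm exits with an error.
    subprocess.run(build_render_command(pdf_path, prefix, dpi), check=True)
    return sorted(out_dir.glob("page-*.pgm"))
```

Calling render_pages("scan.pdf", "rendered") would then produce one image
per page, composited from all MRC layers by the PDF renderer, without
relying on what any single layer happens to contain.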

Tom

