On Thu, Jan 1, 2009 at 13:20, James Cloos <[email protected]> wrote:
> All of the pbm images looked just right for OCR-ing.
> [...]
> So, with those LuraTech PDFs, if you run pdfimages and then drop
> everything except the .pbm files, you should have usable images for
> doing OCR.

There is no guarantee that the MRC mask images represent a reasonable
binarization of the input image, or that the binarization will remain
constant or predictable across documents.

The most reliable way of performing OCR on those images is to treat the
MRC compression inside the PDF as a black box and render the pages at
300dpi.

Tom
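
For illustration, here is a minimal sketch of the "render at 300dpi" approach. It assumes poppler's pdftoppm (from the same package as pdfimages) is installed and on the PATH. The function names, the "page" output prefix, and the choice of grayscale output are assumptions made for this example, not something prescribed in this thread.

```python
import subprocess
from pathlib import Path


def build_render_command(pdf_path, out_prefix, dpi=300):
    """Build a pdftoppm command line that rasterizes whole pages.

    Rendering the full page lets the PDF renderer recombine the MRC
    layers, instead of relying on the mask layer alone.
    """
    # -r sets the resolution in DPI; -gray writes grayscale PGM files.
    return ["pdftoppm", "-r", str(dpi), "-gray", str(pdf_path), str(out_prefix)]


def render_pages(pdf_path, out_dir, dpi=300):
    """Render every page of pdf_path into out_dir and return the image paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical output prefix; pdftoppm appends "-<page number>.pgm".
    prefix = out_dir / "page"
    subprocess.run(build_render_command(pdf_path, prefix, dpi), check=True)
    return sorted(out_dir.glob("page-*.pgm"))
```

Calling render_pages("scan.pdf", "pages") (both names are placeholders) should leave one grayscale image per page in the pages directory, which can then be binarized by the OCR system itself rather than by the PDF producer.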
