> > These all involve rasterising the image - a resolution must be selected, and if the resolution is not the same as that of the images, then the results will not be good.

The images look like mixed raster content to me (background, foreground, and selection layer). You could try to reconstruct the original image from that, but there is no guarantee that whatever procedure you develop will continue working on other PDFs. The only process that knows what to do with those images is a PDF interpreter rendering the PDF file. (What you shouldn't do is treat the binary mask as a binary image; there is no guarantee that it is one.)

Just render the images at 300 dpi and you'll be fine as far as OCR and layout analysis are concerned. Alternatively, you can obtain the actual image size or resolution from the metadata and render at that resolution.

Tom
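
For the second option, the arithmetic can be sketched roughly as follows. This is only an illustration, assuming the default PDF user space of 72 points per inch and assuming you have already read the image's pixel dimensions and its placed size on the page from the file by some other means. The function names `effective_dpi` and `render_size` are made up for this example and are not part of ocropus or any PDF library.

```python
POINTS_PER_INCH = 72.0  # assumption: default PDF user space, 1 point = 1/72 inch


def effective_dpi(pixel_width, pixel_height, placed_width_pt, placed_height_pt):
    """Return (x_dpi, y_dpi) for an image drawn at the given size in points."""
    x_dpi = pixel_width / (placed_width_pt / POINTS_PER_INCH)
    y_dpi = pixel_height / (placed_height_pt / POINTS_PER_INCH)
    return x_dpi, y_dpi


def render_size(page_width_pt, page_height_pt, dpi):
    """Return the (width, height) in pixels of a page rasterised at `dpi`."""
    width_px = round(page_width_pt * dpi / POINTS_PER_INCH)
    height_px = round(page_height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# Hypothetical example: a 2550 x 3300 pixel scan filling a US Letter page
# (612 x 792 points).
print(effective_dpi(2550, 3300, 612, 792))
print(render_size(612, 792, 300))
```

If the layers of a page turn out to have different resolutions, rendering at the highest of them is probably the safer choice for OCR.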
