Re: re ocropus wrapper question (pdfimages)

Thomas Breuel Tue, 30 Dec 2008 09:00:34 -0800

>
> These all involve rasterising the image - a resolution must be
> selected, and if the resolution is not the same as that of the images,
> then the results will not be good.



The images look like mixed raster content to me (background, foreground, and
selection layer).  You could try to reconstruct the original image from
that, but there is no guarantee that whatever procedure you develop will
continue working on other PDFs.  The only process that knows how to combine
those images is interpretation of the PDF file itself.  (What you shouldn't
do is treat the binary mask as a binary image of the page; there is no
guarantee that it is one.)

Just render the pages at 300 dpi and you'll be fine as far as OCR and layout
analysis are concerned.  Alternatively, you can obtain the actual image size
or resolution from the metadata and render at that resolution.
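
A rough sketch of both options in Python, assuming poppler's pdftoppm is
installed (its -r and -png options set the resolution and the output format).
The function names, the file names, and the sample numbers are made up for
illustration; the resolution formula just uses the PDF default of 72 points
per inch and assumes the image spans the given width on the page.

```python
import subprocess

POINTS_PER_INCH = 72.0  # default PDF user-space unit


def effective_dpi(pixel_width, placed_width_pt):
    """Resolution of an image that is pixel_width pixels wide and is
    drawn placed_width_pt points wide on the page."""
    return pixel_width / (placed_width_pt / POINTS_PER_INCH)


def render_command(pdf_path, out_prefix, dpi=300):
    """Build the pdftoppm command line that rasterises every page of
    pdf_path to PNG files named out_prefix-<page>.png."""
    return ["pdftoppm", "-r", str(int(round(dpi))), "-png",
            pdf_path, out_prefix]


def render(pdf_path, out_prefix, dpi=300):
    """Run pdftoppm; raises CalledProcessError if rendering fails."""
    subprocess.run(render_command(pdf_path, out_prefix, dpi), check=True)


if __name__ == "__main__":
    # Option 1: fixed 300 dpi (hypothetical file names).
    print(render_command("scan.pdf", "page"))
    # Option 2: match the embedded image, e.g. 2550 pixels across a
    # page that is 612 points (8.5 inches) wide.
    print(effective_dpi(2550, 612))
```

Rendering through the PDF interpreter this way lets it combine the layers,
so the result does not depend on how a particular PDF splits its content.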

Tom
