Hi - I wonder if you know some method of using pdfimages that will get around the following problem:
I downloaded a pdf file from http://www.archive.org/details/namalinganusasan00amariala (a Sanskrit dictionary). When viewed with the standard pdf viewer, or with Xpdf, the images in the pdf look fine; there are about 340 pages. However, when I try extracting the pages with pdfimages, the output appears to have two images for each of the 'real' pages (about 700 ppm files are generated in total), and neither is usable: one is very light in color, the other very dark.

So I suspect that the viewers somehow know how to combine these two images. Is there some combination of options and settings that lets pdfimages also produce 340 useful combined images? Or is there some way to extract all the useful images with xpdf?

Thanks for any suggestions.

BTW, the pdf mentioned is typical of the image sources to which I hope to apply the ocropus layout analysis. I have had some success extracting images from such pdfs using omnipage15, but omnipage15 often crashes and requires Windows, so an Ubuntu solution for extracting images from such pdfs would be very desirable to me.
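
In case it helps anyone check the "two images per page" guess, here is a rough Python sketch of how the pdfimages output could be grouped back into pages. It is only a sketch under assumptions I have not verified against this pdf: that pdfimages names its output <root>-NNN.<ext> with NNN counting up from 000, and that the two images of a page come out one right after the other. The function name group_into_pages and the root name "page" are made up for illustration.

```python
import re
from collections import defaultdict

# pdfimages writes files named <root>-NNN.<ext>, where NNN is the image number.
IMAGE_NAME = re.compile(r"^(?P<root>.+)-(?P<num>\d+)\.(?P<ext>ppm|pbm|pgm|jpg)$")


def group_into_pages(filenames, images_per_page=2):
    """Group pdfimages output files into one tuple per 'real' page.

    Assumption (not verified for this pdf): every page produced exactly
    `images_per_page` consecutive images, numbered from 0.
    """
    numbered = []
    for name in filenames:
        match = IMAGE_NAME.match(name)
        if match:  # skip anything that is not pdfimages output
            numbered.append((int(match.group("num")), name))
    numbered.sort()

    pages = defaultdict(list)
    for num, name in numbered:
        pages[num // images_per_page].append(name)
    return [tuple(pages[p]) for p in sorted(pages)]


if __name__ == "__main__":
    import os

    # Hypothetical usage: run in the directory holding the extracted files.
    for page_no, images in enumerate(group_into_pages(os.listdir("."))):
        print(page_no, images)
```

If the guess is right, about 700 files should come back as about 340 to 350 pairs, each holding one light and one dark image. This only groups the files; it does not combine them into a usable page image.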
