re ocropus wrapper question (pdfimages)

jimfunderburk Mon, 29 Dec 2008 19:58:18 -0800

Hi - I wonder if you know some method of using pdfimages that will get
around the following problem:


I downloaded a PDF file found at
http://www.archive.org/details/namalinganusasan00amariala
(a Sanskrit dictionary). When viewed with the standard PDF viewer
or with Xpdf, the page images look fine; there are about 340 pages.
However, when I extract images from the PDF with pdfimages, the output
seems to contain two images for each 'real' page (about 700 PPM files
in total), and neither is usable: one is very light, the other very
dark. So I suspect the viewers know how to combine the two. Is there
some combination of pdfimages options and settings that would give me
340 usable, combined images instead?
Or is there some way to extract all the useful page images with Xpdf?
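One approach that may sidestep the problem, offered as an untested sketch rather than a confirmed fix: instead of pdfimages, use pdftoppm from the same Xpdf/poppler toolset. pdfimages dumps each embedded image object on its own, whereas pdftoppm rasterizes the whole page the way a viewer draws it. The light/dark pair per page suggests (an assumption, not verified against this file) that each page is built from a background image plus a foreground layer that a renderer composites. The helper names build_pdftoppm_command and render_pages are hypothetical; the pdftoppm flags used (-r, -gray, -f, -l) are standard ones.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path, out_prefix, dpi=300, gray=True,
                           first_page=None, last_page=None):
    """Build a pdftoppm argument list that renders whole pages.

    pdftoppm rasterizes each page as a viewer would, so layers stacked
    on a page end up in a single output image (unlike pdfimages).
    """
    cmd = ["pdftoppm", "-r", str(dpi)]
    if gray:
        cmd.append("-gray")  # write grayscale PGM instead of colour PPM
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to render
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to render
    cmd += [str(pdf_path), str(out_prefix)]
    return cmd


def render_pages(pdf_path, out_dir, dpi=300, gray=True):
    """Render every page of pdf_path into out_dir as page-*.pgm/.ppm files."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found (e.g. install poppler-utils)")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cmd = build_pdftoppm_command(pdf_path, out_dir / "page", dpi, gray)
    subprocess.run(cmd, check=True)  # raises if pdftoppm exits non-zero
    # pdftoppm names its output <prefix>-<page number>.<ext>
    return sorted(out_dir.glob("page-*"))
```

For example, render_pages("namalinganusasan00amariala.pdf", "pages") should yield one image per page (about 340 here) rather than two, ready to pass to the ocropus layout analysis. A resolution of 300 dpi is an assumed default that is usually adequate for OCR; adjust as needed.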

Thanks for any suggestions.

BTW, the PDF mentioned is a typical source of images to which I hope
to apply the ocropus layout analysis. I have had some success
extracting images from such PDFs with OmniPage 15, but it often
crashes and requires Windows, so an Ubuntu solution for extracting
images from such PDFs would be very welcome.

You received this message because you are subscribed to the Google Groups "ocropus" group.