Re: re ocropus wrapper question (pdfimages)

jimfunderburk Tue, 30 Dec 2008 09:56:45 -0800

I tried pdftoppm, and it seems to do the trick!  Thank you.
btw, pdftoppm has some output options - I used 300 dpi per your
suggestion, and I expect the gray option is also fine as far as later
ocropus processing is concerned.
I used convert to get JPG files, and noticed that the resulting JPG
file sizes are about the same for gray (pgm) and color (ppm) input, so
the gray option may not matter much as far as file size goes.
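
For anyone who wants to script these steps, here is a rough sketch in
Python. It is only an illustration under some assumptions: the function
names (pdftoppm_cmd, convert_cmd, render_pages), the output root "page"
and the file name book.pdf are made up for this example. The only tool
options used are pdftoppm's -r (resolution in dpi) and -gray (write pgm
instead of ppm), plus ImageMagick's convert with an input and an output
file name.

```python
import subprocess
from pathlib import Path


def pdftoppm_cmd(pdf, out_root, dpi=300, gray=True):
    """Build the pdftoppm command line that renders each PDF page to an image."""
    cmd = ["pdftoppm", "-r", str(dpi)]
    if gray:
        cmd.append("-gray")  # grayscale output: .pgm files instead of .ppm
    return cmd + [str(pdf), str(out_root)]


def convert_cmd(image):
    """Build the ImageMagick command that converts one rendered page to JPEG."""
    image = Path(image)
    return ["convert", str(image), str(image.with_suffix(".jpg"))]


def render_pages(pdf, out_dir, dpi=300, gray=True):
    """Render every page of `pdf` into `out_dir`, then convert each page to JPEG.

    Assumes pdftoppm and convert are installed and on the PATH.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # "page" is a hypothetical output root; pdftoppm appends the page number.
    subprocess.run(pdftoppm_cmd(pdf, out_dir / "page", dpi, gray), check=True)
    pattern = "page-*.pgm" if gray else "page-*.ppm"
    for image in sorted(out_dir.glob(pattern)):
        subprocess.run(convert_cmd(image), check=True)
```

Calling render_pages("book.pdf", "pages") would then leave one JPG per
page in the pages directory. The command lists are built in separate
functions so they can be printed and checked before anything is run.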


Jim


On Dec 29, 11:39 pm, "Thomas Breuel" <[email protected]> wrote:
> pdfimages extracts embedded images from PDF files.  That is useful for some
> applications, but there is no guarantee that those are in any way related to
> what you see on the screen.
>
> What you actually want is to render the PDF file in image form; for that,
> you need to use pdftoppm, GhostScript with a file as an output device, or
> ImageMagick's convert.
>
> Tom
>
> On Mon, Dec 29, 2008 at 19:57, jimfunderburk
> <[email protected]> wrote:
>
>
>
> > Hi - I wonder if you know some method of using pdfimages that will get
> > around the following problem:
>
> > I downloaded a pdf file found at
> > http://www.archive.org/details/namalinganusasan00amariala
> > ( a Sanskrit dictionary ).  When viewed with the standard pdf viewer,
> > or with Xpdf, images in the pdf look fine; there are about 340 pages.
> > However, when I try extracting pages from the pdf with pdfimages, the
> > output appears to have two pages for each one of the 'real' pages
> > (about 700 total ppm files are generated from the pdf), and neither
> > is usable: one is very light in color, one very dark. So, I suspect
> > that somehow the viewers know how to combine these - but does some
> > combination of options and settings with pdfimages permit it to also
> > give 340 useful combined images as output?
> > Or, is there some way to extract all the useful images with xpdf?
>
> > Thanks for any suggestions.
>
> > BTW, the pdf mentioned is a typical source of images to which I hope
> > to apply the ocropus layout analysis.  I have had some success in
> > extracting images from such pdfs using omnipage15, but omnipage15
> > often crashes, and requires Windows OS, so an Ubuntu solution to image
> > extraction from such pdfs is quite desirable to me.
