Thanks Nick, I already have it set up for Ghostscript, as it gives better results than ImageMagick.
As the PDFs are mostly multi-page and Ghostscript can generate a multi-page TIFF from each of them, I can feed these directly into Tesseract. So I don't think pdfimages is an option, as it spits out multiple files.

Steve

On Tuesday, April 30, 2013 12:39:53 AM UTC+12, Nick White wrote:
>
> On Mon, Apr 29, 2013 at 04:10:43AM -0700, Steven McArdle wrote:
> > What do you mean by "it doesn't support straight PDF" ?
>
> I mean it only accepts image files. So you need to extract the
> images from the PDF before getting Tesseract to process them.
>
> Now I think of it, the 'pdfimages' tool is better for this than
> imagemagick, as it will extract without converting or losing any
> quality. But either would work fine (or Ghostscript, as you point
> out).
>
> Nick

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
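For anyone following the thread, the route described in the reply above (Ghostscript renders the whole PDF into one multi-page TIFF, which Tesseract then reads directly) could be sketched roughly as below. This is not the exact setup from the message, which does not give its commands. The function names, the 300 dpi resolution, the `tiffgray` device and the file names are all assumptions chosen for illustration, and the sketch assumes `gs` and `tesseract` are on the PATH and that Tesseract was built with TIFF support.

```python
import subprocess
from pathlib import Path


def gs_command(pdf, tiff, dpi=300, device="tiffgray"):
    """Build a Ghostscript command rendering all pages of `pdf` into one TIFF.

    The dpi and device defaults are illustrative assumptions, not values
    taken from the thread.
    """
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        f"-sDEVICE={device}",
        f"-r{dpi}",
        # No %d in the output name, so all pages go into a single file.
        f"-sOutputFile={tiff}",
        str(pdf),
    ]


def tesseract_command(tiff, out_base):
    """Build a Tesseract command; Tesseract appends .txt to `out_base`."""
    return ["tesseract", str(tiff), str(out_base)]


def ocr_pdf(pdf, workdir="."):
    """Hypothetical helper: PDF -> multi-page TIFF -> text file."""
    pdf = Path(pdf)
    tiff = Path(workdir) / (pdf.stem + ".tif")
    out_base = Path(workdir) / pdf.stem
    subprocess.run(gs_command(pdf, tiff), check=True)
    subprocess.run(tesseract_command(tiff, out_base), check=True)
    return Path(f"{out_base}.txt")
```

Calling `ocr_pdf("scan.pdf")` would leave `scan.tif` and `scan.txt` in the working directory. By contrast, pdfimages writes one image file per embedded image, so that route would need an extra step to loop over or merge its output.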

