Oh cool, I haven't actually used multi-page TIFFs before; it's nice
that Tesseract handles them well, straight from Ghostscript.

Yes, for now I suppose you'll just have to write a little script
that runs the Ghostscript step and then the Tesseract step on its
output.
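A wrapper like that might look roughly like this. It's only a sketch, and it assumes `gs` and `tesseract` are both on the PATH. The `tiffg4` device, the 300 dpi resolution and the function names (`gs_command`, `tesseract_command`, `ocr_pdf`) are my own choices for illustration, not anything from this thread:

```python
import subprocess
import sys
from pathlib import Path


def gs_command(pdf: Path, tiff: Path, dpi: int = 300) -> list[str]:
    """Build a Ghostscript command that renders every page of `pdf`
    into a single multi-page TIFF. Device and dpi are assumptions:
    tiffg4 writes 1-bit CCITT G4, tiffgray would give 8-bit grey."""
    return [
        "gs",
        "-dNOPAUSE",
        "-dBATCH",
        "-dSAFER",
        "-sDEVICE=tiffg4",
        f"-r{dpi}",
        f"-sOutputFile={tiff}",
        str(pdf),
    ]


def tesseract_command(tiff: Path, out_base: Path) -> list[str]:
    """Tesseract writes its text to out_base + '.txt'."""
    return ["tesseract", str(tiff), str(out_base)]


def ocr_pdf(pdf: Path, workdir: Path) -> Path:
    """Run Ghostscript, then Tesseract on the resulting TIFF;
    return the path of the text file Tesseract produced."""
    tiff = workdir / (pdf.stem + ".tif")
    out_base = workdir / pdf.stem
    subprocess.run(gs_command(pdf, tiff), check=True)
    subprocess.run(tesseract_command(tiff, out_base), check=True)
    return Path(str(out_base) + ".txt")


if __name__ == "__main__":
    # Usage: python ocr_pdfs.py a.pdf b.pdf ...
    for name in sys.argv[1:]:
        pdf = Path(name)
        print(ocr_pdf(pdf, pdf.parent))
```

`check=True` makes the script stop on the first failing step, so a PDF that Ghostscript can't render doesn't get silently skipped.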

I have used pdfimages for a number of things, with scripts handling
the files one at a time. But I can see Ghostscript would be a better
way of working for you (and quite possibly for me, next time I have
lots of stuff to process).
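For comparison, the one-file-at-a-time pdfimages approach could be sketched like this. It assumes `pdfimages` (Poppler or Xpdf) and `tesseract` are on the PATH, and that pdfimages is run without options, in which case it writes PBM/PPM files with a numbered suffix (something like `prefix-000.ppm`); with `-j` it would keep JPEG images as `.jpg` instead, which this glob would miss. The function names are hypothetical:

```python
import subprocess
import sys
from pathlib import Path


def tesseract_jobs(images: list[Path]) -> list[tuple[list[str], Path]]:
    """Pair each extracted image with the Tesseract command that OCRs it
    and the .txt file that command will write. Assumes the zero-padded
    numbering sorts in page order."""
    jobs = []
    for img in sorted(images):
        base = img.with_suffix("")  # scan-000.ppm -> scan-000
        jobs.append((["tesseract", str(img), str(base)], Path(str(base) + ".txt")))
    return jobs


def ocr_pdf_images(pdf: Path, prefix: Path) -> list[Path]:
    # pdfimages writes one file per embedded image; glob for them rather
    # than predicting the exact names, since numbering varies by build.
    subprocess.run(["pdfimages", str(pdf), str(prefix)], check=True)
    images = list(prefix.parent.glob(prefix.name + "-*.p?m"))
    outputs = []
    for cmd, txt in tesseract_jobs(images):
        subprocess.run(cmd, check=True)
        outputs.append(txt)
    return outputs


if __name__ == "__main__":
    # Usage: python ocr_images.py a.pdf b.pdf ...
    for name in sys.argv[1:]:
        pdf = Path(name)
        for txt in ocr_pdf_images(pdf, pdf.with_suffix("")):
            print(txt)
```

This leaves you with one text file per image, which you'd then have to stitch back together per PDF; that extra step is what the multi-page TIFF route avoids.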

Nick

On Mon, Apr 29, 2013 at 05:51:49AM -0700, Steven McArdle wrote:
> Thanks Nick
> 
> I already have it set up with Ghostscript, as it gives better results than
> ImageMagick.
> 
> As the PDFs are mostly multi-page and Ghostscript can generate a multi-page
> TIFF from each one, I can feed these directly into Tesseract.
> 
> So I don't think pdfimages is an option, as it spits out a separate file for
> each image.
> 
> Steve
> 
> On Tuesday, April 30, 2013 12:39:53 AM UTC+12, Nick White wrote:
> 
>     On Mon, Apr 29, 2013 at 04:10:43AM -0700, Steven McArdle wrote:
>     > What do you mean by "it doesn't support straight PDF"?
> 
>     I mean it only accepts image files. So you need to extract the
>     images from the PDF before getting Tesseract to process them.
> 
>     Now that I think of it, the 'pdfimages' tool is better for this than
>     ImageMagick, as it will extract without converting or losing any
>     quality. But either would work fine (or Ghostscript, as you point
>     out).
> 
>     Nick
> 
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>  
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  
