Re: How to improve recognition on TIFF black-and-white Romanian text?

Robert Komar Thu, 06 Sep 2012 10:18:48 -0700

On Thu, 6 Sep 2012, Nick White wrote:

Hi Piyush,

As you said, a 600 DPI image would be good for OCR. But I am not able to
relate 600 DPI to these parameters. My guess is that DPI is the same as
density. Any suggestions would be highly appreciated.


DPI is the same as ImageMagick's -density option, at least for what
we're using it for. Your command may be failing because convert needs
to be used like this:

 convert inputfile.pdf -option1 -option2 outputfile.png

I.e., the input file needs to come before the output options.

Hope this helps. Other than that, the options you're using look fine
to me. Is there something specific that is causing problems?

Nick


I think the "-density 600" option should come before the
name of the pdf file.  It will then scale the vector
output to that DPI (assuming the PDF file has reasonable
DPI values within it).  Putting it after just sets the DPI
tag on the output while rendering the contents at 72 DPI.
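
To make the difference concrete, here is a rough sketch of the
arithmetic. It assumes a US Letter page (612x792 points, since PDF
sizes are in points of 1/72 inch); render_size is a hypothetical
helper name, not an ImageMagick command, and the real renderer may
round slightly differently.

```shell
#!/bin/sh
# Sketch: pixel size of a rendered PDF page at a given density.
# pixels = points * dpi / 72 (integer arithmetic, so results are truncated).
# render_size is a hypothetical helper, not part of ImageMagick.
render_size() {
    width_pt="$1"; height_pt="$2"; dpi="$3"
    echo "$(( width_pt * dpi / 72 ))x$(( height_pt * dpi / 72 ))"
}

# Assumed page size: US Letter, 612x792 points (8.5 x 11 inches).
render_size 612 792 72     # prints 612x792
render_size 612 792 600    # prints 5100x6600
```

So a page rendered at the default 72 DPI has far fewer pixels to
work with than the same page rendered at 600 DPI, whatever DPI tag
ends up in the output file.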

I would leave off the -geometry option, as '4000' is
probably not what the width of the scaled contents actually
is.  -depth is for telling convert what the input depth is
(if it can't figure it out itself), so I'd leave that off,
too.  Try something simple like:

convert -density 600 inputfile.pdf -monochrome -compress \
 Group4 outputfile.tif
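
If you need to do this for several files, the same command could be
wrapped in a small function. This is only a sketch: pdf_to_tiff is a
hypothetical name, the file names are placeholders, and the 600 DPI
default is just the value discussed in this thread.

```shell
#!/bin/sh
# Sketch: wrap the suggested convert call so the option order stays fixed.
# pdf_to_tiff is a hypothetical helper name.
# Usage: pdf_to_tiff input.pdf output.tif [dpi]
pdf_to_tiff() {
    # -density goes BEFORE the input file so the PDF is rendered at that DPI;
    # -monochrome and -compress Group4 go after it and apply to the output.
    convert -density "${3:-600}" "$1" -monochrome -compress Group4 "$2"
}

# Example (needs ImageMagick installed, so it is left commented out):
# pdf_to_tiff inputfile.pdf outputfile.tif
```

Afterwards, something like identify -format '%w x %h' outputfile.tif
should show whether the pixel dimensions are what you expect.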

Cheers,
Rob Komar
