Hi,

I am creating a TIFF image from a PDF document. The convert command
offers a great many options for creating the image, e.g.:

convert -monochrome -depth 8 -geometry 4000 -density 600 -quality 100 \
    sample.pdf sample.tiff

I want to tune these parameters to get the most suitable image for OCR.
I can change the depth, geometry, density and quality, opt for a
monochrome image, etc.

As you said, a 600 DPI image should work well for OCR, but I am not able
to relate 600 DPI to these parameters. My guess is that DPI is the same
as density. Any suggestions would be highly appreciated.
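
To make my guess concrete, here is a rough sketch of the arithmetic as I
understand it. It assumes that -density is measured in dots per inch, that
the PDF page is US Letter (612 x 792 points), and that -geometry 4000 ends
up scaling the result to 4000 pixels wide. All three are assumptions on my
part, and the function names are only for illustration.

```python
# Sketch only: relates page size, density (assumed to be DPI) and pixel size.
POINTS_PER_INCH = 72  # PDF page sizes are given in points, 72 per inch


def rasterized_size(width_pt, height_pt, density):
    """Pixel dimensions of a page rendered at `density` dots per inch."""
    width_px = round(width_pt / POINTS_PER_INCH * density)
    height_px = round(height_pt / POINTS_PER_INCH * density)
    return width_px, height_px


def effective_dpi(target_width_px, width_pt):
    """Resolution that remains if the image is scaled to a fixed pixel width."""
    return target_width_px / (width_pt / POINTS_PER_INCH)


LETTER = (612, 792)  # assumed page size in points (8.5 x 11 inches)

print(rasterized_size(*LETTER, density=600))      # (5100, 6600)
print(round(effective_dpi(4000, LETTER[0]), 1))   # 470.6
```

If that is right, asking for 600 DPI and a width of 4000 pixels at the same
time would leave me with roughly 470 DPI on a Letter page, so perhaps I
should drop -geometry and keep only -density.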

On Wednesday, 22 August 2012 20:20:06 UTC+5:30, Jani Monoses wrote:
>
> > 
> > OK, I see. One thing you could do would be to experiment with 
> > increasing Tesseract's trust in its dictionary. I have done 
> > something similar with my training. Create a file with this in: 
> > 
> > language_model_penalty_non_freq_dict_word 0.2 
> > language_model_penalty_non_dict_word 0.3 
> > 
>
> Thanks, I tried this and the output is certainly different, but as 
> with the dpi changes, some things got better and others regressed, 
> with no clear winner. 
>
> I tried increasing the values even more but then the regressions seem 
> to multiply too. 
> What I notice now is that at higher dpi, all lowercase o is recognized 
> as e, so I'll probably stick to 600dpi for now. 
>
> So there's no way of just adding new words to the existing dictionary 
> without redoing the whole training? 
>
> Are there any other tunables like the above that you think may be 
> worth looking into? 
>
> Jani 
>
