Re: How to improve recognition on TIFF black-and-white Romanian text?

Jani Monoses Wed, 22 Aug 2012 08:07:33 -0700

>
> OK, I see. One thing you could do would be to experiment with
> increasing Tesseract's trust in its dictionary. I have done
> something similar with my training. Create a file containing:
>
> language_model_penalty_non_freq_dict_word 0.2
> language_model_penalty_non_dict_word 0.3
>
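The config file quoted above can also be generated and passed to Tesseract from a script, which makes it easier to compare several penalty values against the same page. This is a minimal sketch and not something from the thread. It assumes the Tesseract 3.x command line (tesseract imagename outputbase [-l lang] [configfile...]) and the Romanian language code "ron". The file names dict_trust.cfg and page.tif, and the helper names write_config and tesseract_command, are hypothetical.

```python
import shlex
from pathlib import Path

# Values quoted above; higher values penalise non-dictionary words more,
# i.e. make Tesseract lean harder on its dictionary.
DICT_PENALTIES = {
    "language_model_penalty_non_freq_dict_word": 0.2,
    "language_model_penalty_non_dict_word": 0.3,
}


def write_config(path, params):
    """Write one 'name value' pair per line, the layout of a Tesseract config file."""
    lines = [f"{name} {value}" for name, value in params.items()]
    config = Path(path)
    config.write_text("\n".join(lines) + "\n")
    return config


def tesseract_command(image, out_base, config, lang="ron"):
    """Build a Tesseract 3.x command: image, output base, -l lang, then config file(s)."""
    return ["tesseract", str(image), str(out_base), "-l", lang, str(config)]


if __name__ == "__main__":
    cfg = write_config("dict_trust.cfg", DICT_PENALTIES)  # hypothetical file name
    cmd = tesseract_command("page.tif", "page", cfg)      # hypothetical input page
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to run OCR
```

Running it prints the command line to use; writing a separate config per trial (for example with larger penalties) and diffing the resulting page.txt outputs is one way to see which words improve and which regress.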


Thanks, I tried this and the output is certainly different, but as
with the dpi changes, some things got better and others regressed,
with no clear winner.

I tried increasing the values even more, but then the regressions
seemed to multiply too.
What I notice now is that at higher dpi every lowercase "o" is
recognized as "e", so I'll probably stick to 600 dpi for now.

So there's no way of just adding new words to the existing dictionary
without redoing the whole training?

Are there any other tunables like the above that you think are worth looking into?

Jani

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en