> OK, I see. One thing you could do would be to experiment with
> increasing Tesseract's trust in its dictionary. I have done
> something similar with my training. Create a file with this in:
>
> language_model_penalty_non_freq_dict_word 0.2
> language_model_penalty_non_dict_word 0.3
>
Thanks, I tried this and the output is certainly different, but as with the dpi changes, some things got better and others regressed, with no clear winner. I tried increasing the values even more, but then the regressions seem to multiply too. What I notice now is that at higher dpi every lowercase o is recognized as e, so I'll probably stick to 600dpi for now.

So is there no way of just adding new words to the existing dictionary without redoing the whole training? Are there any other tunables like the ones above that you think may be worth looking into?

Jani
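
P.S. For anyone following along, here is a minimal sketch of one way to apply the quoted settings. It writes the two penalties to a config file and builds a command line that names that file. The file name dict_trust.cfg, the image name page.tif and both helper functions are made up for illustration, and the sketch assumes the tesseract command accepts config file names as trailing arguments, so check the usage message of your build first.

```python
import shlex
from pathlib import Path

# Penalty values quoted earlier in this thread.
PENALTIES = {
    "language_model_penalty_non_freq_dict_word": 0.2,
    "language_model_penalty_non_dict_word": 0.3,
}


def write_config(path, params):
    """Write one 'name value' pair per line, as in the quoted file."""
    lines = [f"{name} {value}" for name, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n")
    return Path(path)


def build_command(image, output_base, config_path, lang="eng"):
    """Build the argument list (assumption: config files are listed last)."""
    return ["tesseract", str(image), str(output_base), "-l", lang, str(config_path)]


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own scan and config path.
    cfg = write_config("dict_trust.cfg", PENALTIES)
    print(shlex.join(build_command("page.tif", "page", cfg)))
```

Keeping the values in one dictionary makes it easy to rerun the same page with different penalties and compare the outputs side by side.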

