I butted my head against this sort of thing at first too, and the answer was that Tesseract doesn't seem to trust the dictionary very much by default. There's a FAQ entry on how to change that: http://code.google.com/p/tesseract-ocr/wiki/FAQ (How to increase the trust in/strength of the dictionary?).
On Apr 10, 10:33 am, Jon wrote:
> Err... my bad, the parameter should indeed be 1 character.
>
> On Apr 10, 5:24 pm, Jon wrote:
>
> > I may be wrong, but I think /dict/dawg.cpp line 144 doesn't seem to
> > consider UTF-8 (parameter 3 is a single byte), and thus fails on my
> > Hebrew word.
> > I'm still looking into it, it's the first time I'm looking at the
> > code.
> >
> > More to come.
> >
> > On Apr 6, 11:57 am, paulfeakins wrote:
> >
> > > Hi Jon, I also get the feeling my dictionary files are being ignored,
> > > but I don't know what's causing it as yet...

