Hey all, I'm trying to build a tool to digitize some images of recipes, and I just started experimenting with Tesseract. The results seem reasonable, but it seems they could be further improved by supplying a domain-specific language model. For example, I'm seeing "fish sauce" recognized as "iisir sauce", "shrimp" recognized as "shrmp", and so on.
Can someone point me to more information on the language model format? I saw files with the "eng.cube" prefix in the language data and would like to know how to interpret them. Also, is there any tool that shows intermediate results of the process, for instance the result of layout analysis and alternative word hypotheses?

Thanks,
jia

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.

