Hey all,

I'm building a tool to digitize images of recipes, and I've just 
started experimenting with Tesseract. The results seem reasonable, but I 
think they could be improved by supplying a domain-specific language 
model. For example, I'm seeing "fish sauce" recognized as "iisir 
sauce" and "shrimp" recognized as "shrmp".
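
To make the idea concrete, here is a rough sketch of the kind of correction I have in mind, written as a post-processing pass over the OCR text. It is not anything Tesseract does itself. The vocabulary list, the function names, and the 0.8 cutoff are all made up for illustration, and it only uses Python's standard difflib module.

```python
import difflib

# Hypothetical domain vocabulary; a real one would be built from a recipe corpus.
RECIPE_VOCAB = ["fish", "sauce", "shrimp", "garlic", "sugar"]


def correct_token(token, vocab=RECIPE_VOCAB, cutoff=0.8):
    """Return the closest vocabulary word, or the token unchanged if none is close enough."""
    lowered = token.lower()
    if lowered in vocab:
        return token  # already a known word, leave it alone
    # get_close_matches ranks candidates by difflib's similarity ratio
    matches = difflib.get_close_matches(lowered, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_line(line):
    """Apply correct_token to every whitespace-separated token in a line."""
    return " ".join(correct_token(t) for t in line.split())


if __name__ == "__main__":
    print(correct_line("shrmp"))
    print(correct_line("iisir sauce"))
```

With this cutoff a light error such as "shrmp" snaps to "shrimp", but a heavier one such as "iisir" is too far from "fish" to be corrected and is left as-is. That is part of why I suspect the fix belongs in the recognizer's language model rather than in a cleanup step afterwards.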

Can someone point me to more information about the language 
model format? I saw files with the "eng.cube" prefix in the language data, and I 
would like to know how to interpret them.

Also, is there a tool that shows the intermediate results of the process, for 
instance the output of layout analysis and the alternative word hypotheses?

Thanks
jia
