Dear Sochenda,

Glad you have succeeded in training for Khmer, and thanks for your kind words.
Could you please share with us the images and .box files you used for training? Some sample input images and the respective recognition results would also be very useful.

Sriranga,

I see your *training* process is going pretty well. Most of your problems are with the dictionary facility. However, I do not feel proficient in this area. I mean, I know how it works (to be exact, how it *should* work) and I understand the theoretical basis behind it, but I avoided using it as much as I could. When I was getting ready to start using Tess in my project, I read much of the tesseract-XXX groups and understood that the dictionary facility is far from perfect; at least, it's not ready to use yet. Fortunately, my project involves a lot of image processing, and the specifics of my task imply block/line/letter segmentation, so I managed to keep away from most of Tess's dubious parts and used it solely as a raw classifier. And I think classification is what Tess does quite well.

Unfortunately, I think you will struggle a lot with various inconsistencies and cryptic errors, but I think it's worth it anyway. You should report every error to the team and wait until it's fixed, while trying to find your way around it in the meantime. Or you can drop the dictionary facility and rely completely on some home-brewed post-processing. If you choose this, your problem turns into a small R&D project, so you will need to find appropriate people to do the job.

Warm regards,
Dmitry Silaev
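
P.S. To make the "home-brewed post-processing" option a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the names `correct_word` and `correct_line`, the tiny word list and the 0.75 cutoff are all hypothetical, and it assumes the recognized text has words separated by spaces. It takes the raw classifier output and snaps each word to the closest entry in a word list, using `difflib` from the standard library.

```python
import difflib


def correct_word(word, lexicon, cutoff=0.75):
    """Return the lexicon entry closest to `word`.

    Falls back to `word` unchanged when nothing in the lexicon
    is at least `cutoff` similar (cutoff value is an assumption).
    """
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_line(line, lexicon, cutoff=0.75):
    """Correct every whitespace-separated word of one recognized line."""
    return " ".join(correct_word(w, lexicon, cutoff) for w in line.split())


if __name__ == "__main__":
    # Hypothetical word list; a real one would come from a corpus of the language.
    lexicon = ["raw", "classifier", "output", "recognition"]
    print(correct_line("raw classifer output", lexicon))
```

For scripts that do not put spaces between words (Khmer is one), the splitting step would need a proper word segmenter instead of `line.split()`, and that is where the small R&D project really begins.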

