Dear Dmitry and Sriranga,

It has been nice discussing this with you both. I am now experimenting with unicharambigs to see whether it can help. So far it does not seem to have any effect: I edited some entries, but the recognition output is unchanged. I wonder why that is.
Regarding the recognition rate, do you know of any tools for computing recognition-rate statistics? What is the formula?

Sriranga, you can use khm. as the prefix for the Khmer language.

I also succeeded in getting output from the recognition step, even with the warning during training. The issue is caused by the glyphs being out of order, so I am now fixing the order of the glyphs in the box file manually. The output file looks okay; the remaining problems are only a matter of rendering.

I am glad that your language and mine have a similar structure. This is helpful!

I will get back to you when I finish updating the training file with the complete set of glyphs.

Thank you and best regards,
Sochenda

2011/1/24 Sriranga(78yrsold) <[email protected]>

> Dear Dmitry,
> Thanks for the valuable guidance and encouragement. In fact, I am neither a
> programmer nor a developer. Since Khmer has independent as well as
> dependent vowels, similar to Kannada, I took an interest in seeing how
> Tesseract works for Khmer, and also in gaining experience. Anyway, I have
> succeeded in generating lim.traineddata without any problem. I am
> interested to know the percentage accuracy of the output text, viz.
> testlim.txt, but since I don't know Khmer, only Sochenda can tell. I don't
> know how to create a post-processor program, which would be better than
> charambigs.
> With warmest regards,
> -sriranga(78yrs old)
>
> On Mon, Jan 24, 2011 at 2:22 PM, Dmitry Silaev <[email protected]> wrote:
>
>> Dear Sochenda,
>>
>> Glad you have succeeded in training for Khmer, and thanks for your kind
>> words.
>>
>> Could you please share with us the images and .box files you used for
>> training? Some sample input images and the respective recognition results
>> would also be of much use.
>>
>> Sriranga, I see your *training* process is going pretty well. Most of your
>> problems are in the dictionary facility. However, I do not feel proficient in
>> this field.
>> I mean, I know how it works (to be exact, how it *should* work), and I
>> understand the theoretical basis behind it, but I avoided using it as much
>> as I could. When I was getting ready to start using Tess in my project, I
>> read much of the tesseract-XXX groups and understood that the dictionary
>> facility is far from perfect; at least, it is not ready to use yet.
>> Fortunately, my project involves a lot of image processing, and the
>> specifics of my task imply block/line/letter segmentation, so I managed to
>> steer clear of most of Tess's dubious parts and used it solely as a raw
>> classifier. And I think classification is what Tess does quite well.
>>
>> Unfortunately, I think you will struggle a lot with various
>> inconsistencies and cryptic errors, but I still think it's worth it. You
>> should report every error to the team and wait until it's fixed, while
>> trying to find your way around it in the meantime. Or you can leave the
>> dictionary facility and rely completely on some home-brewed post-processing.
>> If you choose this, your problem turns into a small R&D project, so you
>> will need to find appropriate people to do the job.
>>
>>
>> Warm regards,
>> Dmitry Silaev
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>

