Dear Dmitry and Sriranga,

It has been nice discussing this with you both. I am now experimenting with
unicharambigs to see whether it helps. So far it does not seem to have any
effect: I have edited some entries, but the recognition output is unchanged.
I wonder why that is.

Regarding the recognition rate, do either of you know of any tools for
measuring it? What formula is used?
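
In case it helps the discussion: one definition I have seen is character
accuracy = (N - E) / N, where N is the number of characters in the correct
(ground truth) text and E is the edit distance between it and the OCR output.
Below is a minimal sketch under that assumption. The function names are my
own and not part of Tesseract, and it counts Unicode code points, so a Khmer
cluster made of several code points counts as several characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(ref, hyp):
    """Character accuracy (N - E) / N; ref is the ground truth text."""
    if not ref:
        raise ValueError("ground truth text must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)
```

To use it, read the ground truth file and the recognition output file as
UTF-8 strings and pass them to char_accuracy. The result can be negative if
the output contains many extra characters.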

Sriranga, you can use khm. as the prefix for the Khmer language. I have also
succeeded in getting output from the recognition step, even with the warnings
during training. The warnings are caused by the glyphs being out of order, so
I am now fixing the order of the glyphs in the box file manually. The output
file looks okay; the only remaining issue is rendering.

I am glad that your language and mine have a similar structure. This is
helpful!

I will get back to you when I finish updating the training file with the
complete set of glyphs.

Thank you and Best Regards,

Sochenda

2011/1/24 Sriranga(78yrsold) <[email protected]>

> Dear Dmitry,
> thanks for the valuable guidance and encouragement. In fact, I am neither a
> programmer nor a developer. Since Khmer has independent vowels as well as
> dependent vowels, which is similar to Kannada, I took an interest in seeing
> how Tesseract works for Khmer and in gaining experience. Anyway, I have
> succeeded in generating lim.traineddata without any problem. I would like to
> know the percentage accuracy of the output text, viz. testlim.txt, but since
> I don't know Khmer, only Sochenda can tell. I don't know how to create a
> post-processor program, which would be better than charambigs.
> With warmest Regards,
> -sriranga(78yrs old)
>
>
> On Mon, Jan 24, 2011 at 2:22 PM, Dmitry Silaev <[email protected]> wrote:
>
>> Dear Sochenda,
>>
>> Glad you have succeeded in training for Khmer and thanks for your kind
>> words.
>>
>> Could you please share with us the images and .box files you used for
>> training? Also some sample input images and respective recognition results
>> would be of much use.
>>
>> Sriranga, I see your *training* process is going pretty well. Most of your
>> problems are in the dictionary facility. However, I do not feel proficient
>> in this area. I mean, I know how it works (to be exact, how it *should*
>> work) and I understand the theoretical basis behind it, but I avoided using
>> it as much as I could. When I was getting ready to start using Tess in my
>> project, I read much of the tesseract-XXX groups and understood that the
>> dictionary facility is far from perfect; at least, it is not ready to use
>> yet. Fortunately, my project involves a lot of image processing, and the
>> specifics of my task imply block/line/letter segmentation, so I managed to
>> steer clear of most of Tess's dubious parts and used it solely as a raw
>> classifier. And I think classification is what Tess does quite well.
>>
>> Unfortunately, I think you will struggle a lot with various
>> inconsistencies and cryptic errors, but I still think it's worth it. You
>> should report every error to the team and wait until it's fixed, while at
>> the same time trying to find your way around it. Or you can abandon the
>> dictionary facility and rely completely on some home-brewed
>> post-processing. If you choose this, your problem turns into a small R&D
>> project, so you will need to find appropriate people to do the job.
>>
>>
>> Warm regards,
>> Dmitry Silaev
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected]<tesseract-ocr%[email protected]>
>> .
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>
>
>
