Re: Tesseract Training

Dmitry Silaev Mon, 24 Jan 2011 00:53:03 -0800

Dear Sochenda,

Glad you have succeeded in training for Khmer and thanks for your kind
words.


Could you please share with us the images and .box files you used for
training? Some sample input images and the respective recognition results
would also be very useful.

Sriranga, I see your *training* process is going pretty well. Most of your
problems are in the dictionary facility. However, I do not feel proficient in
this field. I mean, I know how it works (to be exact, how it *should* work) and
I understand the theoretical basis behind it, but I avoided using it as much
as I could. When I was getting ready to start using Tess in my project, I read
much of the tesseract-XXX groups and understood that the dictionary facility
is far from perfect; at least, it's not ready to use yet. Fortunately, my
project involves a lot of image processing, and the specifics of my task imply
block/line/letter segmentation, so I managed to steer clear of most of Tess's
dubious parts and used it solely as a raw classifier. And I think
classification is what Tess does quite well.

Unfortunately, I think you will struggle a lot with various
inconsistencies and cryptic errors, but I think it's worth it anyway. You
should report every error to the team and wait until it's fixed, while
trying to find your way around it in the meantime. Or you can abandon the
dictionary facility and rely completely on some home-brewed post-processing.
If you choose this, your problem turns into a small R&D project, so you will
need to find appropriate people to do the job.
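
To give a rough idea of what such home-brewed post-processing might look
like, here is a minimal sketch. It is only an illustration under my own
assumptions, not anything taken from Tess: the function names and the word
list are hypothetical, it assumes the raw classifier output is already split
into words by whitespace (which will not hold for every script), and it uses
Python's standard difflib for fuzzy matching.

```python
import difflib

def correct_word(word, lexicon, cutoff=0.8):
    """Return the closest lexicon entry to `word`, or `word` unchanged.

    `lexicon` is a hypothetical list of known-good words; `cutoff` is the
    minimum similarity ratio (0..1) required to accept a replacement.
    """
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_line(line, lexicon, cutoff=0.8):
    """Apply correct_word to every whitespace-separated token of a line."""
    return " ".join(correct_word(w, lexicon, cutoff) for w in line.split())

if __name__ == "__main__":
    lexicon = ["training", "dictionary", "classifier"]
    print(correct_line("trainimg a classifier", lexicon))
```

A real solution would need a proper word list for the target language and
probably a smarter notion of similarity (for example, one that knows which
characters the classifier tends to confuse), which is why I call it a small
R&D project.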

Warm regards,
Dmitry Silaev

