Regarding Tesseract 3.0 training

Sandeep Parmar Thu, 23 Jun 2011 21:39:33 -0700

Hi all,

I am evaluating Tesseract for my project and have found it very good
compared to other free OCR engines. However, I have some
doubts about training Tesseract 3.0 for new font types. I tried two things
while training Tesseract:


1) I made a text document containing all the letters, digits, and other ASCII
    characters in several fonts (Times New Roman, Arial, Verdana, Comic Sans,
    etc.). I printed the pages and scanned them to make TIF images. Then I
    followed the steps for training Tesseract 3.0 described at
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

    However, the results from my trained data were very poor and not
    comparable to those from the default 'eng.traineddata'.

2) Then I decided to build a traineddata file from the TIF and BOX files for
    Tesseract 2.04 provided by the Tesseract project at

http://code.google.com/p/tesseract-ocr/downloads/detail?name=boxtiff-2.01.eng.tar.gz&can=2&q=
    I successfully created my own 'eng.traineddata' from these files and got
    better results than with my first approach. However, the output of the
    second approach still differed slightly from the output I got with the
    original 'eng.traineddata'.

    Also, my trained data file was smaller than the 'eng.traineddata'
    installed by Tesseract3.0.exe (the Windows installer).
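
For reference, the training sequence used in both approaches can be sketched
roughly as below. This is only an outline of the steps as I understand them
from the TrainingTesseract3 wiki page, not an authoritative recipe: the helper
name `training_commands`, the font names, and the `exp0` suffix are
illustrative assumptions, and the exact flags should be checked against the
wiki. The function only builds the command lines; it does not run anything.

```python
def training_commands(lang, fonts, exp=0):
    """Build the Tesseract 3.0 training commands as argv lists (nothing is run).

    Assumes one <lang>.<font>.exp<N>.tif image and a matching .box file
    per font; these names are hypothetical placeholders.
    """
    bases = [f"{lang}.{font}.exp{exp}" for font in fonts]
    commands = []

    # 1. Produce one .tr feature file from each TIF/BOX pair.
    for base in bases:
        commands.append(["tesseract", f"{base}.tif", base, "nobatch", "box.train"])

    # 2. Collect the character set from all box files.
    commands.append(["unicharset_extractor"] + [f"{base}.box" for base in bases])

    # 3. Cluster the features from all .tr files.
    tr_files = [f"{base}.tr" for base in bases]
    commands.append(
        ["mftraining", "-U", "unicharset", "-O", f"{lang}.unicharset"] + tr_files
    )
    commands.append(["cntraining"] + tr_files)

    # 4. After the clustering outputs have been renamed with the "<lang>."
    #    prefix (a manual step, not shown here), pack them into
    #    <lang>.traineddata.
    commands.append(["combine_tessdata", f"{lang}."])
    return commands


if __name__ == "__main__":
    for command in training_commands("eng", ["arial", "verdana"]):
        print(" ".join(command))
```

Only the classifier files are produced by this sequence; no dictionary data is
involved at any step.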


Please suggest what could be the reason for these differences.
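
In case it helps to quantify "differing slightly", a comparison along these
lines could be run on the text produced by the two traineddata files. It is a
minimal sketch using only Python's standard `difflib` module; the function
names `similarity` and `changed_lines` are hypothetical helpers, and the
inputs would be the two OCR output strings for the same image.

```python
import difflib


def similarity(reference, candidate):
    """Return a ratio between 0 and 1 (1 means the two texts are identical)."""
    matcher = difflib.SequenceMatcher(None, reference, candidate, autojunk=False)
    return matcher.ratio()


def changed_lines(reference, candidate):
    """Return only the lines that differ, prefixed with '- ' or '+ '.

    '- ' marks a line present only in the reference output,
    '+ ' marks a line present only in the candidate output.
    """
    diff = difflib.ndiff(reference.splitlines(), candidate.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

Running both functions on the same scanned page, once with each traineddata
file, would show whether the differences are spread across all characters or
concentrated in a few words.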

Regards
Sandeep

