Regarding Training of Tesseract for English Language

Sandeep Parmar Thu, 23 Jun 2011 21:40:00 -0700

Hi all,

I am evaluating Tesseract for my project and have found it very good
compared to other free OCR engines. However, I have some
doubts regarding training Tesseract 3.0 for new font types. I tried two things
while training Tesseract:


1) I made a text document containing all the letters, digits, and ASCII
characters in different fonts such as Times New Roman,
    Arial, Verdana, and Comic Sans. I printed the pages and scanned
them to make TIF images. Then I followed the steps
    for training Tesseract 3.0 at
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

    But the results I got from my trained data were very poor, not
comparable to the 'eng.traineddata' provided by default.

2) Then I decided to build a traineddata file from the TIF and BOX files for
Tesseract 2.04 that the Tesseract project provides at

http://code.google.com/p/tesseract-ocr/downloads/detail?name=boxtiff-2.01.eng.tar.gz&can=2&q=
     I successfully created my own 'eng.traineddata' from these and got
improved results compared to my first approach. But, the output of
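
For reference, the sequence of training commands I ran in both approaches can be sketched roughly as below. This is only a sketch, not the exact commands: the helper name training_commands and the font labels are hypothetical, the file naming assumes the [lang].[fontname].exp[num] convention, and the mftraining flags are an assumption that may differ between 3.0x releases (later ones also take a font_properties file). The function only builds the command lines and does not run anything.

```python
from typing import List


def training_commands(lang: str, fonts: List[str], exp: int = 0) -> List[List[str]]:
    """Build (but do not run) an assumed Tesseract 3.0x training sequence."""
    # One base name per font page, e.g. "eng.arial.exp0".
    bases = [f"{lang}.{font}.exp{exp}" for font in fonts]
    cmds: List[List[str]] = []

    # 1) One box.train run per page; expects <base>.tif and <base>.box
    #    side by side and writes <base>.tr.
    for base in bases:
        cmds.append(["tesseract", f"{base}.tif", base, "nobatch", "box.train"])

    # 2) Collect the character set from all box files.
    cmds.append(["unicharset_extractor"] + [f"{b}.box" for b in bases])

    # 3) Shape and normalization training over all .tr files.
    #    Flags here are an assumption; check the wiki page for your version.
    trs = [f"{b}.tr" for b in bases]
    cmds.append(["mftraining", "-U", "unicharset", "-O", f"{lang}.unicharset"] + trs)
    cmds.append(["cntraining"] + trs)

    # 4) After the output files (inttemp, normproto, pffmtable, ...) are
    #    renamed with the "<lang>." prefix, pack them into <lang>.traineddata.
    cmds.append(["combine_tessdata", f"{lang}."])
    return cmds


if __name__ == "__main__":
    fonts = ["timesnewroman", "arial", "verdana", "comicsans"]  # hypothetical labels
    for cmd in training_commands("eng", fonts):
        print(" ".join(cmd))
```

Printing the list this way makes it easy to compare the two approaches, since only the set of TIF/BOX pairs changes between them.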

