Hi list,
I have been working on a tool that automatically generates the files
tesseract-ocr requires to add support for a new script. The tool takes as
input a file containing all the characters of the alphabet and a directory
of fonts. It generates several TIFF images with corresponding box files,
and then produces the 5 training files:


   - inttemp
   - normproto
   - unicharset
   - Microfeat
   - pffmtable
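
To give a rough idea of the first stage (alphabet file and font directory
in, image and box file names out), here is a minimal Python sketch. It is
not the code from the tool itself. The function names, the
"<lang>.<font>.tif" naming scheme, the "*.ttf" filter and the fixed grid
layout are all assumptions made for illustration. A real run would render
the glyphs and use their actual bounding boxes. The only part taken from
tesseract is the box line layout, "character left bottom right top", with
the origin at the bottom-left corner of the image.

```python
from pathlib import Path


def read_alphabet(path):
    """Return the whitespace-separated characters listed in the alphabet file."""
    return Path(path).read_text(encoding="utf-8").split()


def grid_boxes(chars, cell=40, cols=10, margin=10):
    """Place each character in a fixed grid cell and return box-file lines.

    Assumption: every glyph fills one square cell. Each line is
    'char left bottom right top', measured from the bottom-left corner.
    """
    rows = (len(chars) + cols - 1) // cols
    height = 2 * margin + rows * cell  # image height in pixels
    lines = []
    for i, ch in enumerate(chars):
        row, col = divmod(i, cols)
        left = margin + col * cell
        right = left + cell
        top = height - (margin + row * cell)  # flip y: rows are laid out top-down
        bottom = top - cell
        lines.append(f"{ch} {left} {bottom} {right} {top}")
    return lines


def plan_outputs(font_dir, lang="xxx"):
    """List (font file, image name, box name) per font; the naming is hypothetical."""
    plan = []
    for font in sorted(Path(font_dir).glob("*.ttf")):
        stem = f"{lang}.{font.stem}"
        plan.append((font.name, f"{stem}.tif", f"{stem}.box"))
    return plan
```

For example, grid_boxes(["a", "b"]) places both characters on one row of a
60 pixel high image. The training files listed above are produced by later
steps that this sketch does not cover.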

Here are the links:


   1. http://tesseractindic.googlecode.com/files/tesseract_trainer.beta.tar.gz
      - The tar ball itself
   2. http://code.google.com/p/tesseractindic/source/browse/trunk/tesseract_trainer/readme
      - The readme file
   3. http://www.youtube.com/watch?v=vuuVwm5ZjkI
      - YouTube video of the tool working for Bengali

I would appreciate any feedback.

Thank You,
Debayan Banerjee
NIT Durgapur, India

-- 
Be Intelligent, Use GNU/Linux.
http://debayan.wordpress.com
http://lug.nitdgp.ac.in
http://planet-india.randomink.org
