Hi list,

I have been working on a tool that automatically generates the files tesseract-ocr requires to add support for a new script. The tool takes as input a file containing all the characters of the alphabet and a directory containing the fonts to use. It generates several TIFF images with their corresponding box files, and then produces the 5 training files:
- inttemp
- normproto
- unicharset
- Microfeat
- pffmtable

Here are the links:

1. http://tesseractindic.googlecode.com/files/tesseract_trainer.beta.tar.gz - The tar ball itself
2. http://code.google.com/p/tesseractindic/source/browse/trunk/tesseract_trainer/readme - The readme file
3. http://www.youtube.com/watch?v=vuuVwm5ZjkI - YouTube video of the tool working for Bengali

I request feedback.

Thank You,
Debayan Banerjee
NIT Durgapur, India

--
Be Intelligent, Use GNU/Linux.
http://debayan.wordpress.com
http://lug.nitdgp.ac.in
http://planet-india.randomink.org

