Hi,

In the PersianOcr project <https://github.com/reza1615/PersianOcr> we developed a tool in PHP and JavaScript that can create a huge box file in a few minutes, without printing and scanning the text.

*How does it work?*
It is an HTML tool that runs in the browser. You type or paste a text into the input text box, and it creates the box file and the image that are used for traineddata training. It works in all browsers, but for texts longer than 13000 we suggest using Firefox, because the other browsers will crash!
*Note*: IE 9 is very slow, don't use it!

*Huge Text*
Up to 10000 words, Chrome and Firefox will create the image from the text. For more than that, you should use Firefox and its screenshot extension (Awesome Screenshot: Capture and Annotate 2.3.7). After capturing the screen, you can crop the image according to the rectangle.
*Note*: Chrome will crash for more than 10k words (tested for Persian).

*Calibrating*
Unfortunately, Chrome and Firefox do not give the same result (box file) on the same system! After making the box file, you should create a similar box file with tesseract-ocr. It does not matter whether it supports your language or not; just run:

tesseract example.tif example -l eng batch.nochop makebox

Then compare the numbers (coordinates) of the same character in both box files. For example, you can add $$$ as the first line of your text to make matching characters easy to find. You can use the shift 1, shift 2, shift 3 and shift 4 text boxes to shift the coordinates of all boxes, so that the whole set of boxes is calibrated against the image. After pressing *UPDATE*, your box file is ready and you can use it.

*Options*
You can create box files and images for different languages (right-to-left, left-to-right, connected characters, compact texts, ZWNJ characters, different fonts).

*Fonts*
You can add your own font by changing line 22 of the default.html file.

*That's it!*
You can find this tool here: <https://github.com/reza1615/PersianOcr/blob/master/BoxMaker-en.zip>

P.S. *To Admins*: is it possible to add this tool to the wiki's training part? or add in part?

Yours,
Reza
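For anyone who prefers to do the calibration step outside the browser, here is a minimal Python sketch of the same idea. It is not the tool's own code. It assumes the usual Tesseract box-file layout of `symbol left bottom right top page` on each line, and it assumes a single `$` marker character is present in both the generated box file and the one produced by `makebox`. The function names (`parse_boxes`, `marker_offset`, `shift_boxes`, `format_boxes`) and the sample coordinates are made up for illustration, and the sketch only applies one horizontal and one vertical offset, which may not correspond to the four shift fields in the tool.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_boxes(text: str) -> List[Box]:
    """Parse box-file lines of the assumed form 'symbol left bottom right top page'."""
    boxes = []
    for line in text.splitlines():
        parts = line.strip().rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        symbol, *numbers = parts
        boxes.append(Box(symbol, *map(int, numbers)))
    return boxes


def marker_offset(reference: List[Box], generated: List[Box],
                  marker: str = "$") -> Tuple[int, int]:
    """Return (dx, dy) that moves the first marker box in 'generated'
    onto the first marker box in 'reference'."""
    ref = next((b for b in reference if b.symbol == marker), None)
    gen = next((b for b in generated if b.symbol == marker), None)
    if ref is None or gen is None:
        raise ValueError("marker character not found in both box files")
    return ref.left - gen.left, ref.bottom - gen.bottom


def shift_boxes(boxes: List[Box], dx: int, dy: int) -> List[Box]:
    """Shift every box by dx horizontally and dy vertically."""
    return [
        b._replace(left=b.left + dx, right=b.right + dx,
                   bottom=b.bottom + dy, top=b.top + dy)
        for b in boxes
    ]


def format_boxes(boxes: List[Box]) -> str:
    """Serialize boxes back to box-file text."""
    return "\n".join(
        f"{b.symbol} {b.left} {b.bottom} {b.right} {b.top} {b.page}"
        for b in boxes
    )


if __name__ == "__main__":
    # Hypothetical sample data: 'reference' stands for the makebox output,
    # 'generated' for the box file produced in the browser.
    reference = parse_boxes("$ 10 100 20 120 0\na 25 100 35 118 0")
    generated = parse_boxes("$ 13 96 23 116 0\na 28 96 38 114 0")
    dx, dy = marker_offset(reference, generated)
    print(format_boxes(shift_boxes(generated, dx, dy)))
```

In practice you would read the two `.box` files from disk instead of the inline strings. A single constant offset is the simplest case; if the browser and Tesseract results differ by more than a shift, this sketch will not be enough.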

