Hi,
In the PersianOcr project <https://github.com/reza1615/PersianOcr> we
developed a tool in PHP and JavaScript that can create a huge box file in a
few minutes, without printing and scanning the text.

*How does it work?*
It is an HTML tool that runs in the browser. You type or paste a text into
the input text box, and the tool creates the box file and the image that are
used for traineddata training.
It works in all browsers, but for texts larger than 13000 we suggest
Firefox, because the other browsers will crash!
*Note*: IE 9 is very slow, don't use it!

*Huge Text*
Up to 10000 words, Chrome and Firefox will create the image from the text.
Beyond that you should use Firefox and its screenshot extension (Awesome
Screenshot: Capture and Annotate 2.3.7). After capturing the screen you can
crop the image to the rectangle.
*Note*: Chrome will crash for more than 10k words (tested with Persian).

*Calibrating*
Unfortunately Chrome and Firefox do not give the same result (box file) on
the same system! After making the box file you should create a matching box
file with tesseract-ocr (it is not important whether it supports your
language or not; just run
*tesseract example.tif example -l eng batch.nochop makebox*)
and compare the coordinates of the same character in both files (e.g. you
can add $$$ as the first line of your text to make the matching characters
easy to find).
You can use the shift 1, shift 2, shift 3 and shift 4 text boxes to shift
the coordinates of all the boxes until they line up with the image. After
pressing *UPDATE* your box file is ready and you can use it.
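
The comparison and shift described above can also be sketched outside the
browser. This is only an illustration, not the tool's own code: it assumes
the six-field box format "char left bottom right top page", it assumes that
one offset per coordinate is enough (how the four shift fields map to the
coordinates is a guess), and all function names are made up for the example.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_file(text: str) -> List[Box]:
    """Parse lines of the form 'char left bottom right top page'."""
    boxes = []
    for line in text.splitlines():
        parts = line.strip().rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, *nums = parts
        boxes.append(Box(char, *map(int, nums)))
    return boxes


def char_counts(boxes: List[Box]) -> Counter:
    """Count how often each character occurs, to compare two box files."""
    return Counter(b.char for b in boxes)


def marker_offset(generated: List[Box], reference: List[Box],
                  marker: str = "$") -> Tuple[int, int, int, int]:
    """Offset between the first marker box in each file (reference - generated)."""
    g = next(b for b in generated if b.char == marker)
    r = next(b for b in reference if b.char == marker)
    return (r.left - g.left, r.bottom - g.bottom,
            r.right - g.right, r.top - g.top)


def shift_boxes(boxes: List[Box], d_left: int, d_bottom: int,
                d_right: int, d_top: int) -> List[Box]:
    """Apply the same four offsets to every box."""
    return [b._replace(left=b.left + d_left, bottom=b.bottom + d_bottom,
                       right=b.right + d_right, top=b.top + d_top)
            for b in boxes]


def format_boxes(boxes: List[Box]) -> str:
    """Write the boxes back in box-file format."""
    return "\n".join(f"{b.char} {b.left} {b.bottom} {b.right} {b.top} {b.page}"
                     for b in boxes)
```

For example, parse the browser-made box file and the one made by tesseract,
call marker_offset() on the two lists, and pass the four numbers to
shift_boxes() before writing the result with format_boxes().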

*Options*
You can create box files and images for different languages (right-to-left,
left-to-right, connected characters, compact texts, ZWNJ characters,
different fonts).

*Fonts*
By changing line 22 of the default.html file you can add your own font.
*That's it!*

You can find this tool here:
<https://github.com/reza1615/PersianOcr/blob/master/BoxMaker-en.zip>

P.S. *To admins*: is it possible to add this tool to the training part of
the wiki, or to the add-ons part?
Yours,
Reza




