Hi, I have written a small GUI for Tesseract in Python. It is available at http://ocropus.googlegroups.com/web/guitesseract.py?gsc=Z8kTHBYAAACg2FwGPC9XOsx4lRLfkDCCq9K8Kz9yQIr4tC0O5ImEZA.

Implemented features:
1) Batch processing with Tesseract over all *.jpg and *.jpeg images in the selected directory.
2) Optional cropping, rotation and normalization of the images using ImageMagick. (The crop region can be chosen visually on the image preview; the program assumes the images were scanned or photographed so that the text regions are in the same position on every page.)
3) Simple interface: with well-prepared images, it takes only a few clicks to process them.
4) Automatic numbering of output pages (all/even/odd).

Planned features:
1) Interaction with Ocropus, not only Tesseract.
2) Tree view of pages->regions->region_options, allowing the user to visually select the region geometry or to check that the layout analysis returned correct results.
3) Some user-friendly tool to train Tesseract/Ocropus to recognize new languages and fonts. (I have not worked this idea out yet.)
4) Options for constructing a complete document in plain text, ODT, HTML or LaTeX. Each of these formats would need a different approach to outputting images (ODT embeds them, HTML needs PNG, LaTeX needs EPS/PDF).
5) Some simple wizard for beginners.
6) I18n, packaging, distribution with Ocropus.
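
For readers who want a feel for what the implemented batch features involve, here is a minimal sketch. It is not taken from guitesseract.py: all function names are hypothetical, and it rests on several assumptions. The crop is applied before the rotation, the prepared images are written as TIFF files named page_NNNN.tif, and "even"/"odd" numbering means that the images receive only even or only odd page numbers (as when one side of each sheet was scanned at a time). It uses only the ImageMagick `convert` options `-crop`, `+repage`, `-rotate` and `-normalize`, and the basic `tesseract image outputbase [-l lang]` call.

```python
import os
import subprocess


def find_images(directory):
    """Return the *.jpg / *.jpeg files in `directory`, sorted by name."""
    return [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if name.lower().endswith((".jpg", ".jpeg"))
    ]


def page_numbers(count, mode="all", start=1):
    """Return `count` page numbers: consecutive, only even, or only odd.

    Assumption: "even"/"odd" skip every other number.
    """
    if mode == "all":
        step = 1
    elif mode in ("even", "odd"):
        step = 2
        wanted = 0 if mode == "even" else 1
        if start % 2 != wanted:
            start += 1  # move to the first number of the requested parity
    else:
        raise ValueError("mode must be 'all', 'even' or 'odd'")
    return [start + i * step for i in range(count)]


def convert_command(src, dst, crop=None, rotate=None, normalize=False):
    """Build an ImageMagick command line; crop is (x, y, width, height)."""
    cmd = ["convert", src]
    if crop is not None:
        x, y, width, height = crop
        # +repage drops the virtual canvas offset left behind by -crop.
        cmd += ["-crop", "%dx%d+%d+%d" % (width, height, x, y), "+repage"]
    if rotate is not None:
        cmd += ["-rotate", str(rotate)]
    if normalize:
        cmd.append("-normalize")
    cmd.append(dst)
    return cmd


def tesseract_command(image, output_base, lang=None):
    """Build a Tesseract command line; Tesseract appends .txt itself."""
    cmd = ["tesseract", image, output_base]
    if lang is not None:
        cmd += ["-l", lang]
    return cmd


def process_directory(directory, crop=None, rotate=None, normalize=False,
                      numbering="all", lang=None):
    """Prepare and recognize every JPEG in `directory` (hypothetical driver)."""
    images = find_images(directory)
    numbers = page_numbers(len(images), numbering)
    for image, number in zip(images, numbers):
        base = os.path.join(directory, "page_%04d" % number)
        prepared = base + ".tif"  # assumed intermediate format
        subprocess.check_call(
            convert_command(image, prepared, crop, rotate, normalize))
        subprocess.check_call(tesseract_command(prepared, base, lang))
```

A call such as `process_directory("scans", crop=(100, 150, 1200, 1800), normalize=True, numbering="odd")` would then apply the same crop region to every image, which matches the assumption that the text sits in the same position on every page. It requires `convert` and `tesseract` to be on the PATH.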
