If you intend to do such a project, please consider the path taken by a very old, and now defunct, project: Clara OCR - Cooperative optical recognition <http://freecode.com/projects/claraocr>

I mean: skip the final, fully automatic AI character recognition and use only the character segmentation engine. Then group all the segmented characters into logical labels, and make the final part of recognition manual: assisted by your training UI, and revised by real human eyes rather than by that stupid AI which manages to mislabel a character every now and then.

It is my belief that with this approach recognition accuracy can get very close to 100%, at the cost of only a modest amount of extra time for a brief, final human revision. That revision is not too time consuming, because this is what humans do best: spotting the black wolf in a flock of white sheep.
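To make the idea concrete, here is a minimal sketch of that workflow in Python: segment the glyphs, group the ones that look the same, and let a human label each group once. This is only my illustration under loud assumptions, not Clara OCR's or tesseract's actual algorithm. The page is a toy grid of `#` and `.`, the function names (`segment`, `shape_key`, `group_glyphs`, `apply_labels`) are hypothetical, and glyphs are grouped only when their pixels match exactly. A real scan would need a tolerant similarity measure instead.

```python
from collections import defaultdict, deque

def segment(image):
    """Find connected ink components (8-connectivity), ordered left to right."""
    rows, cols = len(image), len(image[0])
    seen, glyphs = set(), []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != "#" or (r, c) in seen:
                continue
            # Breadth-first flood fill collects one glyph's pixels.
            queue, pixels = deque([(r, c)]), []
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == "#"
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            glyphs.append(pixels)
    glyphs.sort(key=lambda px: min(x for _, x in px))
    return glyphs

def shape_key(pixels):
    """Shift a glyph to the origin so identical shapes compare equal.
    Assumption: exact pixel match; real scans need fuzzy matching."""
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    return frozenset((y - top, x - left) for y, x in pixels)

def group_glyphs(glyphs):
    """Map each distinct shape to the positions where it occurs."""
    groups = defaultdict(list)
    for index, pixels in enumerate(glyphs):
        groups[shape_key(pixels)].append(index)
    return groups

def apply_labels(glyphs, groups, labels):
    """Spread one human-given label per group over all of its glyphs."""
    text = ["?"] * len(glyphs)  # "?" marks glyphs nobody has labeled yet
    for key, indices in groups.items():
        for i in indices:
            text[i] = labels.get(key, "?")
    return "".join(text)

# Toy page: a bar, an L shape, and another bar.
PAGE = [
    "#..#...#",
    "#..#...#",
    "#..##..#",
]

glyphs = segment(PAGE)
groups = group_glyphs(glyphs)

# The human looks at one sample per group and types its label.
labels = {shape_key(glyphs[0]): "I", shape_key(glyphs[1]): "L"}
print(apply_labels(glyphs, groups, labels))  # prints "ILI"
```

The human labels two groups here, not three glyphs. On a real page with thousands of characters and a few dozen distinct shapes, that is where the time saving would come from, and a wrong glyph in a group is exactly the black wolf a reviewer can spot.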
=======================================
linux is free, but needed expertise to use this little beast is a personal, time consuming, continuous accumulation of knowledge
and wasted time can not be rolled back no matter how much money you have
selling free software can not bring to you too much money, but USING free software you can make a lot of money like Google ... or IBM
your little help to free software development does not bring to you any money but can help you use it more efficiently, so you can make more money ...
meanwhile, other users of that little free software program, using your contribution can make more money!
nobody loses anything, all those who know how to use a free software program, in continuous evolution, wins
registered linux user #352479

2016-09-14 13:55 GMT+03:00 Nalin Linux <[email protected]>:

> Dear list members,
> Currently I am developing a tesseract training GUI based on the manual
> cited at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract.
> The deb installer package is attached to this mail; it was tested on
> Ubuntu 16.04. Please test the trainer and report your feedback.
>
> Installing from git
> Dependency list: tesseract-ocr, imagemagick, cuneiform,
> python3-imaging-sane|python3-sane, espeak, poppler-utils, python3-enchant,
> aspell-en, python3-speechd
> git clone https://gitlab.com/Nalin-x-Linux/lios-3.git
> cd lios-3
> python3 setup.py install --install-data=/usr
>
> Lios Forum : https://groups.google.com/forum/#!forum/lios
>
> Thanking you, Nalin
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/08ea1575-d457-4893-aa5d-f96c130e3904%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

