I keep getting the same mistakes. The letter ו is often read as ט, the letter נ is often read as ), and so on.
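To put numbers on these recurring confusions, one option is to align a page of OCR output against its hand-checked transcription and count which characters get swapped. The following is only a rough sketch, not part of Tesseract: the function name `confusion_counts` and the sample strings are made up for illustration, and it only counts same-length substitutions found by Python's standard `difflib`, so inserted or dropped characters are ignored.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_counts(truth: str, ocr: str) -> Counter:
    """Count (expected, recognized) character pairs where the OCR text differs.

    Hypothetical helper: only substitutions of equal length are counted;
    insertions and deletions are skipped.
    """
    counts = Counter()
    # autojunk=False keeps difflib from treating frequent characters as noise.
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for expected, recognized in zip(truth[i1:i2], ocr[j1:j2]):
                counts[(expected, recognized)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample: a ground-truth line and the same line as misread by OCR.
    truth_line = "שלום נח"
    ocr_line = "שלטם )ח"
    for (expected, recognized), n in confusion_counts(truth_line, ocr_line).most_common():
        print(f"{expected!r} read as {recognized!r}: {n} time(s)")
```

Running this over several pages should show whether the errors are concentrated in a few letter pairs (which points at the training data for those glyphs) or spread evenly across the alphabet.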
When I add more training data files, the results get worse instead of better.

On Monday, June 6, 2016 at 1:51:45 PM UTC+3, Ashish Goel wrote:
> If you can elaborate on what kind of failures you are experiencing, people
> might be able to help.
>
> On Monday, June 6, 2016 at 12:47:29 PM UTC+5:30, Doron Saar wrote:
>> Hi,
>>
>> I'm trying to train Tesseract to work with a large library of Hebrew
>> language documents.
>> They are all good-quality scans, black and white, and most of them
>> have the same font and character size.
>>
>> The Hebrew alphabet should be relatively simple for OCR: 27
>> characters, no upper/lower case, characters separated from each other, and
>> standard punctuation like in English.
>>
>> Even so, after manually creating about 30 training BOX files and
>> compiling them, I still get very poor results
>> (about 70% accuracy).
>> It does not seem to improve when I add more training data.
>>
>> What can cause this?
>>
>> Do I need more training documents?
>>
>> Is there a minimum character resolution?
>>
>> What can I do better?

