If you can elaborate on what kind of failures you are experiencing, people might be able to help.
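One way to describe the failures concretely is to compare Tesseract's output for a page against a hand-typed ground-truth transcription, and list which characters get confused with which. The sketch below is a hypothetical helper, not part of Tesseract. It uses a plain Levenshtein alignment and defines character accuracy as 1 - edit_distance / len(truth). That is one common definition, and the 70% figure quoted below may have been measured differently.

```python
from collections import Counter


def align_errors(truth: str, ocr: str):
    """Return (character accuracy, Counter of (expected, got) substitutions).

    Hypothetical helper: accuracy is 1 - edit_distance / len(truth), one
    common definition; it can drop below 0 for very noisy OCR output.
    """
    n, m = len(truth), len(ocr)
    # dist[i][j] = edit distance between truth[:i] and ocr[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if truth[i - 1] == ocr[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion: char missing from OCR
                dist[i][j - 1] + 1,         # insertion: extra char in OCR
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Walk back through the table and collect the substitution pairs.
    subs = Counter()
    i, j = n, m
    while i > 0 and j > 0:
        cost = 0 if truth[i - 1] == ocr[j - 1] else 1
        if dist[i][j] == dist[i - 1][j - 1] + cost:
            if cost:
                subs[(truth[i - 1], ocr[j - 1])] += 1
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1

    accuracy = 1 - dist[n][m] / n if n else 1.0
    return accuracy, subs


# Final mem misread as samekh in one word (illustrative strings only).
acc, subs = align_errors("שלום עולם", "שלום עולס")
print(round(acc, 3), subs.most_common(5))
```

Both texts must be read as UTF-8 and stored in the same (logical) character order. If a handful of pairs dominate the counts, that points to specific look-alike letters rather than to a general shortage of training pages. This is worth reporting back to the list either way.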
On Monday, June 6, 2016 at 12:47:29 PM UTC+5:30, Doron Saar wrote:
>
> Hi,
>
> I'm trying to train Tesseract to work with a large library of Hebrew-language documents.
> They are all good-quality black-and-white scans, and most of them use the same font and character size.
>
> The Hebrew alphabet should be relatively simple for OCR: 27 characters, no upper/lower case, characters separated from each other, and standard punctuation like in English.
>
> Even so, after manually creating about 30 training BOX files and compiling them, I still get very poor results (about 70% accuracy).
> It does not seem to improve when I add more training data.
>
> What can cause this?
>
> Do I need more training documents?
>
> Is there a minimum character resolution?
>
> What can I do better?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/d35802dd-6acc-4dda-8101-0dc65cf31403%40googlegroups.com.

