If you can elaborate on what kind of failures you are experiencing, people 
might be able to help.
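
One way to describe the failures concretely is to compare the OCR output
for a page against a hand-typed transcription of the same page, and report
the character accuracy together with the most frequent substitutions. The
sketch below is only an illustration using Python's standard difflib; the
function names are hypothetical and not part of Tesseract, and the
accuracy it computes (matched characters divided by ground-truth length)
is one possible definition, not necessarily the one behind the 70% figure.

```python
import difflib
from collections import Counter


def char_accuracy(truth: str, ocr: str) -> float:
    """Fraction of ground-truth characters that difflib aligns with the OCR text."""
    if not truth:
        return 1.0 if not ocr else 0.0
    matcher = difflib.SequenceMatcher(None, truth, ocr, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


def confusions(truth: str, ocr: str) -> Counter:
    """Count (expected, got) character pairs for equal-length substitutions."""
    matcher = difflib.SequenceMatcher(None, truth, ocr, autojunk=False)
    pairs = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only pair characters one-to-one when the replaced spans have equal length.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.update(zip(truth[i1:i2], ocr[j1:j2]))
    return pairs
```

Calling confusions(truth, ocr).most_common(10) on a few pages would show
whether the errors concentrate on a handful of similar-looking letters or
are spread evenly, which is the kind of detail that helps others diagnose
the problem.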


On Monday, June 6, 2016 at 12:47:29 PM UTC+5:30, Doron Saar wrote:
>
> Hi,
>
> I'm trying to train Tesseract to work with a large library of Hebrew 
> language documents.
> They are all good-quality black-and-white scans, and most of them 
> use the same font and character size.
>
> The Hebrew alphabet should be relatively simple for OCR: 27 
> characters, no upper/lower case, characters separated from each other, and 
> standard punctuation as in English.
>
> Even so, after manually creating about 30 training BOX files and 
> compiling them, I still get very poor results 
> (about 70% accuracy).
> It does not seem to improve when I add more training data.
>
> What can cause this?
>
> Do I need more training documents?
>
> Is there a minimum character resolution?
>
> What can I do better?
>
>
