I just get the same mistakes all the time.

The letter ו is often read as ט 
The letter נ is often read as )
and so on.

When I add more training data files, the results get worse rather than 
better.
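
One way to pin down which substitutions recur (such as ו read as ט) is to tally them against hand-typed ground truth for a few pages. The following is only a minimal sketch, assuming the ground truth and the OCR output are available as plain strings; the function names `confusion_counts` and `char_accuracy` are hypothetical helpers, not part of Tesseract, and the alignment uses Python's standard `difflib` rather than any Tesseract tool.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_counts(truth: str, ocr: str) -> Counter:
    """Count (expected, recognized) character pairs where the OCR text differs.

    Hypothetical helper: only equal-length 'replace' spans are counted, so
    insertions and deletions are ignored in this sketch.
    """
    counts = Counter()
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for expected, got in zip(truth[i1:i2], ocr[j1:j2]):
                counts[(expected, got)] += 1
    return counts


def char_accuracy(truth: str, ocr: str) -> float:
    """Rough accuracy: fraction of ground-truth characters that were matched."""
    if not truth:
        return 1.0
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


if __name__ == "__main__":
    # Illustrative strings only, not real Tesseract output.
    truth = "שלום ונ"
    ocr = "שלום ט)"
    for (expected, got), n in confusion_counts(truth, ocr).most_common():
        print(f"{expected!r} read as {got!r}: {n}")
    print(f"approximate character accuracy: {char_accuracy(truth, ocr):.2f}")
```

Comparing the tally before and after adding a batch of training files would show whether the new files introduce fresh confusions or only reinforce the existing ones.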


On Monday, June 6, 2016 at 1:51:45 PM UTC+3, Ashish Goel wrote:
>
> If you can elaborate on what kind of failures you are experiencing, people 
> might be able to help.
>
>
> On Monday, June 6, 2016 at 12:47:29 PM UTC+5:30, Doron Saar wrote:
>>
>> Hi,
>>
>> I'm trying to train Tesseract to work with a large library of Hebrew 
>> language documents.
>> They are all good-quality scans, black and white, and most of them 
>> use the same font and character size.
>>
>> The Hebrew alphabet should be relatively simple for OCR: 27 
>> characters, no upper/lower case, characters separated from each other, and 
>> standard punctuation as in English.
>>
>> Even so, after manually creating about 30 training box files and 
>> compiling them, I still get very poor results 
>> (about 70% accuracy).
>> Accuracy does not seem to improve when I add more training data.
>>
>> What can cause this?
>>
>> Do I need more training documents?
>>
>> Is there a minimum character resolution?
>>
>> What can I do better?
>>
>>
