I see the same behavior with Tesseract 4.0, installed with the tessdata_best traineddata from this repository:

# printf "deb https://notesalexp.org/tesseract-ocr/$(lsb_release -sc)/ $(lsb_release -sc) main\ndeb https://notesalexp.org/tesseract-ocr/tessdata_best/ stretch main\n" >> /etc/apt/sources.list
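
If you would rather not append to sources.list blindly as root, the same two lines can be generated first and inspected. This is a minimal sketch, assuming a Debian-based system with lsb_release available; the function name notesalexp_sources is hypothetical, and the URLs are copied unchanged from the command above (including the fixed "stretch" suite on the tessdata_best line).

```shell
#!/bin/sh
# Sketch: notesalexp_sources is a hypothetical helper name.
# Print the two apt source lines from the command above for CODENAME.
notesalexp_sources() {
    codename=$1
    printf 'deb https://notesalexp.org/tesseract-ocr/%s/ %s main\n' "$codename" "$codename"
    # The tessdata_best line uses the "stretch" suite, exactly as in the
    # original command, whatever CODENAME is.
    printf '%s\n' 'deb https://notesalexp.org/tesseract-ocr/tessdata_best/ stretch main'
}
```

Run `notesalexp_sources "$(lsb_release -sc)"` to look at the lines, then `notesalexp_sources "$(lsb_release -sc)" | sudo tee -a /etc/apt/sources.list` to append them. Using tee matters when you are not already root: with `sudo printf ... >> /etc/apt/sources.list` the redirection is performed by your unprivileged shell and fails.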

On Wednesday, 25 April 2018 at 16:59:34 UTC+2, Youcef wrote:
>
> Hi,
>
>
> Tesseract seems to post-process its predictions.
>
> Here is what I get after OCRing images (same font, same image size,
> generated with text2image):
>
> - an image containing "12345678I" => `123456781`
> - an image containing "GLOTHUVFI" => `GLOTHUVFI`
> - an image containing "12345678H" => `12345678H`
> - an image containing "GLOTHUVFH" => `GLOTHUVFH`
> - an image containing "12345678A" => `123456784`
> - an image containing "GLOTHUVFA" => `GLOTHUVFA`
>
> It looks like Tesseract doesn't like a word made of several digits followed
> by a single letter. If that letter looks like a digit ("I" and "A" look like
> "1" and "4", respectively), it is replaced by the closest-looking digit.
> I have tried tuning the following parameters, without any change in the
> result:
>
> - segment_penalty_dict_frequent_word
> - language_model_penalty_chartype
>
> Thanks for any help.
>
> Regards
>
>
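
A small harness for reproducing the substitutions above one image at a time, and for trying parameter overrides. It is a sketch assuming POSIX sh, awk, and the tesseract CLI on PATH; char_diff and ocr_check are hypothetical helper names. Each extra VAR=VALUE argument is passed to tesseract as a `-c VAR=VALUE` override, which is how parameters such as segment_penalty_dict_frequent_word and language_model_penalty_chartype can be set from the command line.

```shell
#!/bin/sh
# Sketch only: char_diff and ocr_check are hypothetical helper names,
# not part of Tesseract.

# Print "pos N: EXPECTED_CHAR -> OCR_CHAR" for every position where the
# two strings differ. The strings go through the environment so that awk
# does not reinterpret backslashes in them.
char_diff() {
    WANT="$1" GOT="$2" awk 'BEGIN {
        want = ENVIRON["WANT"]; got = ENVIRON["GOT"]
        n = length(want)
        if (length(got) > n) n = length(got)
        for (i = 1; i <= n; i++) {
            w = substr(want, i, 1); g = substr(got, i, 1)
            if (w != g) printf "pos %d: %s -> %s\n", i, w, g
        }
    }'
}

# Usage: ocr_check IMAGE EXPECTED [VAR=VALUE ...]
# OCR one image, passing each VAR=VALUE as a "-c VAR=VALUE" override,
# then print the result and any per-character substitutions.
ocr_check() {
    img=$1
    want=$2
    shift 2
    # Rebuild the remaining arguments as: -c VAR1=VALUE1 -c VAR2=VALUE2 ...
    for kv in "$@"; do
        shift
        set -- "$@" -c "$kv"
    done
    # Keep only the first output line (drops trailing blank/page-break lines).
    got=$(tesseract "$img" stdout "$@" 2>/dev/null | head -n 1)
    printf '%s => %s\n' "$want" "$got"
    char_diff "$want" "$got"
}
```

For example, `ocr_check 12345678I.tif 12345678I language_model_penalty_chartype=0` (the image name and the value are placeholders). One thing worth checking: tessdata_best ships LSTM models only, and these two penalties appear to belong to the legacy recognizer's language model, so they may not be consulted at all, which would be consistent with changing them having no effect.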

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f6c60f25-83c0-4ef6-92d2-eefa85674845%40googlegroups.com.