[tesseract-ocr] Extend the standard dictionary for a language with own words

Dayton Thu, 19 Mar 2020 16:16:25 -0700

Hello everyone,

I have noticed that Tesseract fails to recognize a number of Polish words 
correctly. It usually fails on the same words in every document that I 
OCR.


Is there any way to add these words to the traineddata file? If that is not 
possible, can I train Tesseract to recognize these words correctly and then 
keep the results in a separate data file in order to improve the accuracy 
of the OCR?

The idea would be to use the standard traineddata file for Polish but also 
consult these specific words, which are not recognized correctly. I am not 
sure whether it is possible to extend the standard dictionary with my own 
words to improve the accuracy of the OCR.

Thanks in advance for your help! 

