Hello everyone, I have noticed that Tesseract does not correctly recognize a number of Polish words. It usually fails on the same words in every document that I OCR.
Is there any way to add these words to the traineddata file? If that is not possible, can I train Tesseract to recognize these words correctly and keep the results in a separate data file to improve OCR accuracy? The idea would be to use the standard traineddata file for Polish while also checking these specific words that are not recognized correctly. I am not sure whether it is possible to extend the standard dictionary with my own words to improve OCR accuracy. Thanks in advance for your help!

