Hi all, I would like to know whether I can simply unpack, modify, and repack the files in a traineddata file and get an improvement in OCR (I am using Tesseract 3.04). More specifically, I want to add characters (such as the section sign §) and new words to the dictionary. Do I need to retrain Tesseract, or will it "just use" the new traineddata file?
Or is it perhaps a mixed situation, where some of the files below require retraining and others do not?

1. deu.bigram-dawg
2. deu.freq-dawg
3. deu.inttemp
4. deu.normproto
5. deu.number-dawg
6. deu.params-model
7. deu.pffmtable
8. deu.punc-dawg
9. deu.shapetable
10. deu.traineddata
11. deu.unicharambigs
12. deu.unicharset
13. deu.word-dawg

I did a short test in which I extended the word list with words from one document, and I do not see any significant improvement. Maybe I am doing something wrong.

Thanks in advance and best regards,
Nikolai KROT
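For reference, the unpack-modify-pack cycle for the word list alone can be sketched roughly as below. This is only a sketch under assumptions: it presumes the Tesseract 3.04 training tools `combine_tessdata`, `dawg2wordlist`, and `wordlist2dawg` are on the PATH and that `deu.traineddata` sits in the working directory. The helper names `build_commands` and `append_words`, the file name `deu.wordlist.txt`, and the sample words are hypothetical. It covers only the dictionary and says nothing about whether changing `deu.unicharset` requires retraining.

```python
import subprocess


def build_commands(lang, wordlist):
    """Return the tool invocations for one unpack/rebuild/repack cycle.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    return [
        # Unpack every component: deu.traineddata -> deu.unicharset, deu.word-dawg, ...
        ["combine_tessdata", "-u", f"{lang}.traineddata", f"{lang}."],
        # Dump the existing dictionary DAWG to a plain-text word list.
        ["dawg2wordlist", f"{lang}.unicharset", f"{lang}.word-dawg", wordlist],
        # Rebuild the DAWG from the (edited) word list.
        ["wordlist2dawg", wordlist, f"{lang}.word-dawg", f"{lang}.unicharset"],
        # Overwrite only the word-dawg component inside the traineddata file.
        ["combine_tessdata", "-o", f"{lang}.traineddata", f"{lang}.word-dawg"],
    ]


def append_words(wordlist_path, new_words):
    """Append one word per line to the plain-text word list."""
    with open(wordlist_path, "a", encoding="utf-8") as fh:
        for word in new_words:
            fh.write(word + "\n")


if __name__ == "__main__":
    unpack, dump, rebuild, repack = build_commands("deu", "deu.wordlist.txt")
    subprocess.run(unpack, check=True)
    subprocess.run(dump, check=True)
    # Example words only; replace with the words from your own documents.
    append_words("deu.wordlist.txt", ["Paragraph", "Absatz"])
    subprocess.run(rebuild, check=True)
    subprocess.run(repack, check=True)
```

Working on a backup copy of `deu.traineddata` is advisable, since the last step modifies the file in place.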

