[tesseract-ocr] modifying traineddata without retraining possible?

Nikolai Krot Wed, 01 Nov 2017 12:42:07 -0700

Hi all,

I would like to know whether I can simply unpack, modify, and repack the files 
in a traineddata archive and get an improvement in OCR (I am using Tesseract 
3.04). More specifically, I want to add characters (such as the section sign §) 
and new words to the dictionary. Do I need to retrain Tesseract, or will it 
"just use" the new traineddata file?


Or is it a mixed situation, where some of the files below require retraining 
and others do not?

   1. deu.bigram-dawg
   2. deu.freq-dawg
   3. deu.inttemp
   4. deu.normproto
   5. deu.number-dawg
   6. deu.params-model
   7. deu.pffmtable
   8. deu.punc-dawg
   9. deu.shapetable
   10. deu.traineddata
   11. deu.unicharambigs
   12. deu.unicharset
   13. deu.word-dawg

I ran a short test in which I extended the wordlist with words from one 
document, and I did not see any significant improvement. Maybe I am doing 
something wrong.
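
To make the unpack-modify-pack cycle for the wordlist concrete, here is a 
minimal sketch in Python. It assumes the standard Tesseract 3.x training tools 
(combine_tessdata, dawg2wordlist, wordlist2dawg) are on the PATH. The helper 
names (merge_words, build_commands, run_cycle) and the work directory layout 
are hypothetical, chosen only for illustration. It covers only the word-dawg 
component and does not address whether the other files need retraining.

```python
import subprocess


def merge_words(existing, new_words):
    """Return existing words followed by unseen new words, order preserved."""
    seen = set()
    merged = []
    for word in list(existing) + list(new_words):
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


def build_commands(lang, workdir, wordlist):
    """Build (but do not run) the four tool invocations for one cycle."""
    prefix = f"{workdir}/{lang}."            # e.g. work/deu.
    traineddata = prefix + "traineddata"     # copy of the original archive
    unicharset = prefix + "unicharset"
    word_dawg = prefix + "word-dawg"
    return {
        # Unpack every component next to the archive, using the prefix.
        "unpack": ["combine_tessdata", "-u", traineddata, prefix],
        # Dump the compiled dictionary back into a plain-text wordlist.
        "dump": ["dawg2wordlist", unicharset, word_dawg, wordlist],
        # Recompile the (edited) wordlist against the same unicharset.
        "rebuild": ["wordlist2dawg", wordlist, word_dawg, unicharset],
        # Overwrite only the word-dawg component inside the archive.
        "repack": ["combine_tessdata", "-o", traineddata, word_dawg],
    }


def run_cycle(lang, workdir, extra_words):
    """Run the full cycle; requires the Tesseract training tools."""
    wordlist = f"{workdir}/{lang}.wordlist.txt"  # hypothetical file name
    cmds = build_commands(lang, workdir, wordlist)
    subprocess.run(cmds["unpack"], check=True)
    subprocess.run(cmds["dump"], check=True)
    with open(wordlist, encoding="utf-8") as f:
        existing = f.read().splitlines()
    with open(wordlist, "w", encoding="utf-8") as f:
        f.write("\n".join(merge_words(existing, extra_words)) + "\n")
    subprocess.run(cmds["rebuild"], check=True)
    subprocess.run(cmds["repack"], check=True)
```

Calling run_cycle("deu", "work", ["Paragraf"]) on a copy of deu.traineddata 
placed in work/ would rebuild only deu.word-dawg. wordlist2dawg compiles 
against the existing unicharset, so this sketch by itself does not cover 
adding a new character such as §.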

Thanks in advance and best regards,
Nikolai KROT
