I need to add words to the list of words recognized by Tesseract. The problem is that the list could be lengthy, and I'm concerned that if I put them all in a .user-words file, the OCR process will be very slow (I'm assuming it does the equivalent of wordlist2dawg on the .user-words file on each init()). So I had thought to take my list and "compile" it into a .traineddata file, but of course I'm missing the config, unicharset, unicharambigs, inttemp, pffmtable, and normproto files.
I know that all my words will come from the same language. Can I take the existing .traineddata file for that language, extract the config, unicharset, unicharambigs, inttemp, pffmtable, and normproto files, and use them in my own .traineddata file? Maybe another way to ask the question is this: are the config, unicharset, unicharambigs, inttemp, pffmtable, and normproto files dependent on the word list, or are they dependent only on the language and font?

Thanks,
Chris
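
To make the question concrete, here is a rough sketch of the workflow I have in mind. It assumes the combine_tessdata and wordlist2dawg tools from the Tesseract training tools are on the PATH and accept the arguments their man pages describe (-u to unpack, -o to overwrite a component); the exact behavior may differ between Tesseract versions. The function name, the paths, and the use of Python to assemble the commands are purely illustrative. The script only builds and prints the command lines, it does not run anything.

```python
def build_commands(lang, tessdata_dir, wordlist, work_dir):
    """Return the command lines for the extract / rebuild / repack idea.

    All arguments are hypothetical example values supplied by the caller.
    Paths are joined with "/" for readability.
    """
    original = f"{tessdata_dir}/{lang}.traineddata"
    # Working copy of the traineddata file; it has to be created (copied
    # from the original) before step 3, because -o modifies it in place.
    custom = f"{work_dir}/{lang}.traineddata"
    # combine_tessdata -u writes components as <prefix><component-name>,
    # e.g. work/eng.unicharset, work/eng.inttemp, work/eng.word-dawg.
    prefix = f"{work_dir}/{lang}."

    return [
        # 1. Unpack every component of the existing language file.
        ["combine_tessdata", "-u", original, prefix],
        # 2. Compile my word list into a DAWG using the extracted unicharset.
        ["wordlist2dawg", wordlist, f"{prefix}word-dawg", f"{prefix}unicharset"],
        # 3. Overwrite only the word-dawg component in the working copy.
        ["combine_tessdata", "-o", custom, f"{prefix}word-dawg"],
    ]


if __name__ == "__main__":
    for cmd in build_commands("eng", "tessdata", "my_words.txt", "work"):
        print(" ".join(cmd))
```

Whether step 3 yields a usable file is exactly what I'm asking: it only makes sense if the other components do not depend on the word list.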

