Hi all, I'm working on training Tesseract for Georgian script. I was wondering what type of input I should use to make the punc-dawg and number-dawg dictionaries. Right now, I have a word dictionary, which has eliminated some errors, but it has also caused Tesseract to ignore punctuation in some cases. I am hoping that providing a punc-dawg is the solution, but I haven't been able to find a good resource for this, either in the list archives or in the source files.
Can anyone tell me what type of file I should use to create the punc-dawg and number-dawg files? Thanks! Derek

