Is it possible to find out which characters are included in a language set? Ideally, I'm looking for a function that returns every string value in the charset. For example, if I trained Tesseract with only the characters ABC123 in my language set, I'd like to get back a list of those 6 characters.
I see this function in baseapi.h:

    // Returns true if utf8_character is defined in the UniCharset.
    bool IsValidCharacter(const char *utf8_character);

But I'd have to potentially iterate through every UTF-8 character to get what I need. Are there any other ways that would work?
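For reference, the brute-force scan I have in mind would look roughly like the sketch below. It walks every Unicode code point, encodes it as UTF-8, and keeps the ones a predicate accepts. The names EncodeUtf8 and CollectValidCharacters are made up for illustration and are not part of Tesseract; the predicate is passed in as a std::function so the sketch compiles without the library. It also assumes every charset entry is a single code point, so it would miss any entry made of several code points (a ligature, for example).

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not a Tesseract function): encode one Unicode
// code point as a UTF-8 byte sequence.
std::string EncodeUtf8(char32_t cp) {
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Hypothetical helper: return every single-code-point string that the
// predicate accepts, in code point order. Assumption: `is_valid` wraps
// something like IsValidCharacter from baseapi.h.
std::vector<std::string> CollectValidCharacters(
    const std::function<bool(const char*)>& is_valid,
    char32_t max_code_point = 0x10FFFF) {
  std::vector<std::string> found;
  // Start at 1: code point 0 would encode to an empty C string.
  for (char32_t cp = 1; cp <= max_code_point; ++cp) {
    // Surrogates are not valid scalar values and have no UTF-8 form.
    if (cp >= 0xD800 && cp <= 0xDFFF) continue;
    const std::string utf8 = EncodeUtf8(cp);
    if (is_valid(utf8.c_str())) found.push_back(utf8);
  }
  return found;
}
```

With an initialized API object (assuming IsValidCharacter is callable on it, as the declaration in baseapi.h suggests), the predicate would be something like `[&](const char* s) { return api.IsValidCharacter(s); }`. That is over a million calls per language, which is why I'm hoping there is a more direct way.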

