[tesseract-ocr] Query possible characters at runtime

Matt Hill Mon, 10 Aug 2015 11:50:09 -0700

Is it possible to find out which characters are included in a language set?
Ideally, I'm looking for a function that returns every possible string value
in the charset. For example, if I trained Tesseract with only the characters
ABC123 in my language set, I'd like to get back a list of those 6 characters.
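One way to get that list without probing is to read the unicharset itself. A `.traineddata` file bundles a plain-text unicharset component, which the `combine_tessdata` tool can unpack (its `-u` option). The sketch below assumes the usual text layout: a first line holding the entry count, then one line per entry whose first whitespace-separated token is the character, with the space stored as `NULL`. It also assumes that newer versions add internal entries named `Joined` and `|Broken|0|1`, and skips them. `ListUnichars` is a hypothetical helper name. Inside the library, a loaded `UNICHARSET` offers `size()` and `id_to_unichar()`, so the same loop can be written against it if you already have one at hand.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: list the unichars in a Tesseract unicharset text file.
// Assumed layout: line 1 holds the entry count; each later line starts with
// the unichar, followed by properties that are ignored here.
std::vector<std::string> ListUnichars(std::istream& in) {
  std::vector<std::string> result;
  std::string line;
  int count = 0;
  if (!std::getline(in, line)) return result;  // empty input
  std::istringstream header(line);
  if (!(header >> count)) return result;  // malformed count line

  for (int i = 0; i < count && std::getline(in, line); ++i) {
    std::istringstream fields(line);
    std::string unichar;
    if (!(fields >> unichar)) continue;  // blank line: nothing to record
    if (unichar == "NULL") unichar = " ";  // the space entry is stored as "NULL"
    // Assumption: internal entries in newer versions, not real characters.
    if (unichar == "Joined" || unichar == "|Broken|0|1") continue;
    result.push_back(unichar);
  }
  return result;
}
```

For a set trained on ABC123, this would return the space entry followed by the six characters, in unichar-id order.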


I see this function in baseapi.h

  // Returns true if utf8_character is defined in the UniCharset.
  bool IsValidCharacter(const char *utf8_character);

But I'd potentially have to iterate through every UTF-8 character to get
what I need. Are there any other ways that would work?

