Hi Jozef,

Thank you for the valuable tool. I am training Tesseract for the Sinhalese language, and your tool is very helpful for identifying which characters have not been trained well. However, I run into an issue when analyzing a traineddata file that was generated from multiple training images or from multiple fonts.
The issue is that the character features (glyph/feature maps) and their corresponding Unicode labels do not match. They do match when the traineddata file is built from only a few training images and a single font. Is this a bug in the tool, or is the generated traineddata file distorted somehow? Please let me know what causes this.

Thank you,
Ruwanka De Silva

On Thursday, September 3, 2015 at 3:03:33 PM UTC+5:30, jm wrote:
> Dear all,
>
> you can use the following web app to inspect some of the internals of
> traineddata files:
> https://te-traineddata-ui.herokuapp.com
>
> Few notes:
> - this version does not parse cube specifics and some of the newer files;
> - free hosting limits apply which means several parallel requests will
>   kill it, be patient.
>
> Best,
> Jozef

