[tesseract-ocr] Re: Traineddata inspector

Ruwanka De Silva Fri, 18 Sep 2015 09:39:55 -0700

Hi Jozef,

Thank you for the valuable tool. I am training Tesseract for the Sinhalese 
language, and your tool is very helpful for identifying which characters 
have not been trained well. However, I have an issue when analyzing a 
traineddata file that was generated from multiple training images or from 
multiple fonts.


The issue is that the features (character glyph/feature map) of the 
characters do not match their corresponding Unicode labels. They do match 
when the traineddata file is built from only a few training images and a 
single font. Is this a bug in the tool, or is the generated traineddata file 
corrupted somehow? Please let me know what causes this.

Thank you,
Ruwanka De Silva

On Thursday, September 3, 2015 at 3:03:33 PM UTC+5:30, jm wrote:
>
> Dear all, 
>
> you can use the following web app to inspect some of the internals of 
> traineddata files:
> https://te-traineddata-ui.herokuapp.com
>
> A few notes:
> - this version does not parse cube specifics and some of the newer files;
> - free hosting limits apply which means several parallel requests will 
> kill it, be patient.
>
> Best,
> Jozef
>
>

