Hello,

It is a pity that there is no detailed documentation for Tesseract.
The only documentation of the unicharset file that I could find is this: https://tesseract-ocr.googlecode.com/svn-history/r683/trunk/doc/unicharset.5.html

It is insufficient and hard to understand. In addition, unicharset_extractor.exe produces wrong and incomplete files, so I have to edit them by hand. But how? I need a detailed explanation of how to enter the values for the various min/max parameters. The sparse documentation says that 128 is the x-height. That information alone is not enough to edit a unicharset file. How do I enter the width of a character? How do I enter the minimum and maximum bottom values?

The example given on that page does not make sense to me either:

    1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1
    9 8 18,66,203,255,89,156,0,39,104,173 Common 4 2 4 9

According to this example, the character "1" has a min_bottom of 59 and the character "9" has a min_bottom of 18. That is strange, because both digits are aligned to the baseline. Wouldn't it make more sense to give "9" a higher min_bottom, to distinguish it from a lowercase "g"? And what about the other values, bearing and advance? Where do I get them from?

The strangest thing is that the training data may contain 32 fonts, yet there is only one unicharset file. If there were one unicharset file per font, I would understand. But in a monospaced font the advance is the same for an "i" and a "W", while in Arial they are very different. How do I create a unicharset file that has to fit such different fonts?

I need a detailed explanation with images (not only text) of how to obtain these values.
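To make the question concrete, here is a minimal Python sketch that splits one unicharset entry into named fields. It assumes the field order described on the page linked above, where the ten comma-separated numbers are min_bottom, max_bottom, min_top, max_top, min_width, max_width, min_bearing, max_bearing, min_advance and max_advance. The function name parse_unicharset_line and the dictionary keys are made up for illustration and are not part of Tesseract.

```python
# Field order assumed from the unicharset(5) page linked above.
METRIC_NAMES = (
    "min_bottom", "max_bottom", "min_top", "max_top",
    "min_width", "max_width", "min_bearing", "max_bearing",
    "min_advance", "max_advance",
)


def parse_unicharset_line(line):
    """Split one unicharset entry into named fields (illustrative helper)."""
    parts = line.split()
    if len(parts) < 4:
        raise ValueError("expected at least: char properties metrics script")
    char, properties, metrics, script = parts[:4]
    values = [int(v) for v in metrics.split(",")]
    if len(values) != len(METRIC_NAMES):
        raise ValueError(
            f"expected {len(METRIC_NAMES)} metrics, got {len(values)}"
        )
    return {
        "char": char,
        "properties": properties,  # property bit mask, kept as text
        "script": script,
        "rest": parts[4:],         # remaining fields, left unparsed
        "metrics": dict(zip(METRIC_NAMES, values)),
    }


if __name__ == "__main__":
    # The two example lines quoted in the post.
    for line in (
        "1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1",
        "9 8 18,66,203,255,89,156,0,39,104,173 Common 4 2 4 9",
    ):
        entry = parse_unicharset_line(line)
        m = entry["metrics"]
        print(entry["char"],
              "bottom:", m["min_bottom"], "to", m["max_bottom"],
              "width:", m["min_width"], "to", m["max_width"])
```

This only names the numbers so that the two example lines can be compared side by side. It says nothing about how the values should be measured or chosen, which is the open question.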

