Hello

In general, it is very unfortunate that there is no detailed documentation
for Tesseract.

The only documentation about the unicharset file that I could find is this:
https://tesseract-ocr.googlecode.com/svn-history/r683/trunk/doc/unicharset.5.html

But this is completely insufficient and hard to understand.

And unicharset_extractor.exe produces wrong and incomplete files,
so I have to edit them by hand.
But how?

I need a detailed explanation of how to enter the values for the various
min/max parameters.

The sparse documentation says that 128 is the x-height.
Does anybody think that this information alone is enough to edit a
unicharset file?

How do I enter the width of a character?
How do I enter the minimum bottom and the maximum bottom values?

And the example given on that page does not make any sense:

1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1
9 8 18,66,203,255,89,156,0,39,104,173 Common 4 2 4 9

So this example says that
the character "1" has a min_bottom value of 59 and
the character "9" has a min_bottom value of 18.
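To make clear how I am reading these lines, here is a minimal sketch in Python. It assumes that the ten comma-separated numbers come in the order listed on that man page (min_bottom, max_bottom, min_top, max_top, min_width, max_width, min_bearing, max_bearing, min_advance, max_advance). The function name parse_unicharset_line is my own and not part of Tesseract, and the fields after the metrics are kept as-is without interpretation.

```python
# Names of the ten glyph metrics, in the order I assume from the man page.
METRIC_NAMES = (
    "min_bottom", "max_bottom", "min_top", "max_top",
    "min_width", "max_width", "min_bearing", "max_bearing",
    "min_advance", "max_advance",
)


def parse_unicharset_line(line):
    """Split one unicharset entry into named fields (hypothetical helper)."""
    parts = line.split()
    char, properties, metrics = parts[0], parts[1], parts[2]
    values = [int(v) for v in metrics.split(",")]
    if len(values) != len(METRIC_NAMES):
        raise ValueError("expected 10 glyph metrics, got %d" % len(values))
    entry = {"char": char, "properties": properties}
    entry.update(zip(METRIC_NAMES, values))
    # Script and the remaining columns are left uninterpreted.
    entry["rest"] = parts[3:]
    return entry


EXAMPLE_LINES = (
    "1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1",
    "9 8 18,66,203,255,89,156,0,39,104,173 Common 4 2 4 9",
)

for line in EXAMPLE_LINES:
    e = parse_unicharset_line(line)
    print(e["char"], e["min_bottom"], e["max_bottom"])
# prints:
# 1 59 69
# 9 18 66
```

Read this way, the bottom of "1" ranges over 59..69 while the bottom of "9" ranges over 18..66.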

Weird?
Both digits sit on the baseline!

Wouldn't it make more sense to give "9" a higher min_bottom value,
to distinguish it from a lowercase "g"?

And what about the other values,
bearing and advance?
Where do I get them from?

The strangest thing is that the training data may contain 32 fonts, but there
is only one unicharset file!
If there were one unicharset file per font, I would understand.

But in a monospaced font the advance is the same for an "i" and a "W", while in
Arial they are very different.
How do I create a unicharset file that has to fit such different fonts?
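The only reading I can imagine is that each min/max pair is meant to span all fonts at once. This is purely an assumption on my part; the documentation does not say so. A minimal sketch of that idea in Python, where merge_ranges is a hypothetical helper and the advance values are made up for illustration only:

```python
def merge_ranges(per_font_ranges):
    """Combine per-font (min, max) pairs into one overall (min, max) pair."""
    mins, maxs = zip(*per_font_ranges)
    return min(mins), max(maxs)


# Made-up advance ranges for the letter "i" in two fonts (not measured data).
advance_i = {
    "monospaced": (150, 150),
    "proportional": (60, 70),
}

print(merge_ranges(advance_i.values()))  # (60, 150)
```

If that is how it is meant to work, the merged range becomes so wide that I do not see how it can still tell characters apart.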

I need a detailed explanation with images (not only text!!) how to obtain these 
values.




