Can you please explain why you think that
"unicharset_extractor.exe
produces wrong and incomplete files"?

Zdenko


On Fri, Jul 4, 2014 at 6:40 AM, Albrecht Hilker <[email protected]>
wrote:

> Hello
>
> Generally it is very sad that there is no detailed documentation about
> Tesseract.
>
> The only documentation about the unicharset file that I could find is this:
>
> https://tesseract-ocr.googlecode.com/svn-history/r683/trunk/doc/unicharset.5.html
>
> But this is completely insufficient and not understandable.
>
> And unicharset_extractor.exe produces wrong and incomplete files.
> So I have to edit them by hand.
> But how?
>
> I need a detailed explanation of how to enter the values for the several
> min/max parameters.
>
> The sparse documentation says that 128 is the x-height.
> Does anybody think that this information alone is enough to edit a
> unicharset file?
>
> How do I enter the width of a character?
> How do I enter the minimum bottom and the maximum bottom value?
>
> And the example given on that page does not make any sense:
>
> 1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1
> 9 8 18,66,203,255,89,156,0,39,104,173 Common 4 2 4 9
>
> So this example says that
> the character "1" has a min_bottom value of 59 and
> the character "9" has a min_bottom value of 18.
>
> Weird???
> Both digits sit on the baseline!
>
> Wouldn't it be more sensible to give "9" a higher min_bottom value
> to distinguish it from a lowercase "g"?
>
> And what about the other values,
> bearing and advance?
> Where do I get them from?
>
> The weirdest thing is that the training data may contain 32 fonts but there
> is only one unicharset file!
> If there were one unicharset file per font I would understand.
>
> But in a monospaced font the advance is equal for an "i" and a "W", while
> in Arial they are very different.
> How do I create a unicharset file that has to fit such different fonts?
>
> I need a detailed explanation with images (not only text!) of how to obtain
> these values.
>
>
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/2c8fa12f-d315-4907-b3d2-afd25eddeb00%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/2c8fa12f-d315-4907-b3d2-afd25eddeb00%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
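
For reference, the two example lines quoted above can be split into named
fields. The following is only a minimal sketch: it assumes the field order
given on the unicharset(5) page (character, properties, ten comma-separated
glyph metrics, script, then the remaining columns), and the names
parse_unichar_line and UnicharEntry are hypothetical helpers, not part of
Tesseract. It shows how to read the values, not how to choose them.

```python
from typing import NamedTuple

# Assumed order of the ten glyph metrics, as listed on the unicharset(5) page.
# All values are in the baseline-normalized system where 128 is the x-height.
METRIC_NAMES = (
    "min_bottom", "max_bottom", "min_top", "max_top",
    "min_width", "max_width", "min_bearing", "max_bearing",
    "min_advance", "max_advance",
)


class UnicharEntry(NamedTuple):
    """One parsed unicharset line (hypothetical helper type)."""
    char: str
    properties: str   # kept as text; the bit mask is not decoded here
    metrics: dict     # metric name -> integer value
    script: str
    rest: list        # remaining columns, left uninterpreted


def parse_unichar_line(line):
    """Split one whitespace-separated unicharset line into named fields."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fields: %r" % line)
    char, properties, metrics_field, script = fields[:4]
    values = [int(v) for v in metrics_field.split(",")]
    if len(values) != len(METRIC_NAMES):
        raise ValueError("expected 10 glyph metrics, got %d" % len(values))
    metrics = dict(zip(METRIC_NAMES, values))
    return UnicharEntry(char, properties, metrics, script, fields[4:])


if __name__ == "__main__":
    examples = [
        "1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1",
        "9 8 18,66,203,255,89,156,0,39,104,173 Common 4 2 4 9",
    ]
    for text in examples:
        entry = parse_unichar_line(text)
        print(entry.char, entry.metrics)
```

Read this way, each pair is a min/max range for one measurement, so the
bottom of "1" spans 59 to 69 and the bottom of "9" spans 18 to 66 in these
two lines.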

