unicharset script and metrics questions

Nick White Thu, 07 Jun 2012 05:42:55 -0700

On Thu, Jun 07, 2012 at 08:22:27AM +0200, zdenko podobny wrote:
> I have started to write up my notes[1] on what I found (just for me ;-) ).
> At the moment there is not a lot of information, and maybe there are some
> things that I misunderstood ;-) .
> 
> [1] http://www.sk-spell.sk.cx/first-notes-for-tesseract-ocr-302-traning


Thanks so much for posting your notes, Zdenko. They're very handy
indeed, incomplete and incorrect though they may be ;)

I am suffering from some of the same problems as you with the output
from unicharset_extractor. In particular, glyph_metrics is always:
0,255,0,255,0,32767,0,32767,0,32767
and script is always NULL.
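
For anyone wanting to check their own file, here is a minimal Python
sketch that lists the entries still carrying these placeholder values.
It assumes each unicharset entry is whitespace-separated as
"unichar properties glyph_metrics script ...", with glyph_metrics being
ten comma-separated integers; that layout is my reading of the 3.0x
format, not something I have confirmed in the code. The function names
and the sample values for the second entry are made up for illustration.

```python
# Placeholder metrics as they appear in the untouched unicharset.
DEFAULT_METRICS = "0,255,0,255,0,32767,0,32767,0,32767"

def parse_unicharset_line(line):
    """Split one entry into named fields (field layout is an assumption)."""
    fields = line.split()
    # Skip the count header, the special NULL entry, and shorter old-style lines.
    if len(fields) < 4 or fields[2].count(",") != 9:
        return None
    return {
        "unichar": fields[0],
        "properties": fields[1],
        "glyph_metrics": fields[2],
        "script": fields[3],
    }

def find_unset_entries(lines):
    """Return unichars whose metrics are the placeholder or whose script is NULL."""
    unset = []
    for line in lines:
        entry = parse_unicharset_line(line)
        if entry is None:
            continue
        if entry["glyph_metrics"] == DEFAULT_METRICS or entry["script"] == "NULL":
            unset.append(entry["unichar"])
    return unset

# Illustrative data only: the metrics on the last line are invented.
sample = [
    "3",
    "NULL 0 NULL 0",
    "α 3 0,255,0,255,0,32767,0,32767,0,32767 NULL 2",
    "β 3 60,70,180,200,40,55,0,5,45,60 Greek 3",
]
print(find_unset_entries(sample))
```

To run it on a real file, pass the lines read from the unicharset (opened
as UTF-8) instead of the sample list.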

I'm training Ancient Greek, so it seems pretty clear that script
should be Greek. But does anybody know what the script field is used
for? Not setting it doesn't seem to cause any problems. Does anybody
have any clues as to why it wouldn't be set automatically? Are there
any known problems with setting it manually once the unicharset has
been generated? I'll look into these more in the code when I can, but
any experience from others would be most useful.
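
If setting it manually does turn out to be safe, which I have not
confirmed, the edit itself could be sketched in Python roughly as
follows. It assumes the script name is the fourth whitespace-separated
field of each entry and that the third field holds the ten
comma-separated metrics; set_script is a hypothetical helper, not part
of the Tesseract tools.

```python
def set_script(line, script):
    """Replace a NULL script field with `script`; leave other lines untouched.

    Assumed entry layout: unichar properties glyph_metrics script ...
    """
    fields = line.split()
    looks_like_entry = len(fields) >= 4 and fields[2].count(",") == 9
    if looks_like_entry and fields[3] == "NULL":
        fields[3] = script
        return " ".join(fields)
    return line

# Illustrative lines only.
lines = [
    "2",
    "NULL 0 NULL 0",
    "α 3 0,255,0,255,0,32767,0,32767,0,32767 NULL 2",
]
for line in lines:
    print(set_script(line, "Greek"))
```

Work on a copy of the unicharset, since whether later training steps
accept a hand-edited script field is exactly the open question.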

As for the glyph_metrics, it is more worrying that it doesn't seem
to be filled in at all. Has anybody else had any luck with it?
And any idea why?

Any thoughts or ideas would be most welcome!

Nick
