On Thu, Jun 07, 2012 at 08:22:27AM +0200, zdenko podobny wrote:
> I start to put my notes[1] what I found (just for me ;-) ) - at the moment
> there is not a lot of information and maybe there are some things that
> I misunderstood ;-) .
>
> [1] http://www.sk-spell.sk.cx/first-notes-for-tesseract-ocr-302-traning
Thanks so much for posting your notes, Zdenko; they're very handy indeed, incomplete and incorrect though they may be ;)

I am suffering from some of the same problems as you with the output from unicharset_extractor. In particular, glyph_metrics is always:

0,255,0,255,0,32767,0,32767,0,32767

and script is always NULL.

I'm training Ancient Greek, so it seems pretty clear that script should be Greek. But does anybody know what the script field is used for? Leaving it unset doesn't seem to cause any problems. Does anybody have any clues as to why it isn't set automatically? Are there any known problems with setting it manually once the unicharset has been generated? I'll look into this more in the code when I can, but any experience from others would be most useful.

As for glyph_metrics, it seems more worrying that it isn't filled in at all. Has anybody else had any luck with it? And any idea why?

Any thoughts or ideas would be most welcome!

Nick
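For anyone who wants to try the manual edit, here is a minimal Python sketch that flags entries whose glyph_metrics are still the all-defaults value quoted above and fills in a NULL script field. It assumes each unicharset entry starts with the unichar, the hex properties, the comma-separated glyph_metrics and then the script name, and that anything after the script is left alone; the first line (the entry count) and the special NULL (space) entry are skipped. The helpers has_placeholder_metrics, guess_script and set_script are hypothetical, not part of Tesseract, and picking Greek/Latin/Common from the Unicode character name is an assumed heuristic. This only automates the edit; it says nothing about whether Tesseract 3.02 actually uses the script field.

```python
import re
import unicodedata
from typing import Optional

# The all-defaults glyph_metrics value seen in unicharset_extractor output
# (ten numbers: min/max bottom, top, width, bearing and advance).
PLACEHOLDER_METRICS = "0,255,0,255,0,32767,0,32767,0,32767"

# Assumed layout: unichar, properties, glyph_metrics, script, rest-of-line.
# Group 1 keeps the first three tokens plus whitespace, group 2 is the script,
# group 3 keeps everything after it (other fields, tabs, comments) verbatim.
ENTRY = re.compile(r"^(\S+\s+\S+\s+\S+\s+)(\S+)(.*)$", re.DOTALL)


def has_placeholder_metrics(line: str) -> bool:
    """True if the third field is the unfilled default glyph_metrics."""
    fields = line.split()
    return len(fields) >= 3 and fields[2] == PLACEHOLDER_METRICS


def guess_script(unichar: str) -> str:
    """Hypothetical heuristic: derive the script from the first character's
    Unicode name, e.g. 'GREEK SMALL LETTER ALPHA' -> 'Greek'."""
    name = unicodedata.name(unichar[0], "")
    if name.startswith("GREEK"):
        return "Greek"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"


def set_script(line: str, script: Optional[str] = None) -> str:
    """Replace a NULL script field; any other line is returned unchanged."""
    fields = line.split()
    # Skip the count line and the special "NULL" (space) entry.
    if len(fields) < 4 or fields[0] == "NULL":
        return line
    m = ENTRY.match(line)
    if m is None or m.group(2) != "NULL":
        return line
    new_script = script if script is not None else guess_script(fields[0])
    return m.group(1) + new_script + m.group(3)


if __name__ == "__main__":
    sample = "α 3 0,255,0,255,0,32767,0,32767,0,32767 NULL 0\n"
    print(has_placeholder_metrics(sample))  # True
    print(set_script(sample), end="")  # α 3 0,255,0,255,0,32767,0,32767,0,32767 Greek 0
```

To apply it to a whole file, read the unicharset with encoding="utf-8", map set_script over its lines and write them back, keeping a copy of the original in case the change turns out to matter for training.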

