Hi Albrecht,

On Thu, Jul 03, 2014 at 09:40:51PM -0700, Albrecht Hilker wrote:
> Generally it is very sad that there is no detailed documentation about
> Tesseract.

I agree. I do work on the documentation, but there is an awful lot
missing. I appreciate you taking the time to ask questions here so we
can help improve it.

> The only documentation about Unicharset file that I could find is this:
> https://tesseract-ocr.googlecode.com/svn-history/r683/trunk/doc/unicharset.5.html
>
> But this is completely insufficient and not understandable.

Yes, that's all there is, plus a very basic overview of the older
format in the TrainingTesseract3 wiki page, IIRC.

> And unicharset_extractor.exe produces wrong and uncomplete files.

They are not really wrong, though they are not as complete as would be
ideal.

> So I have to edit them by hand.
> But how ?

The new training program set_unicharset_properties helps by setting
some more of the properties automatically. You can see how I'm using
it in my grc Makefile if you're interested[0]. However it doesn't set
the dimensions of characters, as you've noticed.

I started looking into this a little while ago, but ran out of time to
go further (and you've clearly got further than I did already - good
job!) We should figure out exactly what's required for each value
together, and then I will very happily document it properly.

I don't have time to look into your specific questions now, sorry, but
between us we should be able to figure it out in short order.

Thanks a lot for bringing this up; as I said, it has been bothering me,
but I hadn't found the time to do anything much about it.

More soon!

Nick

0. git clone http://ancientgreekocr.org/grc.git