Hi Albrecht,

On Thu, Jul 03, 2014 at 09:40:51PM -0700, Albrecht Hilker wrote: 
> Generally it is very sad that there is no detailed documentation about
> Tesseract.

I agree. I do work on the documentation, but there is an awful lot 
missing. I appreciate you taking the time to ask questions here so 
we can help improve it.

> The only documentation about Unicharset file that I could find is this:
> https://tesseract-ocr.googlecode.com/svn-history/r683/trunk/doc/unicharset.5.html
> But this is completely insufficient and not understandable.

Yes, that's all there is, plus a very basic overview of the older 
format in the TrainingTesseract3 wiki page, IIRC.

> And unicharset_extractor.exe produces wrong and incomplete files.

They are not really wrong, though they are not as complete as would 
be ideal.

> So I have to edit them by hand.
> But how ?

The new training program set_unicharset_properties helps by setting 
some more of the properties automatically. You can see how I'm using 
it in my grc Makefile if you're interested[0].

However, it doesn't set the dimensions of characters, as you've
noticed. I started looking into this a little while ago, but ran out
of time to go further. You've clearly already got further than I
did, so good job!

We should figure out exactly what's required for each value 
together, and then I will very happily document it properly.

I don't have time to look into your specific questions now, sorry, 
but between us we should be able to figure it out in short order.

Thanks a lot for bringing this up; as I said, it has been bothering 
me, but I hadn't found the time to do anything much about it.

More soon!


0. git clone http://ancientgreekocr.org/grc.git
