Re: Tess v3 not recognising accented Esperanto characters.

Nick White Mon, 24 Sep 2012 05:36:33 -0700

Hi Donaldo,

Well, I'm relieved it's compiling for you now!


I'll reply to your questions below.

> Right, so now what do I do? You said last week to use commands such as:
> 
> ./lazytrain textfile.txt DejaVu-Serif-Book 1.png 1.box
> 
> which seems to work, but I guess that I need to run it for many fonts and then
> combine them all into a traineddata file?

Yep, exactly correct.
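Running it for many fonts is just a loop. Here's a minimal Python sketch. It assumes lazytrain takes its arguments in the order shown in the command above: text file, font name, output image, output box file. The second font name and the numbered output names are hypothetical placeholders. By default it only prints the commands, so you can check them before running anything. Combining the resulting image/box pairs into a traineddata file is a separate step done with Tesseract's own training tools, and isn't shown here.

```python
import shlex
import subprocess

# Hypothetical font list: only DejaVu-Serif-Book comes from this thread.
# Replace these with fonts that match what you will actually be scanning.
FONTS = ["DejaVu-Serif-Book", "DejaVu-Serif-Bold"]


def lazytrain_commands(textfile, fonts):
    """Build one lazytrain command per font.

    Assumes the argument order used earlier in this thread:
        ./lazytrain TEXTFILE FONT IMAGE.png BOXFILE.box
    Numbering the .png/.box pair per font is just an illustrative choice.
    """
    cmds = []
    for i, font in enumerate(fonts, start=1):
        cmds.append(["./lazytrain", textfile, font, f"{i}.png", f"{i}.box"])
    return cmds


def run_all(cmds, dry_run=True):
    """Print each command (dry run) or execute it, stopping on failure."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(lazytrain_commands("textfile.txt", FONTS))
```

Once the printed commands look right, call `run_all(..., dry_run=False)` to actually generate the image and box files.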

> Are there guidelines on choice of fonts for a Latin-based alphabet,
> bold, italic etc? How many is enough? 

Well, the fonts should reflect whatever you are actually going to be
scanning. For example, I initially used every font on my system that
covered the characters I cared about. But choosing a subset that
actually represented what was printed in the books I was scanning
produced much better results.

> Is it
> still necessary to have a text file with multiple instances of each letter,
> which I have, but which will produce large box files?

Yes. Large box files are fine; don't fear them ;) One of the
advantages of generating this stuff automatically is that you can be
far more comprehensive than you would ever have time to be by hand.
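To check that your text file really does contain multiple instances of each accented letter, you can count them before generating anything. Here's a small Python sketch that counts the six accented Esperanto letters in upper and lower case. It normalises the text to NFC first, so a letter typed as a base letter plus a combining accent is counted the same as the precomposed character. The threshold of 10 is an arbitrary assumption, not a Tesseract requirement.

```python
import unicodedata
from collections import Counter

# The six accented Esperanto letters, lower and upper case.
ESPERANTO_ACCENTED = "ĉĝĥĵŝŭĈĜĤĴŜŬ"


def letter_counts(text, letters=ESPERANTO_ACCENTED):
    """Count how often each of `letters` appears in `text`.

    NFC normalisation makes 'c' + U+0302 (combining circumflex)
    count as the precomposed 'ĉ'.
    """
    counts = Counter(unicodedata.normalize("NFC", text))
    return {ch: counts.get(ch, 0) for ch in letters}


def too_rare(text, minimum=10, letters=ESPERANTO_ACCENTED):
    """Return the letters appearing fewer than `minimum` times.

    The default of 10 is an arbitrary illustrative threshold.
    """
    return [ch for ch, n in letter_counts(text, letters).items() if n < minimum]


if __name__ == "__main__":
    with open("textfile.txt", encoding="utf-8") as f:
        print(too_rare(f.read()))
```

An empty list means every accented letter meets the threshold. Any letters it prints need more examples in the training text.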

Hope this helps; let us know if you have any other questions.

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
