Ad 1. This file was generated by Google on their internal system. The tools
are now open-sourced (or rather "ported", so that they use free libraries
instead of Google-internal ones); see 3Training.pdf[1], though I would
suggest reading all of the presentations. As for the fonts used, I guess the
file eng.cube.size[2] should give you a relevant indication.

Ad 3. I am not sure there is a simple answer. IMO, for "modern" fonts and
text without graphics, using tesseract for OCR should not be a problem. For
old fonts (e.g. Fraktur) you will need training. If your text contains
graphics, tables, etc., you should expect problems[3]. I have heard about
companies that successfully use tesseract for OCR of invoices, but their
software does the image pre-processing, page segmentation and text
post-processing, and tesseract is used only for OCR of the text areas.
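
To illustrate that split of responsibilities, here is a minimal Python sketch.
It is only an assumption about how such a pipeline could be organised, not how
any particular company does it. The names run_pipeline and tesseract_cli are
hypothetical, and the preprocess and segment steps are placeholders for your
own code (they are not part of tesseract). The only tesseract-specific part is
the plain "tesseract <image> <outputbase>" call, which writes <outputbase>.txt.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def tesseract_cli(image_path: str) -> str:
    """OCR one cropped text region by calling the tesseract executable.

    Assumes `tesseract` is on PATH and that image_path points to an image
    containing only text (no graphics or table rulings).
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_base = str(Path(tmp) / "out")
        # "tesseract <image> <outputbase>" writes the text to <outputbase>.txt
        subprocess.run(["tesseract", image_path, out_base], check=True)
        return Path(out_base + ".txt").read_text(encoding="utf-8")


def run_pipeline(
    page: str,
    preprocess: Callable[[str], str],
    segment: Callable[[str], Iterable[str]],
    ocr: Callable[[str], str] = tesseract_cli,
    postprocess: Callable[[str], str] = str.strip,
) -> List[str]:
    """Run pre-processing, segmentation, OCR and post-processing in order.

    preprocess:  hypothetical hook, e.g. deskew/binarize; returns a new image path
    segment:     hypothetical hook; yields paths of cropped text-only regions
    ocr:         called once per region; defaults to the tesseract CLI wrapper
    postprocess: hypothetical hook, e.g. whitespace cleanup or dictionary fixes
    """
    cleaned = preprocess(page)
    return [postprocess(ocr(region)) for region in segment(cleaned)]
```

Because every stage is passed in as a function, you can test the flow with
stub functions first and plug in real image processing and tesseract later.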

[1] https://docs.google.com/file/d/0B7l10Bj_LprhbUlIUFlCdGtDYkE
[2] http://tessdata.tesseract-ocr.googlecode.com/git/eng.cube.size
[3] https://code.google.com/p/tesseract-ocr/issues/detail?id=1412

Zdenko

On Tue, May 12, 2015 at 4:22 PM, smwikipedia smwikipedia <
[email protected]> wrote:

> Regarding question 2, I just found 2 sites to explain the control
> parameters:
>
> https://code.google.com/p/tesseract-ocr/wiki/ControlParams
>
> http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version
>
>
> On Monday, May 11, 2015 at 8:49:04 PM UTC+8, smwikipedia smwikipedia wrote:
>>
>>
>>
>> 1. For tesseract 3.02, after installation I see there's a pre-trained
>> *eng.traineddata* file in the tessdata folder. How is this file
>> generated? What font does it target? Can I blindly use it for my OCR
>> application?
>>
>> 2. For tesseract 3.03, I see there's a new option "--print-parameters"
>> for the tesseract executable. There're more than 600 parameters. How am I
>> supposed to use them? If I need to tune them, how?
>>
>> 3. During my experimentation, I see tesseract works better for some font
>> types than for others. Is this true? Which font has the best precision?
>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/e201c2a8-3271-40f6-87a0-183245a19abb%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/e201c2a8-3271-40f6-87a0-183245a19abb%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

