<http://tesseract-ocr.googlecode.com/svn/trunk/doc/combine_tessdata.1.html#_components>:

lang.punc-dawg

(Optional) A dawg made from punctuation patterns found around words. The
"word" part is replaced by a single space.

lang.number-dawg

(Optional) A dawg made from tokens which originally contained digits. Each
digit is replaced by a space character.
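To make the two descriptions concrete, here is a minimal Python sketch of how a single token might be turned into each kind of pattern. It assumes that the "word" part of a token is everything between its leading and trailing non-word characters, and that only ASCII 0-9 count as digits. punc_pattern and number_pattern are hypothetical helpers written for illustration, not part of Tesseract's training tools.

```python
import re
import string
from typing import Optional

# Hypothetical helpers: a sketch of the patterns described above,
# NOT Tesseract's actual training code.

# Split a token into leading punctuation, the "word" core, and trailing
# punctuation. Assumption: "punctuation" means any non-word character.
_PUNC_SPLIT = re.compile(r"^(\W*)(.*?)(\W*)$", re.DOTALL)


def punc_pattern(token: str) -> Optional[str]:
    """Replace the word part of a token with a single space."""
    lead, core, trail = _PUNC_SPLIT.match(token).groups()
    if not core:
        return None  # token is all punctuation; no word part to replace
    return lead + " " + trail


def number_pattern(token: str) -> Optional[str]:
    """Replace every digit with a space, for tokens that contain digits."""
    # Assumption: only ASCII 0-9 count as digits.
    if not any(ch in string.digits for ch in token):
        return None  # only tokens that originally contained digits qualify
    return "".join(" " if ch in string.digits else ch for ch in token)


if __name__ == "__main__":
    for tok in ['("Hello,")', "don't", "$1,000.00"]:
        print(repr(tok), repr(punc_pattern(tok)), repr(number_pattern(tok)))
```

For example, '("Hello,")' would yield the punctuation pattern '(" ,")', and "$1,000.00" would yield the number pattern "$ ,   .  ". Compiling such pattern lists into the actual .punc-dawg and .number-dawg files (presumably with wordlist2dawg) is an assumption here, not something the man page above spells out.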

-- 
Zdenko

On Thu, Jun 7, 2012 at 12:48 PM, Nick White <[email protected]> wrote:

> Does anybody have any clue as to what number-dawg and punc-dawg are
> supposed to contain? There is no information on them in the
> TrainingTesseract3 wiki page, and I couldn't find anything anywhere
> else. I looked briefly at the dawg2wordlist for other trainings, but
> it didn't reveal anything as obvious as I had hoped.
>
> Nick
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
