Hi Zdenko,

I saw the descriptions you give below; I just wasn't very clear on
what they meant.

On Thu, Jun 07, 2012 at 02:50:57PM +0200, zdenko podobny wrote:
> lang.punc-dawg
> (Optional) A dawg made from punctuation patterns found around words. The
> "word" part is replaced by a single space.
> lang.number-dawg

So for English, patterns such as ( ) and " " spring to mind. Is this
the sort of thing that is expected?

> (Optional) A dawg made from tokens which originally contained digits. Each
> digit is replaced by a space character.

Ah, looking at one of the official trainings with dawg2wordlist, I
see entries such as '(c)    ' (without the quotes). Thanks, that makes
sense. Though I'm surprised (and impressed) that Tesseract goes down
to that level of granularity in its scanning.
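
For what it's worth, here is how I currently read the two descriptions,
as a minimal Python sketch. The function names are my own, and this is
only my interpretation of the quoted text, not Tesseract's actual code;
in particular, treating "punctuation" as Python's string.punctuation is
an assumption.

```python
import re
import string


def number_pattern(token):
    """Replace each ASCII digit in the token with a space character."""
    return re.sub(r"[0-9]", " ", token)


def punc_pattern(token):
    """Keep leading/trailing punctuation; replace the word part with one space."""
    # Assumption: "punctuation" means the characters in string.punctuation.
    lead_len = len(token) - len(token.lstrip(string.punctuation))
    trail_len = len(token) - len(token.rstrip(string.punctuation))
    if lead_len == len(token):
        # The token is nothing but punctuation; there is no word to replace.
        return token
    return token[:lead_len] + " " + token[len(token) - trail_len:]


print(repr(number_pattern("(c)2012")))  # '(c)    '
print(repr(punc_pattern("(word)")))     # '( )'
print(repr(punc_pattern('"word"')))     # '" "'
```

Under this reading, a token like (c)2012 would give the '(c)    ' entry
I saw, though I am only guessing at where that entry came from.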

Nick
