Hi Zdenko, I saw the descriptions you give below; I just wasn't very clear on what they meant.

On Thu, Jun 07, 2012 at 02:50:57PM +0200, zdenko podobny wrote:
> lang.punc-dawg
> (Optional) A dawg made from punctuation patterns found around words. The
> "word" part is replaced by a single space.
> lang.number-dawg

So for English, ( ) and " " spring to mind. Is this the sort of thing that is expected?

> (Optional) A dawg made from tokens which originally contained digits. Each
> digit is replaced by a space character.

Ah, looking at one of the official trainings with dawg2wordlist I see entries such as '(c) ' (without quotes). Thanks, that makes sense. Though I'm surprised (and impressed) that Tesseract goes down to that level of granularity in its scanning.

Nick
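
As a rough illustration of the two descriptions quoted above, here is a minimal Python sketch of how such patterns might be derived from raw tokens. It is only one reading of those descriptions, not Tesseract's actual code: the function names `punc_pattern` and `number_pattern` are hypothetical, and treating the "word" as everything from the first to the last alphanumeric character is an assumption.

```python
import re


def punc_pattern(token):
    """Replace the word part of a token with a single space.

    Assumption: the "word" spans from the first to the last
    alphanumeric character (Python's \\w, which includes "_").
    Returns None when the token carries no surrounding punctuation.
    """
    match = re.search(r"\w(?:.*\w)?", token)
    if match is None:
        return None  # nothing word-like in this token
    pattern = token[:match.start()] + " " + token[match.end():]
    return pattern if pattern != " " else None


def number_pattern(token):
    """Replace each digit with a space, as the quoted description says.

    Returns None for tokens that contain no digits at all.
    """
    if not re.search(r"\d", token):
        return None
    return re.sub(r"\d", " ", token)


if __name__ == "__main__":
    for tok in ["(hello)", '"hello"', "hello", "12:30", "3rd"]:
        print(repr(tok), repr(punc_pattern(tok)), repr(number_pattern(tok)))
```

Under these assumptions, `(hello)` and `"hello"` reduce to the `( )` and `" "` patterns guessed at above, while a bare word yields no punctuation pattern at all.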

