Hi all,

I am using Tesseract 3.02 and have created a lang.unicharambigs file for my 
data. It seems to work OK when I have more than one character on either 
side of the "equation", but it doesn't have any effect on 
single-character substitutions (t -> r, or vice versa). I've been seeing a 
lot of references in the discussions in this group to DangAmbigs, but I'm 
not sure what that is or how it's different from unicharambigs. I have seen 
some examples of the syntax, but no clear indication of whether this is a 
stand-alone file, what it should be named, and where it goes. I'm assuming 
that it's a standalone file that would be named something like 
lang.DangAmbigs and go in the same folder as my DAWG files so that it can 
be included when I run combine_tessdata to create my language. Can anyone 
confirm that?

What I'd really like to know is how these two files differ. Is 
DangAmbigs just a unicharambigs that's used during dictionary lookup 
somehow, or is it an older version of the unicharambigs from earlier 
Tesseract releases?

Maybe this will answer another question I have about unicharambigs: is 
unicharambigs consulted/processed before the dictionary lookups or after? 
I'm going to try an experiment to create a unicharambigs file for my data 
that will turn all ligatures into their modern two-character equivalents, 
since my dictionary doesn't contain any ligatures or long s characters. Maybe that 
will answer my question. But I'd appreciate any comments on this from 
anyone who knows as well. I'll post my results to this thread.
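
For reference, here is a minimal sketch of how I might generate the ligature entries for that experiment. It assumes the tab-separated layout I have seen in the training documentation (source character count, space-separated source characters, target character count, space-separated target characters, and a type flag where 1 means a mandatory replacement and 0 an optional one), plus an optional "v1" version line at the top. I have not confirmed that 3.02 expects exactly this, so treat the layout as an assumption. The LIGATURES map and the function names are made up for illustration.

```python
# Hypothetical ligature map; extend it to match the glyphs in your unicharset.
LIGATURES = {
    "\ufb00": "ff",  # ff ligature
    "\ufb01": "fi",  # fi ligature
    "\ufb02": "fl",  # fl ligature
    "\u017f": "s",   # long s
}


def ambig_line(source, target, mandatory=True):
    """Format one entry in the assumed tab-separated unicharambigs layout."""
    src = list(source)  # one element per character on the source side
    dst = list(target)  # one element per character on the target side
    fields = [
        str(len(src)),
        " ".join(src),
        str(len(dst)),
        " ".join(dst),
        "1" if mandatory else "0",  # assumed: 1 = mandatory, 0 = optional
    ]
    return "\t".join(fields)


def build_unicharambigs(mapping, header="v1"):
    """Return the file contents; pass header=None to omit the version line."""
    lines = [header] if header else []
    lines.extend(ambig_line(s, t) for s, t in sorted(mapping.items()))
    return "\n".join(lines) + "\n"
```

Writing the result of build_unicharambigs(LIGATURES) to lang.unicharambigs as UTF-8 would give one mandatory entry per ligature. The same helper can emit a single-character pair such as t -> r, which is the case that has no effect for me.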

Thanks,
Matt

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
