On Wed, Nov 27, 2013 at 09:29:43PM +0530, V S Rawat wrote:
> However, if sed or other "substitutors" are not available, or if one
> wants to avoid using them, I think it can be done with Tesseract's
> built-in post-processing method.
> 
> Use san.DangAmbigs.txt or hin.DangAmbigs.txt, depending on which
> language you are using.
> 
> Then put the substitutions in it as
> Å=Ā
> one per line.
> 
> Would it work equally well and automatically, without needing a manual step?

Yes, that should work as well. DangAmbigs was the format for
Tesseract 2; current Tesseract uses unicharambigs instead. See
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#The_last_file_(unicharambigs)

So the file would be of the form:
v1
1       Å       1       Ā       1
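
If you have many such substitutions, the file could be generated
rather than typed by hand. The following is only a rough sketch in
Python, not something from the Tesseract tools: the function name
format_unicharambigs is made up for illustration, and it assumes the
v1 layout as I understand it from the training wiki (tab-separated
fields, unichars within a field separated by spaces, and a final
flag of 1 for a mandatory replacement or 0 for an optional one).

```python
def format_unicharambigs(rules):
    """Render substitution rules as v1 unicharambigs text.

    Each rule is (source_unichars, target_unichars, mandatory), where
    the first two items are lists of unichar strings. Assumed layout:
    count, unichars, count, unichars, type flag, separated by tabs.
    """
    lines = ["v1"]  # version header, as in the example above
    for source, target, mandatory in rules:
        fields = [
            str(len(source)),           # number of source unichars
            " ".join(source),           # source unichars, space-separated
            str(len(target)),           # number of target unichars
            " ".join(target),           # target unichars, space-separated
            "1" if mandatory else "0",  # 1 = always replace (assumption)
        ]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# The single rule from this thread: always replace Å with Ā.
rules = [(["Å"], ["Ā"], True)]
text = format_unicharambigs(rules)
print(text, end="")
```

The result should be written out as UTF-8. Rules are given as lists
of unichars rather than plain strings because a unichar in scripts
like Devanagari can span several code points, so counting code
points would give the wrong field counts.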

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
