On Wed, Nov 27, 2013 at 09:29:43PM +0530, V S Rawat wrote:
> However, if sed or other "substitutors" are not there, or if one
> wants to avoid using them, I think it can be done using built in
> post-processing method of tesseract.
>
> use san.DangAmbigs.txt or hin.DangAmbigs.txt whichever language you
> are using.
>
> then put them as
> Å=Ā
> one per line.
>
> Should it work equally well and automatically, without needing manual step?
Yes, that should work as well. DangAmbigs was the format for Tesseract 2;
current Tesseract uses unicharambigs instead. See:
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#The_last_file_(unicharambigs)

So the file would be of the form:

  v1
  1 Å 1 Ā 1

Nick

--
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
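For anyone holding a list of DangAmbigs-style "wrong=right" lines like the Å=Ā example in the quoted message, here is a minimal Python sketch (not part of Tesseract) that converts them into the unicharambigs layout from the wiki page: a "v1" version line, then one ambiguity per line giving the number of source characters, the source characters, the number of target characters, the target characters and a type flag. The sketch separates fields with tabs, as the wiki describes. Type 1 is what the example file uses; the wiki describes it as a mandatory replacement, while 0 marks a non-mandatory ambiguity. The helper names (parse_mappings, ambig_line, unicharambigs_text) are hypothetical. The sketch also assumes every Unicode code point is one unichar in the language's unicharset, which may not hold for multi-code-point Devanagari unichars in san or hin.

```python
"""Convert 'wrong=right' lines into a Tesseract 3 unicharambigs file.

A minimal sketch. Assumption: each code point in a string is one unichar
in the traineddata's unicharset (multi-code-point unichars are not handled).
"""


def parse_mappings(text: str) -> list[tuple[str, str]]:
    """Parse DangAmbigs-style 'wrong=right' lines; blank lines are skipped."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        wrong, sep, right = line.partition("=")
        if not sep or not wrong or not right:
            raise ValueError(f"expected 'wrong=right', got {line!r}")
        pairs.append((wrong, right))
    return pairs


def ambig_line(source: str, target: str, mandatory: bool = True) -> str:
    """Format one ambiguity as tab-separated fields:
    <n source chars> <source chars...> <n target chars> <target chars...> <type>
    """
    fields = [str(len(source)), *source, str(len(target)), *target,
              "1" if mandatory else "0"]
    return "\t".join(fields)


def unicharambigs_text(pairs, mandatory: bool = True) -> str:
    """Build the whole file: the 'v1' version line, then one ambiguity per line."""
    lines = ["v1"] + [ambig_line(s, t, mandatory) for s, t in pairs]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The single mapping from the quoted message; redirect stdout to a file.
    print(unicharambigs_text(parse_mappings("Å=Ā")), end="")
```

Redirecting the output to a file gives the same two lines as the example above, with tabs between the fields.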

