Wow. It was news to me that we should use unicharambigs instead
of DangAmbigs.
Thanks, Nick, for sharing it.
1. Where should this file be put?
2. Is the same file to be used for all languages? The previous method was
convenient because each language had its own file name.
The file should have a recognized extension so that it opens
automatically in the relevant standard editor. It is bad practice to have a
file without an extension.
Thanks.
--
Rawat
On 11/27/2013 9:36 PM, Nick White wrote:
On Wed, Nov 27, 2013 at 09:29:43PM +0530, V S Rawat wrote:
However, if sed or other "substitutors" are not available, or if one
wants to avoid using them, I think it can be done using Tesseract's built-in
post-processing.
Use san.DangAmbigs.txt or hin.DangAmbigs.txt, whichever language you
are using,
then put the substitutions in it as
Å=Ā
one per line.
Should it work equally well and automatically, without needing a manual step?
Yes, that should work as well. DangAmbigs was the format for
Tesseract 2, current tesseract uses unicharambigs instead - see
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#The_last_file_(unicharambigs).
So the file would be of the form:
v1
1 Å 1 Ā 1
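If you already have a list of DangAmbigs-style "wrong=right" lines, a small script can turn them into unicharambigs lines. This is only a sketch under assumptions: the function name dang_to_unicharambigs is hypothetical; it assumes the fields are tab-separated (my reading of the wiki page; Nick's example above uses spaces); it treats each side of "=" as space-separated unichars (a side without spaces is one unichar); and it writes type 1 (mandatory replacement) unless told otherwise. Check the output against the wiki before relying on it.

```python
import sys


def dang_to_unicharambigs(lines, mandatory=True):
    """Convert DangAmbigs-style 'wrong=right' lines to unicharambigs v1 lines.

    Hypothetical helper. Assumes unichars on each side are separated by
    spaces; a side without spaces is treated as a single unichar.
    """
    out = ["v1"]  # version header, as in Nick's example
    flag = "1" if mandatory else "0"  # assumed: 1 = mandatory replacement
    for raw in lines:
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        wrong, right = line.split("=", 1)
        wrong_u = wrong.split()
        right_u = right.split()
        if not wrong_u or not right_u:
            continue  # one side is empty; nothing sensible to emit
        # Fields: count, wrong unichars, count, right unichars, type.
        out.append("\t".join([
            str(len(wrong_u)), " ".join(wrong_u),
            str(len(right_u)), " ".join(right_u),
            flag,
        ]))
    return out


if __name__ == "__main__":
    # Reads 'wrong=right' lines on stdin, prints unicharambigs lines.
    for entry in dang_to_unicharambigs(sys.stdin.read().splitlines()):
        print(entry)
```

For example, feeding it the single line Å=Ā produces the header v1 followed by 1, Å, 1, Ā, 1 separated by tabs. Redirect the output to whatever file name your Tesseract setup expects.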
Nick
--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en