While doing OCR with san.traineddata, I am getting many cases where [ga ग] [virāma ्] [ZWJ], i.e. the half form ग्, followed by ा is output instead of ग. The same happens for श, ण, etc.

The zero-width joiner is not a unit in the unicharset file. Also, most half letters are represented with a virāma, so I may have ग् in the unicharset, but not ग् + ZWJ.

How do I specify the substitution for this case in the unicharambigs file? Will the following work?

2 ग् ा 1 ग 1
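To make the question concrete, here is a small Python sketch that lists the code points in the sequence I am seeing and builds the candidate unicharambigs entry. It assumes the v1 unicharambigs layout as I understand it (tab-separated fields: source count, space-separated source units, target count, target units, type flag) and that ग् is a single unit in the unicharset. The helper names `describe` and `ambig_line` are my own and are not part of Tesseract.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def ambig_line(source_units, target_units, mandatory=True):
    """Format one tab-separated unicharambigs entry (assumed v1 layout)."""
    return "\t".join([
        str(len(source_units)),
        " ".join(source_units),
        str(len(target_units)),
        " ".join(target_units),
        "1" if mandatory else "0",
    ])


half_ga = "\u0917\u094d"  # ga + virama, assumed to be one unicharset unit
aa_sign = "\u093e"        # vowel sign aa
ga = "\u0917"             # full ga

# Show what the OCR output actually contains, including the invisible ZWJ.
for code_point, name in describe(half_ga + ZWJ + aa_sign):
    print(code_point, name)

# Candidate entry: two source units replaced by one target unit, type 1.
print(ambig_line([half_ga, aa_sign], [ga]))
```

This only formats the entry with tabs between fields; it does not tell me whether Tesseract will match it when the ZWJ is not a unit in the unicharset.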

