Hi!
I'm Kasun, an undergraduate currently involved in OCR-based research in
Sri Lanka. I have been trying to train Tesseract comprehensively for the
Sinhalese language [1]. I'm using the unicharambigs file to overcome
some of the problems I'm having during training. I would be grateful if
somebody could help with the problems I'm currently facing.
The first problem is that I don't really understand how the optional field
in unicharambigs works.
The second problem is that I can get *one* of the following unicharambigs
rules to work, but not both:
2 ෙ ක 1 කෙ 1 ( ෙ ක - U+0DD9 U+0D9A කෙ - U+0D9A U+0DD9 )
3 ෙ ක ා් 1 කෝ 1 (ෙ ක ා් - U+0DD9 U+0D9A U+0DCF U+0DCA කෝ - U+0D9A
U+0DDD )
I understand that if the first rule is invoked, the second rule becomes
dormant, since the code points U+0DD9 U+0D9A will have switched places.
But if I instead use
2 කෙ ා් 1 කෝ 1 ( කෙ ා් - U+0D9A U+0DD9 U+0DCF U+0DCA කෝ- U+0D9A
U+0DDD )
it won't work either.
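To make it easier to check the rules above, here is a minimal Python sketch
(not part of Tesseract) that splits a rule into its fields and prints the code
points of each unichar. The helper names parse_rule and codepoints are
hypothetical, and the field layout (source count, sources, target count,
targets, type) is only an assumption read off the rules as written above.

```python
def codepoints(text):
    """Return the code points of text in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def parse_rule(line):
    """Split a rule laid out as '<n> <n sources> <m> <m targets> <type>'.

    This layout is an assumption based on the rules quoted in this message.
    """
    tokens = line.split()
    n = int(tokens[0])
    sources = tokens[1:1 + n]
    m = int(tokens[1 + n])
    targets = tokens[2 + n:2 + n + m]
    rule_type = int(tokens[2 + n + m])
    return sources, targets, rule_type


# The three rules from this message, written with escapes so that the
# order of the code points is unambiguous.
rules = [
    "2 \u0dd9 \u0d9a 1 \u0d9a\u0dd9 1",
    "3 \u0dd9 \u0d9a \u0dcf\u0dca 1 \u0d9a\u0ddd 1",
    "2 \u0d9a\u0dd9 \u0dcf\u0dca 1 \u0d9a\u0ddd 1",
]

for rule in rules:
    sources, targets, rule_type = parse_rule(rule)
    print("sources:", [codepoints(s) for s in sources])
    print("targets:", [codepoints(t) for t in targets])
    print("type:   ", rule_type)
```

Printing the code points this way shows exactly which sequences each rule
expects as input and produces as output.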
Any help in this regard is highly appreciated! Thanks in advance.
References:
[1] https://en.wikipedia.org/wiki/Sinhalese_language