[tesseract-ocr] Using the unicharambigs in tesseract

kasun balasooriya Sun, 03 Apr 2016 22:46:02 -0700

Hi!
I'm Kasun, an undergraduate currently involved in OCR-based research in 
Sri Lanka. I have been trying to train Tesseract comprehensively for the 
Sinhalese language[1]. I'm using the unicharambigs file to overcome some 
of the problems I'm having during training. I would be grateful if 
somebody could help me sort out the problems I'm currently facing.


The first problem is that I don't really understand how the optional 
field in unicharambigs works.
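Assuming the v1 unicharambigs format (a file whose first line is `v1`), each entry lists the number of unichars to match, those unichars, the number of replacement unichars, the replacements, and finally a type field, where `1` marks a mandatory ambiguity and `0` an optional one. The Python sketch below only splits a line into those fields. `parse_v1_line` and `AmbigRule` are hypothetical names for illustration, not part of Tesseract, and the sketch does not model what Tesseract does with the flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AmbigRule:
    source: List[str]   # unichars to match, in order
    target: List[str]   # unichars to substitute
    mandatory: bool     # last field: 1 -> mandatory, 0 -> optional


def parse_v1_line(line: str) -> AmbigRule:
    """Split one v1 unicharambigs entry into its fields (assumed layout:
    <n> <src_1> ... <src_n> <m> <tgt_1> ... <tgt_m> <type>)."""
    fields = line.split()
    n = int(fields[0])
    source = fields[1:1 + n]
    m = int(fields[1 + n])
    target = fields[2 + n:2 + n + m]
    flag = int(fields[2 + n + m])
    if flag not in (0, 1):
        raise ValueError(f"unexpected type flag: {flag}")
    return AmbigRule(source, target, mandatory=(flag == 1))


# The first rule from this message: 2 ෙ ක 1 කෙ 1
rule = parse_v1_line("2 \u0DD9 \u0D9A 1 \u0D9A\u0DD9 1")
print(len(rule.source), len(rule.target), rule.mandatory)  # 2 1 True
```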

The second problem is that I'm able to get *one* of the following 
unicharambigs rules to work, but not both.

2 ෙ ක 1 කෙ 1    (ෙ ක - U+0DD9 U+0D9A;  කෙ - U+0D9A U+0DD9)

3 ෙ ක ා් 1 කෝ 1    (ෙ ක ා් - U+0DD9 U+0D9A U+0DCF U+0DCA;  කෝ - U+0D9A U+0DDD)
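The parenthetical annotations give the code points of each unichar. A quick way to double-check them, and to see that the target කෝ (U+0D9A U+0DDD) is canonically equivalent to ක followed by U+0DD9 U+0DCF U+0DCA, is Python's standard `unicodedata` module. This is a sketch that only inspects the strings themselves; it does not show which form the Sinhala unicharset or training text actually uses. `codepoints` and the variable names are illustrative.

```python
import unicodedata


def codepoints(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


visual_order = "\u0DD9\u0D9A\u0DCF\u0DCA"        # ෙ ක ා් as typed in the second rule
logical_decomposed = "\u0D9A\u0DD9\u0DCF\u0DCA"  # ක + ෙ + ා + ් in logical order
precomposed = "\u0D9A\u0DDD"                     # කෝ as written in the rule's target

print(codepoints(visual_order))  # U+0DD9 U+0D9A U+0DCF U+0DCA
print(codepoints(precomposed))   # U+0D9A U+0DDD
# NFC composes U+0DD9 U+0DCF U+0DCA into U+0DDD, so these are canonically equivalent.
print(unicodedata.normalize("NFC", logical_decomposed) == precomposed)  # True
print(codepoints(unicodedata.normalize("NFD", precomposed)))  # U+0D9A U+0DD9 U+0DCF U+0DCA
```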

I understand that if the first rule is applied, the second rule can no 
longer match, since U+0DD9 and U+0D9A will have switched places. But if 
I use

2 කෙ ා් 1 කෝ 1    (කෙ ා් - U+0D9A U+0DD9 U+0DCF U+0DCA;  කෝ - U+0D9A U+0DDD)

it won't work either. 
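The ordering argument above can be checked with a toy model: treat a recognized word as a list of unichars and apply each rule in turn as a plain left-to-right search and replace. This Python sketch assumes the classifier outputs ෙ, ක and ා් as three separate unichars and that rules are applied one after another; `apply_rule` and `Rule` are hypothetical names, and the model does not reproduce Tesseract's actual ambiguity matching.

```python
from typing import List, Tuple

# Hypothetical representation: (source unichars, target unichars).
Rule = Tuple[List[str], List[str]]


def apply_rule(unichars: List[str], rule: Rule) -> List[str]:
    """Replace each non-overlapping match of the rule's source, scanning left to right."""
    source, target = rule
    if not source:
        raise ValueError("rule source must not be empty")
    out: List[str] = []
    i = 0
    while i < len(unichars):
        if unichars[i:i + len(source)] == source:
            out.extend(target)
            i += len(source)
        else:
            out.append(unichars[i])
            i += 1
    return out


KOMBUVA, KA, AELA_AL = "\u0DD9", "\u0D9A", "\u0DCF\u0DCA"  # ෙ, ක, ා්
KO = "\u0D9A\u0DDD"                                         # කෝ

rule1: Rule = ([KOMBUVA, KA], [KA + KOMBUVA])   # 2 ෙ ක 1 කෙ 1
rule2: Rule = ([KOMBUVA, KA, AELA_AL], [KO])    # 3 ෙ ක ා් 1 කෝ 1
rule3: Rule = ([KA + KOMBUVA, AELA_AL], [KO])   # 2 කෙ ා් 1 කෝ 1

# Assumed classifier output: three separate unichars in visual order.
recognized = [KOMBUVA, KA, AELA_AL]

after1 = apply_rule(recognized, rule1)
print(after1 == [KA + KOMBUVA, AELA_AL])      # True: rule 1 reordered the pair
print(apply_rule(after1, rule2) == after1)    # True: rule 2 can no longer match
print(apply_rule(after1, rule3) == [KO])      # True: rule 3 matches rule 1's output
print(apply_rule(recognized, rule2) == [KO])  # True: rule 2 alone would have matched
```

In this toy model the third rule does fire on the first rule's output, so the failure described above may come from details of Tesseract's real matcher, or from how the unichars are segmented, that the sketch does not capture.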

Any help in this regard is highly appreciated! Thanks in advance. 

References:
[1] https://en.wikipedia.org/wiki/Sinhalese_language

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/20f33f67-8481-4595-986d-be90814afbd3%40googlegroups.com.
