have problem with unicharambigs for connected characters

Reza M Sat, 03 Nov 2012 21:37:23 -0700

Hi,
First, my apologies: my text editor has problems with mixed RTL and LTR text, so I 
had to use a screenshot.
I want to create a per.unicharambigs file, but I am not sure which approach is 
correct.


In the attached image:

*1-case1:* What should I do here? Should I define connected characters as 1 unit, 
or count the number of characters? (In the box file I defined connected 
characters as 1 unit.)
For example, for the word *رضا* I made this box file and it works fine, but I am 
confused about how to write the matching unicharambigs entries:

ر 10 298 22 352 0
ضا 1314 248 1323 302 0
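To make the "1 unit or several characters" question concrete, here is a minimal Python sketch that reads the two box lines above and reports how many Unicode code points each box symbol contains. It assumes the usual box-file layout of `symbol left bottom right top page`; `Box` and `parse_box_line` are hypothetical helper names for illustration, not part of Tesseract. The letters are written as escapes to avoid the RTL/LTR editor problems mentioned earlier.

```python
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse one box-file line (assumed layout: symbol + five integers)."""
    # Split from the right so the symbol is whatever precedes the five numbers.
    parts = line.strip().rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected box line: {line!r}")
    symbol, *numbers = parts
    left, bottom, right, top, page = (int(n) for n in numbers)
    return Box(symbol, left, bottom, right, top, page)


# The two lines from the post: REH alone, then DAD + ALEF boxed together.
lines = [
    "\u0631 10 298 22 352 0",
    "\u0636\u0627 1314 248 1323 302 0",
]
boxes = [parse_box_line(line) for line in lines]

for box in boxes:
    code_points = [f"U+{ord(ch):04X}" for ch in box.symbol]
    print(len(box.symbol), code_points)
```

The second box holds a single symbol made of two code points, which is exactly the situation the question is about: whether such a symbol counts as one entry or two when writing unicharambigs lines.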

*2-case2:* The OCR has problems with some medial (middle-of-word) characters. What 
should I do: add the medial form of the character, or its general form?

For example, *م* has four forms:
1- *ﻡ* (U+FEE1), the isolated form, used with no connection; it is the general 
shape of this character, as in: آرام
2- *ﻢ* (U+FEE2), the final form, used at the end of a word (connected to the 
previous character), as in: رفتم
3- *ﻣ* (U+FEE3), the initial form, used at the start of a word, as in: مرتضی
4- *ﻤ* (U+FEE4), the medial form, used in the middle of a word, as in: عمل
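As a small illustration of how the four forms listed above relate to the general letter, the following Python sketch uses only the standard `unicodedata` module. It prints the Unicode name of each presentation form and its NFKC normalization, which folds all four back to the base letter U+0645. `MEEM_FORMS` and `describe` are names made up for this example; whether Tesseract training data should use the presentation forms or the base letter is the open question here, and this sketch does not answer it.

```python
import unicodedata

# The four presentation forms of MEEM listed in the post.
MEEM_FORMS = ["\uFEE1", "\uFEE2", "\uFEE3", "\uFEE4"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, NFKC-normalized text) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.normalize("NFKC", ch),
    )


rows = [describe(ch) for ch in MEEM_FORMS]
for code_point, name, base in rows:
    print(code_point, name, "->", f"U+{ord(base):04X}")

# Every presentation form normalizes to the same general letter.
base_letters = {base for _, _, base in rows}
```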









<<attachment: unicharambigs.png>>
