Hi, First, my apologies: because of my text editor's problems with mixed RTL and LTR text, I had to use a screenshot. I want to create a per.unicharambigs file, but I am not sure which approach is correct.
In the attached image:

*1-case1:* What should I do? Should I define connected characters as one unit, or count the number of individual characters? (In the box file I defined connected characters as one unit.) For example, for the word *رضا* I made these boxes, and they work fine, but I am confused about how to write the unicharambigs entry:

ر 10 298 22 352 0
ضا 1314 248 1323 302 0

*2-case2:* The OCR has problems with some of the medial characters. What should I do? Should I add the medial form of the character, or its general form? For example, *م* has four forms:

1- *ﻡ* (U+FEE1), which is used with no connection and is the general shape of this character, as in: آرام
2- *ﻢ* (U+FEE2), which is used at the end of a word (connected to the preceding character), as in: رفتم
3- *ﻣ* (U+FEE3), which is used at the beginning of a word, as in: مرتضی
4- *ﻤ* (U+FEE4), which is used in the middle of a word, as in: عمل
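To make the second case concrete, here is a small Python sketch (not part of Tesseract, and using only the standard `unicodedata` module) that prints the four presentation forms of meem listed above, along with the base letter each one maps to under NFKC normalization. The helper name `base_letter` is just an illustration. This does not settle which form the unicharset or unicharambigs file should contain; it only shows how the code points relate, and that a visually joined pair such as the second box above is still more than one code point.

```python
import unicodedata

# The four presentation forms of meem quoted in the post.
MEEM_FORMS = ["\uFEE1", "\uFEE2", "\uFEE3", "\uFEE4"]


def base_letter(form: str) -> str:
    """Map a presentation-form character to its base letter via NFKC."""
    return unicodedata.normalize("NFKC", form)


for form in MEEM_FORMS:
    base = base_letter(form)
    print(
        f"U+{ord(form):04X} {unicodedata.name(form)}"
        f" -> U+{ord(base):04X} {unicodedata.name(base)}"
    )

# The second box in case 1 holds dad + alef: one joined shape on the page,
# but len() counts code points, not joined shapes.
joined = "\u0636\u0627"
print(len(joined))
```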
<<attachment: unicharambigs.png>>

