>
>> Training Tesseract for Kannada is proving difficult because the script is
>> very complex. I have tried making a box file for a normal image of each
>> character and labelling it with a string in the standard way, but the
>> recognition rate is not improving. The main problem is that the training
>> data file does not contain all the character combinations, nor enough
>> repetitions of each. One transliteration scheme (typing Kannada on an
>> English keyboard, the same way you would write your name in English) is
>> purely phonetic. By placing a ZWJ between the dead consonant (consonant +
>> virama) and the vowel, a live consonant can be broken into its parts, and
>> the ZWJ can be removed again during post-processing to get the output in
>> the normal form. I was wondering whether using ZWJ during training would
>> help meet the requirement for combinations and repetitions.
>> [image]
>>
>
>> i.e. ಕ್‍ಅ = ಕ | ಕ್‍ಆ = ಕಾ | ಕ್‍ಇ = ಕಿ
>
>
>> i.e. ನಾ = ನ್‍ಆ | ನು = ನ್‍ಉ | ನ = ನ್‍ಅ
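A minimal Python sketch of the split and join shown in the examples above, assuming Unicode Kannada text in NFC form. `split_syllables` and `join_syllables` are hypothetical helper names, not part of Tesseract or Baraha. Note that deleting the ZWJ alone does not restore the normal spelling: removing it from ಕ + ್ + ZWJ + ಆ leaves a dead ಕ followed by a separate ಆ, not ಕಾ. The join step therefore also maps each independent vowel back to its dependent vowel sign (ಅ maps to no sign). Nukta and other marks are not handled.

```python
import re
import unicodedata

VIRAMA = "\u0CCD"  # Kannada sign virama (್)
ZWJ = "\u200D"     # zero width joiner (written as ^ in Baraha)

# Independent vowel -> dependent vowel sign ("" for the inherent vowel ಅ).
VOWEL_TO_SIGN = {
    "\u0C85": "",        # ಅ
    "\u0C86": "\u0CBE",  # ಆ -> ಾ
    "\u0C87": "\u0CBF",  # ಇ -> ಿ
    "\u0C88": "\u0CC0",  # ಈ -> ೀ
    "\u0C89": "\u0CC1",  # ಉ -> ು
    "\u0C8A": "\u0CC2",  # ಊ -> ೂ
    "\u0C8B": "\u0CC3",  # ಋ -> ೃ
    "\u0C8E": "\u0CC6",  # ಎ -> ೆ
    "\u0C8F": "\u0CC7",  # ಏ -> ೇ
    "\u0C90": "\u0CC8",  # ಐ -> ೈ
    "\u0C92": "\u0CCA",  # ಒ -> ೊ
    "\u0C93": "\u0CCB",  # ಓ -> ೋ
    "\u0C94": "\u0CCC",  # ಔ -> ೌ
}
SIGN_TO_VOWEL = {sign: vowel for vowel, sign in VOWEL_TO_SIGN.items() if sign}

CONSONANT = "[\u0C95-\u0CB9]"  # ಕ .. ಹ

# A consonant with an optional vowel sign that is NOT followed by a virama
# (a consonant followed by a virama is a dead consonant inside a conjunct).
SPLIT_RE = re.compile(
    "(" + CONSONANT + ")([" + "".join(SIGN_TO_VOWEL) + "]?)(?!" + VIRAMA + ")"
)
# consonant + virama + ZWJ + independent vowel
JOIN_RE = re.compile(
    "(" + CONSONANT + ")" + VIRAMA + ZWJ + "([" + "".join(VOWEL_TO_SIGN) + "])"
)


def split_syllables(text: str) -> str:
    """ಕಾ -> ಕ + ್ + ZWJ + ಆ ; bare ಕ -> ಕ + ್ + ZWJ + ಅ."""
    def repl(m: re.Match) -> str:
        consonant, sign = m.group(1), m.group(2)
        vowel = SIGN_TO_VOWEL[sign] if sign else "\u0C85"
        return consonant + VIRAMA + ZWJ + vowel
    return SPLIT_RE.sub(repl, unicodedata.normalize("NFC", text))


def join_syllables(text: str) -> str:
    """Inverse of split_syllables: consonant + ್ + ZWJ + vowel -> consonant + sign."""
    return JOIN_RE.sub(lambda m: m.group(1) + VOWEL_TO_SIGN[m.group(2)], text)


if __name__ == "__main__":
    word = "\u0CA8\u0CBE\u0CA8\u0CC1"  # ನಾನು
    split = split_syllables(word)
    print([hex(ord(c)) for c in split])
    print(join_syllables(split) == word)
```

Conjuncts are left intact apart from their final consonant: ಕ್ಷ becomes ಕ್ಷ + ್ + ZWJ + ಅ, and joining it gives back ಕ್ಷ.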
>>
>>
>>
>> In the first case, ‘ka’ is split into ‘k’ and ‘a’, separated by a ZWJ
>> (‘^’), while preparing both the image and the box file.
>>
>> In the second case, the image with normal rendering is used for boxing,
>> and the string within each box is split with "^", as shown at the top of
>> the box.
>>
>> Again, as my knowledge of the Tesseract engine is poor, I cannot decide
>> whether the ZWJ should be used when creating the training image or only in
>> the strings in the box file.
>>
>> Another question is whether this is really a solution at all.
>>
>> I would welcome input from anyone in the group who has good insight into
>> the workings of the OCR engine, and also into fonts and transliteration
>> schemes that use a normal keyboard and the phonetic method.
>>
>> N.B. ^  = ZWJ (zero width joiner)
>>      ^^ = ZWNJ (zero width non-joiner)
>>      (notation used in the Baraha software)
>> MNS Rao
>>
>
>
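The N.B. in the message gives the Baraha markers ^ for ZWJ and ^^ for ZWNJ. If box-file strings are first typed in that notation, they would need converting to the real Unicode characters before training. A minimal sketch, assuming only these two markers matter (the rest of the Baraha transliteration scheme is not handled); the function names and the ASCII sample "k^a" are hypothetical illustrations.

```python
ZWJ = "\u200D"   # zero width joiner
ZWNJ = "\u200C"  # zero width non-joiner


def baraha_markers_to_unicode(text: str) -> str:
    """Replace Baraha-style ^^ and ^ markers with ZWNJ and ZWJ."""
    # "^^" must be replaced first; otherwise it would become two ZWJs.
    return text.replace("^^", ZWNJ).replace("^", ZWJ)


def unicode_to_baraha_markers(text: str) -> str:
    """Inverse mapping, e.g. for printing OCR output in Baraha notation."""
    return text.replace(ZWNJ, "^^").replace(ZWJ, "^")


if __name__ == "__main__":
    for sample in ("k^a", "k^^a"):
        converted = baraha_markers_to_unicode(sample)
        print(sample, [hex(ord(c)) for c in converted])
```

A run of three carets is ambiguous in this notation; the sketch reads it as ZWNJ followed by ZWJ.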

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
