Hi, the version of the ambigs file for tesseract 3.00 is 'v1' (which means a later release may bring another version of the ambigs file with a new format or features). In ccutil/ambigs.cpp I found only one check on the version (>0): it concerns the meaning of the last column...
If you want to experiment with or create your own lang.unicharambigs, it
may help to set these variables:
global_ambigs_debug_level 1
global_tessedit_ambigs_training true
They will produce some additional information.
Concerning UTF-8, I tried a quick test, but I was not able to create an
image that produced a mistake on a UTF-8 letter (tesseract only made
mistakes on ASCII letters ;-) ). But it did make one mistake:
„
interpreted as:
,,
When I created a unicharambigs file like this:
v1
2 , , 1 „ 1
Then tesseract recognized „ correctly in the output („ is UTF-8).
So maybe the UTF-8 issues are solved in lang.unicharambigs.
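
If you want to generate such a file rather than type it by hand, here is a
minimal sketch in Python. It only assumes the line layout visible in my
example above (count of source characters, the source characters, count of
target characters, the target characters, then the last column), separated
by single spaces. The function names are my own invention, and the trailing
1 is simply copied from the example, so check ambigs.cpp before relying on
any other value there.

```python
from pathlib import Path


def format_ambig_line(source, target, last_column):
    """Build one line in the layout of the example: '2 , , 1 „ 1'.

    source and target are lists of single characters (or unichars);
    last_column is copied through unchanged (assumed to be a small integer).
    """
    fields = [str(len(source)), *source, str(len(target)), *target, str(last_column)]
    return " ".join(fields)


def render_unicharambigs(rules):
    """Return the whole file content: a 'v1' header followed by one line per rule."""
    lines = ["v1"] + [format_ambig_line(s, t, c) for s, t, c in rules]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The single rule from the test above: two commas should become one „.
    rules = [([",", ","], ["„"], 1)]
    # Write explicitly as UTF-8 so non-ASCII characters such as „ survive.
    Path("lang.unicharambigs").write_text(render_unicharambigs(rules), encoding="utf-8")
```

Writing the file with an explicit UTF-8 encoding is the important part here;
the rest is just string joining.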
However, you should run your own (more extensive) tests for Kannada or
other Indic languages.
Zdenko.
On 01.05.2010 03:55, Sriranga(77yrsold) wrote:
> Hi,
> In your additional comments, you state that the "first line determine the
> version of the ambigs file." How does one
> determine the version of the ambigs file? Does the ambigs file of
> Tesseract 3.0 support UTF-8, say for Kannada or any other Indic language? The
> previous 2.xx versions of tesseract did not support UTF-8.
> With regards,
> -sriranga(77yrsold)
>
> 2010/5/1 Zdenko Podobný <[email protected]>
>
>
>> Hi,
>>
>> I made a test with tesseract 3.00: I created English traineddata without
>> dawg dictionaries (eng_nodict.traineddata) and then ran tesseract to see
>> the difference (on the file phototest.tif).
>> As you can see, the dictionary improved the result, especially in the
>> case of "l" vs. "1".
>>
>> I put some additional comments here:
>> http://www.sk-spell.sk.cx/tesseract-ocr-en-dictionary-creating
>>
>> So dictionaries help to improve results...
>>
>> Zdenko
>>
>> On 30.04.2010 19:47, M. Bashir Al-Noimi wrote:
>>
>> Hi folks,
>>
>> Could you tell me what the benefit of the dictionary in Tesseract is? Does
>> it affect the recognition decision (the result)?
>>
>> I ask because I'm planning to use Tesseract to recognize
>> single characters, not complete words.
>>
>>
>>
>

