Re: Benefit of the dictionary

Zdenko Podobný Sat, 01 May 2010 11:38:11 -0700

Hi,

The version of the ambigs file for Tesseract 3.00 is 'v1' (which means that
a future release could bring another version of the ambigs file with a new
format or features). In ccutil/ambigs.cpp I found only one test of the
version (> 0): it is tied to the meaning of the last column...
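A minimal sketch of how a reader of the file might interpret that first line. It assumes, as the thread describes, that a header such as `v1` declares the version, and treats a file without such a header as version 0; both function names are hypothetical, not Tesseract's own API.

```python
def ambigs_version(first_line: str) -> int:
    """Hypothetical helper: the version declared by an ambigs file's first line.

    Assumes a header of the form 'v<N>' (e.g. 'v1' for Tesseract 3.00);
    any other first line is treated as version 0, i.e. no header at all.
    """
    line = first_line.strip()
    if line.startswith("v") and line[1:].isdecimal():
        return int(line[1:])
    return 0


def last_column_is_v1_flag(version: int) -> bool:
    # Mirrors the single "version > 0" test reported in ccutil/ambigs.cpp,
    # which the message links to the meaning of the last column.
    return version > 0
```

For example, `ambigs_version("v1\n")` gives 1, while a file whose first line is already an entry such as `2\t, ,\t1\t„\t1` is reported as version 0.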


If you want to experiment with or create your own lang.unicharambigs, it
may be useful to set these variables:

    global_ambigs_debug_level 1
    global_tessedit_ambigs_training true

This will produce some additional debugging information.
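One way to apply those two variables is a small config file passed on the command line. The sketch below assumes the usual config-file layout of one `name value` pair per line and the Tesseract 3.00 usage form `tesseract imagename outputbase [-l lang] [configfile ...]`; the config name `ambigs_debug` and the helper functions are hypothetical.

```python
from pathlib import Path

# Variable names and values copied from the message above.
AMBIGS_DEBUG_VARS = {
    "global_ambigs_debug_level": "1",
    "global_tessedit_ambigs_training": "true",
}


def config_text(variables: dict[str, str]) -> str:
    """Render a config file body: one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in variables.items())


def write_config(path: Path, variables: dict[str, str]) -> None:
    path.write_text(config_text(variables), encoding="utf-8")


def tesseract_argv(image: str, outbase: str, lang: str, config: str) -> list[str]:
    """Build (without running) a command line of the form
    'tesseract imagename outputbase -l lang configfile'."""
    return ["tesseract", image, outbase, "-l", lang, config]
```

Usage might look like `write_config(Path("ambigs_debug"), AMBIGS_DEBUG_VARS)` followed by `subprocess.run(tesseract_argv("phototest.tif", "out", "eng", "ambigs_debug"))`. Where Tesseract searches for config files can differ between versions and installations, so the file may need to be moved to the location your build expects.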

Concerning UTF-8, I tried a quick test, but I was not able to create an
image where Tesseract misrecognized a UTF-8 letter (it made mistakes only on
ASCII letters ;-) ). But it did make one mistake:
„
interpreted as:
,,

When I created a unicharambigs file like this:
v1
2    , ,    1    „    1

Then Tesseract recognized „ correctly in its output („ is a non-ASCII
UTF-8 character). So maybe the issues regarding UTF-8 are solved in
lang.unicharambigs. However, you should run your own (more extensive) tests
for Kannada or other Indic languages.
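The entry above can be read as five fields: the number of unichars Tesseract produced, those unichars separated by spaces, the number of replacement unichars, the replacement unichars, and the last column. A minimal sketch for writing and reading such entries follows; it assumes tabs between fields, as the example appears to use, and the names `AmbigEntry`, `format_entry`, `parse_entry` and `render_ambigs_file` are hypothetical. The meaning of the last column is not settled in this thread, so it is simply kept as an integer.

```python
from dataclasses import dataclass


@dataclass
class AmbigEntry:
    source: list[str]  # unichars Tesseract produced, e.g. [",", ","]
    target: list[str]  # unichars to use instead, e.g. ["„"]
    last_column: int   # trailing v1 field; its exact meaning is not covered here


def format_entry(entry: AmbigEntry) -> str:
    # Assumed layout, copied from the example: count, source unichars
    # (space-separated), count, target unichars, last column; tabs between fields.
    return "\t".join([
        str(len(entry.source)), " ".join(entry.source),
        str(len(entry.target)), " ".join(entry.target),
        str(entry.last_column),
    ])


def parse_entry(line: str) -> AmbigEntry:
    # Split on any whitespace and use the two counts to locate each group,
    # so both tab- and space-separated lines are accepted.
    tokens = line.split()
    try:
        src_n = int(tokens[0])
        source = tokens[1:1 + src_n]
        dst_n = int(tokens[1 + src_n])
        target = tokens[2 + src_n:2 + src_n + dst_n]
        rest = tokens[2 + src_n + dst_n:]
        flag = int(rest[0])
    except (IndexError, ValueError) as exc:
        raise ValueError(f"malformed ambigs line: {line!r}") from exc
    if src_n < 1 or dst_n < 1 or len(source) != src_n or len(target) != dst_n or len(rest) != 1:
        raise ValueError(f"malformed ambigs line: {line!r}")
    return AmbigEntry(source, target, flag)


def render_ambigs_file(entries: list[AmbigEntry]) -> str:
    # A 'v1' header line followed by one entry per line.
    return "v1\n" + "".join(format_entry(e) + "\n" for e in entries)
```

For instance, `render_ambigs_file([AmbigEntry([",", ","], ["„"], 1)])` produces the two-line file shown above. Checking the counts against the listed unichars catches a common slip, such as writing `2` but listing only one comma.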


Zdenko.

On 01.05.2010 03:55, Sriranga(77yrsold) wrote:
> Hi,
> In your additional comments, it is stated that "first line determine the
> version of the ambigs file." How can I determine the version of the ambigs
> file? Is the ambigs file of Tesseract 3.0 supported for UTF-8, say Kannada
> or any other Indic language? The previous version of Tesseract, 2.xx, did
> not support UTF-8.
> With regards,
> -sriranga(77yrsold)
>
> 2010/5/1 Zdenko Podobný <[email protected]>
>
>   
>>  Hi,
>>
>> I made a test with Tesseract 3.00: I created English traineddata without
>> dawg dictionaries (eng_nodict.traineddata) and then ran Tesseract to see
>> the difference (on the file phototest.tif).
>> As you can see, the dictionary improved the result, especially in the case
>> of "l" vs. "1".
>>
>> I put some additional comments here:
>> http://www.sk-spell.sk.cx/tesseract-ocr-en-dictionary-creating
>>
>> So dictionaries help to improve results...
>>
>>  Zdenko
>>
>> On 30.04.2010 19:47, M. Bashir Al-Noimi wrote:
>>
>> Hi folks,
>>
>> Could you tell me what the benefit of the dictionary in Tesseract is? Does
>> it affect the recognition decision (the result)?
>>
>> I ask this question because I'm planning to use Tesseract for recognizing
>> single characters, not complete words.
>>
>>
>>     
>
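To put a number on a with/without-dictionary comparison like the eng vs. eng_nodict run on phototest.tif in the quoted message, one common option (not something the thread itself uses) is the character error rate: the Levenshtein edit distance between the OCR output and a ground-truth transcription, divided by the length of that transcription. A minimal sketch, with hypothetical function names:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def char_error_rate(ocr: str, truth: str) -> float:
    """Edit distance normalized by the length of the ground truth."""
    if not truth:
        raise ValueError("ground truth must be non-empty")
    return edit_distance(ocr, truth) / len(truth)
```

Scoring the eng output and the eng_nodict output against the same hand-checked transcription turns confusions such as "l" vs. "1" into a directly comparable rate for each run.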
