That error message means you did not follow the Tesseract training wiki (or
you ignored error messages).

Zdenko


On Sun, Sep 22, 2013 at 9:05 PM, clyde <[email protected]> wrote:

> I had the same error:
> tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in
> file ..\classify\adaptmatch.cpp, line 555
>
> How did you solve it? Please help me.
>
>
> On Thursday, March 29, 2012 at 03:31:53 UTC+8, nkantan r wrote:
>
>> Hi,
>> I know there are two Tamil trained data files, corresponding to the
>> Latha and Lohit fonts. Going through the box and tif files, I understand
>> that the boxes for combined consonants (உயிர்மெய்) are selected
>> individually (e.g. கே is boxed as separate ே and க instead of a merged
>> கே). Since the vowel sign ே comes before the base consonant க,
>> elaborate post-processing is required. Such post-processing can be
>> written by someone who knows Tamil as well as C++, and at present it is
>> missing altogether.
>>
>> To elaborate further: குகூகெகே is read correctly but output as
>> குகூெகேக. This is because the sequence is read as கு கூ ெ, க ே க; at
>> the unichar level, க followed by ே is read as the single unichar கே, so
>> the net result is குகூெகேக.
>> This becomes worse when the single characters "கொ", "கோ" and "கௌ" are
>> each read as three characters in three boxes!
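
The reordering described above could be sketched roughly as follows. This is only an illustration in Python, not anything that exists in Tesseract: the function name `visual_to_logical` and the helper are hypothetical, and it assumes the whole input is in visual (left-to-right box) order. Text that is already in logical order would be corrupted by it, and consonant clusters are not handled. The two-part vowels (கொ, கோ, கௌ) are recombined by Unicode NFC normalization once the left part has been moved behind the consonant.

```python
import unicodedata

# Vowel signs drawn to the LEFT of the consonant: ெ, ே, ை
PREFIX_VOWEL_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}


def is_tamil_consonant(ch):
    """True if ch lies in the Tamil consonant range U+0B95..U+0BB9."""
    return "\u0b95" <= ch <= "\u0bb9"


def visual_to_logical(text):
    """Move a left-side vowel sign behind the consonant that follows it.

    Hypothetical helper: assumes `text` is entirely in visual order.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if (ch in PREFIX_VOWEL_SIGNS and i + 1 < len(text)
                and is_tamil_consonant(text[i + 1])):
            out.append(text[i + 1])  # consonant first (logical order)
            out.append(ch)           # then the vowel sign
            i += 2
        else:
            out.append(ch)
            i += 1
    # NFC merges e.g. ெ + ா into the single code point ொ
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(visual_to_logical("குகூெகேக"))
```

A real fix would more likely belong in the training data (boxing கே as one unit), which is what the retraining described below attempts.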
>>
>> another major issue is the missing vowel ஔ which is read as  while
>> reading ஒ and ள;
>>
>> To avoid these issues, I am retraining the Tamil alphabet in its
>> proper form. I have succeeded in doing so for one font (Latha,
>> size 12), but while combining the language files I get:
>>
>> Combining tessdata files
>> TessdataManager combined tess
>> Offset for type 0 is -1
>> Offset for type 1 is 108
>> Offset for type 2 is -1
>> Offset for type 3 is -1
>> Offset for type 4 is 17420
>> Offset for type 5 is -1
>> Offset for type 6 is -1
>> Offset for type 7 is 21008
>> Offset for type 8 is -1
>> Offset for type 9 is 31506
>> Offset for type 10 is -1
>> Offset for type 11 is -1
>> Offset for type 12 is -1
>>
>> C:\indicocr\tesseract301>
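
One way to read such a log is sketched below in Python. It rests on an assumption that should be checked against the Tesseract source for the version in use: that the type numbers 0 to 9 follow the same order as the component files listed in this message (config, unicharset, unicharambigs, inttemp, ...). Types 10 to 12 are left unnamed. The function name `missing_required` is made up for this sketch, and the set of required components is taken from the message itself.

```python
import re

# ASSUMPTION: type number -> component suffix, in the order the component
# files are listed in this thread. Types 10-12 are not named here.
COMPONENTS = [
    "config", "unicharset", "unicharambigs", "inttemp", "pffmtable",
    "normproto", "punc-dawg", "word-dawg", "number-dawg", "freq-dawg",
]

# The four components the message describes as required.
REQUIRED = {"unicharset", "inttemp", "pffmtable", "normproto"}


def missing_required(log_text):
    """Return required components whose reported offset is -1 (absent)."""
    missing = []
    pattern = r"Offset for type (\d+) is (-?\d+)"
    for type_no, offset in re.findall(pattern, log_text):
        idx = int(type_no)
        if idx >= len(COMPONENTS):
            continue  # unnamed type, ignored by this sketch
        if int(offset) == -1 and COMPONENTS[idx] in REQUIRED:
            missing.append(COMPONENTS[idx])
    return missing


LOG = """
Offset for type 0 is -1
Offset for type 1 is 108
Offset for type 2 is -1
Offset for type 3 is -1
Offset for type 4 is 17420
Offset for type 5 is -1
Offset for type 6 is -1
Offset for type 7 is 21008
Offset for type 8 is -1
Offset for type 9 is 31506
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
"""

if __name__ == "__main__":
    print(missing_required(LOG))
```

If the assumed mapping holds, the log above would mean that inttemp and normproto were not picked up when combining, which would be consistent with the SeekToStart(TESSDATA_INTTEMP) assert reported further down.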
>>
>> Obviously the -1 values above indicate that something is wrong. Nowhere
>> on the tesseract-ocr project page is it possible to get samples of:
>>
>> •tessdata/eng.config
>> •tessdata/eng.unicharset
>> •tessdata/eng.unicharambigs
>> •tessdata/eng.inttemp
>> •tessdata/eng.pffmtable
>> •tessdata/eng.normproto
>> •tessdata/eng.punc-dawg
>> •tessdata/eng.word-dawg
>> •tessdata/eng.number-dawg
>> •tessdata/eng.freq-dawg
>>
>> There are 13 items listed in the combining output, while only 10 files
>> are listed above.
>>
>> Though it is mentioned that unicharset, inttemp, pffmtable and normproto
>> are the four files required apart from word-dawg and freq-dawg, there
>> is no mention of whether the other files, such as tam.config and
>> tam.unicharambigs, can be left absent or whether empty files are required.
>>
>> Now, when I run Tesseract with the tam.traineddata made above,
>> I get the error below:
>> ===================================
>> C:\indicocr\tesseract301>tesseract image.tif testtxt -l tam
>> tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in
>> file ..\classify\adaptmatch.cpp, line 512
>>
>> C:\indicocr\tesseract301>
>> =======================================
>>
>> Kindly advise what went wrong and what needs to be done to get a proper
>> traineddata file. I am also really hopeful that the files used before
>> combining can be made available, so that one can see the samples.
>>
>> regards
>> rnkantan
>>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>

