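That error message means you did not follow the Tesseract training wiki, or that you ignored error messages during training. The assert fails when the combined traineddata file has no inttemp component, which is what an offset of -1 in the combine_tessdata output indicates.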
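To see which components never made it into the traineddata file, you can scan the combine_tessdata output quoted below for offsets of -1. The Python sketch below does that. The mapping from type numbers to file names is an assumption based on the order of the TessdataType enum in Tesseract 3.01, so check it against your own source tree; the function names are made up for illustration.

```python
import re

# ASSUMPTION: type indices follow the TessdataType enum order of
# Tesseract 3.01 (ccutil/tessdatamanager.h). Verify before relying on it.
ASSUMED_TYPE_NAMES = {
    0: "config", 1: "unicharset", 2: "unicharambigs", 3: "inttemp",
    4: "pffmtable", 5: "normproto", 6: "punc-dawg", 7: "word-dawg",
    8: "number-dawg", 9: "freq-dawg",
}

# The four files the quoted message names as required.
REQUIRED = ("unicharset", "inttemp", "pffmtable", "normproto")

OFFSET_LINE = re.compile(r"Offset for type\s+(\d+)\s+is\s+(-?\d+)")


def missing_types(output):
    """Return the type indices whose reported offset is negative."""
    return [int(t) for t, off in OFFSET_LINE.findall(output) if int(off) < 0]


def missing_required(output):
    """Return required component names that are absent (uses the assumed mapping)."""
    absent = {ASSUMED_TYPE_NAMES.get(t) for t in missing_types(output)}
    return [name for name in REQUIRED if name in absent]


# Offsets copied from the quoted combine_tessdata output.
SAMPLE = """
Offset for type 0 is -1
Offset for type 1 is 108
Offset for type 2 is -1
Offset for type 3 is -1
Offset for type 4 is 17420
Offset for type 5 is -1
Offset for type 6 is -1
Offset for type 7 is 21008
Offset for type 8 is -1
Offset for type 9 is 31506
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
"""

if __name__ == "__main__":
    print("missing type indices:", missing_types(SAMPLE))
    print("missing required files:", missing_required(SAMPLE))
```

If the assumed mapping holds, the missing required files point to the training steps that have to be rerun before combining.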
Zdenko

On Sun, Sep 22, 2013 at 9:05 PM, clyde <[email protected]> wrote:
> I had the same error:
> tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in
> file ..\classify\adaptmatch.cpp, line 555
>
> How did you solve it? Please help me.
>
> On Thursday, March 29, 2012 03:31:53 UTC+8, nkantan r wrote:
>
>> hi
>> I know there are two Tamil trained data files, corresponding to the Latha
>> and Lohit fonts. Going through the box and tif files, I understand that
>> the boxes for combined consonants (உயிர்மெய்) are selected as
>> individual characters (e.g. கே is boxed as separate ே and க instead of a
>> merged கே). Since the vowel sign ே comes before the base consonant
>> க, elaborate post-processing is required. Such post-processing
>> could be written by a person who knows Tamil as well as C++, but it
>> is currently missing altogether.
>>
>> To elaborate further: குகூகெகே is read correctly but output as
>> குகூெகேக. This is because the sequence is read as கு கூ ெ, க ே க; by
>> unicharacter reading, க followed by ே is read as the single unicharacter
>> கே, so the net result is குகூெகேக.
>> This becomes worse when single characters such as "கொ", "கோ" and "கௌ" are
>> read as three characters in three boxes!
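As an illustration of the post-processing described above, here is a minimal Python sketch (not part of Tesseract) that moves a left-side vowel sign behind the consonant that follows it and merges the split two-part vowel signs. It assumes the recognizer emits one code point per box in visual order; the function name visual_to_logical and the lookup tables are hypothetical.

```python
# Left-side vowel signs: drawn before the consonant, stored after it in Unicode.
PREFIX_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}  # TAMIL VOWEL SIGN E, EE, AI

# (left part, right part) -> single two-part vowel sign.
TWO_PART = {
    ("\u0bc6", "\u0bbe"): "\u0bca",  # E + AA -> O
    ("\u0bc7", "\u0bbe"): "\u0bcb",  # EE + AA -> OO
    ("\u0bc6", "\u0bd7"): "\u0bcc",  # E + AU LENGTH MARK -> AU
}


def is_consonant(ch):
    """True for code points in the Tamil consonant range KA..HA."""
    return "\u0b95" <= ch <= "\u0bb9"


def visual_to_logical(text):
    """Reorder 'sign, consonant' pairs into 'consonant, sign'."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in PREFIX_SIGNS and i + 1 < len(text) and is_consonant(text[i + 1]):
            sign = ch
            consumed = 2
            # Merge with a trailing right-hand part if one follows the consonant.
            if i + 2 < len(text) and (ch, text[i + 2]) in TWO_PART:
                sign = TWO_PART[(ch, text[i + 2])]
                consumed = 3
            out.append(text[i + 1])
            out.append(sign)
            i += consumed
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Visual-order output from the example in the message.
    print(visual_to_logical("\u0b95\u0bc1\u0b95\u0bc2\u0bc6\u0b95\u0bc7\u0b95"))
```

The sketch does not handle the case where the right half of the AU sign is recognized as the consonant ள rather than the length mark; telling those apart needs context that a simple swap does not have.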
>>
>> Another major issue is the missing vowel ஔ, which is read as
>> ஒ and ள.
>>
>> To avoid these issues, I am retraining the Tamil alphabet in its
>> proper form. Though I have succeeded in doing this for one font (Latha,
>> size 12), while combining the language files I am getting:
>>
>> Combining tessdata files
>> TessdataManager combined tess
>> Offset for type 0 is -1
>> Offset for type 1 is 108
>> Offset for type 2 is -1
>> Offset for type 3 is -1
>> Offset for type 4 is 17420
>> Offset for type 5 is -1
>> Offset for type 6 is -1
>> Offset for type 7 is 21008
>> Offset for type 8 is -1
>> Offset for type 9 is 31506
>> Offset for type 10 is -1
>> Offset for type 11 is -1
>> Offset for type 12 is -1
>>
>> C:\indicocr\tesseract301>
>>
>> Obviously the -1 above indicates something is wrong? In the whole of the
>> tesseract-ocr project page, it is not possible to get samples of:
>>
>> • tessdata/eng.config
>> • tessdata/eng.unicharset
>> • tessdata/eng.unicharambigs
>> • tessdata/eng.inttemp
>> • tessdata/eng.pffmtable
>> • tessdata/eng.normproto
>> • tessdata/eng.punc-dawg
>> • tessdata/eng.word-dawg
>> • tessdata/eng.number-dawg
>> • tessdata/eng.freq-dawg
>>
>> There are 13 items listed in the combined output, while only 10 files are
>> listed above.
>>
>> Though it is mentioned that unicharset, inttemp, pffmtable and normproto
>> are the four files required, apart from word-dawg and freq-dawg, there
>> is no mention of whether the other files, such as tam.config, tam.unicharambigs
>> etc., can be left absent or whether empty files are required.
>>
>> Now, while trying to run Tesseract using the tam.traineddata made above,
>> I am getting the error below:
>> ===================================
>> C:\indicocr\tesseract301>tesseract image.tif testtxt -l tam
>> tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in
>> file ..\classify\adaptmatch.cpp, line 512
>>
>> C:\indicocr\tesseract301>
>> =======================================
>>
>> Kindly advise what went wrong and what needs to be done to get a proper
>> traineddata file. I am really hopeful that the files used before
>> combining are also made available so that one can see the samples.
>>
>> regards
>> rnkantan
>>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

