I had the same error: tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file ..\classify\adaptmatch.cpp, line 555
How did you solve it? Please help me.

On Thursday, March 29, 2012 03:31:53 UTC+8, nkantan r wrote:
>
> hi
> I know there are two Tamil trained data files corresponding to the Latha
> and Lohit fonts. Going through the box and tif files, I understand that
> the boxes for combined consonants (உயிர்மெய்) are selected as
> individual characters (for example, கே is selected as individual ே and க
> instead of a merged கே). Since the vowel variation ே comes before the
> base consonant க, elaborate post-processing is required. Such
> post-processing could be written by someone who knows Tamil as well as
> C++, but at present it is missing altogether.
>
> To elaborate further: குகூகெகே is read correctly but output as
> குகூெகேக. This is because the sequence is read as கு கூ ெ, க ே க; in
> unicharacter reading, க followed by ே is read as the single unicharacter
> கே, so the net result is குகூெகேக.
> This becomes worse when single characters such as "கொ" "கோ" "கௌ" are
> read as three characters in three boxes!
>
> Another major issue is the missing vowel ஔ, which is read as ஒ and ள.
>
> To avoid these issues, I am retraining the Tamil alphabet in its
> proper form. Though I have succeeded in doing this for one font (Latha
> size 12), while combining the language files I am getting:
>
> Combining tessdata files
> TessdataManager combined tess
> Offset for type 0 is -1
> Offset for type 1 is 108
> Offset for type 2 is -1
> Offset for type 3 is -1
> Offset for type 4 is 17420
> Offset for type 5 is -1
> Offset for type 6 is -1
> Offset for type 7 is 21008
> Offset for type 8 is -1
> Offset for type 9 is 31506
> Offset for type 10 is -1
> Offset for type 11 is -1
> Offset for type 12 is -1
>
> C:\indicocr\tesseract301>
>
> Obviously the -1 above indicates something is wrong?
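One way to read the offsets quoted above is to map each type number to a component file and list the ones reported as -1. The sketch below is only a rough diagnostic aid, written in Python for brevity. It assumes that types 0 to 9 follow the same order as the ten eng.* files listed later in this message, and that -1 means the component was not found when combining; the names used for types 10 to 12 are my recollection of the 3.01 sources and should be treated as unverified. `COMPONENTS`, `REQUIRED`, `missing_components` and `missing_required` are hypothetical names, not part of Tesseract.

```python
import re

# ASSUMPTION: component order for "type N"; 0-9 mirror the eng.* file list
# in the quoted message, 10-12 are recalled from the 3.01 sources (unverified).
COMPONENTS = [
    "config", "unicharset", "unicharambigs", "inttemp", "pffmtable",
    "normproto", "punc-dawg", "word-dawg", "number-dawg", "freq-dawg",
    "fixed-length-dawgs", "cube-unicharset", "cube-word-dawg",
]

# The four files the quoted message says are required.
REQUIRED = {"unicharset", "inttemp", "pffmtable", "normproto"}

def missing_components(log):
    """Return the names of components whose reported offset is -1."""
    pairs = re.findall(r"Offset for type\s+(\d+) is\s+(-?\d+)", log)
    offsets = {int(idx): int(off) for idx, off in pairs}
    return [COMPONENTS[i] for i, off in sorted(offsets.items())
            if off == -1 and i < len(COMPONENTS)]

def missing_required(log):
    """Return only the missing components that the message calls required."""
    return [name for name in missing_components(log) if name in REQUIRED]

# The offsets exactly as posted in the quoted message.
LOG = """
Offset for type 0 is -1
Offset for type 1 is 108
Offset for type 2 is -1
Offset for type 3 is -1
Offset for type 4 is 17420
Offset for type 5 is -1
Offset for type 6 is -1
Offset for type 7 is 21008
Offset for type 8 is -1
Offset for type 9 is 31506
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
"""

if __name__ == "__main__":
    print("missing required components:", missing_required(LOG))
```

If the assumed ordering holds, type 3 would be inttemp, which would be consistent with the SeekToStart(TESSDATA_INTTEMP) assertion reported in this thread.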
> In the whole of the tesseract-ocr project page, it is not possible to
> get samples of
>
> • tessdata/eng.config
> • tessdata/eng.unicharset
> • tessdata/eng.unicharambigs
> • tessdata/eng.inttemp
> • tessdata/eng.pffmtable
> • tessdata/eng.normproto
> • tessdata/eng.punc-dawg
> • tessdata/eng.word-dawg
> • tessdata/eng.number-dawg
> • tessdata/eng.freq-dawg
>
> There are 13 items listed in the combined tessdata output, while only
> 10 files are listed above.
>
> Though it is mentioned that unicharset, inttemp, pffmtable and normproto
> are the four files required apart from word-dawg and freq-dawg, there
> is no mention of whether the other files, such as tam.config and
> tam.unicharambigs, can be left absent or whether empty files are
> required.
>
> Now, while trying to run Tesseract using the tam.traineddata made above,
> I am getting the error below:
> ===================================
> C:\indicocr\tesseract301>tesseract image.tif testtxt -l tam
> tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in
> file ..\classify\adaptmatch.cpp, line 512
>
> C:\indicocr\tesseract301>
> =======================================
>
> Kindly advise what went wrong and what needs to be done to get a proper
> traineddata file. I am really hopeful that the files used before
> combining will also be made available so that one can see the samples.
>
> regards
> rnkantan
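The reordering problem described in the quoted message (a vowel sign such as ே being read before its base consonant) can be illustrated with a small sketch. It is in Python rather than C++ for brevity, and it is not what Tesseract does. `visual_to_logical` is a hypothetical helper. It assumes the whole input is in visual (left-to-right glyph) order, moves each of the prefix vowel signs ெ ே ை behind the consonant that follows it, and then relies on Unicode NFC normalization to merge ெ or ே with a following ா into ொ or ோ. The case of ௌ being read as ெ plus ள is not handled.

```python
import unicodedata

# Tamil vowel signs that are drawn to the LEFT of their base consonant.
PREFIX_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}  # ெ ே ை

def is_tamil_consonant(ch):
    """Rough check: the Tamil consonants sit in U+0B95..U+0BB9."""
    return "\u0b95" <= ch <= "\u0bb9"

def visual_to_logical(text):
    """Convert visual-order OCR text to Unicode (logical) order.

    ASSUMPTION: every prefix vowel sign in `text` precedes its consonant.
    Text that is already in logical order must not be passed in, because
    it would be reordered incorrectly.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in PREFIX_SIGNS and nxt and is_tamil_consonant(nxt):
            # Emit the consonant first, then the vowel sign.
            out.append(nxt)
            out.append(ch)
            i += 2
        else:
            out.append(ch)
            i += 1
    # NFC composes the pairs (U+0BC6, U+0BBE) and (U+0BC7, U+0BBE)
    # into the single two-part vowel signs U+0BCA and U+0BCB.
    return unicodedata.normalize("NFC", "".join(out))

if __name__ == "__main__":
    # The visual-order string from the quoted message: கு கூ ெ க ே க
    garbled = "\u0b95\u0bc1\u0b95\u0bc2\u0bc6\u0b95\u0bc7\u0b95"
    print(visual_to_logical(garbled))
```

A real fix would more likely belong in the training data (boxing each combined character as one unit, as the original poster is attempting), with a pass like this only as a fallback.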

