Well, this is a really old thread, but I'm hoping some of you are still around. What do the error messages quoted below mean? I am running tesseract on some Kannada files and I get the same messages. Since I'm processing hundreds of pages, I cannot check each one by hand to tell whether the OCR is accurate, so the error messages are worrisome.
Sushil

On Wednesday, February 8, 2012 at 10:02:32 PM UTC+5:30, sriranga(83yrsold) wrote:
>
> Derek,
> As suggested by Ray (to combine *eng+hin*), I tested version 3.02 with the
> combined languages *eng+kan*; see the extract of CMD below (***).
> Also attached are the sample untitled.tif and the output file, viz. testuntitled.txt.
> This confirms "*Added simultaneous multi-language capability*".
>
> ***extract of CMD:
> M:\rao- files\chilume\test-3.02>tesseract untitled.TIF testuntitled -l eng+kan
> Error: unichar |:|0n2 in normproto file is not in unichar set.
> Error: unichar |:|1n2 in normproto file is not in unichar set.
> Error: unichar |!|0n2 in normproto file is not in unichar set.
> Error: unichar |!|1n2 in normproto file is not in unichar set.
> Error: unichar |;|0n2 in normproto file is not in unichar set.
> Error: unichar |;|1n2 in normproto file is not in unichar set.
> Error: unichar |ರಂ|0n2 in normproto file is not in unichar set.
> Error: unichar |ರಂ|1n2 in normproto file is not in unichar set.
> Error: unichar |ರಿಂ|0n2 in normproto file is not in unichar set.
> Error: unichar |ರಿಂ|1n2 in normproto file is not in unichar set.
> Error: unichar |%|0n3 in normproto file is not in unichar set.
> Error: unichar |%|1n3 in normproto file is not in unichar set.
> Error: unichar |%|2n3 in normproto file is not in unichar set.
> Error: unichar |ರೀಂ|0n3 in normproto file is not in unichar set.
> Error: unichar |ರೀಂ|1n3 in normproto file is not in unichar set.
> Error: unichar |ರೀಂ|2n3 in normproto file is not in unichar set.
> Error: unichar |ಲಂ|0n2 in normproto file is not in unichar set.
> Error: unichar |ಲಂ|1n2 in normproto file is not in unichar set.
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> Page 0
> M:\rao- files\chilume\test-3.02>
>
> cheers,
> -sriranga(79yrs)
>
> =================================================================
>
> On Sun, Feb 5, 2012 at 7:15 PM, Patrick Questembert <[email protected]> wrote:
>
>> I just did and I get this error:
>> "*Error opening data file tessdata/eng+ell.traineddata*"
>>
>> I am passing "eng+ell" as the language parameter (2nd parameter) in:
>>
>> myTess->Init(tessDataDir.c_str(), language, OEM_DEFAULT, NULL, , 0, false);
>>
>> No issue when using just "ell" or "eng". Should I be using a
>> different/new API?
>>
>> Thanks,
>> Patrick
>>
>> On Fri, Feb 3, 2012 at 11:59 AM, Ray Smith <[email protected]> wrote:
>>
>>> Try using eng+hin as the language code...
>>>
>>> On Fri, Feb 3, 2012 at 4:56 AM, Derek Dohler <[email protected]> wrote:
>>>
>>>> I'm excited by this:
>>>>
>>>>> Added simultaneous multi-language capability.
>>>>
>>>> Can you provide any info on how this works?
>>>>
>>>> Cheers,
>>>> Derek
>>>>
>>>> On Fri, Feb 3, 2012 at 4:32 PM, Sriranga(78yrsold) <[email protected]> wrote:
>>>>
>>>>> Attached release notes for 3.02. Download can be done from svn of the
>>>>> project site: tesseract-ocr - Project Hosting on Google Code
>>>>> <http://code.google.com/p/tesseract-ocr/>
>>>>> cheers,
>>>>> -sriranga(79yrs)
>>>>>
>>>>> On Fri, Feb 3, 2012 at 4:54 PM, Wil Hadden <[email protected]> wrote:
>>>>>
>>>>>> Hi Ray,
>>>>>>
>>>>>> Any idea of timescales when there will be a 3.02 package on the
>>>>>> downloads page of googlecode?
>>>>>>
>>>>>> Or are there any release notes between 3.01 and 3.02, I'm, just a bit
>>>>>> wary of being bleeding edge :)
>>>>>>
>>>>>> Wil
>>>>>>
>>>>>> On Feb 2, 6:55 pm, Ray Smith <[email protected]> wrote:
>>>>>> > Tesseract 3.02 is now available in svn for preliminary testing, currently
>>>>>> > Linux-only.
>>>>>> >
>>>>>> > There are now 65 languages and some big improvements in layout analysis and
>>>>>> > character accuracy.
>>>>>> > This version will with luck make it into Ubuntu LTS Precise Pangolin, so
>>>>>> > please test to see if your favorite issue is resolved.
>>>>>> >
>>>>>> > Thanks and enjoy!
>>>>>> >
>>>>>> > Ray.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To post to this group, send email to [email protected]
>>>>>> To unsubscribe from this group, send email to [email protected]
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> --
>> Patrick Questembert, *ScanBizCards*
>> +1-917-250-4177 | www.scanbizcards.com
>> twitter.com/ScanBizCards | www.facebook.com/ScanBizCards

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/1dab4dfe-a30a-45f9-829d-7c613c398930%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
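
For anyone batch-processing hundreds of pages as described at the top of this thread, here is a minimal sketch (Python, not from the thread) of one way to collect the "not in unichar set" messages per page instead of reading them off the console. It assumes the `tesseract` command is on the PATH and is invoked the same way as in the CMD extract above (`tesseract <image> <outputbase> -l eng+kan`), and that the messages arrive on stdout or stderr. The names `split_messages` and `run_page` are made up for illustration. It does not say what the messages mean or whether the OCR is accurate; it only separates them from any other output.

```python
import re
import subprocess
from collections import Counter

# Matches console lines of the form quoted in the thread, e.g.
#   Error: unichar |:|0n2 in normproto file is not in unichar set.
NORMPROTO_RE = re.compile(
    r"^Error: unichar (?P<unichar>.+) in normproto file is not in unichar set\.$"
)


def split_messages(console_text):
    """Split console output into (normproto message counts, other lines)."""
    counts = Counter()
    other = []
    for raw in console_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = NORMPROTO_RE.match(line)
        if match:
            counts[match.group("unichar")] += 1
        else:
            other.append(line)
    return counts, other


def run_page(image_path, output_base, langs="eng+kan"):
    """Run the tesseract CLI on one page and parse everything it printed.

    Assumption: stdout and stderr are merged because this sketch does not
    rely on which stream a given Tesseract version writes its messages to.
    """
    proc = subprocess.run(
        ["tesseract", image_path, output_base, "-l", langs],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        encoding="utf-8",
        errors="replace",
    )
    return split_messages(proc.stdout)
```

Calling `run_page("untitled.TIF", "testuntitled")` for each page and comparing the returned counts would show whether every page produces the identical set of messages or whether some pages print something different in the "other" list. Identical messages on every page would at least hint that they are tied to the language data being loaded rather than to a particular image, but that is an inference to confirm, not something this sketch can prove.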

