Well, this is a really old thread, but I'm hoping some of you are still 
around. What do those error messages mean? I am using Tesseract on some 
Kannada files and I get the same messages. Since I'm processing hundreds of 
pages, I cannot check by hand whether the OCR is accurate, so the error 
messages are worrisome.
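
For a batch of hundreds of pages, one way to keep track of these messages is 
to capture Tesseract's console output for each page and count the normproto 
lines. What follows is only a minimal sketch, not part of Tesseract. The 
function names (build_command, summarize_errors, run_page) are hypothetical, 
the regular expression is copied from the messages quoted in this thread, and 
it assumes the tesseract executable is on the PATH. It does not measure OCR 
accuracy; it only shows which pages produced which messages.

```python
import re
import subprocess
from collections import Counter

# Pattern copied from the messages quoted in this thread, e.g.
# "Error: unichar |:|0n2 in normproto file is not in unichar set."
NORMPROTO_ERROR = re.compile(
    r"Error: unichar (?P<unichar>.+?) in normproto file is not in unichar set\."
)


def build_command(image, output_base, languages):
    """Build the command line used in this thread, e.g. -l eng+kan."""
    return ["tesseract", image, output_base, "-l", "+".join(languages)]


def summarize_errors(console_text):
    """Split console output into normproto message counts and other errors."""
    counts = Counter()
    other = []
    for line in console_text.splitlines():
        line = line.strip()
        match = NORMPROTO_ERROR.search(line)
        if match:
            counts[match.group("unichar")] += 1
        elif line.startswith("Error"):
            # Anything else that starts with "Error" is kept verbatim.
            other.append(line)
    return counts, other


def run_page(image, output_base, languages):
    """Run Tesseract on one page and summarize what it printed.

    Assumption: the messages appear on stdout or stderr, so both are scanned.
    """
    result = subprocess.run(
        build_command(image, output_base, languages),
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
    )
    return summarize_errors(result.stdout + "\n" + result.stderr)
```

Calling run_page("untitled.TIF", "testuntitled", ["eng", "kan"]) mirrors the 
command quoted below. Pages whose second list is non-empty printed something 
other than the normproto messages and are probably worth checking first.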

Sushil

On Wednesday, February 8, 2012 at 10:02:32 PM UTC+5:30, sriranga(83yrsold) 
wrote:
>
> Derek,
> As suggested by Ray (who proposed combining *eng+hin*), I tested version 3.02 
> with the combined language code *eng+kan*; see the extract of CMD below (***).
> Also attached are the sample untitled.tif and the output file, testuntitled.txt. 
> This confirms "*Added simultaneous multi-language capability*".
>
> ***extract of CMD:
> M:\rao- files\chilume\test-3.02>tesseract untitled.TIF  testuntitled -l eng+kan
> Error: unichar |:|0n2 in normproto file is not in unichar set.
> Error: unichar |:|1n2 in normproto file is not in unichar set.
> Error: unichar |!|0n2 in normproto file is not in unichar set.
> Error: unichar |!|1n2 in normproto file is not in unichar set.
> Error: unichar |;|0n2 in normproto file is not in unichar set.
> Error: unichar |;|1n2 in normproto file is not in unichar set.
> Error: unichar |ರಂ|0n2 in normproto file is not in unichar set.
> Error: unichar |ರಂ|1n2 in normproto file is not in unichar set.
> Error: unichar |ರಿಂ|0n2 in normproto file is not in unichar set.
> Error: unichar |ರಿಂ|1n2 in normproto file is not in unichar set.
> Error: unichar |%|0n3 in normproto file is not in unichar set.
> Error: unichar |%|1n3 in normproto file is not in unichar set.
> Error: unichar |%|2n3 in normproto file is not in unichar set.
> Error: unichar |ರೀಂ|0n3 in normproto file is not in unichar set.
> Error: unichar |ರೀಂ|1n3 in normproto file is not in unichar set.
> Error: unichar |ರೀಂ|2n3 in normproto file is not in unichar set.
> Error: unichar |ಲಂ|0n2 in normproto file is not in unichar set.
> Error: unichar |ಲಂ|1n2 in normproto file is not in unichar set.
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> Page 0
> M:\rao- files\chilume\test-3.02>
>
> cheers,
> -sriranga(79yrs)
>
> =================================================================
>
>
>
> On Sun, Feb 5, 2012 at 7:15 PM, Patrick Questembert <[email protected]> wrote:
>
>> I just did and I get this error:
>> "*Error opening data file tessdata/eng+ell.traineddata*"
>>
>> I am passing "eng+ell" as the language parameter (2nd parameter) in:
>>
>> myTess->Init(tessDataDir.c_str(), language, OEM_DEFAULT, NULL, , 0, false
>> );
>> No issue when using just "ell" or "eng". Should I be using a 
>> different/new API?
>>
>> Thanks,
>> Patrick
>>
>> On Fri, Feb 3, 2012 at 11:59 AM, Ray Smith <[email protected]> wrote:
>>
>>> Try using eng+hin as the language code...
>>>
>>>
>>> On Fri, Feb 3, 2012 at 4:56 AM, Derek Dohler <[email protected]> wrote:
>>>
>>>> I'm excited by this:
>>>>
>>>>> Added simultaneous multi-language capability.
>>>>
>>>>
>>>> Can you provide any info on how this works?
>>>>
>>>> Cheers,
>>>> Derek 
>>>>
>>>> On Fri, Feb 3, 2012 at 4:32 PM, Sriranga(78yrsold) <[email protected]> wrote:
>>>>
>>>>> Attached are the release notes for 3.02. It can be downloaded from svn 
>>>>> on the project site: tesseract-ocr - Project Hosting on Google Code 
>>>>> <http://code.google.com/p/tesseract-ocr/> 
>>>>> cheers,
>>>>> -sriranga(79yrs)
>>>>>
>>>>> On Fri, Feb 3, 2012 at 4:54 PM, Wil Hadden <[email protected]> wrote:
>>>>>
>>>>>> Hi Ray,
>>>>>>
>>>>>> Any idea of timescales when there will be a 3.02 package on the
>>>>>> downloads page of googlecode?
>>>>>>
>>>>>> Or are there any release notes between 3.01 and 3.02? I'm just a bit
>>>>>> wary of being bleeding edge :)
>>>>>>
>>>>>> Wil
>>>>>>
>>>>>> On Feb 2, 6:55 pm, Ray Smith <[email protected]> wrote:
>>>>>> > Tesseract 3.02 is now available in svn for preliminary testing,
>>>>>> > currently Linux-only.
>>>>>> >
>>>>>> > There are now 65 languages and some big improvements in layout
>>>>>> > analysis and character accuracy.
>>>>>> > This version will with luck make it into Ubuntu LTS Precise
>>>>>> > Pangolin, so please test to see if your favorite issue is resolved.
>>>>>> >
>>>>>> > Thanks and enjoy!
>>>>>> >
>>>>>> > Ray.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To post to this group, send email to [email protected]
>>>>>> To unsubscribe from this group, send email to
>>>>>> [email protected]
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>
>>
>>
>>
>> -- 
>> Patrick Questembert, *ScanBizCards*
>> +1-917-250-4177 | www.scanbizcards.com
>> twitter.com/ScanBizCards | www.facebook.com/ScanBizCards
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/1dab4dfe-a30a-45f9-829d-7c613c398930%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
