Re: Chinese Simplified on this image not working

Lee Kohl-Bradley Tue, 18 Dec 2012 18:26:03 -0800

Seeing the same issue on Win7 Starter with a fresh install of 3.02.02 and the 
3.02 Simplified Chinese language data, which I renamed to chi.traineddata.

I've tried several files of varying quality, and even very high-quality 
images produce the same errors.

It does produce an output file whose quality roughly tracks the source, but 
even a high-quality source yields output with many mistakes, mostly 
extraneous characters. The character count differs by about 10% between 
source and output.
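
One rough way to put a number on that ~10% figure is to compare non-whitespace 
character counts between a hand-typed transcription of the image and the text 
Tesseract writes (with output base "out", that is out.txt). This is only a 
sketch: the transcription file name chi_test_reference.txt is hypothetical, 
both files are assumed to be UTF-8, and a count comparison says nothing about 
which characters are wrong.

```python
# Minimal sketch: relative character-count difference between a reference
# transcription and OCR output. File names below are hypothetical.
from pathlib import Path


def count_chars(text: str) -> int:
    """Count non-whitespace characters (spaces and line breaks are ignored)."""
    return sum(1 for ch in text if not ch.isspace())


def char_count_difference(reference: str, ocr_output: str) -> float:
    """(output count - reference count) / reference count."""
    ref_n = count_chars(reference)
    if ref_n == 0:
        raise ValueError("reference text is empty")
    return (count_chars(ocr_output) - ref_n) / ref_n


if __name__ == "__main__":
    # Hypothetical paths: a hand-typed transcription and Tesseract's out.txt.
    ref_path, out_path = Path("chi_test_reference.txt"), Path("out.txt")
    if ref_path.exists() and out_path.exists():
        diff = char_count_difference(
            ref_path.read_text(encoding="utf-8"),
            out_path.read_text(encoding="utf-8"),
        )
        print(f"{diff:+.1%} characters relative to the source")
```

A positive value means the output has more characters than the source, which 
matches the "extraneous characters" pattern described above.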

Typical output for a two-line, ~80-character source image:

C:\Tesseract-OCR>tesseract chi_test.png out -l chi
> Too many unichars in ambiguity on line 11087864
> Too many unichars in ambiguity on line 11087864
> Too many unichars in ambiguity on line 3852040
> Tesseract Open Source OCR Engine v3.02 with Leptonica


I also notice that it fails if the text is too large: characters with 
multiple parts start to be broken into their components, e.g. 现 --> 王见.
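
To check how often this splitting happens on a given image, one could count 
occurrences of the component sequence in the OCR output wherever the whole 
character appears in a reference transcription. This is a hypothetical helper, 
not anything Tesseract provides; only the 现 --> 王见 entry comes from the 
observation above, and any other entries would have to be added by hand.

```python
# Minimal sketch: count suspected component splits in OCR output.
# The table is hypothetical; only 现 -> 王见 is taken from the report above.
DEFAULT_SPLITS = {"现": "王见"}


def count_splits(reference: str, ocr_output: str,
                 splits: dict[str, str] = DEFAULT_SPLITS) -> dict[str, int]:
    """For each whole character present in the reference, count how many
    times its component sequence appears in the OCR output."""
    counts = {}
    for whole, parts in splits.items():
        if whole in reference:
            counts[whole] = ocr_output.count(parts)
    return counts
```

For example, a reference of 发现问题 with OCR output 发王见问题 gives one 
suspected split for 现.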

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en