I'm seeing the same issue on Win7 Starter with a fresh install of 3.02.02 and the 3.02 Simplified Chinese language data, which I renamed to chi.traineddata.
I've tried a few files of various quality, and the same errors appear every time. Output quality more or less tracks the source, but even a very high quality source yields an output file with lots of mistakes, mostly extraneous characters. I'm getting a ~10% difference in the number of characters between source and output.

Typical error for a two-line, ~80 character source file:

    C:\Tesseract-OCR>tesseract chi_test.png out -l chi
    Too many unichars in ambiguity on line 11087864
    Too many unichars in ambiguity on line 11087864
    Too many unichars in ambiguity on line 3852040
    Tesseract Open Source OCR Engine v3.02 with Leptonica

I also notice it fails if the text is too large: characters with multiple parts start to be broken down into their components, e.g. 现 --> 王见.
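In case it helps anyone compare results, here is a rough sketch of how the ~10% character difference could be measured consistently. It is only an illustration using Python's standard difflib module, not anything from Tesseract itself; the function name char_diff and the file names in the comments are my own assumptions.

```python
import difflib


def char_diff(source: str, output: str) -> tuple[float, float]:
    """Return (relative character-count difference, similarity ratio).

    Whitespace is stripped first so that line wrapping in the OCR
    output does not count as an error.
    """
    src = "".join(source.split())
    out = "".join(output.split())
    if not src:
        raise ValueError("source text is empty")
    # Relative difference in the number of characters, e.g. 0.10 for 10%.
    count_diff = abs(len(out) - len(src)) / len(src)
    # difflib similarity in [0, 1]; 1.0 means the texts are identical.
    ratio = difflib.SequenceMatcher(None, src, out, autojunk=False).ratio()
    return count_diff, ratio


# Hypothetical usage, assuming a hand-typed reference in ground_truth.txt
# and the Tesseract result in out.txt, both UTF-8:
#
#   with open("ground_truth.txt", encoding="utf-8") as f:
#       truth = f.read()
#   with open("out.txt", encoding="utf-8") as f:
#       ocr = f.read()
#   print(char_diff(truth, ocr))
```

The count difference alone only catches added or dropped characters; the similarity ratio also drops when a character is replaced by a wrong one of the same length, so reporting both gives a fuller picture.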

