Perhaps you need to binarize your image instead of passing it in as grayscale? Binary (black-and-white) images typically give better results, and Tesseract may even be converting the grayscale to binary internally, which could give lower quality than a threshold you choose yourself...

--Sven
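To make the binarization suggestion concrete, here is a minimal sketch of Otsu thresholding on a raw 8-bit grayscale buffer. It uses only the standard library so it can be tried without OpenCV; the names `otsuThreshold` and `binarize` are made up for illustration and are not part of OpenCV or Tesseract. OpenCV users would more likely call `cv::threshold` with the `THRESH_OTSU` flag instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Otsu's method: pick the threshold t that maximizes the between-class
// variance of the pixels <= t versus the pixels > t.
// (Illustrative helper, not a library function.)
int otsuThreshold(const std::vector<std::uint8_t>& gray) {
    assert(!gray.empty() && "need at least one pixel");

    // Histogram of the 256 possible gray levels.
    std::array<std::size_t, 256> hist{};
    for (std::uint8_t v : gray) ++hist[v];

    const double total = static_cast<double>(gray.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) {
        sumAll += i * static_cast<double>(hist[static_cast<std::size_t>(i)]);
    }

    double sumLow = 0.0;     // sum of gray values in the low class
    double weightLow = 0.0;  // number of pixels in the low class
    double bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        const double count = static_cast<double>(hist[static_cast<std::size_t>(t)]);
        weightLow += count;
        sumLow += t * count;
        if (weightLow == 0.0) continue;        // low class still empty
        const double weightHigh = total - weightLow;
        if (weightHigh == 0.0) break;          // high class empty from here on

        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;
        const double betweenVar = weightLow * weightHigh * diff * diff;
        if (betweenVar > bestVar) {
            bestVar = betweenVar;
            best = t;
        }
    }
    return best;
}

// Map every pixel above the threshold to white (255), the rest to black (0).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray,
                                   int threshold) {
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) {
        out[i] = gray[i] > threshold ? 255 : 0;
    }
    return out;
}
```

The result is still one byte per pixel, so it should be possible to hand it to `SetImage` the same way as the grayscale data in the quoted code, with 1 byte per pixel and a bytes-per-line value equal to the image width (assuming the rows are stored contiguously). Whether this actually fixes the truncated output here is untested.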
On Tue, Jul 17, 2012 at 12:20 PM, Zdenko Podobný <[email protected]> wrote:
> On 17.07.2012 02:32, Wei Liu wrote:
>>
>> My platform: Mac OS X 10.7.4 + Xcode 4.3.2 + OpenCV 2.4.0
>>
>> I want to use tesseract-ocr to recognize a few images (see attachment), and
>> I wrote a simple function to process the image using OpenCV, shown below:
>>
>> char* wl_ocr(const IplImage* im)
>> {
>>     // convert image to gray
>>     IplImage* imGray = wl_rgb2gray(im);
>>     cv::Mat matGray = imGray;
>>
>>     // initialize tesseract-ocr
>>     tesseract::TessBaseAPI tess;
>>     tess.Init("", "eng", tesseract::OEM_DEFAULT);
>>     tess.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
>>     // tess.SetVariable("tessedit_char_whitelist",
>>     //     "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");
>>     tess.SetPageSegMode(tesseract::PSM_AUTO);
>>
>>     // process the image
>>     // tess.TesseractRect(matGray.data, 1, matGray.step1(), 0, 0,
>>     //     matGray.cols, matGray.rows);
>>     tess.SetImage((uchar*)matGray.data, matGray.size().width,
>>                   matGray.size().height, matGray.channels(), matGray.step1());
>>     tess.Recognize(0);
>>
>>     // get the recognized text
>>     char* text;
>>     text = tess.GetUTF8Text();
>>
>>     // clean up
>>     cvReleaseImage(&imGray);
>>
>>     return text;
>> }
>>
>> I got the following results:
>>
>> 0.png --> CAUTION
>> 1.png --> TILE WAL
>> 2.png --> SLIPPERY
>>
>> The correct ones should be:
>>
>> 0.png --> CAUTION
>> 1.png --> TILE WALKWAY
>> 2.png --> SLIPPERY WHEN WET
>>
>> The images seem to be pretty simple and clean, but my function outputs
>> only part of the words instead of the whole text. I am not sure if I
>> misconfigured something or if there is something wrong with my code.
>>
>> BTW, I did not train tesseract-ocr; I simply copied eng.traineddata into
>> the tessdata folder (/usr/local/share/tessdata).
>>
>
> What version of tesseract are you using? At the moment I do not have
> time to test your code, but I just tried this (using tesseract 3.02):
>
> $ tesseract 0.png 0 && cat 0.txt
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> CAUTION
>
> $ tesseract 1.png 1 && cat 1.txt
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> TILE WALKWAY
>
> $ tesseract 2.png 2 && cat 2.txt
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> SLIPPERY WHEN WET
>
> It looks like tesseract 3.02 is able to OCR your images correctly (i.e. you
> should upgrade to version 3.02 or debug your code).
>
> --
> Zdenko
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.

From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

