Re: tesseract-ocr does not output desired results

Wei Liu Wed, 18 Jul 2012 00:26:59 -0700

Hi Zdenko,

I installed version 3.02, and it now works. I guess the problem was not that
I used an older version of tesseract-ocr, but that I wrote an additional
function to keep only alphabetic characters, and that function had a bug in it.
I suppose the simple function I posted does work with OpenCV image data.
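The filtering function with the bug is not shown in this thread. For illustration only, here is a minimal sketch of what such a filter could look like in C++. `keep_alpha` is a hypothetical name. Keeping spaces (and collapsing runs of whitespace such as line breaks into one space) is an assumption: a filter that drops spaces would merge multi-word results like "TILE WALKWAY".

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the original post): keeps ASCII letters and
// single spaces so that word boundaries in the OCR output survive filtering.
std::string keep_alpha(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    bool pending_space = false;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            // emit one space between words, but never a leading space
            if (pending_space && !out.empty()) out += ' ';
            pending_space = false;
            out += static_cast<char>(c);
        } else if (std::isspace(c)) {
            // collapse runs of whitespace (including newlines) into one space
            pending_space = true;
        }
        // every other character (digits, punctuation) is dropped
    }
    return out;  // trailing whitespace is never emitted
}
```

For example, `keep_alpha("SLIPPERY\nWHEN WET\n")` gives `"SLIPPERY WHEN WET"`.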


Thanks for your help though :)

On Tuesday, July 17, 2012 10:20:48 AM UTC-7, zdenop wrote:
>
> On 17.07.2012 02:32, Wei Liu wrote:
> > 
> > My platform: Mac OS X 10.7.4 + Xcode 4.3.2 + OpenCV 2.4.0 
> > 
> > 
> > I want to use tesseract-ocr to recognize a few images (see attachment), and
> > I wrote a simple function to process the images using OpenCV, which is
> > shown as follows:
> > 
> > 
> > char* wl_ocr(const IplImage* im)
> > {
> >     // convert image to gray
> >     IplImage* imGray = wl_rgb2gray(im);
> >     cv::Mat matGray = imGray;
> >
> >     // initialize tesseract-ocr
> >     tesseract::TessBaseAPI tess;
> >     tess.Init("", "eng", tesseract::OEM_DEFAULT);
> >     tess.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
> >     // tess.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");
> >     tess.SetPageSegMode(tesseract::PSM_AUTO);
> >
> >     // process the image
> >     // tess.TesseractRect(matGray.data, 1, matGray.step1(), 0, 0, matGray.cols, matGray.rows);
> >     tess.SetImage((uchar*)matGray.data, matGray.size().width, matGray.size().height,
> >                   matGray.channels(), matGray.step1());
> >     tess.Recognize(0);
> >
> >     // get the recognized text (UTF-8; the caller must free it with delete[])
> >     char* text = tess.GetUTF8Text();
> >
> >     // clean up
> >     cvReleaseImage(&imGray);
> >
> >     return text;
> > }
> > 
> > 
> > I got the following results:
> >
> > 0.png --> CAUTION
> > 1.png --> TILE WAL
> > 2.png --> SLIPPERY
> >
> > The correct results should be:
> >
> > 0.png --> CAUTION
> > 1.png --> TILE WALKWAY
> > 2.png --> SLIPPERY WHEN WET
> > 
> > 
> > The images seem pretty simple and clean, but my function outputs only
> > part of the text rather than all of the words. I am not sure whether I
> > misconfigured something or whether there is a bug in my code.
> > 
> > 
> > BTW, I did not train tesseract-ocr; I simply copied eng.traineddata to a
> > folder (/usr/local/share/tessdata).
> > 
>
> What version of tesseract are you using? At the moment I do not have 
> time to test your code, but I just tried this (using tesseract 3.02): 
>
> $ tesseract 0.png 0 && cat 0.txt 
> Tesseract Open Source OCR Engine v3.02 with Leptonica 
> CAUTION 
>
> $ tesseract 1.png 1 && cat 1.txt 
> Tesseract Open Source OCR Engine v3.02 with Leptonica 
> TILE WALKWAY 
>
> $ tesseract 2.png 2 && cat 2.txt 
> Tesseract Open Source OCR Engine v3.02 with Leptonica 
> SLIPPERY WHEN WET 
>
> It looks like tesseract 3.02 is able to OCR your images correctly (i.e. you
> should either upgrade to version 3.02 or debug your code).
>
> -- 
> Zdenko 
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
