Hi Zdenko,
I installed version 3.02, and it works now. I guess the problem was not
that I used an older version of tesseract-ocr, but that an additional
function I wrote to keep only alphabetic characters had a bug in it.
I suppose the simple function I provided works for OpenCV data.
Thanks for your help though :)
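
A filter of the kind mentioned above (keeping only alphabetic characters in the OCR output) could be sketched roughly as follows. This is not the function from this thread, whose code and bug were not posted. The name `keep_alpha` and the choice to collapse every run of non-letters into a single space are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical helper, not the original function from this thread.
// Keeps ASCII letters and replaces each run of other characters
// (spaces, newlines, digits, punctuation) with a single space.
// Leading and trailing separators are dropped.
std::string keep_alpha(const std::string& text) {
    std::string out;
    bool pending_space = false;  // true once a non-letter has been seen
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        const bool is_letter =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        if (is_letter) {
            // Emit the separator lazily, so nothing is added at the
            // start of the output or after the last word.
            if (pending_space && !out.empty()) {
                out += ' ';
            }
            out += c;
            pending_space = false;
        } else {
            pending_space = true;
        }
    }
    return out;
}
```

With this sketch, `keep_alpha("TILE WALKWAY\n\n")` yields `TILE WALKWAY`. Emitting the separator only when the next letter arrives keeps word boundaries intact without leaving a trailing space.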
On Tuesday, July 17, 2012 10:20:48 AM UTC-7, zdenop wrote:
>
> On 17.07.2012 02:32, Wei Liu wrote:
> >
> > My platform: Mac OS X 10.7.4 + Xcode 4.3.2 + OpenCV 2.4.0
> >
> >
> > I want to use tesseract-ocr to recognize a few images (see attachment),
> > and I wrote a simple function to process the images using OpenCV, shown
> > below:
> >
> >
> > char* wl_ocr(const IplImage* im)
> > {
> >     // convert image to gray
> >     IplImage* imGray = wl_rgb2gray(im);
> >     cv::Mat matGray = imGray;
> >
> >     // initialize tesseract-ocr
> >     tesseract::TessBaseAPI tess;
> >     tess.Init("", "eng", tesseract::OEM_DEFAULT);
> >     tess.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
> >     // tess.SetVariable("tessedit_char_whitelist",
> >     //     "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");
> >     tess.SetPageSegMode(tesseract::PSM_AUTO);
> >
> >     // process the image
> >     // tess.TesseractRect(matGray.data, 1, matGray.step1(), 0, 0,
> >     //     matGray.cols, matGray.rows);
> >     tess.SetImage((uchar*)matGray.data, matGray.size().width,
> >                   matGray.size().height, matGray.channels(), matGray.step1());
> >     tess.Recognize(0);
> >
> >     // get the recognized text (allocated by tesseract; the caller must delete[] it)
> >     char* text = tess.GetUTF8Text();
> >
> >     // clean up
> >     cvReleaseImage(&imGray);
> >
> >     return text;
> > }
> >
> >
> > I got the following results:
> >
> >
> > 0.png --> CAUTION
> >
> > 1.png --> TILE WAL
> >
> > 2.png --> SLIPPERY
> >
> >
> > The correct results should be:
> >
> >
> > 0.png --> CAUTION
> >
> > 1.png --> TILE WALKWAY
> >
> > 2.png --> SLIPPERY WHEN WET
> >
> >
> > The images seem to be pretty simple and clean, but my function outputs
> > only part of the text instead of all the words. I am not sure whether I
> > misconfigured something or whether there is a bug in my code.
> >
> >
> > BTW, I did not train tesseract-ocr; I simply copied eng.traineddata to
> > the tessdata folder (/usr/local/share/tessdata).
> >
>
> What version of tesseract are you using? At the moment I do not have
> time to test your code, but I just tried this (using tesseract 3.02):
>
> $ tesseract 0.png 0 && cat 0.txt
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> CAUTION
>
> $ tesseract 1.png 1 && cat 1.txt
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> TILE WALKWAY
>
> $ tesseract 2.png 2 && cat 2.txt
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> SLIPPERY WHEN WET
>
> It looks like tesseract 3.02 is able to OCR your images correctly (i.e. you
> should upgrade to version 3.02 or debug your code).
>
> --
> Zdenko
>