tesseract-ocr does not output desired results

Wei Liu Mon, 16 Jul 2012 19:48:44 -0700


My platform: Mac OS X 10.7.4 + Xcode 4.3.2 + OpenCV 2.4.0



I want to use tesseract-ocr to recognize the text in a few images (see 
attachments). I wrote a simple function that processes the image using 
OpenCV, shown below:


char* wl_ocr(const IplImage* im)
{
    // convert image to gray
    IplImage* imGray = wl_rgb2gray(im);
    cv::Mat matGray = imGray;

    // initialize tesseract-ocr
    tesseract::TessBaseAPI tess;
    tess.Init("", "eng", tesseract::OEM_DEFAULT);
    tess.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    // tess.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");
    tess.SetPageSegMode(tesseract::PSM_AUTO);

    // process the image
    // tess.TesseractRect(matGray.data, 1, matGray.step1(), 0, 0, matGray.cols, matGray.rows);
    tess.SetImage((uchar*)matGray.data, matGray.size().width, matGray.size().height, matGray.channels(), matGray.step1());
    tess.Recognize(0);

    // get the recognized text
    char* text;
    text = tess.GetUTF8Text();

    // clean up
    cvReleaseImage(&imGray);

    return text;
}
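
A note on the return value: GetUTF8Text() returns a newly allocated buffer that the caller has to free with delete []. A minimal sketch of one way to handle that is below. The helper name take_text is hypothetical (it is not part of Tesseract), and it assumes the pointer was allocated with new[].

```cpp
#include <string>

// Hypothetical helper, not part of the Tesseract API.
// Copies a C string that was allocated with new[] into a std::string
// and then frees the original buffer, so the caller cannot leak it.
std::string take_text(char* text)
{
    std::string result = text ? text : "";  // a null pointer becomes an empty string
    delete[] text;                          // delete[] on a null pointer is a no-op
    return result;
}
```

With a helper like this, wl_ocr could return std::string via take_text(tess.GetUTF8Text()) instead of handing a raw char* to the caller.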


I got the following results:


0.png --> CAUTION

1.png --> TILE WAL

2.png --> SLIPPERY


The correct results should be:


0.png --> CAUTION

1.png --> TILE WALKWAY

2.png --> SLIPPERY WHEN WET


The images seem to be pretty simple and clean, but my function outputs 
only part of the text instead of all the words. I am not sure whether I 
misconfigured something or whether there is something wrong with my code.


BTW, I did not train tesseract-ocr; I simply copied eng.traineddata to 
the tessdata folder (/usr/local/share/tessdata).
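
One way to narrow this down is to check that the buffer, size, and row stride handed to SetImage really describe the whole image. The sketch below serialises an 8-bit, single-channel buffer as a binary PGM (P5) image that any image viewer can open. It is only a diagnostic aid under stated assumptions: the helper name gray_to_pgm is hypothetical, the data is assumed to be 8-bit grayscale, and stride is assumed to be the number of bytes per row.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Tesseract or OpenCV.
// Serialises an 8-bit, single-channel image as a binary PGM (P5) byte string.
// `stride` is the number of bytes per row; it may exceed `width` when rows are padded.
std::string gray_to_pgm(const unsigned char* data, int width, int height,
                        std::size_t stride)
{
    assert(data != nullptr && width > 0 && height > 0);
    assert(stride >= static_cast<std::size_t>(width));

    // PGM header: magic number, dimensions, maximum gray value.
    std::string out = "P5\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";

    // Copy only the `width` visible bytes of each row, skipping any padding.
    for (int y = 0; y < height; ++y) {
        const unsigned char* row = data + static_cast<std::size_t>(y) * stride;
        out.append(reinterpret_cast<const char*>(row),
                   static_cast<std::size_t>(width));
    }
    return out;
}
```

Inside wl_ocr, the result of gray_to_pgm(matGray.data, matGray.cols, matGray.rows, matGray.step) could be written to a file opened in binary mode. If the saved picture is cropped or skewed, the arguments passed to SetImage are the likely suspect; if it looks complete, the problem is more likely in the Tesseract configuration.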


<<attachment: 0.png>>

<<attachment: 1.png>>

<<attachment: 2.png>>
