My platform: Mac OS X 10.7.4 + Xcode 4.3.2 + OpenCV 2.4.0
I want to use tesseract-ocr to recognize the text in a few images (see
attachments), and I wrote a simple function that processes the image with
OpenCV and passes it to tesseract. It is shown below:
char* wl_ocr(const IplImage* im)
{
    // convert image to gray (wl_rgb2gray is my own helper)
    IplImage* imGray = wl_rgb2gray(im);
    cv::Mat matGray = imGray;  // wraps the IplImage data, no copy

    // initialize tesseract-ocr
    tesseract::TessBaseAPI tess;
    tess.Init("", "eng", tesseract::OEM_DEFAULT);
    tess.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    // tess.SetVariable("tessedit_char_whitelist",
    //     "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");
    tess.SetPageSegMode(tesseract::PSM_AUTO);

    // process the image
    // tess.TesseractRect(matGray.data, 1, matGray.step1(), 0, 0,
    //     matGray.cols, matGray.rows);
    tess.SetImage((uchar*)matGray.data, matGray.size().width,
                  matGray.size().height, matGray.channels(), matGray.step1());
    tess.Recognize(0);

    // get the recognized text (allocated with new[], caller must delete[] it)
    char* text = tess.GetUTF8Text();

    // clean up
    cvReleaseImage(&imGray);
    return text;
}
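For reference, TessBaseAPI::SetImage takes (imagedata, width, height, bytes_per_pixel, bytes_per_line), where bytes_per_line is the row stride in bytes. cv::Mat::step1() returns step / elemSize1(), so for an 8-bit gray Mat it equals step and the call above should pass the correct stride. The self-contained sketch below only illustrates that argument mapping for a row-padded 8-bit buffer; PaddedImage, SetImageArgs and setImageArgsFor are hypothetical names, not part of OpenCV or tesseract.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an 8-bit image whose rows may be padded,
// mirroring what cv::Mat exposes as cols, rows, channels() and step.
struct PaddedImage {
    int width;              // pixels per row
    int height;             // number of rows
    int channels;           // 1 for gray, 3 for BGR
    std::size_t row_bytes;  // bytes from the start of one row to the next
    std::vector<std::uint8_t> data;
};

// Hypothetical holder for the values, in the order SetImage expects:
// (imagedata, width, height, bytes_per_pixel, bytes_per_line).
struct SetImageArgs {
    const unsigned char* data;
    int width;
    int height;
    int bytes_per_pixel;
    int bytes_per_line;
};

SetImageArgs setImageArgsFor(const PaddedImage& img) {
    // For 8-bit data each channel is one byte, so bytes_per_pixel == channels.
    // bytes_per_line is the real stride in bytes (not width * channels),
    // so any padding at the end of each row is skipped correctly.
    return SetImageArgs{img.data.data(), img.width, img.height,
                        img.channels, static_cast<int>(img.row_bytes)};
}
```

With the real API these five values would be passed directly to tess.SetImage(...).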
I got the following results:
0.png --> CAUTION
1.png --> TILE WAL
2.png --> SLIPPERY
The expected results are:
0.png --> CAUTION
1.png --> TILE WALKWAY
2.png --> SLIPPERY WHEN WET
The images seem pretty simple and clean, but my function returns only part
of the text instead of the whole words. I am not sure whether I
misconfigured something or whether there is a bug in my code.
BTW, I did not train tesseract-ocr; I simply copied eng.traineddata to
/usr/local/share/tessdata.
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
<<attachment: 0.png>>
<<attachment: 1.png>>
<<attachment: 2.png>>

