Hello,
I just tried using tesseract-ocr for the first time but could not get it
working for a simple case. I have a 400x45-pixel .bmp image in which the
characters are about 28 pixels high. The characters will not always be
digits, but they are always machine printed.
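#include <stdexcept>
#include <opencv2/opencv.hpp>
#include <tesseract/baseapi.h>

cv::Mat gray = cv::imread("sample.bmp", cv::IMREAD_GRAYSCALE);
if (gray.empty()) throw std::runtime_error("Could not read sample.bmp.");
tesseract::TessBaseAPI tess;
// Init() returns 0 on success and -1 on failure.
if (tess.Init("C:/Program Files/Tesseract-OCR/tessdata", "eng", tesseract::OEM_DEFAULT) != 0) {
    throw std::runtime_error("Could not initialize tesseract.");
}
// One byte per pixel; gray.step is the real row stride in bytes.
tess.SetImage(gray.data, gray.cols, gray.rows, 1, static_cast<int>(gray.step));
// GetUTF8Text() allocates with new[]; release it with delete[] text when done.
char* text = tess.GetUTF8Text();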
Result: *"&fl"*

With single-line page segmentation:
tess.SetPageSegMode(tesseract::PSM_SINGLE_LINE);
Result: *"4Lu2A0—UJP"*

With a digits-only whitelist:
tess.SetVariable("tessedit_char_whitelist", "0123456789");
Result: *"43"*

With both:
tess.SetPageSegMode(tesseract::PSM_SINGLE_LINE);
tess.SetVariable("tessedit_char_whitelist", "0123456789");
Result: *"4 32523133"*
None of these results is even close to the ground truth
("8712400764278").
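To put a number on "not even close", one common option is the character error rate: the Levenshtein edit distance between the OCR output and the ground truth, divided by the ground-truth length. This is a minimal sketch; the function names are hypothetical, and it compares bytes, so a multi-byte UTF-8 character in the output counts as several edits.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: minimum number of single-byte insertions,
// deletions and substitutions needed to turn `a` into `b`.
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;  // "" -> b[0..j)
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;                                           // a[0..i) -> ""
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);  // prev now holds row i
    }
    return prev[b.size()];
}

// Character error rate: edit distance normalised by the ground-truth
// length (assumes a non-empty ground truth).
double charErrorRate(const std::string& ocr, const std::string& truth) {
    return static_cast<double>(editDistance(ocr, truth)) / truth.size();
}
```

For example, editDistance("43", "8712400764278") is 12, a character error rate of about 0.92 for the digits-only run.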
Pre-processing such as binarization helps, but the output is still far from
perfect even in this simple case, and I plan to add noise in the next stage.
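Since binarization is reported to help, one widely used automatic way to pick the threshold is Otsu's method. The sketch below is a plain C++ version operating on a hypothetical flat buffer of 8-bit grey values (for example the bytes of gray when gray.isContinuous() is true); in OpenCV the same technique is available as cv::threshold(gray, bw, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU). Whether Otsu suits this particular image is an assumption, since the post does not say which binarization was tried.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Otsu's method: choose the threshold t that maximises the between-class
// variance of the grey-level histogram (pixels <= t vs pixels > t).
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    std::array<double, 256> hist{};
    for (std::uint8_t p : pixels) hist[p] += 1.0;

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double sumBg = 0.0, weightBg = 0.0, bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightBg += hist[t];                 // count of pixels <= t
        if (weightBg == 0.0) continue;
        const double weightFg = total - weightBg;
        if (weightFg == 0.0) break;          // nothing left above t
        sumBg += t * hist[t];
        const double meanBg = sumBg / weightBg;
        const double meanFg = (sumAll - sumBg) / weightFg;
        const double diff = meanBg - meanFg;
        const double betweenVar = weightBg * weightFg * diff * diff;
        if (betweenVar > bestVar) { bestVar = betweenVar; best = t; }
    }
    return best;
}

// Binarize in place: pixels above the threshold become white (255), the
// rest black (0), so dark text on a light background stays dark.
void binarize(std::vector<std::uint8_t>& pixels) {
    const int t = otsuThreshold(pixels);
    for (auto& p : pixels) p = (p > t) ? 255 : 0;
}
```

The resulting buffer can be passed to SetImage with the same width, height and one byte per pixel as the original grayscale image.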
Does anybody know why the results are so poor, and how I
can improve them?
Thank you,
Levent
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/57e57c2d-c329-408c-af06-f47806884701%40googlegroups.com.