[tesseract-ocr] Simple OCR fails - why and what can be done

Levent Serbesatik Tue, 06 May 2014 02:23:30 -0700

Hello,

I just tried tesseract-ocr for the first time but could not get it to work 
on a simple case. I have a 400x45-pixel .bmp image in which the characters 
are about 28 pixels high. The characters are machine printed; they don't 
have to be digits.


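cv::Mat gray = cv::imread("sample.bmp", 0);  // 0 = load as 8-bit grayscale

tesseract::TessBaseAPI tess;
// Init() returns 0 on success and -1 on failure.
int tesscreated = tess.Init("C:/Program Files/Tesseract-OCR/tessdata", "eng",
                            tesseract::OEM_DEFAULT);
if (tesscreated != 0) {
    // std::runtime_error needs #include <stdexcept>
    throw std::runtime_error("Could not initialize tesseract.");
}

// 1 byte per pixel; pass gray.step as the row stride in case rows are padded.
tess.SetImage((uchar*)gray.data, gray.cols, gray.rows, 1, (int)gray.step);

char* text = tess.GetUTF8Text();  // caller must release it with delete[] text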
result:        *"&ﬂ"*

using
tess.SetPageSegMode(tesseract::PSM_SINGLE_LINE);
result: *"4Lu2A0—UJP"*

using
tess.SetVariable("tessedit_char_whitelist", "0123456789"); 
result:       *"43"*

using
tess.SetPageSegMode(tesseract::PSM_SINGLE_LINE);
tess.SetVariable("tessedit_char_whitelist", "0123456789"); 
result: *"4 32523133"*

The results are not even close to the ground truth 
("8712400764278").
Pre-processing such as binarization helps, but the output is still far from 
perfect, even for this simple case, and I am planning to introduce noise in 
the next stage.
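
For reference, the binarization step mentioned above can be sketched as a 
global Otsu threshold. This is only a minimal sketch under assumptions: the 
helper names otsuThreshold and binarize are illustrative (they are not part 
of Tesseract or OpenCV), the input is assumed to be an 8-bit grayscale 
buffer, and it is not necessarily the exact pre-processing used for the 
results in this post.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: Otsu's method on an 8-bit grayscale buffer.
// Returns the threshold t in [0, 255] that maximizes the between-class
// variance of the two classes {p <= t} and {p > t}.
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    std::array<double, 256> hist{};
    for (std::uint8_t p : pixels) hist[p] += 1.0;

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double weightLow = 0.0;  // number of pixels with value <= t
    double sumLow = 0.0;     // sum of those pixel values
    double bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightLow += hist[t];
        if (weightLow == 0.0) continue;      // no pixels in the low class yet
        const double weightHigh = total - weightLow;
        if (weightHigh == 0.0) break;        // all pixels are in the low class
        sumLow += t * hist[t];
        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;
        const double var = weightLow * weightHigh * diff * diff;
        if (var > bestVar) {                 // keep the first maximum found
            bestVar = var;
            best = t;
        }
    }
    return best;
}

// Hypothetical helper: pixels above the threshold become 255, the rest 0.
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& pixels,
                                   int threshold) {
    std::vector<std::uint8_t> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        out[i] = pixels[i] > threshold ? 255 : 0;
    }
    return out;
}
```

The binarized buffer can be passed to tess.SetImage() the same way as the 
grayscale data, with 1 byte per pixel and the image width as the stride.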

Does anybody know why I am getting such poor results, and how I can improve 
them?

Thank you,
Levent



