[tesseract-ocr] my scan of alphanumeric data needs TLC

Stephane Charette Tue, 27 Aug 2019 02:12:15 -0700

I have a large number of images that contain a single line of alphanumeric 
data.  My scans so far have not been great, and I could use some assistance.


Two dictionary variables are turned off, as recommended in the docs:

    key.push_back("load_system_dawg");
    val.push_back("false");
    key.push_back("load_freq_dawg");
    val.push_back("false");


These are set at initialization:

    tess->Init(nullptr, "eng", tesseract::OEM_DEFAULT, nullptr, 0, &key, &val, false);
    tess->SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_LINE);
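
Since Init() takes the variable names and values as two parallel vectors, the 
two must stay the same length and in the same order.  Here is a minimal sketch 
of one way to keep them in sync.  It is only an illustration: TessVars and 
make_no_dictionary_vars are hypothetical names, not part of Tesseract, and it 
assumes the std::vector<std::string> argument types used by Tesseract 5 
(Tesseract 4 takes GenericVector<STRING> instead).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Tesseract): holds variable names and
// values in two parallel vectors, the shape that Init() expects for its
// vars_vec / vars_values arguments.
struct TessVars {
    std::vector<std::string> keys;
    std::vector<std::string> values;

    // Append one name/value pair so the two vectors cannot drift apart.
    void add(std::string name, std::string value) {
        keys.push_back(std::move(name));
        values.push_back(std::move(value));
        assert(keys.size() == values.size());
    }
};

// Build the two settings used in this post: disable both dictionary DAWGs.
inline TessVars make_no_dictionary_vars() {
    TessVars vars;
    vars.add("load_system_dawg", "false");
    vars.add("load_freq_dawg", "false");
    return vars;
}
```

Under that assumption, the result of make_no_dictionary_vars() could be passed 
to Init() as &vars.keys and &vars.values in place of &key and &val.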


Some images are close, such as this one:

[image: "32 EC 5"]
...which is interpreted as "SZ2EC 3".

Others, like this one, return a blank string:

[image: "30 B 9"]

And then I have some, like this one, that are very close, except that Tesseract 
removes the spaces between the characters, so this example results in "1201":

[image: "12 O 1"]

I've posted my full .cpp test file and more example images showing the 
problem on Stack Overflow:
https://stackoverflow.com/questions/57670769/how-to-get-tesseract-to-recognize-these-alphanumeric-strings

Thanks,

Stéphane
