Try cleaning the images manually with GIMP: remove the black noise and see
if that helps. Also try removing the white border. After each step, run
Tesseract again to see whether that was the problem.
Also try downscaling the images so that the text is 40 to 60 px tall; try
different sizes and see what works best. As an alternative, you can play
with the DPI settings (but I have never done this). Tesseract does not know
how tall your text is or where the lines are, so it cannot tell whether the
0 is a zero or a big dot, or whether the 1 is a one or a quote.
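
To pick a scale factor you first need to know how tall the text is now.
Below is a minimal sketch of one way to estimate that, assuming the image is
already an 8-bit grayscale buffer in row-major order with dark text on a
light background. The function names and the threshold of 128 are my own
choices for illustration, not part of the Tesseract API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Height in pixels of the band of rows that contain at least one dark pixel.
// Assumes 8-bit grayscale, row-major, 0 = black and 255 = white.
// Returns 0 when no pixel is darker than darkThreshold.
int inkHeight(const std::vector<std::uint8_t>& pixels, int width, int height,
              std::uint8_t darkThreshold = 128) {
    int first = -1;
    int last = -1;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            if (pixels[i] < darkThreshold) {
                if (first < 0) first = y;  // first row with ink
                last = y;                  // most recent row with ink
                break;                     // one dark pixel is enough for this row
            }
        }
    }
    return first < 0 ? 0 : last - first + 1;
}

// Factor by which to resize the image so the text ends up targetPx tall.
// Falls back to 1.0 (no resize) when no text height could be measured.
double scaleForTextHeight(int measuredPx, int targetPx) {
    return measuredPx > 0 ? static_cast<double>(targetPx) / measuredPx : 1.0;
}
```

Isolated black specks count as ink here, so the measurement is only
meaningful after the noise has been removed. The resulting factor would then
be passed to whatever resize routine your imaging library provides.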

Also try the single-block page segmentation mode (PSM_SINGLE_BLOCK).

Once you have found the problem, fix the image in code before passing it to
Tesseract.
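
As an illustration of what such a fix could look like, here is a rough
sketch that finds the bounding box of everything that is not near-white, so
the white border can be cropped away. It assumes the same kind of 8-bit
grayscale, row-major buffer; the Box type, the function name and the
threshold of 200 are hypothetical and would need tuning on your images.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rectangle in pixel coordinates (hypothetical helper type).
struct Box {
    int left;
    int top;
    int width;
    int height;
};

// Bounding box of all pixels darker than whiteThreshold.
// Assumes 8-bit grayscale, row-major, 255 = white.
// Returns the full image rectangle when every pixel is near-white.
Box contentBox(const std::vector<std::uint8_t>& pixels, int width, int height,
               std::uint8_t whiteThreshold = 200) {
    int left = width;
    int right = -1;
    int top = height;
    int bottom = -1;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            if (pixels[i] < whiteThreshold) {
                left = std::min(left, x);
                right = std::max(right, x);
                top = std::min(top, y);
                bottom = std::max(bottom, y);
            }
        }
    }
    if (right < 0) {
        return Box{0, 0, width, height};  // nothing but border: leave as is
    }
    return Box{left, top, right - left + 1, bottom - top + 1};
}
```

You may want to keep a few pixels of margin around the box rather than
cropping tight to the glyphs, and any stray dark noise near the edges will
enlarge the box, so it makes sense to remove the noise first.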


Bye

Lorenzo


On Tue, 27 Aug 2019 at 11:12, Stephane Charette <
[email protected]> wrote:

> I have a large number of images that contain a single line of alphanumeric
> data.  My scans so far have not been great, and I could use some assistance.
>
> Several vars are turned off as recommended in the docs:
>
>     key.push_back("load_system_dawg");
>     val.push_back("false");
>     key.push_back("load_freq_dawg");
>     val.push_back("false");
>
>
> These are set at initialization:
>
>     tess->Init(nullptr, "eng", tesseract::OEM_DEFAULT, nullptr, 0, &key, &val, false);
>     tess->SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_LINE);
>
>
> Some images are close, such as this one:
>
> [image: "32 EC 5"]
> ...which is interpreted as "SZ2EC 3".
>
> Others, like this one, return a blank string:
>
> [image: "30 B 9"]
> And then I have some, like this one, that are so close, but Tesseract
> removes the spaces between the characters, so this example results in "1201":
>
> [image: "12 O 1"]
> I've posted my full .cpp test file and more example images showing the
> problem on StackOverflow:
> https://stackoverflow.com/questions/57670769/how-to-get-tesseract-to-recognize-these-alphanumeric-strings
>
> Thanks,
>
> Stéphane
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f721e105-d0d6-4322-b9c5-6c5f2d487d06%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/f721e105-d0d6-4322-b9c5-6c5f2d487d06%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
