You need to reduce it to black and white, or at least greyscale. This image appears to be crafted specifically to thwart OCR. The color gradient in the background is echoed in the digits, so select-by-color isn't much help. GIMP's fuzzy select can get you close, but the drop shadows around the digits are a real pain because they're not all the same color. If you can get the background and the shadows around the digits to a proper black, you may be able to invert the colors and get something useful.
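For what it's worth, here is a rough sketch of the greyscale, threshold, and invert idea in plain Python. It is only an illustration and I have not run it against your image: the helper names (`to_grey`, `otsu_threshold`, `binarize_inverted`) are made up for this example, the image is assumed to be already loaded as rows of (r, g, b) tuples, and Otsu's method is my own choice for picking the threshold. Given the uneven shadows, a single global threshold may well not be enough on its own.

```python
def to_grey(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]


def otsu_threshold(grey_rows):
    """Pick the grey level that best separates dark from light pixels (Otsu's method)."""
    hist = [0] * 256
    for row in grey_rows:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their grey values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (unnormalised); larger means cleaner separation.
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize_inverted(rgb_rows):
    """Threshold and invert: light pixels (assumed digits) become black (0),
    dark pixels (assumed background and shadows) become white (255)."""
    grey = to_grey(rgb_rows)
    t = otsu_threshold(grey)
    return [[0 if v > t else 255 for v in row] for row in grey]
```

The result is dark digits on a white background, which is generally the easier case for Tesseract. This assumes the digits end up lighter than the background and shadows once converted to greyscale; if it is the other way round, flip the comparison in `binarize_inverted`.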
After 20 minutes of mucking about, I've not been able to produce anything usable, unless you would be happy with just the labels and not the numbers.

On Sun, Nov 5, 2023 at 11:40 AM Harry Stevenson <[email protected]> wrote:

> I'm trying to extract numerical data from this image, but I'm not getting
> good results. Can anyone recommend any other config options/ how I should
> crop it to help.
> Here is what it currently recognises:
> `$
> 8 ee
> - oO eta
> 8334 .°8R 3
> 339 Sf 2 fe
> 2 Soe £3 BS oO
> BRSal Sage
> 5 SEE:
> papeo cee |`
> with config: --psm 5 load_system_dawg=false load_freq_dawg=false
> Thank you

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAL7mBq5UHAwp9fUfqkso_dWuPaQRQ0hbP8nt3JsScV05%2B3Arhw%40mail.gmail.com.

