I had a little success applying 2.5 pixels of blur and then thresholding at 217-255. FWIW, I used GIMP for the preprocessing. Here's what I got after just a few minutes:

a i @)
-230 & 50 90 6 50 90 6 -100 130 6 130 6 ~100 -130

I don't know what happened to the first column or why the last 2 lines got split the way they did.

On Wed, Nov 1, 2023 at 4:30 PM Slartybartfast <[email protected]> wrote:

> Doesn't anybody have any ideas? :-(
>
> On Tuesday, October 24, 2023 at 5:40:20 PM UTC+1 Slartybartfast wrote:
>
>> Hi
>> I am a new tesseract user, and I'm really struggling to get it to produce
>> any kind of sensible results, especially with numerical text. I have some
>> text that looks like this:
>> [image: example_input.jpg]
>> I've read the documentation, and looked through the parameter list, and I
>> added the following to the command line:
>> --psm 6
>> -c preserve_interword_spaces=1
>> -c textord_dotmatrix_gap=6
>> -c classify_bln_numeric_mode=1
>> -c rej_alphas_in_number_perm=1
>>
>> But I just get garbage out:
>>
>> Oo -250 6 3a
>> 190 & So
>> 190 6 -100
>> 1 $1290 6 ~140
>> 1 $130 6 ~150
>>
>> I've tried all sorts of additional image processing to try and improve
>> the look of the text, but none of it works. In fact, this is the best
>> output I've seen. It's usually worse. I'm really hoping someone who has
>> worked with dot-matrix input can offer some magic incantation to make
>> tesseract come to its senses. Thanks.
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/15797f86-58c9-4e71-b316-54f663d04cbfn%40googlegroups.com
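For anyone who wants to script the blur-then-threshold preprocessing from the top of this reply instead of doing it by hand in GIMP, here is a minimal, dependency-free Python sketch. It rests on several assumptions. The scan is a grayscale image given as rows of 0-255 values, with dark dots on light paper. "2.5 pixels of blur" is modelled as a Gaussian with standard deviation 2.5, which may not match GIMP's blur setting exactly. "Thresholding at 217-255" means pixels in that range become white and everything darker becomes black. The idea is that the blur smears each dot into its neighbours, so the pale gaps between dots fall below 217 and get painted black, turning dotted strokes into solid ones. All function names here are hypothetical. The tesseract flags are exactly the ones from the original question. I also assume tesseract (via Leptonica) can read the PGM file this writes.

```python
import math
import subprocess
from typing import List

# Grayscale image as rows of pixel values: 0 = black ink, 255 = white paper.
Image = List[List[int]]


def gaussian_kernel(sigma: float) -> List[float]:
    """1-D Gaussian weights spanning about +/-3 sigma, normalised to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(line: List[float], kernel: List[float]) -> List[float]:
    """Convolve one row or column with the kernel, replicating edge pixels."""
    r = len(kernel) // 2
    n = len(line)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)  # clamp index to the image border
            acc += w * line[j]
        out.append(acc)
    return out


def gaussian_blur(img: Image, sigma: float) -> List[List[float]]:
    """Separable Gaussian blur: a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d([float(p) for p in row], kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to rows


def threshold(img: List[List[float]], low: int = 217) -> Image:
    """Keep only pixels in [low, 255] as white; everything darker becomes black."""
    return [[255 if p >= low else 0 for p in row] for row in img]


def preprocess(img: Image, sigma: float = 2.5, low: int = 217) -> Image:
    """Blur so neighbouring dots bleed together, then binarise."""
    return threshold(gaussian_blur(img, sigma), low)


def write_pgm(img: Image, path: str) -> None:
    """Save as binary PGM (P5); assumed readable by tesseract through Leptonica."""
    height, width = len(img), len(img[0])
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        f.write(bytes(p for row in img for p in row))


def tesseract_command(image_path: str, output_base: str = "stdout") -> List[str]:
    """The command line from the original question; 'stdout' prints the text."""
    return [
        "tesseract", image_path, output_base,
        "--psm", "6",
        "-c", "preserve_interword_spaces=1",
        "-c", "textord_dotmatrix_gap=6",
        "-c", "classify_bln_numeric_mode=1",
        "-c", "rej_alphas_in_number_perm=1",
    ]


def run_ocr(img: Image, path: str = "preprocessed.pgm") -> str:
    """Preprocess, save, and run tesseract (needs the tesseract binary on PATH)."""
    write_pgm(preprocess(img), path)
    result = subprocess.run(tesseract_command(path), capture_output=True, text=True, check=True)
    return result.stdout
```

You would call `run_ocr(pixels)`, where `pixels` is the scan loaded as rows of grayscale values by whatever image library you already use. The pure-Python blur is slow and only there to show the mechanics. With Pillow, `ImageFilter.GaussianBlur` plus `Image.point` would do the same job much faster. Whether 2.5 and 217 suit other dot-matrix scans is untested.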

