Thank you! The original has much more border around it; I just cropped it for easier viewing here. I already did a little pre-processing, but it looks like I need to do more. Interesting that scaling up improved things. According to one analysis, accuracy depends on character height, and by that measure I already had the optimum character height, but maybe things have changed. The original scan was done at 300 dpi. I'll try 600.
Incidentally ... I got so frustrated I wrote my own OCR program today. It only took me a few hours. It's much more accurate than Tesseract, though working with fixed-width fonts makes life a lot easier!! Just divide the image up into a grid, and pattern match each "cell". As I was only interested in the numbers, I only had 16 characters (hex digits) to match against.

Cheers

On Thursday, November 2, 2023 at 12:43:12 PM UTC piggy wrote:

> I added more white space around the target text by scaling the canvas to
> 500 pixels wide, and then scaled up the whole image by a factor of 2.
>
> -230 6 5O
>
> 90 6 50
>
> 90 6 -100
> 130 6 -100
> 130 6 -150
>
> On Thu, Nov 2, 2023 at 8:35 AM La Monte H. P. Yarroll <[email protected]>
> wrote:
>
>> I had a little success applying 2.5 pixels of blur and then thresholding
>> at 217-255. FWIW, I used gimp for the preprocessing. Here's what I got
>> after just a few minutes:
>> a i @)
>>
>> -230 & 50
>> 90 6 50
>> 90 6 -100
>>
>> 130 6
>> 130 6
>>
>> ~100
>> -130
>>
>> I don't know what happened to the first column or why the last 2 lines
>> got split the way they did.
>>
>>
>> On Wed, Nov 1, 2023 at 4:30 PM Slartybartfast <[email protected]> wrote:
>>
>>> Doesn't anybody have any ideas? :-(
>>>
>>> On Tuesday, October 24, 2023 at 5:40:20 PM UTC+1 Slartybartfast wrote:
>>>
>>>> Hi
>>>> I am a new tesseract user, and I'm really struggling to get it to
>>>> produce any kind of sensible results, especially with numerical text.
>>>> I have some text that looks like this:
>>>> [image: example_input.jpg]
>>>> I've read the documentation, and looked through the parameter list, and
>>>> I added the following to the command line:
>>>> --psm 6
>>>> -c preserve_interword_spaces=1
>>>> -c textord_dotmatrix_gap=6
>>>> -c classify_bln_numeric_mode=1
>>>> -c rej_alphas_in_number_perm=1
>>>>
>>>> But I just get garbage out:
>>>>
>>>> Oo -250 6 3a
>>>> 190 & So
>>>> 190 6 -100
>>>> 1 $1290 6 ~140
>>>> 1 $130 6 ~150
>>>>
>>>> I've tried all sorts of additional image processing to try and improve
>>>> the look of the text, but none of it works. In fact, this is the best
>>>> output I've seen. It's usually worse. I'm really hoping someone who has
>>>> worked with dot-matrix input can offer some magic incantation to make
>>>> tesseract come to its senses. Thanks.
>>>>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/5c364cf1-076a-43e4-86f2-61b925b9d6c3n%40googlegroups.com.
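
For anyone curious, the grid idea from the reply at the top of this thread (cut a fixed-width image into equal cells, then pattern match each cell against the 16 hex digits) might look roughly like the sketch below. This is not the program mentioned there, just a toy illustration under several assumptions: the "image" is a list of strings rather than a real scan, the 3x5 glyph shapes are invented for the demo, the matching rule (fewest differing pixels) is a guess since the post does not say how cells were compared, and all names (`GLYPHS`, `render`, `split_cells`, `match_cell`, `recognise`) are hypothetical.

```python
# Toy 3x5 glyphs for the 16 hex digits; rows are separated by "|".
# ASSUMPTION: these shapes are made up. Real templates would be cropped
# from the scanned font itself.
GLYPHS = {
    "0": "###|#.#|#.#|#.#|###",
    "1": ".#.|##.|.#.|.#.|###",
    "2": "###|..#|###|#..|###",
    "3": "###|..#|###|..#|###",
    "4": "#.#|#.#|###|..#|..#",
    "5": "###|#..|###|..#|###",
    "6": "###|#..|###|#.#|###",
    "7": "###|..#|..#|..#|..#",
    "8": "###|#.#|###|#.#|###",
    "9": "###|#.#|###|..#|###",
    "A": ".#.|#.#|###|#.#|#.#",
    "B": "##.|#.#|##.|#.#|##.",
    "C": "###|#..|#..|#..|###",
    "D": "##.|#.#|#.#|#.#|##.",
    "E": "###|#..|###|#..|###",
    "F": "###|#..|###|#..|#..",
}
CELL_W, CELL_H = 3, 5  # ASSUMPTION: fixed cell size with no gaps between cells


def render(text):
    """Build a one-line test image (list of row strings) from glyphs."""
    glyph_rows = [GLYPHS[ch].split("|") for ch in text]
    return ["".join(g[y] for g in glyph_rows) for y in range(CELL_H)]


def split_cells(image, cell_w=CELL_W, cell_h=CELL_H):
    """Cut the image into a grid; return a list of cells per text line."""
    lines = []
    for top in range(0, len(image), cell_h):
        band = image[top:top + cell_h]
        cells = []
        for left in range(0, len(band[0]), cell_w):
            # Store each cell in the same "row|row|..." form as GLYPHS.
            cells.append("|".join(row[left:left + cell_w] for row in band))
        lines.append(cells)
    return lines


def match_cell(cell):
    """Return the glyph whose template differs from the cell in the fewest pixels."""
    def distance(ch):
        return sum(a != b for a, b in zip(cell, GLYPHS[ch]))
    return min(GLYPHS, key=distance)


def recognise(image):
    """Recognise every cell in the grid; return one string per text line."""
    return ["".join(match_cell(c) for c in line) for line in split_cells(image)]


if __name__ == "__main__":
    print(recognise(render("1A3F")))  # ['1A3F']
```

On a real scan the hard part is probably finding the grid origin and cell pitch, and binarising the image first; once the cells line up, the matching itself stays about this simple, which fits the remark that fixed-width fonts make life a lot easier.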

