Ger,

Thanks for taking the time to reply.

On 1/1/2021 4:00 PM, Ger Hobbelt wrote:

> Another technique specifically for dot-matrix might be to blend multiple copies of the scan at small offsets. The idea here is that back in the old days of dot matrix, a few DTP applications had printing modes which would print dot patterns several times on the same line, but ever so slightly offset from one another to 'fill the character up'. The poor man's way to print BOLD characters that way was to print the same line multiple times at slight offsets.

The printer's manual actually details so much of this internal working. Besides schematics and BOM lists, descriptions of theory of operation, etc. I had forgotten the level of detail we used to get when we bought a multi-hundred dollar product.

> Hence to simulate this sort of 'gap closing', one could scan at higher resolution, then offset the image multiple times in various directions by "half a printer dot" (or less) and blend the copies using a blending mode like Photoshop Darken.

I **believe** that morphological dilation is similar to what you're talking about here.

"Dilation [...] adds a layer of pixels to both the inner and outer boundaries of regions."

from https://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures/ImageProcessing-html/topic4.htm

I tried a few different techniques similar to what you've mentioned. While conceptually it should help, practically speaking I saw only minimal improvement.

While it's still a work in progress, I'm describing my current best efforts/results in the other reply here.

Thanks,
Keith

On Friday, January 1, 2021 at 10:03:37 PM UTC-5 shree wrote:

> Please see old thread at
> https://groups.google.com/g/tesseract-ocr/c/ApM_TqwV7aE/m/z5jZV0I0AgAJ
> for link to a completed project for dot matrix
>
> On Monday, December 14, 2020 at 12:11:00 PM UTC+5:30 Keith M wrote:
>
>> Hi there,
>>
>> I've been circling a problem with OCR'ing 90-pages of 30 year old BASIC
>> code. I've been working on optimizing my scanning settings, and
>> pre-processing, stuck in photoshop for hours messing around. Long couple
>> days with this stuff!
>>
>> I've been through tessdoc, through the FAQ, through wikipedia reading
>> about morphological operators. Through PPAs for 5.0.0-alpha-833-ga06c.
>>
>> I'm getting OK results so far, but need to process more images, my
>> workflow is tedious.
>>
>> Sample image here
>> https://www.techtravels.org/wp-content/uploads/2020/12/FNBBS-02_crop.png
>>
>> 150dpi image extracted via pdftoppm -png from a 1200dpi scan. While it's
>> not super clear to me why, higher res scans are resulting in WORSE OCR's.
>>
>> *TLDR; What should be the ideal configuration of tesseract for my
>> application? Disable the dictionary? Can I add BASIC commands and keywords
>> to eng.user-words? From the manual "CONFIG FILES AND AUGMENTING WITH USER
>> DATA" section ??*
>>
>> I could use some help, thanks!
>>
>> Keith
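For anyone who wants to try the offset-and-darken idea from the top of this message outside Photoshop, here is a minimal sketch in plain Python. It is not the exact processing anyone in this thread used. The function names (`shift`, `darken_blend`, `close_gaps`) are made up for illustration, the image is assumed to be grayscale with 0 = black ink and 255 = white paper, and the one-pixel offsets are only a placeholder: on a real scan, "half a printer dot" would be several pixels, depending on the scan resolution.

```python
def shift(img, dx, dy, fill=255):
    """Return a copy of img moved dx pixels right and dy pixels down.

    Pixels that move in from outside the image are set to `fill`
    (white paper by default), so nothing wraps around the edges.
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def darken_blend(images):
    """Per-pixel minimum of same-sized grayscale images (a 'Darken' blend)."""
    h, w = len(images[0]), len(images[0][0])
    return [[min(im[y][x] for im in images) for x in range(w)]
            for y in range(h)]


def close_gaps(img, offsets=((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))):
    """Blend the image with copies of itself shifted by each (dx, dy) offset."""
    return darken_blend([shift(img, dx, dy) for dx, dy in offsets])


# Two black dots with a one-pixel white gap between them.
row = [[255, 255, 0, 255, 0, 255, 255]]
print(close_gaps(row))  # [[255, 0, 0, 0, 0, 0, 255]]
```

With dark ink on white paper, taking the per-pixel minimum over these shifted copies is the same as a morphological dilation of the ink with a plus-shaped structuring element, which is why the two descriptions above sound so similar. On real scans the same blend could be done with an imaging library (Pillow's `ImageChops.darker`, for example, takes the darker of two images pixel by pixel) rather than nested lists.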

