Hi Balaji,

You may find this GitHub repository useful: https://github.com/ameera3/OCR_Expiration_Date
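
To make the connected-component concern in the quoted question concrete, here is a minimal pure-Python sketch. It is a toy illustration only: it is not Tesseract's actual implementation, and it is not taken from the repository above. The glyph `DOT_L` and the helpers `parse`, `count_components`, and `dilate` are hypothetical names made up for this example. It counts 8-connected components in a tiny dot-matrix "L" whose dots do not touch, then shows that one 3x3 dilation merges the dots into a single component.

```python
from collections import deque

# All 8 neighbour offsets plus the pixel itself.
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def parse(rows):
    """Turn strings of '#' (ink) and '.' (background) into a 0/1 grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def count_components(grid):
    """Count 8-connected groups of ink pixels using breadth-first search."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if not grid[r][c] or seen[r][c]:
                continue
            count += 1  # found an unvisited ink pixel: a new component
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in OFFSETS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def dilate(grid):
    """3x3 dilation: each ink pixel also inks its 8 neighbours."""
    height, width = len(grid), len(grid[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if grid[r][c]:
                for dy, dx in OFFSETS:
                    ny, nx = r + dy, c + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        out[ny][nx] = 1
    return out


# Hypothetical dot-matrix "L": five dots, each separated by one blank pixel.
DOT_L = parse([
    "#......",
    ".......",
    "#......",
    ".......",
    "#.#.#..",
])

print(count_components(DOT_L))          # 5: every dot is its own component
print(count_components(dilate(DOT_L)))  # 1: the dots now form one glyph
```

On a real photograph you would more likely binarize the image and apply a morphological dilation or closing from an image library (for example OpenCV's `cv2.dilate` or `cv2.morphologyEx`) before passing it to Tesseract. The kernel size has to be tuned to the dot spacing, because too large a kernel will also merge neighbouring characters.
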

Best wishes,
ameera3

On Friday, June 7, 2019 at 6:18:39 AM UTC-7, Balaji Gurunathan wrote:
>
> Hi,
>
> I've a similar requirement to read dot-matrix fonts but I'm not sure where
> to begin this from since I'm new to Tesseract. Could you please share
> references/guide.
>
> Thanks.
>
> On Friday, March 22, 2019 at 12:41:11 PM UTC+5:30, [email protected]
> wrote:
>>
>> I am trying to fine-tune Tesseract for dot-matrix fonts such as that in
>> the picture below. When the dots are closely spaced together and touch,
>> Tesseract can more or less handle the dot-matrix font with some fine-tuning
>> and image processing. However, when the dots do not touch, as in the
>> picture below, Tesseract struggles.
>>
>> I read in An Overview of the Tesseract OCR Engine
>> <https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/33418.pdf>
>> that the first step in Tesseract's processing pipeline is a connected
>> component analysis (second paragraph of Section 2). Since the letters in a
>> dot-matrix font do not form connected components, I am wondering if
>> Tesseract's connected component analysis may be one reason that Tesseract
>> struggles on the image below.
>>
>> Is there a command to see how Tesseract performs connected component
>> analysis on this image?
>>
>> [image: ex_20.jpg]
>>

