Hi Ameera,

Please do check with other images too, as I tested with only the one image that you sent.
I had initially tried fine-tuning (impact and plus), but those did not give accurate results for the 2nd line. Then I tried replacing the top layer, using new training text all in UPPER case, with many lines in the same format as the image you sent. I used just a couple of fonts that looked similar to the image.

Regarding the image, I tested different versions by changing it interactively in IrfanView. Mainly: straighten the image, convert it to black and white, and resize it to half and then half again. I haven't tested the new traineddata with the original image.

I will email you the training text and fonts used, if you want.

On Sat, 23 Mar 2019, 03:33, <[email protected]> wrote:

> Hi Shree,
>
> Thanks for sending these images and the traineddata file. I confirmed
> that they worked. Would you please tell me a little bit more about what
> kind of image processing you used to make the .png images and how you
> created your traineddata file using fine-tuning?
>
> Thank you,
> Ameera
>
> On Friday, March 22, 2019 at 12:11:11 AM UTC-7, [email protected] wrote:
>>
>> I am trying to fine-tune Tesseract for dot-matrix fonts such as that in
>> the picture below. When the dots are closely spaced together and touch,
>> Tesseract can more or less handle the dot-matrix font with some fine-tuning
>> and image processing. However, when the dots do not touch, as in the
>> picture below, Tesseract struggles.
>>
>> I read in An Overview of the Tesseract OCR Engine
>> <https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/33418.pdf>
>> that the first step in Tesseract's processing pipeline is a connected
>> component analysis (second paragraph of Section 2). Since the letters in a
>> dot-matrix font do not form connected components, I am wondering if
>> Tesseract's connected component analysis may be one reason that Tesseract
>> struggles on the image below.
>>
>> Is there a command to see how Tesseract performs connected component
>> analysis on this image?
>>
>> [image: ex_20.jpg]
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/7a30ee84-cae8-406f-82e1-ca7767e40f20%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
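
For anyone following the thread, here is a small sketch of how the connected-component question and the "resize to half and then half again" step may be related. It is a synthetic illustration in plain Python, not Tesseract's own analysis and not IrfanView's resampling. The helper names (`make_dot_matrix_L`, `count_components`, `halve`), the 2x2 dot size, the 4-pixel dot pitch, and the "any ink in a 2x2 block" downscaling rule are all assumptions chosen for the example.

```python
from collections import deque


def make_dot_matrix_L(pitch=4, dot=2):
    """Build a synthetic binary image (rows of 0/1) of a dot-matrix 'L'.

    Hypothetical glyph: 5 dots down the left edge plus 2 more along the bottom.
    Each dot is `dot` x `dot` pixels and dots are spaced `pitch` pixels apart,
    so neighbouring dots do not touch.
    """
    dots = [(r, 0) for r in range(5)] + [(4, 1), (4, 2)]
    height, width = 5 * pitch, 3 * pitch
    img = [[0] * width for _ in range(height)]
    for r, c in dots:
        for dy in range(dot):
            for dx in range(dot):
                img[r * pitch + dy][c * pitch + dx] = 1
    return img


def count_components(img):
    """Count 8-connected groups of ink pixels using a breadth-first flood fill."""
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if img[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny in range(cy - 1, cy + 2):
                        for nx in range(cx - 1, cx + 2):
                            inside = 0 <= ny < height and 0 <= nx < width
                            if inside and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count


def halve(img):
    """Downscale by 2: an output pixel is ink if any pixel in its 2x2 block is ink."""
    return [
        [
            int(any(img[y + dy][x + dx] for dy in (0, 1) for dx in (0, 1)))
            for x in range(0, len(img[0]) - 1, 2)
        ]
        for y in range(0, len(img) - 1, 2)
    ]


full = make_dot_matrix_L()   # 20 x 12 pixels, dots separated by 2-pixel gaps
half = halve(full)           # 10 x 6 pixels, dots separated by 1-pixel gaps
quarter = halve(half)        # 5 x 3 pixels, dots now touch

print(count_components(full), count_components(half), count_components(quarter))
# prints: 7 7 1
```

In this toy case the seven separate dots only merge into a single connected stroke after the second halving. Whether the same happens on a real scan depends on the dot size, the spacing, and the resampling filter used, so it is worth checking the actual image rather than relying on this sketch.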

