Also changed image to 300 dpi and used --dpi 300.

On Sat, 23 Mar 2019, 07:43 Shree Devi Kumar, <[email protected]> wrote:
> Hi Ameera,
>
> Please do check with other images too, as I tested with only the one image
> that you sent.
>
> I had initially tried fine-tuning (impact and plus), but those were not
> giving accurate results for the 2nd line.
>
> Then I tried replacing the top layer, using new training text all in UPPER
> case, with many lines in the same format as the image you sent. I used just
> a couple of fonts that looked similar to the image.
>
> Regarding the image, I tested different versions by changing it
> interactively in IrfanView. Mainly: straighten the image, convert to black
> and white, resize to half and then half again. I haven't tested the new
> traineddata with the original image.
>
> I will email you the training text and fonts used, if you want.
>
> On Sat, 23 Mar 2019, 03:33 , <[email protected]> wrote:
>
>> Hi Shree,
>>
>> Thanks for sending these images and the traineddata file. I confirmed
>> that they worked. Would you please tell me a little bit more about what
>> kind of image processing you used to make the .png images and how you
>> created your traineddata file using fine-tuning?
>>
>> Thank you,
>> Ameera
>>
>> On Friday, March 22, 2019 at 12:11:11 AM UTC-7, [email protected]
>> wrote:
>>>
>>> I am trying to fine-tune Tesseract for dot-matrix fonts such as that in
>>> the picture below. When the dots are closely spaced together and touch,
>>> Tesseract can more or less handle the dot-matrix font with some
>>> fine-tuning and image processing. However, when the dots do not touch,
>>> as in the picture below, Tesseract struggles.
>>>
>>> I read in An Overview of the Tesseract OCR Engine
>>> <https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/33418.pdf>
>>> that the first step in Tesseract's processing pipeline is a connected
>>> component analysis (second paragraph of Section 2). Since the letters in
>>> a dot-matrix font do not form connected components, I am wondering if
>>> Tesseract's connected component analysis may be one reason that
>>> Tesseract struggles on the image below.
>>>
>>> Is there a command to see how Tesseract performs connected component
>>> analysis on this image?
>>>
>>> [image: ex_20.jpg]
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/7a30ee84-cae8-406f-82e1-ca7767e40f20%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/7a30ee84-cae8-406f-82e1-ca7767e40f20%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXSuFqWzPKgSFE9KRWpBcXTZo4JWruGsXWoPajfp9gPJQ%40mail.gmail.com.
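
To illustrate the connected-component question raised in the thread, here is a minimal pure-Python sketch. It is not Tesseract's implementation; the toy glyph, the 8-connectivity rule, and the helper names `count_components` and `dilate` are all assumptions made for illustration. It shows that a character drawn with non-touching dots splits into one component per dot, and that thickening the dots (a 3x3 dilation here) merges them into a single blob. Shrinking the image, as described above, may have a similar merging effect, though that is not verified here.

```python
from collections import deque

def count_components(grid):
    """Count 8-connected groups of foreground (1) pixels in a 2D list."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            count += 1                      # found an unvisited blob
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:                    # flood-fill the blob (BFS)
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

def dilate(grid):
    """3x3 dilation: a pixel turns on if it or any neighbour is on."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            int(any(
                grid[ny][nx]
                for ny in range(max(0, r - 1), min(rows, r + 2))
                for nx in range(max(0, c - 1), min(cols, c + 2))
            ))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Hypothetical dot-matrix "L": six dots with a one-pixel gap between them.
DOTS = [(0, 0), (2, 0), (4, 0), (6, 0), (6, 2), (6, 4)]
glyph = [[0] * 5 for _ in range(7)]
for row, col in DOTS:
    glyph[row][col] = 1

print(count_components(glyph))          # 6: every dot is its own component
print(count_components(dilate(glyph)))  # 1: the dots now touch
```

On a real scan the same idea would be applied with an image library's morphology operations, and the kernel size would need to be tuned to the dot spacing.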

