Hi Ameera,

Please do check with other images too, as I tested with only the one image
that you sent.

I initially tried fine-tuning (impact and plus), but neither gave accurate
results for the 2nd line.

Then I tried replacing the top layer, using new training text all in UPPER
case, with many lines in the same format as the image you sent. I used just
a couple of fonts that looked similar to the image.
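
For anyone who wants to try the same route, here is a rough sketch of how
a "replace the top layer" run can be launched. It only builds the
lstmtraining command line, following the "training just a few layers"
example in the Tesseract training documentation. All file paths are
hypothetical placeholders, and the append index, net spec and iteration
count are the documentation's example values, assumed here for
illustration rather than the settings used for this traineddata.

```python
import shlex


def top_layer_training_command(
    continue_from,
    traineddata,
    train_listfile,
    model_output,
    append_index=5,                 # assumed: cut point taken from the docs example
    net_spec="[Lfx256 O1c111]",     # assumed: replacement top layers from the docs example
    max_iterations=3000,            # assumed: iteration cap from the docs example
):
    """Build an lstmtraining argument list that keeps the lower layers of an
    existing model and trains new layers on top of them."""
    return [
        "lstmtraining",
        "--continue_from", continue_from,    # .lstm extracted from an existing traineddata
        "--traineddata", traineddata,        # starter traineddata for the new training text
        "--append_index", str(append_index),
        "--net_spec", net_spec,
        "--train_listfile", train_listfile,  # list of .lstmf files made from the text and fonts
        "--model_output", model_output,      # prefix for the checkpoints that get written
        "--max_iterations", str(max_iterations),
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = top_layer_training_command(
        continue_from="work/eng.lstm",
        traineddata="work/dotmatrix/dotmatrix.traineddata",
        train_listfile="work/dotmatrix.training_files.txt",
        model_output="work/output/dotmatrix",
    )
    # Print a shell-ready command instead of running it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed command can be pasted into a shell once the paths point at
real files. The .lstm file passed to --continue_from is normally
extracted from an existing traineddata with combine_tessdata -e.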

Regarding the image, I tested different versions by changing it
interactively in IrfanView. Mainly: straighten the image, convert it to
black and white, resize to half, and then half again. I haven't tested the
new traineddata with the original image.
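
As a toy illustration of why the shrinking step may help with the
connected-component concern raised in the original question, the sketch
below binarizes a synthetic dot-matrix "L", halves it twice, and counts
connected components at each size. It is pure Python on nested lists, not
the IrfanView workflow: the synthetic image, the threshold of 128 and the
"any ink pixel in a 2x2 block" shrinking rule are all assumptions chosen
to keep the example small. Straightening is left out.

```python
from collections import deque


def make_dot_matrix_L(size=16, pitch=4, dot=2):
    """Synthetic grayscale image (255 = paper, 0 = ink) of an 'L' drawn with
    separate 2x2 dots on a 4-pixel pitch, so neighbouring dots do not touch."""
    gray = [[255] * size for _ in range(size)]
    cells = [(r, 0) for r in range(4)] + [(3, c) for c in range(1, 4)]
    for r, c in cells:
        for dy in range(dot):
            for dx in range(dot):
                gray[r * pitch + dy][c * pitch + dx] = 0
    return gray


def to_black_and_white(gray, threshold=128):
    """Binarize: 1 = ink (dark pixel), 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def halve(binary):
    """Halve width and height; an output pixel is ink if any pixel in the
    corresponding 2x2 input block is ink (a crude stand-in for resampling)."""
    h, w = len(binary) // 2, len(binary[0]) // 2
    return [
        [
            1 if any(binary[2 * y + dy][2 * x + dx] for dy in (0, 1) for dx in (0, 1)) else 0
            for x in range(w)
        ]
        for y in range(h)
    ]


def count_components(binary):
    """Count 8-connected groups of ink pixels with a breadth-first flood fill."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            count += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


if __name__ == "__main__":
    image = to_black_and_white(make_dot_matrix_L())
    for label in ("original", "half", "quarter"):
        print(label, count_components(image))
        image = halve(image)
```

On this synthetic letter the seven dots stay separate at full and half
size and merge into a single component at quarter size. Real scans will
behave differently depending on dot size and spacing.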

I will email you the training text and fonts used, if you want.

On Sat, 23 Mar 2019, 03:33, <[email protected]> wrote:

> Hi Shree,
>
> Thanks for sending these images and the traineddata file.  I confirmed
> that they worked.  Would you please tell me a little bit more about what
> kind of image processing you used to make the .png images and how you
> created your traineddata file using fine-tuning?
>
> Thank you,
> Ameera
>
> On Friday, March 22, 2019 at 12:11:11 AM UTC-7, [email protected] wrote:
>>
>> I am trying to fine-tune Tesseract for dot-matrix fonts such as that in
>> the picture below.  When the dots are closely spaced together and touch,
>> Tesseract can more or less handle the dot-matrix font with some fine-tuning
>> and image processing.  However, when the dots do not touch, as in the
>> picture below, Tesseract struggles.
>>
>>
>> I read in An Overview of the Tesseract OCR Engine
>> <https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/33418.pdf>
>>  that
>> the first step in Tesseract's processing pipeline is a connected component
>> analysis (second paragraph of Section 2).  Since the letters in a
>> dot-matrix font do not form connected components, I am wondering if
>> Tesseract's connected component analysis may be one reason that Tesseract
>> struggles on the image below.
>>
>>
>> Is there a command to see how Tesseract performs connected component
>> analysis on this image?
>>
>>
>> [image: ex_20.jpg]
>>
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/7a30ee84-cae8-406f-82e1-ca7767e40f20%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/7a30ee84-cae8-406f-82e1-ca7767e40f20%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

