I also changed the image to 300 dpi and ran Tesseract with --dpi 300.
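
A minimal sketch of such an invocation, written as a small Python helper that only builds the argument list and does not run anything. The `--dpi` and `-l` options are real Tesseract command-line options; the file names and the `dotmatrix` language name are placeholders, not the actual image or traineddata names from this thread.

```python
import shlex


def build_tesseract_command(image_path, output_base, lang, dpi=300):
    """Return the argv list for a Tesseract run with an explicit DPI.

    All argument values are supplied by the caller; nothing here checks
    that the image or the traineddata file actually exists.
    """
    return [
        "tesseract",
        image_path,      # input image (placeholder name below)
        output_base,     # Tesseract appends .txt to this base name
        "-l", lang,      # name of the traineddata to use (placeholder below)
        "--dpi", str(dpi),
    ]


# Placeholder names, chosen only for illustration.
cmd = build_tesseract_command("ex_20_300dpi.png", "out", "dotmatrix")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the placeholders are replaced with real paths.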

On Sat, 23 Mar 2019, 07:43 Shree Devi Kumar, <[email protected]> wrote:

> Hi Ameera,
>
> Please do check with other images too as I tested with only one image that
> you sent.
>
> I initially tried fine tuning (both the impact and plus approaches), but
> those did not give accurate results for the 2nd line.
>
> Then I tried replacing the top layer, using new training text all in UPPER
> case, with many lines in the same format as the image you sent. I used just a
> couple of fonts that looked similar to the image.
>
> Regarding the image, I tested different versions by changing it
> interactively in IrfanView. The main steps were: straighten the image,
> convert to black and white, resize to half and then half again. I haven't
> tested the new traineddata with the original image.
>
> I will email you the training text and fonts used, if you want.
>
> On Sat, 23 Mar 2019, 03:33 , <[email protected]> wrote:
>
>> Hi Shree,
>>
>> Thanks for sending these images and the traineddata file.  I confirmed
>> that they worked.  Would you please tell me a little bit more about what
>> kind of image processing you used to make the .png images and how you
>> created your traineddata file using fine-tuning?
>>
>> Thank you,
>> Ameera
>>
>> On Friday, March 22, 2019 at 12:11:11 AM UTC-7, [email protected]
>> wrote:
>>>
>>> I am trying to fine-tune Tesseract for dot-matrix fonts such as that in
>>> the picture below.  When the dots are closely spaced together and touch,
>>> Tesseract can more or less handle the dot-matrix font with some fine-tuning
>>> and image processing.  However, when the dots do not touch, as in the
>>> picture below, Tesseract struggles.
>>>
>>>
>>> I read in An Overview of the Tesseract OCR Engine
>>> <https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/33418.pdf>
>>> that the first step in Tesseract's processing pipeline is a connected
>>> component analysis (second paragraph of Section 2).  Since the letters in a
>>> dot-matrix font do not form connected components, I am wondering if
>>> Tesseract's connected component analysis may be one reason that Tesseract
>>> struggles on the image below.
>>>
>>>
>>> Is there a command to see how Tesseract performs connected component
>>> analysis on this image?
>>>
>>>
>>> [image: ex_20.jpg]
>>>
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/7a30ee84-cae8-406f-82e1-ca7767e40f20%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/7a30ee84-cae8-406f-82e1-ca7767e40f20%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
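
The preprocessing described in the quoted message (convert to black and white, then resize to half and half again) can be sketched in plain Python on a grayscale image stored as a list of rows. This is only an illustration under assumptions: the fixed threshold of 128 and the 2x2 averaging are guesses, since the message does not say which settings or resampling filter were used in IrfanView, and the straightening step is left out entirely.

```python
def to_black_and_white(gray, threshold=128):
    """Map each pixel to 0 (black) or 255 (white) using a fixed threshold."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def halve(gray):
    """Downscale by 2 in each direction by averaging 2x2 blocks.

    An odd trailing row or column is dropped.
    """
    rows = len(gray) // 2
    cols = len(gray[0]) // 2
    return [
        [
            (gray[2 * y][2 * x] + gray[2 * y][2 * x + 1]
             + gray[2 * y + 1][2 * x] + gray[2 * y + 1][2 * x + 1]) // 4
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def preprocess(gray, threshold=128):
    """Binarize, then halve twice (quarter size in each direction)."""
    return halve(halve(to_black_and_white(gray, threshold)))


# Tiny 4x4 test pattern: isolated dark dots on a light background.
sample = [
    [10, 240, 10, 240],
    [240, 240, 240, 240],
    [10, 240, 10, 240],
    [240, 240, 240, 240],
]
print(preprocess(sample))
```

With averaging, the isolated dots in the test pattern blend into a uniform gray after downscaling, which hints at how shrinking might merge separated dots into continuous strokes. Whether that is what happened with the real image is not something the thread confirms.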

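To illustrate the connected component concern raised in the original question, here is a small self-contained sketch that counts connected components in a toy bitmap. It is not Tesseract's code and not a way to inspect what Tesseract does internally; the 8-connectivity rule and the two hand-drawn glyphs are assumptions made purely for the example.

```python
from collections import deque


def count_components(grid):
    """Count 8-connected groups of '#' cells in a list of equal-length strings."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "#" or seen[r][c]:
                continue
            # New component found: flood-fill it with a breadth-first search.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == "#" and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# A letter "L" drawn with touching pixels.
SOLID_L = [
    "#....",
    "#....",
    "#....",
    "#....",
    "#####",
]

# The same letter drawn with dots separated by one-pixel gaps.
DOTTED_L = [
    "#........",
    ".........",
    "#........",
    ".........",
    "#........",
    ".........",
    "#.#.#.#.#",
]

print(count_components(SOLID_L), count_components(DOTTED_L))
```

The solid glyph forms a single component, while every dot of the separated glyph is its own component, so any step that treats one component as one character candidate would see many small blobs instead of one letter.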
