Re: [tesseract-ocr] Reading dot matrix characters

ShreeDevi Kumar Wed, 05 Nov 2014 06:16:50 -0800

I have added it as an issue at
https://code.google.com/p/tesseract-ocr/issues/detail?id=1374


Please attach an image there showing the whole alphabet (upper and lower
case) as well as the digits, so we can identify whether there are any other
issues.



ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Nov 5, 2014 at 4:57 PM, ShreeDevi Kumar <[email protected]>
wrote:

> I had asked you to try VietOCR because its Java 4.0 beta uses a newer svn
> version of Tesseract, and I find it easy to test under Windows with the
> GUI, as I can change the image filter settings in it.
>
> You will have to choose the tools based on your platform and other
> requirements. You could use ImageMagick for preprocessing. You may still
> have a problem because of the shape of 'A'.
>
> I am attaching the results that I got using the latest version of Tesseract
> from git (I run it under msys2/mingw-w64 on Windows 8). I tried with the PNG
> and then with a modified TIFF. For the TIFF I used IrfanView: negative
> (invert image), blur, then resize/resample and save as TIFF with LZW
> compression.
>
> Both image files and results are attached.
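For integration without a GUI, the IrfanView steps above could be approximated with ImageMagick followed by a command-line Tesseract call. The following is only a sketch under stated assumptions: the blur sigma, the resize percentage, the file names and the page segmentation mode are placeholders (the values actually used in IrfanView are not given in this thread), and the helper function names are made up for illustration. It only builds the command lines; running them requires ImageMagick and Tesseract to be installed.

```python
import shlex


def build_preprocess_cmd(src, dst, blur_sigma=1.0, scale_percent=200):
    """Build an ImageMagick command: greyscale, invert, blur, resample, LZW TIFF.

    blur_sigma and scale_percent are placeholder values, not the settings
    used in the thread.
    """
    return [
        "convert", src,
        "-colorspace", "Gray",                  # make greyscale
        "-negate",                              # invert the image
        "-blur", "0x{}".format(blur_sigma),     # Gaussian blur, radius chosen automatically
        "-resize", "{}%".format(scale_percent),  # resample to a larger size
        "-compress", "LZW",                     # TIFF with LZW compression
        dst,
    ]


def build_ocr_cmd(image, output_base, lang="eng", psm=7):
    """Build a Tesseract 3.x command line.

    Page segmentation mode 7 treats the image as a single text line, which is
    an assumption about the input. Tesseract 3.x spells the option -psm;
    4.x and later use --psm.
    """
    return ["tesseract", image, output_base, "-l", lang, "-psm", str(psm)]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for cmd in (build_preprocess_cmd("AL.png", "AL_prep.tif"),
                build_ocr_cmd("AL_prep.tif", "AL_out")):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each list can be passed unchanged to `subprocess.run(cmd, check=True)` from an application. Building the commands as lists rather than shell strings avoids quoting problems with file names.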
>
> BTW, I am using the English traineddata and other related files from
> https://code.google.com/p/tesseract-ocr/source/browse/?repo=tessdata
> The file is 20.9 MB.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Wed, Nov 5, 2014 at 2:54 PM, <[email protected]> wrote:
>
>> I tried it with version 3.03 and found no improvement. As you suggested,
>> I inverted the image and tried blurring, but could not improve recognition.
>> VietOCR is not an option, as I have to integrate the recognition into an
>> application and have to do this without a GUI.
>> Could you tell me the steps (and if available, parameters) you used to
>> convert the image to get better results?
>>
>>
>> On Thursday, 23 October 2014 08:55:36 UTC+2, shree wrote:
>>>
>>> Try the .NET wrapper with a newer version of Tesseract.
>>>
>>> Invert the image, smooth/blur it, and make it greyscale. I tried this
>>> with VietOCR.
>>>
>>> The output is 'QBCDEFGHIJKL'.
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Thu, Oct 23, 2014 at 12:07 PM, <[email protected]> wrote:
>>>
>>>> Hello.
>>>>
>>>> I have images that contain characters that are made from individual
>>>> dots, like from a dot matrix printer. I tried to use various operations on
>>>> the images (binarization, edge detection, dilation, ...) and was able to
>>>> make the dots bigger so they are connected 90% of the time. However,
>>>> detection is still very bad.
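To illustrate the dot-connecting step described above, here is a minimal plain-Python sketch of binary dilation. It also adds an erosion pass afterwards (together called morphological closing), which is not mentioned in the original message; it is included here as one possible way to keep strokes from getting thicker while the gaps are filled. The image representation (rows of 0/1 values), the function names and the radius are assumptions for illustration. A real application would more likely use an image library.

```python
def dilate(img, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square window.

    img is a list of rows of 0/1 values, where 1 marks an ink pixel.
    """
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if img[y][x]:
                # Spread every ink pixel over its neighbourhood.
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def erode(img, radius=1):
    """Binary erosion; only neighbours inside the image are considered."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # A pixel stays ink only if its whole in-bounds window is ink.
            out[y][x] = int(all(
                img[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ))
    return out


def close_gaps(img, radius=1):
    """Morphological closing: dilate to join the dots, then erode to thin again."""
    return erode(dilate(img, radius), radius)


if __name__ == "__main__":
    # A vertical stroke printed as three separate dots with one-pixel gaps.
    dots = [
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
    ]
    for row in close_gaps(dots):
        print("".join("#" if v else "." for v in row))
```

The radius has to be at least half the gap between neighbouring dots for them to join, but a radius that is too large will also merge adjacent characters.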
>>>>
>>>> This image contains characters from A to L
>>>>
>>>>
>>>> <https://lh3.googleusercontent.com/-WxgjmUF846M/VEig6eA1FNI/AAAAAAAAAAM/BdQPQPVTUrs/s1600/AL.png>
>>>> my modified version is
>>>>
>>>>
>>>> <https://lh5.googleusercontent.com/-TUZSXsiBHJY/VEihDy5RCUI/AAAAAAAAAAU/HmwIkEemSAY/s1600/AL2.png>
>>>> After recognition, Tesseract (3.02, using the .NET wrapper) gives me
>>>> the characters "FJBEDEFEHIJKL" for the standard English language. Only
>>>> the last 5 characters are right; the rest are wrong. Do you know of a
>>>> way to make recognition better besides training a new font for this
>>>> special case? Tesseract works quite well for other projects I have, and
>>>> I would love a solution that does not rely on a special font if possible.
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/e6b8d4bb-ecc3-463c-9cc7-96f46a63be27%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e6b8d4bb-ecc3-463c-9cc7-96f46a63be27%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>

