I have added it as an issue at https://code.google.com/p/tesseract-ocr/issues/detail?id=1374
Please attach an image there with the whole alphabet (upper and lower case) as well as the digits, so that we can identify whether there are any other issues.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Nov 5, 2014 at 4:57 PM, ShreeDevi Kumar <[email protected]> wrote:

> I had asked you to try vietocr because it uses a newer svn version for the
> java 4.0beta, and I find it easy to test under windows with the gui, as I
> can change the image filter settings in it.
>
> You will have to choose the tools based on your platform and other
> requirements. You could use imagemagick for preprocessing. You may still
> have problems because of the shape of 'A'.
>
> I am attaching the results that I got using the latest version of tesseract
> from git (I run it under msys2/mingw-w64 on windows8). I tried with the png
> and then with a modified tif. In irfanview I applied: negative (invert
> image), blur, resize/resample, then saved as tif with lzw compression.
>
> Both image files and results are attached.
>
> BTW, I am using the english traineddata and other related files from
> https://code.google.com/p/tesseract-ocr/source/browse/?repo=tessdata
> The file is 20.9 MB.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Wed, Nov 5, 2014 at 2:54 PM, <[email protected]> wrote:
>
>> I tried it with version 3.03 and found no improvement. As you suggested,
>> I inverted the image and tried blurring, but could not improve recognition.
>> VietOCR is not an option, as I have to integrate the recognition into an
>> application and have to do this without a GUI.
>> Could you tell me the steps (and, if available, the parameters) you used to
>> convert the image to get better results?
>>
>>
>> On Thursday, 23 October 2014 08:55:36 UTC+2, shree wrote:
>>>
>>> Try .net wrapper with newer version of tesseract.
>>>
>>> Invert the image, smooth/blur it, convert it to greyscale ... I tried
>>> with vietocr.
>>>
>>> The output is 'QBCDEFGHIJKL'.
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Thu, Oct 23, 2014 at 12:07 PM, <[email protected]> wrote:
>>>
>>>> Hello.
>>>>
>>>> I have images that contain characters made from individual dots, like
>>>> those from a dot matrix printer. I tried various operations on the
>>>> images (binarization, edge detection, dilation, ...) and was able to
>>>> make the dots bigger so that they are connected 90% of the time.
>>>> However, detection is still very bad.
>>>>
>>>> This image contains the characters from A to L:
>>>>
>>>> <https://lh3.googleusercontent.com/-WxgjmUF846M/VEig6eA1FNI/AAAAAAAAAAM/BdQPQPVTUrs/s1600/AL.png>
>>>>
>>>> My modified version is:
>>>>
>>>> <https://lh5.googleusercontent.com/-TUZSXsiBHJY/VEihDy5RCUI/AAAAAAAAAAU/HmwIkEemSAY/s1600/AL2.png>
>>>>
>>>> After recognition with the standard English language data, Tesseract
>>>> (3.02, using the .NET wrapper) gives me the characters "FJBEDEFEHIJKL".
>>>> Only the last 5 characters are right, the rest are wrong. Do you know of
>>>> a way to improve recognition besides training a new font for this
>>>> special case? Tesseract works quite well for other projects I have, and
>>>> I would love a solution that does not rely on a special font if possible.
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/e6b8d4bb-ecc3-463c-9cc7-96f46a63be27%40googlegroups.com
>>>> For more options, visit https://groups.google.com/d/optout.
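
For anyone who needs the preprocessing without a GUI: the steps mentioned in the thread (invert, blur, then binarize) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not what irfanview or vietocr do internally. The function names, the box blur, the threshold of 240 and the tiny test image are all assumptions made for this example, and a real pipeline would more likely use imagemagick or an imaging library on the actual files.

```python
# Illustrative sketch only. An image here is a list of rows, each row a list
# of grey values from 0 (black) to 255 (white). All names are hypothetical.

def invert(img):
    """Negative image: 0 becomes 255 and vice versa."""
    return [[255 - p for p in row] for row in img]

def box_blur(img, radius=1):
    """Replace each pixel by the integer mean of its square neighbourhood.

    The neighbourhood is clipped at the image border.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out

def binarize(img, threshold):
    """Pixels darker than the threshold become ink (0), the rest paper (255)."""
    return [[0 if p < threshold else 255 for p in row] for row in img]

def preprocess(img, radius=1, threshold=240):
    # Assumption: the source has light dots on a dark background, so it is
    # inverted first to get dark text on a light background.
    return binarize(box_blur(invert(img), radius), threshold)

if __name__ == "__main__":
    # A 7x5 test image: three separate light dots on a dark background.
    img = [[0] * 7 for _ in range(5)]
    img[2] = [0, 255, 0, 255, 0, 255, 0]
    for row in preprocess(img):
        print("".join("#" if p == 0 else "." for p in row))
```

With this test image the three separate dots merge into one solid horizontal stroke, which is the effect the blur step is meant to have on dot-matrix characters. The stroke also gets thicker, so the radius and threshold would need tuning against real images.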

