On Monday, 22 June 2015 13:56:51 UTC+2, Gunasekaran Velu wrote:
>
> Hi,
>
> I have attached the image as well as the Tesseract OCR result for the attached image screenshot. In the OCR output below some words are missing. How can I improve the image quality so that the missing words are detected?
>
> The attached image has the following resolution:
>
> Horizontal resolution - 204 DPI
> Vertical resolution - 98 DPI
>
> Please help me to improve the OCR accuracy.
>
> Looking forward to your reply.
>
> Regards
> Guna

Hi,
I use tesseract to OCR bank transfer forms scanned on my flatbed scanner. Although I restrict recognition to digits plus a very few special characters, and although I pre-process the scans with ImageMagick (selecting the area to be converted and cutting off noise), I still see a number of residual errors that are hard to explain, and hard to tolerate.

I have now obtained a substantial improvement by taking particular care to align the transfer forms with the edge of the scanner glass. Tesseract appears to be very sensitive to rotational misalignment.

A second, smaller improvement comes from varying the character size. My ImageMagick filter lets me change the size of the characters passed to the OCR step. Normally I use a scaling factor of 200%, but when a transfer form causes problems, the result can often be improved by changing the scaling factor to anything between 100% and 400%. On the other hand, I normally scan at 300 dpi, and going beyond that had no significant impact on the OCR error rate.

Are
- the rotational sensitivity and
- the dependency on the size of the scanned characters
known issues with tesseract, and does tesseract offer a way to deal with these two problems? I have already wondered whether I should extend my ImageMagick pre-processing to detect and correct rotational misalignment.

Juergen

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/89b74054-d1c2-4277-88d1-e161c11fd589%40googlegroups.com.
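Juergen's idea of detecting rotational misalignment during pre-processing can be sketched with the classic projection-profile method: rotate the dark pixels by a range of candidate angles and keep the angle at which the horizontal projection (pixel count per row) is sharpest. This is a minimal pure-Python sketch under stated assumptions: the scan has already been reduced to a grayscale grid (a list of rows with values 0 to 255), the skew lies within plus or minus 5 degrees, and the function names are hypothetical, not part of tesseract or ImageMagick. ImageMagick's own `-deskew` option (for example `convert in.png -deskew 40% out.png`) may be worth trying before writing anything custom.

```python
import math


def dark_pixels(gray_rows, threshold=128):
    """Return (x, y) for every pixel darker than `threshold` (0 = black)."""
    return [
        (x, y)
        for y, row in enumerate(gray_rows)
        for x, value in enumerate(row)
        if value < threshold
    ]


def projection_score(points, angle_deg):
    """Sharpness of the horizontal projection profile after undoing `angle_deg`.

    Each point is rotated by -angle_deg and binned by its new (rounded) row.
    When the angle matches the true skew, the pixels of each text line fall
    into very few rows, so the sum of squared row counts is maximal.
    """
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    rows = {}
    for x, y in points:
        row = round(y * cos_t - x * sin_t)  # y-coordinate after rotating by -theta
        rows[row] = rows.get(row, 0) + 1
    return sum(count * count for count in rows.values())


def estimate_skew(points, max_angle=5.0, step=0.1):
    """Brute-force search for the skew angle in [-max_angle, +max_angle] degrees.

    Returns the angle a such that text lines follow y ~ x * tan(a) + c
    (image coordinates, y pointing down). Ties go to the smaller |a|.
    """
    n = round(max_angle / step)
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: (projection_score(points, a), -abs(a)))
```

The returned angle could then be undone with ImageMagick's `-rotate` by passing the negated value; ImageMagick rotates clockwise for positive angles, but the sign convention is worth checking on a test scan. Scoring by the sum of squared row counts is one common choice; a smaller `step` trades speed for precision.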
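The manual retry with different scaling factors could be automated along these lines. This is a hypothetical sketch: `ocr` stands for whatever wrapper runs the existing ImageMagick resize plus tesseract at a given percentage and returns the recognized text together with a mean confidence (for example derived from the per-word confidences tesseract can report). The selection rule, rejecting results that contain characters outside the expected set and then preferring the highest confidence, is an assumption and not something the post describes.

```python
def pick_best_scale(ocr, scales=(200, 100, 150, 300, 400), allowed="0123456789"):
    """Try each scale factor and keep the most plausible OCR result.

    `ocr(scale_percent)` is a hypothetical callable returning
    (text, mean_confidence) for the field rescaled to `scale_percent`.
    A result is only considered if it is non-empty and every non-space
    character is in `allowed`. Returns (scale, text, confidence) for the
    valid result with the highest confidence, or None if none is valid.
    """
    best = None
    for scale in scales:
        text, confidence = ocr(scale)
        if not text.strip():
            continue  # nothing recognized at this scale
        if any(ch not in allowed for ch in text if not ch.isspace()):
            continue  # e.g. a digit misread as a letter
        # Strict '>' keeps the earlier scale on ties, so 200% wins ties.
        if best is None or confidence > best[2]:
            best = (scale, text, confidence)
    return best
```

Listing 200% first keeps the usual setting whenever no other scale does strictly better; `allowed` would need to include the few special characters the transfer forms actually contain.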

