On Monday, 22 June 2015 13:56:51 UTC+2, Gunasekaran Velu wrote:
>
>
>
> HI
>
> I have attached the image screenshot as well as the Tesseract OCR result 
> for it. Some words are missing from the OCR output. How can I improve the 
> image quality so that the missing words are detected?
>
> The DPI values of the attached image are:
>
> Horizontal resolution - 204 DPI
> Vertical resolution    -    98 DPI
>
> Please help me to improve the OCR accuracy.
>
> Looking forward to your reply.
>
> Regards
> Guna
>
>
>
Hi,

I use Tesseract to do OCR on bank transfer forms scanned on my 
flatbed scanner. Although I restrict recognition to digits plus 
a very few special characters, and although I pre-process with 
ImageMagick (select the area to be converted, cut off noise), I still 
observed a number of residual errors that are hard to explain - and to tolerate.

I have now obtained substantial improvements by taking particular care when 
aligning the transfer forms with the edge of my scanner. Tesseract appears 
to be very sensitive to rotational misalignment.
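To illustrate what "rotational misalignment" means in measurable terms, here is a minimal sketch of a projection-profile skew estimate. It is not what Tesseract or ImageMagick do internally; the function names, the search range of +/-5 degrees and the 0.25 degree step are my own assumptions. The idea: rotate the dark pixels by each candidate angle and keep the angle at which they pile up most sharply into horizontal rows.

```python
import math

def row_profile_score(points, angle_deg):
    """Rotate foreground pixels by angle_deg and score how sharply they fall into rows."""
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    counts = {}
    for x, y in points:
        # Row index of the pixel after rotating it by theta around the origin
        row = int(round(x * sin_t + y * cos_t))
        counts[row] = counts.get(row, 0) + 1
    # The sum of squared row counts is large when the ink concentrates in few rows
    return sum(c * c for c in counts.values())

def estimate_skew(points, max_angle=5.0, step=0.25):
    """Return the candidate rotation (degrees) that best flattens the text lines.

    points: iterable of (x, y) coordinates of dark pixels (hypothetical input,
    e.g. obtained by thresholding the scan).
    """
    points = list(points)
    best_angle, best_score = 0.0, -1
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        score = row_profile_score(points, angle)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The returned value is the rotation to apply in this sketch's own coordinate convention; a form that comes back with an estimate far from 0 would be a candidate for re-scanning or for a rotation step before OCR.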

A second, smaller improvement can be made by adjusting the character 
size. My ImageMagick filter lets me change the size of the characters 
submitted to OCR. Normally I use a scaling factor of 200%, but when a 
transfer form presents problems, the result can often be improved by 
changing the scaling factor to something between 100% and 400%.
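The retry procedure described above could be sketched roughly as follows. This is only an illustration: `run_ocr` is a hypothetical hook that scales the image by the given percentage and returns the recognized text, `looks_like_amount` is an assumed plausibility check for a digits-only field, and the order of the fallback factors is arbitrary.

```python
import re

def looks_like_amount(text):
    """Assumed validity check: digits with an optional two-digit decimal part."""
    return re.fullmatch(r"\d+([.,]\d{2})?", text.strip()) is not None

def ocr_with_rescaling(run_ocr, is_plausible, factors=(200, 100, 150, 300, 400)):
    """Try each scaling factor (in percent) in order.

    run_ocr(factor) -> str is supplied by the caller (hypothetical hook).
    Returns (factor, text) for the first plausible result, or (None, None)
    if no factor produced one.
    """
    for factor in factors:
        text = run_ocr(factor)
        if is_plausible(text):
            return factor, text
    return None, None
```

Starting with the usual 200% keeps the common case cheap; the other factors are only tried when the first result fails the check.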

On the other hand, I normally scan at 300 dpi; going beyond that did not 
have any significant impact on the error rate of the OCR result.

Are

   - the rotational sensitivity,
   - dependency on the size of the scanned characters

known issues with Tesseract, and does Tesseract offer a way to deal with 
these two problems? I have already wondered whether I should extend my 
ImageMagick pre-processing to detect and correct rotational 
misalignment.
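For reference, such an extended pre-processing step might look like the sketch below. It only builds the command lines; the file names and helper names are placeholders of mine. It assumes ImageMagick's `convert` with its `-deskew` and `-resize` operators (on ImageMagick 7 the program is called `magick`) and a Tesseract build that accepts `-c tessedit_char_whitelist=...` on the command line. The 40% deskew threshold is the value the ImageMagick documentation suggests as a starting point; I have not verified it on transfer forms.

```python
def build_preprocess_command(src, dst, scale_percent=200, deskew_threshold=40):
    """Build an ImageMagick command that straightens the scan, then rescales it."""
    return [
        "convert", src,
        "-deskew", f"{deskew_threshold}%",   # let ImageMagick estimate and undo the skew
        "-resize", f"{scale_percent}%",      # scaling factor for the character size
        dst,
    ]

def build_tesseract_command(image, output_base, whitelist="0123456789"):
    """Build a Tesseract command restricted to the given characters."""
    return [
        "tesseract", image, output_base,
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once both tools are installed.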

Juergen

