The white areas within the characters in the PNG version are likely to confuse tesseract about the character shapes. Perhaps you can do something to improve that? I think someone has posted methods for dealing with that recently. --Sven
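One common way to fill small white areas inside character strokes is a morphological closing: grow the ink by one pixel, then shrink it back. The sketch below is only an illustration of that idea and is not necessarily the method referred to above. It assumes the page has already been binarized into a grid where 1 is ink and 0 is paper; the function names are made up for this example.

```python
from typing import List

Grid = List[List[int]]  # 1 = ink (black), 0 = paper (white)


def dilate(img: Grid) -> Grid:
    """Grow ink by one pixel: a pixel becomes ink if any 3x3 neighbour is ink."""
    h, w = len(img), len(img[0])
    return [
        [
            1 if any(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ) else 0
            for x in range(w)
        ]
        for y in range(h)
    ]


def erode(img: Grid) -> Grid:
    """Shrink ink by one pixel: a pixel stays ink only if its whole 3x3
    neighbourhood is ink. Pixels outside the image count as paper."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(img[ny][nx] for ny in (y - 1, y, y + 1) for nx in (x - 1, x, x + 1)):
                out[y][x] = 1
    return out


def close_holes(img: Grid) -> Grid:
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img))


# A 5x5 blob of ink with a one-pixel white hole in the middle.
page = [[0] * 9 for _ in range(9)]
for y in range(2, 7):
    for x in range(2, 7):
        page[y][x] = 1
page[4][4] = 0

closed = close_holes(page)
print(closed[4][4])  # the hole is now ink
```

A single pass only fills holes up to about two pixels wide. Larger holes need several dilations followed by the same number of erosions, at the risk of merging strokes that sit close together.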
On Fri, Aug 23, 2013 at 9:08 AM, Shree Devi Kumar <[email protected]> wrote:

> I want to OCR a Sanskrit book available as a PDF.
>
> I used GSview to save all pages as PNG and then used Scan Tailor to
> deskew the images, which saved them as TIFFs. Then I used IrfanView to
> apply blur and median filters, as the text is very grainy in the
> original, and also resized the page to a smaller size.
>
> The pre-processed image as above gives a better result than the original.
>
> I would like to know if there is a simpler/better method to pre-process
> the image. The PDF is 500+ pages.
>
> I am attaching a single page from the PDF and the processed image file.
>
> Thanks,
> Shree

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
---
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
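For the grainy text described in the quoted message, the median filter step can be illustrated in a few lines. This is a minimal sketch of what such a filter does, not a reproduction of the IrfanView filter or its settings. It assumes a grayscale page held as a grid of 0-255 values; `median_filter` and its `radius` parameter are names invented for this example.

```python
from statistics import median_low
from typing import List

Gray = List[List[int]]  # 0 = black, 255 = white


def median_filter(img: Gray, radius: int = 1) -> Gray:
    """Replace each pixel with the median of its (2*radius+1)^2 neighbourhood.

    Windows are clipped at the image border. median_low is used so the
    result is always a value that actually occurs in the window.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[ny][nx]
                for ny in range(max(0, y - radius), min(h, y + radius + 1))
                for nx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(median_low(window))
        out.append(row)
    return out


# An isolated dark speck on white paper is removed...
speckled = [[255] * 5 for _ in range(5)]
speckled[2][2] = 0
cleaned = median_filter(speckled)

# ...while a straight stroke edge (black left, white right) is kept sharp.
edge = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
kept = median_filter(edge)
print(kept == edge)
```

Pure Python is far too slow for 500+ full-resolution pages; the loop is only meant to show why a median filter removes grain while preserving stroke edges. For a real batch run, an imaging library is the practical choice. Pillow, for example, provides `ImageFilter.MedianFilter`.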

