Thanks, Sven. Yes, that's the kind of improvement I am looking for. I have read that ImageMagick is helpful in fixing such images. I'll give it a try.
I was hoping that someone in the group would mention the settings they used to fix similar grainy images.

Shree

Shree Devi Kumar
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Sat, Aug 24, 2013 at 12:08 AM, Sven Pedersen <[email protected]> wrote:

> The white areas within the characters in the PNG version are likely to
> confuse Tesseract about the character shapes. Perhaps you can do something
> to improve that? I think someone has posted methods for dealing with that
> recently.
> --Sven
>
>
> On Fri, Aug 23, 2013 at 9:08 AM, Shree Devi Kumar <[email protected]> wrote:
>
>> I want to OCR a Sanskrit book available as a PDF.
>>
>> I used GSview to save all pages as PNG and then used Scan Tailor to
>> deskew the images, which saved them as TIFFs. Then I used IrfanView to
>> apply blur and median filters, because the text is very grainy in the
>> original, and also resized the pages to a smaller size.
>>
>> The image pre-processed this way gives better results than the original.
>>
>> I would like to know if there is a simpler or better method to
>> pre-process the images. The PDF is 500+ pages.
>>
>> I am attaching a single page from the PDF and the processed image file.
>>
>> Thanks,
>> Shree
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>
> --
> “All that is gold does not glitter,
> not all those who wander are lost;
> the old that is strong does not wither,
> deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
> a light from the shadows shall spring;
> renewed shall be blade that was broken,
> the crownless again shall be king.”
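Shree's pipeline relies on a median filter to suppress the grain. As a rough illustration of what such a filter does (this is not IrfanView's or ImageMagick's implementation), here is a minimal pure-Python 3x3 median filter over a grayscale page held as a list of rows. The function name, the 3x3 window and the edge-replication rule are assumptions made for this sketch; for a 500+ page book, a tool such as ImageMagick, which also offers median filtering, would be the practical choice.

```python
def median_filter_3x3(img):
    """Replace each pixel by the median of its 3x3 neighbourhood (hypothetical helper).

    img is a list of rows of grayscale values (0 = black, 255 = white).
    At the borders, coordinates are clamped (edge pixels are repeated),
    so every window holds exactly nine values.
    """
    h, w = len(img), len(img[0])

    def clamp(v, hi):
        # Keep a coordinate inside [0, hi].
        return min(max(v, 0), hi)

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                img[clamp(y + dy, h - 1)][clamp(x + dx, w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            row.append(window[4])  # middle of the nine sorted values
        out.append(row)
    return out
```

On a white page, an isolated black speck (a single grain) is replaced by white, while a horizontal stroke three pixels thick passes through unchanged, which is why a median filter removes grain with less damage to the letters than a plain blur.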


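Sven's point is that white specks inside the strokes can confuse Tesseract about the character shapes. One standard way to fill small gaps like that is a morphological closing (dilate the ink, then erode it) on a binarized page. The sketch below is a hypothetical pure-Python illustration on a True/False ink mask with a 3x3 square structuring element; the function names, the fixed binarization threshold of 128 and the element size are assumptions for illustration, not settings anyone in the thread reported.

```python
def binarize(gray, threshold=128):
    """Mark dark pixels (value < threshold) as ink (True); the fixed threshold is an assumption."""
    return [[v < threshold for v in row] for row in gray]


def _ink(mask, y, x):
    # Pixels outside the canvas count as background (no ink).
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and bool(mask[y][x])


def _neighbourhood(mask, y, x):
    # The 3x3 square structuring element centred on (y, x).
    return (_ink(mask, y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))


def dilate(mask):
    """A pixel becomes ink if any pixel in its 3x3 neighbourhood is ink."""
    return [[any(_neighbourhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def erode(mask):
    """A pixel stays ink only if every pixel in its 3x3 neighbourhood is ink."""
    return [[all(_neighbourhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def close_3x3(mask):
    """Morphological closing (dilate, then erode) with a 3x3 square element."""
    h, w = len(mask), len(mask[0])
    # Pad with one ring of background so ink touching the image edge behaves
    # as it would on an unbounded white page; crop the ring off afterwards.
    blank = [False] * (w + 2)
    padded = ([blank[:]]
              + [[False] + [bool(v) for v in row] + [False] for row in mask]
              + [blank[:]])
    closed = erode(dilate(padded))
    return [row[1:-1] for row in closed[1:-1]]
```

A typical call would be `close_3x3(binarize(page))`. With a 3x3 element, white gaps up to two pixels wide inside the ink are filled, while wider gaps, such as the space between separate letters, stay open; a larger element fills bigger holes but also risks joining neighbouring characters.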