Hi,

OCR output quality also depends on the traineddata file for the language of the input image. If you have a good traineddata file for Sanskrit, you can use FreeOCR 4.2 (http://www.paperfile.net/): go to Settings --> Open language folder and paste the file there. FreeOCR 4.2 can OCR an entire PDF book in one click: load the book with 'Open PDF', then choose OCR --> OCR all pages. Try the original book first; if the result is not satisfactory, convert the cleaned images back into a PDF book and try again.

I also need a Sanskrit traineddata file, if you can spare it.

Wishing success,
MNS Rao

On Friday, 23 August 2013 18:38:44 UTC+5:30, shree wrote:
> I want to OCR a Sanskrit book available as a PDF.
>
> I used gsview to save all pages as PNG and then used scantailor to deskew
> the images, which saved them as TIFs. Then I used irfanview to apply blur
> and median filters, as the text is very grainy in the original, and also
> resized the pages to a smaller size.
>
> The pre-processed image as above is giving a better result than the original.
>
> I would like to know if there is a simpler/better method to pre-process
> the images. The PDF is 500+ pages.
>
> I am attaching a single page from the PDF and the processed image file.
>
> Thanks,
> Shree
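
For 500+ pages, the manual irfanview steps described in the quoted message (median filter, blur, resize) could probably be batch-applied with a short script instead. The following is only a rough sketch, assuming Python with the Pillow library is installed; the function names, the folder names, and the default values for `scale`, `median_size` and `blur_radius` are my own placeholders, not settings taken from the original workflow, so they would need tuning against a sample page.

```python
from pathlib import Path

from PIL import Image, ImageFilter


def preprocess_page(img, scale=0.5, median_size=3, blur_radius=1.0):
    """Return a grayscale, denoised, downscaled copy of one page image.

    All default values are illustrative guesses, not recommended settings.
    """
    gray = img.convert("L")
    # The median filter removes isolated specks (grain) while keeping edges.
    denoised = gray.filter(ImageFilter.MedianFilter(size=median_size))
    # A light blur smooths the remaining roughness in the strokes.
    smoothed = denoised.filter(ImageFilter.GaussianBlur(radius=blur_radius))
    new_size = (
        max(1, round(smoothed.width * scale)),
        max(1, round(smoothed.height * scale)),
    )
    return smoothed.resize(new_size, Image.LANCZOS)


def preprocess_folder(src_dir, dst_dir, pattern="*.tif", **kwargs):
    """Apply preprocess_page to every matching file; return the output paths."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob(pattern)):
        with Image.open(path) as img:
            out = preprocess_page(img, **kwargs)
        out_path = dst / (path.stem + ".png")
        out.save(out_path)
        written.append(out_path)
    return written
```

Calling, for example, `preprocess_folder("deskewed", "cleaned")` (both folder names hypothetical) would process the TIFs saved by scantailor and write PNGs that can then be passed to the OCR step or recombined into a PDF.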

