Hi,
The OCR output also depends on the traineddata file for the language
of the input image. If you have a good traineddata file for Sanskrit, you
can use FreeOCR 4.2 (http://www.paperfile.net/): go to settings --> open
language folder and paste the file there. FreeOCR 4.2 can OCR an entire
PDF book in one click: load it with 'open PDF', then choose OCR --> ocr all pages.
Try the original book first; if the result is not satisfactory, convert the
cleaned images back into a PDF book and OCR that.
I also need a Sanskrit traineddata file, if you can spare one.
Wishing success,
MNS Rao

On Friday, 23 August 2013 18:38:44 UTC+5:30, shree wrote:
>
> I want to OCR a Sanskrit book available as a PDF.
>
> I used GSview to save all pages as PNG and
> then used Scan Tailor to deskew the images, which saved them as TIFs.
> Then I used IrfanView to apply blur and median filters, because the text is very
> grainy in the original, and also resized the pages to a smaller size.
>
> The image pre-processed as above gives a better result than the original.
>
> I would like to know if there is a simpler or better method to pre-process
> the images; the PDF is 500+ pages.
>
> I am attaching a single page from the pdf and the processed image file.
>
> Thanks,
> Shree
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en