The white areas within the characters in the PNG version are likely to
confuse tesseract about the character shapes. Perhaps you can do something
to improve that? I think someone has posted methods for dealing with that
recently.
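One common way to deal with white specks inside character strokes is to fill small enclosed background regions before handing the page to tesseract. The sketch below is not the method mentioned above (that post isn't identified here). It assumes the page is already binarised and held as a list of rows with 1 for ink and 0 for paper; the function name `fill_small_holes` and the `max_area` threshold are hypothetical. Only white regions that do not reach the image edge and contain at most `max_area` pixels are filled, so the intended openings in letters survive as long as the threshold stays below their size.

```python
from collections import deque


def fill_small_holes(img, max_area):
    """Fill small white regions fully enclosed by ink (hypothetical helper).

    img: list of rows, 1 = ink (black), 0 = paper (white).
    A white 4-connected component that does not touch the image border and
    has at most max_area pixels is treated as a speckle and set to ink.
    Returns a new image; the input is not modified.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 0 or seen[y][x]:
                continue
            # Breadth-first search over this white component.
            component = []
            touches_border = False
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                if cy in (0, h - 1) or cx in (0, w - 1):
                    touches_border = True
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] == 0:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Enclosed and small enough: treat as noise inside a stroke.
            if not touches_border and len(component) <= max_area:
                for cy, cx in component:
                    out[cy][cx] = 1
    return out
```

On real scans `max_area` has to be tuned to the scan resolution: a value that is too large will also close the counters of letters, which hurts recognition as much as the specks do.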
--Sven


On Fri, Aug 23, 2013 at 9:08 AM, Shree Devi Kumar <[email protected]> wrote:

> I want to OCR a Sanskrit book that is available as a PDF.
>
> I used GSview to save all pages as PNG and
> then used ScanTailor to deskew the images, which saved them as TIFFs.
> Then I used IrfanView to apply blur and median filters, as the text is very
> grainy in the original, and also resized the pages to a smaller size.
>
> The image pre-processed as above gives better results than the original.
>
> I would like to know if there is a simpler or better method to pre-process
> the images. The PDF is 500+ pages.
>
> I am attaching a single page from the PDF and the processed image file.
>
> Thanks,
> Shree
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to
> [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
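For reference, this is what the median-filter step does, written out in plain Python for a grayscale image stored as a list of rows (values 0 to 255). It is an illustrative stand-in for IrfanView's filter, not a description of how IrfanView implements it; `median_filter` and `radius` are hypothetical names. For a 500-page book an image library is the practical choice; Pillow, for instance, provides `ImageFilter.MedianFilter(size=3)`.

```python
from statistics import median_low


def median_filter(img, radius=1):
    """Apply a (2*radius+1) x (2*radius+1) median filter (hypothetical helper).

    img: list of rows of ints (0-255). At the borders the window is clipped
    to the pixels that lie inside the image. For an even number of values,
    median_low picks the lower of the two middle values, so output stays int.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the neighbourhood around (y, x), clipped at the edges.
            window = [
                img[ny][nx]
                for ny in range(max(0, y - radius), min(h, y + radius + 1))
                for nx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(median_low(window))
        out.append(row)
    return out
```

Unlike a blur, which averages an isolated dark or light pixel into its neighbours, the median replaces it with a value that already occurs in the neighbourhood, and it smears stroke edges less; that is why it tends to help with grainy scans.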

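To avoid driving GUI tools by hand for 500+ pages, the rasterise-then-OCR steps can be scripted. The sketch below assumes poppler's `pdftoppm` and the `tesseract` command-line tool are installed and on `PATH`, and that Sanskrit traineddata is available under the language code `san`; the 300 dpi default and the helper names `page_commands` and `run_all` are assumptions, not anything from this thread. Deskewing and filtering would slot in between the two commands for each page.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def page_commands(pdf, out_dir, page, dpi=300, lang="san"):
    """Build the commands that rasterise one PDF page and OCR it (hypothetical helper).

    Assumes pdftoppm (poppler) and tesseract are on PATH and that
    traineddata for `lang` is installed ("san" is an assumption).
    """
    stem = str(Path(out_dir) / f"page-{page:04d}")
    # -f/-l select a single page; -singlefile writes exactly <stem>.png.
    rasterise = ["pdftoppm", "-f", str(page), "-l", str(page), "-r", str(dpi),
                 "-gray", "-png", "-singlefile", str(pdf), stem]
    # tesseract <image> <outputbase> writes its text to <outputbase>.txt.
    ocr = ["tesseract", stem + ".png", stem, "-l", lang]
    return [rasterise, ocr]


def run_all(pdf, out_dir, page_count, **kwargs):
    """Run both commands for every page, stopping on the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for page in range(1, page_count + 1):
        for cmd in page_commands(pdf, out_dir, page, **kwargs):
            print(shlex.join(cmd), file=sys.stderr)  # log what is being run
            subprocess.run(cmd, check=True)
```

Calling `run_all("book.pdf", "out", 500)`, with 500 standing in for the book's real page count, would leave one `.png` and one `.txt` per page in `out/`.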


-- 
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”
