Thanks, Sven.

Yes, that's the kind of improvement I am looking for. I have read that
ImageMagick is helpful for fixing images like these. I'll give it a try.

I was hoping that someone in the group would mention the settings they used
to fix similar grainy images.
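
To make the median-filter idea concrete, here is a rough sketch in plain
Python of how it fills small white specks inside a dark stroke. It is only
an illustration, not the settings I used in IrfanView and not an ImageMagick
recipe; the names median_filter and binarize, the 3x3 window and the
threshold of 128 are all assumptions picked for the example.

```python
from statistics import median


def median_filter(pixels, radius=1):
    """Return a copy of a 2D grayscale grid in which each pixel is replaced by
    the median of its neighbourhood (window clipped at the image borders)."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                pixels[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def binarize(pixels, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if p < threshold else 255 for p in row] for row in pixels]


# A 5x5 black patch (part of a character stroke) with one white speck of grain.
stroke = [[0] * 5 for _ in range(5)]
stroke[2][2] = 255

cleaned = binarize(median_filter(stroke))
print(cleaned[2][2])  # the isolated white speck is gone
```

A larger radius removes bigger specks but also starts to erode thin strokes,
so the window size would have to be tuned against the actual scans.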

Shree

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Sat, Aug 24, 2013 at 12:08 AM, Sven Pedersen <[email protected]> wrote:

> The white areas within the characters in the PNG version are likely to
> confuse tesseract about the character shapes. Perhaps you can do something
> to improve that? I think someone has posted methods for dealing with that
> recently.
> --Sven
>
>
> On Fri, Aug 23, 2013 at 9:08 AM, Shree Devi Kumar <[email protected]> wrote:
>
>> I want to OCR a Sanskrit book available as a PDF.
>>
>> I used GSview to save all pages as PNG files and then used Scan Tailor
>> to deskew the images, which saved them as TIFFs. Then I used IrfanView to
>> apply blur and median filters, as the text is very grainy in the
>> original, and also resized the pages to a smaller size.
>>
>> The image pre-processed as above gives better results than the original.
>>
>> I would like to know if there is a simpler or better method to pre-process
>> the images. The PDF is 500+ pages.
>>
>> I am attaching a single page from the PDF and the processed image file.
>>
>> Thanks,
>> Shree
>>
>> --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
>
> --
> ``All that is gold does not glitter,
>   not all those who wander are lost;
> the old that is strong does not wither,
>   deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
>   a light from the shadows shall spring;
> renewed shall be blade that was broken,
>   the crownless again shall be king.”
>
