If your whole collection of document images suffers from this kind of
degradation, you have some interesting work ahead of you :)  As you
probably know, LeadTools and AccuSoft are well-known players in this
market, but their products do not meet your semi-automatic requirement
(did you mean user-assisted?). I see two options: develop your own
preprocessor on top of one of these SDKs (though I'm not sure they can
handle such documents well), or develop everything from scratch. There
are plenty of articles on image denoising and cleanup, and this kind of
degradation is common in historical documents. You could start your
search with, for example, "An Adaptive Binarization Technique for Low
Quality Historical Documents" by three Greek researchers (2004), and
similar articles.
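To give a feel for what "adaptive binarization" means, here is a minimal sketch of local thresholding using Sauvola's method. This is a swapped-in, well-known technique, not the algorithm from the paper above. Each pixel gets its own threshold from the mean and standard deviation of a window around it, so uneven illumination or stained paper does not wipe out the text the way a single global threshold can. The function name `sauvola_binarize` and the defaults (`window=15`, `k=0.2`, `r=128`) are assumptions for illustration. The code is pure Python for readability; for real scans you would normally use an optimized library routine such as scikit-image's `threshold_sauvola`.

```python
import math
from typing import List


def sauvola_binarize(img: List[List[int]], window: int = 15,
                     k: float = 0.2, r: float = 128.0) -> List[List[int]]:
    """Hypothetical helper: Sauvola local thresholding (sketch).

    img: grayscale image as a list of rows, values 0..255 (0 = black ink).
    Returns a 0/1 image where 1 = foreground (ink), 0 = background.
    Threshold per pixel: T = mean * (1 + k * (std / r - 1)).
    """
    h = len(img)
    w = len(img[0]) if h else 0
    # Integral images of pixel values and squared values (with a zero
    # border), so each window's sum is obtained in O(1).
    s1 = [[0.0] * (w + 1) for _ in range(h + 1)]
    s2 = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        row_sq = 0.0
        for x in range(w):
            v = img[y][x]
            row_sum += v
            row_sq += v * v
            s1[y + 1][x + 1] = s1[y][x + 1] + row_sum
            s2[y + 1][x + 1] = s2[y][x + 1] + row_sq

    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        # Window is clipped at the image border.
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            total = s1[y1][x1] - s1[y0][x1] - s1[y1][x0] + s1[y0][x0]
            total_sq = s2[y1][x1] - s2[y0][x1] - s2[y1][x0] + s2[y0][x0]
            mean = total / n
            # Clamp tiny negative values caused by floating-point rounding.
            std = math.sqrt(max(total_sq / n - mean * mean, 0.0))
            threshold = mean * (1.0 + k * (std / r - 1.0))
            # Dark pixels at or below the local threshold count as ink.
            out[y][x] = 1 if img[y][x] <= threshold else 0
    return out
```

A usage example: a page whose background brightens from left to right, with one dark typewritten stroke, can be binarized with `sauvola_binarize(page)`. In flat regions the standard deviation is near zero, so the threshold falls to about `(1 - k)` times the local mean and the background stays white. Near strokes the local contrast raises the threshold enough to keep the ink.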

Warm regards,
Dmitri Silaev
www.CustomOCR.com


On Wed, Nov 9, 2011 at 2:49 PM, Esteban Bordón <[email protected]> wrote:
> Hi,
>
> I am sending two examples of the case files.
>
> Thanks,
> Esteban.
>
> 2011/11/8 Sven Pedersen <[email protected]>
>>
>> Hi Esteban,
>> Please show us a sample, even if it is a partial image of a word or two.
>> Thanks,
>> Sven
>>
>> On Mon, Nov 7, 2011 at 6:46 PM, Esteban Bordón <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> I am currently working on digitizing old, deteriorated typewritten
>>> case files.
>>> Tesseract does not work well with these images, and I would like to know
>>> which tools work best for enhancing them. I have a lot of images and I
>>> need a semi-automated tool to perform binarization and segmentation.
>>>
>>> Thanks a lot for any help.
>>> Esteban.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>>
>>
>> --
>> “All that is gold does not glitter,
>>   not all those who wander are lost;
>> the old that is strong does not wither,
>>   deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>>   a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>>   the crownless again shall be king.”
>>
>
>
