Re: Use of scanned documents for text extraction and indexing

Shashi Kant Thu, 26 Feb 2009 09:11:34 -0800

Another project worth investigating is Tesseract.

http://code.google.com/p/tesseract-ocr/

----- Original Message ----
From: Hannes Carl Meyer <[email protected]>
To: [email protected]
Sent: Thursday, February 26, 2009 11:35:14 AM
Subject: Re: Use of scanned documents for text extraction and indexing

Hi Sithu,

there is a project called OCRopus, developed by the DFKI; check out the online demo
here: http://demo.iupr.org/cgi-bin/main.cgi

And also http://sites.google.com/site/ocropus/

Regards

Hannes

[email protected]
http://mimblog.de

On Thu, Feb 26, 2009 at 5:29 PM, Sudarsan, Sithu D. <[email protected]> wrote:

>
> Hi All:
>
> Is there any study or research on using scanned paper documents as
> images (maybe PDF), then applying OCR or another technique to
> extract the text, and on the resulting index quality?
>
>
> Thanks in advance,
> Sithu D Sudarsan
>
> [email protected]
> [email protected]
>