Re: Image to text conversion

2015-04-30 Thread Pei Chen
Sekhar,
I think it would be a great contribution if you get this to work.
There are a few open Jira issues:

   - CTAKES-189 https://issues.apache.org/jira/browse/CTAKES-189
     GSoC: Implement OCR/Tika to standardize text input for cTAKES

   - CTAKES-105 https://issues.apache.org/jira/browse/CTAKES-105
     Add Apache Tika integration


On Thu, Apr 30, 2015 at 1:21 AM, Hari, Sekhar sekhar.h...@cgi.com wrote:

 Thanks. Let me try this; I will let you know if I need any help.

 Cheers,
 Sekhar H.

 -Original Message-
 From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
 Sent: Thursday, April 30, 2015 10:44 AM
 To: dev@ctakes.apache.org; u...@ctakes.apache.org
 Subject: Re: Image to text conversion

 What about using Apache Tika within cTAKES for this? Tika supports OCR
 through Tesseract:

 http://wiki.apache.org/tika/TikaOCR

 Cheers,
 Chris


 ++
 Chris Mattmann, Ph.D.
 Chief Architect
 Instrument Software and Science Data Systems Section (398) NASA Jet
 Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 168-519, Mailstop: 168-527
 Email: chris.a.mattm...@nasa.gov
 WWW:  http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Associate Professor, Computer Science Department University of
 Southern California, Los Angeles, CA 90089 USA
 ++






 -Original Message-
 From: Hari, Sekhar sekhar.h...@cgi.com
 Reply-To: dev@ctakes.apache.org dev@ctakes.apache.org
 Date: Wednesday, April 29, 2015 at 10:11 PM
 To: dev@ctakes.apache.org dev@ctakes.apache.org, 
 u...@ctakes.apache.org u...@ctakes.apache.org
 Subject: Image to text conversion

 Hello All -
 
 I am looking for an OCR capability in cTAKES. The requirement is to
 convert scanned image documents (e.g., scanned handwritten
 prescriptions) into text, and then apply the usual NLP pipeline to
 convert the unstructured text into structured data.
 
 Can cTAKES convert scanned image documents into text? If so, please
 help me understand how by sharing any documentation or videos.
 
 Many thanks,
 Sekhar H.
 




Re: Image to text conversion

2015-04-29 Thread Mattmann, Chris A (3980)
What about using Apache Tika within cTAKES for this? Tika supports
OCR through Tesseract:

http://wiki.apache.org/tika/TikaOCR

Cheers,
Chris


++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
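
For anyone trying the suggestion above, here is a minimal sketch of the
image-to-text step done outside cTAKES. It is not part of cTAKES or of
the Jira issues in this thread. It assumes a tika-app jar from a Tika
release that includes the Tesseract OCR parser, and that both java and
tesseract are on the PATH. The jar path and file names are hypothetical
placeholders.

```python
import shutil
import subprocess
from pathlib import Path


def build_tika_command(tika_jar, image_path):
    """Return the argv list that asks tika-app for plain-text output."""
    # --text is tika-app's option for plain-text extraction.
    return ["java", "-jar", str(tika_jar), "--text", str(image_path)]


def image_to_text(tika_jar, image_path):
    """Run tika-app on a scanned image and return the extracted text."""
    # Assumption: Tika finds Tesseract on the PATH, so check for it up
    # front instead of silently getting empty output.
    for tool in ("java", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    result = subprocess.run(
        build_tika_command(tika_jar, image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    text = image_to_text(Path("tika-app.jar"), Path("prescription.png"))
    Path("prescription.txt").write_text(text, encoding="utf-8")
```

The resulting text file could then be fed to a cTAKES pipeline like any
other plain-text note. OCR quality depends on the scan, so the output
should be checked before relying on it.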










Image to text conversion

2015-04-29 Thread Hari, Sekhar
Hello All -

I am looking for an OCR capability in cTAKES. The requirement is to convert
scanned image documents (e.g., scanned handwritten prescriptions) into text,
and then apply the usual NLP pipeline to convert the unstructured text into
structured data.

Can cTAKES convert scanned image documents into text? If so, please help me
understand how by sharing any documentation or videos.

Many thanks,
Sekhar H.



RE: Image to text conversion

2015-04-29 Thread Hari, Sekhar
Thanks. Let me try this; I will let you know if I need any help.

Cheers,
Sekhar H.
