Re: UIMA- Support for HTML, PDF, Doc files

Jörn Kottmann Thu, 29 Sep 2011 01:54:02 -0700

Hello,

UIMA itself is just a framework to build analysis pipelines. To analyze HTML, PDF or Word documents you need a component which can extract the text from these formats.

You can use Apache Tika together with our Tika integration in the addons project to extract text from various data formats.
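
To illustrate the idea of extracting plain text before analysis, here is a minimal sketch for the HTML case only. It is a stand-in that uses the HTML parser bundled with the JDK, not Tika and not the addons integration; the class name HtmlTextExtractor and the method extractText are hypothetical names chosen for this example. PDF and Word files would still need a real extractor such as Tika.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: strips markup from an HTML string using the JDK's
// built-in HTML parser. This is a stand-in for a real extractor like Tika.
public class HtmlTextExtractor {

    public static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();

        // The callback receives each run of character data between tags.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' '); // separate text runs with a single space
                }
                text.append(data);
            }
        };

        try (Reader reader = new StringReader(html)) {
            // The third argument (true) tells the parser to ignore charset directives.
            new ParserDelegator().parse(reader, callback, true);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>";
        // The resulting plain text is what an analysis pipeline would receive
        // as its document text instead of the raw markup.
        System.out.println(extractText(html));
    }
}
```

The extracted string, rather than the raw file content, is what should be set as the document text of the CAS. With Tika on the classpath, the parseToString method of the org.apache.tika.Tika facade plays the same role and covers many more formats.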

Jörn

On 9/29/11 8:28 AM, abhishek wrote:

Hi,

While reading the UIMA documentation, I found that UIMA supports HTML files.

However, when I run the org.apache.uima.tools.docanalyzer.DocumentAnalyzer class, it fails to understand the text.

Kindly let me know the correct way to read these types of files.
