Hello,

UIMA itself is just a framework for building analysis pipelines. To analyze HTML, PDF, or Word
documents, you need a component that can extract the text from these formats.

You can use Apache Tika together with our Tika integration in the addons project
to extract text from various data formats.
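
To illustrate the extraction step itself, here is a minimal sketch that pulls the
plain text out of an HTML string. Note that it does not use Tika or the addons
integration: it uses the HTML parser that ships with the JDK as a stand-in, and the
class name HtmlTextExtractor and the method extractText are made up for this example.
Tika covers many more formats (PDF, Word, ...) and would normally replace this code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of UIMA or Tika: strips markup from HTML
// so that only the text content is left for the analysis pipeline.
public class HtmlTextExtractor {

    public static String extractText(String html) {
        final StringBuilder text = new StringBuilder();

        // The callback is invoked by the parser for every run of text
        // it finds between tags; the tags themselves are ignored here.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };

        try (Reader reader = new StringReader(html)) {
            // 'true' tells the parser to ignore charset declarations in the
            // document, since the input is already a decoded String.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1>"
                + "<p>Hello <b>UIMA</b> users.</p></body></html>";
        System.out.println(extractText(html));
    }
}
```

The resulting string is what you would then hand to UIMA as the document text
(for example via setDocumentText on the CAS), so that the annotators only ever
see plain text and not the markup.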

Jörn

On 9/29/11 8:28 AM, abhishek wrote:
Hi,
While reading the UIMA documentation, I found that
UIMA supports HTML files.

However, when I run the
org.apache.uima.tools.docanalyzer.DocumentAnalyzer class, it fails to
understand the text.

Kindly let me know the correct way to read these types of files.
 
