Hello,
UIMA itself is just a framework for building analysis pipelines. To analyze
HTML, PDF, or Word documents, you need a component that can extract the
text from these formats first.
You can use Apache Tika, together with our Tika integration in the addons
project, to extract text from various data formats.
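
To make the extraction step concrete, here is a minimal sketch that strips the markup from an HTML string before the text is handed to an analysis pipeline. It is only an illustration: it does not use Tika or the addons integration, but the HTML parser that ships with the JDK (javax.swing.text.html), so that it runs without extra dependencies. The class name HtmlTextExtractor and the method extractText are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of UIMA or Tika: a stand-in for the
// "extract the text first" step, using only the JDK's built-in HTML parser.
public class HtmlTextExtractor {

    /** Returns the text content of an HTML string with the markup removed. */
    public static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();

        // The parser invokes handleText once for each run of character data.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' '); // separate text runs with a single space
                }
                text.append(data);
            }
        };

        // The third argument (true) tells the parser to ignore any charset
        // declared in the document, since we already read from a Reader.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>UIMA</h1><p>Analysis pipelines.</p></body></html>";
        // The extracted plain text is what you would set as the document text
        // of the CAS, instead of the raw HTML.
        System.out.println(extractText(html));
    }
}
```

This only covers HTML. For PDF and Word documents you need a real extraction library, which is where Tika comes in.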
Jörn
On 9/29/11 8:28 AM, abhishek wrote:
Hi,
While reading the UIMA documentation, I found that
UIMA supports HTML files.
However, when I run the
org.apache.uima.tools.docanalyzer.DocumentAnalyzer class, it fails to
understand the text.
Kindly let me know the correct way to read these types of files.