Re: UIMA- Support for HTML, PDF, Doc files

Julien Nioche Thu, 29 Sep 2011 01:52:11 -0700

Hi,

Have a look at the TikaAnnotator in the sandbox. It extracts the text and
metadata from various document formats and converts any available markup
into annotations.
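
To illustrate the idea only (this is not the TikaAnnotator's code, just a
minimal stand-alone sketch in Python using the standard library's
html.parser): the markup is stripped from the text, and each element is kept
as a (tag, begin, end) span over the extracted text, in the same spirit as a
UIMA annotation carrying begin and end offsets into the document text. The
names MarkupToAnnotations and extract are made up for this example.

```python
from html.parser import HTMLParser


class MarkupToAnnotations(HTMLParser):
    """Hypothetical helper: collects plain text plus (tag, begin, end) spans."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []   # plain-text fragments, in document order
        self.length = 0        # running length of the extracted text
        self.open_tags = []    # stack of (tag, begin_offset) still open
        self.annotations = []  # finished (tag, begin, end) spans

    def handle_starttag(self, tag, attrs):
        # Remember where in the extracted text this element starts.
        self.open_tags.append((tag, self.length))

    def handle_endtag(self, tag):
        # Close the most recent matching open tag; ignore stray end tags.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                _, begin = self.open_tags.pop(i)
                self.annotations.append((tag, begin, self.length))
                break

    def handle_data(self, data):
        self.text_parts.append(data)
        self.length += len(data)


def extract(html):
    """Return (plain_text, annotations) for an HTML string."""
    parser = MarkupToAnnotations()
    parser.feed(html)
    parser.close()
    return "".join(parser.text_parts), parser.annotations


text, annotations = extract("<p>UIMA reads <b>plain text</b>.</p>")
for tag, begin, end in annotations:
    print(tag, begin, end, repr(text[begin:end]))
```

An analysis engine downstream would then only see the plain text, while the
former markup stays available as offsets into that text.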


HTH

Julien


On 29 September 2011 07:28, abhishek <[email protected]> wrote:

> Hi,
> While reading the UIMA documentation, I found that UIMA supports HTML
> files.
>
> However, when I run the
> org.apache.uima.tools.docanalyzer.DocumentAnalyzer class, it fails to
> understand the text.
>
> Kindly let me know the correct way to read these types of files.




-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
