Hi,

Have a look at the TikaAnnotator in the UIMA sandbox. It extracts the text and metadata from various document formats and converts any available markup into annotations.
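To illustrate what "converting markup into annotations" means, here is a toy sketch using only the JDK. It is not the TikaAnnotator code and not a real HTML parser (no entity decoding, no script/style handling); the class, method and field names are all made up for the example. The idea is that the plain text becomes the document text, and each element becomes an annotation-like span with begin/end offsets into that text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy illustration only: NOT the TikaAnnotator implementation and not a real HTML parser.
// All names here are hypothetical.
class MarkupToAnnotationsSketch {

    /** Stand-in for an annotation: element name plus [begin, end) offsets into the plain text. */
    static final class Span {
        final String tag;
        final int begin;
        final int end;

        Span(String tag, int begin, int end) {
            this.tag = tag;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return tag + "[" + begin + "," + end + ")";
        }
    }

    /** Plain text with the markup removed, plus one span per closed element. */
    static final class Result {
        final String text;
        final List<Span> spans;

        Result(String text, List<Span> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    static Result convert(String html) {
        StringBuilder text = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        Deque<String> openTags = new ArrayDeque<>();     // names of elements still open
        Deque<Integer> openOffsets = new ArrayDeque<>(); // text offset where each one started

        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            int close = (c == '<') ? html.indexOf('>', i) : -1;
            if (close < 0) {
                // Ordinary character (or a stray '<' with no closing '>'): keep it as text.
                text.append(c);
                i++;
                continue;
            }
            String inside = html.substring(i + 1, close).trim();
            i = close + 1;

            if (inside.startsWith("/")) {
                // Closing tag: unwind to the matching open tag and record its span.
                String name = inside.substring(1).trim().toLowerCase();
                if (openTags.contains(name)) {
                    while (!openTags.isEmpty()) {
                        String tag = openTags.pop();
                        int begin = openOffsets.pop();
                        if (tag.equals(name)) {
                            spans.add(new Span(tag, begin, text.length()));
                            break;
                        }
                    }
                }
            } else if (!inside.isEmpty()
                    && Character.isLetter(inside.charAt(0))
                    && !inside.endsWith("/")) {
                // Opening tag: remember where its content starts in the plain text.
                openTags.push(inside.split("\\s+")[0].toLowerCase());
                openOffsets.push(text.length());
            }
            // Comments, doctypes and self-closing tags are simply dropped.
        }
        return new Result(text.toString(), spans);
    }

    public static void main(String[] args) {
        Result r = convert("<p>Hello <b>UIMA</b></p>");
        System.out.println(r.text);   // Hello UIMA
        System.out.println(r.spans);  // [b[6,10), p[0,10)]
    }
}
```

Downstream annotators then only see "Hello UIMA", while the markup is still available as spans if you need it. For real documents you would rely on Tika rather than anything like this hand-rolled loop.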
HTH

Julien

On 29 September 2011 07:28, abhishek <[email protected]> wrote:

> Hi,
> While reading the documentation of UIMA, I found out that
> UIMA supports HTML files.
>
> However, when I run the
> org.apache.uima.tools.docanalyzer.DocumentAnalyzer class, it fails to
> understand the text.
>
> Kindly let me know the correct way to read these types of files.

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
