Hi,

Basic UIMA is an empty framework, designed to let people independently develop "annotators" and let others pull them together into processing pipelines.

So the basic UIMA framework won't help by itself; you first need to find some annotators for it to run.

You can search the internet for UIMA annotators - there are several repositories of these (some of them are cataloged on the Apache UIMA website, under external links). And you can write your own, to do the particular things you need.
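
To give a rough feel for what "annotators pulled together into a pipeline" means, here is a minimal sketch in plain Python. It does not use the UIMA API (real UIMA annotators are typically Java classes); the names Annotation, Document, regex_annotator and run_pipeline, and the two regular expressions, are all made up for this illustration. The idea is only that each annotator adds typed spans over the shared document text, and a pipeline runs the annotators one after another.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """A typed span over the document text (hypothetical, not a UIMA class)."""
    kind: str
    begin: int
    end: int


@dataclass
class Document:
    """Holds the text plus every annotation added so far."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        # Return the slice of text that the annotation spans.
        return self.text[ann.begin:ann.end]


def regex_annotator(kind, pattern):
    """Build an annotator that marks every regex match as `kind`."""
    compiled = re.compile(pattern)

    def annotate(doc):
        for m in compiled.finditer(doc.text):
            doc.annotations.append(Annotation(kind, m.start(), m.end()))

    return annotate


def run_pipeline(text, annotators):
    """Run each annotator in order over one shared document."""
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)
    return doc


# Two illustrative annotators; the patterns are assumptions, not a
# complete treatment of currency or date formats.
money = regex_annotator("Money", r"(?:USD|EUR|\$)\s?\d[\d,]*(?:\.\d+)?")
date = regex_annotator("Date", r"\b\d{4}-\d{2}-\d{2}\b")

doc = run_pipeline("Paid USD 1,250.00 on 2012-05-18.", [money, date])
for ann in doc.annotations:
    print(ann.kind, repr(doc.covered_text(ann)))
```

Each annotator here is independent of the others, which is the point: someone else could contribute a third one and it would slot into the same list without changes to the first two.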

The Apache UIMA project also comes with a few annotators of its own (click on "annotators", or use this direct link: http://uima.apache.org/sandbox.html#uima-addons-annotators ), and other Apache projects (such as OpenNLP) make their functionality available as UIMA annotators.

HTH   -Marshall

On 5/18/2012 3:06 AM, Mansour wrote:
Hello all,
I am new to this framework, and to this topic in general.
My requirement is to build a component that can take unstructured html
documents, and extract data. Something like this can be built with a regular
html parser.
However, due to the number of different html document types, building something
like this by hand is time-consuming, especially if there is a way to generate
a parser automatically from training data and apply incremental learning as new
samples are proven valid.
Many of the documents I am looking to structure and extract data from contain
financial data (currency, numbers, dates and times, etc.).

So my first question is: can UIMA help? I did some reading about OpenNLP and
got lost. Which one is closer to what I need, if any?

Thanks a lot for your time.


