Hi,
Basic UIMA is an empty framework, designed to let people independently develop
"annotators" and let others pull them together into processing pipelines.
So the basic UIMA framework won't help by itself; you need to find some
annotators for it to run.
You can search the internet for UIMA Annotators - there are several repositories
of these (some of them are cataloged on the Apache UIMA website, under external
links). And you can write your own, to do the particular things you need.
The Apache UIMA project also comes with a few annotators of its own (click on
"annotators", or use this direct link:
http://uima.apache.org/sandbox.html#uima-addons-annotators ), and other Apache
projects (such as OpenNLP) make their functionality available as UIMA annotators.
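To make the idea of annotators and pipelines a bit more concrete, here is a
rough sketch in plain Python. It does not use the UIMA API at all; the names
(Annotation, regex_annotator, run_pipeline) and the two regular expressions are
made up for illustration. It only shows the general shape: each annotator marks
spans of the text with a type and begin/end offsets, and a pipeline runs
several annotators over the same document.

```python
import re
from typing import Callable, List, NamedTuple


class Annotation(NamedTuple):
    """A typed span of text (hypothetical type, not a UIMA class)."""
    begin: int
    end: int
    type: str
    text: str


# An annotator here is just a function from document text to annotations.
Annotator = Callable[[str], List[Annotation]]


def regex_annotator(type_name: str, pattern: str) -> Annotator:
    """Build a simple annotator that marks every match of a regex."""
    compiled = re.compile(pattern)

    def annotate(document: str) -> List[Annotation]:
        return [
            Annotation(m.start(), m.end(), type_name, m.group(0))
            for m in compiled.finditer(document)
        ]

    return annotate


def run_pipeline(document: str, annotators: List[Annotator]) -> List[Annotation]:
    """Run each annotator over the document and collect the results."""
    annotations: List[Annotation] = []
    for annotate in annotators:
        annotations.extend(annotate(document))
    # Sort by position in the text so the output reads in document order.
    return sorted(annotations)


# Two illustrative annotators; the patterns are deliberately simplistic.
money = regex_annotator("Money", r"[$€£]\s?\d[\d,]*(?:\.\d+)?")
date = regex_annotator("Date", r"\b\d{1,2}/\d{1,2}/\d{4}\b")

sample = "Invoice dated 5/18/2012 for a total of $1,250.00."
results = run_pipeline(sample, [money, date])
for a in results:
    print(a.type, a.begin, a.end, a.text)
```

A real annotator would be written against the UIMA framework and could be much
smarter than a regular expression (for example, a trained model), but the
division of labor is the same: small independent components, combined into a
pipeline.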
HTH -Marshall
On 5/18/2012 3:06 AM, Mansour wrote:
Hello all,
I am new to this framework, and to this topic in general.
My requirement is to build a component that can take unstructured html
documents, and extract data. Something like this can be built with a regular
html parser.
However, due to the number of different html document types, building something
like this by hand is time consuming, especially if there is a way to generate
a parser automatically from training data and apply incremental learning as new
samples are proven valid.
Many of the documents I am looking to structure and extract data from contain
financial data (currency, numbers, dates and times, etc.).
So my first question is: can UIMA help? I did some reading about OpenNLP and
got lost. Which one is closer to what I need, if any?
Thank you very much for your time.