You may also want to look at ClearTK, which is based on UIMA and focused on
NLP and machine learning.

Ross

On Wed, May 23, 2012 at 8:27 PM, Marshall Schor <[email protected]> wrote:

> Hi,
>
> Basic UIMA is an empty framework, designed to let people independently
> develop "annotators" and let others pull them together into processing
> pipelines.
>
> So the basic UIMA framework, by itself, won't help unless you find some
> annotators for it to run.
>
> You can search the internet for UIMA Annotators - there are several
> repositories of these (some of them are cataloged on the Apache UIMA
> website, under external links).  And you can write your own, to do the
> particular things you need.
>
> The Apache UIMA project also comes with a few annotators of its own (click
> on "annotators", or use this direct link:
> http://uima.apache.org/sandbox.html#uima-addons-annotators),
> and other Apache projects (such as OpenNLP) make their functionality
> available as UIMA annotators.
>
> HTH   -Marshall
>
>
> On 5/18/2012 3:06 AM, Mansour wrote:
>
>> Hello all,
>> I am new to this framework, and to this topic in general.
>> My requirement is to build a component that can take unstructured HTML
>> documents and extract data. Something like this can be built with a
>> regular HTML parser.
>> However, due to the number of different HTML document types, building
>> something like this by hand is time consuming, especially if there is a
>> way to generate a parser automatically from training data and apply
>> incremental learning as new samples are proven valid.
>> Many of the documents I am looking to structure and extract data from
>> contain financial data (currency, numbers, dates and times, etc.).
>>
>> So my first question is: can UIMA help? I did some reading about
>> OpenNLP and got lost. Which one is closer to what I need, if any?
>>
>> Thank you very much for your time.