You may also want to look at ClearTK, which is built on UIMA and focuses on NLP and machine learning.
Ross

On Wed, May 23, 2012 at 8:27 PM, Marshall Schor <[email protected]> wrote:

> Hi,
>
> Basic UIMA is an empty framework, designed to let people independently
> develop "annotators" and let others pull them together into processing
> pipelines.
>
> So the basic UIMA framework, by itself, won't help unless you find some
> annotators for it to run.
>
> You can search the internet for UIMA annotators - there are several
> repositories of these (some of them are cataloged on the Apache UIMA
> website, under external links). And you can write your own, to do the
> particular things you need.
>
> The Apache UIMA project also comes with a few annotators of its own (click
> on "annotators", or use this direct link:
> http://uima.apache.org/sandbox.html#uima-addons-annotators),
> and other Apache projects (such as OpenNLP) make their functionality
> available as UIMA annotators.
>
> HTH -Marshall
>
> On 5/18/2012 3:06 AM, Mansour wrote:
>
>> Hello all,
>> I am new to this framework, and to this topic in general.
>> My requirement is to build a component that can take unstructured HTML
>> documents and extract data. Something like this can be built with a
>> regular HTML parser.
>> However, due to the number of different HTML document types, building
>> something like this by hand is time consuming, especially if there is a
>> way to generate a parser automatically from training data and apply
>> incremental learning as new samples are proven valid.
>> Many of the documents I am looking to structure and extract data from
>> contain financial data (currency, numbers, dates and times, etc.).
>>
>> So my first question is: can UIMA help? I did some reading about
>> OpenNLP and got lost. Which one is closer to what I need, if any?
>>
>> Thank you very much for your time.
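
For what it's worth, the annotator-and-pipeline idea Marshall describes can be sketched roughly as below. This is plain Python, not the UIMA API: the names `Document`, `Annotation`, `currency_annotator`, `date_annotator` and `run_pipeline` are all made up for illustration, the regular expressions are deliberately simplistic, and a real system would use a proper HTML parser and trained models rather than hand-written patterns.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "Currency" or "Date"
    begin: int  # start offset in the document text
    end: int    # end offset (exclusive)
    text: str   # the covered text


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def strip_tags(html):
    # Crude tag removal, for illustration only; use a real HTML parser in practice.
    return re.sub(r"<[^>]+>", " ", html)


def currency_annotator(doc):
    # Assumed pattern: a currency marker followed by a number with optional
    # thousands separators and decimals.
    pattern = r"(?:USD|EUR|\$|€)\s?\d+(?:,\d{3})*(?:\.\d+)?"
    for m in re.finditer(pattern, doc.text):
        doc.annotations.append(Annotation("Currency", m.start(), m.end(), m.group()))


def date_annotator(doc):
    # Assumed pattern: ISO-style dates only (YYYY-MM-DD).
    for m in re.finditer(r"\b\d{4}-\d{2}-\d{2}\b", doc.text):
        doc.annotations.append(Annotation("Date", m.start(), m.end(), m.group()))


def run_pipeline(html, annotators):
    # Each annotator is independent; the pipeline runs them in order
    # over the same shared document.
    doc = Document(strip_tags(html))
    for annotate in annotators:
        annotate(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline(
        "<p>Paid $1,250.50 on 2012-05-18</p>",
        [currency_annotator, date_annotator],
    )
    for a in result.annotations:
        print(a.kind, a.begin, a.end, a.text)
```

The point of the sketch is only the shape: each annotator knows nothing about the others and just adds typed spans to a shared document, so new ones can be plugged in without touching the rest.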
