Hello all,
I am new to this framework, and to this topic in general.
My requirement is to build a component that can take unstructured HTML
documents and extract data from them. Something like this can be built with a
regular HTML parser.
However, due to the number of different HTML document types, building each
parser by hand is time-consuming. I would prefer, if there is a way, to generate
a parser automatically from training data and apply incremental learning as new
samples are proven valid.
Many of the documents I am looking to structure and extract data from contain
financial data (currencies, numbers, dates and times, etc.).

So my first question is: can UIMA help? I also did some reading about OpenNLP
and got lost. Which one, if either, is closer to what I need?

Thank you very much for your time.

