Hello all, I am new to this framework, and to this topic in general. My requirement is to build a component that can take unstructured HTML documents and extract data from them. Something like this could be built by hand with a regular HTML parser. However, because there are so many different HTML document types, writing such a parser by hand is time-consuming, especially if there is a way to generate a parser automatically from training data and apply incremental learning as new samples are confirmed to be valid. Many of the documents I want to structure and extract data from contain financial data (currency amounts, numbers, dates and times, etc.).
So my first question is: can UIMA help? I did some reading about OpenNLP and got lost. Which of the two, if either, is closer to what I need? Thanks a lot for your time.
