On 3/2/11 11:14 AM, Andreas Kahl wrote:
> Mainly I am concerned with the latter:
> Those metadata records would come in as XML with dozens of fields containing
> relatively short texts (most less than 255 chars). We need to perform NLP
> (tokenization, stemming, ...) and some simpler manipulations like reading
> three fields and constructing a fourth from them.
> It would be very desirable to use one framework for both tasks (in fact we
> would use the pipeline to enrich the metadata records with the long texts).
You could take the XML, parse it, and then construct a short text which
contains the content together with annotations marking the existing
structure. This new text with its annotations is placed in a new view.
Afterwards you can perform your processing within these annotation bounds.
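To make the idea concrete, here is a minimal language-neutral sketch (in
Python, not using the UIMA API itself) of building one flat text plus
stand-off annotations from an XML record; the field names (title, author,
year) are invented for illustration. In UIMA you would instead set the
constructed text as the document text of a new CAS view and add Annotation
instances with the same offsets.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; the field names are made up for illustration.
record = """<record>
  <title>Scalable NLP Pipelines</title>
  <author>A. Kahl</author>
  <year>2011</year>
</record>"""

def build_view(xml_string):
    """Parse the XML and build one flat text plus stand-off annotations
    (begin, end, label) that preserve the original field structure --
    the same idea as placing an annotated text into a new view."""
    root = ET.fromstring(xml_string)
    parts, annotations, offset = [], [], 0
    for field in root:
        text = (field.text or "").strip()
        annotations.append((offset, offset + len(text), field.tag))
        parts.append(text)
        offset += len(text) + 1  # +1 for the separating newline
    return "\n".join(parts), annotations

text, annotations = build_view(record)

# "Processing within annotation bounds": tokenize only the title field.
begin, end, _ = next(a for a in annotations if a[2] == "title")
print(text[begin:end].split())  # tokens taken from the title span only
```

Downstream components then never need the XML again; they operate on the
plain text and restrict themselves to whichever annotation spans they care
about.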
Not sure how you construct the 4th field, but if you can do that directly
after the XML parsing, it could be part of the constructed text.
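For example, deriving the 4th field right after parsing might look like the
following sketch (again plain Python; the field names and the formatting
rule are assumptions, since the original mail does not say how the 4th
field is built):

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; replace them with the real ones from your schema.
record = ET.fromstring(
    "<record><title>Scalable NLP</title>"
    "<author>A. Kahl</author><year>2011</year></record>"
)

# Read three fields and derive a fourth directly after parsing, so it can
# simply be appended to the constructed text before any NLP runs.
citation = "{} ({}): {}".format(
    record.findtext("author"),
    record.findtext("year"),
    record.findtext("title"),
)
print(citation)  # A. Kahl (2011): Scalable NLP
```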
With UIMA-AS you should be able to scale the analysis nicely to a few
machines.
Hope that helps,
Jörn