Re: merging CAS objects

Jens Grivolla Fri, 16 Sep 2011 02:15:46 -0700

On 09/16/2011 10:43 AM, Alexander Klenner wrote:

I have a question concerning the merging of different UIMA pipelines.
Say I have 3 different annotators that work on the same document (the
CAS sofa data is identical for each of the pipelines). They do this in
parallel, and all of them produce different annotations, but in a sofa
with the same name (_textView). In the end I have 3 serialized XCAS files
in three different folders, coming from different nodes of a
cluster.

We have the same problem sometimes, and I'd be very interested in a "clean" solution.

Is there a UIMA-conformant way to merge the corresponding XML files
into one CAS object that has all the annotations of the three
separate files? I could easily do this with a non-UIMA Java class
that just adds all the annotation information into one file. Since
the sofa data is the same, the offset information of the annotations
will be correct, but I'd rather stay in the UIMA context.

We actually edit XMI files using Python scripts to add annotations that come from outside UIMA, etc. However, especially given the very unfortunate disappearance of Ed Loper's uimapy, our approach is a bit hacky, e.g. for dealing with the xmi:id features, namespace prefixes for type systems, etc. Also, XMI allows for many different representations of the same information, and our scripts really only deal with the most common version (as attributes).
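
To make the xmi:id and namespace-prefix issues concrete, here is a minimal sketch of that kind of script-based merge. It is not the actual script described above, and it only works under several assumptions: both files share the same sofa name and sofa data, annotations are serialized as attributes, and annotations carry no references to other feature structures (such references would also need their ids remapped). The function names parse_xmi and merge_xmi are made up for illustration.

```python
import copy
import io
import xml.etree.ElementTree as ET

XMI_ID = "{http://www.omg.org/XMI}id"
CAS_NS = "http:///uima/cas.ecore"
SOFA_TAG = "{%s}Sofa" % CAS_NS
VIEW_TAG = "{%s}View" % CAS_NS


def parse_xmi(data):
    """Parse XMI bytes and register the namespace prefixes found in them,
    so that ElementTree does not rewrite them to ns0, ns1, ... on output."""
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=["start-ns"]):
        ET.register_namespace(prefix, uri)
    return ET.fromstring(data)


def merge_xmi(base_data, other_data):
    """Copy the indexed annotations of other_data into base_data.

    Assumption: annotations have only primitive features plus `sofa`;
    references between feature structures are NOT remapped here.
    """
    base, other = parse_xmi(base_data), parse_xmi(other_data)

    # Sofas are matched by name (sofaID), because their xmi:id can differ per file.
    base_sofa_ids = {s.get("sofaID"): s.get(XMI_ID) for s in base.findall(SOFA_TAG)}
    other_sofa_names = {s.get(XMI_ID): s.get("sofaID") for s in other.findall(SOFA_TAG)}
    base_views = {v.get("sofa"): v for v in base.findall(VIEW_TAG)}
    other_by_id = {e.get(XMI_ID): e for e in other if e.get(XMI_ID) is not None}

    # New ids start above the highest xmi:id already used in the base file.
    used = [int(e.get(XMI_ID)) for e in base if e.get(XMI_ID) is not None]
    next_id = max(used, default=0) + 1

    for other_view in other.findall(VIEW_TAG):
        name = other_sofa_names.get(other_view.get("sofa"))
        target_sofa = base_sofa_ids.get(name)
        if target_sofa is None or target_sofa not in base_views:
            continue  # no matching view in the base file; skipped in this sketch
        target_view = base_views[target_sofa]
        members = target_view.get("members", "").split()
        for old_id in other_view.get("members", "").split():
            source = other_by_id[old_id]
            if source.tag.endswith("}DocumentAnnotation"):
                continue  # the base file already has its own document annotation
            clone = copy.deepcopy(source)
            clone.set(XMI_ID, str(next_id))
            clone.set("sofa", target_sofa)
            # Insert before the first cas:View so the view elements stay last.
            first_view = base.findall(VIEW_TAG)[0]
            base.insert(list(base).index(first_view), clone)
            members.append(str(next_id))
            next_id += 1
        target_view.set("members", " ".join(members))

    return ET.tostring(base, encoding="utf-8")
```

Calling merge_xmi(a, b) and then merge_xmi(result, c) would fold three files into one. Matching sofas by name rather than by xmi:id is a deliberate choice, since the ids are only unique within a single file.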

I guess in Java you can at least use org.apache.uima.cas.impl.XmiCasDeserializer and org.apache.uima.cas.impl.XmiCasSerializer to avoid the XMI-specific details.


Bye,
Jens
