On 09/16/2011 10:43 AM, Alexander Klenner wrote:
> I have a question concerning the merging of different UIMA pipelines.
> Say I have 3 different annotators that work on the same document (the
> CAS sofa data is identical for each of the pipelines). They do this
> in parallel, and all of them produce different annotations, but in a
> sofa with the same name (_textView). Finally I have 3 serialized XCAS
> files in three different folders, coming from different nodes of a
> cluster.

We have the same problem sometimes, and I'd be very interested in a "clean" solution.

> Is there a UIMA-conformant way to merge the corresponding XML files
> into one CAS object that has all the annotations of the three
> separate files? I could easily do this with a non-UIMA Java class
> that just adds all the annotation information into one file. Since
> the sofa data is the same, the offset information of the annotations
> will be correct, but I'd rather stay in the UIMA context.

We actually edit XMI files using Python scripts, for example to add annotations that come from outside UIMA. However, especially given the very unfortunate disappearance of Ed Loper's uimapy, our approach is a bit hacky, e.g. in how it deals with the xmi:id features and the namespace prefixes for type systems. Also, XMI allows many different representations of the same information, and our scripts really only handle the most common one (feature values as attributes).
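To give an idea of what such a script involves, here is a minimal sketch (not our actual scripts) that merges the annotations of one XMI file into another using only the Python standard library. It assumes a single sofa/view per file, feature values stored as attributes, and no references between feature structures, since those would need their ids remapped as well. The function name merge_xmi is made up for this illustration.

```python
import copy
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"
CAS_NS = "http:///uima/cas.ecore"
TCAS_NS = "http:///uima/tcas.ecore"
XMI_ID = f"{{{XMI_NS}}}id"

# Keep the usual prefixes for the core namespaces on output; type system
# namespaces are not registered, so they get generated prefixes (ns0, ns1, ...).
ET.register_namespace("xmi", XMI_NS)
ET.register_namespace("cas", CAS_NS)

# Elements describing the CAS itself (or present in every file) are not copied.
SKIP = {
    f"{{{CAS_NS}}}NULL",
    f"{{{CAS_NS}}}Sofa",
    f"{{{CAS_NS}}}View",
    f"{{{TCAS_NS}}}DocumentAnnotation",
}


def merge_xmi(base_xml: str, other_xml: str) -> str:
    """Copy the annotations of other_xml into base_xml (hypothetical helper).

    Assumes one sofa and one view per document and attribute-style features.
    """
    base = ET.fromstring(base_xml)
    other = ET.fromstring(other_xml)

    base_sofa = base.find(f"{{{CAS_NS}}}Sofa")
    other_sofa = other.find(f"{{{CAS_NS}}}Sofa")
    if base_sofa.get("sofaString") != other_sofa.get("sofaString"):
        raise ValueError("sofa data differs, so offsets would not line up")

    # Detach the view so that copied annotations end up in front of it.
    base_view = base.find(f"{{{CAS_NS}}}View")
    base.remove(base_view)
    members = base_view.get("members", "").split()

    # Continue numbering after the highest xmi:id already used in the base file.
    next_id = max(int(e.get(XMI_ID)) for e in base if e.get(XMI_ID) is not None) + 1

    for elem in other:
        # Only copy feature structures that are anchored in the other file's sofa.
        if elem.tag in SKIP or elem.get("sofa") != other_sofa.get(XMI_ID):
            continue
        clone = copy.deepcopy(elem)
        clone.set(XMI_ID, str(next_id))            # avoid xmi:id collisions
        clone.set("sofa", base_sofa.get(XMI_ID))   # point at the base sofa
        base.append(clone)
        members.append(str(next_id))               # index it in the view
        next_id += 1

    base_view.set("members", " ".join(members))
    base.append(base_view)
    return ET.tostring(base, encoding="unicode")
```

For three files this would be called twice, e.g. merge_xmi(merge_xmi(a, b), c). The generated namespace prefixes and the unhandled cross-references are exactly the kind of detail that makes this approach hacky.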

I guess in Java you can at least use org.apache.uima.cas.impl.XmiCasDeserializer and org.apache.uima.cas.impl.XmiCasSerializer to avoid dealing with the XMI-specific details yourself.

Bye,
Jens
