This is indeed an important issue for easy interoperability. However, a simple mapping solves only part of the issue. In many cases the mapping operation requires a lot of intelligence. For a tagger, for example, one tagset may have 17 tags related to verbs and another only 4: a simple one-to-one mapping will not work. For a name extractor, one component may provide 2 fields, firstName and lastName, while another has a middle initial, a title and so on. So you often have to modify the data itself to move it from one type system to another, and this requires code that may or may not be simple.
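The tagger and name-extractor cases above can be made concrete with a minimal Java sketch (Java 16 or later, for records). It is an illustration under stated assumptions, not UIMA code: the Penn Treebank verb tags stand in for the unnamed 17-tag and 4-tag sets, and RichName, SimpleName and the coarse tag "VERB" are hypothetical names. It shows that collapsing fine-grained tags can be a table lookup, while changing the shape of a name needs actual code.

```java
import java.util.List;
import java.util.Map;

public class TagsetMappingSketch {

    // Hypothetical table: fine-grained verb tags (Penn Treebank style, an assumption)
    // collapsed into one coarse tag expected by a hypothetical downstream component.
    private static final Map<String, String> FINE_TO_COARSE = Map.of(
            "VB", "VERB", "VBD", "VERB", "VBG", "VERB",
            "VBN", "VERB", "VBP", "VERB", "VBZ", "VERB");

    // Many-to-one direction: a plain lookup; unknown tags pass through unchanged.
    static String toCoarse(String fineTag) {
        return FINE_TO_COARSE.getOrDefault(fineTag, fineTag);
    }

    // Hypothetical name produced by a "rich" name extractor.
    record RichName(String title, String firstName, String middleInitial, String lastName) {}

    // Hypothetical two-field name expected by another component.
    record SimpleName(String firstName, String lastName) {}

    // Changing the shape of the data needs code, not just renaming:
    // the title is dropped and the middle initial is folded into firstName.
    static SimpleName toSimple(RichName n) {
        String first = n.firstName();
        if (n.middleInitial() != null && !n.middleInitial().isEmpty()) {
            first = first + " " + n.middleInitial() + ".";
        }
        return new SimpleName(first, n.lastName());
    }

    public static void main(String[] args) {
        for (String tag : List.of("VBD", "VBZ", "NN")) {
            System.out.println(tag + " -> " + toCoarse(tag));
        }
        System.out.println(toSimple(new RichName("Dr.", "John", "Q", "Smith")));
    }
}
```

The reverse direction (from a coarse VERB back to VBD or VBZ) cannot be recovered by any lookup table, because the coarse tagset never carried that information; that is one reason a purely declarative mapping is not always enough.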
What we usually do is use a Java scripting language such as BeanShell to do the mapping. This is flexible and simple, yet still powerful, since you can call any Java library you need for complex transformations. One idea could be to develop a mapping component based on a Java scripting language, configurable through XML files as you suggest, but also flexible enough to add code easily. A flexible mapping annotator would be very useful in UIMA. This is one approach, and it is both needed and pragmatic.

Another important and complementary way to tackle the issue is to have some recommendations for standard entity types such as person names, dates, places and so on, much as Dublin Core does for metadata. If we do this, people providing annotators will be able to add these types as output and do the mapping themselves to be, for example, "UIMA people compliant". This may be a degraded mode compared with their full capabilities, but it will be nice for fast prototyping.

Pascal

-----Original Message-----
From: Michael Baessler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 13, 2008 5:09 PM
To: [email protected]
Subject: Annotation Mapping Annotator

Is there some interest/need in the UIMA community to have an annotation mapping annotator? I think some of you might know the issue that different UIMA components work on different annotations and type systems. A mapping annotator component could be used to translate the annotations between these different requirements. For example, we have a tokenizer component at the beginning of the analysis flow that produces example.Token annotations with a POS feature set. Later in the flow we have a component that needs that information, but expects an example.Noun annotation. Unfortunately, there is no way to configure either component to produce or read a different annotation type, so in that case we need a mapping.
Tokenizer creates:
    example.Token (2,8) POS = NN
Mapping annotator translates this to:
    example.Noun (2,8) posTag = NN

If there is a need for such a component, we can reuse some of the code developed for the UIMA SimpleServer. The SimpleServer has a mapping syntax with additional filtering, as shown below. The mapping for the example above looks like:

    <type name="example.Token" outputTag="example.Noun">
      <filters>
        <filter featurePath="POS" operator="=" value="NN" />
      </filters>
      <outputs>
        <output featurePath="POS" outputAttribute="posTag"/>
      </outputs>
    </type>

Any feedback/comments on such a component? Are there any implementations available?

--
Michael
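To make the filter-and-rename semantics of the SimpleServer-style mapping concrete, here is a minimal Java sketch (Java 16 or later). It is not the SimpleServer implementation and does not use the UIMA CAS API: Annotation, MappingRule and map are hypothetical names, annotations are modeled as plain records with a string feature map, and only the "=" filter operator is handled. The optional customCode hook is an assumption loosely following Pascal's suggestion of combining XML-style configuration with code; a Java lambda stands in for a scripting language such as BeanShell.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.UnaryOperator;

public class AnnotationMappingSketch {

    // Simplified stand-in for an annotation; NOT the UIMA CAS API.
    record Annotation(String type, int begin, int end, Map<String, String> features) {}

    // One rule, mirroring the <type>, <filter> and <output> elements (equality filters only).
    record MappingRule(String inputType,
                       String outputType,
                       Map<String, String> equalityFilters,     // featurePath -> required value
                       Map<String, String> outputs,             // featurePath -> outputAttribute
                       UnaryOperator<Annotation> customCode) {  // optional hook, may be null

        // Returns the mapped annotation, or null if the rule does not apply.
        Annotation apply(Annotation in) {
            if (!in.type().equals(inputType)) {
                return null;
            }
            for (Map.Entry<String, String> f : equalityFilters.entrySet()) {
                if (!Objects.equals(in.features().get(f.getKey()), f.getValue())) {
                    return null;
                }
            }
            // Copy selected features under their new names; the span is kept as-is.
            Map<String, String> outFeatures = new LinkedHashMap<>();
            for (Map.Entry<String, String> o : outputs.entrySet()) {
                String value = in.features().get(o.getKey());
                if (value != null) {
                    outFeatures.put(o.getValue(), value);
                }
            }
            Annotation out = new Annotation(outputType, in.begin(), in.end(), outFeatures);
            return customCode == null ? out : customCode.apply(out);
        }
    }

    // Applies every rule to every input annotation and collects the results.
    static List<Annotation> map(List<Annotation> input, List<MappingRule> rules) {
        List<Annotation> result = new ArrayList<>();
        for (Annotation a : input) {
            for (MappingRule r : rules) {
                Annotation mapped = r.apply(a);
                if (mapped != null) {
                    result.add(mapped);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // example.Token with POS = NN becomes example.Noun with posTag.
        MappingRule tokenToNoun = new MappingRule(
                "example.Token", "example.Noun",
                Map.of("POS", "NN"),
                Map.of("POS", "posTag"),
                null);

        List<Annotation> tokens = List.of(
                new Annotation("example.Token", 2, 8, Map.of("POS", "NN")),
                new Annotation("example.Token", 9, 13, Map.of("POS", "VBZ")));

        // Only the first token passes the filter.
        System.out.println(map(tokens, List.of(tokenToNoun)));
    }
}
```

Keeping the declarative part (type, filters, outputs) separate from the optional code hook means simple renamings stay configurable, while cases like splitting or merging name fields can drop into code only where needed.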
