This is indeed an important issue for easy interoperability.
However, a simple mapping solves only part of the problem. In many
cases, the mapping operation requires a lot of intelligence. For a
tagger, for example, one component will have 17 tags related to verbs
and another only 4: a simple mapping will not work. For a name
extractor, one component will provide 2 fields, firstName and lastName,
while another will also have a middle initial, a title and so on. So
much of the time you have to modify the data itself to move it from one
type system to another, and this requires simple or not so simple code.
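To make the point concrete, here is a minimal Java sketch of the two cases above. Everything in it is an assumption for illustration: the class name MappingSketch, the Penn-style fine tags, the coarse tag names, and the choice to fold the middle initial into firstName and drop the title. It only shows that the mapping is a decision expressed in code, not a rename table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: two mappings that a plain rename table cannot express. */
public class MappingSketch {

    /** Collapses a fine-grained verb tag into a coarser one (all tag names are illustrative). */
    static String toCoarseVerbTag(String fineTag) {
        switch (fineTag) {
            case "VBD":
            case "VBN":
                return "VERB_PAST";
            case "VBG":
                return "VERB_ING";
            case "VB":
            case "VBP":
            case "VBZ":
                return "VERB_BASE_OR_PRESENT";
            default:
                return "VERB_OTHER";
        }
    }

    /**
     * Maps a rich name record to a two-field one. The middle initial is folded
     * into firstName and the title is dropped: both are assumed policy choices.
     */
    static Map<String, String> toTwoFieldName(Map<String, String> rich) {
        String first = rich.getOrDefault("firstName", "");
        String middle = rich.getOrDefault("middleInitial", "");
        Map<String, String> simple = new LinkedHashMap<>();
        simple.put("firstName", middle.isEmpty() ? first : first + " " + middle);
        simple.put("lastName", rich.getOrDefault("lastName", ""));
        return simple;
    }

    public static void main(String[] args) {
        Map<String, String> rich = new LinkedHashMap<>();
        rich.put("title", "Dr.");
        rich.put("firstName", "John");
        rich.put("middleInitial", "Q.");
        rich.put("lastName", "Public");
        System.out.println(toCoarseVerbTag("VBD"));
        System.out.println(toTwoFieldName(rich));
    }
}
```

Note that the reverse direction (4 tags to 17, or two name fields to a full record) cannot be done this way at all, since the finer information is simply not in the source data.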
What we usually do is use a Java scripting language such as BeanShell
to do the mapping. This is flexible and simple, yet powerful enough to
call whatever Java libraries you need for the complex cases. Maybe an
idea could be to develop a mapping component based on a Java scripting
language, configurable using XML files as you suggest, but also
flexible enough to let you add code easily.
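A rough sketch of what such a component could look like, in plain Java. This is not existing UIMA or BeanShell code: the class HookedMapper, the Ann stand-in for an annotation, and the type names a.Person and b.Person are all hypothetical. The declarative part of a rule (source and target type) is what would come from the XML file, and the hook is where a script could be plugged in; here it is an ordinary lambda to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: declarative type renaming plus an optional code hook per rule. */
public class HookedMapper {

    /** Minimal stand-in for an annotation: a type name and string-valued features. */
    static final class Ann {
        final String type;
        final Map<String, String> features;

        Ann(String type, Map<String, String> features) {
            this.type = type;
            this.features = features;
        }
    }

    /** One rule: the target type name and the hook applied to the features. */
    private static final class Rule {
        final String targetType;
        final UnaryOperator<Map<String, String>> hook;

        Rule(String targetType, UnaryOperator<Map<String, String>> hook) {
            this.targetType = targetType;
            this.hook = hook;
        }
    }

    private final Map<String, Rule> rulesBySourceType = new HashMap<>();

    /** Registers a rule; source and target type could be read from XML, the hook is code. */
    void addRule(String sourceType, String targetType, UnaryOperator<Map<String, String>> hook) {
        rulesBySourceType.put(sourceType, new Rule(targetType, hook));
    }

    /** Applies the matching rule, or returns the annotation unchanged when none matches. */
    Ann map(Ann in) {
        Rule rule = rulesBySourceType.get(in.type);
        if (rule == null) {
            return in;
        }
        // Copy first so the hook cannot modify the source annotation's features.
        Map<String, String> copy = new HashMap<>(in.features);
        return new Ann(rule.targetType, rule.hook.apply(copy));
    }

    public static void main(String[] args) {
        HookedMapper mapper = new HookedMapper();
        mapper.addRule("a.Person", "b.Person", f -> {
            // Code hook: b.Person is assumed to have one fullName feature instead of two fields.
            Map<String, String> out = new HashMap<>();
            out.put("fullName", f.get("firstName") + " " + f.get("lastName"));
            return out;
        });

        Map<String, String> features = new HashMap<>();
        features.put("firstName", "Ada");
        features.put("lastName", "Lovelace");
        Ann mapped = mapper.map(new Ann("a.Person", features));
        System.out.println(mapped.type + " " + mapped.features);
    }
}
```

With a scripting language, the lambda body would be replaced by a script snippet stored next to the rule, so that adding a new mapping does not require recompiling the component.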
A flexible mapping annotator would be very useful in UIMA. This is
one approach, and it is needed and pragmatic. Another important and
complementary way to tackle the issue is to have some recommendations
for standard entity types such as people's names, dates, places and so
on, much as Dublin Core does for metadata. If we do this, people
providing annotators will be able to add these types as output and do
the mapping themselves to be, for example, "UIMA people compliant". This
may be a degraded mode for them compared with their native capabilities,
but it would be nice for fast prototyping.
Pascal
-----Original Message-----
From: Michael Baessler [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 13, 2008 5:09 PM
To: [email protected]
Subject: Annotation Mapping Annotator
Is there some interest/need in the UIMA community to have an annotation
mapping annotator?
I think some of you might know the issue that different UIMA components
work on different annotations and type systems. A mapping annotator
component could be used to translate the annotations between these
different requirements. E.g. we have a tokenizer component at the
beginning of the analysis flow that produces example.Token annotations
with a POS feature set. Later in the flow we have a component that needs
that information, but expects an example.Noun annotation.
Unfortunately there is no way to configure both components to produce or
read different annotation types, so in that case we need a mapping.
Tokenizer creates:
example.Token (2,8)
POS = NN
Mapping annotator translates this to:
example.Noun (2,8)
posTag = NN
If there is a need for such a component we can reuse some of the code
developed for the UIMA SimpleServer. The SimpleServer has a mapping
syntax with additional filtering as shown below.
The mapping for the example above looks like:
<type name="example.Token" outputTag="example.Noun">
  <filters>
    <filter featurePath="POS" operator="=" value="NN" />
  </filters>
  <outputs>
    <output featurePath="POS" outputAttribute="posTag"/>
  </outputs>
</type>
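For readers who prefer code, the intended effect of this rule can be sketched in plain Java. This is not the SimpleServer implementation and uses no UIMA API: the class TokenToNounSketch and its Token and Noun stand-ins are hypothetical, and the sketch only mirrors the filter (POS = NN) and the output (posTag copied from POS) of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the filter-then-rename mapping from the example. */
public class TokenToNounSketch {

    /** Stand-in for example.Token: a span plus a POS feature. */
    static final class Token {
        final int begin;
        final int end;
        final String pos;

        Token(int begin, int end, String pos) {
            this.begin = begin;
            this.end = end;
            this.pos = pos;
        }
    }

    /** Stand-in for example.Noun: the same span plus a posTag feature. */
    static final class Noun {
        final int begin;
        final int end;
        final String posTag;

        Noun(int begin, int end, String posTag) {
            this.begin = begin;
            this.end = end;
            this.posTag = posTag;
        }
    }

    /** Keeps only tokens whose POS is "NN" and emits a Noun over the same span. */
    static List<Noun> mapNouns(List<Token> tokens) {
        List<Noun> nouns = new ArrayList<>();
        for (Token t : tokens) {
            if ("NN".equals(t.pos)) {              // the <filter> step
                nouns.add(new Noun(t.begin, t.end, t.pos)); // the <output> step
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(new Token(0, 1, "DT"), new Token(2, 8, "NN"));
        for (Noun n : mapNouns(tokens)) {
            System.out.println("example.Noun (" + n.begin + "," + n.end + ") posTag = " + n.posTag);
        }
    }
}
```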
Any feedback/comments for such a component?
Are there any implementations available?
-- Michael