I think the simple name translation capability could be provided by an extension to the base framework, using an "alias" notion. The concept would be to allow multiple type and feature names to map to the same internal type and feature. This would allow the same internal CAS object to be referred to by any of the aliased names, without actually copying the object.
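A minimal sketch of the alias idea, assuming a simple name-to-name table. AliasRegistry and AliasedIndex are hypothetical classes for illustration, not part of the UIMA API. Objects are stored once under the canonical name, so a lookup by any alias returns the same instance rather than a copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not a UIMA API: several external names resolve to one
// canonical internal name. Qualified feature names ("Type:feature") could be
// registered the same way as type names.
class AliasRegistry {
    private final Map<String, String> aliasToCanonical = new HashMap<>();

    void addAlias(String alias, String canonical) {
        aliasToCanonical.put(alias, canonical);
    }

    /** Returns the canonical name, or the name unchanged if it has no alias.
        Only one level is resolved; chained aliases are not followed. */
    String resolve(String name) {
        return aliasToCanonical.getOrDefault(name, name);
    }
}

// Hypothetical index keyed by canonical type name: each object is stored once,
// and a lookup under any alias returns the very same instances (no copies).
class AliasedIndex {
    private final AliasRegistry registry;
    private final Map<String, List<Object>> byType = new HashMap<>();

    AliasedIndex(AliasRegistry registry) {
        this.registry = registry;
    }

    void add(String typeName, Object fs) {
        byType.computeIfAbsent(registry.resolve(typeName), k -> new ArrayList<>()).add(fs);
    }

    List<Object> get(String typeName) {
        return byType.getOrDefault(registry.resolve(typeName), List.of());
    }
}
```

Supporting chained aliases would only require resolve() to loop until the name stops changing.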

-Marshall

David Buttler wrote:
I agree with Pascal that there are many use cases where simple name transformation (e.g. org.apache.uima.Person to com.company.PersonName) is insufficient. Incorporating a scripting framework that allows arbitrary computation is the easiest way to create a component that can handle those cases. However, I think the simple name translation service does provide value by itself, and it is conceptually much simpler to use. I would like to see both components. This would allow people to start with something simple and easy to use, and then graduate to a more complex mapping once they realize the need. Alternatively, a single component could do everything Pascal describes while also allowing a simple configuration for easy cases, with no performance or configuration penalty for being able to extend a mapping to a scripting framework.
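One way to get both tiers in a single component, sketched under assumptions: annotations are modeled by a hypothetical Ann record with flat string features, and a plain Java lambda stands in for a scripting engine such as BeanShell. MappingRule, RenameRule, and CodeRule are hypothetical names, not existing UIMA classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model: an annotation as a type name, a span, and string features.
record Ann(String type, int begin, int end, Map<String, String> features) {}

// Every rule shares one interface, so declarative and code-based rules can be
// mixed freely in one configuration.
interface MappingRule {
    Ann apply(Ann in);
}

// Tier 1: pure renaming of the type and of selected features, configured from data only.
record RenameRule(String toType, Map<String, String> featureRenames) implements MappingRule {
    public Ann apply(Ann in) {
        Map<String, String> out = new HashMap<>();
        // Features without a rename entry keep their original name.
        in.features().forEach((k, v) -> out.put(featureRenames.getOrDefault(k, k), v));
        return new Ann(toType, in.begin(), in.end(), out);
    }
}

// Tier 2: arbitrary code; the lambda is a stand-in for a script.
record CodeRule(UnaryOperator<Ann> fn) implements MappingRule {
    public Ann apply(Ann in) {
        return fn.apply(in);
    }
}
```

Because both tiers implement the same interface, a simple case costs only a map lookup, and a rule can be promoted from a rename to code without touching the rest of the configuration.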

Dave

Pascal Coupet wrote:
This is indeed an important issue for easy interoperability.

However, a simple mapping solves only part of the issue. In many
cases, the mapping operation requires a lot of intelligence. For a tagger,
for example, one will have 17 tags related to verbs and another only 4, so a
simple mapping will not work. For a name extractor, one will provide 2
fields, firstName and lastName, while another will also have a middle
initial, a title, and so on. So a lot of the time you have to modify the data
itself to move it from one type system to another, and this
requires simple or not-so-simple code.
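The tagger case can be illustrated with a small sketch. The source tags below are the six Penn Treebank verb tags; the four-tag target scheme (V_BASE, V_PAST, V_PART, V_PRES) is hypothetical. The forward direction is a table lookup, but the reverse direction is one-to-many, which is why a pure name mapping is not enough.

```java
import java.util.Map;
import java.util.Optional;

// Sketch: collapsing a fine-grained verb tag set into a coarser, hypothetical one.
final class VerbTagMapper {
    private static final Map<String, String> FINE_TO_COARSE = Map.of(
        "VB",  "V_BASE",   // base form
        "VBD", "V_PAST",   // past tense
        "VBG", "V_PART",   // gerund / present participle
        "VBN", "V_PART",   // past participle
        "VBP", "V_PRES",   // present, non-3rd person singular
        "VBZ", "V_PRES");  // present, 3rd person singular

    /** Maps a fine tag to the coarse scheme; empty if it is not a known verb tag. */
    static Optional<String> toCoarse(String fineTag) {
        return Optional.ofNullable(FINE_TO_COARSE.get(fineTag));
    }

    /** Counts the fine tags that collapse into a coarse tag. A count above 1 means
        the coarse tag cannot be mapped back without extra information. */
    static long fineTagsFor(String coarseTag) {
        return FINE_TO_COARSE.values().stream().filter(coarseTag::equals).count();
    }
}
```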
What we usually do is use a Java scripting language such as BeanShell to
do the mapping. This is flexible and simple, yet still powerful enough to use
any Java libraries you may need for complex cases. One idea could be to
develop a mapping component based on a Java scripting language,
configurable using XML files as you suggest, but also flexible enough to add
code in an easy way. A flexible mapping annotator would be very useful in UIMA.
This is one approach, which is needed and pragmatic. Another important and
complementary way to tackle the issue is to have some recommendations
for standard entity types like people's names, dates, places, and so on,
like Dublin Core for metadata. If we do this, people providing
annotators will be able to add these types as output and do the
mapping themselves to be "UIMA people compliant", for example. This may be
a degraded mode for them compared with their native capabilities, but
it will be nice for fast prototyping.
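A sketch of the "degraded mode" idea, assuming a hypothetical standard person type that holds only firstName and lastName. Each provider writes its own small adapter into the shared type; DetailedName, StandardPerson, and PersonAdapters are illustrative names, not existing UIMA types.

```java
// Hypothetical "standard" person type: only the fields all providers agree on.
record StandardPerson(String firstName, String lastName) {}

// Hypothetical provider A output: structured fields, including title and middle initial.
record DetailedName(String title, String firstName, String middleInitial, String lastName) {}

final class PersonAdapters {
    /** Provider A: title and middle initial have no slot in the standard type and are dropped. */
    static StandardPerson fromDetailed(DetailedName n) {
        return new StandardPerson(n.firstName(), n.lastName());
    }

    /** Hypothetical provider B emits only a full-name string. Everything before the
        last space becomes the first name; naive, e.g. for multi-word surnames. */
    static StandardPerson fromFullName(String fullName) {
        String s = fullName.trim();
        int i = s.lastIndexOf(' ');
        if (i < 0) {
            return new StandardPerson("", s); // single token: treat it as the last name
        }
        return new StandardPerson(s.substring(0, i), s.substring(i + 1));
    }
}
```

Both adapters lose information, which is exactly the trade-off described above: the common type is enough for prototyping, while each provider's native type remains available for richer use.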
Pascal

-----Original Message-----
From: Michael Baessler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 13, 2008 5:09 PM
To: [email protected]
Subject: Annotation Mapping Annotator

Is there any interest in, or need for, an annotation mapping annotator in
the UIMA community?

I think some of you might know the issue that different UIMA components
work on different annotations and type systems. A mapping annotator
component could be used to translate annotations between these different
requirements. E.g. we have a tokenizer component at the beginning of the
analysis flow that produces example.Token annotations with a POS feature
set. Later in the flow we have a component that needs that information but
expects an example.Noun annotation. Unfortunately, neither component can be
configured to produce or read a different annotation type, so in that case
we need a mapping.

Tokenizer creates:

  example.Token (2,8)
     POS = NN

Mapping annotator translates this to:

  example.Noun (2,8)
     posTag = NN


If there is a need for such a component, we can reuse some of the code
developed for the UIMA SimpleServer. The SimpleServer has a mapping syntax
with additional filtering, as shown below.

The mapping for the example above looks like:

<type name="example.Token" outputTag="example.Noun">
  <filters>
      <filter featurePath="POS" operator="=" value="NN" />
  </filters>
  <outputs>
      <output featurePath="POS" outputAttribute="posTag"/>
  </outputs>
</type>
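A minimal sketch of how a rule like this might be interpreted, assuming flat string features, AND semantics across all filters, and only the "=" operator. The records mirror the XML elements but are hypothetical; this is not the SimpleServer implementation. The feature name POS is the one the tokenizer produces.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical annotation model: type name, span, flat string features.
record Annotation(String type, int begin, int end, Map<String, String> features) {}

// Mirrors <filter featurePath=".." operator="=" value=".."/>; only "=" is handled.
record Filter(String featurePath, String operator, String value) {
    boolean matches(Annotation a) {
        if (!"=".equals(operator)) {
            throw new IllegalArgumentException("unsupported operator: " + operator);
        }
        return value.equals(a.features().get(featurePath));
    }
}

// Mirrors <output featurePath=".." outputAttribute=".."/>.
record Output(String featurePath, String outputAttribute) {}

// Mirrors <type name=".." outputTag=".."> with its filters and outputs.
record TypeMapping(String name, String outputTag, List<Filter> filters, List<Output> outputs) {
    /** Returns the mapped annotation, or empty if the type differs or any filter fails. */
    Optional<Annotation> apply(Annotation in) {
        if (!in.type().equals(name)) {
            return Optional.empty();
        }
        for (Filter f : filters) {
            if (!f.matches(in)) {
                return Optional.empty();
            }
        }
        Map<String, String> out = new HashMap<>();
        for (Output o : outputs) {
            String v = in.features().get(o.featurePath());
            if (v != null) {
                out.put(o.outputAttribute(), v); // copy the value under its new name
            }
        }
        // The span is kept unchanged: example.Token (2,8) becomes example.Noun (2,8).
        return Optional.of(new Annotation(outputTag, in.begin(), in.end(), out));
    }
}
```

For example, a TypeMapping built from the rule above turns example.Token (2,8) with POS = NN into example.Noun (2,8) with posTag = NN, and ignores a token whose POS is VB.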

Any feedback/comments for such a component?
Are there any implementations available?

-- Michael
