The case where you need a simple name mapping is not going to come from
unrelated annotators being used together, but rather from the
development process itself.  Decoupling the type system from the
analysis engine should allow developers to do things like have multiple
JCas implementations of a single CAS type within a single JVM, for
instance.

Also, I have to say that I get nervous when I hear people talking about
some sort of "recommended type systems", especially when they are
talking about syntactic things like syntactic trees, because having
these presupposes some set of algorithms.  Even getting
the semantics for something like "token" down (is it simply white space
delimited? Is <n't> part of <can't> or a separate token?) forces one
into using a particular set of algorithms.  I'm not saying that there
can't be recommendations but they should be completely divorced from
UIMA, the OASIS standard or the Apache implementation.  If I thought
that most developers on UIMA were targeting something like a Minimalist
Syntax named entity/relationship recognizer for ACE annotation types
because a recommendation saying such was part of the spec, I wouldn't go
near the thing.  We should encourage NIST to adopt UIMA for ACE, rather
than having UIMA adopt the ACE types.
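
To make the "token" point concrete, here is a minimal sketch in plain
Java (no UIMA involved).  The class and method names are made up for
illustration.  It applies two common definitions of "token" to the same
text: whitespace-delimited, and a Penn Treebank style split that treats
<n't> as its own token.  Any annotator written against one definition
would see different offsets and counts under the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of UIMA or any type system.
class TokenDefinitions {

    // Definition 1: a token is a maximal run of non-whitespace characters.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Definition 2: as above, but a trailing "n't" is split off as a
    // separate token, so "can't" becomes "ca" + "n't".
    static List<String> cliticTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : whitespaceTokens(text)) {
            if (t.length() > 3 && t.endsWith("n't")) {
                out.add(t.substring(0, t.length() - 3));
                out.add("n't");
            } else {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "I can't go";
        System.out.println(whitespaceTokens(text)); // [I, can't, go]
        System.out.println(cliticTokens(text));     // [I, ca, n't, go]
    }
}
```

Three tokens versus four for the same three words, and a part-of-speech
tagger or parser trained on one convention will not line up with the
other.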

-----Original Message-----
From: Steven Bethard [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 14, 2008 2:57 PM
To: [email protected]
Cc: [EMAIL PROTECTED]
Subject: Re: Annotation Mapping Annotator

On Tue, May 13, 2008 at 11:36 AM, Pascal Coupet
<[EMAIL PROTECTED]> wrote:
> However a simple mapping solves only a part of the issue. In a lot of
> case, the mapping operation requires a lot of intelligence. For a tagger
> by example, one will have 17 tags related to Verbs and another 4: a
> simple mapping will not work. For a name extractor, one will provide 2
> fields, firstName and lastName, and another one will have a middle
> initial, a title and so on. So a lot of time you have to modify the data
> themselves to move those from one typesystem to another one and this
> require simple or not so simple code.

FWIW, the annotation mapping problems I've run into are all of this
type - where a simple name-to-name mapping is insufficient. The
solution so far has simply been to write an appropriate AnalysisEngine
that does the translation. I like this option better than introducing
a scripting language or an XML file -- if I'm forced to write in Java
in the first place, I at least want to get the most out of its type
checking.
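
As a rough sketch of the kind of translation logic meant here, in plain
Java with no UIMA classes: the tag sets, class names and field names
below are all hypothetical, with Penn Treebank verb tags standing in for
the finer-grained tagger.  The point is only that the mapping is
many-to-one and lossy, and that an enum and typed fields let the
compiler check the target side.

```java
// Hypothetical illustration only; none of these names come from UIMA.
class TagTranslation {

    // Assumed coarse verb tags of the target type system.
    enum CoarseVerbTag { BASE, FINITE, PARTICIPLE, OTHER }

    // Many-to-one mapping: several fine-grained tags collapse into one
    // coarse tag, so this cannot be expressed as a simple rename.
    static CoarseVerbTag toCoarse(String fineTag) {
        switch (fineTag) {
            case "VB":
                return CoarseVerbTag.BASE;
            case "VBD":
            case "VBP":
            case "VBZ":
                return CoarseVerbTag.FINITE;
            case "VBG":
            case "VBN":
                return CoarseVerbTag.PARTICIPLE;
            default:
                return CoarseVerbTag.OTHER;
        }
    }

    // Assumed richer name representation from one extractor.
    static final class RichName {
        final String title, firstName, middleInitial, lastName;
        RichName(String title, String firstName,
                 String middleInitial, String lastName) {
            this.title = title;
            this.firstName = firstName;
            this.middleInitial = middleInitial;
            this.lastName = lastName;
        }
    }

    // Assumed two-field representation expected by another component.
    static final class SimpleName {
        final String firstName, lastName;
        SimpleName(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Lossy translation: title and middle initial are dropped.
    static SimpleName toSimple(RichName n) {
        return new SimpleName(n.firstName, n.lastName);
    }
}
```

Going the other way (two fields to four) needs a decision about what to
put in the missing fields, which is exactly the sort of thing a
name-to-name mapping file has no way to express.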

> A flexible mapping annotator will be very useful in the UIMA. This is
> one way which is needed and pragmatic. Another important and
> complementary way to tackle the issue is to have some recommendations
> for standards entity types like People names, dates, places and so on,
> like Dublin Core for Metadata. If we do this, people providing
> annotators will be able to add this type as output and will do the
> mapping themselves to be "UIMA people compliant" by example. This can be
> a degraded mode for them in regard of their standard capabilities but
> this will be nice for fast prototyping.

Having some recommended type systems for common types would be
incredibly useful. Common things we run into: sentences, tokens,
stems, parts of speech, named entities, syntactic trees, semantic role
structures. If there were standard representations for these, we'd be
happy to use them instead of inventing our own.

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
 --- Bucky Katt, Get Fuzzy
