Well, using a scripting language has the usual scripting-language
advantages! Whether that matters depends on your context.
A modification such as correcting a regexp is quick to make, and anybody
can do it. No need to be a Java developer or to have a Java development
environment at hand. So a quick fix on a server is much easier this way.
It's mainly a glue program, and scripting languages are good at that.
We use something very similar to the BSF annotator, which is in the UIMA
sandbox.

Pascal

-----Original Message-----
From: Steven Bethard [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 14, 2008 8:57 PM
To: [email protected]
Cc: [EMAIL PROTECTED]
Subject: Re: Annotation Mapping Annotator

On Tue, May 13, 2008 at 11:36 AM, Pascal Coupet
<[EMAIL PROTECTED]> wrote:
> However, a simple mapping solves only part of the issue. In a lot of
> cases, the mapping operation requires a lot of intelligence. For a
> tagger, for example, one will have 17 tags related to verbs and
> another 4: a simple mapping will not work. For a name extractor, one
> will provide 2 fields, firstName and lastName, and another will have
> a middle initial, a title and so on. So a lot of the time you have to
> modify the data themselves to move them from one type system to
> another, and this requires simple or not-so-simple code.

FWIW, the annotation mapping problems I've run into are all of this
type - where a simple name-to-name mapping is insufficient. The
solution so far has simply been to write an appropriate AnalysisEngine
that does the translation. I like this option better than introducing
a scripting language or an XML file -- if I'm forced to write in Java
in the first place, I at least want to get the most out of its type
checking.
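
As a rough illustration of the kind of translation code discussed here
(not code from either poster), the sketch below shows a many-to-one tag
mapping and a lossy name mapping in plain Java. It deliberately leaves
out the UIMA API: in practice this logic would sit inside an annotator.
The names MappingSketch, CoarseVerbTag, toCoarse and toFirstLast are
hypothetical, and the Penn Treebank verb tags are only a stand-in for
the finer-grained tag set.

```java
import java.util.Map;

// Hypothetical coarse tag set used by the target type system.
enum CoarseVerbTag { BASE, PAST, PRESENT, PARTICIPLE }

final class MappingSketch {
    // Many-to-one table from Penn Treebank verb tags (assumed source
    // tag set) to the coarse set above.
    private static final Map<String, CoarseVerbTag> VERB_TAGS = Map.of(
        "VB", CoarseVerbTag.BASE,
        "VBD", CoarseVerbTag.PAST,
        "VBP", CoarseVerbTag.PRESENT,
        "VBZ", CoarseVerbTag.PRESENT,
        "VBG", CoarseVerbTag.PARTICIPLE,
        "VBN", CoarseVerbTag.PARTICIPLE);

    // Fails loudly on an unknown tag instead of silently dropping it.
    static CoarseVerbTag toCoarse(String fineTag) {
        CoarseVerbTag tag = VERB_TAGS.get(fineTag);
        if (tag == null) {
            throw new IllegalArgumentException("No mapping for tag: " + fineTag);
        }
        return tag;
    }

    // Rich name -> {firstName, lastName}. The title is dropped and the
    // middle initial is folded into the first name, so the mapping is lossy.
    static String[] toFirstLast(String title, String first,
                                String middleInitial, String last) {
        String firstName = (middleInitial == null || middleInitial.isEmpty())
            ? first
            : first + " " + middleInitial;
        return new String[] { firstName, last };
    }

    public static void main(String[] args) {
        System.out.println(toCoarse("VBZ"));
        String[] name = toFirstLast("Dr.", "Ada", "M.", "Smith");
        System.out.println(name[0] + " | " + name[1]);
    }
}
```

Returning an enum rather than a string is what buys the compile-time
checking mentioned above: a caller cannot pass along a coarse tag that
the target type system does not define.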

> A flexible mapping annotator would be very useful in UIMA. This is
> one approach that is needed and pragmatic. Another important and
> complementary way to tackle the issue is to have some recommendations
> for standard entity types like people's names, dates, places and so
> on, like Dublin Core for metadata. If we do this, people providing
> annotators will be able to add these types as output and do the
> mapping themselves to be "UIMA people compliant", for example. This
> may be a degraded mode for them compared with their standard
> capabilities, but it will be nice for fast prototyping.

Having some recommended type systems for common types would be
incredibly useful. Common things we run into: sentences, tokens,
stems, parts of speech, named entities, syntactic trees, semantic role
structures. If there were standard representations for these, we'd be
happy to use them instead of inventing our own.

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
 --- Bucky Katt, Get Fuzzy
