RE: Annotation Mapping Annotator

Pascal Coupet Thu, 15 May 2008 03:45:48 -0700

> From: LeHouillier, Frank D. [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 15, 2008 11:41 AM
> To: [email protected]
> Subject: RE: Annotation Mapping Annotator
> 
> The case where you need a simple name mapping is not going to be a
> result of unrelated annotators being used together but rather as a
> result of the development process.  Decoupling the type system from
> the analysis engine should allow developers to do things like have
> multiple JCas implementations of a single CAS type within a single
> jvm, for instance.
> 
> Also, I have to say that I get nervous when I hear people talking
> about getting some sort of "recommended type systems", especially
> when they

Well, standards are recommended objects. UIMA restricts your freedom too,
with the promise that you will be able to interact much more easily
with others. I think we are at the very beginning, but standards
will emerge to represent common entities, and this will make
applications and the web more useful. As stated by Olivier in another
mail, people in the semantic web domain are producing interesting
things, ACE is interesting also as you noted, NewsML2 is useful for
events in the news space, and so on.


> are talking about syntactic sorts of things like syntactic trees,
> because having these presupposes some set of algorithms.  Even getting
> the semantics for something like "token" down (is it simply white
> space delimited? Is <n't> part of <can't> or a separate token?)
> forces one

You need to understand the algorithms when you select a new tokenizer, to
decide whether it is acceptable to you. But once it is, your problem
becomes mapping its tags to what your downstream annotators expect.
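
To make the tag-mapping step concrete, here is a minimal sketch in plain
Java. Everything in it is an assumption for illustration: the class name
TagMapper, the Penn Treebank style source tags and the coarse target
tags are hypothetical and do not come from any particular annotator or
from the UIMA API. In practice this logic would sit inside whatever
analysis engine does the translation.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: collapses a fine-grained tag set into a coarser one.
// The tag names are illustrative only, not taken from a real type system.
final class TagMapper {
    // Many-to-one table: several source tags share one target tag.
    private static final Map<String, String> FINE_TO_COARSE = Map.of(
        "VB", "VERB",
        "VBP", "VERB",
        "VBZ", "VERB",
        "VBD", "VERB_PAST",
        "VBN", "VERB_PAST",
        "VBG", "VERB_GERUND");

    private TagMapper() {}

    // Returns the coarse tag, or empty when the source tag has no
    // counterpart, so the caller has to decide what to do with it.
    static Optional<String> map(String fineTag) {
        if (fineTag == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(FINE_TO_COARSE.get(fineTag));
    }
}
```

Returning an empty Optional rather than a default tag is a deliberate
choice here: an unmapped tag is exactly the case where a simple table
stops being enough and real code is needed.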

> into using a particular set of algorithms.  I'm not saying that there
> can't be recommendations but they should be completely divorced from
> UIMA, the OASIS standard or the Apache implementation.  If I thought
> that most developers on UIMA were targeting something like a
> Minimalist Syntax named entity/relationship recognizer for ACE
> annotation types because a recommendation saying such was part of the
> spec, I wouldn't go near the thing.  We should encourage NIST to adopt
> UIMA for ACE, rather than having UIMA adopt the ACE types.

I agree. There are two levels of recommendation: the first is the
semantics, where a lot of work is already in progress; the second is how
to represent those semantics using UIMA type systems. It would be nice to
have recommended type systems for ACE, NewsML2 events, and so on, because
the same semantic description can be encoded in various ways as UIMA type
systems. This can be triggered by the organization behind each proposal,
but it may be much more efficient to be proactive and propose type
systems for recognized standards as part of the Apache UIMA project. I
think it would be very useful to have a Sandbox for type systems, similar
to the one for annotators.

> 
> -----Original Message-----
> From: Steven Bethard [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 14, 2008 2:57 PM
> To: [email protected]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Annotation Mapping Annotator
> 
> On Tue, May 13, 2008 at 11:36 AM, Pascal Coupet
> <[EMAIL PROTECTED]> wrote:
> > However a simple mapping solves only a part of the issue. In a lot
> > of case, the mapping operation requires a lot of intelligence. For a
> > tagger by example, one will have 17 tags related to Verbs and
> > another 4: a simple mapping will not work. For a name extractor, one
> > will provide 2 fields, firstName and lastName, and another one will
> > have a middle initial, a title and so on. So a lot of time you have
> > to modify the data themselves to move those from one typesystem to
> > another one and this require simple or not so simple code.
> 
> FWIW, the annotation mapping problems I've run into are all of this
> type - where a simple name-to-name mapping is insufficient. The
> solution so far has simply been to write an appropriate AnalysisEngine
> that does the translation. I like this option better than introducing
> a scripting language or an XML file -- if I'm forced to write in Java
> in the first place, I at least want to get the most out of its type
> checking.
> 
> > A flexible mapping annotator will be very useful in the UIMA. This
> > is one way which is needed and pragmatic. Another important and
> > complementary way to tackle the issue is to have some
> > recommendations for standards entity types like People names, dates,
> > places and so on, like Dublin Core for Metadata. If we do this,
> > people providing annotators will be able to add this type as output
> > and will do the mapping themselves to be "UIMA people compliant" by
> > example. This can be a degraded mode for them in regard of their
> > standard capabilities but this will be nice for fast prototyping.
> 
> Having some recommended type systems for common types would be
> incredibly useful. Common things we run into: sentences, tokens,
> stems, parts of speech, named entities, syntactic trees, semantic role
> structures. If there were standard representations for these, we'd be
> happy to use them instead of inventing our own.
> 
> Steve
> --
> I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
> tiny blip on the distant coast of sanity.
>  --- Bucky Katt, Get Fuzzy
