> From: LeHouillier, Frank D. [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 15, 2008 11:41 AM
> To: [email protected]
> Subject: RE: Annotation Mapping Annotator
>
> The case where you need a simple name mapping is not going to be a
> result of unrelated annotators being used together but rather as a
> result of the development process. Decoupling the type system from the
> analysis engine should allow developers to do things like have multiple
> JCas implementations of a single CAS type within a single jvm, for
> instance.
>
> Also, I have to say that I get nervous when I hear people talking about
> getting some sort of "recommended type systems", especially when they

Well, standards are recommended objects. UIMA restricts your freedom too, with the promise that you will be able to interact much more easily with others. I think we are at the very beginning, but standards will emerge to represent common entities, and this will make applications and the web more useful. As Olivier stated in another mail, people in the semantic web domain are producing interesting things; ACE is interesting too, as you noted; NewsML2 is useful for events in the news space; and so on.

> are talking about syntactic sorts of things like syntactic trees,
> because having these presupposes some set of algorithms. Even getting
> the semantics for something like "token" down (is it simply white space
> delimited? Is <n't> part of <can't> or a separate token?) forces one

You need to understand the algorithms when you select a new tokenizer, to decide whether it is acceptable to you. But once it is, your problem becomes mapping its tags to what your downstream annotators expect.

> into using a particular set of algorithms. I'm not saying that there
> can't be recommendations but they should be completely divorced from
> UIMA, the OASIS standard or the Apache implementation. If I thought
> that most developers on UIMA were targeting something like a Minimalist
> Syntax named entity/relationship recognizer for ACE annotation types
> because a recommendation saying such was part of the spec, I wouldn't go
> near the thing. We should encourage NIST to adopt UIMA for ACE, rather
> than having UIMA adopt the ACE types.

I agree. There are two levels of recommendation: the first is the semantics, where a lot of work is already in progress; the second is how to represent those semantics using UIMA type systems. It would be nice to have recommended type systems for ACE, NewsML2 events, and so on, because the same semantic description can be encoded in various ways as UIMA type systems. This could be driven by the organization behind each proposal, but it may be much more efficient to be proactive and propose type systems for recognized standards as part of the Apache UIMA project. I think it would be very useful to have a Sandbox for type systems, similar to the one for annotators.

>
> -----Original Message-----
> From: Steven Bethard [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 14, 2008 2:57 PM
> To: [email protected]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Annotation Mapping Annotator
>
> On Tue, May 13, 2008 at 11:36 AM, Pascal Coupet
> <[EMAIL PROTECTED]> wrote:
> > However a simple mapping solves only a part of the issue. In a lot of
> > case, the mapping operation requires a lot of intelligence. For a tagger
> > by example, one will have 17 tags related to Verbs and another 4: a
> > simple mapping will not work. For a name extractor, one will provide 2
> > fields, firstName and lastName, and another one will have a middle
> > initial, a title and so on. So a lot of time you have to modify the data
> > themselves to move those from one typesystem to another one and this
> > require simple or not so simple code.
>
> FWIW, the annotation mapping problems I've run into are all of this
> type - where a simple name-to-name mapping is insufficient. The
> solution so far has simply been to write an appropriate AnalysisEngine
> that does the translation. I like this option better than introducing
> a scripting language or an XML file -- if I'm forced to write in Java
> in the first place, I at least want to get the most out of its type
> checking.
>
> > A flexible mapping annotator will be very useful in the UIMA. This is
> > one way which is needed and pragmatic. Another important and
> > complementary way to tackle the issue is to have some recommendations
> > for standards entity types like People names, dates, places and so on,
> > like Dublin Core for Metadata. If we do this, people providing
> > annotators will be able to add this type as output and will do the
> > mapping themselves to be "UIMA people compliant" by example. This can be
> > a degraded mode for them in regard of their standard capabilities but
> > this will be nice for fast prototyping.
>
> Having some recommended type systems for common types would be
> incredibly useful. Common things we run into: sentences, tokens,
> stems, parts of speech, named entities, syntactic trees, semantic role
> structures. If there were standard representations for these, we'd be
> happy to use them instead of inventing our own.
>
> Steve
> --
> I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
> tiny blip on the distant coast of sanity.
> --- Bucky Katt, Get Fuzzy
