The case where you need a simple name mapping is not going to arise from unrelated annotators being used together, but rather from the development process. Decoupling the type system from the analysis engine should allow developers to, for instance, have multiple JCas implementations of a single CAS type within a single JVM.
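To make "simple name mapping" concrete, here is a minimal Java sketch, assuming the mapping really is name-to-name: a lookup table renames annotation types (say, hypothetically, after a type was renamed between two versions of an in-house type system) and leaves offsets and everything else untouched. The `TypeNameMapping` class, the `Ann` record and the `org.example...` type names are illustrative stand-ins, not UIMA's CAS API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: annotations are modeled as plain records rather than
// real UIMA CAS feature structures, and all type names are illustrative.
class TypeNameMapping {

    // Minimal stand-in for an annotation: a type name plus character offsets.
    record Ann(String type, int begin, int end) {}

    // Rewrites each annotation's type name via the lookup table; names with no
    // entry pass through unchanged. Offsets are copied as-is, which is what
    // makes this a "simple" name-to-name mapping.
    static List<Ann> rename(List<Ann> anns, Map<String, String> table) {
        return anns.stream()
                   .map(a -> new Ann(table.getOrDefault(a.type(), a.type()), a.begin(), a.end()))
                   .toList();
    }

    public static void main(String[] args) {
        // Hypothetical renaming between two versions of one type system.
        Map<String, String> table = Map.of(
            "org.example.v1.Token", "org.example.v2.Token",
            "org.example.v1.Sentence", "org.example.v2.Sentence");
        List<Ann> in = List.of(
            new Ann("org.example.v1.Token", 0, 5),
            new Ann("org.example.v1.Sentence", 0, 12),
            new Ann("org.example.v1.NamedEntity", 6, 11));
        rename(in, table).forEach(System.out::println);
    }
}
```

Nothing in the table needs to know what a token or a sentence *means*, which is why this case is cheap to handle generically.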
Also, I have to say that I get nervous when I hear people talk about establishing some sort of "recommended type systems", especially for syntactic structures such as parse trees, because such types presuppose a particular set of algorithms. Even pinning down the semantics of something like "token" (is it simply whitespace-delimited? Is <n't> part of <can't> or a separate token?) commits one to a particular set of algorithms. I'm not saying there can't be recommendations, but they should be completely divorced from UIMA, the OASIS standard, and the Apache implementation. If I thought most UIMA developers were targeting something like a Minimalist Syntax named-entity/relationship recognizer for ACE annotation types because the spec included a recommendation to that effect, I wouldn't go near the thing. We should encourage NIST to adopt UIMA for ACE, rather than having UIMA adopt the ACE types.

-----Original Message-----
From: Steven Bethard
Sent: Wednesday, May 14, 2008 2:57 PM
Subject: Re: Annotation Mapping Annotator

On Tue, May 13, 2008 at 11:36 AM, Pascal Coupet wrote:
> However, a simple mapping solves only part of the issue. In a lot of
> cases, the mapping operation requires a lot of intelligence. For a
> tagger, for example, one will have 17 tags related to verbs and another
> will have 4: a simple mapping will not work. For a name extractor, one
> will provide 2 fields, firstName and lastName, and another will have a
> middle initial, a title and so on. So a lot of the time you have to
> modify the data itself to move it from one type system to another, and
> this requires simple or not-so-simple code.

FWIW, the annotation mapping problems I've run into are all of this type: ones where a simple name-to-name mapping is insufficient. The solution so far has simply been to write an appropriate AnalysisEngine that does the translation.
I like this option better than introducing a scripting language or an XML file: if I'm forced to write in Java in the first place, I at least want to get the most out of its type checking.

> A flexible mapping annotator would be very useful in UIMA. This is
> one approach, and a needed and pragmatic one. Another important and
> complementary way to tackle the issue is to have some recommendations
> for standard entity types like people's names, dates, places and so
> on, much as Dublin Core does for metadata. If we do this, people
> providing annotators will be able to add these types as output and do
> the mapping themselves to be, for example, "UIMA people compliant".
> This may be a degraded mode relative to their standard capabilities,
> but it will be nice for fast prototyping.

Having some recommended type systems for common types would be incredibly useful. Common things we run into: sentences, tokens, stems, parts of speech, named entities, syntactic trees, semantic role structures. If there were standard representations for these, we'd be happy to use them instead of inventing our own.

Steve

-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy
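By contrast with a simple rename, the translating AnalysisEngine Steve describes has to contain real logic. The following Java sketch covers Pascal's name-extractor case under stated assumptions: two hypothetical name schemas are modeled as plain records (not generated JCas types), and a richer schema with a title and middle initial is translated into one that only has firstName and lastName. Because both schemas are ordinary typed classes, the compiler checks the translation, which is the benefit Steve cites for staying in Java. In an actual UIMA component this logic would presumably sit inside the `process` method of an annotator (for example a `JCasAnnotator_ImplBase` subclass), reading source-type annotations and adding target-type ones; that wiring is omitted here.

```java
import java.util.Optional;

// Hypothetical sketch: the two "type systems" are plain Java records,
// not generated JCas classes, and every name here is illustrative.
class NameTranslator {

    // Source schema: an extractor that emits title, first, middle initial, last.
    record RichName(int begin, int end, Optional<String> title, String first,
                    Optional<Character> middleInitial, String last) {}

    // Target schema: an extractor that only knows firstName and lastName.
    record SimpleName(int begin, int end, String firstName, String lastName) {}

    // A translation no name-to-name table can express: the middle initial is
    // folded into firstName, and the title is dropped because the target
    // schema has nowhere to put it.
    static SimpleName toSimple(RichName n) {
        String first = n.middleInitial()
                        .map(mi -> n.first() + " " + mi + ".")
                        .orElse(n.first());
        return new SimpleName(n.begin(), n.end(), first, n.last());
    }

    public static void main(String[] args) {
        // Covers the text "Dr. John Q. Public" (offsets 0 to 18).
        RichName rich = new RichName(0, 18, Optional.of("Dr."), "John",
                                     Optional.of('Q'), "Public");
        System.out.println(toSimple(rich));
    }
}
```

Folding the middle initial into `firstName` and discarding the title are arbitrary policies chosen for illustration; a real translator would encode whatever its downstream consumers expect, and that choice is exactly the "intelligence" a declarative mapping file would struggle to capture.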
