Well, using a scripting language has scripting language advantages! Whether that matters depends on your context. It's fast to make a modification such as correcting a regexp, and anybody can do it. There's no need to be a Java developer or to have a Java development environment at hand, so a quick fix on a server is much easier this way. It's mainly a glue program, and scripting languages are good at that. We use something very similar to the BSF annotator, which is in the UIMA sandbox.

Pascal

-----Original Message-----
From: Steven Bethard [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 14, 2008 8:57 PM
To: [email protected]
Cc: [EMAIL PROTECTED]
Subject: Re: Annotation Mapping Annotator

On Tue, May 13, 2008 at 11:36 AM, Pascal Coupet <[EMAIL PROTECTED]> wrote:
> However a simple mapping solves only a part of the issue. In a lot of
> case, the mapping operation requires a lot of intelligence. For a tagger
> by example, one will have 17 tags related to Verbs and another 4: a
> simple mapping will not work. For a name extractor, one will provide 2
> fields, firstName and lastName, and another one will have a middle
> initial, a title and so on. So a lot of time you have to modify the data
> themselves to move those from one typesystem to another one and this
> require simple or not so simple code.

FWIW, the annotation mapping problems I've run into are all of this type - where a simple name-to-name mapping is insufficient. The solution so far has simply been to write an appropriate AnalysisEngine that does the translation. I like this option better than introducing a scripting language or an XML file -- if I'm forced to write in Java in the first place, I at least want to get the most out of its type checking.

> A flexible mapping annotator will be very useful in the UIMA. This is
> one way which is needed and pragmatic. Another important and
> complementary way to tackle the issue is to have some recommendations
> for standards entity types like People names, dates, places and so on,
> like Dublin Core for Metadata. If we do this, people providing
> annotators will be able to add this type as output and will do the
> mapping themselves to be "UIMA people compliant" by example. This can be
> a degraded mode for them in regard of their standard capabilities but
> this will be nice for fast prototyping.

Having some recommended type systems for common types would be incredibly useful. Common things we run into: sentences, tokens, stems, parts of speech, named entities, syntactic trees, semantic role structures. If there were standard representations for these, we'd be happy to use them instead of inventing our own.

Steve
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy
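
For illustration only, here is a minimal Java sketch of the two cases raised in this thread where a name-to-name mapping is not enough: collapsing a fine-grained verb tag set into a coarser one, and moving a name with a title and middle initial into a type that only has firstName and lastName. Everything in it is an assumption made for the example: the class and method names, the coarse tag names, and the choice to fold the middle initial into the first name are all hypothetical, and the Penn Treebank style tags merely stand in for "a tagger with many verb tags". It deliberately uses plain Java objects, not the UIMA API; in a real pipeline this logic would sit inside the translating AnalysisEngine mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from UIMA or from the thread.
final class MappingSketch {

    // Many-to-few mapping: several fine-grained verb tags collapse into a
    // smaller, made-up coarse tag set, so the mapping cannot be inverted.
    private static final Map<String, String> VERB_TAGS = new HashMap<>();
    static {
        VERB_TAGS.put("VB", "VERB_BASE");
        VERB_TAGS.put("VBP", "VERB_PRESENT");
        VERB_TAGS.put("VBZ", "VERB_PRESENT");
        VERB_TAGS.put("VBD", "VERB_PAST");
        VERB_TAGS.put("VBN", "VERB_PARTICIPLE");
        VERB_TAGS.put("VBG", "VERB_PARTICIPLE");
    }

    static String mapVerbTag(String fineTag) {
        String coarse = VERB_TAGS.get(fineTag);
        if (coarse == null) {
            // Fail loudly instead of guessing a tag for unknown input.
            throw new IllegalArgumentException("No mapping for tag: " + fineTag);
        }
        return coarse;
    }

    // Source type: a name extractor that produces more fields than the target.
    static final class RichName {
        final String title, firstName, middleInitial, lastName;
        RichName(String title, String firstName, String middleInitial, String lastName) {
            this.title = title;
            this.firstName = firstName;
            this.middleInitial = middleInitial;
            this.lastName = lastName;
        }
    }

    // Target type: only the two fields mentioned in the thread.
    static final class SimpleName {
        final String firstName, lastName;
        SimpleName(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // The data itself has to be reshaped: here the middle initial is folded
    // into firstName and the title is dropped (an arbitrary policy choice).
    static SimpleName toSimpleName(RichName rich) {
        boolean hasMiddle = rich.middleInitial != null && !rich.middleInitial.isEmpty();
        String first = hasMiddle ? rich.firstName + " " + rich.middleInitial : rich.firstName;
        return new SimpleName(first, rich.lastName);
    }

    public static void main(String[] args) {
        System.out.println(mapVerbTag("VBZ"));
        SimpleName name = toSimpleName(new RichName("Dr.", "Jane", "Q.", "Doe"));
        System.out.println(name.firstName + " | " + name.lastName);
    }
}
```

The point of the sketch is that both conversions encode decisions (which tags merge, which fields are dropped or merged) that a flat name-to-name table cannot express, whether that logic ends up in Java or in a script.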
