> That being said there are some additional tools that might help with
> putting together various pipelines.  For instance, I've been working on
> (although not very hard) an Analysis Engine that would allow a
> different Analysis Engine to work on only part of the output of another
> Analysis Engine.  Suppose I had an Analysis Engine that detected the
> different languages being used in a text.  Then suppose I had a Person
> Annotation Extractor that only works on Japanese.  I might want to be
> able to send the Japanese parts of my text to the Person Annotation
> Extractor without writing any code.  I'm not at all sure what the best
> way to go about this would be.  Such an Analysis Engine might be good to
> include in the UIMA package but it might not belong in the
> specification. 

This touches on thoughts I've been having about combining arbitrary annotators.
Ultimately, it would be good to get some level of standards for Type Systems
that define a minimal set of fields for tokens, parts of speech, named
entities, taxonomy classifications, etc.  However, that's a long process that
will involve lots of community organizing and vendor cooperation, and won't be
happening any time in the near future, I think.

In the interim, I believe the only way it will be possible to combine
arbitrary annotators is by transforming the data in the CAS from one type
system to another.  Sort of an ETL (extract, transform, load) for UIMA.  I can
imagine something with a nice mapping GUI similar to the GUIs in a database
ETL product such as Informatica.  The kind of sub-setting you describe above
would be one of the things such a tool could do.  Another example would be to
take parts of speech coming out of OpenNLP and transform them into parts of
speech as required by a particular named entity annotator.
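
To make the part-of-speech case concrete, here is a rough sketch of what one
mapping step might look like.  It is plain Java and deliberately does not use
the UIMA API; the class names, the target tag names, and the fallback value are
all made up for illustration, and the source tags are assumed to be Penn
Treebank-style.  The tag table is the piece a mapping GUI would let someone
edit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a type-system mapping step. It does not use the UIMA
// API; every class, field, and target tag name here is hypothetical.
class TypeMappingSketch {

    // Stand-in for a token annotation in the source type system.
    static final class SourceToken {
        final int begin;
        final int end;
        final String posTag;

        SourceToken(int begin, int end, String posTag) {
            this.begin = begin;
            this.end = end;
            this.posTag = posTag;
        }
    }

    // Stand-in for the annotation a downstream annotator expects.
    static final class TargetToken {
        final int begin;
        final int end;
        final String partOfSpeech;

        TargetToken(int begin, int end, String partOfSpeech) {
            this.begin = begin;
            this.end = end;
            this.partOfSpeech = partOfSpeech;
        }
    }

    // Declarative mapping table (assumed Penn Treebank-style source tags).
    static Map<String, String> exampleTagMap() {
        Map<String, String> m = new HashMap<>();
        m.put("NN", "NOUN");
        m.put("NNS", "NOUN");
        m.put("NNP", "PROPER_NOUN");
        m.put("VB", "VERB");
        m.put("VBD", "VERB");
        m.put("JJ", "ADJECTIVE");
        return m;
    }

    // Copies each token's offsets unchanged and rewrites its tag through the
    // table; tags missing from the table become the fallback value.
    static List<TargetToken> transform(List<SourceToken> tokens,
                                       Map<String, String> tagMap,
                                       String fallback) {
        List<TargetToken> out = new ArrayList<>();
        for (SourceToken t : tokens) {
            String mapped = tagMap.getOrDefault(t.posTag, fallback);
            out.add(new TargetToken(t.begin, t.end, mapped));
        }
        return out;
    }

    public static void main(String[] args) {
        List<SourceToken> in = new ArrayList<>();
        in.add(new SourceToken(0, 4, "NNP"));
        in.add(new SourceToken(5, 10, "VBD"));
        in.add(new SourceToken(11, 12, "SYM"));
        for (TargetToken t : transform(in, exampleTagMap(), "OTHER")) {
            System.out.println(t.begin + "-" + t.end + " " + t.partOfSpeech);
        }
    }
}
```

A real version would read and write annotations in the CAS rather than plain
lists, and the sub-setting case would add a filter in front of the same loop.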

Maybe there's a business opportunity here for someone.  Or maybe there are
open-source tools that could be adapted to do this.  It does seem like a
project that probably exceeds the capacity of the current UIMA project.


Greg Holmberg
