NOTE: This was foolishly posted to just uima-dev in September. The type system and some of the sample code are used in the recently mentioned UCC tool from CMU.
We would like to add a UIMA type system and sample annotators to the Apache Incubator project as an example of a rich multimodal application. Our hope is that others will find the techniques and types useful, and that this will be a good starting point for developing other multimodal applications.

GTS is a type system designed for multimodal applications that combine analytics from multiple sources and modalities, such as speech recognition, language translation, entity detection, etc. It is currently used by 10 cooperating groups participating in the DARPA GALE project (http://www.darpa.mil/ipto/programs/gale/gale.asp) to transcribe, translate, and extract information from foreign-language news broadcasts. This application requires that all the data be cross-referenced so that, for example, any English sentence can be traced back to the precise region of foreign-language audio that generated it.

The CAS organization and type system have been designed to let each analytic easily work on data of the appropriate modality. Speech recognition engines annotate an audio view with words aligned to a time axis; machine translation annotates a text view of foreign sentences with their English translations; entity detection annotates a text view of the English sentences. Multiple analytics of each type may be employed to improve overall accuracy.

The sample code includes data reorganization components that are inserted between the different analytics to perform the necessary bookkeeping of creating views and cross-reference links from one view back to an earlier one. For example, after all speech recognition analytics have run, a reorg module creates a source-language text view for each STT engine, along with cross-reference annotations from each word in the new view back to the appropriate time span in the audio view; a sketch of this bookkeeping appears below. One reorg component is a CAS Multiplier that resegments the initial fixed-length audio segments at likely story boundaries, so that later components can treat each CAS as a complete story; a skeleton of such a multiplier also appears below.

The STT and MT analytics included here are simulated: they read their results from a file, so that a complete pipeline of components can be tested.
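To illustrate the reorg bookkeeping, here is a minimal sketch in plain UIMA Java of building a source-language text view from one STT engine's time-aligned words. The view names, the types example.gts.Word and example.gts.AudioXRef, and their features are placeholders for illustration, not the actual GTS names; the sketch also assumes, as GTS does for audio views, that the audio word annotations carry their position on the time axis.

import java.util.ArrayList;
import java.util.List;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.TypeSystem;
import org.apache.uima.cas.text.AnnotationFS;

public class SttReorgSketch {

  /** Build a source-language text view from one STT engine's output and
   *  link each word in the new view back to its audio time span. */
  public void buildTextView(CAS cas, String engineId) {
    CAS audioView = cas.getView("audio");                     // placeholder view name
    CAS textView  = cas.createView(engineId + "-sourceText"); // one view per STT engine

    TypeSystem ts    = cas.getTypeSystem();
    Type wordType    = ts.getType("example.gts.Word");        // placeholder type
    Type xrefType    = ts.getType("example.gts.AudioXRef");   // placeholder type
    Feature spelling = wordType.getFeatureByBaseName("spelling");
    Feature textRef  = xrefType.getFeatureByBaseName("textWord");
    Feature audioRef = xrefType.getFeatureByBaseName("audioWord");

    // Pass 1: assemble the transcript; remember each word's character offsets.
    List<AnnotationFS> audioWords = new ArrayList<AnnotationFS>();
    List<int[]> offsets = new ArrayList<int[]>();
    StringBuilder text = new StringBuilder();
    for (AnnotationFS w : audioView.getAnnotationIndex(wordType)) {
      String s = w.getStringValue(spelling);   // the word's surface form
      offsets.add(new int[] { text.length(), text.length() + s.length() });
      audioWords.add(w);
      text.append(s).append(' ');
    }
    textView.setDocumentText(text.toString().trim());

    // Pass 2: create each word in the text view, plus a cross-reference
    // feature structure pointing back to the time-aligned audio word.
    for (int i = 0; i < audioWords.size(); i++) {
      AnnotationFS textWord =
          textView.createAnnotation(wordType, offsets.get(i)[0], offsets.get(i)[1]);
      textView.addFsToIndexes(textWord);
      FeatureStructure xref = textView.createFS(xrefType);
      xref.setFeatureValue(textRef, textWord);
      xref.setFeatureValue(audioRef, audioWords.get(i));
      textView.addFsToIndexes(xref);
    }
  }
}

Downstream analytics then work only on the view for their own modality, and the cross-reference annotations are what let any later result be traced back, view by view, to the original audio.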
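The story-boundary resegmenter follows the standard UIMA CAS Multiplier pattern (JCasMultiplier_ImplBase). The skeleton below shows only the plumbing: the boundary marker is a stand-in for real story-boundary detection, it buffers plain text rather than audio, and a real component would also copy the relevant sofas and cross-references into each new CAS (e.g. with org.apache.uima.util.CasCopier).

import java.util.ArrayDeque;
import java.util.Deque;

import org.apache.uima.analysis_component.JCasMultiplier_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.AbstractCas;
import org.apache.uima.jcas.JCas;

public class StoryBoundarySegmenter extends JCasMultiplier_ImplBase {

  private static final String BOUNDARY = "<STORY_BOUNDARY>"; // stand-in marker

  private final Deque<String> pendingStories = new ArrayDeque<String>();
  private final StringBuilder buffered = new StringBuilder();

  /** Buffer each fixed-length input segment and cut it into stories. */
  @Override
  public void process(JCas jcas) throws AnalysisEngineProcessException {
    buffered.append(jcas.getDocumentText());
    int cut;
    while ((cut = buffered.indexOf(BOUNDARY)) >= 0) {
      pendingStories.add(buffered.substring(0, cut));
      buffered.delete(0, cut + BOUNDARY.length());
    }
    // A real component would flush any trailing buffered material in
    // collectionProcessComplete().
  }

  @Override
  public boolean hasNext() {
    return !pendingStories.isEmpty();
  }

  /** Emit one new CAS per complete story. */
  @Override
  public AbstractCas next() {
    JCas storyCas = getEmptyJCas(); // framework call for obtaining a fresh CAS
    storyCas.setDocumentText(pendingStories.poll());
    return storyCas;
  }
}

Note that a CAS Multiplier's descriptor must declare <outputsNewCASes>true</outputsNewCASes> in its operationalProperties, or the framework will never call hasNext()/next().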
We welcome any comments or suggestions or questions! - Burn.