Andrew,
Let me address the question about Python and Perl. The UIMA interfaces to these languages (and Tcl) were created to support speech-to-text analytics whose top-level interface used scripting languages as glue code to drive complex C++ modules. The motivation was to be able to easily UIMA-fy existing analytics in these languages for a research application. The CAS interface code used was not much more than that given in the scriptator samples, which demonstrate both creating and accessing annotations.
The overhead of moving the CAS back and forth between Java and the native environment is virtually negligible except for the most lightweight C++ analytics. I am not sure this would ever be a factor for these interpreted languages. For those applications that cannot or do not want to use the JNI, I expect that native C++ service wrappers will be available.
Regards,
Eddie
On 6/6/07, Andrew Borthwick <[EMAIL PROTECTED]> wrote:
All,
I'm studying whether it would make sense to add UIMA to my company's architecture and would appreciate it if anyone could point me to either of the following. We are considering using UIMA as a framework on which to build a web-scale NLP pipeline which would involve components like sentence boundary identification, tokenization, named entity identification, and phrase chunking.
1. Substantive examples of Python or Perl annotators, preferably something that both consumes and produces annotations.
2. More generally, could anybody point me to the use of UIMA in fielded industrial applications outside of IBM? I'd be particularly interested in talking to someone who had evaluated various alternatives and decided to go with UIMA. Hearing from anybody who decided to go with a different framework after evaluating UIMA would be helpful too.
Thanks,
Andrew Borthwick
Principal Scientist
Spock Networks
