Frank,

Your ANTLR-based approach sounds interesting. I'd like to see the paper, and I'd be interested in seeing some demo code too.

This whole area of needing JAPE-like functionality for UIMA is a critical issue, as far as I'm concerned. The lack of support for writing regexes over annotations was one of two key reasons that my company decided to go with GATE over UIMA a year ago. (The other was that GATE can read in an HTML document and convert the HTML markup into GATE annotations, which is a key feature for working with web documents.) Although we like UIMA's infrastructure and array of ML toolkits, we would need to see some kind of solid regex functionality before we could consider starting to develop in UIMA.

Regards,
Andrew Borthwick

On Wed, May 21, 2008 at 9:51 AM, Frank Schilder <[EMAIL PROTECTED]> wrote:

> >>> If not, does anything like this exist for UIMA right now or is anything in
> >>> the works?
> >>
> >> I know of several proprietary ones, but nothing open source. It
> >> would be nice to have something like Jape in UIMA.
> >
> > well, I wrote an annotator that uses Jape.
>
> We have been using ANTLR (www.antlr.org) for writing grammars that detect,
> for example, temporal and monetary expressions. The integration of an ANTLR
> lexer and parser into UIMA was fairly straightforward. We based our
> integration on a posting that explains the interfacing of StAX with ANTLR:
> http://www.antlr.org/wiki/display/ANTLR3/Interfacing+StAX+to+ANTLR
>
> ANTLR grammars are written in EBNF and can be compiled into different
> programming languages (e.g. Java, C, C#). The ANTLR grammar can also contain
> Java code, if you want to manipulate other objects (e.g. adding annotations
> to the CAS) while parsing the input.
>
> You can write an ANTLR grammar, add Java code to it and compile everything
> into a Java class. This Java class can then be used by your AE in UIMA.
>
> We experimented with lexers and parsers in ANTLR:
>
> 1) a lexer in ANTLR can be set to be a scanner that scans an input string
> for expressions defined within EBNF
> 2) a parser expects a stream of ANTLR tokens. A stream of ANTLR tokens can
> be constructed from UIMA annotations (see integration of StAX events into
> ANTLR). Such a grammar can detect more complex structures consisting of
> basic (UIMA) annotations.
>
> The grammar formalism used by ANTLR is LL(*), which is more flexible than
> LL(k). We found the grammars we wrote are much faster than the Jape grammars
> we also used within UIMA. You're more constrained by the LL(*) formalism in
> writing rules, but ANTLRWorks is a useful GUI development environment that
> alerts you to ambiguous rules.
> http://www.antlr.org/works/index.html
>
> BTW: This work will also be discussed as part of our paper at the LREC UIMA
> workshop next week.
> http://watchtower.coling.uni-jena.de/~coling/uimaws_lrec2008/
>
> Frank
>
> > There are some limits:
> > - it's impossible to create (in jape) an annotation that references
> > another annotation, that's easy to do in uima (pseudo code):
> >     Lemma lemma = new Lemma(cas);
> >     Token token = new Token(cas);
> >     token.setLemma(lemma);
> > - the annotator is packaged as a PEAR that includes ALL the GATE jars...
> > - if the annotator is deployed in a web context, only the precompiled
> > grammars are working: I think it's a class loading problem: the PEAR
> > is loaded by a class loader, the UIMA framework is deployed inside a
> > web context that is under another class loader.... and so on....
> > - performance: the reverse mapping from GATE to UIMA is slow: updating
> > the existing annotations means scanning all the annotations in the CAS and
> > checking each feature to see if it has changed (well, if the grammar doesn't
> > update anything, the updates could be excluded)
> >
> > I want to open the annotator, but at the moment I don't have the
> > permission to do that.
> >
> > But the best would be to have a JAPE clone, or something better,
> > that uses UIMA directly.
> > I want to take a look at the BSFAnnotator to understand if it could be
> > useful.
> >
> > cheers,
> > Roberto
> >
> > --
> > Roberto Franchini
> > CELI s.r.l. (http://www.celi.it) - C.so Moncalieri 21 - 10131 Torino - ITALY
> > Tel +39-011-6600814 - Fax +39-011-6600687
> > jabber:[EMAIL PROTECTED] skype:ro.franchini

--
Andrew Borthwick, Ph.D. | SPOCK Networks
Spock is Hiring! www.spock.com/jobs
P.S. We pay a $5,000 referral fee for anyone we hire
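
For readers new to the thread, here is a rough illustration of what "regexes over annotations" can mean in practice. It is only a sketch and is not how JAPE, ANTLR, or UIMA implement anything: each annotation type is mapped to one character, the annotation sequence becomes a string, and java.util.regex does the matching. The Span class, the type names (Token, Currency, Number) and the one-letter alphabet are all made up for the example; in a real annotator the spans would come from the CAS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnnotationRegexSketch {

    /** Hypothetical stand-in for a framework annotation: a type name plus character offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Encodes each span's type as one character, runs the regex over that string,
     * and returns {begin, end} text offsets for every non-empty match.
     * Types missing from the alphabet are encoded as '?'.
     */
    static List<int[]> match(List<Span> spans, Map<String, Character> alphabet, String typePattern) {
        StringBuilder encoded = new StringBuilder();
        for (Span s : spans) {
            Character c = alphabet.get(s.type);
            encoded.append(c != null ? c.charValue() : '?');
        }
        Matcher m = Pattern.compile(typePattern).matcher(encoded);
        List<int[]> hits = new ArrayList<int[]>();
        while (m.find()) {
            if (m.start() == m.end()) {
                continue; // ignore empty matches
            }
            // Match positions index into the span list, one character per span.
            hits.add(new int[] { spans.get(m.start()).begin, spans.get(m.end() - 1).end });
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "pay $5,000 fee";
        List<Span> spans = Arrays.asList(
                new Span("Token", 0, 3),
                new Span("Currency", 4, 5),
                new Span("Number", 5, 10),
                new Span("Token", 11, 14));

        Map<String, Character> alphabet = new HashMap<String, Character>();
        alphabet.put("Token", 'T');
        alphabet.put("Currency", 'C');
        alphabet.put("Number", 'N');

        // "CN+" reads as: one Currency annotation followed by one or more Number annotations.
        for (int[] hit : match(spans, alphabet, "CN+")) {
            System.out.println(text.substring(hit[0], hit[1])); // prints "$5,000"
        }
    }
}
```

This trick only looks at annotation types in document order. Constraints on feature values, overlapping annotations, and rule actions that create new annotations are exactly the parts a JAPE-like layer would still have to provide.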
