>> 
>>> 
>>> If not, does anything like this exist for UIMA right now or is anything in
>>> the works?
>> 
>> I know of several proprietary ones, but nothing open source.  It
>> would be nice to have something like Jape in UIMA.
>> 
> 
> well, I wrote an annotator that uses Jape.
> 

We have been using ANTLR (www.antlr.org) for writing grammars that detect,
for example, temporal and monetary expressions. The integration of an ANTLR
lexer and parser into UIMA was fairly straightforward. We based our
integration on a posting that explains how to interface StAX with ANTLR:
http://www.antlr.org/wiki/display/ANTLR3/Interfacing+StAX+to+ANTLR

ANTLR grammars are written in EBNF and can be compiled into different
programming languages (e.g. Java, C, C#). The ANTLR grammar can also contain
Java code, if you want to manipulate other objects (e.g. adding annotations
to the CAS) while parsing the input.

You can write an ANTLR grammar, add Java code to it and compile everything
into a Java class. This Java class can then be used by your AE in UIMA.

We experimented with lexers and parsers in ANTLR:

1) a lexer in ANTLR can be set to be a scanner that scans an input string
for expressions defined within EBNF
2) a parser expects a stream of ANTLR tokens. A stream of ANTLR tokens can
be constructed from UIMA annotations (see integration of StAX events into
ANTLR). Such a grammar can detect more complex structures consisting of
basic (UIMA) annotations.
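
As an illustration of point 2, here is a minimal sketch in plain Java. It is
not ANTLR-generated code and does not use the UIMA API: the Span class is a
hypothetical stand-in for a UIMA annotation (type plus offsets), and the
hand-written matcher stands in for a parser generated from a rule such as
"money : CURRENCY NUMBER MAGNITUDE? ;". The type names and the rule itself
are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-written matcher for the illustrative rule:
//   money : CURRENCY NUMBER MAGNITUDE? ;
final class MoneyScanner {

    // Hypothetical stand-in for a UIMA annotation: a type name plus offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    // Scans the token stream left to right; returns one Money span per match.
    static List<Span> scan(List<Span> tokens) {
        List<Span> found = new ArrayList<Span>();
        int i = 0;
        while (i < tokens.size()) {
            int next = matchMoney(tokens, i, found);
            // If the rule did not match here, skip one token and try again.
            i = (next > i) ? next : i + 1;
        }
        return found;
    }

    // Tries the rule at position i. Returns the position after the match,
    // or i itself if the rule does not apply at this position.
    private static int matchMoney(List<Span> tokens, int i, List<Span> out) {
        if (!isType(tokens, i, "Currency") || !isType(tokens, i + 1, "Number")) {
            return i;
        }
        int last = i + 1;
        if (isType(tokens, last + 1, "Magnitude")) { // optional element
            last++;
        }
        // The composite span covers the first through the last matched token.
        out.add(new Span("Money", tokens.get(i).begin, tokens.get(last).end));
        return last + 1;
    }

    private static boolean isType(List<Span> tokens, int i, String type) {
        return i < tokens.size() && tokens.get(i).type.equals(type);
    }

    public static void main(String[] args) {
        // Offsets correspond to the text "paid $ 5 million on Friday".
        List<Span> tokens = Arrays.asList(
            new Span("Word", 0, 4),
            new Span("Currency", 5, 6),
            new Span("Number", 7, 8),
            new Span("Magnitude", 9, 16),
            new Span("Word", 17, 19),
            new Span("Word", 20, 26));
        System.out.println(scan(tokens)); // prints [Money[5,16)]
    }
}
```

In a real annotator the input list would be filled from the annotations
already in the CAS, and each resulting span would be added back to the CAS
as a new annotation.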


The grammar formalism used by ANTLR is LL(*), which is more flexible than
LL(k). We found that the grammars we wrote are much faster than the Jape
grammars we also used within UIMA. You're more constrained by the LL(*)
formalism in writing rules, but ANTLRWorks is a useful GUI development
environment that alerts you to ambiguous rules:
http://www.antlr.org/works/index.html

BTW: This work will also be discussed as part of our paper at the LREC UIMA
workshop next week.
http://watchtower.coling.uni-jena.de/~coling/uimaws_lrec2008/

Frank




> 
> There are some limits:
> - it's impossible to create (in Jape) an annotation that refers to
> another annotation, which is easy to do in UIMA (pseudo code):
> Lemma lemma = new Lemma(cas);
> Token token = new Token(cas);
> token.setLemma(lemma);
> - the annotator is packaged as a PEAR that includes ALL the GATE jars...
> - if the annotator is deployed in a web context, only the precompiled
> grammars work. I think it's a class loading problem: the PEAR is
> loaded by one class loader, while the UIMA framework is deployed inside
> a web context that is under another class loader, and so on.
> - performance: the reverse mapping from GATE to UIMA is slow: updating
> the existing annotations means scanning all the annotations in the CAS
> and checking each feature to see whether it has changed (well, if the
> grammar doesn't update anything, the updates could be skipped)
> 
> I want to open-source the annotator, but at the moment I don't have
> permission to do that.
> 
> But the best option would be to have a JAPE clone, or something better,
> that uses UIMA directly.
> I want to take a look at the BSFAnnotator to see whether it could be
> useful.
> 
> cheers,
> Roberto
> 
> -- 
> Roberto Franchini
> CELI s.r.l. (http://www.celi.it) - C.so Moncalieri 21 - 10131 Torino - ITALY
> Tel +39-011-6600814 - Fax +39-011-6600687
> jabber:[EMAIL PROTECTED] skype:ro.franchini
