On 6/22/11 10:45 AM, Olivier Grisel wrote:
I find the UIMA CAS API much more complicated to work with than
directly working with token-level concepts with the OpenNLP API (i.e.
with arrays of Span). I haven't had a look at the opennlp-uima
subproject though: you probably already have tooling and predefined
type systems that make interoperability with CAS instances less of a
pain.
If you look at annotation tools, they usually give the user some
flexibility in terms of what kinds of annotations they are allowed to
add. One thing I always see is that as soon as they allow more complex
annotations, the tools and the code that handles the annotations also
get more complex. Have a look at WordFreak or GATE.
The CAS might be difficult to use at first, but at least it works and
is very well tested. If we create a custom solution we might end up
with similar complexity anyway.
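For comparison, here is a rough, untested sketch of the two styles.
The OpenNLP half only needs Span and SimpleTokenizer; for the UIMA
half I use uimaFIT's JCasFactory purely to get a JCas without writing
an XML descriptor, so that dependency is just a convenience for the
example, not something either project requires.

import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;

import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class SpanVsCasSketch {

    public static void main(String[] args) throws Exception {
        String text = "Pierre Vinken joined the board.";

        // OpenNLP style: plain arrays of Span, nothing else needed.
        Span[] tokens = SimpleTokenizer.INSTANCE.tokenizePos(text);
        for (Span t : tokens) {
            System.out.println(text.substring(t.getStart(), t.getEnd()));
        }

        // UIMA style: the same offsets stored as indexed feature
        // structures inside a CAS, typed against a type system.
        JCas cas = JCasFactory.createJCas();
        cas.setDocumentText(text);
        for (Span t : tokens) {
            new Annotation(cas, t.getStart(), t.getEnd()).addToIndexes();
        }
        System.out.println(cas.getAnnotationIndex().size()
            + " feature structures indexed");
    }
}

The Span version is clearly less ceremony, but the CAS version is what
gives us the indexed, typed annotations the tooling needs.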
We would need to define a type system, but that is something we need
to do anyway, independent of how we implement it. Maybe we even need
to support different type systems for different corpora. I guess we
will start with Wikipedia-based data, but one day we might want to
annotate an email or blog corpus.
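Per corpus that could be as small as a factory for a
TypeSystemDescription, something along these lines (the type and
package names are placeholders, not a proposal):

import org.apache.uima.UIMAFramework;
import org.apache.uima.resource.metadata.TypeDescription;
import org.apache.uima.resource.metadata.TypeSystemDescription;

public class CorpusTypeSystems {

    // Builds a tiny corpus-specific type system; names are placeholders.
    static TypeSystemDescription forCorpus(String prefix) {
        TypeSystemDescription tsd =
            UIMAFramework.getResourceSpecifierFactory().createTypeSystemDescription();

        // Sentence and named entity types, inheriting begin/end offsets
        // from the built-in uima.tcas.Annotation type.
        tsd.addType(prefix + ".Sentence", "A sentence", "uima.tcas.Annotation");
        TypeDescription ne = tsd.addType(prefix + ".NamedEntity",
            "A named entity", "uima.tcas.Annotation");
        ne.addFeature("kind", "person, location, organization, ...",
            "uima.cas.String");
        return tsd;
    }

    public static void main(String[] args) {
        // One type system per corpus, e.g. wikipedia now, email or blog later.
        TypeSystemDescription wikipedia = forCorpus("org.example.wikipedia");
        TypeSystemDescription email = forCorpus("org.example.email");
        System.out.println(wikipedia.getTypes().length + " + "
            + email.getTypes().length + " types");
    }
}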
It is an interesting question how the type system should look, since
we need to track where the annotations come from, might even want some
of them to be double-checked, and may need to annotate disagreement
between annotators.
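One way to express that is to put provenance and review state directly
into the features, roughly like this. The names annotatorId,
reviewStatus and Disagreement are only illustrations of the idea, not
a proposal for the final type system:

import org.apache.uima.UIMAFramework;
import org.apache.uima.resource.metadata.TypeDescription;
import org.apache.uima.resource.metadata.TypeSystemDescription;

public class ProvenanceSketch {

    public static void main(String[] args) {
        TypeSystemDescription tsd =
            UIMAFramework.getResourceSpecifierFactory().createTypeSystemDescription();

        // Every annotation records who created it and whether it still
        // needs a second pair of eyes.
        TypeDescription ne = tsd.addType("org.example.NamedEntity",
            "Named entity with provenance", "uima.tcas.Annotation");
        ne.addFeature("annotatorId", "human or tool that produced it",
            "uima.cas.String");
        ne.addFeature("reviewStatus", "unchecked, confirmed or rejected",
            "uima.cas.String");

        // Disagreement between annotators is itself an annotation that
        // points at the two conflicting ones.
        TypeDescription dis = tsd.addType("org.example.Disagreement",
            "Conflicting annotations over the same span", "uima.tcas.Annotation");
        dis.addFeature("first", "first conflicting annotation",
            "org.example.NamedEntity");
        dis.addFeature("second", "second conflicting annotation",
            "org.example.NamedEntity");

        System.out.println(tsd.getTypes().length + " types defined");
    }
}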
Jörn