Thanks a lot for your detailed reply, again extremely useful and
interesting!
(I have put some responses inline)

On Thu, 18 Feb 2021 at 22:14, Richard Eckart de Castilho <r...@apache.org>
wrote:

>
> There is GATE of course ;) Although as far as I understood, GATE like UIMA
> intentionally does not prescribe a particular annotation schema / encoding
> so that it remains as flexible as possible to its users.
>

Yes - I have been both a user and a developer of GATE, and from my own
personal POV this flexibility is both a blessing and a curse. In addition,
GATE does not support ordering of annotations over the same span or of
zero-length annotations, so one would have to deal with this too.
This is why the new Python GateNLP package supports everything necessary
to do this (ordered annotations over the same or a zero-length span, and
proper support of zero-length annotations) and comes with a pre-implemented
convention for how to represent MWTs: if the token has at least as many
characters as there are words, the words are represented as annotations
over evenly divided character ranges; otherwise the last words, which do
not fit, become ordered zero-length annotations.

This is not enforced but treated as a useful convention -- the more tools
follow the convention, the easier they will be able to interact.
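
To make the MWT convention a bit more concrete, here is a minimal sketch in
plain Python. It does not use the GateNLP API; the function name
mwt_word_spans and the rule for distributing leftover characters (the first
words get one extra character each) are my own assumptions for illustration.
It reads the convention as: if the token has at least as many characters as
words, split the range evenly; otherwise give one character to each word
that still fits and place the remaining words as zero-length spans at the
token end, in list order.

```python
from typing import List, Tuple


def mwt_word_spans(start: int, end: int, n_words: int) -> List[Tuple[int, int]]:
    """Return one (start, end) character span per word of a multi-word token.

    Hypothetical helper, not part of GateNLP. The list order is the word
    order, which is what disambiguates spans that share the same offsets.
    """
    if n_words <= 0 or end < start:
        raise ValueError("need at least one word and a non-negative token length")
    n_chars = end - start
    spans: List[Tuple[int, int]] = []
    if n_words <= n_chars:
        # Enough characters: divide the range as evenly as possible.
        # Assumption: the first `extra` words get one additional character.
        base, extra = divmod(n_chars, n_words)
        pos = start
        for i in range(n_words):
            width = base + (1 if i < extra else 0)
            spans.append((pos, pos + width))
            pos += width
    else:
        # More words than characters: one character per word while they fit,
        # then ordered zero-length spans at the end of the token.
        for i in range(n_chars):
            spans.append((start + i, start + i + 1))
        spans.extend([(end, end)] * (n_words - n_chars))
    return spans


if __name__ == "__main__":
    # A 6-character token split into 3 words, e.g. "dámelo" -> "da", "me", "lo".
    print(mwt_word_spans(0, 6, 3))
    # A 2-character token carrying 3 words: the third word is zero-length.
    print(mwt_word_spans(10, 12, 3))
```

Because the trailing zero-length spans all share the same offsets, a
consumer has to rely on annotation order to tell them apart, which is why
ordered annotations are needed in the first place.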


> Nancy Ide [1] has done a lot of work on interoperability in the NLP
> space. One of the recent projects she is involved in is the LAPPS Grid [2]
> which includes a JSON-based data format, a schema, and a whole processing
> platform including components. The LAPPS Grid also integrates third-party
> components such as GATE or DKPro Core.
>

This is a very interesting pointer, thank you!


>
> In Germany, there is the Weblicht [3] platform of CLARIN-D. They have the
> XML-based TCF format for representing their stuff.
>
> In the Netherlands, there is CLARIAH [4]. They have the XML-based FoLiA [5]
> and a lot of stuff building on that, e.g. CLAM [6].
>
> From the semantic web space, there is the RDF-based NIF [7].
>
> ... and these are just the ones I remember off the top of my head.
>
> If you follow these references and do a bit of digging, you will probably
> find much more.
>
> However, doing a fine-grained comparison between all of these to distill
> commonalities and differences is quite a daunting task. Been there, done
> that - as you say - that is a place few people dare to venture.
>
>
Thanks for all the pointers - you are right, it is quite daunting!

All the best,
  Johann



> Cheers,
>
> -- Richard
>
> [1] https://scholar.google.de/citations?hl=de&user=WkfhlGkAAAAJ
> [2] https://www.lappsgrid.org
> [3] https://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/Main_Page
> [4] https://www.clariah.nl
> [5] https://pypi.org/project/FoLiA/
> [6] https://clam.readthedocs.io/en/latest/installation.html
> [7] https://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html
