Dear list,
this is an exploratory email to find possible intersections and points
for collaboration. I say "exploratory" because I have a rough idea of
what Stanbol is, but I don't know enough yet to wrap my head around it
completely and pinpoint the overlaps.
This is why I will try to give a brief outline of what we have been
working on in the LOD2 EU project, which direction it is going and
what my initial ideas are. Feedback is very welcome. I am not a
Stanbol expert yet, so I might be off track ;)
Over the last year, we have been working on the NLP Interchange Format (NIF).
NIF is an RDF/OWL-based format that aims to achieve interoperability
between Natural Language Processing (NLP) tools, language resources
and annotations.
What NIF currently is:
1. In Sept. 2011, we published the specification 1.0:
http://nlp2rdf.org/nif-1-0 . We know of about 8-12 implementations
(see the demo at 5.).
2. One of the latest draft papers about it can be found here:
http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
3. The basic idea is to use # fragments to give URIs to strings, e.g.:
http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729
represents the first occurrence of "Semantic Web" in
http://www.w3.org/DesignIssues/LinkedData.html
Of course, you can then use this URI as subject and add any annotation
you want.
e.g.:
:offset_717_729 its:mentions dbpedia:Semantic_Web .
4. There is a Web annotator making use of the Hash URI scheme of NIF:
http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A//www.w3.org/DesignIssues/LinkedData.html%23frag_65b9eea6e1cc6bb9f0cd2a47751a186f
5. There is a demonstrator (will be much nicer in a couple of days):
http://nlp2rdf.lod2.eu/demo.php
A version with eye candy, but a minor bug: http://nlp2rdf.lod2.eu/demo_new.php
6. Apart from that, NIF also tries to find best practices for
annotation, e.g. OLiA identifiers for part-of-speech tags
http://www.sfb632.uni-potsdam.de/~chiarcos/ontologies.xml , NERD or
the lemon model.
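
To make item 3 more concrete, here is a minimal Python sketch of the
offset-based fragment idea. It only follows the short form shown above
(#offset_begin_end) and assumes zero-based character offsets with an
exclusive end, which matches 729 - 717 = 12 characters for "Semantic
Web". The function names and the filler text are hypothetical, and the
its: and dbpedia: prefixes are copied from the example without being
declared; the NIF 1.0 specification remains the authority on the exact
URI schemes.

```python
def offset_uri(document_uri: str, text: str, phrase: str) -> str:
    """Build an offset-based fragment URI for the first occurrence of phrase."""
    begin = text.find(phrase)  # zero-based index of the first occurrence
    if begin < 0:
        raise ValueError(f"{phrase!r} does not occur in the text")
    end = begin + len(phrase)  # exclusive end, so end - begin == len(phrase)
    return f"{document_uri}#offset_{begin}_{end}"


def mention_statement(string_uri: str, entity: str) -> str:
    """Format one Turtle-style statement with the string URI as subject.

    The its: and dbpedia: prefixes are taken from the example in this
    email and are left undeclared in this sketch.
    """
    return f"<{string_uri}> its:mentions {entity} ."


if __name__ == "__main__":
    # Hypothetical stand-in text: 717 filler characters, then the phrase.
    text = "x" * 717 + "Semantic Web is mentioned here, and Semantic Web again."
    uri = offset_uri(
        "http://www.w3.org/DesignIssues/LinkedData.html", text, "Semantic Web"
    )
    print(uri)
    print(mention_statement(uri, "dbpedia:Semantic_Web"))
```

Because the subject is an ordinary URI, further annotations on the
same string are just additional statements with that URI as subject.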
What is planned for NIF:
a) A new spec, NIF 2.0, within this year. Discussion will take place
on this mailing list:
http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf
NIF will be simplified (simpler URI schemes and annotations),
consolidated (better implementations) and extended (ability to express
confidence values, string sets, etc.)
b) We plan to have implementations for NERD http://nerd.eurecom.fr ,
DBpedia Spotlight, Zemanta.com and DKPro
http://www.ukp.tu-darmstadt.de/research/current-projects/dkpro/
c) Inclusion of XPointer as a NIF URI scheme and creation of a mapping
to "string URIs". This should somehow be compatible with the
Internationalisation Tag Set (ITS) 2.0 http://www.w3.org/TR/its20/ ,
but we are still working together on a bidirectional bridge. There
has been plenty of discussion, partly in this thread:
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html
d) NIF should be compatible with PROV-AQ: Provenance Access and Query
http://www.w3.org/TR/2012/WD-prov-aq-20120619/
What I am hoping for, and my ideas about how Stanbol and NIF overlap:
I) Reading your documentation, you guys seem to be able to provide
very good use cases and feedback for NIF 2.0. We would really like to
include that and also tailor NIF 2.0 to your needs. We are currently
setting up a Wiki (still ugly, sorry): http://wiki.nlp2rdf.org/
Please mail me for accounts.
II) I would assume that you need some OWL model for all the enhancer
output. NIF standardizes NLP tool output and tries to be blank-node
free and lightweight, but still as expressive as possible. For you
this could save real time, as ontology modelling is tedious. By
reusing NIF you would get a free data model and spec, and you could
focus on the implementation of the Stanbol engine.
I got a 404 on
http://incubator.apache.org/enhancer/enhancementstructure.html
I read "fise" somewhere. What is it? How does it compare to NIF? What
URIs do you use? How many triples do you have per annotation?
III) With NIF we focused on the RDF output for tools, not on the
workflow. Stanbol seems to focus on the workflow as well, right? It
might be easy to implement a NIF engine with Stanbol. This could be a
good showcase for NIF and Stanbol. With a Debian package, we could
include Stanbol in the LOD2 Stack http://stack.lod2.eu/
Sorry for the long email. Please give some feedback about your ideas.
I am also willing to answer questions and provide examples.
All the best,
Sebastian
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org