Dear list,
this is an exploratory email to find possible intersections and points for collaboration. I say "exploratory" because I have a rough idea of what Stanbol is, but I don't know enough yet to wrap my head around it completely and pinpoint the overlaps. This is why I will try to give a brief outline of what we have been working on in the LOD2 EU project, which direction it is going in, and what my initial ideas are. Feedback is very welcome. I am not a Stanbol expert yet, so I might be off track ;)

Over the past year, we have been working on the NLP Interchange Format (NIF).
NIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.

What NIF currently is:
1. In Sept. 2011, we published the specification 1.0: http://nlp2rdf.org/nif-1-0 . There are about 8-12 implementations that we know of (see the demo at 5.).
2. One of the latest draft papers about it can be found here: http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
3. The basic idea is to use # fragments to give URIs to strings, e.g.:
http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729 represents the first occurrence of "Semantic Web" in http://www.w3.org/DesignIssues/LinkedData.html
Of course, you can then use this URI as a subject and add any annotation you want, e.g.:
:offset_717_729 its:mentions dbpedia:Semantic_Web .
4. There is a Web annotator that makes use of the hash URI scheme of NIF:
http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A//www.w3.org/DesignIssues/LinkedData.html%23frag_65b9eea6e1cc6bb9f0cd2a47751a186f
5. There is a demonstrator (will be much nicer in a couple of days): http://nlp2rdf.lod2.eu/demo.php
A version with eye candy, but with a minor bug: http://nlp2rdf.lod2.eu/demo_new.php
6. Apart from that, NIF also tries to find best practices for annotation, e.g. OLiA identifiers for part-of-speech tags http://www.sfb632.uni-potsdam.de/~chiarcos/ontologies.xml , NERD, or the lemon model.
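
To make the offset-based URI idea from point 3 more concrete, here is a minimal sketch in Python. It is not taken from the NIF specification or any NIF implementation: the function names, the example.org document URI and the sample sentence are made up for illustration, and I am assuming that the two numbers in the fragment are the begin offset and the (exclusive) end offset, which matches "Semantic Web" being 12 characters long (729 - 717 = 12).

```python
def offset_uri(document_uri: str, text: str, phrase: str) -> str:
    """Build an offset-style string URI for the first occurrence of `phrase`.

    Assumption: the fragment has the form offset_<begin>_<end>, with <end>
    exclusive, as suggested by the #offset_717_729 example in this email.
    Raises ValueError if the phrase does not occur in the text.
    """
    begin = text.index(phrase)   # first occurrence only
    end = begin + len(phrase)    # exclusive end offset
    return f"{document_uri}#offset_{begin}_{end}"


def mention_statement(string_uri: str, entity: str) -> str:
    """Format one Turtle-like statement using the prefixes from the email.

    The prefix declarations for its: and dbpedia: are left out here.
    """
    return f"<{string_uri}> its:mentions {entity} ."


# Hypothetical document and text, not the real LinkedData.html page.
doc = "http://example.org/doc.html"
text = "The Semantic Web is a web of data."

uri = offset_uri(doc, text, "Semantic Web")
print(uri)
print(mention_statement(uri, "dbpedia:Semantic_Web"))
```

The point of the sketch is only that the string URI can be derived from the document URI and the text itself, so two tools annotating the same text should arrive at the same subject URI without having to coordinate.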

What is planned for NIF:
a) A new spec, NIF 2.0, within this year. Discussion will be on this mailing list: http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf NIF will be simplified (simpler URI schemes and annotations), consolidated (better implementations) and extended (ability to express confidence values, string sets, etc.).
b) We plan to have implementations for NERD http://nerd.eurecom.fr , DBpedia Spotlight, Zemanta.com and DKPro http://www.ukp.tu-darmstadt.de/research/current-projects/dkpro/
c) Inclusion of XPointer as a NIF URI scheme and creation of a mapping to "string URIs". This should be compatible with the Internationalisation Tag Set (ITS) 2.0 http://www.w3.org/TR/its20/ , but we are still working together on a bidirectional bridge. There has been a lot of discussion, partly in this thread: http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html
d) NIF should be compatible with PROV-AQ: Provenance Access and Query http://www.w3.org/TR/2012/WD-prov-aq-20120619/

What I am hoping for or my ideas about how Stanbol and NIF overlap:
I) Reading your documentation, you guys seem to be able to provide very good use cases and feedback for NIF 2.0. We would really like to include that and also tailor NIF 2.0 to your needs. We are currently setting up a Wiki (still ugly, sorry): http://wiki.nlp2rdf.org/ Please mail me for accounts.
II) I would assume that you need some OWL model for all the enhancer output. NIF standardizes NLP tool output and it tries to be blank-node free and lightweight, but still as expressive as possible. For you this would mean that you could save a lot of time, as ontology modelling is really tedious. By reusing NIF you would get a free data model and spec, and you could focus on the implementation of the Stanbol engine. I got a 404 on http://incubator.apache.org/enhancer/enhancementstructure.html and I read "fise" somewhere. What is it? How does it compare to NIF? What URIs do you use? How many triples do you have per annotation?
III) With NIF we focused on the RDF output for tools, not on the workflow. Stanbol seems to focus on the workflow as well, right? It might be easy to implement a NIF engine with Stanbol. This could be a good showcase for NIF and Stanbol. With a Debian package, we could include Stanbol in the LOD2 Stack http://stack.lod2.eu/

Sorry for the long email. Please give some feedback and share your ideas.
I am also willing to answer questions and provide examples.
All the best,
Sebastian

--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

