Dear Enrico and all,

I have already had several discussions with Alessandro on the topic of reasoning, 
and would like to use the opportunity to summarise my points and our conclusions 
for consideration here on the mailing list. 

A bit on my background: I gained fairly deep experience in the reasoning 
and logics domain during my PhD (where I developed a rule-based reasoner for 
the Web) and as one of the main persons responsible for the FP6 project REWERSE 
(reasoning on the web with rules and semantics), even though nowadays I am more 
concerned with Linked Data and the Social Semantic Web. Recently, I 
implemented the reasoner of the Linked Media Framework, described here:
http://code.google.com/p/kiwi/wiki/Reasoning

1. Goal of Reasoning in Stanbol
===============================

The first thing that needs to be clarified is: why does Apache Stanbol need 
reasoning and what will it be used for?! Please give me concrete use cases and 
examples ;-)

Depending on these goals, the actual kind of reasoning needs to be decided on. 
Possible kinds of reasoning:

schema validation
----------------- 

This is what you call "check" and this is the typical reasoning applied in the 
Semantic Web domain. I have always found its usefulness rather limited, so it is 
only mildly interesting to me. How often will you really need to check 
whether a model is consistent? Usually only during development, if at all. Even 
worse: when operating on the World Wide Web you will *inevitably* have 
inconsistencies, so it is better to simply live with them and not care too much 
about consistent schemas; worrying about them will only drive you crazy. ;-)

For schema validation, you will probably use one of the existing OWL and 
description logics reasoners like HermiT, FaCT++, Racer, ... but be aware that 
they are all rather complex and inefficient when dealing with bigger datasets, 
as you yourself identified as a problem.


instance classification
-----------------------

Instance classification means, informally, figuring out the type of a resource 
based on the existence of a schema or ontology for the data. Now this is at 
least *a bit* more interesting than schema validation, because it provides you 
with actual new information that you can make use of. It allows you to treat a 
Blog Post as a Document, a Meeting as an Event, etc. On the other hand, any 
programmer can implement this kind of reasoning for you in a couple of hours in 
Java without any need for a reasoner.

For instance classification you can, in the simplest case, use an RDFS 
reasoner (very efficient). OWL reasoners are obviously also capable of this, 
but in practice they are usually not much better than RDFS reasoners, so they 
are not worth the added complexity. Instance classification can furthermore be considered a 
specific case of implicit/rule-based knowledge (what you call "enrichment" 
below).
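
To illustrate how little machinery this needs, here is a rough sketch in 
Python (all names are illustrative, not a Stanbol or Jena API): materializing 
rdf:type statements along the transitive closure of rdfs:subClassOf.

```python
# Instance classification in a few lines: propagate each instance's type
# along the transitive closure of the subclass relation.
# All names are illustrative; this is not a real Stanbol/Jena API.

def classify(subclass_of, types):
    """subclass_of: set of (sub, super) pairs; types: set of
    (instance, type) pairs. Returns the completed set of type pairs."""
    # 1. Transitive closure of the subclass relation (naive fixpoint).
    closure = set(subclass_of)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    # 2. Propagate each instance's type to all superclasses.
    inferred = set(types)
    for (inst, t) in types:
        for (sub, sup) in closure:
            if t == sub:
                inferred.add((inst, sup))
    return inferred

# classify({("BlogPost", "Document")}, {("post1", "BlogPost")})
# now also contains ("post1", "Document").
```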


implicit/rule-based knowledge
-----------------------------

In this case, you extend your factual, static knowledge base (i.e. triples) with 
rules that represent implicit knowledge (e.g. "if a meeting M is located in a 
place P and P has coordinates X and Y, then M also has coordinates X and Y"). 
This kind of reasoning is the classical way of representing knowledge in 
knowledge based systems and adds, in my opinion, real additional value to the 
information, because much human knowledge is actually rule-based. I consider 
this kind of reasoning the most interesting in an information system. 
Rule-based reasoners can cover instance classification as a special case and 
with a few extensions even most parts of consistency checking.
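
The meeting/coordinates rule above can be sketched as naive forward chaining 
over triples (a toy illustration in Python; the predicate names are made up 
and this is not the LMF rule syntax):

```python
# Toy forward chaining for the rule:
#   (M locatedIn P), (P lat X) -> (M lat X)   (likewise for lon).
# Predicate names are illustrative, not from any real vocabulary.

def apply_location_rule(triples):
    """triples: set of (subject, predicate, object) tuples. Returns the
    set extended with coordinates propagated from places to meetings."""
    inferred = set(triples)
    changed = True
    while changed:  # iterate to a fixpoint
        changed = False
        for (m, p1, place) in list(inferred):
            if p1 != "locatedIn":
                continue
            for (s, p2, v) in list(inferred):
                if s == place and p2 in ("lat", "lon"):
                    if (m, p2, v) not in inferred:
                        inferred.add((m, p2, v))
                        changed = True
    return inferred
```

A real rule engine would of course index the triples and evaluate rules 
semi-naively rather than rescanning everything, but the materialization idea 
is exactly this fixpoint.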

You call this kind of reasoning "enrichment", but in my opinion that is not 
quite accurate: this kind of reasoning does not really "enrich" existing 
information, it just explicitly materializes implicit knowledge that is 
already there.

Rule-based reasoning is currently still a bit esoteric in the Semantic Web 
domain (Jena only uses it for implementing RDFS/OWL reasoning), as the 
reasoning topic has been "hijacked" by the Description Logics people. I have 
often criticised this because of its impracticality, but no one would really 
listen ;-) There is the RuleML initiative, which resulted in the specification 
of SWRL, but SWRL is seen as an extension of DL reasoning and thus suffers from 
the inefficiency of these reasoners. Outside the Semantic Web, the most 
successful rule-based reasoning systems are probably Prolog and Datalog, both 
highly efficient but unfortunately not really present in the Java and RDF 
world.

Inside the "Linked Media Framework", which we intend to integrate with Apache 
Stanbol, I have implemented a Datalog-style rule-based reasoner over RDF 
triples that can be evaluated very efficiently (see the link I sent above for 
the specification). Even though this reasoner has a quite restricted 
expressivity, it is still sufficient to cover many useful scenarios.


2. Reasoning API
================

Since reasoning can have so many distinct goals, I am not sure whether it is 
possible or wise to develop a single reasoning API for Stanbol. It simply 
depends on what you want to achieve. For example:

- consistency checking will probably happen in an (in-memory) session, i.e. you 
send your model and want to know whether it is consistent in itself or with the 
existing models, but you won't store the model before it is considered finished

- instance classification and rule-based reasoning will probably operate on a 
more-or-less persistent representation of the data in the triple store, because 
conceptually it means that implicit knowledge is applied to the data to 
explicitly materialize it; 

- reasoners are customizable; with rule-based reasoning you can even 
upload and evaluate your own rule sets on the data; your API would need to be 
very generic to cover this
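
To make that last point concrete: a single API covering all three cases would 
essentially have to shrink to something like the following interface sketch 
(all names here are hypothetical, not an existing Stanbol interface), which 
says very little and pushes all the real semantics into the implementations:

```python
# Hypothetical sketch of a "one size fits all" reasoning API.
# Names (ReasoningService, check, materialize) are made up for
# illustration; this is not an existing Stanbol interface.
from abc import ABC, abstractmethod

class ReasoningService(ABC):
    @abstractmethod
    def check(self, model):
        """Consistency check on an in-memory model; returns True/False."""

    @abstractmethod
    def materialize(self, graph, rules=None):
        """Apply schema knowledge (and optionally custom rule sets) to a
        persistent graph; returns the inferred triples to be stored."""
```

Note how the `rules` parameter only makes sense for rule-based services, and 
`check` only for DL-style services: the generic interface hides more than it 
unifies.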



My two cents to the reasoning discussion ;-)

Greetings, 

Sebastian

-- 
| Dr. Sebastian Schaffert          [email protected]
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group          +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg




On 30.08.2011, at 16:12, Enrico Daga wrote:

> Hello,
> this is to share the developments of the last three weeks around
> STANBOL-185 (Support for Jena-based reasoning services).
> 
> The initial problems were:
> 1) No support for Jena
> 2) The HermiT LGPL dependency (so no default reasoner in the current
> implementation that can be distributed with Stanbol)
> 3) No extensibility for off-the-shelf reasoning services
> 
> All three points are addressed by the current version in
> branches/jena-reasoners (even if a lot of things can be improved).
> 
> What has been implemented?
> * A /serviceapi for ReasoningServices, using SCR, and a REST endpoint
> which automatically publishes all available services (in /web)
> * Base OWLApi and Jena abstract services
> * Jena RDFS, OWL, OWLMini reasoning services
> * HermiT reasoning service (this will be moved out of Stanbol, finally)
> 
> At the end of this e-mail I include the content of the README.txt
> file, for your convenience, which includes some ideas and proposals
> for improvements.
> I would really appreciate it if the community could have a look at it
> and provide some feedback.
> 
> I did my best to make this api as flexible and efficient as possible,
> but I know more work must still be done in this direction (some ideas
> are below).
> 
> I am going to verify the integritycheck demo against this
> implementation; once it is working, I will move the code to /trunk.
> 
> Thank you all
> 
> Enrico
> 
> -------------------------------------
> 
> Description
> =============
> 
> * A serviceapi for ReasoningServices, using SCR
> * Base OWLApi and Jena abstract services
> * Jena RDFS,OWL,OWLMini reasoning services
> * HermiT reasoning service
> 
> * A common REST endpoint at /reasoners with the following preloaded services:
> **    /rdfs
> **    /owl
> **    /owlmini
> **    /owl2
> 
> each can be accessed with one of three tasks: check, enrich, classify;
> for example:
> 
> /reasoners/owl/check     (the Jena OWL service with task check)
> or
> /reasoners/owl2/classify (the HermiT service with task classify)
> 
> Tasks description:
> * check    : is the input consistent? 200 =true, 204 =false
> * classify : return only rdf:type inferences
> * enrich   : return all inferences
> 
> This is how the endpoint behaves:
> 
> GET (same if POST with Content-type: application/x-www-form-urlencoded)
> params:
> * url     // Loads the input from the url
> * target  // (optional) If given, saves the output in the store (TcManager)
> and does not return the stream
> 
> for example:
> $ curl \
> "http://localhost:8080/reasoners/owl2/classify?url=http://xmlns.com/foaf/0.1/"
> 
> POST   [Content-type: multipart/form-data]
> * file       // Loads from the input stream
> * target  // (optional)  If given, save output in the store
> (TcManager) and does not return the stream
> 
> To support inputs from Ontonet and Rules, these additional parameters
> can be sent:
> * scope   // The ID of an Ontonet scope
> * session // The ID of an Ontonet session
> * recipe  // The ID of a recipe from the Rules module (only with
> OWLApi based services)
> 
> Supported output formats:
> All classic RDF types (n3, turtle, rdf+xml) and HTML. For HTML the
> returned statements are provided in Turtle (Jena) or OWL Manchester
> syntax (OWLApi), wrapped in the Stanbol layout. It would be nice to
> have everything in the latter, which is much more readable (todo).
> 
> Todo
> =============
> * Support for the return types json and json-ld (need to write jersey writers)
> * The front service currently returns only inferred statements. It
> would also be useful to have the complete set of input+inferred statements
> * Support for long-running operations. This is crucial for reasoning
> tasks, since they can take some time with large graphs. This is needed
> in general for Stanbol, something like "Stanbol Jobs".
> * Decouple input preparation from the rest endpoint resource, creating
> something like an InputProvider SCR api; each InputProvider is bound
> to a set of additional parameters.
> This would have several benefits:
> ** Removal of the additional optional parameters bound to specific input
> sources from the default rest api (e.g. session, scope, recipe)
> ** Removal of dependencies on ontonet, rules and other modules which are
> not needed for standard usage. They could be implemented as
> InputProvider/s, bound to specific parameters.
> ** Allow the addition of other input sources (for example 'graph',
> 'entity' or 'site')
> * Implement a custom Jena ReasoningService that uses a Jena rules file
> or a Stanbol recipe (once the toJena() functionality in the rules
> module is implemented) from configuration. This could be done as multiple
> SCR instances, as is done now for entityhub sites, for example.
> * Provide a validation report in case of task CHECK (validity check).
> * Implement a progress monitor, relying on the Jena and OWLApi APIs,
> which have this feature, for debugging purposes
> * Implement a benchmark endpoint, relying on OWL Manchester syntax, to
> set up benchmark tests in the style of the one made for the enhancer
> * Implement an OWLlink client reasoning service
> * Implement additional data preparation steps, for example to
> implement a "consistent refactoring" task. For example, given a
> parameter 'refactor=<recipe-id>' the service could refactor the graph
> before executing the task.
> * Implement off-the-shelf reasoning services (for example, targeted to
> resolve only owl:sameAs links)
> 
> General issues
> =============
> The main problem is performance, which decreases as the input data
> grows, in some cases dramatically. This could be tackled (IMHO) in two
> directions:
> * Improve input preparation. In particular, the preparation of input
> from an ontonet scope/session needs to stream the ontologies, in some
> cases (when a url is also provided) twice!, and this has some drawbacks
> for performance.
> * Support long-running operations, i.e. start the process from the REST
> call and then poll its progress through a dedicated endpoint
> 
> Notes (to be known)
> =============
> Differences between Jena and OWLApi services:
> * CHECK has a different meaning depending on the reasoning service
> implementation
> 
> 
> 
> Examples
> =============
> #
> # Basic GET calls to the reasoning services.
> # Send a URL and the service will return the inferred triples
> #
> # Classify the FOAF ontology, getting it from the web using the Jena
> # OWL reasoner, result in turtle
> curl -v -H "Accept: application/turtle" \
> "http://localhost:8080/reasoners/owl/classify?url=http://xmlns.com/foaf/0.1/"
> 
> # Classify the FOAF ontology, getting it from the web using the Jena
> # OWL reasoner, result in n3
> curl -v -H "Accept: text/n3" \
> "http://localhost:8080/reasoners/owl/classify?url=http://xmlns.com/foaf/0.1/"
> 
> # Enrich the FOAF ontology, getting it from the web using the Jena
> # RDFS reasoner, result in rdf/xml
> curl -v -H "Accept: application/rdf+xml" \
> "http://localhost:8080/reasoners/rdfs/enrich?url=http://xmlns.com/foaf/0.1/"
> 
> # Check consistency of the FOAF ontology, getting it from the web
> # using the Jena OWL reasoner
> curl -v \
> "http://localhost:8080/reasoners/owl/check?url=http://xmlns.com/foaf/0.1/"
> 
> # Check consistency of the FOAF ontology, getting it from the web
> # using the HermiT OWL2 reasoner
> curl -v \
> "http://localhost:8080/reasoners/owl2/check?url=http://xmlns.com/foaf/0.1/"
> 
> # Trying with an ontology network (a large ontology composed of a set of
> # small ontologies connected through owl:import statements)
> curl -v \
> "http://localhost:8080/reasoners/owl2/check?url=http://www.cnr.it/ontology/cnr/cnr.owl"
> # or
> curl -v \
> "http://localhost:8080/reasoners/owl2/enrich?url=http://www.cnr.it/ontology/cnr/cnr.owl"
> 
> #
> # POST calls (send a file)
> #
> # Send the foaf.rdf file to a reasoning service and see the output
> # (get it with
> curl -H "Accept: application/rdf+xml" http://xmlns.com/foaf/0.1/ > foaf.rdf
> # )
> curl -X POST -H "Content-type: multipart/form-data" \
> -H "Accept: text/turtle" -F [email protected] \
> "http://localhost:8080/reasoners/rdfs/enrich"
> 
> # Save the output in the triple store instead of returning it
> # >> Add the "target" parameter, with the graph identifier
> curl \
> "http://localhost:8080/reasoners/owl/classify?url=http://xmlns.com/foaf/0.1/&target=example-foaf-inferred"
> # or, posting a file
> curl -X POST -H "Content-type: multipart/form-data" -F [email protected] \
> -F target=example-rdfs-inferences \
> "http://localhost:8080/reasoners/rdfs/enrich"
> 
> 
> 
> -- 
> Enrico Daga
> 
> --
> http://www.enridaga.net
> skype: enri-pan

