Re: Reasoners for RDFS + owl:sameAs: performance, stability & best practices

2018-02-22 Thread Andreas Kahl
Hello Dave and Alexis, 

Thanks for your advice, I will give this a try (especially the
GenericRuleReasoner). This, plus removing all non-whitelisted sameAs
links, could be a good combination.

Best Regards
Andreas



Re: Reasoners for RDFS + owl:sameAs: performance, stability & best practices

2018-02-20 Thread Alexis Armin Huf
Hi Andreas,

I had a similar scenario but, as Dave said, there is no off-the-shelf
reasoner you can simply pick for that. What I did was pick the RDFS and
owl:sameAs rules from the rule files in the Jena source and instantiate
a GenericRuleReasoner with my custom rule file. The docs for how to do
this are here:
https://jena.apache.org/documentation/inference/index.html#rules. Below is
a short walkthrough of how I set this up.

First, build your rules file. Look at the files in
jena-core/src/main/resources/etc, especially rdfs-fb.rules and
owl-fb-mini.rules. Your rule file will look like this:

-> tableAll().

[rdfs7b: (?a rdf:type rdfs:Class) -> (?a rdfs:subClassOf rdfs:Resource)]

[rdfs2:  (?p rdfs:domain ?c) -> [(?x rdf:type ?c) <- (?x ?p ?y)] ]
[rdfs3:  (?p rdfs:range ?c)  -> [(?y rdf:type ?c) <- (?x ?p ?y)] ]
[rdfs5a: (?a rdfs:subPropertyOf ?b), (?b rdfs:subPropertyOf ?c) ->
         (?a rdfs:subPropertyOf ?c)]
[rdfs5b: (?a rdf:type rdf:Property) -> (?a rdfs:subPropertyOf ?a)]
# ... and this goes on ...

# There are a lot of details around owl:sameAs, but you will probably
# need these:
[sameAs1: (?A owl:sameAs ?B) -> (?B owl:sameAs ?A) ]
[sameAs2: (?A owl:sameAs ?B), (?B owl:sameAs ?C) -> (?A owl:sameAs ?C) ]
[equality1: (?X owl:sameAs ?Y), notEqual(?X,?Y) ->
    [(?X ?P ?V) <- (?Y ?P ?V)]
    [(?V ?P ?X) <- (?V ?P ?Y)] ]

Save this file as a resource of your application, parse it, and create a
GenericRuleReasoner, like this:

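import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

// Load the rule file from the classpath and bind the reasoner to the data.
ClassLoader loader = SomeClass.class.getClassLoader();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        loader.getResourceAsStream("rules/rdfs+sameAs.rules"), StandardCharsets.UTF_8))) {
    List<Rule> rules = Rule.parseRules(Rule.rulesParserFromReader(reader));
    GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
    return ModelFactory.createModelForGraph(reasoner.bind(modelThatNeedsReasoning.getGraph()));
}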

Hope that helps!




Re: Reasoners for RDFS + owl:sameAs: performance, stability & best practices

2018-02-20 Thread Dave Reynolds

Hi Andreas,

Jena does not currently have any alternative built-in reasoner for RDFS +
owl:sameAs and I'm not aware of any such "equality reasoner" being in
development. You could try Pellet, which may offer better performance.


In fact, equality reasoning is notoriously expensive in the general case.
The logic is indeed simple, but the cost can blow up easily because it
leads to a combinatorial number of deductions.
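
As a rough illustration of that blow-up, here is a plain-Java sketch (not
Jena's rule engine) that applies only the symmetry and transitivity rules
to a single chain of n aliases and counts the resulting owl:sameAs pairs.
The class and method names are made up for this example, and the chain
shape is an assumption; real data will have a mix of alias-set sizes.

```java
import java.util.HashSet;
import java.util.Set;

/** Counts owl:sameAs pairs produced by naive symmetry + transitivity rules. */
final class SameAsClosure {

    /** Encodes the ordered pair (a, b) of alias indices as one long. */
    private static long pair(int a, int b) {
        return ((long) a << 32) | (b & 0xffffffffL);
    }

    /**
     * Starts from a chain a0 sameAs a1, a1 sameAs a2, ... over n aliases and
     * applies both rules until nothing new is derived.
     * Returns the number of sameAs pairs in the fixpoint.
     */
    static int closureSize(int n) {
        Set<Long> pairs = new HashSet<>();
        for (int i = 0; i + 1 < n; i++) {
            pairs.add(pair(i, i + 1));
        }
        boolean changed = true;
        while (changed) {
            Set<Long> derived = new HashSet<>();
            for (long p : pairs) {
                int a = (int) (p >> 32);
                int b = (int) p;
                derived.add(pair(b, a));              // symmetry
                for (int c = 0; c < n; c++) {         // transitivity
                    if (pairs.contains(pair(b, c))) {
                        derived.add(pair(a, c));
                    }
                }
            }
            changed = pairs.addAll(derived);          // true if anything new was added
        }
        return pairs.size();
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 10, 100}) {
            System.out.println(n + " aliases -> " + closureSize(n) + " sameAs pairs");
        }
    }
}
```

For a chain of n >= 2 aliases the fixpoint holds n * n pairs, because
symmetry plus transitivity also derives the reflexive ones. On top of
that, a rule like equality1 makes every statement about one alias visible
under each of the other aliases.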


Depending on what problem you are trying to solve, your best bet may be
to avoid owl:sameAs reasoning at run time altogether. For example,
in some cases it may be possible to do a pass over the data at ingest
time to identify all aliases and to assert in the model only one
"canonical" URI for each alias equivalence set.

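One possible way to sketch such an ingest-time pass is a union-find
structure over URI strings. This is only an illustration in plain Java:
the class name, the method names and the "lexicographically smallest URI
wins" policy are assumptions made for the example, not something Jena
provides.

```java
import java.util.HashMap;
import java.util.Map;

/** Union-find over URI strings that picks one canonical URI per alias set. */
final class AliasCanonicalizer {
    private final Map<String, String> parent = new HashMap<>();

    /** Returns the current representative of the alias set containing uri. */
    String canonical(String uri) {
        String p = parent.get(uri);
        if (p == null || p.equals(uri)) {
            return uri;                       // unseen URIs are their own canonical form
        }
        String root = canonical(p);
        parent.put(uri, root);                // path compression
        return root;
    }

    /** Records one owl:sameAs link; the lexicographically smaller root wins. */
    void sameAs(String a, String b) {
        String ra = canonical(a);
        String rb = canonical(b);
        if (ra.equals(rb)) {
            return;                           // already in the same alias set
        }
        if (ra.compareTo(rb) < 0) {
            parent.put(rb, ra);
        } else {
            parent.put(ra, rb);
        }
    }

    public static void main(String[] args) {
        AliasCanonicalizer aliases = new AliasCanonicalizer();
        aliases.sameAs("http://example.org/b", "http://example.org/a");
        aliases.sameAs("http://example.org/c", "http://example.org/b");
        System.out.println(aliases.canonical("http://example.org/c"));
    }
}
```

A first pass would feed every owl:sameAs statement into sameAs(); a second
pass would rewrite each subject and object through canonical() before
adding the statement to the model, so that no equality rules are needed at
query time.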

Dave




Reasoners for RDFS + owl:sameAs: performance, stability & best practices

2018-02-19 Thread Andreas Kahl
Hello everyone, 

I am currently developing a small Jena Model that should be able to do
RDFS inferencing plus owl:sameAs. From the documentation I learned that
the minimal Reasoner for that is OWLmini. During development I ran into
severe performance bottlenecks when a runtime model contains too many
owl:sameAs links, and more generally for nearly all models exceeding
1000 Statements. When these bottlenecks occur, most of the tests simply
freeze at some point; sometimes selecting a Statement with a
SimpleSelector consisting of a subject URI, a predicate URI and a null
object takes 20 seconds.
Thread blocking should not be the cause, as I run my integration tests
single-threaded, especially when I am seeing failures.

I could mitigate this by using models without inferencing while
collecting and adding data spidered from the web, and by adding
ontologies last, only where absolutely needed. I also use an internal
whitelist of domains my spider is allowed to fetch data from, so I
remove all owl:sameAs Statements whose object URIs are not on this
whitelist. Finally, in my querying methods, I copy that base model with
the collected data into a fresh model and wrap it in an InfModel:

protected static Model getInfModelFrom(Model model) {
    final long size = model.size();
    LOG.debug("getInfModelFrom: Input size: " + size);
    // Copy only the raw (asserted) statements, never earlier inferences.
    final Model copy = ModelFactory.createDefaultModel();
    copy.add(model instanceof InfModel ? ((InfModel) model).getRawModel() : model);
    // Wrap the copy with the OWL Mini reasoner.
    final InfModel infModel =
            ModelFactory.createInfModel(ReasonerRegistry.getOWLMiniReasoner(), copy);
    return infModel;
}
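
The whitelist check described above could be sketched roughly as follows.
This is plain Java on URI strings, independent of Jena; the class name,
the host-based matching and the example hosts other than d-nb.info are
assumptions for illustration only.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

/** Decides whether an owl:sameAs object URI points at a whitelisted host. */
final class SameAsWhitelist {
    private final Set<String> allowedHosts;

    SameAsWhitelist(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    /** Returns true if the URI parses and its host is on the whitelist. */
    boolean allows(String objectUri) {
        try {
            String host = URI.create(objectUri).getHost();
            return host != null && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return false;                     // malformed URIs are treated as not whitelisted
        }
    }

    public static void main(String[] args) {
        // Hypothetical whitelist; only d-nb.info appears in the original setup.
        SameAsWhitelist whitelist = new SameAsWhitelist(Set.of("d-nb.info", "example.org"));
        System.out.println(whitelist.allows("http://d-nb.info/gnd/placeholder-id"));
        System.out.println(whitelist.allows("http://example.com/person/1"));
    }
}
```

Before building the InfModel, each owl:sameAs Statement whose object URI
fails allows() would be removed from the base model.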

The only Ontology I am using is
http://d-nb.info/standards/elementset/gnd# . 

I suspect that the Reasoner I use is far too heavyweight for the
seemingly simple owl:sameAs. Is there a more basic option that
understands owl:sameAs in addition to RDFS? No other OWL axioms are
needed.
Are there any best practices for inferencing over relatively small
in-memory models of <10,000 Statements (most <5,000 Statements)? I
found some information on the web that a simple 'Equality Reasoner' is
in the works. Would that be a good choice? Will it be available any time
soon?

Thanks for any hints
Andreas