Re: RDFS inference produces invalid data

2019-10-24 Thread Andy Seaborne




On 23/10/2019 07:53, Lorenz Buehmann wrote:

> I'm sure it happens because the predicate of a Triple object is never
> type-checked: any Node object is allowed in predicate position.
>
> What you could do is either post-process the output or encode the check
> directly in the appropriate rules file [1], I guess?
>
> You could use the notBNode() builtin, i.e. (in RDF 1.1 the rule is rdfs7)
>
> [rdfs6: (?a ?p ?b), (?p rdfs:subPropertyOf ?q) -> (?a ?q ?b)]
>
> becomes
>
> [rdfs6: (?a ?p ?b), (?p rdfs:subPropertyOf ?q), notBNode(?q) -> (?a ?q ?b)]
>
> But it's just a guess. I'm sure Andy or Dave have better solutions.
>
> Minor comment:
>
> According to ter Horst [2], RDFS entailment was (is?) incomplete
> because of the blank node restriction:
>
> p  rdfs:subPropertyOf  b .    (1)
> b  rdfs:domain  u .           (2)
> v  p  w .                     (3)
>
> with b being a blank node. These entail
>
> v  rdf:type  u .              (4)
>
> but triple (4) can't be derived, because rule rdfs7 can't be applied to
> (1) and (3) to get the triple
>
> v  b  w .                     (5)
>
> and thus we can't apply rule rdfs2 to (2) and (5). But I guess that's a
> different story, and I think the extended RDF never made it into the W3C
> recommendation.


This is close by:

https://www.w3.org/TR/rdf11-concepts/#section-generalized-rdf



> In Jena this should work though :D


Yes - At the Graph/Triple/Node level there are no fundamental limitations.

In SPARQL, these generalized triples turn up naturally in query pattern 
evaluation and construct templates because variables aren't restricted.


   BIND(BNODE() as ?p)
   ?s ?p ?o .

which is a join on ?p, usually executed as a substitution (a form of
index join).
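
To illustrate the substitution idea, here is a minimal sketch in plain Python. It is not Jena's implementation: the tuple-based graph, the helper names (fresh_bnode, substitute, match, index_join) and the ex: terms are all assumptions made for the example. It shows that substituting a blank node bound to ?p yields a pattern with a blank node in predicate position, i.e. a generalized triple pattern.

```python
import itertools

_counter = itertools.count()


def fresh_bnode():
    # Stand-in for SPARQL's BNODE(): a new label on every call.
    return f"_:b{next(_counter)}"


def is_var(term):
    return term.startswith("?")


def substitute(pattern, binding):
    # Replace every variable that already has a value in the binding.
    return tuple(binding.get(t, t) if is_var(t) else t for t in pattern)


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def index_join(bindings, pattern, graph):
    # For each left-hand binding, substitute it into the pattern and
    # look up the (possibly generalized) pattern in the graph.
    for binding in bindings:
        concrete = substitute(pattern, binding)
        for triple in graph:
            extended = match(concrete, triple, binding)
            if extended is not None:
                yield extended


graph = {("ex:e1", "ex:p1", "ex:e2")}
pattern = ("?s", "?p", "?o")

# A blank node ends up in predicate position of the pattern.
print(substitute(pattern, {"?p": "_:x"}))  # ('?s', '_:x', '?o')

# A fresh blank node matches nothing in the graph.
print(list(index_join([{"?p": fresh_bnode()}], pattern, graph)))  # []
```

The point of the sketch is only that nothing in the join needs to care what kind of term ?p is bound to.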


Andy





> [1]
> https://github.com/apache/jena/tree/master/jena-core/src/main/resources/etc
> [2] https://www.sciencedirect.com/science/article/abs/pii/S1570826805000144






Re: RDFS inference produces invalid data

2019-10-23 Thread Lorenz Buehmann
I'm sure it happens because the predicate of a Triple object is never
type-checked: any Node object is allowed in predicate position.

What you could do is either post-process the output or encode the check
directly in the appropriate rules file [1], I guess?

You could use the notBNode() builtin, i.e. (in RDF 1.1 the rule is rdfs7)

[rdfs6: (?a ?p ?b), (?p rdfs:subPropertyOf ?q) -> (?a ?q ?b)]

becomes

[rdfs6: (?a ?p ?b), (?p rdfs:subPropertyOf ?q), notBNode(?q) -> (?a ?q ?b)]

But it's just a guess. I'm sure Andy or Dave have better solutions.



Minor comment:

According to ter Horst [2], RDFS entailment was (is?) incomplete
because of the blank node restriction:

p  rdfs:subPropertyOf  b .    (1)
b  rdfs:domain  u .           (2)
v  p  w .                     (3)

with b being a blank node. These entail

v  rdf:type  u .              (4)

but triple (4) can't be derived, because rule rdfs7 can't be applied to
(1) and (3) to get the triple

v  b  w .                     (5)

and thus we can't apply rule rdfs2 to (2) and (5). But I guess that's a
different story, and I think the extended RDF never made it into the W3C
recommendation.

In Jena this should work though :D
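
The incompleteness example can be checked with a small forward-chaining sketch in plain Python. This is not Jena's rule engine: the function rdfs_closure, the flag allow_blank_predicates and the ex: names are assumptions made for illustration, and only rules rdfs7 and rdfs2 are implemented.

```python
from itertools import product

RDFS_SUBPROP = "rdfs:subPropertyOf"
RDFS_DOMAIN = "rdfs:domain"
RDF_TYPE = "rdf:type"


def is_blank(term):
    # Blank nodes are written with the usual "_:" label prefix.
    return term.startswith("_:")


def rdfs_closure(triples, allow_blank_predicates):
    """Apply rdfs7 and rdfs2 until no new triple appears."""
    closure = set(triples)
    while True:
        new = set()
        for (s1, p1, o1), (s2, p2, o2) in product(closure, repeat=2):
            # rdfs7: (p subPropertyOf q), (a p b) -> (a q b)
            if p1 == RDFS_SUBPROP and p2 == s1:
                if allow_blank_predicates or not is_blank(o1):
                    new.add((s2, o1, o2))
            # rdfs2: (p domain u), (a p b) -> (a rdf:type u)
            if p1 == RDFS_DOMAIN and p2 == s1:
                new.add((s2, RDF_TYPE, o1))
        if new <= closure:
            return closure
        closure |= new


graph = {
    ("ex:p", RDFS_SUBPROP, "_:b"),  # (1)
    ("_:b", RDFS_DOMAIN, "ex:u"),   # (2)
    ("ex:v", "ex:p", "ex:w"),       # (3)
}
target = ("ex:v", RDF_TYPE, "ex:u")  # (4)

strict = rdfs_closure(graph, allow_blank_predicates=False)
generalized = rdfs_closure(graph, allow_blank_predicates=True)

print(target in strict)       # False
print(target in generalized)  # True
```

With the blank-node guard switched on, triple (5) is never produced, so rdfs2 has nothing to fire on; with generalized triples allowed, (5) and then (4) are derived.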



[1]
https://github.com/apache/jena/tree/master/jena-core/src/main/resources/etc
[2] https://www.sciencedirect.com/science/article/abs/pii/S1570826805000144




RDFS inference produces invalid data

2019-10-22 Thread Jindřich Mynarz
Hi,

it is reasonably common in RDF vocabularies to see that a property is
defined as an rdfs:subPropertyOf a blank node, such as in this example:

### vocabulary.ttl

PREFIX : 
PREFIX owl:  
PREFIX rdfs: 

:p1 rdfs:subPropertyOf _:b1 .
_:b1 owl:inverseOf :p2 .

###

Example data using this vocabulary:

### data.ttl

PREFIX : 

:e1 :p1 :e2 .

###

When RDFS inference is applied to this data, using the vocabulary with
Jena's command-line tool infer (the current 3.13.1 version, source code:
https://github.com/apache/jena/blob/master/jena-cmds/src/main/java/riotcmd/infer.java),
i.e. infer --rdfs vocabulary.ttl data.ttl, we get the following result
(here prettified):

###

PREFIX : 

:e1 :p1 :e2 ;
  _:b2 :e2 .

###

As you can see, we infer that :e1 _:b2 :e2, which is invalid, because blank
nodes (i.e. _:b2) are not permitted as predicates in RDF (
https://www.w3.org/TR/rdf11-concepts/#h3_section-triples).

Now, it is clear how that follows from the rdfs:subPropertyOf inference
rules (setting aside that what we might actually want is an RDFS/OWL
reasoner that would give us :e2 :p2 :e1), but should such an inference be
made if it violates the RDF data model?

I wonder if checking the produced inferences for validity is expensive, or
if Jena's infer assumes a superset of RDF. Removing such inferences in
post-processing is a bit tricky because RDF parsers recognize this as an
error and fail.
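
One way around the parser problem might be to filter the serialized output as text before parsing it. The following is a minimal sketch in Python, assuming the inferred triples are available as N-Triples with one triple per line; the function name drop_blank_predicates and the example.org IRIs are hypothetical.

```python
def drop_blank_predicates(ntriples_text):
    """Remove lines whose predicate (second term) is a blank node."""
    kept = []
    for line in ntriples_text.splitlines():
        stripped = line.strip()
        # Keep empty lines and comments unchanged.
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        # Subjects in N-Triples contain no whitespace, so the second
        # whitespace-separated token is the predicate.
        parts = stripped.split(None, 2)
        if len(parts) == 3 and parts[1].startswith("_:"):
            continue
        kept.append(line)
    return "\n".join(kept)


sample = """\
<http://example.org/e1> <http://example.org/p1> <http://example.org/e2> .
<http://example.org/e1> _:b2 <http://example.org/e2> .
"""

cleaned = drop_blank_predicates(sample)
print(cleaned)
```

This only inspects the predicate position, so blank nodes used as subjects or objects pass through untouched.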

- Jindrich

-- 
Jindrich Mynarz
https://mynarz.net/#jindrich