Re: SHACL Endpoint questions

Andy Seaborne Tue, 19 May 2020 11:09:33 -0700



On 19/05/2020 18:44, Chris Tomlinson wrote:

Hi Andy,

Thanks for the very helpful feedback.

1) I did not understand the proper use of sh:inversePath. I thought it was to 
verify that the target of the target/value of the sh:path property had a 
property equal to the value of sh:inversePath. I see that is just not correct.

2) I’ve found an effective solution to the problem of limiting validation to 
just the triples that should be in the graph of a resource such as bdr:P707 by 
creating a new shapes module that uses  [ ] sh:deactivated true   on any 
propertyShapes that leave the graph in question.
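
For illustration only, a separate module of that kind might look like the 
sketch below. It assumes that the two property shapes quoted later in this 
thread are the ones whose values live in other graphs, and that the file is 
loaded together with the main shapes; the file name is hypothetical and the 
prefixes are as in the other fragments in this thread.

# deactivate.shapes.ttl (hypothetical file name)
# Assumption: these property shapes point at resources outside the P707 graph.
bds:PersonShape-hasParent  sh:deactivated  true .
bds:PersonShape-hasFather  sh:deactivated  true .

A deactivated shape reports no violations, so the rest of the shapes still 
apply to the triples inside the graph.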

I’m getting closer to being able to formulate a plausible extension to the 
shacl endpoint.


looking forward to it.

Having ?graph=..&node=.. makes sense.


Thank you again for your help in the midst of all the 3.15.0 work,
Chris

On May 16, 2020, at 5:45 AM, Andy Seaborne <a...@apache.org> wrote:



On 15/05/2020 00:57, Chris Tomlinson wrote:

Hello Andy,
I have standalone code using validator.validate(Shapes, Graph, Node) where the 
graph is a merge of the target graph, e.g., P707, and the ontology graph. This 
works fine to validate examples like P707 generating sh:results just for 
references to P705 which is not otherwise included in the merged graph, which 
is what I expect.
If the code was running in Fuseki and the graph is the dataset graph (equiv 
union graph I think) then I would like to know how far out from the node the 
validation process will reach.


That depends on the target (or, here, implicit class target) and the shape 
itself.

digression

... something that I've experimented with - analysing the shapes to determine 
execution strategy. There are some useful cases:

* the validation is only on the triple added (e.g. sh:datatype) - and does not 
need access to the database so it can be done in parallel outside the 
transaction.
* the validation needs local changes (e.g. minCount) to the entity (subject and 
all triples with that subject) - that can be used to reduce the number of 
validations done. If an entity isn't touched, no validation is necessary.
* global - needs access to the whole database. Not much can be done except 
execute inline at the end of the transaction. Often these are SPARQL constraints 
where you can e.g. count the triples.

/digression
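
As a rough sketch of the three cases in the digression, one shape of each 
kind is shown below. The ex: properties and shape names are hypothetical and 
only illustrate the categories; this is not how any implementation 
classifies shapes.

PREFIX sh:   <http://www.w3.org/ns/shacl#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:   <http://example/>

# 1. Triple-local: the datatype check needs only the triple being added.
ex:AgeShape
  a sh:NodeShape ;
  sh:targetSubjectsOf ex:age ;
  sh:property [ sh:path ex:age ; sh:datatype xsd:integer ] .

# 2. Entity-local: the count needs all ex:name triples of the same subject.
ex:NameShape
  a sh:NodeShape ;
  sh:targetSubjectsOf ex:age ;
  sh:property [ sh:path ex:name ; sh:minCount 1 ] .

# 3. Global: the SPARQL pattern can match anywhere in the data graph.
ex:NotEmptyGraphShape
  a sh:NodeShape ;
  sh:targetNode "Empty Graph" ;
  sh:sparql [
    a sh:SPARQLConstraint ;
    sh:select """
        SELECT $this
        WHERE {
            FILTER NOT EXISTS { ?s ?p ?o }
        }
        """ ;
   ] .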

For example, given that the shapes include the shape:
bds:PersonShape-hasParent
         a               sh:PropertyShape ;
         sh:class        bdo:Person ;
         sh:description  "this Person may have at most two parents."@en ;
         sh:inversePath  bdo:hasChild ;

??

         sh:maxCount     2 ;
         sh:path         bdo:hasParent ;
.
Then I thought that the validation process would check just that:
     P705 rdf:type bdo:Person .
as well as validating the count constraint; and in the case of the shape:



Yes - there are two constraints: sh:class and sh:maxCount

bds:PersonShape-hasFather
         a               sh:PropertyShape ;
         sh:description  "this Person may have a father."@en ;
         sh:inversePath  bdo:hasChild ;


Is that supposed to be:

sh:path [ sh:inversePath  bdo:hasChild  ]

?

A property shape has a sh:path and that sh:path can be an inverse path.

sh:inversePath isn't used on the property shape itself.

         sh:maxCount     1 ;
         sh:node         bds:MaleShape ;
         sh:path         bdo:hasFather ;


and now we have two sh:paths?

(If that is your shape, the sh:inversePath is going to be ignored as it is out 
of place.)

.
will in addition check that:
     P705 bdo:gender bdr:GenderMale .
and not check any other constraints on P705, such as its students or kinship 
relations.


If P705 is reached with "sh:path bdo:hasFather"
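
A minimal sketch of the point about sh:path, assuming the intent was a 
single forward path on the father shape. The second shape, including its 
name, is hypothetical and only shows where sh:inversePath would go if the 
inverse direction were wanted.

# One sh:path per property shape; the out-of-place sh:inversePath is dropped.
bds:PersonShape-hasFather
        a               sh:PropertyShape ;
        sh:description  "this Person may have a father."@en ;
        sh:maxCount     1 ;
        sh:node         bds:MaleShape ;
        sh:path         bdo:hasFather ;
.

# Hypothetical shape using an inverse path: the value nodes are the
# resources ?x such that  ?x bdo:hasChild $this .
bds:PersonShape-childOf
        a               sh:PropertyShape ;
        sh:maxCount     2 ;
        sh:path         [ sh:inversePath bdo:hasChild ] ;
.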

The purpose being that when a user has “edited” an existing resource or 
“created” a new resource then we just want to validate the changed or new 
resource without having the validation process traverse all resources reachable 
from P707 via arbitrary length paths, which is unnecessary.
Assuming the validator.validate(Shapes, Graph, Node) works along the lines I’ve 
sketched, then since the shacl endpoint doesn’t use this method it would take 
an extension to the endpoint or a new endpoint to accomplish what I’ve 
described.


See the code.

    validator.validate(Shapes, Graph, Node)

executes the shapes (any that apply) to the single focus node.  It does check 
the shapes to see which apply so the target clause (inc implicit targets) has 
to be something that would include the node.

I’m happy to raise an issue and create a PR if that makes sense.


Great.

    Andy

Thank you again very much,
Chris

On May 14, 2020, at 4:16 PM, Andy Seaborne <a...@apache.org> wrote:

On 14/05/2020 19:06, Chris Tomlinson wrote:

Hi Andy,
I want to validate a named graph in the context of the union graph. I don’t want to 
validate the union graph. The union graph has information in it such as the ontology 
which defines subClass and subProperty relations needed to successfully validate a 
target graph such as http://purl.bdrc.io/graph/P707.


I don't understand "in the context of the union graph."

Isn't "context" in RDF just "merge the graphs"?

Validation is a process that operates on a shapes graph (which is parsed so 
really it's just shapes - anything else in it is ignored) and a data graph.

There's no structure to the data graph - it is everything being validated.

I did suggest some SHACL extensions

   https://afs.github.io/shacl-datasets.html

but they are hypothetical extensions.


In code, you could make a temporary union of two or more graphs to make a 
single data graph.

"a named graph in the context of the union graph."

So the NG is in addition to the dataset graphs? Or is it in the dataset already?


In the SHACL service ?graph= is the data target and is taken from the dataset.

Also P707 refers to a parent and teacher P705 which needs to be verified that 
it meets minimum criteria for a Person.
I thought that validate(shapes, graph, node)


/** Produce a full validation report for this node in the data. */

i.e. use node as the focus node (like sh:targetNode) and execute the shapes 
only with that node.

But does P707 have one focus node or many?

should accomplish this if graph = the dataset graph which contains all these 
additional bits of information.
That’s why the endpoint is interesting since it provides in principle access to 
using shacl inside of Fuseki, where the entire dataset is available, without 
having to write an independent bit of code that we add to our fuseki 
deployments.


There is nothing special about the Fuseki endpoint - any Dataset has a union graph.

It's a way to call

ValidationReport report =
    ShaclValidator.get().validate(shapesGraph, data);

on a remote data graph.

I hope this clarifies what I’m wanting to accomplish. I probably don’t 
understand what validate(shapes, graph, node) is supposed to do.
Thanks for your patience,
Chris

On May 14, 2020, at 12:34 PM, Andy Seaborne <a...@apache.org> wrote:

?graph names the graph to be validated.

?graph can be a URI of a named graph in the dataset

or ?graph=default for the default graph (note: this is the storage default 
graph, not the union default graph)

or ?graph=union for the union of all named graphs which is what I think you're 
asking for.

(This is the org.apache.jena.fuseki.servlets.SHACL_Validation servlet.)


On 14/05/2020 15:40, Chris Tomlinson wrote:

Hi Andy,
Thanks very much for the shacl guidance. The use of sh:targetSubjectsOf is 
quite helpful. I replaced the bdo:personName w/ bdo:isRoot which must be 
present on any Entity resource so that if a Work or Place or other entity is 
checked it will fail if it isn’t a bdo:Person.
This still fails in the event that there is no bdo:isRoot so in some way that 
negative needs also to be caught to weed out really malformed graphs.
I still have a question about the shacl endpoint:
     Is the ?graph parameter validated in the context of the entire dataset 
specified in the endpoint URL or just the named graph itself?
It appears to be just the named graph itself so is the same as running the 
shacl command outside of Fuseki.


Yes - as above, it can be the union.

We are wanting a validation of the named graph against the entire (union) 
dataset graph


Not sure what "against" means here. There is a shapes graph in the validate 
request and data graph, which can be the union graph of the dataset.

To direct the validation to a certain node, use sh:targetNode.

which will have sufficient information about subClassOf* and external resources 
like P705 without entailing a validation of all nodes reachable from triples in 
the ?graph named graph. This might be similar to:
     validator.validate(shapes, dsg, node)
where node would be the root resource URI like <http://purl.bdrc.io/resource/P707>.
Is this something that needs an issue raised and a bit of extension of the 
endpoint or is there another way to get this kind of behavior through the 
endpoint?
Thank you very much for your help,
Chris

On May 13, 2020, at 12:16 PM, Andy Seaborne <a...@apache.org> wrote:



On 13/05/2020 16:03, Chris Tomlinson wrote:

Hi Andy,
Thank you for the reply. I can get your example to work as you indicate, but 
have some questions:
1) I went through the latest SHACL draft 
<https://w3c.github.io/data-shapes/shacl/> and I cannot find how to know that 
sh:targetNode always executes. It’s also not clear to me what it means to execute. I 
thought that sh:targetNode X was a way to restrict a shape to X in the data graph, 
whatever X might be.


It sets the target node to X and that becomes $this.

It does not say the target has to be in the graph.

The tests use this idiom quite a lot.

This matters because in some places the spec is not complete and without some 
light reverse engineering from the tests, I'd not have been able to implement 
some of the SPARQL functionality (particularly SPARQL components, not the 
SPARQL constraints we're talking about here).

Also, RDF graphs do not have a formally defined set of nodes - they are a set 
of edges and any nodes you want can be used in triples.

2) What I’m trying to do is validate that a resource like 
http://purl.bdrc.io/resource/P707 is a Person, which at a minimum means that:
     <http://purl.bdrc.io/resource/P707>  a  <http://purl.bdrc.io/ontology/core/Person> .
is present in the http://purl.bdrc.io/graph/P707. 
The PersonShape 
<https://github.com/buda-base/editor-templates/blob/master/templates/core/person.shapes.ttl>
 has:
     sh:targetClass bdo:Person
but that only says that PersonShape applies to resources of class bdo:Person; 
if there are none, then there are no violations. That means I can try to 
validate a bibliographic element such as http://purl.bdrc.io/resource/W1FPL1, 
which is of class bdo:ImageInstance, and it still reports sh:conforms true 
since bds:PersonShape doesn’t apply and hence there aren’t any violations. 
(To see the resources, use http://ldspdi-dev.bdrc.io/resource/W1FPL1.ttl, 
for example.)
The use case is: a client submits a graph of a resource and claims it to be a bdo:Person 
or a subClassOf* it; and we want to validate the graph as a bdo:Person and so want to get 
the result “false” for bdr:W1FPL1 instead of “true”.
It’s our intent to use a tool like shacl for this top-level task as well as 
validating the details like having at least one name, a gender, and so on.
I tried using something like your example:
bds:CheckPersonClassShape  a      sh:NodeShape ;
     rdfs:label      "Check Person Class Shape"@en ;
     sh:targetNode "Check Class" ;
     sh:sparql [
       a sh:SPARQLConstraint ;
       sh:prefixes [
         sh:declare [
           sh:prefix "rdf" ;
           sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ;
         ] , [
           sh:prefix "bdo" ;
           sh:namespace "http://purl.bdrc.io/ontology/core/" ;
         ]
       ] ;
       sh:select """
         select $this (rdf:type as ?path) (bdo:Person as ?value)
            where {
            filter not exists { $this ?path ?value }
            }
          """ ;


That query does not look right.

1/ $this is the targetNode

$this is "Check Class" - the shape needs to find the thing that is the person 
amongst the several subjects in the data. That can be in the SPARQL or as a target of 
some kind.

Either set the target to be bdr:P707
or find a signature such as a bdo:personName triple. "sh:targetSubjectsOf 
bdo:personName"
or write some pattern in the SPARQL query.

You may want some "whole graph" validation such as not completely empty or has at least 
some relevant vocabulary to ensure that the data is not so off that nothing will trigger.  That's 
where the sh:targetNode "foobar" trick comes in.

2/ It's looking for any triple with $this as subject, not "a bdo:Person"

The SELECT-AS happens after the WHERE.
FILTER NOT EXISTS does not set ?path ?value, so inside the pattern they are 
unset and act as free variables.

   filter not exists { $this ?P ?O }

would be just the same and matches any triple with $this as subject.



You want to set ?value and ?path before the FILTER:

  BIND (bdo:Person as ?value)
  BIND (rdf:type as ?path)



or write directly and not worry about ?path and ?value.

  filter not exists { $this rdf:type bdo:Person }

(
The message processing from SPARQL constraints and components doesn't do 
templating.
)

     ] ;
.
But this just always reports a violation that the literal, “Check Class”, 
doesn’t conform, which is true since it isn’t in the data graph.



bds:CheckPersonClassShape  a      sh:NodeShape ;
    rdfs:label      "Check Person Class Shape"@en ;
    ## sh:targetNode bdr:P707 ;
    sh:targetSubjectsOf bdo:personName ;
    sh:sparql [
      a sh:SPARQLConstraint ;
      sh:prefixes [
        sh:declare [
          sh:prefix "rdf" ;
          sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ;
        ] , [
          sh:prefix "bdo" ;
          sh:namespace "http://purl.bdrc.io/ontology/core/" ;
        ]
      ] ;
      sh:select """
        select $this
            where {
           filter not exists { $this rdf:type bdo:Person }
            }
          """ ;
    ] ;
.



shacl validate -v -s shapes.ttl -d P707.ttl

shows the validation when "a  bdo:Person ;" is commented out of the data:

NodeShape[http://example/CheckPersonClassShape]
N: FocusNodes(1): [http://purl.bdrc.io/resource/P707]
  F: http://purl.bdrc.io/resource/P707
  S: NodeShape[http://example/CheckPersonClassShape]
  C: SPARQL[PREFIX  bdo:  <http://purl.bdrc.io/ontology/core/> PREFIX rdf:  
<http://www.w3.org/1999/02/22-rdf-syntax-ns#>  SELECT  ?this WHERE {   FILTER NOT 
EXISTS { ?this  rdf:type  bdo:Person } }]

... prefixes ...

[ a            sh:ValidationReport ;
  sh:conforms  false ;
  sh:result    [ a                             sh:ValidationResult ;
                 sh:focusNode                  bdr:P707 ;
                 sh:resultMessage              "SPARQL SELECT constraint for <http://purl.bdrc.io/resource/P707> returns <http://purl.bdrc.io/resource/P707>" ;
                 sh:resultSeverity             sh:Violation ;
                 sh:sourceConstraintComponent sh:SPARQLConstraintComponent ;
                 sh:sourceShape                bds:CheckPersonClassShape ;
                 sh:value                      bdr:P707
               ]
] .

3) The original reason for wanting to use the shacl endpoint was so that we 
could PUT the submitted graph in the Fuseki dataset and then use the endpoint 
to validate the resource bdr:P707 (or bdr:W1FPL1) as a Person (or not) with the 
rest of the dataset graph available to handle things like subClassOf*  and 
subPropertyOf* for various items as well as validating the minimum of resources 
referenced by P707 such as that P705 is a male person and hence can be a father 
of P707.


That sounds like

   sh:targetNode bdr:P707

and also some shapes to check "is there anything relevant at all".
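
A possible sketch of those two pieces, under stated assumptions: 
bds:PersonShape is taken to be the node shape from person.shapes.ttl, and 
bds:AnyPersonShape is a hypothetical name for an "anything relevant at all" 
check. Prefixes are as in the other fragments in this thread.

# Adds bdr:P707 as a focus node of the existing shape. Targets accumulate,
# so any sh:targetClass already on the shape still selects its own nodes.
bds:PersonShape  sh:targetNode  bdr:P707 .

# Always executes (literal target) and reports a violation when the data
# graph has no bdo:Person at all.
bds:AnyPersonShape
    a sh:NodeShape ;
    sh:targetNode "Any Person" ;
    sh:sparql [
      a sh:SPARQLConstraint ;
      sh:select """
        select $this
            where {
           filter not exists { ?s a <http://purl.bdrc.io/ontology/core/Person> }
            }
          """ ;
    ] ;
.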

    Andy

The graph for P707 that is submitted would only have references to P705, with 
no properties on P705, since that resource is in its own graph.
I thought this is pretty much how validate(Shapes, Graph, Node) would work, 
where Graph would be the union dataset graph.
I’m evidently missing some understanding.
I appreciate your patience,
Chris

On May 12, 2020, at 3:52 AM, Andy Seaborne <a...@apache.org> wrote:

Chris,

Here's a shape that always executes and tests for an empty data graph.

# No violation
shacl validate -v -shapes ex-shapes.ttl -data not-empty.ttl

# Violation
shacl validate -v -shapes ex-shapes.ttl -data empty.nt

"sh:targetNode" always executes.

With this pattern, the SPARQL query can do arbitrary checks.

    Andy

## ex-shapes.ttl
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

PREFIX sh:      <http://www.w3.org/ns/shacl#>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

PREFIX ex:        <http://example/>

ex:NotEmptyGraphShape
  rdf:type sh:NodeShape ;
  sh:targetNode "Empty Graph" ;
  sh:sparql [
    a sh:SPARQLConstraint ;
    sh:select """
        SELECT $this ?value
        WHERE {
            FILTER NOT EXISTS { ?s ?p ?o }
        }
        """ ;
   ] .

On 11/05/2020 17:14, Chris Tomlinson wrote:

I appreciate that it works that way but until and unless I can understand your 
point about
  [] sh:targetNode ex:myNode
then I don’t know how to distinguish: 1) no violations because a Person graph 
conforms to the PersonShapes - like there’s no Work indicated as a parent of 
the person or a rdfs:label is used where a skos:prefLabel is expected; versus 
2) no violations because the question is vacuous like asking if a Work looks 
like a person or an empty non-existent graph looks like a person.
