Re: SHACL Endpoint questions

Chris Tomlinson Tue, 19 May 2020 10:44:37 -0700

Hi Andy,

Thanks for the very helpful feedback.


1) I did not understand the proper use of sh:inversePath. I thought it was to 
verify that the target of the target/value of the sh:path property had a 
property equal to the value of sh:inversePath. I see that is just not correct.
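
A minimal sketch of the corrected form, following Andy's suggestion quoted 
below that the inverse path belongs inside sh:path. It assumes the intent is 
to constrain the resources that point at a Person via bdo:hasChild; the ex: 
shape name is a placeholder, not one of the real shapes.

```turtle
PREFIX sh:  <http://www.w3.org/ns/shacl#>
PREFIX bdo: <http://purl.bdrc.io/ontology/core/>
PREFIX ex:  <http://example/>

# Hypothetical property shape: the only path is the inverse of bdo:hasChild,
# so the value nodes are the subjects ?x of "?x bdo:hasChild <focus node>".
ex:PersonShape-parentViaHasChild
        a               sh:PropertyShape ;
        sh:path         [ sh:inversePath bdo:hasChild ] ;
        sh:class        bdo:Person ;
        sh:maxCount     2 ;
.
```

As with any property shape, it still has to be attached to a node shape (or 
given a target) before it is evaluated.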

2) I’ve found an effective solution to the problem of limiting validation to 
just the triples that should be in the graph of a resource such as bdr:P707 by 
creating a new shapes module that uses  [ ] sh:deactivated true   on any 
propertyShapes that leave the graph in question.
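
A minimal sketch of such a module, assuming the two property shapes quoted 
further down are among those that reach outside the resource graph; the ex: 
names are placeholders, not the real shape IRIs.

```turtle
PREFIX sh:  <http://www.w3.org/ns/shacl#>
PREFIX ex:  <http://example/>

# Hypothetical "local only" module, loaded together with the main shapes graph.
# Each statement switches off one property shape whose values live in another graph.
ex:PersonShape-hasParent  sh:deactivated true .
ex:PersonShape-hasFather  sh:deactivated true .
```

In SHACL every node conforms to a deactivated shape, so these shapes produce 
no results while the rest of the shapes graph is still applied.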

I’m getting closer to being able to formulate a plausible extension to the 
shacl endpoint.

Thank you again for your help in the midst of all the 3.15.0 work,
Chris


> On May 16, 2020, at 5:45 AM, Andy Seaborne <[email protected]> wrote:
> 
> 
> 
> On 15/05/2020 00:57, Chris Tomlinson wrote:
>> Hello Andy,
>> I have standalone code using validator.validate(Shapes, Graph, Node) where 
>> the graph is a merge of the target graph, e.g., P707, and the ontology 
>> graph. This works fine to validate examples like P707 generating sh:results 
>> just for references to P705 which is not otherwise included in the merged 
>> graph, which is what I expect.
>> If the code was running in Fuseki and the graph is the dataset graph (equiv 
>> union graph I think) then I would like to know how far out from the node the 
>> validation process will reach.
> 
> That depends on the target (or, here, implicit class target) and the shape 
> itself.
> 
> digression
> 
> ... something that I've experimented with - analysing the shapes to determine 
> execution strategy. There are some useful cases:
> 
> * the validation is only on the triple added (e.g. sh:datatype) - and does 
> not need access to the database so it can be done in parallel outside the 
> transaction.
> * the validation needs local changes (e.g. minCount) to the entity (subject 
> and all triples with that subject) - that can be used to reduce the number of 
> validations done. If an entity isn't touched, no validation is necessary.
> * global - needs access to the whole database. Not much can be done except 
> execute inline at the end of the transaction. Often these are SPARQL 
> constraints where you can e.g. count the triples.
> 
> /digression
> 
>> For example, given that the shapes include the shape:
>> bds:PersonShape-hasParent
>>         a               sh:PropertyShape ;
>>         sh:class        bdo:Person ;
>>         sh:description  "this Person may have at most two parents."@en ;
>>         sh:inversePath  bdo:hasChild ;
> 
> ??
> 
>>         sh:maxCount     2 ;
>>         sh:path         bdo:hasParent ;
>> .
>> Then I thought that the validation process would check just that:
>>     P705 rdf:type bdo:Person .
>> as well as validating the count constraint; and in the case of the shape:
> 
> 
> Yes - there are two constraints: sh:class and sh:maxCount
> 
>> bds:PersonShape-hasFather
>>         a               sh:PropertyShape ;
>>         sh:description  "this Person may have a father."@en ;
>>         sh:inversePath  bdo:hasChild ;
> 
> Is that supposed to be:
> 
> sh:path [ sh:inversePath  bdo:hasChild  ]
> 
> ?
> 
> A property shape has a sh:path and that sh:path can be an inverse path.
> 
> sh:inversePath isn't used on the property shape itself.
> 
>>         sh:maxCount     1 ;
>>         sh:node         bds:MaleShape ;
>>         sh:path         bdo:hasFather ;
> 
> and now we have two sh:paths?
> 
> (If that is your shape, the sh:inversePath is going to be ignored as it is out 
> of place.)
> 
>> .
>> will in addition check that:
>>     P705 bdo:gender bdr:GenderMale .
>> and not check any other constraints on P705, such as its students or kinship 
>> relations.
> 
> If P705 is reached with "sh:path bdo:hasFather"
> 
>> The purpose being that when a user has “edited” an existing resource or 
>> “created” a new resource then we just want to validate the changed or new 
>> resource without having the validation process traverse all resources 
>> reachable from P707 via arbitrary length paths, which is unnecessary.
>> Assuming the validator.validate(Shapes, Graph, Node) works along the lines 
>> I’ve sketched, then since the shacl endpoint doesn’t use this method it 
>> would take an extension to the endpoint or a new endpoint to accomplish what 
>> I’ve described.
> 
> See the code.
> 
>    validator.validate(Shapes, Graph, Node)
> 
> executes the shapes (any that apply) to the single focus node.  It does check 
> the shapes to see which apply so the target clause (inc implicit targets) has 
> to be something that would include the node.
> 
> 
>> I’m happy to raise an issue and create a PR if that makes sense.
> 
> Great.
> 
>    Andy
> 
>> Thank you again very much,
>> Chris
>>> On May 14, 2020, at 4:16 PM, Andy Seaborne <[email protected]> wrote:
>>> 
>>> On 14/05/2020 19:06, Chris Tomlinson wrote:
>>>> Hi Andy,
>>>> I want to validate a named graph in the context of the union graph. I 
>>>> don’t want to validate the union graph. The union graph has information in 
>>>> it such as the ontology which defines subClass and subProperty relations 
>>>> needed to successfully validate a target graph such as 
>>>> http://purl.bdrc.io/graph/P707.
>>> 
>>> I don't understand "in the context of the union graph."
>>> 
>>> Isn't "context" in RDF "merge the graphs"?
>>> 
>>> Validation is a process that operates on a shapes graph (which is parsed so 
>>> really it's just shapes - anything else in it is ignored) and a data graph.
>>> 
>>> There's no structure to the data graph - it is everything being validated.
>>> 
>>> I did suggest some SHACL extensions
>>> 
>>>   https://afs.github.io/shacl-datasets.html
>>> 
>>> but they are hypothetical extensions.
>>> 
>>> 
>>> In code, you could make a temporary union of two or more graphs to make a 
>>> single data graph.
>>> 
>>> "a named graph in the context of the union graph."
>>> 
>>> So the NG is in addition to the dataset graphs? or is it in the dataset 
>>> already?
>>> 
>>> 
>>> In the SHACL service ?graph= is the data target and is taken from the 
>>> dataset.
>>> 
>>>> Also P707 refers to a parent and teacher P705 which needs to be verified 
>>>> that it meets minimum criteria for a Person.
>>>> I thought that validate(shapes, graph, node)
>>> 
>>> /** Produce a full validation report for this node in the data. */
>>> 
>>> i.e. use node as the focus node (like sh:targetNode) and execute the shapes 
>>> only with that node.
>>> 
>>> But does P707 have one focus node or many?
>>> 
>>>> should accomplish this if graph = the dataset graph which contains all 
>>>> these additional bits of information.
>>>> That’s why the endpoint is interesting since it provides in principle 
>>>> access to using shacl inside of Fuseki, where the entire dataset is 
>>>> available, without having to write an independent bit of code that we add 
>>>> to our fuseki deployments.
>>> 
>>> There is nothing special about Fuseki endpoint - any Dataset has a union 
>>> graph.
>>> 
>>> It's a way to call
>>> 
>>> ValidationReport report =
>>>    ShaclValidator.get().validate(shapesGraph, data);
>>> 
>>> on a remote data graph.
>>> 
>>>> I hope this clarifies what I’m wanting to accomplish. I probably don’t 
>>>> understand what validate(shapes, graph, node) is supposed to do.
>>>> Thanks for your patience,
>>>> Chris
>>>>> On May 14, 2020, at 12:34 PM, Andy Seaborne <[email protected]> wrote:
>>>>> 
>>>>> ?graph names the graph to be validated.
>>>>> 
>>>>> ?graph can be a URI of a named graph in the dataset
>>>>> 
>>>>> or ?graph=default for the default graph (note: this is the storage 
>>>>> default graph, not the union default graph)
>>>>> 
>>>>> or ?graph=union for the union of all named graphs which is what I think 
>>>>> you're asking for.
>>>>> 
>>>>> (This is the org.apache.jena.fuseki.servlets.SHACL_Validation servlet.)
>>>>> 
>>>>> 
>>>>> On 14/05/2020 15:40, Chris Tomlinson wrote:
>>>>>> Hi Andy,
>>>>>> Thanks very much for the shacl guidance. The use of sh:targetSubjectsOf 
>>>>>> is quite helpful. I replaced the bdo:personName w/ bdo:isRoot which must 
>>>>>> be present on any Entity resource so that if a Work or Place or other 
>>>>>> entity is checked it will fail if it isn’t a bdo:Person.
>>>>>> This still fails in the event that there is no bdo:isRoot so in some way 
>>>>>> that negative needs also to be caught to weed out really malformed 
>>>>>> graphs.
>>>>>> I still have a question about the shacl endpoint:
>>>>>>     Is the ?graph parameter validated in the context of the entire 
>>>>>> dataset specified in the endpoint URL or just the named graph itself?
>>>>>> It appears to be just the named graph itself so is the same as running 
>>>>>> the shacl command outside of Fuseki.
>>>>> 
>>>>> Yes - as above, it can be the union.
>>>>> 
>>>>>> We are wanting a validation of the named graph against the entire 
>>>>>> (union) dataset graph
>>>>> 
>>>>> Not sure what "against" means here. There is a shapes graph in the 
>>>>> validate request and data graph, which can be the union graph of the 
>>>>> dataset.
>>>>> 
>>>>> To direct the validation to a certain node, use sh:targetNode.
>>>>> 
>>>>>> which will have sufficient information about subClassOf* and external 
>>>>>> resources like P705 without entailing a validation of all nodes 
>>>>>> reachable from triples in the ?graph named graph. This might be similar 
>>>>>> to:
>>>>>>     validator.validate(shapes, dsg, node)
>>>>>> where node would be the root resource URI like, 
>>>>>> <http://purl.bdrc.io/resource/P707>.
>>>>>> Is this something that needs an issue raised and a bit of extension of 
>>>>>> the endpoint or is there another way to get this kind of behavior 
>>>>>> through the endpoint?
>>>>>> Thank you very much for your help,
>>>>>> Chris
>>>>>>> On May 13, 2020, at 12:16 PM, Andy Seaborne <[email protected]> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 13/05/2020 16:03, Chris Tomlinson wrote:
>>>>>>>> Hi Andy,
>>>>>>>> Thank you for the reply. I can get your example to work as you 
>>>>>>>> indicate, but have some questions:
>>>>>>>> 1) I went through the latest SHACL draft 
>>>>>>>> <https://w3c.github.io/data-shapes/shacl/> and I cannot find how to 
>>>>>>>> know that sh:targetNode always executes. It’s also not clear to me 
>>>>>>>> what it means to execute. I thought that sh:targetNode X was a way to 
>>>>>>>> restrict a shape to X in the data graph, whatever X might be.
>>>>>>> 
>>>>>>> It sets the target node to X and that becomes $this.
>>>>>>> 
>>>>>>> It does not say the target has to be in the graph.
>>>>>>> 
>>>>>>> The tests use this idiom quite a lot.
>>>>>>> 
>>>>>>> This matters because in some places the spec is not complete and 
>>>>>>> without some light reverse engineering from the tests, I'd not have 
>>>>>>> been able to implement some of the SPARQL functionality (particularly 
>>>>>>> SPARQL components, not the SPARQL constraints we're talking about here).
>>>>>>> 
>>>>>>> Also, RDF graphs do not have a formally defined set of nodes - they are 
>>>>>>> a set of edges and any nodes you want can be used in triples.
>>>>>>> 
>>>>>>>> 2) What I’m trying to do is validate that a resource like 
>>>>>>>> http://purl.bdrc.io/resource/P707 
>>>>>>>> is a Person, which at a minimum means that:
>>>>>>>>     <http://purl.bdrc.io/resource/P707>  a  <http://purl.bdrc.io/ontology/core/Person> .
>>>>>>>> is present in the http://purl.bdrc.io/graph/P707. The PersonShape 
>>>>>>>> <https://github.com/buda-base/editor-templates/blob/master/templates/core/person.shapes.ttl>
>>>>>>>>  has:
>>>>>>>>     sh:targetClass bdo:Person
>>>>>>>> but that only serves to say that PersonShape only applies to resources 
>>>>>>>> of class bdo:Person and if there are none, then there are no 
>>>>>>>> violations which means I can try to validate a bibliographic element 
>>>>>>>> such as http://purl.bdrc.io/resource/W1FPL1 
>>>>>>>> which is of class bdo:ImageInstance 
>>>>>>>> but of course that still sh:conforms true since 
>>>>>>>> bds:PersonShape doesn’t apply and hence there aren’t any violations. 
>>>>>>>> (to see the resources, use 
>>>>>>>> http://ldspdi-dev.bdrc.io/resource/W1FPL1.ttl, for example).
>>>>>>>> The use case is: a client submits a graph of a resource and claims it 
>>>>>>>> to be a bdo:Person or a subClassOf* it; and we want to validate the 
>>>>>>>> graph as a bdo:Person and so want to get the result “false” for 
>>>>>>>> bdr:W1FPL1 instead of “true”.
>>>>>>>> It’s our intent to use a tool like shacl for this top-level task as 
>>>>>>>> well as validating the details like having at least one name, a 
>>>>>>>> gender, and so on.
>>>>>>>> I tried using something like your example:
>>>>>>>> bds:CheckPersonClassShape  a      sh:NodeShape ;
>>>>>>>>     rdfs:label      "Check Person Class Shape"@en ;
>>>>>>>>     sh:targetNode "Check Class" ;
>>>>>>>>     sh:sparql [
>>>>>>>>       a sh:SPARQLConstraint ;
>>>>>>>>       sh:prefixes [
>>>>>>>>         sh:declare [
>>>>>>>>           sh:prefix "rdf" ;
>>>>>>>>           sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ;
>>>>>>>>         ] , [
>>>>>>>>           sh:prefix "bdo" ;
>>>>>>>>           sh:namespace "http://purl.bdrc.io/ontology/core/" ;
>>>>>>>>         ]
>>>>>>>>       ] ;
>>>>>>>>       sh:select """
>>>>>>>>         select $this (rdf:type as ?path) (bdo:Person as ?value)
>>>>>>>>            where {
>>>>>>>>            filter not exists { $this ?path ?value }
>>>>>>>>            }
>>>>>>>>          """ ;
>>>>>>> 
>>>>>>> That query does not look right.
>>>>>>> 
>>>>>>> 1/ $this is the targetNode
>>>>>>> 
>>>>>>> $this is "Check Class" - the shape needs to find the thing that is the 
>>>>>>> person amongst the several subjects in the data. That can be in the 
>>>>>>> SPARQL or as a target of some kind.
>>>>>>> 
>>>>>>> Either set the target to be bdr:P707
>>>>>>> or find a signature such as a bdo:personName triple. 
>>>>>>> "sh:targetSubjectsOf bdo:personName"
>>>>>>> or write some pattern in the SPARQL query.
>>>>>>> 
>>>>>>> You may want some "whole graph" validation such as not completely empty 
>>>>>>> or has at least some relevant vocabulary to ensure that the data is not 
>>>>>>> so off that nothing will trigger.  That's where the sh:targetNode 
>>>>>>> "foobar" trick comes in.
>>>>>>> 
>>>>>>> 2/ It's looking for any triple with $this as subject, not "a bdo:Person"
>>>>>>> 
>>>>>>> The SELECT-AS happens after the WHERE.
>>>>>>> FILTER NOT EXISTS does not set ?path ?value so if they are unset there 
>>>>>>> are free variables.
>>>>>>> 
>>>>>>>   filter not exists { $this ?P ?O }
>>>>>>> 
>>>>>>> would be just the same and matches any triple with $this as subject.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> You want to set ?value and ?path before the FILTER:
>>>>>>> 
>>>>>>>  BIND (bdo:Person as ?value)
>>>>>>>  BIND (rdf:type as ?path)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> or write directly and not worry about ?path and ?value.
>>>>>>> 
>>>>>>>  filter not exists { $this rdf:type bdo:Person }
>>>>>>> 
>>>>>>> (
>>>>>>> The message processing from SPARQL constraints and components doesn't 
>>>>>>> do templating.
>>>>>>> )
>>>>>>> 
>>>>>>>>     ] ;
>>>>>>>> .
>>>>>>>> But this just always reports a violation that the literal, “Check 
>>>>>>>> Class”, doesn’t conform, which is true since it isn’t in the data 
>>>>>>>> graph.
>>>>>>> 
>>>>>>> 
>>>>>>> bds:CheckPersonClassShape  a      sh:NodeShape ;
>>>>>>>    rdfs:label      "Check Person Class Shape"@en ;
>>>>>>>    ## sh:targetNode bdr:P707 ;
>>>>>>>    sh:targetSubjectsOf bdo:personName ;
>>>>>>>    sh:sparql [
>>>>>>>      a sh:SPARQLConstraint ;
>>>>>>>      sh:prefixes [
>>>>>>>        sh:declare [
>>>>>>>          sh:prefix "rdf" ;
>>>>>>>          sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ;
>>>>>>>        ] , [
>>>>>>>          sh:prefix "bdo" ;
>>>>>>>          sh:namespace "http://purl.bdrc.io/ontology/core/" ;
>>>>>>>        ]
>>>>>>>      ] ;
>>>>>>>      sh:select """
>>>>>>>        select $this
>>>>>>>             where {
>>>>>>>           filter not exists { $this rdf:type bdo:Person }
>>>>>>>             }
>>>>>>>           """ ;
>>>>>>>    ] ;
>>>>>>> .
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> shacl validate -v -s shapes.ttl -d P707.ttl
>>>>>>> 
>>>>>>> shows the validation when "a  bdo:Person ;" commented out of the data:
>>>>>>> 
>>>>>>> NodeShape[http://example/CheckPersonClassShape]
>>>>>>> N: FocusNodes(1): [http://purl.bdrc.io/resource/P707]
>>>>>>>  F: http://purl.bdrc.io/resource/P707
>>>>>>>  S: NodeShape[http://example/CheckPersonClassShape]
>>>>>>>  C: SPARQL[PREFIX  bdo:  <http://purl.bdrc.io/ontology/core/> PREFIX 
>>>>>>> rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  SELECT  ?this 
>>>>>>> WHERE {   FILTER NOT EXISTS { ?this  rdf:type  bdo:Person } }]
>>>>>>> 
>>>>>>> ... prefixes ...
>>>>>>> 
>>>>>>> [ a            sh:ValidationReport ;
>>>>>>>  sh:conforms  false ;
>>>>>>>  sh:result    [ a                             sh:ValidationResult ;
>>>>>>>                 sh:focusNode                  bdr:P707 ;
>>>>>>>                 sh:resultMessage              "SPARQL SELECT constraint 
>>>>>>> for <http://purl.bdrc.io/resource/P707> returns 
>>>>>>> <http://purl.bdrc.io/resource/P707>" ;
>>>>>>>                 sh:resultSeverity             sh:Violation ;
>>>>>>>                 sh:sourceConstraintComponent 
>>>>>>> sh:SPARQLConstraintComponent ;
>>>>>>>                 sh:sourceShape                bds:CheckPersonClassShape 
>>>>>>> ;
>>>>>>>                 sh:value                      bdr:P707
>>>>>>>               ]
>>>>>>> ] .
>>>>>>> 
>>>>>>> 
>>>>>>>> 3) The original reason for wanting to use the shacl endpoint was so 
>>>>>>>> that we could PUT the submitted graph in the Fuseki dataset and then 
>>>>>>>> use the endpoint to validate the resource bdr:P707 (or bdr:W1FPL1) as 
>>>>>>>> a Person (or not) with the rest of the dataset graph available to 
>>>>>>>> handle things like subClassOf*  and subPropertyOf* for various items 
>>>>>>>> as well as validating the minimum of resources referenced by P707 such 
>>>>>>>> as that P705 is a male person and hence can be a father of P707.
>>>>>>> 
>>>>>>> That sounds like
>>>>>>> 
>>>>>>>   sh:targetNode bdr:P707
>>>>>>> 
>>>>>>> and also some shapes to check "is there anything relevant at all".
>>>>>>> 
>>>>>>>    Andy
>>>>>>> 
>>>>>>>> The graph for P707 that is submitted would only have references to 
>>>>>>>> P705, with no properties on P705, since that resource is in its own 
>>>>>>>> graph.
>>>>>>>> I thought this is pretty much how validate(Shapes Graph, Node) would 
>>>>>>>> work, where Graph would be the union dataset graph.
>>>>>>>> I’m evidently missing some understanding.
>>>>>>>> I appreciate your patience,
>>>>>>>> Chris
>>>>>>>>> On May 12, 2020, at 3:52 AM, Andy Seaborne <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Chris,
>>>>>>>>> 
>>>>>>>>> Here's a shape that always executes and tests for an empty data graph.
>>>>>>>>> 
>>>>>>>>> # No violation
>>>>>>>>> shacl validate -v -shapes ex-shapes.ttl -data not-empty.ttl
>>>>>>>>> 
>>>>>>>>> # Violation
>>>>>>>>> shacl validate -v -shapes ex-shapes.ttl -data empty.nt
>>>>>>>>> 
>>>>>>>>> "sh:targetNode" always executes.
>>>>>>>>> 
>>>>>>>>> With this pattern, the SPARQL query can do arbitrary checks.
>>>>>>>>> 
>>>>>>>>>    Andy
>>>>>>>>> 
>>>>>>>>> ## ex-shapes.ttl
>>>>>>>>> PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>>>>>>> PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
>>>>>>>>> 
>>>>>>>>> PREFIX sh:      <http://www.w3.org/ns/shacl#>
>>>>>>>>> PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
>>>>>>>>> 
>>>>>>>>> PREFIX ex:        <http://example/>
>>>>>>>>> 
>>>>>>>>> ex:NotEmptyGraphShape
>>>>>>>>>  rdf:type sh:NodeShape ;
>>>>>>>>>  sh:targetNode "Empty Graph" ;
>>>>>>>>>  sh:sparql [
>>>>>>>>>    a sh:SPARQLConstraint ;
>>>>>>>>>    sh:select """
>>>>>>>>>       SELECT $this ?value
>>>>>>>>>       WHERE {
>>>>>>>>>            FILTER NOT EXISTS { ?s ?p ?o }
>>>>>>>>>       }
>>>>>>>>>       """ ;
>>>>>>>>>   ] .
>>>>>>>>> 
>>>>>>>>> On 11/05/2020 17:14, Chris Tomlinson wrote:
>>>>>>>>> 
>>>>>>>>>> I appreciate that it works that way but until and unless I can 
>>>>>>>>>> understand your point about
>>>>>>>>>>  [] sh:targetNode ex:myNode
>>>>>>>>>> then I don’t know how to distinguish: 1) no violations because a 
>>>>>>>>>> Person graph conforms to the PersonShapes - like there’s no Work 
>>>>>>>>>> indicated as a parent of the person or a rdfs:label is used where a 
>>>>>>>>>> skos:prefLabel is expected; versus 2) no violations because the 
>>>>>>>>>> question is vacuous like asking if a Work looks like a person or an 
>>>>>>>>>> empty non-existent graph looks like a person.
