Hi Andy, Thanks for the very helpful feedback.
1) I did not understand the proper use of sh:inversePath. I thought it was to verify that the target/value of the sh:path property had a property equal to the value of sh:inversePath. I see that is just not correct.

2) I’ve found an effective solution to the problem of limiting validation to just the triples that should be in the graph of a resource such as bdr:P707 by creating a new shapes module that uses [ ] sh:deactivated true on any propertyShapes that leave the graph in question.

I’m getting closer to being able to formulate a plausible extension to the shacl endpoint.

Thank you again for your help in the midst of all the 3.15.0 work,
Chris

> On May 16, 2020, at 5:45 AM, Andy Seaborne <[email protected]> wrote:
>
> On 15/05/2020 00:57, Chris Tomlinson wrote:
>> Hello Andy,
>> I have standalone code using validator.validate(Shapes, Graph, Node) where
>> the graph is a merge of the target graph, e.g., P707, and the ontology
>> graph. This works fine to validate examples like P707 generating sh:results
>> just for references to P705 which is not otherwise included in the merged
>> graph, which is what I expect.
>> If the code was running in Fuseki and the graph is the dataset graph (equiv
>> union graph I think) then I would like to know how far out from the node the
>> validation process will reach.
>
> That depends on the target (or, here, implicit class target) and the shape
> itself.
>
> digression
>
> ... something that I've experimented with - analysing the shapes to determine
> execution strategy. There are some useful cases:
>
> * the validation is only on the triple added (e.g. sh:datatype) - and does
> not need access to the database so it can be done in parallel outside the
> transaction.
> * the validation needs local changes (e.g. minCount) to the entity (subject
> and all triples with that subject) - that can be used to reduce the number of
> validations done. If an entity isn't touched, no validation necessary.
> * global - needs access to the whole database. Not much can be done except
> execute inline at the end of the transaction. Often these are SPARQL
> constraints where you can e.g. count the triples.
>
> /digression
>
>> For example, given that the shapes include the shape:
>> bds:PersonShape-hasParent
>>   a sh:PropertyShape ;
>>   sh:class bdo:Person ;
>>   sh:description "this Person may have at most two parents."@en ;
>>   sh:inversePath bdo:hasChild ;
>
> ??
>
>>   sh:maxCount 2 ;
>>   sh:path bdo:hasParent ;
>> .
>> Then I thought that the validation process would check just that:
>>   P705 rdf:type bdo:Person .
>> as well as validating the count constraint; and in the case of the shape:
>
> Yes - there are two constraints: sh:class and sh:maxCount
>
>> bds:PersonShape-hasFather
>>   a sh:PropertyShape ;
>>   sh:description "this Person may have a father."@en ;
>>   sh:inversePath bdo:hasChild ;
>
> Is that supposed to be:
>
>   sh:path [ sh:inversePath bdo:hasChild ]
>
> ?
>
> A property shape has a sh:path and that sh:path can be an inverse path.
>
> sh:inversePath isn't used on the property shape itself.
>
>>   sh:maxCount 1 ;
>>   sh:node bds:MaleShape ;
>>   sh:path bdo:hasFather ;
>
> and now we have two sh:paths?
>
> (If that is your shape, the sh:inversePath is going to be ignored as it is out
> of place.)
>
>> .
>> will in addition check that:
>>   P705 bdo:gender bdr:GenderMale .
>> and not check any other constraints on P705, such as its students or kinship
>> relations.
>
> If P705 is reached with "sh:path bdo:hasFather"
>
>> The purpose being that when a user has “edited” an existing resource or
>> “created” a new resource then we just want to validate the changed or new
>> resource without having the validation process traverse all resources
>> reachable from P707 via arbitrary length paths, which is unnecessary.
>> Assuming the validator.validate(Shapes, Graph, Node) works along the lines
>> I’ve sketched, then since the shacl endpoint doesn’t use this method it
>> would take an extension to the endpoint or a new endpoint to accomplish what
>> I’ve described.
>
> See the code.
>
>   validator.validate(Shapes, Graph, Node)
>
> executes the shapes (any that apply) to the single focus node. It does check
> the shapes to see which apply so the target clause (inc implicit targets) has
> to be something that would include the node.
>
>> I’m happy to raise an issue and create a PR if that makes sense.
>
> Great.
>
> Andy
>
>> Thank you again very much,
>> Chris
>>> On May 14, 2020, at 4:16 PM, Andy Seaborne <[email protected]> wrote:
>>>
>>> On 14/05/2020 19:06, Chris Tomlinson wrote:
>>>> Hi Andy,
>>>> I want to validate a named graph in the context of the union graph. I
>>>> don’t want to validate the union graph. The union graph has information in
>>>> it such as the ontology which defines subClass and subProperty relations
>>>> needed to successfully validate a target graph such as
>>>> http://purl.bdrc.io/graph/P707.
>>>
>>> I don't understand "in the context of the union graph."
>>>
>>> Isn't "Context" in RDF "merge the graphs"?
>>>
>>> Validation is a process that operates on a shapes graph (which is parsed so
>>> really it's just shapes - anything else in it is ignored) and a data graph.
>>>
>>> There's no structure to the data graph - it is everything being validated.
>>>
>>> I did suggest some SHACL extensions
>>>
>>> https://afs.github.io/shacl-datasets.html
>>>
>>> but they are hypothetical extensions.
>>>
>>> In code, you could make a temporary union of two or more graphs to make a
>>> single data graph.
>>>
>>> "a named graph in the context of the union graph."
>>>
>>> So the NG is in addition to the dataset graphs? or is it in the dataset
>>> already?
>>>
>>> In the SHACL service ?graph= is the data target and is taken from the
>>> dataset.
>>>
>>>> Also P707 refers to a parent and teacher P705 which needs to be verified
>>>> that it meets minimum criteria for a Person.
>>>> I thought that validate(shapes, graph, node)
>>>
>>> /** Produce a full validation report for this node in the data. */
>>>
>>> i.e. use node as the focus node (like sh:targetNode) and execute the shapes
>>> only with that node.
>>>
>>> But does P707 have one focus node or many?
>>>
>>>> should accomplish this if graph = the dataset graph which contains all
>>>> these additional bits of information.
>>>> That’s why the endpoint is interesting since it provides in principle
>>>> access to using shacl inside of Fuseki, where the entire dataset is
>>>> available, without having to write an independent bit of code that we add
>>>> to our fuseki deployments.
>>>
>>> There is nothing special about Fuseki endpoint - any Dataset has a union
>>> graph.
>>>
>>> It's a way to call
>>>
>>>   ValidationReport report =
>>>       ShaclValidator.get().validate(shapesGraph, data);
>>>
>>> on a remote data graph.
>>>
>>>> I hope this clarifies what I’m wanting to accomplish. I probably don’t
>>>> understand what validate(shapes, graph, node) is supposed to do.
>>>> Thanks for your patience,
>>>> Chris
>>>>> On May 14, 2020, at 12:34 PM, Andy Seaborne <[email protected]> wrote:
>>>>>
>>>>> ?graph names the graph to be validated.
>>>>>
>>>>> ?graph can be a URI of a named graph in the dataset
>>>>>
>>>>> or ?graph=default for the default graph (note: this is the storage
>>>>> default graph, not the union default graph)
>>>>>
>>>>> or ?graph=union for the union of all named graphs which is what I think
>>>>> you're asking for.
>>>>>
>>>>> (This is the org.apache.jena.fuseki.servlets.SHACL_Validation servlet.)
>>>>>
>>>>> On 14/05/2020 15:40, Chris Tomlinson wrote:
>>>>>> Hi Andy,
>>>>>> Thanks very much for the shacl guidance.
>>>>>> The use of sh:targetSubjectsOf
>>>>>> is quite helpful. I replaced the bdo:personName w/ bdo:isRoot which must
>>>>>> be present on any Entity resource so that if a Work or Place or other
>>>>>> entity is checked it will fail if it isn’t a bdo:Person.
>>>>>> This still fails in the event that there is no bdo:isRoot so in some way
>>>>>> that negative needs also to be caught to weed out really malformed
>>>>>> graphs.
>>>>>> I still have a question about the shacl endpoint:
>>>>>> Is the ?graph parameter validated in the context of the entire
>>>>>> dataset specified in the endpoint URL or just the named graph itself?
>>>>>> It appears to be just the named graph itself so is the same as running
>>>>>> the shacl command outside of Fuseki.
>>>>>
>>>>> Yes - as above, it can be the union.
>>>>>
>>>>>> We are wanting a validation of the named graph against the entire
>>>>>> (union) dataset graph
>>>>>
>>>>> Not sure what "against" means here. There is a shapes graph in the
>>>>> validate request and data graph, which can be the union graph of the
>>>>> dataset.
>>>>>
>>>>> To direct the validation to a certain node, use sh:targetNode.
>>>>>
>>>>>> which will have sufficient information about subClassOf* and external
>>>>>> resources like P705 without entailing a validation of all nodes
>>>>>> reachable from triples in the ?graph named graph. This might be similar
>>>>>> to:
>>>>>>   validator.validate(shapes, dsg, node)
>>>>>> where node would be the root resource URI like,
>>>>>> <http://purl.bdrc.io/resource/P707>.
>>>>>> Is this something that needs an issue raised and a bit of extension of
>>>>>> the endpoint or is there another way to get this kind of behavior
>>>>>> through the endpoint?
>>>>>> Thank you very much for your help,
>>>>>> Chris
>>>>>>> On May 13, 2020, at 12:16 PM, Andy Seaborne <[email protected]> wrote:
>>>>>>>
>>>>>>> On 13/05/2020 16:03, Chris Tomlinson wrote:
>>>>>>>> Hi Andy,
>>>>>>>> Thank you for the reply. I can get your example to work as you
>>>>>>>> indicate, but have some questions:
>>>>>>>> 1) I went through the latest SHACL draft
>>>>>>>> <https://w3c.github.io/data-shapes/shacl/> and I cannot find how to
>>>>>>>> know that sh:targetNode always executes. It’s also not clear to me
>>>>>>>> what it means to execute. I thought that sh:targetNode X was a way to
>>>>>>>> restrict a shape to X in the data graph, whatever X might be.
>>>>>>>
>>>>>>> It sets the target node to X and that becomes $this.
>>>>>>>
>>>>>>> It does not say the target has to be in the graph.
>>>>>>>
>>>>>>> The tests use this idiom quite a lot.
>>>>>>>
>>>>>>> This matters because in some places the spec is not complete and
>>>>>>> without some light reverse engineering from the tests, I'd not have
>>>>>>> been able to implement some of the SPARQL functionality (particularly
>>>>>>> SPARQL components, not the SPARQL constraints we're talking about here).
>>>>>>>
>>>>>>> Also, RDF graphs do not have a formally defined set of nodes - they are
>>>>>>> a set of edges and any nodes you want can be used in triples.
>>>>>>>
>>>>>>>> 2) What I’m trying to do is validate that a resource like
>>>>>>>> http://purl.bdrc.io/resource/P707
>>>>>>>> is a Person, which at a minimum means that:
>>>>>>>>   <http://purl.bdrc.io/resource/P707> a
>>>>>>>>   <http://purl.bdrc.io/ontology/core/Person> .
>>>>>>>> is present in the http://purl.bdrc.io/graph/P707.
>>>>>>>> The PersonShape
>>>>>>>> <https://github.com/buda-base/editor-templates/blob/master/templates/core/person.shapes.ttl>
>>>>>>>> has:
>>>>>>>>   sh:targetClass bdo:Person
>>>>>>>> but that only serves to say that PersonShape only applies to resources
>>>>>>>> of class bdo:Person and if there are none, then there are no
>>>>>>>> violations which means I can try to validate a bibliographic element
>>>>>>>> such as http://purl.bdrc.io/resource/W1FPL1
>>>>>>>> which is of class bdo:ImageInstance
>>>>>>>> but of course that still sh:conforms true since
>>>>>>>> bds:PersonShape doesn’t apply and hence there aren’t any violations.
>>>>>>>> (to see the resources, use
>>>>>>>> http://ldspdi-dev.bdrc.io/resource/W1FPL1.ttl, for example).
>>>>>>>> The use case is: a client submits a graph of a resource and claims it
>>>>>>>> to be a bdo:Person or a subClassOf* it; and we want to validate the
>>>>>>>> graph as a bdo:Person and so want to get the result “false” for
>>>>>>>> bdr:W1FPL1 instead of “true”.
>>>>>>>> It’s our intent to use a tool like shacl for this top-level task as
>>>>>>>> well as validating the details like having at least one name, a
>>>>>>>> gender, and so on.
>>>>>>>> I tried using something like your example:
>>>>>>>> bds:CheckPersonClassShape a sh:NodeShape ;
>>>>>>>>   rdfs:label "Check Person Class Shape"@en ;
>>>>>>>>   sh:targetNode "Check Class" ;
>>>>>>>>   sh:sparql [
>>>>>>>>     a sh:SPARQLConstraint ;
>>>>>>>>     sh:prefixes [
>>>>>>>>       sh:declare [
>>>>>>>>         sh:prefix "rdf" ;
>>>>>>>>         sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ;
>>>>>>>>       ] , [
>>>>>>>>         sh:prefix "bdo" ;
>>>>>>>>         sh:namespace "http://purl.bdrc.io/ontology/core/" ;
>>>>>>>>       ]
>>>>>>>>     ] ;
>>>>>>>>     sh:select """
>>>>>>>>       select $this (rdf:type as ?path) (bdo:Person as ?value)
>>>>>>>>       where {
>>>>>>>>         filter not exists { $this ?path ?value }
>>>>>>>>       }
>>>>>>>>     """ ;
>>>>>>>
>>>>>>> That query does not look right.
>>>>>>>
>>>>>>> 1/ $this is the targetNode
>>>>>>>
>>>>>>> $this is "Check Class" - the shape needs to find the thing that is the
>>>>>>> person amongst the several subjects in the data. That can be in the
>>>>>>> SPARQL or as a target of some kind.
>>>>>>>
>>>>>>> Either set the target to be bdr:P707
>>>>>>> or find a signature such as a bdo:personName triple.
>>>>>>>   "sh:targetSubjectsOf bdo:personName"
>>>>>>> or write some pattern in the SPARQL query.
>>>>>>>
>>>>>>> You may want some "whole graph" validation such as not completely empty
>>>>>>> or has at least some relevant vocabulary to ensure that the data is not
>>>>>>> so off that nothing will trigger. That's where the sh:targetNode
>>>>>>> "foobar" trick comes in.
>>>>>>>
>>>>>>> 2/ It's looking for any triple with $this as subject, not "a bdo:Person"
>>>>>>>
>>>>>>> The SELECT-AS happens after the WHERE.
>>>>>>> FILTER NOT EXISTS does not set ?path ?value so if they are unset there
>>>>>>> are free variables.
>>>>>>>
>>>>>>>   filter not exists { $this ?P ?O }
>>>>>>>
>>>>>>> would be just the same and matches any triple with $this as subject.
>>>>>>>
>>>>>>> You want to set ?value and ?path before the FILTER:
>>>>>>>
>>>>>>>   BIND (bdo:Person as ?value)
>>>>>>>   BIND (rdf:type as ?path)
>>>>>>>
>>>>>>> or write directly and not worry about ?path and ?value.
>>>>>>>
>>>>>>>   filter not exists { $this rdf:type bdo:Person }
>>>>>>>
>>>>>>> (
>>>>>>> The message processing from SPARQL constraints and components doesn't
>>>>>>> do templating.
>>>>>>> )
>>>>>>>
>>>>>>>>   ] ;
>>>>>>>> .
>>>>>>>> But this just always reports a violation that the literal, “Check
>>>>>>>> Class”, doesn’t conform, which is true since it isn’t in the data
>>>>>>>> graph.
>>>>>>>
>>>>>>> bds:CheckPersonClassShape a sh:NodeShape ;
>>>>>>>   rdfs:label "Check Person Class Shape"@en ;
>>>>>>>   ## sh:targetNode bdr:P707 ;
>>>>>>>   sh:targetSubjectsOf bdo:personName ;
>>>>>>>   sh:sparql [
>>>>>>>     a sh:SPARQLConstraint ;
>>>>>>>     sh:prefixes [
>>>>>>>       sh:declare [
>>>>>>>         sh:prefix "rdf" ;
>>>>>>>         sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ;
>>>>>>>       ] , [
>>>>>>>         sh:prefix "bdo" ;
>>>>>>>         sh:namespace "http://purl.bdrc.io/ontology/core/" ;
>>>>>>>       ]
>>>>>>>     ] ;
>>>>>>>     sh:select """
>>>>>>>       select $this
>>>>>>>       where {
>>>>>>>         filter not exists { $this rdf:type bdo:Person }
>>>>>>>       }
>>>>>>>     """ ;
>>>>>>>   ] ;
>>>>>>> .
>>>>>>>
>>>>>>> shacl validate -v -s shapes.ttl -d P707.ttl
>>>>>>>
>>>>>>> shows the validation when "a bdo:Person ;" is commented out of the data:
>>>>>>>
>>>>>>> NodeShape[http://example/CheckPersonClassShape]
>>>>>>> N: FocusNodes(1): [http://purl.bdrc.io/resource/P707]
>>>>>>> F: http://purl.bdrc.io/resource/P707
>>>>>>> S: NodeShape[http://example/CheckPersonClassShape]
>>>>>>> C: SPARQL[PREFIX bdo: <http://purl.bdrc.io/ontology/core/> PREFIX
>>>>>>> rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> SELECT ?this
>>>>>>> WHERE { FILTER NOT EXISTS { ?this rdf:type bdo:Person } }]
>>>>>>>
>>>>>>> ... prefixes ...
>>>>>>>
>>>>>>> [ a sh:ValidationReport ;
>>>>>>>   sh:conforms false ;
>>>>>>>   sh:result [ a sh:ValidationResult ;
>>>>>>>     sh:focusNode bdr:P707 ;
>>>>>>>     sh:resultMessage "SPARQL SELECT constraint for <http://purl.bdrc.io/resource/P707> returns <http://purl.bdrc.io/resource/P707>" ;
>>>>>>>     sh:resultSeverity sh:Violation ;
>>>>>>>     sh:sourceConstraintComponent sh:SPARQLConstraintComponent ;
>>>>>>>     sh:sourceShape bds:CheckPersonClassShape ;
>>>>>>>     sh:value bdr:P707
>>>>>>>   ]
>>>>>>> ] .
>>>>>>>
>>>>>>>> 3) The original reason for wanting to use the shacl endpoint was so
>>>>>>>> that we could PUT the submitted graph in the Fuseki dataset and then
>>>>>>>> use the endpoint to validate the resource bdr:P707 (or bdr:W1FPL1) as
>>>>>>>> a Person (or not) with the rest of the dataset graph available to
>>>>>>>> handle things like subClassOf* and subPropertyOf* for various items
>>>>>>>> as well as validating the minimum of resources referenced by P707 such
>>>>>>>> as that P705 is a male person and hence can be a father of P707.
>>>>>>>
>>>>>>> That sounds like
>>>>>>>
>>>>>>>   sh:targetNode bdr:P707
>>>>>>>
>>>>>>> and also some shapes to check "is there anything relevant at all".
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>>> The graph for P707 that is submitted would only have references to
>>>>>>>> P705, with no properties on P705, since that resource is in its own
>>>>>>>> graph.
>>>>>>>> I thought this is pretty much how validate(Shapes, Graph, Node) would
>>>>>>>> work, where Graph would be the union dataset graph.
>>>>>>>> I’m evidently missing some understanding.
>>>>>>>> I appreciate your patience,
>>>>>>>> Chris
>>>>>>>>> On May 12, 2020, at 3:52 AM, Andy Seaborne <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Chris,
>>>>>>>>>
>>>>>>>>> Here's a shape that always executes and tests for an empty data graph.
>>>>>>>>>
>>>>>>>>> # No violation
>>>>>>>>> shacl validate -v -shapes ex-shapes.ttl -data not-empty.ttl
>>>>>>>>>
>>>>>>>>> # Violation
>>>>>>>>> shacl validate -v -shapes ex-shapes.ttl -data empty.nt
>>>>>>>>>
>>>>>>>>> "sh:targetNode" always executes.
>>>>>>>>>
>>>>>>>>> With this pattern, the SPARQL query can do arbitrary checks.
>>>>>>>>>
>>>>>>>>> Andy
>>>>>>>>>
>>>>>>>>> ## ex-shapes.ttl
>>>>>>>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>>>>>>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>>>>>>>>>
>>>>>>>>> PREFIX sh: <http://www.w3.org/ns/shacl#>
>>>>>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>>>>>>>>
>>>>>>>>> PREFIX ex: <http://example/>
>>>>>>>>>
>>>>>>>>> ex:NotEmptyGraphShape
>>>>>>>>>   rdf:type sh:NodeShape ;
>>>>>>>>>   sh:targetNode "Empty Graph" ;
>>>>>>>>>   sh:sparql [
>>>>>>>>>     a sh:SPARQLConstraint ;
>>>>>>>>>     sh:select """
>>>>>>>>>       SELECT $this ?value
>>>>>>>>>       WHERE {
>>>>>>>>>         FILTER NOT EXISTS { ?s ?p ?o }
>>>>>>>>>       }
>>>>>>>>>     """ ;
>>>>>>>>>   ] .
>>>>>>>>>
>>>>>>>>> On 11/05/2020 17:14, Chris Tomlinson wrote:
>>>>>>>>>
>>>>>>>>>> I appreciate that it works that way but until and unless I can
>>>>>>>>>> understand your point about
>>>>>>>>>>   [] sh:targetNode ex:myNode
>>>>>>>>>> then I don’t know how to distinguish: 1) no violations because a
>>>>>>>>>> Person graph conforms to the PersonShapes - like there’s no Work
>>>>>>>>>> indicated as a parent of the person or a rdfs:label is used where a
>>>>>>>>>> skos:prefLabel is expected; versus 2) no violations because the
>>>>>>>>>> question is vacuous like asking if a Work looks like a person or an
>>>>>>>>>> empty non-existent graph looks like a person.
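
For reference, the two points at the top of this message might look roughly like the Turtle below. This is only a sketch under stated assumptions: the ex: names are hypothetical stand-ins, not the actual bds: shapes, and the choice of which property shape to deactivate is purely illustrative. The first part nests the inverse path inside sh:path, as Andy suggests in the thread; the second part is a separate module that switches off a property shape with sh:deactivated.

```turtle
PREFIX sh:  <http://www.w3.org/ns/shacl#>
PREFIX bdo: <http://purl.bdrc.io/ontology/core/>
PREFIX ex:  <http://example/>    # hypothetical namespace, for illustration only

# Node shape that attaches the property shape to every bdo:Person.
ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass bdo:Person ;
    sh:property ex:PersonShape-parentViaInverse ;
.

# The inverse path sits inside sh:path, not directly on the property shape.
# The value nodes are the subjects ?x of triples "?x bdo:hasChild $this".
ex:PersonShape-parentViaInverse
    a sh:PropertyShape ;
    sh:path [ sh:inversePath bdo:hasChild ] ;
    sh:class bdo:Person ;
    sh:maxCount 2 ;
.

# Separate, hypothetical module: deactivate a property shape whose
# constraints would need data from outside the resource's own graph.
ex:PersonShape-hasFather sh:deactivated true .
```

With the first shape, a focus node is reported when more than two subjects point at it with bdo:hasChild, or when one of those subjects is not typed as bdo:Person in the data graph being validated. A deactivated shape is skipped by the validator, so loading the second module alongside the main shapes removes that check without editing the original shape definitions.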
