On 01/06/2020 21:08, Chris Tomlinson wrote:
Hi Andy,
Not trying to be pedantic below but I’m trying to understand how to think in
shacl and establish some expectations of the validation process.
If it helps, the general pattern is
Target ->
(Node shape -> property shape->)*
Constraint*
On May 31, 2020, at 9:40 AM, Andy Seaborne <[email protected]> wrote:
Do we agree that this is a test case?
(one file, data and shapes combined)
Only command line tools needed.
I agree that the combined data and shapes file exhibits differences in report
results, when interchanging bds:PersonShape and bds:PersonLocalShape.
------------------------
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix bdo: <http://purl.bdrc.io/ontology/core/> .
@prefix bdr: <http://purl.bdrc.io/resource/> .
@prefix bds: <http://purl.bdrc.io/ontology/shapes/core/> .
## Data:
bdr:NM0895CB6787E8AC6E
a bdo:PersonName ;
.
bdr:P707 a bdo:Person ;
bdo:personName bdr:NM0895CB6787E8AC6E ;
.
## Shapes:
#bds:PersonShape # 2
bds:PersonLocalShape # 1
sh:property bds:PersonShape-personName ;
sh:targetClass bdo:Person ;
.
bds:PersonShape-personName
sh:message "PersonName is not well-formed, wrong Class or missing
rdfs:label"@en ;
sh:node bds:PersonNameShape ;
sh:path bdo:personName ;
.
bds:PersonNameShape a sh:NodeShape ;
sh:property bds:PersonNameShape-personNameLabel ;
sh:targetClass bdo:PersonName ;
.
bds:PersonNameShape-personNameLabel
sh:message ":PersonName must have exactly one rdfs:label"@en ;
sh:minCount 1 ;
sh:path rdfs:label ;
.
------------------------
The difference seems to be that the hash order is different and it affects
finding targets, combined with the fact that targets are nested:
I see JENA-1907 <https://issues.apache.org/jira/browse/JENA-1907> raises the
issue; I understand:
If A is processed first as a target then the parser shapes now includes B so
processing B is skipped.
Note - the effect is only in the number of times constraints are executed,
once or twice, not whether they are omitted.
to say that, in the current test case w/ the hash order issue, when nesting occurs owing
to sh:node, then when a violation is found by (A) bds:PersonShape-personName, then the
validation does not "go deeper" to consider (B) bds:PersonNameShape, by itself.
W/o sh:node, in bds:PersonShape-personName, then both bds:PersonShape-personName and
bds:PersonNameShape are parsed as independent targets and executed independently.
bds:PersonLocalShape (target)
-> bds:PersonLocalShape
-> bds:PersonNameShape (target)
-> bds:PersonNameShape-personNameLabel
I think the second line above is supposed to be
-> bds:PersonShape-personName
Both targets match bdr:P707, one by class, one by property.
I understand the NodeShape, bds:PersonLocalShape, matching bdr:P707, meaning,
to me, that the constraints expressed in that shape need to be evaluated w/
P707 being the subject (== focus node). I take this to be “by class”.
I do not understand how NodeShape, bds:PersonNameShape, matches bdr:P707. I
think bds:PersonNameShape matches bdr:NM0895CB6787E8AC6E because of
sh:targetClass bdo:PersonName.
1/
bds:PersonShape
sh:targetClass bdo:Person
-> bdr:P707
and it has
sh:property bds:PersonShape-personName ;
->
sh:node bds:PersonNameShape ;
->
sh:property bds:PersonNameShape-personNameLabel ;
2/
bds:PersonNameShape a sh:NodeShape ;
sh:property bds:PersonNameShape-personNameLabel ;
sh:targetClass bdo:PersonName ; <-- which is part of bdr:P707
-> bdr:NM0895CB6787E8AC6E ;
so two ways to get to bds:PersonNameShape-personNameLabel from target
declarations.
(try "shacl validate -v")
In case1: you can see the paths:
2 targets.
each with one focus node
leading to the same property shape /PersonNameShape-personNameLabel
which has a constraint.
(I checked the spec and it only says to execute once if the same
focus node comes up multiple times for the same target shape but here
there are two different target shapes. TQ shacl agrees.)
F: Focus node
S: Node Shape
P: Property Shape.
C: Constraint
NodeShape[http://purl.bdrc.io/ontology/shapes/core/PersonLocalShape]
N: FocusNodes(1): [http://purl.bdrc.io/resource/P707]
F: http://purl.bdrc.io/resource/P707
S: NodeShape[http://purl.bdrc.io/ontology/shapes/core/PersonLocalShape]
P:
PropertyShape[http://purl.bdrc.io/ontology/shapes/core/PersonShape-personName
-> <http://purl.bdrc.io/ontology/core/personName>]
C: http://purl.bdrc.io/resource/P707 :: Node
S: NodeShape[http://purl.bdrc.io/ontology/shapes/core/PersonNameShape]
P:
PropertyShape[http://purl.bdrc.io/ontology/shapes/core/PersonNameShape-personNameLabel
-> <http://www.w3.org/2000/01/rdf-schema#label>]
C: http://purl.bdrc.io/resource/NM0895CB6787E8AC6E :: minCount[1]
NodeShape[http://purl.bdrc.io/ontology/shapes/core/PersonNameShape]
N: FocusNodes(1): [http://purl.bdrc.io/resource/NM0895CB6787E8AC6E]
F: http://purl.bdrc.io/resource/NM0895CB6787E8AC6E
S: NodeShape[http://purl.bdrc.io/ontology/shapes/core/PersonNameShape]
P:
PropertyShape[http://purl.bdrc.io/ontology/shapes/core/PersonNameShape-personNameLabel
-> <http://www.w3.org/2000/01/rdf-schema#label>]
C: http://purl.bdrc.io/resource/NM0895CB6787E8AC6E :: minCount[1]
It should execute twice -
I’m not following the referent “it” (but see below, I think I may).
The constraint(s) of bds:PersonShape-personName
My understanding of (target) bds:PersonLocalShape is that for resources of
targetClass, bdo:Person, check that the constraints expressed in
bds:PersonShape-personName conform for all objects of bdo:personName where the
subject of that property path is bdr:P707 (in this case); and
(target) bds:PersonNameShape says that for resources of targetClass,
bdo:PersonName, check that the constraints in bds:PersonNameShape-personNameLabel
conform where the resource is a bdo:PersonName, in this case
bdr:NM0895CB6787E8AC6E.
I don’t see what’s supposed to execute twice.
/PersonNameShape-personNameLabel
and constraint minCount[1] on NM0895CB6787E8AC6E
but did you mean to do this in the first place? Note while it is a minCount failure,
because of going through the sh:node, the message is the "wrong Class" one
because executing via bds:PersonShape-personName makes that the message.
I meant to express that for a bdo:Person there must be at least 1
bdo:personName - via bds:PersonShape-personName (the test case omits
sh:minCount 1 in bds:PersonShape-personName);
Yes - because that minCount was not a factor.
I worked through the data removing each element that did not affect the
outcome, 3 vs 2, then removed the SPARQL constraint which is not relevant
(it contributed one violation in both cases) leaving 2 vs 1.
and that is due to the /PersonNameShape-personNameLabel minCount
and that a conforming bdo:PersonName must have exactly 1 rdfs:label (the test
case omits sh:maxCount 1 in bds:PersonNameShape-personNameLabel).
I used "sh:node bds:PersonNameShape" in the declaration for
bds:PersonShape-personName to identify the particular NodeShape that is intended to validate
objects of the "sh:path bdo:personName" in this situation.
Perhaps I see what is "supposed to execute twice".
With the "sh:node bds:PersonNameShape" in bds:PersonShape-personName, then
bds:PersonNameShape validation must be executed (if it hasn’t already been
executed); and
since bdr:NM0895CB6787E8AC6E will match bds:PersonNameShape separately by
considering “sh:targetClass bdo:PersonName” then unless there is some check in
the validator to see if a (node, shape) pair has already been executed, then
there will be 2 executions instead of just 1.
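The counting argument above can be tried out on a toy model. The following is only a sketch in plain Python, assuming a simplified traversal (target -> node shape -> property shape -> constraint); it is not how Jena or TopQuadrant implement validation, and the names validate_node, run and seen are hypothetical helpers introduced for illustration.

```python
# Toy model of: target -> node shape -> property shape -> constraint.
# NOT Jena's or TopQuadrant's implementation; data and shape names mirror the test case.

DATA = {
    "bdr:P707": {
        "rdf:type": ["bdo:Person"],
        "bdo:personName": ["bdr:NM0895CB6787E8AC6E"],
    },
    "bdr:NM0895CB6787E8AC6E": {"rdf:type": ["bdo:PersonName"]},
}

# Each node shape has a target class and a list of property shapes.
# A property shape may nest a node shape (sh:node) and/or carry a minCount.
SHAPES = {
    "bds:PersonLocalShape": {
        "targetClass": "bdo:Person",
        "properties": [
            {"path": "bdo:personName", "node": "bds:PersonNameShape", "minCount": None}
        ],
    },
    "bds:PersonNameShape": {
        "targetClass": "bdo:PersonName",
        "properties": [{"path": "rdfs:label", "node": None, "minCount": 1}],
    },
}


def validate_node(focus, shape_name, executions, seen=None):
    """Apply shape_name to focus, recording every minCount check that runs."""
    if seen is not None:
        # Hypothetical (node, shape) check: skip pairs that were already validated.
        if (focus, shape_name) in seen:
            return
        seen.add((focus, shape_name))
    for prop in SHAPES[shape_name]["properties"]:
        values = DATA.get(focus, {}).get(prop["path"], [])
        if prop["minCount"] is not None:
            conforms = len(values) >= prop["minCount"]
            executions.append((focus, prop["path"], conforms))
        if prop["node"] is not None:
            # sh:node: each value node becomes a focus node of the nested shape.
            for value in values:
                validate_node(value, prop["node"], executions, seen)


def run(dedupe):
    """Validate every target; return the list of executed minCount checks."""
    executions = []
    seen = set() if dedupe else None
    for shape_name, shape in SHAPES.items():
        for node, props in DATA.items():
            if shape["targetClass"] in props.get("rdf:type", []):
                validate_node(node, shape_name, executions, seen)
    return executions
```

In this model run(False) records the minCount check on bdr:NM0895CB6787E8AC6E twice (once via sh:node, once via sh:targetClass), while run(True) records it once. Per the reading of the spec given earlier in the thread, the two executions are the expected behaviour because two different target shapes are involved.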
You can see the differences with "shacl print".
I do see differences w/ "shacl parse" w/ and w/o "sh:node bds:PersonNameShape".
I’ll learn to use the tool.
My take away is that I shouldn’t be using sh:node as I have or perhaps I could remove the
sh:targetClass from bds:PersonNameShape and use sh:node to steer the validation. But I
guess the latter would lead to the generic "PersonName is not well-formed …" message
instead of the more specific "PersonName must have exactly one rdfs:label”.
Duplication arises when there is a target that is also referred to by
another target by some connection through the shapes graph - sh:node is
one way of doing this.
There are other ways to link in a constraint twice like graph linking:
## Data:
:foo a :C ;
:prop 1 , 2 .
## Shapes:
:A
sh:targetClass :C ;
sh:property :P .
:B
sh:targetClass :C ;
sh:property :P .
:P
sh:path :prop ;
sh:message "Hello world" ;
sh:maxCount 1 .
2 violations, both with "Hello world", for the same reason.
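The count of two can be reproduced with a small stand-in for the validator. This is a sketch in plain Python under the assumption that every target shape evaluates each property shape it links to, independently of any other target shape; validate and the "via" key are hypothetical and not part of any SHACL report vocabulary.

```python
# Toy model of the graph-linking example: :A and :B both target :C and share :P.
# NOT a real SHACL engine; it only counts how often the shared shape is evaluated.

DATA = {":foo": {"rdf:type": [":C"], ":prop": [1, 2]}}

PROPERTY_SHAPES = {
    ":P": {"path": ":prop", "maxCount": 1, "message": "Hello world"},
}

NODE_SHAPES = {
    ":A": {"targetClass": ":C", "property": [":P"]},
    ":B": {"targetClass": ":C", "property": [":P"]},
}


def validate():
    """Return one result per (target shape, focus node, property shape) violation."""
    results = []
    for shape_name, shape in NODE_SHAPES.items():
        for focus, props in DATA.items():
            if shape["targetClass"] not in props.get("rdf:type", []):
                continue  # focus node is not a target of this shape
            for ps_name in shape["property"]:
                ps = PROPERTY_SHAPES[ps_name]
                values = props.get(ps["path"], [])
                if len(values) > ps["maxCount"]:
                    results.append(
                        {
                            "focusNode": focus,
                            "sourceShape": ps_name,
                            "via": shape_name,  # illustration only
                            "message": ps["message"],
                        }
                    )
    return results
```

Calling validate() returns two results with the same focus node, source shape and message; only the illustrative "via" entry tells them apart, which matches the observation that the two violations arise for the same reason.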
There seem to be many nuances to shacl.
Anyway thanks very much for the valuable information regarding using shacl,
Chris
Andy
On 29/05/2020 20:39, Chris Tomlinson wrote:
Hi Andy,
Thank you for the reply. Focussing on just the first question. I have prepared
small self-contained tests of jena-shacl from 3.14.0 (JS) and TopQuadrant Shacl
1.3.2 (TQ).
The apps differ only according to differences imposed by the JS and TQ APIs:
ShaclName_validateGraphJS.java <https://pastebin.com/5382xZeL>
ShaclName_validateGraphTQ.java <https://pastebin.com/3BxmyhqA>
The DATA_P707.ttl <https://pastebin.com/ugCZfABj> contains the three needed
triples from the ontology and the bare minimum from the example P707 with two
different errors in two of the PersonName instances.
The ShapeName_01.ttl <https://pastebin.com/jDqzvPTe> contains the shape
definitions and all tests are performed only by changing the name on line 9.
The ShaclName_validateGraphJS-results-PersonShape.txt
<https://pastebin.com/seEfWKNa> shows the results when the JS app is run with
the name bds:PersonShape and gives the expected results.
The ShaclName_validateGraphJS-results-PersonLocalShape…
<https://pastebin.com/q1SWMC4H> shows the results when the JS app is run with
the name bds:PersonLocalShape and gives unexpected results. Namely, the expected
violation regarding the PersonName which uses skos:prefLabel instead of rdfs:label is
erroneously reported as conforming.
The ShaclName_validateGraphJS-results-varying.txt
<https://pastebin.com/CNwnE5kg> shows results for names ranging from “P”, “Pe”,
“Per” thru “PersonLocal”, “PersonShape” up to “PersonLocalShape”, “PersonLocalShaper”,
and finally “PersonLocalShapers” for the JS app. In the table a “0” means the
unexpected result and a “1” means the expected result - 7 names produce unexpected
results and 20 names produce expected results.
The ShaclName_validateGraphTQ-results.txt <https://pastebin.com/BQnStjVq> shows the
results when the TQ app is run for any spelling of the name on line 9 of ShapeName_01.ttl
<https://pastebin.com/jDqzvPTe>. The results are the expected results as with some
spellings of the name in the JS case. TQ shows no variation owing to the name on line 9 as
is expected.
(Note: The TQ engine needed to be re-initialized for each use otherwise it
accumulated results. This is why there is an init of the ShaclSimpleValidator
at each use in the JS app even though it is not needed. I just wanted to
produce as much as possible an apples-to-apples comparison of JS and TQ.)
(Note: The TQ report does not include sh:conforms true ; in the results, just:
[ a sh:ValidationReport ] . I don’t know if this conforms to the SHACL
spec but that’s another matter.)
The results from the command line tests show the same as the above.
Running with line 9 of ShapeName_01.ttl <https://pastebin.com/jDqzvPTe> set
to bds:PersonLocalShape:
shacl v -s ShapeName_01.ttl -d DATA_P707.ttl > PersonLocalShape_JS_Results.ttl
<https://pastebin.com/M9s859Kc>
produces the unexpected results, namely there is no detail regarding the
missing rdfs:label on bdr:NM0895CB6787E8AC6E.
However, running with line 9 of ShapeName_01.ttl
<https://pastebin.com/jDqzvPTe> set to bds:PersonShape:
shacl v -s ShapeName_01.ttl -d DATA_P707.ttl > PersonShape_JS_Results.ttl
<https://pastebin.com/DhBNucpX>
produces the expected results, in that the detail regarding the missing
rdfs:label on bdr:NM0895CB6787E8AC6E is present among the results.
I did not set up the TQ command line but I think the above TQ results make this
testing unnecessary.
I think these tests show that there is an unexpected dependence on a shape name
in the JS library and not in the TQ library. I think this is an error and I can
open a JIRA issue if appropriate.
A consideration I have is that we want to be able to use the fuseki shacl
endpoint for some processing and hence need to understand the expected behavior
of the JS library which is integrated.
Thank you again for your help
Chris
On May 29, 2020, at 6:26 AM, Andy Seaborne <[email protected]> wrote:
Question 1: regarding the name bds:PersonShape at line 9 of ShapeName_01.ttl
<https://pastebin.com/spJJAsJ3>. With that name the results of running
ShaclName_validateGraph.java <https://pastebin.com/qvUy2XeB> are as expected, see
ShapeName-results-PersonShape.txt <https://pastebin.com/Hbk4dj04>.
There are two errors in P707_nameErrs02.ttl <https://pastebin.com/8wZeMiEU> regarding
bdr:NMC2A097019ABA499F and bdr:NM0895CB6787E8AC6E which are reported in the
ShapeName-results-PersonShape.txt <https://pastebin.com/Hbk4dj04> file.
However, if the name at line 9 of ShapeName_01.ttl <https://pastebin.com/spJJAsJ3> is
changed to: bds:PersonLocalShape or bds:Frogs; then detail for bdr:NM0895CB6787E8AC6E
reports, (see ShapeName-results-PersonLocalShape.txt <https://pastebin.com/f4F9h1E2>):
[ a sh:ValidationReport ;
sh:conforms true ] .
instead of:
[ a sh:ValidationReport ;
sh:conforms false ;
sh:result [ a sh:ValidationResult ;
sh:focusNode bdr:NM0895CB6787E8AC6E ;
sh:resultMessage ":PersonName must have exactly one
rdfs:label"@en ;
sh:resultPath rdfs:label ;
sh:resultSeverity sh:Violation ;
sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
sh:sourceShape
bds:PersonNameShape-personNameLabel
]
] .
which is the result with bds:PersonShape at line 9 of ShapeName_01.ttl
<https://pastebin.com/spJJAsJ3>. In fact changing the name to bds:FrogTarts
also produces the expected results.
Summary: If the shape name at line 9 of ShapeName_01.ttl
<https://pastebin.com/spJJAsJ3> is either bds:PersonShape or bds:FrogTarts then
the results are as expected; while if the shape name is either bds:PersonLocalShape
or bds:Frogs then one of the detail results disappears and is replaced by
sh:conforms true.
Why this dependence on the shape name? The shape name isn’t referred to elsewhere in
ShapeName_01.ttl <https://pastebin.com/spJJAsJ3>.
A way to check is run both Jena Shacl and TQ Shacl and see if they get the same
violations
I ran the shapes and data in both and get 32 violations (with no ontology added)
and then running with the datafile as P707+ontology. Now 5 results each.
shacl v -s ShapeName_01.ttl -d P707_nameErrs02.ttl > V1.ttl
tb-shacl -shapesfile ShapeName_01.ttl -datafile P707_nameErrs02.ttl
The name of the shape does not seem to make a difference when run like this.
Have you tried with targetNode to select the node to validate? With a subset of
the shapes? That would make discussing it much easier, as would
self-contained data (the ontology isn't particularly small).
Do you have an example which has one target shape and shows differences?
This:
bds:PersonShape-personName
a sh:PropertyShape ;
sh:class bdo:PersonName ;
sh:message "PersonName is not well-formed, wrong Class or missing
rdfs:label"@en ;
sh:minCount 1 ;
sh:node bds:PersonNameShape ;
sh:nodeKind sh:IRI ;
sh:path bdo:personName ;
.
(and others) could be split up into separate shapes, one per constraint (this
has node kind, node shape, and minCount) which might make the report clearer
bds:PersonNameShape also has a target - it can get called via two different
routes.
It's quite complicated to track what's going on.