On May 16, 2020, at 5:45 AM, Andy Seaborne <a...@apache.org> wrote:
On 15/05/2020 00:57, Chris Tomlinson wrote:
Hello Andy,
I have standalone code using validator.validate(Shapes, Graph, Node) where the
graph is a merge of the target graph, e.g., P707, and the ontology graph. This
works fine to validate examples like P707 generating sh:results just for
references to P705 which is not otherwise included in the merged graph, which
is what I expect.
If the code was running in Fuseki and the graph is the dataset graph (equivalent to the
union graph, I think), then I would like to know how far out from the node the
validation process will reach.
That depends on the target (or, here, implicit class target) and the shape
itself.
digression
... something that I've experimented with - analysing the shapes to determine
execution strategy. There are some useful cases:
* the validation is only on the triple added (e.g. sh:datatype) - and does not
need access to the database so it can be done in parallel outside the
transaction.
* the validation needs local changes (e.g. minCount) to the entity (subject and
all triples with that subject) - that can be used to reduce the number of
validations done. If an entity isn't touched, no validation is necessary.
* global - needs access to the whole database. Not much can be done except
execute inline at the end of the transaction. Often these are SPARQL constraints
where you can e.g. count the triples.
/digression
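To make the three cases concrete, here is one shape of each kind. This is a sketch under assumptions: the ex: properties are hypothetical, and which bucket each shape falls into is one reading of the digression, not a description of Jena's implementation.

```turtle
PREFIX sh:  <http://www.w3.org/ns/shacl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX bdo: <http://purl.bdrc.io/ontology/core/>
PREFIX ex:  <http://example/>

# Triple-local: both the target (subjects of ex:birthYear) and the check
# (the datatype of the value) can be decided from the added triple alone.
ex:BirthYearShape
    a sh:NodeShape ;
    sh:targetSubjectsOf ex:birthYear ;       # hypothetical property
    sh:property [
        sh:path ex:birthYear ;
        sh:datatype xsd:gYear ;
    ] .

# Entity-local: sh:minCount needs all triples with the focus node as subject,
# but nothing about other resources (assuming the type is asserted directly;
# following rdfs:subClassOf* for the target would also need the ontology).
ex:PersonNameShape
    a sh:NodeShape ;
    sh:targetClass bdo:Person ;
    sh:property [
        sh:path bdo:personName ;
        sh:minCount 1 ;
    ] .

# Global: uniqueness of an identifier has to look at every other resource
# in the database, so it can only run against the whole dataset.
ex:UniqueIdentifierShape
    a sh:NodeShape ;
    sh:targetSubjectsOf ex:identifier ;      # hypothetical property
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:select """
            PREFIX ex: <http://example/>
            SELECT $this ?value
            WHERE {
              $this ex:identifier ?value .
              ?other ex:identifier ?value .
              FILTER (!sameTerm(?other, $this))
            }
            """ ;
    ] .
```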
For example, given that the shapes include the shape:
bds:PersonShape-hasParent
a sh:PropertyShape ;
sh:class bdo:Person ;
sh:description "this Person may have at most two parents."@en ;
sh:inversePath bdo:hasChild ;
??
sh:maxCount 2 ;
sh:path bdo:hasParent ;
.
Then I thought that the validation process would check just that:
P705 rdf:type bdo:Person .
as well as validating the count constraint; and in the case of the shape:
Yes - there are two constraints: sh:class and sh:maxCount
bds:PersonShape-hasFather
a sh:PropertyShape ;
sh:description "this Person may have a father."@en ;
sh:inversePath bdo:hasChild ;
Is that supposed to be:
sh:path [ sh:inversePath bdo:hasChild ]
?
A property shape has a sh:path and that sh:path can be an inverse path.
sh:inversePath isn't used on the property shape itself.
sh:maxCount 1 ;
sh:node bds:MaleShape ;
sh:path bdo:hasFather ;
and now we have two sh:paths?
(If that is your shape, the sh:inversePath is going to be ignored as it is out
of place.)
.
will in addition check that:
P705 bdo:gender bdr:GenderMale .
and not check any other constraints on P705, such as its students or kinship
relations.
If P705 is reached with "sh:path bdo:hasFather"
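For comparison, a well-formed version of the two property shapes, each with exactly one sh:path (which can itself be an inverse path). This is a sketch that assumes the intended paths are bdo:hasParent and bdo:hasFather, that both shapes hang off bds:PersonShape via sh:property, and that bds:MaleShape is defined elsewhere; the bds: namespace IRI below is a placeholder.

```turtle
PREFIX sh:  <http://www.w3.org/ns/shacl#>
PREFIX bdo: <http://purl.bdrc.io/ontology/core/>
PREFIX bds: <http://example/>   # placeholder: use the bds: namespace from person.shapes.ttl

# Assumption: the property shapes are linked from the node shape like this.
bds:PersonShape
    sh:property bds:PersonShape-hasParent , bds:PersonShape-hasFather .

bds:PersonShape-hasParent
    a sh:PropertyShape ;
    sh:description "this Person may have at most two parents."@en ;
    sh:path bdo:hasParent ;
    # If parents are instead recorded as "parent bdo:hasChild person",
    # replace the line above with: sh:path [ sh:inversePath bdo:hasChild ] ;
    sh:class bdo:Person ;
    sh:maxCount 2 .

bds:PersonShape-hasFather
    a sh:PropertyShape ;
    sh:description "this Person may have a father."@en ;
    sh:path bdo:hasFather ;
    sh:maxCount 1 ;
    sh:node bds:MaleShape .   # assumed to be defined elsewhere in the shapes graph
```

Because bds:MaleShape is reached through sh:node on the bdo:hasFather values, only P705's conformance to bds:MaleShape is checked there; P705's other relations are not examined unless P705 is itself a focus node of some shape.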
The purpose being that when a user has “edited” an existing resource or
“created” a new resource then we just want to validate the changed or new
resource without having the validation process traverse all resources reachable
from P707 via arbitrary length paths, which is unnecessary.
Assuming the validator.validate(Shapes, Graph, Node) works along the lines I’ve
sketched, then since the shacl endpoint doesn’t use this method it would take
an extension to the endpoint or a new endpoint to accomplish what I’ve
described.
See the code.
validator.validate(Shapes, Graph, Node)
executes the shapes (any that apply) on the single focus node. It does check
the shapes to see which apply, so the target clause (including implicit targets) has
to be something that would include the node.
I’m happy to raise an issue and create a PR if that makes sense.
Great.
Andy
Thank you again very much,
Chris
On May 14, 2020, at 4:16 PM, Andy Seaborne <a...@apache.org> wrote:
On 14/05/2020 19:06, Chris Tomlinson wrote:
Hi Andy,
I want to validate a named graph in the context of the union graph. I don’t want to
validate the union graph. The union graph has information in it such as the ontology
which defines subClass and subProperty relations needed to successfully validate a
target graph such as http://purl.bdrc.io/graph/P707.
I don't understand "in the context of the union graph."
Isn't "context" in RDF just "merge the graphs"?
Validation is a process that operates on a shapes graph (which is parsed so
really it's just shapes - anything else in it is ignored) and a data graph.
There's no structure to the data graph - it is everything being validated.
I did suggest some SHACL extensions
https://afs.github.io/shacl-datasets.html
but they are hypothetical extensions.
In code, you could make a temporary union of two or more graphs to make a
single data graph.
"a named graph in the context of the union graph."
So the NG is in addition to the dataset graphs? Or is it in the dataset already?
In the SHACL service ?graph= is the data target and is taken from the dataset.
Also P707 refers to a parent and teacher, P705, which needs to be verified as
meeting the minimum criteria for a Person.
I thought that validate(shapes, graph, node)
/** Produce a full validation report for this node in the data. */
i.e. use node as the focus node (like sh:targetNode) and execute the shapes
only with that node.
But does P707 have one focus node or many?
should accomplish this if graph = the dataset graph which contains all these
additional bits of information.
That’s why the endpoint is interesting since it provides in principle access to
using shacl inside of Fuseki, where the entire dataset is available, without
having to write an independent bit of code that we add to our fuseki
deployments.
There is nothing special about the Fuseki endpoint - any Dataset has a union graph.
It's a way to call
ValidationReport report =
ShaclValidator.get().validate(shapesGraph, data);
on a remote data graph.
I hope this clarifies what I’m wanting to accomplish. I probably don’t
understand what validate(shapes, graph, node) is supposed to do.
Thanks for your patience,
Chris
On May 14, 2020, at 12:34 PM, Andy Seaborne <a...@apache.org> wrote:
?graph names the graph to be validated.
?graph can be a URI of a named graph in the dataset
or ?graph=default for the default graph (note: this is the storage default
graph, not the union default graph)
or ?graph=union for the union of all named graphs which is what I think you're
asking for.
(This is the org.apache.jena.fuseki.servlets.SHACL_Validation servlet.)
On 14/05/2020 15:40, Chris Tomlinson wrote:
Hi Andy,
Thanks very much for the shacl guidance. The use of sh:targetSubjectsOf is
quite helpful. I replaced bdo:personName with bdo:isRoot, which must be
present on any Entity resource so that if a Work or Place or other entity is
checked it will fail if it isn’t a bdo:Person.
This still doesn't catch the case where there is no bdo:isRoot at all, so that
negative case also needs to be caught somehow to weed out really malformed graphs.
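One way to catch that negative case is the always-executing sh:targetNode pattern (as in the ex:NotEmptyGraphShape example in the May 12 message), with a query that fires when no subject in the data graph has bdo:isRoot. A sketch: the shape name and the literal target are arbitrary, and the bds: namespace IRI is a placeholder.

```turtle
PREFIX sh:  <http://www.w3.org/ns/shacl#>
PREFIX bds: <http://example/>   # placeholder: use the bds: namespace from person.shapes.ttl

# The target is a fixed literal, so this shape runs regardless of the data,
# and it reports a violation if nothing in the data graph carries bdo:isRoot.
bds:HasRootShape
    a sh:NodeShape ;
    sh:targetNode "Has root check" ;
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:select """
            PREFIX bdo: <http://purl.bdrc.io/ontology/core/>
            SELECT $this
            WHERE {
              FILTER NOT EXISTS { ?s bdo:isRoot ?o }
            }
            """ ;
    ] .
```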
I still have a question about the shacl endpoint:
Is the ?graph parameter validated in the context of the entire dataset
specified in the endpoint URL or just the named graph itself?
It appears to be just the named graph itself so is the same as running the
shacl command outside of Fuseki.
Yes - as above, it can be the union.
We are wanting a validation of the named graph against the entire (union)
dataset graph
Not sure what "against" means here. There is a shapes graph in the validate
request and data graph, which can be the union graph of the dataset.
To direct the validation to a certain node, use sh:targetNode.
which will have sufficient information about subClassOf* and external resources
like P705 without entailing a validation of all nodes reachable from triples in
the ?graph named graph. This might be similar to:
validator.validate(shapes, dsg, node)
where node would be the root resource URI, like <http://purl.bdrc.io/resource/P707>.
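Following the suggestion to use sh:targetNode, here is a sketch of a shape aimed at the one resource, assuming the data graph is the union graph so the ontology's rdfs:subClassOf triples are visible. sh:class accepts a direct or rdfs:subClassOf* instance of bdo:Person. The sh:node line is an assumption about reusing bds:PersonShape; its own targets are ignored when it is reached via sh:node. The bds: namespace IRI is a placeholder.

```turtle
PREFIX sh:  <http://www.w3.org/ns/shacl#>
PREFIX bdo: <http://purl.bdrc.io/ontology/core/>
PREFIX bdr: <http://purl.bdrc.io/resource/>
PREFIX bds: <http://example/>   # placeholder: use the bds: namespace from person.shapes.ttl

bds:ValidateP707Shape
    a sh:NodeShape ;
    # Makes P707 the focus node of this shape; other shapes in the same
    # shapes graph still use their own targets.
    sh:targetNode bdr:P707 ;
    # P707 must be a bdo:Person, directly or via rdfs:subClassOf* in the data graph.
    sh:class bdo:Person ;
    # Also require P707 to conform to the Person constraints.
    sh:node bds:PersonShape .
```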
Is this something that needs an issue raised and a bit of extension of the
endpoint or is there another way to get this kind of behavior through the
endpoint?
Thank you very much for your help,
Chris
On May 13, 2020, at 12:16 PM, Andy Seaborne <a...@apache.org> wrote:
On 13/05/2020 16:03, Chris Tomlinson wrote:
Hi Andy,
Thank you for the reply. I can get your example to work as you indicate, but
have some questions:
1) I went through the latest SHACL draft
<https://w3c.github.io/data-shapes/shacl/> and I cannot find how to know that
sh:targetNode always executes. It’s also not clear to me what it means to execute. I
thought that sh:targetNode X was a way to restrict a shape to X in the data graph,
whatever X might be.
It sets the target node to X and that becomes $this.
It does not say the target has to be in the graph.
The tests use this idiom quite a lot.
This matters because in some places the spec is not complete and without some
light reverse engineering from the tests, I'd not have been able to implement
some of the SPARQL functionality (particularly SPARQL components, not the
SPARQL constraints we're talking about here).
Also, RDF graphs do not have a formally defined set of nodes - they are a set
of edges and any nodes you want can be used in triples.
2) What I’m trying to do is validate that a resource like
http://purl.bdrc.io/resource/P707 is a Person, which at a minimum means that:
<http://purl.bdrc.io/resource/P707> a <http://purl.bdrc.io/ontology/core/Person> .
is present in the graph http://purl.bdrc.io/graph/P707.
The PersonShape
<https://github.com/buda-base/editor-templates/blob/master/templates/core/person.shapes.ttl>
has:
sh:targetClass bdo:Person
but that only says that PersonShape applies to resources of class bdo:Person,
and if there are none, then there are no violations. This means I can try to validate a
bibliographic element such as http://purl.bdrc.io/resource/W1FPL1, which is of class
bdo:ImageInstance, and of course it still gets sh:conforms true since bds:PersonShape
doesn’t apply and hence there aren’t any violations. (To see the resources, use
http://ldspdi-dev.bdrc.io/resource/W1FPL1.ttl, for example.)
The use case is: a client submits a graph of a resource and claims it to be a bdo:Person
or a subClassOf* of it, and we want to validate the graph as a bdo:Person and so want to get
the result "false" for bdr:W1FPL1 instead of "true".
It’s our intent to use a tool like shacl for this top-level task as well as for
validating details like having at least one name, a gender, and so on.
I tried using something like your example:
bds:CheckPersonClassShape a sh:NodeShape ;
rdfs:label "Check Person Class Shape"@en ;
sh:targetNode "Check Class" ;
sh:sparql [
a sh:SPARQLConstraint ;
sh:prefixes [
sh:declare [
sh:prefix "rdf" ;
sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ;
] , [
sh:prefix "bdo" ;
sh:namespace "http://purl.bdrc.io/ontology/core/" ;
]
] ;
sh:select """
select $this (rdf:type as ?path) (bdo:Person as ?value)
where {
filter not exists { $this ?path ?value }
}
""" ;
That query does not look right.
1/ $this is the targetNode
$this is "Check Class" - the shape needs to find the thing that is the person
amongst the several subjects in the data. That can be in the SPARQL or as a target of
some kind.
Either set the target to be bdr:P707
or find a signature such as a bdo:personName triple: "sh:targetSubjectsOf
bdo:personName"
or write some pattern in the SPARQL query.
You may want some "whole graph" validation such as not completely empty or has at least
some relevant vocabulary to ensure that the data is not so off that nothing will trigger. That's
where the sh:targetNode "foobar" trick comes in.
2/ It's looking for any triple with $this as subject, not "a bdo:Person"
The SELECT-AS happens after the WHERE.
FILTER NOT EXISTS does not set ?path ?value so if they are unset there are free
variables.
filter not exists { $this ?P ?O }
would be just the same and matches any triple with $this as subject.
You want to set ?value and ?path before the FILTER:
BIND (bdo:Person as ?value)
BIND (rdf:type as ?path)
or write directly and not worry about ?path and ?value.
filter not exists { $this rdf:type bdo:Person }
(
The message processing from SPARQL constraints and components doesn't do
templating.
)
] ;
.
But this just always reports a violation that the literal, “Check Class”,
doesn’t conform, which is true since it isn’t in the data graph.
bds:CheckPersonClassShape a sh:NodeShape ;
rdfs:label "Check Person Class Shape"@en ;
## sh:targetNode bdr:P707 ;
sh:targetSubjectsOf bdo:personName ;
sh:sparql [
a sh:SPARQLConstraint ;
sh:prefixes [
sh:declare [
sh:prefix "rdf" ;
sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ;
] , [
sh:prefix "bdo" ;
sh:namespace "http://purl.bdrc.io/ontology/core/" ;
]
] ;
sh:select """
select $this
where {
filter not exists { $this rdf:type bdo:Person }
}
""" ;
] ;
.
shacl validate -v -s shapes.ttl -d P707.ttl
shows the validation when "a bdo:Person ;" commented out of the data:
NodeShape[http://example/CheckPersonClassShape]
N: FocusNodes(1): [http://purl.bdrc.io/resource/P707]
F: http://purl.bdrc.io/resource/P707
S: NodeShape[http://example/CheckPersonClassShape]
C: SPARQL[PREFIX bdo: <http://purl.bdrc.io/ontology/core/> PREFIX rdf:
<http://www.w3.org/1999/02/22-rdf-syntax-ns#> SELECT ?this WHERE { FILTER NOT
EXISTS { ?this rdf:type bdo:Person } }]
... prefixes ...
[ a sh:ValidationReport ;
sh:conforms false ;
sh:result [ a sh:ValidationResult ;
sh:focusNode bdr:P707 ;
sh:resultMessage "SPARQL SELECT constraint for
<http://purl.bdrc.io/resource/P707> returns <http://purl.bdrc.io/resource/P707>" ;
sh:resultSeverity sh:Violation ;
sh:sourceConstraintComponent sh:SPARQLConstraintComponent ;
sh:sourceShape bds:CheckPersonClassShape ;
sh:value bdr:P707
]
] .
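For the ?path/?value variant suggested above, the BINDs give ?path and ?value values that the NOT EXISTS pattern can then use, and the projected ?path and ?value can be reported as sh:resultPath and sh:value. A sketch, with the prefixes written into the query string instead of sh:prefixes; bds: is set to http://example/ only to match the test run above.

```turtle
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sh:   <http://www.w3.org/ns/shacl#>
PREFIX bdo:  <http://purl.bdrc.io/ontology/core/>
PREFIX bds:  <http://example/>   # placeholder, as in the test run above

bds:CheckPersonClassShape
    a sh:NodeShape ;
    rdfs:label "Check Person Class Shape"@en ;
    sh:targetSubjectsOf bdo:personName ;
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:select """
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            PREFIX bdo: <http://purl.bdrc.io/ontology/core/>
            SELECT $this ?path ?value
            WHERE {
              # FILTER NOT EXISTS cannot bind variables, so bind them explicitly.
              BIND (rdf:type AS ?path)
              BIND (bdo:Person AS ?value)
              FILTER NOT EXISTS { $this ?path ?value }
            }
            """ ;
    ] .
```

This matches rdf:type bdo:Person exactly, so an instance of a subclass would still be reported. Writing `$this rdf:type/rdfs:subClassOf* ?value` inside the NOT EXISTS (with an rdfs: prefix declared in the query) would accept subclasses, at the cost of ?path no longer describing the whole path.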
3) The original reason for wanting to use the shacl endpoint was so that we
could PUT the submitted graph in the Fuseki dataset and then use the endpoint
to validate the resource bdr:P707 (or bdr:W1FPL1) as a Person (or not) with the
rest of the dataset graph available to handle things like subClassOf* and
subPropertyOf* for various items as well as validating the minimum of resources
referenced by P707 such as that P705 is a male person and hence can be a father
of P707.
That sounds like
sh:targetNode bdr:P707
and also some shapes to check "is there anything relevant at all".
Andy
The graph for P707 that is submitted would only have references to P705, with
no properties on P705, since that resource is in its own graph.
I thought this is pretty much how validate(Shapes, Graph, Node) would work,
where Graph would be the union dataset graph.
I’m evidently missing some understanding.
I appreciate your patience,
Chris
On May 12, 2020, at 3:52 AM, Andy Seaborne <a...@apache.org> wrote:
Chris,
Here's a shape that always executes and tests for an empty data graph.
# No violation
shacl validate -v -shapes ex-shapes.ttl -data not-empty.ttl
# Violation
shacl validate -v -shapes ex-shapes.ttl -data empty.nt
"sh:targetNode" always executes.
With this pattern, the SPARQL query can do arbitrary checks.
Andy
## ex-shapes.ttl
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://example/>
ex:NotEmptyGraphShape
rdf:type sh:NodeShape ;
sh:targetNode "Empty Graph" ;
sh:sparql [
a sh:SPARQLConstraint ;
sh:select """
SELECT $this ?value
WHERE {
FILTER NOT EXISTS { ?s ?p ?o }
}
""" ;
] .
On 11/05/2020 17:14, Chris Tomlinson wrote:
I appreciate that it works that way but until and unless I can understand your
point about
[] sh:targetNode ex:myNode
then I don’t know how to distinguish: 1) no violations because a Person graph
conforms to the PersonShapes - like there’s no Work indicated as a parent of
the person or a rdfs:label is used where a skos:prefLabel is expected; versus
2) no violations because the question is vacuous like asking if a Work looks
like a person or an empty non-existent graph looks like a person.