On 16/10/2020 17:28, Martynas Jusevičius wrote:
Hi,
we're moving more and more RDF I/O to streaming implementations, and
that doesn't work well with another important feature -- constraint
validation.
I'm wondering if it is possible, even in theory, to validate streaming
RDF data, e.g. with SPARQL/SPIN/SHACL? My intuition says no, or maybe
with some severe restrictions.
SHACL-SPARQL constraints: they can access anything.
SHACL core constraints:
And assuming a SHACL evaluator written specially for the streaming case:
To add to Holger's reply: there are also the target and the sh:path to
consider.
If the sh:path is not simple (a single property), that makes it harder.
Think about some examples:
So, how would you handle something like this:
---
shape ex:shape1 -> ns:C {
ns:p [0..1] datatype=xsd:integer .
}
---
ex:shape1 a sh:NodeShape ;
    sh:targetClass ns:C ;
    sh:property [
        sh:path ns:p ;
        sh:maxCount 1 ;
        sh:datatype xsd:integer
    ] .
---
The target is "class", there is a simple path, there is a single-node
constraint, and there is a cardinality constraint.
You need to know the class of the subject and the path at the same time.
If you know that ns:p has rdfs:domain ns:C, then one triple is all you
need for sh:datatype. But the domain may not be that simple.
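A minimal sketch of that domain shortcut, in Python. It assumes triples arrive as (subject, predicate, object) tuples with IRIs as plain strings; the `Literal` class and the example.org IRI standing in for ns:p are hypothetical. With rdfs:domain known, the sh:datatype verdict comes from one triple, but sh:maxCount 1 still needs per-subject memory:

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Set, Tuple, Union

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
NS_P = "http://example.org/ns#p"  # hypothetical IRI standing in for ns:p


@dataclass(frozen=True)
class Literal:  # hypothetical minimal literal term
    lexical: str
    datatype: str


Term = Union[str, Literal]  # IRIs are plain strings in this sketch
Triple = Tuple[str, str, Term]
_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_xsd_integer(o: Term) -> bool:
    # sh:datatype: a literal of that datatype with a well-formed lexical form.
    return (isinstance(o, Literal) and o.datatype == XSD_INTEGER
            and _INTEGER.fullmatch(o.lexical) is not None)


def validate_with_domain(stream: Iterable[Triple]) -> Iterator[Tuple[str, str]]:
    """Yield (focus node, constraint) violations of ex:shape1 as they appear.

    Assumes ns:p rdfs:domain ns:C, so every subject of ns:p is a target
    without waiting for its rdf:type triple.
    """
    values: Dict[str, Set[Term]] = {}  # only state: distinct ns:p values per subject
    for s, p, o in stream:
        if p != NS_P:
            continue
        seen = values.setdefault(s, set())
        if o in seen:  # a repeated triple adds nothing new
            continue
        seen.add(o)
        if not is_xsd_integer(o):  # decided from this one triple
            yield s, "sh:datatype xsd:integer"
        if len(seen) == 2:  # sh:maxCount 1 needs memory; report once
            yield s, "sh:maxCount 1"
```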
If the stream is going into an existing graph, then the rdf:type may be
in the destination.
If you don't know the domain sufficiently, you need to confirm the class.
You don't know whether the rdf:type triple or the sh:path triple will
come first, so the processor has to carry state.
Maybe you can run the validation tentatively and confirm later (think:
CPU speculative execution - in parallel as well).
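A minimal sketch of the stateful, order-independent case for ex:shape1, in Python. Rather than speculating, it defers the verdict: ns:p values for a subject whose class is not yet known are held and only judged once rdf:type ns:C arrives. The tuple representation, the `Literal` class, the example.org IRIs for ns:C and ns:p, and the `already_typed` parameter (standing in for rdf:type triples already in the destination graph) are all assumptions. It ignores rdfs:subClassOf, which a real sh:targetClass would follow, and its memory grows with the number of subjects seen:

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Iterable, List, Set, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
NS_C = "http://example.org/ns#C"  # hypothetical IRIs for ns:C and ns:p
NS_P = "http://example.org/ns#p"


@dataclass(frozen=True)
class Literal:  # hypothetical minimal literal term
    lexical: str
    datatype: str


Term = Union[str, Literal]
Triple = Tuple[str, str, Term]
Violation = Tuple[str, str]
_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_xsd_integer(o: Term) -> bool:
    return (isinstance(o, Literal) and o.datatype == XSD_INTEGER
            and _INTEGER.fullmatch(o.lexical) is not None)


class Shape1StreamValidator:
    """Order-independent streaming check of ex:shape1 (no rdfs:subClassOf)."""

    def __init__(self, already_typed: Iterable[str] = ()) -> None:
        # Instances of ns:C already in the destination graph, if any.
        self.typed: Set[str] = set(already_typed)
        # ns:p values held per subject until it is known to be a target.
        self.values: DefaultDict[str, Set[Term]] = defaultdict(set)
        self.reported: Set[Violation] = set()

    def push(self, triple: Triple) -> List[Violation]:
        s, p, o = triple
        if p == RDF_TYPE and o == NS_C:
            self.typed.add(s)  # confirms any held values for s
        elif p == NS_P:
            self.values[s].add(o)
        else:
            return []
        if s not in self.typed:
            return []  # not yet known to be a target: keep state, decide later
        vals = self.values.get(s, set())
        found = []
        if any(not is_xsd_integer(v) for v in vals):
            found.append((s, "sh:datatype xsd:integer"))
        if len(vals) > 1:
            found.append((s, "sh:maxCount 1"))
        new = [v for v in found if v not in self.reported]
        self.reported.update(new)
        return new
```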
If the stream is known to be batched into exclusive same-subject blocks
(which is NOT what BatchedStreamRDF does), then it is quite doable. It is
a windowing function, and it means preprocessing an arbitrary stream into
buckets.
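A minimal sketch of that windowing case, in Python, assuming every subject's triples form exactly one contiguous block. The tuple representation, the `Literal` class and the example.org IRIs are hypothetical. `itertools.groupby` groups consecutive triples with the same subject, so no state survives from one window to the next:

```python
import re
from dataclasses import dataclass
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
NS_C = "http://example.org/ns#C"  # hypothetical IRIs for ns:C and ns:p
NS_P = "http://example.org/ns#p"


@dataclass(frozen=True)
class Literal:  # hypothetical minimal literal term
    lexical: str
    datatype: str


Term = Union[str, Literal]
Triple = Tuple[str, str, Term]
_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_xsd_integer(o: Term) -> bool:
    return (isinstance(o, Literal) and o.datatype == XSD_INTEGER
            and _INTEGER.fullmatch(o.lexical) is not None)


def validate_blocks(stream: Iterable[Triple]) -> Iterator[Tuple[str, str]]:
    """Validate ex:shape1 one same-subject window at a time.

    Correct only if each subject's triples form ONE contiguous block;
    groupby would silently split a subject that reappears later.
    """
    for s, block in groupby(stream, key=itemgetter(0)):
        triples = list(block)  # memory bounded by the block, not the stream
        if not any(p == RDF_TYPE and o == NS_C for _, p, o in triples):
            continue  # not a target of ex:shape1
        vals = {o for _, p, o in triples if p == NS_P}
        if any(not is_xsd_integer(v) for v in vals):
            yield s, "sh:datatype xsd:integer"
        if len(vals) > 1:
            yield s, "sh:maxCount 1"
```

Within a block the rdf:type and ns:p triples can come in any order; the ordering guarantee has just been pushed up to whoever produces the blocks.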
There again, some things are easy:
---
shape ex:shape2 {
sh:targetObjectsOf ns:p .
datatype=xsd:integer .
}
---
You only need the "? ns:p ?" triple for that.
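A minimal sketch of ex:shape2 as a pure per-triple filter, in Python. The tuple representation, the `Literal` class and the example.org IRI for ns:p are hypothetical. The object of each ns:p triple is the focus node, so there is no state at all:

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple, Union

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
NS_P = "http://example.org/ns#p"  # hypothetical IRI standing in for ns:p


@dataclass(frozen=True)
class Literal:  # hypothetical minimal literal term
    lexical: str
    datatype: str


Term = Union[str, Literal]
Triple = Tuple[str, str, Term]
_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_xsd_integer(o: Term) -> bool:
    return (isinstance(o, Literal) and o.datatype == XSD_INTEGER
            and _INTEGER.fullmatch(o.lexical) is not None)


def validate_shape2(stream: Iterable[Triple]) -> Iterator[Tuple[Term, str]]:
    """sh:targetObjectsOf ns:p with sh:datatype xsd:integer: constant memory."""
    for _s, p, o in stream:
        if p == NS_P and not is_xsd_integer(o):
            yield o, "sh:datatype xsd:integer"  # the object is the focus node
```

A node that is the object of several ns:p triples is reported once per triple; deduplicating would reintroduce state.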
====
What looks more practical is to track changes to targets and focus nodes
then use that information to run only the necessary validations at the
end of a transaction, with abort of the transaction on violation.
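A minimal sketch of that transactional approach for ex:shape1, in Python. The graph is a plain set of triples, and the `Literal` class, `ValidatingTransaction`, `check_shape1` and the example.org IRIs are all hypothetical. Only additions are tracked (deletions would matter for constraints such as sh:minCount), and `check_shape1` scans the whole graph where a real store would use an index:

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
NS_C = "http://example.org/ns#C"  # hypothetical IRIs for ns:C and ns:p
NS_P = "http://example.org/ns#p"


@dataclass(frozen=True)
class Literal:  # hypothetical minimal literal term
    lexical: str
    datatype: str


Term = Union[str, Literal]
Triple = Tuple[str, str, Term]
Violation = Tuple[str, str]
_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_xsd_integer(o: Term) -> bool:
    return (isinstance(o, Literal) and o.datatype == XSD_INTEGER
            and _INTEGER.fullmatch(o.lexical) is not None)


class ValidationError(Exception):
    def __init__(self, violations: List[Violation]) -> None:
        super().__init__(violations)
        self.violations = violations


def check_shape1(graph: Set[Triple], s: str) -> List[Violation]:
    """Validate one focus node against the whole post-transaction graph."""
    if (s, RDF_TYPE, NS_C) not in graph:
        return []
    vals = {o for s2, p, o in graph if s2 == s and p == NS_P}  # full scan
    out: List[Violation] = []
    if any(not is_xsd_integer(v) for v in vals):
        out.append((s, "sh:datatype xsd:integer"))
    if len(vals) > 1:
        out.append((s, "sh:maxCount 1"))
    return out


class ValidatingTransaction:
    """Buffer additions, track touched focus nodes, validate them at commit."""

    def __init__(self, graph: Set[Triple]) -> None:
        self.graph = graph
        self.added: Set[Triple] = set()
        self.dirty: Set[str] = set()

    def add(self, t: Triple) -> None:
        self.added.add(t)
        s, p, o = t
        if p == NS_P or (p == RDF_TYPE and o == NS_C):
            self.dirty.add(s)  # s may have become a target or gained a value

    def commit(self) -> None:
        merged = self.graph | self.added
        violations = [v for s in sorted(self.dirty) for v in check_shape1(merged, s)]
        pending, self.added, self.dirty = self.added, set(), set()
        if violations:
            raise ValidationError(violations)  # abort: self.graph is untouched
        self.graph.update(pending)
```

The cost at commit is proportional to the number of touched focus nodes rather than to the size of the graph (given an index), which is what makes the end-of-transaction check practical.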
Andy
The XSLT 3.0 spec includes some advanced streamability analysis that
is somewhat related: https://www.w3.org/TR/xslt-30/#streamability
But then again, XML trees are not RDF graphs...
Martynas