On 16/10/2020 17:28, Martynas Jusevičius wrote:
Hi,

we're moving more and more RDF I/O to streaming implementations, and
that doesn't work well with another important feature -- constraint
validation.

I'm wondering if it is possible, even in theory, to validate streaming
RDF data, e.g. with SPARQL/SPIN/SHACL? My intuition says no, or maybe
with some severe restrictions.

SHACL SPARQL constraints - they can access anything.

SHACL core constraints:
And assuming a SHACL evaluator written specially for the streaming case:

To add to Holger's reply - there are also the target and sh:path to consider.

If the sh:path is not simple (one property), then that makes it harder.

Think about some examples:

So - how to do something like this:

---
shape ex:shape1 -> ns:C {
   ns:p [0..1] datatype=xsd:integer .
}
---
ex:shape1  a            sh:NodeShape ;
        sh:targetClass  ns:C ;
        sh:property     [
                          sh:path      ns:p ;
                          sh:maxCount  1 ;
                          sh:datatype  xsd:integer
                        ] .
---

The target is "class", there is a simple path, there is a single-node constraint, and there is a cardinality constraint.

You need to know the class of the subject and the path at the same time.

If you know that ns:p has rdfs:domain ns:C then one triple is all you need for sh:datatype. But the domain may not be that simple.

If the stream is going into an existing graph, then the rdf:type may be in the destination.

If you don't know the domain sufficiently, you need to confirm the class.
You don't know whether the rdf:type triple or the sh:path triple will arrive first, so the processor has to carry state.

Maybe you can run the validation tentatively and confirm later (think: CPU speculative execution - in parallel as well).
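
A minimal sketch of that "check tentatively, confirm later" idea for ex:shape1, in Python rather than Jena code. Everything here is an assumption for illustration: triples are plain (s, p, o) tuples, a literal is a (lexical, datatype) tuple, and the class name Shape1StreamValidator is hypothetical. It ignores rdfs:subClassOf, does not check that the lexical form is well-formed, and its state grows with the number of subjects seen.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
XSD_INTEGER = "xsd:integer"


class Shape1StreamValidator:
    """Hypothetical streaming check for ex:shape1 (target class ns:C, path ns:p)."""

    def __init__(self, target_class="ns:C", path="ns:p"):
        self.target_class = target_class
        self.path = path
        self.confirmed = set()          # subjects seen with rdf:type target_class
        self.values = defaultdict(set)  # subject -> distinct objects of the path
        self.tentative_bad = set()      # subjects with a non-integer path value

    def triple(self, s, p, o):
        # Called once per triple, in whatever order the stream delivers them.
        if p == RDF_TYPE and o == self.target_class:
            self.confirmed.add(s)
        elif p == self.path:
            self.values[s].add(o)
            # Tentative datatype check: the class of s may not be known yet.
            if not (isinstance(o, tuple) and o[1] == XSD_INTEGER):
                self.tentative_bad.add(s)

    def finish(self):
        # Only tentative results for confirmed members of the class are reported.
        violations = []
        for s in sorted(self.confirmed):
            if len(self.values.get(s, ())) > 1:
                violations.append((s, "sh:maxCount"))
            if s in self.tentative_bad:
                violations.append((s, "sh:datatype"))
        return violations
```

The rdf:type triple can arrive before or after the ns:p triples; subjects that are never confirmed as ns:C are silently dropped at finish().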

If the stream is known to be batched into exclusive same-subject blocks, which is NOT what BatchedStreamRDF does, then it is quite doable. It is a windowing function, and it amounts to preprocessing an arbitrary stream into buckets.
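
As a rough illustration of the windowing case, the sketch below assumes the stream really does arrive in exclusive same-subject blocks (each subject appears in exactly one contiguous run). It uses the same illustrative encoding as an assumption: (s, p, o) tuples, literals as (lexical, datatype) tuples. The function name validate_blocks is hypothetical, and it does not detect a stream that breaks the blocking assumption.

```python
from itertools import groupby
from operator import itemgetter

RDF_TYPE = "rdf:type"
XSD_INTEGER = "xsd:integer"


def validate_blocks(triples, target_class="ns:C", path="ns:p"):
    """Validate ex:shape1 one same-subject block at a time.

    Only the current block is held in memory, so state is bounded by the
    largest block rather than by the whole stream.
    """
    for subject, block in groupby(triples, key=itemgetter(0)):
        block = list(block)
        is_target = any(p == RDF_TYPE and o == target_class for _, p, o in block)
        if not is_target:
            continue  # not a focus node for this shape
        values = {o for _, p, o in block if p == path}
        if len(values) > 1:
            yield (subject, "sh:maxCount")
        for o in values:
            if not (isinstance(o, tuple) and o[1] == XSD_INTEGER):
                yield (subject, "sh:datatype")
```

Within a block the order of rdf:type and ns:p no longer matters, which is what removes the need for cross-subject state.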

There again, some things are easy:
---
shape ex:shape2 {
  sh:targetObjectsOf ns:p .
  datatype=xsd:integer .
}
---
Only need the "? ns:p ?" triple for that.
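
A sketch of why this case is easy: each "? ns:p ?" triple can be judged on its own, so the check needs no state at all. The encoding is an assumption for illustration ((s, p, o) tuples, literals as (lexical, datatype) tuples), check_objects_of is a hypothetical name, and only the datatype IRI is compared, not the lexical form.

```python
def check_objects_of(triples, prop="ns:p", datatype="xsd:integer"):
    """Yield every triple whose predicate is `prop` and whose object is not
    a literal with the expected datatype. No state is kept between triples."""
    for s, p, o in triples:
        if p == prop and not (isinstance(o, tuple) and o[1] == datatype):
            yield (s, p, o)
```

Because it is a generator over the incoming triples, it can sit directly in the stream pipeline and report a violation as soon as the offending triple is seen.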

====

What looks more practical is to track changes to targets and focus nodes, then use that information to run only the necessary validations at the end of a transaction, aborting the transaction on violation.
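
A toy sketch of that approach, under loud assumptions: the graph is an in-memory Python set of (s, p, o) tuples, literals are (lexical, datatype) tuples, only additions are handled, only ex:shape1 is checked, and Transaction, ValidationError and validate_node are hypothetical names, not Jena APIs. A real store would use indexes instead of scanning the graph.

```python
RDF_TYPE = "rdf:type"
TARGET_CLASS = "ns:C"
PATH = "ns:p"
XSD_INTEGER = "xsd:integer"


class ValidationError(Exception):
    pass


def validate_node(triples, node):
    # Check ex:shape1 for one focus node against the given set of triples.
    types = {o for s, p, o in triples if s == node and p == RDF_TYPE}
    if TARGET_CLASS not in types:
        return []
    values = {o for s, p, o in triples if s == node and p == PATH}
    out = []
    if len(values) > 1:
        out.append((node, "sh:maxCount"))
    out += [(node, "sh:datatype") for o in values
            if not (isinstance(o, tuple) and o[1] == XSD_INTEGER)]
    return out


class Transaction:
    def __init__(self, graph):
        self.graph = graph    # committed triples (a set, updated in place)
        self.pending = set()  # triples added in this transaction
        self.touched = set()  # subjects whose target or path data changed

    def add(self, s, p, o):
        self.pending.add((s, p, o))
        if p in (RDF_TYPE, PATH):
            self.touched.add(s)

    def commit(self):
        # Validate only the touched nodes, against committed + pending data.
        candidate = self.graph | self.pending
        violations = [v for s in sorted(self.touched)
                      for v in validate_node(candidate, s)]
        if violations:
            self.pending.clear()  # abort: nothing reaches the graph
            self.touched.clear()
            raise ValidationError(violations)
        self.graph |= self.pending
        self.pending.clear()
        self.touched.clear()
```

Validating against committed plus pending data also covers the case where the rdf:type triple is already in the destination graph and only the ns:p triple arrives in the stream.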

    Andy

The XSLT 3.0 spec includes some advanced streamability analysis that
is somewhat related: https://www.w3.org/TR/xslt-30/#streamability
But then again, XML trees are not RDF graphs...


Martynas
