Re: statement ids, rdf* and reification

Andy Seaborne Tue, 08 Aug 2017 03:38:19 -0700


On 07/08/17 19:35, Chris Tomlinson wrote:

Hello,

We're investigating various approaches to adding annotations about
individual statements (or perhaps rarely a subset of statements) of a
named graph.

There’s a note from 2015, Re: Performance Cost of Reification
<http://apache.markmail.org/message/js6s6ry5st73soay>, that mentions a
syntax like:

     <<A sends email to B>>,

that was proposed for use in Sparql 1.0 and that at the time of the note was
still in the ARQ parser source.

The <<>> syntax, as in ARQ and as discussed for SPARQL 1.0, is shorthand for writing out reification, not an extension to the data model or the semantics.


<<s p o>> is syntax for

? rdf:subject s
? rdf:predicate p
? rdf:object o

i.e. not a triple id.

Data and/or query can be written long hand.
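
As a rough illustration of "shorthand, not a triple id", here is a minimal Python sketch of the expansion. It assumes the standard RDF reification vocabulary (rdf:subject, rdf:predicate, rdf:object); the function name `reify` and the blank node labels are made up for the example, and it is not ARQ's parser code.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_fresh = itertools.count()  # hypothetical blank node label generator


def reify(s, p, o):
    """Write out one <<s p o>> occurrence long hand as three triples."""
    # Each occurrence gets its own blank node: the node stands for one
    # act of reifying the statement, it is not an identifier of the triple.
    node = f"_:r{next(_fresh)}"
    return [
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    ]


first = reify("ex:A", "ex:sendsEmailTo", "ex:B")
second = reify("ex:A", "ex:sendsEmailTo", "ex:B")
```

Reifying the same statement twice gives two different blank nodes, and the base triple (ex:A ex:sendsEmailTo ex:B) is not in either result.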

The syntax is similar to that of the Blazegraph RDF*/Sparql* <https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right> and we’re interested to know if these are related ideas and if there is any anticipation that such an approach might ever find its way into appropriate standards.

RDF* seems to be based around the assumption that a statement is reified only once and that the base fact is asserted in the graph. That means triple ids make sense.


Note the example:

BIND( <<?bob foaf:age ?age>> AS ?t ) .

which matches the graph for ?bob foaf:age ?age and has to match once to make sense (it's a BIND).

Reification can be done multiple times (in different files, with different annotations, to be merged), and you can reify a statement without needing it in the data (it need not be asserted).

This is why RDF* is compatible with reification but reification is not compatible with RDF*: RDF* is a subset of the reification possibilities - maybe it's a useful subset - different discussion.

Storage systems look like they are much easier for RDF* - it looks to be an extra column on the triples/quads table.
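
To make the "extra column" remark concrete, here is a small sketch using Python's sqlite3. The table layout, the `triple:` prefix for referring to a row, and the helper names are all assumptions for illustration; this is not how Blazegraph or Jena actually store data.

```python
import sqlite3


def build_store():
    """In-memory triple table with one extra column: the triple id."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE triples ("
        " id INTEGER PRIMARY KEY,"
        " s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
        " UNIQUE (s, p, o))"
    )
    return db


def add_triple(db, s, p, o):
    """Assert a triple (once) and return its id."""
    db.execute(
        "INSERT OR IGNORE INTO triples (s, p, o) VALUES (?, ?, ?)", (s, p, o)
    )
    row = db.execute(
        "SELECT id FROM triples WHERE s = ? AND p = ? AND o = ?", (s, p, o)
    ).fetchone()
    return row[0]


db = build_store()
tid = add_triple(db, "ex:bob", "foaf:age", "23")
# An annotation is just another triple whose subject names the first row.
add_triple(db, f"triple:{tid}", "ex:source", "ex:survey")
```

The UNIQUE constraint is where the "reified only once, and asserted" assumption shows up: a triple has exactly one row and therefore exactly one id.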


Reification has nasty cases like partial reification (e.g. just
"? rdf:subject s . ? rdf:predicate p" triples).

But it is at the modelling level, not a data model extension. Reification is the ability to talk about making a claim, not the statement itself. It's not adding triples to the domain of discourse; it is not working on the data model level.

Other approaches extend the data model, such as N3 formulae ("graphs as nodes of the graph").

Named graphs are weaker - you can't have a graph in a graph - but they were the approach in most common use at the time of SPARQL 1.0.

It seems that a Jena property function extension could do some of the work of 
statement ids but it would be desirable to have serialization support as well.

The 2015 note indicates that reification "is a minor feature of RDF"



The full quote is:

[[

About reification, they [Property Graph claims] are somewhat off-track. Reification is a quite specialised feature for limited use. It is not RDF's equivalent to attributes on links in PG.

]]

Attributes on links are much closer to an n-ary relationship in RDF terms IMO. See the "A sends email to B" anti-pattern discussion in the property graphs book. On that basis, I contend that reification of one statement is quite specialised compared to n-ary relations.

Indeed, I think that the unit is wrong - assertions come as a number of statements e.g. all the FOAF details of someone. Reifying each statement then requires having to associate the statements together again - you need grouping structures.

and yet wanting to track updates,


You may be interested in

https://afs.github.io/rdf-delta/

which captures updates and gives each update an id that RDF statements can then refer to. It makes updates first class web resources.

make claims and counter-claims about particular statements, and so on is not for us a minor use-case.

If claims and counter-claims are in the same graph, then the statement itself must not be in the graph, else it's true.

Named graphs mean a triple is true in that graph but not in another. So you can make statements about that named graph. Named graphs of one triple are useful and less overhead than full reification. I don't know of any work comparing RDF* and NGs of one triple.

The 2015 note illustrates using event modeling to provide a natural way of
capturing some annotations but it does not seem to be uniformly applicable.
We have many n-ary situations in our current ontology that work well to
provide essentially blank nodes where annotation statements can be added to
further describe provenance or other annotations.
However, there are plenty of situations of the form:

     subject property literal

which provide no natural place to add an annotation explaining why that 
assertion has been made or indicating that the assertion is considered in error 
and so on.

If it is an error, you have to ensure that the statement itself is not in the data, only the reification.

Using named graphs, then creating a union graph means you can have both views - keep the information separate so various assertions can be made, and choose, for the purpose of a single query, to treat all triples as valid claims.
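
A toy Python sketch of the two views, with quads held as plain tuples. The graph names, the ex:claimedBy / ex:disputedBy properties and the helper functions are invented for the example; in a real store the union would be a dataset or query setting rather than code like this.

```python
# (graph, subject, predicate, object): each claim sits in its own named graph,
# and ex:meta holds statements about those graphs.
QUADS = {
    ("ex:g1", "ex:bob", "foaf:age", "23"),
    ("ex:g2", "ex:bob", "foaf:age", "24"),
    ("ex:meta", "ex:g1", "ex:claimedBy", "ex:alice"),
    ("ex:meta", "ex:g2", "ex:claimedBy", "ex:carol"),
    ("ex:meta", "ex:g2", "ex:disputedBy", "ex:alice"),
}


def graph(quads, name):
    """Separate view: only the triples asserted in one named graph."""
    return {(s, p, o) for (g, s, p, o) in quads if g == name}


def union_graph(quads, names):
    """Union view: treat every triple in the chosen graphs as a valid claim."""
    return {(s, p, o) for (g, s, p, o) in quads if g in names}


claims = union_graph(QUADS, {"ex:g1", "ex:g2"})
```

Here `graph(QUADS, "ex:g1")` keeps the two ages apart, while `claims` sees both at once; the disputed claim stays out of ex:g1 and can still be talked about through ex:meta.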


Further similar cases arise of the form:

     subject property object-uri

that are similarly not amenable to providing natural places to add annotation 
statements.

The idea of RDF*/Sparql* seems appealing as a uniform approach to mentioning a 
statement when there is need to decorate the statement with some annotations.

On the other hand, we have entertained the idea that every basic property could 
be modeled as a potentially n-ary case which most of the time would just have a 
single statement (ignoring an implied rdf:type statement). For (a contrived) 
example,

ex:W123 a :Work ;
     :hasLCCN [ :value 741297845 ] .

rather than

ex:W123 a :Work ;
     :hasLCCN 741297845 .

The former has a blank node that would readily permit adding an annotation:

ex:W123 a :Work ;
     :hasLCCN [ :value 741297845 ;
         :retrievedFrom <http://libraryofcongress.gov> ;
         :retrievedOn "12/27/1997" ] .

This seems to be event-based modelling, which is a useful way to capture provenance. By having the explicit event, you can talk about the event.


"A sends email to B" is an event, not a simple link from A to B.

"ex:W123 :hasLCCN has the value 741297845" is an event as well.

Now, excessive n-ary relationships can be messy to work with (true in PG and RDF). But if you want every detail recorded ... then you'll get very fine grained data modelling.


Anyway, the question is really about the status of the RDF* idea and any support
latent or pending in Jena.

If someone wants to work on that, then I'm sure the project will look at any contributions. There is no use of the << >> syntax (in the parser it is "#if 0"'ed out) so it could be repurposed.

Adding N3-formulae is also doable for in-memory - add a new Node subclass to have a Node_Graph. It's beyond RDF, so the consequences of seeing that through the whole system might be quite extensive.


    Andy


Thanks,
Chris
