On 07/08/17 19:35, Chris Tomlinson wrote:
Hello,

We're investigating various approaches to adding annotations about
individual statements (or perhaps rarely a subset of statements) of a
named graph.

There’s a note from 2015, Re: Performance Cost of Reification
<http://apache.markmail.org/message/js6s6ry5st73soay>, that mentions a
syntax like:

     <<A sends email to B>>,

that was proposed for use in SPARQL 1.0 and that at the time of the note was
still in the ARQ parser source.

The <<>> syntax, as in ARQ and as discussed for SPARQL 1.0, is shorthand for writing out reification. It is not an extension to the data model or to the semantics.

<<s p o>> is syntax for

? rdf:subject s
? rdf:predicate p
? rdf:object o

i.e. not a triple id.

Data and queries can both be written out longhand.
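To make the expansion concrete, here is a minimal Python sketch. It is not ARQ's parser code: triples are modelled as plain tuples, and the function name `expand_reification` and the `_:b` blank-node labels are assumptions made for illustration.

```python
from itertools import count

# Counter used to mint fresh blank-node labels (illustrative only).
_bnode_ids = count()

def expand_reification(s, p, o):
    """Return the longhand reification triples that <<s p o>> stands for.

    A fresh blank node is minted on every call, so the same statement can be
    reified more than once. Note that the triple (s, p, o) itself is NOT
    part of the result: reifying a statement does not assert it.
    """
    node = f"_:b{next(_bnode_ids)}"
    return [
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),  # rdf:predicate is the RDF vocabulary term
        (node, "rdf:object", o),
    ]

triples = expand_reification(":A", ":sendsEmailTo", ":B")
for t in triples:
    print(t)
```

Annotations would then be further triples whose subject is the minted blank node.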


The syntax is similar to that of the Blazegraph RDF*/SPARQL* <https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right> and we’re interested to know if these are related ideas and if there is any anticipation that such an approach might ever find its way into appropriate standards.

RDF* seems to be based around the assumption a statement is reified only once and that the base fact is asserted in the graph. That means triple ids make sense.

Note the example:

BIND( <<?bob foaf:age ?age>> AS ?t ) .

which matches the graph for ?bob foaf:age ?age and has to match once to make sense (it's a BIND).

Reification can be done multiple times (in different files, with different annotations, to be merged), and you can reify a statement without needing it in the data (it is not necessarily asserted).

This is why RDF* is compatible with reification but reification is not compatible with RDF*: RDF* is a subset of the reification possibilities - maybe it's a useful subset - different discussion.
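The difference can be sketched in a few lines of Python. This is only a toy model under the assumptions described above (an RDF*-style annotation is keyed by an asserted triple; a reification is its own node). It is not how Blazegraph or Jena store anything, and all the names in it are hypothetical.

```python
# Asserted triples in the graph (plain tuples; no RDF library is assumed).
graph = {(":bob", "foaf:age", "23")}

# RDF*-style: the triple itself is the key, so there is exactly one
# annotation target per asserted triple.
star_annotations = {}

def annotate_star(triple, key, value):
    if triple not in graph:
        raise ValueError("this model assumes the base triple is asserted")
    star_annotations.setdefault(triple, set()).add((key, value))

# Reification-style: every reification is a separate node, and the
# reified triple does not have to be in the graph.
reifications = []

def reify(triple, **annotations):
    node = f"_:r{len(reifications)}"
    reifications.append((node, triple, annotations))
    return node

bob_age = (":bob", "foaf:age", "23")
annotate_star(bob_age, ":source", ":fileA")
annotate_star(bob_age, ":source", ":fileB")  # lands on the same target

r1 = reify(bob_age, source=":fileA")
r2 = reify(bob_age, source=":fileB")         # a second, separate reification
r3 = reify((":bob", "foaf:age", "99"), source=":rumour")  # never asserted
```

Everything the first model can express can be written in the second, but not the other way round: the two separate reifications and the unasserted claim have no counterpart in the triple-keyed model.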

Storage systems look like they are much easier for RDF* - it looks to be an extra column on the triple/quads table.

Reification has nasty cases like partial reification (e.g. just
"? rdf:subject s . ? rdf:predicate p" triples).
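As an illustration of why partial reification is awkward, any consumer has to check each reification node for completeness before using it. A minimal Python sketch of such a check, with triples as plain tuples and a hypothetical helper name `partial_reifications`:

```python
from collections import defaultdict

# The three properties a complete reification node needs.
REQUIRED = {"rdf:subject", "rdf:predicate", "rdf:object"}

def partial_reifications(triples):
    """Map each incomplete reification node to the properties it lacks."""
    seen = defaultdict(set)
    for s, p, o in triples:
        if p in REQUIRED:
            seen[s].add(p)
    return {node: REQUIRED - props
            for node, props in seen.items()
            if props != REQUIRED}

data = [
    ("_:x", "rdf:subject", ":A"),
    ("_:x", "rdf:predicate", ":sendsEmailTo"),
    ("_:x", "rdf:object", ":B"),
    ("_:y", "rdf:subject", ":A"),           # _:y has no rdf:object
    ("_:y", "rdf:predicate", ":sendsEmailTo"),
]
print(partial_reifications(data))  # {'_:y': {'rdf:object'}}
```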

But it is at the modelling level, not a data model extension. Reification is the ability to talk about making a claim, not the statement itself. It's not adding triples to the domain of discourse; it is not working on the data model level.

Other approaches extend the data model such as N3 formulae ("graphs as nodes of the graph").

Named graphs are weaker - you can't have a graph in a graph - but they were the approach in most common use at the time of SPARQL 1.0.

It seems that a Jena property function extension could do some of the work of 
statement ids but it would be desirable to have serialization support as well.

The 2015 note indicates that reification "is a minor feature of RDF"


The full quote is:

[[
About reification, they [Property Graph claims] are somewhat off-track. Reification is a quite specialised feature for limited use. It is not RDF's equivalent to attributes on links in PG.
]]

Attributes on links are much closer to an n-ary relationship in RDF terms IMO. See the "A sends email to B" anti-pattern discussion in the property graphs book. On that basis, I contend that reification of one statement is quite specialised compared to n-ary relations.

Indeed, I think that the unit is wrong - assertions come as a number of statements e.g. all the FOAF details of someone. Reifying each statement then requires having to associate the statements together again - you need grouping structures.


and yet wanting to track updates,

You may be interested in

https://afs.github.io/rdf-delta/

which captures updates, gives the update an id that RDF statements can then refer to. It makes updates first class web resources.

make claims and counter-claims about particular statements, and so on is not for us a minor use-case.

If claims and counter-claims are in the same graph, then the statement itself must not be in the graph, otherwise it is asserted as true.

Named graphs mean a triple is true in that graph but not in another. So you can make statements about that named graph. Named graphs of one triple are useful and less overhead than full reification. I don't know of any work comparing RDF* and NGs of one triple.

The 2015 note illustrates using event modeling to provide a natural way of
capturing some annotations but it does not seem to be uniformly applicable.
We have many n-ary situations in our current ontology that work well to
provide essentially blank nodes where annotation statements can be added to
further describe provenance or other annotations.

However, there are plenty of situations of the form:

     subject property literal

which provide no natural place to add an annotation explaining why that 
assertion has been made or indicating that the assertion is considered in error 
and so on.

If it is an error, you have to ensure that the statement itself is not in the data, only the reification.

Using named graphs and then creating a union graph means you can have both views: keep the information separate so various assertions can be made about it, and choose, for the purpose of a single query, to treat all triples as valid claims.
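A rough Python sketch of the two views, assuming named graphs are modelled as a dict from graph name to a set of triples. The graph names, the `:status` and `:consideredInError` terms and the `:hasTitle` triple are made up for illustration; in SPARQL the same choice is made through the dataset description or by querying a union default graph.

```python
# Each claim lives in its own named graph (here: graphs of one triple).
named_graphs = {
    ":g1": {(":W123", ":hasLCCN", "741297845")},
    ":g2": {(":W123", ":hasTitle", '"Example title"')},
}

# Statements about the graphs, kept apart from the claims themselves.
graph_annotations = {
    ":g1": {":retrievedFrom": "<http://libraryofcongress.gov>"},
    ":g2": {":status": ":consideredInError"},
}

def union_graph(graphs, names=None):
    """Merge the selected named graphs (all of them by default)."""
    selected = graphs if names is None else {n: graphs[n] for n in names}
    merged = set()
    for triples in selected.values():
        merged |= triples
    return merged

# View 1: for this query, treat every triple as a valid claim.
all_claims = union_graph(named_graphs)

# View 2: use the annotations to leave out graphs flagged as in error.
trusted = [name for name, ann in graph_annotations.items()
           if ann.get(":status") != ":consideredInError"]
trusted_claims = union_graph(named_graphs, trusted)
```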


Further similar cases arise of the form:

     subject property object-uri

that are similarly not amenable to providing natural places to add annotation 
statements.

The idea of RDF*/SPARQL* seems appealing as a uniform approach to mentioning a
statement when there is a need to decorate the statement with some annotations.

On the other hand, we have entertained the idea that every basic property could 
be modeled as a potentially n-ary case which most of the time would just have a 
single statement (ignoring an implied rdf:type statement). For (a contrived) 
example,

ex:W123 a :Work ;
     :hasLCCN [ :value 741297845 ] .

rather than

ex:W123 a :Work ;
     :hasLCCN 741297845 .

The former has a blank node that would readily permit adding an annotation:

ex:W123 a :Work ;
     :hasLCCN [ :value 741297845 ;
         :retrievedFrom <http://libraryofcongress.gov> ;
         :retrievedOn "12/27/1997" ] .

This seems to be event-based modelling, which is a useful way to capture provenance. By having the explicit event, you can talk about the event.

"A sends email to B" is an event, not a simple link from A to B.

"ex:W123 :hasLCCN has the value 741297845" is an event as well.

Now, excessive n-ary relationships can be messy to work with (true in PG and RDF). But if you want every detail recorded ... then you'll get very fine grained data modelling.
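For illustration only, the rewrite from a direct property to the value-node form can be sketched in Python as below. Triples are plain tuples, the `:value` property follows the example above, and the helper name `to_value_node` and the `_:v` labels are hypothetical.

```python
from itertools import count

# Counter used to mint fresh blank-node labels (illustrative only).
_value_ids = count()

def to_value_node(triple, annotations=None):
    """Rewrite (s, p, o) as s p [ :value o ; <annotations> ]."""
    s, p, o = triple
    node = f"_:v{next(_value_ids)}"
    out = [(s, p, node), (node, ":value", o)]
    # Annotations attach to the intermediate node, not to s or o.
    for key, value in (annotations or {}).items():
        out.append((node, key, value))
    return out

expanded = to_value_node(
    ("ex:W123", ":hasLCCN", "741297845"),
    {":retrievedFrom": "<http://libraryofcongress.gov>"},
)
for t in expanded:
    print(t)
```

The cost is visible even in this toy: one triple becomes two before any annotation is added, and every query on `:hasLCCN` has to go through the extra hop.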


Anyway, the question is really about the status of the RDF* idea and any support
latent or pending in Jena.

If someone wants to work on that, then I'm sure the project will look at any contributions. There is no use of the << >> syntax (in the parser it is "#if 0"'ed out) so it could be repurposed.

Adding N3 formulae is also doable for in-memory use - add a new Node subclass to have a Node_Graph. It's beyond RDF, so the consequences of seeing that through the whole system might be quite extensive.

    Andy


Thanks,
Chris





