On 07/08/17 19:35, Chris Tomlinson wrote:
Hello,
We're investigating various approaches to adding annotations about
individual statements (or perhaps rarely a subset of statements) of a
named graph.
There’s a note from 2015, "Re: Performance Cost of Reification"
<http://apache.markmail.org/message/js6s6ry5st73soay>, that mentions a
syntax like <<A sends email to B>>, which was proposed for use in SPARQL 1.0
and, at the time of the note, was still in the ARQ parser source.
The <<>> syntax, as in ARQ and discussed in SPARQL 1.0, is shorthand for
writing out reification, not an extension to the data model or the semantics.
<<s p o>> is syntax for
    ? rdf:subject   s
    ? rdf:predicate p
    ? rdf:object    o
i.e. it is not a triple id.
Data and/or queries can be written longhand.
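A minimal Python sketch of that expansion, assuming triples are plain (subject, predicate, object) tuples of strings. The function name and the "_:rN" blank-node labels are hypothetical, not ARQ code; it uses the standard RDF reification properties rdf:subject, rdf:predicate and rdf:object, and leaves out the optional rdf:type rdf:Statement triple.

```python
import itertools

# Hypothetical counter for minting fresh blank-node labels such as "_:r0".
_fresh = itertools.count()

def expand_reification(s, p, o):
    """Expand the shorthand <<s p o>> into reification triples.

    Returns (node, triples). The base triple (s, p, o) itself is NOT
    included: writing out the reification does not assert the statement.
    """
    node = f"_:r{next(_fresh)}"
    triples = [
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]
    return node, triples

node, triples = expand_reification("ex:A", "ex:sendsEmailTo", "ex:B")
```

The result is an ordinary node with three ordinary triples, which is why the shorthand needs no change to the data model.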
The syntax is similar to that of the Blazegraph RDF*/SPARQL* <https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right>, and we’re interested to know whether these are related ideas and
whether there is any expectation that such an approach might eventually find its way into the appropriate
standards.
RDF* seems to be based on the assumption that a statement is reified only
once and that the base fact is asserted in the graph. That means triple
ids make sense.
Note the example:
    BIND( <<?bob foaf:age ?age>> AS ?t ) .
which matches the graph for ?bob foaf:age ?age, and has to match only once
to make sense (it's a BIND).
A statement can be reified multiple times (in different files, with different
annotations, to be merged), and you can reify a statement without
needing it in the data (reifying a statement does not assert it).
This is why RDF* is compatible with reification but reification is not
compatible with RDF*: RDF* is a subset of the reification possibilities
(maybe it's a useful subset, but that's a different discussion).
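A toy Python sketch of the "subset" point, using hypothetical helper names. Reifying the same triple twice mints two distinct nodes, and neither one asserts the triple. By contrast, `triple_id` is only a stand-in for the RDF* idea that a triple is identified by itself, so one triple can have only one id. This is not Blazegraph's implementation.

```python
import itertools

_fresh = itertools.count()

def reify(s, p, o):
    """Reification: each call mints a new node, so one triple can be reified many times."""
    node = f"_:r{next(_fresh)}"
    return node, {(node, "rdf:subject", s), (node, "rdf:predicate", p), (node, "rdf:object", o)}

def triple_id(s, p, o):
    """Stand-in for an RDF*-style triple id (assumption): the triple identifies itself."""
    return (s, p, o)

def reifiers_of(graph, s, p, o):
    """All reifier nodes in `graph` that describe the triple (s, p, o)."""
    nodes = {n for (n, pr, v) in graph if pr == "rdf:subject" and v == s}
    return {n for n in nodes
            if (n, "rdf:predicate", p) in graph and (n, "rdf:object", o) in graph}

base = ("ex:bob", "foaf:age", "42")

# Two files each reify the same statement, with different annotations.
r1, t1 = reify(*base)
r2, t2 = reify(*base)
data = t1 | t2 | {(r1, "ex:source", "file1"), (r2, "ex:source", "file2")}

print(len(reifiers_of(data, *base)))          # 2: two reifiers for one triple
print(triple_id(*base) == triple_id(*base))   # True: one id per triple
print(base in data)                           # False: the triple is never asserted
```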
Storage looks much easier for RDF*: it appears to need just an extra
column on the triples/quads table.
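A minimal Python sketch of the "extra column" idea, assuming a hypothetical in-memory table in which each stored triple has exactly one id that annotation triples can refer to. It is a toy for illustration, not the layout of any real store.

```python
from dataclasses import dataclass, field

@dataclass
class TripleTable:
    """Toy triple table with an extra 'id' column (hypothetical layout)."""
    rows: dict = field(default_factory=dict)   # (s, p, o) -> id
    next_id: int = 0

    def add(self, s, p, o):
        """Insert a triple if it is absent and return its id; a triple has at most one id."""
        key = (s, p, o)
        if key not in self.rows:
            self.rows[key] = f"t{self.next_id}"
            self.next_id += 1
        return self.rows[key]

table = TripleTable()
tid = table.add("ex:bob", "foaf:age", "42")
# An annotation refers to the value stored in the extra id column.
table.add(tid, "ex:source", "file1")
```

Because the id is keyed by the triple, this layout bakes in the "reified once, and asserted" assumption discussed above.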
Reification has nasty cases such as partial reification (e.g. only the
"? rdf:subject s . ? rdf:predicate p" triples are present).
But it is at the modelling level, not a data model extension.
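A small Python sketch that flags partial reifications: nodes that carry some, but not all, of rdf:subject, rdf:predicate and rdf:object. The function name and sample data are hypothetical. It does not catch other irregular cases, such as a node with two different rdf:subject values.

```python
from collections import defaultdict

REIF_PROPS = ("rdf:subject", "rdf:predicate", "rdf:object")

def partial_reifications(graph):
    """Return reifier nodes that use some, but not all, of the three reification properties."""
    seen = defaultdict(set)
    for s, p, o in graph:
        if p in REIF_PROPS:
            seen[s].add(p)
    return {node for node, props in seen.items() if len(props) < len(REIF_PROPS)}

graph = {
    ("_:r1", "rdf:subject", "ex:s"), ("_:r1", "rdf:predicate", "ex:p"), ("_:r1", "rdf:object", "ex:o"),
    ("_:r2", "rdf:subject", "ex:s"), ("_:r2", "rdf:predicate", "ex:p"),   # no rdf:object
}
print(partial_reifications(graph))   # {'_:r2'}
```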
Reification is the ability to talk about making a claim, not about the
statement itself. It does not add triples to the domain of discourse; it
does not work at the data model level.
Other approaches extend the data model, such as N3 formulae ("graphs as
nodes of the graph").
Named graphs are weaker (you can't have a graph in a graph) but they were the
approach in most common use at the time of SPARQL 1.0.
It seems that a Jena property function extension could do some of the work of
statement ids, but it would be desirable to have serialization support as well.
The 2015 note indicates that reification "is a minor feature of RDF".
The full quote is:
[[
About reification, they [Property Graph claims] are somewhat off-track.
Reification is a quite specialised feature for limited use. It is not
RDF's equivalent to attributes on links in PG.
]]
Attributes on links are much closer to an n-ary relationship in RDF
terms IMO. See the "A sends email to B" anti-pattern discussion in the
property graphs book. On that basis, I contend that reification of one
statement is quite specialised compared to n-ary relations.
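A minimal Python sketch of the n-ary alternative: "A sends email to B" becomes an event node, and the attributes a property graph would hang on the A->B edge attach to that node. All property names here (ex:sender, ex:recipient, ex:sentOn, ex:subject) are hypothetical.

```python
def send_email_event(event, sender, recipient, date):
    """Model 'A sends email to B' as an n-ary event node (hypothetical vocabulary)."""
    return {
        (event, "rdf:type", "ex:EmailSending"),
        (event, "ex:sender", sender),
        (event, "ex:recipient", recipient),
        (event, "ex:sentOn", date),
    }

g = send_email_event("ex:email1", "ex:A", "ex:B", '"2017-08-07"')
# Further "edge attributes" are just more triples on the event node.
g.add(("ex:email1", "ex:subject", '"Meeting"'))
```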
Indeed, I think that the unit is wrong: assertions come as a number of
statements, e.g. all the FOAF details of someone. Reifying each
statement then requires associating the statements together
again, so you need grouping structures.
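A Python sketch of the grouping problem, assuming hypothetical names (ex:member, ex:source). Each FOAF statement is reified separately, and then an extra group node ties the reifiers back together so that a single annotation can cover the whole claim.

```python
import itertools

_fresh = itertools.count()

def reify(s, p, o):
    """Reify one statement with a fresh node."""
    node = f"_:r{next(_fresh)}"
    return node, {(node, "rdf:subject", s), (node, "rdf:predicate", p), (node, "rdf:object", o)}

def reify_group(group, statements, source):
    """Reify each statement, then link every reifier to a grouping node."""
    graph = {(group, "ex:source", source)}
    for s, p, o in statements:
        node, triples = reify(s, p, o)
        graph |= triples
        graph.add((group, "ex:member", node))
    return graph

foaf = [("ex:alice", "foaf:name", '"Alice"'),
        ("ex:alice", "foaf:mbox", "mailto:alice@example.org")]
g = reify_group("ex:claim1", foaf, "ex:file1")
```

Two statements already cost nine triples, which illustrates the overhead of reifying at the level of single statements.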
and yet wanting to track updates,
You may be interested in
https://afs.github.io/rdf-delta/
which captures updates and gives each update an id that RDF statements can
then refer to. It makes updates first-class web resources.
make claims and counter-claims about particular statements, and so on,
is not a minor use case for us.
If claims and counter-claims are in the same graph, then the statement
itself must not be in the graph, otherwise it is asserted as true.
Named graphs mean a triple can be true in one graph but not in another, so
you can make statements about that named graph. Named graphs of one
triple are useful and have less overhead than full reification. I don't know
of any work comparing RDF* and NGs of one triple.
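A Python sketch of one-triple named graphs, assuming quads are (graph, s, p, o) tuples and using hypothetical names (ex:g1, ex:claimedBy). Each claim lives in its own graph, the annotations describe the graph name, and a triple is true only in a graph that contains it. The second value is an invented counter-claim, used for illustration only.

```python
def one_triple_graphs(claims):
    """Put each claimed triple in its own named graph and annotate the graph name.

    `claims` is a list of (graph_name, (s, p, o), source).
    Returns (quads, annotations); the annotations are about the graph names.
    """
    quads, annotations = set(), set()
    for g, (s, p, o), source in claims:
        quads.add((g, s, p, o))
        annotations.add((g, "ex:claimedBy", source))
    return quads, annotations

def true_in(graph_name, triple, quads):
    """A triple is true in a named graph only if that graph contains it."""
    return (graph_name, *triple) in quads

quads, notes = one_triple_graphs([
    ("ex:g1", ("ex:W123", ":hasLCCN", "741297845"), "ex:curatorA"),
    ("ex:g2", ("ex:W123", ":hasLCCN", "741297854"), "ex:curatorB"),   # a counter-claim
])
```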
The 2015 note illustrates using event modeling to provide a natural way of
capturing some annotations, but it does not seem to be uniformly applicable.
We have many n-ary situations in our current ontology that work well: they
essentially provide blank nodes where annotation statements can be added to
further describe provenance or other annotations.
However, there are plenty of situations of the form:
    subject property literal
which provide no natural place to add an annotation explaining why that
assertion has been made, indicating that the assertion is considered to be in error,
and so on.
If it is an error, you have to ensure that the statement itself is not
in the data, only the reification.
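A Python sketch of that check, assuming a hypothetical annotation ex:status ex:InError on the reifier node. It returns any statement that is reified as being in error but is still asserted in the data. For simplicity it keeps only one value per reification property.

```python
REIF = ("rdf:subject", "rdf:predicate", "rdf:object")

def asserted_errors(graph, status_prop="ex:status", error_value="ex:InError"):
    """Return triples reified as being in error that are nevertheless asserted in `graph`."""
    parts = {}
    for s, p, o in graph:
        if p in REIF:
            parts.setdefault(s, {})[p] = o   # simplification: one value per property
    found = set()
    for node, d in parts.items():
        if len(d) == 3 and (node, status_prop, error_value) in graph:
            triple = (d["rdf:subject"], d["rdf:predicate"], d["rdf:object"])
            if triple in graph:              # the erroneous statement is still in the data
                found.add(triple)
    return found

graph = {
    ("_:r1", "rdf:subject", "ex:W123"), ("_:r1", "rdf:predicate", ":hasLCCN"),
    ("_:r1", "rdf:object", "741297845"), ("_:r1", "ex:status", "ex:InError"),
    ("ex:W123", ":hasLCCN", "741297845"),    # still asserted: needs fixing
}
```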
Using named graphs and then creating a union graph means you can have both
views: keep the information separate so that various assertions can be made,
or choose, for the purpose of a single query, to treat all triples as valid
claims.
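A Python sketch of the two views, assuming quads are (graph, s, p, o) tuples. The helper names are hypothetical. `graph_view` keeps one source's claims separate, while `union_graph` drops the graph names so that a query sees every claim as a plain triple, and duplicates collapse.

```python
def union_graph(quads):
    """Merge all named graphs into one set of triples, dropping the graph name."""
    return {(s, p, o) for (_g, s, p, o) in quads}

def graph_view(quads, name):
    """The triples of a single named graph, kept separate from the others."""
    return {(s, p, o) for (g, s, p, o) in quads if g == name}

quads = {
    ("ex:g1", "ex:W123", ":hasLCCN", "741297845"),
    ("ex:g2", "ex:W123", ":hasLCCN", "741297845"),   # same claim from another source
    ("ex:g2", "ex:W123", ":title", '"Example"'),
}
```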
Similar cases also arise of the form:
    subject property object-uri
which likewise provide no natural place to add annotation
statements.
The idea of RDF*/SPARQL* seems appealing as a uniform approach to mentioning a
statement when there is a need to decorate the statement with some annotations.
On the other hand, we have entertained the idea that every basic property could
be modeled as a potentially n-ary case, which most of the time would just have a
single statement (ignoring an implied rdf:type statement). For a (contrived)
example:
    ex:W123 a :Work ;
        :hasLCCN [ :value 741297845 ] .
rather than
    ex:W123 a :Work ;
        :hasLCCN 741297845 .
The former has a blank node that would readily permit adding an annotation:
    ex:W123 a :Work ;
        :hasLCCN [ :value 741297845 ;
                   :retrievedFrom <http://libraryofcongress.gov> ;
                   :retrievedOn "12/27/1997" ] .
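A Python sketch of the rewrite from the second form to the first, assuming triples are string tuples and "_:bN" labels stand for the blank nodes. The function name is hypothetical. It moves each value onto its own node via :value and then attaches the annotations from the example to that node.

```python
import itertools

_fresh = itertools.count()

def promote_to_nary(graph, s, p, value_prop=":value"):
    """Rewrite each `s p v` into `s p _:b . _:b :value v`.

    Returns (new_graph, {v: blank_node}) so that annotations can be attached to the nodes.
    """
    new_graph, nodes = set(), {}
    for (ts, tp, to) in graph:
        if ts == s and tp == p:
            b = f"_:b{next(_fresh)}"
            new_graph |= {(s, p, b), (b, value_prop, to)}
            nodes[to] = b
        else:
            new_graph.add((ts, tp, to))
    return new_graph, nodes

g = {("ex:W123", "rdf:type", ":Work"), ("ex:W123", ":hasLCCN", "741297845")}
g2, nodes = promote_to_nary(g, "ex:W123", ":hasLCCN")
b = nodes["741297845"]
g2 |= {(b, ":retrievedFrom", "<http://libraryofcongress.gov>"),
       (b, ":retrievedOn", '"12/27/1997"')}
```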
This seems to be event-based modelling, which is a useful way to capture
provenance. By having the explicit event, you can talk about the event.
"A sends email to B" is an event, not a simple link from A to B.
"ex:W123 has the :hasLCCN value 741297845" is an event as well.
Now, excessive n-ary relationships can be messy to work with (true in PG
and RDF). But if you want every detail recorded, then you'll get
very fine-grained data modelling.
Anyway, the question is really about the status of the RDF* idea and any support,
latent or pending, in Jena.
If someone wants to work on that, then I'm sure the project will look at
any contributions. There is currently no use of the << >> syntax (in the parser
it is commented out with "#if 0"), so it could be repurposed.
Adding N3 formulae is also doable for in-memory use: add a new Node
subclass, Node_Graph. It's beyond RDF, so the consequences of
seeing that through the whole system might be quite extensive.
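A loose Python analogy for a graph-valued node. This is not Jena code, and `GraphNode` is a hypothetical name. The node wraps an immutable set of triples and is hashable, so a whole graph can appear as the subject of an ordinary triple without its contents being asserted in the outer graph.

```python
class GraphNode:
    """Hypothetical node whose value is a whole graph (an N3-style formula)."""

    def __init__(self, triples):
        self.triples = frozenset(triples)   # immutable, so the node can be hashed

    def __eq__(self, other):
        return isinstance(other, GraphNode) and self.triples == other.triples

    def __hash__(self):
        return hash(self.triples)

    def __repr__(self):
        return f"{{ {len(self.triples)} triple(s) }}"

claim = GraphNode({("ex:A", "ex:sendsEmailTo", "ex:B")})
# The formula is the subject of a triple in the outer graph;
# the triple inside it is not asserted there.
outer = {(claim, "ex:saidBy", "ex:C")}
```

Every part of a system that assumes nodes are IRIs, blank nodes or literals would need to handle this new kind of node, which is the extensive knock-on effect mentioned above.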
Andy
Thanks,
Chris