Hi,

Thanks so much for such a thorough response. For now we’re looking closely at 
using reification of complete statements. The event-based approach seems 
limited to situations in which annotations are attached to single statement 
complexes within a single graph, and we would like to support annotations that 
apply to sets of statements that may be in several named graphs. The 
event-based approach also seemed a bit confining when applied to simple cases 
such as:

    subject property literal

especially as we wanted to retain information on ranges in our current 
ontology. Given a skeletal example:

bdr:W12827 a :Work ;
    :hasLCCN "75903140" .

It seems workable to use reification in a stylized manner:

stmt:W12827_S0001 a rdf:Statement ;
    rdf:subject bdr:W12827 ;
    rdf:predicate :hasLCCN ;
    rdf:object "75903140" ;
    :retrieved [
        :retrievedFrom <http://lccn.loc.gov/75903140> ;
        :retrievedOn "1998-12-27"^^xsd:date ] .

It is clear that the above does not attach the annotation to the original 
statement; rather, it associates the annotation with a description that is 
linked to the original statement only by a convention outside the system, as it 
were. This is not appealing, but it seems workable.
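To make that convention concrete, here is a minimal sketch in Python (not Jena's API) that models a graph as a set of (subject, predicate, object) tuples of Turtle-style strings. The helper names `reify` and `described_triple` are hypothetical; the only link between the statement node and the original triple is that the rdf:subject, rdf:predicate and rdf:object values happen to match.

```python
# Illustrative model only: terms are Turtle-style strings and a graph is a
# set of (subject, predicate, object) tuples. This is not Jena's API.

REIF_PROPS = ("rdf:subject", "rdf:predicate", "rdf:object")


def reify(triple, stmt_id, annotations=()):
    """Describe `triple` with an rdf:Statement node named `stmt_id`."""
    s, p, o = triple
    out = {(stmt_id, "rdf:type", "rdf:Statement")}
    out |= {(stmt_id, prop, val) for prop, val in zip(REIF_PROPS, (s, p, o))}
    out |= {(stmt_id, ap, ao) for ap, ao in annotations}
    return out


def described_triple(graph, stmt_id):
    """Recover the triple a statement node describes, or None if ambiguous."""
    parts = []
    for prop in REIF_PROPS:
        vals = {o for (s, p, o) in graph if s == stmt_id and p == prop}
        if len(vals) != 1:
            return None
        parts.append(vals.pop())
    return tuple(parts)


base = ("bdr:W12827", ":hasLCCN", '"75903140"')
graph = {base} | reify(base, "stmt:W12827_S0001", [(":retrieved", "_:r1")])
graph |= {
    ("_:r1", ":retrievedFrom", "<http://lccn.loc.gov/75903140>"),
    ("_:r1", ":retrievedOn", '"1998-12-27"^^xsd:date'),
}

found = described_triple(graph, "stmt:W12827_S0001")
print(found == base, found in graph)  # True True
```

Nothing in RDF requires the base triple to be present as well; `found in graph` is a separate check, which is exactly the "convention outside the system".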

How would an event-based approach (I’ve been thinking of it as a blank-node 
approach or “everything n-ary”) work across an arbitrary set of statements 
possibly across two or more named graphs?

After we have some experience with the reification approach, perhaps we can make 
an informed proposal for a reification-oriented extension to Jena.

We’re really interested in what approaches are currently taken for supporting 
annotations other than RDR, single property, and event-based. How important are 
annotations in current mainstream usage of Jena?

Thank you again for your reply,
Chris


> On Aug 8, 2017, at 5:37 AM, Andy Seaborne <a...@apache.org> wrote:
> 
> 
> 
> On 07/08/17 19:35, Chris Tomlinson wrote:
>> Hello,
>> We're investigating various approaches to adding annotations about
>> individual statements (or perhaps rarely a subset of statements) of a
>> named graph.
>> There’s a note from 2015, Re: Performance Cost of Reification
>> <http://apache.markmail.org/message/js6s6ry5st73soay>, that mentions a
>> syntax like:
>>     <<A sends email to B>>,
>> that was proposed for use in SPARQL 1.0 and that, at the time of the note,
>> was still in the ARQ parser source.
> 
> The <<>> syntax, as in ARQ and discussed for SPARQL 1.0, is shorthand for 
> writing out reification, not an extension to the data model or semantics.
> 
> <<s p o>> is syntax for
> 
> ? rdf:subject s
> ? rdf:predicate p
> ? rdf:object o
> 
> i.e. not a triple id.
> 
> Data and/or query can be written long hand.
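As a rough illustration of that rewrite (a Python sketch, not ARQ's parser code; the fresh-variable naming `?_rN` is an assumption), expanding `<<s p o>>` yields three triple patterns that share a new node, rather than a single triple identifier:

```python
import itertools

# Counter used to mint fresh variables; the "?_r" prefix is hypothetical.
_fresh = itertools.count(1)


def expand_quoted(s, p, o):
    """Rewrite <<s p o>> into the long-hand reification triple patterns.

    Returns the fresh variable standing for the reification node and the
    patterns it must satisfy. Nothing here identifies the triple itself.
    """
    r = f"?_r{next(_fresh)}"
    patterns = [
        (r, "rdf:subject", s),
        (r, "rdf:predicate", p),
        (r, "rdf:object", o),
    ]
    return r, patterns


node, pats = expand_quoted(":A", ":sends", ":B")
for pat in pats:
    print(" ".join(pat), ".")
```

Because the result is an ordinary pattern over a variable, it can match zero, one or several reification nodes in the data.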
> 
>> The syntax is similar to that of Blazegraph's RDF*/SPARQL*
>> <https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right>, and
>> we’re interested to know if these are related ideas and if there is any
>> anticipation that such an approach might ever find its way into the
>> appropriate standards.
> 
> RDF* seems to be based around the assumption a statement is reified only once 
> and that the base fact is asserted in the graph. That means triple ids make 
> sense.
> 
> Note the example:
> 
> BIND( <<?bob foaf:age ?age>> AS ?t ) .
> 
> which matches ?bob foaf:age ?age against the graph, and only makes sense if 
> it matches once (it's a BIND).
> 
> Reification can happen multiple times (in different files, with different 
> annotations, to be merged), and you can reify a statement without needing it 
> in the data (reifying a statement does not assert it).
> 
> This is why RDF* is compatible with reification but reification is not 
> compatible with RDF*: RDF* is a subset of the reification possibilities. 
> Maybe it's a useful subset; that's a different discussion.
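A small Python sketch (plain tuples, hypothetical helper names, not a Jena API) shows both points: after merging two files that reify the same claim, the claim has two reification nodes, and the claim itself is not in the data at all. A scheme with one id per asserted triple has no direct way to express either situation.

```python
def reifications_of(graph, triple):
    """Return every node that reifies `triple` (all three properties match)."""
    s, p, o = triple

    def nodes(prop, val):
        return {n for (n, pp, v) in graph if pp == prop and v == val}

    return (nodes("rdf:subject", s)
            & nodes("rdf:predicate", p)
            & nodes("rdf:object", o))


def reification(node, triple, source):
    """One file's reification of `triple`, annotated with a :source."""
    s, p, o = triple
    return {(node, "rdf:subject", s), (node, "rdf:predicate", p),
            (node, "rdf:object", o), (node, ":source", source)}


claim = (":A", ":sends", ":B")
# Two files, each reifying the same claim with different annotations.
# Blank node labels are assumed to have been kept apart when merging.
file1 = reification("_:f1r1", claim, ":alice")
file2 = reification("_:f2r1", claim, ":bob")
merged = file1 | file2

print(sorted(reifications_of(merged, claim)))  # ['_:f1r1', '_:f2r1']
print(claim in merged)                         # False: reified, never asserted
```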
> 
> Storage looks to be much easier for RDF*: it seems to be an extra column on 
> the triple/quad table.
> 
> Reification has nasty cases like partial reification (e.g. just
> "? rdf:subject s . ? rdf:predicate p" triples).
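One way to spot such partial reifications, sketched in Python over tuple triples (the helper name and the "not all three properties present" rule are assumptions for illustration), is to collect the reification properties each node carries:

```python
REIF_PROPS = {"rdf:subject", "rdf:predicate", "rdf:object"}


def partial_reifications(graph):
    """Nodes that carry some, but not all, of the three reification properties."""
    seen = {}
    for s, p, _ in graph:
        if p in REIF_PROPS:
            seen.setdefault(s, set()).add(p)
    return {node for node, props in seen.items() if props != REIF_PROPS}


graph = {
    ("_:full", "rdf:subject", ":s"),
    ("_:full", "rdf:predicate", ":p"),
    ("_:full", "rdf:object", ":o"),
    ("_:part", "rdf:subject", ":s"),
    ("_:part", "rdf:predicate", ":p"),
}
print(partial_reifications(graph))  # {'_:part'}
```

This only checks which properties are present; a node with two different rdf:subject values is another awkward case that it does not flag.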
> 
> But it is at the modelling level, not a data model extension. Reification is 
> the ability to talk about making a claim, not the statement itself. It's not 
> adding triples to the domain of discourse; it is not working on the data 
> model level.
> 
> Other approaches extend the data model such as N3 formulae ("graphs as nodes 
> of the graph").
> 
> Named graphs are weaker (you can't have a graph in a graph), but they were 
> the approach in most common use at the time of SPARQL 1.0.
> 
>> It seems that a Jena property function extension could do some of the work 
>> of statement ids but it would be desirable to have serialization support as 
>> well.
>> The 2015 note indicates that reification "is a minor feature of RDF"
> 
> 
> The full quote is:
> 
> [[
> About reification, they [Property Graph claims] are somewhat off-track. 
> Reification is a quite specialised feature for limited use. It is not RDF's 
> equivalent to attributes on links in PG.
> ]]
> 
> Attributes on links are much closer to an n-ary relationship in RDF terms 
> IMO.  See the "A sends email to B" anti-pattern discussion in the property 
> graphs book.  On that basis, I contend that reification of one statement is 
> quite specialised compared to n-ary relations.
> 
> Indeed, I think that the unit is wrong - assertions come as a number of 
> statements e.g. all the FOAF details of someone.  Reifying each statement 
> then requires having to associate the statements together again - you need 
> grouping structures.
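To illustrate the grouping problem, here is a Python sketch (tuple triples; the `:inGroup` and `:assertedBy` properties and the helper names are hypothetical) that reifies each statement of one assertion separately and then needs an extra link per statement to tie them back together:

```python
def reify_group(triples, group):
    """Reify each triple and link every reification node to `group`.

    Blank node labels are only unique within one call (illustration only).
    """
    out = set()
    for i, (s, p, o) in enumerate(triples, start=1):
        r = f"_:r{i}"
        out |= {(r, "rdf:subject", s), (r, "rdf:predicate", p),
                (r, "rdf:object", o), (r, ":inGroup", group)}
    return out


def group_members(graph, group):
    """Reassemble the triples whose reifications belong to `group`."""
    nodes = {n for (n, p, g) in graph if p == ":inGroup" and g == group}

    def val(node, prop):
        return next(o for (s, p, o) in graph if s == node and p == prop)

    return {(val(n, "rdf:subject"), val(n, "rdf:predicate"),
             val(n, "rdf:object")) for n in nodes}


foaf = [(":alice", "foaf:name", '"Alice"'),
        (":alice", "foaf:mbox", "<mailto:alice@example.org>")]
graph = reify_group(foaf, ":claim1")
graph.add((":claim1", ":assertedBy", ":bob"))  # provenance on the group
print(len(graph))  # 9 triples to say two things, plus who said them
```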
> 
>> and yet wanting to track updates, 
> 
> You may be interested in
> 
> https://afs.github.io/rdf-delta/
> 
> which captures updates, gives the update an id that RDF statements can then 
> refer to.  It makes updates first class web resources.
> 
>> make claims and counter-claims about particular statements, and so on is not 
>> for us a minor use-case.
> 
> If claims and counter-claims are in the same graph, then the statement itself 
> must not be in the graph, else it's true.
> 
> Named graphs mean a triple is true in that graph but not in another.  So you 
> can make statements about that named graph.  Named graphs of one triple are 
> useful and less overhead than full reification.  I don't know of any work 
> comparing RDF* and NGs of one triple.
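For comparison, a sketch of the named-graphs-of-one-triple idea in Python, using (graph, subject, predicate, object) quads; the `ex:g` graph names and the metadata property are hypothetical. Each annotated triple costs one quad, and the annotations are ordinary triples about the graph name in the default graph:

```python
DEFAULT = None  # stands for the default graph in this sketch


def one_triple_graphs(triples, prefix="ex:g"):
    """Place each triple in its own named graph; return (quads, names)."""
    quads, names = set(), []
    for i, (s, p, o) in enumerate(triples, start=1):
        g = f"{prefix}{i}"
        names.append(g)
        quads.add((g, s, p, o))
    return quads, names


def triples_in(quads, graph_name):
    """The triples that hold in one graph (true there, not elsewhere)."""
    return {(s, p, o) for (g, s, p, o) in quads if g == graph_name}


quads, names = one_triple_graphs([("ex:W123", ":hasLCCN", '"741297845"')])
g1 = names[0]
# Annotation about the named graph, stored in the default graph.
quads.add((DEFAULT, g1, ":retrievedFrom", "<http://libraryofcongress.gov>"))
print(triples_in(quads, g1))
```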
> 
>> The 2015 note illustrates using event modeling to provide a natural way of
>> capturing some annotations, but it does not seem to be uniformly applicable.
>> We have many n-ary situations in our current ontology that work well to
>> provide essentially blank nodes where annotation statements can be added to
>> further describe provenance or other annotations.
>> However, there are plenty of situations of the form:
>>     subject property literal
>> which provide no natural place to add an annotation explaining why that 
>> assertion has been made or indicating that the assertion is considered in 
>> error and so on.
> 
> If it is an error, you have to ensure that the statement itself is not in the 
> data, only the reification.
> 
> Using named graphs, then creating a union graph, means you can have both 
> views: keep the information separate so various assertions can be made, or 
> choose, for the purpose of a single query, to treat all triples as valid claims.
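A minimal Python sketch of the two views (quads as tuples; the graph names `:claims` and `:counterClaims` are hypothetical): per-graph queries keep the claims apart, while the union graph treats every triple as a claim for the duration of one query.

```python
def graph_view(quads, name):
    """Triples asserted in one named graph."""
    return {(s, p, o) for (g, s, p, o) in quads if g == name}


def union_view(quads):
    """Union graph: all triples from all named graphs, duplicates merged."""
    return {(s, p, o) for (_g, s, p, o) in quads}


quads = {
    (":claims", "ex:W123", ":hasLCCN", '"741297845"'),
    (":counterClaims", "ex:W123", ":hasLCCN", '"741297846"'),
    (":counterClaims", "ex:W123", "a", ":Work"),
    (":claims", "ex:W123", "a", ":Work"),
}
print(len(graph_view(quads, ":claims")), len(union_view(quads)))  # 2 3
```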
> 
>> Further similar cases arise of the form:
>>     subject property object-uri
>> that are similarly not amenable to providing natural places to add 
>> annotation statements.
>> The idea of RDF*/Sparql* seems appealing as a uniform approach to mentioning 
>> a statement when there is need to decorate the statement with some 
>> annotations.
>> On the other hand, we have entertained the idea that every basic property 
>> could be modeled as a potentially n-ary case which most of the time would 
>> just have a single statement (ignoring an implied rdf:type statement). For 
>> (a contrived) example,
>> ex:W123 a :Work ;
>>     :hasLCCN [ :value 741297845 ] .
>> rather than
>> ex:W123 a :Work ;
>>     :hasLCCN 741297845 .
>> The former has a blank node that would readily permit adding an annotation:
>> ex:W123 a :Work ;
>>     :hasLCCN [ :value 741297845 ;
>>         :retrievedFrom <http://libraryofcongress.gov> ;
>>         :retrievedOn "12/27/1997" ] .
> 
> This seems to be event-based modelling, which is a useful way to capture 
> provenance.  By having the explicit event, you can talk about the event.
> 
> "A sends email to B" is an event, not a simple link from A to B.
> 
> "ex:W123 :hasLCCN has the value 741297845" is an event as well.
> 
> Now, excessive n-ary relationships can be messy to work with (true in PG and 
> RDF).  But if you want every detail recorded ... then you'll get very fine 
> grained data modelling.
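The n-ary wrapping from the ex:W123 example can be sketched mechanically in Python (tuple triples; `to_nary`, `plain_values` and the `_:vN` labels are hypothetical). It shows the trade-off: annotations have an obvious home, but every plain value now costs an extra triple and reads need one more hop.

```python
import itertools

_bnode_ids = itertools.count(1)  # mints fresh blank node labels


def to_nary(triple, annotations=()):
    """Turn `s p o` into `s p [ :value o ; annotations... ]`."""
    s, p, o = triple
    b = f"_:v{next(_bnode_ids)}"
    return {(s, p, b), (b, ":value", o)} | {(b, ap, ao) for ap, ao in annotations}


def plain_values(graph, s, p):
    """Read back the original values through the extra blank node hop."""
    nodes = {o for (ss, pp, o) in graph if ss == s and pp == p}
    return {o for (b, pp, o) in graph if b in nodes and pp == ":value"}


graph = to_nary(("ex:W123", ":hasLCCN", "741297845"),
                [(":retrievedFrom", "<http://libraryofcongress.gov>"),
                 (":retrievedOn", '"12/27/1997"')])
print(len(graph), plain_values(graph, "ex:W123", ":hasLCCN"))  # 4 {'741297845'}
```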
> 
>> Anyway, the question is really about the status of the RDF* idea and any 
>> support latent or pending in Jena.
> 
> If someone wants to work on that, then I'm sure the project will look at any 
> contributions.  There is no use of the << >> syntax (in the parser it is "#if 
> 0"'ed out) so it could be repurposed.
> 
> Adding N3 formulae is also doable for in-memory use: add a new Node subclass 
> to have a Node_Graph. It's beyond RDF, so the consequences of seeing that 
> through the whole system might be quite extensive.
> 
>    Andy
> 
>> Thanks,
>> Chris
