Hi All,

I'd like to share some information about what we've implemented and see if 
there is either:

- Previous work done in this area, or

- Others who might find this useful

Perhaps in the longer term this is something that could even be standardized.

To set the scene, we've been working on converting a rather large dataset to RDF.
The dataset is in the product lifecycle management domain.
The primary goal is to have a 'virtualized' copy of the current state of all 
items that can be queried flexibly.

For management of the data in the graph store, we settled on a graph per 
resource pattern [1], where each named graph contains the description of one 
item plus some additional metadata about the graph itself.
This allows us to use HTTP operations (e.g. PUT) to interact with the named 
graphs, which is consistent with the granularity of updates to individual items 
from the source system (i.e. any change to an item creates a new version of the 
item which replaces the previous version).
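As an illustration, a per-item update under this pattern could be sent as a PUT using the indirect graph identification (`?graph=`) form of the SPARQL 1.1 Graph Store HTTP Protocol. This is a minimal sketch: the endpoint `http://example.org/rdf-graphs/service`, the graph name and the payload are hypothetical, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical Graph Store Protocol endpoint; not a real service.
STORE_ENDPOINT = "http://example.org/rdf-graphs/service"

def build_graph_put(graph_iri: str, turtle: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that replaces one named graph."""
    # Indirect graph identification: the graph IRI goes in ?graph=.
    query = urllib.parse.urlencode({"graph": graph_iri})
    return urllib.request.Request(
        f"{STORE_ENDPOINT}?{query}",
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )

# Hypothetical item description (a single triple for brevity).
req = build_graph_put(
    "http://example.org/graph/item-42",
    '<http://example.org/item/42> <http://purl.org/dc/terms/title> "Widget" .\n',
)
```

A store that exposes each named graph at its own URL could instead accept the PUT directly at that URL (direct graph identification); either way, one request replaces one item's graph.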

However, we also knew that the updates from the source system were sent as 
messages, each containing the description of one or more changed items plus the 
descriptions of all related items potentially impacted by the change.
One option we considered was to deconstruct each message into separate HTTP PUT 
operations, one for each item described in the message.
However, this would have the downside that the updates in the graph store (state 
changes) would not correspond directly to the messages, and that the updates in 
a message might be only partially applied should an error occur during 
processing.
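The rejected alternative can be sketched as follows: the quads in a message are grouped by graph name and each group becomes the body of its own PUT. Representing quads as tuples of N-Triples term strings, and the IRIs used, are assumptions of this sketch. Because the result is a set of independent requests, nothing ties them together as one state change.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (subject, predicate, object, graph), each an N-Triples term string.
Quad = Tuple[str, str, str, str]

def split_message(quads: List[Quad]) -> Dict[str, str]:
    """Group a message's quads by graph; return one N-Triples body per graph."""
    per_graph: Dict[str, List[str]] = defaultdict(list)
    for s, p, o, g in quads:
        per_graph[g].append(f"{s} {p} {o} .")
    return {g: "\n".join(lines) + "\n" for g, lines in per_graph.items()}

# Hypothetical message: a changed item plus a related, impacted item.
message = [
    ("<http://example.org/item/1>", "<http://example.org/rev>", '"B"',
     "<http://example.org/graph/1>"),
    ("<http://example.org/item/2>", "<http://example.org/uses>",
     "<http://example.org/item/1>", "<http://example.org/graph/2>"),
]
bodies = split_message(message)
# One PUT per entry in `bodies`; an error after the first PUT would
# leave the message half applied.
```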

The solution we arrived at was to convert the message to RDF quads and apply 
the update with an HTTP PATCH request to the graph store with 'custom' semantics.
We define an HTTP PATCH with a quad payload as equivalent to:

- a DROP SILENT operation on each named graph in the payload, followed by

- an INSERT DATA operation on each named graph in the payload

In other words, this is the same as an HTTP PUT request against each named graph 
in the quad data.
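One way to read this definition is as a single SPARQL 1.1 Update request generated from the payload. The sketch below assumes quads arrive as tuples of N-Triples term strings with an IRI graph label; the function name is hypothetical. SPARQL 1.1 Update says each request SHOULD be treated atomically by an update service, so sending the generated string as one request is one possible way to get all-or-nothing behaviour, depending on the store.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (subject, predicate, object, graph), each an N-Triples term string.
Quad = Tuple[str, str, str, str]

def patch_to_sparql_update(quads: List[Quad]) -> str:
    """Express the quad PATCH semantics as one SPARQL 1.1 Update request."""
    per_graph: Dict[str, List[str]] = defaultdict(list)
    for s, p, o, g in quads:
        per_graph[g].append(f"{s} {p} {o} .")
    graphs = sorted(per_graph)  # deterministic output order
    # Step 1: DROP SILENT each named graph in the payload (no error if absent).
    ops = [f"DROP SILENT GRAPH {g}" for g in graphs]
    # Step 2: INSERT DATA with the payload triples for each of those graphs.
    blocks = []
    for g in graphs:
        body = "\n".join("    " + t for t in per_graph[g])
        blocks.append(f"  GRAPH {g} {{\n{body}\n  }}")
    ops.append("INSERT DATA {\n" + "\n".join(blocks) + "\n}")
    # Operations within a single request are separated by ';'.
    return " ;\n".join(ops)
```

All drops come before all inserts, matching the order given above; doing drop and insert per graph would have the same end result.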

This allows us to apply the changes described in a message in one atomic action.
Any named graphs already present in the graph store that are not in the RDF 
quad payload are not mutated.
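To make the "untouched graphs" and atomicity properties concrete, here is a toy in-memory model (not the graph store used in the project): the store is a dict from graph name to a frozenset of triples, and a PATCH builds a new dict that the caller swaps in only once it is complete. The names and placeholder terms are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]
Store = Dict[str, FrozenSet[Triple]]

def apply_patch(store: Store, quads: Iterable[Quad]) -> Store:
    """Return a new store in which each graph named in the payload is replaced."""
    staged: Dict[str, Set[Triple]] = defaultdict(set)
    for s, p, o, g in quads:
        staged[g].add((s, p, o))
    # Copy the mapping; graphs not named in the payload are carried over as-is.
    patched = dict(store)
    for g, triples in staged.items():
        # Same effect as DROP SILENT + INSERT DATA (i.e. a PUT) on graph g.
        patched[g] = frozenset(triples)
    return patched

store: Store = {
    "<g1>": frozenset({("<a>", "<p>", "<b>")}),
    "<g2>": frozenset({("<c>", "<p>", "<d>")}),
}
# The caller swaps in the new dict only after apply_patch returns, so an
# exception while staging leaves `store` exactly as it was.
new_store = apply_patch(
    store, [("<a>", "<p>", "<x>", "<g1>"), ("<e>", "<p>", "<f>", "<g3>")]
)
```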

There is more information in slides 14-17 of a recent presentation [2].

One could imagine similar quad semantics for HTTP GET, PUT, POST and DELETE 
where:

- GET would return the entire contents of the graph store in the requested 
quad format (it could also support triple formats, in which case the graph 
context is omitted)

- PUT would replace the entire contents of the graph store with the RDF quad 
payload

- POST would insert the RDF quad payload into the graph store, leaving existing 
data intact

- DELETE would be equivalent to DROP ALL
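A toy in-memory model of those four verbs, under the same quads-as-tuples assumption, might look like the sketch below. The class and method names are hypothetical, only named graphs are modelled (the default graph is ignored), and a triple-format GET is left out.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]

class QuadStore:
    """Toy model of store-level GET/PUT/POST/DELETE with quad payloads."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Set[Triple]] = {}

    @staticmethod
    def _group(quads: Iterable[Quad]) -> Dict[str, Set[Triple]]:
        grouped: Dict[str, Set[Triple]] = {}
        for s, p, o, g in quads:
            grouped.setdefault(g, set()).add((s, p, o))
        return grouped

    def get(self) -> List[Quad]:
        # Entire contents of the store as quads, sorted for stable output.
        return sorted(
            (s, p, o, g) for g, ts in self._graphs.items() for s, p, o in ts
        )

    def put(self, quads: Iterable[Quad]) -> None:
        # Replace the entire contents of the store with the payload.
        self._graphs = self._group(quads)

    def post(self, quads: Iterable[Quad]) -> None:
        # Insert the payload, leaving existing data intact.
        for g, ts in self._group(quads).items():
            self._graphs.setdefault(g, set()).update(ts)

    def delete(self) -> None:
        # Equivalent to DROP ALL for the named graphs modelled here.
        self._graphs = {}
```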

Here it may also be useful to have separate URIs for the graph store instance 
and for the data in that instance, to remove any ambiguity about whether a 
DELETE request, for example, should delete the graph store itself or the data 
in the store.

Regards,

John Walker

[1] http://patterns.dataincubator.org/book/graph-per-resource.html
[2] http://www.nxp.com/documents/other/PiLOD2_20140417.pdf
