Re: Comparing Sesame and Clerezza RDF API (was: ApacheCon EU CFP is open)

Andy Seaborne Wed, 08 Aug 2012 07:56:59 -0700

Sebastian Schaffert wrote:
>
Reto wrote:
>>

>> In the Jena API a resource returned in triple-iterator is directly an object
>> similar to a GraphNode in Clerezza, something that is tied to the graph
>> ('model' in Jena) and which can thus provide methods to list its properties
>> and their values. This is not the case for the Sesame API (at least not in
>> org.openrdf.model).

Reto is only referring to the resource API. Storage systems implement the
Graph/Triple/Node API, which is simpler.
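
As a rough illustration of the two levels being contrasted here, the following is a minimal sketch in plain Java. All class and method names (SimpleTriple, SimpleGraph, GraphBoundNode, find, objectsOf) are hypothetical and are not the Jena, Sesame, or Clerezza types; the sketch only assumes that a storage-level graph offers pattern matching and that a resource-level node keeps a reference to its graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal triple; not a class from any of the libraries discussed.
final class SimpleTriple {
    final String subject, predicate, object;

    SimpleTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Storage-level view: a collection of triples with pattern matching.
final class SimpleGraph {
    private final List<SimpleTriple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new SimpleTriple(s, p, o));
    }

    // A null argument acts as a wildcard for that position.
    List<SimpleTriple> find(String s, String p, String o) {
        List<SimpleTriple> out = new ArrayList<>();
        for (SimpleTriple t : triples) {
            if ((s == null || s.equals(t.subject))
                    && (p == null || p.equals(t.predicate))
                    && (o == null || o.equals(t.object))) {
                out.add(t);
            }
        }
        return out;
    }
}

// Resource-level view: a node that remembers its graph and can list its own values.
final class GraphBoundNode {
    private final SimpleGraph graph;
    final String node;

    GraphBoundNode(SimpleGraph graph, String node) {
        this.graph = graph;
        this.node = node;
    }

    List<String> objectsOf(String predicate) {
        List<String> out = new ArrayList<>();
        for (SimpleTriple t : graph.find(node, predicate, null)) {
            out.add(t.object);
        }
        return out;
    }
}
```

In this sketch a storage implementer only has to provide SimpleGraph; GraphBoundNode is a convenience layered on top of it.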

>- Ids for BNode. In ZZ Bnodes are just what they are according to the
>specs: Anonymous resources. They are not java serializable objects so a
>client can only reference a BNode as long as the object is alive. This
>allows implementation to remove obsolete triples/duplicate bnodes when
>nobody holds a reference to that bnode. In Sesame BNodes have an ID and can
>be reconstructed with an ID. This means that an implementation doesn't know
>how long a bnode is referenced. When a duplicate is detected it should
>internally keep all the aliases of the node as it doesn't know for sure
>clients will not reference this bnode by a specific id it was once exposed
>with.

The semantics of BNodes are an issue of open debate and even dispute until 
today. In practice, it is often a disadvantage to not expose an ID, and this is 
why both Sesame and Jena do it, and most serialization formats also do it. 
Actually I had some troubles with Clerezza in Stanbol for exactly this reason. 
The case that does not work easily here is incremental updates of graphs 
between two systems involving blank nodes. In the specification, this case is 
forbidden (blank nodes are always distinct). In practice, it is very useful to 
still be able to do it. And actually, since this is also very common practice
in logic (so-called Skolemization), the RDF specification takes this into
account and explicitly acknowledges it [4]:

"Blank node identifiers are local identifiers that are used in some concrete RDF 
syntaxes or RDF store implementations. They are always locally scoped to the file or RDF 
store, and are not persistent or portable identifiers for blank nodes…."

[4] http://www.w3.org/TR/rdf11-concepts/#dfn-blank-node


Exactly.

And having no name is not the same as having no identity. Inverse
functional properties show that.
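
The distinction between identity and name, and the Skolemization mentioned above, can be sketched in plain Java. This is only an illustration under stated assumptions: AnonNode and Skolemizer are hypothetical names, not part of Clerezza, Sesame, or Jena, and the base IRI is made up. The /.well-known/genid/ path follows the suggestion for Skolem IRIs in RDF 1.1 Concepts.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical blank node with no label: its only identity is the Java object itself.
final class AnonNode {
}

// Gives each distinct blank node object a stable Skolem IRI the first time it is seen.
final class Skolemizer {
    private final String base;
    // Keyed on object identity, so two unlabeled nodes are never confused.
    private final Map<AnonNode, String> names = new IdentityHashMap<>();
    private int counter = 0;

    Skolemizer(String base) {
        this.base = base;
    }

    String iriFor(AnonNode node) {
        String iri = names.get(node);
        if (iri == null) {
            iri = base + (counter++);
            names.put(node, iri);
        }
        return iri;
    }
}
```

With a base such as "http://example.org/.well-known/genid/" (an assumed value), the same AnonNode always maps to the same IRI, which is what an incremental update between two systems would need, while the node itself still carries no label.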

The ZZ approach of having a literal factory
to convert java types to literals is more flexible and can be extended to
new types.


> Which is (strictly speaking) not really foreseen in the RDF
> specification. But I agree that it can be convenient …

Convenient ... sometimes! As the mapping of Java to XSD isn't perfect,
users do (rightly) get confused. Esp. around dates - more a Java issue
but even numbers throw up a steady stream of user questions. (c.f.
JavaScript numbers)
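
A few of the number cases can be made concrete with a small sketch. TypedLiteral and SketchLiteralFactory are hypothetical names, not the Clerezza literal factory, and the choice of datatype per Java class is an assumption made for illustration only.

```java
import java.math.BigDecimal;

// Hypothetical typed literal: a lexical form plus a datatype name.
final class TypedLiteral {
    final String lexical;
    final String datatype;

    TypedLiteral(String lexical, String datatype) {
        this.lexical = lexical;
        this.datatype = datatype;
    }
}

final class SketchLiteralFactory {
    static TypedLiteral create(Object value) {
        if (value instanceof Integer) {
            return new TypedLiteral(value.toString(), "xsd:int");
        }
        if (value instanceof Long) {
            // The same number as a Long ends up with a different datatype.
            return new TypedLiteral(value.toString(), "xsd:long");
        }
        if (value instanceof Double) {
            double d = (Double) value;
            // Java prints "Infinity"; the xsd:double lexical forms are "INF" and "-INF".
            if (Double.isInfinite(d)) {
                return new TypedLiteral(d > 0 ? "INF" : "-INF", "xsd:double");
            }
            return new TypedLiteral(Double.toString(d), "xsd:double");
        }
        if (value instanceof BigDecimal) {
            // toPlainString avoids exponent notation, which xsd:decimal does not allow.
            return new TypedLiteral(((BigDecimal) value).toPlainString(), "xsd:decimal");
        }
        throw new IllegalArgumentException("No mapping for " + value.getClass());
    }
}
```

Even in this tiny sketch, create(1) and create(1L) give literals with different datatypes, and new BigDecimal("1.0").equals(new BigDecimal("1.00")) is false in Java although both denote the same xsd:decimal value.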


- In Sesame Graphs triples are added with one or several contexts. Such a
context is not defined in RDF semantics or in the abstract Syntax. In
Sesame a Graph is a collection of Statements where a Statement is not the
same as a Triple in RDF.


The currently official RDF specification dates back to 1998 with a
minor revision in 2004 [1]. The definition of named graphs is undergoing
specification in the course of the work on RDF 1.1 at the W3C until 2013
[2]. Whether it is technically represented as quadruples (Sesame) or as
in the proposal for the abstract RDF model (as in Clerezza) is merely an
implementation detail, and also still under discussion. The Sesame
approach implements essentially the named graph specification of SPARQL
1.1 [3] (the only one that currently officially exists) and has the
advantage of offering more efficient implementations, and especially of
being very convenient to the user (e.g. give me triples matching a
certain pattern and occurring either in graph1 or in graph2).


[1] http://www.w3.org/TR/REC-rdf-syntax/
[2] http://www.w3.org/TR/rdf11-concepts/#section-dataset
[3] http://www.w3.org/TR/sparql11-query/#rdfDataset
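
The "pattern in graph1 or graph2" query mentioned above can be sketched over a quad representation as follows. Quad and QuadStore are hypothetical names and this is not the Sesame API; it only assumes that each statement carries the name of the graph it was added to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical quad: a triple plus the name of the graph (context) it belongs to.
final class Quad {
    final String subject, predicate, object, graph;

    Quad(String subject, String predicate, String object, String graph) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.graph = graph;
    }
}

final class QuadStore {
    private final List<Quad> quads = new ArrayList<>();

    void add(String s, String p, String o, String graph) {
        quads.add(new Quad(s, p, o, graph));
    }

    // null is a wildcard for s, p, or o; passing no graph names means "any graph".
    List<Quad> find(String s, String p, String o, String... graphs) {
        Set<String> wanted = new HashSet<>(Arrays.asList(graphs));
        List<Quad> out = new ArrayList<>();
        for (Quad q : quads) {
            if (!wanted.isEmpty() && !wanted.contains(q.graph)) {
                continue;
            }
            if ((s == null || s.equals(q.subject))
                    && (p == null || p.equals(q.predicate))
                    && (o == null || o.equals(q.object))) {
                out.add(q);
            }
        }
        return out;
    }
}
```

A call such as store.find(null, "rdf:type", null, "ex:graph1", "ex:graph2") then covers both graphs in one pass, which is the convenience being argued for.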

The SPARQL approach was an attempt to find compromise between the
approaches such as Sesame (and other systems where the 4th field is the
context labelling part of the graph/store), systems such as 3Store where
the 4th field is source tracking, AWWW and what it says about
dereferencing a name, and then keeping things apart that are named
apart, as well as the "named graphs" paper (where naming is the value,
not the container).


It is a compromise to cover different use cases.

API designs are compromises too.

        Andy
