This email covers the changes in RDF 1.1 around plain literals.

If you think you are affected by this, please let us know as soon as possible.

(If you read dev@jena you'll have seen this already; it's being sent to users@jena to reach a wider audience.)

== Summary

In RDF 1.1, all literals have a datatype.

* simple literals (e.g. "foo") have datatype xsd:string.

* literals with a language tag (e.g. "foo"@en)
  have a datatype rdf:langString.

This change may have an impact on databases.

== RDF 1.1

The current situation for RDF (known as RDF-2004) is that "plain literals" are literals which have no datatype. They are either "simple literals" (no datatype, no language tag) or have a language tag. A literal does not have both a language tag and a datatype in RDF-2004.

In RDF 1.1, all literals always have a datatype.

* simple literals have datatype xsd:string.
  simple literals and xsd:strings are the same RDF term.

* literals with a language tag have datatype rdf:langString.

This is a change, but the working group believes it is a small one. Mixed data, with both plain literals and xsd:strings, is assumed to be rare.

The first one, simple literal/xsd:string, is the more significant change.
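To make the two rules concrete, here is a minimal Python sketch of how an RDF 1.1 parser might build literal terms. It is not Jena code; `Literal` and `make_literal` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical RDF 1.1 literal: the datatype is always set."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def make_literal(lexical: str, datatype: Optional[str] = None,
                 lang: Optional[str] = None) -> Literal:
    """Build the RDF 1.1 term for a literal as written in the syntax."""
    if lang is not None:
        # "foo"@en: a language tag implies datatype rdf:langString
        if datatype not in (None, RDF_LANGSTRING):
            raise ValueError("a literal cannot have a language tag and another datatype")
        return Literal(lexical, RDF_LANGSTRING, lang)
    if datatype == RDF_LANGSTRING:
        raise ValueError("rdf:langString requires a language tag")
    # "foo" (simple literal) and "foo"^^xsd:string become the same term
    return Literal(lexical, datatype or XSD_STRING)
```

With this model, `make_literal("foo") == make_literal("foo", XSD_STRING)` is true, while `make_literal("foo", lang="en")` is a different term with datatype rdf:langString.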

== Example

Previously:

---------
:s :p "foo" .
:s :p "foo"^^xsd:string .
---------

was two triples. In RDF 1.1 it is a graph of one triple: a graph is a set of triples, and "foo" and "foo"^^xsd:string are two ways of writing the same literal, much as the following are two ways of writing the same triple:

---------
@prefix : <http://example/> .

:x :p 123 .
<http://example/x> :p 123 .
---------
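A quick way to see the triple count change is to model terms as tuples and a graph as a Python set. This is a toy model, not how Jena stores data; `term_rdf2004`, `term_rdf11` and `build_graph` are made-up helpers.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
S, P = "http://example/s", "http://example/p"


def term_rdf2004(lexical: str, datatype: Optional[str] = None):
    # RDF-2004: a plain literal has no datatype, so it differs from xsd:string
    return (lexical, datatype)


def term_rdf11(lexical: str, datatype: Optional[str] = None):
    # RDF 1.1: a simple literal is an xsd:string literal
    return (lexical, datatype or XSD_STRING)


def build_graph(make_term):
    # A graph is a set of triples, so identical triples collapse into one
    return {
        (S, P, make_term("foo")),              # :s :p "foo" .
        (S, P, make_term("foo", XSD_STRING)),  # :s :p "foo"^^xsd:string .
    }


print(len(build_graph(term_rdf2004)))  # 2
print(len(build_graph(term_rdf11)))    # 1
```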

== Syntax

This change is realized in how syntax is handled on input and output:

On input, a simple literal and an xsd:string literal create the same RDF term, with datatype xsd:string. A language tag causes a literal with datatype rdf:langString, and that language tag, to be created.

On output, the plain literal forms are used: xsd:string and rdf:langString do not appear in the output.

(Aside: rdf:PlainLiteral should never appear in RDF data, but if it does, we could transform it to its canonical value form in the same way.)
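The output rule can be sketched as follows, assuming a hypothetical `write_literal` helper that prints N-Triples-style literals (it is not a Jena API):

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def quote(lexical: str) -> str:
    # Escape backslash first, then double quote and line breaks
    escaped = (lexical.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def write_literal(lexical: str, datatype: str, lang: str = "") -> str:
    """Write an RDF 1.1 literal using the plain literal forms where possible."""
    if datatype == RDF_LANGSTRING:
        # rdf:langString is never written; the language tag is enough
        return quote(lexical) + "@" + lang
    if datatype == XSD_STRING:
        # xsd:string is written as a simple literal
        return quote(lexical)
    return quote(lexical) + "^^<" + datatype + ">"
```

So `write_literal("foo", XSD_STRING)` gives `"foo"` and `write_literal("foo", RDF_LANGSTRING, "en")` gives `"foo"@en`; other datatypes are written explicitly as before.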

== Effects
(due to xsd:string)

Systems that use xsd:string and are sensitive to an explicit datatype are affected. At a guess, this includes OWL systems, maybe Protégé (but I have no evidence one way or the other). They seem to have xsd:strings in the data and, until converted, may see data without an explicit xsd:string and get confused.

The number of triples changes IF the same subject/predicate is used with the same string written both as a simple literal and as an xsd:string.

Generally, I see data that either uses xsd:string or uses simple literals. Mixing seems quite rare.

== Jena
(xsd:string)

Jena's in-memory graphs already equate simple literals and xsd:strings for searching (i.e. Graph.find), so while the number of results can change, it should not be a case of data not being found.

The worst case is producing data for other systems that are not RDF 1.1 aware and do expect an explicit xsd:string datatype on literals.

== RDF API users
(rdf:langString)

The key is "test language before datatype": if the tests are done in that order, the appearance of rdf:langString will not matter. If the test is "datatype first, null meaning plain literal", it will matter.

I doubt much code outside Jena does this sort of thing. It's something the writers (output code) do, so those need checking thoroughly, but it's just a case of finding all the calls to getLiteralLanguage().

This is the most significant rdf:langString related change as far as I can see.
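To illustrate why the order of the tests matters, here is a small model with hypothetical names (`Lit`, `kind_language_first`, `kind_datatype_first`); in this sketch an empty string stands for "no language tag".

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class Lit(NamedTuple):
    lexical: str
    datatype: Optional[str]  # None only in RDF-2004 style data
    lang: str = ""           # "" means no language tag


def kind_language_first(lit: Lit) -> str:
    # Robust order: language tag first, then datatype
    if lit.lang:
        return "lang:" + lit.lang
    if lit.datatype in (None, XSD_STRING):
        return "string"
    return "typed:" + lit.datatype


def kind_datatype_first(lit: Lit) -> str:
    # Fragile order: treats "datatype is None" as the only sign of a plain literal
    if lit.datatype is not None:
        return "typed:" + lit.datatype
    return "lang:" + lit.lang if lit.lang else "string"


old = Lit("chat", None, "fr")            # RDF-2004: no datatype
new = Lit("chat", RDF_LANGSTRING, "fr")  # RDF 1.1: rdf:langString
```

The language-first version classifies `old` and `new` the same way; the datatype-first version reports `new` as a typed literal with datatype rdf:langString.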

== SPARQL
(xsd:string)

SPARQL already has some adaptation:
   datatype("x") = xsd:string           (SPARQL 1.0)
   datatype("x"@en) = rdf:langString    (SPARQL 1.1)

Due to the xsd:string change, basic graph pattern matching may produce results that it did not before:

{ ?x :p "foo"^^xsd:string }  will match data  :x :p "foo"
{ ?x :p "foo" }              will match data  :x :p "foo"^^xsd:string

It also makes FILTER(?x = "foo") easier to optimize, since "foo" and "foo"^^xsd:string are now the same term.

== Databases
(xsd:string)

Anything that relies on a hash of the literal, in a system that stores xsd:string literals, will need to reload its data. If the current system keeps simple literals and xsd:strings apart by hashing them differently, this change is significant.

This does affect TDB and SDB.
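The following sketch shows why a reload is needed when node identifiers are derived from hashes. The hashing scheme (SHA-256 over lexical form and datatype) and the helper names are assumptions for illustration, not what TDB or SDB actually do.

```python
import hashlib
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def node_key(lexical: str, datatype: Optional[str]) -> str:
    # Hypothetical node-table key: a hash over lexical form and datatype ("" = none)
    data = lexical + "\x00" + (datatype or "")
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def to_rdf11(lexical: str, datatype: Optional[str]):
    # RDF 1.1 parsing: a simple literal gets datatype xsd:string
    return lexical, datatype or XSD_STRING


# Keys written by an RDF-2004 style store for two distinct terms
stored_plain = node_key("foo", None)
stored_typed = node_key("foo", XSD_STRING)

# After the change, "foo" is parsed as "foo"^^xsd:string before hashing
lookup_plain = node_key(*to_rdf11("foo", None))
```

Here `lookup_plain` equals `stored_typed` but not `stored_plain`, so data stored under the old simple-literal key is no longer reachable until it is reloaded.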

== Compatibility

We could provide some compatibility options:

1/ The ability to write data with explicit xsd:string
2/ Hide rdf:langString from Node.getLiteralDatatype()
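Roughly, the two options could behave like this. Both functions are hypothetical and only model the behaviour; they are not existing Jena calls.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def write_literal_explicit(lexical: str, datatype: str, lang: str = "") -> str:
    # 1/ Always write the datatype, including xsd:string, for non-RDF-1.1 consumers.
    # Only backslash and double quote are escaped in this sketch.
    quoted = '"' + lexical.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if datatype == RDF_LANGSTRING:
        return quoted + "@" + lang
    return quoted + "^^<" + datatype + ">"


def legacy_datatype(datatype: str) -> Optional[str]:
    # 2/ Report no datatype for language-tagged literals, as RDF-2004 code expects
    return None if datatype == RDF_LANGSTRING else datatype
```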

What does not work is recording whether an RDF term was originally written as xsd:string or as a simple literal. That could end up with two different Nodes that represent the same RDF term, or with non-determinism depending on which form is seen first.
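The non-determinism can be shown with a hypothetical store (the `load` function is made up for this sketch) that keeps one entry per RDF 1.1 term but also remembers the form that term was first written in:

```python
from typing import Dict, List, Optional, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

Term = Tuple[str, str]


def load(literals: List[Tuple[str, Optional[str]]]) -> Dict[Term, str]:
    # One entry per RDF 1.1 term; the value records the first-seen written form
    seen: Dict[Term, str] = {}
    for lexical, datatype in literals:
        term = (lexical, datatype or XSD_STRING)
        seen.setdefault(term, "simple" if datatype is None else "xsd:string")
    return seen


a = load([("foo", None), ("foo", XSD_STRING)])
b = load([("foo", XSD_STRING), ("foo", None)])
```

Both loads hold the same single term, but `a` records it as a simple literal and `b` as an xsd:string, purely because of input order.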

    Andy
