RDF 1.1 -- changes to plain literals -- impact assessment

Andy Seaborne Wed, 06 Nov 2013 03:12:20 -0800

This email covers the changes in RDF 1.1 around plain literals.

If you think you are affected by this, please let us know as soon as possible.

(If you read dev@jena you'll have seen this already; it's being sent to users@jena to reach a wider audience.)


== Summary

In RDF 1.1, all literals have a datatype.

* simple literals (e.g. "foo") have datatype xsd:string.

* literals with a language tag (e.g. "foo"@en)
  have a datatype rdf:langString.

This change may have an impact on databases.

== RDF 1.1

The current situation for RDF (known as RDF-2004) is that "plain literals" are literals which have no datatype. They are either "simple literals" (no datatype, no language tag) or have a language tag. A literal does not have both a language tag and a datatype in RDF-2004.


In RDF 1.1, all literals always have a datatype.

* simple literals have datatype xsd:string.
  simple literals and xsd:strings are the same RDF term.

* literals with a language tag have datatype rdf:langString.

This is a change, but the working group believes it is a small one. Mixed data, with both plain literals and xsd:string, is assumed to be rare.


The first one, simple literal/xsd:string, is the more significant change.
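As a rough illustration, the two RDF 1.1 rules can be modeled in a few lines of Python. This is a minimal sketch, not Jena's API: the Literal class and make_literal helper are hypothetical names, and language-tag case normalization is ignored.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical RDF 1.1 literal: the datatype is never missing."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def make_literal(lexical: str, datatype: Optional[str] = None,
                 lang: Optional[str] = None) -> Literal:
    """Apply the RDF 1.1 datatype rules when building a literal."""
    if lang is not None:
        # A language tag means datatype rdf:langString, and nothing else.
        if datatype not in (None, RDF_LANGSTRING):
            raise ValueError("language tag and datatype cannot be combined")
        return Literal(lexical, RDF_LANGSTRING, lang)
    if datatype == RDF_LANGSTRING:
        raise ValueError("rdf:langString needs a language tag")
    # A simple literal gets datatype xsd:string.
    return Literal(lexical, datatype if datatype is not None else XSD_STRING)


# "foo" and "foo"^^xsd:string are the same term; "foo"@en is a different one.
print(make_literal("foo") == make_literal("foo", XSD_STRING))    # True
print(make_literal("foo", lang="en").datatype == RDF_LANGSTRING)  # True
```

Because the simple literal and the explicit xsd:string build equal (and equally hashed) values, anything that compares or hashes these terms treats them as one.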

== Example

Previously:

:s :p "foo" .
:s :p "foo"^^xsd:string .

was 2 triples. In RDF 1.1 this is a graph of one triple, because a graph is a set of triples and "foo" and "foo"^^xsd:string are different ways of writing the same term, much as the following shows two ways to write the same triple:


---------
@prefix : <http://example/> .

:x :p 123 .
<http://example/x> :p 123 .
---------
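The set behaviour can be sketched in Python by representing each literal as a (lexical form, datatype) pair after applying the RDF 1.1 rule. The literal_term helper and the tuple encoding are assumptions made for illustration, and language tags are left out.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
EX = "http://example/"


def literal_term(lexical, datatype=None):
    """Hypothetical encoding: an RDF 1.1 literal as (lexical form, datatype IRI).
    A simple literal is given datatype xsd:string."""
    return (lexical, datatype if datatype is not None else XSD_STRING)


# A graph is a set of triples, so two spellings of one triple collapse.
graph = {
    (EX + "s", EX + "p", literal_term("foo")),              # :s :p "foo" .
    (EX + "s", EX + "p", literal_term("foo", XSD_STRING)),  # :s :p "foo"^^xsd:string .
}
print(len(graph))  # 1
```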

== Syntax

This change happens because of the treatment of syntax, input and output:

On input, a simple literal and an xsd:string literal create the same RDF term, with datatype xsd:string. A language tag causes a literal with datatype rdf:langString, and that language tag, to be created.

On output, the plain literal forms are used. xsd:string and rdf:langString do not appear in the output.

(Aside: rdf:PlainLiteral should never appear in RDF data, but we could do the same transforms to the canonical value form.)
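A writer following the output rule might look like the sketch below. It is hypothetical Python producing N-Triples-style literal syntax, not Jena's writer; escaping is limited to backslash, double quote and line breaks.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def quote(lexical: str) -> str:
    """Quote a lexical form with minimal N-Triples-style escaping."""
    escaped = (lexical.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def write_literal(lexical: str, datatype: str, lang: Optional[str] = None) -> str:
    """Write a literal using the plain forms wherever RDF 1.1 allows it."""
    if datatype == RDF_LANGSTRING:
        return quote(lexical) + "@" + lang  # rdf:langString is never written
    if datatype == XSD_STRING:
        return quote(lexical)               # nor is xsd:string
    return quote(lexical) + "^^<" + datatype + ">"


print(write_literal("foo", XSD_STRING))            # "foo"
print(write_literal("foo", RDF_LANGSTRING, "en"))  # "foo"@en
```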


== Effects
(due to xsd:string)

Systems that use xsd:string and are sensitive to an explicit type are affected. At a guess, OWL systems, maybe Protégé (but I have no evidence one way or the other: they seem to have xsd:strings in the data and, until converted, may see data without an explicit xsd:string and get confused).

The number of triples changes IF the same subject/predicate is used with both a simple literal and an xsd:string of the same lexical form.

Generally, I see data that either uses xsd:string or uses simple literals. Mixing seems quite rare.
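To see when the count actually changes, the sketch below counts the same hypothetical data under both views of term identity. The sample triples and helper names are made up for illustration; only the subject that mixes both spellings of "foo" loses a triple.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical data: literals as written, (lexical form, explicit datatype or None).
data = [
    ("s1", "p", ("foo", None)),        # "foo"
    ("s1", "p", ("foo", XSD_STRING)),  # "foo"^^xsd:string, mixed with the line above
    ("s2", "p", ("bar", XSD_STRING)),  # xsd:string only
    ("s3", "p", ("baz", None)),        # simple literal only
]


def rdf2004_term(lit):
    # RDF-2004: a simple literal and an xsd:string literal are distinct terms.
    return lit


def rdf11_term(lit):
    # RDF 1.1: a simple literal is an xsd:string literal.
    lexical, datatype = lit
    return (lexical, datatype if datatype is not None else XSD_STRING)


def count_triples(triples, term):
    # A graph is a set, so identical triples are counted once.
    return len({(s, p, term(o)) for s, p, o in triples})


print(count_triples(data, rdf2004_term))  # 4
print(count_triples(data, rdf11_term))    # 3
```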


== Jena
(xsd:string)

Jena in-memory already equates simple literals and xsd:strings for searching (i.e. Graph.find), so while the number of results can change, it should not be a case of not finding data.

The worst case is producing data for other systems that are not RDF 1.1 and do expect an explicit xsd:string datatype on literals.


== RDF API users
(rdf:langString)

The key is "test language before datatype": if the test is done that way round, the appearance of rdf:langString will not matter. If the test is "datatype first, null meaning plain literal", it will matter.

I doubt much code outside Jena does this sort of thing. Writers do it, so they need checking completely, but it's just a case of finding all the calls of getLiteralLanguage().

This is the most significant rdf:langString-related change as far as I can see.
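The two styles of test can be contrasted with a small Python model of a literal as an RDF 1.1 system sees it (datatype always set). The Lit class and both writer functions are hypothetical, and escaping is omitted for brevity; the point is only the order of the checks.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class Lit:
    """Hypothetical RDF 1.1 literal: datatype is always set."""

    def __init__(self, lexical: str, datatype: str, lang: Optional[str] = None):
        self.lexical = lexical
        self.datatype = datatype
        self.lang = lang


def write_language_first(lit: Lit) -> str:
    # Robust: a language tag wins, whatever the datatype says.
    if lit.lang:
        return f'"{lit.lexical}"@{lit.lang}'
    if lit.datatype == XSD_STRING:
        return f'"{lit.lexical}"'
    return f'"{lit.lexical}"^^<{lit.datatype}>'


def write_datatype_first(lit: Lit) -> str:
    # RDF-2004 habit: "no datatype means plain literal".
    if lit.datatype is None:
        return f'"{lit.lexical}"@{lit.lang}' if lit.lang else f'"{lit.lexical}"'
    # In RDF 1.1 "foo"@en reaches here and loses its language tag.
    return f'"{lit.lexical}"^^<{lit.datatype}>'


en = Lit("foo", RDF_LANGSTRING, "en")
print(write_language_first(en))  # "foo"@en
print(write_datatype_first(en))  # "foo"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#langString>
```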


== SPARQL
(xsd:string)

SPARQL already has some adaptation:
   datatype("x") = xsd:string           (SPARQL 1.0)
   datatype("x"@en) = rdf:langString    (SPARQL 1.1)

Due to the xsd:string change, matching basic graph patterns may produce results it did not before:


{ ?x :p "foo"^^xsd:string }  will match data  :x :p "foo"
{ ?x :p "foo" }              will match data  :x :p "foo"^^xsd:string

The change also makes FILTER(?x = "foo") easier to optimize, since "foo" and "foo"^^xsd:string are now the same term.
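Basic graph pattern matching then needs no special case: once both spellings produce the same term, ordinary term equality matches either way round. This is a toy single-pattern matcher in Python, not ARQ; the term encoding and the match function are assumptions, and prefixed names stand in for IRIs.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def term(lexical, datatype=None):
    """Hypothetical RDF 1.1 literal term: a simple literal gets xsd:string."""
    return (lexical, datatype if datatype is not None else XSD_STRING)


def match(graph, pattern):
    """Match one triple pattern; strings starting with '?' are variables.
    Returns one dict of bindings per matching triple."""
    results = []
    for triple in graph:
        binding = {}
        for pat, val in zip(pattern, triple):
            if isinstance(pat, str) and pat.startswith("?"):
                if binding.setdefault(pat, val) != val:
                    break  # same variable bound to two different values
            elif pat != val:
                break      # constant does not match
        else:
            results.append(binding)
    return results


plain = {(":x", ":p", term("foo"))}              # data  :x :p "foo"
typed = {(":x", ":p", term("foo", XSD_STRING))}  # data  :x :p "foo"^^xsd:string
print(match(plain, ("?x", ":p", term("foo", XSD_STRING))))  # [{'?x': ':x'}]
print(match(typed, ("?x", ":p", term("foo"))))              # [{'?x': ':x'}]
```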

== Databases
(xsd:string)

Anything that relies on a hash of the literal, in a system that uses xsd:string, will need to reload. Currently, if keeping simple literals and xsd:strings apart includes hashing them differently, then this change is significant.


This does affect TDB and SDB.
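Why a reload is needed can be sketched with a hypothetical node table keyed by a hash of the literal as written. This does not describe how TDB or SDB actually encode nodes; the key functions and the SHA-256 choice are assumptions for illustration.

```python
import hashlib

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def key_rdf2004(lexical, datatype=None):
    # Old scheme: hash the literal as written, so "foo" and "foo"^^xsd:string differ.
    return hashlib.sha256(repr((lexical, datatype)).encode("utf-8")).hexdigest()


def key_rdf11(lexical, datatype=None):
    # New scheme: give a simple literal datatype xsd:string before hashing.
    dt = datatype if datatype is not None else XSD_STRING
    return hashlib.sha256(repr((lexical, dt)).encode("utf-8")).hexdigest()


# Hypothetical node table built with the old scheme: two node ids for "foo".
old_table = {key_rdf2004("foo"): 1, key_rdf2004("foo", XSD_STRING): 2}

print(len(old_table))                          # 2 ids for what RDF 1.1 calls one term
print(key_rdf11("foo") == key_rdf2004("foo"))  # False: the stored key for "foo" is stale
```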

== Compatibility

We could provide some compatibility options:

1/ The ability to write data with explicit xsd:string
2/ Hide rdf:langString from Node.getLiteralDatatype()
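Both options are small in themselves. Below is a hypothetical Python sketch of each: a writer flag for option 1, and an accessor that maps rdf:langString back to None for option 2. Neither function is an existing Jena method, and escaping is omitted.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def write_literal(lexical: str, datatype: str, lang: Optional[str] = None,
                  explicit_xsd_string: bool = False) -> str:
    """Option 1 (hypothetical): optionally keep ^^xsd:string for older consumers."""
    if datatype == RDF_LANGSTRING:
        return f'"{lexical}"@{lang}'
    if datatype == XSD_STRING and not explicit_xsd_string:
        return f'"{lexical}"'
    return f'"{lexical}"^^<{datatype}>'


def legacy_datatype(datatype: str) -> Optional[str]:
    """Option 2 (hypothetical): report no datatype for language-tagged literals.
    Simple literals still report xsd:string."""
    return None if datatype == RDF_LANGSTRING else datatype


print(write_literal("foo", XSD_STRING, explicit_xsd_string=True))
# "foo"^^<http://www.w3.org/2001/XMLSchema#string>
print(legacy_datatype(RDF_LANGSTRING))  # None
```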

What does not work is recording whether an RDF term was originally written as xsd:string or as a simple literal. That could end up with two different Nodes that represent the same RDF term, or with non-determinism depending on which form is seen first.
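The non-determinism is easy to reproduce with a hypothetical node cache that remembers the first spelling it sees for each term. In this Python sketch the same two inputs, read in different orders, leave different spellings behind.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Interner:
    """Hypothetical node cache that also records how a string was first written."""

    def __init__(self):
        self._seen = {}

    def intern(self, lexical, written_as_xsd_string):
        key = (lexical, XSD_STRING)  # RDF 1.1: one term for both spellings
        # setdefault keeps whichever spelling arrived first
        return self._seen.setdefault(key, (lexical, written_as_xsd_string))


a, b = Interner(), Interner()
a.intern("foo", False)  # simple literal seen first
b.intern("foo", True)   # explicit xsd:string seen first
print(a.intern("foo", True))   # ('foo', False)
print(b.intern("foo", False))  # ('foo', True)
```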


    Andy
