Hi Jeni,
Jeni Tennison wrote:
As part of the linked data work the UK government is doing, we're
looking at how to use the linked data that we have as the basis of APIs
that are readily usable by developers who really don't want to learn
about RDF or SPARQL.
Wow! Talk about timing. We are looking at exactly the same issue as part
of the TSB work and were starting to look at JSON formats just this last
couple of days. We should combine forces.
One thing that we want to do is provide JSON representations of both RDF
graphs and SPARQL results. I wanted to run some ideas past this group as
to how we might do that.
I agree we want both graphs and SPARQL results but I think there is a
third case - lists of described objects.
This seems to have been a common pattern in the apps that I've worked
on. You want to find all objects (resources in RDF speak) that match
some criteria, with some ordering, and get back a list of them and their
associated properties. This is like a SPARQL DESCRIBE operating on each
of an ordered list of resources found by a SPARQL SELECT.
The point is that this is not a graph because the top level list needs
to be ordered. It is not a SPARQL result set because you want the
descriptions to include any of the properties that are present in the
data (potentially included bNode closure) without having to know all
those and spell them out in the query. But it is a natural thing to want
to return from a REST API.
To put this in context, what I think we should aim for is a pure
publishing format that is optimised for approachability for normal
developers, *not* an interchange format. RDF/JSON [1] and the SPARQL
results JSON format [2] aren't entirely satisfactory as far as I'm
concerned because of the way the objects of statements are represented
as JSON objects rather than as simple values. I still think we should
produce them (to wean people on to, and for those using more generic
tools), but I'd like to think about producing something that is a bit
more immediately approachable too.
RDFj [3] is closer to what I think is needed here. However, I don't
think there's a need for setting 'context' given I'm not aiming for an
interchange format, there are no clear rules about how to generate it
from an arbitrary graph (basically there can't be without some
additional configuration) and it's not clear how to deal with datatypes
or languages.
WRT 'context', you might not need it but I don't think it is harmful.
I think if we said to developers that there is some outer wrapper like:
{
"format" : "RDF-JSON",
"version" : "0.1",
"mapping" : ... magic stuff ...
"data" : ... the bit you care about ...
}
The developers would be quite happy doing that one dereference and
ignoring the mapping stuff, but it might allow inversion back to RDF for
those few who do care, or come to care.
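For what it's worth, consuming such a wrapper is a one-liner. A minimal
sketch in Python (the "format"/"version"/"mapping"/"data" keys are just
the illustrative wrapper above, not an agreed spec):

```python
import json

# Parse a response using the illustrative wrapper above; the key names
# ("format", "version", "mapping", "data") are assumptions, not a spec.
doc = json.loads("""
{
  "format" : "RDF-JSON",
  "version" : "0.1",
  "mapping" : { "name" : "http://example.org/terms#fullName" },
  "data" : { "name" : "Dave Beckett" }
}
""")

data = doc["data"]                 # the bit most developers care about
mapping = doc.get("mapping", {})   # kept for anyone who wants to invert to RDF
```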
I suppose my first question is whether there are any other JSON-based
formats that we should be aware of, that we could use or borrow ideas from?
The one that most intrigued me as a possible starting point was the
Simile Exhibit JSON format [1]. It is developer friendly in much the way
that you talk about but it has the advantage of zero configuration, some
measure of invertibility, has an online translator [2] and is supported
by the RPI Sparql proxy [3].
I've some reservations about standardizing on it as is:
- lack of documentation of the mapping
- some inconsistencies in how references between resources are encoded
(at least judging by the output of Babel[2] on test cases)
- handling of bNodes - I'd rather single referenced bNodes were
serialized as nested structures
[There was another format we used in a project in my previous existence
but I'm not sure if that was made public anywhere, will check.]
Assuming there aren't, I wanted to discuss what generic rules we might
use, where configuration is necessary and how the configuration might be
done.
One starting assumption to call out: I'd like to aim for a zero
configuration option and that explicit configuration is only used to
help tidy things up but isn't required to get started.
# RDF Graphs #
Let's take as an example:
<http://www.w3.org/TR/rdf-syntax-grammar>
dc:title "RDF/XML Syntax Specification (Revised)" ;
ex:editor [
ex:fullName "Dave Beckett" ;
ex:homePage <http://purl.org/net/dajobe/> ;
] .
In JSON, I think we'd like to create something like:
{
"$": "http://www.w3.org/TR/rdf-syntax-grammar",
"title": "RDF/XML Syntax Specification (Revised)",
"editor": {
"name": "Dave Beckett",
"homepage": "http://purl.org/net/dajobe/"
}
}
+1 on style
In terms of details I was thinking of following the Simile convention on
short form naming: in the absence of clashes, use the rdfs:label,
falling back to the local name, as the basis for the shortened property
names. So knowing nothing else the bNode would be:
...
"editor": {
"fullName": "Dave Beckett",
"homePage": "http://purl.org/net/dajobe/"
}
In the event of clashes, fall back on prefix-based disambiguation.
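To make that concrete, here's a rough Python sketch of the convention
(the helper names are mine, and the clash handling is deliberately
simplistic - just enough to show the label / local-name / prefix
fallback order):

```python
# Hypothetical sketch of the naming convention described above (not the
# Simile implementation): use rdfs:label where available, fall back to
# the URI local name, and prefix-disambiguate only on a clash.
def local_name(uri):
    # Local name = text after the last '#' or '/'.
    for sep in ('#', '/'):
        if sep in uri:
            return uri.rsplit(sep, 1)[1]
    return uri

def short_names(properties, labels=None, prefixes=None):
    """properties: iterable of property URIs.
    labels: optional {uri: rdfs:label}; prefixes: {namespace: prefix}."""
    labels = labels or {}
    prefixes = prefixes or {}
    chosen = {}
    for uri in properties:
        name = labels.get(uri, local_name(uri))
        if name in chosen.values():
            # Clash: fall back to prefix-based disambiguation.
            ns = uri[: len(uri) - len(local_name(uri))]
            name = prefixes.get(ns, 'ns') + '_' + local_name(uri)
        chosen[uri] = name
    return chosen
```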
Note that the "$" is taken from RDFj. I'm not convinced it's a good idea
to use this symbol, rather than simply a property called "about" or
"this" -- any opinions?
I'd prefer "id" (though "about" is OK), "$" is too heavily overused in
javascript libraries.
Also note that I've made no distinction in the above between a URI and a
literal, while RDFj uses <>s around literals. My feeling is that normal
developers really don't care about the distinction between a URI literal
and a pointer to a resource, and that they will base the treatment of
the value of a property on the (name of the) property itself.
Probably right.
Actually, in your example isn't that value a resource anyway? To make it
a literal you'd have to have:
ex:homePage "http://purl.org/net/dajobe/"^^xsd:anyURI
So, the first piece of configuration that I think we need here is to map
properties on to short names that make good JSON identifiers (i.e. name
tokens without hyphens). Given that properties normally have
lowercaseCamelCase local names, it should be possible to use that as a
default. If you need something more readable, though, it seems like it
should be possible to use a property of the property, such as:
ex:fullName api:jsonName "name" .
ex:homePage api:jsonName "homepage" .
Suggest the Simile approach, with api:jsonName (or your API mapping) as
an optional extra for resolving problems rather than a requirement.
However, in any particular graph, there may be properties that have been
given the same JSON name (or, even more probably, local name). We could
provide multiple alternative names that could be chosen between, but any
mapping to JSON is going to need to give consistent results across a
given dataset for people to rely on it as an API, and that means the
mapping can't be based on what's present in the data. We could do
something with prefixes, but I have a strong aversion to assuming global
prefixes.
So I think this means that we need to provide configuration at an API
level rather than at a global level: something that can be used
consistently across a particular API to determine the token that's used
for a given property. For example:
<> a api:JSON ;
api:mapping [
api:property ex:fullName ;
api:name "name" ;
] , [
api:property ex:homePage ;
api:name "homepage" ;
] .
Are you thinking of this as something the publisher provides or the API
caller provides?
If the former, then OK, but as I say I think a zero-config set of
default conventions is fine, with the API mapping there to allow fine
tuning.
There are four more areas where I think there's configuration we need to
think about:
* multi-valued properties
* typed and language-specific values
* nesting objects
* suppressing properties
## Multi-valued Properties ##
First one first. It seems obvious that if you have a property with
multiple values, it should turn into a JSON array structure. For example:
[] foaf:name "Anna Wilder" ;
foaf:nick "wilding", "wilda" ;
foaf:homepage <http://example.org/about> .
should become something like:
{
"name": "Anna Wilder",
"nick": [ "wilding", "wilda" ],
"homepage": "http://example.org/about"
}
+1
The trouble is that if you determine whether something is an array or
not based on the data that is actually available, you'll get situations
where the value of a particular JSON property is sometimes an array and
sometimes a string; that's bad for predictability for the people using
the API. (RDF/JSON solves this by every value being an array, but that's
counter-intuitive for normal developers.)
So I think a second API-level configuration that needs to be made is to
indicate which properties should be arrays and which not:
<> a api:API ;
api:mapping [
api:property foaf:nick ;
api:name "nick" ;
api:array true ;
] .
So if this is not specified in the mapping then you get the
unpredictable behaviour, but by providing a mapping spec you can force
arrays on single values (though not force singletons on multi-values).
Is that right? If so, OK.
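The rule as I understand it can be sketched like this (Python, with an
illustrative `array_properties` set standing in for the api:array
mapping):

```python
# Sketch of the api:array rule discussed above (the shape of the config
# is illustrative, not a fixed spec): properties flagged as arrays
# always serialize as JSON arrays, even for a single value; unflagged
# properties stay single-valued when there is only one value.
def encode_values(name, values, array_properties):
    if name in array_properties or len(values) > 1:
        return list(values)
    return values[0]
```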
There is a related issue: how to represent RDF lists. There are times
you want ordered property values. At the RDF end the good way to do that
is to use lists (sorry "collections"). I'd argue that a natural
representation of:
<http://example.com/ourpaper>
ex:authors (
<http://example.com/people#Jeni>
<http://example.com/people#Dave>
) .
is
{
"id" : "http://example.com/ourpaper",
"authors" : [
"http://example.com/people#Jeni",
"http://example.com/people#Dave"
]
}
The problem is that this looks just the same as the multi-valued case.
We could:
(1) decide not to care, the mapping can't be inverted
(2) keep this mapping but include context information in the outer
wrapper that allows the inversion (in uniform cases)
(3) have a separate list notation:
{
"id" : "http://example.com/ourpaper",
"authors" : { "type" : "list", "value" : [
"http://example.com/people#Jeni",
"http://example.com/people#Dave"
] }
}
My preference is (2) because I think lists are really useful and should
be as simple as possible in the JSON translation but think (3) is
technically cleaner.
## Typed Values and Languages ##
Typed values and values with languages are really the same problem.
Not sure I agree with this, see later.
If
we have something like:
<http://statistics.data.gov.uk/id/local-authority-district/00PB>
skos:prefLabel "The County Borough of Bridgend"@en ;
skos:prefLabel "Pen-y-bont ar Ogwr"@cy ;
skos:notation "00PB"^^geo:StandardCode ;
skos:notation "6405"^^transport:LocalAuthorityCode .
then we'd really want the JSON to look something like:
{
"$": "http://statistics.data.gov.uk/id/local-authority-district/00PB",
"name": "The County Borough of Bridgend",
"welshName": "Pen-y-bont ar Ogwr",
"onsCode": "00PB",
"dftCode": "6405"
}
I think that for this to work, the configuration needs to be able to
filter values based on language or datatype to determine the JSON
property name. Something like:
<> a api:JSON ;
api:mapping [
api:property skos:prefLabel ;
api:lang "en" ;
api:name "name" ;
] , [
api:property skos:prefLabel ;
api:lang "cy" ;
api:name "welshName" ;
] , [
api:property skos:notation ;
api:datatype geo:StandardCode ;
api:name "onsCode" ;
] , [
api:property skos:notation ;
api:datatype transport:LocalAuthorityCode ;
api:name "dftCode" ;
] .
Neat but ...
Language codes are effectively open-ended. I can't necessarily predict
what lang codes are going to be in my data and provide a property
mapping for every single one.
Plus when working with language-tagged data you often have code to do a
"best match" (not simple lookup) between the user's language preferences
and the available lang tags. That looks hard if each is in a different
property and the lang tags themselves are hidden in the API configuration.
I think we may need the long-winded encoding available:
{
"id" : "http://statistics.data.gov.uk/id/local-authority-district/00PB",
"prefLabel" : [
"The County Borough of Bridgend",
{ "value" : "The County Borough of Bridgend", "lang" : "en" },
{ "value" : "Pen-y-bont ar Ogwr", "lang : "cy" }
]
...
Then it would be up to the publisher whether to provide the simpler
properties as well, or instead. But those could be regarded as
transformations of the RDF for convenience (much like choosing to
include RDFS closure info).
Turning to data types ...
Your onsCode examples are a particular pattern for how to use datatypes
which are indeed a similar case to lang tags. But how are you thinking
of handling the common cases like the XSD types?
I'm assuming that the number formats would all become JSON numbers
rather than strings, right? That loses the distinction between, say,
xsd:decimal and xsd:float, but JavaScript doesn't care about that, and
if we are not doing an interchange format that's OK.
For things like xsd:dateTime there seem to be a couple of options. The
Simile-style option would be to have them as strings but define the
range of the property in some associated context/properties table.
The other would be to use a structured representation:
{
"id" : "http://example.com/ourpaper",
"date" : { "type" : date, "value" : "20091312"}
...
I'm guessing you would just have them as strings and let the consumer
figure out when they want to treat them as dates, is that right?
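If they do come through as plain strings, opting in to date handling is
cheap on the consumer side. For example in Python (assuming the values
are well-formed xsd:dateTime, which is ISO 8601; the sample value here
is mine):

```python
from datetime import datetime

# xsd:dateTime lexical forms are ISO 8601, so a consumer who wants real
# date objects can parse on demand; everyone else keeps the string.
when = datetime.fromisoformat("2009-12-13T00:00:00+00:00")
```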
## Nesting Objects ##
Regarding nested objects, I'm again inclined to view this as a
configuration option rather than something that is based on the
available data. For example, if we have:
<http://example.org/about>
dc:title "Anna's Homepage"@en ;
foaf:maker <http://example.org/anna> .
<http://example.org/anna>
foaf:name "Anna Wilder" ;
foaf:homepage <http://example.org/about> .
this could be expressed in JSON as either:
{
"$": "http://example.org/about",
"title": "Anna's Homepage",
"maker": {
"$": "http://example.org/anna",
"name": "Anna Wilder",
"homepage": "http://example.org/about"
}
}
or:
{
"$": "http://example.org/anna",
"name": "Anna Wilder",
"homepage": {
"$": "http://example.org/about",
"title": "Anna's Homepage",
"maker": "http://example.org/anna"
}
}
Or:
[
{
"id": "http://example.org/about",
"title": "Anna's Homepage",
"maker": "http://example.org/anna"
},
{
"id": "http://example.org/anna",
"name": "Anna Wilder",
"homepage": "http://example.org/about"
}
]
The one that's required could be indicated through the configuration,
for example:
<> a api:API ;
api:mapping [
api:property foaf:maker ;
api:name "maker" ;
api:embed true ;
] .
My zero-configuration default would be to nest single-referenced bNodes
and have everything else as top level resources with cross-references,
as above.
The final thought that I had for representing RDF graphs as JSON was
about suppressing properties. Basically I'm thinking that this
configuration should work on any graph, most likely one generated from a
DESCRIBE query. That being the case, it's likely that there will be
properties that repeat information (because, for example, they are a
super-property of another property). It will make a cleaner JSON API if
those repeated properties aren't included. So something like:
<> a api:API ;
api:mapping [
api:property admingeo:contains ;
api:ignore true ;
] .
Seems reasonable but seems a separate issue from the JSON encoding.
# SPARQL Results #
I'm inclined to think that creating JSON representations of SPARQL
results that are acceptable to normal developers is less important than
creating JSON representations of RDF graphs, for two reasons:
1. SPARQL naturally gives short, usable, names to the properties in
JSON objects
2. You have to be using SPARQL to create them anyway, and if you're
doing that then you can probably grok the extra complexity of having
values that are objects
+1
Nevertheless, there are two things that could be done to simplify the
SPARQL results format for normal developers.
One would be to just return an array of the results, rather than an
object that contains a results property that contains an object with a
bindings property that contains an array of the results. People who want
metadata can always request the standard SPARQL results JSON format.
This seems quite minor; it's very easy to do the deref.
The second would be to always return simple values rather than objects.
For example, rather than:
{
"head": {
"vars": [ "book", "title" ]
},
"results": {
"bindings": [
{
"book": {
"type": "uri",
"value": "http://example.org/book/book6"
},
"title": {
"type": "literal",
"value", "Harry Potter and the Half-Blood Prince"
}
},
{
"book": {
"type": "uri",
"value": "http://example.org/book/book5"
},
"title": {
"type": "literal",
"value": "Harry Potter and the Order of the Phoenix"
}
},
...
]
}
}
a normal developer would want to just get:
[{
"book": "http://example.org/book/book6",
"title": "Harry Potter and the Half-Blood Prince"
},{
"book": "http://example.org/book/book5",
"title": "Harry Potter and the Order of the Phoenix"
},
...
]
I don't think we can do any configuration here. It means that
information about datatypes and languages isn't visible, but (a) I'm
pretty sure that 80% of the time that doesn't matter, (b) there's always
the full JSON version if people need it and (c) they could write SPARQL
queries that used the datatype/language to populate different
variables/properties if they wanted to.
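The flattening itself is a tiny transform over the standard SPARQL
results JSON format. A sketch in Python (function name is mine):

```python
# Sketch of the simplification discussed above: collapse the standard
# SPARQL results JSON into a bare array of objects with plain values,
# deliberately discarding the type/datatype/language information.
def simplify_sparql_json(results):
    return [
        {var: binding[var]["value"] for var in binding}
        for binding in results["results"]["bindings"]
    ]
```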
+1
So there you are. I'd really welcome any thoughts or pointers about any
of this: things I've missed, vocabularies we could reuse, things that
you've already done along these lines, and so on. Reasons why none of
this is necessary are fine too, but I'll warn you in advance that I'm
unlikely to be convinced ;)
Thanks so much for getting this started and kicking off with such
detailed suggestions.
Cheers,
Dave
[1] The data model is described at:
http://simile.mit.edu/wiki/Exhibit/Understanding_Exhibit_Database
The JSON page is unhelpful!
http://simile.mit.edu/wiki/Exhibit/Understanding_Exhibit_JSON_Format
But there is some documentation:
http://simile.mit.edu/wiki/Exhibit/Creating,_Importing,_and_Managing_Data
[2] http://simile.mit.edu/babel/
[3] http://data-gov.tw.rpi.edu/ws/sparqlproxy.php