Re: [whatwg] RDFa

Dan Brickley Tue, 26 Aug 2008 00:32:34 -0700

Ian Hickson wrote:

> On Sat, 23 Aug 2008, Julian Reschke wrote:
>> Again you're confusing HTTP URLs with URIs.
>> Using URIs as identifiers allows lots of identification schemes other than HTTP, in particular ones that are not based on DNS, or that use DNS, but include a timestamp to address the concern of "losing" a domain name (tag URI scheme).
> Sure, but most people use HTTP URIs anyway for namespaces.
> You can use any URI or any system you want with class="". The key is just to make it unique enough that clashes won't happen. In practice, names like "dc:title" are actually quite unique enough. But people can use much more unique ones if desired, all the way to full URIs.

I'm certainly in favour of making mainstream namespace names prettier. But this design worries me, since it requires guesswork and heuristics on the part of consumer code to figure out if class = "info.age" or "museum.acquisitionDate" is intended as a URI or not. I'll air the worry first, and then sketch an approach that makes me worry less and which might have some of the characteristics that you value (such as not depending on separate xmlns-like declarations of abbreviations, and not being too ugly to look at).

You mentioned earlier that the RDFish practices around downloading and interpreting schemas from the Web are news to you. I'll take up an action to document some of the things we do in that area (eg. with SPARQL for data merging), probably as a blog post.

Doing so would help as background on my next point, which is that making it ambiguous whether a URI was declared is something that would need careful security review, to ensure that data consumers are aware that they should not expect property definitions found at the domain to be consistent with the intended meaning of the markup.


Sketch of a scenario:

1. Alice deploys <class="creationDate.info">1979</class> to describe a museum artifact. She calls it this because it marks up some information about the creation date of some real world thing, and because 'creationDate' is already in use for describing page creation dates, in the CSS library she's using.

2. Bob buys himself the Internet domain creationDate.info and wires up a webserver to respond with an RDFa schema defining creationDate as a sub-property of http://ecommerce.example.com/vocab#priceInEuros.

3. Charlie's code downloads Alice's markup, parses out the RDFa, and, noticing that creationDate.info seems to be dereferenceable, goes to fetch the schema. For every triple "x creationDate y" in the document, it also generates "x ecom:priceInEuros y". Perhaps Bob is selling other museum artifacts and wants to make Alice's look more expensive. Or cheaper. Or to make her data look corrupted so that certain consumers won't include her listing. Or maybe he wants to buy the item cheaply and is probing for bugs in Alice's online shopping system.
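
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples, the "Web" is simulated by a dict rather than real HTTP fetches, and the names artifact42, fetch_schema and naive_consume are hypothetical.

```python
# Triples are (subject, property, value) tuples; all identifiers are illustrative.
ALICE_TRIPLES = [("artifact42", "creationDate.info", "1979")]

# What Bob's server at creationDate.info might serve: a sub-property claim
# mapping the property onto an unrelated e-commerce vocabulary term.
BOB_SCHEMA = {
    "creationDate.info": "http://ecommerce.example.com/vocab#priceInEuros",
}


def fetch_schema(prop, web):
    """Stand-in for an HTTP fetch: look the property up in a dict simulating the Web."""
    return web.get(prop)


def naive_consume(triples, web):
    """Apply sub-property inference using whatever schema the apparent domain serves."""
    inferred = list(triples)
    for subject, prop, value in triples:
        super_prop = fetch_schema(prop, web)
        if super_prop is not None:
            # The consumer trusts the fetched schema and adds a derived triple.
            inferred.append((subject, super_prop, value))
    return inferred


result = naive_consume(ALICE_TRIPLES, BOB_SCHEMA)
for triple in result:
    print(triple)
```

With an empty dict in place of BOB_SCHEMA (nobody has registered the domain yet), the consumer sees only Alice's original triple; once Bob's schema is served, her creation date is also read as a price.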

In other words, the fact that Alice's markup only *appears* to be using an Internet domain opens her up to risk that someone will go buy that domain, and put a fake schema there which affects the likely interpretation of her markup. This exposure is increased by our uncertainty about ICANN strategy: we can't rely on the assumption that there are only a tiny handful of TLDs. We can probably rely on them being expensive at the top level, but not on having a hardcoded list enumerating them.

[[

Icann has announced it will allow the creation of any new top-level domains, albeit at a considerable cost.

As well as opening the door to an influx of new web addresses, Icann has also said that it will allow Japanese, Chinese, Arabic and Cyrillic characters to be used in registrations for the first time.

"It's a massive increase in the real estate of the internet. It will allow groups, communities and businesses to express their identities online," says Paul Twomey, chief executive of Icann, speaking to the Times.
]]
http://www.pcpro.co.uk/news/208833/icann-creates-domain-name-freeforall.html

The RDF approach generally has been to make it very clear which chunks of data contain URIs, and whether they can be relative or not. Other markup systems have adopted a similar approach. These share the merit that it makes such ambiguity much less of a problem (although there are other attacks of course).

Lately I've been thinking that perhaps we can get something less ugly than "http://" in the markup, yet specify rules that allow expansion to http:// or https:// while keeping it clear whether the markup author really intends to cite some domain/page as vocabulary documentation.


For example <p>I'm <span property="info.foaf/age">1979</span> years old</p>

(if FOAF was documented at http://foaf.info/age and we specified the property attribute to use java-style names, and be declared relative to the http:// scheme).


Or <p>I'm <span property="foaf/age">1979</span> years old</p>
(if I spend $100k at ICANN to buy a tld 'foaf')

or <p>I'm <span property="com.xmlns.foaf.age">1979</span> years old</p>
(if I did some Apache config sysadmin on xmlns.com)

<p>I'm <span property="http://xmlns.com/foaf/0.1/age">1979</span> years old</p>

(if this was written out in fullest form, and if the 'age' property existed yet in FOAF).
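
One possible reading of these expansion rules, as a rough Python sketch rather than a worked-out proposal. It assumes that everything before the first "/" is a reversed (java-style) domain name, that the remainder is the path, and that full http:// or https:// URIs pass through untouched. The function name expand_property and the handling of the slash-free form are hypothetical choices, not something the examples above pin down.

```python
def expand_property(value: str, scheme: str = "http") -> str:
    """Expand a java-style property name into an absolute URI (hypothetical rule)."""
    # Fully written-out URIs are left exactly as the author wrote them.
    if value.startswith(("http://", "https://")):
        return value
    # Assumption: text before the first "/" is a reversed domain name,
    # and whatever follows is the path on that host.
    reversed_host, _, path = value.partition("/")
    # Host names are case-insensitive, so normalise to lower case.
    host = ".".join(reversed(reversed_host.split("."))).lower()
    return f"{scheme}://{host}/{path}"


for example in (
    "info.foaf/age",
    "foaf/age",
    "com.xmlns.foaf.age",
    "http://xmlns.com/foaf/0.1/age",
):
    print(example, "->", expand_property(example))
```

Under these assumptions the slash-free form expands to a host of its own (http://age.foaf.xmlns.com/), which is one way the server-side configuration mentioned above might come into play; mapping it to a path on xmlns.com instead would need an extra rule for where the host ends.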

Such a design would open things to a marketplace in a real sense. Parties who wanted nice short URLs for their properties could beg, borrow or buy the appropriate domain names. The reverse-domain format from Java would be a bit unusual for people used to the HTTP/browser way. Perhaps property="age.foaf.xmlns.com" is equally readable?

The main cost here is that our prettification strategy is syntactically indistinguishable from relative URIs. So we could only reliably use it in attributes where we know we don't have a relative URI. For properties, that seems fine. For the subjects and objects of statements (ie. the things the properties apply to, or take as values) this would require further thought.
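
To make the ambiguity concrete, here is a small Python sketch showing the same attribute value read two ways. The base URL and the helper names are made up for illustration, and the reversed-domain rule is just one assumed interpretation of the java-style form; only urljoin is a real standard-library call.

```python
from urllib.parse import urljoin


def as_reversed_domain(value):
    """Read the value as a java-style name: reversed host before the first '/'."""
    host, _, path = value.partition("/")
    return "http://" + ".".join(reversed(host.split("."))) + "/" + path


def as_relative_uri(value, base):
    """Read the very same value as an ordinary relative URI against the page."""
    return urljoin(base, value)


value = "info.foaf/age"
base = "http://example.org/people/dan.html"  # hypothetical page URL

vocab_reading = as_reversed_domain(value)
relative_reading = as_relative_uri(value, base)

print(vocab_reading)
print(relative_reading)
```

Nothing in the string itself says which reading the author meant, so the choice has to come from knowing which attribute the value appeared in.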


Am I making any sense here? (regardless of whether you agree...)

cheers,

Dan

--
http://danbri.org/
