Ian Hickson wrote:
On Sat, 23 Aug 2008, Julian Reschke wrote:
Again you're confusing HTTP URLs with URIs.

Using URIs as identifiers allows lots of identification schemes other than HTTP, in particular ones that are not based on DNS, or that use DNS, but include a timestamp to address the concern of "losing" a domain name (tag URI scheme).

Sure, but most people use HTTP URIs anyway for namespaces.

You can use any URI or any system you want with class="". The key is just to make it unique enough that clashes won't happen. In practice, names like "dc:title" are actually quite unique enough. But people can use much more unique ones if desired, all the way to full URIs.

I'm certainly in favour of making mainstream namespace names prettier. But this design worries me, since it requires guesswork and heuristics on the part of consumer code to figure out if class = "info.age" or "museum.acquisitionDate" is intended as a URI or not. I'll air the worry first, and then sketch an approach that makes me worry less and which might have some of the characteristics that you value (such as not depending on separate xmlns-like declarations of abbreviations, and not being too ugly to look at).

You mentioned earlier that the RDFish practices around downloading and interpreting schemas from the Web is news to you. I'll take up an action to document some of the things we do in that area (eg. with SPARQL for data merging), probably as a blog post.

Doing so would help as background on my next point, which is that making it ambiguous whether a URI was declared is something that would need careful security review, to ensure that data consumers are aware that they should not expect property definitions found at the domain to be consistent with the intended meaning of the markup.

Sketch of a scenario:

1. Alice deploys <class="creationDate.info">1979</class> to describe a museum artifact. She calls it this because it marks up some information about the creation date of some real world thing, and because 'creationDate' is already in use for describing page creation dates, in the CSS library she's using.

2. Bob buys himself the Internet domain creationDate.info and wires up a webserver to respond with an RDFa schema defining creationDate as a sub-property of http://ecommerce.example.com/vocab#priceInEuros.

3. Charlie's code downloads Alice's markup, parses out the RDFa, and noticing that creationDate.info seems to be de-referencable, so goes to fetch the schema. For every triple "x creationDate y" in the document, it also generates "x ecom:priceInEuros y" too. Perhaps Bob is selling other museum artifact and wants to make Alice's look more expensive. Or cheaper. Or to make her data look corrupted so that certain consumers won't include her listing. Or maybe he wants to buy the item cheaply and is probing for bugs in Alice's online shopping system.

In other words, the fact that Alice's markup only *appears* to be using an Internet domain opens her up to risk that someone will go buy that domain, and put a fake schema there which affects the likely interpretation of her markup. This exposure is increased by our uncertainty about ICANN strategy: we can't rely on the assumption that there are only a tiny handful of TLDs. We can probably rely on them being expensive at the top level, but not on having a hardcoded list enumerating them.

[[
Icann has announced it will allow the creation of any new top-level domains, albeit at a considerable cost.

As well as opening the door to an influx of new web addresses, Icann has also said that it will allow Japanese, Chinese, Arabic and Cyrillic characters to be used in registrations for the first time.

"It's a massive increase in the real estate of the internet. It will allow groups, communities and businesses to express their identities online," says Paul Twonmey, chief executive of Icann, speaking to the Times. ]] http://www.pcpro.co.uk/news/208833/icann-creates-domain-name-freeforall.html


The RDF approach generally has been to make it very clear which chunks of data contain URIs, and whether they can be relative or not. Other markup systems have adopted a similar approach. These share the merit that it makes such ambiguity much less of a problem (although there are other attacks of course).

Lately I've been thinking that perhaps we can get something less ugly than "http://"; in the markup, yet specify rules that allow expansion to http:// or https:// while keeping it clear whether the markup author really intends to cite some domain/page as vocabulary documentation.

For example <p>I'm <span property="info.foaf/age">1979</p> years old</p>
(if FOAF was documented at http://foaf.info/age and we specified the property attribute to use java-style names, and be declared relative to the http:// scheme).

Or <p>I'm <span property="foaf/age">1979</p> years old</p>
(if I spend $100k at ICANN to buy a tld 'foaf')

or <p>I'm <span property="Com.xmlns.foaf.age">1979</p> years old</p>
(if I did some Apache config sysadmin on xmlns.com)

<p>I'm <span property="http://xmlns.com/foaf/0.1/age";>1979</p> years old</p>
(if this was written out in fullest form, and if the 'age' property existed yet in FOAF).


Such a design would open things to a marketplace in a real sense. Parties who wanted nice short URLs for their properties could beg, borrow or buy the appropriate domain names. The reverse-domain format from Java would be a bit unusual for people used to the HTTP/browser way. Perhaps property="age.foaf.xmlns.com" is equally readable?

The main cost here is that our prettification strategy is syntactically indistinguishable from relative URIs. So we could only reliably use it in attributes where we know we don't have a relative URI. For properties, that seems fine. For the subjects and objects of statements (ie. the things the properties apply to, or take as values) this would require further thought.

Am I making any sense here? (regardless of whether you agree...)

cheers,

Dan

--
http://danbri.org/

Reply via email to