Re: Change Proposal for HttpRange-14
On 30/03/2012 16:15, Tom Heath tom.he...@talis.com wrote:

Hi Michael,

On 27 March 2012 16:17, Michael Smethurst michael.smethu...@bbc.co.uk wrote:

On 26/03/2012 17:13, Tom Heath tom.he...@talis.com wrote:

Hi Jeni,

On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:

Tom,

On 26 Mar 2012, at 16:05, Tom Heath wrote:

On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:

I'm sure many people are just deeply bored of this discussion.

No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years' worth of httpRange-14 discussions (and the implication that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic.

No data here I fear; merely anecdote. But anecdote is usually the best form of data :-)

I guess this is where we'll have to differ :) Of all people, you guys at the BBC have great anecdotes, and clearly you personally have heaps of opinions about some of the big thorny issues in Linked Data deployment and usage, formed from first-hand experience. I'm not saying I agree or disagree with any of the specifics; I'm just making a plea for us to raise the level of analysis to a point where we have some more robust evidence from which to draw conclusions. I'll do what I can to contribute, but I think we all need to pitch in and produce this evidence if the discussion and conclusions are going to be credible. Anecdotes and opinion only get us so far.

Hi Tom,

A late response before I return to lurking...

As lots of other people have pointed out, I think there are two quite different problems here:

- the performance problems (including non-cacheability of 303s (as implemented, if not as specced), CDNs, etc.)
- the organisational / institutional / cultural problems.
Basically, explaining and convincing enough people up the management chain to make it happen. And, every two years, when everybody swaps seats, having a whole new chain of people to convince... (And using fragment identifiers buys you out of all that pain.)

I can see how you'd get data to make a reasonable evaluation of the former. (And, as I said in the earlier email, I think at least some of the performance problems would be solved by separating out 303s from conneg, routing HTML links to the generic document resource URI, not channelling every request through a 303, and only referring to the thing that isn't a document when you want to make statements about it.) But I have no idea how you get data to help analyse the latter. In which case you're left with anecdote...

===

And for some cases, I admit, I find it difficult to explain to myself. Most of the example explanations start with physical things (people, cats, buildings, trains, bridges, etc.) and the explanation is easy. For some set of metaphysical things (organisations, football clubs (not teams / squads), TV series, species...) it's also (relatively) easy. But for some set of stuff (the definition of which I can't quite put my finger on), it's really not that easy.

Over recent days this list seems to have settled on something like: if you can get a reasonable representation, it's content; if you can't, it's description. For some definition of reasonable.

Taking two URIs from DBpedia:

http://dbpedia.org/resource/Fox_News_Channel is an organisation / corporation / TV channel. It's easyish to argue you can't get a reasonable response that isn't just a description.

http://dbpedia.org/resource/Fox_News_Channel_controversies is (in Wikipedia terms) an overspill article. It could be a SKOS-type concept, I guess, but it's more of a compound concept (a sentence). No matter what HTTP evolves into, I can't think of a more reasonable response to that than a list of controversies involving Fox News. What's the 303 doing in that case?
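(An aside, not from the thread: the routing Michael sketches above — only 303-ing the "thing" URI, pointing HTML links straight at the generic document URI, and doing conneg there — can be illustrated with a toy dispatcher. All URIs and names below are invented for illustration.)

```python
# Hypothetical sketch of the httpRange-14 routing pattern described above.
# Only the non-document ("thing") URI answers with a 303; the generic
# document URI answers 200 directly, with content negotiation happening
# there instead of behind an extra redirect.

def route(path, accept="text/html"):
    """Return (status, target-or-media-type) for a request."""
    if path == "/resource/Fox_News_Channel":
        # Thing URI: never served directly -- 303 See Other to its document.
        return (303, "/doc/Fox_News_Channel")
    if path == "/doc/Fox_News_Channel":
        # Generic document URI: conneg selects a concrete format, 200 OK.
        media = "text/turtle" if "turtle" in accept else "text/html"
        return (200, media)
    return (404, None)
```

Under this sketch, a browser following an HTML link to /doc/Fox_News_Channel pays a single round trip; the 303 (and its caching cost) is only incurred when someone dereferences the thing URI itself.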
It's made more confusing because the statements you get back from Fox_News_Channel_controversies are more or less identical to the statements you get back from Fox_News_Channel, because the infoboxes on both Wikipedia pages are more or less the same. So DBpedia says Fox News controversies is an entity of type broadcaster and has a broadcastArea, a firstAirDate, a headquarter, an owningCompany, a pictureFormat, etc.

Yours (in confusion),
Michael

Cheers, Tom.

What hard data do you think would resolve (or, if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data?

A few things come to mind:

1) a rigorous assessment of how difficult people *really* find it to understand distinctions such as
Re: Change Proposal for HttpRange-14
On 4/4/12 5:48 AM, Michael Smethurst wrote:

Over recent days this list seems to have settled on something like: if you can get a reasonable representation, it's content; if you can't, it's description. For some definition of reasonable.

Taking two URIs from DBpedia:

http://dbpedia.org/resource/Fox_News_Channel is an organisation / corporation / TV channel. It's easyish to argue you can't get a reasonable response that isn't just a description.

http://dbpedia.org/resource/Fox_News_Channel_controversies is (in Wikipedia terms) an overspill article. It could be a SKOS-type concept, I guess, but it's more of a compound concept (a sentence). No matter what HTTP evolves into, I can't think of a more reasonable response to that than a list of controversies involving Fox News. What's the 303 doing in that case?

It's made more confusing because the statements you get back from Fox_News_Channel_controversies are more or less identical to the statements you get back from Fox_News_Channel, because the infoboxes on both Wikipedia pages are more or less the same. So DBpedia says Fox News controversies is an entity of type broadcaster and has a broadcastArea, a firstAirDate, a headquarter, an owningCompany, a pictureFormat, etc.

Yours (in confusion),
Michael

Michael,

DBpedia is but one of many data sources accessible via the burgeoning Web of Linked Data. The relations in DBpedia are not always accurate per se; they typically provide a commencement point for additional finessing by subject-matter experts. For instance, you can apply YAGO [1] context to DBpedia data en route to enhanced relations [2][3] that provide better descriptions for a given entity. The emergence of the Data Wiki via projects such as OntoWiki [4] and Wikidata [5] will ultimately help everyone understand that Linked Data isn't a read-only affair where relations are implicitly canonical and cast in stone :-)

Links:

1. http://www.mpi-inf.mpg.de/yago-naga/yago/ -- YAGO
2.
http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FFox_News_Channel_controversies -- a description from the LOD cloud cache we maintain (note: the Type drop-down and the entries it exposes, courtesy of YAGO)
3. http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fyago-knowledge.org%2Fresource%2FFox_News_Channel_controversies -- YAGO description of the same DBpedia entity
4. http://ontowiki.net/Projects/OntoWiki -- OntoWiki
5. http://meta.wikimedia.org/wiki/Wikidata -- Wikidata

--
Regards,

Kingsley Idehen
Founder & CEO, OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hello Jonathan,

On Sun, Apr 01, 2012 at 05:05:10PM +0200, Jonathan A Rees wrote:

Hmm... so from a 200 status code and HR14, I can conclude that I have a representation of it, that it is an IR and therefore has a representation that conveys the essential characteristics of it (definition of IR at http://www.w3.org/2001/tag/doc/uddp-20120229/), but not that the representation I got actually is a representation that conveys the essential characteristics of it?

Well, not necessarily. The problem is that 'representation' has at least three different meanings in these discussions. There's (1) the REST / AWWW glossary / HTTPbis definition, the record of the state of something, which I take to mean that descriptions, as well as depictions, content, etc., are representations. There's (2) the usage where it means expression or encoding of information, similar to what I call instance or TimBL calls content, which would be a special case of (1). (2) shows up in RFC 2616 and in parts of AWWW. Then there's 'representation' as (3) whatever you get in a 200 response, what I call nominal representation. (1)=(3) if the manner of 'representing' can be idiosyncratic to each resource. So you have to be careful which sense you mean. Most of the specs are pretty murky on this, and that's part, maybe most, of the reason why the conversation is so incredibly painful.

I meant (2), but that is not relevant, because if you take what is written in http://www.w3.org/2001/tag/doc/uddp-20120229/ as granted, my statement not only holds for the special case (2) but also for the general case (1). I thought "current representation of" in http://www.w3.org/2001/tag/doc/uddp-20120229/ refers to something more like (2), and definitely not to mere descriptions, but when I look at it there seems to be nothing to back this. But whatever representation means exactly, I would rephrase the sentence "the identified resource is an information resource" in http://www.w3.org/2001/tag/doc/uddp-20120229/ to "the identified resource has a representation that conveys its essential characteristics" (see below). This makes it clearer what conclusions you cannot draw, and avoids the term information resource in the important sentence, while basically saying the same thing.

Regards, Michael Brunnbauer

--
++ Michael Brunnbauer
++ netEstate GmbH
++ Geisenhausener Straße 11a
++ 81379 München
++ Tel +49 89 32 19 77 80
++ Fax +49 89 32 19 77 89
++ E-Mail bru...@netestate.de
++ http://www.netestate.de/
++ Sitz: München, HRB Nr. 142452 (Handelsregister B München)
++ USt-IdNr. DE221033342
++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hello Jonathan,

On Tue, Apr 03, 2012 at 02:05:29PM +0200, Michael Brunnbauer wrote:

I thought "current representation of" in http://www.w3.org/2001/tag/doc/uddp-20120229/ refers to something more like (2), and definitely not to mere descriptions, but when I look at it there seems to be nothing to back this.

BTW: If it is true that nothing backs this interpretation, then the IR stuff in http://www.w3.org/2001/tag/doc/uddp-20120229/ would be the only thing that stops somebody from calling his homepage a representation of himself and using its URI for denoting himself.

Regards, Michael Brunnbauer
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
hi all,

On Sat, Mar 31, 2012 at 05:53:03PM +0200, Michael Brunnbauer wrote:

maybe I made an error by assuming that the term IR is inherent in the term representation - by assuming that a NIR cannot have a representation, only descriptions?

No. The whole point about the use of the term IR in HR14 seems to be to say: everything that has a representation has a representation that conveys its essential characteristics.

Is this important? If yes, should we write it this way?

Regards, Michael Brunnbauer
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On 4/1/12 4:35 AM, Michael Brunnbauer wrote:

maybe I made an error by assuming that the term IR is inherent in the term representation - by assuming that a NIR cannot have a representation, only descriptions?

No. The whole point about the use of the term IR in HR14 seems to be to say: everything that has a representation has a representation that conveys its essential characteristics.

Is this important? If yes, should we write it this way?

Regards, Michael Brunnbauer

Aren't we somehow losing the fundamental fact that all resources on the Web are supposed to bear self-describing content, constrained by mime type? That, when all is said and done, irrespective of mime type, all Web resources are Information Resources?

The above gels nicely with the fact that all content bears a representation of something that provides information to appropriate systems, courtesy of the mime type component of this content+mime-type composite. The content of a basic HTML web page, an RDF document, and OWL and RDFS documents all deliver information. Of course, within specific system realms such as RDF, Linked Data, and the Semantic Web, the content represents a more specific kind of information, in the form of descriptions and definitions -- at least in the eyes of systems (clients and servers) for said realms.

--
Regards, Kingsley Idehen
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On 4/1/12 11:42 AM, Kingsley Idehen wrote:

[...] Of course, within specific system realms such as RDF, Linked Data, and the Semantic Web, the content represents a more specific kind of information, in the form of descriptions and definitions -- at least in the eyes of systems (clients and servers) for said realms.

In the post above, I forgot to add this link for reference: http://www.w3.org/2001/tag/doc/selfDescribingDocuments.html

--
Regards, Kingsley Idehen
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Sun, 2012-04-01 at 10:35 +0200, Michael Brunnbauer wrote:

The whole point about the use of the term IR in HR14 seems to be to say: everything that has a representation has a representation that conveys its essential characteristics. Is this important? If yes, should we write it this way?

FYI, Jonathan Rees has written up a very nice formalization of the relationship between an information resource and a representation -- or, in his parlance, an instance and a generic information entity -- in terms of what it means to make metadata statements about them:

http://www.w3.org/2001/tag/awwsw/ir/latest/

--
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily reflect those of his employer.
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hello Kingsley,

Everything that has a representation has a representation that conveys its essential characteristics.

[...]

Aren't we somehow losing the fundamental fact that all resources on the Web are supposed to bear self-describing content, constrained by mime type? That, when all is said and done, irrespective of mime type, all Web resources are Information Resources?

Your last sentence is what my sentence above says. In http://www.w3.org/2001/tag/doc/uddp-20120229/, an IR is defined as something that has a representation that conveys the essential characteristics of it. We can get rid of the term information resource by putting something like the above statement about representations in the papers - if I am not the only one who thinks that things are easier to understand this way :-)

Regards, Michael Brunnbauer
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On 4/1/12 12:31 PM, Michael Brunnbauer wrote:

[...] We can get rid of the term information resource by putting something like the above statement about representations in the papers - if I am not the only one who thinks that things are easier to understand this way :-)

Yes, but in the context of RDF a triple can be seen as conveying information about the referent of a URI. This information can take the form of a description or a more specific definition. Resources always bear 'information'; the question ultimately boils down to what kind of information, subject to the Web system (dimension or aspect) in question :-)

--
Regards, Kingsley Idehen
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Sat, 2012-03-31 at 11:32 -0400, Jonathan A Rees wrote:

[ . . . ] So this is something we already knew from the HTTP spec, which all of us pretty much agree to;

We all agree to it as a *protocol* specification -- not as a *semantics* specification.

[ . . . ] On the other hand the specs are all terribly murky, [ . . . ]

They're only murky if you are trying to interpret them as defining a global semantics for the web, which is not what they were intended to do.

--
David Booth, Ph.D.
http://dbooth.org/
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On 4/1/12 9:42 PM, David Booth wrote:

[...] They're only murky if you are trying to interpret them as defining a global semantics for the web, which is not what they were intended to do.

+1

--
Regards, Kingsley Idehen
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
hi all,

The document at http://www.w3.org/2001/tag/doc/uddp-20120229/ uses the term "X (a sequence of octets + media type) is a representation of Y (an entity)". I have a question: can two different entities have the same representation?

If not, we can define an IR as a thing for which there is at least one sequence of octets + media type that is a representation of it, because its essential characteristics would be conveyed in that message. The term IR would not have much value in this case, as it would not be a term of its own.

If yes, I could have a lossy compression algorithm that makes the same sequence of octets out of two different images, and still have a representation of those images when I GET them.

I think all the difficult questions (or the nitpicking, if you want) do not go away if we drop the term IR. They also lie in the questions "does this URI denote what it accesses", "is this message a representation of the entity", and "do I serve the content of this sucker".

Regards, Michael Brunnbauer

On Wed, Mar 28, 2012 at 11:35:04PM +0200, Michael Brunnbauer wrote:

Hello Norman,

- Regardless of how you define IR, everything that denotes what it accesses should lie in IR.
- Putting something in NIR therefore also answers, by entailment, the question of whether it denotes what it accesses with "no".

I have worded this very badly. We are talking about things and names of things. This should be:

For all URIs U: denote(U) = access(U) -> denote(U) a IR

It follows:

For all URIs U: denote(U) not a IR -> denote(U) != access(U)

- There may or may not be IRs that do not denote what they access.

And this should be:

There is a URI U where: denote(U) a IR and denote(U) != access(U).
Now if I am allowed to mint a URI that 303's to your homepage, and your homepage is an IR, such a URI must exist:

U1 = Your URI for your homepage
U2 = My URI for your homepage

denote(U1) a IR
denote(U2) != access(U2)
denote(U1) = denote(U2)

therefore denote(U2) a IR and denote(U2) != access(U2)

I think I'll stay out of this discussion from now on :-)

Regards, Michael Brunnbauer
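(An aside, not part of the thread: Michael's witness argument can be checked mechanically with a toy model. All names and values below are invented; "303->U1" simply marks that dereferencing U2 yields only a redirect, not the homepage itself.)

```python
# Toy model of the argument above: U2 merely 303-redirects to the
# homepage, so it denotes the same IR as U1 without accessing it
# directly -- yielding the witness denote(U2) a IR, denote(U2) != access(U2).

HOMEPAGE = "homepage"  # the information resource itself

denote = {"U1": HOMEPAGE, "U2": HOMEPAGE}    # both URIs name the homepage
access = {"U1": HOMEPAGE, "U2": "303->U1"}   # U2 only yields a redirect

is_ir = {HOMEPAGE: True, "303->U1": False}

def axiom_holds(u):
    """Axiom: if denote(U) = access(U), then denote(U) is an IR."""
    return denote[u] != access[u] or is_ir[denote[u]]

# The axiom is respected by both URIs...
assert all(axiom_holds(u) for u in denote)

# ...and U2 is the claimed witness: an IR it denotes but does not access.
assert is_ir[denote["U2"]] and denote["U2"] != access["U2"]
```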
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Sat, Mar 31, 2012 at 8:05 AM, Michael Brunnbauer bru...@netestate.de wrote:

hi all, The document at http://www.w3.org/2001/tag/doc/uddp-20120229/ uses the term "X (a sequence of octets + media type) is a representation of Y (an entity)". I have a question: can two different entities have the same representation?

I've never heard anyone say anything that would rule this out. Well, on second thought, in conversation I've heard people give a theory that representations are on the wire, which would make them events, which occur in space and time and thus cannot happen twice. But this idea does not follow from 2616 or 3986 or AWWW, and does not have anything like consensus.

I always thought that Content-Location: suggested this situation, where you have two URIs and two resources, and a representation that is of both, where the first resource might have *additional* representations and the second doesn't. This seems tidy to me, but it's just my theory.

If not, we can define an IR as a thing for which there is at least one sequence of octets + media type that is a representation of it, because its essential characteristics would be conveyed in that message. The term IR would not have much value in this case, as it would not be a term of its own.

That is: an IR is something that has a representation. I think this has been suggested several times. Unfortunately, information resource has a definition in AWWW, and I don't see the merit in redefining the term rather than introducing a new term. However, I believe I have heard this suggestion, or something like it, before, from several sources, so it's not completely out of the question. It would be nice in a way because it would make HR14a completely vacuous. This is what I call "opt in", because you wouldn't be able to assume that what you GET is content (Tim's word; my instance).
If yes, I could have a lossy compression algorithm that makes the same sequence of octets out of two different images, and still have a representation of those images when I GET them. I think all the difficult questions (or the nitpicking, if you want) do not go away if we drop the term IR. They also lie in the questions "does this URI denote what it accesses", "is this message a representation of the entity", and "do I serve the content of this sucker".

I'm glad you say this. I agree that the "when is X content of Y" question remains, although to me it's not such a difficult question (it remains for me to convince others of this).

Best
Jonathan
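(An aside, not from the thread: Jonathan's Content-Location reading — two URIs, two resources, one shared representation — can be sketched as data. All URIs and values below are invented for illustration.)

```python
# Hypothetical sketch: one octet sequence + media type served, via two
# URIs, as a representation of two distinct resources -- e.g. after a
# lossy compressor collapses two images onto the same bytes.

shared = (b"...lossy-compressed octets...", "image/jpeg")

# A GET on either URI returns the same representation; Content-Location
# names the resource the returned bytes most directly instantiate, while
# the first resource might have additional representations of its own.
responses = {
    "http://example.org/photo-original": {
        "status": 200,
        "Content-Location": "http://example.org/photo-compressed",
        "representation": shared,
    },
    "http://example.org/photo-compressed": {
        "status": 200,
        "Content-Location": "http://example.org/photo-compressed",
        "representation": shared,
    },
}

# Two resources, one representation: nothing in the model rules this out.
a, b = responses.values()
assert a["representation"] == b["representation"]
```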
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hello Jonathan,

maybe I made an error by assuming that the term IR is inherent in the term representation - by assuming that a NIR cannot have a representation, only descriptions? But if a NIR cannot have a representation, and two different IRs cannot have the same representation, then getting a representation of an IR is as close as I can get to it.

Regards, Michael Brunnbauer

On Sat, Mar 31, 2012 at 11:32:37AM -0400, Jonathan A Rees wrote:

On Sat, Mar 31, 2012 at 11:13 AM, Michael Brunnbauer bru...@netestate.de wrote:

Hello Jonathan,

[off list. If you think your answer will be helpful to others, put it back on the list]

On Sat, Mar 31, 2012 at 10:54:09AM -0400, Jonathan A Rees wrote:

That is: an IR is something that has a representation. [...] It would be nice in a way because it would make HR14a completely vacuous. This is what I call "opt in", because you wouldn't be able to assume that what you GET is content (Tim's word; my instance).

Why would this definition make HR14 vacuous? I would say that the rule "from a status code 200, you can infer that you got a representation of what the URI denotes" can be made with or without that definition.

What I mean by vacuous is that RFC 2616 (certainly HTTPbis) already says - in my reading at least - that the retrieved representation is a representation of the resource identified by the URI (or at least that the server is *saying* so, i.e. it is nominally so, which is usually good enough). So this is something we already knew from the HTTP spec, which all of us pretty much agree to; neither the TAG nor anyone else would have to say that this is the case in any pronouncement resembling httpRange-14(a). Maybe vacuous was a poor choice of word.

On the other hand, the specs are all terribly murky, so maybe it would be good to repeat this somewhere. In any case, "information resource" as used in HR14a is well connected to AWWW, and I think redefining the term, no matter how bad the definition, would just confuse things.
You could say "HTTP resource" or something for resources that have representations (what would be an example of one that doesn't?). My opinion.

Best
Jonathan
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On 3/31/12 11:32 AM, Jonathan A Rees wrote: In any case information resource as used in HR14a is well connected to AWWW and I think redefining the term, no matter how bad the definition, would just confuse things. You could say HTTP resource or something for resources that have representations (what would be an example of one that doesn't?). My opinion. Information Resource isn't the problem. It's the Non Information Resource (NIR) that's the problem. In the Linked Data realm we have 'Descriptor Resources' that bear higher-fidelity structured content which is still ultimately constrained by mime type. An illustration:

Information Space dimension
|-- isA -- Web dimension
|   |-- isA -- Web of Information Resources (e.g. an HTML page modulo Microdata or RDFa data islands)

Data Space dimension
|-- isA -- Web dimension
|   |-- isA -- Web of Descriptor Resources (e.g. RDF documents where content is RDF/XML, N-Triples, Turtle, HTML+Microdata, (X)HTML+RDFa etc..)

All of the resource types above are self-describing, courtesy of mime type constrained content. Excerpt from TimBL's Web FAQ [1]: Q: What did you have in mind when you first developed the Web? From A Short Personal History of the Web: A: The dream behind the Web is of a common information space in which we communicate by sharing information. Its universality is essential: the fact that a hypertext link can point to anything, be it personal, local or global, be it draft or highly polished. There was a second part of the dream, too, dependent on the Web being so generally used that it became a realistic mirror (or in fact the primary embodiment) of the ways in which we work and play and socialize. That was that once the state of our interactions was on line, we could then use computers to help us analyze it, make sense of what we are doing, where we individually fit in, and how we can better work together. 
Bearing in mind the above, it should be easier to understand why Linked Data is about the Web's Data Space dimension. Remember, Data != Information. When you put data in context you get information. A protocol for accessing data and a model for data representation are critical components for providing context for data, en route to producing information. Links: 1. http://www.w3.org/People/Berners-Lee/FAQ.html -- TimBL FAQ re. Web. 2. http://tools.ietf.org/html/draft-hammer-discovery-06 -- some context for descriptor resources which also demonstrates how this term provides a conduit to others that are less interested in RDF content formats while still interested in Web scale structured and linked data. 3. http://goo.gl/BBsIz -- Three main types of Object Descriptors (remember: the Web is really a contemporary and widely successful Distributed Object system). -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile: https://plus.google.com/112399767740508618350/about LinkedIn Profile: http://www.linkedin.com/in/kidehen
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hello Jonathan, On Sat, Mar 31, 2012 at 10:54:09AM -0400, Jonathan A Rees wrote: I have a question: Can two different entities have the same representation? I've never heard anyone say anything that would rule this out. Hmm... so from a 200 status code and HR14, I can conclude that I have a representation of it, that it is an IR and therefore has a representation that conveys the essential characteristics of it (definition of IR at http://www.w3.org/2001/tag/doc/uddp-20120229/), but not that the representation I got actually is a representation that conveys the essential characteristics of it? Regards, Michael Brunnbauer
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Thu, Mar 29, 2012 at 10:20 PM, David Booth da...@dbooth.org wrote: On Thu, 2012-03-29 at 20:51 -0400, Jonathan A Rees wrote: On Tue, Mar 27, 2012 at 6:01 PM, Jeni Tennison j...@jenitennison.com wrote: [ . . . ] But then we would also have to define what 'content' and 'description' meant. I have a feeling that might prove just as slippery and ultimately unhelpful as 'information resource'. Agreed. As long as there's an attempt to define a difference between the two, we'll be mired in the same impossible I disagree. I've been able to reverse engineer a semantics [1] for 'content' that matches the original RDF design (for metadata, [2]) and what I think was *intended* by httpRange-14(a). The 'information resource' definition is just really unactionable; perhaps reparable but I don't think repairing it would help much since that's not even the issue. [1] http://www.w3.org/2001/tag/awwsw/ir/latest/ [2] http://www.w3.org/TR/WD-rdf-syntax-971002/ A semantics for 'content'? That's not at all what I read in [1]. Did you mean to reference some other document? I think [1] describes an excellent way to formalize what it means to write an assertion about an information resource (though it's called a generic information entity in that document instead of information resource). But it only uses the term 'content' three times in the body, and only in passing. And it *never* defines the term. In what sense do you think it defines a semantics for 'content'? Tim's 'content' ~= my 'instance'. I thought that was clear from the way I've been saying content / instance and content (instance) and instance (content) in my emails. I've started using Tim's word since people listen more closely to him than they do to me. Jonathan
Re: Change Proposal for HttpRange-14
Hi Michael, On 27 March 2012 16:17, Michael Smethurst michael.smethu...@bbc.co.uk wrote: On 26/03/2012 17:13, Tom Heath tom.he...@talis.com wrote: Hi Jeni, On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote: Tom, On 26 Mar 2012, at 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. No data here I fear; merely anecdote. But anecdote is usually the best form of data :-) I guess this is where we'll have to differ :) Of all people you guys at the BBC have great anecdotes, and clearly personally you have heaps of opinions about some of the big thorny issues in Linked Data deployment and usage, formed from first hand experience. I'm not saying I agree or disagree with any of the specifics, I'm just making a plea for us to raise the level of analysis to a point where we have some more robust evidence from which to draw conclusions. I'll do what I can to contribute, but I think we all need to pitch in and produce this evidence if the discussion and conclusions are going to be credible. Anecdotes and opinion only get us so far. Cheers, Tom. What hard data do you think would resolve (or if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data? 
A few things come to mind: 1) a rigorous assessment of how difficult people *really* find it to understand distinctions such as things vs documents about things. I've heard many people claim that they've failed to explain this (or similar) successfully to developers/adopters; my personal experience is that everyone gets it, it's no big deal (and IRs/NIRs would probably never enter into the discussion). I think it's explainable. I don't think it's self evident. And explanation can be tricky because: a) once you get past the obvious cases (a person and their homepage) there are further levels of abstraction that make things complicated. A journalist submits a report to a news agency, a sub-editor tweaks it and puts it on the wires, a news publisher picks up the report, a journalist shapes an article around it, another sub-editor tweaks that, the article gets published, the article gets syndicated. Which document is the rdf making claims (created by, created at) about? And is that the important / interesting thing? You quickly head down a frbr shaped rabbit hole b) The way people make and use websites (outside the whole linked data thing) has moved on. Many people don't just publish pages; they publish pages that have a one-to-one correspondence with real world things. A page per photo or programme or species or recipe or person. They're already in the realm of thinking about things before pages and to them the page and its url is a good enough approximation for description c) people using the web are already thinking about things not pages. If you search google for Obama your mental model is of the person, not any resulting pages d) we already have the resource / representation split which is quite enough abstraction for some people e) the list of things you might want to say about a document is finite; the list of things you might want to say about the world isn't 2) hard data about the 303 redirect penalty, from a consumer and publisher side. 
Lots of claims get made about this but I've never seen hard evidence of the cost of this; it may be trivial, we don't know in any reliable way. I've been considering writing a paper on this for the ISWC2012 Experiments and Evaluation track, but am short on spare time. If anyone wants to join me please shout. I know publishers whose platform is so constrained they can't even edit the head section of their html documents. They certainly don't have access at the server level Even where 303s are technically possible they might not be politically possible. Technically we could have easily created bbc.co.uk/things/:blah and made it 303 but that would have involved setting up /things and that's a *very* difficult conversation with management and ops And if it's technically and politically possible it really depends on how the 303 is set up. Lots of linked data people seem to conflate the 303 and content negotiation. So I ask for something that can't be sent,
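The conflation mentioned above is worth spelling out: a 303 redirect points the client at a *different* resource (a document about the thing), while content negotiation selects among formats of the *same* resource. A minimal sketch of the difference, with hypothetical paths and a toy handler (not any particular server's API):

```python
# Toy Linked Data server logic showing why a 303 and conneg are separate
# mechanisms: the /id/ URI names a real-world thing and always redirects,
# while the /doc/ URI names a document and negotiates its *format*.

def handle_get(path, accept):
    """Return (status, headers, body) for a GET request."""
    if path == "/id/thing/42":
        # URI for a non-information resource: no representation can be
        # served, so send the client to a description (303 See Other).
        return 303, {"Location": "/doc/thing/42"}, b""
    if path == "/doc/thing/42":
        # URI for a document: content negotiation picks a format of the
        # same information resource.
        if "text/turtle" in accept:
            return 200, {"Content-Type": "text/turtle"}, b"<#it> a <#Thing> ."
        return 200, {"Content-Type": "text/html"}, b"<html>about thing 42</html>"
    return 404, {}, b""

status, headers, _ = handle_get("/id/thing/42", "text/html")
assert status == 303 and headers["Location"] == "/doc/thing/42"
status, headers, _ = handle_get("/doc/thing/42", "text/turtle")
assert status == 200 and headers["Content-Type"] == "text/turtle"
```

Note the two mechanisms compose: a client can follow the 303 and then conneg on the document it lands on, which is exactly the two-step cost discussed in this thread.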
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Mar 27, 2012, at 6:59 AM, Danny Ayers wrote: This seems an appropriate place for me to drop in my 2 cents. I like the 303 trick. People that care about this stuff can use it (and appear to be doing so), but it doesn't really matter too much that people that don't care don't use it. It seems analogous to the question of HTML validity. Best practices suggest creating valid markup, but if it isn't perfect, it's not a big deal, most UAs will be able to make sense of it. There will be reduced fidelity of communication, sure, but there will be imperfections in the system whatever, so any trust/provenance chain will have to consider such issues anyway. So I don't really think Jeni's proposal is necessary, but don't feel particularly strongly one way or the other. Philosophically I reckon the flexibility of what a representation of a resource can be means that the notion of an IR isn't really needed. I've said this before in another thread somewhere, but if the network supported the media type thing/dog then it would be possible to GET http://example.org/Basil with full fidelity. Right now it doesn't, but I'd argue that what you could get with media type image/png would still be a valid, if seriously incomplete representation of my dog. In other words, a description of a thing shares characteristics with the thing itself, and that's near enough for HTTP representation purposes. It might be for HTTP, but not for RDF (and up) representational purposes. And as this entire brouhaha only arose when people started worrying about semantics at the RDF level (and up), this is not a particularly helpful remark. The basic mistake you (and others) are making is to conflate reference with similarity. A description of a thing shares NO characteristics with the thing it describes. Describing is not being-somewhat-similar-to. 
For an early (1726) but still insightful explanation of what is wrong with this idea, see http://4umi.com/swift/gulliver/laputa/5 : We next went to the School of Languages, where three Professors sate in Consultation upon improving that of their own country. The first Project was to shorten Discourse by cutting Polysyllables into one, and leaving out Verbs and Participles, because in reality all things imaginable are but Nouns. The other, was a Scheme for entirely abolishing all Words whatsoever; and this was urged as a great Advantage in Point of Health as well as Brevity. For it is plain, that every Word we speak is in some Degree a Diminution of our Lungs by Corrosion, and consequently contributes to the shortning of our Lives. An Expedient was therefore offered, that since Words are only Names for Things, it would be more convenient for all Men to carry about them, such Things as were necessary to express the particular Business they are to discourse on. And this Invention would certainly have taken Place, to the great Ease as well as Health of the Subject, if the Women in conjunction with the Vulgar and Illiterate had not threatned to raise a Rebellion, unless they might be allowed the Liberty to speak with their Tongues, after the manner of their Ancestors; such constant irreconcilable Enemies to Science are the common People. However, many of the most Learned and Wise adhere to the New Scheme of expressing themselves by Things, which hath only this Inconvenience attending it, that if a Man's Business be very great, and of various kinds, he must be obliged in Proportion to carry a greater bundle of Things upon his Back, unless he can afford one or two strong Servants to attend him. 
I have often beheld two of those Sages almost sinking under the Weight of their Packs, like Pedlars among us; who, when they met in the Streets, would lay down their Loads, open their Sacks, and hold Conversation for an Hour together; then put up their Implements, help each other to resume their Burthens, and take their Leave. But for short Conversations a Man may carry Implements in his Pockets and under his Arms, enough to supply him, and in his House he cannot be at a loss: Therefore the Room where Company meet who practise this Art, is full of all Things ready at Hand, requisite to furnish Matter for this kind of artificial Converse. Another great Advantage proposed by this Invention, was that it would serve as a Universal Language to be understood in all civilized Nations, whose Goods and Utensils are generally of the same kind, or nearly resembling, so that their Uses might easily be comprehended. And thus Embassadors would be qualified to treat with foreign Princes or Ministers of State to whose Tongues they were utter Strangers. Pat Cheers, Danny. -- http://dannyayers.com http://webbeep.it - text to tones and back again IHMC (850)434 8903 or (650)494 3973 40 South Alcaniz St. (850)202 4416 office Pensacola
Data Driven Discussions about httpRange-14, etc (was: Re: Change Proposal for HttpRange-14)
Hi Jeni, On 27 March 2012 18:54, Jeni Tennison j...@jenitennison.com wrote: Hi Tom, On 26 Mar 2012, at 17:13, Tom Heath wrote: On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote: Tom, On 26 Mar 2012, at 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. What hard data do you think would resolve (or if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data? A few things come to mind: 1) a rigorous assessment of how difficult people *really* find it to understand distinctions such as things vs documents about things. I've heard many people claim that they've failed to explain this (or similar) successfully to developers/adopters; my personal experience is that everyone gets it, it's no big deal (and IRs/NIRs would probably never enter into the discussion). How would we assess that though? 
Give me some free time and enough motivation and I'd design an experimental protocol to unpick this issue ;) My experience is in some way similar -- it's easy enough to explain that you can't get a Road or a Person when you ask for them on the web -- but when you move on to then explaining how that means you need two URIs for most of the things that you really want to talk about, and exactly how you have to support those URIs, it starts getting much harder. My original question was only about the distinction, but yes, some of the details do get tricky, but when was it ever otherwise with technology? The biggest indication to me that explaining the distinction is a problem is that neither OGP nor schema.org even attempts to go near it when explaining to people how to add semantic information into their web pages. The URIs that you use in the 'url' properties of those vocabularies are explained in terms of 'canonical URLs' for the thing that is being talked about. These are the kinds of graphs that millions of developers are building on, and those developers do not consider themselves linked data adopters and will not be going to linked data experts for training. Yeah, this is a shame (the OGP/schema.org bit, and the fact they won't be asking for LD training ;). IIRC Ian Davis proposed a schema-level workaround for this around the time OGP was released. He had a good case that it was a non-problem technically, but no, that doesn't explain why the distinction is not baked into the data model; same with microformats. 2) hard data about the 303 redirect penalty, from a consumer and publisher side. Lots of claims get made about this but I've never seen hard evidence of the cost of this; it may be trivial, we don't know in any reliable way. I've been considering writing a paper on this for the ISWC2012 Experiments and Evaluation track, but am short on spare time. If anyone wants to join me please shout. 
I could offer you a data point from legislation.gov.uk if you like. Woohoo! You've made my decade :D When someone requests the ToC for an item of legislation, they will usually hit our CDN and the result will come back extremely quickly. I just tried: curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents and it showed the result coming back in 59ms. When someone uses the identifier URI for the abstract concept of an item of legislation, there's no caching so the request goes right back to the server. I just tried: curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67 and it showed the result coming back in 838ms; of course the redirection goes to the ToC above, so in total it takes around 900ms to get back the data. Brilliant. This is just the kind of analysis I'm talking about. Now we need to do similar across a bunch of services, connection speeds, locations, etc., and then compare it to typical response times across a representative sample of web sites. We use New Relic for this kind of thing, and the results are rather illuminating. 1ms response times make you rather special IIRC. That's not to excuse sluggish sites, but just to put this in context. So every time that we refer to an item of legislation through its generic identifier rather than a direct link to its ToC we are making the
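For what it's worth, the figures quoted in this email imply roughly a fifteen-fold penalty for dereferencing the identifier URI. A back-of-envelope sketch using only the numbers quoted above (not fresh measurements):

```python
# Quoted timings from the legislation.gov.uk email: 59 ms for the
# CDN-cached ToC document, 838 ms for the uncacheable 303 response
# from the /id/ URI.
CACHED_TOC_MS = 59
REDIRECT_MS = 838

def total_via_id_uri():
    # Dereferencing the identifier URI costs the 303 round trip plus
    # the follow-up fetch of the ToC document it redirects to.
    return REDIRECT_MS + CACHED_TOC_MS

print(total_via_id_uri())   # 897 ms total, i.e. the ~900 ms in the email
```

The point of the arithmetic: the penalty is dominated by the uncacheable redirect leg, so CDN-caching the final document barely helps the client who starts from the identifier URI.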
Re: Change Proposal for HttpRange-14
Hi Giovanni, On 27 March 2012 21:01, Giovanni Tummarello giovanni.tummare...@deri.org wrote: Tom if you were to do a serious assessment then measuring milliseconds and redirect hits means looking at a misleading 10% of the problem. Sorry, but I don't buy your argument, which equates to never asking any questions unless you can also answer at the same time all the others that pertain to the same issue. You've gotta start somewhere, and just be realistic about the extent of claims you make based on the evidence. As for economics and perception of benefits, you're talking about broader issues of Linked Data adoption. My original point was only about httpRange-14. Tom. Cognitive loads, economics and perception of benefits are over 90% of the question here. An assessment that could begin describing the issue:
* get a normal webmaster, calculate how much it takes to explain the thing to him, follow him on and
* see how quickly he forgets,
* assess how much it takes to VALIDATE the whole thing works (e.g. a newly implemented spec)
* assess what are the tools that would check if something breaks
* assess the same thing for implementers, e.g. of applications or consuming APIs, to get all the above
* then once you calculate the huge cost above, compare it with the perceived benefits.
THEN REDO ALL AT MANAGEMENT LEVEL once you're finished with the technical level because for sites that matter IT'S MANAGERS THAT DECIDE; geek-run websites don't count, sorry. Same thing when looking at 'real world applications': counting just geeky hacked-together demonstrators or semweb aficionados' libs has the same skew.. these people and apps were paid by EU money or research money, so they shouldn't count toward real world economics-driven apps, so if one was thinking of counting 50 apps that would break that'd be just as partial and misleading. .. and we could go on. Now do you really need to do the above? 
(let alone how difficult it is to do in proper terms) me and a whole crowd know already the results for the same exercise have been done over and over and we've been witnessing it. I sincerely hope this is the time we get this fixed so we can indeed go back and talk about the new linked data (linked data 2.0) to actual web developers, IT managers etc. removing the 303 thing doesn't solve the whole problem, it is just the beginning. Looking forward to discussing next steps Gio On Mon, Mar 26, 2012 at 6:13 PM, Tom Heath tom.he...@talis.com wrote: Hi Jeni, On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote: Tom, On 26 Mar 2012, at 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. What hard data do you think would resolve (or if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data? A few things come to mind: 1) a rigorous assessment of how difficult people *really* find it to understand distinctions such as things vs documents about things. I've heard many people claim that they've failed to explain this (or similar) successfully to developers/adopters; my personal experience is that everyone gets it, it's no big deal (and IRs/NIRs would probably never enter into the discussion). 
2) hard data about the 303 redirect penalty, from a consumer and publisher side. Lots of claims get made about this but I've never seen hard evidence of the cost of this; it may be trivial, we don't know in any reliable way. I've been considering writing a paper on this for the ISWC2012 Experiments and Evaluation track, but am short on spare time. If anyone wants to join me please shout. 3) hard data about occurrences of different patterns/anti-patterns; we need something more concrete/comprehensive than the list in the change proposal document. 4) examples of cases where the use of anti-patterns has actually caused real problems for people, and I don't mean problems in principle; have planes fallen out of the sky, has anyone died? Does it really matter from a consumption perspective? The answer to this is probably not, which may indicate a larger problem of non-adoption. The larger question is how do we get to a state where we *don't* have this
Re: Change Proposal for HttpRange-14
On 3/30/12 11:15 AM, Tom Heath wrote: I'm not saying I agree or disagree with any of the specifics, I'm just making a plea for us to raise the level of analysis to a point where we have some more robust evidence from which to draw conclusions. I'll do what I can to contribute, but I think we all need to pitch in and produce this evidence if the discussion and conclusions are going to be credible. Anecdotes and opinion only get us so far. Cheers, Tom. I'll have a DBpedia report published soon. Stay tuned :-) -- Regards, Kingsley Idehen
Re: Change Proposal for HttpRange-14
On 30 March 2012 17:39, Kingsley Idehen kide...@openlinksw.com wrote: On 3/30/12 11:15 AM, Tom Heath wrote: I'm not saying I agree or disagree with any of the specifics, I'm just making a plea for us to raise the level of analysis to a point where we have some more robust evidence from which to draw conclusions. I'll do what I can to contribute, but I think we all need to pitch in and produce this evidence if the discussion and conclusions are going to be credible. Anecdotes and opinion only get us so far. Cheers, Tom. I'll have a DBpedia report published soon. Stay tuned :-) Kingsley, I'd like to buy you a beer! (you too Jeni ;) Have a great weekend all, Tom. -- Dr. Tom Heath Senior Research Scientist Talis Education Ltd. W: http://www.talisaspire.com/ W: http://tomheath.com/
Re: Data Driven Discussions about httpRange-14, etc (was: Re: Change Proposal for HttpRange-14)
I think this may be a stuck record, but here goes… Would be nice, Tom, but. Yet again, the discussion around this issue is entirely focussed on a) aspects of logic, philosophy and the like; b) aspects of the problems of publishing; c) network issues. So where is the consumption aspect? The measure by which we decide if all this engineering is fit for purpose. Design all the protocols you want, but if you are not examining the right thing, it is not very helpful (to put it mildly). David Booth (sorry David!) said we need to deal with the engineering before addressing how we can educate people to understand it. And therefore, I would say, whether it is possible. This is not a recipe for building stuff that people can use. In fact it is not engineering at all. What is the definition of fit for purpose that you propose to use to define your protocols? My definition requires that it is suitable input for building real applications that ordinary people can use, informed by multiple and even unbounded sites from the Web of Data. Clearly, at a decent scale as well. This, I think, is the vision of Linked Data for many people. (There is also an Agent point of view, but I think we are miles from that at the moment.) If an argument cannot be made to support a point of view that has this as the end game, then I lose interest in the argument. But let us say, you have a definition of fit for purpose, and define your protocol to assess it for these questions. Tom, you said to Michael From: Tom Heath tom.he...@talis.com Subject: Re: Change Proposal for HttpRange-14 Of all people you guys at the BBC have great anecdotes, and clearly I found this really sad. To my knowledge, Michael has never consumed much in the way of other people's Linked Data. He has a fantastic wealth of knowledge about using Linked Data technologies to do Integration, which is a huge market for us. 
Various people keep asking for examples of applications that might be interesting data points of Linked Data consuming end-user applications to inform the discussion. But I have yet to see a satisfactory response. But the sad truth is that I am beginning to think that after all these years I (RKBExplorer.com) may still be the only one who has actually built anything that consumes data from across the Linked Data Cloud, uses it to enhance the knowledge, and then delivers it to ordinary people (OK, we might fail, but we try). This means that you, or others, just don't have enough data points to gather evidence for the assessment you want to do. But please do try - I would love to see detailed analysis of fit for purpose - and I know how much time and effort that takes! And yes, I am happy to provide you with any data I can. Best Hugh On 30 Mar 2012, at 17:22, Tom Heath wrote: Hi Jeni, On 27 March 2012 18:54, Jeni Tennison j...@jenitennison.com wrote: Hi Tom, On 26 Mar 2012, at 17:13, Tom Heath wrote: On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote: Tom, On 26 Mar 2012, at 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. What hard data do you think would resolve (or if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? 
Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data? A few things come to mind: 1) a rigorous assessment of how difficult people *really* find it to understand distinctions such as things vs documents about things. I've heard many people claim that they've failed to explain this (or similar) successfully to developers/adopters; my personal experience is that everyone gets it, it's no big deal (and IRs/NIRs would probably never enter into the discussion). How would we assess that though? Give me some free time and enough motivation and I'd design an experimental protocol to unpick this issue ;) My experience is in some way similar -- it's easy enough to explain that you can't get a Road or a Person when you ask for them on the web -- but when you move on to then explaining how that means you need two URIs for most of the things that you really want to talk about, and exactly how you have to support those URIs, it starts getting much harder. My original question was only about the distinction
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Fri, Mar 30, 2012 at 10:32 AM, Jeni Tennison j...@jenitennison.com wrote: I see best practices as being separate from normative requirements, and thought that the proposals were for the normative requirements. We did recognise in the proposal the requirement for a best practice document to supplement the normative requirements: This is a helpful discussion because I'm still trying to figure out the right way to say what I want to say, and with each iteration I think I come a bit closer to the point. My opinion is that any proposal needs to specify a way to say how you get from a resource to its content. I do a SPARQL query and find a URI for a resource based on metadata (stored in the triple store) that makes it seem interesting; title, license, rating, whatever. Then I want to *look at it*. What do I do? httpRange-14(a) (or its intended stronger form) says you do a GET on its URI, and if you get a 200, that's the content, that's what I want to look at. So that's successful communication. If you delete HR14a, which is fine, you need, IMO, to replace it with some other way - normative and actionable - to express the same information, and that method has to be provided normatively, not as a best practice. Tim's proposal does this, my SHOULD not MUST proposal does, yours doesn't. And a reminder that I *do* understand content negotiation; you don't actually get the content but rather a content or one of its many contentses. The normative part would be the specification of this property; the best practice would just be that you should use it, if the resource has content on the web. Of course there are many situations where you wouldn't use it, because you don't have the content, want to hide it, don't want to be bothered, don't know where it is, etc. That's OK. Sure, it's nice to be able to GET a description, as you have specified, but that doesn't help in general, e.g. in the PICS/POWDER use cases and what I gave above. This is an easy fix to your proposal. 
You just add a normative section that defines a property that people *may* use to provide this information: <http://example/foo> baz:hasContentUri <http://example/foo-content> . or whatever you want to call it (Larry suggested 'location', I suggested 'hasInstanceUri'). This means that to get the content you do a GET on that URI, and if the result is a 200 then you got content, otherwise all bets are off. (Well, dealing with 301/302/307 would be gravy.) Then the proposal will not be a net loss as far as expressive power goes. Opt-in to HR14a looks like this: <http://example/foo> baz:hasContentUri <http://example/foo> . but nobody *has* to do that. There are problems with this idea, such as what if an agent can't parse the particular flavor of RDF that's in use, but before we get into that I want to see if you understand what I'm suggesting. Jonathan On Fri, Mar 30, 2012 at 10:32 AM, Jeni Tennison j...@jenitennison.com wrote: Jonathan, On 30 Mar 2012, at 01:51, Jonathan A Rees wrote: On Tue, Mar 27, 2012 at 6:01 PM, Jeni Tennison j...@jenitennison.com wrote: Good practice would be for Flickr to use separate URIs for 'the photograph' and 'the description of the photograph', to ensure that 'the description of the photograph' was reachable from 'the photograph' and to ensure that any statements referred to the correct one. Under the proposal, they could change to this good practice in four ways: 1. by adding: <link rel="describedby" href="#main" /> to their page (or pointing to some other URL that they choose to use for 'the description of the photograph') 2. by adding a Link: header with a 'describedby' relationship that points at a separate URI for 'the description of the photograph' (possibly a fragment as in 1?) Sorry, I didn't get why these are said to be better practice than the current Flickr page - how the document distinguishes the two cases. Does it say there 'should' or 'must' be a describedby? If the info resource assumption is gone, won't the Flickr page [still?] 
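Jonathan's opt-in rule is mechanical enough to sketch in a few lines. A minimal sketch in Python, assuming the retrieved RDF has already been parsed into (subject, predicate, object) tuples; `baz:hasContentUri` is his own placeholder property name, not a standardised vocabulary:

```python
# Sketch of Jonathan's proposed opt-in rule. The property name below is
# his illustrative placeholder from the thread, not a real vocabulary term.
BAZ_HAS_CONTENT_URI = "baz:hasContentUri"

def content_uri(triples, resource_uri):
    """Return the URI to GET for the resource's content, if declared.

    `triples` is an iterable of (subject, predicate, object) strings,
    e.g. as parsed from the RDF retrieved for `resource_uri`.
    """
    for s, p, o in triples:
        if s == resource_uri and p == BAZ_HAS_CONTENT_URI:
            return o
    return None  # no declaration: all bets are off

# Opt-in to the old httpRange-14(a) behaviour: the resource points at itself.
triples = [("http://example/foo", BAZ_HAS_CONTENT_URI, "http://example/foo")]
assert content_uri(triples, "http://example/foo") == "http://example/foo"

# Content hosted at a distinct URI:
triples = [("http://example/foo", BAZ_HAS_CONTENT_URI, "http://example/foo-content")]
assert content_uri(triples, "http://example/foo") == "http://example/foo-content"
```

The point of the sketch is that the rule is actionable by machine: the client does a GET on whatever `content_uri` returns, and a 200 means it got content.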
be understood the way Flickr intends? I'll have to study the proposal again (sorry, very hurried now, can't keep up) I see best practices as being separate from normative requirements, and thought that the proposals were for the normative requirements. We did recognise in the proposal the requirement for a best practice document to supplement the normative requirements: We also recommend that a clear guide on best practices when publishing and consuming data should be written, possibly an update to [cooluris]. I don't see this proposal as changing the current best practice recommendations, which are to have separate URIs for documents about things from the things themselves. I'm not sure I've understood your second question, but perhaps you're saying that using hash URIs for fragments of the page that contains descriptions doesn't work when mixed with the assumption that you get the description of that hash URI by
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Jonathan, On 30 Mar 2012, at 18:10, Jonathan A Rees wrote: My opinion is that any proposal needs to specify a way to say how you get from a resource to its content. [ . . . ] If you delete HR14a, which is fine, you need, IMO, to replace it with some other way - normative and actionable - to express the same information, and that method has to be provided normatively, not as a best practice. Tim's proposal does this, my SHOULD not MUST proposal does, yours doesn't. OK, I think I see. The intention of the 'no longer implies' proposal is that you GET its URI. If you get a 200 then you have to look at the content that comes back to work out the relationship between the URI and the representation, because you can't generally tell whether the representation is content or description. Your assertion, I think, is that we haven't specified a mechanism for providing an explicit statement within the content that says this stuff you got is the content of the resource this URI identifies, only one for saying the stuff over there is the description of the resource this URI identifies. The intention was for the :describedby property to double up for this. The proposal states that if the content includes a statement using the :describedby property in which the resource is the object of the statement, then you know that the resource is an information resource (ie that you get the content of the resource from the URI). So if you GET U and you get a 200 and it contains something that looks like: _:something :describedby U . then you know that what you have gotten from U is the content of U. 
You say the gap can be fixed with: This is an easy fix to your proposal. You just add a normative section that defines a property that people *may* use to provide this information: <http://example/foo> baz:hasContentUri <http://example/foo-content> . [ . . . ] Then the proposal will not be a net loss as far as expressive power goes. I *think* that the :describedby triple, as defined in the proposal, provides equivalent information. If you have: U :describedby V . then you can turn it into: V :hasContentUri U . and it has the same meaning. What have I missed? Is it important that U is a string rather than a resource, for example? Cheers, Jeni -- Jeni Tennison http://www.jenitennison.com
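Jeni's claimed equivalence is a purely mechanical rewrite, which a short sketch can make concrete. This illustrates the thread's placeholder properties (`:describedby`, `:hasContentUri`); neither is a standardised vocabulary term:

```python
# Sketch of the equivalence Jeni claims: a :describedby triple can be
# rewritten into the :hasContentUri form Jonathan asked for, and a
# :describedby triple with U as object marks U's retrieval as content.

def describedby_to_has_content_uri(triple):
    """Rewrite (U, :describedby, V) as (V, :hasContentUri, U)."""
    s, p, o = triple
    if p != ":describedby":
        raise ValueError("not a :describedby triple")
    return (o, ":hasContentUri", s)

def is_information_resource(triples, u):
    """Per the proposal as Jeni describes it: if the content retrieved
    from U contains any triple with U as the *object* of :describedby,
    then what was retrieved is the content of U."""
    return any(p == ":describedby" and o == u for _, p, o in triples)

assert describedby_to_has_content_uri(("U", ":describedby", "V")) == ("V", ":hasContentUri", "U")
assert is_information_resource([("_:something", ":describedby", "U")], "U")
```

Jonathan's open question (whether it matters that U is a literal rather than a resource) is exactly the kind of thing this tuple-level sketch glosses over.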
Re: Data Driven Discussions about httpRange-14, etc (was: Re: Change Proposal for HttpRange-14)
Hi Tom, On 30 Mar 2012, at 17:22, Tom Heath wrote: On 27 March 2012 18:54, Jeni Tennison j...@jenitennison.com wrote: 2) hard data about the 303 redirect penalty, from a consumer and publisher side. Lots of claims get made about this but I've never seen hard evidence of the cost of this; it may be trivial, we don't know in any reliable way. I've been considering writing a paper on this for the ISWC2012 Experiments and Evaluation track, but am short on spare time. If anyone wants to join me please shout. I could offer you a data point from legislation.gov.uk if you like. Woohoo! You've made my decade :D When someone requests the ToC for an item of legislation, they will usually hit our CDN and the result will come back extremely quickly. I just tried: curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents and it showed the result coming back in 59ms. When someone uses the identifier URI for the abstract concept of an item of legislation, there's no caching, so the request goes right back to the server. I just tried: curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67 and it showed the result coming back in 838ms; the redirection then goes to the ToC above, so in total it takes around 900ms to get the data back. Brilliant. This is just the kind of analysis I'm talking about. Now we need to do similar tests across a bunch of services, connection speeds, locations, etc., and then compare them to typical response times across a representative sample of web sites. We use New Relic for this kind of thing, and the results are rather illuminating. 1ms response times make you rather special IIRC. That's not to excuse sluggish sites, but just to put this in context. So every time that we refer to an item of legislation through its generic identifier rather than a direct link to its ToC we are making the site seem about 15 times slower. So now we're getting down to the crux of the question: does this outcome really matter?! 
15x almost nothing is still almost nothing! 15x slower may offend our geek sensibilities, but probably doesn't matter in practice when the absolute numbers are so small. To give another example, I just did some very ad-hoc tests on some URIs at a department of a well-known UK university, and the results were rather revealing! The total response time (get URI of NIR, receive 303 response, get URI of IR, receive 200 OK and resource representation back) took ~10s, of which ***over 90%*** was taken up by waiting for the page/IR about that NIR to be generated! (and that's with curl, not a browser, which may then pull in a bunch of external dependencies). In this kind of situation I think there are other, bigger issues to worry about than the 1s taken for a 303-based round trip!! Just to put this into context for you so that you understand why it's a big deal: we have a contract [1] (well, actually three contracts) that specifies that the average time to retrieve a typical table of contents or section must be less than one second. In the England/Wales contract [2], it's Clauses 12-13 of Section 6.8 of Schedule 1, on page 125 if you want to take a look. The contract includes financial penalties when these targets aren't reached. It's not easy to reach these targets with the kind of complex content we're dealing with. The only way we have a hope is by caching the hell out of the site and delivering it through a CDN. Now we could quibble over how exactly you measure the length of time for retrieving a section or table of contents, but it's really clear that what the customer (TNA) wants is a performant website that doesn't suffer from the noticeable delay you get when a page takes more than a second to come through [3]. If we had 303 hops, they would definitely be complaining (remember the 900ms doesn't include downloading CSS and JavaScript, which add delays), and it could cost TSO money. 
I'm absolutely prepared to believe that there are sites out there that don't have these limitations: I don't really care if it takes more than a second for pages on my own website to get returned, for example. But for large-scale websites like legislation.gov.uk, delivered under contracts that have penalty clauses for poor performance, yes it really really does matter that it's 60ms rather than 900ms. Cheers, Jeni [1] http://www.contractsfinder.businesslink.gov.uk/Common/View%20Notice.aspx?site=1000&lang=en&noticeid=272362&fs=true [2] http://www.contractsfinder.businesslink.gov.uk/~/docs/DocumentDownloadHandler.ashx?noticeDocumentId=18140&fileId=b826ad80-f316-493a-a86d-23546ceb95e2 [3] http://www.useit.com/papers/responsetime.html -- Jeni Tennison http://www.jenitennison.com
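The structural cost Jeni describes is easy to reproduce locally. The sketch below (the paths, port, and payload are invented for the demo; this is not legislation.gov.uk's real setup) stands up a tiny origin server and counts how many requests reach it for a direct document link versus a 303-redirecting identifier URI:

```python
# Demo: a 303 from the identifier URI to the document URI doubles the
# number of requests that reach the origin server. All names invented.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

hits = []  # every path that reaches the "origin server"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        hits.append(self.path)
        if self.path == "/id/thing":          # identifier URI: 303 redirect
            self.send_response(303)
            self.send_header("Location", "/doc/thing")
            self.end_headers()
        else:                                  # document URI: 200 + content
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"table of contents")

    def log_message(self, *args):              # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Direct link to the document: one request reaches the server.
urllib.request.urlopen(f"http://127.0.0.1:{port}/doc/thing").read()
direct_hits = len(hits)

# Link via the identifier URI: urllib follows the 303, so two requests.
hits.clear()
urllib.request.urlopen(f"http://127.0.0.1:{port}/id/thing").read()
redirected_hits = len(hits)

print(direct_hits, redirected_hits)  # 1 2
server.shutdown()
```

And since HTTP forbids caching 303 responses by default, that extra hit cannot be absorbed by a CDN the way the 200 can, which is the crux of Jeni's 59ms-vs-900ms observation.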
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Thu, 2012-03-29 at 01:37 +0100, Norman Gray wrote: [ . . . ] Thus as it stands, the term 'information resource' in [1] has no implication (beyond incidentally reiterating that the 200-retrieved content is a (REST) representation of the resource). However, the point of introducing the term is, I've always taken it, that it licenses the client to jump to some conclusions. These conclusions aren't spelled out anywhere, but (unless you're being whimsical) they're things like 'this is a document', or 'this is a network thing', or 'this is not a squawking macaw which will squeeze out of the ethernet port and crap on my keyboard'. What those conclusions materialise as in practice surely _depends on the application_ which is processing the resource. Exactly. And that is precisely why the UDDP Proposal uses the term information resource but explicitly leaves its definition unconstrained: http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol#2.7_Information_resource As I mentioned elsewhere, the term is not needed, and could be eliminated entirely. But it does provide a convenience for applications that wish to make additional assumptions based on an HTTP 200 response. -- David Booth, Ph.D. http://dbooth.org/ Opinions expressed herein are those of the author and do not necessarily reflect those of his employer.
Re: Change Proposal for HttpRange-14
On 27/03/2012 18:12, Kingsley Idehen kide...@openlinksw.com wrote: On 3/27/12 12:35 PM, Michael Smethurst wrote: On 27/03/2012 16:53, Kingsley Idehen kide...@openlinksw.com wrote: On 3/27/12 11:17 AM, Michael Smethurst wrote: No sane publisher trying to handle a decent amount of traffic is gonna follow the dbpedia pattern of doing it in one step (conneg to 303) and picking up 2 server hits per request. I've said here before that the dbpedia publishing pattern is an anti-pattern and shouldn't be encouraged Circa 2006-2007, with the Linked Data bootstrap via the LOD project as top priority, the goal was simple: unleash Linked Data in a manner that just worked. That meant catering for: 1. frameworks and libraries that send hash URIs over the wire 2. working with all browsers, no excuses. Linked Data is now alive and in broad use (contrary to many misconceptions), and there is still a need for slash URIs. This isn't a matter of encouragement or discouragement, it's a case of what works for the project goals at hand. If slash URIs don't work then use hash URIs or vice versa. Platforms that conform to Linked Data meme principles should be able to handle these scenarios. BTW - imagine a scenario where Linked Data only worked with one style of URI: where would we be today or tomorrow, re. Linked Data? Being dexterous and unobtrusive has to be a celebrated feature rather than a point of perpetual distraction. My point wasn't about hashes or slashes or any style of URI. Your comment was: No sane publisher trying to handle a decent amount of traffic is gonna follow the dbpedia pattern of doing it in one step (conneg to 303) and picking up 2 server hits per request. Yes, but I was making a point about the one step, not the slashes or the 303 You described the DBpedia method of doing things as being one to be discouraged. DBpedia deploys Linked Data via slash URIs, hence my response. 
Also, in the context of Linked Data, slash URIs ultimately lead to the contentious 303 entity name / web resource address disambiguation heuristic. But they don't have to lead to doing conneg and 303 in one step It was about conflating 303s ((I can't give you that but) here's something that might be useful) with conneg (here's the useful thing in the representation you asked for). 303 isn't a conflation of anything. It's a redirection mechanism that can be used in different ways. Sometimes it facilitates access to alternative representations and sometimes it can just be used to facilitate indirection re. data access by name reference as per Linked Data principles. I didn't say 303 is a conflation. I said when you conflate the 303 part with the conneg... that's a conflation. I don't think it's 303's job to facilitate access to alternative representations; that's what conneg's for. DBpedia does:

thing that's not a web document --[conneg + 303]--> representation of a web document

Instead of:

thing that's not a web document --[303]--> resource URI for a web document --[conneg]--> representation of web document

If you do the latter then all html links can point to the resource URI of the web document, so the publisher still incurs a conneg cost for each request (which is reasonable) but doesn't incur a 303 cost for every request (which isn't). 
The only place you need to refer to the URI of the thing that's not a web document is when you want to make statements about it If you do the former then (as per dbpedia) you end up linking to the thing that is not a web document and picking up a 303 penalty for every request So I'm saying that you can do slashes and 303s but in a way that's more palatable to publishers than dbpedia I still think there are other problems with 303s: - some people who want to publish linked data just don't have access to configure their server to do this (which would also be a problem for any new 20x response) - persuading your manager and your manager's manager and your manager's manager's manager (not to mention ops!) is not easy But this is heading off topic so apologies Michael In the Linked Data system, you are seeking the description of an entity that's been identified using a URI. If it so happens that the URI is hashless (or slash based), the system doesn't reply with an actual entity descriptor resource address, it redirects you. The very same thing happens with a hash URI, but it has the benefit of delivering said indirection and disambiguation implicitly. There is always indirection in play. 303 isn't conflation, it's simply redirection that is exploitable in a variety of ways. And about how not exposing the generic IR URI and not linking to it imposes too high a penalty Here are the potential penalties, both ultimately about entity name / entity descriptor (description) resource address disambiguation: 1. 303 round trip costs 2. agreement about which relations and constituent predicates provide agreed-upon semantics that address
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Greetings. [This is a late response, because I dithered about sending it, because this whole thing seems simple enough that I've got to be missing stuff] On 2012 Mar 27, at 14:02, Jonathan A Rees wrote: On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de wrote: Hello Tim, On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote: 12) Still people say well, to know whether I use 200 or 303 I need to know if this sucker is an IR or NIR when instead they should be saying Well, am I going to serve the content of this sucker or information about it?. I think the question should be does the response contain the content of it because I can serve both at once (<foaf:PersonalProfileDocument rdf:about="">). Yes, this is the question - is the retrieved representation content (I used the word instance but it's not catching on), or description. It can be both. Fine -- that seems the key question. In some ideal world, everything on the web would come with RDF which explained what it was; but expecting that ever to happen would be mad. The HR14 resolution gives one answer to this, by doing _two_ things. Step 1. HR14 declares the existence of a subset of resources named 'IR'. You can gloss this set as 'information resource', or 'document', note that the set is vague, or deny that the set is important, but that doesn't matter. Step 2. HR14 gives a partial algorithm for deciding whether a URI X names a resource in IR: If you get a 200 when you dereference X, the resource is conclusively in IR. End of story. (you can all suck eggs, now, yes?) Why does the set IR matter? (and pace Tim and various weary voices in this metathread, I think it does matter). 
Because saying 'X names a resource in IR' tells you that the URI and the associated resource have a Particularly Simple Relationship -- the content of the HTTP retrieval is the 'content' of the resource (in some way which probably doesn't have to be precise, but which asserts that resource is something, unlike a Macaw, that can come through a network). In this way -- crucially -- it answers Tim's question (12) above: retrieving X with a 200 status obtains the content of the sucker. So the concept of 'IR' does do some work because it gives the client information about the object. Right? BUT, we (obviously) also want to talk about things where there's a slightly more complicated relationship between the URI and some resource (eg a URI which names a bird). In this case, the extra information (that the URI and the resource have a Particularly Simple Relationship) would be false. The cost of a particularly simple step 2 above is the (in retrospect variously costly) indirection of the 303-dance. So the whole discussion seems to be about whether and how to relax step 2. Jeni Tennison's proposal says it should be relaxed in the presence of a 'describedby' link, David Booth's that it should be relaxed with a new definedby link, or a (self-)reference with rdfs:isDefinedBy. My 'proposal' was that it could be relaxed even more minimally, by saying that placing the resource in IR (step 2 above) could be done by the client only if this didn't contradict any RDF in the content of the resource (because the RDF said that X named a person, say), however conveyed (and of course these two proposals achieve that). After all this torrent of messages (and I have honestly tried to read a significant fraction of them, and associated documents), I'm still not seeing how this is problematic. Perhaps I'm slow, or I've read the wrong fraction of messages. * Anything that was HR14-compliant will still be compliant with the relaxed Step 2. No change. 
* Any resource that wasn't in IR before, but whose URI nonetheless produced 200, was formally broken. It was telling lies. With a relaxed Step 2, it now won't be broken any more. Some applications (Tabulator?) will have to change to respect that, but they couldn't tell they were being lied to before, so they're merely exchanging one problem for a fixable one. * This is insensitive to the definition of 'information resource', and it doesn't matter if the content is multiple things. If a resource 200-says that its URI names a Book, then you don't have to worry whether that's an 'information resource' or not, because you know it's a book; end of algorithm; do not go to the end of Step 2; do not add any extra information hacked/derived from protocol details. That seems an inexpensive change which un-breaks a lot of things. All the best (in some puzzlement), Norman -- Norman Gray : http://nxg.me.uk SUPA School of Physics and Astronomy, University of Glasgow, UK
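One way to read Norman's relaxed Step 2 as an algorithm (this is an interpretation of his prose, not a spec, and it reduces 'contradiction' to a simple type check, which is exactly the part Jonathan argues is the hard bit; the type names are illustrative):

```python
# Sketch of Norman's relaxed Step 2: a 200 licenses the client to place
# the resource in IR *unless* RDF in the retrieved content says the URI
# names something that cannot be an information resource. The type list
# is an invented stand-in for the contradiction check.
NON_IR_TYPES = {"foaf:Person", "ex:Book", "ex:Road"}  # illustrative only

def classify(status, asserted_types):
    """Decide IR membership from the HTTP status code and any rdf:type
    statements about the request URI found in the retrieved content."""
    if status != 200:
        return "unknown"      # Step 2 says nothing without a 200
    if NON_IR_TYPES & set(asserted_types):
        return "not-IR"       # the content contradicts IR-ness
    return "IR"               # the classic httpRange-14(a) conclusion

assert classify(303, []) == "unknown"
assert classify(200, []) == "IR"
assert classify(200, ["ex:Book"]) == "not-IR"
```

Note how the last case captures Norman's Book example: the resource 200-says its URI names a Book, so the client never has to decide what an 'information resource' is.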
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hello Norman, let me summarize that:

- Regardless of how you define IR, everything that denotes what it accesses should lie in IR.
- Putting something in NIR therefore also answers the question of whether it denotes what it accesses, with 'no', by entailment.
- There may or may not be IRs that do not denote what they access.

But would it not be simpler just to signal this URI does not access what it denotes for 200 status codes instead of signalling this URI is a NIR? Regards, Michael Brunnbauer On Wed, Mar 28, 2012 at 06:59:05PM +0100, Norman Gray wrote: [ . . . ]
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Wed, Mar 28, 2012 at 1:59 PM, Norman Gray nor...@astro.gla.ac.uk wrote: [ . . . ] 
Because saying 'X names a resource in IR' tells you that the URI and the associated resource have a Particularly Simple Relationship -- the content of the HTTP retrieval is the 'content' of the resource (in some way which probably doesn't have to be precise, but which asserts that resource is something, unlike a Macaw, that can come through a network). In this way -- crucially -- it answers Tim's question (12) above: retrieving X with a 200 status obtains the content of the sucker. So the concept of 'IR' does do some work because it gives the client information about the object. Right? Wrong. Just knowing that it is an IR is not sufficient. You made a logical leap, unjustified by anything written down anywhere, that it was an IR *that had that content*. The Flickr and Jamendo examples are perfectly consistent with the URI naming an IR, but the content you get is not content of the IR described by the RDF therein, so they name a different IR. But let's grant this, as it can easily be fixed with a small clarification, and move on. It does not really bear on your proposal anyhow. BUT, we (obviously) also want to talk about things where there's a slightly more complicated relationship between the URI and some resource (eg a URI which names a bird). In this case, the extra information (that the URI and the resource have a Particularly Simple Relationship) would be false. The cost of a particularly simple step 2 above, is the (in retrospect variously costly) indirection of the 303-dance. So the whole discussion seems to be about whether and how to relax step 2. Jeni Tennison's proposal says it should be relaxed in the presence of a 'describedby' link, David Booth's that it should be relaxed with a new definedby link, or a (self-)reference with rdfs:isDefinedBy. 
My 'proposal' was that it could be relaxed even more minimally, by saying that placing the resource in IR (step 2 above) could be done by the client only if this didn't contradict any RDF in the content of the resource (because the RDF said that X named a person, say), however conveyed (and of course these two proposals achieve that). You are asking the right question, and I applaud the effort. I think many people would like a solution similar to this one. But IMO looking for a contradiction is not actionable, and for me that's a recipe for disaster, since it forces human judgment to intervene in each case. Human judgment is both expensive and unreliable. Contradictions are impossible to test by machine. The consistency of statements such as dc:creator or rdfs:comment with what the content is lies outside what machines can do. So you put humans in the path of deciding whether there is a contradiction, and therefore what the URI mode is. This doesn't sound good to me. Second, we know OWL Full consistency (i.e. contradiction detection) is undecidable, and OWL DL can be pretty hard. How did deciding the URI mode come to depend on what logic is being used, and become so complicated? Third, the RDF could be accidentally consistent with what the content is, when the intent was for the URI to refer to something that
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hello Norman, "Regardless of how you define IR, everything that denotes what it accesses should lie in IR. Putting something in NIR therefore also answers the question of whether it denotes what it accesses, with 'no', by entailment." I have worded this very badly. We are talking about things and names of things. This should be:

For all URIs U: denote(U) = access(U) -> denote(U) a IR

It follows:

For all URIs U: denote(U) not a IR -> denote(U) != access(U)

"There may or may not be IRs that do not denote what they access." And this should be: There is a URI U where: denote(U) a IR and denote(U) != access(U). Now if I am allowed to mint a URI that 303's to your homepage, and your homepage is an IR, such a URI must exist:

U1 = Your URI for your homepage
U2 = My URI for your homepage

denote(U1) a IR
denote(U2) != access(U2)
denote(U1) = denote(U2)

therefore denote(U2) a IR and denote(U2) != access(U2)

I think I'll stay out of this discussion from now on :-) Regards, Michael Brunnbauer -- ++ Michael Brunnbauer ++ netEstate GmbH ++ Geisenhausener Straße 11a ++ 81379 München ++ Tel +49 89 32 19 77 80 ++ Fax +49 89 32 19 77 89 ++ E-Mail bru...@netestate.de ++ http://www.netestate.de/ ++ ++ Sitz: München, HRB Nr.142452 (Handelsregister B München) ++ USt-IdNr. DE221033342 ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
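The entailment Michael is after can be written out in one place, using his denote/access notation (a reader's gloss of the steps in his mail):

```latex
% Premise: whatever denotes what it accesses is an information resource.
\forall U:\ \mathrm{denote}(U) = \mathrm{access}(U) \;\rightarrow\; \mathrm{denote}(U) \in \mathrm{IR}
% Contrapositive:
\forall U:\ \mathrm{denote}(U) \notin \mathrm{IR} \;\rightarrow\; \mathrm{denote}(U) \neq \mathrm{access}(U)
% Instance: U_1 is your URI for your homepage, U_2 a 303-minted URI for it.
\mathrm{denote}(U_1) \in \mathrm{IR}, \qquad
\mathrm{denote}(U_2) \neq \mathrm{access}(U_2), \qquad
\mathrm{denote}(U_1) = \mathrm{denote}(U_2)
% Substituting equals for equals:
\mathrm{denote}(U_2) \in \mathrm{IR} \;\wedge\; \mathrm{denote}(U_2) \neq \mathrm{access}(U_2)
```

So U2 witnesses his claim that there is a URI whose denotation is an IR but is not what the URI accesses.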
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Michael, hello. On 2012 Mar 28, at 22:35, Michael Brunnbauer wrote: For all URIs U: denote(U) = access(U) -> denote(U) a IR It follows: For all URIs U: denote(U) not a IR -> denote(U) != access(U) I think it's impossible, within the terms of HR14, to say 'denote(U) not a IR' -- you can prove something is in IR, but you can neither prove nor even operationally assert that it's not. There may or may not be IRs that do not denote what they access. And this should be: There is a URI U where: denote(U) a IR and denote(U) != access(U). Now if I am allowed to mint a URI that 303's to your homepage and your homepage is an IR, such a URI must exist: U1 = Your URI for your homepage U2 = My URI for your homepage I don't think you even need the 303. If you make a URI, declare it to be a URI identifying my home page (I think David Booth has written about how you'd do this formally), and then have it 200-respond with a map of the Englischer Garten, then this places my homepage in IR, but does not access it. This appears to be compatible with HR14 in an only slightly perverse reading. I'm not sure what follows from that (but I suspect that way madness lies). I think I'll stay out of this discussion from now :-) I think we should draw a veil, here... Best wishes, Norman -- Norman Gray : http://nxg.me.uk SUPA School of Physics and Astronomy, University of Glasgow, UK
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hello Tim, On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote: 12) Still people say well, to know whether I use 200 or 303 I need to know if this sucker is an IR or NIR when instead they should be saying Well, am I going to serve the content of this sucker or information about it?. I think the question should be does the response contain the content of it because I can serve both at once (foaf:PersonalProfileDocument rdf:about=). Is there a difference between this question and the IR question if we take Dan's definition of IR as 'Web-serializable networked entity'? Regards, Michael Brunnbauer
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
This seems an appropriate place for me to drop in my 2 cents. I like the 303 trick. People that care about this stuff can use it (and appear to be doing so), but it doesn't really matter too much that people that don't care don't use it. It seems analogous to the question of HTML validity. Best practices suggest creating valid markup, but if it isn't perfect, it's not a big deal, most UAs will be able to make sense of it. There will be reduced fidelity of communication, sure, but there will be imperfections in the system whatever, so any trust/provenance chain will have to consider such issues anyway. So I don't really think Jeni's proposal is necessary, but don't feel particularly strongly one way or the other. Philosophically I reckon the flexibility of what a representation of a resource can be means that the notion of an IR isn't really needed. I've said this before in another thread somewhere, but if the network supported the media type thing/dog then it would be possible to GET http://example.org/Basil with full fidelity. Right now it doesn't, but I'd argue that what you could get with media type image/png would still be a valid, if seriously incomplete representation of my dog. In other words, a description of a thing shares characteristics with the thing itself, and that's near enough for HTTP representation purposes. Cheers, Danny. -- http://dannyayers.com http://webbeep.it - text to tones and back again
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On 3/27/12 7:59 AM, Danny Ayers wrote: This seems an appropriate place for me to drop in my 2 cents. I like the 303 trick. People that care about this stuff can use it (and appear to be doing so), but it doesn't really matter too much that people that don't care don't use it. It seems analogous to the question of HTML validity. Best practices suggest creating valid markup, but if it isn't perfect, it's not a big deal, most UAs will be able to make sense of it. There will be reduced fidelity of communication, sure, but there will be imperfections in the system whatever, so any trust/provenance chain will have to consider such issues anyway. So I don't really think Jeni's proposal is necessary, but don't feel particularly strongly one way or the other. Philosophically I reckon the flexibility of what a representation of a resource can be means that the notion of an IR isn't really needed. I've said this before in another thread somewhere, but if the network supported the media type thing/dog then it would be possible to GET http://example.org/Basil with full fidelity. Right now it doesn't, but I'd argue that what you could get with media type image/png would still be a valid, if seriously incomplete representation of my dog. In other words, a description of a thing shares characteristics with the thing itself, and that's near enough for HTTP representation purposes. Cheers, Danny. Amen!! We have resources that just 'mention' or 'refer' to *things* loosely, i.e., your typical Web page. RDF introduces resources that explicitly 'describe' unambiguously named *things* via URIs. RDFS and OWL introduce resources that explicitly 'define' unambiguously named *things* such as classes and properties via URIs. Linked Data (or Hyperdata) introduces resources that explicitly 'describe' and 'define' unambiguously named *things* via de-referencable URIs. When all is said and done, all of the above boils down to *representation fidelity* that one could order (hierarchically) as follows: 1. generic representation -- Web Pages 2. description-oriented representation -- RDF, which may or may not follow Linked Data principles 3. definition-oriented representation -- RDFS, OWL, which may or may not follow Linked Data principles. BTW -- I've published a work-in-progress post [1] that includes some diagrams (including the original WWW proposal depiction) re. Data, Documents, Content, URIs, and URLs. Links: 1. http://goo.gl/DRvQM -- Understanding Data. -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile: https://plus.google.com/112399767740508618350/about LinkedIn Profile: http://www.linkedin.com/in/kidehen
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de wrote: Hello Tim, On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote: 12) Still people say well, to know whether I use 200 or 303 I need to know if this sucker is an IR or NIR when instead they should be saying Well, am I going to serve the content of this sucker or information about it?. I think the question should be does the response contain the content of it because I can serve both at once (foaf:PersonalProfileDocument rdf:about=). Yes, this is the question - is the retrieved representation content (I used the word instance but it's not catching on), or description. It can be both. Is there a difference between this question and the IR question if we take Dans definition of IR as 'Web-serializable networked entity' ? There is a difference, since what is described could be an IR that does not have the description as content. A prime example is any DOI, e.g. http://dx.doi.org/10.1371/journal.pcbi.1000462 (try doing conneg for RDF). The identified resource is an IR as you suggest, but the representation (after the 303 redirect) is not its content. Another example (anti-httpRange-14) is http://www.flickr.com/photos/70365734@N00/6905069277/ The identified resource (according to the retrieved RDFa) is an IR, but the retrieved representation is not its content. In other words, even if the identified resource is an IR (under any definition), the question remains of whether the retrieved representation is content or description (except in the case where it is both). The two dimensions are orthogonal. Maybe I misunderstand your question. This whole information resource thing needs to just go away. I can't believe how many people come back to it after the mistake has been pointed out so many times. Maybe the TAG or someone has to make a statement admitting that the way httpRange-14(a) was phrased was a big screwup, that the real issue is content vs. description, not a type distinction. 
I think Jeni's proposal is to say that the Flickr URI is good practice, rather than deny it. My proposal is to say that the description-free situation is good practice, rather than just an undocumented common practice. In a hybrid world where some URIs work one way (by description) and others work the other way (by ostension), the question for anyone encountering a hashless http: URI in RDF is which of the two situations (or both) obtain. (Maybe there are some URIs that work neither way, or there is a gray area.) It would be nice if there were definite answers at least for some URIs. Jonathan Regards, Michael Brunnbauer
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On 3/27/12 9:02 AM, Jonathan A Rees wrote: A prime example is any DOI, e.g. http://dx.doi.org/10.1371/journal.pcbi.1000462 (try doing conneg for RDF). I don't always have to seek or need RDF. I just need structured data. I can make Linked Data from non-RDF resources. See: 1. http://uriburner.com/about/html/http://dx.doi.org/10.1371/journal.pcbi.1000462 -- a basic description 2. http://uriburner.com/about/id/entity/http/dx.doi.org/10.1371/journal.pcbi.1000462 -- an inferred description. You can use the rules of HttpRange-14 combined with the rules of Linked Data to make a description-oriented representation. Of course, I am doing translation and inference, but that's only possible due to the ground rules that are already in place etc. Thus, we have an identifier associated with data that's ended up being interpreted courtesy of key ground rules from HttpRange-14 findings and Linked Data principles. -- Regards, Kingsley Idehen
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Tue, Mar 27, 2012 at 9:32 AM, Kingsley Idehen kide...@openlinksw.com wrote: On 3/27/12 9:02 AM, Jonathan A Rees wrote: A prime example is any DOI, e.g. http://dx.doi.org/10.1371/journal.pcbi.1000462 (try doing conneg for RDF). I don't always have to seek or need RDF. I just need structured data. I can make Linked Data from non RDF resources. That wasn't my point. I was just giving an example where a 303 URI refers to an IR. This illustrates the idea that being defined by description does not imply that you have a non-IR (which I admit was not a point I had to make). That's all. You don't have to do the conneg if you don't want to; you just get a non-RDF description of the resource if you don't ask for RDF. If you don't like this example, look at the Flickr one instead. Jonathan
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On 3/27/12 9:02 AM, Jonathan A Rees wrote: Maybe the TAG or someone has to make a statement admitting that the way httpRange-14(a) was phrased was a big screwup, that the real issue is content vs. description, not a type distinction. It should! -- Regards, Kingsley Idehen
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hello Jonathan, so let the question be 'did I GET what the URI denotes?' and let httpRange-14 be: 200 -> yes, 303 -> no. Let another question be 'can this URI be used with document annotation properties?' (or: is this URI an IR?). From a 200 status code, I can infer that the URI can be used with document annotation properties, and use those properties. I can also use those properties with some 303 URIs, but not always. Both these questions may not be answered from a 200 status code in the future. Is all of this right? Regards, Michael Brunnbauer On Tue, Mar 27, 2012 at 09:02:04AM -0400, Jonathan A Rees wrote: On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de wrote: Hello Tim, On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote: 12) Still people say well, to know whether I use 200 or 303 I need to know if this sucker is an IR or NIR when instead they should be saying Well, am I going to serve the content of this sucker or information about it?. I think the question should be does the response contain the content of it because I can serve both at once (foaf:PersonalProfileDocument rdf:about=). Yes, this is the question - is the retrieved representation content (I used the word instance but it's not catching on), or description. It can be both. Is there a difference between this question and the IR question if we take Dans definition of IR as 'Web-serializable networked entity' ? There is a difference, since what is described could be an IR that does not have the description as content. A prime example is any DOI, e.g. http://dx.doi.org/10.1371/journal.pcbi.1000462 (try doing conneg for RDF). The identified resource is an IR as you suggest, but the representation (after the 303 redirect) is not its content. Another example (anti-httpRange-14) is http://www.flickr.com/photos/70365734@N00/6905069277/ The identified resource (according to the retrieved RDFa) is an IR, but the retrieved representation is not its content.
In other words, even if the identified resource is an IR (under any definition), the question remains of whether the retrieved representation is content or description (except in the case where it is both). The two dimensions are orthogonal. Maybe I misunderstand your question. This whole information resource thing needs to just go away. I can't believe how many people come back to it after the mistake has been pointed out so many times. Maybe the TAG or someone has to make a statement admitting that the way httpRange-14(a) was phrased was a big screwup, that the real issue is content vs. description, not a type distinction. I think Jeni's proposal is to say that the Flickr URI is good practice, rather than deny it. My proposal is to say that the description-free situation is good practice, rather than just an undocumented common practice. In a hybrid world where some URIs work one way (by description) and others work the other way (by ostension), the question for anyone encountering a hashless http: URI in RDF is which of the two situations (or both) obtain. (Maybe there are some URIs that work neither way, or there is a gray area.) It would be nice if there were definite answers at least for some URIs. Jonathan Regards, Michael Brunnbauer
Re: Change Proposal for HttpRange-14
On 26/03/2012 17:13, Tom Heath tom.he...@talis.com wrote: Hi Jeni, On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote: Tom, On 26 Mar 2012, at 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years' worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. No data here I fear; merely anecdote. But anecdote is usually the best form of data :-) What hard data do you think would resolve (or if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data? A few things come to mind: 1) a rigorous assessment of how difficult people *really* find it to understand distinctions such as things vs documents about things. I've heard many people claim that they've failed to explain this (or similar) successfully to developers/adopters; my personal experience is that everyone gets it, it's no big deal (and IRs/NIRs would probably never enter into the discussion). I think it's explainable. I don't think it's self-evident. And explanation can be tricky because: a) once you get past the obvious cases (a person and their homepage) there are further levels of abstraction that make things complicated.
A journalist submits a report to a news agency, a sub-editor tweaks it and puts it on the wires, a news publisher picks up the report, a journalist shapes an article around it, another sub-editor tweaks that, the article gets published, the article gets syndicated. Which document is the rdf making claims (created by, created at) about? And is that the important / interesting thing? You quickly head down a FRBR-shaped rabbit hole. b) The way people make and use websites (outside the whole linked data thing) has moved on. Many people don't just publish pages; they publish pages that have a one-to-one correspondence with real world things. A page per photo or programme or species or recipe or person. They're already in the realm of thinking about things before pages, and to them the page and its url is a good enough approximation for description. c) people using the web are already thinking about things not pages. If you search google for Obama your mental model is of the person, not any resulting pages. d) we already have the resource / representation split which is quite enough abstraction for some people. e) the list of things you might want to say about a document is finite; the list of things you might want to say about the world isn't. 2) hard data about the 303 redirect penalty, from a consumer and publisher side. Lots of claims get made about this but I've never seen hard evidence of the cost of this; it may be trivial, we don't know in any reliable way. I've been considering writing a paper on this for the ISWC2012 Experiments and Evaluation track, but am short on spare time. If anyone wants to join me please shout. I know publishers whose platform is so constrained they can't even edit the head section of their html documents. They certainly don't have access at the server level. Even where 303s are technically possible they might not be politically possible.
Technically we could have easily created bbc.co.uk/things/:blah and made it 303 but that would have involved setting up /things and that's a *very* difficult conversation with management and ops And if it's technically and politically possible it really depends on how the 303 is set up. Lots of linked data people seem to conflate the 303 and content negotiation. So I ask for something that can't be sent, they do the accept header stuff and 303 me to the *representation* url. Rather than: I ask for something that can't be sent, they 303 to a generic information resource which content negotiates to the appropriate representation. If you do this in two steps (303 then conneg) you can point any html links at the generic document resource url so you don't pick up a 303 penalty for every request No sane publisher trying to handle a decent amount of traffic is gonna follow the dbpedia pattern of doing it in one step (conneg to 303) and picking up 2 server hits per request. I've said here before that the dbpedia publishing pattern is an anti-pattern and shouldn't be encouraged Whichever way you do it, it doesn't take away Dave Reynold's point that: I have been in discussions with clients
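Michael's preferred two-step pattern (303 first, content negotiation only at the generic document URI) can be sketched as a toy routing table; the URLs, names, and data structures below are all invented for illustration:

```python
# Sketch of the two-step pattern: step 1 is a fixed 303 from the thing URI
# to ONE generic document URI (no Accept inspection, so the redirect is
# cacheable); step 2 is conneg at the document URI.

THING_TO_DOC = {"/things/lennon": "/docs/lennon"}  # thing URI -> generic doc URI

REPRESENTATIONS = {  # generic doc URI -> media type -> variant
    "/docs/lennon": {
        "text/html": "/docs/lennon.html",
        "application/rdf+xml": "/docs/lennon.rdf",
    }
}

def respond(path: str, accept: str):
    """Return (status, location-or-variant-path) for a GET on path."""
    if path in THING_TO_DOC:
        # Step 1: 303 to the generic information resource, regardless of Accept.
        return 303, THING_TO_DOC[path]
    variants = REPRESENTATIONS.get(path)
    if variants:
        # Step 2: conneg happens only here, at the document URI.
        return 200, variants.get(accept, variants["text/html"])
    return 404, None
```

Because the 303 target never varies with the Accept header it can be cached, and HTML links can point straight at /docs/lennon, so most requests never pay the 303 penalty at all; the one-step conneg-then-303 pattern makes both impossible.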
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Tue, Mar 27, 2012 at 10:37 AM, Michael Brunnbauer bru...@netestate.de wrote: Hello Jonathan, so let the question be 'did I GET what the URI denotes?' and let httpRange-14 be: 200 -> yes, 303 -> no. Basically yes, although you have to be careful to preserve the generic/specific (or resource/representation) distinction somehow, or else people will say that you don't know what you're talking about. If you always get the same representation from a URI, the distinction goes away, but in practice you have content negotiation, change over time, banner ads, login-specific customizations, etc. that make life more difficult. The only way I've found to make sense of this complexity is what I wrote up in my Generic resources and web metadata note, which claims that what people unconsciously intend is usually universal quantification. If you need to be really precise about what you say about documents (transclusion, scripts, etc.) then using 200 URIs in RDF without further explanation is probably not a great idea; you'd want some kind of vocabulary that allowed you to say precisely what you mean. Let another question be 'can this URI be used with document annotation properties?' (or: is this URI an IR?). From a 200 status code, I can infer that the URI can be used with document annotation properties and use those properties. I can also use those properties with some 303 URIs but not always. That's another question, but it is rarely asked without also wondering just what the content is, since the content is going to determine whether the annotations are true or not. So I would focus on the content, and then annotatability will sort itself out. Both these questions may not be answered from a 200 status code in the future. If consensus is built around non-HR14a uses of 200, yes. You'd have to look elsewhere for additional clues, e.g. the headers or content. Is all of this right? Close enough. Jonathan
Re: Change Proposal for HttpRange-14
On 3/27/12 11:17 AM, Michael Smethurst wrote: No sane publisher trying to handle a decent amount of traffic is gonna follow the dbpedia pattern of doing it in one step (conneg to 303) and picking up 2 server hits per request. I've said here before that the dbpedia publishing pattern is an anti-pattern and shouldn't be encouraged Circa 2006-2007, with Linked Data bootstrap via the LOD project as top priority, the goal was simple: unleash Linked Data in a manner that just worked. That meant catering for: 1. frameworks and libraries that send hash URIs over the wire 2. work with all browsers, no excuses. Linked Data is now alive and in broad use (contrary to many misconceptions), and there is still a need for slash URIs. This isn't a matter of encouragement or discouragement; it's a case of what works for the project goals at hand. If slash URIs don't work then use hash URIs or vice versa. Platforms that conform to Linked Data meme principles should be able to handle these scenarios. BTW - Imagine a scenario where Linked Data only worked with one style of URI; where would we be today or tomorrow, re. Linked Data? Being dexterous and unobtrusive has to be a celebrated feature rather than a point of perpetual distraction. As is always the case, a good system must pass the horses for courses test. Linked Data -- courtesy of the underlying architecture of the World Wide Web -- does that with aplomb, modulo the distracting wanderings of planet HttpRange-14 into its solar system every so many months :-) -- Regards, Kingsley Idehen
Re: Change Proposal for HttpRange-14
On 27/03/2012 16:53, Kingsley Idehen kide...@openlinksw.com wrote: On 3/27/12 11:17 AM, Michael Smethurst wrote: No sane publisher trying to handle a decent amount of traffic is gonna follow the dbpedia pattern of doing it in one step (conneg to 303) and picking up 2 server hits per request. I've said here before that the dbpedia publishing pattern is an anti-pattern and shouldn't be encouraged Circa. 2006-2007, with Linked Data bootstrap via the LOD project as top priority, the goal was simple: unleash Linked Data in a manner that just worked. That meant catering for: 1. frameworks and libraries that send hash URIs over the wire 2. work with all browsers, no excuses. Linked Data is now alive and in broad use (contrary to many misconceptions to the contrary), there is still a need for slash URIs. This isn't a matter of encouragement or discouragement, its a case of what works for the project goals at hand. If slash URIs don't work then use hash URIs or vice versa. Platforms that conform to Linked Data meme principles should be able to handle these scenarios. BTW - Imagine a scenario where Linked Data only worked with one style of URI, where would we be today or tomorrow, re. Linked Data? Being dexterous and unobtrusive has to be a celebrated feature rather than a point of perpetual distraction. My point wasn't about hashes or slashes or any style of uri. It was about conflating 303s ((I can't give you that but) here's something that might be useful) with conneg (here's the useful thing in the representation you asked for). And about how not exposing the generic IR URI and not linking to it imposes too high a penalty. Whether 303s are useful or not, there's a good and bad way to use them. Cheers michael As is always the case, a good system must pass the horses for courses test.
Linked Data -- courtesy of the underlying architecture of the World Wide Web -- does that with aplomb modulo the distraction star wanderings of planet HttpRange-14 into its solar system every so many months :-) http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
Re: Change Proposal for HttpRange-14
On 3/27/12 12:35 PM, Michael Smethurst wrote: On 27/03/2012 16:53, Kingsley Idehenkide...@openlinksw.com wrote: On 3/27/12 11:17 AM, Michael Smethurst wrote: No sane publisher trying to handle a decent amount of traffic is gonna follow the dbpedia pattern of doing it in one step (conneg to 303) and picking up 2 server hits per request. I've said here before that the dbpedia publishing pattern is an anti-pattern and shouldn't be encouraged Circa. 2006-2007, with Linked Data bootstrap via the LOD project as top priority, the goal was simple: unleash Linked Data in a manner that just worked. That meant catering for: 1. frameworks and libraries that send hash URIs over the wire 2. work with all browsers, no excuses. Linked Data is now alive and in broad use (contrary to many misconceptions to the contrary), there is still a need for slash URIs. This isn't a matter of encouragement or discouragement, its a case of what works for the project goals at hand. If slash URIs don't work then use hash URIs or vice versa. Platforms that conform to Linked Data meme principles should be able to handle these scenarios. BTW - Imagine a scenario where Linked Data only worked with one style of URI, where would we be today or tomorrow, re. Linked Data? Being dexterous and unobtrusive has to be a celebrated feature rather than a point of perpetual distraction. My point wasn't about hashes or slashes or any style of uri. Your comment was: No sane publisher trying to handle a decent amount of traffic is gonna follow the dbpedia pattern of doing it in one step (conneg to 303) and picking up 2 server hits per request. You described DBpedia method of doing things as being one to be discouraged. DBpedia deploys Linked Data via slash URIs, hence my response. Also, in the context of Linked Data slash URIs ultimately lead to the contentious 303 entity name / web resource address disambiguation heuristic. 
It was about conflating 303s ((I can't give you that but) here's something that might be useful) with conneg (here's the useful thing in the representation you asked for). 303 isn't a conflation of anything. It's a redirection mechanism that can be used in different ways. Sometimes it facilitates access to alternative representations and sometimes it can just be used to facilitate indirection re. data access by name reference as per Linked Data principles. In the Linked Data system, you are seeking the description of an Entity that's been identified using a URI. If it so happens that the URI is hashless (or slash based) the system doesn't reply with an actual entity descriptor resource address; it redirects you. The very same thing happens with a hash URI, but it has the benefit of delivering said indirection and disambiguation implicitly. There is always indirection in play. 303 isn't conflation, it's simply redirection that is exploitable in a variety of ways. And about how not exposing the generic IR URI and not linking to it imposes too high a penalty Here are the potential penalties, both ultimately about entity name / entity descriptor (description) resource address disambiguation: 1. 303 round trip costs 2. agreement about which relations and constituent predicates provide agreed-upon semantics that address actual entity name / entity descriptor resource address ambiguity. Here are some of the constituencies to which these potential costs apply: 1. Web Page Publishers -- content publishers 2. Linked Data publishers -- structured data publishers 3. Web Page Consumers -- content consumers 4. Linked Data Consumers -- structured data consumers. Expand the items above and you get an interesting cost vs benefits matrix. To cut a longish story short, if HTTP had a DESCRIBE method all of this confusion would vanish, pronto.
Then you would have HTTP requests of the form: DESCRIBE http://dbpedia.org/resource/Linked_Data and DESCRIBE http://dbpedia.org/page/Linked_Data Net effect: an HTTP request could specifically return the relevant chunks of the description data that you seek. Today, the SPARQL protocol provides the next best thing. Whether 303s are useful or not, there's a good and bad way to use them As is the case with everything :-) Kingsley Cheers michael As is always the case, a good system must pass the horses for courses test. Linked Data -- courtesy of the underlying architecture of the World Wide Web -- does that with aplomb, modulo the distracting star wanderings of planet HttpRange-14 into its solar system every so many months :-)
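Since HTTP has no DESCRIBE verb, the SPARQL protocol tunnels the equivalent request through a plain GET with the query in a URL parameter. A sketch of how such a request URL could be constructed (the DBpedia endpoint URL is the well-known public one; the exact form is illustrative, not a definitive client):

```python
from urllib.parse import urlencode

# SPARQL's DESCRIBE plays the role a hypothetical HTTP DESCRIBE verb would:
# ask a service to return a description of the named entity.
endpoint = "http://dbpedia.org/sparql"
query = "DESCRIBE <http://dbpedia.org/resource/Linked_Data>"

# The SPARQL protocol carries the query as a percent-encoded URL parameter.
request_url = endpoint + "?" + urlencode({"query": query})
print(request_url)
```

Fetching that URL (with an appropriate Accept header) returns an RDF description of the entity, which is the "next best thing" to a native DESCRIBE method.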
Re: Change Proposal for HttpRange-14
Hi Tom, On 26 Mar 2012, at 17:13, Tom Heath wrote: On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote: Tom, On 26 Mar 2012, at 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. What hard data do you think would resolve (or if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data? A few things come to mind: 1) a rigorous assessment of how difficult people *really* find it to understand distinctions such as things vs documents about things. I've heard many people claim that they've failed to explain this (or similar) successfully to developers/adopters; my personal experience is that everyone gets it, it's no big deal (and IRs/NIRs would probably never enter into the discussion). How would we assess that though? My experience is in some way similar -- it's easy enough to explain that you can't get a Road or a Person when you ask for them on the web -- but when you move on to then explaining how that means you need two URIs for most of the things that you really want to talk about, and exactly how you have to support those URIs, it starts getting much harder. 
The biggest indication to me that explaining the distinction is a problem is that neither OGP nor schema.org even attempts to go near it when explaining to people how to add semantic information to their web pages. The URIs that you use in the 'url' properties of those vocabularies are explained in terms of 'canonical URLs' for the thing that is being talked about. These are the kinds of graphs that millions of developers are building on, and those developers do not consider themselves linked data adopters and will not be going to linked data experts for training. 2) hard data about the 303 redirect penalty, from a consumer and publisher side. Lots of claims get made about this but I've never seen hard evidence of the cost of this; it may be trivial, we don't know in any reliable way. I've been considering writing a paper on this for the ISWC2012 Experiments and Evaluation track, but am short on spare time. If anyone wants to join me please shout. I could offer you a data point from legislation.gov.uk if you like. When someone requests the ToC for an item of legislation, they will usually hit our CDN and the result will come back extremely quickly. I just tried: curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents and it showed the result coming back in 59ms. When someone uses the identifier URI for the abstract concept of an item of legislation, there's no caching so the request goes right back to the server. I just tried: curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67 and it showed the result coming back in 838ms; the redirection then goes to the ToC above, so in total it takes around 900ms to get back the data. So every time that we refer to an item of legislation through its generic identifier rather than a direct link to its ToC we are making the site seem about 15 times slower.
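The numbers above make the 303 penalty a simple back-of-envelope calculation: following the redirect means paying for the uncacheable hop plus the follow-up fetch. The arithmetic, using the measured figures from this data point:

```python
# Measured figures from the legislation.gov.uk data point above.
direct_ms = 59     # cached ToC served via the CDN
redirect_ms = 838  # uncacheable 303 response from the identifier URI

# Following the 303 means paying for the redirect AND the direct fetch.
total_via_303_ms = redirect_ms + direct_ms  # the "around 900ms" figure
slowdown = total_via_303_ms / direct_ms

print(total_via_303_ms)   # 897
print(round(slowdown))    # 15 -- "about 15 times slower"
```

Note the penalty is dominated not by the extra round trip itself but by the fact that the 303 response bypasses the cache entirely.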
What's more, it puts load on our servers which doesn't happen when the data is cached; the more load, the slower the responses to other important things that are hard to cache, such as free-text searching. The consequence of course is that for practical reasons we design the site not to use generic identifiers for items of legislation unless we really can't avoid it and add redirections where we should technically be using 404s. The impracticality of 303s has meant that we've had to compromise in other areas of the structure of the site. This is just one data point of course, and it's possible that if we'd fudged the handling of the generic identifiers (eg by not worrying about when they should return 404s or 300s and just always doing a regex mapping to a guess of an equivalent document URI) we would have better performance from them, but that would also have been a design compromise forced on us because of the impracticality of 303s. (In fact we made this precise design compromise for the data.gov.uk linked data.) 3) hard data about occurrences of different patterns/anti-patterns; we need something more concrete/comprehensive than the
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hi, On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Rees r...@mumble.net wrote: ... There is a difference, since what is described could be an IR that does not have the description as content. A prime example is any DOI, e.g. http://dx.doi.org/10.1371/journal.pcbi.1000462 (try doing conneg for RDF). The identified resource is an IR as you suggest, but the representation (after the 303 redirect) is not its content. A couple of comments here: 1. It's not any DOI. I believe CrossRef are still the only registrar that supports this, but I might have missed an announcement. That's still 50m DOIs though 2. Are you sure it's an Information Resource? The DOI handbook [1] notes that while typically used to identify intellectual property, a DOI can be used to identify anything. The CrossRef guidelines [2] explain that [a]s a matter of current policy, the CrossRef DOI identifies the work, not its various potential manifestations Is a FRBR work an Information Resource? Personally I'd say not, but others may disagree. But as Dan Brickley has noted elsewhere in the discussion, there are other nuances to take into account. [1]. http://www.doi.org/handbook_2000/intro.html#1.6 [2]. http://crossref.org/02publishers/15doi_guidelines.html Cheers, L.
Re: Change Proposal for HttpRange-14
On 27 March 2012 19:54, Jeni Tennison j...@jenitennison.com wrote: Hi Tom, On 26 Mar 2012, at 17:13, Tom Heath wrote: On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote: Tom, On 26 Mar 2012, at 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. What hard data do you think would resolve (or if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data? A few things come to mind: 1) a rigorous assessment of how difficult people *really* find it to understand distinctions such as things vs documents about things. I've heard many people claim that they've failed to explain this (or similar) successfully to developers/adopters; my personal experience is that everyone gets it, it's no big deal (and IRs/NIRs would probably never enter into the discussion). How would we assess that though? My experience is in some way similar -- it's easy enough to explain that you can't get a Road or a Person when you ask for them on the web -- but when you move on to then explaining how that means you need two URIs for most of the things that you really want to talk about, and exactly how you have to support those URIs, it starts getting much harder. I'm curious as to why this is difficult to explain. 
Especially since I also have difficulties explaining the benefits of linked data. However, normally the road block I hit is explaining why URIs are important. Are there perhaps similar paradigms that the majority of developers are already familiar with? One that springs to mind is in Java: you have a file Hello.java, but the file contains the actual class, Hello, which has keys and values. Or perhaps most people these days know JSON, where you have a file like hello.json. The file itself is not that important, but it can contain 0 or more objects, such as { key1 : value1, key2 : value2, key3 : value3 } Would this be a valid analogy? The biggest indication to me that explaining the distinction is a problem is that neither OGP nor schema.org even attempts to go near it when explaining to people how to add semantic information to their web pages. The URIs that you use in the 'url' properties of those vocabularies are explained in terms of 'canonical URLs' for the thing that is being talked about. These are the kinds of graphs that millions of developers are building on, and those developers do not consider themselves linked data adopters and will not be going to linked data experts for training. 2) hard data about the 303 redirect penalty, from a consumer and publisher side. Lots of claims get made about this but I've never seen hard evidence of the cost of this; it may be trivial, we don't know in any reliable way. I've been considering writing a paper on this for the ISWC2012 Experiments and Evaluation track, but am short on spare time. If anyone wants to join me please shout. I could offer you a data point from legislation.gov.uk if you like. When someone requests the ToC for an item of legislation, they will usually hit our CDN and the result will come back extremely quickly. I just tried: curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents and it showed the result coming back in 59ms.
When someone uses the identifier URI for the abstract concept of an item of legislation, there's no caching so the request goes right back to the server. I just tried: curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67 and it showed the result coming back in 838ms, of course the redirection goes to the ToC above, so in total it takes around 900ms to get back the data. So every time that we refer to an item of legislation through its generic identifier rather than a direct link to its ToC we are making the site seem about 15 times slower. What's more, it puts load on our servers which doesn't happen when the data is cached; the more load, the slower the responses to other important things that are hard to cache, such as free-text searching. The consequence of course is that for practical reasons we design the site not to use generic identifiers for items of legislation
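Melvin's file-vs-objects analogy above can be made concrete: the file (hello.json) is one resource, while the objects it carries are distinct things described inside it, much as a document URI differs from the URIs of the things the document describes. A minimal sketch (file name and keys taken from the analogy in the message):

```python
import json

# The file (say, hello.json) is one thing; the objects it carries are
# another -- analogous to a document URI vs. the things described in it.
file_contents = '{"key1": "value1", "key2": "value2", "key3": "value3"}'

obj = json.loads(file_contents)
print(obj["key1"])  # value1
print(len(obj))     # 3
```

Whether this fully captures the thing-vs-document distinction is debatable (a JSON object is still information, not a Road or a Person), but it does illustrate the container/content split developers already know.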
Re: Change Proposal for HttpRange-14
Tom, if you were to do a serious assessment then measuring milliseconds and redirect hits means looking at a misleading 10% of the problem. Cognitive load, economics and perception of benefits are over 90% of the question here. An assessment that could begin describing the issue: * get a normal webmaster, calculate how much it takes to explain the thing to him, follow him on and * see how quickly he forgets, * assess how much it takes to VALIDATE that the whole thing works (e.g. a newly implemented spec) * assess what are the tools that would check if something breaks * assess the same thing for implementers, e.g. of applications or consuming APIs, to get all the above * then once you calculate the huge cost above, compare it with the perceived benefits. THEN REDO ALL AT MANAGEMENT LEVEL once you're finished with the technical level, because for sites that matter IT'S MANAGERS THAT DECIDE; geek-run websites don't count, sorry. Same thing when looking at 'real world applications': counting just geeky hacked-together demonstrators or semweb aficionados' libs has the same skew. These people and apps were paid for by EU money or research money, so they shouldn't count toward real-world, economics-driven apps; so if one was thinking of counting 50 apps that would break, that'd be just as partial and misleading. .. and we could go on. Now do you really need to do the above? (let alone how difficult it is to do in proper terms) Me and a whole crowd know the results already; the same exercise has been done over and over and we've been witnessing it. I sincerely hope this is the time we get this fixed so we can indeed go back and talk about the new linked data (linked data 2.0) to actual web developers, IT managers etc. Removing the 303 thing doesn't solve the whole problem, it is just the beginning.
Looking forward to discuss next steps Gio On Mon, Mar 26, 2012 at 6:13 PM, Tom Heath tom.he...@talis.com wrote: Hi Jeni, On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote: Tom, On 26 Mar 2012, at 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. What hard data do you think would resolve (or if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data? A few things come to mind: 1) a rigorous assessment of how difficult people *really* find it to understand distinctions such as things vs documents about things. I've heard many people claim that they've failed to explain this (or similar) successfully to developers/adopters; my personal experience is that everyone gets it, it's no big deal (and IRs/NIRs would probably never enter into the discussion). 2) hard data about the 303 redirect penalty, from a consumer and publisher side. Lots of claims get made about this but I've never seen hard evidence of the cost of this; it may be trivial, we don't know in any reliable way. I've been considering writing a paper on this for the ISWC2012 Experiments and Evaluation track, but am short on spare time. If anyone wants to join me please shout. 
3) hard data about occurrences of different patterns/anti-patterns; we need something more concrete/comprehensive than the list in the change proposal document. 4) examples of cases where the use of anti-patterns has actually caused real problems for people, and I don't mean problems in principle; have planes fallen out of the sky, has anyone died? Does it really matter from a consumption perspective? The answer to this is probably not, which may indicate a larger problem of non-adoption. The larger question is how do we get to a state where we *don't* have this permathread running, year in year out. Jonathan and the TAG's aim with the call for change proposals is to get us to that state. The idea is that by getting people who think that the specs should say something different to put their money where their mouth is and express what that should be, we have something more solid to work from than reams and reams of opinionated emails. This is a really worthy goal, and thank you to you, Jonathan and the TAG for taking it on. I long for the situation you describe where the permathread is 'permadead' :) But we do all need
Re: Change Proposal for HttpRange-14
On 3/27/12 3:23 PM, Melvin Carvalho wrote: curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents and it showed the result coming back in 59ms. When someone uses the identifier URI for the abstract concept of an item of legislation, there's no caching so the request goes right back to the server. I just tried: curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67 What do you get for timing results when you compare: curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67 and curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf ? I would expect the delta to be the overhead contributed by indirection delivered via the 303 redirection heuristic. From my U.S. location I get the following results: for: time curl -v http://www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf real 0m1.117s user 0m0.002s sys 0m0.003s for: time curl -v http://www.legislation.gov.uk/id/ukpga/1985/67 real 0m1.521s user 0m0.002s sys 0m0.003s Also note, if you add wdrs:describedby relations to your RDF documents, the description subject URI or its descriptor document URL will work fine, i.e., existing Linked Data clients will ultimately end up in a follow-your-nose friendly Linked Data graph. The relation in question is a triple of the form: http://www.legislation.gov.uk/id/ukpga/1985/67 wdrs:describedby http://www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf Here is a placeholder URI for my suggestion: http://linkeddata.uriburner.com/about/id/entity/http/www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf . If you make the change, reload using URL pattern: http://linkeddata.uriburner.com/about/html/http/www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf?sponger:get=add&refresh=0 As for: http://www.legislation.gov.uk/ukpga/1985/67/contents, what about a <link/> relation in its <head/> section that establishes http://www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf as an alternative representation?
You already have this sort of relation in place as per the following entry: <link rel="alternate" type="application/xml" href="http://legislation.data.gov.uk/ukpga/1985/67/contents/data.xml" /> -- Regards, Kingsley Idehen Founder & CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile: https://plus.google.com/112399767740508618350/about LinkedIn Profile: http://www.linkedin.com/in/kidehen
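The wdrs:describedby relation suggested above serializes to a single N-Triples line. A sketch that builds it with plain string formatting (the subject and object URIs are those from the message; the predicate assumes the usual POWDER-S namespace for the wdrs: prefix):

```python
# Build the suggested wdrs:describedby triple as one N-Triples line.
subject = "http://www.legislation.gov.uk/id/ukpga/1985/67"
predicate = "http://www.w3.org/2007/05/powder-s#describedby"  # wdrs:describedby
obj = "http://www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf"

# N-Triples: each term in angle brackets, terminated by " ."
triple = "<{}> <{}> <{}> .".format(subject, predicate, obj)
print(triple)
```

Publishing this one statement in the RDF document is what lets a follow-your-nose client connect the identifier URI to its describing document without relying solely on the 303.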
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Tue, Mar 27, 2012 at 2:14 PM, Leigh Dodds le...@ldodds.com wrote: Hi, On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Rees r...@mumble.net wrote: ... There is a difference, since what is described could be an IR that does not have the description as content. A prime example is any DOI, e.g. http://dx.doi.org/10.1371/journal.pcbi.1000462 (try doing conneg for RDF). The identified resource is an IR as you suggest, but the representation (after the 303 redirect) is not its content. A couple of comments here: 1. Its not any DOI. I believe CrossRef are still the only registrar that support this, but I might have missed an announcement. That's still 50m DOIs though You are right, it's not all registrars. I meant Crossref DOIs. I think Datacite DOIs do this too, but I'm not sure. 2. Are you sure its an Information Resource? Nobody can be sure of any such question. I would say it is (as would be a variety of FRBR Works or Expressions or Manifestations, and many other things besides), but there is nothing I could possibly say that would persuade you of this. This is why, as Tim and I keep saying, you have to forget about the information resource nonsense and focus instead on the idea of content or instantiation. I assume you're aware of what I've written on this subject, so it would be pointless for me to say more here. I hope the TAG will make a clear statement about this to help people stop bickering about this kind of thing. Often I think people attack information resource just because they want to use 200s for their linked data descriptions. This is a rather indirect tactic, and it misses the whole point of httpRange-14(a), which admittedly was a screwup in execution, but not idiotic in motivation. Jonathan The DOI handbook [1] notes that while typically used to identify intellectual property a DOI can be used to identify anything. 
The CrossRef guidelines [2] explain that [a]s a matter of current policy, the CrossRef DOI identifies the work, not its various potential manifestations Is a FRBR work an Information Resource? Personally I'd say not, but others may disagree. But as Dan Brickley has noted elsewhere in the discussion, there's other nuances to take into account. [1]. http://www.doi.org/handbook_2000/intro.html#1.6 [2]. http://crossref.org/02publishers/15doi_guidelines.html Cheers, L.
Re: Change Proposal for HttpRange-14
On 3/27/12 4:01 PM, Giovanni Tummarello wrote: Tom if you were to do a serious assessment then measuring milliseconds and redirect hits means looking at a misleading 10% of the problem. Cognitive loads, economics and perception of benefits are over 90% of the question here. An assessment that could begin describing the issue * get a normal webmaster, calculate how much it takes to explain the thing to him, follow him on and * see how quickly he forgets, * assess how much it takes to VALIDATE the whole thing works (e.g. a newly implemented spec) * assess what are the tools that would check if something breaks * assess the same thing for implementers e.g. of applications or consuming APIs to get all the above * then once you calculate the huge cost above then compare it with the perceived benefits. THEN REDO ALL AT MANAGEMENT LEVEL once you're finished with the technical level because for sites that matter IT'S MANAGERS THAT DECIDE; geek-run websites don't count, sorry. That's a really skewed and somewhat biased sequence. How about this one: 1. Demonstrate the virtues of Linked Data modulo a single line of code 2. Determine if the customer can work with the Linked Data tool as is 3. Quote on professional services if they opt to engage you to get it going rather than doing it themselves. Look, your example is akin to prescribing the following to an ODBC driver customer: 1. Explain what an ODBC Data Source Name is 2. Explain the constituents of a connect string 3. Explain how to use the ODBC API in C/C++ or VB, where Environment Handle and Connection Handle management creep in 4. Compare to the perceived benefits. Q: What are the perceived, anticipated, or actual benefits of Linked Data? A: Enterprise and/or Individual agility improvements via increased access to data across disparate data sources. Q: What are the perceived, anticipated, or actual benefits of the World Wide Web?
A: Enterprise and/or individual agility improvements via increased access to data across disparate data sources. If you bring minutiae into the conversation you invite the skewed sequence you outlined. Here's what we are all ultimately seeking to enable. The sequence goes something like this: 1. Something piques your interest; 2. You make a statement about it in a document; 3. You publish the document to the Web (or private network); 4. Done! This pattern works absolutely fine using hash URIs; you can even go kinda primitive re. your narrative. Say something like this: 1. Create a file; 2. Describe the item of interest via structured content in 3-tuple (triple) form using an Identifier of the form: file-name#this ; 3. Save the file ; 4. Publish the file to the Web; 5. Done! This whole thing is like a global jigsaw puzzle; instead of trying to put all the pieces together in one go, simply contribute or connect the pieces that are of interest to you. The Web (or your private HTTP based network) will do the REST, no joke :-) Same thing when looking at 'real world applications': counting just geeky hacked-together demonstrators or semweb aficionados' libs has the same skew.. these people and apps were paid by EU money or research money, so they shouldn't count toward real world economics driven apps, so if one was thinking of counting 50 apps that would break that'd be just as partial and misleading. You are simply confirming the issue re. the obvious dearth of productivity oriented tools in the Linked Data realm. .. and we could go on. Now do you really need to do the above? (let alone how difficult it is to do in proper terms) me and a whole crowd know already the results; the same exercise has been done over and over and we've been witnessing it. i sincerely hope this is the time we get this fixed so we can indeed go back and talk about the new linked data (linked data 2.0) to actual web developers, it managers etc.
Managers will always fund projects that are beneficial. Thus, time to manifest the value proposition is crucial. If the journey requires scripting or heavy duty coding as a basic prerequisite, it's deservedly dead on arrival. removing the 303 thing doesn't solve the whole problem, it is just the beginning. Looking forward to discussing next steps It has nothing to do with 303. You keep on pulling 303 into the conversation and then end up complaining about the mess it potentially creates. Show the value first, not the mechanics of the value engine. As per usual, I encourage you and others to study the 20+ year old ODBC ecosystem which is comprised of: 1. ODBC compliant productivity tools 2. ODBC drivers 3. Relational Databases. The only difference between ODBC Data Source Names and Linked Data is the use of X.500 style naming re. ODBC connection strings and the fact that the graphs are confined to the realm of 'C' data structures. If you study the API you would be quite amazed as to how much it actually covers. Linked Data is more
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hi Jonathan, On 3/27/2012 3:27 PM, Jonathan A Rees wrote: On Tue, Mar 27, 2012 at 2:14 PM, Leigh Doddsle...@ldodds.com wrote: Hi, On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Reesr...@mumble.net wrote: ... There is a difference, since what is described could be an IR that does not have the description as content. A prime example is any DOI, e.g. http://dx.doi.org/10.1371/journal.pcbi.1000462 (try doing conneg for RDF). The identified resource is an IR as you suggest, but the representation (after the 303 redirect) is not its content. A couple of comments here: 1. Its not any DOI. I believe CrossRef are still the only registrar that support this, but I might have missed an announcement. That's still 50m DOIs though You are right, it's not all registrars. I meant Crossref DOIs. I think Datacite DOIs do this too, but I'm not sure. 2. Are you sure its an Information Resource? Nobody can be sure of any such question. I would say it is (as would be a variety of FRBR Works or Expressions or Manifestations, and many other things besides), but there is nothing I could possibly say that would persuade you of this. This is why, as Tim and I keep saying, you have to forget about the information resource nonsense and focus instead on the idea of content or instantiation. I assume you're aware of what I've written on this subject, so it would be pointless for me to say more here. I find this rather remarkable when in your own call [1] you state this Rule for Engagement: 9. Kindly avoid arguing in the change proposals over the terminology that is used in the baseline document. Please use the terminology that it uses. If necessary discuss terminology questions on the list as document issues independent of the 303 question. Either the TAG is going to address this terminology head on or it is not. It is one of the cruxes to the problem, and not just because people are using it as an excuse to justify 200s. I will be saying more about this shortly. 
Thanks, Mike [1] http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html Jonathan A Rees, 29 February 2012 I hope the TAG will make a clear statement about this to help people stop bickering about this kind of thing. Often I think people attack information resource just because they want to use 200s for their linked data descriptions. This is a rather indirect tactic, and it misses the whole point of httpRange-14(a), which admittedly was a screwup in execution, but not idiotic in motivation. Jonathan The DOI handbook [1] notes that while typically used to identify intellectual property a DOI can be used to identify anything. The CrossRef guidelines [2] explain that [a]s a matter of current policy, the CrossRef DOI identifies the work, not its various potential manifestations Is a FRBR work an Information Resource? Personally I'd say not, but others may disagree. But as Dan Brickley has noted elsewhere in the discussion, there's other nuances to take into account. [1]. http://www.doi.org/handbook_2000/intro.html#1.6 [2]. http://crossref.org/02publishers/15doi_guidelines.html Cheers, L.
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
On Tue, Mar 27, 2012 at 4:58 PM, Mike Bergman m...@mkbergman.com wrote: Hi Jonathan, On 3/27/2012 3:27 PM, Jonathan A Rees wrote: On Tue, Mar 27, 2012 at 2:14 PM, Leigh Dodds le...@ldodds.com wrote: Hi, On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Rees r...@mumble.net wrote: ... There is a difference, since what is described could be an IR that does not have the description as content. A prime example is any DOI, e.g. http://dx.doi.org/10.1371/journal.pcbi.1000462 (try doing conneg for RDF). The identified resource is an IR as you suggest, but the representation (after the 303 redirect) is not its content. A couple of comments here: 1. It's not any DOI. I believe CrossRef are still the only registrar that supports this, but I might have missed an announcement. That's still 50m DOIs though. You are right, it's not all registrars. I meant CrossRef DOIs. I think DataCite DOIs do this too, but I'm not sure. 2. Are you sure it's an Information Resource? Nobody can be sure of any such question. I would say it is (as would be a variety of FRBR Works or Expressions or Manifestations, and many other things besides), but there is nothing I could possibly say that would persuade you of this. This is why, as Tim and I keep saying, you have to forget about the information resource nonsense and focus instead on the idea of content or instantiation. I assume you're aware of what I've written on this subject, so it would be pointless for me to say more here. I find this rather remarkable when in your own call [1] you state this Rule for Engagement: 9. Kindly avoid arguing in the change proposals over the terminology that is used in the baseline document. Please use the terminology that it uses. If necessary discuss terminology questions on the list as document issues independent of the 303 question. Either the TAG is going to address this terminology head on or it is not. It is one of the cruxes of the problem, and not just because people are using it as an excuse to justify 200s. 
I agree that it is cruxical, and I will do what I can to get the TAG to fix the problem. I thought that's what I said. I've written about this many times on the www-tag list, and even put it as a goal for the session at the F2F. I don't speak for the TAG, though, I'm just a member, so I can't promise anything. If it were up to me I'd purge information resource from the document, since I don't want to argue about what it means, and strengthen the (a) clause to be about content or instantiation or something. But the document had to reflect the status quo, not things as I would have liked them to be. I have not submitted this as a change proposal because it doesn't address ISSUE-57, but it is impossible to address ISSUE-57 with a 200-related change unless this issue is addressed, as you say, head on. This is what I've written in my TAG F2F preparation materials. I will be saying more about this shortly. I thought enough had been said already, but will read with interest. Best Jonathan Thanks, Mike [1] http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html Jonathan A Rees, 29 February 2012 I hope the TAG will make a clear statement about this to help people stop bickering about this kind of thing. Often I think people attack information resource just because they want to use 200s for their linked data descriptions. This is a rather indirect tactic, and it misses the whole point of httpRange-14(a), which admittedly was a screwup in execution, but not idiotic in motivation. Jonathan The DOI handbook [1] notes that while typically used to identify intellectual property a DOI can be used to identify anything. The CrossRef guidelines [2] explain that [a]s a matter of current policy, the CrossRef DOI identifies the work, not its various potential manifestations Is a FRBR work an Information Resource? Personally I'd say not, but others may disagree. But as Dan Brickley has noted elsewhere in the discussion, there's other nuances to take into account. [1]. 
http://www.doi.org/handbook_2000/intro.html#1.6 [2]. http://crossref.org/02publishers/15doi_guidelines.html Cheers, L.
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Jonathan, On 27 Mar 2012, at 14:02, Jonathan A Rees wrote: On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de wrote: This whole information resource thing needs to just go away. I can't believe how many people come back to it after the mistake has been pointed out so many times. Maybe the TAG or someone has to make a statement admitting that the way httpRange-14(a) was phrased was a big screwup, that the real issue is content vs. description, not a type distinction. Yes, that may help. But then we would also have to define what 'content' and 'description' meant. I have a feeling that might prove just as slippery and ultimately unhelpful as 'information resource'. I think Jeni's proposal is to say that the Flickr URI is good practice, rather than deny it. My proposal is to say that the description-free situation is good practice, rather than just an undocumented common practice. Let's call it 'The Explicit Description Link Change Proposal'; it isn't mine except in so far as I coordinated its drafting and submitted it. Anyway, it doesn't say that the Flickr URI is good practice, it just says that clients can't make any assumptions one way or the other about whether the retrieved representation is content or description unless it contains explicit statements or the description is reached through a description link (303 redirect; 'describedby' Link: header). Good practice would be for Flickr to use separate URIs for 'the photograph' and 'the description of the photograph', to ensure that 'the description of the photograph' was reachable from 'the photograph' and to ensure that any statements referred to the correct one. Under the proposal, they could change to this good practice in four ways: 1. by adding: <link rel="describedby" href="#main" /> to their page (or pointing to some other URL that they choose to use for 'the description of the photograph') 2. 
by adding a Link: header with a 'describedby' relationship that points at a separate URI for 'the description of the photograph' (possibly a fragment as in 1?) 3. by switching to using http://www.flickr.com/photos/70365734@N00/6905069277/#photo or something everywhere the photograph was referred to, adding: <link about="#photo" rel="describedby" href="" /> in their page and adding about="#photo" on the body element in the HTML so that the RDFa statements in the page were about the photograph 4. by introducing support for a new page http://www.flickr.com/photos/70365734@N00/6905069277/description and adding a 303 redirection from http://www.flickr.com/photos/70365734@N00/6905069277/ to that URL The first two methods are only feasible under the proposal; the others are things they could do now. Cheers, Jeni -- Jeni Tennison http://www.jenitennison.com
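[Editor's note] The client side of this proposal is small enough to sketch directly. Below is a minimal illustration, assuming a simplified response model; the rel="describedby" matching is naive string matching, not a full RFC 5988 Link-header parser.

```python
import re

def retrieved_is_description(status, headers):
    """Return True if the response explicitly signals that the retrieved
    representation is a description of the requested resource (a 303
    redirect, or a Link: header with rel="describedby"); return None when
    there is no signal, i.e. the client may assume nothing either way.
    """
    if status == 303:
        return True
    link_header = headers.get("Link", "")
    # naive check; a real client would parse the Link header properly
    if re.search(r'rel\s*=\s*"?describedby"?', link_header):
        return True
    return None  # no assumption: could be content, could be description
```

A 200 with no describedby link (the Flickr case above) therefore yields no conclusion at all, which is exactly the point of the proposal.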
Re: Change Proposal for HttpRange-14
On 2012-03-27, at 16:17, Michael Smethurst wrote: No sane publisher trying to handle a decent amount of traffic is gonna follow the dbpedia pattern of doing it in one step (conneg to 303) and picking up 2 server hits per request. I've said here before that the dbpedia publishing pattern is an anti-pattern and shouldn't be encouraged So see the alternative suggestion to use 200 with a header to mean "I am using the other semantics: you asked for a thing and here is a representation of a document describing it - and BTW the document has this URI if you want to talk about it." http://www.w3.org/wiki/HTML/ChangeProposal25 Tim
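[Editor's note] Server-side, that alternative can be sketched roughly as follows. The header name "X-Describedby-Document" is made up for this sketch; consult ChangeProposal25 for the actual mechanism it defines.

```python
def description_response(description_body, description_uri):
    """Sketch of the '200 plus header' alternative: answer a request for
    a thing with a 200 whose body is a representation of a document
    describing the thing, plus a header giving that document's own URI.

    NOTE: "X-Describedby-Document" is a hypothetical header name used
    only for illustration.
    """
    headers = {
        "Content-Type": "text/turtle",
        "X-Describedby-Document": description_uri,  # hypothetical header
    }
    return 200, headers, description_body
```

This costs one server hit per request rather than the two incurred by the conneg-then-303 pattern, and the response stays cacheable by ordinary HTTP caches.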
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hi all, On Mar 27, 2012, at 18:01, Jeni Tennison wrote: Jonathan, On 27 Mar 2012, at 14:02, Jonathan A Rees wrote: On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de wrote: This whole information resource thing needs to just go away. I can't believe how many people come back to it after the mistake has been pointed out so many times. Maybe the TAG or someone has to make a statement admitting that the way httpRange-14(a) was phrased was a big screwup, that the real issue is content vs. description, not a type distinction. Yes, that may help. But then we would also have to define what 'content' and 'description' meant. I have a feeling that might prove just as slippery and ultimately unhelpful as 'information resource'. I fought against jettisoning the IR/NIR distinction for years, but finally realized that I was wrong to do so. The thing that convinced me was the simple fact that we can describe an IR (e.g. an HTML page) with another IR (an RDF document) without needing to say that either one was or was not an IR (other than optionally in the RDF). By contrast, we do have Content-Type to talk about the content of a Representation and Jeni's four ways below (two ways of using a link tag with rel=describedby, Link: header with a 'describedby', or 303) to talk about descriptions. I'd be happy to forget about IR/NIR, limit the meaning of content to the Content-Type and limit the scope of a description to one of those four approaches. Any takers? Regards, Dave I think Jeni's proposal is to say that the Flickr URI is good practice, rather than deny it. My proposal is to say that the description-free situation is good practice, rather than just an undocumented common practice. Let's call it 'The Explicit Description Link Change Proposal'; it isn't mine except in so far as I coordinated its drafting and submitted it. 
Anyway, it doesn't say that the Flickr URI is good practice, it just says that clients can't make any assumptions one way or the other about whether the retrieved representation is content or description unless it contains explicit statements or the description is reached through a description link (303 redirect; 'describedby' Link: header). Good practice would be for Flickr to use separate URIs for 'the photograph' and 'the description of the photograph', to ensure that 'the description of the photograph' was reachable from 'the photograph' and to ensure that any statements referred to the correct one. Under the proposal, they could change to this good practice in four ways: 1. by adding: <link rel="describedby" href="#main" /> to their page (or pointing to some other URL that they choose to use for 'the description of the photograph') 2. by adding a Link: header with a 'describedby' relationship that points at a separate URI for 'the description of the photograph' (possibly a fragment as in 1?) 3. by switching to using http://www.flickr.com/photos/70365734@N00/6905069277/#photo or something everywhere the photograph was referred to, adding: <link about="#photo" rel="describedby" href="" /> in their page and adding about="#photo" on the body element in the HTML so that the RDFa statements in the page were about the photograph 4. by introducing support for a new page http://www.flickr.com/photos/70365734@N00/6905069277/description and adding a 303 redirection from http://www.flickr.com/photos/70365734@N00/6905069277/ to that URL The first two methods are only feasible under the proposal; the others are things they could do now. Cheers, Jeni -- Jeni Tennison http://www.jenitennison.com
Re: Change Proposal for HttpRange-14
On Mar 26, 2012, at 9:15 AM, Bernard Vatant wrote: All Like many others it seems, I had sworn to myself: nevermore HttpRange-14, but I will also bite the bullet. Hi Bernard Here goes ... Sorry, I've had a hard time following who said what with all those entangled threads, so I answer to ideas more than to people. There is no need for anyone to even talk about information resources. YES! I've come over the years to a very radical position on this, which is that we have created ourselves a huge non-issue with those notions of information resource and non-information resource. Please show any application making use of this distinction, or which would break if we get rid of this distinction. And in any case if there is a distinction, this distinction is about how the URI behaves in the http protocol (what it accesses), which should be kept independent of what the URI denotes. The neverending debate will never end as long as those two aspects are mixed, as they are in the current httpRange-14 as well as in various change proposals (hence those interminable threads). The important point about http-range-14, which unfortunately it itself does not make clear, is that the 200-level code is a signal that the URI *denotes* whatever it *accesses* via the HTTP internet architecture. That has always been my understanding of the intent of the decision. I think the way that TimBL phrases it, as a choice between the identified resource *being* the meaning (200-code response) or *describing* the meaning (303 response) is basically the same distinction with a cherry on top. The proposal is that URI X denotes what the publisher of X says it denotes, whether it returns 200 or not. The problem here is that virtually all publishers don't do this, and there is absolutely no sign that anything more than a vanishingly small percentage ever will. Not to mention there is no accepted way to do this, or to check when it has been done. 
And, as TimBL reported in a recent email, many people (read: the TAG) want it to be the case that there is a 'default' in such cases, and that it should be that the URI denotes the Web document which it accesses, so that the semantic web can easily talk about the nonsemantic web. This is the only position which makes sense to me. What the URI is intended to denote can be only derived from explicit descriptions, whatever the way you access those descriptions. Well, except that in fact you can't do this, as we all know (fix a referent by giving a description). You have to rely on actual ostension at some point, both on and off the Web; and on the Web, existing Web pages are the only contact point for using ostension (i.e. explicitly pointing to something and saying, in effect, "I'm referring to *that*"). And assume that if there is no such description, the URI is intended to provide access to somewhere, but not to denote *some* *thing*. It's just actionable in the protocol, and clients do whatever they want with what they get. It's the way the (non-semantic) Web works, and it's OK. And what if the publisher simply does not say anything about what the URI denotes? Then nobody knows, and actually nobody cares But people do care, see above. what the URI denotes, or say that all users implicitly agree it is the same thing, but it does not break any system to ignore what it is. Or, again, show me counter-examples. TimBL has many. After all, something like 99.999% of the URIs on the planet lack this information. Which means that for the Web to work so far, knowing what a URI denotes is useless. But it's useful for the Semantic Web. So let's say that a URI is useful for, or is part of, the Semantic Web if some description(s) of it can be found. And we're done. What, if anything, can be concluded about what they denote? Nothing, and let's face it. The http-range-14 rule provides an answer to this which seems reasonably intuitive. 
Wonder if it can be the same Pat Hayes writing this as the one who wrote six years ago In Defence of Ambiguity :) http://www.ibiblio.org/hhalpin/irw2006/presentations/HayesSlides.pdf Quote (from the conclusion) WebArch http-range-14 seems to presume that if a URI accesses something directly (not via an http redirect), then the URI must refer to what it accesses. This decision is so bad that it is hard to list all the mistakes in it, but here are a few: - It presumes, wrongly, that the distinction between access and reference is based on the distinction between accessible and inaccessible referents. ... [see above link for full list] Pat, has your position changed on this? Not on the ambiguity point, but yes on http-range-14. I still dislike it wholeheartedly and I wish there was some other way to go, but I can see that it is useful and relatively simple and enables people to move forward, and it seems to kind of work. Maybe this (or
Re: Change Proposal for HttpRange-14
On 25/03/12 19:24, Kingsley Idehen wrote: Tim, Alternatively, why not use the existing Link: header? Then we end up with the ability to express the same :describedby relation in three places Which is, of course, in the now-submitted proposal. Dave
Re: Change Proposal for HttpRange-14
Tim, On 25 Mar 2012, at 20:26, Tim Berners-Lee wrote: For example, To take an arbitrary one of the trillions out there, what does http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 identify, there being no RDF in it? What can I possibly do with that URI if the publisher has not explicitly allowed me to use it to refer to the online book, under your proposal? I don't know about anyone else, but I am getting increasingly confused by your use of this example. What is it that you want to be able to do? Is it that you want to be able to use http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=1 to refer to the book Moby Dick? You can't do that currently. http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 is a web page, not a book. Just because:

1. the book Moby Dick is a book and therefore is an information resource;
2. http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 returns a 200 and therefore is an information resource;
3. http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 shows a bit of the book Moby Dick;

it does not follow that http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 refers to the book Moby Dick. Do you think it does? Of course you could, currently, in some RDF that you own assert something like:

<#me> :like <http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11> .
<http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11> a bibo:Book ;
  dct:title "Moby Dick" .

and therefore state that you mean http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 to refer to the book Moby Dick, rather than specifically page 11 of the Project Gutenberg version, but whether anyone else would use that same URL to refer to the book, or trust your assertions about that URL, is a purely social question. 
Under the proposal that we've put forward, you can still make those assertions in your own RDF if you want, and consumers will still trust them or not as they wish. The only thing that changes is that consumers can't make the assumption that just because http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 returns a 200 it's an information resource, but you haven't required that assumption to make your assertions about http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11, so I really don't see how that would affect anything that you're doing. Cheers, Jeni -- Jeni Tennison http://www.jenitennison.com
Re: Change Proposal for HttpRange-14
Hi Tim, On Sun, Mar 25, 2012 at 8:26 PM, Tim Berners-Lee ti...@w3.org wrote: ... For example, To take an arbitrary one of the trillions out there, what does http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 identify, there being no RDF in it? What can I possibly do with that URI if the publisher has not explicitly allowed me to use it to refer to the online book, under your proposal? You can do anything you want with it. You could record statements about your HTTP interactions, e.g. retrieval status and date. Or, because RDF lets anyone say anything, anywhere, you could just decide to use that as the URI for the book and annotate it accordingly. The obvious caveat and risk is that the publisher might subsequently disagree with you if they do decide to publish some RDF. I can re-use your data if I decide that risk is acceptable and we can still usefully interact. Even if Gutenberg.org did publish some RDF at that URI, you still have the risk that they could change their mind at a later date. httpRange-14 doesn't help at all there. Lack of precision and inconsistency is going to be rife whatever form of URIs or response codes is used. Encouraging people to say what their URIs refer to is the very first piece of best practice advice. L.
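[Editor's note] Leigh's first option, recording statements about the HTTP interaction itself without committing to what the URI denotes, could look something like the sketch below. The example.org predicate URIs are invented for illustration; they are not from any standard vocabulary.

```python
def retrieval_triples(uri, status, date):
    """Record an HTTP retrieval as N-Triples-style statements about the
    interaction itself, staying silent on what the URI denotes.
    The example.org predicates are placeholders, not a real vocabulary.
    """
    s = f"<{uri}>"
    return [
        f'{s} <http://example.org/vocab#httpStatus> "{status}" .',
        f'{s} <http://example.org/vocab#retrievalDate> "{date}" .',
    ]
```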
Re: Change Proposal for HttpRange-14
On 3/26/12 3:57 AM, Dave Reynolds wrote: On 25/03/12 19:24, Kingsley Idehen wrote: Tim, Alternatively, why not use the existing Link: header? Then we end up with the ability to express the same :describedby relation in three places Which is, of course, in the now-submitted proposal. Dave Yes, and I only found that out yesterday as you'll see from my thread with Jonathan :-) -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile: https://plus.google.com/112399767740508618350/about LinkedIn Profile: http://www.linkedin.com/in/kidehen
Re: Change Proposal for HttpRange-14
All Like many others it seems, I had sworn to myself: nevermore HttpRange-14, but I will also bite the bullet. Here goes ... Sorry, I've had a hard time following who said what with all those entangled threads, so I answer to ideas more than to people. There is no need for anyone to even talk about information resources. YES! I've come over the years to a very radical position on this, which is that we have created ourselves a huge non-issue with those notions of information resource and non-information resource. Please show any application making use of this distinction, or which would break if we get rid of this distinction. And in any case if there is a distinction, this distinction is about how the URI behaves in the http protocol (what it accesses), which should be kept independent of what the URI denotes. The neverending debate will never end as long as those two aspects are mixed, as they are in the current httpRange-14 as well as in various change proposals (hence those interminable threads). The important point about http-range-14, which unfortunately it itself does not make clear, is that the 200-level code is a signal that the URI *denotes* whatever it *accesses* via the HTTP internet architecture. The proposal is that URI X denotes what the publisher of X says it denotes, whether it returns 200 or not. This is the only position which makes sense to me. What the URI is intended to denote can be only derived from explicit descriptions, whatever the way you access those descriptions. And assume that if there is no such description, the URI is intended to provide access to somewhere, but not to denote *some* *thing*. It's just actionable in the protocol, and clients do whatever they want with what they get. It's the way the (non-semantic) Web works, and it's OK. And what if the publisher simply does not say anything about what the URI denotes? 
Then nobody knows, and actually nobody cares what the URI denotes, or say that all users implicitly agree it is the same thing, but it does not break any system to ignore what it is. Or, again, show me counter-examples. After all, something like 99.999% of the URIs on the planet lack this information. Which means that for the Web to work so far, knowing what a URI denotes is useless. But it's useful for the Semantic Web. So let's say that a URI is useful for, or is part of, the Semantic Web if some description(s) of it can be found. And we're done. What, if anything, can be concluded about what they denote? Nothing, and let's face it. The http-range-14 rule provides an answer to this which seems reasonably intuitive. Wonder if it can be the same Pat Hayes writing this as the one who wrote six years ago In Defence of Ambiguity :) http://www.ibiblio.org/hhalpin/irw2006/presentations/HayesSlides.pdf Quote (from the conclusion) WebArch http-range-14 seems to presume that if a URI accesses something directly (not via an http redirect), then the URI must refer to what it accesses. This decision is so bad that it is hard to list all the mistakes in it, but here are a few: - It presumes, wrongly, that the distinction between access and reference is based on the distinction between accessible and inaccessible referents. ... [see above link for full list] Pat, has your position changed on this? What would be your answer? Or do you think there should not be any 'default' rule in such cases? I would say so, because such a rule is basically useless. As useless as to wonder what a phone number denotes. A phone number allows you to access a point in a network given the phone infrastructure and protocols, it does not denote anything except in specific contexts where it's used explicitly as an identifier e.g., to uniquely identify people, organizations or services. Otherwise it works just like a phone number should do. 
Best regards Bernard -- Bernard Vatant Vocabularies & Data Engineering Tel: +33 (0)9 71 48 84 59 Skype: bernard.vatant Linked Open Vocabularies http://labs.mondeca.com/dataset/lov Mondeca 3 cité Nollez 75018 Paris, France www.mondeca.com Follow us on Twitter: @mondecanews http://twitter.com/#%21/mondecanews
Re: Change Proposal for HttpRange-14
On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: On 23 Mar 2012, at 14:05, Jonathan A Rees wrote: 2012/3/23 Melvin Carvalho melvincarva...@gmail.com: I dont think, even the wildest optimist, could have predicted the success of the current architecture (both pre and post HR14). The votes of confidence are interesting to me, as I have not been hearing them previously. It does appear we have a divided community, with some voices feeling that 303 will be the death of linked data, and others saying hash and 303 are working well. Where the center of gravity lies, I have no way of telling (and perhaps it's not important as long as any disagreement, or even ignorance, remains). As Larry Masinter said at the last TAG telcon, things do not seem to be converging. I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. Tom. P.S. Apologies if this repeats comments later in the thread than Steve's post; the novelty of agreeing with Kingsley still isn't enough to convince me to read the rest ;) Sad but true. -- Dr. Tom Heath Senior Research Scientist Talis Education Ltd. W: http://www.talisaspire.com/ W: http://tomheath.com/
Re: Change Proposal for HttpRange-14
Tom, On 26 Mar 2012, at 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. What hard data do you think would resolve (or if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data? The larger question is how do we get to a state where we *don't* have this permathread running, year in year out. Jonathan and the TAG's aim with the call for change proposals is to get us to that state. The idea is that by getting people who think that the specs should say something different to put their money where their mouth is and express what that should be, we have something more solid to work from than reams and reams of opinionated emails. But we do all need to work at it if we're going to come to a consensus. I know everyone's tired of this discussion, but I don't think the TAG is going to do this exercise again, so this really is the time to contribute, and preferably in a constructive manner, recognising the larger aim. Cheers, Jeni -- Jeni Tennison http://www.jenitennison.com
Re: Change Proposal for HttpRange-14
Hi Jeni, On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote: Tom, On 26 Mar 2012, at 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! One of the things that bothers me most about the many years worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. What hard data do you think would resolve (or if not resolve, at least move forward) the argument? Some people are contributing their own experience from building systems, but perhaps that's too anecdotal? Would a structured survey be helpful? Or do you think we might be able to pick up trends from the webdatacommons.org (or similar) data? A few things come to mind:

1) a rigorous assessment of how difficult people *really* find it to understand distinctions such as things vs documents about things. I've heard many people claim that they've failed to explain this (or similar) successfully to developers/adopters; my personal experience is that everyone gets it, it's no big deal (and IRs/NIRs would probably never enter into the discussion).

2) hard data about the 303 redirect penalty, from a consumer and publisher side. Lots of claims get made about this but I've never seen hard evidence of the cost of this; it may be trivial, we don't know in any reliable way. I've been considering writing a paper on this for the ISWC2012 Experiments and Evaluation track, but am short on spare time. If anyone wants to join me please shout. 
3) hard data about occurrences of different patterns/anti-patterns; we need something more concrete/comprehensive than the list in the change proposal document.

4) examples of cases where the use of anti-patterns has actually caused real problems for people, and I don't mean problems in principle; have planes fallen out of the sky, has anyone died? Does it really matter from a consumption perspective? The answer to this is probably not, which may indicate a larger problem of non-adoption.

The larger question is how do we get to a state where we *don't* have this permathread running, year in year out. Jonathan and the TAG's aim with the call for change proposals is to get us to that state. The idea is that by getting people who think that the specs should say something different to put their money where their mouth is and express what that should be, we have something more solid to work from than reams and reams of opinionated emails. This is a really worthy goal, and thank you to you, Jonathan and the TAG for taking it on. I long for the situation you describe where the permathread is 'permadead' :) But we do all need to work at it if we're going to come to a consensus. I know everyone's tired of this discussion, but I don't think the TAG is going to do this exercise again, so this really is the time to contribute, and preferably in a constructive manner, recognising the larger aim. I hear you. And you'll be pleased to know I commented on some aspects of the document (constructively I hope). If my previous email was anything but constructive, apologies - put it down to httpRange-14 fatigue :) Cheers, Tom. -- Dr. Tom Heath Senior Research Scientist Talis Education Ltd. W: http://www.talisaspire.com/ W: http://tomheath.com/
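[Editor's note] On point 2, one part of the cost is mechanical and can be stated before any field measurement: resolving a URI via the conneg-then-303 pattern takes one extra request/response round trip compared with a single GET (and deployed caches often will not cache the 303 itself). A back-of-envelope sketch, with the round-trip time as an assumed input:

```python
def added_latency_ms(rtt_ms, round_trips):
    """Extra latency versus a single-request resolution, ignoring
    caching, keep-alive and CDN effects (which a real measurement,
    such as the proposed ISWC paper, would need to account for)."""
    return (round_trips - 1) * rtt_ms

# A hash URI resolves in one GET; the 303 pattern takes two.
```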
Re: Change Proposal for HttpRange-14
Hi Dave, On 26 March 2012 16:51, Dave Reynolds d...@epimorphics.com wrote: On 26/03/12 16:05, Tom Heath wrote: On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote: On 23 Mar 2012, at 14:05, Jonathan A Rees wrote: 2012/3/23 Melvin Carvalho melvincarva...@gmail.com: I don't think even the wildest optimist could have predicted the success of the current architecture (both pre and post HR14). The votes of confidence are interesting to me, as I have not been hearing them previously. It does appear we have a divided community, with some voices feeling that 303 will be the death of linked data, and others saying hash and 303 are working well. Where the center of gravity lies, I have no way of telling (and perhaps it's not important as long as any disagreement, or even ignorance, remains). As Larry Masinter said at the last TAG telcon, things do not seem to be converging. I'm sure many people are just deeply bored of this discussion. No offense intended to Jeni and others who are working hard on this, but *amen*, with bells on! No argument. One of the things that bothers me most about the many years' worth of httpRange-14 discussions (and the implications that HR14 is partly/heavily/solely to blame for slowing adoption of Linked Data) is the almost complete lack of hard data being used to inform the discussions. For a community populated heavily with scientists I find that pretty tragic. The primary reason for having put my name to the proposal was that I have personally been adversely affected. I have been involved in client discussions that have been derailed by someone bringing up httprange-14. I have been in discussions with clients where 303s are not acceptable (thanks to CDN behaviour). I have both received and (sadly) sent out data that is broken and has caused errors due to cut/paste from the browser bar, thanks to httprange-14. 
My anecdotal evidence is that the nature of the recurrent discussion can create or reinforce an impression of the area being too academic, not ready for practical use. My anecdotal evidence suggests the same. I don't claim that httprange-14 is solely or substantially to blame for holding back linked data. I don't claim that my personal experience is necessarily widespread or representative. As I suspected, knowing your penchant for rigour :) There is no science on offer here, move on. But ... if, with the current TAG process, there is a chance of a new resolution that reduces any of these problems then it is worth a tiny bit of effort. If there is a chance the new resolution will be so good as to damp down this permathread then it is worth more effort. If it kills the permathread completely then I owe someone at least a crate of beer. I'll personally double the beer prize. I really want to see closure as much as you do, but will admit I'm skeptical. Do we collectively have an agreement that whatever the TAG decide we'll accept it, implement it, shut up, and move on? Now that's a document I want to see signatures on! (Obviously such a document would place even greater pressure on the TAG, who already have my gratitude for doing a tough job.) Cheers, Tom. -- Dr. Tom Heath Senior Research Scientist Talis Education Ltd. W: http://www.talisaspire.com/ W: http://tomheath.com/
NIR SIDETRACK Re: Change Proposal for HttpRange-14
On 2012-03 -25, at 14:06, Norman Gray wrote: Tim, greetings. On 2012 Mar 25, at 17:35, Tim Berners-Lee wrote: (Not useful to talk about NIRs. The web architecture does not. Nor does Jonathan's baseline, nor HTTP Range-14. Never assume that what an IR is about is not itself an IR.) Well, httpRange-14 sort of does talk about 'non-information resources', by necessary implication. Of course you can define the class but I said it isn't useful to talk about it. That was an understatement. It has wasted person-centuries of work. Let me give a potted history for newcomers: pinch of salt 1) The TAG wanted to settle whether, after a 200 response, the URI always referred to a document which you just got a representation of. 2) They - we - foolishly, cutely, phrased the issue as what is the range of the HTTP dereference function. Mistake. (This is the function mapping URI to HTTP entity, aka HTTP representation, aka content. So its range would be representation -- but we meant what does the URI denote if you use it in, say, an RDF system -- more like the range of the denotation relation for HTTP hashless URIs.) 3) They figured the semantics of the HTTP deref function were the relationship between the name for a document and the contents of the document. 4) So in that case the domain of the function is name (URI in fact) and the range is representation, and the URI denotes a document (Information Resource in fact). Which is not a big deal. 5) Nor is the exact definition of the class document a big deal. 6) People then for some reason thought, oh, if I am running a server, then I must test everything I am serving to make sure it is an IR before I serve it -- oh no -- how can I make that test? We must have a decision algorithm! Mistake. 7) They should have asked For each URI, what is the content of the document it names? 8) Instead they argued for years about the edge cases of what exactly was and what wasn't an IR. Is a book? Is a girl with a tattooed poem? A page which says it is a person? 
A fridge? 9) Instead they should have thought Am I serving the contents of this, or am I serving data about this? If I am serving the contents then I will use its URI; otherwise I will use a different URI for the document. 9.5) People actually experimented -- served up girls with tattoos and pages opining they were not pages and everything. For years. 10) (Ignore DanBri who now suggests that you could argue forever about the difference between the content of something and a description of it. He only does it to annoy because he knows it teases.) 11) After a few years enough people such as Ian D said they wanted an alternative architecture, where you do a GET on the URI of a thing, and a document about the thing is returned, and the URI is not the URI of the document. That led to the adoption of the 303. Which is still a problem as it takes time. 12) Still people say well, to know whether I use 200 or 303 I need to know if this sucker is an IR or NIR when instead they should be saying Well, am I going to serve the content of this sucker or information about it? 13) In fact lots of times, people serve information *about* something, not its contents, even though it has contents (it is an IR). 14) That's why I said Never assume that what an IR is about is not itself an IR 15) That's why I said Not useful to talk about NIRs. In brief. Or you can scan the email archive and see the long version. /pinch of salt If the set of information resources (IR) is not the same as the set of all resources (R), then the set R\IR (which in any case exists) is non-null, and might as well be called the set of 'non-information-resources' as anything else. But perhaps R\IR is a better notation. (I don't intend this to be hair-splitting) What exactly do you mean by hair-splitting? Parenthetically, what _is_ an IR? You can't define a set we are going to use mathematically exactly in terms of the real world, or people will always argue edge cases. 
You can define a set of things mathematically in terms of each other, or you can try to define them in words like an encyclopaedia, but if you try to do both you get endless arguments, as people argue about the terms you use. Documents like Jonathan's carefully defined these functions in terms of each other, and there have been millions of attempts to explain to different people on the lists in terms they understand, but if you Parenthetically, what is a Resource? Actually, what is an architecture? Referring to Rees's editors draft [1], [issue-14-resolved] effectively says that iff a resource X is 200-retrieved, then it must _always_ be assigned to the set IR (the resolution seems to effectively define 'being 200-retrievable' as the definition of 'information resource', and this is consistent with [1] section 1.1 which says One convention[...] was for a hashless URI to refer to the document-like entity (information resource) served at
Re: Change Proposal for HttpRange-14
Hi Michael, On 24 Mar 2012, at 23:16, Michael Brunnbauer wrote: so every publisher who wants to provide licencing information for his RDF has to either 1) use 303 redirects 2) publish no data at the NIR except the describedby triples, which seems pointless to me They can publish whatever data they like at the NIR. 3) use the same URI for the IR and the NIR This isn't best practice, as it isn't today. But of course people do it, just as they conflate different meanings for any URI. If he also wants to provide meta information for his HTML, he cannot publish the HTML at the NIR. I don't see a new and easier option offered by the proposal. In the end, people will do what they already do today. Yes, some publishers who are publishing according to the current specs (with 303 redirections being their only option) might continue to do what they do today. Others, as I have explained previously, may prefer to use 200 responses so that they can use the main (NIR) URI everywhere in their web application, while retaining the other responses that they already support for semantic web purists to access. Publishers who are not today using 303s but are nevertheless minting URIs which identify NIRs (ie those outside the semantic web community) will also continue to do what they do today. The difference is that we (semantic web purists) will no longer be constantly telling them that they're Doing It Wrong, but will be able to build consumers that cope with this reality. BTW: The POWDER describedby property suggests that you will find some information about the subject of the describedby triple when you dereference the object URI but this does not seem to be intended here. Wrong. The documents that are said to describe a URI should still describe that URI. POWDER with its power to assert metadata for whole collections of IRs also probably will contribute to the IR/NIR conflation. I think we should leave everything as it is and just not blame publishers who conflate IRs and NIRs. 
Sooner or later, they probably will fix it all. I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not what happens at the moment. Therefore we need to change something. Cheers, Jeni -- Jeni Tennison http://www.jenitennison.com
Re: Change Proposal for HttpRange-14
Hi, As you will have seen, I have now sent this Change Proposal to the TAG [1]. Technical discussion and comments should continue on www-...@w3.org. I note that there are other change proposals around for discussion as well [2][3][4]. The call is open until 29th March, so please get any other proposals in soon. The TAG meets 2-4th April and this is our first item of technical discussion [5]. Thanks everyone, Jeni [1] http://lists.w3.org/Archives/Public/www-tag/2012Mar/0086.html [2] http://lists.w3.org/Archives/Public/www-tag/2012Mar/.html [3] http://lists.w3.org/Archives/Public/www-tag/2012Mar/0006.html [4] http://lists.w3.org/Archives/Public/www-tag/2012Mar/0085.html [5] http://www.w3.org/2001/tag/2012/04/02-agenda On 22 Mar 2012, at 20:21, Jeni Tennison wrote: Hi there, Hopefully you're all aware that there's a Call for Change Proposals [1] to amend the TAG's long-standing HttpRange-14 decision [2]. Jonathan Rees has put together a specification that expresses that decision in a more formal way [3], against which changes need to be made. Leigh Dodds, Dave Reynolds, Ian Davis and I have put together a Change Proposal [4], which I've copied below. From a publishing perspective, the basic change is that it becomes acceptable for publishers to publish data about non-information resources with a 200 response; if a publisher wants to provide licensing/provenance information they can use a wdrs:describedby statement to point to a separate resource about which such information could be provided. From a consumption perspective, the basic change is that consumers can no longer assume that a 2XX response implies that the resource is an information resource, though they can make that inference if the resource is the object of a wdrs:describedby statement or has been reached by following a 303 redirection or a 'describedby' Link header. 
The aim of this email is not to start a discussion about the merits of this or any other Change Proposal, but to make a very simple request: if you agree with these changes, please can you add your name to the document at: https://docs.google.com/document/d/1aSI7LpD4UDuHiDNqx8qN1W400QeZdzWYD-CRuU0Xmk0/edit That document also contains a link to the Google Doc version of the proposal [4] if you want to add comments. We will not be making substantive changes to this Change Proposal: if you want to suggest a different set of changes to the HttpRange-14 decision, I heartily recommend that you create a Change Proposal yourself! :) You should feel free to use this Change Proposal as a basis for yours if you want. Note that the deadline for doing so is 29th March (ie one week from today) so that the proposals can be discussed at the TAG F2F meeting the following week. Thanks, Jeni [1] http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html [2] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html [3] http://www.w3.org/2001/tag/doc/uddp/ [4] https://docs.google.com/document/d/1ognNNOIcghga9ltQdoi-CvbNS8q-dOzJjhMutJ7_vZo/edit --- Summary This proposal contains two substantive changes. First, it enables publishers to link to URI documentation for a given probe URI by providing a 200 response to that probe URI that contains a statement including a ‘describedby’ relationship from the probe URI to the URI documentation. Second, a 200 response to a probe URI no longer implies that the probe URI identifies an information resource; instead, this can only be inferred if the probe URI is the object of a ‘describedby’ relationship. 
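The consumer-side half of the Summary above can be sketched as a tiny decision function. This is a non-normative reading: the function name and the triple encoding are mine; only the wdrs:describedby property and the "200 no longer implies IR" rule come from the proposal text.

```python
# A toy, non-normative reading of the proposal's consumer rule: a 200
# response alone licenses no conclusion about the probe URI; the
# information-resource inference is only available when the URI appears
# as the *object* of a wdrs:describedby statement.

WDRS_DESCRIBEDBY = "http://www.w3.org/2007/05/powder-s#describedby"

def may_infer_information_resource(uri, triples):
    """True iff `uri` is the object of a wdrs:describedby triple in the
    data this consumer has gathered."""
    return any(p == WDRS_DESCRIBEDBY and o == uri for (_, p, o) in triples)

data = [
    # the probe URI (meant to denote a person) points at its documentation
    ("http://example.com/alice", WDRS_DESCRIBEDBY, "http://example.com/alice.rdf"),
]

print(may_infer_information_resource("http://example.com/alice", data))      # subject: no inference
print(may_infer_information_resource("http://example.com/alice.rdf", data))  # object: IR inference allowed
```

Note the asymmetry the proposal relies on: the same 200-served data both blocks the inference for the thing's URI and enables it for the describing document's URI.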
Rationale While there are instances of linked data websites using 303 redirections, there are also many examples of people making statements about URIs (particularly using HTML link relations, RDFa, microdata, and microformats) where those statements indicate that the URI is supposed to identify a non-information resource such as a Person or Book. Rather than simply telling these people that they are Doing It Wrong, “Understanding URI Hosting Practice as Support for URI Documentation Discovery” should ensure that: * applications that interpret such data do not draw wrong conclusions about these URIs simply because they return a 200 response without a describedby Link header * publishers of this data can easily upgrade to making the distinction between the non-information resource that the page holds information about and the information resource that is the page itself, should they discover that they need to Details In section 4.1, in place of the second paragraph and following list, substitute: There are three ways to locate a URI documentation link in an HTTP response: * using the Location: response header of a 303 See Other response [httpbis-2], e.g. 303 See Other Location: http://example.com/uri-documentation * using a Link: response header with link relation
Re: Change Proposal for HttpRange-14
Hello Jeni, On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote: I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not what happens at the moment. Therefore we need to change something. Do you think semantic web projects have been stopped because some purist involved did not see a way to bring httprange14 into agreement with the other intricacies of the project ? Those purists will still see the new options that the proposal offers as what they are: suboptimal. Or do you think some purists have actually been blaming publishers ? What will stop them in the future from complaining like this: Hey, your website consists solely of NIRs, I cannot talk about it! Please use 303. You are solving the problem by pretending that the IRs are not there when the publisher does not make the distinction between IR and NIR. Maybe we can optimize the wording of standards and best practice guides to something like these are the optimal solutions. Many people also do it this way but this has the following drawbacks... Regards, Michael Brunnbauer -- ++ Michael Brunnbauer ++ netEstate GmbH ++ Geisenhausener Straße 11a ++ 81379 München ++ Tel +49 89 32 19 77 80 ++ Fax +49 89 32 19 77 89 ++ E-Mail bru...@netestate.de ++ http://www.netestate.de/ ++ ++ Sitz: München, HRB Nr.142452 (Handelsregister B München) ++ USt-IdNr. DE221033342 ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
Re: Change Proposal for HttpRange-14
Fair questions, Michael. I have a lot of sympathy for your "I don't see the point of this whole discussion." We can write what we want in documents, but the world can ignore them - and will if they don't work. And the world will be what it is, not what we want it to be. However. Unfortunately, perhaps, standards are important for people who work in the field providing systems to others. Personally, I never did agree with the solution, but have always aimed to carry out the implications of it in the systems I construct. This is for two reasons: a) as a member of a small community, it is destructive to do otherwise; b) as a professional engineer, my ethical obligations require me to do so. It is this second, the ethical obligations, that is the most significant. I should not digress from the standards, or even Best Practice, in my work. (Apart from anything else, the legal implications of doing otherwise are very unpleasant.) This means that systems involving Linked Data do not get built because the options I am allowed to offer are too expensive (in money, complexity, time or business disruption), or technologically infeasible due to local constraints. So the answer to your first question is yes: (parts of) semantic web projects are stopped because of this. Ethics and community membership require it. When they do go ahead, of course they actually cause me some pain - implementing a situation I think is significantly sub-optimal - but I do not have the choice. Of course, people who are outside this community will do what they feel like, as always. But the current situation constrains the people in the community, who are the very people who should be helping others to build systems that are a little less broken. Best Hugh On 25 Mar 2012, at 11:03, Michael Brunnbauer wrote: Hello Jeni, On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote: I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not what happens at the moment. 
Therefore we need to change something. Do you think semantic web projects have been stopped because some purist involved did not see a way to bring httprange14 into agreement with the other intricacies of the project ? Those purists will still see the new options that the proposal offers as what they are: Suboptimal. Or do you think some purists have been actually blaming publishers ? What will stop them in the future to complain like this: Hey, your website consists solely of NIRs, I cannot talk about it! Please use 303. You are solving the problem by pretending that the IRs are not there then the publisher does not make the distinction between IR and NIR. Maybe we can optimize the wording of standards and best practise guides to something like these are the optimal solutions. Many people also do it this way but this has the following drawbacks... Regards, Michael Brunnbauer -- ++ Michael Brunnbauer ++ netEstate GmbH ++ Geisenhausener Straße 11a ++ 81379 München ++ Tel +49 89 32 19 77 80 ++ Fax +49 89 32 19 77 89 ++ E-Mail bru...@netestate.de ++ http://www.netestate.de/ ++ ++ Sitz: München, HRB Nr.142452 (Handelsregister B München) ++ USt-IdNr. DE221033342 ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel -- Hugh Glaser, Web and Internet Science Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ Work: +44 23 8059 3670, Fax: +44 23 8059 3045 Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652 http://www.ecs.soton.ac.uk/~hg/
Re: Change Proposal for HttpRange-14
Michael, On 25 Mar 2012, at 11:03, Michael Brunnbauer wrote: On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote: I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not what happens at the moment. Therefore we need to change something. Do you think semantic web projects have been stopped because some purist involved did not see a way to bring httprange14 into agreement with the other intricacies of the project ? Those purists will still see the new options that the proposal offers as what they are: Suboptimal. What would be optimal in your view? Or do you think some purists have been actually blaming publishers ? What will stop them in the future to complain like this: Hey, your website consists solely of NIRs, I cannot talk about it! Please use 303. Nothing. In fact TimBL has already said this [1], and Jonathan has pointed out what such people will have to do to make those kinds of statements [2]. This is already listed as a disadvantage in the proposal. I recognise it's a disadvantage, I just think it is worth the hit compared to the advantages of the change. You are solving the problem by pretending that the IRs are not there then the publisher does not make the distinction between IR and NIR. No, I am just proposing stopping pretending that the NIR is not there, which is what is mandated by the current httpRange-14 design. Maybe we can optimize the wording of standards and best practise guides to something like these are the optimal solutions. Many people also do it this way but this has the following drawbacks... Yes, as I argued here [3] I strongly believe that casting the separation of IR and NIR as a best practice rather than a vital necessity is the right way to go. Cheers, Jeni [1] http://lists.w3.org/Archives/Public/public-lod/2012Mar/0143.html [2] http://lists.w3.org/Archives/Public/public-lod/2012Mar/0144.html [3] http://www.jenitennison.com/blog/node/159 -- Jeni Tennison http://www.jenitennison.com
Re: Change Proposal for HttpRange-14
Hello Jeni, On Sun, Mar 25, 2012 at 12:31:18PM +0100, Jeni Tennison wrote: Those purists will still see the new options that the proposal offers as what they are: Suboptimal. What would be optimal in your view? I do not know a way to mint two URIs (IR+NIR) in a way that is less painful. You are solving the problem by pretending that the IRs are not there when the publisher does not make the distinction between IR and NIR. No, I am just proposing stopping pretending that the NIR is not there, which is what is mandated by the current httpRange-14 design. If - like Hugh suggested - httpRange-14 is really stopping people inside the community from delivering solutions and those people are willing to sacrifice the IRs (although I find both of these hard to believe) - then you have good reasons to go ahead. But this makes me think about what those same people will be unable to deliver because they cannot make the default IR assumption any more (as I said, the rest of the world will probably go on making it). Perhaps the default IR assumption could be saved by saying that a 200 URI X is an IR as long as we don't find some triple at X that suggests otherwise. Why not a NIR class ? If the concept of IRs/NIRs is sufficiently unambiguous to talk about it in natural language (I think it is), we can talk about it in RDF. Regards, Michael Brunnbauer -- ++ Michael Brunnbauer ++ netEstate GmbH ++ Geisenhausener Straße 11a ++ 81379 München ++ Tel +49 89 32 19 77 80 ++ Fax +49 89 32 19 77 89 ++ E-Mail bru...@netestate.de ++ http://www.netestate.de/ ++ ++ Sitz: München, HRB Nr.142452 (Handelsregister B München) ++ USt-IdNr. DE221033342 ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
Re: Change Proposal for HttpRange-14
On 2012-03 -25, at 07:31, Jeni Tennison wrote: [..] Yes, as I argued here [3] I strongly believe that casting the separation of IR and NIR as a best practice rather than a vital necessity is the right way to go. Let me assume that you meant: [..] Yes, as I argued here [3] I strongly believe that casting the separation of an IR and the thing it describes as a best practice rather than a vital necessity is the right way to go. To actually confuse those things in a system is, to me, absolutely unacceptable. When I build rule files or systems, some of them deal with documents and some with things that those documents describe, and in general they do both. If you want to define a new proposal then it had better be one where for a given URL I know which it identifies. Pre-HR14, I could do that by looking at the URL. Post-HR14, I had to do a network operation to find out, but I can put up with that if it REALLY helps people. With your change proposal, there are times when you don't know at all! For example, under your change proposal, what does http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 identify? If card:i :likes http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11. Do I like a book or a whale? To not know is unacceptable to me. And to merge the two, the IR and what it describes, as the same thing, is unacceptable too. Tim Cheers, Jeni [1] http://lists.w3.org/Archives/Public/public-lod/2012Mar/0143.html [2] http://lists.w3.org/Archives/Public/public-lod/2012Mar/0144.html [3] http://www.jenitennison.com/blog/node/159 -- Jeni Tennison http://www.jenitennison.com
Re: Change Proposal for HttpRange-14
Michael and all, greetings. On 2012 Mar 25, at 14:19, Michael Brunnbauer wrote: Perhaps the default IR assumption be saved by saying that a 200 URI X is a IR as long as we don't find some triple at X that suggests otherwise. Why not a NIR class ? If the concept of IRs/NIRs is sufficiently unambiguous to talk about it in natural language (I think it is), we can talk about it in RDF. I confess I haven't kept fully up with the details of this suddenly rampant thread, but this suggestion is the one I associate with Ian Davis back in the 'Is 303 really necessary?' thread of November 2010 (that long ago!?). One can characterise this as 'httpRange-14 is defeasible', or, as a procedure: After a client has extracted all of the 'authoritative' statements about a resource X, which is retrieved with a 200 status, it rfc2119-should add the triple 'X a eg:InformationResource', unless this would create a contradiction. Why would this create a contradiction? The resource X might explicitly say that it is a eg:NonInformationResource; it might be declared to be a eg:Book, which is here or elsewhere declared to be subClassOf eg:NonInformationResource; or X might be in the domain or range of a property which indicates that it is a non-IR, such as for example :describedBy. What's 'extracted'? That could include RDF+conneg, or RDFa, or some semi-formal microformats-based process, or anything you like. What's 'authoritative'? That's to some extent up to the client, but it would sensibly be the list of statements 200-retrieved from the resource itself. That seems to include the practice described by Jeni's change request, and so inherit its advantages. It avoids telling anyone they're Doing It Wrong, with a 200 NIR resource. If someone at present describes a NIR with a 200 response, they can 'fix' that with a simple one-triple addition. Also, it leaves it entirely up to the resource owner to decide how many URIs they wish to maintain, and which one documents which. 
I'm sure most RDF descriptions of NIRs already do implicitly declare that they are NIRs. This overall seems to be the intent behind the :isdescribedby proposal. Is that correct? Best wishes, Norman -- Norman Gray : http://nxg.me.uk SUPA School of Physics and Astronomy, University of Glasgow, UK
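Norman's defeasible default can be sketched very compactly. Everything here is invented for illustration: the eg: class names and the crude contradiction test (a hand-maintained set of classes assumed disjoint with InformationResource); a real client would use RDFS/OWL reasoning over the authoritative statements instead.

```python
# A minimal sketch of the defeasible-httpRange-14 procedure described
# above: after a 200 retrieval, add the default IR typing for X unless
# the retrieved statements already contradict it. Class names and the
# disjointness set are hypothetical, for illustration only.

EG_IR = "eg:InformationResource"
DISJOINT_WITH_IR = {"eg:NonInformationResource", "eg:Book", "eg:Person"}

def with_default_ir(x, triples):
    """Return triples plus (x rdf:type eg:InformationResource), unless x
    is already typed with a class assumed disjoint from IR."""
    contradiction = any(s == x and p == "rdf:type" and o in DISJOINT_WITH_IR
                        for (s, p, o) in triples)
    return list(triples) if contradiction else list(triples) + [(x, "rdf:type", EG_IR)]

# An ordinary page gets the default; a declared Book blocks it.
page = with_default_ir("eg:home", [("eg:home", "dc:title", "My homepage")])
book = with_default_ir("eg:moby", [("eg:moby", "rdf:type", "eg:Book")])
```

The attraction of this shape is visible even in the toy: publishers who have 200-served a NIR "fix" it with a single added triple, and legacy 200 responses keep their traditional IR reading by default.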
Re: Change Proposal for HttpRange-14
On 3/25/12 5:13 AM, Jeni Tennison wrote: I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not what happens at the moment. Therefore we need to change something. They only get blamed when they claim that they are publishing Linked Data. If they don't do that nobody will complain. All Structured Data isn't Linked Data. All Linked Data is a form of Structured Data. HttpRange-14 findings facilitate the co-existence of Structured Data and Linked Data on the Web. RDF and its family of syntaxes and serialization formats are vehicles for constructing resources that bear structured data. The same applies to HTML, XML, JSON etc. None of these syntaxes produce Linked Data implicitly; you have to adhere to Linked Data principles for that to happen. The fundamental concern I have right now is that this effort is conflating basic Structured Data and the fidelity of Linked Data. You don't need any kind of revision to HttpRange-14 recommendations to enable what has long been reality on the Web. By that I mean: people have conflated Names and Addresses via URIs forever. Said conflation is only an issue when the end product is inaccurately classified as being Linked Data principles compliant. Linked Data is a different system or dimension of the Web. Without its fidelity many critical items become impossible to implement at Web scale: 1. data access by Name reference; 2. equivalence fidelity and inference; 3. distributed verifiable identity; 4. a functional read-write web. Conflate Names and Addresses and the above simply fail. Structured Data is growing exponentially on the Web thanks to efforts such as schema.org, Facebook Open Graph, and the emergence of JSON as an alternative to XML re. structured data representation syntax. That's a good thing. The more Structured Data we have on the Web the easier it becomes to explain and demonstrate the unique fidelity and benefits that Linked Data introduces. 
To conclude, we need to change our tendency to conflate matters since all Structured Data != Linked Data. Every time we conflate everything gets mucked up and things stall. -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile: https://plus.google.com/112399767740508618350/about LinkedIn Profile: http://www.linkedin.com/in/kidehen
Re: Change Proposal for HttpRange-14
On 3/25/12 6:03 AM, Michael Brunnbauer wrote: Hello Jeni, On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote: I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not what happens at the moment. Therefore we need to change something. Do you think semantic web projects have been stopped because some purist involved did not see a way to bring httprange14 into agreement with the other intricacies of the project ? Those purists will still see the new options that the proposal offers as what they are: Suboptimal. Or do you think some purists have been actually blaming publishers ? What will stop them in the future to complain like this: Hey, your website consists solely of NIRs, I cannot talk about it! Please use 303. You are solving the problem by pretending that the IRs are not there then the publisher does not make the distinction between IR and NIR. Maybe we can optimize the wording of standards and best practise guides to something like these are the optimal solutions. Many people also do it this way but this has the following drawbacks... Regards, Michael Brunnbauer +1 Structured Data != Linked Data. Linked Data == Structured Data. -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile: https://plus.google.com/112399767740508618350/about LinkedIn Profile: http://www.linkedin.com/in/kidehen
Re: Change Proposal for HttpRange-14
On 2012-03-23, at 21:02, Jeni Tennison wrote: On 23 Mar 2012, at 22:42, Jonathan A Rees wrote: On Thu, Mar 22, 2012 at 4:21 PM, Jeni Tennison j...@jenitennison.com wrote: While there are instances of linked data websites using 303 redirections, there are also many examples of people making statements about URIs (particularly using HTML link relations, RDFa, microdata, and microformats) where those statements indicate that the URI is supposed to identify a non-information resource such as a Person or Book. Can you provide a handful of these Doing It Wrong URIs please, from various sites? I think it would really be helpful to have them on hand during discussions. OK. These were picked up from dumps made available by webdatacommons.org, so very grateful to them for making that available; it can be quite hard to locate this kind of markup generally. Also I've used Gregg's distiller [1] to extract the RDFa from the documents to double-check. http://www.logosportswear.com/product/1531 301-redirects to http://www.logosportswear.com/product/1531/harbor-cruise-boat-tote, which contains the RDFa statement <http://www.logosportswear.com/product/1531> a <http://rdf.data-vocabulary.org/#Product> . The URI is intended to identify a product, not a web page. Indeed, and notice that it has a different URI from the web page. The 301 could easily be changed to a 303, and all would be happy. They have done the difficult bit of separating out the product and the page. A site which uses 301 is, by the way, saying that the URI you asked for is obsolete, and you should stop using it. Tim
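Tim's 301-versus-303 distinction can be sketched as a tiny classifier. The function name and return strings below are illustrative assumptions, not from any spec or library:

```python
# Sketch: how an httpRange-14-aware agent might read a response status.
# Hypothetical helper; the strings are summaries, not normative wording.

def interpret_redirect(status: int) -> str:
    """Map an HTTP status to its httpRange-14 reading."""
    if status == 301:
        # Moved Permanently: the requested URI is obsolete;
        # clients should switch to the Location target.
        return "obsolete URI: update references to the Location target"
    if status == 303:
        # See Other: the URI may name a non-information resource;
        # the Location target merely describes it.
        return "URI may name a non-information resource; Location describes it"
    if status == 200:
        # Under the classic rule, a 200 means the URI names an
        # information resource.
        return "URI names an information resource"
    return "no httpRange-14 inference"
```

Under this reading, the site's 301 tells agents to abandon the short product URI, while a 303 would have kept it usable as a name for the product itself.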
Re: Change Proposal for HttpRange-14
On 3/25/12 7:18 AM, Hugh Glaser wrote: Fair questions, Michael. I have a lot of sympathy for your 'I don't see the point of this whole discussion'. We can write what we want in documents, but the world can ignore them - and will if they don't work. And the world will be what it is, not what we want it to be. However. Unfortunately, perhaps, standards are important for people who work in the field providing systems to others. Personally, I never did agree with the solution, but have always aimed to carry out the implications of it in the systems I construct. This is for two reasons: a) as a member of a small community, it is destructive to do otherwise; b) as a professional engineer, my ethical obligations require me to do so. It is this second, the ethical obligations, that is the most significant. I should not digress from the standards, or even Best Practice, in my work. (Apart from anything else, the legal implications of doing otherwise are very unpleasant.) This means that systems involving Linked Data do not get built because the options I am allowed to offer are too expensive (in money, complexity, time or business disruption), or technologically infeasible due to local constraints. But as an engineer, the complexity of the spec shouldn't determine the very essence of the spec. The whole AWWW is about the deceptively simple principle in action. It isn't simply a simple solution. We have URI abstraction and styles of URIs (hash or slash). The system (Linked Data in this case) is concerned about separation of powers right down to the fine-grained level of structured data representation. As a result, there are implications that arise from the style of URI used in this context. Since 1998 we've ended up with the following syntaxes and serialization formats for the RDF model (EAV enhanced with URIs, language tags, and typed literals): 1. RDF/XML 2. N3 3. Turtle 4. TriX 5. N-Triples 6. TriG 7. N-Quads 8. (X)HTML+RDFa 9. HTML+Microdata 10. JSON/RDF 11. JSON-LD.
Don't you see a pattern here? Also, what's an innocent newbie supposed to do when they encounter the above? Now we want to repeat the pattern, this time scoped to URIs and their usage re. Linked Data fidelity: 1. hash -- Linked Data indirection is implicit 2. slash -- 303 redirection delivering Linked Data indirection explicitly 3. slash -- 200 OK and no redirection, leaving user agents to process relations (and HTTP response headers) en route to manifestation of Linked Data's mandatory indirection. Again, don't you see that the same pattern is taking shape, i.e., a potpourri of suggestions that ultimately only adds more confusion for newbies? Even worse, this particular suggestion is ultimately a reworking of the entire AWWW. So the answer to your first question is yes: semantic web (parts of) projects are stopped because of this. I don't buy that for one second. There's a little more to it than that. How about the tools being used for these projects? Your statement implies the very best tools available were used and they failed. You know that cannot be true. Ethics and community membership requires it. When they do go ahead, of course they actually cause me some pain - implementing a situation I think is significantly sub-optimal - but I do not have the choice. We have to separate issues here. We have: 1. a spec or set of best practices; 2. tools that implement the spec or best practices; 3. projects seeking to exploit the spec or best practices. You are basically ruling out tool choices as reasons for project failure. Of course, people who are outside this community will do what they feel like, as always. And in due course opportunity costs force them to reevaluate their choices. Decision makers in commercial enterprises don't care about technology, they are fundamentally preoccupied with opportunity costs. Make opportunity costs palpable and you have the ear of any decision maker in charge of a commercial venture.
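The three deployment styles above can be sketched as data an agent might act on. The URIs and dictionary layout below are made up for illustration:

```python
# Sketch of the three URI styles and where each leads an agent for the
# describing document. All URIs here are hypothetical.

PATTERNS = {
    # hash: the fragment is stripped before the HTTP request, so the
    # indirection between thing and document is implicit.
    "hash": {"request": "http://example.org/doc#thing",
             "dereferenced": "http://example.org/doc", "status": 200},
    # slash + 303: the redirect makes the indirection explicit.
    "slash-303": {"request": "http://example.org/thing",
                  "status": 303, "location": "http://example.org/doc"},
    # slash + 200: the response itself comes back; the agent must inspect
    # relations (e.g. describedby) to separate thing from document.
    "slash-200": {"request": "http://example.org/thing", "status": 200},
}

def description_document(pattern: str) -> str:
    """Return where the agent finds the document describing the thing."""
    p = PATTERNS[pattern]
    if pattern == "hash":
        return p["dereferenced"]
    if pattern == "slash-303":
        return p["location"]
    return p["request"]
```

All three converge on a describing document; they differ only in how much work, and how much ambiguity, is pushed onto the agent.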
But the current situation constrains the people in the community, who are the very people who should be helping others to build systems that are a little less broken. It doesn't. I just don't buy that. You can have Structured Data that isn't Linked Data. We can't have it both ways. Why not move folks over in stages, i.e., get them to Structured Data first, then upgrade them to Linked Data; the virtues of the upgrade will have much clearer context, since Structured Data without Linked Data fidelity has clear limitations. Basically, turn what seems to be today's headache into a narrative showcasing specific virtues. Also note, we don't have a bookmarking problem with any style of URI for Linked Data. People can start by bookmarking the URLs of Information Resources. Kingsley Best Hugh On 25 Mar 2012, at 11:03, Michael Brunnbauer wrote: Hello Jeni, On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote: I agree we shouldn't blame publishers who
Re: Change Proposal for HttpRange-14
Hi Kingsley, On 25 Mar 2012, at 17:17, Kingsley Idehen wrote: On 3/25/12 7:18 AM, Hugh Glaser wrote: Fair questions, Michael. I have a lot of sympathy for your I don't see the point of this whole discussion. We can write what we want in documents, but the world can ignore them - and will if they don't work. And the world will be what it is, not what we want it to be. However. Unfortunately, perhaps, standards are important for people who work in the field providing systems to others. Personally, I never did agree with the solution, but have always aimed to carry out the implications of it in the systems I construct. This is for two reasons: a) as a member of a small community, it is destructive to do otherwise; b) as a professional engineer, my ethical obligations require me to do so. It is this second, the ethical obligations that are the most significant. I should not digress from the standards, or even Best Practice, in my work. (Apart from anything else, the legal implications of doing otherwise are very unpleasant.) This means that systems involving Linked Data do not get built because the options I am allowed to offer are too expensive (in money, complexity, time or business disruption), or technologically infeasible due to local constraints. But as an engineer the complexity of the spec shouldn't determine the very essence of the spec. The whole AWWW is about the deceptively simple principle in action. It isn't a simply simple solution. I keep meaning to ask: what is AWWW? It's not a term I see used anywhere but your emails. We have URI abstraction and styles of URIs (hash or slash). The system (Linked Data in this case) is concerned about separation of powers right down to the fine-grained level of structured data representation. As result, there are implications that arise from the style of URI used in this context. 
Since 1998 we've ended up with the following syntaxes and serialization formats for the RDF model (EAV enhanced with URIs, language tags, and typed literals): 1. RDF/XML 2. N3 3. Turtle 4. TriX 5. N-Triples 6. TriG 7. N-Quads 8. (X)HTML+RDFa 9. HTML+Microdata 10. JSON/RDF 11. JSON-LD. Don't you see a pattern here? Also, what's an innocent newbie supposed to do when they encounter the above? Probably run screaming from the room. Or at least tell us to go away and come back when the community has sorted itself out. (Were I to present things this way.) Now we want to repeat the pattern, this time scoped to URIs and their usage re. Linked Data fidelity: 1. hash -- Linked Data indirection is implicit 2. slash -- 303 redirection delivering Linked Data indirection explicitly 3. slash -- 200 OK and no redirection, leaving user agents to process relations (and HTTP response headers) en route to manifestation of Linked Data's mandatory indirection. Again, don't you see that the same pattern is taking shape, i.e., a potpourri of suggestions that ultimately only adds more confusion for newbies? Even worse, this particular suggestion is ultimately a reworking of the entire AWWW. I'm not sure I agree with your assertion of the same pattern. In any case, I didn't say this proposal was perfect - I would do it differently. But if it is a broken world - not fixing it should not be an option. You and I will have to differ as to whether the Project is currently a success - you clearly think so - I think that we are far back from where we should be by now. So the answer to your first question is yes: semantic web (parts of) projects are stopped because of this. I don't buy that for one second. There's a little more to it than that. How about the tools being used for these projects? Your statement implies the very best tools available were used and they failed. You know that cannot be true. Actually, it is.
Your fallacy is to think that these are purely technological issues, and can always be solved with tools. These are socio-technical issues at best. Ethics and community membership requires it. When they do go ahead, of course they actually cause me some pain - implementing a situation I think is significantly sub-optimal - but I do not have the choice. We have to separate issues here. We have: 1. a spec or set of best practices; 2. tools that implement the spec or best practices; 3. projects seeking to exploit the spec or best practices. You are basically ruling out tool choices as reasons for project failure. Of course, people who are outside this community will do what they feel like, as always. And in due course opportunity costs force them to reevaluate their choices. Decision makers in commercial enterprises don't care about technology, they are fundamentally preoccupied with opportunity costs. Make opportunity costs palpable and you have the ear of any decision maker in charge of a commercial venture. But the current situation constrains the people in
Re: Change Proposal for HttpRange-14
On 3/25/12 1:07 PM, Niklas Lindström wrote: To clarify, what do I mean by another information resource? Isn't a representation of a resource also a resource, ultimately different from the thing itself? Yes! We have the following in play, but never easily discernible from Semantic Web and Linked Data narratives: 1. A Document (a Resource) -- bears representations of whatever 2. A Descriptor Document (a subClassOf Document) -- specifically bears a representation of the description of an unambiguously named subject 3. An unambiguously named subject or entity. The term Resource continues to be used carelessly and the net effect is utter confusion :-( When reading Semantic Web (and even Linked Data) literature, it's very easy for the untrained eye to assume 1-3 are all Web resources. The subject of a description may or may not be a Web realm entity. It's just something that has caught the interest of an author (creator) of a description document. -- Regards, Kingsley Idehen
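The three-way distinction above could be rendered as a minimal class hierarchy; the class names are illustrative, not from any published vocabulary:

```python
# Sketch of the distinction: Document, its subclass DescriptorDocument,
# and Subject, which need not be a Web resource at all.

class Document:
    """A Web resource that bears representations."""

class DescriptorDocument(Document):
    """A Document that bears the description of an unambiguously
    named subject."""

class Subject:
    """An unambiguously named entity; may or may not exist on the Web."""
    def __init__(self, name: str):
        self.name = name
```

Note that Subject deliberately does not inherit from Document: conflating the two is exactly the confusion being described.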
Re: Change Proposal for HttpRange-14
On 3/25/12 1:18 PM, Hugh Glaser wrote: Hi Kingsley, On 25 Mar 2012, at 17:17, Kingsley Idehen wrote: On 3/25/12 7:18 AM, Hugh Glaser wrote: Fair questions, Michael. I have a lot of sympathy for your I don't see the point of this whole discussion. We can write what we want in documents, but the world can ignore them - and will if they don't work. And the world will be what it is, not what we want it to be. However. Unfortunately, perhaps, standards are important for people who work in the field providing systems to others. Personally, I never did agree with the solution, but have always aimed to carry out the implications of it in the systems I construct. This is for two reasons: a) as a member of a small community, it is destructive to do otherwise; b) as a professional engineer, my ethical obligations require me to do so. It is this second, the ethical obligations that are the most significant. I should not digress from the standards, or even Best Practice, in my work. (Apart from anything else, the legal implications of doing otherwise are very unpleasant.) This means that systems involving Linked Data do not get built because the options I am allowed to offer are too expensive (in money, complexity, time or business disruption), or technologically infeasible due to local constraints. But as an engineer the complexity of the spec shouldn't determine the very essence of the spec. The whole AWWW is about the deceptively simple principle in action. It isn't a simply simple solution. I keep meaning to ask: what is AWWW? It's not a term I see used anywhere but your emails. Architecture of the World Wide Web. We have URI abstraction and styles of URIs (hash or slash). The system (Linked Data in this case) is concerned about separation of powers right down to the fine-grained level of structured data representation. As result, there are implications that arise from the style of URI used in this context. 
Since 1998 we've ended up with the following syntaxes and serialization formats for the RDF model (EAV enhanced with URIs, language tags, and typed literals): 1. RDF/XML 2. N3 3. Turtle 4. TriX 5. N-Triples 6. TriG 7. N-Quads 8. (X)HTML+RDFa 9. HTML+Microdata 10. JSON/RDF 11. JSON-LD. Don't you see a pattern here? Also, what's an innocent newbie supposed to do when they encounter the above? Probably run screaming from the room. Or at least tell us to go away and come back when the community has sorted itself out. (Were I to present things this way.) Now we want to repeat the pattern, this time scoped to URIs and their usage re. Linked Data fidelity: 1. hash -- Linked Data indirection is implicit 2. slash -- 303 redirection delivering Linked Data indirection explicitly 3. slash -- 200 OK and no redirection, leaving user agents to process relations (and HTTP response headers) en route to manifestation of Linked Data's mandatory indirection. Again, don't you see that the same pattern is taking shape, i.e., a potpourri of suggestions that ultimately only adds more confusion for newbies? Even worse, this particular suggestion is ultimately a reworking of the entire AWWW. I'm not sure I agree with your assertion of the same pattern. In any case, I didn't say this proposal was perfect - I would do it differently. But if it is a broken world - not fixing it should not be an option. I don't think the AWWW is broken. That's my fundamental argument. You and I will have to differ as to whether the Project is currently a success - you clearly think so - I think that we are far back from where we should be by now. I don't think so. There is more structured data on the Web, and it is growing exponentially. This simplifies the entire pursuit of Web scale Linked Data. So the answer to your first question is yes: semantic web (parts of) projects are stopped because of this. I don't buy that for one second. There's a little more to it than that. How about the tools being used for these projects?
Your statement implies the very best tools available were used and they failed. You know that cannot be true. Actually, it is. Your fallacy is to think that these are purely technological issues, and can always be solved with tools. I know these issues can be solved by tools. I've designed such tools and they are in broad use :-) These are socio-technical issues at best. Ethics and community membership requires it. When they do go ahead, of course they actually cause me some pain - implementing a situation I think is significantly sub-optimal - but I do not have the choice. We have to separate issues here. We have: 1. a spec or set of best practices; 2. tools that implement the spec or best practices; 3. projects seeking to exploit the spec or best practices. You are basically ruling out tool choices as reasons for project failure. Of course, people who are outside this community will do what they feel like, as always. And in due course opportunity costs force them to
Re: Change Proposal for HttpRange-14
On Mar 24, 2012, at 08:38, James Leigh wrote: On Sat, 2012-03-24 at 08:11 +0000, Jeni Tennison wrote: Can I just cast that into the language used by the rest of the proposal? What about: when documentation is served with a 200 response from a probe URI and does not contain a 'describedby' statement, some agents (including the publisher) might use it to identify the documentation and others a non-information resource. Publishers still need to provide support for two distinct URIs if they want to enable more consistent use of the URI. How does that sound? I'd buy into that. It works, but asks a lot from implementors and users to read and understand the subtlety. That's why I'd prefer an approach that provides a simpler, unambiguous definition. Regards, Dave Regards, James
Re: Change Proposal for HttpRange-14
On 25 March 2012 11:03, Michael Brunnbauer bru...@netestate.de wrote: Hello Jeni, On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote: I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not what happens at the moment. Therefore we need to change something. Do you think semantic web projects have been stopped because some purist involved did not see a way to bring httprange14 into agreement with the other intricacies of the project ? Those purists will still see the new options that the proposal offers as what they are: Suboptimal. Or do you think some purists have been actually blaming publishers ? [...] http://go-to-hellman.blogspot.co.uk/2009/10/new-york-times-blunders-into-linked.html comes close to doing so... though more around semantics of 'sameas' than IR/NIR. Dan
Re: Change Proposal for HttpRange-14
Tim, greetings. On 2012 Mar 25, at 17:35, Tim Berners-Lee wrote: (Not useful to talk about NIRs. The web architecture does not. Nor does Jonathan's baseline, nor HTTP Range-14. Never assume that what an IR is about is not itself an IR.) Well, httpRange-14 sort of does talk about 'non-information resources', by necessary implication. If the set of information resources (IR) is not the same as the set of all resources (R), then the set R\IR (which in any case exists) is non-null, and might as well be called the set of 'non-information resources' as anything else. But perhaps R\IR is a better notation. (I don't intend this to be hair-splitting.) Parenthetically, what _is_ IR? Referring to Rees's editors' draft [1], [issue-14-resolved] effectively says that iff a resource X is 200-retrieved, then it must _always_ be assigned to the set IR (the resolution seems to effectively define 'being 200-retrievable' as the definition of 'information resource', and this is consistent with [1] section 1.1, which says One convention[...] was for a hashless URI to refer to the document-like entity (information resource) served at that URI). So my phrasing was intended to weaken [issue-14-resolved] to suggest that X being 200-retrievable puts X in IR _only_ if the documentation about X (retrieved by conneg on X, say) does not put it in R\IR. How something is put into R\IR is a separate issue. Perhaps there's a need for a class std:RnotIR, or perhaps this is up to the client, who may decide that discovering that 'X a foaf:Person' is enough to put it in R\IR for the client's purposes. Example: So, if X=http://example.org/cedric 200-returns foaf:name "Cedric" . then X is in IR, and oddly enough has a name (the domain of foaf:name isn't restricted to foaf:Person). If it 200-returns foaf:name "Cedric" ; a foaf:Person . then the client should deem X to be in R\IR. This does mean that the RDF description document which has been retrieved from the URI X doesn't have a name at this point.
But if that matters to the owner of X (perhaps because they want to refer to how the description document is licensed), then this minority (?) situation can be managed by having retrieval of X produce: X a foaf:Person ; eg:describedBy <http://example.org/cedric-description> . <http://example.org/cedric-description> eg:licensed cc-by . That places X in R\IR, and indicates a description document about which anything one wishes can be asserted. All the best, Norman [1] http://www.w3.org/2001/tag/doc/uddp-20120229/ -- Norman Gray : http://nxg.me.uk SUPA School of Physics and Astronomy, University of Glasgow, UK
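Norman's weakened rule can be sketched as a small decision procedure; the type set and labels below represent the client-side policy he describes, with names assumed for illustration:

```python
# Sketch: X goes into IR on a 200 unless the retrieved description places
# it in R\IR (here: by asserting a type the client treats as non-IR).

NON_IR_TYPES = {"foaf:Person"}  # the client's own policy for R\IR membership

def classify(status: int, asserted_types: set) -> str:
    """Classify a resource as 'IR', 'R\\IR', or 'unknown'."""
    if status != 200:
        return "unknown"
    if asserted_types & NON_IR_TYPES:
        return "R\\IR"  # the description overrides the 200 default
    return "IR"
```

So a URI that 200-returns only a foaf:name lands in IR, while one whose description adds 'a foaf:Person' moves to R\IR, matching the Cedric example.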
Re: Change Proposal for HttpRange-14
On 2012-03-24, at 00:47, Pat Hayes wrote: I am sympathetic, but... On Mar 23, 2012, at 9:59 AM, Dave Reynolds wrote: The proposal is that URI X denotes what the publisher of X says it denotes, whether it returns 200 or not. And what if the publisher simply does not say anything about what the URI denotes? After all, something like 99.999% of the URIs on the planet lack this information. What, if anything, can be concluded about what they denote? The http-range-14 rule provides an answer to this which seems reasonably intuitive. What would be your answer? Or do you think there should not be any 'default' rule in such cases? Exactly. For example, to take an arbitrary one of the trillions out there, what does http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108&pageno=11 identify, there being no RDF in it? What can I possibly do with that URI if the publisher has not explicitly allowed me to use it to refer to the online book, under your proposal? Pat
Re: Change Proposal for HttpRange-14
On 25 March 2012 20:26, Tim Berners-Lee ti...@w3.org wrote: On 2012-03-24, at 00:47, Pat Hayes wrote: I am sympathetic, but... On Mar 23, 2012, at 9:59 AM, Dave Reynolds wrote: The proposal is that URI X denotes what the publisher of X says it denotes, whether it returns 200 or not. And what if the publisher simply does not say anything about what the URI denotes? After all, something like 99.999% of the URIs on the planet lack this information. What, if anything, can be concluded about what they denote? The http-range-14 rule provides an answer to this which seems reasonably intuitive. What would be your answer? Or do you think there should not be any 'default' rule in such cases? Exactly. For example, to take an arbitrary one of the trillions out there, what does http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108&pageno=11 identify, there being no RDF in it? What can I possibly do with that URI if the publisher has not explicitly allowed me to use it to refer to the online book, under your proposal? Pat Just to follow up on this specific example with the current actual details: (aside: in my mailer I'm replying to TimBL but all the most recent text seems attributed to Pat; maybe some mangling occurred?)
I can't see a mechanical way to find this, but I happened to know about http://www.gutenberg.org/wiki/Gutenberg:Feeds#The_Project_Gutenberg_Catalog_in_RDF.2FXML_Format ...which guides us to http://www.gutenberg.org/ebooks/2701.rdf and via HTTP 302 from there to:

  <p>The document has moved <a href="http://www.gutenberg.org/cache/epub/2701/pg2701.rdf">here</a>.</p>

It uses xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/" and other vocabs to say, amongst other things:

  <pgterms:ebook rdf:about="ebooks/2701">
    <dcterms:creator rdf:resource="2009/agents/9"/>
    <dcterms:description>See also Etext #2489, Etext #15, and a computer-generated audio file, Etext #9147.</dcterms:description>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.epub.noimages"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.kindle.noimages"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.plucker"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.qioo"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.txt.utf8"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701-h.zip"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701-h/2701-h.htm"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701.txt"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701.zip"/>
    <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2001-07-01</dcterms:issued>
    <dcterms:language rdf:datatype="http://purl.org/dc/terms/RFC4646">en</dcterms:language>
    <dcterms:license rdf:resource="license"/>
    <dcterms:publisher>Project Gutenberg</dcterms:publisher>
    <dcterms:rights>Public domain in the USA.</dcterms:rights>
    <dcterms:subject>
      <rdf:Description>
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
        <rdf:value>Adventure stories</rdf:value>
        <rdf:value>Ahab, Captain (Fictitious character) -- Fiction</rdf:value>
        <rdf:value>Allegories</rdf:value>
        <rdf:value>Epic literature</rdf:value>
        <rdf:value>Sea stories</rdf:value>
        <rdf:value>Whales -- Fiction</rdf:value>
        <rdf:value>Whaling -- Fiction</rdf:value>
      </rdf:Description>
    </dcterms:subject>
  </pgterms:ebook>

  <pgterms:agent rdf:about="2009/agents/9">
    <pgterms:birthdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1819</pgterms:birthdate>
    <pgterms:deathdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1891</pgterms:deathdate>
    <pgterms:name>Melville, Herman</pgterms:name>
    <pgterms:webpage rdf:resource="http://en.wikipedia.org/wiki/Herman_Melville"/>
  </pgterms:agent>

I found this by finding the item number 2701 from inspection of the original link, and plugging it into the metadata template from their human-oriented documentation. The RDF I found makes assertions about various related URLs and things, but nothing that ties directly back to the initial URL. Worse, we have no evidence that the RDF doc and the other docs are in the same voice, from the same publisher or author, etc. Seems a great shame they went to the trouble of publishing quite a rich description of this fine work, and yet it's not easy to find by the machines that could make use of it. Dan
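Once found, the catalogue RDF is easy to mine with stock tooling. Here is a sketch using only Python's standard library against a trimmed fragment assumed to mirror the shape of pg2701.rdf (not a verbatim copy):

```python
# Sketch: pull the book -> author link out of Gutenberg-style RDF/XML.
import xml.etree.ElementTree as ET

# Trimmed, assumed fragment in the shape of pg2701.rdf.
RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/2701">
    <dcterms:creator rdf:resource="2009/agents/9"/>
    <dcterms:publisher>Project Gutenberg</dcterms:publisher>
  </pgterms:ebook>
  <pgterms:agent rdf:about="2009/agents/9">
    <pgterms:name>Melville, Herman</pgterms:name>
  </pgterms:agent>
</rdf:RDF>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF_NS,
      "dcterms": "http://purl.org/dc/terms/",
      "pgterms": "http://www.gutenberg.org/2009/pgterms/"}

root = ET.fromstring(RDF)
ebook = root.find("pgterms:ebook", NS)
creator_ref = ebook.find("dcterms:creator", NS).get(f"{{{RDF_NS}}}resource")

# Resolve the creator reference to the agent's name.
author = None
for agent in root.findall("pgterms:agent", NS):
    if agent.get(f"{{{RDF_NS}}}about") == creator_ref:
        author = agent.find("pgterms:name", NS).text
```

Dan's point stands, though: nothing in the file ties back to the /catalog/world/readfile URI, so a machine starting there has no mechanical path to this data.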
Re: Change Proposal for HttpRange-14
James, On 24 Mar 2012, at 00:38, James Leigh wrote: On Fri, 2012-03-23 at 21:42 +0000, Jeni Tennison wrote: The big thing that *is* different under this proposal is that if you have an HTML+RDFa 1.1 document like:

  <!DOCTYPE html>
  <html>
    <head>
      <base href="http://example.org/me/">
      <link rel="stylesheet" resource="style.css"/>
      <title>Me</title>
    </head>
    <body typeof="foaf:Person">
      <h1 property="foaf:name">James</h1>
    </body>
  </html>

returned with a 200 response from http://example.org/me then the application knows: * http://example.org/me is a Person * http://example.org/me's name is James and does not have a stray and inaccurate * http://example.org/me is an information resource hanging around, which was contrary to the publisher's intent. In the above example, the last statement is stray and inaccurate, but that is not always the case. Consider:

  <!DOCTYPE html>
  <html>
    <head>
      <base href="http://example.org/me/">
      <link rel="stylesheet" resource="style.css"/>
      <title>Me</title>
    </head>
    <body typeof="foaf:Document">
      <h1 property="dc:title">Me</h1>
    </body>
  </html>

Yes, absolutely. Interestingly, if someone does publish documents like that, they can easily ensure that the document is interpreted as an information resource by adding a describedby link (which does have built-in semantics in RDFa 1.1):

  <!DOCTYPE html>
  <html>
    <head>
      <base href="http://example.org/me/">
      <link rel="stylesheet" resource="style.css"/>
      <link rel="describedby" resource=""/>
      <title>Me</title>
    </head>
    <body typeof="foaf:Document">
      <h1 property="dc:title">Me</h1>
    </body>
  </html>

Anyway, I wonder how we might change the paragraph that you quoted to remove the implication that publishers can get away with one URI when they want to identify two things. Would this work better: where a URI is intended to identify a NIR but provides a 200 response, there remains no method of addressing the documentation that is returned by that 200 response (to assert its license, provenance etc); publishers still need to support a separate URI if they want to make statements about the documentation distinct from the NIR.
An updated set of best practices for linked data publishers would need to spell out what publishers should do and how consumers should combine the information provided within the response with that found at the end of any ‘describedby’ links. Good suggestion, but I don't think we can make the decision for all cases like this. I think we need to leave interpretation up to the agent, who perhaps knows more about the publisher's intents. If an agent is looking for foaf:Person, let it disregard the statement this is an information resource (no disagreements here). However, if an agent is looking specifically for information resources, let it use the URL (with 200 response) as the identifier of an IR, regardless of what it contains. How is 'let it disregard the statement that this is an information resource' the same as 'don't infer that it's an information resource'? I don't see how it makes sense to infer something that you later disregard. I would be happier with something like this: When a URI is served with a 200 response, agents may use the URI to address the IR that is returned by that 200 response, or use the URI to address a NIR described in the response (if a description exists). Publishers still need to support two distinct URIs if they want agents to have a more consistent interpretation. Can I just cast that into the language used by the rest of the proposal? What about: when documentation is served with a 200 response from a probe URI and does not contain a 'describedby' statement, some agents (including the publisher) might use it to identify the documentation and others a non-information resource. Publishers still need to provide support for two distinct URIs if they want to enable more consistent use of the URI. How does that sound? Thanks, Jeni -- Jeni Tennison http://www.jenitennison.com
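Jeni's proposed wording can be sketched as the set of readings an agent may adopt; the function and labels below are illustrative, not normative:

```python
# Sketch: possible interpretations of a probe URI under the proposed text.

def interpretations(status: int, has_describedby: bool) -> set:
    """Readings an agent may legitimately adopt for the probe URI."""
    if status == 303:
        # Classic httpRange-14: the URI names a non-information resource.
        return {"non-information resource"}
    if status == 200 and has_describedby:
        # The describedby link separates documentation from the thing.
        return {"documentation"}
    if status == 200:
        # No describedby: agents may diverge, as the proposal warns.
        return {"documentation", "non-information resource"}
    return set()
```

The last 200 branch is exactly why publishers who want consistent interpretation still need two distinct URIs.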
Re: Change Proposal for HttpRange-14
2012/3/23 Melvin Carvalho melvincarva...@gmail.com: 2012/3/23 Giovanni Tummarello giovanni.tummare...@deri.org 2012/3/23 Sergio Fernández sergio.fernan...@fundacionctic.org: Do you really think that basing your proposal on the usage of a POWDER annotation is a good idea? Sorry, but IMHO HttpRange-14 is a good enough agreement. Yup, performed brilliantly so far, nothing to say. Industry is flocking to adoption, and what a consensus. +1 'Brilliantly' is an understatement :) And we're probably still only towards the beginning of the adoption cycle! I don't think even the wildest optimist could have predicted the success of the current architecture (both pre and post HR14). Oh dear, so now I don't know any more if Gio was being sarcastic! Linked Data is a brilliant success, despite the burden of http-range-14. Is a SKOS Concept an Information Resource? Must its URIs 303 redirect? Is a # pointing into an RDFa page OK? We don't make this stuff easy. http-range-14 has long been an embarrassment. Just now all the critics get invited to try to do a better job, which isn't as easy as it looks :) Dan
Re: Change Proposal for HttpRange-14
Hi Jeni, Please can you clarify something for me? (I am not very good at reading these formal documents - a bear of little brain, perhaps.) Am I right in thinking that, under your Change Proposal, the following sort of thing becomes possible (I hope I am getting it right). Taking a site such as myexperiment.org (but it could very easily be the eprints software, BBC, or even dbpedia.) See http://www.myexperiment.org/workflows/16 A huge barrier to adoption of LD for them was that their users would be exposed to the intricacies of the different URIs, and in particular that if myexperiment.org moved over to using LD URIs completely, users would not be able to cut and paste them from the address bar etc. Great confusion would ensue, especially as their workflows already offered XML in addition to the HTML. This was a Bad Thing for them - their users were only just coming to terms with all this online workflow stuff, and could easily get spooked. They nearly didn't do it, but because many of their technology providers were Linked Data people, it went ahead (a few years ago now). The current outcome is what you see at the bottom of the workflow page - a panel offering the different URIs, with a link to a page describing the Linked Data world (to Chemists), which they are expected to understand. (Hash URIs might have been a bit better, but introduced a different mechanism from the XML.) As a result of your Change Proposal, it would have been acceptable (*if they wanted*), to simply add RDF as a Content Negotiation option, and deliver an RDF document with 200, in response to -H Accept:application/rdf+xml http://www.myexperiment.org/workflows/16, just as they did for XML, I think. And this would enable them to use http://www.myexperiment.org/workflows/16 as the anchor throughout the site (as they do) and have the same URI in the address bar, and in fact have http://www.myexperiment.org/workflows/16 as the only thing users see. Is that right? 
Apropos Doing It Wrong: It is interesting to note that I see myexperiment.org have made the practical decision to 303 to the RDF from curl -i -L -H Accept:application/rdf+xml http://www.myexperiment.org/workflows/16.html which suggests that they are already subverting things to get round some sort of problem. Few sites I can find (apart from dbpedia) actually return 406 when you ask the HTML URI for RDF: they usually return the HTML. It is a foolish agent that relies on RDF coming back from a 200 OK when it has asked for application/rdf+xml. Apropos Risk. You say there is no risk. Is this a risk?: There may be a serious increase in the number of URIs for current sites. Taking Freebase as another example. (In fact any of these sites that have worked hard to conform to the current regime will have a decision to make.) Presently, if I curl -i -L -H Accept:application/rdf+xml http://www.freebase.com/view/en/engelbert_humperdinck it gives me back HTML. What will it do in future? I know this Change Proposal is not proposing that they need to change, but will they? They already have http://rdf.freebase.com/ns/en.engelbert_humperdinck (and http://rdf.freebase.com/ns/m.047vj6 and another longer one). Effectively http://www.freebase.com/view/en/engelbert_humperdinck becomes yet another URI that people can use, since it would return RDF (as myexperiment). Obviously I am viewing this a bit from the sameAs.org viewpoint. I know that the resource in the RDF document will (should) never be the HTML URI, but people can and possibly will start passing around the HTML URI as if it was the proper URI, and so a sensible sameAs service would have it as a way of looking up the proper URIs. In fact I have sometimes toyed with the idea of allowing look up by HTML URL on sameAs.org (giving back only the real Linked Data URIs) - it is what a user expects from such a query, after all. (I hope all that makes sense.) 
My view of this potential Risk, however, is that it is a long-term risk of the way we are doing things now. If Linked Data is really successful, we will be at best in a myexperiment world. And so the sooner we make the change, the more manageable the Risk is. Best Hugh -- Hugh Glaser, Web and Internet Science Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ Work: +44 23 8059 3670, Fax: +44 23 8059 3045 Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652 http://www.ecs.soton.ac.uk/~hg/
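The Accept-header behaviour Hugh describes (most sites return the HTML rather than a 406 when asked for RDF at an HTML URI) can be sketched roughly as follows. The format table, function name, and paths are illustrative assumptions, reusing the myexperiment URLs from the thread; this is a sketch of the pattern, not any site's actual implementation.

```python
# Sketch of conneg at a single URL: pick a representation from the Accept
# header, and fall back to HTML rather than returning 406 when nothing
# matches (the common behaviour Hugh observes). All values are illustrative.

FORMATS = {
    "application/rdf+xml": "/workflows/16.rdf",
    "text/html": "/workflows/16.html",
    "application/xml": "/workflows/16.xml",
}

def negotiate(accept_header, default="text/html"):
    """Return (status, media type, internal representation) for a request
    to the single URL /workflows/16."""
    for media_type in accept_header.split(","):
        media_type = media_type.split(";")[0].strip()  # drop q-values
        if media_type in FORMATS:
            return 200, media_type, FORMATS[media_type]
    # A stricter server would return 406 here; most sites serve HTML instead,
    # which is why it is foolish to rely on RDF coming back with a 200.
    return 200, default, FORMATS[default]

print(negotiate("application/rdf+xml"))
print(negotiate("image/png"))  # no match: falls back to HTML, still 200
```

The fallback branch is exactly the risk Hugh points out: a 200 OK tells the agent nothing about whether it actually got the format it asked for.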
Re: Change Proposal for HttpRange-14
On 23 March 2012 14:33, Pat Hayes pha...@ihmc.us wrote: On Mar 23, 2012, at 8:52 AM, Jonathan A Rees wrote: I am a bit dismayed that nobody seems to be picking up on the point I've been hammering on (TimBL and others have also pointed it out), that, as shown by the Flickr and Jamendo examples, the real issue is not an IR/NIR type distinction, but rather a distinction in the *manner* in which a URI gets its meaning, via instantiation (of some generic IR) on the one hand, vs. description (of *any* resource, perhaps even an IR) on the other. The whole information-resource-as-type issue is a total red herring, perhaps the most destructive mistake made by the httpRange-14 resolution. +1000. There is no need for anyone to even talk about information resources. The important point about http-range-14, which unfortunately it itself does not make clear, is that the 200-level code is a signal that the URI *denotes* whatever it *accesses* via the HTTP internet architecture. We don't need to get into the metaphysics of HTTP in order to see that a book (say) can't be accessed by HTTP, so if you want to denote it (the book) with an IRI and stay in conformance with this rule, then you have to use something other than a 200-level response. Setting aside http://www.fastcompany.com/1754259/amazon-declares-the-e-book-era-has-arrived ('ebooks' will soon just be 'books', just as 'email' became 'mail'), and slipping into general opinion here that's not particularly directed at Pat. I assume you're emphasising the physical notion of book. Perhaps 'person' is even more obviously physical (though heavily tattoo'd people have some commonalities with books). The Web architecture that I first learned, was explained to me (HTTP-NG WG era) in terms familiar from the Object Oriented style of thinking about computing (and a minor religion at the time too). The idea is that the Web interface is a kind of encapsulation. 
External parties don't get direct access to the insides; it's always mediated by HTTP GET and other requests. Just as in Java, you can expose an object's data internals directly, or you can hide them behind getters and setters; same with Web content. So a Web site might encapsulate a coffee machine, teapot or toaster; a CSV file, SGML repository, perl script or whatever. That pattern allowed the Web to get very big, very fast; you could wrap it around anything. In http://www.w3.org/TR/WD-HTTP-NG-interfaces/ we see a variant on this view described, in which the hidden innards of a Web object are constrained to be 'data'. When we think of the Web today, the idea of a 'resource' comes to mind. In general, a resource is an Object that has some methods (e.g. in HTTP, Get Head and Post) that can be invoked on it. Objects may be stateful in that they have some sort of opaque 'native data' that influences their behavior. The nature of this native data is unknown to the outside, unless the object explicitly makes it known somehow. (note, this is from the failed HTTP-NG initiative, not the HTTP/webarch we currently enjoy) So on this thinking, Dan's homepage is an item of Web content that is encapsulated inside the standard Web interface. It has http-based getters and (potentially) setters, so you can ask for the default bytestream rendering of it, or perhaps content-negotiate with a different getter and get a PDF, or a version in another language. But on this OO-style of thinking about Web content, you *never get the thing itself*. Only (possibly lossy, possibly on-the-fly generated) serializations of it. The notion of 'serialization' (also familiar to many coders) doesn't get used much in discussing http-range-14, yet it seems to be very close to our concerns here. Perhaps all the different public serializations of my homepage are so rich that they constitute full (potentially round-trippable) serializations of the secret internal state. 
Or perhaps they're all lossy, because enough internals are never actually sent out over the wire. The Web design (as I understand/understood it) means that you'll never 100% know what's on the inside. My homepage might be generated by 1000 typing monkeys; or by pulling zeros and ones from the filesystem, or composed from a bunch of SQL database lookups. It might be generated by different methods in 2010 than in 2012; or from minute to minute. All of this is my private webmasterly business: as far as the rest of the world is concerned, it's all the same thing, ... my homepage. I can move the internals from filesystem-based to wordpress to mediawiki, and from provider to provider. I can choose to serve US IP addresses from a mediawiki in Boston, and Japanese IP addresses from a customised MoinMoin wiki in Tokyo. Why? That's my business! But it's still my homepage. And you - the outside world - don't get to know how it's made. On that thinking, it might be sometimes useful to have clues as to whether sufficient of the secret internals of some Web page could be fully
Re: Change Proposal for HttpRange-14
On 3/24/12 6:28 AM, Dan Brickley wrote: I don't believe we'll ever come up with a clear distinction between 'description' and 'representation', such that we can say look, Dan's homepage, you get a proper representation of it across the wire, ... whereas a physical book, you're merely getting a description. BTW - great post! Re. the above, how about seeing a 'description' as a kind of 'representation'. Thus, you can have 'representations' of web pages which are Web medium artifacts and 'descriptions' for everything else that isn't a Web medium artifact. Likewise, we can even get folks to accept/understand that 'definitions' are a kind of 'description' too. Underutilized predicates such as: wdrs:describedby and rdfs:isDefinedBy, capture the semantics of the statement above. Ditto partitioning of relations across the TBox and ABox in knowledge management. Links: 1. http://www.differencebetween.com/difference-between-description-and-vs-definition/ -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile: https://plus.google.com/112399767740508618350/about LinkedIn Profile: http://www.linkedin.com/in/kidehen
Re: Change Proposal for HttpRange-14
Hi Hugh, On 24 Mar 2012, at 10:02, Hugh Glaser wrote: Please can you clarify something for me? (I am not very good at reading these formal documents - a bear of little brain, perhaps.) I will try my best. Am I right in thinking that, under your Change Proposal, the following sort of thing becomes possible (I hope I am getting it right). Taking a site such as myexperiment.org (but it could very easily be the eprints software, BBC, or even dbpedia.) See http://www.myexperiment.org/workflows/16 A huge barrier to adoption of LD for them was that their users would be exposed to the intricacies of the different URIs, and in particular that if myexperiment.org moved over to using LD URIs completely, users would not be able to cut and paste them from the address bar etc. Great confusion would ensue, especially as their workflows already offered XML in addition to the HTML. Right. This was a Bad Thing for them - their users were only just coming to terms with all this online workflow stuff, and could easily get spooked. They nearly didn't do it, but because many of their technology providers were Linked Data people, it went ahead (a few years ago now). The current outcome is what you see at the bottom of the workflow page - a panel offering the different URIs, with a link to a page describing the Linked Data world (to Chemists), which they are expected to understand. (Hash URIs might have been a bit better, but introduced a different mechanism from the XML.) Yep. As a result of your Change Proposal, it would have been acceptable (*if they wanted*), to simply add RDF as a Content Negotiation option, and deliver an RDF document with 200, in response to -H Accept:application/rdf+xml http://www.myexperiment.org/workflows/16, just as they did for XML, I think. 
And this would enable them to use http://www.myexperiment.org/workflows/16 as the anchor throughout the site (as they do) and have the same URI in the address bar, and in fact have http://www.myexperiment.org/workflows/16 as the only thing users see. Is that right? Yes. They could have used http://www.myexperiment.org/workflows/16 throughout the site, had it respond with a 200 based on conneg with either HTML or RDF as required. It wouldn't have taken a linked data expert to figure out that if they wanted to refer to the workflow they had to copy and paste from the box at the bottom of the HTML page rather than the location bar at the top from which you usually copy and paste URIs. They could also (as they are doing) have had separate URIs for the individual formats like: http://www.myexperiment.org/workflows/16.html http://www.myexperiment.org/workflows/16.rdf http://www.myexperiment.org/workflows/16.xml They could have included within the RDF that you got from http://www.myexperiment.org/workflows/16 statements of the form: http://www.myexperiment.org/workflows/16 wdrs:describedby http://www.myexperiment.org/workflows/16.html ; wdrs:describedby http://www.myexperiment.org/workflows/16.rdf ; wdrs:describedby http://www.myexperiment.org/workflows/16.xml ; . This would have enabled them to make separate statements about the licensing and provenance of the information held in those documents. If they didn't want to make those kinds of statements or enable those formats to be individually addressable, they could have just supported the http://www.myexperiment.org/workflows/16 URL and used conneg. Apropos Doing It Wrong: It is interesting to note that I see myexperiment.org have made the practical decision to 303 to the RDF from curl -i -L -H Accept:application/rdf+xml http://www.myexperiment.org/workflows/16.html which suggests that they are already subverting things to get round some sort of problem. 
It looks as though it's: http://www.myexperiment.org/workflows/16.html - 301 - http://www.myexperiment.org/workflows/16 - 303 - http://www.myexperiment.org/workflows/16.rdf - 200 Technically I think, per http://www.w3.org/2001/tag/doc/uddp/#idp439264 this should mean that you can infer http://www.myexperiment.org/workflows/16.html sameAs http://www.myexperiment.org/workflows/16 but I'm not 100% sure what's intended (I think this needs spelling out). Few sites I can find (apart from dbpedia) actually return 406 when you ask the HTML URI for RDF: they usually return the HTML. It is a foolish agent that relies on RDF coming back from a 200 OK when it has asked for application/rdf+xml. Yes. Apropos Risk. You say there is no risk. Is this a risk?: There may be a serious increase in the number of URIs for current sites. Taking Freebase as another example. (In fact any of these sites that have worked hard to conform to the current regime will have a decision to make.) Presently, if I curl -i -L -H Accept:application/rdf+xml http://www.freebase.com/view/en/engelbert_humperdinck it gives me back HTML. What will it do in future? I
Re: Change Proposal for HttpRange-14
Many thanks. I'm pleased I already put my name on the list then :-) And for me some useful fleshing out of how things would/will work. No further comments inline. Best Hugh On 24 Mar 2012, at 11:17, Jeni Tennison wrote: [...]
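The 301/303 redirect chain Jeni traces for the myexperiment URIs can be simulated with a small sketch. The static CHAIN table stands in for live HTTP responses and is an assumption based on the statuses reported in the thread; it is not a claim about how the site behaves today.

```python
# Sketch of the observed redirect chain:
#   /workflows/16.html --301--> /workflows/16 --303--> /workflows/16.rdf --200
# A static table replaces live HTTP requests so the chain logic is visible.

CHAIN = {
    "/workflows/16.html": (301, "/workflows/16"),
    "/workflows/16":      (303, "/workflows/16.rdf"),
    "/workflows/16.rdf":  (200, None),
}

def follow(url, max_hops=10):
    """Follow 3xx responses, returning the list of (url, status) hops."""
    hops = []
    for _ in range(max_hops):
        status, location = CHAIN[url]
        hops.append((url, status))
        if location is None:
            return hops
        url = location
    raise RuntimeError("too many redirects")

for url, status in follow("/workflows/16.html"):
    print(status, url)
```

Under httpRange-14, each hop carries a different meaning: the 301 says the .html URI has permanently moved, while the 303 deliberately avoids asserting that the generic URI denotes the RDF document it redirects to.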
Re: Change Proposal for HttpRange-14
On Sat, 2012-03-24 at 08:11 +, Jeni Tennison wrote: Can I just cast that into the language used by the rest of the proposal? What about: when documentation is served with a 200 response from a probe URI and does not contain a 'describedby' statement, some agents (including the publisher) might use it to identify the documentation and others a non-information resource. Publishers still need to provide support for two distinct URIs if they want to enable more consistent use of the URI. How does that sound? I'd buy into that. Regards, James
Re: Change Proposal for HttpRange-14
On Fri, Mar 23, 2012 at 9:02 PM, Jeni Tennison j...@jenitennison.com wrote: Can you provide a handful of these Doing It Wrong URIs please from various sites? I think it would really be helpful to have them on hand during discussions. OK. These were picked up from dumps made available by webdatacommons.org, so I'm very grateful to them for making that available; it can be quite hard to locate this kind of markup generally. Also I've used Gregg's distiller [1] to extract the RDFa out of the documents to double-check. ... Thanks much Jeni, this is very helpful. Jonathan