subject:"Change Proposal for HttpRange\-14"


On 4/1/12 4:35 AM, Michael Brunnbauer wrote:

hi all

On Sat, Mar 31, 2012 at 05:53:03PM +0200, Michael Brunnbauer wrote:

maybe I made an error by assuming that the term IR is inherent in the term
representation - by assuming that a NIR cannot have a representation, only
descriptions ?

No. The whole point about the use of the term IR in HR14 seems to be to say:

Everything that has a representation has a representation that conveys it's
essential characteristics.

Is this important ? If yes, should we write it this way ?

Regards,

Michael Brunnbauer



Aren't we somehow losing the fundamental fact that all resources on the 
Web are supposed to be bear self-describing content, constrained by mime 
type. That when all is said and done, irrespective of mime type, all Web 
resources are Information Resources.


The above gels nicely with the fact that all content bears 
representation of something the provides information to appropriate 
systems, courtesy of the mime type component of this content+mime-type 
composite.


The content of a basic HTML web page, an RDF document, OWL and RDFs 
documents all bear content that deliver information. Of course, within 
specific system realms such as RDF, Linked Data, and the Semantic Web, 
the content represents a more specific kind of information in the form 
of descriptions and definitions --  at least, in the eyes of systems 
(clients and servers) for said realms.



--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14


On 4/1/12 11:42 AM, Kingsley Idehen wrote:

On 4/1/12 4:35 AM, Michael Brunnbauer wrote:

hi all

On Sat, Mar 31, 2012 at 05:53:03PM +0200, Michael Brunnbauer wrote:
maybe I made an error by assuming that the term IR is inherent in 
the term
representation - by assuming that a NIR cannot have a 
representation, only

descriptions ?
No. The whole point about the use of the term IR in HR14 seems to be 
to say:


Everything that has a representation has a representation that 
conveys it's

essential characteristics.

Is this important ? If yes, should we write it this way ?

Regards,

Michael Brunnbauer



Aren't we somehow losing the fundamental fact that all resources on 
the Web are supposed to be bear self-describing content, constrained 
by mime type. That when all is said and done, irrespective of mime 
type, all Web resources are Information Resources.


The above gels nicely with the fact that all content bears 
representation of something the provides information to appropriate 
systems, courtesy of the mime type component of this content+mime-type 
composite.


The content of a basic HTML web page, an RDF document, OWL and RDFs 
documents all bear content that deliver information. Of course, within 
specific system realms such as RDF, Linked Data, and the Semantic Web, 
the content represents a more specific kind of information in the form 
of descriptions and definitions --  at least, in the eyes of systems 
(clients and servers) for said realms.





In the post above, I forgot to add this link for reference:  
http://www.w3.org/2001/tag/doc/selfDescribingDocuments.html .


--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-04-01 Thread David Booth

On Sun, 2012-04-01 at 10:35 +0200, Michael Brunnbauer wrote:
 The whole point about the use of the term IR in HR14 seems to be to say:
 
 Everything that has a representation has a representation that conveys it's
 essential characteristics. 
 
 Is this important ? If yes, should we write it this way ?

FYI, Jonathan Rees has written up a very nice formalization of the
relationship between an information resource and a representation -- or
in his parlance an instance and a generic information entity -- in
terms of what it means to make metadata statements about them:
http://www.w3.org/2001/tag/awwsw/ir/latest/



-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-04-01 Thread Michael Brunnbauer


Hello Kingsley,

 Everything that has a representation has a representation that conveys it's
 essential characteristics.
[...]
 Aren't we somehow losing the fundamental fact that all resources on the 
 Web are supposed to be bear self-describing content, constrained by mime 
 type. That when all is said and done, irrespective of mime type, all Web 
 resources are Information Resources.

Your last sentence is what my sentence above says. In 
http://www.w3.org/2001/tag/doc/uddp-20120229/, an IR is defined as something
that has a representation that conveys the essential characteristics of it.

We can get rid of the term information resource by putting something like the 
above statement about representations in the papers - if I am not the only
one who thinks that things are easier to understand this way :-)

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14


On 4/1/12 12:31 PM, Michael Brunnbauer wrote:

Hello Kingsley,


Everything that has a representation has a representation that conveys it's
essential characteristics.

[...]

Aren't we somehow losing the fundamental fact that all resources on the
Web are supposed to be bear self-describing content, constrained by mime
type. That when all is said and done, irrespective of mime type, all Web
resources are Information Resources.

Your last sentence is what my sentence above says. In
http://www.w3.org/2001/tag/doc/uddp-20120229/, an IR is defined as something
that has a representation that conveys the essential characteristics of it.

We can get rid of the term information resource by putting something like the
above statement about representations in the papers - if I am not the only
one who thinks that things are easier to understand this way :-)


Yes, but in the context of RDF a triple can be seen as conveying 
information about the referent of a URI. This information can take the 
form of a description or a more specific definition.


Resources always bear 'information' the question ultimately boils down 
to what kind of information, subject to the Web system (dimension or 
aspect) in question :-)


Regards,

Michael Brunnbauer




--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-04-01 Thread David Booth

On Sat, 2012-03-31 at 11:32 -0400, Jonathan A Rees wrote:
[ . . . ]
 So this is something we already knew from the HTTP spec, which all of
 us pretty much agree to; 

We all agree to it as a *protocol* specification -- not as a *semantics*
specification.

[ . . . ]
 On the other hand the specs are all terribly murky, [ . . . ]

They're only murky if you are trying to interpret them as defining a
global semantics for the web, which is not what they were intended to
do.


-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14


On 4/1/12 9:42 PM, David Booth wrote:

On Sat, 2012-03-31 at 11:32 -0400, Jonathan A Rees wrote:
[ . . . ]

So this is something we already knew from the HTTP spec, which all of
us pretty much agree to;

We all agree to it as a *protocol* specification -- not as a *semantics*
specification.

[ . . . ]

On the other hand the specs are all terribly murky, [ . . . ]

They're only murky if you are trying to interpret them as defining a
global semantics for the web, which is not what they were intended to
do.



+1

--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-31 Thread Michael Brunnbauer


hi all

The document at http://www.w3.org/2001/tag/doc/uddp-20120229/ uses the
term X (a sequence of octets + media type) is a representation of Y (an entity).

I have a question: Can two different entities have the same representation ?

If not, we can define an IR as a thing for which there is at least one
sequence of octets + media type that is a representation of it because it's 
essential characteristics would be conveyed in that message. The term IR 
would not have much value in this case as it would not be a term of it's own.

If yes, I could have a lossy compression algorithm that makes the same
sequence of octets out of two different images and still have a representation
of those images when I GET them.

I think all the difficult questions (or the nitpicking, if you want) do not 
go away if we drop the term IR. They also lie in the question does this
URI denote what it accesses or is this message a representation of the
entity or do I serve the content of this sucker.

Regards,

Michael Brunnbauer

On Wed, Mar 28, 2012 at 11:35:04PM +0200, Michael Brunnbauer wrote:
 
 Hallo Norman,
 
  -Regardless of how you define IR, everything that denotes what it accesses
   should lie in IR.
  
  -Putting something in NIR therefor also answers the question if it denotes
   what it accesses with no by entailment.
 
 I have worded this very badly. We are talking about things and names of 
 things.
 This should be:
 
 For all URIs U: denote(U) = access(U) - denote(U) a IR
 
 It follows: For all URIs U: denote(U) not a IR - denote(U) != access(U)
 
  -There may or may not be IRs that do not denote what they access.
 
 And this should be:
 
 There is a URI U where: denote(U) a IR and denote(U) != access(U).
 
 Now if a am allowed to mint a URI that 303's to your homepage and your
 homepage is an IR, such an URI must exist:
 
 U1 = Your URI for your homepage
 U2 = My URI for your homepage
 denote(U1) a IR
 denote(U2) != access(U2)
 denote(U1) = denote(U2) thererfor denote(U2) a IR and denote(U2) != access(U2)
 
 I think I'll stay out of this discussion from now :-)
 
 Regards,
 
 Michael Brunnbauer
 
 
 -- 
 ++  Michael Brunnbauer
 ++  netEstate GmbH
 ++  Geisenhausener Straße 11a
 ++  81379 München
 ++  Tel +49 89 32 19 77 80
 ++  Fax +49 89 32 19 77 89 
 ++  E-Mail bru...@netestate.de
 ++  http://www.netestate.de/
 ++
 ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
 ++  USt-IdNr. DE221033342
 ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
 ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-31 Thread Jonathan A Rees

On Sat, Mar 31, 2012 at 8:05 AM, Michael Brunnbauer bru...@netestate.de wrote:

 hi all

 The document at http://www.w3.org/2001/tag/doc/uddp-20120229/ uses the
 term X (a sequence of octets + media type) is a representation of Y (an 
 entity).

 I have a question: Can two different entities have the same representation ?

I've never heard anything say anything that would rule this out.

Well, on second thought, in conversation I've heard people give a
theory that representations being on the wire, which would make them
events, which occur in space and time, thus cannot happen twice. But
this idea does not follow from 2616 or 3986 or AWWW and does not have
anything like consensus.

I always thought that Content-location: suggested this situation,
where you have two URIs and two resources, and a representation that
is of both, where the first resource might have *additional*
representations, and the second doesn't. This seems tidy to me, but
it's just my theory.

 If not, we can define an IR as a thing for which there is at least one
 sequence of octets + media type that is a representation of it because it's
 essential characteristics would be conveyed in that message. The term IR
 would not have much value in this case as it would not be a term of it's own.

That is: an IR is something that has a representation. I think this
has been suggested several times. Unfortunately information resource
has a definition in AWWW and I don't see the merit in redefining the
term rather than introducing a new term.

However I believe I have heard this suggestion, or something like it,
before, from several sources, so it's not completely out of the
question. It would be nice in a way because it would make HR14a
completely vacuous. This is what I call opt in because you wouldn't
be able to assume that what you GET is content (Tim's word, my
instance).

 If yes, I could have a lossy compression algorithm that makes the same
 sequence of octets out of two different images and still have a representation
 of those images when I GET them.

 I think all the difficult questions (or the nitpicking, if you want) do not
 go away if we drop the term IR. They also lie in the question does this
 URI denote what it accesses or is this message a representation of the
 entity or do I serve the content of this sucker.

I'm glad you say this. I agree that the when is X content of Y
question remains, although to me it's not such a difficult question
(it remains for me to convince others of this).

Best
Jonathan

 Regards,

 Michael Brunnbauer

 On Wed, Mar 28, 2012 at 11:35:04PM +0200, Michael Brunnbauer wrote:

 Hallo Norman,

  -Regardless of how you define IR, everything that denotes what it accesses
   should lie in IR.
 
  -Putting something in NIR therefor also answers the question if it denotes
   what it accesses with no by entailment.

 I have worded this very badly. We are talking about things and names of 
 things.
 This should be:

 For all URIs U: denote(U) = access(U) - denote(U) a IR

 It follows: For all URIs U: denote(U) not a IR - denote(U) != access(U)

  -There may or may not be IRs that do not denote what they access.

 And this should be:

 There is a URI U where: denote(U) a IR and denote(U) != access(U).

 Now if a am allowed to mint a URI that 303's to your homepage and your
 homepage is an IR, such an URI must exist:

 U1 = Your URI for your homepage
 U2 = My URI for your homepage
 denote(U1) a IR
 denote(U2) != access(U2)
 denote(U1) = denote(U2) thererfor denote(U2) a IR and denote(U2) != 
 access(U2)

 I think I'll stay out of this discussion from now :-)

 Regards,

 Michael Brunnbauer


 --
 ++  Michael Brunnbauer
 ++  netEstate GmbH
 ++  Geisenhausener Straße 11a
 ++  81379 München
 ++  Tel +49 89 32 19 77 80
 ++  Fax +49 89 32 19 77 89
 ++  E-Mail bru...@netestate.de
 ++  http://www.netestate.de/
 ++
 ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
 ++  USt-IdNr. DE221033342
 ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
 ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

 --
 ++  Michael Brunnbauer
 ++  netEstate GmbH
 ++  Geisenhausener Straße 11a
 ++  81379 München
 ++  Tel +49 89 32 19 77 80
 ++  Fax +49 89 32 19 77 89
 ++  E-Mail bru...@netestate.de
 ++  http://www.netestate.de/
 ++
 ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
 ++  USt-IdNr. DE221033342
 ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
 ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-31 Thread Michael Brunnbauer


Hello Jonathan,

maybe I made an error by assuming that the term IR is inherent in the term 
representation - by assuming that a NIR cannot have a representation, only 
descriptions ?

But if a a NIR cannot have a representation and two different IRs cannot
have the same representation, then getting a representation of an IR
is as close as I can get to it.

Regards,

Michael Brunnbauer

On Sat, Mar 31, 2012 at 11:32:37AM -0400, Jonathan A Rees wrote:
 On Sat, Mar 31, 2012 at 11:13 AM, Michael Brunnbauer
 bru...@netestate.de wrote:
 
  Hallo Jonathan
 
  [off list. If you think your answer will be helpful to others, put it back 
  on
  the list]
 
  On Sat, Mar 31, 2012 at 10:54:09AM -0400, Jonathan A Rees wrote:
  That is: an IR is something that has a representation.
  [...]
  It would be nice in a way because it would make HR14a
  completely vacuous. This is what I call opt in because you wouldn't
  be able to assume that what you GET is content (Tim's word, my
  instance).
 
  Why would this definition make HR14 vacuous ? I would say that the rule
  from a statuscode 200, you can infer that you got a representation of what
  the URI denotes can be made with or without that definition.
 
 What I mean by vacuous is that RFC 2616 (certainly HTTPbis) already
 says - in my reading at least - that the retrieved representation is a
 representation of the resource identified by the URI (or at least that
 the server is *saying* so, i.e. it is nominally so, which is usually
 good enough).
 
 So this is something we already knew from the HTTP spec, which all of
 us pretty much agree to; neither the TAG nor anyone else would have to
 say that this is the case in any pronouncement resembling
 httpRange-14(a).
 
 Maybe vacuous was a poor choice of word.
 
 On the other hand the specs are all terribly murky, so maybe it would
 be good to repeat this somewhere.
 
 In any case information resource as used in HR14a is well connected
 to AWWW and I think redefining the term, no matter how bad the
 definition, would just confuse things. You could say HTTP resource
 or something for resources that have representations (what would be an
 example of one that doesn't?). My opinion.
 
 Best
 Jonathan
 
  Regards,
 
  Michael Brunnbauer
 
  --
  ++  Michael Brunnbauer
  ++  netEstate GmbH
  ++  Geisenhausener Straße 11a
  ++  81379 München
  ++  Tel +49 89 32 19 77 80
  ++  Fax +49 89 32 19 77 89
  ++  E-Mail bru...@netestate.de
  ++  http://www.netestate.de/
  ++
  ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
  ++  USt-IdNr. DE221033342
  ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
  ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-31 Thread Kingsley Idehen


On 3/31/12 11:32 AM, Jonathan A Rees wrote:

In any case information resource as used in HR14a is well connected
to AWWW and I think redefining the term, no matter how bad the
definition, would just confuse things. You could say HTTP resource
or something for resources that have representations (what would be an
example of one that doesn't?). My opinion.
Information Resource isn't the problem. Its the Non Information Resource 
(NIR) that's the problem. In the Linked Data realm we have  'Descriptor 
Resources' that bear higher fidelity structured content which are still 
ultimately constrained by mime type.


An illustration:

Information Space dimension
  | -- isA -- Web dimension
  |
  |-- isA -- Web of Information Resources (e.g. an HTML page 
modulo Microdata or RDFa data islands)

  |
Data Space dimension
  | -- isA -- Web dimension
  |
  |-- isA -- Web of Descriptor Resources (e.g. RDF documents where 
content is RDF/XML, N-Triples, Turtle, HTML+Microdata, (X)HTML+RDFa etc..)


All of the resource types above are self-describing, courtesy of mime 
type constrained content.


Excerpt from TimBL's  Web FAQ [1]:

Q: What did you have in mind when you first developed the Web?

From A Short Personal History of the Web:

A: The dream behind the Web is of a common information space in which we 
communicate by sharing information. Its universality is essential: the 
fact that a hypertext link can point to anything, be it personal, local 
or global, be it draft or highly polished. There was a second part of 
the dream, too, dependent on the Web being so generally used that it 
became a realistic mirror (or in fact the primary embodiment) of the 
ways in which we work and play and socialize. That was that once the 
state of our interactions was on line, we could then use computers to 
help us analyze it, make sense of what we are doing, where we 
individually fit in, and how we can better work together.


Bearing in mind the above, it should aid understanding why Linked Data 
is about the Web's Data Space dimension. Remember, Data != Information. 
When you put data in context you get information. A protocol for 
accessing data combined with a model for data representation are 
critical components for providing context for data, en route to 
producing information.



Links:

1. http://www.w3.org/People/Berners-Lee/FAQ.html -- TimBL FAQ re. Web.

2. http://tools.ietf.org/html/draft-hammer-discovery-06 -- some context 
for descriptor resources which also demonstrates how this term provides 
a conduit to others that are less interested in RDF content formats 
while still interested in Web scale structured and linked data .


3. http://goo.gl/BBsIz -- Three main types of Object Descriptors 
(remember: the Web is really a contemporary and widely successful 
Distributed Object system) .


--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-31 Thread Michael Brunnbauer


Hello Jonathan,

On Sat, Mar 31, 2012 at 10:54:09AM -0400, Jonathan A Rees wrote:
  I have a question: Can two different entities have the same representation ?
 I've never heard anything say anything that would rule this out.

Hmm... so from a 200 statuscode and HR14, I can conclude that I have
a representation of it, that is is an IR and therefor has a representation
that conveys the essential characteristics of it (definition of IR at
http://www.w3.org/2001/tag/doc/uddp-20120229/), but not that the 
representation I got actually is a representation that conveys the essential 
characteristics of it ?

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-30 Thread Jonathan A Rees

On Thu, Mar 29, 2012 at 10:20 PM, David Booth da...@dbooth.org wrote:
 On Thu, 2012-03-29 at 20:51 -0400, Jonathan A Rees wrote:
 On Tue, Mar 27, 2012 at 6:01 PM, Jeni Tennison j...@jenitennison.com wrote:
 [ . . . ] But then we would also have to define what 'content' and
 'description' meant. I have a feeling that might prove just as
 slippery and ultimately unhelpful as 'information resource'.

 Agreed.  As long as there's an attempt to define a difference between
 the two, we'll be mired in the same impossible

 I disagree. I've been able to reverse engineer a semantics [1] for
 'content' that matches the original RDF design (for metadata, [2]) and
 what I think was *intended* by httpRange-14(a). The 'information
 resource' definition is just really unactionable; perhaps reparable
 but I don't think repairing it would help much since that's not even
 the issue.

 [1] http://www.w3.org/2001/tag/awwsw/ir/latest/
 [2] http://www.w3.org/TR/WD-rdf-syntax-971002/

 A semantics for 'content'?  That's not at all what I read in [1].  Did
 you mean to reference some other document?  I think [1] describes an
 excellent way to formalize what it means to write an assertion about an
 information resource (though it's called a generic information entity
 in that document instead of information resource).  But it only uses
 the term 'content' three times in the body, and only in passing.  And it
 *never* defines the term.  In what sense do you think it defines a
 semantics for 'content'?

Tim's 'content' ~= my 'instance'

I thought that was clear from the way I've been saying content /
instance and content (instance) and instance (content) in my
emails

I've started using Tim's word since people listen more closely to him
than they do to me.

Jonathan

Re: Change Proposal for HttpRange-14

Hi Michael,

On 27 March 2012 16:17, Michael Smethurst michael.smethu...@bbc.co.uk wrote:

 On 26/03/2012 17:13, Tom Heath tom.he...@talis.com wrote:

 Hi Jeni,

 On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:
 Tom,

 On 26 Mar 2012, at 16:05, Tom Heath wrote:
 On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 I'm sure many people are just deeply bored of this discussion.

 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!

 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.

 No data here I fear; merely anecdote. But anecdote is usually the best form
 of data :-)

I guess this is where we'll have to differ :)

Of all people you guys at the BBC have great anecdotes, and clearly
personally you have heaps of opinions about some of the big thorny
issues in Linked Data deployment and usage, formed from first hand
experience.

I'm not saying I agree or disagree with any of the specifics, I'm just
making a plea for us to raise the level of analysis to a point where
we have some more robust evidence from which to draw conclusions. I'll
do what I can to contribute, but I think we all need to pitch in and
produce this evidence if the discussion and conclusions are going to
be credible. Anecdotes and opinion only get us so far.

Cheers,

Tom.


 What hard data do you think would resolve (or if not resolve, at least move
 forward) the argument? Some people  are contributing their own experience
 from building systems, but perhaps that's too anecdotal? Would a
 structured survey be helpful? Or do you think we might be able to pick up
 trends from the webdatacommons.org  (or similar) data?

 A few things come to mind:

 1) a rigorous assessment of how difficult people *really* find it to
 understand distinctions such as things vs documents about things.
 I've heard many people claim that they've failed to explain this (or
 similar) successfully to developers/adopters; my personal experience
 is that everyone gets it, it's no big deal (and IRs/NIRs would
 probably never enter into the discussion).

 I think it's explainable. I don't think it's self evident

 And explanation can be tricky because:

 a) once you get past the obvious cases (a person and their homepage) there
 are further levels of abstraction that make things complicated. A journalist
 submits a report to a news agency, a sub-editor tweaks it and puts it on the
 wires, a news publisher picks up the report, a journalist shapes an article
 around it, another sub-editor tweaks that, the article gets published, the
 article gets syndicated. Which document is the rdf making claims (created
 by, created at) about? And is that the important / interesting thing? You
 quickly head down a frbr shaped rabbit hole

 b) The way people make and use websites (outside the whole linked data
 thing) has moved on. Many people don't just publish pages; they publish
 pages that have a one-to-one correspondence with real world things. A page
 per photo or programme or species or recipe or person. They're already in
 the realm of thinking about things before pages and to them the page and
 it's url is a good enough approximation for description

 c) people using the web are already thinking about things not pages. If you
 search google for Obama your mental model is of the person, not any
 resulting pages

 d) we already have the resource / representation split which is quite enough
 abstraction for some people

 e) the list of things you might want to say about a document is finite; the
 list of things you might want to say about the world isn't

 2) hard data about the 303 redirect penalty, from a consumer and
 publisher side. Lots of claims get made about this but I've never seen
 hard evidence of the cost of this; it may be trivial, we don't know in
 any reliable way. I've been considering writing a paper on this for
 the ISWC2012 Experiments and Evaluation track, but am short on spare
 time. If anyone wants to join me please shout.

 I know publishers whose platform is so constrained they can't even edit the
 head section of their html documents. They certainly don't have access at
 the server level

 Even where 303s are technically possible they might not be politically
 possible. Technically we could have easily created bbc.co.uk/things/:blah
 and made it 303 but that would have involved setting up /things and that's a
 *very* difficult conversation with management and ops

 And if it's technically and politically possible it really depends on how
 the 303 is set up. Lots of linked data people seem to conflate the 303 and
 content negotiation. So I ask for something that can't be sent,

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-30 Thread Pat Hayes


On Mar 27, 2012, at 6:59 AM, Danny Ayers wrote:

 This seems an appropriate place for me to drop in my 2 cents.
 
 I like the 303 trick. People that care about this stuff can use it
 (and appear to be doing so), but it doesn't really matter too much
 that people that don't care don't use it. It seems analogous to the
 question of HTML validity. Best practices suggest creating valid
 markup, but if it isn't perfect, it's not a big deal, most UAs will be
 able to make sense of it. There will be reduced fidelity of
 communication, sure, but there will be imperfections in the system
 whatever, so any trust/provenance chain will have to consider such
 issues anyway.
 So I don't really think Jeni's proposal is necessary, but don't feel
 particularly strongly one way or the other.
 
 Philosophically I reckon the flexibility of what a representation of a
 resource can be means that the notion of an IR isn't really needed.
 I've said this before in another thread somewhere, but if the network
 supported the media type thing/dog then it would be possible to GET
 http://example.org/Basil with full fidelity. Right now it doesn't, but
 I'd argue that what you could get with media type image/png would
 still be a valid, if seriously incomplete representation of my dog. In
 other words, a description of a thing shares characteristics with the
 thing itself, and that's near enough for HTTP representation purposes.

It might be for HTTP, but not for RDF (and up) representational purposes. And 
as this entire brouhaha only arose when people started worrying about semantics 
at the RDF level (and up), this is not a particularly helpful remark. 

The basic mistake you (and others) are making is to conflate reference with 
similarity. A description of a thing shares NO characteristics with the thing 
it describes. Describing is not being-somewhat-similar-to. For an early (1726) 
but still insightful explanation of what is wrong with this idea, see 
http://4umi.com/swift/gulliver/laputa/5 :

We next went to the School of Languages, where three Professors sate in 
Consultation upon improving that of their own country.
The first Project was to shorten Discourse by cutting Polysyllables into one, 
and leaving out Verbs and Participles, because in reality all things imaginable 
are but Nouns.
The other, was a Scheme for entirely abolishing all Words whatsoever; and this 
was urged as a great Advantage in Point of Health as well as Brevity. For it is 
plain, that every Word we speak is in some Degree a Diminution of our Lungs by 
Corrosion, and consequently contributes to the shortning of our Lives. An 
Expedient was therefore offered, that since Words are only Names for Things, it 
would be more convenient for all Men to carry about them, such Things as were 
necessary to express the particular Business they are to discourse on. And this 
Invention would certainly have taken Place, to the great Ease as well as Health 
of the Subject, if the Women in conjunction with the Vulgar and Illiterate had 
not threatned to raise a Rebellion, unless they might be allowed the Liberty to 
speak with their Tongues, after the manner of their Ancestors; such constant 
irreconcilable Enemies to Science are the common People. However, many of the 
most Learned and Wise adhere to the New Scheme of expressing themselves by 
Things, which hath only this Inconvenience attending it, that if a Man's 
Business be very great, and of various kinds, he must be obliged in Proportion 
to carry a greater bundle of Things upon his Back, unless he can afford one or 
two strong Servants to attend him. I have often beheld two of those Sages 
almost sinking under the Weight of their Packs, like Pedlars among us; who, 
when they met in the Streets, would lay down their Loads, open their Sacks, and 
hold Conversation for an Hour together; then put up their Implements, help each 
other to resume their Burthens, and take their Leave.
But for short Conversations a Man may carry Implements in his Pockets and under 
his Arms, enough to supply him, and in his House he cannot be at a loss: 
Therefore the Room where Company meet who practise this Art, is full of all 
Things ready at Hand, requisite to furnish Matter for this kind of artificial 
Converse.
Another great Advantage proposed by this Invention, was that it would serve as 
a Universal Language to be understood in all civilized Nations, whose Goods and 
Utensils are generally of the same kind, or nearly resembling, so that their 
Uses might easily be comprehended. And thus Embassadors would be qualified to 
treat with foreign Princes or Ministers of State to whose Tongues they were 
utter Strangers.

Pat

 
 Cheers,
 Danny.
 
 -- 
 http://dannyayers.com
 
 http://webbeep.it  - text to tones and back again
 
 


IHMC (850)434 8903 or (650)494 3973   
40 South Alcaniz St.   (850)202 4416   office
Pensacola

Data Driven Discussions about httpRange-14, etc (was: Re: Change Proposal for HttpRange-14)

Hi Jeni,

On 27 March 2012 18:54, Jeni Tennison j...@jenitennison.com wrote:
 Hi Tom,

 On 26 Mar 2012, at 17:13, Tom Heath wrote:
 On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:
 Tom,

 On 26 Mar 2012, at 16:05, Tom Heath wrote:
 On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 I'm sure many people are just deeply bored of this discussion.

 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!

 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.


 What hard data do you think would resolve (or if not resolve, at least move 
 forward) the argument? Some people  are contributing their own experience 
 from building systems, but perhaps that's too anecdotal? Would a
 structured survey be helpful? Or do you think we might be able to pick up 
 trends from the webdatacommons.org  (or similar) data?

 A few things come to mind:

 1) a rigorous assessment of how difficult people *really* find it to
 understand distinctions such as things vs documents about things.
 I've heard many people claim that they've failed to explain this (or
 similar) successfully to developers/adopters; my personal experience
 is that everyone gets it, it's no big deal (and IRs/NIRs would
 probably never enter into the discussion).

 How would we assess that though?

Give me some free time and enough motivation and I'd design an
experimental protocol to unpick this issue ;)

 My experience is in some way similar -- it's easy enough to explain that you 
 can't get a Road or a Person when
 you ask for them on the web -- but when you move on to then explaining how 
 that means you need two URIs for  most of the things that you really want to 
 talk about, and exactly how you have to support those URIs, it starts
 getting much harder.

My original question was only about the distinction, but yes, some of
the details do get tricky, but when was it ever otherwise with
technology.

 The biggest indication to me that explaining the distinction is a problem is 
 that neither OGP nor schema.org even  attempts to go near it when explaining 
 to people how to add to semantic information into their web pages. The
 URIs that you use in the 'url' properties of those vocabularies are explained 
 in terms of 'canonical URLs' for the
 thing that is being talked about. These are the kinds of graphs that millions 
 of developers are building on, and
 those developers do not consider themselves linked data adopters and will not 
 be going to linked data experts for  training.

Yeah, this is a shame (the OGP/schema.org bit, and the fact they won't
be asking for LD training ;). IIRC Ian Davis proposed a schema-level
workaround for this around the time OGP was released. He had a good
case that it was a non-problem technically, but no, that doesn't
explain why the distinction is not baked into the data model; same
with microformats.

 2) hard data about the 303 redirect penalty, from a consumer and
 publisher side. Lots of claims get made about this but I've never seen
 hard evidence of the cost of this; it may be trivial, we don't know in
 any reliable way. I've been considering writing a paper on this for
 the ISWC2012 Experiments and Evaluation track, but am short on spare
 time. If anyone wants to join me please shout.

 I could offer you a data point from legislation.gov.uk if you like.

Woohoo! You've made my decade :D

 When someone requests the ToC for an item of
 legislation, they will usually hit our CDN and the result will come back 
 extremely quickly. I just tried:

 curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents

 and it showed the result coming back in 59ms.

 When someone uses the identifier URI for the abstract concept of an item of 
 legislation, there's no caching so the  request goes right back to the 
 server. I just tried:

 curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67

 and it showed the result coming back in 838ms, of course the redirection goes 
 to the ToC above, so in total it
 takes around 900ms to get back the data.

Brilliant. This is just the kind of analysis I'm talking about. Now we
need to do similar across a bunch of services, connection speeds,
locations, etc., and then compare it to typical response times across
a representative sample of web sites. We use New Relic for this kind
of thing, and the results are rather illuminating. 1ms response times
makes you rather special IIRC. That's not to excuse sluggish sites,
but just to put this in context.

 So every time that we refer to an item of legislation through its generic 
 identifier rather than a direct link to its ToC  we are making the

Re: Change Proposal for HttpRange-14

Hi Giovanni,

On 27 March 2012 21:01, Giovanni Tummarello
giovanni.tummare...@deri.org wrote:
 Tom if you were to do a serious assessment then measuring milliseconds
 and redirect hits means looking at a misleading 10% of the problem.

Sorry, but I don't buy your argument, which equates to never asking
any questions unless you can also answer at the same time all the
others that pertain to the same issue. You've gotta start somewhere,
and just be realistic about the extent of claims you make based on the
evidence.

As for economics and perception of benefits, you're talking about
broader issues of Linked Data adoption. My original point was only
about httpRange-14.

Tom.


 Cognitive loads,economics and perception of benefits are the over the
 90% of the question here.

 An assessment that could begin describing the issue

 * get a normal webmaster calculate how much it takes to explain him
 the thing,follow him on and
 * see how quickly he forgets,
 * assess how much it takes to VALIDATE the whole thing works (E.g. a
 newly implemented spects)
 * assess what are the tools that would check if something break
 * assess the same thing for implementers e.g. of applications or
 consuming APIs to get all teh above
 * then once you calculate the huge cost above then compare it with the
 perceived benefits.

 THEN REDO ALL AT MANAGEMENT LEVEL once you're finished with technical
 level because for sites that matters ITS MANAGERS THAT DECIDE geek run
 websites dont count, sorry.

 Same thing when looking at 'real world applications' by counting just
 geeky hacked together demostrators or semweb aficionados libs has the
 same skew.. these people and apps were paid by EU money or research
 money or  so they should'n  count toward real world economics driven
 apps, so if one was thinking of counting   50 apps that would break
 that'd be just as partial and misleading.

 .. and we could go on. Now do you really need to do the above? (let
 alone how difficult it is to do inproper terms) me and a whole crowd
 know already the  results for the same exercise have been done over
 and over and we've been witnessing it.
  i sincerely hope this is the time we get this fixed so we can indeed
 go back and talk about the new linked data (linked data 2.0) to actual
 web developers, it managers etc.

 removing the 303 thing doesnt solve the whole problem, it is just the
 beginning. Looking forward to discuss next steps

 Gio




 On Mon, Mar 26, 2012 at 6:13 PM, Tom Heath tom.he...@talis.com wrote:
 Hi Jeni,

 On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:
 Tom,

 On 26 Mar 2012, at 16:05, Tom Heath wrote:
 On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 I'm sure many people are just deeply bored of this discussion.

 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!

 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.


 What hard data do you think would resolve (or if not resolve, at least move 
 forward) the argument? Some people  are contributing their own experience 
 from building systems, but perhaps that's too anecdotal? Would a
 structured survey be helpful? Or do you think we might be able to pick up 
 trends from the webdatacommons.org  (or similar) data?

 A few things come to mind:

 1) a rigorous assessment of how difficult people *really* find it to
 understand distinctions such as things vs documents about things.
 I've heard many people claim that they've failed to explain this (or
 similar) successfully to developers/adopters; my personal experience
 is that everyone gets it, it's no big deal (and IRs/NIRs would
 probably never enter into the discussion).

 2) hard data about the 303 redirect penalty, from a consumer and
 publisher side. Lots of claims get made about this but I've never seen
 hard evidence of the cost of this; it may be trivial, we don't know in
 any reliable way. I've been considering writing a paper on this for
 the ISWC2012 Experiments and Evaluation track, but am short on spare
 time. If anyone wants to join me please shout.

 3) hard data about occurrences of different patterns/anti-patterns; we
 need something more concrete/comprehensive than the list in the change
 proposal document.

 4) examples of cases where the use of anti-patterns has actually
 caused real problems for people, and I don't mean problems in
 principle; have planes fallen out of the sky, has anyone died? Does it
 really matter from a consumption perspective? The answer to this is
 probably not, which may indicate a larger problem of non-adoption.

 The larger question is how do we get to a state where we *don't* have this

Re: Change Proposal for HttpRange-14

2012-03-30 Thread Kingsley Idehen


On 3/30/12 11:15 AM, Tom Heath wrote:

I'm not saying I agree or disagree with any of the specifics, I'm just
making a plea for us to raise the level of analysis to a point where
we have some more robust evidence from which to draw conclusions. I'll
do what I can to contribute, but I think we all need to pitch in and
produce this evidence if the discussion and conclusions are going to
be credible. Anecdotes and opinion only get us so far.

Cheers,

Tom.

I'll have a DBpedia report published soon.


Stay tuned :-)

--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: Change Proposal for HttpRange-14

On 30 March 2012 17:39, Kingsley Idehen kide...@openlinksw.com wrote:
 On 3/30/12 11:15 AM, Tom Heath wrote:

 I'm not saying I agree or disagree with any of the specifics, I'm just
 making a plea for us to raise the level of analysis to a point where
 we have some more robust evidence from which to draw conclusions. I'll
 do what I can to contribute, but I think we all need to pitch in and
 produce this evidence if the discussion and conclusions are going to
 be credible. Anecdotes and opinion only get us so far.

 Cheers,

 Tom.

 I'll have a DBpedia report published soon.


 Stay tuned :-)

Kingsley, I'd like to buy you a beer! (you too Jeni ;)

Have a great weekend all,

Tom.

-- 
Dr. Tom Heath
Senior Research Scientist
Talis Education Ltd.
W: http://www.talisaspire.com/
W: http://tomheath.com/

Re: Data Driven Discussions about httpRange-14, etc (was: Re: Change Proposal for HttpRange-14)

2012-03-30 Thread Hugh Glaser

I think this may be a stuck record, but here goes…
Would be nice, Tom, but.
Yet again, the discussion around this issue is entirely focussed on
a) aspects of logic, philosophy and like-minded;
b) aspects of the problems of publishing;
c) network issues.

So where is the consumption aspect?
The measure by which we decide if all this engineering is fit for purpose.
Design all the protocols you want, but if you are not examining the right 
thing, it is not very helpful (to put it mildly).
David Booth (sorry David!) said we need to deal with the engineering before 
addressing how we can educate people to understand it.
And therefore, I would say, whether it is possible.
This is not a recipe for building stuff that people can use.
In fact it is not engineering at all.

What is the definition of fit for purpose that you propose to use to define 
your protocols?
My definition requires that it is suitable input for building real applications 
that ordinary people can use, informed by multiple and even unbounded sites 
from the Web of Data. Clearly, at a decent scale as well.
This, I think, is the vision of Linked Data for many people.
(There is also an Agent point of view, but I think we are miles from that at 
the moment.)
If an argument cannot be made to support a point of view that has this as the 
end game, then I lose interest in the argument.

But let us say, you have a definition of fit for purpose, and define your 
protocol to assess it for these questions.

Tom, you said to Michael
 From: Tom Heath tom.he...@talis.com
 Subject: Re: Change Proposal for HttpRange-14
 
 Of all people you guys at the BBC have great anecdotes, and clearly


I found this really sad.
To my knowledge, Michael has never consumed much in the way of other people's 
Linked Data.
He has a fantastic wealth of knowledge about using Linked Data technologies to 
do Integration, which is a huge market for us.
Various people keep asking for examples of applications that might be 
interesting data points of Linked Data consuming end-user applications to 
inform the discussion.
But I have yet to see a satisfactory response.
But the sad truth is that I am beginning to think that after all these year I 
(RKBExplorer.com) may still be the only one who has actually built anything 
that consumes data from across the Linked Data Cloud, uses it to enhance the 
knowledge, and then deliver it to ordinary people (OK, we might fail, but we 
try).

This means that you, or others, just don't have enough data points to gather 
evidence for the assessment you want to do.

But please do try - I would love to see detailed analysis of fit for purpose - 
and I know how much time and effort that takes!
And yes, I am happy to provide you with any data I can.

Best
Hugh

On 30 Mar 2012, at 17:22, Tom Heath wrote:

 Hi Jeni,
 
 On 27 March 2012 18:54, Jeni Tennison j...@jenitennison.com wrote:
 Hi Tom,
 
 On 26 Mar 2012, at 17:13, Tom Heath wrote:
 On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:
 Tom,
 
 On 26 Mar 2012, at 16:05, Tom Heath wrote:
 On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 I'm sure many people are just deeply bored of this discussion.
 
 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!
 
 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.
 
 
 What hard data do you think would resolve (or if not resolve, at least 
 move forward) the argument? Some people  are contributing their own 
 experience from building systems, but perhaps that's too anecdotal? Would a
 structured survey be helpful? Or do you think we might be able to pick up 
 trends from the webdatacommons.org  (or similar) data?
 
 A few things come to mind:
 
 1) a rigorous assessment of how difficult people *really* find it to
 understand distinctions such as things vs documents about things.
 I've heard many people claim that they've failed to explain this (or
 similar) successfully to developers/adopters; my personal experience
 is that everyone gets it, it's no big deal (and IRs/NIRs would
 probably never enter into the discussion).
 
 How would we assess that though?
 
 Give me some free time and enough motivation and I'd design an
 experimental protocol to unpick this issue ;)
 
 My experience is in some way similar -- it's easy enough to explain that you 
 can't get a Road or a Person when
 you ask for them on the web -- but when you move on to then explaining how 
 that means you need two URIs for  most of the things that you really want 
 to talk about, and exactly how you have to support those URIs, it starts
 getting much harder.
 
 My original question was only about the distinction

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-30 Thread Jonathan A Rees

On Fri, Mar 30, 2012 at 10:32 AM, Jeni Tennison j...@jenitennison.com wrote:

 I see best practices as being separate from normative requirements, and 
 thought that the proposals were for the normative requirements. We did 
 recognise in the proposal the requirement for a best practice document to 
 supplement the normative requirements:


This is a helpful discussion because I'm still trying to figure out
the right way to say what I want to say, and with each iteration I
think I come a bit closer to the point.

My opinion is that any proposal needs to specify a way to say how you
get from a resource to its content. I do a SPARQL query and find a URI
for a resource based on metadata (stored in the triple store) that
make it seem interesting; title, license, rating, whatever. Then I
want to *look at it*. What do I do? httpRange-14(a) (or its intended
stronger form) says you do a GET on its URI, and if you get a 200,
that's the content, that's what I want to look at. So that's
successful communication. If you delete HR14a, which is fine, you
need, IMO, to replace it with some other way - normative and
actionable - to express the same information, and that method has to
be provided normatively, not as a best practice. Tim's proposal does
this, my SHOULD not MUST proposal does, yours doesn't.

And a reminder that I *do* understand content negotation; you don't
actually get the content but rather a content or one of its many
contentses.

The normative part would be the specification of this property; the
best practice would just be that you should use it, if the resource
has content on the web. Of course there are many situations where you
wouldn't use it, because you don't have the content, want to hide it,
don't want to be bothered, don't know where it is, etc. That's OK.

Sure, it's nice to be able to GET a description, as you have
specified, but that doesn't help in general, e.g. in the PICS/POWDER
use cases and what I gave above.

This is an easy fix to your proposal. You just add a normative section
that defines a property that people *may* use to provide this
information:

   http://example/foo baz:hasContentUri http://example/foo-content;.

or whatever you want to call it (Larry suggested 'location', I
suggested 'hasInstanceUri'). This means that to get the content do a
GET on that URI, and if the result is a 200 then you got content,
otherwise all bets are off. (Well, dealing with 301/302/307 would be
gravy.) Then the proposal will not be a net loss as far as expressive
power goes.

Opt-in to HR14a looks like this:

   http://example/foo baz:hasContentUri http://example/foo;.

but nobody *has* to do that.

There are problems with this idea, such as what if an agent can't
parse the particular flavor of RDF that's in use, but before we get
into that I want to see if you understand what I'm suggesting.

Jonathan

On Fri, Mar 30, 2012 at 10:32 AM, Jeni Tennison j...@jenitennison.com wrote:
 Jonathan,

 On 30 Mar 2012, at 01:51, Jonathan A Rees wrote:
 On Tue, Mar 27, 2012 at 6:01 PM, Jeni Tennison j...@jenitennison.com wrote:
 Good practice would be for Flickr to use separate URIs for 'the photograph' 
 and 'the description of the photograph', to ensure that 'the description of 
 the photograph' was reachable from 'the photograph' and to ensure that any 
 statements referred to the correct one. Under the proposal, they could 
 change to this good practice in four ways:

 1. by adding:

  link rel=describedby href=#main /

 to their page (or pointing to some other URL that they choose to use for 
 'the description of the photograph')

 2. by adding a Link: header with a 'describedby' relationship that points 
 at a separate URI for 'the description of the photograph' (possibly a 
 fragment as in 1?)

 Sorry, I didn't get why these are said to be better practice than the
 current Flickr page - how the document distinguishes the two cases.
 Does it say there 'should' or 'must' be a describedby? If the info
 resource assumption is gone, won't the Flickr page [still?] be
 understood the way Flickr intends? I'll have to study the proposal
 again (sorry, very hurried now, can't keep up)


 I see best practices as being separate from normative requirements, and 
 thought that the proposals were for the normative requirements. We did 
 recognise in the proposal the requirement for a best practice document to 
 supplement the normative requirements:

  We also recommend that a clear guide on best practices when publishing
  and consuming data should be written, possibly an update to [cooluris].

 I don't see this proposal as changing the current best practice 
 recommendations, which are to have separate URIs for documents about things 
 from the things themselves.

 I'm not sure I've understood your second question, but perhaps you're saying 
 that using hash URIs for fragments of the page that contains descriptions 
 doesn't work when mixed with the assumption that you get the description of 
 that hash URI by

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-30 Thread Jeni Tennison

Jonathan,

On 30 Mar 2012, at 18:10, Jonathan A Rees wrote:
 My opinion is that any proposal needs to specify a way to say how you
 get from a resource to its content. I do a SPARQL query and find a URI
 for a resource based on metadata (stored in the triple store) that
 make it seem interesting; title, license, rating, whatever. Then I
 want to *look at it*. What do I do? httpRange-14(a) (or its intended
 stronger form) says you do a GET on its URI, and if you get a 200,
 that's the content, that's what I want to look at. So that's
 successful communication. If you delete HR14a, which is fine, you
 need, IMO, to replace it with some other way - normative and
 actionable - to express the same information, and that method has to
 be provided normatively, not as a best practice. Tim's proposal does
 this, my SHOULD not MUST proposal does, yours doesn't.

OK, I think I see. The intention of the 'no longer implies' proposal is that 
you GET its URI. If you get a 200 then you have to look at the content that 
comes back to work out the relationship between the URI and the representation 
because you can't generally tell whether the representation is content or 
description.

Your assertion, I think, is that we haven't specified a mechanism for providing 
an explicit statement within the content that says this stuff you got is the 
content of the resource this URI identifies, only one for saying the stuff 
over there is the description of the resource this URI identifies.

The intention was for the :describedby property to double-up for this. The 
proposal states that if the content includes a statement using :describedby 
property in which the resource is the object of the statement, then you know 
that resource is an information resource (ie that you get the content of the 
resource from the URI). So if you GET U and you have a 200 and it contains 
something that looks like:

  _:something :describedby U .

then you know that what you have gotten from U is the content of U.

You say the gap can be fixed with:

 This is an easy fix to your proposal. You just add a normative section
 that defines a property that people *may* use to provide this
 information:
 
   http://example/foo baz:hasContentUri http://example/foo-content;.
 
 or whatever you want to call it (Larry suggested 'location', I
 suggested 'hasInstanceUri'). This means that to get the content do a
 GET on that URI, and if the result is a 200 then you got content,
 otherwise all bets are off. (Well, dealing with 301/302/307 would be
 gravy.) Then the proposal will not be a net loss as far as expressive
 power goes.

I *think* that the :describedby triple, as defined in the proposal, provides 
equivalent information. If you have:

  U :describedby V .

then you can turn it into:

  V :hasContentUri U .

and it has the same meaning. What have I missed? Is it important that U is a 
string rather than a resource for example?

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com

Re: Data Driven Discussions about httpRange-14, etc (was: Re: Change Proposal for HttpRange-14)

2012-03-30 Thread Jeni Tennison

Hi Tom,

On 30 Mar 2012, at 17:22, Tom Heath wrote:
On 27 March 2012 18:54, Jeni Tennison j...@jenitennison.com wrote:
2) hard data about the 303 redirect penalty, from a consumer and
publisher side. Lots of claims get made about this but I've never seen
hard evidence of the cost of this; it may be trivial, we don't know in
any reliable way. I've been considering writing a paper on this for
the ISWC2012 Experiments and Evaluation track, but am short on spare
time. If anyone wants to join me please shout.

I could offer you a data point from legislation.gov.uk if you like.

Woohoo! You've made my decade :D

When someone requests the ToC for an item of
legislation, they will usually hit our CDN and the result will come back
extremely quickly. I just tried:

curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents

and it showed the result coming back in 59ms.

When someone uses the identifier URI for the abstract concept of an item of
legislation, there's no caching so the request goes right back to the
server. I just tried:

curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67

and it showed the result coming back in 838ms, of course the redirection
goes to the ToC above, so in total it
takes around 900ms to get back the data.

Brilliant. This is just the kind of analysis I'm talking about. Now we
need to do similar across a bunch of services, connection speeds,
locations, etc., and then compare it to typical response times across
a representative sample of web sites. We use New Relic for this kind
of thing, and the results are rather illuminating. 1ms response times
makes you rather special IIRC. That's not to excuse sluggish sites,
but just to put this in context.

So every time that we refer to an item of legislation through its generic
identifier rather than a direct link to its ToC we are making the site
seem about 15 times slower.

So now we're getting down to the crux of the question: does this
outcome really matter?! 15x almost nothing is still almost nothing!
15x slower may offend our geek sensibilities, but probably doesn't
matter in practice when the absolute numbers are so small.

To give another example, I just did some very ad-hoc tests on some
URIs at a department of a well-known UK university, and the results
were rather revealing! The total response time (get URI of NIR,
receive 303 response, get URI of IR, receive 200 OK and resource
representation back) took ~10s, of which ***over 90%*** was taken up
by waiting for the page/IR about that NIR to be generated! (and that's
with curl, not a browser, which may then pull in a bunch of external
dependencies). In this kind of situation I think there are other,
bigger issues to worry about than the 1s taken for a 303-based
roundtrip!!

Just to put this into context for you so that you understand why it's a big
deal. We have a contract [1] (well, actually three contracts) that specifies
the time that it takes on average for a typical table of contents or section to
be retrieved as less than one second. In the England/Wales contract [2], it's
Clauses 12-13 of Section 6.8 of Schedule 1, on page 125 if you want to take a
look. The contract includes financial penalties when these targets aren't
reached.

It's not easy to reach these targets with the kind of complex content we're
dealing with. The only way we have a hope is through caching the hell out of
the site, delivering it through a CDN.

Now we could quibble over how exactly you measure the length of time for
retrieving a section or table of contents, but it's really clear that what the
customer (TNA) wants is a performant website that doesn't suffer from the
noticeable delay when loading a page that you get when a page takes more than a
second to come through [3]. If we had 303 hops, they would definitely be
complaining (remember the 900ms doesn't include downloading CSS and Javascript,
which add delays), and it could cost TSO money.

I'm absolutely prepared to believe that there are sites out there that don't
have these limitations: I don't really care if it takes more than a second for
pages on my own website to get returned, for example. But for large-scale
websites like legislation.gov.uk, delivered under contracts that have penalty
clauses for poor performance, yes it really really does matter that it's 60ms
rather than 900ms.

Cheers,

Jeni

[1]
http://www.contractsfinder.businesslink.gov.uk/Common/View%20Notice.aspx?site=1000lang=ennoticeid=272362fs=true
[2]
http://www.contractsfinder.businesslink.gov.uk/~/docs/DocumentDownloadHandler.ashx?noticeDocumentId=18140fileId=b826ad80-f316-493a-a86d-23546ceb95e2
[3] http://www.useit.com/papers/responsetime.html
--
Jeni Tennison
http://www.jenitennison.com

Re: Middle ground change proposal for httpRange-14

2012-03-29 Thread David Booth

Hi Jeni,

On Wed, 2012-03-28 at 18:01 +0100, Jeni Tennison wrote:

http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol
[ . . . ]
1. The focus on the *definition* of a URI as opposed to a mere
description is problematic for me. There are lots of things in the
world that couldn't be adequately *defined* but can be described to
more or less detail. I worry that people will get tied up in knots
trying to work out what a definition looks like for a Person or a
Book. Although I prefer most of the language in your draft, I prefer
the looser 'description' used in Jonathan's document.

That sounds like an important concern, but I think it is best to
separate the issue of how we educate the public about how this works,
from figuring out the engineering of how it works. We first need to
deal with the engineering.

If you notice the definition of URI documentation in the baseline
document
http://www.w3.org/2001/tag/doc/uddp-20120229/
it says: URI documentation is information that documents the intended
meaning of a particular probe URI. That's what a definition is. So
the terminology in the UDDP proposal
http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol#2.4_URI_definition.2C_explicit_URI_definition_and_implicit_URI_definition
is merely calling a spade a spade.

Furthermore, there is an important distinction between a definition and
any other documentation or description. A definition *is* documentation
(or a description) but not every piece of documentation (or not every
description) is a definition. This key difference tends to get blurred
when a definition is blandly called documentation or a description.
I suspect that some have been wary of recognizing this distinction out
of a concern that if something is called a definition, then a client
will be obligated to use that definition, and that would unreasonably
constrain the client. But this concern is unfounded if the
specification makes clear that a client is free to do whatever it wishes
with a URI definition that it retrieves.

Finally, to give a little more insight about what it means to provide a
URI definition for something such as a Person or a Book, in some sense
the URI definition does not actually *define* that person or book.
Rather, it defines the *binding* of the URI (as a name) to a particular
description of that thing, which indirectly (partially) identifies it.
And as Pat Hayes (and others) have pointed out many times, there is
inherent ambiguity in virtually any description. This means that a URI
definition does not *fully* determine the thing that the URI is supposed
to identify. That is both a plus and a minus. It is a minus because it
means that in general others can never know *exactly* what that URI
owner intended it to identify, and this leads to downstream
inconsistencies, as illustrated Part 2 of Resource Identity and
Semantic Extensions: Making Sense of Ambiguity:
http://dbooth.org/2010/ambiguity/paper.html#inconsistent-merge

On the other hand ambiguity is also a plus because it means that the URI
can be used in a much wider variety of contexts, such as the URIs in a
loose vocabulary like SKOS. This does *not* mean that such a vocabulary
is universally *better* than one that is very precise, such as a
detailed biomedical ontology. It just means that it has different uses.
Of course, the holy grail is to produce ontologies that are both precise
and have wide application, but this is exceedingly difficult to achieve.
In the meantime we must muddle along in our imperfect world, and the
architecture must be designed with this in mind.

2. While the draft says that it doesn't define the term information
resource it nevertheless uses that term in many places, as if it
means something.

Right. That is an artifact of AWWW and the httpRange-14 resolution that
I left in there, but as Mike Bergman suggests
http://lists.w3.org/Archives/Public/public-lod/2012Mar/0325.html
it could eliminated entirely, as it is not needed.

For example, in 3.2.1 it says that you can tell (if a result is eg a
200 OK) that the target URI identifies an information resource. Given
that 'information resource' isn't defined in the document, what does
that actually mean in terms of what an application should do?

Nothing. The application may use it or ignore it as it sees fit. You
can think of it like a marker interface
http://en.wikipedia.org/wiki/Marker_interface_pattern
with no initial semantics. At first glance this may seem pointless, but
it does actually have some utility, because it means that applications
that *choose* to do so can conveniently hang additional semantics onto
that class.

For example, an application that *chooses* to treat the class of
information resources as disjoint with the class of Persons can easily
do so. This is a choice rather than a requirement because, as has been
pointed out many times, there is no clear distinction between the class
of information resources and non-information resources.

3. I

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-29 Thread David Booth

On Thu, 2012-03-29 at 01:37 +0100, Norman Gray wrote:
[ . . . ]
 Thus as it stands, the term 'information resource' in [1] has no
 implication (beyond incidentally reiterating that the 200-retrieved
 content is a (REST) representation of the resource).
 
 However, the point of introducing the term is, I've always taken it,
 that it licenses the client to jump to some conclusions.  These
 conclusions aren't spelled out anywhere, but (unless you're being
 whimsical) they're things like 'this is a document', or 'this is a
 network thing', or 'this is not a squawking macaw which will squeeze
 out of the ethernet port and crap on my keyboard'.  What those
 conclusions materialise as in practice surely _depends on the
 application_ which is processing the resource.

Exactly.  And that is precisely why the UDDP Proposal use the term
information resource but explicitly leaves its definition
unconstrained:
http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol#2.7_Information_resource
As I mentioned elsewhere, the term is not needed, and could be
eliminated entirely.  But it does provide a convenience for applications
that wish to make additional assumptions based on an HTTP 200 response.


-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Re: Change Proposal for HttpRange-14

2012-03-28 Thread Michael Smethurst




On 27/03/2012 18:12, Kingsley Idehen kide...@openlinksw.com wrote:

 On 3/27/12 12:35 PM, Michael Smethurst wrote:
 
 
 On 27/03/2012 16:53, Kingsley Idehenkide...@openlinksw.com  wrote:
 
 On 3/27/12 11:17 AM, Michael Smethurst wrote:
 No sane publisher trying to handle a decent amount of traffic is gonna
 follow the dbpedia pattern of doing it in one step (conneg to 303) and
 picking up 2 server hits per request. I've said here before that the
 dbpedia
 publishing pattern is an anti-pattern and shouldn't be encouraged
 Circa. 2006-2007, with Linked Data bootstrap via the LOD project as top
 priority, the goal was simple: unleash Linked Data in a manner that just
 worked. That meant catering for:
 
 1. frameworks and libraries that send hash URIs over the wire
 2. work with all browsers, no excuses.
 
 Linked Data is now alive and in broad use (contrary to many
 misconceptions to the contrary), there is still a need for slash URIs.
 This isn't a matter of encouragement or discouragement, its a case of
 what works for the project goals at hand. If slash URIs don't work then
 use hash URIs or vice versa. Platforms that conform to Linked Data meme
 principles should be able to handle these scenarios.
 
 BTW - Imagine a scenario where Linked Data only worked with one style of
 URI, where would we be today or tomorrow, re. Linked Data? Being
 dexterous and unobtrusive has to be a celebrated feature rather than a
 point of perpetual distraction.
 My point wasn't about hashes or slashes or any style of uri.
 Your comment was:
 
  No sane publisher trying to handle a decent amount of traffic is gonna
 follow the dbpedia pattern of doing it in one step (conneg to 303) and
 picking up 2 server hits per request.

Yes, but I was making a point about the one step, not the slashes or the 303
 
 You described DBpedia method of doing things as being one to be
 discouraged. DBpedia deploys Linked Data via slash URIs, hence my
 response. Also, in the context of Linked Data slash URIs ultimately lead
 to the contentious 303 entity name / web resource address disambiguation
 heuristic.

But they don't have to lead to doing conneg and 303 in one step

   It was about
 conflating 303s ((I can't give you that but) here's something that might be
 useful) with conneg (here's the useful thing in the representation you asked
 for).
 
 303 isn't a conflation of anything. It's a redirection mechanism that
 can be used in different ways. Sometimes it facilitates access to
 alternative representations and sometimes it can just be used to
 facilitate indirection re. data access by name reference as per Linked
 Data principles.

I didn't say 303 is a conflation. I said when you conflate the 303 part with
the conneg... that's a conflation.

I don't think it's 303s job to facilitate access to alternative
representations; that's what conneg's for

Dbpedia does:

Thing that's not a web document  conneg + 303  representation of a web
document

Instead of:

Thing that's not a web document  303  resource uri for a web document 
conneg  representation of web document

If you do the latter then all html links can point to the resource uri of
the web document so the publisher still incurs a conneg cost for each
request (which is reasonable) but doesn't incur a 303 cost for every request
(which isn't). The only place you need to refer to the uri of the thing
that's not a web document is when you want to make statements about it

If you do the former then (as per dbpedia) you end up linking to the thing
that is not a web document and picking up a 303 penalty for every request

So I'm saying that you can do slashes and 303s but in way that's more
palatable to publishers than dbpedia

I still think there are other problems with 303s:
- some people who want to publish linked data just don't have access to
configure their server to do this (which would also be a problem for any new
20x response)
- persuading your manager and your manager's manager and your manager's
manager's manager (not to mention ops!) is not easy

But this is heading off topic so apologies

Michael
 
 In the Linked Data system, you are seeking the description of an Entity
 that's been identified using a URI. If it so happens that the URI is
 hashless (or slash based) the system doesn't reply with an actual entity
 descriptor resource address, it redirects you. The very same thing
 happens with a hash URI but it has the benefit of delivering said
 indirection and disambiguation implicitly.
 
 
 There is always indirection in play. 303 isn't conflation, its simply
 redirection that is exploitable in a variety of ways.
 And about how not exposing the generic IR URI and not linking to it
 imposes too high a penalty
 
 Here are the potential penalties, both ultimately about entity name /
 entity descriptor (description) resource address disambiguation:
 
 1. 303 round trip costs
 2. agreement about which relations and constituent predicates provide
 agreed up semantics that address

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-28 Thread Norman Gray


Greetings.

[This is a late response, because I dithered about sending it, because this 
whole thing seems simple enough that I've got to be missing stuff]

On 2012 Mar 27, at 14:02, Jonathan A Rees wrote:

 On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de 
 wrote:
 
 Hello Tim,
 
 On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote:
 12) Still people say well, to know whether I use 200 or 303 I need to know 
 if this sucker is an IR or NIR when instead they should be saying Well, 
 am I going to serve the content of this sucker or information about it?.
 
 I think the question should be does the response contain the content of it
 because I can serve both at once (foaf:PersonalProfileDocument 
 rdf:about=).
 
 Yes, this is the question - is the retrieved representation content (I
 used the word instance but it's not catching on), or description. It
 can be both.

Fine -- that seems the key question.  In some ideal world, everything on the 
web would come with RDF which explained what it was; but expecting that ever to 
happen would be mad.

The HR14 resolution gives one answer to this, by doing _two_ things.

Step 1. HR14 declares the existence of a subset of resources named 'IR'.  You 
can gloss this set as 'information resource', or 'document', note that the set 
is vague, or deny that the set is important, but that doesn't matter.

Step 2. HR14 gives a partial algorithm for deciding whether a URI X names a 
resource in IR:  If you get a 200 when you dereference X, the resource is 
conclusively in IR.  End of story.

(you can all suck eggs, now, yes?)

Why does the set IR matter? (and pace Tim and various weary voices in this 
metathread, I think it does matter).  Because saying 'X names a resource in IR' 
tells you that the URI and the associated resource have a Particularly Simple 
Relationship -- the content of the HTTP retrieval is the 'content' of the 
resource (in some way which probably doesn't have to be precise, but which 
asserts that resource is something, unlike a Macaw, that can come through a 
network).  In this way -- crucially -- it answers Tim's question (12) above: 
retrieving X with a 200 status obtains the content of the sucker.  So the 
concept of 'IR' does do some work because it gives the client information about 
the object.

Right?

BUT, we (obviously) also want to talk about things where there's a slightly 
more complicated relationship between the URI and some resource (eg a URI which 
names a bird).  In this case, the extra information (that the URI and the 
resource have a Particularly Simple Relationship) would be false.  The cost of 
a particularly simple step 2 above, is the (in retrospect variously costly) 
indirection of the 303-dance.

So the whole discussion seems to be about whether and how to relax step 2.  
Jeni Tennison's proposal says it should be relaxed in the presence of a 
'describedby' link, David Booth's that it should be relaxed with a new 
definedby link, or a (self-)reference with rdfs:isDefinedBy.  My 'proposal' was 
that it could be relaxed even more minimally, by saying that placing the 
resource in IR (step 2 above) could be done by the client only if this didn't 
contradict any RDF in the content of the resource (because the RDF said that X 
named a person, say), however conveyed (and of course these two proposals 
achieve that).

After all this torrent of message (and I have honestly tried to read a 
significant fraction of them, and associated documents), I'm still not seeing 
how this is problematic.  Perhaps I'm slow, or I've read the wrong fraction of 
messages.

  * Anything that was HR14-compliant will still be compliant with the relaxed 
Step 2. No change.

  * Any resource that wasn't in IR before, but whose URI nonetheless produced 
200, was formally broken. It was telling lies.  With a relaxed Step 2, it now 
won't be broken any more.  Some applications (Tabulator?) will have to change 
to respect that, but they couldn't tell they were being lied to before, so 
they're merely exchanging one problem for a fixable one.

  * This is insensitive to the definition of 'information resource', and it 
doesn't matter if the content is multiple things.  If a resource 200-says that 
its URI names a Book, then you don't have to worry whether that's an 
'information resource' or not, because you know it's a book; end of algorithm; 
do not go to the end of Step 2; do not add any extra information hacked/derived 
from protocol details.

That seems an inexpensive change which un-breaks a lot of things.

All the best (in some puzzlement),

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-28 Thread Michael Brunnbauer


Hello Norman,

let me summarize that:

-Regardless of how you define IR, everything that denotes what it accesses
 should lie in IR.

-Putting something in NIR therefor also answers the question if it denotes
 what it accesses with no by entailment.

-There may or may not be IRs that do not denote what they access.

But would it not be simpler just to signal this uri does not access what
it denotes for 200 statuscodes instead of signalling this uri is a NIR ?

Regards,

Michael Brunnbauer

On Wed, Mar 28, 2012 at 06:59:05PM +0100, Norman Gray wrote:
 
 Greetings.
 
 [This is a late response, because I dithered about sending it, because this 
 whole thing seems simple enough that I've got to be missing stuff]
 
 On 2012 Mar 27, at 14:02, Jonathan A Rees wrote:
 
  On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de 
  wrote:
  
  Hello Tim,
  
  On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote:
  12) Still people say well, to know whether I use 200 or 303 I need to 
  know if this sucker is an IR or NIR when instead they should be saying 
  Well, am I going to serve the content of this sucker or information 
  about it?.
  
  I think the question should be does the response contain the content of 
  it
  because I can serve both at once (foaf:PersonalProfileDocument 
  rdf:about=).
  
  Yes, this is the question - is the retrieved representation content (I
  used the word instance but it's not catching on), or description. It
  can be both.
 
 Fine -- that seems the key question.  In some ideal world, everything on the 
 web would come with RDF which explained what it was; but expecting that ever 
 to happen would be mad.
 
 The HR14 resolution gives one answer to this, by doing _two_ things.
 
 Step 1. HR14 declares the existence of a subset of resources named 'IR'.  You 
 can gloss this set as 'information resource', or 'document', note that the 
 set is vague, or deny that the set is important, but that doesn't matter.
 
 Step 2. HR14 gives a partial algorithm for deciding whether a URI X names a 
 resource in IR:  If you get a 200 when you dereference X, the resource is 
 conclusively in IR.  End of story.
 
 (you can all suck eggs, now, yes?)
 
 Why does the set IR matter? (and pace Tim and various weary voices in this 
 metathread, I think it does matter).  Because saying 'X names a resource in 
 IR' tells you that the URI and the associated resource have a Particularly 
 Simple Relationship -- the content of the HTTP retrieval is the 'content' of 
 the resource (in some way which probably doesn't have to be precise, but 
 which asserts that resource is something, unlike a Macaw, that can come 
 through a network).  In this way -- crucially -- it answers Tim's question 
 (12) above: retrieving X with a 200 status obtains the content of the sucker. 
  So the concept of 'IR' does do some work because it gives the client 
 information about the object.
 
 Right?
 
 BUT, we (obviously) also want to talk about things where there's a slightly 
 more complicated relationship between the URI and some resource (eg a URI 
 which names a bird).  In this case, the extra information (that the URI and 
 the resource have a Particularly Simple Relationship) would be false.  The 
 cost of a particularly simple step 2 above, is the (in retrospect variously 
 costly) indirection of the 303-dance.
 
 So the whole discussion seems to be about whether and how to relax step 2.  
 Jeni Tennison's proposal says it should be relaxed in the presence of a 
 'describedby' link, David Booth's that it should be relaxed with a new 
 definedby link, or a (self-)reference with rdfs:isDefinedBy.  My 'proposal' 
 was that it could be relaxed even more minimally, by saying that placing the 
 resource in IR (step 2 above) could be done by the client only if this didn't 
 contradict any RDF in the content of the resource (because the RDF said that 
 X named a person, say), however conveyed (and of course these two proposals 
 achieve that).
 
 After all this torrent of message (and I have honestly tried to read a 
 significant fraction of them, and associated documents), I'm still not seeing 
 how this is problematic.  Perhaps I'm slow, or I've read the wrong fraction 
 of messages.
 
   * Anything that was HR14-compliant will still be compliant with the relaxed 
 Step 2. No change.
 
   * Any resource that wasn't in IR before, but whose URI nonetheless produced 
 200, was formally broken. It was telling lies.  With a relaxed Step 2, it now 
 won't be broken any more.  Some applications (Tabulator?) will have to change 
 to respect that, but they couldn't tell they were being lied to before, so 
 they're merely exchanging one problem for a fixable one.
 
   * This is insensitive to the definition of 'information resource', and it 
 doesn't matter if the content is multiple things.  If a resource 200-says 
 that its URI names a Book, then you don't have to worry whether that's an 
 'information

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-28 Thread Jonathan A Rees

On Wed, Mar 28, 2012 at 1:59 PM, Norman Gray nor...@astro.gla.ac.uk wrote:

 Greetings.

 [This is a late response, because I dithered about sending it, because this 
 whole thing seems simple enough that I've got to be missing stuff]

 On 2012 Mar 27, at 14:02, Jonathan A Rees wrote:

 On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de 
 wrote:

 Hello Tim,

 On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote:
 12) Still people say well, to know whether I use 200 or 303 I need to 
 know if this sucker is an IR or NIR when instead they should be saying 
 Well, am I going to serve the content of this sucker or information about 
 it?.

 I think the question should be does the response contain the content of it
 because I can serve both at once (foaf:PersonalProfileDocument 
 rdf:about=).

 Yes, this is the question - is the retrieved representation content (I
 used the word instance but it's not catching on), or description. It
 can be both.

 Fine -- that seems the key question.  In some ideal world, everything on the 
 web would come with RDF which explained what it was; but expecting that ever 
 to happen would be mad.

 The HR14 resolution gives one answer to this, by doing _two_ things.

 Step 1. HR14 declares the existence of a subset of resources named 'IR'.  You 
 can gloss this set as 'information resource', or 'document', note that the 
 set is vague, or deny that the set is important, but that doesn't matter.

 Step 2. HR14 gives a partial algorithm for deciding whether a URI X names a 
 resource in IR:  If you get a 200 when you dereference X, the resource is 
 conclusively in IR.  End of story.

 (you can all suck eggs, now, yes?)

 Why does the set IR matter? (and pace Tim and various weary voices in this 
 metathread, I think it does matter).  Because saying 'X names a resource in 
 IR' tells you that the URI and the associated resource have a Particularly 
 Simple Relationship -- the content of the HTTP retrieval is the 'content' of 
 the resource (in some way which probably doesn't have to be precise, but 
 which asserts that resource is something, unlike a Macaw, that can come 
 through a network).  In this way -- crucially -- it answers Tim's question 
 (12) above: retrieving X with a 200 status obtains the content of the sucker. 
  So the concept of 'IR' does do some work because it gives the client 
 information about the object.

 Right?

Wrong. Just knowing that it is an IR is not sufficient. You made a
logical leap, unjustified by anything written down anywhere, that it
was an IR *that had that content*. The Flickr and Jamendo examples are
perfectly consistent with the URI naming an IR, but the content you
get is not content of the IR described by the RDF therein, so they
name a different IR.

But let's grant this, as it can easily be fixed with a small
clarification, and move on. It does not really bear on your proposal
anyhow.

 BUT, we (obviously) also want to talk about things where there's a slightly 
 more complicated relationship between the URI and some resource (eg a URI 
 which names a bird).  In this case, the extra information (that the URI and 
 the resource have a Particularly Simple Relationship) would be false.  The 
 cost of a particularly simple step 2 above, is the (in retrospect variously 
 costly) indirection of the 303-dance.

 So the whole discussion seems to be about whether and how to relax step 2.  
 Jeni Tennison's proposal says it should be relaxed in the presence of a 
 'describedby' link, David Booth's that it should be relaxed with a new 
 definedby link, or a (self-)reference with rdfs:isDefinedBy.  My 'proposal' 
 was that it could be relaxed even more minimally, by saying that placing the 
 resource in IR (step 2 above) could be done by the client only if this didn't 
 contradict any RDF in the content of the resource (because the RDF said that 
 X named a person, say), however conveyed (and of course these two proposals 
 achieve that).

You are asking the right question, and I applaud the effort. I think
many people would like a solution similar to this one. But IMO looking
for a contradiction is not actionable, and for me that's a recipe for
disaster, since it forces human judgment to intervene in each case.
Human judgment is both expensive and unreliable.

Contradictions are impossible to test by machine. The consistency of
statements such as dc:creator or rdfs:comment with what the content is
lies outside what machines can do. So you put humans in the path of
deciding whether there is a contradiction, and therefore what the URI
mode is. This doesn't sound good to me.

Second, we know OWL Full consistency (i.e. contradiction detection) is
undecidable, and OWL DL can be pretty hard. How did deciding the URI
mode come to depend on what logic is being used, and become so
complicated?

Third, the RDF could be accidentally consistent with what the content
is, when the intent was for the URI to refer to something that

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-28 Thread Michael Brunnbauer


Hallo Norman,

 -Regardless of how you define IR, everything that denotes what it accesses
  should lie in IR.
 
 -Putting something in NIR therefor also answers the question if it denotes
  what it accesses with no by entailment.

I have worded this very badly. We are talking about things and names of things.
This should be:

For all URIs U: denote(U) = access(U) - denote(U) a IR

It follows: For all URIs U: denote(U) not a IR - denote(U) != access(U)

 -There may or may not be IRs that do not denote what they access.

And this should be:

There is a URI U where: denote(U) a IR and denote(U) != access(U).

Now if a am allowed to mint a URI that 303's to your homepage and your
homepage is an IR, such an URI must exist:

U1 = Your URI for your homepage
U2 = My URI for your homepage
denote(U1) a IR
denote(U2) != access(U2)
denote(U1) = denote(U2) thererfor denote(U2) a IR and denote(U2) != access(U2)

I think I'll stay out of this discussion from now :-)

Regards,

Michael Brunnbauer


-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: Middle ground change proposal for httpRange-14

2012-03-28 Thread Jeni Tennison

Hi David,

On 25 Mar 2012, at 16:54, David Booth wrote:
 I have drafted what I think may represent a middle ground change
 proposal and I am wondering if something along this line would also meet
 your concerns:
 http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol
 
 Highlights of this proposal:
 - It enables a URI owner to unambiguously convey any URI definition to
 an interested client.
 - It does not constrain whether or how a client will use that or any
 other URI definition, as that is the client's business.
 - It retains the existing httpRange-14 rule.
 - It also permits the use of an HTTP 200 response with RDF content as a
 means of conveying a URI definition.
 - It provides guidelines for avoiding confusion and inconsistencies,
 while acknowledging the burden those guidelines place on URI owners.
 - It encourages URI owners to publish URI definitions even if those URI
 definitions are not perfect. 
 
 It also includes numerous other clarifications. 
 
 Would something along these lines also meet your concerns?


I don't think it does, quite, for me, for the following reasons:

1. The focus on the *definition* of a URI as opposed to a mere description is 
problematic for me. There are lots of things in the world that couldn't be 
adequately *defined* but can be described to more or less detail. I worry that 
people will get tied up in knots trying to work out what a definition looks 
like for a Person or a Book. Although I prefer most of the language in your 
draft, I prefer the looser 'description' used in Jonathan's document.

2. While the draft says that it doesn't define the term information resource 
it nevertheless uses that term in many places, as if it means something. For 
example, in 3.2.1 it says that you can tell (if a result is eg a 200 OK) that 
the target URI identifies an information resource. Given that 'information 
resource' isn't defined in the document, what does that actually mean in terms 
of what an application should do?

3. I like the section about resolving incompatibilities, but for me it isn't 
strong enough, particularly as it's non-normative. I'd like publishers to be 
able to rely on clients ignoring an implicit URI definition when there's an 
explicit URI definition, for example. Without that, I think the draft is just a 
reworded version of Jonathan's draft: publishers who 200 OK on URIs that are 
supposed to identify People are still Wrong.

So it gets a lot of the way there, just not quite all of it.

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-28 Thread Norman Gray


Michael, hello.

On 2012 Mar 28, at 22:35, Michael Brunnbauer wrote:

 For all URIs U: denote(U) = access(U) - denote(U) a IR
 
 It follows: For all URIs U: denote(U) not a IR - denote(U) != access(U)

I think it's impossible, within the terms of HR14, to say 'denote(U) not a IR' 
-- you can prove something is in IR, but you can neither prove nor even 
operationally assert that it's not.

 -There may or may not be IRs that do not denote what they access.
 
 And this should be:
 
 There is a URI U where: denote(U) a IR and denote(U) != access(U).
 
 Now if a am allowed to mint a URI that 303's to your homepage and your
 homepage is an IR, such an URI must exist:
 
 U1 = Your URI for your homepage
 U2 = My URI for your homepage

I don't think you even need the 303.

If you make a URI, declare it to be a URI identifying my home page (I think 
David Booth has written about how you'd do this formally), and then have it 
200-respond with a map of the Englisher Garten, then this places my homepage in 
IR, but does not access it.  This appears to be compatible with HR14 in an only 
slightly perverse reading.  I'm not sure what follows from that (but I suspect 
that way madness lies).

 I think I'll stay out of this discussion from now :-)

I think we should draw a veil, here...

Best wishes,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-27 Thread Michael Brunnbauer


Hello Tim,

On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote:
 12) Still people say well, to know whether I use 200 or 303 I need to know 
 if this sucker is an IR or NIR when instead they should be saying Well, am 
 I going to serve the content of this sucker or information about it?. 

I think the question should be does the response contain the content of it 
because I can serve both at once (foaf:PersonalProfileDocument rdf:about=).

Is there a difference between this question and the IR question if we take
Dans definition of IR as 'Web-serializable networked entity' ?

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-27 Thread Danny Ayers

This seems an appropriate place for me to drop in my 2 cents.

I like the 303 trick. People that care about this stuff can use it
(and appear to be doing so), but it doesn't really matter too much
that people that don't care don't use it. It seems analogous to the
question of HTML validity. Best practices suggest creating valid
markup, but if it isn't perfect, it's not a big deal, most UAs will be
able to make sense of it. There will be reduced fidelity of
communication, sure, but there will be imperfections in the system
whatever, so any trust/provenance chain will have to consider such
issues anyway.
So I don't really think Jeni's proposal is necessary, but don't feel
particularly strongly one way or the other.

Philosophically I reckon the flexibility of what a representation of a
resource can be means that the notion of an IR isn't really needed.
I've said this before in another thread somewhere, but if the network
supported the media type thing/dog then it would be possible to GET
http://example.org/Basil with full fidelity. Right now it doesn't, but
I'd argue that what you could get with media type image/png would
still be a valid, if seriously incomplete representation of my dog. In
other words, a description of a thing shares characteristics with the
thing itself, and that's near enough for HTTP representation purposes.

Cheers,
Danny.

-- 
http://dannyayers.com

http://webbeep.it  - text to tones and back again

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14


On 3/27/12 7:59 AM, Danny Ayers wrote:

This seems an appropriate place for me to drop in my 2 cents.

I like the 303 trick. People that care about this stuff can use it
(and appear to be doing so), but it doesn't really matter too much
that people that don't care don't use it. It seems analogous to the
question of HTML validity. Best practices suggest creating valid
markup, but if it isn't perfect, it's not a big deal, most UAs will be
able to make sense of it. There will be reduced fidelity of
communication, sure, but there will be imperfections in the system
whatever, so any trust/provenance chain will have to consider such
issues anyway.
So I don't really think Jeni's proposal is necessary, but don't feel
particularly strongly one way or the other.

Philosophically I reckon the flexibility of what a representation of a
resource can be means that the notion of an IR isn't really needed.
I've said this before in another thread somewhere, but if the network
supported the media type thing/dog then it would be possible to GET
http://example.org/Basil with full fidelity. Right now it doesn't, but
I'd argue that what you could get with media type image/png would
still be a valid, if seriously incomplete representation of my dog. In
other words, a description of a thing shares characteristics with the
thing itself, and that's near enough for HTTP representation purposes.

Cheers,
Danny.



Amen!!

We have resources that just 'mention' or 'refer' to *things* loosely 
i.e., you typical Web page.
RDF introduce resources that explicitly 'describe'  unambiguously named 
*things* via URIs.
RDFS  OWL introduces resources that explicitly 'define' unambiguously 
named *things* such as classes and properties via URIs.
Linked Data (or Hyperdata0 introduces resources that explicitly 
'describe' and 'define' unambiguously named *things* via de-referencable 
URIs.


When all is said an done, all of the above boils down to *representation 
fidelity* that one could order (hierarchically) as follows:


1. generic representation -- Web Pages
2. description oriented representation -- RDF which may or may not 
follow Linked Data principles
3. definition oriented representation -- RDFS, OWL, which may or may not 
follow Linked Data principles.



BTW -- I've published a work in progress post [1] that includes some 
diagrams (including the original WWW proposal depiction) re. Data, 
Documents, Content, URIs, and URLs.


Links:

1. http://goo.gl/DRvQM -- Understanding Data .

--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de wrote:

 Hello Tim,

 On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote:
 12) Still people say well, to know whether I use 200 or 303 I need to know 
 if this sucker is an IR or NIR when instead they should be saying Well, am 
 I going to serve the content of this sucker or information about it?.

 I think the question should be does the response contain the content of it
 because I can serve both at once (foaf:PersonalProfileDocument 
 rdf:about=).

Yes, this is the question - is the retrieved representation content (I
used the word instance but it's not catching on), or description. It
can be both.

 Is there a difference between this question and the IR question if we take
 Dans definition of IR as 'Web-serializable networked entity' ?

There is a difference, since what is described could be an IR that
does not have the description as content. A prime example is any DOI,
e.g.

http://dx.doi.org/10.1371/journal.pcbi.1000462

(try doing conneg for RDF). The identified resource is an IR as you
suggest, but the representation (after the 303 redirect) is not its
content.

Another example (anti-httpRange-14) is

http://www.flickr.com/photos/70365734@N00/6905069277/

The identified resource (according to the retrieved RDFa) is an IR,
but the retrieved representation is not its content.

In other words, even if the identified resource is an IR (under any
definition), the question remains of whether the retrieved
representation is content or description (except in the case where it
is both). The two dimensions are orthogonal.

Maybe I misunderstand your question.

This whole information resource thing needs to just go away. I can't
believe how many people come back to it after the mistake has been
pointed out so many times. Maybe the TAG or someone has to make a
statement admitting that the way httpRange-14(a) was phrased was a big
screwup, that the real issue is content vs. description, not a type
distinction.

I think Jeni's proposal is to say that the Flickr URI is good
practice, rather than deny it. My proposal is to say that the
description-free situation is good practice, rather than just an
undocumented common practice.

In a hybrid world where some URIs work one way (by description) and
others work the other way (by ostention), the question for anyone
encountering a hashless http: URI in RDF, is which of the two
situations (or both) obtain. (Maybe there are some URIs that work
neither way, or there is a gray area.) It would be nice if there were
definite answers at least for some URIs.

Jonathan

 Regards,

 Michael Brunnbauer

 --
 ++  Michael Brunnbauer
 ++  netEstate GmbH
 ++  Geisenhausener Straße 11a
 ++  81379 München
 ++  Tel +49 89 32 19 77 80
 ++  Fax +49 89 32 19 77 89
 ++  E-Mail bru...@netestate.de
 ++  http://www.netestate.de/
 ++
 ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
 ++  USt-IdNr. DE221033342
 ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
 ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14


On 3/27/12 9:02 AM, Jonathan A Rees wrote:

A prime example is any DOI,
e.g.

http://dx.doi.org/10.1371/journal.pcbi.1000462

(try doing conneg for RDF).
I don't always have to seek or need RDF. I just need structured data. I 
can make Linked Data from non RDF resources.


See:

1. 
http://uriburner.com/about/html/http://dx.doi.org/10.1371/journal.pcbi.1000462 
-- a basic description
2. 
http://uriburner.com/about/id/entity/http/dx.doi.org/10.1371/journal.pcbi.1000462 
-- an inferred description.


You can use the rules of HttpRange-14 combined with the rules of Linked 
Data to make a description oriented representation. Of course, I am 
doing translation and inference, but that's only possible due to the 
ground rules that are already in place etc..


Thus, we have an identifier associated with data that's ended up being 
interpreted courtesy of key ground rules from HttpRange-14 findings and 
Linked Data principles.


--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

On Tue, Mar 27, 2012 at 9:32 AM, Kingsley Idehen kide...@openlinksw.com wrote:
 On 3/27/12 9:02 AM, Jonathan A Rees wrote:

 A prime example is any DOI,
 e.g.

 http://dx.doi.org/10.1371/journal.pcbi.1000462

 (try doing conneg for RDF).

 I don't always have to seek or need RDF. I just need structured data. I can
 make Linked Data from non RDF resources.

That wasn't my point. I was just giving an example where a 303 URI
refers to an IR. This illustrates the idea that being defined by
description does not imply that you have an non-IR (which I admit was
not a point I had to make). That's all. You don't have to do the
conneg if you don't want to, you just get a non-RDF description of the
resource if you don't ask for RDF. If you don't like this example look
at the Flickr one instead.

Jonathan

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14


On 3/27/12 9:02 AM, Jonathan A Rees wrote:

Maybe the TAG or someone has to make a
statement admitting that the way httpRange-14(a) was phrased was a big
screwup, that the real issue is content vs. description, not a type
distinction.


It should!

--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-27 Thread Michael Brunnbauer


Hello Jonathan,

so let the question be did I GET what the URI denotes and let httprange14
be 200 - yes, 303 - no.

Let another question be can this URI be used with document annotation
properties (or: Is this URI an IR) ? From 200 a statuscode, I can infer that
the URI can be used with document annotation properties and use those
properties. I can also use those properties with some 303 URIs but not always.

Both these questions may not be answered from a 200 statuscode in the future.

Is all of this right ?

Regards,

Michael Brunnbauer


On Tue, Mar 27, 2012 at 09:02:04AM -0400, Jonathan A Rees wrote:
 On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de 
 wrote:
 
  Hello Tim,
 
  On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote:
  12) Still people say well, to know whether I use 200 or 303 I need to 
  know if this sucker is an IR or NIR when instead they should be saying 
  Well, am I going to serve the content of this sucker or information about 
  it?.
 
  I think the question should be does the response contain the content of it
  because I can serve both at once (foaf:PersonalProfileDocument 
  rdf:about=).
 
 Yes, this is the question - is the retrieved representation content (I
 used the word instance but it's not catching on), or description. It
 can be both.
 
  Is there a difference between this question and the IR question if we take
  Dans definition of IR as 'Web-serializable networked entity' ?
 
 There is a difference, since what is described could be an IR that
 does not have the description as content. A prime example is any DOI,
 e.g.
 
 http://dx.doi.org/10.1371/journal.pcbi.1000462
 
 (try doing conneg for RDF). The identified resource is an IR as you
 suggest, but the representation (after the 303 redirect) is not its
 content.
 
 Another example (anti-httpRange-14) is
 
 http://www.flickr.com/photos/70365734@N00/6905069277/
 
 The identified resource (according to the retrieved RDFa) is an IR,
 but the retrieved representation is not its content.
 
 In other words, even if the identified resource is an IR (under any
 definition), the question remains of whether the retrieved
 representation is content or description (except in the case where it
 is both). The two dimensions are orthogonal.
 
 Maybe I misunderstand your question.
 
 This whole information resource thing needs to just go away. I can't
 believe how many people come back to it after the mistake has been
 pointed out so many times. Maybe the TAG or someone has to make a
 statement admitting that the way httpRange-14(a) was phrased was a big
 screwup, that the real issue is content vs. description, not a type
 distinction.
 
 I think Jeni's proposal is to say that the Flickr URI is good
 practice, rather than deny it. My proposal is to say that the
 description-free situation is good practice, rather than just an
 undocumented common practice.
 
 In a hybrid world where some URIs work one way (by description) and
 others work the other way (by ostention), the question for anyone
 encountering a hashless http: URI in RDF, is which of the two
 situations (or both) obtain. (Maybe there are some URIs that work
 neither way, or there is a gray area.) It would be nice if there were
 definite answers at least for some URIs.
 
 Jonathan
 
  Regards,
 
  Michael Brunnbauer
 
  --
  ++  Michael Brunnbauer
  ++  netEstate GmbH
  ++  Geisenhausener Straße 11a
  ++  81379 München
  ++  Tel +49 89 32 19 77 80
  ++  Fax +49 89 32 19 77 89
  ++  E-Mail bru...@netestate.de
  ++  http://www.netestate.de/
  ++
  ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
  ++  USt-IdNr. DE221033342
  ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
  ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
 

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: Change Proposal for HttpRange-14

2012-03-27 Thread Michael Smethurst




On 26/03/2012 17:13, Tom Heath tom.he...@talis.com wrote:

 Hi Jeni,
 
 On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:
 Tom,
 
 On 26 Mar 2012, at 16:05, Tom Heath wrote:
 On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 I'm sure many people are just deeply bored of this discussion.
 
 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!
 
 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.

No data here I fear; merely anecdote. But anecdote is usually the best form
of data :-)
 
 
 What hard data do you think would resolve (or if not resolve, at least move
 forward) the argument? Some people  are contributing their own experience
 from building systems, but perhaps that's too anecdotal? Would a
 structured survey be helpful? Or do you think we might be able to pick up
 trends from the webdatacommons.org  (or similar) data?
 
 A few things come to mind:
 
 1) a rigorous assessment of how difficult people *really* find it to
 understand distinctions such as things vs documents about things.
 I've heard many people claim that they've failed to explain this (or
 similar) successfully to developers/adopters; my personal experience
 is that everyone gets it, it's no big deal (and IRs/NIRs would
 probably never enter into the discussion).

I think it's explainable. I don't think it's self evident

And explanation can be tricky because:

a) once you get past the obvious cases (a person and their homepage) there
are further levels of abstraction that make things complicated. A journalist
submits a report to a news agency, a sub-editor tweaks it and puts it on the
wires, a news publisher picks up the report, a journalist shapes an article
around it, another sub-editor tweaks that, the article gets published, the
article gets syndicated. Which document is the rdf making claims (created
by, created at) about? And is that the important / interesting thing? You
quickly head down a frbr shaped rabbit hole

b) The way people make and use websites (outside the whole linked data
thing) has moved on. Many people don't just publish pages; they publish
pages that have a one-to-one correspondence with real world things. A page
per photo or programme or species or recipe or person. They're already in
the realm of thinking about things before pages and to them the page and
it's url is a good enough approximation for description

c) people using the web are already thinking about things not pages. If you
search google for Obama your mental model is of the person, not any
resulting pages

d) we already have the resource / representation split which is quite enough
abstraction for some people

e) the list of things you might want to say about a document is finite; the
list of things you might want to say about the world isn't
 
 2) hard data about the 303 redirect penalty, from a consumer and
 publisher side. Lots of claims get made about this but I've never seen
 hard evidence of the cost of this; it may be trivial, we don't know in
 any reliable way. I've been considering writing a paper on this for
 the ISWC2012 Experiments and Evaluation track, but am short on spare
 time. If anyone wants to join me please shout.

I know publishers whose platform is so constrained they can't even edit the
head section of their html documents. They certainly don't have access at
the server level

Even where 303s are technically possible they might not be politically
possible. Technically we could have easily created bbc.co.uk/things/:blah
and made it 303 but that would have involved setting up /things and that's a
*very* difficult conversation with management and ops

And if it's technically and politically possible it really depends on how
the 303 is set up. Lots of linked data people seem to conflate the 303 and
content negotiation. So I ask for something that can't be sent, they do the
accept header stuff and 303 me to the *representation* url. Rather than: I
ask for something that can't be sent, they 303 to a generic information
resource which content negotiates to the appropriate representation.

If you do this in two steps (303 then conneg) you can point any html links
at the generic document resource url so you don't pick up a 303 penalty for
every request

No sane publisher trying to handle a decent amount of traffic is gonna
follow the dbpedia pattern of doing it in one step (conneg to 303) and
picking up 2 server hits per request. I've said here before that the dbpedia
publishing pattern is an anti-pattern and shouldn't be encouraged

Whichever way you do it, it doesn't take away Dave Reynold's point that:

 I have been in discussions with clients

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

On Tue, Mar 27, 2012 at 10:37 AM, Michael Brunnbauer
bru...@netestate.de wrote:

 Hello Jonathan,

 so let the question be did I GET what the URI denotes and let httprange14
 be 200 - yes, 303 - no.

Basically yes, although you have to be careful preserve the
generic/specific (or resource/representation) distinction somehow, or
else people will say that you don't know what you're talking about.

If you always get the same representation from a URI, the distinction
goes away, but in practice you have content negotiation, change over
time, banner ads, login specific customizations, etc. that make life
more difficult. The only way I've found to make sense of this
complexity is what I wrote up in my Generic resources and web
metadata note, which claims that what people unconsciously intend is
usually universal quantification. If you need to be really precise
about what you say about documents (transclusion, scripts, etc.) then
using 200 URIs in RDF without further explanation is probably not a
great idea; you'd want some kind of vocabulary that allowed you to say
precisely what you mean.

 Let another question be can this URI be used with document annotation
 properties (or: Is this URI an IR) ? From 200 a statuscode, I can infer that
 the URI can be used with document annotation properties and use those
 properties. I can also use those properties with some 303 URIs but not always.

That's another question, but it is rarely asked without also wondering
just what the content is, since the content is going to determine
whether the annotations are true or not. So I would focus on the
content, and then annotatability will sort itself out.

 Both these questions may not be answered from a 200 statuscode in the future.

If consensus is built around non-HR14a uses of 200, yes. You'd have to
look elsewhere for additional clues, e.g. the headers or content.

 Is all of this right ?

Close enough.
Jonathan

Re: Change Proposal for HttpRange-14


On 3/27/12 11:17 AM, Michael Smethurst wrote:

No sane publisher trying to handle a decent amount of traffic is gonna
follow the dbpedia pattern of doing it in one step (conneg to 303) and
picking up 2 server hits per request. I've said here before that the dbpedia
publishing pattern is an anti-pattern and shouldn't be encouraged
Circa. 2006-2007, with Linked Data bootstrap via the LOD project as top 
priority, the goal was simple: unleash Linked Data in a manner that just 
worked. That meant catering for:


1. frameworks and libraries that send hash URIs over the wire
2. work with all browsers, no excuses.

Linked Data is now alive and in broad use (contrary to many 
misconceptions to the contrary), there is still a need for slash URIs. 
This isn't a matter of encouragement or discouragement, its a case of 
what works for the project goals at hand. If slash URIs don't work then 
use hash URIs or vice versa. Platforms that conform to Linked Data meme 
principles should be able to handle these scenarios.


BTW - Imagine a scenario where Linked Data only worked with one style of 
URI, where would we be today or tomorrow, re. Linked Data? Being 
dexterous and unobtrusive has to be a celebrated feature rather than a 
point of perpetual distraction.


As is always the case, a good system must pass the horses for courses 
test. Linked Data -- courtesy of the underlying architecture of the 
World Wide Web -- does that with aplomb modulo the distraction star 
wanderings of planet HttpRange-14 into its solar system every so many 
months :-)


--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: Change Proposal for HttpRange-14

2012-03-27 Thread Michael Smethurst




On 27/03/2012 16:53, Kingsley Idehen kide...@openlinksw.com wrote:

 On 3/27/12 11:17 AM, Michael Smethurst wrote:
 No sane publisher trying to handle a decent amount of traffic is gonna
 follow the dbpedia pattern of doing it in one step (conneg to 303) and
 picking up 2 server hits per request. I've said here before that the dbpedia
 publishing pattern is an anti-pattern and shouldn't be encouraged
 Circa. 2006-2007, with Linked Data bootstrap via the LOD project as top
 priority, the goal was simple: unleash Linked Data in a manner that just
 worked. That meant catering for:
 
 1. frameworks and libraries that send hash URIs over the wire
 2. work with all browsers, no excuses.
 
 Linked Data is now alive and in broad use (contrary to many
 misconceptions to the contrary), there is still a need for slash URIs.
 This isn't a matter of encouragement or discouragement, its a case of
 what works for the project goals at hand. If slash URIs don't work then
 use hash URIs or vice versa. Platforms that conform to Linked Data meme
 principles should be able to handle these scenarios.
 
 BTW - Imagine a scenario where Linked Data only worked with one style of
 URI, where would we be today or tomorrow, re. Linked Data? Being
 dexterous and unobtrusive has to be a celebrated feature rather than a
 point of perpetual distraction.

My point wasn't about hashes or slashes or any style of uri. It was about
conflating 303s ((I can't give you that but) here's something that might be
useful) with conneg (here's the useful thing in the representation you asked
for). And about how not exposing the generic IR URI and not linking to it
imposes too high a penalty

Whether 303s are useful or not, there's a good and bad way to use them

Cheers
micheel
 
 As is always the case, a good system must pass the horses for courses
 test. Linked Data -- courtesy of the underlying architecture of the
 World Wide Web -- does that with aplomb modulo the distraction star
 wanderings of planet HttpRange-14 into its solar system every so many
 months :-)


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

Re: Change Proposal for HttpRange-14


On 3/27/12 12:35 PM, Michael Smethurst wrote:



On 27/03/2012 16:53, Kingsley Idehenkide...@openlinksw.com  wrote:


On 3/27/12 11:17 AM, Michael Smethurst wrote:

No sane publisher trying to handle a decent amount of traffic is gonna
follow the dbpedia pattern of doing it in one step (conneg to 303) and
picking up 2 server hits per request. I've said here before that the dbpedia
publishing pattern is an anti-pattern and shouldn't be encouraged

Circa. 2006-2007, with Linked Data bootstrap via the LOD project as top
priority, the goal was simple: unleash Linked Data in a manner that just
worked. That meant catering for:

1. frameworks and libraries that send hash URIs over the wire
2. work with all browsers, no excuses.

Linked Data is now alive and in broad use (contrary to many
misconceptions to the contrary), there is still a need for slash URIs.
This isn't a matter of encouragement or discouragement, its a case of
what works for the project goals at hand. If slash URIs don't work then
use hash URIs or vice versa. Platforms that conform to Linked Data meme
principles should be able to handle these scenarios.

BTW - Imagine a scenario where Linked Data only worked with one style of
URI, where would we be today or tomorrow, re. Linked Data? Being
dexterous and unobtrusive has to be a celebrated feature rather than a
point of perpetual distraction.

My point wasn't about hashes or slashes or any style of uri.

Your comment was:

 No sane publisher trying to handle a decent amount of traffic is gonna 
follow the dbpedia pattern of doing it in one step (conneg to 303) and 
picking up 2 server hits per request.


You described DBpedia method of doing things as being one to be 
discouraged. DBpedia deploys Linked Data via slash URIs, hence my 
response. Also, in the context of Linked Data slash URIs ultimately lead 
to the contentious 303 entity name / web resource address disambiguation 
heuristic.

  It was about
conflating 303s ((I can't give you that but) here's something that might be
useful) with conneg (here's the useful thing in the representation you asked
for).


303 isn't a conflation of anything. It's a redirection mechanism that 
can be used in different ways. Sometimes it facilitates access to 
alternative representations and sometimes it can just be used to 
facilitate indirection re. data access by name reference as per Linked 
Data principles.


In the Linked Data system, you are seeking the description of an Entity 
that's been identified using a URI. If it so happens that the URI is 
hashless (or slash based) the system doesn't reply with an actual entity 
descriptor resource address, it redirects you. The very same thing 
happens with a hash URI but it has the benefit of delivering said 
indirection and disambiguation implicitly.



There is always indirection in play. 303 isn't conflation, its simply 
redirection that is exploitable in a variety of ways.

And about how not exposing the generic IR URI and not linking to it
imposes too high a penalty


Here are the potential penalties, both ultimately about entity name / 
entity descriptor (description) resource address disambiguation:


1. 303 round trip costs
2. agreement about which relations and constituent predicates provide 
agreed up semantics that address actual entity name / entity descriptor 
resource address ambiguity .


Here are some of the constituencies to which these potential costs apply:

1. Web Page Publishers -- content publishers
2. Linked Data publishers -- structured data publishers
3. Web Page Consumers -- content consumers
4. Linked Data Consumers -- structured data consumers.

Expand the items above and you get an interesting cost vs benefits matrix.
To cut a longish story short, if HTTP had a DESCRIBE method all of this 
confusion would vanish, pronto. Then you would have HTTP requests of the 
form:


DESCRIBE http://dbpedia.org/resource/Linked_Data
and
DESCRIBE http://dbpedia.org/page/Linked_Data

Net effect: an HTTP request could specifically return the relevant 
chunks of the description data that you seek. Today, the SPARQL protocol 
provides the next best thing.

Whether 303s are useful or not, there's a good and bad way to use them


As is the case with everything :-)


Kingsley



Cheers
micheel

As is always the case, a good system must pass the horses for courses
test. Linked Data -- courtesy of the underlying architecture of the
World Wide Web -- does that with aplomb modulo the distraction star
wanderings of planet HttpRange-14 into its solar system every so many
months :-)


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further

Re: Change Proposal for HttpRange-14

2012-03-27 Thread Jeni Tennison

Hi Tom,

On 26 Mar 2012, at 17:13, Tom Heath wrote:
 On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:
 Tom,
 
 On 26 Mar 2012, at 16:05, Tom Heath wrote:
 On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 I'm sure many people are just deeply bored of this discussion.
 
 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!
 
 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.
 
 
 What hard data do you think would resolve (or if not resolve, at least move 
 forward) the argument? Some people  are contributing their own experience 
 from building systems, but perhaps that's too anecdotal? Would a
 structured survey be helpful? Or do you think we might be able to pick up 
 trends from the webdatacommons.org  (or similar) data?
 
 A few things come to mind:
 
 1) a rigorous assessment of how difficult people *really* find it to
 understand distinctions such as things vs documents about things.
 I've heard many people claim that they've failed to explain this (or
 similar) successfully to developers/adopters; my personal experience
 is that everyone gets it, it's no big deal (and IRs/NIRs would
 probably never enter into the discussion).

How would we assess that though? My experience is in some way similar -- it's 
easy enough to explain that you can't get a Road or a Person when you ask for 
them on the web -- but when you move on to then explaining how that means you 
need two URIs for most of the things that you really want to talk about, and 
exactly how you have to support those URIs, it starts getting much harder.

The biggest indication to me that explaining the distinction is a problem is 
that neither OGP nor schema.org even attempts to go near it when explaining to 
people how to add to semantic information into their web pages. The URIs that 
you use in the 'url' properties of those vocabularies are explained in terms of 
'canonical URLs' for the thing that is being talked about. These are the kinds 
of graphs that millions of developers are building on, and those developers do 
not consider themselves linked data adopters and will not be going to linked 
data experts for training.

 2) hard data about the 303 redirect penalty, from a consumer and
 publisher side. Lots of claims get made about this but I've never seen
 hard evidence of the cost of this; it may be trivial, we don't know in
 any reliable way. I've been considering writing a paper on this for
 the ISWC2012 Experiments and Evaluation track, but am short on spare
 time. If anyone wants to join me please shout.

I could offer you a data point from legislation.gov.uk if you like. When 
someone requests the ToC for an item of legislation, they will usually hit our 
CDN and the result will come back extremely quickly. I just tried:

curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents

and it showed the result coming back in 59ms.

When someone uses the identifier URI for the abstract concept of an item of 
legislation, there's no caching so the request goes right back to the server. I 
just tried:

curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67

and it showed the result coming back in 838ms, of course the redirection goes 
to the ToC above, so in total it takes around 900ms to get back the data.

So every time that we refer to an item of legislation through its generic 
identifier rather than a direct link to its ToC we are making the site seem 
about 15 times slower. What's more, it puts load on our servers which doesn't 
happen when the data is cached; the more load, the slower the responses to 
other important things that are hard to cache, such as free-text searching.

The consequence of course is that for practical reasons we design the site not 
to use generic identifiers for items of legislation unless we really can't 
avoid it and add redirections where we should technically be using 404s. The 
impracticality of 303s has meant that we've had to compromise in other areas of 
the structure of the site.

This is just one data point of course, and it's possible that if we'd fudged 
the handling of the generic identifiers (eg by not worrying about when they 
should return 404s or 300s and just always doing a regex mapping to a guess of 
an equivalent document URI) we would have better performance from them, but 
that would also have been a design compromise forced on us because of the 
impracticality of 303s. (In fact we made this precise design compromise for the 
data.gov.uk linked data.)

 3) hard data about occurrences of different patterns/anti-patterns; we
 need something more concrete/comprehensive than the

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-27 Thread Leigh Dodds

Hi,

On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Rees r...@mumble.net wrote:
 ...
 There is a difference, since what is described could be an IR that
 does not have the description as content. A prime example is any DOI,
 e.g.

 http://dx.doi.org/10.1371/journal.pcbi.1000462

 (try doing conneg for RDF). The identified resource is an IR as you
 suggest, but the representation (after the 303 redirect) is not its
 content.

A couple of comments here:

1. Its not any DOI. I believe CrossRef are still the only registrar
that support this, but I might have missed an announcement. That's
still 50m DOIs though

2. Are you sure its an Information Resource? The DOI handbook [1]
notes that while typically used to identify intellectual property a
DOI can be used to identify anything. The CrossRef guidelines [2]
explain that [a]s a matter of current policy, the CrossRef DOI
identifies the work, not its various potential manifestations

Is a FRBR work an Information Resource? Personally I'd say not, but
others may disagree. But as Dan Brickley has noted elsewhere in the
discussion, there's other nuances to take into account.

[1]. http://www.doi.org/handbook_2000/intro.html#1.6
[2]. http://crossref.org/02publishers/15doi_guidelines.html

Cheers,

L.

Re: Change Proposal for HttpRange-14

2012-03-27 Thread Melvin Carvalho

On 27 March 2012 19:54, Jeni Tennison j...@jenitennison.com wrote:

 Hi Tom,

 On 26 Mar 2012, at 17:13, Tom Heath wrote:
  On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:
  Tom,
 
  On 26 Mar 2012, at 16:05, Tom Heath wrote:
  On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
  I'm sure many people are just deeply bored of this discussion.
 
  No offense intended to Jeni and others who are working hard on this,
  but *amen*, with bells on!
 
  One of the things that bothers me most about the many years worth of
  httpRange-14 discussions (and the implications that HR14 is
  partly/heavily/solely to blame for slowing adoption of Linked Data) is
  the almost complete lack of hard data being used to inform the
  discussions. For a community populated heavily with scientists I find
  that pretty tragic.
 
 
  What hard data do you think would resolve (or if not resolve, at least
 move forward) the argument? Some people  are contributing their own
 experience from building systems, but perhaps that's too anecdotal? Would a
  structured survey be helpful? Or do you think we might be able to pick
 up trends from the webdatacommons.org  (or similar) data?
 
  A few things come to mind:
 
  1) a rigorous assessment of how difficult people *really* find it to
  understand distinctions such as things vs documents about things.
  I've heard many people claim that they've failed to explain this (or
  similar) successfully to developers/adopters; my personal experience
  is that everyone gets it, it's no big deal (and IRs/NIRs would
  probably never enter into the discussion).

 How would we assess that though? My experience is in some way similar --
 it's easy enough to explain that you can't get a Road or a Person when you
 ask for them on the web -- but when you move on to then explaining how that
 means you need two URIs for most of the things that you really want to talk
 about, and exactly how you have to support those URIs, it starts getting
 much harder.


I'm curious as to why this is difficult to explain.  Especially since I
also have difficulties explaining the benefits of linked data.  However,
normally the road block I hit is explaining why URIs are important.

Are there perhaps similar paradigms that the majority of developers are
already already familiar with?


One that springs to mind is in java

You have a file Hello.java

But the file contains the actual class, Hello, which has keys and
values.


Or perhaps most people these datys know JSON, where you have file like
hello.json

The file itself is not that important, but it can contain 0 or more
objects, such as
{
  key1 : value1,
  key2 : value2,
  key3 : value3
}

Would this be a valid analogy?



 The biggest indication to me that explaining the distinction is a problem
 is that neither OGP nor schema.org even attempts to go near it when
 explaining to people how to add to semantic information into their web
 pages. The URIs that you use in the 'url' properties of those vocabularies
 are explained in terms of 'canonical URLs' for the thing that is being
 talked about. These are the kinds of graphs that millions of developers are
 building on, and those developers do not consider themselves linked data
 adopters and will not be going to linked data experts for training.

  2) hard data about the 303 redirect penalty, from a consumer and
  publisher side. Lots of claims get made about this but I've never seen
  hard evidence of the cost of this; it may be trivial, we don't know in
  any reliable way. I've been considering writing a paper on this for
  the ISWC2012 Experiments and Evaluation track, but am short on spare
  time. If anyone wants to join me please shout.

 I could offer you a data point from legislation.gov.uk if you like. When
 someone requests the ToC for an item of legislation, they will usually hit
 our CDN and the result will come back extremely quickly. I just tried:

 curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents

 and it showed the result coming back in 59ms.

 When someone uses the identifier URI for the abstract concept of an item
 of legislation, there's no caching so the request goes right back to the
 server. I just tried:

 curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67

 and it showed the result coming back in 838ms, of course the redirection
 goes to the ToC above, so in total it takes around 900ms to get back the
 data.

 So every time that we refer to an item of legislation through its generic
 identifier rather than a direct link to its ToC we are making the site seem
 about 15 times slower. What's more, it puts load on our servers which
 doesn't happen when the data is cached; the more load, the slower the
 responses to other important things that are hard to cache, such as
 free-text searching.

 The consequence of course is that for practical reasons we design the site
 not to use generic identifiers for items of legislation

Re: Change Proposal for HttpRange-14

2012-03-27 Thread Giovanni Tummarello

Tom if you were to do a serious assessment then measuring milliseconds
and redirect hits means looking at a misleading 10% of the problem.

Cognitive loads,economics and perception of benefits are the over the
90% of the question here.

An assessment that could begin describing the issue

* get a normal webmaster calculate how much it takes to explain him
the thing,follow him on and
* see how quickly he forgets,
* assess how much it takes to VALIDATE the whole thing works (E.g. a
newly implemented spects)
* assess what are the tools that would check if something break
* assess the same thing for implementers e.g. of applications or
consuming APIs to get all teh above
* then once you calculate the huge cost above then compare it with the
perceived benefits.

THEN REDO ALL AT MANAGEMENT LEVEL once you're finished with technical
level because for sites that matters ITS MANAGERS THAT DECIDE geek run
websites dont count, sorry.

Same thing when looking at 'real world applications' by counting just
geeky hacked together demostrators or semweb aficionados libs has the
same skew.. these people and apps were paid by EU money or research
money or  so they should'n  count toward real world economics driven
apps, so if one was thinking of counting   50 apps that would break
that'd be just as partial and misleading.

.. and we could go on. Now do you really need to do the above? (let
alone how difficult it is to do inproper terms) me and a whole crowd
know already the  results for the same exercise have been done over
and over and we've been witnessing it.
 i sincerely hope this is the time we get this fixed so we can indeed
go back and talk about the new linked data (linked data 2.0) to actual
web developers, it managers etc.

removing the 303 thing doesnt solve the whole problem, it is just the
beginning. Looking forward to discuss next steps

Gio




On Mon, Mar 26, 2012 at 6:13 PM, Tom Heath tom.he...@talis.com wrote:
 Hi Jeni,

 On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:
 Tom,

 On 26 Mar 2012, at 16:05, Tom Heath wrote:
 On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 I'm sure many people are just deeply bored of this discussion.

 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!

 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.


 What hard data do you think would resolve (or if not resolve, at least move 
 forward) the argument? Some people  are contributing their own experience 
 from building systems, but perhaps that's too anecdotal? Would a
 structured survey be helpful? Or do you think we might be able to pick up 
 trends from the webdatacommons.org  (or similar) data?

 A few things come to mind:

 1) a rigorous assessment of how difficult people *really* find it to
 understand distinctions such as things vs documents about things.
 I've heard many people claim that they've failed to explain this (or
 similar) successfully to developers/adopters; my personal experience
 is that everyone gets it, it's no big deal (and IRs/NIRs would
 probably never enter into the discussion).

 2) hard data about the 303 redirect penalty, from a consumer and
 publisher side. Lots of claims get made about this but I've never seen
 hard evidence of the cost of this; it may be trivial, we don't know in
 any reliable way. I've been considering writing a paper on this for
 the ISWC2012 Experiments and Evaluation track, but am short on spare
 time. If anyone wants to join me please shout.

 3) hard data about occurrences of different patterns/anti-patterns; we
 need something more concrete/comprehensive than the list in the change
 proposal document.

 4) examples of cases where the use of anti-patterns has actually
 caused real problems for people, and I don't mean problems in
 principle; have planes fallen out of the sky, has anyone died? Does it
 really matter from a consumption perspective? The answer to this is
 probably not, which may indicate a larger problem of non-adoption.

 The larger question is how do we get to a state where we *don't* have this 
 permathread running, year in year
 out. Jonathan and the TAG's aim with the call for change proposals is to get 
 us to that state. The idea is that by
 getting people who think that the specs should say something different to 
 put their money where their mouth is  and express what that should be, we 
 have something more solid to work from than reams and reams of
 opinionated emails.

 This is a really worthy goal, and thank you to you, Jonathan and the
 TAG for taking it on. I long for the situation you describe where the
 permathread is 'permadead' :)

 But we do all need

Re: Change Proposal for HttpRange-14


On 3/27/12 3:23 PM, Melvin Carvalho wrote:

curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents

and it showed the result coming back in 59ms.

When someone uses the identifier URI for the abstract concept of an 
item of legislation, there's no caching so the request goes right back 
to the server. I just tried:


curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67

What do you get for timing results when you compare:

curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67

and

curl --trace-time -v 
http://www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf ?


I would expect the delta to be the overhead contributed by indirection 
delivered via the 303 redirection heuristic.


From my U.S. location I get the following results:

for: time curl -v 
http://www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf


real0m1.117s
user0m0.002s
sys0m0.003s


for: time curl -v http://www.legislation.gov.uk/id/ukpga/1985/67

real0m1.521s
user0m0.002s
sys0m0.003s


Also note, if you add wdrs:describedby relations to your RDF documents 
the descripton subject URI or its descriptor document URL will work fine 
i.e., existing Linked Data clients will ultimately end up in a 
follow-your-nose friendly Linked Data graph. The relation in question is 
a triple of the form:


http://www.legislation.gov.uk/id/ukpga/1985/67 wdrs:describedby 
http://www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf


Here is placeholder URI for my suggestion: 
http://linkeddata.uriburner.com/about/id/entity/http/www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf 
. If you make the change, reload using URL pattern: 
http://linkeddata.uriburner.com/about/html/http/www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf?sponger:get=addrefresh=0


As for: http://www.legislation.gov.uk/ukpga/1985/67/contents, what about 
a link/ relation in its head/ section that establishes 
http://www.legislation.gov.uk/ukpga/1985/67/2009-09-01/data.rdf as an 
alternative representation? You already have this sort of relation in 
place as per the following entry:
link rel=alternate type=application/xml 
href=http://legislation.data.gov.uk/ukpga/1985/67/contents/data.xml; /


--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

On Tue, Mar 27, 2012 at 2:14 PM, Leigh Dodds le...@ldodds.com wrote:
 Hi,

 On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Rees r...@mumble.net wrote:
 ...
 There is a difference, since what is described could be an IR that
 does not have the description as content. A prime example is any DOI,
 e.g.

 http://dx.doi.org/10.1371/journal.pcbi.1000462

 (try doing conneg for RDF). The identified resource is an IR as you
 suggest, but the representation (after the 303 redirect) is not its
 content.

 A couple of comments here:

 1. Its not any DOI. I believe CrossRef are still the only registrar
 that support this, but I might have missed an announcement. That's
 still 50m DOIs though

You are right, it's not all registrars. I meant Crossref DOIs.
I think Datacite DOIs do this too, but I'm not sure.

 2. Are you sure its an Information Resource?

Nobody can be sure of any such question. I would say it is (as would
be a variety of FRBR Works or Expressions or Manifestations, and many
other things besides), but there is nothing I could possibly say that
would persuade you of this.

This is why, as Tim and I keep saying, you have to forget about the
information resource nonsense and focus instead on the idea of
content or instantiation. I assume you're aware of what I've written
on this subject, so it would be pointless for me to say more here.

I hope the TAG will make a clear statement about this to help people
stop bickering about this kind of thing.

Often I think people attack information resource just because they
want to use 200s for their linked data descriptions. This is a rather
indirect tactic, and it misses the whole point of httpRange-14(a),
which admittedly was a screwup in execution, but not idiotic in
motivation.

Jonathan

 The DOI handbook [1]
 notes that while typically used to identify intellectual property a
 DOI can be used to identify anything. The CrossRef guidelines [2]
 explain that [a]s a matter of current policy, the CrossRef DOI
 identifies the work, not its various potential manifestations

 Is a FRBR work an Information Resource? Personally I'd say not, but
 others may disagree. But as Dan Brickley has noted elsewhere in the
 discussion, there's other nuances to take into account.

 [1]. http://www.doi.org/handbook_2000/intro.html#1.6
 [2]. http://crossref.org/02publishers/15doi_guidelines.html

 Cheers,

 L.

Re: Change Proposal for HttpRange-14


On 3/27/12 4:01 PM, Giovanni Tummarello wrote:

Tom if you were to do a serious assessment then measuring milliseconds
and redirect hits means looking at a misleading 10% of the problem.

Cognitive loads,economics and perception of benefits are the over the
90% of the question here.

An assessment that could begin describing the issue

* get a normal webmaster calculate how much it takes to explain him
the thing,follow him on and
* see how quickly he forgets,
* assess how much it takes to VALIDATE the whole thing works (E.g. a
newly implemented spects)
* assess what are the tools that would check if something break
* assess the same thing for implementers e.g. of applications or
consuming APIs to get all teh above
* then once you calculate the huge cost above then compare it with the
perceived benefits.

THEN REDO ALL AT MANAGEMENT LEVEL once you're finished with technical
level because for sites that matters ITS MANAGERS THAT DECIDE geek run
websites dont count, sorry.


That's a really skewed and somewhat biased sequence. How about this one:

1. Demonstrate the virtues of Linked Data modulo a single line of code
2. Determine if the customer can work with the Linked Data tool as is
3. Quote on professional services if they opt to engage you to get it 
going rather that doing it themselves.


Look, your example is akin to prescribing the following to an ODBC 
driver customer:


1. Explain what an ODBC Data Source Name is
2. Explain the constituency of a connect string
3. Explain who to use the ODBC API in C/C++ or VB where Environment 
Handles and Connection Handles management creep in

4. Compare to the perceived benefits.

Q: What are the perceived, anticipated, or actual benefits of Linked Data?
A: Enterprise and/or Individual agility improvements via increased 
access to data across disparate data sources.


Q: What are the perceived, anticipated, or actual benefits of the World 
Wide Web?
A: Enterprise and/or individual agility improvements via increased 
access to data across disparate data sources.


If you bring minutia into the conversation you invite the skewed 
sequence you outlined in your sequence.


Here's what we are all ultimately seeking to enable. The sequence goes 
something like this:


1. Something piques your interest;
2. You make a statement about it in a document;
3. You publish the document to the Web (or private network);
4. Done!

This pattern works absolutely fine using hash URIs, you can even go 
kinda primitive re. your narrative. Say something like this:


1. Create a file;
2. Describe the item of interest via structured content in 3-tuple 
(triple) form using an Identifier of the form: file-name#this ;

3. Save the file ;
4. Publish the file to the Web;
5. Done!

This whole thing is like a global jigsaw puzzle, instead of trying to 
put all the pieces together in one go, simply contribute or connect the 
pieces that are of interest to you. The Web (or your private HTTP based 
network) will do the REST, no joke :-)




Same thing when looking at 'real world applications' by counting just
geeky hacked together demostrators or semweb aficionados libs has the
same skew.. these people and apps were paid by EU money or research
money or  so they should'n  count toward real world economics driven
apps, so if one was thinking of counting   50 apps that would break
that'd be just as partial and misleading.


You are simply confirming the issue re. obvious dearth of productivity 
oriented tools in the Linked Data realm.




.. and we could go on. Now do you really need to do the above? (let
alone how difficult it is to do inproper terms) me and a whole crowd
know already the  results for the same exercise have been done over
and over and we've been witnessing it.
  i sincerely hope this is the time we get this fixed so we can indeed
go back and talk about the new linked data (linked data 2.0) to actual
web developers, it managers etc.


Managers will always fund projects that are beneficial. Thus, time to 
manifest value proposition is crucial. If the journey requires scripting 
or heavy duty coding as a basic prerequisite, it's deservedly dead on 
arrival.




removing the 303 thing doesnt solve the whole problem, it is just the
beginning. Looking forward to discuss next steps


It has nothing to do with 303. You keep on pulling 303 into to the 
conversation then end up complaining about the mess it potentially creates.


Show the value first, not the mechanics of the value engine.

As per usual, I encourage you and others to study the 20+ year old ODBC 
ecosystem which is comprised of:


1. ODBC compliant productivity tools
2. ODBC drivers
3. Relational Databases.

The only difference between ODBC Data Source Names and Linked Data is 
the use of X.500 style naming re. ODBC connection strings and the fact 
that the graphs a confined to the realm of 'C' data structures. If you 
study the API you would be quite amazed as to how much it actually covers.


Linked Data is more

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-27 Thread Mike Bergman


Hi Jonathan,

On 3/27/2012 3:27 PM, Jonathan A Rees wrote:

On Tue, Mar 27, 2012 at 2:14 PM, Leigh Doddsle...@ldodds.com  wrote:

Hi,

On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Reesr...@mumble.net  wrote:

...
There is a difference, since what is described could be an IR that
does not have the description as content. A prime example is any DOI,
e.g.

http://dx.doi.org/10.1371/journal.pcbi.1000462

(try doing conneg for RDF). The identified resource is an IR as you
suggest, but the representation (after the 303 redirect) is not its
content.


A couple of comments here:

1. Its not any DOI. I believe CrossRef are still the only registrar
that support this, but I might have missed an announcement. That's
still 50m DOIs though


You are right, it's not all registrars. I meant Crossref DOIs.
I think Datacite DOIs do this too, but I'm not sure.


2. Are you sure its an Information Resource?


Nobody can be sure of any such question. I would say it is (as would
be a variety of FRBR Works or Expressions or Manifestations, and many
other things besides), but there is nothing I could possibly say that
would persuade you of this.

This is why, as Tim and I keep saying, you have to forget about the
information resource nonsense and focus instead on the idea of
content or instantiation. I assume you're aware of what I've written
on this subject, so it would be pointless for me to say more here.


I find this rather remarkable when in your own call [1] you state this 
Rule for Engagement:


9. Kindly avoid arguing in the change proposals over the terminology 
that is used in the baseline document. Please use the terminology that 
it uses. If necessary discuss terminology questions on the list as 
document issues independent of the 303 question.


Either the TAG is going to address this terminology head on or it is 
not. It is one of the cruxes to the problem, and not just because people 
are using it as an excuse to justify 200s.


I will be saying more about this shortly.

Thanks, Mike

[1] http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html

Jonathan A Rees, 29 February 2012


I hope the TAG will make a clear statement about this to help people
stop bickering about this kind of thing.

Often I think people attack information resource just because they
want to use 200s for their linked data descriptions. This is a rather
indirect tactic, and it misses the whole point of httpRange-14(a),
which admittedly was a screwup in execution, but not idiotic in
motivation.

Jonathan


The DOI handbook [1]
notes that while typically used to identify intellectual property a
DOI can be used to identify anything. The CrossRef guidelines [2]
explain that [a]s a matter of current policy, the CrossRef DOI
identifies the work, not its various potential manifestations

Is a FRBR work an Information Resource? Personally I'd say not, but
others may disagree. But as Dan Brickley has noted elsewhere in the
discussion, there's other nuances to take into account.

[1]. http://www.doi.org/handbook_2000/intro.html#1.6
[2]. http://crossref.org/02publishers/15doi_guidelines.html

Cheers,

L.

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

On Tue, Mar 27, 2012 at 4:58 PM, Mike Bergman m...@mkbergman.com wrote:
 Hi Jonathan,


 On 3/27/2012 3:27 PM, Jonathan A Rees wrote:

 On Tue, Mar 27, 2012 at 2:14 PM, Leigh Doddsle...@ldodds.com  wrote:

 Hi,

 On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Reesr...@mumble.net  wrote:

 ...
 There is a difference, since what is described could be an IR that
 does not have the description as content. A prime example is any DOI,
 e.g.

 http://dx.doi.org/10.1371/journal.pcbi.1000462

 (try doing conneg for RDF). The identified resource is an IR as you
 suggest, but the representation (after the 303 redirect) is not its
 content.


 A couple of comments here:

 1. Its not any DOI. I believe CrossRef are still the only registrar
 that support this, but I might have missed an announcement. That's
 still 50m DOIs though


 You are right, it's not all registrars. I meant Crossref DOIs.
 I think Datacite DOIs do this too, but I'm not sure.

 2. Are you sure its an Information Resource?


 Nobody can be sure of any such question. I would say it is (as would
 be a variety of FRBR Works or Expressions or Manifestations, and many
 other things besides), but there is nothing I could possibly say that
 would persuade you of this.

 This is why, as Tim and I keep saying, you have to forget about the
 information resource nonsense and focus instead on the idea of
 content or instantiation. I assume you're aware of what I've written
 on this subject, so it would be pointless for me to say more here.


 I find this rather remarkable when in your own call [1] you state this Rule
 for Engagement:

 9. Kindly avoid arguing in the change proposals over the terminology that
 is used in the baseline document. Please use the terminology that it uses.
 If necessary discuss terminology questions on the list as document issues
 independent of the 303 question.

 Either the TAG is going to address this terminology head on or it is not. It
 is one of the cruxes to the problem, and not just because people are using
 it as an excuse to justify 200s.

I agree that it is cruxical, and I will do what I can to get the TAG
to fix the problem. I thought that's what I said. I've written about
this many times on the www-tag list, and even put it as a goal for the
session at the F2F. I don't speak for the TAG, though, I'm just a
member, so I can't promise anything.

If it were up to me I'd purge information resource from the
document, since I don't want to argue about what it means, and
strengthen the (a) clause to be about content or instantiation or
something. But the document had to reflect the status quo, not things
as I would have liked them to be.

I have not submitted this as a change proposal because it doesn't
address ISSUE-57, but it is impossible to address ISSUE-57 with a
200-related change unless this issue is addressed, as you say, head
on. This is what I've written in my TAG F2F preparation materials.

 I will be saying more about this shortly.

I thought enough had been said already, but will read with interest.

Best
Jonathan

 Thanks, Mike

 [1] http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html

 Jonathan A Rees, 29 February 2012


 I hope the TAG will make a clear statement about this to help people
 stop bickering about this kind of thing.

 Often I think people attack information resource just because they
 want to use 200s for their linked data descriptions. This is a rather
 indirect tactic, and it misses the whole point of httpRange-14(a),
 which admittedly was a screwup in execution, but not idiotic in
 motivation.

 Jonathan

 The DOI handbook [1]
 notes that while typically used to identify intellectual property a
 DOI can be used to identify anything. The CrossRef guidelines [2]
 explain that [a]s a matter of current policy, the CrossRef DOI
 identifies the work, not its various potential manifestations

 Is a FRBR work an Information Resource? Personally I'd say not, but
 others may disagree. But as Dan Brickley has noted elsewhere in the
 discussion, there's other nuances to take into account.

 [1]. http://www.doi.org/handbook_2000/intro.html#1.6
 [2]. http://crossref.org/02publishers/15doi_guidelines.html

 Cheers,

 L.

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-27 Thread Jeni Tennison

Jonathan,

On 27 Mar 2012, at 14:02, Jonathan A Rees wrote:
 On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de 
 wrote:
 This whole information resource thing needs to just go away. I can't
 believe how many people come back to it after the mistake has been
 pointed out so many times. Maybe the TAG or someone has to make a
 statement admitting that the way httpRange-14(a) was phrased was a big
 screwup, that the real issue is content vs. description, not a type
 distinction.

Yes, that may help. But then we would also have to define what 'content' and 
'description' meant. I have a feeling that might prove just as slippery and 
ultimately unhelpful as 'information resource'.

 I think Jeni's proposal is to say that the Flickr URI is good
 practice, rather than deny it. My proposal is to say that the
 description-free situation is good practice, rather than just an
 undocumented common practice.

Let's call it 'The Explicit Description Link Change Proposal'; it isn't mine 
except in so far as I coordinated its drafting and submitted it.

Anyway, it doesn't say that the Flickr URI is good practice, it just says that 
clients can't make any assumptions one way or the other about whether the 
retrieved representation is content or description unless it contains explicit 
statements or the description is reached through a description link (303 
redirect; 'describedby' Link: header).

Good practice would be for Flickr to use separate URIs for 'the photograph' and 
'the description of the photograph', to ensure that 'the description of the 
photograph' was reachable from 'the photograph' and to ensure that any 
statements referred to the correct one. Under the proposal, they could change 
to this good practice in four ways:

1. by adding:

  link rel=describedby href=#main /

to their page (or pointing to some other URL that they choose to use for 'the 
description of the photograph')

2. by adding a Link: header with a 'describedby' relationship that points at a 
separate URI for 'the description of the photograph' (possibly a fragment as in 
1?)

3. by switching to using 
http://www.flickr.com/photos/70365734@N00/6905069277/#photo or something 
everywhere the photograph was referred to, adding:

  link about=#photo rel=describedby href= /

in their page and adding about=#photo on the body element in the HTML so that 
the RDFa statements in the page were about the photograph

4. by introducing support for a new page 
http://www.flickr.com/photos/70365734@N00/6905069277/description and adding a 
303 redirection from http://www.flickr.com/photos/70365734@N00/6905069277/ to 
that URL

The first two methods are only feasible under the proposal; the others are 
things they could do now.

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com

Re: Change Proposal for HttpRange-14

2012-03-27 Thread Tim Berners-Lee


On 2012-03 -27, at 16:17, Michael Smethurst wrote:

 No sane publisher trying to handle a decent amount of traffic is gonna
 follow the dbpedia pattern of doing it in one step (conneg to 303) and
 picking up 2 server hits per request. I've said here before that the dbpedia
 publishing pattern is an anti-pattern and shouldn't be encouraged


So see the alternative suggestion to use 200 with a header to mean
I am using the other semantics, you asked for a thing and 
here is a representation of a  document describing it - and BTW the 
document has this URI if you want to talk about it.

http://www.w3.org/wiki/HTML/ChangeProposal25

Tim

Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-27 Thread David Wood

Hi all,

On Mar 27, 2012, at 18:01, Jeni Tennison wrote:

 Jonathan,
 
 On 27 Mar 2012, at 14:02, Jonathan A Rees wrote:
 On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer bru...@netestate.de 
 wrote:
 This whole information resource thing needs to just go away. I can't
 believe how many people come back to it after the mistake has been
 pointed out so many times. Maybe the TAG or someone has to make a
 statement admitting that the way httpRange-14(a) was phrased was a big
 screwup, that the real issue is content vs. description, not a type
 distinction.
 
 Yes, that may help. But then we would also have to define what 'content' and 
 'description' meant. I have a feeling that might prove just as slippery and 
 ultimately unhelpful as 'information resource'.


I fought against jettisoning the IR/NIR distinction for years, but finally 
realized that I was wrong to do so.  The thing that convinced me was the simple 
fact that we can describe an IR (e.g. an HTML page) with another IR (an RDF 
document) without needing to say that either one was or was not an IR (other 
than optionally in the RDF).

By contrast, we do have Content-Type to talk about the content of a 
Representation and Jeni's four ways below (two ways of using a link tag with 
rel=describedby, Link: header with a 'describedby', or 303) to talk about 
descriptions.

I'd be happy to forget about IR/NIR, limit the meaning of content to the 
Content-Type and limit the scope of a description to one of those four 
approaches.

Any takers?

Regards,
Dave


 
 I think Jeni's proposal is to say that the Flickr URI is good
 practice, rather than deny it. My proposal is to say that the
 description-free situation is good practice, rather than just an
 undocumented common practice.
 
 Let's call it 'The Explicit Description Link Change Proposal'; it isn't 
 mine except in so far as I coordinated its drafting and submitted it.
 
 Anyway, it doesn't say that the Flickr URI is good practice, it just says 
 that clients can't make any assumptions one way or the other about whether 
 the retrieved representation is content or description unless it contains 
 explicit statements or the description is reached through a description link 
 (303 redirect; 'describedby' Link: header).
 
 Good practice would be for Flickr to use separate URIs for 'the photograph' 
 and 'the description of the photograph', to ensure that 'the description of 
 the photograph' was reachable from 'the photograph' and to ensure that any 
 statements referred to the correct one. Under the proposal, they could change 
 to this good practice in four ways:
 
 1. by adding:
 
  link rel=describedby href=#main /
 
 to their page (or pointing to some other URL that they choose to use for 'the 
 description of the photograph')
 
 2. by adding a Link: header with a 'describedby' relationship that points at 
 a separate URI for 'the description of the photograph' (possibly a fragment 
 as in 1?)
 
 3. by switching to using 
 http://www.flickr.com/photos/70365734@N00/6905069277/#photo or something 
 everywhere the photograph was referred to, adding:
 
  link about=#photo rel=describedby href= /
 
 in their page and adding about=#photo on the body element in the HTML so 
 that the RDFa statements in the page were about the photograph
 
 4. by introducing support for a new page 
 http://www.flickr.com/photos/70365734@N00/6905069277/description and adding a 
 303 redirection from http://www.flickr.com/photos/70365734@N00/6905069277/ to 
 that URL
 
 The first two methods are only feasible under the proposal; the others are 
 things they could do now.
 
 Cheers,
 
 Jeni
 -- 
 Jeni Tennison
 http://www.jenitennison.com

Re: Middle ground change proposal for httpRange-14 -- submission

2012-03-27 Thread Mike Bergman

As someone who has often commented publicly (and negatively) on matters 
of semWeb semantics and httpRange-14, I feel I have an obligation to 
offer comment publicly on the various change proposals being put forward 
[1].


Despite everyone's acknowledged fatigue about this issue, I think 
Jonathan Rees has done the community a real service pushing to 
re-surface and re-open this question for resolution. His efforts go back 
to 2007, showing just how slowly the wheels of standards sometimes turn 
[2].


I think everyone who has commented publicly on these issues over the 
years has an obligation to state their position: by submitting an 
alternative change proposal; by co-signing one of the existing 
proposals; or otherwise stating their views publicly. It is like 
democracy; if you don't vote, don't bitch about the outcome.


As for me, I am supporting two items:

1) the complete abandonment of the information resource terminology: 
bury it forever! and

2) David Booth's alternative change proposal [3].

I am supporting David's proposal because it:

* Sidesteps the “information resource” definition (though weaker than I 
would want)

* Addresses only the specific HTTP and HTTPS cases
* Avoids the constrained response format suggested by the TAG
* Explicitly rejects assigning innate meanings to URIs
* Poses the solution as a protocol (an understanding between publisher 
and consumer) rather than defining or establishing a meaning via naming
* Provides multiple “cow paths” by which resource definitions can be 
conveyed, which gives publishers and consumers choice and offers the 
best chance for more well-trodden paths to emerge
* Does not call for an outright repeal of the httpRange-14 rule, but 
retains it as one of multiple options for URI owners to describe resources
* Permits the use of an HTTP 200 response with RDF content as a means of 
conveying a URI definition

* Retains the use of the hash URI as an option
* Provides alternatives to those who can not easily (or at all) use the 
303 see also redirect mechanism, and

* Simplifies the language and the presentation.

BTW, I think it would be silly for the TAG not to address the entire 
range of httpRange-14 issues -- including terminology -- by hewing to an 
overly narrow interpretation of ISSUE-57.


Mike

[1] 
http://www.mkbergman.com/1002/tortured-terminology-and-problematic-prescriptions/

[2] http://www.w3.org/2001/tag/group/track/issues/57
[3] http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol

Re: Change Proposal for HttpRange-14

2012-03-27 Thread Pat Hayes


On Mar 26, 2012, at 9:15 AM, Bernard Vatant wrote:

 All
 
 Like many others it seems, I had sworn to myself : nevermore HttpRange-14, 
 but I will also bite the bullet.

Hi Bernard

 Here goes ... Sorry I've hard time to follow-up with whom said what with all 
 those entangled threads, so I answer to ideas more than to people.
 
 There is no need for anyone to even talk about information resources.
 
 YES! I've come with years to a very radical position on this, which is that 
 we have create ourselves a huge non-issue with those notions of information 
 resource and non-information resource. Please show any application making 
 use of this distinction, or which would break if we get rid of this 
 distinction.
 And in any case if there is a distinction, this distinction is about how the 
 URI behave in the http protocol (what it accesses), which should be kept 
 independent of what the URI denotes. The neverending debate will never end as 
 long as those two aspects are mixed, as they are in the current httpRange-14 
 as well as in various change proposals (hence those interminable threads).
  
 The important point about http-range-14, which unfortunately it itself does 
 not make clear, is that the 200-level code is a signal that the URI *denotes* 
 whatever it *accesses* via the HTTP internet architecture.

That has always been my understanding of the intent of the decision. I think 
the way that TimBL phrases it, as a choice betweeen the identified resource 
*being* the meaning (200-code response) or *describing* the meaning (303 
response) is basically the same  distinction with a cherry on top. 

 
 The proposal is that URI X denotes what the publisher of X says it denotes, 
 whether it returns 200 or not.

The problem here is that virtually all publihers don't do this, and there is 
absolutely no sign that anything more than a vanishingly small percentage ever 
will. Not to mention there is no accepted way to do this, or to check when it 
has been done. And, as TImBL reported in a recent email, many people (read: the 
TAG) want it to be the case that there is a 'default' in such cases, and that 
it should be that the URI denotes the Web document which it accesses, so that 
the semantic web can easily talk about the nonsemantic web.

 
 This is the only position which makes sense to me. What the URI is intended 
 to denote can be only derived from explicit descriptions, whatever the way 
 you access those descriptions.

Well, except that in fact you can't do this, as we all know (fix a referent by 
giving a description). You have to rely on actual ostention at some point, both 
an and off the Web; and on the Web, existing Web pages are the only contact 
point for using ostention (ie explicitly pointing to something and saying, in 
effect, I'm referring to *that*

 And assume that if there is no such description, the URI is intended to 
 provide access to somewhere, but not to denote *some* *thing*. It's just 
 actionable in the protocol, and clients do whatever they want with what they 
 get. It's the way the (non-semantic) Web works, and it's OK.
  
 And what if the publisher simply does not say anything about what the URi 
 denotes?
 
 Then nobody knows, and actually nobody cares

But people do care, see above. 

 what the URI denotes, or say that all users implicitly agree it is the same 
 thing, but it does not break any system to ignore what it is. Or, again, show 
 me counter-examples..

TimBL has many.

 
 After all, something like 99.999% of the URIs on the planet lack this 
 information.
 
 Which means that for the Web to work so far, knowing what a URI denotes is 
 useless. But it's useful for the Semantic Web. So let's say that a URI is 
 useful for, or is part of, the Semantic Web if some description(s) of it can 
 be found. And we're done.
  
 What, if anything, can be concluded about what they denote?
 
 Nothing, and let's face it.
  
 The http-range-14 rule provides an answer to this which seems reasonably 
 intuitive.
 
 Wonder if it can be the same Pat Hayes writing this as the one who wrote six 
 years ago In Defence of Ambiguity :) 
 http://www.ibiblio.org/hhalpin/irw2006/presentations/HayesSlides.pdf
 Quote (from the conclusion)
 WebArch http-range-14 seems to presume that if a URI accesses  something 
 directly (not via an http redirect), then the URI must refer  to what it 
 accesses.
 This decision is so bad that it is hard to list all the mistakes in it, but 
 here are a few :
 - It presumes, wrongly, that the distinction between access and  reference is 
 based 
 on the distinction between accessible and  inaccessible referents.
  ... [see above link for full list]
 
 Pat, has your position changed on this? 

Not on the ambiguity point, but yes on http-range-14. I still dislike it 
wholeheartedly and I wish there was some other way to go, but I can see that it 
is useful and relatively simple and enables people to move forward, and it 
seems to kind of work. Maybe this (or

Re: Change Proposal for HttpRange-14

2012-03-26 Thread Dave Reynolds


On 25/03/12 19:24, Kingsley Idehen wrote:


Tim,

Alternatively, why not use the existing Link: header? Then we end up
with the ability to express the same :describedby relation in three
places


Which is, of course, in the now-submitted proposal.

Dave

Re: Change Proposal for HttpRange-14

2012-03-26 Thread Jeni Tennison

Tim,

On 25 Mar 2012, at 20:26, Tim Berners-Lee wrote:
 For example, To take an arbitrary one of the trillions out there, what does 
 http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11
  identify, there being no RDF in it?
 What can I possibly do with that URI if the publisher has not explicitly 
 allowed me to use it
 to refer to the online book, under your proposal?

I don't know about anyone else, but I am getting increasingly confused by your 
use of this example.

What is it that you want to be able to do?

Is it that you want to be able to use 
http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=1 to 
refer to the book Moby Dick?

You can't do that currently. 
http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 is a 
web page, not a book. Just because

 1. The book Moby Dick is a book and therefore is an information resource.
 2. http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 
returns a 200 therefore 
http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 is 
an information resource.
 3. http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 
shows a bit of the book Moby Dick.

it does not follow that 
http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 
refers to the book Moby Dick. Do you think it does?


Of course you could, currently, in some RDF that you own assert something like:

  #me 
:like 
http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 ;
.

  http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11
a bibo:Book ;
dct:title Moby Dick ;
.

and therefore state that you mean 
http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 to 
refer to the book Moby Dick, rather than specifically page 11 of the Project 
Gutenberg version, but whether anyone else would use that same URL to refer to 
the book, or trust your assertions about that URL, is a purely social question.

Under the proposal that we've put forward, you can still make those assertions 
in your own RDF if you want, and consumers will still trust them or not as they 
wish. The only thing that changes is that consumers can't make the assumption 
that just because 
http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11 
returns a 200 it's an information resource, but you haven't required that 
assumption to make your assertions about 
http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11, so 
I really don't see how that would affect anything that you're doing.

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com

Re: Change Proposal for HttpRange-14

2012-03-26 Thread Leigh Dodds

Hi Tim,

On Sun, Mar 25, 2012 at 8:26 PM, Tim Berners-Lee ti...@w3.org wrote:
 ...
 For example, To take an arbitrary one of the trillions out there, what does
 http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11
  identify, there being no RDF in it?
 What can I possibly do with that URI if the publisher has not explicitly
 allowed me to use it
 to refer to the online book, under your proposal?

You can do anything you want with it. You could use record statements
about your HTTP interactions, e.g. retrieval status  date. Or,
because RDF lets anyone, say anything, anywhere, you could just decide
to use that as the URI for the book and annotate it accordingly. The
obvious caveat and risk is that the publisher might subsequently
disagree with you if they do decide to publish some RDF. I can re-use
your data if I decide that risk is acceptable and we can still
usefully interact.

Even if Gutenberg.org did publish some RDF at that URI, you still have
the risk that they could change their mind at a later date.
httprange-14 doesn't help at all there. Lack of precision and
inconsistency is going to be rife whatever form the URIs or response
codes used.

Encouraging people to say what their URIs refer to is the very first
piece of best practice advice.

L.

Re: Middle ground change proposal for httpRange-14

2012-03-26 Thread Leigh Dodds

Hi David,

On Sun, Mar 25, 2012 at 6:50 PM, David Wood da...@3roundstones.com wrote:
 Hi David,

 *sigh*.  I said recently that I would rather chew my arm off than re-engage 
 with http-range-14.  Apparently I have very little self control.

 On Mar 25, 2012, at 11:54, David Booth wrote:
 Jeni, Ian, Leigh, Nick, Hugh, Steve, Masahide, Gregg, Niklas, Jerry,
 Dave, Bill, Andy, John, Ben, Damian, Thomas, Ed Summers and Davy,

 I have drafted what I think may represent a middle ground change
 proposal and I am wondering if something along this line would also meet
 your concerns:
 http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol


 Highlights of this proposal:
 - It enables a URI owner to unambiguously convey any URI definition to
 an interested client.

 +1 to this.  I have long been a fan of unambiguous definition.  The summary 
 argument against is Leigh Dodd's
 show what is actually broken approach and the summary argument for is my 
 we need to invent new ways to associate RDF
 with other Web resources in a discoverable manner to allow for 
 'follow-your-nose' across islands of Linked Data.

I may be misreading you here, but I'm not against unambiguous
definition. My show what is actually broken comment (on twitter) was
essentially the same question as I've asked here before, and as Hugh
asked again recently: what applications currently rely on httprange-14
as it is written today. That useful so we can get a sense of what
would break with a change. So far there's been 2 examples I think.

That's in contrast to a lot of publisher data (but granted, not yet
quantified as to how much) that breaks the rules of httprange-14. I'd
prefer to fix that even if at the cost of breaking a few apps. But we
all know there are very, very few apps that consume Linked Data today,
so changing client expectations isn't a massive problem.

Identifying a set of publishing patterns that identify how publishers
can reduce ambiguity, and advice for clients on how to tread carefully
in the face of ambiguity and inconsistency is a better starting point
IMHO. The goal there being to encourage more unambiguous publishing of
data, by demonstrating value at every step.

Cheers,

L.

Re: Change Proposal for HttpRange-14

2012-03-26 Thread Kingsley Idehen


On 3/26/12 3:57 AM, Dave Reynolds wrote:

On 25/03/12 19:24, Kingsley Idehen wrote:


Tim,

Alternatively, why not use the existing Link: header? Then we end up
with the ability to express the same :describedby relation in three
places


Which is, of course, in the now-submitted proposal.

Dave



Yes, and I only found that out yesterday as you'll see from my thread 
with Jonathan :-)


--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: Change Proposal for HttpRange-14

2012-03-26 Thread Bernard Vatant

All

Like many others it seems, I had sworn to myself : nevermore HttpRange-14,
but I will also bite the bullet.
Here goes ... Sorry I've hard time to follow-up with whom said what with
all those entangled threads, so I answer to ideas more than to people.

There is no need for anyone to even talk about information resources.


YES! I've come with years to a very radical position on this, which is that
we have create ourselves a huge non-issue with those notions of
information resource and non-information resource. Please show any
application making use of this distinction, or which would break if we get
rid of this distinction.
And in any case if there is a distinction, this distinction is about how
the URI behave in the http protocol (what it accesses), which should be
kept independent of what the URI denotes. The neverending debate will never
end as long as those two aspects are mixed, as they are in the current
httpRange-14 as well as in various change proposals (hence those
interminable threads).


 The important point about http-range-14, which unfortunately it itself
 does not make clear, is that the 200-level code is a signal that the URI
 *denotes* whatever it *accesses* via the HTTP internet architecture.


The proposal is that URI X denotes what the publisher of X says it denotes,
 whether it returns 200 or not.


This is the only position which makes sense to me. What the URI is intended
to denote can be only derived from explicit descriptions, whatever the way
you access those descriptions. And assume that if there is no such
description, the URI is intended to provide access to somewhere, but not to
denote *some* *thing*. It's just actionable in the protocol, and clients do
whatever they want with what they get. It's the way the (non-semantic) Web
works, and it's OK.


 And what if the publisher simply does not say anything about what the URi
 denotes?


Then nobody knows, and actually nobody cares what the URI denotes, or say
that all users implicitly agree it is the same thing, but it does not break
any system to ignore what it is. Or, again, show me counter-examples..

After all, something like 99.999% of the URIs on the planet lack this
 information.


Which means that for the Web to work so far, knowing what a URI denotes is
useless. But it's useful for the Semantic Web. So let's say that a URI is
useful for, or is part of, the Semantic Web if some description(s) of it
can be found. And we're done.


 What, if anything, can be concluded about what they denote?


Nothing, and let's face it.


 The http-range-14 rule provides an answer to this which seems reasonably
 intuitive.


Wonder if it can be the same Pat Hayes writing this as the one who wrote
six years ago In Defence of Ambiguity :)
http://www.ibiblio.org/hhalpin/irw2006/presentations/HayesSlides.pdf
Quote (from the conclusion)
WebArch http-range-14 seems to presume that if a URI accesses  something
directly (not via an http redirect), then the URI must refer  to what it
accesses.
This decision is so bad that it is hard to list all the mistakes in it, but
here are a few :
- It presumes, wrongly, that the distinction between access and  reference
is based
on the distinction between accessible and  inaccessible referents.
 ... [see above link for full list]

Pat, has your position changed on this?


 What would be your answer? Or do you think there should not be any
 'default' rule in such cases?


I would say so, because such a rule is basically useless. As useless as to
wonder what a phone number denotes. A phone number allows you to access a
point in a network given the phone infrastructure and protocols, it does
not denote anything except in specific contexts where it's used explicitly
as an identifier e.g., to uniquely identify people, organizations or
services. Otherwise it works just like a phone number should do.

Best regards

Bernard

-- 
*Bernard Vatant
*
Vocabularies  Data Engineering
Tel :  + 33 (0)9 71 48 84 59
 Skype : bernard.vatant
Linked Open Vocabularies http://labs.mondeca.com/dataset/lov


*Mondeca**  **   *
3 cité Nollez 75018 Paris, France
www.mondeca.com
Follow us on Twitter : @mondecanews http://twitter.com/#%21/mondecanews

Re: Change Proposal for HttpRange-14

2012-03-26 Thread Tom Heath

On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 On 23 Mar 2012, at 14:05, Jonathan A Rees wrote:
 2012/3/23 Melvin Carvalho melvincarva...@gmail.com:
 I dont think, even the wildest optimist, could have predicted the success of
 the current architecture (both pre and post HR14).

 The votes of confidence are interesting to me, as I have not been
 hearing them previously. It does appear we have a divided community,
 with some voices feeling that 303 will be the death of linked data,
 and others saying hash and 303 are working well. Where the center of
 gravity lies, I have no way of telling (and perhaps it's not important
 as long as any disagreement, or even ignorance, remains). As Larry
 Masinter said at the last TAG telcon, things do not seem to be
 converging.

 I'm sure many people are just deeply bored of this discussion.

No offense intended to Jeni and others who are working hard on this,
but *amen*, with bells on!

One of the things that bothers me most about the many years worth of
httpRange-14 discussions (and the implications that HR14 is
partly/heavily/solely to blame for slowing adoption of Linked Data) is
the almost complete lack of hard data being used to inform the
discussions. For a community populated heavily with scientists I find
that pretty tragic.

Tom.

P.S. Apologies if this repeats comments later in the thread than
Steve's post; the novelty of agreeing with Kingsley still isn't enough
to convince me to read the rest ;) Sad but true.

-- 
Dr. Tom Heath
Senior Research Scientist
Talis Education Ltd.
W: http://www.talisaspire.com/
W: http://tomheath.com/

Re: Change Proposal for HttpRange-14

2012-03-26 Thread Jeni Tennison

Tom,

On 26 Mar 2012, at 16:05, Tom Heath wrote:
 On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 I'm sure many people are just deeply bored of this discussion.
 
 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!
 
 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.


What hard data do you think would resolve (or if not resolve, at least move 
forward) the argument? Some people are contributing their own experience from 
building systems, but perhaps that's too anecdotal? Would a structured survey 
be helpful? Or do you think we might be able to pick up trends from the 
webdatacommons.org (or similar) data?

The larger question is how do we get to a state where we *don't* have this 
permathread running, year in year out. Jonathan and the TAG's aim with the call 
for change proposals is to get us to that state. The idea is that by getting 
people who think that the specs should say something different to put their 
money where their mouth is and express what that should be, we have something 
more solid to work from than reams and reams of opinionated emails.

But we do all need to work at it if we're going to come to a consensus. I know 
everyone's tired of this discussion, but I don't think the TAG is going to do 
this exercise again, so this really is the time to contribute, and preferably 
in a constructive manner, recognising the larger aim.

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com

Re: Change Proposal for HttpRange-14

2012-03-26 Thread Tom Heath

Hi Jeni,

On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:
 Tom,

 On 26 Mar 2012, at 16:05, Tom Heath wrote:
 On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 I'm sure many people are just deeply bored of this discussion.

 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!

 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.


 What hard data do you think would resolve (or if not resolve, at least move 
 forward) the argument? Some people  are contributing their own experience 
 from building systems, but perhaps that's too anecdotal? Would a
 structured survey be helpful? Or do you think we might be able to pick up 
 trends from the webdatacommons.org  (or similar) data?

A few things come to mind:

1) a rigorous assessment of how difficult people *really* find it to
understand distinctions such as things vs documents about things.
I've heard many people claim that they've failed to explain this (or
similar) successfully to developers/adopters; my personal experience
is that everyone gets it, it's no big deal (and IRs/NIRs would
probably never enter into the discussion).

2) hard data about the 303 redirect penalty, from a consumer and
publisher side. Lots of claims get made about this but I've never seen
hard evidence of the cost of this; it may be trivial, we don't know in
any reliable way. I've been considering writing a paper on this for
the ISWC2012 Experiments and Evaluation track, but am short on spare
time. If anyone wants to join me please shout.

3) hard data about occurrences of different patterns/anti-patterns; we
need something more concrete/comprehensive than the list in the change
proposal document.

4) examples of cases where the use of anti-patterns has actually
caused real problems for people, and I don't mean problems in
principle; have planes fallen out of the sky, has anyone died? Does it
really matter from a consumption perspective? The answer to this is
probably not, which may indicate a larger problem of non-adoption.

 The larger question is how do we get to a state where we *don't* have this 
 permathread running, year in year
 out. Jonathan and the TAG's aim with the call for change proposals is to get 
 us to that state. The idea is that by
 getting people who think that the specs should say something different to 
 put their money where their mouth is  and express what that should be, we 
 have something more solid to work from than reams and reams of
 opinionated emails.

This is a really worthy goal, and thank you to you, Jonathan and the
TAG for taking it on. I long for the situation you describe where the
permathread is 'permadead' :)

 But we do all need to work at it if we're going to come to a consensus. I 
 know everyone's tired of this discussion,  but I don't think the TAG is 
 going to do this exercise again, so this really is the time to contribute, 
 and preferably
 in a constructive manner, recognising the larger aim.

I hear you. And you'll be pleased to know I commented on some aspects
of the document (constructively I hope). If my previous email was
anything but constructive, apologies - put it down to httpRange-14
fatigue :)

Cheers,

Tom.

-- 
Dr. Tom Heath
Senior Research Scientist
Talis Education Ltd.
W: http://www.talisaspire.com/
W: http://tomheath.com/

Re: Change Proposal for HttpRange-14

2012-03-26 Thread Tom Heath

Hi Dave,

On 26 March 2012 16:51, Dave Reynolds d...@epimorphics.com wrote:
 On 26/03/12 16:05, Tom Heath wrote:

 On 23 March 2012 15:35, Steve Harrissteve.har...@garlik.com  wrote:

 On 23 Mar 2012, at 14:05, Jonathan A Rees wrote:

 2012/3/23 Melvin Carvalhomelvincarva...@gmail.com:

 I dont think, even the wildest optimist, could have predicted the
 success of
 the current architecture (both pre and post HR14).


 The votes of confidence are interesting to me, as I have not been
 hearing them previously. It does appear we have a divided community,
 with some voices feeling that 303 will be the death of linked data,
 and others saying hash and 303 are working well. Where the center of
 gravity lies, I have no way of telling (and perhaps it's not important
 as long as any disagreement, or even ignorance, remains). As Larry
 Masinter said at the last TAG telcon, things do not seem to be
 converging.


 I'm sure many people are just deeply bored of this discussion.


 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!


 No argument.


 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.


 The primary reason for having put my name to the proposal was that I
 personally been adversely affected. I have been involved in client
 discussions that have been derailed by someone bringing up httprange-14. I
 have been in discussions with clients where 303s are not acceptable (thanks
 to CDN behaviour). I have both received and (sadly) sent out data that is
 broken and caused errors due to cut/paste from the browser bar thanks to
 httprange-14.

 My anecdotal evidence is that the nature of the recurrent discussion can
 create or reinforce an impression of the area being too academic, not ready
 for practical use.

My anecdotal evidence suggests the same.

 I don't claim that httprange-14 is solely or substantially to blame for
 holding back linked data.  I don't claim that my personal experience is
 necessarily widespread or representative.

As I suspected, knowing your penchant for rigour :)

 There is no science on offer here,
 move on.

 But ... if, with the current TAG process, there is a chance of a new
 resolution that reduces any of these problems then it is worth a tiny bit of
 effort. If there is a chance the new resolution will be so good as to damp
 down this permathread then it is worth more effort. If it kills the
 permathread completely then I owe someone at least a crate of beer.

I'll personally double the beer prize. I really want to see closure as
much as you do, but will admit I'm skeptical. Do we collectively have
an agreement that whatever the TAG decide we'll accept it, implement
it, shut up, and move on? Now that's a document I want to see
signatures on!

(Obviously such a document would place even greater pressure on the
TAG, who already have my gratitude for doing a tough job.)

Cheers,

Tom.

-- 
Dr. Tom Heath
Senior Research Scientist
Talis Education Ltd.
W: http://www.talisaspire.com/
W: http://tomheath.com/

NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-26 Thread Tim Berners-Lee


On 2012-03 -25, at 14:06, Norman Gray wrote:

 
 Tim, greetings.
 
 On 2012 Mar 25, at 17:35, Tim Berners-Lee wrote:
 
 (Not useful to  talk about NIRs.  The web architecture does not. Now does 
 Jonathan's baseline, not HTTP Range-14.  Never assume that what an IR is 
 about is not itself a IR.)
 
 Well, httpRange-14 sort of does talk about 'non-information resources', by 
 necessary implication.  

Of course you can define the class but I said it isn't useful to talk about it.
That was an understatement.  It has wasted person-centuries of work.
Let me give a potted history for newcomers:

pinch of salt
1) The TAG wanted to settle whether, after a 200 response, the URI always 
referred to a document which you just got a representation of.
2) They - we - foolishly cutely phrased the issue what is range of the HTTP 
dereference function. Mistake.
   (This is the function mapping URI to HTTP entity aka HTTP representation, 
aka content.  So its range would e representation -- but we meant what does the 
URI denote if you use it in say an RDF system -- more like the range of 
denotation relation for HTTP hashless URIs )
3) They figured the semantics of the HTTP deref function were the relationship 
between the name for a document and the the contents of the document.
4) So in that case the domain of the function is name (URI in fact) and the 
domain is representation, and the URI denotes a document (Information Resource 
in fact). Which is not a big deal.
5)  Nor is the exact definition of the class document a big deal.
6) People then for some reason thought, oh, if I am running a server, then I 
must test everything I am serving to make sure it is an IR before I serve it 
-- oh no -- how can I make that test?  We must have a decision algorithm! 
Mistake.
7) They should have asked For each URI, what is the content of the document it 
names?
8) Instead they argued for years about the edge cases of what exactly was as 
what wasn't a IR.
Is a book? Is a girl with a tattooed poem? A page which says it is a person? A 
fridge?
9) Instead they should have thought Am I serving the contents of this, or am I 
serving data about this?  If I am serving the contents then I will use its 
URI; otherwise I will use a different URI for the document.
9.5) People actually experimented -- served up girls with tatoos and pages 
opining they were not pages and everything. For years.
10) (Ignore DanBri who now suggests that you coud argue forever about the 
difference between the content of something and a description of it. He only 
does it to annoy because he knows it teases.)
11) After a few years enough people such as Ian D said they wanted an 
alternative architecture, where do a get on the URI of a thing, and a document 
about the thing is returned, and the URI is not the URI of the document. That 
lead to the adoption of the 303. Which is still a problem as it takes time.
12) Still people say well, to know whether I use 200 or 303 I need to know if 
this sucker is an IR or NIR when instead they should be saying Well, am I 
going to serve the content of this sucker or information about it?. 
13) In fact lots of times, people serve information *about* something, not its 
contents, even though it has contents (it is an IR).
14) That's why I said Never assume that what an IR is about is not itself a IR
15) That's why I said Not useful to  talk about NIRs. In brief.  Or you can 
scan the email archive and see the long version.
/pinch of salt


 If the set of information resources (IR) is not the same as the set of all 
 resources (R), then the set R\IR (which in any case exists) is non-null, and 
 might as well be called the set of 'non-information-resources' as anything 
 else.  But perhaps R\IR is a better notation. (I don't intend this to be 
 hair-splitting)

What exactly do you mean by hair-splitting?

 Parenthetically, what _is_ IR?

You can't define a set we are going to use mathematically exactly in terms of 
the real world.
or people will always agued edge cases.  You can define a set of the things 
mathematically
in terms of each other, or you can try to define them in words like an 
encyclopaedia, 
but if you try to both you get endless arguments, as people argue that the 
terms you 


Documents like Jonathan's carefully defined these functions in terms of each 
other,
and there have been millions of attempts to explain to different people on the 
lists
in terms they understand, but if you 

Parenthetically, what is a Resource? Actually, what is an architecture? 

  Referring to Rees's editors draft [1], [issue-14-resolved] effectively says 
 that iff a resource X is 200-retrieved, then it must _always_ be assigned to 
 the set IR (the resolution seems to effectively define 'being 
 200-retrievable' as the definition of 'information resource', and this is 
 consistent with [1] section 1.1 which says One convention[...] was for a 
 hashless URI to refer to the document-like entity (information resource) 
 served at

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Jeni Tennison

Hi Michael,

On 24 Mar 2012, at 23:16, Michael Brunnbauer wrote:
 so every publisher who wants to provide licencing information for his RDF has 
 to either 
 
 1) use 303 redirects
 2) publish no data at the NIR except the describedby triples, which seems 
 pointless to me

They can publish whatever data they like at the NIR.

 3) use the same URI for the IR and the NIR

This isn't best practice, as it isn't today. But of course people do it, just 
as they conflate different meanings for any URI.

 If he also wants to provide meta information for his HTML, he cannot
 publish the HTML at the NIR. I don't see a new and easier option offered by
 the proposal. In the end, people will do what they already do today.

Yes, some publishers who are publishing according to the current specs (with 
303 redirections being their only option) might continue to do what they do 
today. Others, as I have explained previously, may prefer to use 200 responses 
so that they can use the main (NIR) URI everywhere in their web application, 
while retaining the other responses that they already support for semantic web 
purists to access.

Publishers who are not today using 303s but are nevertheless minting URIs which 
identify NIRs (ie those outside the semantic web community) will also continue 
to do what they do today. The difference is that we (semantic web purists) will 
no longer be constantly telling them that they're Doing It Wrong, but will be 
able to build consumers that cope with this reality.

 BTW: The POWDER describedby property suggests that you will find some
 information about the subject of the describedby triple when you dereference
 the object URI but this does not seem to be intended here.

Wrong. The documents that are said to describe a URI should still describe that 
URI.

 POWDER with it's
 power to assert metadata for whole collections of IRs also probably will
 contribute to the IR/NIR conflation.
 
 I think we should leave everything as it is and just don't blame publishers
 who conflate IRs and NIRs. Sooner or later, they probably will fix it all.


I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not 
what happens at the moment. Therefore we need to change something.

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Jeni Tennison

Hi,

As you will have seen, I have now sent this Change Proposal to the TAG [1]. 
Technical discussion and comments should continue on www-...@w3.org. I note 
that there are other change proposals around for discussion as well [2][3][4].

The call is open until 29th March, so please get any other proposals in soon. 
The TAG meets 2-4th April and this is our first item of technical discussion 
[5].

Thanks everyone,

Jeni

[1] http://lists.w3.org/Archives/Public/www-tag/2012Mar/0086.html
[2] http://lists.w3.org/Archives/Public/www-tag/2012Mar/.html
[3] http://lists.w3.org/Archives/Public/www-tag/2012Mar/0006.html
[4] http://lists.w3.org/Archives/Public/www-tag/2012Mar/0085.html
[5] http://www.w3.org/2001/tag/2012/04/02-agenda

On 22 Mar 2012, at 20:21, Jeni Tennison wrote:

 Hi there,
 
 Hopefully you're all aware that there's a Call for Change Proposals [1] to 
 amend the TAG's long-standing HttpRange-14 decision [2]. Jonathan Rees has 
 put together a specification that expresses that decision in a more formal 
 way [3], against which changes need to be made.
 
 Leigh Dodds, Dave Reynolds, Ian Davis and I have put together a Change 
 Proposal [4], which I've copied below.
 
 From a publishing perspective, the basic change is that it becomes acceptable 
 for publishers to publish data about non-information resources with a 200 
 response; if a publisher want to provide licensing/provenance information 
 they can use a wdrs:describedby statement to point to a separate resource 
 about which such information could be provided.
 
 From a consumption perspective, the basic change is that consumers can no 
 longer assume that a 2XX response implies that the resource is an information 
 resource, though they can make that inference if the resource is the object 
 of a wdrs:describedby statement or has been reached by following a 303 
 redirection of a 'describedby' Link header.
 
 The aim of this email is not to start a discussion about the merits of this 
 or any other Change Proposal, but to make a very simple request: if you agree 
 with these changes, please can you add your name to the document at:
 
  
 https://docs.google.com/document/d/1aSI7LpD4UDuHiDNqx8qN1W400QeZdzWYD-CRuU0Xmk0/edit
 
 That document also contains a link to the Google Doc version of the proposal 
 [4] if you want to add comments.
 
 We will not be making substantive changes to this Change Proposal: if you 
 want to suggest a different set of changes to the HttpRange-14 decision, I 
 heartily recommend that you create a Change Proposal yourself! :) You should 
 feel free to use this Change Proposal as a basis for yours if you want. Note 
 that the deadline for doing so is 29th March (ie one week from today) so that 
 the proposals can be discussed at the TAG F2F meeting the following week.
 
 Thanks,
 
 Jeni
 
 [1] http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html
 [2] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
 [3] http://www.w3.org/2001/tag/doc/uddp/
 [4] 
 https://docs.google.com/document/d/1ognNNOIcghga9ltQdoi-CvbNS8q-dOzJjhMutJ7_vZo/edit
 
 ---
 Summary
 
 This proposal contains two substantive changes.
 
 First, it enables publishers to link to URI documentation for a given probe 
 URI by providing a 200 response to that probe URI that contains a statement 
 including a ‘describedby’ relationship from the probe URI to the URI 
 documentation.
 
 Second, a 200 response to a probe URI no longer implies that the probe URI 
 identifies an information resource; instead, this can only be inferred if the 
 probe URI is the object of a ‘describedby’ relationship.
 
 Rationale
 
 While there are instances of linked data websites using 303 redirections, 
 there are also many examples of people making statements about URIs 
 (particularly using HTML link relations, RDFa, microdata, and microformats) 
 where those statements indicate that the URI is supposed to identify a 
 non-information resource such as a Person or Book.
 
 Rather than simply telling these people that they are Doing It Wrong, 
 “Understanding URI Hosting Practice as Support for URI Documentation 
 Discovery” should ensure that:
 
 * applications that interpret such data do not draw wrong conclusions about 
 these URIs simply because they return a 200 response without a describedby 
 Link header
 * publishers of this data can easily upgrade to making the distinction 
 between the non-information resource that the page holds information about 
 and the information resource that is the page itself, should they discover 
 that they need to
 
 Details
 
 In section 4.1, in place of the second paragraph and following list, 
 substitute:
 
  There are three ways to locate a URI documentation link in an HTTP response:
 
  * using the Location: response header of a 303 See Other response 
 [httpbis-2], 
e.g.
 
303 See Other
Location: http://example.com/uri-documentation
 
  • using a Link: response header with link relation

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Michael Brunnbauer


Hello Jeni,

On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote:
 I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not 
 what happens at the moment. Therefore we need to change something.

Do you think semantic web projects have been stopped because some purist
involved did not see a way to bring httprange14 into agreement with the
other intricacies of the project ? Those purists will still see the new
options that the proposal offers as what they are: Suboptimal.

Or do you think some purists have been actually blaming publishers ? What will
stop them in the future to complain like this: Hey, your website consists
solely of NIRs, I cannot talk about it! Please use 303.

You are solving the problem by pretending that the IRs are not there then
the publisher does not make the distinction between IR and NIR.

Maybe we can optimize the wording of standards and best practise guides to 
something like these are the optimal solutions. Many people also do it this 
way but this has the following drawbacks...

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Hugh Glaser

Fair questions, Michael.
I have a lot of sympathy for your I don't see the point of this whole 
discussion.
We can write what we want in documents, but the world can ignore them - and 
will if they don't work.
And the world will be what it is, not what we want it to be.

However.
Unfortunately, perhaps, standards are important for people who work in the 
field providing systems to others.
Personally, I never did agree with the solution, but have always aimed to carry 
out the implications of it in the systems I construct.

This is for two reasons:
a) as a member of a small community, it is destructive to do otherwise;
b) as a professional engineer, my ethical obligations require me to do so.

It is this second, the ethical obligations that are the most significant.
I should not digress from the standards, or even Best Practice, in my work.
(Apart from anything else, the legal implications of doing otherwise are very 
unpleasant.)

This means that systems involving Linked Data do not get built because the 
options I am allowed to offer are too expensive (in money, complexity, time or 
business disruption), or technologically infeasible due to local constraints.
So the answer to your first question is yes: semantic web (parts of) projects 
are stopped because of this. Ethics and community membership requires it.
When they do go ahead, of course they actually cause me some pain - 
implementing a situation I think is significantly sub-optimal - but I do not 
have the choice.

Of course, people who are outside this community will do what they feel like, 
as always.
But the current situation constrains the people in the community, who are the 
very people who should be helping others to build systems that are a little 
less broken.

Best
Hugh

On 25 Mar 2012, at 11:03, Michael Brunnbauer wrote:

 
 Hello Jeni,
 
 On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote:
 I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not 
 what happens at the moment. Therefore we need to change something.
 
 Do you think semantic web projects have been stopped because some purist
 involved did not see a way to bring httprange14 into agreement with the
 other intricacies of the project ? Those purists will still see the new
 options that the proposal offers as what they are: Suboptimal.
 
 Or do you think some purists have been actually blaming publishers ? What will
 stop them in the future to complain like this: Hey, your website consists
 solely of NIRs, I cannot talk about it! Please use 303.
 
 You are solving the problem by pretending that the IRs are not there then
 the publisher does not make the distinction between IR and NIR.
 
 Maybe we can optimize the wording of standards and best practise guides to 
 something like these are the optimal solutions. Many people also do it this 
 way but this has the following drawbacks...
 
 Regards,
 
 Michael Brunnbauer
 
 -- 
 ++  Michael Brunnbauer
 ++  netEstate GmbH
 ++  Geisenhausener Straße 11a
 ++  81379 München
 ++  Tel +49 89 32 19 77 80
 ++  Fax +49 89 32 19 77 89 
 ++  E-Mail bru...@netestate.de
 ++  http://www.netestate.de/
 ++
 ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
 ++  USt-IdNr. DE221033342
 ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
 ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
 

-- 
Hugh Glaser,  
 Web and Internet Science
 Electronics and Computer Science,
 University of Southampton,
 Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Jeni Tennison

Michael,

On 25 Mar 2012, at 11:03, Michael Brunnbauer wrote:
 On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote:
 I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not 
 what happens at the moment. Therefore we need to change something.
 
 Do you think semantic web projects have been stopped because some purist
 involved did not see a way to bring httprange14 into agreement with the
 other intricacies of the project ? Those purists will still see the new
 options that the proposal offers as what they are: Suboptimal.

What would be optimal in your view?

 Or do you think some purists have been actually blaming publishers ? What will
 stop them in the future to complain like this: Hey, your website consists
 solely of NIRs, I cannot talk about it! Please use 303.

Nothing. In fact TimBL has already said this [1], and Jonathan has pointed out 
what such people will have to do to make those kinds of statements [2]. This is 
already listed as a disadvantage in the proposal. I recognise it's a 
disadvantage, I just think it is worth the hit compared to the advantages of 
the change.

 You are solving the problem by pretending that the IRs are not there then
 the publisher does not make the distinction between IR and NIR.

No, I am just proposing stopping pretending that the NIR is not there, which is 
what is mandated by the current httpRange-14 design.

 Maybe we can optimize the wording of standards and best practise guides to 
 something like these are the optimal solutions. Many people also do it this 
 way but this has the following drawbacks...


Yes, as I argued here [3] I strongly believe that casting the separation of IR 
and NIR as a best practice rather than a vital necessity is the right way to go.

Cheers,

Jeni

[1] http://lists.w3.org/Archives/Public/public-lod/2012Mar/0143.html
[2] http://lists.w3.org/Archives/Public/public-lod/2012Mar/0144.html
[3] http://www.jenitennison.com/blog/node/159
-- 
Jeni Tennison
http://www.jenitennison.com

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Michael Brunnbauer


Hello Jeni,

On Sun, Mar 25, 2012 at 12:31:18PM +0100, Jeni Tennison wrote:
  Those purists will still see the new
  options that the proposal offers as what they are: Suboptimal.
 
 What would be optimal in your view?

I do not know a way to mint two URIs (IR+NIR) in a way that is less painful.

  You are solving the problem by pretending that the IRs are not there then
  the publisher does not make the distinction between IR and NIR.
 
 No, I am just proposing stopping pretending that the NIR is not there, which 
 is what is mandated by the current httpRange-14 design.

If - like Hugh suggested - httpRange-14 is really stopping people inside the
community from delivering solutions and those people are willing to sacrifice
the IRs (although I find both of this hard to believe) - then you have good
reasons to go ahead.

But this makes me think about what those same people will be unable to deliver
because they cannot make the default IR assumption any more (as I said, the
rest of the world will probably go on making it).

Perhaps the default IR assumption be saved by saying that a 200 URI X is a 
IR as long as we don't find some triple at X that suggests otherwise. Why not a
NIR class ? If the concept of IRs/NIRs is sufficiently unambiguous to talk
about it in natural language (I think it is), we can talk about it in RDF.

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Tim Berners-Lee


On 2012-03 -25, at 07:31, Jeni Tennison wrote:
 [..]
 Yes, as I argued here [3] I strongly believe that casting the separation of 
 IR and NIR as a best practice rather than a vital necessity is the right way 
 to go.
 

Let me assume that you meant:

 [..]
 Yes, as I argued here [3] I strongly believe that casting the separation of 
 IR and the thing it describes as a best practice rather than a vital 
 necessity is the right way to go.
 

To actually confused those things in a system is to me is absolutely 
unacceptable.
When I build rule file or systems some of them deal with documents and
some with things that those documents describe, and in general they do both.

If you want to define a new proposal then it had better be one where for
a given URL I know which it identifiers.  

Pre - HR14, I could do that by looking at the URL.

Post-HR14, I had to do a network operation to find out, but I can put 
up with that if it REALLY helps people.

With your change proposal, there are times when you don't know at all!

For example, under your change proposal does 
http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11;
identifify? 

If
 card:i  :likes 
http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11.
Do I like a book or a whale?

To not know is unacceptable to me. 
And to merge the two the IR and what it describes, as the same thing, is 
unacceptable too.

Tim


 Cheers,
 
 Jeni
 
 [1] http://lists.w3.org/Archives/Public/public-lod/2012Mar/0143.html
 [2] http://lists.w3.org/Archives/Public/public-lod/2012Mar/0144.html
 [3] http://www.jenitennison.com/blog/node/159
 -- 
 Jeni Tennison
 http://www.jenitennison.com

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Norman Gray


Michael and all, greetings.

On 2012 Mar 25, at 14:19, Michael Brunnbauer wrote:

 Perhaps the default IR assumption be saved by saying that a 200 URI X is a 
 IR as long as we don't find some triple at X that suggests otherwise. Why not 
 a
 NIR class ? If the concept of IRs/NIRs is sufficiently unambiguous to talk
 about it in natural language (I think it is), we can talk about it in RDF.

I confess I haven't kept fully up with the details of this suddenly rampant 
thread, but this suggestion is the one I associate with Ian Davis back in the 
'Is 303 really necessary?' thread of November 2010 (that long ago!?).

One can characterise this as 'httpRange-14 is defeasible', or, as a procedure:


After a client has extracted all of the 'authoritative' statements about a 
resource X, which is retrieved with a 200 status, it rfc2119-should add the 
triple 'X a eg:InformationResource', unless this would create a contradiction.


Why would this create a contradiction?  The resource X might explicitly say 
that it is a eg:NonInformationResource; it might be declared to be a eg:Book, 
which is here or elsewhere declared to be subClassOf eg:NonInformationResource; 
or X might be in the domain or range of a property which indicates that it is a 
non-IR, such as for example :describedBy.

What's 'extracted'?  That could include RDF+conneg, or RDFa, or some 
semi-formal microformats-based process, or anything you like.

What's 'authoritative'?  That's to some extent up to the client, but it would 
sensibly be the list of statements 200-retrieved from the resource itself.

That seems to include the practice described by Jeni's change request, and so 
inherit its advantages.  It avoids telling anyone they're Doing It Wrong, with 
a 200 NIR resource.  If someone at present describes a NIR with a 200 response, 
they can 'fix' that with a simple one-triple addition.  Also, it leaves it 
entirely up to the resource owner to decide how many URIs they wish to 
maintain, and which one documents which.

I'm sure most RDF descriptions of NIRs already do implicitly declare that they 
are NIRs.

This overall seems to be the intent behind the :isdescribedby proposal.  Is 
that correct?

Best wishes,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

Re: Change Proposal for HttpRange-14


On 3/25/12 5:13 AM, Jeni Tennison wrote:

I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not 
what happens at the moment. Therefore we need to change something.


They only get blamed when they claim that they are publishing Linked 
Data. If they don't do that nobody will complain. All Structured Data 
isn't Linked Data. All Linked Data is a form Structured Data.


HttpRange-14 findings facilitate the co-existence of Structured Data and 
Linked Data on the Web.


RDF and its family of syntaxes and serialization formats are vehicles 
for constructing resources that bear structured data. The same applies 
to HTML, XML, JSON etc.. None of these syntaxes produce Linked Data 
implicitly, you have to adhere to Linked Data principles for that to 
happen.


The fundamental concern I have right now is that this effort is 
conflating basic Structured Data and the fidelity of Linked Data.


You don't need any kind of revision to HttpRange-14 recommendations to 
enable what has long been reality on the Web. By that I mean: people 
have conflated Names and Addresses via URIs forever. Said conflation is 
only an issue when the end product is inaccurately classified as being 
Linked Data principles compliant.


Linked Data is a different system or dimension of the Web. Without its 
fidelity many critical items become impossible to implement at Web scale:


1. data access by Name reference;
2. equivalence fidelity and inference;
3. distributed verifiable identity;
4. functional read-write-web .

Conflate Names and Addresses and the above simply fail.

Structured Data is growing exponentially on the Web thanks to efforts 
such as schema.org, Facebook Open Graph,  and the emergence of JSON as 
an alternative to XML re., structured data representation syntax. That's 
a good thing. The more Structured Data we have on the Web the easier it 
becomes to explain and demonstrate the unique fidelity and benefits that 
Linked Data introduces.


To conclude, we need to change our tendency to conflate matters since 
all Structured Data != Linked Data. Every time we conflate everything 
gets mucked up and things stall.


--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: Change Proposal for HttpRange-14


On 3/25/12 6:03 AM, Michael Brunnbauer wrote:

Hello Jeni,

On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote:

I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not 
what happens at the moment. Therefore we need to change something.

Do you think semantic web projects have been stopped because some purist
involved did not see a way to bring httprange14 into agreement with the
other intricacies of the project ? Those purists will still see the new
options that the proposal offers as what they are: Suboptimal.

Or do you think some purists have been actually blaming publishers ? What will
stop them in the future to complain like this: Hey, your website consists
solely of NIRs, I cannot talk about it! Please use 303.

You are solving the problem by pretending that the IRs are not there then
the publisher does not make the distinction between IR and NIR.

Maybe we can optimize the wording of standards and best practise guides to
something like these are the optimal solutions. Many people also do it this
way but this has the following drawbacks...

Regards,

Michael Brunnbauer


+1

Structured Data != Linked Data.

Linked Data == Structured Data.

--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Middle ground change proposal for httpRange-14

2012-03-25 Thread David Booth

Jeni, Ian, Leigh, Nick, Hugh, Steve, Masahide, Gregg, Niklas, Jerry,
Dave, Bill, Andy, John, Ben, Damian, Thomas, Ed Summers and Davy,

I have drafted what I think may represent a middle ground change
proposal and I am wondering if something along this line would also meet
your concerns:
http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol

Highlights of this proposal:
 - It enables a URI owner to unambiguously convey any URI definition to
an interested client.
 - It does not constrain whether or how a client will use that or any
other URI definition, as that is the client's business.
 - It retains the existing httpRange-14 rule.
 - It also permits the use of an HTTP 200 response with RDF content as a
means of conveying a URI definition.
 - It provides guidelines for avoiding confusion and inconsistencies,
while acknowledging the burden those guidelines place on URI owners.
 - It encourages URI owners to publish URI definitions even if those URI
definitions are not perfect. 

It also includes numerous other clarifications. 

Would something along these lines also meet your concerns?



-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Tim Berners-Lee


On 2012-03 -23, at 21:02, Jeni Tennison wrote:

 
 On 23 Mar 2012, at 22:42, Jonathan A Rees wrote:
 
 On Thu, Mar 22, 2012 at 4:21 PM, Jeni Tennison j...@jenitennison.com wrote:
 
 While there are instances of linked data websites using 303 redirections, 
 there are also many examples of people making statements about URIs 
 (particularly using HTML link relations, RDFa, microdata, and microformats) 
 where those statements indicate that the URI is supposed to identify a 
 non-information resource such as a Person or Book.
 
 Can you provide a handful of these Doing It Wrong URIs please from
 various sites? I think it would really be helpful to have them on hand
 during discussions.
 
 
 OK. These picked up from dumps made available by webdatacommons.org, so very 
 grateful to them for making that available; it can be quite hard to locate 
 this kind of markup generally. Also I've used Gregg's distiller [1] to 
 extract the RDFa out of the documents to double-check.
 
 
 http://www.logosportswear.com/product/1531
 - 301 
 - http://www.logosportswear.com/product/1531/harbor-cruise-boat-tote
 
  which contains the RDFa statement
 
  http://www.logosportswear.com/product/1531
a http://rdf.data-vocabulary.org/#Product ;
.
 
  The URI is intended to identify a product, not a web page.
 

Indeed, and notice that it has a different URI from the web page
The 301 could be easily changed to a 303, and all would be happy.
They have done the difficult bit of separating out the product and the page.

A site, by the way, which uses 301 is saying that the URI you asked for
is obsolete, and you should stop using it.

Tim

Re: Change Proposal for HttpRange-14


On 3/25/12 7:18 AM, Hugh Glaser wrote:

Fair questions, Michael.
I have a lot of sympathy for your I don't see the point of this whole 
discussion.
We can write what we want in documents, but the world can ignore them - and 
will if they don't work.
And the world will be what it is, not what we want it to be.

However.
Unfortunately, perhaps, standards are important for people who work in the 
field providing systems to others.
Personally, I never did agree with the solution, but have always aimed to carry 
out the implications of it in the systems I construct.

This is for two reasons:
a) as a member of a small community, it is destructive to do otherwise;
b) as a professional engineer, my ethical obligations require me to do so.

It is this second, the ethical obligations that are the most significant.
I should not digress from the standards, or even Best Practice, in my work.
(Apart from anything else, the legal implications of doing otherwise are very 
unpleasant.)

This means that systems involving Linked Data do not get built because the 
options I am allowed to offer are too expensive (in money, complexity, time or 
business disruption), or technologically infeasible due to local constraints.


But as an engineer the complexity of the spec shouldn't determine the 
very essence of the spec. The whole AWWW is about the deceptively 
simple principle in action. It isn't a simply simple solution.


We have URI abstraction and styles of URIs (hash or slash). The system 
(Linked Data in this case) is concerned about separation of powers right 
down to the fine-grained level of structured data representation. As 
result, there are implications that arise from the style of URI used in 
this context.


Since 1998 we've ended up with the following syntaxes and serializations 
formats for the RDF model (EAV enhanced with URIs, language tags, and 
typed literals):


1. RDF/XML
2. N3
3. Turtle
4. TriX
5. N-Triples
7. TriG
8. NQuad
9. (X)HTML+RDFa
10. HTML+Microdata
11. JSON/RDF
12. JSON-LD.

Don't you see a pattern here? Also what's an innocent newbie supposed to 
do when they encounter the above.


Now we want repeat the pattern, this time scoped to URIs and they usage 
re. Linked Data fidelity:


1. hash -- Linked Data indirection is implicit
2. slash -- 303 redirection delivering Linked Data indirection explicitly
3. slash -- 200 OK and no redirection leaving user agents to process 
relations (and HTTP response headers) en route to manifestation of 
Linked Data's mandatory indirection.


Again, don't you see the same pattern is taking shape i.e., a potpourri 
of suggestions that ultimately only add more confusion to newbies. Even 
worse, this particular suggest is ultimately a reworking of the entire 
AWWW.



So the answer to your first question is yes: semantic web (parts of) projects 
are stopped because of this.


I don't buy that for one second. There's a little more to it than that.  
How about the tools being used for these projects? You statement implies 
the very best tools available where used and they failed. You know that 
cannot be true.



  Ethics and community membership requires it.
When they do go ahead, of course they actually cause me some pain - 
implementing a situation I think is significantly sub-optimal - but I do not 
have the choice.


We have to separate issues here. We have:

1. a spec or set of best practices;
2. tools that implement the spec or best practices;
3. projects seeking to exploit the spec or best practices.

You are basically ruling out tool choices as reasons for project failure.



Of course, people who are outside this community will do what they feel like, 
as always.


And in due course opportunity costs force them to reevaluate their 
choices. Decision makers in commercial enterprises don't care about 
technology, they are fundamentally preoccupied with opportunity costs. 
Make opportunity costs palpable and you have the ear of any decision 
maker in charge of a commercial venture.



But the current situation constrains the people in the community, who are the 
very people who should be helping others to build systems that are a little 
less broken.


It doesn't. I just don't buy that. You can have Structured Data that 
isn't Linked Data. We can't have it both ways. Why not move folks over 
in stages i.e., get them to Structured Data first, then upgrade them to 
Linked Data since the virtues of the upgrade will have much clearer 
context since Structured Data modulo Linked Data fidelity has clear 
limitations. Basically, turn what seems to be today's headache into a 
narrative showcasing specific virtues.


Also note, we don't have a bookmarking problem with any style of URI for 
Linked Data. People can start by bookmarking the URLs of Information 
Resources.



Kingsley


Best
Hugh

On 25 Mar 2012, at 11:03, Michael Brunnbauer wrote:


Hello Jeni,

On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote:

I agree we shouldn't blame publishers who

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Hugh Glaser

Hi Kingsley,
On 25 Mar 2012, at 17:17, Kingsley Idehen wrote:

 On 3/25/12 7:18 AM, Hugh Glaser wrote:
 Fair questions, Michael.
 I have a lot of sympathy for your I don't see the point of this whole 
 discussion.
 We can write what we want in documents, but the world can ignore them - and 
 will if they don't work.
 And the world will be what it is, not what we want it to be.
 
 However.
 Unfortunately, perhaps, standards are important for people who work in the 
 field providing systems to others.
 Personally, I never did agree with the solution, but have always aimed to 
 carry out the implications of it in the systems I construct.
 
 This is for two reasons:
 a) as a member of a small community, it is destructive to do otherwise;
 b) as a professional engineer, my ethical obligations require me to do so.
 
 It is this second, the ethical obligations that are the most significant.
 I should not digress from the standards, or even Best Practice, in my work.
 (Apart from anything else, the legal implications of doing otherwise are 
 very unpleasant.)
 
 This means that systems involving Linked Data do not get built because the 
 options I am allowed to offer are too expensive (in money, complexity, time 
 or business disruption), or technologically infeasible due to local 
 constraints.
 
 But as an engineer the complexity of the spec shouldn't determine the very 
 essence of the spec. The whole AWWW is about the deceptively simple 
 principle in action. It isn't a simply simple solution.
I keep meaning to ask: what is AWWW? It's not a term I see used anywhere but 
your emails.
 
 We have URI abstraction and styles of URIs (hash or slash). The system 
 (Linked Data in this case) is concerned about separation of powers right down 
 to the fine-grained level of structured data representation. As result, there 
 are implications that arise from the style of URI used in this context.
 
 Since 1998 we've ended up with the following syntaxes and serializations 
 formats for the RDF model (EAV enhanced with URIs, language tags, and typed 
 literals):
 
 1. RDF/XML
 2. N3
 3. Turtle
 4. TriX
 5. N-Triples
 7. TriG
 8. NQuad
 9. (X)HTML+RDFa
 10. HTML+Microdata
 11. JSON/RDF
 12. JSON-LD.
 
 Don't you see a pattern here? Also what's an innocent newbie supposed to do 
 when they encounter the above.
Probably run screaming from the room.
Or at least tell us to go away and come back when the community has sorted 
itself out.
(Were I to present things this way.)
 
 Now we want repeat the pattern, this time scoped to URIs and they usage re. 
 Linked Data fidelity:
 
 1. hash -- Linked Data indirection is implicit
 2. slash -- 303 redirection delivering Linked Data indirection explicitly
 3. slash -- 200 OK and no redirection leaving user agents to process 
 relations (and HTTP response headers) en route to manifestation of Linked 
 Data's mandatory indirection.
 
 Again, don't you see the same pattern is taking shape i.e., a potpourri of 
 suggestions that ultimately only add more confusion to newbies. Even worse, 
 this particular suggest is ultimately a reworking of the entire AWWW.
I'm not sure I agree with your assertion of the same pattern.
In any case, I didn't say this proposal was perfect - I would do it differently.
But if it is a broken world - not fixing it should not be an option.
You and I will have to differ as to whether the Project is currently a success 
- you clearly think so - I think that we are far back from what where we should 
be by now.
 
 So the answer to your first question is yes: semantic web (parts of) 
 projects are stopped because of this.
 
 I don't buy that for one second. There's a little more to it than that.  How 
 about the tools being used for these projects? You statement implies the very 
 best tools available where used and they failed. You know that cannot be true.
Actually, it is.
Your fallacy is to think that these are purely technological issues, and can 
always be solved with tools.
These are socio-technical issues at best.
 
  Ethics and community membership requires it.
 When they do go ahead, of course they actually cause me some pain - 
 implementing a situation I think is significantly sub-optimal - but I do not 
 have the choice.
 
 We have to separate issues here. We have:
 
 1. a spec or set of best practices;
 2. tools that implement the spec or best practices;
 3. projects seeking to exploit the spec or best practices.
 
 You are basically ruling out tool choices as reasons for project failure.
 
 
 Of course, people who are outside this community will do what they feel 
 like, as always.
 
 And in due course opportunity costs force them to reevaluate their choices. 
 Decision makers in commercial enterprises don't care about technology, they 
 are fundamentally preoccupied with opportunity costs. Make opportunity costs 
 palpable and you have the ear of any decision maker in charge of a commercial 
 venture.
 
 But the current situation constrains the people in

Re: Change Proposal for HttpRange-14


On 3/25/12 1:07 PM, Niklas Lindström wrote:

To clarify, what do I mean by another information resource? Isn't a
representation of a resource also a resource, ultimately different
from the thing itself?


Yes!

We have the following in play, but never easily discernible from 
Semantic Web and Linked Data narratives:


1. A Document (a Resource) -- bears representations of whatever
2. A Descriptor Document (a subClassOf Document) -- specifically bears 
representation of the description of an unambiguously named subject

3. An unambiguously named subject or entity .

The term Resource continues to be used carelessly and the net effect 
is utter confusion :-(  When reading Semantic Web (and even Linked Data 
literature) it's very easy for the untrained eye to assume 1-3 are Web 
resources. The subject of a description may or may not be a Web realm 
entity. Its just something that's caught the interest of an author 
(creator) of a description documentation.


--

Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen








smime.p7s
Description: S/MIME Cryptographic Signature

Re: Middle ground change proposal for httpRange-14

2012-03-25 Thread David Wood

Hi David,

*sigh*.  I said recently that I would rather chew my arm off than re-engage 
with http-range-14.  Apparently I have very little self control.

On Mar 25, 2012, at 11:54, David Booth wrote:
 Jeni, Ian, Leigh, Nick, Hugh, Steve, Masahide, Gregg, Niklas, Jerry,
 Dave, Bill, Andy, John, Ben, Damian, Thomas, Ed Summers and Davy,
 
 I have drafted what I think may represent a middle ground change
 proposal and I am wondering if something along this line would also meet
 your concerns:
 http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol
 

 Highlights of this proposal:
 - It enables a URI owner to unambiguously convey any URI definition to
 an interested client.

+1 to this.  I have long been a fan of unambiguous definition.  The summary 
argument against is Leigh Dodd's show what is actually broken approach and 
the summary argument for is my we need to invent new ways to associate RDF 
with other Web resources in a discoverable manner to allow for 
'follow-your-nose' across islands of Linked Data.


 - It does not constrain whether or how a client will use that or any
 other URI definition, as that is the client's business.

Yes, +1 of course.

 - It retains the existing httpRange-14 rule.

No real argument here, I suppose.

 - It also permits the use of an HTTP 200 response with RDF content as a
 means of conveying a URI definition.

+1.  I quite like the registration of the isDefinedBy relation as a 
complement to POWDER's describedby, although I'd probably name it in the same 
way for consistency (definedby).


 - It provides guidelines for avoiding confusion and inconsistencies,
 while acknowledging the burden those guidelines place on URI owners.

This is no different to the current Web and allows people to play fast and 
loose with the standards if they need or choose to.  Overall, that is a feature 
not a bug.


 - It encourages URI owners to publish URI definitions even if those URI
 definitions are not perfect. 

It also allows non-URI owners to publish conflicting or complementary 
definitions and for them to refer by URI to each others definitions.  That's 
good.


 
 It also includes numerous other clarifications. 

A note regarding hash URIs:  It seems to me that the major use for hash URIs is 
to provide links into monolithic human-oriented documentation.  That made sense 
on the early Web and still works today.  Your proposal would allow existing 
users of hash URIs to provide a separate machine-oriented isDefinedBy 
relation URI.  That would make a nice (and useful) bridge toward addressing 
hash URI deployments without breaking them.

It is also worth noting that hash URI users could easily navigate 
bi-directionally between human- and machine-oriented content using this 
approach, thereby taking the sting out of the use of hash URIs:  Once a 
machine-oriented definition URI has been found once, it could be cached and 
used subsequently.  In other words, this proposal (ever so slightly extended to 
be more clear regarding the caching guidance for the isDefinedBy relation) 
could be used to get around the existing cacheless nature of the 303 response.


 
 Would something along these lines also meet your concerns?


After giving this proposal a provisional thumbs up, I still doubt whether the 
TAG (after more than a decade arguing about this and finally satisfying 
http-range-14 via a minimal patch) will be able to come to consensus on a major 
change.  Good luck :)

Regards,
Dave
--
David Wood, Ph.D.
3 Round Stones
http://3roundstones.com

Re: Change Proposal for HttpRange-14

On 3/25/12 1:18 PM, Hugh Glaser wrote:

Hi Kingsley,
On 25 Mar 2012, at 17:17, Kingsley Idehen wrote:

On 3/25/12 7:18 AM, Hugh Glaser wrote:

Fair questions, Michael.
I have a lot of sympathy for your I don't see the point of this whole
discussion.
We can write what we want in documents, but the world can ignore them - and
will if they don't work.
And the world will be what it is, not what we want it to be.

However.
Unfortunately, perhaps, standards are important for people who work in the
field providing systems to others.
Personally, I never did agree with the solution, but have always aimed to carry
out the implications of it in the systems I construct.

This is for two reasons:
a) as a member of a small community, it is destructive to do otherwise;
b) as a professional engineer, my ethical obligations require me to do so.

It is this second, the ethical obligations that are the most significant.
I should not digress from the standards, or even Best Practice, in my work.
(Apart from anything else, the legal implications of doing otherwise are very
unpleasant.)

This means that systems involving Linked Data do not get built because the
options I am allowed to offer are too expensive (in money, complexity, time or
business disruption), or technologically infeasible due to local constraints.

But as an engineer the complexity of the spec shouldn't determine the very essence of the spec. The
whole AWWW is about the deceptively simple principle in action. It isn't a simply
simple solution.

I keep meaning to ask: what is AWWW? It's not a term I see used anywhere but
your emails.

Architecture of the World Wide Web.

We have URI abstraction and styles of URIs (hash or slash). The system (Linked
Data in this case) is concerned about separation of powers right down to the
fine-grained level of structured data representation. As result, there are
implications that arise from the style of URI used in this context.

Since 1998 we've ended up with the following syntaxes and serializations
formats for the RDF model (EAV enhanced with URIs, language tags, and typed
literals):

1. RDF/XML
2. N3
3. Turtle
4. TriX
5. N-Triples
7. TriG
8. NQuad
9. (X)HTML+RDFa
10. HTML+Microdata
11. JSON/RDF
12. JSON-LD.

Don't you see a pattern here? Also what's an innocent newbie supposed to do
when they encounter the above.

Probably run screaming from the room.
Or at least tell us to go away and come back when the community has sorted
itself out.
(Were I to present things this way.)

Now we want repeat the pattern, this time scoped to URIs and they usage re.
Linked Data fidelity:

1. hash -- Linked Data indirection is implicit
2. slash -- 303 redirection delivering Linked Data indirection explicitly
3. slash -- 200 OK and no redirection leaving user agents to process relations
(and HTTP response headers) en route to manifestation of Linked Data's
mandatory indirection.

Again, don't you see the same pattern is taking shape i.e., a potpourri of
suggestions that ultimately only add more confusion to newbies. Even worse,
this particular suggest is ultimately a reworking of the entire AWWW.

I'm not sure I agree with your assertion of the same pattern.
In any case, I didn't say this proposal was perfect - I would do it differently.
But if it is a broken world - not fixing it should not be an option.

I don't think the AWWW is broken. That's my fundamental argument.

You and I will have to differ as to whether the Project is currently a success
- you clearly think so - I think that we are far back from what where we should
be by now.

I don't think so. There is more structured data on the Web, and it is
growing exponentially. This simplifies the entire pursuit of Web scale
Linked Data.

So the answer to your first question is yes: semantic web (parts of) projects
are stopped because of this.

I don't buy that for one second. There's a little more to it than that. How
about the tools being used for these projects? You statement implies the very
best tools available where used and they failed. You know that cannot be true.

Actually, it is.
Your fallacy is to think that these are purely technological issues, and can always be
solved with tools.

I know these issues can be solved by tools. I've designed such tools and
they are in broad use :-)

These are socio-technical issues at best.

Ethics and community membership requires it.
When they do go ahead, of course they actually cause me some pain -
implementing a situation I think is significantly sub-optimal - but I do not
have the choice.

We have to separate issues here. We have:

1. a spec or set of best practices;
2. tools that implement the spec or best practices;
3. projects seeking to exploit the spec or best practices.

You are basically ruling out tool choices as reasons for project failure.

Of course, people who are outside this community will do what they feel like,
as always.

And in due course opportunity costs force them to

Re: Change Proposal for HttpRange-14

2012-03-25 Thread David Wood

On Mar 24, 2012, at 08:38, James Leigh wrote:

 On Sat, 2012-03-24 at 08:11 +, Jeni Tennison wrote:
 Can I just cast that into the language used by the rest of the proposal? 
 What about:
 
when documentation is served with a 200 response from a probe
URI and does not contain a 'describedby' statement, some agents 
(including the publisher) might use it to identify the documentation
and others a non-information resource. Publishers still need to 
provide support for two distinct URIs if they want to enable more
consistent use of the URI.
 
 How does that sound?
 
 
 I'd buy into that.

It works, but asks a lot from implementors and users to read and understand the 
subtlety.  That's why I'd prefer an approach that provides a more simple, 
unambiguous definition.

Regards,
Dave



 
 Regards,
 James

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Dan Brickley

On 25 March 2012 11:03, Michael Brunnbauer bru...@netestate.de wrote:

 Hello Jeni,

 On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote:
 I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not 
 what happens at the moment. Therefore we need to change something.

 Do you think semantic web projects have been stopped because some purist
 involved did not see a way to bring httprange14 into agreement with the
 other intricacies of the project ? Those purists will still see the new
 options that the proposal offers as what they are: Suboptimal.

 Or do you think some purists have been actually blaming publishers ? [...]

http://go-to-hellman.blogspot.co.uk/2009/10/new-york-times-blunders-into-linked.html
comes close to doing so... though more around semantics of 'sameas'
than IR/NIR.

Dan

Re: Change Proposal for HttpRange-14

2012-03-25 Thread Norman Gray


Tim, greetings.

On 2012 Mar 25, at 17:35, Tim Berners-Lee wrote:

 (Not useful to  talk about NIRs.  The web architecture does not. Now does 
 Jonathan's baseline, not HTTP Range-14.  Never assume that what an IR is 
 about is not itself a IR.)

Well, httpRange-14 sort of does talk about 'non-information resources', by 
necessary implication.  If the set of information resources (IR) is not the 
same as the set of all resources (R), then the set R\IR (which in any case 
exists) is non-null, and might as well be called the set of 
'non-information-resources' as anything else.  But perhaps R\IR is a better 
notation. (I don't intend this to be hair-splitting)

Parenthetically, what _is_ IR?  Referring to Rees's editors draft [1], 
[issue-14-resolved] effectively says that iff a resource X is 200-retrieved, 
then it must _always_ be assigned to the set IR (the resolution seems to 
effectively define 'being 200-retrievable' as the definition of 'information 
resource', and this is consistent with [1] section 1.1 which says One 
convention[...] was for a hashless URI to refer to the document-like entity 
(information resource) served at that URI).

So my phrasing was intended to weaken [issue-14-resolved] to suggest that X 
being 200-retrievable puts X in IR, _only_ if the documentation about X 
(retrieved by conneg on X, say) does not put it in R\IR.

How something is put into R\IR is a separate issue.  Perhaps there's a need for 
a class std:RnotIR, or perhaps this is up to the client, who may decide that 
discovering that 'X a foaf:Person' is enough to put it in R\IR for the client's 
purposes.


Example:

So, if X=http://example.org/cedric 200-returns

 foaf:name Cedric.

then X is in IR, and oddly enough has a name (the domain of foaf:name isn't 
restricted to foaf:Person).  If it 200-returns

 foaf:name Cedric; 
a foaf:Person.

then the client should deem X to be in R\IR.


This does mean that the RDF description document which has been retrieved from 
the URI X doesn't have a name at this point.  But if that matters to the owner 
of X (perhaps because they want to refer to how the description document is 
licensed), then this minority (?) situation can be managed by having retrieval 
of X produce

 a foaf:Person; 
eg:describedBy http://example.org/cedric-description.
http://example.org/cedric-description eg:licensed cc-by.

That places X in R\IR, and indicates a description document about which 
anything one wishes can be asserted.

All the best,

Norman


[1] http://www.w3.org/2001/tag/doc/uddp-20120229/

-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

Re: Middle ground change proposal for httpRange-14