Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Karen Coyle

Mike Taylor wrote:

Going back to someone's point about living in the real
world (sorry, I forget who), the Inconvenient Truth is that 90% of
programs and 99% of users, on seeing an http: URL, will try to treat
it as a link.  They don't know any better.
  


And they can't know any better because there is no discernible 
difference to either a human being or a program. There is nothing about 
an http URI that tells you what it is being used as and whether there is 
anything at the other end until you send it out over the net where it 
will be processed by http.[*] So we have two things that are identical 
in form but very different in what we can do with them. Isn't this a bad 
idea? There are probably solutions to this; perhaps a particular port 
that indicates that the identifier cannot be dereferenced (I'll suggest 
666), or one that returns data about the resource identified (1 would 
do). But it seems to me that the best solution is to use a URI scheme 
that isn't one already used for a protocol.
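To make the point concrete, here's a small sketch (both URIs are made up): nothing a parser can extract from an http URI reveals whether it was minted as a pure identifier or as a fetchable link.

```python
from urllib.parse import urlparse

# Two hypothetical http URIs: suppose the first names an abstract
# concept and the second locates a retrievable document.
identifier = "http://example.org/id/concept/42"
locator = "http://example.org/docs/report.pdf"

for uri in (identifier, locator):
    parts = urlparse(uri)
    # Scheme, host, and path are all a parser can see; nothing in the
    # syntax says "don't try to dereference me".
    print(parts.scheme, parts.netloc, parts.path)
```

Both parse into the same kinds of components; the only way to learn the difference is to send the URI out over the net.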


And before someone comes up with the statement that when you have a URL 
you don't know beforehand whether you will get something or a 404 
error, and that it is therefore the same as a URI, let me remind you that 
404 is indeed an *error code* that means 'Not found', which implies that 
*not finding anything* is an error. It can't, however, tell you whether 
there *should* have been something there, or whether there was never 
supposed to be anything there, but it does mean that an error has 
occurred. And if you want to make the argument that one could return 
some other http return code, that implies having some program respond 
to the URI over http, which is a form of dereferencing. If you're going 
to do that, you might as well include some real info about the thing 
identified.


kc

[*] This is one of the things that I've always disliked about the DOI -- 
it is an identifier for the resource, but it doesn't necessarily resolve 
to the resource. In fact, some DOIs resolve to the resource, some 
resolve to a page with metadata for the resource, some go to a 
publisher's home page and the resource isn't available in digital 
format. I understand why this is the case, but it makes it hard for 
humans and machines because it isn't clear what you will get when you 
dereference a DOI (and you are encouraged to dereference them because 
they are a sales mechanism). I think that not getting what you want and 
expect when clicking on a link is one of the things that discourages 
users (machines just trundle happily along, but many of us are not 
machines).


--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Alexander Johannesen
On Fri, Apr 3, 2009 at 10:44, Mike Taylor  wrote:
> Going back to someone's point about living in the real
> world (sorry, I forget who), the Inconvenient Truth is that 90% of
> programs and 99% of users, on seeing an http: URL, will try to treat
> it as a link.  They don't know any better.

What on earth is this about? URIs *are* links; it's in their design, it's
what they're supposed to be. Don't design systems where they are treated
any differently. Again we're seeing the "all we need are URIs" poor
judgement of SemWeb enthusiasts muddying the waters. The short of it
is, if you're using URIs as identifiers, having the choice to
dereference them is a *feature*; if one resolves to a 404 then tough (and
I'd say you designed your system poorly), but if it resolves to an
information snippet about the semantic meaning of that URI, then yay.
This is how we Topic Mappers see this whole debacle and flaw in the
SemWeb structure, and we call it Public Subject Indicators, where
"Public" means it resolves to something (just like Wikipedia URIs
resolve to some text that explains what it is representing),
"Subjects" are anything in the world (but distinct from Topics, which
are software representations), and "Indicators" because they indicate
(rather than absolutely identify) things.

In other words, if you use URIs as identifiers (which is a *good*
thing), then resolvability is a feature to be promoted, not something
to be shunned. If you can't do good systems design, use URNs. You
can treat URIs as both identifiers and subject indicators,
while URNs are evil.

> Let's make our identifiers look like identifiers.

What does that even mean? :)

> (By the way, note that this is NOT what I was saying back at the start
> of the thread.  This means that I have -- *gasp* -- changed my mind!
> Is this a first on the Internet?  :-)

Maybe, but it surely will be the last ...


Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Erik Hetzner
Erik Hetzner writes:
> Could somebody explain to me the way in which this identifier:
> 
> 
> 
> does not work *as an identifier*, absent any way of getting
> information about the referent, in a way that:
> 
> 
> 
> does work?

A quick clarification - before I digest Mike’s thoughts - I didn’t
mean to make a meaningless HTTP URI but a meaningful info URI.

What I was trying to illustrate was a non-dereferenceable URI. So,
for:



please read instead:



Thanks!

best, Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Mike Taylor
I keep telling myself I'm going to stop posting on this thread, but ...

Erik Hetzner writes:
 > Could somebody explain to me the way in which this identifier:
 > 
 > 
 > 
 > does not work *as an identifier*, absent any way of getting
 > information about the referent, in a way that:
 > 
 > 
 > 
 > does work?

We know that the syntax of URIs is <scheme>:<data>.  We know that for
info: URIs (i.e. when <scheme> is "info") the syntax of <data> is
<namespace>/<identifier>.  So parsing and handling the info: URI is really
easy to do in a clean way with separable pieces of code that have no
special cases.  All you need to know to make this work is that the
identifier is a URI -- the rest follows from established rules.  The
info: URI is more self-describing than the http: URI.  Even for a
human reading these, there is a big difference -- it's pretty much
impossible NOT to recognise what the info: URI identifies, whereas I
have absolutely no idea what the http: URI represents.
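That parsing recipe fits in a few lines. A sketch (the "oclcnum" namespace here is illustrative, not checked against the info registry):

```python
def parse_info_uri(uri):
    # Split <scheme>:<data>, then <namespace>/<identifier>, following
    # the info: URI shape described above.
    scheme, _, rest = uri.partition(":")
    if scheme != "info":
        raise ValueError("not an info: URI: %r" % uri)
    namespace, _, identifier = rest.partition("/")
    return namespace, identifier

print(parse_info_uri("info:oclcnum/12345"))  # ('oclcnum', '12345')
```

No special cases, no network: everything the parser needs is in the string itself.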

No-one disputes that it's _possible_ to use http: URLs as identifiers.
It's _possible_ to use compressed sawdust blocks as building materials
for houses, but people mostly don't do that, because we have better
options to hand which get the job done more efficiently and
appropriately.  Going back to someone's point about living in the real
world (sorry, I forget who), the Inconvenient Truth is that 90% of
programs and 99% of users, on seeing an http: URL, will try to treat
it as a link.  They don't know any better.  Heck, most of the time,
_we_ don't know any better, and it goes without saying that our
insight, experience, charm and rugged good looks make us the elite.
Let's make our identifiers look like identifiers.

(By the way, note that this is NOT what I was saying back at the start
of the thread.  This means that I have -- *gasp* -- changed my mind!
Is this a first on the Internet?  :-)

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "I've got a slug ..." -- _Parrot Sketch_, Monty Python's Flying
 Circus.


Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Erik Hetzner
At Thu, 2 Apr 2009 11:34:12 -0400,
Jonathan Rochkind wrote:
> […]
>
> I think too much of this conversation is about people's ideal vision of 
> how things _could_ work, rather than trying to make things work as best 
> as we can in the _actual world we live in_, _as well as_ planning for 
> the future when hopefully things will work even better.  You need a 
> balance between the two.

This is a good point. But as I see it, the web people - for lack of a
better word - *are* discussing the world we live in. It is those who
want to re-invent better ways of doing things who are not.

HTTP is here. HTTP works. *Everything* (save one) people want to do
with info: URIs or urn: URIs or whatever already works with HTTP.

I can count one thing that info URIs possess that HTTP URIs don’t: the
‘feature’ of not ever being dereferenceable. And even that is up in
the air - somebody could devise a method to dereference them at any
time. And then where are you?

> […]
>
> a) Are as likely to keep working indefinitely, in the real world of
> organizations with varying levels of understanding, resources, and
> missions.

Could somebody explain to me the way in which this identifier:



does not work *as an identifier*, absent any way of getting
information about the referent, in a way that:



does work?

I don’t mean to be argumentative - I really want to know! I think
there may be something that I am missing here.

> b) Are as likely as possible to be adopted by as many people as possible 
> for inter-operability. Having an ever-increasing number of possible 
> different URIs to represent the same thing is something to be avoided if 
> possible.

+1

> c) Are as useful as possible for the linked data vision.

+1

> […]

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Jonathan Rochkind

Houghton,Andrew wrote:

I think the answer lies in DNS. Even though you have a single DNS name,
requests can be redirected to one of multiple servers, called a server
farm.  I believe this is how many large sites, like Google, operate.  So
even if a single server fails, the load balancer sends requests to the
other servers.  Even OCLC does this.
  


Certainly one _could_ do lots of things -- although I'm more worried 
about an _organization_ as a point of failure, than a particular piece 
of hardware. I'm not worried about that with OCLC, but I am worried 
about that with some random programmer minting URIs at some random 
institution that doesn't necessarily understand persistence as part of 
its institutional mission.  With Mike's vision of lots of people 
everywhere minting URIs pointing at their own domains... how many of 
them are likely to do this? Any time soon?


I think too much of this conversation is about people's ideal vision of 
how things _could_ work, rather than trying to make things work as best 
as we can in the _actual world we live in_, _as well as_ planning for 
the future when hopefully things will work even better.  You need a 
balance between the two.


I also start seeing people in this thread saying "But if you do it that 
way it doesn't work for the Semantic Web (tm)."  Except they are more 
likely to say 'linked data' than 'semantic web', because the latter 
phrase seems to have been somewhat discredited.


The linked data vision is cool, and makes many interesting things 
possible. As parts of it are built, piece by piece, more interesting 
things become possible. dbpedia is awesome.  I want to support such uses 
by making my work compatible with the linked data vision where possible.


But linked data is NOT the only reason or use case for URI identifiers.  
I am also trying to solve particular _right now_ use cases that do not 
necessarily depend on the linked data vision.  When I do this, I am 
trying to create identifiers (and other schemes) that:


a) Are as likely to keep working indefinitely, in the real world of 
organizations with varying levels of understanding, resources, and 
missions.
b) Are as likely as possible to be adopted by as many people as possible 
for inter-operability. Having an ever-increasing number of possible 
different URIs to represent the same thing is something to be avoided if 
possible.

c) Are as useful as possible for the linked data vision.

These things need to be balanced. I believe that sometimes an info: URI 
is going to be the best balance. Other times an http URI pointing to 
purl.org might be.  Other times an http URI pointing at some particular 
entity might be. (In the latter case, especially when it's an entity 
that understands what it's getting into, commits to it, documents it, 
and has enough 'power' in the community to encourage other people to use 
it).   Sometimes we'll be wrong, and we'll discover that.


I am equally frustrated with what I see as the dogmatism of both of 
these absolute points of view: That in a particular case, or even in 
_all_ cases, 1) it's obvious that an http uri is the only right 
solution, or 2) it's obvious that an http uri is an unacceptable solution.


In the cases we're talking about, neither of those things is obvious to 
me.  We're inventing this stuff as we go.  And we need to invent it for 
the real world where people don't always do what we think they ought to, 
not just for our ideal fantasy linked data world.


Jonathan


  


[CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Mike Taylor
Jonathan Rochkind writes:
 > Isn't there always a single point of failure if you are expecting
 > to be able to resolve an http URI via the HTTP protocol?

Yes (modulo the use of multiple servers at a single IP address).  But
failure of a given document is typically not catastrophic -- there are
plenty of other documents out there.  Failure of the PURL server means
failure of every document that has a PURL.

What's more, PURL doesn't _replace_ the existing point of failure, it
just adds another one: remember that purl.org doesn't itself serve any
documents, it just redirects to where the document actually is: for
example, http://purl.org/dc/terms/ redirects to
http://dublincore.org/2008/01/14/dcterms.rdf#
So if either purl.org or dublincore.org goes down, you're nadgered.
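One way to see it: every host in the redirect chain is a separate point of failure. A tiny bit of bookkeeping (no network access, using the purl.org example above):

```python
from urllib.parse import urlparse

def failure_points(redirect_chain):
    # Each host in the chain must be up for the final document to be
    # reachable; the list is the set of ways the link can break.
    return [urlparse(url).netloc for url in redirect_chain]

chain = ["http://purl.org/dc/terms/",
         "http://dublincore.org/2008/01/14/dcterms.rdf"]
print(failure_points(chain))  # ['purl.org', 'dublincore.org']
```

Indirection through PURL lengthens this list by one; it never shortens it.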

 > Now, if you have a collection of disparate http URIs, you have
 > _many_ points of failure in that collection. Any entity goes down
 > or ceases to exist, and the http URIs that resolved to that
 > entity's web server will stop working.
 > 
 > I'd actually rather have a _single_ point of failure, in an
 > organization that resources are being put into to ensure
 > persistence, then hundreds or thousands of points of failure, many
 > of which are organizations that may lack the mission, funding, or
 > understanding to provide reliable persistence.

Sounds like what you want is a single _host_, which really would
either work or not.  At the moment, if you use PURLs, you know none of
them will work if the PURL server goes down, and you still have the
problem of individual server flakiness.

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "You take two bodies and you twirl them into one, their hearts
 and their bones, and they won't come undone" -- Paul Simon,
 "Hearts and Bones"


Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Houghton,Andrew
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Jonathan Rochkind
> Sent: Thursday, April 02, 2009 10:53 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution
> and identification )
> 
> Isn't there always a single point of failure if you are expecting to be
> able to resolve an http URI via the HTTP protocol?
> 
> Whether it's purl.org or not, there's always a single point of failure
> on a given http URI that you expect to resolve via HTTP, the entity
> operating the web server at the specified address. Right?

I think the answer lies in DNS.  Even though you have a single DNS name,
requests can be redirected to one of multiple servers, called a server
farm.  I believe this is how many large sites, like Google, operate.  So
even if a single server fails, the load balancer sends requests to the
other servers.  Even OCLC does this.

> Now, if you have a collection of disparate http URIs, you have _many_
> points of failure in that collection. Any entity goes down or ceases to
> exist, and the http URIs that resolved to that entity's web server will
> stop working.

I think this also gets back to DNS.  Even though you have a single DNS
name, requests can be redirected to servers outside the original request
domain.  So you could have distributed servers under many different domain
names.
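The server-farm idea in miniature, as a round-robin sketch (the addresses are made up; a real farm would sit behind a load balancer or multiple DNS A records, and would also health-check backends):

```python
from itertools import cycle

# One public name, several backends behind it.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
next_server = cycle(servers)

def route_request():
    # Round-robin dispatch: each request goes to the next backend.
    # A real balancer would additionally skip servers that fail checks.
    return next(next_server)

print([route_request() for _ in range(4)])
# -> ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```

The client only ever sees the single DNS name; which machine answers is invisible to it.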


Andy.


[CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Jonathan Rochkind
Isn't there always a single point of failure if you are expecting to be 
able to resolve an http URI via the HTTP protocol?


Whether it's purl.org or not, there's always a single point of failure 
on a given http URI that you expect to resolve via HTTP, the entity 
operating the web server at the specified address. Right?


Now, if you have a collection of disparate http URIs, you have _many_ 
points of failure in that collection. Any entity goes down or ceases to 
exist, and the http URIs that resolved to that entity's web server will 
stop working.


I'd actually rather have a _single_ point of failure, in an organization 
that resources are being put into to ensure persistence, then hundreds 
or thousands of points of failure, many of which are organizations that 
may lack the mission, funding, or understanding to provide reliable 
persistence.


Jonathan



Mike Taylor wrote:

Ross Singer writes:
 > Ray, you are absolutely right.  These would be bad identifiers.  But
 > let's say they're all identical (which I think is what you're saying,
 > right?), then this just strengthens the case for indirection through a
 > service like purl.org.  Then it doesn't *matter* that all of these are
 > different locations, there is one URI that represent the concept of
 > what is being kept at these locations.  At the end of the redirect can
 > be some sort of 300 response that lets the client pick which endpoint
 > is right for them -or arbitrarily chooses one for them.

I have to say I am suspicious of schemes like PURL, which for all
their good points introduce a single point of failure into, well,
everything that uses them.  That can't be good.  Especially as it's
run by the same company that also runs the often-unavailable OpenURL
registry.

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "I don't really think that the end can be assessed as of itself,
 as being the end, because what does the end feel like?  It's like
 trying to extrapolate the end of the universe.  lf the universe
 is indeed infinite, then what does that mean?  How far is all
 the way?  And then if it stops, what's stopping it and what's
 behind what's stopping it?  So 'What is the end?' is my question
 to you" -- David St. Hubbins, _This Is Spinal Tap_.