Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Karen Coyle

Mike Taylor wrote:

Going back to someone's point about living in the real
world (sorry, I forget who), the Inconvenient Truth is that 90% of
programs and 99% of users, on seeing an http: URL, will try to treat
it as a link.  They don't know any better.
  


And they can't know any better because there is no discernible 
difference to either a human being or a program. There is nothing about 
an http URI that tells you what it is being used as and whether there is 
anything at the other end until you send it out over the net where it 
will be processed by http.[*] So we have two things that are identical 
in form but very different in what we can do with them. Isn't this a bad 
idea? There are probably solutions to this; perhaps a particular port 
that indicates that the identifier cannot be dereferenced (I'll suggest 
666), or one that gets data about the resource identified (1 would 
do). But it seems to me that the best solution is to use a URI scheme 
that isn't identical to one already used for a protocol.


And before someone comes up with the statement that when you have a URL 
you don't know beforehand whether you will get something or a 404 
error, and that therefore it's the same as a URI, let me remind you that 404 
is indeed an *error code* that means 'Not found', which implies that 
*not finding anything* is an error. It can't tell you whether there 
*should* have been something there, or whether there was never supposed to be 
anything there, but it does mean that an error has occurred. And if you 
want to make the argument that one could return some other http return 
code, that implies having some program responding to the URI in response 
to http, which is a form of dereferencing. If you're going to do that, you 
might as well include some real info about the thing identified.
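The "identical in form" point above can be seen mechanically. A minimal sketch (both URIs below are invented for illustration, not taken from the thread) showing that a parser extracts exactly the same structure from an http URI meant purely as an identifier and one meant as a fetchable location -- only dereferencing over the network can tell them apart:

```python
from urllib.parse import urlparse

# Two hypothetical http URIs: the first is meant purely as an identifier,
# the second as a fetchable document.  Nothing in the strings themselves
# distinguishes the two uses -- that is exactly the problem described above.
identifier = "http://example.org/id/concept/42"
locator = "http://example.org/page/about.html"

for uri in (identifier, locator):
    parts = urlparse(uri)
    # Same scheme, same structural components -- a program can only guess.
    print(parts.scheme, parts.netloc, parts.path)
```

A program holding either string sees the same scheme and the same hierarchical shape; any difference in intent lives entirely outside the URI.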


kc

[*] This is one of the things that I've always disliked about the DOI -- 
it is an identifier for the resource, but it doesn't necessarily resolve 
to the resource. In fact, some DOIs resolve to the resource, some 
resolve to a page with metadata for the resource, some go to a 
publisher's home page and the resource isn't available in digital 
format. I understand why this is the case, but it makes it hard for 
humans and machines because it isn't clear what you will get when you 
dereference a DOI (and you are encouraged to dereference them because 
they are a sales mechanism). I think that not getting what you want and 
expect when clicking on a link is one of the things that discourages 
users (machines just trundle happily along, but many of us are not 
machines).


--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Alexander Johannesen
On Fri, Apr 3, 2009 at 10:44, Mike Taylor  wrote:
> Going back to someone's point about living in the real
> world (sorry, I forget who), the Inconvenient Truth is that 90% of
> programs and 99% of users, on seeing an http: URL, will try to treat
> it as a link.  They don't know any better.

What on earth is this about? URIs *are* links; it's in their design, it's
what they're supposed to be. Don't design systems where they are treated
any differently. Again we're seeing the "all we need are URIs" poor
judgement of SemWeb enthusiasts muddying the waters. The short of it
is, if you're using URIs as identifiers, having the choice to
dereference them is a *feature*; if one resolves to a 404 then tough (and
I'd say you designed your system poorly), but if it resolves to an
information snippet about the semantic meaning of that URI, then yay.
This is how we Topic Mappers see this whole debacle and flaw in the
SemWeb structure, and we call it Public Subject Indicators, where
"Public" means it resolves to something (just like WikiPedia URIs
resolve to some text that explains what it is representing),
"Subjects" are anything in the world (but distinct from Topics, which
are software representations), and "Indicators" as they indicate
(rather than absolutely identify) things.

In other words, if you use URIs as identifiers (which is a *good*
thing), then resolvability is a feature to be promoted, not something
to be shunned. If you can't manage good systems design, use URNs. You
can treat URI identifiers as both identifiers and subject indicators,
while URNs are evil.

> Let's make our identifiers look like identifiers.

What does that even mean? :)

> (By the way, note that this is NOT what I was saying back at the start
> of the thread.  This means that I have -- *gasp* -- changed my mind!
> Is this a first on the Internet?  :-)

Maybe, but it surely will be the last ...


Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Erik Hetzner
Erik Hetzner writes:
> Could somebody explain to me the way in which this identifier:
> 
> 
> 
> does not work *as an identifier*, absent any way of getting
> information about the referent, in a way that:
> 
> 
> 
> does work?

A quick clarification - before I digest Mike’s thoughts - I didn’t
mean to make a meaningless HTTP URI but a meaningful info URI.

What I was trying to illustrate was a non-dereferenceable URI. So,
for:



please read instead:



Thanks!

best, Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Mike Taylor
I keep telling myself I'm going to stop posting on this thread, but ...

Erik Hetzner writes:
 > Could somebody explain to me the way in which this identifier:
 > 
 > 
 > 
 > does not work *as an identifier*, absent any way of getting
 > information about the referent, in a way that:
 > 
 > 
 > 
 > does work?

We know that the syntax of URIs is scheme:rest.  We know that for
info: URIs (i.e. when the scheme is "info") the syntax of the rest is
namespace/identifier.  So parsing and handling the info: URI is really
easy to do in a clean way with separable pieces of code that have no
special cases.  All you need to know to make this work is that the
identifier is a URI -- the rest follows from established rules.  The
info: URI is more self-describing than the http: URI.  Even for a
human reading these, there is a big difference -- it's pretty much
impossible NOT to recognise what the info: URI identifies, whereas I
have absolutely no idea what the http: URI represents.
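The two-step parse described above can be sketched in a few lines. The DOI value here is an assumption for illustration (10.1000/182 is the commonly cited example DOI), not one mentioned in the thread:

```python
from urllib.parse import urlparse

# An illustrative info: URI; the DOI value is assumed for the sketch.
uri = "info:doi/10.1000/182"

# Step 1: generic URI rules -- split scheme from the rest.
parts = urlparse(uri)
print(parts.scheme)  # info

# Step 2: info-scheme rules -- the rest is namespace/identifier.
namespace, _, identifier = parts.path.partition("/")
print(namespace)     # doi
print(identifier)    # 10.1000/182
```

Each step uses only its own layer's rules, which is the "separable pieces of code with no special cases" point being made.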

No-one disputes that it's _possible_ to use http: URLs as identifiers.
It's _possible_ to use compressed sawdust blocks as building materials
for houses, but people mostly don't do that, because we have better
options to hand which get the job done more efficiently and
appropriately.  Going back to someone's point about living in the real
world (sorry, I forget who), the Inconvenient Truth is that 90% of
programs and 99% of users, on seeing an http: URL, will try to treat
it as a link.  They don't know any better.  Heck, most of the time,
_we_ don't know any better, and it goes without saying that our
insight, experience, charm and rugged good looks make us the elite.
Let's make our identifiers look like identifiers.

(By the way, note that this is NOT what I was saying back at the start
of the thread.  This means that I have -- *gasp* -- changed my mind!
Is this a first on the Internet?  :-)

 _/|____
/o ) \/  Mike Taylorhttp://www.miketaylor.org.uk
)_v__/\  "I've got a slug ..." -- _Parrot Sketch_, Monty Python's Flying
 Circus.


Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Erik Hetzner
At Thu, 2 Apr 2009 11:34:12 -0400,
Jonathan Rochkind wrote:
> […]
>
> I think too much of this conversation is about people's ideal vision of 
> how things _could_ work, rather than trying to make things work as best 
> as we can in the _actual world we live in_, _as well as_ planning for 
> the future when hopefully things will work even better.  You need a 
> balance between the two.

This is a good point. But as I see it, the web people - for lack of a
better word - *are* discussing the world we live in. It is those who
want to re-invent better ways of doing things who are not.

HTTP is here. HTTP works. *Everything* (save one) people want to do
with info: URIs or urn: URIs or whatever already works with HTTP.

I can count one thing that info URIs possess that HTTP URIs don’t: the
‘feature’ of not ever being dereferenceable. And even that is up in
the air - somebody could devise a method to dereference them at any
time. And then where are you?

> […]
>
> a) Are as likely to keep working indefinitely, in the real world of
> organizations with varying levels of understanding, resources, and
> missions.

Could somebody explain to me the way in which this identifier:



does not work *as an identifier*, absent any way of getting
information about the referent, in a way that:



does work?

I don’t mean to be argumentative - I really want to know! I think
there may be something that I am missing here.

> b) Are as likely as possible to be adopted by as many people as possible 
> for inter-operability. Having an ever-increasing number of possible 
> different URIs to represent the same thing is something to be avoided if 
> possible.

+1

> c) Are as useful as possible for the linked data vision.

+1

> […]

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Erik Hetzner
At Thu, 2 Apr 2009 19:29:49 +0100,
Rob Sanderson wrote:
> All I meant by that was that the info:doi/ URI is more informative as to
> what the identifier actually is than just the doi by itself, which could
> be any string.  Equally, if I saw an SRW info URI like:
> 
> info:srw/cql-context-set/2/relevance-1.0
> 
> that's more informative than some ad-hoc URI for the same thing.
> Without the external knowledge that info:doi/xxx is a DOI and
> info:srw/cql-context-set/2/ is a cql context set administered by the
> owner with identifier '2' (which happens to be me), then they're still
> just opaque strings.

Yes, info:doi/10./xxx is more easily recognizable (‘sniffable’) as
a DOI than 10./xxx, both for humans and machines.

If we don’t know, by some external means, that a given string has the
form of some identifier, then we must guess, or sniff it.

But it is good practice to use other means to ensure that we know
whether or not any given string is an identifier, and if it is, what
type it is. Otherwise we can get confused by strings like go:home. Was
that a URI or not?
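The go:home ambiguity can be made concrete. A sketch of a purely syntactic "sniff" using the RFC 3986 scheme grammar (the example strings are illustrative):

```python
import re

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ),
# followed by ":".  This is a syntactic check only -- it cannot tell a
# registered scheme from a coincidental colon in an ordinary string.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")

def looks_like_uri(s: str) -> bool:
    return bool(SCHEME_RE.match(s))

print(looks_like_uri("go:home"))      # True -- but is "go" really a scheme?
print(looks_like_uri("10.1000/xxx"))  # False -- a bare DOI has no scheme
```

Syntax alone says go:home could be a URI; only external knowledge of which schemes exist settles the question, which is the point above.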

That said, I see no reason why the URI:

info:srw/cql-context-set/2/relevance-1.0

is more informative than the URI:

http://srw.org/cql-context-set/2/relevance-1.0

As you say, both are just opaque URIs without the additional
information. This information is provided by, in the first case, the
info-uri registry people, or, in the second case, by the organization
that owns srw.org.

> I could have said that http://srw.cheshire3.org/contextSets/rel/ was the
> identifier for it (SRU doesn't care) but that's the location for the
> retrieval documentation for the context set, not a collection of
> abstract access points.
> 
> If srw.cheshire3.org was to go away, then people can still happily use
> the info URI with the continued knowledge that it shouldn't resolve to
> anything.

If srw.cheshire3.org goes away, people can still happily use the http
URI. (see below)

> With the potential dissolution of DLF, this has real implications, as
> DLF have an info URI namespace.  If they'd registered a bunch of URIs
> with diglib.org instead, which will go away, then people would have
> trouble using them.  Notably when someone else grabs the domain and
> starts using the URIs for something else.

The original URIs are still just as useful as identifiers; they have
just become less useful as dereferenceable identifiers.

> Now if DLF were to disband AND reform, then they can happily go back to
> using info:dlf/ URIs even if they have a brand new domain.

The info:dlf/ URIs would be the same non-dereferenceable URIs they
always were, true. But what have we gained?

The issue of persistence of dereferenceability is a real one. There are
solutions: e.g., other organizations can step in to host the domain;
the ARK scheme; or we can all agree that the diglib.org domain is too
important to let be squatted, and that URIs that begin with
http://diglib.org/ are special and should bypass DNS. [1]

> > I think that all of us in this discussion like URIs. I can’t speak
> > for, say, Andrew, but, tentatively, I think that I prefer
> >  to plain 10.111/xxx. I would just prefer
> > 
> 
> info URIs, In My Opinion, are ideally suited for long term
> identifiers of non information resources. But http URIs are
> definitely better than something which isn't a URI at all.

Something we can all agree on! URIs are better than no URIs.

best,
Erik

1. Take with a grain of salt, as I have not fully thought out the
implications of this.
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Ross Singer
On Thu, Apr 2, 2009 at 3:03 PM, Jonathan Rochkind  wrote:

> Note this isn't as much of a problem for "born web" resources -- nobody's
> going to accidentally create an alternate URI for a dbpedia term, because
> anybody that knows about dbpedia knows that it lives at dbpedia.

Unless they use the corresponding URI from Wikipedia or Freebase.

In short, identifiers are based on social contracts and only
"validated" through use.  Not because some authority or other has
endorsed them, but because they've proliferated through actual, real
world, use.

Different communities might have the reason to use a different
identifier that expresses the *exact same thing* because the syntax or
the format better suits their needs.  Or language.  Or environment.

And there's nothing any governing body or standards document can do to
stop them.

It's obviously a bad time to use this term, but identifiers will not
be produced by standards, but by "market forces", branding and
momentum.

-Ross.


Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Jonathan Rochkind

Rob Sanderson wrote:


info URIs, In My Opinion, are ideally suited for long-term identifiers
of non-information resources.  But http URIs are definitely better than
something which isn't a URI at all.

  
Through this discussion I am clarifying my thoughts on this too. I feel 
that info URIs are especially suited for identifiers that are not only 
long-term identifiers of non-web resources (an ISBN may identify an 
'information' resource, but it's not a web resource), but also 
especially when in addition all of the following are true:


0) Of potential wide-spread (not just local) interest. I.e., NOT a URI for 
a record in my local catalog.
1) The identifier vocabulary itself pre-dates the web and was not 
designed for the web (ISBN, SuDoc).
2) There is not a controlling authority for the identifier vocabulary 
that _recognizes_ its responsibility to maintain persistence _and_ has 
the resources to fulfill that responsibility. That could be because:
   a) There is no single controlling authority at all, the control is 
distributed, and they don't all have their coordinated act together for 
a web world.
   b) The controlling authority hasn't yet realized that these 
identifiers matter for a web world, and doesn't care about URIs.
   c) There's nobody who wants to commit to this because they think 
they can't afford it.



That's what I'm thinking.  URI for a wikipedia concept from dbpedia?  
Sure, use http.  Those aren't going anywhere, because they are 
web-native, they were created to be web-native, the folks that created 
them realize what this means, and as long as their project exists 
they're likely to maintain them, and their project isn't likely to go 
away.


URI for an ISBN or SuDocs?  I don't think the GPO is going anywhere, but 
the GPO isn't committing to supporting an http URI scheme, and whoever 
is, who knows if they're going anywhere. That issue is certainly 
mitigated by Ross using purl.org for these, instead of his own personal 
http URI. But another reason to want a controlling authority 
is to increase the chances that everyone will use the _same_ URI.  If GPO 
were behind the purl.org/NET/sudoc URIs, those chances would be high. 
With just Ross on his own, the chances go down, and later someone else (OCLC, 
GPO, some other guy like Ross) might accidentally create a 'competitor', 
which would be unfortunate. Note this isn't as much of a problem for 
"born web" resources -- nobody's going to accidentally create an 
alternate URI for a dbpedia term, because anybody who knows about 
dbpedia knows that it lives at dbpedia.


So those are my thoughts. Now everyone else can argue bitterly over them 
for a while. :)


And yes, I agree fully that ALL identifiers ought to be expressed as 
_some_ kind of URI.  Once you've done that, you've avoided the most 
important mistake, I think.


Jonathan


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Erik Hetzner
Hi Ray -

At Thu, 2 Apr 2009 13:48:19 -0400,
Ray Denenberg, Library of Congress wrote:
> 
> You're right, if there were a "web:"  URI scheme, the world would be a 
> better place.   But it's not, and the world is worse off for it.

Well, the original concept of the ‘web’ was, as I understand it, to
bring together all the existing protocols (gopher, ftp, etc.), with
the new one in addition (HTTP), with one unifying address scheme, so
that you could have this ‘web browser’ that you could use for
everything. So web: would have been nice, but probably wouldn’t have
been accepted.

As it turns out, HTTP won overwhelmingly, and the older protocols died
off.

> It shouldn't surprise anyone that I am sympathetic to Karen's
> criticisms. Here is some of my historical perspective (which may
> well differ from others').
> 
> Back in the old days, URIs (or URLs) were protocol based. The ftp
> scheme was for retrieving documents via ftp. The telnet scheme was
> for telnet. And so on. Some of you may remember the ZIG (Z39.50
> Implementors Group) back when we developed the z39.50 URI scheme,
> which was around 1995. Most of us were not wise to the ways of the
> web that long ago, but we were told, by those who were, that
> "z39.50r:" and "z39.50s:" at the beginning of a URL are explicit
> indications that the URI is to be resolved by Z39.50.
> 
> A few years later the semantic web was conceived and alot of SW
> people began coining all manner of http URIs that had nothing to do
> with the http protocol. By the time the rest of the world noticed,
> there were so many that it was too late to turn back. So instead,
> history was altered. The company line became "we never told you that
> the URI scheme was tied to a protocol".
> 
> Instead, they should have bit the bullet and coined a new scheme.  They 
> didn't, and that's why we're in the mess we're in.

Not knowing the details of the history, your account seems correct to
me, except that I don’t think the web people tried to alter history.

I think of the web as having been a learning experience for all of us.
Yes, we used to think that the URI was tied to the protocol. But we
have learned that it doesn’t need to be, that HTTP URIs can be just
identifiers which happen to be dereferenceable at the moment using the
HTTP protocol.

And it became useful to begin identifying lots of things, people and
places and so on, using identifiers, and it also seemed useful to use
a protocol that existed (HTTP), instead of coming up with the
Person-Metadata Transfer Protocol and inventing a new URI scheme
(pmtp://...) to resolve metadata about persons. Because HTTP doesn’t
care what kind of data it is sending down the line; it can happily
send metadata about people.

But that is how things grow; the http:// at the beginning of a URI may
eventually be a spandrel, when HTTP is dead and buried. And people
will wonder why the address http://dx.doi.org/10./xxx has those
funny characters in front of it. And doi.org will be long gone,
because they ran out of money, and their domain was taken over by
squatters, so we all had to agree to alter our browsers to include an
override to not use DNS to resolve the dx.doi.org domain but instead
point to a new, distributed system of DOI resolution.

We will need to fix these problems as they arise.

In my opinion, if we are interested in identifier persistence, clarity
about the difference between things and information about things,
creating a more useful web (of data), and the other things we ought to
be interested in, our time is best spent worrying about these things,
and how they can be built on top of the web. Our time is not well
spent in coming up with new ways to do things that the web already does
for us.

For instance: if there is concern that HTTP URIs are not seen as being
persistent, it would be useful to try to add a method to HTTP which
indicated the persistence of an identifier. This way browsers could
display a little icon that indicated that the URI was persistent. A
user could click on this icon and get information about the
institution which claimed persistence for the URI, what the level of
support was, what other institution could back up that claim, etc.
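The mechanism proposed here is entirely hypothetical: HTTP defines no such header. Purely to make the idea concrete, a sketch in which the header name ("Persistence") and its values are invented:

```python
# Entirely hypothetical: HTTP has no "Persistence" response header; the
# header name, value syntax, and institution tag below are all invented
# to illustrate the proposal above.
def describe_persistence(headers: dict) -> str:
    claim = headers.get("Persistence")
    if claim is None:
        return "no persistence claim"
    # A browser could display an icon here, linking to details of the
    # claim: who makes it, at what level of support, who backs it up.
    return "persistence claimed: " + claim

print(describe_persistence({}))
print(describe_persistence(
    {"Persistence": "permanent; who=example-institution"}))
```

The point of the sketch is that the feature layers onto HTTP responses as they already work, rather than requiring a new scheme or resolution infrastructure.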

Our time would not be well spent coming up with an elaborate scheme
for phttp:// URIs, creating a better DNS, with name control by a
better institution, and a better HTTP, with metadata, and a better
caching system, and so on. This is a lot of work and you forget what
you were trying to do in the first place, which is make HTTP URIs
persistent.

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Rob Sanderson
On Thu, 2009-04-02 at 18:11 +0100, Erik Hetzner wrote:
> At Thu, 2 Apr 2009 13:47:50 +0100,
> Mike Taylor wrote:
> > 
> > Erik Hetzner writes:
> >  > Without external knowledge that info:doi/10./xxx is a URI, I can
> >  > only guess.
> > 
> > Yes, that is true.  The point is that by specifying that the rft_id
> > has to be a URI, you can then use other kinds of URI without needing
> > to broaden the specification.
> 
> Thanks for the clarification. Honestly I was also responding to Rob
> Sanderson’s message (bad practice, surely) where he described URIs as
> ‘self-describing’, which seemed to me unclear. URIs are only
> self-describing insofar as they describe what type of URI they are.

All I meant by that was that the info:doi/ URI is more informative as to
what the identifier actually is than just the doi by itself, which could
be any string.  Equally, if I saw an SRW info URI like:

info:srw/cql-context-set/2/relevance-1.0

that's more informative than some ad-hoc URI for the same thing.
Without the external knowledge that info:doi/xxx is a DOI and
info:srw/cql-context-set/2/ is a cql context set administered by the
owner with identifier '2' (which happens to be me), then they're still
just opaque strings.

I could have said that http://srw.cheshire3.org/contextSets/rel/ was the
identifier for it (SRU doesn't care) but that's the location for the
retrieval documentation for the context set, not a collection of
abstract access points.

If srw.cheshire3.org was to go away, then people can still happily use
the info URI with the continued knowledge that it shouldn't resolve to
anything.

With the potential dissolution of DLF, this has real implications, as
DLF have an info URI namespace.  If they'd registered a bunch of URIs
with diglib.org instead, which will go away, then people would have
trouble using them.  Notably when someone else grabs the domain and
starts using the URIs for something else.
Now if DLF were to disband AND reform, then they can happily go back to
using info:dlf/ URIs even if they have a brand new domain.


> I think that all of us in this discussion like URIs. I can’t speak
> for, say, Andrew, but, tentatively, I think that I prefer
>  to plain 10.111/xxx. I would just prefer
> 

info URIs, In My Opinion, are ideally suited for long-term identifiers
of non-information resources.  But http URIs are definitely better than
something which isn't a URI at all.

Rob


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
An account that has a depressing ring of accuracy to it.

Ray Denenberg, Library of Congress writes:
 > You're right, if there were a "web:"  URI scheme, the world would be a 
 > better place.   But it's not, and the world is worse off for it.
 > 
 > It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. 
 > Here is some of my historical perspective (which may well differ from 
 > others').
 > 
 > Back in the old days, URIs (or URLs)  were protocol based.  The ftp scheme 
 > was for retrieving documents via ftp. The telnet scheme was for telnet. And 
 > so on.   Some of you may remember the ZIG (Z39.50 Implementors Group) back 
 > when we developed the z39.50 URI scheme, which was around 1995. Most of us 
 > were not wise to the ways of the web that long ago, but we were told, by 
 > those who were, that "z39.50r:" and "z39.50s:"  at the beginning of a URL 
 > are explicit indications that the URI is to be resolved by Z39.50.
 > 
 > A few years later the semantic web was conceived and a lot of SW people began 
 > coining all manner of http URIs that had nothing to do with the http 
 > protocol.   By the time the rest of the world noticed, there were so many 
 > that it was too late to turn back. So instead, history was altered.  The 
 > company line became "we never told you that the URI scheme was tied to a 
 > protocol".
 > 
 > Instead, they should have bit the bullet and coined a new scheme.  They 
 > didn't, and that's why we're in the mess we're in.
 > 
 > --Ray
 > 
 > 
 > - Original Message - 
 > From: "Houghton,Andrew" 
 > To: 
 > Sent: Thursday, April 02, 2009 9:41 AM
 > Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
 > registering info: uris?)
 > 
 > 
 > >> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 > >> Karen Coyle
 > >> Sent: Wednesday, April 01, 2009 2:26 PM
 > >> To: CODE4LIB@LISTSERV.ND.EDU
 > >> Subject: Re: [CODE4LIB] resolution and identification (was Re:
 > >> [CODE4LIB] registering info: uris?)
 > >>
 > >> This really puzzles me, because I thought http referred to a protocol:
 > >> hypertext transfer protocol. And when you put "http://" in front of
 > >> something you are indicating that you are sending the following string
 > >> along to be processed by that protocol. It implies a certain
 > >> application
 > >> over the web, just as "mailto:" implies a particular application. Yes,
 > >> "http" is the URI for the hypertext transfer protocol. That doesn't
 > >> negate the fact that it indicates a protocol.
 > >
 > > RFC 3986 (URI generic syntax) says that "http:" is a URI scheme not a
 > > protocol.  Just because it says "http" people make all kinds of
 > > assumptions about type of use, persistence, resolvability, etc.  As I
 > > indicated in a prior message, whoever registered the http URI scheme
 > > could have easily used the token "web:" instead of "http:".  All the
 > > URI scheme in RFC 3986 does is indicate what the syntax of the rest
 > > of the URI will look like.  That's all.  You give an excellent
 > > example: mailto.  The mailto URI scheme does not imply a particular
 > > application.  It is a URI scheme with a specific syntax.  That URI
 > > is often resolved with the SMTP (mail) protocol.  Whoever registered
 > > the mailto URI scheme could have specified the token as "smtp:"
 > > instead of "mailto:".
 > >
 > >> My reading of Cool URIs is
 > >> that they use the protocol, not just the URI. If they weren't intended
 > >> to take advantage of http then W3C would have used something else as a
 > >> URI. Read through the Cool URIs document and it's not about
 > >> identifiers,
 > >> it's all about using the *protocol* in service of identifying. Why use
 > >> http?
 > >
 > > I'm assuming here when you say "My reading of Cool URIs..." means reading
 > > the "Cool URIs for the Semantic Web" document and not the "Cool URIs Don't
 > > Change" document.  The "Cool URIs for the Semantic Web" document is about
 > > linked data.  Tim Berners-Lee's four linked data principles state:
 > >
 > >   1. Use URIs as names for things.
 > >   2. Use HTTP URIs so that people can look up those names.
 > >   3. When someone looks up a URI, provide useful information.
 > >   4. Include links to other URIs, so that they can discover more things.
 > >
 > > (2) is an important aspect to linking.  The Web is a hypertext based 
 > > system
 > > that uses HTTP URIs to identify resources.  If you want to link, then you
 > > need to use HTTP URIs.  There is only one protocol, today, that accepts
 > > HTTP URIs as "currency" and it's appropriately called HTTP and defined by
 > > RFC 2616.
 > >
 > > The "Cool URIs for the Semantic Web" document describes how an HTTP 
 > > protocol
 > > implementation (of RFC 2616) should respond to a dereference of an HTTP 
 > > URI.
 > > It's important to understand that URIs are just tokens that *can* be
 > > presented
 > > to a protocol for resolution.  It's up to the protocol to define th

[CODE4LIB] Reminder: Lecture/Discussion, New York Public Library, April 6: "Of Maps and Metadata", Dr. Tim Sherratt, National Archives of Australia

2009-04-02 Thread Mark A. Matienzo
Please accept my apology for any duplicate copies of this message you
might receive. Redistribute as appropriate.
=

http://www.nypl.org/research/calendar/class/hssl/talkdesc.cfm?id=5351

Of maps and metadata:
Explorations in online access at the National Archives of Australia
Dr. Tim Sherratt, National Archives of Australia

Monday, April 6, 2009 – 12 PM-2 PM

South Court Auditorium
Stephen A. Schwartzman Building
The New York Public Library
Fifth Avenue and 42nd Street, New York, NY

This program is free and open to the public.
No registration is necessary.

The National Archives of Australia provides online access to 1.6
million fully-digitized files, totaling more than 20 million digital
images. Another 6 million are described in its online database. This
represents only about ten per cent of the Archives' total holdings,
but it's more than enough to challenge the skills of even the most
experienced researcher. As these numbers continue to grow, issues of
access, findability and visualization will become increasingly
pressing. How will we orient researchers within this mass of data? How
will we help them find what they want? How will we help them use what
they've found?

In November 2008, the National Archives of Australia launched Mapping
our Anzacs, a Google Maps-based
interface to the 376,000 World War I service records in its care.
Through this site, users can browse places around the world where
service people were born or enlisted, following links to digitized
copies of their records. They can also contribute notes and photos
through an online scrapbook. Mapping our Anzacs provides a wholly new
way of accessing and interacting with the collection and has proved
very popular, but how can the ideas underpinning its development be
applied more broadly?

This talk will discuss the past and future of Mapping our Anzacs and
introduce some of the other online initiatives being explored by the
National Archives of Australia.

Dr. Tim Sherratt works as a web content developer at the National
Archives of Australia. He is a historian of Australian science and
culture who has been developing online resources relating to archives
and history since 1993. He has written on weather, progress and the
atomic age, and has developed resources including Bright Sparcs and
Mapping our Anzacs.

--
Mark A. Matienzo
Applications Developer, Digital Experience Group
The New York Public Library


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Ray Denenberg, Library of Congress
You're right, if there were a "web:" URI scheme, the world would be a 
better place.   But there isn't, and the world is worse off for it.


It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. 
Here is some of my historical perspective (which may well differ from 
others').


Back in the old days, URIs (or URLs)  were protocol based.  The ftp scheme 
was for retrieving documents via ftp. The telnet scheme was for telnet. And 
so on.   Some of you may remember the ZIG (Z39.50 Implementors Group) back 
when we developed the z39.50 URI scheme, which was around 1995. Most of us 
were not wise to the ways of the web that long ago, but we were told, by 
those who were, that "z39.50r:" and "z39.50s:"  at the beginning of a URL 
are explicit indications that the URI is to be resolved by Z39.50.


A few years later the semantic web was conceived and a lot of SW people began 
coining all manner of http URIs that had nothing to do with the http 
protocol.   By the time the rest of the world noticed, there were so many 
that it was too late to turn back. So instead, history was altered.  The 
company line became "we never told you that the URI scheme was tied to a 
protocol".


Instead, they should have bitten the bullet and coined a new scheme.  They 
didn't, and that's why we're in the mess we're in.


--Ray


- Original Message - 
From: "Houghton,Andrew" 

To: 
Sent: Thursday, April 02, 2009 9:41 AM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)




From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Karen Coyle
Sent: Wednesday, April 01, 2009 2:26 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re:
[CODE4LIB] registering info: uris?)

This really puzzles me, because I thought http referred to a protocol:
hypertext transfer protocol. And when you put "http://" in front of
something you are indicating that you are sending the following string
along to be processed by that protocol. It implies a certain
application
over the web, just as "mailto:" implies a particular application. Yes,
"http" is the URI for the hypertext transfer protocol. That doesn't
negate the fact that it indicates a protocol.


RFC 3986 (URI generic syntax) says that "http:" is a URI scheme not a
protocol.  Just because it says "http" people make all kinds of
assumptions about type of use, persistence, resolvability, etc.  As I
indicated in a prior message, whoever registered the http URI scheme
could have easily used the token "web:" instead of "http:".  All the
URI scheme in RFC 3986 does is indicate what the syntax of the rest
of the URI will look like.  That's all.  You give an excellent
example: mailto.  The mailto URI scheme does not imply a particular
application.  It is a URI scheme with a specific syntax.  That URI
is often resolved with the SMTP (mail) protocol.  Whoever registered
the mailto URI scheme could have specified the token as "smtp:"
instead of "mailto:".
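Andy's point that a URI scheme is only a syntactic token can be illustrated with Python's standard URI parser (an illustrative aside, not from the thread; the example URIs are made up):

```python
from urllib.parse import urlsplit

# Per RFC 3986 the scheme is just the token before the first colon;
# nothing in the syntax ties it to a resolution protocol.
uris = ["http://example.org/id/123",
        "mailto:someone@example.org",
        "info:doi/10.1000/xyz"]
schemes = [urlsplit(u).scheme for u in uris]
print(schemes)  # the same parse yields three schemes with very different uses
```

The parser treats all three identically; any assumption about resolvability is layered on top by convention, not by the syntax.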


My reading of Cool URIs is
that they use the protocol, not just the URI. If they weren't intended
to take advantage of http then W3C would have used something else as a
URI. Read through the Cool URIs document and it's not about
identifiers,
it's all about using the *protocol* in service of identifying. Why use
http?


I'm assuming here that when you say "My reading of Cool URIs..." you mean
the "Cool URIs for the Semantic Web" document and not the "Cool URIs Don't
Change" document.  The "Cool URIs for the Semantic Web" document is about
linked data.  Tim Berners-Lee's four linked data principles state:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs, so that they can discover more things.

(2) is an important aspect to linking.  The Web is a hypertext-based system
that uses HTTP URIs to identify resources.  If you want to link, then you
need to use HTTP URIs.  There is only one protocol, today, that accepts
HTTP URIs as "currency" and it's appropriately called HTTP and defined by
RFC 2616.

The "Cool URIs for the Semantic Web" document describes how an HTTP protocol
implementation (of RFC 2616) should respond to a dereference of an HTTP URI.
It's important to understand that URIs are just tokens that *can* be presented
to a protocol for resolution.  It's up to the protocol to define the "currency"
that it will accept, e.g., HTTP URIs, and it's up to an implementation of the
protocol to define the "tokens" of that "currency" that it will accept.

It just so happens that HTTP URIs are accepted by the HTTP protocol, but in
the case of mailto URIs they are accepted by the SMTP protocol.  However,
it is important to note that an HTTP user agent, e.g., a browser, accepts
both HTTP and mailto URIs.  It decides that it should send the mailto URI
to an SMTP user agent, e.g., Outlook, Thunderbird, e

Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Erik Hetzner
At Thu, 2 Apr 2009 13:47:50 +0100,
Mike Taylor wrote:
> 
> Erik Hetzner writes:
>  > Without external knowledge that info:doi/10./xxx is a URI, I can
>  > only guess.
> 
> Yes, that is true.  The point is that by specifying that the rft_id
> has to be a URI, you can then use other kinds of URI without needing
> to broaden the specification.  So:
>   info:doi/10./j.1475-4983.2007.00728.x
>   urn:isbn:1234567890
>   ftp://ftp.indexdata.com/pub/yaz
> 
> [Yes, I am throwing in an ftp: URL as an identifier just because I can
> -- please let's not get sidetracked by this very bad idea :-) ]
>
> This is not just hypothetical: the flexibility is useful and the
> encapsulation of the choice within a URI is helpful. I maintain an
> OpenURL resolver that handles rft_id's by invoking a plugin
> depending on what the URI scheme is; for some URI schemes, such as
> info:, that then invokes another, lower-level plugin based on the
> type (e.g. "doi" in the example above). Such code is straightforward
> to write, simple to understand, easy to maintain, and nice to extend
> since all you have to do is provide one more encapsulated plugin.
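The scheme dispatch Mike describes can be sketched as follows (hypothetical names, not the actual resolver code; a handler per URI scheme, with "info:" URIs dispatching again on their namespace token):

```python
# Hypothetical sketch of a plugin-style rft_id resolver: pick a handler
# by URI scheme; the "info" handler dispatches again on its namespace.
def handle_info(uri: str) -> str:
    namespace = uri[len("info:"):].split("/", 1)[0]  # e.g. "doi"
    return f"info plugin -> {namespace} sub-plugin"

def handle_http(uri: str) -> str:
    return "http plugin"

HANDLERS = {"info": handle_info, "http": handle_http}

def resolve(rft_id: str) -> str:
    scheme = rft_id.split(":", 1)[0].lower()
    try:
        return HANDLERS[scheme](rft_id)
    except KeyError:
        raise ValueError(f"no plugin registered for scheme {scheme!r}")
```

Extending the resolver is then just adding one more entry to the handler table, which is the encapsulation benefit being claimed.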

Thanks for the clarification. Honestly I was also responding to Rob
Sanderson’s message (bad practice, surely) where he described URIs as
‘self-describing’, which seemed to me unclear. URIs are only
self-describing insofar as they describe what type of URI they are.

I think that all of us in this discussion like URIs. I can’t speak
for, say, Andrew, but, tentatively, I think that I prefer
 to plain 10.111/xxx. I would just prefer


>  > (Caveat: I have no idea what rft_id, etc, means, so maybe that
>  > changes the meaning of what you are saying from how I read it.)
> 
> No, it doesn't :-)  rft_id is the name of the parameter used in
> OpenURL 1.0 to denote a referent ID, which is the same thing I've been
> calling a Thing Identifier elsewhere in this thread.  The point with
> this part of OpenURL is precisely that you can just shove any
> identifier at the resolver and leave it to do the best job it can.
> Your only responsibility is to ensure that the identifier you give it
> is in the form of a URI, so the resolver can use simple rules to pick
> it apart and decide what to do.

Thanks.

best,
Erik Hetzner




Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Houghton,Andrew writes:
 > > > RFC 3986 (URI generic syntax) says that "http:" is a URI scheme
 > > > not a protocol.  Just because it says "http" people make all
 > > > kinds of assumptions about type of use, persistence,
 > > > resolvability, etc.
 > > 
 > > And RFC 2616 (Hypertext transfer protocol) says:
 > > 
 > > "The HTTP protocol is a request/response protocol. A client sends
 > > a request to the server in the form of a request method, URI, and
 > > protocol version, followed by a MIME-like message containing
 > > request modifiers, client information, and possible body content
 > > over a connection with a server."
 > > 
 > > So what you are saying is that it's ok to use the URI for the
 > > hypertext transfer protocol in a way that ignores RFC 2616. I'm
 > > just not sure how functional that is, in the grand scheme of
 > > things.
 > 
 > You missed the whole point that URIs, specified by RFC 3986, are
 > just tokens that are divorced from protocols, like RFC 2616, but
 > often work in conjunction with them to retrieve a representation of
 > the resource defined by the URI scheme.  It is up to the protocol
 > to decide which URI schemes it will accept.

But if Karen missed that point, and I also miss that point, and a
whole bunch of the other smart people on this list have all missed
this point, then there surely comes a stage in the argument where we
have to pragmatically accept that the point has been missed.  If people on
CODE4LIB don't get this, then the general population is not going to,
either.

I think the W3C are so infatuated (justifiably) with the success of
HTTP that they've lost the perspective to see beyond it.

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "By filing this bug report you have challenged the honor of
 my family.  PREPARE TO DIE!" -- Klingon Programming Mantra


Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Jonathan Rochkind

Houghton,Andrew wrote:

I think the answer lies in DNS. Even though you have a single DNS name
requests could be redirected to one of multiple servers, called a server
farm.  I believe this is how many large sites, like Google, operate.  So
even if a single server fails the load balancer sends requests to other
servers.  Even OCLC does this.
  


Certainly one _could_ do lots of things -- although I'm more worried 
about an _organization_ as a point of failure, than a particular piece 
of hardware. I'm not worried about that with OCLC, but I am worried 
about that with some random programmer minting URIs at some random 
institution that doesn't necessarily understand persistence as part of 
its institutional mission.  With Mike's vision of lots of people 
everywhere minting URIs pointing at their own domains... how many of 
them are actually going to do this? Any time soon?


I think too much of this conversation is about people's ideal vision of 
how things _could_ work, rather than trying to make things work as best 
as we can in the _actual world we live in_, _as well as_ planning for 
the future when hopefully things will work even better.  You need a 
balance between the two.


I also start seeing people in this thread saying "But if you do it that 
way it doesn't work for the Semantic Web (tm)."  Except they are more 
likely to say 'linked data' than 'semantic web', because the latter 
phrase seems to have been somewhat discredited.


The linked data vision is cool, and makes many interesting things 
possible. As parts of it are built, piece by piece, more interesting 
things become possible. dbpedia is awesome.  I want to support such uses 
by making my work compatible with the linked data vision where possible.


But linked data is NOT the only reason or use case for URI identifiers.  
I am also trying to solve particular _right now_ use cases that do not 
neccesarily depend on the linked data vision.  When I do this, I am 
trying to create identifiers (and other schemes) that:


a) Are as likely to keep working indefinitely, in the real world of 
organizations with varying levels of understanding, resources, and 
missions.
b) Are as likely as possible to be adopted by as many people as possible 
for inter-operability. Having an ever-increasing number of possible 
different URIs to represent the same thing is something to be avoided if 
possible.

c) Are as useful as possible for the linked data vision.

These things need to be balanced. I believe that sometimes an info: URI 
is going to be the best balance. Other times an http URI pointing to 
purl.org might be.  Other times an http URI pointing at some particular 
entity might be. (In the latter case, especially when it's an entity 
that understands what it's getting into, commits to it, documents it, 
and has enough 'power' in the community to encourage other people to use 
it).   Sometimes we'll be wrong, and we'll discover that.


I am equally frustrated with what I see as the dogmatism of both of 
these absolute points of view: That in a particular case, or even in 
_all_ cases, 1) it's obvious that an http uri is the only right 
solution, or 2) it's obvious that an http uri is an unacceptable solution.


In the cases we're talking about, neither of those things is obvious to 
me.  We're inventing this stuff as we go.  And we need to invent it for 
the real world where people don't always do what we think they ought to, 
not just for our ideal fantasy linked data world.


Jonathan


  


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Roy Tennant
Uh, before this gets completely out of hand, UC is still running a PURL
server, see for example:



Roy

On 4/2/09 8:07 AM, "Mike Taylor" wrote:

> Karen Coyle writes:
>>> OK, good, then if you are concerned about the PURL services SPOF,
>>> take the freely available PURL software and created a distributed
>>> PURL based system and put it up for the community.  I think
>>> several people have looked at this, but I have not heard of any
>>> progress or implementations.
>> 
>> The California Digital Library ran the PURL software for a while,
>> using it to mint identifiers for digital documents. It was a while
>> back, but someone there may remember how it went.
> 
> Wait, what?  They _were_ running a PURL resolver, but now they're not?
> What does the P in PURL stand for again?
> 
>  _/|_  ___
> /o ) \/  Mike Taylor    http://www.miketaylor.org.uk
> )_v__/\  "Wagner's music is nowhere near as bad as it sounds" -- Mark
> Twain.
> 

-- 


[CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Mike Taylor
Jonathan Rochkind writes:
 > Isn't there always a single point of failure if you are expecting
 > to be able to resolve an http URI via the HTTP protocol?

Yes (modulo the use of multiple servers at a single IP address).  But
failure of a given document is typically not catastrophic -- there are
plenty of other documents out there.  Failure of the PURL server means
failure of every document that has a PURL.

What's more, PURL doesn't _replace_ the existing point of failure, it
just adds another one: remember that purl.org doesn't itself serve any
documents, it just redirects to where the document actually is: for
example, http://purl.org/dc/terms/ redirects to
http://dublincore.org/2008/01/14/dcterms.rdf#
So if either purl.org or dublincore.org goes down, you're nadgered.
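The two-hop dependency can be modeled in a few lines (a toy sketch; the redirect table hard-codes the purl.org example above, and `failure_points` is an invented helper, not real code from the thread):

```python
from urllib.parse import urlsplit

# Toy model of Mike's point: a PURL doesn't replace a point of failure,
# it prepends one. Every host on the redirect chain must be up for the
# URI to dereference.
REDIRECTS = {
    "http://purl.org/dc/terms/":
        "http://dublincore.org/2008/01/14/dcterms.rdf",
}

def failure_points(uri):
    """Hosts that must all be reachable for this URI to dereference."""
    hosts = []
    while uri is not None:
        hosts.append(urlsplit(uri).hostname)
        uri = REDIRECTS.get(uri)
    return hosts
```

A direct URI has one host in its chain; a PURL has at least two, which is the "adds another one" claim in concrete form.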

 > Now, if you have a collection of disparate http URIs, you have
 > _many_ points of failure in that collection. Any entity goes down
 > or ceases to exist, and the http URIs that resolved to that
 > entity's web server will stop working.
 > 
 > I'd actually rather have a _single_ point of failure, in an
 > organization that resources are being put into to ensure
 > persistence, than hundreds or thousands of points of failure, many
 > of which are organizations that may lack the mission, funding, or
 > understanding to provide reliable persistence.

Sounds like what you want is a single _host_, which really would
either work or not.  At the moment, if you use PURLs, you know none of
them will work if the PURL server goes down, and you still have the
problem of individual server flakiness.

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "You take two bodies and you twirl them into one, their hearts
 and their bones, and they won't come undone" -- Paul Simon,
 "Hearts and Bones"


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Jonathan Rochkind

Mike Taylor wrote:

Wait, what?  They _were_ running a PURL resolver, but now they're not?
What does the P in PURL stand for again?
  


Which, Mike, is why I'd rather have a single point of failure at 
purl.org, an organization which understands its persistence mission and 
is likely to be supported (with financial resources) by the library 
community to carry out that mission.  Compared to multiple points of 
failure, many of which may lack the understanding, mission, commitment, 
or resources for persistence. :)


Jonathan



 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "Wagner's music is nowhere near as bad as it sounds" -- Mark
 Twain.

  


Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Houghton,Andrew
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Jonathan Rochkind
> Sent: Thursday, April 02, 2009 10:53 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution
> and identification )
> 
> Isn't there always a single point of failure if you are expecting to be
> able to resolve an http URI via the HTTP protocol?
> 
> Whether it's purl.org or not, there's always a single point of failure
> on a given http URI that you expect to resolve via HTTP, the entity
> operating the web server at the specified address. Right?

I think the answer lies in DNS.  Even though you have a single DNS name,
requests could be redirected to one of multiple servers, called a server
farm.  I believe this is how many large sites, like Google, operate.  So
even if a single server fails the load balancer sends requests to other
servers.  Even OCLC does this.
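Andy's one-name-many-servers idea can be sketched roughly as follows (addresses invented; real round-robin happens in the DNS server or load balancer, not in application code):

```python
import itertools

# Rough model of DNS round-robin: one published name maps to several
# backends, and each lookup hands out the next address in rotation, so
# a single dead server doesn't take the name down.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical A records
_rotation = itertools.cycle(BACKENDS)

def next_backend():
    """Return the next backend address for the (single) DNS name."""
    return next(_rotation)
```

The single *name* survives as long as any backend in the pool does, which is the distinction being drawn between a single point of failure in naming versus in hardware.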

> Now, if you have a collection of disparate http URIs, you have _many_
> points of failure in that collection. Any entity goes down or ceases to
> exist, and the http URIs that resolved to that entity's web server will
> stop working.

I think this also gets back to DNS.  Even though you have a single DNS
name, requests could be redirected to servers outside the original request
domain.  So you could have distributed servers under many different domain
names.


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Karen Coyle writes:
 > > OK, good, then if you are concerned about the PURL services SPOF,
 > > take the freely available PURL software and create a distributed
 > > PURL based system and put it up for the community.  I think
 > > several people have looked at this, but I have not heard of any
 > > progress or implementations.
 > 
 > The California Digital Library ran the PURL software for a while,
 > using it to mint identifiers for digital documents. It was a while
 > back, but someone there may remember how it went.

Wait, what?  They _were_ running a PURL resolver, but now they're not?
What does the P in PURL stand for again?

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "Wagner's music is nowhere near as bad as it sounds" -- Mark
 Twain.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Karen Coyle
> Sent: Thursday, April 02, 2009 10:15 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] resolution and identification (was Re:
> [CODE4LIB] registering info: uris?)
> 
> Houghton,Andrew wrote:
> > RFC 3986 (URI generic syntax) says that "http:" is a URI scheme not a
> > protocol.  Just because it says "http" people make all kinds of
> > assumptions about type of use, persistence, resolvability, etc.
> >
> 
> And RFC 2616 (Hypertext transfer protocol) says:
> 
> "The HTTP protocol is a request/response protocol. A client sends a
> request to the server in the form of a request method, URI, and
> protocol
> version, followed by a MIME-like message containing request modifiers,
> client information, and possible body content over a connection with a
> server."
> 
> So what you are saying is that it's ok to use the URI for the hypertext
> transfer protocol in a way that ignores RFC 2616. I'm just not sure how
> functional that is, in the grand scheme of things.

You missed the whole point that URIs, specified by RFC 3986, are just tokens
that are divorced from protocols, like RFC 2616, but often work in conjunction
with them to retrieve a representation of the resource defined by the URI
scheme.  It is up to the protocol to decide which URI schemes it will 
accept.  In the case of RFC 2616, there is a one-to-one relationship, today,
with the HTTP URI scheme.  RFC 2616 could also have said it would accept other 
URI schemes too or another protocol could be defined, tomorrow, that also 
accepts the HTTP URI scheme, causing the HTTP URI scheme to have a one-to-many 
relationship between its scheme and protocols that accept its scheme.

> And when you say:
> 
> > The "Cool URIs for the Semantic Web" document describes how an HTTP
> protocol
> > implementation (of RFC 2616) should respond to a dereference of an
> HTTP URI.
> 
> I think you are deliberately distorting the intent of the Cool URIs
> document. You seem to read it that *given* an http uri, here is how the
> protocol should respond. But in fact the Cool URIs document asks the
> question "So the question is, what URIs should we use in RDF?" and
> responds that one should use http URIs for the reason that:
> 
> "Given only a URI, machines and people should be able to retrieve a
> description about the resource identified by the URI from the Web. Such
> a look-up mechanism is important to establish shared understanding of
> what a URI identifies. Machines should get RDF data and humans should
> get a readable representation, such as HTML. The standard Web transfer
> protocol, HTTP, should be used."

The answer to the question posed in the document is based on Tim 
Berners-Lee's four linked data principles, one of which states to 
use HTTP URIs.  Nobody, as far as I know, has created a hypertext-based 
system based on the URN or info URI schemes.  The only 
hypertext-based system available today is the Web, which is based on 
the HTTP protocol that accepts HTTP URIs.  So you cannot effectively 
accomplish linked data on the Web without using HTTP URIs.

The document has an RDF / Semantic Web slant, but Tim Berners-Lee's 
four linked data principles say nothing about RDF or the Semantic Web.  
Those four principles might be more aptly named the four linked 
"information" principles for the Web.  Further, the document does go on 
to describe how an HTTP server (an implementation of RFC 2616) should 
respond to requests for Real World Objects, Generic Documents and Web 
Documents, which is based on the W3C TAG decisions for httpRange-14 and 
genericResources-53.

The scope of the document clearly says:

  "This document is a practical guide for implementers of the RDF 
   specification... It explains two approaches for RDF data hosted 
   on HTTP servers..."

Section 2.1 discusses HTTP and content negotiation for Generic Documents.

Section 4 discusses how the HTTP server should respond with diagrams and
actual HTTP status codes to let user agents know which URIs are Real
World Objects vs. Generic Documents and Web Documents, per the W3C TAG
decisions on httpRange-14 and genericResources-53.
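The httpRange-14 behaviour that Section 4 describes amounts to something like this (a minimal sketch with invented paths, not code from the document: a URI naming a real-world object answers 303 See Other pointing at a document about it, while a document URI answers 200):

```python
# Hypothetical server-side routing table: URIs for real-world objects
# redirect (303) to a document describing them; document URIs return 200.
REAL_WORLD_OBJECTS = {
    "/id/alice": "/doc/alice",  # a person, not a web document
}

def respond(path):
    """Return (status, headers) per the httpRange-14 convention."""
    if path in REAL_WORLD_OBJECTS:
        return 303, {"Location": REAL_WORLD_OBJECTS[path]}
    return 200, {}
```

The 303 is how a dereferencing client learns that the URI it asked about names a thing rather than a retrievable document.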

Section 6 directly addresses the question that this thread has been talking
about, namely using new URI schemes, like URN and info and why they are
not acceptable in the context of linked data.

And here is a quote which is what I have said over and over again about
URI being tokens and divorced from protocols:

  "To be truly useful, a new scheme must be accompanied by a protocol 
   defining how to access more information about the identified resource.
   For example, the ftp:// URI scheme identifies resources (files on an 
   FTP server), and also comes with a protocol for accessing them (the 
   FTP protocol)."

  "Some of the new URI schemes provide no such protocol at all. Others 
   provide a Web Service that allows retrieval of descriptions using the 
   HTTP protocol

[CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Jonathan Rochkind
Isn't there always a single point of failure if you are expecting to be 
able to resolve an http URI via the HTTP protocol?


Whether it's purl.org or not, there's always a single point of failure 
on a given http URI that you expect to resolve via HTTP, the entity 
operating the web server at the specified address. Right?


Now, if you have a collection of disparate http URIs, you have _many_ 
points of failure in that collection. Any entity goes down or ceases to 
exist, and the http URIs that resolved to that entity's web server will 
stop working.


I'd actually rather have a _single_ point of failure, in an organization 
that resources are being put into to ensure persistence, then hundreds 
or thousands of points of failure, many of which are organizations that 
may lack the mission, funding, or understanding to provide reliable 
persistence.


Jonathan



Mike Taylor wrote:

Ross Singer writes:
 > Ray, you are absolutely right.  These would be bad identifiers.  But
 > let's say they're all identical (which I think is what you're saying,
 > right?), then this just strengthens the case for indirection through a
 > service like purl.org.  Then it doesn't *matter* that all of these are
 > different locations, there is one URI that represents the concept of
 > what is being kept at these locations.  At the end of the redirect can
 > be some sort of 300 response that lets the client pick which endpoint
 > is right for them -or arbitrarily chooses one for them.

I have to say I am suspicious of schemes like PURL, which for all
their good points introduce a single point of failure into, well,
everything that uses them.  That can't be good.  Especially as it's
run by the same company that also runs the often-unavailable OpenURL
registry.

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "I don't really think that the end can be assessed as of itself,
 as being the end, because what does the end feel like?  It's like
 trying to extrapolate the end of the universe.  lf the universe
 is indeed infinite, then what does that mean?  How far is all
 the way?  And then if it stops, what's stopping it and what's
 behind what's stopping it?  So 'What is the end?' is my question
 to you" -- David St. Hubbins, _This Is Spinal Tap_.

  


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Karen Coyle

Houghton,Andrew wrote:


OK, good, then if you are concerned about the PURL services SPOF, take 
the freely available PURL software and create a distributed PURL-based 
system and put it up for the community.  I think several people have

looked at this, but I have not heard of any progress or implementations.


Andy.

  


The California Digital Library ran the PURL software for a while, using 
it to mint identifiers for digital documents. It was a while back, but 
someone there may remember how it went.


kc

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Houghton,Andrew writes:
 > >  > > I have to say I am suspicious of schemes like PURL, which
 > >  > > for all their good points introduce a single point of
 > >  > > failure into, well, everything that uses them.  That can't
 > >  > > be good.  Especially as it's run by the same company that
 > >  > > also runs the often-unavailable OpenURL registry.
 > >  >
 > >  > What you are saying is that you are suspicious of the HTTP
 > >  > protocol.
 > > 
 > > That is NOT what I am saying.
 > > 
 > > I am saying I am suspicious of a single point of failure.
 > > Especially since the entire architecture of the Internet was
 > > (rightly IMHO) designed with the goal of avoid SPOFs.
 > 
 > OK, good, then if you are concerned about the PURL services SPOF,
 > take the freely available PURL software and create a distributed
 > PURL based system and put it up for the community.

Why would I want to do this when I could just Not Use PURLs?

Anyway, we're way off the subject now -- I guess if we want to argue
about the utility of PURL we could get a room :-)


 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "The cladistic definition of Aves is: an unimportant offshoot of
 the much cooler dinosaur family which somehow managed to survive
 the K/T boundary intact" -- Eric Lurio.


[CODE4LIB] Evergreen conference Early Bird deadline

2009-04-02 Thread K.G. Schneider
If you are planning to attend the Evergreen International Conference
(May 20-22, Athens, Georgia), please note that Early Bird registration
ends tomorrow, Friday, April 3. Also note the NEW conference web
address:

http://www.lyrasis.org/evergreen

We have 18 great programs lined up and two great keynote speakers (Joe
Lucia and Jessamyn West), plus many opportunities for user-directed
activities: lightning talks, table talks, dine-arounds, and hackfest (or
anythingfest) time. Athens is a lovely venue (home of 2007 Code4Lib)
with nice restaurants and pubs.

See the program lineup and more at the conference wiki:

http://evergreen-ils.org/dokuwiki/doku.php?id=eg09:main

Evergreen International Conference 2009 is jointly sponsored by Georgia
Public Library Service, LYRASIS, and Equinox Software, Inc. Hope to see
you there! 

Karen G. Schneider
Community Librarian
Equinox Software, Inc. "The Evergreen Experts"
http://esilibrary.com
k...@esilibrary.com 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Karen Coyle

Houghton,Andrew wrote:

RFC 3986 (URI generic syntax) says that "http:" is a URI scheme not a
protocol.  Just because it says "http" people make all kinds of 
assumptions about type of use, persistence, resolvability, etc.




And RFC 2616 (Hypertext transfer protocol) says:

"The HTTP protocol is a request/response protocol. A client sends a 
request to the server in the form of a request method, URI, and protocol 
version, followed by a MIME-like message containing request modifiers, 
client information, and possible body content over a connection with a 
server."


So what you are saying is that it's ok to use the URI for the hypertext 
transfer protocol in a way that ignores RFC 2616. I'm just not sure how 
functional that is, in the grand scheme of things. And when you say:



The "Cool URIs for the Semantic Web" document describes how an HTTP protocol
implementation (of RFC 2616) should respond to a dereference of an HTTP URI.


I think you are deliberately distorting the intent of the Cool URIs 
document. You seem to read it that *given* an http uri, here is how the 
protocol should respond. But in fact the Cool URIs document asks the 
question "So the question is, what URIs should we use in RDF?" and 
responds that one should use http URIs for the reason that:


"Given only a URI, machines and people should be able to retrieve a 
description about the resource identified by the URI from the Web. Such 
a look-up mechanism is important to establish shared understanding of 
what a URI identifies. Machines should get RDF data and humans should 
get a readable representation, such as HTML. The standard Web transfer 
protocol, HTTP, should be used."


So it doesn't just say how to respond to an http URI; it says to use 
http URIs *because* there is a useful possible response. That's a very 
different statement. It is significant that (as Mike pointed out, perhaps 
inadvertently) no one is using mailto: or ftp: as identifiers. That's 
not a coincidence.
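The look-up mechanism the quoted passage describes is usually implemented with HTTP content negotiation: one URI, different representations depending on the client's Accept header. A minimal sketch of the server-side choice; `pick_representation` is a hypothetical helper name, not something from the thread:

```python
# Sketch of the negotiation behind "machines should get RDF data and
# humans should get a readable representation": the server inspects the
# Accept header and picks a media type for the same URI.  Real servers
# also weigh q-values per the HTTP spec; this is deliberately simplified.

def pick_representation(accept_header: str) -> str:
    """Choose a media type to serve for a single http URI."""
    machine_types = ("application/rdf+xml", "text/turtle")
    if any(t in accept_header for t in machine_types):
        return "application/rdf+xml"  # machine-readable description
    return "text/html"                # human-readable representation

print(pick_representation("application/rdf+xml"))              # semantic web client
print(pick_representation("text/html,application/xhtml+xml"))  # a browser
```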


kc

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Mike Taylor
> Sent: Thursday, April 02, 2009 10:07 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] resolution and identification (was Re:
> [CODE4LIB] registering info: uris?)
> 
> Houghton,Andrew writes:
>  > > I have to say I am suspicious of schemes like PURL, which for all
>  > > their good points introduce a single point of failure into, well,
>  > > everything that uses them.  That can't be good.  Especially as
>  > > it's run by the same company that also runs the often-unavailable
>  > > OpenURL registry.
>  >
>  > What you are saying is that you are suspicious of the HTTP protocol.
> 
> That is NOT what I am saying.
> 
> I am saying I am suspicious of a single point of failure.  Especially
> since the entire architecture of the Internet was (rightly IMHO)
> designed with the goal of avoiding SPOFs.

OK, good. If you are concerned about the PURL service's SPOF, take the 
freely available PURL software, create a distributed PURL-based system, 
and put it up for the community.  I think several people have looked at 
this, but I have not heard of any progress or implementations.


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Houghton,Andrew writes:
 > > I have to say I am suspicious of schemes like PURL, which for all
 > > their good points introduce a single point of failure into, well,
 > > everything that uses them.  That can't be good.  Especially as
 > > it's run by the same company that also runs the often-unavailable
 > > OpenURL registry.
 > 
 > What you are saying is that you are suspicious of the HTTP protocol.

That is NOT what I am saying.

I am saying I am suspicious of a single point of failure.  Especially
since the entire architecture of the Internet was (rightly IMHO)
designed with the goal of avoiding SPOFs.

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "In My Egotistical Opinion, most people's C programs should
 be indented six feet downward and covered with dirt" -- Blair
 P. Houghton.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Mike Taylor
> Sent: Thursday, April 02, 2009 8:41 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] resolution and identification (was Re:
> [CODE4LIB] registering info: uris?)
> 
> I have to say I am suspicious of schemes like PURL, which for all
> their good points introduce a single point of failure into, well,
> everything that uses them.  That can't be good.  Especially as it's
> run by the same company that also runs the often-unavailable OpenURL
> registry.

What you are saying is that you are suspicious of the HTTP protocol.  All
the PURL server does is use mechanisms specified by the HTTP protocol.
Any HTTP server is capable of implementing those same mechanisms.  The
actual PURL server is a community based service that allows people to
create HTTP URIs that redirect to other URIs without having to run an 
actual HTTP server.  If you don't like its single point of failure, then 
create your own in-house service using your existing HTTP server.  I 
believe the source code for the entire PURL service is freely available 
and other people have taken the opportunity to run their own in-house or 
community based service.


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Karen Coyle
> Sent: Wednesday, April 01, 2009 2:26 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] resolution and identification (was Re:
> [CODE4LIB] registering info: uris?)
> 
> This really puzzles me, because I thought http referred to a protocol:
> hypertext transfer protocol. And when you put "http://" in front of
> something you are indicating that you are sending the following string
> along to be processed by that protocol. It implies a certain
> application
> over the web, just as "mailto:" implies a particular application. Yes,
> "http" is the URI for the hypertext transfer protocol. That doesn't
> negate the fact that it indicates a protocol. 

RFC 3986 (URI generic syntax) says that "http:" is a URI scheme not a
protocol.  Just because it says "http" people make all kinds of 
assumptions about type of use, persistence, resolvability, etc.  As I
indicated in a prior message, whoever registered the http URI scheme
could have easily used the token "web:" instead of "http:".  All the
URI scheme in RFC 3986 does is indicate what the syntax of the rest
of the URI will look like.  That's all.  You give an excellent
example: mailto.  The mailto URI scheme does not imply a particular
application.  It is a URI scheme with a specific syntax.  That URI
is often resolved with the SMTP (mail) protocol.  Whoever registered
the mailto URI scheme could have specified the token as "smtp:"
instead of "mailto:".

> My reading of Cool URIs is
> that they use the protocol, not just the URI. If they weren't intended
> to take advantage of http then W3C would have used something else as a
> URI. Read through the Cool URIs document and it's not about
> identifiers,
> it's all about using the *protocol* in service of identifying. Why use
> http?

I'm assuming that when you say "My reading of Cool URIs..." you mean
the "Cool URIs for the Semantic Web" document and not the "Cool URIs Don't
Change" document.  The "Cool URIs for the Semantic Web" document is about
linked data.  Tim Berners-Lee's four linked data principles state:

   1. Use URIs as names for things.
   2. Use HTTP URIs so that people can look up those names.
   3. When someone looks up a URI, provide useful information.
   4. Include links to other URIs, so that they can discover more things.

(2) is an important aspect to linking.  The Web is a hypertext based system
that uses HTTP URIs to identify resources.  If you want to link, then you 
need to use HTTP URIs.  There is only one protocol, today, that accepts 
HTTP URIs as "currency" and it's appropriately called HTTP and defined by 
RFC 2616.

The "Cool URIs for the Semantic Web" document describes how an HTTP protocol
implementation (of RFC 2616) should respond to a dereference of an HTTP URI.
It's important to understand that URIs are just tokens that *can* be presented 
to a protocol for resolution.  It's up to the protocol to define the "currency"
that it will accept, e.g., HTTP URIs, and it's up to an implementation of the
protocol to define the "tokens" of that "currency" that it will accept.

It just so happens that HTTP URIs are accepted by the HTTP protocol, but in
the case of mailto URIs they are accepted by the SMTP protocol.  However,
it is important to note that a HTTP user agent, e.g., a browser, accepts
both HTTP and mailto URIs.  It decides that it should send the mailto URI
to an SMTP user agent, e.g., Outlook, Thunderbird, etc. or it should
dereference the HTTP URI with the HTTP protocol.  In fact the HTTP protocol
doesn't directly accept HTTP URIs.  As part of the dereference process the
HTTP user agent needs to break apart the HTTP URI and present it to the HTTP
protocol.  For example the HTTP URI: http://example.org/ becomes the HTTP 
protocol request:

GET / HTTP/1.1
Host: example.org
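The split described here, where the user agent takes the URI apart and hands the pieces to the protocol, can be sketched in a few lines. `to_http_request` is an illustrative name; real user agents also handle ports, percent-encoding, and much more:

```python
# Sketch of the dereference step described above: break an http URI
# apart and build the HTTP/1.1 request line and Host header from it.
from urllib.parse import urlsplit

def to_http_request(uri: str) -> str:
    parts = urlsplit(uri)
    path = parts.path or "/"       # an empty path dereferences "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"

print(to_http_request("http://example.org/"))
# GET / HTTP/1.1
# Host: example.org
```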

Think of a URI as a minted token.  The New York subway mints tokens to ride 
the subway to get to a destination.  Placing a U.S. quarter or a Boston
subway token in a turnstile will not allow you to pass.  You must use the 
New York subway minted token, e.g., "currency".  URIs are the same.  OCLC 
can mint HTTP URI tokens and LC can mint HTTP URI tokens, both are using
the HTTP URI "currency", but sending LC HTTP URI tokens, e.g., Boston subway
tokens, to OCLC's Web server will most likely result in a 404, you cannot
pass since OCLC's Web server only accepts OCLC tokens, e.g., New York subway
tokens, that identify a resource under its control.


Andy.


Re: [CODE4LIB] resolution and identification

2009-04-02 Thread Mike Taylor
Jonathan Rochkind writes:
 > > Organization need to have a clear understanding of what they are
 > > minting URIs for.
 > 
 > Precisely. And in the real world... they don't always have
 > that. Neither the minters nor the users of URIs, especially the
 > users of http URIs, where you can find so many potential http URIs
 > that are different but seem to refer to the same thing.
 > 
 > ONE of the benefits of info is that the registry process forces
 > minters to develop that clear understanding (to some extent), and
 > documents it for later users.  There are also other pros and cons.
 > 
 > But again, I think http URIs _used appropriately_ can certainly
 > serve the same purpose as info uris.  In actuality, there seems to
 > be a lot of things causing people to use them inappropriately.

This is the best (and, maybe not coincidentally, the most concise)
summary of the issue that I've read.

Houghton,Andrew writes:
 > People see http: and assume that it means the HTTP protocol so it
 > must be a locator.  [...] People don't understand what RFC 3986 is
 > saying.  It makes no claim that any URI registered scheme has
 > persistence or can be dereferenced.  An HTTP URI is just a token to
 > identify some resource, nothing more.

This is technically true ... just as it's technically true that a
female breast is just a piece of fatty tissue.  But, just like
boobies, http: URLs carry a LOT of cultural baggage and all sorts of
connotations -- some just wired into our minds, some coded right down
into our mail-readers and other software -- and they simply cannot be
realistically seen in that light by the great majority of people.

I suppose the bottom line is that, although we all agree that http:
URLs can indeed serve as identifiers, there are lots of good "soft"
reasons why it's useful to be able to tell a location from an
identifier at a glance -- both for busy people and for lazy software.
So to my surprise I am finding myself sort of reconciled with info:
URIs, even though I didn't like them at first.

(Although I'd like them more if I could mint them myself without
needing to go through a registration process, like I can with http:
URLs.  Something like info:bydomain:miketaylor.org.uk/someSchema/1.0)

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  You have to take me in the spirit in which I'm intended.


Re: [CODE4LIB] Pacific Northwest Code4Lib chapter and meeting

2009-04-02 Thread Ed Summers
On Wed, Apr 1, 2009 at 2:12 PM, Reese, Terry
 wrote:
> FYI for the larger group.  Since many members in the PNW simply cannot
> travel to the larger C4L meeting due to budgetary restraints (this year,
> and very likely the next), etc -- we will be starting up a PNW local
> chapter and hosting a one day C4L meeting for those in the area that are
> interested, but maybe otherwise were not able to attend the annual C4L
> meeting.  Info can be found at:
> http://groups.google.com/group/pnwcode4lib?hl=en.  Plus, it will give
> the PNW a group that can start crafting a plan to bring the C4L
> conference back to its PNW home. J

Great news Terry. I would encourage you to continue to use this
discussion list to talk about events, planning and meetings if you
can. Also, if you find yourself outgrowing the wiki and want
pnw.code4lib.org or somesuch to point somewhere just say the word.

//Ed


[CODE4LIB] oclc data sharing policy survey

2009-04-02 Thread Eric Lease Morgan


[Posted on behalf of Jennifer Younger, this is an additional  
announcement regarding the OCLC data sharing policy survey. Please  
consider completing the survey (http://tinyurl.com/ctptuy). --ELM]



Dear Colleague:

As chair of the OCLC Review Board on Shared Data Creation and  
Stewardship, I invite you to participate in a Web-based survey among  
librarians and other interested constituents. The primary goal of this  
survey is to gather input from both OCLC members and non-members about  
a proposed OCLC policy, Policy for Use and Transfer of WorldCat®  
Records.


The Review Board will consider the results of this survey in its  
recommendations to OCLC.


Please review the existing guidelines and proposed policy if you have  
not already done so:


  * Guidelines [1]
  * Proposed Policy [2]

The OCLC Review Board is an independent committee convened by the OCLC  
Board of Trustees and the OCLC Members Council. If you wish to find  
out more about the Review Board, please visit these links:


  * OCLC Review Board [3]
  * Press release - formation of Review Board [4]
  * Press release - Review Board members [5]

The survey is available online at the following site, where you will  
find specific instructions on completing the survey. [6]


I invite you to complete this survey yourself or forward it to a  
colleague with an interest in this issue. Your opinions and comments  
are vital to our evaluation, regardless of your current level of usage  
of OCLC services or your relationship to OCLC.


Please complete the questionnaire online by April 8, 2009. To protect  
the confidentiality of your responses, all data will be collected,  
tabulated, and analyzed by Linray, an independent market research  
consultant. We will receive data in aggregate form only; your answers  
will not be associated in any way with you or your organization.


If you have questions about the content of the survey, please send an  
e-mail to reviewbo...@oclc.org. As an alternative to the survey, we  
welcome your feedback by sending an e-mail to reviewbo...@oclc.org or  
posting comments. [7] Please feel free to provide input in the  
language of your choice.


Thank you for your participation in this survey.

[1] 
http://www.oclc.org/support/documentation/worldcat/records/guidelines/default.htm
[2] http://www.oclc.org/worldcat/catalog/policy/recordusepolicy.pdf
[3] http://www.oclc.org/worldcat/catalog/policy/board/default.htm
[4] http://www.oclc.org/us/en/news/releases/20092.htm
[5] http://www.oclc.org/us/en/news/releases/200910.htm
[6] https://www.surveymonkey.com/s.aspx?sm=c0hILWPafv97EDbNiRXXjg_3d_3d
[7] http://community.oclc.org/reviewboard/

Sincerely,

Jennifer A. Younger
Chair, OCLC Review Board of Shared Data Creation and Stewardship
Edward H. Arnold Director, Hesburgh Libraries
University of Notre Dame






Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Mike Taylor
Erik Hetzner writes:
 > > Not quite.  Embedding a DOI in an info URI (or a URN) means that
 > > the identifier describes its own type.  If you just get the naked
 > > string
 > >10./j.1475-4983.2007.00728.x
 > > passed to you, say as an rft_id in an OpenURL, then you can't
 > > tell (except by guessing) whether it's a DOI, a SICI, and ISBN or
 > > a biological species identifier.  But if you get
 > >info:doi/10./j.1475-4983.2007.00728.x
 > > then you know what you've got, and can act on it accordingly.
 > 
 > It seems to me that you are just pushing out by one more level the
 > mechanism to be able to tell what something is.
 > 
 > That is - before you needed to know that 10./xxx was a DOI. Now
 > you need to know that info:doi/10./xxx is a URI.
 > 
 > Without external knowledge that info:doi/10./xxx is a URI, I can
 > only guess.

Yes, that is true.  The point is that by specifying that the rft_id
has to be a URI, you can then use other kinds of URI without needing
to broaden the specification.  So:
info:doi/10./j.1475-4983.2007.00728.x
urn:isbn:1234567890
ftp://ftp.indexdata.com/pub/yaz

[Yes, I am throwing in an ftp: URL as an identifier just because I can
-- please let's not get sidetracked by this very bad idea :-) ]

This is not just hypothetical: the flexibility is useful and the
encapsulation of the choice within a URI is helpful.  I maintain an
OpenURL resolver that handles rft_id's by invoking a plugin depending
on what the URI scheme is; for some URI schemes, such as info:, that
then invokes another, lower-level plugin based on the type (e.g. "doi"
in the example above).  Such code is straightforward to write, simple
to understand, easy to maintain, and nice to extend since all you have
to do is provide one more encapsulated plugin.
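The two-level dispatch described above might look like the following; the handler names and the example DOI value are hypothetical, not taken from Mike's resolver:

```python
# Sketch of scheme-based rft_id dispatch: a top-level plugin per URI
# scheme, and for info: URIs a second, lower-level plugin per namespace.
# handle_doi/handle_http and the DOI value are illustrative only.

def handle_doi(value):
    return ("doi", value)

def handle_http(uri):
    return ("http", uri)

INFO_PLUGINS = {"doi": handle_doi}
SCHEME_PLUGINS = {"http": handle_http}

def resolve(rft_id: str):
    scheme, _, rest = rft_id.partition(":")
    if scheme == "info":
        namespace, _, value = rest.partition("/")
        return INFO_PLUGINS[namespace](value)   # lower-level plugin
    return SCHEME_PLUGINS[scheme](rft_id)       # top-level plugin

print(resolve("info:doi/10.1234/example"))   # ('doi', '10.1234/example')
```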

 > (Caveat: I have no idea what rft_id, etc, means, so maybe that
 > changes the meaning of what you are saying from how I read it.)

No, it doesn't :-)  rft_id is the name of the parameter used in
OpenURL 1.0 to denote a referent ID, which is the same thing I've been
calling a Thing Identifier elsewhere in this thread.  The point with
this part of OpenURL is precisely that you can just shove any
identifier at the resolver and leave it to do the best job it can.
Your only responsibility is to ensure that the identifier you give it
is in the form of a URI, so the resolver can use simple rules to pick
it apart and decide what to do.

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "There are three rules for writing a novel.  Unfortunately,
 no one knows what they are" -- W. Somerset Maugham.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Ray Denenberg, Library of Congress
> Sent: Wednesday, April 01, 2009 2:38 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] resolution and identification (was Re:
> [CODE4LIB] registering info: uris?)
> 
> No,  not identical URIs.
> 
> Let's say I've put a copy of the schema permanently at each of the
> following
> locations.
>  http://www.loc.gov/standards/mods/v3/mods-3-3.xsd
>  http://www.acme.com//mods-3-3.xsd
>  http://www.takoma.org/standards/mods-3-3.xsd
> 
> Three locations, three URIs.
> 
> But the issue of redirect or even resolution is irrelevant in the use
> case
> I'm citing.   I'm talking about the use of an identifier within a
> protocol,
> for the sole purpose of identifying an object that the recipient of the
> URI
> already has - or if it doesn't have it, it isn't going to retrieve it;
> it
> will just fail the request.   The purpose of the identifier is to
> enable the
> server to determine whether it has the schema that the client is
> looking
> for.  (And by the way that should answer Ed's question about a use
> case.)
> 
> So the server has some table of schemas, in that table is the row:
> 
> ["mods schema"]   [ ]
> 
> It receives the SRU request:
> http://z3950.loc.gov:7090/voyager?
> version=1.1&operation=searchRetrieve&query=dinosaur&maximumRecords=1&recordSchema=<URI identifying the mods schema>
> 
> If the "URI identifying the MODS schema" in the request matches the URI
> in
> the table, then the server knows what schema the client wants, and it
> proceeds.  If there are multiple identifiers then it has to have a row
> in
> its table for each.
> 
> Does that make sense?

Makes absolute sense to me.  Since LC is the "author/creator" of MODS it should
create a Real World Object URI for MODS version 3.3 schema.  So LC now
creates:

http://www.loc.gov/standards/mods/v3.3

Everyone uses that URI for the SRU recordSchema parameter.  What LC has
done is define a URI with the following policy statement:

1) Type of usage: Real World Object (RWO)
2) Persistence: Yes
3) Resolvable: No

Issue resolved.

As a side issue, one could argue that placing a schema at:

>  http://www.loc.gov/standards/mods/v3/mods-3-3.xsd
>  http://www.acme.com//mods-3-3.xsd
>  http://www.takoma.org/standards/mods-3-3.xsd

and not having an authorized location is a recipe for disaster.  One of 
those URIs is the authorized URI; the other two are URI aliases.  So
according to RFC 2616 you would probably want to have the latter two
URIs either redirect 301/302/307 back to the first URI or have the
latter two return a 200 with a Content-Location header containing
the first URI.  Now user agents can figure out which is the authorized
version of the schema and which are URI aliases.
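The two options outlined above can be sketched as follows; `respond` is a hypothetical stand-in for server logic, not an actual server configuration:

```python
# Sketch of the alias policy above: an alias URI either redirects (301)
# to the authorized URI, or answers 200 with a Content-Location header
# naming it.  respond() is an illustrative helper, not a real server.

AUTHORIZED = "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd"

def respond(uri: str, use_redirect: bool = True):
    """Return (status, headers) for a request for the given URI."""
    if uri == AUTHORIZED:
        return 200, {}
    if use_redirect:
        return 301, {"Location": AUTHORIZED}      # alias -> redirect
    return 200, {"Content-Location": AUTHORIZED}  # alias, canonical named

print(respond("http://www.takoma.org/standards/mods-3-3.xsd"))
```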


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Ross Singer writes:
 > Ray, you are absolutely right.  These would be bad identifiers.  But
 > let's say they're all identical (which I think is what you're saying,
 > right?), then this just strengthens the case for indirection through a
 > service like purl.org.  Then it doesn't *matter* that all of these are
 > different locations, there is one URI that represent the concept of
 > what is being kept at these locations.  At the end of the redirect can
 > be some sort of 300 response that lets the client pick which endpoint
 > is right for them -or arbitrarily chooses one for them.

I have to say I am suspicious of schemes like PURL, which for all
their good points introduce a single point of failure into, well,
everything that uses them.  That can't be good.  Especially as it's
run by the same company that also runs the often-unavailable OpenURL
registry.

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "I don't really think that the end can be assessed as of itself,
 as being the end, because what does the end feel like?  It's like
 trying to extrapolate the end of the universe.  If the universe
 is indeed infinite, then what does that mean?  How far is all
 the way?  And then if it stops, what's stopping it and what's
 behind what's stopping it?  So 'What is the end?' is my question
 to you" -- David St. Hubbins, _This Is Spinal Tap_.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Ray Denenberg, Library of Congress
> Sent: Wednesday, April 01, 2009 1:59 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] resolution and identification (was Re:
> [CODE4LIB] registering info: uris?)
> 
> We do just fine minting our URIs at LC, Andy. But we do appreciate your
> concern.

Sorry Ray, that statement wasn't directed at LC in particular, but was a 
general statement.  OCLC doesn’t do any better in this area, especially 
with WorldCat where there are the same issues I pointed out with your 
examples and additional issues to boot.  The point I was trying to make
was *all* organizations need to have clear policies on creating, 
maintaining, persistence, etc.  Failure to do so creates a big mess 
that takes time to fix, often creating headaches for those using an 
organization's URIs.  Take, for example, when NISO redesigned their site 
and broke all the URIs to their standards.  Tim Berners-Lee addresses 
this in his Cool URIs Don't Change article.

> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Ross Singer
> Sent: Wednesday, April 01, 2009 2:07 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] resolution and identification (was Re:
> [CODE4LIB] registering info: uris?)
> 
> Ray, you are absolutely right.  These would be bad identifiers.  But
> let's say they're all identical (which I think is what you're saying,
> right?), then this just strengthens the case for indirection through a
> service like purl.org.  Then it doesn't *matter* that all of these are
> different locations, there is one URI that represents the concept of
> what is being kept at these locations.  At the end of the redirect can
> be some sort of 300 response that lets the client pick which endpoint
> is right for them -or arbitrarily chooses one for them.

Exactly, but purl.org is just using standard HTTP protocol mechanisms 
which could be easily done by LC's site given Ray's examples.

What is at issue is the identification of a Real World Object URI for
MODS v3.3.  Whether I get back an XML schema, a RelaxNG schema, etc.
are just Web Documents or representations of that abstract Real World 
Object.  What Ross did was make the PURL the Real World Object URI for
MODS v3.3 and used it to redirect to the geographically distributed
Web Documents, e.g., representations.  LC could have just as well
minted one under its own domain.


Andy.


[CODE4LIB] mailing list administrativa

2009-04-02 Thread Eric Lease Morgan

This is a bit of mailing list administrativa.

First, the list turned itself off yesterday because we exceeded the 50  
messages/day limit. Hmmm... I have turned the list back on.


Second, you can manage your subscription at the following URL. You  
might want to turn on digest mode:


  http://listserv.nd.edu/archives/code4lib.html

That's all.

--
Eric Lease Morgan, List Owner
University of Notre Dame

574/631-8604


[CODE4LIB] Pacific Northwest Code4Lib chapter and meeting

2009-04-02 Thread Reese, Terry
FYI for the larger group.  Since many members in the PNW simply cannot
travel to the larger C4L meeting due to budgetary restraints (this year,
and very likely the next), etc -- we will be starting up a PNW local
chapter and hosting a one day C4L meeting for those in the area that are
interested, but maybe otherwise were not able to attend the annual C4L
meeting.  Info can be found at:
http://groups.google.com/group/pnwcode4lib?hl=en.  Plus, it will give
the PNW a group that can start crafting a plan to bring the C4L
conference back to its PNW home. J

 

--TR

 

 

***

Terry Reese

The Gray Family Chair for Innovative Library Services

Oregon State University Libraries

Corvallis, OR  97331

tel: 541-737-6384

email: terry.re...@oregonstate.edu

http: http://oregonstate.edu/~reeset

*** 

 

 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Karen Coyle
This really puzzles me, because I thought http referred to a protocol: 
hypertext transfer protocol. And when you put "http://" in front of 
something you are indicating that you are sending the following string 
along to be processed by that protocol. It implies a certain application 
over the web, just as "mailto:" implies a particular application. Yes, 
"http" is the URI for the hypertext transfer protocol. That doesn't 
negate the fact that it indicates a protocol. My reading of Cool URIs is 
that they use the protocol, not just the URI. If they weren't intended 
to take advantage of http then W3C would have used something else as a 
URI. Read through the Cool URIs document and it's not about identifiers, 
it's all about using the *protocol* in service of identifying. Why use 
http? Here's what it says:


"1. Be on the Web.

   Given only a URI, machines and people should be able to retrieve a
   description about the resource identified by the URI from the Web.
   Such a look-up mechanism is important to establish shared
   understanding of what a URI identifies. Machines should get RDF data
   and humans should get a readable representation, such as HTML. The
   standard Web transfer protocol, HTTP, should be used."

It's using the *protocol*. Now, you can argue that W3C should NOT be 
doing this, but it is clear to me that they are not seeking out the 
*pure* identifiers that Mike Taylor talks about, but are creating 
something quite different.


I just don't see how you can use the URI that indicates a particular 
protocol and claim that it doesn't really mean what it says it means.


kc

Ross Singer wrote:

My point is that I don't see how they're different in practice.

And one of them actually allowed you to do something from your email client.

-Ross.

On Wed, Apr 1, 2009 at 1:20 PM, Karen Coyle  wrote:
  

Ross, I don't get your point. My point was about the confusion between two
things that begin: http:// but that are very different in practice. What's
yours?

kc

Ross Singer wrote:


Your email client knew what do with:

info:doi/10./j.1475-4983.2007.00728.x ?

doi:10./j.1475-4983.2007.00728.x ?

Or did you recognize the info:doi scheme and Google it?

Or would this, in the case of 99% of the world, just look like gibberish
or part of some nerd's PGP key?

-Ross.

On Wed, Apr 1, 2009 at 1:06 PM, Karen Coyle  wrote:

  

Ross Singer wrote:



On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle  wrote:


  

But shouldn't we be able to know the difference between an identifier
and
a
locator? Isn't that the problem here? That you don't know which it is
if
it
starts with http://.




But you do if it starts with http://dx.doi.org


  

No, *I* don't. And neither does my email program, since it displayed it
as a
URL (blue and underlined). That's inside knowledge, not part of the
technology. Someone COULD create a web site at that address, and there's
nothing in the URI itself to tell me if it's a URI or a URL.

The general convention is that "http://" is a web address, a location. I
realize that it's also a form of URI, but that's a minority use of http.
This leads to a great deal of confusion. I understand the desire to use
domain names as a way to create unique, managed identifiers, but the http
part is what is causing us problems.

John Kunze's ARK system attempted to work around this by using http to
retrieve information about the URI, so you're not just left guessing.
It's
not a question of resolution, but of giving you a short list of things
that
you can learn about a URI that begins with http. However, again, unless
you
know the secret you have no idea that those particular URI/Ls have that
capability. So again we're going beyond the technology into some human
knowledge that has to be there to take advantage of the capabilities. It
doesn't seem so far fetched to make it possible for programs (dumb, dumb
programs) to know the difference between an identifier and a location
based
on something universal, like a prefix, without having to be coded for
dozens
or hundreds of exceptions.

kc




I still don't see the difference.  The same logic that would be
required to parse and understand the info: uri scheme could be used to
apply towards an http uri scheme.

-Ross.




  

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234






  

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234






  



--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Ray Denenberg, Library of Congress

No,  not identical URIs.

Let's say I've put a copy of the schema permanently at each of the following 
locations.

http://www.loc.gov/standards/mods/v3/mods-3-3.xsd
http://www.acme.com//mods-3-3.xsd
http://www.takoma.org/standards/mods-3-3.xsd

Three locations, three URIs.

But the issue of redirect or even resolution is irrelevant in the use case 
I'm citing.   I'm talking about the use of an identifier within a protocol, 
for the sole purpose of identifying an object that the recipient of the URI 
already has - or, if it doesn't have it, it isn't going to retrieve it; it 
will just fail the request.   The purpose of the identifier is to enable the 
server to determine whether it has the schema that the client is looking 
for.  (And by the way, that should answer Ed's question about a use case.)


So the server has some table of schemas, and in that table is the row:

["mods schema"]   [URI identifying the MODS schema]

It receives the SRU request:
http://z3950.loc.gov:7090/voyager?
version=1.1&operation=searchRetrieve&query=dinosaur&maximumRecords=1&recordSchema=<URI identifying the mods schema>


If the "URI identifying the MODS schema" in the request matches the URI in 
the table, then the server knows what schema the client wants, and it 
proceeds.  If there are multiple identifiers then it has to have a row in 
its table for each.
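Ray's lookup can be sketched in a few lines of Python; the table below is hypothetical (the second, info: identifier is invented for illustration), but the mechanism - match the recordSchema parameter against every URI the server recognizes - is exactly the one described:

```python
from urllib.parse import urlparse, parse_qs

# One row per identifier the server accepts; several URIs may map to the
# same local schema handle. Entries here are illustrative only.
SCHEMA_TABLE = {
    "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd": "mods-3.3",
    "info:example/schema/mods-v3.3": "mods-3.3",  # hypothetical second identifier
}

def schema_for_request(request_url):
    """Return the local schema handle named by recordSchema, or None."""
    params = parse_qs(urlparse(request_url).query)
    uri = params.get("recordSchema", [None])[0]
    return SCHEMA_TABLE.get(uri)   # None means: fail the request

req = ("http://z3950.loc.gov:7090/voyager?version=1.1"
       "&operation=searchRetrieve&query=dinosaur&maximumRecords=1"
       "&recordSchema=http://www.loc.gov/standards/mods/v3/mods-3-3.xsd")
print(schema_for_request(req))  # mods-3.3
```

Note that the server never dereferences the URI; it only compares strings, which is why "identifier, not location" is all that matters in this use case.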


Does that make sense?

--Ray


- Original Message - 
From: "Ross Singer" 

To: 
Sent: Wednesday, April 01, 2009 2:07 PM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)




Ray, you are absolutely right.  These would be bad identifiers.  But
let's say they're all identical (which I think is what you're saying,
right?); then this just strengthens the case for indirection through a
service like purl.org.  Then it doesn't *matter* that all of these are
different locations; there is one URI that represents the concept of
what is being kept at these locations.  At the end of the redirect can
be some sort of 300 response that lets the client pick which endpoint
is right for them - or arbitrarily chooses one for them.
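The client side of that indirection can be sketched without any network traffic: assume the resolver has answered 300 Multiple Choices with a list of equivalent locations (the three from Ray's earlier example), and the client either applies a preference or takes the first arbitrarily. The preference rule here is invented for illustration:

```python
def choose_location(alternates, preferred_hosts=()):
    """Pick one location from a 300 Multiple Choices alternate list."""
    for host in preferred_hosts:
        for url in alternates:
            if host in url:
                return url         # first match for a preferred host
    return alternates[0]           # arbitrary fallback: first listed

alternates = [
    "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd",
    "http://www.acme.com/mods-3-3.xsd",
    "http://www.takoma.org/standards/mods-3-3.xsd",
]
print(choose_location(alternates, preferred_hosts=("takoma.org",)))
print(choose_location(alternates))
```

The single PURL stays stable as the identifier; only this choice among mirrors varies.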

-Ross.

On Wed, Apr 1, 2009 at 1:59 PM, Ray Denenberg, Library of Congress
 wrote:

We do just fine minting our URIs at LC, Andy. But we do appreciate your
concern.

The analysis of our MODS URIs misses the point, I'm afraid. Let's forget
the set I cited (bad example) and assume that the schema is replicated at
several locations (geographically dispersed), all of which are planned to
house the specific version permanently. The suggestion to designate one as
canonical is a good suggestion but it isn't always possible (for various
reasons, possibly political). So I maintain that in this scenario you have
several *locations*, none of which serves well as an identifier. I'm not
arguing (here) that info is better than http (for this scenario), just that
these are not good identifiers.

--Ray

- Original Message - From: "Houghton,Andrew" 
To: 
Sent: Wednesday, April 01, 2009 1:21 PM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB]
registering info: uris?)



From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Karen Coyle
Sent: Wednesday, April 01, 2009 1:06 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re:
[CODE4LIB] registering info: uris?)

The general convention is that "http://"; is a web address, a location.
I
realize that it's also a form of URI, but that's a minority use of
http.
This leads to a great deal of confusion. I understand the desire to use
domain names as a way to create unique, managed identifiers, but the
http part is what is causing us problems.


http:// is an HTTP URI, defined by RFC 3986; loosely, I will agree that
it is a web address. However, it is not a location. URIs according
to RFC 3986 are just tokens to identify resources. These tokens, e.g.,
URIs, are presented to protocol mechanisms as part of the dereferencing
process to locate and retrieve a representation of the resource.

People see http: and assume that it means the HTTP protocol, so it must
be a locator. Whoever initially registered the HTTP URI scheme could
have used "web" as the token instead and we would all be doing:
<web://...>. This is the confusion. People don't understand
what RFC 3986 is saying. It makes no claim that any URI registered
scheme has persistence or can be dereferenced. An HTTP URI is just a
token to identify some resource, nothing more.
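Andy's point can be seen directly in a generic RFC 3986 parser: the scheme is an opaque token, and nothing in the syntax says "http" implies retrievability. A small sketch with Python's urllib.parse, using the hypothetical "web" scheme from the message alongside http and info URIs (the URIs themselves are illustrative):

```python
from urllib.parse import urlsplit

# The same generic parse works for any registered (or made-up) scheme;
# the parser attaches no retrieval semantics to "http" vs "web" vs "info".
for uri in ("http://www.loc.gov/mods/v3",
            "web://www.loc.gov/mods/v3",
            "info:lccn/2002022641"):
    parts = urlsplit(uri)
    print(parts.scheme, parts.netloc, parts.path)
```

Whether any of these can be dereferenced is a property of deployed protocol software, not of the URI syntax.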


Andy.