Re: [CODE4LIB] registering info: uris?
So hey, I'm sure nobody wanted to see this thread revived, but I'm hoping you info URI folks can clear something up for me.

I'm trying to gather together a vocabulary of identifiers to unambiguously describe the format of the data you would be getting in a Jangle feed or an UnAPI response (or any other variation on this theme): "I have a MODS document and I want *you* to have it too!"

Jakob Voss made the (reasonable) suggestion that rather than create yet another identifier or registry to describe these formats, it would make sense to use the work that the SRU:
http://www.loc.gov/standards/sru/resources/schemas.html
or OpenURL:
http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats
communities have already done.

Which makes a lot of sense. It would be nice to use the same identifier in Jangle, SRU and OpenURL to say that this is a MARCXML or ONIX record.

Except that OpenURL and SRU /already use different info URIs to describe the same things/.

info:srw/schema/1/marcxml-v1.1
info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0
info:ofi/fmt:xml:xsd:onix

What is the rationale for this? How do we keep up? Are they reusable? Which one should be used? Doesn't this pretty horribly undermine the purpose of using info URIs in the first place?

Is anybody else interested in working on a way to unambiguously say "here is a Dublin Core resource as XML, but it is not OAI DC" or "this is text/x-vcard, it conforms to vCard 3.0" in a way that we can reuse among all of our various ways of sharing data?

Thanks,
-Ross.
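To make the mismatch concrete, here is a minimal sketch of the kind of local crosswalk a client ends up maintaining while SRU and OpenURL use different info URIs for the same format. The four URIs are the ones quoted in the message; the labels "marcxml" and "onix" and the function names are assumptions made up for this illustration, not part of any registry.

```python
# Crosswalk between the SRU and OpenURL format identifiers quoted above.
# The keys ("marcxml", "onix") are hypothetical local labels, not registered names.
EQUIVALENT_FORMAT_URIS = {
    "marcxml": {
        "info:srw/schema/1/marcxml-v1.1",
        "info:ofi/fmt:xml:xsd:MARC21",
    },
    "onix": {
        "info:srw/schema/1/onix-v2.0",
        "info:ofi/fmt:xml:xsd:onix",
    },
}


def canonical_format(uri):
    """Return the local label for a known format URI, or None if unknown."""
    for label, uris in EQUIVALENT_FORMAT_URIS.items():
        if uri in uris:
            return label
    return None


def same_format(uri_a, uri_b):
    """True only when both URIs are known and map to the same local label."""
    label = canonical_format(uri_a)
    return label is not None and label == canonical_format(uri_b)
```

Every new format means another hand-maintained entry in a table like this, which is the "how do we keep up?" problem in miniature.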
Re: [CODE4LIB] registering info: uris?
From: Ross Singer rossfsin...@gmail.com

> Except that OpenURL and SRU /already use different info URIs to describe the same things/.
>
> info:srw/schema/1/marcxml-v1.1
> info:ofi/fmt:xml:xsd:MARC21
>
> or
>
> info:srw/schema/1/onix-v2.0
> info:ofi/fmt:xml:xsd:onix
>
> What is the rationale for this?

None. (Or, whatever rationale there was, historically, should no longer apply.) These should be aligned. Post this to the OpenURL list (and perhaps SRU as well). I'm certainly willing to work to come up with a solution.

--Ray
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Jonathan Rochkind writes:

> There are trade-offs. I think a lot of that TAG stuff privileges the theoretically pure over the on-the-ground practicalities. They've got a great fantasy in their heads of what the semantic web _could_ be, and I agree it's theoretically sound and _could_ be; but you've got to make it convenient and cheap if you actually want it to happen for real, sometimes sacrificing theoretical purity. And THAT'S one important lesson of the success of the WWW.

Very true and very important. I've seen this stated most succinctly by Clay Shirky: "You cannot simultaneously have mass adoption and rigor."

I hope one day I can come up with eight words as pithy as that.

Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk
"Good craftsmanship may not be art, but good art incorporates good craftsmanship" -- Jane MacDonald.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Alexander Johannesen wrote:

> I think you are quite mistaken on this, but before we leap into whether the web is suitable for SuDoc I'd rather point out that SuDoc isn't web friendly in itself, and *that* more than anything stands in the way of using them with the web.

It stands in the way of using them in the fully realized sem web vision. It does NOT stand in the way of using them in many useful ways that I can and want to use them _right now_. Ways which are MUCH helped by having a URI to refer to them. Whether it can resolve or not (YOU just made the point that a URI doesn't actually need to resolve, right? I'm still confused by this having it both ways -- URIs don't need to resolve, but if your URIs don't resolve then you're doing it wrong. Huh?), if you have a URI for a SuDoc you can use it in any infrastructure set up to accept, store, and relate URIs. Like an OpenURL rft_id, and, yeah, like RDF even. You can make statements about a SuDoc if it has a URI, whether or not it resolves, whether or not SuDoc itself is 'web friendly'.

One step at a time. This is my frustration with semantic web stuff, making it harder to do things that we _could_ do right here and now, because it violates a fantasy of an ideal infrastructure that we may never actually have. There are business costs, as well as technical problems, to be solved to create that ideal fantasy infrastructure. The business costs are _real_.

> Also, having a unified resolver for SuDoc isn't hard, can be at a fixed URL, and use a parameter for identifiers. You don't need to snoop the non-parameterized section of a URI to get the IDs ;

Okay, Alex, why don't you set this up for us then? And commit to providing it persistently indefinitely? Because I don't have the resources to do that. And for the use cases I am confronted with, I don't _need_ it, any old URI, even not resolvable, will do -- yes, as long as I can recognize it as a SuDoc and extract the bare SuDoc out of it.

Which you say I shouldn't be doing (while others say that's a mis-reading of those docs to think I shouldn't be doing it) -- but avoiding doing that would raise the costs of my software quite a bit, and make the feature infeasible in the first place. Business costs and resources _matter_.

I'm being a bit disingenuous here, because rsinger actually already _has_ set something like this up, using purl.org. Which isn't perfect, but it's there, so fine. I still don't even need it for what I'm doing.

> No it's not; if you design your system RESTfully (which, indeed, HTTP is) then the discovery part can be fast, cached, and using URI templates embedded in HTTP responses, fully flexible and fit for your purposes.

Feel free to contribute code to my open source project (Umlaut) to accomplish the things I need to do in an efficient manner while making an HTTP request for every single rft_id that comes in. These URIs are _external_ URIs from third parties, I have no control over whether they are designed RESTfully or not. But if you contribute the code, and it's good code, I'll be happy to use it.

In the meantime, I'll continue trying to balance functionality, maintainability, future expansion, and the programming and hardware resources available to me, same as I always do, here in the real world when we're building production apps, not R&D experiments, where we don't have complete control over the entire environment we operate in.

You telling me that everything would work great _if only_ everyone in the whole world that I need to inter-operate with did things the way you say they should -- does absolutely nothing for me. And this, again, is my frustration with many of these semantic web arguments I'm hearing -- describing an ideal fantasy world that doesn't exist, but insisting we act as if it does, even if that means putting barriers in the way of actually getting things done.

I'd like to actually get things done while moving bit-by-bit toward the semantic web vision. I can't if the semantic web vision insists that everything must be perfect, and disallows alternate solutions, alternate trade-offs, and alternate compromises. I don't have time for that, I'm building actual production apps with limited resources.

Jonathan
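As an illustration of the "any infrastructure set up to accept, store, and relate URIs" point, here is a minimal sketch of carrying an identifier URI in an OpenURL rft_id and reading it back on the receiving side, with no resolution of the URI involved. The resolver base URL, the sample SuDoc URI in the usage note, and the function names are assumptions for this sketch; only the url_ver and rft_id keys come from OpenURL 1.0 key/value usage.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical link resolver address, used only for illustration.
RESOLVER_BASE = "http://resolver.example.org/openurl"


def openurl_for(identifier_uri):
    """Build an OpenURL whose rft_id carries the given identifier URI."""
    query = urlencode({"url_ver": "Z39.88-2004", "rft_id": identifier_uri})
    return f"{RESOLVER_BASE}?{query}"


def rft_ids(openurl):
    """Return every rft_id value found in an incoming OpenURL query string."""
    return parse_qs(urlsplit(openurl).query).get("rft_id", [])
```

Calling rft_ids(openurl_for("http://example.org/sudoc/Y%201.1%2F7%3A109-118")) hands back the same URI string that went in; the example.org URI is a made-up placeholder, not a real SuDoc namespace.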
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Hiya,

On Thu, Apr 16, 2009 at 01:10, Jonathan Rochkind rochk...@jhu.edu wrote:

> It stands in the way of using them in the fully realized sem web vision.

Ok, I'm puzzled. How? As the SemWeb vision is all about first-order logic over triplets, and the triplets are defined as URIs, if you can pop something into a URI you're good to go. So how is it that SuDoc doesn't fit into this, as you *can* chuck it in a URI? I said it was unfriendly to the Web, not impossible.

> It does NOT stand in the way of using them in many useful ways that I can and want to use them _right now_.

Ah, but then go fix it.

> Ways which having a URI to refer to them are MUCH helped by. Whether it can resolve or not (YOU just made the point that a URI doesn't actually need to resolve, right? I'm still confused by this having it both ways -- URIs don't need to resolve, but if your URIs don't resolve then you're doing it wrong. Huh?)

C'mon, it ain't *that* hard. :) URIs as identifiers is fine, having them resolve as well is great. What's so confusing about that?

> , if you have a URI for a SuDoc you can use it in any infrastructure set up to accept, store, and relate URIs. Like an OpenURL rft_id, and, yeah, like RDF even. You can make statements about a SuDoc if it has a URI, whether or not it resolves, whether or not SuDoc itself is 'web friendly'. One step at a time. This is my frustration with semantic web stuff, making it harder to do things that we _could_ do right here and now, because it violates a fantasy of an ideal infrastructure that we may never actually have.

Huh? The people who made SuDoc didn't make it web friendly, and thus the SemWeb stuff is harder to do because it lives on the web? (And chucking your metadata into HTML as MF or RDF snippets ain't that hard, it just requires a minimum of knowledge.)

> There are business costs, as well as technical problems, to be solved to create that ideal fantasy infrastructure. The business costs are _real_

No more real than the cost currently in place. The thing is that a lot of people see the traditional cost disappear with the advent of SemWeb and the new costs heavily reduced.

>> Also, having a unified resolver for SuDoc isn't hard, can be at a fixed URL, and use a parameter for identifiers. You don't need to snoop the non-parameterized section of a URI to get the IDs ;
>
> Okay, Alex, why don't you set this up for us then?

Why? I don't give a rat's bottom about SuDoc, don't need it, think it's poorly designed, and it gives me nothing in life. Why should I bother? (Unless I'm given money for it, then I'll start caring ... :)

> And commit to providing it persistently indefinitely? Because I don't have the resources to do that.

Who's behind SuDoc, and are they serious about their creation? Those are the people you should send your anger to instead.

> And for the use cases I am confronted with, I don't _need_ it, any old URI, even not resolvable, will do -- yes, as long as I can recognize it as a SuDoc and extract the bare SuDoc out of it.

So what's the problem with just making some stuff up? If you can do your thing in a vacuum I don't fully understand your problem with the SemWeb stuff? If you don't want it, don't use it.

> Which you say I shouldn't be doing (while others say that's a mis-reading of those docs to think I shouldn't be doing it)

No, I think this one is the subtle difference between a URL and a URI.

> but avoiding doing that would raise the costs of my software quite a bit, and make the feature infeasible in the first place. Business costs and resources _matter_.

As with anything on the Web, you work with what you've got, and if you can fix and share your fix, we all will love you for it. I seriously don't think I understand what you're getting at here; it's been this way since the Web popped into existence, and I don't really want it to be any other way.

>> No it's not; if you design your system RESTfully (which, indeed, HTTP is) then the discovery part can be fast, cached, and using URI templates embedded in HTTP responses, fully flexible and fit for your purposes.
>
> These URIs are _external_ URIs from third parties, I have no control over whether they are designed RESTfully or not.

Not sure I follow this one. There are no good or bad RESTful URIs, just URIs. REST is how your framework works with the URIs.

> In the meantime, I'll continue trying to balance functionality, maintainability, future expansion, and the programming and hardware resources available to me, same as I always do, here in the real world when we're building production apps, not R&D experiments

My day job is to balance functionality, maintainability, future expansion, and the programming and hardware resources available to me, same as I always do, here in the real world when we're building production apps ... and I'm using Topic Maps and SemWeb technologies. Is there something I'm doing which degrades my work to an R&D experiment, something I should let my customers
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs. Correct me if I'm wrong.

I don't entirely agree with either dogmatic side here, but I do think that we've arrived at an awfully confusing (for developers) environment. Re-reading the various semantic web TAG position papers people keep referencing, I actually don't entirely agree with all of their principles in practice.

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander Johannesen [alexander.johanne...@gmail.com]
Sent: Tuesday, April 14, 2009 9:27 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

Hiya,

Been meaning to jump into this discussion for a while, but I've been off to an alternative universe and I can't even say it's good to be back. :) Anywhoo ...

On Fri, Apr 3, 2009 at 03:48, Ray Denenberg, Library of Congress r...@loc.gov wrote:

> You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it.

I'm rather confused by this statement. The web: URI scheme? The Web *is* the URI scheme; they are all identifiers to resources (ftp: http: gopher: https: etc.), and together they make up, the, um, web of things. What am I missing?

> Back in the old days, URIs (or URLs) were protocol based.

No, which one do you mean, URIs or URLs?

> The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on.

Again, have I missed something? This has changed, as opposed to the good old days?

> A few years later the semantic web was conceived and a lot of SW people began coining all manner of http URIs that had nothing to do with the http protocol.

I've been browsing back and forth through this discussion, and couldn't find much to back this up. What do you mean by this?

> Instead, they should have bit the bullet and coined a new scheme. They didn't, and that's why we're in the mess we're in.

I'm sorry, but mess? Did you know the messiness of the web is probably what made it successful? Not to mention that having URIs be identifiers *and* have the ability to resolve them is a bonus; they're identifiers of things (as they've always been, as I'm sure you know URI stands for Uniform Resource Identifier, right? :), as in they consist of a string of characters used to identify or name a resource on the Internet. And then, if you so choose, you can use the protocol level to *resolve* them. Not sure how anyone can consider this to be bad, though. Or is this just a misunderstanding of the difference between URIs and URLs?

Kind regards,
Alexander
-- --- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps -- http://shelter.nu/blog/
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Jonathan Rochkind rochk...@jhu.edu

> The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs.

The URL is alive and well.

The W3C definition, http://www.w3.org/TR/uri-clarification/ :

a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network location), rather than by some other attributes it may have. Thus as we noted, http: is a URI scheme. An http URI is a URL.

SRU, for example, considers its request to be a URL.

I do think this conversation has played itself out.

--Ray
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
On Tue, Apr 14, 2009 at 23:34, Jonathan Rochkind rochk...@jhu.edu wrote:

> The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs. Correct me if I'm wrong.

Sure it exists: URLs are a subset of URIs. URLs are locators as opposed to just identifiers (which is an important distinction, much used in SemWeb lingo), where URLs are closer to the protocol-like things Ray describes (or so I think).

> I don't entirely agree with either dogmatic side here, but I do think that we've arrived at an awfully confusing (for developers) environment.

But what about it is confusing (apart from us having this discussion :) ? Is it that we have IDs that happen to *also* resolve? And why is that confusing?

> Re-reading the various semantic web TAG position papers people keep referencing, I actually don't entirely agree with all of their principles in practice.

Well, let me just say that there's more to SemWeb than what comes out of W3C. :)

Kind regards,
Alex
-- --- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps -- http://shelter.nu/blog/
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Can you show me where this definition of a URL vs. a URI is made in any RFC or standard-like document? Sure, we have a _sense_ of how the connotation is different, but I don't think that sense is actually formalized anywhere. And that's part of what makes it confusing, yeah.

I think the sem web crowd actually embraces this confusingness, they want to have it both ways: "Oh, a URI doesn't need to resolve, it's just an opaque identifier; but you really should use http URIs for all URIs; why? because it's important that they resolve."

In general, combining two functions in one mechanism is a dangerous and confusing thing to do in data design, in my opinion. By analogy, it's what gets a lot of MARC/AACR2 into trouble. It's also often a very convenient thing to do, and convenience matters. Although ironically, my problem with some of those TAG documents is actually that they privilege pure theory over practical convenience.

Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html

They suggest: URI opacity: 'Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.'

I understand why that makes sense in theory, but it's entirely impractical for me, as I discovered with the SuDoc experiment (which turned out to be a useful experiment at least in understanding my own requirements). If I get a URI representing (e.g.) a SuDoc (or an ISSN, or an LCCN), I need to be able to tell from the URI alone that it IS a SuDoc, AND I need to be able to extract the actual SuDoc identifier from it. That completely violates their Opacity requirement, but it's entirely infeasible to require me to make an individual HTTP request for every URI I find, to figure out what it IS.

Infeasible for performance and cost reasons, and infeasible because it requires a lot more development effort at BOTH ends -- it means that every single URI _would_ have to de-reference to an RDF representation capable of telling me it identifies a SuDoc and what the actual bare SuDoc is. Contrary to the protestations that a URI is different than a URL and does not need to resolve, following the opacity recommendation/requirement would mean that resolution would be absolutely required in order for me to use it. Meaning that someone minting the URI would have to provide that infrastructure, and I as a client would have to write code to use it.

But I just want a darn SuDoc in a URI -- and there are advantages to putting a SuDoc in a URI _precisely_ so it can be used in URI-using infrastructures like RDF, and these advantages hold _even if_ it's not resolvable and we ignore the 'opacity' recommendation.

There are trade-offs. I think a lot of that TAG stuff privileges the theoretically pure over the on-the-ground practicalities. They've got a great fantasy in their heads of what the semantic web _could_ be, and I agree it's theoretically sound and _could_ be; but you've got to make it convenient and cheap if you actually want it to happen for real, sometimes sacrificing theoretical purity. And THAT'S one important lesson of the success of the WWW.

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander Johannesen [alexander.johanne...@gmail.com]
Sent: Tuesday, April 14, 2009 9:48 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

On Tue, Apr 14, 2009 at 23:34, Jonathan Rochkind rochk...@jhu.edu wrote:

> The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs. Correct me if I'm wrong.

Sure it exists: URLs are a subset of URIs. URLs are locators as opposed to just identifiers (which is an important distinction, much used in SemWeb lingo), where URLs are closer to the protocol-like things Ray describes (or so I think).

> I don't entirely agree with either dogmatic side here, but I do think that we've arrived at an awfully confusing (for developers) environment.

But what about it is confusing (apart from us having this discussion :) ? Is it that we have IDs that happen to *also* resolve? And why is that confusing?

> Re-reading the various semantic web TAG position papers people keep referencing, I actually don't entirely agree with all of their principles in practice.

Well, let me just say that there's more to SemWeb than what comes out of W3C. :)

Kind regards,
Alex
-- --- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps -- http://shelter.nu/blog/
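The requirement Jonathan describes (tell from the URI alone that it is a SuDoc, and pull the bare SuDoc out, with no HTTP request) can be sketched roughly as below. The prefix http://example.org/sudoc/ is a made-up placeholder, not the namespace actually used in the SuDoc experiment, and the sample SuDoc value in the usage note is illustrative only.

```python
from urllib.parse import unquote

# Hypothetical URI prefix agreed on for SuDoc identifiers (an assumption for
# this sketch; substitute whatever prefix the minter actually documents).
SUDOC_PREFIX = "http://example.org/sudoc/"


def extract_sudoc(uri):
    """Return the bare SuDoc carried in the URI, or None if it is not one.

    Works purely on the string: nothing is dereferenced.
    """
    if not uri.startswith(SUDOC_PREFIX):
        return None
    bare = unquote(uri[len(SUDOC_PREFIX):])  # undo percent-encoding
    return bare or None  # an empty remainder is not a usable identifier
```

For example, extract_sudoc("http://example.org/sudoc/Y%201.1%2F7%3A109-118") gives "Y 1.1/7:109-118", while a URI outside the prefix gives None. The check costs one string comparison per incoming rft_id, which is the performance argument made above.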
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Thanks Ray. By that definition ALL http URIs are URLs, a priori. I read Alexander as trying to make a different distinction.

Ray Denenberg, Library of Congress wrote:

> From: Jonathan Rochkind rochk...@jhu.edu
>
>> The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs.
>
> The URL is alive and well.
>
> The W3C definition, http://www.w3.org/TR/uri-clarification/ :
>
> a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network location), rather than by some other attributes it may have. Thus as we noted, http: is a URI scheme. An http URI is a URL.
>
> SRU, for example, considers its request to be a URL.
>
> I do think this conversation has played itself out.
>
> --Ray
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind
Sent: Tuesday, April 14, 2009 10:21 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

> Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html
>
> They suggest: URI opacity: 'Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.'
>
> I understand why that makes sense in theory, but it's entirely impractical for me, as I discovered with the SuDoc experiment (which turned out to be a useful experiment at least in understanding my own requirements). If I get a URI representing (e.g.) a SuDoc (or an ISSN, or an LCCN), I need to be able to tell from the URI alone that it IS a SuDoc, AND I need to be able to extract the actual SuDoc identifier from it. That completely violates their Opacity requirement, but it's entirely infeasible to require me to make an individual HTTP request for every URI I find, to figure out what it IS.

Jonathan, you need to take URI opacity in context. The document is correct in suggesting that user agents should not attempt to infer properties of the referenced resource. The Architecture of the Web is also clear on this point and includes an example. Just because a resource URI ends in .html does not mean that HTML will be the representation being returned. A user agent that looks at the end of the URI to see if it ends in .html is inferring a property, namely that the Web document will be returned as HTML. If you really want to know for sure you need to dereference it with a HEAD request.

Now having said that, URI opacity applies to user agents dealing with *any* URIs that they come across in the wild. They should not try to infer any semantics from the URI itself. However, this doesn't mean that the minter of a URI cannot make a policy decision for a group of URIs under their control that contain semantics. In your example, you made a policy decision about the URIs you were minting for SuDocs such that the actual SuDoc identifier would appear someplace in the URI. This is perfectly fine and is the basis for REST URIs, but understand you created a specific policy statement for those URIs, and if a user agent is aware of your policy statements about the URIs you mint, then it can infer semantics from the URIs you minted.

Does that break URI opacity from a user agent's perspective? No. It just means that those user agents who know about your policy can infer semantics from your URIs and those that don't should not infer any semantics because they don't know what the policies are. E.g., you could be returning PDF representations when the URI ends in .html, if that was your policy, and the only way for a user agent that doesn't know the policies to find that out is to dereference the URI with either HEAD or GET.

Andy.
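One way to read Andy's distinction in code, as a rough sketch only: a client infers semantics from a URI only when it matches a prefix whose minting policy it knows, and otherwise falls back to dereferencing with HEAD. The KNOWN_POLICIES table, its example.org prefix, and the function names are assumptions invented for this illustration.

```python
from urllib.request import Request, urlopen

# Prefixes whose minting policy this client knows about (hypothetical entry).
KNOWN_POLICIES = {
    "http://example.org/sudoc/": "sudoc",
}


def head_content_type(uri, timeout=10):
    """Dereference with HEAD and report the media type the server declares."""
    request = Request(uri, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type()


def classify(uri, dereference=head_content_type):
    """Infer from the URI only under a known policy; otherwise ask the server."""
    for prefix, kind in KNOWN_POLICIES.items():
        if uri.startswith(prefix):
            return ("policy", kind)
    # No known policy: the ".html" suffix (or anything else in the string)
    # tells us nothing reliable, so the representation type comes from HEAD.
    return ("dereferenced", dereference(uri))
```

The dereference parameter exists so the network call can be swapped out, for example with a stub in tests or a cached lookup in production.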
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Am I not an agent making use of a URI who is attempting to infer properties from it? Like that it represents a SuDoc, and in particular what that SuDoc is?

If this kind of talmudic parsing of the TAG recommendations to figure out what they _really_ mean is necessary, I stand by my statement that the environment those TAG documents are encouraging is a confusing one.

Jonathan
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
The User Agent is understood to be a typical browser, or other piece of software, like wget, curl, etc. It's the thing implementing the client side of the specs. I don't think you are operating as a user agent here as much as you are a server application. That is, assuming I have any idea what you're actually doing.

--Joe

On Tue, Apr 14, 2009 at 11:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

> Am I not an agent making use of a URI who is attempting to infer properties from it? Like that it represents a SuDoc, and in particular what that SuDoc is?
>
> If this kind of talmudic parsing of the TAG recommendations to figure out what they _really_ mean is necessary, I stand by my statement that the environment those TAG documents are encouraging is a confusing one.
>
> Jonathan
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
On Wed, Apr 15, 2009 at 00:20, Jonathan Rochkind rochk...@jhu.edu wrote: Can you show me where this definition of a URL vs. a URI is made in any RFC or standard-like document? From http://www.faqs.org/rfcs/rfc3986.html ; 1.1.3. URI, URL, and URN A URI can be further classified as a locator, a name, or both. The term Uniform Resource Locator (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network location). The term Uniform Resource Name (URN) has been used historically to refer to both URIs under the urn scheme [RFC2141], which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable, and to any other URI with the properties of a name. An individual scheme does not have to be classified as being just one of name or locator. Instances of URIs from any given scheme may have the characteristics of names or locators or both, often depending on the persistence and care in the assignment of identifiers by the naming authority, rather than on any quality of the scheme. Future specifications and related documentation should use the general term URI rather than the more restrictive terms URL and URN [RFC3305]. As you can see, an URI is an identifier, and a URL is a locator (mechanism for retrieval), and since a URL is a subset of an URI, you _can_ resolve URIs as well. Sure, we have a _sense_ of how the connotation is different, but I don't think that sense is actually formalized anywhere. 
It is, and the same stuff is documented in WikiPedia as well ; http://en.wikipedia.org/wiki/Uniform_Resource_Identifier http://en.wikipedia.org/wiki/Uniform_Resource_Locator I think the sem web crowd actually embraces this confusingness, No, I think they take it at face value; they(the URIs) are identifiers for things, and can be used for just that purpose, but they are also URLs which mean they resolve to something. What I think you're coming at is that something thing it resolves too, as *that* has no definition. But then, if you go from RDF to Topic Maps PSIs (PSIs are URIs with an extended meaning), *that* thing it resolves to indeed has a definition; it's the prose explaining what the identifier identifies, and this is the most important difference between RDF and Topic Maps (and a very subtle but important difference, too). they want to have it both ways: Oh, a URI doesn't need to resolve, it's just an opaque identifier; but you really should use http URIs for all URIs; why? because it's important that they resolve. I smell straw-man. :) But yes, they do want both, as both is in fact a friggin' smart thing to have. We all deal with identifiers all the time, in internal as external applications, so why not use an indetifier scheme that has the added bonus of adding a resolver mechanism? If you want to be stupid and lock yourself in your limited world, then using them as just identifiers is fine but perhaps a bit, well, stupid. But if you want to be smart about it, realizing that without ontological work there will *never* be proper interop, you use those identifiers and let them resolve to something. And if you're really smart, you let them resolve to either more RDF statements, or, if you're seriously Einsteinly smart, use PSIs (as in Topic Maps) :). In general, combining two functions in one mechanism is a dangerous and confusing thing to do in data design, in my opinion. Because ... ? By analogy, it's what gets a lot of MARC/AACR2 into trouble. 
Hmm, and I thought it was crap design that did that, coupled with poor metadata constraints and validation channels, untyped fields, poor tooling, the lack of machine understandability, and the general library idiom of not invented here. But correct me if I'm wrong. :) Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html Umm, I'd be wary to take as canon a draft with editorial notes going back 4 to 5 years that still aren't resolved. In other words, this document isn't relevant to the real world. Yet. They suggest: URI opacity 'Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.' Well, as a RESTafarian I understand this argument quite well. It's about not assuming too much from the internal structure of the URI. Again, it's an identifier, not a scheme such as an URL where structure is defined. Again, for URIs, don't assume structure because at this point it isn't an URL. If I get a URI representing (eg) a Sudoc (or an ISSN, or an LCCN), I need to be able to tell from the URI alone that it IS a Sudoc, AND I need to be able to extract the actual SuDoc identifier from it. That completely violates their Opacity requirement I think you are quite mistaken on this, but before we leap into wheter the web is suitable for SuDoc I'd
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Well, the thing is, those sem web folks LIKE what has resulted. They think it's _good_ that http:// can be resolved with a certain protocol in some cases, but can be an arbitrary identifier untied to protocol in others. It definitely is convenient in some cases. I have mixed feelings, I don't think it's a disaster, but I'm not sure it's always a good idea. Jonathan From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Mike Taylor [m...@indexdata.com] Sent: Thursday, April 02, 2009 2:33 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) An account that has a depressing ring of accuracy to it. Ray Denenberg, Library of Congress writes: You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it. It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. Here is some of my historical perspective (which may well differ from others'). Back in the old days, URIs (or URLs) were protocol based. The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on. Some of you may remember the ZIG (Z39.50 Implementors Group) back when we developed the z39.50 URI scheme, which was around 1995. Most of us were not wise to the ways of the web that long ago, but we were told, by those who were, that z39.50r: and z39.50s: at the beginning of a URL are explicit indications that the URI is to be resolved by Z39.50. A few years later the semantic web was conceived and alot of SW people began coining all manner of http URIs that had nothing to do with the http protocol. By the time the rest of the world noticed, there were so many that it was too late to turn back. So instead, history was altered. The company line became we never told you that the URI scheme was tied to a protocol. Instead, they should have bit the bullet and coined a new scheme. 
They didn't, and that's why we're in the mess we're in. --Ray - Original Message - From: Houghton,Andrew hough...@oclc.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Thursday, April 02, 2009 9:41 AM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 2:26 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) This really puzzles me, because I thought http referred to a protocol: hypertext transfer protocol. And when you put http://; in front of something you are indicating that you are sending the following string along to be processed by that protocol. It implies a certain application over the web, just as mailto:; implies a particular application. Yes, http is the URI for the hypertext transfer protocol. That doesn't negate the fact that it indicates a protocol. RFC 3986 (URI generic syntax) says that http: is a URI scheme not a protocol. Just because it says http people make all kinds of assumptions about type of use, persistence, resolvability, etc. As I indicated in a prior message, whoever registered the http URI scheme could have easily used the token web: instead of http:. All the URI scheme in RFC 3986 does is indicate what the syntax of the rest of the URI will look like. That's all. You give an excellent example: mailto. The mailto URI scheme does not imply a particular application. It is a URI scheme with a specific syntax. That URI is often resolved with the SMTP (mail) protocol. Whoever registered the mailto URI scheme could have specified the token as smtp: instead of mailto:;. My reading of Cool URIs is that they use the protocol, not just the URI. If they weren't intended to take advantage of http then W3C would have used something else as a URI. 
Read through the Cool URIs document and it's not about identifiers, it's all about using the *protocol* in service of identifying. Why use http? I'm assuming here when you say My reading of Cool URIs... means reading the Cool URIs for the Semantic Web document and not the Cool URIs Don't Change document. The Cool URIs for the Semantic Web document is about linked data. Tim Berners-Lee's four linked data principles state: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information. 4. Include links to other URIs, so that they can discover more things. (2) is an important aspect to linking. The Web is a hypertext based system that uses HTTP URIs to identify resources. If you want to link, then you
Re: [CODE4LIB] registering info: uris?
no, that's not at all what it implies. the ofi/name identifiers were minted as identifiers for namespaces of identifiers, not as a wrapper scheme for the identifiers themselves. Yes, it's a bit TOO meta, but they can be safely ignored unless a new profile is desired. On Apr 5, 2009, at 10:31 AM, Karen Coyle wrote: Jonathan Rochkind wrote: URI for an ISBN or SuDocs? I don't think the GPO is going anywhere, but the GPO isn't committing to supporting an http URI scheme, and whoever is, who knows if they're going anywhere. That issue is certainly mitigated by Ross using purl.org for these, instead of his own personal http URI. But another issue that makes us want a controlling authority is increasing the chances that everyone will use the _same_ URI. If GPO were behind the purl.org/NET/sudoc URIs, those chances would be high. Just Ross on his own, the chances go down, later someone else (OCLC, GPO, some other guy like Ross) might accidentally create a 'competitor', which would be unfortunate. Note this isn't as much of a problem for born web resources -- nobody's going to accidentally create an alternate URI for a dbpedia term, because anybody that knows about dbpedia knows that it lives at dbpedia. So those are my thoughts. Now everyone else can argue bitterly over them for a while. :) The ones that really puzzle me, however, are the OpenURL info namespace URIs for ftp, http, https and info. This implies that EVERY identifier used by OpenURL needs an info URI, even if it is a URI in its own right. They are under info:ofi/nam which is called Namespace reserved for registry identifiers of namespaces. There's something so circular about this that I just get a brain dump when I try to understand it. Does it make sense to anyone? kc -- --- Karen Coyle / Digital Library Consultant kco...@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234 Eric Hellman http://hellman.net/eric/
Re: [CODE4LIB] registering info: uris?
Karen Coyle wrote: The ones that really puzzle me, however, are the OpenURL info namespace URIs for ftp, http, https and info. This implies that EVERY identifier used by OpenURL needs an info URI, even if it is a URI in its own right. They are under info:ofi/nam which is called Namespace reserved for registry identifiers of namespaces. There's something so circular about this that I just get a brain dump when I try to understand it. Does it make sense to anyone? No, it does not make sense to anyone, as far as I can tell. Jonathan kc
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
No, not identical URIs. Let's say I've put a copy of the schema permanently at each of the following locations. http://www.loc.gov/standards/mods/v3/mods-3-3.xsd http://www.acme.com//mods-3-3.xsd http://www.takoma.org/standards/mods-3-3.xsd Three locations, three URIs. But the issue of redirect or even resolution is irrelevant in the use case I'm citing. I'm talking about the use of an identifier within a protocol, for the sole purpose of identifying an object that the recipient of the URI already has - or if it doesn't have it it isn't going to retrieve it, it will just fail the request. The purpose of the identifier is to enable the server to determine whether it has the schema that the client is looking for. (And by the way that should answer Ed's question about a use case.) So the server has some table of schemas, in that table is the row: [mods schema] [URI identifying the mods schema] It receives the SRU request: http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=dinosaur&maximumRecords=1&recordSchema=[URI identifying the mods schema] If the URI identifying the MODS schema in the request matches the URI in the table, then the server knows what schema the client wants, and it proceeds. If there are multiple identifiers then it has to have a row in its table for each. Does that make sense? --Ray - Original Message - From: Ross Singer rossfsin...@gmail.com To: CODE4LIB@LISTSERV.ND.EDU Sent: Wednesday, April 01, 2009 2:07 PM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) Ray, you are absolutely right. These would be bad identifiers. But let's say they're all identical (which I think is what you're saying, right?), then this just strengthens the case for indirection through a service like purl.org. Then it doesn't *matter* that all of these are different locations, there is one URI that represents the concept of what is being kept at these locations. 
At the end of the redirect can be some sort of 300 response that lets the client pick which endpoint is right for them, or arbitrarily chooses one for them. -Ross. On Wed, Apr 1, 2009 at 1:59 PM, Ray Denenberg, Library of Congress r...@loc.gov wrote: We do just fine minting our URIs at LC, Andy. But we do appreciate your concern. The analysis of our MODS URIs misses the point, I'm afraid. Let's forget the set I cited (bad example) and assume that the schema is replicated at several locations (geographically dispersed) all of which are planned to house the specific version permanently. The suggestion to designate one as canonical is a good suggestion but it isn't always possible (for various reasons, possibly political). So I maintain that in this scenario you have several *locations* none of which serves well as an identifier. I'm not arguing (here) that info is better than http (for this scenario) just that these are not good identifiers. --Ray - Original Message - From: Houghton,Andrew hough...@oclc.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Wednesday, April 01, 2009 1:21 PM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 1:06 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) The general convention is that http:// is a web address, a location. I realize that it's also a form of URI, but that's a minority use of http. This leads to a great deal of confusion. I understand the desire to use domain names as a way to create unique, managed identifiers, but the http part is what is causing us problems. http:// is an HTTP URI, defined by RFC 3986, loosely I will agree that it is a web address. However, it is not a location. URIs according to RFC 3986 are just tokens to identify resources. 
These tokens, e.g., URIs are presented to protocol mechanisms as part of the dereferencing process to locate and retrieve a representation of the resource. People see http: and assume that it means the HTTP protocol so it must be a locator. Whoever initially registered the HTTP URI scheme could have used web as the token instead and we would all be doing: web://example.org/. This is the confusion. People don't understand what RFC 3986 is saying. It makes no claim that any URI registered scheme has persistence or can be dereferenced. An HTTP URI is just a token to identify some resource, nothing more. Andy.
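Ray's description earlier in this message, of a server matching the recordSchema value against a table of schema identifiers, might look something like the sketch below. It is only an illustration under stated assumptions: the table name SCHEMA_TABLE, the function name and the short schema labels are invented here, and the identifiers in the table are simply ones quoted in this thread. The point it shows is that the match is a plain string comparison, so every alternative identifier for the same schema needs its own row and nothing is ever dereferenced.

```python
# Hypothetical server-side table: schema identifier -> internal schema name.
# Each identifier the server is willing to recognise needs its own row.
SCHEMA_TABLE = {
    "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd": "mods",
    "info:srw/schema/1/marcxml-v1.1": "marcxml",   # SRU identifier
    "info:ofi/fmt:xml:xsd:MARC21": "marcxml",      # OpenURL identifier, same format
}

def resolve_record_schema(record_schema):
    """Return the internal schema name, or None if the server lacks it.

    The URI is compared as an opaque string; it is never retrieved.
    """
    return SCHEMA_TABLE.get(record_schema)
```

Under this sketch a request naming one of the other mirror locations of the MODS schema would simply fail, unless someone adds a row for it.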
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ray Denenberg, Library of Congress Sent: Wednesday, April 01, 2009 1:59 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) We do just fine minting our URIs at LC, Andy. But we do appreciate your concern. Sorry Ray, that statement wasn't directed at LC in particular, but was a general statement. OCLC doesn’t do any better in this area, especially with WorldCat where there are the same issues I pointed out with your examples and additional issues to boot. The point I was trying to make was *all* organizations need to have clear policies on creating and maintaining URIs, their persistence, etc. Failure to do so creates a big mess that takes time to fix, often creating headaches for those using an organization's URIs. Take for example when NISO redesigned their site and broke all the URIs to their standards. Tim Berners-Lee addresses this in his Cool URIs Don't Change article. From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Wednesday, April 01, 2009 2:07 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) Ray, you are absolutely right. These would be bad identifiers. But let's say they're all identical (which I think is what you're saying, right?), then this just strengthens the case for indirection through a service like purl.org. Then it doesn't *matter* that all of these are different locations, there is one URI that represents the concept of what is being kept at these locations. At the end of the redirect can be some sort of 300 response that lets the client pick which endpoint is right for them, or arbitrarily chooses one for them. Exactly, but purl.org is just using standard HTTP protocol mechanisms which could be easily done by LC's site given Ray's examples. 
What is at issue is the identification of a Real World Object URI for MODS v3.3. Whether I get back an XML schema, a RelaxNG schema, etc. are just Web Documents or representations of that abstract Real World Object. What Ross did was make the PURL the Real World Object URI for MODS v3.3 and used it to redirect to the geographically distributed Web Documents, e.g., representations. LC could have just as well minted one under its own domain. Andy.
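The indirection Ross and I are describing, one URI for the thing and several Web Documents behind it, can be sketched as a function that decides what an HTTP server would answer. This is a rough illustration only: the mirror list reuses Ray's three example locations, the function name and the preferred_host parameter are made up here, and the choice of 302 for the redirect and a text/uri-list body for the 300 response are assumptions, not a description of what purl.org actually sends.

```python
from urllib.parse import urlsplit

# Ray's three example locations for the same schema (copied from the thread).
MIRRORS = [
    "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd",
    "http://www.acme.com//mods-3-3.xsd",
    "http://www.takoma.org/standards/mods-3-3.xsd",
]

def redirect_response(preferred_host=None):
    """Return (status, headers, body) for a request to the single identifying URI.

    If the client's preferred host matches a mirror, redirect to it;
    otherwise answer 300 Multiple Choices and list every location.
    """
    for url in MIRRORS:
        if preferred_host and urlsplit(url).netloc == preferred_host:
            return 302, {"Location": url}, ""
    return 300, {"Content-Type": "text/uri-list"}, "\n".join(MIRRORS)
```

Any HTTP server can be configured to behave like this, which is why the same job could be done under LC's own domain.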
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 2:26 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) This really puzzles me, because I thought http referred to a protocol: hypertext transfer protocol. And when you put http://; in front of something you are indicating that you are sending the following string along to be processed by that protocol. It implies a certain application over the web, just as mailto:; implies a particular application. Yes, http is the URI for the hypertext transfer protocol. That doesn't negate the fact that it indicates a protocol. RFC 3986 (URI generic syntax) says that http: is a URI scheme not a protocol. Just because it says http people make all kinds of assumptions about type of use, persistence, resolvability, etc. As I indicated in a prior message, whoever registered the http URI scheme could have easily used the token web: instead of http:. All the URI scheme in RFC 3986 does is indicate what the syntax of the rest of the URI will look like. That's all. You give an excellent example: mailto. The mailto URI scheme does not imply a particular application. It is a URI scheme with a specific syntax. That URI is often resolved with the SMTP (mail) protocol. Whoever registered the mailto URI scheme could have specified the token as smtp: instead of mailto:;. My reading of Cool URIs is that they use the protocol, not just the URI. If they weren't intended to take advantage of http then W3C would have used something else as a URI. Read through the Cool URIs document and it's not about identifiers, it's all about using the *protocol* in service of identifying. Why use http? I'm assuming here when you say My reading of Cool URIs... means reading the Cool URIs for the Semantic Web document and not the Cool URIs Don't Change document. 
The Cool URIs for the Semantic Web document is about linked data. Tim Berners-Lee's four linked data principles state: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information. 4. Include links to other URIs, so that they can discover more things. (2) is an important aspect to linking. The Web is a hypertext based system that uses HTTP URIs to identify resources. If you want to link, then you need to use HTTP URIs. There is only one protocol, today, that accepts HTTP URIs as currency and it's appropriately called HTTP and defined by RFC 2616. The Cool URIs for the Semantic Web document describes how an HTTP protocol implementation (of RFC 2616) should respond to a dereference of an HTTP URI. It's important to understand the URIs are just tokens that *can* be presented to a protocol for resolution. It's up to the protocol to define the currency that it will accept, e.g., HTTP URIs, and it's up to an implementation of the protocol to define the tokens of that currency that it will accept. It just so happens that HTTP URIs are accepted by the HTTP protocol, but in the case of mailto URIs they are accepted by the SMTP protocol. However, it is important to note that a HTTP user agent, e.g., a browser, accepts both HTTP and mailto URIs. It decides that it should send the mailto URI to an SMTP user agent, e.g., Outlook, Thunderbird, etc. or it should dereference the HTTP URI with the HTTP protocol. In fact the HTTP protocol doesn't directly accept HTTP URIs. As part of the dereference process the HTTP user agent needs to break apart the HTTP URI and present it to the HTTP protocol. For example the HTTP URI: http://example.org/ becomes the HTTP protocol request: GET / HTTP/1.1 Host: example.org Think of a URI as a minted token. The New York subway mints tokens to ride the subway to get to a destination. Placing a U.S. 
quarter or a Boston subway token in a turnstile will not allow you to pass. You must use the New York subway minted token, e.g., currency. URIs are the same. OCLC can mint HTTP URI tokens and LC can mint HTTP URI tokens, both are using the HTTP URI currency, but sending LC HTTP URI tokens, e.g., Boston subway tokens, to OCLC's Web server will most likely result in a 404, you cannot pass since OCLC's Web server only accepts OCLC tokens, e.g., New York subway tokens, that identify a resource under its control. Andy.
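The step where a user agent breaks an HTTP URI apart before presenting it to the protocol can be sketched in a few lines. This is a minimal illustration, assuming a bare GET with only a Host header; the function name to_http_request is made up for the example, and a real user agent sends more headers and handles ports, fragments and https.

```python
from urllib.parse import urlsplit

def to_http_request(uri):
    """Turn an http URI into the text of a minimal HTTP/1.1 GET request."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        # Other schemes (mailto, info, ...) are not currency for this protocol.
        raise ValueError("not an http URI: " + uri)
    target = parts.path or "/"          # an empty path is requested as "/"
    if parts.query:
        target += "?" + parts.query
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
```

Note that the scheme token itself never appears in the request; only the host and path do, which is the sense in which the URI is a token that the user agent converts into something the protocol accepts.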
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Thursday, April 02, 2009 8:41 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) I have to say I am suspicious of schemes like PURL, which for all their good points introduce a single point of failure into, well, everything that uses them. That can't be good. Especially as it's run by the same company that also runs the often-unavailable OpenURL registry. What you are saying is that you are suspicious of the HTTP protocol. All the PURL server does is use mechanisms specified by the HTTP protocol. Any HTTP server is capable of implementing those same mechanisms. The actual PURL server is a community based service that allows people to create HTTP URIs that redirect to other URIs without having to run an actual HTTP server. If you don't like its single point of failure, then create your own in-house service using your existing HTTP server. I believe the source code for the entire PURL service is freely available and other people have taken the opportunity to run their own in-house or community based service. Andy.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Houghton,Andrew writes: I have to say I am suspicious of schemes like PURL, which for all their good points introduce a single point of failure into, well, everything that uses them. That can't be good. Especially as it's run by the same company that also runs the often-unavailable OpenURL registry. What you are saying is that you are suspicious of the HTTP protocol. That is NOT what I am saying. I am saying I am suspicious of a single point of failure. Especially since the entire architecture of the Internet was (rightly IMHO) designed with the goal of avoiding SPOFs. Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt -- Blair P. Houghton.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Houghton,Andrew wrote: RFC 3986 (URI generic syntax) says that http: is a URI scheme not a protocol. Just because it says http people make all kinds of assumptions about type of use, persistence, resolvability, etc. And RFC 2616 (Hypertext transfer protocol) says: The HTTP protocol is a request/response protocol. A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server. So what you are saying is that it's ok to use the URI for the hypertext transfer protocol in a way that ignores RFC 2616. I'm just not sure how functional that is, in the grand scheme of things. And when you say: The Cool URIs for the Semantic Web document describes how an HTTP protocol implementation (of RFC 2616) should respond to a dereference of an HTTP URI. I think you are deliberately distorting the intent of the Cool URIs document. You seem to read it that *given* an http uri, here is how the protocol should respond. But in fact the Cool URIs document asks the question So the question is, what URIs should we use in RDF? and responds that one should use http URIs for the reason that: Given only a URI, machines and people should be able to retrieve a description about the resource identified by the URI from the Web. Such a look-up mechanism is important to establish shared understanding of what a URI identifies. Machines should get RDF data and humans should get a readable representation, such as HTML. The standard Web transfer protocol, HTTP, should be used. So it doesn't just say how to respond to an http URI; it says to use http URIs *because* there is a useful possible response. That's a very different statement. It is significant that (as Mike pointed out, perhaps inadvertently) no one is using mailto: or ftp: as identifiers. That's not a coincidence. 
kc -- --- Karen Coyle / Digital Library Consultant kco...@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Houghton,Andrew writes: I have to say I am suspicious of schemes like PURL, which for all their good points introduce a single point of failure into, well, everything that uses them. That can't be good. Especially as it's run by the same company that also runs the often-unavailable OpenURL registry. What you are saying is that you are suspicious of the HTTP protocol. That is NOT what I am saying. I am saying I am suspicious of a single point of failure. Especially since the entire architecture of the Internet was (rightly IMHO) designed with the goal of avoiding SPOFs. OK, good, then if you are concerned about the PURL services SPOF, take the freely available PURL software and create a distributed PURL based system and put it up for the community. Why would I want to do this when I could just Not Use PURLs? Anyway, we're way off the subject now -- I guess if we want to argue about the utility of PURL we could get a room :-) Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk The cladistic definition of Aves is: an unimportant offshoot of the much cooler dinosaur family which somehow managed to survive the K/T boundary intact -- Eric Lurio.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Houghton,Andrew wrote: OK, good, then if you are concerned about the PURL services SPOF, take the freely available PURL software and create a distributed PURL based system and put it up for the community. I think several people have looked at this, but I have not heard of any progress or implementations. Andy. The California Digital Library ran the PURL software for a while, using it to mint identifiers for digital documents. It was a while back, but someone there may remember how it went. kc -- --- Karen Coyle / Digital Library Consultant kco...@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Thursday, April 02, 2009 10:15 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) Houghton,Andrew wrote: RFC 3986 (URI generic syntax) says that http: is a URI scheme not a protocol. Just because it says http people make all kinds of assumptions about type of use, persistence, resolvability, etc. And RFC 2616 (Hypertext transfer protocol) says: The HTTP protocol is a request/response protocol. A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server. So what you are saying is that it's ok to use the URI for the hypertext transfer protocol in a way that ignores RFC 2616. I'm just not sure how functional that is, in the grand scheme of things. You missed the whole point that URIs, specified by RFC 3986, are just tokens that are divorced from protocols, like RFC 2616, but often work in conjunction with them to retrieve a representation of the resource defined by the URI scheme. It is up to the protocol to decide which URI schemes that it will accept. In the case of RFC 2616, there is a one-to-one relationship, today, with the HTTP URI scheme. RFC 2616 could also have said it would accept other URI schemes too or another protocol could be defined, tomorrow, that also accepts the HTTP URI scheme, causing the HTTP URI scheme to have a one-to-many relationship between its scheme and protocols that accept its scheme. And when you say: The Cool URIs for the Semantic Web document describes how an HTTP protocol implementation (of RFC 2616) should respond to a dereference of an HTTP URI. I think you are deliberating distorting the intent of the Cool URIs document. 
You seem to read it that *given* an http uri, here is how the protocol should respond. But in fact the Cool URIs document asks the question "So the question is, what URIs should we use in RDF?" and responds that one should use http URIs for the reason that: "Given only a URI, machines and people should be able to retrieve a description about the resource identified by the URI from the Web. Such a look-up mechanism is important to establish shared understanding of what a URI identifies. Machines should get RDF data and humans should get a readable representation, such as HTML. The standard Web transfer protocol, HTTP, should be used." The answer to the question posed in the document is based on Tim Berners-Lee's four linked data principles, one of which states to use HTTP URIs. Nobody, as far as I know, has created a hypertext based system based on the URN or info URI schemes. The only hypertext based system available today is the Web, which is based on the HTTP protocol that accepts HTTP URIs. So you cannot effectively accomplish linked data on the Web without using HTTP URIs. The document has an RDF / Semantic Web slant, but Tim Berners-Lee's four linked data principles say nothing about RDF or the Semantic Web. Those four principles might be more aptly named the four linked information principles for the Web. Further, the document does go on to describe how an HTTP server (an implementation of RFC 2616) should respond to requests for Real World Objects, Generic Documents and Web Documents, which is based on the W3C TAG decisions for httpRange-14 and genericResources-53. The scope of the document clearly says: "This document is a practical guide for implementers of the RDF specification... It explains two approaches for RDF data hosted on HTTP servers..." Section 2.1 discusses HTTP and content negotiation for Generic Documents. 
Section 4 discusses how the HTTP server should respond, with diagrams and actual HTTP status codes, to let user agents know which URIs are Real World Objects vs. Generic Documents and Web Documents, per the W3C TAG decisions on httpRange-14 and genericResources-53. Section 6 directly addresses the question that this thread has been talking about, namely using new URI schemes, like URN and info, and why they are not acceptable in the context of linked data. And here is a quote which is what I have said over and over again about URIs being tokens and divorced from protocols: To be truly useful, a new scheme must be accompanied by a protocol defining how to access more information about the identified resource. For example, the ftp:// URI scheme identifies resources (files on an FTP server), and also comes with a protocol for accessing them (the FTP protocol). Some of the new URI schemes provide no such protocol at all. Others provide a Web Service that allows retrieval of descriptions using the HTTP protocol. The identifier is passed to the service, which looks up
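As a rough illustration of the server behaviour that Section 4 of the Cool URIs document describes, here is a minimal Python sketch. It assumes the "303 URI" approach: a request for a Real World Object URI is answered with 303 See Other pointing at a document chosen by the Accept header. The /id/, /data/ and /page/ path prefixes and the respond function are made up for this example, and the Accept handling is deliberately naive (it ignores q-values).

```python
# Sketch only: the path layout (/id/, /data/, /page/) is an assumption,
# not something taken from the thread or from any particular server.

def respond(path, accept):
    """Return (status, headers) for a request, in the 303-redirect style."""
    if path.startswith("/id/"):
        # A Real World Object: never answer 200, redirect to a document.
        name = path[len("/id/"):]
        wants_rdf = "application/rdf+xml" in accept  # naive negotiation
        target = f"/data/{name}" if wants_rdf else f"/page/{name}"
        return 303, {"Location": target, "Vary": "Accept"}
    if path.startswith("/data/"):
        # A Web Document carrying RDF for machines.
        return 200, {"Content-Type": "application/rdf+xml"}
    if path.startswith("/page/"):
        # A Web Document carrying HTML for humans.
        return 200, {"Content-Type": "text/html"}
    return 404, {}
```

The point of the redirect is that the status code itself tells the user agent whether the URI names a document or a thing described by a document.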
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Karen Coyle writes: OK, good, then if you are concerned about the PURL services SPOF, take the freely available PURL software and create a distributed PURL based system and put it up for the community. I think several people have looked at this, but I have not heard of any progress or implementations. The California Digital Library ran the PURL software for a while, using it to mint identifiers for digital documents. It was a while back, but someone there may remember how it went. Wait, what? They _were_ running a PURL resolver, but now they're not? What does the P in PURL stand for again? Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk "Wagner's music is nowhere near as bad as it sounds" -- Mark Twain.
Re: [CODE4LIB] registering info: uris?
At Thu, 2 Apr 2009 13:47:50 +0100, Mike Taylor wrote: Erik Hetzner writes: Without external knowledge that info:doi/10./xxx is a URI, I can only guess. Yes, that is true. The point is that by specifying that the rft_id has to be a URI, you can then use other kinds of URI without needing to broaden the specification. So: info:doi/10./j.1475-4983.2007.00728.x urn:isbn:1234567890 ftp://ftp.indexdata.com/pub/yaz [Yes, I am throwing in an ftp: URL as an identifier just because I can -- please let's not get sidetracked by this very bad idea :-) ] This is not just hypothetical: the flexibility is useful and the encapsulation of the choice within a URI is helpful. I maintain an OpenURL resolver that handles rft_id's by invoking a plugin depending on what the URI scheme is; for some URI schemes, such as info:, that then invokes another, lower-level plugin based on the type (e.g. doi in the example above). Such code is straightforward to write, simple to understand, easy to maintain, and nice to extend since all you have to do is provide one more encapsulated plugin. Thanks for the clarification. Honestly I was also responding to Rob Sanderson’s message (bad practice, surely) where he described URIs as ‘self-describing’, which seemed to me unclear. URIs are only self-describing insofar as they describe what type of URI they are. I think that all of us in this discussion like URIs. I can’t speak for, say, Andrew, but, tentatively, I think that I prefer info:doi/10./xxx to plain 10.111/xxx. I would just prefer http://dx.doi.org/10./xxx (Caveat: I have no idea what rft_id, etc, means, so maybe that changes the meaning of what you are saying from how I read it.) No, it doesn't :-) rft_id is the name of the parameter used in OpenURL 1.0 to denote a referent ID, which is the same thing I've been calling a Thing Identifier elsewhere in this thread. 
The point with this part of OpenURL is precisely that you can just shove any identifier at the resolver and leave it to do the best job it can. Your only responsibility is to ensure that the identifier you give it is in the form of a URI, so the resolver can use simple rules to pick it apart and decide what to do. Thanks. best, Erik Hetzner
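The plugin arrangement Mike describes (pick a handler by URI scheme, then for info: pick a lower-level handler by namespace) might look something like the following Python sketch. This is not the code of his resolver, which the thread does not show; every function and table name here is hypothetical, the handlers only tag the identifier instead of resolving it, and the DOI in the usage note is a made-up value.

```python
# Hypothetical plugin registry keyed on URI scheme; all names are illustrative.

def handle_doi(doi):
    # A real plugin would look the DOI up; here we only tag it.
    return {"type": "doi", "value": doi}

def handle_isbn(isbn):
    return {"type": "isbn", "value": isbn}

# Lower-level plugins for info: URIs, keyed on the info namespace.
INFO_PLUGINS = {"doi": handle_doi}

def handle_info(rest):
    # info URIs have the shape info:<namespace>/<identifier>
    namespace, sep, identifier = rest.partition("/")
    if not sep or namespace not in INFO_PLUGINS:
        raise LookupError(f"no plugin for info namespace {namespace!r}")
    return INFO_PLUGINS[namespace](identifier)

def handle_urn(rest):
    # URNs have the shape urn:<nid>:<nss>
    nid, sep, nss = rest.partition(":")
    if sep and nid.lower() == "isbn":
        return handle_isbn(nss)
    raise LookupError(f"no plugin for URN namespace {nid!r}")

# Top-level plugins, keyed on URI scheme.
SCHEME_PLUGINS = {"info": handle_info, "urn": handle_urn}

def resolve_rft_id(rft_id):
    scheme, sep, rest = rft_id.partition(":")
    if not sep:
        raise ValueError(f"not a URI: {rft_id!r}")
    handler = SCHEME_PLUGINS.get(scheme.lower())
    if handler is None:
        raise LookupError(f"no plugin for scheme {scheme!r}")
    return handler(rest)
```

With these tables, resolve_rft_id("info:doi/10.9999/example") goes through handle_info to handle_doi, and supporting a new kind of identifier means adding one entry to SCHEME_PLUGINS or INFO_PLUGINS.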
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
You're right, if there were a web: URI scheme, the world would be a better place. But there isn't, and the world is worse off for it. It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. Here is some of my historical perspective (which may well differ from others'). Back in the old days, URIs (or URLs) were protocol based. The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on. Some of you may remember the ZIG (Z39.50 Implementors Group) back when we developed the z39.50 URI scheme, which was around 1995. Most of us were not wise to the ways of the web that long ago, but we were told, by those who were, that z39.50r: and z39.50s: at the beginning of a URL are explicit indications that the URI is to be resolved by Z39.50. A few years later the semantic web was conceived and a lot of SW people began coining all manner of http URIs that had nothing to do with the http protocol. By the time the rest of the world noticed, there were so many that it was too late to turn back. So instead, history was altered. The company line became "we never told you that the URI scheme was tied to a protocol". Instead, they should have bitten the bullet and coined a new scheme. They didn't, and that's why we're in the mess we're in. --Ray - Original Message - From: Houghton,Andrew hough...@oclc.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Thursday, April 02, 2009 9:41 AM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 2:26 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) This really puzzles me, because I thought http referred to a protocol: hypertext transfer protocol. 
And when you put "http://" in front of something you are indicating that you are sending the following string along to be processed by that protocol. It implies a certain application over the web, just as "mailto:" implies a particular application. Yes, http is the URI for the hypertext transfer protocol. That doesn't negate the fact that it indicates a protocol. RFC 3986 (URI generic syntax) says that http: is a URI scheme not a protocol. Just because it says http people make all kinds of assumptions about type of use, persistence, resolvability, etc. As I indicated in a prior message, whoever registered the http URI scheme could have easily used the token web: instead of http:. All the URI scheme in RFC 3986 does is indicate what the syntax of the rest of the URI will look like. That's all. You give an excellent example: mailto. The mailto URI scheme does not imply a particular application. It is a URI scheme with a specific syntax. That URI is often resolved with the SMTP (mail) protocol. Whoever registered the mailto URI scheme could have specified the token as smtp: instead of mailto:. My reading of Cool URIs is that they use the protocol, not just the URI. If they weren't intended to take advantage of http then W3C would have used something else as a URI. Read through the Cool URIs document and it's not about identifiers, it's all about using the *protocol* in service of identifying. Why use http? I'm assuming here that when you say "My reading of Cool URIs..." you mean the Cool URIs for the Semantic Web document and not the Cool URIs Don't Change document. The Cool URIs for the Semantic Web document is about linked data. Tim Berners-Lee's four linked data principles state: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information. 4. Include links to other URIs, so that they can discover more things. (2) is an important aspect to linking. 
The Web is a hypertext based system that uses HTTP URIs to identify resources. If you want to link, then you need to use HTTP URIs. There is only one protocol, today, that accepts HTTP URIs as currency, and it's appropriately called HTTP and defined by RFC 2616. The Cool URIs for the Semantic Web document describes how an HTTP protocol implementation (of RFC 2616) should respond to a dereference of an HTTP URI. It's important to understand that URIs are just tokens that *can* be presented to a protocol for resolution. It's up to the protocol to define the currency that it will accept, e.g., HTTP URIs, and it's up to an implementation of the protocol to define the tokens of that currency that it will accept. It just so happens that HTTP URIs are accepted by the HTTP protocol, but in the case of mailto URIs they are accepted by the SMTP protocol. However, it is important to note that an HTTP user agent, e.g., a browser, accepts both HTTP and mailto URIs. It decides that it should send the mailto URI to an SMTP user agent, e.g., Outlook, Thunderbird, etc
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
An account that has a depressing ring of accuracy to it. Ray Denenberg, Library of Congress writes: You're right, if there were a web: URI scheme, the world would be a better place. But there isn't, and the world is worse off for it. It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. Here is some of my historical perspective (which may well differ from others'). Back in the old days, URIs (or URLs) were protocol based. The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on. Some of you may remember the ZIG (Z39.50 Implementors Group) back when we developed the z39.50 URI scheme, which was around 1995. Most of us were not wise to the ways of the web that long ago, but we were told, by those who were, that z39.50r: and z39.50s: at the beginning of a URL are explicit indications that the URI is to be resolved by Z39.50. A few years later the semantic web was conceived and a lot of SW people began coining all manner of http URIs that had nothing to do with the http protocol. By the time the rest of the world noticed, there were so many that it was too late to turn back. So instead, history was altered. The company line became "we never told you that the URI scheme was tied to a protocol". Instead, they should have bitten the bullet and coined a new scheme. They didn't, and that's why we're in the mess we're in. --Ray
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Hi Ray - At Thu, 2 Apr 2009 13:48:19 -0400, Ray Denenberg, Library of Congress wrote: You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it. Well, the original concept of the ‘web’ was, as I understand it, to bring together all the existing protocols (gopher, ftp, etc.), with the new one in addition (HTTP), with one unifying address scheme, so that you could have this ‘web browser’ that you could use for everything. So web: would have been nice, but probably wouldn’t have been accepted. As it turns out, HTTP won overwhelmingly, and the older protocols died off. It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. Here is some of my historical perspective (which may well differ from others'). Back in the old days, URIs (or URLs) were protocol based. The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on. Some of you may remember the ZIG (Z39.50 Implementors Group) back when we developed the z39.50 URI scheme, which was around 1995. Most of us were not wise to the ways of the web that long ago, but we were told, by those who were, that z39.50r: and z39.50s: at the beginning of a URL are explicit indications that the URI is to be resolved by Z39.50. A few years later the semantic web was conceived and alot of SW people began coining all manner of http URIs that had nothing to do with the http protocol. By the time the rest of the world noticed, there were so many that it was too late to turn back. So instead, history was altered. The company line became we never told you that the URI scheme was tied to a protocol. Instead, they should have bit the bullet and coined a new scheme. They didn't, and that's why we're in the mess we're in. Not knowing the details of the history, your account seems correct to me, except that I don’t think the web people tried to alter history. 
I think of the web as having been a learning experience for all of us. Yes, we used to think that the URI was tied to the protocol. But we have learned that it doesn’t need to be, that HTTP URIs can be just identifiers which happen to be dereferenceable at the moment using the HTTP protocol. And it became useful to begin identifying lots of things, people and places and so on, using identifiers, and it also seemed useful to use a protocol that existed (HTTP), instead of coming up with the Person-Metadata Transfer Protocol and inventing a new URI scheme (pmtp://...) to resolve metadata about persons. Because HTTP doesn’t care what kind of data it is sending down the line; it can happily send metadata about people. But that is how things grow; the http:// at the beginning of a URI may eventually be a spandrel, when HTTP is dead and buried. And people will wonder why the address http://dx.doi.org/10./xxx has those funny characters in front of it. And doi.org will be long gone, because they ran out of money, and their domain was taken over by squatters, so we all had to agree to alter our browsers to include an override to not use DNS to resolve the dx.doi.org domain but instead point to a new, distributed system of DOI resolution. We will need to fix these problems as they arise. In my opinion, if we are interested in identifier persistence, clarity about the difference between things and information about things, creating a more useful web (of data), and the other things we ought to be interested in, our time is best spent worrying about these things, and how they can be built on top of the web. Our time is not well spent in coming up with new ways to do things that the web already does for us. For instance: if there is concern that HTTP URIs are not seen as being persistent, it would be useful to try to add a method to HTTP which indicated the persistence of an identifier. This way browsers could display a little icon that indicated that the URI was persistent. 
A user could click on this icon and get information about the institution which claimed persistence for the URI, what the level of support was, what other institution could back up that claim, etc. Our time would not be well spent coming up with an elaborate scheme for phttp:// URIs, creating a better DNS, with name control by a better institution, and a better HTTP, with metadata, and a better caching system, and so on. This is a lot of work and you forget what you were trying to do in the first place, which is make HTTP URIs persistent. best, Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] registering info: uris?
Rob Sanderson wrote: info URIs, In My Opinion, are ideally suited for long term identifiers of non information resources. But http URIs are definitely better than something which isn't a URI at all. Through this discussion I am clarifying my thoughts on this too. I feel that info URIs are especially suited for identifiers that are not only long-term identifiers of non-web resources (an ISBN may identify an 'information' resource, but it's not a web resource), but also especially when in addition all of the following are true:
0) Of potential wide-spread (not just local) interest. Ie, NOT a URI for a record in my local catalog.
1) The identifier vocabulary itself pre-dates the web and was not designed for the web. (ISBN, SuDoc).
2) There is not a controlling authority for the identifier vocabulary that _recognizes_ its responsibility to maintain persistence _and_ has the resources to fulfill that responsibility. That could be because:
a) There is no single controlling authority at all, the control is distributed, and they don't all have their coordinated act together for a web-world.
b) The controlling authority hasn't yet realized that these identifiers matter for a web world, and doesn't care about URIs.
c) There's nobody that wants to commit to this because they think they can't afford it.
That's what I'm thinking. URI for a wikipedia concept from dbpedia? Sure, use http. Those aren't going anywhere, because they are web-native, they were created to be web-native, the folks that created them realize what this means, and as long as their project exists they're likely to maintain them, and their project isn't likely to go away. URI for an ISBN or SuDocs? I don't think the GPO is going anywhere, but the GPO isn't committing to supporting an http URI scheme, and whoever is, who knows if they're going anywhere. That issue is certainly mitigated by Ross using purl.org for these, instead of his own personal http URI. 
But another issue that makes us want a controlling authority is increasing the chances that everyone will use the _same_ URI. If GPO were behind the purl.org/NET/sudoc URIs, those chances would be high. Just Ross on his own, the chances go down, later someone else (OCLC, GPO, some other guy like Ross) might accidentally create a 'competitor', which would be unfortunate. Note this isn't as much of a problem for born web resources -- nobody's going to accidentally create an alternate URI for a dbpedia term, because anybody that knows about dbpedia knows that it lives at dbpedia. So those are my thoughts. Now everyone else can argue bitterly over them for a while. :) And yes, I agree fully that ALL identifiers ought to be expressed as _some_ kind of URI. Once you've done that, you've avoided the most important mistake, I think. Jonathan
Re: [CODE4LIB] registering info: uris?
On Thu, Apr 2, 2009 at 3:03 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Note this isn't as much of a problem for born web resources -- nobody's going to accidentally create an alternate URI for a dbpedia term, because anybody that knows about dbpedia knows that it lives at dbpedia. Unless they use the corresponding URI from Wikipedia or Freebase. In short, identifiers are based on social contracts and only validated through use. Not because some authority or other has endorsed them, but because they've proliferated through actual, real world, use. Different communities might have the reason to use a different identifier that expresses the *exact same thing* because the syntax or the format better suits their needs. Or language. Or environment. And there's nothing any governing body or standards document can do to stop them. It's obviously a bad time to use this term, but identifiers will not be produced by standards, but by market forces, branding and momentum. -Ross.
Re: [CODE4LIB] registering info: uris?
At Thu, 2 Apr 2009 19:29:49 +0100, Rob Sanderson wrote: All I meant by that was that the info:doi/ URI is more informative as to what the identifier actually is than just the doi by itself, which could be any string. Equally, if I saw an SRW info URI like: info:srw/cql-context-set/2/relevance-1.0 that's more informative than some ad-hoc URI for the same thing. Without the external knowledge that info:doi/xxx is a DOI and info:srw/cql-context-set/2/ is a cql context set administered by the owner with identifier '2' (which happens to be me), then they're still just opaque strings. Yes, info:doi/10./xxx is more easily recognizable (‘sniffable’) as a DOI than 10./xxx, both for humans and machines. If we don’t know, by some external means, that a given string has the form of some identifier, then we must guess, or sniff it. But it is good practice to use other means to ensure that we know whether or not any given string is an identifier, and if it is, what type it is. Otherwise we can get confused by strings like go:home. Was that a URI or not? That said, I see no reason why the URI: info:srw/cql-context-set/2/relevance-1.0 is more informative than the URI: http://srw.org/cql-context-set/2/relevance-1.0 As you say, both are just opaque URIs without the additional information. This information is provided by, in the first case, the info-uri registry people, or, in the second case, by the organization that owns srw.org. I could have said that http://srw.cheshire3.org/contextSets/rel/ was the identifier for it (SRU doesn't care) but that's the location for the retrieval documentation for the context set, not a collection of abstract access points. If srw.cheshire3.org was to go away, then people can still happily use the info URI with the continued knowledge that it shouldn't resolve to anything. If srw.cheshire3.org goes away, people can still happily use the http URI. 
(see below) With the potential dissolution of DLF, this has real implications, as DLF have an info URI namespace. If they'd registered a bunch of URIs with diglib.org instead, which will go away, then people would have trouble using them. Notably when someone else grabs the domain and starts using the URIs for something else. The original URIs are still just as useful as identifiers, they have become less useful as dereferenceable identifiers. Now if DLF were to disband AND reform, then they can happily go back to using info:dlf/ URIs even if they have a brand new domain. The info:dlf/ URIs would be the same non-dereferenceable URIs they always were, true. But what have we gained? The issue of persistence of dereferenceability is a real one. There are solutions, e.g., other organizations can step in to host the domain; the ARK scheme; or, we can all agree that the diglib.org domain is too important to let be squatted, and agree that URIs that begin http://diglib.org/ are special, and should by-pass DNS. [1] I think that all of us in this discussion like URIs. I can’t speak for, say, Andrew, but, tentatively, I think that I prefer info:doi/10./xxx to plain 10.111/xxx. I would just prefer http://dx.doi.org/10./xxx info URIs, In My Opinion, are ideally suited for long term identifiers of non information resources. But http URIs are definitely better than something which isn't a URI at all. Something we can all agree on! URIs are better than no URIs. best, Erik 1. Take with a grain of salt, as this is not something I have fully thought out the implications of. ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
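Erik's point about sniffing can be made concrete with a small Python sketch. It checks only whether a string starts with something shaped like a URI scheme (a letter followed by letters, digits, "+", "-" or ".", then a colon, as in RFC 3986) and then consults a list of schemes the application has decided to recognize. The KNOWN_SCHEMES set and the function names are assumptions for this example, and the DOI is a made-up value.

```python
import re

# Scheme syntax from RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")

# Assumption: the schemes this hypothetical application chooses to recognize.
KNOWN_SCHEMES = {"http", "info", "urn", "ftp"}

def looks_like_uri(text):
    """True if the string merely has the shape 'scheme:rest'."""
    return SCHEME_RE.match(text) is not None

def sniff(text):
    """Classify a string by its leading scheme-like token, if any."""
    match = SCHEME_RE.match(text)
    if match is None:
        return "not a URI"
    scheme = match.group(0)[:-1].lower()  # drop the trailing colon
    return "known scheme" if scheme in KNOWN_SCHEMES else "unknown scheme"
```

Here looks_like_uri("go:home") is true even though nobody intended it as a URI, which is why syntax alone cannot settle the question and some external declaration of what a field holds is still needed.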
Re: [CODE4LIB] registering info: uris?
On Wed, Apr 1, 2009 at 6:14 AM, Mike Taylor m...@indexdata.com wrote: As usual, an ounce of example is worth a ton of exposition, so: Suppose I always keep a PDF of my latest paper at http://www.miketaylor.org.uk/latest.pdf for the benefit of people who want to keep an eye on my research. (Hey, it might happen!) Today, I have a PDF there of a paper with the DOI 10./j.1475-4983.2007.00728.x. Tomorrow, my new paper comes out, and I replace the old one with a PDF of that new paper whose DOI is 10.abcdefghij. I move the PDF of the old paper to http://www.miketaylor.org.uk/previous.pdf Now, then -- the DOIs are identifiers: they are not in themselves dereferenceable (although of course they can be used as keys for some mechanism that knows how to dereference them). Each DOI always identifies the same Thing. The URLs are locations: they are dereferenceable, but they do not give you any guarantee about what you will find at that location. Two different days, two different papers. Note that a single location (latest.pdf) contains at different times two different Things. And note that a single Thing (the older of the two papers) can be found at different times in two different locations. In contrast, the same identifier always identifies the same Thing, irrespective of what location it's at. Hoorah for examples! Assuming a world where you cannot de-reference this DOI what is it good for? //Ed
Re: [CODE4LIB] registering info: uris?
On Wed, Apr 1, 2009 at 8:37 AM, Mike Taylor m...@indexdata.com wrote: Worse, consider how the actionable-identifier approach would translate to other non-actionable identifiers like ISBNs. If I offer the non-actionable identifier info:isbn/025490 which identified Farlow and Brett-Surman's edited volume The Complete Dinosaur, it's obvious that you have a choice of methods for resolving the ISBN ... but the identifier gives no indication of what those choices might be, and I wouldn't even be able to find out anything more about the info:isbn scheme unless I happened to know that http://info-uri.info/ is the registry for info: URIs (or could Google my way to it). An http: identifier could at least take you to general information about the scheme (perhaps with options for resolution), if not directly to some description of the identified thing itself. Keith
Re: [CODE4LIB] registering info: uris?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Wednesday, April 01, 2009 8:38 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Ross Singer writes: I suppose my point is, there's a valid case for identifiers like your doi, I think we can agree on that (well, we don't have to agree, these identifiers will exist and continue to exist long after we've grown tired of flashing out gang signs). What I don't understand is the reason to express that identifier as: info:doi/10./j.1475-4983.2007.00728.x when http://dx.doi.org/10./j.1475-4983.2007.00728.x can serve exactly the same function *and* be actionable. This was exactly the point I was making, but you said it much more coherently than what I said, Ross. If you are going to use a natural identifier, like doi, isbn, lccn, etc., then use it, but if you are going to Web-ify that natural identifier, use an HTTP URI. It doesn't need to be actionable today, but can be tomorrow, without anybody having to write a new resolution mechanism and clients having to integrate that new resolution mechanism in their systems. Typically, most resolution mechanisms for unresolvable URI schemes use HTTP URIs anyway and amount to: http://resolve.example.org/?uri=info:isbn/141574338X http://resolve.example.org/?uri=urn:isbn:141574338X which could have just been: http://isbn.info/141574338X The problem with the latter identifier (and to be clear, yes, I agree that it COULD function as an identifier) is that it gives the impression that what you get when you dereference the DOI is that specific resource, i.e. it enshrines dx.doi.org as THE way of dereferencing DOIs. I agree that Ross's DOI example could function as an identifier. I think we can agree that RFC 3986 says that URIs are just tokens with a specified syntax. Nothing in RFC 3986 says that a URI has to be actionable. You are talking about an impression that isn't enshrined in RFC 3986. 
It might be better to think about this in terms of the W3C's Cool URIs for the Semantic Web document. That document classifies URIs into three types: Real World Objects, Generic Documents and Web Documents. So which type is: http://dx.doi.org/10./j.1475-4983.2007.00728.x? It depends. If I say that it is a Real World Object, it's an identifier for the actual DOI identifier. If I say that it is a Web Document, then dereferencing it will give me a specific resource. In this case I can have and probably should have both a Real World Object URI and a Web Document URI. What if I don't want to get the article from dx.doi.org? Maybe if I go via that site, it'll point me to Elsevier's pay-for copy of an article, whereas if I'd fed the DOI to my local library's resolver, it would have sent me to Blackwell's version which the library has a subscription for. An actionable URI mandates (or at least strongly suggests) a particular course of action: but I don't want you to tell me what to _do_, I just want you to tell me what the Thing is. People wanting to identify the DOI use the Real World Object URI and people wanting to find out information about the DOI use the Web Document URI. Both these URIs, Real World Object and Web Document, are HTTP URIs, so there is little if any value in using info or URN URIs. People *tend* to use URN URIs because RFC 2141 states that the URI has persistence, and people *tend* to use info URIs because RFC 4452 states there is no persistence. However, persistence is a policy statement made by the minter of a URI. You can make a persistence policy statement about any URI, including HTTP URIs. Andy.
Re: [CODE4LIB] registering info: uris?
On Wed, 2009-04-01 at 14:17 +0100, Mike Taylor wrote: Ed Summers writes: Assuming a world where you cannot de-reference this DOI what is it good for? It wouldn't be good for much if you couldn't dereference it at all. The point is that (I argue) the identifier shouldn't tie itself to a particular dereferencing mechanism (such as dx.doi.org, or amazon.com) but should be dereferenced by software that knows what's the most appropriate dereferencing mechanism _for you_ in your situation, with your subscriptions, at particular distances from specific libraries, etc. Heh, that sounds like a good idea. Maybe we could call it an OpenURL? And that distinction about having a dereferencing mechanism sounds okay, but let's call it a ... service. Then we could define an architecture for that sort of thing rather than a Resource oriented one. We could call it a Service Oriented Architecture. Oh, wait... Rob
Re: [CODE4LIB] registering info: uris?
Houghton,Andrew writes: The point is that (I argue) the identifier shouldn't tie itself to a particular dereferencing mechanism (such as dx.doi.org, or amazon.com) but should be dereferenced by software that knows what's the most appropriate dereferencing mechanism _for you_ in your situation, with your subscriptions, at particular distances from specific libraries, etc. Let's separate your argument into two pieces: identification and resolution. The DOI is the identifier and it inherently doesn't tie itself to any resolution mechanism. Yes. So far, we agree :-) So creating an info URI for it is meaningless, it's just another alias for the DOI. Not quite. Embedding a DOI in an info URI (or a URN) means that the identifier describes its own type. If you just get the naked string 10./j.1475-4983.2007.00728.x passed to you, say as an rft_id in an OpenURL, then you can't tell (except by guessing) whether it's a DOI, a SICI, an ISBN or a biological species identifier. But if you get info:doi/10./j.1475-4983.2007.00728.x then you know what you've got, and can act on it accordingly. I can create an HTTP resolution mechanism for DOIs by doing: http://resolve.example.org/?doi=10./j.1475-4983.2007.00728.x or http://resolve.example.org/?uri=info:doi/10./j.1475-4983.2007.00728.x since the info URI contains the natural DOI identifier, wrapping it in a URI scheme has no value when I could have used the DOI identifier directly, as in the first HTTP resolution example. In this case, you're right -- because the parameter name "doi" tells you what vocabulary the identifier is drawn from, much as the prefix of an XML element name tells you what namespace it's drawn from. But in general, when you can't rely on having that extra bit of data floating around alongside the actual identifier (as in the OpenURL rft_id example) it's nice to have identifiers that are self-describing.
Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk "If only there were some EASY, COWARDLY way out of this" -- Bob the Angry Flower, www.angryflower.com
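Mike's point about self-describing identifiers can be illustrated with a small sketch. It assumes only the two wrapper shapes used in this thread, info:NAMESPACE/VALUE and urn:NID:VALUE; the function name identifier_type is hypothetical and is not part of any OpenURL or info URI library.

```python
def identifier_type(value):
    """Return the vocabulary named by an info: or urn: wrapper, else None.

    Hypothetical helper for illustration: a naked string carries no type,
    so the caller is left guessing and gets None.
    """
    scheme, sep, rest = value.partition(":")
    if not sep:
        return None  # no scheme at all, e.g. a bare DOI string
    scheme = scheme.lower()
    if scheme == "info":
        # info:doi/10./... -> the namespace is the part before the first "/"
        namespace, slash, _ = rest.partition("/")
        return namespace.lower() if slash and namespace else None
    if scheme == "urn":
        # urn:isbn:141574338X -> the namespace identifier precedes the next ":"
        nid, colon, _ = rest.partition(":")
        return nid.lower() if colon and nid else None
    return None
```

With a wrapper the receiver can dispatch on the returned type; with the naked string it has to fall back on guessing, which is the case Mike describes for rft_id.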
Re: [CODE4LIB] registering info: uris?
I'll bite. There are actually a number of http URLs that work like http://dx.doi.org/10./j.1475-4983.2007.00728.x One of them is http://doi.wiley.com/10./j.1475-4983.2007.00728.x Another is run by CrossRef. Some OpenURL link servers also have DOI proxy capability. So for code to extract the DOI reliably from http URLs, the code needs to know all the possibilities for the DOI proxy stem. The proxies also tend to have optional parameters that can control the resolution. In principle, the info:doi/ stem addresses this. On Apr 1, 2009, at 7:27 AM, Ross Singer wrote: What I don't understand is the reason to express that identifier as: info:doi/10./j.1475-4983.2007.00728.x when http://dx.doi.org/10./j.1475-4983.2007.00728.x Eric Hellman e...@hellman.net (personal) http://hellman.net/eric/
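A rough sketch of the extraction problem Eric describes, assuming the DOI is simply the path that follows a known proxy host. The host list holds only the two proxies named in this message, and extract_doi is a hypothetical name; a real list would be longer and would need maintenance, which is exactly the cost being pointed out.

```python
from urllib.parse import urlsplit, unquote

# Assumption: only the proxy hosts mentioned in this thread.
KNOWN_DOI_PROXY_HOSTS = {"dx.doi.org", "doi.wiley.com"}


def extract_doi(uri):
    """Return the DOI embedded in uri, or None if the form is not recognised."""
    parts = urlsplit(uri)
    if parts.scheme == "info" and parts.path.startswith("doi/"):
        # A single stem to recognise; no host list is needed.
        return unquote(parts.path[len("doi/"):]) or None
    if parts.scheme in ("http", "https") and parts.hostname in KNOWN_DOI_PROXY_HOSTS:
        # Optional proxy parameters in the query string are ignored here.
        return unquote(parts.path.lstrip("/")) or None
    return None
```

An http URI on a proxy host that is missing from the set falls through to None, whereas the info:doi/ form needs no such table.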
Re: [CODE4LIB] registering info: uris?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Wednesday, April 01, 2009 9:35 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Houghton,Andrew writes: So creating an info URI for it is meaningless, it's just another alias for the DOI. Not quite. Embedding a DOI in an info URI (or a URN) means that the identifier describes its own type. If you just get the naked string 10./j.1475-4983.2007.00728.x passed to you, say as an rft_id in an OpenURL, then you can't tell (except by guessing) whether it's a DOI, a SICI, an ISBN or a biological species identifier. But if you get info:doi/10./j.1475-4983.2007.00728.x then you know what you've got, and can act on it accordingly. Now you are changing the argument to a specific resolution mechanism, e.g., OpenURL. OpenURL could have easily defined rft_idType where you specified DOI, SICI, ISBN, etc. along with its actual identifier value in rft_id. However, given that OpenURL didn't do this, there is no difference between plugging either of the following URIs into rft_id: http://dx.doi.org/10./j.1475-4983.2007.00728.x info:doi/10./j.1475-4983.2007.00728.x when I identify the HTTP URI as a Real World Object. This was the whole point of the W3C TAG httpRange-14 decision which the Cool URIs for the Semantic Web document is based on. So again, wrapping the natural DOI in an unresolvable URI scheme is meaningless. When talking about resolution mechanisms any number of implementations are possible, including separating an identifier type from its value or conflating the two. In the two URIs above the only real differences are: 1) http: vs. info: URI scheme 2) an authority named: dx.doi.org vs. doi These are just simple substitutions. Whoever registered the info URI for doi could have easily applied for an authority named: dx.doi.org instead of just doi, then the only difference would be the URI scheme. Andy.
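Andy's claim that the two forms differ only by "simple substitutions" can be sketched as a pair of prefix swaps. This is a minimal illustration, assuming dx.doi.org is the one HTTP authority in play; the constant and function names are hypothetical.

```python
INFO_STEM = "info:doi/"
HTTP_STEM = "http://dx.doi.org/"  # assumption: a single agreed-upon authority


def info_to_http(uri):
    """Swap the info:doi/ stem for the dx.doi.org stem."""
    if not uri.startswith(INFO_STEM):
        raise ValueError("not an info:doi URI: %r" % uri)
    return HTTP_STEM + uri[len(INFO_STEM):]


def http_to_info(uri):
    """Swap the dx.doi.org stem for the info:doi/ stem."""
    if not uri.startswith(HTTP_STEM):
        raise ValueError("not a dx.doi.org URI: %r" % uri)
    return INFO_STEM + uri[len(HTTP_STEM):]
```

The mapping is lossless in both directions only while every party agrees on the one HTTP stem, which is the condition disputed elsewhere in the thread.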
Re: [CODE4LIB] registering info: uris?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Eric Hellman Sent: Wednesday, April 01, 2009 9:51 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? There are actually a number of http URLs that work like http://dx.doi.org/10./j.1475-4983.2007.00728.x One of them is http://doi.wiley.com/10./j.1475-4983.2007.00728.x Another is run by CrossRef. Some OpenURL link servers also have DOI proxy capability. So for code to extract the DOI reliably from http URLs, the code needs to know all the possibilities for the DOI proxy stem. The proxies also tend to have optional parameters that can control the resolution. In principle, the info:doi/ stem addresses this. Again we have moved the discussion to a specific resolution mechanism, e.g., OpenURL. OpenURL could have been defined differently, such that rft_id and rft_idScheme were available and you used the actual DOI value and specified the scheme of the identifier. Then the issue of extraction of the identifier value from the URI goes away, because there is no URI needed. Andy.
Re: [CODE4LIB] registering info: uris?
Houghton,Andrew writes: From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Eric Hellman Sent: Wednesday, April 01, 2009 9:51 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? There are actually a number of http URLs that work like http://dx.doi.org/10./j.1475-4983.2007.00728.x One of them is http://doi.wiley.com/10./j.1475-4983.2007.00728.x Another is run by CrossRef. Some OpenURL link servers also have DOI proxy capability. So for code to extract the DOI reliably from http URLs, the code needs to know all the possibilities for the DOI proxy stem. The proxies also tend to have optional parameters that can control the resolution. In principle, the info:doi/ stem addresses this. Again we have moved the discussion to a specific resolution mechanism, e.g., OpenURL. OpenURL could have been defined differently, such that rft_id and rft_idScheme were available and you used the actual DOI value and specified the scheme of the identifier. Then the issue of extraction of the identifier value from the URI goes away, because there is no URI needed. Yes, that would have been OK, too. But no doubt there are other contexts where it's possible to pass in an identifier without also being able to say "and by the way, it's of type XYZ". Surely you don't disagree that it's good for identifiers to be self-describing? It's the same with actionable URLs: isn't it better that I can tell you: http://www.miketaylor.org.uk/dino/pubs/ Instead of having to say: www.miketaylor.org.uk/dino/pubs/ "Oh, by the way, access this using HTTP rather than FTP." Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk "A Linux system requires rebooting about as often as a Windoze system requires re-installing" -- David Joffe.
[CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Houghton,Andrew wrote: Let's separate your argument into two pieces: identification and resolution. The DOI is the identifier and it inherently doesn't tie itself to any resolution mechanism. So creating an info URI for it is meaningless, it's just another alias for the DOI. I can create an HTTP resolution mechanism for DOIs by doing: http://resolve.example.org/?doi=10./j.1475-4983.2007.00728.x or http://resolve.example.org/?uri=info:doi/10./j.1475-4983.2007.00728.x since the info URI contains the natural DOI identifier, wrapping it in a URI scheme has no value when I could have used the DOI identifier directly, as in the first HTTP resolution example. I disagree that wrapping it in a URI scheme has no value. We have a great deal of software and many schemas that are built to store URIs; even if they don't know what the URI is or what can be done with it, we have infrastructure in place for dealing with URIs. So there is value in wrapping a 'natural' identifier in a URI, even if that URI does not carry its own resolution mechanism with it. I have run into this in several places in my own work. I share Mike's concerns about tying resolution to identification in one mechanism. As a sort of general principle or 'pattern' of design, trying to make one mechanism do two jobs at once is a 'bad smell'. It's in fact (I hope this isn't too far afield) how I'd sum up much of the failure of AACR2/MARC, involving our 'controlled headings' (see me expanding on this in some blog posts at http://bibwild.wordpress.com/2008/01/17/identifiers-and-display-labels-again/). On the other hand, it is awfully _convenient_ to combine these two functions in one mechanism. And convenience does matter too. I can see both sides. So I think we just do what feels right, and when we all disagree on what feels right, we pick one. I don't share the opinion of those who think it's obvious that everything should be an http uri, nor do I share the opinion of those who think it's obvious that this is a disaster.
DOI is definitely one good example of where One Canonical Resolution fails. The DOI _resolution_ system fails for me -- it does not reliably or predictably deliver the right document for my users. But a DOI as an identifier is still useful for me. Even if that DOI were expressed in a URI as http://dx.doi.org/resolve/10./j.1475-4983.2007.00728.x, I STILL wouldn't actually use the HTTP server at dx.doi.org to resolve it. I'd extract the actual DOI out of it, and use a different resolution mechanism. Another example to think about: what happens when the protocol for resolution changes? Right now already we could find a resolution service starting to make available and/or insist upon https protocol resolution. But all those existing identifiers expressed as http URIs should not change; they are meant to be persistent. So already it's possible for an identifier originally intended to describe its own resolution to be slightly wrong. Is this confusing? In the future, maybe we'll have something different than http entirely. Jonathan
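Jonathan's "extract the DOI, then use a different resolution mechanism" step could look roughly like the sketch below. It reuses the placeholder resolver address resolve.example.org and the doi query parameter from Andy's earlier example; both are stand-ins, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder address taken from the examples earlier in the thread.
LOCAL_RESOLVER = "http://resolve.example.org/"


def local_resolution_url(doi, resolver=LOCAL_RESOLVER):
    """Build a request URL for a resolver of the caller's choosing.

    The DOI is percent-encoded so that its "/" cannot be mistaken for
    part of the resolver's own URL structure.
    """
    return resolver + "?" + urlencode({"doi": doi})
```

Because the resolver address is an argument, the same stored DOI can be pointed at an https endpoint, or at some later protocol, without the identifier itself changing.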
Re: [CODE4LIB] registering info: uris?
+1 Jon Stroop Metadata Analyst C-17-D2 Firestone Library Princeton University Princeton, NJ 08544 Email: jstr...@princeton.edu Phone: (609)258-0059 Fax: (609)258-0441 http://diglib.princeton.edu http://diglib.princeton.edu/ead Edward M. Corrado wrote: I disagree. Keep this going. A delete key is in easy reach and if you have a mail reader that does threading you can easily ignore the thread. I have been finding this discussion rather educational. Edward On Wed, Apr 1, 2009 at 10:14 AM, Glen Newton - NRC/CNRC CISTI/ICIST Research glen.new...@nrc-cnrc.gc.ca wrote: I count 75 messages on this topic. Perhaps it is time to take this off list? Someone give us a summary when/if this is resolved? Or start a new list for this issue and tell us where it is? thanks, Glen
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
I admit that httprange-14 still confuses me. (I have no idea why it's called httprange-14 for one thing). But how do you identify the URI as being a Real World Object? I don't understand what it entails. And http://doi.org/* describes its own type only to software that knows what a URI beginning http://doi.org means, right? What about Eric Hellman's point that there are a variety of possible http URIs (not just possible but _in use_) that encapsulate a DOI, and given software would have to know all of the possible templates (with more being created all the time)? Jonathan Houghton,Andrew wrote: From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Wednesday, April 01, 2009 11:08 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) Houghton,Andrew wrote: Let's separate your argument into two pieces: identification and resolution. The DOI is the identifier and it inherently doesn't tie itself to any resolution mechanism. So creating an info URI for it is meaningless, it's just another alias for the DOI. I can create an HTTP resolution mechanism for DOIs by doing: http://resolve.example.org/?doi=10./j.1475-4983.2007.00728.x or http://resolve.example.org/?uri=info:doi/10./j.1475-4983.2007.00728.x since the info URI contains the natural DOI identifier, wrapping it in a URI scheme has no value when I could have used the DOI identifier directly, as in the first HTTP resolution example. I disagree that wrapping it in a URI scheme has no value. We have a great deal of software and many schemas that are built to store URIs; even if they don't know what the URI is or what can be done with it, we have infrastructure in place for dealing with URIs. Oops... that should have read ... wrapping it in an unresolvable URI scheme...
The point being that: urn:doi:* info:doi:* provide no advantages over: http://doi.org/* when, per W3C TAG httpRange-14 decision you identify the URI as being a Real World Object. When identifying the HTTP URI as a Real World Object, it is the same as what Mike said about the info URI that: the identifier describes its own type. Andy.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
On Wed, Apr 1, 2009 at 11:37 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I admit that httprange-14 still confuses me. (I have no idea why it's called httprange-14 for one thing). http://www.w3.org/2001/tag/group/track/issues/14 Some background: http://efoundations.typepad.com/efoundations/2009/02/httprange14-cool-uris-frbr.html And http://doi.org/* describes its own type only to software that knows what a URI beginning http://doi.org means, right? How is that different from the software knowing what info:doi/ means? The difference is, how much more software knows what http: means vs. info:? And this, I think, has got to be the point here. How many times do we need to marginalize ourselves with our ideals and expectations that nobody else adheres to before we're rendered completely irrelevant? Doesn't it make sense to co-opt the mainstream processes and apply them to our ideals? What, exactly, is the resistance here? What about Eric Hellman's point that there are a variety of possible http URIs (not just possible but _in use_) that encapsulate a DOI, and given software would have to know all of the possible templates (with more being created all the time)? Right, but here again is where we're talking about the difference between a location and the identifier. We're talking about establishing http://dx.doi.org/10./j.1475-4983.2007.00728.x (or something like that -- http://hdl.handle.net/10./j.1475-4983.2007.00728.x might be more appropriate) as the identifier for doi:10./j.1475-4983.2007.00728.x That you can access it via http://doi.wiley.com/10./j.1475-4983.2007.00728.x (or resolve it there) doesn't mean that that's the identifier for it. -Ross.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Houghton,Andrew hough...@oclc.org The point being that: urn:doi:* info:doi:* provide no advantages over: http://doi.org/* I think they do. I realize this is pretty much a dead-end debate as everyone has dug themselves into a position and nobody is going to change their mind. It is a philosophical debate and there isn't a right answer. But in my opinion I won't use the doi example because it's overloaded. Let's talk about the hypothetical sudoc. I think info:sudoc/xyz provides an advantage over: http://sudoc.org/xyz if the latter is not going to resolve. Why? Because it drives me nuts to see http URIs everywhere that give all appearances of resolvability - browsers, editors, etc. turn them into clickable links. Now, if you are setting up a resolution service where you get the document that the sudoc identifies when you click on the URI, then http is appropriate. The *actual document*. Not a description of it in lieu of the document. And the so-called architectural justification that it's ok to return metadata instead of the resource (representation) -- I don't buy it. --Ray
Re: [CODE4LIB] registering info: uris?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Wednesday, April 01, 2009 10:17 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Houghton,Andrew writes: Again we have moved the discussion to a specific resolution mechanism, e.g., OpenURL. OpenURL could have been defined differently, such that rft_id and rft_idScheme were available and you used the actual DOI value and specified the scheme of the identifier. Then the issue of extraction of the identifier value from the URI goes away, because there is no URI needed. Yes, that would have been OK, too. But no doubt there are other contexts where it's possible to pass in an identifier without also being able to say "and by the way, it's of type XYZ". Surely you don't disagree that it's good for identifiers to be self-describing? Ok, now we have moved the discussion back to identifiers rather than resolution mechanisms. I absolutely agree that it's good for identifiers to be self-describing; I wasn't saying otherwise. However, let's take the following URIs: http://any.identifier.org/?scheme=doi&id=10./j.1475-4983.2007.00728.x info:doi/10./j.1475-4983.2007.00728.x urn:doi:10./j.1475-4983.2007.00728.x All three are self-describing URIs. The HTTP URI does exactly the same thing as the info URI without having to create a new URI scheme, e.g., info, and the argument made by IETF and W3C against the creation of info URIs. Also, since the info URI folks actually created a domain name for registering info URIs you could have easily changed any.identifier.org to info-uri.info to achieve the same effect as the info URI. From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Wednesday, April 01, 2009 10:44 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Imagine your web-browser extended by a plugin that knows how to resolve particular kinds of info: URLs.
If you just paste the raw DOI into the URI bar, it won't have a clue what to do with it, but the wrapped-in-a-URI version stands alone and self-describes, so the plugin can pull it apart and say, ah yes, this URI is a DOI, and I know how my user has configured me to resolve those. Sure you can imagine a web-browser plugin, but these things never happen due to a) the cost of developing or, b) in order for it to work you need a plugin to work for every type of browser. This is why the Architecture of the Web document states: "While Web architecture allows the definition of new schemes, introducing a new scheme is costly. Many aspects of URI processing are scheme-dependent, and a large amount of deployed software already processes URIs of well-known schemes. Introducing a new URI scheme requires the development and deployment not only of client software to handle the scheme, but also of ancillary agents such as gateways, proxies, and caches. See [RFC2718] for other considerations and costs related to URI scheme design." What you seem to be suggesting (are you?) is that in the former case, the resolver should recognise that the HTTP URL matches the regular expression ^http://dx\.doi\.org\.(.*)$/ and so extract the match and go off and do something else with it. Back to resolution mechanisms... I'm not suggesting anything. You are suggesting a resolution mechanism implementation which uses regular expressions. That is one of many ways a resolution mechanism can retrieve the embedded DOI or identifier of choice. URI Templates is another, and given this URI: http://any.identifier.org/?scheme=doi&id=10./j.1475-4983.2007.00728.x any Web library on the planet can pull the query parameters out of the URI. as the actionable identifier might be something uglier... A URI is just a token with a predefined syntax, per RFC 3986, used to identify a resource which can be an abstract thing, e.g., Real World Object or a representation of a resource, e.g., a Web Document.
One could postulate that all URIs are ugly. Whether a URI is ugly or not is irrelevant. Andy.
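As a small illustration of Andy's remark that any Web library can pull the query parameters out of such a URI, here is a sketch using Python's standard urllib.parse. It assumes the two parameters are named scheme and id and are separated by "&"; the function name scheme_and_id is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def scheme_and_id(uri):
    """Return the (scheme, id) query parameters of uri, None where absent."""
    query = parse_qs(urlsplit(uri).query)
    # parse_qs maps each name to a list of values; take the first one.
    scheme = query.get("scheme", [None])[0]
    ident = query.get("id", [None])[0]
    return scheme, ident
```

No regular expression or host-specific template is involved; the identifier type and value arrive as two separate, named fields.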
Re: [CODE4LIB] registering info: uris?
I completely disagree. There are all sorts of useful identifiers I use in my work every day that can not be automatically dereferenced. Jonathan Ed Summers wrote: On Wed, Apr 1, 2009 at 9:17 AM, Mike Taylor m...@indexdata.com wrote: It wouldn't be good for much if you couldn't dereference it at all. I totally agree. //Ed
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote: But shouldn't we be able to know the difference between an identifier and a locator? Isn't that the problem here? That you don't know which it is if it starts with http://. But you do if it starts with http://dx.doi.org I still don't see the difference. The same logic that would be required to parse and understand the info: uri scheme could be used to apply towards an http uri scheme. -Ross.
Re: [CODE4LIB] registering info: uris?
From: Jonathan Rochkind rochk...@jhu.edu There are all sorts of useful identifiers I use in my work every day that can not be automatically dereferenced. Even more to the point: there is no sound definition of dereference. To dereference a resource means to retrieve a representation of it. There has never been any agreement within the w3c of what constitutes a representation. --Ray
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Ross Singer wrote: On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote: But shouldn't we be able to know the difference between an identifier and a locator? Isn't that the problem here? That you don't know which it is if it starts with http://. But you do if it starts with http://dx.doi.org No, *I* don't. And neither does my email program, since it displayed it as a URL (blue and underlined). That's inside knowledge, not part of the technology. Someone COULD create a web site at that address, and there's nothing in the URI itself to tell me if it's a URI or a URL. The general convention is that "http://" is a web address, a location. I realize that it's also a form of URI, but that's a minority use of http. This leads to a great deal of confusion. I understand the desire to use domain names as a way to create unique, managed identifiers, but the http part is what is causing us problems. John Kunze's ARK system attempted to work around this by using http to retrieve information about the URI, so you're not just left guessing. It's not a question of resolution, but of giving you a short list of things that you can learn about a URI that begins with http. However, again, unless you know the secret you have no idea that those particular URI/Ls have that capability. So again we're going beyond the technology into some human knowledge that has to be there to take advantage of the capabilities. It doesn't seem so far-fetched to make it possible for programs (dumb, dumb programs) to know the difference between an identifier and a location based on something universal, like a prefix, without having to be coded for dozens or hundreds of exceptions. kc I still don't see the difference. The same logic that would be required to parse and understand the info: uri scheme could be used to apply towards an http uri scheme. -Ross.
-- --- Karen Coyle / Digital Library Consultant kco...@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 1:06 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) The general convention is that "http://" is a web address, a location. I realize that it's also a form of URI, but that's a minority use of http. This leads to a great deal of confusion. I understand the desire to use domain names as a way to create unique, managed identifiers, but the http part is what is causing us problems. http:// is an HTTP URI, defined by RFC 3986; loosely, I will agree that it is a web address. However, it is not a location. URIs according to RFC 3986 are just tokens to identify resources. These tokens, i.e., URIs, are presented to protocol mechanisms as part of the dereferencing process to locate and retrieve a representation of the resource. People see http: and assume that it means the HTTP protocol so it must be a locator. Whoever initially registered the HTTP URI scheme could have used web as the token instead and we would all be doing: web://example.org/. This is the confusion. People don't understand what RFC 3986 is saying. It makes no claim that any registered URI scheme has persistence or can be dereferenced. An HTTP URI is just a token to identify some resource, nothing more. Andy.
Re: [CODE4LIB] registering info: uris?
On Wed, Apr 1, 2009 at 12:28 PM, Ray Denenberg, Library of Congress r...@loc.gov wrote: Even more to the point: there is no sound definition of dereference. To dereference a resource means to retrieve a representation of it. There has never been any agreement within the w3c of what constitutes a representation. So are you not a fan of: http://www.w3.org/TR/webarch/#internet-media-type //Ed
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
My point is that I don't see how they're different in practice. And one of them actually allowed you to do something from your email client. -Ross. On Wed, Apr 1, 2009 at 1:20 PM, Karen Coyle li...@kcoyle.net wrote: Ross, I don't get your point. My point was about the confusion between two things that begin: http:// but that are very different in practice. What's yours? kc Ross Singer wrote: Your email client knew what to do with: info:doi/10./j.1475-4983.2007.00728.x ? doi:10./j.1475-4983.2007.00728.x ? Or did you recognize the info:doi scheme and Google it? Or would this, in case of 99% of the world, just look like gibberish or part of some nerd's PGP key? -Ross.
Re: [CODE4LIB] registering info: uris?
At Wed, 1 Apr 2009 14:34:45 +0100, Mike Taylor wrote: Not quite. Embedding a DOI in an info URI (or a URN) means that the identifier describes its own type. If you just get the naked string 10./j.1475-4983.2007.00728.x passed to you, say as an rft_id in an OpenURL, then you can't tell (except by guessing) whether it's a DOI, a SICI, an ISBN or a biological species identifier. But if you get info:doi/10./j.1475-4983.2007.00728.x then you know what you've got, and can act on it accordingly. It seems to me that you are just pushing out by one more level the mechanism to be able to tell what something is. That is, before you needed to know that 10./xxx was a DOI. Now you need to know that info:doi/10./xxx is a URI. Without external knowledge that info:doi/10./xxx is a URI, I can only guess. (Caveat: I have no idea what rft_id, etc, means, so maybe that changes the meaning of what you are saying from how I read it.) -Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
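To make the two positions above concrete, here is a minimal Python sketch of what a program can and cannot tell from the prefix alone. The function name `describe_identifier` and the DOI-shaped value in the usage note are made up for illustration; the only thing taken from the thread is the `info:<namespace>/<identifier>` shape. Note that the sketch supports both points: the prefix does carry the type, but only for code that already knows the `info:` convention.

```python
def describe_identifier(value: str) -> str:
    """Report what the prefix alone reveals about an identifier string."""
    if value.startswith("info:"):
        # Assumed shape: info:<namespace>/<identifier>
        namespace, _, local = value[len("info:"):].partition("/")
        return f"info URI, namespace={namespace!r}, identifier={local!r}"
    # A naked string could be a DOI, a SICI, an ISBN, or something else.
    return "bare string: type unknown without outside knowledge"
```

Called with the hypothetical value `info:doi/10.1000/example`, this reports the namespace `doi`; called with the naked `10.1000/example`, it can only say the type is unknown.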
Re: [CODE4LIB] registering info: uris?
On Tue, Mar 31, 2009 at 5:55 AM, Mike Taylor m...@indexdata.com wrote: Identifiers identify; locations locate. I've been avoiding and ignoring this all day, because I wanted the thread to die and we all move on with our lives. But Kevin Clarke just quoted this on Twitter, and I felt I couldn't let this slide by. Locations do not locate. Locations identify 'place'. They are still identifiers. -Ross.
Re: [CODE4LIB] registering info: uris?
At Fri, 27 Mar 2009 20:56:42 -0400, Ross Singer wrote: So, in what is probably a vain attempt to put this debate to rest, I created a partial redirect PURL for sudoc: http://purl.org/NET/sudoc/ If you pass it any urlencoded sudoc string, you'll be redirected to the GPO's Aleph catalog that searches the sudoc field for that string. http://purl.org/NET/sudoc/E%202.11/3:EL%202 should take you to: http://catalog.gpo.gov/F/?func=find-cccl_term=GVD%3DE%202.11/3:EL%202 There, Jonathan, you have a dereferenceable URI structure that you A) don't have to worry about pointing at something misleading B) don't have to maintain (although I'll be happy to add whoever as a maintainer to this PURL) If the GPO ever has a better alternative, we just point the PURL at it in the future. Beautiful work, Ross. Thank you. best, Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
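For anyone wanting to generate these links, here is a rough Python sketch of the "urlencode the sudoc and append it to the PURL" step Ross describes. The helper name `sudoc_purl` is hypothetical, and leaving "/" and ":" unescaped is an assumption made only so the output matches the example URL in the message; the thread does not say which characters the PURL expects to be escaped.

```python
from urllib.parse import quote

# Base of the partial-redirect PURL given in the message above.
PURL_BASE = "http://purl.org/NET/sudoc/"

def sudoc_purl(sudoc: str) -> str:
    """Build a PURL for a SuDoc number by percent-encoding it."""
    # Assumption: "/" and ":" stay literal, as in the example in the thread.
    return PURL_BASE + quote(sudoc, safe="/:")
```

With the SuDoc number from the thread, `sudoc_purl("E 2.11/3:EL 2")` gives `http://purl.org/NET/sudoc/E%202.11/3:EL%202`, the same URL Ross quotes.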
Re: [CODE4LIB] registering info: uris?
From: Erik Hetzner erik.hetz...@ucop.edu I believe that registering a domain would be less work than going through an info URI registration process, but I don’t know how difficult the info URI registration process would be (thus bringing the conversation full circle). [1] Leaving aside religious issues I just want to be sure we're clear on one point: the work required for the info URI process is exactly the amount of work required, no more no less. It forces you to specify clear syntax and semantics, normalization (if applicable), etc. If you go a different route because it's less work, then you're probably avoiding doing work that needs to be done. --Ray
Re: [CODE4LIB] registering info: uris?
That's got a session token in it, Andrew. Not to mention it will no longer resolve to anything whenever GPO changes their ILS platform. You guys don't seem to believe that I've spent a chunk of time investigating all this stuff before I even brought it up here. I did, really! Jonathan Houghton,Andrew wrote: From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Friday, March 27, 2009 6:09 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? If GPO had a system where I could resolve Sudoc identifiers, then this whole problem would be solved right there, I wouldn't need to go any further, I'd just use the http URI's associated with that system as identifiers! This whole problem statement is because GPO does not provide any persistent URIs for sudoc's in the first place, right? With a little Googling how about this: sudoc: E 2.11/3:EL 2 http://catalog.gpo.gov/F/FIBJ8T23DNC33L6KEDYR7Q8Q3MF6BI9H7Q5XPG4KB3N57HX35X-17544?func=scanscan_code=SUDscan_start=E+2.11%2F3%3AEL+2 looks like the param scan_start= holds the sudoc number. Sure it gives you other results, but it might work for your purposes. Seems like they are creating bad HTTP responses since Fiddler throws a protocol violation because they do not end the HTTP headers with CR,LF,CR,LF and instead use LF,LF... Andy.
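As an aside on Andy's CR,LF remark, a small Python sketch of the check a tool might make on a raw response: does the header block end with CR LF CR LF, or with a bare LF LF? This is only an illustration with a made-up function name (`header_terminator`) and hand-written sample bytes; it is not a description of how Fiddler itself works.

```python
def header_terminator(raw: bytes) -> str:
    """Report which blank-line sequence ends the header block of a raw response."""
    crlf = raw.find(b"\r\n\r\n")  # terminator HTTP expects
    lf = raw.find(b"\n\n")        # bare-LF terminator sent by some servers
    if crlf != -1 and (lf == -1 or crlf < lf):
        return "CRLF CRLF"
    if lf != -1:
        return "LF LF"
    return "none found"
```

Fed a response whose headers end in `\r\n\r\n` it returns "CRLF CRLF"; fed one that uses `\n\n`, as Andy says the GPO server does, it returns "LF LF".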
Re: [CODE4LIB] registering info: uris?
I think this is a good point. Ray Denenberg, Library of Congress wrote: From: Erik Hetzner erik.hetz...@ucop.edu I believe that registering a domain would be less work than going through an info URI registration process, but I don’t know how difficult the info URI registration process would be (thus bringing the conversation full circle). [1] Leaving aside religious issues I just want to be sure we're clear on one point: the work required for the info URI process is exactly the amount of work required, no more no less. It forces you to specify clear syntax and semantics, normalization (if applicable), etc. If you go a different route because it's less work, then you're probably avoiding doing work that needs to be done. --Ray
Re: [CODE4LIB] registering info: uris?
So is there anything wrong with having both that http-based PURL URI available, AND an info uri? Not only available, but in common use? It gets complicated thinking about these things. There are potentially several things wrong with it. Jonathan Ross Singer wrote: On Mon, Mar 30, 2009 at 10:12 AM, Ray Denenberg, Library of Congress r...@loc.gov wrote: Leaving aside religious issues I just want to be sure we're clear on one point: the work required for the info URI process is exactly the amount of work required, no more no less. It forces you to specify clear syntax and semantics, normalization (if applicable), etc. If you go a different route because it's less work, then you're probably avoiding doing work that needs to be done. Avoiding the religious debate that I *think* Ray is referring to (http vs. info URIs) and instead raising a different religious debate... I don't have a problem with going through this process to formalize an info URI once a domain has been thoroughly evaluated and worked out, but it throws any and all sense of 'agility' out the window and in many cases, kills any potential hope of actually seeing these identifiers at all. The upfront costs are just too high, the details too arcane and the payoff too low for somebody like Jonathan to solve an immediate problem. I'm not saying we shouldn't think these things out beforehand; recklessness, of course, is not the answer. Perfection, however, being the enemy of the good makes me think the info:uri process isn't a particularly good or efficient one for working with real world problems. Add to it that nobody gives a damn about info:uris outside of libraries, it seems like a total waste of energy. Although I suppose that strays back into the original religious debate. -Ross.
Re: [CODE4LIB] registering info: uris?
Jonathan Rochkind writes: So is there anything wrong with having both that http-based PURL URI available, AND an info uri? Not only available, but in common use? Yes, of course! You don't want _two_ vocabularies of URIs for SUDOCs! Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk I am not so much afraid of death, as ashamed thereof -- Sir Thomas Browne (1605-1682), English physician and author.
Re: [CODE4LIB] registering info: uris?
There should be no issue with having both, mainly because like I mentioned earlier, nobody cares about info:uris. Take, for instance, DOIs. What do you see in the wild? Do you ever see info:uris (except in OpenURLs)? If you don't see http://dx.doi.org/ URIs you generally see doi:10... URIs. It seems like having http and info URIs would *have* to be fine, since info:uris *not being dereferenceable* are far less useful (I won't go so far as 'useless') on the web, which is where all this is happening. As Ray mentioned earlier in this thread, there is absolutely no reason an object cannot have multiple identifiers, especially if they stand to serve somewhat different purposes. I guess the way I look at it is: 1. The web is not going to wait for info:uris 2. The web is not going to use info:uris anyway, even after we've exhausted all of the corner cases and come up with the perfect URI model for a given domain, *because there's nothing the web can do with them anyway*. -Ross. On Mon, Mar 30, 2009 at 10:55 AM, Jonathan Rochkind rochk...@jhu.edu wrote: So is there anything wrong with having both that http-based PURL URI available, AND an info uri? Not only available, but in common use? It gets complicated thinking about these things. There are potentially several things wrong with it. Jonathan Ross Singer wrote: On Mon, Mar 30, 2009 at 10:12 AM, Ray Denenberg, Library of Congress r...@loc.gov wrote: Leaving aside religious issues I just want to be sure we're clear on one point: the work required for the info URI process is exactly the amount of work required, no more no less. It forces you to specify clear syntax and semantics, normalization (if applicable), etc. If you go a different route because it's less work, then you're probably avoiding doing work that needs to be done. Avoiding the religious debate that I *think* Ray is referring to (http vs. info URIs) and instead raising a different religious debate... 
I don't have a problem with going through this process to formalize an info URI once a domain has been thoroughly evaluated and worked out, but it throws any and all sense of 'agility' out the window and in many cases, kills any potential hope of actually seeing these identifiers at all. The upfront costs are just too high, the details too arcane and the payoff too low for somebody like Jonathan to solve an immediate problem. I'm not saying we shouldn't think these things out beforehand; recklessness, of course, is not the answer. Perfection, however, being the enemy of the good makes me think the info:uri process isn't a particularly good or efficient one for working with real world problems. Add to it that nobody gives a damn about info:uris outside of libraries, it seems like a total waste of energy. Although I suppose that strays back into the original religious debate. -Ross.
Re: [CODE4LIB] registering info: uris?
On Mon, 2009-03-30 at 16:08 +0100, Ross Singer wrote: There should be no issue with having both, mainly because like I mentioned earlier, nobody cares about info:uris. s/nobody cares/the web doesn't care/ 'The Web' isn't the only use case. There are plenty of reasons for having non-dereferenceable identifiers, for example for things which do not have a web representation, or which have so many web representations that favouring one over another would be a waste of time. For example abstract concepts. I guess the way I look at it is: 1. The web is not going to wait for info:uris 2. The web is not going to use info:uris anyway, even after we've exhausted all of the corner cases and come up with the perfect URI model for a given domain, *because there's nothing the web can do with them anyway*. Working As Intended. If you want an identifier that *explicitly* cannot be dereferenced, then info URIs are a good choice. If you want one that can be dereferenced to some representation of the identified object, then HTTP is the only choice. Rob
Re: [CODE4LIB] registering info: uris?
On Mon, Mar 30, 2009 at 11:18 AM, Ray Denenberg, Library of Congress r...@loc.gov wrote: Nor do people outside of libraries care about identifiers. Except, of course, for Tim Berners-Lee and anybody who listens to him: http://www.w3.org/DesignIssues/LinkedData.html -Ross.
Re: [CODE4LIB] registering info: uris?
From: Ross Singer rossfsin...@gmail.com nobody gives a damn about info:uris outside of libraries, Nor do people outside of libraries care about identifiers. --Ray
Re: [CODE4LIB] registering info: uris?
Ross Singer writes: There should be no issue with having both, mainly because like I mentioned earlier, nobody cares about info:uris. Take, for instance, DOIs. What do you see in the wild? Do you ever see info:uris (except in OpenURLs)? If you don't see http://dx.doi.org/ URIs you generally see doi:10... URIs. It seems like having http and info URIs would *have* to be fine, since info:uris *not being dereferenceable* are far less useful (I won't go so far as 'useless') on the web, which is where all this is happening. What on earth does dereferencing have to do with this? We're talking about an identifier. Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk You can never go back -- only forwards, or stand still.
Re: [CODE4LIB] registering info: uris?
On Mon, Mar 30, 2009 at 11:17 AM, Rob Sanderson azar...@liverpool.ac.uk wrote: If you want an identifier that *explicitly* cannot be dereferenced, then info URIs are a good choice. If you want one that can be dereferenced to some representation of the identified object, then HTTP is the only choice. Yes, I completely agree with this, which is why I think it *has* to be no problem that both info:uris and http uris can co-exist. I'm not entirely sure of the use case of identifiers that cannot be dereferenced, I mean, I'm sure they exist (driver's license numbers might be an example), but I don't see anything in the current info:uri registry that wouldn't necessarily be better served with an HTTP uri. -Ross.
Re: [CODE4LIB] registering info: uris?
Because the ability to de-reference seems to be the main reason to use an HTTP URI as an identifier, and the main reason that some people prefer an HTTP URI as an identifier to an info: URI. Jonathan Mike Taylor wrote: Ross Singer writes: There should be no issue with having both, mainly because like I mentioned earlier, nobody cares about info:uris. Take, for instance, DOIs. What do you see in the wild? Do you ever see info:uris (except in OpenURLs)? If you don't see http://dx.doi.org/ URIs you generally see doi:10... URIs. It seems like having http and info URIs would *have* to be fine, since info:uris *not being dereferenceable* are far less useful (I won't go so far as 'useless') on the web, which is where all this is happening. What on earth does dereferencing have to do with this? We're talking about an identifier. Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk You can never go back -- only forwards, or stand still.
Re: [CODE4LIB] registering info: uris?
Jonathan Rochkind writes: Take, for instance, DOIs. What do you see in the wild? Do you ever see info:uris (except in OpenURLs)? If you don't see http://dx.doi.org/ URIs you generally see doi:10... URIs. It seems like having http and info URIs would *have* to be fine, since info:uris *not being dereferenceable* are far less useful (I won't go so far as 'useless') on the web, which is where all this is happening. What on earth does dereferencing have to do with this? We're talking about an identifier. Because the ability to de-reference seems to be the main reason to use an HTTP URI as an identifier, and the main reason that some people prefer an HTTP URI as an identifier to an info: URI. That looks like a plain and simple confusion to me. Identifiers and addresses are two quite different things. That they happen to be expressed in similar or even identical syntax is an accident of history. Surely our experiences with XML namespaces (which do not exist) have taught us that? Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk Our users will know fear and cower before our software! Ship it! Ship it and let them flee like the dogs they are! -- Klingon Programming Mantra
Re: [CODE4LIB] registering info: uris?
This is a long argument that's been going on in other communities for a long time, Mike. I can see both sides. Jonathan Mike Taylor wrote: Jonathan Rochkind writes: Take, for instance, DOIs. What do you see in the wild? Do you ever see info:uris (except in OpenURLs)? If you don't see http://dx.doi.org/ URIs you generally see doi:10... URIs. It seems like having http and info URIs would *have* to be fine, since info:uris *not being dereferenceable* are far less useful (I won't go so far as 'useless') on the web, which is where all this is happening. What on earth does dereferencing have to do with this? We're talking about an identifier. Because the ability to de-reference seems to be the main reason to use an HTTP URI as an identifier, and the main reason that some people prefer an HTTP URI as an identifier to an info: URI. That looks like a plain and simple confusion to me. Identifiers and addresses are two quite different things. That they happen to be expressed in similar or even identical syntax is an accident of history. Surely our experiences with XML namespaces (which do not exist) have taught us that? Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk Our users will know fear and cower before our software! Ship it! Ship it and let them flee like the dogs they are! -- Klingon Programming Mantra
Re: [CODE4LIB] registering info: uris?
Houghton,Andrew writes: Take, for instance, DOIs. What do you see in the wild? Do you ever see info:uris (except in OpenURLs)? If you don't see http://dx.doi.org/ URIs you generally see doi:10... URIs. It seems like having http and info URIs would *have* to be fine, since info:uris *not being dereferenceable* are far less useful (I won't go so far as 'useless') on the web, which is where all this is happening. What on earth does dereferencing have to do with this? We're talking about an identifier. Exactly, that is what people don't understand about RFC 3986. URIs are just identifiers and have nothing to do with dereferencing. Dereferencing only comes into play when the URI is used with an actual protocol like HTTP. The only thing the http:, e.g., URI scheme, starting the URI tells you is what the syntax of the rest of the URI looks like. This is where the authors of info URIs missed the boat. They conflated the URI scheme, e.g., http:, with dereferencing and used it as a justification for a new URI scheme. The authors were told of that misconception before info became an RFC by both the IETF and W3C [...] ... and by me, for what it's worth (remember, Ray? :-)) ... [...], but they decided to proceed anyway creating another library specific standard that no one else will use. If people would just follow the prescribed practice by the W3C: http://www.w3.org/TR/webarch/ Architecture of the Web says: 2.3.1. URI aliases Best practice: A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource. 2.4. URI Schemes Best practice: A specification SHOULD reuse an existing URI scheme (rather than create a new one) when it provides the desired properties of identifiers and their relation to resources. True -- it's all there. 
The problem is that, after setting up a non-dereferenceable http: URI to name something like an XML namespace or a CQL context set, it's just so darned _tempting_ to put something explanatory at the location which happens to be indicated by that URI :-) Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk You can also join us online at www.msnbc.com. You know, I'm always afraid I'm going to say too many Ws. -- NBC news anchorman Tom Brokaw.
Re: [CODE4LIB] registering info: uris?
Meanwhile, there are others who are arguing just as strongly that identifiers should _always_ be resolvable. Seriously, this debate has been going on in a while in other forums, we aren't the first to have it. I can see both sides, neither seems obviously right to me. Which I guess suggests that we need room for both resolvable identifiers and non-resolvable identifiers. (And then people will start arguing on whether http uri's provide all the room we need for non-resolvable ones or not. That argument has been had before too, and I see both sides there too!) Some hints of the existing argument in other forums can be found in this post by Stu Weibel, and the other posts it links to. http://weibel-lines.typepad.com/weibelines/2006/08/uncoupling_iden.html Jonathan Houghton,Andrew wrote: From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Monday, March 30, 2009 11:30 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Ross Singer writes: There should be no issue with having both, mainly because like I mentioned earlier, nobody cares about info:uris. Take, for instance, DOIs. What do you see in the wild? Do you ever see info:uris (except in OpenURLs)? If you don't see http://dx.doi.org/ URIs you generally see doi:10... URIs. It seems like having http and info URIs would *have* to be fine, since info:uris *not being dereferenceable* are far less useful (I won't go so far as 'useless') on the web, which is where all this is happening. What on earth does dereferencing have to do with this? We're talking about an identifier. Exactly, that is what people don't understand about RFC 3986. URIs are just identifiers and have nothing to do with dereferencing. Dereferencing only comes into play when the URI is used with an actual protocol like HTTP. The only thing the http:, e.g., URI scheme, starting the URI tells you is what the syntax of the rest of the URI looks like. 
This is where the authors of info URIs missed the boat. They conflated the URI scheme, e.g., http:, with dereferencing and used it as a justification for a new URI scheme. The authors were told of that misconception before info became an RFC by both the IETF and W3C, but they decided to proceed anyway creating another library specific standard that no one else will use. If people would just follow the prescribed practice by the W3C: http://www.w3.org/TR/webarch/ Architecture of the Web says: 2.3.1. URI aliases Best practice: A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource. 2.4. URI Schemes Best practice: A specification SHOULD reuse an existing URI scheme (rather than create a new one) when it provides the desired properties of identifiers and their relation to resources. Quote: While Web architecture allows the definition of new schemes, introducing a new scheme is costly. Many aspects of URI processing are scheme-dependent, and a large amount of deployed software already processes URIs of well-known schemes. Introducing a new URI scheme requires the development and deployment not only of client software to handle the scheme, but also of ancillary agents such as gateways, proxies, and caches. See [RFC2718] for other considerations and costs related to URI scheme design. http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 This tag finding pretty much debunks all the reasons given by the info URI authors for creating a new URI scheme. I think Erik Hetzner also referenced it in his posts. Andy.
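Andy's reading of RFC 3986, that the scheme only tells you how to parse the rest of the string, can be illustrated with a short Python sketch. The helper name `split_uri` and the DOI-shaped values in the usage note are invented for the example; `urllib.parse.urlsplit` is the standard-library parser, and nothing here touches the network.

```python
from urllib.parse import urlsplit

def split_uri(uri: str) -> tuple[str, str, str]:
    """Split a URI into (scheme, authority, path) without dereferencing it."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path
```

For the hypothetical `http://dx.doi.org/10.1000/example` this yields the scheme `http`, the authority `dx.doi.org` and the path `/10.1000/example`; for `info:doi/10.1000/example` it yields the scheme `info`, an empty authority and the path `doi/10.1000/example`. Both are handled purely as identifiers; whether anything answers at the http one is a separate question that only arises when the HTTP protocol is actually used.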
Re: [CODE4LIB] registering info: uris?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Monday, March 30, 2009 12:15 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? The problem is that, after setting up a non-dereferenceable http: URI to name something like an XML namespace or a CQL context set, it's just so darned _tempting_ to put something explanatory at the location which happens to be indicated by that URI :-) and that is what you are supposed to do... Having a representation of the thing is useful and is what makes the Web and any other hypertext system useful. Andy.
Re: [CODE4LIB] registering info: uris?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Monday, March 30, 2009 12:16 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Some hints of the existing argument in other forums can be found in this post by Stu Weibel, and the other posts it links to. http://weibel-lines.typepad.com/weibelines/2006/08/uncoupling_iden.html Unfortunately, Stu is an author of the info URI specification and he makes the same arguments that they made for the justification of the info URI RFC, which have been debunked by the W3C: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 Having unresolvable URIs is anti-Web since the Web is a hypertext system where links are required to make it useful. Exposing unresolvable links in content on the Web doesn't make the Web more useful. Andy.
Re: [CODE4LIB] registering info: uris?
At Mon, 30 Mar 2009 10:12:39 -0400, Ray Denenberg, Library of Congress wrote: Leaving aside religious issues I just want to be sure we're clear on one point: the work required for the info URI process is exactly the amount of work required, no more no less. It forces you to specify clear syntax and semantics, normalization (if applicable), etc. If you go a different route because it's less work, then you're probably avoiding doing work that needs to be done. Reading over your previous message regarding mapping SuDocs syntax to URI syntax, I completely agree about the necessity of clarifying these rules. But I was referring to the bureaucratic overhead (little though it may be) in registering an info: URI. This overhead may or may not be useful, but it is there, including a submission process, internal review, public comments (according to the draft info URI registry policy). -Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] registering info: uris?
I agree with this as well. I guess it just depends on whether you think this needs to be done prior to facilitating the process to mint URIs or after. The advantage to the former is that it will actually get documented. Speaking of, if anybody wants to help formalize this for the purl method, I'll be happy to work on it with somebody. -Ross. On Mon, Mar 30, 2009 at 1:40 PM, Erik Hetzner erik.hetz...@ucop.edu wrote: At Mon, 30 Mar 2009 10:12:39 -0400, Ray Denenberg, Library of Congress wrote: Leaving aside religious issues I just want to be sure we're clear on one point: the work required for the info URI process is exactly the amount of work required, no more no less. It forces you to specify clear syntax and semantics, normalization (if applicable), etc. If you go a different route because it's less work, then you're probably avoiding doing work that needs to be done. Reading over your previous message regarding mapping SuDocs syntax to URI syntax, I completely agree about the necessity of clarifying these rules. But I was referring to the bureaucratic overhead (little though it may be) in registering an info: URI. This overhead may or may not be useful, but it is there, including a submission process, internal review, public comments (according to the draft info URI registry policy). -Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] registering info: uris?
It's interesting that there are at least three, if not four, viewpoints being represented in this conversation. The first argument is over whether all identifiers should be resolvable or not. While I respect the argument that it's _useful_ to have resolvable (to something) identifiers, I think it's an unnecessary limitation to say that all identifiers _must_ be resolvable. There are cases where it is infeasible on a business level to support resolvability. It may be for as simple a reason as that the body who actually maintains the identifiers is not interested in providing such at present. You can argue that they _ought_ to be, but back in the real world, should that stand as a barrier to anyone else using URI identifiers based on that particular identifier system? Wouldn't it be better if it didn't have to be? [ Another obvious example is the SICI -- an identifier for a particular article in a serial. Making these all resolvable in a useful way is a VERY non-trivial exercise. It is not at all easy, and a solution is definitely not cheap (DOI is an attempted solution, which some publishers choose not to pay for: both the DOI fees and the cost of building out their own infrastructure to support it). Why should we be prevented from using identifiers for a particular article in a serial until this difficult and expensive problem is solved?] So I don't buy that all identifiers must always be resolvable, and that if we can't make an identifier resolvable we can't use it. That excludes too much useful stuff. The next argument is, okay, so maybe all identifiers don't have to be resolvable, but even if it's not resolvable you can still use an http uri for it, just one that doesn't actually resolve. Formally, this is certainly correct. There's no formal requirement that an http URI go anywhere, that there even be an HTTP server responding at the hostname mentioned _at all_. So you _could_ use an http uri like that. 
But it gets confusing quickly, in part because the first argument referenced is still going on, and some people assume that any http URI _ought_ to be resolvable (to _something_; to _what_ is another argument). Using a non-http uri is a way to avoid confusion over your intentions, stating that you acknowledged from the start that it was infeasible at the present time to provide http resolution for these identifiers. Jonathan Houghton,Andrew wrote: From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Monday, March 30, 2009 12:16 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Some hints of the existing argument in other forums can be found in this post by Stu Weibel, and the other posts it links to. http://weibel-lines.typepad.com/weibelines/2006/08/uncoupling_iden.html Unfortunately, Stu is an author of the info URI specification and he makes the same arguments that they made for the justification of the info URI RFC, which have been debunked by the W3C: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 Having unresolvable URIs is anti-Web since the Web is a hypertext system where links are required to make it useful. Exposing unresolvable links in content on the Web doesn't make the Web more useful. Andy.
Re: [CODE4LIB] registering info: uris?
On Mar 30, 2009, at 11:18 AM, Ray Denenberg, Library of Congress wrote: From: Ross Singer rossfsin...@gmail.com nobody gives a damn about info:uris outside of libraries, Nor do people outside of libraries care about identifiers. You might be surprised: http://www.lsrn.org/ -hilmar -- === : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : ===
Re: [CODE4LIB] registering info: uris?
At Mon, 30 Mar 2009 13:58:04 -0400, Jonathan Rochkind wrote: It's interesting that there are at least three, if not four, viewpoints being represented in this conversation. The first argument is over whether all identifiers should be resolvable or not. While I respect the argument that it's _useful_ to have resolvable (to something) identifiers, I think it's an unnecessary limitation to say that all identifiers _must_ be resolvable. There are cases where it is infeasible on a business level to support resolvability. It may be for as simple a reason as that the body who actually maintains the identifiers is not interested in providing such at present. You can argue that they _ought_ to be, but back in the real world, should that stand as a barrier to anyone else using URI identifiers based on that particular identifier system? Wouldn't it be better if it didn't have to be? [ Another obvious example is the SICI -- an identifier for a particular article in a serial. Making these all resolvable in a useful way is a VERY non-trivial exercise. It is not at all easy, and a solution is definitely not cheap (DOI is an attempted solution, which some publishers choose not to pay for: both the DOI fees and the cost of building out their own infrastructure to support it). Why should we be prevented from using identifiers for a particular article in a serial until this difficult and expensive problem is solved?] So I don't buy that all identifiers must always be resolvable, and that if we can't make an identifier resolvable we can't use it. That excludes too much useful stuff. I don’t actually think that there is anybody who is arguing that all identifiers must be resolvable. There are people who argue that there are identifiers which must NOT be resolvable; at least in their basic form. (see Stuart Weibel [1]). […] best, Erik 1. http://weibel-lines.typepad.com/weibelines/2006/08/uncoupling_iden.html ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] registering info: uris?
From: Hilmar Lapp hl...@duke.edu Nor do people outside of libraries care about identifiers. You might be surprised: http://www.lsrn.org/ Yes, I overstated; let me rephrase. There are communities who are interested in specific object classes and want identifier schemes for them. For libraries there are books, articles, journals, and many others. And certainly this isn't limited to libraries; for example, many scientific disciplines have a similar interest in identifier schemes for objects in specific object classes. But the term identifier has taken on a whole new meaning with the web. It has now been generalized to identify any resource, and we don't even have a clear definition of resource, aside from the convoluted "anything that can be identified". The discussions on this are often a convoluted mess, and it's no wonder location and identity get confused. And because of all the emphasis on solving this part of the web architecture - which hasn't been accomplished, and there is debate within the W3C whether it is even possible - the original concept of identifier seems to be lost, aside from within the communities I alluded to above. And it is for those communities that the info URI is useful. Now as to my reference to religious issues, a statement like "Having unresolvable URIs is anti-Web" would be better stated as: "Having unresolvable URIs IN MY OPINION is anti-Web". It is an opinion, not a fact. Stating it as fact is dogmatic. It is a reasonable opinion; however, my opinion, "Having unresolvable URIs IN MY OPINION is PRO-Web", is just as reasonable. I needn't go into further detail, we've beaten this to death already. --Ray
Re: [CODE4LIB] registering info: uris?
Erik Hetzner wrote: I don’t actually think that there is anybody who is arguing that all identifiers must be resolvable. There are people who argue that there are identifiers which must NOT be resolvable; at least in their basic form. (see Stuart Weibel [1]). There are indeed people arguing that, Erik, on this very list. Like, in the email I responded to (did you read that one?). That's why I wrote what I did, man! You know I'm the one who cited Stu's argument first on this list! I am aware of his arguments. I am aware of people arguing various things on this issue. But when did someone suggest that all identifiers must be resolvable? When Andrew argued that: Having unresolvable URIs is anti-Web since the Web is a hypertext system where links are required to make it useful. Exposing unresolvable links in content on the Web doesn't make the Web more useful. Okay, I guess he didn't actually SAY that you should never have non-resolvable identifiers, but he rather strongly implied it, by using the anti-Web epithet. But now we're arguing about what we're arguing about, which is the sure sign that an internet argument should die. Suffice it to say that there are at LEAST three viewpoints (if not more) being expressed in this argument, it's not just two sides. And that, I agree with Ray, these are NOT entirely solved questions, the right answer is not always obvious, reasonable people can disagree. (I happen to think there are a handful of clear WRONG answers, but also a variety of competing potentially right ones.) Jonathan
Re: [CODE4LIB] registering info: uris?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Monday, March 30, 2009 3:52 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? But when did someone suggest that all identifiers must be resolvable? When Andrew argued that: Having unresolvable URIs is anti-Web since the Web is a hypertext system where links are required to make it useful. Exposing unresolvable links in content on the Web doesn't make the Web more useful. Okay, I guess he didn't actually SAY that you should never have non-resolvable identifiers, but he rather strongly implied it, by using the anti-Web epithet. You are correct that I didn't say that you should never have unresolvable identifiers and I wasn't implying that either. Though I was pointing out that sticking <a href="info:lccn/sh2009123456">Text</a> into the hypertext system where info URIs are unresolvable negates the effect of linking to it in the first place. Andy.
Re: [CODE4LIB] registering info: uris?
There are obviously other uses for URIs than sticking them in the 'href' attribute of an <a> element. Like, the uses I thought this conversation was about? What are we talking about again?
Re: [CODE4LIB] registering info: uris?
At Mon, 30 Mar 2009 15:52:10 -0400, Jonathan Rochkind wrote: Erik Hetzner wrote: I don’t actually think that there is anybody who is arguing that all identifiers must be resolvable. There are people who argue that there are identifiers which must NOT be resolvable; at least in their basic form. (see Stuart Weibel [1]). There are indeed people arguing that, Erik, on this very list. Like, in the email I responded to (did you read that one?). That's why I wrote what I did, man! You know I'm the one who cited Stu's argument first on this list! I am aware of his arguments. I am aware of people arguing various things on this issue. My apologies for missing Andrew’s argument and not pointing out that you had originally pointed to Stuart’s argument. But when did someone suggest that all identifiers must be resolvable? When Andrew argued that: Having unresolvable URIs is anti-Web since the Web is a hypertext system where links are required to make it useful. Exposing unresolvable links in content on the Web doesn't make the Web more useful. Okay, I guess he didn't actually SAY that you should never have non-resolvable identifiers, but he rather strongly implied it, by using the anti-Web epithet. Given Andrew’s later response, I would like to restate my previous argument: I don’t [] think that there is anybody who is +seriously+ arguing that all identifiers must be resolvable +to be useful as identifiers+. best, Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] registering info: uris?
Thanks Ray. Oh boy, I don't know enough about SuDoc to describe the syntax rules fully. I can spend some more time with the SuDoc documentation (written for a pre-computer era) and try to figure it out, or do the best I can. I mean, the info registration can clearly point to the existing SuDoc documentation and say one of these -- but actually describing the syntax formally may or may not be possible/easy/possible-for-me-personally. I can't even tell if normalization would be required or not. I don't think so. I think SuDocs don't suffer from that problem LCCNs did to require normalization, I think they already have consistent form, but I'm not certain. I'll see what I can do with it. But Ray, you work for 'the government'. Do you have a relationship with a counter-part at GPO that might be interested in getting involved with this? Jonathan Ray Denenberg, Library of Congress wrote: It's a fairly straightforward process, See: http://info-uri.info/registry/register.html You should look at a few examples first, go to http://info-uri.info/registry/ and click on a few of those listed in the left column. I think registering one for SuDocs would be fairly easy. The info folks are most concerned that the syntax rules are well-described. I had registered a few of these before they started cracking the whip on that (and rightly so), and when I registered info:lc it became more difficult; you might want to look at that for an example: http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lc/ Also, normalization - I suggested looking at info:lccn normalization rules: http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lccn/ --Ray - Original Message - From: Jonathan Rochkind rochk...@jhu.edu To: CODE4LIB@LISTSERV.ND.EDU Sent: Friday, March 27, 2009 3:12 PM Subject: [CODE4LIB] registering info: uris? Does anyone know the process for registering a sub-scheme for info: uris? 
I'd like to have one for SuDoc classification numbers, info:sudoc/. I'm not sure if I can register that on my own, without working with the US Government Printing Office, who actually maintains sudocs. But if I have to get GPO to do it, I'll probably give up quicker (unless it turns out easier than I thought to find the right person at GPO and get them to sign on -- I doubt it!). Or if the registration process is really long and onerous. But if it's easy enough to just fill out a form and get info:sudoc registered, I'd rather it be legal than use things that look like an info uri but really aren't a legally registered sub-scheme. Anyone know? Jonathan
Re: [CODE4LIB] registering info: uris?
Pointing to the documentation and saying one of these isn't going to work, I'm afraid. Most important is to make sure that the syntax is consistent with URI syntax. Where the syntax of the identifier you're representing is potentially at odds with URI syntax, you might have to make adjustments, like percent-encoding. So if you're going to register sudoc, you're going to have to understand the syntax to some degree, there's really no way around it. (I didn't know the lccn syntax, registering it forced me to learn it, and I'm a better man for it.) I don't know much about SuDoc, and most everything seems to point to http://www.gpo.gov/su_docs/fdlp/pubs/explain.html which doesn't really explain their syntax. (Though if you look a bit harder maybe you'll find something better.) But I see this example: Y 3.C 76/3:2 K 54 That's apparently a sudoc. It immediately raises the following flags: spaces, slash, colon, and case (sensitivity). For your purposes I don't think that colon or slash is a problem. (They become a problem when you are using them as special characters for delimitation, but you're not doing that.) Spaces, though, have to be percent-encoded. (That simply means replacing each occurrence of a space with %20.) You also need to look at case-sensitivity. If sudocs are case-sensitive, no problem; if not, then you may want to normalize to either upper or lower case. There may not be any normalization issues (other than case sensitivity, if that). Normalization is an issue only if a particular sudoc can be represented by more than one string. If so you have two choices: 1. prescribe a canonical form (which is the approach we took for LCCNs). 2. simply describe the rules for determining when two strings represent the same sudoc (there is no rule that says that two different info URIs can't refer to the same resource). You can contact me privately if you have problems. No, sorry, I don't know anyone at GPO. I worked the graveyard shift there part time during college. 
(I had to load mailing machines with junk mail. Several junk items loaded into a machine which would combine them into one mailing item. The machine would jam about every tenth time. Worst job I ever had.) But that was many years ago and that's the last contact I've had with GPO. Good luck. -Ray
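Ray's advice in this message (percent-encode the spaces, leave slash and colon alone, and settle on one canonical string per SuDoc) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `info:sudoc/` is the namespace Jonathan proposes and is not registered as far as this thread shows, and the function names, the whitespace-collapsing rule and the optional case folding are hypothetical choices, not documented SuDoc rules.

```python
from urllib.parse import quote, unquote

# Hypothetical prefix: the thread only proposes info:sudoc/, it is not registered.
SUDOC_PREFIX = "info:sudoc/"


def sudoc_to_info_uri(sudoc: str, fold_case: bool = False) -> str:
    """Turn a SuDoc string into a candidate info: URI (illustrative only)."""
    # Assumed normalization: trim, and collapse runs of whitespace to one space.
    canonical = " ".join(sudoc.split())
    # Assumed and optional: fold case only if SuDocs prove to be case-insensitive.
    if fold_case:
        canonical = canonical.upper()
    # Keep "/" and ":" literal; spaces and other unsafe characters become %XX.
    return SUDOC_PREFIX + quote(canonical, safe="/:")


def info_uri_to_sudoc(uri: str) -> str:
    """Recover the SuDoc string from a URI built by sudoc_to_info_uri."""
    if not uri.startswith(SUDOC_PREFIX):
        raise ValueError("not an info:sudoc URI: " + uri)
    return unquote(uri[len(SUDOC_PREFIX):])


if __name__ == "__main__":
    # Ray's example SuDoc from the message above.
    print(sudoc_to_info_uri("Y 3.C 76/3:2 K 54"))
    # prints info:sudoc/Y%203.C%2076/3:2%20K%2054
```

Prescribing one canonical string, as the sketch does, corresponds to the first of Ray's two choices; the second (rules for deciding when two strings name the same SuDoc) would instead compare the normalized forms without requiring them in the URI.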
Re: [CODE4LIB] registering info: uris?
At Fri, 27 Mar 2009 15:36:43 -0400, Jonathan Rochkind wrote: Thanks Ray. Oh boy, I don't know enough about SuDoc to describe the syntax rules fully. I can spend some more time with the SuDoc documentation (written for a pre-computer era) and try to figure it out, or do the best I can. I mean, the info registration can clearly point to the existing SuDoc documentation and say one of these -- but actually describing the syntax formally may or may not be possible/easy/possible-for-me-personally. I can't even tell if normalization would be required or not. I don't think so. I think SuDocs don't suffer from that problem LCCNs did to require normalization, I think they already have consistent form, but I'm not certain. I'll see what I can do with it. But Ray, you work for 'the government'. Do you have a relationship with a counter-part at GPO that might be interested in getting involved with this? Hi Jonathan - Obviously I don’t know your requirements, but I’d like to suggest that before going down the info: URI road, you read the W3C Technical Architecture Group’s finding ‘URNs, Namespaces and Registries’ [1]. Abstract: This finding addresses the questions "When should URNs or URIs with novel URI schemes be used to name information resources for the Web?" and "Should registries be provided for such identifiers?". The answers given are "Rarely if ever" and "Probably not". Common arguments in favor of such novel naming schemas are examined, and their properties compared with those of the existing http: URI scheme. Three case studies are then presented, illustrating how the http: URI scheme can be used to achieve many of the stated requirements for new URI schemes. best, Erik Hetzner 1. http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] registering info: uris?
Yeah, I thought of the URI encoding issue, that's easy enough to deal with, makes sense. I have no idea how to tell if SuDocs are case sensitive or not. But they ARE all assigned by the GPO, and look-up-able in the GPO catalog. Yeah, they have to be URL encoded, certainly, but can't we just say must be a valid SuDoc class (including book number) assigned by the GPO, but [url encode it]. This can't be the only use case for essentially arbitrary strings assigned by a third party controlling authority, that you want to make into an info: uri, right? But maybe I'll try doing the best I can, with or without GPO assistance (Ed Summers said he thought he might know somebody at GPO interested in identifiers), and maybe run it by you? If this ends up being a huge time sink -- I'm probably going to give up, and just use my own illegal info:sudoc identifiers that aren't really registered at all, which would be bad, but I need a sudoc URI and don't have a huge amount of time to sink into doing it 'right'. Believe me, I have already spent quite a bit of time with that document you reference. It was written for an earlier era, clearly. Jonathan
Re: [CODE4LIB] registering info: uris?
I am looking for the easiest possible way to get a legal URI representing a sudoc. My understanding, after looking at this stuff previously, is that info: is a LOT lower barrier than urn:, and that's part of its purpose. Before Ed or someone else mentions http, to me, using http: URIs would only make sense if the GPO were actually interested in supporting such in a persistent way. I don't really want to have to go down that road just to get a legal URI for a sudoc, but if someone else does, please feel free. :) Jonathan
Re: [CODE4LIB] registering info: uris?
True, good point. I am looking for something a _bit_ more shareable between other software and institutions than tag. info: still seems a nice compromise to me. Houghton,Andrew wrote: From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Friday, March 27, 2009 4:42 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? I am looking for the easiest possible way to get a legal URI representing a sudoc. My understanding, after looking at this stuff previously, is that info: is a LOT lower barrier than urn:, and that's part of it's purpose. Jonathan you could use TAG URI's, RFC 4151, if you are looking for something quick and dirty. No need to register with any authority since you are using your own DNS name. http://tools.ietf.org/html/rfc4151 Andy.
Re: [CODE4LIB] registering info: uris?
Aha, cool! Yeah, I could use tag for this, but it wouldn't seem appropriate for something I want to encourage others to use compatibly as well, info seems better. Houghton,Andrew wrote: From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Friday, March 27, 2009 4:52 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Also, the date aspect of a tag-uri seems to make it hard to use to mint an identifier that will always represent the same SuDoc, regardless of when it was minted. No the date part is a versioning scheme, not the date you created the tag URI. It's used, for example, where I created a specific tag scheme one day and then decided to create another tag scheme some other day: tag:example.org,1999:date/yy-mm-dd where yy-mm-dd is the year, month and day values. Then I realize that it's Y2K so I create a new tag scheme: tag:example.org,2000:date/-mm-dd Andy.
Re: [CODE4LIB] registering info: uris?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Friday, March 27, 2009 5:00 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Aha, cool! Yeah, I could use tag for this, but it wouldn't seem appropriate for something I want to encourage others to use compatibly as well, info seems better. Not to push tag URIs on you, just providing some information, but if you are working with other organizations, you could just go to GoDaddy and get a domain name for your project, then use an email address instead of ND.EDU: tag:project-n...@my-tags.org,2009:id/sudoc-value Andy.
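The tag URI shape discussed in these messages, `tag:authority,date:specific` from RFC 4151, can be sketched as follows. This is a minimal illustration, not a validator: the function name and the `sudoc/` path segment are hypothetical, `example.org` stands in for a domain the minting party actually controls, and the authority (a DNS name or an email address) is not checked at all. RFC 4151 describes the date as one on which the minting entity controlled the authority name, written as a year, year-month or year-month-day.

```python
import re
from urllib.parse import quote

# RFC 4151 date forms: YYYY, YYYY-MM or YYYY-MM-DD.
_TAG_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def make_tag_uri(authority: str, date: str, specific: str) -> str:
    """Build a tag: URI from an authority, a date and a specific part."""
    if not _TAG_DATE.fullmatch(date):
        raise ValueError("date must look like YYYY, YYYY-MM or YYYY-MM-DD")
    # Spaces and other unsafe characters in the specific part are
    # percent-encoded; "/" and ":" are left as they are.
    return "tag:{},{}:{}".format(authority, date, quote(specific, safe="/:"))


if __name__ == "__main__":
    # Hypothetical example: a SuDoc carried in the specific part.
    print(make_tag_uri("example.org", "2009", "sudoc/Y 3.C 76/3:2 K 54"))
    # prints tag:example.org,2009:sudoc/Y%203.C%2076/3:2%20K%2054
```

Because the authority and date are fixed once chosen, the same SuDoc always yields the same tag URI no matter when the function is called.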
Re: [CODE4LIB] registering info: uris?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Friday, March 27, 2009 5:28 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Another good idea, true. There are indeed lots of ways to do this. But wait, you don't need a unique hostname for a tag uri, a unique uri (hostname+path) will do? purl.org will only give me the latter, not the former, right? Tag URIs require that the authorizing agency own the domain name and they cannot specify a date that is before their domain registration or in the future. So nobody could mint Tag URIs with purl.org as the domain name. PURLs might be an interesting solution for you if GPO has a system where you can resolve SuDoc identifiers. Then you could create a PURL and point it to their system. Now you get to use your PURL for your project and as a side benefit get lookup capabilities from GPO! Otherwise you could just send them to a relevant page on the GPO site. Andy.
Re: [CODE4LIB] registering info: uris?
Correct me if I'm wrong but isn't the point of all this to be able to put the URI in an OpenURL? And info was invented (in part) to avoid putting http URIs in OpenURLs (because they are complicated enough already, why clutter them further). So I don't see that pursuing an http solution to this is very useful. --Ray - Original Message - From: Houghton,Andrew hough...@oclc.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Friday, March 27, 2009 5:24 PM Subject: Re: [CODE4LIB] registering info: uris? From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Friday, March 27, 2009 5:18 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? I am not interested in maintaining a sudoc.info registration, and neither is my institution, who I wouldn't trust to maintain it (even to the extent of not letting the DNS registration expire) after I left. BTW, you could always use http://purl.org/ and later if you wanted to have it resolve to something just change the PURL.
Re: [CODE4LIB] registering info: uris?
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ray Denenberg, Library of Congress Sent: Friday, March 27, 2009 5:38 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? Correct me if I'm wrong but isn't the point of all this to be able to put the URI in an OpenURL? And info was invented (in part) to avoid putting http URIs in OpenURLs (because they are complicated enough already, why clutter them further). So I don't see that pursuing an http solution to this is very useful. --Ray Ray, I don't quite understand the to avoid putting http URIs in OpenURLs part. An info URI as well as an HTTP URI use the same encoding rules from RFC 3986, URI Generic Syntax. So neither has an advantage over the other. If you have a %80%CC in your info URI or HTTP URI then sticking it in an OpenURL will require it to become %2580%25CC. So what am I missing about your statement? Andy.
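Andy's point, that an identifier URI gets a second round of percent-encoding when it is carried as a value inside an OpenURL regardless of whether it is an info: or an http: URI, can be checked with a short Python sketch. The resolver address and the function names are hypothetical; `url_ver` and `rft_id` are the usual OpenURL 1.0 key/encoded-value keys, and the identifier values are taken from elsewhere in this thread.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical resolver base URL, used only for illustration.
RESOLVER = "http://resolver.example.org/openurl"


def as_openurl_value(uri: str) -> str:
    """Percent-encode a whole URI so it can sit in a query-string value."""
    # safe="" encodes every reserved character, including "%", ":" and "/".
    return quote(uri, safe="")


def build_openurl(identifier_uri: str) -> str:
    """Carry an identifier URI in rft_id of a key/encoded-value OpenURL."""
    return RESOLVER + "?url_ver=Z39.88-2004&rft_id=" + as_openurl_value(identifier_uri)


def read_rft_id(openurl: str) -> str:
    """Decode the query string once to get the identifier URI back."""
    return parse_qs(urlsplit(openurl).query)["rft_id"][0]


if __name__ == "__main__":
    print(as_openurl_value("%80%CC"))                       # %2580%25CC
    print(as_openurl_value("info:lccn/2002022641"))         # info%3Alccn%2F2002022641
    print(as_openurl_value("http://lccn.info/2002022641"))  # http%3A%2F%2Flccn.info%2F2002022641
```

Both schemes pick up the same kind of extra encoding, which is the symmetry Andy describes; the http form is only longer because of its `//` and host name.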
Re: [CODE4LIB] registering info: uris?
At Fri, 27 Mar 2009 17:18:24 -0400, Jonathan Rochkind wrote: I am not interested in maintaining a sudoc.info registration, and neither is my institution, who I wouldn't trust to maintain it (even to the extent of not letting the DNS registration expire) after I left. I think even something as simple as this really needs to be committed to by an organization. So yeah, even willing to take on the responsibility of owning that domain until such time as something useful can be done with it, I do not have, and to me that seems like a requirement, not just a nice to have. I see your point. I believe that registering a domain would be less work than going through an info URI registration process, but I don’t know how difficult the info URI registration process would be (thus bringing the conversation full circle). [1] But it certainly is another option. I feel like most people have the _expectation_ of http resolvability for http URIs though, even though it isn't actually required. If you want there to be an actual http server there at ALL, even one that just responds to all requests with a link to the SuDoc documentation, that's another thing you need. I think there is a strong expectation that if I resolve a URI, I do not end up with a domain squatter. Otherwise I am not so sure what is expected when using an HTTP URI whose primary purpose is identification, not dereferencing. Personally I would be happy to get either a page telling me to check back later [2], or nothing at all. best, Erik Hetzner 1. My last word on this. Because I am already beating a dead horse, I have put it in a footnote. For $100 and basically no time at all you can have 10 years of sudoc.info. If it takes an organization more than 2 or 3 hours of work to register an info: URI, then domain registration is a better deal, as I see it. 2. http://lccn.info/2002022641 ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] registering info: uris?
I've got nothing against putting http uris in OpenURLs myself. I don't understand the objection to that, really.