Re: [CODE4LIB] Protocol-relative URLs in MARC
Thank you, Christian Pietsch and Kevin Ford, for saving me the trouble, and for doing so with more correctness than I would have mustered.

IMHO, where an https URL is available, adding an insecure link as an alternative is 100% disadvantageous to users.

Eric Hellman
President, Free Ebook Foundation
Founder, Unglue.it https://unglue.it/
http://go-to-hellman.blogspot.com/
twitter: @gluejar

On Aug 18, 2015, at 5:50 AM, Christian Pietsch chr.pietsch+web4...@googlemail.com wrote:

On Tue, Aug 18, 2015 at 09:29:17PM +1200, Stuart A. Yeates wrote:

While these may appear to be OAI-PMH providers, they're non-conformant: http://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolFeatures OAI-PMH requests *must* be submitted using either the HTTP GET or POST methods.

Everything that holds for HTTP also holds for HTTPS because HTTPS is simply HTTP over TLS, as the HTTPS standard is aptly titled: https://tools.ietf.org/html/rfc2818

A discussion on the OAI implementers mailing list seemed to converge on the position to accept HTTPS wherever possible but not to require it. That was in 2005, when the IETF had not started to consider declaring HTTP without TLS obsolete altogether. https://www.openarchives.org/pipermail/oai-implementers/2005-February/001419.html

Maybe because forcing people to upgrade their tech leaves behind those with the least resources. Maybe because switching to a protocol whose minimum message cost (in CPU cycles) is many thousands of times higher is a dubious cost/benefit trade-off in some situations.
The burden of TLS encryption on CPUs is negligible these days: https://www.imperialviolet.org/2010/06/25/overclocking-ssl.html

C:

--
Christian Pietsch · http://purl.org/net/pietsch
LibTec (Library Technology and Knowledge Management) department of Bielefeld University Library, Bielefeld, Germany

On Aug 18, 2015, at 11:21 AM, Kevin Ford k...@3windmills.com wrote:

I think it is technically permissible, but unwise for a host of reasons, a number of which have been noted in this thread.

It boils down to this: at the end of the day - and putting aside the whole SSL/non-SSL tangent - it is a relative reference according to the RFC, and that begs the question: relative to what? Is it relative to your specific system? Relative to the value found in the $2? And how is this crucial component - the base URI/scheme with which to make the reference absolute - captured?

And that’s the crux of the issue. You are looking to address the binary choice between http/https, but those are only two possible schemes out of many. Other valid schemes could be: ftp, sftp, ldap, rtmp, rsync, udp, file, etc. And without any way of knowing which scheme is valid, if you dropped the scheme from the URI and those records made it into the wild, the utility of those $u subfields would be substantially diminished, at a minimum.

Finally, I also suspect that it is uncommon (at best) to find relative references in $u (for the reasons above). The RFC recognizes as much, noting that relative references beginning with two slash characters "are rarely used."

Why not just repeat the $u? This is one of the reasons it is repeatable.

Rgds,
Kevin
Re: [CODE4LIB] Protocol-relative URLs in MARC
Thank you, Andrew, for answering the question. What Stuart wrote, however, is misleading:

On Tue, Aug 18, 2015 at 02:59:37PM +1200, Stuart A. Yeates wrote:

On Tue, Aug 18, 2015 at 10:08 AM, Andrew Anderson and...@lirn.net wrote:

That said, there is a big push recently for dropping non-SSL connections in general (going so far as to call the protocol-relative URIs an anti-pattern), so is it really worth all the potential pain and suffering to make your links scheme-agnostic, when maybe it would be a better investment in time to switch them all to SSL instead? This dovetails nicely with some of the discussions I have had recently with electronic services librarians about how to protect patron privacy in an online world by using SSL as an arrow in that quiver.

Dropping non-SSL connections is almost certainly a mistake for two classes of reasons: (i) a number of very widely used tools and standards (OAI-PMH, web caching, monitoring, etc.) are HTTP-only

Let me give you a counterexample: Of 4810 OAI-PMH providers currently known to BASE https://base-search.net, 147 use an HTTPS base URL. Of the 3632 OAI-PMH sources BASE actively harvests at this time, 107 use HTTPS.

(ii) assumptions about the proportion of our users who have access to a certain level of tech (i.e. HTTP vs HTTPS) systematically disadvantage already disadvantaged groups of users, perpetuating the kind of social ills that libraries are traditionally held to be the cure for.

I fail to see how continuing to use insecure, obsolete software is serving social justice. Excellent cryptographic software is available freely and openly.

Cheers,
Chris

--
Christian Pietsch · http://purl.org/net/pietsch
LibTec (Library Technology and Knowledge Management) department of Bielefeld University Library, Bielefeld, Germany
Re: [CODE4LIB] Protocol-relative URLs in MARC
I think it is technically permissible, but unwise for a host of reasons, a number of which have been noted in this thread.

It boils down to this: at the end of the day - and putting aside the whole SSL/non-SSL tangent - it is a relative reference according to the RFC, and that begs the question: relative to what? Is it relative to your specific system? Relative to the value found in the $2? And how is this crucial component - the base URI/scheme with which to make the reference absolute - captured?

And that’s the crux of the issue. You are looking to address the binary choice between http/https, but those are only two possible schemes out of many. Other valid schemes could be: ftp, sftp, ldap, rtmp, rsync, udp, file, etc. And without any way of knowing which scheme is valid, if you dropped the scheme from the URI and those records made it into the wild, the utility of those $u subfields would be substantially diminished, at a minimum.

Finally, I also suspect that it is uncommon (at best) to find relative references in $u (for the reasons above). The RFC recognizes as much, noting that relative references beginning with two slash characters "are rarely used."

Why not just repeat the $u? This is one of the reasons it is repeatable.

Rgds,
Kevin

On 8/17/15 5:44 PM, Cary Gordon wrote:

I think that this is a great idea, if you control all of the URLs in your systems. Otherwise, unless all of the major browsers drop http (unlikely), it easily has another ten years in it.

Chrome dropped support for SHA-1 a few months ago, and I am sure that it will be another 33 months before all of the old certs are fixed. In other words, the pre-drop certs will all have expired by then and all new ones are SHA-2.
Cary

On Aug 17, 2015, at 3:08 PM, Andrew Anderson and...@lirn.net wrote:

That said, there is a big push recently for dropping non-SSL connections in general (going so far as to call the protocol-relative URIs an anti-pattern), so is it really worth all the potential pain and suffering to make your links scheme-agnostic, when maybe it would be a better investment in time to switch them all to SSL instead? This dovetails nicely with some of the discussions I have had recently with electronic services librarians about how to protect patron privacy in an online world by using SSL as an arrow in that quiver.
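Kevin's "relative to what?" question can be made concrete with a short sketch. It uses Python's standard `urllib.parse.urljoin` to resolve the protocol-relative reference against several base URIs. The base URIs below are hypothetical and chosen only for illustration; a MARC record does not carry a base URI of its own, which is exactly the gap Kevin describes.

```python
from urllib.parse import urljoin

# The protocol-relative reference proposed for 856 $u in this thread.
REFERENCE = "//example.org/"

# Hypothetical base URIs (assumptions for illustration only).
BASES = [
    "http://catalog.example.edu/record/1",
    "https://catalog.example.edu/record/1",
    "ftp://mirror.example.edu/pub/",
]


def resolve(base: str, ref: str) -> str:
    """Resolve a URI reference against a base URI."""
    return urljoin(base, ref)


if __name__ == "__main__":
    for base in BASES:
        # The scheme of the result is inherited from whichever base is used.
        print(f"{base} -> {resolve(base, REFERENCE)}")
    # With no base at all, the reference stays relative.
    print(f"(no base) -> {resolve('', REFERENCE)}")
```

The same $u value ends up as an http, https, or ftp link depending on the consuming system, and stays unresolved when no base is available.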
Re: [CODE4LIB] Protocol-relative URLs in MARC
I once did a lot of serials cataloging, and including 'http' or 'https' was the standard we always followed. To double-check, I ran a quick search on a big pile of MARC records I had handy and found that, of ~120K records with an 856 with first indicator 4, only a very small percentage (about 0.2%) don't include the 'http' in a subfield u. About two-thirds of those look to be cataloging blunders of one sort or another rather than deliberate decisions. The records that aren't pure blunders all follow the pattern 'example.org', leaving off the leading and trailing slashes.

If the http/https is variable or indeterminate for some reason other than the client's browser, then you could just record it like that and add an explanatory note in a subfield z.

Best,
Nate Florin
Center for Research Libraries

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stuart A. Yeates
Sent: Monday, August 17, 2015 3:41 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Protocol-relative URLs in MARC

I'm in the middle of some work which includes touching the 856s in lots of MARC records pointing to websites we control. The websites are available on both https://example.org/ and http://example.org/

Can I put //example.org/ in the MARC or is this contrary to the standard?

Note that there is a separate question about whether various software systems support this, but that's entirely secondary to the question of the standard.

cheers
stuart

--
...let us be heard from red core to black sky
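A survey like the one Nate describes could be sketched roughly as follows. This is not the search Nate actually ran; it is a minimal illustration, assuming the $u values have already been extracted into a list of strings. The sample values and the function names `classify_u` and `tally` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def classify_u(value: str) -> str:
    """Label an 856 $u value by how much of an absolute URI it carries."""
    parts = urlsplit(value.strip())
    if parts.scheme:
        return "absolute"            # e.g. http://example.org/
    if parts.netloc:
        return "protocol-relative"   # e.g. //example.org/
    return "no-scheme"               # e.g. example.org (parsed as a bare path)


# Hypothetical sample values, not taken from any real catalogue.
SAMPLE_U_VALUES = [
    "http://example.org/",
    "https://example.org/journal",
    "//example.org/",
    "example.org",
]


def tally(values):
    """Count how many $u values fall into each category."""
    return Counter(classify_u(v) for v in values)


if __name__ == "__main__":
    counts = tally(SAMPLE_U_VALUES)
    total = sum(counts.values())
    for label, n in counts.most_common():
        print(f"{label}: {n} ({100 * n / total:.1f}%)")
```

Note that a bare 'example.org' is not read as a host name at all: without the leading slashes a URI parser treats it as a relative path, which is one reason such values are hard for downstream systems to use.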
Re: [CODE4LIB] Protocol-relative URLs in MARC
I think that this is a great idea, if you control all of the URLs in your systems. Otherwise, unless all of the major browsers drop http (unlikely), it easily has another ten years in it.

Chrome dropped support for SHA-1 a few months ago, and I am sure that it will be another 33 months before all of the old certs are fixed. In other words, the pre-drop certs will all have expired by then and all new ones are SHA-2.

Cary

On Aug 17, 2015, at 3:08 PM, Andrew Anderson and...@lirn.net wrote:

That said, there is a big push recently for dropping non-SSL connections in general (going so far as to call the protocol-relative URIs an anti-pattern), so is it really worth all the potential pain and suffering to make your links scheme-agnostic, when maybe it would be a better investment in time to switch them all to SSL instead? This dovetails nicely with some of the discussions I have had recently with electronic services librarians about how to protect patron privacy in an online world by using SSL as an arrow in that quiver.
Re: [CODE4LIB] Protocol-relative URLs in MARC
In theory the 1st indicator dictates the protocol used, and 4 = HTTP. However, in all examples on http://www.loc.gov/marc/bibliographic/bd856.html, despite the indicator being used, the protocol part of the URI is then repeated in the $u subfield.

You can put ‘7’ in the 1st indicator, then use subfield $2 to define other methods.

Since only ‘http’ is one of the preset protocols, not https, I guess in theory this means you should use something like

856 70 $uhttps://example.com$2https

I’d be pretty surprised if in practice people don’t just do:

856 40 $uhttps://example.com

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 17 Aug 2015, at 21:41, Stuart A. Yeates syea...@gmail.com wrote:

I'm in the middle of some work which includes touching the 856s in lots of MARC records pointing to websites we control. The websites are available on both https://example.org/ and http://example.org/

Can I put //example.org/ in the MARC or is this contrary to the standard?

Note that there is a separate question about whether various software systems support this, but that's entirely secondary to the question of the standard.

cheers
stuart

--
...let us be heard from red core to black sky
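The two readings Owen contrasts can be sketched as a small helper that picks the first indicator from the URL's scheme. This is an illustration only: the function name `format_856`, the `strict` flag, and the textual output format (copied from Owen's examples, with the second indicator fixed at 0) are assumptions, and the lookup table covers just the FTP and HTTP presets.

```python
from urllib.parse import urlsplit

# 856 first-indicator values preset for specific access methods
# (1 = FTP, 4 = HTTP). Anything else falls back to 7, meaning
# "method specified in subfield $2".
PRESET_FIRST_INDICATOR = {"ftp": "1", "http": "4"}


def format_856(url: str, strict: bool = True) -> str:
    """Render an 856 field as text, choosing the first indicator by scheme.

    strict=True follows the "in theory" reading (https is not a preset,
    so use 7 plus $2); strict=False files https under the HTTP indicator.
    """
    scheme = urlsplit(url).scheme
    if not scheme:
        raise ValueError(f"no scheme in {url!r}; cannot choose a first indicator")
    lookup = "http" if (scheme == "https" and not strict) else scheme
    indicator = PRESET_FIRST_INDICATOR.get(lookup)
    if indicator is not None:
        return f"856 {indicator}0 $u{url}"
    return f"856 70 $u{url}$2{scheme}"


if __name__ == "__main__":
    print(format_856("https://example.com"))                # strict reading
    print(format_856("https://example.com", strict=False))  # common practice
```

A protocol-relative value such as //example.org/ raises an error here, because without a scheme there is nothing to base the indicator on.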
Re: [CODE4LIB] Protocol-relative URLs in MARC
On Tue, Aug 18, 2015 at 10:08 AM, Andrew Anderson and...@lirn.net wrote:

That said, there is a big push recently for dropping non-SSL connections in general (going so far as to call the protocol-relative URIs an anti-pattern), so is it really worth all the potential pain and suffering to make your links scheme-agnostic, when maybe it would be a better investment in time to switch them all to SSL instead? This dovetails nicely with some of the discussions I have had recently with electronic services librarians about how to protect patron privacy in an online world by using SSL as an arrow in that quiver.

Dropping non-SSL connections is almost certainly a mistake for two classes of reasons:

(i) a number of very widely used tools and standards (OAI-PMH, web caching, monitoring, etc.) are HTTP-only

(ii) assumptions about the proportion of our users who have access to a certain level of tech (i.e. HTTP vs HTTPS) systematically disadvantage already disadvantaged groups of users, perpetuating the kind of social ills that libraries are traditionally held to be the cure for.

cheers
stuart
Re: [CODE4LIB] Protocol-relative URLs in MARC
There are multiple questions embedded in this:

1) What does the MARC standard have to say about 856$u?

$u - Uniform Resource Identifier

Uniform Resource Identifier (URI), which provides standard syntax for locating an object using existing Internet protocols. Field 856 is structured to allow for the creation of a URL from the concatenation of other separate 856 subfields. Subfield $u may be used instead of those separate subfields or in addition to them. Subfield $u may be repeated only if both a URN and a URL or more than one URN are recorded.

Used for automated access to an electronic item using one of the Internet protocols or by resolution of a URN. Subfield $u may be repeated only if both a URN and a URL or more than one URN are recorded. Field 856 is repeated if more than one URL needs to be recorded.

Here, it is established that $u uses a URI, which leads to...

2) What do the RFCs say about protocol-relative URIs?

http://tools.ietf.org/html/rfc3986#section-4.1

URI-reference is used to denote the most common usage of a resource identifier.

URI-reference = URI / relative-ref

A URI-reference is either a URI or a relative reference. If the URI-reference's prefix does not match the syntax of a scheme followed by its colon separator, then the URI-reference is a relative reference.

So by the stated use of URIs in the MARC standard, and the RFC definition of the URI relative reference, there appears to be no standards basis for treating protocol-relative URLs as invalid in 856.

Expanding out to the software support, most tools that I have used for general URL manipulation have no problems with this format, but I have only used PyMARC for manipulating MARC records, not any of the other MARC editors. If they try to be too clever about data validation and not quite clever enough about standards and patterns, there could be issues at this level.
As for browser support, IE7 and IE8 have issues with double-loading some resources when used in this manner, but those browsers are becoming nearly extinct, so I would not anticipate client-side issues as long as the intermediate system that consumes the 856 field and renders it for display can handle this. Our web properties switched to using this pattern several years ago to avoid the “insecure content” warnings and we have had no issues on the client side.

Then the other consumers of MARC data come into play: title lists, link resolvers, proxy servers, etc. A lot of what I’ve seen in this space are lipstick-wearing dinosaurs of a code base, so unless the vendor is particularly good about keeping up with current web patterns, this is where I would expect the most challenges. There may be implicit or explicit assumptions built into systems that would break with protocol-relative URLs, e.g. if the value is passed directly to a proxy server, it may not know what to do without a scheme prefixed to the URI, and attempt to serve local content instead.

That said, there is a big push recently for dropping non-SSL connections in general (going so far as to call the protocol-relative URIs an anti-pattern), so is it really worth all the potential pain and suffering to make your links scheme-agnostic, when maybe it would be a better investment in time to switch them all to SSL instead? This dovetails nicely with some of the discussions I have had recently with electronic services librarians about how to protect patron privacy in an online world by using SSL as an arrow in that quiver.

Andrew

--
Andrew Anderson, President & CEO, Library and Information Resources Network, Inc.
http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | http://www.facebook.com/LIRNnotes

On Aug 17, 2015, at 16:41, Stuart A. Yeates syea...@gmail.com wrote:

I'm in the middle of some work which includes touching the 856s in lots of MARC records pointing to websites we control.
The websites are available on both https://example.org/ and http://example.org/ Can I put //example.org/ in the MARC or is this contrary to the standard? Note that there is a separate question about whether various software systems support this, but that's entirely secondary to the question of the standard. cheers stuart -- ...let us be heard from red core to black sky
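Andrew's reading of RFC 3986 can be checked mechanically. The sketch below uses the component-splitting regular expression given in Appendix B of the RFC; the function names are hypothetical. The expression is deliberately lenient and only separates components, so it does not validate scheme syntax or anything else about the value.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(value: str) -> dict:
    """Split a URI-reference into its five RFC 3986 components."""
    match = URI_REFERENCE.match(value)  # every group is optional, so this always matches
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


def is_relative_reference(value: str) -> bool:
    """True when the value has no 'scheme:' prefix, i.e. it is a relative-ref."""
    return split_reference(value)["scheme"] is None


if __name__ == "__main__":
    for candidate in ("https://example.org/", "//example.org/", "example.org"):
        print(candidate, is_relative_reference(candidate), split_reference(candidate))
```

Under this split, //example.org/ has an authority but no scheme, so it is a relative reference in the RFC's terms: syntactically a valid URI-reference, but not an absolute URI.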
Re: [CODE4LIB] Protocol-relative URLs in MARC
Information in subfield u should be complete, but even if that weren't the case, it's important to consider how systems handle the information they're given. MARC is just a container, and just because the information is syntactically kosher does not mean it will be processed how you like.

In the case at hand, you can do anything you like if the information is just used locally and your systems behave the way you need. As Andrew mentions, you'll run into trouble if this information gets imported into other systems.

kyle

On Mon, Aug 17, 2015 at 1:41 PM, Stuart A. Yeates syea...@gmail.com wrote:

I'm in the middle of some work which includes touching the 856s in lots of MARC records pointing to websites we control. The websites are available on both https://example.org/ and http://example.org/

Can I put //example.org/ in the MARC or is this contrary to the standard?

Note that there is a separate question about whether various software systems support this, but that's entirely secondary to the question of the standard.

cheers
stuart

--
...let us be heard from red core to black sky