Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-19 Thread Eric Hellman
Thank you Christian Pietsch and Kevin Ford for saving me the trouble, and for 
doing so with more correctness than I would have mustered.

IMHO, where an https URL is available, adding an insecure link as an alternative 
is 100% disadvantageous to users.

Eric Hellman
President, Free Ebook Foundation
Founder, Unglue.it https://unglue.it/
http://go-to-hellman.blogspot.com/
twitter: @gluejar

 On Aug 18, 2015, at 5:50 AM, Christian Pietsch 
 chr.pietsch+web4...@googlemail.com wrote:
 
 On Tue, Aug 18, 2015 at 09:29:17PM +1200, Stuart A. Yeates wrote:
 While these may appear to be OAI-PMH providers, they're non-conformant:
 
 http://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolFeatures
 
 OAI-PMH requests *must* be submitted using either the HTTP GET or POST
 methods.
 
 Everything that holds for HTTP also holds for HTTPS because HTTPS is
 simply HTTP over TLS, as the HTTPS standard is aptly titled:
 https://tools.ietf.org/html/rfc2818
 
 A discussion on the OAI implementers mailing list seemed to converge
 on the position to accept HTTPS wherever possible but not to require
 it. That was in 2005 when the IETF had not started to consider
 declaring HTTP without TLS obsolete altogether.
 https://www.openarchives.org/pipermail/oai-implementers/2005-February/001419.html
 
 Maybe because forcing people to upgrade their tech leaves behind those with
 the least resources. Maybe because switching to a protocol whose minimum
 message cost (in cpu cycles) is many thousands of times higher is a dubious
 cost/benefit trade-off in some situations.
 
 The burden of TLS encryption on CPUs is negligible these days:
 https://www.imperialviolet.org/2010/06/25/overclocking-ssl.html
 
 C:
 
 -- 
  Christian Pietsch · http://purl.org/net/pietsch
  LibTec (Library Technology and Knowledge Management) department
  of Bielefeld University Library, Bielefeld, Germany



Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-18 Thread Christian Pietsch
Thank you, Andrew, for answering the question. What Stuart wrote,
however, is misleading:

On Tue, Aug 18, 2015 at 02:59:37PM +1200, Stuart A. Yeates wrote:
 On Tue, Aug 18, 2015 at 10:08 AM, Andrew Anderson and...@lirn.net wrote:
 
  That said, there is a big push recently for dropping non-SSL connections
  in general (going so far as to call the protocol relative URIs an
  anti-pattern), so is it really worth all the potential pain and suffering
  to make your links scheme-agnostic, when maybe it would be a better
  investment in time to switch them all to SSL instead?  This dovetails
  nicely with some of the discussions I have had recently with electronic
  services librarians about how to protect patron privacy in an online world
  by using SSL as an arrow in that quiver.
 
 
 Dropping non-SSL connections is almost certainly a mistake for two classes
 of reasons:
 (i) a number of very widely used tools and standards (OAI-PMH, web
 caching, monitoring, etc.) are HTTP-only

Let me give you a counterexample: Of 4810 OAI-PMH providers currently
known to BASE https://base-search.net, 147 use an HTTPS base URL. Of
the 3632 OAI-PMH sources BASE actively harvests at this time, 107 use
HTTPS.

 (ii) assumptions about the proportion of our users who have access
 to a certain level of tech (i.e. HTTP vs HTTPS) systematically disadvantage
 already disadvantaged groups of users, perpetuating the kind of
 social ills that libraries are traditionally held to be the cure of.

I fail to see how continuing to use insecure, obsolete software is
serving social justice. Excellent cryptographic software is available
freely and openly.

Cheers,
Chris

-- 
  Christian Pietsch · http://purl.org/net/pietsch
  LibTec (Library Technology and Knowledge Management) department
  of Bielefeld University Library, Bielefeld, Germany




Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-18 Thread Kevin Ford
I think it is technically permissible, but unwise for a host of reasons, 
a number of which have been noted in this thread.


It boils down to this:  at the end of the day - and putting aside the 
whole SSL/non-SSL tangent - it is a relative reference according to 
the RFC and that begs the question, Relative to what?  Is it relative 
to your specific system?  Relative to the value found in the $2?  And 
how is this crucial component - the base-uri/scheme with which to make 
the reference absolute - captured?
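
The "relative to what?" problem can be sketched with Python's standard urllib.parse. This is only an illustration: the base URIs below are made up, and nothing in MARC says which base a consuming system would actually use. The same $u value resolves to a different absolute URI for each base.

```python
from urllib.parse import urljoin

# The value under discussion: a network-path ("protocol-relative") reference.
REFERENCE = "//example.org/"

# Hypothetical base URIs; the scheme of each result comes entirely from these.
BASES = [
    "https://catalog.example.edu/record/1",
    "http://catalog.example.edu/record/1",
    "ftp://mirror.example.edu/marc/",
]


def resolve_all(reference, bases):
    """Resolve one relative reference against each base URI in turn."""
    return {base: urljoin(base, reference) for base in bases}


if __name__ == "__main__":
    for base, absolute in resolve_all(REFERENCE, BASES).items():
        print(f"{base} -> {absolute}")
```

A record that travels between systems carries the reference but not the base, which is why the resolved scheme cannot be predicted from the record alone.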


And that’s the crux of the issue.  You are looking to address the binary 
choice between http/https, but those are only two possible schemes out 
of many.  Other valid schemes could be:  ftp, sftp, ldap, rtmp, rsync, 
udp, file, etc.


And, without any way of knowing which scheme is valid, if you dropped the 
'scheme' from the URI and those records made it into the wild, the 
utility of those $u subfields would be substantially diminished, at the very least.


Finally, I also suspect that it is uncommon (at best) to find relative 
references in $u (for the reasons above). The RFC recognizes as much, 
noting that relative references that begin with two slash characters "are 
rarely used."


Why not just repeat the $u?  This is one of the reasons it is repeatable.

Rgds,
Kevin




Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-17 Thread Nathaniel Florin
I once did a lot of serials cataloging and including 'http' or 'https' was the 
standard we always followed. To double check I ran a quick search on a big pile 
of MARC records I had handy and found that, of ~120K records with an 856 with 
first indicator 4, only a very small percentage (about 0.2%) don't include the 
'http' in a subfield u. And about two-thirds of those look to be cataloging 
blunders of one sort or another rather than deliberate decisions.
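
A tally like the one above could be sketched as follows. This is not the script that was actually run; it assumes the 856 fields have already been extracted into hypothetical (first indicator, list of $u values) pairs, and the sample data is invented.

```python
def tally_missing_http(fields):
    """Count 856 fields with first indicator 4 and collect those whose $u lacks 'http'.

    `fields` is an iterable of (first_indicator, u_values) pairs, where
    u_values is the list of subfield $u strings in that field.
    """
    total = 0
    missing = []
    for indicator1, u_values in fields:
        if indicator1 != "4":
            continue  # only fields that claim HTTP as the access method
        total += 1
        # "http" as a prefix also covers "https"
        if not any(u.lower().startswith("http") for u in u_values):
            missing.append(u_values)
    return total, missing


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        ("4", ["http://example.org/"]),
        ("4", ["example.org"]),
        ("4", ["https://example.org/a"]),
        ("1", ["ftp://example.org/"]),
    ]
    total, missing = tally_missing_http(sample)
    print(f"{len(missing)} of {total} fields lack an http(s) prefix in $u")
```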

The records that aren't pure blunders all follow the pattern 'example.org', 
leaving off both the leading slashes and the trailing slash. If the http/https 
is variable or indeterminate for some reason other than the client's browser 
then you could just record it like that and add an explanatory note in a subfield z.

Best,

Nate Florin
Center for Research Libraries


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stuart 
A. Yeates
Sent: Monday, August 17, 2015 3:41 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Protocol-relative URLs in MARC

I'm in the middle of some work which includes touching the 856s in lots of MARC 
records pointing to websites we control. The websites are available on both 
https://example.org/ and http://example.org/

Can I put //example.org/ in the MARC or is this contrary to the standard?

Note that there is a separate question about whether various software systems 
support this, but that's entirely secondary to the question of the standard.

cheers
stuart
--
...let us be heard from red core to black sky


Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-17 Thread Cary Gordon
I think that this is a great idea, if you control all of the URLs in your 
systems. Otherwise unless all of the major browsers drop http — unlikely — it 
easily has another ten years in it.

Chrome dropped support for SHA-1 a few months ago, and I am sure that it will 
be another 33 months before all of the old certs are fixed. In other words, the 
pre-drop certs will all have expired by then and all new ones are SHA-2.

Cary


 On Aug 17, 2015, at 3:08 PM, Andrew Anderson and...@lirn.net wrote:
 
 That said, there is a big push recently for dropping non-SSL connections in 
 general (going so far as to call the protocol relative URIs an anti-pattern), 
 so is it really worth all the potential pain and suffering to make your links 
 scheme-agnostic, when maybe it would be a better investment in time to switch 
 them all to SSL instead?  This dovetails nicely with some of the discussions 
 I have had recently with electronic services librarians about how to protect 
 patron privacy in an online world by using SSL as an arrow in that quiver.


Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-17 Thread Owen Stephens
In theory the 1st indicator dictates the protocol used, and 4 = HTTP. However, in 
all examples on http://www.loc.gov/marc/bibliographic/bd856.html, despite the 
indicator being used, the protocol part of the URI is then repeated in 
subfield $u.

You can put ‘7’ in the 1st indicator, then use subfield $2 to define other 
methods.

Since only ‘http’ is one of the preset protocols, not https, I guess in theory 
this means you should use something like

856 70 $uhttps://example.com$2https

I’d be pretty surprised if in practice people don’t just do:

856 40 $uhttps://example.com
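
The two readings above can be sketched as a small formatter. The function name and the `strict` flag are made up for illustration, the output is just the display form used in this message rather than real MARC serialization, and only the 'http' preset is modelled (the other preset first-indicator values are left out).

```python
from urllib.parse import urlsplit


def format_856(url, strict=True):
    """Render an 856 line for `url` under the strict or the pragmatic reading.

    strict=True:  only plain http gets first indicator 4; any other scheme
                  gets first indicator 7 plus the scheme in subfield $2.
    strict=False: https is also recorded under first indicator 4.
    """
    scheme = urlsplit(url).scheme.lower()
    if not scheme:
        # A protocol-relative value such as //example.org/ has no scheme to record.
        raise ValueError(f"no scheme in {url!r}")
    if scheme == "http" or (scheme == "https" and not strict):
        return f"856 40 $u{url}"
    return f"856 70 $u{url}$2{scheme}"


if __name__ == "__main__":
    print(format_856("https://example.com"))                # strict reading
    print(format_856("https://example.com", strict=False))  # common practice
```

Note that a scheme-less value gives the strict reading nothing to put in $2, which is one more way the protocol-relative form sits awkwardly in this field.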

Owen


Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936



Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-17 Thread Stuart A. Yeates
On Tue, Aug 18, 2015 at 10:08 AM, Andrew Anderson and...@lirn.net wrote:

 That said, there is a big push recently for dropping non-SSL connections
 in general (going so far as to call the protocol relative URIs an
 anti-pattern), so is it really worth all the potential pain and suffering
 to make your links scheme-agnostic, when maybe it would be a better
 investment in time to switch them all to SSL instead?  This dovetails
 nicely with some of the discussions I have had recently with electronic
 services librarians about how to protect patron privacy in an online world
 by using SSL as an arrow in that quiver.


Dropping non-SSL connections is almost certainly a mistake for two classes
of reasons:
(i) a number of very widely used tools and standards (OAI-PMH, web
caching, monitoring, etc.) are HTTP-only
(ii) assumptions about the proportion of our users who have access to a
certain level of tech (i.e. HTTP vs HTTPS) systematically disadvantage
already disadvantaged groups of users, perpetuating the kind of social ills
that libraries are traditionally held to be the cure of.

cheers
stuart


Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-17 Thread Andrew Anderson
There are multiple questions embedded in this:

1) What does the MARC standard have to say about 856$u?

$u - Uniform Resource Identifier

Uniform Resource Identifier (URI), which provides standard syntax for locating 
an object using existing Internet protocols. Field 856 is structured to allow 
for the creation of a URL from the concatenation of other separate 856 
subfields. Subfield $u may be used instead of those separate subfields or in 
addition to them.

Subfield $u may be repeated only if both a URN and a URL or more than one URN 
are recorded.

Used for automated access to an electronic item using one of the Internet 
protocols or by resolution of a URN. Subfield $u may be repeated only if both a 
URN and a URL or more than one URN are recorded. Field 856 is repeated if more 
than one URL needs to be recorded.

Here, it is established that $u uses a URI, which leads to….

2) What do the RFCs say about protocol-relative URIs?

http://tools.ietf.org/html/rfc3986#section-4.1

  URI-reference is used to denote the most common usage of a resource
   identifier.

  URI-reference = URI / relative-ref

   A URI-reference is either a URI or a relative reference.  If the
   URI-reference's prefix does not match the syntax of a scheme followed
   by its colon separator, then the URI-reference is a relative
   reference.

So by the stated use of URIs in the MARC standard, and the RFC definition of 
the URI relative reference, there should be no standards basis by which 
protocol relative URLs should not be valid for use in 856.
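
The RFC rule quoted above (no scheme-plus-colon prefix means relative reference) can be sketched in a few lines of Python. The function name is made up, and the pattern checks only the prefix; it does not validate the rest of the URI.

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# A URI-reference whose prefix matches a scheme followed by ":" is a URI;
# anything else is a relative reference.
SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def is_relative_reference(uri_reference):
    """Return True if the string lacks a scheme prefix, per RFC 3986 section 4.1."""
    return SCHEME_PREFIX.match(uri_reference) is None


if __name__ == "__main__":
    for value in ("https://example.org/", "//example.org/", "example.org"):
        kind = "relative-ref" if is_relative_reference(value) else "URI"
        print(f"{value!r}: {kind}")
```

By this test both //example.org/ and a bare example.org are relative references, so syntactic validity alone does not say how a consumer will resolve them.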

Expanding out to software support: most tools that I have used for general 
URL manipulation have no problems with this format, but I have only 
used PyMARC for manipulating MARC records, not any of the other MARC editors. 
If they try to be too clever about data validation and not quite clever enough 
about standards and patterns, there could be issues at this level.

As for browser support, IE7 & IE8 have issues with double-loading some 
resources when used in this manner, but those browsers are becoming nearly 
extinct, so I would not anticipate client-side issues as long as the 
intermediate system that consumes the 856 field and renders it for display can 
handle this.  Our web properties switched to using this pattern several years 
ago to avoid the “insecure content” warnings and we have had no issues on the 
client side.  

Then the other consumers of MARC data come into play — title lists, link 
resolvers, proxy servers, etc.  A lot of what I’ve seen in this space are 
lipstick wearing dinosaurs of a code base, so unless the vendor is particularly 
good about keeping up with current web patterns, this is where I would expect 
the most challenges.  There may be implicit or explicit assumptions built into 
systems that would break with protocol-relative URLs, e.g. if the value is 
passed directly to a proxy server, it may not know what to do without a scheme 
prefixed to the URI, and attempt to serve local content instead.
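
One way to protect such downstream consumers is to make the value absolute before it leaves the local system. A minimal sketch, assuming https is the preferred default; the function name is made up, and this is not a feature of any particular MARC tool.

```python
from urllib.parse import urlsplit, urlunsplit


def absolutize(u_value, default_scheme="https"):
    """Add a scheme to a protocol-relative value; leave everything else untouched."""
    parts = urlsplit(u_value)
    if parts.scheme or not parts.netloc:
        # Already absolute, or not a //host/... reference at all (e.g. a bare
        # "example.org", which parses as a path and is left for a human to fix).
        return u_value
    return urlunsplit(
        (default_scheme, parts.netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    for value in ("//example.org/", "http://example.org/", "example.org"):
        print(f"{value} -> {absolutize(value)}")
```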

That said, there is a big push recently for dropping non-SSL connections in 
general (going so far as to call the protocol relative URIs an anti-pattern), 
so is it really worth all the potential pain and suffering to make your links 
scheme-agnostic, when maybe it would be a better investment in time to switch 
them all to SSL instead?  This dovetails nicely with some of the discussions I 
have had recently with electronic services librarians about how to protect 
patron privacy in an online world by using SSL as an arrow in that quiver.

Andrew

-- 
Andrew Anderson, President & CEO, Library and Information Resources Network, 
Inc.
http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | 
http://www.facebook.com/LIRNnotes



Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-17 Thread Kyle Banerjee
Information in subfield u should be complete, but even if that weren't the
case, it's important to consider how systems handle the information they're
given. MARC is just a container, and just because the information is
syntactically kosher does not mean it will be processed how you like.

In the case at hand, you can do anything you like if the information is
just used locally and your systems behave the way you need. As Andrew
mentions, you'll run into trouble if this information gets imported into
other systems.

kyle

