Re: [CODE4LIB] Representing copyright holder in MODS
From: Mike Taylor
> Any thoughts on how I might use this to express the copyright status of the item's abstract?

One way that I have heard discussed (though I don't know if anyone is doing it) is to represent the abstract as part of a related item (type = constituent). The related item could consist of just the abstract and the copyright statement.

--Ray
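The related-item idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name abstract_as_constituent is hypothetical, and using accessCondition (type "use and reproduction") to carry the copyright statement is my guess at a reasonable MODS element, not something the message specifies.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Return the namespace-qualified form of a MODS element name."""
    return f"{{{MODS_NS}}}{tag}"


def abstract_as_constituent(abstract_text, rights_text):
    """Build a relatedItem (type=constituent) holding only an abstract and its rights statement.

    Assumption: accessCondition is used for the copyright statement.
    """
    related = ET.Element(q("relatedItem"), {"type": "constituent"})
    ET.SubElement(related, q("abstract")).text = abstract_text
    condition = ET.SubElement(related, q("accessCondition"), {"type": "use and reproduction"})
    condition.text = rights_text
    return related


# A minimal record: a title plus the abstract carried as a constituent item.
record = ET.Element(q("mods"))
title_info = ET.SubElement(record, q("titleInfo"))
ET.SubElement(title_info, q("title")).text = "Example article"
record.append(abstract_as_constituent("A short abstract.", "Abstract copyright Example Publisher."))

print(ET.tostring(record, encoding="unicode"))
```

The point of the design is that the rights statement sits next to the abstract inside the constituent, so it cannot be mistaken for a statement about the whole item.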
Re: [CODE4LIB] If you were starting over, what would you learn and how would you do it?
Along the lines of "oh, you meant THIS profession": Rotational vs. linear mechanics.

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nate Vack
Sent: Friday, May 06, 2011 4:47 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] If you were starting over, what would you learn and how would you do it?

On Fri, May 6, 2011 at 2:07 PM, Ceci Land cl...@library.msstate.edu wrote:
> How would you choose to develop your skills from baby level to something useful to the profession?

I'd pretty much follow the plot of Batman Begins as closely as possible.

Wait, useful to *this* profession?

-n
Re: [CODE4LIB] mailing list administratativia
I think the constraint is that it has to be a rational number.

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Eric Hellman
Sent: Wednesday, October 27, 2010 5:58 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] mailing list administratativia

I vote for changing the limit threshold to PI * (eventual length of this meta-thread).

On Oct 27, 2010, at 3:37 PM, Alexander Johannesen wrote:

On Thu, Oct 28, 2010 at 2:44 AM, Doran, Michael D do...@uta.edu wrote:

Can that limit threshold be raised? If so, are there reasons why it should not be raised?

Is it to throttle spam or something? 50 seems rather low, and it's rather depressing to have a lively discussion throttled like that. Not to mention I thought I was simply kicked out for livening things up (especially given my reasonable follow-up was where the throttling began).

Alex
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---

Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042 USA
e...@hellman.net
http://go-to-hellman.blogspot.com/
@gluejar
Re: [CODE4LIB] MARCXML - What is it for?
It really is possible to make your point without being quite so obnoxious. Everyone else seems to be able to do so.

--Ray

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Alexander Johannesen
Sent: Monday, October 25, 2010 3:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?

Hiya,

On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote:
> Switching to an XML format doesn't help with that at all.

I'm willing to take it further and say that MARCXML was the worst thing the library world ever did. Some might argue it was a good first step, and that it was better to have something rather than nothing, to which I respond: Poppycock!

MARCXML is nothing short of evil. Not only does it go against every principle of good XML anywhere (don't rely on whitespace, structure over code, namespace conventions, identity management, document control, separation of entities and properties, and on and on), it breaks the ontological commitment that a better treatment of the MARC data could bring, deterring people from actually a) using the darn thing as anything but a bare minimal crutch, and b) expanding it to be actually useful and interesting.

The quicker the library world can get rid of this monstrosity, the better, although I doubt that will ever happen; it will hang around like a foul stench for as long as there is MARC in the world. A long time. A long sad time.

A few extra notes: http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html

Can you tell I'm not a fan? :)

Kind regards,
Alex
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---
Re: [CODE4LIB] SRU 2.0 / Accept-Ranges (was: Inlining HTTP Headers in URLs )
Joe Hourcle wrote:
> Do we have anyone affiliated with the project on this list who can make a correction before it leaves draft?

Could you submit this suggestion formally? See: http://www.oasis-open.org/committees/comments/index.php?wg_abbrev=search-ws

(The SRU and CQL development gets discussed on various lists, which is fine, but when the discussion leads to suggested changes, it is best if any such proposals can be formally submitted to OASIS via the above. Otherwise OASIS gets angry.)

--Ray
Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts
There is no synchronous operation in SRU.

As for federated search... To digress a moment, you may recall -- I believe it was on this list -- there was discussion (maybe a year ago?) of what that even means and whether it is the same as or differs from metasearch, whatever that means. That discussion was inconclusive.

Anyway, earlier drafts of SRU 2.0 describe a metasearch model. Recently, the committee decided that the terms metasearch and federated search are undefined jargon. We now choose to call it multi-server search.

So to answer the question of whether there is federated search support: yes, limited support, if by federated search you mean multi-server search. There is no multi-server support in terms of separate result sets for different servers. However, for (1) faceted search results and (2) subquery results, these can be grouped according to server.

--Ray

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Kuba
Sent: Tuesday, May 18, 2010 4:07 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

Hi,

Does the current draft include any support for asynchronous operation of the protocol (either by status notifications and polling and/or streaming), e.g. some chunk of results coming back before others? Some time ago I read through an early draft published on the LOC site and it mentioned support for federated search, but it's hard to imagine how that could be implemented without any async support.
Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts
On 18 May 2010 15:24, Ray Denenberg, Library of Congress r...@loc.gov wrote:
> There is no synchronous operation in SRU.

Sorry, meant to say "no asynchronous".

--Ray
Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts
What advantage do you see in having a concurrent operations feature (like Z39.50) versus opening several connections? (Concurrent operations introduced significant complexity into Z39.50 - including reference ids, operations, etc. - and I'm not sure anyone ever really thought it was worth it.)

--Ray

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Kuba
Sent: Tuesday, May 18, 2010 12:58 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

That is quite unfortunate, as we were looking at SRU 2.0 as a possible candidate for the front-end protocol for Index Data's pazpar2. The main problem with federated/broadcast/meta (however you want to call it ;) searching is that the back-end databases are scattered in different locations or simply slow in their response times, and in order to provide a decent user experience you need to be able to present some results sooner than others. Waiting for the slowest database to respond is usually not an option.

On Tue, May 18, 2010 at 5:24 PM, Ray Denenberg, Library of Congress r...@loc.gov wrote:

On 18 May 2010 15:24, Ray Denenberg, Library of Congress r...@loc.gov wrote:

There is no synchronous operation in SRU.

Sorry, meant to say "no asynchronous".

--Ray

--
Cheers,
Jakub
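As a rough illustration of the "open several connections" alternative Ray raises, here is a minimal Python sketch in which a client queries several back ends in parallel and handles each answer as soon as it arrives, without waiting for the slowest. Everything here is an assumption made for illustration: the back-end names, the simulated delays, and the search function, which merely stands in for a blocking SRU searchRetrieve request. It is not how pazpar2 or any SRU client actually works.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical back ends and their simulated response times in seconds.
BACKENDS = {"fast-catalog": 0.05, "medium-catalog": 0.20, "slow-catalog": 0.40}


def search(backend, query):
    """Stand-in for one blocking search request against a single back end."""
    time.sleep(BACKENDS[backend])
    return backend, [f"{query} hit from {backend}"]


def search_all(query):
    """Yield (backend, hits) in the order the back ends finish answering."""
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = [pool.submit(search, name, query) for name in BACKENDS]
        for future in as_completed(futures):
            yield future.result()


if __name__ == "__main__":
    for backend, hits in search_all("dinosaur"):
        print(backend, hits)
```

This keeps each connection single-threaded, as SRU is, and moves the "some results sooner than others" behaviour into the client; whether that is enough for a front-end protocol is exactly the question being debated in the thread.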
Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts
First, no. There are extensibility features in SRU but nothing that would help here.

Actually, Jonathan, what I thought you were suggesting was the creation of a (I hesitate to say it) metasearch engine. I use that term because it is what NISO called it when they started their metasearch initiative five or so years ago to create a standard for a metasearch engine, but they got distracted and the effort really came to nothing.

The premise of the metasearch engine is that there exists a single-thread protocol, for example SRU, and the need is to manage many threads, which is what the metasearch engine would have done if it had ever been defined.

This is probably not an area for OASIS work, but if someone wanted to revive the effort in NISO (and put it on the right track) it could be useful.

--Ray

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind
Sent: Tuesday, May 18, 2010 2:56 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

Jakub Skoczen wrote:

I wonder if someone, like Kuba, could design an 'extended async SRU' on top of SRU, that is very SRU like, but builds on top of it to add just enough operations for Kuba's use case area.

I think that's the right way to approach it. Is there a particular extensibility feature in the protocol that allows for this?

I don't know, but that's not what I was suggesting. I was suggesting you read the SRU spec, and then design your own SRU-async spec, which is defined as exactly like SRU 2.0, except it also has the following operations, and is identified in an Explain document like X.

Jonathan
Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts
> Rather, OpenSearch descriptions provide a _different URL template_ for every response IANA content type available

Yes, of course, that's what I meant. I said it somewhat sloppily, but in many of the examples we've looked at, it comes down to the same thing: the different templates differ only in a single (hard-coded) parameter.

--Ray

----- Original Message -----
From: Jonathan Rochkind rochk...@jhu.edu
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Monday, May 17, 2010 6:11 PM
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

Ray Denenberg, Library of Congress wrote:
> Ralph will probably be able to articulate this better than I can, but the accept parameter is driven by the requirement to be able to use OpenSearch (for example) to query an SRU server. The description document isn't going to provide templates that allow you to do this via content negotiation, they provide a parameter instead, to allow the client to tell the server that it wants, for example, an rss response.

No, they don't. I am having this same debate with Tony Hammond.

OpenSearch descriptions do NOT provide a parameter to allow the client to tell the server what response it wants. They also don't easily provide for content negotiation, it is true. Rather, OpenSearch descriptions provide a _different URL template_ for every response IANA content type available.

Here is the example from the OpenSearch documentation:

<Url type="application/rss+xml" xmlns:example="http://example.com/opensearchextensions/1.0/" template="http://example.com?q={searchTerms}&amp;c={example:color?}"/>

Please note that application/rss+xml is an attribute of the URL template itself; it is NOT a parameter in the template.

If SRU added an accept parameter to try and make OpenSearch happy, this is a big mistake, because it in fact _conflicts_ with OpenSearch desc -- to make it available as an actual parameter in the OpenSearch URL template. If, on the other hand, you just want to hard-code it in, that could make some sense.
<Url type="application/rss+xml" xmlns:example="http://example.com/opensearchextensions/1.0/" template="http://my-sru-server.com?q={searchTerms}&amp;accept=application%2Frss"/>

That might make sense, if that's the use case. But actually trying to provide a parameter for the _client_ to fill out in an OpenSearch desc in fact _conflicts_ with OpenSearch; for better or for worse, the response type is _hard coded_ into the template.

Jonathan

(I suggest, though, that you move further discussion of this to the SRU list.)

--Ray

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Robert Sanderson
Sent: Monday, May 17, 2010 3:44 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

In today's RESTful world, what's the requirement for the httpAccept parameter? Isn't straight content negotiation sufficient rather than pulling the headers into the URI? What happens if the accept header and the httpAccept parameter say different things?

Rob

On Mon, May 17, 2010 at 1:37 PM, LeVan,Ralph le...@oclc.org wrote:

I'd code it. (I have already coded to it.) For me, the httpAccept parameter and support for content negotiation on responses is a wonderful addition to the standard. It lets us be OpenSearch compliant finally.

The virtue of coding to the draft is that there's a chance we can fix any problems you encounter. While we consider the draft stable, that doesn't mean everything has been tested in the real world. I'm particularly nervous about the facets support I championed. I asked for it to support users of my SRW server framework who wanted to create an interface to SOLR. Those users disappeared and the usability of the SRU interface is untested.
Ralph

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind
Sent: Monday, May 17, 2010 3:18 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

Wait, I'm so confused. Is SRU 2.0 actually a published standard, or are you just showing us a work in progress that nobody should be writing code to yet?

I'm confused because I thought it was just a draft work in progress, but then you talk about official vs unofficial copies... an unofficial copy of a draft work in progress that isn't a spec yet anyway? Very confused.

If I'm planning on writing software to SRU... do you recommend I use the (until now not publicly available so I didn't have a choice) unofficial SRU 2.0 thing, or is that still just a draft work in progress nobody should be writing software to yet?

Jonathan

Ray Denenberg, Library of Congress wrote:

For those of you who have recently asked about current OASIS drafts of SRU (2.0) and CQL ... The *official* versions reside at OASIS, but because of confusing (and sometimes
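Jonathan's point earlier in this thread, that in an OpenSearch description the response type is an attribute of each Url template and not a parameter the client fills in, can be sketched as follows. This is an illustration only: the server address and the description document are made up, and hard-coding an httpAccept value into each template is the arrangement discussed in the thread, not something taken from an actual SRU server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

OPENSEARCH_NS = "{http://a9.com/-/spec/opensearch/1.1/}"

# Hypothetical description document: one Url template per response type,
# each with its response type hard-coded into the query string.
DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example SRU</ShortName>
  <Url type="application/rss+xml"
       template="http://my-sru-server.example/?q={searchTerms}&amp;httpAccept=application%2Frss%2Bxml"/>
  <Url type="application/atom+xml"
       template="http://my-sru-server.example/?q={searchTerms}&amp;httpAccept=application%2Fatom%2Bxml"/>
</OpenSearchDescription>"""


def templates_by_type(description_xml):
    """Map each advertised response type to its own URL template."""
    root = ET.fromstring(description_xml)
    return {url.get("type"): url.get("template") for url in root.iter(OPENSEARCH_NS + "Url")}


def fill(template, search_terms):
    """Substitute only {searchTerms}; the response type is already fixed by the template."""
    return template.replace("{searchTerms}", quote(search_terms))


templates = templates_by_type(DESCRIPTION)
print(fill(templates["application/rss+xml"], "dinosaur bones"))
```

The client picks a response type by choosing which template to use; the only thing it substitutes is the search term.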
Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas
From: Jonathan Rochkind rochk...@jhu.edu
> Another question though. I note when looking up schemaInfo... I'm a bit confused by the sort attribute. How could you sort by a schema? What is this attribute actually for?

Well, indulge me: this is best explained by the current OASIS SRU draft. (The current and earlier specs don't do a good job here. But for background, if interested: sorting as an SRU function was supported in SRU 1.1 and taken out of version 1.2, replaced by sorting as a function of the query language rather than the protocol. For the OASIS work it's in both. For the current spec at LC, which reflects 1.2, the attribute doesn't even make sense. If you go back to the 1.1 archive it does. Still, the OASIS document treats it more clearly.)

See http://www.loc.gov/standards/sru/oasis/sru-2-0-draft-most-current.doc, section 9.1.

So essentially, when you sort in SRU, you provide an XPath expression. The XPath expression is meaningful in the context of a schema, but the *record schema* may not be the most meaningful schema for purposes of sorting; there may be another schema that is more meaningful. So you have the capability to specify not only a record schema but an auxiliary sort schema.

A given schema that an Explain file lists will usually be one that is used as a record schema, but it may also be usable as a sort schema. That's what the sort attribute tells you.

--Ray
Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas
From: Jonathan Rochkind rochk...@jhu.edu
> But you will leave sorting as part of CQL too in any changes to CQL specs, I hope? I think CQL has a lot of use even outside of SRU proper, so I encourage you to leave its spec not too tightly coupled to SRU.

The OASIS TC firmly supports this approach (and by firmly I mean 100%), so the only way this could get changed is via public comment.

> I think there are at least three ways to sort as part of (different versions of?) SRU now!
> 1) An actual separate sortKeys query parameter
> 2) Included in the CQL expression in query, using the sortBy keyword.
> 3) In draft not finalized, OASIS/SRU 2.0 methods of specifying XPaths for sorting. [Thanks for including the link to the current SRU 2.0 draft, I didn't know that was publicly available anywhere, it's not really googlable].

As you corrected yourself in a subsequent message:

> Ah, I think I was wrong below. I must have been looking at different versions of the SRU spec without realizing it. SRU 1.1 includes a sortKeys parameter, and CQL 1.1 does not include a sortBy clause. SRU 1.2 does NOT include a sortKeys parameter, and CQL 1.2 does include a sortBy clause.

Yes, that's correct.

> Do I have this right? As SRU 1.2 is the only actual spec I have to work with... am I right that either top-level sortKeys, or embedded in CQL with sortBy would both be legal in SRU 1.2

No. Legal in 2.0 - the OASIS version - not legal in 1.2. In 1.2 it is not legal to have a sort parameter in the request.

OASIS is standardizing SRU and CQL loosely coupled; that is, SRU can use other query languages and CQL may be invoked by other protocols, but they are generally oriented towards being used together. But since SRU may be used with a query language that might not have sort capability, the TC felt it necessary to include sorting as part of the protocol. Conversely, since CQL may be used by a protocol that doesn't support sorting, CQL should similarly support sorting.

There is a section in the draft standard that discusses what to do if a request has conflicting sort specifications.

--Ray
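To make the version split concrete, here is a small Python sketch that builds the two kinds of request: SRU 1.1 with a separate sortKeys parameter, and SRU 1.2 with the sort carried inside the CQL query via sortBy. Treat it as a sketch under assumptions: the endpoint is hypothetical, the "path,schema" form of a sort key is my reading of the SRU 1.1 sort documentation (its optional sub-parameters are left out), and the index names are only examples.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://sru.example.org/catalog"  # hypothetical SRU endpoint


def sru_1_1_url(cql, sort_path, sort_schema):
    """SRU 1.1: sorting is a protocol parameter (sortKeys), separate from the query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        # Assumed key form: XPath expression, then the schema it is relative to.
        "sortKeys": f"{sort_path},{sort_schema}",
    }
    return BASE + "?" + urlencode(params)


def sru_1_2_url(cql, sort_index, descending=False):
    """SRU 1.2: no sort parameter; the sort is part of the CQL 1.2 query itself."""
    modifier = "sort.descending" if descending else "sort.ascending"
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": f"{cql} sortBy {sort_index}/{modifier}",
    }
    return BASE + "?" + urlencode(params)


def params_of(url):
    """Decode a URL's query string back into a dict of single values."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


print(params_of(sru_1_1_url("dc.title any fish", "title", "dc")))
print(params_of(sru_1_2_url("dc.title any fish", "dc.date", descending=True)))
```

Per Ray's answer, a 2.0 request could carry either form, which is why the draft needs a rule for conflicting sort specifications.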
Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas
schemaInfo is what you're looking for, I think. Look at http://z3950.loc.gov:7090/voyager. Line 74, for example:

<schemaInfo>
  <schema identifier="info:srw/schema/1/marcxml-v1.1" sort="false" name="marcxml">
    <title>MARCXML</title>
  </schema>

Is this what you're looking for?

--Ray

----- Original Message -----
From: Jonathan Rochkind rochk...@jhu.edu
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, April 30, 2010 3:57 PM
Subject: [CODE4LIB] SRU/ZeeRex explain question : record schemas

This page: http://www.loc.gov/standards/sru/resources/schemas.html says:

"The Explain document lists the XML schemas for a given database in which records may be transferred. Every schemas is unambiguously identified by a URI and a server may assign a short name, which may or may not be the same as the short name listed in the table below (and may differ from the short name that another server assigns)."

But perusing the SRU/ZeeRex Explain documentation I've been able to find, I've been unable to find WHERE in the Explain document this information is listed/advertised. Can anyone clue me in?
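For anyone wanting to read this out of an Explain record programmatically, a minimal Python sketch follows. The XML here is a hand-written fragment, not the actual response from the LC server: the first schema entry mirrors the one Ray quotes, the second entry and its sort="true" value are invented for illustration, and the function name record_schemas is hypothetical. Element names are matched by local name so the sketch does not depend on the exact namespace a server uses.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration; only the first <schema> comes from the message above.
EXPLAIN_FRAGMENT = """<explain xmlns="http://explain.z3950.org/dtd/2.0/">
  <schemaInfo>
    <schema identifier="info:srw/schema/1/marcxml-v1.1" sort="false" name="marcxml">
      <title>MARCXML</title>
    </schema>
    <schema identifier="info:srw/schema/1/dc-v1.1" sort="true" name="dc">
      <title>Dublin Core</title>
    </schema>
  </schemaInfo>
</explain>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def record_schemas(explain_xml):
    """List the schemas advertised in schemaInfo, with the server's short name and sort flag."""
    root = ET.fromstring(explain_xml)
    schemas = []
    for element in root.iter():
        if local_name(element.tag) != "schema":
            continue
        title = next((child.text for child in element if local_name(child.tag) == "title"), None)
        schemas.append({
            "identifier": element.get("identifier"),
            "name": element.get("name"),
            "sortable": element.get("sort") == "true",
            "title": title,
        })
    return schemas


for schema in record_schemas(EXPLAIN_FRAGMENT):
    print(schema)
```

The identifier is the unambiguous URI; the name attribute is the server-assigned short name that the LC page says may differ from server to server.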
Re: [CODE4LIB] Code4Lib Midwest?
If you're going to host it at Notre Dame I would expect an agenda something like this (do it on Saturday, 9/11):

1. meet & greet (mid-morning)
2. Tailgate (noon)
3. Notre Dame vs. Michigan (3:30)
4. presentation to library staff
5. hack session
6. go home

Can you get us in to the game, Eric? I'd come.

--Ray

On Thu, Mar 4, 2010 at 10:38 AM, Eric Lease Morgan emor...@nd.edu wrote:

If we were to host something here at Notre Dame, then I imagine something like the following agenda possible starting in the mid-morning:

1. meet & greet
2. share demonstrations
3. eat lunch
4. give a presentation to library staff
5. have a hack session
6. socialize in the evening
7. go home the next day

There are plenty of inexpensive hotels in South Bend because of all of our football games.

--
Eric Lease Morgan
University of Notre Dame
(574) 631-8604
Re: [CODE4LIB] XML schemas question
Any element that is not a child of another. Or, another way to look at it: any element that can be referenced by ref=.

In the following schema:
__
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a" type="aType"/>
  <xs:element name="b" type="bType"/>

  <xs:complexType name="aType">
    <xs:sequence>
      <xs:element ref="a"/>
      <xs:element name="c" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="bType">
    <xs:sequence>
      <xs:element ref="b"/>
      <xs:element name="c" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
__
a and b can be roots. c cannot.

--Ray

----- Original Message -----
From: Jonathan Rochkind rochk...@jhu.edu
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Monday, July 27, 2009 11:35 AM
Subject: [CODE4LIB] XML schemas question

Anyone familiar with XML schemas (.xsd)? Can you help me figure something out?

Is there something in the schema that specifies what elements can serve as the 'root node'... or is any element described in the schema available for use as a 'root node', and it'll still validate?

Thanks for any tips.

Jonathan
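Ray's rule (only elements declared directly under xs:schema can be a document root) is easy to check mechanically. The Python sketch below lists the global and the local element declarations of a schema using only the standard library. The schema text is my own well-formed rendering of the example in this message, with the xs namespace declaration and xs:sequence wrappers added as assumptions; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# A well-formed rendering of the example schema (namespace and xs:sequence added).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a" type="aType"/>
  <xs:element name="b" type="bType"/>
  <xs:complexType name="aType">
    <xs:sequence>
      <xs:element ref="a"/>
      <xs:element name="c" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="bType">
    <xs:sequence>
      <xs:element ref="b"/>
      <xs:element name="c" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def global_elements(xsd_text):
    """Names of elements declared as direct children of xs:schema (possible roots)."""
    root = ET.fromstring(xsd_text)
    return [child.get("name") for child in root if child.tag == XS + "element"]


def local_elements(xsd_text):
    """Names of elements declared (not merely referenced) inside other constructs."""
    root = ET.fromstring(xsd_text)
    top_level = {id(child) for child in root}
    names = {
        element.get("name")
        for element in root.iter(XS + "element")
        if element.get("name") is not None and id(element) not in top_level
    }
    return sorted(names)


print("possible roots:", global_elements(SCHEMA))
print("local only:", local_elements(SCHEMA))
```

Run against the example, the global declarations are a and b, and c shows up only as a local declaration, so an instance document rooted at c would not validate.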
Re: [CODE4LIB] Open, public standards v. pay per view standards and usage
I am not even remotely suggesting that anyone would implement the holdings standard with nothing but the schema. We're working on a solution to this.

--Ray

----- Original Message -----
From: Houghton,Andrew hough...@oclc.org
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Thursday, July 16, 2009 11:26 AM
Subject: Re: [CODE4LIB] Open, public standards v. pay per view standards and usage

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer
Sent: Thursday, July 16, 2009 11:07 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Open, public standards v. pay per view standards and usage

On Wed, Jul 15, 2009 at 8:57 AM, Ray Denenberg, Library of Congress r...@loc.gov wrote:

Ross, if you're talking about the ISO 20775 xml schema: http://www.loc.gov/standards/iso20775/ISOholdings_V1.0.xsd It's free.

It's also not a spec, it's a schema. If the expectation is that people are actually going to adopt a standard from merely looking at an .xsd, my prediction is that this will go nowhere. I mean, I'm wrong a lot, but I feel pretty good about this reading from my crystal ball.

Not saying you're wrong Ross, but it depends. People adopted MARC-XML by looking at the .xsd without an actual specification. Granted, it's not a complicated schema, and there already existed the MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media, so it wasn't a big leap to adopt MARC-XML, IMHO.

Generally I agree with your conclusion Ross. It's difficult for people to just pick up an .xsd and understand what the semantics are for each element and attribute in the schema and which element(s) should be used for the document element. This is mitigated by annotations in the .xsd for the elements and attributes and also mitigated by using the Russian doll schema approach that MARC-XML uses, so it's clear what elements can be used for the document element.

Also, tools like XMLSpy that provide a graphical representation of the .xsd can provide insights into how the schema should be used. But these are a lot of "if this and that was done, and you have appropriate tools". A freely available specification detailing each element and attribute along with their semantics is much better for understanding a schema than the schema itself, but obviously the schema is the definitive authority when it comes to generating conforming instance documents.

Andy.
Re: [CODE4LIB] Open, public standards v. pay per view standards and usage
From: Ross Singer rossfsin...@gmail.com
> Well, it's not a great example, because I don't have a 'counter-example', but I think it will remain to be seen if ISO 20775 goes anywhere if it, too, remains behind a pay wall. If an open spec were to come along that allowed the transfer of holdings and availability information that was decent and simple it would basically render ISO 20775 irrelevant (if the pay wall doesn't already).

Ross, if you're talking about the ISO 20775 xml schema: http://www.loc.gov/standards/iso20775/ISOholdings_V1.0.xsd

It's free.

--Ray
Re: [CODE4LIB] WARC file format now ISO standard
But you have to pay $200 for the document that lists changes from last draft to first official version.

(Ok, Ok, it was just a joke. But you do get the point.)

----- Original Message -----
From: st...@archive.org st...@archive.org
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Tuesday, June 02, 2009 5:18 PM
Subject: Re: [CODE4LIB] WARC file format now ISO standard

hi Karen,

understood. the final draft of the spec is available here: http://www.scribd.com/doc/4303719/WARC-ISO-28500-final-draft-v018-Zentveld-080618 and other (similar) versions here: http://archive-access.sourceforge.net/warc/

/st...@archive.org

On 6/2/09 2:15 PM, Karen Coyle wrote:

Unfortunately, being an ISO standard, to obtain it costs 118 CHF (about $110 USD). Hard to follow a standard you can't afford to read. Is there an online version somewhere?

kc

st...@archive.org wrote:

hi code4lib,

if you're archiving web content, please use the WARC format.

thanks,
/st...@archive.org

WARC File Format Published as an International Standard
http://netpreserve.org/press/pr20090601.php

ISO 28500:2009 specifies the WARC file format:

* to store both the payload content and control information from mainstream Internet application layer protocols, such as the Hypertext Transfer Protocol (HTTP), Domain Name System (DNS), and File Transfer Protocol (FTP);
* to store arbitrary metadata linked to other stored data (e.g. subject classifier, discovered language, encoding);
* to support data compression and maintain data record integrity;
* to store all control information from the harvesting protocol (e.g. request headers), not just response information;
* to store the results of data transformations linked to other stored data;
* to store a duplicate detection event linked to other stored data (to reduce storage in the presence of identical or substantially similar resources);
* to be extended without disruption to existing functionality;
* to support handling of overly long records by truncation or segmentation, where desired.

more info here: http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml
Re: [CODE4LIB] exploiting z39.50
From: Eric Lease Morgan emor...@nd.edu
> 1. What MARC field/subfield might I put this string?
> 2. How would I go about getting the string indexed?
> 3. How might I go about querying the server for records with this string?

I can at least talk about the third question.

There was work on a MARC attribute set, though it was not completed. If you look at the OID register at http://www.loc.gov/z3950/agency/defns/oids.html you'll see that the latest work on it (second draft) was in 2000: http://www.nlc-bnc.ca/iso/z3950/MARC_attribute_set_2.doc. So if someone actually wanted to put it to use, it would have to be completed.

For SRU there is a complete MARC context set: http://www.loc.gov/standards/sru/resources/marc-context-set.html.

--Ray
Re: [CODE4LIB] registering info: uris?
From: Ross Singer rossfsin...@gmail.com
> Except that OpenURL and SRU /already use different info URIs to describe the same things/.
>
> info:srw/schema/1/marcxml-v1.1
> info:ofi/fmt:xml:xsd:MARC21
>
> or
>
> info:srw/schema/1/onix-v2.0
> info:ofi/fmt:xml:xsd:onix
>
> What is the rationale for this?

None. (Or, whatever rationale there was, historically, should no longer apply.) These should be aligned. Post this to the OpenURL list (and perhaps SRU as well). I'm certainly willing to work to come up with a solution.

--Ray
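Until the two registries are aligned, a client that has to accept both forms needs some local table saying which identifiers mean the same format. The Python sketch below shows one way that stopgap might look. It is purely illustrative: the table holds only the two pairs quoted in this message, the short labels and function names are hypothetical, and whether each pair really denotes exactly the same format version is something the registries, not this table, would have to settle.

```python
# Local equivalence table: short label -> identifiers treated as the same format.
# Only the two pairs quoted in the message are listed; the labels are made up here.
ALIASES = {
    "marcxml": {
        "info:srw/schema/1/marcxml-v1.1",
        "info:ofi/fmt:xml:xsd:MARC21",
    },
    "onix": {
        "info:srw/schema/1/onix-v2.0",
        "info:ofi/fmt:xml:xsd:onix",
    },
}


def local_label(uri):
    """Return the local label for a known identifier, or None if it is not in the table."""
    for label, uris in ALIASES.items():
        if uri in uris:
            return label
    return None


def same_format(uri_a, uri_b):
    """True only when both identifiers are known and map to the same local label."""
    label = local_label(uri_a)
    return label is not None and label == local_label(uri_b)


print(same_format("info:srw/schema/1/marcxml-v1.1", "info:ofi/fmt:xml:xsd:MARC21"))
print(same_format("info:srw/schema/1/marcxml-v1.1", "info:ofi/fmt:xml:xsd:onix"))
```

Every client maintaining its own copy of such a table is exactly the duplication that a shared registry would remove.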
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Thanks, Ross. For SRU, this is an opportune time to reconcile these differences. Opportune, because we are approaching standardization of SRU/CQL within OASIS, and there will be a number of areas that need to change.

Some observations.

1. The 'ofi' namespace of 'info' has the advantage that the name, ofi, isn't necessarily tied to a community or application. (I suppose one could claim that the acronym ofi means "openURL something starting with 'f' for Identifiers", but it doesn't say so anywhere that I can find.) However, the namespace itself (if not the name) is tied to OpenURL: "Namespace of Registry Identifiers used by the NISO OpenURL Framework Registry". That seems like a simple problem to fix. (Changing that title would not cause any technical problems.)

2. In contrast, with the srw namespace, the actual name is srw. So at least in name, it is tied to an application.

3. On the other side, the srw namespace has the distinct advantage of built-in extensibility. For the URI info:srw/schema/1/onix-v2.0, the 1 is an authority. There are (currently) 15 such authorities; they are listed in the (second) table at http://www.loc.gov/standards/sru/resources/infoURI.html. Authority 1 is the SRU maintenance agency, and the objects registered under that authority are, more or less, public. But objects can be defined under the other authorities with no registration process required.

4. ofi does not offer this sort of extensibility.

So, if we were going to unify these two systems (and I can't speak for the SRU community and commit to doing so yet), the extensibility offered by the srw approach would be an absolute requirement. If it could somehow be built in to ofi, then I would not be opposed to migrating the srw identifiers. Another approach would be to register an entirely new 'info:' URI namespace and migrate all of these identifiers to the new namespace.
--Ray

----- Original Message -----
From: Ross Singer rossfsin...@gmail.com
To: z...@listserv.loc.gov
Sent: Thursday, April 30, 2009 2:59 PM
Subject: One Data Format Identifier (and Registry) to Rule Them All

Hello everybody. I apologize for the crossposting, but this is an area that could (potentially) affect every one of these groups. I realize that not everybody will be able to respond to all lists, but...

First of all, some back story (Code4Lib subscribers can probably skip ahead):

Jangle [1] requires URIs to explicitly declare the format of the data it is transporting (binary marc, marcxml, vcard, DLF simpleAvailability, MODS, EAD, etc.). In the past, it has used its own URI structure for this (http://jangle.org/vocab/formats#...) but this has always been with the intention of moving out of jangle.org into a more generic space so it could be used by other initiatives.

This same concept came up in UnAPI [2] (I think this thread: http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682 discusses it a bit - there is a reference there that it maybe had come up before), although it was ultimately rejected in favor of an (optional) approach more in line with how OAI-PMH disambiguates metadata formats. That being said, this page used to try to set a sort of convention around the UnAPI formats: http://unapi.stikipad.com/unapi/show/existing+formats But it's now just a squatter page.

Jakob Voss pointed out that SRU has a schema registry and that it would make sense to coordinate with this rather than mint new URIs for things that have already been defined there: http://www.loc.gov/standards/sru/resources/schemas.html

This, of course, made a lot of sense. It also made me realize that OpenURL *also* has a registry of metadata formats: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats

The problem here is that OpenURL and SRW are using different info URIs to describe the same things:

info:srw/schema/1/marcxml-v1.1
info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0
info:ofi/fmt:xml:xsd:onix

The latter technically isn't the same thing since the OpenURL one claims it's an identifier for ONIX 2.1, but if I wasn't sending this email now, eventually SRU would have registered info:srw/schema/1/onix-v2.1

There are several other examples, as well (MODS, ISO20775, etc.) and it's not a stretch to envision more in the future.

So there are a couple of questions here.

First, and most importantly, how do we reconcile these different identifiers for the same thing? Can we come up with some agreement on which ones we should really use?

Secondly, and this gets to the reason why any of this was brought up in the first place, how can we coordinate these identifiers more effectively and efficiently to reuse among various specs and protocols, but not:

1) be tied to a particular community
2) require some laborious and lengthy submission and review process to just say "hey, here's my FOAF available via UnAPI"
3) be so
Re: [CODE4LIB] exact title searches with z39.50
From: Mike Taylor m...@indexdata.com The irony is that Z39.50 actually makes _much_ more effort to specify semantics than most other standards -- and yet still finds itself in the situation where many implementations do not respond correctly to the BIB-1 attribute 6=3 (completeness=complete field) which is how Eric should be able to do what he wants here. Not that I have any good answers to this problem ... but I DO know that inventing more and more replacement standards is NOT the answer. Everything that's come along since Z39.50 has suffered from exactly the same problem but more so. I think this remains to be seen for SRU/CQL, in particular for the example at hand, how to search for exact title. There are two related issues: one, how arcane the standard is, and two, how closely implementations conform to the intended semantics. And clearly the first has a bearing on the second. And even I would say that Z39.50 is a bit on the arcane side when it comes to formulating a query for exact title. With SRU/CQL there is an exact relation ('exact' in 1.1, '==' in 1.2). So I would think there is less excuse for a server to apply a creative interpretation. If it cannot support exact title it should fail the search. With Z39.50 there is more perceived latitude for a server to pretend it supports something it doesn't. --Ray
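For illustration, the exact-title query described above might be built like this in Python. This is only a sketch: the index name dc.title and the helper name are assumptions, not something taken from this thread. The only part that comes from the message is the relation itself, 'exact' in CQL 1.1 and '==' in CQL 1.2.

```python
def exact_title_query(title, cql_version="1.2", index="dc.title"):
    """Build a CQL query string asking for an exact match on a title index.

    The index name is an assumption; a real server advertises its own
    indexes. Only the relation spelling differs between CQL 1.1 and 1.2.
    """
    relation = {"1.1": "exact", "1.2": "=="}[cql_version]
    # Inside a quoted CQL term, backslash escapes a double quote.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'


print(exact_title_query("Gone with the wind"))
print(exact_title_query("Gone with the wind", cql_version="1.1"))
```

Whether a given server honours the relation, or quietly substitutes a word search, is exactly the conformance question raised in the message.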
Re: [CODE4LIB] exact title searches with z39.50
Right, Mike. There is a long and rich history of the debate between loose and strict interpretation, in the world at large, and in particular, within Z39.50, this debate raged from the late 1980s throughout the 90s. The faction that said If you can't give the client what is asks for, at least give them something; make them happy was almost religious in its zeal. Those who said If you can't give the client what it asks for, be honest about it; give them good diagnostic information, tell them a better way to formulate the request, etc. But don't pretend the transaction was a success if it wasn't was shouted down most every time. I can't predict, but I'm just hoping that lessons have been learned from the mess that that mentality got us into. --Ray - Original Message - From: Mike Taylor m...@indexdata.com To: CODE4LIB@LISTSERV.ND.EDU Sent: Tuesday, April 28, 2009 10:43 AM Subject: Re: [CODE4LIB] exact title searches with z39.50 Ray Denenberg, Library of Congress writes: The irony is that Z39.50 actually make _much_ more effort to specify semantics than most other standards -- and yet still finds itself in the situation where many implementations do not respond correctly to the BIB-1 attribute 6=3 (completeness=complete field) which is how Eric should be able to do what he wants here. Not that I have any good answers to this problem ... but I DO know that inventing more and more replacement standards it NOT the answer. Everything that's come along since Z39.50 has suffered from exactly the same problem but more so. I think this remains to be seen for SRU/CQL, in particular for the example at hand, how to search for exact title. There are two related issues: one, how arcane the standard is, and two, how closely implementations conform to the intended semantics. And clearly the first has a bearing on the second. And even I would say that Z39.50 is a bit on the arcance side when it comes to formulating a query for exact title. 
With SRU/CQL there is an exact relation ('exact' in 1.1, '==' in 1.2). So I would think there is less excuse for a server to apply a creative interpretation. If it cannot support exact title it should fail the search. IMHO, this is where it breaks down 90% of the time. Servers that can't do what they're asked should say I can't do that, but -- for reasons that seem good at the time -- nearly no server fails requests that it can sort of fulfil. Nine out of ten Z39.50 servers asked to do a whole-field search and which can't do it will instead do a word search, because it's better to give the user SOMETHING. I bet the same is true of SRU servers. (I am as guilty as anyone else, I've written servers like that.) The idea that it's better to give the user SOMETHING might -- might -- have been true when we mostly used Z39.50 servers for interactive sessions. Now that they are mostly used as targets in metasearching, that approach is disastrous. _/|_ ___ /o ) \/ Mike Taylorm...@indexdata.com http://www.miketaylor.org.uk )_v__/\ I try to take one day at a time, but sometimes several days attack me at once -- Ashleigh Brilliant.
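The "be honest about it" behaviour argued for in this exchange can be sketched as follows. Everything here is hypothetical (the capability table, the exception name, the function name); it is not taken from any real Z39.50 or SRU server, and only shows a server refusing an unsupported search instead of substituting a looser one.

```python
class UnsupportedSearch(Exception):
    """Raised instead of silently running a looser search than was requested."""


# Hypothetical capability table: the (index, relation) pairs this
# imaginary server can really execute. Note that "exact" is missing.
SUPPORTED = {("title", "any"), ("title", "all")}


def run_search(index, relation, term):
    """Run the search only if it can be honoured exactly as asked."""
    if (index, relation) not in SUPPORTED:
        # Fail loudly and say what *is* possible, rather than falling
        # back to a word search and reporting success.
        options = sorted(rel for (idx, rel) in SUPPORTED if idx == index)
        raise UnsupportedSearch(
            f"relation {relation!r} is not supported on index {index!r}; "
            f"supported relations: {options}"
        )
    return f"searching {index} {relation} {term!r}"
```

A metasearch client can catch the failure and drop or flag that target, which it cannot do when the server quietly returns word-search hits.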
Re: [CODE4LIB] exact title searches with z39.50
From: Walker, David dwal...@calstate.edu I'm not sure it's a _big_ mess, though, at least for metasearching. I wasn't thinking specifically about metasearch, but rather, bad decisions getting replicated and you end up with an installed base of bad implementations. The best illustration would be the huge mess that HTML is. --Ray
Re: [CODE4LIB] exact title searches with z39.50
From: Jonathan Rochkind rochk...@jhu.edu HTML works out pretty well. If our biggest failures were 'failures' like HTML, we'd be doing pretty well. HTML is a wonderful standard. And I don't mean to take the discussion off-course. My point was simply that because early browsers did not insist on clean html, the proliferation of unclean html has reached the point where, well, whether you consider it a mess or not depends on how much importance you place on clean html. It's important to me. --Ray
Re: [CODE4LIB] Serials Solutions Summon
From: Thomas Dowling tdowl...@ohiolink.edu You can define differences between meta-, federated, and broadcast search, but every discussion on the topic will be punctuated by people asking, Wait, what's the difference again? Leaving aside metasearch and broadcast search (terms invented more recently) it is a shame if federated has really lost its distinction from distributed. Historically, a federated database is one that integrates multiple (autonomous) databases so it is in effect a virtual distributed database, though a single database. I don't think that's a hard concept and I don't think it is a trivial distinction. --Ray
Re: [CODE4LIB] Serials Solutions Summon
From: Jonathan Rochkind rochk...@jhu.edu If you want to reclaim the term federated to mean a local index, I think you have a losing battle in front of you. It's not a battle I plan to pursue, I don't fight battles anymore. I just feel obligated to observe that when vocabulary is tinkered with in this fashion -- and I did notice, probably more than ten years ago, that federated was being manipulated -- it makes it difficult to express modeling concepts when definitions are a moving target. In the future, vendors should be more careful about messing around with established definitions. If you don't like the federated model (the old one), don't redefine the term, find a new term. The old term could someday come in handy for expressing why you don't like that model, but it's useless if nobody agrees what it means. --Ray
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Jonathan Rochkind rochk...@jhu.edu The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs. The URL is alive and well. The W3C definition (http://www.w3.org/TR/uri-clarification/): a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network location), rather than by some other attributes it may have. Thus as we noted, http: is a URI scheme. An http URI is a URL. SRU, for example, considers its request to be a URL. I do think this conversation has played itself out. --Ray
Re: [CODE4LIB] Something completely different
From: Mike Taylor m...@indexdata.com ... anyway, all of this is far, far away from the point. MARC is old and ugly yes; but then so am I, I don't think you're old, Mike. --Ray
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
No, not identical URIs. Let's say I've put a copy of the schema permanently at each of the following locations. http://www.loc.gov/standards/mods/v3/mods-3-3.xsd http://www.acme.com//mods-3-3.xsd http://www.takoma.org/standards/mods-3-3.xsd Three locations, three URIs. But the issue of redirect or even resolution is irrelevant in the use case I'm citing. I'm talking about the use of an identifier within a protocol, for the sole purpose of identifying an object that the recipient of the URI already has - or if it doesn't have it it isn't going to retrieve it, it will just fail the request. The purpose of the identifier is to enable the server to determine whether it has the schema that the client is looking for. (And by the way that should answer Ed's question about a use case.) So the server has some table of schemas, in that table is the row: [mods schema] [ URI identifying the mods schema] It receives the SRU request: http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=dinosaur&maximumRecords=1&recordSchema=URI identifying the mods schema If the URI identifying the MODS schema in the request matches the URI in the table, then the server knows what schema the client wants, and it proceeds. If there are multiple identifiers then it has to have a row in its table for each. Does that make sense? --Ray - Original Message - From: Ross Singer rossfsin...@gmail.com To: CODE4LIB@LISTSERV.ND.EDU Sent: Wednesday, April 01, 2009 2:07 PM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) Ray, you are absolutely right. These would be bad identifiers. But let's say they're all identical (which I think is what you're saying, right?), then this just strengthens the case for indirection through a service like purl.org. Then it doesn't *matter* that all of these are different locations, there is one URI that represents the concept of what is being kept at these locations. 
At the end of the redirect can be some sort of 300 response that lets the client pick which endpoint is right for them -or arbitrarily chooses one for them. -Ross. On Wed, Apr 1, 2009 at 1:59 PM, Ray Denenberg, Library of Congress r...@loc.gov wrote: We do just fine minting our URIs at LC, Andy. But we do appreciate your concern. The analysis of our MODS URIs misses the point, I'm afraid. Let's forget the set I cited (bad example) and assume that the schema is replicated at several locations (geographically dispersed) all of which are planned to house the specific version permanently. The suggestion to designate one as cannonical is a good suggestion but it isn't always possible (for various reasons, possibly political). So I maintain that in this scenario you have several *location* none of which serves well as an identifier. I'm not arguing (here) that info is better than http (for this scenario) just that these are not good identifiers. --Ray - Original Message - From: Houghton,Andrew hough...@oclc.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Wednesday, April 01, 2009 1:21 PM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 1:06 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) The general convention is that http://; is a web address, a location. I realize that it's also a form of URI, but that's a minority use of http. This leads to a great deal of confusion. I understand the desire to use domain names as a way to create unique, managed identifiers, but the http part is what is causing us problems. http:// is an HTTP URI, defined by RFC 3986, loosely I will agree that it is a web addresss. However, it is not a location. URIs according to RFC 3986 are just tokens to identify resources. 
These tokens, e.g., URIs are presented to protocol mechanisms as part of the dereferencing process to locate and retrieve a representation of the resource. People see http: and assume that it means the HTTP protocol so it must be a locator. Whoever initially registered the HTTP URI scheme could have used web as the token instead and we would all be doing: web://example.org/. This is the confusion. People don't understand what RFC 3986 is saying. It makes no claim that any URI registered scheme has persistence or can be dereferenced. An HTTP URI is just a token to identify some resource, nothing more. Andy.
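Ray's table-of-schemas use case earlier in this thread can be sketched in Python. This is only an illustration under stated assumptions: the table contents and the function name are hypothetical, and the info:srw identifier is the one quoted elsewhere in this discussion. The point it shows is that the server compares the recordSchema value as an opaque string and never dereferences it.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical server-side table: one row per identifier the server
# is willing to recognise for a schema it already holds.
SCHEMA_TABLE = {
    "info:srw/schema/1/mods-v3.3": "mods",
    "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd": "mods",
}


def resolve_record_schema(request_url):
    """Return the server's internal schema name, or None if unrecognised.

    The identifier is matched by plain string comparison; nothing is
    fetched, redirected, or resolved.
    """
    params = parse_qs(urlsplit(request_url).query)
    identifier = params.get("recordSchema", [None])[0]
    return SCHEMA_TABLE.get(identifier)


request = (
    "http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve"
    "&query=dinosaur&maximumRecords=1&recordSchema=info:srw/schema/1/mods-v3.3"
)
print(resolve_record_schema(request))
```

A request naming a location that is not in the table gets None, at which point the server would fail the request, which is why every acceptable identifier needs its own row.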
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it. It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. Here is some of my historical perspective (which may well differ from others'). Back in the old days, URIs (or URLs) were protocol based. The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on. Some of you may remember the ZIG (Z39.50 Implementors Group) back when we developed the z39.50 URI scheme, which was around 1995. Most of us were not wise to the ways of the web that long ago, but we were told, by those who were, that z39.50r: and z39.50s: at the beginning of a URL are explicit indications that the URI is to be resolved by Z39.50. A few years later the semantic web was conceived and alot of SW people began coining all manner of http URIs that had nothing to do with the http protocol. By the time the rest of the world noticed, there were so many that it was too late to turn back. So instead, history was altered. The company line became we never told you that the URI scheme was tied to a protocol. Instead, they should have bit the bullet and coined a new scheme. They didn't, and that's why we're in the mess we're in. --Ray - Original Message - From: Houghton,Andrew hough...@oclc.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Thursday, April 02, 2009 9:41 AM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 2:26 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) This really puzzles me, because I thought http referred to a protocol: hypertext transfer protocol. 
And when you put http://; in front of something you are indicating that you are sending the following string along to be processed by that protocol. It implies a certain application over the web, just as mailto:; implies a particular application. Yes, http is the URI for the hypertext transfer protocol. That doesn't negate the fact that it indicates a protocol. RFC 3986 (URI generic syntax) says that http: is a URI scheme not a protocol. Just because it says http people make all kinds of assumptions about type of use, persistence, resolvability, etc. As I indicated in a prior message, whoever registered the http URI scheme could have easily used the token web: instead of http:. All the URI scheme in RFC 3986 does is indicate what the syntax of the rest of the URI will look like. That's all. You give an excellent example: mailto. The mailto URI scheme does not imply a particular application. It is a URI scheme with a specific syntax. That URI is often resolved with the SMTP (mail) protocol. Whoever registered the mailto URI scheme could have specified the token as smtp: instead of mailto:;. My reading of Cool URIs is that they use the protocol, not just the URI. If they weren't intended to take advantage of http then W3C would have used something else as a URI. Read through the Cool URIs document and it's not about identifiers, it's all about using the *protocol* in service of identifying. Why use http? I'm assuming here when you say My reading of Cool URIs... means reading the Cool URIs for the Semantic Web document and not the Cool URIs Don't Change document. The Cool URIs for the Semantic Web document is about linked data. Tim Burners-Lee's four linked data priciples state: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information. 4. Include links to other URIs. so that they can discover more things. (2) is an important aspect to linking. 
The Web is a hypertext based system that uses HTTP URIs to identify resources. If you want to link, then you need to use HTTP URIs. There is only one protocol, today, that accepts HTTP URIs as currency and its appropriately called HTTP and defined by RFC 2616. The Cool URIs for the Semantic Web document describes how an HTTP protocol implementation (of RFC 2616) should respond to a dereference of an HTTP URI. Its important to understand the URIs are just tokens that *can* be presented to a protocol for resolution. Its up to the protocol to define the currency that it will accept, e.g., HTTP URIs, and its up to an implementation of the protocol to define the tokens of that currency that it will accept. It just so happens that HTTP URIs are accepted by the HTTP protocol, but in the case of mailto URIs they are accepted by the SMTP protocol. However, it is important to note that a HTTP user agent, e.g., a browser, accepts both HTTP and mailto URIs. It decides that it should send the mailto URI to an SMTP user agent, e.g., Outlook, Thunderbird, etc.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Houghton,Andrew hough...@oclc.org The point being that: urn:doi:* info:doi:* provide no advantages over: http://doi.org/* I think they do. I realize this is pretty much a dead-end debate as everyone has dug themselves into a position and nobody is going to change their mind. It is a philosophical debate and there isn't a right answer, so what follows is my opinion. I won't use the doi example because it's overloaded. Let's talk about the hypothetical sudoc. I think info:sudoc/xyz provides an advantage over: http://sudoc.org/xyz if the latter is not going to resolve. Why? Because it drives me nuts to see http URIs everywhere that give all appearances of resolvability - browsers, editors, etc. turn them into clickable links. Now, if you are setting up a resolution service where you get the document that the sudoc identifies when you click on the URI, then http is appropriate. The *actual document*. Not a description of it in lieu of the document. And the so-called architectural justification that it's ok to return metadata instead of the resource (representation) -- I don't buy it. --Ray
Re: [CODE4LIB] registering info: uris?
From: Jonathan Rochkind rochk...@jhu.edu There are all sorts of useful identifiers I use in my work every day that can not be automatically dereferenced. Even more to the point: there is no sound definition of dereference. To dereference a resource means to retrieve a representation of it. There has never been any agreement within the w3c of what constitutes a representation. --Ray
Re: [CODE4LIB] resolution and identification
A concrete example. The MODS schema, version 3.3, has an info identifier, for SRU purposes: info:srw/schema/1/mods-v3.3 So in an SRU request you can say recordSchema=info:srw/schema/1/mods-v3.3 Meaning you want records returned in the mods version 3.3 schema. And that's really the purpose of the schema identifier. Both the client and server know the schema by this identifier - or the server doesn't know it at all and the request fails - but nobody wants to resolve the identifier. Now in contrast, the schema is at http://www.loc.gov/standards/mods/v3/mods-3-3.xsd And it's also at: http://www.loc.gov/mods/v3/mods-3-3.xsd And also: http://www.loc.gov/mods/mods.xsd And: http://www.loc.gov/standards/mods/mods.xsd And: http://www.loc.gov/standards/mods/v3/mods.xsd So there you have five http identifiers for the schema. Which is the better identifier for this purpose? The single info identifier, or a choice of http identifiers, one for every possible location where the schema may reside (which is more than these five)? If the answer is that it's better to use one of the http identifiers, how do you know that the one you pick is the one that the server recognizes it by? Or should the server maintain a list of all possible locations? --Ray - Original Message - From: Ross Singer rossfsin...@gmail.com To: CODE4LIB@LISTSERV.ND.EDU Sent: Wednesday, April 01, 2009 12:26 PM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote: But shouldn't we be able to know the difference between an identifier and a locator? Isn't that the problem here? That you don't know which it is if it starts with http://. But you do if it starts with http://dx.doi.org I still don't see the difference. The same logic that would be required to parse and understand the info: uri scheme could be used to apply towards an http uri scheme. -Ross.
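On the client side, the SRU request Ray describes could be assembled like this. It is a minimal sketch, not a full SRU client: the helper name is made up, the base URL and parameter values are the ones quoted in this thread, and urlencode percent-encodes the colon and slashes in the info identifier, which a server decodes before comparing.

```python
from urllib.parse import urlencode


def build_sru_request(base_url, query, record_schema, maximum_records=1):
    """Assemble an SRU 1.1 searchRetrieve URL (helper name is hypothetical)."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,
        "maximumRecords": maximum_records,
        # The schema identifier is passed as an opaque string; the
        # client never dereferences it.
        "recordSchema": record_schema,
    }
    return base_url + "?" + urlencode(params)


url = build_sru_request(
    "http://z3950.loc.gov:7090/voyager",
    "dinosaur",
    "info:srw/schema/1/mods-v3.3",
)
print(url)
```

With five candidate http locations instead of the single info identifier, the client would have to guess which string to put in recordSchema, which is the question the message poses.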
Re: [CODE4LIB] registering info: uris?
From: Erik Hetzner erik.hetz...@ucop.edu I believe that registering a domain would be less work than going through an info URI registration process, but I don’t know how difficult the info URI registration process would be (thus bringing the conversation full circle). [1] Leaving aside religious issues I just want to be sure we're clear on one point: the work required for the info URI process is exactly the amount of work required, no more no less. It forces you to specify clear syntax and semantics, normalization (if applicable), etc. If you go a different route because it's less work, then you're probably avoiding doing work that needs to be done. --Ray
Re: [CODE4LIB] registering info: uris?
From: Ross Singer rossfsin...@gmail.com nobody gives a damn about info:uris outside of libraries, Nor do people outside of libraries care about identifiers. --Ray
Re: [CODE4LIB] registering info: uris?
From: Hilmar Lapp hl...@duke.edu Nor do people outside of libraries care about identifiers. You might be surprised: http://www.lsrn.org/ Yes, I overstated, let me rephrase. There are communities who are interested in specific object classes and want identifier schemes for them. For libraries there are books, articles, journals, and many others. And certainly this isn't limited to libraries, for example many scientific disciplines have a similar interest in identifier schemes for objects in specific object classes. But the term identifier has taken on a whole new meaning with the web. It has now been generalized to identify any resource, and we don't even have a clear definition of resource, aside from the convoluted "anything that can be identified". The discussions on this are often a convoluted mess, and it's no wonder location and identity get confused. And because of all the emphasis on solving this part of the web architecture - which hasn't been accomplished, and there is debate within the W3C whether it is even possible - the original concept of identifier seems to be lost, aside from within the communities I alluded to above. And it is for those communities that the info URI is useful. Now as to my reference to religious issues, a statement like Having unresolvable URIs is anti-Web would be better stated as: Having unresolvable URIs IN MY OPINION is anti-Web. It is an opinion, not a fact. Stating it as fact is dogmatic. It is a reasonable opinion, however, my opinion: Having unresolvable URIs IN MY OPINION is PRO-Web is just as reasonable. I needn't go into further detail, we've beaten this to death already. --Ray
Re: [CODE4LIB] registering info: uris?
Pointing to the documentation and saying "one of these" isn't going to work, I'm afraid. Most important is to make sure that the syntax is consistent with URI syntax. Where the syntax of the identifier you're representing is potentially at odds with URI syntax, you might have to make adjustments, like percent-encoding. So if you're going to register sudoc, you're going to have to understand the syntax to some degree, there's really no way around it. (I didn't know the lccn syntax, registering it forced me to learn it, and I'm a better man for it.) I don't know much about SuDoc, and most everything seems to point to http://www.gpo.gov/su_docs/fdlp/pubs/explain.html which doesn't really explain their syntax. (Though if you look a bit harder maybe you'll find something better.) But I see this example: Y 3.C 76/3:2 K 54 That's apparently a sudoc. It immediately raises the following flags: spaces, slash, colon, and case (sensitivity). For your purposes I don't think that colon or slash is a problem. (They become a problem when you are using them as special characters for delimitation, but you're not doing that.) Spaces, though, have to be percent-encoded. (That simply means replace each occurrence of a space with %20.) You also need to look at case-sensitivity. If sudocs are case-sensitive, no problem, if not, then you may want to normalize to either upper or lower case. There may not be any normalization issues (other than case sensitivity, if that). Normalization is an issue only if a particular sudoc can be represented by more than one string. If so you have two choices: 1. prescribe a canonical form (which is the approach we took for LCCNs). 2. simply describe the rules for determining when two strings represent the same sudoc (there is no rule that says that two different info URIs can't refer to the same resource). You can contact me privately if you have problems. No, sorry, I don't know anyone at GPO. I worked the graveyard shift there part time during college. 
(I had to load mailing machines with junk mail. Several junk items loaded into a machine which would combine them into one mailing item. The machine would jam about every tenth time. Worst job I ever had.) But that was many years ago and that's the last contact I've had with GPO. Good luck. -Ray - Original Message - From: Jonathan Rochkind rochk...@jhu.edu To: CODE4LIB@LISTSERV.ND.EDU Sent: Friday, March 27, 2009 3:36 PM Subject: Re: [CODE4LIB] registering info: uris? Thanks Ray. Oh boy, I don't know enough about SuDoc to describe the syntax rules fully. I can spend some more time with the SuDoc documentation (written for a pre-computer era) and try to figure it out, or do the best I can. I mean, the info registration can clearly point to the existing SuDoc documentation and say one of these -- but actually describing the syntax formally may or may not be possible/easy/possible-for-me-personally. I can't even tell if normalization would be required or not. I don't think so. I think SuDocs don't suffer from that problem LCCNs did to require normalization, I think they already have consistent form, but I'm not certain. I'll see what I can do with it. But Ray, you work for 'the government'. Do you have a relationship with a counter-part at GPO that might be interested in getting involved with this? Jonathan Ray Denenberg, Library of Congress wrote: It's a fairly straightforward process, See: http://info-uri.info/registry/register.html You should look at a few examples first, go to http://info-uri.info/registry/ and click on a few of those listed in the left column. I think registering one for SuDocs would be fairly easy. The info folks are most concerned that the syntax rules are well-described. 
I had registered a few of these before they started cracking the whip on that (and rightly so), and when I registered info:lc it became more difficult; you might want to look at that for an example: http://info-uri.info/registry/OAIHandler?verb=GetRecordmetadataPrefix=regidentifier=info:lc/ Also, normalization - I suggested looking at info:lccn normalization rules: http://info-uri.info/registry/OAIHandler?verb=GetRecordmetadataPrefix=regidentifier=info:lccn/ --Ray - Original Message - From: Jonathan Rochkind rochk...@jhu.edu To: CODE4LIB@LISTSERV.ND.EDU Sent: Friday, March 27, 2009 3:12 PM Subject: [CODE4LIB] registering info: uris? Does anyone know the process for registering a sub-scheme for info: uris? I'd like to have one for SuDoc classification numbers, info:sudoc/. I'm not sure if I can register that on my own, without working with the US Government Printing Office, who actually maintains sudocs. But if I have to get GPO to do it, I'll probably give up quicker (unless it turns out easier than I thought to find the right person at GPO and get them to sign on -- I doubt it!). Or if the registration process is really long
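The adjustments Ray describes for a SuDoc number (percent-encode the spaces, leave slash and colon alone, optionally normalize case) might look like this in Python. This is a sketch under assumptions: info:sudoc/ is the namespace proposed in this thread and was not registered at the time, the function name is hypothetical, and whether SuDocs are case-insensitive is left as a flag because the thread does not settle it.

```python
from urllib.parse import quote


def sudoc_to_info_uri(sudoc, case_insensitive=False):
    """Turn a SuDoc classification number into a candidate info: URI.

    Assumes the (unregistered) info:sudoc/ namespace discussed in this
    thread. Upper-casing is applied only if SuDocs turn out to be
    case-insensitive, which is an open question here.
    """
    value = sudoc.upper() if case_insensitive else sudoc
    # Keep "/" and ":" literal, since they are not used as delimiters
    # here; spaces become %20.
    return "info:sudoc/" + quote(value, safe="/:")


print(sudoc_to_info_uri("Y 3.C 76/3:2 K 54"))
```

If a SuDoc can be written in more than one way, a canonical form or a set of equivalence rules would still have to be written down in the registration, as described in the message.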
Re: [CODE4LIB] registering info: uris?
Correct me if I'm wrong but isn't the point of all this to be able to put the URI in an OpenURL? And info was invented (in part) to avoid putting http URIs in OpenURLs (because they are complicated enough already, why clutter them further). So I don't see that pursuing an http solution to this is very useful. --Ray - Original Message - From: Houghton,Andrew hough...@oclc.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Friday, March 27, 2009 5:24 PM Subject: Re: [CODE4LIB] registering info: uris? From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Friday, March 27, 2009 5:18 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] registering info: uris? I am not interested in maintaining a sudoc.info registration, and neither is my institution, who I wouldn't trust to maintain it (even to the extent of not letting the DNS registration expire) after I left. BTW, you could always use http://purl.org/ and later if you wanted to have it resolve to something just change the PURL.
Re: [CODE4LIB] MIME Type for MARC, Mods, etc.?
Sorry for the confusion over SRU, and I'm afraid this takes us way off-topic, but since you asked. I meant the SRU *response format*. And even that doesn't make sense, not in the context of the current SRU spec. But in the next version, 2.0, which we are now developing within OASIS, the response can take on different formats, subject (possibly) to content negotiation. For example the response can be packaged in ATOM, or RSS, or the default SRU schema, and it is the latter that we are registering. --Ray - Original Message - From: Jonathan Rochkind rochk...@jhu.edu To: CODE4LIB@LISTSERV.ND.EDU Sent: Friday, February 13, 2009 12:02 PM Subject: Re: [CODE4LIB] MIME Type for MARC, Mods, etc.? Thanks Ray. marcxml+xml makes sense to me, the name is really arbitrary, so long as we have _something_ that represents MARC-XML. Glad you are working on this. I'm confused about your suggestion of registering a content type for SRU. My understanding is that SRU is a _protocol_, not a media type? Unless you mean registering a type for the SRU explain document? In general, with my understanding of SRU and of the purpose of internet content types, it doesn't seem to make sense to me to register a content type for SRU the protocol. Media types must function as an actual media format: Registration of things that are better thought of as a transfer encoding, as a character set, or as a collection of separate entities of another type, is not allowed. Is SRU a media/document type, or is it a communications protocol using a collection of separate document types? Jonathan Ray Denenberg, Library of Congress wrote: A few points: 1. x- is commonly used in cases when an application for a mime type is pending, and when there is a reasonable expectation that it will be approved. The mime type is prefixed with x- until the requested mime type becomes official, after which the x- is dropped. 2. We will be registering MODS and MARCXML: - application/mods+xml - application/marcxml+xml 3. 
The reason one uses (or doesn't use) +xml is made very clear in one of the relevant RFCs (I don't have the number at the moment): the application consuming the content is supposed to recognize the mime type and process it accordingly, however, in the event that it does not recognize the mime type, the +xml signals at least that the content is xml, and so there is a possibility that it might do something useful with it, even though it cannot process it according to mime type - it may be able to parse the XML and present something readable to the user. Even better, consider the case where it is a protocol response, for example SRU, where we are registering application/sru+xml, there might be an accompanying stylesheet url, and the client can then format a complete sru response without knowing that it did so. The reason is NOT, as some have suggested, to distinguish mods+xml from mods+xyz where xyz is some alternative syntax. However, because of the confusion, we would register marcxml as marcxml+xml (even though it sounds funny) rather than marc+xml, because of all the confusion that the latter name would cause. --Ray - Original Message - From: Jonathan Rochkind rochk...@jhu.edu To: CODE4LIB@LISTSERV.ND.EDU Sent: Thursday, February 12, 2009 5:21 PM Subject: Re: [CODE4LIB] MIME Type for MARC, Mods, etc.? Actually, re-reading some of the RFCs, I would clarify one thing. It seems like using unregistered x- MIME type is discouraged, and instead you are encouraged to use what is (claimed to be) a very quick and easy and painless process of registering vnd. types. So I'd encourage LC to investigate doing that for MARC, while waiting for someone to have time to do an actual (more time consuming) application/marc+xml registration. That would give us the benefit of an actual registration (albeit under vnd.) instead of an unregistered x-. 
As far as text/xml, the general consensus on the internet seems to be that it was a mistake, but it's there and no one cares enough to try to somehow remove it, so it _is_ legal, but nobody really encourages using it. One problem with text/xml is that its default char encoding is ascii, while the default char encoding for XML is of course UTF-8. This can very easily lead to confusion and encoding errors unless software is more careful than we know most software has a tendency to be. :) Still, it's legal, but I don't see any reason to encourage its use for MARC. application/xml, sure, but it would be _really_ useful, for the reasons discussed in last week's thread, to have a specific type for marc xml (and mods). If the folks at LC don't understand why, thinking that application/xml is sufficient, I could try to write up a persuasive essay again, or copy and paste from last week's thread. Or is there someone else other than LC who could conceivably fill out an application for application/marc+xml and application
Re: [CODE4LIB] [MODS-EC] FW: [CODE4LIB] MIME Type for MARC, Mods, etc.?
Hello, this thread was recently brought to my attention. (And then it took longer than it should have to get subscribed to this list. And I haven't seen an archive so I don't know if there has been any discussion beyond 2/4. Anyway.) We (LC) decided a year ago to register mime types for both MODS and MARC but regrettably got sidetracked. We're back on track now and it should be done reasonably soon. --Ray Denenberg - Original Message - From: Riley, Jenn jenlr...@indiana.edu To: mods...@listserv.loc.gov Sent: Wednesday, February 04, 2009 6:58 PM Subject: Re: [MODS-EC] FW: [CODE4LIB] MIME Type for MARC, Mods, etc.? Cool! Can you post a response to CODE4LIB? Thanks, Jenn -Original Message- From: MODS Editorial Committee Forum [mailto:mods...@loc.gov] On Behalf Of Ray Denenberg, Library of Congress Sent: Wednesday, February 04, 2009 6:56 PM To: mods...@listserv.loc.gov Subject: Re: [MODS-EC] FW: [CODE4LIB] MIME Type for MARC, Mods, etc.? Application for mime types for MARC and MODS is in the works. It's been slowed down but we're back to moving it along and it should be reasonably soon, I hope. --Ray - Original Message - From: Riley, Jenn jenlr...@indiana.edu To: mods...@listserv.loc.gov Sent: Wednesday, February 04, 2009 6:42 PM Subject: [MODS-EC] FW: [CODE4LIB] MIME Type for MARC, Mods, etc.? Something for us to think about... Jenn -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Wednesday, February 04, 2009 10:59 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MIME Type for MARC, Mods, etc.? You CAN use application/xml for any XML, but it's often useful to have a specific type for your specific content, so the user-agent can know what to do with it. The convention is to include +xml on the end, so if the user agent doesn't know your specific format, it can fall back to treating it as generic XML. 
For instance: application/rss+xml application/atom+xml application/rdf+xml And dozens more you can see at: http://www.iana.org/assignments/media-types/application/ (search for +xml). Thanks to Mark and Ross Singer for pointing out application/marc already exists. (and is on that list above). Awesome. I'm still feeling the need for application/marc+xml, and application/mods+xml Jonathan Ethan Gruber wrote: Correct me if I'm wrong, but wouldn't the mime type for MARC-XML and MODS be application/xml, like every other xml file? As for MARC-binary, I can't say. I don't have any of those files handy. Ethan On Wed, Feb 4, 2009 at 10:47 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I am actually rather shocked that it seems that MARC-XML, MODS, MARC21-binary, do not have registered Internet Content Types (aka MIME types). Am I missing something, or is this really so? Anyone know what the process is for registering such? Anyone want to help try to do that? I guess we'd probably have to talk to the standards organizations for each of those types, rather than doing it independently? Jonathan
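The +xml fallback described in this thread can be sketched roughly as follows. This is only an illustration, not anything LC or a particular user agent implements: the function name pick_handler, the KNOWN_TYPES table, and the handler labels are all hypothetical, and application/mods+xml is used here simply because it is the name proposed above.

```python
# Hypothetical table of media types this imaginary client knows how to
# process specifically. The labels are illustrative only.
KNOWN_TYPES = {
    "application/mods+xml": "mods",
    "application/atom+xml": "atom",
}

# Generic XML types that carry no format-specific information.
GENERIC_XML_TYPES = {"application/xml", "text/xml"}


def pick_handler(content_type: str) -> str:
    """Choose a handler label for a Content-Type header value.

    1. Use a specific handler if the exact media type is known.
    2. Otherwise, if the type ends in +xml (or is a generic XML type),
       fall back to generic XML processing.
    3. Otherwise treat the body as opaque bytes.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type in KNOWN_TYPES:
        return KNOWN_TYPES[media_type]
    if media_type.endswith("+xml") or media_type in GENERIC_XML_TYPES:
        return "generic-xml"
    return "opaque"


if __name__ == "__main__":
    for header in (
        "application/mods+xml; charset=utf-8",
        "application/marcxml+xml",
        "application/marc",
    ):
        print(header, "->", pick_handler(header))
```

In this sketch a client that has never heard of application/marcxml+xml still learns from the suffix that it can parse the body as XML, whereas application/marc (the binary format) gives it nothing to fall back on.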