Re: [CODE4LIB] Representing copyright holder in MODS

2011-06-13 Thread Ray Denenberg, Library of Congress
 From: Mike Taylor
 Any thoughts on how I might use this to express the copyright status of
 the item's abstract?

One way that I have heard discussed (though I don't know if anyone is doing
it) is to represent the abstract as part of a related item (type =
constituent).  The related item could consist of just the abstract and the
copyright statement.  

--Ray
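
A rough sketch of the related-item idea in Python, using only the standard library. The use of accessCondition to carry the copyright statement, the helper names, and the sample text are all assumptions for illustration; nothing in this thread confirms a particular element for the statement.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a local element name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def abstract_with_rights(abstract_text, rights_text):
    """Build a MODS record whose abstract lives in a constituent relatedItem."""
    mods = ET.Element(q("mods"))
    related = ET.SubElement(mods, q("relatedItem"), {"type": "constituent"})
    ET.SubElement(related, q("abstract")).text = abstract_text
    # Assumption: accessCondition is used here to hold the copyright statement.
    rights = ET.SubElement(related, q("accessCondition"))
    rights.text = rights_text
    return mods


record = abstract_with_rights(
    "A hypothetical abstract.",
    "Abstract copyright 2011 Example Publisher (hypothetical).",
)
print(ET.tostring(record, encoding="unicode"))
```

The related item holds only the abstract and its rights statement, so the copyright information cannot be mistaken for a statement about the item as a whole.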


Re: [CODE4LIB] If you were starting over, what would you learn and how would you do it?

2011-05-09 Thread Ray Denenberg, Library of Congress
Along the lines of "oh, you meant THIS profession"

Rotational vs. linear mechanics.


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Nate Vack
 Sent: Friday, May 06, 2011 4:47 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] If you were starting over, what would you learn
 and how would you do it?
 
 On Fri, May 6, 2011 at 2:07 PM, Ceci Land cl...@library.msstate.edu
 wrote:
 
  How would you choose to develop your skills from baby level to
 something useful to the profession?
 
 I'd pretty much follow the plot of Batman Begins as closely as
 possible.
 
 Wait, useful to *this* profession?
 
 -n


Re: [CODE4LIB] mailing list administratativia

2010-10-27 Thread Ray Denenberg, Library of Congress
I think the constraint is that it has to be a rational number. 

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Eric
Hellman
Sent: Wednesday, October 27, 2010 5:58 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] mailing list administratativia

I vote for changing the limit threshold to 

 PI * (eventual length of this meta-thread).

On Oct 27, 2010, at 3:37 PM, Alexander Johannesen wrote:

 On Thu, Oct 28, 2010 at 2:44 AM, Doran, Michael D do...@uta.edu wrote:
 Can that limit threshold be raised?  If so, are there reasons why it
should not be raised?
 
 Is it to throttle spam or something? 50 seems rather low, and it's 
 rather depressing to have a lively discussion throttled like that. Not 
 to mention I thought I was simply kicked out for livening things up 
 (especially given my reasonable follow-up was where the throttling 
 began).
 
 Alex
 --
  Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic 
 Maps
 --- http://shelter.nu/blog/ 
 --
 -- http://www.google.com/profiles/alexander.johannesen 
 ---

Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA

e...@hellman.net
http://go-to-hellman.blogspot.com/
@gluejar


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Ray Denenberg, Library of Congress
It really is possible to make your point without being quite so obnoxious.
Everyone else seems to be able to do so. --Ray

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Alexander Johannesen
Sent: Monday, October 25, 2010 3:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?

Hiya,

On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote:
 Switching to an XML format doesn't help with that at all.

I'm willing to take it further and say that MARCXML was the worst thing the
library world ever did. Some might argue it was a good first step, and that
it was better with something rather than nothing, to which I respond ;

Poppycock!

MARCXML is nothing short of evil. Not only does it go against every
principle of good XML anywhere (don't rely on whitespace, structure over
code, namespace conventions, identity management, document control,
separation of entities and properties, and on and on), it breaks the
ontological commitment that a better treatment of the MARC data could bring,
deterring people from actually a) using the darn thing as anything but a
bare minimal crutch, and b) expanding it to be actually useful and
interesting.

The quicker the library world can get rid of this monstrosity, the better,
although I doubt that will ever happen; it will hang around like a foul
stench for as long as there is MARC in the world. A long time. A long sad
time.

A few extra notes;
   http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html

Can you tell I'm not a fan? :)


Kind regards,

Alex
--
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---


Re: [CODE4LIB] SRU 2.0 / Accept-Ranges (was: Inlining HTTP Headers in URLs )

2010-06-02 Thread Ray Denenberg, Library of Congress
 Joe Hourcle wrote:
Do we have anyone affiliated with the project on this list who can make a
correction before it leaves draft?

Could you submit this suggestion formally?  See:

http://www.oasis-open.org/committees/comments/index.php?wg_abbrev=search-ws

(The SRU and CQL development gets discussed on various lists, which is fine,
but when the discussion leads to suggested changes, it is best if any such
proposals can be formally submitted to OASIS via the above.  Otherwise OASIS
gets angry.) 


--Ray


Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

2010-05-18 Thread Ray Denenberg, Library of Congress
There is no synchronous operation in SRU.   

As for federated search ...

To digress a moment, you may recall -- I believe it was on this list --
there was discussion (maybe a year ago?) of what that even means and whether
it is the same or differs from metasearch, whatever that means.  That
discussion was inconclusive.  Anyway, earlier drafts of SRU 2.0  describe a
metasearch model.  Recently, the committee decided that the terms
metasearch and federated search are undefined jargon.  We now choose to
call it multi-server search.

So to answer the question of whether there is federated search support: yes,
limited support, if by federated search you mean multi server search.   

There is no multi-server support in terms of separate result sets for
different servers.   However,  for (1) faceted search results, and (2)
subquery results, these can be grouped according to server. 

--Ray



-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Kuba
Sent: Tuesday, May 18, 2010 4:07 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

Hi,

Does the current draft include any support for asynchronous operation of the
protocol (either by status notifications and polling and/or streaming), e.g
some chunk of results coming back before others?
Sometime ago I read through an early draft published on the LOC site and it
mentioned support for federated search but it's hard to imagine how could
that be implemented without any async support.


Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

2010-05-18 Thread Ray Denenberg, Library of Congress
On 18 May 2010 15:24, Ray Denenberg, Library of Congress r...@loc.gov
wrote:
 There is no synchronous operation in SRU.

Sorry, meant to say "no asynchronous".

--Ray


Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

2010-05-18 Thread Ray Denenberg, Library of Congress
What advantage do you see in having a concurrent operations feature (like
Z39.50) versus opening several connections?

(Concurrent operations introduced significant complexity into Z39.50 -
including reference ids, operations, etc, and I'm not sure anyone ever
really thought it was worth it.)

--Ray

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Kuba
Sent: Tuesday, May 18, 2010 12:58 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

That is quite unfortunate, as we were looking at SRU 2.0 as a possible
candidate for the front-end protocol for Index Data's pazpar2. The main
problem with federate/broadcast/meta (however you want to call it
;) searching is that the back-end databases are scattered in different
locations or simply slow in their response times and in order to provide
decent user experience you need to be able to present some results sooner
than others. Waiting for the slowest database to respond is usually not an
option.

On Tue, May 18, 2010 at 5:24 PM, Ray Denenberg, Library of Congress
r...@loc.gov wrote:
 On 18 May 2010 15:24, Ray Denenberg, Library of Congress 
 r...@loc.gov
 wrote:
 There is no synchronous operation in SRU.

 Sorry, meant to say "no asynchronous".

 --Ray




-- 

Cheers,
Jakub
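
One way to picture the "several connections" alternative raised earlier in this thread: the client opens one ordinary synchronous request per server and presents whatever comes back first. The sketch below only simulates that. The back ends, their delays, and the function names are hypothetical, and no real SRU request is made.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical back ends: name -> simulated response time in seconds.
BACKENDS = {"fast": 0.05, "medium": 0.15, "slow": 0.3}


def search(name, delay, query):
    """Stand-in for one synchronous search request to one server."""
    time.sleep(delay)
    return name, [f"{name} hit for {query!r}"]


def multi_server_search(query):
    """Yield (server, records) pairs as each server answers."""
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = [pool.submit(search, n, d, query) for n, d in BACKENDS.items()]
        for future in as_completed(futures):
            yield future.result()


arrival_order = [name for name, _ in multi_server_search("fish")]
print(arrival_order)
```

Each individual request stays synchronous; the concurrency lives entirely in the client, so the slowest database delays only its own results.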


Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

2010-05-18 Thread Ray Denenberg, Library of Congress
First, no. There are extensibility features in SRU but nothing that would
help here. 

Actually, Jonathan, what I thought you were suggesting was the creation of a
(I hesitate to say it) metasearch engine. I use that term because it is what
NISO called it, when they started their metasearch initiative five or so
years ago, to create a standard for a metasearch engine, but they got
distracted and the effort really came to nothing.   

The premise of the metasearch engine is that there exists a single-thread
protocol, for example, SRU, and the need is to manage many threads, which is
what the metasearch engine would have done if it had ever been defined. This
is probably not an area for OASIS work, but if someone wanted to revive the
effort in NISO (and put it on the right track) it could be useful. 

--Ray


-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Tuesday, May 18, 2010 2:56 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

Jakub Skoczen wrote:

 I wonder if someone, like Kuba, could design an 'extended async SRU' 
 on top of SRU, that is very SRU like, but builds on top of it to add 
 just enough operations for Kuba's use case area.  I think that's the 
 right way to approach it.
 

 Is there a particular extensibility feature in the protocol that 
 allows for this?
   
I don't know, but that's not what I was suggesting. I was suggesting you
read the SRU spec, and then design your own SRU-async spec, which is
defined as exactly like SRU 2.0, except it also has the following
operations, and is identified in an Explain document like X.

Jonathan


Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

2010-05-17 Thread Ray Denenberg, Library of Congress

Rather, OpenSearch descriptions provide a _different URL template_ for
every response IANA content type available

Yes, of course, that's what I meant, I said it somewhat sloppily, but in many 
of the examples we've looked at, it comes down to the same thing, that the 
different templates differ only in a single (hard-coded) parameter.


--Ray


- Original Message - 
From: Jonathan Rochkind rochk...@jhu.edu

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Monday, May 17, 2010 6:11 PM
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts



Ray Denenberg, Library of Congress wrote:

Ralph will probably be able to articulate this better than I can, but the
accept parameter is driven by the requirement to be able to use OpenSearch
(for example) to query an SRU server. The description document isn't going
to provide templates that allow you to do this via content negotiation, they
provide a parameter instead, to allow the client to tell the server that it
wants, for example, an rss response.


No, they don't.  I am having this same debate with Tony Hammond.

OpenSearch descriptions do NOT provide a parameter to allow the client to 
tell the server what response it wants. They also don't easily provide for 
content negotiation, it is true.
Rather, OpenSearch descriptions provide a _different URL template_ for 
every response IANA content type available.  Here is the example from the 
OpenSearch documentation:


<Url type="application/rss+xml"
     xmlns:example="http://example.com/opensearchextensions/1.0/"
     template="http://example.com?q={searchTerms}&amp;c={example:color?}"/>

Please note that application/rss+xml is an attribute of the URL template 
itself, it is NOT a parameter in the template.


If SRU added an accept parameter to try and make OpenSearch happy, this is 
a big mistake, because it in fact _conflicts_ with OpenSearch desc -- 
to make it available as an actual parameter in the OpenSearch URL 
template.


If on the other hand, you just want to hard-code it in though, that could 
make some sense.


<Url type="application/rss+xml"
     xmlns:example="http://example.com/opensearchextensions/1.0/"
     template="http://my-sru-server.com?q={searchTerms}&amp;accept=application%2Frss"/>

That might make sense, if that's the use case.  But actually trying to 
provide a parameter for the _client_ to fill out in an OpenSearch desc in 
fact _conflicts_ with OpenSearch; for better or for worse, the response 
type is _hard coded_ into the template.


Jonathan
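
A minimal sketch of the client side of that argument, assuming a hypothetical description document: the client never fills in a response type. It picks the Url element whose type attribute matches what it wants and substitutes only the search terms. The server address, the hard-coded accept values, and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

OS_NS = "{http://a9.com/-/spec/opensearch/1.1/}"

# Hypothetical description document with one template per response type.
DESCRIPTION = """\
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Url type="application/rss+xml"
       template="http://example.com/?q={searchTerms}&amp;accept=application%2Frss%2Bxml"/>
  <Url type="application/atom+xml"
       template="http://example.com/?q={searchTerms}&amp;accept=application%2Fatom%2Bxml"/>
</OpenSearchDescription>
"""


def search_url(description_xml, wanted_type, terms):
    """Pick the template whose type attribute matches, then fill in the terms."""
    root = ET.fromstring(description_xml)
    for url in root.findall(f"{OS_NS}Url"):
        if url.get("type") == wanted_type:
            return url.get("template").replace("{searchTerms}", quote(terms))
    raise LookupError(f"no template for {wanted_type}")


print(search_url(DESCRIPTION, "application/rss+xml", "marc xml"))
```

The response type is selected by choosing a template, and the accept value travels along only because the server author hard-coded it there.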




(I suggest, though, that you move further discussion of this to the SRU
list.)

--Ray

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Robert Sanderson
Sent: Monday, May 17, 2010 3:44 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

In today's RESTful world, what's the requirement for the httpAccept
parameter?  Isn't straight content negotiation sufficient rather than
pulling the headers into the URI?
What happens if the accept header and the httpAccept parameter say different
things?


Rob

On Mon, May 17, 2010 at 1:37 PM, LeVan,Ralph le...@oclc.org wrote:

I'd code it.  (I have already coded to it.)  For me, the httpAccept 
parameter and support for content negotiation on responses is a 
wonderful addition to the standard.  It lets us be OpenSearch compliant 
finally.


The virtue of coding to the draft is that there's a chance we can fix 
any problems you encounter.  While we consider the draft stable, that 
doesn't mean everything has been tested in the real world.  I'm 
particularly nervous about the facets support I championed.  I asked for 
it to support users of my SRW server framework who wanted to create an 
interface to SOLR.  Those users disappeared and the usability of the SRU 
interface is untested.


Ralph



-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Monday, May 17, 2010 3:18 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

Wait, I'm so confused. Is SRU 2.0 actually a published standard, or are
you just showing us a work in progress that nobody should be writing
code to yet?

I'm confused because I thought it was just a draft work in progress, but
then you talk about official vs unofficial copies... an unofficial copy
of a draft work in progress that isn't a spec yet anyway?  Very confused.

If I'm planning on writing software to SRU... do you recommend I use the
(until now not publically available so I didn't have a choice)
unofficial SRU 2.0 thing, or is that still just a draft work in
progress nobody should be writing software to yet?


Jonathan

Ray Denenberg, Library of Congress wrote:


For those of you who have recently asked about current OASIS drafts of SRU
(2.0) and CQL ...

The *official* versions reside at OASIS, but because of confusing (and
sometimes

Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas

2010-05-03 Thread Ray Denenberg, Library of Congress

From: Jonathan Rochkind rochk...@jhu.edu
Another question though. I note when looking up schemaInfo... I'm a bit 
confused by the sort attribute.  How could you sort by a schema? What is 
this attribute actually for?



Well indulge me, this is best explained by the current OASIS SRU draft.

(The current and earlier specs don't do a good job here. But for background 
if interested:  sorting as an SRU function was supported in SRU 1.1 and 
taken out of version 1.2, replaced by sorting as a function of the query 
language rather than the protocol. For the OASIS work it's in both.  For the 
current spec at LC, which reflects 1.2, the attribute doesn't even make 
sense. If you go back to the 1.1 archive it does. Still, the OASIS document 
treats it more clearly.)


See http://www.loc.gov/standards/sru/oasis/sru-2-0-draft-most-current.doc 
See section 9.1.


So essentially, when you sort in SRU, you provide an XPath expression.  The 
XPath expression is meaningful in the context of a schema, but the *record 
schema* may not be the most meaningful schema for purposes of sorting, there 
may be another schema more meaningful.  So, you have the capability to 
specify not only a record schema but an auxiliary sort schema.


A given schema that an Explain file lists will usually be one that is used 
as a record schema, but it may also be usable as a sort schema.   That's 
what the sort attribute tells you.


--Ray 


Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas

2010-05-03 Thread Ray Denenberg, Library of Congress

From: Jonathan Rochkind rochk...@jhu.edu

But you will leave sorting as part of CQL too in any changes to CQL specs, 
I hope?  I think CQL has a lot of use even outside of SRU proper, so I 
encourage you to leave its spec not too tightly coupled to SRU.


The OASIS TC firmly supports this approach (and by firmly I mean 100%) so 
the only way this could get changed is via public comment.





I think there are at least three ways to sort as part of (different 
versions of?) SRU now!

1) An actual separate sortKeys query parameter
2) Included in the CQL expression in query, using the sortBy keyword.
3) In draft not finalized, OASIS/SRU 2.0 methods of specifying XPaths for 
sorting.  [Thanks for including the link to the current SRU 2.0 draft, I 
didn't know that was publically available anywhere, it's not really 
googlable].


As you corrected yourself in a subsequent message:

Ah, I think I was wrong below. I must have been looking at different 
versions of the SRU spec without realizing it.


SRU 1.1 includes a sortKeys parameter, and CQL 1.1 does not include a 
sortBy clause.


SRU 1.2 does NOT include a sortKeys parameter, and CQL 1.2 does include 
a sortBy clause.


Yes, that's correct.


Do I have this right?  As SRU 1.2 is the only actual spec I have to work 
with... am I right that either top-level sortKeys, or embedded in CQL 
with sortBy would both be legal in SRU 1.2?

No. Legal in 2.0 - the OASIS version, not legal in 1.2.   In 1.2 it is not 
legal to have a sort parameter in the request.


OASIS is standardizing SRU and CQL loosely coupled; that is, SRU can use 
other query languages and CQL may be invoked by other protocols, but they 
are generally oriented towards being used together.   But since SRU may be 
used with a query language that might not have sort capability, the TC felt 
it necessary to include sorting as part of the protocol. Conversely since 
CQL may be used by a protocol that doesn't support sorting, similarly CQL 
should support sorting. There is a section in the draft standard that 
discusses what to do if a request has conflicting sort specifications.


--Ray
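
To make the 1.1 versus 1.2 difference concrete, here is a small sketch that builds one request of each kind. The server address is hypothetical, and the exact sortKeys value shown is an assumption about the 1.1 key syntax rather than something taken from this thread; check it against the 1.1 archive before relying on it.

```python
from urllib.parse import urlencode

BASE = "http://sru.example.org/db"  # hypothetical server


def sru_1_1_url(query, sort_keys):
    """SRU 1.1: sorting is a separate request parameter."""
    return BASE + "?" + urlencode({
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,
        "sortKeys": sort_keys,  # assumed key syntax, for illustration only
    })


def sru_1_2_url(query, sort_by):
    """SRU 1.2: sorting is a sortBy clause inside the CQL query itself."""
    return BASE + "?" + urlencode({
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": f"{query} sortBy {sort_by}",
    })


print(sru_1_1_url("dc.title = fish", "title,dc"))
print(sru_1_2_url("dc.title = fish", "dc.date/sort.descending"))
```

In the 1.2 form the protocol carries no sort parameter at all; the sort travels inside the query string, which is why CQL can keep its sorting when used outside SRU.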


Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas

2010-04-30 Thread Ray Denenberg, Library of Congress

schemaInfo is what you're looking for I think.

Look at http://z3950.loc.gov:7090/voyager.

Line 74, for example,
<schemaInfo>
<schema identifier="info:srw/schema/1/marcxml-v1.1" sort="false"
name="marcxml">
<title>MARCXML</title>
</schema>


Is this what you're looking for?

--Ray
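
For anyone who wants to pull this out of an Explain record programmatically, a minimal sketch follows. The fragment is simplified: a real Explain record is namespaced and much longer, and the second schema entry with sort="true" is invented purely to show both cases.

```python
import xml.etree.ElementTree as ET

# Simplified, un-namespaced fragment. The dc entry is a made-up example.
EXPLAIN_FRAGMENT = """\
<schemaInfo>
  <schema identifier="info:srw/schema/1/marcxml-v1.1" sort="false" name="marcxml">
    <title>MARCXML</title>
  </schema>
  <schema identifier="info:srw/schema/1/dc-v1.1" sort="true" name="dc">
    <title>Dublin Core</title>
  </schema>
</schemaInfo>
"""


def list_schemas(fragment):
    """Return (short name, identifier, usable-for-sorting) for each schema."""
    root = ET.fromstring(fragment)
    return [
        (s.get("name"), s.get("identifier"), s.get("sort") == "true")
        for s in root.iter("schema")
    ]


for name, identifier, sortable in list_schemas(EXPLAIN_FRAGMENT):
    print(name, identifier, "sort schema" if sortable else "record schema only")
```

The identifier is the stable part; the short name is whatever this particular server chose to call the schema.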


- Original Message - 
From: Jonathan Rochkind rochk...@jhu.edu

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, April 30, 2010 3:57 PM
Subject: [CODE4LIB] SRU/ZeeRex explain question : record schemas



This page:
http://www.loc.gov/standards/sru/resources/schemas.html

says:

The Explain document lists the XML schemas for a given database in which 
records may be transferred. Every schema is unambiguously identified by a 
URI and a server may assign a short name, which may or may not be the same 
as the short name listed in the table below (and may differ from the short 
name that another server assigns).



But perusing the SRU/ZeeRex Explain documentation I've been able to find, 
I've been unable to find WHERE in the Explain document this information is 
listed/advertised.


Can anyone clue me in? 


Re: [CODE4LIB] Code4Lib Midwest?

2010-03-04 Thread Ray Denenberg, Library of Congress
If you're going to host it at Notre Dame I would expect an agenda something 
like this (do it on Saturday, 9/11)


1. meet & greet (mid-morning)
2. Tailgate (noon)
3. Notre Dame Vs. Michigan (3:30)
4. presentation to library staff
5. hack session
6. go home

Can you get us in to the game, Eric?  I'd come.

--Ray

On Thu, Mar 4, 2010 at 10:38 AM, Eric Lease Morgan emor...@nd.edu wrote:


If we were to host something here at Notre Dame, then I imagine something 
like the following agenda possible starting in the mid-morning:


1. meet & greet
2. share demonstrations
3. eat lunch
4. give a presentation to library staff
5. have a hack session
6. socialize in the evening
7. go home the next day

There are plenty of inexpensive hotels in South Bend because of all of our 
football games.


--
Eric Lease Morgan
University of Notre Dame

(574) 631-8604





--


Re: [CODE4LIB] XML schemas question

2009-07-27 Thread Ray Denenberg, Library of Congress
Any element that is not a child of another.   Or another way to look at it, 
any element that can be referenced by ref=.


In the following schema:

__
<xs:schema>
<xs:element name="a" type="aType"/>
<xs:element name="b" type="bType"/>
<!-- -->
<xs:complexType name="aType">
<xs:element ref="a"/>
<xs:element name="c" type="xs:string"/>
</xs:complexType>
<!-- -->
<xs:complexType name="bType">
<xs:element ref="b"/>
<xs:element name="c" type="xs:string"/>
</xs:complexType>
</xs:schema>
__

a and b can be roots.  c cannot.

--Ray
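
The same rule can be checked mechanically: elements declared as direct children of xs:schema are the root candidates, and elements declared inside a type are not. The sketch below uses a simplified version of the schema above, with the nested ref elements left out and an xs:sequence wrapper and namespace declaration added so that it parses as a schema document. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a" type="aType"/>
  <xs:element name="b" type="bType"/>
  <xs:complexType name="aType">
    <xs:sequence>
      <xs:element name="c" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="bType">
    <xs:sequence>
      <xs:element name="c" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def root_candidates(schema_xml):
    """Names of elements declared as direct children of xs:schema."""
    schema = ET.fromstring(schema_xml)
    return [e.get("name") for e in schema.findall(f"{XS}element")]


def local_elements(schema_xml):
    """Names of elements declared anywhere below the top level."""
    schema = ET.fromstring(schema_xml)
    top = schema.findall(f"{XS}element")
    names = {
        e.get("name")
        for e in schema.iter(f"{XS}element")
        if all(e is not t for t in top) and e.get("name")
    }
    return sorted(names)


print(root_candidates(SCHEMA))
print(local_elements(SCHEMA))
```

This only reads the declarations; it does not validate anything, so it answers "which elements could be a root" rather than "is this document valid".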



- Original Message - 
From: Jonathan Rochkind rochk...@jhu.edu

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Monday, July 27, 2009 11:35 AM
Subject: [CODE4LIB] XML schemas question



Anyone familiar with XML schemas (.xsd)?

Can you help me figure something out. Is there something in the schema 
that specifies what elements can serve as the 'root node'... or is any 
element described in the schema avaialable for use as a 'root node', and 
it'll still validate?


Thanks for any tips.

Jonathan 


Re: [CODE4LIB] Open, public standards v. pay per view standards and usage

2009-07-16 Thread Ray Denenberg, Library of Congress
I am not even remotely suggesting that anyone would implement the holdings 
standard with nothing but the schema.   We're working on a solution to this.


--Ray


- Original Message - 
From: Houghton,Andrew hough...@oclc.org

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Thursday, July 16, 2009 11:26 AM
Subject: Re: [CODE4LIB] Open, public standards v. pay per view standards and 
usage




From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Ross Singer
Sent: Thursday, July 16, 2009 11:07 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Open, public standards v. pay per view
standards and usage

On Wed, Jul 15, 2009 at 8:57 AM, Ray Denenberg, Library of
Congressr...@loc.gov wrote:

 Ross, if you're talking about the ISO 20775 xml schema:
 http://www.loc.gov/standards/iso20775/ISOholdings_V1.0.xsd

 It's free.

It's also not a spec, it's a schema.  If the expectation is that
people are actually going to adopt a standard from merely looking at
an .xsd, my prediction is that this will go nowhere.

I mean, I'm wrong a lot, but I feel pretty good about this reading
from my crystal ball.


Not saying you're wrong Ross, but it depends.  People adopted MARC-XML
by looking at the .xsd without an actual specification.  Granted it's
not a complicated schema however, and there already existed the MARC 21
Specifications for Record Structure, Character Sets, and Exchange Media
so it wasn't a big leap to adopt MARC-XML, IMHO.

Generally I agree with your conclusion Ross. It's difficult for people
to just pick up an .xsd and understand what the semantics are for each
element and attribute in the schema and which element(s) should be used
for the document element.  This is mitigated by annotations in the .xsd
for the elements and attributes and also mitigated by using the Russian
doll schema approach, that MARC-XML uses, so it's clear what elements
can be used for the document element.  Also tools like XMLSpy that
provide a graphical representation of the .xsd can provide insights
into how the schema should be used.

But these are a lot of if this and that was done, and you have appropriate
tools.  A freely available specification detailing each element and
attribute along with their semantics is much better for understanding a
schema than the schema itself, but obviously the schema is the definitive
authority when it comes to generating conforming instance documents.


Andy. 


Re: [CODE4LIB] Open, public standards v. pay per view standards and usage

2009-07-15 Thread Ray Denenberg, Library of Congress

From: Ross Singer rossfsin...@gmail.com

Well, it's not a great example, because I don't have a
'counter-example', but I think it will remain to be seen if ISO 20775
goes anywhere if it, too, remains behind a pay wall.  If an open spec
were to come along that allowed the transfer of holdings and
availability information that was decent and simple it would basically
render ISO 20775 irrelevant (if the pay wall doesn't already).


Ross, if you're talking about the ISO 20775 xml schema:
http://www.loc.gov/standards/iso20775/ISOholdings_V1.0.xsd

It's free. 


--Ray


Re: [CODE4LIB] WARC file format now ISO standard

2009-06-02 Thread Ray Denenberg, Library of Congress
But you have to pay $200 for the document that lists changes from last draft 
to first official version.


(Ok, Ok, it was just a joke. But you do get the point.)


- Original Message - 
From: st...@archive.org st...@archive.org

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Tuesday, June 02, 2009 5:18 PM
Subject: Re: [CODE4LIB] WARC file format now ISO standard



hi Karen,

understood.

the final draft of the spec is available here:
http://www.scribd.com/doc/4303719/WARC-ISO-28500-final-draft-v018-Zentveld-080618

and other (similar) versions here:
http://archive-access.sourceforge.net/warc/


/st...@archive.org



On 6/2/09 2:15 PM, Karen Coyle wrote:
Unfortunately, being an ISO standard, to obtain it costs 118 CHF (about 
$110 USD). Hard to follow a standard you can't afford to read. Is there 
an online version somewhere?


kc

st...@archive.org wrote:

hi code4lib,

if you're archiving web content, please use the WARC format.

thanks,
/st...@archive.org



WARC File Format Published as an International Standard
http://netpreserve.org/press/pr20090601.php

ISO 28500:2009 specifies the WARC file format:

* to store both the payload content and control information from
  mainstream Internet application layer protocols, such as the
  Hypertext Transfer Protocol (HTTP), Domain Name System (DNS),
  and File Transfer Protocol (FTP);
* to store arbitrary metadata linked to other stored data
  (e.g. subject classifier, discovered language, encoding);
* to support data compression and maintain data record integrity;
* to store all control information from the harvesting protocol
  (e.g. request headers), not just response information;
* to store the results of data transformations linked to other
  stored data;
* to store a duplicate detection event linked to other stored
  data (to reduce storage in the presence of identical or
  substantially similar resources);
* to be extended without disruption to existing functionality;
* to support handling of overly long records by truncation or
  segmentation, where desired.


more info here:
http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml






Re: [CODE4LIB] exploiting z39.50

2009-05-08 Thread Ray Denenberg, Library of Congress

From: Eric Lease Morgan emor...@nd.edu

  1. What MARC field/subfield might I put this string?
  2. How would I go about getting the string indexed?
  3. How might I go about querying the server for records with this 
string?


I can at least talk about the third question.  There was work on a marc 
attribute set, though not completed.  If you look at the oid register at 
http://www.loc.gov/z3950/agency/defns/oids.html you'll see that the latest 
work on it (second draft) was in 2000, 
http://www.nlc-bnc.ca/iso/z3950/MARC_attribute_set_2.doc. So if someone 
actually wanted to put it to use it would have to be completed.


For SRU there is a complete marc context set, 
http://www.loc.gov/standards/sru/resources/marc-context-set.html.


--Ray


Re: [CODE4LIB] registering info: uris?

2009-04-30 Thread Ray Denenberg, Library of Congress

From: Ross Singer rossfsin...@gmail.com

Except that OpenURL and SRU /already use different info URIs to
describe the same things/.

info:srw/schema/1/marcxml-v1.1

info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0

info:ofi/fmt:xml:xsd:onix

What is the rationale for this?


None.  (Or, whatever rationale there was, historically, should no longer 
apply.)  These should be aligned.   Post this to the OpenURL list (and 
perhaps SRU as well).  I'm certainly willing to work to come up with a 
solution.


--Ray


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Ray Denenberg, Library of Congress
Thanks, Ross. For SRU, this is an opportune time to reconcile these 
differences.  Opportune, because we are approaching standardization of 
SRU/CQL within OASIS, and there will be a number of areas that need to 
change.


Some observations.

1. the 'ofi' namespace of 'info' has the advantage that the name, ofi, 
isn't necessarily tied to a community or application (I suppose one could 
claim that  the acronym ofi means openURL something starting with 'f' 
for Identifiers  but it doesn't say so anywhere that I can find.)  However, 
the namespace itself (if not the name) is tied to OpenURL.  Namespace of 
Registry Identifiers used by the NISO OpenURL Framework Registry.  That 
seems like a simple problem to fix.  (Changing  that title would not cause 
any technical problems. )


2. In contrast,  with the srw namespace,  the actual name is srw. So at 
least in name, it is tied to an application.


3. On the other side, the srw namespace has the distinct advantage of 
built-in extensibility.  For the URI: info:srw/schema/1/onix-v2.0,  the 1 
is an authority.   There are (currently) 15 such authorities, they are 
listed in the (second) table at 
http://www.loc.gov/standards/sru/resources/infoURI.html


Authority 1  is the SRU maintenance agency, and the objects registered 
under that authority are, more-or-less, public. But objects can be defined 
under the other authorities with no registration process required.


4.  ofi does not offer this sort of extensibility.


So, if we were going to unify these two systems (and I can't speak for the 
SRU community and commit to doing so yet) the extensibility offered by the 
srw approach would be an absolute requirement.   If it could somehow be 
built in to ofi,  then I would not be opposed to migrating the srw 
identifiers.   Another approach would be to register  an entirely  new 
'info:' URI namespace and migrating all of these identifiers to the new 
namespace.


--Ray
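
To illustrate the built-in authority component described in point 3, here is a small sketch that splits an info:srw URI into its parts. The class and function names are hypothetical, and the three-part layout is read off the single example in this message rather than from a formal grammar.

```python
from typing import NamedTuple


class SrwInfoUri(NamedTuple):
    object_type: str  # e.g. "schema"
    authority: str    # e.g. "1", the SRU maintenance agency
    name: str         # e.g. "onix-v2.0"


def parse_srw_uri(uri):
    """Split an info:srw URI into object type, authority, and object name."""
    prefix = "info:srw/"
    if not uri.startswith(prefix):
        raise ValueError(f"not an info:srw URI: {uri}")
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3:
        raise ValueError(f"expected type/authority/name: {uri}")
    return SrwInfoUri(*parts)


print(parse_srw_uri("info:srw/schema/1/onix-v2.0"))
```

An ofi identifier such as info:ofi/fmt:xml:xsd:onix has no comparable authority slot, which is the extensibility gap noted in point 4.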


- Original Message - 
From: Ross Singer rossfsin...@gmail.com

To: z...@listserv.loc.gov
Sent: Thursday, April 30, 2009 2:59 PM
Subject: One Data Format Identifier (and Registry) to Rule Them All



Hello everybody.  I apologize for the crossposting, but this is an
area that could (potentially) affect every one of these groups.  I
realize that not everybody will be able to respond to all lists,
but...

First of all, some back story (Code4Lib subscribers can probably skip 
ahead):


Jangle [1] requires URIs to explicitly declare the format of the data
it is transporting (binary marc, marcxml, vcard, DLF
simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
own URI structure for this (http://jangle.org/vocab/formats#...) but
this has always been with the intention of moving out of the
jangle.org into a more generic space so it could be used by other
initiatives.

This same concept came up in UnAPI [2] (I think this thread:
http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682
discusses it a bit - there is a reference there that it maybe had come
up before) although was rejected ultimately in favor of an (optional)
approach more in line with how OAI-PMH disambiguates metadata formats.
That being said, this page used to try to set a sort of convention
around the UnAPI formats:
http://unapi.stikipad.com/unapi/show/existing+formats
But it's now just a squatter page.

Jakob Voss pointed out that SRU has a schema registry and that it
would make sense to coordinate with this rather than mint new URIs for
things that have already been defined there:
http://www.loc.gov/standards/sru/resources/schemas.html

This, of course, made a lot of sense.  It also made me realize that
OpenURL *also* has a registry of metadata formats:
http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats

The problem here is that OpenURL and SRW are using different info URIs
to describe the same things:

info:srw/schema/1/marcxml-v1.1

info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0

info:ofi/fmt:xml:xsd:onix

The latter technically isn't the same thing since the OpenURL one
claims it's an identifier for ONIX 2.1, but if I wasn't sending this
email now, eventually SRU would have registered
info:srw/schema/1/onix-v2.1

There are several other examples, as well (MODS, ISO20775, etc.) and
it's not a stretch to envision more in the future.

So there are a couple of questions here.

First, and most importantly, how do we reconcile these different
identifiers for the same thing?  Can we come up with some agreement on
which ones we should really use?

Secondly, and this gets to the reason why any of this was brought up
in the first place, how can we coordinate these identifiers more
effectively and efficiently to reuse among various specs and
protocols, but not:
1) be tied to a particular community
2) require some laborious and lengthy submission and review process to
just say hey, here's my FOAF available via UnAPI
3) be so 

Re: [CODE4LIB] exact title searches with z39.50

2009-04-28 Thread Ray Denenberg, Library of Congress

From: Mike Taylor m...@indexdata.com

The irony is that Z39.50 actually makes _much_ more effort to specify
semantics than most other standards -- and yet still finds itself in
the situation where many implementations do not respond correctly to
the BIB-1 attribute 6=3 (completeness=complete field) which is how
Eric should be able to do what he wants here.

Not that I have any good answers to this problem ... but I DO know
that inventing more and more replacement standards is NOT the answer.
Everything that's come along since Z39.50 has suffered from exactly
the same problem but more so.


I think this remains to be seen for SRU/CQL, in particular for the example 
at hand, how to search for exact title.  There are two related issues: one, 
how arcane the standard is, and two, how closely implementations conform to 
the intended semantics. And clearly the first has a bearing on the second.


And even I would say that Z39.50 is a bit on the arcane side when it comes 
to formulating a query for exact title. With  SRU/CQL there is an exact 
relation ('exact' in 1.1,  '=='  in 1.2).  So I would think there is less 
excuse for a server to apply a creative interpretation. If it cannot support 
exact title it should fail the search. With Z39.50 there is more perceived 
latitude for a server to pretend it supports something it doesn't.
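
For what it's worth, here is a small sketch of what the two CQL forms look like next to the Z39.50 attribute combination mentioned above. The dc.title index is an assumption (a server may expose title under a different index name), and the PQF string is written in the prefix notation used by YAZ-based tools, with use attribute 1=4 (title) added alongside 6=3.

```python
def cql_exact_title(title: str, sru_version: str = "1.2") -> str:
    """Build a CQL clause asking for an exact title match."""
    # 'exact' is the relation in CQL 1.1; '==' is the relation in 1.2.
    relation = "exact" if sru_version == "1.1" else "=="
    # Inside a quoted CQL term, backslash and double quote are escaped.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return f'dc.title {relation} "{escaped}"'


def pqf_complete_field_title(title: str) -> str:
    """Rough Z39.50 counterpart: use=title, completeness=complete field."""
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return f'@attr 1=4 @attr 6=3 "{escaped}"'


print(cql_exact_title("The Hobbit", "1.1"))
print(cql_exact_title("The Hobbit", "1.2"))
print(pqf_complete_field_title("The Hobbit"))
```

The CQL version states the intent in the relation itself, which is the point about there being less room for a creative interpretation.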


--Ray


Re: [CODE4LIB] exact title searches with z39.50

2009-04-28 Thread Ray Denenberg, Library of Congress
Right, Mike. There is a long and rich history of the debate between loose 
and strict interpretation in the world at large; within Z39.50 in 
particular, this debate raged from the late 1980s throughout the 90s.  The 
faction that said "If you can't give the client what it asks for, at least 
give them something; make them happy" was almost religious in its zeal. 
Those who said "If you can't give the client what it asks for, be honest 
about it; give them good diagnostic information, tell them a better way to 
formulate the request, etc. But don't pretend the transaction was a success 
if it wasn't" were shouted down most every time.   I can't predict, but I'm 
just hoping that lessons have been learned from the mess that that mentality 
got us into.


--Ray

- Original Message - 
From: Mike Taylor m...@indexdata.com

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Tuesday, April 28, 2009 10:43 AM
Subject: Re: [CODE4LIB] exact title searches with z39.50



Ray Denenberg, Library of Congress writes:
  The irony is that Z39.50 actually makes _much_ more effort to
  specify semantics than most other standards -- and yet still
  finds itself in the situation where many implementations do not
  respond correctly to the BIB-1 attribute 6=3
  (completeness=complete field) which is how Eric should be able to
  do what he wants here.
 
  Not that I have any good answers to this problem ... but I DO
  know that inventing more and more replacement standards is NOT
  the answer.  Everything that's come along since Z39.50 has
  suffered from exactly the same problem but more so.

 I think this remains to be seen for SRU/CQL, in particular for the
 example at hand, how to search for exact title.  There are two
 related issues: one, how arcane the standard is, and two, how
 closely implementations conform to the intended semantics. And
 clearly the first has a bearing on the second.

 And even I would say that Z39.50 is a bit on the arcane side when
 it comes to formulating a query for exact title. With SRU/CQL there
 is an exact relation ('exact' in 1.1, '==' in 1.2).  So I would
 think there is less excuse for a server to apply a creative
 interpretation. If it cannot support exact title it should fail
 the search.

IMHO, this is where it breaks down 90% of the time.  Servers that
can't do what they're asked should say "I can't do that", but -- for
reasons that seem good at the time -- nearly no server fails requests
that it can sort of fulfil.  Nine out of ten Z39.50 servers asked to
do a whole-field search and which can't do it will instead do a word
search, because it's better to give the user SOMETHING.  I bet the
same is true of SRU servers.  (I am as guilty as anyone else, I've
written servers like that.)

The idea that it's better to give the user SOMETHING might -- might
-- have been true when we mostly used Z39.50 servers for interactive
sessions.  Now that they are mostly used as targets in metasearching,
that approach is disastrous.

_/|_ ___
/o ) \/  Mike Taylorm...@indexdata.com 
http://www.miketaylor.org.uk

)_v__/\  I try to take one day at a time, but sometimes several days
attack me at once -- Ashleigh Brilliant. 


Re: [CODE4LIB] exact title searches with z39.50

2009-04-28 Thread Ray Denenberg, Library of Congress

From: Walker, David dwal...@calstate.edu

I'm not sure it's a _big_ mess, though, at least for metasearching.


I wasn't thinking  specifically about metasearch, but rather,  bad decisions 
getting replicated and you end up with an installed base of bad 
implementations. The best illustration would be the huge mess that HTML is.


--Ray


Re: [CODE4LIB] exact title searches with z39.50

2009-04-28 Thread Ray Denenberg, Library of Congress

From: Jonathan Rochkind rochk...@jhu.edu
HTML works out pretty well. If our biggest failures were 'failures' like 
HTML, we'd be doing pretty well.


HTML is a wonderful standard.

And I don't mean to take the discussion off-course.   My point was simply 
that because early browsers did not insist on clean html, the proliferation 
of unclean html has reached the point where, well, whether you consider it a 
mess  or not depends on how much importance you place on clean html.  It's 
important to me.


--Ray 


Re: [CODE4LIB] Serials Solutions Summon

2009-04-21 Thread Ray Denenberg, Library of Congress

From: Thomas Dowling tdowl...@ohiolink.edu
You can define differences between meta-, federated, and broadcast search,
but every discussion on the topic will be punctuated by people asking, "Wait,
what's the difference again?"


Leaving aside metasearch and broadcast search (terms invented more recently), 
it is a shame if "federated" has really lost its distinction from 
"distributed".  Historically, a federated database is one that 
integrates multiple (autonomous) databases so it is in effect a virtual 
distributed database, though a single database.  I don't think that's a 
hard concept and I don't think it is a trivial distinction.


--Ray 


Re: [CODE4LIB] Serials Solutions Summon

2009-04-21 Thread Ray Denenberg, Library of Congress

From: Jonathan Rochkind rochk...@jhu.edu

If you want to reclaim the term federated to mean a local index, I think
you have a losing battle in front of you.


It's not a battle I plan to pursue, I don't fight battles anymore. I just 
feel obligated to observe that when vocabulary is tinkered with in this 
fashion -- and I did notice, probably more than ten years ago, that 
federated was being manipulated --  it  makes it difficult to express 
modeling concepts when definitions are a moving target.  In the future, 
vendors should be more careful about messing around with established 
definitions.  If you don't like the federated model (the old one), don't 
redefine the term, find a new term.  The old term could someday come in 
handy for expressing why you don't like that model, but it's useless if 
nobody agrees what it means.   --Ray


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Ray Denenberg, Library of Congress

From: Jonathan Rochkind rochk...@jhu.edu


The difference between URIs and URLs?  I don't believe that URL is 
something that exists any more in any standard, it's all URIs.


The URL is alive and well.

The W3C definition, http://www.w3.org/TR/uri-clarification/
a URL is a type of URI that identifies a resource via a representation of 
its primary access mechanism (e.g., its network location), rather than by 
some other attributes it may have. Thus as we noted, http: is a URI 
scheme. An http URI is a URL.


SRU, for example, considers its request to be a URL.

I do think this conversation has played itself out.   --Ray


Re: [CODE4LIB] Something completely different

2009-04-09 Thread Ray Denenberg, Library of Congress

From: Mike Taylor m...@indexdata.com


... anyway, all of this is far, far away from the point.  MARC is old
and ugly yes; but then so am I, 


I don't think you're old, Mike. 


--Ray


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Ray Denenberg, Library of Congress

No,  not identical URIs.

Let's say I've put a copy of the schema permanently at each of the following 
locations.

http://www.loc.gov/standards/mods/v3/mods-3-3.xsd
http://www.acme.com//mods-3-3.xsd
http://www.takoma.org/standards/mods-3-3.xsd

Three locations, three URIs.

But the issue of redirect or even resolution is irrelevant in the use case 
I'm citing.   I'm talking about the use of an identifier within a protocol, 
for the sole purpose of identifying an object that the recipient of the URI 
already has - or if it doesn't have it it isn't going to retrieve it, it 
will just fail the request.   The purpose of the identifier is to enable the 
server to determine whether it has the schema that the client is looking 
for.  (And by the way that should answer Ed's question about a use case.)


So the server has some table of schemas, in that table is the row:

[mods schema]   [ URI identifying the mods schema]

It receives the SRU request:
http://z3950.loc.gov:7090/voyager?
version=1.1&operation=searchRetrieve&query=dinosaur&maximumRecords=1&recordSchema=URI 
identifying the mods schema


If the URI identifying the MODS schema in the request matches the URI in 
the table, then the server knows what schema the client wants, and it 
proceeds.  If there are multiple identifiers then it has to have a row in 
its table for each.
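
Roughly, the lookup I'm describing could be sketched like this. The table contents and the names KNOWN_SCHEMAS, UnknownSchema and resolve_record_schema are invented for illustration; a real server would have its own table and would report the failure as a diagnostic rather than an exception.

```python
# One row per identifier the server is prepared to recognize.
# If a schema is known by several identifiers, each needs its own row.
KNOWN_SCHEMAS = {
    "info:srw/schema/1/mods-v3.3": "mods",
    "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd": "mods",
}


class UnknownSchema(Exception):
    """Raised when the requested recordSchema is not in the table."""


def resolve_record_schema(uri: str) -> str:
    # Plain string comparison: nothing is fetched, redirected or resolved.
    try:
        return KNOWN_SCHEMAS[uri]
    except KeyError:
        raise UnknownSchema(uri) from None


print(resolve_record_schema("info:srw/schema/1/mods-v3.3"))
```

A copy of the same schema at a location that is not in the table is simply unknown to the server, even though the bytes behind it are identical.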


Does that make sense?

--Ray


- Original Message - 
From: Ross Singer rossfsin...@gmail.com

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Wednesday, April 01, 2009 2:07 PM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)




Ray, you are absolutely right.  These would be bad identifiers.  But
let's say they're all identical (which I think is what you're saying,
right?), then this just strengthens the case for indirection through a
service like purl.org.  Then it doesn't *matter* that all of these are
different locations, there is one URI that represents the concept of
what is being kept at these locations.  At the end of the redirect can
be some sort of 300 response that lets the client pick which endpoint
is right for them -- or arbitrarily chooses one for them.

-Ross.

On Wed, Apr 1, 2009 at 1:59 PM, Ray Denenberg, Library of Congress
r...@loc.gov wrote:

We do just fine minting our URIs at LC, Andy. But we do appreciate your
concern.

The analysis of our MODS URIs misses the point, I'm afraid. Let's forget
the set I cited (bad example) and assume that the schema is replicated at
several locations (geographically dispersed) all of which are planned to
house the specific version permanently. The suggestion to designate one as
canonical is a good suggestion but it isn't always possible (for various
reasons, possibly political). So I maintain that in this scenario you have
several *locations*, none of which serves well as an identifier. I'm not
arguing (here) that info is better than http (for this scenario), just that
these are not good identifiers.

--Ray

- Original Message - From: Houghton,Andrew hough...@oclc.org
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Wednesday, April 01, 2009 1:21 PM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB]
registering info: uris?)



From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Karen Coyle
Sent: Wednesday, April 01, 2009 1:06 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re:
[CODE4LIB] registering info: uris?)

The general convention is that "http://" is a web address, a location. I
realize that it's also a form of URI, but that's a minority use of http.
This leads to a great deal of confusion. I understand the desire to use
domain names as a way to create unique, managed identifiers, but the
http part is what is causing us problems.


http:// is an HTTP URI, defined by RFC 3986, loosely I will agree that
it is a web address. However, it is not a location. URIs according
to RFC 3986 are just tokens to identify resources. These tokens, e.g.,
URIs are presented to protocol mechanisms as part of the dereferencing
process to locate and retrieve a representation of the resource.

People see http: and assume that it means the HTTP protocol so it must
be a locator. Whoever initially registered the HTTP URI scheme could
have used web as the token instead and we would all be doing:
web://example.org/. This is the confusion. People don't understand
what RFC 3986 is saying. It makes no claim that any URI registered
scheme has persistence or can be dereferenced. An HTTP URI is just a
token to identify some resource, nothing more.


Andy.




Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Ray Denenberg, Library of Congress
You're right, if there were a web:  URI scheme, the world would be a 
better place.   But it's not, and the world is worse off for it.


It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. 
Here is some of my historical perspective (which may well differ from 
others').


Back in the old days, URIs (or URLs)  were protocol based.  The ftp scheme 
was for retrieving documents via ftp. The telnet scheme was for telnet. And 
so on.   Some of you may remember the ZIG (Z39.50 Implementors Group) back 
when we developed the z39.50 URI scheme, which was around 1995. Most of us 
were not wise to the ways of the web that long ago, but we were told, by 
those who were, that z39.50r: and z39.50s:  at the beginning of a URL 
are explicit indications that the URI is to be resolved by Z39.50.


A few years later the semantic web was conceived and a lot of SW people began 
coining all manner of http URIs that had nothing to do with the http 
protocol.   By the time the rest of the world noticed, there were so many 
that it was too late to turn back. So instead, history was altered.  The 
company line became "we never told you that the URI scheme was tied to a 
protocol".


Instead, they should have bit the bullet and coined a new scheme.  They 
didn't, and that's why we're in the mess we're in.


--Ray


- Original Message - 
From: Houghton,Andrew hough...@oclc.org

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Thursday, April 02, 2009 9:41 AM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)




From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Karen Coyle
Sent: Wednesday, April 01, 2009 2:26 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re:
[CODE4LIB] registering info: uris?)

This really puzzles me, because I thought http referred to a protocol:
hypertext transfer protocol. And when you put "http://" in front of
something you are indicating that you are sending the following string
along to be processed by that protocol. It implies a certain application
over the web, just as "mailto:" implies a particular application. Yes,
http is the URI for the hypertext transfer protocol. That doesn't
negate the fact that it indicates a protocol.


RFC 3986 (URI generic syntax) says that http: is a URI scheme not a
protocol.  Just because it says http people make all kinds of
assumptions about type of use, persistence, resolvability, etc.  As I
indicated in a prior message, whoever registered the http URI scheme
could have easily used the token web: instead of http:.  All the
URI scheme in RFC 3986 does is indicate what the syntax of the rest
of the URI will look like.  That's all.  You give an excellent
example: mailto.  The mailto URI scheme does not imply a particular
application.  It is a URI scheme with a specific syntax.  That URI
is often resolved with the SMTP (mail) protocol.  Whoever registered
the mailto URI scheme could have specified the token as "smtp:"
instead of "mailto:".


My reading of Cool URIs is
that they use the protocol, not just the URI. If they weren't intended
to take advantage of http then W3C would have used something else as a
URI. Read through the Cool URIs document and it's not about
identifiers,
it's all about using the *protocol* in service of identifying. Why use
http?


I'm assuming here when you say My reading of Cool URIs... means reading
the Cool URIs for the Semantic Web document and not the Cool URIs Don't
Change document.  The Cool URIs for the Semantic Web document is about
linked data.  Tim Berners-Lee's four linked data principles state:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs, so that they can discover more things.

(2) is an important aspect to linking.  The Web is a hypertext based system
that uses HTTP URIs to identify resources.  If you want to link, then you
need to use HTTP URIs.  There is only one protocol, today, that accepts
HTTP URIs as currency and it's appropriately called HTTP and defined by
RFC 2616.

The Cool URIs for the Semantic Web document describes how an HTTP protocol
implementation (of RFC 2616) should respond to a dereference of an HTTP URI.
It's important to understand that URIs are just tokens that *can* be presented
to a protocol for resolution.  It's up to the protocol to define the currency
that it will accept, e.g., HTTP URIs, and it's up to an implementation of the
protocol to define the tokens of that currency that it will accept.

It just so happens that HTTP URIs are accepted by the HTTP protocol, but in
the case of mailto URIs they are accepted by the SMTP protocol.  However,
it is important to note that a HTTP user agent, e.g., a browser, accepts
both HTTP and mailto URIs.  It decides that it should send the mailto URI
to an SMTP user agent, e.g., Outlook, Thunderbird, etc.
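
The dispatch described above can be sketched in a few lines. This is only an illustration of the idea that the scheme is a token the user agent inspects before deciding what to do; the HANDLERS table and the handler labels are invented, not taken from any browser.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical mapping from URI scheme to whatever handles it.
HANDLERS = {
    "http": "HTTP client",
    "https": "HTTP client",
    "mailto": "mail user agent",
}


def pick_handler(uri: str) -> Optional[str]:
    """Return the handler for a URI's scheme, or None if there is none."""
    scheme = urlsplit(uri).scheme  # urlsplit lower-cases the scheme
    return HANDLERS.get(scheme)


for uri in ("http://example.org/",
            "mailto:someone@example.org",
            "info:srw/schema/1/mods-v3.3"):
    print(uri, "->", pick_handler(uri))
```

A scheme with no entry in the table is still a perfectly good token; the user agent just has nowhere to send it.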

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Ray Denenberg, Library of Congress

From: Houghton,Andrew hough...@oclc.org


The point being that:

urn:doi:*
info:doi:*

provide no advantages over:

http://doi.org/*



I think they do.

I realize this is pretty much a dead-end debate as everyone has dug 
themselves into a position and nobody is going to change their mind. It is a 
philosophical debate and there isn't a right answer.  But in my opinion 


I won't use the doi example because it's overloaded.  Let's talk about the 
hypothetical sudoc. I think info:sudoc/xyz provides an advantage over 
http://sudoc.org/xyz   if the latter is not going to resolve.


Why? Because it drives me nuts to see http URIs everywhere that give all 
appearances of resolvability - browsers, editors, etc.  turn them into 
clickable links.   Now, if you are setting up a resolution service where you 
get the document that the sudoc identifies when you click on the URI, then 
http is appropriate.   The *actual document*. Not a description of it in 
lieu of the document.  And the so-called architectural justification that 
it's ok to return metadata instead of the resource (representation) -- I 
don't buy it.


--Ray 


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Ray Denenberg, Library of Congress

From: Jonathan Rochkind rochk...@jhu.edu
 There are all sorts of useful identifiers I use in my work every day that 
can not be automatically dereferenced.


Even more to the point: there is no sound definition of dereference.  To 
dereference a resource means to retrieve a representation of it. There has 
never been any agreement within the w3c of what constitutes a 
representation.


--Ray 


Re: [CODE4LIB] resolution and identification

2009-04-01 Thread Ray Denenberg, Library of Congress

A concrete example.

The MODS schema, version 3.3, has an info identifier, for SRU purposes:

info:srw/schema/1/mods-v3.3

So in an SRU request you can say

recordSchema=info:srw/schema/1/mods-v3.3

Meaning you want records returned in the mods version 3.3 schema.  And 
that's really the purpose of the schema identifier. Both the client and 
server know the schema by this identifier  - or the server doesn't know it 
at all and the request fails - but nobody wants to resolve the identifier.
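
A minimal sketch of building such a request, assuming the LC test server address and the other parameters from the SRU example earlier in this thread. The helper name build_sru_request is made up, and ':' and '/' are left unescaped only so the identifier stays readable in the output.

```python
from urllib.parse import urlencode


def build_sru_request(base_url: str, query: str, record_schema: str) -> str:
    """Assemble an SRU 1.1 searchRetrieve URL with an explicit recordSchema."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,
        "maximumRecords": "1",
        "recordSchema": record_schema,
    }
    # The identifier travels as an opaque string; nothing resolves it.
    return base_url + "?" + urlencode(params, safe=":/")


print(build_sru_request("http://z3950.loc.gov:7090/voyager",
                        "dinosaur",
                        "info:srw/schema/1/mods-v3.3"))
```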


Now in contrast, the schema is at
http://www.loc.gov/standards/mods/v3/mods-3-3.xsd

And it's also at:
http://www.loc.gov/mods/v3/mods-3-3.xsd

And also:
http://www.loc.gov/mods/mods.xsd

And:
http://www.loc.gov/standards/mods/mods.xsd

And:
http://www.loc.gov/standards/mods/v3/mods.xsd



So there you have five http identifiers for the schema.

Which is the better identifier for this purpose? The single info identifier, 
or a choice of http identifiers, one for every possible location where the 
schema may reside (which is more than these five)?  If the answer is that 
it's better to use one of the http identifiers, how do you know that the one 
you pick is the one that the server recognizes it by?  Or should the server 
maintain a list of all possible locations?


--Ray


- Original Message - 
From: Ross Singer rossfsin...@gmail.com

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Wednesday, April 01, 2009 12:26 PM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)




On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote:
But shouldn't we be able to know the difference between an identifier and a
locator? Isn't that the problem here? That you don't know which it is if it
starts with http://.


But you do if it starts with http://dx.doi.org

I still don't see the difference.  The same logic that would be
required to parse and understand the info: uri scheme could be used to
apply towards an http uri scheme.

-Ross. 


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Ray Denenberg, Library of Congress

From: Erik Hetzner erik.hetz...@ucop.edu

 I believe that registering a domain would be less
work than going through an info URI registration process, but I don’t
know how difficult the info URI registration process would be (thus
bringing the conversation full circle). [1]



Leaving aside religious issues I just want to be  sure we're clear on one 
point: the work required for the info URI process is exactly the amount of 
work required, no more no less.  It forces you to specify clear syntax and 
semantics, normalization (if applicable), etc.  If you go a different route 
because it's less work, then you're probably avoiding doing work that needs 
to be done.


--Ray 


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Ray Denenberg, Library of Congress

From: Ross Singer rossfsin...@gmail.com

nobody gives a damn about info:uris outside of
libraries, 


Nor do people outside of libraries care about identifiers. 


--Ray


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Ray Denenberg, Library of Congress

From: Hilmar Lapp hl...@duke.edu

Nor do people outside of libraries care about identifiers.


You might be surprised: http://www.lsrn.org/


Yes, I overstated; let me rephrase. There are communities who are 
interested in specific object classes and want identifier schemes for them. 
For libraries there are books, articles, journals, and many others. And 
certainly this isn't limited to libraries; for example, many scientific 
disciplines have a similar interest in identifier schemes for objects in 
specific object classes.


But the term "identifier" has taken on a whole new meaning with the web. 
It has now been generalized to identify any "resource", and we don't even 
have a clear definition of "resource", aside from the convoluted "anything 
that can be identified".  The discussions on this are often a convoluted 
mess, and it's no wonder location and identity get confused.  And because 
of all the emphasis on solving this part of the web architecture -- which 
hasn't been accomplished, and there is debate within the W3C whether it is 
even possible -- the original concept of identifier seems to be lost, aside 
from within the communities I alluded to above. And it is for those 
communities that the info URI is useful.


Now as to my reference to "religious issues", a statement like "Having 
unresolvable URIs is anti-Web" would be better stated as: "Having 
unresolvable URIs IN MY OPINION is anti-Web".  It is an opinion, not a fact. 
Stating it as fact is dogmatic.  It is a reasonable opinion; however, my 
opinion, "Having unresolvable URIs IN MY OPINION is PRO-Web", is just as 
reasonable.   I needn't go into further detail, we've beaten this to death 
already.


--Ray


Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Ray Denenberg, Library of Congress
Pointing to the documentation and saying "one of these" isn't going to work, 
I'm afraid.   Most important is to make sure that the syntax is consistent 
with URI syntax.  Where the syntax of the identifier you're representing is 
potentially at odds with URI syntax, you  might have to make adjustments, 
like percent-encode. So if you're going to register sudoc, you're going to 
have to understand the syntax to some degree, there's really no way around 
it. (I didn't know the lccn syntax, registering it forced me to learn it, 
and I'm a better man for it.)


I don't know much about SuDoc, and most everything seems to point to 
http://www.gpo.gov/su_docs/fdlp/pubs/explain.html which doesn't really 
explain their syntax. (Though if you look a bit harder maybe you'll find 
something better.)


But I see this example:    Y 3.C 76/3:2 K 54

That's apparently a sudoc.  It immediately raises the following flags: 
spaces, slash, colon, and case (sensitivity).  For your purposes I don't 
think that colon or slash is a problem. (They become a problem when you are 
using them as special characters for delimitation, but you're not doing 
that.) Spaces, though, have to be percent-encoded. (That simply means 
replace each occurrence of a space with %20.)


You also need to look at case-sensitivity. If sudocs are case-sensitive, no 
problem, if not, then you may want to normalize to either upper or lower 
case.
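
Here is a rough sketch of that encoding step. Note that info:sudoc/ is only the namespace being proposed in this thread, not a registered one, and the fold_case option is an assumption on my part: whether sudocs are case-insensitive is exactly the question that would have to be answered first.

```python
from urllib.parse import quote, unquote

INFO_SUDOC_PREFIX = "info:sudoc/"  # proposed in this thread, not registered


def sudoc_to_info_uri(sudoc: str, fold_case: bool = False) -> str:
    """Percent-encode a SuDoc number for use after the info:sudoc/ prefix."""
    # Optional normalization, only valid if sudocs turn out to be
    # case-insensitive (an open question here).
    value = sudoc.upper() if fold_case else sudoc
    # Slash and colon are left alone; spaces become %20.
    return INFO_SUDOC_PREFIX + quote(value, safe="/:")


def info_uri_to_sudoc(uri: str) -> str:
    """Reverse the encoding to get the SuDoc string back."""
    if not uri.startswith(INFO_SUDOC_PREFIX):
        raise ValueError(f"not an info:sudoc URI: {uri!r}")
    return unquote(uri[len(INFO_SUDOC_PREFIX):])


uri = sudoc_to_info_uri("Y 3.C 76/3:2 K 54")
print(uri)
print(info_uri_to_sudoc(uri))
```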


There may not be any normalization issues (other than case sensitivity, if 
that).   Normalization is an issue only if a particular sudoc can be 
represented by more than one string.   If so you have two choices:

1. prescribe a canonical form (which is the approach we took for LCCNs).
2.  simply describe the rules for determining when two strings represent the 
same sudoc (there is no rule that says that two different info URIs can't 
refer to the same resource).


You can contact me privately if you have problems.

No, sorry, I don't know anyone at GPO.  I worked the graveyard shift there 
part time during college.  (I had to load mailing machines with junk mail. 
Several junk items loaded into a machine which would combine them into one 
mailing item. The machine would jam about every tenth time. Worst job I ever 
had.) But that was many years ago and that's the last contact I've had with 
GPO.


Good luck.

-Ray

- Original Message - 
From: Jonathan Rochkind rochk...@jhu.edu

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, March 27, 2009 3:36 PM
Subject: Re: [CODE4LIB] registering info: uris?



Thanks Ray.

Oh boy, I don't know enough about SuDoc to describe the syntax rules 
fully. I can spend some more time with the SuDoc documentation (written 
for a pre-computer era) and try to figure it out, or do the best I can.  I 
mean, the info registration can clearly point to the existing SuDoc 
documentation and say one of these -- but actually describing the syntax 
formally may or may not be possible/easy/possible-for-me-personally.


I can't even tell if normalization would be required or not. I don't think 
so.  I think SuDocs don't suffer from that problem LCCNs did to require 
normalization, I think they already have consistent form,  but I'm not 
certain.


I'll see what I can do with it.
But Ray, you work for 'the government'.   Do you have a relationship with 
a counter-part at GPO that might be interested in getting involved with 
this?


Jonathan

Ray Denenberg, Library of Congress wrote:

It's a fairly straightforward process,  See:
http://info-uri.info/registry/register.html

You should look at a few examples first, go to 
http://info-uri.info/registry/  and click on a few of those listed in the 
left column.


I think registering one for SuDocs would be fairly easy.

The info folks are most concerned that the syntax rules are 
well-described. I had registered a few of these before they started 
cracking the whip on that (and rightly so), and when I registered info:lc 
it became more difficult; you might want to look at that for an example:

http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lc/

Also, normalization - I suggested looking at info:lccn normalization 
rules:

http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lccn/

--Ray


- Original Message - 
From: Jonathan Rochkind rochk...@jhu.edu

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, March 27, 2009 3:12 PM
Subject: [CODE4LIB] registering info: uris?



Does anyone know the process for registering a sub-scheme for info: 
uris?


I'd like to have one for SuDoc classification numbers, info:sudoc/.

I'm not sure if I can register that on my own, without working with the 
US Government Printing Office, who actually maintains sudocs.  But if I 
have to get GPO to do it, I'll probably give up quicker (unless it turns 
out easier than I thought to find the right person at GPO and get them 
to sign on -- I doubt it!). Or if the registration process is really 
long

Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Ray Denenberg, Library of Congress
Correct me if I'm wrong but isn't the point of all this to be able to put 
the URI in an OpenURL?   And info was invented (in part) to avoid putting 
http URIs in OpenURLs  (because they are complicated enough already, why 
clutter them further).  So I don't see that pursuing an http solution to 
this is very useful.   --Ray
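
For anyone who has not seen one, here is a small Python sketch of carrying
an info URI in an OpenURL 1.0 key/value query via the rft_id key. The
resolver address is hypothetical, and info:lccn is used only because it is
already registered; a SuDoc identifier would travel the same way once a
namespace exists for it.

```python
from urllib.parse import urlencode

# Hypothetical resolver address, for illustration only.
RESOLVER_BASE = "http://resolver.example.org/openurl"


def openurl_for_identifier(info_uri: str) -> str:
    """Build an OpenURL whose referent is identified by a single URI."""
    params = {
        "url_ver": "Z39.88-2004",  # OpenURL 1.0 version key
        "rft_id": info_uri,        # referent identifier, here an info URI
    }
    # urlencode percent-escapes the ':' and '/' inside the identifier.
    return RESOLVER_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(openurl_for_identifier("info:lccn/n78890351"))
```

The identifier has to be percent-escaped whichever scheme it uses, so a
compact info URI keeps the resulting OpenURL shorter than a full http URI
would.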



- Original Message - 
From: Houghton,Andrew hough...@oclc.org

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, March 27, 2009 5:24 PM
Subject: Re: [CODE4LIB] registering info: uris?



From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Friday, March 27, 2009 5:18 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] registering info: uris?

I am not interested in maintaining a sudoc.info registration, and
neither is my institution, who I wouldn't trust to maintain it (even to
the extent of not letting the DNS registration expire) after I left.


BTW, you could always use http://purl.org/ and later if you wanted
to have it resolve to something just change the PURL. 


Re: [CODE4LIB] MIME Type for MARC, Mods, etc.?

2009-02-13 Thread Ray Denenberg, Library of Congress
Sorry for the confusion over SRU, and I'm afraid this takes us way 
off-topic, but since you asked.


I meant the SRU *response format*.  And even that doesn't make sense, not in 
the context of the current SRU spec.  But in the next version, 2.0, which we 
are now developing within OASIS, the response can take on different formats, 
subject (possibly) to content negotiation.  For example the response can be 
packaged in ATOM, or RSS, or the default SRU schema, and it is the latter 
that we are registering.   --Ray



- Original Message - 
From: Jonathan Rochkind rochk...@jhu.edu

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, February 13, 2009 12:02 PM
Subject: Re: [CODE4LIB] MIME Type for MARC, Mods, etc.?


Thanks Ray. marcxml+xml makes sense to me, the name is really arbitrary, 
so long as we have _something_ that represents MARC-XML.  Glad you are 
working on this.


I'm confused about your suggestion of registering a content type for SRU. 
My understanding is that SRU is a _protocol_, not a media type?  Unless 
you mean registering a type for the SRU explain document?  In general, 
with my understanding of SRU and of the purpose of internet content types, 
it doesn't seem to make sense to me to register a content type for SRU the 
protocol.


Media types must function as an actual media format: Registration of 
things that are better thought of as a transfer encoding, as a character 
set, or as a collection of separate entities of another type, is not 
allowed.


Is SRU a media/document type, or is it a communications protocol using a 
collection of separate document types?


Jonathan

Ray Denenberg, Library of Congress wrote:

A few points:

1. x- is commonly used in cases when an application for a mime type is
pending, and when there is a reasonable expectation that it will be
approved.   The mime type is prefixed with x- until the requested mime
type becomes official, after which the x- is dropped.

2. We will be registering MODS and MARCXML:
 - application/mods+xml
 - application/marcxml+xml

3. The reason one uses (or doesn't use) +xml is made very clear in one of
the relevant RFCs (I don't have the number at the moment): the application
consuming the content is supposed to recognize the mime type and process it
accordingly; however, in the event that it does not recognize the mime type,
the +xml signals at least that the content is xml, and so there is a
possibility that it might do something useful with it, even though it cannot
process it according to mime type - it may be able to parse the XML and
present something readable to the user. Even better, consider the case
where it is a protocol response, for example SRU, where we are registering
application/sru+xml: there might be an accompanying stylesheet url, and the
client can then format a complete sru response without knowing that it did
so.

 The reason is NOT, as some have suggested, to distinguish mods+xml from
mods+xyz where xyz is some alternative syntax.  However, because of the
confusion, we would register marcxml as marcxml+xml (even though it sounds
funny) rather than marc+xml, because of all the confusion that the latter
name would cause.
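
The fallback described in point 3 can be sketched in Python roughly as
follows. This is only an illustration: the handler table, the function
names and the returned strings are invented for the example, and
application/mods+xml is used on the assumption that the planned
registration goes through.

```python
import xml.etree.ElementTree as ET


def handle_mods(body: str) -> str:
    """Stand-in for real MODS processing (hypothetical)."""
    root = ET.fromstring(body)
    return f"handled as MODS, root element <{root.tag}>"


# Types this imaginary client knows how to process specifically.
HANDLERS = {"application/mods+xml": handle_mods}


def dispatch(content_type: str, body: str) -> str:
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    handler = HANDLERS.get(media_type)
    if handler is not None:
        return handler(body)
    # Unknown type: the +xml suffix still says the content is XML.
    if media_type.endswith("+xml") or media_type == "application/xml":
        root = ET.fromstring(body)
        return f"generic XML, root element <{root.tag}>"
    return "unrecognized type, not XML"


if __name__ == "__main__":
    print(dispatch("application/marcxml+xml; charset=utf-8", "<record/>"))
```

A client that has never heard of application/marcxml+xml still ends up in
the generic XML branch, which is the whole point of the suffix.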

--Ray

- Original Message -
From: Jonathan Rochkind rochk...@jhu.edu
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Thursday, February 12, 2009 5:21 PM
Subject: Re: [CODE4LIB] MIME Type for MARC, Mods, etc.?




Actually, re-reading some of the RFCs, I would clarify one thing.

It seems like using unregistered x- MIME types is discouraged, and
instead you are encouraged to use what is (claimed to be) a very quick and
easy and painless process of registering vnd. types.  So I'd encourage
LC to investigate doing that for MARC, while waiting for someone to have
time to do an actual (more time consuming) application/marc+xml
registration. That would give us the benefit of an actual registration
(albeit under vnd.) instead of an unregistered x-.

As far as text/xml, the general consensus on the internet seems to be that
it was a mistake, but it's there and no one cares enough to try to somehow
remove it, so it _is_ legal, but nobody really encourages using it.  One
problem with text/xml is that its default char encoding is ascii, while
the default char encoding for XML is of course UTF-8. This can very easily
lead to confusion and encoding errors unless software is more careful than
we know most software has a tendency to be. :)  Still, it's legal, but I
don't see any reason to encourage its use for MARC.
application/xml, sure, but it would be _really_ useful, for the reasons
discussed in last week's thread, to have a specific type for marc xml (and
mods).  If the folks at LC don't understand why, thinking that
application/xml is sufficient, I could try to write up a persuasive essay
again, or copy and paste from last week's thread. Or is there someone else
other than LC who could conceivably fill out an application for
application/marc+xml and application

Re: [CODE4LIB] [MODS-EC] FW: [CODE4LIB] MIME Type for MARC, Mods, etc.?

2009-02-12 Thread Ray Denenberg, Library of Congress
Hello, this thread was recently brought to my attention.  (And then it took 
longer than it should have to get subscribed to this list. And I haven't 
seen an archive so I don't know if there has been any discussion beyond 2/4. 
Anyway )


We (LC) decided a year ago to register mime types for both MODS and MARC but 
regrettably got sidetracked. We're back on track now and it should be done 
reasonably soon.


--Ray Denenberg

- Original Message - 
From: Riley, Jenn jenlr...@indiana.edu

To: mods...@listserv.loc.gov
Sent: Wednesday, February 04, 2009 6:58 PM
Subject: Re: [MODS-EC] FW: [CODE4LIB] MIME Type for MARC, Mods, etc.?


Cool! Can you post a response to CODE4LIB?

Thanks,

Jenn

-Original Message-
From: MODS Editorial Committee Forum [mailto:mods...@loc.gov] On Behalf Of 
Ray Denenberg, Library of Congress

Sent: Wednesday, February 04, 2009 6:56 PM
To: mods...@listserv.loc.gov
Subject: Re: [MODS-EC] FW: [CODE4LIB] MIME Type for MARC, Mods, etc.?

Application  for mime types for MARC and MODS is in the works. It's been
slowed down but we're back to moving it along and it should be reasonably
soon, I hope. --Ray

- Original Message - 
From: Riley, Jenn jenlr...@indiana.edu

To: mods...@listserv.loc.gov
Sent: Wednesday, February 04, 2009 6:42 PM
Subject: [MODS-EC] FW: [CODE4LIB] MIME Type for MARC, Mods, etc.?


Something for us to think about...

Jenn


-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Wednesday, February 04, 2009 10:59 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MIME Type for MARC, Mods, etc.?

You CAN use application/xml for any XML, but it's often useful to have a
specific type for your specific content, so the user-agent can know what
to do with it.  The convention is to include +xml on the end, so if
the user agent doesn't know your specific format, it can fall back to
treating it as generic XML.

For instance:

application/rss+xml
application/atom+xml
application/rdf+xml

And dozens more you can see at:
http://www.iana.org/assignments/media-types/application/   (search for
+xml).

Thanks to Mark and Ross Singer for pointing out application/marc already
exists (and is on that list above).  Awesome.

I'm still feeling the need for application/marc+xml, and
application/mods+xml

Jonathan

Ethan Gruber wrote:
 Correct me if I'm wrong, but wouldn't the mime type for MARC-XML and MODS
 be application/xml, like every other xml file?  As for MARC-binary, I
 can't say.  I don't have any of those files handy.

 Ethan

 On Wed, Feb 4, 2009 at 10:47 AM, Jonathan Rochkind rochk...@jhu.edu wrote:


 I am actually rather shocked that it seems that MARC-XML, MODS,
 MARC21-binary, do not have registered Internet Content Types (aka MIME
 types).

 Am I missing something, or is this really so?

 Anyone know what the process is for registering such?  Anyone want to
 help try to do that? I guess we'd probably have to talk to the standards
 organizations for each of those types, rather than doing it
 independently?

 Jonathan