On Sun, Jun 27, 2010 at 5:14 PM, Dan Scott wrote:
> On 25 June 2010 20:08, Mike Rylander wrote:
>> On Fri, Jun 25, 2010 at 12:07 AM, Dan Scott wrote:
>>> Hello:
>>>
>>> In early July, I plan to spend two weeks working with the team at the
>>> International Institute of Social History (IISH) - you may have seen
>>> Repke and Marjolein on various Evergreen mailing lists. One of our goals
>>> is to come out of our time together with some additional functionality,
>>> particularly in the areas of authority control (yay!). Our other,
>>> arguably primary goal, is to share as much knowledge as I can with the
>>> team at IISH and help cultivate more development talent locally at IISH
>>> and, by extension, in the general Evergreen community (double yay!).
>>>
>>> For the proposed authority enhancements that we plan to work towards,
>>> I've created a Launchpad Blueprint at
>>> https://blueprints.launchpad.net/evergreen/+spec/respect-my-authorities ;
>>> you'll find the meatier details of the proposed enhancements at
>>> http://evergreen-ils.org/dokuwiki/doku.php?id=dev:proposal:authorities
>>>
>>> We welcome your thoughts, suggestions, warnings, and if you have
>>> full-fledged examples including sample authority and bib records to
>>> illustrate your concerns or ideas, those would be fabulous.
>>>
>>>
>>
>> First, thanks both to Dan and the IISH team!
>>
>> I've updated the wiki page (
>> http://open-ils.org/dokuwiki/doku.php?id=dev:proposal:authorities )
>> with some information about implementation that I've been working on
>> over the last couple days. This is all backend infrastructure, and
>> should not affect the overall implementation, but thoughts and
>> concerns are welcome.
>>
>
> Hi Mike:
>
> Thanks very much for the updates to the proposal and for the
> implementation work you've already committed! You've got to leave us
> something to do, eh? :)
>
> However, I find part of the following update a bit confusing:
>
> """
> Further implementation thought – we can use the ON UPDATE OR INSERT OR
> DELETE trigger, which now exists in trunk for optional update
> propagation (see below), to overwrite the 035$a with the id, preceded
> by a value stored in an OU setting (or defaulting to, say, “EVRGRN”)
> as the agency code, surrounded by parens. IMO (miker) this should be
> unconditional, as the 035 is enough and it would be best to leave the
> 001 alone. This would also allow us to simply drop the arn_value and
> arn_source columns from authority.record_entry, which would be good
> all around.
> """
>
> I don't know why you think "it would be best to leave the 001 alone".
> We've started to discuss this in the past, but never finished the
> discussion... maybe we can hash it out this time? I'll do my best to
> represent my position.
>
> As I understand it, when a record is imported or created by a given
> institution, it shifts the existing 001 into a 035 (if that 035
> doesn't already exist) and replaces the 001 with its own value. The
> 035 is a repeatable field
> (http://www.loc.gov/marc/authority/ad035.html), whereas the 001 is
> non-repeatable (http://www.loc.gov/marc/authority/ad001.html).
>
> So it makes more sense, to me, to update the 001 with the purely
> numeric ID - as otherwise, there may be a number of authority record
> 035s for a given controlled bibliographic field to point to, but we
> could be guaranteed to be able to create links that point to the
> correct record if we point at the 001-synced-with-record-ID. We still
> need to create the $0 subfield values with the agency source
> identifier + authority record system control number for the controlled
> field, but with all of the authority records stored in a single system
> we would have to add another layer of abstraction (identify a given
> authority record by one of a possible number of 035s - just one more
> map table, I know, but I don't see what that layer of abstraction buys
> us other than more complexity). From
> http://www.loc.gov/marc/bibliographic/ecbdcntf.html:

Most systems I've seen authority data from do not change the 001 on
import. However, most I've seen also usually just pull in data from LoC
(or similar large sources), which generally have a mixed alpha-numeric
control number scheme (thus avoiding numeric collisions). At least for
publics, there's a good reason for not stomping the 001 -- they can dump
the records and send them off to MARChive or the like for batch upgrade
based on the original source (and source identifier) and overlay them
based on the 001.

I missed the repeatability of the 035, or perhaps just tried to block it
out. The fact that the standard says "match the $0 against, er, one of
some unbounded set of 035a values per authority record" seems to me to
be just silly.

I contend that what we need, regardless of what MARC says (and
regardless of what else should happen to a record at create/import
time), in order to make this work properly, is a field that we can
control completely (IOW, isn't used for some other well-established
process like batch upgrade), that is non-repeatable and (well, as a
subpoint to the first) that we can guarantee is unique in a given
instance. We have a precedent for something like that -- the 901c in
the bib record.

So, if the 001 or 035 can fit those criteria, then I'm fine with either
... but it doesn't sound like either can.

>
> """
> $0 - Authority record control number
> System control number of the related authority record preceded by the
> MARC code, enclosed in parentheses, for the agency to which the
> control number applies. See Organization Code Sources for a listing of
> sources used in MARC 21 records.
>
> 100 1#$aBach, Johann Sebastian.$4aut$0(DE-101c)31000889
> """
>
> Similarly, when dealing with the linking from MFHD 004 to the
> associated bibliographic records, we'll want to link to the bib's 001,
> as the intro to MFHD specifies "Control Number", not "System Control
> Number" (http://www.loc.gov/marc/holdings/hdintro.html):
>

Of course this assumes that the 001s are unique. Evergreen doesn't
stomp the 001 for incoming records (but does try to use it as the
external TCN) because in practice many, if not most, institutions (in
the US, anyway) want to maintain the original 001 for things like OCLC
holdings updates and MARChive batch upgrades, where the original source
id is important. This is the main reason we shove the 901 into bib
records at export time -- maintaining the original 001 is more important
to external system integration than is following the MARC standard. I'd
rather that weren't the case, but pragmatism and all...

> """
> Separate holdings records - A separate holdings record is linked to
> the related MARC bibliographic record by field 004 (Control Number for
> Related Bibliographic Record).
> """
>
> Well, to muddy things somewhat, the docs for MFHD's 004 do switch
> maddeningly between "bibliographic record control number" and "system
> control number" at http://www.loc.gov/marc/holdings/hd004.html, but it
> seems clear to me - although I might be nuts - that the intention is
> for the non-repeatable 001 to be a unique identifier within a given
> system that can be used for linking within that system - and that only
> when those records are imported by another system does the 001 get
> shifted to the new field.
>

But that also assumes that the 001 is unique and in practice (not just
in Evergreen, but here too because of the data we must ingest and
external processes we must support) it's not. Again ... 901. :(

> If it's not meant to be used as the unique identifier within a given
> system, what is the 001 possibly useful for?

External integration with our OCLC overlords... (snarky, yes, but only
half joking).

> Having the 001 synced
> with the record ID (whether authority, bibliographic, or holding
> record) within the system makes our lives all a heck of a lot easier,
> I think,

I agree with this statement in principle, however ...

> and I don't understand what the downside would be. (This is
> where somebody unsheathes their Stick +5 of Cluefulness and enlightens
> me!)
>

I can't speak with the authority of a +5 cluebat, but I can say that we
can't go stomping the 001 in bibs. The solution (granted, restricted to
bibs so far) has been to impose our authority (heh) over the 901 and use
that for shoving a unique identifier into the record. We've only done
this at export time so far.

Therefore ... a modest proposal?

 * Forcibly maintain the 901c of /all/ MARC types, via triggers.
 * Forcibly maintain an 035a on authority records which includes the
   agency code (OU setting with default) and the id (aka, 901c). This
   lets us maintain a semblance of MARC-level spec compliance.
 * When linking authority records via $0, use the 901c to generate the
   $0 -- but we have a matching 035a in the authority, so all looks well
   at the MARC level.
 * When linking MFHD records via 004, copy the 001 from the bib, but we
   have a field on serial.record_entry that holds the internal id (aka
   bib 901c) of the bib.

So, we maintain the veneer of MARC-level linking, but acknowledge that
it's not possible in practice and use internal ids for the real work.

Eh?

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | web: http://www.esilibrary.com
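A minimal sketch of the modest proposal above follows. It is written in Python, although in Evergreen the work would live in PostgreSQL triggers. Records are modeled as plain objects rather than through any MARC library. Every name here is hypothetical and chosen only for illustration: `Record`, `Field`, `maintain_ids`, `resolve_link`, and `link_mfhd`. Only the "EVRGRN" default comes from the thread. The sketch shows three things:

* The 001 is left untouched.
* The 901c and an "(agency)id" 035$a are forcibly rewritten on every save.
* A bib $0 is resolved through the internal id rather than by searching an unbounded set of 035s.

```python
import re
from dataclasses import dataclass, field

# Hypothetical fallback when no OU setting supplies an agency code (per the thread).
DEFAULT_AGENCY = "EVRGRN"

# A $0 or 035$a value of the form "(agency)number", e.g. "(DE-101c)31000889".
CONTROL_NUMBER = re.compile(r"^\((?P<agency>[^)]+)\)(?P<number>.+)$")


@dataclass
class Field:
    tag: str
    data: str = ""                                 # control fields (001-009)
    subfields: list = field(default_factory=list)  # (code, value) pairs


@dataclass
class Record:
    id: int                                        # internal database id
    fields: list = field(default_factory=list)


def control_number(agency, record_id):
    """Build an "(agency)id" string for an 035$a or a $0."""
    return f"({agency}){record_id}"


def maintain_ids(rec, agency=None):
    """Mimic the proposed trigger: rewrite 901c and our own 035$a, never touch 001."""
    agency = agency or DEFAULT_AGENCY
    prefix = f"({agency})"
    kept = []
    for f in rec.fields:
        if f.tag == "901":
            continue  # we own the 901; it is rebuilt below
        if f.tag == "035":
            # Drop only 035$a values carrying our own agency code; keep the rest.
            f.subfields = [(c, v) for c, v in f.subfields
                           if not (c == "a" and v.startswith(prefix))]
            if not f.subfields:
                continue
        kept.append(f)
    kept.append(Field("035", subfields=[("a", control_number(agency, rec.id))]))
    kept.append(Field("901", subfields=[("c", str(rec.id))]))
    # sorted() is stable, so fields sharing a tag keep their relative order.
    rec.fields = sorted(kept, key=lambda fld: fld.tag)
    return rec


def parse_control_number(value):
    """Split "(agency)number" into its two parts."""
    m = CONTROL_NUMBER.match(value)
    if m is None:
        raise ValueError(f"not an (agency)number value: {value!r}")
    return m.group("agency"), m.group("number")


def resolve_link(subfield_0, agency, authorities_by_id):
    """Resolve a bib $0 through the internal id (901c), not by searching 035s."""
    code, number = parse_control_number(subfield_0)
    if code != agency or not number.isdigit():
        return None  # points outside this system
    return authorities_by_id.get(int(number))


def link_mfhd(bib):
    """Return (value to copy into MFHD 004, internal bib id to store alongside)."""
    bib_001 = next((f.data for f in bib.fields if f.tag == "001"), None)
    return bib_001, bib.id
```

Consider saving authority 42 with `maintain_ids`, where its 001 is "n79021164" and it already has an 035$a of "(DLC)n79021164". The function keeps both, then adds an 035$a of "(EVRGRN)42" and a 901c of "42". A bib heading carrying $0(EVRGRN)42 then resolves straight to record 42. A foreign $0 such as (DE-101c)31000889 returns None under this sketch. Running `maintain_ids` twice yields the same fields, which matters for a trigger that fires on every UPDATE.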
