On Fri, Jun 27, 2008 at 4:12 PM, Dan Wells <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I have been playing around with record loads of various shapes and sizes over
> the last week or two, and have come to the conclusion that marc2bre.pl is a
> bit discombobulated in its current state. It boils down to a consistent
> state of confusion within the code between the record id (i.e. database id)
> and the record title control number. I believe these should generally not be
> the same thing, and I would say about half of the code agrees with me :) In
> particular, the idfield argument seems to be the source of most of my problems.
> I believe it was originally meant to be a way to specify tcns, not database
> record ids, but since tcns are often alphanumeric, the regular expression
> which strips out any non-digits flies in the face of this. The end result is
> that there is a bunch of code, particularly in the preprocess subroutine, that
> is supposed to check and intelligently set the tcn but which never gets run
> under normal circumstances (short of an odd dontuse_file setting). From what
> I can tell, there is therefore no good way to get a file out the other end
> with sane tcns (unless yours happen to be all digits).
>

I'm not at a computer where I can look now, and I'm not sure which
branch you're looking at, but here's what is intended:

idfield is, in fact, meant to specify the field (subfield a) from which
to extract the database id. More on that later. If there is no
available tcn value (as defined in preprocess()) then the record id
will be used. There could very well have been short-circuit logic
introduced into the trunk of svn that causes the idfield value to be
used as the tcn, but that is not the intention. I'll look at it when I
get back to my computer.

The purpose of the dontuse parameter (which could certainly use a
better name) is to inform marc2bre of existing TCN values already in
use in the database, for instance when you are loading new records into
an existing implementation. That lets it look for alternate TCNs when
there is a collision.

> I have created a new version which hopefully untangles most of this. I left
> in the idfield setting for setting the record database id (though I am not
> sure how useful this actually is) and added tcnfield and tcnsubfield settings
> which honor common tcn formats and use the preprocess code properly in case
> of duplicates. It is currently being tested, but before I post any version
> of it, I am wondering if I am completely nuts about all of this.
>

You're not nuts, and being able to specify a tcn field (and subfield)
is a great addition!

As for the usefulness of idfield, the point there is to maintain (with
a potential offset supplied by the adjustid parameter) the identifier
that a legacy system uses to address the record, where applicable, at
migration time. Most ILSs (Evergreen included) use an internal
identifier, because TCN is too human-supplied to be trustworthy as a
unique identifier (and forcing a user to change the TCN to make it
unique seems ... bad).
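
To make the intended behaviour concrete, here is a rough sketch in
Python (not the actual Perl from marc2bre.pl). The function names, the
in_use set standing in for the dontuse values, and the "_N" suffix used
for alternate TCNs are all assumptions made for illustration; the real
preprocess() may pick alternates differently.

```python
def assign_record_id(legacy_id, adjustid=0):
    """Shift a legacy system's numeric record id by a fixed offset.

    Mirrors the idea behind idfield + adjustid; names are hypothetical.
    """
    return int(legacy_id) + adjustid


def choose_tcn(candidate, record_id, in_use):
    """Pick a TCN for a record.

    Use the candidate TCN if one is available, otherwise fall back to
    the record id. If the value is already in use, try alternates until
    a free one is found. The "_N" suffix scheme is an assumption.
    """
    has_candidate = candidate is not None and candidate.strip() != ""
    base = candidate.strip() if has_candidate else str(record_id)

    tcn = base
    attempt = 0
    while tcn in in_use:
        attempt += 1
        tcn = f"{base}_{attempt}"

    in_use.add(tcn)  # remember it so later records cannot collide
    return tcn


# Example: one TCN is already present in the target database.
in_use = {"ocm12345"}
rid = assign_record_id("1001", adjustid=500000)
alternate = choose_tcn("ocm12345", rid, in_use)  # collides, gets an alternate
fallback = choose_tcn(None, rid, in_use)         # no TCN, uses the record id
```

Note that the TCN may be alphanumeric and is never forced through a
digits-only filter here, while the record id stays numeric.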

Item, hold and other records will usually use this internal identifier
to point at a bib record, and moving the old id (space-shifted by
adjustid) is much easier than trying to stitch things back together
using some other means ... impossible in some cases, in fact.

In any case, please do post your new version (or you can send it to me
directly if you'd prefer) and I'll go over it as soon as I can. Any
improvement and cleanup of marc2bre is a good thing, as it's a critical
component.

Thanks Dan!

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [EMAIL PROTECTED]
 | web: http://www.esilibrary.com
