Picking up on Dan Wells's note about revising marc2bre, I've been playing with an older VMware image of Evergreen (v1.2.1.3 or something like that). Why? 'Cause this image was built with 20 GB of (virtual) disk space, plenty to handle a copy of Creighton's bibs (669,000) and associated items (1.1 million). There are other, newer VMware images, but they don't have as much space.
Lots of space gives me room to run scripts against, say, a 1 GB input file and get ginormous (1, 2, 4, 5, 7 or more GB) output files out the other end.

My problem at the moment is that pg_loader_bre.ql (I'm guessing at the file name, but you'll know what I mean) contains duplicate "bre" records. The target table in PostgreSQL has at least one column set to "no dups," so the import fails. Dan Scott suggested I grep around the duplicate records. That's always an option, but I reasoned it'd be quicker to create a clean export file on the source system and then process that clean file on the target side.

I found a way to tell the export program on the source side to put an integer into the tag and subfield of my choice. This integer simply numbers the bib records in the output file from first to last, so I have a guaranteed (sic?) unique ID number in the source file. However, I discovered on import that some other field is also declared unique, so PostgreSQL does what it does and stops the import when it finds a duplicate key.

Hmmm, what to do? I suggest the processing script (somehow) identify duplicate records, write them to an exception file, and skip to the next record. This is potentially difficult because the import scripts, several *.sql files, contain records related to the *.bre records. So a duplicate *.bre record would have to be skipped along with any related records in the other files. I wonder how to do this?

Mark

Mark Andrews, MLS, Systems Librarian
Academic & e-Learning Technologies, Division of Information Technology
Creighton University, Omaha, NE 68178
402-280-3065 - [EMAIL PROTECTED]
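
For what it's worth, here is a rough Python sketch of the skip-and-log idea above. It is not taken from the Evergreen import scripts. It assumes the records have been pulled out as tab-delimited text, one record per line, which may not match what the real loader files look like. The function names and column positions are made up for illustration. It also assumes each record's ID is unique (as with the numbering trick above), so a skipped duplicate never shares an ID with the record that was kept.

```python
import io


def filter_duplicates(src, dst, exceptions, key_col, id_col, delimiter="\t"):
    """Copy lines from src to dst, diverting repeats of key_col to exceptions.

    Returns the set of record IDs (id_col) belonging to the diverted lines,
    so related records in other files can be skipped as well.
    """
    seen = set()         # unique-key values already written to dst
    skipped_ids = set()  # IDs of records sent to the exception file
    for line in src:
        fields = line.rstrip("\n").split(delimiter)
        key = fields[key_col]
        if key in seen:
            exceptions.write(line)
            skipped_ids.add(fields[id_col])
        else:
            seen.add(key)
            dst.write(line)
    return skipped_ids


def filter_related(src, dst, exceptions, parent_col, skipped_ids, delimiter="\t"):
    """Copy lines from src to dst, diverting any whose parent ID was skipped."""
    for line in src:
        fields = line.rstrip("\n").split(delimiter)
        if fields[parent_col] in skipped_ids:
            exceptions.write(line)
        else:
            dst.write(line)


# Tiny in-memory demo: column 0 is the record ID, column 1 the unique field.
bibs = io.StringIO("1\tA\n2\tB\n3\tA\n")
clean_bibs, bib_exceptions = io.StringIO(), io.StringIO()
skipped = filter_duplicates(bibs, clean_bibs, bib_exceptions, key_col=1, id_col=0)

# Related file: column 1 points back at the bib record ID.
items = io.StringIO("10\t1\n11\t3\n12\t2\n")
clean_items, item_exceptions = io.StringIO(), io.StringIO()
filter_related(items, clean_items, item_exceptions, parent_col=1, skipped_ids=skipped)
```

Both passes read one line at a time, so the multi-GB files never have to fit in memory. Only the sets of keys and skipped IDs are held, which should be modest for 669,000 bibs. Real files would be opened with open() in place of the StringIO objects.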
