Re: [OPEN-ILS-GENERAL] Marc_stream_importer for batch loading

Janet Schrader Fri, 14 Mar 2014 14:48:06 -0700

Hi Martha,

This is good news. I hope I can accomplish this.


I have been loading files of 1,000 records each. It takes about 35-40 minutes
per file, but they don't time out. I do the queue first to get some idea of how
many will match and on what, because sometimes I want to overlay and preserve
the 856s and sometimes I want to just add the new 856s. Then I start the load.
While that load is processing, I do another queue. Still, I was only loading 5
or 6 files a day, so this will definitely speed up the process. Two libraries in
our consortium want me to load EBSCO records, 112,000 for each library.

If you select overlay 1 match and import non-matching, do you get a list of
the records that didn't load? Those would be the ones with 2 or more matches.




Thanks,
Janet

Janet Schrader
C/W MARS Inc.
Supervisor of Bibliographic Services
67 Millbrook Street, Suite 201
Worcester, MA 01606
tel: 508-755-3323 ext. 25
fax: 508-757-7801
[email protected]



-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Martha 
Driscoll
Sent: Friday, March 14, 2014 4:30 PM
To: [email protected]
Subject: [OPEN-ILS-GENERAL] Marc_stream_importer for batch loading

We have recently come up with a good way to load electronic resource records 
that I wanted to share.

We have been struggling with how to load our electronic resource MARC records
into Evergreen.  We constantly receive files from vendors and our cataloger
loads them through Vandelay.  Sometimes the records match on-file records and
just add an 856 link.  Other records are new and need to be added.  Vandelay is
a great tool because you can set up match criteria and overlay profiles.

The only problem is that Vandelay will time out with a file of more than 500
records.  We have tried splitting the files into 500-record chunks, but the
overhead in queuing up the files, especially when you split a 20,000-record
file into 40 pieces, can add up.

The solution we have been happy with is an updated version of
marc_stream_importer.pl that Bill Erickson recently worked on (LP# 1279998).
Bill added support for overlay 1 match, overlay best match, and import
non-matching records.  By default marc_stream_importer assumes you have
supplied a record ID in a 901 $c.  This version now supports all the Vandelay
options but can be run from the command line, which also means you can script
the loading of records.

Here is how I load a file:

marc_stream_importer.pl --spoolfile /home/opensrf/file-7 --user xxx \
  --password xxx --source 102 --merge-profile 2 --queue 11391 \
  --auto-overlay-best-match --import-no-match --nodaemon

The record source and merge profile are specified on the command line. 
The queue contains the record match set.  If there are no errors, 
marc_stream_importer will empty the queue.

I can find the record IDs of records added or updated in the log files:

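#!/usr/bin/perl
use strict;
use warnings;

# Collect the activity log lines for this queue.
my @imported = `grep queue=11391 /var/log/evergreen/prod/2014/03/14/activity.log`;

foreach my $line (@imported) {
    # Skip lines where imported_as is empty (no record was imported).
    next if $line =~ /imported_as= ischanged/;
    # Reduce the line to its imported_as=<record ID> field.
    $line =~ s/.*(imported_as=[0-9]+) .*/$1/;
    print $line;
}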

Marc_stream_importer, like Vandelay, still has problems loading more than 500
records at a time.  I was getting 'out of shared memory' errors (see
LP#1271661).  The good news is that files can be easily split using
yaz-marcdump and then the commands can be stacked in a shell script.

Here is how to split a file into 500-record files:

yaz-marcdump -i marc -o marc -s file- -C 500 mybigfile.mrc > /dev/null

Then it's just a matter of creating a shell script to run through the files one
at a time, piping the output to a log file so I can verify the records loaded.
Over the last 4 nights I was able to load 4 files of 5900 records each.
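
For anyone who wants a starting point, here is a minimal sketch of such a wrapper script. It is not the script used here: the function name load_chunks, the IMPORTER variable, and the log layout are made up for illustration, and the importer options are simply copied from the command line shown earlier. It assumes the split files share a common prefix (such as file-) and that the importer exits non-zero when a load fails, which you should confirm on your own system.

```shell
#!/bin/sh
# Hypothetical wrapper: load each split file in turn and log the output.
# IMPORTER is an illustrative variable; override it to test without Evergreen.
IMPORTER="${IMPORTER:-marc_stream_importer.pl}"

load_chunks() {
    prefix="$1"     # common prefix of the split files, e.g. /home/opensrf/file-
    logfile="$2"    # importer output is appended here for later review

    for f in "$prefix"*; do
        # Skip anything that is not a regular file (including an unmatched glob).
        [ -f "$f" ] || continue

        echo "=== loading $f ===" >> "$logfile"

        # Options copied from the example command; adjust source, merge
        # profile, and queue for your own site.
        $IMPORTER --spoolfile "$f" --user xxx --password xxx \
            --source 102 --merge-profile 2 --queue 11391 \
            --auto-overlay-best-match --import-no-match --nodaemon \
            >> "$logfile" 2>&1 || {
                # Assumption: a non-zero exit status means the load failed.
                echo "FAILED on $f" >> "$logfile"
                return 1
            }
    done
}
```

A call would look like: load_chunks /home/opensrf/file- /home/opensrf/load.log

The files are processed in shell glob order, which is alphabetical rather than numeric. Stopping at the first failure keeps a bad chunk from being buried under the output of later ones.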

--
Martha Driscoll
Systems Manager
North of Boston Library Exchange
Danvers, Massachusetts
www.noblenet.org
