Hi all.

After a year's hiatus, I am back loading data into Evergreen. It goes much 
more smoothly than last time. Thanks for the improvements!

I do have a few comments.

1) The instructions at 
http://open-ils.org/dokuwiki/doku.php?id=evergreen-admin:importing:bibrecords, 
in the introductory section, say "If you take this approach, due to a current 
limitation of MARC::File::XML you have to do a horrible thing and ensure that 
there are no namespace prefixes in front of the element names. marc2bre.pl 
cannot parse the following example"... In fact, yaz-marcdump does not seem to 
insert namespace prefixes, so if you use it to convert, there are no issues 
with that.

2) Point one notwithstanding, marc2bre stops dead on MARCXML records with 
'bad' characters. I ended up biting the 'time' bullet and using marc2bre on 
'real' MARC records, because that doesn't choke on the odd hex character. All 
that being said, the marc2bre run went from an average of 305 records per 
second with XML to 175 per second with UTF-8 MARC, so it wasn't a terrible 
speed hit.
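
For anyone hitting the same thing: XML 1.0 forbids most control characters 
below hex 20 (only tab, newline, and carriage return are allowed), which would 
explain a strict XML parser stopping on bytes like hex 06 and hex 01. Below is 
a minimal Perl sketch of pre-cleaning the MARCXML text, assuming it is 
acceptable to simply drop those characters. strip_xml_illegal is a 
hypothetical helper name, not something that ships with marc2bre.

```perl
use strict;
use warnings;

# Hypothetical helper: delete the C0 control characters that XML 1.0 does
# not allow (everything below 0x20 except tab, newline, carriage return).
sub strip_xml_illegal {
    my ($text) = @_;
    (my $clean = $text) =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d;
    return $clean;
}

# Made-up sample line with a hex 06 and a hex 01 embedded in the data.
my $sample = "<subfield code=\"a\">Green\x06 tea\x01</subfield>\n";
print strip_xml_illegal($sample);
```

This is only meant for the XML form. Binary MARC uses hex 1D, 1E, and 1F as 
its own record, field, and subfield delimiters, so the same filter would 
damage a 'real' MARC file.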

3) after using the real marc to get a bre file, I had issues with pg_ingest 
choking on what were probably the same bad characters in the bre file... but 
it, at least, just noisily dropped the bad records. If I were to take:

sub rawJSON2perl {
    my $class = shift;
    my $json = shift;
    return undef unless defined $json and $json !~ /^\s*$/o;
    return $parser->decode($json);
}

from JSON.pm and replace it with 

sub rawJSON2perl {
    my $class = shift;
    my $json = shift;
    return " " unless defined $json and $json !~ /^\s*$/o;
    return $parser->decode($json);
}

would that keep the records but just 'blank' the otherwise bad characters? 
(There seem to be more than a few hex 06 and hex 01 and some others in my 
data... what can I say?)
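
Reading the snippet as written, the changed line only affects what comes back 
for undefined or whitespace-only input; a failure on a bad character would 
still come from $parser->decode itself. A rough sketch of the 'blanking' idea, 
then, is to replace raw control characters with spaces before decoding. This 
assumes JSON::PP as a stand-in for whatever $parser is in JSON.pm, and 
raw_json_to_perl_blanking is a hypothetical name, so treat it as an 
illustration rather than a patch.

```perl
use strict;
use warnings;
use JSON::PP;

# Assumption: JSON::PP stands in for the $parser object used in JSON.pm.
my $parser = JSON::PP->new;

# Hypothetical variant of rawJSON2perl: turn every raw control character
# (hex 00 through hex 1F) into a space, then decode. Between tokens this
# is still just whitespace; inside a string it 'blanks' the bad byte.
sub raw_json_to_perl_blanking {
    my ($json) = @_;
    return undef unless defined $json and $json !~ /^\s*$/;
    (my $clean = $json) =~ tr/\x00-\x1F/ /;
    return $parser->decode($clean);
}

# Made-up record with a raw hex 06 inside a string value.
my $rec = raw_json_to_perl_blanking(qq({"title":"Green\x06tea"}));
print $rec->{title}, "\n";
```

Escaped sequences such as a backslash followed by n are left alone, since only 
the raw bytes are touched.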


Other than those points, I am presently flogging my poor old 1 GB Ubuntu system 
by doing a pg_loader on a 6-million-line ingest file... and it may be slow, but 
it sure isn't fast, either!

don (waiting for loader to finish).

ps... If you could remind me what my id/password for the wiki might be, I would 
make these notes on the loader page.

pps. As I re-read this, it occurs to me I may have had too much green tea 
today.... but I'm hitting send, anyway


