RE: Corrupt MARC records

2005-05-11 Thread Ron Davies
At 16:58 7/05/2005, Andrew Houghton wrote:
The code is off the top of my head and parts have been copied from a 
variety of Perl scripts I had hanging around.  It isn't tested, but 
hopefully a start for your work.
Thanks, Andy, there's a lot there that I can put to good use. Much more 
elegant code than I could have written off the top of MY head.

At 21:10 7/05/2005, Ed Summers wrote:
It's ironic that MARC::Record *used* to do what Andrew suggests: using 
split() rather than substr() with the actual directory lengths. The reason 
for the switch was just as Andrew pointed out: the order of the tags in the 
directory is not necessarily the order of the field data.
Has anybody ever seen a MARC record where the order of the field data 
wasn't the same as that of the entries in the directory? I'm not 
questioning the logic of reading a record using the field lengths and 
offsets, just wondering if anybody had ever seen this occur in the wild. I 
never have.

Thanks again for your help and the confirmation that this kind of 
correction is a reasonable thing to do.

Ron
Ron Davies
Information and documentation systems consultant
Av. Baden-Powell 1  Bte 2, 1200 Brussels, Belgium
Email:  ron(at)rondavies.be
Tel:+32 (0)2 770 33 51
GSM:+32 (0)484 502 393 

Re: Corrupt MARC records

2005-05-11 Thread Colin Campbell
Ron Davies wrote:
Has anybody ever seen a MARC record where the order of the field data 
wasn't the same as that of the entries in the directory? I'm not 
questioning the logic of reading a record using the field lengths and 
offsets, just wondering if anybody had ever seen this occur in the wild. 
I never have.

I have, although I can't recall where it came from (it was some years 
ago). The problem was exacerbated because the program reading it assumed 
that the directory and field sequences matched, and it was not flagging 
any errors. It was some time later that users spotted that some records 
were odd, and it took a while to trace it back to this cause.

Colin
--
Colin Campbell
Software Development Consultant
Sirsi Ltd


Re: Corrupt MARC records

2005-05-07 Thread Ed Summers
I wondered if any of you had run into similar problems, or if you had 
any thoughts on how to tackle this particular issue.
It's ironic that MARC::Record *used* to do what Andrew suggests: using 
split() rather than substr() with the actual directory lengths. The 
reason for the switch was just as Andrew pointed out: the order of the 
tags in the directory is not necessarily the order of the field data.
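
To make the difference concrete, here is a minimal sketch of the two reading strategies. It is in Python purely for illustration and is not the MARC::Record code; make_record, read_by_directory and read_by_split are hypothetical helper names, and the toy record is ASCII-only with unique tags, so character offsets equal byte offsets.

```python
FT, RT = "\x1e", "\x1d"  # field terminator, record terminator

def make_record(directory_order, data_order):
    """Build a toy record; both arguments are lists of (tag, value) pairs.

    Assumes unique tags and ASCII values (illustration only).
    """
    offsets, data = {}, ""
    for tag, value in data_order:
        offsets[tag] = (len(value) + 1, len(data))  # length includes terminator
        data += value + FT
    directory = "".join(
        f"{tag}{offsets[tag][0]:04d}{offsets[tag][1]:05d}"
        for tag, _ in directory_order
    ) + FT
    base = 24 + len(directory)          # base address of the data area
    total = base + len(data) + 1        # +1 for the record terminator
    leader = f"{total:05d}nam  22{base:05d}   4500"
    return leader + directory + data + RT

def directory_entries(rec):
    """Yield (tag, length, start) from the 12-character directory entries."""
    base = int(rec[12:17])
    directory = rec[24:base - 1]
    for i in range(0, len(directory), 12):
        yield directory[i:i + 3], int(directory[i + 3:i + 7]), int(directory[i + 7:i + 12])

def read_by_directory(rec):
    """Trust the directory: slice each field using its start and length."""
    base = int(rec[12:17])
    return [(tag, rec[base + start:base + start + length - 1])
            for tag, length, start in directory_entries(rec)]

def read_by_split(rec):
    """Ignore lengths: split the data area on the field terminator and
    pair the chunks with the tags in directory order."""
    base = int(rec[12:17])
    tags = [tag for tag, _, _ in directory_entries(rec)]
    chunks = rec[base:].rstrip(RT).split(FT)[:-1]
    return list(zip(tags, chunks))

# Directory lists 245 first, but the data area holds the 100 field first.
rec = make_record([("245", "Title"), ("100", "Author")],
                  [("100", "Author"), ("245", "Title")])
print(read_by_directory(rec))  # tags paired with the right data
print(read_by_split(rec))      # tags paired with the wrong data
```

With a well-formed record the directory-driven reader gets the right answer regardless of ordering, while the split-based reader mislabels the fields as soon as the two orders differ.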

If you need to you could try downloading MARC::Record v1.17 and try 
using that. Or you could roll your own code and cut and paste it 
everywhere like Andrew ;-)

//Ed


RE: Corrupt MARC records

2005-05-07 Thread Houghton,Andrew
 
Most MARC utilities like MARC::Record depend upon the actual directory lengths 
and on the record having a well-formed structure.  Isn't that what standards 
are for?  But sometimes you really do get badly formed MARC records and need 
to recover the data.  The presented code does have two caveats, which I 
pointed out and Ed reiterates.  The directory *must* be in the same order as 
the fields.

However, even if the fields are not in the same order as the directory, code 
could be written to take that into account, so long as you can assume that the 
start position in each directory entry is at least close to where that field's 
data actually begins.  If we sort the directory on the start position field, 
we get the entries in the order needed for extraction by the presented code.
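
A rough sketch of that sorting step follows. It is written in Python for illustration only, is not the code from earlier in the thread, and uses the hypothetical names parse_directory and recover_fields. It assumes the start positions are close enough to sort correctly even though the lengths and exact offsets may be wrong, and that the field terminators in the data area are intact.

```python
FT = "\x1e"  # field terminator

def parse_directory(directory):
    """Split a directory string into (tag, length, start) tuples,
    one per 12-character entry."""
    return [(directory[i:i + 3],
             int(directory[i + 3:i + 7]),
             int(directory[i + 7:i + 12]))
            for i in range(0, len(directory), 12)]

def recover_fields(directory, data):
    """Pair tags with field data by physical order.

    The directory is sorted on start position, the data area is split on
    the field terminator, and the two sequences are matched up one to one.
    Lengths are ignored because they are assumed to be unreliable.
    """
    by_start = sorted(parse_directory(directory), key=lambda entry: entry[2])
    chunks = data.split(FT)
    if chunks and chunks[-1] == "":
        chunks.pop()  # drop the empty piece after the last terminator
    return [(tag, chunk) for (tag, _length, _start), chunk in zip(by_start, chunks)]

# Directory lists 245 before 100, and its lengths and offsets are slightly off.
directory = "245000900005" + "100000400000"
data = "Author" + FT + "Title" + FT
print(recover_fields(directory, data))
# [('100', 'Author'), ('245', 'Title')]
```

Note that zip() silently stops at the shorter sequence, so a real recovery script would probably want to compare the number of directory entries with the number of chunks and flag any mismatch.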

Of course, you would probably want to keep track of the original directory and 
the sorted directory order so you can output the MARC record with the fields in 
the same order as the original.  Things are never ideal when you have corrupt 
MARC records...
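
One way to keep both orders around, sketched in Python for illustration only (recover_in_directory_order is a hypothetical name, and the directory is assumed to be already parsed into (tag, start) pairs): remember each entry's original index, sort on start position to match tags to data, then sort back on the remembered index.

```python
def recover_in_directory_order(entries, chunks):
    """Match tags to field data, then return them in original directory order.

    entries: (tag, start) pairs in the order they appear in the directory.
    chunks:  field data in the physical order found in the record.
    """
    indexed = list(enumerate(entries))                     # remember original order
    by_start = sorted(indexed, key=lambda item: item[1][1])  # physical order
    paired = [(index, tag, chunk)
              for (index, (tag, _start)), chunk in zip(by_start, chunks)]
    paired.sort(key=lambda triple: triple[0])              # back to directory order
    return [(tag, chunk) for _index, tag, chunk in paired]

entries = [("245", 7), ("100", 0), ("650", 13)]
chunks = ["Author", "Title", "Subject"]
print(recover_in_directory_order(entries, chunks))
# [('245', 'Title'), ('100', 'Author'), ('650', 'Subject')]
```

Because Python's sort is stable, entries that share a start position keep their relative directory order, which seems like the least surprising behaviour for a corrupt record.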


Andy.
