I believe this thread started on ol-discuss, but it's now "technical." I 
tried running the test set of 100 records through marcedit, and got an 
error. I suspect that the problem is with the character set because I 
was able to validate the records (which I believe just looks at 
structure) with that same program. Looking at the raw data, it looks to 
me like the records are using the "non-filing" elements that were added 
to the MARC standard but were never implemented in the US. So this (in hex):

0x1f 0x98 0x61 0x44 0x61 0x73 0x9c

is the first part of

a˜Dasœ Imiut

Where the "˜" after the "a" and the "œ" after the "s" are how the 
non-filing characters print out. 
(The records claim to be in utf-8)
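One quick way to test the character set suspicion is to check whether the bytes quoted above decode as UTF-8 at all. This is only a sketch: is_valid_utf8 is a made-up helper name, and the sample is just the hex sequence from this message, not a full record.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The byte sequence quoted in this message (assumed to be copied correctly).
sample = bytes([0x1F, 0x98, 0x61, 0x44, 0x61, 0x73, 0x9C])

# A bare 0x98 is a UTF-8 continuation byte with no lead byte before it,
# so this sequence is not well-formed UTF-8.
print(is_valid_utf8(sample))

# For comparison, the code point U+0098 properly encoded is two bytes.
print("\x98".encode("utf-8"))
```

If the real records fail this check in the same way, that would be consistent with the utf-8 claim in the records being wrong, or with the markers having been written as raw single bytes.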

Because this never was implemented in the US it isn't documented in the 
MARC documentation. The latest info I can find is a 1998 proposal [1] 
that the control characters are:

Hex 'X88' nonsorting character, begin
Hex 'X89' nonsorting character, end

(I believe those are single-byte codes rather than Unicode code points, 
although at X'88' and X'89' they sit above the 7-bit ASCII range.)

For OL (which doesn't really need non-filing characters, I believe) we 
could just strip these characters out. If someone could strip them out 
of the current set I could run marcedit again. I'm just trying to get a 
good look at the records to see if they'll translate well to OL fields.
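The stripping could look roughly like the following. It is a minimal sketch under several assumptions: strip_nonsort is a hypothetical name, the marker values are the two seen in the sample bytes plus the two from the 1998 proposal, and it works on a field value that has already been pulled out of the record and decoded. Deleting bytes directly from a raw record would leave the lengths in the leader and directory out of step with the data, so the removal is better done after the fields are parsed.

```python
# Code points treated as non-filing markers. These are assumptions taken
# from this thread, not from any implemented MARC documentation.
NONSORT_MARKERS = {
    0x88: None,  # begin marker per the 1998 proposal
    0x89: None,  # end marker per the 1998 proposal
    0x98: None,  # begin marker as seen in the sample bytes
    0x9C: None,  # end marker as seen in the sample bytes
}


def strip_nonsort(text: str) -> str:
    """Remove non-filing markers from an already decoded field value."""
    return text.translate(NONSORT_MARKERS)


# Sample field value built from the bytes quoted earlier in this message.
# latin-1 maps each byte to the code point with the same number, so the
# markers survive decoding and can be removed afterwards.
raw_value = bytes([0x98, 0x44, 0x61, 0x73, 0x9C]) + b" Imiut"
print(strip_nonsort(raw_value.decode("latin-1")))  # prints "Das Imiut"
```

If the records really are UTF-8, the decode step would use "utf-8" instead and the same translate table would still apply, since it works on code points rather than bytes.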

I'm heading off for 10 days to the Dublin Core conference in Lisbon. If 
anyone else has time to do analysis on this, please feel free:

http://archive.org/details/marc21_records_german_national_library

kc

[1] http://www.loc.gov/marc/marbi/1998/98-16r.html
-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech