I believe this thread started on ol-discuss, but it's now "technical." I tried running the test set of 100 records through MarcEdit and got an error. I suspect that the problem is with the character set, because I was able to validate the records (which I believe just looks at structure) with that same program. Looking at the raw data, it looks to me like the records are using the "non-filing" elements that were added to the MARC standard but never implemented in the US. So this (in hex):
0x1f 0x98 0x61 0x44 0x61 0x73 0x9c
is the first part of "aDas Imiut", where the "a" and "s" are printing out as the non-filing characters. (The records claim to be in UTF-8.) Because this was never implemented in the US, it isn't covered in the MARC documentation. The latest info I can find is a 1998 proposal [1] that defines the control characters as:

Hex 'X88' nonsorting character, begin
Hex 'X89' nonsorting character, end

(I believe those are ASCII characters, not Unicode.)

For OL (which doesn't really need non-filing characters, I believe), we could just strip these characters out. If someone could strip them out of the current set, I could run MarcEdit again. I'm just trying to get a good look at the records to see if they'll translate well to OL fields.

I'm heading off for 10 days to the Dublin Core conference in Lisbon. If anyone else has time to do analysis on this, please feel free:
http://archive.org/details/marc21_records_german_national_library

kc

[1] http://www.loc.gov/marc/marbi/1998/98-16r.html

--
Karen Coyle
http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
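Stripping the markers is not just a matter of deleting bytes. In a MARC (ISO 2709) record the leader stores the total record length and the directory stores each field's length and starting position, so deleting bytes without rebuilding those numbers would leave records that fail structural validation. Below is a minimal, standard-library-only Python sketch that cleans each field and then recomputes the directory and leader. It assumes the records are meant to be UTF-8 and treats four values as markers: 0x98 and 0x9C (as in the hex sample in the message) plus 0x88 and 0x89 (from the 1998 proposal). That marker set and the function names `strip_field`, `strip_record` and `strip_file` are hypothetical choices for illustration, not anything defined by the DNB or by MARC 21. It works on decoded characters rather than raw bytes, so a legitimate "Ü" (UTF-8 bytes C3 9C) survives, and it catches a marker whether it appears as proper UTF-8 (e.g. C2 98) or as a lone byte as in the sample.

```python
"""Sketch: strip non-filing markers from MARC 21 (ISO 2709) records.

Assumes UTF-8 data; the marker set below is an assumption, not a standard.
"""

FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"

# Assumed marker code points: 0x98/0x9C (seen in the sample record) and
# 0x88/0x89 (begin/end in the 1998 MARBI proposal).
NONFILING = {0x88, 0x89, 0x98, 0x9C}
# Python's "surrogateescape" handler decodes an invalid byte b as chr(0xDC00 + b),
# so this set catches the markers both as valid UTF-8 and as lone bytes.
TO_DROP = {chr(cp) for cp in NONFILING} | {chr(0xDC00 + cp) for cp in NONFILING}


def strip_field(data: bytes) -> bytes:
    """Drop marker characters from one field's bytes; 'Ü' (C3 9C) is untouched."""
    text = data.decode("utf-8", errors="surrogateescape")
    cleaned = "".join(ch for ch in text if ch not in TO_DROP)
    return cleaned.encode("utf-8", errors="surrogateescape")


def strip_record(record: bytes) -> bytes:
    """Clean every field of one record and rebuild the directory and leader."""
    leader = record[:24]
    base = int(leader[12:17])          # base address of data (leader 12-16)
    directory = record[24:base - 1]    # 12-byte entries, minus the terminator
    entries = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        field = record[base + start:base + start + length]  # ends with 0x1E
        entries.append((tag, strip_field(field)))

    # Recompute each field's length and start position after cleaning.
    new_dir = b""
    offset = 0
    for tag, field in entries:
        new_dir += tag + b"%04d%05d" % (len(field), offset)
        offset += len(field)
    new_dir += FIELD_TERMINATOR

    new_base = 24 + len(new_dir)
    body = b"".join(field for _, field in entries) + RECORD_TERMINATOR
    new_leader = (b"%05d" % (new_base + len(body)) + leader[5:12]
                  + b"%05d" % new_base + leader[17:24])
    return new_leader + new_dir + body


def strip_file(data: bytes) -> bytes:
    """Apply strip_record to each record in a file of concatenated records."""
    out = []
    for chunk in data.split(RECORD_TERMINATOR):
        chunk = chunk.lstrip()          # tolerate newlines between records
        if chunk:
            out.append(strip_record(chunk + RECORD_TERMINATOR))
    return b"".join(out)
```

As a quick check on the character-set theory, `b"\x1f\x98aDas\x9c".decode("utf-8")` raises `UnicodeDecodeError`, because 0x98 cannot start a UTF-8 sequence; after cleaning, the same subfield is just `\x1faDas`. A cleaned copy of the set could then be written with something like `open("clean.mrc", "wb").write(strip_file(open("dnb.mrc", "rb").read()))` (file names hypothetical) and run through MarcEdit again.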
