On 9/12/13 8:37 AM, Tom Morris wrote:
> It looks like the 1998 proposal was approved according to these
> guidelines from June:
> http://www.loc.gov/marc/nonsorting.html

Yes, it was approved, but never implemented in the US. It was added to aid the transition of the German libraries to MARC - they already had this capability in their format (MAB). I've never seen it in "live" records before, so it still has only limited use.

> OK, after a maze of documents all pointing at each other, I found a place
> that defines this in a useful fashion:
> http://lcweb2.loc.gov/diglib/codetables/45.html
>
> MARC-8 MARC-8 as C1 UCS UTF-8 CHAR C? NAME ALT ALT UTF-8
> 88 0098 C298 ˜ NON-SORT BEGIN / START OF STRING
> 89 009C C29C œ NON-SORT END / STRING TERMINATOR
>
> which explains the oe ligature in your data, although the graphic
> representation doesn't mean it's the same as the real tilde and oe
> ligature. The real tilde has a UTF-8 representation of 0x7E instead of
> 0xC298.

Great, thanks. I'd forgotten about those code tables. So the 88 and 89 are 8-bit codes as defined in MARC-8, not "normal" ASCII. MARC doesn't use the extended Latin combined characters, but has separate codes for character and diacritic. (And it redefines all of the 8-bit values in a proprietary way!)

> The weird thing is that your data seems to have the raw 0x98 and 0x9C
> without the 0xC2 byte introducing them. That doesn't seem correct on
> the surface, but I'm not sure where you cut & pasted your data from.

You can find them as 0098 and 009C in this code page:
http://www.unicode.org/charts/PDF/U0080.pdf

I did a hex display of the data in a text editor (TextMate) and can't attest to its accuracy. I also don't know whether the creation of the MARC file or the display of it altered something - that's the real bugaboo with trying to "look" at character sets. I no longer have a hex dump or binary dump program around. (At one point I could read both with ease... glad that's behind me!)
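For anyone who wants to check the byte values without a hex dump tool, here is a minimal Python sketch. It only uses the code points from the table quoted above (U+0098 and U+009C). The helper names are made up for illustration, and the idea that the tilde and oe ligature on screen come from a Windows-1252 reading of the raw bytes is an assumption, not something confirmed for this data.

```python
# Code points taken from the LoC table quoted in this thread.
NON_SORT_BEGIN = "\u0098"  # NON-SORT BEGIN / START OF STRING
NON_SORT_END = "\u009c"    # NON-SORT END / STRING TERMINATOR


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a character as upper-case hex."""
    return ch.encode("utf-8").hex().upper()


def is_valid_utf8(data: bytes) -> bool:
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


begin_utf8 = utf8_hex(NON_SORT_BEGIN)  # C298
end_utf8 = utf8_hex(NON_SORT_END)      # C29C
tilde_utf8 = utf8_hex("~")             # 7E, the ordinary ASCII tilde

# Assumption: a viewer that treats the raw bytes 0x98 and 0x9C as
# Windows-1252 would show a small tilde and an oe ligature.
as_cp1252 = bytes([0x98, 0x9C]).decode("cp1252")

# A bare 0x98 with no leading 0xC2 is not well-formed UTF-8.
bare_ok = is_valid_utf8(b"\x98")
prefixed_ok = is_valid_utf8(b"\xc2\x98")

print(begin_utf8, end_utf8, tilde_utf8, as_cp1252, bare_ok, prefixed_ok)
```

If the file really holds bare 0x98 and 0x9C bytes, it is either not UTF-8 or something altered it along the way; this sketch cannot tell which.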
>> For OL (which doesn't really need non-filing characters, I believe) we
>> could just strip these characters out. If someone could strip them out
>> of the current set I could run marcedit again. I'm just trying to get a
>> good look at the records to see if they'll translate well to OL fields.
>
> Rather than futzing around with closed source marcedit, could I just use
> PyMarc to make a formatted dump of a few records for you?

That would be great, thanks. Actually, the whole set of 100 that Johannes provided would be ideal:
https://dl.dropboxusercontent.com/u/38124925/dnb_sample.mrc

kc

> Tom
>
>> I'm heading off for 10 days to the Dublin Core conference in Lisbon. If
>> anyone else has time to do analysis on this, please feel free:
>>
>> http://archive.org/details/marc21_records_german_national_library
>>
>> kc
>>
>> [1] http://www.loc.gov/marc/marbi/1998/98-16r.html

--
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
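The "just strip these characters out" idea could look roughly like the Python sketch below. It assumes the field text has already been decoded to Unicode, so the markers appear as U+0098 and U+009C. The function names and the sample title are hypothetical and not taken from the DNB records. The second function, which drops the enclosed non-sorting text to build a sort form, goes beyond what was proposed for OL and is shown only for comparison.

```python
import re

NON_SORT_BEGIN = "\u0098"  # NON-SORT BEGIN / START OF STRING
NON_SORT_END = "\u009c"    # NON-SORT END / STRING TERMINATOR

# Translation table that deletes both marker characters.
_STRIP_TABLE = {ord(NON_SORT_BEGIN): None, ord(NON_SORT_END): None}

# Matches a marked non-sorting span, markers included (non-greedy).
_NON_SORT_SPAN = re.compile(NON_SORT_BEGIN + ".*?" + NON_SORT_END, re.DOTALL)


def strip_non_sort_markers(text: str) -> str:
    """Remove the marker characters but keep the text between them."""
    return text.translate(_STRIP_TABLE)


def sort_form(text: str) -> str:
    """Drop each marked span entirely, then remove any stray markers."""
    return strip_non_sort_markers(_NON_SORT_SPAN.sub("", text))


# Hypothetical title with a leading article marked as non-sorting.
title = "\u0098Der \u009cZauberberg"

display_title = strip_non_sort_markers(title)  # "Der Zauberberg"
sort_title = sort_form(title)                  # "Zauberberg"

print(display_title)
print(sort_title)
```

Running this over each subfield of the sample file, for instance while dumping records with PyMarc as suggested in the thread, is left out here because it depends on how the file is actually encoded.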
