Hi Karen,

Sorry for the delay on this. Here are links to both the original MARC
extract and the PyMARC/MARCbreaker formatted text version:

MARC - https://dl.dropboxusercontent.com/u/33663928/dnb_sample.mrc
Text - https://dl.dropboxusercontent.com/u/33663928/dnb_sample.txt

I also came across this DNB MARC documentation, which may be useful:
http://www.dnb.de/SharedDocs/Downloads/DE/DNB/standardisierung/marc21FieldsDnbZdbRecords2009En.pdf?__blob=publicationFile

I originally tried to include the two files as attachments, since I
figured it would be useful to have them in the archive rather than at an
ephemeral Dropbox location, but the mailer is set up with a very low 40KB
maximum message size (this message would have been well under 200KB with
both attachments).

Tom

On Thu, Sep 12, 2013 at 12:18 PM, Karen Coyle <[email protected]> wrote:

> On 9/12/13 8:37 AM, Tom Morris wrote:
>
>> It looks like the 1998 proposal was approved according to these
>> guidelines from June:
>> http://www.loc.gov/marc/nonsorting.html
>
> Yes, it was approved, but never implemented in the US. It was added to aid
> the transition of the German libraries to MARC - they already had this
> capability in their format (MAB). I've never seen it in "live" records
> before, so it's still got only limited use.
>
>> OK, after a maze of documents all pointing at each other, I found a place
>> that defines this in a useful fashion:
>> http://lcweb2.loc.gov/diglib/codetables/45.html
>>
>> MARC-8 MARC-8
>> as C1 UCS UTF-8 CHAR C? NAME ALT ALT UTF-8
>> 88 0098 C298 ˜ NON-SORT BEGIN / START OF STRING
>> 89 009C C29C œ NON-SORT END / STRING TERMINATOR
>>
>> which explains the oe ligature in your data, although the graphic
>> representation doesn't mean it's the same as the real tilde and oe
>> ligature. The real tilde has a UTF-8 representation of 0x7E, not
>> 0xC298.
>
> Great, thanks. I'd forgotten about those code tables. So the 88, 89 were
> 8-bit ASCII, as defined in MARC-8, not in "normal" ASCII. MARC doesn't use
> the extended Latin combined characters, but has separate codes for
> character and diacritic. (And redefines all of the values of 8-bit ASCII in
> a proprietary way!)
>
>> The weird thing is that your data seems to have the raw 0x98 and 0x9C
>> without the 0xC2 byte introducing them. That doesn't seem correct on
>> the surface, but I'm not sure where you cut & pasted your data from.
>
> You can find them as 0098 and 009C in this code page:
> http://www.unicode.org/charts/PDF/U0080.pdf
>
> I did a hex display of the data in a text editor (textmate) and can't
> attest to its accuracy. I also don't know whether the creation of the
> MARC file or the display of it altered something - that's the real
> bugaboo with trying to "look" at character sets. I no longer have a hex
> dump or binary dump program around. (At one point I could read both with
> ease... glad that's behind me!)
>
>> For OL (which doesn't really need non-filing characters, I believe) we
>> could just strip these characters out. If someone could strip them out
>> of the current set I could run marcedit again. I'm just trying to get
>> a good look at the records to see if they'll translate well to OL
>> fields.
>>
>> Rather than futzing around with closed source marcedit, could I just use
>> PyMarc to make a formatted dump of a few records for you?
>
> That would be great, thanks. Actually, the whole set of 100 that Johannes
> provided would be ideal:
>
> https://dl.dropboxusercontent.com/u/38124925/dnb_sample.mrc
>
> kc
>
>> Tom
>>
>> I'm heading off for 10 days to the Dublin Core conference in Lisbon.
>> If anyone else has time to do analysis on this, please feel free:
>>
>> http://archive.org/details/marc21_records_german_national_library
>>
>> kc
>>
>> [1] http://www.loc.gov/marc/marbi/1998/98-16r.html
>
> --
> Karen Coyle
> [email protected] http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
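
For OL's purposes, the "just strip these characters out" idea from the thread can be sketched in a few lines of Python. This is only a sketch, assuming the field values have already been decoded to Unicode strings, so the markers show up as U+0098 and U+009C (UTF-8 C2 98 and C2 9C, per the LoC code table quoted above). The function names and the sample title are made up for illustration; they are not part of pymarc and are not taken from the DNB records.

```python
import re

# Non-sort control characters, per the LoC MARC-8 code table quoted above.
NON_SORT_BEGIN = "\u0098"  # MARC-8 0x88, UTF-8 C2 98
NON_SORT_END = "\u009c"    # MARC-8 0x89, UTF-8 C2 9C

# Translation table that deletes both markers and nothing else.
_MARKERS = str.maketrans("", "", NON_SORT_BEGIN + NON_SORT_END)

# Matches a marked span, markers included (non-greedy, so each
# begin marker pairs with the nearest end marker).
_NON_SORT_SPAN = re.compile(f"{NON_SORT_BEGIN}.*?{NON_SORT_END}")


def strip_non_sort_markers(text: str) -> str:
    """Drop the markers but keep the text they enclose (display form)."""
    return text.translate(_MARKERS)


def filing_form(text: str) -> str:
    """Drop the markers and the text they enclose (hypothetical sort key)."""
    return _NON_SORT_SPAN.sub("", text).strip()


if __name__ == "__main__":
    # Made-up sample title with a marked leading article.
    title = "\u0098Der \u009cZauberberg"
    print(strip_non_sort_markers(title))
    print(filing_form(title))
```

If the data really contains the raw 0x98 and 0x9C bytes without the leading 0xC2, as discussed above, the decoding step would have to be sorted out first; this sketch does not attempt that.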
