On 9/12/13 8:37 AM, Tom Morris wrote:

>
>
> It looks like the 1998 proposal was approved according to these
> guidelines from June:
> http://www.loc.gov/marc/nonsorting.html


Yes, it was approved, but never implemented in the US. It was added to 
aid the transition of the German libraries to MARC - they already had 
this capability in their format (MAB). I had never seen it in "live" 
records before, so it has had only limited use.
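As a rough illustration of how the non-sort markers are meant to work: the text between NON-SORT BEGIN and NON-SORT END is displayed but skipped when filing. This is a minimal sketch, assuming the record has already been decoded to Unicode (so the markers are U+0098 and U+009C, the Unicode forms of MARC-8 0x88 and 0x89 in the LC code table quoted below) and that markers are paired and not nested. The function name and the sample title are made up for illustration.

```python
NSB = "\u0098"  # NON-SORT BEGIN (MARC-8 0x88)
NSE = "\u009c"  # NON-SORT END (MARC-8 0x89)


def display_and_sort_forms(value: str) -> tuple[str, str]:
    """Return (display, sort) forms of a field value with non-sort markers.

    display: markers removed, the enclosed text kept.
    sort:    markers and the enclosed text both removed.
    Assumes markers are properly paired and not nested.
    """
    display = value.replace(NSB, "").replace(NSE, "")
    sort_chars = []
    skipping = False
    for ch in value:
        if ch == NSB:
            skipping = True
        elif ch == NSE:
            skipping = False
        elif not skipping:
            sort_chars.append(ch)
    return display, "".join(sort_chars)


title = NSB + "Der " + NSE + "Prozess"  # hypothetical sample title
print(display_and_sort_forms(title))  # ('Der Prozess', 'Prozess')
```

Stripping only the markers (the display form) is what OL would need; the sort form shows what the markers are there for.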


>

>
> OK, after maze of documents all pointing at each other, I found a place
> that defines this in a useful fashion:
> http://lcweb2.loc.gov/diglib/codetables/45.html
>
> MARC-8  MARC-8 as C1  UCS   UTF-8  CHAR  C?  NAME                              ALT  ALT UTF-8
>         88            0098  C298   ˜         NON-SORT BEGIN / START OF STRING
>         89            009C  C29C   œ         NON-SORT END / STRING TERMINATOR
>
> which explains the oe ligature in your data, although the graphic
> representation doesn't mean it's the same as the real tilde and oe
> ligature.  The real tilde has UTF-8 representation of 0x7E instead of
> 0xC298.


Great, thanks. I'd forgotten about those code tables. So the 88 and 89 
were 8-bit codes as defined in MARC-8 (its C1 control range), not 
characters in "normal" ASCII, which only defines 7 bits. MARC-8 doesn't 
use precomposed accented letters; it has separate codes for the base 
character and the diacritic. (And it assigns its own MARC-specific 
meanings to the whole upper half of the 8-bit range!)
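To make the "separate codes" point concrete: in MARC-8 the combining diacritic byte comes before the letter it modifies, while Unicode puts the combining mark after the base letter. The sketch below is illustrative only, under these assumptions: just four diacritics are mapped (grave E1, acute E2, circumflex E3, umlaut E8, values believed to match the LC ANSEL table but worth checking against it), only ASCII base letters are handled, and the function name is hypothetical. A real converter needs the full LC code tables.

```python
import unicodedata

# Illustrative subset of MARC-8 (ANSEL) combining diacritics mapped to
# Unicode combining marks. Assumption: verify these against the LC code
# tables; a real converter needs the complete tables.
MARC8_COMBINING = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE8: "\u0308",  # umlaut / diaeresis
}


def marc8_subset_to_unicode(data: bytes) -> str:
    """Decode ASCII plus a few MARC-8 diacritics into NFC Unicode.

    MARC-8 puts the diacritic byte before the base letter; Unicode puts
    the combining mark after it, so marks are held until the next base
    character arrives.
    """
    out: list[str] = []
    pending: list[str] = []
    for b in data:
        if b in MARC8_COMBINING:
            pending.append(MARC8_COMBINING[b])
        elif b < 0x80:
            out.append(chr(b))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{b:02X} is outside this sketch's subset")
    out.extend(pending)  # dangling marks with no base letter are kept as-is
    # NFC merges base letter + combining mark into a precomposed letter
    # wherever Unicode has one.
    return unicodedata.normalize("NFC", "".join(out))


print(marc8_subset_to_unicode(b"M\xe8unchen"))  # München
```

The reordering step is the part that trips up naive byte-for-byte conversions.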


>
> The weird thing is that your data seems to have the raw 0x98 and 0x9C
> without the 0xC2 byte introducing them.  That doesn't seem correct on
> the surface, but I'm not sure where you cut & pasted your data from.

You can find them as U+0098 and U+009C in the Unicode code chart for C1 
Controls and Latin-1 Supplement:
http://www.unicode.org/charts/PDF/U0080.pdf

I did a hex display of the data in a text editor (TextMate) and can't 
vouch for its accuracy. I also don't know whether the creation of the 
MARC file or its display altered something - that's the real bugaboo 
with trying to "look" at character sets. I no longer have a hex dump or 
binary dump program around. (At one point I could read both with 
ease... glad that's behind me!)
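For lack of a hex dump program, a small standard-library sketch can do both jobs: print an offset/hex/ASCII dump of some bytes, and report how the non-sort markers show up in a raw record. Assumptions: leader position 09 is set correctly ('a' for Unicode, blank for MARC-8); in a Unicode record, a bare 0x98 or 0x9C with no lead byte makes UTF-8 decoding fail, so the report gives only the offset of the first bad byte; MARC-8 escape sequences are ignored. The function names and the sample bytes (a leader plus a bare field value, not a complete record) are made up for illustration.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return a simple offset / hex / ASCII dump of data."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = chunk.hex(" ")
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)


def nonsort_report(record: bytes) -> dict[str, object]:
    """Report how NON-SORT BEGIN/END appear in one raw MARC record.

    Leader position 09 is 'a' for UCS/Unicode and blank for MARC-8.
    """
    scheme = "unicode" if record[9:10] == b"a" else "marc-8"
    report: dict[str, object] = {"scheme": scheme}
    if scheme == "unicode":
        try:
            text = record.decode("utf-8")
        except UnicodeDecodeError as err:
            # A stray continuation byte such as a bare 0x98 or 0x9C
            # (no C2 lead byte) makes decoding fail here.
            report["invalid_utf8_at"] = err.start
            return report
        report["nsb"] = text.count("\u0098")
        report["nse"] = text.count("\u009c")
    else:
        # In MARC-8 the markers are the single C1 bytes 0x88 and 0x89.
        report["nsb"] = record.count(b"\x88")
        report["nse"] = record.count(b"\x89")
    return report


# Hypothetical sample: a Unicode leader followed by a field value.
sample = b"00000nam a2200000 i 4500" + "\u0098Der \u009cProzess".encode("utf-8")
print(nonsort_report(sample))
print(hexdump(sample))
```

In the dump, correctly encoded markers show up as `c2 98` and `c2 9c`; a lone `98` or `9c` after an ASCII byte is the pattern that looked wrong in the data.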


>
>     For OL (which doesn't really need non-filing characters, I believe) we
>     could just strip these characters out. If someone could strip them out
>     of the current set I could run marcedit again. I'm just trying to get a
>     good look at the records to see if they'll translate well to OL fields.
>
>
> Rather than futzing around with closed source marcedit, could I just use
> PyMarc to make a formatted dump of a few records for you?


That would be great, thanks. Actually, the whole set of 100 that 
Johannes provided would be ideal:

https://dl.dropboxusercontent.com/u/38124925/dnb_sample.mrc
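If pymarc isn't handy, the ISO 2709 record structure is simple enough to walk with the Python standard library. A minimal sketch, assuming UTF-8 records (leader/09 = 'a') and nothing such as newlines between records; it is not pymarc's API, does no MARC-8 conversion, and the function names and the "=245  10$a..." line layout are made up for illustration. Undecodable bytes are shown as U+FFFD rather than raising, which makes stray bytes easy to spot.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator


def iter_records(data: bytes):
    """Split raw file bytes into records on the record terminator.

    Assumes nothing (such as newlines) sits between records.
    """
    for chunk in data.split(RT):
        if chunk:
            yield chunk + RT


def dump_record(raw: bytes, encoding: str = "utf-8") -> list[str]:
    """Return one text line per field of a single ISO 2709 record."""
    leader = raw[:24]
    base = int(leader[12:17])     # base address of data (leader/12-16)
    directory = raw[24:base - 1]  # drop the directory's own terminator
    lines = ["=LDR  " + leader.decode("ascii", errors="replace")]
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]  # tag(3) + length(4) + start(5)
        tag = entry[:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        field = raw[base + start:base + start + length].rstrip(FT)
        if tag < "010":
            # Control fields (001-009) have no indicators or subfields.
            lines.append(f"={tag}  " + field.decode(encoding, errors="replace"))
            continue
        # Blank indicators are shown as backslashes.
        indicators = field[:2].decode("ascii", errors="replace").replace(" ", "\\")
        # Show each subfield delimiter (0x1F) as "$".
        text = field[2:].decode(encoding, errors="replace").replace("\x1f", "$")
        lines.append(f"={tag}  {indicators}{text}")
    return lines
```

For the sample file, something like `for raw in iter_records(open("dnb_sample.mrc", "rb").read()): print("\n".join(dump_record(raw)))` would print each record field by field.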

kc


>
> Tom
>
>     I'm heading off for 10 days to the Dublin Core conference in Lisbon. If
>     anyone else has time to do analysis on this, please feel free:
>
>     http://archive.org/details/marc21_records_german_national_library
>
>     kc
>
>     [1] http://www.loc.gov/marc/marbi/1998/98-16r.html
>
>

-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to 
[email protected]
