[CODE4LIB] a note on MARC8 to UTF8 transcoding: Character references

2013-11-05 Thread Jonathan Rochkind
Do you sometimes deal with MARC in the MARC8 character encoding? Do you deal with software that converts from MARC8 to UTF8? Maybe sometimes you've seen weird escape sequences that look like HTML or XML character references, like, say, &#x200F;. You, like me, might wonder what the heck

Re: [CODE4LIB] a note on MARC8 to UTF8 transcoding: Character references

2013-11-05 Thread Terry Reese
Yeah -- this has been part of the MARC standard for quite some time (2004?). LC added it as a way to protect round-trip ability. MarcEdit has supported this for years -- it's actually one of the questions that I have to answer occasionally when people translate UTF8 code outside of the MARC8
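
For illustration only (this is not code from the thread): a minimal Python sketch of how software might turn such references back into real characters after a MARC8 to UTF8 conversion. It assumes the references take the hexadecimal form &#xXXXX; with four to six hex digits; the names NCR_PATTERN and decode_ncrs are made up for this example and do not come from MarcEdit or any other tool.

```python
import re

# Assumed form of the references: "&#x" + 4 to 6 hex digits + ";"
NCR_PATTERN = re.compile(r"&#x([0-9A-Fa-f]{4,6});")

def decode_ncrs(text):
    """Replace hexadecimal character references with the characters they name."""
    def replace(match):
        code_point = int(match.group(1), 16)
        # Leave out-of-range values and surrogates untouched rather than guess.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)
    return NCR_PATTERN.sub(replace, text)

if __name__ == "__main__":
    converted = "abc&#x200F;def"
    print(repr(decode_ncrs(converted)))
```

Whether to apply a step like this at all depends on the data: a field that legitimately contains the literal text of a character reference would be altered too, so it is probably safest to run it only on output known to come from a MARC8 transcoding step.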

Re: [CODE4LIB] a note on MARC8 to UTF8 transcoding: Character references

2013-11-05 Thread Bryan Baldus
So be warned, you may need to add this to your software too. One of these that may cause problems in some systems (including the ones we use; hopefully our customers' systems deal with it more appropriately) is the character used in the middle of [1], the Extended Roman alif character which