Re: How to convert from ANSEL/MARC-8 to UTF-8?
On 07.01.2009 17:47 Galen Charlton wrote:
> You can use NFC() from Unicode::Normalize to do this (after using
> MARC::Charset to do the conversion to UTF-8).

Now I've got it! I had an "encode" too early, so NFC() was applied to the UTF-8 octets instead of to decoded characters, and it didn't work.

Regarding presenting it on the web: technically everything is fine and works whether the characters are precomposed or not (I have all the necessary headers); the non-combined version just looks ugly.

Thanks for all your help and suggestions
-Michael
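Michael's fix comes down to order of operations: normalization works on characters, so the text has to be in decoded form when NFC() runs, and encoding to UTF-8 octets belongs at output time. The sketch below illustrates that order in Python, whose standard `unicodedata.normalize()` applies the same Unicode normalization forms as Perl's Unicode::Normalize; the helper name `to_utf8_for_output` and the sample string are made up for illustration. (In Perl, NFC() on an already-encoded string sees each octet as a separate character, so there is no base-plus-combining-mark pair for it to compose; Python refuses bytes outright.)

```python
import unicodedata

def to_utf8_for_output(text: str) -> bytes:
    """Hypothetical helper: compose first, then encode. NFC needs code points, not octets."""
    return unicodedata.normalize("NFC", text).encode("utf-8")

# "Müller" in the decomposed form described in the thread:
# base letter "u" followed by U+0308 COMBINING DIAERESIS
decomposed = "Mu\u0308ller"

right = to_utf8_for_output(decomposed)
wrong = decomposed.encode("utf-8")  # encoding too early keeps the two-character form

print(right)  # b'M\xc3\xbcller'  (U+00FC, precomposed u with diaeresis)
print(wrong)  # b'Mu\xcc\x88ller' ("u" followed by U+0308)

# Normalizing the octets is not possible: Python raises TypeError for bytes.
try:
    unicodedata.normalize("NFC", wrong)
except TypeError:
    print("normalize() needs str, not bytes")
```

If a pipeline only has octets at hand, it has to decode them back to text before normalizing.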
RE: How to convert from ANSEL/MARC-8 to UTF-8?
> From: Galen Charlton
> Sent: Wednesday, January 07, 2009 11:47 AM
> To: Michael Lackhoff
> Subject: Re: How to convert from ANSEL/MARC-8 to UTF-8?
>
> On Wed, Jan 7, 2009 at 11:42 AM, Michael Lackhoff wrote:
> > diacritics + base char to the combined character. So I still have two
> > characters for e.g. the German umlauts. This might be correct UTF-8 but
> > is not usable to present in (X)HTML.

I just cannot let that go. UTF-8 *is* Unicode, encoded as bytes in a particular way. Whether the characters are combined (precomposed) or uncombined (base character plus combining mark) is not relevant to (X)HTML, so long as you declare that the document is in a Unicode encoding, e.g., UTF-8, UTF-16BE or UTF-16LE, and the user agent (e.g., a browser) understands Unicode, which I think is a requirement of the (X)HTML standards.

Your browser should be able to deal with combined or uncombined characters. However, uncombined characters may not display properly because of font rendering issues, which is why you might want to precompose any uncombined characters in your (X)HTML, i.e., convert them to Unicode Normalization Form C (NFC).

Andy.
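Andy's point that both forms are legitimate Unicode can be checked directly: the two spellings of "Müller" are different code point sequences, yet they are canonically equivalent, and NFC turns them into the identical string. A small Python sketch, again using `unicodedata` as a stand-in for Perl's Unicode::Normalize; `is_normalized()` needs Python 3.8 or later, and `prepare_for_html` is a hypothetical helper name.

```python
import unicodedata

precomposed = "M\u00fcller"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "Mu\u0308ller"   # "u" + U+0308 COMBINING DIAERESIS

# Both are valid Unicode and both encode to valid UTF-8 ...
print(len(precomposed), len(decomposed))   # 6 7
print(precomposed == decomposed)           # False: different code point sequences

# ... but they are canonically equivalent, which NFC makes visible.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

def prepare_for_html(text: str) -> str:
    """Hypothetical helper: emit NFC so that fonts with weak combining-mark
    support still get a single precomposed character where one exists."""
    if unicodedata.is_normalized("NFC", text):
        return text  # already composed, nothing to do
    return unicodedata.normalize("NFC", text)
```

Normalizing once before writing the (X)HTML keeps the markup independent of how well a given font positions combining marks.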
Re: How to convert from ANSEL/MARC-8 to UTF-8?
Hi,

On Wed, Jan 7, 2009 at 11:42 AM, Michael Lackhoff wrote:
> diacritics + base char to the combined character. So I still have two
> characters for e.g. the German umlauts. This might be correct UTF-8 but
> is not usable to present in (X)HTML.
> Is there any other option short of doing it by hand with lots of s///
> for at least the most common combinations?

You can use NFC() from Unicode::Normalize to do this (after using MARC::Charset to do the conversion to UTF-8).

Regards,

Galen
--
Galen Charlton
VP, Research & Development, LibLime
p: 1-888-564-2457 x709
skype: gmcharlt
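Galen's suggestion replaces a hand-maintained list of s/// substitutions: NFC applies every canonical composition Unicode defines, not only the German umlauts. A minimal Python sketch, with `unicodedata` standing in for Perl's Unicode::Normalize; the sample strings are made up and stand in for converted text in the form described in this thread (base letter followed by a combining mark; MARC-8 itself records the diacritic before the base letter). The MARC-8 to UTF-8 conversion step is not shown.

```python
import unicodedata

# Decomposed sequences (base letter + combining mark) and their precomposed forms
samples = {
    "a\u0308": "\u00e4",  # a + COMBINING DIAERESIS    -> a with diaeresis
    "e\u0301": "\u00e9",  # e + COMBINING ACUTE ACCENT -> e with acute
    "n\u0303": "\u00f1",  # n + COMBINING TILDE        -> n with tilde
    "c\u030c": "\u010d",  # c + COMBINING CARON        -> c with caron
    "a\u030a": "\u00e5",  # a + COMBINING RING ABOVE   -> a with ring above
}

for decomposed, expected in samples.items():
    composed = unicodedata.normalize("NFC", decomposed)
    # ascii() shows the escapes, so the combining mark is visible in the output
    print(ascii(decomposed), "->", ascii(composed), composed == expected)
```

Where Unicode defines no precomposed character for a combination, NFC leaves the base letter and combining mark as they are, so those sequences still depend on the browser's font rendering.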
