Re: How to convert from ANSEL/MARC-8 to UTF-8?

2009-01-07 Thread Michael Lackhoff
On 07.01.2009 17:47 Galen Charlton wrote:

> You can use NFC() from Unicode::Normalize to do this (after using
> MARC::Charset to do the conversion to UTF-8).

Now I've got it! I had called encode() too early, so NFC() was working on
UTF-8 octets instead of characters and found nothing to compose.
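Here is a minimal Perl sketch of the ordering problem. It uses only the core modules Encode and Unicode::Normalize. The decomposed input is built by hand, on the assumption that the conversion step hands back a character string with the combining mark after the base letter. That means no MARC data or MARC::Charset is needed to run it:

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC);

# Decomposed "Müller": "u" followed by COMBINING DIAERESIS (U+0308).
# Assumed to resemble the character string a MARC-8 -> Unicode conversion returns.
my $chars = "Mu\x{0308}ller";

# Wrong order: encode first, then normalize. NFC() now sees the octets
# 0xCC 0x88 as two unrelated Latin-1 characters and composes nothing.
my $wrong = NFC(encode('UTF-8', $chars));

# Right order: normalize the character string, then encode once for output.
my $right = encode('UTF-8', NFC($chars));

print length($wrong), "\n";   # 8 octets: still "u" + 0xCC 0x88
print length($right), "\n";   # 7 octets: "u-umlaut" is now 0xC3 0xBC
```

The rule of thumb it illustrates is to normalize characters and encode only once, at output time.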

Regarding presentation on the web: technically everything is fine and
works whether the characters are precomposed or not (I send all the
necessary headers); the decomposed version just looks ugly.

Thanks for all your help and suggestions
-Michael


RE: How to convert from ANSEL/MARC-8 to UTF-8?

2009-01-07 Thread Houghton,Andrew
> From: Galen Charlton [mailto:[email protected]]
> Sent: Wednesday, January 07, 2009 11:47 AM
> To: Michael Lackhoff
> Cc: [email protected]
> Subject: Re: How to convert from ANSEL/MARC-8 to UTF-8?
> 
> On Wed, Jan 7, 2009 at 11:42 AM, Michael Lackhoff wrote:
> > diacritics + base char to the combined character. So I still have two
> > characters for e.g. the German umlauts. This might be correct UTF-8 but
> > is not usable to present in (X)HTML.

I just cannot let that go. UTF-8 *is* Unicode, encoded in a particular way.
Whether the characters are precomposed or decomposed is not relevant to
(X)HTML, so long as you declare that the document uses a Unicode encoding
(e.g., UTF-8, UTF-16BE, UTF-16LE) and the user agent (e.g., the browser)
understands Unicode, which I think the (X)HTML standards require. Your
browser should be able to deal with precomposed or decomposed characters;
however, decomposed characters may not display properly because of font
rendering issues. That is why you might want to precompose them in your
(X)HTML, i.e., convert the text to Unicode Normalization Form C (NFC).
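To make the point concrete, here is a Perl sketch that builds a small XHTML page, declares UTF-8, and runs NFC() before encoding. The helper name xhtml_page() and the page skeleton are hypothetical and only for illustration. In practice the charset sent in the HTTP Content-Type header matters as well.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper: wrap text in a minimal XHTML page that declares UTF-8,
# precomposing the text first so "u" + COMBINING DIAERESIS reaches the
# browser as the single character U+00FC.
sub xhtml_page {
    my ($text) = @_;
    my $nfc = NFC($text);
    # Minimal escaping of XML special characters, for illustration only.
    $nfc =~ s/&/&amp;/g;
    $nfc =~ s/</&lt;/g;
    $nfc =~ s/>/&gt;/g;
    my $page = join "\n",
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        '<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />',
        "<title>$nfc</title></head>",
        "<body><p>$nfc</p></body>",
        "</html>\n";
    # Encode exactly once, at output time: the result is a string of UTF-8 octets.
    return encode('UTF-8', $page);
}

my $from_decomposed  = xhtml_page("Mu\x{0308}ller");
my $from_precomposed = xhtml_page("M\x{00FC}ller");

# Both inputs are valid Unicode; after NFC() they yield byte-identical pages.
print $from_decomposed eq $from_precomposed ? "identical\n" : "different\n";  # prints "identical"
```

Without the NFC() call both pages would still be valid UTF-8. They would simply differ in bytes, and the decomposed one would depend on the font's handling of combining marks.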


Andy.



Re: How to convert from ANSEL/MARC-8 to UTF-8?

2009-01-07 Thread Galen Charlton
Hi,

On Wed, Jan 7, 2009 at 11:42 AM, Michael Lackhoff wrote:
> diacritics + base char to the combined character. So I still have two
> characters for e.g. the German umlauts. This might be correct UTF-8 but
> is not usable to present in (X)HTML.
> Is there any other option short of doing it by hand with lots of s///
> for at least the most common combinations?

You can use NFC() from Unicode::Normalize to do this (after using
MARC::Charset to do the conversion to UTF-8).
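For comparison with the s/// approach in the question, here is a Perl sketch. It sets a hypothetical hand-written table (%umlaut, used by a made-up compose_by_hand()) that covers only the German umlauts next to a single NFC() call. The sample string stands in for the output of MARC::Charset's marc8_to_utf8(). That is an assumption: the string is built by hand, so the snippet runs with core modules only.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical "by hand" table: base letter => precomposed umlaut.
my %umlaut = (
    a => "\x{00E4}", o => "\x{00F6}", u => "\x{00FC}",
    A => "\x{00C4}", O => "\x{00D6}", U => "\x{00DC}",
);

# Replace base letter + COMBINING DIAERESIS (U+0308) using the table above.
sub compose_by_hand {
    my ($s) = @_;
    $s =~ s/([aouAOU])\x{0308}/$umlaut{$1}/g;
    return $s;
}

# Decomposed input (Unicode order: mark after the base letter), including an
# acute accent (U+0301) that the hand-written table does not cover.
my $text = "Mu\x{0308}ller, Cafe\x{0301}";

my $by_hand = compose_by_hand($text);
my $by_nfc  = NFC($text);

print length($by_hand), "\n";  # 13: "e" + U+0301 is still two characters
print length($by_nfc), "\n";   # 12: every pair has been composed
```

The table leaves the acute accent decomposed. NFC() composes every pair for which Unicode defines a canonical composition, so there is no table to maintain.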

Regards,

Galen
-- 
Galen Charlton
VP, Research & Development, LibLime
[email protected]
p: 1-888-564-2457 x709
skype: gmcharlt