Larry has been consistently using

0xAB op 0xBB

in his messages to represent a (French quote) hyperop
(corresponding to the Unicode characters U+00AB and U+00BB),
which is consistent with the ISO-8859-1 encoding (despite
the fact that my mailserver or his mailer insists on
labelling those messages as UTF-8).

However, the UTF-8 encoding of those Unicode characters
actually is:

0xC2AB op 0xC2BB

As far as I understand it, UTF-8 only allows single-byte
representations of characters that fall in the 0x00 to
0x7F range.
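
For anyone who wants to check the bytes without a hex dump,
here is a minimal sketch in Python (chosen only because it
makes the byte values easy to print; the helper names
hex_bytes and is_valid_utf8 are mine, not from any library).
It encodes the two quote characters both ways and tests
whether a lone 0xAB byte is acceptable as UTF-8:

```python
# The two French quote characters, given by Unicode code point.
LEFT, RIGHT = "\u00ab", "\u00bb"


def hex_bytes(text, encoding):
    """Encode text and return its bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for name, ch in (("left quote", LEFT), ("right quote", RIGHT)):
        print(name,
              "| iso-8859-1:", hex_bytes(ch, "iso-8859-1"),
              "| utf-8:", hex_bytes(ch, "utf-8"))
    # A bare 0xAB is a continuation byte, so it cannot stand alone in UTF-8.
    print("lone 0xAB valid as UTF-8?", is_valid_utf8(b"\xab"))
    print("0xC2 0xAB valid as UTF-8?", is_valid_utf8(b"\xc2\xab"))
```

If this is right, a file containing the bare bytes 0xAB and
0xBB only makes sense when it is read as ISO-8859-1, and the
two-byte forms only when it is read as UTF-8.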

So the question is, if I'm writing a program and I actually
want to use one of these ops, do I put

0xAB op 0xBB

or

0xC2AB op 0xC2BB

?

-- Matt,
   who'd never thought he'd have to do hex dumps to debug
   his Perl programs ;)

-- 
      Matthew Zimmerman
      Interdisciplinary Biophysics, University of Virginia
      http://www.people.virginia.edu/~mdz4c/
