Hi Folks,

I have heard it stated that, in the context of character encoding and decoding:

    Interoperability is getting better.

Do you have data to back up the assertion that interoperability is getting 
better?

Below is a summary of my understanding of interoperability. Would you please 
point out any misunderstandings?

-------------------------------------------------------------------------------
Interoperability of Text (i.e., Character Encoding Interoperability)
-------------------------------------------------------------------------------
Remember how, not long ago, you would visit a web page and see strange 
characters like this:

    “Good morning, Daveâ€

You see that much less often now.

Why?

The answer is this:

    Interoperability is getting better.

In the context of character encoding and decoding, what does that mean?

Interoperability means that you and I interpret (decode) the bytes in the same 
way.

Example: I create a text file, encode all of its characters using UTF-8, and 
send the file to you.

Here are the characters those bytes represent, shown as glyphs:

    López

You receive my text document and interpret the bytes as iso-8859-1. 

The ó glyph depicts the character "LATIN SMALL LETTER O WITH ACUTE" (U+00F3). 
In UTF-8 that character is encoded using these two bytes: C3 B3
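If you want to check those values yourself, here is a small Python sketch 
(Python is just my choice for illustration; `bytes.hex` with a separator 
argument assumes Python 3.8 or later):

```python
import unicodedata

# The character I typed.
o_acute = "ó"

# Its Unicode name and code point.
print(unicodedata.name(o_acute))  # LATIN SMALL LETTER O WITH ACUTE
print(f"U+{ord(o_acute):04X}")    # U+00F3

# Its UTF-8 encoding: two bytes.
encoded = o_acute.encode("utf-8")
print(encoded.hex(" ").upper())   # C3 B3
```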

But in iso-8859-1, the two bytes C3 B3 are the encoding of two characters:

     C3 is the encoding of the Ã character (LATIN CAPITAL LETTER A WITH TILDE)
     B3 is the encoding of the ³ character (SUPERSCRIPT THREE)

Thus you interpret my text as:

    López

We are interpreting the same text (i.e., the same set of bytes) differently.

Interoperability has failed.
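The definition can be phrased as a check. The function `interoperates` below 
is a hypothetical helper I made up for illustration, not part of any library; 
it compares the text the sender intended with the text the receiver decodes:

```python
def interoperates(data: bytes, sender_encoding: str, receiver_encoding: str) -> bool:
    """Hypothetical helper: True if the receiver decodes `data` to the
    same text the sender encoded."""
    intended = data.decode(sender_encoding)
    try:
        interpreted = data.decode(receiver_encoding)
    except UnicodeDecodeError:
        # The receiver cannot decode the bytes at all: also a failure.
        return False
    return intended == interpreted


data = "López".encode("utf-8")
print(interoperates(data, "utf-8", "utf-8"))            # True
print(interoperates(data, "utf-8", "iso-8859-1"))       # False
print(interoperates(b"Lopez", "utf-8", "iso-8859-1"))   # True
```

The last call returns True because UTF-8 and iso-8859-1 encode the ASCII 
characters with the same bytes, so plain ASCII text survives even when the two 
sides disagree about the encoding; trouble starts with characters like ó.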

So when we say: 

    Interoperability is getting better.

we mean that cases in which a sender and a receiver interpret the same bytes 
differently are becoming less common.

Let's revisit our first example. You go to a web site and see this: 

     “Good morning, Daveâ€

Here's how that happened:

I use Microsoft Word (character set, Windows-1252) to create a web page 
containing this text document:

    “Good morning, Dave”

Notice that I wrapped the greeting in Microsoft smart quotes. 

You visit my web page.

Suppose your browser is set to interpret all web pages as iso-8859-15.

In Windows-1252 the left smart quote is hex: 93

In Windows-1252 the right smart quote is hex: 84

In iso-8859-15 there are no characters assigned to either hex 93 or hex 84.  

So your browser replaces the left smart quote (hex 93) with hex E2 (â) followed 
by hex A4 (€) followed by hex BD (œ).

And your browser replaces the right smart quote (hex 84) with hex E2 (â) 
followed by hex A4 (€). 

The result is that you see this on your browser screen:

    “Good morning, Daveâ€