On Dec 5, 2006, at 8:42 AM, Igor Tandetnik wrote:

Da Martian <[EMAIL PROTECTED]> wrote:
So if I look at a name with umlauts in the database via sqlite3.exe
I get:

Städt. Klinikum Neunkirchen gGmbH
  --
  |
  an "a" with two dots on top

"A with umlaut" is represented as two bytes in UTF-8.

This is a huge simplification. At a bare minimum, 'ä' can be represented as either one or two Unicode code points -- one code point representing 'ä', or one representing 'a' followed by one representing the '¨' combining mark. How *that* is represented in the UTF-8 encoding of Unicode is another issue, which depends on the exact values of the code points involved.

The particular example of 'ä' may be represented as two bytes in UTF-8 in both cases (I don't know offhand), but that's not something that can be generalized.
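
For what it's worth, the byte counts are easy to check. The following is only a minimal sketch in Python 3 (not anything SQLite-specific; the helper name describe is made up for illustration). It assumes the precomposed form is U+00E4 and the decomposed form is U+0061 followed by U+0308 (COMBINING DIAERESIS), and reports the number of code points and UTF-8 bytes for each:

```python
import unicodedata

# Two ways to spell the same visible character.
precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS (one code point)
decomposed = "a\u0308"   # 'a' followed by COMBINING DIAERESIS (two code points)


def describe(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


print(describe(precomposed))   # (1, 2)
print(describe(decomposed))    # (2, 3)

# The two spellings compare unequal as raw strings, but normalizing
# both to the same form (NFC here) makes them identical.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

So, under those assumptions, the precomposed spelling takes two bytes in UTF-8 and the decomposed one takes three. Which spelling ends up in a database depends on whatever produced the text, so normalizing to one form before storing or comparing is one way to avoid surprises.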

  -- Chris

