On 11/01/06, Lachlan Hunt <[EMAIL PROTECTED]> wrote:
> liorean wrote:
> > Character references refer to Unicode code points independent of the
> > document encoding and character set. At least for HTML4 and XML, if
> > not for HTML3.2.
>
> As far as character references in HTML are concerned, they have always
> referred to the Unicode code points since HTML 2.0.

Ah. I just saw

         BASESET  "ISO 646:1983//CHARSET
                   International Reference Version
                   (IRV)//ESC 2/5 4/0"
         BASESET  "ISO Registration Number 100//CHARSET
                   ECMA-94 Right Part of
                   Latin Alphabet Nr. 1//ESC 2/13 4/1"

in HTML3.2 and

          BASESET  "ISO Registration Number 177//CHARSET
                    ISO/IEC 10646-1:1993 UCS-4 with
                    implementation level 3//ESC 2/5 2/15 4/6"

in HTML4.01 SGML declarations and assumed the first one (ISO-646) was
ANSI, the second one (ECMA-94) was the extended 8-bit characters
(Latin-1) and the third one (ISO-10646) was Unicode. Was that
assumption wrong?
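
For reference, here is a rough sketch of what "independent of the
document encoding" means in practice. It uses Python's standard
html.unescape as a stand-in for a browser's reference handling, which
is an assumption on my part (it follows HTML5 rules, not the HTML 2.0
or 3.2 ones). The resolve() helper and the sample markup are made up
for illustration.

```python
import html

def resolve(markup: str, encoding: str) -> str:
    """Round-trip markup through a byte encoding, then expand character references."""
    raw = markup.encode(encoding)    # the bytes as they would be served
    decoded = raw.decode(encoding)   # this step depends on the document encoding
    return html.unescape(decoded)    # this step does not

# Hypothetical sample: only ASCII bytes, so it can be stored in either encoding.
markup = "caf&#233; costs 2&#x20AC;"

for encoding in ("iso-8859-1", "utf-16"):
    text = resolve(markup, encoding)
    # List the non-ASCII code points that the references expanded to.
    print(encoding, [f"U+{ord(c):04X}" for c in text if ord(c) > 127])
```

Both encodings give the same code points, U+00E9 and U+20AC, even
though the euro sign has no byte in ISO-8859-1 at all.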

> See my article:
> http://lachy.id.au/log/2005/10/char-refs
> (take note of the comments too, which contain a few corrections)

I read it months ago :)
--
David "liorean" Andersson
<uri:http://liorean.web-graphics.com/>