Re: [users] Re: Character Encodings

Michael Adams Fri, 23 May 2008 14:14:22 -0700

On Fri, 23 May 2008 10:40:00 -0400
Jim Allan wrote:

> Michael Adams wrote:
> 
> > NOTE: Microsoft defied this and used those code positions for
> > their "smart quotes" and other characters in the WINDOWS-1252
> > encoding, which has little approval as an international standard
> > except from IANA (the Internet Assigned Numbers Authority) for web
> > use.
> 
> Quite seriously, what else could they do at the time?
> 
> They were attempting to compete against Apple, who had their own
> proprietary character sets which included the curly quotation marks,
> dashes, and various other non-ISO characters.
> 
> And hardly anyone has ever used the official 8-bit control characters.
> 
> I suppose they could have just followed the DOS route of letting every
> word processor and every desktop publishing program have its own way
> of producing characters which are essential to typographically correct
> publishing, continuing the mess established under DOS.
> 
> Defying the standards on this point was one of the best things they
> did in my opinion.
> 
> > NOTE: Interestingly, most of the pages on the web which claim to
> > be ISO-8859-1 are not accurate, because they contain the
> > WINDOWS-1252 smart quotes or other WINDOWS-1252 characters. Most
> > browsers allow this and read ISO-8859-1 pages as WINDOWS-1252,
> > because the ISO-8859-1 control codes are illegal in a web page
> > anyway.
> 
> You can, of course, declare that your webpage is coded as Windows-1252
> or another Windows encoding. That's really what should be done. I've
> never read a discussion that indicates why it wasn't done more, save
> the explanation of ignorance on the part of the page creators.
>


Now a UTF-xx encoding (UTF-8, for example) should be used, which
prevents this mix-up.
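
To make the mix-up concrete, here is a minimal Python sketch. It
assumes Python's built-in "iso-8859-1" and "windows-1252" codecs are a
fair stand-in for the two labels discussed above; the sample bytes are
made up for illustration.

```python
# The same four bytes, read under the two labels discussed above.
# 0x93 and 0x94 are the WINDOWS-1252 curly double quotes; under
# ISO-8859-1 (as Python implements it) they map to C1 control codes.
raw = bytes([0x93, 0x68, 0x69, 0x94])  # hypothetical page content

as_latin1 = raw.decode("iso-8859-1")
as_cp1252 = raw.decode("windows-1252")

# Show the code points so the invisible control characters are visible.
print([hex(ord(c)) for c in as_latin1])
print([hex(ord(c)) for c in as_cp1252])
```

The first list contains 0x93 and 0x94 (control codes), while the second
contains 0x201c and 0x201d (the curly quotes), which is why a page that
is labelled ISO-8859-1 but holds these bytes only displays properly
when read as WINDOWS-1252.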

> > With the advent of the Euro, ISO-8859-15 introduced the Euro sign
> > and also incorporated Microsoft's "smart quotes" into the code
> > (though some moved into legitimate character locations).

I was in error here: the smart quotes are not in ISO-8859-15.
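
A quick way to check this, sketched in Python and assuming its built-in
"iso-8859-15" codec follows the standard's table:

```python
# Byte 0xA4 is the Euro sign in ISO-8859-15 but the generic currency
# sign in ISO-8859-1.
euro_in_15 = b"\xa4".decode("iso-8859-15")
same_byte_in_1 = b"\xa4".decode("iso-8859-1")

# The left curly double quote (U+201C) has no position in ISO-8859-15,
# so encoding it fails; WINDOWS-1252 places it at byte 0x93.
try:
    "\u201c".encode("iso-8859-15")
    curly_quote_in_15 = True
except UnicodeEncodeError:
    curly_quote_in_15 = False

curly_quote_in_1252 = "\u201c".encode("windows-1252")

print(hex(ord(euro_in_15)), hex(ord(same_byte_in_1)))
print(curly_quote_in_15, curly_quote_in_1252)
```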

> > In theory this code supersedes and combines both WINDOWS-1252 and
> > ISO-8859-1, but ISO-8859-1 is still legal. I am not even sure if
> > ISO-8859-1 is officially deprecated (to be phased out over time).
> 
> I think it was barely used by anyone.
> 
> > UTF-8 gets around the above issue by using 1 byte for most letters,
> > and a special control character byte which says the next 1 to 3
> > bytes are an extended character for rarer characters. This takes up
> > a lot less memory, disk space and bandwidth than UTF-16 and UTF-32
> > in normal use.
> 
> True for Latin-alphabet coding. Not true for normal use if you are
> writing Chinese, or even Greek or Cyrillic.
> 
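
The size trade-off can be measured directly. The following is a small
Python sketch; the helper name encoded_sizes and the sample strings are
invented for illustration, and the "-le" codec variants are used so
that no BOM is counted.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of text in each UTF form."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}

samples = {
    "Latin": "hello",
    "Greek": "\u03b3\u03b5\u03b9\u03ac",  # Greek letters gamma, epsilon, iota, alpha with tonos
    "Chinese": "\u4f60\u597d",            # two CJK ideographs
}

for name, text in samples.items():
    print(name, encoded_sizes(text))

# Bytes per character in UTF-8: ASCII, accented Latin, Euro sign, emoji.
utf8_lengths = [len(c.encode("utf-8")) for c in "a\u00e9\u20ac\U0001f600"]
print(utf8_lengths)
```

With these samples the Latin text is smallest in UTF-8, the Greek
letters take two bytes each in both UTF-8 and UTF-16, and the Chinese
characters take three bytes each in UTF-8 against two in UTF-16. Real
Greek or Cyrillic text also contains spaces and punctuation, which cost
one byte in UTF-8, so actual ratios depend on the document.
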
> > A UTF-8
> > document starts with a special character called a Byte Order
> > Mark (BOM), which I will do no more than mention as it would take
> > this too far OT (plus I don't understand it completely).
> 
> BOM should not be used in UTF-8. It is required for some UTF-16 and 
> UTF-32 formats.
> 

A BOM may be used in UTF-8, especially where the character encoding is
not declared in any other way. Some higher-level protocols do require
that a BOM *MUST NOT* be used.

http://unicode.org/faq/utf_bom.html#29
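
For anyone who wants to see what the UTF-8 BOM looks like on the wire,
here is a short Python sketch using the standard codecs module. The
sample text is made up; "utf-8-sig" is Python's name for a UTF-8
variant that removes a leading BOM on decoding if one is present.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF (U+FEFF encoded in UTF-8).
data = codecs.BOM_UTF8 + "hi".encode("utf-8")

# A plain UTF-8 decode keeps the BOM as a leading U+FEFF character.
plain = data.decode("utf-8")

# The "utf-8-sig" codec strips a leading BOM and is harmless without one.
stripped = data.decode("utf-8-sig")
no_bom = "hi".encode("utf-8").decode("utf-8-sig")

print(data[:3].hex(), repr(plain), repr(stripped), repr(no_bom))
```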

-- 
Michael

All shall be well, and all shall be well, and all manner of things shall
be well

 - Julian of Norwich 1342 - 1416
