On Fri, 23 May 2008 10:40:00 -0400 Jim Allan wrote:

> Michael Adams wrote:
>
> > NOTE: Microsoft defied this and used those character spaces for
> > their "smart quotes" and other characters in the WINDOWS-1252
> > encoding which does not have a lot of approval as an international
> > standard except by IANA (the Internet Assigned Numbers Authority)
> > for web use.
>
> Quite seriously, what else could they do at the time?
>
> They were attempting to compete against Apple who had their own
> proprietary character sets which included the curly quotation marks,
> dashes, and various other non-ISO characters.
>
> And hardly anyone has ever used the official 8-bit control characters.
>
> I suppose they could have just followed the DOS route of letting every
> word processor and every desktop publishing program have its own way
> of producing characters which are essential to typographically correct
> publishing, continuing the mess established under DOS.
>
> Defying the standards on this point was one of the best things they
> did in my opinion.
>
> > NOTE: Interestingly most of the pages on the web which claim to be
> > ISO-8859-1 are not accurate because they contain the WINDOWS-1252
> > smart quotes or other WINDOWS-1252 characters. Most browsers allow
> > this and read ISO-8859-1 pages as WINDOWS-1252 anyway because the
> > ISO-8859-1 control codes are illegal for use in a web page anyway.
>
> You can, of course, declare that your webpage is coded as Windows-1252
> or another Windows encoding. That's really what should be done. I've
> never read a discussion that indicates why it wasn't done more, save
> the explanation of ignorance on the part of the page creators.

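To make the mix-up concrete, here is a rough Python 3 sketch. It is only
an illustration: the sample bytes are made up, and "cp1252" and
"iso-8859-1" are the codec names Python uses for the two encodings. The
same two bytes come out as curly quotes under WINDOWS-1252 but as
invisible control characters under a strict ISO-8859-1 reading.

```python
# Bytes as a Windows editor might save a word wrapped in "smart quotes".
# (Made-up sample data, for illustration only.)
raw = b"\x93quoted\x94"

as_cp1252 = raw.decode("cp1252")      # what the page author intended
as_latin1 = raw.decode("iso-8859-1")  # what the page claims to be

print(repr(as_cp1252))  # curly quotes, U+201C and U+201D
print(repr(as_latin1))  # C1 control characters, U+0093 and U+0094

# Stored as UTF-8, the curly quotes become multi-byte sequences, so there
# is no longer a single byte that two 8-bit encodings can disagree about.
utf8 = as_cp1252.encode("utf-8")
print(utf8)
```

This is why browsers that treat a declared ISO-8859-1 page as WINDOWS-1252
show the quotes the author meant rather than control characters.
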
Now one of the UTF encodings (UTF-8, UTF-16 or UTF-32) should be used,
which prevents this mix-up.

> > With the advent of the Euro, in ISO-8859-15 the Euro sign was
> > introduced as well as incorporating Microsoft's "smart quotes" into
> > the code (though some moved into legitimate character locations).

I was in error here; the smart quotes are not in ISO-8859-15.

> > In theory this code supersedes and combines both WINDOWS-1252 and
> > ISO-8859-1, but ISO-8859-1 is still legal. I am not even sure if
> > ISO-8859-1 is officially deprecated (to be phased out over time).
>
> I think it was barely used by anyone.
>
> > UTF-8 gets around the above issue by using 1 byte for most letters,
> > and a special control character byte which says the next 1 to 3
> > bytes are an extended character for rarer characters. This takes up
> > a lot less memory, disk space and bandwidth than UTF-16 and UTF-32
> > in normal use.
>
> True for Latin-alphabet coding. Not true for normal use if you are
> writing Chinese, or even Greek or Cyrillic.
>
> > A UTF-8
> > document starts with a special character called a Byte Order
> > Mark (BOM) which I will do no more than mention as it would take this
> > too far OT (plus I don't understand it completely).
>
> BOM should not be used in UTF-8. It is required for some UTF-16 and
> UTF-32 formats.

A BOM may be used in UTF-8, especially where the character encoding is
not declared in any other way. Some higher-level protocols do require
that a BOM *MUST NOT* be used.
http://unicode.org/faq/utf_bom.html#29

-- 
Michael

All shall be well, and all shall be well, and all manner of things shall
be well

 - Julian of Norwich 1342 - 1416
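
P.S. For anyone who wants to check the size and BOM points above, here is
a rough Python 3 sketch. The sample strings and the sizes() helper are my
own illustration, not anything from the standards. The "-le" codecs are
used so that no BOM is counted in the byte totals, and "utf-8-sig" is
Python's name for UTF-8 with the optional signature.

```python
import codecs

# Made-up sample strings, one per script discussed in the thread.
samples = {
    "Latin": "hello",
    "Greek": "αβγδε",
    "Cyrillic": "привет",
    "Chinese": "你好世界",
}

def sizes(text):
    """Return byte counts as (UTF-8, UTF-16, UTF-32), all without a BOM."""
    return (
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
        len(text.encode("utf-32-le")),
    )

for name, text in samples.items():
    print(name, len(text), "chars ->", sizes(text))

# The UTF-8 BOM is optional: "utf-8-sig" writes it and strips it on read,
# while the plain "utf-8" codec leaves it in the text as U+FEFF.
with_bom = "hello".encode("utf-8-sig")
print(with_bom.startswith(codecs.BOM_UTF8))
print(repr(with_bom.decode("utf-8-sig")))
print(repr(with_bom.decode("utf-8")))
```

For these samples UTF-8 is half the size of UTF-16 for Latin text, the
same size for Greek and Cyrillic, and larger for Chinese, which matches
Jim's correction.
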
