On Fri, 23 May 2008 17:38:39 -0400
John Kaufmann wrote:

> In a message dated 2008.05.23 17:15 -0500, Michael Adams wrote:
> 
> > BOM may be used in UTF-8 especially where the character encoding is
> > not declared in any other way. Some higher protocols do require that
> > a BOM *MUST NOT* be used.
> > 
> > http://unicode.org/faq/utf_bom.html#29
> 
> ? I don't get your point in saying "BOM may be used in UTF-8".  As
> your reference says, "UTF-8 can contain a BOM. However, it makes no
> difference as to the endianness of the byte stream."  So why would one
> bother?

In web design the encoding can be declared in the HTTP header or in the
<head> of the page, but in a plain text stream the encoding may not be
passed externally at all. UTF-8 has no byte-order variants, so the BOM is
always encoded as the same three bytes, EF BB BF. It works more like an
encoding signature (a sort of MIME type) at the start of the text than
anything else. By reading the first three bytes a receiving application
can go "ah-ha, this is UTF-8".
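For illustration, here is a minimal Python sketch of that "read the first three bytes" check. The function name sniff_utf8_bom is made up for this example; it relies on the standard codecs.BOM_UTF8 constant and the "utf-8-sig" codec, which drops one leading BOM if present. It also assumes the data is UTF-8 either way, since a missing BOM proves nothing about the encoding.

```python
import codecs
from typing import Tuple

def sniff_utf8_bom(data: bytes) -> Tuple[bool, str]:
    """Hypothetical helper: report whether data starts with a UTF-8 BOM
    and return the decoded text with that BOM removed.
    Assumes the bytes are UTF-8 whether or not the BOM is present."""
    had_bom = data.startswith(codecs.BOM_UTF8)  # b"\xef\xbb\xbf"
    # "utf-8-sig" strips a single leading BOM, else decodes as plain UTF-8
    return had_bom, data.decode("utf-8-sig")

print(sniff_utf8_bom(b"\xef\xbb\xbfhello"))
print(sniff_utf8_bom(b"hello"))
```

The BOM here is only a hint: a stream without it may still be UTF-8, so a real reader would fall back to other detection or a configured default.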

Not sure if it is still true in the latest incarnation, but notepad.exe
always wrote a BOM when UTF-8 was selected, to the annoyance of PHP
programmers: when such a file was pulled in with include(), its BOM
ended up as stray bytes in the middle of the page. Still, Notepad never
was any programmer's favourite editor.
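A rough Python sketch of why the stray BOMs appear: PHP sends anything outside the <?php tags as literal output, so a BOM saved at the top of each included file gets emitted wherever that file is included. The file contents and the helpers strip_bom and assemble below are hypothetical, a simulation of a chain of include()s rather than PHP itself.

```python
import codecs

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"

def strip_bom(data: bytes) -> bytes:
    """Remove a single leading UTF-8 BOM, if there is one."""
    return data[len(BOM):] if data.startswith(BOM) else data

def assemble(parts, clean=True):
    """Concatenate file contents the way a chain of include()s would emit
    them. With clean=False every saved BOM survives into the output."""
    if clean:
        parts = [strip_bom(p) for p in parts]
    return b"".join(parts)

# Two hypothetical include files saved by an editor that always writes a BOM
header = BOM + b"<div>header</div>"
body = BOM + b"<p>content</p>"

naive = assemble([header, body], clean=False)
print(naive.count(BOM))                     # 2: one at the start, one mid-page
print(assemble([header, body]).count(BOM))  # 0
```

In practice the usual fix is to save the files without a BOM in the first place rather than strip it at runtime.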


-- 
Michael

All shall be well, and all shall be well, and all manner of things shall
be well

 - Julian of Norwich 1342 - 1416
