On Fri, 23 May 2008 17:38:39 -0400 John Kaufmann wrote:

> In a message dated 2008.05.23 17:15 -0500, Michael Adams wrote:
>
> > BOM may be used in UTF-8 especially where the character encoding is
> > not declared in any other way. Some higher protocols do require that
> > a BOM *MUST NOT* be used.
> >
> > http://unicode.org/faq/utf_bom.html#29
>
> ? I don't get your point in saying "BOM may be used in UTF-8". As
> your reference says, "UTF-8 can contain a BOM. However, it makes no
> difference as to the endianness of the byte stream." So why would one
> bother?

In web design the encoding can be declared in the HTTP header or in the
content of the <head>, but in a plain text stream the encoding may not
be passed at all outside the stream itself. UTF-8 has no byte-order
variants; the BOM is always passed as EF BB BF, so it is more like an
encoded MIME type at the start of the text than anything else. By
reading the first three bytes, a receiving application can go "ah-ha,
this is UTF-8".

Not sure if it is still true in the latest incarnation, but notepad.exe
always wrote a BOM when UTF-8 was selected, to the annoyance of PHP
programmers: when such a file was pulled in with a PHP include(), stray
BOMs ended up mid-page. Still, Notepad never was any programmer's
favourite editor.

--
Michael

All shall be well, and all shall be well, and all manner of things
shall be well
  - Julian of Norwich 1342 - 1416
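
For what it's worth, the "read the first three bytes" check described above can be sketched roughly as follows. This is only an illustration in Python (not something from the thread); the function names has_utf8_bom and strip_utf8_bom are made up for the example, and the only library piece relied on is the standard codecs.BOM_UTF8 constant.

```python
import codecs

# The UTF-8 BOM is always the same three bytes: EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte stream starts with the UTF-8 BOM.

    Hypothetical helper name, for illustration only.
    """
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if present; otherwise return data unchanged.

    Hypothetical helper name. This is the sort of clean-up one would want
    before concatenating or including files saved by an editor that
    always writes a BOM.
    """
    if has_utf8_bom(data):
        return data[len(UTF8_BOM):]
    return data
```

A receiving application could call has_utf8_bom() on the start of a stream as a hint that the content is UTF-8, and strip_utf8_bom() before pasting one file into the middle of another. The absence of a BOM proves nothing either way, so this should be treated as a hint rather than a full encoding detector.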
