That is not sufficient. The first three bytes could represent a real content character (ZWNBSP), or they could be a BOM. The label doesn't tell you.
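
To make the ambiguity concrete, here is a minimal Python sketch. It assumes the standard-library codecs `utf-8` and `utf-8-sig` as stand-ins for the two readings ("without BOM" and "with BOM"); the sample bytes are made up for illustration and are not taken from any real file.

```python
import codecs

# Hypothetical file content: the three bytes EF BB BF followed by "A".
data = codecs.BOM_UTF8 + b"A"

# Read as plain UTF-8, the leading bytes decode to U+FEFF (ZWNBSP)
# and remain part of the text.
as_content = data.decode("utf-8")

# Read as UTF-8 with a signature, the same three bytes are treated
# as a BOM and dropped from the text.
as_signature = data.decode("utf-8-sig")

print(len(as_content))    # 2 characters: U+FEFF and "A"
print(len(as_signature))  # 1 character: "A"
```

The bytes are identical in both calls; only the name passed to the decoder decides whether U+FEFF survives, which is why looking at the bytes alone cannot settle it.
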
This is similar to UTF-16 CES vs UTF-16BE CES. In the first case, 0xFE 0xFF represents a BOM, and is not part of the content. In the second case, it does *not* represent a BOM -- it represents a ZWNBSP, and must not be stripped. The difference here is that the encoding name tells you exactly what the situation is.

Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄

----- Original Message -----
From: "Murray Sargent" <[EMAIL PROTECTED]>
To: "Joseph Boyle" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, November 01, 2002 12:42
Subject: RE: Names for UTF-8 with and without BOM

> Joseph Boyle says: "It would be useful to have official names to
> distinguish UTF-8 with and without BOM."
>
> To see if a UTF-8 file has no BOM, you can just look at the first three
> bytes. Is this a problem? Typically when you care about a file's
> encoding form, you plan to read the file.
>
> Thanks
> Murray