John W Kennedy wrote:

ODBC is, by definition, a many-languages-to-many-databases interface. If it had the kind of proprietary crap that Microsoft traditionally smears all over standards, it wouldn't work at all.

You’ve just shown that Microsoft does not always produce broken standards.

Microsoft currently uses UTF-16 internally, and UTF-16 can use an initial BOM. So there is nothing wrong with Microsoft using an initial BOM there. Check the Unicode manual.

What has that to do with the question of whether or not Microsoft has vandalized UTF-8?

Well, you haven’t provided ANY evidence that Microsoft has vandalized UTF-8.

UTF-8 does not define a BOM, and does not need one. However, one is often encountered, which frequently causes trouble, because the damn thing is invisible, and typically manifests itself as a phantom syntax error. I have /heard/ that this evil practice started with Microsoft, but I do not know. Perhaps someone else here does.

I understand that much of the problem comes from files created or modified in Notepad, Microsoft’s default text editor, which automatically places a BOM at the beginning if you save as Unicode. If you cat such files on a nix system, the BOM is kept, as indeed it should be according to standard cat usage.

For Unicode’s own comments on BOM see http://unicode.org/faq/utf_bom.html#BOM . They quite openly state the problem with UTF-8 and BOM. The problem originates with the Unicode standard (and software which does not know it is handling Unicode and doesn’t know about BOM). BOM has been part of Unicode since Unicode 1.0 in 1991.

Unless or until the Unicode Consortium deprecates its use, BOM is a part of Unicode, to be used as defined by the Unicode standard. There was quite a lot of pressure put on them in the Unicode forum a few years back to deprecate BOM. They wouldn’t do it, feeling that its use as a signature and a byte-order indicator more than compensated for the trouble users run into when they work on Unicode files with tools that don’t understand Unicode.

The basic trouble is that a lot more things are done with text files on Nix systems than in Windows, which means there is more possibility of things being messed up by an unexpected BOM. And, at least a few years back, Nix people seemed to prefer to blame Microsoft rather than to modify tools to recognize a BOM at the beginning of a file and take care of it. It’s not very much trouble to add a few lines of code to check whether a file begins with EF BB BF and then take appropriate action: either ask the user or just throw the three bytes away.
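
To give an idea of how little code is involved, here is a rough sketch in Python of the "just throw the three bytes away" option. The function names strip_utf8_bom and decode_utf8_ignoring_bom are made up for this example; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python standard library. Treat it as an illustration under those assumptions, not as the way any particular tool does it.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present.

    Hypothetical helper for illustration; only the first three bytes are
    examined, so a U+FEFF later in the data is left untouched.
    """
    if data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_utf8_ignoring_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if there is one.

    The standard "utf-8-sig" codec skips an initial BOM when decoding and
    behaves like plain UTF-8 when none is present.
    """
    return data.decode("utf-8-sig")
```

A tool that would rather ask the user could run the same startswith test and prompt instead of slicing the bytes off.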

Jim Allan