John W Kennedy wrote:
ODBC is, by definition, a many-languages-to-many-databases interface. If
it had the kind of proprietary crap that Microsoft traditionally smears
all over standards, it wouldn't work at all.
You’ve just shown that Microsoft does not always produce broken standards.
Microsoft currently uses UTF-16 internally, which can use an initial BOM.
So there is nothing wrong with Microsoft using an initial BOM.
Check the Unicode manual.
What has that to do with the question of whether or not Microsoft has
vandalized UTF-8?
Well you haven’t provided ANY evidence that Microsoft has vandalized UTF-8.
UTF-8 does not define a BOM, and does not need one. However, one is
often encountered, which frequently causes trouble, because the damn
thing is invisible, and typically manifests itself as a phantom syntax
error. I have /heard/ that this evil practice started with Microsoft,
but I do not know. Perhaps someone else here does.
I understand that much of the problem comes from files which are often
created or modified by Microsoft’s default text editor Notepad which
automatically places a BOM at the beginning if you save as Unicode. If
you try to cat such files on a nix system, the BOM is kept, as indeed it
should be according to standard cat usage.
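
To make the cat case concrete, here is a small Python sketch. The two
byte strings are made-up stand-ins for files saved with a UTF-8
signature, and the variable names are only for illustration. It suggests
why byte-wise concatenation leaves a stray BOM in the middle of the
result:

```python
import codecs

# Hypothetical contents of two files saved with a UTF-8 signature (EF BB BF).
first = codecs.BOM_UTF8 + "one\n".encode("utf-8")
second = codecs.BOM_UTF8 + "two\n".encode("utf-8")

# Byte-wise concatenation, which is all that cat does.
joined = first + second

# The utf-8-sig codec drops only a signature at the very start of the data.
text = joined.decode("utf-8-sig")

# The second signature survives as the invisible character U+FEFF.
print(repr(text))
```

The leading signature is handled, but the one that used to start the
second file is now ordinary data in the middle of the stream.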
For Unicode’s own comments on BOM see
http://unicode.org/faq/utf_bom.html#BOM . They quite openly state the
problem with UTF-8 and BOM. The problem originates with the Unicode
standard (and software which does not know it is handling Unicode and
doesn’t know about BOM). BOM has been part of Unicode since Unicode 1.0
in 1991.
Unless or until the Unicode consortium deprecates its use, BOM is a part
of Unicode to be used as defined by the Unicode standard. There was
quite a lot of pressure put on them in the Unicode forum a few years
back to deprecate BOM. They wouldn’t do it, feeling that its use as a
signature and a byte-order indicator more than compensated for users
trying to do things with Unicode files using tools that didn’t
understand Unicode.
The basic trouble is that a lot more things are done on Nix systems with
text files than in Windows, which means there is more possibility of
things being messed up by an unexpected BOM. And, at least a few years
back, Nix people seemed to prefer to blame Microsoft rather than to
modify tools to recognize BOM at the beginning of a file and take care
of it. It’s not very much trouble to add a few lines of code to check if
a file begins with EF BB BF and then take appropriate action, either
question the user or just throw the three bytes away.
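
As a rough illustration of that check, here is a minimal Python sketch.
The function name strip_utf8_bom is my own invention for the example and
does not come from any tool mentioned in this thread. It takes the "just
throw the three bytes away" option:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        # Discard exactly the three signature bytes and keep the rest.
        return data[len(codecs.BOM_UTF8):]
    # No signature at the start: hand the bytes back unchanged.
    return data
```

In Python specifically, decoding with the "utf-8-sig" codec has the same
effect on a leading signature, so a tool that only needs text rather than
raw bytes may not need even this much code.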
Jim Allan