> From: Jukka K. Korpela <[email protected]>
>> "When the BOM is used in web pages or editors for UTF-8 encoded content it
>> can sometimes introduce blank spaces or short sequences of strange-looking
>> characters (such as ). For this reason, it is usually best for
>> interoperability to omit the BOM, when given a choice, for UTF-8 content."
>>
>> http://www.w3.org/International/questions/qa-byte-order-mark
>>
>> In reality, BOM surely helps rather than hurts, especially when a document
>> is saved locally and HTTP headers are thereby lost. Authoring tools may have
>> problems with it (and then again, some tools have problems with UTF-8 files
>> that _lack_ BOM).
This statement about maximum interoperability may have been true in the past, when Unicode support was not so universal and had not yet been formally adopted for newer developments in RFCs published by the IETF. But the situation is now reversed: maximum interoperability is offered when BOMs are present, not really to indicate the byte order itself, but to confirm that the content is Unicode-encoded and extremely likely to be text rather than arbitrary binary content (which today almost always carries a distinctive leading signature of its own).

Without the BOM we remain in the old practice of host-specific, unspecified default encodings, which do not survive transmission from one system to another, or from one user to another. The worst case appears when the default decoding depends on the viewing user of the service, simply because that user speaks a different language whose basic settings imply a different default encoding. Users generally don't know how to set encodings, and will refuse to keep changing their environment for every service or piece of content they want to access; that does not work today, when applications and services come simultaneously from many sources across a highly heterogeneous worldwide network.

BOMs help much more than they hurt today, and most places where they hurt are systems that should have been updated long ago because of the many security holes discovered in them, constantly exploited by attacks and only partly fixed by security suites. Those old systems are also frequently much less performant now (on the same hardware resources), notably in everything related to filesystems and to Internet protocols such as web browsers. Even if we don't re-encode the archives, we now have very simple and fast conversion tools that "reconnect" these archives to the modern world in a transparent way: if they are archives, the data they store is read-only and can be accessed through a transparent filter, whose speed can also be improved by internal caching on newer, faster, and cheaper storage solutions; such transparent, caching proxies also help preserve the precious archives.
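
To make the point concrete, here is a minimal sketch of how a leading BOM removes the guessing. It is my own illustration, not taken from any particular tool; the function name, the BOM table and the cp1252 fallback are just assumptions for the example:

    # Minimal sketch: how a leading BOM lets the reader pick the right decoder
    # instead of guessing a host- or locale-specific default.
    import codecs

    # Signatures checked longest-first so a UTF-32 BOM is not mistaken for UTF-16.
    _BOMS = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8,     "utf-8-sig"),   # "utf-8-sig" strips the BOM on decode
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]

    def sniff_encoding(data: bytes, fallback: str = "cp1252") -> str:
        """Return the encoding signalled by a leading BOM, or a guessed fallback."""
        for bom, name in _BOMS:
            if data.startswith(bom):
                return name
        # No BOM: we are back to guessing, e.g. the viewing user's locale default.
        return fallback

    # The same bytes now decode identically on any host, because the BOM travels
    # with the content, while HTTP headers and host defaults do not.
    data = b"\xef\xbb\xbfcaf\xc3\xa9"
    print(data.decode(sniff_encoding(data)))   # -> "café"

Without the leading three bytes, the same function falls back to whatever default the receiving host happens to use, which is exactly the non-interoperable behaviour described above.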

