> Just imagine what would be created with your assumption with this source:
> const wchar_t c = L'?';
> where ? is a combining character.

The programmer would get bitten. At the very least, there's no reason to assume that every compiler accepts UTF-8, and on top of that you can't assume that neither the compiler nor any intermediate step normalizes the text. That's why Unicode escapes exist, and partly why Java as a general rule translates source into a form that uses Unicode escapes for non-ASCII characters. Even if you assume the compiler can accept Unicode text in whatever UTF you choose, it still seems needlessly dangerous to use a bare combining character instead of a Unicode escape or a numeric entity.

Despite your distinction, there's no clear line between programming editors and non-programming editors. Any editor that gives you variable names in Hindi or Arabic is likely to have the sophistication needed to combine that ? with that ', and I see no reason it won't; quite possibly, the underlying system won't give it the option of handling Hindi or Arabic without also combining that ? with that '. Emacs, for one notorious programming editor, fully plans to have that sophistication.
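
To make the escape suggestion concrete, here is a rough sketch of what I mean. It assumes a C99 (or later) compiler that supports universal character names, and it picks U+0301 COMBINING ACUTE ACCENT purely as an example character. The names combining_acute, in_combining_diacriticals and sanity_check are made up for illustration, and the numeric check assumes wchar_t holds the code point directly, which is common but not guaranteed everywhere.

```c
#include <assert.h>
#include <wchar.h>

/* U+0301 COMBINING ACUTE ACCENT, written as a universal character name.
   The source file stays pure ASCII, so no editor or intermediate tool
   can attach the mark to the preceding quote or normalize it away. */
static const wchar_t combining_acute = L'\u0301';

/* Hypothetical helper: nonzero if wc lies in the Combining Diacritical
   Marks block, U+0300..U+036F. Assumes wchar_t stores code points. */
int in_combining_diacriticals(wchar_t wc)
{
    return wc >= 0x0300 && wc <= 0x036F;
}

/* Sanity check under the assumption stated above. */
void sanity_check(void)
{
    assert(combining_acute == 0x0301);
    assert(in_combining_diacriticals(combining_acute));
}
```

Written this way, what you see in the editor is exactly the bytes the compiler gets, whatever the editor does when it renders combining sequences.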

