> OK, it's safe, but it is a misuse of Unicode. As space plus combining
> character is a unit in Unicode, it should be treated as a unit by higher
> level protocols. If higher level protocols are allowed to do arbitrary
> things within Unicode units, there is no end to the possible confusion.
> See for example, from Unicode 4.0 chapter 3:
>
> C7 A process shall interpret a coded character representation according
> to the character semantics established by this standard, if that process
> does interpret that coded character representation.

If this is not the case (I'm not entirely sure this bans what XML does with spaces), then all we would need is a change that replaces the de facto ban on space+combining within names and nmtokens with an explicit ban on the same. Then we'd all be happy, except possibly for some sadistic XML application designer who was planning to use that combination out of ill will towards his or her colleagues.
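
To make "an explicit ban" concrete, here is a rough sketch in Python of the check such a rule would imply. It is only an illustration, not text from any spec: the function name find_space_combining is made up, and treating "combining character" as "any character whose general category starts with M" is my assumption about how the rule would be worded.

```python
import unicodedata


def find_space_combining(text):
    """Return the indices of each U+0020 that is immediately followed
    by a combining mark (general category Mn, Mc or Me).

    Hypothetical helper: the name and the choice of categories are
    assumptions made for illustration only.
    """
    hits = []
    for i in range(len(text) - 1):
        next_is_mark = unicodedata.category(text[i + 1]).startswith("M")
        if text[i] == " " and next_is_mark:
            hits.append(i)
    return hits


# A space followed by U+0301 COMBINING ACUTE ACCENT is flagged;
# an ordinary space between two letters is not.
print(find_space_combining("a \u0301b"))
print(find_space_combining("a b"))
```

A processor applying the explicit rule would reject any name or nmtoken for which this returns a non-empty list, rather than silently splitting at the space as happens today.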

