Peter Kirk asked:

> Thanks for the clarification. I probably misunderstood Jon's intention.
> But is there a problem if, for example, an application sees the string
> <space, space, combining mark> and regularises it (wrongly!) to <space,
> combining mark>?
Then you have a problem, of course. What the Unicode Standard says about applying nonspacing combining marks to SPACE seems clear to me. What other standards say about space folding is clear in their own contexts. If someone is implementing both kinds of standards together, then they have to be careful about how the two sets of requirements interact.

In Unicode terms, space folding is an example of a "knowing modification" of the content of the text. It is perfectly o.k. to modify Unicode text, of course, *as long as you know what you are doing*, i.e., as long as you aren't turning valid text into bit hash by failing to conform to the meaning of the characters or to their encoding forms.

Now if a process is doing space folding, but is applying it to Unicode text as a "semi-ignorant modification", i.e., without being aware that nonspacing combining marks can apply to SPACE characters (and that such sequences are valid combining character sequences which should be treated analogously to other grapheme clusters, viz. UAX #29), then it is modifying the text away from its intended content without *knowing* what it is actually doing. Such mistakes are programming errors in the application of the relevant standards.

Of course, a standard which mandates space folding is also within its rights to mandate, for example, that nonspacing marks not be applied to SPACE characters. It can simply rule out such sequences as invalid in its context, in which case the problem goes away.

The important thing here is to know what you are doing when you modify text and, as far as possible, to accomplish such modifications in the same way as other processes which also know what they are doing. That is the basis for interoperability of textual data.

--Ken
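To make the "knowing" versus "semi-ignorant" distinction concrete, here is a minimal Python sketch of a space-folding routine that is aware of combining marks. It is an illustration only, not code from any standard: the names `fold_spaces` and `is_combining_mark` are hypothetical, and treating every general category M* character (Mn, Mc, Me) as a combining mark is a simplifying assumption; a fuller implementation would follow the grapheme cluster rules of UAX #29. A naive fold such as `re.sub(" +", " ", s)` turns <space, space, combining mark> into <space, combining mark>; this sketch instead keeps the last SPACE of a run as the base of the following mark.

```python
import unicodedata


def is_combining_mark(ch: str) -> bool:
    # Simplifying assumption: any general category M* (Mn, Mc, Me)
    # character counts as a combining mark for this sketch.
    return unicodedata.category(ch).startswith("M")


def fold_spaces(text: str) -> str:
    """Collapse runs of U+0020 SPACE to a single space, except that a
    SPACE immediately followed by a combining mark is preserved as the
    base of that combining character sequence."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch != " ":
            # Non-space characters (including combining marks) pass through.
            out.append(ch)
            i += 1
            continue
        # Find the end of this run of SPACE characters.
        j = i
        while j < n and text[j] == " ":
            j += 1
        if j < n and is_combining_mark(text[j]):
            # The last space in the run is the base of a combining
            # sequence; only the spaces before it are plain whitespace.
            plain = j - i - 1
            if plain > 0:
                out.append(" ")  # folded plain whitespace
            out.append(" ")      # base SPACE for the combining mark(s)
        else:
            out.append(" ")      # ordinary run folds to one space
        i = j
    return "".join(out)
```

For example, `fold_spaces("a    b")` returns `"a b"`, while `fold_spaces("a  \u0301b")` returns its input unchanged: one plain space followed by the <space, combining acute> cluster.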

