...This is all rather interesting speculation. There are surely a lot of potential cases in scripts where some kind of combining mark can be considered as applying to a sequence of an arbitrary number of characters. For example:
encircle(<DIGIT 9, DIGIT 2, DIGIT 3, DOT, DIGIT 0>) == <DIGIT 9, DOUBLE ENCLOSING CIRCLE, DIGIT 2, DOUBLE ENCLOSING CIRCLE, DIGIT 3, DOUBLE ENCLOSING CIRCLE, DOT, DOUBLE ENCLOSING CIRCLE, DIGIT 0>
Here there is no ZWJ character; it is the double diacritic itself that explicitly creates the ligature between the preceding and following base characters.
None of these solutions is specified in the standard. They are purely conventions for using Unicode. Until some enhancement to the Unicode character model is published that clearly delimits the range of characters to which a diacritic applies, without relying on the overly simple ZWJ control, the interpretation of such encoded text will remain application-dependent.
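To make the "application-dependent" point concrete, here is a minimal sketch of how one application might choose to read such a sequence: it treats the double mark as joining the base characters on either side into a single enclosed span. The token names are plain strings, not real code points, and DOUBLE ENCLOSING CIRCLE is only the hypothetical character from the example above; the function name group_spans is likewise my own invention.

```python
# Hypothetical double diacritic from the example; not an actual Unicode character.
DOUBLE_MARK = "DOUBLE ENCLOSING CIRCLE"

def group_spans(tokens, mark=DOUBLE_MARK):
    """Group base characters that are linked by the double mark.

    Returns a list of spans; each span is a list of base characters
    that this (assumed) convention would draw inside one enclosure.
    """
    spans = []
    join_next = False  # True when the previous token was the double mark
    for tok in tokens:
        if tok == mark:
            if not spans:
                raise ValueError("mark with no preceding base character")
            join_next = True
        elif join_next:
            spans[-1].append(tok)  # extend the span opened by the previous base
            join_next = False
        else:
            spans.append([tok])    # an unlinked base character starts a new span
    if join_next:
        raise ValueError("mark with no following base character")
    return spans

encoded = ["DIGIT 9", DOUBLE_MARK, "DIGIT 2", DOUBLE_MARK, "DIGIT 3",
           DOUBLE_MARK, "DOT", DOUBLE_MARK, "DIGIT 0"]
print(group_spans(encoded))
```

A different application could equally well decide to enclose each adjacent pair separately, which is exactly why the result is not interoperable without a specification.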
Enclosing circles, squares and ellipses.
Continuous underlines and overlines.
Continuous tildes, slurs, contour tone marks, etc., which may apply to several characters or whole words.
The cartouche in Egyptian hieroglyphs, which surrounds a group of several characters.
A number of mathematical functions e.g. fraction dividers, extensions to root signs.
Combining marks which are supposed to be centred over or under two or more characters or even a whole word, like the Hebrew masora circle.
Now I am sure it could be argued that some of these are not plain text and so should be dealt with by higher level markup. But maybe some of these need to be considered as part of plain text; for example, it is at least conceivable, and arguably true of the Egyptian cartouche, that these marks are required for proper understanding of the plain text, just as much so as regular letters and combining marks.
So how should they be represented? Philippe's suggestion of <c1, mark, c2, mark, c3, mark... mark, cn> would seem to work, but could be very inefficient. Jill's alternative <bracket1, c1, c2, c3... cn, bracket2, mark> is more efficient for long sequences. But perhaps better would be to have paired opening and closing marks: <mark1, c1, c2, c3... cn, mark2> - although this requires a new pair of characters for each such case.
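As a rough illustration of the efficiency argument, the three schemes can be sketched as functions over lists of symbolic character names. Everything here is an assumption for the sake of comparison: the function names, and the placeholder tokens MARK, BRACKET1, BRACKET2, MARK1 and MARK2, stand in for characters that do not exist in the standard.

```python
def interleaved(chars, mark):
    """<c1, mark, c2, mark, ... mark, cn>: a mark between each pair of characters."""
    out = []
    for i, c in enumerate(chars):
        if i > 0:
            out.append(mark)
        out.append(c)
    return out

def bracketed(chars, bracket1, bracket2, mark):
    """<bracket1, c1 ... cn, bracket2, mark>: generic brackets, then one mark."""
    return [bracket1, *chars, bracket2, mark]

def paired(chars, mark1, mark2):
    """<mark1, c1 ... cn, mark2>: a dedicated opening and closing mark."""
    return [mark1, *chars, mark2]

chars = ["DIGIT 9", "DIGIT 2", "DIGIT 3", "DOT", "DIGIT 0"]
n = len(chars)

# Encoded lengths for n characters: 2n - 1, n + 3 and n + 2 respectively.
print(len(interleaved(chars, "MARK")))                    # 9
print(len(bracketed(chars, "BRACKET1", "BRACKET2", "MARK")))  # 8
print(len(paired(chars, "MARK1", "MARK2")))               # 7
```

So the overhead of the interleaved form grows with the length of the sequence (n - 1 extra characters), while the other two add a fixed three or two characters. The cost of the paired form shows up elsewhere: in the number of new characters that would have to be encoded, one pair per kind of mark.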
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/