On Sat, 22 Jun 2019 23:56:50 +0000 Shawn Steele via Unicode <unicode@unicode.org> wrote:
> + the list. For some reason the list's reply header is confusing.
>
> From: Shawn Steele
> Sent: Saturday, June 22, 2019 4:55 PM
> To: Sławomir Osipiuk <sosip...@gmail.com>
> Subject: RE: Unicode "no-op" Character?
>
> The original comment about putting it between the base character and
> the combining diacritic seems peculiar. I'm having a hard time
> visualizing how that kind of markup could be interesting?

There are a number of possible interesting scenarios:

1) Chopping the string into user-perceived characters. For example,
the Khmer sequences of COENG plus letter are named sequences. Akin to
this is identifying resting places for a simple cursor, e.g. allowing
it to be positioned between a base character and a spacing, unreordered
subscript. (This last possibility overlaps with rendering.)

2) Chopping the string into collating elements. (This can require
renormalisation, and may raise a rendering issue with HarfBuzz, where
renormalisation is required to get marks into a suitable order for
shaping. I suspect no-op characters would disrupt this renormalisation;
CGJ may legitimately be used to affect rendering this way, even though
it is supposed to have no other effect* on rendering.)

3) Chopping the string into default grapheme clusters. That separates
a coeng from the following character with which it interacts.

* Is a Unicode-compliant *renderer* allowed to distinguish diaeresis
from the umlaut mark?

Richard.
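
A minimal sketch of the reordering concern in point 2, using only Python's standard unicodedata module. The sample string (a + diaeresis + dot below) and the helper names nfd and codepoints are assumptions chosen for illustration, not taken from the thread. It shows only that CGJ, having canonical combining class 0, stops canonical reordering across it; it says nothing about what HarfBuzz itself does.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0


def nfd(s: str) -> str:
    """Canonical decomposition, including canonical reordering of marks."""
    return unicodedata.normalize("NFD", s)


def codepoints(s: str) -> str:
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Hypothetical sample: a + COMBINING DIAERESIS (ccc 230) + COMBINING DOT BELOW (ccc 220)
plain = "a\u0308\u0323"
# Same marks, with CGJ inserted between them
with_cgj = "a\u0308" + CGJ + "\u0323"

# Without CGJ, normalisation swaps the marks into ascending ccc order.
print(codepoints(nfd(plain)))     # U+0061 U+0323 U+0308
# With CGJ, the ccc-0 character blocks the swap and the order is kept.
print(codepoints(nfd(with_cgj)))  # U+0061 U+0308 U+034F U+0323
```

Any inserted character with combining class 0 would block reordering in the same way, which is why a no-op placed between a base and its marks could leave the marks in an order a shaper does not expect.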