On 2/2/2020 5:22 PM, Richard Wordingham via Unicode wrote:
On Sun, 2 Feb 2020 16:20:07 -0800
Eric Muller via Unicode <unicode@unicode.org> wrote:

That would imply some coordination among variation sequences on
different code points, right?

E.g. <0B48> ≡ <0B47, 0B56>, so a variation sequence on 0B56 (Mn,
ccc=0) would imply the existence of a variation sequence on 0B48 with
the same variation selector, and the same effect.
That particular case oughtn't to be impossible, as in NFD everything in
sight has ccc=0.  However TUS 12.0 Section 23.4 does contain an
additional prohibition against meaningfully applying a variation
selector to a 'canonical decomposable character'. (Scare quotes because
'ly' seems to be missing from the phrase.)


So, let's look at what that would look like with some variation selector Fxxx:

<0B48, Fxxx> ≡ <0B47, 0B56, Fxxx>

If the variant shape of 0B48 is well described as a variation on the contribution of 0B56 in the decomposed sequence, then this might make sense. But if the variant would be better described as a variation in the 0B47 component, then it would be a prime example of poor "pseudo-encoding", where some random sequence is assigned to a shape (in this case) without being properly analyzable into constituent characters with their own identity.

Which would it be in this example?

And this example only works, of course, because with ccc=0, 0B56 cannot be reordered.
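
The equivalence above, and the role ccc=0 plays in it, can be checked mechanically. What follows is only a minimal sketch using Python's standard unicodedata module. U+FE00 (VS1) is an assumption standing in for the placeholder Fxxx; no such variation sequence is defined for these characters, so this shows the normalization mechanics and nothing more.

```python
import unicodedata

ORIYA_AI = "\u0B48"         # ORIYA VOWEL SIGN AI
ORIYA_E = "\u0B47"          # ORIYA VOWEL SIGN E
ORIYA_AI_LENGTH = "\u0B56"  # ORIYA AI LENGTH MARK (Mn, ccc=0)
VS = "\uFE00"               # assumption: VS1 as a stand-in for Fxxx


def nfd(text):
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def nfc(text):
    """Return the canonical composition (NFC) of text."""
    return unicodedata.normalize("NFC", text)


composed = ORIYA_AI + VS                     # <0B48, Fxxx>
decomposed = ORIYA_E + ORIYA_AI_LENGTH + VS  # <0B47, 0B56, Fxxx>

# The two spellings are canonically equivalent: each normalizes to the other.
print(nfd(composed) == decomposed)
print(nfc(decomposed) == composed)

# 0B56 has combining class 0, so canonical reordering never moves it.
print(unicodedata.combining(ORIYA_AI_LENGTH))

# Contrast: marks with non-zero ccc are sorted by class, so U+0323
# (ccc=220) moves in front of U+0301 (ccc=230) under NFD.
print(nfd("a\u0301\u0323") == "a\u0323\u0301")
```

In the decomposed form the selector ends up immediately after 0B56, which is why the variation has to be describable as a variation on that component.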

The prohibition as worded may perhaps be slightly broader than necessary, but I can understand that the UTC didn't want to parse it more finely in the absence of any good examples that could be used to better understand what the actual limitations should be. Better safe than sorry, and all that.


On 2/2/2020 11:43 AM, Mark Davis ☕️ via Unicode wrote:
I don't think there is a technical reason for disallowing variation
selectors after any starters (ccc=000); the normalization algorithm
doesn't care about the general category of characters.
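
A rough way to see this for oneself is to put a variation selector after starters of different general categories and normalize. This is only a sketch with Python's standard unicodedata module; U+FE00 and the three sample starters are arbitrary choices made for illustration, and the check says nothing about whether such sequences are permitted by TUS, only how the normalization algorithm treats them.

```python
import unicodedata

VS = "\uFE00"  # VARIATION SELECTOR-1, chosen arbitrarily for illustration
FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Sample starters (ccc=0) with three different general categories.
STARTERS = ["A", "\u0B47", "\u0B56"]


def survives_normalization(starter, selector=VS):
    """True if <starter, selector> is unchanged by every normalization form."""
    sequence = starter + selector
    return all(unicodedata.normalize(form, sequence) == sequence
               for form in FORMS)


for ch in STARTERS:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),           # general category
        unicodedata.combining(ch),          # canonical combining class
        survives_normalization(ch),
    )
```

All three sequences come back unchanged regardless of whether the starter is a letter, a spacing mark or a nonspacing mark.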


