Hi Martin,

Isn't this question analogous to asking whether the layout engine should use the C1-conjoining form or the C2-conjoining form for a <C1, Virama, C2> sequence in any Indic script? That is, whether <C1, Virama> should form a glyph while C2 keeps its independent form, or vice versa. (Potentially there can be more forms: a full ligature and an explicit Virama form.) If the question you asked is equivalent, then the answer is traditionally left to the font to decide.
BTW, even for a given C1 and C2 in a given script, a font can potentially choose a different answer based on its purpose or character, like a font for Malayalam traditional script vs. a font for reformed script.

regards,
Cibu

On Mon, Oct 17, 2016 at 12:15 AM, Harshula <[email protected]> wrote:
> Hi Martin,
>
> On 15/10/16 04:07, Martin Jansche wrote:
> > For Sinhala, the following named sequences are defined (for good
> > reasons):
> >
> > SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA
> > SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB
> > SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D
> >
> > I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll
> > write Ya for 0DBA and Ra for 0DBB.
> >
> > Note that these give rise to two potentially ambiguous codepoint
> > strings, namely
> >
> > 0DBB 0DCA 200D 0DBA
> > 0DBB 0DCA 200D 0DBB
> >
> > I'll concentrate on the first, as all arguments apply to the second one
> > analogously.
> >
> > At first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible
> > parses:
> >
> > 0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya
> > 0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya
> >
> > First question: Does the standard give any guidance as to which one is
> > the intended parse? The section on Sinhala in the Unicode Standard is
> > silent about this. Is there a general principle I'm missing?
> >
> > Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not
> > used and is considered incorrect, suggesting that the second parse
> > (Repaya+Ya) should be the default interpretation of this sequence.
> > However, SLS 1134 does not address the potential ambiguity of this
> > sequence explicitly and the description there could be read as
> > informative, not normative.
>
> 1) re: 0DBB 0DCA 200D 0DBA
>
> SLS 1134 was updated in 2011 (The latest public version I could find is
> v3.41. This extract is the same in v3.6.):
> https://sourceforge.net/p/sinhala/mailman/attachment/[email protected]/1/
>
> "1. The yansaya is not used following the letter ර. e.g.: the spelling
> කාර්ය is incorrect."
>
> If the above is insufficient, it's best to discuss the issue with Harsha
> (CC'd) and Ruvan (CC'd).
>
> 2) re: 0DBB 0DCA 200D 0DBB
>
> Harsha & Ruvan can clarify this too.
>
> cya,
> #
>
> > Second question: Given that one parse of this sequence should be the
> > default, how does one represent the non-default parse?
> >
> > In most cases one can guess what the intended meaning is, but I suspect
> > this is somewhat of a gray area. In practice, trying to render these
> > problematic sequences and their neighbors in HarfBuzz with a variety of
> > fonts results in a variety of outcomes (including occasionally
> > unexpected glyph choices). If the meaning of these sequences is not well
> > defined, that would partly explain the variation across fonts.
> >
> > Am I missing something fundamental? If not, it seems this issue should
> > be called out explicitly in some part of the standard.
> >
> > Regards,
> > -- martin
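
For anyone who wants to reproduce the ambiguity Martin describes, here is a minimal Python sketch that enumerates every way a codepoint string can be split into the three named sequences plus the bare letters Ya and Ra. The `parses` function and the two lookup tables are hypothetical helpers written for this illustration only; they are not part of any standard, font, or shaping engine, and they say nothing about which parse a renderer actually picks.

```python
# Named sequences as listed in the thread, keyed by their codepoints.
# The short labels follow Martin's abbreviations.
NAMED = {
    (0x0DCA, 0x200D, 0x0DBA): "Yansaya",
    (0x0DCA, 0x200D, 0x0DBB): "Rakaransaya",
    (0x0DBB, 0x0DCA, 0x200D): "Repaya",
}
# Bare letters used in the thread: Ya = 0DBA, Ra = 0DBB.
LETTERS = {0x0DBA: "Ya", 0x0DBB: "Ra"}


def parses(cps):
    """Return every split of cps into bare letters and named sequences."""
    cps = tuple(cps)
    if not cps:
        return [[]]
    results = []
    # Option 1: consume a single bare letter.
    if cps[0] in LETTERS:
        for rest in parses(cps[1:]):
            results.append([LETTERS[cps[0]]] + rest)
    # Option 2: consume a three-codepoint named sequence.
    head = cps[:3]
    if head in NAMED:
        for rest in parses(cps[3:]):
            results.append([NAMED[head]] + rest)
    return results


# The first ambiguous string from the thread: 0DBB 0DCA 200D 0DBA.
for p in parses([0x0DBB, 0x0DCA, 0x200D, 0x0DBA]):
    print(" + ".join(p))
```

Run on 0DBB 0DCA 200D 0DBA, this yields both readings from the thread (Ra + Yansaya and Repaya + Ya), which is exactly why a default needs to be stated somewhere.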

