Re: Bidi edge cases in Hangul and Indic

Ken Whistler via Unicode Thu, 22 Feb 2018 15:36:06 -0800


On 2/22/2018 11:39 AM, David Corbett via Unicode wrote:

For example, after a right-to-left override, the Hangul string 보기 (“bogi”) becomes 기보 (“gibo”) in visual order. However, its NFD form is reordered by jamo instead of by syllable; that is, it looks like “igob”.

Nope. *tilt* The UBA reorders the display order in layout -- not the underlying string.

"bogi" is the sequence <1107, 1169, 1100, 1175> in NFD or <BCF4, AE30> in NFC.

Because of canonical equivalence, for display of the NFD string, the sequence <1107, 1169> needs to be mapped onto the same *glyph* as BCF4, and the sequence <1100, 1175> onto the same *glyph* as AE30.
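
The code points above can be checked with a few lines of Python. This is only a minimal sketch using the standard unicodedata module; the helper name code_points is made up for illustration.

```python
import unicodedata

def code_points(s):
    """Return the code points of s as uppercase hex strings."""
    return [f"{ord(ch):04X}" for ch in s]

bogi = "\uBCF4\uAE30"  # "bogi" as two precomposed syllables

nfd = unicodedata.normalize("NFD", bogi)  # decomposed into conjoining jamos
nfc = unicodedata.normalize("NFC", nfd)   # recomposed into syllables

print(code_points(nfd))  # ['1107', '1169', '1100', '1175']
print(code_points(nfc))  # ['BCF4', 'AE30']
```

The round trip NFC -> NFD -> NFC returns the original string, which is what canonical equivalence of the two forms amounts to here.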

If you override the normal left-to-right ordering with bidi override controls, then the layout order is reversed, but what is actually laid out is those two glyphs. So you just reverse the order of the two syllables for display, in either case.
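
A toy model of that point, in Python. It is not an implementation of the UBA or of a shaping engine: it simply assumes one glyph per NFC syllable (true for this string, not in general) and reverses the glyphs. The function name visual_order_rtl is hypothetical.

```python
import unicodedata

def visual_order_rtl(s):
    """Toy model: treat each NFC syllable as one glyph, lay out right to left."""
    glyphs = list(unicodedata.normalize("NFC", s))  # assumption: 1 glyph per code point
    return "".join(reversed(glyphs))

nfc = "\uBCF4\uAE30"                     # <BCF4, AE30>
nfd = unicodedata.normalize("NFD", nfc)  # <1107, 1169, 1100, 1175>

# Both encodings yield the same two glyphs, so they reverse identically.
same = visual_order_rtl(nfc) == visual_order_rtl(nfd)
print(same)                              # True
print(visual_order_rtl(nfd) == "\uAE30\uBCF4")  # True: "gibo" by syllable
```

Under that assumption the reversal works on syllables whichever normalization form the underlying string happens to be in.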

You could force display of "igob", but only if you had inserted some character in between the conjoining jamos that was preventing their equivalence to the syllables, anyway.
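
As a sketch of what such an intervening character does to the equivalence: the Python below inserts U+200C ZERO WIDTH NON-JOINER between the jamos. That character is an arbitrary choice for illustration, and what a given renderer actually displays for the result depends on the font and shaping engine; the code only shows that the string no longer composes to the syllables.

```python
import unicodedata

ZWNJ = "\u200C"  # arbitrary intervening character, chosen for illustration

plain = "\u1107\u1169\u1100\u1175"                  # NFD "bogi"
broken = "\u1107" + ZWNJ + "\u1169\u1100" + ZWNJ + "\u1175"

composed_plain = unicodedata.normalize("NFC", plain)
composed_broken = unicodedata.normalize("NFC", broken)

print(len(composed_plain))        # 2: the jamos compose to <BCF4, AE30>
print(len(composed_broken))       # 6: nothing composes across the ZWNJ
print(composed_broken == broken)  # True: the string is left unchanged
```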

I don’t think it is the intent of the algorithm that canonically equivalent strings display so very differently, but I can’t find any explicit guidance. What should a UBA-conformant renderer do?


The right thing. ;-)

--Ken
