BTW, relevant to this discussion is a proposal filed at http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf (the date on it is wrong; it should be 2017-12-22).
Mark

On Tue, Jan 2, 2018 at 11:41 AM, Mark Davis ☕️ <m...@macchiato.com> wrote:

> We had that originally, but some people objected that some languages
> (Arabic, as I recall) can end a string of letters with a ZWJ, and
> immediately follow it by an emoji (without an intervening space) without
> wanting it to be joined into a grapheme cluster with a following symbol.
> While I personally consider that a degenerate case, we tightened the
> definition to prevent that.
>
> Mark
>
> On Tue, Jan 2, 2018 at 10:41 AM, Manish Goregaokar <man...@mozilla.com>
> wrote:
>
>> In the current draft GB11 mentions Extended_Pictographic Extend* ZWJ x
>> Extended_Pictographic.
>>
>> Can this similarly be distilled to just ZWJ x Extended_Pictographic? This
>> does affect cases like <indic letter, virama, ZWJ, emoji> or <arabic
>> letter, zwj, emoji> and I'm not certain if that counts as a degenerate
>> case. If we do this then all of the rules except the flag emoji one become
>> things which can be easily calculated with local information, which is nice
>> for implementors.
>>
>> (Also in the current draft I think GB11 needs an `E_Modifier?` somewhere
>> but if we merge that with Extend that's not going to be necessary anyway)
>>
>> -Manish
>>
>> On Tue, Jan 2, 2018 at 3:02 PM, Manish Goregaokar <man...@mozilla.com>
>> wrote:
>>
>>> > Note: we are already planning to get rid of the GAZ/EBG distinction
>>> > (http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event.
>>>
>>> This is great! I hadn't noticed this when I last saw that draft (I was
>>> focusing on the Virama stuff). Good to know!
>>>
>>> > Instead, we'd add one line to
>>> > *Extend <http://www.unicode.org/reports/tr29/tr29-32.html#Extend>:*
>>>
>>> Yeah, this is essentially what I was hoping we could do.
>>>
>>> Is there any way to formally propose this? Or is bringing it up here
>>> good enough?
>>>
>>> Thanks,
>>>
>>> -Manish
>>>
>>> On Mon, Jan 1, 2018 at 9:17 PM, Mark Davis ☕️ via Unicode <
>>> unicode@unicode.org> wrote:
>>>
>>>> This is an interesting suggestion, Manish.
>>>>
>>>> <non-emoji-base, skin tone modifier> is a degenerate case, so if we
>>>> follow your suggestion we could also drop E_Base and E_Modifier, and
>>>> rule GB10.
>>>>
>>>> Instead, we'd add one line to *Extend
>>>> <http://www.unicode.org/reports/tr29/tr29-32.html#Extend>:*
>>>>
>>>> OLD
>>>> Grapheme_Extend = Yes
>>>> *and not* GCB = Virama
>>>>
>>>> NEW
>>>> Grapheme_Extend = Yes, or
>>>> Emoji characters listed as Emoji_Modifier=Yes in emoji-data.txt. See
>>>> [UTS51 <http://www.unicode.org/reports/tr41/tr41-21.html#UTS51>].
>>>> *and not* GCB = Virama
>>>>
>>>> Note: we are already planning to get rid of the GAZ/EBG distinction
>>>> (http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event.
>>>>
>>>> Mark
>>>>
>>>> On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode <
>>>> unicode@unicode.org> wrote:
>>>>
>>>>> On Mon, 1 Jan 2018 13:24:29 +0530
>>>>> Manish Goregaokar via Unicode <unicode@unicode.org> wrote:
>>>>>
>>>>> > <random non-emoji, skin tone modifier> sounds very much like a
>>>>> > degenerate case to me.
>>>>>
>>>>> Generally yes, but I'm not sure that they'd be inappropriate for
>>>>> Egyptian hieroglyphs showing human beings. The choice of determinative
>>>>> can convey unpronounceable semantic information, though I'm not sure
>>>>> that that can be as sensitive as skin colour. However, in such a case
>>>>> it would also be appropriate to give a skin tone modifier the property
>>>>> Extend.
>>>>>
>>>>> Richard.
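To make the GB11 trade-off in this thread concrete, here is a minimal Python sketch contrasting the draft rule (Extended_Pictographic Extend* ZWJ x Extended_Pictographic) with the proposed local rule (ZWJ x Extended_Pictographic), and treating a skin tone modifier as Extend per the NEW definition quoted above. It is a toy, not the UAX #29 algorithm: the four-value property set, the tiny lookup table, and the names Prop, classify, is_boundary and segment are all hypothetical, and every other rule (GB3-GB9c, regional indicators, virama handling) is deliberately omitted.

```python
from __future__ import annotations

from enum import Enum, auto


class Prop(Enum):
    """Hypothetical, heavily simplified grapheme-break classes."""
    OTHER = auto()
    EXTEND = auto()    # Grapheme_Extend=Yes, or (as proposed) Emoji_Modifier=Yes
    ZWJ = auto()       # U+200D ZERO WIDTH JOINER
    EXT_PICT = auto()  # Extended_Pictographic=Yes


# Hypothetical lookup covering only the code points used below; a real
# implementation would load the Unicode data files instead.
_PROPS = {
    "\u200d": Prop.ZWJ,
    "\u0301": Prop.EXTEND,        # COMBINING ACUTE ACCENT (Grapheme_Extend)
    "\U0001F3FD": Prop.EXTEND,    # skin tone modifier: Extend under the NEW definition
    "\U0001F468": Prop.EXT_PICT,  # MAN
    "\U0001F680": Prop.EXT_PICT,  # ROCKET
}


def classify(ch: str) -> Prop:
    return _PROPS.get(ch, Prop.OTHER)


def is_boundary(props: list[Prop], i: int, proposed: bool) -> bool:
    """Return True if a cluster boundary falls before props[i] (toy rules only)."""
    prev, cur = props[i - 1], props[i]
    # GB9: do not break before Extend or ZWJ.
    if cur in (Prop.EXTEND, Prop.ZWJ):
        return False
    if prev is Prop.ZWJ and cur is Prop.EXT_PICT:
        if proposed:
            # Proposed rule: ZWJ x Extended_Pictographic, decided from one neighbour.
            return False
        # Draft GB11: Extended_Pictographic Extend* ZWJ x Extended_Pictographic.
        # This needs an unbounded backward scan over Extend* before the ZWJ.
        j = i - 2
        while j >= 0 and props[j] is Prop.EXTEND:
            j -= 1
        if j >= 0 and props[j] is Prop.EXT_PICT:
            return False
    # GB999: otherwise, break.
    return True


def segment(text: str, proposed: bool = False) -> list[str]:
    """Split text into toy grapheme clusters."""
    props = [classify(ch) for ch in text]
    clusters = []
    start = 0
    for i in range(1, len(text)):
        if is_boundary(props, i, proposed):
            clusters.append(text[start:i])
            start = i
    if text:
        clusters.append(text[start:])
    return clusters


if __name__ == "__main__":
    arabic = "\u0628\u200d\U0001F680"                 # ARABIC LETTER BEH, ZWJ, ROCKET
    astronaut = "\U0001F468\U0001F3FD\u200d\U0001F680"  # MAN, skin tone, ZWJ, ROCKET
    print(len(segment(arabic)))                     # 2: draft GB11 breaks before the emoji
    print(len(segment(arabic, proposed=True)))      # 1: proposed rule joins them
    print(len(segment(astronaut)))                  # 1 under either rule
    print(len(segment("a\U0001F3FD")))              # 1: modifier treated as Extend
```

The sketch shows both sides of the argument: the proposed rule only ever inspects the immediately preceding character (Manish's "local information" point), while it also glues the <arabic letter, zwj, emoji> sequence into one cluster, which is exactly the case Mark says the tightened draft was meant to prevent. Because the modifier is folded into Extend, the draft rule needs no separate `E_Modifier?` term.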