On 10/24/19 9:40 PM, Kim Alvefur wrote:
>> We should refrain from using things like grapheme clusters in wire formats,
>> as those are subject to changes in upcoming Unicode versions and thus the
>> wire format would be understood differently depending on the Unicode version
>> implemented by the client.
> Doesn't this also depend on the font?
If the font does not support certain grapheme clusters, a single cluster
may be rendered as multiple glyphs (for example, 🤦‍♂️ may be rendered as
🤦 and ♂️). The font rendering toolkit may still know that this has been
a single grapheme since Emoji 4.0 and treat it as one unit when selecting
text (for copy and paste, i.e. it will not allow copying only the ♂️). If
the rendering toolkit does allow selecting only a part of this grapheme
cluster, and the user does so and instructs the client to make the
selected text a reference, things get interesting again: by Unicode's
counting, you'd be in the middle of a character, so it would not be
possible to actually do what the user instructed. The font may therefore
be relevant for various UI/UX matters, and developers need to be aware of
this when allowing the user to input text.
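
To make the counting problem concrete, here is a minimal sketch in Python
(standard library only). The names FACEPALM_MAN, count_units and
split_at_zwj are made up for this mail and not taken from any XEP or
library. It only shows how the same emoji sequence has a different length
depending on the unit used, and how a toolkit without the combined glyph
could fall back to two separately selectable pieces.

```python
# "Man facepalming" is a ZWJ sequence made of four code points:
# U+1F926 FACE PALM, U+200D ZERO WIDTH JOINER,
# U+2642 MALE SIGN, U+FE0F VARIATION SELECTOR-16.
FACEPALM_MAN = "\U0001F926\u200D\u2642\uFE0F"


def count_units(text):
    """Return the length of text in three common counting units."""
    return {
        "code_points": len(text),  # Python str is indexed by code point
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


def split_at_zwj(text):
    """Split a ZWJ sequence into the parts a font may render separately."""
    return text.split("\u200D")


if __name__ == "__main__":
    print(count_units(FACEPALM_MAN))
    print(split_at_zwj(FACEPALM_MAN))
```

A renderer that knows the sequence shows one grapheme; the counts above
are 4 code points, 5 UTF-16 code units and 13 UTF-8 bytes. A selection
covering only the second part returned by split_at_zwj is the "middle of
a character" case described above.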
For output, the font is not relevant: it doesn't matter whether the
reference link ends up displayed as a single grapheme or as two
graphemes because the font does not support that single grapheme from
the newer Unicode version. Of course, if the toolkit wants you to give
highlight instructions in displayed graphemes, you'd have to deal with
that, but I hope there is no toolkit doing that...
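
A more common case than "displayed graphemes" is a toolkit that indexes
text in UTF-16 code units. As a rough sketch, assuming the wire format
counts code points (an assumption for this example, not something the
thread has settled), the conversion is font-independent. The function
name code_point_to_utf16_offset is hypothetical.

```python
def code_point_to_utf16_offset(text, index):
    """Convert a code point offset into a UTF-16 code unit offset.

    Assumes index counts code points from the start of text.
    Code points above U+FFFF take two UTF-16 code units each.
    """
    prefix = text[:index]
    return len(prefix.encode("utf-16-le")) // 2


if __name__ == "__main__":
    message = "a\U0001F926\u200D\u2642\uFE0Fb"
    # Offset of the trailing "b": 5 code points in, 6 UTF-16 units in.
    print(code_point_to_utf16_offset(message, 5))
```

The result depends only on the text itself, not on which glyphs the font
can draw, which is the point made above about output.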
Does it make sense to do an Informational XEP for Unicode handling in XEPs?