On Sun, Oct 20, 2019, at 18:39, JC Brand wrote:
> You don't need tons of ways, you can just follow the instructions. If
> the sending client is buggy, then this will become clear over time.

"Following the instructions" may mean different things to different
clients in this case. One might treat it as an error, one might display
it and break up the flag emoji, etc. This is not ideal.

> Yes, you just render the two letters separately given that this is
> what's implied by the information you've been given and it's also a
> legitimate use-case.

Assuming this is the desired behavior and we can actually do this: now
that they've been rendered separately, what happens if the receiving
client copies and pastes the message? If the highlight is not included,
or just becomes plain text, does the flag emoji get rejoined, so that
the copy/pasted message is different from the original? This doesn't
seem ideal.
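
To make the copy/paste concern concrete, here is a rough sketch in
Python. The split_at helper and the sample message are made up for
illustration and are not taken from any client or from the proposal; a
Python str is indexed by code point, which is what I'm assuming the
offsets count.

```python
# U+1F1FA U+1F1F8: two regional indicator symbols that normally render
# as a single flag glyph.
flag = "\U0001F1FA\U0001F1F8"
message = "hi " + flag

def split_at(text, offset):
    """Split text at a code point offset (a hypothetical highlight boundary)."""
    return text[:offset], text[offset:]

# Offset 4 falls between the two regional indicators.
before, after = split_at(message, 4)

# A plain-text copy drops the highlight and just concatenates the pieces,
# so it has exactly the code points that were sent.
copied = before + after
print(copied == message)
```

The copy carries the same code points as the sent message, so it would
presumably render as a joined flag again, which is not what the
receiving user saw when the letters were shown separately.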

> > What if it's between something and a zero-width joiner that would
> > join it to another glyph, does that split that and now you have a
> > dangling joiner?
>
> This is as clearly an error as setting an offset in the middle of a
> UTF-8 encoding.

Perhaps. Now we just have to enumerate all the other ways that Unicode
handles things like this, and make sure all clients handle them the same
way. This would of course also be a problem if we were using bytes, for
example, but the point is that it's not as simple as saying "these
things are errors and these aren't". There are different ways to handle
these, and Unicode has a lot of edge cases we likely won't think of.
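
As one example of such a case, this sketch walks every code point
offset in a zero-width joiner sequence and reports which ones would
leave a joiner dangling on one side. The touches_joiner helper is
hypothetical, and it only covers this single case; it is not a complete
check for the other ways Unicode combines code points.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER, usually drawn as one glyph.
sequence = "\U0001F469" + ZWJ + "\U0001F4BB"

def touches_joiner(text, offset):
    """True if a code point offset sits directly before or after a ZWJ."""
    before = offset > 0 and text[offset - 1] == ZWJ
    after = offset < len(text) and text[offset] == ZWJ
    return before or after

for offset in range(len(sequence) + 1):
    left, right = sequence[:offset], sequence[offset:]
    # List the character names on each side of the split.
    names = [[unicodedata.name(c) for c in part] for part in (left, right)]
    print(offset, touches_joiner(sequence, offset), names)
```

Offsets 1 and 2 both leave the joiner at the edge of one half. Whether
a client treats that as an error, drops the joiner, or renders it
anyway is exactly the kind of thing that would need to be specified.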

> > From a code perspective does this mean that highlighting always has
> > to integrate with the text rendering engine? This seems like a
> > *major* downside to me, as it likely makes the code much more
> > complicated, and we may or may not even have the ability to
> > manipulate how the text rendering engine handles things.
>
> It's not clear to me why you think highlighting will necessarily
> require integration with the rendering engine. It should be possible
> to identify unicode codepoints in a string independent of any
> rendering engine.

How do you propose breaking up a flag emoji, for example? We have to
have a way to tell the text rendering engine "don't render this flag,
show the letters". We could probably include a zero-width space or
something between the letters, but now when someone copy/pastes the
message they are copying characters that weren't part of what the sender
actually typed, which doesn't feel great.
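
A small sketch of that workaround, to show the cost. The break_flag
helper is hypothetical, and whether a given rendering engine actually
stops drawing the flag when a zero-width space is inserted is an
assumption I have not verified; the only thing shown here is that the
displayed text no longer matches what was sent.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

def break_flag(text, offset):
    """Insert a zero-width space at a code point offset (hypothetical workaround)."""
    return text[:offset] + ZWSP + text[offset:]

sent = "\U0001F1FA\U0001F1F8"   # two regional indicators, as typed
shown = break_flag(sent, 1)     # what the client would hand to the renderer

# Copying the displayed text picks up the extra, invisible code point.
print(len(sent), len(shown), shown == sent)
```

A client could strip the inserted character again on copy, but then
every client needs the same bookkeeping to map displayed text back to
the original.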

—Sam

-- 
Sam Whited
