Hello Rob,

(Sorry about top-posting, but the mail is getting a bit long. I'm leaving it just in case somebody wants to check the context.)

You are correct that Persian/Farsi in particular requires U+200C ZERO WIDTH NON-JOINER (ZWNJ).

But this character, and the 'reverse' U+200D ZERO WIDTH JOINER (ZWJ), are not excluded in any of the sets defined in RFC 9839. If I'm wrong about this, please point out where in RFC 9839 they are excluded.

RFC 9839 references RFC 8264 (PRECIS), where the above two characters are in the JoinControl (H) category, defined in Section 9.8.
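
For anyone who wants to check this mechanically, here is a minimal sketch in Python. The range table is my own transcription of the "Unicode Assignables" subset from RFC 9839 (the most restrictive of its three sets), so please verify it against the RFC before relying on it; the names ASSIGNABLE_RANGES and is_unicode_assignable are made up for this illustration.

```python
import unicodedata

# "Unicode Assignables" as I read RFC 9839 (an assumption, check the RFC):
# useful controls, printable ASCII, then everything from U+00A0 upwards
# except surrogates and noncharacters.
ASSIGNABLE_RANGES = [
    (0x09, 0x09), (0x0A, 0x0A), (0x0D, 0x0D),  # TAB, LF, CR
    (0x20, 0x7E),                              # excludes other C0 controls and DEL
    (0xA0, 0xD7FF),                            # excludes C1 controls and surrogates
    (0xE000, 0xFDCF),                          # stops before U+FDD0..U+FDEF
    (0xFDF0, 0xFFFD),                          # excludes U+FFFE and U+FFFF
] + [
    # Planes 1 to 16, each without its last two code points (noncharacters).
    (plane << 16, (plane << 16) | 0xFFFD) for plane in range(1, 17)
]


def is_unicode_assignable(cp: int) -> bool:
    """Return True if code point cp falls inside one of the ranges above."""
    return any(lo <= cp <= hi for lo, hi in ASSIGNABLE_RANGES)


for cp in (0x200C, 0x200D):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch),
          is_unicode_assignable(cp))
```

Both ZWNJ and ZWJ have general category Cf (format), which may be why they are easily mistaken for excluded control characters, but under the ranges above they are inside even the most restrictive set.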

Regards,    Martin.

On 2025-11-03 11:25, Rob Sayre wrote:
On Sun, Nov 2, 2025 at 6:10 PM Martin J. Dürst <[email protected]>
wrote:

Hello Rob, others,

On 2025-11-03 08:28, Rob Sayre wrote:


On 11/2/25 5:03 AM, Pete Resnick wrote:
On 31 Oct 2025, at 7:57, Martin J. Dürst wrote:

On 2025-10-29 09:33, Paul Hoffman wrote:

On Oct 28, 2025, at 01:35, Martin J. Dürst <[email protected]>
wrote:

Content, major: Section 3: "There are many Unicode characters that
obviously cannot be displayed (such as control characters), and
many whose ability to be displayed is debatable.": It's unclear
what "many whose ability to be displayed is debatable." means. I'd
guess it refers to scripts and characters standardized recently,
for which font support is still thin. If that's what is meant,
please say so; if something else is meant, please make clear what
that is.

There is a wide variety of things that can be debatable. Are
combining characters like U+0315 (COMBINING COMMA ABOVE RIGHT)
displayable? What about non-spacing marks like U+0650 (ARABIC
KASRA)? I am sure people would take each side of the debate ("I can
see the symbol printed in the Unicode Standard" vs. "I can't see
that code point on my laptop even though it has quite a complete
font set" and so on).

On any decent browser, these should display without problems. When it
comes to editors, shells, and the like, the field is much wider, so
there are no absolute guarantees. But these have been in Unicode since
Unicode 1.0 or so, so I would expect these to show.

I will leave it to you and Paul to replace "debatable" with something
clearer.



Hi,

There is an entire RFC about this, which Paul co-wrote.

https://www.rfc-editor.org/rfc/rfc9839.html

Last time I checked, none of the characters excluded in any of the sets
defined in RFC 9839 had any chance whatsoever to turn up in names of
people or companies or places.


What you may be missing is that social networks have character counts,
and they sure do go after these issues.

These systems do in fact count a "family" as one character, not
multiples with ZWJs. Once you understand that, it gets a little cleaner.

I know. At a Unicode Conference many years back, I learned (directly
from the person who initiated that change) that Twitter had switched
from counting bytes to counting code points, which was the first step in
that direction.

But we are currently not looking at writing policy about length
restrictions, so I think this is irrelevant. [It's also irrelevant
because of the low (=zero?) likelihood of somebody having a family
emoji, or any emoji for that matter, in their name.]


You need them in Arabic and Persian (not even the correct name there, but
let's carry on).

https://www.w3.org/TR/2025/DNOTE-alreq-20251002/

Here, we can go for

4.3.4.1 Disjoining Enforcement

or

4.3.4.2 Joining Enforcement

or

4.3.4.3 Joining-Disjoining Enforcement

I am pretty sure you know this stuff, but most others probably don't.

We could use this last name:

علی‌رضا‎ (Ali‌Reza)
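
To make the invisible character in that name visible, here is a small sketch in Python. The string is spelled out with escapes so that the ZWNJ cannot get lost in transit; the exact code point sequence is my assumption about how the name is usually encoded, and describe and strip_join_controls are hypothetical helper names.

```python
import unicodedata

# Assumed spelling: ain, lam, Farsi yeh, ZWNJ, reh, dad, alef.
NAME = "\u0639\u0644\u06CC\u200C\u0631\u0636\u0627"


def describe(text: str) -> list[tuple[str, str]]:
    """List each code point of text together with its Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]


def strip_join_controls(text: str) -> str:
    """Drop ZWNJ and ZWJ, as an over-eager filter might do."""
    return "".join(ch for ch in text if ch not in "\u200C\u200D")


for code_point, char_name in describe(NAME):
    print(code_point, char_name)

# The filtered string is one code point shorter and no longer the same name.
print(len(NAME), len(strip_join_controls(NAME)))
```

Without the ZWNJ, the yeh would be expected to join to the following reh, so the filtered string should render differently from the intended spelling.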

  thanks,
Rob


--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]
