On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote:
On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote:
On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
Is there a block of RTL PUA also?
No.
Perhaps there should be?

This is a periodic suggestion that never goes anywhere--for good reason. (You can search the email archives and see that it keeps coming up.)

Presuming that this question was asked in good faith...


What about designating a part of the PUA to have a specific property?

The problem with that is that assigning *any* non-default property to any PUA code point would break existing implementations' assumptions about PUA character properties and potentially create havoc with existing use.

Only certain properties matter enough:

That is an un-demonstrated assertion that I don't think you have thought through sufficiently.

* wide
* RTL

RTL is not some binary counterpart of LTR. There are 23 values of Bidi_Class, and anyone who wanted to implement a right-to-left script in PUA might well have to make use of multiple values of Bidi_Class. Also, there are two major types of strong right-to-leftness: Bidi_Class=R and Bidi_Class=AL. Should a "RTL PUA" zone favor Arabic type behavior or non-Arabic type behavior?
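
To make the Bidi_Class point concrete, here is a minimal sketch using Python's standard `unicodedata` module. The sample characters are chosen only for illustration, and the values printed depend on the Unicode data bundled with the Python build. It shows two distinct strong right-to-left classes, a separate class for Arabic-Indic digits, and the default class carried by private-use code points.

```python
import unicodedata

# Illustrative sample characters; the labels are just dictionary keys.
samples = {
    "HEBREW LETTER ALEF": "\u05D0",
    "ARABIC LETTER ALEF": "\u0627",
    "ARABIC-INDIC DIGIT ZERO": "\u0660",
    "BMP PUA U+E000": "\uE000",
    "Plane 15 PUA U+F0000": "\U000F0000",
}

# Look up the Bidi_Class short alias for each sample.
bidi = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, value in bidi.items():
    print(f"{label}: Bidi_Class={value}")
```

A right-to-left script typically mixes several of these classes (letters, digits, marks, punctuation), which is why a single "RTL" flag on a PUA zone would not be enough.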

* combining

Also not a binary switch. Canonical_Combining_Class is a numeric value, and any value but ccc=0 for a PUA character would break normalization. Then for the General_Category, there are three types of "marks" that count as combining: gc=Mn, gc=Mc, gc=Me. Which of those would be favored in any PUA assignment?
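
As a rough illustration, again with Python's `unicodedata` and arbitrarily chosen sample marks, the sketch below prints the General_Category and Canonical_Combining_Class of one mark of each type, then shows how normalization treats a ccc=0 private-use character as a starter that blocks reordering of the marks around it.

```python
import unicodedata

# One example of each mark type, plus a private-use code point.
marks = {
    "U+0301 COMBINING ACUTE ACCENT": "\u0301",
    "U+0903 DEVANAGARI SIGN VISARGA": "\u0903",
    "U+20DD COMBINING ENCLOSING CIRCLE": "\u20DD",
    "U+E000 (PUA)": "\uE000",
}

# Map each label to (General_Category, Canonical_Combining_Class).
props = {
    label: (unicodedata.category(ch), unicodedata.combining(ch))
    for label, ch in marks.items()
}
for label, (gc, ccc) in props.items():
    print(f"{label}: gc={gc} ccc={ccc}")

# Canonical ordering sorts adjacent marks by ccc: dot below (220)
# moves in front of acute (230).
reordered = unicodedata.normalize("NFD", "a\u0301\u0323")

# With a ccc=0 PUA character between them, the marks are not reordered.
with_pua = unicodedata.normalize("NFD", "a\u0301\uE000\u0323")

print([f"U+{ord(c):04X}" for c in reordered])
print([f"U+{ord(c):04X}" for c in with_pua])
```

Deployed normalizers have this ccc=0 behavior for PUA built in, so a non-zero value would make the same text normalize differently across implementations.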

as most others are better represented in the font itself.

Really? Suppose someone wants to implement a bicameral script in PUA. They would need case mappings for that, and how would those be "better represented in the font itself"? Or how about digits? Would numeric values for digits be "better represented in the font itself"? How about implementation of punctuation? Would segmentation properties and behavior be "better represented in the font itself"?
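
A small sketch of what a stock library reports for a private-use code point today, assuming Python's built-in string methods and `unicodedata` as the "system support library". None of these answers come from a font: there are no case mappings and no numeric values, and the character is not classified as alphabetic.

```python
import unicodedata

pua = "\uE000"  # any private-use code point would do for this check

# Case mapping: upper(), lower(), and casefold() all leave the character alone.
case_unchanged = (
    pua.upper() == pua and pua.lower() == pua and pua.casefold() == pua
)

# Numeric properties: no Numeric_Value and no decimal digit value.
numeric_value = unicodedata.numeric(pua, None)
decimal_value = unicodedata.decimal(pua, None)

# Classification used by text-processing code.
is_alpha = pua.isalpha()
is_digit = pua.isdigit()

print(case_unchanged, numeric_value, decimal_value, is_alpha, is_digit)
```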


This could be done either by parceling one of existing PUA ranges: planes 15 and 16 are virtually unused thus any damage would be negligible;

That is simply an assertion -- and not the kind of assertion that the UTC tends to accept on spec. I rather suspect that there are multiple participants on this email list, for example, who *do* have implementations making extensive use of Planes 15/16 PUA code points for one thing or another.

or perhaps by allocating a new range elsewhere.

See:

https://www.unicode.org/policies/stability_policy.html

The General_Category property value Private_Use (Co) is immutable: the set of code points with that value will never change.

That guarantee has been in place since 1996, and is a rule that binds the UTC. So nope, sorry, no more PUA ranges.
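
For reference, the fixed set of gc=Co code points can be enumerated with a short sketch like the one below. It assumes Python's `unicodedata`; the helper name `private_use_ranges` is made up for this example. Because of the stability guarantee, the result should be the same for any Unicode version the interpreter ships with.

```python
import unicodedata

def private_use_ranges():
    """Return contiguous (first, last) ranges of code points with gc=Co."""
    ranges = []
    start = None
    for cp in range(0x110000):
        is_co = unicodedata.category(chr(cp)) == "Co"
        if is_co and start is None:
            start = cp                      # a new run begins
        elif not is_co and start is not None:
            ranges.append((start, cp - 1))  # the run ended on the previous cp
            start = None
    if start is not None:
        ranges.append((start, 0x10FFFF))
    return ranges

ranges = private_use_ranges()
total = sum(last - first + 1 for first, last in ranges)

for first, last in ranges:
    print(f"U+{first:04X}..U+{last:04X}")
print("total private-use code points:", total)
```

The three ranges are the BMP Private Use Area and the private-use portions of Planes 15 and 16, each plane ending at xFFFD because the last two code points of every plane are noncharacters.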
Meow!

Grrr! ;-)

As I see it, the only feasible way for people to get specialized behavior for PUA ranges involves first ceasing to assume that somehow they can jawbone the UTC into *standardizing* some ranges for some particular use or another. That simply isn't going to happen. People who assume this is somehow easy, and that the UTC are a bunch of boneheads who stand in the way of obvious solutions, do not -- I contend -- understand the complicated interplay of character properties, stability guarantees, and implementation behavior baked into system support libraries for the Unicode Standard.

The way forward for folks who want to do this kind of thing is:

1. Define a *protocol* for reliable interchange of custom character property information about PUA code points.

2. Convince more than one party to actually *use* that protocol to define sets of interchangeable character property definitions.

3. Convince at least one implementer to support that protocol to create some relevant interchangeable *behavior* for those PUA characters.

And if the goal for #3 is to get some *system* implementer to support the protocol in widespread software, then before starting any of #1, #2, or #3, you had better start instead with:

0. Create a consortium (or other ongoing organization) with a 10-year time horizon and participation by at least one major software implementer, to define, publicize, and advocate for support of the protocol. (And if you expect a major software implementer to participate, you might need to make sure you have a business case defined that would warrant such a 10-year effort!)
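
Purely as an illustration of what step 1 could look like, here is a minimal sketch of a property-override sheet and a lookup that falls back to the standard defaults. Everything in it is hypothetical: the JSON layout, the field names (`agreement`, `first`, `last`, `gc`, `bc`), and the functions `load_sheet` and `lookup` are invented for this example and do not correspond to any existing protocol or library.

```python
import json
import unicodedata
from dataclasses import dataclass

# Hypothetical interchange document: property overrides for two PUA ranges.
SHEET = """
{
  "agreement": "example-rtl-script-v1",
  "ranges": [
    {"first": "E000", "last": "E01F", "gc": "Lo", "bc": "R"},
    {"first": "E020", "last": "E02F", "gc": "Mn", "bc": "NSM"}
  ]
}
"""

@dataclass(frozen=True)
class Override:
    first: int
    last: int
    gc: str  # General_Category override
    bc: str  # Bidi_Class override

def load_sheet(text):
    """Parse a sheet, rejecting any range that is not entirely private-use."""
    overrides = []
    for entry in json.loads(text)["ranges"]:
        first, last = int(entry["first"], 16), int(entry["last"], 16)
        if first > last:
            raise ValueError("range is reversed")
        for cp in range(first, last + 1):
            if unicodedata.category(chr(cp)) != "Co":
                raise ValueError(f"U+{cp:04X} is not a private-use code point")
        overrides.append(Override(first, last, entry["gc"], entry["bc"]))
    return overrides

def lookup(cp, overrides):
    """Return (gc, bc) from the sheet, else the standard default values."""
    for o in overrides:
        if o.first <= cp <= o.last:
            return o.gc, o.bc
    ch = chr(cp)
    return unicodedata.category(ch), unicodedata.bidirectional(ch)

overrides = load_sheet(SHEET)
print(lookup(0xE005, overrides))  # covered by the first override range
print(lookup(0xE100, overrides))  # not covered: standard defaults apply
```

The sketch deliberately leaves Canonical_Combining_Class out of the sheet, since normalizers assume ccc=0 for private-use characters. Parsing such a file is the easy part; steps 2 and 3 (getting other parties and implementers to honor it) are where the real work lies.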

--Ken
