On 30/03/2004 16:30, Kenneth Whistler wrote:

...

Uh, sorry, Peter, but the implications here are so much b...., err, ...
baloney.

The majority of the world's scripts are left-to-right. They also
happen to be non-Western. There are more *Indic* scripts encoded
in the Unicode Standard than *Western* scripts.

The majority of *entities* that the majority of users put into
PUA characters in actual application usage are unencoded CJK
ideograph variants and symbols from Asian code pages. It was
primarily the need to accommodate those *Eastern* users that drove
the setting of default values for the PUA.
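The defaults under discussion can be inspected directly. This is a minimal sketch. It assumes a Python 3 interpreter whose bundled `unicodedata` tables follow UnicodeData.txt, where every Private Use code point has General_Category `Co` and default Bidi_Class `L`. The sample code points are arbitrary picks from the BMP PUA (U+E000..U+F8FF) and the plane 15 and 16 PUAs, and `pua_defaults` is a hypothetical helper name.

```python
import unicodedata

# Arbitrary sample code points from the BMP PUA (U+E000..U+F8FF)
# and the supplementary PUAs in planes 15 and 16.
PUA_SAMPLES = [0xE000, 0xF8FF, 0xF0000, 0x10FFFD]

def pua_defaults(cp):
    """Return (general category, default bidi class) for a code point."""
    ch = chr(cp)
    return unicodedata.category(ch), unicodedata.bidirectional(ch)

for cp in PUA_SAMPLES:
    cat, bidi = pua_defaults(cp)
    print(f"U+{cp:04X}: category={cat} bidi={bidi}")
```

Every sample reports `Co` and `L`. By contrast, an encoded Hebrew letter such as U+05D0 reports `R`, which is the property a PUA-based RTL script would need.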



OK, in that case let's allocate properties to PUA characters in proportion to the number of RTL vs LTR scripts, and the proportion of combining marks vs. base characters, in actual encoded scripts. The majority of PUA characters are unchanged. A significant minority become RTL or non-spacing.

A lot of effort has gone into accommodating certain *Eastern* users. Something like 100,000 CJK characters have already been defined, yet even that is not enough, and they have requisitioned two more planes of PUA with LTR properties. Fair enough if they might be needed. But what if users of certain other scripts, e.g. RTL scripts, want just a handful of PUA characters with the properties they need? Why is preference given to CJK? This sounds like bias to me, even if I was wrong to call it Western.

This bias is also reflected in their system software, which (as far as I know, without exception) does not allow users to specify properties for PUA characters other than the defaults decided by the UTC.



Bias? Or business sense?


If you want some specialized behavior for software, you either
write it yourself, or pay someone to write it, or convince someone
else that adding such a feature to the software *they* write
will pay for the investment cost in terms of incremental
increased sales.

You may not like how the software industry works, but thems
the breaks for any mature industry.



Well, I don't quite see why it is business sense for software companies to support the huge PUAs for variant CJK characters, outside the 100,000 or so already defined by Unicode. I do understand that it is business sense not to support user specification of properties, because that would be hard work for little or no gain.


...

Scenario: The UTC listens to you and defines some section of the PUA
as strong right-to-left by default for use in PUA-defined bidirectional
scripts. Somebody else is *already* using that section of the PUA
for something else. Now they have an interoperability problem,
because the default behavior they were depending on changes over
in some future version of some software, not under their control,
and their data gets munged by bidi.



Well, they weren't supposed to rely on these default properties anyway, they were supposed to use the PUA at their own risk. They are not the only ones who are messed up by features of software which is not under their control. But it might be preferable in practice to define an additional PUA with RTL properties and one with default ignorable properties, outside all of the existing PUAs. I am not asking for a large space; very likely 256 characters of each type would be more than adequate.


This is the kind of stuff the UTC refuses to start up by trying
to provide some subdivision of semantics in the PUA. *That* is
the principle, by the way, which guides the UTC position on
the PUA: Use at your own risk, by private agreement.



What we do want is compatibility between our applications and the system software, and this proposal is the way to do that.



I don't see how any proposal to create some particular behavior in the PUA is a way to accomplish that.



If a new PUA is created with default RTL properties, one can expect that system software will soon support it, at least to the extent of treating these characters as RTL for the bidi algorithm and similar purposes. The same applies to a default ignorable PUA.



...

A default value for a property is not a requirement by the UTC
*ON AN IMPLEMENTER* that they use that value. They can use whatever
property values they desire, but if they depart from what system
platforms provide them (by default) then they are buying themselves
an implementation task to get characters to do what they want.
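The implementation task described above can be sketched in Python. In this sketch, an application keeps its own table of private-agreement property values and consults that table before falling back to the Unicode default. The table name `PRIVATE_BIDI_OVERRIDES`, the ranges it covers and the helper `effective_bidi_class` are all hypothetical and exist only for illustration. They are not assigned by Unicode or by any standard. Such a table only affects code the application itself runs. System-level bidi and rendering would still see the default `L`, which is the gap the rest of this thread is arguing about.

```python
import unicodedata

# Hypothetical private agreement: these ranges and values are
# illustrative only, not assigned by Unicode or any standard.
PRIVATE_BIDI_OVERRIDES = [
    (0xE000, 0xE0FF, "R"),    # hypothetical private RTL letters
    (0xE100, 0xE1FF, "NSM"),  # hypothetical private nonspacing marks
]

def effective_bidi_class(ch, overrides=PRIVATE_BIDI_OVERRIDES):
    """Bidi class the application would use: its private-agreement
    value if a range covers ch, otherwise the Unicode default."""
    cp = ord(ch)
    for lo, hi, bidi in overrides:
        if lo <= cp <= hi:
            return bidi
    return unicodedata.bidirectional(ch)
```

For example, `effective_bidi_class('\ue010')` gives `'R'`, while `unicodedata.bidirectional('\ue010')` still gives `'L'`. Code points outside the table, such as U+E200, keep their default.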



Ken, you are a master of understatement. The task they are buying themselves is a rewrite of the whole system. Companies don't provide the details needed for others to customise individual modules, and it would probably be a breach of copyright etc. to attempt to do so. Open Source is different here, of course.




-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/



