Re: What is the principle?

Kenneth Whistler Tue, 30 Mar 2004 17:19:56 -0800

Peter Kirk continued:

> >A user of PUA characters is free to define the
> >whole range of PUA characters as consisting of strong R-to-L
> >characters and implementing accordingly. ...
> >
> 
> This is not true!


It is true!!

> Users can define only those properties which the 
> software that they are using allows them to define. Your argument here 
> completely ignores the distinction between users and software 
> developers. 

No it doesn't. I am well aware of the distinctions between end
users, application developers, OS platform developers, and
basic library implementers. I have, at one point or another,
been in all of those shoes.

The mistake you (and some others on this thread) are making
is assuming that PUA characters were added to the standard
with some kind of implicit guarantee that end users could
define whatever they wanted there and that operating systems
would somehow magically supply appropriate rendering and
other behavior for them.

People *can* define whatever they want in PUA characters, but
if they expect something other than very dumb rendering and
collation behavior to be provided by some other system, they
are fooling themselves and each other. To do that kind of thing,
you need to *also* do the work to *implement* that behavior
for your definition.
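For concreteness, the "very dumb" defaults are easy to inspect with Python's standard `unicodedata` module, which ships a copy of the Unicode Character Database. This is only an illustrative sketch; the helper name `pua_defaults` is made up for the example.

```python
import unicodedata


def pua_defaults(ch: str) -> tuple[str, str, int]:
    """Return (general category, bidi class, combining class) for ch,
    as recorded in the Unicode Character Database bundled with Python."""
    return (
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        unicodedata.combining(ch),
    )


if __name__ == "__main__":
    # A few code points from the BMP Private Use Area (U+E000..U+F8FF).
    for ch in ["\ue000", "\uf000", "\uf8ff"]:
        print(f"U+{ord(ch):04X}", *pua_defaults(ch))
```

Every Private Use code point reports the same values (general category Co, bidi class L, combining class 0), and nothing in the database says what any of them *means*. That part is entirely up to the private agreement and to whatever software is written to implement it.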

> You may have the luxury of being able to do both. But the 
> vast majority of users depend on the software systems and applications 
> provided by large corporate software companies. (Software written by 
> smaller companies generally uses rendering engines, character processing 
> etc provided by the large companies.) 

Of course. And nobody expects some individual or even some small
company to be able to duplicate the entire Windows OS just in
order to implement Tifinagh (or whatever) in PUA characters and
have a Tifinagh-smart version of a word processing / typesetting
system come rolling out of the garage for fine publications.

That's the *REASON*, by the way, that the Unicode new scripts
committee (and WG2) has the extensive roadmap of additional
scripts to be encoded. We assume that the best way to get standard
behavior out of standard software for obscure scripts is to
*standardize* the character encoding for those scripts and keep
pushing the big software companies to update their support for
the latest additions to the standard. This works *much* better
than futzing around with attempting to get custom behavior for
complex scripts out of PUA characters.

> These large companies are mostly 
> members of the Unicode consortium. They are also overwhelmingly western, 
> mostly American, and so inherently biased in favour of LTR scripts 
> without combining marks. This bias is reflected in the "default" 
> properties assigned to PUA characters, by their majority vote, and their 
> refusal to contemplate changes. 

Uh, sorry, Peter, but the implications here are so much b...., err, ...
baloney.

The majority of the world's scripts are left-to-right. They also
happen to be non-Western. There are more *Indic* scripts encoded
in the Unicode Standard than *Western* scripts.

The majority of *entities* that the majority of users put into
PUA characters in actual application usage are unencoded CJK
ideograph variants and symbols from Asian code pages. It was
primarily the need to accommodate those *Eastern* users that drove
the setting of default values for the PUA.

> This bias is also reflected in their 
> system software which (as far as I know with no exceptions) does not 
> allow users to specify properties for PUA characters other than the 
> default decided by the UTC.

Bias? Or business sense?

If you want some specialized behavior for software, you either
write it yourself, or pay someone to write it, or convince someone
else that adding such a feature to the software *they* write
will pay for the investment cost in terms of incremental
increased sales.

You may not like how the software industry works, but them's
the breaks for any mature industry.

You may also want to drive a 3-wheeled car that runs on solar
power. But if you want one, you'll probably have to build it
yourself, because it is unlikely that you'll get GM or Ford
or Toyota or Honda or Nissan or Daimler-Chrysler to do it
for you.

> At least you understand the problem which totally undermines your 
> argument here.

*scratches head*

> >You can do it privately. See above. But attempting to do such things
> >in terms of formally specified usages of the PUA is an invitation
> >to failure of interoperability.

> I don't understand this last comment. 

Scenario: The UTC listens to you and defines some section of the PUA
as strong right-to-left by default for use in PUA-defined bidirectional
scripts. Somebody else is *already* using that section of the PUA
for something else. Now they have an interoperability problem,
because the default behavior they were depending on changes over
in some future version of some software, not under their control,
and their data gets munged by bidi.

This is the kind of stuff the UTC refuses to start up by trying
to provide some subdivision of semantics in the PUA. *That* is
the principle, by the way, which guides the UTC position on
the PUA: Use at your own risk, by private agreement.

> What 
> we do want is compatibility between our applications and the system 
> software, and this proposal is the way to do that.

I don't see how any proposal to create some particular behavior
in the PUA is a way to accomplish that.


> >Nope. You're wrong. A default value for a property is not a
> >requirement by the UTC regarding what a PUA character can or may
> >or must be used for.
> >
> Yes. If a default value is not a requirement, then a CHANGE to a default 
> value is not a requirement. You have no good reason not to make a change 
> to the default value for some PUA characters.

Huh? The UTC has every reason not to make any change in the default
values for any PUA characters. (See above.)

A default value for a property is not a requirement by the UTC
*ON AN IMPLEMENTER* that they use that value. They can use whatever
property values they desire, but if they depart from what system
platforms provide them (by default) then they are buying themselves
an implementation task to get characters to do what they want.
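As a rough sketch of what "buying yourself an implementation task" looks like, an application can consult its own override table before falling back to the standard default. The range U+E000..U+E0FF and the names `PRIVATE_BIDI_OVERRIDES` and `bidi_class` are hypothetical, chosen only for illustration; they do not correspond to any real convention.

```python
import unicodedata

# Hypothetical private agreement: treat U+E000..U+E0FF as strong
# right-to-left ("R"). The range is purely illustrative.
PRIVATE_BIDI_OVERRIDES = [
    (0xE000, 0xE0FF, "R"),
]


def bidi_class(ch: str) -> str:
    """Return the bidi class this application uses for ch.

    Overrides apply only inside this application; any character not
    covered falls back to the default from the Unicode data.
    """
    cp = ord(ch)
    for lo, hi, value in PRIVATE_BIDI_OVERRIDES:
        if lo <= cp <= hi:
            return value
    return unicodedata.bidirectional(ch)


if __name__ == "__main__":
    # Compare the application's view with the system default.
    for ch in ["\ue000", "\ue100", "a", "\u05d0"]:
        print(f"U+{ord(ch):04X}", bidi_class(ch), unicodedata.bidirectional(ch))
```

The override is visible only to code that calls `bidi_class`. The platform's own bidi implementation, and every other program that touches the same text, still sees class L, so a real implementation would also need its own reordering and rendering on top of this lookup.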

> I see the point about not proliferating separate PUA spaces. But that is 
> the only argument I see on your side. Perhaps the UTC will be less dead 
> set against this if the arguments are realised, and perhaps if the few 
> non-western UTC members realise how the process is biased against the 
> languages of their countries.

This is more utter baloney, I'm afraid. The UTC has done more to
bring non-western writing systems under the big tent of modern
software development and global IT infrastructure than any 6
other standardization organizations you could name, combined.

--Ken
