On Sun, 21 Aug 2011 01:44:02 +0000 "Doug Ewell" <d...@ewellic.org> wrote:
>> The more I think of it, the more I like the idea of reassigning the
>> default BC of Plane 16 to 'R'. What would the arguments against this
>> be?

>> BC of 'AL'?

> Would that really be a better default? I thought the main RTL needs
> for the PUA would be for unencoded scripts, not for even more Arabic
> letters. (How many more are there anyway?)

Not necessarily better, I'm just suggesting that both need to be supported. However, we need to look at use cases.

(1) Unencoded Arabic script letters with joining behaviour, for use with any application.

(a) We need the character to have AL, R or ON for it to be included in BiDi runs. If we use ON we may need RLM when the character is at the edge of a run, and even then, its behaviour may be no better than a character with a BC of R.

(b) It may get left out of script runs. There were problems on Windows with the Tamil ligature k.SS not rendering, despite font support, when the character U+0BB7 TAMIL LETTER SSA was new. And that's in a left-to-right script with a character in the appropriate block!

(2) Complete right-to-left script. I'm presuming the difference between AL and R is then a matter of what right-to-left script the potential users chiefly also use.

(a) As a practical implementation, the distinction between AL and R would matter if the script has modern use. Otherwise, any of ON, AL and R would do, though one might face the annoyance of having to start chunks of text with RLM. If a script with modern use should be encoded using a BC of R, then I believe ON would also do as a stop-gap until the script is encoded. How fiendish is BiDi-sensitive transliteration?

(b) For experimentation, I believe the difference between AL, R and ON would matter little, even though it would be irritating to have to use RLM.

(c) Complex script support is patchy - one might be restricted to applications that allow the font to provide full complex script support.
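As a rough illustration of what a user-level override of the PUA's BC could look like, here is a minimal Python sketch. The override table and the bidi_class helper are hypothetical names made up for this example, not part of any rendering system; only unicodedata.bidirectional is a real standard-library call, and it reports the default BC of 'L' for PUA code points.

```python
import unicodedata

# Hypothetical per-user override table: (first, last, Bidi_Class).
# Here Plane 16 (U+100000..U+10FFFD) is reassigned to 'R' as an example.
PUA_BIDI_OVERRIDES = [
    (0x100000, 0x10FFFD, "R"),
]


def bidi_class(ch, overrides=PUA_BIDI_OVERRIDES):
    """Return the overridden BC if ch falls in an override range,
    otherwise the value from Python's built-in Unicode database."""
    cp = ord(ch)
    for first, last, bc in overrides:
        if first <= cp <= last:
            return bc
    return unicodedata.bidirectional(ch)


if __name__ == "__main__":
    for ch in ("\U00100000", "\uE000", "\u0628", "\u05D0"):
        print(f"U+{ord(ch):04X}", bidi_class(ch))
```

A real rendering system would need the same override to reach the BiDi algorithm, the shaping engine and script itemisation, which this sketch does not attempt.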
The big issue in all this, though, is (i) how to update the rendering system with a new set of values for Unicode properties, including script, and (ii) the scope of such an update. (The distinction between the PUA and the rest is that it makes sense for PUA properties to change as freely as fonts.) This, incidentally, is analogous to locales reflecting code page selections.

There is also, though less pressing, the issue of tailoring collations. (The worst issue is that distinct, canonically inequivalent characters of type Lo compare equal - I've seen it for Canadian Aboriginal Syllabics on Windows XP and for Thai on Ubuntu 10.04 - surely that's not the normal British collation of such characters.)

One minor problem with (i) *was* that it wasn't clear how one should annotate a copy of UnicodeData.txt to show that it has been modified. The standard XML alternative allows for comments, thereby solving that problem.

If Issue (i) can be readily solved at the machine or user level or lower, then the default properties of the PUA become irrelevant.

Richard.