--On Wednesday, February 11, 2015 22:44 +0100 Bjoern Hoehrmann
<[email protected]> wrote:
>> This is meant to be a replacement for
>> all of the text in 5.2.5:
>>
>> The directionality rule of a profile specifies how to
>> treat strings containing what are often called
>> "right-to-left" (RTL) characters. RTL characters come from
>...
>> The PRECIS framework does not directly address how to deal
>> with bidirectional strings, since there is currently no
>> widely accepted and implemented solution for the safe
>> display of arbitrary bidirectional strings beyond the
>> general Unicode bidirectional specification [UAX9]. Rules
>...
> I am not happy with "this document generally recommends"; it's
> not clear if this is the recommendation or if the
> recommendation is elswhere; it's also not clear if this a RFC
> 2119 RECOMMENDED or something weaker, made worse by
> "generally" and also the preceding "unless". Should be easy to
> rephrase, but I do not have a good idea right now.
Björn,
Let me try to explain this from the perspective of someone who
has been greatly concerned with multiple aspects of the PRECIS
work (I assume no one who has been following this list or the WG
will be surprised by that), but who is now trying to work with
Pete, the document authors, and the WG co-chairs to get this
wrapped up. Let me also do so, at least in part, less from the
perspective of the IETF's i18n efforts and more from that of
someone who was expected to be able to at least partially get
around in multiple languages from a rather tender age --
languages that were, by my great good (or bad) luck, written in
three separate scripts, one of which is written right to left.
For at least the last several centuries, no calligrapher and
probably no "ordinary" reader or writer of a given script has
been particularly troubled by its directionality properties.
One just writes it as it is written, whether that is left to
right, right to left, top to bottom, or some variation on what
is sometimes called serpentine. However, when these things are
coded for computer systems in some set of "uniform" rules,
things get complicated. We have to worry about order of bits or
bytes in transmission, storage order, potentially the difference
between storage order and rendering order, and all sorts of
subtle variations on characters that may result from what is
next to them along some dimension.
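To make the storage-order point concrete, here is a minimal
sketch (an illustration only, not anything taken from the PRECIS
documents). It uses Python's standard unicodedata module to list
the Unicode Bidi class of each character in the order the
characters are stored. The sample string and the helper name
bidi_classes are made up for the example.

```python
import unicodedata


def bidi_classes(text):
    """Return (Bidi class, character name) pairs in storage (logical) order."""
    return [(unicodedata.bidirectional(ch), unicodedata.name(ch, "?"))
            for ch in text]


# Hypothetical sample: Latin "ab" followed by Hebrew alef and bet.
# The code points are stored in the order they would be typed; a
# renderer applying UAX #9 decides the visual order separately.
sample = "ab\u05d0\u05d1"

for bidi_class, name in bidi_classes(sample):
    print(bidi_class, name)
```

The classes come out in storage order, which need not match what
a display shows once the Bidi algorithm has reordered the
right-to-left run.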
The result is that the Unicode Bidi rules are very complex, to
the point that I often suspect that only a few people completely
understand them and that some of those may not understand all of
them three months in a row. Certainly the observation that UAX
#9 has been revised several times, presumably to make it more
comprehensible and clarify fine points and edge cases,
contributes to an "I hope this time they got it right" sensation.
It would be easy to cast blame for this in Unicode's direction
but, as with a lot of other aspects of Unicode, I believe that a
different "universal" character set, constructed with a
different set of rules, would simply favor different tradeoffs
and exhibit a different set of problems and difficulties.
Now back to PRECIS and the text Pete posted.
First, one of the fundamental problems with the WG, at least
IMO, is that there have been a lot of participants, probably a
considerable majority, who have been unwilling to dig deeply
into this and other complex issues, concentrating instead on
making sure their own favorite language or script could be
accommodated in some reasonable way or on a "please just tell me
what to do; I don't want to have to understand it or think about
it... as long as it doesn't mess up my deployed implementations"
style of requirement.
Second, while this is up to Pete, the WG, and its leadership, it
is getting late enough in the process that "I am not happy with"
-- in the absence of specific suggestions -- is not very helpful.
We need to at least understand what you would like done, not
just that you don't like what is there (or proposed).
If you (and others, if they agree with you) merely want the
statement to be very clear that it is referring to Section 5.2.5
and that is all that is going to be said as far as PRECIS is
concerned, I imagine we (particularly Peter and Marc) can figure
out how to do that even though all the ways I can think of at
the moment would sound pretty pedantic.
If your problem (or desire) is a stronger and more specific
recommendation, then there are a couple of problems. There are
families and layers of peculiar and/or edge cases. I think it
is reasonably unlikely that a user or implementer who regularly
reads and writes a particular RTL script and has sufficient
wisdom and maturity to be sensible and conservative about
identifiers rather than trying to figure out what "weird stuff"
might work is going to encounter or generate them. As long as
one avoids those cases, both RFC 5893 and UAX #9 do about what a
reasonable person with experience with the relevant scripts on
computer systems is going to expect.
On the other hand, if one wants to discuss the edge cases, it
isn't going to be possible to do it in a few paragraphs (RFC
5893 and UAX #9 are relatively long documents for a reason) and
I have doubts that the WG will be able to come up with informed
consensus ("rough" or otherwise) that anything that is said is
right. So "general recommendation" covers most of the likely
and important cases. If one wants to get into the edges, one
will quickly end up on one's own and, unless one is really an
expert with these things, with a fairly high likelihood of ending
up in a rather large mess.
And that, IMO, is what the proposed text says.
best,
john