On 8/21/2011 7:34 PM, Doug Ewell wrote:
So what you are asking about is a directional control character that would
assign subsequent characters a BC of 'AL', right?
You don't want to call this a LANGUAGE MARK or anything else that implies language
identification, because of the existence of "real" language identification
mechanisms and the history of Unicode and language tagging.
An ARM (Arabic RTL Mark) would be a sensible addition to the standard.
It would close a small gap in the design that currently prevents a fully
faithful plain text export of bidi text from rich text (higher level
protocol) formats.
In an HLP you can make any run behave as if it followed a
character with bidi class AL.
When you export this text as plain text, unless there is an actual AL
character, you cannot get the same behavior (other than by the
heavy-handed method of completely overriding the directionality, making
your plain text less editable).
So, yes, there's a bit of a use case for such a mark.
(Its effect is limited to treatment of numeric expressions, so it's not
an "Arabic language" mark, but one that triggers the same bidi context
as the presence of an Arabic Script (AL) character.)
A./
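
To make the "numeric expressions" point concrete, here is a rough sketch of
rule W2 of the bidi algorithm, which retypes a European number (EN) as an
Arabic number (AN) when the nearest preceding strong character is AL. The
function name resolve_w2 and the default start-of-run type are illustrative
assumptions, and the sketch ignores embeddings, overrides and every other
rule, so it is not a conforming implementation.

```python
import unicodedata

# Bidi classes that count as "strong" when W2 searches backward.
STRONG = {"L", "R", "AL"}


def resolve_w2(text, sos="L"):
    """Apply a simplified rule W2 to a single run of text.

    `sos` stands in for the start-of-run type; "L" is an assumption
    made for this sketch, not something taken from the thread.
    """
    last_strong = sos
    resolved = []
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        if bc in STRONG:
            last_strong = bc
        # W2: a European number whose nearest preceding strong type
        # is AL is treated as an Arabic number.
        if bc == "EN" and last_strong == "AL":
            bc = "AN"
        resolved.append(bc)
    return resolved


# Same digits, once after a Hebrew letter (R) and once after an
# Arabic letter (AL).
print(resolve_w2("\u05d0 12"))
print(resolve_w2("\u0627 12"))
```

Only the run that follows the AL character has its digits retyped, which is
the behavior a plain-text export cannot reproduce when no actual AL
character is present.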
--
Doug Ewell • [email protected]
-----Original Message-----
From: Richard Wordingham<[email protected]>
Sender: [email protected]
Date: Mon, 22 Aug 2011 03:19:39
To: Unicode Mailing List<[email protected]>
Subject: Re: RTL PUA?
On Sun, 21 Aug 2011 23:55:46 +0000
"Doug Ewell"<[email protected]> wrote:
What's a LANGUAGE MARK?
There are *three* strong directionalities - 'L' left-to-right, 'AL'
right-to-left as in Arabic, 'R' right-to-left (as in Hebrew, I
suspect). 'AL' and 'R' have different effects on certain characters
next to digits - it's the mind-numbing part of the BiDi algorithm.
With one, a $ sign after a string of European (or is it Arabic?) digits
appears on the left and with the other it appears on the right. I
can't remember whether 'higher-level protocols' have an effect on this
logic. LRM has a BC of L, RLM has a BC of R, but no invisible character
has a BC of AL. That's why I tentatively raised the notion of ARABIC
LANGUAGE MARK. Incidentally, an RLO gives characters a
temporary BC of R, not AL.
Richard.
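
The bidi classes mentioned above can be checked against the Unicode
Character Database. The following is a small sketch using Python's standard
unicodedata module; the sample table and the helper name bidi_classes are
illustrative choices, and the results reflect whichever Unicode version the
interpreter ships with.

```python
import unicodedata

# Characters discussed in the thread, plus the digit and currency
# types that the number-handling rules care about.
SAMPLES = {
    "LRM": "\u200e",
    "RLM": "\u200f",
    "RLO": "\u202e",
    "HEBREW LETTER ALEF": "\u05d0",
    "ARABIC LETTER ALEF": "\u0627",
    "DIGIT ONE": "1",
    "ARABIC-INDIC DIGIT ONE": "\u0661",
    "DOLLAR SIGN": "$",
}


def bidi_classes(samples):
    """Map each label to the Bidi_Class of its character."""
    return {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}


for label, bc in bidi_classes(SAMPLES).items():
    print(f"{label:24} {bc}")
```

LRM reports L and RLM reports R, while the Arabic letter reports AL, which
matches the observation that neither of those two marks carries AL.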