Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

Asmus Freytag Mon, 21 Apr 2014 07:35:32 -0700

On 4/21/2014 1:33 AM, Eli Zaretskii wrote:

Date: Sun, 20 Apr 2014 23:03:20 -0700
From: Asmus Freytag <[email protected]>
CC: Eli Zaretskii <[email protected]>, [email protected],
  Kenneth Whistler <[email protected]>

         Note that the current embedding level is not changed by this rule.

     What does this last sentence mean by "the current embedding level"?
     The first bullet of X6 mandates that "the current character’s
     embedding level" _is_ changed by this rule, so what other "current
     embedding level" is alluded to here?

     I'm punting on that one - can someone else answer this?


I assume "current embedding level" here meant "the embedding level of
the last entry on the directional status stack". (This is a natural
slip to make if you think in terms of an optimized implementation that
stores each component of the top of the directional status stack in a
variable, as suggested in 3.3.2.)

James

In general, I heartily dislike "specifications" that just narrate a
particular implementation...

I cannot agree more.

In fact, my main gripe about the UBA additions in 6.3 is that some of
their crucial parts are not formally defined, except by an algorithm
that narrates a specific implementation.  The two worst examples of
that are the "definitions" of the isolating run sequence and of the
bracket pair.  I didn't ask about those because I managed to figure
them out, but it took many readings of the corresponding parts of the
document.  It is IMO a pity that the two main features added in 6.3
are based on definitions that are so hard to penetrate, and which
actually all but force you to use the specific implementation
described by the document.

My working definition that replaces BD13 is this:

   An isolating run sequence is a maximal sequence of level runs of
   the same embedding level that can be obtained by removing all the
   characters between an isolate initiator and its matching PDI (or
   paragraph end, if there is no matching PDI) within those level runs.
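
To make the working definition above concrete, here is a minimal Java sketch that chains level runs into isolating run sequences. It is not taken from the UAX#9 sample code: the class name, the reduced type enum, and the method names are all hypothetical. It assumes the embedding levels have already been resolved by the explicit-level rules and that characters removed by X9 are not present in the input.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class IsolatingRunSketch {
    // Hypothetical, reduced set of bidi types; only the isolate controls matter here.
    enum T { L, R, LRI, RLI, FSI, PDI, OTHER }

    static boolean isInitiator(T t) {
        return t == T.LRI || t == T.RLI || t == T.FSI;
    }

    // For each isolate initiator or PDI, the position of its match, or -1 if unmatched.
    static int[] matchIsolates(T[] types) {
        int[] match = new int[types.length];
        Arrays.fill(match, -1);
        Deque<Integer> open = new ArrayDeque<>();
        for (int i = 0; i < types.length; i++) {
            if (isInitiator(types[i])) {
                open.push(i);
            } else if (types[i] == T.PDI && !open.isEmpty()) {
                int j = open.pop();
                match[j] = i;
                match[i] = j;
            }
        }
        return match;
    }

    // Level runs as {start, endExclusive}: maximal stretches with the same level.
    static List<int[]> levelRuns(int[] levels) {
        List<int[]> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= levels.length; i++) {
            if (i == levels.length || levels[i] != levels[start]) {
                runs.add(new int[] {start, i});
                start = i;
            }
        }
        return runs;
    }

    // Each isolating run sequence is returned as its text positions in logical order.
    static List<List<Integer>> isolatingRunSequences(T[] types, int[] levels) {
        int[] match = matchIsolates(types);
        List<int[]> runs = levelRuns(levels);
        int[] runStartingAt = new int[types.length];
        Arrays.fill(runStartingAt, -1);
        for (int r = 0; r < runs.size(); r++) {
            runStartingAt[runs.get(r)[0]] = r;
        }
        List<List<Integer>> sequences = new ArrayList<>();
        for (int[] run : runs) {
            int first = run[0];
            // A run that starts with a matched PDI continues an earlier sequence.
            if (types[first] == T.PDI && match[first] >= 0) {
                continue;
            }
            List<Integer> seq = new ArrayList<>();
            int[] cur = run;
            while (true) {
                for (int i = cur[0]; i < cur[1]; i++) {
                    seq.add(i);
                }
                int last = cur[1] - 1;
                if (!isInitiator(types[last]) || match[last] < 0) {
                    break;
                }
                int next = runStartingAt[match[last]];
                if (next < 0) {
                    break; // defensive: levels not consistent with the isolate structure
                }
                cur = runs.get(next);
            }
            sequences.add(seq);
        }
        return sequences;
    }

    public static void main(String[] args) {
        // "a LRI b PDI c": the isolate content sits at level 2, everything else at 0.
        T[] types = {T.L, T.LRI, T.R, T.PDI, T.L};
        int[] levels = {0, 0, 2, 0, 0};
        System.out.println(isolatingRunSequences(types, levels)); // [[0, 1, 3, 4], [2]]
    }
}
```

In the example, positions 0, 1, 3 and 4 end up in one sequence, which is exactly what is left of the level-0 text once the characters between the LRI and its matching PDI are removed.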

As for bracket pair (BD16), I'm really amazed that a concept as easy
and widely known/used as this would need such an obscure definition
that must have an algorithm as its necessary part.  How about this
instead:

   A bracket pair is a pair of characters, an opening paired bracket and
   a closing paired bracket, within the same isolating run sequence,
   such that the Bidi_Paired_Bracket property value of the former
   character or its canonical equivalent equals the latter character or
   its canonical equivalent, and all the opening and closing bracket
   characters in between these two are balanced.

Then we could use the algorithm to explain what it means for brackets
to be balanced (for those readers who somehow don't already know
that).

Again, thanks for clarifying these subtle issues.  I can now proceed
to updating the Emacs bidirectional display with the changes in
Unicode 6.3.

FWIW here is the restatement of BD16 that I used for myself (and that I put
into the source comments of the sample Java implementation):

    // The following is a restatement of BD 16 using non-algorithmic language.
    //
    // A bracket pair is a pair of characters consisting of an opening
    // paired bracket and a closing paired bracket such that the
    // Bidi_Paired_Bracket property value of the former equals the latter,
    // subject to the following constraints.
    // - both characters of a pair occur in the same isolating run sequence
    // - the closing character of a pair follows the opening character
    // - any bracket character can belong at most to one pair, the earliest possible one
    // - any bracket character not part of a pair is treated like an ordinary character
    // - pairs may nest properly, but their spans may not overlap otherwise
    // Bracket characters with canonical decompositions are supposed to be treated
    // as if they had been normalized, to allow normalized and non-normalized text
    // to give the same result.

Your language is more concise, but you may compare for differences.
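
For readers who want to check either restatement against concrete input, the following is a minimal stack-based sketch in Java. It is not the sample implementation: the class and method names are hypothetical, and it makes several simplifying assumptions. The whole string is treated as a single isolating run sequence, only the ASCII brackets ( ) [ ] { } are recognized in place of the real Bidi_Paired_Bracket data, canonical equivalents are ignored, and the bidi type of each bracket is not checked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

class BracketPairSketch {
    // Hypothetical stand-in for Bidi_Paired_Bracket: OPENING[i] pairs with CLOSING[i].
    private static final String OPENING = "([{";
    private static final String CLOSING = ")]}";

    // Returns {openPosition, closePosition} pairs, sorted by opening position.
    static List<int[]> findPairs(String text) {
        // Each stack entry is {expected closing bracket, position of the opener}.
        Deque<int[]> stack = new ArrayDeque<>();
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int open = OPENING.indexOf(c);
            if (open >= 0) {
                stack.push(new int[] {CLOSING.charAt(open), i});
            } else if (CLOSING.indexOf(c) >= 0) {
                // Search from the top of the stack down for the nearest matching opener.
                int depth = 0;
                boolean found = false;
                for (int[] entry : stack) {
                    depth++;
                    if (entry[0] == c) {
                        pairs.add(new int[] {entry[1], i});
                        found = true;
                        break;
                    }
                }
                if (found) {
                    // Pop through the matched opener; skipped openers stay unpaired.
                    for (int k = 0; k < depth; k++) {
                        stack.pop();
                    }
                }
                // With no match, the closer is treated as an ordinary character.
            }
        }
        pairs.sort(Comparator.comparingInt(p -> p[0]));
        return pairs;
    }

    public static void main(String[] args) {
        // '[' at 3 and ']' at 7 would overlap the pair (1, 5), so they stay unpaired.
        for (int[] p : findPairs("a(b[c)d]")) {
            System.out.println(Arrays.toString(p)); // [1, 5]
        }
    }
}
```

The overlapping case in `main` is the one where the constraints "the earliest possible one" and "spans may not overlap" do the work: once `(` at 1 pairs with `)` at 5, the `[` inside it is discarded from the stack and the later `]` finds nothing to match.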

A./

_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
