Re: Mirroring in Unicode

Behdad Esfahbod Fri, 11 Jun 2004 03:15:45 -0700

On Thu, 10 Jun 2004, Ordak D. Coward wrote:

> Hi Behdad,
>
> I just finished finding the relevant part (Rule L4 of UAX #9) of
> Unicode specs referring to mirroring. I believe the problem I am
> complaining about is still a problem and is due to bad Unicode
> specifications. I do not know how Unicode got mirroring into their
> standard, and their rationale behind this. However, in my opinion, the
> correct semantics is that if the input text has matched open and end
> parenthesis then the visual output should also have matched left and
> right parenthesis regardless of the paragraph mode. Obviously the
> Unicode specs break this semantics when the text is "RTLTEXT(RTLTEXT)"
> and the paragraph is in LTR mode (or vice versa).


I'm sure you agree that matching parentheses is evil in plain
text.  It breaks all kinds of things, like statelessness,
context-freeness, locality, etc.  It's plain text after all.

And assuming matching is not to be considered, the current rule
is almost the best you can get.  Note that in your example the
problem is with your paragraph direction; if you change the spec
to work around it you will definitely create worse problems.  In
this special case, the second parenthesis has to behave that way
for the more natural "ltrtext(RTLTEXT)" case to work.
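
To make the two cases concrete, here is a toy sketch in Python.
It is not FriBidi and not the full UAX #9 algorithm: it assumes
the usual test notation (uppercase = RTL letter, lowercase = LTR
letter, everything else neutral), handles only rules N1/N2, I1/I2,
L2 and L4, and ignores numbers and explicit embeddings.  The names
strong(), levels_for() and display() are made up for this example.

```python
import unicodedata

# Only the pair this sketch needs; the full table is Unicode's BidiMirroring.txt.
MIRROR = {"(": ")", ")": "("}


def strong(ch):
    # Test-notation assumption: uppercase = RTL letter, lowercase = LTR letter.
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return None  # everything else is treated as a neutral


def levels_for(text, para_level):
    """Resolve neutrals (simplified N1/N2), then assign levels (I1/I2)."""
    para_dir = "R" if para_level % 2 else "L"
    types = [strong(ch) for ch in text]
    i = 0
    while i < len(types):
        if types[i] is not None:
            i += 1
            continue
        j = i
        while j < len(types) and types[j] is None:
            j += 1
        # Start and end of the paragraph count as the paragraph direction.
        before = types[i - 1] if i > 0 else para_dir
        after = types[j] if j < len(types) else para_dir
        # N1: same direction on both sides; N2: otherwise paragraph direction.
        types[i:j] = [before if before == after else para_dir] * (j - i)
        i = j
    levels = []
    for t in types:
        if para_level % 2 == 0:
            levels.append(para_level + (1 if t == "R" else 0))
        else:
            levels.append(para_level + (1 if t == "L" else 0))
    return levels


def display(text, para_level=0):
    """Return the visual order (left to right) of one unbroken paragraph."""
    levels = levels_for(text, para_level)
    # L4: a mirrored character at an odd (RTL) level shows its mirror glyph.
    chars = [MIRROR.get(c, c) if lvl % 2 and unicodedata.mirrored(c) else c
             for c, lvl in zip(text, levels)]
    # L2: from the highest level down to the lowest odd one, reverse each run.
    odd = [lvl for lvl in levels if lvl % 2]
    for level in range(max(levels, default=0), min(odd, default=1) - 1, -1):
        i = 0
        while i < len(chars):
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            if j > i:
                chars[i:j] = chars[i:j][::-1]
                levels[i:j] = levels[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars)


print(display("RTLTEXT(RTLTEXT)", 0))  # TXETLTR)TXETLTR)
print(display("RTLTEXT(RTLTEXT)", 1))  # (TXETLTR)TXETLTR
print(display("ltrtext(RTLTEXT)", 0))  # ltrtext(TXETLTR)
```

The first line is your complaint: in an LTR paragraph the last
parenthesis sits between RTL text and the end of the paragraph,
so it takes the paragraph direction and is not mirrored.  The
second line is the same text in an RTL paragraph, where both
parentheses are mirrored and match.  The third line shows the
same rule for the closing parenthesis giving the expected result.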

> While we are talking about the semantics behind BIDI algorithm, I was
> wondering if BIDI algorithm assigns the same direction to characters
> regardless of where a line is broken. Apparently it does not! For
> example, type in "This a very very long line ÙØØØÛ +-* ÛØ ØØØÛ *-+
> this is the question!" in a multiline input area. Notice the visual
> order of *-+ is the same in both occurrences. Now, insert spaces in the
> beginning until you get both of the *-+ on the second line. Now
> observe the difference in ordering of the *-+. I again believe this is
> a design defect of BIDI specifications. Whereas, it only looks at one
> line at a time, and does not allow (unless I am mistaken) for state
> information to be propagated across lines when breaking lines. A
> better design would have allowed (and required) to pass necessary
> state information from one line to another such that the visual
> ordering would have stayed the same regardless of where the lines are
> broken.

No, you are wrong here.  Bidi does exactly what you expect.  It
computes what are called "embedding levels" per paragraph, then
reorders the text in each line based on those computed embedding
levels.

Note that you are probably using MS products that hardly conform
to the Unicode standard.  If you send the output that you get and
don't expect or like, I can discuss why it's not that bad.
I tried your example in gedit which is using FriBidi 0.10.4 for
the bidi engine and it works fine.  The "*-+" always looks the
same, no matter where the line breaks.
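
A small Python sketch of that two-step structure, for
illustration only.  The sample paragraph uses the test notation
(uppercase = RTL), its levels are assigned by hand for an LTR
paragraph rather than computed, and reorder_line() and render()
are names invented for this example, not FriBidi functions.

```python
def reorder_line(chars, levels, para_level=0):
    """Reorder one line for display, given levels computed per paragraph."""
    chars, levels = list(chars), list(levels)
    # Rule L1 (in part): trailing whitespace on a line takes the paragraph level.
    k = len(chars)
    while k > 0 and chars[k - 1].isspace():
        k -= 1
        levels[k] = para_level
    # Rule L2: from the highest level down to the lowest odd level,
    # reverse every run of characters at that level or higher.
    odd = [lvl for lvl in levels if lvl % 2]
    for level in range(max(levels, default=0), min(odd, default=1) - 1, -1):
        i = 0
        while i < len(chars):
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            if j > i:
                chars[i:j] = chars[i:j][::-1]
                levels[i:j] = levels[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars)


# Logical order.  Levels are fixed once for the whole paragraph:
# "abc " is LTR (0), "DEF +-* GHI" is RTL (1), " *-+ xyz" is LTR (0).
PARAGRAPH = "abc DEF +-* GHI *-+ xyz"
LEVELS = [0] * 4 + [1] * 11 + [0] * 8


def render(break_at):
    """Break the paragraph into two lines and reorder each line separately."""
    return [
        reorder_line(PARAGRAPH[:break_at], LEVELS[:break_at]),
        reorder_line(PARAGRAPH[break_at:], LEVELS[break_at:]),
    ]


for break_at in (4, 8, 12, 16, 20):
    print(break_at, render(break_at))
```

Wherever the break falls, both operator groups come out as "*-+"
on screen, because the line break only decides which slice of
the already-computed levels gets reordered.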

> Of course, a typical reply could be that I need to insert some control
> characters to achieve the desired ordering. Then, my rebuttal is that
> if that is the case, why not make the control characters for such
> cases mandatory?

Huh?  They are mandatory:  if you want your specific ordering,
you have to insert them.
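
For your parentheses example the mark to insert would be a
RIGHT-TO-LEFT MARK (U+200F) after the closing parenthesis.  The
snippet below only inspects character properties with Python's
standard unicodedata module; the sample word and the helper name
bidi_classes() are my own choices for illustration.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, an invisible strong RTL character

word = "\u0633\u0644\u0627\u0645"  # sample Arabic-script word, for illustration
text = word + "(" + word + ")"
fixed = text + RLM  # same text with a mark after the closing parenthesis


def bidi_classes(s):
    """Return the Bidi_Class of every character in s."""
    return [unicodedata.bidirectional(ch) for ch in s]


print(bidi_classes(text)[-2:])   # last letter, then the closing parenthesis
print(bidi_classes(fixed)[-2:])  # closing parenthesis, then the mark
print(unicodedata.mirrored(")"))
```

Without the mark, the closing parenthesis is a neutral between
RTL text and the end of the paragraph, so in an LTR paragraph it
resolves to LTR and is not mirrored.  With the mark it is a
neutral between two RTL characters, resolves to RTL, and gets
mirrored like the opening one.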

> Anyway, I have no hope of achieving any positive contribution at
> Unicode consortium (or other big standard groups like that). So, I am
> going to turn this into something more fruitful. That is, I like to
> put the burden of correcting these flaws at the UI. Or:

In fact the Unicode Consortium is very open to suggestions and
corrections, but as a bidi expert I can tell you that's almost
the best you can get in this logical->visual model.

> "The UI should add control characters at proper places to the user
> text such that the text renders semantically correct regardless of
> BIDI inconsistencies"

Yes, this has been the rule for a few years, but everyone is
scared of auto-inserting marks and then having to deal with them
later without cluttering the text too much.  One such
implementation is KDE's parentheses-fixing idea based on the
keyboard layout, which is considered quite a failure (see the
Arabeyes wiki page on Qt bugs).

> I think satisfying the above requirement is not trivial, but
> challenging enough to keep a few good minds busy thinking about it.

Sure, but the problem is that there are many, many other easier
things that need to be done before we get there.  For example,
we're right now trying to fix our target system (GNOME/GNU/Linux)
to produce and parse Persian digits.  I mention this example
because it is one of those that is not solved on MS systems
either.

If you are interested in the bidi algorithm, I recommend
subscribing to the GNU FriBidi mailing list available from:

  http://freedesktop.org/Software/FriBidi

Cheers,
--behdad
  behdad.org

_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing
