Hi,

It is getting more interesting for me, because this is also one of the issues addressed by the Persian GUI spec document I am writing. Unfortunately, many people (including Microsoft) misuse Unicode when writing programs. They don't properly understand and observe bi-di semantics, and the choices they make where Unicode is either silent or obscure result in poor implementations. So the problem is that Unicode specs and reports are not a substitute for a good understanding of bi-di semantics; they only regularize some aspects of it.

I also criticize the Unicode organization for not being thorough enough in pointing out caveats in this regard and in correctly conveying the big picture. I know what I should do to get correct results because I have already discovered it independently. Unicode is just one way of putting some of that knowledge on paper and specifying methods for certain issues without covering all of them. I would never have been able to design a correct bi-di implementation solely from the Unicode documents.

So what Unicode specifies is not wrong, but it is certainly not enough. Since there is no well-documented source that specifies these kinds of nuances in the many aspects of handling bi-di text and the Arabic script, we came up with the idea of this Persian GUI spec to clarify these issues and provide guidelines that help developers implement correct Persian software (which includes correct bi-di behavior as a subset, along with a lot of other things).

If you are really interested in tackling these issues, contact me off-list so that we can collaborate further. I don't see the list as a suitable medium for this discussion, because it will get highly technical and interactive and we will need some diagrams to illustrate it. It would confuse many list members who are not seasoned designers/developers.

Just rest assured: the solution is there, clean and conclusive. Developers just need to get it. They can't easily get it (it may take them years, as it did me) because of the lack of good documentation. The Persian GUI spec is an effort to clarify the solutions to these issues. So, I repeat: I need community support and help to produce something really helpful. Please note that this effort is in progress and is related to many of these things, but it is still in the early stages of being put on paper. Everything is still mostly in my head; help me get it down on paper in an understandable way.

- Hooman Mehr

On Jun 11, 2004, at 7:34 AM, Ordak D. Coward wrote:

Hi Behdad,

I just finished finding the relevant part of the Unicode specs
referring to mirroring (Rule L4 of UAX #9). I believe the problem I am
complaining about is still a problem and is due to bad Unicode
specifications. I do not know how Unicode got mirroring into their
standard, or their rationale behind it. However, in my opinion, the
correct semantics is that if the input text has matched opening and
closing parentheses, then the visual output should also have matched
left and right parentheses regardless of the paragraph mode. Obviously
the Unicode specs break this semantics when the text is
"RTLTEXT(RTLTEXT)" and the paragraph is in LTR mode (or vice versa).
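To make the "RTLTEXT(RTLTEXT)" case concrete, here is a toy Python sketch, not the full UAX #9 algorithm, of how neutral resolution (rules N1/N2), reordering (rule L2) and mirroring (rule L4) interact for a single paragraph with no embeddings or numbers. Following the capital-letter convention used later in this thread, ASCII uppercase letters stand in for RTL letters and lowercase for LTR; the function names and the small mirror table are illustrative assumptions. In an LTR paragraph the final ')' of "TAR (FARSI)" sits between RTL text and the end of the paragraph, so N2 gives it the LTR embedding direction, which reproduces the mismatched result described later in the thread. Later versions of UAX #9 (Unicode 6.3) added a paired-bracket rule, N0, which this toy model does not implement.

```python
# Toy model only: uppercase = R, lowercase = L, everything else neutral.
# No explicit embeddings, numbers, or trailing-whitespace rule (L1).
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{", "<": ">", ">": "<"}


def char_class(ch: str) -> str:
    """Toy bidi class: uppercase = R, lowercase = L, anything else = N."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def resolve_levels(text: str, rtl_paragraph: bool) -> list[int]:
    """Resolve neutrals (N1/N2) and assign levels (I1/I2) for one paragraph."""
    base = 1 if rtl_paragraph else 0
    edge = "R" if rtl_paragraph else "L"  # sos/eos take the paragraph direction
    classes = [char_class(c) for c in text]
    resolved = classes[:]
    i = 0
    while i < len(classes):
        if classes[i] != "N":
            i += 1
            continue
        j = i
        while j < len(classes) and classes[j] == "N":
            j += 1
        before = classes[i - 1] if i > 0 else edge
        after = classes[j] if j < len(classes) else edge
        # N1: same strong direction on both sides -> that direction;
        # N2: otherwise -> the embedding (paragraph) direction.
        direction = before if before == after else edge
        resolved[i:j] = [direction] * (j - i)
        i = j
    if base == 0:
        return [1 if c == "R" else 0 for c in resolved]
    return [2 if c == "L" else 1 for c in resolved]


def visual_order(text: str, rtl_paragraph: bool) -> str:
    """Mirror (L4) and reorder (L2) a paragraph that fits on one line."""
    levels = resolve_levels(text, rtl_paragraph)
    # L4: characters at odd (RTL) levels are shown with their mirrored glyph.
    chars = [MIRROR.get(c, c) if lvl % 2 else c for c, lvl in zip(text, levels)]
    # L2: from the highest level down to the lowest odd level, reverse
    # every maximal run of characters at that level or higher.
    highest = max(levels, default=0)
    lowest_odd = min((lvl for lvl in levels if lvl % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


print(visual_order("TAR (FARSI)", rtl_paragraph=False))  # ISRAF) RAT)
print(visual_order("TAR (FARSI)", rtl_paragraph=True))   # (ISRAF) RAT
```

With an RTL paragraph the same input keeps its parentheses matched, because the final ')' then falls back to the RTL embedding direction.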

While we are talking about the semantics behind the BIDI algorithm, I was
wondering whether the BIDI algorithm assigns the same direction to
characters regardless of where a line is broken. Apparently it does not!
For example, type "This is a very very long line فارسی +-* ÛØ ØØØÛ *-+
this is the question!" into a multiline input area. Notice that the
visual order of *-+ is the same in both occurrences. Now, insert spaces
at the beginning until both *-+ are on the second line, and observe the
difference in the ordering of the *-+. Again, I believe this is a design
defect of the BIDI specifications: it only looks at one line at a time
and does not allow (unless I am mistaken) state information to be
propagated across lines when breaking lines. A better design would have
allowed (and required) passing the necessary state information from one
line to the next, so that the visual ordering would stay the same
regardless of where the lines are broken.
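A toy Python sketch of how a line break can change the result: it contrasts resolving neutral directions over the whole paragraph before wrapping with resolving each wrapped line in isolation. As we read UAX #9, the resolution rules are applied to the paragraph and only the line-based rules (L1 to L4) run after wrapping; whether a given editor does this is not something the sketch can tell. The sample text, the wrap offset and the helper names are illustrative assumptions, using the same uppercase = RTL, lowercase = LTR convention, in an LTR paragraph.

```python
def resolve(text: str) -> list[int]:
    """Toy N1/N2 + I1 for an LTR paragraph: uppercase = R, lowercase = L,
    everything else neutral. Returns one embedding level per character."""
    cls = ["R" if c.isupper() else "L" if c.islower() else "N" for c in text]
    out = cls[:]
    i = 0
    while i < len(cls):
        if cls[i] != "N":
            i += 1
            continue
        j = i
        while j < len(cls) and cls[j] == "N":
            j += 1
        before = cls[i - 1] if i > 0 else "L"    # sos = LTR paragraph direction
        after = cls[j] if j < len(cls) else "L"  # eos = LTR paragraph direction
        out[i:j] = [before if before == after else "L"] * (j - i)
        i = j
    return [1 if c == "R" else 0 for c in out]


def split_at(seq, breaks):
    """Cut a string or list at the given offsets (the line-wrap points)."""
    bounds = [0, *breaks, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


text = "xx AB +-* CD *-+ yy"
breaks = [9]  # hypothetical wrap point, right after the first "+-*"

# Resolve over the whole paragraph first, then wrap.
paragraph_first = split_at(resolve(text), breaks)
# Naive alternative: wrap first, then resolve every line in isolation.
line_first = [resolve(line) for line in split_at(text, breaks)]

print(split_at(text, breaks)[0])  # xx AB +-*
print(paragraph_first[0])         # [0, 0, 0, 1, 1, 1, 1, 1, 1]
print(line_first[0])              # [0, 0, 0, 1, 1, 0, 0, 0, 0]
```

Level 1 means the characters are displayed reversed together with the RTL run, level 0 means left-to-right order. Resolved per paragraph, the first "+-*" stays attached to "AB" because RTL text follows it on the next line; resolved per line, it only sees the end of the line and falls back to LTR.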

Of course, a typical reply could be that I need to insert some control
characters to achieve the desired ordering. My rebuttal is: if that is
the case, why not make the control characters mandatory for such
cases?

Anyway, I have no hope of making any positive contribution at the
Unicode Consortium (or other big standards bodies like that). So I am
going to turn this into something more fruitful: I would like to put
the burden of correcting these flaws on the UI. Or:

"The UI should add control characters at proper places to the user
text such that the text renders semantically correct regardless of
BIDI inconsistencies"
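One possible starting point for that requirement, sketched in Python under narrow assumptions: for an LTR paragraph, a hypothetical helper `protect_rtl_brackets` appends U+200F RIGHT-TO-LEFT MARK (an invisible character of bidi class R) after a closing parenthesis when the strong text before the opening parenthesis is RTL and the bracketed text both starts and ends with RTL letters. In the "RTLTEXT(RTLTEXT)" case this places the final ')' between two RTL characters, so it resolves to RTL instead of falling back to the paragraph direction. It uses only `unicodedata.bidirectional` from the standard library, handles only '(' and ')', and covers only this one case; it is not a general solution.

```python
import unicodedata

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK: invisible, bidi class R


def first_strong(chars) -> str:
    """Return "R" (bidi class R or AL), "L", or "" if there is no strong char."""
    for ch in chars:
        cls = unicodedata.bidirectional(ch)
        if cls in ("R", "AL"):
            return "R"
        if cls == "L":
            return "L"
    return ""


def protect_rtl_brackets(text: str) -> str:
    """Hypothetical UI helper for an LTR paragraph: append an RLM after a ")"
    whose pair follows RTL text and encloses text that starts and ends RTL."""
    marked = set()
    stack = []
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            start = stack.pop()
            inside = text[start + 1:i]
            if (first_strong(reversed(text[:start])) == "R"
                    and first_strong(inside) == "R"
                    and first_strong(reversed(inside)) == "R"):
                marked.add(i)
    return "".join(ch + RLM if i in marked else ch for i, ch in enumerate(text))


fixed = protect_rtl_brackets("تار (فارسی)")
print(fixed.endswith(RLM))  # True
```

Requiring RTL on both sides keeps the helper from breaking pairs that already render correctly, such as a Persian word in parentheses inside English text, where adding the mark would leave the opening bracket LTR and the closing one RTL.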

I think satisfying the above requirement is not trivial, but
challenging enough to keep a few good minds busy thinking about it.


On Thu, 10 Jun 2004 21:47:03 -0400, Behdad Esfahbod
<[EMAIL PROTECTED]> wrote:
Hi Ordak,

This is not a problem in the Unicode Bidi Algorithm, nor even in
Microsoft's implementation of the algorithm. And mirroring seems
to be working quite well. The problem is in the higher-level
protocols of your system, which simply do not recognize
right-to-left paragraphs.

So your "paragraph direction" is left-to-right, and that's why
you see it like that. Microsoft systems have no way of
auto-detecting paragraph direction. In Notepad you can set the
whole document direction to RTL or LTR. In MS Word you can set
the direction of individual paragraphs.

GNOME has recently applied a marvelous patch to autodetect
paragraph directions in the most sophisticated way, so we're just
having fun with our text editors ;-).
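For comparison, the simplest form of paragraph-direction detection is the first-strong-character heuristic of UAX #9 rules P2/P3: the first character of bidi class L makes the paragraph LTR, the first of class R or AL makes it RTL. The Python sketch below applies it to each line of a buffer using the standard library's `unicodedata.bidirectional`; the function names are hypothetical, isolates are ignored, and we do not claim this is what the GNOME patch actually does. P3 defaults to LTR when there is no strong character, and the `default` parameter lets a UI choose otherwise.

```python
import unicodedata


def paragraph_direction(paragraph: str, default: str = "ltr") -> str:
    """First-strong heuristic (UAX #9 rules P2/P3, isolates ignored)."""
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return default  # no strong character: fall back to a UI default


def directions(text: str) -> list[tuple[str, str]]:
    """Treat each line of an editor buffer as its own paragraph."""
    return [(p, paragraph_direction(p)) for p in text.split("\n")]


buffer = "تار (فارسی)\nHello, world\n(2004)"
for para, direction in directions(buffer):
    print(direction, repr(para))
```

With this heuristic "تار (فارسی)" is detected as an RTL paragraph, so its closing parenthesis resolves to RTL and the pair stays matched without any manual setting.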

behdad



On Thu, 10 Jun 2004, Ordak D. Coward wrote:

I noticed that certain mirrored characters appear semantically wrong on
my Windows XP machine. I have no idea whether it is a problem of the
Unicode BIDI specs or is due to the Windows XP implementation. I
describe the problem here, hoping that people who know Unicode better
can pinpoint its source.

If I type in "تار (farsi)", that is, the sequence T A R SP ( f a r s i )
(capitals stand for RTL text), the result is RAT (farsi)

However, if I type in "تار (فارسی)", that is, the sequence T A R SP ( F A R S I ),
the result is ISRAF) RAT)

Obviously the parentheses are wrong in the second example. If this is
a Unicode spec problem, I think they need to fix it. How does the
above text appear on other platforms?

_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing



--behdad
behdad.org

