Re: Mirroring in Mac OS X (was Mirroring in Unicode)

2004-06-12 Thread Hooman Mehr
Dear Behnam,
No, this is another story. The sad news is that there are multiple 
implementations of Unicode in Mac OS X. WebKit (the engine of Safari) 
has its own Unicode/bidi engine. Cocoa has its own Unicode support with 
no native bidi, with some ugly Carbon ATSUI patches bolted on and some 
ICU thrown in to get limited bidi. Carbon uses an incomplete and 
degraded implementation of ATSUI, which is itself a downgraded and 
crippled version of the QuickDraw GX layout engine of the System 7 
days. And that is not all. I really hope Apple will start to clean up 
this extremely ugly mess; otherwise they will be forced out of bidi 
markets for good. It is amazing how much worse their bidi text engine 
is compared to 12 years ago.

The problem is that each of these has its own bugs. Sometimes the 
bugs are a result of the same thing being applied twice because of API 
layering. This is the case with Safari. In some combinations of style 
sheet and page tags it tends to mirror a glyph twice, which results in 
no mirroring at all, which is wrong. The workaround in such a case is 
to use a "buggy" font that does not have a 'prop' table (like a PC 
font). It then works because the glyph is not mirrored by the normal 
mechanism, and WebKit's extra mirroring alone produces the correct 
result.
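
A toy model of that double-mirroring interaction, as a minimal sketch. The two-pass structure, the function names, and the boolean flags are assumptions made for illustration; none of this is actual WebKit or ATSUI code.

```python
# Toy model: two independent layers may each mirror a glyph in an RTL run.
# All names are hypothetical; this does not reproduce WebKit or ATSUI.
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def mirror(glyph):
    """Return the mirrored counterpart of a glyph, or the glyph itself."""
    return MIRROR.get(glyph, glyph)


def render_rtl(glyph, font_has_prop_table, extra_layer_mirrors):
    """Glyph finally drawn for a character that sits in an RTL run."""
    # Pass 1: the normal font-driven mechanism, which only applies when
    # the font carries mirroring information (the 'prop' table).
    if font_has_prop_table:
        glyph = mirror(glyph)
    # Pass 2: a higher layer that mirrors again on its own.
    if extra_layer_mirrors:
        glyph = mirror(glyph)
    return glyph


if __name__ == "__main__":
    for has_prop in (True, False):
        for extra in (True, False):
            print(has_prop, extra, render_rtl("(", has_prop, extra))
```

With both passes active the glyph comes back unmirrored, while a font without the table is mirrored exactly once by the extra pass, which is the workaround described above.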

I really hope someone in an influential position at Apple will listen 
to me. It really frustrates me to see Apple (which was once a pioneer 
in bidi and one of the key founders of Unicode) in its current sad 
position in bidi support. The problems are deep-rooted and need a real 
effort, and the will of high-level management, to solve.

- Hooman Mehr

___
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing


Re: Mirroring in Unicode

2004-06-12 Thread Behnam
Unless I missed something on the list, that would be me providing 
alternatives to Apple's standard keyboards. But they are not "fixes" of 
existing standards. In fact, they are not standard at all! But you are 
right. This is a minor issue and can be fixed. I can do it for the Mac 
community, but I would rather ask Apple to do it in its own release.
My concern has more to do with the different approaches to dealing with 
mirroring characters.
The point being, this doesn't seem to be the way mirroring characters 
are mapped on MS keyboards. And most web pages are typed on MS 
keyboards. Am I on the right track?

Behnam


Re: Mirroring in Unicode

2004-06-12 Thread Hooman Mehr
Hi,
I checked it and can confirm that Apple's ISIRI 2901 keyboard has a bug 
in this regard. The Persian opening parenthesis in ISIRI 2901 is 
located on shift-0 and the closing parenthesis on shift-9, but Apple's 
implementation has them reversed. This is a minor issue. The keyboard 
file is an XML file that can be easily edited with system administrator 
privileges. I think someone already posted information on a fixed and 
enhanced Persian Mac OS X keyboard on the list.
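
The swap can be pictured with two small lookup tables. This is only a sketch: the key names and the dictionaries are hypothetical stand-ins, "(" is assumed to denote the logical opening parenthesis (U+0028), and the real XML keyboard file format is not shown in this thread.

```python
# Hypothetical tables; the real layout lives in an XML keyboard file whose
# format is not reproduced here. "(" stands for the logical opening
# parenthesis (U+0028) and ")" for the closing one (U+0029).
ISIRI_2901 = {"shift-0": "(", "shift-9": ")"}   # as described above
APPLE_IMPL = {"shift-0": ")", "shift-9": "("}   # reported as reversed


def mismatches(standard, implementation):
    """Keys whose output differs from what the standard prescribes."""
    return sorted(k for k in standard if implementation.get(k) != standard[k])


def patched(standard, implementation):
    """Copy of the implementation with the mismatching keys corrected."""
    fixed = dict(implementation)
    for key in mismatches(standard, implementation):
        fixed[key] = standard[key]
    return fixed


if __name__ == "__main__":
    print(mismatches(ISIRI_2901, APPLE_IMPL))
    print(patched(ISIRI_2901, APPLE_IMPL))
```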

- Hooman Mehr



Re: Mirroring in Unicode

2004-06-12 Thread Behnam
On 12-Jun-04, at 8:50 AM, Hooman Mehr wrote:
> On the other hand, I suspect you have font-related issues. Read 
> below...
>
> > This whole thing means that on Mac platform we will see the wrong 
> > parenthesis on Persian web-pages forever!
>
> Part of the issue you are experiencing could be related to fonts. 
> Persian/Arabic Apple fonts need a suitable character property table to 
> identify mirrored glyphs and behave correctly. Please compare the 
> behavior of the standard system font, Geeza Pro, with the fonts you 
> are using. If they are different, it is because of a missing or 
> improperly formed 'prop' table in the font 
> (http://developer.apple.com/fonts/TTRefMan/RM06/Chap6prop.html). If 
> this is the case, let me know so I can see how to help fix them.

I do all my tests with Geeza Pro, and the ISIRI keyboard does produce 
the opposite of the intended parenthesis with Geeza Pro. The Apple 
Persian keyboard produces the intended one because, as I said, it is 
mapped in the opposite way.
My other fonts behave similarly which, I suppose, is good news!

Behnam
P.S. I'm very interested in presenting this discussion to an Apple 
developer, and I'm working on it.



OT: GNOME/GNU (was Re: Mirroring in Unicode)

2004-06-12 Thread Roozbeh Pournader
> our target system (GNOME/GNU/Linux)

GNOME is a GNU project, of course.

roozbeh




Re: Mirroring in Unicode

2004-06-12 Thread Hooman Mehr
On Jun 12, 2004, at 4:14 PM, Behnam wrote:
> I had discussion with an Apple developer on this subject. She insisted 
> that this is the way Unicode wants the mirroring characters to behave 
> and that Apple has no intention to change its implementation of them.

There has been a misunderstanding in your conversation, and in a sense 
both of you are right. As I develop this topic further you'll understand 
it better. I hope she will read my posts (if she has any influence at 
Apple) so that something gets fixed on Apple's side as well.

What she needs to realize, though (along with most other developers), 
is that Unicode does not have to dictate the user interface of text 
input and editing. The user interface of text editing can be vastly 
improved if we properly design a GUI-optimized model that hides the 
true underlying Unicode bidi semantics in favor of easier and more 
user-friendly semantics while maintaining 100% Unicode compatibility.

On the other hand, I suspect you have font-related issues. Read below...

> This whole thing means that on Mac platform we will see the wrong 
> parenthesis on Persian web-pages forever!

Part of the issue you are experiencing could be related to fonts. 
Persian/Arabic Apple fonts need a suitable character property table to 
identify mirrored glyphs and behave correctly. Please compare the 
behavior of the standard system font, Geeza Pro, with the fonts you are 
using. If they are different, it is because of a missing or improperly 
formed 'prop' table in the font 
(http://developer.apple.com/fonts/TTRefMan/RM06/Chap6prop.html). If this 
is the case, let me know so I can see how to help fix them.
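
For comparison, the character-level side of this can be inspected from Python. Note the assumption: the standard `unicodedata` module reports Unicode's Bidi_Mirrored character property, which is related to but not the same thing as the per-font 'prop' table discussed above, so this sketch only shows which characters a layout engine would be expected to mirror.

```python
import unicodedata


def is_mirrored(ch):
    """True if Unicode marks the character as Bidi_Mirrored."""
    return unicodedata.mirrored(ch) == 1


if __name__ == "__main__":
    for ch in "()[]<>\u00ab\u00bba":
        # bidirectional() gives the bidi class, e.g. "ON" for parentheses.
        print(repr(ch), unicodedata.name(ch),
              unicodedata.bidirectional(ch), is_mirrored(ch))
```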

> I guess that along the effort in finding a proper solution for 
> handling of mirroring characters, there has to be an effort to remove 
> this useless mirroring effect in Unicode altogether.

Don't even think about that. At the text stream level, using logical 
opening and closing parentheses instead of visual left and right 
parentheses is actually very helpful in keeping the logical text 
processing model simple and elegant. Also, too many things already 
depend on it. We need to address this issue in the text input/editing 
services of the operating system without touching Unicode. As I 
mentioned, Unicode is not at fault here. The problem is the current 
assumption that the Unicode model necessarily applies to the user 
interface.

- Hooman Mehr


Re: Mirroring in Unicode

2004-06-12 Thread Behnam
On 12-Jun-04, at 5:35 AM, Hooman Mehr wrote:
> - The user-friendly solution involves somewhat moving away from 
> abstract concepts and embracing concrete objects. Let's delve deeper: 
> What do you have on your keyboard that identifies a parenthesis? You 
> have just a physical mark, a concrete object for each one. They do not 
> unambiguously refer to either opening or closing parenthesis. Their 
> meaning depends on the current *mode*. This means that Unicode 
> results in a modal situation without adequate feedback, which I hope 
> everybody agrees is undesirable in most circumstances.

Compared to the Microsoft implementation, the Apple Macintosh handles 
mirroring Unicode characters differently. The RTL keyboard layouts of 
the Macintosh, including the Persian keyboard, actually place the 
opposite shape of parenthesis, bracket, etc. on the keyboard in order 
to produce the intended shape in RTL mode. This is indeed very confusing.
When Apple added Persian ISIRI 2901 to its latest OS, it implemented 
the layout exactly as specified, since it is an ISIRI standard. As a 
result, the parentheses on this keyboard produce the opposite of the 
intended shape in RTL mode.
I had a discussion with an Apple developer on this subject. She insisted 
that this is the way Unicode wants the mirroring characters to behave 
and that Apple has no intention of changing its implementation of them.
This whole thing means that on the Mac platform we will see the wrong 
parentheses on Persian web pages forever!

I guess that alongside the effort to find a proper solution for 
handling mirroring characters, there has to be an effort to remove this 
useless mirroring effect from Unicode altogether.
I know of some Jewish folks who are not too happy about this either!

Behnam


Re: Mirroring in Unicode

2004-06-12 Thread Hooman Mehr
Hi Behdad,
I didn't originally notice this part of your post. My apologies.
KDE's example is a bad realization of a good idea, which causes the idea 
to be discredited. I have an implementation that has been working for 
years. [1]
My implementation looks more like patching a user-hostile assumption in 
Unicode's design [2], but it works flawlessly. KDE's example does not 
prove the idea wrong; its implementation is flawed.

On the other hand, once you find an implementation that really works, 
you would never look back. I will share my solution in the Persian GUI 
spec document and for better or worse it may become the standard 
behavior. Now that you brought this up, I feel I am obligated to 
participate actively in its proper implementation to save it from 
ending up like KDE's.

Let me just give some hints about what goes wrong when you try to stay 
true to Unicode when dealing with text input/edit user interface:

- Most average users have trouble handling and using abstract concepts.
- Unicode talks a lot about logical things and abstractions: opening 
and closing parentheses are concepts, while "(" and ")" are concrete 
visual objects. In bi-di text the same closing parenthesis concept may 
sometimes result in "(" and sometimes in ")", two different objects in 
the physical world.
- The user-friendly solution involves somewhat moving away from 
abstract concepts and embracing concrete objects. Let's delve deeper: 
What do you have on your keyboard that identifies a parenthesis? You 
have just a physical mark, a concrete object for each one. They do not 
unambiguously refer to either the opening or the closing parenthesis. 
Their meaning depends on the current *mode*. This means that Unicode 
results in a modal situation without adequate feedback, which I hope 
everybody agrees is undesirable in most circumstances.

You can see that if we want to make bi-di computing more user-friendly, 
we need to architect a mode-less, WYSIWYG user interface for bidi text 
input/edit. To achieve that, we have no choice but to go against some 
Unicode principles and replace some abstract concepts with concrete 
ones in the context of the user interface. This does not mean that we 
have to change or violate Unicode; it means that we need to do more 
work in the text input/edit engine, besides blindly relying on FriBiDi, 
to create a clean Unicode text stream in the back end.
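
One possible reading of "replace abstract concepts with concrete ones" is sketched below: the key cap names a visual shape, and the input layer picks whichever logical character will display as that shape in the current run. This is an assumption about what such a layer could look like, not the implementation referred to above, and the function names are hypothetical.

```python
# Hypothetical input-layer sketch: keys name *visual* shapes, storage stays
# logical. Only parentheses and brackets are handled here.
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "["}


def logical_char_for_key(visual_shape, typing_in_rtl_run):
    """Character to store so that the drawn glyph matches the key cap."""
    if typing_in_rtl_run:
        # In an RTL run the renderer mirrors, so store the counterpart.
        return MIRROR.get(visual_shape, visual_shape)
    return visual_shape


def displayed_glyph(logical_char, in_rtl_run):
    """What a mirroring-aware renderer would draw for a stored character."""
    return MIRROR.get(logical_char, logical_char) if in_rtl_run else logical_char


if __name__ == "__main__":
    for rtl in (False, True):
        for shape in "()":
            stored = logical_char_for_key(shape, rtl)
            print(rtl, shape, repr(stored), displayed_glyph(stored, rtl))
```

The stored stream remains ordinary logical Unicode; only the mapping from key to character depends on direction, and the user always sees the shape printed on the key.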

Please note that this does not mean that Unicode is bad or wrong, but 
it is not designed to be optimal for interactive text input/edit. This 
also does not mean that optimal text input/edit is impossible with 
Unicode as the back-end text stream/storage model.

- Hooman Mehr
[1] I admit that it is working in a controlled environment and is not 
stress tested. Also, post-processing of the text stream can wreck it 
if the post-processing does not observe some rules, something that is 
very hard to enforce on database engines that convert Unicode to some 
other (usually 8-bit) internal encoding and later convert it back to 
Unicode.

[2] Unicode uses some good principles to create a logically clean text 
stream while reducing duplicated characters. The actual implementation 
does not always stay true to the principles which makes the actual 
Unicode (as it exists today) far uglier than it could have been. The 
bad news is that some of those principles adversely affect bi-di text 
in a fundamental way. Unicode has been struggling for years to refine 
its bi-di handling to the point of today's maturity and Behdad, you 
have been a great contributor with your FriBiDi and other efforts. But 
the fact is, those principles are not a natural fit for bi-di text. We 
can easily see this. Look at the mirrored glyph issue for example.



Re: Mirroring in Unicode

2004-06-12 Thread Ordak D. Coward
Hi Behdad,

On Fri, 11 Jun 2004 05:34:42 -0400, Behdad Esfahbod
<[EMAIL PROTECTED]> wrote:
> 
> Yes this has been the rule for a few years, but everyone is so
> scared about auto-inserting marks and later dealing with them,
> without cluttering the text much.  One such implementation is
> KDE's parentheses fixing idea based on keyboard layout which is
> considered quite a failure (read on Arabeyes wiki page for Qt
> bugs).

I finally figured out that if I insert either an RLE or an LRE
character right before each open parenthesis and a PDF character right
after each close parenthesis, then all parentheses are matched and
their nesting level is preserved as well. Is this something guaranteed,
or is it just that I could not find a bad example where this breaks?
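
The insertion scheme just described can be written down in a few lines. This is a minimal sketch of the transformation only; it makes no claim about whether the resulting display is guaranteed to match in every case, which is exactly the open question. The function names are illustrative.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def wrap_parentheses(text, embed=RLE):
    """Insert `embed` before each "(" and PDF after each ")"."""
    out = []
    for ch in text:
        if ch == "(":
            out.append(embed + ch)
        elif ch == ")":
            out.append(ch + PDF)
        else:
            out.append(ch)
    return "".join(out)


def strip_marks(text):
    """Remove the embedding controls again, recovering the typed text."""
    return "".join(c for c in text if c not in (RLE, LRE, PDF))


if __name__ == "__main__":
    sample = "a(b(c)d)e"
    wrapped = wrap_parentheses(sample)
    print([hex(ord(c)) for c in wrapped if c in (RLE, LRE, PDF)])
    print(strip_marks(wrapped) == sample)
```

Note that an unmatched ")" in the input yields a PDF with no embedding to close, so a real implementation would have to decide how to treat unbalanced text.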

Also, is this the KDE parenthesis-fixing idea you are referring to above?

--
ODC


Re: Mirroring in Unicode

2004-06-11 Thread Hooman Mehr
Hi,

It is getting more interesting for me, because this is also one of the issues addressed by the Persian GUI spec document I am writing. Unfortunately, many people (including Microsoft) abuse Unicode when writing programs. They don't properly understand and observe bi-di semantics, and the choices they make in places where Unicode is either silent or obscure result in poor implementations. So the problem is that Unicode specs and reports are not a substitute for a good understanding of bi-di semantics; they just regularize some aspects of it.

I also criticize the Unicode organization for not being thorough enough in pointing out caveats in this regard and in giving the big picture correctly. I know what I should do to get correct results because I have already discovered it independently. Unicode is just one way of putting some of that knowledge on paper and specifying certain methods to deal with certain issues, without covering all issues. I would never have been able to think of a correct bi-di implementation solely from Unicode documents.

So, what Unicode specifies is not wrong, but it is certainly not enough. Since there isn't a good documented source specifying these kinds of nuances in many aspects of handling bi-di text and Arabic script, we came up with the idea of this Persian GUI spec to clarify these issues and provide guidelines to help developers implement correct Persian software (which includes correct bi-di behavior as a subset, along with a lot of other things).

If you are really interested in tackling these issues, contact me off list so that we can collaborate further on this. I don't see the list as a suitable medium for the discussion, because our discussion on this topic will get highly technical and interactive and we will need some diagrams to illustrate it better. It would confuse many list members who are not seasoned designers/developers.

Just rest assured: the solution is there, clean and conclusive. Developers just need to get it. They can't easily get it (and it may take them years to get it, as it did me) because of the lack of good documentation. The Persian GUI spec is an effort in the direction of clarifying the solutions to these issues. So, I repeat: I need community support and help to produce something really helpful. Please take note that such an effort is in progress and that it is related to a lot of these things, but it is still in the early stages of being put on paper. Everything is still mostly in my head; help me pull it out onto paper in an understandable way.

- Hooman Mehr


Re: Mirroring in Unicode

2004-06-11 Thread Behdad Esfahbod
On Thu, 10 Jun 2004, Ordak D. Coward wrote:

> Hi Behdad,
>
> I just finished finding the relevant part (Rule L4 of UAX #9) of
> Unicode specs referring to mirroring. I believe the problem I am
> complaining about is still a problem and is due to bad Unicode
> specifications. I do not know how Unicode got mirroring into their
> standard, and their rationale behind this. However, in my opinion, the
> correct semantics is that if the input text has matched open and end
> parentheses then the visual output should also have matched left and
> right parentheses regardless of the paragraph mode. Obviously the
> Unicode specs break this semantics when the text is "RTLTEXT(RTLTEXT)"
> and the paragraph is in LTR mode (or vice versa).

I'm sure you agree that matching parentheses is evil in plain
text.  It breaks all kinds of things, like statelessness,
context-freeness, locality, etc.  It's plain text after all.

And assuming no matching should be considered, that's almost the
best you can get.  Note that in your example the problem is with
your paragraph direction, but if you change the spec to work
around it you will definitely create worse problems.  In this
special case, you need the second parenthesis to behave that way
so that the more natural "ltrtext(RTLTEXT)" case works.
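
The two cases can be traced with a heavily simplified model of the algorithm. The sketch below is not a conforming UAX #9 implementation: it assumes uppercase letters stand for RTL letters and lowercase for LTR (the convention used in this thread), treats everything else as neutral, ignores digits and explicit embeddings, and handles a single line only. Neutrals are resolved in the spirit of rules N1/N2, mirroring follows L4, and reordering follows L2.

```python
# Simplified bidi model. Assumptions: uppercase = RTL letter, lowercase =
# LTR letter, anything else is neutral; no digits, no explicit embeddings.
MIRROR = {"(": ")", ")": "("}


def resolve_levels(text, para_level):
    """Assign an embedding level to every character of one paragraph."""
    base = "R" if para_level % 2 else "L"
    types = [("R" if c.isupper() else "L") if c.isalpha() else None
             for c in text]
    i, n = 0, len(text)
    while i < n:
        if types[i] is not None:
            i += 1
            continue
        j = i
        while j < n and types[j] is None:
            j += 1
        before = types[i - 1] if i > 0 else base
        after = types[j] if j < n else base
        # Neutrals between two equal strong types take that type;
        # otherwise they take the paragraph direction.
        types[i:j] = [before if before == after else base] * (j - i)
        i = j
    if para_level % 2 == 0:
        return [0 if t == "L" else 1 for t in types]
    return [1 if t == "R" else 2 for t in types]


def display(text, para_level):
    """Visual string, left to right, for one line of text."""
    levels = resolve_levels(text, para_level)
    # L4: a mirrored character at an odd level shows its mirror glyph.
    chars = [MIRROR.get(c, c) if lv % 2 else c for c, lv in zip(text, levels)]
    odd = [lv for lv in levels if lv % 2]
    if odd:
        # L2: from the highest level down to the lowest odd level,
        # reverse each maximal run of characters at that level or higher.
        for level in range(max(levels), min(odd) - 1, -1):
            i = 0
            while i < len(chars):
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                if j > i:
                    chars[i:j] = reversed(chars[i:j])
                    levels[i:j] = reversed(levels[i:j])
                    i = j
                else:
                    i += 1
    return "".join(chars)


if __name__ == "__main__":
    print(display("RTLTEXT(RTLTEXT)", 0))  # TXETLTR)TXETLTR)
    print(display("ltrtext(RTLTEXT)", 0))  # ltrtext(TXETLTR)
    print(display("RTLTEXT(RTLTEXT)", 1))  # (TXETLTR)TXETLTR
```

In the LTR paragraph the final parenthesis has RTL text on one side and the end of the paragraph on the other, so it takes the paragraph direction. That is what makes "ltrtext(RTLTEXT)" come out matched and "RTLTEXT(RTLTEXT)" come out mismatched until the paragraph direction is set to RTL.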

> While we are talking about the semantics behind BIDI algorithm, I was
> wondering if BIDI algorithm assigns the same direction to characters
> regardless of where a line is broken. Which apparently it does not! For
> example, type in "This a very very long line ÙØØØÛ +-* ÛØ ØØØÛ *-+
> this is the question!" in a multiline input area. Notice the visual
> order of *-+ is the same in both occurrences. Now, insert spaces in the
> beginning until you get both of the *-+ on the second line. Now
> observe the difference in ordering of the *-+. I again believe this is
> a design defect of BIDI specifications. Whereas, it only looks at one
> line at a time, and does not allow (unless I am mistaken) for state
> information to be propagated across lines when breaking lines. A
> better design would have allowed (and required) to pass necessary
> state information from one line to another such that the visual
> ordering would have stayed the same regardless of where the lines are
> broken.

No, you are wrong here.  Bidi does exactly what you expect.  It
computes these things called "embedding levels" per paragraph,
then reorders the text in each line based on the computed embedding
levels.
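
A minimal sketch of that two-step structure follows. The levels are assigned by hand for an LTR paragraph (uppercase standing in for RTL letters) rather than computed, and only rule L2 plus the trailing-whitespace part of rule L1 are modeled; the sample text and function names are illustrative.

```python
# Levels are resolved once for the whole paragraph; every line is then
# reordered on its own from those levels. The levels here are written out
# by hand for an LTR paragraph; a real engine would compute them.
TEXT = "abc DEF +-* GHI *-+ xyz"
LEVELS = [0] * 4 + [1] * 11 + [0] * 8
PARA_LEVEL = 0


def reorder_line(chars, levels, para_level):
    """Visual order of one line, given per-character paragraph levels."""
    chars, levels = list(chars), list(levels)
    # Part of rule L1: trailing whitespace gets the paragraph level.
    k = len(chars)
    while k > 0 and chars[k - 1] == " ":
        k -= 1
        levels[k] = para_level
    odd = [lv for lv in levels if lv % 2]
    if odd:
        # Rule L2: reverse runs from the highest level down to the
        # lowest odd level.
        for level in range(max(levels), min(odd) - 1, -1):
            i = 0
            while i < len(chars):
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                if j > i:
                    chars[i:j] = reversed(chars[i:j])
                    levels[i:j] = reversed(levels[i:j])
                    i = j
                else:
                    i += 1
    return "".join(chars)


def layout(break_points):
    """Break the paragraph at the given indexes and reorder each line."""
    lines, start = [], 0
    for end in list(break_points) + [len(TEXT)]:
        lines.append(reorder_line(TEXT[start:end], LEVELS[start:end],
                                  PARA_LEVEL))
        start = end
    return lines


if __name__ == "__main__":
    for breaks in ([], [8], [12], [16]):
        print(breaks, layout(breaks))
```

Because the levels never depend on the break position, the typed "+-*" inside the RTL run and the typed "*-+" after it both display as "*-+" wherever the line is broken.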

Note that you are probably using MS products that hardly conform
to the Unicode standard.  If you write down the output you get
that you don't expect/like, I can discuss why it's not that bad.
I tried your example in gedit, which uses FriBidi 0.10.4 as
the bidi engine, and it works fine.  The "*-+" always looks the
same, no matter where the line breaks.

> Of course, a typical reply could be that I need to insert some control
> characters to achieve the desired ordering. Then, my rebuttal is that
> if that is the case, why not make the control characters for such
> cases mandatory?

Huh?  They are mandatory:  if you want your specific ordering,
you have to insert them.

> Anyway, I have no hope of achieving any positive contribution at
> Unicode consortium (or other big standard groups like that). So, I am
> going to turn this into something more fruitful. That is, I like to
> put the burden of correcting these flaws at the UI. Or:

In fact the Unicode Consortium is very open to suggestions and
corrections, but as the bidi expert I tell you, that's almost the
best you can get in this logical->visual model.

> "The UI should add control characters at proper places to the user
> text such that the text renders semantically correct regardless of
> BIDI inconsistencies"

Yes, this has been the rule for a few years, but everyone is so
scared of auto-inserting marks and later dealing with them
without cluttering the text much.  One such implementation is
KDE's parentheses-fixing idea based on keyboard layout, which is
considered quite a failure (read the Arabeyes wiki page on Qt
bugs).

> I think satisfying the above requirement is not trivial, but
> challenging enough to keep a few good minds busy thinking about it.

Sure, but the problem is that there are many, many other easier things
that need to be done before we get there.  For example, we're
right now trying to fix our target system (GNOME/GNU/Linux)
to produce and parse Persian digits.  I mention this example
because it is one of those that is not solved in MS systems
either.

If you are interested in the bidi algorithm, I recommend
subscribing to the GNU FriBidi mailing list available from:

  http://freedesktop.org/Software/FriBidi

Cheers,
--behdad
  behdad.org



Re: Mirroring in Unicode

2004-06-10 Thread Ordak D. Coward
Hi Behdad, 

I just finished finding the relevant part (Rule L4 of UAX #9) of the
Unicode specs referring to mirroring. I believe the problem I am
complaining about is still a problem and is due to bad Unicode
specifications. I do not know how Unicode got mirroring into their
standard, or their rationale behind it. However, in my opinion, the
correct semantics is that if the input text has matched open and end
parentheses then the visual output should also have matched left and
right parentheses regardless of the paragraph mode. Obviously the
Unicode specs break this semantics when the text is "RTLTEXT(RTLTEXT)"
and the paragraph is in LTR mode (or vice versa).

While we are talking about the semantics behind the BIDI algorithm, I was
wondering if the BIDI algorithm assigns the same direction to characters
regardless of where a line is broken. Apparently it does not! For
example, type in "This a very very long line ÙØØØÛ +-* ÛØ ØØØÛ *-+
this is the question!" in a multiline input area. Notice the visual
order of *-+ is the same in both occurrences. Now, insert spaces at the
beginning until you get both of the *-+ on the second line. Now
observe the difference in ordering of the *-+. I again believe this is
a design defect of the BIDI specifications: it only looks at one
line at a time, and does not allow (unless I am mistaken) for state
information to be propagated across lines when breaking lines. A
better design would have allowed (and required) passing the necessary
state information from one line to another such that the visual
ordering would have stayed the same regardless of where the lines are
broken.

Of course, a typical reply could be that I need to insert some control
characters to achieve the desired ordering. Then, my rebuttal is that
if that is the case, why not make the control characters for such
cases mandatory?

Anyway, I have no hope of achieving any positive contribution at
Unicode consortium (or other big standard groups like that). So, I am
going to turn this into something more fruitful. That is, I like to
put the burden of correcting these flaws at the UI. Or:

"The UI should add control characters at proper places to the user
text such that the text renders semantically correct regardless of
BIDI inconsistencies"

I think satisfying the above requirement is not trivial, but
challenging enough to keep a few good minds busy thinking about it.





Re: Mirroring in Unicode

2004-06-10 Thread Behdad Esfahbod

Hi Ordak,

This is not a problem in the Unicode Bidi Algorithm, not even in
Microsoft's implementation of the algorithm.  And mirroring seems
to be working quite well.  The problem is in the higher-level
protocols of your system, which simply do not recognize
right-to-left paragraphs.

So your "paragraph direction" is left-to-right, and that's why
you see it like that.  Microsoft systems have no way of
auto-detecting paragraph directions.  In Notepad you can set the
whole document direction to rtl or ltr.  In MS Word you can set
the direction for individual paragraphs.
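
For reference, the default heuristic in UAX #9 (rules P2 and P3) takes the paragraph direction from the first strong character. The sketch below implements only that first-strong rule using Python's standard `unicodedata` module; it ignores directional isolates, and it is not the GNOME patch mentioned in the next paragraph, whose logic is not described in this thread.

```python
import unicodedata


def paragraph_level(text, default=0):
    """0 for LTR, 1 for RTL, decided by the first strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    # No strong character at all: fall back to the caller's default.
    return default


if __name__ == "__main__":
    samples = ["farsi (text)", "\u0641\u0627\u0631\u0633\u06cc (farsi)", "(123)"]
    for s in samples:
        print(ascii(s), paragraph_level(s))
```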

GNOME has recently applied a marvelous patch to autodetect
paragraph directions in the most sophisticated way, so we're just
having fun with our text editors ;-).

behdad



--behdad
  behdad.org



Mirroring in Unicode

2004-06-10 Thread Ordak D. Coward
I noticed that certain mirrored characters appear semantically wrong on
my Windows XP machine. I have no idea if it is a problem of the Unicode
BIDI specs or is due to the Windows XP implementation. I describe the
problem here, hoping that people who know Unicode better can pinpoint
the source of it.

If I type in: "تار (farsi)", that is the sequence T A R SP ( f a r s i )
(capitals stand for RTL text), the result is RAT (farsi)

However, if I type in "تار (فارسی)", that is the sequence T A R SP ( F A R S I ),
the result is ISRAF) RAT)

Obviously the parentheses are wrong in the second example. Now, if
this is a Unicode spec problem, I think they need to fix this. How does
the above text appear on other platforms?
