I don't think anything involving 5 levels of embedding or overrides can be considered a big bug.
Jony

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]] On Behalf Of Bernard Miller
> Sent: Wednesday, May 29, 2002 6:57 PM
> To: [EMAIL PROTECTED]
> Subject: 3 big bidi bugs
>
> This letter describes 3 major technical problems with the
> current Unicode bidirectional algorithm as described in UAX
> #9, version 3.20. Problems 1 and 3 have security
> implications. Other problems with the whole Unicode
> bidirectional encoding approach, and their solutions, are
> discussed in the recently updated Bytext FAQ and
> documentation (www.bytext.org).
>
> (1) Line width dependent mangling, general case:
> Step L2 of UAX #9 indicates that a line that resolves into a
> sequence of characters with homogeneous embedding levels will
> ALWAYS be displayed right to left, regardless of what the
> embedding level is.
>
> So, for example, a line with the L1-resolved embedding
> levels of:
>   2222222222222222222222222 will display right to left
>   3333333333333333333333333 will display right to left
>   4444444444444444444444444 will display right to left
>   etc.
>
> Likewise:
>   in 3333333333333333333333331, the 3's will display left to right
>   in 5555555555555555555555551, the 5's will display left to right
>   etc.
>
> It directly contradicts the writer's intentions. It means that
> different Unicode compliant applications will display the
> same characters in a different order (depending on available
> line width). Examples of how this is bad are given in
> question 12 of the Bytext FAQ (www.bytext.org/faq#12). This
> can be fixed by rewording step L2 such that a reversal
> happens from the highest embedding level to each lower
> contiguous embedding level, regardless of whether the embedding
> level is represented by a character on the line, until the
> embedding level of 1 is reached (or, as an optimization,
> until the first odd embedding level equal to or lower than
> the lowest embedding level represented by a character on the line).
>
> (2) Line width dependent mangling, spelling conventions for
> quotes:
> What is the purpose of step X10 if not to allow
> something like LEFT DOUBLE QUOTATION MARK to be used as if it
> was an OPEN DOUBLE QUOTATION MARK? One simply puts an
> embedding inside a quotation, such as “<RLE>quotation<PDF>”.
> The problem with this is that it only works if the quotation
> begins and ends on the same line. Examples of how the text is
> mangled when the quotation spans multiple lines are given in
> question 13 of the Bytext FAQ (www.bytext.org/faq#13). This
> cannot really be fixed with minor changes other than to
> notify users that the whole left=open, right=closed idea may
> not work as such when the default automatic line breaking is
> used. Users should not rely on any spelling conventions that
> do not bypass the effects of step X10 and mirroring; how
> this can be done is described in the Bytext documentation.
>
> (3) Mirroring ambiguities:
> What if eor = sor?
>
> text:                          R   RLO whatever PDF   N   LRO whatever PDF
> embedding level at step X9:    1   3 3                1   2 2
> directional type at step X10:  R   R R                ?   L L
>
> The above example should be in a monospace font. The original
> is at www.bytext.org/faq#12. Step X10 is ambiguous about whether
> the "N" should be L or R. This means that if N has the
> mirrored property, some implementations might display the
> mirrored form, others the non-mirrored form, and others might
> result in an error. This can be fixed by deciding on a single
> form for such cases. Also, the statement "for two adjacent
> runs, the eor of the first run is the same as the sor of the
> second" needs to be removed because it is not true.
>
> Bernard
> ---
> Bernard Rafael Miller, email: [EMAIL PROTECTED]
> Format enabling simplified 8 bit regexes of UCS characters:
> www.bytext.org
> ---
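
For anyone who wants to try the level examples in (1) by hand, here is a rough Python sketch of the reversal step in rule L2. It is only an illustration, not a conforming implementation: the function name reorder_line and the flag only_present_levels are made up for this message, and the levels are assumed to be already resolved for a single line. By default the loop visits every integer level from the highest down to the lowest odd level, which is the reading that later revisions of UAX #9 spell out ("including intermediate levels not actually present in the text"). Setting only_present_levels=True shows one possible alternative reading, in which levels with no character on the line are skipped; that may or may not be exactly the reading the quoted message has in mind.

```python
def reorder_line(chars, levels, only_present_levels=False):
    """Sketch of the rule L2 reversal for one line.

    chars  -- characters of the line in logical order
    levels -- resolved embedding level of each character
    only_present_levels -- hypothetical switch: if True, skip levels
        that no character on the line has (an alternative reading)
    """
    if len(chars) != len(levels):
        raise ValueError("chars and levels must have the same length")

    order = list(chars)
    lv = list(levels)

    odd_levels = [l for l in lv if l % 2 == 1]
    if not odd_levels:
        # No odd level on the line: nothing is reversed.
        return order

    highest, lowest_odd = max(lv), min(odd_levels)
    if only_present_levels:
        steps = sorted({l for l in lv if l >= lowest_odd}, reverse=True)
    else:
        steps = range(highest, lowest_odd - 1, -1)

    for level in steps:
        i = 0
        while i < len(order):
            if lv[i] >= level:
                # Find the end of the contiguous run at this level or higher.
                j = i
                while j < len(order) and lv[j] >= level:
                    j += 1
                # Reverse the run; levels travel with their characters.
                order[i:j] = reversed(order[i:j])
                lv[i:j] = reversed(lv[i:j])
                i = j
            else:
                i += 1
    return order


# Three characters at level 3 followed by one at level 1.
print("".join(reorder_line("xyzq", [3, 3, 3, 1])))
print("".join(reorder_line("xyzq", [3, 3, 3, 1], only_present_levels=True)))
```

Running both calls on the same line makes it easy to see whether the two readings order the level 3 characters differently, which is the kind of case the message above is concerned with.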

