This letter describes three major technical problems with the current Unicode bidirectional algorithm as described in UAX #9, version 3.20. Problems 1 and 3 have security implications. Other problems with the whole Unicode bidirectional encoding approach, and their solutions, are discussed in the recently updated Bytext FAQ and documentation (www.bytext.org).
(1) Line-width-dependent mangling, general case: Step L2 of UAX #9 indicates that a line that resolves into a sequence of characters with a homogeneous embedding level will ALWAYS be displayed right to left, regardless of what that embedding level is. So, for example, lines with the following resolved embedding levels (after step L1) display as follows:

    2222222222222222222222222  will display right to left
    3333333333333333333333333  will display right to left
    4444444444444444444444444  will display right to left
    etc.

Likewise:

    in 3333333333333333333333331, the 3's will display left to right
    in 5555555555555555555555551, the 5's will display left to right
    etc.

This directly contradicts the writer's intentions. It also means that different Unicode-compliant applications will display the same characters in a different order, depending on the available line width. Examples of why this is bad are given in question 12 of the Bytext FAQ (www.bytext.org/faq#12).

This can be fixed by rewording step L2 so that a reversal happens at the highest embedding level and then at each successively lower embedding level, regardless of whether that level is represented by a character on the line, until embedding level 1 is reached (or, as an optimization, until the first odd embedding level equal to or lower than the lowest embedding level represented by a character on the line).

(2) Line-width-dependent mangling, spelling conventions for quotes: What is the purpose of step X10 if not to allow something like LEFT DOUBLE QUOTATION MARK to be used as if it were an OPEN DOUBLE QUOTATION MARK? One simply puts an embedding inside a quotation, such as “<RLE>quotation<PDF>”. The problem is that this only works if the quotation begins and ends on the same line. Examples of how the text is mangled when the quotation spans multiple lines are given in question 13 of the Bytext FAQ (www.bytext.org/faq#13).
This cannot really be fixed with minor changes, other than by notifying users that the whole left=open, right=closed idea may not work as intended when default automatic line breaking is used. Users should not rely on any spelling convention that does not bypass the effects of step X10 and mirroring; how this can be done is described in the Bytext documentation.

(3) Mirroring ambiguities: What if eor = sor?

    text:                          R   RLO whatever PDF   N   LRO whatever PDF
    embedding level at step X9:    1         3  3         1         2  2
    directional type at step X10:  R         R  R         ?         L  L

The above example should be viewed in a monospace font. The original is at www.bytext.org/faq#12.

Step X10 is ambiguous as to whether the “N” should be L or R. This means that if N has the mirrored property, some implementations might display the mirrored form, others the non-mirrored form, and still others might report an error. This can be fixed by deciding on a single form for such cases. Also, the statement “for two adjacent runs, the eor of the first run is the same as the sor of the second” needs to be removed, because it is not true.

Bernard

---
Bernard Rafael Miller, email: [EMAIL PROTECTED]
Format enabling simplified 8 bit regexes of UCS characters: www.bytext.org
---
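P.S. To make problem (1) concrete, here is a minimal Python sketch of the step L2 reversal under two readings. It assumes the embedding levels have already been resolved (through step L1) and works only on logical character indices; all function names are hypothetical, and this is an illustration rather than a conforming UAX #9 implementation. reorder_levels_present() is an assumed model of the reading objected to in (1), in which reversals happen only at the levels actually represented on the line. reorder_all_levels() follows the proposed rewording, reversing at every level from the highest down to 1.

```python
from typing import List, Sequence, Set


def reverse_runs_at(order: List[int], levels: Sequence[int], threshold: int) -> List[int]:
    """Reverse every maximal run (in the current visual order) of characters
    whose resolved embedding level is >= threshold."""
    result: List[int] = []
    run: List[int] = []
    for idx in order:
        if levels[idx] >= threshold:
            run.append(idx)
        else:
            result.extend(reversed(run))
            run = []
            result.append(idx)
    result.extend(reversed(run))
    return result


def reorder_levels_present(levels: Sequence[int]) -> List[int]:
    """Assumed model of the criticized reading: reverse only at the nonzero
    levels that actually occur on the line, highest first."""
    order = list(range(len(levels)))
    for level in sorted({lv for lv in levels if lv > 0}, reverse=True):
        order = reverse_runs_at(order, levels, level)
    return order


def reorder_all_levels(levels: Sequence[int]) -> List[int]:
    """Proposed rewording: reverse at every level from the highest down to 1,
    whether or not that level is represented on the line."""
    order = list(range(len(levels)))
    for level in range(max(levels, default=0), 0, -1):
        order = reverse_runs_at(order, levels, level)
    return order


def direction_of(order: Sequence[int], positions: Set[int]) -> str:
    """Report whether the given logical positions appear LTR or RTL."""
    seq = [idx for idx in order if idx in positions]
    if seq == sorted(seq):
        return "LTR"
    if seq == sorted(seq, reverse=True):
        return "RTL"
    return "mixed"


if __name__ == "__main__":
    examples = [[2] * 6, [3] * 6, [4] * 6, [3] * 5 + [1], [5] * 5 + [1]]
    for levels in examples:
        # Track the run of characters that share the first character's level.
        run = {i for i, lv in enumerate(levels) if lv == levels[0]}
        print("".join(map(str, levels)),
              direction_of(reorder_levels_present(levels), run),
              direction_of(reorder_all_levels(levels), run))
```

Under the first reading every homogeneous line comes out right to left, and the 3's and 5's in the last two examples come out left to right, as described in (1). Under the proposed rewording the even-level lines come out left to right and the odd-level runs right to left, independent of what else happens to share the line.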

