> Is it legitimate to truncate the context to a single line? The BiDi
> algorithm is attempting to interpret unlabelled text as embedded text
> (it's not an arbitrary dance), and in just one line there is no
> indicator of whether the hyphen is part of the LTR text embedded in RTL
> text.

For this discussion, I think yes. See Section 3.4 of UAX #9:

    The following rules describe the logical process of finding the
    correct display order. As opposed to resolution phases, these rules
    act on a per-line basis and are applied after any line wrapping is
    applied to the paragraph.

The main collection of UBA rules applies on a per-paragraph basis, but
you cannot actually reorder the resolved levels until you have
specified the line breaks. Effectively, the hyphenation decision has to
be taken first. And *then* you can reorder the results line-by-line. So
once we have decided that we are breaking “car-/rot”, we can then talk
just about where the “car-” ends up on the single line.

But I agree that there are many conundrums in trying to hyphenate
individual words in mixed-direction bidi text, so I am not surprised
that there would be special typographical conventions which might, as
Asmus suggested, require dropping in LRMs or the like, if you wanted
the visual placement of hyphens to override the basic behavior of the
algorithm.

> However, the very next character is 'r', which tells us that the
> left-to-right run contains the hyphen. I also think the HYPHEN-MINUS
> is the wrong character to consider - the analogy should be with U+2010
> HYPHEN (class ON) rather than with U+2212 MINUS SIGN (class ES), let
> alone the ambiguous HYPHEN-MINUS, for which ES is merely the
> interpretation most likely to work.

Well, sure, but for the purposes of *this* particular discussion, it
makes no difference whatsoever whether we are using U+002D or U+2010,
despite the difference in Bidi_Class, since there is no question of
numerical formatting here.
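
The Bidi_Class values under discussion can be checked with a minimal Python sketch. It assumes that the standard `unicodedata` module reports the same Bidi_Class values as the Unicode Character Database. The helper `resolve_separators` is hypothetical and covers only a fragment of rule W4 (the ES case) plus rule W6 of UAX #9. It is not a full implementation of the weak-type rules, and it is not the reference implementation that produced the trace in this message.

```python
import unicodedata


def bidi_classes(text):
    """Return the Bidi_Class of each character, as reported by unicodedata."""
    return [unicodedata.bidirectional(ch) for ch in text]


def resolve_separators(classes):
    """Hypothetical helper: a fragment of W4 followed by W6.

    W1-W3, W5 and W7 are omitted, so this is only meaningful for text
    without numbers or with simple EN ES EN sequences.
    """
    out = list(classes)
    # W4 (ES case only): a single ES between two EN becomes EN.
    for i in range(1, len(out) - 1):
        if out[i] == "ES" and out[i - 1] == "EN" and out[i + 1] == "EN":
            out[i] = "EN"
    # W6: any separator or terminator still left becomes ON.
    return ["ON" if c in ("ES", "ET", "CS") else c for c in out]


# The single line discussed in this thread: alef bet gimel, space, "car-".
line_hyphen_minus = "\u05D0\u05D1\u05D2 car\u002D"
line_hyphen = "\u05D0\u05D1\u05D2 car\u2010"

if __name__ == "__main__":
    for name, line in (("U+002D", line_hyphen_minus), ("U+2010", line_hyphen)):
        before = bidi_classes(line)
        after = resolve_separators(before)
        print(name, "before W6:", " ".join(before))
        print(name, "after  W6:", " ".join(after))
```

With no European numbers on the line, both variants end up with the same sequence of classes after W6, which is why the choice between U+002D and U+2010 does not matter for this example.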
Rule W6 will convert the bc=ES to bc=ON, and thereafter the processing
is identical:

Trace: Entering br_UBA_ResolveTerminators [W5]
Current State: 11
  Text:        05D0 05D1 05D2 0020 0063 0061 0072 002D
  Bidi_Class:  R    R    R    WS   L    L    L    ES
  Levels:      1    1    1    1    1    1    1    1
  Runs:        <R-----------------------------------R>

Trace: Entering br_UBA_ResolveESCSET [W6]
Current State: 12
  Text:        05D0 05D1 05D2 0020 0063 0061 0072 002D
  Bidi_Class:  R    R    R    WS   L    L    L    ON
  Levels:      1    1    1    1    1    1    1    1
  Runs:        <R-----------------------------------R>

--Ken
_______________________________________________
Unicode mailing list
http://unicode.org/mailman/listinfo/unicode

