Hi Richard,

> > Are they okay to be present in visual order (the terminal's explicit
> > mode, what we're discussing now) too?
>
> Where do you define the order for explicit mode?

In explicit mode, the application (Emacs, Vim, whatever) reorders the characters and passes them in visual order (left to right) to the terminal emulator. The terminal emulator preserves this visual order and doesn't reshuffle anything.

How should ZW(N)J be handled in visual order? What's the desired way? Is it specified anywhere? As far as I know, they specify the relation between two adjacent characters of the logical order, which might not even end up adjacent in the visual order. Should they always "stick" to the preceding character, for example?

The Unicode BiDi algorithm doesn't seem to distinguish between base letters and combining accents for reordering. So, given a base letter + a combining accent in an RTL text, the BiDi algorithm gives a visual LTR order with the combining accent first (on the left), followed by the base letter. This order is not okay for terminal emulators. Combining accents have to be reordered in the output of the Unicode BiDi algorithm so that they come after the base letter even in the visual LTR order. This is e.g. what FriBidi does by default, due to the REORDER_NSM flag. Presumably it doesn't just reorder non-spacing combining accents, but also ZW(N)J and similar symbols, which already smells pretty problematic, doesn't it? Or is this what you need there, too?

> There may be complications in ensuring that
> <joiner control><letter><non-spacing marks><joiner control> gets stored
> as the content of a single cell.

How should the terminal emulator know which cell (the previous or the subsequent one) these two <joiner control>s belong to?

> > Anyway, ZWJ/ZWNJ aren't sufficient to handle the cases I outlined
> > above.
>
> Example, please.

Cropped strings, cropped strings that are adjacent to each other, and faulty shaping could kick in there.

Two fields on the UI. One in columns 36-40 with cyan background, aiming to show ABCDEF, but due to limited room it can only show ABCDE (let's say it's scrolled horizontally this way).
Another in columns 41-45 with yellow background, aiming to show UVWXYZ, but due to limited space only VWXYZ is shown (it's scrolled horizontally like this). What the terminal emulator sees is a continuous text of ABCDEVWXYZ. What the application wants is to get E shaped as if there was an F on its right, and V shaped as if there was a U on its left.

Once you address this problem, I'm not sure ZW(N)J are still required/desirable, rather than applying this more generic solution there as well.

> At present, VTE positions LTR Indic preceding spacing combining marks
> after the consonant. I though your draft scheme corrected this very
> local bidi issue, which is so local that the bidi algorithm ignores it.

Indic spacing combining marks are handled incorrectly by VTE and are being addressed in bug 584160 which I've already linked. I don't consider this particular issue BiDi at all. It's something totally different. The spacing accent can be to the right; somewhat on top and somewhat to the right; on top; somewhat to the left and somewhat on top; or fully on the left. It's not binary left or right. Proper rendering should be done by the font, and not at all by the BiDi of the terminal. The terminal is unaware of how much the base glyph is shifted to the right and the accent to its left. All that the terminal needs to do (and VTE gets it wrong now) is to pass these two to whichever font rendering engine is in use, in one single step.

> So ព្រះ <U+1796 KHMER LETTER PO, U+17D2 KHMER SIGN COENG, U+179A KHMER
> LETTER RO, U+17C8 KHMER SIGN > _preah_ 'prefix denoting
> repect for gods, kings, etc.' will be three cells <្រ,ព,ៈ> = <(COENG,
> RA), PO, YUUKALEAPINTU> and cause no confusion? Or will the cells be
> <RA, (PO, COENG), YUUKALEAPINTU>?

First it's a base character followed by a non-spacing mark. As in most terminal emulators (and now we're absolutely not talking about my BiDi proposal), they are stored in the same cell. The first cell contains (PO, COENG).
The next two are a base character followed by a spacing mark. In VTE 584160 I outline two possible approaches, but the one I'm in favor of is that the row's second cell contains RO and the third cell contains YUUKALEAPINTU, and the two are combined together properly when the logical contents get displayed. Another possibility which I'm pondering is whether the emulation layer should combine them, that is, have the second cell store the "first half of (RO, YUUKA)" and the third cell store the "second half of (RO, YUUKA)".

Does this make any sense? If not, could you please explain what the desired behavior is, and why? Please keep in mind that I know nothing about Khmer in particular.

Anyway, here we're talking about something that's totally independent from my BiDi work. It's also something that should be standardized across terminals, sure, but maybe not right now :)

cheers,
egmont
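
P.S. To make the cell layout I favor a bit more concrete, here is a rough Python sketch. It is only an illustration and not VTE's actual code: the function name split_into_cells is made up, and I'm assuming that Unicode general categories are a good enough test (Mn/Me marks join the preceding cell, everything else, including Mc spacing marks, gets a cell of its own). Real terminals also have to deal with character widths, ZW(N)J and so on, which this ignores.

```python
import unicodedata

def split_into_cells(text):
    """Hypothetical helper: distribute code points over terminal cells.

    Assumption: non-spacing and enclosing marks (categories Mn, Me) are
    stored in the cell of the preceding character; any other character,
    including spacing marks (Mc), starts a new cell.
    """
    cells = []
    for ch in text:
        if cells and unicodedata.category(ch) in ("Mn", "Me"):
            cells[-1] += ch      # attach the mark to the previous cell
        else:
            cells.append(ch)     # base letter or spacing mark: new cell
    return cells

# PO, COENG, RO, YUUKALEAPINTU (U+17C8, as named in the discussion above)
word = "\u1796\u17D2\u179A\u17C8"
for cell in split_into_cells(word):
    print([unicodedata.name(c) for c in cell])
```

With this rule the first cell holds (PO, COENG), the second RO and the third YUUKALEAPINTU. Combining the second and third cells for display would then be the job of the rendering step, which the sketch doesn't cover.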