On Fri, 1 Feb 2019 15:15:53 +0100 Egmont Koblinger via Unicode <[email protected]> wrote:
> Hi Richard,
>
> On Fri, Feb 1, 2019 at 12:19 AM Richard Wordingham via Unicode
> <[email protected]> wrote:
>
> > Cropped why? If the problem is the truncation of lines, one can
> > simply store the next character.
>
> Yup, truncation of line for example.
>
> I agree that one could "store the next character". We could extend the
> terminal emulation protocol where by some means you can specify that
> column 80 contains a letter X, and even though there's no column 81,
> an app can still tell the terminal emulator that it should imagine
> that column 81 contains the letter Y, and perform shaping accordingly.
>
> This will need to be done not just at the end of the terminal, but at
> any position, and for both directions. Think of e.g. a vertically
> split tmux. You should be able to tell that column 40 contains X which
> should be shaped as if column 41 contained Y, and column 41 contains Z
> which should be shaped as if column 40 contained A.
>
> What I cannot see at all is how this could be "simply". Could you
> please elaborate on that? I don't find this simple at all!
>
> > > It's not able to
> > > separate different UI elements that happen to be adjacent in the
> > > terminal, separated by different background color or such.
> >
> > ZWJ and ZWNJ can handle that.
>
> Wouldn't it be a semantical misuse of these characters, though?

No. ZWNJ is used before the inanimate plural suffix of Persian, and in
at least one language, <HEH, ZWJ> is used to distinguish one usage from
the digit ٥ (or is it the digit ۵?).

> They are supposed to be present in the logical order, and in logical
> order (that is: the terminal's implicit mode) they can work as
> desired.
>
> Are they okay to be present in visual order (the terminal's explicit
> mode, what we're discussing now) too?

Where do you define the order for explicit mode? There may be
complications in ensuring that <joiner control><letter><non-spacing
marks><joiner control> gets stored as the content of a single cell.
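To make the single-cell concern concrete, here is a minimal Python sketch
of one possible policy for grouping <joiner control><letter><marks><joiner
control> into a cell. The function name split_into_cells and the greedy
rule it applies are assumptions made for illustration only; they are not
taken from VTE or from any draft specification, and the sketch does not
attempt full grapheme cluster segmentation.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
JOINERS = {ZWNJ, ZWJ}


def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def split_into_cells(text):
    """Split text into cells shaped like [joiner] base mark* [joiner].

    Hypothetical policy: a joiner control directly after a base and its
    marks is stored in that cell; any other joiner control opens the
    next cell.
    """
    cells = []
    i = 0
    n = len(text)
    while i < n:
        start = i
        if text[i] in JOINERS:
            i += 1                      # optional leading joiner control
        if i < n and text[i] not in JOINERS:
            i += 1                      # the base character
            while i < n and is_mark(text[i]):
                i += 1                  # marks attached to the base
            if i < n and text[i] in JOINERS:
                i += 1                  # optional trailing joiner control
        cells.append(text[start:i])
    return cells


beh, fatha = "\u0628", "\u064e"  # ARABIC LETTER BEH, ARABIC FATHA
cells = split_into_cells(ZWJ + beh + fatha + ZWJ + ZWJ + beh + ZWNJ)
print([[f"U+{ord(c):04X}" for c in cell] for cell in cells])
```

Under this policy a lone joiner control between two letters always lands
in the earlier cell, which is exactly the kind of choice that would have
to be pinned down somewhere.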
>
> Anyway, ZWJ/ZWNJ aren't sufficient to handle the cases I outlined
> above.

Example, please.

> > If a general text manipulating application, e.g. cat, grep or awk,
> > is writing to a file, it should not convert normal Arabic
> > characters to presentation forms. You are now asking a general
> > application to determine whether it is writing to a terminal or
> > not, and alter its output if it is writing to a terminal.
>
> No, this is absolutely not what I'm talking about!
>
> There are two vastly different modes of the terminal. For "cat",
> "grep" etc. the terminal will be in implicit mode. Absolutely no BiDi
> handling is expected from these apps, the terminal will do BiDi and
> shaping (perhaps using Harfbuzz; perhaps using presentation form
> characters as a temporarily low hanging fruit until a better one is
> implemented – the choice is obviously up to the implementation and not
> to the specification).
>
> For "emacs" and friends, an explicit mode is required where visual
> order is passed to the terminal. What we're discussing is how to
> handle shaping in this mode.

(Partitioning grapheme clusters and Indic syllables)

> > But it is an issue that needs to be addressed. As a terminal can be
> > addressed by cell, an application may need to keep track of what
> > text went into each cell. Misery results when the application gets
> > it wrong.
>
> My recommendation doesn't change this principle at all. In the lower
> (emulation) layer every character still goes into the cell it used to
> go to, and is addressable using cursor motion escapes and so on
> exactly as without BiDi.

At present, VTE positions LTR Indic preceding spacing combining marks
after the consonant. I thought your draft scheme corrected this very
local bidi issue, which is so local that the bidi algorithm ignores it.

> >
> > How many cells do CJK ideographs occupy? We've had a strong hint
> > that a medial BEH should occupy one cell, while an isolated BEH
> > should occupy two.
>
> CJK occupy two, but they do regardless of what's around them. That is,
> they already occupy two cells in the logical buffers, in the emulation
> layer.
>
> There is absolutely no sane way we can make in terminal emulation a
> character's logical width (as in number of cells it occupies) depend
> on its neighboring characters. (And even if we could by some terrible
> hacks, it would break the principle you just said as "misery
> results...", and the principle Eli said that things should remain
> reasonably simple, otherwise hardly anyone will bother implementing
> them.) This is a compromise Arabic folks will have to accept.

So ព្រះ <U+1796 KHMER LETTER PO, U+17D2 KHMER SIGN COENG, U+179A KHMER
LETTER RO, U+17C8 KHMER SIGN > _preah_ 'prefix denoting respect for
gods, kings, etc.' will be three cells <្រ,ព,ៈ> = <(COENG, RA), PO,
YUUKALEAPINTU> and cause no confusion? Or will the cells be <RA, (PO,
COENG), YUUKALEAPINTU>?

Richard.

