On Sat, 09 Feb 2019 09:42:09 +0200 Eli Zaretskii via Unicode <unicode@unicode.org> wrote:
> > Date: Sat, 9 Feb 2019 00:18:14 +0000
> > From: Richard Wordingham via Unicode <unicode@unicode.org>
> >
> > > For character composition, you must have a shaping engine to talk
> > > to, and the shaper should tell you the width of each grapheme
> > > cluster it returns.
> >
> > (a) What defines the grapheme clusters?  The definition might be
> > terminal-specific.
>
> Well, the "you" above alluded to the terminal emulator, of course.
> The grapheme clusters are determined by the shaping engine that the
> emulator must call when appropriate (or always).

I find it very hard to believe that that is how it works with GNOME
Terminal (Version 3.18.3, using VTE Version 0.42.5).  At the command
line I typed in the Khmer script string ក្កេក (KA, COENG, KA, SIGN E,
KA), and saw the string split into four columns (KA, COENG), (KA),
(SIGN E), (KA), with each column given the same width.  When written
correctly, SIGN E is first in visual order.  The fourth column was
displayed on top of the third column, which contained a dotted circle
to show that SIGN E on its own was not grammatically correct.  If I
were writing a Khmer font for use with Gnome terminal, I would attempt
to ensure that the display for SIGN E fitted in a single cell.

Of course, the renderer's grapheme cluster boundaries don't always
match appearances.  To get the traditional placement of U+1A58 TAI THAM
SIGN MAI KANG LAI, I end up with it being a mark glyph one cluster
later than HarfBuzz indicates it to be.

It would be good to be able to access a maintained statement of the VTE
rules for allocating characters to a cell, or group of cells, as
appropriate.

> > (b) With a terminal that expects a fixed width font, surely the
> > terminal decides how many cells it allocates to a group of
> > characters, and the font designer has to come up with a suitable
> > value based on that.
>
> Yes.
> A terminal emulator that works with a shaper should probably
> post-process the width information returned by the shaper for these
> purposes.

Perhaps it should base the number of cells on the width of the
clusters.  However, continuing with my example, U+1789 KHMER LETTER NYO
as a base character is too wide to fit in a cell, and the next
character will overwrite its right-hand part.  From this I deduce that
it is allocated just one cell.  Gnome terminal is not alone in doing
this, but it does better than some, in my opinion, in that the overflow
of the foreground of one cell is not obliterated by the background of
the next cell.  U+1789 has an East Asian width property of 'Neutral',
which is distinctly unhelpful.  What I would like is a specification of
what a font must do to avoid such problems.

> > > I don't see how you can expect wcwidth, or any other
> > > interface that was designed to work with _characters_, to be
> > > useful when you need to display grapheme clusters.

It, or something similar but worse, gets used, especially when moving
the cursor for editing.

> > Well I can envisage a decision being made that a grapheme cluster
> > str (as decreed by the terminal) shall occupy wcswidth(str) cells -
> > "The wcswidth() function returns the number of column positions for
> > the wide-character string s, truncated to at most length n".
>
> AFAIU, the shaping engine returns its output in terms of font glyph
> numbers, not character codepoints, so you cannot in general call
> wcswidth on them.  The shaper also returns the advance information,
> which serves instead of wcwidth and related APIs for determining the
> actual width on display.

Unfortunately, when the rectangular grid is being preserved,
typographical advance width is generally ignored when determining the
placement of characters.
Now, this is not always true; one can have the situation where the
positioning of characters respects the advance widths, but the
positioning of the cursor assumes a fixed-width rectangular grid.  I
have found working with that to be extremely confusing.

Richard.
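
For illustration only, here is a minimal Python sketch of a
wcwidth-style, per-character rule that would reproduce the four columns
described above for KA, COENG, KA, SIGN E, KA.  It is a guess at the
kind of rule involved, not a statement of what VTE actually does; the
function names naive_cell_width and naive_columns are hypothetical, and
the rule (0 cells for general categories Mn/Me/Cf, 2 cells for East
Asian Wide or Fullwidth, otherwise 1) is an assumption.

```python
import unicodedata


def naive_cell_width(ch):
    """Hypothetical wcwidth-like rule; an assumption, not VTE's actual rule."""
    # Non-spacing marks, enclosing marks and format characters take no cell.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else, including Neutral characters, takes one cell.
    return 1


def naive_columns(text):
    """Group characters into columns; zero-width ones join the previous column."""
    columns = []
    for ch in text:
        if naive_cell_width(ch) == 0 and columns:
            columns[-1] += ch
        else:
            columns.append(ch)
    return columns


sample = "\u1780\u17d2\u1780\u17c1\u1780"  # KA, COENG, KA, SIGN E, KA
for column in naive_columns(sample):
    print([unicodedata.name(c) for c in column])

# U+1789 KHMER LETTER NYO: East Asian Width and the cell count under this rule.
print(unicodedata.east_asian_width("\u1789"), naive_cell_width("\u1789"))
```

Under this rule COENG (a non-spacing mark) attaches to the preceding KA,
while SIGN E (a spacing mark) gets a cell of its own, and a Neutral
character such as U+1789 gets a single cell however wide its glyph is.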