Re: Encoding italic
Martin J. Dürst wrote, >> Isn't that already the case if one uses variation sequences to choose >> between Chinese and Japanese glyphs? > > Well, not necessarily. There's nothing prohibiting a font that includes > both Chinese and Japanese glyph variants. Just as there’s nothing prohibiting a single font file from including both roman and italic variants of Latin characters.
Re: Encoding italic
On 2019/02/09 19:58, Richard Wordingham via Unicode wrote: > On Fri, 8 Feb 2019 18:08:34 -0800 > Asmus Freytag via Unicode wrote: >> Under the implicit assumptions bandied about here, the VS approach >> thus reveals itself as a true rich-text solution (font switching) >> albeit realized with pseudo coding rather than markup, markdown or >> escape sequences. > > Isn't that already the case if one uses variation sequences to choose > between Chinese and Japanese glyphs? Well, not necessarily. There's nothing prohibiting a font that includes both Chinese and Japanese glyph variants. Regards, Martin.
Re: Bidi paragraph direction in terminal emulators
On Sun, 10 Feb 2019 00:59:46 +0100 Egmont Koblinger via Unicode wrote: > Is there such a monospace font obeying wcwidth (that is: double wide > character for when a spacing mark is combined) for Devanagari? For CV, that would correspond to a Hindi typewriter, so the odds look good. The Remington keyboard layout is taken from the typewriter design. However, the typewriter had non-spacing keys for repha (roughly ) and vattu (), so you'll be out of luck for consonant clusters. On the other hand, is two key strokes - the cells would be for and ! There's an implementation of the keyboard in the M17N database - hi-remington.mim. > Is there a monospace font for Arabic, Apart from wcwidth("لآ") = 2, Khaled has already said in this thread that there are such fonts. > for Syriac, etc.? (How much do these questions make sense at all?) Perfect sense. Richard.
Re: Bidi paragraph direction in terminal emulators
On Sat, 9 Feb 2019 18:42:52 +0100 Egmont Koblinger via Unicode wrote:

> The
> problem that I don't know how to address is: What if harfbuzz tells us
> that the overall width for rendering a particular grapheme cluster is
> significantly different from its designated area (the number of
> character cells [wcswidth()] multiplied by the width of each)?

You have to reduce the width of the glyph used. The tricky bit is where the glyph deliberately overhangs or underlies a neighbouring glyph. Almost a good example of this is U+0E33 THAI CHARACTER SARA AM, whose nikkhahit component typically overhangs the previous character; however, ink beyond the left limit should not be a problem for LTR scripts. Which side do you align RTL cells on?

Now, you might want to treat U+0E33 as interacting with its predecessor, because it does. The test word is น้ำ 'water'.

Richard.
Re: Bidi paragraph direction in terminal emulators
On Sat, 9 Feb 2019 22:29:31 +0100 Adam Borowski via Unicode wrote: > On Sat, Feb 09, 2019 at 10:01:21PM +0200, Eli Zaretskii via Unicode > wrote: > > I don't know. Maybe it keeps a database of character combinations > > that need shaping, each one with the maximum width on display the > > result can occupy. Or maybe it does something else. If it cannot, > > and the terminal cannot either, then what you say is that some > > scripts can never be supported by text terminals. > > That's doable even within the current rules, where every codepoint > bears a wcwidth of 0, 1 or 2. A cluster made of codepoints a ' b c d > " ^ (where a b c d have widths 1 while ' " ^ widths 0) needs to be > rendered in exactly 4 cells. This may force stretching or condensing > the shaped cluster compared to what usual typography would demand but > that's in no way different from stretching Latin "i" or condensing > "W". It would be helpful if overlong shapings were condensed automatically. The general principle that functions work better on strings applies here. There are two obvious situations where the additive formulae break down. (a) Emoji should, should they not, occupy at least 2 cells. There are a few problem sequences, such as (or is wcwidth(0x20E3) equal to 1?). (b) Brahmi-like Indic scripts. In many of these, the combination of a virama or invisible stacker and a base consonant acts like a combining mark, either causing no advance or as a mark with a very slight width. Examples include Grantha, Myanmar, Tai Tham and Khmer. Stretching a stack of 3 or 4 consonants to occupy 3 or 4 cells instead of 1 would be worse than stretching 'i'. If you do it, you want fonts that adjust the glyphs accordingly, just as for 'i'. Richard.
Re: Bidi paragraph direction in terminal emulators
Hi, On Sun, Feb 10, 2019 at 12:52 AM Richard Wordingham via Unicode wrote: > This is an example of where one needs a font designed for terminal > emulators. Definitely, this is another approach I forgot to mention in my mail, rather than VTE switching to harfbuzz and figuring out all the issues. This approach would also make them usable in every decent terminal emulator at once, not just VTE. Is there such a monospace font obeying wcwidth (that is: double wide character for when a spacing mark is combined) for Devanagari? Is there a monospace font for Arabic, for Syriac, etc.? (How much do these questions make sense at all?) If there are such fonts, I'd be happy to use them for testing. e.
Re: Bidi paragraph direction in terminal emulators
On Sat, 9 Feb 2019 22:31:37 +0100 Egmont Koblinger via Unicode wrote:

> Let's take the Devanagari improvement of the other day. Until now,
> there were plenty of dotted circles shown, and combining spacing marks
> that should've been placed before the letter were placed after the
> letter, before a placeholder dotted circle. Now they are displayed as
> expected: the combining spacing mark shows up before the letter (if
> it's of that kind), and no dotted circle. The letter + spacing marks
> now show up correctly. The entire word still doesn't, e.g. there are
> often spaces between letters where the upper line connecting them
> should be continuous.

This is an example of where one needs a font designed for terminal emulators.

Richard.
Re: Bidi paragraph direction in terminal emulators
On Sat, 9 Feb 2019 13:02:55 -0800 "Asmus Freytag (c) via Unicode" wrote:

> To force Hindi crosswords mode you need to segment the string into
> syllables, each having a variable number of characters, and then
> assign a single display position to them. Now some syllables are wider
> than others, so you could use the single/double width paradigm. The
> result may be somewhat legible for Devanagari, but even some of the
> closely related scripts may not fit that well.

It is also possible that whole syllables are used because there are vertical words.

> To give you an idea, here is an Arabic crossword. It uses the isolated
> shape of all letters and writes all words unconnected. That's two
> things that may be acceptable for a puzzle, but not for text output.
>
> http://www.everyday-arabic.com/2013/12/crossword1.html
>
> (try typing 3 vertical as a word to see the difference - it's 4x
> U+062A)

Crosswords suffer from the need to be read vertically as well as horizontally. Can Arabic naturally be written vertically?

In any case, Arabic typewriters exist and, so far as I understand, work. The problem rather seems to be one of standardising the Procrustean technique to be used. It seems from what Khaled Hosny wrote that monospace for letters is the usual solution already. The design difficulty for Arabic is rather that horizontal adjacency may sometimes need to be treated as accidental rather than as an invitation to cursively join.

Richard.
Re: Bidi paragraph direction in terminal emulators
On 2/9/2019 1:40 PM, Egmont Koblinger wrote:

> On Sat, Feb 9, 2019 at 10:10 PM Asmus Freytag via Unicode wrote:
>
> > > I hope though that all the scripts can be supported with more or less
> > > compromises, e.g. like it would appear in a crossword. But maybe not.
> >
> > See other messages: not.
>
> For the crossword analogy, I can see why it's not good. But this doesn't mean there aren't any other ideas we could experiment with.

"all...scripts" is the issue. We know how to handle text for all scripts and what complexities one has to account for in order to do that. You can back off some corner cases or (slightly) degrade things, but even after you are done with that, there will be scripts where the "more or less compromises" forced by the design parameters you gave will mean an utterly unacceptable display.

That said, there are scripts that had "passable" typewriter implementations and it may be possible to tweak things to approach that level of support. Don't know for sure, it depends on the details for each script.

> Or do you mean to say that because it can't be made perfect, there's no point at all in partially improving? I don't think I agree with that.

It's more a question of being upfront with your goal. At this point I understand it as accepting some design parameters as fundamental and seeing whether there are some tweaks that allow more scripts to work with or to "survive" given the constraints.

That's not a totally useless effort, but it is a far cry from Unicode's universal support for ALL writing systems.

A./

PS: also we have been seriously hijacking a thread related to bidi

> e.
Re: Bidi paragraph direction in terminal emulators
On Sat, Feb 9, 2019 at 10:10 PM Asmus Freytag via Unicode wrote: > > I hope though that all the scripts can be supported with more or less > > compromises, e.g. like it would appear in a crossword. But maybe not. > > See other messages: not. For the crossword analogy, I can see why it's not good. But this doesn't mean there aren't any other ideas we could experiment with. Or do you mean to say that because it can't be made perfect, there's no point at all in partially improving? I don't think I agree with that. e.
Re: Bidi paragraph direction in terminal emulators
Hi Asmus,

On Sat, Feb 9, 2019 at 10:02 PM Asmus Freytag (c) wrote:

> are you excluding CJK because of the difficulty handling a large
> repertoire with mechanical means?

No, I excluded CJK because they're pretty well solved in terminals, and nowhere near along the lines of how they work with typewriters. I should've probably said "letter based" scripts or whatever, I'm not familiar with the exact terminologies.

> To force Hindi crosswords mode you need to segment the string into syllables,
> each having a variable number of characters [...]

Thanks a lot to you too for your detailed explanation!

> Are you defining as your goal to have some kind of "line by line" display that
> can survive any Unicode text thrown at it, or are you trying to extend a given
> design with rather specific limitations, so that it survives / can be used
> with, just a few more scripts than European + CJK?

I don't have a clearly defined goal. I find fun in developing VTE (and slightly improving other terminal emulators too by spreading ideas, knowledge, comments etc.), addressing various kinds of goals, whatever happens to come next. At this point it's BiDi, with a bit of Devanagari improvement sneaking in the other day.

What is clear to me: I cannot redefine the basics of terminal emulation. I can only add incremental improvements to whatever it already is, and I have to make sure that the ecosystem built around it during decades (all the screen handling libraries and applications) doesn't break. I'm limited by these constraints.

> The discrepancies would be more like throwing random blank spaces in the
> middle of every word, writing letters out of order, or overprinting. So, more
> fundamental, not just "not perfect".

Let's take the Devanagari improvement of the other day. Until now, there were plenty of dotted circles shown, and combining spacing marks that should've been placed before the letter were placed after the letter, before a placeholder dotted circle. Now they are displayed as expected: the combining spacing mark shows up before the letter (if it's of that kind), and no dotted circle. The letter + spacing marks now show up correctly. The entire word still doesn't, e.g. there are often spaces between letters where the upper line connecting them should be continuous.

Eventually HarfBuzz could help, but it's just not yet clear how exactly. I cannot essentially change the underlying model of fixed width cells. On top of this model, though, we can experiment with various ideas about displaying. For example, if a word occupies 7 columns in the model, then HarfBuzz renders it, and the rendered version occupies the width of 8.6 columns, maybe we can squeeze it using a trivial linear transformation? I'm not sure, but maybe it's an idea worth investigating. Won't look perfect, but probably will look better than what we do currently.

We already have column spacing implemented, to pull the columns further apart from each other by a fixed amount (mostly for accessibility purposes); maybe a user can use this feature to make more room for a nicely rendered, non-squeezed Devanagari text.

> To give you an idea, here is an Arabic crossword. It uses the isolated shape of
> all letters and writes all words unconnected. That's two things that may be
> acceptable for a puzzle, but not for text output.

You can't get nice Arabic without first making sure the order of the letters is the correct one, not reversed. :-) That's what my current work is about.

As per Richard's feedback, I also see that shaping needs to be done differently than I had thought. Mind you, on visual inspection, what the non-preferred shaping approach gave me and what a proper HarfBuzz rendering gave (for Arabic) were extremely close to each other, something that I'd probably consider "good enough" if I spoke the language and were aware of the terminal's constraints. Well, definitely a major improvement over what we have.

> You may begin to see the limitations and that they may well prevent you from
> reaching even your limited goal for speakers of at least three of the top ten
> languages worldwide.

If the goal is to have perfect rendering without compromises: sure I won't reach that. (It's not a goal for me. For perfect rendering, users should get away from terminals.) If the goal is to have something reasonably good, better than what we have currently, I can't see why not.

cheers, e.
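A rough illustration of the "trivial linear transformation" floated in the message above: if a shaped word is wider than the cells the model reserves for it, compress it horizontally by the ratio of the two widths. This is only a sketch; `horizontal_scale` is a hypothetical helper, widths are expressed in units of one cell, and nothing here reflects how VTE or HarfBuzz actually behave.

```python
def horizontal_scale(cells: int, shaped_width_in_cells: float) -> float:
    """Return the x-scale factor that fits a shaped run into its cells.

    cells: number of character cells the terminal model reserves.
    shaped_width_in_cells: width reported by the shaper, in cell units
    (an assumed input; a real shaper reports font units or pixels).
    """
    if shaped_width_in_cells <= cells:
        return 1.0  # already fits; this sketch never stretches
    return cells / shaped_width_in_cells


# The example from the message: 7 columns in the model, 8.6 when shaped.
scale = horizontal_scale(7, 8.6)
print(f"{scale:.3f}")  # prints 0.814
```

With these numbers the run would be drawn at roughly 81% of its natural width. Whether that is legible for a given script is exactly the open question in the thread.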
Re: Bidi paragraph direction in terminal emulators
On Sat, Feb 09, 2019 at 10:01:21PM +0200, Eli Zaretskii via Unicode wrote: > > From: Egmont Koblinger > > Date: Sat, 9 Feb 2019 20:36:50 +0100 > > Cc: Richard Wordingham , > > unicode Unicode Discussion > > > > On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii wrote: > > > > > That's the application's problem, not the terminal's. An application > > > that wants its column to line up _and_ wants to support complex text > > > scripts will need to move cursor to certain coordinates, not to assume > > > that 7 codepoints always take 7 columns on display. It must know that those particular 7 codepoints take, say, 5 columns when written together in a sequence. And it can't possibly ask the terminal, either -- it might be on a link that doesn't allow metadata to pass, it might be broadcasted, its output might be recorded many years prior to being displayed. A good part of the time the program is even run on a different distribution/release/OS. Obviously, a program running with system libraries might suffer misalignment and thus visual corruption if those libraries don't know beyond, say, Unicode 13 yet the terminal expects Unicode 17 -- but that's no different from any other property incompatibly changing. Property changes for established characters are pretty rare thus no significant loss of interoperability can be expected over time. > > In order to do that, an application needs to know how wide a text will > > appear, which depends on the font. How will it know it? > > I don't know. Maybe it keeps a database of character combinations > that need shaping, each one with the maximum width on display the > result can occupy. Or maybe it does something else. If it cannot, > and the terminal cannot either, then what you say is that some scripts > can never be supported by text terminals. That's doable even within the current rules, where every codepoint bears a wcwidth of 0, 1 or 2. 
A cluster made of codepoints a ' b c d " ^ (where a b c d have widths 1 while ' " ^ widths 0) needs to be rendered in exactly 4 cells. This may force stretching or condensing the shaped cluster compared to what usual typography would demand but that's in no way different from stretching Latin "i" or condensing "W". Meow! -- ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands ⢿⡄⠘⠷⠚⠋⠀ for Privacy. ⠈⠳⣄
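The additive rule described in the message above can be sketched as follows. Python's standard library has no `wcwidth()`, so `approx_wcwidth` below is a hypothetical approximation built on `unicodedata`; a real C library's tables will differ in details.

```python
import unicodedata


def approx_wcwidth(ch: str) -> int:
    """Rough stand-in for wcwidth(): 0, 1 or 2 cells per codepoint."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # nonspacing marks, enclosing marks, format characters
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian wide and fullwidth characters
    return 1


def cluster_cells(text: str) -> int:
    """Additive rule: a cluster occupies the sum of its codepoints' widths."""
    return sum(approx_wcwidth(ch) for ch in text)


# a ' b c d " ^  spelled with combining acute, diaeresis and circumflex
cluster = "a\u0301bcd\u0308\u0302"
print(cluster_cells(cluster))  # prints 4
```

The point of the rule is that the result depends only on the codepoints, never on the font, so an application and a terminal can agree on it without talking to each other.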
Re: Bidi paragraph direction in terminal emulators
On 2/9/2019 11:48 AM, Egmont Koblinger wrote:

> Hi Asmus,
>
> > On quick reading this appears to be a strong argument why such emulators will
> > never be able to be used for certain scripts. Effectively, the model described
> > works well with any scripts where characters are laid out (or can be laid out)
> > in fixed width cells that are linearly adjacent.
>
> I'm wondering if you happen to know:
>
> Are there any (non-CJK) scripts for which a mechanical typewriter does not exist due to the complexity of the script?

Egmont,

are you excluding CJK because of the difficulty handling a large repertoire with mechanical means? However, see:

https://en.wikipedia.org/wiki/Chinese_typewriter

> Are there any (non-CJK) scripts for which crossword puzzles don't exist?
>
> For scripts where these do exist, is it perhaps an acceptable tradeoff to keep their limitations in the terminal emulator world as well, to combine the terminal emulator's power with these scripts?

I agree with you that crossword puzzles and scrabble have a similar limitation to the design that you sketched for us. However, take a script that is written in syllables (each composed of 1-5 characters, say).

In a "crossword" I could write this script so that each syllable occupies a cell. It would be possible to read such a puzzle, but trying to use such a draconian technique for running text would be painful, to say the least. (We are not even talking about pretty, here).

Here's an example for Hindi:

https://vargapaheli.blogspot.com/2017/

I don't read Hindi, but 5 vertical in the top puzzle, cell 2, looks like it contains both a consonant and a vowel.

To force Hindi crosswords mode you need to segment the string into syllables, each having a variable number of characters, and then assign a single display position to them. Now some syllables are wider than others, so you could use the single/double width paradigm. The result may be somewhat legible for Devanagari, but even some of the closely related scripts may not fit that well.

Now there are some scripts where the same syllable can be written in more than one form; the forms differing by how the elements are fused (or sometimes not fused) into a single shape. Sometimes, these differences are more "stylistic", more like an 'fi' ligature in English, sometimes they really indicate different words, or one of the forms is simply not correct (like trying to spell lam-alif in Arabic using two separate letters).

I'm sure there are scripts that work rather poorly (effectively not at all) in crossword mode. The question then becomes one of goals.

Are you defining as your goal to have some kind of "line by line" display that can survive any Unicode text thrown at it, or are you trying to extend a given design with rather specific limitations, so that it survives / can be used with, just a few more scripts than European + CJK?

> Honestly, even with English, all I have to do is "cat some_text_file", and chances are that a word is split in half at some random place where it hits the right margin. Even with just English, a terminal emulator isn't something that gives me a grammatically and typographically super pleasing or correct environment. It gives me something that I personally find grammatically and typographically "good enough", and in the mean time a powerful tool to get my work done.

The discrepancies would be more like throwing random blank spaces in the middle of every word, writing letters out of order, or overprinting. So, more fundamental, not just "not perfect".

To give you an idea, here is an Arabic crossword. It uses the isolated shape of all letters and writes all words unconnected. That's two things that may be acceptable for a puzzle, but not for text output.

http://www.everyday-arabic.com/2013/12/crossword1.html

(try typing 3 vertical as a word to see the difference - it's 4x U+062A)

> Obviously the more complex the script, the more tradeoffs there will be. I think it's a call each user has to make whether they prefer a terminal emulator or a graphical app for a certain kind of task. And if terminal emulators have a lower usage rate in these scripts, that's not necessarily a problem. If we can improve by small incremental changes, sure, let's do. If we'd need to heavily redesign plenty of fundamentals in order to improve, it most likely won't happen.

You may begin to see the limitations and that they may well prevent you from reaching even your limited goal for speakers of at least three of the top ten languages worldwide.

A./
Re: Bidi paragraph direction in terminal emulators
On 2/9/2019 12:07 PM, Egmont Koblinger via Unicode wrote:

> On Sat, Feb 9, 2019 at 9:01 PM Eli Zaretskii wrote:
>
> > then what you say is that some scripts
> > can never be supported by text terminals.
>
> I'm not familiar at all with all the scripts and their requirements, but yes, basically this is what I'm saying. I'm afraid some scripts can never be perfectly supported by text terminals.

This includes the scripts used for up to four of the world's top ten languages. And it's more than "not perfect"; effectively some scripts cannot be shoehorned into the fundamental design. That design was created to work with European scripts, and proved somewhat adaptable to other scripts that lend themselves to fixed-width cell display. But beyond that is where you hit the proverbial brick wall.

> I hope though that all the scripts can be supported with more or less compromises, e.g. like it would appear in a crossword. But maybe not.

See other messages: not.

> Maybe one day some new, modern platform will arise with the goal of replacing terminal emulators, which I wouldn't necessarily mind. It's gonna take an enormous amount of work, though.

A./
Re: Bidi paragraph direction in terminal emulators
Egmont, On 2/9/2019 11:48 AM, Egmont Koblinger via Unicode wrote: Are there any (non-CJK) scripts for which crossword puzzles don't exist? There are crossword puzzles for Hindi (in the Devanagari script). Just do an image search for "Hindi crossword puzzle". But the conventions for these break up words into syllables fitting into the boxes, and the rules for that are complex. You have to allow for the placement of dependent vowels, which may take up extra space left or right, as well as consonant clusters, which would be expressed often as conjuncts in Sanskrit, but which in Hindi are more commonly rendered as dead consonant sequences. So the "stuff in a box" is: 1. Inherently proportional width. 2. Inherently multi-character in content. (underlying 1 to 3 or more characters per cell) This is the kind of compromise you would have to have to make for almost any Indic script, to enable a rational approach to building crossword puzzles that make sense. And in a terminal context, you probably would not get acceptable behavior for Hindi if you tried to just take all the "stuff in a box" chunks and tried to lay them out directly in a line, as if the script behaved more like CJK. The existence proof of techniques to cut up text into syllables that enable crossword puzzle building, is not the same as a determination that the script, ipso facto, would work in a terminal context without dealing with additional complex script issues. At any rate, this is once again straying over into the issue of whether terminals can be adapted for the requirements of shaping rules for complex scripts -- rather than the nominal subject of the thread, which has to do with bidi text layout in terminals. --Ken
Re: Bidi paragraph direction in terminal emulators
Hi Ken, > There are crossword puzzles for Hindi (in the Devanagari script). Just > do an image search for "Hindi crossword puzzle". It's easy to confirm the existence by an image search, it's hard to confirm the non-existence ;) > The existence proof of techniques to cut up text into syllables that > enable crossword puzzle building, is not the same as a determination > that the script, ipso facto, would work in a terminal context without > dealing with additional complex script issues. Thanks a lot for your detailed explanation; this possibility indeed didn't occur to me. cheers, egmont
Re: Bidi paragraph direction in terminal emulators
On Sat, Feb 9, 2019 at 9:01 PM Eli Zaretskii wrote: > then what you say is that some scripts > can never be supported by text terminals. I'm not familiar at all with all the scripts and their requirements, but yes, basically this is what I'm saying. I'm afraid some scripts can never be perfectly supported by text terminals. I hope though that all the scripts can be supported with more or less compromises, e.g. like it would appear in a crossword. But maybe not. Maybe one day some new, modern platform will arise with the goal of replacing terminal emulators, which I wouldn't necessarily mind. It's gonna take an enormous amount of work, though. cheers, egmont
Re: Bidi paragraph direction in terminal emulators
> From: Egmont Koblinger > Date: Sat, 9 Feb 2019 20:36:50 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii wrote: > > > That's the application's problem, not the terminal's. An application > > that wants its column to line up _and_ wants to support complex text > > scripts will need to move cursor to certain coordinates, not to assume > > that 7 codepoints always take 7 columns on display. > > In order to do that, an application needs to know how wide a text will > appear, which depends on the font. How will it know it? I don't know. Maybe it keeps a database of character combinations that need shaping, each one with the maximum width on display the result can occupy. Or maybe it does something else. If it cannot, and the terminal cannot either, then what you say is that some scripts can never be supported by text terminals.
Re: Bidi paragraph direction in terminal emulators
Hi Asmus, > On quick reading this appears to be a strong argument why such emulators will > never be able to be used for certain scripts. Effectively, the model > described works > well with any scripts where characters are laid out (or can be laid out) in > fixed > width cells that are linearly adjacent. I'm wondering if you happen to know: Are there any (non-CJK) scripts for which a mechanical typewriter does not exist due to the complexity of the script? Are there any (non-CJK) scripts for which crossword puzzles don't exist? For scripts where these do exist, is it perhaps an acceptable tradeoff to keep their limitations in the terminal emulator world as well, to combine the terminal emulator's power with these scripts? Honestly, even with English, all I have to do is "cat some_text_file", and chances are that a word is split in half at some random place where it hits the right margin. Even with just English, a terminal emulator isn't something that gives me a grammatically and typographically super pleasing or correct environment. It gives me something that I personally find grammatically and typographically "good enough", and in the mean time a powerful tool to get my work done. Obviously the more complex the script, the more tradeoffs there will be. I think it's a call each user has to make whether they prefer a terminal emulator or a graphical app for a certain kind of task. And if terminal emulators have a lower usage rate in these scripts, that's not necessarily a problem. If we can improve by small incremental changes, sure, let's do. If we'd need to heavily redesign plenty of fundamentals in order to improve, it most likely won't happen. cheers, egmont
Re: Bidi paragraph direction in terminal emulators
On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii wrote:

> That's the application's problem, not the terminal's. An application
> that wants its column to line up _and_ wants to support complex text
> scripts will need to move cursor to certain coordinates, not to assume
> that 7 codepoints always take 7 columns on display.

In order to do that, an application needs to know how wide a text will appear, which depends on the font. How will it know it?

Will it by some means know the font and the rendering engine the terminal uses (even across ssh) and will it have to measure it itself?

Or will it be able to ask the terminal? If so, how? Maybe a new extension, an asynchronous escape sequence that responds back with the measured width? What about the latency caused by the bunch of asynchronous roundtrips, especially over ssh? What about the utter pain and intrinsic unreliability of handling asynchronous responses, as I've outlined in a section of https://gitlab.freedesktop.org/terminal-wg/specifications/issues/8 ?

What if there's no font? What if there are multiple fonts at the same time? What if the font is changed later on, is it okay then for the display of existing stuff to fall apart and only newly printed stuff to appear correctly?

How do you define the "width of the terminal in characters", get/set by ioctl(..., TIOC[GS]WINSZ, ...) that many apps rely on?

If you define it by any means, what if placing the maximum number of "i"s in a row doesn't fill up the entire width? Will that area be inaccessible, then? Or despite having a definition of terminal width, will there be new cells beyond this width to write to?

What if filling a row with all "w"s overflows? I take it that an app shouldn't print there, but what if it still does, will that piece of text just not be shown?

How much more complicated do you think implementing something like "zip -h" would become?

> How is this different from using variable-pitch fonts?

Do you mean variable-pitch font where the terminal still places each glyph in its designated area? The font is the private business of the terminal emulator, then; it'll just appear ugly, as in a screenshot I've already linked, but the emulation behavior wouldn't care.

Or do you mean variable-pitch font where each letter is placed after each other, as you'd expect in document editors? That is, way more "i"s than "w"s fitting in a line? It's not different, it's practically the same. And this is something that none of the terminal emulators I'm aware of does; and having some clue about terminal emulators, I can't imagine how one could do (see all the questions above for a start).

This is why I'm saying: Sure you can take this path, but then we're talking about something new, not terminal emulators as we currently know them. You can take this path, but then you'll have to rebuild many of the already existing apps, and beware, they'll get way more complex.

e.
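For readers unfamiliar with the TIOCGWINSZ ioctl mentioned in the message above, here is a minimal sketch of how an application queries the terminal size on a POSIX system. The helper names are made up for illustration; the call itself is the standard `fcntl.ioctl` with `termios.TIOCGWINSZ`, which fills a `struct winsize` of four unsigned shorts.

```python
import fcntl
import struct
import termios


def parse_winsize(buf: bytes) -> tuple[int, int]:
    """Decode struct winsize: rows, columns, then two pixel fields."""
    rows, cols, _xpixel, _ypixel = struct.unpack("HHHH", buf)
    return rows, cols


def window_size(fd: int = 1):
    """Return (rows, cols) for the terminal on fd, or None if not a tty."""
    try:
        buf = fcntl.ioctl(fd, termios.TIOCGWINSZ, bytes(8))
    except OSError:
        return None  # e.g. output is redirected to a file or pipe
    return parse_winsize(buf)


print(window_size())
```

The answer is a pair of integers counted in cells, which is why a font-dependent notion of width has no obvious place in this interface.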
Re: Bidi paragraph direction in terminal emulators
> From: Egmont Koblinger
> Date: Sat, 9 Feb 2019 20:03:21 +0100
> Cc: Richard Wordingham, unicode Unicode Discussion
>
> Let's suppose a utility outputs these two lines of text:
> abcdefg|
> complex|
>
> whereas "abcdefg" are these English letters themselves, but "complex"
> is a word of some language requiring complex script rendering, taking
> up 7 logical cells (because that's what wcwidth() says). Also, "|" is
> the pipe symbol, or a vertical box drawing line, whatever.
>
> Now let's assume that harfbuzz tells you that the desired width for
> rendering this "complex" word is 5.3 times the width of the character
> cell. Or 8.6 times it. How to proceed? How will the "|" bars line up,
> and thus mc's two-panel layout, tmux's vertical split etc. not fall
> apart? In the latter case, when the width requested by harfbuzz is
> bigger than the designated width, what to do with characters that "fall
> off" at the right edge of the terminal?

That's the application's problem, not the terminal's. An application that wants its column to line up _and_ wants to support complex text scripts will need to move cursor to certain coordinates, not to assume that 7 codepoints always take 7 columns on display. Or it will have to tell the users to use specific fonts, which are known to provide guarantees that this happens.

How is this different from using variable-pitch fonts?
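One way to read "move cursor to certain coordinates" is the sketch below: instead of trusting that the text has advanced the cursor by 7 cells, the application jumps to an absolute column before drawing the bar. It uses the standard CHA control sequence (`CSI n G`, 1-based column); the function name and the choice of column 8 are illustrative assumptions taken from the two-line example in the quoted text.

```python
CSI = "\x1b["  # Control Sequence Introducer


def row_with_bar(text: str, bar_column: int) -> str:
    """Emit text, then jump to an absolute 1-based column and draw '|'."""
    return f"{text}{CSI}{bar_column}G|"


# Both rows place the bar in column 8, whatever width the text took.
for word in ("abcdefg", "complex"):
    print(row_with_bar(word, 8))
```

This keeps the bars aligned, but it does not say what should happen when the rendered text is wider than its 7 cells and would run under the bar.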
Re: Bidi paragraph direction in terminal emulators
On quick reading this appears to be a strong argument why such emulators will never be able to be used for certain scripts. Effectively, the model described works well with any scripts where characters are laid out (or can be laid out) in fixed width cells that are linearly adjacent. There are some crude techniques that allow an extension to cover scripts that require half-width or double-width cells, and perhaps even zero-width. However, scripts, where rendering involves complicated ligatures or other typographical interactions that often are specific to a given font, would simply be out of scope because for those scripts the fixed width model with an underlying buffer mimicking the display simply cannot be made to work. And indeed, by up-front accepting the limitation of a particular design approach it would be surprising if such emulators proved flexible enough to handle the rather wide variety of writing systems supported by Unicode. At best, the discussion could yield a few further approximations of correct rendering that can be retrofitted to the particular design restrictions outlined below, but that with luck extend the envelope somewhat so that a few more writing systems can be shoehorned into it. However, it appears quite hopeless to attempt to cover all of Unicode's scripts on that premise. A./ On 2/9/2019 10:25 AM, Egmont Koblinger via Unicode wrote: On Sat, Feb 9, 2019 at 7:07 PM Eli Zaretskii wrote: You need to use what HarfBuzz tells you _instead_ of wcswidth. It is in general wrong to use wcswidth or anything similar when you use a shaping engine and support complex script shaping. This approach is not viable at all. Terminal emulators have an internal data structure that they maintain, a matrix of character cells. Every operation is performed here, every escape sequence is defined on this layer what it does, the cursor position is tracked on this layer, etc. 
You can move the cursor to integer coordinates, overwrite the letter in that cell, and do plenty of other operations (like push the rest to the right by one cell). If you change these fundamentals, most of the terminal-based applications will fall apart big time. This behavior has to be absolutely independent from the font. The application running inside the terminal doesn't and cannot know what font you use, let alone how harfbuzz is about to render it. (You can even have no font at all, such as with the libvterm headless emulator library, or a detached screen or tmux session; or have multiple fonts at the same time if a screen or tmux session is attached from multiple graphical emulators.) So one part of a terminal emulator's code is responsible for maintaining this matrix of characters according to the input it receives. Another part of their code is responsible for presenting this matrix of characters on the UI, doing the best it can. If you say that the font should determine the logical width, you need to start building up something brand new from scratch. You need to have something that doesn't have concepts like "width in characters". You need to redefine cursor movement and many other escape sequences. You need to heavily adjust the behavior of a gazillion of software, e.g. zip's two-column output, anything that aligns in columns (e.g. midnight commander, tmux's vertical split etc.), the shell's (or readline's) command editing and wrapping to multiple lines, ncurses, and so on, all the way to e.g. fullscreen text editors like Emacs. And then we're not talking about terminal emulators anymore, as we know them now, but something new, something pretty different. Terminal emulators do have strong limitations. Complex text rendering can only work to the extent we can squeeze it into these limitations. cheers, egmont
Re: Bidi paragraph direction in terminal emulators
On Sat, Feb 9, 2019 at 7:56 PM Eli Zaretskii wrote: > I'm probably missing something, because I don't see the grave problems > you hint at. Any width provided back by a shaper can be rounded to > the nearest integral character cell, so your canvas can still remain > rectangular. Let's suppose a utility outputs these two lines of text: abcdefg| complex| whereas "abcdefg" are these English letters themselves, but "complex" is a word of some language requiring complex script rendering, taking up 7 logical cells (because that's what wcwidth() says). Also, "|" is the pipe symbol, or a vertical box drawing line, whatever. Now let's assume that harfbuzz tells you that the desired width for rendering this "complex" word is 5.3 times the width of the character cell. Or 8.6 times it. How to proceed? How will the "|" bars line up, and thus mc's two-panel layout, tmux's vertical split etc. not fall apart? In the latter case, when the width requested by harfbuzz is bigger than the designated width, what to do with characters that "fall off" at the right edge of the terminal? e.
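The alignment problem in the example above can be made concrete with a few lines of arithmetic. This is only an illustrative sketch: the function name bar_column is made up, and "round to the nearest cell" is taken as the rounding rule suggested in the quoted message, not as anything a real terminal does.

```python
def bar_column(shaped_width_in_cells: float) -> int:
    """Column where the trailing "|" would land if the terminal simply
    rounded the shaper's reported width to whole cells (assumed rule)."""
    return round(shaped_width_in_cells)


LOGICAL_CELLS = 7  # what wcwidth()-style accounting assigns to both words

for label, shaped in [("abcdefg", 7.0), ("complex, narrow", 5.3), ("complex, wide", 8.6)]:
    column = bar_column(shaped)
    drift = column - LOGICAL_CELLS
    print(f"{label:16} bar at column {column}, application expects {LOGICAL_CELLS}, drift {drift:+d}")
```

Under these assumptions the bar after the shaped word lands two cells early or two cells late, while the application that printed both lines still believes both bars sit in column 7.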
Re: Bidi paragraph direction in terminal emulators
> From: Egmont Koblinger > Date: Sat, 9 Feb 2019 19:25:08 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > > You need to use what HarfBuzz tells you _instead_ of wcswidth. It is > > in general wrong to use wcswidth or anything similar when you use a > > shaping engine and support complex script shaping. > > This approach is not viable at all. > [...] I'm probably missing something, because I don't see the grave problems you hint at. Any width provided back by a shaper can be rounded to the nearest integral character cell, so your canvas can still remain rectangular. And I see no reason why an application should be bothered by the actual number of character cells occupied by the text it wrote on display. So what exactly is not viable in using the width reported back by the shaper? > If you say that the font should determine the logical width, you need > to start building up something brand new from scratch. Are you saying that a terminal cannot work with variable-pitch fonts? > Terminal emulators do have strong limitations. Complex text rendering > can only work to the extent we can squeeze it into these limitations. No one said anything to the contrary, AFAICT.
Re: Bidi paragraph direction in terminal emulators
On Sat, Feb 9, 2019 at 7:07 PM Eli Zaretskii wrote: > You need to use what HarfBuzz tells you _instead_ of wcswidth. It is > in general wrong to use wcswidth or anything similar when you use a > shaping engine and support complex script shaping. This approach is not viable at all. Terminal emulators have an internal data structure that they maintain, a matrix of character cells. Every operation is performed here, every escape sequence is defined on this layer what it does, the cursor position is tracked on this layer, etc. You can move the cursor to integer coordinates, overwrite the letter in that cell, and do plenty of other operations (like push the rest to the right by one cell). If you change these fundamentals, most of the terminal-based applications will fall apart big time. This behavior has to be absolutely independent from the font. The application running inside the terminal doesn't and cannot know what font you use, let alone how harfbuzz is about to render it. (You can even have no font at all, such as with the libvterm headless emulator library, or a detached screen or tmux session; or have multiple fonts at the same time if a screen or tmux session is attached from multiple graphical emulators.) So one part of a terminal emulator's code is responsible for maintaining this matrix of characters according to the input it receives. Another part of their code is responsible for presenting this matrix of characters on the UI, doing the best it can. If you say that the font should determine the logical width, you need to start building up something brand new from scratch. You need to have something that doesn't have concepts like "width in characters". You need to redefine cursor movement and many other escape sequences. You need to heavily adjust the behavior of a gazillion of software, e.g. zip's two-column output, anything that aligns in columns (e.g. 
midnight commander, tmux's vertical split etc.), the shell's (or readline's) command editing and wrapping to multiple lines, ncurses, and so on, all the way to e.g. fullscreen text editors like Emacs. And then we're not talking about terminal emulators anymore, as we know them now, but something new, something pretty different. Terminal emulators do have strong limitations. Complex text rendering can only work to the extent we can squeeze it into these limitations. cheers, egmont
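The "matrix of character cells" described above can be sketched as a toy model. Everything here is an assumption for illustration: the class Grid, the helper cell_width (a rough stand-in for wcwidth() built on Python's unicodedata module) and the overflow rule are invented, and no real emulator is claimed to work exactly this way.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Rough stand-in for wcwidth(): 0 for nonspacing marks, 2 for wide CJK, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


class Grid:
    """A fixed matrix of character cells with an integer cursor position."""

    def __init__(self, rows: int, cols: int) -> None:
        self.cols = cols
        self.cells = [[" "] * cols for _ in range(rows)]
        self.row = 0
        self.col = 0

    def move_to(self, row: int, col: int) -> None:
        self.row, self.col = row, col

    def put(self, text: str) -> None:
        for ch in text:
            width = cell_width(ch)
            if width == 0:
                # A nonspacing mark joins the cell of the preceding base character.
                target = self.col - 1
                if target > 0 and self.cells[self.row][target] == "":
                    target -= 1  # skip the filler half of a double-wide character
                if target >= 0:
                    self.cells[self.row][target] += ch
                continue
            if self.col + width > self.cols:
                break  # simplification: drop overflow instead of wrapping
            self.cells[self.row][self.col] = ch
            if width == 2:
                self.cells[self.row][self.col + 1] = ""  # filler for the second cell
            self.col += width


grid = Grid(rows=2, cols=10)
grid.put("abcdefg|")
grid.move_to(1, 0)
grid.put("e\u0301bcdefg|")  # "e" + COMBINING ACUTE ACCENT share one cell
print(grid.cells[0].index("|"), grid.cells[1].index("|"))
```

Note that nothing in this model knows about a font: the column of each "|" follows from the code points alone, which is the font independence argued for above.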
Re: Bidi paragraph direction in terminal emulators
> Date: Sat, 9 Feb 2019 18:42:52 +0100 > Cc: unicode Unicode Discussion > From: Egmont Koblinger via Unicode > > What if harfbuzz tells us that the overall width for rendering a > particular grapheme cluster is significantly different from its > designated area (the number of character cells [wcswidth()] > multiplied by the width of each)? You need to use what HarfBuzz tells you _instead_ of wcswidth. It is in general wrong to use wcswidth or anything similar when you use a shaping engine and support complex script shaping.
Re: Bidi paragraph direction in terminal emulators
Hi Richard, On Sat, Feb 9, 2019 at 3:08 PM Richard Wordingham via Unicode wrote: > It would be good to be able to access a maintained statement of the > VTE rules for allocating characters to a cell, or group of cells, as > appropriate. What VTE did, up to a couple of days ago: It opens the font, and measures the ASCII 33-126 or so characters, takes their average size (well, in the case of a monospace font, they should all have the same size); this determines the cell size. Then every character cell is rendered individually, using Pango or Cairo or I'm not sure what exactly – there are like three paths in the source, the details are unclear to me. A cell might contain a base character + nonspacing combining accents, these are passed together to Pango and friends, so they render it as one unit. The glyph is aligned to the left of its designated cell area, overflowing on the right (and thus potentially overlapping with the next glyph) if it's wider than its designated area. As a special case, two adjacent cells might contain a double wide (typically CJK) character, but it's not that special after all: it's also displayed aligned to the left edge of its first cell. What I improved a couple of days ago (to be released in vte-0.56), for Devanagari and friends, although I know there's more than this to address these scripts properly: If a cell contains a regular letter, and the next cell contains a spacing combining mark, then these two are passed to Pango in a single step, that is, the spacing combining mark is applied around its base letter by Pango as expected. (Previously the spacing combining mark was rendered on its own, around a dotted circle, which was obviously pretty bad.) What I'm working on currently, as you all know by now, is BiDi-shuffling the cells before rendering them (hopefully for vte-0.58). This is how VTE works now, but it's by no means a specification, and tailoring a font to this behavior is probably not the right approach. 
Instead, VTE's behavior should be improved. We have a pending feature request (which I've already linked) to use HarfBuzz for rendering the glyphs, which would then render grapheme clusters beautifully. The problem that I don't know how to address is: What if harfbuzz tells us that the overall width for rendering a particular grapheme cluster is significantly different from its designated area (the number of character cells [wcswidth()] multiplied by the width of each)? cheers, egmont > > > > (b) With a terminal that expects a fixed width font, surely the > > > terminal decides how many cells it allocates to a group of > > > characters, and the font designer has to come up with a suitable > > > value based on that. > > > > Yes. A terminal emulator that works with a shaper should probably > > post-process the width information returned by the shaper for these > > purposes. > > Perhaps it should base the number of cells on the width of the > clusters. However, continuing with my example, U+1789 KHMER LETTER NYO > as a base character is too wide to fit in a cell, and the next > character will overwrite its right-hand part. From this I deduce that it > is allocated just one cell. Gnome terminal is not alone in doing this, > but it does better than some, in my opinion, in that the overflow of the > foreground of one cell is not obliterated by the background of the > next cell. U+1789 has an East Asian width property of 'Neutral', which > is distinctly unhelpful. > > What I would like is a specification of what a font must do to avoid > such problems. > > > > > I don't see how you can expect wcwidth, or any other > > > > interface that was designed to work with _characters_, to be > > > > useful when you need to display grapheme clusters. > > It, or something similar but worse, gets used, especially when moving > the cursor for editing. 
> > > > Well I can envisage a decision being made that a grapheme cluster > > > str (as decreed by the terminal) shall occupy wcswidth(str) cells - > > > "The wcswidth() function returns the number of column positions for > > > the wide-character string s, truncated to at most length n". > > > > AFAIU, the shaping engine returns its output in terms of font glyph > > numbers, not character codepoints, so you cannot in general call > > wcswidth on them. The shaper also returns the advance information, > > which serves instead of wcwidth and related APIs for determining the > > actual width on display. > > Unfortunately, when the rectangular grid is being preserved, > typographical advance width is generally ignored when determining the > placement of characters. Now, this is not always true; one can have > the situation where the the positioning of characters respects the > advance widths, but the positioning of the cursor assumes a fixed-width > rectangular grid. I have found working with that to be extremely > confusing. > > Richard. >
Encoding colour (from Re: Encoding italic)
Egmont Koblinger wrote: Should this scheme be extended for colors, too? What to do with the legacy 8/16 as well as the 256-color extensions wrt. the color palette? Should Unicode go into the business of defining a fixed set of colors, or allow to alter the palette colors using the OSC 4 and friends escape sequences which supported by about half of the terminal emulators out there? Encoding colour is already a topic in relation to emoji and maybe could be extended to other characters. A stateful method, though which might be useful for plain text streams in some applications, would be to encode as characters some of the glyphs for indicating colours and the digit characters to go with them from page 5 and from page 3 of the following publication. http://www.users.globalnet.co.uk/~ngo/locse027.pdf What to do with things that Unicode might also want to have, but doesn't exist in terminal emulators due to their nature, such as switching to a different font size? Well, if people were to want to do it, there could be a character encoded in the Specials section and then use that character as a base character and follow it with a sequence of tag characters. William Overington Saturday 9 February 2019
Re: Encoding colour (from Re: Encoding italic)
Previously I wrote: A stateful method, though which might be useful for plain text streams in some applications, would be to encode as characters some of the glyphs for indicating colours and the digit characters to go with them from page 5 and from page 3 of the following publication. http://www.users.globalnet.co.uk/~ngo/locse027.pdf Thinking about this further, for this application copies of the glyphs could be redesigned so as to be square and could be emoji-style and the meanings of the characters specifying which colour component is to be set could be changed so that they refer to the number previously entered using one or more of the special digit characters. Thus the setting of colour components could be done in the same reverse notation way that the FORTH computer language works. Yet although the colour components thus set would be stateful until changed there would be no Escape sequence and if an application did not support interpretation of the characters as setting colours, they would just be displayed as glyphs, each either as a particular glyph or as a .notdef glyph. William Overington Saturday 9 February 2019
Re: Encoding italic
On Sat, 9 Feb 2019 04:52:30 -0800 David Starner via Unicode wrote: > Note that this is actually the only thing that stands out to me in > Unicode not supporting older character sets; in PETSCII (Commodore > 64), the high-bit character characters were the reverse (in this > sense) of the low-bit characters. Later ISCII has some styling codes, bold and italic amongst them. Richard.
Re: Bidi paragraph direction in terminal emulators
On Sat, 09 Feb 2019 09:42:09 +0200 Eli Zaretskii via Unicode wrote: > > Date: Sat, 9 Feb 2019 00:18:14 + > > From: Richard Wordingham via Unicode > > > > > For character composition, you must have a shaping engine to talk > > > to, and the shaper should tell you the width of each grapheme > > > cluster it returns. > > > > (a) What defines the grapheme clusters? The definition might be > > terminal-specific. > > Well, the "you" above alluded to the terminal emulator, of course. > The grapheme clusters are determined by the shaping engine that the > emulator must call when appropriate (or always). I find it very hard to believe that that is how it works with GNOME Terminal (Version 3.18.3, using VTE Version 0.42.5). At the command line I typed in the Khmer script string ក្កេក (KA, COENG, KA, SIGN E, KA), and saw the string split into four columns (KA, COENG), (KA), (SIGN E), (KA), with each column given the same width. When written correctly, SIGN E is first in visual order. The fourth column was displayed on top of the third column, which contained a dotted circle to show that SIGN E on its own was not grammatically correct. If I were writing a Khmer font for use with Gnome terminal, I would attempt to ensure that the display for SIGN E fitted in a single cell. Of course, the renderer's grapheme cluster boundaries don't always match appearances. To get the traditional placement of U+1A58 TAI THAM SIGN MAI KANG LAI, I end up with it being a mark glyph one cluster later than HarfBuzz indicates it to be. It would be good to be able to access a maintained statement of the VTE rules for allocating characters to a cell, or group of cells, as appropriate. > > (b) With a terminal that expects a fixed width font, surely the > > terminal decides how many cells it allocates to a group of > > characters, and the font designer has to come up with a suitable > > value based on that. > > Yes. 
A terminal emulator that works with a shaper should probably > post-process the width information returned by the shaper for these > purposes. Perhaps it should base the number of cells on the width of the clusters. However, continuing with my example, U+1789 KHMER LETTER NYO as a base character is too wide to fit in a cell, and the next character will overwrite its right-hand part. From this I deduce that it is allocated just one cell. Gnome terminal is not alone in doing this, but it does better than some, in my opinion, in that the overflow of the foreground of one cell is not obliterated by the background of the next cell. U+1789 has an East Asian width property of 'Neutral', which is distinctly unhelpful. What I would like is a specification of what a font must do to avoid such problems. > > > I don't see how you can expect wcwidth, or any other > > > interface that was designed to work with _characters_, to be > > > useful when you need to display grapheme clusters. It, or something similar but worse, gets used, especially when moving the cursor for editing. > > Well I can envisage a decision being made that a grapheme cluster > > str (as decreed by the terminal) shall occupy wcswidth(str) cells - > > "The wcswidth() function returns the number of column positions for > > the wide-character string s, truncated to at most length n". > > AFAIU, the shaping engine returns its output in terms of font glyph > numbers, not character codepoints, so you cannot in general call > wcswidth on them. The shaper also returns the advance information, > which serves instead of wcwidth and related APIs for determining the > actual width on display. Unfortunately, when the rectangular grid is being preserved, typographical advance width is generally ignored when determining the placement of characters. 
Now, this is not always true; one can have the situation where the positioning of characters respects the advance widths, but the positioning of the cursor assumes a fixed-width rectangular grid. I have found working with that to be extremely confusing. Richard.
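The character properties mentioned in this message can be inspected with Python's standard unicodedata module. The sketch below is illustrative only: the helper names describe and naive_columns are made up, and the "one cell per character that is not a nonspacing or enclosing mark" rule is an assumption chosen because it reproduces the four columns reported above, not a statement of what VTE actually implements.

```python
import unicodedata

KHMER = "\u1780\u17D2\u1780\u17C1\u1780"  # KA, COENG, KA, SIGN E, KA


def describe(text):
    """List code point, name, general category and East Asian width per character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
            unicodedata.east_asian_width(ch),
        )
        for ch in text
    ]


def naive_columns(text):
    """Assumed rule: every character except Mn/Me marks takes exactly one cell."""
    return sum(0 if unicodedata.category(ch) in ("Mn", "Me") else 1 for ch in text)


for row in describe(KHMER + "\u1789"):  # U+1789 is KHMER LETTER NYO
    print(*row)
print("cells under the naive rule:", naive_columns(KHMER))
```

Under that rule COENG (a nonspacing mark) shares a cell with the preceding KA, while SIGN E (a spacing mark) gets a cell of its own, which matches the (KA, COENG), (KA), (SIGN E), (KA) split described above.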
Re: Encoding italic
On Sat, Feb 9, 2019 at 4:58 AM David Starner via Unicode < unicode@unicode.org> wrote: > > On Sat, Feb 9, 2019 at 3:59 AM Kent Karlsson via Unicode < > unicode@unicode.org> wrote: > >> >> Den 2019-02-08 21:53, skrev "Doug Ewell via Unicode" > >: >> > • Reverse on: ESC [7m >> > • Reverse off: ESC [27m >> >> "Reverse" = "switch background and foreground colours". >> >> This is an (odd) colour thing. If you want to go with (full!) colour >> (foreground and background), fine, but the "reverse" is oddball (and >> based on what really old terminals were limited to when it comes to >> colour). >> > > Note that this is actually the only thing that stands out to me in Unicode > not supporting older character sets; in PETSCII (Commodore 64), the > high-bit character characters were the reverse (in this sense) of the > low-bit characters. > This is true, many legacy character sets encoded reverse-video characters as wholly-separate characters, and even allowed them in contexts widely considered plain-text such as file names. This makes reverse-video possibly the one text attribute best argued to be worthy of encoding in Unicode. But I can already tell you it won't work, because we made such an argument in an early version of L2/19-025, and even proposed using VS14, the very same VS William Overington has since swiped from us for italics. That proposal was shot down rather quickly. Bold, italics, etc. don't even stand a chance.
Re: Encoding italic
On Sat, Feb 9, 2019 at 3:59 AM Kent Karlsson via Unicode < unicode@unicode.org> wrote: > > Den 2019-02-08 21:53, skrev "Doug Ewell via Unicode" >: > > • Reverse on: ESC [7m > > • Reverse off: ESC [27m > > "Reverse" = "switch background and foreground colours". > > This is an (odd) colour thing. If you want to go with (full!) colour > (foreground and background), fine, but the "reverse" is oddball (and > based on what really old terminals were limited to when it comes to > colour). > Note that this is actually the only thing that stands out to me in Unicode not supporting older character sets; in PETSCII (Commodore 64), the high-bit characters were the reverse (in this sense) of the low-bit characters.
Re: Encoding italic
Den 2019-02-08 21:53, skrev "Doug Ewell via Unicode" : > I'd like to propose encoding italics and similar display attributes in > plain text using the following stateful mechanism: Note that these do NOT nest (no stack...), just state changes for the relevant PART of the "graphic" (i.e. style) state. So the approach in that regard is quite different from the approach done in HTML/CSS. > Italics on: ESC [3m > Italics off: ESC [23m > Bold on: ESC [1m > Bold off: ESC [22m > Underline on: ESC [4m (implies turning double underline off) Underline, double: ESC [21m (implies turning single underline off) > Underline off: ESC [24m > Strikethrough on: ESC [9m > Strikethrough off: ESC [29m > Reverse on: ESC [7m > Reverse off: ESC [27m "Reverse" = "switch background and foreground colours". This is an (odd) colour thing. If you want to go with (full!) colour (foreground and background), fine, but the "reverse" is oddball (and based on what really old terminals were limited to when it comes to colour). I'd rather include 'ESC [50m' (not variable spacing, i.e. "monospace" font) and 'ESC [26m' (variable spacing, i.e. "proportional" font). Recall that this is NOT for terminal emulators but for styling applied to text outside of terminal emulators. (Terminal emulators already implement much of this and more; albeit sometimes wrongly). This would be handy for including (say) programming code or computer commands (or for that matter, "ASCII art", or more generally "Unicode art") in otherwise "ordinary" text... (The "ordinary" text preferably set in a proportional font.) > Reset all attributes: ESC [m (Actually 'ESC [0m', with the 0 default-able.) Handy, agreed, but not 100% necessary. These ESC-sequences should not normally be inserted "manually" but by a text editor program, using the conventional means of "making bold" etc. (ctrl-b, cmd-b, "bold" in a menu); only "hackers" (in the positive sense) would actually bother about the command sequences as such. 
/Kent K > where ESC is U+001B. > > This mechanism has existed for around 40 years and is already supported > as widely as any new Unicode-only convention will ever be. > > -- > Doug Ewell | Thornton, CO, US | ewellic.org > >
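For readers who want to see the sequences listed above in action, here is a minimal sketch that wraps a string in the quoted "on" and "off" codes. The table SGR and the function styled are hypothetical names introduced for illustration; only the numeric codes come from the list in this message.

```python
ESC = "\x1b"  # U+001B

# (on, off) parameter pairs as listed in the message above.
SGR = {
    "italic": (3, 23),
    "bold": (1, 22),
    "underline": (4, 24),
    "strikethrough": (9, 29),
    "reverse": (7, 27),
}


def styled(text: str, *attrs: str) -> str:
    """Switch each attribute on before the text and off again afterwards.

    There is no nesting: each sequence only changes one part of the style state.
    """
    on = "".join(f"{ESC}[{SGR[name][0]}m" for name in attrs)
    off = "".join(f"{ESC}[{SGR[name][1]}m" for name in reversed(attrs))
    return f"{on}{text}{off}"


print(styled("voracious", "italic"))
print(repr(styled("note", "bold", "underline")))
```

Whether the first line actually shows up in italics depends on the program displaying the output; the repr() line shows the raw sequences regardless.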
Re: Encoding italic
Den 2019-02-08 22:29, skrev "Egmont Koblinger via Unicode" : > (Mind you, I don't find it a good idea to add italic and whatnot > formatting support to Unicode at all... but let's put aside that now.) I don't think Doug meant to "add it to the Unicode standard", just to have a summary of "handy esc-sequences (actually command-sequences) for simple styling of text" picked from long-standing (text level...) standards. > There are a lot of problems with these escape sequences, and if you go > for a potentially new standard, you might not want to carry these > problems. > > There is not a well-defined framework for escape sequences. In this > particular case you might say it starts with ESC [ and ends with the > letter 'm', but how do you know where to end the sequence if that > letter 'm' just doesn't arrive? Terminal emulators have extremely There is an overriding "basic (overall) syntax" for esc-seq/ command-sequences that do not include a string argument (like OSC, APC, ...). IIUC it is (originally as byte sequences, but here as character sequences): \u001B[\u0020-\u002F]*[\u0030-\u007E]| (\u001B'['|\u009B)[\u0030-\u003F]*[\u0020-\u002F]*[\u0040-\u007E] (no newline or carriage return in there). True, that has no direct limit, but it would not be unreasonable to set a limit of (say) max 30 characters. Potential (i.e. starting with ESC) esc-"sequences" that do not match the overall syntax or are too long can simply be rendered as is (except for the ESC itself). The esc/command sequences (that match) but are not interpreted should be ignored in "normal" (not "show invisibles" mode) display. They are unlikely to be "default ignored" by such things as sorting (and should preferably be filtered out beforehand, if possible). But if we compare to other rich text editors, the command sequences should be ignored by (interactive) searching, just like HTML tags are ignored in interactive searching (the internal representation "skipping" the HTML tags in one way or another). 
HTML tags should also (when the text is known to be HTML) be filtered out before doing such things as sorting. > complex tables for parsing (and still many of them get plenty of > things wrong). It's unreasonable for any random small utility > processing Unicode text to go into this business of recognizing all > the well-known escape sequences, not even to the extent to know where > they end. Whatever is designed should be much more easily parseable. > Should you say "everything from ESC[ to m", you'll cause a whole bunch > of problems when a different kind of escape sequence gets interpreted > as Unicode. The escape/command sequences would not be part of Unicode (standard). > A parser, by the way, would also have to interpret combined sequences > like ESC[3;0;1m or alike, for which I don't see a good reason as > opposed to having separate sequences for each. Also, it should be Formally covered by the (non-Unicode) standards, but optional (IIUC). > carefully evaluated what to do with C1 (U+009B) instead of the C0 ESC[ > opening for an escape sequence here terminal emulators vary. These > just make everything even more cumbersome. > > ECMA-48 8.3.117 specifies ESC[1m as "bold or increased intensity". I think one should interpret these in a "modern" way, not looking too much at what old terminals were limited to. (Colour ("increased intensity") should be handled completely separately from bold.) > Should this scheme be extended for colors, too? What to do with the > legacy 8/16 as well as the 256-color extensions wrt. the color > palette? Should Unicode go into the business of defining a fixed set > of colors, or allow to alter the palette colors using the OSC 4 and > friends escape sequences which supported by about half of the terminal > emulators out there? IF extending to colour, only refer to the "true colour" (RGB) command sequence. The colour palette versions are for the limitations of (semi-)old terminals. 
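The overall command-sequence syntax given earlier in this message, together with the suggested length limit, can be sketched as a regular expression used to filter text before sorting or searching. This is a rough illustration under stated assumptions: the names PATTERN, MAX_LEN and strip_sequences are made up, the limit of 30 is the "say" figure from above, and the control-sequence alternative is tried first because "ESC [" on its own would otherwise already match the plain escape-sequence alternative.

```python
import re

# Control sequence: (ESC "[" or U+009B), parameters, intermediates, final character.
CONTROL_SEQ = r"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]"
# Plain escape sequence: ESC, intermediates, final character.
ESCAPE_SEQ = r"\x1b[\x20-\x2f]*[\x30-\x7e]"
PATTERN = re.compile(f"{CONTROL_SEQ}|{ESCAPE_SEQ}")
MAX_LEN = 30  # assumed limit, following the suggestion in the message


def strip_sequences(text: str) -> str:
    """Drop well-formed sequences; for over-long ones drop only the introducer character."""

    def replace(match: re.Match) -> str:
        sequence = match.group(0)
        return "" if len(sequence) <= MAX_LEN else sequence[1:]

    return PATTERN.sub(replace, text)


print(strip_sequences("plain \x1b[1mbold\x1b[22m and \x1b[3;0;1mcombined\x1b[m text"))
```

A filter like this only recognizes where a sequence ends; it does not interpret what the sequence means, which keeps it small enough for the "random small utility" case raised in the quoted text.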
> For 256-colors and truecolors, there are two or three syntaxes out > there regarding whether the separator is a colon or a semicolon. It can only be a colon. Using a semicolon would interfere with the syntax for multiple style specifications in one command sequence. (I by mistake wrote a semicolon there in an earlier post; sorry.) > Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m > for curly underline. What to do with them? Where to draw the line what (Note colon, not semicolon, as separator.) Possible, partially matching the capabilities for underlining via CSS (solid, dotted, dashed, wavy, double). Depends on how many styling options one wants to pick up. > to add to Unicode and what not to? Will Unicode possibly be a I don't think anyone wants to make this part of the Unicode standard. (At the most a Unicode technical note...; from Unicode's point of view.) [...] > What to do with things that Unicode might also want to have, but > doesn't exist in terminal emulators due to their nature, such as > switching
Re: Encoding italic
On Fri, 8 Feb 2019 18:08:34 -0800 Asmus Freytag via Unicode wrote: > On 2/8/2019 5:42 PM, James Kass via Unicode wrote: > You are still making the assumption that selecting a different glyph > for the base character would automatically lead to the selection of a > different glyph for the combining mark that follows. That's an iffy > assumption because "italics" can be realized by choosing a separate > font (typographically, italics is realized as a separate typeface). The usual practice is to look for a font that supports both base character and mark. > Under the implicit assumptions bandied about here, the VS approach > thus reveals itself as a true rich-text solution (font switching) > albeit realized with pseudo coding rather than markup, markdown or > escape sequences. Isn't that already the case if one uses variation sequences to choose between Chinese and Japanese glyphs? >> Of course, the user might insert VS14s without application >> assistance. In which case hopefully the user knows the rules. The >> worst case scenario is where the user might insert a VS14 after a >> non-base character, in which case it should simply be ignored by any >> application. It should never “break” the display or the processing; >> it simply makes the text for that document non-conformant. (Of >> course putting a VS14 after “ê” should not result in an italicized >> “ê”.) Is there any obligation on applications to ignore it? In plain text, the Unicode rules allow the application to choose to render every third 'ê' as italic. Possibly it comes down to the mens rea of the application (or of its coder or specifier), but without mentalism an application could opt to treat <ê, VS14> as . A relevant concern would be 'voracious' with the first 'o' italicised by VS14. How would current typeface selection logic work? I can envisage only being in the cmap of an italic font. Richard.
Re: Bidi paragraph direction in terminal emulators
> From: Elias Mårtenson > Date: Sat, 9 Feb 2019 13:33:49 +0800 > Cc: Egmont Koblinger , unicode > > Moreover, emitting the control sequences that set the mode is in > itself a complication, because if the terminal doesn't support them, > the result could be corrupted display. You will need methods of > detecting the support, and those detection methods usually involve > sending another control sequence to the terminal and waiting for > response, something that complicates applications and causes delays in > displaying output. > > That's what the TERM environment variable is for though. That's not indicative enough when some version of a terminal starts to support a feature not supported by previous versions of the same terminal. Happens a lot with terminal emulators such as xterm, which are under active development, and add features all the time.