Re: Proposal for BiDi in terminal emulators
> Date: Sun, 3 Feb 2019 20:35:18 + > From: Richard Wordingham via Unicode > > > What is "screen overwriting" in this context? > > When instead of adding lines to the bottom, new lines are added on top > of and replace existing lines. I prefer the scrollable terminal > behaviour to the teletype behaviour of Emacs when running the > Linux(?) monitor program 'top', but being a fuddy duddy I prefer the > teletype behaviour of Emacs for 'man'. From an error message from > 'info', it seems that the Emacs buffer is classified as a 'dumb' > terminal. Try customizing scroll-conservatively, it sounds like you want that.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Mon, 4 Feb 2019 00:36:23 +0100 Egmont Koblinger via Unicode wrote: > Now, back to terminals. > > The smallest possible viable definition of a "paragraph" in terminal > emulators is stuff between one newline and the next one. > > It would require a hell lot of work, redesigning (overcomplicating) > plenty of basics of terminal emulation to be able to come up with > smaller units, e.g. cells of a table – a concept that doesn't > currently exist in this world –, I don't find any such approach > feasible at all. The concept appears to exist in the form of the fields of the fifth edition of ECMA-48. Have you digested this ambitious standard? ECMA-48 has the concept of hyphenation and wrapping! (Well, in Appendix C it does. I haven't fully tied it in with the receipt of characters.) Richard.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Mon, 4 Feb 2019 00:36:23 +0100 Egmont Koblinger via Unicode wrote: > I wish to store and deliver the following text, as it's layed out here > in logical order. That is, the order as the bytes appear in the text > file, as I typed them from the keyboard, is laid out here strictly > from left to right, with uppercase standing for RTL letters, and no > mirroring: > > lorem ipsum ABC <[ DEF foobar > Let's assume that me, as the producer of the text file, wish to create > a typical README in the spirit of COPYING.GPL and similar text files, > with the paragraph definition that two consecutive newline characters > (that is: a single empty line) delimit paragraphs; and a single > newline is equivalent to a space. Since I'd prefer to keep a margin of > 16 characters in the source file (for demo purposes), I can take the > liberty of replacing the space after "ABC" by a single newline. (Maybe > my text editor does this automatically.) The file's contents, again > the logical order laid out from left to right, top to bottom, becomes > this: > > lorem ipsum ABC > <[ DEF foobar That split is wrong if you want the non-HTML text to lay out reasonably well in anything but a higher-order protocol forcing RTL. You need to split it as: lorem ipsum ABC <[ DEF foobar or lorem ipsum ABC <[ DEF foobar Richard.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Sun, 03 Feb 2019 19:50:50 +0200 Eli Zaretskii via Unicode wrote: > Do you see how this is carefully formatted to avoid overflowing an > 80-column line of a typical terminal? Now suppose this is translated > into a RTL language, which causes the Copyright line to start with a > strong R letter (because "Copyright" is translated). You will see the > first line flushed to the right margin, then the next line flushed to > the left margin (because it's a separate paragraph, and starts with a > strong L letter). Then the line which says "The default action..." > will again start at the right. And so on and so forth -- the result > is extremely ugly. Depending on the environment. If you look at it in Notepad, all lines will be LTR or all lines will be RTL. Would not a careful translator either ensure that each non-blank line had a strong character and that all first strong characters were (a) L, (b) R or (c) AL? Text in LTR scripts tends not to be so careful. Richard.
Re: Proposal for BiDi in terminal emulators
On Sun, 03 Feb 2019 18:13:06 +0200 Eli Zaretskii via Unicode wrote: > Actually, you pass the characters to be shaped in logical order, and > then display the produced grapheme clusters in visual order. Some early systems supporting computerised Hebrew script did pass characters in left-to-right order. This works fairly well when the contents of character cells do not interact. Richard.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Eli, (I'm responding in multiple emails.) The Unicode BiDi algorithm states that it operates on paragraphs of text, and leaves it up to a higher protocol to define what a paragraph exactly is. What's the definition of "paragraph" in the context of plain text files? I don't think there's a single well-established practice. In some particular text files, every explicit newline character starts a new paragraph. In some (e.g. COPYING.GPL and friends), an empty line (that is: two consecutive newline characters) separates two paragraphs. In some, e.g. in Emacs's TUTORIAL.he, or markdown files, it's way more complicated; there probably isn't a well-defined grammar for how exactly bullet list entries and the like should become new paragraphs. In the output of "dpkg -s packagename" consecutive lines indented by 1 space – except for those where there's only a single dot after the space – form the human-perceived paragraphs. There are surely several other syntaxes out there. If the producer of a text file uses a different definition than the viewer software, bugs can arise. I think this should be intuitively obvious, but just in case, let me give a concrete example. In this example I'll assume LTR paragraph direction set up by some external means; with autodetected paragraph direction it's much easier to come up with such breakages. I wish to store and deliver the following text, as it's laid out here in logical order.
That is, the order as the bytes appear in the text file, as I typed them from the keyboard, is laid out here strictly from left to right, with uppercase standing for RTL letters, and no mirroring: lorem ipsum ABC <[ DEF foobar The visual representation, what I expect to see in any decent viewer software, is, according to the BiDi algorithm, this: lorem ipsum FED ]> CBA foobar The visual representation, in a narrower viewport, might wrap for example like this: lorem ipsum CBA FED ]> foobar which is still correct, given that logical "ABC <[ DEF" is a single RTL run. (This assumes a viewer which, unlike Emacs, follows the Unicode BiDi algorithm for wrapping a paragraph into multiple lines.) Let's assume that I, as the producer of the text file, wish to create a typical README in the spirit of COPYING.GPL and similar text files, with the paragraph definition that two consecutive newline characters (that is: a single empty line) delimit paragraphs; and a single newline is equivalent to a space. Since I'd prefer to keep a margin of 16 characters in the source file (for demo purposes), I can take the liberty of replacing the space after "ABC" by a single newline. (Maybe my text editor does this automatically.) The file's contents, again the logical order laid out from left to right, top to bottom, becomes this: lorem ipsum ABC <[ DEF foobar This file, according to the paragraph definition chosen earlier, is equivalent to the unwrapped version shown before, and thus should convey the same message. If I view this file in a piece of software which uses the same paragraph definition for BiDi purposes, the contents will appear as expected. An example of such a viewer is a markdown converter's (that leaves single newlines as-is, and adds a "" at double newlines) output viewed as an html file in a browser. Here comes the twist. Let's view this latter file with a viewer that uses a _different_ definition for paragraph.
Let's view it in Gedit, Emacs, or the work-in-progress BiDi-aware VTE by "cat"ing it, where every newline begins a new paragraph – that's how these viewers define the notion of "paragraph" for the sake of BiDi. The visual layout in these viewers becomes: lorem ipsum CBA <[ FED foobar which is just not correct. Since here BiDi is run on the two lines separately, the initial "<[" is treated as LTR, placed at the wrong location in the wrong order, and the glyphs aren't mirrored. Now, Emacs ships a TUTORIAL.he which, for most of its contents (but not everywhere) seems to treat runs between empty lines as paragraphs, while Emacs itself is a viewer that treats runs between single newlines as paragraphs. That is, Emacs is inconsistent with itself. In case you think I got something wrong with Emacs: Could you please give exact definitions: - What are the exact units (so-called "paragraphs" by UAX9) that it runs BiDi on when it loads and displays a file? - What are the exact units (so-called "paragraphs" by UAX9) in TUTORIAL.he on which BiDi needs to be run in order to get the desired readable version? What most likely happens is that in order to see a difference, you'd need to have more special symbols, or at least a more special constellation of them. Probably TUTORIAL.he is just luckily simple enough that such a difference isn't hit. Another possibility is (and I cannot check because I can't speak Hebrew) that somewhere TUTORIAL.he "cheats" with the logical order to get the desired visual one. - Now, back to terminals. The smallest possible viable definition of a
Re: Proposal for BiDi in terminal emulators
On Sun, 03 Feb 2019 20:07:51 +0200 Eli Zaretskii via Unicode wrote: > > Date: Sun, 3 Feb 2019 17:45:06 + > > From: Richard Wordingham via Unicode > > > > > > So, what do you recommend I run grep from for Hebrew or Tai > > > > Lue? > > > > > > Inside Emacs, of course: "M-x grep RET" etc. > > > > That assumes you like using bindings for all the commands; I > > don't. > > What bindings? "M-x grep" just shows the Grep hits in a separate > window, you don't need to do anything except reading them. > > The advantage is that you get bidi reordering and text shaping for > free, something you won't get from most terminals. Which is why I try to remember to issue the Emacs command 'M-x shell' and issue grep commands from the buffer created thereby. The point I'm making is that this Emacs command hasn't made terminal emulators obsolete, even though it also does graphics. Richard.
Re: Proposal for BiDi in terminal emulators
On Sun, 03 Feb 2019 18:05:49 +0200 Eli Zaretskii via Unicode wrote: > > Date: Sat, 2 Feb 2019 21:49:40 + > > From: Richard Wordingham via Unicode > > > > Eli will probably tell me I'm behind the times, but there are a few > > places where a Gnome-terminal is better than an Emacs GUI window. > > One is colour highlighting of text found by grep. > > ??? The Emacs 'grep' command also highlights the matches, by > interpreting the escape sequences emitted by Grep the program it > invokes. > > > Another is that screen overwriting doesn't work in an Emacs > > window. > > What is "screen overwriting" in this context? When instead of adding lines to the bottom, new lines are added on top of and replace existing lines. I prefer the scrollable terminal behaviour to the teletype behaviour of Emacs when running the Linux(?) monitor program 'top', but being a fuddy duddy I prefer the teletype behaviour of Emacs for 'man'. From an error message from 'info', it seems that the Emacs buffer is classified as a 'dumb' terminal. Richard.
Re: Proposal for BiDi in terminal emulators
> Date: Sun, 3 Feb 2019 17:45:06 + > From: Richard Wordingham via Unicode > > > > So, what do you recommend I run grep from for Hebrew or Tai Lue? > > > > Inside Emacs, of course: "M-x grep RET" etc. > > That assumes you like using bindings for all the commands; I don't. What bindings? "M-x grep" just shows the Grep hits in a separate window, you don't need to do anything except reading them. The advantage is that you get bidi reordering and text shaping for free, something you won't get from most terminals. > Command recall and having completion options serve me very well. Your > suggestion comes unstuck when I attempt to switch between the window's > keyboard and the MULE keyboard in the middle of the command. 'M-x' > isn't recursive. This isn't an Emacs forum, so I will leave it at that; but you are wrong on all counts.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> From: Egmont Koblinger > Date: Sun, 3 Feb 2019 17:54:25 +0100 > Cc: unicode@unicode.org > > I'm arguing, although my reasons are not rock solid, that IMHO the > default should be the strict direction as set by SCP, without > autodetection. I think it's unreasonable and impractical to expect 'echo', 'cat', and its ilk to emit bidi controls (or any other controls) to force paragraph direction. For starters, they won't know what direction to force, because they don't understand the text they are processing. No, this simple case must work reasonably well with the application _completely_ oblivious to the bidi aspects. If this can't work reasonably well, I submit that the entire concept of having a bidi-aware terminal emulator doesn't "hold water". > > The fundamental problem here is that most "simple" utilities use hard > > newlines to present text in some visually plausible format. > > Could you please list examples? Just redirect any of them to a file, and look at the file with a hex editor. You will see a hard newline character, 0x0A, at the end of each line. > What I have in mind are "echo", "cat", "grep" and alike, they don't > care about the terminal width. Terminal width is not always relevant here, and I didn't mention it. However, as long as you allude to that, I think your garden-variety text utility does assume the width of a terminal window is 80 columns, and the messages displayed by these programs are formatted accordingly. > If an app cares about the terminal width, how does it care about it? > What does it use this information for? To truncate overlong strings, > for example? To break long lines at appropriate places, and to emit text that fits on a line in the first place. Just try invoking any such utility with the --help option, and you will see what I mean. I give one example below. > At this very moment I'd argue that such applications need > to do BiDi on their own, and thus set the terminal to explicit mode. 
> In ap app does any kind of string truncation, it can no longer > delegate the task of BiDi to the terminal emulator. I'm afraid this won't fly, because most "simple" utilities do it that way. If you insist on them doing their own bidi, you've just lost your cause. No upstream developer will be interested in adapting their utilities to a terminal emulator that requires them to do their own bidi. > I'm also mentioning that you cannot both logically and visually > truncate a BiDi string at once. I don't understand why you talk about truncation; I didn't. Here, look at this random example: Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license. Zip 3.0 (July 5th 2008). Usage: zip [-options] [-b path] [-t mmdd] [-n suffixes] [zipfile list] [-xi list] The default action is to add or replace zipfile entries from list, which can include the special name - to compress standard input. If zipfile and list are omitted, zip compresses stdin to stdout. -f freshen: only changed files -u update: only changed or new files -d delete entries in zipfile-m move into zipfile (delete OS files) -r recurse into directories -j junk (don't record) directory names -0 store only -l convert LF to CR LF (-ll CR LF to LF) -1 compress faster -9 compress better -q quiet operation -v verbose operation/print version info -c add one-line comments-z add zipfile comment -@ read names from stdin-o make zipfile as old as latest entry -x exclude the following names -i include only the following names -F fix zipfile (-FF try harder) -D do not add directory entries -A adjust self-extracting exe -J junk zipfile prefix (unzipsfx) -T test zipfile integrity -X eXclude eXtra file attributes -! use privileges (if granted) to obtain all aspects of WinNT security -$ include volume label -S include system and hidden files -e encrypt -n don't compress these suffixes -h2 show more help Do you see how this is carefully formatted to avoid overflowing an 80-column line of a typical terminal? 
Now suppose this is translated into a RTL language, which causes the Copyright line to start with a strong R letter (because "Copyright" is translated). You will see the first line flushed to the right margin, then the next line flushed to the left margin (because it's a separate paragraph, and starts with a strong L letter). Then the line which says "The default action..." will again start at the right. And so on and so forth -- the result is extremely ugly. > > Even when > > these utilities just emit text read from files (as opposed to > > generating the text from the program), you will normally see each line > > end with a hard newline, because the absolute majority of text files > > have a hard newline and the end of each line. > > How does a BiDi
Re: Proposal for BiDi in terminal emulators
On Sun, 03 Feb 2019 18:14:53 +0200 Eli Zaretskii via Unicode wrote: > > Date: Sun, 3 Feb 2019 02:43:06 + > > Cc: Kent Karlsson > > From: Richard Wordingham via Unicode > > > > So, what do you recommend I run grep from for Hebrew or Tai Lue? > > Inside Emacs, of course: "M-x grep RET" etc. That assumes you like using bindings for all the commands; I don't. Command recall and having completion options serve me very well. Your suggestion comes unstuck when I attempt to switch between the window's keyboard and the MULE keyboard in the middle of the command. 'M-x' isn't recursive. Still, your suggestion should be useful for grepping for ASCII stuff. Richard.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Eli, > The document cited at the beginning of the parent thread states that > "simple" text-mode utilities, such as 'echo', 'cat', 'ls' etc. should > use the "implicit" mode of bidi reordering, with automatic guessing of > the base paragraph direction. Not exactly. I take the SCP escape sequence from ECMA TR/53 (and slightly reinterpret it) so that it specifies the paragraph direction, plus introduce a new one that specifies whether autodetection is enabled. I'm arguing, although my reasons are not rock solid, that IMHO the default should be the strict direction as set by SCP, without autodetection. > The fundamental problem here is that most "simple" utilities use hard > newlines to present text in some visually plausible format. Could you please list examples? What I have in mind are "echo", "cat", "grep" and the like; they don't care about the terminal width. If an app cares about the terminal width, how does it care about it? What does it use this information for? To truncate overlong strings, for example? At this very moment I'd argue that such applications need to do BiDi on their own, and thus set the terminal to explicit mode. If an app does any kind of string truncation, it can no longer delegate the task of BiDi to the terminal emulator. I'm also mentioning that you cannot both logically and visually truncate a BiDi string at once. Either you truncate the logical string, which may result in a visual nonsense, or you truncate the visual string, risking that it's not an initial fragment of the data that ends up getting displayed. Along these lines I'm arguing that basic utilities like "cut" shouldn't care about BiDi, the logical behavior there is more important than the visual one. There could, of course, be sophisticated "bidi-cut" and similar utilities at one point which cut the visual string, but they should use the terminal's explicit mode.
> Even when > these utilities just emit text read from files (as opposed to > generating the text from the program), you will normally see each line > end with a hard newline, because the absolute majority of text files > have a hard newline and the end of each line. What does a BiDi text file look like, to begin with? Can a heavily BiDi text file be formatted to 72 (or whatever) columns using explicit newlines, keeping BiDi both semantically and visually correct? I truly doubt that. Can you show me such files? > When bidirectional text is reordered by the terminal emulator, these > hard newlines will make each line be a separate paragraph. And this > is a problem, because the result will be completely random, depending > on the first strong directional character in each line, and will be > visually very unpleasant. Just take the output produced by any > utility when invoked with, say, the --help option, and try imagining > how this will look when translated into a language that uses RTL > script. First, having no autodetection by default but rather an explicit control for the overall direction hopefully mitigates this problem. Second, I outline a possible future extension with a different definition of a "paragraph", maybe something between empty lines, or other kinds of explicit markers. > So I think determination of the paragraph direction even in this > simplest case cannot be left to the UBA defaults, and there's a need > to use "higher-level" protocols for paragraph direction. That higher level protocol is part of my recommendation, part of ECMA TR/53, as the SCP sequence. Does this make sense? cheers, egmont
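The two competing paragraph definitions discussed in this thread (every newline ends a paragraph, versus only blank lines separate paragraphs) can be compared with a small sketch. This is only an illustration: the function names are hypothetical, and it is not how any of the viewers mentioned above is actually implemented.

```python
import re


def paragraphs_per_newline(text):
    """Treat every non-empty line as its own paragraph."""
    return [line for line in text.split("\n") if line.strip()]


def paragraphs_per_blank_line(text):
    """Treat blank lines as separators; a single newline acts as a space."""
    blocks = re.split(r"\n\s*\n", text.strip("\n"))
    return [" ".join(block.split()) for block in blocks if block.strip()]


# Logical order, uppercase standing in for RTL letters as in the thread.
sample = "lorem ipsum ABC\n<[ DEF foobar\n\nsecond paragraph\n"

print(paragraphs_per_newline(sample))
print(paragraphs_per_blank_line(sample))
```

Under the first rule the RTL run "ABC <[ DEF" is cut into two units before any bidi processing happens, which is the mismatch the lorem ipsum example describes; under the second rule it stays in one unit.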
Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> Date: Sun, 03 Feb 2019 18:10:15 +0200 > Cc: richard.wording...@ntlworld.com, unicode@unicode.org > From: Eli Zaretskii via Unicode > > I think there are hard problems even for such "simple" utilities, and > I will start a separate thread about this. I think we spent enough time discussing issues of complex script shaping in terminal emulators, something that IMO took us too far afield. The basic problems with bidi reordering of text-mode output start much sooner, and are much more fundamental. I think they should be considered first. The document cited at the beginning of the parent thread states that "simple" text-mode utilities, such as 'echo', 'cat', 'ls' etc. should use the "implicit" mode of bidi reordering, with automatic guessing of the base paragraph direction. I think this already presents non-trivial problems. The fundamental problem here is that most "simple" utilities use hard newlines to present text in some visually plausible format. Even when these utilities just emit text read from files (as opposed to generating the text from the program), you will normally see each line end with a hard newline, because the absolute majority of text files have a hard newline at the end of each line. When bidirectional text is reordered by the terminal emulator, these hard newlines will make each line be a separate paragraph. And this is a problem, because the result will be completely random, depending on the first strong directional character in each line, and will be visually very unpleasant. Just take the output produced by any utility when invoked with, say, the --help option, and try imagining how this will look when translated into a language that uses RTL script. So I think determination of the paragraph direction even in this simplest case cannot be left to the UBA defaults, and there's a need to use "higher-level" protocols for paragraph direction.
IOW, the implicit mode described in the above-mentioned document needs to be augmented by a smarter method of determining the base paragraph direction. (I might have a suggestion for that, if people agree with the above reasoning.)
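To make the per-line effect concrete, here is a rough sketch of first-strong direction detection applied to each line separately, using Python's standard unicodedata module. It is a simplification of the UBA default: it ignores isolates and explicit directional controls, the function name is hypothetical, and the Hebrew word is only a stand-in for a translated "Copyright".

```python
import unicodedata


def first_strong_direction(line):
    """Return "RTL" or "LTR" from the first strong character, or None."""
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return None  # no strong character on this line


lines = [
    "\u05d6\u05db\u05d5\u05d9\u05d5\u05ea (c) 1990-2008 Info-ZIP",  # starts with Hebrew letters
    "Zip 3.0 (July 5th 2008).",
    "1990-2008",
]

for line in lines:
    print(first_strong_direction(line))
```

With every hard newline treated as a paragraph boundary, the first line would be laid out right-to-left and the second left-to-right, while the third has no strong character at all and falls back to whatever default the viewer picks.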
Re: Proposal for BiDi in terminal emulators
> Date: Sun, 3 Feb 2019 02:43:06 + > Cc: Kent Karlsson > From: Richard Wordingham via Unicode > > So, what do you recommend I run grep from for Hebrew or Tai Lue? Inside Emacs, of course: "M-x grep RET" etc.
Re: Proposal for BiDi in terminal emulators
> Date: Sun, 3 Feb 2019 01:30:26 + > From: Richard Wordingham via Unicode > > Shaping for RTL scripts happens on strings stored in logical order. > These are then laid out right to left, though the dominant usage of > the term 'advance width' for right-to-left glyph sequences feels > perversely different from the use for left to right glyph sequences. > > Passing text in the form of characters in left-to-right order is an > annoying distraction, presumably forced on you by the attempt to > maximise compatibility with existing systems. Actually, you pass the characters to be shaped in logical order, and then display the produced grapheme clusters in visual order.
Re: Proposal for BiDi in terminal emulators
> Date: Sat, 2 Feb 2019 23:02:10 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > On top of this, I make the clarification that combining marks need to > be reordered to be sent out to the terminal emulator _after_ their > base letter That is true in general regarding any text shaping: the shaping engine needs the characters to be submitted in the logical order. When Emacs works on a text-mode terminal, it sends characters to be shaped together, such as base character and its combining marks, in logical order, even when the surrounding text is reordered into visual order. > What I add is another mode (the technically less problematic > "implicit" mode where the terminal displays the contents just as any > BiDi-aware graphical text editor, browser etc. would do) for the > sake of "cat"-like simple utilities I think there are hard problems even for such "simple" utilities, and I will start a separate thread about this.
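The point about combining marks can be sketched as follows: when an RTL run is reversed into visual order, each base character and its marks are kept together in logical order. This is only an approximation under stated assumptions; it groups characters by nonzero canonical combining class rather than by full grapheme cluster rules, the function names are hypothetical, and it is not a description of what Emacs actually does.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch  # keep the mark after its base
        else:
            groups.append(ch)
    return groups


def visual_rtl_run(logical):
    """Reverse an all-RTL run cluster by cluster, not character by character."""
    return "".join(reversed(clusters(logical)))


# Hebrew bet + dagesh (a combining mark), followed by alef, in logical order.
logical = "\u05d1\u05bc\u05d0"

naive = logical[::-1]            # the dagesh ends up after the alef
careful = visual_rtl_run(logical)  # the dagesh still follows the bet
print([hex(ord(c)) for c in naive])
print([hex(ord(c)) for c in careful])
```

A plain character-by-character reversal moves the mark onto the wrong base, which is why the characters to be shaped together have to reach the terminal in logical order.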
Re: Proposal for BiDi in terminal emulators
> Date: Sat, 2 Feb 2019 21:49:40 + > From: Richard Wordingham via Unicode > > Eli will probably tell me I'm behind the times, but there are a few > places where a Gnome-terminal is better than an Emacs GUI window. One > is colour highlighting of text found by grep. ??? The Emacs 'grep' command also highlights the matches, by interpreting the escape sequences emitted by Grep the program it invokes. > Another is that screen overwriting doesn't work in an Emacs window. What is "screen overwriting" in this context?
Re: Proposal for BiDi in terminal emulators
> Date: Sun, 3 Feb 2019 03:02:13 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > > All I am saying is that your proposal should define what it means by > > visual order. > > Are you nitpicking on me not giving a precise definition on the > otherwise IMO freaking obvious "visual order" Most probably. The definition is trivial: the order of characters on display, from left to right. The only possible reason to split hairs here could be when some characters don't appear on display, like control characters. Other than that, there should be no doubt what visual order means.