Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Sun, Feb 17, 2019 at 1:59 PM Philippe Verdy wrote: > Resist this idea, I've not been impolite. I didn't say a word about you being impolite. I said I might be impolite for not wishing to continue this discussion in that direction. > I just want to show you that terminals are legacy environments You might have missed the thread's opening mail where I mentioned that I've been developing a terminal emulator for five years. So I'm not sure what you exactly want to show me about what a legacy environment it is; I think I perfectly know it. > that are far behind what is needed for proper internationalization For many languages (or should I say scripts) internationalization is pretty well solved in terminals. For others, requiring LTR complex rendering, so-so. For RTL scripts it's a straight disaster, an application can't even count on the letters of a word showing up in the expected order, no matter what it does. My work fixes the latter only, within(!) the limitations of this legacy environment. I don't find it feasible to get rid of this legacy (the concept of strict grid), and I find it a waste of time to ponder about it. Not sure why after about 200 mails on the topic, I still have a hard time getting this message through. Seems to me that folks here on the Unicode list want everything to be perfect for all the scripts at once and not compromise to the slightest bit; and don't really appreciate work that only offers partial improvement due to a special context's constraints. This is something I didn't expect when I posted to this list. At this point I think I've gathered all the actionable positive feedback I could (two issues: one is that shaping needs to be done differently, and the other one is that the paragraph direction should be detected on larger chunks of data (at least optionally) – thanks again for them, I'll rework my spec accordingly). 
For all the rest, irrelevant and hopeless stuff, like switching to proportional fonts, IMO it's high time we let this thread end here. cheers, egmont
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Fri, Feb 8, 2019 at 13:56, Egmont Koblinger wrote: > Philippe, I hate to say it, but at the risk of being impolite, I just > have to. > Resist this idea, I've not been impolite. I just want to show you that terminals are legacy environments that are far behind what is needed for proper internationalization. I raised the problem of monospaced fonts and the case of "dualspace" fonts: this is already used in legacy terminals to solve practical problems (and there are even data in the UCD about them). Dualspace is an excellent solution that should be extended even outside CJK contexts (for example to emojis, and various other South Asian scripts).
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Fri, Feb 8, 2019 at 10:36 PM Eli Zaretskii wrote: > No one in their right mind will run Emacs inside the Emacs terminal > emulator. And even for other applications, disabling bidi will almost > always be needed only for full-screen programs, which use curses-like > libraries to address the entire screen. So you'd switch off > reordering for the entire time you are running such an app, then > switch it back on after exiting. Exactly. But the question is: should it be the user who manually switches it on/off, or should it happen for them automatically under the hood? If the latter, how? My BiDi proposal answers this. Do you have another possible answer? > Are there any terminal emulators that support these sequences? Prior to my specs: Not that I'm aware of. As of my work being available: at least VTE and Mintty are working on it, and I know that iTerm2 was also waiting for some specification. I'm sincerely hoping for even more to follow. e.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> From: Egmont Koblinger > Date: Fri, 8 Feb 2019 17:44:53 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > For certain apps, one of the modes is required (e.g. for cat it's the > implicit mode). For other tasks it's the other mode (e.g. for emacs > the explicit mode). No one in their right mind will run Emacs inside the Emacs terminal emulator. And even for other applications, disabling bidi will almost always be needed only for full-screen programs, which use curses-like libraries to address the entire screen. So you'd switch off reordering for the entire time you are running such an app, then switch it back on after exiting. The other, simpler text applications will always need reordering to be active. > > You can hardly expect Emacs (or any other application) to support > > control sequences that are not yet defined, let alone standardized. > > The most essential sequence, BDSM to switch between implicit and > explicit modes, has been defined for like 28 years now. Sure I bring > slight changes and clarifications to it, as well as introduce new > ones. As of my recommendation which I've announced, these new ones are > defined as well. Are there any terminal emulators that support these sequences? > > When they become sufficiently widely available, I'm sure someone will > > add them to Emacs. > > There's always a chicken and egg problem with this attitude. At the > very least, I'm kindly asking Emacs to emit BDSM so that when it's > fired up on a gnome-terminal, it'll have the terminal's BiDi > automatically disabled. Feel free to file a feature request with the Emacs bug tracker about this. Somebody, maybe even myself, is likely to act on that at some point.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Eli, > Why would they want to toggle it back and forth? What are the use > cases where it makes sense to mix both modes? IME, you either need > one or the other, never both. (Back to the basics, which are mentioned pretty clearly in my specification, I believe, and I've also described here multiple times... sigh.) For certain apps, one of the modes is required (e.g. for cat it's the implicit mode). For other tasks it's the other mode (e.g. for emacs the explicit mode). In a typical terminal session, you don't just use one of these kinds of commands. You use various commands in a sequence, e.g. a cat followed by an emacs, then a zip, then whatnot, then emacs again, then a cat and a grep, etc... The very last thing I would want to do as a user is to toggle some setting back and forth, let alone remember which command needs which mode. > You can hardly expect Emacs (or any other application) to support > control sequences that are not yet defined, let alone standardized. The most essential sequence, BDSM to switch between implicit and explicit modes, has been defined for like 28 years now. Sure I bring slight changes and clarifications to it, as well as introduce new ones. As of my recommendation which I've announced, these new ones are defined as well. It's probably never going to be a de jure standard, adopted by ECMA or whatever "authority", but that's not what happens anywhere else in terminal emulators nowadays. An "authority" which doesn't keep up to date with innovations, doesn't have a feedback forum, and hasn't released a new version for 28 years, is clearly not suitable for making progress. We have just announced a public forum called "Terminal WG" for terminal emulator developers to collaborate and join their efforts wrt. new extensions, rather than ad-hoc collaborations or each going their own separate ways. We'd like its work to be widely accepted as a basis for the desired behavior. My BiDi work is one of the works hosted there. 
It'll probably never be an "authority" like ECMA, but hopefully will be some kind of well-respected place of specs to adhere to. > When they become sufficiently widely available, I'm sure someone will > add them to Emacs. There's always a chicken and egg problem with this attitude. At the very least, I'm kindly asking Emacs to emit BDSM so that when it's fired up on a gnome-terminal, it'll have the terminal's BiDi automatically disabled. This has nothing to do yet with Emacs's built-in terminal emulator. Addressing that is surely a much bigger chunk of work; I hope it'll happen if my BiDi proposal indeed turns out to be successful. cheers, egmont
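To make the request above concrete, here is a minimal sketch of how a full-screen application might switch the terminal to explicit mode on startup and switch it back on exit. It assumes that BDSM is ECMA-48 mode 8, toggled with the SM and RM sequences (CSI 8 h for implicit, CSI 8 l for explicit), and that implicit mode is the state to restore afterwards. The helper name explicit_bidi is hypothetical and is not taken from Emacs or from any terminal library.

```python
import contextlib
import sys

CSI = "\x1b["
BDSM_IMPLICIT = CSI + "8h"  # SM 8: the terminal reorders text itself
BDSM_EXPLICIT = CSI + "8l"  # RM 8: the terminal shows cells in the order sent


@contextlib.contextmanager
def explicit_bidi(stream=None):
    """Hypothetical helper: run a block with the terminal in explicit mode."""
    out = sys.stdout if stream is None else stream
    out.write(BDSM_EXPLICIT)
    out.flush()
    try:
        yield out
    finally:
        # Assumption: implicit mode is what the shell and simple tools expect.
        out.write(BDSM_IMPLICIT)
        out.flush()
```

A full-screen program would wrap its main loop in "with explicit_bidi():", so that the user never has to toggle a terminal setting by hand. A terminal that does not recognize the mode is expected to ignore the sequence.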
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> From: Egmont Koblinger > Date: Fri, 8 Feb 2019 15:42:51 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > On Fri, Feb 8, 2019 at 3:28 PM Eli Zaretskii wrote: > > > You can have what you call the "explicit mode" if you set the variable > > bidi-display-reordering to nil. > > So, if someone is running a mixture of applications requiring implicit > vs. explicit modes, they'll have to continuously toggle the setting of > their terminal back and forth. Why would they want to toggle it back and forth? What are the use cases where it makes sense to mix both modes? IME, you either need one or the other, never both. In any case, I'm just trying to help you map your requirements into existing Emacs features. If this is not helpful, feel free to disregard. > Now, I, as a user, want BiDi to work as seamlessly as possible, > definitely without me having to repeatedly switch a setting back and > forth if the applications could just as well do it automatically. One > of the basics of my spec. > > Whether Emacs will adopt this, or will keep requiring users to toggle > this setting back and forth depending on the particular app they wish > to run, is not my call. You can hardly expect Emacs (or any other application) to support control sequences that are not yet defined, let alone standardized. When they become sufficiently widely available, I'm sure someone will add them to Emacs.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Fri, Feb 8, 2019 at 3:28 PM Eli Zaretskii wrote: > You can have what you call the "explicit mode" if you set the variable > bidi-display-reordering to nil. So, if someone is running a mixture of applications requiring implicit vs. explicit modes, they'll have to continuously toggle the setting of their terminal back and forth. Just as for Konsole and friends there's a graphical setting, correspondingly for Emacs's terminal there's this bidi-display-reordering setting. Now, I, as a user, want BiDi to work as seamlessly as possible, definitely without me having to repeatedly switch a setting back and forth if the applications could just as well do it automatically. One of the basics of my spec. Whether Emacs will adopt this, or will keep requiring users to toggle this setting back and forth depending on the particular app they wish to run, is not my call. cheers, egmont
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> From: Egmont Koblinger > Date: Fri, 8 Feb 2019 14:57:56 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > According to the description you give, Emacs's terminal always applies > the BiDi algorithm, therefore by its design only implements what I > call "implicit mode", and not the "explicit mode". You can have what you call the "explicit mode" if you set the variable bidi-display-reordering to nil. This only supports the LTR explicit mode, though. Personally, I don't see when would the RTL explicit mode be useful: there's no RTL-only text in real life, so some reordering is always required. But maybe I'm missing something. > I'm making the strong claim that by running the UBA a terminal > emulator doesn't become BiDi aware, there's much more it needs to do. Like I said, you are welcome to test the rest of your requirements and ask questions if you think something is not supported or isn't working as expected.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Eli, > Emacs implements the latest UBA from Unicode 11; and the Emacs > terminal emulator inserts all the text into a "normal" Emacs buffer, > and displays that buffer as any other buffer. So yes, you have there > full UBA support. One of the essentials of my work is that there's much more to BiDi in terminal emulators than running the UBA. If one takes a step backwards to look at the big picture, it becomes clear that in some cases the UBA needs to be run, while in other cases it mustn't. And then of course there needs to be some means of switching, and so on... According to the description you give, Emacs's terminal always applies the BiDi algorithm, therefore by its design only implements what I call "implicit mode", and not the "explicit mode". On the other hand, in order to run Emacs inside a terminal emulator, you need to set that terminal emulator to explicit mode, so that it doesn't reshuffle the characters. The behavior it expects from the outer terminal doesn't match the behavior it provides in its inner one. As an interesting consequence, if you open Emacs, then inside it a terminal emulator, and then inside it an Emacs, it will display BiDi incorrectly, in reversed order. I'm making the strong claim that by running the UBA a terminal emulator doesn't become BiDi aware, there's much more it needs to do. cheers, egmont
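The nesting problem described above can be illustrated with a toy model. This is only a sketch under a strong simplification: for a run consisting purely of RTL letters, reordering to visual order is modelled as a plain reversal, which is not the full UBA. The function name to_visual_pure_rtl is hypothetical.

```python
def to_visual_pure_rtl(logical: str) -> str:
    """Toy reordering: for a pure RTL run, visual order (left to right on
    screen) is the reverse of logical order. Not the full UBA."""
    return logical[::-1]


logical = "\u05e9\u05dc\u05d5\u05dd"  # a Hebrew word in logical order

# What the screen cells should contain, left to right.
correct_cells = to_visual_pure_rtl(logical)

# The inner application already emits visual order (it expects explicit mode),
# but the implicit-mode emulator around it reorders that output once more.
double_reordered = to_visual_pure_rtl(correct_cells)

# The second pass undoes the first: the cells end up in logical order,
# which a reader sees as reversed text.
print(double_reordered == logical)        # True
print(double_reordered == correct_cells)  # False
```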
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> From: Egmont Koblinger > Date: Fri, 8 Feb 2019 13:30:42 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > Hi Eli, > > > Not sure why. There are terminal emulators out there which support > > proportional fonts. > > Well, of course, a terminal emulator can load any font, even > proportional, but as it places them in the grid, it will look ugly as > hell Maybe so, but the original text was this: Emacs and 'M-x term' are the route to take if one only has proportional fonts. Which I don't understand, since the terminal emulator in Emacs doesn't do anything special about proportional fonts, AFAIK. > In Emacs-25.2's terminal emulator I executed "cat TUTORIAL.he". For > the entire contents, LTR paragraph direction was used and was aligned > to the left. Maybe something has changed for 26.x, I don't know. I told you what changed: Emacs 25 forces LTR paragraph direction, whereas Emacs 26 and later does not. You can get dynamic paragraph direction in your Emacs 25 as well if you set bidi-paragraph-direction to nil in the *term* buffer. > And now you suddenly tell that Emacs's terminal supports BiDi more or > less in full??? Emacs implements the latest UBA from Unicode 11; and the Emacs terminal emulator inserts all the text into a "normal" Emacs buffer, and displays that buffer as any other buffer. So yes, you have there full UBA support. I thought this was clear, sorry if it wasn't. One caveat with this is that the Emacs emulator works only on Posix platforms, it doesn't work on MS-Windows. > Sorry, I just don't buy it. If you retain this claim, I'd pretty > please like to see a specification of its behavior The specification is the latest version of the UBA, augmented with three deviations, two of them allowed by the UBA, the third isn't:
 . Emacs uses HLA1 for determining base paragraph direction: it decides on base direction only once for every chunk of text delimited by empty lines;
 . Emacs doesn't by default remove bidi formatting controls from display;
 . Emacs wraps long lines _after_ reordering, not before.
I think that's it. If I forget something, please forgive me: I implemented this 10 years ago, so maybe something evades me at the moment. > one which addresses at least all the major issues I address in > my work, one which I could replace my work with, one which I'd be > happy to implement in gnome-terminal in the solid belief that it's > about as good as my proposal, and would wholeheartedly recommend for > other terminal emulators to adopt. > > Or maybe, by any chance, when you said Emacs's terminal supported BiDi > more or less in full, did you perhaps go with your own idea what a > BiDi-aware terminal emulator needs to support; ignoring all those > things I detail in my work, such as the inevitable need for explicit > mode, the need for deciding the scope of implicit vs. explicit mode, > and much more? Sorry, I cannot afford testing everything you wrote in your specification. I think most, if not all, of that is covered, but I certainly didn't test that, so maybe I'm wrong. Please feel free to test the relevant aspects and ask questions if you need more "inside information". I do hope that my impression about "most everything being supported" is correct, because that would give you a working implementation/prototype of most of the features you want to see in terminal emulators, so you could actually try the behavior to see if it's convenient, causes problems, etc. One other feature you may find interesting (something that I don't think you covered in your document, at least not explicitly) is that Emacs supports visual-order cursor motion, in addition to the "usual" logical-order. The latter is, of course, the default, but you can switch to the former if you set the visual-order-cursor-movement option to a non-nil value.
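The first deviation listed above, deciding the base direction once per chunk of text delimited by empty lines, can be sketched as follows. This is an illustration only and not Emacs's actual code. It assumes the simple "first strong character" rule, ignores the UBA's special handling of isolates, and the names base_direction and chunk_directions are hypothetical.

```python
import re
import unicodedata


def base_direction(text, default="LTR"):
    """Return "LTR" or "RTL" from the first strong character in text."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return default  # no strong character found


def chunk_directions(text):
    """Split text at empty lines and pick one direction per chunk."""
    chunks = [c for c in re.split(r"\n\s*\n", text) if c.strip()]
    return [(base_direction(c), c) for c in chunks]
```

With this approach a line that starts with a Latin word inside a Hebrew paragraph keeps the RTL direction of its chunk, whereas detection on each line separately would flip that line to LTR.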
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Philippe, > Adding a single bit of protection in cell attributes to indicate they are > either protected or become transparent (and the rest of the > attributes/character field indicates the id of another terminal grid or > rendering plugin creating its own layer and having its own scrolling state > and dimensions) can allow convenient things, including the possibility of > managing a grid-based system of stackable windows. > You can design one of the layers to allow input (managed directly in the > terminal, with local echo without transmission delays and without risks of > overwriting surrounding contents). At this point you're already touching much more the core of terminal emulator behavior than e.g. my BiDi work does, it's a way more essential, way more complex change – with a much less clear goal to me, like, why should emulators implement it, why would applications start using it etc. If you wish to go in this direction, good luck! (If anything, what I do see as somewhat feasible is building up something from scratch that looks much more like a proportional-font text editing widget, or even a rich text editor, rather than a terminal emulator, and figuring out step by step how to get a shell and simple utilities and later more complex utilities to run in that. This could be a new platform which, by putting decades of hard work in it – which I cannot do voluntarily –, could eventually replace terminal emulators.) Philippe, I hate to say it, but at the risk of being impolite, I just have to. Your ideas would take terminal emulators extremely far from what they are now, with no clear goals and feasibility to me; and are no longer in any way relevant to BiDi. All I see is we're wasting each other's time on utterly irrelevant topics, and since I see exactly zero chance of any worthwhile takeaway coming out of this, unfortunately I can no longer devote my limited free time to this, I just have to quit this conversation between the two of us. I'm really sorry. 
best regards, egmont
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Eli, > Not sure why. There are terminal emulators out there which support > proportional fonts. Well, of course, a terminal emulator can load any font, even proportional, but as it places them in the grid, it will look ugly as hell (like this one: https://askubuntu.com/q/781327/398785 ). Sure you could apply some tricks to make it look a bit less terrible (e.g. by centering each glyph in its cell rather than aligning to the left), but it still won't look great. In the world of terminal emulation, many applications expect things to align properly according to the wcwidth() of the string they emit. You abandon this (start placing the glyphs one after the other in a row, no matter how wide they are), and plenty of applications suddenly fall apart big time (let alone questions like how you define the terminal's width in characters). > Emacs is perhaps the only one whose terminal > emulator currently supports bidi more or less in full Let's not get started from here, please. In Emacs-25.2's terminal emulator I executed "cat TUTORIAL.he". For the entire contents, LTR paragraph direction was used and was aligned to the left. Maybe something has changed for 26.x, I don't know. In my work I carefully evaluated 4 other "BiDi-aware" terminal emulators, as well as an ancient specification for BiDi which I had to read about twenty times to get to pretty much understand what it's talking about. I identified substantial issues with both the standard and all the independent implementations (which didn't care about this standard at all). I show that existing terminal emulators are incompatible to the extent that an app cannot reliably print any RTL text by any means at all. At this point I firmly believe it should be clear that BiDi in terminals is not a topic where one can just go ahead and do something, without having a specification first. 
I lay down principles which I believe a proper BiDi-supporting platform needs to meet, argue why multiple modes (explicit and implicit) are inevitable, examine what to do with paragraph direction, cursor location and tons of other issues, and come up with concrete suggestions on how (partially based on that ancient specification) all of these should be addressed. Then, after putting literally months of work in it, I come here to announce my work and ask for feedback. So far, from a thread of 100+ mails, I take away two pieces of worthwhile feedback: one is that shaping should be done differently, and the other one is that – for some use cases – a bigger scope of data should be used for autodetecting the "paragraph direction" (as per UBA's terminology). And now you suddenly tell that Emacs's terminal supports BiDi more or less in full??? Sorry, I just don't buy it. If you retain this claim, I'd pretty please like to see a specification of its behavior, one which addresses at least all the major issues I address in my work, one which I could replace my work with, one which I'd be happy to implement in gnome-terminal in the solid belief that it's about as good as my proposal, and would wholeheartedly recommend for other terminal emulators to adopt. Or maybe, by any chance, when you said Emacs's terminal supported BiDi more or less in full, did you perhaps go with your own idea what a BiDi-aware terminal emulator needs to support; ignoring all those things I detail in my work, such as the inevitable need for explicit mode, the need for deciding the scope of implicit vs. explicit mode, and much more? thanks a lot, egmont
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> Date: Fri, 8 Feb 2019 06:40:44 + > From: Richard Wordingham via Unicode > > > I, for one, am not to the slightest bit interested in abandoning the > > character grid and allowing for proportional fonts. This would just > > break a gazillion of things. > > The message I take from that and this thread in general is that Emacs > and 'M-x term' are the route to take if one only has proportional fonts. Not sure why. There are terminal emulators out there which support proportional fonts. Emacs is perhaps the only one whose terminal emulator currently supports bidi more or less in full, but is that related to proportional fonts? > What's the sledgehammer for Windows? Not sure what you meant. "M-x term" doesn't work on Windows. > Where do I find the specification for fixed-width fonts (is > wcswidth() the core?) and how do I select the set of fonts to use? Do I > need to use fontconfig where available? That depends on the underlying C library and other facilities; basically on your OS. AFAIK wcwidth will give the results consistent with the UCD only if you use glibc. In Emacs, you have the functions char-width and string-width that take their data from EastAsianWidth.txt. Not sure about other facilities, and I don't really understand what environment are you asking about -- are you talking about C/C++ programs?
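For readers who want to experiment with cell widths without depending on a particular C library, here is a rough sketch of a width function driven by the East Asian Width data, in the spirit of the char-width and string-width functions mentioned above. It is an approximation under stated assumptions: combining marks count as zero cells, Wide and Fullwidth characters as two, everything else as one. It does not handle control characters, and it is neither glibc's wcwidth() nor Emacs's implementation. The names cell_width and string_width are chosen here for illustration.

```python
import unicodedata


def cell_width(ch):
    """Approximate number of terminal cells occupied by one character."""
    # Assumption: combining marks are drawn on top of the previous cell.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def string_width(s):
    """Sum of the approximate cell widths of all characters in s."""
    return sum(cell_width(ch) for ch in s)
```

The results depend on the Unicode version shipped with the Python interpreter, which mirrors the point made above: the answer varies with the underlying library and its data.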
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Fri, 8 Feb 2019 00:38:24 +0100 Egmont Koblinger via Unicode wrote: > I, for one, am not to the slightest bit interested in abandoning the > character grid and allowing for proportional fonts. This would just > break a gazillion of things. The message I take from that and this thread in general is that Emacs and 'M-x term' are the route to take if one only has proportional fonts. What's the sledgehammer for Windows? Where do I find the specification for fixed-width fonts (is wcswidth() the core?) and how do I select the set of fonts to use? Do I need to use fontconfig where available? Richard.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Adding a single bit of protection in cell attributes to indicate they are either protected or become transparent (and the rest of the attributes/character field indicates the id of another terminal grid or rendering plugin creating its own layer and having its own scrolling state and dimensions) can allow convenient things, including the possibility of managing a grid-based system of stackable windows. You can design one of the layers to allow input (managed directly in the terminal, with local echo without transmission delays and without risks of overwriting surrounding contents). Asynchronous behavior can be defined as well between the remote application/OS and the local processing in the terminal. The protocol can also support an extension to provide alternate streams (take MIME multipart as an example). This can even be used to transport the inputs and outputs for each layer, and additional streams to support (java)scripts, or the content of an image, or a link to a video stream. And just like with classic graphical interfaces, you can have more than just solid RGB colors and add an alpha layer. The single-rectangular-flat grid design is not the only option. Layered approaches can then even be rendered on hardware easily by mapping these virtual layers and flattening them internally in the terminal emulator to the single flat grid supported by the hardware. The result is more or less equivalent to graphic RGB frames, except that the unit is not a single pixel but a whole cell with not just one color but a pair of colors and an encoded character and a font selected for that cell, or if a single font is supported, using a dynamic font and storing glyph ids in that font (prescaled for the cell size). The hardware then does the rest to build the pixels of the frame, but it can be easily accelerated. 
The layered approach could also be used to link together the cells that use the same script and font settings, in order to use proportional fonts when monospaced fonts are not usable, and justify their text in the field (which may turn out to be scrollable itself when needed for input). Having multiple communication streams between the terminal emulator and the remote application allows the application to query the properties and behave in a smarter way than with just static "termcaps" not taking into account the actual state of the remote terminal. All this requires some extension to VT-like protocols (using specific escape sequences, just like with the Xterm extensions for X11). You can also reconsider how "old" mainframe terminals worked: the user in fact never submitted characters one by one to the remote application: the application was sending a full screen and an input form, the user on his terminal could fill in the form and press a "submit/send" button when he had finished inputting the data. But while the user was inputting data, there was absolutely no need to communicate each typed keystroke to the application, all was handled by the terminal itself as instructed (and it could even perform form data validation with input formats and some conditions, possibly with a script as well). In other words, they worked mostly like an HTML input form with a submit button. Such a mode is very useful for small devices because they don't have to react interactively with the user, the transmission delays (which may be long) are no longer a problem, the user can enter and correct data easily, and the editing facilities don't need to be handled by the remote application (which today could be a very tiny device with in fact much less processing power than the terminal emulator, and would have in fact no knowledge at all of the fonts needed). A terminal emulator can do a lot of things itself and locally. 
And this would also be useful on many modern application servers that need to serve a lot of remote clients, possibly over very slow internet links and long roundtrip times. The idea behind this is to allow distributing the workload and deciding which side will handle part or all of the I/O. Of course it will transport text (preferably in a Unicode UTF), but text is not the only content to transport. There are also audio/video/images, security items (certificates, signatures, personal data that should remain private and be encrypted, or only sent to the application in a one-way-hashed form), plus some states/flags that could provide visual/audio hints to the user when working in the rendered input/output form with his local terminal emulator. I spoke about HTML because terminal-based browsers have long existed, and some of them are still maintained in 2019 (w3m, still used as a W3C-sponsored demo; Lynx, the best known on Linux; or elinks): https://www.slant.co/topics/4702/~web-browsers-that-run-in-a-terminal This gives a good idea of what is needed, what a good terminal protocol can do, and what the many legacy VT-like protocol variants have never tried to unify. These browsers don't reinvent the wheel: HTML
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Philippe, > I have never said anything about your work because I don't know where you > spoke about it or where you made some proposals. I must have missed one of > your messages (did it reach this list?). This entire conversation started by me announcing here my work, aiming to bring usable BiDi to terminal emulators. > Terminals are not displaying plain text, they create their own upper layer > protocol which requires and enforces the 2D layout [...] Bidi does not > specify the 2D layout completely, it is purely 1D and speaks about left and > right direction That's one of the reasons why it's not as simple as "let's just run the UBA inside the terminal", one of the reasons why gluing the two worlds together requires a substantial amount of design work. > For now terminal protocols, and emulators trying to implement them; that must > mix the desynchronized input and output (especially when they have to do > "local echo" of the input [...] I assume by "local echo" you're talking about the Send/Receive Mode (SRM) of terminals, and not the "stty echo" line discipline setting of the kernel, because as far as the terminal emulator is concerned, the kernel is already remote, and it's utterly irrelevant for us whether it's the kernel or the application sending back the character. SRM is only supported by a few terminal emulators, and we're about to drop it from VTE, too (https://gitlab.gnome.org/GNOME/vte/issues/69). > If you look at historic "terminal" protocols, I'm mostly interested in the present and future. In the past, only for curiosity, and to the extent necessary to understand the present and to plan for the future. > Some older terminal protocols for mainframes notably were better than today's > VT-like protocols: you did not transmit just what would be displayed, but you > also described the screen area where user input is allowed and the position > of fields and navigation between them: This is not seen in today's graphical terminal emulators. 
> Today these links are better used with real protocols made for 2D and > allowing a web application to manage the input with presentation layer (HTML) > and with javascript helpers (that avoid the roundtrip time). Sure, if you need another tool, let's say a dynamic webpage in your browser, rather than a terminal emulator to perform your task effectively, so be it. I'm not claiming terminal emulators are great for everything, I'm not claiming terminal emulators should be used for everything. > But basic text terminals have never evolved and have lagged behind today's > need. I disagree with the former part. There are quite a few terminal emulators out there, and many have added plenty of great new features recently. Whether they're up to today's needs depends on what your needs are. If you need something utterly different, go ahead and use whatever that is, such as maybe a web browser. If you're good with terminals, that's fine too. And there's a slim area where terminal emulators are mostly good for you, you'd just need a tiny little bit more from them. And maybe for some people this tiny little bit more happens to be BiDi. > Most of them were never tested for internationalization needs: Terminal emulators weren't created with internationalization in mind. I18n goals are added one by one. Nowadays combining accents and CJK are supported by most emulators. Time to stretch it further with BiDi, shaping, spacing combining marks for Devanagari, etc. > [...] delimit input fields in input forms for mainframes, something that was > completely forgotten and remains forgotten today with today's VT-* protocols, > to indicate which side of the communication link controls the content of > specific areas Something that was completely forgotten, probably for good reasons, and I don't see why it should be brought back. > As well today's VT-* protocols have no possibility to be scriptable: > implementing a way to transport fragments of javascript would be fine.
I have absolutely no incentive to work in this direction. > Text-only terminals are now aging but no longer needed for user-friendly > interaction, they are used for technical needs where the only need is to be > able to render static documents without interacting with it, except > scrolling it down, and only if they provide help in the user's language. Text-only terminals are no longer needed??? Well, strictly speaking, computers aren't needed either, people lived absolutely fine lives before they were invented :) If you get to do some work, depending on the kind of work, terminal emulators may or may not be a necessary or a useful tool for you. For certain tasks you don't really have anything else, or at least terminals are way more effective than other approaches. For other tasks (e.g. text editing) it's mostly a matter of taste whether you use a terminal or a graphical app. For yet other tasks, terminal emulators take you nowhere. My work aims to bring BiDi into
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Thu, Feb 7, 2019 at 19:38, Egmont Koblinger wrote: > As you can see from previous discussions, there's a whole lot of > confusion about the terminology. And it was exactly the subject of my first message sent to this thread! You probably missed it. > Philippe, with all due respect, I have the feeling that you have some > fundamental problems with my work (and I'm tempted to ask back: have > you read it at all?), but your message what your problem is just > doesn't come across to me. Could you please avoid all those irrelevant > stories with baud rate and font size and Asian scripts and whatnot, > and clearly get to your point? > I have never said anything about your work because I don't know where you spoke about it or where you made some proposals. I must have missed one of your messages (did it reach this list?). So don't take that as a personal attack because this only started with a reply I made (the one specifically speaking about the various ambiguities of encoded newlines in terminal protocols, which do not match the basic plain text definition (similar to MIME) made only for static documents, but never tuned for interactive bidirectional use, including for example text editors, which also require a model of 2D layout, and also set some assumptions about "characters" visible in a single cell of a regularly spaced grid, and a known number of lines and columns, independent of the lines of the text rendered and read on it).
Terminals are not displaying plain text, they create their own upper layer protocol which requires and enforces the 2D layout (whereas Unicode is a purely linear protocol with only relations between one character and the next one in a 1D stream, and no assumption at all about their display width, which cannot be monospaced in all scripts, and characters are definitely not encoded in visual order: try adding characters at end of a logical line, with a Bidi text you do not just replace the content of one cell, you have to scroll the content of surrounding cells and your input caret position does not necessarily change, or you'll reach a point where a visual line will be split in two parts, but not at the rest position, and some parts moved up or down). Bidi does not specify the 2D layout completely: it is purely 1D and speaks about left and right direction, and does not specify what happens when contents do not fit on the visual line for the text which is already present there before inserting new text, or even what will be replaced if you are in replace mode and not in insert mode. The Bidi algorithm is not designed to handle overwrites, and neither is the Unicode standard as a whole, which is made as if all text was inserted only at end of lines and not replacing anything. For now, terminal protocols, and emulators trying to implement them, which must mix the desynchronized input and output (especially when they have to do "local echo" of the input for performance reasons over slow serial links where there's no synchronization between the local buffer of the terminal and the remote virtual buffer of the terminal emulator in the emitting app, even those using the best "termcap" definitions), have no easy way to do that.
The logical encoding of Unicode does not play well and the time to resynchronize the local and remote buffers is a limiting factor (over a 9.6kbps link, refreshing the whole screen takes too long, and this cannot be done on every keystroke of input, or user input would have to be dramatically slow if local echoing is also enabled, or most user inputs that are too fast would have to be discarded, and this makes user input very unreliable, requiring constant correction; these protocols are definitely not human-friendly as they depend on strict timing which is not the way humans enter text; this timing is also unpredictable and very variable over serial links and the protocols do not have any specification for timing requirements. In fact time is constantly ignored, even if it plays an evident role). If you look at historic "terminal" protocols, techniques were used to control time: notably the XON/XOFF protocols, or mechanical constraints, especially when the output was a printer (with a daisywheel or matrix head). But time was just controlled between one machine and another; a human could not really interact asynchronously. And it was in a time when full-screen text editors did not even exist (at most they were typing "on the flow" and text layout was completely forgotten). This changed radically when the output became a screen, with the assumption that the output was instantaneous, but the mechanical restrictions were removed. Some older terminal protocols for mainframes notably were better than today's VT-like protocols: you did not transmit just what would be displayed, but you also described the screen area where user input is allowed and the position of fields and navigation between them: the terminal had then no difficulty to avoid breaking the output when
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Philippe, On Thu, Feb 7, 2019 at 3:21 PM Philippe Verdy wrote: > "Rules" are not formally written, they are just a sense of best practices. When it comes to BiDi in terminals, I haven't seen anything that I consider reasonably okay, let alone "best practice". It's a mess. That's why I decided to come up with something. > Bidi plays very badly on terminals Agreed. There are essentially two ways from here: just leave it as bad as it is (or even see various terminal emulators coming up with not well-thought-out hacks that just make it even worse) or try to improve. I picked the latter. > [...] refreshing a typical 80x25 screen takes about one half second, which is > much longer than typical user input, so full screen refresh does not work for > data input and editing, and terminals implement themselves the echo of user > input, ignoring how and when the receiving application will handle the input, > and also ignoring if the application is already sending output to the terminal. I'm really unsure where you're trying to get with it. For one, adding BiDi doesn't introduce the need for significantly larger updates. Whenever a partial repaint of the screen was sufficient, even with BiDi in the game it will remain sufficient. Another thing: I'm not sure that 9.6kbps is a bottleneck to worry about. It's present if you connect to a device via serial port, but will you really do this in combination with BiDi? The use case I have in mind much more is running a terminal emulator locally, or ssh'ing to a remote machine, for getting various kinds of productive work done (e.g. writing a text file in someone's native RTL script in a text editor). These are orders of magnitude faster. > It's hard or impossible to synchronize this and local echoes on the terminal > causes havoc. If input mixes with output (e.g. you press some keys while you're waiting for make/gcc to compile your app, and these letters appear onscreen), the visual result is broken even without BiDi.
I cannot eliminate this kind of breakage by introducing BiDi, nor can I build up something from scratch that somewhat resembles the current terminal emulator world but fixes all of its oddities. > But the concept of "line" or "paragraph" in terminal protocols is extremely > fuzzy. It's then very difficult to take into account the additional Bidi > constraints as it's impossible to conciliate BOTH the logical ordering (what > is encoded in the transmitted data or kept in history buffers) and the visual > ordering. I don't try to conciliate logical and visual ordering within the same paragraph, I agree it's impossible, it's semantic nonsense. But I try to conciliate them in the sense that sometimes the visual order is the desired one, sometimes the logical order, so let's make it possible to use one for one paragraph, and the other one for another paragraph. > That's why there are terminal protocols that absolutely don't want to play > with the logical ordering and require all their data to be transmitted in > visual order (in which case, there's no bidi handling at all). This is one of the modes in my recommendation. If your application requires this mode (as e.g. Emacs does), use this mode and you're good. > In fact most terminal protocols are very defective and were never designed to > handle Bidi input Maybe it's high time someone fixed this defect, then? :) > And here your unit (logical lines) is not even defined in the terminal > protocol and not known from the emitting applications which has no input > about the final output terminal properties. So the terminal must perform > guesses. As it can insert additional linebreaks itself, and scroll out some > portion of it, there's no way to delimit the effect of "bidi controls". The > basic requirement for correctly handling bidi controls is to make sure that > paragraph delimitations are known and stable.
if additional breaks can occur > anywhere on what you think is a "logical line" but which is different from > the emitting application (or static text document which is output "as is" > without any change to reformat it), these bidi controls just make things worse > and it becomes impossible to make reasonable guesses about paragraph > delimitations in the terminal. The result becomes unpredictable and most often > will not even make any sense as the terminal uses visual ordering always but > loses track of the logical ordering (and things get worse when there are complex clusters or characters that cannot even fit in a monospaced grid). If an exact definition of hard vs. soft wrapped lines is what you miss from the specification, okay, I'll add it to a future version. I don't know how terminals performing guesses occurred to you, they sure don't (as for hard vs. soft newlines). > The basic requirement for correctly handling bidi controls is to make sure > that paragraph delimitations are known and stable. Since we're talking about bidi controls being emitted,
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Thu, Feb 7, 2019 at 13:29, Egmont Koblinger wrote: > Hi Philippe, > > > There's some rules for correct display including with Bidi: > > In what sense are these "rules"? Where are these written, in what kind > of specification or existing practice? > "Rules" are not formally written, they are just a sense of best practices. Bidi plays very badly on terminals (even enhanced terminals like VT-* or ANSI that expose capabilities when, most of the time, these capabilities are not even accessible: it is too late and further modifications of the terminal properties (notably its display size) can never be taken into account (it is too late, the output has been already generated, and all the terminal can do is to play with what is in its history buffers)). Even on dual-channel protocols (input and output), terminal protocols are also not synchronizing the input and the output and these asynchronous channels ignore the transmission time between the terminal and the aware application, so the terminal protocol must include a function that allows flushing and redrawing the screen completely (but this requires long delays). With a common 9.6kbps serial link, refreshing a typical 80x25 screen takes about one half second, which is much longer than typical user input, so full screen refresh does not work for data input and editing, and terminals implement themselves the echo of user input, ignoring how and when the receiving application will handle the input, and also ignoring if the application is already sending output to the terminal. It's hard or impossible to synchronize this and local echoes on the terminal causes havoc. I've not seen any way for a terminal to handle all these constraints. So the only way for them is to support only basic plain-text documents, formatted reasonably, and inserting layout "hints" in the format of their output so that the terminal can perform reasonable guesses and adapt.
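As a back-of-envelope check of the timing argument, here is a minimal Python sketch. The framing of 10 bits on the wire per byte (8N1), one byte per cell and no escape sequences are illustrative assumptions, and the function name is made up for this example. Under these assumptions a full 80x25 repaint over 9.6kbps comes out at roughly two seconds rather than half a second, which only strengthens the point that a full refresh on every keystroke is impractical on such a link.

```python
def full_refresh_seconds(columns=80, rows=25, bits_per_second=9600,
                         bits_per_byte_on_wire=10, bytes_per_cell=1):
    """Estimate the time to retransmit every cell of the screen once.

    bits_per_byte_on_wire=10 assumes 8N1 framing (start bit + 8 data + stop bit).
    bytes_per_cell=1 assumes a single-byte encoding and no control sequences,
    so real traffic (UTF-8, cursor movement, attributes) would take longer.
    """
    total_bits = columns * rows * bytes_per_cell * bits_per_byte_on_wire
    return total_bits / bits_per_second


seconds = full_refresh_seconds()
print(f"80x25 at 9600 bps: about {seconds:.2f} s")
```

Changing bytes_per_cell to 2 gives a rough idea of the cost for scripts that need two UTF-8 bytes per character, such as Hebrew or Arabic.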
But the concept of "line" or "paragraph" in terminal protocols is extremely fuzzy. It's then very difficult to take into account the additional Bidi constraints as it's impossible to conciliate BOTH the logical ordering (what is encoded in the transmitted data or kept in history buffers) and the visual ordering. That's why there are terminal protocols that absolutely don't want to play with the logical ordering and require all their data to be transmitted in visual order (in which case, there's no bidi handling at all). Then terminals will attempt to conciliate the visual line delimitations (in the transmitted data) with the local-only capabilities of the rendered frame. Many terminals will also not allow changing the display width, will not allow changing the display cell size, will force constraints on cell sizes and fonts, and then won't be able to correctly output many Asian scripts. In fact most terminal protocols are very defective and were never designed to handle Bidi input, and Asian scripts with complex clusters and variable fonts that are needed for them (even CJK scripts which use a mix of "half-width" and "full-width" characters). > - Separate paragraphs that need a different default Bidi by double > newlines (to force a hard break) > > There is currently no terminal emulator I'm aware of that uses empty > lines as boundaries of BiDi treatment. > These are hints in absence of something else, and it plays a role when the terminal display width is unpredictable by the application making the output and having no access to any return input channel.
Take the example of terminal emulators in resizable windows: the display width is undefined, but there's not any document level and no buffering, scrolling text will flush the output partially, history is limited. A terminal emulator then needs hints about where paragraphs are delimited and most often doesn't have any other distinctions available even in its limited history that allow distinguishing the 3 main kinds of line breaks. > While my recommendation uses a one smaller unit (logical lines), and I > And here your unit (logical lines) is not even defined in the terminal protocol and not known from the emitting applications which has no input about the final output terminal properties. So the terminal must perform guesses. As it can insert additional linebreaks itself, and scroll out some portion of it, there's no way to delimit the effect of "bidi controls". The basic requirement for correctly handling bidi controls is to make sure that paragraph delimitations are known and stable. If additional breaks can occur anywhere on what you think is a "logical line" but which is different from the emitting application (or static text document which is output "as is" without any change to reformat it), these bidi controls just make things worse and it becomes impossible to make reasonable guesses about paragraph delimitations in the terminal. The result becomes unpredictable and most often will not even make any sense as the terminal uses visual ordering
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Philippe, > There's some rules for correct display including with Bidi: In what sense are these "rules"? Where are these written, in what kind of specification or existing practice? > - Separate paragraphs that need a different default Bidi by double newlines > (to force a hard break) There is currently no terminal emulator I'm aware of that uses empty lines as boundaries of BiDi treatment. While my recommendation uses a unit one step smaller (logical lines), and I understand as per Eli's request that it would be desirable to go with empty-line-delimited boundaries, what in fact all the current self-proclaimed BiDi-aware terminal emulators that I came across do is use a unit two steps smaller than yours: they do BiDi on physical lines of the terminal, no matter how a logical line of the output had to wrap into physical ones because it didn't fit in the width. (It's a terrible behavior.) The current behavior of terminal emulators is very far from what you describe. > - use a single newline on continuation Continuation of what exactly? But let's take a step back: Should the output be pre-formatted by some means, or do we rely on the terminal emulator wrapping overlong lines? (If pre-formatted then for what width? 80 columns, so that I waste precious real estate if my terminal is wider? Or is it a requirement for any app that produces output to implement a decent dynamic wrapping engine for nice formatting according to the actual width?) There's precedent for both of these different approaches. I don't think it's feasible to pick one, and claim that the other approach is discouraged/invalid/whatever. > - if technical items are untranslatable, make sure they are at the beginning > of lines and indented by some leading spaces, before translated ones. I firmly disagree. There shouldn't be any restriction on how a translator wishes to translate a sentence. The computer world has to adapt to the requirements of human languages, not the other way around!
> - Don't use any Bidi control ! Why not? They do exist for a reason, for the very reason that any logical translation, which a translator might want to write (see my previous point) is presentable in a visually correct way. Use them for that, whenever needed. cheers, egmont
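To make the difference between the units discussed above concrete (physical rows vs. logical lines), here is a minimal Python sketch. It rests on assumptions that are not taken from any emulator or specification: rows are modelled as hypothetical (text, soft_wrapped) pairs, and first_strong_direction is a simplified reading of rules P2/P3 of the Unicode Bidirectional Algorithm that ignores isolates. It only illustrates how the detected paragraph direction can differ depending on the unit chosen.

```python
import unicodedata


def first_strong_direction(text, default="LTR"):
    """Return "LTR" or "RTL" based on the first strong character.

    Simplified take on UAX #9 rules P2/P3: isolate initiators and their
    matching PDI are not skipped here.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return default


def logical_lines(rows):
    """Join physical rows into logical lines.

    rows is a list of (text, soft_wrapped) pairs; soft_wrapped is True when
    the row overflowed into the next one instead of ending at a hard line
    ending. This data model is hypothetical.
    """
    lines, current = [], ""
    for text, soft_wrapped in rows:
        current += text
        if not soft_wrapped:
            lines.append(current)
            current = ""
    if current:  # trailing row that was still marked as soft wrapped
        lines.append(current)
    return lines


# One logical line: a Hebrew word followed by Latin text, wrapped onto two rows.
rows = [
    ("\u05e9\u05dc\u05d5\u05dd abc ", True),
    ("def ghi", False),
]

per_physical_row = [first_strong_direction(text) for text, _ in rows]
per_logical_line = [first_strong_direction(line) for line in logical_lines(rows)]
print(per_physical_row)   # direction detected separately for each row
print(per_logical_line)   # direction detected once for the whole logical line
```

With the row as the unit, the continuation row is detected as LTR although it belongs to a line that starts with an RTL word; with the logical line as the unit, the whole line gets a single direction.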
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
I read your email, you spoke for example about how a typical Unix/Linux tool shows its usage option (e.g. "anycommand --help") with a leading line then syntaxes and tabulated lists of options followed by translated help on the same line. There are some rules for correct display including with Bidi:
- Separate paragraphs that need a different default Bidi by double newlines (to force a hard break)
- use a single newline on continuation
- if technical items are untranslatable, make sure they are at the beginning of lines and indented by some leading spaces, before translated ones.
- avoid breaking lists
- try to separate as much as possible text in natural languages from technical texts.
- Be careful about correct usage of leading punctuation (notably for list items)
- Be consistent about indentation
- Normalize spaces,
- Don't assume that TAB controls have the same width (ban TABS except at the beginning of lines)
- In column output, separate columns always with at least two spaces, don't glue them as if they were sentences.
- Don't use "soft line breaks" in the middle of short lines (less than 72 base characters)
- Don't use any Bidi control !
With some care, you can perfectly well translate Linux/Unix tools into languages needing Bidi and get consistent output, but be careful if your text contains placeholders or technical untranslated terms (make sure to surround them with paired punctuation, or don't translate them at all). And avoid paragraphs that would mix natural and technical untranslatable terms (such as command names or command-line options). Make sure to test the output so that it will also work with variable fonts (don't assume monospaced fonts are used, they do not exist for various scripts and don't work reliably for Arabic and most Asian scripts, and not even for Chinese or Japanese even if these don't need Bidi support).
But the difficulty is not really in the terminal emulators but in the source texts given to translators, when they don't know the context in which the text will be used and have no hint about which terms should not be translated (because they can become inconsistent: there are many examples, even in Windows 10, where some of the command line tools are completely unusable with the translated UI and with examples of syntaxes that are not even working where some terms were randomly and inconsistently translated or confused, or because tools assumed an LTR-only layout of the output, and monospaced fonts with one-to-one character per display cell, or requiring specific fonts that do not contain the characters in their monospaced variants: this is challenging notably for Asian scripts needing complex clusters if you made these Latin-based assumptions). On Wed, Feb 6, 2019 at 22:30, Egmont Koblinger wrote: > Hi Philippe, > > Thanks a lot for your input! > > Another fundamental difficulty with terminal emulators is: These > controls (CR, LF...) are control instructions that move the cursor in > some ways, and then are forgotten. You cannot do BiDi on the > instructions the terminal receives. You can only do BiDi on the > result, the contents of the canvas after these instructions are > executed. Here these controls are either lost, or you have to give a > specification how exactly they need to be remembered, i.e. converted > to being part of the canvas's data. > > Let's also mention that trying to get apps into using them is quite > hopeless. The best you can do is design BiDi around what you already > have, which pretty much means hard vs. soft line endings, and > hopefully forthcoming semantical marks around shell prompts. (To > overcomplicate the story, a received LF doesn't convert the line > ending to hard wrapped in most terminal emulators. In some it does. I > don't think there's an exact specification anywhere. Maybe the BiDi > spec needs to create one.
Lines are hard wrapped by default, turned to > soft wrapped when the text gets wrapped at the end of the line, and a > few random control functions turn them back to hard one, but in most > terminals, a newline is not such a control function.) > > Anyway, please also see my previous email; I hope that clarifies a lot > for you, too. > > > cheers, > egmont > > On Tue, Feb 5, 2019 at 5:53 PM Philippe Verdy via Unicode > wrote: > > > > I think that before making any decision we must make some decision about > what we mean by "newlines". There are in fact 3 different functions: > > - (1) soft line breaks (which are used to enforce a maximum display > width between paragraph margins): these are equivalent to breakable and > compressible whitespaces, and do not change the logical paragraph > direction, they don't insert any additional vertical gap between lines, so > the logical line-height is preserved and continues uninterrupted. If text > justification applies, this whitespace will be entirely collapsed into the > end margin, and any text before it will still be justified to match the > end margin (until the maximum
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Philippe, Thanks a lot for your input! Another fundamental difficulty with terminal emulators is: These controls (CR, LF...) are control instructions that move the cursor in some ways, and then are forgotten. You cannot do BiDi on the instructions the terminal receives. You can only do BiDi on the result, the contents of the canvas after these instructions are executed. Here these controls are either lost, or you have to give a specification how exactly they need to be remembered, i.e. converted to being part of the canvas's data. Let's also mention that trying to get apps into using them is quite hopeless. The best you can do is design BiDi around what you already have, which pretty much means hard vs. soft line endings, and hopefully forthcoming semantical marks around shell prompts. (To overcomplicate the story, a received LF doesn't convert the line ending to hard wrapped in most terminal emulators. In some it does. I don't think there's an exact specification anywhere. Maybe the BiDi spec needs to create one. Lines are hard wrapped by default, turned to soft wrapped when the text gets wrapped at the end of the line, and a few random control functions turn them back to hard ones, but in most terminals, a newline is not such a control function.) Anyway, please also see my previous email; I hope that clarifies a lot for you, too. cheers, egmont On Tue, Feb 5, 2019 at 5:53 PM Philippe Verdy via Unicode wrote: > > I think that before making any decision we must make some decision about what > we mean by "newlines". There are in fact 3 different functions: > - (1) soft line breaks (which are used to enforce a maximum display width > between paragraph margins): these are equivalent to breakable and > compressible whitespaces, and do not change the logical paragraph direction, > they don't insert any additional vertical gap between lines, so the logical > line-height is preserved and continues uninterrupted.
If text justification > applies, this whitespace will be entirely collapsed into the end margin, and > any text before it will still be justified to match the end margin (until > the maximum expansion of other whitespaces in the middle is reached, and the > maximum intercharacter gap is also reached, in which case that line will no > longer be expanded more), but this does not apply to terminal emulators that > normally never use text justification, so the text will just be aligned to > the start margin and whitespaces before it on the same line are preserved, > and collapsed only at end of the line (just before the soft line break itself) > - (2) hard line breaks: they break to a new line but continue the paragraph > within its same logical direction, but they are not compressible whitespaces > (and do not depend on the logical end margin of the paragraph). > - (3) paragraph breaks: generally they introduce an additional vertical gap > with top and bottom margins > > The problem in terminals is that they usually cannot distinguish types (1) > and (2), they are simply encoded by a single CR, or LF, or CR+LF, or NEL. > Type (1) only exists within the framework of a higher level protocol > which gives additional interpretation to these "newlines". The special > control LS is almost never used but may be used for type (1) i.e. soft > line-breaks, and will fall back to type (2) which is represented by the legacy > "simple" newlines (single CR, or single LF, or single CR+LF, or single NEL). > I have seen very little or no use of the LS (line separator) special control.
> > Type (3) may be encoded with PS (paragraph separator), but in terminals (and > common protocols like MIME) it is usually encoded using a couple of newlines > (CR+CR, or LF+LF, or CR+LF+CR+LF, or NL+NL) possibly with additional > whitespaces (and additional presentation characters such as ">" in quotations > inserted in mail responses) between them (needed for MIME and HTTP) which may > be collapsed when rendering or interpreting them. > > Some terminal protocols can also use other legacy ASCII separators such as > FS, GS, RS, US for grouping units containing multiple paragraphs, or STX/EOT > pairs for encapsulating whole text documents in a protocol-specific > envelope format (and will also use some escaping mechanism for special > controls found in the middle, such as DLE+control to escape the control, or > DLE+0 to escape a NUL, or DLE+# to escape a DEL, or DEL+x+NN where N are a > fixed number of hexadecimal, decimal or octal digits). There's a wide variety > of escaping mechanisms used by various higher-layer protocols (including > transport protocols or encoding syntaxes used just below the plain-text > layer, in a lower layer than the transport protocol layer). > > On Mon, Feb 4, 2019 at 21:46, Eli Zaretskii via Unicode > wrote: >> >> > Date: Mon, 4 Feb 2019 19:45:13 + >> > From: Richard Wordingham via Unicode >> > >> > Yes. If one has a text composed of LTR and RTL
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
I think that before making any decision we must decide what we mean by "newlines". There are in fact 3 different functions:
- (1) soft line breaks (which are used to enforce a maximum display width between paragraph margins): these are equivalent to breakable and compressible whitespaces, and do not change the logical paragraph direction, they don't insert any additional vertical gap between lines, so the logical line-height is preserved and continues uninterrupted. If text justification applies, this whitespace will be entirely collapsed into the end margin, and any text before it will still be justified to match the end margin (until the maximum expansion of other whitespaces in the middle is reached, and the maximum intercharacter gap is also reached, in which case that line will no longer be expanded more), but this does not apply to terminal emulators that normally never use text justification, so the text will just be aligned to the start margin and whitespaces before it on the same line are preserved, and collapsed only at end of the line (just before the soft line break itself)
- (2) hard line breaks: they break to a new line but continue the paragraph within its same logical direction, but they are not compressible whitespaces (and do not depend on the logical end margin of the paragraph).
- (3) paragraph breaks: generally they introduce an additional vertical gap with top and bottom margins
The problem in terminals is that they usually cannot distinguish types (1) and (2), they are simply encoded by a single CR, or LF, or CR+LF, or NEL. Type (1) only exists within the framework of a higher level protocol which gives additional interpretation to these "newlines". The special control LS is almost never used but may be used for type (1) i.e. soft line-breaks, and will fall back to type (2) which is represented by the legacy "simple" newlines (single CR, or single LF, or single CR+LF, or single NEL).
I have seen very little or no use of the LS (line separator) special control. Type (3) may be encoded with PS (paragraph separator), but in terminals (and common protocols line MIME) it is usually encoded using a couple of newline (CR+CR, or LF+LF, or CR+LF+CR+LF, or NL+NL) possibly with additional whitespaces (and additional presentation characters such as ">" in quotations inserted in mail responses) between them (needed for MIME and HTTP) which may be collapsed when rendering or interpreting them. Some terminal protocols can also use other legacy ASCII separators such as FS, GS, RS, US for grouping units containing multiple paragraphs, or STX/EOT pairs for encapsulating whole text documents in an protocol-specific enveloppe format (and will also use some escaping mechanism for special controls found in the middle, such as DLE+control to escape the control, or DLE+0 to escape a NUL, or DLE+# to escape a DEL, or DEL+x+NN where N are a fixed number of hexadecimal, decimal or octal digits. There's a wide variety of escaping mechanisms used by various higher-layer protocols (including transport protocols or encoding syntaxes used just below the plain-text layer, in a lower layer than the transport protocol layer). Le lun. 4 févr. 2019 à 21:46, Eli Zaretskii via Unicode a écrit : > > Date: Mon, 4 Feb 2019 19:45:13 + > > From: Richard Wordingham via Unicode > > > > Yes. If one has a text composed of LTR and RTL paragraphs, one has to > > choose how far apart their starting margins are. I think that could > > get complicated for plain text if the terminal has unbounded width. > > But no real-life terminal does. The width is always bounded. >
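The distinction between the kinds of break described in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any terminal protocol: the function name `split_paragraphs` is made up here, and the rule that PS or a blank line (optionally holding spaces or tabs) ends a paragraph is taken from the description above rather than from any standard. Note that the sketch cannot tell soft breaks from hard ones, which is exactly the problem the message points out.

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs, each returned as a list of lines.

    Assumption for this sketch: PS (U+2029) or a run of two or more
    newlines ends a paragraph; a single legacy newline or LS (U+2028)
    only ends a line inside the paragraph.
    """
    # Fold the legacy newline encodings (CR+LF, lone CR, NEL) into LF.
    normalized = re.sub(r"\r\n|\r|\x85", "\n", text)
    # Paragraph break: PS, or newlines separated only by spaces/tabs.
    chunks = re.split(r"\n(?:[ \t]*\n)+|\u2029", normalized)
    # Inside a paragraph, LF and LS are plain line breaks.
    return [re.split(r"\n|\u2028", chunk) for chunk in chunks if chunk]

sample = "one\r\ntwo\r\n\r\nthree\u2028four\u2029five\x85six"
print(split_paragraphs(sample))
```

The input mixes CR+LF, LS, PS and NEL on purpose, to show that all of them collapse into the same two levels (line and paragraph).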
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Mon, 4 Feb 2019 22:27:39 +0100 Egmont Koblinger via Unicode wrote: > Hi Richard, > > > The concept appears to exist in the form of the fields of the > > fifth edition of ECMA-48. Have you digested this ambitious > > standard? > > To be honest: No, I haven't. And I have no idea what those "fields" > are. (Taken out of order) > That being said, I'd really, honestly love to see if someone evaluated > ECMA's "fields" and created a feasibility study for current terminal > emulators, similarly to how I did it with TR/53. They mostly seem to be security, protection and checking features. They seem to make sense for a captive system used as a till or for stock look-up by customers. For example, fields can be restricted as to how they are overwritten, e.g. not at all, or only with numbers, and some fields cannot be copied from the terminal. HTML forms seem to provide most of this functionality nowadays. Fields are persistent attributes. On reading further, the pane boundary functionality seems to be provided by the 'line home position' and 'line limit position'. These would have to be re-established whenever a pane became the active pane, but they seem to support the notion of writing a paragraph into a pane, with the terminal sorting out the splitting into lines. I'm not sure that this would be portable between ECMA-48 terminals; I get the impression that there would be a reliance on unstandardised behaviour being appropriate. I could be wrong; the specification may be there. > I spent (read: wasted) way too much time studying ECMA TR/53 to get to > understand what it's talking about, to realize that the good parts > were already obvious to me, and to be able to argue why I firmly > believe that the bad parts are bad. Remember: These documents were > created in 1991, that is, 28 years ago. (I'm emphasizing it because I > did the math wrong for a long time, I though it was 18 years ago :-D.) > Things have a changed a lot since then. 
It took me a while to work out that the recommendations of ECMA TR/53 had been implemented in Issue 5 of ECMA-48.

> As for the BiDi docs, I found that the current state of the art,
> current best practices, existing BiDi algorithm differ so much from
> ECMA's approach (which no one I'm aware of cared to implement for 28
> years) that the standard is of pretty little use. Only a few good
> parts could be kept (but needed tiny corrections), and plenty of other
> things needed to be built up anew. This is the only reasonable way to
> move forward.

The relationship between the data store and the presentation store doesn't seem to be very well defined. There may be room for the BiDi algorithm there.

> If you designed a house 2 or 3 years ago, and finally have the money
> to get it built, you can reasonably start building it. If you designed
> a house 28 years ago and finally have the chance to build it
> (including the exact same heating technologies, electrical system
> etc.), you wouldn't, would you? I'm sure you looked at those plans,
> and started at the very least heavily updating them, or started to
> design a brand new one, perhaps somewhat based on your old ideas.

But a scheme may be more persuasive if it can be said to conform to ECMA-48.

One thing that is very unclear in ECMA-48 is how characters are allocated to cells in 'implicit' mode. As the Arabic encoding considered contained harakat, it looks as though the allocation is defined by 'unspecified protocols'. I note that in the scheme apparently given most consideration, forced Arabic presentation forms are selected by a combination of escape sequences and Arabic letters. The 'unspecified protocols' could be interpreted as one grapheme cluster* per group of cells. The typical groups would be one cell and the two cells for a CJK character.

*Grapheme cluster is a customisable concept.

> I don't expect it to be any different with "fields" of ECMA-48.
I'm > not aware of any terminal emulator implementing anything like them, > whatever they are. Probably there's a good reason for that. Whatever > purpose they aimed to serve apparently wasn't important enough for > such a long time. By now, if they're found important, they should > probably be solved by some new design (or at the very least, just like > I did with TR/53, the work should begin by evaluating that standard to > see if it's still feasible). > Instead of spending a huge amount of work on my BiDi proposal, I could > have just said: "guys, let's go with ECMA for BiDi handling". The > thing is, I'm pretty sure it wouldn't have taken us anywhere. I don't > expect it to be different with "fields" either. Your interpretation document would have explored the issues. > The starting point for my work was the current state of terminal > emulators and the surrounding ecosystem, plus the current BiDi > algorithm; not some ancient plan that was buried deep in some drawer > for almost three decades. I hope this makes sense. You're assuming that the
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Eli,

> I think it's unreasonable and impractical to expect 'echo', 'cat', and
> its ilk to emit bidi controls (or any other controls) to force
> paragraph direction. For starters, they won't know what direction to
> force, because they don't understand the text they are processing.

I agree, it is unreasonable for 'echo', 'cat' etc. to emit BiDi controls.

There could be some higher level helper utilities though, let's say a "bidi-cat" that examines the file, makes a guess, emits the corresponding escape sequences and cats the file. It's not necessarily a good approach, but a possible one (at least temporarily until terminals implement a better one).

On the other hand, it's not unreasonable for higher level stuff (e.g. shell scripts, or tools like "zip") to use such control characters.

> No, this simple case must work reasonably well with the application
> _completely_ oblivious to the bidi aspects. If this can't work
> reasonably well, I submit that the entire concept of having a
> bidi-aware terminal emulator doesn't "hold water".

There isn't a magic wand. I can't magically fix every BiDi issue by changing the terminal emulator's source code. Not because I'm clumsy, but because it just can't be done. If it was possible, I wouldn't have written a long specification, I would have just done it. (Actually, if it was possible, others would surely have done it long before I joined terminal emulator development.)

There need to be multiple modes, some of them due to the technical particularities of terminal emulation that aren't seen elsewhere (e.g. explicit vs. implicit), and some of them because they are present everywhere where it comes to BiDi (e.g. paragraph direction). And if the mode is not set correctly, things might break, there's nothing new in it.

What my specification essentially changes is that with this specification, you at least will have a chance to get the mode right.
Currently there are perhaps like 4 different behaviors implemented across terminal emulators when it comes to BiDi. An application cannot control and cannot query the behavior. In order to get Emacs to behave properly, you have to ask your users to adjust a setting (and I cannot repeat enough times that I find this an unacceptable user experience). If the settings of the terminal aren't what Emacs expects, the result could be broken (RTL words might even show up in reverse, LTR order).

The same goes for the random example of "zip -h", assuming that they add a Hebrew translation. Given the current set of popular terminal emulators, there's no way zip could emit some Hebrew text in a reliably readable way. Whatever it does, there will be terminal emulators (and settings thereof) where the result is totally broken (reversed), or at least unpleasant (wrong paragraph direction used).

Moreover, if "zip" emits the Hebrew text in the semantically correct logical order (e.g. they use whatever existing framework, like gettext and a popular .po editor), as opposed to the visual LTR order seen in some legacy systems, it will need different terminal emulator settings than Emacs, so if someone uses both zip and Emacs regularly, they'll have to continuously toggle their terminal's settings back and forth – have I mentioned how unacceptable I find this as a user? :)

One of the key points of my specification is that applications will be able to automatically set the mode. Emacs will be able to switch to the mode it requires, and so will zip. They will have the opportunity. If they don't take this opportunity, it's not my problem, and there's nothing I could do about it.

Let's say hypothetically that zip adds Hebrew translations, but refuses to emit the escape sequence that switches to RTL paragraph direction, and thus its result doesn't look perfect. Can terminal emulators, can my specification, can I be blamed in this case? I don't think so.
If zip knows exactly what it wants to print (as with the help page it knows for sure), and is given all the technical infrastructure to reliably achieve that, it'd be solely them to blame if they refused to properly use it. It's absolutely out of the scope of my work to try to fix this case.

"cat" is substantially different. In case of "zip", the creators of that software know exactly what the output should look like, and according to my specification (assuming a conforming terminal emulator, of course) nothing stops them from achieving it. "cat" doesn't know, cannot know the desired look, since the file itself lacks this information.

Paragraph direction is a concept that sucks big time. (I have no idea how Unicode could have got it better, though.) It's a piece of information that needs to be carried externally along with the text, in order to make sure it'll be displayed correctly. It's a pain in the butt, just as much as carrying the encoding in the pre-Unicode days was, which hardly anyone cared about, resulting in incorrect accented letters way too often. Practically everyone's lazy and
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Mon, 04 Feb 2019 22:39:07 +0200 Eli Zaretskii via Unicode wrote: > > Date: Mon, 4 Feb 2019 19:45:13 + > > From: Richard Wordingham via Unicode > > > > Yes. If one has a text composed of LTR and RTL paragraphs, one has > > to choose how far apart their starting margins are. I think that > > could get complicated for plain text if the terminal has unbounded > > width. > > But no real-life terminal does. The width is always bounded. The Emacs terminal (M-x term) seems to be a reasonable approximation, with the scroll-left and scroll-right commands changing the margins' separations. This is an example of a terminal that has lines with left-to-right character paths and lines with right-to-left character paths. (Such lines are necessarily separated by blank lines.) Geometrically, column positions on left-to-right and right-to-left character paths are incomparable - resizing the window and scrolling move them differently. Richard.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> > Yes. If one has a text composed of LTR and RTL paragraphs, one has to
> > choose how far apart their starting margins are. I think that could
> > get complicated for plain text if the terminal has unbounded width.
>
> But no real-life terminal does. The width is always bounded.

Allegedly the no longer maintained FinalTerm, and maybe another one or two not so popular terminal emulators experimented with this. VTE and a few other emulators have also received such a feature request; VTE has rejected it. See https://bugzilla.gnome.org/show_bug.cgi?id=769440 if you're curious.

Indeed BiDi becomes problematic in the sense that Richard pointed out: how far should the starting margins be from each other? Since terminal emulators reject the idea of unbounded width, this is not a problem for them.

It might still be a problem for BiDi aware text viewers/editors, though. I mean one possible, obvious approach could be to adjust them according to the terminal's width. Another is to take it from the file's contents (e.g. longest line). But maybe there's demand for other options, e.g. to have those margins 80 characters away from each other even when the file is viewed on a mobile phone where the viewport is narrower and the user wishes to scroll horizontally. This is up to text viewers/editors to decide.

cheers,
egmont
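The three margin policies mentioned above (terminal width, longest line, a fixed distance such as 80) could look roughly like this in a viewer. It is a hypothetical sketch: the names `layout_width` and `place` are made up for illustration, and `len()` counts code points rather than terminal cells, which is an oversimplification for wide or combining characters.

```python
import shutil

def layout_width(lines, policy="terminal", fixed=80):
    """Distance between the LTR and RTL starting margins, in columns."""
    if policy == "terminal":
        # Follow the current viewport width.
        return shutil.get_terminal_size().columns
    if policy == "longest":
        # Take the width from the file's contents.
        return max((len(line) for line in lines), default=0)
    if policy == "fixed":
        # Keep the margins a fixed distance apart; the user may have to
        # scroll horizontally on a narrower viewport.
        return fixed
    raise ValueError("unknown policy: " + policy)

def place(line, rtl, width):
    """Start RTL lines at the right margin and LTR lines at the left."""
    return line.rjust(width) if rtl else line.ljust(width)

lines = ["short", "a longer line"]
width = layout_width(lines, policy="longest")
print(repr(place("short", rtl=True, width=width)))
```

Which policy is right remains the viewer's decision; the sketch only shows that the choice changes where RTL lines start.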
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Richard,

> That split is wrong if you want the non-HTML text to lay out reasonably
> well in anything but a higher order protocol forcing RTL. You need to
> split it as:
>
> lorem ipsum ABC
> <[ DEF foobar

Okay, so you should use LRMs or other similar tricks when wrapping a human-perceived paragraph of text.

I take it as:

- The expected definition of "paragraph", for the technical sake of running the BiDi algorithm, is lines of the text file (that is, between a newline and the next one).

- On top of this technical definition, the document is crafted so that lines are not longer than a certain threshold, and the human-perceived paragraphs are usually delimited by empty lines (sometimes by other means, like bullets of a list).

Sounds like a reasonable approach to me, probably the best to have. And, by the way, it aligns with my BiDi proposal if the higher level protocol (escape sequences) sets the paragraph direction correctly and disables autodetection.

cheers,
egmont
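To see why autodetection per line can disagree with the direction of the human-perceived paragraph, here is a minimal Python sketch of first-strong detection. It is a simplification, not the full rule from UAX #9 (it does not skip text inside isolates), the name `first_strong_direction` is made up for this example, and real Hebrew letters stand in for the uppercase ABC/DEF of the example above.

```python
import unicodedata

def first_strong_direction(text, default="LTR"):
    """Guess the paragraph direction from the first strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    # No strong character at all: fall back to the caller's default.
    return default

# "lorem ipsum ABC" / "<[ DEF foobar", with Hebrew letters for ABC and DEF.
lines = ["lorem ipsum \u05d0\u05d1\u05d2", "<[ \u05d3\u05d4\u05d5 foobar"]

per_line = [first_strong_direction(line) for line in lines]
whole = first_strong_direction(" ".join(lines))
print(per_line, whole)
```

Detected line by line, the second line comes out as RTL, while the same text treated as one paragraph is LTR. Setting the direction explicitly and disabling autodetection avoids that flip.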
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Richard,

> The concept appears to exist in the form of the fields of the
> fifth edition of ECMA-48. Have you digested this ambitious standard?

To be honest: No, I haven't. And I have no idea what those "fields" are.

I spent (read: wasted) way too much time studying ECMA TR/53 to get to understand what it's talking about, to realize that the good parts were already obvious to me, and to be able to argue why I firmly believe that the bad parts are bad. Remember: These documents were created in 1991, that is, 28 years ago. (I'm emphasizing it because I did the math wrong for a long time, I thought it was 18 years ago :-D.) Things have changed a lot since then.

As for the BiDi docs, I found that the current state of the art, current best practices, and the existing BiDi algorithm differ so much from ECMA's approach (which no one I'm aware of cared to implement for 28 years) that the standard is of pretty little use. Only a few good parts could be kept (but needed tiny corrections), and plenty of other things needed to be built up anew. This is the only reasonable way to move forward.

If you designed a house 2 or 3 years ago, and finally have the money to get it built, you can reasonably start building it. If you designed a house 28 years ago and finally have the chance to build it (including the exact same heating technologies, electrical system etc.), you wouldn't, would you? I'm sure you would look at those plans and, at the very least, start heavily updating them, or start designing a brand new one, perhaps somewhat based on your old ideas.

I don't expect it to be any different with "fields" of ECMA-48. I'm not aware of any terminal emulator implementing anything like them, whatever they are. Probably there's a good reason for that. Whatever purpose they aimed to serve apparently wasn't important enough for such a long time.
By now, if they're found important, they should probably be solved by some new design (or at the very least, just like I did with TR/53, the work should begin by evaluating that standard to see if it's still feasible). Instead of spending a huge amount of work on my BiDi proposal, I could have just said: "guys, let's go with ECMA for BiDi handling". The thing is, I'm pretty sure it wouldn't have taken us anywhere. I don't expect it to be different with "fields" either. The starting point for my work was the current state of terminal emulators and the surrounding ecosystem, plus the current BiDi algorithm; not some ancient plan that was buried deep in some drawer for almost three decades. I hope this makes sense. That being said, I'd really, honestly love to see if someone evaluated ECMA's "fields" and created a feasibility study for current terminal emulators, similarly to how I did it with TR/53. cheers, egmont
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> Date: Mon, 4 Feb 2019 19:45:13 + > From: Richard Wordingham via Unicode > > Yes. If one has a text composed of LTR and RTL paragraphs, one has to > choose how far apart their starting margins are. I think that could > get complicated for plain text if the terminal has unbounded width. But no real-life terminal does. The width is always bounded.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Mon, 04 Feb 2019 18:53:22 +0200 Eli Zaretskii via Unicode wrote: > Date: Mon, 4 Feb 2019 01:19:21 + > From: Richard Wordingham via Unicode >> If you look at it in Notepad, all >> lines will be LTR or all lines will be RTL. > That's because Notepad implements _only_ the higher-level protocol for > base paragraph direction: there's no way to make Notepad determine the > direction by looking at the text. Yes. If one has a text composed of LTR and RTL paragraphs, one has to choose how far apart their starting margins are. I think that could get complicated for plain text if the terminal has unbounded width. Richard.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> Date: Mon, 4 Feb 2019 01:19:21 + > From: Richard Wordingham via Unicode > > On Sun, 03 Feb 2019 19:50:50 +0200 > Eli Zaretskii via Unicode wrote: > > > Do you see how this is carefully formatted to avoid overflowing an > > 80-column line of a typical terminal? Now suppose this is translated > > into a RTL language, which causes the Copyright line to start with a > > strong R letter (because "Copyright" is translated). You will see the > > first line flushed to the right margin, then the next line flushed to > > the left margin (because it's a separate paragraph, and starts with a > > strong L letter). Then the line which says "The default action..." > > will again start at the right. And so on and so forth -- the result > > is extremely ugly. > > Depending on the environment. If you look at it in Notepad, all lines > will be LTR or all lines will be RTL. That's because Notepad implements _only_ the higher-level protocol for base paragraph direction: there's no way to make Notepad determine the direction by looking at the text. > Would not a careful translator either ensure that each non-blank > line had a strong character and that all first strong characters > were (a) L, (b) R or (c) AL? This is very hard in practice, and is a tremendous annoyance when translating message catalogs to RTL languages. Translation is a hard enough job even without this complication.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Mon, 4 Feb 2019 00:36:23 +0100 Egmont Koblinger via Unicode wrote: > Now, back to terminals. > > The smallest possible viable definition of a "paragraph" in terminal > emulators is stuff between one newline and the next one. > > It would require a hell lot of work, redesigning (overcomplicating) > plenty of basics of terminal emulation to be able to come up with > smaller units, e.g. cells of a table – a concept that doesn't > currently exist in this world –, I don't find any such approach > feasible at all. The concept appears to exist in the form of the fields of the fifth edition of ECMA-48. Have you digested this ambitious standard? ECMA-48 has the concept of hyphenation and wrapping! (Well, in Appendix C it does. I haven't fully tied it in with the receipt of characters.) Richard.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Mon, 4 Feb 2019 00:36:23 +0100 Egmont Koblinger via Unicode wrote:

> I wish to store and deliver the following text, as it's laid out here
> in logical order. That is, the order as the bytes appear in the text
> file, as I typed them from the keyboard, is laid out here strictly
> from left to right, with uppercase standing for RTL letters, and no
> mirroring:
>
> lorem ipsum ABC <[ DEF foobar

> Let's assume that I, as the producer of the text file, wish to create
> a typical README in the spirit of COPYING.GPL and similar text files,
> with the paragraph definition that two consecutive newline characters
> (that is: a single empty line) delimit paragraphs; and a single
> newline is equivalent to a space. Since I'd prefer to keep a margin of
> 16 characters in the source file (for demo purposes), I can take the
> liberty of replacing the space after "ABC" by a single newline. (Maybe
> my text editor does this automatically.) The file's contents, again
> the logical order laid out from left to right, top to bottom, becomes
> this:
>
> lorem ipsum ABC
> <[ DEF foobar

That split is wrong if you want the non-HTML text to lay out reasonably well in anything but a higher order protocol forcing RTL. You need to split it as:

lorem ipsum ABC <[ DEF foobar

or

lorem ipsum ABC <[ DEF foobar

Richard.
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
On Sun, 03 Feb 2019 19:50:50 +0200 Eli Zaretskii via Unicode wrote: > Do you see how this is carefully formatted to avoid overflowing an > 80-column line of a typical terminal? Now suppose this is translated > into a RTL language, which causes the Copyright line to start with a > strong R letter (because "Copyright" is translated). You will see the > first line flushed to the right margin, then the next line flushed to > the left margin (because it's a separate paragraph, and starts with a > strong L letter). Then the line which says "The default action..." > will again start at the right. And so on and so forth -- the result > is extremely ugly. Depending on the environment. If you look at it in Notepad, all lines will be LTR or all lines will be RTL. Would not a careful translator either ensure that each non-blank line had a strong character and that all first strong characters were (a) L, (b) R or (c) AL? Text in LTR scripts tends not to be so careful. Richard.
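The check a careful translator would have to perform by hand can be sketched as a small lint in Python. The function names are hypothetical, and the reading that "consistent" means every non-blank line has the same first strong class (L, R or AL) is an assumption based on the question above.

```python
import unicodedata

def first_strong_class(line):
    """Return 'L', 'R' or 'AL' for the first strong character, else None."""
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("L", "R", "AL"):
            return bidi_class
    return None

def check_message(message):
    """Map each non-blank line number (1-based) to its first strong class."""
    report = {}
    for number, line in enumerate(message.splitlines(), start=1):
        if line.strip():
            report[number] = first_strong_class(line)
    return report

def is_consistent(message):
    """True if every non-blank line has the same first strong class."""
    classes = set(check_message(message).values())
    return len(classes) == 1 and None not in classes

# A translated first line followed by untranslated lines.
message = "\u05d0\u05d1 2008\nZip 3.0\n\n  -f \u05d2\u05d3"
print(check_message(message), is_consistent(message))
```

A message catalog tool could run such a check on every translated string, although it only reports the problem and does not make the translator's job any easier.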
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Eli,

(I'm responding in multiple emails.)

The Unicode BiDi algorithm states that it operates on paragraphs of text, and leaves it up to a higher protocol to define what a paragraph exactly is.

What's the definition of "paragraph" in the context of plain text files? I don't think there's a single well-established practice.

In some particular text files, every explicit newline character starts a new paragraph. In some (e.g. COPYING.GPL and friends), an empty line (that is: two consecutive newline characters) separates two paragraphs. In some, e.g. in Emacs's TUTORIAL.he, or markdown files, it's way more complicated, probably there isn't a well-defined grammar for how exactly bullet list entries and the like should become new paragraphs. In the output of "dpkg -s packagename" consecutive lines indented by 1 space – except for those where there's only a single dot after the space – form the human-perceived paragraphs. There are surely several other syntaxes out there.

If the producer of a text file uses a different definition than the viewer software, bugs can arise. I think this should be intuitively obvious, but just in case, let me give a concrete example. In this example I'll assume LTR paragraph direction set up by some external means; with autodetected paragraph direction it's much easier to come up with such breakages.

I wish to store and deliver the following text, as it's laid out here in logical order.
That is, the order as the bytes appear in the text file, as I typed them from the keyboard, is laid out here strictly from left to right, with uppercase standing for RTL letters, and no mirroring:

lorem ipsum ABC <[ DEF foobar

The visual representation, what I expect to see in any decent viewer software, according to the BiDi algorithm is this:

lorem ipsum FED ]> CBA foobar

The visual representation, in a narrower viewport, might wrap for example like this:

lorem ipsum CBA
FED ]> foobar

which is still correct, given that logical "ABC <[ DEF" is a single RTL run. (This assumes a viewer which, unlike Emacs, follows the Unicode BiDi algorithm for wrapping a paragraph into multiple lines.)

Let's assume that I, as the producer of the text file, wish to create a typical README in the spirit of COPYING.GPL and similar text files, with the paragraph definition that two consecutive newline characters (that is: a single empty line) delimit paragraphs; and a single newline is equivalent to a space. Since I'd prefer to keep a margin of 16 characters in the source file (for demo purposes), I can take the liberty of replacing the space after "ABC" by a single newline. (Maybe my text editor does this automatically.) The file's contents, again the logical order laid out from left to right, top to bottom, becomes this:

lorem ipsum ABC
<[ DEF foobar

This file, according to the paragraph definition chosen earlier, is equivalent to the unwrapped version shown before, and thus should convey the same message.

If I view this file in a piece of software which uses the same paragraph definition for BiDi purposes, the contents will appear as expected. An example for such a viewer is a markdown converter's (that leaves single newlines as-is, and adds a "<p>" at double newlines) output viewed as an html file in a browser.

Here comes the twist. Let's view this latter file with a viewer that uses a _different_ definition for paragraph.
Let's view it in Gedit, Emacs, or the work-in-progress BiDi-aware VTE by "cat"ing it, where every newline begins a new paragraph – that's how these viewers define the notion of "paragraph" for the sake of BiDi. The visual layout in these viewers becomes:

lorem ipsum CBA
<[ FED foobar

which is just not correct. Since here BiDi is run on the two lines separately, the initial "<[" is treated as LTR, placed at the wrong location in the wrong order, and the glyphs aren't mirrored.

Now, Emacs ships a TUTORIAL.he which, for most of its contents (but not everywhere) seems to treat runs between empty lines as paragraphs, while Emacs itself is a viewer that treats runs between single newlines as paragraphs. That is, Emacs is inconsistent with itself.

In case you think I got something wrong with Emacs: Could you please give exact definitions:

- What are the exact units (so-called "paragraphs" by UAX9) that it runs BiDi on when it loads and displays a file?

- What are the exact units (so-called "paragraphs" by UAX9) in TUTORIAL.he on which BiDi needs to be run in order to get the desired readable version?

What most likely happens is that in order to see a difference, you'd need to have more special symbols, or at least a more special constellation of them. Probably TUTORIAL.he is just luckily simple enough that such a difference isn't hit. Another possibility is (and I cannot check because I can't speak Hebrew) that somewhere TUTORIAL.he "cheats" with the logical order to get the desired visual one.

-

Now, back to terminals.

The smallest possible viable definition of a
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> From: Egmont Koblinger > Date: Sun, 3 Feb 2019 17:54:25 +0100 > Cc: unicode@unicode.org > > I'm arguing, although my reasons are not rock solid, that IMHO the > default should be the strict direction as set by SCP, without > autodetection. I think it's unreasonable and impractical to expect 'echo', 'cat', and its ilk to emit bidi controls (or any other controls) to force paragraph direction. For starters, they won't know what direction to force, because they don't understand the text they are processing. No, this simple case must work reasonably well with the application _completely_ oblivious to the bidi aspects. If this can't work reasonably well, I submit that the entire concept of having a bidi-aware terminal emulator doesn't "hold water". > > The fundamental problem here is that most "simple" utilities use hard > > newlines to present text in some visually plausible format. > > Could you please list examples? Just redirect any of them to a file, and look at the file with a hex editor. You will see a hard newline character, 0x0A, at the end of each line. > What I have in mind are "echo", "cat", "grep" and alike, they don't > care about the terminal width. Terminal width is not always relevant here, and I didn't mention it. However, as long as you allude to that, I think your garden-variety text utility does assume the width of a terminal window is 80 columns, and the messages displayed by these programs are formatted accordingly. > If an app cares about the terminal width, how does it care about it? > What does it use this information for? To truncate overlong strings, > for example? To break long lines at appropriate places, and to emit text that fits on a line in the first place. Just try invoking any such utility with the --help option, and you will see what I mean. I give one example below. > At this very moment I'd argue that such applications need > to do BiDi on their own, and thus set the terminal to explicit mode. 
> If an app does any kind of string truncation, it can no longer
> delegate the task of BiDi to the terminal emulator.

I'm afraid this won't fly, because most "simple" utilities do it that way. If you insist on them doing their own bidi, you've just lost your cause. No upstream developer will be interested in adapting their utilities to a terminal emulator that requires them to do their own bidi.

> I'm also mentioning that you cannot both logically and visually
> truncate a BiDi string at once.

I don't understand why you talk about truncation; I didn't. Here, look at this random example:

Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
Zip 3.0 (July 5th 2008). Usage:
zip [-options] [-b path] [-t mmdd] [-n suffixes] [zipfile list] [-xi list]
  The default action is to add or replace zipfile entries from list, which
  can include the special name - to compress standard input.
  If zipfile and list are omitted, zip compresses stdin to stdout.
  -f   freshen: only changed files  -u   update: only changed or new files
  -d   delete entries in zipfile    -m   move into zipfile (delete OS files)
  -r   recurse into directories     -j   junk (don't record) directory names
  -0   store only                   -l   convert LF to CR LF (-ll CR LF to LF)
  -1   compress faster              -9   compress better
  -q   quiet operation              -v   verbose operation/print version info
  -c   add one-line comments        -z   add zipfile comment
  -@   read names from stdin        -o   make zipfile as old as latest entry
  -x   exclude the following names  -i   include only the following names
  -F   fix zipfile (-FF try harder) -D   do not add directory entries
  -A   adjust self-extracting exe   -J   junk zipfile prefix (unzipsfx)
  -T   test zipfile integrity       -X   eXclude eXtra file attributes
  -!   use privileges (if granted) to obtain all aspects of WinNT security
  -$   include volume label         -S   include system and hidden files
  -e   encrypt                      -n   don't compress these suffixes
  -h2  show more help

Do you see how this is carefully formatted to avoid overflowing an 80-column line of a typical terminal?
Now suppose this is translated into a RTL language, which causes the Copyright line to start with a strong R letter (because "Copyright" is translated). You will see the first line flushed to the right margin, then the next line flushed to the left margin (because it's a separate paragraph, and starts with a strong L letter). Then the line which says "The default action..." will again start at the right. And so on and so forth -- the result is extremely ugly.

> > Even when
> > these utilities just emit text read from files (as opposed to
> > generating the text from the program), you will normally see each line
> > end with a hard newline, because the absolute majority of text files
> > have a hard newline at the end of each line.
>
> How does a BiDi
Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
Hi Eli,

> The document cited at the beginning of the parent thread states that
> "simple" text-mode utilities, such as 'echo', 'cat', 'ls' etc. should
> use the "implicit" mode of bidi reordering, with automatic guessing of
> the base paragraph direction.

Not exactly. I take the SCP escape sequence from ECMA TR/53 (and slightly reinterpret it) so that it specifies the paragraph direction, plus introduce a new one that specifies whether autodetection is enabled.

I'm arguing, although my reasons are not rock solid, that IMHO the default should be the strict direction as set by SCP, without autodetection.

> The fundamental problem here is that most "simple" utilities use hard
> newlines to present text in some visually plausible format.

Could you please list examples?

What I have in mind are "echo", "cat", "grep" and alike, they don't care about the terminal width.

If an app cares about the terminal width, how does it care about it? What does it use this information for? To truncate overlong strings, for example? At this very moment I'd argue that such applications need to do BiDi on their own, and thus set the terminal to explicit mode. If an app does any kind of string truncation, it can no longer delegate the task of BiDi to the terminal emulator.

I'm also mentioning that you cannot both logically and visually truncate a BiDi string at once. Either you truncate the logical string, which may result in visual nonsense, or you truncate the visual string, risking that what ends up getting displayed is not an initial fragment of the data.

Along these lines I'm arguing that basic utilities like "cut" shouldn't care about BiDi; the logical behavior there is more important than the visual one. There could, of course, be sophisticated "bidi-cut" and similar utilities at some point which cut the visual string, but they should use the terminal's explicit mode.
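To make the truncation point concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: `toy_visual` is a hypothetical helper that reverses each unbroken run of strong RTL letters, which is a drastic simplification and not the Unicode Bidirectional Algorithm, and the sample string and the 6-column width are made up for the example.

```python
import unicodedata


def is_rtl(ch: str) -> bool:
    # Strong right-to-left bidi classes from the Unicode Character Database.
    return unicodedata.bidirectional(ch) in ("R", "AL")


def toy_visual(logical: str) -> str:
    """Toy reordering (NOT the full UBA): reverse each unbroken run of RTL letters."""
    out, run = [], []
    for ch in logical:
        if is_rtl(ch):
            run.append(ch)
        else:
            out.extend(reversed(run))
            run.clear()
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


# "abc " followed by a four-letter Hebrew word; escapes keep the logical
# (storage) order unambiguous in the source code.
logical = "abc \u05e9\u05dc\u05d5\u05dd"
width = 6  # hypothetical column limit

logical_cut = toy_visual(logical[:width])  # cut the data first, then display it
visual_cut = toy_visual(logical)[:width]   # display first, then cut at the column

print(logical_cut == visual_cut)
```

In this toy model the two results differ: cutting the logical string keeps the first two letters of the Hebrew word, while cutting the visual string keeps its last two letters, which is not an initial fragment of the data.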
> Even when
> these utilities just emit text read from files (as opposed to
> generating the text from the program), you will normally see each line
> end with a hard newline, because the absolute majority of text files
> have a hard newline at the end of each line.

What does a BiDi text file look like, to begin with? Can a heavily BiDi text file be formatted to 72 (or whatever) columns using explicit newlines, keeping BiDi both semantically and visually correct? I truly doubt that. Can you show me such files?

> When bidirectional text is reordered by the terminal emulator, these
> hard newlines will make each line be a separate paragraph. And this
> is a problem, because the result will be completely random, depending
> on the first strong directional character in each line, and will be
> visually very unpleasant. Just take the output produced by any
> utility when invoked with, say, the --help option, and try imagining
> how this will look when translated into a language that uses RTL
> script.

First, having no autodetection by default but rather an explicit control for the overall direction hopefully mitigates this problem.

Second, I outline a possible future extension with a different definition of a "paragraph", maybe something between empty lines, or other kinds of explicit markers.

> So I think determination of the paragraph direction even in this
> simplest case cannot be left to the UBA defaults, and there's a need
> to use "higher-level" protocols for paragraph direction.

That higher-level protocol is part of my recommendation, part of ECMA TR/53, as the SCP sequence.

Does this make sense?

cheers,
egmont
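One way to picture the "something between empty lines" idea is the Python sketch below. It is an assumption-laden illustration, not the behavior of any spec or terminal: `first_strong` and `block_directions` are hypothetical names, blocks are split on a literal blank line, and the first-strong check ignores directional isolates and embeddings.

```python
import unicodedata


def first_strong(text: str):
    """Return "LTR" or "RTL" from the first strong character, or None if there is none."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LTR"
        if cls in ("R", "AL"):
            return "RTL"
    return None


def block_directions(text: str):
    """Treat each run of lines between empty lines as one paragraph (assumed rule)."""
    blocks = [b for b in text.split("\n\n") if b.strip()]
    return [first_strong(b) for b in blocks]


# Two blocks: the first opens with Hebrew letters, the second is plain ASCII.
sample = "\u05e9\u05dc\u05d5\u05dd world\nzip -r \u05e9\u05dc\u05d5\u05dd\n\nplain english\n"
directions = block_directions(sample)
print(directions)
```

With this rule the second line of the first block inherits the block's direction even though it starts with a Latin letter, so lines within one block no longer flip sides independently.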
Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)
> Date: Sun, 03 Feb 2019 18:10:15 +0200
> Cc: richard.wording...@ntlworld.com, unicode@unicode.org
> From: Eli Zaretskii via Unicode
>
> I think there are hard problems even for such "simple" utilities, and
> I will start a separate thread about this.

I think we spent enough time discussing issues of complex script shaping in terminal emulators, something that IMO took us too far aside. The basic problems with bidi reordering of text-mode output start much sooner, and are much more fundamental. I think they should be considered first.

The document cited at the beginning of the parent thread states that "simple" text-mode utilities, such as 'echo', 'cat', 'ls' etc. should use the "implicit" mode of bidi reordering, with automatic guessing of the base paragraph direction. I think this already presents non-trivial problems.

The fundamental problem here is that most "simple" utilities use hard newlines to present text in some visually plausible format. Even when these utilities just emit text read from files (as opposed to generating the text from the program), you will normally see each line end with a hard newline, because the absolute majority of text files have a hard newline at the end of each line.

When bidirectional text is reordered by the terminal emulator, these hard newlines will make each line be a separate paragraph. And this is a problem, because the result will be completely random, depending on the first strong directional character in each line, and will be visually very unpleasant. Just take the output produced by any utility when invoked with, say, the --help option, and try imagining how this will look when translated into a language that uses RTL script.

So I think determination of the paragraph direction even in this simplest case cannot be left to the UBA defaults, and there's a need to use "higher-level" protocols for paragraph direction.
IOW, the implicit mode described in the above-mentioned document needs to be augmented by a smarter method of determining the base paragraph direction. (I might have a suggestion for that, if people agree with the above reasoning.)
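The per-line effect described in this message can be sketched in a few lines of Python. This is only an illustration: `line_direction` is a hypothetical helper that approximates the UBA default of taking the first strong character (it ignores isolates and embeddings), and the Hebrew letters are placeholder text standing in for translated words, not a real translation.

```python
import unicodedata


def line_direction(line: str, default: str = "LTR") -> str:
    """Guess a line's base direction from its first strong character."""
    for ch in line:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LTR"
        if cls in ("R", "AL"):
            return "RTL"
    return default  # no strong character at all (digits, punctuation, blanks)


# Each hard newline starts a new paragraph, so every line is judged alone.
help_lines = [
    "\u05d0\u05d1\u05d2 (c) 1990-2008 Info-ZIP",    # starts with a translated word
    "Zip 3.0 (July 5th 2008).",                     # starts with a Latin name
    "\u05d3\u05d4\u05d5 zip \u05d6\u05d7\u05d8",    # translated sentence
]
per_line = [line_direction(line) for line in help_lines]
print(per_line)
```

The directions alternate between RTL and LTR from one line to the next, which is the ragged left/right layout the message objects to.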