Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-07 Thread Eli Zaretskii via Unicode
> Date: Thu, 7 Feb 2019 22:35:23 +
> From: Richard Wordingham via Unicode 
> 
> > > Do you mean you aim to maintain a regex that matches everyone's
> > > prompt in the world, without a significant amount of false positive
> > > matches on non-prompt lines?  
> 
> > Yes.
> 
> Wow!  You'll do well to match a prompt such as '2p ', which I used for
> a while.

Like I said: for any reasonable prompt that doesn't match, you can
report a bug, and have the Emacs maintainers deliberate whether your
case is important enough to be supported by default.  Failing that,
you can set the regexp to a suitable value in a mode hook defined on
your init file.


Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Richard Wordingham via Unicode
On Fri, 8 Feb 2019 00:38:24 +0100
Egmont Koblinger via Unicode  wrote:

> I, for one, am not to the slightest bit interested in abandoning the
> character grid and allowing for proportional fonts. This would just
> break a gazillion of things.

The message I take from that and this thread in general is that Emacs
and 'M-x term' are the route to take if one only has proportional fonts.
What's the sledgehammer for Windows?

Where do I find the specification for fixed-width fonts (is
wcswidth() the core?) and how do I select the set of fonts to use?  Do I
need to use fontconfig where available?

Richard.


Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
Adding a single bit of protection in cell attributes to indicate they are
either protected or become transparent (and the rest of the
attributes/character field indicates the id of another terminal grid or
rendering plugin creating its own layer and having its own scrolling state
and dimensions) can allow convenient things, including the possibility of
managing a grid-based system of stackable windows.
You can design one of the layers to allow input (managed directly in the
terminal, with local echo, without transmission delays and without risk of
overwriting surrounding contents).
Asynchronous behavior can be defined as well between the remote
application/OS and the local processing in the terminal.
The protocol can also support an extension to provide alternate streams
(take an example on MIME multipart). This can even be used to transport the
inputs and outputs for each layer, and additional streams to support
(java)scripts, or the content of an image, or a link to a video stream.
And just like with classic graphical interfaces, you can have more than just
solid RGB colors and add an alpha layer. The single-rectangular-flat grid
design is not the only option. Layered approaches can then even be rendered
on hardware easily by mapping these virtual layers and flattening them
internally in the terminal emulator to the single flat grid supported by
the hardware. The result is more or less equivalent to graphic RGB frames,
except that the unit is not a single pixel but a whole cell with not just
one color but a pair of colors and an encoded character and a font selected
for that cell, or if a single font is supported, using a dynamic font and
storing glyph ids in that font (prescaled for the cell size). The hardware
then does the rest to build the pixels of the frame, but it can easily be
accelerated.
The layered approach could also be used to link together the cells that
use the same script and font settings, in order to use proportional fonts
when monospaced fonts are not usable, and justify their text in the field
(which may turn to be scrollable itself when needed for input). Having
multiple communication streams between the terminal emulator and the remote
application allows the application to query the properties and behave in a
smarter way than with just static "termcaps" not taking into account the
actual state of the remote terminal.
All this requires some extension to VT-like protocols (using specific
escape sequences, just like with the Xterm extensions for X11).
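
As a rough illustration of the layered-grid idea above, the following Python sketch models a cell with a transparency bit and flattens a stack of layers into the single flat grid. The `Cell` fields and the `flatten`, `render` and `make_layer` helpers are hypothetical names chosen for this sketch; no existing terminal protocol defines them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    """One grid cell; every field name here is hypothetical."""
    char: str = " "
    fg: int = 7                # foreground colour index
    bg: int = 0                # background colour index
    transparent: bool = False  # the extra attribute bit: show the layer below


Layer = List[List[Cell]]  # rows of cells; all layers share one size here


def flatten(layers: List[Layer]) -> Layer:
    """Collapse layers (bottom first, top last) into one flat grid.

    At each position the topmost non-transparent cell wins. If every
    layer is transparent there, a default blank cell is used.
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    flat: Layer = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            chosen = Cell()
            for layer in reversed(layers):  # scan from the top layer down
                cell = layer[r][c]
                if not cell.transparent:
                    chosen = cell
                    break
            out_row.append(chosen)
        flat.append(out_row)
    return flat


def render(grid: Layer) -> List[str]:
    """Return the characters of each row as a string."""
    return ["".join(cell.char for cell in row) for row in grid]


def make_layer(lines: List[str], hole: str = "\0") -> Layer:
    """Build a layer from strings; `hole` marks transparent cells."""
    return [[Cell(transparent=True) if ch == hole else Cell(char=ch)
             for ch in line] for line in lines]
```

For instance, a top layer that is transparent except for a two-cell window, flattened over a base layer, only replaces those two cells of the base grid.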

You can also reconsider how "old" mainframes terminals worked: the user in
fact never submitted characters one by one to the remote application: the
application was sending a full screen and an input form, the user on his
terminal could fill in the form and press a "submit/send" button when he
had finished inputting the data. But while the user was inputting data, there
was absolutely no need to communicate each typed keystroke to the
application, all was taken in charge by the terminal itself which was
instructed (and could even perform form data validation with input formats
and some conditions, possibly as well a script). In other words, they
worked mostly like an HTML input form with a submit button.

Such a mode is very useful for small devices because they don't have to react
interactively to the user, the transmission delays (which may be long)
are no longer a problem, the user can enter and correct data easily, and the
editing facilities don't need to be handled by the remote application
(which today could be a very tiny device with in fact much less processing
power than the terminal emulator, and would have in fact no knowledge at
all of the fonts needed). A terminal emulator can do a lot of things
itself and locally. And this would also be useful on many modern
application servers that need to serve lots of remote clients, possibly over
very slow internet links and long roundtrip times.

The idea behind this is to allow distributing the workload and deciding
which side will handle part or all of the I/O. Of course it will transport
text (preferably in a Unicode UTF), but text is not the only content to
transport. There are also audio/video/images, security items (certificates,
signatures, personal data that should remain private and be encrypted, or
only sent to the application in a one-way-hashed form), plus some
states/flags that could provide visual/audio hints to the user when working
in the rendered input/output form with his local terminal emulator.

I spoke about HTML because terminal-based browsers have existed for a long
time, and some of them are still maintained in 2019 (w3m, still used as a
W3C-sponsored demo; Lynx, the best known on Linux; or elinks):
  https://www.slant.co/topics/4702/~web-browsers-that-run-in-a-terminal
This gives a good idea of what is needed, what a good terminal protocol can
do, and what the many legacy VT-like protocol variants have never tried to
unify. These browsers don't reinvent the wheel: HTML 

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Egmont Koblinger via Unicode
Hi Philippe,

> I have never said anything about your work because I don't know where you 
> spoke about it or where you made some proposals. I must have missed one of 
> your messages (did it reach this list?).

This entire conversation started by me announcing here my work, aiming
to bring usable BiDi to terminal emulators.

> Terminals are not displaying plain text, they create their own upper layer 
> protocol which requires and enforces the 2D layout [...] Bidi does not 
> specify the 2D layout completely, it is purely 1D and speaks about left and 
> right direction

That's one of the reasons why it's not as simple as "let's just run
the UBA inside the terminal", one of the reasons why gluing the two
worlds together requires a substantial amount of design work.

> For now terminal protocols, and emulators trying to implement them; that must 
> mix the desynchronized input and output (especially when they have to do 
> "local echo" of the input [...]

I assume by "local echo" you're talking about the Send/Receive Mode
(SRM) of terminals, and not the "stty echo" line discipline setting of
the kernel, because as far as the terminal emulator is concerned, the
kernel is already remote, and it's utterly irrelevant for us whether
it's the kernel or the application sending back the character.

SRM is only supported by a few terminal emulators, and we're about to
drop it from VTE, too (https://gitlab.gnome.org/GNOME/vte/issues/69).

> If you look at historic "terminal" protocols,

I'm mostly interested in the present and future. In the past, only for
curiosity, and to the extent necessary to understand the present and
to plan for the future.

> Some older terminal protocols for mainframes notably were better than today's 
> VT-like protocols: you did not transmit just what would be displayed, but you 
> also described the screen area where user input is allowed and the position 
> of fields and navigation between them:

This is not seen in today's graphical terminal emulators.

> Today these links are better used with real protocols made for 2D and 
> allowing a web application to manage the input with presentation layer (HTML) 
> and with javascript helpers (that avoid the roundtrip time).

Sure, if you need another tool, let's say a dynamic webpage in your
browser, rather than a terminal emulator to perform your task
effectively, so be it. I'm not claiming terminal emulators are great
for everything, I'm not claiming terminal emulators should be used for
everything.

> But basic text terminals have never evolved and have lagged behind today's 
> need.

I disagree with the former part. There are quite a few terminal
emulators out there, and many have added plenty of new great features
recently.

Whether they're up to today's needs, depends on what your needs are.
If you need something utterly different, go ahead and use whatever
that is, such as maybe a web browser. If you're good with terminals,
that's fine too. And there's a slim area where terminal emulators are
mostly good for you, you'd just need a tiny little bit more from them.
And maybe for some people this tiny little bit more happens to be
BiDi.

> Most of them were never tested for internationalization needs:

Terminal emulators weren't created with internationalization in mind.
I18n goals are added one by one. Nowadays combining accents and CJK
are supported by most emulators. Time to stretch it further with BiDi,
shaping, spacing combining marks for Devanagari, etc.

> [...] delimit input fields in input forms for mainframes, something that was 
> completely forgotten and remains forgotten today with today's VT-* protocols, 
> to indicate which side of the communication link controls the content of 
> specific areas

Something that was completely forgotten, probably for good reasons,
and I don't see why it should be brought back.

> As well today's VT-* protocols have no possibility to be scriptable: 
> implementing a way to transport fragments of javascripts would be fine.

I have absolutely no incentive to work in this direction.

> Text-only terminals are now aging but no longer needed for user-friendly 
> interaction, they are used for technical needs where the only need is to be 
> able to render static documents without interacting with it, except 
> scrolling it down, and only if they provide help in the user's language.

Text-only terminals are no longer needed??? Well, strictly speaking,
computers aren't needed either, people lived absolutely fine lives
before they were invented :)

If you get to do some work, depending on the kind of work, terminal
emulators may or may not be a necessary or a useful tool for you. For
certain tasks you don't really have anything else, or at least
terminals are way more effective than other approaches. For other
tasks (e.g. text editing) it's mostly a matter of taste whether you
use a terminal or a graphical app. For yet other tasks, terminal
emulators take you nowhere.

My work aims to bring BiDi into 

Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-07 Thread Richard Wordingham via Unicode
On Thu, 07 Feb 2019 22:00:20 +0200
Eli Zaretskii via Unicode  wrote:

> > From: Egmont Koblinger 
> > Date: Thu, 7 Feb 2019 19:01:33 +0100

> > On Thu, Feb 7, 2019 at 6:53 PM Eli Zaretskii  wrote:

> > > No, it needs no interaction.  Unless the regexp doesn't work for
> > > you, which you should then report as a bug in Emacs.  

> > Do you mean you aim to maintain a regex that matches everyone's
> > prompt in the world, without a significant amount of false positive
> > matches on non-prompt lines?  

> Yes.

Wow!  You'll do well to match a prompt such as '2p ', which I used for
a while.

Richard.


Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
On Thu, 7 Feb 2019 at 19:38, Egmont Koblinger wrote:

> As you can see from previous discussions, there's a whole lot of
> confusion about the terminology.


And that was exactly the subject of my first message sent to this thread!
You probably missed it.


> Philippe, with all due respect, I have the feeling that you have some
> fundamental problems with my work (and I'm tempted to ask back: have
> you read it at all?), but your message what your problem is just
> doesn't come across to me. Could you please avoid all those irrelevant
> stories with baud rate and font size and Asian scripts and whatnot,
> and clearly get to your point?
>

I have never said anything about your work because I don't know where you
spoke about it or where you made some proposals. I must have missed one of
your messages (did it reach this list?). So don't take that as a personal
attack because this only started on a reply I made (the one specifically
speaking about the various ambiguities of encoded newlines in terminal
protocols, which do not match the basic plain text definition (similar to
MIME) made only for static documents, but never tuned for interactive
bidirectional use (including for example text editors, which also require
a model of 2D layout, and also set some assumptions about
"characters" visible in a single cell of a regularly spaced grid, and a
known number of lines and columns, independent of the lines of the text
rendered and read on it)).

Terminals are not displaying plain text, they create their own upper layer
protocol which requires and enforces the 2D layout (whereas Unicode is a
purely linear protocol with only relations between one character and the
next one in a 1D stream, and no assumption at all about their display
width, which cannot be monospaced in all scripts and are definitely not
encoded in logical order: try adding characters at end of a logical line,
with a Bidi text you do not just replace the content of one cell, you have
to scroll the content of surrounding cells and your input caret position
does not necessarily change, or you'll reach a point where a visual line
will be split in two parts, but not at the rest position, and some parts
moved up or down).

Bidi does not specify the 2D layout completely, it is purely 1D and speaks
about left and right direction and does not specify what happens when
contents do not fit on the visual line for the text which is already
present there before inserting new text or even what will be replaced if
you are in replace mode and not in insert mode: The Bidi algorithm is not
designed to handle overwrites, and not even the whole Unicode standard
itself, which is made as if all text was inserted only at end of lines and
not replacing anything.

For now terminal protocols, and emulators trying to implement them; that
must mix the desynchronized input and output (especially when they have to
do "local echo" of the input for performance reason over slow serial links
where there's no synchronization between the local buffer of the terminal
and the remote virtual buffer of the terminal emulator in the emitting app,
even those using the best "termcap" definitions) have no easy way to do
that. The logical encoding of Unicode does not play well and the time to
resynchronize the local and remote buffers is a limiting factor (over a
9.6kbps link, refreshing the whole screen takes too long, and this cannot
be done on every keystroke of input, or user input would have to be
dramatically slow if local echoing is also enabled, or most user inputs
that are too fast would have to be discarded, and this makes user input
very unreliable, requiring constant correction; these protocols are
definitely not human-friendly as they depend on strict timing which is not
the way humans enter text; this timing is also unpredictable and very
variable over serial links and the protocols do not have any specification
for timing requirements. In fact time is constantly ignored, even if it
plays an evident role).

If you look at historic "terminal" protocols, techniques were used to control
time: notably the XON/XOFF protocols, or mechanical constraints. Especially
when the output was a printer (with a daisywheel or matrix head). But time
was just control between one machine and another, a human could not really
interact asynchronously. And it was in a time where full-screen text
editors did not even exist (at most they were typing "on the flow" and text
layout was completely forgotten). This changed radically when the output
became a screen, with the assumption that the output was instantaneous, but
the mechanical restrictions were removed.

Some older terminal protocols for mainframes notably were better than
today's VT-like protocols: you did not transmit just what would be
displayed, but you also described the screen area where user input is
allowed and the position of fields and navigation between them: the
terminal had then no difficulty to avoid breaking the output when 

Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-07 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger 
> Date: Thu, 7 Feb 2019 19:01:33 +0100
> Cc: Richard Wordingham , 
>   unicode Unicode Discussion 
> 
> On Thu, Feb 7, 2019 at 6:53 PM Eli Zaretskii  wrote:
> 
> > No, it needs no interaction.  Unless the regexp doesn't work for you,
> > which you should then report as a bug in Emacs.
> 
> Do you mean you aim to maintain a regex that matches everyone's prompt
> in the world, without a significant amount of false positive matches
> on non-prompt lines?

Yes.


Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Egmont Koblinger via Unicode
Hi Philippe,

On Thu, Feb 7, 2019 at 3:21 PM Philippe Verdy  wrote:

> "Rules" are not formally written, they are just a sense of best practices.

When it comes to BiDi in terminals, I haven't seen anything that I
consider reasonably okay, let alone "best practice". It's a mess.
That's why I decided to come up with something.

> Bidi plays very badly on terminals

Agreed. There's essentially two ways from here: just leave it as bad
as it is (or even see various terminal emulators coming up with not
well-thought-out hacks that just make it even worse) or try to
improve. I picked the latter.

> [...] refreshing a typical 80x25 screen takes about one half second, which is 
> much longer than typical user input, so full screen refresh does not work for 
> data input and editing, and terminals implement themselves the echo of user 
> input, ignoring how and when the receiving application will handle the input, 
> and also ignoring if the application is already sending output to the terminal.

I'm really unsure where you're trying to get with it.

For one, adding BiDi doesn't introduce the need for significantly
larger updates. Whenever a partial repaint of the screen was
sufficient, even with BiDi in the game it will remain sufficient.

Another thing: I'm not sure that 9.6kbps is a bottleneck to worry
about. It's present if you connect to a device via serial port, but
will you really do this in combination with BiDi? The use case I much
more have in mind is running a terminal emulator locally, or ssh'ing
to a remote machine, for getting various kinds of productive work
done (e.g. writing a text file in someone's native RTL script in a
text editor). These are magnitudes faster.
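
For scale, the cost of a full-screen repaint over a serial link can be estimated with a little arithmetic. This is only a sketch: it assumes one byte per cell and 8N1 framing (10 bits on the wire per byte), neither of which is stated in this thread, and it ignores escape sequences and multi-byte UTF-8. The function name is made up for the example.

```python
def full_refresh_seconds(cols: int, rows: int, bits_per_second: int,
                         bits_per_byte_on_wire: int = 10) -> float:
    """Time to send every cell of a cols x rows screen once.

    Assumes one byte per cell and `bits_per_byte_on_wire` bits of
    framing per byte (10 for 8N1: start bit + 8 data bits + stop bit).
    """
    total_bits = cols * rows * bits_per_byte_on_wire
    return total_bits / bits_per_second


# Compare a few common serial line rates for an 80x25 screen.
for rate in (9600, 38400, 115200):
    print(f"{rate:>6} bps: {full_refresh_seconds(80, 25, rate):.2f} s")
```

Under these assumptions a full 80x25 repaint takes roughly two seconds at 9600 bps, and a partial repaint of a few cells is proportionally cheaper.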

> It's hard or impossible to synchronize this and local echoes on the terminal 
> causes havoc.

If input mixes with output (e.g. you press some keys while you're
waiting for make/gcc to compile your app, and these letters appear
onscreen), the visual result is broken even without BiDi. I cannot
eliminate this kind of breakage by introducing BiDi, nor can I build up
something from scratch that somewhat resembles the current terminal
emulator world but fixes all of its oddnesses.

> But the concept of "line" or "paragraph" in a terminal protocols is extremely 
> fuzzy. It's then very difficult to take into account the additional Bidi 
> constraints as it's impossible to conciliate BOTH the logical ordering (what 
> is encoded in the transmitted data or kept in history buffers) and the visual 
> ordering.

I don't try to conciliate logical and visual ordering within the same
paragraph, I agree it's impossible, it's a semantical nonsense. But I
try to conciliate them in the sense that sometimes the visual order is
the desired one, sometimes the logical order, so let's make it
possible to use one for one paragraph, and the other one for another
paragraph.

> That's why there are terminal protocols that absolutely don't want to play 
> with the logical ordering and require all their data to be transmitted in 
> visual order (in which case, there's no bidi handling at all).

This is one of the modes in my recommendation. If your application
requires this mode (as e.g. Emacs does), use this mode and you're
good.

> In fact most terminal protocols are very defective and were never designed to 
> handle Bidi input

Maybe it's high time someone fixed this defect, then? :)

> And here your unit (logical lines) is not even defined in the terminal 
> protocol and not known from the emitting application, which has no input 
> about the final output terminal properties. So the terminal must perform 
> guesses. As it can insert additional linebreaks itself, and scroll out some 
> portion of it, there's no way to delimit the effect of "bidi controls". The 
> basic requirement for correctly handling bidi controls is to make sure that 
> paragraph delimitations are known and stable. If additional breaks can occur 
> anywhere on what you think is a "logical line" but which is different from 
> the emitting application (or static text document which is output "as is" 
> without any change to reformat it), these bidi controls just make things worse 
> and it becomes impossible to make reasonable guesses about paragraph 
> delimitations in the terminal. The result becomes unpredictable and most often 
> will not even make any sense as the terminal always uses visual ordering but 
> loses track of the logical ordering (and things get worse when there are complex 
> clusters or characters that cannot even fit in a monospaced grid).

If an exact definition of hard vs. soft wrapped lines is what you miss
from the specification, okay, I'll add it to a future version.

I don't know how terminals performing guesses occurred to you, they
sure don't (as for hard vs. soft newlines).
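
One way to picture the hard vs. soft wrap distinction: a minimal sketch, assuming the emulator stores a per-row flag saying whether the row overflowed onto the next one (soft wrap) or ended with an explicit newline (hard wrap). `Row` and `logical_lines` are hypothetical names for this example, not taken from any terminal's source or from the specification discussed here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Row:
    text: str
    soft_wrapped: bool  # True if the text continues on the next row


def logical_lines(rows: Iterable[Row]) -> List[str]:
    """Join soft-wrapped rows back into the lines the application emitted."""
    lines: List[str] = []
    current: List[str] = []
    for row in rows:
        current.append(row.text)
        if not row.soft_wrapped:  # a hard newline ends the logical line
            lines.append("".join(current))
            current = []
    if current:  # trailing soft-wrapped rows with no hard newline yet
        lines.append("".join(current))
    return lines
```

With such a flag no guessing is involved: the rows `"hello wo"` (soft-wrapped) and `"rld"` (hard) are known to form the single logical line `"hello world"`, whatever the window width.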

> The basic requirement for correctly handling bidi controls is to make sure 
> that paragraph delimitations are known and stable.

Since we're talking about bidi controls being emitted, 

Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-07 Thread Egmont Koblinger via Unicode
On Thu, Feb 7, 2019 at 6:53 PM Eli Zaretskii  wrote:

> No, it needs no interaction.  Unless the regexp doesn't work for you,
> which you should then report as a bug in Emacs.

Do you mean you aim to maintain a regex that matches everyone's prompt
in the world, without a significant amount of false positive matches
on non-prompt lines?

(It's getting damn off-topic though.)


e.


Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-07 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger 
> Date: Thu, 7 Feb 2019 18:20:02 +0100
> Cc: Richard Wordingham , 
>   unicode Unicode Discussion 
> 
> > It uses a regular expression, see term-prompt-regexp.
> 
> So, it's not automatic, needs user interaction

No, it needs no interaction.  Unless the regexp doesn't work for you,
which you should then report as a bug in Emacs.


Re: Bidi paragraph direction in terminal emulators

2019-02-07 Thread Egmont Koblinger via Unicode
On Thu, Feb 7, 2019 at 6:33 PM Eli Zaretskii  wrote:

> Well, let's just say that Emacs uses the HL1 rule, and determines the
> base direction for the entire chunk of text between empty lines.

Exactly!

Now it's my turn to figure out how to add this behavior to terminals,
preferably stopping before/after prompts too.


cheers,
egmont


Re: Bidi paragraph direction in terminal emulators

2019-02-07 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger 
> Date: Thu, 7 Feb 2019 18:12:37 +0100
> Cc: Richard Wordingham , 
>   unicode Unicode Discussion 
> 
> I believe it's not my mental model that's weird, but your use of
> terminology that doesn't match UBA's that confused me.

Well, let's just say that Emacs uses the HL1 rule, and determines the
base direction for the entire chunk of text between empty lines.


Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-07 Thread Egmont Koblinger via Unicode
Hi,

On Thu, Feb 7, 2019 at 3:27 PM Eli Zaretskii  wrote:

> It uses a regular expression, see term-prompt-regexp.

So, it's not automatic, needs user interaction, and for that reason,
may not have worked for me. (I have other weird things in my prompt,
like 256-color sequences that Emacs didn't recognize, perhaps this
made the regexp matching fail. Nevermind.)

> > Whatever it does to know where the prompt is, can it be made into a
> > standard, cross-terminal feature?
>
> Not sure.  It's a kind of heuristic, which is why the regexp is
> customizable on user level, so that users could adapt it to their
> needs, should that be necessary.

iTerm2 has a "shell integration" where the prompt contains explicit
markers so that no heuristics or user configuration is needed from the
terminal. We're trying to somewhat standardize it at
https://gitlab.freedesktop.org/terminal-wg/specifications/issues/4 and
get more terminals support it. Not sure where this attempt will take
us, we'll see.
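
To make the "explicit markers" idea concrete, here is a small Python sketch. It assumes markers modelled on the OSC 133 "A" (prompt start) and "B" (prompt end) sequences associated with iTerm2's shell integration; the exact byte sequences, the BEL terminator, and the helper names are assumptions of this sketch, so check the terminal's own documentation before relying on them.

```python
import re
from typing import List

ESC = "\x1b"
BEL = "\x07"
PROMPT_START = f"{ESC}]133;A{BEL}"  # assumed "prompt begins" marker
PROMPT_END = f"{ESC}]133;B{BEL}"    # assumed "prompt ends, input begins" marker


def mark_prompt(prompt: str) -> str:
    """What a shell would emit: the prompt wrapped in explicit markers."""
    return f"{PROMPT_START}{prompt}{PROMPT_END}"


# Non-greedy match of whatever sits between a start and an end marker.
_MARKED = re.compile(
    re.escape(PROMPT_START) + r"(.*?)" + re.escape(PROMPT_END), re.DOTALL
)


def find_prompts(stream: str) -> List[str]:
    """What a terminal could do: locate prompts without any heuristics."""
    return _MARKED.findall(stream)
```

Because the shell marks the prompt itself, even an unusual prompt such as `2p ` is found reliably, and ordinary output lines can never be mistaken for a prompt.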

> In what version of Emacs is that?  In the latest version 26 I have
> here, the tutorial displays with most paragraphs in RTL direction.

25.2 here, it might have obviously changed for a newer version, glad to hear it.

My distro will upgrade in about 2 months. Since I'm not an Emacs user
myself, I hope you don't mind if I don't make extra rounds in
upgrading now to verify this.


cheers,
egmont


Re: Bidi paragraph direction in terminal emulators

2019-02-07 Thread Egmont Koblinger via Unicode
On Thu, Feb 7, 2019 at 3:14 PM Eli Zaretskii  wrote:

> Not a bug, a feature.  Emacs doesn't remove the bidi controls from
> display (that's another deviation allowed by the UBA, see section
> 5.2).  On GUI displays, these controls are displayed as thin 1-pixel
> spaces, but on text-mode terminals they are shown as space.

Thanks for the clarification!

> Why?  As I said, the tutorial was written in part to demonstrate the
> UBA implementation, including the dynamic detection of base paragraph
> direction, and this is exactly one example of how it works in
> practice.

Fair enough, then.

> > To recap: The _paragraph direction_ is determined in Emacs for
> > emptyline-delimited segments of data, which I honestly find a great
> > thing, and would love to do in terminals too, alas at this point it's
> > blocked by some really nontrivial technical issues. But once you have
> > decided on a direction, each _line_ within that data is passed
> > separately to the BiDi algorithm to get reshuffled
>
> Yes and no.  You could keep your mental model if you like, but
> actually the UBA explicitly says that each line is to be reordered for
> display separately, see section 3.4 of UAX#9.

The very first step of the BiDi algorithm is to split at "paragraphs",
however that's defined, and then do the rest for each paragraph.

For one particular paragraph, there's a lot going on: determining
embedded levels and such. At one point, at the very beginning of 3.4,
a caller may split a paragraph into lines. Then the rest (actual
reordering) happens on lines.

This is _not_ the same as splitting into lines upfront (that is,
define UBA's "paragraphs" as the text file's "lines"), and then
determining embedded levels and reshuffling on these smaller units.

Emacs does the latter, and so does my specification.

I believe it's not my mental model that's weird, but your use of
terminology that doesn't match UBA's that confused me. It's pretty
confusing and obviously hard to use the proper terminology, since
Emacs's definition and the user-perceived notion of a "paragraph"
differs from what becomes a "paragraph" according to UBA's definition.

Both in Emacs and in my spec, a "line" of the text file maps to a
"paragraph" according to UBA's phrasing. (Except when determining the
paragraph direction, where Emacs uses its own human-perceived
emptyline-separated paragraph, rather than lines. Which is a nice
thing to do.)
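
The behaviour described above can be sketched in Python as follows. This is a simplified illustration, not Emacs's or any terminal's actual code: it picks the direction from the first strong character of each emptyline-separated block (ignoring the isolate handling of the full UBA rules), and then tags each line of the block with that direction, since each line would be handed to the BiDi algorithm separately. The function names and the `"LTR"`/`"RTL"` labels are made up for the example.

```python
import unicodedata
from typing import List, Optional, Tuple


def first_strong_direction(text: str) -> Optional[str]:
    """Return "LTR" or "RTL" from the first strong character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return None


def line_directions(text: str, default: str = "LTR") -> List[Tuple[str, str]]:
    """Pair each non-empty line with the direction of its enclosing block.

    Blocks are runs of lines separated by empty lines. The direction is
    detected once per block; every line in the block inherits it.
    """
    result: List[Tuple[str, str]] = []
    block: List[str] = []

    def flush() -> None:
        if block:
            direction = first_strong_direction("\n".join(block)) or default
            result.extend((line, direction) for line in block)
            block.clear()

    for line in text.split("\n"):
        if line.strip() == "":
            flush()  # an empty line closes the current block
        else:
            block.append(line)
    flush()
    return result
```

Note how a purely Latin line inside a block that starts with a Hebrew letter still gets the RTL direction: the direction comes from the block, while reordering would happen per line.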

Anyways, I'm glad it turned out we're on the same page, it's just the
terminology that's truly confusing.


cheers,
egmont


Re: Two more ellispis-type interpunctations: ?.. and !..

2019-02-07 Thread Serik Serikbay via Unicode
The Khakass language is very close to Kyrgyz.




On Thu, Feb 7, 2019 at 8:54 PM "Jörg Knappen" via Unicode <
unicode@unicode.org> wrote:

> While working on a corpus of Kyrgyz language, a Turkic language written in
> the Cyrillic script,
> I encountered two ellipsis-type interpunctations, namely ?.. and !..
>
> Note that this is not (yet) a proposal to encode them as single Unicode
> characters although I would definitely
> use such characters when available because they make the text processing
> tool chain much simpler and more
> robust. It is a survey question:
>
> Have you encountered ?.. or !.. in languages other than Kyrgyz?
>
> --Jörg Knappen
>


Two more ellispis-type interpunctations: ?.. and !..

2019-02-07 Thread Jörg Knappen
While working on a corpus of Kyrgyz language, a Turkic language written in the Cyrillic script,
I encountered two ellipsis-type interpunctations, namely ?.. and !..

Note that this is not (yet) a proposal to encode them as single Unicode characters although I would definitely
use such characters when available because they make the text processing tool chain much simpler and more
robust. It is a survey question:

Have you encountered ?.. or !.. in languages other than Kyrgyz?

--Jörg Knappen


Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-07 Thread Eli Zaretskii via Unicode
> Date: Thu, 7 Feb 2019 00:45:55 +0100
> Cc: unicode Unicode Discussion 
> From: Egmont Koblinger via Unicode 
> 
> > Not necessarily.  One could allow the first strong character in the
> > prompt to determine the paragraph directions
> 
> How does Emacs know what's a prompt? How can it tell it from the
> previous and next command's output?

It uses a regular expression, see term-prompt-regexp.

> Whatever it does to know where the prompt is, can it be made into a
> standard, cross-terminal feature?

Not sure.  It's a kind of heuristic, which is why the regexp is
customizable on user level, so that users could adapt it to their
needs, should that be necessary.
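
Why this is a heuristic can be seen from a small Python sketch. The pattern below is a hypothetical example of a prompt regexp, not a claim about the default value of term-prompt-regexp: it accepts any line that starts with some text followed by one of `$ # % >` and a space.

```python
import re

# Hypothetical prompt pattern: optional text without prompt characters,
# then one of the usual prompt terminators, then a space.
PROMPT_RE = re.compile(r"^[^#$%>\n]*[#$%>] ")


def looks_like_prompt(line: str) -> bool:
    """Heuristic: does the line start with something prompt-shaped?"""
    return PROMPT_RE.match(line) is not None


for line in ["user@host:~$ ls", "2p ls", "50% done", "> quoted mail text"]:
    print(repr(line), looks_like_prompt(line))
```

With this pattern the conventional prompt is recognized, a prompt such as `2p ` is missed, and output lines like `50% done` are false positives, which is why such a regexp needs to be customizable.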

> > That's what the Emacs
> > terminal (invoked by M-x term; top level definition in term.el) does.
> 
> I tried it. Executed my default shell, and inside that, a "cat
> TUTORIAL.he". All the paragraphs are rendered as LTR ones,
> left-aligned. Not the way the file is opened in Emacs.

In what version of Emacs is that?  In the latest version 26 I have
here, the tutorial displays with most paragraphs in RTL direction.

> If you claim Emacs's built-in terminal emulator supports BiDi, I'm
> kindly asking you to present a documentation of its behavior, in
> similar spirit to my BiDi proposal.

The Emacs terminal emulator displays text as any other text in any
other Emacs buffer, so it supports the same bidi reordering as
elsewhere.  You could make it emulate other terminals by setting the
variable bidi-paragraph-direction to either left-to-right or
right-to-left, then all the paragraphs will have the base direction
you specify.  But the default value of this variable in term buffers
is nil, which invokes dynamic determination of base paragraph
direction.
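
As a rough illustration of dynamic determination of the base direction,
here is a sketch of the first-strong-character rule (UBA rules P2 and P3)
in Python. It is a simplification: it does not skip characters inside
isolates, and the name base_direction and its default argument are
assumptions made for this example, not anything taken from Emacs.

```python
import unicodedata


def base_direction(paragraph: str, default: str = "L") -> str:
    """Return 'L' or 'R' from the first strong character of the paragraph.

    Simplified sketch: text inside isolates is not skipped as full P2 requires.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    # No strong character found: fall back to the caller's default.
    return default


print(base_direction("\u05e9\u05dc\u05d5\u05dd world"))   # Hebrew first
print(base_direction("$ \u05e9\u05dc\u05d5\u05dd"))       # '$' and space are not strong
print(base_direction("user$ \u05e9\u05dc\u05d5\u05dd"))   # Latin prompt comes first
```

The last case shows the effect discussed in this thread: when a prompt with
a Latin letter belongs to the same paragraph as the output, it decides the
direction for all of it.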


Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Eli Zaretskii via Unicode
> Date: Wed, 6 Feb 2019 23:32:43 +
> From: Richard Wordingham via Unicode 
> 
> > You define paragraphs as emptyline-separated blocks on which you
> > perform autodetection of the paragraph direction. This is great! As
> > I've mentioned, I'd love to have such a mode in terminals, but it's
> > subject to underlying improvements, like knowing when a prompt starts
> > and ends, because prompts also have to be paragraph delimiters.
> 
> Not necessarily.  One could allow the first strong character in the
> prompt to determine the paragraph directions.  That's what the Emacs
> terminal (invoked by M-x term; top level definition in term.el) does.

Emacs's built-in terminal emulator does that only because no one
bothered to do something about this behavior.  I personally don't
consider this the correct behavior (but then I don't use M-x term in
Emacs except for testing).  Emacs does know where the prompt is, so it
could implement the rule that whatever follows the prompt starts a new
paragraph.


Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
On Thu, 7 Feb 2019 at 13:29, Egmont Koblinger wrote:

> Hi Philippe,
>
> > There's some rules for correct display including with Bidi:
>
> In what sense are these "rules"? Where are these written, in what kind
> of specification or existing practice?
>

"Rules" are not formally written; they are just a sense of best practice.
Bidi plays very badly on terminals, even enhanced terminals like VT-* or
ANSI that expose capabilities. Most of the time these capabilities are not
even accessible to the application, and later changes to the terminal's
properties (notably its display size) can never be taken into account: it
is too late, the output has already been generated, and all the terminal
can do is play with what is in its history buffers.

Even on dual-channel protocols (input and output), terminal protocols do
not synchronize the input with the output, and these asynchronous channels
ignore the transmission time between the terminal and the aware
application. So the terminal protocol must include a function that allows
flushing and redrawing the screen completely, but this requires long
delays. With a common 9.6 kbps serial link, refreshing a typical 80x25
screen takes roughly two seconds, which is much longer than typical user
input, so full-screen refresh does not work for data input and editing.
Terminals therefore implement the echo of user input themselves, ignoring
how and when the receiving application will handle the input, and also
ignoring whether the application is already sending output to the terminal.
It's hard or impossible to synchronize this, and local echo on the
terminal causes havoc.
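
A back-of-the-envelope check of the refresh time, assuming 8N1 framing
(10 bits on the wire per character) and no escape sequences at all. The
framing is an assumption; real redraws also send cursor and attribute
sequences, so this is a lower bound.

```python
BAUD = 9600          # bits per second on the serial line
BITS_PER_CHAR = 10   # assumption: 1 start bit + 8 data bits + 1 stop bit (8N1)
COLS, ROWS = 80, 25  # screen size mentioned above

chars = COLS * ROWS
seconds = chars * BITS_PER_CHAR / BAUD
print(f"{chars} cells -> {seconds:.2f} s for one full redraw")
```

Under these assumptions a full redraw costs about two seconds, far more
than the interval between keystrokes.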
I've not seen any way for a terminal to handle all these constraints. So
the only way forward is to support only basic plain-text documents,
formatted reasonably, with layout "hints" inserted in the format of the
output so that the terminal can make reasonable guesses and adapt.

But the concept of "line" or "paragraph" in a terminal protocol is
extremely fuzzy. It's then very difficult to take into account the
additional Bidi constraints, as it's impossible to reconcile BOTH the
logical ordering (what is encoded in the transmitted data or kept in
history buffers) and the visual ordering. That's why there are terminal
protocols that absolutely don't want to play with the logical ordering and
require all their data to be transmitted in visual order (in which case
there's no bidi handling at all). Terminals will then attempt to reconcile
the visual line delimitations (in the transmitted data) with the local-only
capabilities of the rendered frame. Many terminals will also not allow
changing the display width, will not allow changing the display cell size,
will force constraints on cell sizes and fonts, and then won't be able to
correctly output many Asian scripts.

In fact most terminal protocols are very defective and were never designed
to handle Bidi input, or Asian scripts with complex clusters and the
variable fonts that are needed for them (even CJK scripts, which use a mix
of "half-width" and "full-width" characters).

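To make the half-width/full-width point concrete, here is a sketch of how
a character-grid terminal might count cells, in the spirit of wcwidth().
It is an approximation built on Python's unicodedata module, not the
behavior of any particular terminal or C library; cell_width and
string_width are names invented for this example, and control characters
are not treated specially.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate number of terminal cells occupied by one code point."""
    # Combining marks and format characters (bidi controls included)
    # are treated as taking no cell of their own.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def string_width(text: str) -> int:
    """Sum of the cell widths of all code points in the string."""
    return sum(cell_width(ch) for ch in text)


print(string_width("abc"))                  # three narrow cells
print(string_width("\u65e5\u672c\u8a9e"))   # three wide CJK characters
print(string_width("e\u0301"))              # base letter plus combining accent
```

Scripts whose clusters do not fit a whole number of cells are exactly where
this model stops working.
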
> - Separate paragraphs that need a different default Bidi by double
> newlines (to force a hard break)
>
> There is currently no terminal emulator I'm aware of that uses empty
> lines as boundaries of BiDi treatment.
>

These are hints in the absence of something else, and they play a role
when the terminal display width is unpredictable by the application making
the output and having no access to any return input channel.
Take the example of terminal emulators in resizable windows: the display
width is undefined, there's no document level and no buffering, scrolling
text will flush the output partially, and history is limited. A terminal
emulator then needs hints about where paragraphs are delimited, and most
often has no other distinction available, even in its limited history,
that would allow distinguishing the 3 main kinds of line breaks.


> While my recommendation uses a one smaller unit (logical lines), and I
>

And here your unit (logical lines) is not even defined in the terminal
protocol and not known to the emitting application, which has no input
about the final output terminal's properties. So the terminal must make
guesses. As it can insert additional line breaks itself, and scroll out
some portion of the text, there's no way to delimit the effect of "bidi
controls". The basic requirement for correctly handling bidi controls is to
make sure that paragraph delimitations are known and stable. If additional
breaks can occur anywhere on what you think is a "logical line" but which
is different for the emitting application (or for a static text document
which is output "as is" without any change to reformat it), these bidi
controls just make things worse, and it becomes impossible to make
reasonable guesses about paragraph delimitations in the terminal. The
result becomes unpredictable and most often will not even make any sense
as the terminal uses visual ordering

Re: Bidi paragraph direction in terminal emulators

2019-02-07 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger 
> Date: Wed, 6 Feb 2019 22:01:59 +0100
> Cc: Richard Wordingham , unicode@unicode.org
> 
> - Emacs running in a terminal shows an underscore wherever there's a
> BiDi control in the source file – while the graphical one doesn't.
> This looks like a simple bug to me, right?

Not a bug, a feature.  Emacs doesn't remove the bidi controls from
display (that's another deviation allowed by the UBA, see section
5.2).  On GUI displays, these controls are displayed as thin 1-pixel
spaces, but on text-mode terminals they are shown as space.  The
underscore you see is a special typeface used to indicate that this is
not really a space.  (This is the default; Emacs being Emacs, it lets
you customize how these characters are displayed, and in particular
lets you not display them at all.)

> - Line 1007, the copyright line of this file uses visual indentation,
> and Emacs detects LTR paragraph for that line. I think it should
> rather use BiDi controls to have an overall RTL paragraph direction
> detected, and within that BiDi controls to force LTR for the text.

Why?  As I said, the tutorial was written in part to demonstrate the
UBA implementation, including the dynamic detection of base paragraph
direction, and this is exactly one example of how it works in
practice.

> To recap: The _paragraph direction_ is determined in Emacs for
> emptyline-delimited segments of data, which I honestly find a great
> thing, and would love to do in terminals too, alas at this point it's
> blocked by some really nontrivial technical issues. But once you have
> decided on a direction, each _line_ within that data is passed
> separately to the BiDi algorithm to get reshuffled

Yes and no.  You could keep your mental model if you like, but
actually the UBA explicitly says that each line is to be reordered for
display separately, see section 3.4 of UAX#9.
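
A small sketch of what per-line reordering means, implementing only rule
L2 of UAX#9 and assuming the embedding levels have already been resolved
by the earlier rules. The function name reorder_line is made up for this
example, and uppercase letters stand in for RTL characters, as in the
examples of UAX#9.

```python
def reorder_line(chars: str, levels: list[int]) -> str:
    """Apply UBA rule L2 to one line whose embedding levels are already resolved."""
    out = list(chars)
    lv = list(levels)
    odd_levels = [level for level in lv if level % 2 == 1]
    if not odd_levels:
        return chars  # purely LTR line: nothing to reverse
    # From the highest level down to the lowest odd level, reverse every
    # contiguous run of characters at that level or higher.
    for level in range(max(lv), min(odd_levels) - 1, -1):
        i = 0
        while i < len(out):
            if lv[i] >= level:
                j = i
                while j < len(out) and lv[j] >= level:
                    j += 1
                out[i:j] = out[i:j][::-1]
                lv[i:j] = lv[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(out)


print(reorder_line("abcDEF", [0, 0, 0, 1, 1, 1]))
print(reorder_line("ABC12", [1, 1, 1, 2, 2]))
# An RTL paragraph "ABCDEF" wrapped into two lines of three cells each:
print([reorder_line(line, [1, 1, 1]) for line in ("ABC", "DEF")])
```

In the last case each line is reversed on its own, so the first line still
carries the start of the text. Reordering the whole paragraph first and
wrapping afterwards would instead put "FED" on the first line.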

> Let's make a thought experiment. Let's assume that for running the
> BiDi algorithm, we'd still stick to the emptyline-delimited paragraph
> definition. This is not what you do, this is not what I do, but I
> misunderstood that this is what you did, and I also thought this was a
> good idea as a potential extension for the BiDi specs – I no longer
> think so. This definition is truly problematic, as I'll show below.

Which is why it is not what the UBA says one should do.


Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Egmont Koblinger via Unicode
Hi Philippe,

> There's some rules for correct display including with Bidi:

In what sense are these "rules"? Where are these written, in what kind
of specification or existing practice?

> - Separate paragraphs that need a different default Bidi by double newlines 
> (to force a hard break)

There is currently no terminal emulator I'm aware of that uses empty
lines as boundaries of BiDi treatment.

While my recommendation uses a unit one step smaller (logical lines), and
I understand as per Eli's request that it would be desirable to go with
emptyline-delimited boundaries, what in fact all the current
self-proclaimed BiDi-aware terminal emulators that I came across do is
use a unit two steps smaller than yours: they do BiDi on physical
lines of the terminal, no matter how a logical line of the output had
to wrap into physical ones because it didn't fit in the width. (It's a
terrible behavior.)

The current behavior of terminal emulators is very far from what you describe.

> - use a single newline on continuation

Continuation of what exactly?

But let's take a step back: Should the output be pre-formatted by some
means, or do we rely on the terminal emulator wrapping overlong lines?
(If pre-formatted then for what width? 80 columns, so that I waste
precious real estate if my terminal is wider? Or is it a requirement
for any app that produces output to implement a decent dynamic
wrapping engine for nice formatting according to the actual width?)

There's precedent for both of these different approaches. I don't
think it's feasible to pick one, and claim that the other approach is
discouraged/invalid/whatever.

> - if technical items are untranslatable, make sure they are at the begining 
> of lines and indented by some leading spaces, before translated ones.

I firmly disagree. There shouldn't be any restriction on how a
translator wishes to translate a sentence. The computer world has to
adapt to the requirements of human languages, not the other way
around!

> - Don't use any Bidi control !

Why not? They do exist for a reason, for the very reason that any
logical translation, which a translator might want to write (see my
previous point) is presentable in a visually correct way. Use them for
that, whenever needed.


cheers,
egmont


Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Richard Wordingham via Unicode
On Thu, 7 Feb 2019 00:45:55 +0100
Egmont Koblinger via Unicode  wrote:

> Hi Richard,
> 
> > Not necessarily.  One could allow the first strong character in the
> > prompt to determine the paragraph directions  
> 
> How does Emacs know what's a prompt? How can it tell it from the
> previous and next command's output?

I don't believe the Emacs terminal does either.  What's special about
the prompt is that it starts a line, so most paragraphs start with a
prompt.  Not all prompts contain a strong character.  To let a file's
contents control directionality, instead of issuing the command 'cat
file1' one would have to issue a shell command '(echo; cat file1)' or
similar to terminate the paragraph containing the prompt.  The 'echo'
inserts an empty line.

> > That's what the Emacs
> > terminal (invoked by M-x term; top level definition in term.el)
> > does.  
> 
> I tried it. Executed my default shell, and inside that, a "cat
> TUTORIAL.he". All the paragraphs are rendered as LTR ones,
> left-aligned. Not the way the file is opened in Emacs.

See above.  I don't know what your shell is.

> If you claim Emacs's built-in terminal emulator supports BiDi, I'm
> kindly asking you to present a documentation of its behavior, in
> similar spirit to my BiDi proposal.

I've a feeling it has emergent behaviour, and may require a lot of
experimentation to elucidate.

> Does this logic also apply to single newline characters? If not, why
> not, what's the conceptual difference? If it does, why do text files
> end in a newline?

I don't like the convention that removing the newline from the end of a
non-empty line changes it into a binary file.  The short answer is that
some editors allow a text file not to have a final newline; such files
are not handled well in the Unix environment.

Some things are just untidy messes.  Compare C, where a semicolon
*terminates* statements, but some are terminated by '}', and a
semicolon *separates* the expression within the control part of a for
statement, and a comma *separates* the constant definitions in an enum
declaration - for a long time, a trailing comma inside the braces was
illegal.

Richard.