from:"Egmont Koblinger via Unicode"

Re: Numeric group separators and Bidi

2019-07-10 Thread Egmont Koblinger via Unicode

On Wed, Jul 10, 2019 at 1:21 AM Philippe Verdy via Unicode wrote:
>
>> Well my first feeling was that U+202F should work all the time, but I found 
>> cases where this is not always the case. So this must be bugs in those 
>> renderers.
>
> I think we can attribute these bugs

What bugs?

I asked for an example, you haven't provided one, yet you blame others
without even considering that you might be doing or expecting
something wrong. So I'm asking again. Please show us an example along
the lines of: "I'm using the FooBar software, version 1.2.3, this and
that particular field. I enter some data, the hexdump of that data is
included here. I expect it to render as 123, instead it renders as
321."
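
A report of the kind requested above could include a dump like the one this minimal Python sketch produces. The helper name `describe` and the sample string are illustrative assumptions, not taken from any actual bug report.

```python
import unicodedata

def describe(text):
    """Return one line per code point: U+XXXX, bidi class, UTF-8 bytes, name."""
    lines = []
    for ch in text:
        utf8 = ch.encode("utf-8").hex(" ")
        name = unicodedata.name(ch, "<unnamed>")
        bidi = unicodedata.bidirectional(ch)
        lines.append(f"U+{ord(ch):04X}  {bidi:<3} {utf8:<11} {name}")
    return lines

if __name__ == "__main__":
    # Hypothetical sample: a Hebrew letter, a space, "1", U+202F, "2".
    for line in describe("\u05d0 1\u202f2"):
        print(line)
```

Pasting such a dump next to the expected and the actual rendering makes it clear which characters were really entered, independent of how the mail client displays them.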

I don't find it a nice attitude to blame others without having a
thorough understanding of the situation, without having firm reasons
to suspect the problem elsewhere rather than in your expectations.

And if a renderer is incorrect (which is not impossible, but maybe a
bit early to claim), you just have to ditch it and replace it with a
correct one. Or, well, maybe your goal is to locate a set of faulty
renderers, locate and understand their exact bugs, and find a
workaround, i.e. a Unicode representation of numbers in RTL context
with narrow spaces which is immune to those bugs? Not sure if any of
us here are eager to help with that, I'm not, sorry. Not sure if it's
possible at all (if there are really such bugs), probably not, given
your further constraints such as not using BiDi control chars.

egmont

Re: Numeric group separators and Bidi

2019-07-09 Thread Egmont Koblinger via Unicode

On Tue, Jul 9, 2019 at 10:43 PM Philippe Verdy  wrote:
>
> Well my first feeling was that U+202F should work all the time, but I found 
> cases where this is not always the case. So this must be bugs in those 
> renderers.

Could you share some concrete examples?

Re: Numeric group separators and Bidi

2019-07-09 Thread Egmont Koblinger via Unicode

Hi Philippe,

What do you mean U+202F doesn't work for you?

Whereas the logical string "hebrew 123 456 hebrew" indeed shows
the number incorrectly as "456 123", it's not the case with U+202F
instead of space, then the number shows up as "123 456" as expected.

I think you need to pick a character whose BiDi class is "Common
Number Separator", see e.g.
https://www.compart.com/en/unicode/bidiclass/CS for a list of such
characters including U+00A0 no-break space and U+202F narrow no-break
space. This suggests to me that U+202F is a correct choice if you need
the look of a narrow space.
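
The bidi classes of the spaces mentioned in this thread can be checked against the Unicode Character Database, for instance with Python's standard `unicodedata` module. This is only a quick lookup sketch; the dictionary and function names are made up for illustration.

```python
import unicodedata

# Candidate group separators discussed in this thread.
CANDIDATES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "FIGURE SPACE": "\u2007",
    "THIN SPACE": "\u2009",
    "NARROW NO-BREAK SPACE": "\u202f",
}

def bidi_classes():
    """Map each candidate's name to its Bidi_Class short alias (e.g. 'CS', 'WS')."""
    return {name: unicodedata.bidirectional(ch) for name, ch in CANDIDATES.items()}

if __name__ == "__main__":
    for name, cls in bidi_classes().items():
        print(f"{cls:<3} {name}")
```

The two no-break spaces report CS, while the plain, figure and thin spaces report WS, which matches the difference in behavior described above.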

Another possibility is to embed the number in a LRI...PDI block, as
e.g. https://unicode.org/cldr/utility/bidic.jsp does with the "1–3%"
fragment of its default example.
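
As a rough illustration of the isolate approach, the number can be wrapped in U+2066 LEFT-TO-RIGHT ISOLATE and U+2069 POP DIRECTIONAL ISOLATE before it is placed into RTL text. The helper name `isolate_ltr` is an assumption of this sketch, and whether the result displays correctly still depends on the renderer supporting isolates.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate_ltr(fragment):
    """Wrap a fragment so it is laid out LTR, isolated from the surrounding text."""
    return f"{LRI}{fragment}{PDI}"

if __name__ == "__main__":
    # Show the code points rather than relying on the display of this mail.
    print([f"U+{ord(c):04X}" for c in isolate_ltr("123 456")])
```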

cheers,
egmont

On Tue, Jul 9, 2019 at 9:01 PM Philippe Verdy via Unicode wrote:
>
> Is there a narrow space usable as a numeric group separator, and that also 
> has the same bidi property as digits (i.e. neutral outside the span of digits 
> and separators, but inheriting the implied directionality of the previous 
> digit) ?
>
> I can't find a way to use narrow spaces instead of punctuation signs (dot or 
> comma) for example in Arabic/Hebrew, for example to present tabular numeric 
> data in a really language-neutral way. In Arabic/Hebrew we need to use 
> punctuations as group separators because spaces don't work (not even the 
> narrow non-breaking space U+202F used in French and recommended in ISO), but 
> then these punctuation separators are interpreted differently (notably 
> between French and English where the interpretation dot and comma are swapped)
>
> Note that:
> - the "figure space" is not suitable (as it has the same width as digits and 
> is used as a "filler" in tabular data; but it also does not have the correct 
> bidi behavior, as it does not have the same bidi properties as digits).
> - the "thin space" is not suitable (it is breakable)
> - the "narrow non-breaking space" U+202F (used in French and currently in 
> ISO) is not suitable, or may be I'm wrong and its presence is still neutral 
> between groups of digits where it inherits the properties of the previous 
> digit, but still does not enforces the bidi direction of the whole span of 
> digits.
>
> Can you point me if U+202F is really suitable ? I made some tests with 
> various text renderers, and some of them "break" the group of digits by 
> reordering these groups, changing completely the rendered value (units become 
> thousands or more, and thousands become units...). But may be these are bugs 
> in renderers.
>

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-18 Thread Egmont Koblinger via Unicode

On Sun, Feb 17, 2019 at 1:59 PM Philippe Verdy  wrote:

> Resist this idea, I've not been impolite.

I didn't say a word about you being impolite. I said I might be
impolite for not wishing to continue this discussion in that
direction.

> I just want to show you that terminals are legacy environments

You might have missed the thread's opening mail where I mentioned that
I've been developing a terminal emulator for five years. So I'm not
sure what you exactly want to show me about what a legacy environment
it is; I think I perfectly know it.

> that are far behind what is needed for proper internationalization

For many languages (or should I say scripts) internationalization is
pretty well solved in terminals. For others, requiring LTR complex
rendering, so-so. For RTL scripts it's a straight disaster, an
application can't even count on the letters of a word showing up in
the expected order, no matter what it does.

My work fixes the latter only, within(!) the limitations of this
legacy environment. I don't find it feasible to get rid of this legacy
(the concept of strict grid), and I find it a waste of time to ponder
about it.

Not sure why after about 200 mails on the topic, I still have a hard
time getting this message through. Seems to me that folks here on the
Unicode list want everything to be perfect for all the scripts at once
and not compromise to the slightest bit; and don't really appreciate
work that only offers partial improvement due to a special context's
constraints. This is something I didn't expect when I posted to this
list.

At this point I think I've gathered all the actionable positive
feedback I could (two issues: one is that shaping needs to be done
differently, and the other one is that the paragraph direction should
be detected on larger chunks of data (at least optionally) – thanks
again for them, I'll rework my spec accordingly). For all the rest,
irrelevant and hopeless stuff, like switching to proportional fonts,
IMO it's high time we let this thread end here.

cheers,
egmont

Re: Bidi paragraph direction in terminal emulators

2019-02-13 Thread Egmont Koblinger via Unicode

On Tue, Feb 12, 2019 at 9:35 PM Richard Wordingham via Unicode wrote:

> Bash already seems to handle proportional fonts quite well when run
> under Emacs 'M-x shell',

Having never used bash inside Emacs's shell, here's my experience
after about a minute of trying it:

Cursor keys allow you to walk back to the prompt, backspace allows you to
delete the prompt, typing letters lets you modify the prompt... Not
something that I consider a sensible behavior.

If I do so, I have no idea what the executed command will be. Coloring
gives some clue, but isn't always reliable. My prompt is blue, the
text I type after that is black. I type one letter and then press
Ctrl-T to transpose the last two letters (the trailing space of my
prompt, and the newly typed letter). The newly typed letter is black.
I press Enter, this one-letter command isn't executed, and becomes
blue.

I feel magnitudes safer in standard bash where I know it doesn't allow
me to walk back to the prompt, only allows me to edit whatever I'm
trying to execute.

I have not studied how this behavior is implemented, but as per [1] as
well as the behavior I experience, it seems that a lot of bash's
behavior wrt. line editing is moved to Emacs itself. Pretty much none
of my preferred shortcuts work as they do in native bash, something
I'm not happy about either.

I've no idea how this (external editing) would be expected to be the
generic behavior when there's no Emacs (no external editor) in the
game, plus a whole bunch of other utilities are expected to run (ones
that fail big time in Emacs's M-x shell, or even refuse to start up).

[1] https://www.gnu.org/software/emacs/manual/html_node/emacs/Shell-Mode.html

Re: Bidi paragraph direction in terminal emulators

2019-02-12 Thread Egmont Koblinger via Unicode

Hi Elias,

> For all the willingness to come up with ways to modernise the terminal, 
> you've only spoken about trying to showhorn rtl text in to the vt102 basic 
> terminal.

Yes, addressing BiDi was the exact thing that I did now. What's wrong with that?

I can't address all the imperfections at once. If you take a look at
VTE's changelog, you'll see that I've done a lot more than this, and
chances are this won't be my last improvement either.

> What I mean is that f you're willing to go as far as introducing new escape 
> codes to allow applications to better control the behaviour of this one 
> feature, why do you stop there? Why still limit yourself to the bonds of 
> vt102?

Did I say I'll stop here? No, I presented one step, without saying
anything about what might be the next one I tackle. (Okay, I drafted
out some ideas for continuing this work, and I said things about what
will definitely _not_ be the next step, as far as I'm concerned.)

> Once you take that first step towards the new control codes, why not simply 
> come up with a new scheme? Why not let me do:
>
> TERM=newfancything
>
> And then I'd have a system that supports everything I need: variable with 
> fonts, proper rtl text, pixel-precise character positioning, all the colours, 
> inline graphics, etc.

Because this would create a brand new world where practically every
application has to be heavily adjusted, if not built up from scratch
(e.g. for ncurses, I'd expect that a new replacement would have to be
designed and created).

Because this is not solely an engineering kind of task, but rather
something that would need buy-in from a critical set of people (the
maintainers of all these libs and apps, and the other popular
terminals), which I find unlikely to get, given that for most of these
apps the current platform is good enough, and something new would add
a significant amount of extra burden for marginal benefits.

Because, even if everyone supported the idea, the required amount of
design and implementation work would be magnitudes bigger than for
BiDi.

Because I'm doing one thing at a time. And honestly, just because I
came here to announce my work that addresses _one_ thing, I really
don't find it a fair question to ask why I didn't suddenly address
magnitudes more than that.

Because I'm doing this as a hobby project, not as a paid job. If
someone offers me a job to do this, we can discuss it.

> There is nothing magic about the grid of cells, and once you introduce new 
> escape sequences, you might as well truly modernise the terminal.

The magic about the grid of cells is all the software that was built
up with this assumption during the last couple of decades.


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators

2019-02-12 Thread Egmont Koblinger via Unicode

Hi Philippe,

> The monospace restriction is a strong limitator: but then I don't see why a 
> "terminal" could not handle fonts with variable metrics, and why it must be 
> modeled only as a regular grid of rectangular cells (all of equal size) 
> containing only one "character" (or cluster?).

Because this is what a "terminal" currently is, this is one of the
basic assumptions around which gazillions of libraries and
applications were built up.

Just one example: A utility might query the width, let's say it's 80
columns. Then it can print either 81 "i"s, or 81 "w"s, and in both
cases it can be sure that the last one will be aligned exactly below
the first one.
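
The assumption in this example boils down to simple arithmetic on the grid. A minimal sketch, assuming single-width characters only and a made-up helper name `cell_of`:

```python
def cell_of(index, columns):
    """Map the index-th single-width character (0-based) to (row, column)."""
    return divmod(index, columns)

if __name__ == "__main__":
    # The 81st character (index 80) on an 80-column terminal wraps to
    # column 0 of the next row, directly below the first one, regardless
    # of whether the glyph is an "i" or a "w".
    print(cell_of(0, 80), cell_of(80, 80))
```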

You can sure change this. But then you'll have to heavily adjust the
behavior of all the screen drawing libraries and all the applications
that use these libraries or do their own screen handling. It's out of
the scope of my work to do anything like this. If you feel like it, I
encourage you to go ahead, put your work in it, and present a proof of
concept.

> So using controls, you would try to mimic again what HTML already provides 
> you for free (and without complex specifications and redevelopment).

Show me that "without complex specifications and redevelopment"
because all I see is the need to heavily rewrite plenty of libs and
tools that were created and continuously developed during the last few
decades. I don't really see this approach as feasible. Feel free to prove
me wrong by presenting software that works on top of the redefined
terminal emulator concept, at least on a proof of concept level. For
starters, I'd love to see a shell with interactive line editing (like
bash, zsh), and one application that uses vertical alignment heavily,
let's say "top" or anything similar, using proportional font in your
newly created world.


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators

2019-02-10 Thread Egmont Koblinger via Unicode

On Sun, Feb 10, 2019 at 2:57 AM Richard Wordingham via Unicode wrote:

> Which side do you align RTL cells on?

It's out of the scope of my docs.

In the current work-in-progress implementation I align them to the
left, but there's a TODO entry to align them to the right instead (or
maybe center all the glyphs).


e.

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode

Hi,

On Sun, Feb 10, 2019 at 12:52 AM Richard Wordingham via Unicode wrote:

> This is an example of where one needs a font designed for terminal
> emulators.

Definitely, this is another approach I forgot to mention in my mail,
rather than VTE switching to harfbuzz and figuring out all the issues.
This approach would also make them usable in every decent terminal
emulator at once, not just VTE.

Is there such a monospace font obeying wcwidth (that is: double wide
character for when a spacing mark is combined) for Devanagari? Is
there a monospace font for Arabic, for Syriac, etc.? (How much do
these questions make sense at all?)

If there are such fonts, I'd be happy to use them for testing.

e.

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode

On Sat, Feb 9, 2019 at 10:10 PM Asmus Freytag via Unicode wrote:

> > I hope though that all the scripts can be supported with more or less
> > compromises, e.g. like it would appear in a crossword. But maybe not.
>
> See other messages: not.

For the crossword analogy, I can see why it's not good. But this
doesn't mean there aren't any other ideas we could experiment with.

Or do you mean to say that because it can't be made perfect, there's
no point at all in partially improving? I don't think I agree with
that.

e.

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode

Hi Asmus,

On Sat, Feb 9, 2019 at 10:02 PM Asmus Freytag (c)  wrote:

> are you excluding CJK because of the difficulty handling a large
> repertoire with mechanical means?

No, I excluded CJK because they're pretty well solved in terminals,
and nowhere near along the lines of how they work with typewriters.

I should've probably said "letter based" scripts or whatever, I'm not
familiar with the exact terminologies.

> To force Hindi crosswords mode you need to segment the string into syllables,
> each having a variable number of characters [...]

Thanks a lot to you too for your detailed explanation!

> Are you defining as your goal to have some kind of "line by line" display that
> can survive any Unicode text thrown at it, or are you trying to extend a given
> design with rather specific limitations, so that it survives / can be used 
> with,
> just a few more scripts than European + CJK?

I don't have a clearly defined goal. I find fun in developing VTE (and
slightly improving other terminal emulators too by spreading ideas,
knowledge, comments etc.), addressing various kinds of goals, whatever
happens to come next. At this point it's BiDi, with a bit of
Devanagari improvement sneaking in the other day.

What is clear to me: I cannot redefine the basics of terminal
emulation. I can only add incremental improvements to whatever it
already is, and I have to make sure that the ecosystem built around it
during decades (all the screen handling libraries and applications)
doesn't break. I'm limited by these constraints.

> The discrepancies would be more like throwing random blank spaces in the
> middle of every word, writing letters out of order, or overprinting. So, more
> fundamental, not just "not perfect".

Let's take the Devanagari improvement of the other day. Until now,
there were plenty of dotted circles shown, and combining spacing marks
that should've been placed before the letter were placed after the
letter, before a placeholder dotted circle. Now they are displayed as
expected: the combining spacing mark shows up before the letter (if
it's of that kind), and no dotted circle. The letter + spacing marks
now shows up correctly. The entire word still doesn't, e.g. there are
often spaces between letters where the upper line connecting them
should be continuous.

Eventually HarfBuzz could help, but it's just not yet clear how
exactly. I cannot essentially change the underlying model of fixed
width cells. On top of this model, though, we can experiment with
various ideas about displaying. For example, if a word occupies 7
columns in the model, then HarfBuzz renders it, and the rendered
version occupies the width of 8.6 columns, maybe we can squeeze it
using a trivial linear transformation? I'm not sure, but maybe it's an
idea worth investigating. Won't look perfect, but probably will look
better than what we do currently. We already have column spacing
implemented, to pull the columns further apart from each other by a
fixed amount (mostly for accessibility purposes), maybe a user can use
this feature to make more room for a nicely rendered, non-squeezed
Devanagari text.
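
The squeezing idea can be written down as a one-line scale factor. This is only a sketch of the arithmetic, not something VTE implements; the function name and the choice to never stretch a run that is narrower than its cells are assumptions made here.

```python
def squeeze_factor(model_columns, shaped_columns):
    """Horizontal scale that fits a shaped run into its allotted cells.

    Assumption of this sketch: only shrink, never stretch.
    """
    if shaped_columns <= model_columns:
        return 1.0
    return model_columns / shaped_columns

if __name__ == "__main__":
    # A word occupying 7 cells in the model whose shaped width is 8.6 cells.
    print(round(squeeze_factor(7, 8.6), 3))
```

With the numbers from the example, the run would be drawn at roughly 81% of its natural width.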

> To give you an idea, here is an Arabi crossword. It uses the isolated shape of
> all letters and writes all words unconnected. That's two things that may be
> acceptable for a puzzle, but not for text output.

You can't get nice Arabic without first making sure the order of the
letters is the correct one, not reversed. :-) That's what my current
work is about.

As per Richard's feedback, I also see that shaping needs to be done
differently than I had thought. Mind you, my visual inspection of what
the non-preferred shaping approach gave to me vs. what a proper
HarfBuzz rendering gave (for Arabic) were extremely close to each
other, something that I'd probably consider "good enough" if I spoke
the language and were aware of the terminal's constraints. Well,
definitely a major improvement over what we have.

> You may begin to see the limitations and that they may well prevent you from
> reaching even your limited goal for speakers of at least three of the top ten 
> languages
> worldwide.

If the goal is to have perfect rendering without compromises: sure I
won't reach that. (It's not a goal for me. For perfect rendering,
users should get away from terminals.) If the goal is to have
something reasonably good, better than what we have currently, I can't
see why not.

cheers,
e.

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode

Hi Ken,

> There are crossword puzzles for Hindi (in the Devanagari script). Just
> do an image search for "Hindi crossword puzzle".

It's easy to confirm the existence by an image search, it's hard to
confirm the non-existence ;)

> The existence proof of techniques to cut up text into syllables that
> enable crossword puzzle building, is not the same as a determination
> that the script, ipso facto, would work in a terminal context without
> dealing with additional complex script issues.

Thanks a lot for your detailed explanation; this possibility indeed
didn't occur to me.


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode

On Sat, Feb 9, 2019 at 9:01 PM Eli Zaretskii  wrote:

> then what you say is that some scripts
> can never be supported by text terminals.

I'm not familiar at all with all the scripts and their requirements,
but yes, basically this is what I'm saying. I'm afraid some scripts
can never be perfectly supported by text terminals.

I hope though that all the scripts can be supported with more or less
compromises, e.g. like it would appear in a crossword. But maybe not.

Maybe one day some new, modern platform will arise with the goal of
replacing terminal emulators, which I wouldn't necessarily mind. It's
gonna take an enormous amount of work, though.

cheers,
egmont

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode

Hi Asmus,

> On quick reading this appears to be a strong argument why such emulators will
> never be able to be used for certain scripts. Effectively, the model 
> described works
> well with any scripts where characters are laid out (or can be laid out) in 
> fixed
> width cells that are linearly adjacent.

I'm wondering if you happen to know:

Are there any (non-CJK) scripts for which a mechanical typewriter does
not exist due to the complexity of the script?

Are there any (non-CJK) scripts for which crossword puzzles don't exist?

For scripts where these do exist, is it perhaps an acceptable tradeoff
to keep their limitations in the terminal emulator world as well, to
combine the terminal emulator's power with these scripts?

Honestly, even with English, all I have to do is "cat some_text_file",
and chances are that a word is split in half at some random place
where it hits the right margin. Even with just English, a terminal
emulator isn't something that gives me a grammatically and
typographically super pleasing or correct environment. It gives me
something that I personally find grammatically and typographically
"good enough", and in the mean time a powerful tool to get my work
done.

Obviously the more complex the script, the more tradeoffs there will
be. I think it's a call each user has to make whether they prefer a
terminal emulator or a graphical app for a certain kind of task. And
if terminal emulators have a lower usage rate in these scripts, that's
not necessarily a problem. If we can improve by small incremental
changes, sure, let's do. If we'd need to heavily redesign plenty of
fundamentals in order to improve, it most likely won't happen.


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode

On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii  wrote:

> That's the application's problem, not the terminal's.  An application
> that wants its column to line up _and_ wants to support complex text
> scripts will need to move cursor to certain coordinates, not to assume
> that 7 codepoints always take 7 columns on display.

In order to do that, an application needs to know how wide a text will
appear, which depends on the font. How will it know it?

Will it by some means know the font and the rendering engine the
terminal uses (even across ssh) and will it have to measure it itself?

Or will it be able to ask the terminal? If so, how? Maybe a new
extension, an asynchronous escape sequence that responds back with the
measured width? What about the latency caused by the bunch of
asynchronous roundtrips, especially over ssh? What about the utter pain
and intrinsic unreliability of handling asynchronous responses, as
I've outlined in a section of
https://gitlab.freedesktop.org/terminal-wg/specifications/issues/8 ?

What if there's no font? What if there are multiple fonts at the same
time? What if the font is changed later on, is it okay then for the
display of existing stuff to fall apart and only newly printed stuff
to appear correctly?

How do you define the "width of the terminal in characters", get/set
by ioctl(..., TIOC[GS]WINSZ, ...) that many apps rely on?
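
For reference, this is roughly how an application reads that size today. The sketch uses Python's standard `fcntl` and `termios` modules on a Unix-like system; the helper names are made up for illustration. The point is that the kernel hands back integer rows and columns, with nothing in the structure for fractional or font-dependent widths.

```python
import fcntl
import struct
import termios

def parse_winsize(buf):
    """Unpack struct winsize: ws_row, ws_col, ws_xpixel, ws_ypixel (unsigned shorts)."""
    rows, cols, _xpixel, _ypixel = struct.unpack("HHHH", buf)
    return rows, cols

def get_winsize(fd=1):
    """Ask the tty behind fd for its size in character cells."""
    buf = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    return parse_winsize(buf)

if __name__ == "__main__":
    try:
        print(get_winsize())
    except OSError:
        print("stdout is not a terminal")
```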

If you define it by any means, what if by placing the maximum numbers
of "i"s in a row doesn't fill up the entire width? Will that area be
inaccessible, then? Or despite having a definition of terminal width,
will there be new cells beyond this width to write to?

What if filling a row with all "w"s overflows? I take it that an app
shouldn't print there, but what if it still does, will that piece of
text just not be shown?

How much more complicated would you think implementing something like
"zip -h" become?

> How is this different from using variable-pitch fonts?

Do you mean variable-pitch font where the terminal still places each
glyph in its designated area? The font is the private business of the
terminal emulator, then, it'll just appear ugly as a screenshot I've
already linked, but the emulation behavior wouldn't care.

Or do you mean variable-pitch font where each letter is placed after
each other, as you'd expect in document editors? That is, way more
"i"s than "w"s fitting in a line? It's not different, it's practically
the same. And this is something that none of the terminal emulators
I'm aware of does; and having some clue about terminal emulators, I
can't imagine how one could do (see all the questions above for a
start).

This is why I'm saying: Sure you can take this path, but then we're
talking about something new, not terminal emulators as we currently
know them. You can take this path, but then you'll have to rebuild
many of the already existing apps, and beware, they'll get way more
complex.

e.

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode

On Sat, Feb 9, 2019 at 7:56 PM Eli Zaretskii  wrote:

> I'm probably missing something, because I don't see the grave problems
> you hint at.  Any width provided back by a shaper can be rounded to
> the nearest integral character cell, so your canvas can still remain
> rectangular.

Let's suppose a utility outputs these two lines of text:
abcdefg|
complex|

whereas "abcdefg" are these English letters themselves, but "complex"
is a word of some language requiring complex script rendering, taking
up 7 logical cells (because that's what wcwidth() says). Also, "|" is
the pipe symbol, or a vertical box drawing line, whatever.

Now let's assume that harfbuzz tells you that the desired width for
rendering this "complex" word is 5.3 times the width of the character
cell. Or 8.6 times it. How to proceed? How will the "|" bars align up,
and thus mc's two-panel layout, tmux's vertical split etc. not fall
apart? In the latter case, when the width requested by harfbuzz is
bigger than the designated width, what to do with characters that "fall
off" at the right edge of the terminal?
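
To put numbers on the question, here is a small sketch of where the "|" after the word would be drawn under the grid model versus under a "round the shaper's width" model. The function name and the rounding rule are assumptions for illustration only.

```python
def bar_column(model_cells, shaped_cells, use_shaper):
    """0-based column where the "|" following the word would be drawn."""
    return round(shaped_cells) if use_shaper else model_cells

if __name__ == "__main__":
    # "abcdefg|" always puts its bar at column 7.
    # "complex|" puts it at 7 in the grid model, but at 5 or 9 if the
    # shaper's width (5.3 or 8.6 cells) is rounded to whole cells.
    for shaped in (5.3, 8.6):
        print(bar_column(7, shaped, False), bar_column(7, shaped, True))
```

In both shaped cases the bar no longer lines up with the one on the "abcdefg|" line, which is exactly the layout breakage described above.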

e.

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode

On Sat, Feb 9, 2019 at 7:07 PM Eli Zaretskii  wrote:

> You need to use what HarfBuzz tells you _instead_ of wcswidth.  It is
> in general wrong to use wcswidth or anything similar when you use a
> shaping engine and support complex script shaping.

This approach is not viable at all.

Terminal emulators have an internal data structure that they maintain,
a matrix of character cells. Every operation is performed here, every
escape sequence is defined on this layer what it does, the cursor
position is tracked on this layer, etc. You can move the cursor to
integer coordinates, overwrite the letter in that cell, and do plenty
of other operations (like push the rest to the right by one cell). If
you change these fundamentals, most of the terminal-based applications
will fall apart big time.
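
A toy model of that data structure, to make the operations concrete. This is a deliberately simplified sketch and not VTE's implementation: the class and method names are invented, and wrapping, scrolling, attributes and double-width characters are all left out.

```python
class Grid:
    """Minimal matrix of character cells with a cursor at integer coordinates."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.cells = [[" "] * cols for _ in range(rows)]
        self.row = self.col = 0

    def move_to(self, row, col):
        """Move the cursor, clamped to the grid."""
        self.row = max(0, min(row, self.rows - 1))
        self.col = max(0, min(col, self.cols - 1))

    def put(self, ch):
        """Overwrite the cell under the cursor and advance one column."""
        self.cells[self.row][self.col] = ch
        if self.col < self.cols - 1:
            self.col += 1

    def insert_blank(self):
        """Push the rest of the row right by one cell; the last cell falls off."""
        line = self.cells[self.row]
        line.insert(self.col, " ")
        line.pop()

    def line(self, row):
        return "".join(self.cells[row])

if __name__ == "__main__":
    grid = Grid(2, 8)
    for ch in "abcdefg|":
        grid.put(ch)
    grid.move_to(0, 2)
    grid.put("X")
    print(repr(grid.line(0)))
```

Nothing in this model knows about fonts or glyph widths; every operation is defined purely in terms of cells.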

This behavior has to be absolutely independent from the font. The
application running inside the terminal doesn't and cannot know what
font you use, let alone how harfbuzz is about to render it. (You can
even have no font at all, such as with the libvterm headless emulator
library, or a detached screen or tmux session; or have multiple fonts
at the same time if a screen or tmux session is attached from multiple
graphical emulators.)

So one part of a terminal emulator's code is responsible for
maintaining this matrix of characters according to the input it
receives. Another part of their code is responsible for presenting
this matrix of characters on the UI, doing the best it can.

If you say that the font should determine the logical width, you need
to start building up something brand new from scratch. You need to
have something that doesn't have concepts like "width in characters".
You need to redefine cursor movement and many other escape sequences.
You need to heavily adjust the behavior of a gazillion of software,
e.g. zip's two-column output, anything that aligns in columns (e.g.
midnight commander, tmux's vertical split etc.), the shell's (or
readline's) command editing and wrapping to multiple lines, ncurses,
and so on, all the way to e.g. fullscreen text editors like Emacs.

And then we're not talking about terminal emulators anymore, as we
know them now, but something new, something pretty different.

Terminal emulators do have strong limitations. Complex text rendering
can only work to the extent we can squeeze it into these limitations.

cheers,
egmont

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode

Hi Richard,

On Sat, Feb 9, 2019 at 3:08 PM Richard Wordingham via Unicode wrote:

> It would be good to be able to access a maintained statement of the
> VTE rules for allocating characters to a cell, or group of cells, as
> appropriate.

What VTE did, up to a couple of days ago:

It opens the font, and measures the ASCII 33-126 or so characters,
takes their average size (well, in case of monospace font, they should
all have the same size), this determines the cell size.

Then every character cell is rendered individually, using Pango or
Cairo or I'm not sure what exactly – there are like three paths in the
source, the details are unclear to me. A cell might contain a base
character + nonspacing combining accents, these are passed together to
Pango and friends, so they render it as one unit. The glyph is aligned
to the left of its designated cell area, overflowing on the right (and
thus potentially overlapping with the next glyph) if it's wider than
its designated area.

As a special case, two adjacent cells might contain a double wide
(typically CJK) character, but it's not that special after all: it's
also displayed aligned to the left edge of its first cell.

What I improved a couple of days ago (to be released in vte-0.56), for
Devanagari and friends, although I know there's more than this to
address these scripts properly:

If a cell contains a regular letter, and the next cell contains a
spacing combining mark, then these two are passed to Pango in a single
step, that is, the spacing combining mark is applied around its base
letter by Pango as expected. (Previously the spacing combining mark
was rendered on its own, around a dotted circle, which was obviously
pretty bad.)
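
The grouping described above can be approximated in a few lines. This is a sketch of the idea only, not VTE's code: it uses the Unicode general categories Mn and Mc as a stand-in for whatever VTE actually checks, ignores double-width characters, and lets several spacing marks chain onto one base, all of which are assumptions made here.

```python
import unicodedata

def render_runs(text):
    """Group code points into (string, cell_count) units handed to the renderer."""
    runs = []  # each entry is [string, cell_count]
    for ch in text:
        category = unicodedata.category(ch)
        if runs and category == "Mn":
            # Nonspacing mark: lives in the same cell as its base.
            runs[-1][0] += ch
        elif runs and category == "Mc":
            # Spacing mark: has its own cell, but is shaped together with the base.
            runs[-1][0] += ch
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return [(string, cells) for string, cells in runs]

if __name__ == "__main__":
    # DEVANAGARI LETTER KA followed by DEVANAGARI VOWEL SIGN I (a spacing mark).
    print(render_runs("\u0915\u093f"))
```

The KA plus vowel sign pair comes out as a single unit spanning two cells, so the renderer can place the mark before the letter instead of drawing it around a dotted circle.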

What I'm working on currently, as you all know by now, is
BiDi-shuffling the cells before rendering them (hopefully for
vte-0.58).

This is how VTE works now, but it's by no means a specification, and
tailoring a font to this behavior is probably not the right approach.
Instead, VTE's behavior should be improved. We have a pending feature
request (which I've already linked) to use HarfBuzz for rendering the
glyphs, which would then render grapheme clusters beautifully. The
problem that I don't know how to address is: What if harfbuzz tells us
that the overall width for rendering a particular grapheme cluster is
significantly different from its designated area (the number of
character cells [wcswidth()] multiplied by the width of each)?

cheers,
egmont

>
> > > (b) With a terminal that expects a fixed width font, surely the
> > > terminal decides how many cells it allocates to a group of
> > > characters, and the font designer has to come up with a suitable
> > > value based on that.
> >
> > Yes.  A terminal emulator that works with a shaper should probably
> > post-process the width information returned by the shaper for these
> > purposes.
>
> Perhaps it should base the number of cells on the width of the
> clusters.  However, continuing with my example, U+1789 KHMER LETTER NYO
> as a base character is too wide to fit in a cell, and the next
> character will overwrite its right-hand part. From this I deduce that it
> is allocated just one cell.  Gnome terminal is not alone in doing this,
> but it does better than some, in my opinion, in that the overflow of the
> foreground of one cell is not obliterated by the background of the
> next cell.  U+1789 has an East Asian width property of 'Neutral', which
> is distinctly unhelpful.
>
> What I would like is a specification of what a font must do to avoid
> such problems.
>
> > > >  I don't see how you can expect wcwidth, or any other
> > > > interface that was designed to work with _characters_, to be
> > > > useful when you need to display grapheme clusters.
>
> It, or something similar but worse, gets used, especially when moving
> the cursor for editing.
>
> > > Well I can envisage a decision being made that a grapheme cluster
> > > str (as decreed by the terminal) shall occupy wcswidth(str) cells -
> > > "The wcswidth() function returns the number of column positions for
> > > the wide-character string s, truncated to at most length n".
> >
> > AFAIU, the shaping engine returns its output in terms of font glyph
> > numbers, not character codepoints, so you cannot in general call
> > wcswidth on them.  The shaper also returns the advance information,
> > which serves instead of wcwidth and related APIs for determining the
> > actual width on display.
>
> Unfortunately, when the rectangular grid is being preserved,
> typographical advance width is generally ignored when determining the
> placement of characters.  Now, this is not always true; one can have
> the situation where the positioning of characters respects the
> advance widths, but the positioning of the cursor assumes a fixed-width
> rectangular grid.  I have found working with that to be extremely
> confusing.
>
> Richard.
>

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Egmont Koblinger via Unicode

On Fri, Feb 8, 2019 at 10:36 PM Eli Zaretskii  wrote:

> No one in their right minds will run Emacs inside the Emacs terminal
> emulator.  And even for other applications, disabling bidi will almost
> always be needed only for full-screen programs, which use curses-like
> libraries to address the entire screen.  So you'd switch off
> reordering for the entire time you are running such an app, then
> switch it back on after exiting.

Exactly.

But the question is: should it be the user to manually switch it
on/off, or should it happen for them automatically under the hood? If
the latter, how? My BiDi proposal answers this. Do you have another
possible answer?

> Are there any terminal emulators that support these sequences?

Prior to my specs: Not that I'm aware of. As of my work being
available: at least VTE and Mintty are working on it, and I know that
iTerm2 was also waiting for some specification. I'm sincerely hoping
for even more to follow.

e.

Re: Encoding italic

2019-02-08 Thread Egmont Koblinger via Unicode

Hi guys,

Having been a terminal emulator developer for some years now, I have
to say – perhaps surprisingly – that I don't fancy the idea of reusing
escape sequences of the terminal world.

(Mind you, I don't find it a good idea to add italic and whatnot
formatting support to Unicode at all... but let's put aside that now.)

There are a lot of problems with these escape sequences, and if you go
for a potentially new standard, you might not want to carry these
problems.

There is not a well-defined framework for escape sequences. In this
particular case you might say it starts with ESC [ and ends with the
letter 'm', but how do you know where to end the sequence if that
letter 'm' just doesn't arrive? Terminal emulators have extremely
complex tables for parsing (and still many of them get plenty of
things wrong). It's unreasonable for any random small utility
processing Unicode text to go into this business of recognizing all
the well-known escape sequences, not even to the extent to know where
they end. Whatever is designed should be much more easily parseable.
Should you say "everything from ESC[ to m", you'll cause a whole bunch
of problems when a different kind of escape sequence gets interpreted
as Unicode.

A parser, by the way, would also have to interpret combined sequences
like ESC[3;0;1m or alike, for which I don't see a good reason as
opposed to having separate sequences for each. Also, it should be
carefully evaluated what to do with C1 (U+009B) instead of the C0 ESC[
opening for an escape sequence – here terminal emulators vary. These
just make everything even more cumbersome.
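
To make the parsing problem concrete, here is a small sketch comparing
the "everything from ESC[ to m" rule with a scan based on the byte
classes ECMA-48 uses for control sequences (parameter bytes 0x30-0x3F,
intermediate bytes 0x20-0x2F, final byte 0x40-0x7E). The function names
are made up for illustration, and only the 7-bit ESC [ introducer is
handled, not C1 U+009B.

```python
ESC = "\x1b"

def naive_end(data, start):
    """'Everything from ESC[ to m': index just past the next 'm', or -1."""
    pos = data.find("m", start + 2)
    return -1 if pos < 0 else pos + 1

def csi_end(data, start):
    """Index just past the control sequence at data[start], or -1 if the
    sequence is incomplete or malformed."""
    i = start + 2                                          # skip ESC [
    while i < len(data) and "\x30" <= data[i] <= "\x3f":   # parameter bytes
        i += 1
    while i < len(data) and "\x20" <= data[i] <= "\x2f":   # intermediate bytes
        i += 1
    if i < len(data) and "\x40" <= data[i] <= "\x7e":      # final byte
        return i + 1
    return -1

def sgr_params(seq):
    """Split a complete SGR sequence such as ESC[3;0;1m into its parameters."""
    return seq[2:-1].split(";")

erase = ESC + "[2Jsome text"       # ED (erase display), not an SGR
combined = ESC + "[3;0;1m"
```

On the erase-display example the naive rule swallows "so" along with
the sequence because it keeps looking for an 'm', whereas the
byte-class scan stops right after the final byte 'J'. Even this scan
says nothing about what to do when the final byte never arrives.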

ECMA-48 8.3.117 specifies ESC[1m as "bold or increased intensity".
It's only nowadays that most terminal emulators support 256 colors and
some even support 16M true colors that some emulators try to push for
this bit unambiguously meaning "bold" only, whereas in most emulators
it means "both bold and increased intensity". For compatibility
reasons, it won't be a smooth switch. Note that "bold" and "increased
intensity" only go in the same direction with white-on-black color
scheme, with black-on-white bold stands out more while increased
intensity (a lighter shade of gray instead of black) stands out less.
(We could also start nitpicking that the spec doesn't even say that
increased intensity is just for the foreground and not for the
background too.)

Should this scheme be extended for colors, too? What to do with the
legacy 8/16 as well as the 256-color extensions wrt. the color
palette? Should Unicode go into the business of defining a fixed set
of colors, or allow to alter the palette colors using the OSC 4 and
friends escape sequences which are supported by about half of the terminal
emulators out there?

For 256-colors and truecolors, there are two or three syntaxes out
there regarding whether the separator is a colon or a semicolon.
ECMA-48 doesn't say anything about it, ITU T.416 does, although it's
absolutely not clear. See e.g. the discussion at the comment section
of https://gist.github.com/XVilka/8346728 , in Dec 2018, we just
couldn't figure out which syntax exactly ITU T.416 wants to say.
Moreover, due to a common misinterpretation of the spec, one of the
positional parameters is often omitted.
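
For illustration, these are the three spellings of "set the foreground
to an RGB color" that are commonly seen in the wild. The sketch just
builds the strings; which one a given emulator accepts varies, and the
dictionary keys are descriptive labels of my own, not official names.

```python
CSI = "\x1b["

def truecolor_fg_variants(r, g, b):
    """Return the common ways of encoding an RGB foreground color in SGR."""
    return {
        # Semicolons throughout, the widespread legacy form.
        "semicolons": f"{CSI}38;2;{r};{g};{b}m",
        # Colons, with the color space id parameter left out entirely.
        "colons, color space id omitted": f"{CSI}38:2:{r}:{g}:{b}m",
        # Colons, with an empty color space id parameter before R, G, B.
        "colons, empty color space id": f"{CSI}38:2::{r}:{g}:{b}m",
    }

variants = truecolor_fg_variants(255, 128, 0)
```

A parser that wants to be liberal has to tell the last two apart by
counting parameters, which is exactly the kind of guesswork a new
standard would not want to inherit.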

Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m
for curly underline. What to do with them? Where to draw the line what
to add to Unicode and what not to? Will Unicode possibly be a
bottleneck of further improvements in terminal emulators, because from
now on every new mode we figure out we'd like to have in terminals
should go through some Unicode committee? And what if Unicode wants to
have a mode that terminal emulators aren't interested in, who will
assign numbers to them that don't clash with terminals? Who will
somehow keep the two worlds in sync?

What to do with things that Unicode might also want to have, but
doesn't exist in terminal emulators due to their nature, such as
switching to a different font size?

> This mechanism [...] is already supported
> as widely as any new Unicode-only convention will ever be.

I truly doubt this, these escape sequences are specific to terminal
emulation, an extremely narrow subset of where Unicode is used and
rich text might be desired.

I see it a much more viable approach if Unicode goes for something
brand new, something clean, easily parseable, and it remains the job
of specific applications to serve as a bridge between the two worlds.
Or, if it wants to adopt some already existing technology, I find
HTML/CSS a much better starting point.

regards,
egmont

On Fri, Feb 8, 2019 at 9:55 PM Doug Ewell via Unicode
 wrote:
>
> I'd like to propose encoding italics and similar display attributes in
> plain text using the following stateful mechanism:
>
> •   Italics on: ESC [3m
> •   Italics off: ESC [23m
> •   Bold on: ESC [1m
> •   Bold off: ESC [22m
> •   Underline on: ESC [4m
> •   Underline off: ESC [24m
> •

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Egmont Koblinger via Unicode

Hi Eli,

> Why would they want to toggle it back and forth?  What are the use
> cases where it makes sense to mix both modes?  IME, you either need
> one or the other, never both.

(Back to the basics, which are mentioned pretty clearly in my
specification, I believe, and I've also described here multiple
times... sigh.)

For certain apps, one of the modes is required (e.g. for cat it's the
implicit mode). For other tasks it's the other mode (e.g. for emacs
the explicit mode).

In a typical terminal session, you don't just use one of these kinds
of commands. You use various commands in a sequence, e.g. a cat
followed by an emacs, then a zip, then whatnot, then emacs again, then
a cat and a grep, etc...

The very last thing I would want to do as a user is to toggle some
setting back and forth, let alone remember which command needs which
mode.

> You can hardly expect Emacs (or any other application) to support
> control sequences that are not yet defined, let alone standardized.

The most essential sequence, BDSM to switch between implicit and
explicit modes, has been defined for like 28 years now. Sure I bring
slight changes and clarifications to it, as well as introduce new
ones. As of my recommendation which I've announced, these new ones are
defined as well.

It's probably never going to be a de jure standard, adopted by ECMA or
whatever "authority", but that's not what happens anywhere else in
terminal emulators nowadays. An "authority" which doesn't keep up to
date with innovations, doesn't have a feedback forum, and hasn't
released a new version for 28 years, is clearly not suitable for
making progress.

We have just announced a public forum called "Terminal WG" for
terminal emulator developers to collaborate and join their efforts
wrt. new extensions, rather than ad-hoc collaborations or each going
their own separate ways. We'd like its work to be widely accepted as a
basis for the desired behavior. My BiDi work is one of the works
hosted there. It'll probably never be an "authority" like ECMA, but
hopefully will be some kind of well-respected place of specs to adhere
to.

> When they become sufficiently widely available, I'm sure someone will
> add them to Emacs.

There's always a chicken and egg problem with this attitude. At the
very least, I'm kindly asking Emacs to emit BDSM so that when it's
fired up on a gnome-terminal, it'll have the terminal's BiDi
automatically disabled. This has nothing to do yet with Emacs's
built-in terminal emulator. Addressing that is sure a much bigger
chunk of work; I hope it'll happen if my BiDi proposal indeed turns
out to be successful.
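
What emitting BDSM amounts to for a full-screen application can be
sketched like this. In ECMA-48, BDSM is mode 8, setting it (CSI 8 h)
means implicit and resetting it (CSI 8 l) means explicit. The context
manager name is hypothetical, and restoring implicit mode on exit
assumes that implicit is the terminal's default, as in the
recommendation discussed here.

```python
import sys
from contextlib import contextmanager

CSI = "\x1b["
BDSM_EXPLICIT = CSI + "8l"   # RM 8: reset BDSM, i.e. explicit mode
BDSM_IMPLICIT = CSI + "8h"   # SM 8: set BDSM, i.e. implicit mode

@contextmanager
def explicit_bidi(out=sys.stdout):
    """Ask the terminal not to reorder cells while a full-screen app runs."""
    out.write(BDSM_EXPLICIT)
    out.flush()
    try:
        yield
    finally:
        # Assumption: implicit mode is the default the shell expects back.
        out.write(BDSM_IMPLICIT)
        out.flush()
```

An editor would wrap its main loop in "with explicit_bidi():" and the
user would never have to touch a setting. Terminals that don't know
the mode are expected to ignore the sequence.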


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Egmont Koblinger via Unicode

On Fri, Feb 8, 2019 at 3:28 PM Eli Zaretskii  wrote:

> You can have what you call the "explicit mode" if you set the variable
> bidi-display-reordering to nil.

So, if someone is running a mixture of applications requiring implicit
vs. explicit modes, they'll have to continuously toggle the setting of
their terminal back and forth. Just as for Konsole and friends there's
a graphical setting, correspondingly for Emacs's terminal there's this
bidi-display-reordering setting.

Now, I, as a user, want BiDi to work as seamlessly as possible,
definitely without me having to repeatedly switch a setting back and
forth if the applications could just as well do it automatically. One
of the basics of my spec.

Whether Emacs will adopt this, or will keep requiring users to toggle
this setting back and forth depending on the particular app they wish
to run, is not my call.

cheers,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Egmont Koblinger via Unicode

Hi Eli,

> Emacs implements the latest UBA from Unicode 11; and the Emacs
> terminal emulator inserts all the text into a "normal" Emacs buffer,
> and displays that buffer as any other buffer.  So yes, you have there
> full UBA support.

One of the essentials of my work is that there's much more to BiDi in
terminal emulators than running the UBA. If one takes a step backwards
to look at the big picture, it becomes clear that in some cases the
UBA needs to be run, while in other cases it mustn't. And then of
course there needs to be some means of switching, and so on...

According to the description you give, Emacs's terminal always applies
the BiDi algorithm, therefore by its design only implements what I
call "implicit mode", and not the "explicit mode".

On the other hand, in order to run Emacs inside a terminal emulator,
you need to set that terminal emulator to explicit mode, so that it
doesn't reshuffle the characters. The behavior it expects from the
outer terminal doesn't match the behavior it provides in its inner
one. As an interesting consequence, if you open Emacs, then inside it
a terminal emulator, and then inside it an Emacs, it will display BiDi
incorrectly, in reversed order.

I'm making the strong claim that by running the UBA a terminal
emulator doesn't become BiDi aware, there's much more it needs to do.


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Egmont Koblinger via Unicode

Hi Philippe,

> Adding a single bit of protection in cell attributes to indicate they are 
> either protected or become transparent (and the rest of the 
> attributes/character field indicates the id of another terminal grid or 
> rendering plugin creating its own layer and having its own scrolling state 
> and dimensions) can allow convenient things, including the possibility of 
> managing a grid-based system of stackable windows.
> You can design one of the layer to allow input (managed directly in the 
> terminal, with local echo without transmission delays and without risks of 
> overwriting surrounding contents.

At this point you're already touching much more the core of terminal
emulator behavior than e.g. my BiDi work does, it's a way more
essential, way more complex change – with much less clear goal to me,
like, why should emulators implement it, why would applications start
using it etc. If you wish to go for this direction, good luck!

(If anything, what I do see somewhat feasible, is building up
something from scratch that looks much more like a proportional-font
text editing widget, or even a rich text editor, rather than terminal
emulator, and figure out step by step how to get a shell and simple
utilities and later more complex utilities run in that. This could be
a new platform which, by putting decades of hard work in it – which I
cannot do voluntarily –, could eventually replace terminal emulators.)

Philippe, I hate to say it, but at the risk of being impolite, I just
have to. Your ideas would take terminal emulators extremely far from
what they are now, with no clear goals and feasibility to me; and are
no longer relevant to BiDi in any way. All I see is we're wasting each
other's time on utterly irrelevant topics, and since I see exactly
zero chance of any worthwhile takeaway coming out of this,
unfortunately I can no longer devote my limited free time to this, I
just have to quit this conversation between the two of us. I'm really
sorry.


best regards,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Egmont Koblinger via Unicode

Hi Eli,

> Not sure why.  There are terminal emulators out there which support
> proportional fonts.

Well, of course, a terminal emulator can load any font, even
proportional, but as it places them in the grid, it will look ugly as
hell (like this one: https://askubuntu.com/q/781327/398785 ). Sure you
could apply some tricks to make it look a bit less terrible (e.g. by
centering each glyph in its cell rather than aligning to the left),
but it still won't look great.

In the world of terminal emulation, many applications expect things to
align properly according to the wcwidth() of the string they emit. You
abandon this (start placing the glyphs one after the other in a row,
no matter how wide they are), and plenty of applications suddenly fall
apart big time (let alone questions like how you define the terminal's
width in characters).
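
A sketch of what "align according to wcwidth()" means on the
application side: the app pads by the number of cells it believes a
string occupies, not by its number of code points. display_width() is
a rough approximation written for this example (Python has no
wcwidth() in its standard library), and the table contents are made up.

```python
import unicodedata

def display_width(s):
    """Approximate wcswidth(): number of cells the string is assumed to take."""
    width = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue                      # combining marks take no cell
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        width += 2 if wide else 1
    return width

def pad(s, columns):
    """Pad with spaces so that the string occupies exactly `columns` cells."""
    return s + " " * max(0, columns - display_width(s))

rows = [("name", "size"), ("\u65e5\u672c\u8a9e", "12"), ("re\u0301sume\u0301", "7")]
lines = [pad(name, 10) + size for name, size in rows]
```

The second column only lines up if the terminal really gives every
string the number of cells the application computed. Once glyphs are
placed one after the other at their natural widths, this padding no
longer means anything.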

> Emacs is perhaps the only one whose terminal
> emulator currently supports bidi more or less in full

Let's not get started from here, please.

In Emacs-25.2's terminal emulator I executed "cat TUTORIAL.he". For
the entire contents, LTR paragraph direction was used and was aligned
to the left. Maybe something has changed for 26.x, I don't know.

In my work I carefully evaluated 4 other "BiDi-aware" terminal
emulators, as well as an ancient specification for BiDi which I had to
read about twenty times to get to pretty much understand what it's
talking about. I identified substantial issues with both the standard
and all the independent implementations (which didn't care about
this standard at all). I show that existing terminal emulators are
incompatible to the extent that an app cannot reliably print any RTL
text by any means at all. At this point I firmly believe it should be
clear that BiDi in terminals is not a topic where one can just go
ahead and do something, without having a specification first. I lay
down principles which a proper BiDi-supporting platform I believe
needs to meet, argue why multiple modes (explicit and implicit) are
inevitable, examine what to do with paragraph direction, cursor
location and tons of other issues, and come up with concrete
suggestions for how (partially based on that ancient specification)
all of these should be addressed.

Then, after putting literally months of work in it, I come here to
announce my work and ask for feedback. So far, from a thread of 100+
mails, I take away two pieces of worthwhile feedback: one is that
shaping should be done differently, and the other one is that – for
some use cases – a bigger scope of data should be used for
autodetecting the "paragraph direction" (as per UBA's terminology).

And now you suddenly tell me that Emacs's terminal supports BiDi more or
less in full???

Sorry, I just don't buy it. If you retain this claim, I'd pretty
please like to see a specification of its behavior, one which
addresses at least all the major issues I address in my work, one
which I could replace my work with, one which I'd be happy to
implement in gnome-terminal in the solid belief that it's about as
good as my proposal, and would wholeheartedly recommend for other
terminal emulators to adopt.

Or maybe, by any chance, when you said Emacs's terminal supported BiDi
more or less in full, did you perhaps go with your own idea of what a
BiDi-aware terminal emulator needs to support; ignoring all those
things I detail in my work, such as the inevitable need for explicit
mode, the need for deciding the scope of implicit vs. explicit mode,
and much more?


thanks a lot,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Egmont Koblinger via Unicode

Hi Philippe,

> I have never said anything about your work because I don't know where you 
> spoke about it or where you made some proposals. I must have missed one of 
> your messages (did it reach this list?).

This entire conversation started by me announcing here my work, aiming
to bring usable BiDi to terminal emulators.

> Terminals are not displaying plain text, they create their own upper layer 
> protocol which requires and enforces the 2D layout [...] Bidi does not 
> specify the 2D layout completely, it is purely 1D and speaks about left and 
> right direction

That's one of the reasons why it's not as simple as "let's just run
the UBA inside the terminal", one of the reasons why gluing the two
worlds together requires a substantial amount of design work.

> For now terminal protocols, and emulators trying to implement them; that must 
> mix the desynchronized input and output (especially when they have to do 
> "local echo" of the input [...]

I assume by "local echo" you're talking about the Send/Receive Mode
(SRM) of terminals, and not the "stty echo" line discipline setting of
the kernel, because as far as the terminal emulator is concerned, the
kernel is already remote, and it's utterly irrelevant for us whether
it's the kernel or the application sending back the character.

SRM is only supported by a few terminal emulators, and we're about to
drop it from VTE, too (https://gitlab.gnome.org/GNOME/vte/issues/69).

> If you look at historic "terminal" protocols,

I'm mostly interested in the present and future. In the past, only for
curiosity, and to the extent necessary to understand the present and
to plan for the future.

> Some older terminal protocols for mainframes notably were better than today's 
> VT-like protocols: you did not transmit just what would be displayed, but you 
> also described the screen area where user input is allowed and the position 
> of fields and navigation between them:

This is not seen in today's graphical terminal emulators.

> Today these links are better used with real protocols made for 2D and 
> allowing a web application to manage the input with presentation layer (HTML) 
> and with javascript helpers (that avoid the roundtrip time).

Sure, if you need another tool, let's say a dynamic webpage in your
browser, rather than a terminal emulator to perform your task
effectively, so be it. I'm not claiming terminal emulators are great
for everything, I'm not claiming terminal emulators should be used for
everything.

> But basic text terminals have never evolved and have lagged behind today's 
> need.

I disagree with the former part. There are quite a few terminal
emulators out there, and many have added plenty of new great features
recently.

Whether they're up to today's needs, depends on what your needs are.
If you need something utterly different, go ahead and use whatever
that is, such as maybe a web browser. If you're good with terminals,
that's fine too. And there's a slim area where terminal emulators are
mostly good for you, you'd just need a tiny little bit more from them.
And maybe for some people this tiny little bit more happens to be
BiDi.

> Most of them were never tested for internationalization needs:

Terminal emulators weren't created with internationalization in mind.
I18n goals are added one by one. Nowadays combining accents and CJK
are supported by most emulators. Time to stretch it further with BiDi,
shaping, spacing combining marks for Devanagari, etc.

> [...] delimit input fields in input forms for mainframes, something that was 
> completely forgotten and remains forgotten today with today's VT-* protocols, 
> to indicate which side of the communication link controls the content of 
> specific areas

Something that was completely forgotten, probably for good reasons,
and I don't see why it should be brought back.

> As well today's VT-* protocols have no possibility to be scriptable: 
> implementing a way to transport fragments of javascripts would be fine.

I have absolutely no incentive to work in this direction.

> Text-only terminals are now aging but no longer needed for user-friendly 
> interaction, they are used for technical needs where the only need is to be 
> able to render static documents without interacting with it, except 
> scrolling it down, and only if they provide help in the user's language.

Text-only terminals are no longer needed??? Well, strictly speaking,
computers aren't needed either, people lived absolutely fine lives
before they were invented :)

If you get to do some work, depending on the kind of work, terminal
emulators may or may not be a necessary or a useful tool for you. For
certain tasks you don't really have anything else, or at least
terminals are way more effective than other approaches. For other
tasks (e.g. text editing) it's mostly a matter of taste whether you
use a terminal or a graphical app. For yet other tasks, terminal
emulators take you nowhere.

My work aims to bring BiDi into

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Egmont Koblinger via Unicode

Hi Philippe,

On Thu, Feb 7, 2019 at 3:21 PM Philippe Verdy  wrote:

> "Rules" are not formally written, they are just a sense of best practices.

When it comes to BiDi in terminals, I haven't seen anything that I
consider reasonably okay, let alone "best practice". It's a mess.
That's why I decided to come up with something.

> Bidi plays very badly on terminals

Agreed. There are essentially two ways from here: just leave it as bad
as it is (or even see various terminal emulators coming up with not
well-thought-out hacks that just make it even worse) or try to
improve. I picked the latter.

> [...] refreshing a typical 80x25 screen takes about one half second, which is 
> much longer than typical user input, so full screen refresh does not work for 
> data input and editing, and terminals implement themselves the echo of user 
> input, ignoring how and when the receiving application will handle the input, 
> and also ignoring if the application is already sending output to the terminal.

I'm really unsure where you're trying to get with it.

For one, adding BiDi doesn't introduce the need for significantly
larger updates. Whenever a partial repaint of the screen was
sufficient, even with BiDi in the game it will remain sufficient.

Another thing: I'm not sure that 9.6kbps is a bottleneck to worry
about. It's present if you connect to a device via serial port, but
will you really do this in combination with BiDi? The use case I much
more have in mind is running a terminal emulator locally, or ssh'ing
to a remote machine, for getting various kinds of productive work
done (e.g. writing a text file in someone's native RTL script in a
text editor). These are magnitudes faster.

> It's hard or impossible to synchronize this and local echoes on the terminal 
> causes havoc.

If input mixes with output (e.g. you press some keys while you're
waiting for make/gcc to compile your app, and these letters appear
onscreen), the visual result is broken even without BiDi. I cannot
eliminate this kind of breakage by introducing BiDi, nor can I build up
something from scratch that somewhat resembles the current terminal
emulator world but fixes all of its oddities.

> But the concept of "line" or "paragraph" in a terminal protocols is extremely 
> fuzzy. It's then very difficult to take into account the additional Bidi 
> constraints as it's impossible to conciliate BOTH the logical ordering (what 
> is encoded in the transmitted data or kept in history buffers) and the visual 
> ordering.

I don't try to conciliate logical and visual ordering within the same
paragraph, I agree it's impossible, it's a semantical nonsense. But I
try to conciliate them in the sense that sometimes the visual order is
the desired one, sometimes the logical order, so let's make it
possible to use one for one paragraph, and the other one for another
paragraph.

> That's why there are terminal protocols that absolutely don't want to play 
> with the logical ordering and require all their data to be transmitted in 
> visual order (in which case, there's no bidi handling at all).

This is one of the modes in my recommendation. If your application
requires this mode (as e.g. Emacs does), use this mode and you're
good.

> In fact most terminal protocols are very defective and were never designed to 
> handle Bidi input

Maybe it's high time someone fixed this defect, then? :)

> And here your unit (logical lines) is not even defined in the terminal 
> protocol and not known from the emitting applications which has no input 
> about the final output terminal properties. So the terminal must perform 
> guesses. As it can insert additional linebreaks itself, and scroll out some 
> portion of it, there's no way to delimit the effect of "bidi controls". The 
> basic requirement for correctly handling bidi controls is to make sure that 
> paragraph delimitations are known and stable. If additional breaks can occur 
> anywhere on what you think is a "logical line" but which is different from 
> the emitting application (or static text document which is output "as is" 
> without any change to reformat it, these bidi controls just make things worse 
> and it becomes impossible to make reasonable guesses about paragraph 
> delimitations in the terminal. The result become unpredictable and most often 
> will not even make any sense as the terminal uses visual ordering always but 
> loses the track of the logical ordering (and things get worse when there are 
> complex clusters or characters that cannot even fit in a monospaced grid.

If an exact definition of hard vs. soft wrapped lines is what you miss
from the specification, okay, I'll add it to a future version.

I don't know how terminals performing guesses occurred to you, they
sure don't (as for hard vs. soft newlines).
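
To illustrate why no guessing is involved: a terminal knows for each
row whether it ended because the application sent a newline (hard) or
because the text ran into the right margin (soft), and can remember
that. The toy model below is only a sketch with made-up function
names, assuming one character per cell and ignoring scrolling, cursor
movement and the pending-wrap state at the last column.

```python
def write_to_grid(text, columns):
    """Lay text out in rows of `columns` cells.

    Returns (content, soft_wrapped) pairs; soft_wrapped is True when the
    row ended at the right margin rather than at an explicit newline.
    """
    rows = []
    for hard_line in text.split("\n"):
        chunks = [hard_line[i:i + columns]
                  for i in range(0, len(hard_line), columns)] or [""]
        for k, chunk in enumerate(chunks):
            rows.append((chunk, k < len(chunks) - 1))
    return rows

def paragraphs(rows):
    """Join soft-wrapped rows back into the units the application emitted."""
    out, current = [], ""
    for content, soft in rows:
        current += content
        if not soft:
            out.append(current)
            current = ""
    return out

rows = write_to_grid("abcdefgh\nxy", 3)
units = paragraphs(rows)
```

With the per-row flag kept around, the unit that BiDi operates on stays
the same after the terminal wraps it, and also if it is later rewrapped
to a different width.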

> The basic requirement for correctly handling bidi controls is to make sure 
> that paragraph delimitations are known and stable.

Since we're talking about bidi controls being emitted,

Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-07 Thread Egmont Koblinger via Unicode

On Thu, Feb 7, 2019 at 6:53 PM Eli Zaretskii  wrote:

> No, it needs no interaction.  Unless the regexp doesn't work for you,
> which you should then report as a bug in Emacs.

Do you mean you aim to maintain a regex that matches everyone's prompt
in the world, without a significant amount of false positive matches
on non-prompt lines?

(It's getting damn off-topic though.)

e.

Re: Bidi paragraph direction in terminal emulators

2019-02-07 Thread Egmont Koblinger via Unicode

On Thu, Feb 7, 2019 at 6:33 PM Eli Zaretskii  wrote:

> Well, let's just say that Emacs uses the HL1 rule, and determines the
> base direction for the entire chunk of text between empty lines.

Exactly!

Now it's my turn to figure out how to add this behavior to terminals,
preferably stopping before/after prompts too.
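
A rough sketch of that behavior, for the record: split the output at
empty lines and pick one base direction per chunk from its first
strong character (the UBA's P2/P3 rules in simplified form). This is
not Emacs's implementation, it ignores isolates and prompts, and the
function names are invented for the example.

```python
import unicodedata

def base_direction(chunk, default="LTR"):
    """First-strong heuristic: L gives LTR, R or AL gives RTL."""
    for ch in chunk:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "LTR"
        if bidi in ("R", "AL"):
            return "RTL"
    return default

def chunk_directions(lines):
    """Group lines into emptyline-delimited chunks, one direction per chunk."""
    result, chunk = [], []
    for line in lines + [""]:            # sentinel flushes the last chunk
        if line.strip():
            chunk.append(line)
        elif chunk:
            result.append((chunk, base_direction("\n".join(chunk))))
            chunk = []
    return result

sample = ["$ cat file", "", "123 \u05d0\u05d1\u05d2", "abc", "", "hello"]
directions = [d for _, d in chunk_directions(sample)]
```

Note how the line "abc" inherits RTL from the chunk it belongs to,
which is the whole point of using a bigger scope than a single line.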

cheers,
egmont

Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-07 Thread Egmont Koblinger via Unicode

Hi,

On Thu, Feb 7, 2019 at 3:27 PM Eli Zaretskii  wrote:

> It uses a regular expression, see term-prompt-regexp.

So, it's not automatic, needs user interaction, and for that reason,
may not have worked for me. (I have other weird things in my prompt,
like 256-color sequences that Emacs didn't recognize, perhaps this
made the regexp matching fail. Nevermind.)

> > Whatever it does to know where the prompt is, can it be made into a
> > standard, cross-terminal feature?
>
> Not sure.  It's a kind of heuristic, which is why the regexp is
> customizable on user level, so that users could adapt it to their
> needs, should that be necessary.

iTerm2 has a "shell integration" where the prompt contains explicit
markers so that no heuristics or user configuration is needed from the
terminal. We're trying to somewhat standardize it at
https://gitlab.freedesktop.org/terminal-wg/specifications/issues/4 and
get more terminals support it. Not sure where this attempt will take
us, we'll see.

> In what version of Emacs is that?  In the latest version 26 I have
> here, the tutorial displays with most paragraphs in RTL direction.

25.2 here; it has apparently changed in a newer version, glad to hear it.

My distro will upgrade in about 2 months. Since I'm not an Emacs user
myself, I hope you don't mind if I don't make extra rounds in
upgrading now to verify this.

cheers,
egmont

Re: Bidi paragraph direction in terminal emulators

2019-02-07 Thread Egmont Koblinger via Unicode

On Thu, Feb 7, 2019 at 3:14 PM Eli Zaretskii  wrote:

> Not a bug, a feature.  Emacs doesn't remove the bidi controls from
> display (that's another deviation allowed by the UBA, see section
> 5.2).  On GUI displays, these controls are displayed as thin 1-pixel
> spaces, but on text-mode terminals they are shown as space.

Thanks for the clarification!

> Why?  As I said, the tutorial was written in part to demonstrate the
> UBA implementation, including the dynamic detection of base paragraph
> direction, and this is exactly one example of how it works in
> practice.

Fair enough, then.

> > To recap: The _paragraph direction_ is determined in Emacs for
> > emptyline-delimited segments of data, which I honestly find a great
> > thing, and would love to do in terminals too, alas at this point it's
> > blocked by some really nontrivial technical issues. But once you have
> > decided on a direction, each _line_ within that data is passed
> > separately to the BiDi algorithm to get reshuffled
>
> Yes and no.  You could keep your mental model if you like, but
> actually the UBA explicitly says that each line is to be reordered for
> display separately, see section 3.4 of UAX#9.

The very first step of the BiDi algorithm is to split at "paragraphs",
however that's defined, and then do the rest for each paragraph.

For one particular paragraph, there's a lot going on: determining
embedded levels and such. At one point, at the very beginning of 3.4,
a caller may split a paragraph into lines. Then the rest (actual
reordering) happens on lines.

This is _not_ the same as splitting into lines upfront (that is,
define UBA's "paragraphs" as the text file's "lines"), and then
determining embedded levels and reshuffling on these smaller units.

Emacs does the latter, and so does my specification.

I believe it's not my mental model that's weird; rather, it's your use of
terminology that doesn't match UBA's that confused me. It's pretty
confusing and obviously hard to use the proper terminology, since
Emacs's definition and the user-perceived notion of a "paragraph"
differ from what becomes a "paragraph" according to UBA's definition.

Both in Emacs and in my spec, a "line" of the text file maps to a
"paragraph" according to UBA's phrasing. (Except when determining the
paragraph direction, where Emacs uses its own human-perceived
emptyline-separated paragraph, rather than lines. Which is a nice
thing to do.)
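
To make the two roles of "paragraph" concrete, here is a minimal Python
sketch. The function names are made up for illustration, and the
direction check is a simplified first-strong test (it ignores isolates
and embeddings); it is not Emacs's actual code nor the wording of my
spec. The direction is decided once per emptyline-delimited block, but
every line is then its own unit for the rest of the algorithm.

```python
import unicodedata

def first_strong_direction(text, default="LTR"):
    """Return 'LTR' or 'RTL' based on the first strong character (simplified)."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LTR"
        if cls in ("R", "AL"):
            return "RTL"
    return default

def uba_units(text):
    """Return (line, direction) pairs: the direction comes from the whole
    emptyline-delimited block, while each line is a separate UBA paragraph."""
    result, block = [], []
    for line in text.split("\n") + [""]:   # the extra "" flushes the last block
        if line.strip() == "":
            direction = first_strong_direction("\n".join(block))
            result.extend((l, direction) for l in block)
            block = []
        else:
            block.append(line)
    return result
```

Note that the second line of a block inherits the block's direction even
if its own first strong character would suggest otherwise.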

Anyways, I'm glad it turned out we're on the same page, it's just the
terminology that's truly confusing.

cheers,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Egmont Koblinger via Unicode

Hi Philippe,

> There's some rules for correct display including with Bidi:

In what sense are these "rules"? Where are these written, in what kind
of specification or existing practice?

> - Separate paragraphs that need a different default Bidi by double newlines 
> (to force a hard break)

There is currently no terminal emulator I'm aware of that uses empty
lines as boundaries of BiDi treatment.

While my recommendation uses a unit one step smaller (logical lines), and I
understand as per Eli's request that it would be desirable to go with
emptyline-delimited boundaries, what in fact all the current
self-proclaimed BiDi-aware terminal emulators that I came across do is
use a unit two steps smaller than yours: they do BiDi on physical
lines of the terminal, no matter how a logical line of the output had
to wrap into physical ones because it didn't fit in the width. (It's a
terrible behavior.)

The current behavior of terminal emulators is very far from what you describe.

> - use a single newline on continuation

Continuation of what exactly?

But let's take a step back: Should the output be pre-formatted by some
means, or do we rely on the terminal emulator wrapping overlong lines?
(If pre-formatted then for what width? 80 columns, so that I waste
precious real estate if my terminal is wider? Or is it a requirement
for any app that produces output to implement a decent dynamic
wrapping engine for nice formatting according to the actual width?)

There's precedent for both of these different approaches. I don't
think it's feasible to pick one, and claim that the other approach is
discouraged/invalid/whatever.

> - if technical items are untranslatable, make sure they are at the beginning 
> of lines and indented by some leading spaces, before translated ones.

I firmly disagree. There shouldn't be any restriction on how a
translator wishes to translate a sentence. The computer world has to
adapt to the requirements of human languages, not the other way
around!

> - Don't use any Bidi control !

Why not? They do exist for a reason, for the very reason that any
logical translation, which a translator might want to write (see my
previous point) is presentable in a visually correct way. Use them for
that, whenever needed.


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-06 Thread Egmont Koblinger via Unicode

Hi Richard,

> Not necessarily.  One could allow the first strong character in the
> prompt to determine the paragraph directions

How does Emacs know what's a prompt? How can it tell it from the
previous and next command's output?

Whatever it does to know where the prompt is, can it be made into a
standard, cross-terminal feature?

> That's what the Emacs
> terminal (invoked by M-x term; top level definition in term.el) does.

I tried it. Executed my default shell, and inside that, a "cat
TUTORIAL.he". All the paragraphs are rendered as LTR ones,
left-aligned. Not the way the file is opened in Emacs.

If you claim Emacs's built-in terminal emulator supports BiDi, I'm
kindly asking you to present a documentation of its behavior, in
similar spirit to my BiDi proposal.

> Not necessarily.  One might use cat to glue together files that had
> split into 1400k chunks, in which case it is not even reasonable to
> expect the end of file to be at a character boundary.  (Yes, floppy
> disks still have their uses.)

I did not say anything about changing cat's behavior. I recommended to
change the convention for such paragraph-oriented text files to end
with two newlines.

> But the white space between paragraphs is a separator, not a
> terminator.  One doesn't require it at the end when formatting
> paragraphs within the cell of a table.

Does this logic also apply to single newline characters? If not, why
not, what's the conceptual difference? If it does, why do text files
end in a newline?


e.

Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-06 Thread Egmont Koblinger via Unicode

Hi,

I was loose with my terminology once again, which is not a wise thing
when you're trying to clarify misunderstandings :)

> But once you have
> decided on a direction, each _line_ within that data is passed
> separately to the BiDi algorithm to get reshuffled; this is what Emacs
> does, this is what my specification says, and this is the right thing.
> That is, for this step, the definition of "paragraph", as the BiDi
> algorithm uses this term, is a line of the text file.

I keep thinking of the BiDi algorithm as one that takes a single
paragraph, because that's how I use it in VTE. But in fact, the BiDi
algorithm starts by splitting into paragraphs. I keep forgetting about
this outermost "for loop" of the BiDi algo.

And with that proper definition, you can of course pass the entire
emptyline-delimited segment into the BiDi algorithm in a single step.
In its first phase, the BiDi algorithm will split it at newlines,
because for the BiDi algorithm (but not when detecting the paragraph
direction in Emacs), newline is the paragraph delimiter. Then it will
execute the rest of the algorithm for each paragraph (that is: line)
separately.

This is exactly the same as splitting manually, and then for each line
invoking the BiDi algorithm.


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-06 Thread Egmont Koblinger via Unicode

Hi Philippe,

Thanks a lot for your input!

Another fundamental difficulty with terminal emulators is: These
controls (CR, LF...) are control instructions that move the cursor in
some ways, and then are forgotten. You cannot do BiDi on the
instructions the terminal receives. You can only do BiDi on the
result, the contents of the canvas after these instructions are
executed. Here these controls are either lost, or you have to give a
specification how exactly they need to be remembered, i.e. converted
to being part of the canvas's data.

Let's also mention that trying to get apps into using them is quite
hopeless. The best you can do is design BiDi around what you already
have, which pretty much means hard vs. soft line endings, and
hopefully forthcoming semantical marks around shell prompts. (To
overcomplicate the story, a received LF doesn't convert the line
ending to hard wrapped in most terminal emulators. In some it does. I
don't think there's an exact specification anywhere. Maybe the BiDi
spec needs to create one. Lines are hard wrapped by default, turned to
soft wrapped when the text gets wrapped at the end of the line, and a
few random control functions turn them back to hard ones, but in most
terminals, a newline is not such a control function.)
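
A toy model of that bookkeeping, as a Python sketch. Everything here
(the class name, treating LF as "go to the start of the next row", the
append-only canvas without a cursor column) is a simplification I'm
assuming for illustration; no real terminal is this simple.

```python
class ToyCanvas:
    """Append-only canvas that remembers, per row, whether it was soft wrapped."""

    def __init__(self, width):
        self.width = width
        self.rows = [""]
        self.soft = [False]   # rows are hard wrapped by default
        self.row = 0          # cursor row

    def _next_row(self):
        self.row += 1
        if self.row == len(self.rows):
            self.rows.append("")
            self.soft.append(False)

    def feed(self, data):
        for ch in data:
            if ch == "\n":
                # The newline only moves the cursor; it is not stored anywhere
                # and, as in most terminals, it does not touch the wrap flag.
                self._next_row()
            else:
                if len(self.rows[self.row]) == self.width:
                    self.soft[self.row] = True   # text overflowed: soft wrap
                    self._next_row()
                self.rows[self.row] += ch
```

After feeding "abcdefghij\nxy" to a 4-column canvas, the only trace of
the line structure is the list of soft flags; the LF itself is gone.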

Anyway, please also see my previous email; I hope that clarifies a lot
for you, too.


cheers,
egmont

On Tue, Feb 5, 2019 at 5:53 PM Philippe Verdy via Unicode
 wrote:
>
> I think that before making any decision we must make some decision about what 
> we mean by "newlines". There are in fact 3 different functions:
> - (1) soft line breaks (which are used to enforce a maximum display width 
> between paragraph margins): these are equivalent to breakable and 
> compressible whitespaces, and do not change the logical paragraph direction, 
> they don't insert any additional vertical gap between lines, so the logical 
> line-height is preserved and continues uninterrupted. If text justification 
> applies, this whitespace will be entirely collapsed into the end margin, and 
> any text before it will still be justified to match the end margin (until 
> the maximum expansion of other whitespaces in the middle is reached, and the 
> maximum intercharacter gap is also reached (in which case, that line will no 
> longer be expanded more), but this does not apply to terminal emulators that 
> normally never use text justification, so the text will just be aligned to 
> the start margin and whitespaces before it on the same line are preserved, 
> and collapsed only at end of the line (just before the soft line break itself)
> - (2) hard line breaks: they break to a new line but continue the paragraph 
> within its same logical direction, but they are not compressible whitespaces 
> (and do not depend on the logical end margin of the paragraph).
> - (3) paragraph breaks: generally they introduce an additional vertical gap 
> with top and bottom margins
>
> The problem in terminals is that they usually cannot distinguish types (1) 
> and (2), they are simply encoded by a single CR, or LF, or CR+LF, or NEL. 
> Type (1) is only existing within the framework of a higher level protocol 
> which gives additional interpretation to these "newlines". The special 
> control LS is almost never used but may be used for type (1) i.e. soft 
> line-breaks, and will fallback to type (2) which is represented by the legacy 
> "simple" newlines (single CR, or single LF, or single CR+LF, or single NEL). 
> I have seen very little or no use of the LS (line separator) special control.
>
> Type (3) may be encoded with PS (paragraph separator), but in terminals (and 
> common protocols like MIME) it is usually encoded using a couple of newlines 
> (CR+CR, or LF+LF, or CR+LF+CR+LF, or NL+NL) possibly with additional 
> whitespaces (and additional presentation characters such as ">" in quotations 
> inserted in mail responses) between them (needed for MIME and HTTP) which may 
> be collapsed when rendering or interpreting them.
>
> Some terminal protocols can also use other legacy ASCII separators such as 
> FS, GS, RS, US for grouping units containing multiple paragraphs, or STX/EOT 
> pairs for encapsulating whole text documents in a protocol-specific 
> envelope format (and will also use some escaping mechanism for special 
> controls found in the middle, such as DLE+control to escape the control, or 
> DLE+0 to escape a NUL, or DLE+# to escape a DEL, or DEL+x+NN where N are a 
> fixed number of hexadecimal, decimal or octal digits. There's a wide variety 
> of escaping mechanisms used by various higher-layer protocols (including 
> transport protocols or encoding syntaxes used just below the plain-text 
> layer, in a lower layer than the transport protocol layer).
>
> Le lun. 4 févr. 2019 à 21:46, Eli Zaretskii via Unicode  
> a écrit :
>>
>> > Date: Mon, 4 Feb 2019 19:45:13 +
>> > From: Richard Wordingham via Unicode 
>> >
>> > Yes.  If one has a text composed of LTR and RTL

Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators

2019-02-06 Thread Egmont Koblinger via Unicode

Hi Eli,

(I'm getting lost where to reply, and how the subject gets mangled and
the thread split into different ones.)


I've thought about it a lot, experimented with Emacs's behavior, and
I've arrived at the conclusion that we are actually much closer to
each other than I had thought. Probably there's a lot of
misunderstanding due to different terminology we used.

I've set my terminal to RTL paragraph direction (via the relevant
escape sequence), then did a "cat TUTORIAL.he" (the file taken from
26.1), and compared to what I see in Emacs 25.2.2 – both the graphical
one, and the one running in a terminal of no BiDi.

Apart from a few minor irrelevant differences, they look the same! Hooray!!!

(The differences are:

- I had to slightly modify TUTORIAL.he to make sure none of the lines
start with a BiDi control (I added a preceding character) because
currently VTE doesn't support them, there's no character cell to store
this data. This definitely needs to be fixed in the second version of
my proposal.

- Emacs running in a terminal shows an underscore wherever there's a
BiDi control in the source file – while the graphical one doesn't.
This looks like a simple bug to me, right?

- Line 1007, the copyright line of this file uses visual indentation,
and Emacs detects LTR paragraph for that line. I think it should
rather use BiDi controls to have an overall RTL paragraph direction
detected, and within that BiDi controls to force LTR for the text. The
terminal shows it with RTL direction, as I manually set it.

Again, all these three details are irrelevant to my point, namely that
in WIP gnome-terminal it looks the same as in Emacs.)


You define paragraphs as emptyline-separated blocks on which you
perform autodetection of the paragraph direction. This is great! As
I've mentioned, I'd love to have such a mode in terminals, but it's
subject to underlying improvements, like knowing when a prompt starts
and ends, because prompts also have to be paragraph delimiters. You
convinced me that it's much more important than I thought, thanks a
lot for that! I will try to see if I can push for addressing the
prerequisite issues sooner. Indeed I had to manually set RTL paragraph
direction; with manual LTR or with per-line autodetection (as VTE can
do now) the result would be much worse.


Here's how the story continues from here. Here is where we
misunderstood each other (or at the very least I misunderstood you),
although we are talking about the same, doing things the same way:

The BiDi algorithm takes a paragraph of text at a time, and somehow
reshuffles its letters. UAX#9 section 3 starts by saying that the
first main phase is separation into "paragraphs". What are those
"paragraphs" that we're talking about _now_?

The thing is, both in Emacs as well as in my specification, it's a
logical line of the text (that is: delimited by single newlines). No,
in these steps, when UBA is run, the paragraph is no longer defined as
emptyline-delimited segments, it's defined as lines of the text.

To recap: The _paragraph direction_ is determined in Emacs for
emptyline-delimited segments of data, which I honestly find a great
thing, and would love to do in terminals too, alas at this point it's
blocked by some really nontrivial technical issues. But once you have
decided on a direction, each _line_ within that data is passed
separately to the BiDi algorithm to get reshuffled; this is what Emacs
does, this is what my specification says, and this is the right thing.
That is, for this step, the definition of "paragraph", as the BiDi
algorithm uses this term, is a line of the text file. This is where I
thought we had a disagreement, but we don't, we just misunderstood
each other.

-

On a nitpicking side note:

It's damn ugly not to terminate a text file with a newline. Newline is
much better thought of as a "terminator" than a "delimiter". For example,
if you do a "cat file1 file2", you expect file2 to start on its own
line.

Shouldn't this apply to paragraphs, too, especially when BiDi is in
the game? I'd argue that an empty line (double newline) shouldn't be a
delimiter, it should be a terminator for a paragraph. I think "cat
file1 file2" should make sure that the last paragraph of file1 and the
first paragraph of file2 are printed as separate paragraphs
(potentially with different paragraph direction), shouldn't it? I'd
argue that if a text file is formatted like TUTORIAL.he, with empty
lines denoting paragraph boundaries, then it should also end in an
empty line (that is: two newline characters).

-

Feel free to skip the rest :)

Let's make a thought experiment. Let's assume that for running the
BiDi algorithm, we'd still stick to the emptyline-delimited paragraph
definition. This is not what you do, this is not what I do, but I
misunderstood that this is what you did, and I also thought this was a
good idea as a potential extension for the BiDi specs – I no longer
think so. This definition is truly problematic, as I'll

Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators)

2019-02-04 Thread Egmont Koblinger via Unicode

Hi Eli,

> IME, this is a grave mistake.  I hope I explained why; it is now up to
> you to decide what to do about that.

Let me share one more thought.

I have to admit, I'm not an Emacs user, I only have some vague ideas
how powerful a tool it is. But in its very core I still believe it's a
text editor – is it fair to say this? It could be used for example to
conveniently create TUTORIAL.he.

I'm not aware of all the kinds of works you can do in Emacs, but I
have a feeling that the kind of work you do in a terminal emulator is
potentially more diverse. (Let's not nitpick that a terminal can run
emacs and emacs has a terminal inside so mathematically speaking it's
all the same...)

"cat TUTORIAL.he" is indeed one of the commands you can execute in a
terminal, and unfortunately, given what terminals currently understand
from their contents, I just cannot make it display as you would prefer
(and I agree would make a lot of sense). But it's just one use case.

There are plenty of line-oriented tools.

Think of "head" and "tail". They operate on lines of files, which end
up being paragraphs in the terminal according to my definition.
According to your definition, they could cut a paragraph in half, they
could render differently than as if the entire file was printed.
According to my definition, you'll always get the same visual
representation, just on the given fragment of the file.
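
Here is a small Python sketch of why this matters for "tail". The helper
names are hypothetical and the direction detection is a simplified
first-strong check that I'm assuming only for the sake of the example.
With a per-line unit a line gets the same treatment whether or not the
lines before it are present; with an emptyline-delimited unit it may not.

```python
import unicodedata

def first_strong(text, default="LTR"):
    """Simplified first-strong detection of the paragraph direction."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LTR"
        if cls in ("R", "AL"):
            return "RTL"
    return default

def per_line(lines):
    """Each line is judged on its own."""
    return [(l, first_strong(l)) for l in lines if l.strip()]

def per_block(lines):
    """Each line inherits the direction of its emptyline-delimited block."""
    out, block = [], []
    for line in lines + [""]:          # the extra "" flushes the last block
        if line.strip():
            block.append(line)
        else:
            d = first_strong("\n".join(block))
            out.extend((l, d) for l in block)
            block = []
    return out

# A two-line block whose first line starts in Latin and whose second line
# starts in Hebrew, followed by an empty line and another block.
FULL = ["abc \u05d0\u05d1", "\u05d0\u05d1 abc", "", "xyz"]
TAIL = FULL[1:]                        # what "tail -n +2" would print
```

Here per_line() reports RTL for the Hebrew-initial line in both FULL and
TAIL, while per_block() reports LTR for it in FULL and RTL in TAIL.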

Think of "grep", possibly combined with "-r" to process files
recursively, and "-C" to print context lines. Not only can it cut
paragraphs (of your definition) in half when it displays the matching
line (plus context), but also how would you locate in its output when
it switches from one match's context to the next match's context
within the same file, or to a match in another file? How would you
define a paragraph, and how would you define the bigger unit on which
the paragraph direction is guessed? I think it's again a use case
where my definition of paragraph is less problematic than yours.

Think of ad-hoc shell scripts that use "echo"/"printf" to inform the
user, "read" to read data etc. Or utilities written in C or whatever
that don't care about terminals at all, just print output. In these
cases there's no one formatting / wrapping at 80 columns performed by
the app. A logical segment is typically printed as a single line,
which will be wrapped by the terminal if it doesn't fit in the current
width (and in some terminals rewrapped when the terminal is resized),
this matches my definition of paragraph. There's rarely an empty line
injected in these cases; if there is, it is most likely to separate
some even bigger semantical units.

There are just sooo many use cases, it's impossible to perfectly
address all of them at once. "cat TUTORIAL.he" is just one of them,
not necessarily the most typical, not necessarily the one that should
drive the BiDi design.

Let's note that the four "BiDi-aware" terminals that I could test all
define paragraphs as lines – I mean visual lines on their own canvas.
If the terminal is 80 characters wide, and a utility prints a line of
100 characters, it'll obviously wrap into 80+20 characters. And then
these terminals treat them as two separate paragraphs, one with 80
characters and one with 20, and run BiDi separately on them. I'm
confident that my specification which says that it should be preserved
as a 100 character long paragraph and passed to BiDi accordingly is
already a significant step forward.
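
The 80+20 example as a Python sketch. It assumes one character per cell
and a list of per-row soft-wrap flags; both the function names and that
data layout are mine, chosen for illustration only.

```python
def wrap(logical_line, width):
    """Split one logical line into physical rows, as a wrapping terminal does."""
    rows = [logical_line[i:i + width] for i in range(0, len(logical_line), width)]
    return rows or [""]

def bidi_paragraphs(rows, soft_wrapped):
    """Rejoin physical rows into the units that should be passed to BiDi.
    soft_wrapped[i] is True when row i continues on row i + 1."""
    paragraphs, current = [], ""
    for row, soft in zip(rows, soft_wrapped):
        current += row
        if not soft:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs
```

With the flags [True, False] the two rows come back as a single
100-character paragraph; with [False, False] (that is, ignoring the
wrapping, as those terminals effectively do) they are treated as two
unrelated paragraphs of 80 and 20 characters.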


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-04 Thread Egmont Koblinger via Unicode

Hi Eli,

> I think it's unreasonable and impractical to expect 'echo', 'cat', and
> its ilk to emit bidi controls (or any other controls) to force
> paragraph direction.  For starters, they won't know what direction to
> force, because they don't understand the text they are processing.

I agree, it is unreasonable for 'echo', 'cat' etc. to emit BiDi controls.

There could be some higher level helper utilities though, let's say a
"bidi-cat" that examines the file, makes a guess, emits the
corresponding escape sequences and cats the file. It's not necessarily
a good approach, but a possible one (at least temporarily until
terminals implement a better one).
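
A rough Python sketch of what such a helper could look like. The two
escape sequences are deliberately left as placeholders (they are NOT
real sequences; substitute whatever the terminal's BiDi specification
defines), and the majority vote over strong characters is just one
possible guess, assumed here for illustration.

```python
import unicodedata

# Placeholders only, not real escape sequences.
SET_RTL = "<set-rtl-paragraph-direction>"
SET_LTR = "<set-ltr-paragraph-direction>"

def guess_direction(text):
    """Majority vote over strong characters; one heuristic among many."""
    rtl = sum(unicodedata.bidirectional(c) in ("R", "AL") for c in text)
    ltr = sum(unicodedata.bidirectional(c) == "L" for c in text)
    return "RTL" if rtl > ltr else "LTR"

def render(text):
    """Return the text prefixed by the sequence selecting the guessed direction."""
    prefix = SET_RTL if guess_direction(text) == "RTL" else SET_LTR
    return prefix + text
```

A real tool would write render(contents) to its standard output and
would probably want to restore the previous direction afterwards.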

On the other hand, it's not unreasonable for higher level stuff (e.g.
shell scripts, or tools like "zip") to use such control characters.

> No, this simple case must work reasonably well with the application
> _completely_ oblivious to the bidi aspects.  If this can't work
> reasonably well, I submit that the entire concept of having a
> bidi-aware terminal emulator doesn't "hold water".

There isn't a magic wand. I can't magically fix every BiDi stuff by
changing the terminal emulator's source code. Not because I'm clumsy,
but because it just can't be done. If it was possible, I wouldn't have
written a long specification, I would have just done it. (Actually, if
it was possible, others would have sure done it long before I joined
terminal emulator development.)

There need to be multiple modes, some of them due to the technical
particularities of terminal emulation that aren't seen elsewhere (e.g.
explicit vs. implicit), and some of them because they are present
everywhere where it comes to BiDi (e.g. paragraph direction). And if
the mode is not set correctly, things might break, there's nothing new
in it.

What my specification essentially modifies is that with this
specification, you at least will have a chance to get the mode right.

Currently there are perhaps like 4 different behaviors implemented
across terminal emulators when it comes to BiDi. An application cannot
control and cannot query the behavior. In order to get Emacs behave
properly, you have to ask your users to adjust a setting (and I cannot
repeat enough times that I find this an unacceptable user experience).
If the settings of the terminal aren't what Emacs expects, the result
could be broken (RTL words might even show up in reverse, LTR order).

The same goes for the random example of "zip -h", assuming that they
add Hebrew translation. Given the current set of popular terminal
emulators, there's no way zip could emit some Hebrew text in a
reliably readable way. Whatever it does, there will be terminal
emulators (and settings thereof) where the result is totally broken
(reversed), or at least unpleasant (wrong paragraph direction used).
Moreover, if "zip" emits the Hebrew text in the semantically correct
logical order (e.g. they use whatever existing framework, like gettext
and a popular .po editor), as opposed to the visual LTR order seen in
some legacy systems, it will need different terminal emulator settings
than Emacs, so if someone uses both zip and Emacs regularly, they'll
have to continuously toggle their terminal's settings back and forth –
have I mentioned how unacceptable I find this as a user? :)

One of the key points of my specification is that applications will be
able to automatically set the mode. Emacs will be able to switch to
the mode it requires, and so will be zip. They will have the
opportunity.

If they don't take this opportunity, it's not my problem, and
there's nothing I could do about it. Let's say hypothetically that zip
adds Hebrew translations, but refuses to emit the escape sequence that
switches to RTL paragraph direction, and thus its result doesn't look
perfect. Can terminal emulators, can my specification, can I be
blamed in this case? I don't think so. If zip knows exactly what it
wants to print (as with the help page it knows for sure), and is given
all the technical infrastructure to reliably achieve that, it'd be
solely them to blame if they refused to properly use it. It's
absolutely out of the scope of my work to try to fix this case.

"cat" is substantially different. In case of "zip", the creators of
that software know exactly what the output should look like, and
according to my specification (assuming a conforming terminal
emulator, of course) nothing stops them from achieving it. "cat"
doesn't know, cannot know the desired look, since the file itself
lacks this information.

Paragraph direction is a concept that sucks big time. (I have no idea
how Unicode could have got it better, though.) It's a piece of
information that needs to be carried externally along with the text,
in order to make sure it'll be displayed correctly. It's a pain in the
butt, just as much as carrying the encoding in the pre-Unicode days was,
and hardly anyone cared about, resulting in incorrect accented letters
way too often. Practically everyone's lazy and

Re: Bidi paragraph direction in terminal emulators BiDi in terminal emulators)

2019-02-04 Thread Egmont Koblinger via Unicode

Hi Eli,

> Actually, UAX#9 defines "paragraph" as the chunk of text delimited by
> paragraph separator characters.  This means characters whose bidi
> category is B, which includes Newline, the CR-LF pair on Windows,
> U+0085 NEL, and U+2029 PARAGRAPH SEPARATOR.

Indeed, this was an oversight on my side. So, with this definition,
every single newline character starts a new paragraph. The result of
printf "Hello\nWorld\n" > world.txt
is a text file consisting of two paragraphs, with 5 characters in each. Correct?
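
This can be checked with a few lines of Python. The sketch below splits
at every character whose bidi class is B, treating CR LF as a single
separator. It is a simplification: the function name is made up, and
the separators are dropped here rather than kept with the preceding
paragraph.

```python
import unicodedata

def uba_paragraphs(text):
    """Split text at paragraph separators (bidi class B)."""
    paragraphs, current = [], ""
    i = 0
    while i < len(text):
        ch = text[i]
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append(current)
            current = ""
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1      # CR LF counts as one separator
        else:
            current += ch
        i += 1
    if current:
        paragraphs.append(current)
    return paragraphs
```

For "Hello\nWorld\n" this yields two paragraphs of 5 characters each. An
empty line, as in "a\n\nb", simply becomes an empty paragraph of its
own; it has no special meaning at this level.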

> Actually, Emacs implements the rule that paragraphs are separated by
> empty lines.  This is documented in the Emacs manuals.

That is, Emacs overrides UAX#9 and comes up with a different
definition? Furthermore, you argue that in terminals I should follow
Emacs's definition rather than Unicode's? Or please clarify if I
misunderstood you here.

> > while Emacs itself is a viewer that treats runs between single
> > newlines as paragraphs. That is, Emacs is inconsistent with itself.
>
> Incorrect.  Emacs always treats a run of text between empty lines as a
> single paragraph, in TUTORIAL.he and everywhere else.  There's nothing
> special about TUTORIAL.he, it is just a plain text file with a few
> dozen of bidi formatting controls, needed to show the key sequences
> with weak and neutral characters in correct visual order.  [...]

Thanks for the clarification, I believe it's clear to me now.

> At least with Emacs, it is not the same.  I think considering each
> line as a separate paragraph makes writing bidi plain-text documents
> that look right almost impossible, if each line ends in a newline [...]

> My personal recommendation is to adopt the empty line rule.  It's
> simple enough and gives good results IME. [...]

> I'm surprised that you describe this as such a complex problem.  I
> think you explained up-thread that terminal emulators should cope with
> lines of text arriving piecemeal, which I interpreted as meaning that
> text is stored in the emulator's memory.  Modern emulators running on
> windowed desktops also provide scroll-back buffers, and react to
> expose events.  So I think the text that is currently in the viewport,
> and also some text previously shown, are stored in memory, and can be
> consulted.

The problem is not the memory management.

Let's look at the following session:

---snip---
prompt$ cat file1.txt
This is the
first human-perceived paragraph.

And this is the
second.
prompt$ cat file2.txt
Here this is the
third paragraph.

And this one is
the fourth.
prompt$
---snip---

If you load the files to Emacs, it is perfectly aware of the contents
of the two files. It can define paragraphs however it wants to, and
BiDi the files accordingly.

The terminal emulator doesn't know what's a shell prompt, what's a
command that the user types, what's the output of that command. (You
don't know either from this snippet. Maybe I only cat'ed file1.txt,
and "prompt$ cat file2.txt" is just the sixth line of this eleven-line
file.)

In the terminal emulator's eyes, with Emacs's definition (empty line
delimited), this is one paragraph:

prompt$ cat file1.txt
This is the
first human-perceived paragraph.

and this is another paragraph:

And this is the
second.
prompt$ cat file2.txt
Here this is the
third paragraph.

and similarly for the third one.

I believe I understand your concerns with the per-line paragraph
definition, but this interpretation that I've just shown most likely
leads to even more broken behavior.
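
The same thing as a Python sketch: the session above is put into a
string and split at empty lines, which is the only boundary the
terminal could act on. The function name is made up for illustration.

```python
SESSION = """prompt$ cat file1.txt
This is the
first human-perceived paragraph.

And this is the
second.
prompt$ cat file2.txt
Here this is the
third paragraph.

And this one is
the fourth.
prompt$"""

def emptyline_paragraphs(text):
    """Split the canvas contents into emptyline-delimited blocks of lines."""
    paragraphs, block = [], []
    for line in text.split("\n") + [""]:   # the extra "" flushes the last block
        if line.strip():
            block.append(line)
        elif block:
            paragraphs.append(block)
            block = []
    return paragraphs
```

The result has three blocks, and the middle one glues the end of
file1.txt, a prompt with a typed command, and the beginning of file2.txt
into a single "paragraph".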

It's a really nontrivial technical problem to let the terminal
emulator know where each prompt, and/or each command's output begins
and ends. There's work going on for letting the terminal emulator
recognize the prompts, but even if it's successful, it'll probably
take 5-10 years to reach the majority of the users. And it probably
still wouldn't solve the case of knowing the boundary between the two
outputs if a "cat file1.txt; cat file2.txt" is executed, let alone if
they're concatenated with "cat file1.txt file2.txt".

So, what you're arguing for, is that the default behavior should be
something that's:
- currently not implementable in a semantically correct way (to stop
around shell prompts) due to technical limitations, and
- isn't what Unicode says.

You have not convinced me that the pros outweigh the cons. That being
said, I'm more than open to see such a behavior as a future extension,
subject of course to the semantic prompt stuff being available.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-04 Thread Egmont Koblinger via Unicode

Hi,

> To me, 'visual order' means in the dominant order of the script.

This is not a definition I've come across anywhere else, nor does it
match my intuition of "visual order": the exact visual order (recursive
definition, yay!) in which you see the glyphs displayed in the
row.

> So,
> if one takes it as natural that a decimal number starts with the most
> significant digits, the decimal numbers used with Arabic are *not*
> stored in visual order if considered as part of that script.

The visual order is: You get the string rendered properly. You scan
with your eyes in one strict direction, and take note of what you see
in that order.

For example, let's say: "Hello Shalom" (the latter word in Hebrew):

HELLO שָׁלוֹם

The logical order:
H
E
L
L
O
space
שָׁ
ל
וֹ
ם

The visual order, from left to right is:
H
E
L
L
O
space
ם
וֹ
ל
שָׁ

Similarly, the visual order from right to left (a much more rarely
seen concept, the exact reverse of the visual LTR order) is:
שָׁ
ל
וֹ
ם
space
O
L
L
E
H

"Visual order" most of the time means "visual left to right order",
although strictly speaking, "visual right to left order" is just as
much a visual order. This is all independent from the script's
dominant order.
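
The two visual orders above can be sketched in a few lines. This is only an illustration, assuming an LTR paragraph, a single level of RTL runs, and input that is already split into units (a base letter plus its marks) exactly as in the lists above. The name visual_ltr is made up, and this is not the full Unicode BiDi algorithm.

```python
import unicodedata


def visual_ltr(units):
    """Return the units in visual left-to-right order.

    Toy rule: every maximal run of units whose base character is strongly
    RTL (bidi class R or AL) is reversed; everything else stays in place.
    """
    def is_rtl(unit):
        return unicodedata.bidirectional(unit[0]) in ("R", "AL")

    out, i = [], 0
    while i < len(units):
        if is_rtl(units[i]):
            j = i
            while j < len(units) and is_rtl(units[j]):
                j += 1
            out.extend(reversed(units[i:j]))  # flip the whole RTL run
            i = j
        else:
            out.append(units[i])
            i += 1
    return out


# Logical order: H E L L O, space, then the four Hebrew units of "shalom".
logical = list("HELLO") + [" ", "\u05e9\u05b8\u05c1", "\u05dc", "\u05d5\u05b9", "\u05dd"]
visual_left_to_right = visual_ltr(logical)
visual_rtl = list(reversed(visual_left_to_right))  # the exact mirror image
```

Both results are "visual orders"; they differ only in the direction in which the row is scanned.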

> "In combination with the following rule, this means that trailing
> whitespace will appear at the visual end of the line (in the paragraph
> direction)."
>
> The 'visual end' is clearly not always the right-hand end!

Yes, that's right. (And it doesn't contradict the definition of
"visual order". For RTL paragraphs, those trailing whitespaces appear
at the beginning of the "visual LTR order").


e.

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-04 Thread Egmont Koblinger via Unicode

> > Yes.  If one has a text composed of LTR and RTL paragraphs, one has to
> > choose how far apart their starting margins are.  I think that could
> > get complicated for plain text if the terminal has unbounded width.
>
> But no real-life terminal does.  The width is always bounded.

Allegedly the no longer maintained FinalTerm, and maybe another one or
two not so popular terminal emulators experimented with this.

VTE and a few other emulators have also received such a feature
request; VTE has rejected it. See
https://bugzilla.gnome.org/show_bug.cgi?id=769440 if you're curious.

Indeed BiDi becomes problematic in the sense that Richard pointed out:
how far should the starting margins be from each other? By terminal
emulators rejecting the idea of unbounded width, this is not a problem
for them.

It might still be a problem for BiDi aware text viewers/editors,
though. I mean one possible, obvious approach could be to adjust them
according to the terminal's width. Another is to take it from the
file's contents (e.g. longest line). But maybe there's demand for
other options, e.g. to have those margins 80 characters away from each
other even when the file is viewed on a mobile phone where the
viewport is narrower and the user wishes to scroll horizontally. This
is up for text viewers/editors to decide.


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-04 Thread Egmont Koblinger via Unicode

Hi Richard,

> That split is wrong if you want the non-HTML text to lay out reasonably
> well in anything but a higher order protocol forcing RTL.  You need to
> split it as:
>
> lorem ipsum ABC
> <[ DEF foobar

Okay, so you should use LRMs or other similar tricks when wrapping a
human-perceived paragraph of text.

I take it as:

- The expected definition of "paragraph", for the technical sake of
running the BiDi algorithm, is lines of the text file (that is,
between a newline and the next one).

- On top of this technical definition, the document is crafted so that
lines are not longer than a certain threshold, and the human-perceived
paragraphs are usually delimited by empty lines (sometimes by other
means, like bullets of a list).

Sounds like a reasonable approach to me, probably the best to have.
And, by the way, it aligns with my BiDi proposal if the higher level
protocol (escape sequences) sets the paragraph direction correctly and
disables autodetection.


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-04 Thread Egmont Koblinger via Unicode

Hi Richard,

> The concept appears to exist in the form of the fields of the
> fifth edition of ECMA-48.  Have you digested this ambitious standard?

To be honest: No, I haven't. And I have no idea what those "fields" are.

I spent (read: wasted) way too much time studying ECMA TR/53 to get to
understand what it's talking about, to realize that the good parts
were already obvious to me, and to be able to argue why I firmly
believe that the bad parts are bad. Remember: These documents were
created in 1991, that is, 28 years ago. (I'm emphasizing it because I
did the math wrong for a long time, I thought it was 18 years ago :-D.)
Things have changed a lot since then.

As for the BiDi docs, I found that the current state of the art,
current best practices, and the existing BiDi algorithm differ so much
from ECMA's approach (which no one I'm aware of cared to implement for 28
years) that the standard is of pretty little use. Only a few good
parts could be kept (but needed tiny corrections), and plenty of other
things needed to be built up anew. This is the only reasonable way to
move forward.

If you designed a house 2 or 3 years ago, and finally have the money
to get it built, you can reasonably start building it. If you designed
a house 28 years ago and finally have the chance to build it
(including the exact same heating technologies, electrical system
etc.), you wouldn't, would you? I'm sure you looked at those plans,
and started at the very least heavily updating them, or started to
design a brand new one, perhaps somewhat based on your old ideas.

I don't expect it to be any different with "fields" of ECMA-48. I'm
not aware of any terminal emulator implementing anything like them,
whatever they are. Probably there's a good reason for that. Whatever
purpose they aimed to serve apparently wasn't important enough for
such a long time. By now, if they're found important, they should
probably be solved by some new design (or at the very least, just like
I did with TR/53, the work should begin by evaluating that standard to
see if it's still feasible).

Instead of spending a huge amount of work on my BiDi proposal, I could
have just said: "guys, let's go with ECMA for BiDi handling". The
thing is, I'm pretty sure it wouldn't have taken us anywhere. I don't
expect it to be different with "fields" either.

The starting point for my work was the current state of terminal
emulators and the surrounding ecosystem, plus the current BiDi
algorithm; not some ancient plan that was buried deep in some drawer
for almost three decades. I hope this makes sense.

That being said, I'd really, honestly love to see if someone evaluated
ECMA's "fields" and created a feasibility study for current terminal
emulators, similarly to how I did it with TR/53.


cheers,
egmont

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-03 Thread Egmont Koblinger via Unicode

Hi Eli,

(I'm responding in multiple emails.)


The Unicode BiDi algorithm states that it operates on paragraphs of
text, and leaves it up to a higher protocol to define what a paragraph
exactly is.

What's the definition of "paragraph" in the context of plain text files?

I don't think there's a single well-established practice. In some
particular text files, every explicit newline character starts a new
paragraph. In some (e.g. COPYING.GPL and friends), an empty line (that
is: two consecutive newline characters) separates two paragraphs. In
some, e.g. in Emacs's TUTORIAL.he, or markdown files, it's way more
complicated, probably there isn't a well-defined grammar for how
exactly bullet list entries and the like should become new paragraphs. In
the output of "dpkg -s packagename" consecutive lines indented by 1
space – except for those where there's only a single dot after the
space – form the human-perceived paragraphs. There are surely several
other syntaxes out there.
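
To make the difference concrete, here is a rough sketch of three of these paragraph definitions as small functions. The function names are invented for illustration, and the dpkg-style rule is only the simplified one described above (lines indented by one space, with a lone dot as separator).

```python
import re


def per_newline(text):
    """Every explicit newline starts a new paragraph."""
    return [line for line in text.split("\n") if line]


def per_empty_line(text):
    """An empty line separates paragraphs; a single newline acts as a space."""
    blocks = re.split(r"\n{2,}", text)
    return [" ".join(block.split("\n")) for block in blocks if block]


def dpkg_style(text):
    """Lines indented by one space are joined; a line ' .' separates paragraphs."""
    paragraphs, current = [], []
    for line in text.split("\n"):
        if line == " .":
            if current:
                paragraphs.append(" ".join(current))
            current = []
        elif line.startswith(" "):
            current.append(line[1:])
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


plain = "one\ntwo\n\nthree"
print(per_newline(plain))     # three paragraphs
print(per_empty_line(plain))  # two paragraphs
print(dpkg_style(" First sentence of the\n first paragraph.\n .\n Second paragraph."))
```

The same bytes yield a different number of paragraphs depending on which function the viewer happens to implement.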

If the producer of a text file uses a different definition than the
viewer software, bugs can arise. I think this should be intuitively
obvious, but just in case, let me give a concrete example. In this
example I'll assume LTR paragraph direction set up by some external
means; with autodetected paragraph direction it's much easier to come
up with such breakages.


I wish to store and deliver the following text, as it's laid out here
in logical order. That is, the order as the bytes appear in the text
file, as I typed them from the keyboard, is laid out here strictly
from left to right, with uppercase standing for RTL letters, and no
mirroring:

lorem ipsum ABC <[ DEF foobar

The visual representation, what I expect to see in any decent viewer
software according to the BiDi algorithm, is this:

lorem ipsum FED ]> CBA foobar

The visual representation, in a narrower viewport, might wrap for
example like this:

lorem ipsum CBA
FED ]> foobar

which is still correct, given that logical "ABC <[ DEF" is a single
RTL run. (This assumes a viewer which, unlike Emacs, follows the
Unicode BiDi algorithm for wrapping a paragraph into multiple lines.)


Let's assume that I, as the producer of the text file, wish to create
a typical README in the spirit of COPYING.GPL and similar text files,
with the paragraph definition that two consecutive newline characters
(that is: a single empty line) delimit paragraphs; and a single
newline is equivalent to a space. Since I'd prefer to keep a margin of
16 characters in the source file (for demo purposes), I can take the
liberty of replacing the space after "ABC" by a single newline. (Maybe
my text editor does this automatically.) The file's contents, again
the logical order laid out from left to right, top to bottom, becomes
this:

lorem ipsum ABC
<[ DEF foobar

This file, according to the paragraph definition chosen earlier, is
equivalent to the unwrapped version shown before, and thus should
convey the same message.

If I view this file in a piece of software which uses the same
paragraph definition for BiDi purposes, the contents will appear as
expected. An example for such a viewer is a markdown converter's (that
leaves single newlines as-is, and adds a "" at double newlines)
output viewed as an html file in a browser.


Here comes the twist. Let's view this latter file with a viewer that
uses a _different_ definition for paragraph. Let's view it in Gedit,
Emacs, or the work-in-progress BiDi-aware VTE by "cat"ing it, where
every newline begins a new paragraph – that's how these viewers define
the notion of "paragraph" for the sake of BiDi.

The visual layout in these viewers becomes:

lorem ipsum CBA
<[ FED foobar

which is just not correct. Since here BiDi is run on the two lines
separately, the initial "<[" is treated as LTR, placed at the wrong
location in the wrong order, and the glyphs aren't mirrored.
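
The three layouts discussed in this example can be reproduced with a toy model. This is only a sketch under the same conventions as the example (uppercase stands for RTL letters, LTR paragraph direction) and with heavily simplified rules: a neutral character becomes RTL only if the nearest strong character on both sides is RTL. The function names are made up; a real viewer would run the full UAX9 algorithm.

```python
# Bracket pairs that get mirrored inside an RTL run.
MIRROR = str.maketrans("<>[](){}", "><][)(}{")


def toy_levels(text):
    """Resolve level 0 (LTR) or 1 (RTL) per character, LTR paragraph assumed."""
    kinds = ["R" if c.isupper() else "L" if c.islower() else "N" for c in text]
    levels = []
    for i, kind in enumerate(kinds):
        if kind == "N":
            # Nearest strong character on each side; the edges count as L.
            left = next((k for k in reversed(kinds[:i]) if k != "N"), "L")
            right = next((k for k in kinds[i + 1:] if k != "N"), "L")
            kind = "R" if left == right == "R" else "L"
        levels.append(1 if kind == "R" else 0)
    return levels


def reorder_line(chars, levels):
    """Reverse and mirror each level-1 run of one line."""
    out, i = [], 0
    while i < len(chars):
        j = i
        while j < len(chars) and levels[j] == levels[i]:
            j += 1
        run = chars[i:j]
        out.append(run[::-1].translate(MIRROR) if levels[i] == 1 else run)
        i = j
    return "".join(out)


text = "lorem ipsum ABC <[ DEF foobar"
levels = toy_levels(text)

# 1. One paragraph, one line.
unwrapped = reorder_line(text, levels)

# 2. One paragraph, wrapped after "ABC": levels come from the whole paragraph.
wrapped = [reorder_line(text[:15], levels[:15]), reorder_line(text[16:], levels[16:])]

# 3. Every line is its own paragraph: levels are resolved per line.
per_line = [reorder_line(line, toy_levels(line)) for line in (text[:15], text[16:])]

print(unwrapped)
print(wrapped)
print(per_line)
```

Only the third variant loses the information that "<[" sat between two RTL words, which is why it stays unmirrored at the start of the second line.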


Now, Emacs ships a TUTORIAL.he which, for most of its contents (but
not everywhere) seems to treat runs between empty lines as paragraphs,
while Emacs itself is a viewer that treats runs between single
newlines as paragraphs. That is, Emacs is inconsistent with itself.

In case you think I got something wrong with Emacs: Could you please
give exact definitions:
- What are the exact units (so-called "paragraphs" by UAX9) that it
runs BiDi on when it loads and displays a file?
- What are the exact units (so-called "paragraphs" by UAX9) in
TUTORIAL.he on which BiDi needs to be run in order to get the desired
readable version?

What most likely happens is that in order to see a difference, you'd
need to have more special symbols, or at least a more special
constellation of them. Probably TUTORIAL.he is just luckily simple
enough that such a difference isn't hit.

Another possibility is (and I cannot check because I can't speak
Hebrew) that somewhere TUTORIAL.he "cheats" with the logical order to
get the desired visual one.

-

Now, back to terminals.

The smallest possible viable definition of a

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-03 Thread Egmont Koblinger via Unicode

Hi Eli,

> The document cited at the beginning of the parent thread states that
> "simple" text-mode utilities, such as 'echo', 'cat', 'ls' etc. should
> use the "implicit" mode of bidi reordering, with automatic guessing of
> the base paragraph direction.

Not exactly. I take the SCP escape sequence from ECMA TR/53 (and
slightly reinterpret it) so that it specifies the paragraph direction,
plus introduce a new one that specifies whether autodetection is
enabled. I'm arguing, although my reasons are not rock solid, that
IMHO the default should be the strict direction as set by SCP, without
autodetection.
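
For readers unfamiliar with SCP, a hedged sketch of how an application might emit it. ECMA-48 defines SCP as CSI followed by parameters, a space, and the final byte "k". The one-parameter form and the values 0/1/2 used here are my reading of the proposal and should be treated as assumptions, and the helper name scp is made up. The separate autodetection switch is deliberately left out.

```python
import sys

CSI = "\x1b["  # Control Sequence Introducer


def scp(direction):
    """Build an SCP (Select Character Path) sequence.

    Assumed parameter values: 0 = terminal default, 1 = LTR, 2 = RTL.
    """
    param = {"default": 0, "ltr": 1, "rtl": 2}[direction]
    return f"{CSI}{param} k"


if __name__ == "__main__":
    # Ask for RTL paragraphs, print something, then restore the default.
    sys.stdout.write(scp("rtl"))
    sys.stdout.write("some text\n")
    sys.stdout.write(scp("default"))
```

A terminal that doesn't know the sequence is expected to ignore it, so this is harmless to try.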

> The fundamental problem here is that most "simple" utilities use hard
> newlines to present text in some visually plausible format.

Could you please list examples?

What I have in mind are "echo", "cat", "grep" and the like, they don't
care about the terminal width.

If an app cares about the terminal width, how does it care about it?
What does it use this information for? To truncate overlong strings,
for example? At this very moment I'd argue that such applications need
to do BiDi on their own, and thus set the terminal to explicit mode.
If an app does any kind of string truncation, it can no longer
delegate the task of BiDi to the terminal emulator.

I'm also mentioning that you cannot both logically and visually
truncate a BiDi string at once. Either you truncate the logical
string, which may result in a visual nonsense, or you truncate the
visual string, risking that it's not an initial fragment of the data
that ends up getting displayed. Along these lines I'm arguing that
basic utilities like "cut" shouldn't care about BiDi, the logical
behavior there is more important than the visual one. There could, of
course, be sophisticated "bidi-cut" and similar utilities at one point
which cut the visual string, but they should use the terminal's
explicit mode.
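
A small sketch of why the two truncations disagree. It uses a toy reordering (uppercase words stand for RTL, LTR paragraph assumed); toy_visual is a made-up helper, not a real BiDi implementation.

```python
import re


def toy_visual(logical):
    """Reverse every run of uppercase (stand-in RTL) words."""
    return re.sub(r"[A-Z]+(?: +[A-Z]+)*", lambda m: m.group(0)[::-1], logical)


logical = "abc DEF GHI"
full_visual = toy_visual(logical)      # what an untruncated display shows

# Truncate logically (what "cut" does), then display.
cut_logical = toy_visual(logical[:7])

# Truncate visually (what a width-limited UI field wants).
cut_visual = full_visual[:7]

print(full_visual)
print(cut_logical)
print(cut_visual)
```

The logically truncated string keeps DEF, which is not at the left edge of the full display; the visually truncated one keeps GHI, which is the end of the logical data, not its beginning.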

> Even when
> these utilities just emit text read from files (as opposed to
> generating the text from the program), you will normally see each line
> end with a hard newline, because the absolute majority of text files
> have a hard newline at the end of each line.

What does a BiDi text file look like, to begin with? Can a heavily BiDi
text file be formatted to 72 (or whatever) columns using explicit
newlines, keeping BiDi both semantically and visually correct? I truly
doubt that. Can you show me such files?

> When bidirectional text is reordered by the terminal emulator, these
> hard newlines will make each line be a separate paragraph.  And this
> is a problem, because the result will be completely random, depending
> on the first strong directional character in each line, and will be
> visually very unpleasant.  Just take the output produced by any
> utility when invoked with, say, the --help option, and try imagining
> how this will look when translated into a language that uses RTL
> script.

First, having no autodetection by default but rather an explicit
control for the overall direction hopefully mitigates this problem.
Second, I outline a possible future extension with a different
definition of a "paragraph", maybe something between empty lines, or
other kinds of explicit markers.

> So I think determination of the paragraph direction even in this
> simplest case cannot be left to the UBA defaults, and there's a need
> to use "higher-level" protocols for paragraph direction.

That higher level protocol is part of my recommendation, part of ECMA
TR/53, as the SCP sequence.

Does this make sense?


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Egmont Koblinger via Unicode

Hi Richard,

On Sun, Feb 3, 2019 at 2:32 AM Richard Wordingham via Unicode wrote:

> That first reference doesn't even use the word 'visual'.

The Unicode BiDi algorithm does speak about "visual positions for
display", "reordering for display" etc.

> All I am saying is that your proposal should define what it means by
> visual order.

Are you nitpicking on me not giving a precise definition on the
otherwise IMO freaking obvious "visual order", or am I missing
something fundamental?

> Shaping for RTL scripts happens on strings stored in logical order.

That's what I recommend in my current v0.1, which was vetoed by you
guys, claiming that the terminal emulator should do it even in cases
when it's only aware of the visual order.

> Passing text in the form of characters in left-to-right order is an
> annoying distraction, presumably forced on you by the attempt to
> maximise compatibility with existing systems.

Nope; passing text in visual order(*) is a technical necessity for
Emacs (as Eli confirmed it) and all other fullscreen apps (text
editors and such), as I provide a detailed proof for that in my
proposal. It's literally impossible to perform visual cropping on a
string (required by practically all fullscreen text editors), visual
concatenation of strings (e.g. a line of tmux which has two panes next
to each other), and at the same time preserve the logical order that's
passed on. You just can't define a logical order after visual
operations.

(*) To be pedantic, they could pass the text in whatever order they
want to, with random cursor movements in between. The point is that
the terminal emulator won't reshuffle the cells, that is, they should
write into column 1 whichever they want to appear at the leftmost
position, into column 2 whichever they want to appear in column 2, and
so on. And unless the cursor is moved explicitly, the cursor keeps
moving forward to higher numbered columns, that is, the terminal
expects to receive visual order.
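
The footnote can be pictured with a toy model of one terminal row. This is a sketch only: the Row class and its method names are hypothetical and no real emulator is structured this simply.

```python
class Row:
    """One row of a terminal in explicit mode: cells are never reshuffled."""

    def __init__(self, width):
        self.cells = [" "] * width
        self.cursor = 0  # 0-based index of the next cell to be written

    def move_to(self, column):
        """Explicit cursor movement; columns are 1-based as in the text."""
        self.cursor = column - 1

    def write(self, text):
        """Store each character in the current cell, then advance rightwards."""
        for ch in text:
            if self.cursor >= len(self.cells):
                break
            self.cells[self.cursor] = ch
            self.cursor += 1

    def render(self):
        return "".join(self.cells)


row = Row(10)
row.write("abc")      # lands in columns 1-3
row.move_to(8)
row.write("XYZ")      # lands in columns 8-10
row.move_to(4)
row.write("----")     # fills columns 4-7 last
print(row.render())
```

The order in which the pieces arrive is irrelevant; what ends up in column N is displayed in column N.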

> Casting text into grids of 'characters' requires consideration of all
> types of writing elements.  The division into panes is an awkward
> complication; panes in the application not shared with the terminal is
> even worse for shaping.

I'm really not sure what you're trying to say here.

The feeling I get, and I'm happy if you can prove me wrong, is that
while you're truly knowledgeable about shaping, you haven't yet
understood the very fundamentals why terminals are vastly different
from let's say web browsers, which results in the technical necessity
of often relying on visual order. There's even a separate section
dedicated to explaining this in my spec. If terminals weren't vastly
different, BiDi there would've been solved along with the birth of the
Unicode BiDi algorithm, I wouldn't have spent months working on this
proposal, and we wouldn't be having this discussion right now :)

Remember, this whole story is about finding a compromise between what
a terminal emulator is, and what BiDi scripts require (incl. shaping).
If you want to do BiDi and shaping without compromises, you should get
away from terminal emulators (as Kent has also suggested). Having a
strict grid of characters is such a compromise. The terminal emulator
not being aware of the entire logical string, only the currently
onscreen bits (that is, a cropped version of the string), which
results in the need for the explicit mode (visual order) is another
such compromise.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Egmont Koblinger via Unicode

Hi Richard,

> My main interest in this, though, is in improving the general run of
> Indic terminal cell editors.  If we can get Gnome-terminal working for
> Kharoshthi, things should improve for LTR Indic.  Even working on the
> false assumption that Indic scripts are like Devanagari would be an
> improvement, despite my comments about Khmer.

So, as for concrete bugs, there's the aforementioned VTE bug 584160.
You might want to give the pending patches a try, or (to keep the
relevant discussion at one place) comment over there about your
desired priorities etc.

We've also set up a "Terminal WG" on freedesktop
(https://gitlab.freedesktop.org/terminal-wg), a place intended for
specifications. If you/we feel like certain bits around
Devanagari/Khmer/etc. handling need a proper specification before we
could jump to the implementation, probably that would be the best
platform to discuss that. Reason being that I don't know when I'd be
able to address them, if ever, but there are multiple terminal
emulator developers waiting there for such challenges. Also, IMHO a
bugtracker is a better forum than a mailing list if parties can't all
immediately work on the problem :)

I'm definitely aiming to fix the basic Devanagari rendering (that is:
spacing marks), for this autumn's VTE release. Maybe even for this
spring's. I probably won't do more (like Virama), they'll have to wait
for the HarfBuzz port.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Egmont Koblinger via Unicode

Hi Richard,

On Sat, Feb 2, 2019 at 9:57 PM Richard Wordingham wrote:

> Seriously, you need to give a definition of 'visual order' for this
> context.  Not everyone shares your chiralist view.

When I look at the Unicode BiDi algorithm, or go to an online demo at
https://unicode.org/cldr/utility/bidic.jsp, or look at the FriBidi API
etc., their very basic functionality is that I pass the logical order
(as the string is expected to be stored in text files etc.), and the
result of the algorithm is the visual order.

On top of this, I make the clarification that combining marks need to
be reordered to be sent out to the terminal emulator _after_ their
base letter, because that's how terminal emulators work. The BiDi
problem area can only be reasonably addressed in the display layer, by
leaving the emulation layer pretty much unchanged. I find it
unreasonable to introduce a new mode where the combining accents are
sent to the terminal emulator _before_ their base letter. (On an
offtopic note, I wish that was the only mode in Unicode, it would
simplify a couple of things in the handling of streams. But this ship
has sailed decades ago.)

This reordering for the combining accents to come after (that is: to
the right of) the base letter in the LTR visual order is what e.g.
FriBidi does by default, due to the REORDER_NSM flag being set by
default.

Essentially, the "explicit mode" that my specification introduces is
the exact same behavior that most terminal emulators do now, and the
one that e.g. Emacs requires. They lay out the codepoints they
receive, from left to right. Nothing is going to change there. What I
add is another mode (the technically less problematic "implicit" mode
where the terminal displays the contents just as any BiDi-aware
graphical text editor, browser etc. would do) for the sake of
"cat"-like simple utilities, while being unsuitable for Emacs and
friends. My work also specifies how/when exactly to toggle back and
forth between these two modes.

What else do I need to further specify in the concept of "visual order"?

> A visible U+17D2 has no rôle in the Khmer writing system.  On
> computers, it is a warning that the input of a subscript consonant is
> only half done.  There are three units of the writing system in that
> word - KHMER LETTER PO, KHMER CONSONANT SIGN COENG RO*, and KHMER SIGN
> YUUKALEAPINTU.

> [and I could quote a whole lot more]

Richard, you are obviously magnitudes more savvy in shaping and stuff
than me, and I can't quickly pick up your knowledge to properly answer
to all the issues you mentioned.

What you probably still haven't realized is that I aimed to address a
much lower level issue than the ones you keep bringing up. Currently,
no matter what terminal emulator you pick, as soon as you start doing
BiDi (vim, emacs, cat, echo...), you end up with words being written
backwards. I mean, maybe they show up correctly with emacs, but they
show up incorrectly with vim and cat. Then you switch to a different
emulator, or toggle a setting, and suddenly vim and cat will be okay,
and emacs won't. This is bad.

This is the low level issue I'm trying to address, to make sure that
letters of words are always shown in the correct order. There's no way
you could do shaping underneath this level, it makes no sense to talk
about shaping, zero-width (non)joining, special Khmer symbols and
whatnot on reversed words, right? The order of the letters needs to be
fixed first, which is what I'm doing, and then all the bells and
whistles needed for shaping might come on top of this.

Right now I'm doing this BiDi work all voluntarily. As much as I'd
love to solve all the problems of the world, I don't have capacity for
that. As for shaping, chances are that I'm not going to get there,
unless someone offers a decent paid job :P. What I'm looking for right
now is feedback on whether the low-level BiDi work makes sense, and
whether it really creates proper grounds for building shaping etc. on
top of it one day.

Hope this clarifies a lot. And again, thanks for all your precious
input, but we've heavily diverged from the scope of my work.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Egmont Koblinger via Unicode

Hi Richard,

On Sat, Feb 2, 2019 at 12:43 PM Richard Wordingham via Unicode wrote:

> I'm not conversant with the details of terminal controls and I haven't
> used fields.  However, where I spoke of lines above, I believe you can
> simply translate it to fields.  I don't know how one best handles
> fields - are they a list, possibly of rows within fields, or are they
> stored as cell attributes?

The essential point is that the terminal emulator stores "cells".
Pretty much all the data (with very few exceptions) resides in cells.

A cell contains a base letter, followed by possibly a few non-spacing
marks. A cell has a foreground color, background color, bold,
underlined, italic etc. properties.

How these cells are linked up, in an array or whatever, is mostly
irrelevant since it's likely to be different in every implementation.

Of course it is possible to extend the per-cell storage to contain a
"previous" and a "next" character, as to be used for shaping purposes
only. Some questions: Is this enough (e.g. aren't there cases where
more than the immediate neighbor are relevant)? Is the next base
character enough, or do we also need to know the combining accents
that belong to that? And can't we store significantly less information
than the actual letter (let's say, 1 out of 13 [randomly made up
number] possible ways of shaping)?

Terminal emulators potentially store a lot of data (some even support
infinite scrolling), and try to handle them in some effective way.
That is, they do all sorts of bitpacking and crazy stuff. E.g. some
might reject adding new attributes when the per-cell size of the
attribute would exceed 4 or 8 bytes, both for memory and performance
reasons. Another example: VTE has one global pool of all the base
character + combining accents combos that it has encountered, and
starts assigning single codepoints to them from U+1000 or so, so
that then for each cell the base letter + combining accents still
don't require more storage than 4 bytes.

The takeaway is: the less data we need to remember per cell, the
better, and every bit matters.
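
The pooling idea can be sketched as follows. This is not VTE's code: the class, its method names, and the starting value for synthetic ids are all assumptions chosen for illustration (here the ids simply start right after the last valid Unicode code point).

```python
class ClusterPool:
    """Map base letter + combining marks to one integer that fits in a cell."""

    FIRST_SYNTHETIC = 0x110000  # hypothetical choice: first value past U+10FFFF

    def __init__(self):
        self._ids = {}        # cluster string -> synthetic id
        self._clusters = []   # synthetic id offset -> cluster string

    def intern(self, cluster):
        """Return a single integer representing the whole cluster."""
        if len(cluster) == 1:
            return ord(cluster)  # plain characters are stored as themselves
        if cluster not in self._ids:
            self._ids[cluster] = self.FIRST_SYNTHETIC + len(self._clusters)
            self._clusters.append(cluster)
        return self._ids[cluster]

    def expand(self, value):
        """Recover the original character or cluster from a stored integer."""
        if value < self.FIRST_SYNTHETIC:
            return chr(value)
        return self._clusters[value - self.FIRST_SYNTHETIC]


pool = ClusterPool()
first = pool.intern("e\u0301")   # e + combining acute
second = pool.intern("e\u0301")  # same cluster again: same id, no new storage
plain = pool.intern("x")
```

Every cell then holds one fixed-size integer regardless of how many combining marks the cluster has, which is the property the storage argument above depends on.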

But to recap, we're now just peeking into a possible future extension
of the specs to see if it's viable (I guess it is), which I believe
emulators might reasonably decide not to implement, if they think
performance is more important than proper shaping in all the special
cases.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Egmont Koblinger via Unicode

Hi Richard,

> Not all terminal emulators can deal with non-spacing combining
> characters.

Both Hebrew and Arabic seem to use non-spacing combining characters,
presumably other Arabic-like scripts too.

I forgot to state explicitly in my docs, but let's just say that
handling non-spacing combining accents is a prerequisite for BiDi
support. Those emulators that don't handle them should be out of scope
for our current discussion.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Egmont Koblinger via Unicode

Hi Richard,

> > Are they okay to be present in visual order (the terminal's explicit
> > mode, what we're discussing now) too?
>
> Where do you define the order for explicit mode?

In explicit mode, the application (Emacs, Vim, whatever) reorders the
characters, and passes visual order (left to right) to the terminal
emulator. The terminal emulator preserves this visual order, doesn't
reshuffle anything.

How to handle ZW(N)J in visual order? What's the desired way? Is it
specified anywhere? As far as I know, they specify the relation
between two adjacent characters of the logical order, which might not
even become adjacent in the visual. Should they always "stick" to the
preceding character, for example?

The Unicode BiDi algorithm doesn't seem to make a difference between
base letters and combining accents for reordering. So, given in an RTL
text a base letter + a combining accent, the BiDi algorithm gives the
visual LTR order of the combining accent first (on the left), followed
by the base letter. This order is not okay for terminal emulators.
Combining accents have to be reordered in the output of the Unicode
BiDi algorithm, so that they come after the base letter even in the
visual LTR order. This is e.g. what FriBidi does by default, due to
the REORDER_NSM flag.
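
The effect of that reordering on a single RTL run can be sketched like this. It is not FriBidi's implementation, just an illustration of the idea using Python's unicodedata; the function names are made up.

```python
import unicodedata


def naive_reverse(rtl_run):
    """Plain reversal: combining marks end up BEFORE their base letter."""
    return rtl_run[::-1]


def reverse_keeping_marks_after_base(rtl_run):
    """Reverse base letters, but keep each mark right after its own base."""
    clusters = []
    for ch in rtl_run:
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


# "shalom" in logical order: shin + qamats + shin dot, lamed, vav + holam, final mem
shalom = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
naive = naive_reverse(shalom)
fixed = reverse_keeping_marks_after_base(shalom)
```

Note that ZWJ and ZWNJ have combining class 0, so this sketch treats them as stand-alone units; which neighbor they should stick to is exactly the open question raised here.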

Presumably it doesn't just reorder non-spacing combining accents, but
also ZW(N)J and similar symbols, which already smells pretty
problematic, doesn't it? Or is this what you need there, too?

> There may be complications in ensuring that
>  gets stored
> as the content of a single cell.

How should the terminal emulator know which cell (the previous or the
subsequent) do these two s belong to?

> > Anyway, ZWJ/ZWNJ aren't sufficient to handle the cases I outlined
> > above.
>
> Example, please.

Cropped strings, cropped strings that are adjacent to each other, and
faulty shaping could kick in there.

Two fields on the UI. One in columns 36-40 with cyan background,
aiming to show ABCDEF, but due to limited room, can only show ABCDE
(let's say it's scrolled horizontally this way). Another in columns
41-45 with yellow background, aiming to show UVWXYZ, but due to
limited space only VWXYZ is shown (it's scrolled horizontally like
this).

What the terminal emulator sees is a continuous text of ABCDEVWXYZ.
What the application wants to have is to get E shaped as if there was
an F on its right, and get V shaped as if there was an U on its left.

Once you address this problem, I'm not sure ZW(N)J are still
required/desirable, rather than applying this more generic solution
there as well.
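
One conceivable shape of such a generic solution, purely as a sketch: each cell carries a hint about its logical neighbors, supplied by the application. Nothing like this exists in the current draft; the Cell fields and the field_cells helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    char: str
    shape_prev: Optional[str] = None  # hypothetical hint: logical left neighbor
    shape_next: Optional[str] = None  # hypothetical hint: logical right neighbor


def field_cells(full_text, start, length):
    """Cells for the visible window of a field, with hints from the full text."""
    cells = []
    for i in range(start, min(start + length, len(full_text))):
        prev_ch = full_text[i - 1] if i > 0 else None
        next_ch = full_text[i + 1] if i + 1 < len(full_text) else None
        cells.append(Cell(full_text[i], prev_ch, next_ch))
    return cells


# Columns 36-40 show ABCDE of ABCDEF, columns 41-45 show VWXYZ of UVWXYZ.
row = field_cells("ABCDEF", 0, 5) + field_cells("UVWXYZ", 1, 5)
visible = "".join(cell.char for cell in row)
```

The terminal still sees ABCDEVWXYZ, but the cell holding E knows its shaping neighbor is F rather than V, and the cell holding V knows its neighbor is U rather than E.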

> At present, VTE positions LTR Indic preceding spacing combining marks
> after the consonant.  I thought your draft scheme corrected this very
> local bidi issue, which is so local that the bidi algorithm ignores it.

Indic spacing combining marks are handled incorrectly by VTE and are
being addressed in bug 584160 which I've already linked. This
particular issue I don't consider BiDi at all. It's something totally
different. The spacing accent can be to the right, somewhat on top of
and somewhat to the right, on top of, somewhat to the left and
somewhat on top of, or fully on the left. It's not binary left or
right. Proper rendering should be done by font, and not at all by the
BiDi of the terminal. The terminal is unaware of how much the base
glyph is shifted to the right and the accent to its left. All that the
terminal needs to do (and VTE gets it wrong now) is to pass these two
into whichever font rendering engine in one single step.

> So ព្រះ  LETTER RO, U+17C8 KHMER SIGN > _preah_ 'prefix denoting
> respect for gods, kings, etc.' will be three cells <្រ,ព,ៈ> = <(COENG,
> RA), PO, YUUKALEAPINTU> and cause no confusion?  Or will the cells be
> ?

First it's a base character followed by a non-spacing mark. As in most
terminal emulators (and now we're absolutely not talking about my BiDi
proposal) they are stored in the same cell. The first cell contains
(PO, COENG).

The next two are a base character followed by a spacing mark. In VTE
584160 I outline two possible approaches. The one I'm in favor of is
that the row's second cell contains RO and the third cell contains
YUUKALEAPINTU, and the two are combined properly when the logical
contents get displayed. Another possibility I'm pondering is whether
the emulation layer should combine them, that is, have the second cell
store the "first half of (RO, YUUKA)" and the third cell store the
"second half of (RO, YUUKA)".

Does this make any sense? If not, could you please explain what the
desired behavior is, and why? Please keep in mind that I know nothing
about Khmer in particular.

Anyway, here we're talking about something that's totally independent
from my BiDi work. It's also something that should be standardized
across terminals, sure, but maybe not right now :)


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Egmont Koblinger via Unicode

Hi Kent,

On Sat, Feb 2, 2019 at 12:41 AM Kent Karlsson via Unicode
 wrote:

> [...] neither of which
> should directly consult the font [...]
> But terminals
> (read terminal emulators) can deal with mixed single width and double
> width characters (which is, IIUC, the motivation for the datafile
> EastAsianWidth.txt).

Yup, exactly; and for this reason, no terminal I'm aware of takes the
single vs. double width property from the font. The logical behavior,
i.e. knowing which logical cell contains what character (or which half
of what character, in case of double wide ones) isn't influenced by
the font. It's taken from EastAsianWidth (or other means, which we're
working on: https://gitlab.freedesktop.org/terminal-wg/specifications/issues/9
, to address e.g. incompatibilities arising from different Unicode
version used by the app vs. the terminal, as you pointed out).

Also think of cases like when the user changes the font of the
terminal run-time, or a headless terminal emulator, or a screen/tmux
attached to multiple terminal emulators of different fonts at once...
Adjusting the logical behavior according to the font would definitely
be a wrong path to take.
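
As a rough illustration of deriving the logical width from Unicode data rather than from the font, here is a small Python sketch. It only looks at the general category and the East Asian Width property through the standard unicodedata module; real emulators handle more special cases, and the function names are made up for this example.

```python
import unicodedata

def cell_width(ch):
    """Number of terminal cells for one character, from Unicode data only."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # non-spacing mark: shares the cell of its base character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # Wide or Fullwidth: occupies two cells
    return 1

def string_width(s):
    # Sum of per-character widths; no font is consulted anywhere.
    return sum(cell_width(ch) for ch in s)
```

Since the result depends only on the Unicode version of the data, a headless emulator or a screen/tmux session attached to several emulators gets the same answer everywhere.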

> Likewise non-spacing combining characters should
> be possible to deal reasonably with.

Most terminal emulators handle non-spacing combining marks, it's a
piece of cake. (Spacing marks are more problematic.)

> All sorts of problems arise; feeding
> the emulator a character (or "short" strings) at a time not allowed
> to buffer for display (causing reshaping or movement of already
> displayed characters, edit position movement even within a single
> line, etc.).

Emulators need to update their screen to reflect whatever is in the
logical buffer, and the contents of the logical buffer mustn't depend
on the timing of the incoming data. As a consequence, when the input
stream contains a base character + a combining accent, there is a slim
chance that the base character without the combining accent makes it
to the display for a short time. It's the emulator's job to "fix" it
(that is, redraw the glyph with the combining accent) once the accent
is received. If an emulator doesn't do it correctly, it's simply a bug
in that emulator.
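
A minimal Python sketch of this timing independence, under the simplifying assumption that a row is just a list of strings (one per cell) and that only non-spacing marks get attached to the previous cell. The feed() helper is hypothetical, not taken from any emulator.

```python
import unicodedata

def feed(cells, chunk):
    """Append one chunk of input to a row of cells (a list of strings)."""
    for ch in chunk:
        if unicodedata.category(ch) in ("Mn", "Me") and cells:
            cells[-1] += ch  # attach the mark to the previous cell
        else:
            cells.append(ch)
    return cells

# The same data, received in one piece or split right before the accent.
whole = feed([], "e\u0301x")
split = feed(feed([], "e"), "\u0301x")
```

Both calls end up with the same two cells. The display may briefly show a bare "e" in the split case, but the logical buffer ends up identical.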

On a side note, we're also working on an extension for atomic updates
at https://gitlab.freedesktop.org/terminal-wg/specifications/issues/9
which should significantly further decrease the chance of such
intermittent screen updates.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi,

I'm trying to respond to every question, but I'm having a hard time
keeping up :-)

Thanks a lot for all the precious input about shaping!

Here's my suggestion, for version 0.2 of the recommendation:

- No longer encourage any use of presentation form characters.

- State that it's the terminal emulator's task to perform shaping,
both in implicit and explicit modes.

- Leave it for a future enhancement to handle trickier cases in
explicit mode, such as shaping of a word that's only partially
visible, or prevent shaping when two words happen to touch each other
and are visually separated by other means (e.g. background color).
Leave it for further research whether we could use ZWJ/ZWNJ here,
whether we could use ECMA's SAPV 5-8 & 21-11, or whether we should
invent something new (perhaps even telling the terminal emulator what
neighboring previous/next characters to imagine there for the purpose
of shaping)...

Let me know if you have any remaining problems/concerns/etc.

As for the implementation in VTE: initially I'll still use
presentation form characters, solely because that's a low hanging
fruit approach (low investment, high gain). I've already implemented
it in about an hour (a bit of further hacks will be necessary to
extend it to explicit mode, but still easily doable), whereas
switching to HarfBuzz is expected to take weeks of heavy work. We'll
tackle that in a subsequent version. And if anyone's happy to help,
there's already some bounty for harfbuzz support :)

Thanks again for the great guidance!

cheers,
egmont

On Tue, Jan 29, 2019 at 1:50 PM Egmont Koblinger  wrote:
>
> Hi,
>
> Terminal emulators are a powerful tool used by many people for various
> tasks. Most terminal emulators' bugtracker has a request to add RTL /
> BiDi support. Unicode has supported BiDi for about 20 years now.
> Still, the intersection of these two fields isn't solved. Even some
> Unicode experts have stated over time that no one knows how to do it
> properly.
>
> The only documentation I could find (ECMA TR/53) predates the Unicode
> BiDi algorithm, and as such no surprise that it doesn't follow the
> current state of the art or best practices.
>
> Some terminal emulators decided to run the BiDi algorithm for display
> purposes on its lines (rather than paragraphs, uh), not seeing the big
> picture that such a behavior turns them into a platform on top of
> which it's literally impossible to implement proper BiDi-aware text
> editing (vim, emacs, whatever) experience. In turn, vim, emacs and
> friends stand there clueless, not knowing how to do BiDi in terminals.
>
> With about 5 years of experience in terminal emulator development, and
> some prior BiDi homepage developing experience with the kind mentoring
> of one of the BiDi gurus (Aharon, if you're reading this, hi there!),
> I decided to tackle this issue. I studied and evaluated the
> aforementioned documentation and the behavior of such terminals,
> pointed out the problems, and came up with a draft proposal.
>
> My work isn't complete yet. One of the most important pending issues
> is to figure out how to track BiDi control characters (e.g. which
> character cells they belong to), it is to be addressed in a subsequent
> version. But I sincerely hope I managed to get the basics right and
> clean enough so that work can begin on implementing proper support in
> terminal emulators as well as fullscreen text applications; and as we
> gain experience and feedback, extending the spec to address the
> missing bits too.
>
> You can find this (draft) specification at [1]. Feedback is welcome –
> if it's an actionable one then preferably over there in the project's
> bugtracker.
>
> [1] https://terminal-wg.pages.freedesktop.org/bidi/
>
>
> cheers,
> egmont (GNOME Terminal / VTE co-developer)

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi Richard,

On Fri, Feb 1, 2019 at 12:19 AM Richard Wordingham via Unicode
 wrote:

> Cropped why?  If the problem is the truncation of lines, one can simple
> store the next character.

Yup, truncation of a line, for example.

I agree that one could "store the next character". We could extend the
terminal emulation protocol where by some means you can specify that
column 80 contains a letter X, and even though there's no column 81,
an app can still tell the terminal emulator that it should imagine
that column 81 contains the letter Y, and perform shaping accordingly.

This will need to be done not just at the end of the terminal, but at
any position, and for both directions. Think of e.g. a vertically
split tmux. You should be able to tell that column 40 contains X which
should be shaped as if column 41 contained Y, and column 41 contains Z
which should be shaped as if column 40 contained A.

What I cannot see at all is how this could be done "simply". Could you
please elaborate on that? I don't find this simple at all!
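
To make the "imagine a neighbor" idea a bit more concrete, here is a toy Python sketch. Everything in it is an assumption for illustration: the joining types are hard-coded for three letters only, the neighbors are given in logical order, and the form() function is hypothetical, not part of any terminal protocol or shaping library.

```python
# Toy joining types for three letters only. A real implementation would
# read ArabicShaping.txt or leave the decision to a shaping engine.
JOINING = {
    "\u0628": "D",  # BEH: dual-joining
    "\u0644": "D",  # LAM: dual-joining
    "\u0627": "R",  # ALEF: right-joining (joins only to the preceding letter)
}

def form(letter, prev=None, nxt=None):
    """Pick the contextual form of `letter` given its logical neighbors.

    `prev` and `nxt` may be letters that are not visible on screen at all,
    e.g. the ones cut off at the edge of a field.
    """
    jt = JOINING.get(letter)
    joins_prev = jt in ("D", "R") and JOINING.get(prev) == "D"
    joins_next = jt == "D" and JOINING.get(nxt) in ("D", "R")
    if joins_prev and joins_next:
        return "medial"
    if joins_prev:
        return "final"
    if joins_next:
        return "initial"
    return "isolated"
```

For example, form("\u0628", prev="\u0644") gives "final" if nothing follows, but "medial" once the app says that an invisible ALEF should be imagined after it. The hard part is not this lookup but how the app conveys prev/nxt for arbitrary cells.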

> > It's not able to
> > separate different UI elements that happen to be adjacent in the
> > terminal, separated by different background color or such.
>
> ZWJ and ZWNJ can handle that.

Wouldn't it be a semantic misuse of these characters, though?

They are supposed to be present in the logical order, and in logical
order (that is: the terminal's implicit mode) they can work as
desired.

Are they okay to be present in visual order (the terminal's explicit
mode, what we're discussing now) too?

Anyway, ZWJ/ZWNJ aren't sufficient to handle the cases I outlined above.

> If a general text manipulating application, e.g. cat, grep or awk, is
> writing to a file, it should not convert normal Arabic characters to
> presentation forms.  You are now asking a general application to
> determine whether it is writing to a terminal or not, and alter its
> output if it is writing to a terminal.

No, this is absolutely not what I'm talking about!

There are two vastly different modes of the terminal. For "cat",
"grep" etc. the terminal will be in implicit mode. Absolutely no BiDi
handling is expected from these apps, the terminal will do BiDi and
shaping (perhaps using Harfbuzz; perhaps using presentation form
characters as a temporary low-hanging fruit until a better one is
implemented – the choice is obviously up to the implementation and not
to the specification).

For "emacs" and friends, an explicit mode is required where visual
order is passed to the terminal. What we're discussing is how to
handle shaping in this mode.

> But it as an issue that needs to be addressed.  As a terminal can be
> addressed by cell, an application may need to keep track of what text
> went into each cell. Misery results when the application gets it wrong.

My recommendation doesn't change this principle at all. In the lower
(emulation) layer every character still goes into the cell it used to
go to, and is addressable using cursor motion escapes and so on
exactly as without BiDi.

> How many cells do CJK ideographs occupy?  We've had a strong hint
> that a medial BEH should occupy one cell, while an isolated BEH should
> occupy two.

CJK occupy two, but they do regardless of what's around them. That is,
they already occupy two cells in the logical buffers, in the emulation
layer.

There is absolutely no sane way we can make in terminal emulation a
character's logical width (as in number of cells it occupies) depend
on its neighboring characters. (And even if we could by some terrible
hacks, it would break the principle you just said as "misery
results...", and the principle Eli said that things should remain
reasonably simple, otherwise hardly anyone will bother implementing
them.) This is a compromise Arabic folks will have to accept.

When displayed, it's up to terminal emulators to perhaps
widen/shrink cells as they want to (they might even totally give up
on monospace fonts), but then they'll risk vertical lines not aligning
up perfectly vertically, content overflowing on the right etc. Konsole
does such things.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi Ken,

> [language tag]
> That is a complete non-starter for the Unicode Standard.

Thanks for your input!

(I hope it was clear that I just started throwing in random ideas, as
in a brainstorming session. This one is ruled out, then.)

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

On Thu, Jan 31, 2019 at 4:26 PM Eli Zaretskii  wrote:

> > Yes, I do argue that emacs will need to print a new escape sequence.
> > Which is much-much-much-much-much better than having to tell users to
> > go into the settings of their macOS Terminal / Konsole /
> > gnome-terminal etc. and disable BiDi there, isn't it?
>
> I'm not sure I agree.  Most users can disable bidi reordering of the
> terminal once and for all.  They don't need it.

What users are we talking about? Those who don't need BiDi ever at all?

Everything is already perfect for them! They shouldn't care about the
"enable BiDi" settings of their terminal, either value will result in
the same, correct behavior for them.

Or do we talk about users who care about BiDi inside Emacs, but don't
care about BiDi when echo'ing, cat'ing...? Do such users exist? Well,
even if they do, they're not the only target of my work.

Remember: My proposal aims to address both the Emacs as well as the
echo/cat/... use cases. These are substantially different use cases
that require the terminal emulator to be in a different mode, and thus
automatic switching between the two modes has to be solved.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi,

On Thu, Jan 31, 2019 at 4:14 PM Eli Zaretskii  wrote:

> I suggest that you show the result to someone who does read Arabic.

I contacted one guy who is pretty knowledgeable in Arabic scripts, as
well as terminal emulation, I sent out an early unpublished version of
the proposal to him, but unfortunately he was busy and didn't have the
chance to respond. Let this thread be one where we invite Arabic folks
to comment :)

> Small changes can be very unpleasant to the eyes of an Arabic reader.

I can easily imagine that!

I can assure you, seeing õ instead of ő in my native language is
extremely unpleasant to my eyes. Depending on the font you're using,
you may not even have spotted any difference.

But could someone argue for example that seeing an "i" and "w" equally
wide is unpleasant to their eyes? Where do we draw the lines of what's
an acceptable compromise on a platform that has technical limitations
(fixed grid) to begin with? We really need input from Arabic folks to
answer this.

I'm also wondering: how unpleasant it is if a letter is cut in half
(e.g. overflows at the edge of the text editor), and is shaped not
according to the entire word but according to the visible part? I took
it from the CSS specification that the desired behavior is to shape it
according to the entire word, but I honestly don't know how acceptable
or how unpleasant the other approach is.

> You could do that, but it will require a lot of non-trivial processing
> from the applications.  Text-mode applications don't want any complex
> tinkering, they want just to write their text and be done.  The more
> overhead you add to that simple task, the less probable it is that
> applications will support such a terminal.

I agree with your overall observation, but I'm not sure how much it
applies to this context.

Text-mode applications have to run the BiDi algorithm. The one I
picked can also do shaping (well, the pretty limited one, using
presentation forms). Shouldn't any BiDi algorithm also provide methods
for shaping that produce some output that can be easily sent to the
terminals? Shouldn't we push for them?

As far as I imagine the ideal solution, doing this part of shaping
shouldn't be any harder for apps than doing BiDi, basically all they
would need to do is hook up to existing API methods.

Of course, given the current APIs, it's probably really not this simple.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi,

On Thu, Jan 31, 2019 at 4:10 PM Eli Zaretskii  wrote:

> The reordering happens before TABs are converted to cursor motion,
> does it not?

No, not at all.

You cannot "mix" handling the input and reordering, since the input is
not available as a single step but arrives continuously in a stream.

Consider a heavy BiDi text such as (I'm making up some random
gibberish, uppercase being RTL):
foo BAR FSI BAz quUX 1234 PDI whatEVer

Someone prints it to the terminal, but due to the internals, the
terminal doesn't receive this in one single step but in two
consecutive ones, broken in the middle. Maybe the app split it in half
(e.g. a shell script printed fragments one by one using printf without
a trailing newline). Maybe the emitter is a "dd" printing blocks of
let's say 4kB and this line happens to cross a boundary. Maybe a
transport layer such as ssh split it for whatever reason.

Then would you take the first half of this text, let's say
foo BAR FSI BAz quU
even with unbalanced BiDi controls, then reorder it, and continue from
it? Continue how? How to remember to reorder the second half too, but
not the first half once again in order to avoid "double BiDi"?

What to do with explicit cursor movement, would they jump to the
visual position? This would break absolutely basic principles, e.g.
jumping twice to the same location to overwrite a letter twice in a
row may actually end up overwriting two different letters, since
everything was potentially rearranged after the first overwrite
happened? Any application having any existing preconception about
cursor movement would uncontrollably fall apart.

This approach is doomed to fail big time (and was the reason I had to
drop ECMA TR/53's DCSM "presentation" mode).

The only reasonable way is if you have two layers. The bottom layer
does the emulation almost exactly as it used to do, with no BiDi
whatsoever (except for tiny additions, e.g. it tracks BiDi-related
properties such as the paragraph direction). The upper layer displays
the data, and this upper layer performs BiDi solely for display
purposes: using the lower layer's data as input, but not modifying it.

This is, by the way, also what current emulators that shuffle the
characters around do.

Let's also mention that the lower layer (emulation) should be as fast
as possible. e.g. VTE can handle input in the ballpark of 10MB/s.
Reordering, that is, running BiDi for display purposes needs to happen
much more rarely, maybe 20-60 times per second. It would be a
performance killer having to run the BiDi algorithm upon every
received chunk of data – in fact, to eliminate any possible behavior
difference due to timing difference, it'd need to happen after every
printable character received.

There's absolutely no way we could reorder first, and then handle
TAB's cursor movement. TAB's cursor movement happens in the lower
layer, reordering happens in the upper one.
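
Here is a toy Python sketch of the two layers. The Emulator class and display() function are hypothetical names, and the "BiDi" in display() is only a stand-in that reverses each run of uppercase letters (the convention used in the example above); it is not the Unicode BiDi algorithm.

```python
import re

class Emulator:
    """Lower layer: stores characters in logical order, no reordering."""

    def __init__(self):
        self.row = []

    def feed(self, chunk):
        # Data may arrive in arbitrary pieces; just store it as it comes.
        self.row.extend(chunk)

def display(row):
    """Upper layer: derive a visual string without modifying `row`.

    Toy stand-in for BiDi: each run of uppercase letters is reversed.
    """
    logical = "".join(row)
    return re.sub(r"[A-Z]+", lambda m: m.group(0)[::-1], logical)

one = Emulator()
one.feed("foo BAR baz")

two = Emulator()
two.feed("foo BA")  # same data, split in the middle of the RTL word
two.feed("R baz")
```

Both emulators hold the same logical row, so display() gives the same result for both, and it can be called 20-60 times per second regardless of how many chunks arrived in between.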

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi Eli,

> So we will some day have one such terminal emulator.  That's good, but
> a text-mode application that needs to support bidi cannot rely on its
> users all having access to that single terminal.

No. A text-mode application that needs to support BiDi must do the
BiDi itself and pass visual order to the emulator, and beforehand
switch the emulator to explicit mode so that you don't end up with
"double BiDi". Once you emit visual order, there's no need for any
BiDi control characters.

For this behavior, the only feature you need from a terminal emulator
is to have a mode where it doesn't shuffle the characters. Currently
every emulator I'm aware of has such a mode, although in some of them
you have to tweak the settings to get to this mode (in my firm opinion
it's an unacceptable user experience), while in emulators according to
my specification there'll be an escape sequence for text-mode apps to
automatically switch to this mode.

What BiDi control characters (LRE, LRI, FSI etc.) in implicit mode
will give you – if supported – is that you'll be able to execute "cat
file", and it'll be displayed correctly, even taking FSI and friends
as present in the file into account. Of course this will only work in
terminal emulators that support this.

> This is indeed a significant issue, because it means applications
> cannot force the terminal use a certain non-default base paragraph
> direction.

They can, since there's a dedicated escape sequence (SCP) for setting
the base paragraph direction.

That being said, not being able to remember FSI at the beginning of a
string is indeed a significant issue, we agree on this. We just need
to figure out how to alter the emulation behavior to remember them,
which I find the next big step to address in the specification.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi Eli,

> Arabic presentation forms are more like an exception than a rule, I
> hope you understand this by now.  Most languages/scripts don't have
> such forms, and even for Arabic they cover only a part of what needs
> to be done to present correctly shaped text.  Complex script shaping
> is much more than just substituting some glyphs with others, it
> requires an intimate knowledge of the font being used and its
> capabilities, and the ability to control how various glyphs of a
> grapheme cluster are placed relative to one another, something that an
> application running on a text terminal cannot do.
>
> So I suggest that you don't consider Arabic presentation forms a
> representative of the direction in which terminal emulators supporting
> such scripts should evolve.

Thanks a lot for this information!

I now understand that presentation forms aren't an ideal approach,
and the recommendation should be improved here.

Until it happens, I'm uncertain whether using presentation form
characters is a decent low hanging fruit that significantly improves
the readability in some situations (e.g. "good enough" in some sense
for Arabic), or is a dead end we shouldn't propagate.

I still do not agree however that the entire responsibility can be
shifted to the emulator. There are certain important bits of
information that are only available to the application, and not the
emulator – as with many other aspects, such as reordering,
copy-pasting, searching in the data in BiDi-aware text editors using
the terminal's explicit mode, which are all pushed to the application
because the emulator cannot do them correctly.

I believe we should further study the situation, e.g. see whether
ECMA-48's SAPV (8.3.18) parameters 5..8 (to explicitly specify whether
to use isolated/initial/medial/final form for each character) are
flexible enough to convey all this information, or perhaps a new, more
powerful means should be crafted. At this point I lack sufficient
knowledge to fix the design, I'd need to spend a lot of time studying
the situation and/or working together with you guys, if you're up for
it.


Thanks a lot,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi Richard,

> Basic Arabic shaping, at the level of a typewriter, is straightforward
> enough to leave to a terminal emulator, as Eli has suggested.

What is "basic" Arabic shaping exactly?

I can see problems with leaving it to a terminal. It's not aware of
the neighboring character if the string is cropped. It's not able to
separate different UI elements that happen to be adjacent in the
terminal, separated by different background color or such.

On the other hand, let's reverse the question:

"Basic Arabic shaping, at the level of a typewriter, is
straightforward enough to be implemented in the application, using
presentation form characters, as I suggest". Could you please point
out the problems with this statement?

> I believe combining marks present issues even in implicit modes.  In
> implicit mode, one cannot simply delegate the task to normal text
> rendering, for one has to allocate text to cells.  There are a number
> of complications that spring to mind:
>
> 1) Some characters decompose to two characters that may otherwise lay
> claim to their own cells:
>
> U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to <06D2,
> 0654>.  Do you intend that your scheme be usable by Unicode-compliant
> processes?

Decompose during which step? During shaping?

Or do you mean they are NFC-NFD counterparts of each other?

Most terminal emulators are able to handle combining accents, and of
course implicit mode would take them into account when rearranging the
letters. Terminal emulators don't do explicit (de)composing, a.k.a.
NFC->NFD or NFD->NFC conversion (at least I'm not aware of any that
does).
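
For reference, the NFC/NFD relationship of the character in question can be checked with Python's standard unicodedata module. This is only a small sketch to show what is meant; it says nothing about what a terminal should do with either form.

```python
import unicodedata

composed = "\u06d3"          # ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
decomposed = "\u06d2\u0654"  # YEH BARREE followed by HAMZA ABOVE

nfd = unicodedata.normalize("NFD", composed)
nfc = unicodedata.normalize("NFC", decomposed)

# HAMZA ABOVE is a non-spacing mark, so an emulator that handles combining
# accents would store it in the same cell as the preceding YEH BARREE.
hamza_category = unicodedata.category("\u0654")
```

Either way the emulator stores what it receives: one code point in the first case, a base letter plus a combining mark in the second.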

> 4) Indic conjuncts.
> (i) There are some conjuncts, such as Devanagari K.SSA, where a
> display as ,  is simply unacceptable.  In some
> closely related scripts, this conjunct has the status of a character.

We (in GNOME Terminal / VTE) do have an open bug about Devanagari
spacing marks (currently they don't show up properly), plus Virama and
friends. I'd like to address the essentials along with the BiDi
implementation; although here we should discuss the design and not a
particular implementation thereof :)

In case you're interested, at
https://bugzilla.gnome.org/show_bug.cgi?id=584160 comments 45-48, 95
and perhaps a few other comments I wondered whether certain joining
operations should be done on the emulation layer or the display layer.
The answer is not yet clear. We can't fix suddenly everything, but
it's nice to move forward step by step. It's also proposed that we
use HarfBuzz, but it's unclear to me at this point how the grid
alignment could be preserved in the meantime.

"simply unacceptable" – I'm not familiar with those languages,
cultures and so on, but I'd be hesitant to go as far as calling
anything "unacceptable". E.g. there's a physical typewriter in our
family, as far as I remember it has no digits 1 or 0 (use the letters
lowercase L and anycase O instead), it doesn't contain all the
accented letters of my mother tongue so sometimes a similarly looking
one has to be used. In today's computer world, I'd say such
limitations are "unacceptable", but at that time this was what we had
to live with.

Terminal emulators, due to their strict character grid nature and
their legacy behavior of many decades, are a platform where a certain
level of compromise might be necessary for some scripts. I cannot tell
where to draw the line, cannot tell what is "extremely bad" vs. "not
nice" vs. "kind of okay but could be better", but we can't do
everything in a terminal emulator that a graphical app could do. If
someone wants to have a pixel perfect look, terminal emulators are not
for them. Maybe looking at typewriters of those scripts could be a
good starting point. Anyway, we've drifted quite far away.


What I've already implemented in VTE (in a work-in-progress branch),
and to my eyes looks quite nice, is Arabic shaping using presentation
form characters as done by FriBidi (in implicit mode only). According
to the API of this library, this shaping process keeps a 1:1 mapping
between the original and shaped letters (at least the number of
Unicode codepoints – I haven't double checked their terminal width,
but I really hope they don't mess with us here). That is, I don't have
to deal with a character cell splitting into two, or two character
cells joining into one during shaping. Does this sound okay so far?


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

On Thu, Jan 31, 2019 at 10:05 AM Richard Wordingham via Unicode
 wrote:

> > How will "ls -l" possibly work?  This is an example of the "table"
> > layout you were already discussing.
>
> I think the answer is that it will use the same trickery as with a
> default setting for the --color argument.  Colour codes are emitted
> only when the output is a terminal.  Presumably the same would go for
> Bidi controls.

Exactly, that's what I have in mind in the long run. If coreutils
folks like the idea, "ls" could have a new option
--bidi=never/auto/always. With BiDi mode, it would enclose each of the
logical segments of strings that potentially contain RTL text
(filenames, dates etc.) separately inside an FSI...PDI block. That way
its output would look as desired (over the terminal's new default
"implicit" mode), since the terminal would take care of BiDi-ing each
FSI...PDI block.
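
A rough Python sketch of what such an option could do. The --bidi option does not exist in coreutils; the format_row() function, its bidi parameter and the way fields are joined are all assumptions made up for illustration. Only the two code points (FSI U+2068, PDI U+2069) are real.

```python
import sys

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(field):
    # Wrap one logical segment so the terminal runs BiDi on it separately.
    return f"{FSI}{field}{PDI}"

def format_row(fields, bidi="auto", stream=sys.stdout):
    """Join the fields of one listing row, optionally isolating each one.

    `bidi` mimics the hypothetical --bidi=never/auto/always option; with
    "auto", isolates are emitted only when `stream` is a terminal.
    """
    use = bidi == "always" or (bidi == "auto" and stream.isatty())
    return " ".join(isolate(f) if use else f for f in fields)
```

As with --color, the "auto" value keeps the controls out of files and pipes, so the output of "ls > file" stays unchanged.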


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi,

> > And if you argue "so make emacs print your
> > new code to disable formatting", so do thousands of other programs that are
> > less sophisticated than emacs.
>
> Yes, I do argue that emacs will need to print a new escape sequence.
> Which is much-much-much-much-much better than having to tell users to
> go into the settings of their macOS Terminal / Konsole /
> gnome-terminal etc. and disable BiDi there, isn't it?

Let me phrase it slightly differently. Emacs will not "need" to print
a new escape sequence, but will have the possibility to do so.

VTE is pretty certainly going to switch its default behavior to what
Konsole, PuTTY, Mlterm, macOS Terminal do now: to perform BiDi on its
contents. This mode is not suitable for Emacs or for any BiDi-aware
text editor.

Similarly to these terminal emulators, GNOME Terminal (and hopefully
other VTE-based frontends) will also most likely have a user setting
to force disable BiDi.

But as opposed to the aforementioned terminals, VTE will also turn off
BiDi upon a designated escape sequence.

VTE is the terminal widget behind several emulator apps, such as GNOME
Terminal, Xfce Terminal, Tilix, Terminator, Guake... I don't have
metrics, but according to various user polls I have the feeling that
VTE's usage share among Linux users is pretty significant, somewhere
in the ballpark of 50%.

Of course Emacs, or any other text editor, can still point its users
to the terminal's setting to disable BiDi. And then if the user also
wishes to have BiDi for "cat", they'll have to keep toggling it back
and forth. Or Emacs can emit the new escape sequence and then it will
be fully automatic.

Which one puts less supporting burden on Emacs's developers and
supporters? Which one is the better for the users? I think the answer
is the same to these two questions, and you sure know which answer I'm
thinking of.

According to this specification, nothing is going to be "worse" than
it already is in those few aforementioned terminal emulators. The new
default behavior will be the same as their behavior. We'll just
further extend it with the possibility of switching back to the old
mode without annoying the user.

I hope this clarifies a lot.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi,

> Arabic terminals and terminal emulators existed at the time of Unicode 1.0.

I haven't found any mention of them, let alone any documentation about them.

> If you are trying to emulate those services, for example so that older 
> software can run, you would need to look at how these programs expected to be 
> fed their data.

My goal is not to have that ancient software run. My goal is to look
into the future. Address the requests often seen in current terminal
emulators' bugtrackers. Stop the utterly unacceptable user experience
of current self-proclaimed BiDi-aware terminals where in order to run
Emacs you need to fiddle with the terminal's settings. Show that BiDi
in terminals is a much more complex story than just shuffling around
the characters, thus stopping new emulators from taking this broken
route which causes about as much damage as good. Create a platform on
top of which modern BiDi-aware experience can be created, to make both
"cat" and "emacs" work properly out of the box for BiDi users.

> I see little reason to reinvent things here, because we are talking about 
> emulating legacy hardware. Or are we not?

As per the above, no, not really. I'm not aware of any hardware that
supported BiDi, was there any? I look at terminal emulators as
extremely powerful tools for getting all kinds of work done. They are
continuously being improved, nowadays many terminal emulators contain
plenty of features that weren't there in any hardware one. I'm looking
to seamlessly extend the terminal emulator experience to the RTL
/ BiDi world.

> It's conceivable, that with modern fonts, one can show some characters that 
> could not be supported on the actual legacy hardware, because that was 
> limited by available character memory and available pre-Unicode character 
> sets. As long as the new characters otherwise fit the paradigm (character per 
> cell) they can be supported without other changes in the protocol beyond 
> change in character set.

Which protocol: the protocol of non-BiDi-aware terminals that lays out
everything from left to right, so the output of "echo", "cat" etc. is
reversed; or the protocol of self-proclaimed BiDi-aware terminals
where it's literally impossible to create a proper BiDi-aware text
editor?

My work focuses on proving that both of these modes are needed, and
how the necessary mode switches could happen automatically behind the
scenes.

> However, I would not expect an emulator to accept data in NFD for example.

Many emulators do.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi,

On Wed, Jan 30, 2019 at 5:31 PM Adam Borowski  wrote:

> The program (emacs in this case) can do arbitrary reordering of characters
> on the grid, it also has lots of information the terminal doesn't.  For
> example, what are you going to do when there's a line longer than what fits
> on the screen?  Emacs will cut and hide part of it; any attempts to reorder
> that paragraph by the terminal are outright broken as you don't _have_ the
> paragraph.  Same for a popup window on the middle of the screen partially
> obscuring some text underneath.

This is absolutely correct so far.

> And if you argue "so make emacs print your
> new code to disable formatting", so do thousands of other programs that are
> less sophisticated than emacs.

Yes, I do argue that emacs will need to print a new escape sequence.
Which is much-much-much-much-much better than having to tell users to
go into the settings of their macOS Terminal / Konsole /
gnome-terminal etc. and disable BiDi there, isn't it?

Could you please give me a brief idea about those "thousands of other
programs" that will need to be adjusted? What other apps can do BiDi?
Not even Vim/NeoVim can do it.

If an app doesn't support BiDi, it's broken anyways when encountering
RTL text. It'll still be broken, just broken differently. Did you mean
all these programs as those thousands?

For ncurses apps there's also a workaround that you could apply:
create a terminfo where the ti/te entries not only switch to/from the
alternate screen but also disable/enable BiDi. In that case all these
thousand ones will be "fixed" (that is: broken in the "old" way rather
than broken in the "new" way).
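
A minimal sketch of that terminfo workaround, under stated assumptions:
the BiDi sequences below are ECMA-48's BDSM mode (CSI 8 l / CSI 8 h),
which is not quoted anywhere in this thread, and the smcup/rmcup values
are the common xterm-style ones. The helper name is hypothetical. It
only builds the strings one would paste into a terminfo source entry,
where \E stands for the escape character.

```python
# Assumed byte sequences, not taken from this thread: CSI 8 l / CSI 8 h
# are ECMA-48's BDSM mode reset/set, standing in for explicit/implicit.
BIDI_EXPLICIT = "\\E[8l"
BIDI_IMPLICIT = "\\E[8h"


def patch_alt_screen_caps(smcup: str, rmcup: str) -> dict[str, str]:
    """Return ti/te (smcup/rmcup) strings that also toggle BiDi."""
    return {
        # Entering the alternate screen: also ask the terminal to stop reordering.
        "smcup": smcup + BIDI_EXPLICIT,
        # Leaving it: re-enable the terminal's BiDi, then restore the normal screen.
        "rmcup": BIDI_IMPLICIT + rmcup,
    }


# Assumed xterm-style alternate screen sequences.
caps = patch_alt_screen_caps("\\E[?1049h", "\\E[?1049l")
print(f"\tsmcup={caps['smcup']}, rmcup={caps['rmcup']},")
```

The printed line could go into a copy of the terminal's terminfo
source before compiling it with tic, so every ncurses app picks the
switch up without being modified.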

On the other hand, what you absolutely can *not* do automatically by
emitting escape sequences at the right times, is to enclose the output
of much lighter utilities like "echo", "cat", "grep", "head" and so on
with any kind of BiDi controls.

> On the other hand, all that the program can output is a sequence of Unicode
> codepoints.  These don't include shaping information

With "presentation form" characters, yes, they can, and they do include
shaping information.

> and are not supposed
> to.  The shaping is explicitly meant to be done by the terminal,

Why?

> and it's
> the terminal who's equipped with _most_ of the needed data

Why? It's the app that knows the context characters, it's the app that
knows the language.

What is it that the terminal knows, but the app doesn't although
should, or what is it that the terminal doesn't know if presentation
form characters are used?

What is it that the app knows but cannot pass to the terminal?
Shouldn't we then extend the protocol so that it can pass these, too?

e.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi,

> Personally, I think we should simply assume that complex script
> shaping is left to the terminal, and if the terminal cannot do that,
> then that's a restriction of working on a text terminal.

I cannot read any of the Arabic, Syriac etc. scripts, but I did lots
of experimenting with picking random Arabic words, displaying in the
terminal their unshaped as well as shaped (using presentation form
characters) variants, and compared them to pango-view's (harfbuzz's)
rendering.

To my eyes the version I got in the terminal with the presentation
form characters matched the "expected" (pango-view) rendering
extremely closely. Of course there are still some tradeoffs due to
fixed-width cells (just as in English, arguably an "i" and "w" having the
same width doesn't look as nice as with proportional fonts). Meanwhile,
the unshaped characters look vastly different.

> OTOH a terminal emulator who wants to perform shaping needs
> information from the application

And the presentation form characters are pretty much exactly that
information, aren't they (for Arabic)?
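
To make this concrete, here is a small sketch using Python's standard
unicodedata module. It shows that each Arabic presentation form code
point records both the base letter and its positional form (isolated,
final, initial, medial) in its compatibility decomposition. The helper
name describe is just for illustration.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return (positional form tag, base letter name) for a presentation form."""
    tag, base = unicodedata.decomposition(ch).split()
    return tag, unicodedata.name(chr(int(base, 16)))


# The four Arabic Presentation Forms-B code points for the letter BEH.
for cp in (0xFE8F, 0xFE90, 0xFE91, 0xFE92):
    print(f"U+{cp:04X}", *describe(chr(cp)))
```

An application that emits U+FE91 instead of U+0628 has thus already
told the terminal which shape it wants, with no context needed.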

> There's nothing you can do here [...] there's no way for the application to 
> provide

Instead of saying that it's not possible, could we perhaps try to
solve it, or at least substantially improve the situation? I mean, for
example we can introduce control characters that specify the language.
We can introduce a flag that tells the terminal whether to do shaping
or not. There are probably plenty of more ideas to be thrown in for
discussion and improvement.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi Eli,

On Wed, Jan 30, 2019 at 5:10 PM Eli Zaretskii  wrote:

> I think the application could use TAB characters to get to the next
> cell, then simplistic reordering would also work.

TAB handling is extremely complicated, because in terminal emulation
TAB is not a character, TAB is a control instruction (like escape
sequences) that moves the cursor (and jumps through the existing
content, if any, without erasing it). Some terminal emulators perform
some magic to remember TABs in certain circumstances, but they cannot
always do so.

There are plenty of other problems, e.g. how they are handled at the
end of line (no, they don't wrap to the next line), how their
positions are user-configurable and not necessarily at every 8th
column etc., I'm not going into these details now if you don't mind,
it's just not a feasible approach.
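
A toy model of a single terminal row may illustrate the point. It is a
deliberate simplification (no wrapping, no attributes, hypothetical
function name), but it shows TAB acting as a cursor movement that
leaves the skipped cells untouched, and that the result depends on
where the tab stops happen to be.

```python
def feed(row: list[str], data: str, tab_stops: set[int]) -> str:
    """Toy model of one terminal row; TAB only moves the cursor."""
    col = 0
    for ch in data:
        if ch == "\t":
            # Jump to the next tab stop (or the last column). Cells that
            # are skipped keep whatever they held before.
            later = [s for s in sorted(tab_stops) if s > col]
            col = later[0] if later else len(row) - 1
        elif col < len(row):
            row[col] = ch
            col += 1
    return "".join(row)


# The row already holds text; "X<TAB>Y" overwrites only two cells.
print(feed(list("abcdefghij"), "X\tY", tab_stops={8}))
# Same input with a user-configured stop at column 4 lands elsewhere.
print(feed(list("abcdefghij"), "X\tY", tab_stops={4}))
```

Nothing in the resulting grid records that a TAB was ever there, which
is why a terminal cannot reliably use TABs as field separators later.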

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi Eli,

> Does anyone know of a terminal emulator which supports isolates?

GNOME Terminal's (VTE's) current work-in-progress implementation does
remember BiDi control characters just like it remembers combining
accents, that is, tied to the preceding letter's cell. It uses FriBidi
1.0 for the BiDi work, so yes, it supports Unicode 6.3's isolates.

There's one significant issue, though. Because we currently just
misuse our existing infrastructure of combining accents for the BiDi
controls, BiDi controls at the very beginning of a paragraph are
dropped. Addressing this issue would need core changes to the terminal
emulation behavior, such as introducing in-between-cells storage, or
zero-width special characters belonging to a cell _before_ the cell's
actual letter, or something like this. I outline one idea in my
specification, but it's subject to discussion to finalize it.
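
The limitation can be sketched with a toy cell model. This is not
VTE's actual code, only an illustration under the assumption that
zero-width characters (combining marks and format characters such as
BiDi controls) are appended to the preceding cell.

```python
import unicodedata


def to_cells(text: str) -> list[str]:
    """Toy model: zero-width characters ride along with the previous cell."""
    cells: list[str] = []
    for ch in text:
        # Mn = combining marks, Cf = format characters such as BiDi controls.
        if unicodedata.category(ch) in ("Mn", "Cf"):
            if cells:
                cells[-1] += ch
            # Otherwise there is no preceding cell to attach to: it is lost.
        else:
            cells.append(ch)
    return cells


RLI, PDI = "\u2067", "\u2069"
print(to_cells("a" + RLI + "b" + PDI))  # RLI is kept, stored in the cell of "a"
print(to_cells(RLI + "b" + PDI))        # RLI is dropped, only PDI survives
```

In the second call the paragraph ends up with an unmatched PDI, which
is exactly the case that needs some storage before the first cell.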

(There's also a less significant issue: copy-pasting fragments of text
probably doesn't produce the contents that make the most sense wrt.
BiDi controls. I'm not sure what other software do here, though.)

Mintty is also actively working on BiDi support, I believe its author
just recently added support for isolates. It uses its own BiDi
implementation.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi Eli,

> It doesn't do _any_ shaping.  Complex script shaping is left to the
> terminal, because it's impossible to do shaping in any reasonable way
> [...]

Partially, you are right. On the other hand, as far as I know, shaping
should take into account the neighboring glyphs even if those are not
visible (e.g. overflow from the viewport), and the terminal is unaware
of what those glyphs are. This is an area that "presentation form"
characters can address for Arabic – although as it was pointed out,
not for Syriac and some others.

I'd say it's subject to further research and improvement to find the
ideal behavior.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

> A formatted table is pretty unsuitable for automated processing, and
> obviously meant for human display.

Could you please clarify what exactly that data looks like? Maybe a
tiny hexdump of an example?

Is the RTL piece of text already stored in visual order, that is,
beginning with the leftmost (last logical) letter of the word? If so
then you can sure display it properly in BiDi-unaware rendering
engines (including most terminal emulators currently, as well as in
"explicit" mode according to my specification). That is, whoever
produces that data reverses that word for you?

Or is the RTL piece of text still in its logical order? Then in what
piece of software does this formatted data show up to you in a
readable way?
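
To illustrate the two storage orders being asked about, here is a
small sketch. It is not the Unicode BiDi algorithm: it only handles the
trivial case of a run made entirely of strong RTL letters, where the
visual order is simply the reverse of the logical one, and it refuses
anything else. The function name is hypothetical.

```python
import unicodedata


def naive_visual(word: str) -> str:
    """Reverse a word, but only if every character is strongly RTL."""
    if not all(unicodedata.bidirectional(c) in ("R", "AL") for c in word):
        raise ValueError("mixed or non-RTL run; this needs the real algorithm")
    return word[::-1]


logical = "\u05e9\u05dc\u05d5\u05dd"  # shin, lamed, vav, final mem
visual = naive_visual(logical)
print([f"U+{ord(c):04X}" for c in logical])  # order as typed and stored
print([f"U+{ord(c):04X}" for c in visual])   # order as laid out left to right
```

A hexdump of the file would show which of the two sequences is
actually stored.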

> You're a terminal emulator maintainer, thus it's natural for you to think
> it's the right place to come up with a solution.

No. I've been a maintainer/developer/contributor to all kinds of
software, including (but not limited to) terminal emulators, apps
running inside terminal emulators, or a pretty complex RTL homepage.
I'm doing my best in looking at the entire ecosystem, and coming up
with a good BiDi-aware interface between terminal emulators and
applications.

> I'd argue that it's not --
> all a terminal emulator can do is to display already formatted text, there's
> no sane way to move things around.

You missed that your use case with this table is not the only possible
use case. There are others where the terminal needs to do BiDi. My
work aims to address multiple use cases at once, yours being one of
them.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

Hi Frédéric,

> I guess Arabic shaping is doable through presentation form characters,
> because the latter are character inherited from legacy standards using
> them in such solutions. But if you want to support other “arabic like”
> scripts (like Syriac, N’ko), or even some LTR complex scripts, like
> Myanmar or Khmer, this “solution” cannot work, because no equivalent of
> “presentation form characters” exists for these scripts

Unfortunately my knowledge ends here, I'm not familiar with shaping
for Syriac and other similar scripts. I'd really appreciate input from
experts here.

I outline in the document problems that arise from the terminal
emulator performing shaping on its contents in "explicit" mode, which
is to be used by Emacs and others. The terminal emulator is not aware
of the characters that are chopped off at the edge of the screen,
required for shaping. The terminal emulator is not aware of which
characters happen to be placed next to each other, but belong to
semantically different UI elements, that is, shouldn't be shaped.

(And as a side note, FriBidi doesn't provide a method for doing
shaping on _visual_ order. I'm unsure about other libraries, and
unsure if there's an algorithm for it at all.)

Honestly, I have no idea how to best address all these problems at
once. This is where we can think of extensions such as "explicit mode
level 2", using control characters that explicitly specify how to shape
certain glyphs. This is subject to further research.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

On Wed, Jan 30, 2019 at 3:32 PM Adam Borowski  wrote:

> > > ╒═══════════╤══════╕
> > > │ filename1 │  123 │
> > > │ FILENAME2 │   17 │
> > > └───────────┴──────┘

> That's possible only if the program in question is running directly attached
> to the tty.  That's not an option if the output is redirected.  Frames in
> a plain text file are a perfectly rational, and pretty widespread, use --
> and your proposal will break them all.  Be it "cat" to the screen, "less" or
> even "mutt" if the text was sent via a mail.

I'd argue that if you have such data stored in a file, with logical
order used in Arabic or Hebrew text, combined with line drawing chars
as you showed, then your data is broken to begin with – broken in the
sense that it's suitable for automated processing (*), but not for
display. I can't think of any utility that would display it properly,
because that's not what the Unicode BiDi algorithm run over this data
produces.

(*) but then line drawing chars are not really a nice choice over CSV,
JSON, whatever.

The only possible choice is for some display engine to be aware that
line drawing characters are part of a "higher level protocol", and
BiDi should be applied only in the lower scope. I don't think the
terminal emulator is the right place to make such decisions – I don't
think any other generic tool (graphical word processor, browser etc.)
does make such a call either.
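
A quick way to see why the algorithm has no notion of table cells is
to look at the BiDi classes of one such row, here with Python's
standard unicodedata module. The sample row is made up for
illustration.

```python
import unicodedata

# One table row in logical order: frame, three Hebrew letters, frame,
# a number, frame.
row = "\u2502 \u05d0\u05d1\u05d2 \u2502 123 \u2502"
classes = [unicodedata.bidirectional(c) for c in row]
print(classes)
```

The frame characters come out as ON (other neutral), the same class as
ordinary punctuation, so their direction is resolved from the
surrounding text rather than acting as a boundary.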

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

Hi Adam,

One more note, to hopefully clarify:

> ╒═══════════╤══════╕
> │ filename1 │  123 │
> │ FILENAME2 │   17 │
> └───────────┴──────┘
>
> I'm afraid there's no good way to do BiDi without support from individual
> programs.

In this particular example, when the output consists of RTL text in
logical order (the emitter does not reorder the characters to their
visual order, nor emit any BiDi controls), combined with line drawing
and such, there is hardly anything we could do purely on the terminal
emulator's side.

I did not consider the possibility of certain characters (e.g. line
drawing ones) being "stop characters", and BiDi to get applied only in
runs of other characters. Any such magic would be arbitrary, fixing a
subset of the cases while causing other unforeseen breakages elsewhere.
E.g. what if someone intentionally uses these characters as
letter-like ones in a BiDi text, like """here I'm talking about the
'└' shaped corner"""... or what if poor man's ASCII pipe and other
symbols are used... it's way too risky to go into any kind of
heuristics.

In this particular case the terminal cannot magically fix the output
for you, you'll need to get the application fixed.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

Hi Adam,

> Even a line is way too big a piece to be safely reordered by the terminal.
> What you propose will break every full-screen program that uses line-drawing
> characters:

Certain terminal emulators already perform BiDi on their lines. They
already break every full-screen program with line-drawing and such, as
you pointed out. What my proposal adds, amongst plenty of other
things, is a means to automatically disable the terminal's BiDi,
rather than having to go to its settings. This way you can automate
the fix of the apps that aren't explicitly fixed, e.g. via wrapper
scripts, or terminfo entries with special ti/te definitions.

> I'm afraid there's no good way to do BiDi without support from individual
> programs.

Depends on the use case.

For complex apps, like text editors, you are right, the terminal
emulator must stay out of the game.

For simple utilities, like "cat" and friends, there's no way you can
implement BiDi support in "cat" itself. Here the terminal needs to do
it.

Your use case with tables is perhaps somewhat in the middle. One
possible approach for the emitting utility is to disable BiDi in the
terminal (switch to "explicit" mode) for the scope of this output.
Another possible approach is to leave the terminal doing BiDi, but
embed all the text fragments in FSI...PDI blocks. (This latter is
subject to a bit of further research, to be exactly specified in a
forthcoming version of the specs.)
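
A rough sketch of that second approach, with hypothetical helper
names: each text fragment is wrapped in FSI (U+2068) and PDI (U+2069)
while the frame characters stay outside. Whether a given terminal then
lays the row out as intended is exactly the part still to be
specified, so treat this as an illustration only.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(fragment: str) -> str:
    """Wrap one text fragment so its direction cannot leak into its neighbours."""
    return f"{FSI}{fragment}{PDI}"


def table_row(cells: list[str]) -> str:
    # The frame characters stay outside the isolates.
    return "\u2502 " + " \u2502 ".join(isolate(c) for c in cells) + " \u2502"


print(table_row(["\u05e9\u05dc\u05d5\u05dd", "123"]))
```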

What is extremely tough here is realizing that there are multiple
conflicting requirements (including the example you gave), and coming
up with a solution that satisfies the needs of all. This is what my
work aims to do.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

Hi Eli,

> My personal experience with bringing BiDi to Emacs led me to a firm
> conclusion that BiDi support by terminal emulators cannot be relied on
> by sophisticated text editing and display applications that are
> BiDi-aware.  The terminal emulator can never be smart enough to do
> what the editing needs require, so the application eventually ends up
> jumping through hoops in order to trick the terminal into doing TRT.
> It is easier to tell users to disable BiDi support of the terminal (if
> it even has one), and do everything in the app.  This is the only way
> of having full control of what is displayed, especially when
> "higher-level protocols" need to be used to tailor the UBA to the need
> of the user, because there's usually no way of asking the terminal to
> apply a behavior which deviates from the UBA.

We are absolutely on the same page here. As long as the use case is
text editing or something similar, it's harmful if the terminal
emulator aims to do any BiDi.

Having to tell users to turn off BiDi in the emulator's settings is in
my firm opinion a user experience no-go. It has to be automatic,
happen under the hood, that is, using escape sequences.

There's another side to the entire BiDi story, though. Simple
utilities like "echo", "cat", "ls", "grep" and so on, line editing
experience of your shell, these kinds. It's absolutely not feasible to
add BiDi support to these utilities. Here the only viable approach is
to have the terminal emulator do it.

Hence, as I confirm ECMA TR/53's realization of 28 years ago, there
have to be two substantially different modes. "Explicit" mode for what
you need for Emacs: the terminal to stay out of the game; and
"implicit" mode where the terminal performs BiDi for the sake of "cat"
and other simple utilities.

I'm also arguing that contrary to TR/53, there's no way to hook up a
mode switch to "cat" and a gazillion of other similar tools. The only
realistically implementable approach is if the "implicit" mode is the
default so that simple utilities provide a proper BiDi experience.
Those very few fullscreen apps that do know what they are doing and do
want the terminal to leave the characters at their designated place
(such as Emacs, Vim etc.) will have to request this "explicit" mode
from the terminal.
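
On the application side the request could look roughly like the sketch
below. The two sequences are an assumption (ECMA-48's BDSM mode, CSI 8
l and CSI 8 h, which this thread does not quote), and the context
manager name is made up.

```python
import io
import sys
from contextlib import contextmanager

# Assumed sequences, not quoted in this thread: ECMA-48 BDSM mode.
EXPLICIT = "\x1b[8l"  # terminal leaves every cell where the app put it
IMPLICIT = "\x1b[8h"  # terminal runs BiDi itself (the proposed default)


@contextmanager
def explicit_bidi(out=sys.stdout):
    """Request explicit mode for a fullscreen app and restore the default after."""
    out.write(EXPLICIT)
    out.flush()
    try:
        yield out
    finally:
        # Runs on normal exit and on exceptions alike.
        out.write(IMPLICIT)
        out.flush()


# Demonstration against an in-memory buffer instead of a real terminal.
buf = io.StringIO()
with explicit_bidi(buf) as tty:
    tty.write("editor draws here")
print(repr(buf.getvalue()))
```

Restoring the default in the finally branch means that "cat" and
friends get terminal-side BiDi back even if the editor crashes.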

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

Hi Eli,

> > In turn, vim, emacs and friends stand there clueless, not knowing
> > how to do BiDi in terminals.
>
> This is inaccurate: [...]

I have to admit, I was somewhat sloppy in the phrasing of this
announcement. My bad, apologies.

Currently some terminal emulators shuffle the characters around for
display purposes, while most don't. There's absolutely no way an
editor (no matter if Emacs or any other) could produce the desired
look on both kinds. I actually present a proof that an editor cannot
always produce the desired look on ones that shuffle their contents
around. So it's a somewhat reasonable expectation to produce the
desired look on ones that don't shuffle their cells.

In the document, more precisely at [1], I evaluate my findings with GNU
Emacs 25.2. (I've just fixed the page to add "GNU", thanks for
pointing this out!)

Brief summary:

- GNU Emacs reshuffles the characters according to the BiDi algorithm,
expecting that the terminal emulator doesn't do any BiDi.

- According to my recommendation, in order to address BiDi in the
entire ecosystem around terminal emulators, the default behavior will
have to be that terminals shuffle the characters around. Don't worry,
there'll be a mode where this shuffling doesn't occur. Emacs (and all
other BiDi-aware text editors) will have to switch to this mode.

- GNU Emacs doesn't do Arabic shaping. In my recommendation I'm arguing that
in this mode, where shuffling the characters is the task of the text
editor and not the terminal, so should it be for Arabic shaping using
presentation form characters.

- When it comes to visually wrapping a line because it doesn't fit in
the current width, Emacs goes its own way which doesn't match what the
Unicode BiDi algorithm says. I'm not saying Emacs's behavior is bad
per se or unreasonable, and it's out of the scope of my work to try to
get it changed, but I'm making a note that it's different.

[1] https://terminal-wg.pages.freedesktop.org/bidi/prior-work/applications.html

cheers,
egmont

Proposal for BiDi in terminal emulators

2019-01-29 Thread Egmont Koblinger via Unicode

Hi,

Terminal emulators are a powerful tool used by many people for various
tasks. Most terminal emulators' bugtrackers have a request to add RTL /
BiDi support. Unicode has supported BiDi for about 20 years now.
Still, the intersection of these two fields isn't solved. Even some
Unicode experts have stated over time that no one knows how to do it
properly.

The only documentation I could find (ECMA TR/53) predates the Unicode
BiDi algorithm, and as such no surprise that it doesn't follow the
current state of the art or best practices.

Some terminal emulators decided to run the BiDi algorithm for display
purposes on their lines (rather than paragraphs, uh), not seeing the big
picture that such a behavior turns them into a platform on top of
which it's literally impossible to implement proper BiDi-aware text
editing (vim, emacs, whatever) experience. In turn, vim, emacs and
friends stand there clueless, not knowing how to do BiDi in terminals.

With about 5 years of experience in terminal emulator development, and
some prior BiDi homepage developing experience with the kind mentoring
of one of the BiDi gurus (Aharon, if you're reading this, hi there!),
I decided to tackle this issue. I studied and evaluated the
aforementioned documentation and the behavior of such terminals,
pointed out the problems, and came up with a draft proposal.

My work isn't complete yet. One of the most important pending issues
is to figure out how to track BiDi control characters (e.g. which
character cells they belong to), it is to be addressed in a subsequent
version. But I sincerely hope I managed to get the basics right and
clean enough so that work can begin on implementing proper support in
terminal emulators as well as fullscreen text applications; and as we
gain experience and feedback, extending the spec to address the
missing bits too.

You can find this (draft) specification at [1]. Feedback is welcome –
if it's an actionable one then preferably over there in the project's
bugtracker.

[1] https://terminal-wg.pages.freedesktop.org/bidi/


cheers,
egmont (GNOME Terminal / VTE co-developer)
