Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Benjamin Riefenstahl via Unicode
Hi Richard,


> Benjamin Riefenstahl wrote:
>> the severe limitations of that environment.

Richard Wordingham writes:
> Eli will probably tell me I'm behind the times, but there are a few
> places where a Gnome-terminal is better than an Emacs GUI window.  One
> is colour highlighting of text found by grep.  Another is that screen
> overwriting doesn't work in an Emacs window.

I have not followed all of this thread, but is that on-topic?  Anyway, I
did not mean to talk about Emacs GUI windows; in my mind they are a
completely different animal from terminal windows.  Where Emacs GUI
windows lack features in their interaction with other programs, people
who care about that are implementing those features.  No theory or
research is necessary beyond understanding the existing codebase.

>> Additional character forms could be added, where the Unicode
>> repertoire is not sufficient.  This could use PUA characters

> You do not need PUA. For U+0756 ARABIC LETTER BEH WITH SMALL V, we
> can form:
>
> Initial form:   200C 0756 200D
> Medial form:    200D 0756 200D
> Final form:     200D 0756 200C
> Isolated form:  200C 0756 200C
>
> The tricky bit is to get the terminal to accept them as cell contents.

If you want the terminal to interpret these sequences, you can just as
well implement shaping as a whole, i.e. interpret any sequence that
needs shaping.  There is no reason for control characters here, I think.
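
For reference, the four sequences quoted above can be generated
mechanically.  The following is only a sketch of that encoding, not
something an existing terminal is known to accept; the function names
are made up for illustration.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER

# Joiner placed before and after the letter for each positional form,
# exactly as in the quoted table.
WRAPPERS = {
    "initial": (ZWNJ, ZWJ),
    "medial": (ZWJ, ZWJ),
    "final": (ZWJ, ZWNJ),
    "isolated": (ZWNJ, ZWNJ),
}


def positional_sequence(letter, form):
    """Wrap a single letter so that it requests the given form."""
    before, after = WRAPPERS[form]
    return before + letter + after


def as_code_points(text):
    """Render a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for form in WRAPPERS:
        seq = positional_sequence("\u0756", form)
        print(f"{form:9} {as_code_points(seq)}")
```

Each cell would then carry three code points instead of one, which is
the part the terminal has to accept.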

I was looking at it from the standpoint of what works now, sending
presentation forms to the terminal, and what could then be a simple means
to extend that mechanism to support more shaping variants.  PUA
characters could work without changes in the terminal emulators
themselves.  You would only need a font that supports those PUA
characters, which is easy if you start from a TrueType font that already
supports that script and thus presumably already has that glyph.  From
my POV that is a very simple technique.


benny


Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Benjamin Riefenstahl via Unicode
Hi Egmont, hi all,


This is an interesting discussion.  If only because I would have
thought that there is only minimal interest by the actual target
audience in supporting these scripts in a terminal, given the severe
limitations of that environment.  The most important limitation seems to
me that a monospaced font must be used, which does not suit most
scripts that do shaping.  On the script level I am familiar with Arabic,
Syriac and Mandaic (I don't actually speak any of these languages, so if
you want a real expert, I am not that person).  Monospaced Arabic
struggles and is not very elegant.  I have not seen solutions for
monospaced Syriac or Mandaic, and I have trouble even imagining them.

OTOH, that inelegance can perhaps be an excuse (or a guide if you
prefer) to make the implementation simpler in other respects, because
expectations should be lower than for a graphical application.

Anyway, as a concrete addition to the discussion, I have a simple Arabic
shaping solution for Emacs on the terminal, especially on the Linux
console, and this discussion finally made me make it public on GitLab,
see https://gitlab.com/cc_benny/termshape.  GitLab CI/CD is activated,
so (mostly) ready-made Emacs packages can be downloaded as build
artifacts.  If anybody wants to discuss this implementation, we should
probably move that discussion somewhere else, like to the Emacs mailing
list (https://lists.gnu.org/mailman/listinfo/emacs-devel).

Some specific technical points from thinking about the problem on my
side:

Presentation forms: Termshape uses the Arabic presentation forms
available and so it is somewhat limited as mentioned by Eli.  Given that
we need to keep the implementation simple anyway, I am not sure that
significantly more is really needed, at least given what Emacs provides
already.  Additional character forms could be added, where the Unicode
repertoire is not sufficient.  This could use PUA characters or other
means like terminal control sequences.  In both cases a common
understanding would be needed between the terminal (or the font used by
it) and the application, outside of Unicode.
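
To illustrate the presentation-forms approach and its limits, here is a
minimal sketch in Python.  It is not the termshape code (which is Emacs
Lisp); it only assumes that the Unicode character database shipped with
Python lists the Arabic Presentation Forms-B block with decompositions
such as "<initial> 0628".  The function names are hypothetical, and
diacritics, lam-alif ligatures and BiDi reordering are left out.

```python
import unicodedata

POSITIONAL_TAGS = {"<isolated>", "<initial>", "<medial>", "<final>"}


def build_form_table():
    """Map base letter -> {form name: presentation-form character}.

    Only single-letter entries from Arabic Presentation Forms-B
    (U+FE70..U+FEFF) are used; ligatures decompose to two letters
    and are skipped here.
    """
    table = {}
    for cp in range(0xFE70, 0xFF00):
        fields = unicodedata.decomposition(chr(cp)).split()
        if len(fields) == 2 and fields[0] in POSITIONAL_TAGS:
            base = chr(int(fields[1], 16))
            table.setdefault(base, {})[fields[0].strip("<>")] = chr(cp)
    return table


def shape(text, table):
    """Replace letters by presentation forms, keeping logical order.

    A letter is assumed to join forward if it has an initial form and
    to join backward if it has a final form.
    """
    out = []
    for i, ch in enumerate(text):
        forms = table.get(ch)
        if forms is None:
            out.append(ch)  # no presentation form known: pass through
            continue
        prev = table.get(text[i - 1]) if i > 0 else None
        nxt = table.get(text[i + 1]) if i + 1 < len(text) else None
        joins_prev = prev is not None and "initial" in prev and "final" in forms
        joins_next = nxt is not None and "final" in nxt and "initial" in forms
        if joins_prev and joins_next:
            key = "medial"
        elif joins_prev:
            key = "final"
        elif joins_next:
            key = "initial"
        else:
            key = "isolated"
        out.append(forms.get(key, ch))
    return "".join(out)
```

A letter such as U+0756 has no entry in this table and passes through
unshaped, which is the gap that PUA characters or control sequences
would have to fill.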

Ligatures: With most shaping one character is transformed into a
character form that still only occupies one cell.  A ligature like
lam-alif OTOH only occupies one cell for two characters, so for
justification etc. the application will have to know that the two
characters together have a width of 1 on the screen.  This is easier if
the application does the selection of ligatures.  If you want to do this
in the terminal, the application would probably need to have some way to
measure the display width of a string, so that it can handle the
situation.  Be prepared though for the application to make quite a lot
of these requests.  For my own main use case for Emacs on a terminal,
display over SSH, that could become a problem.
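
When the application selects the ligatures itself, the width
calculation stays local and needs no round trip to the terminal.  A
rough sketch, assuming that lam-alif is the only ligature in use and
that every other character occupies exactly one cell (the function name
is hypothetical):

```python
LAM = "\u0644"
# Alef variants that form a ligature with a preceding lam.
ALEFS = {"\u0622", "\u0623", "\u0625", "\u0627"}


def cell_width_with_lam_alef(text):
    """Count terminal cells, treating each lam+alef pair as one cell."""
    width = 0
    i = 0
    while i < len(text):
        if text[i] == LAM and i + 1 < len(text) and text[i + 1] in ALEFS:
            i += 2  # two characters share a single cell
        else:
            i += 1
        width += 1
    return width
```

For example, the four characters of "salam" (seen, lam, alef, meem)
come out as three cells.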

Diacritics: The application can know what is a non-spacing character and
what is not.  So it can know that diacritics do not occupy their own
cell and it should be able to ignore whether the terminal supports a
specific diacritic or not.  If the terminal does not support a diacritic
the terminal can either just leave it out or the terminal can mess up
the display more or less irreparably.  In the first case, the worst is
that the user does not see the character, in the second case the
application cannot do anything about it with reasonable effort IMO.
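
How the application can know this on its own: the general category in
the Unicode character database marks non-spacing and enclosing marks as
Mn and Me.  A minimal sketch, assuming every other character takes
exactly one cell (so wide characters and ligatures are ignored here;
the function name is hypothetical):

```python
import unicodedata


def cell_width_ignoring_marks(text):
    """Count cells, giving non-spacing and enclosing marks width 0."""
    return sum(
        0 if unicodedata.category(ch) in ("Mn", "Me") else 1
        for ch in text
    )
```

With this, beh followed by fatha (U+0628 U+064E) counts as one cell,
whether or not the terminal can actually draw the fatha.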

A real problem is a combination of diacritics and ligatures.  Any
diacritic applies to only one character in the ligature, and between the
application and the terminal it is currently not possible to determine
which one.  This is one area where an implementation in the terminal
would clearly have the advantage.  But a terminal control sequence could
also help.  IMO we are talking about a luxury problem here, though.  Do
we want to set as our first goal showing complete Quranic verses in all
their glory, or are we satisfied with everyday Arabic like, say, the
website of a modern Arabic newspaper?


Thanks for your effort and for starting this discussion,
benny