RE: Encoding italic

2019-01-30 Thread Tex via Unicode
David, Asmus,
 
·   “without external standards, then it's simply impossible.”

·   “And without external standard, not interoperable.“

As you both know there are de jure as well as de facto standards. So for years 
people typed : - ) as a smiley without a de facto standard and at some point 
long before emoji, systems began converting these to smiley faces.

Even the utf-8 BOM began as one company’s non-interoperable convention for 
encoding identifier which later became part of the de facto standard.

Ideally interoperability means supported everywhere but we have many useful 
mechanisms that simply don’t do harm without being interpreted.

For example, Unicode relies on this for backward compatibility when it 
introduces new characters, properties, algorithms, et al that are not 
understood by all systems but are tolerated by older ones.

=

While I am at it, I am amused by the arguments earlier in this thread as well 
as other threads, that go:

·   If the feature was needed developers would have implemented it by now. 
It isn’t implemented so the standard doesn’t need it.

·   The feature was implemented without the standard, so we don’t need it 
in the standard.

If men were meant to fly they would have wings…

Apparently, for some, it is only when there are many conflicting 
implementations that a feature demonstrates both that it is a requirement and 
also that it should be standardized.

In fact, this is sometimes not a bad view as it prevents adding features to the 
standard that go unused yet add complexity. 

But, it can also set too high a bar. And often it isn’t a true criteria but 
just resistance to change.

You  don’t need italics. When I went to school we just tilted the terminal a 
few degrees and voila.

(You don’t need a car. When I went to school we walked 6 miles to get there. 
Uphill both ways. J )

tex

 

 

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag 
via Unicode
Sent: Wednesday, January 30, 2019 10:20 PM
To: unicode@unicode.org
Subject: Re: Encoding italic

 

On 1/30/2019 7:46 PM, David Starner via Unicode wrote:

On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode
   wrote:

A new beta of BabelPad has been released which enables input, storing,
and display of italics, bold, strikethrough, and underline in plain-text

 
Okay? Ed can do that too, along with nano and notepad. It's called
HTML (TeX, Troff). If by plain-text, you mean self-interpeting,
without external standards, then it's simply impossible.
 

It's either "markdown" or control/tag sequences. Both are out of band 
information.

And without external standard, not interoperable.

A./



Re: Encoding italic

2019-01-30 Thread James Kass via Unicode



David Starner wrote,

>> ... italics, bold, strikethrough, and underline in plain-text
>
> Okay? Ed can do that too, along with nano and notepad. It's called
> HTML (TeX, Troff). If by plain-text, you mean self-interpeting,
> without external standards, then it's simply impossible.

HTML source files are in plain-text.  Hopefully everyone on this list 
understands that and has already explored the marvelous benefits offered 
by granting users the ability to make exciting and effective page 
layouts via any plain-text editor.  HTML is standard and interchangeable.


As Tex Texin observed, differences of opinion as to where we draw the 
line between text and mark-up are somewhat ideological.  If a compelling 
case for handling italics at the plain-text level can be made, then the 
fact that italics can already be handled elsewhere doesn’t matter.  If a 
compelling case cannot be made, there are always alternatives.


As for use of other variant letter forms enabled by the math 
alphanumerics, the situation exists.  It’s an interesting phenomenon 
which is sometimes worthy of comment and relates to this thread because 
the math alphanumerics include italics.  One of the web pages referring 
to third-party input tools calls the practice “super cool Unicode text 
magic”.




Re: Encoding italic

2019-01-30 Thread Asmus Freytag via Unicode

  
  
On 1/30/2019 7:46 PM, David Starner via
  Unicode wrote:


  On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode
 wrote:

  
A new beta of BabelPad has been released which enables input, storing,
and display of italics, bold, strikethrough, and underline in plain-text

  
  
Okay? Ed can do that too, along with nano and notepad. It's called
HTML (TeX, Troff). If by plain-text, you mean self-interpeting,
without external standards, then it's simply impossible.



It's either "markdown" or control/tag
sequences. Both are out of band information.
And without external standard, not
interoperable.
A./

  



Re: Encoding italic

2019-01-30 Thread David Starner via Unicode
On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode
 wrote:
> A new beta of BabelPad has been released which enables input, storing,
> and display of italics, bold, strikethrough, and underline in plain-text

Okay? Ed can do that too, along with nano and notepad. It's called
HTML (TeX, Troff). If by plain-text, you mean self-interpeting,
without external standards, then it's simply impossible.

-- 
Kie ekzistas vivo, ekzistas espero.


Re: Encoding italic

2019-01-30 Thread Asmus Freytag via Unicode

  
  
On 1/30/2019 4:38 PM, Kent Karlsson via
  Unicode wrote:


  I did say "multiple" and "for instance". But since you ask:

ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control
sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one
is implemented in Cygwin (sorry for mentioning a product name).)



No need to be sorry; we understand that the motivation is not so
  much advertising as giving a concrete example. It would be
  interesting if anything out there implements CMY(K). My
  expectation would be that this would be limited to interfaces for
  printers or their emulators.




  
(The "named" ones, though very popular in terminal emulators, are
all much too stark, I think, and the exact colour for them are
implementation defined.)



Muted colors are something that's become more popular as display
  hardware has improved. Modern displays are able to reproduce these
  both more predictably as well as with the necessary degree of
  contrast (although some users'/designer's fetish for low contrast
  text design is almost as bad as people randomly mixing "stark"
  FG/BG colors in the '90s.)




  

ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which
traditionally does not use bold or italic. Compare those specified for CSS
(https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and
https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style).
These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should
be of interest for the generalised subject of this thread.



Mapping all of these to CSS would be essential if you want this
  stuff to be interoperable.




  

There are some other differences as well, but those are the major ones
with regard to text styling. (I don't know those standards to a tee.
I've just looked at the "m" control sequences for text styling. And yes,
I looked at the free copies...)

/Kent Karlsson

PS
If people insist on that EACH character in "plain text" italic/bold/etc
"controls" be default ignorable: one could just take the control sequences
as specified, but map the printable characters part to the corresponding
tag characters... Not that I think that that is really necessary.

Systems that support "markdown", i.e. simplified markup to
  provide the most main-stream features of rich-text tend to do that
  with printable characters, for a reason. Perhaps two reasons.
Users find it preferable to have a visible fallback when
  "markdown" is not interpreted by a receiving system and users'
  generally like the ability to edit the markdown directly (even if,
  for convenience) there's some direct UI support for adding text
  styling.
Loading up the text with lots of invisible characters that may be
  deleted or copied out of order by someone working on a system that
  neither interprets nor displays these code points is an
  interoperability nightmare in my opinion.




  


Den 2019-01-30 22:24, skrev "Doug Ewell via Unicode" :


  
Kent Karlsson wrote:
 


  Yes, great. But as I've said, we've ALREADY got a
default-ignorable-in-display (if implemented right)
way of doing such things.

And not only do we already have one, but it is also
standardised in multiple standards from different
standards institutions. See for instance "ISO/IEC 8613-6,
Information technology --- Open Document Architecture (ODA)
and Interchange Format: Character content architecture".


 
I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but
has the advantage of not costing me USD 179, and it looks very similar
to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we
are talking about: setting text display properties such as bold and
italics by means of escape sequences.
 
Can you explain how ISO 8613-6 differs from ISO 6429 for what we are
doing, and if it does not, why we should not simply refer to the more
familiar 6429?
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


  
  






  



Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Mark E. Shoulson via Unicode

On 1/30/19 8:58 AM, Egmont Koblinger via Unicode wrote:

There's another side to the entire BiDi story, though. Simple
utilities like "echo", "cat", "ls", "grep" and so on, line editing
experience of your shell, these kinds. It's absolutely not feasible to
add BiDi support to these utilities. Here the only viable approach is
to have the terminal emulator do it.


How will "ls -l" possibly work?  This is an example of the "table" 
layout you were already discussing.


I think us command-line troglodytes just have to deal with not having a 
whole lot of BiDi support.  There's simply no way any terminal emulator 
could possibly know what makes sense and what doesn't for a given line 
of text, coming from some random program.  Your "grep" could be grepping 
from a file with ANY layout, not necessarily one conducive to terminal 
layout, and so on.


~mark



Re: Encoding italic

2019-01-30 Thread Kent Karlsson via Unicode
I did say "multiple" and "for instance". But since you ask:

ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control
sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one
is implemented in Cygwin (sorry for mentioning a product name).)
(The "named" ones, though very popular in terminal emulators, are
all much too stark, I think, and the exact colour for them are
implementation defined.)

ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which
traditionally does not use bold or italic. Compare those specified for CSS
(https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and
https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style).
These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should
be of interest for the generalised subject of this thread.

There are some other differences as well, but those are the major ones
with regard to text styling. (I don't know those standards to a tee.
I've just looked at the "m" control sequences for text styling. And yes,
I looked at the free copies...)

/Kent Karlsson

PS
If people insist on that EACH character in "plain text" italic/bold/etc
"controls" be default ignorable: one could just take the control sequences
as specified, but map the printable characters part to the corresponding
tag characters... Not that I think that that is really necessary.


Den 2019-01-30 22:24, skrev "Doug Ewell via Unicode" :

> Kent Karlsson wrote:
>  
>> Yes, great. But as I've said, we've ALREADY got a
>> default-ignorable-in-display (if implemented right)
>> way of doing such things.
>> 
>> And not only do we already have one, but it is also
>> standardised in multiple standards from different
>> standards institutions. See for instance "ISO/IEC 8613-6,
>> Information technology --- Open Document Architecture (ODA)
>> and Interchange Format: Character content architecture".
>  
> I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but
> has the advantage of not costing me USD 179, and it looks very similar
> to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we
> are talking about: setting text display properties such as bold and
> italics by means of escape sequences.
>  
> Can you explain how ISO 8613-6 differs from ISO 6429 for what we are
> doing, and if it does not, why we should not simply refer to the more
> familiar 6429?
>  
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
> 




Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Asmus Freytag via Unicode

  
  
Arabic terminals and terminal emulators
  existed at the time of Unicode 1.0. If you are trying to emulate
  those services, for example so that older software can run, you
  would need to look at how these programs expected to be fed their
  data.


I see little reason to reinvent things
  here, because we are talking about emulating legacy hardware. Or
  are we not?


It's conceivable, that with modern
  fonts, one can show some characters that could not be supported on
  the actual legacy hardware, because that was limited by available
  character memory and available pre-Unicode character sets. As long
  as the new characters otherwise fit the paradigm (character per
  cell) they can be supported without other changes in the protocol
  beyond change in character set.


However, I would not expect an emulator
  to accept data in NFD for example.


A./





On 1/30/2019 2:02 PM, Richard
  Wordingham via Unicode wrote:


  On Wed, 30 Jan 2019 15:33:38 +0100
Frédéric Grosshans via Unicode  wrote:


  
Le 30/01/2019 à 14:36, Egmont Koblinger via Unicode a écrit :


  - It doesn't do Arabic shaping. In my recommendation I'm arguing
that in this mode, where shuffling the characters is the task of
the text editor and not the terminal, so should it be for Arabic
shaping using presentation form characters.  



I guess Arabic shaping is doable through presentation form
characters, because the latter are character inherited from legacy
standards using them in such solutions.

  
  
So long as you don't care about local variants, e.g. U+0763 ARABIC
LETTER KEHEH WITH THREE DOTS ABOVE.  It has no presentation form
characters.

Basic Arabic shaping, at the level of a typewriter, is straightforward
enough to leave to a terminal emulator, as Eli has suggested.  Lam-alif
would be trickier - one cell or two?


  
But if you want to support
other “arabic like” scripts (like Syriac, N’ko), or even some LTR
complex scripts, like Myanmar or Khmer, this “solution” cannot work,
because no equivalent of “presentation form characters” exists for
these scripts

  
  
I believe combining marks present issues even in implicit modes.  In
implicit mode, one cannot simply delegate the task to normal text
rendering, for one has to allocate text to cells.  There are a number
of complications that spring to mind:

1) Some characters decompose to two characters that may otherwise lay
claim to their own cells:

U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to <06D2,
0654>.  Do you intend that your scheme be usable by Unicode-compliant
processes?

2) 2-part vowels, such as U+0D4A MALAYALAM VOWEL SIGN O, which
canonically decomposes into a preceding combining mark U+0D46 MALAYALAM
VOWEL SIGN E and following combining mark U+0D3E MALAYALAM VOWEL SIGN
AA.

3) Similar 2-part vowels that do not decompose, such as U+17C4 KHMER
VOWEL SIGN OO.  OpenType layout decomposes that into a preceding
'U+17C1 KHMER VOWEL SIGN E' and the second part.

4) Indic conjuncts.
(i) There are some conjuncts, such as Devanagari K.SSA, where a
display as ,  is simply unacceptable.  In some
closely related scripts, this conjunct has the status of a character.

(ii) In some scripts, e.g. Khmer, the virama-equivalent is not an
acceptable alternative to form a consonant stack.  Khmer could
equally well have been encoded with a set of subscript consonants in
the same manner as Tibetan.

(iii) In some scripts, there are marks named as medial consonants
which function in exactly the same way as <'virama', consonant>; it is
silly to render them in entirely different manners.

5) Some non-spacing marks are spacing marks in some contexts.  U+102F
MYANMAR VOWEL SIGN U is probably the best known example.

Richard.



  






  



Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Richard Wordingham via Unicode
On Wed, 30 Jan 2019 15:33:38 +0100
Frédéric Grosshans via Unicode  wrote:

> Le 30/01/2019 à 14:36, Egmont Koblinger via Unicode a écrit :
> > - It doesn't do Arabic shaping. In my recommendation I'm arguing
> > that in this mode, where shuffling the characters is the task of
> > the text editor and not the terminal, so should it be for Arabic
> > shaping using presentation form characters.  
> 
> I guess Arabic shaping is doable through presentation form
> characters, because the latter are character inherited from legacy
> standards using them in such solutions.

So long as you don't care about local variants, e.g. U+0763 ARABIC
LETTER KEHEH WITH THREE DOTS ABOVE.  It has no presentation form
characters.

Basic Arabic shaping, at the level of a typewriter, is straightforward
enough to leave to a terminal emulator, as Eli has suggested.  Lam-alif
would be trickier - one cell or two?

> But if you want to support
> other “arabic like” scripts (like Syriac, N’ko), or even some LTR
> complex scripts, like Myanmar or Khmer, this “solution” cannot work,
> because no equivalent of “presentation form characters” exists for
> these scripts

I believe combining marks present issues even in implicit modes.  In
implicit mode, one cannot simply delegate the task to normal text
rendering, for one has to allocate text to cells.  There are a number
of complications that spring to mind:

1) Some characters decompose to two characters that may otherwise lay
claim to their own cells:

U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to <06D2,
0654>.  Do you intend that your scheme be usable by Unicode-compliant
processes?

2) 2-part vowels, such as U+0D4A MALAYALAM VOWEL SIGN O, which
canonically decomposes into a preceding combining mark U+0D46 MALAYALAM
VOWEL SIGN E and following combining mark U+0D3E MALAYALAM VOWEL SIGN
AA.

3) Similar 2-part vowels that do not decompose, such as U+17C4 KHMER
VOWEL SIGN OO.  OpenType layout decomposes that into a preceding
'U+17C1 KHMER VOWEL SIGN E' and the second part.

4) Indic conjuncts.
(i) There are some conjuncts, such as Devanagari K.SSA, where a
display as ,  is simply unacceptable.  In some
closely related scripts, this conjunct has the status of a character.

(ii) In some scripts, e.g. Khmer, the virama-equivalent is not an
acceptable alternative to form a consonant stack.  Khmer could
equally well have been encoded with a set of subscript consonants in
the same manner as Tibetan.

(iii) In some scripts, there are marks named as medial consonants
which function in exactly the same way as <'virama', consonant>; it is
silly to render them in entirely different manners.

5) Some non-spacing marks are spacing marks in some contexts.  U+102F
MYANMAR VOWEL SIGN U is probably the best known example.

Richard.



  



Re: Encoding italic

2019-01-30 Thread Doug Ewell via Unicode
Kent Karlsson wrote:
 
> Yes, great. But as I've said, we've ALREADY got a
> default-ignorable-in-display (if implemented right)
> way of doing such things.
>
> And not only do we already have one, but it is also
> standardised in multiple standards from different
> standards institutions. See for instance "ISO/IEC 8613-6,
> Information technology --- Open Document Architecture (ODA)
> and Interchange Format: Character content architecture".
 
I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but
has the advantage of not costing me USD 179, and it looks very similar
to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we
are talking about: setting text display properties such as bold and
italics by means of escape sequences.
 
Can you explain how ISO 8613-6 differs from ISO 6429 for what we are
doing, and if it does not, why we should not simply refer to the more
familiar 6429?
 
--
Doug Ewell | Thornton, CO, US | ewellic.org



Re: Encoding italic

2019-01-30 Thread Doug Ewell via Unicode
Martin J. Dürst wrote:
 
> Here's a little dirty secret about these tag characters: They were
> placed in one of the astral planes explicitly to make sure they'd use
> 4 bytes per tag character, and thus quite a few bytes for any actual
> complete tags.
 


Aha. That explains why SCSU had to be banished to the hut, right around
the same time the Plane 14 language tags were deprecated. In SCSU,
astral characters can be 1 byte just like BMP characters.

 
 
--
Doug Ewell | Thornton, CO, US | ewellic.org
 



Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Adam Borowski via Unicode
On Wed, Jan 30, 2019 at 05:56:00PM +0200, Eli Zaretskii via Unicode wrote:
> > - It doesn't do Arabic shaping.
> 
> It doesn't do _any_ shaping.  Complex script shaping is left to the
> terminal, because it's impossible to do shaping in any reasonable way
> without controlling the fonts being used and accessing the font
> information, and this is not possible when you run on a terminal

It's the inverse of the situation with RTL reordering.  The interface
between the program and the terminal is a character cell grid (really, a
sequence of printables and \e-based codes, but that's a technical detail).

The program (emacs in this case) can do arbitrary reordering of characters
on the grid, it also has lots of information the terminal doesn't.  For
example, what are you going to do when there's a line longer than what fits
on the screen?  Emacs will cut and hide part of it; any attempts to reorder
that paragraph by the terminal are outright broken as you don't _have_ the
paragraph.  Same for a popup window on the middle of the screen partially
obscuring some text underneath.  And if you argue "so make emacs print your
new code to disable formatting", so do thousands of other programs that are
less sophisticated than emacs.

On the other hand, all that the program can output is a sequence of Unicode
codepoints.  These don't include shaping information, and are not supposed
to.  The shaping is explicitly meant to be done by the terminal, and it's
the terminal who's equipped with _most_ of the needed data (it might lack
context just outside screen's end or under an overlapped window, but that's
not specific to complex shaping -- same can happen for the other half of a
CJK character).  You know if the font used supports shaping, you can have
access to a graphic view (as opposed to the array of codepoints) -- heck,
it's only you who know the text is rendered on a screen rather than a
Braille device.  And if you miss an opportunity to shape something, the
result is still readable to the user, merely not as good as it could be.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄


Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode
> Date: Wed, 30 Jan 2019 15:49:34 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode 
> 
> I outline in the document problems that arise from the terminal
> emulator performing shaping on its contents in "explicit" mode, which
> is to be used by Emacs and others. The terminal emulator is not aware
> of the characters that are chopped off at the edge of the screen,
> required for shaping. The terminal emulator is not aware of which
> characters happen to be placed next to each other, but belong to
> semantically different UI elements, that is, shouldn't be shaped.
> 
> (And as a side note, FriBidi doesn't provide a method for doing
> shaping on _visual_ order. I'm unsure about other libraries, and
> unsure if there's an algorithm for it at all.)
> 
> Honestly, I have no idea how to best address all these problems at
> once. This is where we can think of extensions "expliti mode level 2",
> use control characters that explicitly specify how to shape certain
> glyphs. This is subject to further research.

Personally, I think we should simply assume that complex script
shaping is left to the terminal, and if the terminal cannot do that,
then that's a restriction of working on a text terminal.  There's
nothing you can do here, because correct shaping requires too many
features that applications running on text terminals cannot use, and
OTOH a terminal emulator who wants to perform shaping needs
information from the application (like the directionality of the text
and its language) that there's no way for the application to provide.


Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode
> Date: Wed, 30 Jan 2019 15:25:32 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode 
> 
> > ╒═══╤══╕
> > │ filename1 │  123 │
> > │ FILENAME2 │   17 │
> > └───┴──┘
> >
> > I'm afraid there's no good way to do BiDi without support from individual
> > programs.
> 
> In this particular example, when the output consists of RTL text in
> logical order (the emitter does not reorder the characters to their
> visual order, nor emit any BiDi controls), combined with line drawing
> and such, there is hardly anything we could do purely on the terminal
> emulator's side.

I think the application could use TAB characters to get to the next
cell, then simplistic reordering would also work.

But in general, yes: this is one of the examples why sophisticated
text-editing applications cannot leave this to the terminal.  (Another
example is handling mouse clicks, if the terminal supports that.)


Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode
> Date: Wed, 30 Jan 2019 15:07:22 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode 
> 
> Another possible approach is to leave the terminal doing BiDi, but
> embed all the text fragments in FSI...PDI blocks.

Does anyone know of a terminal emulator which supports isolates?


Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger 
> Date: Wed, 30 Jan 2019 14:36:42 +0100
> Cc: unicode@unicode.org
> 
> - GNU Emacs reshuffles the characters according to the BiDi algorithm,
> expecting that the terminal emulator doesn't do any BiDi.

Yes, users are told to disable bidi reordering of the terminal, if the
terminal supports that.

> - It doesn't do Arabic shaping.

It doesn't do _any_ shaping.  Complex script shaping is left to the
terminal, because it's impossible to do shaping in any reasonable way
without controlling the fonts being used and accessing the font
information, and this is not possible when you run on a terminal --
the user configures the terminal emulator, and the emulator chooses
the fonts it likes/needs according to that configuration.

(Emacs does support complex script shaping on GUI displays.)

> - When it comes to visually wrapping a line because it doesn't fit in
> the current width, Emacs goes its own way which doesn't match what the
> Unicode BiDi algorithm says.

Yes, this deviation is documented in the Emacs manuals.  The reason
for that is that the Emacs implementation of the UBA reorders
characters on the fly (i.e., it implements a function that can be
called character by character, and returns the next character in the
visual order).  This was done due to a special structure of the Emacs
display engine and efficiency considerations.

In practice, this problem happens very rarely.


Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode
> A formatted table is pretty unsuitable for automated processing, and
> obviously meant for human display.

Could you please clarify how exactly that data looks like? Maybe a
tiny hexdump of an example?

Is the RTL piece of text already stored in visual order, that is,
beginning with the leftmost (last logical) letter of the word? If so
then you can sure display it properly in BiDi-unaware rendering
engines (including most terminal emulators currently, as well as in
"explicit" mode according to my specification). That is, whoever
produces that data reverses that word for you?

Or is the RTL piece of text still in its logical order? Then in what
piece of software does this formatted data show up to you in a
readable way?

> You're a terminal emulator maintainer, thus it's natural for you to think
> it's the right place to come up with a solution.

No. I've been a maintainer/developer/contributor to all kinds of
software, including (but not limited to) terminal emulators, apps
running inside terminal emulators, or a pretty complex RTL homepage.
I'm doing my best in looking at the entire ecosystem, and coming up
with a good BiDi-aware interface between terminal emulators and
applications.

> I'd argue that it's not --
> all a terminal emulator can do is to display already formatted text, there's
> no sane way to move things around.

You missed that your use case with this table is not the only possible
use case. There are others where the terminal needs to do BiDi. My
work aims to address multiple use cases at once, yours being one of
them.


cheers,
egmont


Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Adam Borowski via Unicode
On Wed, Jan 30, 2019 at 03:43:10PM +0100, Egmont Koblinger via Unicode wrote:
> On Wed, Jan 30, 2019 at 3:32 PM Adam Borowski  wrote:
> 
> > > > ╒═══╤══╕
> > > > │ filename1 │  123 │
> > > > │ FILENAME2 │   17 │
> > > > └───┴──┘
> 
> > That's possible only if the program in question is running directly attached
> > to the tty.  That's not an option if the output is redirected.  Frames in
> > a plain text file are a perfectly rational, and pretty widespread, use --
> > and your proposal will break them all.  Be it "cat" to the screen, "less" or
> > even "mutt" if the text was sent via a mail.
> 
> I'd argue that if you have such a data stored in a file, with logical
> order used in Arabic or Hebrew text, combined with line drawing chars
> as you showed, then your data is broken to begin with – broken in the
> sense that it's suitable for automated processing (*), but not for
> display.

A formatted table is pretty unsuitable for automated processing, and
obviously meant for human display.

> (*) but then line drawing chars are not really a nice choice over CSV,
> JSON, whatever.

That's why you use CSV and JSON for machine-readable, plain text for humans,
and XML for neither.

> The only possible choice is for some display engine to be aware that
> line drawing characters are part of a "higher level protocol", and
> BiDi should be applied only in the lower scope. I don't think the
> terminal emulator is the right place to make such decisions

At this point, required information is lost.  Any transformations such as
RTL reordering needs to be done earlier, when you still see _unformatted_
version of the data.

You're a terminal emulator maintainer, thus it's natural for you to think
it's the right place to come up with a solution.  I'd argue that it's not --
all a terminal emulator can do is to display already formatted text, there's
no sane way to move things around.  Any changes need to be localized -- for
example, you can do ligatures only if you keep total length unchanged.  Ie,
the terminal emulator is the right layer for things like complex script
shaping, but not RTL reordering.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄


Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode
Hi Frédéric,

> I guess Arabic shaping is doable through presentation form characters,
> because the latter are character inherited from legacy standards using
> them in such solutions. But if you want to support other “arabic like”
> scripts (like Syriac, N’ko), or even some LTR complex scripts, like
> Myanmar or Khmer, this “solution” cannot work, because no equivalent of
> “presentation form characters” exists for these scripts

Unfortunately my knowledge ends here, I'm not familiar with shaping
for Syriac and other similar scripts. I'd really appreciate input from
experts here.

I outline in the document problems that arise from the terminal
emulator performing shaping on its contents in "explicit" mode, which
is to be used by Emacs and others. The terminal emulator is not aware
of the characters that are chopped off at the edge of the screen,
required for shaping. The terminal emulator is not aware of which
characters happen to be placed next to each other, but belong to
semantically different UI elements, that is, shouldn't be shaped.

(And as a side note, FriBidi doesn't provide a method for doing
shaping on _visual_ order. I'm unsure about other libraries, and
unsure if there's an algorithm for it at all.)

Honestly, I have no idea how to best address all these problems at
once. This is where we can think of extensions "expliti mode level 2",
use control characters that explicitly specify how to shape certain
glyphs. This is subject to further research.

cheers,
egmont



Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode
On Wed, Jan 30, 2019 at 3:32 PM Adam Borowski  wrote:

> > > ╒═══╤══╕
> > > │ filename1 │  123 │
> > > │ FILENAME2 │   17 │
> > > └───┴──┘

> That's possible only if the program in question is running directly attached
> to the tty.  That's not an option if the output is redirected.  Frames in
> a plain text file are a perfectly rational, and pretty widespread, use --
> and your proposal will break them all.  Be it "cat" to the screen, "less" or
> even "mutt" if the text was sent via a mail.

I'd argue that if you have such a data stored in a file, with logical
order used in Arabic or Hebrew text, combined with line drawing chars
as you showed, then your data is broken to begin with – broken in the
sense that it's suitable for automated processing (*), but not for
display. I can't think of any utility that would display it properly,
because that's not what the Unicode BiDi algorithm run over this data
produces.

(*) but then line drawing chars are not really a nice choice over CSV,
JSON, whatever.

The only possible choice is for some display engine to be aware that
line drawing characters are part of a "higher level protocol", and
BiDi should be applied only in the lower scope. I don't think the
terminal emulator is the right place to make such decisions – I don't
think any other generic tool (graphical word processor, browser etc.)
does make such a call either.

cheers,
egmont



Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Frédéric Grosshans via Unicode

Le 30/01/2019 à 14:36, Egmont Koblinger via Unicode a écrit :

- It doesn't do Arabic shaping. In my recommendation I'm arguing that
in this mode, where shuffling the characters is the task of the text
editor and not the terminal, so should it be for Arabic shaping using
presentation form characters.


I guess Arabic shaping is doable through presentation form characters, 
because the latter are character inherited from legacy standards using 
them in such solutions. But if you want to support other “arabic like” 
scripts (like Syriac, N’ko), or even some LTR complex scripts, like 
Myanmar or Khmer, this “solution” cannot work, because no equivalent of 
“presentation form characters” exists for these scripts





Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Adam Borowski via Unicode
On Wed, Jan 30, 2019 at 03:25:32PM +0100, Egmont Koblinger via Unicode wrote:
> One more note, to hopefully clarify:
> > ╒═══╤══╕
> > │ filename1 │  123 │
> > │ FILENAME2 │   17 │
> > └───┴──┘
> >
> > I'm afraid there's no good way to do BiDi without support from individual
> > programs.
> 
> In this particular example, when the output consists of RTL text in
> logical order (the emitter does not reorder the characters to their
> visual order, nor emit any BiDi controls), combined with line drawing
> and such, there is hardly anything we could do purely on the terminal
> emulator's side.
> 
> I did not consider the possibility of certain characters (e.g. line
> drawing ones) being "stop characters", and BiDi to get applied only in
> runs of other characters. Any such magic would be arbitrary, fix a
> subset of the cases while cause other unforeseen breakages elsewhere.
> E.g. what if someone intentionally uses these characters as
> letter-like ones in a BiDi text, like """here I'm talking about the
> '└' shaped corner"""... or what if poor man's ASCII pipe and other
> symbols are used... it's way too risky to go into any kind of
> heuristics.
> 
> In this particular case the terminal cannot magically fix the output
> for you, you'll need to get the application fixed.

That's possible only if the program in question is running directly attached
to the tty.  That's not an option if the output is redirected.  Frames in
a plain text file are a perfectly rational, and pretty widespread, use --
and your proposal will break them all.  Be it "cat" to the screen, "less" or
even "mutt" if the text was sent via a mail.

I don't really see a possibility to do that on the terminal's side, any RTL
reordering would need to be done by the program in question.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄


Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode
Hi Adam,

One more note, to hopefully clarify:

> ╒═══╤══╕
> │ filename1 │  123 │
> │ FILENAME2 │   17 │
> └───┴──┘
>
> I'm afraid there's no good way to do BiDi without support from individual
> programs.

In this particular example, when the output consists of RTL text in
logical order (the emitter does not reorder the characters to their
visual order, nor emit any BiDi controls), combined with line drawing
and such, there is hardly anything we could do purely on the terminal
emulator's side.

I did not consider the possibility of certain characters (e.g. line
drawing ones) being "stop characters", and BiDi to get applied only in
runs of other characters. Any such magic would be arbitrary, fix a
subset of the cases while cause other unforeseen breakages elsewhere.
E.g. what if someone intentionally uses these characters as
letter-like ones in a BiDi text, like """here I'm talking about the
'└' shaped corner"""... or what if poor man's ASCII pipe and other
symbols are used... it's way too risky to go into any kind of
heuristics.

In this particular case the terminal cannot magically fix the output
for you, you'll need to get the application fixed.

cheers,
egmont



Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode
Hi Adam,

> Even a line is way too big a piece to be safely reordered by the terminal.
> What you propose will break every full-screen program that uses line-drawing
> characters:

Certain terminal emulators already perform BiDi on their lines. They
already break every full-screen program with line-drawing and such, as
you pointed out. What my proposal adds, amongst plenty of other
things, is a means to automatically disable the terminal's BiDi,
rather than having to go to its settings. This way you can automate
the fix of the apps that aren't explicitly fixed, e.g. via wrapper
scripts, or terminfo entries with special ti/te definitions.

> I'm afraid there's no good way to do BiDi without support from individual
> programs.

Depends on the use case.

For complex apps, like text editors, you are right, the terminal
emulator must stay out of the game.

For simple utilities, like "cat" and friends, there's no way you can
implement BiDi support in "cat" itself. Here the terminal needs to do
it.

Your use case with tables is perhaps somewhat in the middle. One
possible approach for the emitting utility is to disable BiDi in the
terminal (switch to "explicit" mode) for the scope of this output.
Another possible approach is to leave the terminal doing BiDi, but
embed all the text fragments in FSI...PDI blocks. (This latter is
subject to a bit of further research, to be exactly specified in a
forthcoming version of the specs.)

What is extremely tough here is realizing that there are multiple
conflicting requirements (including the example you gave), and coming
up with a soluiton that satisfies the needs of all. This is what my
work aims to do.

cheers,
egmont


Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode
Hi Eli,

> My personal experience with bringing BiDi to Emacs led me to a firm
> conclusion that BiDi support by terminal emulators cannot be relied on
> by sophisticated text editing and display applications that are
> BiDi-aware.  The terminal emulator can never be smart enough to do
> what the editing needs require, so the application eventually ends up
> jumping through hoops in order to trick the terminal into doing TRT.
> It is easier to tell users to disable BiDi support of the terminal (if
> it even has one), and do everything in the app.  This is the only way
> of having full control of what is displayed, especially when
> "higher-level protocols" need to be used to tailor the UBA to the need
> of the user, because there's usually no way of asking the terminal to
> apply a behavior which deviates from the UBA.

We are absolutely on the same page here. As long as the use case is
text editing or something similar, it's harmful if the terminal
emulator aims to do any BiDi.

Having to tell users to turn off BiDi in the emulator's settings is in
my firm opinion a user experience no-go. It has to be automatic,
happen under the hood, that is, using escape sequences.

There's another side to the entire BiDi story, though. Simple
utilities like "echo", "cat", "ls", "grep" and so on, line editing
experience of your shell, these kinds. It's absolutely not feasible to
add BiDi support to these utilities. Here the only viable approach is
to have the terminal emulator do it.

Hence, as I confirm ECMA TR/53's realization of 28 years ago, there
have to be two substantially different modes. "Explicit" mode for what
you need for Emacs: the terminal to stay out of the game; and
"implicit" mode where the terminal performs BiDi for the sake of "cat"
and other simple utiltiies.

I'm also arguing that contrary to TR/53, there's no way to hook up a
mode switch to "cat" and a gazillion of other similar tools. The only
reaslisticly implementable approach is if the "implicit" mode is the
default so that simple utilities provide a proper BiDi experience.
Those very few fullscreen apps that do know what they are doing and do
want the terminal to leave the characters at their designated place
(such as Emacs, Vim etc.) will have to request this "explicit" mode
from the terminal.

cheers,
egmont


Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode
Hi Eli,

> > In turn, vim, emacs and friends stand there clueless, not knowing
> > how to do BiDi in terminals.
>
> This is inaccurate: [...]

I have to admit, I was somewhat sloppy in the phrasing of this
announcement. My bad, apologies.

Currently some terminal emulators shuffle the characters around for
display purposes, while most don't. There's absolutely no way an
editor (no matter if Emacs or any other) could produce the desired
look on both kinds. I actually present a proof that an editor cannot
always produce the desired look on ones that shuffle their contents
around. So it's a somewhat reasonable expectation to produce the
desired look on ones that don't shuffle their cells.

In the document, more precisely at [1] I evalute my findings with GNU
Emacs 25.2. (I've just fixed the page to add "GNU", thanks for
pointing this out!)

Brief summary:

- GNU Emacs reshuffles the characters according to the BiDi algorithm,
expecting that the terminal emulator doesn't do any BiDi.

- According to my recommendation, in order to address BiDi in the
entire ecosystem around terminal emulators, the default behavior will
have to be that terminals shuffle the characters around. Don't worry,
there'll be a mode where this shuffling doesn't occur. Emacs (and all
other BiDi-aware text editors) will have to switch to this mode.

- It doesn't do Arabic shaping. In my recommendation I'm arguing that
in this mode, where shuffling the characters is the task of the text
editor and not the terminal, so should it be for Arabic shaping using
presentation form characters.

- When it comes to visually wrapping a line because it doesn't fit in
the current width, Emacs goes its own way which doesn't match what the
Unicode BiDi algorithm says. I'm not saying Emacs's behavior is bad
per se or unreasonable, and it's out of the scope of my work to try to
get it changed, but I'm making a note that it's different.

[1] https://terminal-wg.pages.freedesktop.org/bidi/prior-work/applications.html

cheers,
egmont