Re: Encoding italic

2019-02-09 Thread James Kass via Unicode



Martin J. Dürst wrote,

>> Isn't that already the case if one uses variation sequences to choose
>> between Chinese and Japanese glyphs?
>
> Well, not necessarily. There's nothing prohibiting a font that includes
> both Chinese and Japanese glyph variants.

Just as there’s nothing prohibiting a single font file from including 
both roman and italic variants of Latin characters.




Re: Encoding italic

2019-02-09 Thread Martin J . Dürst via Unicode
On 2019/02/09 19:58, Richard Wordingham via Unicode wrote:
> On Fri, 8 Feb 2019 18:08:34 -0800
> Asmus Freytag via Unicode  wrote:

>> Under the implicit assumptions bandied about here, the VS approach
>> thus reveals itself as a true rich-text solution (font switching)
>> albeit realized with pseudo coding rather than markup, markdown or
>> escape sequences.
> 
> Isn't that already the case if one uses variation sequences to choose
> between Chinese and Japanese glyphs?

Well, not necessarily. There's nothing prohibiting a font that includes 
both Chinese and Japanese glyph variants.

Regards,   Martin.



Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Richard Wordingham via Unicode
On Sun, 10 Feb 2019 00:59:46 +0100
Egmont Koblinger via Unicode  wrote:

> Is there such a monospace font obeying wcwidth (that is: double wide
> character for when a spacing mark is combined) for Devanagari?

For CV, that would correspond to a Hindi typewriter, so the odds look
good. The Remington keyboard layout is taken from the typewriter
design.  However, the typewriter had non-spacing keys for repha
(roughly ) and vattu (), so you'll be out of luck
for consonant clusters.  On the other hand,  is two
key strokes - the cells would be for  and
!  There's an implementation of the keyboard in the M17N
database - hi-remington.mim.

> Is there a monospace font for Arabic,

Apart from wcwidth("لآ") = ‎2, Khaled has already said in this thread
that there are such fonts.

> for Syriac, etc.? (How much do these questions make sense at all?)

Perfect sense.

Richard.



Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Richard Wordingham via Unicode
On Sat, 9 Feb 2019 18:42:52 +0100
Egmont Koblinger via Unicode  wrote:


> The
> problem that I don't know how to address is: What if harfbuzz tells us
> that the overall width for rendering a particular grapheme cluster is
> significantly different from its designated area (the number of
> character cells [wcswidth()] multiplied by the width of each)?

You have to reduce the width of the glyph used.  The tricky bit is
where the glyph deliberately overhangs or underlies a neighbouring
glyph.  Almost a good example of this is U+0E33 THAI CHARACTER SARA AM,
whose nikkhahit component typically overhangs the previous
character; however, ink beyond the left limit should not be a problem
for LTR scripts. Which side do you align RTL cells on?

Now, you might want to treat U+0E33 as interacting with its
predecessor, because it does. The test word is น้ำ 'water'.

Richard.



Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Richard Wordingham via Unicode
On Sat, 9 Feb 2019 22:29:31 +0100
Adam Borowski via Unicode  wrote:

> On Sat, Feb 09, 2019 at 10:01:21PM +0200, Eli Zaretskii via Unicode
> wrote:

> > I don't know.  Maybe it keeps a database of character combinations
> > that need shaping, each one with the maximum width on display the
> > result can occupy.  Or maybe it does something else.  If it cannot,
> > and the terminal cannot either, then what you say is that some
> > scripts can never be supported by text terminals.  
> 
> That's doable even within the current rules, where every codepoint
> bears a wcwidth of 0, 1 or 2.  A cluster made of codepoints a ' b c d
> " ^ (where a b c d have widths 1 while ' " ^ widths 0) needs to be
> rendered in exactly 4 cells.  This may force stretching or condensing
> the shaped cluster compared to what usual typography would demand but
> that's in no way different from stretching Latin "i" or condensing
> "W".

It would be helpful if overlong shapings were condensed automatically.

The general principle that functions work better on strings applies
here.  There are two obvious situations where the additive formulae
break down.

(a) Emoji should, should they not, occupy at least 2 cells.  There are
a few problem sequences, such as  (or is
wcwidth(0x20E3) equal to 1?).

(b) Brahmi-like Indic scripts.  In many of these, the combination of a
virama or invisible stacker and a base consonant acts like a combining
mark, either causing no advance or as a mark with a very slight width.
Examples include Grantha, Myanmar, Tai Tham and Khmer.

Stretching a stack of 3 or 4 consonants to occupy 3 or 4 cells instead
of 1 would be worse than stretching 'i'.  If you do it, you want fonts
that adjust the glyphs accordingly, just as for 'i'.
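
The breakdown described in (b) can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: `codepoint_width` is a hypothetical stand-in for wcwidth() built on the `unicodedata` module, and `STACKERS` is an illustrative subset of conjoining signs (Khmer coeng, Myanmar virama, Tai Tham sakot), not a complete list. `stacked_width` treats a consonant that directly follows a stacker as adding no cell.

```python
import unicodedata

# Illustrative subset of conjoining signs (an assumption, not a full list):
# KHMER SIGN COENG, MYANMAR SIGN VIRAMA, TAI THAM SIGN SAKOT.
STACKERS = {"\u17d2", "\u1039", "\u1a60"}

def codepoint_width(ch: str) -> int:
    """Rough stand-in for wcwidth(): 0, 1 or 2 cells per code point."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # nonspacing, enclosing and format characters
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide / Fullwidth
    return 1

def additive_width(text: str) -> int:
    """Per-code-point sum, the additive formula."""
    return sum(codepoint_width(ch) for ch in text)

def stacked_width(text: str) -> int:
    """String-level width: a character right after a stacker adds no cell."""
    total = 0
    prev = ""
    for ch in text:
        if prev not in STACKERS:
            total += codepoint_width(ch)
        prev = ch
    return total

# Khmer: SA, COENG, TA, COENG, RO, VOWEL SIGN II
# (three consonants joined by two coeng signs, plus a vowel sign)
word = "\u179f\u17d2\u178f\u17d2\u179a\u17b8"
print(additive_width(word), stacked_width(word))  # prints "3 1"
```

Whether one cell is really enough for such a stack depends on the font, which is the point made above about wanting glyphs that adjust accordingly.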

Richard.


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode
Hi,

On Sun, Feb 10, 2019 at 12:52 AM Richard Wordingham via Unicode
 wrote:

> This is an example of where one needs a font designed for terminal
> emulators.

Definitely, this is another approach I forgot to mention in my mail,
rather than VTE switching to harfbuzz and figuring out all the issues.
This approach would also make them usable in every decent terminal
emulator at once, not just VTE.

Is there such a monospace font obeying wcwidth (that is: double wide
character for when a spacing mark is combined) for Devanagari? Is
there a monospace font for Arabic, for Syriac, etc.? (How much do
these questions make sense at all?)

If there are such fonts, I'd be happy to use them for testing.

e.


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Richard Wordingham via Unicode
On Sat, 9 Feb 2019 22:31:37 +0100
Egmont Koblinger via Unicode  wrote:

> Let's take the Devanagari improvement of the other day. Until now,
> there were plenty of dotted circles shown, and combining spacing marks
> that should've been placed before the letter were placed after the
> letter, before a placeholder dotted circle. Now they are displayed as
> expected: the combining spacing mark shows up before the letter (if
> it's of that kind), and no dotted circle. The letter + spacing marks
> now show up correctly. The entire word still doesn't, e.g. there are
> often spaces between letters where the upper line connecting them
> should be continuous.

This is an example of where one needs a font designed for terminal
emulators.

Richard.


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Richard Wordingham via Unicode
On Sat, 9 Feb 2019 13:02:55 -0800
"Asmus Freytag (c) via Unicode" wrote:

> To force Hindi crosswords mode you need to segment the string into
> syllables, each having a variable number of characters, and then assign
> a single display position to them. Now some syllables are wider than
> others, so you could use the single/double width paradigm. The result
> may be somewhat legible for Devanagari, but even some of the closely
> related scripts may not fit that well.

It is also possible that whole syllables are used because there are
vertical words.

> To give you an idea, here is an Arabic crossword. It uses the isolated
> shape of all letters and writes all words unconnected. That's two
> things that may be acceptable for a puzzle, but not for text output.
> 
> http://www.everyday-arabic.com/2013/12/crossword1.html
> 
> (try typing 3 vertical as a word to see the difference - it's 4x
> U+062A)

Crosswords suffer from the need to be read vertically as well as
horizontally.  Can Arabic naturally be written vertically?

In any case, Arabic typewriters exist and, so far as I understand,
work.  The problem rather seems to be one of standardising the
Procrustean technique to be used.  It seems from what Khaled Hosny
wrote that monospace for letters is the usual solution already. 

The design difficulty for Arabic is rather that horizontal adjacency
may sometimes need to be treated as accidental rather than as an
invitation to cursively join.

Richard.


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Asmus Freytag (c) via Unicode

On 2/9/2019 1:40 PM, Egmont Koblinger wrote:

> On Sat, Feb 9, 2019 at 10:10 PM Asmus Freytag via Unicode
> wrote:
>
> > > I hope though that all the scripts can be supported with more or less
> > > compromises, e.g. like it would appear in a crossword. But maybe not.
> >
> > See other messages: not.
>
> For the crossword analogy, I can see why it's not good. But this
> doesn't mean there aren't any other ideas we could experiment with.



"all...scripts" is the issue.  We know how to handle text for all 
scripts and what complexities one has to account for in order to do 
that. You can back off some corner cases or (slightly) degrade things,
but even after you are done with that, there will be scripts where the
"more or less compromises" forced by the design parameters you gave will
mean an utterly unacceptable display.


That said, there are scripts that had "passable" typewriter
implementations and it may be possible to tweak things to approach that
level of support. Don't know for sure, it depends on the details for each
script.





> Or do you mean to say that because it can't be made perfect, there's
> no point at all in partially improving? I don't think I agree with
> that.



It's more a question of being upfront with your goal.

At this point I understand it as accepting some design parameters as 
fundamental and seeing whether there are some tweaks that allow more 
scripts to work with or to "survive" given the constraints.


That's not a totally useless effort, but it is a far cry from Unicode's 
universal support for ALL writing systems.


A./

PS: also we have been seriously hijacking a thread related to bidi




> e.





Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode
On Sat, Feb 9, 2019 at 10:10 PM Asmus Freytag via Unicode
 wrote:

> > I hope though that all the scripts can be supported with more or less
> > compromises, e.g. like it would appear in a crossword. But maybe not.
>
> See other messages: not.

For the crossword analogy, I can see why it's not good. But this
doesn't mean there aren't any other ideas we could experiment with.

Or do you mean to say that because it can't be made perfect, there's
no point at all in partially improving? I don't think I agree with
that.



e.


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode
Hi Asmus,

On Sat, Feb 9, 2019 at 10:02 PM Asmus Freytag (c)  wrote:

> are you excluding CJK because of the difficulty handling a large
> repertoire with mechanical means?

No, I excluded CJK because they're pretty well solved in terminals,
and nowhere near along the lines of how they work with typewriters.

I should've probably said "letter based" scripts or whatever, I'm not
familiar with the exact terminologies.

> To force Hindi crosswords mode you need to segment the string into syllables,
> each having a variable number of characters [...]

Thanks a lot to you too for your detailed explanation!

> Are you defining as your goal to have some kind of "line by line" display that
> can survive any Unicode text thrown at it, or are you trying to extend a given
> design with rather specific limitations, so that it survives / can be used
> with, just a few more scripts than European + CJK?

I don't have a clearly defined goal. I find fun in developing VTE (and
slightly improving other terminal emulators too by spreading ideas,
knowledge, comments etc.), addressing various kinds of goals, whatever
happens to come next. At this point it's BiDi, with a bit of
Devanagari improvement sneaking in the other day.

What is clear to me: I cannot redefine the basics of terminal
emulation. I can only add incremental improvements to whatever it
already is, and I have to make sure that the ecosystem built around it
during decades (all the screen handling libraries and applications)
doesn't break. I'm limited by these constraints.

> The discrepancies would be more like throwing random blank spaces in the
> middle of every word, writing letters out of order, or overprinting. So, more
> fundamental, not just "not perfect".

Let's take the Devanagari improvement of the other day. Until now,
there were plenty of dotted circles shown, and combining spacing marks
that should've been placed before the letter were placed after the
letter, before a placeholder dotted circle. Now they are displayed as
expected: the combining spacing mark shows up before the letter (if
it's of that kind), and no dotted circle. The letter + spacing marks
now show up correctly. The entire word still doesn't, e.g. there are
often spaces between letters where the upper line connecting them
should be continuous.

Eventually HarfBuzz could help, but it's just not yet clear how
exactly. I cannot essentially change the underlying model of fixed
width cells. On top of this model, though, we can experiment with
various ideas about displaying. For example, if a word occupies 7
columns in the model, then HarfBuzz renders it, and the rendered
version occupies the width of 8.6 columns, maybe we can squeeze it
using a trivial linear transformation? I'm not sure, but maybe it's an
idea worth investigating. Won't look perfect, but probably will look
better than what we do currently. We already have column spacing
implemented, to pull the columns further apart from each other by a
fixed amount (mostly for accessibility purposes), maybe a user can use
this feature to make more room for a nicely rendered, non-squeezed
Devanagari text.
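
The linear squeeze floated above can be sketched in a few lines of Python. This is a minimal sketch, not VTE's code: `horizontal_scale` and its parameter names are hypothetical, widths are in pixels, and the shaped width is assumed to come from a shaping engine such as HarfBuzz. The sketch only squeezes overlong runs and leaves narrower ones unscaled.

```python
def horizontal_scale(columns: int, cell_width_px: float,
                     shaped_width_px: float) -> float:
    """Scale factor that makes a shaped run fit its designated cells.

    Returns 1.0 when the run already fits; this sketch never stretches.
    """
    if shaped_width_px <= 0:
        return 1.0
    designated_px = columns * cell_width_px
    return min(1.0, designated_px / shaped_width_px)

# A word designated 7 columns whose shaped width is 8.6 cell widths.
cell = 10.0
print(round(horizontal_scale(7, cell, 8.6 * cell), 3))  # prints 0.814
```

The returned factor would be applied as a horizontal scaling of the rendered run, so the word ends exactly where the cell model says it should.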

> To give you an idea, here is an Arabic crossword. It uses the isolated shape of
> all letters and writes all words unconnected. That's two things that may be
> acceptable for a puzzle, but not for text output.

You can't get nice Arabic without first making sure the order of the
letters is the correct one, not reversed. :-) That's what my current
work is about.

As per Richard's feedback, I also see that shaping needs to be done
differently than I had thought. Mind you, my visual inspection of what
the non-preferred shaping approach gave to me vs. what a proper
HarfBuzz rendering gave (for Arabic) were extremely close to each
other, something that I'd probably consider "good enough" if I spoke
the language and were aware of the terminal's constraints. Well,
definitely a major improvement over what we have.

> You may begin to see the limitations and that they may well prevent you from
> reaching even your limited goal for speakers of at least three of the top ten
> languages worldwide.

If the goal is to have perfect rendering without compromises: sure I
won't reach that. (It's not a goal for me. For perfect rendering,
users should get away from terminals.) If the goal is to have
something reasonably good, better than what we have currently, I can't
see why not.


cheers,
e.


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Adam Borowski via Unicode
On Sat, Feb 09, 2019 at 10:01:21PM +0200, Eli Zaretskii via Unicode wrote:
> > From: Egmont Koblinger 
> > Date: Sat, 9 Feb 2019 20:36:50 +0100
> > Cc: Richard Wordingham , 
> > unicode Unicode Discussion 
> > 
> > On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii  wrote:
> > 
> > > That's the application's problem, not the terminal's.  An application
> > > that wants its column to line up _and_ wants to support complex text
> > > scripts will need to move cursor to certain coordinates, not to assume
> > > that 7 codepoints always take 7 columns on display.

It must know that those particular 7 codepoints take, say, 5 columns when
written together in a sequence.  And it can't possibly ask the terminal,
either -- it might be on a link that doesn't allow metadata to pass, it
might be broadcast, its output might be recorded many years prior to being
displayed.  A good part of the time the program is even run on a different
distribution/release/OS.

Obviously, a program running with system libraries might suffer misalignment
and thus visual corruption if those libraries don't know beyond, say,
Unicode 13 yet the terminal expects Unicode 17 -- but that's no different
from any other property incompatibly changing.  Property changes for
established characters are pretty rare thus no significant loss of
interoperability can be expected over time.

> > In order to do that, an application needs to know how wide a text will
> > appear, which depends on the font. How will it know it?
> 
> I don't know.  Maybe it keeps a database of character combinations
> that need shaping, each one with the maximum width on display the
> result can occupy.  Or maybe it does something else.  If it cannot,
> and the terminal cannot either, then what you say is that some scripts
> can never be supported by text terminals.

That's doable even within the current rules, where every codepoint bears a
wcwidth of 0, 1 or 2.  A cluster made of codepoints a ' b c d " ^ (where a b
c d have widths 1 while ' " ^ widths 0) needs to be rendered in exactly 4
cells.  This may force stretching or condensing the shaped cluster compared
to what usual typography would demand but that's in no way different from
stretching Latin "i" or condensing "W".
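
The additive rule described above can be sketched in Python. This is only an approximation under stated assumptions: `codepoint_width` is a hypothetical stand-in for the C library's wcwidth(), derived from the `unicodedata` module, and the three zero-width marks in the example are chosen here as combining acute, diaeresis and circumflex.

```python
import unicodedata

def codepoint_width(ch: str) -> int:
    """Approximate wcwidth() for one printable code point: 0, 1 or 2."""
    # Nonspacing marks, enclosing marks and format characters take no cell.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def additive_width(text: str) -> int:
    """Sum the per-code-point widths, as wcswidth() does for printable text."""
    return sum(codepoint_width(ch) for ch in text)

# a ' b c d " ^  with three zero-width combining marks
cluster = "a\u0301bcd\u0308\u0302"
print(additive_width(cluster))  # prints 4
```

Because the result depends only on the code points, an application and a terminal with matching Unicode data arrive at the same number without talking to each other.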


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Asmus Freytag (c) via Unicode

On 2/9/2019 11:48 AM, Egmont Koblinger wrote:

> Hi Asmus,
>
> > On quick reading this appears to be a strong argument why such emulators will
> > never be able to be used for certain scripts. Effectively, the model
> > described works well with any scripts where characters are laid out (or can
> > be laid out) in fixed width cells that are linearly adjacent.
>
> I'm wondering if you happen to know:
>
> Are there any (non-CJK) scripts for which a mechanical typewriter does
> not exist due to the complexity of the script?


Egmont,

are you excluding CJK because of the difficulty handling a large
repertoire with mechanical means? However, see:

https://en.wikipedia.org/wiki/Chinese_typewriter




> Are there any (non-CJK) scripts for which crossword puzzles don't exist?
>
> For scripts where these do exist, is it perhaps an acceptable tradeoff
> to keep their limitations in the terminal emulator world as well, to
> combine the terminal emulator's power with these scripts?



I agree with you that crossword puzzles and scrabble have a similar
limitation to the design that you sketched for us. However, take a script
that is written in syllables (each composed of 1-5 characters, say).

In a "crossword" I could write this script so that each syllable occupies
a cell. It would be possible to read such a puzzle, but trying to use
such a draconian technique for running text would be painful, to say the
least. (We are not even talking about pretty, here).

Here's an example for Hindi:
https://vargapaheli.blogspot.com/2017/
I don't read Hindi, but 5 vertical in the top puzzle, cell 2, looks like
it contains both a consonant and a vowel.

To force Hindi crosswords mode you need to segment the string into
syllables, each having a variable number of characters, and then assign a
single display position to them. Now some syllables are wider than others,
so you could use the single/double width paradigm. The result may be
somewhat legible for Devanagari, but even some of the closely related
scripts may not fit that well.

Now there are some scripts where the same syllable can be written in more
than one form; the forms differing by how the elements are fused (or
sometimes not fused) into a single shape. Sometimes, these differences are
more "stylistic", more like an 'fi' ligature in English, sometimes they
really indicate different words, or one of the forms is simply not correct
(like trying to spell lam-alif in Arabic using two separate letters).

I'm sure there are scripts that work rather poorly (effectively not at
all) in crossword mode. The question then becomes one of goals.

Are you defining as your goal to have some kind of "line by line"
display that can survive any Unicode text thrown at it, or are you trying
to extend a given design with rather specific limitations, so that it
survives / can be used with, just a few more scripts than European + CJK?



> Honestly, even with English, all I have to do is "cat some_text_file",
> and chances are that a word is split in half at some random place
> where it hits the right margin. Even with just English, a terminal
> emulator isn't something that gives me a grammatically and
> typographically super pleasing or correct environment. It gives me
> something that I personally find grammatically and typographically
> "good enough", and in the mean time a powerful tool to get my work
> done.



The discrepancies would be more like throwing random blank spaces in the
middle of every word, writing letters out of order, or overprinting. So,
more fundamental, not just "not perfect".

To give you an idea, here is an Arabic crossword. It uses the isolated
shape of all letters and writes all words unconnected. That's two things
that may be acceptable for a puzzle, but not for text output.

http://www.everyday-arabic.com/2013/12/crossword1.html

(try typing 3 vertical as a word to see the difference - it's 4x U+062A)



> Obviously the more complex the script, the more tradeoffs there will
> be. I think it's a call each user has to make whether they prefer a
> terminal emulator or a graphical app for a certain kind of task. And
> if terminal emulators have a lower usage rate in these scripts, that's
> not necessarily a problem. If we can improve by small incremental
> changes, sure, let's do. If we'd need to heavily redesign plenty of
> fundamentals in order to improve, it most likely won't happen.

You may begin to see the limitations and that they may well prevent you
from reaching even your limited goal for speakers of at least three of the
top ten languages worldwide.

A./



Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Asmus Freytag via Unicode

  
  
On 2/9/2019 12:07 PM, Egmont Koblinger via Unicode wrote:

> On Sat, Feb 9, 2019 at 9:01 PM Eli Zaretskii wrote:
>
> > then what you say is that some scripts
> > can never be supported by text terminals.
>
> I'm not familiar at all with all the scripts and their requirements,
> but yes, basically this is what I'm saying. I'm afraid some scripts
> can never be perfectly supported by text terminals.

This includes the scripts used for up to four of the world's top
ten languages.

And it's more than "not perfect"; effectively some scripts cannot
be shoehorned into the fundamental design.

That design was created to work with European scripts, and proved
somewhat adaptable to other scripts that lend themselves to fixed-width
cell display. But beyond that is where you hit the proverbial brick wall.

> I hope though that all the scripts can be supported with more or less
> compromises, e.g. like it would appear in a crossword. But maybe not.

See other messages: not.

> Maybe one day some new, modern platform will arise with the goal of
> replacing terminal emulators, which I wouldn't necessarily mind. It's
> gonna take an enormous amount of work, though.


A./



  



Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Ken Whistler via Unicode

Egmont,

On 2/9/2019 11:48 AM, Egmont Koblinger via Unicode wrote:

> Are there any (non-CJK) scripts for which crossword puzzles don't exist?


There are crossword puzzles for Hindi (in the Devanagari script). Just 
do an image search for "Hindi crossword puzzle".


But the conventions for these break up words into syllables fitting into 
the boxes, and the rules for that are complex. You have to allow for the 
placement of dependent vowels, which may take up extra space left or 
right, as well as consonant clusters, which would be expressed often as 
conjuncts in Sanskrit, but which in Hindi are more commonly rendered as 
dead consonant sequences. So the "stuff in a box" is:


1. Inherently proportional width.

2. Inherently multi-character in content. (underlying 1 to 3 or more 
characters per cell)


This is the kind of compromise you would have to make for almost
any Indic script, to enable a rational approach to building crossword
puzzles that make sense.


And in a terminal context, you probably would not get acceptable 
behavior for Hindi if you tried to just take all the "stuff in a box" 
chunks and tried to lay them out directly in a line, as if the script 
behaved more like CJK.


The existence proof of techniques to cut up text into syllables that 
enable crossword puzzle building, is not the same as a determination 
that the script, ipso facto, would work in a terminal context without 
dealing with additional complex script issues.
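
To make the "stuff in a box" idea concrete, here is a deliberately naive Python sketch of cutting Devanagari text into crossword-style boxes. It is an assumption-laden illustration, not the convention actual Hindi puzzles follow: `split_syllables` is a hypothetical helper that attaches every combining mark to the preceding box and keeps a consonant in the same box when it follows a virama. It handles none of the further shaping issues mentioned above.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA

def split_syllables(text: str):
    """Naively cut Devanagari text into crossword-style boxes."""
    boxes = []
    for ch in text:
        # Dependent vowel signs and other marks stay with their base.
        is_mark = unicodedata.category(ch).startswith("M")
        # A letter after a virama continues the consonant cluster.
        joins_previous = bool(boxes) and (is_mark or boxes[-1].endswith(VIRAMA))
        if joins_previous:
            boxes[-1] += ch
        else:
            boxes.append(ch)
    return boxes

# HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II
word = "\u0939\u093f\u0928\u094d\u0926\u0940"
print([len(box) for box in split_syllables(word)])  # prints [2, 4]
```

The two boxes hold 2 and 4 underlying characters, which shows why each box is inherently multi-character; nothing in the sketch says how wide either box should be drawn.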


At any rate, this is once again straying over into the issue of whether 
terminals can be adapted for the requirements of shaping rules for 
complex scripts -- rather than the nominal subject of the thread, which 
has to do with bidi text layout in terminals.


--Ken




Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode
Hi Ken,

> There are crossword puzzles for Hindi (in the Devanagari script). Just
> do an image search for "Hindi crossword puzzle".

It's easy to confirm the existence by an image search, it's hard to
confirm the non-existence ;)

> The existence proof of techniques to cut up text into syllables that
> enable crossword puzzle building, is not the same as a determination
> that the script, ipso facto, would work in a terminal context without
> dealing with additional complex script issues.

Thanks a lot for your detailed explanation; this possibility indeed
didn't occur to me.


cheers,
egmont


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode
On Sat, Feb 9, 2019 at 9:01 PM Eli Zaretskii  wrote:

> then what you say is that some scripts
> can never be supported by text terminals.

I'm not familiar at all with all the scripts and their requirements,
but yes, basically this is what I'm saying. I'm afraid some scripts
can never be perfectly supported by text terminals.

I hope though that all the scripts can be supported with more or less
compromises, e.g. like it would appear in a crossword. But maybe not.

Maybe one day some new, modern platform will arise with the goal of
replacing terminal emulators, which I wouldn't necessarily mind. It's
gonna take an enormous amount of work, though.


cheers,
egmont


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger 
> Date: Sat, 9 Feb 2019 20:36:50 +0100
> Cc: Richard Wordingham , 
>   unicode Unicode Discussion 
> 
> On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii  wrote:
> 
> > That's the application's problem, not the terminal's.  An application
> > that wants its column to line up _and_ wants to support complex text
> > scripts will need to move cursor to certain coordinates, not to assume
> > that 7 codepoints always take 7 columns on display.
> 
> In order to do that, an application needs to know how wide a text will
> appear, which depends on the font. How will it know it?

I don't know.  Maybe it keeps a database of character combinations
that need shaping, each one with the maximum width on display the
result can occupy.  Or maybe it does something else.  If it cannot,
and the terminal cannot either, then what you say is that some scripts
can never be supported by text terminals.
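
One reading of the cursor-movement suggestion quoted earlier in this message is sketched below in Python. The helper name `row_with_bar` is hypothetical. The only terminal feature used is the standard CSI n G control (Cursor Horizontal Absolute), which moves to 1-based column n on the current line. "complex" stands in for a complex-script word, as in the quoted example.

```python
def row_with_bar(text: str, bar_column: int) -> str:
    """Return text followed by an explicit jump to bar_column and a '|'.

    CSI n G (Cursor Horizontal Absolute) moves to 1-based column n, so the
    bar is written at the same column however wide the text was rendered.
    """
    return f"{text}\x1b[{bar_column}G|"

for line in ("abcdefg", "complex"):
    print(row_with_bar(line, 8))
```

This keeps the bars aligned, but it leaves open what happens when the rendered text is wider than the space before the bar: in this sketch the bar would simply be written over it.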


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode
Hi Asmus,

> On quick reading this appears to be a strong argument why such emulators will
> never be able to be used for certain scripts. Effectively, the model
> described works well with any scripts where characters are laid out (or can
> be laid out) in fixed width cells that are linearly adjacent.

I'm wondering if you happen to know:

Are there any (non-CJK) scripts for which a mechanical typewriter does
not exist due to the complexity of the script?

Are there any (non-CJK) scripts for which crossword puzzles don't exist?

For scripts where these do exist, is it perhaps an acceptable tradeoff
to keep their limitations in the terminal emulator world as well, to
combine the terminal emulator's power with these scripts?

Honestly, even with English, all I have to do is "cat some_text_file",
and chances are that a word is split in half at some random place
where it hits the right margin. Even with just English, a terminal
emulator isn't something that gives me a grammatically and
typographically super pleasing or correct environment. It gives me
something that I personally find grammatically and typographically
"good enough", and in the mean time a powerful tool to get my work
done.

Obviously the more complex the script, the more tradeoffs there will
be. I think it's a call each user has to make whether they prefer a
terminal emulator or a graphical app for a certain kind of task. And
if terminal emulators have a lower usage rate in these scripts, that's
not necessarily a problem. If we can improve by small incremental
changes, sure, let's do. If we'd need to heavily redesign plenty of
fundamentals in order to improve, it most likely won't happen.


cheers,
egmont


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode
On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii  wrote:

> That's the application's problem, not the terminal's.  An application
> that wants its column to line up _and_ wants to support complex text
> scripts will need to move cursor to certain coordinates, not to assume
> that 7 codepoints always take 7 columns on display.

In order to do that, an application needs to know how wide a text will
appear, which depends on the font. How will it know it?

Will it by some means know the font and the rendering engine the
terminal uses (even across ssh) and will it have to measure it itself?

Or will it be able to ask the terminal? If so, how? Maybe a new
extension, an asynchronous escape sequence that responds back with the
measured width? What about the latency caused by the bunch of
asynchronous roundtrips, especially over ssh? What about the utter pain
and intrinsic unreliability of handling asynchronous responses, as
I've outlined in a section of
https://gitlab.freedesktop.org/terminal-wg/specifications/issues/8 ?
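
No width-query escape sequence is assumed to exist here. As an analogy, the Python sketch below uses a query that terminals do already answer asynchronously, the cursor position report (request ESC [ 6 n, reply ESC [ row ; col R), to show where the pain comes from: the reply arrives on the same input stream as the user's keystrokes. `split_cpr` is a hypothetical helper name.

```python
import re

CPR_REQUEST = "\x1b[6n"                        # DSR 6: ask for cursor position
CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")  # reply: ESC [ row ; col R

def split_cpr(data: str):
    """Separate a cursor position report from surrounding keyboard input.

    Returns ((row, col) or None, remaining input). The reply shares the
    input stream with keystrokes, so it can land in the middle of them.
    """
    m = CPR_REPLY.search(data)
    if m is None:
        return None, data
    rest = data[:m.start()] + data[m.end():]
    return (int(m.group(1)), int(m.group(2))), rest

# The user typed "ls -l" while the reply was in flight.
print(split_cpr("ls\x1b[12;40R -l"))  # prints ((12, 40), 'ls -l')
```

An application would have to do this kind of untangling for every measurement, and wait a full round trip each time before it could lay out the next piece of text.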

What if there's no font? What if there are multiple fonts at the same
time? What if the font is changed later on, is it okay then for the
display of existing stuff to fall apart and only newly printed stuff
to appear correctly?

How do you define the "width of the terminal in characters", get/set
by ioctl(..., TIOC[GS]WINSZ, ...) that many apps rely on?
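
For reference, this is roughly how an application reads that size today, sketched in Python for a Unix-like system. `parse_winsize` and `get_winsize` are hypothetical helper names; `fcntl.ioctl` and `termios.TIOCGWINSZ` are the standard library's. The kernel reports whole rows and columns (plus an optional pixel size), which is exactly the fixed-cell assumption under discussion.

```python
import fcntl
import struct
import sys
import termios

def parse_winsize(buf: bytes):
    """Unpack struct winsize: rows, columns, pixel width, pixel height."""
    rows, cols, xpixel, ypixel = struct.unpack("HHHH", buf)
    return rows, cols, xpixel, ypixel

def get_winsize(fd: int):
    """Ask the kernel for the window size of the terminal on fd."""
    buf = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    return parse_winsize(buf)

if __name__ == "__main__" and sys.stdout.isatty():
    rows, cols, _, _ = get_winsize(sys.stdout.fileno())
    print(f"{cols} columns x {rows} rows")
```

There is a single integer for the number of columns, so any scheme with variable-width text would need to say what that integer means.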

If you define it by any means, what if placing the maximum number
of "i"s in a row doesn't fill up the entire width? Will that area be
inaccessible, then? Or despite having a definition of terminal width,
will there be new cells beyond this width to write to?

What if filling a row with all "w"s overflows? I take it that an app
shouldn't print there, but what if it still does, will that piece of
text just not be shown?

How much more complicated would you think implementing something like
"zip -h" become?

> How is this different from using variable-pitch fonts?

Do you mean variable-pitch font where the terminal still places each
glyph in its designated area? The font is the private business of the
terminal emulator, then, it'll just appear ugly, as in a screenshot I've
already linked, but the emulation behavior wouldn't care.

Or do you mean variable-pitch font where each letter is placed after
each other, as you'd expect in document editors? That is, way more
"i"s than "w"s fitting in a line? It's not different, it's practically
the same. And this is something that none of the terminal emulators
I'm aware of does; and having some clue about terminal emulators, I
can't imagine how one could do it (see all the questions above for a
start).

This is why I'm saying: Sure you can take this path, but then we're
talking about something new, not terminal emulators as we currently
know them. You can take this path, but then you'll have to rebuild
many of the already existing apps, and beware, they'll get way more
complex.


e.


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger 
> Date: Sat, 9 Feb 2019 20:03:21 +0100
> Cc: Richard Wordingham , 
>   unicode Unicode Discussion 
> 
> Let's suppose a utility outputs these two lines of text:
> abcdefg|
> complex|
> 
> whereas "abcdefg" are these English letters themselves, but "complex"
> is a word of some language requiring complex script rendering, taking
> up 7 logical cells (because that's what wcwidth() says). Also, "|" is
> the pipe symbol, or a vertical box drawing line, whatever.
> 
> Now let's assume that harfbuzz tells you that the desired width for
> rendering this "complex" word is 5.3 times the width of the character
> cell. Or 8.6 times it. How to proceed? How will the "|" bars align up,
> and thus mc's two-panel layout, tmux's vertical split etc. not fall
> apart?  In the latter case, when the width requested by harfbuzz is
> bigger than the designated width, what to do with characters that "fall
> off" at the right edge of the terminal?

That's the application's problem, not the terminal's.  An application
that wants its column to line up _and_ wants to support complex text
scripts will need to move cursor to certain coordinates, not to assume
that 7 codepoints always take 7 columns on display.  Or it will have
to tell the users to use specific fonts, which are known to provide
guarantees that this happens.

How is this different from using variable-pitch fonts?


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Asmus Freytag via Unicode

  
  
On quick reading this appears to be a strong argument why such
emulators will never be able to be used for certain scripts.
Effectively, the model described works well with any scripts where
characters are laid out (or can be laid out) in fixed width cells that
are linearly adjacent.

There are some crude techniques that allow an extension to cover
scripts that require half-width or double-width cells, and perhaps
even zero-width.

However, scripts, where rendering involves complicated ligatures or
other typographical interactions that often are specific to a given
font, would simply be out of scope because for those scripts the fixed
width model with an underlying buffer mimicking the display simply
cannot be made to work.

And indeed, by up-front accepting the limitation of a particular
design approach it would be surprising if such emulators proved
flexible enough to handle the rather wide variety of writing systems
supported by Unicode.

At best, the discussion could yield a few further approximations of
correct rendering that can be retrofitted to the particular design
restrictions outlined below, but that with luck extend the envelope
somewhat so that a few more writing systems can be shoehorned into it.

However, it appears quite hopeless to attempt to cover all of
Unicode's scripts on that premise.


A./









On 2/9/2019 10:25 AM, Egmont Koblinger via Unicode wrote:

> On Sat, Feb 9, 2019 at 7:07 PM Eli Zaretskii wrote:
>
> > You need to use what HarfBuzz tells you _instead_ of wcswidth.  It is
> > in general wrong to use wcswidth or anything similar when you use a
> > shaping engine and support complex script shaping.
>
> This approach is not viable at all.
>
> Terminal emulators have an internal data structure that they maintain,
> a matrix of character cells. Every operation is performed here, every
> escape sequence is defined on this layer what it does, the cursor
> position is tracked on this layer, etc. You can move the cursor to
> integer coordinates, overwrite the letter in that cell, and do plenty
> of other operations (like push the rest to the right by one cell). If
> you change these fundamentals, most of the terminal-based applications
> will fall apart big time.
>
> This behavior has to be absolutely independent from the font. The
> application running inside the terminal doesn't and cannot know what
> font you use, let alone how harfbuzz is about to render it. (You can
> even have no font at all, such as with the libvterm headless emulator
> library, or a detached screen or tmux session; or have multiple fonts
> at the same time if a screen or tmux session is attached from multiple
> graphical emulators.)
>
> So one part of a terminal emulator's code is responsible for
> maintaining this matrix of characters according to the input it
> receives. Another part of their code is responsible for presenting
> this matrix of characters on the UI, doing the best it can.
>
> If you say that the font should determine the logical width, you need
> to start building up something brand new from scratch. You need to
> have something that doesn't have concepts like "width in characters".
> You need to redefine cursor movement and many other escape sequences.
> You need to heavily adjust the behavior of a gazillion of software,
> e.g. zip's two-column output, anything that aligns in columns (e.g.
> midnight commander, tmux's vertical split etc.), the shell's (or
> readline's) command editing and wrapping to multiple lines, ncurses,
> and so on, all the way to e.g. fullscreen text editors like Emacs.
>
> And then we're not talking about terminal emulators anymore, as we
> know them now, but something new, something pretty different.
>
> Terminal emulators do have strong limitations. Complex text rendering
> can only work to the extent we can squeeze it into these limitations.
>
>
> cheers,
> egmont


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode
On Sat, Feb 9, 2019 at 7:56 PM Eli Zaretskii  wrote:

> I'm probably missing something, because I don't see the grave problems
> you hint at.  Any width provided back by a shaper can be rounded to
> the nearest integral character cell, so your canvas can still remain
> rectangular.

Let's suppose a utility outputs these two lines of text:
abcdefg|
complex|

whereas "abcdefg" are these English letters themselves, but "complex"
is a word of some language requiring complex script rendering, taking
up 7 logical cells (because that's what wcwidth() says). Also, "|" is
the pipe symbol, or a vertical box drawing line, whatever.

Now let's assume that harfbuzz tells you that the desired width for
rendering this "complex" word is 5.3 times the width of the character
cell. Or 8.6 times it. How to proceed? How will the "|" bars align up,
and thus mc's two-panel layout, tmux's vertical split etc. not fall
apart? In the latter case, when the width requested by harfbuzz is
bigger than the designated width, what to do with characters that "fall
off" at the right edge of the terminal?
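To put numbers on the rounding idea, here is a small Python sketch. It is only an illustration: bar_column is a hypothetical helper, and widths are expressed in units of one cell, as in the example above.

```python
import math

LOGICAL_WIDTH = 7  # cells that wcswidth() assigns to "abcdefg" and to "complex"


def bar_column(shaped_width_in_cells):
    """Column where the following '|' lands if the shaper's width,
    rounded to the nearest whole cell, decides the placement."""
    return math.floor(shaped_width_in_cells + 0.5)


if __name__ == "__main__":
    for shaped in (7.0, 5.3, 8.6):
        col = bar_column(shaped)
        print(f"shaped width {shaped}: '|' at column {col}, "
              f"drift {col - LOGICAL_WIDTH:+d} cells")
```

With 5.3 the bar ends up two columns to the left of where the application believes it is, with 8.6 two columns to the right. The canvas stays rectangular, but the app's idea of the columns no longer matches the display.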



e.


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger 
> Date: Sat, 9 Feb 2019 19:25:08 +0100
> Cc: Richard Wordingham , 
>   unicode Unicode Discussion 
> 
> > You need to use what HarfBuzz tells you _instead_ of wcswidth.  It is
> > in general wrong to use wcswidth or anything similar when you use a
> > shaping engine and support complex script shaping.
> 
> This approach is not viable at all.
> [...]

I'm probably missing something, because I don't see the grave problems
you hint at.  Any width provided back by a shaper can be rounded to
the nearest integral character cell, so your canvas can still remain
rectangular.  And I see no reason why an application should be
bothered by the actual number of character cells occupied by the text
it wrote on display.  So what exactly is not viable in using the width
reported back by the shaper?

> If you say that the font should determine the logical width, you need
> to start building up something brand new from scratch.

Are you saying that a terminal cannot work with variable-pitch fonts?

> Terminal emulators do have strong limitations. Complex text rendering
> can only work to the extent we can squeeze it into these limitations.

No one said anything to the contrary, AFAICT.


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode
On Sat, Feb 9, 2019 at 7:07 PM Eli Zaretskii  wrote:

> You need to use what HarfBuzz tells you _instead_ of wcswidth.  It is
> in general wrong to use wcswidth or anything similar when you use a
> shaping engine and support complex script shaping.

This approach is not viable at all.

Terminal emulators have an internal data structure that they maintain,
a matrix of character cells. Every operation is performed here, every
escape sequence is defined on this layer what it does, the cursor
position is tracked on this layer, etc. You can move the cursor to
integer coordinates, overwrite the letter in that cell, and do plenty
of other operations (like push the rest to the right by one cell). If
you change these fundamentals, most of the terminal-based applications
will fall apart big time.

This behavior has to be absolutely independent from the font. The
application running inside the terminal doesn't and cannot know what
font you use, let alone how harfbuzz is about to render it. (You can
even have no font at all, such as with the libvterm headless emulator
library, or a detached screen or tmux session; or have multiple fonts
at the same time if a screen or tmux session is attached from multiple
graphical emulators.)

So one part of a terminal emulator's code is responsible for
maintaining this matrix of characters according to the input it
receives. Another part of their code is responsible for presenting
this matrix of characters on the UI, doing the best it can.
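A toy model of that first part may help here. This Python sketch is not taken from any real emulator; the Grid class and its method names are made up for illustration, and it ignores attributes, wide characters and scrolling:

```python
class Grid:
    """A matrix of character cells plus a cursor, with no font anywhere."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.cells = [[" "] * cols for _ in range(rows)]
        self.row = self.col = 0

    def move_cursor(self, row, col):
        """Move to integer coordinates, clamped to the grid."""
        self.row = max(0, min(self.rows - 1, row))
        self.col = max(0, min(self.cols - 1, col))

    def put(self, ch):
        """Overwrite the cell under the cursor and advance by one cell."""
        self.cells[self.row][self.col] = ch
        if self.col < self.cols - 1:
            self.col += 1

    def insert_blank(self):
        """Push the rest of the line right by one cell; the last cell is lost."""
        line = self.cells[self.row]
        line.insert(self.col, " ")
        line.pop()

    def line(self, row):
        return "".join(self.cells[row])


if __name__ == "__main__":
    g = Grid(2, 8)
    for ch in "abcdefg|":
        g.put(ch)
    g.move_cursor(0, 2)
    g.insert_blank()
    print(repr(g.line(0)))
```

Every operation is defined purely in terms of cell coordinates, which is why it also works headless, or with several differently configured emulators attached at once.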

If you say that the font should determine the logical width, you need
to start building up something brand new from scratch. You need to
have something that doesn't have concepts like "width in characters".
You need to redefine cursor movement and many other escape sequences.
You need to heavily adjust the behavior of a gazillion of software,
e.g. zip's two-column output, anything that aligns in columns (e.g.
midnight commander, tmux's vertical split etc.), the shell's (or
readline's) command editing and wrapping to multiple lines, ncurses,
and so on, all the way to e.g. fullscreen text editors like Emacs.

And then we're not talking about terminal emulators anymore, as we
know them now, but something new, something pretty different.

Terminal emulators do have strong limitations. Complex text rendering
can only work to the extent we can squeeze it into these limitations.


cheers,
egmont


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Eli Zaretskii via Unicode
> Date: Sat, 9 Feb 2019 18:42:52 +0100
> Cc: unicode Unicode Discussion 
> From: Egmont Koblinger via Unicode 
> 
> What if harfbuzz tells us that the overall width for rendering a
> particular grapheme cluster is significantly different from its
> designated area (the number of character cells [wcswidth()]
> multiplied by the width of each)?

You need to use what HarfBuzz tells you _instead_ of wcswidth.  It is
in general wrong to use wcswidth or anything similar when you use a
shaping engine and support complex script shaping.



Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Egmont Koblinger via Unicode
Hi Richard,

On Sat, Feb 9, 2019 at 3:08 PM Richard Wordingham via Unicode
 wrote:

> It would be good to be able to access a maintained statement of the
> VTE rules for allocating characters to a cell, or group of cells, as
> appropriate.

What VTE did, up to a couple of days ago:

It opens the font, measures the characters in roughly the ASCII 33-126
range, and takes their average size (well, in case of a monospace font,
they should all have the same size); this determines the cell size.

Then every character cell is rendered individually, using Pango or
Cairo or I'm not sure what exactly – there are like three paths in the
source, the details are unclear to me. A cell might contain a base
character + nonspacing combining accents, these are passed together to
Pango and friends, so they render it as one unit. The glyph is aligned
to the left of its designated cell area, overflowing on the right (and
thus potentially overlapping with the next glyph) if it's wider than
its designated area.

As a special case, two adjacent cells might contain a double wide
(typically CJK) character, but it's not that special after all: it's
also displayed aligned to the left edge of its first cell.
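In rough Python terms, the geometry described so far looks like the sketch below. This is not VTE's code; the function names are hypothetical and the exact rounding VTE applies to the average is not modelled:

```python
def cell_width(advances):
    """Average advance of the sampled glyphs (e.g. ASCII 33-126)."""
    return sum(advances) / len(advances)


def glyph_x(column, cell_w):
    """Glyphs are left-aligned, so they start at the left edge of their cell."""
    return column * cell_w


def overflow(glyph_w, n_cells, cell_w):
    """How far a glyph spills to the right of its designated area."""
    return max(0.0, glyph_w - n_cells * cell_w)


if __name__ == "__main__":
    w = cell_width([10.0] * 94)      # a monospace font: all advances equal
    print(glyph_x(3, w))             # x offset of the glyph in column 3
    print(overflow(14.0, 1, w))      # a too-wide glyph in a single cell
    print(overflow(18.0, 2, w))      # a double wide glyph that fits
```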

What I improved a couple of days ago (to be released in vte-0.56), for
Devanagari and friends, although I know there's more than this to
address these scripts properly:

If a cell contains a regular letter, and the next cell contains a
spacing combining mark, then these two are passed to Pango in a single
step, that is, the spacing combining mark is applied around its base
letter by Pango as expected. (Previously the spacing combining mark
was rendered on its own, around a dotted circle, which was obviously
pretty bad.)
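The grouping rule can be sketched in Python as follows. Again this is only an illustration of the idea, not the actual VTE code; render_units is a made-up name, and "spacing combining mark" is approximated by the Unicode general category Mc:

```python
import unicodedata


def render_units(cells):
    """Merge a cell holding a spacing combining mark (category Mc) into the
    preceding cell's text when that text starts with a letter, so that both
    are handed to the text renderer in a single step."""
    units = []
    for text in cells:
        is_spacing_mark = bool(text) and unicodedata.category(text[0]) == "Mc"
        prev_is_letter = (bool(units) and bool(units[-1])
                          and unicodedata.category(units[-1][0]).startswith("L"))
        if is_spacing_mark and prev_is_letter:
            units[-1] += text
        else:
            units.append(text)
    return units


if __name__ == "__main__":
    # DEVANAGARI LETTER KA, DEVANAGARI VOWEL SIGN I, then a Latin letter
    print(render_units(["\u0915", "\u093f", "a"]))
```

The grid itself is unchanged by this: the mark still occupies its own logical cell, only the rendering step sees the two cells together.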

What I'm working on currently, as you all know by now, is
BiDi-shuffling the cells before rendering them (hopefully for
vte-0.58).

This is how VTE works now, but it's by no means a specification, and
tailoring a font to this behavior is probably not the right approach.
Instead, VTE's behavior should be improved. We have a pending feature
request (which I've already linked) to use HarfBuzz for rendering the
glyphs, which would then render grapheme clusters beautifully. The
problem that I don't know how to address is: What if harfbuzz tells us
that the overall width for rendering a particular grapheme cluster is
significantly different from its designated area (the number of
character cells [wcswidth()] multiplied by the width of each)?


cheers,
egmont




>
> > > (b) With a terminal that expects a fixed width font, surely the
> > > terminal decides how many cells it allocates to a group of
> > > characters, and the font designer has to come up with a suitable
> > > value based on that.
> >
> > Yes.  A terminal emulator that works with a shaper should probably
> > post-process the width information returned by the shaper for these
> > purposes.
>
> Perhaps it should base the number of cells on the width of the
> clusters.  However, continuing with my example, U+1789 KHMER LETTER NYO
> as a base character is too wide to fit in a cell, and the next
> character will overwrite its right-hand part. From this I deduce that it
> is allocated just one cell.  Gnome terminal is not alone in doing this,
> but it does better than some, in my opinion, in that the overflow of the
> foreground of one cell is not obliterated by the background of the
> next cell.  U+1789 has an East Asian width property of 'Neutral', which
> is distinctly unhelpful.
>
> What I would like is a specification of what a font must do to avoid
> such problems.
>
> > > >  I don't see how you can expect wcwidth, or any other
> > > > interface that was designed to work with _characters_, to be
> > > > useful when you need to display grapheme clusters.
>
> It, or something similar but worse, gets used, especially when moving
> the cursor for editing.
>
> > > Well I can envisage a decision being made that a grapheme cluster
> > > str (as decreed by the terminal) shall occupy wcswidth(str) cells -
> > > "The wcswidth() function returns the number of column positions for
> > > the wide-character string s, truncated to at most length n".
> >
> > AFAIU, the shaping engine returns its output in terms of font glyph
> > numbers, not character codepoints, so you cannot in general call
> > wcswidth on them.  The shaper also returns the advance information,
> > which serves instead of wcwidth and related APIs for determining the
> > actual width on display.
>
> Unfortunately, when the rectangular grid is being preserved,
> typographical advance width is generally ignored when determining the
> placement of characters.  Now, this is not always true; one can have
> the situation where the positioning of characters respects the
> advance widths, but the positioning of the cursor assumes a fixed-width
> rectangular grid.  I have found working with that to be extremely
> confusing.
>
> Richard.
>



Encoding colour (from Re: Encoding italic)

2019-02-09 Thread wjgo_10...@btinternet.com via Unicode

Egmont Koblinger wrote:

> Should this scheme be extended for colors, too? What to do with the
> legacy 8/16 as well as the 256-color extensions wrt. the color
> palette? Should Unicode go into the business of defining a fixed set
> of colors, or allow to alter the palette colors using the OSC 4 and
> friends escape sequences which are supported by about half of the
> terminal emulators out there?

Encoding colour is already a topic in relation to emoji and maybe could
be extended to other characters.

A stateful method, though one which might be useful for plain text
streams in some applications, would be to encode as characters some of
the glyphs for indicating colours and the digit characters to go with
them from page 5 and from page 3 of the following publication.

http://www.users.globalnet.co.uk/~ngo/locse027.pdf

> What to do with things that Unicode might also want to have, but
> doesn't exist in terminal emulators due to their nature, such as
> switching to a different font size?

Well, if people were to want to do it, there could be a character
encoded in the Specials section and then use that character as a base
character and follow it with a sequence of tag characters.


William Overington

Saturday 9 February 2019


Re: Encoding colour (from Re: Encoding italic)

2019-02-09 Thread wjgo_10...@btinternet.com via Unicode

Previously I wrote:

> A stateful method, though which might be useful for plain text streams
> in some applications, would be to encode as characters some of the
> glyphs for indicating colours and the digit characters to go with them
> from page 5 and from page 3 of the following publication.
>
> http://www.users.globalnet.co.uk/~ngo/locse027.pdf


Thinking about this further, for this application copies of the glyphs 
could be redesigned so as to be square and could be emoji-style and the 
meanings of the characters specifying which colour component is to be 
set could be changed so that they refer to the number previously entered 
using one or more of the special digit characters. Thus the setting of 
colour components could be done in the same reverse notation way that 
the FORTH computer language works. Yet although the colour components 
thus set would be stateful until changed there would be no Escape 
sequence and if an application did not support interpretation of the 
characters as setting colours, they would just be displayed as glyphs, 
each either as a particular glyph or as a .notdef glyph.


William Overington
Saturday 9 February 2019



Re: Encoding italic

2019-02-09 Thread Richard Wordingham via Unicode
On Sat, 9 Feb 2019 04:52:30 -0800
David Starner via Unicode  wrote:

> Note that this is actually the only thing that stands out to me in
> Unicode not supporting older character sets; in PETSCII (Commodore
> 64), the high-bit characters were the reverse (in this
> sense) of the low-bit characters.

Later ISCII has some styling codes, bold and italic amongst them.

Richard.


Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Richard Wordingham via Unicode
On Sat, 09 Feb 2019 09:42:09 +0200
Eli Zaretskii via Unicode  wrote:

> > Date: Sat, 9 Feb 2019 00:18:14 +
> > From: Richard Wordingham via Unicode 
> >   
> > > For character composition, you must have a shaping engine to talk
> > > to, and the shaper should tell you the width of each grapheme
> > > cluster it returns.  
> > 
> > (a) What defines the grapheme clusters?  The definition might be
> > terminal-specific.  
> 
> Well, the "you" above alluded to the terminal emulator, of course.
> The grapheme clusters are determined by the shaping engine that the
> emulator must call when appropriate (or always).

I find it very hard to believe that that is how it works with GNOME
Terminal (Version 3.18.3, using VTE Version 0.42.5).  At the command
line I typed in the Khmer script string ក្កេក (KA, COENG, KA, SIGN E,
KA), and saw the string split into four columns (KA, COENG), (KA),
(SIGN E), (KA), with each column given the same width. When written
correctly, SIGN E is first in visual order.  The fourth column was
displayed on top of the third column, which contained a dotted circle
to show that SIGN E on its own was not grammatically correct.  If I
were writing a Khmer font for use with Gnome terminal, I would attempt
to ensure that the display for SIGN E fitted in a single cell.
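The four-column split is what a simple per-character rule would produce. The Python sketch below is a guess that happens to reproduce the observation, not a statement of VTE's actual rules; split_into_columns is a hypothetical name, and "zero width" is approximated by the general categories Mn and Me:

```python
import unicodedata


def split_into_columns(text):
    """Give every character its own column, except that non-spacing marks
    (categories Mn and Me) join the column of the preceding character."""
    columns = []
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me") and columns:
            columns[-1] += ch
        else:
            columns.append(ch)
    return columns


if __name__ == "__main__":
    word = "\u1780\u17d2\u1780\u17c1\u1780"   # KA, COENG, KA, SIGN E, KA
    for col in split_into_columns(word):
        print([f"U+{ord(c):04X}" for c in col])
```

COENG (U+17D2) is a non-spacing mark and so shares KA's column, while SIGN E (U+17C1) is a spacing mark and gets a column of its own, after the consonant it should visually precede.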

Of course, the renderer's grapheme cluster boundaries don't always
match appearances.  To get the traditional placement of U+1A58 TAI THAM
SIGN MAI KANG LAI, I end up with it being a mark glyph one cluster
later than HarfBuzz indicates it to be.

It would be good to be able to access a maintained statement of the
VTE rules for allocating characters to a cell, or group of cells, as
appropriate. 

> > (b) With a terminal that expects a fixed width font, surely the
> > terminal decides how many cells it allocates to a group of
> > characters, and the font designer has to come up with a suitable
> > value based on that.   
> 
> Yes.  A terminal emulator that works with a shaper should probably
> post-process the width information returned by the shaper for these
> purposes.

Perhaps it should base the number of cells on the width of the
clusters.  However, continuing with my example, U+1789 KHMER LETTER NYO
as a base character is too wide to fit in a cell, and the next
character will overwrite its right-hand part. From this I deduce that it
is allocated just one cell.  Gnome terminal is not alone in doing this,
but it does better than some, in my opinion, in that the overflow of the
foreground of one cell is not obliterated by the background of the
next cell.  U+1789 has an East Asian width property of 'Neutral', which
is distinctly unhelpful.

What I would like is a specification of what a font must do to avoid
such problems.

> > >  I don't see how you can expect wcwidth, or any other
> > > interface that was designed to work with _characters_, to be
> > > useful when you need to display grapheme clusters.  

It, or something similar but worse, gets used, especially when moving
the cursor for editing.

> > Well I can envisage a decision being made that a grapheme cluster
> > str (as decreed by the terminal) shall occupy wcswidth(str) cells -
> > "The wcswidth() function returns the number of column positions for
> > the wide-character string s, truncated to at most length n".  
> 
> AFAIU, the shaping engine returns its output in terms of font glyph
> numbers, not character codepoints, so you cannot in general call
> wcswidth on them.  The shaper also returns the advance information,
> which serves instead of wcwidth and related APIs for determining the
> actual width on display.

Unfortunately, when the rectangular grid is being preserved,
typographical advance width is generally ignored when determining the
placement of characters.  Now, this is not always true; one can have
the situation where the positioning of characters respects the
advance widths, but the positioning of the cursor assumes a fixed-width
rectangular grid.  I have found working with that to be extremely
confusing.

Richard.



Re: Encoding italic

2019-02-09 Thread Rebecca Bettencourt via Unicode
On Sat, Feb 9, 2019 at 4:58 AM David Starner via Unicode <
unicode@unicode.org> wrote:

>
> On Sat, Feb 9, 2019 at 3:59 AM Kent Karlsson via Unicode <
> unicode@unicode.org> wrote:
>
>>
>> Den 2019-02-08 21:53, skrev "Doug Ewell via Unicode":
>> > • Reverse on: ESC [7m
>> > • Reverse off: ESC [27m
>>
>> "Reverse" = "switch background and foreground colours".
>>
>> This is an (odd) colour thing. If you want to go with (full!) colour
>> (foreground and background), fine, but the "reverse" is oddball (and
>> based on what really old terminals were limited to when it comes to
>> colour).
>>
>
> Note that this is actually the only thing that stands out to me in Unicode
> not supporting older character sets; in PETSCII (Commodore 64), the
> high-bit characters were the reverse (in this sense) of the
> low-bit characters.
>

This is true: many legacy character sets encoded reverse-video characters
as wholly-separate characters, and even allowed them in contexts widely
considered plain-text such as file names. This makes reverse-video possibly
the one text attribute best argued to be worthy of encoding in Unicode. But
I can already tell you it won't work, because we made such an argument in
an early version of L2/19-025, and even proposed using VS14, the very same
VS William Overington has since swiped from us for italics. That proposal
was shot down rather quickly. Bold, italics, etc. don't even stand a chance.


Re: Encoding italic

2019-02-09 Thread David Starner via Unicode
On Sat, Feb 9, 2019 at 3:59 AM Kent Karlsson via Unicode <
unicode@unicode.org> wrote:

>
> Den 2019-02-08 21:53, skrev "Doug Ewell via Unicode":
> > • Reverse on: ESC [7m
> > • Reverse off: ESC [27m
>
> "Reverse" = "switch background and foreground colours".
>
> This is an (odd) colour thing. If you want to go with (full!) colour
> (foreground and background), fine, but the "reverse" is oddball (and
> based on what really old terminals were limited to when it comes to
> colour).
>

Note that this is actually the only thing that stands out to me in Unicode
not supporting older character sets; in PETSCII (Commodore 64), the
high-bit characters were the reverse (in this sense) of the
low-bit characters.


Re: Encoding italic

2019-02-09 Thread Kent Karlsson via Unicode

Den 2019-02-08 21:53, skrev "Doug Ewell via Unicode" :

> I'd like to propose encoding italics and similar display attributes in
> plain text using the following stateful mechanism:

Note that these do NOT nest (no stack...), just state changes for the
relevant PART of the "graphic" (i.e. style) state. So the approach in
that regard is quite different from the approach done in HTML/CSS.

> • Italics on: ESC [3m
> • Italics off: ESC [23m
> • Bold on: ESC [1m
> • Bold off: ESC [22m
> • Underline on: ESC [4m
(implies turning double underline off)

   Underline, double: ESC [21m
(implies turning single underline off)

> • Underline off: ESC [24m
> • Strikethrough on: ESC [9m
> • Strikethrough off: ESC [29m
> • Reverse on: ESC [7m
> • Reverse off: ESC [27m

"Reverse" = "switch background and foreground colours".

This is an (odd) colour thing. If you want to go with (full!) colour
(foreground and background), fine, but the "reverse" is oddball (and
based on what really old terminals were limited to when it comes to colour).

I'd rather include 'ESC [50m' (not variable spacing, i.e. "monospace" font)
and 'ESC [26m' (variable spacing, i.e. "proportional" font). Recall that
this is NOT for terminal emulators but for styling applied to text
outside of terminal emulators. (Terminal emulators already implement
much of this and more; albeit sometimes wrongly). This would be handy
for including (say) programming code or computer commands (or for that
matter, "ASCII art", or more generally "Unicode art") in otherwise
"ordinary" text... (The "ordinary" text preferably set in a proportional
font.)

> • Reset all attributes: ESC [m

(Actually 'ESC [0m', with the 0 default-able.) Handy, agreed, but not 100%
necessary.
These ESC-sequences should not normally be inserted "manually" but by a
text editor program, using the conventional means of "making bold" etc.
(ctrl-b, cmd-b, "bold" in a menu); only "hackers" (in the positive sense)
would actually bother about the command sequences as such.
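As an illustration of what such an editor would emit, here is a minimal Python sketch. The SGR table and the styled helper are hypothetical names; the numbers are the ones listed above, and the space after "ESC" in that list is only there for readability (the real sequence is ESC, "[", the number, "m" with nothing in between):

```python
ESC = "\x1b"

# attribute name -> (code that switches it on, code that switches it off)
SGR = {
    "bold": (1, 22),
    "italic": (3, 23),
    "underline": (4, 24),
    "double_underline": (21, 24),
    "reverse": (7, 27),
    "strikethrough": (9, 29),
}


def styled(text, attribute):
    """Wrap text in the on/off pair for one attribute. These are plain
    state changes; nothing is pushed on or popped from a stack."""
    on, off = SGR[attribute]
    return f"{ESC}[{on}m{text}{ESC}[{off}m"


if __name__ == "__main__":
    print(repr(styled("emphasis", "italic")))
    print("plain, " + styled("bold", "bold") + ", plain again")
```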

/Kent K


> where ESC is U+001B.
>  
> This mechanism has existed for around 40 years and is already supported
> as widely as any new Unicode-only convention will ever be.
>  
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>  
> 



Re: Encoding italic

2019-02-09 Thread Kent Karlsson via Unicode


Den 2019-02-08 22:29, skrev "Egmont Koblinger via Unicode"
:

> (Mind you, I don't find it a good idea to add italic and whatnot
> formatting support to Unicode at all... but let's put aside that now.)

I don't think Doug meant to "add it to the Unicode standard", just to
have a summary of "handy esc-sequences (actually command-sequences)
for simple styling of text" picked from long-standing (text level...)
standards.

> There are a lot of problems with these escape sequences, and if you go
> for a potentially new standard, you might not want to carry these
> problems.
> 
> There is not a well-defined framework for escape sequences. In this
> particular case you might say it starts with ESC [ and ends with the
> letter 'm', but how do you know where to end the sequence if that
> letter 'm' just doesn't arrive? Terminal emulators have extremely

There is an overriding "basic (overall) syntax" for esc-seq/
command-sequences that do not include a string argument (like OSC,
APC, ...). IIUC it is (originally as byte sequences, but here as
character sequences):

\u001B[\u0020-\u002F]*[\u0030-\u007E]|
(\u001B'['|\u009B)[\u0030-\u003F]*[\u0020-\u002F]*[\u0040-\u007E]

(no newline or carriage return in there). True, that has no direct
limit, but it would not be unreasonable to set a limit of (say)
max 30 characters. Potential (i.e. starting with ESC) esc-"sequences"
that do not match the overall syntax or are too long can simply be
rendered as is (except for the ESC itself). The esc/command sequences
(that match) but are not interpreted should be ignored in "normal"
(not "show invisibles" mode) display.
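A sketch of that in Python, under some assumptions: the names SEQUENCE, MAX_LEN and strip_sequences are made up, the limit of 30 is the example figure from above, and the control-sequence alternative is tried first because "ESC [" on its own also matches the generic escape-sequence pattern:

```python
import re

MAX_LEN = 30  # example limit suggested above

SEQUENCE = re.compile(
    # control sequence: (ESC '[' or C1 CSI), parameters, intermediates, final
    r"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]"
    # any other escape sequence: ESC, intermediates, final
    r"|\x1b[\x20-\x2f]*[\x30-\x7e]"
)


def strip_sequences(text):
    """Drop well-formed sequences of at most MAX_LEN characters. Anything
    starting with ESC that is too long or malformed stays visible as is,
    minus the ESC character itself."""
    def repl(match):
        seq = match.group(0)
        return "" if len(seq) <= MAX_LEN else seq
    return SEQUENCE.sub(repl, text).replace("\x1b", "")


if __name__ == "__main__":
    print(strip_sequences("a\x1b[3mb\x1b[23mc"))
```

The same function could serve as the filtering step before sorting or searching.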

They are unlikely to be "default ignored" by such things as sorting
(and should preferably be filtered out beforehand, if possible). But
if we compare to other rich text editors, the command sequences should
be ignored by (interactive) searching, just like HTML tags are ignored
in interactive searching (the internal representation "skipping" the
HTML tags in one way or another). HTML tags should also (when the text
is known to be HTML) be filtered out before doing such things as sorting.

> complex tables for parsing (and still many of them get plenty of
> things wrong). It's unreasonable for any random small utility
> processing Unicode text to go into this business of recognizing all
> the well-known escape sequences, not even to the extent to know where
> they end. Whatever is designed should be much more easily parseable.
> Should you say "everything from ESC[ to m", you'll cause a whole bunch
> of problems when a different kind of escape sequence gets interpreted
> as Unicode.

The escape/command sequences would not be part of Unicode (standard).

> A parser, by the way, would also have to interpret combined sequences
> like ESC[3;0;1m or alike, for which I don't see a good reason as
> opposed to having separate sequences for each. Also, it should be

Formally covered by the (non-Unicode) standards, but optional (IIUC).

> carefully evaluated what to do with C1 (U+009B) instead of the C0 ESC[
> opening for an escape sequence; here terminal emulators vary. These
> just make everything even more cumbersome.
> 
> ECMA-48 8.3.117 specifies ESC[1m as "bold or increased intensity".

I think one should interpret these in a "modern" way, not looking
too much at what old terminals were limited to. (Colour ("increased
intensity") should be handled completely separately from bold.)

> Should this scheme be extended for colors, too? What to do with the
> legacy 8/16 as well as the 256-color extensions wrt. the color
> palette? Should Unicode go into the business of defining a fixed set
> of colors, or allow to alter the palette colors using the OSC 4 and
> friends escape sequences which are supported by about half of the terminal
> emulators out there?

IF extending to colour, refer only to the "true colour" (RGB) command sequence.
The colour palette versions are for the limitations of (semi-)old terminals.

> For 256-colors and truecolors, there are two or three syntaxes out
> there regarding whether the separator is a colon or a semicolon.

It can only be colon. Using semicolon would interfere with the syntax
for multiple style specifications in one command sequence. (I by mistake
wrote a semicolon there in an earlier post; sorry.)
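
A minimal Python sketch of why the separators must differ, assuming
semicolons separate parameters and colons separate sub-parameters, and
assuming the 38:2::R:G:B layout for RGB foreground colour (my reading
of ITU-T T.416; parse_sgr is a hypothetical helper name):

```python
def parse_sgr(params):
    """Split the parameter string of an SGR sequence (the part between
    ESC[ and the final m) into parameters, each a list of sub-parameters.
    Empty fields become None, meaning "default"."""
    result = []
    for param in params.split(";"):
        result.append([int(sub) if sub else None for sub in param.split(":")])
    return result

# Colon form: italic, then one self-contained RGB colour parameter.
colon_form = parse_sgr("3;38:2::255:0:0")
# colon_form == [[3], [38, 2, None, 255, 0, 0]]

# Semicolon form: six separate parameters. Read on their own, 2 is
# "faint" and 0 is "default rendition", so the colour is ambiguous.
semicolon_form = parse_sgr("3;38;2;255;0;0")
# semicolon_form == [[3], [38], [2], [255], [0], [0]]
```

With colons, a parser that does not know parameter 38 can skip it as
one unit. With semicolons it cannot tell where the colour ends.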

> Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m
> for curly underline. What to do with them? Where to draw the line what

(Note colon, not semicolon, as separator.) Possible, partially matching
the capabilities for underlining via CSS (solid, dotted, dashed, wavy,
double). Depends on how many styling options one wants to pick up.

> to add to Unicode and what not to? Will Unicode possibly be a

I don't think anyone wants to make this part of the Unicode standard.
(At the most a Unicode technical note...; from Unicode's point of view.)

[...] 
> What to do with things that Unicode might also want to have, but
> doesn't exist in terminal emulators due to their nature, such as
> switching 

Re: Encoding italic

2019-02-09 Thread Richard Wordingham via Unicode
On Fri, 8 Feb 2019 18:08:34 -0800
Asmus Freytag via Unicode  wrote:

> On 2/8/2019 5:42 PM, James Kass via Unicode wrote:


> You are still making the assumption that selecting a different glyph
> for the base character would automatically lead to the selection of a
> different glyph for the combining mark that follows. That's an iffy
> assumption because "italics" can be realized by choosing a separate
> font (typographically, italics is realized as a separate typeface).

The usual practice is to look for a font that supports both base
character and mark.

> Under the implicit assumptions bandied about here, the VS approach
> thus reveals itself as a true rich-text solution (font switching)
> albeit realized with pseudo coding rather than markup, markdown or
> escape sequences.

Isn't that already the case if one uses variation sequences to choose
between Chinese and Japanese glyphs?

>> Of course, the user might insert VS14s without application
>> assistance.  In which case hopefully the user knows the rules.  The
>> worst case scenario is where the user might insert a VS14 after a
>> non-base character, in which case it should simply be ignored by any
>> application.  It should never “break” the display or the processing;
>> it simply makes the text for that document non-conformant.  (Of
>> course putting a VS14 after “ê” should not result in an italicized
>> “ê”.)

Is there any obligation on applications to ignore it?  In plain text,
the Unicode rules allow the application to choose to render every third
'ê' as italic.  Possibly it comes down to the mens rea of the
application (or of its coder or specifier), but without mentalism an
application could opt to treat <ê, VS14> as .

A relevant concern would be 'voracious' with the first 'o'
italicised by VS14.  How would current typeface selection logic work?
I can envisage <o, VS14> only being in the cmap of an italic font.

Richard.



Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Eli Zaretskii via Unicode
> From: Elias Mårtenson 
> Date: Sat, 9 Feb 2019 13:33:49 +0800
> Cc: Egmont Koblinger , unicode 
> 
>  Moreover, emitting the control sequences that set the mode is in
>  itself a complication, because if the terminal doesn't support them,
>  the result could be corrupted display.  You will need methods of
>  detecting the support, and those detection methods usually involve
>  sending another control sequence to the terminal and waiting for
>  response, something that complicates applications and causes delays in
>  displaying output.
> 
> That's what the TERM environment variable is for though.

That's not indicative enough when some version of a terminal starts to
support a feature not supported by previous versions of the same
terminal.  Happens a lot with terminal emulators such as xterm, which
are under active development, and add features all the time.