Re: [dev] [st] wide characters getting cropped

2021-11-11 Thread NRK
On Thu, Nov 11, 2021 at 05:31:15PM +0100, Страхиња Радић wrote:
> (for example, if a simple "cat /some/file" for a multi-line text file
> has a delay anywhere from 500 ms to a second or two between the output
> of individual lines, when not dependent on factors such as reading
> from a network or a faulty hard disk; that would just be annoying, but
> still usable).

You're correct, cating some file is not something that needs critical
speed. But that's not what I'm talking about when I say "latency".

I'm referring specifically to "input latency" or "end-to-end latency",
which is the delay between a physical input (in this case a key press on
the keyboard) and its output (in this case the key press being
rendered).

- NRK



Re: [dev] [st] wide characters getting cropped

2021-11-11 Thread Страхиња Радић
On 21/11/10 08:55, NRK wrote:
> I wouldn't say it's "critical need". And if we judge from that pov then
> one could ask, "What's the critical need for a dynamic window manager or
> minimal software in general?".

A terminal emulator's job is to allow terminal input/output. Latency is simply not
relevant, unless it is noticeable by a human (for example, if a simple "cat
/some/file" for a multi-line text file has a delay anywhere from 500 ms to a
second or two between the output of individual lines, when not dependent on
factors such as reading from a network or a faulty hard disk; that would just be
annoying, but still usable). st doesn't have such issues, so anyone experiencing
them should look for the cause elsewhere (Xft/fontconfig?).

Also, the need for "minimal software" is not comparable to the "need" for "low
latency" in a terminal emulator. The former is a fundamental concept, while the
latter is superficial.

> XTerm has many (visible) problems. Maybe I've misconfigured it, but I
> cannot get it to fall back to other fonts reliably, and thus some glyphs
> don't render. It also chokes badly when it tries to render some Unicode
> glyphs for the first time.

If you refer to color emoji or Nerd Font symbols, they are poorly supported in
X, and I'd even say they are bloat by themselves. But it is to be expected of
XTerm to have problems. To quote https://st.suckless.org,
> xterm is bloated and unmaintainable. Here's an excerpt from the README:
> 
> Abandon All Hope, Ye Who Enter Here
> 
> This is undoubtedly the most ugly program in the distribution. It was one
> of the first "serious" programs ported, and still has a lot of historical
> baggage.  Ideally, there would be a general tty widget and then vt102 and
> tek4014 subwidgets so that they could be used in other programs. We are
> trying to clean things up as we go, but there is still a lot of work to
> do.
> 
> Needless to say things have not changed, it's still ugly. It has over 65K
> lines of code and emulates obscure and obsolete terminals you will never need.
> 
> The popular alternative, rxvt has only 32K lines of code. This is just too
> much for something as simple as a terminal emulator; it's yet another example
> of code complexity.
> 
> Terminal emulation doesn't need to be so complex.




Re: [dev] [st] wide characters getting cropped

2021-11-10 Thread NRK
On Tue, Nov 09, 2021 at 02:00:57PM +0100, Laslo Hunhold wrote:
> I'm always wondering: What do you suggest to improve the
> latency-situation?

If I knew the answer to that, then I would've ditched XTerm and patched
ST already. Unfortunately I know next to nothing when it comes to the
inner workings of a terminal.


On Tue, Nov 09, 2021 at 10:09:48PM +0100, Страхиња Радић wrote:
> I'm wondering what's the use case for such critical need for low latency?

I wouldn't say it's "critical need". And if we judge from that pov then
one could ask, "What's the critical need for a dynamic window manager or
minimal software in general?".

XTerm has many (visible) problems. Maybe I've misconfigured it, but I
cannot get it to fall back to other fonts reliably, and thus some glyphs
don't render. It also chokes badly when it tries to render some Unicode
glyphs for the first time.

I have neither of those problems on ST. But those situations are far
less common for me compared to the situation where I'm typing into the
terminal (which is always). So if I can get a better experience out of
the most common workflow with a certain piece of software, then it's going
to be the one I will end up using.

Also just to clarify, I wouldn't say ST has "latency issues", that
implies the situation is _bad_. As I've said, it's the 2nd most
responsive terminal I've tried and _MILES_ better than these "gpu
accelerated" terminals. It's also the only other terminal that I still
have installed in my system.

- NRK



Re: [dev] [st] wide characters getting cropped

2021-11-09 Thread Страхиња Радић
On 21/11/09 02:00, Laslo Hunhold wrote:
> I'm always wondering: What do you suggest to improve the
> latency-situation? Can we even be "better" than the screen's framerate?

I'm wondering what's the use case for such critical need for low latency?
Playing DOOM (2016) in a terminal with aalib? That's not what terminal emulators
were meant for.




Re: [dev] [st] wide characters getting cropped

2021-10-29 Thread Страхиња Радић
On 21/10/29 12:18, Dmytro Kolomoiets wrote:
> Страхиња Радић, do you have a cleaned up version of the patch
> which applies to latest st tree without rejecting hunks?

No, but it shouldn't be too hard to make given the PR. I have applied it to my
fork of st (https://git.sr.ht/~strahinja/st).




Re: [dev] [st] wide characters getting cropped

2021-10-29 Thread Dmytro Kolomoiets
> https://github.com/LukeSmithxyz/st/pull/224

Страхиња Радић, do you have a cleaned up version of the patch
which applies to latest st tree without rejecting hunks?


On Wed, 27 Oct 2021 at 23:12, NRK  wrote:
>
> On Wed, Oct 27, 2021 at 09:38:41AM +0200, Hiltjo Posthuma wrote:
> > It's a longstanding myth that st has input latency issues.
> > The commonly quoted benchmark is wrong.
>
> If we're thinking about the same benchmark then it's also outdated.
> But regardless I didn't base my decision on that. Some time ago (9-10
> months) I was testing out a whole bunch of different terminals; some of
> the worst offenders when it comes to latency were Alacritty and Termite
> IIRC.
>
> ST, in my experience, has been the 2nd most responsive terminal and
> quite close to XTerm. But still not as responsive, and thus over time I
> have switched over to XTerm.
>
> It would be interesting to see someone doing a proper benchmark with all
> latest terminal versions, since I don't have any data to back up what I
> am saying :) So take it as you may.
>
> - NRK
>



Re: [dev] [st] wide characters getting cropped

2021-10-27 Thread NRK
On Wed, Oct 27, 2021 at 09:38:41AM +0200, Hiltjo Posthuma wrote:
> It's a longstanding myth that st has input latency issues.
> The commonly quoted benchmark is wrong.

If we're thinking about the same benchmark then it's also outdated.
But regardless I didn't base my decision on that. Some time ago (9-10
months) I was testing out a whole bunch of different terminals; some of
the worst offenders when it comes to latency were Alacritty and Termite
IIRC.

ST, in my experience, has been the 2nd most responsive terminal and
quite close to XTerm. But still not as responsive, and thus over time I
have switched over to XTerm.

It would be interesting to see someone doing a proper benchmark with all
latest terminal versions, since I don't have any data to back up what I
am saying :) So take it as you may.

- NRK



Re: [dev] [st] wide characters getting cropped

2021-10-27 Thread Pavel Renev
The benchmark was done on macOS, if I'm not mistaken



Re: [dev] [st] wide characters getting cropped

2021-10-27 Thread Hiltjo Posthuma
On Wed, Oct 27, 2021 at 03:52:09AM +0600, NRK wrote:
> On Tue, Oct 26, 2021 at 07:51:52PM +, Ian Liu Rodrigues wrote:
> > I've noticed that in some situations wide characters are being cropped
> > on my terminal. The following script, which uses a wide character from
> > the "Nerd Font Symbol"[1], shows a test case:
> > 
> > 
> > echo -e '\e[31m \e[0m c'
> > echo -e '\e[31m  \e[0mc'
> > 
> 
> Hi Liu,
> 
> I remember having the same problem in ST, however it works fine now.
> Looking at my git log I haven't applied any patches for it and even the
> upstream branch works fine for me.
> 
> This might be related to terminfo (5). I won't be delving any deeper
> into this as I've moved over to XTerm long ago due to input latency
> reasons. You might want to check FAQ provided in the ST repo.
> 
> P.S echo is non-portable. Use printf.
> https://wiki.bash-hackers.org/commands/builtin/echo#portability_considerations
> 
> - NRK
> 

It's a longstanding myth that st has input latency issues.
The commonly quoted benchmark is wrong.

-- 
Kind regards,
Hiltjo



Re: [dev] [st] wide characters getting cropped

2021-10-26 Thread Ian Liu Rodrigues
On Tuesday, October 26th, 2021 at 17:27, Страхиња Радић wrote:
> For me, this patch fixed the glyph truncation:
>
> https://github.com/LukeSmithxyz/st/pull/224
>
> Perhaps someone could add this to the official patches?


Thanks! I will try applying that patch.



Re: [dev] [st] wide characters getting cropped

2021-10-26 Thread NRK
On Tue, Oct 26, 2021 at 07:51:52PM +, Ian Liu Rodrigues wrote:
> I've noticed that in some situations wide characters are being cropped
> on my terminal. The following script, which uses a wide character from
> the "Nerd Font Symbol"[1], shows a test case:
> 
> 
> echo -e '\e[31m \e[0m c'
> echo -e '\e[31m  \e[0mc'
> 

Hi Liu,

I remember having the same problem in ST, however it works fine now.
Looking at my git log I haven't applied any patches for it and even the
upstream branch works fine for me.

This might be related to terminfo (5). I won't be delving any deeper
into this as I've moved over to XTerm long ago due to input latency
reasons. You might want to check FAQ provided in the ST repo.

P.S echo is non-portable. Use printf.
https://wiki.bash-hackers.org/commands/builtin/echo#portability_considerations

- NRK



[dev] [st] wide characters getting cropped

2021-10-26 Thread Ian Liu Rodrigues
Dear all,

This is my first post here after two failed attempts, I think because
of the email being sent as HTML. Let's hope this one goes alright.

I've noticed that in some situations wide characters are being cropped
on my terminal. The following script, which uses a wide character from
the "Nerd Font Symbol"[1], shows a test case:


echo -e '\e[31m \e[0m c'
echo -e '\e[31m  \e[0mc'


Here is a screenshot of the script's output: https://qu.ax/3SBs.png

The only difference between the two echo commands is the position where
the foreground color resets: the first resets right after the wide
character, whereas the second resets after the space.

I've hacked a little bit in the source code but couldn't figure out how
st paints the characters. I see in function xdrawglyphfontspecs[2] that
it calls this:

/* Clean up the region we want to draw to. */
XftDrawRect(xw.draw, bg, winx, winy, width, win.ch);

which seems to clear the character rectangle unconditionally, but then
shouldn't the second echo also crop?
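
One possible reading, offered as a hedged sketch and not as a description
of st's actual code: if the terminal draws each run of same-attribute cells
separately, and clears the run's rectangle before painting its glyphs, then
a glyph whose ink is wider than its cell survives only when the neighbouring
cell belongs to the same run. The toy model below makes that assumption
explicit. The names draw_run and render, the one-character-per-column
"screen", and the run boundaries are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define COLS 3

/* A cell holds one character; ink_w is how many columns its glyph covers. */
struct cell { char ch; int ink_w; };

static char screen[COLS + 1];

/* Draw cells [start, start + len): clear the run's rectangle first
 * (the role XftDrawRect plays in the quoted snippet), then paint glyphs.
 * A glyph with ink_w == 2 spills into the column to its right. */
static void draw_run(const struct cell *cells, int start, int len)
{
    assert(start >= 0 && len >= 0 && start + len <= COLS);
    memset(screen + start, '.', (size_t)len);
    for (int i = start; i < start + len; i++) {
        if (cells[i].ch == ' ')
            continue;
        for (int k = 0; k < cells[i].ink_w && i + k < COLS; k++)
            screen[i + k] = cells[i].ch;
    }
}

/* Render a line as consecutive runs; run_len[] lists each run's length. */
const char *render(const struct cell *cells, const int *run_len, int nruns)
{
    int start = 0;

    memset(screen, '.', COLS);
    screen[COLS] = '\0';
    for (int r = 0; r < nruns; r++) {
        draw_run(cells, start, run_len[r]);
        start += run_len[r];
    }
    return screen;
}
```

With the cells {'W', 2}, {' ', 1}, {'c', 1}, run lengths {1, 2} (color reset
right after the glyph) give "W.c", because the second run's clear wipes the
spilled half. Run lengths {2, 1} (reset after the space) give "WWc", because
the glyph and the space are cleared and painted together.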

Kind regards,
Ian L. Rodrigues

[1]: https://raw.githubusercontent.com/ryanoasis/nerd-fonts/d0bf73a19c3459aab39734a05159e2694911d7d6/src/glyphs/Symbols-2048-em%20Nerd%20Font%20Complete.ttf

[2]: https://git.suckless.org/st/file/x.c.html#l1453




Re: [dev] [st] wide characters getting cropped

2021-10-26 Thread Страхиња Радић
On 21/10/26 07:51, Ian Liu Rodrigues wrote:
> echo -e '\e[31m \e[0m c'
> echo -e '\e[31m  \e[0mc'
>
>
> Here is a screenshot of the script's output: https://qu.ax/3SBs.png

For me, this patch fixed the glyph truncation:

https://github.com/LukeSmithxyz/st/pull/224

Perhaps someone could add this to the official patches?




Re: [dev] [st] wide characters

2013-04-15 Thread Martti Kühne
On Sun, Apr 14, 2013 at 2:56 AM, Random832 random...@fastmail.us wrote:
> Okay, but why not work with a unicode code point as an int?



-1 from me.
It is utter madness to waste 32 (64 on x86_64) bits for a single
glyph. According to a quick google those chars can become as wide as 6
bytes, and believe me you don't want that, as long as there are
mblen(3) / mbrlen(3)...

cheers!
mar77i



Re: [dev] [st] wide characters

2013-04-15 Thread Alexander Sedov
2013/4/15 Martti Kühne mysat...@gmail.com:
> -1 from me.
> It is utter madness to waste 32 (64 on x86_64) bits for a single
> glyph. According to a quick google those chars can become as wide as 6
> bytes, and believe me you don't want that, as long as there are
> mblen(3) / mbrlen(3)...

int is always 32 bits, and given we are already wasting that exact
amount of space for each glyph (char[4]), your point is somewhat weak.
I think the real reason is future diacritics support and the potential
ability to store multiple runes in one glyph.



Re: [dev] [st] wide characters

2013-04-15 Thread random832
On Mon, Apr 15, 2013, at 10:58, Martti Kühne wrote:
> On Sun, Apr 14, 2013 at 2:56 AM, Random832 random...@fastmail.us wrote:
> > Okay, but why not work with a unicode code point as an int?
>
> -1 from me.
> It is utter madness to waste 32 (64 on x86_64) bits for a single
> glyph.

A. current usage is char[4]

B. int is 32 bits on x86_64. There's no I in LP64.

> According to a quick google those chars can become as wide as 6
> bytes,

No, they can't. I have no idea what your source on this is.

> and believe me you don't want that, as long as there are
> mblen(3) / mbrlen(3)...

I don't know how these functions are relevant to your argument.
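
The size argument above can be checked directly. The following is a minimal
sketch, assuming a cell payload of UTF_SIZ == 4 bytes as described in the
thread. The struct and function names are hypothetical and are not taken
from st. It uses uint32_t for the code-point variant because the C standard
only guarantees that int is at least 16 bits wide; it is 32 bits on the
common ILP32 and LP64 platforms.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define UTF_SIZ 4   /* bytes reserved per cell for one UTF-8 sequence */

/* Cell payload as raw UTF-8 bytes (the layout the thread calls char[4]). */
struct cell_utf8 { char c[UTF_SIZ]; };

/* Hypothetical alternative: one decoded code point per cell. */
struct cell_rune { uint32_t u; };

size_t cell_utf8_size(void) { return sizeof(struct cell_utf8); }
size_t cell_rune_size(void) { return sizeof(struct cell_rune); }

/* Width of int in bits on the platform this is compiled for. */
size_t int_bits(void) { return sizeof(int) * CHAR_BIT; }
```

On typical platforms both payloads occupy four bytes, so switching from raw
UTF-8 bytes to a code point neither costs nor saves memory per cell.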



Re: [dev] [st] wide characters

2013-04-15 Thread Strake
On 15/04/2013, random...@fastmail.us random...@fastmail.us wrote:
> On Mon, Apr 15, 2013, at 10:58, Martti Kühne wrote:
> > According to a quick google those chars can become as wide as 6
> > bytes,
>
> No, they can't. I have no idea what your source on this is.

In UTF-8 the maximum encoded character length is 6 bytes [1]

[1] Linux docs: man 7 utf-8

This is more than a four-byte integer ('‿')



Re: [dev] [st] wide characters

2013-04-15 Thread Alexander Sedov
2013/4/15 Strake strake...@gmail.com:
> On 15/04/2013, random...@fastmail.us random...@fastmail.us wrote:
> > On Mon, Apr 15, 2013, at 10:58, Martti Kühne wrote:
> > > According to a quick google those chars can become as wide as 6
> > > bytes,
> >
> > No, they can't. I have no idea what your source on this is.
>
> In UTF-8 the maximum encoded character length is 6 bytes [1]
>
> [1] Linux docs: man 7 utf-8
>
> This is more than a four-byte integer ('‿')

1. That's outdated information. The Unicode range has been reduced since then.
2. That's relevant to multibyte characters, not to wide ones. Wide characters
are always fixed-size.



Re: [dev] [st] wide characters

2013-04-15 Thread random832
On Mon, Apr 15, 2013, at 15:16, Strake wrote:
> On 15/04/2013, random...@fastmail.us random...@fastmail.us wrote:
> > On Mon, Apr 15, 2013, at 10:58, Martti Kühne wrote:
> > > According to a quick google those chars can become as wide as 6
> > > bytes,
> >
> > No, they can't. I have no idea what your source on this is.
>
> In UTF-8 the maximum encoded character length is 6 bytes [1]

What on earth does that have to do with using an int to store the code
point *instead of* the raw UTF-8 bytes (which are used _now_)?

Also, this is out of date; the latest version of Unicode (since 2003 at
the latest) limits code points to 0x10FFFF and therefore UTF-8 sequences
to four bytes. Unless your manpage is much older than mine, it states
this clearly and you misread it.
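
To make the four-byte limit concrete, here is a minimal sketch of a UTF-8
encoder restricted to the current Unicode range (0 to 0x10FFFF, excluding
surrogates). The function name utf8_encode is hypothetical; this is not
st's encoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into out[4].
 * Returns the number of bytes written (1 to 4), or 0 if cp is not a
 * Unicode scalar value (a surrogate, or above 0x10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;   /* beyond the Unicode range: no 5- or 6-byte forms */
}
```

The largest code point, 0x10FFFF, encodes as F4 8F BF BF, so a four-byte
buffer is always enough.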



Re: [dev] [st] wide characters

2013-04-15 Thread Thorsten Glaser
Strake dixit:

> In UTF-8 the maximum encoded character length is 6 bytes [1]

Right, but the largest codepoint in Unicode is U-0001,
which is �: F0 9F BF BF in UTF-8.

Most things are in the BMP anyway – for example, the distance
between the lowest and highest encoded glyph in an X11 font
is roughly 2¹⁶, so you’ll end up using up to 3 octets normally,
but at additional cost for some operations (glyph width, and,
though very minor, movement across characters).

Actually, wint_t is the standard type to use for this. One
could also use wchar_t but that may be an unsigned short on
some systems, or a signed or unsigned int. uint32_t makes
sense, if one doesn’t want to go after the possible savings
on 16-bit Unicode systems, since signed integers in C are
almost Undefined anyway…

bye,
//mirabilos
-- 
15:39⎜«mika:#grml» mira|AO: mit XFree86® wär’ das nicht passiert - muhaha
15:48⎜thkoehler:#grml also warum machen die xorg Jungs eigentlich alles
kaputt? :)15:49⎜novoid:#grml thkoehler: weil sie als Kinder nie den
gebauten Turm selber umschmeissen durften?  -- ~/.Xmodmap wonders…



Re: [dev] [st] wide characters

2013-04-15 Thread random832
On Mon, Apr 15, 2013, at 15:36, Thorsten Glaser wrote:
> Actually, wint_t is the standard type to use for this. One
> could also use wchar_t but that may be an unsigned short on
> some systems, or a signed or unsigned int.

Those systems aren't using wchar_t *or* wint_t for unicode, though.

The main reason for wint_t's existence is that wchar_t isn't guaranteed
to be able to represent a WEOF value distinct from all valid character
values. wchar_t can be used just fine for any actual character, but if
the system doesn't use unicode as its wchar type, it could (for example)
be a signed 16-bit int to wchar_t's unsigned 8-bit.

You can use #if __STDC_ISO_10646__ to test whether the implementation
uses unicode for wchar_t (most modern systems do, though some may not
define this constant) - if so, then wchar_t is, naturally, guaranteed to
be able to represent at least the range 0 to 0x10FFFF, and wint_t that
plus WEOF (usually -1). They're usually both 32-bit signed ints.

MS Windows uses an unsigned short for both types due to various
historical reasons.
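
A minimal sketch of the compile-time test described above, together with a
range check that does not rely on it. The function names are hypothetical;
__STDC_ISO_10646__ and WCHAR_MAX are the standard macros.

```c
#include <stdint.h>
#include <wchar.h>

/* 1 if the implementation states that wchar_t values are ISO 10646
 * (Unicode) code points, 0 if it makes no such statement. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* 1 if code point cp is no larger than WCHAR_MAX, i.e. wchar_t has
 * enough range to store it as a single value. */
int wchar_can_hold(uint32_t cp)
{
    return (uintmax_t)cp <= (uintmax_t)WCHAR_MAX;
}
```

Checking wchar_can_hold(0x10FFFF) separately is a cautious choice: it
answers the range question even on systems that do not define the macro.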



Re: [dev] [st] wide characters

2013-04-15 Thread Thorsten Glaser
random...@fastmail.us dixit:

> Those systems aren't using wchar_t *or* wint_t for unicode, though.

Do not assume that.

tg@blau:~ $ echo '__STDC_ISO_10646__ / __WCHAR_TYPE__ , __WCHAR_MAX__' | cc -E -
# 1 stdin
# 1 built-in
# 1 command line
# 1 stdin
29L / short unsigned int , 65535U

> The main reason for wint_t's existence is that wchar_t isn't guaranteed
> to be able to represent a WEOF value distinct from all valid character

Right.

> You can use #if __STDC_ISO_10646__ to test whether the implementation
> uses unicode for wchar_t (most modern systems do, though some may not
> define this constant)

I think most do not define this constant…

> - if so, then wchar_t is, naturally, guaranteed to
> be able to represent at least the range 0 to 0x10FFFF

Nope. But systems using 16 bit may not rise past 29L
even if they do otherwise support newer Unicode stuff.

This works very well by the way: (wchar_t)-1 and (wchar_t)-2
aren’t Unicode characters anyway, and it allows for relatively
easy conversion of legacy software, such as BSD tr (which uses
tables), to Unicode.

I should know, I implemented it for this purpose ;-)

bye,
//mirabilos
-- 
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font.   -- Rob Pike in Notes on Programming in C



Re: [dev] [st] wide characters

2013-04-14 Thread Christoph Lohmann
Greetings.

On Sun, 14 Apr 2013 08:10:22 +0200 Random832 random...@fastmail.us wrote:
> I am forced to ask, though, why character cell values are stored in
> utf-8 rather than as wchar_t (or as an explicitly unicode int) in the
> first place, particularly since the simplest way to detect a wide
> character is to call the function wcwidth. What was the reason for this
> design decision? It doesn't save any space, since on most systems
> UTF_SIZ == sizeof(int) == sizeof(wchar_t).

That design decision can change when I’m actually implementing the
double-width and double-height support in st. The codebase is small enough
to change such a type in less than 10 minutes. So no religion was
introduced here.

> And I don't know the st codebase well enough (or at all, really) to tell
> at a glance what would have to be changed to be able to support a
> double-width character cell, or to support wrapping to the next line if
> one is output at the second-to-last column.

I haven't yet had the time to read all the double-width implementations in
other terminals so that st would do the »right thing« in implementing all
questionable cases.

Double-width characters are, like BCE, a design decision applications need
to adapt to.

Some corner cases I haven't yet found a good answer to:
* Is there any standard for this except for setting the flag in
  terminfo and taking up two cells in the terminal?
* If st has double-width default.
* What happens if the application does naive character
  counting? Will layouts break?
* Is there some way to tell the application that we have
  double-width support enforced except for the terminfo?
* How do applications implement this? Is there some historical
  cruft that will break?
* With an option to toggle the double-width handling:
* Is this needed for tmux, screen or other terminal proxies
  that for example miss BCE too?

These are the questions I am missing an answer to before implementing this.
The code isn’t a problem.


Sincerely,

Christoph Lohmann




Re: [dev] [st] wide characters

2013-04-14 Thread Troels Henriksen
Random832 random...@fastmail.us writes:

> On 04/13/2013 07:07 PM, Aurélien Aptel wrote:
> > The ISO/IEC 10646:2003 Unicode standard 4.0 says that:
> >
> > The width of wchar_t is compiler-specific and can be as small as
> > 8 bits. Consequently, programs that need to be portable across any C
> > or C++ compiler should not use wchar_t for storing Unicode text. The
> > wchar_t type is intended for storing compiler-defined wide characters,
> > which may be Unicode characters in some compilers.
> >
> > utf-8 is rather straightforward to handle and process.
>
> Okay, but why not work with a unicode code point as an int?

That would not be UTF-8, but UCS-4.  I don't think Xlib can handle that
natively.
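
For illustration, turning a UTF-8 sequence into a single code point (the
UCS-4 style value discussed above) takes only a few lines. This is a hedged
sketch with a hypothetical name, utf8_decode, and is not st's decoder. It
rejects overlong forms, surrogates, and values above 0x10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes consumed (1 to 4), or 0 on invalid or
 * truncated input. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs each length. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t u;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        len = 1; u = s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; u = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; u = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; u = s[0] & 0x07;
    } else {
        return 0;               /* stray continuation or invalid lead byte */
    }
    if (n < len)
        return 0;               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        u = (u << 6) | (s[i] & 0x3F);
    }
    if (u < min_cp[len] || u > 0x10FFFF || (u >= 0xD800 && u <= 0xDFFF))
        return 0;               /* overlong, out of range, or surrogate */
    *cp = u;
    return len;
}
```

Whether the cell then stores the decoded value or the original bytes is the
design question of this thread; the sketch only shows that the conversion
itself is small.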

-- 
\  Troels
/\ Henriksen



Re: [dev] [st] wide characters

2013-04-14 Thread Random832

On 04/14/2013 02:10 AM, Christoph Lohmann wrote:

> Greetings.
>
> On Sun, 14 Apr 2013 08:10:22 +0200 Random832 random...@fastmail.us wrote:
>
> > I am forced to ask, though, why character cell values are stored in
> > utf-8 rather than as wchar_t (or as an explicitly unicode int) in the
> > first place, particularly since the simplest way to detect a wide
> > character is to call the function wcwidth. What was the reason for this
> > design decision? It doesn't save any space, since on most systems
> > UTF_SIZ == sizeof(int) == sizeof(wchar_t).
>
> That design decision can change when I’m actually implementing the
> double-width and double-height support in st. The codebase is small enough
> to change such a type in less than 10 minutes. So no religion was
> introduced here.


The reason for my question about using codepoints instead of UTF-8 was 
because I thought it might make it easier to support combining 
diacritics, not wide characters. The two problems are broadly related 
because both of them affect the number of character cells occupied by a 
string.



> > And I don't know the st codebase well enough (or at all, really) to tell
> > at a glance what would have to be changed to be able to support a
> > double-width character cell, or to support wrapping to the next line if
> > one is output at the second-to-last column.
>
> I hadn't yet the time to read all the double-width implementations in other
> terminals so st would do the »right thing« in implementing all questionable
> cases.
>
> Double-width characters are like BCE a design decision applications need
> adapt to.
>
> Some corner cases I haven't yet found a good answer to:
> * Is there any standard for this except for setting the flag in
>   terminfo and taking up two cells in the terminal?


I don't know if there's a standard. I can find nothing about character 
cell terminals in any UTR, and ECMA 48 is silent on the question of wide 
characters.


I don't know what terminfo flag you are referring to. I was talking 
about support for east asian characters, not VT100-style stretching of 
ASCII characters. I suspect the widcs/swidm/rwidm capabilities refer to 
the latter (though the only actual instance in the terminfo database is 
a swidm string on the att730).



Observed behavior in various terminals that do support them is:
* cursor position can be in either half of a double character, though 
the whole character is highlighted (all observed terminals)
* outputting one at the end of the line (i.e. where a pair of two narrow 
characters would be split across lines) fails entirely (xterm) or wraps 
to the next line leaving the last cell alone (vte, tmux, mlterm, kterm).
* outputting a narrow character on top of a wide character erases the 
entire wide character (xterm, tmux, mlterm, kterm) or erases only when 
in the left half (vte)


* deleting (e.g. with ESC [ P) part of a character has various different 
behaviors:
** on xterm and kterm, deleting either half of a character replaces the 
remaining half with a single-width blank space.
** tmux's behavior is very buggy: a vertical line drawn across a 
different part of the screen _after_ deleting different parts of wide 
characters on different lines ended up redrawing incorrectly after 
refreshing. As for the wide characters themselves, deleting the left 
half deletes the entire character and deleting the right half has no 
effect, but there is some hidden state involved - a sequence of two 
deletions will delete a single wide character. I suspect the right 
half is filled with some placeholder value that is not output to the 
host terminal, and they are deleted individually. This is consistent 
with all of my observations.
** on mlterm, deleting the left half of a character deletes the entire 
character; deleting the right half replaces it with two spaces.
** on vte, deleting the right half of a character replaces the _next_ 
character with a space. Deleting the left half replaces the present 
character with a space, but seems to leave some hidden state, since the 
cursor on this space is still double width.
* the xterm/kterm behavior seems the most rational, since it yields no 
visual glitches, always keeps the cursor in the same logical position, 
and a deletion always shifts characters right of it by the same amount.
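
Two of the behaviors listed above can be sketched in a small grid model:
wrapping a wide character that does not fit while leaving the last cell
alone (as reported for vte, tmux, mlterm and kterm), and erasing the whole
wide character when a narrow one is written over either half (as reported
for xterm, tmux, mlterm and kterm). This is only an illustration under
those assumptions. The types, the WDUMMY placeholder for the right half,
and the function names are hypothetical and not taken from any of these
terminals.

```c
#include <assert.h>
#include <stdint.h>

#define COLS 6
#define ROWS 2

/* WDUMMY marks the right half of a wide character. */
enum kind { NARROW, WIDE, WDUMMY };

struct cell { uint32_t cp; enum kind kind; };
struct grid { struct cell c[ROWS][COLS]; int x, y; };

static void blank(struct cell *c) { c->cp = ' '; c->kind = NARROW; }

void grid_init(struct grid *g)
{
    for (int y = 0; y < ROWS; y++)
        for (int x = 0; x < COLS; x++)
            blank(&g->c[y][x]);
    g->x = g->y = 0;
}

/* Blank column x; if it held either half of a wide character,
 * blank the other half as well. */
static void erase_at(struct grid *g, int y, int x)
{
    struct cell *row = g->c[y];

    if (row[x].kind == WIDE && x + 1 < COLS)
        blank(&row[x + 1]);
    else if (row[x].kind == WDUMMY && x > 0)
        blank(&row[x - 1]);
    blank(&row[x]);
}

/* Write a character of the given width (1 or 2) at the cursor. */
void grid_put(struct grid *g, uint32_t cp, int width)
{
    assert(width == 1 || width == 2);
    if (g->x + width > COLS) {
        /* Does not fit: leave the rest of the line alone and wrap.
         * Toy model: wrap to the top row instead of scrolling. */
        g->x = 0;
        g->y = (g->y + 1) % ROWS;
    }
    erase_at(g, g->y, g->x);
    if (width == 2)
        erase_at(g, g->y, g->x + 1);
    g->c[g->y][g->x].cp = cp;
    g->c[g->y][g->x].kind = (width == 2) ? WIDE : NARROW;
    if (width == 2) {
        g->c[g->y][g->x + 1].cp = 0;
        g->c[g->y][g->x + 1].kind = WDUMMY;
    }
    g->x += width;
}
```

Keeping an explicit placeholder in the right half is one way to let the
cursor rest on either half while still knowing which cells belong together.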


I haven't made any detailed investigation into the actual set of 
characters that are considered wide (or combining) by each terminal and 
by various applications, (except tmux, which has a list of ranges in 
utf8.c). I also haven't investigated whether any of them have 
locale-dependent treatment of ambiguous characters (e.g. greek or 
cyrillic) which are wide in historical east asian fonts (except tmux, 
which does not)
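
A range-table approach like the one mentioned for tmux can be sketched as
follows. The table here is a deliberately incomplete sample chosen for
illustration, not tmux's list and not a full East Asian Width table. The
names wide_ranges and cell_width are hypothetical. Ambiguous-width
characters such as Greek or Cyrillic are treated as narrow.

```c
#include <stddef.h>
#include <stdint.h>

struct range { uint32_t first, last; };

/* Sorted, non-overlapping, deliberately incomplete sample of ranges
 * whose characters occupy two cells. */
static const struct range wide_ranges[] = {
    { 0x1100, 0x115F },   /* Hangul Jamo initial consonants */
    { 0x4E00, 0x9FFF },   /* CJK Unified Ideographs */
    { 0xAC00, 0xD7A3 },   /* Hangul Syllables */
    { 0xFF01, 0xFF60 },   /* Fullwidth ASCII variants */
};

/* Return 2 if cp falls in one of the sample wide ranges, else 1.
 * Uses binary search over the sorted table. */
int cell_width(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof(wide_ranges) / sizeof(wide_ranges[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < wide_ranges[mid].first)
            hi = mid;
        else if (cp > wide_ranges[mid].last)
            lo = mid + 1;
        else
            return 2;
    }
    return 1;
}
```

A fixed table gives the same answer regardless of locale, which is the
trade-off against calling wcwidth: predictable results, but the table has
to be maintained by hand.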


mlterm does have an option that makes it work differently; the above 
results are with -Z enabled.



> * If st has double-width default.
> * What happens if the application does naive character
>   counting? Will layouts break?


My experience is that layouts break 

[dev] [st] wide characters

2013-04-13 Thread Random832
I don't mean as in wchar_t, I mean as in characters (generally in East 
Asian languages) that are meant to take up two character cells.


I am forced to ask, though, why character cell values are stored in 
utf-8 rather than as wchar_t (or as an explicitly unicode int) in the 
first place, particularly since the simplest way to detect a wide 
character is to call the function wcwidth. What was the reason for this 
design decision? It doesn't save any space, since on most systems 
UTF_SIZ == sizeof(int) == sizeof(wchar_t).


And I don't know the st codebase well enough (or at all, really) to tell 
at a glance what would have to be changed to be able to support a 
double-width character cell, or to support wrapping to the next line if 
one is output at the second-to-last column.




Re: [dev] [st] wide characters

2013-04-13 Thread Aurélien Aptel
On Sat, Apr 13, 2013 at 11:17 PM, Random832 random...@fastmail.us wrote:
> I am forced to ask, though, why character cell values are stored in utf-8
> rather than as wchar_t (or as an explicitly unicode int) in the first place,
> particularly since the simplest way to detect a wide character is to call
> the function wcwidth. What was the reason for this design decision? It
> doesn't save any space, since on most systems UTF_SIZ == sizeof(int) ==
> sizeof(wchar_t).

The ISO/IEC 10646:2003 Unicode standard 4.0 says that:

The width of wchar_t is compiler-specific and can be as small as
8 bits. Consequently, programs that need to be portable across any C
or C++ compiler should not use wchar_t for storing Unicode text. The
wchar_t type is intended for storing compiler-defined wide characters,
which may be Unicode characters in some compilers.

utf-8 is rather straightforward to handle and process.



Re: [dev] [st] wide characters

2013-04-13 Thread Random832

On 04/13/2013 07:07 PM, Aurélien Aptel wrote:

> The ISO/IEC 10646:2003 Unicode standard 4.0 says that:
>
> The width of wchar_t is compiler-specific and can be as small as
> 8 bits. Consequently, programs that need to be portable across any C
> or C++ compiler should not use wchar_t for storing Unicode text. The
> wchar_t type is intended for storing compiler-defined wide characters,
> which may be Unicode characters in some compilers.
>
> utf-8 is rather straightforward to handle and process.


Okay, but why not work with a unicode code point as an int?