Re: Fix handling of ed addresses consisting only of separators

2023-01-29 Thread Sören Tempel
Hi,

I've been using this patch for the past 6 months and haven't noticed any
regressions. However, during this timespan I became aware of additional
POSIX compatibility issues in OpenBSD ed: `3  2p` as well as
addresses with interleaved `,` and `;` separators (e.g. `1,;`) are also
evaluated incorrectly.

These additional compatibility issues are unrelated to the issue fixed
by the proposed patch. Is there a general interest in improving the
POSIX compatibility of OpenBSD ed? If so, what would need to be done in
order to get the proposed patch committed?

Greetings,
Sören

Sören Tempel  wrote:
> PING.
> 
> I have executed the ed tests (available in bin/ed/test) with this patch
> applied and the proposed patch doesn't seem to introduce any new
> regressions.
> 
> If the patch needs to be revised further, please let me know. The only
> thing I can think of right now is that it might make sense to rename the
> `recur` variable to `sawseparator` or something along those lines.
> 
> Greetings,
> Sören
> 
> Sören Tempel  wrote:
> > Hello,
> > 
> > The POSIX specification of ed(1) includes a table in the rationale
> > section which (among others) mandates the following address handling
> > rules [1]:
> > 
> >  Address Addr1 Addr2
> >  ,,  $ $
> >  ;;  $ $
> > 
> > Unfortunately, OpenBSD does not correctly handle these two example
> > addresses. As an example, consider the following `,,p` print command:
> > 
> > printf "a\nfoo\nbar\nbaz\n.\n,,p\nQ\n" | ed
> > 
> > This should only print the last line (as `,,p` should expand to `$,$p`)
> > but instead prints all lines with OpenBSD ed. This seems to be a
> > regression introduced by Jerome Frgagic in 2017 [2]. Prior to this
> > change, `,,p` as well as `;;p` correctly printed the last line. The
> > aforementioned change added a recursive invocation of next_addr() which
> > does not update the local first variable (causing the second address
> > separator to be mistaken for the first). The patch below fixes this by
> > tracking recursive invocations of next_addr() via an additional function
> > parameter.
> > 
> > See also: https://austingroupbugs.net/view.php?id=1582
> > 
> > [1]: 
> > https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html#tag_20_38_18
> > [2]: 
> > https://github.com/openbsd/src/commit/b4c905bbf3932a50011ce3436b1e22acc742cfeb
> > 
> > diff --git bin/ed/main.c bin/ed/main.c
> > index 231d021ef19..ca1aeb102b3 100644
> > --- bin/ed/main.c
> > +++ bin/ed/main.c
> > @@ -64,7 +64,7 @@ void signal_hup(int);
> >  void signal_int(int);
> >  void handle_winch(int);
> >  
> > -static int next_addr(void);
> > +static int next_addr(int);
> >  static int check_addr_range(int, int);
> >  static int get_matching_node_addr(regex_t *, int);
> >  static char *get_filename(int);
> > @@ -296,7 +296,7 @@ extract_addr_range(void)
> >  
> > addr_cnt = 0;
> > first_addr = second_addr = current_addr;
> > -   while ((addr = next_addr()) >= 0) {
> > +   while ((addr = next_addr(0)) >= 0) {
> > addr_cnt++;
> > first_addr = second_addr;
> > second_addr = addr;
> > @@ -328,7 +328,7 @@ extract_addr_range(void)
> >  
> >  /*  next_addr: return the next line address in the command buffer */
> >  static int
> > -next_addr(void)
> > +next_addr(int recur)
> >  {
> > char *hd;
> > int addr = current_addr;
> > @@ -382,11 +382,15 @@ next_addr(void)
> > case '%':
> > case ',':
> > case ';':
> > -   if (first) {
> > +   if (first && !recur) {
> > ibufp++;
> > addr_cnt++;
> > second_addr = (c == ';') ? current_addr : 1;
> > -   if ((addr = next_addr()) < 0)
> > +
> > +   // If the next address is omitted (e.g. "," or 
> > ";") or
> > +   // if it is also a separator (e.g. ",," or 
> > ";;") then
> > +   // return the last address (as per the omission 
> > rules).
> > +   if ((addr = next_addr(1)) < 0)
> > addr = addr_last;
> > break;
> > }



Re: ed: Make use of stderr compliant with POSIX

2022-11-17 Thread Sören Tempel
PING.

I think this is a fairly obvious POSIX compliance issue with an easy fix.

Sören Tempel  wrote:
> Hello,
> 
> Currently, the OpenBSD ed implementation incorrectly writes information
> to standard error that should be written to standard out (as per POSIX).
> 
> For the read and write command the POSIX standard states the following:
> 
>   If the command is successful, the number of bytes written shall
>   be written to standard output [...]
> 
> However, OpenBSD ed presently writes the number of bytes (for both the
> read and the write command) to standard error. Furthermore, the POSIX
> standard mandates that the infamous "?\n" error string should also be
> written to standard output while OpenBSD ed presently writes the string
> to standard error.
> 
> Both cases are fixed by the patch below.
> 
> OK?
> 
> diff --git bin/ed/io.c bin/ed/io.c
> index 97306be16..e75bedd8b 100644
> --- bin/ed/io.c
> +++ bin/ed/io.c
> @@ -64,7 +64,7 @@ read_file(char *fn, int n)
>   return ERR;
>   }
>   if (!scripted)
> - fprintf(stderr, "%d\n", size);
> + printf("%d\n", size);
>   return current_addr - n;
>  }
>  
> @@ -166,7 +166,7 @@ write_file(char *fn, char *mode, int n, int m)
>   return ERR;
>   }
>   if (!scripted)
> - fprintf(stderr, "%d\n", size);
> + printf("%d\n", size);
>   return n ? m - n + 1 : 0;
>  }
>  
> diff --git bin/ed/main.c bin/ed/main.c
> index 231d021ef..674741c7c 100644
> --- bin/ed/main.c
> +++ bin/ed/main.c
> @@ -184,7 +184,7 @@ top:
>   signal(SIGINT, signal_int);
>   if (sigsetjmp(env, 1)) {
>   status = -1;
> - fputs("\n?\n", stderr);
> + fputs("\n?\n", stdout);
>   seterrmsg("interrupt");
>   } else {
>   init_buffers();
> @@ -196,7 +196,7 @@ top:
>   strlcpy(old_filename, *argv,
>   sizeof old_filename);
>   } else if (argc) {
> - fputs("?\n", stderr);
> + fputs("?\n", stdout);
>   if (**argv == '\0')
>   seterrmsg("invalid filename");
>   if (!interactive)
> @@ -215,7 +215,7 @@ top:
>   continue;
>   } else if (n == 0) {
>   if (modified && !scripted) {
> - fputs("?\n", stderr);
> + fputs("?\n", stdout);
>   seterrmsg("warning: file modified");
>   if (!interactive) {
>   if (garrulous)
> @@ -250,7 +250,7 @@ top:
>   break;
>   case EMOD:
>   modified = 0;
> - fputs("?\n", stderr);   /* give warning */
> + fputs("?\n", stdout);   /* give warning */
>   seterrmsg("warning: file modified");
>   if (!interactive) {
>   if (garrulous)
> @@ -271,7 +271,7 @@ top:
>   quit(3);
>   break;
>   default:
> - fputs("?\n", stderr);
> + fputs("?\n", stdout);
>   if (!interactive) {
>   if (garrulous)
>   fprintf(stderr,



ed: Make use of stderr compliant with POSIX

2022-09-25 Thread Sören Tempel
Hello,

Currently, the OpenBSD ed implementation incorrectly writes information
to standard error that should be written to standard out (as per POSIX).

For the read and write command the POSIX standard states the following:

If the command is successful, the number of bytes written shall
be written to standard output [...]

However, OpenBSD ed presently writes the number of bytes (for both the
read and the write command) to standard error. Furthermore, the POSIX
standard mandates that the infamous "?\n" error string should also be
written to standard output while OpenBSD ed presently writes the string
to standard error.

Both cases are fixed by the patch below.

OK?

diff --git bin/ed/io.c bin/ed/io.c
index 97306be16..e75bedd8b 100644
--- bin/ed/io.c
+++ bin/ed/io.c
@@ -64,7 +64,7 @@ read_file(char *fn, int n)
return ERR;
}
if (!scripted)
-   fprintf(stderr, "%d\n", size);
+   printf("%d\n", size);
return current_addr - n;
 }
 
@@ -166,7 +166,7 @@ write_file(char *fn, char *mode, int n, int m)
return ERR;
}
if (!scripted)
-   fprintf(stderr, "%d\n", size);
+   printf("%d\n", size);
return n ? m - n + 1 : 0;
 }
 
diff --git bin/ed/main.c bin/ed/main.c
index 231d021ef..674741c7c 100644
--- bin/ed/main.c
+++ bin/ed/main.c
@@ -184,7 +184,7 @@ top:
signal(SIGINT, signal_int);
if (sigsetjmp(env, 1)) {
status = -1;
-   fputs("\n?\n", stderr);
+   fputs("\n?\n", stdout);
seterrmsg("interrupt");
} else {
init_buffers();
@@ -196,7 +196,7 @@ top:
strlcpy(old_filename, *argv,
sizeof old_filename);
} else if (argc) {
-   fputs("?\n", stderr);
+   fputs("?\n", stdout);
if (**argv == '\0')
seterrmsg("invalid filename");
if (!interactive)
@@ -215,7 +215,7 @@ top:
continue;
} else if (n == 0) {
if (modified && !scripted) {
-   fputs("?\n", stderr);
+   fputs("?\n", stdout);
seterrmsg("warning: file modified");
if (!interactive) {
if (garrulous)
@@ -250,7 +250,7 @@ top:
break;
case EMOD:
modified = 0;
-   fputs("?\n", stderr);   /* give warning */
+   fputs("?\n", stdout);   /* give warning */
seterrmsg("warning: file modified");
if (!interactive) {
if (garrulous)
@@ -271,7 +271,7 @@ top:
quit(3);
break;
default:
-   fputs("?\n", stderr);
+   fputs("?\n", stdout);
if (!interactive) {
if (garrulous)
fprintf(stderr,



Re: Fix handling of ed addresses consisting only of separators

2022-06-19 Thread Sören Tempel
PING.

I have executed the ed tests (available in bin/ed/test) with this patch
applied and the proposed patch doesn't seem to introduce any new
regressions.

If the patch needs to be revised further, please let me know. The only
thing I can think of right now is that it might make sense to rename the
`recur` variable to `sawseparator` or something along those lines.

Greetings,
Sören

Sören Tempel  wrote:
> Hello,
> 
> The POSIX specification of ed(1) includes a table in the rationale
> section which (among others) mandates the following address handling
> rules [1]:
> 
>Address Addr1 Addr2
>,,  $ $
>;;  $ $
> 
> Unfortunately, OpenBSD does not correctly handle these two example
> addresses. As an example, consider the following `,,p` print command:
> 
>   printf "a\nfoo\nbar\nbaz\n.\n,,p\nQ\n" | ed
> 
> This should only print the last line (as `,,p` should expand to `$,$p`)
> but instead prints all lines with OpenBSD ed. This seems to be a
> regression introduced by Jerome Frgagic in 2017 [2]. Prior to this
> change, `,,p` as well as `;;p` correctly printed the last line. The
> aforementioned change added a recursive invocation of next_addr() which
> does not update the local first variable (causing the second address
> separator to be mistaken for the first). The patch below fixes this by
> tracking recursive invocations of next_addr() via an additional function
> parameter.
> 
> See also: https://austingroupbugs.net/view.php?id=1582
> 
> [1]: 
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html#tag_20_38_18
> [2]: 
> https://github.com/openbsd/src/commit/b4c905bbf3932a50011ce3436b1e22acc742cfeb
> 
> diff --git bin/ed/main.c bin/ed/main.c
> index 231d021ef19..ca1aeb102b3 100644
> --- bin/ed/main.c
> +++ bin/ed/main.c
> @@ -64,7 +64,7 @@ void signal_hup(int);
>  void signal_int(int);
>  void handle_winch(int);
>  
> -static int next_addr(void);
> +static int next_addr(int);
>  static int check_addr_range(int, int);
>  static int get_matching_node_addr(regex_t *, int);
>  static char *get_filename(int);
> @@ -296,7 +296,7 @@ extract_addr_range(void)
>  
>   addr_cnt = 0;
>   first_addr = second_addr = current_addr;
> - while ((addr = next_addr()) >= 0) {
> + while ((addr = next_addr(0)) >= 0) {
>   addr_cnt++;
>   first_addr = second_addr;
>   second_addr = addr;
> @@ -328,7 +328,7 @@ extract_addr_range(void)
>  
>  /*  next_addr: return the next line address in the command buffer */
>  static int
> -next_addr(void)
> +next_addr(int recur)
>  {
>   char *hd;
>   int addr = current_addr;
> @@ -382,11 +382,15 @@ next_addr(void)
>   case '%':
>   case ',':
>   case ';':
> - if (first) {
> + if (first && !recur) {
>   ibufp++;
>   addr_cnt++;
>   second_addr = (c == ';') ? current_addr : 1;
> - if ((addr = next_addr()) < 0)
> +
> + // If the next address is omitted (e.g. "," or 
> ";") or
> + // if it is also a separator (e.g. ",," or 
> ";;") then
> + // return the last address (as per the omission 
> rules).
> + if ((addr = next_addr(1)) < 0)
>   addr = addr_last;
>   break;
>   }



Fix handling of ed addresses consisting only of separators

2022-05-22 Thread Sören Tempel
Hello,

The POSIX specification of ed(1) includes a table in the rationale
section which (among others) mandates the following address handling
rules [1]:

 Address Addr1 Addr2
 ,,  $ $
 ;;  $ $

Unfortunately, OpenBSD does not correctly handle these two example
addresses. As an example, consider the following `,,p` print command:

printf "a\nfoo\nbar\nbaz\n.\n,,p\nQ\n" | ed

This should only print the last line (as `,,p` should expand to `$,$p`)
but instead prints all lines with OpenBSD ed. This seems to be a
regression introduced by Jerome Frgagic in 2017 [2]. Prior to this
change, `,,p` as well as `;;p` correctly printed the last line. The
aforementioned change added a recursive invocation of next_addr() which
does not update the local first variable (causing the second address
separator to be mistaken for the first). The patch below fixes this by
tracking recursive invocations of next_addr() via an additional function
parameter.

See also: https://austingroupbugs.net/view.php?id=1582

[1]: 
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html#tag_20_38_18
[2]: 
https://github.com/openbsd/src/commit/b4c905bbf3932a50011ce3436b1e22acc742cfeb

diff --git bin/ed/main.c bin/ed/main.c
index 231d021ef19..ca1aeb102b3 100644
--- bin/ed/main.c
+++ bin/ed/main.c
@@ -64,7 +64,7 @@ void signal_hup(int);
 void signal_int(int);
 void handle_winch(int);
 
-static int next_addr(void);
+static int next_addr(int);
 static int check_addr_range(int, int);
 static int get_matching_node_addr(regex_t *, int);
 static char *get_filename(int);
@@ -296,7 +296,7 @@ extract_addr_range(void)
 
addr_cnt = 0;
first_addr = second_addr = current_addr;
-   while ((addr = next_addr()) >= 0) {
+   while ((addr = next_addr(0)) >= 0) {
addr_cnt++;
first_addr = second_addr;
second_addr = addr;
@@ -328,7 +328,7 @@ extract_addr_range(void)
 
 /*  next_addr: return the next line address in the command buffer */
 static int
-next_addr(void)
+next_addr(int recur)
 {
char *hd;
int addr = current_addr;
@@ -382,11 +382,15 @@ next_addr(void)
case '%':
case ',':
case ';':
-   if (first) {
+   if (first && !recur) {
ibufp++;
addr_cnt++;
second_addr = (c == ';') ? current_addr : 1;
-   if ((addr = next_addr()) < 0)
+
+   // If the next address is omitted (e.g. "," or 
";") or
+   // if it is also a separator (e.g. ",," or 
";;") then
+   // return the last address (as per the omission 
rules).
+   if ((addr = next_addr(1)) < 0)
addr = addr_last;
break;
}



[PATCH] Fix ed shell command when stdout isn't line-buffered

2022-01-22 Thread Sören Tempel
Hello,

The ed shell escape command currently behaves incorrectly if standard
output isn't line-buffered. As an example, consider the following ed(1)
invocations:

$ cat /tmp/ed-cmd
!echo foo
!!
$ ed < /tmp/ed-cmd
foo
!
echo foo
foo
!
$ ed < /tmp/ed-cmd > /tmp/ed-out.txt
$ cat /tmp/ed-out.txt
foo
foo
!
echo foo
!

The output of the first and second invocation differs in regards to when
the modified command-string of the !! command is output. In the first
invocation, it is output before the command is executed (which is the
behavior mandated by POSIX). However, in the second invocation it is
output **after** the command was executed (which is incorrect).

This is due to the fact that in the second invocation standard output is
a file and hence not line-buffered. Unfortunately, ed(1) does currently
not flush all I/O buffers before executing the command using system(3).
As such, output of the command will be written to standard output before
any data buffered by ed itself is written.

The patch below fixes this issue by flushing all open output streams
before executing the command using system(3). Alternatively, it may also
be sufficient to only flush stdout and (maybe) stderr.

This issue was originally discovered in GNU ed where it has been fixed
since. See: https://lists.gnu.org/archive/html/bug-ed/2021-12/msg0.html

Greetings,
Sören Tempel

diff --git bin/ed/main.c bin/ed/main.c
index dc73dd92b..1e3cd8bdb 100644
--- bin/ed/main.c
+++ bin/ed/main.c
@@ -852,6 +852,7 @@ exec_command(void)
return ERR;
GET_COMMAND_SUFFIX();
if (sflags) printf("%s\n", shcmd + 1);
+   fflush(NULL); /* flush any buffered I/O */
system(shcmd + 1);
if (!scripted) printf("!\n");
break;



Re: ksh: add support for bracketed paste mode

2021-09-04 Thread Sören Tempel
   OF_ANY }, /* non-standard */
diff --git sh.h sh.h
index 93beef31d46..652a1f6dd06 100644
--- sh.h
+++ sh.h
@@ -134,6 +134,9 @@ extern const struct option sh_options[];
 enum sh_flag {
FEXPORT = 0,/* -a: export all */
FBRACEEXPAND,   /* enable {} globbing */
+#ifdef EMACS
+   FBBRACKETPASTE, /* bracketed paste mode */
+#endif
FBGNICE,/* bgnice */
FCOMMAND,   /* -c: (invocation) execute specified command */
FCSHHISTORY,/* csh-style history enabled */

Sören Tempel  wrote:
> Hello,
> 
> The diff below adds support for bracketed paste mode to ksh. Bracketed
> paste mode is a set of special escape sequences, which are employed by
> many terminal emulators to allow programs run inside of them to
> distinguish pasted text from typed-in text [0]. This is useful for
> preventing pasted text from accidentally executing commands in the
> application it was pasted to. Commonly this problem arises when copying
> text from a web browser to a shell since the user may have copied
> hidden text from the web page which may contain control characters such
> as \n [1].
> 
> Bracketed paste mode is supported by all mainstream terminal emulators
> including xterm, urxvt, and gnome-terminal. The implementation proposed
> here was tested with urxvt. However, since we cannot determine in
> advance whether the utilized terminal emulator supports these escape
> sequences the feature must be explicitly activated using the set
> builtin. I haven't tested the diff extensively yet but it is rather
> simple so I don't expect any breakages. One limitation of the proposed
> implementation is that in only works in emacs mode. Would be happy to
> address any feedback you might have.
> 
> Greetings,
> Sören
> 
> [0]: https://www.xfree86.org/4.7.0/ctlseqs.html#Bracketed%20Paste%20Mode
> [1]: https://thejh.net/misc/website-terminal-copy-paste
> 
> diff --git bin/ksh/edit.c bin/ksh/edit.c
> index 3089d195d..4b6d45050 100644
> --- bin/ksh/edit.c
> +++ bin/ksh/edit.c
> @@ -150,12 +150,23 @@ x_puts(const char *s)
>   shf_putc(*s++, shl_out);
>  }
>  
> +static void
> +x_paste_mode(bool onoff)
> +{
> + if (!Flag(FBBRACKETPASTE))
> + return;
> +
> + printf((onoff) ? BRPASTE_INT : BRPASTE_DEINT);
> + fflush(stdout);
> +}
> +
>  bool
>  x_mode(bool onoff)
>  {
>   static bool x_cur_mode;
>   boolprev;
>  
> + x_paste_mode(onoff);
>   if (x_cur_mode == onoff)
>   return x_cur_mode;
>   prev = x_cur_mode;
> diff --git bin/ksh/edit.h bin/ksh/edit.h
> index 0b604cd64..8cc774f01 100644
> --- bin/ksh/edit.h
> +++ bin/ksh/edit.h
> @@ -34,6 +34,12 @@ extern X_chars edchars;
>  #define XCF_FULLPATH BIT(2)  /* command completion: store full path */
>  #define XCF_COMMAND_FILE (XCF_COMMAND|XCF_FILE)
>  
> +/* https://www.xfree86.org/4.7.0/ctlseqs.html#Bracketed%20Paste%20Mode */
> +#define BRPASTE_INT  "\033[?2004h"
> +#define BRPASTE_DEINT"\033[?2004l"
> +#define BRPASTE_PRE  kb_encode("^[[200~")
> +#define BRPASTE_POST kb_encode("^[[201~")
> +
>  /* edit.c */
>  int  x_getc(void);
>  void x_flush(void);
> diff --git bin/ksh/emacs.c bin/ksh/emacs.c
> index 694c402ff..96136263e 100644
> --- bin/ksh/emacs.c
> +++ bin/ksh/emacs.c
> @@ -118,6 +118,7 @@ staticchar*xmp;   /* mark pointer */
>  static   char*killstack[KILLSIZE];
>  static   int killsp, killtp;
>  static   int x_literal_set;
> +static   int x_brack_paste;
>  static   int x_arg_set;
>  static   char*macro_args;
>  static   int prompt_skip;
> @@ -203,6 +204,8 @@ static intx_fold_lower(int);
>  static int   x_fold_upper(int);
>  static int   x_set_arg(int);
>  static int   x_comment(int);
> +static int   x_brack_paste_start(int);
> +static int   x_brack_paste_end(int);
>  #ifdef DEBUG
>  static int   x_debug_info(int);
>  #endif
> @@ -260,6 +263,8 @@ static const struct x_ftab x_ftab[] = {
>   { x_fold_upper, "upcase-word",  XF_ARG },
>   { x_set_arg,"set-arg",  XF_NOBIND },
>   { x_comment,"comment",  0 },
> + { x_brack_paste_start,  "bracketed-paste-start",0 },
> + { x_brack_paste_end,"bracketed-paste-end",  0 },
>   { 0, 0, 0 },
>  #ifdef DEBUG
>   { x_debug_info, "debug-info",   0 },
> @@ -316,6 +321,8 @@ x_emacs(char *buf, size_t len)
>   }
>  
>   x_li

Re: Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-27 Thread Sören Tempel
Ingo Schwarze  wrote:
> It would, and in principle, that would be an improvement.
> But i think editline(3) code quality is insufficent for use in a
> shell.  It's all quite messy and hastily and sloppily written.
> I tried to polish some of it in the past, but got distracted,
> so editline(3) code is still full of stuff that needs review
> and quality improvement.
> 
> Actually, i'm a bit scared that sftp(1) uses it.  Then again, i'm not
> aware that it caused any major vulnerabilities in the past, and the
> OpenSSH developers are not at all reckless people, so i am sure they
> know what they are doing.

Coincidentally, I actually found two non-critical out-of-bounds memory
accesses in the NetBSD version of editline two years ago [1] [2].
Quickly checking the OpenBSD code, it seems the associated fixes haven't
made their way into the OpenBSD editline version yet. Not sure if/how
this is normally done but maybe it makes sense to (partially) sync the
OpenBSD version with the NetBSD one? The latter has accumulated a few
general improvements over the past years. Either way, I do understand
your resentment towards using editline in ksh.

Greetings,
Sören

[1]: 
https://github.com/NetBSD/src/commit/e93ca0319937d29fabb0a98abfdbef65a55dba9a
[2]: 
https://github.com/NetBSD/src/commit/f6dff047a3ee27c332a60cef9c76355093a4e05e



Re: Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-07 Thread Sören Tempel
Ingo Schwarze  wrote:
> Hi,

Hello,

> Which problem needs fixing:
> Of the four-byte UTF-8 sequences, only a subset is identified by the
> existing code.  The other four-byte UTF-8 sequences still get chopped
> up resulting in individual bytes being passed on.
> 
> 
> I'm also adding a few comments as suggested by jca@.  Parsing of UTF-8
> is less trivial than one might think, witnessed once again by the fact
> that i got this code wrong in the first place.
> 
> I also changed "cc & 0x20" to "cc > 0x9f" and "cc & 0x30" to "cc > 0x8f"
> for uniformity and readabilty - UTF-8-parsing is bad enough without
> needless micro-optimization, right?

Nice, wasn't aware that you also had a patch ready. Sounds good to me
and also fixes the problem I originally experienced with 4 byte UTF-8
sequences.

> Note that even with the patch below, moving backward and forward
> over a blowfish icon on the command line still does not work because
> the character is width 2 and the ksh code intentionally does not
> use wcwidth(3).  But maybe it improves something in tmux?  Not sure.

Character movements over emojis (e.g. U+1F421) are currently broken
because the ksh code doesn't correctly determine the amount of columns
needed for a given character (i.e. what you would normally do with
wcwidth). I tried fixing this but without wchar.h doing so seemed very
cumbersome. Inputting emojis works with your patch though and was broken
previously.

> Either way, unless it causes regressions, this (or a further improved
> version) should go in because what is there is clearly wrong.
> 
> OK?

Your diff looks good to me.

BTW: Is there any reason why ksh doesn't use editline for all its line
editing needs? That would allow handling all these nitty-gritty details
in a central place.

Greetings,
Sören



Re: ksh: fix input handling for 4 byte UTF-8 sequences

2021-05-08 Thread Sören Tempel
Ping.

Sören Tempel  wrote:
> Hello,
> 
> Currently, ksh does not correctly calculate the length of 4 byte UTF-8
> sequences in emacs input mode. For demonstration purposes try inputting
> an emoji (e.g. U+1F421) at your shell prompt. These 4 byte sequences can
> be identified by checking if the first four bits are set and the fifth
> bit isn't. The current check for identifying these 4 byte sequences is
> incorrect.
> 
> The patch below fixes this, thereby allowing users to enter emojis
> (and other 4 byte UTF-8 sequences) at their shell prompt in emacs mode:
> 
> Greetings,
> Sören
> 
> diff --git bin/ksh/emacs.c bin/ksh/emacs.c
> index 694c402ff..970a0989d 100644
> --- bin/ksh/emacs.c
> +++ bin/ksh/emacs.c
> @@ -1851,7 +1851,7 @@ x_e_getu8(char *buf, int off)
>   return -1;
>   buf[off++] = c;
>  
> - if (c == 0xf4)
> + if ((c & 0xf8) == 0xf0)
>   len = 4;
>   else if ((c & 0xf0) == 0xe0)
>   len = 3;



ksh: fix input handling for 4 byte UTF-8 sequences

2021-04-04 Thread Sören Tempel
Hello,

Currently, ksh does not correctly calculate the length of 4 byte UTF-8
sequences in emacs input mode. For demonstration purposes try inputting
an emoji (e.g. U+1F421) at your shell prompt. These 4 byte sequences can
be identified by checking if the first four bits are set and the fifth
bit isn't. The current check for identifying these 4 byte sequences is
incorrect.

The patch below fixes this, thereby allowing users to enter emojis
(and other 4 byte UTF-8 sequences) at their shell prompt in emacs mode:

Greetings,
Sören

diff --git bin/ksh/emacs.c bin/ksh/emacs.c
index 694c402ff..970a0989d 100644
--- bin/ksh/emacs.c
+++ bin/ksh/emacs.c
@@ -1851,7 +1851,7 @@ x_e_getu8(char *buf, int off)
return -1;
buf[off++] = c;
 
-   if (c == 0xf4)
+   if ((c & 0xf8) == 0xf0)
len = 4;
else if ((c & 0xf0) == 0xe0)
len = 3;



ksh: add support for bracketed paste mode

2021-04-02 Thread Sören Tempel
Hello,

The diff below adds support for bracketed paste mode to ksh. Bracketed
paste mode is a set of special escape sequences, which are employed by
many terminal emulators to allow programs run inside of them to
distinguish pasted text from typed-in text [0]. This is useful for
preventing pasted text from accidentally executing commands in the
application it was pasted to. Commonly this problem arises when copying
text from a web browser to a shell since the user may have copied
hidden text from the web page which may contain control characters such
as \n [1].

Bracketed paste mode is supported by all mainstream terminal emulators
including xterm, urxvt, and gnome-terminal. The implementation proposed
here was tested with urxvt. However, since we cannot determine in
advance whether the utilized terminal emulator supports these escape
sequences the feature must be explicitly activated using the set
builtin. I haven't tested the diff extensively yet but it is rather
simple so I don't expect any breakages. One limitation of the proposed
implementation is that in only works in emacs mode. Would be happy to
address any feedback you might have.

Greetings,
Sören

[0]: https://www.xfree86.org/4.7.0/ctlseqs.html#Bracketed%20Paste%20Mode
[1]: https://thejh.net/misc/website-terminal-copy-paste

diff --git bin/ksh/edit.c bin/ksh/edit.c
index 3089d195d..4b6d45050 100644
--- bin/ksh/edit.c
+++ bin/ksh/edit.c
@@ -150,12 +150,23 @@ x_puts(const char *s)
shf_putc(*s++, shl_out);
 }
 
+static void
+x_paste_mode(bool onoff)
+{
+   if (!Flag(FBBRACKETPASTE))
+   return;
+
+   printf((onoff) ? BRPASTE_INT : BRPASTE_DEINT);
+   fflush(stdout);
+}
+
 bool
 x_mode(bool onoff)
 {
static bool x_cur_mode;
boolprev;
 
+   x_paste_mode(onoff);
if (x_cur_mode == onoff)
return x_cur_mode;
prev = x_cur_mode;
diff --git bin/ksh/edit.h bin/ksh/edit.h
index 0b604cd64..8cc774f01 100644
--- bin/ksh/edit.h
+++ bin/ksh/edit.h
@@ -34,6 +34,12 @@ extern X_chars edchars;
 #define XCF_FULLPATH   BIT(2)  /* command completion: store full path */
 #define XCF_COMMAND_FILE (XCF_COMMAND|XCF_FILE)
 
+/* https://www.xfree86.org/4.7.0/ctlseqs.html#Bracketed%20Paste%20Mode */
+#define BRPASTE_INT"\033[?2004h"
+#define BRPASTE_DEINT  "\033[?2004l"
+#define BRPASTE_PREkb_encode("^[[200~")
+#define BRPASTE_POST   kb_encode("^[[201~")
+
 /* edit.c */
 intx_getc(void);
 void   x_flush(void);
diff --git bin/ksh/emacs.c bin/ksh/emacs.c
index 694c402ff..96136263e 100644
--- bin/ksh/emacs.c
+++ bin/ksh/emacs.c
@@ -118,6 +118,7 @@ static  char*xmp;   /* mark pointer */
 static char*killstack[KILLSIZE];
 static int killsp, killtp;
 static int x_literal_set;
+static int x_brack_paste;
 static int x_arg_set;
 static char*macro_args;
 static int prompt_skip;
@@ -203,6 +204,8 @@ static int  x_fold_lower(int);
 static int x_fold_upper(int);
 static int x_set_arg(int);
 static int x_comment(int);
+static int x_brack_paste_start(int);
+static int x_brack_paste_end(int);
 #ifdef DEBUG
 static int x_debug_info(int);
 #endif
@@ -260,6 +263,8 @@ static const struct x_ftab x_ftab[] = {
{ x_fold_upper, "upcase-word",  XF_ARG },
{ x_set_arg,"set-arg",  XF_NOBIND },
{ x_comment,"comment",  0 },
+   { x_brack_paste_start,  "bracketed-paste-start",0 },
+   { x_brack_paste_end,"bracketed-paste-end",  0 },
{ 0, 0, 0 },
 #ifdef DEBUG
{ x_debug_info, "debug-info",   0 },
@@ -316,6 +321,8 @@ x_emacs(char *buf, size_t len)
}
 
x_literal_set = 0;
+   x_brack_paste = 0;
+
x_arg = -1;
x_last_command = NULL;
while (1) {
@@ -353,6 +360,13 @@ x_emacs(char *buf, size_t len)
}
}
 
+   /* In bracketed paste mode only allow x_brack_paste_end,
+* to quit this mode, for all other commands insert a literal. 
*/
+   if (x_brack_paste && (submatch == 1 && kmatch)) {
+   if (kmatch->ftab->xf_func != x_brack_paste_end)
+   submatch = 0;
+   }
+
if (submatch == 1 && kmatch) {
if (kmatch->ftab->xf_func == x_ins_string &&
kmatch->args && !macro_args) {
@@ -1479,6 +1493,10 @@ x_init_emacs(void)
 
TAILQ_INIT();
 
+   /* bracketed paste mode */
+   kb_add_string(x_brack_paste_start,  NULL, BRPASTE_PRE);
+   kb_add_string(x_brack_paste_end,NULL, BRPASTE_POST);
+
/* man page order */
kb_add(x_abort, CTRL('G'), 0);
kb_add(x_mv_back,   CTRL('B'), 0);
@@ -1984,6 +2002,21 @@ x_comment(int c)
return KSTD;
 }
 
+int

rwho on OpenBSD 5.6

2014-11-09 Thread Sören Tempel
Hi all,

I just updated to OpenBSD 5.6 and I was happy to see that rcp, rsh,
rshd, rwho, rwhod, etc have been removed (at least according to the
Changelog). However, the upgrade instructions fail to mention that files
like /etc/rc.d/rwhod or /usr/bin/rwho should be removed.

Sören.