Re: improve xterm(1) resilience against control code attacks

2016-03-08 Thread lists
> Christian Weisgerber wrote on Mon, Mar 07, 2016 at 03:51:41PM +:
> > On 2016-03-07, Ingo Schwarze  wrote:  
> 
> >> Consequently, in the interest of safe and sane defaults, i propose
> >> switching our xterm(1) to enable UTF-8 mode by default.  
> 
> > Seconded.

Please.
 
> >> The best place to switch is in the setup function VTInitialize_locale()
> >> that decides whether to enable UTF-8 mode and which supporting flags
> >> to set, by pretending to it that CODESET is always UTF-8, but without
> >> interfering with the actual value of the CODESET and without changing
> >> the utility function xtermEnvUTF8().  
> 
> > Hmm, maybe you are overthinking this.
> > Other defaults that we set differently from upstream are simply
> > resource changes to XTerm.ad (/usr/X11R6/share/X11/app-defaults/XTerm).  

Ingo Schwarze wrote on Tue, 8 Mar 2016 00:14:45 +0100:
> Heh.  I considered simply changing the resource defaults, but came
> to the wrong conclusion that there wouldn't be a way to achieve the
> desired effect.  Thanks for bringing it up again, that made me
> re-check, and it turns out there *is* a way that is quite
> straightforward, minimally intrusive, very robust, and doesn't get
> in the way of explicit user configuration: See the patch below.
> If this gets OKs, let's forget my previous, more intrusive patch.

Maybe keep it around for future reference, though.

> With that change, users can obviously still set *locale to other
> values (for example, "true" or "false" come to mind), and the command
> line options changing *locale (-lc +lc -en) still work.  Looking
> at the code, explicitly setting *utf8 to false (or equivalently,
> +u8 on the command line) also overrides this.
> 
> Spending a day reading xterm source code wasn't wasted, though -
> by reading the documentation only, i wouldn't have understood
> that this way works as intended and is safe.

If it took you that long, no wonder the barrier to entry for this
program is so high.  Not implying anything though, please leave it be.

> >>   printf "\303\237\n"   # thanks to sobrado@ for the striking example
> >> Now your local terminal hangs until you force a reset using the
> >> menus of the xterm program.  
> 
> > \237 is 0x9F, equivalent to ESC _, which is APC (Application Program
> > Command).  That appears in a table, but is not explained in the
> > VT220 manual.  The VT420 manual says: "The VT420 ignores all following
> > characters until it receives a SUB, ST, or any other C1 control
> > character."  
> 
> Yes.  I spent so much time reading terminal control code documentation
> lately that i probably assumed this to be widely known.  ;-)
> You are right, explaining it is helpful.

More thought on terminal resilience and UTF-8 is welcome; thanks for
the huge system-wide progress so far.

> Index: XTerm.ad
> ===================================================================
> RCS file: /cvs/xenocara/app/xterm/XTerm.ad,v
> retrieving revision 1.15
> diff -u -p -r1.15 XTerm.ad
> --- XTerm.ad  26 Aug 2013 20:06:10 -  1.15
> +++ XTerm.ad  7 Mar 2016 22:54:44 -
> @@ -259,6 +259,11 @@
>  
>  ! OpenBSD local modifications
>  
> +! Enable UTF-8 mode since OpenBSD does not support any other multibyte
> +! locales.  Even for people using the C/POSIX locale for everything,
> +! that's safer and more usable than the upstream default of "medium".
> +*locale: UTF-8
> +
>  ! ScrollBar by default
>  *scrollBar: true

Not mentioning the unrelated login resource class set to true in
another patch for application defaults instead of user resources.



Re: improve xterm(1) resilience against control code attacks

2016-03-08 Thread Matthieu Herrb
On Tue, Mar 08, 2016 at 12:14:45AM +0100, Ingo Schwarze wrote:
> Hi Christian,
> 
> Christian Weisgerber wrote on Mon, Mar 07, 2016 at 03:51:41PM +:
> > On 2016-03-07, Ingo Schwarze  wrote:
> 
> >> Consequently, in the interest of safe and sane defaults, i propose
> >> switching our xterm(1) to enable UTF-8 mode by default.
> 
> > Seconded.
>  
> >> The best place to switch is in the setup function VTInitialize_locale()
> >> that decides whether to enable UTF-8 mode and which supporting flags
> >> to set, by pretending to it that CODESET is always UTF-8, but without
> >> interfering with the actual value of the CODESET and without changing
> >> the utility function xtermEnvUTF8().
> 
> > Hmm, maybe you are overthinking this.
> > Other defaults that we set differently from upstream are simply
> > resource changes to XTerm.ad (/usr/X11R6/share/X11/app-defaults/XTerm).
> 
> Heh.  I considered simply changing the resource defaults, but came
> to the wrong conclusion that there wouldn't be a way to achieve the
> desired effect.  Thanks for bringing it up again, that made me
> re-check, and it turns out there *is* a way that is quite
> straightforward, minimally intrusive, very robust, and doesn't get
> in the way of explicit user configuration: See the patch below.
> If this gets OKs, let's forget my previous, more intrusive patch.
> 
> With that change, users can obviously still set *locale to other
> values (for example, "true" or "false" come to mind), and the command
> line options changing *locale (-lc +lc -en) still work.  Looking
> at the code, explicitly setting *utf8 to false (or equivalently,
> +u8 on the command line) also overrides this.
> 
> Spending a day reading xterm source code wasn't wasted, though -
> by reading the documentation only, i wouldn't have understood
> that this way works as intended and is safe.
> 
> OK?
>   Ingo
> 

Ok. 

Thanks for the extra explanations. 

> 
> > 
> > PS:
>  
> >>   printf "\303\237\n"   # thanks to sobrado@ for the striking example
> >> Now your local terminal hangs until you force a reset using the
> >> menus of the xterm program.
> 
> > \237 is 0x9F, equivalent to ESC _, which is APC (Application Program
> > Command).  That appears in a table, but is not explained in the
> > VT220 manual.  The VT420 manual says: "The VT420 ignores all following
> > characters until it receives a SUB, ST, or any other C1 control
> > character."
> 
> Yes.  I spent so much time reading terminal control code documentation
> lately that i probably assumed this to be widely known.  ;-)
> You are right, explaining it is helpful.
> 
> 
> Index: XTerm.ad
> ===================================================================
> RCS file: /cvs/xenocara/app/xterm/XTerm.ad,v
> retrieving revision 1.15
> diff -u -p -r1.15 XTerm.ad
> --- XTerm.ad  26 Aug 2013 20:06:10 -  1.15
> +++ XTerm.ad  7 Mar 2016 22:54:44 -
> @@ -259,6 +259,11 @@
>  
>  ! OpenBSD local modifications
>  
> +! Enable UTF-8 mode since OpenBSD does not support any other multibyte
> +! locales.  Even for people using the C/POSIX locale for everything,
> +! that's safer and more usable than the upstream default of "medium".
> +*locale: UTF-8
> +
>  ! ScrollBar by default
>  *scrollBar: true
>  

-- 
Matthieu Herrb




Re: improve xterm(1) resilience against control code attacks

2016-03-07 Thread Ingo Schwarze
Hi Christian,

Christian Weisgerber wrote on Mon, Mar 07, 2016 at 03:51:41PM +:
> On 2016-03-07, Ingo Schwarze  wrote:

>> Consequently, in the interest of safe and sane defaults, i propose
>> switching our xterm(1) to enable UTF-8 mode by default.

> Seconded.
 
>> The best place to switch is in the setup function VTInitialize_locale()
>> that decides whether to enable UTF-8 mode and which supporting flags
>> to set, by pretending to it that CODESET is always UTF-8, but without
>> interfering with the actual value of the CODESET and without changing
>> the utility function xtermEnvUTF8().

> Hmm, maybe you are overthinking this.
> Other defaults that we set differently from upstream are simply
> resource changes to XTerm.ad (/usr/X11R6/share/X11/app-defaults/XTerm).

Heh.  I considered simply changing the resource defaults, but came
to the wrong conclusion that there wouldn't be a way to achieve the
desired effect.  Thanks for bringing it up again, that made me
re-check, and it turns out there *is* a way that is quite
straightforward, minimally intrusive, very robust, and doesn't get
in the way of explicit user configuration: See the patch below.
If this gets OKs, let's forget my previous, more intrusive patch.

With that change, users can obviously still set *locale to other
values (for example, "true" or "false" come to mind), and the command
line options changing *locale (-lc +lc -en) still work.  Looking
at the code, explicitly setting *utf8 to false (or equivalently,
+u8 on the command line) also overrides this.

Spending a day reading xterm source code wasn't wasted, though -
by reading the documentation only, i wouldn't have understood
that this way works as intended and is safe.

OK?
  Ingo


> 
> PS:
 
>>   printf "\303\237\n"   # thanks to sobrado@ for the striking example
>> Now your local terminal hangs until you force a reset using the
>> menus of the xterm program.

> \237 is 0x9F, equivalent to ESC _, which is APC (Application Program
> Command).  That appears in a table, but is not explained in the
> VT220 manual.  The VT420 manual says: "The VT420 ignores all following
> characters until it receives a SUB, ST, or any other C1 control
> character."

Yes.  I spent so much time reading terminal control code documentation
lately that i probably assumed this to be widely known.  ;-)
You are right, explaining it is helpful.


Index: XTerm.ad
===================================================================
RCS file: /cvs/xenocara/app/xterm/XTerm.ad,v
retrieving revision 1.15
diff -u -p -r1.15 XTerm.ad
--- XTerm.ad  26 Aug 2013 20:06:10 -  1.15
+++ XTerm.ad  7 Mar 2016 22:54:44 -
@@ -259,6 +259,11 @@
 
 ! OpenBSD local modifications
 
+! Enable UTF-8 mode since OpenBSD does not support any other multibyte
+! locales.  Even for people using the C/POSIX locale for everything,
+! that's safer and more usable than the upstream default of "medium".
+*locale: UTF-8
+
 ! ScrollBar by default
 *scrollBar: true
 



Re: improve xterm(1) resilience against control code attacks

2016-03-07 Thread Christian Weisgerber
On 2016-03-07, Ingo Schwarze  wrote:

> Consequently, in the interest of safe and sane defaults, i propose
> switching our xterm(1) to enable UTF-8 mode by default.

Seconded.

> The best place to switch is in the setup function VTInitialize_locale()
> that decides whether to enable UTF-8 mode and which supporting flags
> to set, by pretending to it that CODESET is always UTF-8, but without
> interfering with the actual value of the CODESET and without changing
> the utility function xtermEnvUTF8().

Hmm, maybe you are overthinking this.
Other defaults that we set differently from upstream are simply
resource changes to XTerm.ad (/usr/X11R6/share/X11/app-defaults/XTerm).



PS:

>   printf "\303\237\n"   # thanks to sobrado@ for the striking example
> Now your local terminal hangs until you force a reset using the
> menus of the xterm program.

\237 is 0x9F, equivalent to ESC _, which is APC (Application Program
Command).  That appears in a table, but is not explained in the
VT220 manual.  The VT420 manual says: "The VT420 ignores all following
characters until it receives a SUB, ST, or any other C1 control
character."
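
As a rough illustration of the two points above, here is a small Python
sketch.  It is not xterm's or the VT420's actual parser; the function
names are made up for this example.  It maps an 8-bit C1 byte to its
7-bit ESC form and models the quoted "ignore until SUB, ST, or any
other C1 control character" rule as a toy filter.

```python
ESC, SUB = 0x1B, 0x1A
APC, ST = 0x9F, 0x9C          # 8-bit C1 forms of ESC _ and ESC \


def is_c1(byte):
    """True for the 8-bit C1 control range 0x80..0x9F."""
    return 0x80 <= byte <= 0x9F


def c1_to_7bit(byte):
    """Return the two-byte ESC equivalent of an 8-bit C1 control."""
    if not is_c1(byte):
        raise ValueError("not a C1 control byte")
    return bytes([ESC, byte - 0x40])


def visible_bytes(data):
    """Toy model: drop everything between APC and SUB, ST or another C1."""
    out = bytearray()
    ignoring = False
    for byte in data:
        if ignoring:
            if byte == SUB or is_c1(byte):
                # The string ends here; a second APC starts a new one.
                ignoring = (byte == APC)
        elif byte == APC:
            ignoring = True
        elif not is_c1(byte):
            # Other C1 controls are simply dropped in this toy model.
            out.append(byte)
    return bytes(out)


print(c1_to_7bit(0o237))                      # the APC byte from the example
print(visible_bytes(b"a\303\237 lost\n"))     # text after APC is swallowed
print(visible_bytes(b"a\237 lost\234b\n"))    # ST ends the ignored string
```

In this model, the newline after the sharp s never reaches the screen,
which matches the "terminal hangs" symptom: nothing is displayed until
something terminates the APC string.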

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



improve xterm(1) resilience against control code attacks

2016-03-06 Thread Ingo Schwarze
Hi,

if two programs communicating encoded character strings to each other
disagree about the encoding, that can result in problems.

One particular example of such communication is an application program
passing output text to a terminal emulator program.  If the terminal
uses a different encoding for decoding the text than the application
used for encoding it, the terminal may see control codes where the
application only intended printable characters.  This can screw up the
terminal state, spoiling display of subsequent text or even hanging
the terminal.

Actually, i assume that this problem occurs frequently in practice,
for the following reasons.  If the application program is well-behaved,
it either produces C/POSIX/US-ASCII output only, or its idea of the
encoding to use is governed by the LC_CTYPE locale(1) environment
variable, typically passed to it by the shell it was started from.
Now that locale(1) environment is completely unrelated to whatever
encoding the terminal may be set up for.  It may not even be on the
same physical machine.  For example, during an SSH session, your
terminal is on the local SSH client machine, while the shell starting
your application programs is on the remote SSH server machine.
To fully appreciate the implications, try out the following scenario:
Start an xterm(1) that is not UTF-8 enabled on your local machine
by saying "xterm +lc +u8".  Unset LC_ALL, LC_CTYPE, and LANG; check
with locale(1) that your locale is "C".  Use ssh(1) to connect to
a remote machine.  Now simulate a program producing UTF-8 output
on the remote machine, for example U+00DF LATIN SMALL LETTER SHARP S:
  printf "\303\237\n"   # thanks to sobrado@ for the striking example
Now your local terminal hangs until you force a reset using the
menus of the xterm program.  If the shell startup files on the
remote machine set LC_CTYPE=en_US.UTF-8 or something similar by
default, programs on the remote machine will always do just that.
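
To see why these two bytes are a problem, the following Python sketch
(the helper name c1_bytes is made up for illustration) encodes U+00DF
the way a UTF-8 program would and then lists the bytes that fall into
the 8-bit C1 control range 0x80..0x9F, which is what a terminal that is
not in UTF-8 mode would act on.

```python
def c1_bytes(data):
    """Return the bytes an 8-bit terminal would take for C1 controls."""
    return [hex(b) for b in data if 0x80 <= b <= 0x9F]


sharp_s = "\u00df".encode("utf-8")      # what the remote program sends
print(sharp_s == b"\303\237")           # same bytes as the printf example
print(ascii(sharp_s.decode("utf-8")))   # a UTF-8 decoder sees one letter
print(c1_bytes(sharp_s))                # an 8-bit terminal also sees 0x9f
```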

That shows how easy it is to inadvertently cause application-terminal
character encoding mismatches; yet i doubt that many people are aware
of the problem.  So we should try to reduce the likelihood that people
get burnt by such effects.

On an operating system supporting any third locale in addition to
C/POSIX and UTF-8, people are screwed beyond rescue because even
if one side of the connection assumes US-ASCII, communication is
still unsafe in both directions.  Reinterpreting US-ASCII in an
arbitrary encoding and reinterpreting an arbitrary encoding as
US-ASCII may both turn innocuous printable characters into dangerous
terminal control codes.  That is particularly bitter because some
programs will always output US-ASCII, which is not safe to display
in a terminal set up for an arbitrary locale.

Fortunately, in OpenBSD, we made the decision to only support exactly
two locales, C/POSIX and UTF-8, and this combination has the following
properties:

 1. Printing unsanitized strings to the terminal is never safe,
    no matter the locale and terminal setup (think of "cat /bsd").
 2. Printing sanitized US-ASCII to a US-ASCII terminal is safe.
 3. Printing sanitized UTF-8 to a UTF-8 terminal is safe.
 4. Printing sanitized US-ASCII to a UTF-8 terminal is safe.
    That is important because there are some programs that we may
    never want to add UTF-8 support to.

However:

 5. Printing sanitized UTF-8 to a US-ASCII terminal is *NOT* safe.
    Remember the example above that hung a US-ASCII terminal by
    printing U+00DF LATIN SMALL LETTER SHARP S in UTF-8 to it.
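
The asymmetry between properties 4 and 5 can be checked mechanically.
The Python sketch below only looks at the one hazard discussed in this
mail, namely raw bytes in the C1 range 0x80..0x9F, and only at two small
character ranges chosen for illustration; has_c1_byte is a hypothetical
helper, not part of any existing tool.

```python
def has_c1_byte(text):
    """True if the UTF-8 encoding of text contains a byte in 0x80..0x9F."""
    return any(0x80 <= b <= 0x9F for b in text.encode("utf-8"))


ascii_printable = [chr(cp) for cp in range(0x20, 0x7F)]
latin1_printable = [chr(cp) for cp in range(0xA0, 0x100)]

# Property 4: printable US-ASCII never produces such bytes.
print(sum(has_c1_byte(ch) for ch in ascii_printable))

# Property 5: some perfectly printable non-ASCII characters do.
risky = [ch for ch in latin1_printable if has_c1_byte(ch)]
print(len(risky), hex(ord(risky[0])), hex(ord(risky[-1])))
```

So even within the Latin-1 supplement alone, a sizeable block of
ordinary accented letters carries a continuation byte that a terminal
in 8-bit mode can mistake for a control code.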

By default, our xterm(1) runs in US-ASCII mode.  In view of the
above, that's a terrible idea, even if the user doesn't intend to
ever use UTF-8.  A UTF-8 terminal handles the US-ASCII the user
wants just fine, and in addition to that, and mostly for free, it
is more resilient against stray UTF-8 sneaking in.

Actually, even when fed garbage or unsupported encodings, a UTF-8
xterm(1) is more robust than a US-ASCII xterm(1) because the UTF-8
xterm(1) honours *fewer* terminal escape codes than the US-ASCII
xterm(1).  That may seem surprising at first because Unicode defines
*more* control characters than US-ASCII does.  But as explained on

  http://invisible-island.net/xterm/ctlseqs/ctlseqs.html

xterm(1) never treats decoded multibyte characters as terminal
control codes, so the ISO 6429 C1 control codes do not take effect
in UTF-8 mode; but they do take effect in US-ASCII mode, even though
they fall outside the scope of ASCII.
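
A short Python sketch may help to see the difference at the byte level.
It uses Python's own UTF-8 decoder, which is of course not xterm's, so
it only illustrates the encoding side: under UTF-8, a lone 0x9F byte is
not a complete character at all, and U+009F itself travels as a
two-byte sequence.

```python
raw_apc = b"\x9f"                       # lone byte, as an 8-bit terminal sees it
encoded_apc = "\u009f".encode("utf-8")  # U+009F as a UTF-8 multibyte sequence

print(encoded_apc)                      # lead byte plus continuation byte

try:
    raw_apc.decode("utf-8")
except UnicodeDecodeError as err:
    # 0x9F is only valid as a continuation byte, never on its own.
    print("invalid UTF-8:", err.reason)

# A lenient decoder substitutes U+FFFD instead of producing a control code.
print(ascii(raw_apc.decode("utf-8", errors="replace")))
```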

Consequently, in the interest of safe and sane defaults, i propose
switching our xterm(1) to enable UTF-8 mode by default.  If somebody
insists on running an xterm(1) in US-ASCII mode, there are still
many ways to force that, for example with "+lc +u8".


It is rather tricky to get the switch right because the locale+encoding
user interface of xterm(1) is ridiculously complicated.  It uses
three X resources (*locale, *utf8, *wideChar) with 5+4+2 possible
values (*locale: true, medium,