On Fri, Nov 10, 2017 at 01:59:46PM +0100, Egmont Koblinger wrote:
[...]
> On Ubuntu Artful (glibc-2.26), this tiny snippet reproducibly crashes bash:
>
> LC_ALL=en_US.UTF-8 # or any other UTF-8 locale
> echo -e '\ud800' # or any other lone high or low surrogate
> LC_ALL=en_US.UTF-8 # or any available locale
I'm able to reproduce it in the `devel' branch:
(gdb) r
Starting program: /home/dualbus/src/gnu/build-bash-devel/bash
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
dualbus@ubuntu:~/src/gnu/build-bash-devel$ LC_ALL=en_US.UTF-8
dualbus@ubuntu:~/src/gnu/build-bash-devel$ echo -e '\ud800'
���
dualbus@ubuntu:~/src/gnu/build-bash-devel$ LC_ALL=en_US.UTF-8
Program received signal SIGSEGV, Segmentation fault.
__gconv_close (cd=0x0) at gconv_close.c:35
35 gconv_close.c: No such file or directory.
(gdb) bt
#0 __gconv_close (cd=0x0) at gconv_close.c:35
#1 0x7662eb7f in iconv_close (cd=<optimized out>) at iconv_close.c:35
#2 0x5576dcb8 in u32reset () at ../../../bash/lib/sh/unicode.c:102
#3 0x556e9f7a in set_locale_var (var=0x603000171a00 "LC_ALL",
value=0x602000207430 "en_US.UTF-8") at ../bash/locale.c:215
#4 0x556432e1 in sv_locale (name=0x603000171a00 "LC_ALL") at
../bash/variables.c:5671
#5 0x55641c8c in stupidly_hack_special_variables (name=0x603000171a00
"LC_ALL") at ../bash/variables.c:5280
#6 0x556759a8 in do_assignment_internal (word=0x602000204770,
expand=1) at ../bash/subst.c:3225
#7 0x55675d08 in do_word_assignment (word=0x602000204770, flags=0) at
../bash/subst.c:3263
#8 0x556a335e in expand_word_list_internal (list=0x602000205d70,
eflags=31) at ../bash/subst.c:11080
#9 0x556a0b25 in expand_words (list=0x602000205d70) at
../bash/subst.c:10635
#10 0x55628701 in execute_simple_command
(simple_command=0x603000171940, pipe_in=-1, pipe_out=-1, async=0,
fds_to_close=0x6020002073f0)
at ../bash/execute_cmd.c:4230
#11 0x556167b4 in execute_command_internal (command=0x603000171910,
asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x6020002073f0)
at ../bash/execute_cmd.c:821
#12 0x55614edb in execute_command (command=0x603000171910) at
../bash/execute_cmd.c:393
#13 0x555e164f in reader_loop () at ../bash/eval.c:172
#14 0x555dc882 in main (argc=1, argv=0x7fffe138,
env=0x7fffe148) at ../bash/shell.c:804
(gdb) frame 2
#2 0x5576dcb8 in u32reset () at ../../../bash/lib/sh/unicode.c:102
102 iconv_close (localconv);
(gdb) p localconv
$1 = (iconv_t) 0x0
The problem is that Bash treats UTF-8 as a special case: in a UTF-8 locale,
`u32cconv' never assigns `localconv', so it keeps its zero value (the 0x0 gdb
prints above). On the next locale change, `u32reset' passes that never-opened
descriptor to `iconv_close', which dereferences it and crashes.
I think the fix looks something like this:
diff --git a/lib/sh/unicode.c b/lib/sh/unicode.c
index a6e3058f..2f64315e 100644
--- a/lib/sh/unicode.c
+++ b/lib/sh/unicode.c
@@ -272,6 +272,7 @@ u32cconv (c, s)
   if (u32init == 0)
     {
       utf8locale = locale_utf8locale;
+      localconv = (iconv_t)-1; /* initialize */
       if (utf8locale == 0)
         {
 #if HAVE_LOCALE_CHARSET
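
For illustration, here is a minimal standalone sketch of the sentinel
convention the patch relies on. It is not the code from lib/sh/unicode.c: the
names `sketch_conv', `sketch_init', `sketch_setup' and `sketch_reset' are
hypothetical, and it assumes the reset path only closes the descriptor when it
differs from (iconv_t)-1, which is what the patch implies.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical stand-ins for the conversion state; names are illustrative
   and not taken from bash. */
static iconv_t sketch_conv = (iconv_t)-1;   /* sentinel: nothing is open */
static int sketch_init = 0;

/* Lazily set up the conversion state.  In a UTF-8 locale no descriptor is
   needed, so the handle is explicitly left at the sentinel value. */
static void
sketch_setup (int is_utf8, const char *charset)
{
  assert (is_utf8 || charset != NULL);
  if (sketch_init)
    return;
  sketch_conv = (iconv_t)-1;
  if (is_utf8 == 0)
    sketch_conv = iconv_open (charset, "UTF-8");  /* (iconv_t)-1 on failure */
  sketch_init = 1;
}

/* Tear down on a locale change.  Returns 1 if a descriptor was closed,
   0 if there was nothing to close. */
static int
sketch_reset (void)
{
  int closed = 0;

  if (sketch_init && sketch_conv != (iconv_t)-1)
    {
      iconv_close (sketch_conv);
      closed = 1;
    }
  sketch_conv = (iconv_t)-1;
  sketch_init = 0;
  return closed;
}
```

With this shape, calling sketch_setup (1, NULL) followed by sketch_reset ()
never reaches iconv_close, because the handle is still the sentinel rather
than a zero value that looks like an open descriptor.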