Bug#987916: openssh: Segfault or malloc_consolidate(): invalid chunk size + Aborted with GSSAPITrustDns yes

2021-05-18 Thread Bernhard Übelacker

Hello Chris,
I am not involved in packaging, just trying to give some
pointers to get better information for the maintainers.

Several possible actions are listed in [1] that could be
used to get more information.

Just to clarify, host heisenberg is your local system,
from which the connection starts?

If yes, I would propose these actions:

- If possible, install systemd-coredump. The journalctl output
  should then already contain a basic backtrace of the crashing
  process, and a core dump is collected and stored for some time
  (see e.g. "coredumpctl list" or "coredumpctl gdb").

- Try connecting after setting the following environment variable:
export MALLOC_CHECK_=3
  That should make stricter checks in the allocator
  and maybe fail earlier.

- You might install debug symbols too, also described in [1].
  With that you could also start the connection inside a debugger:
gdb -q --args ssh hammercloud-ai-11.cern.ch -v
  Then enter the following commands at the gdb prompt:
  run
  bt
  detach
  quit
  That way the 'bt' command should print a backtrace
  that might help to reproduce the issue.
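
  The steps above could also be run non-interactively. The following is
  only a sketch: the helper names gdb_bt and with_malloc_check are
  hypothetical, while -batch and -ex are regular gdb options that run
  the listed commands and then exit. Setting MALLOC_CHECK_ as a command
  prefix limits it to that single invocation.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Run a program under gdb without a prompt: start it, print a
# backtrace once it stops (e.g. on SIGSEGV), then leave gdb.
gdb_bt() {
    gdb -q -batch -ex run -ex bt --args "$@"
}

# Run one command with stricter glibc allocator checks, without
# exporting the variable into the rest of the shell session.
with_malloc_check() {
    MALLOC_CHECK_=3 "$@"
}

# Example usage (not executed here):
#   with_malloc_check ssh hammercloud-ai-11.cern.ch -v
#   gdb_bt ssh hammercloud-ai-11.cern.ch -v
#   coredumpctl list
```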

Kind regards,
Bernhard

[1] https://wiki.debian.org/HowToGetABacktrace



Bug#987916: openssh: Segfault or malloc_consolidate(): invalid chunk size + Aborted with GSSAPITrustDns yes

2021-05-01 Thread Christoph Anton Mitterer
Source: openssh
Version: 1:8.4p1-5
Severity: important


Hey.

This is from https://bugzilla.mindrot.org/show_bug.cgi?id=3307:

Hey there.

I've noticed two errors with the following setup:

Locally, I have:
OpenSSH_8.4p1 Debian-5, OpenSSL 1.1.1k  25 Mar 2021

from which I connect to some internal node at CERN (hammercloud-ai-11.cern.ch) 
via some publicly available node (lxplus.cern.ch) which all have:
OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017

The lxplus.cern.ch is actually a round robin DNS name, but all nodes behind 
have the same ssh server key.


Since CERN uses AFS, I have to do GSSAPI auth.
Locally I have a keytab file created with ktutil, which even works out of the 
box with SSH - that is, if I don't have a krb ticket yet, it automatically 
creates one.


My SSH config looks like the following:
Host hammercloud-ai-11.cern.ch
GSSAPIAuthentication yes
GSSAPIDelegateCredentials yes
GSSAPIRenewalForcesRekey yes
GSSAPITrustDns yes
ProxyJump   lxplus.cern.ch


Host lxplus.cern.ch
GSSAPIAuthentication yes
GSSAPIDelegateCredentials yes
GSSAPIRenewalForcesRekey yes
GSSAPITrustDns yes
#   ControlMaster   auto
#   ControlPersist  10s
#   ControlPath ~/.ssh/channel-mux/%r@%h:%p

Host *.cern.ch
User someUser
IdentityFile ~/.ssh/id_ed25519
SetEnv "LANG=en_US.UTF-8"


Further, I do have a custom locale which is basically en_US.UTF-8, but with 
some international stuff like "," as decimal separator.

Now that works for logging in to lxplus, and from there (within an interactive 
session) to hammercloud-ai-11.

When I use the ProxyJump however and directly go to hammercloud-ai-11, I start 
to see errors.


1) with LANG=en_DE.UTF-8 it segfaults:
$ ssh hammercloud-ai-11.cern.ch -v
...
Authenticated to hammercloud-ai-11.cern.ch (via proxy).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessi...@openssh.com
debug1: Entering interactive session.
debug1: pledge: proc
debug1: client_input_global_request: rtype hostkeys...@openssh.com want_reply 0
debug1: Sending environment.
debug1: Sending env LANG = en_DE.UTF-8
Segmentation fault
$ debug1: stdio forwarding: done

Interestingly, it still seems to try to send "my" locale instead of what I've 
configured above with:
SetEnv "LANG=en_US.UTF-8"
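
One way to check which environment options the client resolves for this 
host is "ssh -G", which prints the effective configuration without 
connecting. A minimal sketch; the helper name env_opts is hypothetical, and 
it is an assumption (to be verified locally) that a SendEnv line from the 
system-wide ssh_config is what causes the local LANG to be sent:

```shell
# Hypothetical helper: keep only the SendEnv/SetEnv lines from the
# output of `ssh -G`, which is read from standard input.
env_opts() {
    grep -i -E '^(sendenv|setenv) '
}

# Example usage (not executed here):
#   ssh -G hammercloud-ai-11.cern.ch | env_opts
```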



2) the same with LANG=C
$ export LANG=C
$ ssh hammercloud-ai-11.cern.ch -v
...
Authenticated to hammercloud-ai-11.cern.ch (via proxy).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessi...@openssh.com
debug1: Entering interactive session.
debug1: pledge: proc
debug1: client_input_global_request: rtype hostkeys...@openssh.com want_reply 0
debug1: Sending environment.
debug1: Sending env LANG = C
malloc_consolidate(): invalid chunk size
Aborted
$ debug1: stdio forwarding: done


Whether or not a control channel is used doesn't seem to matter.


When I comment out the
Host hammercloud-ai-11.cern.ch
...
#   GSSAPITrustDns yes


It works in both cases.

Commenting out the same for lxplus (the proxy node) doesn't solve the issue.
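
The same comparison can be made without editing the config file, since 
options given with -o take precedence over it. The sketch below assumes the 
usual shell convention that a process killed by signal N is reported as 
exit status 128+N; classify_status is a hypothetical helper name, and 255 
is treated separately because ssh itself exits with 255 on errors.

```shell
# Hypothetical helper: describe an exit status using the 128+N
# convention for processes killed by signal N.
# On Linux/amd64, SIGSEGV is 11 (status 139) and SIGABRT is 6 (status 134).
classify_status() {
    if [ "$1" -gt 128 ] && [ "$1" -lt 255 ]; then
        echo "signal $(( $1 - 128 ))"
    else
        echo "exit $1"
    fi
}

# Example usage (not executed here):
#   for v in yes no; do
#       ssh -o GSSAPITrustDns="$v" hammercloud-ai-11.cern.ch true
#       echo "GSSAPITrustDns=$v: $(classify_status $?)"
#   done
```

Note that a remote command could in principle also exit with such a status, 
so the result is only a hint.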


Any ideas?

Cheers,
Chris.



forgot:

May 01 16:38:39 heisenberg kernel: ssh[16368]: segfault at 7e0008 ip 7f646525a86c sp 7ffd72b5fb30 error 4 in libc-2.31.so[7f64651f9000+14b000]
May 01 16:38:39 heisenberg kernel: Code: 43 28 00 00 00 00 48 8b 54 24 08 48 89 ef 48 89 43 10 48 83 cf 01 48 89 7b 08 48 89 53 18 48 89 2c 2b 48 85 c9 74 87 48 89 cb <48> 8b 43 08 89 c1 c1 e9 04 83 e9 02 49 8d 4c cc 10 49 39 cd 0f 85
May 01 16:38:50 heisenberg kernel: ssh[16375]: segfault at 7e0008 ip 7fe602caa86c sp 7fff2ac78150 error 4 in libc-2.31.so[7fe602c49000+14b000]
May 01 16:38:50 heisenberg kernel: Code: 43 28 00 00 00 00 48 8b 54 24 08 48 89 ef 48 89 43 10 48 83 cf 01 48 89 7b 08 48 89 53 18 48 89 2c 2b 48 85 c9 74 87 48 89 cb <48> 8b 43 08 89 c1 c1 e9 04 83 e9 02 49 8d 4c cc 10 49 39 cd 0f 85
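
Both kernel lines can be reduced to an offset inside the reported libc 
mapping by subtracting the mapping start (the value before "+" in the 
brackets) from the instruction pointer. A minimal sketch, with fault_offset 
as a hypothetical helper name; the result is relative to the start of the 
mapping the kernel reports, which is not necessarily the library's load 
base.

```shell
# Hypothetical helper: offset of the faulting instruction within the
# mapping. Arguments are the hex values of "ip" and of the mapping
# start, both without a leading 0x.
fault_offset() {
    printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

fault_offset 7f646525a86c 7f64651f9000   # first crash  (ssh[16368])
fault_offset 7fe602caa86c 7fe602c49000   # second crash (ssh[16375])
```

Both calls print 0x6186c, i.e. the two crashes hit the same instruction 
despite the different load addresses.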




-- System Information:
Debian Release: 11.0
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-6-amd64 (SMP w/4 CPU threads)
Locale: LANG=en_DE.UTF-8, LC_CTYPE=en_DE.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)