https://bugzilla.mindrot.org/show_bug.cgi?id=2167
--- Comment #2 from Tetsuo Handa <[email protected]> --- This is not a kernel bug. fatal() is designed to eventually call exit(), but this bug is preventing sshd process from calling exit(). The child process calls fatal() when fork() failed at privsep_postauth(). (Please note that fork() is replaced with -1 for explanation.) ---------- privsep_postauth() in sshd.c ---------- static void privsep_postauth(Authctxt *authctxt) { u_int32_t rnd[256]; #ifdef DISABLE_FD_PASSING if (1) { #else if (authctxt->pw->pw_uid == 0 || options.use_login) { #endif /* File descriptor passing is broken or root login */ use_privsep = 0; goto skip; } /* New socket pair */ monitor_reinit(pmonitor); pmonitor->m_pid = -1; // Emulate the fork() failure. if (pmonitor->m_pid == -1) fatal("fork of unprivileged child failed"); (...snipped...) ---------- privsep_postauth() in sshd.c ---------- fatal() calls cleanup_exit(255). (Please note that dummy write(-1, "", $step) lines are inserted for comparing with strace command.) ---------- fatal() in fatal.c ---------- void fatal(const char *fmt,...) { va_list args; va_start(args, fmt); do_log(SYSLOG_LEVEL_FATAL, fmt, args); va_end(args); write(-1, "", 0); cleanup_exit(255); } ---------- fatal() in fatal.c ---------- cleanup_exit(255) will eventually call _exit(255). ---------- cleanup_exit() in sshd.c ---------- /* server specific fatal cleanup */ void cleanup_exit(int i) { static int in_cleanup; int is_privsep_child; write(-1, "", 1); /* cleanup_exit can be called at the very least from the privsep wrappers used for auditing. Make sure we don't recurse indefinitely. */ if (in_cleanup) _exit(i); write(-1, "", 2); in_cleanup = 1; if (the_authctxt) do_cleanup(the_authctxt); write(-1, "", 3); is_privsep_child = use_privsep && pmonitor != NULL && !mm_is_monitor(); write(-1, "", 4); if (sensitive_data.host_keys != NULL) destroy_sensitive_data(is_privsep_child); write(-1, "", 5); packet_destroy_all(1, is_privsep_child); #ifdef SSH_AUDIT_EVENTS /* done after do_cleanup so it can cancel the PAM auth 'thread' */ write(-1, "", 6); if ((the_authctxt == NULL || !the_authctxt->authenticated) && (!use_privsep || mm_is_monitor())) audit_event(SSH_CONNECTION_ABANDON); #endif write(-1, "", 7); _exit(i); } ---------- cleanup_exit() in sshd.c ---------- Did we reach the _exit(i) line? Let's check the strace command. ---------- strace log start ---------- [pid 17153] socketpair(PF_FILE, SOCK_STREAM, 0, [5, 6]) = 0 [pid 17153] fcntl(5, F_SETFD, FD_CLOEXEC) = 0 [pid 17153] fcntl(6, F_SETFD, FD_CLOEXEC) = 0 [pid 17153] sendto(4, "<82>Nov 1 11:11:08 sshd[17153]:"..., 73, MSG_NOSIGNAL, NULL, 0) = 73 [pid 17153] close(4) = 0 [pid 17153] write(4294967295, "", 0) = -1 EBADF (Bad file descriptor) [pid 17153] write(4294967295, "\0", 1) = -1 EBADF (Bad file descriptor) [pid 17153] write(4294967295, "\0G", 2) = -1 EBADF (Bad file descriptor) [pid 17153] write(4294967295, "\0Go", 3) = -1 EBADF (Bad file descriptor) [pid 17153] write(4294967295, "\0Got", 4) = -1 EBADF (Bad file descriptor) [pid 17153] getuid() = 0 [pid 17153] write(5, "\0\0\0DR", 5) = 5 [pid 17153] write(5, "\0\0\0/c1:bd:33:a5:66:d5:83:2d:0c:9"..., 67) = 67 [pid 17153] read(5, ^C <unfinished ...> Process 17147 detached Process 17153 detached ---------- strace log end ---------- We can see that the child process reached the write(-1, "", 4) line but did not reach the write(-1, "", 5) line. This means that the child process is sleeping at if (sensitive_data.host_keys != NULL) destroy_sensitive_data(is_privsep_child); trying to read data from fd == 5. What is fd == 5 connected with? According to strace command, fd == 5 and fd == 6 are a socket pair created by monitor_reinit() call in privsep_postauth(). ---------- monitor_reinit() in monitor.c ---------- void monitor_reinit(struct monitor *mon) { int pair[2]; monitor_socketpair(pair); mon->m_recvfd = pair[0]; mon->m_sendfd = pair[1]; } ---------- monitor_reinit() in monitor.c ---------- destroy_sensitive_data() tried to write to fd == 5 and trying to read from fd == 5, but there is no writers writing to fd == 6. Dead lock caused by trying to I/O against wrong file descriptor. This was why calling shutdown() seems to solve the problem. -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug. _______________________________________________ openssh-bugs mailing list [email protected] https://lists.mindrot.org/mailman/listinfo/openssh-bugs
