Re: Mosh regression between 10.x and 11-stable
On 8/13/16 4:30 AM, Peter Jeremy wrote:
> Hi John,
>
> Sorry, I got side-tracked.
>
> On 2016-Aug-12 16:37:15 -0400, John Hood <cg...@glup.org> wrote:
>> Could I ask you to look at this a little further? On the one hand, it
>> sure looks like a Mosh issue, and tcdrain() solves it, but on the other
>> hand, this is a regression on FreeBSD and we don't see this issue on any
>> other OS. I'd like to fully understand this and make sure that this is
>> not a kernel issue.
>
> It's got me puzzled as well. And it's only getting weirder... The
> following is using an unmodified mosh-1.2.5, built from the port, as
> the server on FreeBSD 11.0-BETA4 r303957. The client is 1.2.4a-1ubuntu1
> on Linux. The standard driver script consistently fails, but adding a
> "print" makes it mostly work. Where there's a "[mosh is exiting.]"
> message, it was successful (and would report that there were other
> orphaned servers, since I wasn't waiting the 60 seconds for servers to
> die between invocations). For completeness, I've tried ktrace'ing
> mosh-server but can't make it fail when I do so.

I've now managed to reproduce the issue *and* ktrace it (and sshd) on a VPS: a single-CPU VM on a badly oversubscribed VMware host, with an OS X client.

mosh-server shows a straightforward execution trace: the parent successfully writes the MOSH CONNECT message on stdout, forks, and exits. About a millisecond later, the child starts running, writes the verbose copyright/etc. message on stderr, and gets EIO (and ignores it). Shortly thereafter it closes its pty slave fds on stdin/stdout/stderr. It continues normally from there.

In that millisecond, the sshd trace shows that it catches SIGCHLD, writes utmp info, wait()s for a child and gets mosh-server's pid, and does a final read() that returns 0 bytes. It then closes the pty master, without doing revoke() or any ioctls.

So the pty driver is clearly dropping the MOSH CONNECT message. It's a lot less clear whether that's a bug.
On the one hand, <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap11.html#tag_11_01_11> states that output should be drained on final close (which in this case is done by the forked mosh-server). On the other hand, the login shell is the session leader, and <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap11.html#tag_11_01_03> requires that the terminal be disassociated from the session when it exits.

I suspect this problem is most visible on single-core machines: on a multi-core machine the pty driver, the mosh-server processes, and sshd run in a different order, and the mosh-server child probably gets to close the pty slave before sshd closes the pty master.

I wrote a Xenix serial driver ages ago, and I can see arguments for either draining or dropping final output. And Mosh's behavior is a little questionable here. So I'm not sure whether to call this a kernel pty bug or not. But the workaround in Mosh, tcdrain(), is easy enough, so I'm doing that anyway.

regards,
--jh

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
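A minimal sketch of the ordering the tcdrain() workaround relies on, written against plain POSIX calls: the connect line is written in full, and only when the descriptor is a terminal does tcdrain() block until the tty layer reports the queued output as transmitted, before the caller would go on to fork() and let the parent exit. The helper names write_all() and emit_connect_line() and the 256-byte line limit are hypothetical; mosh-server itself uses printf()/fflush() on stdout, as in the patch in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: write the connect line to fd, then, only if fd is a
 * terminal (e.g. the pty slave that ssh -t provides), wait in tcdrain()
 * until the queued output has been transmitted. For pipes and regular
 * files the drain step is skipped.
 */
static int emit_connect_line(int fd, const char *port, const char *key)
{
    char line[256]; /* assumed upper bound on the line length */
    int len = snprintf(line, sizeof line, "\nMOSH CONNECT %s %s\n", port, key);

    if (len < 0 || (size_t)len >= sizeof line)
        return -1; /* formatting failed or the line would be truncated */
    if (write_all(fd, line, (size_t)len) < 0)
        return -1;
    if (isatty(fd)) {
        while (tcdrain(fd) < 0) {
            if (errno != EINTR)
                return -1;
        }
    }
    return 0;
}
```

Called on STDOUT_FILENO just before fork(), this keeps the parent from exiting while the message may still be queued on the pty slave. It does not settle whether the kernel should drain that output on its own at final close; it only stops depending on it.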
Re: Mosh regression between 10.x and 11-stable
That's interesting (not in a good way).

Next thing to try:

mosh --server='/usr/local/bin/mosh-server 2>/dev/null' peter@VPS

(with an unmodified mosh-server)

mosh-server prints the 'MOSH CONNECT ...' message on stdout, then forks. The parent exits immediately, and the child prints verbose and copyright info to stderr. I'm wondering if the race is that the parent's and child's writes get interleaved, corrupting the expected message.

You might try adding a 'print;' in the while loop that digests this input in the mosh script. That'll tell us whether the script is not getting 'MOSH CONNECT...' at all, or whether it's corrupted. You'll have to run mosh inside /usr/bin/script to capture that.

regards,
--jh

On 08/11/16 03:53 PM, Peter Jeremy wrote:
> On 2016-Aug-11 12:30:23 -0400, John Hood <cg...@glup.org> wrote:
>> I still can't reproduce this on 3 different 11.0-BETA4 servers and a
>> variety of clients and networks. Can you try and identify a more
>> portable repro or at least figure out why it fails on your system?
>>
>> Please try applying this patch, too. It's a shot in the dark, though.
>
> That patch seems to fix the problem I'm seeing. Not waiting for output
> to drain is consistent with the symptoms I'm seeing, though I have no
> idea why only my Linux client is affected.
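To separate the two cases ('MOSH CONNECT' never arriving versus arriving garbled by interleaved stderr output), each captured line can be classified. This sketch assumes the line format printed by mosh-server ("MOSH CONNECT <port> <key>") and assumes the key is exactly 22 characters of base64 ([A-Za-z0-9/+]); that key format is believed to match the client script's check but is an assumption here. classify_line() and the enum names are hypothetical.

```c
#include <string.h>

#define KEY_LEN 22 /* assumption: 128-bit key, base64 without padding */

enum connect_status { NOT_CONNECT, CONNECT_OK, CONNECT_CORRUPT };

static int is_key_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '/' || c == '+';
}

/*
 * Hypothetical classifier for one line read from the ssh session.
 * On CONNECT_OK, port (up to 5 digits) and key are copied out.
 */
static enum connect_status classify_line(const char *line, char port[6],
                                         char key[KEY_LEN + 1])
{
    static const char prefix[] = "MOSH CONNECT ";
    const size_t plen = sizeof prefix - 1;
    const char *p;
    size_t ndigits = 0, nkey = 0;

    /* Treat a line as a connect attempt if it starts with "MOSH CONNECT". */
    if (strncmp(line, prefix, plen - 1) != 0)
        return NOT_CONNECT;
    if (line[plen - 1] != ' ')
        return CONNECT_CORRUPT;
    p = line + plen;

    /* Port: 1 to 5 decimal digits followed by a single space. */
    while (p[ndigits] >= '0' && p[ndigits] <= '9')
        ndigits++;
    if (ndigits == 0 || ndigits > 5 || p[ndigits] != ' ')
        return CONNECT_CORRUPT;
    memcpy(port, p, ndigits);
    port[ndigits] = '\0';
    p += ndigits + 1;

    /* Key: exactly KEY_LEN base64 characters. */
    while (is_key_char(p[nkey]))
        nkey++;
    if (nkey != KEY_LEN)
        return CONNECT_CORRUPT;
    memcpy(key, p, KEY_LEN);
    key[KEY_LEN] = '\0';
    p += KEY_LEN;

    /* Only trailing whitespace (such as the pty's "\r\n") may follow. */
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    return *p == '\0' ? CONNECT_OK : CONNECT_CORRUPT;
}
```

Run over every line of a captured session (for example the script(1) typescript suggested above), getting only NOT_CONNECT would point at the message never arriving, while CONNECT_CORRUPT would point at the interleaving theory.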
Re: Mosh regression between 10.x and 11-stable
I still can't reproduce this on 3 different 11.0-BETA4 servers and a variety of clients and networks. Can you try and identify a more portable repro, or at least figure out why it fails on your system?

Please try applying this patch, too. It's a shot in the dark, though.

regards,
--jh

From 4c4d2193af37f4375d9f7d52c109bbcdb873d9fc Mon Sep 17 00:00:00 2001
From: John Hood <cg...@glup.org>
Date: Thu, 11 Aug 2016 12:25:55 -0400
Subject: [PATCH] Ensure MOSH CONNECT message reaches sshd.

---
 src/frontend/mosh-server.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/frontend/mosh-server.cc b/src/frontend/mosh-server.cc
index a88a8d2..cd25a14 100644
--- a/src/frontend/mosh-server.cc
+++ b/src/frontend/mosh-server.cc
@@ -410,6 +410,9 @@ static int run_server( const char *desired_ip, const char *desired_port,
 
   printf( "\nMOSH CONNECT %s %s\n", network->port().c_str(), network->get_key().c_str() );
   fflush( stdout );
+  if ( isatty( fileno( stdout ))) {
+    tcdrain( fileno( stdout ));
+  }
 
   /* don't let signals kill us */
   struct sigaction sa;
--
2.9.2
Re: Mosh regression between 10.x and 11-stable
On 8/10/16 4:18 AM, Peter Jeremy wrote:
> I recently updated one of my VPS hosts from 10.3-RELEASE-p5 to 11.0-BETA4
> r303811 and mosh to that host from my Linux laptop stopped working. All
> I get on the laptop is:
> $ mosh remotehost
> Connection to remotehost closed.
> /usr/bin/mosh: Did not find mosh server startup message.
>
> I've tried rebuilding mosh (and all dependencies) on the host to no avail.

I'm a mosh maintainer. mosh 1.2.5 (from ports) and mosh master (just last night tagged as 1.2.6, alas) work fine for me on my two 11.0-BETA4 systems, one local and one remote.

> This isn't the DSA change that's been discussed elsewhere: I can SSH from my
> laptop to the host without problem. I can also manually invoke mosh-client
> and mosh-server and it works. Unfortunately, mosh has no provision for
> debugging. I've tried hacking the mosh perl script to make it more verbose
> and that shows that:
> 1) the "MOSH CONNECT" message isn't making it out of the local ssh process.

Do you know if the message is getting out of mosh-server? Into sshd? Do you know if mosh-server is actually running? (It will log utmp entries on startup.) Mosh's debugging/logging isn't very good, but 'mosh-server new -v 2> logfile' does produce some useful info (mostly logging of network traffic).

> 2) it's racy because I can get it from "always fails" to "sometimes works".

How do you get it there?

> My suspicion is that something has changed in either sshd or TCP that
> is resulting in the connection going away before the stdout from the
> remote mosh-server makes it out from the local ssh process.

mosh does 'ssh -t' and uses ptys. That's another point where the message could get dropped.

> I've looked at tcpdump's of both successful and failed SSH sessions
> but don't see anything obviously different (encryption makes it
> difficult to decode the session).
>
> Has anyone else seen this behaviour or have any ideas what might be
> causing it?
Common suspects include issues with shell login/invocation of mosh-server. Are you making sure it's reachable in /usr/local/bin via $PATH, or passing '--server=/usr/local/bin/mosh-server'? Are your login shell and its login scripts unusual?

On Linux we've had issues with ecryptfs and systemd breaking mosh-server when the ssh session ends, but I don't think that applies here.

regards,
--jh
Re: FreeBSD 10.0-RC5 Now Available
On 1/9/14 10:52 PM, Allan Jude wrote:
> Another user had this problem this morning; try reinstalling or recompiling hald.

I had this too today. There's an easier, lamer workaround: put

moused_nondefault_enable=NO

in your /etc/rc.conf, unplug and replug your USB mouse (or kill moused), and restart your X server or login manager. This costs you 1 mouse pointer in text mode.

--jh