Re: Mosh regression between 10.x and 11-stable

2016-08-14 Thread john hood
On 8/13/16 4:30 AM, Peter Jeremy wrote:
> Hi John,
> 
> Sorry, I got side-tracked.
> 
> On 2016-Aug-12 16:37:15 -0400, John Hood <cg...@glup.org> wrote:
>> >Could I ask you to look at this a little further?  On the one hand, it
>> >sure looks like a Mosh issue, and tcdrain() solves it-- but on the other
>> >hand, this is a regression on FreeBSD and we don't see this issue on any
>> >other OS.  I'd like to fully understand this and make sure that this is
>> >not a kernel issue.
> It's got me puzzled as well.  And it's only getting weirder...  The
> following is using an unmodified mosh-1.2.5, built from the port, as
> the server on FreeBSD 11.0-BETA4 r303957.  The client is 1.2.4a-1ubuntu1
> on Linux.  The standard driver script consistently fails but adding a
> "print" makes it mostly work.  Where there's a "[mosh is exiting.]"
> message, it was successful (and would report that there were other
> orphaned servers since I wasn't waiting the 60 seconds for servers to
> die between invocations).  For completeness, I've tried ktrace'ing
> mosh-server but can't make it fail when I do so.

I've now managed to reproduce the issue *and* ktrace it (and sshd) on a
VPS, with a single-CPU VM on a badly-oversubscribed VMWare host and OS X
client.

mosh-server shows a straightforward execution trace: the parent
successfully writes the MOSH CONNECT message on stdout, forks and exits.
About a millisecond later, the child starts running, writes the verbose
copyright/etc. message on stderr, and gets EIO (and ignores it).
Shortly thereafter it closes its pty slave fds on stdin/stdout/stderr.
It continues normally from there.

In that millisecond, the sshd trace shows that it catches SIGCHLD,
writes utmp info, wait()s for a child and gets mosh-server's pid, and
does a final read() that returns 0 bytes.  It then closes the pty
master, without doing revoke() or any ioctls.

So the pty driver is clearly dropping the MOSH CONNECT message.  It's a
lot less clear whether that's a bug.  On the one hand,
<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap11.html#tag_11_01_11>
states that output should be drained on final close (which in this case
is done by the forked mosh-server).  On the other hand, the login shell
is session leader, and
<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap11.html#tag_11_01_03>
requires that the terminal be disassociated from the session when it exits.

I suspect that this problem is most visible on single-core machines.
On a multi-core machine the pty driver, mosh-server, and sshd will run
with different ordering, and the mosh-server child probably gets to
close the pty slave before sshd closes the pty master.

I wrote a Xenix serial driver ages ago and I can see arguments for
either draining or dropping final output.  And Mosh's behavior is a
little questionable here.  So I'm not sure whether to call this a kernel
pty bug or not.  But the workaround in Mosh is easy enough, tcdrain(),
so I'm doing that anyway.
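
A minimal C++ sketch of the tcdrain() workaround follows.  It is not the
actual Mosh change: drain_if_tty is a hypothetical helper name, and
retrying on EINTR is an added assumption, not something the traces
showed to be necessary.

```cpp
#include <cerrno>
#include <termios.h>
#include <unistd.h>

// Hypothetical helper (not part of Mosh): wait until everything written
// to fd has been transmitted, but only when fd refers to a terminal.
// Returns 0 on success or when fd is not a tty, -1 on error (errno set).
int drain_if_tty( int fd )
{
  if ( !isatty( fd ) ) {
    return 0; // tcdrain() only applies to terminals; nothing to do
  }
  int rc;
  do {
    rc = tcdrain( fd ); // blocks until queued output has been sent
  } while ( rc < 0 && errno == EINTR ); // assumed: retry if interrupted
  return rc;
}
```

A server would call drain_if_tty( STDOUT_FILENO ) right after flushing
the MOSH CONNECT line and before forking, so the message has left the
pty before the parent exits.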

regards,

  --jh

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Mosh regression between 10.x and 11-stable

2016-08-11 Thread John Hood
That's interesting (not in a good way).

Next thing to try:

mosh --server='/usr/local/bin/mosh-server 2>/dev/null' peter@VPS

(with an unmodified mosh-server)

Mosh prints the 'MOSH CONNECT ...' message on stdout, then forks.  The
parent exits immediately, and the child prints verbose and copyright
info to stderr.  I'm wondering if the race is that the parent and child
writes appear mixed together, corrupting the expected message.

You might try adding a 'print;' in the while loop that digests this
input in the mosh script.  That'll tell us whether the script is not
getting 'MOSH CONNECT...' at all, or if it's corrupted.  You'll have to
run mosh inside /usr/bin/script to capture that.
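
To make the idea concrete, here is a rough C++ sketch of that kind of
loop.  It is not a translation of the Perl in the mosh script; the
function name and the "got: [...]" debug format are made up for
illustration, and the matching is deliberately looser than whatever the
real script does.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical function, for illustration only.  Reads lines from `in`,
// echoes each one to `debug` so a capture shows exactly what arrived,
// and stops at the first "MOSH CONNECT <port> <key>" line.
// Returns true and fills port/key if such a line was seen.
bool find_mosh_connect( std::istream &in, std::ostream &debug,
                        std::string &port, std::string &key )
{
  std::string line;
  while ( std::getline( in, line ) ) {
    if ( !line.empty() && line.back() == '\r' ) {
      line.pop_back(); // a pty usually turns "\n" into "\r\n"
    }
    debug << "got: [" << line << "]\n"; // the equivalent of 'print;'
    std::istringstream fields( line );
    std::string mosh, connect, p, k;
    if ( ( fields >> mosh >> connect >> p >> k )
         && mosh == "MOSH" && connect == "CONNECT" ) {
      port = p;
      key = k;
      return true;
    }
  }
  return false; // input ended without a usable MOSH CONNECT line
}
```

If the debug output never shows a MOSH CONNECT line, the message was
lost before it reached the script; if it shows a mangled one, the writes
were mixed together.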

regards,

  --jh

On 08/11/16 03:53 PM, Peter Jeremy wrote:
> On 2016-Aug-11 12:30:23 -0400, John Hood <cg...@glup.org> wrote:
>> I still can't reproduce this on 3 different 11.0-BETA4 servers and a
>> variety of clients and networks.  Can you try and identify a more
>> portable repro or at least figure out why it fails on your system?
>>
>> Please try applying this patch, too.  It's a shot in the dark, though.
> That patch seems to fix the problem I'm seeing.  Not waiting for output
> to drain is consistent with the symptoms I'm seeing, though I have no
> idea why only my Linux client is affected.
>



Re: Mosh regression between 10.x and 11-stable

2016-08-11 Thread John Hood
I still can't reproduce this on 3 different 11.0-BETA4 servers and a
variety of clients and networks.  Can you try and identify a more
portable repro or at least figure out why it fails on your system?

Please try applying this patch, too.  It's a shot in the dark, though.

regards,

  --jh


From 4c4d2193af37f4375d9f7d52c109bbcdb873d9fc Mon Sep 17 00:00:00 2001
From: John Hood <cg...@glup.org>
Date: Thu, 11 Aug 2016 12:25:55 -0400
Subject: [PATCH] Ensure MOSH CONNECT message reaches sshd.

---
 src/frontend/mosh-server.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/frontend/mosh-server.cc b/src/frontend/mosh-server.cc
index a88a8d2..cd25a14 100644
--- a/src/frontend/mosh-server.cc
+++ b/src/frontend/mosh-server.cc
@@ -410,6 +410,9 @@ static int run_server( const char *desired_ip, const char *desired_port,
 
   printf( "\nMOSH CONNECT %s %s\n", network->port().c_str(), network->get_key().c_str() );
   fflush( stdout );
+  if ( isatty( fileno( stdout ))) {
+    tcdrain( fileno( stdout ));
+  }
 
   /* don't let signals kill us */
   struct sigaction sa;
-- 
2.9.2


Re: Mosh regression between 10.x and 11-stable

2016-08-10 Thread john hood
On 8/10/16 4:18 AM, Peter Jeremy wrote:
> I recently updated one of my VPS hosts from 10.3-RELEASE-p5 to 11.0-BETA4
> r303811 and mosh to that host from my Linux laptop stopped working.  All
> I get on the laptop is:
> $ mosh remotehost
> Connection to remotehost closed.
> /usr/bin/mosh: Did not find mosh server startup message.
> 
> I've tried rebuilding mosh (and all dependencies) on the host to no avail.

I'm a mosh maintainer.  mosh 1.2.5 (from ports) and mosh master (just
last night tagged as 1.2.6, alas) work fine for me on my two 11.0-BETA4
systems, one local and one remote.

> This isn't the DSA change that's been discussed elsewhere: I can SSH from my
> laptop to the host without problem.  I can also manually invoke mosh-client
> and mosh-server and it works.  Unfortunately, mosh has no provision for
> debugging.  I've tried hacking the mosh perl script to make it more verbose
> and that shows that:
> 1) the "MOSH CONNECT" message isn't making it out of the local ssh process.

Do you know if the message is getting out of mosh-server?  into sshd?
Do you know if mosh-server is actually running?  (It will log utmp
entries on startup.)

Mosh's debugging/logging isn't very good, but 'mosh-server new -v 2>
logfile' does produce some useful info (mostly logging of network traffic).

> 2) it's racy because I can get it from "always fails" to "sometimes works".

How do you get it there?

> My suspicion is that something has changed in either sshd or TCP that
> is resulting in the connection going away before the stdout from the
> remote mosh-server makes it out from the local ssh process.

mosh does 'ssh -t' and uses ptys.  That's another potential point the
message could get dropped.

> I've looked at tcpdump's of both successful and failed SSH sessions
> but don't see anything obviously different (encryption makes it
> difficult to decode the session).
> 
> Has anyone else seen this behaviour or have any ideas what might be
> causing it?

Common suspects include issues with shell login/invocation of mosh (are
you making sure mosh-server is reachable in /usr/local/bin with $PATH or
'--server=/usr/local/bin/mosh-server'?  are your login shell and its
login scripts unusual?)

On Linux we've had issues with ecryptfs and systemd breaking mosh-server
when the ssh session ends, but I don't think that applies here.

regards,

  --jh




Re: FreeBSD 10.0-RC5 Now Available

2014-01-09 Thread john hood

On 1/9/14 10:52 PM, Allan Jude wrote:
> Another user had this problem this morning, try reinstalling or
> recompiling hald

I had this too today.  There's an easier, lamer workaround:

Put

  moused_nondefault_enable=NO

in your /etc/rc.conf, and unplug/plug your USB mouse (or kill moused), 
and restart your X server or login manager.  This costs you 1 mouse 
pointer in text mode.


  --jh
