On Thu, 7 May 2009 13:30:08 -0400 James Whittington
<[email protected]> wrote...

> I never did have anyone offering feedback on how to address a reverse ssh 
> slave server dropping the reverse tunnel.

Sorry James. I did see your original post but it entirely slipped my
mind, so I'm glad you followed up!

I have experienced the exact behaviour you describe with one of our
two slave servers - both configured in a reverse SSH tunnel mode.

Oddly it is the same server each time - the one in Asia, with
significantly higher latency than the other slave in the US - and it
has happened twice now. I'm just waiting for the third episode, which
will surely happen.

According to our smokeping graphs there don't seem to be any
connectivity issues, but the round trip time is certainly higher:
~300ms, whereas with our slave in the US (no problems whatsoever)
it's ~60ms.

I cannot spot any other particular difference between the two slaves
apart from that.

> The remote site is able to pass info back to the master server but the
> master server cannot initiate a connection with the slave server

Exactly. We can see the autossh process running on the slave but the
"slave" port on the master is no longer there. Netstat shows this and
of course if you attempt to SSH to the localhost on that port - which
should tunnel to the slave - it fails.
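
As a rough illustration of that symptom, the check below tries to open
a TCP connection to the forwarded port and reports whether anything is
listening. It is only a sketch: the function name and the port number
in the usage note are made up for the example, not taken from our
setup.

```python
import socket


def tunnel_port_listening(port, host="127.0.0.1", timeout=5.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...)
        # when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the master, `tunnel_port_listening(25801)` (25801 being a
placeholder for whatever port the slave's tunnel is meant to use)
should come back False once the tunnel has dropped. Note that it only
proves a listener exists, not that the far end of the tunnel is
healthy.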

The fix right now is just to restart the "opsview-slave" service -
essentially re-creating the tunnel - and then issue a reload on the
master.

> The slave server doesn’t really have a way to know that the reverse tunnel is 
> down...

This is what confuses me. The autossh process is running on the slave
but the tunnel from master to slave is no longer running. Shouldn't
autossh (by definition?) be picking up on this?

> Did anyone come up with creative ways to address this sort of condition?

I haven't really addressed it yet but off the top of my head right now
I'd try...

1. Using SSH from the slave to the master (which is fine) to check if
the port on the master is listening.

2. If so then exit true.

3. If not then issue a restart of the opsview-slave service.

Seems a reasonably simple thing to script and you could run it under
cron every 10 mins or so.
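
Something along the lines of the three steps above might look like
this. It is an untested sketch, not something we run: the master
hostname, the tunnel port and the restart command are all placeholders
I have assumed, and it presumes the slave can SSH to the master with
key-based authentication and that "netstat" is available there.

```python
import subprocess

# All three values below are assumptions for illustration only.
MASTER = "opsview-master.example.com"            # hypothetical hostname
TUNNEL_PORT = 25801                              # hypothetical tunnel port
RESTART_CMD = ["service", "opsview-slave", "restart"]  # assumed command


def port_in_netstat(output, port):
    """True if `netstat -ltn` output shows a listener on the given port."""
    suffix = ":%d" % port
    for line in output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state
        if (len(fields) >= 6 and fields[5] == "LISTEN"
                and fields[3].endswith(suffix)):
            return True
    return False


def check_and_repair(run=subprocess.run):
    """Steps 1-3: ask the master for its listeners, restart if ours is gone."""
    result = run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=15",
         MASTER, "netstat -ltn"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        # Could not query the master at all; a restart is unlikely to help.
        return 2
    if port_in_netstat(result.stdout, TUNNEL_PORT):
        return 0  # tunnel port is listening, nothing to do
    # Port is missing on the master: re-create the tunnel.
    return run(RESTART_CMD).returncode
```

To use it from cron, end the file with
`raise SystemExit(check_and_repair())` so the exit status reflects the
outcome. If the master cannot be reached at all the sketch
deliberately leaves the service alone, since restarting would not
help in that case. It also does not cover the reload on the master,
which would still need to be done separately.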
_______________________________________________
Opsview-users mailing list
[email protected]
http://lists.opsview.org/listinfo/opsview-users
