Re: 5.5 odd issue with relayd flapping

2016-09-15 Thread Alan McKay
Yes, upgrading is on our to-do list.
But it will be a few months before we can do that.



Re: 5.5 odd issue with relayd flapping

2016-09-15 Thread Mariano Baragiola
FYI, 5.9 and 6.0 are the currently supported versions.

You won't get much help here unless you upgrade. But you might get lucky
if someone who had exactly the same problem reads your mail and decides
to help you out.

Cheers.



5.5 odd issue with relayd flapping

2016-09-15 Thread Alan McKay
Hi folks,

I googled this and found something similar here:

https://www.mail-archive.com/misc@openbsd.org/msg77218.html

There are a couple of threads, but everything seems to say it was a known
issue that was fixed after 5.2. I have an extra oddity to add to it, though,
as you will see from my relayd config.

These systems have been running fine for almost 2 years now (653 day
uptime!) with no issues, then last week one of my environments started
throwing these sorts of errors about every hour:

relayd[PID]: host , check script (Xms), state up -> down,
availability x.y%
relayd[PID]: host , check script (Xms), state down -> up,
availability x.y%
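
For anyone who wants to quantify the flapping before digging further, here is a minimal Python sketch that counts state transitions per host. The regular expression is only modelled on the two messages quoted above, and the sample lines (syslog prefix, host address, PID, timings) are made up for illustration, so expect to adjust the pattern to your actual log.

```python
import re
from collections import Counter

# Assumed line shape, modelled on the messages quoted above; the real
# syslog prefix and host field may differ on your system.
TRANSITION = re.compile(
    r"relayd\[\d+\]: host (?P<host>[^,]*), check script \((?P<ms>\d+)ms\), "
    r"state (?P<old>\w+) -> (?P<new>\w+), availability (?P<avail>[\d.]+)%"
)

def count_transitions(lines):
    """Count (host, old_state, new_state) transitions in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            counts[(m.group("host").strip(), m.group("old"), m.group("new"))] += 1
    return counts

# Hypothetical sample lines (addresses, PIDs and timings are made up).
sample = [
    "Sep 15 10:00:01 fw2 relayd[1234]: host 10.0.0.1, check script (3ms), state up -> down, availability 99.50%",
    "Sep 15 10:00:11 fw2 relayd[1234]: host 10.0.0.1, check script (2ms), state down -> up, availability 99.51%",
    "Sep 15 11:00:02 fw2 relayd[1234]: host 10.0.0.1, check script (4ms), state up -> down, availability 99.40%",
]

if __name__ == "__main__":
    for key, n in sorted(count_transitions(sample).items()):
        print(key, n)
```

Feeding it the real log (for example `count_transitions(open(path))`, with `path` pointing at wherever your syslog sends daemon messages) shows whether the flaps are really hourly and whether both hosts are affected.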

The check is against an LDAP server, but here is the funny business we have
going: it is not really checking the LDAP.

We have primary and backup LDAPs defined like this:

table  { 10.x.y.1 retry 1 }
table   disable { 10.x.y.2 retry 1 }

[...]


redirect ourldap {
        listen on $ldap_addr port $ldap_port interface $relayd_int
        tag relayd
        session timeout 86400
        forward to  check script "/usr/bin/false"
        forward to   check script "/usr/bin/false"
}

I know this seems odd, but as far as relayd is concerned its check never
fails. We do this because a separate script of ours cuts over between the
LDAPs if there is an issue; we basically use relayd to handle the firewall
rules for us. (With earlier versions of this check we found that relayd could
not cut over the LDAPs properly on its own: it took several minutes to do so.)
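
Since pointing the check at /usr/bin/false looks backwards at first glance, here is a small Python sketch of the exit-status convention as I read it in relayd.conf(5): relayd expects a positive return value on success and zero on failure, and it passes the host on the script's command line. The helper names and the host address are hypothetical, and the sketch only covers a script that exits normally (not one that is killed on timeout).

```python
import shutil
import subprocess

def relayd_view(exit_status):
    """Map a check script's exit status to the state relayd would record.

    Per relayd.conf(5): positive return value = success, zero = failure.
    Only normal exits are modelled here, not timeouts or signals.
    """
    return "up" if exit_status > 0 else "down"

def run_check(path, host):
    """Run a check program with the host as its only argument."""
    return subprocess.run([path, host]).returncode

if __name__ == "__main__":
    # Locate false(1) and true(1) portably; skipped if not installed.
    for name in ("false", "true"):
        path = shutil.which(name)
        if path:
            status = run_check(path, "10.0.0.1")  # hypothetical host
            print(name, "exits", status, "so relayd sees the host as", relayd_view(status))
```

Under that convention false(1), which exits 1, reads as "always up", while true(1), which exits 0, would read as "always down".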

We checked the local NICs for errors (netstat -I) and there was nothing.
We checked the switch for errors, and again nothing.

Oh one more thing - this is a redundant pair of firewalls and we only see
this on the backup firewall, not the master.  And it is in our DR facility which
really does not see any traffic.  We have the exact same configuration
in production which is extremely active, and we do not see the issue there.

thanks,
-Alan



-- 
"You should sit in nature for 20 minutes a day.
 Unless you are busy, then you should sit for an hour"
 - Zen Proverb