On Thu, 25 Sep 2008, Howard Chu wrote:
> Brett @Google wrote:
>> I was wondering if anybody is using syncrepl in the context of a
>> hardware content switch or redundant environment.
> Yes.
>> I am considering the edge case where a connection is redirected to a
>> client, and:
>> a) client has no current data (new node introduced)
>> b) client decides it needs to do a full refresh - perhaps it was down
>> and missed a large number of updates
> Yes, you need to keep all servers identical (as much as practical).
> Seems to me that such a switch really isn't useful here. Also, if you're
> running an LDAP service where the network fabric can actually sustain
> more traffic than your LDAP servers, you've done something very strange.
> Considering that a dual-socket quad-core server running OpenLDAP can
> saturate a gigabit ethernet, I don't see how you can load-balance beyond
> that. The content switch will become the bottleneck.
It's not so much about saturating the wire (although our current switches
do 2Gbps each, I'm sure the next ones will be on the order of 6-8Gbps
each, and we use more than one). It's about service availability -- taking
down a slave and having everything else converge onto the remaining slaves
in well under a second. A load balancer handles this much faster than the
vast majority of clients configured with multiple servers, and there are
no client delays while they vainly retry servers that are down. You also
don't have to worry about software that only allows you to configure a
single server.
> If you're bringing up a brand new replica, just use a separate (virtual,
> if necessary) network interface while it's bootstrapping, and don't
> enable the main interface until it's caught up.
This is essentially what we do. We start with slapadd -q from a recent
LDIF. Then, to catch "late breaking changes," we run slapd -h ldapi:///.
During both of these procedures, there's nothing listening on the network,
so the load balancer marks the node as failed. Once the contextCSNs appear
in sync (discussed at length in the archives), we restart slapd with its
network listeners.
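In sketch form, the sequence above might look like this. Every path, base
DN, and hostname here is a placeholder (not our actual configuration), and
the contextCSN comparison is deliberately crude:

```shell
# Hypothetical bootstrap of a new replica; all paths, DNs, and hostnames
# below are placeholders.

# 1. Bulk-load the database from a recent LDIF dump.
#    -q skips consistency checking for speed.
slapadd -q -f /etc/openldap/slapd.conf -l /backups/recent.ldif

# 2. Start slapd listening only on the local IPC socket. Nothing is
#    listening on the network, so the load balancer marks the node failed.
slapd -h ldapi:/// -f /etc/openldap/slapd.conf

# 3. Wait until this replica's contextCSN matches the provider's.
#    (Crude: with multiple serverIDs you would compare per-SID values.)
csn() { ldapsearch -x -LLL -H "$1" -s base -b "dc=example,dc=com" contextCSN; }
until [ "$(csn ldapi:///)" = "$(csn ldap://master.example.com)" ]; do
    sleep 5
done

# 4. Stop the IPC-only instance and restart with network listeners;
#    the load balancer will then mark the node healthy again.
kill "$(cat /var/run/slapd/slapd.pid)"
slapd -h "ldap:/// ldapi:///" -f /etc/openldap/slapd.conf
```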
Strictly speaking, you could use that contextCSN comparison as a custom
load balancer health check. This might be a bit dangerous, though, since
syncrepl only guarantees eventual convergence: it's theoretically possible
that all your slaves would fail out during a particularly large refresh.
You'll have to decide for yourself whether it's more dangerous to serve
stale data or to serve no data. We don't do this, because we'd rather be
serving stale.
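For the record, such a check could be sketched as below. The hostnames and
base DN are placeholders, and the exact-match test is intentionally strict,
which is precisely the all-slaves-fail-at-once risk just described:

```shell
#!/bin/sh
# Hypothetical external health check for a load balancer: exit 0 (healthy)
# only when this replica's contextCSN values match the provider's exactly.
# Hostnames and base DN are placeholders; with multiple serverIDs this
# compares the full sorted set of per-SID values.
BASE="dc=example,dc=com"

csn() {
    ldapsearch -x -LLL -H "$1" -s base -b "$BASE" contextCSN |
        grep '^contextCSN:' | sort
}

provider=$(csn ldap://master.example.com)
replica=$(csn ldapi:///)

# Fail if either lookup came back empty, or if the CSN sets differ.
# Exact match is strict: during a large refresh every replica could fail
# this check simultaneously, leaving the pool serving no data at all.
[ -n "$provider" ] && [ "$provider" = "$replica" ]
```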