Thank you, Howard! I'll give that a try in case the keepalive option Quanah mentioned does not fix the issue.
Mircea

--
Mircea Baciu | Senior Unix Systems Administrator
Simmons University | 300 The Fenway | Boston, MA 02115 | 617-521-2194

On Mon, Sep 20, 2021 at 12:02 PM Howard Chu <[email protected]> wrote:

> Mircea Baciu wrote:
> > Hi,
> >
> > I have an issue with a consumer replication starting to fail until
> > OpenLDAP is restarted.
> >
> > My setup consists of a pair of on-prem MirrorMode replicated providers
> > (only one is active at a given time using a virtual IP managed by
> > Keepalived), and one off-site (AWS) consumer. The providers use a
> > dedicated port (LDAPS on 1636) for their own replication, as well as
> > for the consumer to connect to them, so the consumer has access to both
> > servers, regardless of where the providers' virtual IP is residing.
> >
> > All the connections happen over LDAPS, and the syncrepl configs have
> > the tls_reqcert=allow option.
> >
> > The providers are always in sync and I'm able to switch make one or the
> > other one the "active" one with ease. The consumer does the initial
> > sync and stays in sync for a while, but I find it often (almost daily)
> > out of sync. I see error messages on both the consumer and provider
> > side:
>
> Sounds like an issue in the TLS layer. You should increase the debug level
> on both provider and consumer to see if there are any TLS-specific error
> messages being generated. If you have cn=monitor configured you can set
> the debuglevel using ldapmodify, so no need to restart the servers for it
> to take effect. That'll let you see the problem as it's occurring.
>
> >
> > On the consumer (every minute):
> > Sep 20 08:19:31 <consumer> slapd[1440]: slap_client_connect: URI=ldaps://<provider1>:1636/ DN="uid=replication,ou=sysaccounts,dc=example,dc=com" ldap_sasl_bind_s failed (-1)
> > Sep 20 08:19:31 <consumer> slapd[1440]: do_syncrepl: rid=001 rc -1 retrying
> > Sep 20 08:19:31 <consumer> slapd[1440]: slap_client_connect: URI=ldaps://<provider2>:1636/ DN="uid=replication,ou=sysaccounts,dc=example,dc=com" ldap_sasl_bind_s failed (-1)
> > Sep 20 08:19:31 <consumer> slapd[1440]: do_syncrepl: rid=002 rc -1 retrying
> > Sep 20 08:20:31 <consumer> slapd[1440]: slap_client_connect: URI=ldaps://<provider1>:1636/ DN="uid=replication,ou=sysaccounts,dc=example,dc=com" ldap_sasl_bind_s failed (-1)
> > Sep 20 08:20:31 <consumer> slapd[1440]: do_syncrepl: rid=001 rc -1 retrying
> > Sep 20 08:20:31 <consumer> slapd[1440]: slap_client_connect: URI=ldaps://<provider2>:1636/ DN="uid=replication,ou=sysaccounts,dc=example,dc=com" ldap_sasl_bind_s failed (-1)
> > Sep 20 08:20:31 <consumer> slapd[1440]: do_syncrepl: rid=002 rc -1 retrying
> >
> > On the provider (every minute):
> > Sep 20 08:19:31 <provider1> slapd[1057]: conn=11242 fd=14 ACCEPT from IP=<consumer IP>:45438 (IP=<provider1 IP>:1636)
> > Sep 20 08:19:31 <provider1> slapd[1057]: conn=11242 fd=14 TLS established tls_ssf=256 ssf=256
> > Sep 20 08:19:31 <provider1> slapd[1057]: conn=11242 fd=14 closed (connection lost)
> > Sep 20 08:20:31 <provider1> slapd[1057]: conn=11243 fd=14 ACCEPT from IP=<consumer IP>:45458 (IP=<provider1 IP>:1636)
> > Sep 20 08:20:31 <provider1> slapd[1057]: conn=11243 fd=14 TLS established tls_ssf=256 ssf=256
> > Sep 20 08:20:31 <provider1> slapd[1057]: conn=11243 fd=14 closed (connection lost)
> >
> > Sep 20 08:19:31 <provider2> slapd[1051]: conn=215893 fd=18 ACCEPT from IP=<consumer IP>:41706 (IP=<provider2 IP>:1636)
> > Sep 20 08:19:31 <provider2> slapd[1051]: conn=215893 fd=18 TLS established tls_ssf=256 ssf=256
> > Sep 20 08:19:31 <provider2> slapd[1051]: conn=215893 fd=18 closed (connection lost)
> > Sep 20 08:20:31 <provider2> slapd[1051]: conn=215898 fd=18 ACCEPT from IP=<consumer IP>:41726 (IP=<provider2 IP>:1636)
> > Sep 20 08:20:31 <provider2> slapd[1051]: conn=215898 fd=18 TLS established tls_ssf=256 ssf=256
> > Sep 20 08:20:31 <provider2> slapd[1051]: conn=215898 fd=18 closed (connection lost)
> >
> > There must be something wrong on the consumer side since when the issue
> > starts, the consumer is not able to connect to either provider.
> >
> > Once I restart the consumer, it quickly resyncs and works just fine,
> > for a while.
> >
> > The providers are OpenLDAP 2.4.44 (openldap-2.4.44-24.el7_9.x86_64),
> > running on RHEL 7.
> > The consumer is OpenLDAP 2.4.44 (openldap-2.4.44-24.el7_9.x86_64),
> > running on CentOS 7.
> >
> > The consumer syncrepl config is:
> > olcSyncrepl: {0}rid=001
> >   provider=ldaps://<provider1>:1636/
> >   searchbase="dc=example,dc=com"
> >   type=refreshAndPersist
> >   retry="60 +"
> >   timeout=1
> >   bindmethod=simple
> >   binddn="uid=replication,ou=SysAccounts,dc=example,dc=com"
> >   credentials=<credentials>
> >   tls_reqcert=allow
> > olcSyncrepl: {1}rid=002
> >   provider=ldaps://<provider1>:1636/
> >   searchbase="dc=example,dc=com"
> >   type=refreshAndPersist
> >   retry="60 +"
> >   timeout=1
> >   bindmethod=simple
> >   binddn="uid=replication,ou=SysAccounts,dc=example,dc=com"
> >   credentials=<credentials>
> >   tls_reqcert=allow
> >
> > The "uid=replication,ou=SysAccounts,dc=example,dc=com" DN has full
> > read-only permissions for the entire "dc=example,dc=com" tree.
> >
> > Any idea on what might be my issue here?
> >
> > Thank you,
> > Mircea
> > --
> > Mircea Baciu | Senior Unix Systems Administrator
> > Simmons University | 300 The Fenway | Boston, MA 02115 | 617-521-2194
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
>
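
For reference, a minimal sketch of raising the log level at runtime, as suggested in the quoted reply. It is not taken from the thread. It assumes the servers use the cn=config backend (the olcSyncrepl attributes in the quoted config suggest they do) and that the identity running ldapmodify is allowed to write to cn=config. The keywords shown are only a starting point; TLS-layer messages may need a broader and much noisier level, so check slapd-config(5) for the version in use.

```ldif
# Sketch only: change slapd's log level without a restart.
# Assumption: cn=config is in use and writable by the bind identity.
# The keyword set is an illustrative choice, not a recommendation
# from the thread.
dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: stats sync conns
```

Saved to a file (the name loglevel.ldif is hypothetical), this could be applied on a typical RHEL/CentOS install with something like "ldapmodify -Y EXTERNAL -H ldapi:/// -f loglevel.ldif", and reverted the same way once the failure has been captured.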
