Ok, that's embarrassing. I forgot the last couple of lines of each of the slapd.confs. Just pretend each of the four ends with the following lines, after all the syncrepl rids have been configured:
mirrormode TRUE
overlay syncprov
syncprov-checkpoint 50 10
syncprov-sessionlog 100

On Thu, Mar 31, 2011 at 9:06 PM, Mark <[email protected]> wrote:
> I've been testing a 4-way multi-master setup using OpenLDAP 2.4.25 and I'm
> having some sporadic problems with it that I'm having difficulty
> diagnosing.
>
> I have four identical RHEL 4.9 machines on the same switch (NTP synchronized
> to the same stratum 2 servers):
> dual-core Xeon 5110 1.60GHz
> 8GB RAM
> 100Mb full-duplex NIC
> OpenLDAP 2.4.25, BDB 4.8.30, OpenSSL 1.0.0d, Cyrus SASL 2.1.23 (not using
> TLS/SSL at this time)
>
> I start the slapds with '-d conns,sync' and then begin the test. I ldapadd
> 1000 DNs to one of the servers. After all the syncing has stopped, I compare
> the slapd contents against each other, looking for differences. Occasionally
> as many as a couple hundred DNs are missing from one or more of the
> instances. When that happens, I've noticed that the mmaster with fewer DNs
> has lost its consumer connection to one of its mmaster providers (confirmed
> using lsof and netstat) and never attempts to reconnect, but the provider
> still shows the connection (using lsof and netstat). When the consumer gets
> into this state, I can connect to its cn=config and cn=monitor backends (and
> browse them), but when I try to connect to its multi-master'd backend the
> connection attempt just hangs. It's almost as if the connect succeeds but the
> client is waiting for a response from the server (and never gets it). Also,
> the consumer slapd will not respond to a 'kill -TERM' at this point and must
> be 'kill -KILL'd. The same thing sometimes occurs when I delete the entire
> tree.
>
> I've been trying to catch logging information that might help, but so far
> nothing's jumping out at me.
> While I continue trying to reproduce the problem and parse
> through logfiles, maybe someone can look at my slapd.confs below and see if I
> might have configured something wrong (I'm listing the original slapd.conf
> files below, but I've used slaptest to convert them to
> slapd.d/cn=config.ldif format):
>
> HOST1 slapd.conf:
>
> include /tmp/openldap/multi-master/etc/schema/core.schema
> include /tmp/openldap/multi-master/etc/schema/cosine.schema
> include /tmp/openldap/multi-master/etc/schema/nis.schema
> argsfile /tmp/openldap/multi-master/var/run/slapd.args
> pidfile /tmp/openldap/multi-master/var/run/slapd.pid
> threads 16
> idletimeout 0
> writetimeout 5
> reverse-lookup off
> timelimit time.soft=30 time.hard=300
> sizelimit size.soft=500 size.hard=1000
> password-hash {SSHA}
> loglevel stats sync
> serverid 001
> modulepath /tmp/openldap/multi-master/libexec
> moduleload back_monitor.la
> moduleload back_hdb.la
> moduleload syncprov.la
>
> database config
> rootdn cn=manager,cn=config
> rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm
>
> database monitor
> rootdn cn=manager,cn=monitor
> rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R
>
> database hdb
> suffix dc=example,dc=com
> rootdn cn=manager,dc=example,dc=com
> rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
> directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
> cachesize 30000
> cachefree 5
> checkpoint 128 15
> dncachesize 25000
> idlcachesize 100000
> index objectClass eq
> index entryCSN eq
> index entryUUID eq
>
> syncrepl rid=001
>     provider=ldap://host2:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
> syncrepl rid=002
>     provider=ldap://host3:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
> syncrepl rid=003
>     provider=ldap://host4:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
>
> HOST2 slapd.conf:
>
> include /tmp/openldap/multi-master/etc/schema/core.schema
> include /tmp/openldap/multi-master/etc/schema/cosine.schema
> include /tmp/openldap/multi-master/etc/schema/nis.schema
> argsfile /tmp/openldap/multi-master/var/run/slapd.args
> pidfile /tmp/openldap/multi-master/var/run/slapd.pid
> threads 16
> idletimeout 0
> writetimeout 5
> reverse-lookup off
> timelimit time.soft=30 time.hard=300
> sizelimit size.soft=500 size.hard=1000
> password-hash {SSHA}
> loglevel stats sync
> serverid 002
> modulepath /tmp/openldap/multi-master/libexec
> moduleload back_monitor.la
> moduleload back_hdb.la
> moduleload syncprov.la
>
> database config
> rootdn cn=manager,cn=config
> rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm
>
> database monitor
> rootdn cn=manager,cn=monitor
> rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R
>
> database hdb
> suffix dc=example,dc=com
> rootdn cn=manager,dc=example,dc=com
> rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
> directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
> cachesize 30000
> cachefree 5
> checkpoint 128 15
> dncachesize 25000
> idlcachesize 100000
> index objectClass eq
> index entryCSN eq
> index entryUUID eq
>
> syncrepl rid=001
>     provider=ldap://host1:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
> syncrepl rid=002
>     provider=ldap://host3:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
> syncrepl rid=003
>     provider=ldap://host4:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
>
> HOST3 slapd.conf:
>
> include /tmp/openldap/multi-master/etc/schema/core.schema
> include /tmp/openldap/multi-master/etc/schema/cosine.schema
> include /tmp/openldap/multi-master/etc/schema/nis.schema
> argsfile /tmp/openldap/multi-master/var/run/slapd.args
> pidfile /tmp/openldap/multi-master/var/run/slapd.pid
> threads 16
> idletimeout 0
> writetimeout 5
> reverse-lookup off
> timelimit time.soft=30 time.hard=300
> sizelimit size.soft=500 size.hard=1000
> password-hash {SSHA}
> loglevel stats sync
> serverid 003
> modulepath /tmp/openldap/multi-master/libexec
> moduleload back_monitor.la
> moduleload back_hdb.la
> moduleload syncprov.la
>
> database config
> rootdn cn=manager,cn=config
> rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm
>
> database monitor
> rootdn cn=manager,cn=monitor
> rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R
>
> database hdb
> suffix dc=example,dc=com
> rootdn cn=manager,dc=example,dc=com
> rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
> directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
> cachesize 30000
> cachefree 5
> checkpoint 128 15
> dncachesize 25000
> idlcachesize 100000
> index objectClass eq
> index entryCSN eq
> index entryUUID eq
>
> syncrepl rid=001
>     provider=ldap://host1:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
> syncrepl rid=002
>     provider=ldap://host2:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
> syncrepl rid=003
>     provider=ldap://host4:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
> HOST4 slapd.conf:
>
> include /tmp/openldap/multi-master/etc/schema/core.schema
> include /tmp/openldap/multi-master/etc/schema/cosine.schema
> include /tmp/openldap/multi-master/etc/schema/nis.schema
> argsfile /tmp/openldap/multi-master/var/run/slapd.args
> pidfile /tmp/openldap/multi-master/var/run/slapd.pid
> threads 16
> idletimeout 0
> writetimeout 5
> reverse-lookup off
> timelimit time.soft=30 time.hard=300
> sizelimit size.soft=500 size.hard=1000
> password-hash {SSHA}
> loglevel stats sync
> serverid 004
> modulepath /tmp/openldap/multi-master/libexec
> moduleload back_monitor.la
> moduleload back_hdb.la
> moduleload syncprov.la
>
> database config
> rootdn cn=manager,cn=config
> rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm
>
> database monitor
> rootdn cn=manager,cn=monitor
> rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R
>
> database hdb
> suffix dc=example,dc=com
> rootdn cn=manager,dc=example,dc=com
> rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
> directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
> cachesize 30000
> cachefree 5
> checkpoint 128 15
> dncachesize 25000
> idlcachesize 100000
> index objectClass eq
> index entryCSN eq
> index entryUUID eq
>
> syncrepl rid=001
>     provider=ldap://host1:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
> syncrepl rid=002
>     provider=ldap://host2:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
> syncrepl rid=003
>     provider=ldap://host3:1389
>     type=refreshAndPersist
>     interval=00:00:05:00
>     retry="15 +"
>     searchbase="dc=example,dc=com"
>     binddn="cn=manager,dc=example,dc=com"
>     credentials="example_pass"
>     starttls=no
>     schemachecking=off
>
>
> Thank you.
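For anyone reproducing the "compare the slapd contents against each other" step from the quoted test, here is a minimal Python sketch, not the method used above. It assumes each server's tree has already been exported to an LDIF file (e.g. with slapcat, or with ldapsearch bound as the rootdn so the configured size limits don't truncate the result); the file names are hypothetical placeholders. DNs are only case-folded, not fully normalized per RFC 4514, so DNs differing only in spacing around commas would show up as different.

```python
"""Report DNs missing from each replica, given one LDIF export per server."""
from __future__ import annotations

import base64
import sys
from pathlib import Path


def read_dns(ldif_text: str) -> set[str]:
    """Return the (lowercased) DNs found in an LDIF export."""
    # Unfold continuation lines first: per RFC 2849, a line that starts
    # with a single space continues the previous line.
    unfolded: list[str] = []
    for line in ldif_text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    dns: set[str] = set()
    for line in unfolded:
        if line.startswith("dn::"):
            # Base64-encoded DN (used for non-ASCII values).
            value = base64.b64decode(line[4:].strip()).decode("utf-8")
        elif line.startswith("dn:"):
            value = line[3:].strip()
        else:
            continue
        # Crude normalization: case-fold only, no RFC 4514 handling.
        dns.add(value.lower())
    return dns


def missing_by_host(exports: dict[str, str]) -> dict[str, set[str]]:
    """Map each host to the DNs seen on some other host but not on it."""
    per_host = {host: read_dns(text) for host, text in exports.items()}
    everything = set().union(*per_host.values())
    return {host: everything - dns for host, dns in per_host.items()}


if __name__ == "__main__":
    # Hypothetical usage (file names are placeholders):
    #   python compare_dns.py host1.ldif host2.ldif host3.ldif host4.ldif
    exports = {name: Path(name).read_text() for name in sys.argv[1:]}
    for host, missing in missing_by_host(exports).items():
        print(f"{host}: {len(missing)} DNs missing")
        for dn in sorted(missing):
            print(f"    {dn}")
```

Each server is compared against the union of all four rather than pairwise, so every replica gets one list of what it lacks, which matches the "missing from one or more of the instances" symptom.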
