Hi,

I’ve been working on this problem for a couple of days. The manual pages, the
admin guide, the logs and I are now best buddies, but I’m still stuck. Any help
you can offer would be much appreciated.

I’m trying to set up a 4-host MMR cluster using OpenLDAP 2.4.39 (LTB build,
running on Ubuntu 12.04). With the config below (which is the same on all
hosts), I’m seeing peculiar behavior: all of the servers repeatedly attempt a
full sync with each other, and they stay at a relatively high load as a result.
The logs (at the default level) show this over and over again:

Jul 29 10:40:55 eadrax slapd[3815]: conn=1000 op=1 SRCH 
base="dc=ccs,dc=neu,dc=edu" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 10:40:55 eadrax slapd[3815]: conn=1000 op=1 SRCH attr=* +
Jul 29 10:40:56 eadrax slapd[3815]: conn=1000 op=1 SEARCH RESULT tag=101 err=0 
nentries=16551 text=
Jul 29 10:40:56 eadrax slapd[3815]: conn=1000 op=2 SRCH 
base="dc=ccs,dc=neu,dc=edu" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 10:40:56 eadrax slapd[3815]: conn=1000 op=2 SRCH attr=* +
Jul 29 10:40:58 eadrax slapd[3815]: conn=1000 op=2 SEARCH RESULT tag=101 err=0 
nentries=16551 text=
...

If I delete the databases from a server and raise its loglevel to 16384 (sync),
I see the initial re-sync proceed as I would expect (all of the data is
replicated), but then the full sync repeats and the logs show entries like:

syncrepl_entry: rid=003 entry unchanged, ignored (...
and
dn_callback : entries have identical CSN ...

My first thought was to check the contextCSN on the servers, and indeed
something is peculiar: both ldapsearch bound as the rootDN (while slapd is
running) and slapcat (while it is stopped) show no contextCSN attribute on the
main database (there is one in cn=accesslog). I have confirmed with cn=monitor
that the main database does indeed show the syncprov and syncrepl overlays
loaded. I have raised the log level to check that the config files are being
parsed correctly (they are). I have tried different values for
syncprov-checkpoint. I have tried having just two of the four servers talk to
each other, hoping a simpler case would show what is going on, but to no avail.
There are no unusual errors in the log. At this point I don’t know what else
to try.
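
For concreteness, the contextCSN checks are roughly the following. This is only
a sketch: the function names are made up for illustration, the rootDN is
redacted as in the configs below, and the server URI and suffix are passed in
as arguments rather than hard-coded.

```shell
# Placeholder: the real rootDN is redacted, as in the config excerpts.
ROOT_DN="XXXX"

# Online check (slapd running): base-scope search on the suffix entry,
# explicitly requesting the operational attribute contextCSN.
#   $1 = server URI, $2 = database suffix
show_context_csn() {
    ldapsearch -x -LLL -H "$1" -D "$ROOT_DN" -W \
        -s base -b "$2" contextCSN
}

# Offline check (slapd stopped): dump the database and look for contextCSN.
#   $1 = database suffix
dump_context_csn() {
    slapcat -b "$1" | grep -i '^contextCSN'
}

# Example invocations (not run here):
#   show_context_csn "ldaps://{server1}.ccs.neu.edu:636/" "dc=ccs,dc=neu,dc=edu"
#   dump_context_csn "cn=accesslog"
```

Against cn=accesslog both checks return a value; against the main suffix
neither returns anything.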

Here are the relevant sections from my configs. Do you spot anything untoward 
that might be causing this behavior?

=== slapd.conf excerpt:

# {serverN} is replaced with a real name in the configs
serverId 1 ldaps://{server1}.ccs.neu.edu:636/
serverId 2 ldaps://{server2}.ccs.neu.edu:636/
serverId 3 ldaps://{server3}.ccs.neu.edu:636/
serverId 4 ldaps://{server4}.ccs.neu.edu:636/

include /usr/local/openldap/etc/openldap/slapd.conf.acl

database        mdb
suffix          "dc=ccs,dc=neu,dc=edu"
rootdn          "XXXX"


include /usr/local/openldap/etc/openldap/slapd.conf.index
include /usr/local/openldap/etc/openldap/slapd.conf.replicas

# {repluser} is replaced with a real name in the actual configs
limits dn.exact=cn={repluser},dc=ccs,dc=neu,dc=edu
    time.soft=unlimited
    time.hard=unlimited
    size.soft=unlimited
    size.hard=unlimited

overlay syncprov
syncprov-checkpoint 100 10
syncprov-reloadhint FALSE
syncprov-nopresent  FALSE

overlay accesslog

logdb "cn=accesslog"
logops writes
logpurge 07+00:00 01+00:00
logsuccess TRUE

index reqstart eq

database mdb
suffix          "cn=accesslog"
rootdn          "XXXX"

index default eq
index entryCSN,entryUUID,objectClass,reqEnd,reqResult,reqStart

# {repluser} is replaced with a real name in the actual configs
limits dn.exact=cn={repluser},dc=ccs,dc=neu,dc=edu
        time.soft=unlimited
        time.hard=unlimited
        size.soft=unlimited
        size.hard=unlimited

overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 500
syncprov-reloadhint TRUE
syncprov-nopresent  TRUE

=== slapd.conf.acl excerpt:

access to *
    by dn=cn={repluser},dc=ccs,dc=neu,dc=edu read
    by * break

=== slapd.conf.replicas excerpt:

# {serverN} is replaced with a real name in the configs
syncrepl rid=001    provider="ldaps://{server1}.ccs.neu.edu:636/"
    searchbase="dc=ccs,dc=neu,dc=edu"
    syncdata="accesslog"
    logbase="cn=accesslog"
    logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
    bindmethod="sasl"
    saslmech="EXTERNAL"
    type="refreshAndPersist"
    retry="10 +"
    timeout="1"
    keepalive="180:3:60"
    network-timeout="10"
    schemachecking="on"
syncrepl rid=002    provider="ldaps://{server2}.ccs.neu.edu:636/"
    searchbase="dc=ccs,dc=neu,dc=edu"
    syncdata="accesslog"
    logbase="cn=accesslog"
    logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
    bindmethod="sasl"
    saslmech="EXTERNAL"
    type="refreshAndPersist"
    retry="10 +"
    timeout="1"
    keepalive="180:3:60"
    network-timeout="10"
    schemachecking="on"
… 
(other 2 hosts, same format)

=== slapd.conf.index:
index cn eq,sub
index entrycsn eq
index entryuuid eq
index mail sub
index member eq
index objectclass eq
index sn eq,sub
index uid eq,sub

Thanks for any help you can offer!

   — dNb


