Thank you both, Howard and Leonid. Yes, you're right: it happened the other way around. The modification was made on the second server and propagated to the first one. However, I don't know why the change was then sent back to the second server; the SIDs are OK as far as I know.
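For reference, the originating server can be read straight from the CSN in the log. Below is a minimal Python sketch, assuming the usual `timestamp#count#sid#mod` CSN layout with hexadecimal count, sid and mod fields; `parse_csn` and the `CSN` tuple are hypothetical helpers for illustration, not part of OpenLDAP.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime  # time of the change, in UTC
    count: int           # change counter within the same timestamp
    sid: int             # server ID of the server that originated the change
    mod: int             # modification number


def parse_csn(text: str) -> CSN:
    """Split a CSN string into its four '#'-separated fields (assumed layout)."""
    stamp, count, sid, mod = text.strip().split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return CSN(when, int(count, 16), int(sid, 16), int(mod, 16))


# CSN taken from the syncrepl log lines quoted in this thread.
csn = parse_csn("20150421123855.643239Z#000000#002#000000")
print(f"change originated on server ID {csn.sid:03x} at {csn.timestamp.isoformat()}")
```

The server ID field is 002 here, which matches the change having been made on the second server.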
However, as Leonid mentioned, the two servers weren't synchronized correctly anyway. It turns out that we upgraded to 2.4.39-6 yesterday (as I stated in my first mail). Previously we were using 2.4.39-3, which seemed to work fine. We also noticed that 2.4.39-6 caused some additional issues (such as client and syncrepl sockets dying for no apparent reason), so today we downgraded to 2.4.39-3 and everything seems to work fine again.

We had a look at the changelog from 2.4.39-3 to 2.4.39-6, and no change seems to be explicitly syncrepl-related; the changes are instead related to LDAPS (strange, as we use plain LDAP rather than LDAPS for syncrepl).

Anyway, we'll stay on version 2.4.39-3 as long as it works well.

Thanks.

Regards.

2015-04-21 23:53 GMT+01:00 Леонид Юрьев <[email protected]>:

> Hi Nicolás,
>
> 1) If the contextCSN values differ between the servers, then they are
> still not synchronized (or there is a glitch).
> http://www.openldap.org/lists/openldap-technical/201108/threads.html#00001
>
> 2) Replication takes some time, so the contextCSN values may be equal
> only after there have been no changes for some time.
>
> 3) Make sure that the time is synchronized on the servers (e.g. by using
> ntpdate).
>
> 4) Unfortunately, all current releases (including 2.4.39 and 2.4.40) have
> quite a few bugs in the replication code.
> For example, with ITS#8081 (
> http://www.openldap.org/its/index.cgi/Software%20Bugs?id=8081) you could
> get a segfault, and also lose (as if undone) some changes during replication.
>
> 5) We made a fork of the OpenLDAP project for our use case (high-load,
> telco-aware multi-master); it is called ReOpenLDAP.
> If you decide to build slapd from sources, I recommend using our ReOpenLDAP
> ;)
>
> New features are not yet documented in the English man pages, but you can
> translate them with Google:
> https://github.com/ReOpen/ReOpenLDAP/releases/tag/ReOpenLDAP-2.4.41-rc
>
> https://github.com/ReOpen/ReOpenLDAP/commit/4fc4bc18dd4bd80909aa80700c5c19b0816ca120
>
> https://github.com/ReOpen/ReOpenLDAP/commit/95808b156ee36a886523b7096a75d5099e9b44fc
>
> https://github.com/ReOpen/ReOpenLDAP/commit/1c94bc17ec285388e8a8299399ed537754fc3028
>
> Leonid.
>
> 2015-04-21 16:01 GMT+03:00 Nicolás Kovac Neumann <[email protected]>:
>
>> Hi,
>>
>> We're currently using N-way multimaster replication on two servers for
>> our LDAP directory, both for the config and the hdb databases. It's mostly
>> working fine, but we've run into a possible issue with the syncrepl engine
>> which we would like to shed light on. We're using CentOS 7 with
>> openldap-servers version 2.4.39-6.
>>
>> We made an update on one of the entries (server1, in this case), so
>> server2 replicated that change (as shown below in the log):
>>
>> ==> server1/ldap.log <==
>> Apr 21 13:38:55 server1 slapd[1835]: do_syncrep2: rid=002
>> cookie=rid=002,sid=002,csn=20150421123855.643239Z#000000#002#000000
>> Apr 21 13:38:55 server1 slapd[1835]: syncrepl_message_to_entry:
>> rid=002 DN: uid=user1,cn=subtree,dc=example,dc=org, UUID:
>> 18a2436c-73ce-1030-95dd-b52dc05ced13
>> Apr 21 13:38:55 server1 slapd[1835]: syncrepl_entry: rid=002
>> LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY)
>> Apr 21 13:38:55 server1 slapd[1835]: syncrepl_entry: rid=002
>> be_search (0)
>> Apr 21 13:38:55 server1 slapd[1835]: syncrepl_entry: rid=002
>> uid=user1,cn=subtree,dc=example,dc=org
>> Apr 21 13:38:55 server1 slapd[1835]: slap_queue_csn: queing
>> 0x7ff8f42789f0 20150421123855.643239Z#000000#002#000000
>> Apr 21 13:38:55 server1 slapd[1835]: slap_graduate_commit_csn:
>> removing 0x7ff8f435e770 20150421123855.643239Z#000000#002#000000
>> Apr 21 13:38:55 server1 slapd[1835]: syncrepl_entry: rid=002
>> be_modify uid=user1,cn=subtree,dc=example,dc=org (0)
>> Apr 21 13:38:55 server1 slapd[1835]: syncprov_sendresp:
>> cookie=rid=001,sid=001,csn=20150421123855.643239Z#000000#002#000000
>> Apr 21 13:38:55 server1 slapd[1835]: slap_queue_csn: queing
>> 0x7ff8f42789f0 20150421123855.643239Z#000000#002#000000
>> Apr 21 13:38:55 server1 slapd[1835]: slap_graduate_commit_csn:
>> removing 0x7ff8f41b7b90 20150421123855.643239Z#000000#002#000000
>>
>> ==> server2/ldap.log <==
>> Apr 21 13:38:55 server2 slapd[1948]: slap_queue_csn: queing
>> 0x7f897affb220 20150421123855.643239Z#000000#002#000000
>> Apr 21 13:38:55 server2 slapd[1948]: syncprov_sendresp: to=001,
>> cookie=rid=002,sid=002,csn=20150421123855.643239Z#000000#002#000000
>> Apr 21 13:38:55 server2 slapd[1948]: slap_graduate_commit_csn:
>> removing 0x7f89307f42a0 20150421123855.643239Z#000000#002#000000
>>
>> Nothing strange so far. However, if we query the contextCSN, it
>> differs between the two servers.
>>
>> For server1, we have:
>>
>> contextCSN: 20150421123523.281736Z#000000#001#000000
>> contextCSN: 20150421123417.889477Z#000000#002#000000
>>
>> For server2, the value for server ID 001 differs:
>>
>> contextCSN: 20150421115324.003502Z#000000#001#000000
>> contextCSN: 20150421123417.889477Z#000000#002#000000
>>
>> However, the entryCSN of the entry seems to replicate correctly to both
>> servers:
>>
>> entryCSN: 20150421123417.889477Z#000000#002#000000
>>
>> Is this the expected behavior? Shouldn't both contextCSN values match on
>> both servers?
>>
>> Thanks!
>>
>> Regards,
>>
>> Nicolás
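
The contextCSN comparison from the original question can be sketched in a few lines of Python. This is only an illustration under two assumptions: the server ID is the third `#`-separated field of each value, and the fixed-width format makes plain string comparison agree with chronological order. `by_sid` and `differing_sids` are hypothetical helper names, and the values are the ones quoted above.

```python
def by_sid(values):
    """Map each server ID (third '#'-separated field) to its contextCSN value."""
    return {value.split("#")[2]: value for value in values}


def differing_sids(first, second):
    """Return {sid: (csn_on_first, csn_on_second)} for every SID whose values differ."""
    a, b = by_sid(first), by_sid(second)
    return {
        sid: (a.get(sid), b.get(sid))
        for sid in sorted(a.keys() | b.keys())
        if a.get(sid) != b.get(sid)
    }


# contextCSN values as reported for each server in the original question.
server1 = [
    "20150421123523.281736Z#000000#001#000000",
    "20150421123417.889477Z#000000#002#000000",
]
server2 = [
    "20150421115324.003502Z#000000#001#000000",
    "20150421123417.889477Z#000000#002#000000",
]

diff = differing_sids(server1, server2)
for sid, (on_server1, on_server2) in diff.items():
    # Assumes both servers report a value for this SID, as they do here.
    behind = "server2" if on_server2 < on_server1 else "server1"
    print(f"SID {sid}: {behind} is behind ({on_server1} vs {on_server2})")
```

With these values only SID 001 differs, and server2 holds the older value, which is consistent with server2 not having received all of server1's changes.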
