Hi!

An update on my experience of delta-syncrepl with OpenLDAP 2.5 as provided by SLES15 SP6:
In one configuration, I decided to test replication by stopping the server, clearing the changelog and the DIT, then starting it again. Unfortunately the providing server core dumped then. After restarting, it continued. So I checked whether both servers were in sync, and they were not. So I cleared the changelog for the DIT on the crashed server (while it was stopped, then started it again). Expecting sync to happen, I saw nothing. Then I realized that the other server had core dumped. 8-(
Somewhere in between these experiments an accesslog ran out of space, so I deleted that, too.
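As an aside on the sync check mentioned above: one common way to compare two servers is to read the contextCSN values from the suffix entry on each (for example with a base-scope ldapsearch requesting contextCSN) and compare them per server ID. The Python sketch below only does the comparison step. The function names are made up for illustration, the sample CSNs are hypothetical, and it assumes the usual CSN layout timestamp#count#sid#mod with hexadecimal count, sid and mod fields.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class CSN(NamedTuple):
    """One change sequence number, split into its four '#'-separated fields."""
    timestamp: str  # fixed-width UTC time, e.g. "20250313000000.000000Z"
    count: int      # change counter within the same timestamp
    sid: int        # server ID of the originating server
    mod: int        # modification counter


def parse_csn(text: str) -> CSN:
    """Parse 'timestamp#count#sid#mod'; the last three fields are hexadecimal."""
    timestamp, count, sid, mod = text.strip().split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def newest_per_sid(csns: Iterable[str]) -> Dict[int, CSN]:
    """Keep only the newest CSN seen for each server ID."""
    newest: Dict[int, CSN] = {}
    for raw in csns:
        csn = parse_csn(raw)
        # Tuple comparison is valid because the timestamp field is fixed-width.
        if csn.sid not in newest or csn > newest[csn.sid]:
            newest[csn.sid] = csn
    return newest


def sids_out_of_sync(
    a: Iterable[str], b: Iterable[str]
) -> Dict[int, Tuple[Optional[CSN], Optional[CSN]]]:
    """Return the server IDs whose newest CSN differs between the two sets."""
    left, right = newest_per_sid(a), newest_per_sid(b)
    return {
        sid: (left.get(sid), right.get(sid))
        for sid in sorted(set(left) | set(right))
        if left.get(sid) != right.get(sid)
    }


if __name__ == "__main__":
    # Hypothetical contextCSN values, not taken from the servers discussed here.
    server_a = ["20250527114335.000000Z#000000#001#000000",
                "20250313000000.000000Z#000000#005#000000"]
    server_b = ["20250527100000.000000Z#000000#001#000000",
                "20250313000000.000000Z#000000#005#000000"]
    for sid, (csn_a, csn_b) in sids_out_of_sync(server_a, server_b).items():
        print(f"SID {sid:03x} differs: {csn_a} vs {csn_b}")
```

An empty result only says that the contextCSN sets match. It does not prove that every entry is identical, so it is a first check rather than a full comparison.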
Given the situation of two differing servers after a core dump, I'm wondering what happens when an MDB changelog becomes full: Obviously new entries cannot be added (while the sync from the remote server virtually continues), but can an existing record be updated?
Specifically I wonder about "updating mincsn": What happens to sync if the actual changelog entry could not be written, but mincsn is still being updated? Will it cause inconsistency in the changelog?

Sorry, I still don't understand all the details of delta-syncrepl.
As it is now, I don't feel it's ready for production, specifically if a(n outdated) consumer can cause the provider to dump core (as it seems).
I know that I should rather try version 2.6, but SLES support is only available for "their version".

Kind regards,
Ulrich Windl

> -----Original Message-----
> From: Windl, Ulrich <u.wi...@ukr.de>
> Sent: Monday, June 2, 2025 12:27 PM
> To: Ondřej Kuzník <on...@mistotebe.net>
> Cc: openldap-technical@openldap.org
> Subject: [EXT] RE: Re: delta-syncrepl experience with OpenLDAP 2.5 (from SLES15)
>
> See inline comments (as well as Outlook allows)
>
> Kind regards,
> Ulrich Windl
>
> > -----Original Message-----
> > From: Ondřej Kuzník <on...@mistotebe.net>
> > Sent: Wednesday, May 28, 2025 11:32 AM
> > To: Windl, Ulrich <u.wi...@ukr.de>
> > Cc: openldap-technical@openldap.org
> > Subject: [EXT] Re: delta-syncrepl experience with OpenLDAP 2.5 (from SLES15)
> >
> > On Tue, May 27, 2025 at 12:59:42PM +0000, Windl, Ulrich wrote:
> > > Hi!
> > >
> > > When upgrading from OpenLDAP 2.4 (SLES12) to OpenLDAP 2.5 (SLES15) I
> > > gave delta-syncrepl a try. That was hard going in several respects.
> > > Meanwhile I think I understand most of the details (docs could be much
> > > better IMHO).
> >
> > Hi Ulrich,
> > I am sorry that you have not had a great experience, please suggest
> > which parts of documentation (manpages, admin guide) you feel should be
> > adjusted.
> [Windl, Ulrich]
> In slapd-config(5):
> The olcSyncrepl description is quite long, and lists all the keywords.
> I think it should be preceded by a description (or a reference to one)
> explaining the replication variants (and details of how those work).
>
> In the Admin Guide (18.2.1. Delta-syncrepl replication) the disadvantages of
> LDAP Sync and the advantages of delta-syncrepl are explained, but the
> relationship between the local database and the corresponding changelog is
> not explained well IMHO.
> For example "The replication consumer checks the changelog for the changes
> it needs":
> So the consumer queries the local changelog to see which changes to
> request from the provider (which, in turn, consults its local changelog to
> provide the changes)?
> If it works that way, can you use delta-syncrepl one-way without having a
> changelog on the consumer?
> I always felt it's important to keep the provider's database and changelog in
> sync for delta-syncrepl to work (well, actually I thought the provider's
> changelog should sync from a refreshed database automatically), but it
> seems I also have to delete the consumer's changelog database after
> reloading the provider's content.
>
> Maybe the admin guide should say a few words on this...
>
> Then "18.3.2.1. Delta-syncrepl Provider configuration
>
> Setting up delta-syncrepl requires configuration changes on both the
> provider and replica servers:":
>
> I think it would be better to summarize the changes needed before showing
> the example; otherwise the user has to guess from the example.
> Likewise for "18.3.2.2. Delta-syncrepl Consumer configuration".
> Also for "Note: An accesslog database is unique to a given provider.":
> It's not quite clear whether one changelog can be used for multiple
> databases on the provider (or whether each database needs a separate
> changelog).
> (The same applies to the scope of RIDs: May different databases on a consumer
> have the same RIDs, or must they be different?)
>
> >
> > > Where delta-syncrepl has big problems is when sync has been set up,
> > > but one database is reloaded and some UUIDs are newly created for
> > > entries that exist on the other server(s).
> > > Somehow slapd detects that problem and claims that a “content sync” is
> > > required, but after some time it seems to start a refresh anyway.
> >
> > It looks like your ACLs are not as documented or you have chosen to
> > reload a database *not* from a slapcat preserving some information
> > (entryCSNs, ...) and not preserving other (entryUUIDs)? The required
> [Windl, Ulrich]
> Well, I think the problem is related to schema updates that I implemented like
> this in LDIF:
>
> dn: cn=schema,cn=config
> objectClass: olcSchemaConfig
> cn: schema
> structuralObjectClass: olcSchemaConfig
> entryUUID: db3f59a6-7c0e-1032-81c5-d54356bd918f
> creatorsName: cn=config
> createTimestamp: 20130708113956Z
> entryCSN: 20250313000000.000000Z#000000#005#000000
> modifiersName: cn=config
> modifyTimestamp: 20250313000000Z
>
> include: file:///etc/openldap/schema/core.ldif
>
> include: file:///etc/openldap/schema/cosine.ldif
>
> include: file:///etc/openldap/schema/inetorgperson.ldif
>
> include: file:///etc/openldap/schema/rfc2307bis.ldif
>
> include: file:///etc/openldap/schema/yast.ldif
>
> include: file:///etc/openldap/schema/sudo.ldif
>
> dn: olcDatabase={-1}frontend,cn=config
> ...
>
> On import a new UUID and CSN will be created for each schema, but the
> consumer has a different UUID/CSN for each.
> I don't really know how to handle that; maybe the schema LDIFs actually
> should contain a UUID and a CSN.
> I was thinking about whether to generate those from either the file's
> modification time, or from the content revision (even harder to get).
>
> > ACLs have to give the replication identity *unrestricted* read access to
> > both the replicated DB and its accesslog, anything else will lead to
> > deltasync replication failing in various not always easy to spot ways.
> [Windl, Ulrich]
>
> I think I did that. See above.
>
> >
> > Replication cannot figure this out for you because its own state is now
> > inconsistent. Either start from scratch or use a slapcat+slapadd for the
> > database. If you have actually done what I'm suggesting here, please
> > describe how you got into this situation because that would be a bug.
> >
> > > When I did the content load on the other server, slapd quit with a
> > > core dump. Unfortunately I had quite a lot of core dumps during my
> > > testing.
> > > So I feel delta-syncrepl is not as solid as it should be (in the
> > > version provided with SLES15 SP6).
> >
> > Yes, replication relies on keeping its own state that you interfere with
> > at your own peril, potentially triggering temporary or even permanent
> > desyncs. However if you encounter a crash, I will ask you again that you
> > log a bug with steps to reproduce and/or a full backtrace with the
> > necessary symbols available. And any logs you can provide, if you need
> > to redact confidential information that is fine. We cannot fix bugs we
> > are not aware of except by accident.
> [Windl, Ulrich]
>
> I understand that, but unfortunately I'm not using one of your official
> versions, so SUSE has to deal with their own patch sets, I'm afraid.
>
> >
> > > May 27 13:43:35 v06 systemd-coredump[27242]: [🡕] Process 27194 (slapd)
> > > of user 76 dumped core.
> > >
> > > Stack trace of thread 27199:
> > > #0 0x00007f6b34ca8dfc __pthread_kill_implementation (libc.so.6 + 0xa8dfc)
> > > #1 0x00007f6b34c57842 raise (libc.so.6 + 0x57842)
> > > #2 0x00007f6b34c3f5cf abort (libc.so.6 + 0x3f5cf)
> > > #3 0x00007f6b34c3f4e7 __assert_fail_base.cold (libc.so.6 + 0x3f4e7)
> > > #4 0x00007f6b34c4fb32 __assert_fail (libc.so.6 + 0x4fb32)
> > > #5 0x00007f6b34787258 n/a (syncprov.so + 0xc258)
> > > #6 0x000055765d7e04f3 overlay_op_walk (slapd + 0xb74f3)
> > > #7 0x000055765d7e06be n/a (slapd + 0xb76be)
> > > #8 0x000055765d76ee54 fe_op_search (slapd + 0x45e54)
> > > #9 0x000055765d76e726 do_search (slapd + 0x45726)
> > > #10 0x000055765d76c18f n/a (slapd + 0x4318f)
> > > #11 0x000055765d76c98d n/a (slapd + 0x4398d)
> > > #12 0x00007f6b34ff7da0 n/a (libldap-2.5.releng.so.0 + 0x48da0)
> > > #13 0x00007f6b34ca6f6c start_thread (libc.so.6 + 0xa6f6c)
> > > #14 0x00007f6b34d2e338 __clone3 (libc.so.6 + 0x12e338)
> >
> > This backtrace is not very useful, I suggest you not strip the binaries
> > or make sure you have the relevant debuginfo packages in place and have
> > systemd-coredump store the core file[0] so you can actually examine it
> > after the fact with gdb.
> [Windl, Ulrich]
>
> In SLES, debug information is shipped in separate "debuginfo" packages, and I
> did not take the time to find out how to actually use them
> (well, even some supporters at SUSE seem not to know; in my understanding
> they could use the core dump together with the binary and debuginfo to get
> more useful info from the dump, but that's a different topic).
>
> >
> > [0]. https://systemd.io/COREDUMP/
> >
> > Thanks,
> >
> > --
> > Ondřej Kuzník
> > Senior Software Engineer
> > Symas Corporation http://www.symas.com
> > Packaged, certified, and supported LDAP solutions powered by OpenLDAP
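
For reference, the idea raised in the quoted thread (deriving a stable entryCSN, and possibly an entryUUID, for each schema file from its modification time) could be sketched roughly as follows in Python. This is an untested assumption about a workaround, not something OpenLDAP or SLES provides: the function names are hypothetical, the name-based UUID is my own choice (slapd itself generates time-based UUIDs), and whether slapadd accepts such values for schema pulled in via include: has not been verified. It can also only help if each file has the same path and modification time on every server.

```python
import os
import uuid
from datetime import datetime, timezone
from typing import Tuple


def csn_from_epoch(epoch_seconds: float, sid: int = 0) -> str:
    """Format a CSN string (timestamp#count#sid#mod) for a given UTC time.

    The count and mod fields are left at zero; sid is written as three
    hexadecimal digits.
    """
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{ts.strftime('%Y%m%d%H%M%S.%f')}Z#000000#{sid:03x}#000000"


def stable_ids_for_schema(path: str, sid: int = 0) -> Tuple[str, str]:
    """Return (entryUUID, entryCSN) candidates for one schema file.

    The UUID is derived from the absolute path only, so it is identical on
    every server where the file lives at the same location. The CSN is
    derived from the file's modification time.
    """
    entry_uuid = uuid.uuid5(uuid.NAMESPACE_URL, "file://" + os.path.abspath(path))
    entry_csn = csn_from_epoch(os.path.getmtime(path), sid)
    return str(entry_uuid), entry_csn
```

As a sanity check on the format, calling csn_from_epoch with the epoch value for 2025-03-13 00:00:00 UTC and sid=5 yields the same string as the entryCSN shown in the quoted cn=schema entry.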