Hi!

An update on my experience of delta-syncrepl with OpenLDAP 2.5 as provided by 
SLES15 SP6:
In one configuration, I decided to test replication by stopping the server, 
clearing the changelog and the DIT, then starting it again.
Unfortunately the providing server core dumped then. After restarting, it 
continued.
So I checked whether both servers were in sync, and they were not.
So I cleared the changelog for the DIT on the crashed server (while stopped), 
then started it again.
Expecting sync to happen, I saw nothing. Then I realized that the other server 
had core dumped. 8-(
Somewhere in between these experiments an accesslog ran out of space, so I 
deleted that, too.
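
One rough way to check whether two servers are in sync is to compare the
contextCSN values of the suffix entry on each server, per server ID. The sketch
below is only an illustration: it assumes the values were already fetched (for
example with a base-scope search for contextCSN) and that they use the
fixed-width CSN layout shown elsewhere in this thread. The function names are
made up for this example.

```python
def parse_csn(csn):
    """Split a CSN 'time#count#sid#mod' into its four fields."""
    timestamp, count, sid, mod = csn.split("#")
    return timestamp, count, sid, mod


def latest_by_sid(csns):
    """Map each server ID (SID) to the newest CSN seen for it."""
    latest = {}
    for csn in csns:
        sid = parse_csn(csn)[2]
        # Assumption: all fields are fixed-width, so plain string
        # comparison orders CSNs of the same SID chronologically.
        if sid not in latest or csn > latest[sid]:
            latest[sid] = csn
    return latest


def csn_differences(server_a, server_b):
    """Return {sid: (csn_a, csn_b)} for every SID where the servers disagree."""
    a, b = latest_by_sid(server_a), latest_by_sid(server_b)
    return {
        sid: (a.get(sid), b.get(sid))
        for sid in sorted(set(a) | set(b))
        if a.get(sid) != b.get(sid)
    }
```

An empty result only means the contextCSN values match; it does not by itself
prove that the contents of both databases are identical.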

With the situation of two different servers after a core dump, I'm wondering 
what happens when an MDB changelog becomes full:
Obviously new entries cannot be added (while the sync from remote virtually 
continues), but can an existing record be updated?
Specifically I wonder about "updating mincsn": What happens to sync if the 
actual changelog entry could not be written, but mincsn is being updated? Will 
it cause inconsistency in the changelog?

Sorry, I still don't understand all the details of delta-syncrepl. As it is 
now, I don't feel it's ready for production, specifically if an (outdated) 
consumer can cause the provider to dump core (as it seems).

I know that I should rather try version 2.6, but SLES support is only available 
for "their version".

Kind regards,
Ulrich Windl

> -----Original Message-----
> From: Windl, Ulrich <u.wi...@ukr.de>
> Sent: Monday, June 2, 2025 12:27 PM
> To: Ondřej Kuzník <on...@mistotebe.net>
> Cc: openldap-technical@openldap.org
> Subject: [EXT] RE: Re: delta-syncrepl experience with OpenLDAP 2.5 (from
> SLES15)
> 
> See inline comments (as well as Outlook allows)
> 
> Kind regards,
> Ulrich Windl
> 
> > -----Original Message-----
> > From: Ondřej Kuzník <on...@mistotebe.net>
> > Sent: Wednesday, May 28, 2025 11:32 AM
> > To: Windl, Ulrich <u.wi...@ukr.de>
> > Cc: openldap-technical@openldap.org
> > Subject: [EXT] Re: delta-syncrepl experience with OpenLDAP 2.5 (from
> > SLES15)
> >
> > On Tue, May 27, 2025 at 12:59:42PM +0000, Windl, Ulrich wrote:
> > > Hi!
> > >
> > > When upgrading from OpenLDAP 2.4 (SLES12) to OpenLDAP 2.5 (SLES15) I
> > > gave delta-syncrepl a try. That was a hard way in several aspects.
> > > Meanwhile I think I understand most of the details (docs could be much
> > > better IMHO).
> >
> > Hi Ulrich,
> > I am sorry that you have not had a great experience, please suggest
> > which parts of documentation (manpages, admin guide) you feel should be
> > adjusted.
> [Windl, Ulrich]
> In slapd-config(5):
> The olcSyncrepl description is quite long, and lists all the keywords.
> I think it should be preceded by a description (or a reference to one)
> explaining the replication variants (and details of how those work).
> 
> In the Admin Guide (18.2.1. Delta-syncrepl replication) disadvantages of
> LDAP Sync are explained and the advantages of delta-syncrepl, but the
> relationship between local database and the corresponding changelog is not
> explained well IMHO.
> For example "The replication consumer checks the changelog for the changes
> it needs":
> So the consumer queries the local changelog to see which changes to
> request from the provider (which, in turn, consults its local changelog to
> provide the changes)?
> If it works that way, can you use delta-syncrepl one-way without having a
> changelog on the consumer?
> I always felt it's important to keep the provider's database and changelog in
> sync for delta-syncrepl to work (well actually I thought the provider's
> changelog should sync from a refreshed database automatically), but it
> seems I have to delete the consumer's changelog database also after
> reloading the provider's content.
> 
> Maybe the admin guide should spend a few words on this...
> 
> Then "18.3.2.1. Delta-syncrepl Provider configuration
> 
> Setting up delta-syncrepl requires configuration changes on both the
> provider and replica servers:":
> 
> I think it would be better to summarize the changes needed before showing
> the example; otherwise the user has to guess from the example.
> Likewise for "18.3.2.2. Delta-syncrepl Consumer configuration".
> Also for "Note: An accesslog database is unique to a given provider.":
> It's not quite clear whether one changelog can be used for multiple
> databases on the provider (or whether each database needs a separate
> changelog).
> (The same applies to the scope of RIDs: may different databases on a consumer
> have the same RIDs, or must they be different?)
> 
> >
> > > Where delta-syncrepl has big problems is when sync has been set up,
> > > but one database is reloaded and some UUIDs are newly created for
> > > entries that exist on the other server(s).
> > > Somehow slapd detects that problem and claims that a “content sync” is
> > > required, but after some time it seems to start a refresh anyway.
> >
> > It looks like your ACLs are not as documented or you have chosen to
> > reload a database *not* from a slapcat preserving some information
> > (entryCSNs, ...) and not preserving other (entryUUIDs)? The required
> [Windl, Ulrich]
> Well I think the problem is related to schema updates that I implemented like
> this in LDIF:
> 
> dn: cn=schema,cn=config
> objectClass: olcSchemaConfig
> cn: schema
> structuralObjectClass: olcSchemaConfig
> entryUUID: db3f59a6-7c0e-1032-81c5-d54356bd918f
> creatorsName: cn=config
> createTimestamp: 20130708113956Z
> entryCSN: 20250313000000.000000Z#000000#005#000000
> modifiersName: cn=config
> modifyTimestamp: 20250313000000Z
> 
> include: file:///etc/openldap/schema/core.ldif
> 
> include: file:///etc/openldap/schema/cosine.ldif
> 
> include: file:///etc/openldap/schema/inetorgperson.ldif
> 
> include: file:///etc/openldap/schema/rfc2307bis.ldif
> 
> include: file:///etc/openldap/schema/yast.ldif
> 
> include: file:///etc/openldap/schema/sudo.ldif
> 
> dn: olcDatabase={-1}frontend,cn=config
> ...
> 
> On import a new UUID and CSN will be created for each schema, but the
> consumer has a different UUID/CSN for each.
> I don't really know how to handle that; maybe the schema LDIFs actually
> should contain a UUID and a CSN.
> I was thinking about generating those from either the file's modification
> time, or from the content revision (even harder to get).
> 
> > ACLs have to give the replication identity *unrestricted* read access to
> > both the replicated DB and its accesslog, anything else will lead to
> > deltasync replication failing in various not always easy to spot ways.
> [Windl, Ulrich]
> 
> I think I did that. See above.
> 
> >
> > Replication cannot figure this out for you because its own state is now
> > inconsistent. Either start from scratch or use a slapcat+slapadd for the
> > database. If you have actually done what I'm suggesting here, please
> > describe how you got into this situation because that would be a bug.
> >
> > > When I did the content load on the other server, slapd quit with a
> > > core dump. Unfortunately I had quite a lot of core dumps during my
> > > testing.
> > > So I feel delta-syncrepl is not as solid as it should be (in the
> > > version provided with SLES15 SP6).
> >
> > Yes, replication relies on keeping its own state that you interfere with
> > at your own peril, potentially triggering temporary or even permanent
> > desyncs. However if you encounter a crash, I will ask you again that you
> > log a bug with steps to reproduce and/or a full backtrace with the
> > necessary symbols available. And any logs you can provide, if you need
> > to redact confidential information that is fine. We cannot fix bugs we
> > are not aware of except by accident.
> [Windl, Ulrich]
> 
> I understand that, but unfortunately I'm not using one of your official
> versions, so SUSE has to deal with their own patch sets, I'm afraid.
> 
> >
> > > May 27 13:43:35 v06 systemd-coredump[27242]: [🡕] Process 27194 (slapd)
> > > of user 76 dumped core.
> > >
> > > Stack trace of thread 27199:
> > > #0  0x00007f6b34ca8dfc __pthread_kill_implementation (libc.so.6 + 0xa8dfc)
> > > #1  0x00007f6b34c57842 raise (libc.so.6 + 0x57842)
> > > #2  0x00007f6b34c3f5cf abort (libc.so.6 + 0x3f5cf)
> > > #3  0x00007f6b34c3f4e7 __assert_fail_base.cold (libc.so.6 + 0x3f4e7)
> > > #4  0x00007f6b34c4fb32 __assert_fail (libc.so.6 + 0x4fb32)
> > > #5  0x00007f6b34787258 n/a (syncprov.so + 0xc258)
> > > #6  0x000055765d7e04f3 overlay_op_walk (slapd + 0xb74f3)
> > > #7  0x000055765d7e06be n/a (slapd + 0xb76be)
> > > #8  0x000055765d76ee54 fe_op_search (slapd + 0x45e54)
> > > #9  0x000055765d76e726 do_search (slapd + 0x45726)
> > > #10 0x000055765d76c18f n/a (slapd + 0x4318f)
> > > #11 0x000055765d76c98d n/a (slapd + 0x4398d)
> > > #12 0x00007f6b34ff7da0 n/a (libldap-2.5.releng.so.0 + 0x48da0)
> > > #13 0x00007f6b34ca6f6c start_thread (libc.so.6 + 0xa6f6c)
> > > #14 0x00007f6b34d2e338 __clone3 (libc.so.6 + 0x12e338)
> >
> > This backtrace is not very useful, I suggest you not strip the binaries
> > or make sure you have the relevant debuginfo packages in place and have
> > systemd-coredump store the core file[0] so you can actually examine it
> > after the fact with gdb.
> [Windl, Ulrich]
> 
> In SLES, debug information is shipped in separate "debuginfo" packages, and I
> did not take the time to find out how to actually use them
> (well, even some supporters at SUSE seem not to know; in my understanding
> they could use the core dump together with the binary and debuginfo to get
> more useful info from the dump, but that's a different topic).
> 
> >
> > [0]. https://systemd.io/COREDUMP/
> >
> > Thanks,
> >
> > --
> > Ondřej Kuzník
> > Senior Software Engineer
> > Symas Corporation                       http://www.symas.com
> > Packaged, certified, and supported LDAP solutions powered by OpenLDAP
