> -----Original Message-----
> From: Ondřej Kuzník <on...@mistotebe.net>
> Sent: Wednesday, August 6, 2025 12:42 PM
> To: Windl, Ulrich <u.wi...@ukr.de>
> Cc: openldap-technical@openldap.org
> Subject: [EXT] Re: slapd 2.5 dumping core on delta-synrepl issues
> 
> On Tue, Aug 05, 2025 at 12:50:44PM +0000, Windl, Ulrich wrote:
> > Hi!
> >
> > I have a support case from SUSE's version of slapd open, and I wonder
> > about one specific statement from support:
> >
> > A core dump is triggered by
> >
> > syncprov.c:2360:
> >
> > assert( !BER_BVISEMPTY( &oldestcsn ) && !BER_BVISEMPTY( &newestcsn ) &&
> >         ber_bvcmp( &oldestcsn, &newestcsn ) < 0 );
> >
> > Support explained: "Any of these indicates the changelog (accesslog)
> > is in a completely inconsistent or corrupted state."
> 
> Hi Ulrich,
> based on what have they concluded this? There is very little to go on in
> what you've provided here.
[Windl, Ulrich] 
Well, support had received a core dump, which you of course don't have.
As I remember it, we had two servers using bi-directional delta-syncrepl that 
also pulled updates from a third server using RefreshAndPersist (as that server 
was still running slapd 2.4).
During migration of the third server to OpenLDAP 2.5, sync did not work as 
expected (I had made some mistakes), so I dumped the main DIT on the third 
server and slapadd-ed the LDIF into the two servers. When starting those, 
there was some problem (I think the servers just took "forever" instead of 
responding), so support sent a "repaired" version of slapd.
On the very first attempt to run that fixed version it dumped core, so I 
contacted support again, asking why it would dump core.
Support had also received some messages that were written before the core dump.
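
For reference, the assertion quoted above checks three things: oldestcsn is 
non-empty, newestcsn is non-empty, and oldestcsn sorts strictly before 
newestcsn. Below is a minimal sketch of that check in plain C. The type 
struct bv and the helpers bv_from, bv_is_empty, bv_cmp and csn_range_problem 
are hypothetical stand-ins written for illustration, not the real libber 
definitions of struct berval, BER_BVISEMPTY or ber_bvcmp; I am assuming a 
length-first, then bytewise comparison.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for libber's struct berval (length plus data pointer). */
struct bv {
    size_t len;
    const char *val;
};

/* Build a bv from a NUL-terminated string (illustration only). */
struct bv bv_from(const char *s)
{
    struct bv v = { strlen(s), s };
    return v;
}

/* Stand-in for BER_BVISEMPTY: a value of length zero counts as empty. */
int bv_is_empty(const struct bv *v)
{
    return v->len == 0;
}

/* Stand-in for ber_bvcmp (assumed): shorter values sort first,
 * values of equal length are compared bytewise. */
int bv_cmp(const struct bv *a, const struct bv *b)
{
    if (a->len != b->len)
        return a->len < b->len ? -1 : 1;
    return memcmp(a->val, b->val, a->len);
}

/* Returns 0 if the asserted condition holds, otherwise the number (1..3)
 * of the first part that fails:
 *   1 = oldest CSN empty, 2 = newest CSN empty,
 *   3 = oldest CSN does not sort strictly before newest CSN. */
int csn_range_problem(const struct bv *oldest, const struct bv *newest)
{
    if (bv_is_empty(oldest))
        return 1;
    if (bv_is_empty(newest))
        return 2;
    if (bv_cmp(oldest, newest) >= 0)
        return 3;
    return 0;
}
```

Note that two identical CSNs also fail this check, because the comparison is 
strict.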

The core dumps are actually not all the same, but the last one I saw looked 
like this:
Jul 10 08:37:13 h02 slapd[1047558]: conn=-1 op=0 accesslog_response: got result 0x44 adding log entry reqStart=20250710063713.000001Z,cn=audit
Jul 10 08:37:13 h02 slapd[1047558]: slap_sl_malloc of 93818789174842 bytes failed
Jul 10 08:37:13 h02 kernel: __vm_enough_memory: pid: 1047562, comm: slapd, not enough memory for the allocation
Jul 10 08:37:13 h02 kernel: __vm_enough_memory: pid: 1047562, comm: slapd, not enough memory for the allocation
Jul 10 08:37:13 h02 kernel: __vm_enough_memory: pid: 1047562, comm: slapd, not enough memory for the allocation
Jul 10 08:37:13 h02 systemd[1]: Started Process Core Dump (PID 1627168/UID 0).

The system doesn't have that many GB of RAM.
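
To put the size from the slap_sl_malloc message in perspective, here is a 
small sketch that converts it to GiB and TiB. The byte count is taken from the 
log line above; the names REQUESTED_BYTES, bytes_to_gib and bytes_to_tib are 
made up for this illustration and have nothing to do with slapd itself.

```c
#include <stdint.h>

/* Byte count copied from the slap_sl_malloc log line. */
#define REQUESTED_BYTES UINT64_C(93818789174842)

/* Convert a byte count to GiB (2^30 bytes). */
double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / 1073741824.0;
}

/* Convert a byte count to TiB (2^40 bytes). */
double bytes_to_tib(uint64_t bytes)
{
    return (double)bytes / 1099511627776.0;
}
```

Calling bytes_to_tib(REQUESTED_BYTES) gives a little over 85 TiB, so this was 
clearly not a sensible allocation request.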

> 
> Again as with all reports of crashes and desyncs, please file an ITS at
> bugs.openldap.org or encourage them to do so:
> - if you can reproduce the issue with sample data, please provide a way
>   to do so, this is universally the best way to ensure we can diagnose
>   and address a bug
> - provide as much information as possible, at a minimum relevant parts
>   of configuration, "sync"-level logs, ... from all servers involved
> - provide the values of contextCSN and minCSN on the main and accesslog
>   DBs on all the servers involved (in addition to the logs from that
>   event)
> 
> Without this you're just hoping someone else encounters this issue and
> does the right thing of giving us the information we need to isolate it.
[Windl, Ulrich] 

As SUSE's SLES version is not the "plain vanilla" type of slapd, I think SUSE 
should file that report if they feel the bug is in the upstream code they are 
using.
> 
> > The support recommends to reset the CSNs by disabling any replication
> > (which doesn't remove those IMHO) and "either using syncrepl or
> > delta-syncrepl, but not mixing both.":
> >
> > I don't see a problem if one dependent server gets the changes through
> > "classic methods" (e.g. Refresh), and another server gets updates
> > through delta-syncrepl. Am I wrong?
> 
> Stating "either using syncrepl or delta-syncrepl, but not mixing both."
> sounds concerning. You haven't provided any sort of configuration
> snippets or even basic description of your set up to say if we should be
> concerned about this.
[Windl, Ulrich] 
Read the description I provided at the start. Sometimes it's tricky to upgrade 
an MMR configuration "online", so I decided to break the configuration apart 
temporarily: the newer servers would use delta-syncrepl between each other 
while doing RefreshAndPersist for the old servers. Of course, sync of the 
configuration was broken at that time too.

> 
> Are these servers also providers? All providers need to have *identical*
> configuration and *full* read access to other provider's DB (both main
> DB and accesslog if used).
[Windl, Ulrich] 
Again, read above: The new servers were also providers.

> 
> > Finally support concludes: "Please note that these types of
> > replication integrity issues do not affect 389 Directory Server, which
> > uses a more robust mechanism for change tracking and includes a proper
> > Lamport clock implementation."
> 
> AFAIK 389DS replication is push based, so the design and behaviours are
> quite different. Also assuming we're looking at the above, their comment
> seems somewhat random in context.

[Windl, Ulrich] 
Well, SLES 15 had officially abandoned OpenLDAP in favor of 389DS (but did not 
provide usable tools or documentation to allow a successful migration of the 
databases), but then (maybe due to external pressure) decided to support 
OpenLDAP again starting at SP5 of SLES 15 (or so). So I guess support wanted to 
say that I should use 389DS instead, but that isn't an option now.

Kind regards,
Ulrich
