Hi, for some weeks now we have been operating an OpenLDAP deployment with N-Way Multi-Provider delta-syncrepl replication. There are around 130,000 entries in the main database, which is around 635 MB in size. The replication currently consists of two VMs, each with 4 GB of RAM and 2 CPUs, running slapd 2.6.10. I posted the configuration at [1]; I only removed the credentials and some user-specific ACLs.
This setup worked flawlessly for some time until both servers rebooted during a scheduled patch cycle. Since then, we have seen drastically increased response times and CPU utilization on both VMs. On one of the servers (ldap08) I see the following log entry every few seconds:

do_syncrep2: rid=910 (4096) Content Sync Refresh Required

When I compare the contextCSN values of both servers, they differ only slightly, by less than 5 seconds at most. They do, however, change constantly, because these servers are used for login and store the last-login and intruder-detection information, which must be replicated. Most of the other data is static, but there are some changes (changed passwords, name changes) every few minutes or so. For example, a few minutes ago we had the following values:

20250717103207.135309Z#000000#000#000000
20250717115153.689217Z#000000#06a#000000
20250814105455.935611Z#000000#06b#000000
20250814105522.282937Z#000000#06c#000000

I understand that the 000 entry dates from before replication was enabled, and that the 06a value comes from an old server (ldap05) that no longer belongs to this replication.

When I last had a situation like this, the servers were not yet in production, so I shut both down, copied the database over and started them up again. This is not an option now, as they are in production and needed for login. Preventing updates for longer than a few minutes is not an option either, and even that has to be announced ahead of time. When I last tried adding a brand-new server to a similarly configured cluster in testing, it took multiple hours to bring the new server up to speed. I fear that taking one of the two servers out for multiple hours would overwhelm the remaining server with requests.

In the future, the plan is to have at least 3 servers in this replication, but currently we only have two. It is, however, an option to prepare new server(s) and add them to the replication if that might help somehow.
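To compare the two providers per serverID rather than by eye, a small helper like the following could be used. It is only a sketch, not an OpenLDAP tool: it assumes the usual CSN layout `YYYYmmddHHMMSS.ffffffZ#count#sid#mod` with the serverID encoded in hex (so 06a, 06b, 06c are 106, 107, 108), and the function names `parse_csn` and `csn_lag` are hypothetical. The input lists would be the contextCSN values read separately from each provider's suffix entry.

```python
from datetime import datetime, timezone


def parse_csn(csn: str) -> tuple[int, datetime]:
    """Split one CSN into (serverID, UTC timestamp).

    Assumed format: YYYYmmddHHMMSS.ffffffZ#count#sid#mod, sid in hex.
    """
    stamp, _count, sid, _mod = csn.split("#")
    ts = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return int(sid, 16), ts


def csn_lag(provider_a: list[str], provider_b: list[str]) -> dict[int, float]:
    """Per-serverID difference in seconds (a minus b) between two contextCSN sets.

    A serverID present on only one provider is reported as NaN.
    """
    a = dict(parse_csn(c) for c in provider_a)
    b = dict(parse_csn(c) for c in provider_b)
    lag: dict[int, float] = {}
    for sid in sorted(a.keys() | b.keys()):
        if sid in a and sid in b:
            lag[sid] = (a[sid] - b[sid]).total_seconds()
        else:
            lag[sid] = float("nan")
    return lag
```

A positive value means provider A has seen a newer change originating from that serverID than provider B; a NaN flags a serverID (such as a decommissioned one) that only one side still carries.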
One more piece of information: the accesslog database is currently around 2 GB in size.

What would be the best approach to remediate this situation?

[1] https://next.hessenbox.de/index.php/s/jFX9gAEWXoqoxNS

Kind regards
Clemens (Bergmann)

-- 
Clemens Bergmann [er/ihm; he/him]
Gruppe Nutzermanagement und Entwicklung
Technische Universität Darmstadt
Hochschulrechenzentrum, Alexanderstraße 2, 64283 Darmstadt
Tel. +49 6151 16 71184
http://www.hrz.tu-darmstadt.de/