If you have to use "kill -9", something is really broken IMHO. Did you try attaching strace to the process to see what it is doing? (strace -p <PID>)
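
A rough sketch of what I mean, assuming Linux with /proc mounted (the helper name thread_states is made up for illustration and is not part of any tool): slapd is multithreaded, and plain "strace -p <PID>" attaches only to the thread whose ID equals the PID, so a single futex wait may just be the main thread idling. Listing the state of every thread first can show which ones are actually blocked:

```shell
#!/bin/sh
# Hypothetical helper (not from slapd or strace): print one line per thread
# of a process as "TID STATE WCHAN", read from /proc.
# Assumes Linux; wchan may be unreadable without root, in which case "?" is shown.
thread_states() {
    pid=$1
    for task in /proc/"$pid"/task/*; do
        [ -d "$task" ] || continue
        tid=${task##*/}
        # "State:" line looks like "State:  S (sleeping)"; keep the letter only
        state=$(awk '/^State:/ {print $2; exit}' "$task/status" 2>/dev/null)
        # Kernel function the thread is sleeping in, if the kernel exposes it
        wchan=$(cat "$task/wchan" 2>/dev/null)
        printf '%s %s %s\n' "$tid" "${state:-?}" "${wchan:-?}"
    done
}
```

Usage would be "thread_states <PID>", followed by "strace -f -p <PID>" so that strace attaches to all threads. If gdb and debug symbols happen to be installed, "gdb -p <PID> -batch -ex 'thread apply all bt'" should give a backtrace per thread, which is usually more telling for a suspected deadlock.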

Kind regards,
Ulrich Windl

> -----Original Message-----
> From: linuxm...@4lin.net <linuxm...@4lin.net>
> Sent: Thursday, February 20, 2025 12:01 PM
> To: openldap-technical@openldap.org
> Subject: [EXT] Debian Bookworm: Issues with stucking / hanging slapd
> process 2.5, while add / modify entries (master-master replication)
>
> Hello,
>
> since the upgrade from Buster to Bookworm we have been fighting with
> smaller and bigger issues on our OpenLDAP. We use WebADM as IDM (Rcdevs),
> which uses OpenLDAP as its backend. For a long while now on Bookworm,
> slapd gets stuck on operations such as adding entries, for example when
> adding more than 1 CN entry to an existing OU. The only way to get
> everything working again is to stop slapd, but systemctl stop slapd
> doesn't work, you have to use kill -9 .. and that pretty often.
>
> So, hoping to get it working again, I cloned the VMs, cut the (normal)
> network and used a localhost bridge, so that both can see each other
> without issues. Then I created a backup (slapcat), deleted the db and
> slapd.d/cn=config ... and restored the DB on both. This part worked
> without issues .. but:
>
> ```
> cat /home/foo/sudo_single.ldif
>
> dn: cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local
> objectclass: sudorole
> objectclass: top
> cn: jochoa_fra_dev_bookworm_02
> sudorunasuser: ALL
> sudooption: !authenticate
> sudocommand: /bin/su
> sudohost: fra-dev-bookworm-02.example.local
> sudouser: jochoa@example.local
>
> ldapadd -ZZ -c -x -D 'cn=webadmin,ou=Accounts,dc=example,dc=local' -W -H ldap://fra-corp-auth-01.example.com:389 -f /home/foo/sudo_single.ldif -vv
>
> ldap_initialize( ldap://fra-corp-auth-01.example.com:389/??base )
> Enter LDAP Password:
> add objectclass:
>     sudorole
>     top
> add cn:
>     jochoa_fra_dev_bookworm_02
> add sudorunasuser:
>     ALL
> add sudooption:
>     !authenticate
> add sudocommand:
>     /bin/su
> add sudohost:
>     fra-dev-bookworm-02.example.local
> add sudouser:
>     jochoa@example.local
> adding new entry "cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local"
> ```
>
> and then .. it just hangs until I abort with CTRL+C
>
> The same happens via ApacheDirectory or using the WebADM GUI ... sometimes
> it works .. but often not.
>
> ```
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: activity on 1 descriptor
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: activity on:
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]:  22r
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]:
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: read active on 22
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: epoll: listen=8 active_threads=0 tvp=zero
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: epoll: listen=9 active_threads=0 tvp=zero
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: epoll: listen=10 active_threads=0 tvp=zero
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: connection_get(22)
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: connection_get(22): got connid=1041
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: connection_read(22): checking for input on id=1041
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: op tag 0x68, time 1740046822
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: conn=1041 op=1 do_add
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: conn=1041 op=1 do_add: dn (cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local)
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: >>> dnPrettyNormal: <cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local>
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: <<< dnPrettyNormal: <cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local>, <cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local>
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: conn=1041 op=1 ADD dn="cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: => mdb_entry_get: ndn: "cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: => mdb_entry_get: oc: "(null)", at: "(null)"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: mdb_dn2entry("cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local")
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: => mdb_dn2id("cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local")
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: <= mdb_dn2id: get failed: MDB_NOTFOUND: No matching key/data pair found (-30798)
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: => mdb_entry_get: cannot find entry: "cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: mdb_entry_get: rc=32
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: ==> mdb_add: cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_required entry (cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local), objectClass "sudoRole"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type "objectClass"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type "cn"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type "sudoRunAsUser"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type "sudoOption"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type "sudoCommand"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type "sudoHost"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type "sudoUser"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type "structuralObjectClass"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: activity on 1 descriptor
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: activity on:
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]:
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: epoll: listen=8 active_threads=0 tvp=zero
> ```
>
> If I try to stop slapd on ldap1:
>
> ```
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: conn=1001 fd=20 closed (slapd shutdown)
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: connection_closing: readying conn=1041 sd=22 for close
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: connection_close: deferring conn=1041 sd=22
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: connection_closing: readying conn=1011 sd=23 for close
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: connection_close: deferring conn=1011 sd=23
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: slapd shutdown: waiting for 4 operations/tasks to finish
> ```
>
> strace shows:
>
> ```
> futex(0x7f3e9a9ff990, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 720, NULL, FUTEX_BITSET_MATCH_ANY
> ```
>
> So, if I stop everything .. and start slapd again .. all seems fine ..
>
> * ldap2
>
> ```
> Feb 20 11:49:32 fra-corp-auth-02 slapd[686]: conn=1209 op=2 syncprov_op_search: registered persistent search
> Feb 20 11:49:32 fra-corp-auth-02 slapd[686]: conn=1209 op=2 syncprov_op_search: no change, skipping log replay
> Feb 20 11:49:32 fra-corp-auth-02 slapd[686]: conn=1209 op=2 syncprov_op_search: nothing changed, finishing up initial search early
> Feb 20 11:49:32 fra-corp-auth-02 slapd[686]: conn=1209 op=2 syncprov_sendinfo: refreshDelete cookie=
> Feb 20 11:49:32 fra-corp-auth-02 slapd[686]: conn=1209 op=2 syncprov_search_response: detaching op
> ```
>
> then I try ldapadd again .. and I still see:
>
> * ldap2
>
> ```
> ...
> Feb 20 11:54:35 fra-corp-auth-02 slapd[4696]: =>do_syncrepl rid=002
> Feb 20 11:54:36 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=8 active_threads=0 tvp=zero
> Feb 20 11:54:36 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=9 active_threads=0 tvp=zero
> Feb 20 11:54:36 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=10 active_threads=0 tvp=zero
> Feb 20 11:54:36 fra-corp-auth-02 slapd[4696]: start_refresh: rid=002 a refresh on rid=001 in progress, pausing
> Feb 20 11:54:37 fra-corp-auth-02 slapd[4696]: =>do_syncrepl rid=002
> Feb 20 11:54:38 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=8 active_threads=0 tvp=zero
> Feb 20 11:54:38 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=9 active_threads=0 tvp=zero
> Feb 20 11:54:38 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=10 active_threads=0 tvp=zero
> Feb 20 11:54:38 fra-corp-auth-02 slapd[4696]: start_refresh: rid=002 a refresh on rid=001 in progress, pausing
> Feb 20 11:54:39 fra-corp-auth-02 slapd[4696]: =>do_syncrepl rid=002
> ....
> ```
>
> but .. an ldapsearch on ldap1 .. **still works** :-/
>
> on ldap1 .. the log is silent, except for my ldapsearch, and ... I have to
> kill -9 slapd on ldap1 again and start it ..
>
> I have no clue .. what else I can do .....
>
> any hints?
>
>
> cu denny