If you have to use "kill -9", something is really broken IMHO.
Did you try attaching strace to the process to see what it is doing?
(strace -p <PID>)
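
Note that "strace -p <PID>" attaches only to the single thread with that ID, and slapd is multi-threaded, so one futex wait from the main thread does not say much. Below is a minimal sketch of how one could look at all threads instead. It assumes gdb and strace are installed and that you run it as root. The function names are made up for illustration, and the backtrace is only readable if debug symbols for slapd are available.

```shell
#!/bin/sh
# Sketch: inspect every thread of a hung slapd, not just the main one.
# Assumptions: gdb and strace are installed; run as root (or as the slapd user).
# Both helpers take an optional PID and fall back to "pidof slapd".

# Print a backtrace of all threads and detach again (-batch exits gdb afterwards).
slapd_backtrace() {
    pid="${1:-$(pidof slapd)}"
    gdb -p "$pid" -batch -ex 'thread apply all bt'
}

# Attach strace to all threads of the process (-f), not only the given thread ID.
slapd_strace_all() {
    pid="${1:-$(pidof slapd)}"
    strace -f -p "$pid"
}
```

For example, "slapd_backtrace 710 > /tmp/slapd-bt.txt" while the ldapadd is hanging. Several threads blocked on the same mutex would hint at a deadlock, but that is only a guess until the backtrace is available.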

Kind regards,
Ulrich Windl

> -----Original Message-----
> From: linuxm...@4lin.net <linuxm...@4lin.net>
> Sent: Thursday, February 20, 2025 12:01 PM
> To: openldap-technical@openldap.org
> Subject: [EXT] Debian Bookworm: Issues with stucking / hanging slapd
> process 2.5, while add / modify entries (master-master replication)
> 
> Hello,
> 
> we have been fighting smaller and bigger issues with our OpenLDAP since
> upgrading from Buster to Bookworm. We use WebADM (RCDevs) as IDM, which
> uses OpenLDAP as its backend. For a long while now on Bookworm, slapd
> has been getting stuck on operations such as adding entries, for
> example when adding more than one CN entry to an existing OU. The only
> way to get everything working again is to stop slapd, but systemctl
> stop slapd doesn't work; you have to use kill -9, and that pretty often.
> 
> So, hoping to get it working again, I cloned the VMs, cut the (normal)
> network and used a localhost bridge, so that both can see each other
> without issues. Then I created a backup (slapcat), deleted the DB and
> slapd.d/cn=config, and restored the DB on both. This part worked
> without issues, but:
> 
> ```
> cat /home/foo/sudo_single.ldif
> 
> dn: cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,dc=local
> objectclass: sudorole
> objectclass: top
> cn: jochoa_fra_dev_bookworm_02
> sudorunasuser: ALL
> sudooption: !authenticate
> sudocommand: /bin/su
> sudohost: fra-dev-bookworm-02.example.local
> sudouser: jochoa@example.local
> 
> 
> ldapadd -ZZ -c -x -D 'cn=webadmin,ou=Accounts,dc=example,dc=local' -W \
>   -H ldap://fra-corp-auth-01.example.com:389 \
>   -f /home/foo/sudo_single.ldif -vv
> 
> ldap_initialize( ldap://fra-corp-auth-01.example.com:389/??base )
> Enter LDAP Password:
> add objectclass:
>          sudorole
>          top
> add cn:
>          jochoa_fra_dev_bookworm_02
> add sudorunasuser:
>          ALL
> add sudooption:
>          !authenticate
> add sudocommand:
>          /bin/su
> add sudohost:
>          fra-dev-bookworm-02.example.local
> add sudouser:
>          jochoa@example.local
> adding new entry
> "cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=exampl
> e,dc=local"
> ```
> 
> and then it just hangs until I interrupt it with Ctrl+C.
> 
> The same happens via ApacheDirectory or the WebADM GUI. Sometimes it
> works, but often not.
> 
> 
> ```
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: activity on 1
> descriptor
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: activity on:
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]:  22r
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]:
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: read active on 22
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: epoll: listen=8
> active_threads=0 tvp=zero
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: epoll: listen=9
> active_threads=0 tvp=zero
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: epoll: listen=10
> active_threads=0 tvp=zero
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: connection_get(22)
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: connection_get(22): got
> connid=1041
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: connection_read(22):
> checking for input on id=1041
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: op tag 0x68, time
> 1740046822
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: conn=1041 op=1 do_add
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: conn=1041 op=1 do_add: dn
> (cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example
> ,dc=local)
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: >>> dnPrettyNormal:
> <cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=exampl
> e,dc=local>
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: <<< dnPrettyNormal:
> <cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=exampl
> e,dc=local>,
> <cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=exampl
> e,dc=local>
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: conn=1041 op=1 ADD
> dn="cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=exa
> mple,dc=local"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: => mdb_entry_get: ndn:
> "cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=exampl
> e,dc=local"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: => mdb_entry_get: oc:
> "(null)", at: "(null)"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]:
> mdb_dn2entry("cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudo
> ers,dc=example,dc=local")
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: =>
> mdb_dn2id("cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers
> ,dc=example,dc=local")
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: <= mdb_dn2id: get failed:
> MDB_NOTFOUND: No matching key/data pair found (-30798)
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: => mdb_entry_get: cannot
> find entry:
> "cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=exampl
> e,dc=local"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: mdb_entry_get: rc=32
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: ==> mdb_add:
> cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example,
> dc=local
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_required entry
> (cn=jochoa_fra_dev_bookworm_02,ou=user_rules,ou=sudoers,dc=example
> ,dc=local),
> objectClass "sudoRole"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type
> "objectClass"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type "cn"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type
> "sudoRunAsUser"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type
> "sudoOption"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type
> "sudoCommand"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type
> "sudoHost"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type
> "sudoUser"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: oc_check_allowed type
> "structuralObjectClass"
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: activity on 1
> descriptor
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: activity on:
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]:
> Feb 20 11:20:22 fra-corp-auth-01 slapd[710]: daemon: epoll: listen=8
> active_threads=0 tvp=zero
> ```
> 
> If I try to stop slapd on ldap1:
> 
> ```
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: conn=1001 fd=20 closed
> (slapd shutdown)
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: connection_closing:
> readying conn=1041 sd=22 for close
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: connection_close: deferring
> conn=1041 sd=22
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: connection_closing:
> readying conn=1011 sd=23 for close
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: connection_close: deferring
> conn=1011 sd=23
> Feb 20 11:23:10 fra-corp-auth-01 slapd[710]: slapd shutdown: waiting for
> 4 operations/tasks to finish
> ```
> 
> strace shows:
> 
> ```
> futex(0x7f3e9a9ff990, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 720,
> NULL,
> FUTEX_BITSET_MATCH_ANY
> ```
> 
> So, if I stop everything and start slapd again, all seems fine.
> 
> * ldap2
> 
> ```
> Feb 20 11:49:32 fra-corp-auth-02 slapd[686]: conn=1209 op=2
> syncprov_op_search: registered persistent search
> Feb 20 11:49:32 fra-corp-auth-02 slapd[686]: conn=1209 op=2
> syncprov_op_search: no change, skipping log replay
> Feb 20 11:49:32 fra-corp-auth-02 slapd[686]: conn=1209 op=2
> syncprov_op_search: nothing changed, finishing up initial search early
> Feb 20 11:49:32 fra-corp-auth-02 slapd[686]: conn=1209 op=2
> syncprov_sendinfo: refreshDelete cookie=
> Feb 20 11:49:32 fra-corp-auth-02 slapd[686]: conn=1209 op=2
> syncprov_search_response: detaching op
> ```
> 
> Then I try ldapadd again, and I still see:
> 
> * ldap2
> 
> ```
> ...
> Feb 20 11:54:35 fra-corp-auth-02 slapd[4696]: =>do_syncrepl rid=002
> Feb 20 11:54:36 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=8
> active_threads=0 tvp=zero
> Feb 20 11:54:36 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=9
> active_threads=0 tvp=zero
> Feb 20 11:54:36 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=10
> active_threads=0 tvp=zero
> Feb 20 11:54:36 fra-corp-auth-02 slapd[4696]: start_refresh: rid=002 a
> refresh on rid=001 in progress, pausing
> Feb 20 11:54:37 fra-corp-auth-02 slapd[4696]: =>do_syncrepl rid=002
> Feb 20 11:54:38 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=8
> active_threads=0 tvp=zero
> Feb 20 11:54:38 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=9
> active_threads=0 tvp=zero
> Feb 20 11:54:38 fra-corp-auth-02 slapd[4696]: daemon: epoll: listen=10
> active_threads=0 tvp=zero
> Feb 20 11:54:38 fra-corp-auth-02 slapd[4696]: start_refresh: rid=002 a
> refresh on rid=001 in progress, pausing
> Feb 20 11:54:39 fra-corp-auth-02 slapd[4696]: =>do_syncrepl rid=002
> ....
> ```
> 
> but an ldapsearch on ldap1 **still works** :-/
> 
> On ldap1 the log is silent, except for my ldapsearch, and I have to
> kill -9 slapd on ldap1 again and start it.
> 
> I have no clue what else I can do.
> 
> Any hints?
> 
> 
> cu denny
