https://bugs.openldap.org/show_bug.cgi?id=9210

            Bug ID: 9210
           Summary: [with patch] Infinite retry-loop (and thus 100%
                    CPU-Usage) when lots of requests are issued
           Product: OpenLDAP
           Version: 2.4.47
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: ---
         Component: libraries
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

Created attachment 706
  --> https://bugs.openldap.org/attachment.cgi?id=706&action=edit
Patch adding errno resets

*tl;dr* Single-stepping revealed a missing `errno` reset in the retry loop of
`ber_int_sb_write`.

An sssd setup of ours, which we use for basic auth on one of our services,
issues LDAP calls.  Under load, i.e. when many incoming requests caused many
`ldap_search_ext` calls to be issued, we observed that the corresponding
process/thread went up to 100% CPU usage and stayed there.

- The
[flamegraph](https://helios.wh2.tu-dresden.de/~shreyder/sssd_be%20--domain%20dom-http-wiki.svg)
shows that the process was stuck below `ber_int_sb_write`.
- Single-stepping with GDB revealed that we were stuck in the `for(;;)` retry
loop.  Indeed, we could observe that the `sbi_write` call was successful, but
`errno` was still `EINTR` every time I hit that breakpoint.
- Patching `sockbuf.c` as attached and rebuilding resolved the issue.

I also noticed similar sections with such a loop in `sockbuf.c` and added
`errno = 0;` at the beginning of each iteration.  In principle, they should
suffer from the same problem.

Our explanation for why this only happened under load: with many requests
being issued, the probability that a write coincides with an _actual_
interrupt is much higher, and once that happens, `errno` stays at `EINTR` and
we're stuck in the infinite loop.
