>>> Simone Piccardi <[email protected]> wrote on 05.11.2020 at 16:17 in
message <[email protected]>:
> On 03/11/20 22:49, Quanah Gibson-Mount wrote:
> 
>>> The problem occurs with no apparent periodicity, and looking at the
>>> number of connections before it we could not see any usage peak. We tried
>>> to strace the slapd threads during the problem, and they seem blocked on a
>>> mutex, waiting for the one running at 100% (on a single CPU, user time).
>>> I'm attaching the top output from one of these events.
>> 
>> If you can attach to the process while this is occurring, I'd suggest
>> obtaining a full GDB backtrace to see what the different slapd threads
>> are doing at that time.  Also, what mutex specifically is slapd waiting
>> on?
>> 
> I ran gstack on the slapd PID during one of these events and saved the
> output; the files are attached. But the running slapd binary is stripped,
> so they are quite obscure (at least to me).
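Even with a stripped binary, the backtraces can be made less obscure by grouping threads that sit in the same frames, and by matching the LWP numbers in the gstack output against the thread that `top -H -p <pid>` shows at 100%. A minimal Python sketch, assuming the usual gdb "Thread N (Thread 0x... (LWP nnn)):" / "#0  0x... in func () ..." layout; `parse_gstack` and `summarize` are hypothetical helper names, not part of any tool:

```python
import re
from collections import Counter

# "Thread 3 (Thread 0x7f2b3c1ff700 (LWP 12345)):" -> LWP 12345
THREAD_RE = re.compile(r"^Thread\s+\d+\s+\(.*LWP\s+(\d+)\)")
# "#0  0x00007f2b4a1d54ed in __lll_lock_wait () from ..." -> "__lll_lock_wait"
# Frame lines without an address ("#0  func (...) at ...") are handled too.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def parse_gstack(text):
    """Map each thread's LWP (kernel TID) to its function names, innermost first."""
    threads = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            threads[current] = []
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            threads[current].append(m.group(1))
    return threads


def summarize(threads, depth=3):
    """Count how many threads share the same innermost `depth` frames."""
    return Counter(tuple(frames[:depth]) for frames in threads.values())
```

Calling `summarize(parse_gstack(open("gstack.txt").read())).most_common()` would show, for example, how many threads are parked in `__lll_lock_wait`/`pthread_mutex_lock`, while `parse_gstack(...)[lwp]` gives the frames of the busy thread. With a stripped slapd those frames will mostly be `??` until symbols are available.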

I think even when stripped, you could "re-attach" the symbols (provided you
saved them before stripping). For some distributions, such symbol (debug)
packages are available for installation. I don't know about your package
source, however.

> 
> We are trying to put a non-stripped version (compiled with
> CFLAGS='-g' and --enable-debug=yes) into use for a test, but this is a
> production machine, so it will take a while.
> 
> What should I do to find out which mutex it is? In the straces they are
> identified only by a number.
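The number in the strace output is the address of the futex word. For a glibc mutex this is the start of the `pthread_mutex_t`, so the address that most blocked threads wait on is the likely culprit. With symbols loaded, gdb's `info symbol 0xADDR` tells you whether it belongs to a global, and on glibc/x86-64 `p *(pthread_mutex_t *) 0xADDR` shows a `__data.__owner` field with the TID of the thread currently holding it (assuming gdb has debug info for the pthread types). A minimal Python sketch for counting the waited-on addresses, assuming `strace -f` output; `waited_futexes` is a hypothetical name, and the "expected value 2" filter is a heuristic for contended glibc low-level locks, not a guarantee:

```python
import re
from collections import Counter

# Matches the "[pid N]" prefix that strace -f adds to each line.
PID_RE = re.compile(r"^\[pid\s+(\d+)\]")
# Matches e.g. "futex(0x55d1ab2c3e40, FUTEX_WAIT_PRIVATE, 2, NULL"
FUTEX_RE = re.compile(r"futex\((0x[0-9a-fA-F]+),\s*(FUTEX_WAIT\w*),\s*(\d+)")


def waited_futexes(strace_text, lock_only=True):
    """Count FUTEX_WAIT calls per futex address and collect the waiting TIDs.

    With lock_only=True only waits whose expected value is 2 are kept; this
    is a heuristic for contended glibc mutexes (condition-variable waits of
    idle worker threads may use other values).
    """
    counts = Counter()
    waiters = {}
    for line in strace_text.splitlines():
        line = line.strip()
        m = FUTEX_RE.search(line)
        if not m:
            continue
        addr, value = m.group(1).lower(), int(m.group(3))
        if lock_only and value != 2:
            continue
        counts[addr] += 1
        p = PID_RE.match(line)
        if p:
            waiters.setdefault(addr, set()).add(int(p.group(1)))
    return counts, waiters
```

If `__owner` of the most frequent address matches the LWP that top shows at 100%, that would confirm the busy thread is the one holding the lock.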
> 
>>> So a first question is: is there any other configuration parameter
>>> related to indexing that I can try?
>> 
>> If you really believe that this is indexing related, you should be able
>> to tell this from the slapd logs at "stats" logging, where you would see
>> a specific search taking a significant amount of time.  However that
>> generally does not lead to a system that's paused, as searches shouldn't
>> trigger a mutex issue like the one you're describing.
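To spot such a slow search in "stats" logs, each `conn=N op=M SRCH` line can be paired with the matching `SEARCH RESULT` line and the timestamps compared. A minimal Python sketch, assuming syslog-style lines such as `Nov  5 00:39:04 host slapd[1234]: conn=1001 op=5 SRCH ...`; since syslog omits the year, the `year` parameter is an assumption, and the resolution is only one second. `slow_searches` is a hypothetical helper name:

```python
import re
from datetime import datetime

# "conn=1001 op=5 SRCH base=..." or "conn=1001 op=5 SEARCH RESULT tag=101 ..."
OP_RE = re.compile(r"conn=(\d+) op=(\d+) (SRCH |SEARCH RESULT)")


def syslog_time(line, year=2020):
    """Parse the leading 'Mon DD HH:MM:SS' syslog stamp (year is assumed)."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def slow_searches(lines, threshold=5.0):
    """Return [((conn, op), seconds)] for searches taking >= threshold seconds."""
    started = {}
    slow = []
    for line in lines:
        m = OP_RE.search(line)
        if not m:
            continue
        key = (int(m.group(1)), int(m.group(2)))
        ts = syslog_time(line)
        if m.group(3) == "SRCH ":
            # Keep the first SRCH line; a later "SRCH attr=..." line must not reset it.
            started.setdefault(key, ts)
        elif key in started:
            elapsed = (ts - started.pop(key)).total_seconds()
            if elapsed >= threshold:
                slow.append((key, elapsed))
    return slow
```

An empty result during an incident would support the point above that the pause is not caused by an individual slow search.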
>> 
> No, I don't really believe that; as I said, it was just a guess about
> something that could need the full CPU for tens of seconds, blocking all
> other operations. But from what you are saying, the guess is probably
> plain wrong.
> 
>> Is this on RHEL7 or later?  If you have both "stats" and "sync" logging
>> enabled (the recommended setting for replicating nodes), what does the
>> slapd log show is happening at this time?
> 
> The server is running an updated version of Amazon Linux (Amazon Linux
> AMI 2018.03).
> 
> We enabled stats and sync logging, and I'm attaching a redacted excerpt
> of the logs around the incident time, when I also took the gstack.txt (at
> 00:39:04) and gstack2.txt (at 00:39:20) backtraces. But there are no log
> entries at all during that time.
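The absence of log entries is itself informative: on a busy server, many seconds of silence in stats logging suggests that slapd (or its logging path) was stalled for that period. A minimal Python sketch that lists such gaps, assuming syslog-style timestamps with an assumed year; `log_gaps` and `min_gap` are hypothetical names:

```python
from datetime import datetime


def syslog_time(line, year=2020):
    """Parse the leading 'Mon DD HH:MM:SS' syslog stamp (year is assumed)."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def log_gaps(lines, min_gap=10.0):
    """Return (previous_time, next_time, seconds) for silences >= min_gap."""
    gaps = []
    prev = None
    for line in lines:
        try:
            ts = syslog_time(line)
        except ValueError:
            # Skip lines without a parsable syslog timestamp.
            continue
        if prev is not None:
            delta = (ts - prev).total_seconds()
            if delta >= min_gap:
                gaps.append((prev, ts, delta))
        prev = ts
    return gaps
```

Comparing the reported gaps with the gstack timestamps (00:39:04 and 00:39:20) would show whether the silence covers the whole event.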
> 
> Simone

