>>> Simone Piccardi <[email protected]> wrote on 05.11.2020 at 16:17 in message
<[email protected]>:
> On 03/11/20 22:49, Quanah Gibson-Mount wrote:
>
>>> The problem manifests itself without periodicity and looking on the
>>> number of connection before it we could not see any usage peak. We tried
>>> to strace slapd threads during the problem, and they seem blocked on a
>>> mutex waiting for the one running at 100% (in a single CPU, user time).
>>> I'm attaching a top results during one of these events.
>>
>> If you can attach to the process while this is occurring, I'd suggest
>> obtaining a full GDB backtrace to see what the different slapd threads
>> are doing at that time. Also, what mutex specifically is slapd waiting on?
>>
> I executed gstack on the slapd pid during one of such events saving the
> output, they are attached, but the running slapd is stripped so they are
> quite obscure (at least for me).

I think that even with a stripped binary you could "re-attach" the symbols
(provided you saved them before stripping). For some distributions, such
symbol (debug) packages are available for installation. I don't know whether
that is the case for your package source, however.

>
> We are trying to put in a non stripped version (compiled with
> CFLAGS='-g' and --enable-debug=yes) in use for a test, but that's a
> production machine, and it will take a while.
>
> What I should do to find which one the mutex is? in the straces they are
> identified just by a number.
>
>>> So a first question is: there is any other configuration parameter about
>>> indexing that I can try?
>>
>> If you really believe that this is indexing related, you should be able
>> to tell this from the slapd logs at "stats" logging, where you would see
>> a specific search taking a significant amount of time. However that
>> generally does not lead to a system that's paused as searches shouldn't
>> trigger a mutex issue like what you're describing.
>>
> No, it is not that I believe that, as I said it was just a guess about
> something that could need full CPU for tens of seconds blocking all
> other operations. But from what you are saying the guess is probably
> plain wrong.
>
>> Is this on RHEL7 or later? If you have both "stats" and "sync" logging
>> enabled (the recommended setting for replicating nodes), what does the
>> slapd log show is happening at this time?
>
> The server is running an updated version of Amazon Linux (Amazon Linux
> AMI 2018.03).
>
> We enabled stats and sync to logs, and I'm attaching a redacted excerpt
> of them around the incident time, when I also took the gstack.txt (done
> at 00:39:04) and gstack2.txt (done at 00:39:20) backtraces. But during
> that time there is no data.
>
> Simone
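
On the question of which mutex it is: the "number" shown in the strace output
is most likely the first argument of futex(), i.e. the address of the futex
word the thread is sleeping on. A rough sketch of how one might group the
waiting threads by that address is below. It assumes the trace was taken with
something like "strace -f -p <pid> -e trace=futex" without timestamps; the
helper name waiters_by_futex and the sample lines are made up for
illustration, they are not taken from your attachments.

```python
import re
from collections import defaultdict

# Matches an strace futex line, with the thread ID either as "[pid N]"
# (strace -f on the terminal) or as a bare leading number (strace -f -o file).
FUTEX_RE = re.compile(
    r"^(?:\[pid\s+(\d+)\]\s+|(\d+)\s+)?futex\((0x[0-9a-f]+),\s*(FUTEX_\w+)"
)


def waiters_by_futex(lines):
    """Map futex address -> set of thread IDs seen waiting on it."""
    waiters = defaultdict(set)
    for line in lines:
        m = FUTEX_RE.match(line)
        # Only count wait operations (FUTEX_WAIT, FUTEX_WAIT_PRIVATE, ...).
        if m and m.group(4).startswith("FUTEX_WAIT"):
            tid = m.group(1) or m.group(2) or "?"
            waiters[m.group(3)].add(tid)
    return dict(waiters)


# Made-up sample lines in strace -f format, for illustration only.
SAMPLE = """\
[pid 4321] futex(0x7f3a5c0010a0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 4322] futex(0x7f3a5c0010a0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 4323] futex(0x7f3a5c0022b0, FUTEX_WAKE_PRIVATE, 1) = 1
""".splitlines()

if __name__ == "__main__":
    result = waiters_by_futex(SAMPLE)
    # Print the most contended address first.
    for addr, tids in sorted(result.items(), key=lambda kv: -len(kv[1])):
        print(addr, sorted(tids))
```

If the address that most threads wait on belongs to a pthread mutex (an
assumption; it could also be a condition variable), then with glibc debug
symbols installed you could try "print *(pthread_mutex_t *) <address>" in gdb
and look at __data.__owner, which should be the thread ID of the current
holder. Comparing that ID with the thread running at 100% in top would tell
you whether it is the one holding the lock.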
