On Mon, May 30, 2022 at 02:38:11PM +0100, Howard Chu wrote:

> Let us know how things go.

Arg. Seems to have been a red herring. Blew up again with swappiness set
to 1, and then again with swap completely disabled :(. Usual symptoms of
crazy high disk reads:

Total DISK READ :     389.05 M/s | Total DISK WRITE :       3.93 K/s
Actual DISK READ:     391.50 M/s | Actual DISK WRITE:       0.00 B/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 577547 be/4 ldap       36.88 M/s    0.00 B/s  0.00 % 97.92 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 577546 be/4 ldap       32.27 M/s    0.00 B/s  0.00 % 97.88 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 575034 be/4 ldap       29.47 M/s    0.00 B/s  0.00 % 97.72 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 572838 be/4 ldap       27.38 M/s    0.00 B/s  0.00 % 97.66 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 575308 be/4 ldap       24.47 M/s    0.00 B/s  0.00 % 97.50 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 572866 be/4 ldap       91.55 M/s    0.00 B/s  0.00 % 97.33 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 572841 be/4 ldap       26.96 M/s    0.00 B/s  0.00 % 96.87 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 572836 be/4 ldap       43.90 M/s    0.00 B/s  0.00 % 96.84 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 577508 be/4 ldap       76.17 M/s    0.00 B/s  0.00 % 95.96 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap

Even though there's plenty of memory:

              total        used        free      shared  buff/cache   available
Mem:           3901         944         109           1        2847        2715
Swap:             0           0           0

Looking at the lmdb mapping:

00007f662ea25000 5242880  325560       0 rw-s- data.mdb
00007f676ec26000 2097152       0       0 rw-s- data.mdb

There seem to be fewer pages mapped in than on the one that isn't blowing up:

00007f6ab1606000 5242880  560712       0 rw-s- data.mdb
00007f6bf1807000 2097152  120772       0 rw-s- data.mdb
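
To put a number on that comparison, here is a minimal sketch that parses mapping lines like the ones above and reports what fraction of the data.mdb mapping is resident. It assumes the lines are in `pmap -x` column order (Address, Kbytes, RSS, Dirty, Mode, Mapping), which is a guess based on their shape; the names `Mapping`, `parse_pmap_line` and `resident_fraction` are made up for illustration.

```python
from typing import Iterable, NamedTuple


class Mapping(NamedTuple):
    address: str
    kbytes: int    # size of the mapping in KiB
    rss_kb: int    # resident portion in KiB
    dirty_kb: int
    mode: str
    name: str


def parse_pmap_line(line: str) -> Mapping:
    # Assumed column order: Address Kbytes RSS Dirty Mode Mapping
    addr, kbytes, rss, dirty, mode, name = line.split(None, 5)
    return Mapping(addr, int(kbytes), int(rss), int(dirty), mode, name.strip())


def resident_fraction(lines: Iterable[str]) -> float:
    """Return resident KiB divided by mapped KiB across all given lines."""
    maps = [parse_pmap_line(l) for l in lines if l.strip()]
    total = sum(m.kbytes for m in maps)
    return sum(m.rss_kb for m in maps) / total
```

Fed the two pairs of lines above, this gives roughly 4% resident on the unhappy server versus roughly 9% on the healthy one.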

Memory use is similar:

              total        used        free      shared  buff/cache   available
Mem:           3896         725         156           0        3014        2893
Swap:          2047         127        1920


The one that's unhappy is generating a lot of page faults:

ldap-02 ~ # ps -o min_flt,maj_flt 572833; sleep 10; ps -o min_flt,maj_flt 572833
 MINFL  MAJFL
11924597 3715970
 MINFL  MAJFL
11931358 3718833
ldap-02 ~ # ps -o min_flt,maj_flt 572833; sleep 10; ps -o min_flt,maj_flt 572833
 MINFL  MAJFL
11949883 3726966
 MINFL  MAJFL
11957081 3730080

Compared to the one that's working properly, which has none:

ldap-01 ~ # ps -o min_flt,maj_flt 1227; sleep 10; ps -o min_flt,maj_flt 1227
 MINFL  MAJFL
1282224 221928
 MINFL  MAJFL
1282224 221928
ldap-01 ~ # ps -o min_flt,maj_flt 1227; sleep 10; ps -o min_flt,maj_flt 1227
 MINFL  MAJFL
1282225 221928
 MINFL  MAJFL
1282225 221928
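
The same before/after sampling can be scripted to get a rate directly. This is only a sketch, assuming a Linux /proc layout where minflt and majflt are fields 10 and 12 of /proc/PID/stat (per proc(5)); the function names are hypothetical.

```python
import time
from pathlib import Path


def parse_faults(stat_line: str) -> tuple[int, int]:
    """Extract (minflt, majflt) from one /proc/PID/stat line."""
    # The comm field may contain spaces or parentheses, so split after
    # the last ')' and index the remaining fields from 'state' onward.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is state (field 3); minflt is field 10, majflt is field 12.
    return int(rest[7]), int(rest[9])


def fault_rates(pid: int, interval: float = 10.0) -> tuple[float, float]:
    """Return (minor faults/s, major faults/s) averaged over `interval` seconds."""
    path = Path(f"/proc/{pid}/stat")
    min0, maj0 = parse_faults(path.read_text())
    time.sleep(interval)
    min1, maj1 = parse_faults(path.read_text())
    return (min1 - min0) / interval, (maj1 - maj0) / interval
```

For reference, the ps samples above work out to roughly 290 to 310 major faults per second on ldap-02, against zero on ldap-01.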

But why? Arg. All the slow queries are asking for memberOf:

May 30 21:54:25 ldap-02 slapd[572833]: conn=120576 op=1 SRCH base="ou=user,dc=cpp,dc=edu" scope=2 deref=3 filter="(&(objectClass=person)(calstateEduPersonEmplID=014994057))"
May 30 21:54:25 ldap-02 slapd[572833]: conn=120576 op=1 SRCH attr=memberOf
May 30 21:56:59 ldap-02 slapd[572833]: conn=120576 op=1 SEARCH RESULT tag=101 err=0 qtime=0.000016 etime=154.273556 nentries=1 text=
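
For picking the slow ones out of a large log, something along these lines would do. It is a rough sketch that assumes each log record sits on a single unwrapped line and that the `SEARCH RESULT ... etime=` layout matches the sample above; `slow_searches` and the 1-second default threshold are arbitrary choices for illustration.

```python
import re
from typing import Iterable, Iterator

# Matches the conn/op identifiers and the etime value on a SEARCH RESULT line.
# \b keeps "qtime=" from being mistaken for "etime=".
RESULT_RE = re.compile(
    r"conn=(?P<conn>\d+) op=(?P<op>\d+) SEARCH RESULT .*?"
    r"\betime=(?P<etime>\d+\.\d+)"
)


def slow_searches(
    lines: Iterable[str], threshold: float = 1.0
) -> Iterator[tuple[int, int, float]]:
    """Yield (conn, op, etime) for searches that took longer than `threshold` seconds."""
    for line in lines:
        m = RESULT_RE.search(line)
        if m and float(m["etime"]) > threshold:
            yield int(m["conn"]), int(m["op"]), float(m["etime"])
```

The conn and op values can then be matched back to the corresponding SRCH lines to see which filter and attributes were requested.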

There's something going on with the dynlist overlay and memberOf queries,
but I still can't figure out what <sigh>. It's not a low-memory issue;
there's plenty of available memory. But for some reason the read I/O goes
through the roof. I'm pretty sure it has the same query load while it's
freaking out as it did when it was running fine.
