On Fri, 2 Mar 2018, Mateusz Guzik wrote:
sx: fix adaptive spinning broken in r327397
The condition was flipped.
In particular heavy multithreaded kernel builds on zfs started suffering
due to nested sx locks.
For instance make -s -j 128 buildkernel:
before: 3326.67s user 1269.62s system 6981% cpu 1:05.84 total
after: 3365.55s user 911.27s system 6871% cpu 1:02.24 total
What are you doing to make builds so slow? Kernel builds here take about
5 seconds. A full world takes about 120 seconds. This is with old (~5.2)
worlds. (I don't like newer worlds since they are so much slower.
Yesterday I did more than 200 makeworlds looking for a lost wakeup for
idle threads. When makeworld has a sys time of 50-100 seconds, it is
hard to see the effect of optimizations in the noise. I recently rewrote
the statistics reporting to show real, user, sys, interrupt, total = sum
of these and idle times for the whole system. Based on kern.cp_time. This
is innaccurate, but it is much better than rusage times which can't show
times for other threads or even interrupt and idle times for threads in the
build's tree. Results look like:
128.57r 566.24u 77.10s 0.83i 644.17t 384.36I 408X i386-12-cold-102-40
168.53r 559.01u 76.95s 1.07i 637.03t 711.19I 735X i386-12-cold-102-40
Here I is the idle time according to kern.cp_time and X is the idle time
according to td_runtime. The latter is much more accurate and shows that
there is an error of about 24 seconds = 24 * (stathz = 127) = 3048 ticks
in the u+s+i vs I ticks (u+s+i is too high by 24 seconds or 3 seconds per
CPU, and I is too low by 24 seconds).
The second run shows an enormous variance in the idle time -- of 327 seconds
or 41 seconds per CPU. This pessimization was mostly the result of
enabling PREEMPTION and IPI_PREEMPTION, which was supposed to decrease the
variance. Normally I turn these off to reduce context switches, and this
somehow reduces the variance to more like 5 seconds per CPU. Context switches
turned out to be mostly for waking up idle threads and this should not be
affected by PREEMPTION, and my optimization of turning off PREEMPTION makes
little difference. That it helps on average and for the best case was
determined by running many more than a few hundred makeworlds. Increasing
kern.sched.quantum makes similarly little difference. I don't normally use
that since it was a pessimization on average. Now I think I understand why
it is a pessimization: it saves a little u+s+i time by reducing context
switches, but increases I time by reducing compensation for the missing
These benchmarks are with 4BSD. ULE is slightly slower and has lower
variance but its variance is high enough to indicate missed wakeups.
I can reduce the variance using spinloops, so that idle threads never
sleep. This only works with HTT turned off (the above is wirh 4 cores
times 2 threads). With HTT on, spinloops in idle threads slow down
running threads by abiut 50%. ULE is aware of this and avoids internal
spinloops for HTT cases. 4BSD and lower levels are not aware of this,
but it takes the misconfigurations machdep.idle_mwait=0 and machdep.idle=
spin to reach the spinloop case.
Some times for ULE:
127.06r 565.81u 79.40s 0.65i 645.86t 370.58I 394X i386-12-cold-102-40
125.83r 566.72u 79.10s 0.61i 646.43t 360.24I 382X i386-12-warm-102-40
130.33r 565.56u 79.41s 0.65i 645.62t 397.02I 417X i386-12-cold-102-40
118.70r 566.55u 80.46s 0.54i 647.56t 302.05I 325X i386-12-warm-102-40
127.09r 569.36u 77.07s 0.98i 647.41t 369.28I 390X i386-12-cold-102-40
115.80r 570.67u 79.17s 0.63i 650.46t 275.90I 297X i386-12-warm-102-40
115.80r is the best time seen recently.
The u+s+i = t times average a couple of seconds higher for ULE. This is
not quite in the noise -- 2 seconds for 8 CPUs is 0.25 seconds in real time.
The noise in the I time dominates the signal in the u+s+i = t times, so
it isn't clear if ULE is doing anything better or worse. If the u+s+i
times can be trusted, then ULE's affinity optimizations are apparently
I have some optimizations in 4BSD too. These might be negative too. One
seemed to work, but only on 1 system, and recent benchmarks indicated that
this is just because it accidentally reduced lost wakeups. Its improvements
for u+s+i are in the noise.
email@example.com mailing list
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"