On 09/23/2016 03:20 AM, Robert Haas wrote:
On Thu, Sep 22, 2016 at 7:44 PM, Tomas Vondra wrote:
I don't dare to suggest rejecting the patch, but I don't see how
we could commit any of the patches at this point. So perhaps
"returned with feedback" and resubmitting in the next CF (along
with analysis of improved workloads) would be appropriate.
I think it would be useful to have some kind of theoretical analysis
of how much time we're spending waiting for various locks. So, for
example, suppose we do one run of these tests with various client
counts - say, 1, 8, 16, 32, 64, 96, 128, 192, 256 - and we run
"select wait_event from pg_stat_activity" once per second throughout
the test. Then we see how many times we get each wait event,
including NULL (no wait event). Now, from this, we can compute the
approximate percentage of time we're spending waiting on
CLogControlLock and every other lock, too, as well as the percentage
of time we're not waiting for any lock. That, it seems to me, would give
us a pretty clear idea of the maximum benefit we could hope for
from reducing contention on any given lock.
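A minimal Python sketch of the tallying step, assuming the per-second `select wait_event from pg_stat_activity` results have already been collected into one flat list (the polling loop and database driver are not shown, and filtering out idle backends or the sampling session itself is left out). `wait_event_shares` and `max_speedup` are hypothetical helper names, and the sample data is made up. `None` stands for a NULL wait_event, i.e. a backend that was not waiting; a real analysis would probably also group by the `wait_event_type` column. The speedup figure is an Amdahl-style ceiling that assumes time no longer spent waiting on a lock turns directly into useful work instead of queuing on some other lock.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def wait_event_shares(samples: Iterable[Optional[str]]) -> Dict[Optional[str], float]:
    """Fraction of all samples observed in each wait event.

    Each element is the wait_event of one backend at one sampling instant;
    None represents a NULL wait_event (backend not waiting).
    """
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {event: n / total for event, n in counts.items()}


def max_speedup(wait_share: float) -> float:
    """Upper bound on throughput gain if a wait with this share vanished entirely.

    Assumes (optimistically) that the freed time is not simply spent
    waiting on a different lock instead.
    """
    if not 0.0 <= wait_share < 1.0:
        raise ValueError("wait_share must be in [0, 1)")
    return 1.0 / (1.0 - wait_share)


if __name__ == "__main__":
    # Made-up samples, for illustration only.
    samples = [None] * 6 + ["CLogControlLock"] * 3 + ["WALWriteLock"]
    shares = wait_event_shares(samples)
    for event, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{event or 'no wait'}: {share:.0%}")
    print(f"max speedup from removing CLogControlLock waits: "
          f"{max_speedup(shares['CLogControlLock']):.2f}x")
```

With these illustrative numbers CLogControlLock accounts for 30% of the samples, so eliminating its waits entirely could raise throughput by at most about 1.43x.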
Yeah, I think that might be a good way to analyze the locks in general,
not just for these patches. A 24h run with per-second samples should give
us about 86400 samples (well, multiplied by the number of clients), which is
probably good enough.
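To put the sample count into numbers, a small sketch: one poll per second for 24 hours is 24 * 3600 = 86400 polls, each returning one row per client backend. The standard-error helper assumes the samples are independent binomial draws, which consecutive one-second samples of the same workload are not, so it should be read as an optimistic lower bound on the uncertainty. Both function names are hypothetical.

```python
import math


def sample_count(hours: float, clients: int, interval_s: float = 1.0) -> int:
    """Backend rows collected when polling pg_stat_activity every interval_s seconds."""
    polls = int(hours * 3600 / interval_s)
    return polls * clients


def share_std_error(share: float, n: int) -> float:
    """Binomial standard error of an estimated share, assuming independent samples."""
    return math.sqrt(share * (1.0 - share) / n)


if __name__ == "__main__":
    print(sample_count(24, 1))    # -> 86400 (polls in a 24h run)
    print(sample_count(24, 64))   # -> 5529600 (rows at 64 clients)
    # A wait event seen in ~10% of 86400 samples is estimated to within
    # roughly +/- 0.1 percentage points (one standard error), under the
    # independence assumption above.
    print(f"{share_std_error(0.10, sample_count(24, 1)):.4f}")
```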
We also have LWLOCK_STATS, which might be interesting too, but I'm not
sure how much it affects the behavior (and AFAIK it also only dumps the
data to the server log).
Now, we could also try that experiment with various patches. If we
can show that some patch reduces CLogControlLock contention without
increasing TPS, it might still be worth committing for that
reason. Otherwise, you could have a chicken-and-egg problem. If
reducing contention on A doesn't help TPS because of lock B and
vice versa, then does that mean we can never commit any patch to
reduce contention on either lock? Hopefully not. But I agree with you
that there's certainly not enough evidence to commit any of these
patches now. To my mind, these numbers aren't convincing.
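The chicken-and-egg situation can be illustrated with a deliberately crude toy model. It is an assumption for illustration, not a model of PostgreSQL's actual locking: suppose every transaction takes each of two hypothetical locks, A and B, exactly once in exclusive mode and holds it for a fixed time. Each lock can then serve at most 1/hold_time acquisitions per second, and throughput is capped by the tightest one, so shrinking only one hold time shows no TPS gain until the other is reduced too. Real LWLocks also have a shared mode, so the numbers are purely illustrative; `tps_ceiling` is a hypothetical name.

```python
from typing import Dict


def tps_ceiling(hold_time_us: Dict[str, float]) -> float:
    """Throughput cap when each transaction holds every lock once, exclusively.

    A lock held for h microseconds per acquisition can serve at most
    1_000_000 / h acquisitions per second; the tightest lock wins.
    """
    return min(1_000_000 / h for h in hold_time_us.values())


if __name__ == "__main__":
    baseline = {"A": 100, "B": 100}
    only_a = {"A": 50, "B": 100}     # patch reduces contention on A only
    both = {"A": 50, "B": 50}        # patches for A and B together
    print(tps_ceiling(baseline))     # 10000.0
    print(tps_ceiling(only_a))       # 10000.0 (no visible TPS change)
    print(tps_ceiling(both))         # 20000.0
```

In the A-only case TPS stays flat, yet lock A is now busy only half the time; that is the kind of reduced contention the wait-event sampling would reveal even when throughput numbers do not.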
Yes, the chicken-and-egg problem is why the tests were done with
unlogged tables (to work around the WAL lock).
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services