Thanks for having a look at this.

On Wed, 24 Jul 2019 at 04:13, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> David Rowley <david.row...@2ndquadrant.com> writes:
> > I'm pretty happy with v7 now. If anyone has any objections to it,
> > please speak up very soon. Otherwise, I plan to push it about this
> > time tomorrow.
>
> I dunno, this seems close to useless in this form. As it stands,
> once hash_get_max_bucket has exceeded the threshold, you will
> arbitrarily reset the table 1000 transactions later (since the
> max bucket is certainly not gonna decrease otherwise). So that's
> got two big problems:
>
> 1. In the assumed-common case where most of the transactions take
> few locks, you wasted cycles for 999 transactions.
>
> 2. You'll reset the table independently of subsequent history,
> even if the session's usage is pretty much always over the
> threshold. Admittedly, if you do this only once per 1K
> transactions, it's probably not a horrible overhead --- but you
> can't improve point 1 without making it a bigger overhead.
This is true, but I think you might be overestimating just how much
effort is wasted with #1. We're only seeing this overhead in small,
very fast to execute xacts. In the test case in , I was getting about
10k TPS unpatched and about 15k TPS patched. This means that, on
average, 1 unpatched xact takes 100 microseconds and 1 patched xact
takes 66 microseconds, so the additional time spent doing the
hash_seq_search() must be about 34 microseconds. So we'll waste a
total of 34 *milliseconds* if we wait for 1000 xacts before we reset
the table, and at 10k TPS we're going to react to the change in 0.1
seconds. I think you'd struggle to measure that 1 xact is taking 34
microseconds longer without running a few thousand queries.

My view is that nobody is ever going to notice that it takes 1k xacts
to shrink the table, and I've already shown that the overhead of the
shrink every 1k xacts is tiny. I mentioned 0.34% in  using v6. This
is likely a bit smaller in v7 due to swapping the order of the if
condition to put the less likely case first. Since the overhead of
rebuilding the table was 7% when done every xact, it stands to reason
that this drops to about 0.007% when done every 1k xacts, and that the
additional overhead making up that 0.34% comes from testing whether
the reset condition has been met (or from noise). That's not something
we can remove completely with any solution that resets the hash table.

> I did complain about the cost of tracking the stats proposed by
> some earlier patches, but I don't think the answer to that is to
> not track any stats at all. The proposed hash_get_max_bucket()
> function is quite cheap, so I think it wouldn't be out of line to
> track the average value of that at transaction end over the
> session's lifespan, and reset if the current value is more than
> some-multiple of the average.
> The tricky part here is that if some xact kicks that up to
> say 64 entries, and we don't choose to reset, then the reading
> for subsequent transactions will be 64 even if they use very
> few locks. So we probably need to not use a plain average,
> but account for that effect somehow. Maybe we could look at
> how quickly the number goes up after we reset?
>
> [ thinks for awhile... ] As a straw-man proposal, I suggest
> the following (completely untested) idea:
>
> * Make the table size threshold variable, not constant.
>
> * If, at end-of-transaction when the table is empty,
> the table bucket count exceeds the threshold, reset
> immediately; but if it's been less than K transactions
> since the last reset, increase the threshold (by say 10%).
>
> I think K can be a constant; somewhere between 10 and 100 would
> probably work. At process start, we should act as though the last
> reset were more than K transactions ago (i.e., don't increase the
> threshold at the first reset).

I think the problem with this idea is that once the threshold has been
enlarged, there is no way to recover so that we work well again with
workloads that require very few locks. If we end up with some large
value for the variable threshold, there's no way to decrease it again.
All this method stops is the needless hash table resets when the
typical case requires many locks. The only way to know if we can
reduce the threshold again is to count the locks released during
LockReleaseAll(). Some versions of the patch did that, and you
objected.

> The main advantage this has over v7 is that we don't have the
> 1000-transaction delay before reset, which ISTM is giving up
> much of the benefit of the whole idea. Also, if the session
> is consistently using lots of locks, we'll adapt to that after
> awhile and not do useless table resets.

True, but you neglected to mention the looming and critical drawback,
which pretty much makes that idea useless.
All we'd need to do is give this a workload that pushes that variable
threshold up high so that it can't recover. It would then be pretty
simple to show that LockReleaseAll() is still slow with workloads that
require just a small number of locks... permanently, with no means to
recover.

To be able to reduce the threshold again we'd need to make a
hash_get_num_entries(LockMethodLocalHash) call before performing the
guts of LockReleaseAll(). We could then weight that onto some running
average counter with a weight of, say... 10, so that we react to
changes fairly quickly, but not instantly. We could then have some
logic like "rebuild the hash table if the running average is 4 times
less than max_bucket".

I've attached a spreadsheet of that idea and the algorithm we could
use to track the running average. Initially, I've mocked up a series
of transactions that use 1000 locks, then at row 123 dropped that to
10 locks. If we assume max_bucket is 1000, then it takes until row
136 for the running average to drop below the max_bucket count, i.e.
13 xacts. We'd then reset to the init size of 16. If the average
went up again, then we'd automatically expand the table as we do now.

To make this work we'd need an additional call to
hash_get_num_entries() before we release the locks, so there is more
overhead.

 https://www.postgresql.org/message-id/CAKJS1f_ycJ-6QTKC70pZRYdwsAwUo+t0_CV0eXC=j-b5a2m...@mail.gmail.com

-- 
David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services