On 07.08.2020 00:33, Tomas Vondra wrote:

> Unfortunately Konstantin did not share any details about what workloads
> he tested, what config etc. But I find the "no regression" hypothesis
> rather hard to believe, because we're adding non-trivial amount of code
> to a place that can be quite hot.

Sorry that I did not explain my test scenarios.
Since Postgres is a pgbench-oriented database :) I also used pgbench:
the read-only case and skip-some-updates.
For this patch the most critical factor is the number of buffer allocations,
so I used a fairly small database (scale=100) with shared buffers set to 1GB. As a result, all data is cached in memory (in the file system cache), but there is intensive buffer eviction (swapping) at the Postgres buffer manager level. I tested with both a relatively small (100) and a large (1000) number of clients. I repeated these tests on my notebook (quad-core, 16GB RAM, SSD) and on an IBM Power2 server with about 380 virtual cores and about 1TB of memory. In the latter case the results vary a lot (I think because of the NUMA architecture), but I failed to find any noticeable regression in the patched version.


But I have to agree that adding a parallel hash (in addition to the existing buffer manager hash) is not such a good idea.
This cache quite frequently becomes a bottleneck.
My explanation for not observing any noticeable regression is that this patch uses almost the same lock partitioning scheme as the one already in use, so it does not add many new conflicts. Maybe in the case of the Power2 server the overhead of NUMA is much higher than the other factors (although the shared hash is one of the main things that suffers from the NUMA architecture). But in principle I agree that having two independent caches may reduce speed by up to a factor of two (or even more).
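
For readers unfamiliar with the term, lock partitioning can be sketched roughly as below. This is a generic illustration, not code from the patch or from Postgres: the class name PartitionedTable, the function partition_of, the key type, and the partition count of 128 are all assumptions chosen for the example. The point is only that a key's hash selects one of a fixed number of locks, so operations on keys in different partitions do not contend.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>

// Assumed partition count, chosen for illustration only.
constexpr std::size_t kNumPartitions = 128;

// Map a hash code to the index of the partition (and lock) guarding it.
inline std::size_t partition_of(std::size_t hashcode) {
    return hashcode % kNumPartitions;
}

// Hypothetical hash table split into independently locked partitions.
class PartitionedTable {
    struct Partition {
        std::mutex lock;
        std::unordered_map<std::uint64_t, int> entries;
    };
    std::array<Partition, kNumPartitions> parts_;

    Partition& partition_for(std::uint64_t key) {
        std::size_t idx = partition_of(std::hash<std::uint64_t>{}(key));
        assert(idx < kNumPartitions);
        return parts_[idx];
    }

public:
    // Insert or overwrite; only the key's own partition is locked.
    void put(std::uint64_t key, int value) {
        Partition& p = partition_for(key);
        std::lock_guard<std::mutex> guard(p.lock);
        p.entries[key] = value;
    }

    // Returns true and fills 'out' if the key is present.
    bool get(std::uint64_t key, int& out) {
        Partition& p = partition_for(key);
        std::lock_guard<std::mutex> guard(p.lock);
        auto it = p.entries.find(key);
        if (it == p.entries.end()) return false;
        out = it->second;
        return true;
    }
};
```

If a second table maps keys to partitions in a similar way, each operation takes one additional partition lock rather than a global one, which is consistent with the small number of new conflicts described above, although it still doubles the locking work per operation.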

I hope everybody will agree that this problem is really critical. Having hundreds of relations which are frequently truncated is certainly not the most common case. But quadratic complexity in the drop function is not acceptable from my point of view. And it is not only a recovery-specific problem, which is why a solution with a local cache is not enough.

I do not know a good solution to the problem. Just some thoughts:
- We can somehow combine the locking used for the main buffer manager cache (by relid/blockno) with the cache for relid. It would eliminate the double locking overhead.
- We can use something like a sorted tree (like std::map) instead of a hash. It would allow locating blocks both by relid/blockno and by relid only.
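
A minimal sketch of the sorted-tree idea, assuming a key of (relid, blockno) ordered lexicographically. All names here (BufferKey, BufferIndex, lookup_block, buffers_of_relation, drop_relation) are hypothetical and only illustrate the data structure; the sketch ignores locking and shared memory, which a real buffer manager would have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types for illustration. std::pair compares lexicographically,
// so all blocks of one relation are adjacent in the ordered map.
using RelId = std::uint32_t;
using BlockNo = std::uint32_t;
using BufferId = int;
using BufferKey = std::pair<RelId, BlockNo>;
using BufferIndex = std::map<BufferKey, BufferId>;

// Point lookup by (relid, blockno); returns -1 if the block is not cached.
BufferId lookup_block(const BufferIndex& index, RelId rel, BlockNo blk) {
    auto it = index.find(BufferKey(rel, blk));
    return it == index.end() ? -1 : it->second;
}

// Lookup by relid only: collect the buffers of every cached block of rel,
// in block-number order.
std::vector<BufferId> buffers_of_relation(const BufferIndex& index, RelId rel) {
    std::vector<BufferId> out;
    for (auto it = index.lower_bound(BufferKey(rel, 0));
         it != index.end() && it->first.first == rel; ++it) {
        out.push_back(it->second);
    }
    return out;
}

// Remove all entries of rel; returns the number of entries removed.
std::size_t drop_relation(BufferIndex& index, RelId rel) {
    auto first = index.lower_bound(BufferKey(rel, 0));
    auto last = first;
    std::size_t removed = 0;
    while (last != index.end() && last->first.first == rel) {
        ++last;
        ++removed;
    }
    index.erase(first, last);
    // Postcondition: no entry of rel remains.
    assert(buffers_of_relation(index, rel).empty());
    return removed;
}
```

With std::map a point lookup costs O(log n) instead of the expected O(1) of a hash table, but visiting the k cached blocks of one relation costs O(log n + k) rather than a scan over everything. Whether that trade-off is acceptable on the hot lookup path is exactly the open question.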

