Optimize hash index bulk-deletion with streaming read

This commit refactors hashbulkdelete() to use streaming reads, improving the efficiency of the operation by prefetching upcoming buckets while the current bucket is being processed.

Some specific changes are required to make sure that the cleanup work happens in accordance with the data pushed to the stream read callback. When the cached metadata page is refreshed so that the next set of buckets can be processed, the stream is reset and the data fed to the stream read callback has to be updated. The reset needs to happen in the two code paths where _hash_getcachedmetap() is called.
The author has seen better performance numbers than I have on this one (with tweaks similar to 6c228755add8). The numbers are good enough for both of us that this change is worth doing, in terms of IO and runtime.

Author: Xuneng Zhou <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=k0rzexnuw6u_ggqzsju32w...@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/bfa3c4f106b1fb858ead1c8f05332f09d34f664a

Modified Files
--------------
src/backend/access/hash/hash.c   | 80 ++++++++++++++++++++++++++++++++++++++--
src/tools/pgindent/typedefs.list |  1 +
2 files changed, 78 insertions(+), 3 deletions(-)
