> What I'm thinking of is the regular indexscan that's done internally
> by get_actual_variable_range, not whatever ends up getting chosen as
> the plan for the user query. I had supposed that that would kill
> dead index entries as it went, but maybe that's not happening for
> some reason.
Really, this happens as you said: the index entries do get marked as dead.
But after that, backends still spend CPU time skipping these killed entries
in _bt_checkkeys:
    if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
    {
        /* return immediately if there are more tuples on the page */
        if (ScanDirectionIsForward(dir))
        {
            if (offnum < PageGetMaxOffsetNumber(page))
                return NULL;
        }
        else
        {
            BTPageOpaque opaque = (BTPageOpaque)
                PageGetSpecialPointer(page);

            if (offnum > P_FIRSTDATAKEY(opaque))
                return NULL;
        }
    }
This is confirmed by the perf records and the backtrace Vladimir reported earlier:
root@pgload01e ~ # perf report | grep -v '^#' | head
56.67% postgres postgres [.] _bt_checkkeys
19.27% postgres postgres [.] _bt_readpage
2.09% postgres postgres [.] pglz_decompress
2.03% postgres postgres [.] LWLockAttemptLock
1.61% postgres postgres [.] PinBuffer.isra.3
1.14% postgres postgres [.] hash_search_with_hash_value
0.68% postgres postgres [.] LWLockRelease
0.42% postgres postgres [.] AllocSetAlloc
0.40% postgres postgres [.] SearchCatCache
0.40% postgres postgres [.] ReadBuffer_common
root@pgload01e ~ #
So it seems that killing the dead index tuples does not solve this problem: every scan still has to step over each killed entry before it finds a live one.