David Rowley <[email protected]> writes:
> On Thu, 27 Nov 2025 at 07:26, Tom Lane <[email protected]> wrote:
>> I checked the costing calculations and it's basically that
>> genericcostestimate doesn't understand about hash buckets.
> I assume you must mean using your "x % 100" case rather than Sergei's case.

Right.

>> ... tuples in the index, so it supposes that that means we'll visit
>> 1/100th of the index's pages (rounded up to 1).
> I'm not so clear on why this is bad. The index has 1000 tuples on 6
> pages and we want to scan 1% of the rows in the index. As a result,
> genericcostestimate() calculates that's 1 page. How many pages would
> you expect to scan?
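
Just to spell out the arithmetic (a simplified sketch of the page
estimate, not the exact selfuncs.c code):

    -- ~1% selectivity over an index of 1000 tuples on 6 pages:
    --   estimated tuples fetched = 0.01 * 1000 = 10
    --   estimated pages visited  = ceil(10 * 6 / 1000) = ceil(0.06) = 1
    SELECT ceil(0.01 * 1000 * 6 / 1000.0) AS est_index_pages;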

To be clear, neither of genericcostestimate's estimates is bad as
a generic estimate.
with a hash-aware estimate implemented in hashcostestimate. I think
the key thing about this test case is that the partial index ends
up with all its entries in one hash bucket. I'm not sure that we
could reasonably expect to know that in the cost estimator. Even
if we could, should we really expend a lot of effort on the case?
It's a textbook example of when you should not use a hash index.
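
For concreteness, the sort of shape I mean looks roughly like this
(a hypothetical sketch with made-up names, not Sergei's actual test
case):

    -- Hypothetical table, just to illustrate the shape of the problem.
    CREATE TABLE t (x int);
    INSERT INTO t SELECT i FROM generate_series(1, 100000) i;
    -- Partial hash index whose ~1000 entries all have key value 0,
    -- so they all land in a single hash bucket.
    CREATE INDEX t_hash_idx ON t USING hash ((x % 100)) WHERE x % 100 = 0;
    ANALYZE t;
    -- The cost estimator has no way to know the matching entries are
    -- piled into one bucket rather than spread across the index.
    EXPLAIN SELECT * FROM t WHERE x % 100 = 0;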

I'm interested in fixing this can't-generate-this-plan-shape bug
because it probably impacts more-realistic use cases. But I
think the cost estimation problem is probably specific to cases
where you shouldn't have used a hash index, so I'm okay with
ignoring that part. Until more evidence arrives, anyway.

			regards, tom lane