Hi

> On Jan 22, 2026, at 04:44, Ilia Evdokimov <[email protected]> 
> wrote:
>
> Nice catch - this is indeed not first time we run into an O(N^2) bottleneck 
> in MCV array and address it with hash-based lookup. Thanks for working on 
> this!

Thanks for the review. This change was actually inspired by your earlier
patch on a similar issue, so thanks a lot for that as well.

> I have a couple of follow-up questions after a quick look at the patch:
> 1. The hash table is created but I do not see a corresponding destroy call.

The hash table is allocated in the per-column temporary memory context
(the “Analyze Column” context that is active while compute_stats()
runs). That context is MemoryContextReset() after each column and
eventually deleted at the end of ANALYZE, so the table is reclaimed
automatically.

This matches the existing convention noted at the end of
compute_distinct_stats() (“We don’t need to bother cleaning up any of
our temporary palloc’s”).

That said, I can add an explicit DistinctHash_destroy() before returning
for clarity, although it shouldn’t be required for correctness or leak
prevention.

> 2. In the original code, when a value was not found using the nested-loop 
> search, the singleton (count = 1) region was shifted to make room for the new 
> entry. After the patch, I no longer see this shifting logic. I might be 
> missing something, but it looks like the non-hash path now follows the same 
> replacement behavior as the hash-based one.

In the original code, inserting a new distinct value into the singleton
(count = 1) region effectively evicted the oldest singleton (FIFO) by
inserting at the head of that region and shifting the tail, which is
O(n) per new distinct value.

The patch replaces this with a FIFO eviction cursor (effectively a
circular / round-robin cursor) over the count = 1 region, avoiding
repeated shifts. When hashing is enabled, this also avoids having to
update many hash→index mappings due to element shifts.

I kept the non-hash path using the same mechanism so that the hash and
fallback paths follow the same replacement behavior, with the intention
of preserving the original FIFO semantics while avoiding the O(n) cost.

I’ve also posted a v2 update. It adjusts the eviction cursor when a
count = 1 value is promoted (via the existing bubble-up logic), so that
the cursor-based eviction is now exactly equivalent to the original
shift-based FIFO behavior.

I’ve also updated the relevant comments to clarify this behavior. The v2
patch is attached.


--
Best regards,
Chengpeng Yan

Attachment: v2-0001-ANALYZE-speed-up-MCV-tracking-for-equality-only-t.patch
Description: v2-0001-ANALYZE-speed-up-MCV-tracking-for-equality-only-t.patch

Reply via email to