Hi > On Jan 22, 2026, at 04:44, Ilia Evdokimov <[email protected]> > wrote: > > Nice catch - this is indeed not first time we run into an O(N^2) bottleneck > in MCV array and address it with hash-based lookup. Thanks for working on > this!
Thanks for the review. This change was actually inspired by your earlier patch on a similar issue, so thanks a lot for that as well. > I have a couple of follow-up questions after a quick look at the patch: > 1. The hash table is created but I do not see a corresponding destroy call. The hash table is allocated in the per-column temporary memory context (the “Analyze Column” context that is active while compute_stats() runs). That context is MemoryContextReset() after each column and eventually deleted at the end of ANALYZE, so the table is reclaimed automatically. This matches the existing convention noted at the end of compute_distinct_stats() (“We don’t need to bother cleaning up any of our temporary palloc’s”). That said, I can add an explicit DistinctHash_destroy() before returning for clarity, although it shouldn’t be required for correctness or leak prevention. > 2. In the original code, when a value was not found using the nested-loop > search, the singleton (count = 1) region was shifted to make room for the new > entry. After the patch, I no longer see this shifting logic. I might be > missing something, but it looks like the non-hash path now follows the same > replacement behavior as the hash-based one. In the original code, inserting a new distinct value into the singleton (count = 1) region effectively evicted the oldest singleton (FIFO) by inserting at the head of that region and shifting the tail, which is O(n) per new distinct value. The patch replaces this with a FIFO eviction cursor (effectively a circular / round-robin cursor) over the count = 1 region, avoiding repeated shifts. When hashing is enabled, this also avoids having to update many hash→index mappings due to element shifts. I kept the non-hash path using the same mechanism so that the hash and fallback paths follow the same replacement behavior, with the intention of preserving the original FIFO semantics while avoiding the O(n) cost. I’ve also posted a v2 update. It adjusts the eviction cursor when a count = 1 value is promoted (via the existing bubble-up logic), so that the cursor-based eviction is now exactly equivalent to the original shift-based FIFO behavior. I’ve also updated the relevant comments to clarify this behavior. The v2 patch is attached. -- Best regards, Chengpeng Yan
v2-0001-ANALYZE-speed-up-MCV-tracking-for-equality-only-t.patch
Description: v2-0001-ANALYZE-speed-up-MCV-tracking-for-equality-only-t.patch
