Hi Chengpeng,

On 21.01.2026 17:13, Chengpeng Yan wrote:
`compute_distinct_stats()` is used for data types that have an equality
operator but no ordering (so we can't use the sort-based path).  It
keeps a `track[]` array of candidate MCVs and, for each sampled row,
searches that array for a match.  With high `statistics_target`, that
becomes O(n) per sample and can dominate ANALYZE time.

Nice catch - this is indeed not first time we run into an O(N^2) bottleneck in MCV array and address it with hash-based lookup. Thanks for working on this!

I have a couple of follow-up questions after a quick look at the patch:

1. The hash table is created but I do not see a corresponding destroy call.
2. In the original code, when a value was not found using the nested-loop search, the singleton (count = 1) region was shifted to make room for the new entry. After the patch, I no longer see this shifting logic. I might be missing something, but it looks like the non-hash path now follows the same replacement behavior as the hash-based one.

I’ll need a bit more time for a deeper review and some local benchmarks, but overall the approach looks interesting.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

Reply via email to