Have you benchmarked this change beyond the results in the first message of this thread?

While reviewing the patch more closely, I noticed that compute_distinct_stats() is only used for types that have an equality operator (=) but no ordering operator (<). In practice, most common scalar types go through compute_scalar_stats() instead.
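
To make the dispatch concrete, here is a simplified standalone sketch of how I understand the choice to be made in std_typanalyze(). It is not the actual source: the StatsPath enum and the choose_stats_path() helper are hypothetical names used only for illustration, and the two booleans stand in for the operator lookups on the column's type.

```c
#include <stdbool.h>

/* Hypothetical labels for the three compute_*_stats() routines. */
typedef enum StatsPath
{
    STATS_SCALAR,    /* stands for compute_scalar_stats() */
    STATS_DISTINCT,  /* stands for compute_distinct_stats() */
    STATS_TRIVIAL    /* stands for compute_trivial_stats() */
} StatsPath;

/*
 * Hypothetical helper modelling the dispatch: has_eq and has_lt say
 * whether the type has a usable "=" and "<" operator, respectively.
 */
static StatsPath
choose_stats_path(bool has_eq, bool has_lt)
{
    if (has_eq && has_lt)
        return STATS_SCALAR;    /* sortable type: the common case */
    else if (has_eq)
        return STATS_DISTINCT;  /* equality only: the path the patch touches */
    else
        return STATS_TRIVIAL;   /* no equality operator at all */
}
```

Under this model, a type with both operators never reaches the distinct path, which is why I expect the patched code to be exercised only by equality-only types.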

That makes me wonder how often this optimization would actually trigger in real workloads. Since compute_scalar_stats() is the more common path, there's a chance that the hash-table-based improvement in compute_distinct_stats() will not provide a noticeable overall benefit.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/


