Hi,

`compute_distinct_stats()` is used for data types that have an equality
operator but no ordering (so we can't use the sort-based path).  It
keeps a `track[]` array of candidate MCVs and, for each sampled row,
searches that array for a match.  With high `statistics_target`, that
becomes O(n) per sample and can dominate ANALYZE time.

This patch keeps the existing behavior but adds a fast lookup path:

- When `attstattarget >= 100`, and the type's default hash support
matches the equality operator, build a small `simplehash` table mapping
a tracked Datum to its current `track[]` slot (average O(1) match
lookup).

- Otherwise, fall back to the existing linear scan.

While here, the singleton-eviction logic changes from repeatedly
shifting the count=1 region to a simple round-robin cursor over that
region.  This keeps replacement O(1) and (when hashing is enabled)
avoids having to update many hash->index mappings.

Performance

I used a small harness focusing on this code path (xid and aclitem, with
match-heavy / unique-heavy / Zipf-ish distributions).

Setup:
- MacBook Pro (Apple M4 Max, 36GB RAM), macOS 26.1 (Darwin 25.1.0)
- Reported numbers are median of 5 runs
- For the "after" numbers below, I set the hash threshold to 0 to show
the potential benefit; the patch as attached enables hashing only for
`attstattarget >= 100`.

Results (ms; before -> after):

obj | statistics_target | before_ms | after_ms | speedup_x
----+------------------+----------+---------+---------
bench_aclitem.x_match |               25 |  174.154 | 182.346 |     0.96
bench_aclitem.x_match |               50 |   55.708 |  55.693 |     1.00
bench_aclitem.x_match |              100 |   74.311 |  63.978 |     1.16
bench_aclitem.x_match |              200 |  143.621 |  86.790 |     1.65
bench_aclitem.x_match |              500 |  462.616 | 125.083 |     3.70
bench_aclitem.x_match |             1000 | 1590.918 | 202.849 |     7.84
bench_aclitem.x_match |             2000 | 5857.253 | 406.840 |    14.40
bench_aclitem.x_match |             5000 | 30844.177 | 1678.826 |    18.37
bench_aclitem.x_match |            10000 | 73134.141 | 9207.071 |     7.94
bench_aclitem.x_unique |               25 |   18.214 |  17.962 |     1.01
bench_aclitem.x_unique |               50 |   39.305 |  37.045 |     1.06
bench_aclitem.x_unique |              100 |   76.555 |  64.800 |     1.18
bench_aclitem.x_unique |              200 |  151.416 |  90.619 |     1.67
bench_aclitem.x_unique |              500 |  497.243 | 134.351 |     3.70
bench_aclitem.x_unique |             1000 | 1788.755 | 209.299 |     8.55
bench_aclitem.x_unique |             2000 | 7249.638 | 332.865 |    21.78
bench_aclitem.x_unique |             5000 | 50785.046 | 604.944 |    83.95
bench_aclitem.x_unique |            10000 | 195506.022 | 792.658 |   246.65
bench_aclitem.x_zipf |               25 |   17.451 |  19.770 |     0.88
bench_aclitem.x_zipf |               50 |   37.131 |  35.560 |     1.04
bench_aclitem.x_zipf |              100 |   76.053 |  67.494 |     1.13
bench_aclitem.x_zipf |              200 |  145.435 | 100.484 |     1.45
bench_aclitem.x_zipf |              500 |  429.574 | 131.647 |     3.26
bench_aclitem.x_zipf |             1000 | 1313.187 | 223.918 |     5.86
bench_aclitem.x_zipf |             2000 | 4117.637 | 445.521 |     9.24
bench_aclitem.x_zipf |             5000 | 14863.325 | 1189.886 |    12.49
bench_aclitem.x_zipf |            10000 | 31706.434 | 2269.572 |    13.97
bench_xid.x_match |               25 |   22.136 |  56.767 |     0.39
bench_xid.x_match |               50 |   42.182 |  40.947 |     1.03
bench_xid.x_match |              100 |   81.089 |  67.848 |     1.20
bench_xid.x_match |              200 |  118.315 |  80.993 |     1.46
bench_xid.x_match |              500 |  403.674 | 120.421 |     3.35
bench_xid.x_match |             1000 | 1246.068 | 171.385 |     7.27
bench_xid.x_match |             2000 | 4374.122 | 406.823 |    10.75
bench_xid.x_match |             5000 | 21961.532 | 1903.715 |    11.54
bench_xid.x_match |            10000 | 55453.808 | 11035.402 |     5.03
bench_xid.x_unique |               25 |   21.062 |  20.011 |     1.05
bench_xid.x_unique |               50 |   40.473 |  41.414 |     0.98
bench_xid.x_unique |              100 |   79.711 |  67.382 |     1.18
bench_xid.x_unique |              200 |  125.255 |  72.853 |     1.72
bench_xid.x_unique |              500 |  412.332 |  96.030 |     4.29
bench_xid.x_unique |             1000 | 1373.882 | 144.837 |     9.49
bench_xid.x_unique |             2000 | 5084.898 | 277.216 |    18.34
bench_xid.x_unique |             5000 | 31412.454 | 574.693 |    54.66
bench_xid.x_unique |            10000 | 126639.495 | 804.947 |   157.33
bench_xid.x_zipf |               25 |   21.915 |  20.758 |     1.06
bench_xid.x_zipf |               50 |   41.458 |  39.498 |     1.05
bench_xid.x_zipf |              100 |   78.128 |  70.466 |     1.11
bench_xid.x_zipf |              200 |  118.361 |  79.816 |     1.48
bench_xid.x_zipf |              500 |  351.440 | 114.009 |     3.08
bench_xid.x_zipf |             1000 | 1049.475 | 173.981 |     6.03
bench_xid.x_zipf |             2000 | 3162.409 | 404.565 |     7.82
bench_xid.x_zipf |             5000 | 11170.524 | 1346.067 |     8.30
bench_xid.x_zipf |            10000 | 23928.836 | 2549.381 |     9.39

The bench harness (bench/) is attached for reproduction.

Patch attached.


--
Best regards,
Chengpeng Yan




Attachment: v1-0001-ANALYZE-speed-up-MCV-tracking-for-equality-only-t.patch
Description: v1-0001-ANALYZE-speed-up-MCV-tracking-for-equality-only-t.patch

<<attachment: bench.zip>>

Reply via email to