I ran a benchmark on the patch *Reduce pg_class scans in GRANT/REVOKE
  ON ALL TABLES IN SCHEMA* (collapses 5 per-relkind pg_class heap scans
  into 1 scan distributed into per-relkind buckets). Summary below.(It took
much time to tests)


  ## Assumptions

  - Two builds of PostgreSQL 19devel from the same source tree (one
    patched, one at master tip), identical compile flags, separate
    --prefix.
  - Separate data directories, run sequentially on an otherwise idle
    host.
  - GUCs: shared_buffers=2GB, max_locks_per_transaction=100000,
    fsync=off, synchronous_commit=off, full_page_writes=off,
    autovacuum=off.
  - bench_s schema contains N empty tables (CREATE TABLE t_i()).
  - Measured operations:
      GRANT  SELECT ON ALL TABLES IN SCHEMA bench_s TO   bench_role
      REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA bench_s FROM bench_role
  - Best of 3 runs reported (seconds).
  - Two scenarios:
      A. Clean catalog   — VACUUM FULL pg_class; VACUUM FULL pg_attribute
                           immediately before measurement. pg_class
                           densely packed.
      B. Bloated catalog — pre-bloat phase: GRANT+REVOKE on all N tables
                           repeated C cycles, no VACUUM. Both patched
                           and master operate on catalogs with identical
                           relpages and n_dead_tup.


  ## Results — Scenario A (Clean catalog)

  macOS (Apple Silicon), best of 3, seconds:

    ntables | GRANT patched | GRANT master | REVOKE patched | REVOKE master
    --------+---------------+--------------+----------------+--------------
     20,000 |     0.116     |    0.115     |     0.231      |    0.226
     40,000 |     0.250     |    0.250     |     0.460      |    0.460
    100,000 |     0.730     |    0.678     |     1.193      |    1.193

  Honestly, there is no measurable performance difference in the clean
  state. Patched and master are statistically indistinguishable within
  run-to-run noise. This matches the design of the patch: when pg_class
  is densely packed, repeating a small seq scan five times is cheap, so
  collapsing it into one has nothing meaningful to save. The patch adds
  no overhead either — worst case is a tie.


  ## Results — Scenario B (Bloated catalog)

  ### Linux x86_64, C=20, best of 3, seconds

    ntables   | dead_tup  | GRANT patched | GRANT master |   Δ     | REVOKE
patched | REVOKE master |   Δ

----------+-----------+---------------+--------------+---------+----------------+---------------+---------
       10,000 |         0 |    0.0924     |    0.0935    |  −1.2 % |
0.1668     |    0.1696     |  −1.6 %
       20,000 |   109,825 |    0.2027     |    0.2069    |  −2.0 % |
0.3381     |    0.3533     |  −4.3 %
       50,000 |   329,468 |    0.5555     |    0.5895    |  −5.8 % |
0.8901     |    0.9371     |  −5.0 %
      100,000 |   879,311 |    1.1732     |    1.1968    |  −2.0 % |
1.8808     |    1.9555     |  −3.8 %
      200,000 | 1,978,925 |    2.2188     |    2.3470    |  −5.5 % |
3.7290     |    3.9064     |  −4.5 %
      500,000 | 4,178,604 |    6.0260     |    6.6663    |  −9.6 % |
9.8162     |   10.2169     |  −3.9 %
    1,000,000 | 9,678,399 |   12.9241     |   14.7657    | −12.5 % |
 24.8893     |   28.7566     | −13.4 %

  ### macOS (Apple Silicon), C=20 (C=10 at 1M), best of 3, seconds

    ntables   | dead_tup  | GRANT patched | GRANT master |   Δ    | REVOKE
patched | REVOKE master |   Δ

----------+-----------+---------------+--------------+--------+----------------+---------------+--------
       20,000 |   299,960 |    0.168      |    0.163     |  +3 %  |
0.260      |    0.278      |  −6 %
       40,000 |   519,601 |    0.307      |    0.307     |   0 %  |
0.552      |    0.564      |  −2 %
      100,000 |   959,268 |    0.784      |    0.934     | −16 %  |
1.405      |    1.419      |  ~0 %
      200,000 | 2,058,886 |    1.787      |    1.878     |  −5 %  |
2.745      |    2.849      |  −4 %
      500,000 | 4,258,565 |    4.727      |    5.197     |  −9 %  |
7.126      |    7.908      | −10 %
    1,000,000 | 9,758,364 |   10.977      |   11.126     |  −1 %  |
 19.473      |   20.759      |  −6 %

  Negative Δ = patched faster. Under catalog bloat the patch produces a
  consistent, reproducible improvement on both operating systems.


  Happy to share the bench scripts and raw logs on request.

  Thanks,
  charsyam

2026년 4월 13일 (월) 오전 9:43, Michael Paquier <[email protected]>님이 작성:

> On Sun, Apr 12, 2026 at 04:22:24PM +0900, CharSyam wrote:
> >   Benchmark
> >   ---------
> >   This is a targeted micro-optimization, not a dramatic speedup.
> >   With 10,000 tables in a single schema (pg_class ~10,452 rows),
> >   running GRANT/REVOKE SELECT ON ALL TABLES IN SCHEMA in a loop
> >   (6 iterations, first dropped as warmup), I measured a consistent
> >   ~15% reduction in end-to-end time:
> >
> >                        baseline    patched     delta
> >       GRANT  (avg)     88.2 ms     75.9 ms    -14%
> >       REVOKE (avg)    134.9 ms    115.7 ms    -14%
>
> I am pretty sure that there are users with millions of relations in a
> single schema that could benefit from that.  At least that would not
> be surprising with partitioning these days, and foreign tables.  What
> kind of numbers do you get if you bump up the number of digits for
> these tests.  Let's say a comparison based on a few million relations
> at least?
>
> The change you are proposing looks simple enough, quickly skimming
> through the patch.  There may be more optimizations doable here, I
> have not looked at that, still I tend to like such micro-optimization
> proposals as they provide a silent benefit.
> --
> Michael
>

Reply via email to