Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

2024-03-21 Thread Dominique Devienne
On Thu, Mar 21, 2024 at 8:10 AM alex work  wrote:

> We encounter slow `GRANT ROLES` only on PostgreSQL 16 instances up to 42
> seconds
> in production, the client process at PostgresSQL would use 100% of the
> CPU. [...]
>
Using ROLE `acc`, grant `d_` ROLE to a session ROLE:
> real0m7.579s [...]

PostgreSQL 15
> Using ROLE `acc`, grant `d_` ROLE to a session ROLE:
> real0m0.077s
>

Ouch, that's a ~ 100x regression. Thanks for the write-up, that's worrying.
We don't have as many ROLEs, but we do have plenty, so this is worrying.

On top of the v16 ROLE changes breaking on ROLE logic, which was fine prior
(v12-v15).
We've paused for now our planned v16 upgrade, until we have more time to
adapt.

Like you, I welcome the changes. But it turns out more expensive to adapt
to them.
And your report certainly makes me wonder whether we should hold off until
that perf regression is addressed.

Thanks, --DD


Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

2024-03-21 Thread alex work
Hello,

we run multiple versions of PostgreSQL instances on production. Some time ago
we add new physical servers and decided to go with latest GA from pgdg APT
repository, that is PostgreSQL 16.

We encounter slow `GRANT ROLES` only on PostgreSQL 16 instances up to 42 seconds
in production, the client process at PostgresSQL would use 100% of the CPU.
Which is a surprise compared to other instances running older PostgreSQL
releases. On production we have a *LOT* of ROLEs, which unfortunately a case
that we did not test before switching the new servers into production mode.


The Application & ROLEs
---
Our application make use of ROLEs. We create group ROLEs for each tenant of our
application, these ROLEs are named with `d_` and `a_` prefix.

A special ROLE, called `acc`, it will be a member to each of these `d_` and
`a_` ROLEs.

The application have a concept of "session", which it would mantain and I think
outside the scope of this e-mail. In relation to PostgreSQL, the application
would create a PostgreSQL ROLE that corresponds to its own (application)
session. It would name these ROLEs with `s_` prefix, which CREATEd and
GRANTed its permission on every application's "session".

When an application "session" started, user with `acc` ROLE would grant
membersip of `d_` ROLE to `s_` ROLE (ie. GRANT ROLE `d_` TO `s_`;)

To make this clear, for example, we have (say) role `d_202402` already existing
and application would create a new ROLE `s_001` which corresponds to
application's "session". Application that connects with special ROLE `acc`
would GRANT ROLE `d_202402` to the ROLE `s_001`, like so:

GRANT d_202402 TO s_001;


In production we have up to 13 thousands of these ROLEs, each:

$ sudo -u postgres psql -p 5531
psql (16.2 (Debian 16.2-1.pgdg120+2))
Type "help" for help.

postgres=# select count(*) s_roles_count from pg_catalog.pg_authid
where rolname like 's_%';
s_roles_count
---
13299
(1 row)

postgres=# select count(*) a_roles_count from pg_catalog.pg_authid
where rolname like 'a_%';
a_roles_count
---
12776
(1 row)

postgres=# select count(*) d_roles_count from pg_catalog.pg_authid
where rolname like 'd_%';
d_roles_count
---
13984
(1 row)


The Setup
-

Investigating this slow `GRANT ROLE` we start a VM running Debian 11,
and create a lot of roles.

create special `acc` role and write to some file:
$ echo -e "CREATE ROLE acc WITH LOGIN NOSUPERUSER INHERIT CREATEDB
CREATEROLE NOREPLICATION;\n\n" > create_acc.sql

create a lot of `a_` roles and make sure `acc` is member of each one of them:
$ for idx1 in $(seq -w 1 100); do for idx2 in $(seq -w 1 12); do for
idx3 in $(seq -w 1 10); do echo "CREATE ROLE a_${idx1}${idx2}${idx3}
WITH NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;"; echo "GRANT
a_${idx1}${idx2}${idx3} TO acc WITH ADMIN OPTION;"; done; done; done >
create_a.sql

create a lot of `d_` roles and make sure `acc` is member of each one of them:
$ for idx1 in $(seq -w 1 100); do for idx2 in $(seq -w 1 12); do for
idx3 in $(seq -w 1 10); do echo "CREATE ROLE d_${idx1}${idx2}${idx3}
WITH NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;"; echo "GRANT
d_${idx1}${idx2}${idx3} TO acc WITH ADMIN OPTION;"; done; done; done >
create_d.sql

create a lot of `s_` roles:
$ for idx1 in $(seq -w 1 100); do for idx2 in $(seq -w 1 12); do for
idx3 in $(seq -w 1 10); do echo "CREATE ROLE s_${idx1}${idx2}${idx3}
WITH NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;"; done; done;
done > create_s.sql

merge ROLE creation into one file:
$ cat create_acc.sql create_a.sql create_d.sql create_s.sql >
/tmp/create-roles.sql


PostgreSQL 16
-

Install PostgreSQL 16:
--
$ sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt
$(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
$ sudo apt install gnupg2
$ wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc
| sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get -y install postgresql-16 postgresql-client-16


Create PostgreSQL 16 instance:
--
$ sudo pg_dropcluster --stop 16 main  # drop default Debian cluster
$ sudo pg_createcluster 16 pg16
$ echo "local   allacc trust" | sudo tee
/etc/postgresql/16/pg16/pg_hba.conf
$ echo "local   allpostgrespeer"  | sudo tee -a
/etc/postgresql/16/pg16/pg_hba.conf
$ sudo systemctl start postgresql@16-pg16.service


Import lots of roles:
--
$ sudo -u postgres /usr/lib/postgresql/16/bin/psql -f
/tmp/create-roles.sql -p 5432 -d postgres


Using ROLE `acc`, grant `d_` ROLE to a session ROLE:
--
$ time sudo -u postgres /usr/lib/postgresql/16/bin/psql -U acc
postgres -c 'GRANT d_0010109 TO s_0010109;'
GRANT ROLE

real0m7.579s
user0m0.054s
sys 0m0.020s


This is the surprising behavior for PostgreSQL 16. It seems there's a new logic
in PostgreSQL that checks against each role, and it took 100% of CPU.

At this point we know `acc` is just another