Re: [PERFORM] Performance of information_schema with many schemata and tables

Ulf Lohbrügge Wed, 28 Jun 2017 07:26:16 -0700

2017-06-28 10:43 GMT+02:00 Pritam Baral <pri...@pritambaral.com>:

>
>
> On Wednesday 28 June 2017 02:00 PM, Ulf Lohbrügge wrote:
> > Nope, I didn't try that yet. But I don't have the impression that
> reindexing the indexes in information_schema will help. The table
> information_schema.tables consists of the following indexes:
> >
> >     "pg_class_oid_index" UNIQUE, btree (oid)
> >     "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
> >     "pg_class_tblspc_relfilenode_index" btree (reltablespace,
> relfilenode)
>
> information_schema.tables is not a table, it's a view; at least on 9.5[0].
> These indexes you list are actually indexes on the pg_catalog.pg_class
> table.
>


Yes, it's a view. \d+ information_schema.tables gives:

View definition:
 SELECT current_database()::information_schema.sql_identifier AS
table_catalog,
    nc.nspname::information_schema.sql_identifier AS table_schema,
    c.relname::information_schema.sql_identifier AS table_name,
        CASE
            WHEN nc.oid = pg_my_temp_schema() THEN 'LOCAL TEMPORARY'::text
            WHEN c.relkind = 'r'::"char" THEN 'BASE TABLE'::text
            WHEN c.relkind = 'v'::"char" THEN 'VIEW'::text
            WHEN c.relkind = 'f'::"char" THEN 'FOREIGN TABLE'::text
            ELSE NULL::text
        END::information_schema.character_data AS table_type,
    NULL::character varying::information_schema.sql_identifier AS
self_referencing_column_name,
    NULL::character varying::information_schema.character_data AS
reference_generation,
        CASE
            WHEN t.typname IS NOT NULL THEN current_database()
            ELSE NULL::name
        END::information_schema.sql_identifier AS user_defined_type_catalog,
    nt.nspname::information_schema.sql_identifier AS
user_defined_type_schema,
    t.typname::information_schema.sql_identifier AS user_defined_type_name,
        CASE
            WHEN c.relkind = 'r'::"char" OR (c.relkind = ANY
(ARRAY['v'::"char", 'f'::"char"])) AND
(pg_relation_is_updatable(c.oid::regclass, false) & 8) = 8 THEN 'YES'::text
            ELSE 'NO'::text
        END::information_schema.yes_or_no AS is_insertable_into,
        CASE
            WHEN t.typname IS NOT NULL THEN 'YES'::text
            ELSE 'NO'::text
        END::information_schema.yes_or_no AS is_typed,
    NULL::character varying::information_schema.character_data AS
commit_action
   FROM pg_namespace nc
     JOIN pg_class c ON nc.oid = c.relnamespace
     LEFT JOIN (pg_type t
     JOIN pg_namespace nt ON t.typnamespace = nt.oid) ON c.reloftype = t.oid
  WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char"]))
AND NOT pg_is_other_temp_schema(nc.oid) AND (pg_has_role(c.relowner,
'USAGE'::text) OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE,
DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
has_any_column_privilege(c.oid, 'SELECT, INSERT, UPDATE,
REFERENCES'::text));


>
> >
> > The costly sequence scan in question on pg_class happens with the
> following WHERE clause:
> >
> > WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char"]))
> AND NOT pg_is_other_temp_schema(nc.oid) AND (pg_has_role(c.relowner,
> 'USAGE'::text) OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE,
> DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR 
> has_any_column_privilege(c.oid,
> 'SELECT, INSERT, UPDATE, REFERENCES'::text));
>
> This is not the bottleneck WHERE clause the query plan from your first
> mail shows. That one is:
>
> ((relkind = ANY ('{r,v,f}'::"char"[])) AND (((relname)::information_
> schema.sql_identifier)::text = 'bar'::text) AND (pg_has_role(relowner,
> 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE,
> TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid,
> 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
>

The part you copied is from the EXPLAIN ANALYZE output. The WHERE clause I
posted earlier (or see view definition) above does unfortunately not
contain the relname.


>
> I can say with certainty that an index on pg_catalog.pg_class.relname is
> going to speed this up. Postgres doesn't allow modifying system catalogs,
> but the `REINDEX SYSTEM <dbname>;` command should rebuild the system
> indexes and pg_catalog.pg_class.relname should be included in them (I
> tested on 9.6).
>
> Do try that once. If you still see sequential scans, check what indexes
> are present on pg_catalog.pg_class.
>

I just fired a 'REINDEX SYSTEM <dbname>;' but the output of EXPLAIN ANALYZE
is unchanged and the query duration did not change.

Best Regards,
Ulf


>
>
> >
> > Besides pg_class_oid_index none of the referenced columns is indexed. I
> tried to add an index on relowner but didn't succeed because the column is
> used in the function call pg_has_role and the query is still forced to do a
> sequence scan.
> >
> > Regards,
> > Ulf
> >
> > 2017-06-28 3:31 GMT+02:00 Pritam Baral <pri...@pritambaral.com <mailto:
> pri...@pritambaral.com>>:
> >
> >     On Wednesday 28 June 2017 05:27 AM, Ulf Lohbrügge wrote:
> >     > Hi all,
> >     >
> >     > we use schemata to separate our customers in a multi-tenant setup
> (9.5.7, Debian stable). Each tenant is managed in his own schema with all
> the tables that only he can access. All tables in all schemata are the same
> in terms of their DDL: Every tenant uses e.g. his own table 'address'. We
> currently manage around 1200 schemata (i.e. tenants) on one cluster. Every
> schema consists currently of ~200 tables - so we end up with ~240000 tables
> plus constraints, indexes, sequences et al.
> >     >
> >     > Our current approach is quite nice in terms of data privacy
> because every tenant is isolated from all other tenants. A tenant uses his
> own user that gives him only access to the corresponding schema.
> Performance is great for us - we didn't expect Postgres to scale so well!
> >     >
> >     > But performance is pretty bad when we query things in the
> information_schema:
> >     >
> >     > SELECT
> >     >   *
> >     > FROM information_schema.tables
> >     > WHERE table_schema = 'foo'
> >     > AND table_name = 'bar';``
> >     >
> >     > Above query results in a large sequence scan with a filter that
> removes 1305161 rows:
> >     >
> >     >
>
>                              QUERY PLAN
> >     > ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> -------------------------------------------------------
> >     >  Nested Loop Left Join  (cost=0.70..101170.18 rows=3 width=265)
> (actual time=383.505..383.505 rows=0 loops=1)
> >     >    ->  Nested Loop  (cost=0.00..101144.65 rows=3 width=141)
> (actual time=383.504..383.504 rows=0 loops=1)
> >     >          Join Filter: (nc.oid = c.relnamespace)
> >     >          ->  Seq Scan on pg_class c  (cost=0.00..101023.01
> rows=867 width=77) (actual time=383.502..383.502 rows=0 loops=1)
> >     >                Filter: ((relkind = ANY ('{r,v,f}'::"char"[])) AND
> (((relname)::information_schema.sql_identifier)::text = 'bar'::text) AND
> (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT,
> INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
> has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
> >     >                Rows Removed by Filter: 1305161
> >     >          ->  Materialize  (cost=0.00..56.62 rows=5 width=68)
> (never executed)
> >     >                ->  Seq Scan on pg_namespace nc  (cost=0.00..56.60
> rows=5 width=68) (never executed)
> >     >                      Filter: ((NOT pg_is_other_temp_schema(oid))
> AND (((nspname)::information_schema.sql_identifier)::text = 'foo'::text))
> >     >    ->  Nested Loop  (cost=0.70..8.43 rows=1 width=132) (never
> executed)
> >     >          ->  Index Scan using pg_type_oid_index on pg_type t
> (cost=0.42..8.12 rows=1 width=72) (never executed)
> >     >                Index Cond: (c.reloftype = oid)
> >     >          ->  Index Scan using pg_namespace_oid_index on
> pg_namespace nt  (cost=0.28..0.30 rows=1 width=68) (never executed)
> >     >                Index Cond: (oid = t.typnamespace)
> >     >  Planning time: 0.624 ms
> >     >  Execution time: 383.784 ms
> >     > (16 rows)
> >     >
> >     > We noticed the degraded performance first when using the psql cli.
> Pressing tab after beginning a WHERE clause results in a query against the
> information_schema which is pretty slow and ends in "lag" when trying to
> enter queries.
> >     >
> >     > We also use Flyway (https://flywaydb.org/) to handle our database
> migrations. Unfortunately Flyway is querying the information_schema to
> check if specific tables exist (I guess this is one of the reasons
> information_schema exists) and therefore vastly slows down the migration of
> our tenants. Our last migration run on all tenants (schemata) almost took
> 2h because the above query is executed multiple times per tenant. The
> migration run consisted of multiple sql files to be executed and triggered
> more than 10 queries on information_schema per tenant.
> >     >
> >     > I don't think that Flyway is to blame because querying the
> information_schema should be a fast operation (and was fast for us when we
> had less schemata). I tried to speedup querying pg_class by adding indexes
> (after enabling allow_system_table_mods) but didn't succeed. The function
> call 'pg_has_role' is probably not easy to optimize.
> >     >
> >     > Postgres is really doing a great job to handle those many schemata
> and tables but doesn't scale well when querying information_schema. I
> actually don't want to change my current multi-tenant setup (one schema per
> tenant) as it is working great but the slow information_schema is killing
> our deployments.
> >     >
> >     > Are there any other options besides switching from
> one-schema-per-tenant-approach? Any help is greatly appreciated!
> >
> >     Have you tried a `REINDEX SYSTEM <dbname>`?
> >
> >     >
> >     > Regards,
> >     > Ulf
> >
> >     --
> >     #!/usr/bin/env regards
> >     Chhatoi Pritam Baral
> >
> >
> >
> >     --
> >     Sent via pgsql-performance mailing list (
> pgsql-performance@postgresql.org <mailto:pgsql-performance@postgresql.org
> >)
> >     To make changes to your subscription:
> >     http://www.postgresql.org/mailpref/pgsql-performance <
> http://www.postgresql.org/mailpref/pgsql-performance>
> >
> >
>
> [0]: https://www.postgresql.org/docs/9.5/static/infoschema-tables.html
>
> --
> #!/usr/bin/env regards
> Chhatoi Pritam Baral
>
>

Re: [PERFORM] Performance of information_schema with many schemata and tables

Reply via email to