Re: [PERFORM] Performance of information_schema with many schemata and tables

Pritam Baral Tue, 27 Jun 2017 18:32:53 -0700

On Wednesday 28 June 2017 05:27 AM, Ulf Lohbrügge wrote:
> Hi all,
>
> we use schemata to separate our customers in a multi-tenant setup (9.5.7, 
> Debian stable). Each tenant is managed in his own schema with all the tables 
> that only he can access. All tables in all schemata are the same in terms of 
> their DDL: Every tenant uses e.g. his own table 'address'. We currently 
> manage around 1200 schemata (i.e. tenants) on one cluster. Every schema 
> consists currently of ~200 tables - so we end up with ~240000 tables plus 
> constraints, indexes, sequences et al.
>
> Our current approach is quite nice in terms of data privacy because every 
> tenant is isolated from all other tenants. A tenant uses his own user that 
> gives him only access to the corresponding schema. Performance is great for 
> us - we didn't expect Postgres to scale so well!
>
> But performance is pretty bad when we query things in the information_schema:
>
> SELECT
>   *
> FROM information_schema.tables
> WHERE table_schema = 'foo'
> AND table_name = 'bar';``
>
> Above query results in a large sequence scan with a filter that removes 
> 1305161 rows:
>
>                                                                               
>                                                                               
>                 QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop Left Join  (cost=0.70..101170.18 rows=3 width=265) (actual 
> time=383.505..383.505 rows=0 loops=1)
>    ->  Nested Loop  (cost=0.00..101144.65 rows=3 width=141) (actual 
> time=383.504..383.504 rows=0 loops=1)
>          Join Filter: (nc.oid = c.relnamespace)
>          ->  Seq Scan on pg_class c  (cost=0.00..101023.01 rows=867 width=77) 
> (actual time=383.502..383.502 rows=0 loops=1)
>                Filter: ((relkind = ANY ('{r,v,f}'::"char"[])) AND 
> (((relname)::information_schema.sql_identifier)::text = 'bar'::text) AND 
> (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, 
> INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR 
> has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
>                Rows Removed by Filter: 1305161
>          ->  Materialize  (cost=0.00..56.62 rows=5 width=68) (never executed)
>                ->  Seq Scan on pg_namespace nc  (cost=0.00..56.60 rows=5 
> width=68) (never executed)
>                      Filter: ((NOT pg_is_other_temp_schema(oid)) AND 
> (((nspname)::information_schema.sql_identifier)::text = 'foo'::text))
>    ->  Nested Loop  (cost=0.70..8.43 rows=1 width=132) (never executed)
>          ->  Index Scan using pg_type_oid_index on pg_type t  
> (cost=0.42..8.12 rows=1 width=72) (never executed)
>                Index Cond: (c.reloftype = oid)
>          ->  Index Scan using pg_namespace_oid_index on pg_namespace nt  
> (cost=0.28..0.30 rows=1 width=68) (never executed)
>                Index Cond: (oid = t.typnamespace)
>  Planning time: 0.624 ms
>  Execution time: 383.784 ms
> (16 rows)
>
> We noticed the degraded performance first when using the psql cli. Pressing 
> tab after beginning a WHERE clause results in a query against the 
> information_schema which is pretty slow and ends in "lag" when trying to 
> enter queries.
>
> We also use Flyway (https://flywaydb.org/) to handle our database migrations. 
> Unfortunately Flyway is querying the information_schema to check if specific 
> tables exist (I guess this is one of the reasons information_schema exists) 
> and therefore vastly slows down the migration of our tenants. Our last 
> migration run on all tenants (schemata) almost took 2h because the above 
> query is executed multiple times per tenant. The migration run consisted of 
> multiple sql files to be executed and triggered more than 10 queries on 
> information_schema per tenant.
>
> I don't think that Flyway is to blame because querying the information_schema 
> should be a fast operation (and was fast for us when we had less schemata). I 
> tried to speedup querying pg_class by adding indexes (after enabling 
> allow_system_table_mods) but didn't succeed. The function call 'pg_has_role' 
> is probably not easy to optimize.
>
> Postgres is really doing a great job to handle those many schemata and tables 
> but doesn't scale well when querying information_schema. I actually don't 
> want to change my current multi-tenant setup (one schema per tenant) as it is 
> working great but the slow information_schema is killing our deployments.
>
> Are there any other options besides switching from 
> one-schema-per-tenant-approach? Any help is greatly appreciated!


Have you tried a `REINDEX SYSTEM <dbname>`?

>
> Regards,
> Ulf

-- 
#!/usr/bin/env regards
Chhatoi Pritam Baral



-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Performance of information_schema with many schemata and tables

Reply via email to