Re: [PERFORM]

2017-06-28 Thread Yevhenii Kurtov
Hello folks,

Thank you very much for the analysis and suggestions - there is a lot to learn
here. I just tried the UNION queries and got the following error:

ERROR:  FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT

I made a table dump for anyone who wants to give it a spin
https://app.box.com/s/464b12glmlk5o4gvzz7krc4c8s2fxlwr
and here is the gist for the original commands
https://gist.github.com/lessless/33215d0c147645db721e74e07498ac53
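
By the way, in case anyone else hits the same error: one way around it (an
untested sketch on my side; the parameter numbering is only illustrative) seems
to be keeping the UNION in a sub-select and putting FOR UPDATE only on the
outer query, so the lock is still taken on the campaign_jobs rows:

SELECT id FROM campaign_jobs
WHERE id IN (
    SELECT id FROM campaign_jobs WHERE status = 0 AND NOT id = ANY($1)
    UNION ALL
    SELECT id FROM campaign_jobs WHERE status = 2 AND failed_at > $2
    UNION ALL
    SELECT id FROM campaign_jobs WHERE status = 1 AND started_at < $3
)
ORDER BY priority DESC, times_failed
LIMIT $4
FOR UPDATE SKIP LOCKED;

The conditional indexes suggested below should still be usable by the branches
inside the sub-select.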

On Wed, Jun 28, 2017 at 8:10 PM, Brad DeJong  wrote:

>
>
> On 2017-06-28, Pavel Stehule wrote ...
> > On 2017-06-28, Yevhenii Kurtov wrote ...
> >> On 2017-06-28, Pavel Stehule wrote ...
> >>> On 2017-06-28, Yevhenii Kurtov wrote ...
>  We have a query that runs almost every second and it's very
> important to squeeze every last ms out of it. The query is:
>  ...
>  I added following index: CREATE INDEX ON campaign_jobs(id, status,
> failed_at, started_at, priority DESC, times_failed);
>  ...
> >>> There are a few issues:
> >>> a) parametrized LIMIT
> >>> b) complex predicate with lots of ORs
> >>> c) slow external sort
> >>>
> >>> b) may signal something strange in the design .. try replacing the "OR"s
> with a "UNION" query
> >>> c) if you can, and you have enough memory .. try increasing
> work_mem .. maybe to 20MB
> >>>
> >>> if you change the query to a union of queries, then you can use
> conditional (partial) indexes
> >>>
> >>> create index(id) where status = 0;
> >>> create index(failed_at) where status = 2;
> >>> create index(started_at) where status = 1;
> >>
> >> Can you please give a tip on how to rewrite the query with a UNION clause?
> >
> > SELECT c0."id" FROM "campaign_jobs" AS c0
> > WHERE ((c0."status" = $1) AND NOT (c0."id" = ANY($2)))
> > UNION SELECT c0."id" FROM "campaign_jobs" AS c0
> > WHERE ((c0."status" = $3) AND (c0."failed_at" > $4))
> > UNION SELECT c0."id" FROM "campaign_jobs" AS c0
> > WHERE ((c0."status" = $5) AND (c0."started_at" < $6))
> > ORDER BY c0."priority" DESC, c0."times_failed"
> > LIMIT $7
> > FOR UPDATE SKIP LOCKED
>
>
> Normally (at least for developers I've worked with), that kind of query
> structure is used when the "status" values don't overlap and don't change
> from query to query. Judging from Pavel's suggested conditional indexes
> (i.e. "where status = "), he also thinks that is likely.
>
> Give the optimizer that information so that it can use it. Assuming $1 = 0
> and $3 = 2 and $5 = 1, substitute literals. Substitute literal for $7 in
> limit. Push order by and limit to each branch of the union all (or does
> Postgres figure that out automatically?) Replace union with union all (not
> sure about Postgres, but allows other dbms to avoid sorting and merging
> result sets to eliminate duplicates). (Use of UNION ALL assumes that "id"
> is unique across rows as implied by only "id" being selected with FOR
> UPDATE. If multiple rows can have the same "id", then use UNION to
> eliminate the duplicates.)
>
> SELECT "id" FROM "campaign_jobs" WHERE "status" = 0 AND NOT "id" = ANY($1)
>   UNION ALL
> SELECT "id" FROM "campaign_jobs" WHERE "status" = 2 AND "failed_at" > $2
>   UNION ALL
> SELECT "id" FROM "campaign_jobs" WHERE "status" = 1 AND "started_at" < $3
> ORDER BY "priority" DESC, "times_failed"
> LIMIT 100
> FOR UPDATE SKIP LOCKED
>
>
> Another thing that you could try is to push the ORDER BY and LIMIT to the
> branches of the UNION (or does Postgres figure that out automatically?) and
> use slightly different indexes. This may not make sense for all the
> branches but one nice thing about UNION is that each branch can be tweaked
> independently. Also, there are probably unmentioned functional dependencies
> that you can use to reduce the index size and/or improve your match rate.
> Example - if status = 1 means that the campaign_job has started but not
> failed or completed, then you may know that started_at is set, but
> failed_at and ended_at are null. The < comparison in and of itself implies
> that only rows where "started_at" is not null will match the condition.
>
> SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 0) AND
> NOT (c0."id" = ANY($1))) ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT 100
> UNION ALL
> SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 2) AND
> (c0."failed_at" > $2)) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT
> 100
> UNION ALL
> SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 1) AND
> (c0."started_at" < $3)) ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT 100
> ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT 100
> FOR UPDATE SKIP LOCKED
>
> Including the "priority", "times_failed" and "id" columns in the indexes
> along with "failed_at"/"started_at" allows the optimizer to do index only
> scans. (May still have to do random I/O to the data page to determine tuple
> version visibility but I don't think that can be eliminated.)
>
> create index ... ("priority" desc, "times_failed", "id")
>  where "status" = 0;

Re: [PERFORM] Performance of information_schema with many schemata and tables

2017-06-28 Thread Ulf Lohbrügge
2017-06-28 10:43 GMT+02:00 Pritam Baral :

>
>
> On Wednesday 28 June 2017 02:00 PM, Ulf Lohbrügge wrote:
> > Nope, I didn't try that yet. But I don't have the impression that
> reindexing the indexes in information_schema will help. The table
> information_schema.tables consists of the following indexes:
> >
> > "pg_class_oid_index" UNIQUE, btree (oid)
> > "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
> > "pg_class_tblspc_relfilenode_index" btree (reltablespace,
> relfilenode)
>
> information_schema.tables is not a table, it's a view; at least on 9.5[0].
> These indexes you list are actually indexes on the pg_catalog.pg_class
> table.
>

Yes, it's a view. \d+ information_schema.tables gives:

View definition:
 SELECT current_database()::information_schema.sql_identifier AS table_catalog,
    nc.nspname::information_schema.sql_identifier AS table_schema,
    c.relname::information_schema.sql_identifier AS table_name,
        CASE
            WHEN nc.oid = pg_my_temp_schema() THEN 'LOCAL TEMPORARY'::text
            WHEN c.relkind = 'r'::"char" THEN 'BASE TABLE'::text
            WHEN c.relkind = 'v'::"char" THEN 'VIEW'::text
            WHEN c.relkind = 'f'::"char" THEN 'FOREIGN TABLE'::text
            ELSE NULL::text
        END::information_schema.character_data AS table_type,
    NULL::character varying::information_schema.sql_identifier AS self_referencing_column_name,
    NULL::character varying::information_schema.character_data AS reference_generation,
        CASE
            WHEN t.typname IS NOT NULL THEN current_database()
            ELSE NULL::name
        END::information_schema.sql_identifier AS user_defined_type_catalog,
    nt.nspname::information_schema.sql_identifier AS user_defined_type_schema,
    t.typname::information_schema.sql_identifier AS user_defined_type_name,
        CASE
            WHEN c.relkind = 'r'::"char" OR (c.relkind = ANY (ARRAY['v'::"char", 'f'::"char"]))
                 AND (pg_relation_is_updatable(c.oid::regclass, false) & 8) = 8 THEN 'YES'::text
            ELSE 'NO'::text
        END::information_schema.yes_or_no AS is_insertable_into,
        CASE
            WHEN t.typname IS NOT NULL THEN 'YES'::text
            ELSE 'NO'::text
        END::information_schema.yes_or_no AS is_typed,
    NULL::character varying::information_schema.character_data AS commit_action
   FROM pg_namespace nc
     JOIN pg_class c ON nc.oid = c.relnamespace
     LEFT JOIN (pg_type t
       JOIN pg_namespace nt ON t.typnamespace = nt.oid) ON c.reloftype = t.oid
  WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char"]))
    AND NOT pg_is_other_temp_schema(nc.oid)
    AND (pg_has_role(c.relowner, 'USAGE'::text)
         OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text)
         OR has_any_column_privilege(c.oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text));


>
> >
> > The costly sequence scan in question on pg_class happens with the
> following WHERE clause:
> >
> > WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char"]))
> AND NOT pg_is_other_temp_schema(nc.oid) AND (pg_has_role(c.relowner,
> 'USAGE'::text) OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE,
> DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR 
> has_any_column_privilege(c.oid,
> 'SELECT, INSERT, UPDATE, REFERENCES'::text));
>
> This is not the bottleneck WHERE clause the query plan from your first
> mail shows. That one is:
>
> ((relkind = ANY ('{r,v,f}'::"char"[])) AND (((relname)::information_
> schema.sql_identifier)::text = 'bar'::text) AND (pg_has_role(relowner,
> 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE,
> TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid,
> 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
>

The part you copied is from the EXPLAIN ANALYZE output. The WHERE clause I
posted earlier (see the view definition above) unfortunately does not
contain the relname.


>
> I can say with certainty that an index on pg_catalog.pg_class.relname is
> going to speed this up. Postgres doesn't allow modifying system catalogs,
> but the `REINDEX SYSTEM <dbname>;` command should rebuild the system
> indexes, and pg_catalog.pg_class.relname should be included in them (I
> tested on 9.6).
>
> Do try that once. If you still see sequential scans, check what indexes
> are present on pg_catalog.pg_class.
>

I just fired a 'REINDEX SYSTEM <dbname>;' but the output of EXPLAIN ANALYZE
is unchanged and the query duration did not change.
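
For comparison, a direct lookup in pg_catalog (a rough sketch of what the
information_schema view boils down to, minus the privilege checks) avoids the
sql_identifier cast on relname and should be able to use
pg_class_relname_nsp_index:

SELECT c.relname
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'foo'
  AND c.relname = 'bar'
  AND c.relkind IN ('r', 'v', 'f');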

Best Regards,
Ulf


>
>
> >
> > Besides pg_class_oid_index none of the referenced columns is indexed. I
> tried to add an index on relowner but didn't succeed because the column is
> used in the function call pg_has_role and the query is still forced to do a
> sequence scan.
> >
> > Regards,
> > Ulf
> >
> > 2017-06-28 3:31 GMT+02:00 Pritam Baral >:
> >
> > On Wednesday 28 June 2017 05:27 AM, Ulf Lohbrügge wrote:
> > 

Re: [PERFORM] Efficiently merging and sorting collections of sorted rows

2017-06-28 Thread Merlin Moncure
On Fri, Jun 23, 2017 at 6:33 PM, Tom Lane  wrote:
> Clint Miller  writes:
>> That's a good plan because it's not doing a quick sort. Instead, it's just
>> reading the sort order off of the index, which is exactly what I want. (I
>> had to disable enable_sort because I didn't have enough rows of test data
>> in the table to get Postgres to use the index. But if I had enough rows,
>> the enable_sort stuff wouldn't be necessary. My real table has lots of rows
>> and doesn't need enable_sort turned off to do the sort with the index.)
>
> TBH, I think this whole argument is proceeding from false premises.
> Using an indexscan as a substitute for an explicit sort of lots of
> rows isn't all that attractive, because it implies a whole lot of
> random access to the table (unless the table is nearly in index
> order, which isn't a condition you can count on without expending
> a lot of maintenance effort to keep it that way).  seqscan-and-sort
> is often a superior alternative, especially if you're willing to give
> the sort a reasonable amount of work_mem.

Hm, if he reverses the index terms he gets his sort order for free and
a guaranteed index-only scan (IOS). This would only be sensible if
several conditions applied: you'd have to meet the IOS criteria
generally, the ratio of rows returned to rows thrown out would have to
be reasonably high (this is key), and the result set would have to be
large, making the sort expensive relative to the filtering. You'd also
have to be uninterested in explicit filters on 's', or be willing to
create another index for that if you were.

merlin

postgres=# \d foo
      Table "public.foo"
 Column │  Type   │ Modifiers
────────┼─────────┼───────────
 s      │ text    │
 i      │ integer │
Indexes:
    "foo_i_s_idx" btree (i, s)  -- reversed

postgres=# set enable_sort = false;
SET

postgres=# explain analyze select * from foo where s = 'a' or s = 'b'
order by i;
                                                    QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Index Only Scan using foo_i_s_idx on foo  (cost=0.15..68.75 rows=12 width=36) (actual time=0.004..0.004 rows=0 loops=1)
   Filter: ((s = 'a'::text) OR (s = 'b'::text))
   Heap Fetches: 0
 Planning time: 0.215 ms
 Execution time: 0.025 ms

merlin




Re: [PERFORM]

2017-06-28 Thread Brad DeJong


On 2017-06-28, Pavel Stehule wrote ...
> On 2017-06-28, Yevhenii Kurtov wrote ...
>> On 2017-06-28, Pavel Stehule wrote ...
>>> On 2017-06-28, Yevhenii Kurtov wrote ...
 We have a query that runs almost every second and it's very important to
 squeeze every last ms out of it. The query is:
 ...
 I added following index: CREATE INDEX ON campaign_jobs(id, status, 
 failed_at, started_at, priority DESC, times_failed);
 ...
>>> There are a few issues:
>>> a) parametrized LIMIT
>>> b) complex predicate with lots of ORs
>>> c) slow external sort
>>>
>>> b) may signal something strange in the design .. try replacing the "OR"s with a
>>> "UNION" query
>>> c) if you can, and you have enough memory .. try increasing work_mem
>>> .. maybe to 20MB
>>>
>>> if you change the query to a union of queries, then you can use conditional (partial) indexes
>>>
>>> create index(id) where status = 0;
>>> create index(failed_at) where status = 2;
>>> create index(started_at) where status = 1;
>>
>> Can you please give a tip on how to rewrite the query with a UNION clause?
>
> SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE ((c0."status" = $1) AND NOT (c0."id" = ANY($2)))
> UNION SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE ((c0."status" = $3) AND (c0."failed_at" > $4))
> UNION SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE ((c0."status" = $5) AND (c0."started_at" < $6))
> ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT $7
> FOR UPDATE SKIP LOCKED


Normally (at least for developers I've worked with), that kind of query 
structure is used when the "status" values don't overlap and don't change from 
query to query. Judging from Pavel's suggested conditional indexes (i.e. "where 
status = "), he also thinks that is likely.

Give the optimizer that information so that it can use it. Assuming $1 = 0 and 
$3 = 2 and $5 = 1, substitute literals. Substitute literal for $7 in limit. 
Push order by and limit to each branch of the union all (or does Postgres 
figure that out automatically?) Replace union with union all (not sure about 
Postgres, but allows other dbms to avoid sorting and merging result sets to 
eliminate duplicates). (Use of UNION ALL assumes that "id" is unique across 
rows as implied by only "id" being selected with FOR UPDATE. If multiple rows 
can have the same "id", then use UNION to eliminate the duplicates.)

SELECT "id" FROM "campaign_jobs" WHERE "status" = 0 AND NOT "id" = ANY($1)
  UNION ALL
SELECT "id" FROM "campaign_jobs" WHERE "status" = 2 AND "failed_at" > $2
  UNION ALL
SELECT "id" FROM "campaign_jobs" WHERE "status" = 1 AND "started_at" < $3
ORDER BY "priority" DESC, "times_failed"
LIMIT 100
FOR UPDATE SKIP LOCKED


Another thing that you could try is to push the ORDER BY and LIMIT to the 
branches of the UNION (or does Postgres figure that out automatically?) and use 
slightly different indexes. This may not make sense for all the branches but 
one nice thing about UNION is that each branch can be tweaked independently. 
Also, there are probably unmentioned functional dependencies that you can use 
to reduce the index size and/or improve your match rate. Example - if status = 
1 means that the campaign_job has started but not failed or completed, then you 
may know that started_at is set, but failed_at and ended_at are null. The < 
comparison in and of itself implies that only rows where "started_at" is not 
null will match the condition.

SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 0) AND NOT
(c0."id" = ANY($1))) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT 100
UNION ALL
SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 2) AND 
(c0."failed_at" > $2)) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT 100
UNION ALL
SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 1) AND 
(c0."started_at" < $3)) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT 100
ORDER BY c0."priority" DESC, c0."times_failed"
LIMIT 100
FOR UPDATE SKIP LOCKED
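
One syntactic caveat if you try the per-branch ORDER BY/LIMIT: in Postgres each
branch has to be wrapped in parentheses for its own ORDER BY/LIMIT to parse,
and the outer ORDER BY of a UNION can only reference columns that appear in the
select list. A rough, untested sketch of that shape (FOR UPDATE left off here,
since Postgres does not allow it on UNION queries):

(SELECT "id", "priority", "times_failed" FROM "campaign_jobs"
   WHERE "status" = 0 AND NOT "id" = ANY($1)
   ORDER BY "priority" DESC, "times_failed" LIMIT 100)
UNION ALL
(SELECT "id", "priority", "times_failed" FROM "campaign_jobs"
   WHERE "status" = 2 AND "failed_at" > $2
   ORDER BY "priority" DESC, "times_failed" LIMIT 100)
UNION ALL
(SELECT "id", "priority", "times_failed" FROM "campaign_jobs"
   WHERE "status" = 1 AND "started_at" < $3
   ORDER BY "priority" DESC, "times_failed" LIMIT 100)
ORDER BY "priority" DESC, "times_failed"
LIMIT 100;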

Including the "priority", "times_failed" and "id" columns in the indexes along 
with "failed_at"/"started_at" allows the optimizer to do index only scans. (May 
still have to do random I/O to the data page to determine tuple version 
visibility but I don't think that can be eliminated.)

create index ... ("priority" desc, "times_failed", "id")
    where "status" = 0;
create index ... ("priority" desc, "times_failed", "id", "failed_at")
    where "status" = 2 and "failed_at" is not null;
create index ... ("priority" desc, "times_failed", "id", "started_at")
    where "status" = 1 and "started_at" is not null; -- and ended_at is null and ...


I'm assuming that the optimizer knows that "where status = 1 and started_at < 
$3" implies "and started_at is not null" and will consider the conditional 
index. If not, then the "and started_at is not null" needs to be explicit.
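
A related operational note: the index only scans above only skip the heap
visits when the visibility map is reasonably current, so the table has to be
vacuumed often enough. Something along these lines (a sketch; the scale factor
is just an illustrative value for a frequently updated queue table):

VACUUM (ANALYZE) campaign_jobs;
ALTER TABLE campaign_jobs SET (autovacuum_vacuum_scale_factor = 0.05);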


Re: [PERFORM]

2017-06-28 Thread Gerardo Herzig


- Original Message -
> From: "Yevhenii Kurtov" 
> To: pgsql-performance@postgresql.org
> Sent: Wednesday, June 28, 2017 3:47:44
> Subject: [PERFORM]
> 
> Hello,
> 
> We have a query that runs almost every second and it's very important to
> squeeze every last ms out of it. The query is:
> 
> SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2)))
> OR ((c0."status" = $3) AND (c0."failed_at" > $4))
> OR ((c0."status" = $5) AND (c0."started_at" < $6)))
> ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT $7
> FOR UPDATE SKIP LOCKED
> 
> I added following index:
> 
> CREATE INDEX ON campaign_jobs(id, status, failed_at, started_at, priority
> DESC, times_failed);
> 
> And it didn't help at all - quite the opposite: the planning time grew from
> ~2 ms up to ~40 ms, leaving the execution time intact:
> 
>  Limit  (cost=29780.02..29781.27 rows=100 width=18) (actual
> time=827.753..828.113 rows=100 loops=1)
>->  LockRows  (cost=29780.02..32279.42 rows=199952 width=18) (actual
> time=827.752..828.096 rows=100 loops=1)
>  ->  Sort  (cost=29780.02..30279.90 rows=199952 width=18) (actual
> time=827.623..827.653 rows=100 loops=1)
>Sort Key: priority DESC, times_failed
>Sort Method: external sort  Disk: 5472kB
>->  Seq Scan on campaign_jobs c0  (cost=0.00..22138.00
> rows=199952 width=18) (actual time=1.072..321.410 rows=20 loops=1)
>  Filter: (((status = 0) AND (id <> ALL
> ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
> 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,
> 43,44,45,46,47,48}'::integer[]))) OR ((status = 2) AND (failed_at >
> '2017-06-22 03:18:09'::timestamp without time zone)) OR ((status = 1) AND
> (started_at < '2017-06-23 03:11:09'::timestamp without time zone)))
>  Planning time: 40.734 ms
>  Execution time: 913.638 ms
> (9 rows)
> 
> 
> I see that the query still went through a Seq Scan instead of an Index Scan.
> Is it due to a poorly crafted index or to the query structure? Is it
> possible to make this query faster?
> 
> 
> Thanks
> 
Well, most of the time is spent ordering, and it is doing a (slow) disk sort.
Try increasing work_mem to get an in-memory sort.
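
For example, to test it at the session level (a sketch; 20MB is just the value
Pavel suggested earlier, size it to your sort):

SET work_mem = '20MB';
-- then re-run the EXPLAIN ANALYZE; the Sort node should switch from
-- "Sort Method: external sort  Disk: 5472kB" to "Sort Method: quicksort  Memory: ..."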

How many rows are in campaign_jobs? If the query is returning most of the rows
in the table, it is not going to use any index anyway.

HTH
Gerardo




Re: [PERFORM] Performance of information_schema with many schemata and tables

2017-06-28 Thread Ulf Lohbrügge
Nope, I didn't try that yet. But I don't have the impression that
reindexing the indexes in information_schema will help. The table
information_schema.tables consists of the following indexes:

"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
"pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)

The costly sequence scan in question on pg_class happens with the following
WHERE clause:

WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char"])) AND
NOT pg_is_other_temp_schema(nc.oid) AND (pg_has_role(c.relowner,
'USAGE'::text) OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE,
DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
has_any_column_privilege(c.oid,
'SELECT, INSERT, UPDATE, REFERENCES'::text));

Besides pg_class_oid_index, none of the referenced columns is indexed. I
tried to add an index on relowner but didn't succeed because the column is
used inside the pg_has_role() function call, so the query is still forced to
do a sequential scan.

Regards,
Ulf

2017-06-28 3:31 GMT+02:00 Pritam Baral :

> On Wednesday 28 June 2017 05:27 AM, Ulf Lohbrügge wrote:
> > Hi all,
> >
> > we use schemata to separate our customers in a multi-tenant setup
> (9.5.7, Debian stable). Each tenant is managed in his own schema with all
> the tables that only he can access. All tables in all schemata are the same
> in terms of their DDL: Every tenant uses e.g. his own table 'address'. We
> currently manage around 1200 schemata (i.e. tenants) on one cluster. Every
> schema currently consists of ~200 tables - so we end up with ~240,000 tables
> plus constraints, indexes, sequences et al.
> >
> > Our current approach is quite nice in terms of data privacy because
> every tenant is isolated from all other tenants. A tenant uses his own user
> that gives him only access to the corresponding schema. Performance is
> great for us - we didn't expect Postgres to scale so well!
> >
> > But performance is pretty bad when we query things in the
> information_schema:
> >
> > SELECT
> >   *
> > FROM information_schema.tables
> > WHERE table_schema = 'foo'
> > AND table_name = 'bar';
> >
> > The above query results in a large sequential scan with a filter that removes
> 1305161 rows:
> >
> >
> >QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------
> >  Nested Loop Left Join  (cost=0.70..101170.18 rows=3 width=265) (actual
> time=383.505..383.505 rows=0 loops=1)
> >->  Nested Loop  (cost=0.00..101144.65 rows=3 width=141) (actual
> time=383.504..383.504 rows=0 loops=1)
> >  Join Filter: (nc.oid = c.relnamespace)
> >  ->  Seq Scan on pg_class c  (cost=0.00..101023.01 rows=867
> width=77) (actual time=383.502..383.502 rows=0 loops=1)
> >Filter: ((relkind = ANY ('{r,v,f}'::"char"[])) AND
> (((relname)::information_schema.sql_identifier)::text = 'bar'::text) AND
> (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT,
> INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
> has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
> >Rows Removed by Filter: 1305161
> >  ->  Materialize  (cost=0.00..56.62 rows=5 width=68) (never
> executed)
> >->  Seq Scan on pg_namespace nc  (cost=0.00..56.60 rows=5
> width=68) (never executed)
> >  Filter: ((NOT pg_is_other_temp_schema(oid)) AND
> (((nspname)::information_schema.sql_identifier)::text = 'foo'::text))
> >->  Nested Loop  (cost=0.70..8.43 rows=1 width=132) (never executed)
> >  ->  Index Scan using pg_type_oid_index on pg_type t
> (cost=0.42..8.12 rows=1 width=72) (never executed)
> >Index Cond: (c.reloftype = oid)
> >  ->  Index Scan using pg_namespace_oid_index on pg_namespace nt
> (cost=0.28..0.30 rows=1 width=68) (never executed)
> >Index Cond: (oid = t.typnamespace)
> >  Planning time: 0.624 ms
> >  Execution time: 383.784 ms
> > (16 rows)
> >
> > We noticed the degraded performance first when using the psql cli.
> Pressing tab after beginning a WHERE clause results in a query against the
> information_schema which is pretty slow and ends in "lag" when trying to
> enter queries.
> >
> > We also use Flyway (https://flywaydb.org/) to handle our database
> migrations. Unfortunately Flyway is querying the information_schema to
> check if specific tables exist (I guess this is one of the reasons
> information_schema exists) and therefore vastly slows down the migration of
> our tenants. Our last migration run on all tenants (schemata) almost 

Re: [PERFORM]

2017-06-28 Thread Pavel Stehule
2017-06-28 9:28 GMT+02:00 Yevhenii Kurtov :

> Hello Pavel,
>
> Can you please give a tip on how to rewrite the query with a UNION clause? I
> haven't actually used it before and am afraid I won't get it right on the
> first try :)
>

SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE ((c0."status" = $1) AND NOT (c0."id" = ANY($2)))
UNION SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE ((c0."status" = $3) AND (c0."failed_at" > $4))
UNION SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE ((c0."status" = $5) AND (c0."started_at" < $6))
ORDER BY c0."priority" DESC, c0."times_failed"
LIMIT $7
FOR UPDATE SKIP LOCKED

Something like this

Pavel


>
> On Wed, Jun 28, 2017 at 2:12 PM, Pavel Stehule 
> wrote:
>
>>
>>
>> 2017-06-28 8:47 GMT+02:00 Yevhenii Kurtov :
>>
>>> Hello,
>>>
>>> We have a query that runs almost every second and it's very important
>>> to squeeze every last ms out of it. The query is:
>>>
>>> SELECT c0."id" FROM "campaign_jobs" AS c0
>>> WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2)))
>>> OR ((c0."status" = $3) AND (c0."failed_at" > $4))
>>> OR ((c0."status" = $5) AND (c0."started_at" < $6)))
>>> ORDER BY c0."priority" DESC, c0."times_failed"
>>> LIMIT $7
>>> FOR UPDATE SKIP LOCKED
>>>
>>> I added following index:
>>>
>>> CREATE INDEX ON campaign_jobs(id, status, failed_at, started_at,
>>> priority DESC, times_failed);
>>>
>>> And it didn't help at all - quite the opposite: the planning time grew
>>> from ~2 ms up to ~40 ms, leaving the execution time intact:
>>>
>>>  Limit  (cost=29780.02..29781.27 rows=100 width=18) (actual
>>> time=827.753..828.113 rows=100 loops=1)
>>>->  LockRows  (cost=29780.02..32279.42 rows=199952 width=18) (actual
>>> time=827.752..828.096 rows=100 loops=1)
>>>  ->  Sort  (cost=29780.02..30279.90 rows=199952 width=18)
>>> (actual time=827.623..827.653 rows=100 loops=1)
>>>Sort Key: priority DESC, times_failed
>>>Sort Method: external sort  Disk: 5472kB
>>>->  Seq Scan on campaign_jobs c0  (cost=0.00..22138.00
>>> rows=199952 width=18) (actual time=1.072..321.410 rows=20 loops=1)
>>>  Filter: (((status = 0) AND (id <> ALL
>>> ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
>>> 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,
>>> 43,44,45,46,47,48}'::integer[]))) OR ((status = 2) AND (failed_at >
>>> '2017-06-22 03:18:09'::timestamp without time zone)) OR ((status = 1) AND
>>> (started_at < '2017-06-23 03:11:09'::timestamp without time zone)))
>>>  Planning time: 40.734 ms
>>>  Execution time: 913.638 ms
>>> (9 rows)
>>>
>>>
>>> I see that the query still went through a Seq Scan instead of an Index
>>> Scan. Is it due to a poorly crafted index or to the query structure? Is it
>>> possible to make this query faster?
>>>
>>
>> There are a few issues:
>>
>> a) parametrized LIMIT
>> b) complex predicate with lots of ORs
>> c) slow external sort
>>
>> b) may signal something strange in the design .. try replacing the "OR"s
>> with a "UNION" query
>> c) if you can, and you have enough memory .. try increasing work_mem
>> .. maybe to 20MB
>>
>> if you change the query to a union of queries, then you can use conditional (partial) indexes
>>
>> create index(id) where status = 0;
>> create index(failed_at) where status = 2;
>> create index(started_at) where status = 1;
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>> Thanks
>>>
>>
>>
>


Re: [PERFORM]

2017-06-28 Thread Yevhenii Kurtov
Hello Pavel,

Can you please give a tip on how to rewrite the query with a UNION clause? I
haven't actually used it before and am afraid I won't get it right on the
first try :)

On Wed, Jun 28, 2017 at 2:12 PM, Pavel Stehule 
wrote:

>
>
> 2017-06-28 8:47 GMT+02:00 Yevhenii Kurtov :
>
>> Hello,
>>
>> We have a query that runs almost every second and it's very important to
>> squeeze every last ms out of it. The query is:
>>
>> SELECT c0."id" FROM "campaign_jobs" AS c0
>> WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2)))
>> OR ((c0."status" = $3) AND (c0."failed_at" > $4))
>> OR ((c0."status" = $5) AND (c0."started_at" < $6)))
>> ORDER BY c0."priority" DESC, c0."times_failed"
>> LIMIT $7
>> FOR UPDATE SKIP LOCKED
>>
>> I added following index:
>>
>> CREATE INDEX ON campaign_jobs(id, status, failed_at, started_at, priority
>> DESC, times_failed);
>>
>> And it didn't help at all - quite the opposite: the planning time grew
>> from ~2 ms up to ~40 ms, leaving the execution time intact:
>>
>>  Limit  (cost=29780.02..29781.27 rows=100 width=18) (actual
>> time=827.753..828.113 rows=100 loops=1)
>>->  LockRows  (cost=29780.02..32279.42 rows=199952 width=18) (actual
>> time=827.752..828.096 rows=100 loops=1)
>>  ->  Sort  (cost=29780.02..30279.90 rows=199952 width=18) (actual
>> time=827.623..827.653 rows=100 loops=1)
>>Sort Key: priority DESC, times_failed
>>Sort Method: external sort  Disk: 5472kB
>>->  Seq Scan on campaign_jobs c0  (cost=0.00..22138.00
>> rows=199952 width=18) (actual time=1.072..321.410 rows=20 loops=1)
>>  Filter: (((status = 0) AND (id <> ALL
>> ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
>> 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,
>> 43,44,45,46,47,48}'::integer[]))) OR ((status = 2) AND (failed_at >
>> '2017-06-22 03:18:09'::timestamp without time zone)) OR ((status = 1) AND
>> (started_at < '2017-06-23 03:11:09'::timestamp without time zone)))
>>  Planning time: 40.734 ms
>>  Execution time: 913.638 ms
>> (9 rows)
>>
>>
>> I see that the query still went through a Seq Scan instead of an Index Scan.
>> Is it due to a poorly crafted index or to the query structure? Is it
>> possible to make this query faster?
>>
>
> There are a few issues:
>
> a) parametrized LIMIT
> b) complex predicate with lots of ORs
> c) slow external sort
>
> b) may signal something strange in the design .. try replacing the "OR"s
> with a "UNION" query
> c) if you can, and you have enough memory .. try increasing work_mem
> .. maybe to 20MB
>
> if you change the query to a union of queries, then you can use conditional (partial) indexes
>
> create index(id) where status = 0;
> create index(failed_at) where status = 2;
> create index(started_at) where status = 1;
>
> Regards
>
> Pavel
>
>
>>
>> Thanks
>>
>
>


Re: [PERFORM]

2017-06-28 Thread Pavel Stehule
2017-06-28 8:47 GMT+02:00 Yevhenii Kurtov :

> Hello,
>
> We have a query that runs almost every second and it's very important to
> squeeze every last ms out of it. The query is:
>
> SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2)))
> OR ((c0."status" = $3) AND (c0."failed_at" > $4))
> OR ((c0."status" = $5) AND (c0."started_at" < $6)))
> ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT $7
> FOR UPDATE SKIP LOCKED
>
> I added following index:
>
> CREATE INDEX ON campaign_jobs(id, status, failed_at, started_at, priority
> DESC, times_failed);
>
> And it didn't help at all - quite the opposite: the planning time grew from
> ~2 ms up to ~40 ms, leaving the execution time intact:
>
>  Limit  (cost=29780.02..29781.27 rows=100 width=18) (actual
> time=827.753..828.113 rows=100 loops=1)
>->  LockRows  (cost=29780.02..32279.42 rows=199952 width=18) (actual
> time=827.752..828.096 rows=100 loops=1)
>  ->  Sort  (cost=29780.02..30279.90 rows=199952 width=18) (actual
> time=827.623..827.653 rows=100 loops=1)
>Sort Key: priority DESC, times_failed
>Sort Method: external sort  Disk: 5472kB
>->  Seq Scan on campaign_jobs c0  (cost=0.00..22138.00
> rows=199952 width=18) (actual time=1.072..321.410 rows=20 loops=1)
>  Filter: (((status = 0) AND (id <> ALL
> ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
> 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,
> 43,44,45,46,47,48}'::integer[]))) OR ((status = 2) AND (failed_at >
> '2017-06-22 03:18:09'::timestamp without time zone)) OR ((status = 1) AND
> (started_at < '2017-06-23 03:11:09'::timestamp without time zone)))
>  Planning time: 40.734 ms
>  Execution time: 913.638 ms
> (9 rows)
>
>
> I see that the query still went through a Seq Scan instead of an Index Scan.
> Is it due to a poorly crafted index or to the query structure? Is it
> possible to make this query faster?
>

There are a few issues:

a) parametrized LIMIT
b) complex predicate with lots of ORs
c) slow external sort

b) may signal something strange in the design .. try replacing the "OR"s with a
"UNION" query
c) if you can, and you have enough memory .. try increasing work_mem
.. maybe to 20MB

if you change the query to a union of queries, then you can use conditional
(partial) indexes

create index(id) where status = 0;
create index(failed_at) where status = 2;
create index(started_at) where status = 1;
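
Spelled out against campaign_jobs it could look like this (just a sketch; the
index names are illustrative):

CREATE INDEX campaign_jobs_status0_idx ON campaign_jobs (id) WHERE status = 0;
CREATE INDEX campaign_jobs_status2_idx ON campaign_jobs (failed_at) WHERE status = 2;
CREATE INDEX campaign_jobs_status1_idx ON campaign_jobs (started_at) WHERE status = 1;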

Regards

Pavel


>
> Thanks
>


[PERFORM]

2017-06-28 Thread Yevhenii Kurtov
Hello,

We have a query that runs almost every second and it's very important to
squeeze every last ms out of it. The query is:

SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2)))
OR ((c0."status" = $3) AND (c0."failed_at" > $4))
OR ((c0."status" = $5) AND (c0."started_at" < $6)))
ORDER BY c0."priority" DESC, c0."times_failed"
LIMIT $7
FOR UPDATE SKIP LOCKED

I added the following index:

CREATE INDEX ON campaign_jobs(id, status, failed_at, started_at, priority
DESC, times_failed);

And it didn't help at all - quite the opposite: the planning time grew from
~2 ms up to ~40 ms, leaving the execution time intact:

 Limit  (cost=29780.02..29781.27 rows=100 width=18) (actual
time=827.753..828.113 rows=100 loops=1)
   ->  LockRows  (cost=29780.02..32279.42 rows=199952 width=18) (actual
time=827.752..828.096 rows=100 loops=1)
 ->  Sort  (cost=29780.02..30279.90 rows=199952 width=18) (actual
time=827.623..827.653 rows=100 loops=1)
   Sort Key: priority DESC, times_failed
   Sort Method: external sort  Disk: 5472kB
   ->  Seq Scan on campaign_jobs c0  (cost=0.00..22138.00
rows=199952 width=18) (actual time=1.072..321.410 rows=20 loops=1)
 Filter: (((status = 0) AND (id <> ALL
('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,
43,44,45,46,47,48}'::integer[]))) OR ((status = 2) AND (failed_at >
'2017-06-22 03:18:09'::timestamp without time zone)) OR ((status = 1) AND
(started_at < '2017-06-23 03:11:09'::timestamp without time zone)))
 Planning time: 40.734 ms
 Execution time: 913.638 ms
(9 rows)


I see that the query still went through a Seq Scan instead of an Index Scan. Is
it due to a poorly crafted index or to the query structure? Is it
possible to make this query faster?


Thanks