Re: [PERFORM]

2017-06-29 Thread Scott Marlowe
On Thu, Jun 29, 2017 at 1:11 PM, Yevhenii Kurtov
 wrote:
> Hi Jeff,
>
> That is just a sample data, we are going live in Jun and I don't have
> anything real so far. Right now it's 9.6 and it will be a latest stable
> available release on the date that we go live.

Trust me on this one, you want to get some realistic fake data in
there, and in realistic quantities before you go live to test.
Postgresql's planner makes decisions based on size of the data it has
to trundle through and statistical analysis of the data in the tables
etc. You don't wanna go from 500 rows in a test table with the same
values to 10,000,000 rows with wildly varying data in production
without having some clue where that db is gonna be headed performance
wise.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM]

2017-06-29 Thread Alvaro Herrera
Pavel Stehule wrote:
> 2017-06-29 7:17 GMT+02:00 Yevhenii Kurtov :
> 

> > I just  tried UNION queries and got following error:
> >
> > ERROR:  FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT
> 
> it is sad :(

I think we could lift this restriction for UNION ALL, but UNION sounds
difficult.


BTW I wonder how much of the original problem is caused by using a
prepared query.  I understand the desire to avoid repeated planning
work, but I think in this case it may be working against you.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM]

2017-06-29 Thread Yevhenii Kurtov
Hi Jeff,

That is just a sample data, we are going live in Jun and I don't have
anything real so far. Right now it's 9.6 and it will be a latest stable
available release on the date that we go live.

On Fri, Jun 30, 2017 at 1:11 AM, Jeff Janes  wrote:

> On Tue, Jun 27, 2017 at 11:47 PM, Yevhenii Kurtov <
> yevhenii.kur...@gmail.com> wrote:
>
>> Hello,
>>
>> We have a query that is run almost each second and it's very important to
>> squeeze every other ms out of it. The query is:
>>
>> SELECT c0."id" FROM "campaign_jobs" AS c0
>> WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2
>> OR ((c0."status" = $3) AND (c0."failed_at" > $4))
>> OR ((c0."status" = $5) AND (c0."started_at" < $6))
>> ORDER BY c0."priority" DESC, c0."times_failed"
>> LIMIT $7
>> FOR UPDATE SKIP LOCKED
>>
>>
>
>
>>
>> I see that query still went through the Seq Scan instead of Index Scan.
>> Is it due to poorly crafted index or because of query structure? Is it
>> possible to make this query faster?
>>
>
> An index on (priority desc, times_failed) should speed this up massively.
> Might want to include status at the end as well. However, your example data
> is not terribly realistic.
>
> What version of PostgreSQL are you using?
>
> Cheers,
>
> Jeff
>


Re: [PERFORM]

2017-06-29 Thread Jeff Janes
On Tue, Jun 27, 2017 at 11:47 PM, Yevhenii Kurtov  wrote:

> Hello,
>
> We have a query that is run almost each second and it's very important to
> squeeze every other ms out of it. The query is:
>
> SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2
> OR ((c0."status" = $3) AND (c0."failed_at" > $4))
> OR ((c0."status" = $5) AND (c0."started_at" < $6))
> ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT $7
> FOR UPDATE SKIP LOCKED
>
>


>
> I see that query still went through the Seq Scan instead of Index Scan. Is
> it due to poorly crafted index or because of query structure? Is it
> possible to make this query faster?
>

An index on (priority desc, times_failed) should speed this up massively.
Might want to include status at the end as well. However, your example data
is not terribly realistic.

What version of PostgreSQL are you using?

Cheers,

Jeff


Re: [PERFORM]

2017-06-29 Thread Pavel Stehule
2017-06-29 7:17 GMT+02:00 Yevhenii Kurtov :

> Hello folks,
>
> Thank you very much for analysis and suggested - there is a lot to learn
> here. I just  tried UNION queries and got following error:
>
> ERROR:  FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT
>

it is sad :(

maybe bitmap index scan can work

postgres=# create table test(id int, started date, failed date, status int);
CREATE TABLE
postgres=# create index on test(id) where status = 0;
CREATE INDEX
postgres=# create index on test(started) where status = 1;
CREATE INDEX
postgres=# create index on test(failed ) where status = 2;
CREATE INDEX
postgres=# explain select id from test where (status = 0 and id in
(1,2,3,4,5)) or (status = 1 and started < current_date) or (status = 2 and
failed > current_date);
┌
│
QUERY PLAN
╞
│ Bitmap Heap Scan on test  (cost=12.93..22.50 rows=6 width=4)

│   Recheck Cond: (((id = ANY ('{1,2,3,4,5}'::integer[])) AND (status = 0))
OR ((started < CURRENT_DATE) AND (status = 1)) OR ((faile
│   Filter: (((status = 0) AND (id = ANY ('{1,2,3,4,5}'::integer[]))) OR
((status = 1) AND (started < CURRENT_DATE)) OR ((status = 2)
│   ->  BitmapOr  (cost=12.93..12.93 rows=6 width=0)

│ ->  Bitmap Index Scan on test_id_idx  (cost=0.00..4.66 rows=1
width=0)
│   Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))

│ ->  Bitmap Index Scan on test_started_idx  (cost=0.00..4.13
rows=3 width=0)
│   Index Cond: (started < CURRENT_DATE)

│ ->  Bitmap Index Scan on test_failed_idx  (cost=0.00..4.13 rows=3
width=0)
│   Index Cond: (failed > CURRENT_DATE)

└
(10 rows)



>
> I made a table dump for anyone who wants to give it a spin
> https://app.box.com/s/464b12glmlk5o4gvzz7krc4c8s2fxlwr
> and here is the gist for the original commands https://gist.github.
> com/lessless/33215d0c147645db721e74e07498ac53
>
> On Wed, Jun 28, 2017 at 8:10 PM, Brad DeJong 
> wrote:
>
>>
>>
>> On 2017-06-28, Pavel Stehule wrote ...
>> > On 2017-06-28, Yevhenii Kurtov wrote ...
>> >> On 2017-06-28, Pavel Stehule wrote ...
>> >>> On 2017-06-28, Yevhenii Kurtov wrote ...
>>  We have a query that is run almost each second and it's very
>> important to squeeze every other ms out of it. The query is:
>>  ...
>>  I added following index: CREATE INDEX ON campaign_jobs(id, status,
>> failed_at, started_at, priority DESC, times_failed);
>>  ...
>> >>> There are few issues
>> >>> a) parametrized LIMIT
>> >>> b) complex predicate with lot of OR
>> >>> c) slow external sort
>> >>>
>> >>> b) signalize maybe some strange in design .. try to replace "OR" by
>> "UNION" query
>> >>> c) if you can and you have good enough memory .. try to increase
>> work_mem .. maybe 20MB
>> >>>
>> >>> if you change query to union queries, then you can use conditional
>> indexes
>> >>>
>> >>> create index(id) where status = 0;
>> >>> create index(failed_at) where status = 2;
>> >>> create index(started_at) where status = 1;
>> >>
>> >> Can you please give a tip how to rewrite the query with UNION clause?
>> >
>> > SELECT c0."id" FROM "campaign_jobs" AS c0
>> > WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2
>> > UNION SELECT c0."id" FROM "campaign_jobs" AS c0
>> > WHERE ((c0."status" = $3) AND (c0."failed_at" > $4))
>> > UNION SELECT c0."id" FROM "campaign_jobs" AS c0
>> > WHERE ((c0."status" = $5) AND (c0."started_at" < $6))
>> > ORDER BY c0."priority" DESC, c0."times_failed"
>> > LIMIT $7
>> > FOR UPDATE SKIP LOCKED
>>
>>
>> Normally (at least for developers I've worked with), that kind of query
>> structure is used when the "status" values don't overlap and don't change
>> from query to query. Judging from Pavel's suggested conditional indexes
>> (i.e. "where status = "), he also thinks that is likely.
>>
>> Give the optimizer that information so that it can use it. Assuming $1 =
>> 0 and $3 = 2 and $5 = 1, substitute literals. Substitute literal for $7 in
>> limit. Push order by and limit to each branch of the union all (or does
>> Postgres figure that out automatically?) Replace union with union all (not
>> sure about Postgres, but allows other dbms to avoid sorting and merging
>> result sets to eliminate duplicates). (Use of UNION ALL assumes that "id"
>> is unique across rows as implied by only "id" being selected with FOR
>> UPDATE. If multiple rows can have the same "id", then use UNION to
>> eliminate the duplicates.)
>>
>> SELECT "id" FROM "campaign_jobs" WHERE "status" = 0 AND NOT "id" = ANY($1)
>>   UNION ALL
>> SELECT "id" FROM "campaign_jobs" WHERE