[PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Bertrand Paquet
Hi all,

We have a slow query. After analyzing, the planner decision seems to be
discutable : the query is faster when disabling seqscan. See below the two
query plan, and an extract from pg_stats.

Any idea about what to change to help the planner ?

An information which can be useful : the number on distinct value on
organization_id is very very low, may be the planner does not known that,
and take the wrong decision.

Regards,

Bertrand

# explain analyze SELECT  1 AS one FROM "external_sync_messages"  WHERE
"external_sync_messages"."organization_id" = 1612 AND
("external_sync_messages"."status" NOT IN ('sent_to_proxy', 'in_progress',
'ok')) AND "external_sync_messages"."handled_by" IS NULL LIMIT 1;

 QUERY
PLAN



 Limit  (cost=0.00..12.39 rows=1 width=0) (actual time=232.212..232.213
rows=1 loops=1)

   ->  Seq Scan on external_sync_messages  (cost=0.00..79104.69 rows=6385
width=0) (actual time=232.209..232.209 rows=1 loops=1)

 Filter: ((handled_by IS NULL) AND (organization_id = 1612) AND
((status)::text <> ALL ('{sent_to_proxy,in_progress,ok}'::text[])))

 Rows Removed by Filter: 600140

 Planning time: 0.490 ms

 Execution time: 232.246 ms

(6 rows)

# set enable_seqscan = off;

SET

# explain analyze SELECT  1 AS one FROM "external_sync_messages"  WHERE
"external_sync_messages"."organization_id" = 1612 AND
("external_sync_messages"."status" NOT IN ('sent_to_proxy', 'in_progress',
'ok')) AND "external_sync_messages"."handled_by" IS NULL LIMIT 1;


  QUERY PLAN


--

 Limit  (cost=0.42..39.88 rows=1 width=0) (actual time=0.030..0.030 rows=1
loops=1)

   ->  Index Scan using index_external_sync_messages_on_organization_id on
external_sync_messages  (cost=0.42..251934.05 rows=6385 width=0) (actual
time=0.028..0.028 rows=1 loops=1)

 Index Cond: (organization_id = 1612)

 Filter: ((handled_by IS NULL) AND ((status)::text <> ALL
('{sent_to_proxy,in_progress,ok}'::text[])))

 Planning time: 0.103 ms

 Execution time: 0.052 ms

(6 rows)

# SELECT attname, inherited, n_distinct, array_to_string(most_common_vals,
E'\n') as most_common_vals FROM pg_stats WHERE tablename =
'external_sync_messages' and attname IN ('status', 'organization_id',
'handled_by');

 attname | inherited | n_distinct | most_common_vals

-+---++--

 handled_by  | f |  3 | 3   +

 |   || 236140  +

 |   || 54413

 organization_id | f | 22 | 1612+

 |   || 287 +

 |   || 967 +

 |   || 1223+

 |   || 1123+

 |   || 1930+

 |   || 841 +

 |   || 1814+

 |   || 711 +

 |   || 1513+

 |   || 1794+

 |   || 1246+

 |   || 1673+

 |   || 1552+

 |   || 1747+

 |   || 2611+

 |   || 2217+

 |   || 2448+

 |   || 2133+

 |   || 1861+

 |   || 2616+

 |   || 2796

 status  | f |  6 | ok  +

 |   || ignored +

 |   || channel_error   +

 |   || in_progress +

 |   || error   +

 |   || sent_to_proxy

(3 rows)

# select count(*) from external_sync_messages;

 count



 992912

(1 row)


Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Alex Ignatov

On 27.10.2015 12:35, Bertrand Paquet wrote:

Hi all,

We have a slow query. After analyzing, the planner decision seems to 
be discutable : the query is faster when disabling seqscan. See below 
the two query plan, and an extract from pg_stats.


Any idea about what to change to help the planner ?

An information which can be useful : the number on distinct value on 
organization_id is very very low, may be the planner does not known 
that, and take the wrong decision.


Regards,

Bertrand

# explain analyze SELECT  1 AS one FROM "external_sync_messages"  
WHERE "external_sync_messages"."organization_id" = 1612 AND 
("external_sync_messages"."status" NOT IN ('sent_to_proxy', 
'in_progress', 'ok')) AND "external_sync_messages"."handled_by" IS 
NULL LIMIT 1;


  QUERY PLAN



 Limit  (cost=0.00..12.39 rows=1 width=0) (actual 
time=232.212..232.213 rows=1 loops=1)


   ->  Seq Scan on external_sync_messages  (cost=0.00..79104.69 
rows=6385 width=0) (actual time=232.209..232.209 rows=1 loops=1)


 Filter: ((handled_by IS NULL) AND (organization_id = 1612) 
AND ((status)::text <> ALL ('{sent_to_proxy,in_progress,ok}'::text[])))


 Rows Removed by Filter: 600140

 Planning time: 0.490 ms

 Execution time: 232.246 ms

(6 rows)

# set enable_seqscan = off;

SET

# explain analyze SELECT  1 AS one FROM "external_sync_messages"  
WHERE "external_sync_messages"."organization_id" = 1612 AND 
("external_sync_messages"."status" NOT IN ('sent_to_proxy', 
'in_progress', 'ok')) AND "external_sync_messages"."handled_by" IS 
NULL LIMIT 1;


QUERY PLAN

--

 Limit  (cost=0.42..39.88 rows=1 width=0) (actual time=0.030..0.030 
rows=1 loops=1)


   ->  Index Scan using 
index_external_sync_messages_on_organization_id on 
external_sync_messages  (cost=0.42..251934.05 rows=6385 width=0) 
(actual time=0.028..0.028 rows=1 loops=1)


 Index Cond: (organization_id = 1612)

 Filter: ((handled_by IS NULL) AND ((status)::text <> ALL 
('{sent_to_proxy,in_progress,ok}'::text[])))


 Planning time: 0.103 ms

 Execution time: 0.052 ms

(6 rows)

# SELECT attname, inherited, 
n_distinct, array_to_string(most_common_vals, E'\n') as 
most_common_vals FROM pg_stats WHERE tablename = 
'external_sync_messages' and attname IN ('status', 'organization_id', 
'handled_by');


 attname | inherited | n_distinct | most_common_vals

-+---++--

 handled_by  | f |   3 | 3   +

 |   | | 236140  +

 |   | | 54413

 organization_id | f | 22 | 1612+

 |   | | 287 +

 |   | | 967 +

 |   | | 1223+

 |   | | 1123+

 |   | | 1930+

 |   | | 841 +

 |   | | 1814+

 |   | | 711 +

 |   | | 1513+

 |   | | 1794+

 |   | | 1246+

 |   | | 1673+

 |   | | 1552+

 |   | | 1747+

 |   | | 2611+

 |   | | 2217+

 |   | | 2448+

 |   | | 2133+

 |   | | 1861+

 |   | | 2616+

 |   | | 2796

 status  | f |   6 | ok  +

 |   | | ignored +

 |   | | channel_error   +

 |   | | in_progress +

 |   | | error   +

 |   | | sent_to_proxy

(3 rows)

# select count(*) from external_sync_messages;

 count



 992912

(1 row)



Hello, Bertrand!
May be statistics on external_sync_messages is wrong? i.e planner give 
us rows=6385 but seq scan give us Rows Removed by Filter: 600140

Maybe you should recalc it by VACUUM ANALYZE it?

--
Alex Ignatov
Postgres Professional: 

Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Bertrand Paquet
Yes, I have run VACUUM ANALYZE, no effect.

Bertrand

2015-10-27 12:08 GMT+01:00 Alex Ignatov :

> On 27.10.2015 12:35, Bertrand Paquet wrote:
>
>> Hi all,
>>
>> We have a slow query. After analyzing, the planner decision seems to be
>> discutable : the query is faster when disabling seqscan. See below the two
>> query plan, and an extract from pg_stats.
>>
>> Any idea about what to change to help the planner ?
>>
>> An information which can be useful : the number on distinct value on
>> organization_id is very very low, may be the planner does not known that,
>> and take the wrong decision.
>>
>> Regards,
>>
>> Bertrand
>>
>> # explain analyze SELECT  1 AS one FROM "external_sync_messages"  WHERE
>> "external_sync_messages"."organization_id" = 1612 AND
>> ("external_sync_messages"."status" NOT IN ('sent_to_proxy', 'in_progress',
>> 'ok')) AND "external_sync_messages"."handled_by" IS NULL LIMIT 1;
>>
>>   QUERY PLAN
>>
>>
>> 
>>
>>  Limit  (cost=0.00..12.39 rows=1 width=0) (actual time=232.212..232.213
>> rows=1 loops=1)
>>
>>->  Seq Scan on external_sync_messages  (cost=0.00..79104.69 rows=6385
>> width=0) (actual time=232.209..232.209 rows=1 loops=1)
>>
>>  Filter: ((handled_by IS NULL) AND (organization_id = 1612) AND
>> ((status)::text <> ALL ('{sent_to_proxy,in_progress,ok}'::text[])))
>>
>>  Rows Removed by Filter: 600140
>>
>>  Planning time: 0.490 ms
>>
>>  Execution time: 232.246 ms
>>
>> (6 rows)
>>
>> # set enable_seqscan = off;
>>
>> SET
>>
>> # explain analyze SELECT  1 AS one FROM "external_sync_messages"  WHERE
>> "external_sync_messages"."organization_id" = 1612 AND
>> ("external_sync_messages"."status" NOT IN ('sent_to_proxy', 'in_progress',
>> 'ok')) AND "external_sync_messages"."handled_by" IS NULL LIMIT 1;
>>
>> QUERY PLAN
>>
>>
>> --
>>
>>  Limit  (cost=0.42..39.88 rows=1 width=0) (actual time=0.030..0.030
>> rows=1 loops=1)
>>
>>->  Index Scan using index_external_sync_messages_on_organization_id
>> on external_sync_messages  (cost=0.42..251934.05 rows=6385 width=0) (actual
>> time=0.028..0.028 rows=1 loops=1)
>>
>>  Index Cond: (organization_id = 1612)
>>
>>  Filter: ((handled_by IS NULL) AND ((status)::text <> ALL
>> ('{sent_to_proxy,in_progress,ok}'::text[])))
>>
>>  Planning time: 0.103 ms
>>
>>  Execution time: 0.052 ms
>>
>> (6 rows)
>>
>> # SELECT attname, inherited, n_distinct,
>> array_to_string(most_common_vals, E'\n') as most_common_vals FROM pg_stats
>> WHERE tablename = 'external_sync_messages' and attname IN ('status',
>> 'organization_id', 'handled_by');
>>
>>  attname | inherited | n_distinct | most_common_vals
>>
>> -+---++--
>>
>>  handled_by  | f |   3 | 3   +
>>
>>  |   | | 236140  +
>>
>>  |   | | 54413
>>
>>  organization_id | f | 22 | 1612+
>>
>>  |   | | 287 +
>>
>>  |   | | 967 +
>>
>>  |   | | 1223+
>>
>>  |   | | 1123+
>>
>>  |   | | 1930+
>>
>>  |   | | 841 +
>>
>>  |   | | 1814+
>>
>>  |   | | 711 +
>>
>>  |   | | 1513+
>>
>>  |   | | 1794+
>>
>>  |   | | 1246+
>>
>>  |   | | 1673+
>>
>>  |   | | 1552+
>>
>>  |   | | 1747+
>>
>>  |   | | 2611+
>>
>>  |   | | 2217+
>>
>>  |   | | 2448+
>>
>>  |   | | 2133+
>>
>>  |   | | 1861+
>>
>>  |   | | 2616+
>>
>>  |   | | 2796
>>
>>  status  | f |   6 | ok  +
>>
>>  |   | | ignored +
>>
>>  |   | | channel_error   +
>>
>>  |   | | in_progress +
>>
>>  | 

Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Alex Ignatov

On 27.10.2015 14:10, Bertrand Paquet wrote:

Yes, I have run VACUUM ANALYZE, no effect.

Bertrand

2015-10-27 12:08 GMT+01:00 Alex Ignatov >:


On 27.10.2015 12:35, Bertrand Paquet wrote:

Hi all,

We have a slow query. After analyzing, the planner decision
seems to be discutable : the query is faster when disabling
seqscan. See below the two query plan, and an extract from
pg_stats.

Any idea about what to change to help the planner ?

An information which can be useful : the number on distinct
value on organization_id is very very low, may be the planner
does not known that, and take the wrong decision.

Regards,

Bertrand

# explain analyze SELECT  1 AS one FROM
"external_sync_messages"  WHERE
"external_sync_messages"."organization_id" = 1612 AND
("external_sync_messages"."status" NOT IN ('sent_to_proxy',
'in_progress', 'ok')) AND
"external_sync_messages"."handled_by" IS NULL LIMIT 1;

  QUERY PLAN




 Limit  (cost=0.00..12.39 rows=1 width=0) (actual
time=232.212..232.213 rows=1 loops=1)

   ->  Seq Scan on external_sync_messages (cost=0.00..79104.69
rows=6385 width=0) (actual time=232.209..232.209 rows=1 loops=1)

 Filter: ((handled_by IS NULL) AND (organization_id =
1612) AND ((status)::text <> ALL
('{sent_to_proxy,in_progress,ok}'::text[])))

 Rows Removed by Filter: 600140

 Planning time: 0.490 ms

 Execution time: 232.246 ms

(6 rows)

# set enable_seqscan = off;

SET

# explain analyze SELECT  1 AS one FROM
"external_sync_messages"  WHERE
"external_sync_messages"."organization_id" = 1612 AND
("external_sync_messages"."status" NOT IN ('sent_to_proxy',
'in_progress', 'ok')) AND
"external_sync_messages"."handled_by" IS NULL LIMIT 1;

QUERY PLAN


--

 Limit  (cost=0.42..39.88 rows=1 width=0) (actual
time=0.030..0.030 rows=1 loops=1)

   ->  Index Scan using
index_external_sync_messages_on_organization_id on
external_sync_messages  (cost=0.42..251934.05 rows=6385
width=0) (actual time=0.028..0.028 rows=1 loops=1)

 Index Cond: (organization_id = 1612)

 Filter: ((handled_by IS NULL) AND ((status)::text <>
ALL ('{sent_to_proxy,in_progress,ok}'::text[])))

 Planning time: 0.103 ms

 Execution time: 0.052 ms

(6 rows)

# SELECT attname, inherited, n_distinct,
array_to_string(most_common_vals, E'\n') as most_common_vals
FROM pg_stats WHERE tablename = 'external_sync_messages' and
attname IN ('status', 'organization_id', 'handled_by');

 attname | inherited | n_distinct | most_common_vals

-+---++--

 handled_by  | f |   3 | 3+

 |   | | 236140   +

 |   | | 54413

 organization_id | f | 22 | 1612   +

 |   | | 287+

 |   | | 967+

 |   | | 1223   +

 |   | | 1123   +

 |   | | 1930   +

 |   | | 841+

 |   | | 1814   +

 |   | | 711+

 |   | | 1513   +

 |   | | 1794   +

 |   | | 1246   +

 |   | | 1673   +

 |   | | 1552   +

 |   | | 1747   +

 |   | | 2611   +

 |   | | 2217   +

 |   | | 2448   +

 |   | | 2133   +

 |   | | 1861   +

 |   | | 2616   +

 |   | | 2796

 status  | f |   6 | ok   +

 |   | | ignored+


Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Bertrand Paquet
relname | n_live_tup | n_dead_tup |  last_vacuum
  |last_autovacuum| last_analyze  |
  last_autoanalyze

+++---+---+---+---

 external_sync_messages | 998105 |  11750 | 2015-10-26
20:15:17.484771+00 | 2015-10-02 15:04:25.944479+00 | 2015-10-26
20:15:19.465308+00 | 2015-10-22 12:24:26.947616+00

(1 row)

2015-10-27 12:17 GMT+01:00 Alex Ignatov :

> On 27.10.2015 14:10, Bertrand Paquet wrote:
>
> Yes, I have run VACUUM ANALYZE, no effect.
>
> Bertrand
>
> 2015-10-27 12:08 GMT+01:00 Alex Ignatov :
>
>> On 27.10.2015 12:35, Bertrand Paquet wrote:
>>
>>> Hi all,
>>>
>>> We have a slow query. After analyzing, the planner decision seems to be
>>> discutable : the query is faster when disabling seqscan. See below the two
>>> query plan, and an extract from pg_stats.
>>>
>>> Any idea about what to change to help the planner ?
>>>
>>> An information which can be useful : the number on distinct value on
>>> organization_id is very very low, may be the planner does not known that,
>>> and take the wrong decision.
>>>
>>> Regards,
>>>
>>> Bertrand
>>>
>>> # explain analyze SELECT  1 AS one FROM "external_sync_messages"  WHERE
>>> "external_sync_messages"."organization_id" = 1612 AND
>>> ("external_sync_messages"."status" NOT IN ('sent_to_proxy', 'in_progress',
>>> 'ok')) AND "external_sync_messages"."handled_by" IS NULL LIMIT 1;
>>>
>>>   QUERY PLAN
>>>
>>>
>>> 
>>>
>>>  Limit  (cost=0.00..12.39 rows=1 width=0) (actual time=232.212..232.213
>>> rows=1 loops=1)
>>>
>>>->  Seq Scan on external_sync_messages  (cost=0.00..79104.69
>>> rows=6385 width=0) (actual time=232.209..232.209 rows=1 loops=1)
>>>
>>>  Filter: ((handled_by IS NULL) AND (organization_id = 1612) AND
>>> ((status)::text <> ALL ('{sent_to_proxy,in_progress,ok}'::text[])))
>>>
>>>  Rows Removed by Filter: 600140
>>>
>>>  Planning time: 0.490 ms
>>>
>>>  Execution time: 232.246 ms
>>>
>>> (6 rows)
>>>
>>> # set enable_seqscan = off;
>>>
>>> SET
>>>
>>> # explain analyze SELECT  1 AS one FROM "external_sync_messages"  WHERE
>>> "external_sync_messages"."organization_id" = 1612 AND
>>> ("external_sync_messages"."status" NOT IN ('sent_to_proxy', 'in_progress',
>>> 'ok')) AND "external_sync_messages"."handled_by" IS NULL LIMIT 1;
>>>
>>> QUERY PLAN
>>>
>>>
>>> --
>>>
>>>  Limit  (cost=0.42..39.88 rows=1 width=0) (actual time=0.030..0.030
>>> rows=1 loops=1)
>>>
>>>->  Index Scan using index_external_sync_messages_on_organization_id
>>> on external_sync_messages  (cost=0.42..251934.05 rows=6385 width=0) (actual
>>> time=0.028..0.028 rows=1 loops=1)
>>>
>>>  Index Cond: (organization_id = 1612)
>>>
>>>  Filter: ((handled_by IS NULL) AND ((status)::text <> ALL
>>> ('{sent_to_proxy,in_progress,ok}'::text[])))
>>>
>>>  Planning time: 0.103 ms
>>>
>>>  Execution time: 0.052 ms
>>>
>>> (6 rows)
>>>
>>> # SELECT attname, inherited, n_distinct,
>>> array_to_string(most_common_vals, E'\n') as most_common_vals FROM pg_stats
>>> WHERE tablename = 'external_sync_messages' and attname IN ('status',
>>> 'organization_id', 'handled_by');
>>>
>>>  attname | inherited | n_distinct | most_common_vals
>>>
>>> -+---++--
>>>
>>>  handled_by  | f |   3 | 3   +
>>>
>>>  |   | | 236140  +
>>>
>>>  |   | | 54413
>>>
>>>  organization_id | f | 22 | 1612+
>>>
>>>  |   | | 287 +
>>>
>>>  |   | | 967 +
>>>
>>>  |   | | 1223+
>>>
>>>  |   | | 1123+
>>>
>>>  |   | | 1930+
>>>
>>>  |   | | 841 +
>>>
>>>  |   | | 1814+
>>>
>>>  |   | | 711 +
>>>
>>>  |   | | 1513+
>>>
>>>  |   | | 1794+
>>>
>>>  |   | | 1246+
>>>
>>>  |   | | 1673+
>>>
>>>  |   | | 1552+
>>>
>>>  |   |  

Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Alex Ignatov



On 27.10.2015 14:19, Bertrand Paquet wrote:


relname | n_live_tup | n_dead_tup |  
last_vacuum  | last_autovacuum| last_analyze  
|   last_autoanalyze


+++---+---+---+---

 external_sync_messages | 998105 |  11750 | 2015-10-26 
20:15:17.484771+00 | 2015-10-02 15:04:25.944479+00 | 2015-10-26 
20:15:19.465308+00 | 2015-10-22 12:24:26.947616+00


(1 row)


2015-10-27 12:17 GMT+01:00 Alex Ignatov >:


On 27.10.2015 14:10, Bertrand Paquet wrote:

Yes, I have run VACUUM ANALYZE, no effect.

Bertrand

2015-10-27 12:08 GMT+01:00 Alex Ignatov mailto:[email protected]>>:

On 27.10.2015 12:35, Bertrand Paquet wrote:

Hi all,

We have a slow query. After analyzing, the planner
decision seems to be discutable : the query is faster
when disabling seqscan. See below the two query plan, and
an extract from pg_stats.

Any idea about what to change to help the planner ?

An information which can be useful : the number on
distinct value on organization_id is very very low, may
be the planner does not known that, and take the wrong
decision.

Regards,

Bertrand

# explain analyze SELECT  1 AS one FROM
"external_sync_messages"  WHERE
"external_sync_messages"."organization_id" = 1612 AND
("external_sync_messages"."status" NOT IN
('sent_to_proxy', 'in_progress', 'ok')) AND
"external_sync_messages"."handled_by" IS NULL LIMIT 1;

  QUERY PLAN




 Limit  (cost=0.00..12.39 rows=1 width=0) (actual
time=232.212..232.213 rows=1 loops=1)

   ->  Seq Scan on external_sync_messages
(cost=0.00..79104.69 rows=6385 width=0) (actual
time=232.209..232.209 rows=1 loops=1)

 Filter: ((handled_by IS NULL) AND
(organization_id = 1612) AND ((status)::text <> ALL
('{sent_to_proxy,in_progress,ok}'::text[])))

 Rows Removed by Filter: 600140

 Planning time: 0.490 ms

 Execution time: 232.246 ms

(6 rows)

# set enable_seqscan = off;

SET

# explain analyze SELECT  1 AS one FROM
"external_sync_messages"  WHERE
"external_sync_messages"."organization_id" = 1612 AND
("external_sync_messages"."status" NOT IN
('sent_to_proxy', 'in_progress', 'ok')) AND
"external_sync_messages"."handled_by" IS NULL LIMIT 1;

QUERY PLAN


--

 Limit  (cost=0.42..39.88 rows=1 width=0) (actual
time=0.030..0.030 rows=1 loops=1)

   ->  Index Scan using
index_external_sync_messages_on_organization_id on
external_sync_messages (cost=0.42..251934.05 rows=6385
width=0) (actual time=0.028..0.028 rows=1 loops=1)

 Index Cond: (organization_id = 1612)

 Filter: ((handled_by IS NULL) AND
((status)::text <> ALL
('{sent_to_proxy,in_progress,ok}'::text[])))

 Planning time: 0.103 ms

 Execution time: 0.052 ms

(6 rows)

# SELECT attname, inherited, n_distinct,
array_to_string(most_common_vals, E'\n') as
most_common_vals FROM pg_stats WHERE tablename =
'external_sync_messages' and attname IN ('status',
'organization_id', 'handled_by');

 attname | inherited | n_distinct | most_common_vals

-+---++--

 handled_by  | f |   3 | 3   +

 |   | | 236140  +

 |   | | 54413

 organization_id | f | 22 | 1612+

 |   | | 287 +

 |   | | 967 +

 |   | | 1223+

 |   | | 1123+

 |   | | 1930+

 

Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Tom Lane
Bertrand Paquet  writes:
> We have a slow query. After analyzing, the planner decision seems to be
> discutable : the query is faster when disabling seqscan. See below the two
> query plan, and an extract from pg_stats.

> Any idea about what to change to help the planner ?

Neither one of those plans is very good: you're just hoping that the
Filter condition will let a tuple through sooner rather than later.

If you care about the performance of this type of query, I'd consider
creating an index on (organization_id, status, handled_by) so that all
the conditions can be checked in the index.

regards, tom lane


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Bertrand Paquet
show random_page_cost ;

 random_page_cost

--

 4

(1 row)

2015-10-27 12:30 GMT+01:00 Alex Ignatov :

>
>
> On 27.10.2015 14:19, Bertrand Paquet wrote:
>
> relname | n_live_tup | n_dead_tup |  last_vacuum
> |last_autovacuum| last_analyze  |
> last_autoanalyze
>
>
> +++---+---+---+---
>
>  external_sync_messages | 998105 |  11750 | 2015-10-26
> 20:15:17.484771+00 | 2015-10-02 15:04:25.944479+00 | 2015-10-26
> 20:15:19.465308+00 | 2015-10-22 12:24:26.947616+00
>
> (1 row)
>
> 2015-10-27 12:17 GMT+01:00 Alex Ignatov :
>
>> On 27.10.2015 14:10, Bertrand Paquet wrote:
>>
>> Yes, I have run VACUUM ANALYZE, no effect.
>>
>> Bertrand
>>
>> 2015-10-27 12:08 GMT+01:00 Alex Ignatov < 
>> [email protected]>:
>>
>>> On 27.10.2015 12:35, Bertrand Paquet wrote:
>>>
 Hi all,

 We have a slow query. After analyzing, the planner decision seems to be
 discutable : the query is faster when disabling seqscan. See below the two
 query plan, and an extract from pg_stats.

 Any idea about what to change to help the planner ?

 An information which can be useful : the number on distinct value on
 organization_id is very very low, may be the planner does not known that,
 and take the wrong decision.

 Regards,

 Bertrand

 # explain analyze SELECT  1 AS one FROM "external_sync_messages"  WHERE
 "external_sync_messages"."organization_id" = 1612 AND
 ("external_sync_messages"."status" NOT IN ('sent_to_proxy', 'in_progress',
 'ok')) AND "external_sync_messages"."handled_by" IS NULL LIMIT 1;

   QUERY PLAN


 

  Limit  (cost=0.00..12.39 rows=1 width=0) (actual time=232.212..232.213
 rows=1 loops=1)

->  Seq Scan on external_sync_messages  (cost=0.00..79104.69
 rows=6385 width=0) (actual time=232.209..232.209 rows=1 loops=1)

  Filter: ((handled_by IS NULL) AND (organization_id = 1612) AND
 ((status)::text <> ALL ('{sent_to_proxy,in_progress,ok}'::text[])))

  Rows Removed by Filter: 600140

  Planning time: 0.490 ms

  Execution time: 232.246 ms

 (6 rows)

 # set enable_seqscan = off;

 SET

 # explain analyze SELECT  1 AS one FROM "external_sync_messages"  WHERE
 "external_sync_messages"."organization_id" = 1612 AND
 ("external_sync_messages"."status" NOT IN ('sent_to_proxy', 'in_progress',
 'ok')) AND "external_sync_messages"."handled_by" IS NULL LIMIT 1;

 QUERY PLAN


 --

  Limit  (cost=0.42..39.88 rows=1 width=0) (actual time=0.030..0.030
 rows=1 loops=1)

->  Index Scan using index_external_sync_messages_on_organization_id
 on external_sync_messages  (cost=0.42..251934.05 rows=6385 width=0) (actual
 time=0.028..0.028 rows=1 loops=1)

  Index Cond: (organization_id = 1612)

  Filter: ((handled_by IS NULL) AND ((status)::text <> ALL
 ('{sent_to_proxy,in_progress,ok}'::text[])))

  Planning time: 0.103 ms

  Execution time: 0.052 ms

 (6 rows)

 # SELECT attname, inherited, n_distinct,
 array_to_string(most_common_vals, E'\n') as most_common_vals FROM pg_stats
 WHERE tablename = 'external_sync_messages' and attname IN ('status',
 'organization_id', 'handled_by');

  attname | inherited | n_distinct | most_common_vals

 -+---++--

  handled_by  | f |   3 | 3   +

  |   | | 236140  +

  |   | | 54413

  organization_id | f | 22 | 1612+

  |   | | 287 +

  |   | | 967 +

  |   | | 1223+

  |   | | 1123+

  |   | | 1930+

  |   | | 841 +

  |   | | 1814+

  |   | | 711 +

  |  

Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Bertrand Paquet
Hi tom,

I did the test yesterday with an index on the three fields, and with a
partial index on organization and status and where is null condition on
handled.

Le mardi 27 octobre 2015, Tom Lane  a écrit :

> Bertrand Paquet  writes:
> > We have a slow query. After analyzing, the planner decision seems to be
> > discutable : the query is faster when disabling seqscan. See below the
> two
> > query plan, and an extract from pg_stats.
>
> > Any idea about what to change to help the planner ?
>
> Neither one of those plans is very good: you're just hoping that the
> Filter condition will let a tuple through sooner rather than later.
>
> If you care about the performance of this type of query, I'd consider
> creating an index on (organization_id, status, handled_by) so that all
> the conditions can be checked in the index.
>
> regards, tom lane
>


Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Bertrand Paquet
Hi tom,

I did the test yesterday with an index on the three fields, and with a
partial index on organization and status and where is null condition on
handled. I saw no modification on query plan.
May be I forgot to analyze vacuum after. I will retry tonight.

I use a btree index. Is it the good solution, even with the In clause ?

Regards,

Bertrand

Le mardi 27 octobre 2015, Tom Lane  a écrit :

> Bertrand Paquet > writes:
> > We have a slow query. After analyzing, the planner decision seems to be
> > discutable : the query is faster when disabling seqscan. See below the
> two
> > query plan, and an extract from pg_stats.
>
> > Any idea about what to change to help the planner ?
>
> Neither one of those plans is very good: you're just hoping that the
> Filter condition will let a tuple through sooner rather than later.
>
> If you care about the performance of this type of query, I'd consider
> creating an index on (organization_id, status, handled_by) so that all
> the conditions can be checked in the index.
>
> regards, tom lane
>


[PERFORM] Partition Constraint Exclusion Limits

2015-10-27 Thread GMail
I have partitioned a large table in my PG database (6.7 billion rows!) by a 
date column and in general constraint exclusion works well but only in 
relatively simple case when the partition key is specified exactly as created 
in the CHECK constraint.  I'm curious if there is a way to get it to work a 
little more generally though.

For example my CHECK constraint (see code below) specifying a hard-coded field 
value works well (#1 and #2).  Specifying a function that returns a value even 
though it is the appropriate type scans all of the partitions (#3) 
unfortunately.  Likewise any join, CTE, or sub-query expression, even for a 
single row that returns the correct type also results in a scan of all of the 
partitions.  

I was curious if there was a way specifically to get #3 to work as the WHERE 
predicate in this case is stored as an integer but the table itself is 
partitioned by the appropriate date type.  I believe I could work around this 
issue with dynamic sql in a function but there are lots of cases of this type 
of simple conversion and I wanted to avoid the maintenance of creating a 
function per query.

It's also slightly surprising that queries that join with the appropriate type 
(#4 & #5) also cause a full partition scan.  Is there a work-around to get 
constraint_exclusion to work in this case?


-- constraint exclusion tests
-- generate some data
create schema if not exists ptest;
set search_path=ptest;
drop table if exists ptest.tbl cascade;
create table if not exists tbl as select * from (
with a as (
select
generate_series('2014-01-01'::date, now(), '1 day'::interval)::date dt
),
b as (
select
generate_series(1, 1000) i
)
select
a.dt,
b.i,
md5((random()*4+5)::text) str
from
a cross join b
) c;

-- create child partitions
create table ptest.tbl_p2014(check (dt >= '2014-01-01'::date and dt < 
'2015-01-01'::date)) inherits (ptest.tbl);
create table ptest.tbl_p2015(check (dt >= '2015-01-01'::date and dt < 
'2016-01-01'::date)) inherits (ptest.tbl);

-- populate child partitions
with pd as ( delete from only ptest.tbl where dt >= '2014-01-01'::date and dt < 
'2015-01-01'::date returning *) 
insert into ptest.tbl_p2014 select * from pd;
with pd as ( delete from only ptest.tbl where dt >= '2015-01-01'::date and dt < 
'2016-01-01'::date returning *) 
insert into ptest.tbl_p2015 select * from pd;

-- clean parent of any data
truncate table only ptest.tbl;

-- create dt field indexes
create index i_tbl_dt on ptest.tbl(dt);
create index i_tbl_dt_p2014 on ptest.tbl_p2014(dt);
create index i_tble_dt_p2015 on ptest.tbl_p2015(dt);

-- vacuum
vacuum analyze verbose ptest.tbl;

-- verify parent is empty and partitions have some data (estimated)
select relname, n_live_tup from pg_stat_user_tables where relname like 'tbl%' 
and schemaname = 'ptest' order by relname;

-- check that partitions show in parent
\d+ ptest.tbl

-- force constraint_exclusion to partition
set constraint_exclusion = partition;

-- #1: works
explain analyze select count(1) from ptest.tbl where dt = '2014-06-01'::date;

-- #2: works
explain analyze select count(1) from ptest.tbl where dt = DATE '2014-06-01';

-- #3: full scan (no constraint exclusion)
explain analyze select count(1) from ptest.tbl where dt = 
to_date(201406::text||01::text, 'MMDD');

-- #4: full scan (no constraint exclusion)
explain analyze select count(1) from ptest.tbl where dt = (select 
'2014-06-01'::date);

-- #5: full scan (no constraint exclusion)
explain analyze with foo as (select '2014-06-01'::date dt)
select count(1) from ptest.tbl inner join foo on (ptest.tbl.dt = foo.dt);



-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partition Constraint Exclusion Limits

2015-10-27 Thread David G. Johnston
On Tue, Oct 27, 2015 at 2:29 PM, GMail  wrote:

> I have partitioned a large table in my PG database (6.7 billion rows!) by
> a date column and in general constraint exclusion works well but only in
> relatively simple case when the partition key is specified exactly as
> created in the CHECK constraint.  I'm curious if there is a way to get it
> to work a little more generally though.
>
> For example my CHECK constraint (see code below) specifying a hard-coded
> field value works well (#1 and #2).  Specifying a function that returns a
> value even though it is the appropriate type scans all of the partitions
> (#3) unfortunately.  Likewise any join, CTE, or sub-query expression, even
> for a single row that returns the correct type also results in a scan of
> all of the partitions.
>
> I was curious if there was a way specifically to get #3 to work as the
> WHERE predicate in this case is stored as an integer but the table itself
> is partitioned by the appropriate date type.  I believe I could work around
> this issue with dynamic sql in a function but there are lots of cases of
> this type of simple conversion and I wanted to avoid the maintenance of
> creating a function per query.
>

​Short answer, no.

The planner has the responsibility for performing constraint exclusion and
it only has access to constants during its evaluation.  It has no clue what
kind of transformations a function might do.  Various other optimizations
are indeed possible but are not presently performed.

​So, #3 (
to_date(201406::text||01::text, 'MMDD');
​) ​
is down-right impossible given the present architecture
​; and likely any future architecture.

With #4 (
explain analyze select count(1) from ptest.tbl where dt = (select
'2014-06-01'::date);
​) ​
in theory the re-write module could recognize and re-write this remove the
sub-select.
​  But likely real-life is not so simple otherwise the query writer likely
would have simply done is directly themself.

​
​
​
​In a partitioning scheme the partitioning data has to be injected into the
query explicitly so that it is already in place before the planner receives
the query.  Anything within the query requiring "execution" is handled by
the executor and at that point the chance to exclude partitions has come
and gone.

David J.


Re: [PERFORM] Partition Constraint Exclusion Limits

2015-10-27 Thread Vitalii Tymchyshyn
BTW: May be it could be feasible in future to perform partition exclusion
during the execution? This would be very neat feature.

Regards, Vitalii Tymchyshyn

Вт, 27 жовт. 2015 15:03 David G. Johnston  пише:

> On Tue, Oct 27, 2015 at 2:29 PM, GMail  wrote:
>
>> I have partitioned a large table in my PG database (6.7 billion rows!) by
>> a date column and in general constraint exclusion works well but only in
>> relatively simple case when the partition key is specified exactly as
>> created in the CHECK constraint.  I'm curious if there is a way to get it
>> to work a little more generally though.
>>
>> For example my CHECK constraint (see code below) specifying a hard-coded
>> field value works well (#1 and #2).  Specifying a function that returns a
>> value even though it is the appropriate type scans all of the partitions
>> (#3) unfortunately.  Likewise any join, CTE, or sub-query expression, even
>> for a single row that returns the correct type also results in a scan of
>> all of the partitions.
>>
>> I was curious if there was a way specifically to get #3 to work as the
>> WHERE predicate in this case is stored as an integer but the table itself
>> is partitioned by the appropriate date type.  I believe I could work around
>> this issue with dynamic sql in a function but there are lots of cases of
>> this type of simple conversion and I wanted to avoid the maintenance of
>> creating a function per query.
>>
>
> ​Short answer, no.
>
> The planner has the responsibility for performing constraint exclusion and
> it only has access to constants during its evaluation.  It has no clue what
> kind of transformations a function might do.  Various other optimizations
> are indeed possible but are not presently performed.
>
> ​So, #3 (
> to_date(201406::text||01::text, 'MMDD');
> ​) ​
> is down-right impossible given the present architecture
> ​; and likely any future architecture.
>
> With #4 (
> explain analyze select count(1) from ptest.tbl where dt = (select
> '2014-06-01'::date);
> ​) ​
> in theory the re-write module could recognize and re-write this remove the
> sub-select.
> ​  But likely real-life is not so simple otherwise the query writer likely
> would have simply done is directly themself.
>
> ​
> ​
> ​
> ​In a partitioning scheme the partitioning data has to be injected into
> the query explicitly so that it is already in place before the planner
> receives the query.  Anything within the query requiring "execution" is
> handled by the executor and at that point the chance to exclude partitions
> has come and gone.
>
> David J.
>


Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Bertrand Paquet
So,

Tonight, the index on the three field is used, may be my yesterday vacuum
updated stats.

Thx you for your help.

Regards,

Bertrand




2015-10-27 18:33 GMT+01:00 Bertrand Paquet :

> Hi tom,
>
> I did the test yesterday with an index on the three fields, and with a
> partial index on organization and status and where is null condition on
> handled. I saw no modification on query plan.
> May be I forgot to analyze vacuum after. I will retry tonight.
>
> I use a btree index. Is it the good solution, even with the In clause ?
>
> Regards,
>
> Bertrand
>
> Le mardi 27 octobre 2015, Tom Lane  a écrit :
>
>> Bertrand Paquet  writes:
>> > We have a slow query. After analyzing, the planner decision seems to be
>> > discutable : the query is faster when disabling seqscan. See below the
>> two
>> > query plan, and an extract from pg_stats.
>>
>> > Any idea about what to change to help the planner ?
>>
>> Neither one of those plans is very good: you're just hoping that the
>> Filter condition will let a tuple through sooner rather than later.
>>
>> If you care about the performance of this type of query, I'd consider
>> creating an index on (organization_id, status, handled_by) so that all
>> the conditions can be checked in the index.
>>
>> regards, tom lane
>>
>


Re: [PERFORM] Partition Constraint Exclusion Limits

2015-10-27 Thread Jim Nasby

On 10/27/15 3:33 PM, Vitalii Tymchyshyn wrote:

BTW: May be it could be feasible in future to perform partition
exclusion during the execution? This would be very neat feature.


True exclusion? probably not. The problem is you can't completely 
exclude something based on any value that could change during execution.


There has been some work done on declarative partition specification, 
where a given value would be fit to the exact partition it belong in. 
IIRC that's currently stalled though.


One thing you could try would be to create an index on each partition 
that would always be empty. IE, if you have a June 2015 partition, you 
could:


CREATE INDEX ... ON( date_field ) WHERE date_field < '2015-6-1'::date OR 
date_field >= '2015-7-1'::date;


Because the WHERE clause will never be true, that index will always be 
empty, which will make probing it very fast. I suspect that might be 
faster than probing a regular index on the date field, but you should 
test it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query planner wants to use seq scan

2015-10-27 Thread Jim Nasby

On 10/27/15 3:56 PM, Bertrand Paquet wrote:

Tonight, the index on the three field is used, may be my yesterday
vacuum updated stats.


BTW, you can run just ANALYZE, which is *far* faster than a VACUUM on a 
large table.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Scalability to more than 64 cores With PG 9.4 and RHEL 7.1 Kernel 3.10

2015-10-27 Thread Javier Muro



Has PostgreSQL 9.4  official Support on a system with more than 64 legitimate 
cores? ( 72 Cores , 4 CPUs  Intel(R) Xeon(R) CPU E7-8890 )


The  work Robert Haas did to fix the CPU locking way back when showed 
significant improvements up to 64, but so far as I know.

Thanks in Advance.