Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-28 Thread Tomas Vondra
Hi,

the perf profiles look really weird - clearly, there's a lot of lock
contention, because the top item is this

13.49%13.40%  postmaster   postgres  [.]   LWLockAcquire
 |

 ---LWLockAcquire



That's a sign of lock contention - not sure which one, though. And then
there's heap_hot_search_buffer

12.65% 7.92%  postmaster   postgres [.] heap_hot_search_buffer
 |

 ---heap_hot_search_buffer


So either there's a table with many HOT-updated rows (repeatedly), or we
search the HOT chains very frequently for some reason. Considering it
only affects the non-LIMIT case, I'd guess the latter.

There also seems to be quite a few page faults, for some reason.

Trying to reproduce this without the actual data is rather futile. We
need some sort of reproducer (sample data to test the query on).

regards

On 03/28/2018 10:30 AM, Alessandro Aste wrote:
> Hello,  any news ?
> 
> Thank you,
> 
> Alessandro.
> 
> On Fri, Mar 23, 2018 at 8:22 PM, Alessandro Aste
> > wrote:
> 
> PS , in the meanwhile I discovered a 2nd workaround(beside disabling
> parallel processing) . I added offset  0 to the subquery , and,
> according to the documentation, “OFFSET 0 is the same as omitting
> the OFFSET clause”
> - https://www.postgresql.org/docs/current/static/queries-limit.html
> 
> 
> cmd3dev=# show max_parallel_workers_per_gather ;
> 
> *max_parallel_workers_per_gather*
> 
> *-*
> 
> *8*
> 
> (1 row)
> 
>  
> 
> cmd3dev=# \timing
> 
> Timing is on.
> 
> cmd3dev=#  SELECT * FROM (SELECT seg.circuit_id AS id,
> vendor_gtt_pop.vendor_id, CASE WHEN vendor_gtt_pop.push_to_gaa = 1
> THEN 'Y' END as gaa, pop.gii_circuitid AS pop_name, cst.label AS
> pop_status, seg.a_company_name AS pop_location, seg.vendor_id AS
> pop_provider_id, seg.vendor_name AS pop_provider, cs.address1 AS
> pop_street, cs.city AS pop_city, cs.postal_code AS pop_postal_code,
> cs.state AS pop_state, csc.code AS pop_country, cs.country_id AS
> pop_country_id FROM ( SELECT c.gii_circuitid, max(so.id
> ) AS service_order_id FROM service_order so join
> circuit c on c.product_id=so.product_id join master_service_order
> mso on mso.id =so.master_service_order_id WHERE NOT
> (so.ordertype_id = 2 AND so.status <> 999) AND NOT (so.ordertype_id
> = 3 AND so.status <> 999) AND c.status >= 20 AND c.status not in
> (160,999) AND mso.client_id=11615 AND c.service_description=28 AND
> c.status!=160 GROUP BY c.gii_circuitid ) pop JOIN service_order so
> ON so.id  = pop.service_order_id left JOIN client_site
> cs on cs.id =so.a_site_id left JOIN country csc on
> csc.id =cs.country_id JOIN circuit c ON
> so.product_id=c.product_id JOIN circuit_status cst ON cst.id
> =c.status JOIN ( SELECT c.id  AS
> circuit_id, sg.id  AS segment_id,
> c.pop_support_vendor_id AS vendor_id, v.name  AS
> vendor_name, sg.a_company_name FROM segment sg JOIN circuit_layout
> cl ON cl.segment_id = sg.id  AND cl.ordinal = 1 JOIN
> circuit c ON c.id  = cl.circuit_id JOIN vendor v ON
> v.id  = c.pop_support_vendor_id ) seg ON seg.circuit_id
> = c.id  JOIN vendor_gtt_pop on
> vendor_gtt_pop.gtt_pop_id = seg.circuit_id offset 0) foo where
> vendor_id = 12346 AND (1=1) ORDER BY pop_name ASC,id desc LIMIT 10;
> 
>    id   | vendor_id | gaa |   pop_name    | pop_status |
> pop_location | pop_provider_id | pop_provider | pop_street 
> | pop_city | pop
> 
> _postal_code | pop_state | pop_country | pop_country_id
> 
> 
> +---+-+---++--+-+--+-+--+
> 
> -+---+-+
> 
> 684807 | 12346 | Y   | GTT/POP/LON1T | Active | LON1T   
> |   12288 | Telehouse UK | 14 Coriander Avenue | London   | E14
> 
> 2AA |   | GB  |    219
> 
> (1 row)
> 
>  
> 
> *Time: 2245.073 ms (00:02.245)*
> 
> 
> 
> 
> On Fri, Mar 23, 2018 at 9:31 AM, Alessandro Aste
> > wrote:
> 
> Tomas, I'm attaching a 4MB file with the perf report. Let me
> know if it gets blocked, I'll shrink it to the first 1000 lines. 
> 
> Thank you,
> 
> Alessandro.
> 
> On Thu, Mar 22, 2018 at 11:42 PM, Tomas Vondra
>  > wrote:
> 
>   

Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-28 Thread Alessandro Aste
Hello,  any news ?

Thank you,

Alessandro.

On Fri, Mar 23, 2018 at 8:22 PM, Alessandro Aste 
wrote:

> PS , in the meanwhile I discovered a 2nd workaround(beside disabling
> parallel processing) . I added offset  0 to the subquery , and, according
> to the documentation, “OFFSET 0 is the same as omitting the OFFSET clause”
> - https://www.postgresql.org/docs/current/static/queries-limit.html
>
> cmd3dev=# show max_parallel_workers_per_gather ;
>
> *max_parallel_workers_per_gather*
>
> *-*
>
> *8*
>
> (1 row)
>
>
>
> cmd3dev=# \timing
>
> Timing is on.
>
> cmd3dev=#  SELECT * FROM (SELECT seg.circuit_id AS id,
> vendor_gtt_pop.vendor_id, CASE WHEN vendor_gtt_pop.push_to_gaa = 1 THEN 'Y'
> END as gaa, pop.gii_circuitid AS pop_name, cst.label AS pop_status,
> seg.a_company_name AS pop_location, seg.vendor_id AS pop_provider_id,
> seg.vendor_name AS pop_provider, cs.address1 AS pop_street, cs.city AS
> pop_city, cs.postal_code AS pop_postal_code, cs.state AS pop_state,
> csc.code AS pop_country, cs.country_id AS pop_country_id FROM ( SELECT
> c.gii_circuitid, max(so.id) AS service_order_id FROM service_order so
> join circuit c on c.product_id=so.product_id join master_service_order mso
> on mso.id=so.master_service_order_id WHERE NOT (so.ordertype_id = 2 AND
> so.status <> 999) AND NOT (so.ordertype_id = 3 AND so.status <> 999) AND
> c.status >= 20 AND c.status not in (160,999) AND mso.client_id=11615 AND
> c.service_description=28 AND c.status!=160 GROUP BY c.gii_circuitid ) pop
> JOIN service_order so ON so.id = pop.service_order_id left JOIN
> client_site cs on cs.id=so.a_site_id left JOIN country csc on 
> csc.id=cs.country_id
> JOIN circuit c ON so.product_id=c.product_id JOIN circuit_status cst ON
> cst.id=c.status JOIN ( SELECT c.id AS circuit_id, sg.id AS segment_id,
> c.pop_support_vendor_id AS vendor_id, v.name AS vendor_name,
> sg.a_company_name FROM segment sg JOIN circuit_layout cl ON cl.segment_id =
> sg.id AND cl.ordinal = 1 JOIN circuit c ON c.id = cl.circuit_id JOIN
> vendor v ON v.id = c.pop_support_vendor_id ) seg ON seg.circuit_id = c.id
> JOIN vendor_gtt_pop on vendor_gtt_pop.gtt_pop_id = seg.circuit_id offset 0)
> foo where vendor_id = 12346 AND (1=1) ORDER BY pop_name ASC,id desc LIMIT
> 10;
>
>id   | vendor_id | gaa |   pop_name| pop_status | pop_location |
> pop_provider_id | pop_provider | pop_street  | pop_city | pop
>
> _postal_code | pop_state | pop_country | pop_country_id
>
> +---+-+---++
> --+-+--+
> -+--+
>
> -+---+-+
>
> 684807 | 12346 | Y   | GTT/POP/LON1T | Active | LON1T
> |   12288 | Telehouse UK | 14 Coriander Avenue | London   | E14
>
> 2AA |   | GB  |219
>
> (1 row)
>
>
>
> *Time: 2245.073 ms (00:02.245)*
>
>
>
> On Fri, Mar 23, 2018 at 9:31 AM, Alessandro Aste <
> alessandro.a...@gmail.com> wrote:
>
>> Tomas, I'm attaching a 4MB file with the perf report. Let me know if it
>> gets blocked, I'll shrink it to the first 1000 lines.
>>
>> Thank you,
>>
>> Alessandro.
>>
>> On Thu, Mar 22, 2018 at 11:42 PM, Tomas Vondra <
>> tomas.von...@2ndquadrant.com> wrote:
>>
>>> On 03/22/2018 11:29 PM, Alessandro Aste wrote:
>>> > Thanks Tomas. We're currently building postgres from source. In order
>>> to
>>> > enable symbols, you want me to re-configure postres with
>>> --enable-debug
>>> > then run perf?
>>> >
>>>
>>> Yes.
>>>
>>> regards
>>>
>>> --
>>> Tomas Vondra  http://www.2ndQuadrant.com
>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>
>>
>>
>


Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-23 Thread Alessandro Aste
 PS , in the meanwhile I discovered a 2nd workaround(beside disabling
parallel processing) . I added offset  0 to the subquery , and, according
to the documentation, “OFFSET 0 is the same as omitting the OFFSET clause” -
 https://www.postgresql.org/docs/current/static/queries-limit.html

cmd3dev=# show max_parallel_workers_per_gather ;

*max_parallel_workers_per_gather*

*-*

*8*

(1 row)



cmd3dev=# \timing

Timing is on.

cmd3dev=#  SELECT * FROM (SELECT seg.circuit_id AS id,
vendor_gtt_pop.vendor_id, CASE WHEN vendor_gtt_pop.push_to_gaa = 1 THEN 'Y'
END as gaa, pop.gii_circuitid AS pop_name, cst.label AS pop_status,
seg.a_company_name AS pop_location, seg.vendor_id AS pop_provider_id,
seg.vendor_name AS pop_provider, cs.address1 AS pop_street, cs.city AS
pop_city, cs.postal_code AS pop_postal_code, cs.state AS pop_state,
csc.code AS pop_country, cs.country_id AS pop_country_id FROM ( SELECT
c.gii_circuitid, max(so.id) AS service_order_id FROM service_order so join
circuit c on c.product_id=so.product_id join master_service_order mso on
mso.id=so.master_service_order_id WHERE NOT (so.ordertype_id = 2 AND
so.status <> 999) AND NOT (so.ordertype_id = 3 AND so.status <> 999) AND
c.status >= 20 AND c.status not in (160,999) AND mso.client_id=11615 AND
c.service_description=28 AND c.status!=160 GROUP BY c.gii_circuitid ) pop
JOIN service_order so ON so.id = pop.service_order_id left JOIN client_site
cs on cs.id=so.a_site_id left JOIN country csc on csc.id=cs.country_id JOIN
circuit c ON so.product_id=c.product_id JOIN circuit_status cst ON
cst.id=c.status
JOIN ( SELECT c.id AS circuit_id, sg.id AS segment_id,
c.pop_support_vendor_id AS vendor_id, v.name AS vendor_name,
sg.a_company_name FROM segment sg JOIN circuit_layout cl ON cl.segment_id =
sg.id AND cl.ordinal = 1 JOIN circuit c ON c.id = cl.circuit_id JOIN vendor
v ON v.id = c.pop_support_vendor_id ) seg ON seg.circuit_id = c.id JOIN
vendor_gtt_pop on vendor_gtt_pop.gtt_pop_id = seg.circuit_id offset 0) foo
where vendor_id = 12346 AND (1=1) ORDER BY pop_name ASC,id desc LIMIT 10;

   id   | vendor_id | gaa |   pop_name| pop_status | pop_location |
pop_provider_id | pop_provider | pop_street  | pop_city | pop

_postal_code | pop_state | pop_country | pop_country_id

+---+-+---++--+-+--+-+--+

-+---+-+

684807 | 12346 | Y   | GTT/POP/LON1T | Active | LON1T
|   12288 | Telehouse UK | 14 Coriander Avenue | London   | E14

2AA |   | GB  |219

(1 row)



*Time: 2245.073 ms (00:02.245)*



On Fri, Mar 23, 2018 at 9:31 AM, Alessandro Aste 
wrote:

> Tomas, I'm attaching a 4MB file with the perf report. Let me know if it
> gets blocked, I'll shrink it to the first 1000 lines.
>
> Thank you,
>
> Alessandro.
>
> On Thu, Mar 22, 2018 at 11:42 PM, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>
>> On 03/22/2018 11:29 PM, Alessandro Aste wrote:
>> > Thanks Tomas. We're currently building postgres from source. In order to
>> > enable symbols, you want me to re-configure postres with  --enable-debug
>> > then run perf?
>> >
>>
>> Yes.
>>
>> regards
>>
>> --
>> Tomas Vondra  http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>


Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-22 Thread Alessandro Aste
Thanks Tomas. We're currently building postgres from source. In order to
enable symbols, you want me to re-configure postres with  --enable-debug
then run perf?

Regards,

On Thu, Mar 22, 2018 at 5:00 PM, Tomas Vondra 
wrote:

>
>
> On 03/22/2018 11:33 AM, Alessandro Aste wrote:
> > Tomas, thank you. This machine  is abare metal server running only a
> > staging postgresql 10.3  instance. Nobody is using it beside me.
> >
> > I'm attaching 4 files.
> >
> > every_30_seconds_top_stats_during_query.txt- this is a caputure of
> > the top command every 30 seconds(more or less) for 10+ minutes while I'm
> > running the query. Let me know if this helps to answere your question.
> > EXPLAIN_WITH_LIMIT_AND_max_parallel_workers_per_gather_SET_TO_0.txt   -
> > query plan  with full query and  max_parallel_workers_per_gather  force
> > to 0. Full output.
> > EXPLAIN_WITH_LIMIT_AND_DEFAULT_PARALLEL_PROCESSING_SETTINGS.txt  -
> >  query plan with full query and  default parellel processing settings.
> > Full output.
> > EXPLAIN_WITHOUT_LIMIT_AND_DEFAULT_PARALLEL_PROCESSING_SETTINGS.tx  -
> > query plan of the query omitting the LIMIT clause and default parellel
> > processing settings. Full output.
> >
>
> OK. Looking at the top output, I see this:
>
>PID USER   VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 104880 postgres  30.8g 1.9g 1.9g R 92.0  1.5  15:15.60 postmaster
> 111732 postgres  30.8g 476m 473m R 88.2  0.4   0:00.47 postmaster
> 111730 postgres  30.8g 473m 470m R 86.4  0.4   0:00.46 postmaster
> 111731 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
> 111733 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
> 111734 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
> 111728 root  15824 1912  828 R  3.8  0.0   0:00.04 top
>
> That means it certainly is not stuck, it's simply doing a lot of work on
> CPU. The question is why and what it's doing ...
>
> Can you collect some CPU profiles using perf? There's a howto here:
>
> https://wiki.postgresql.org/wiki/Profiling_with_perf
>
> But in short - install perf, install debuginfo packages for postgres,
> and then do
>
> perf record -a -g -s sleep 60
>
> while running the query. Once the profile data is collected, do
>
> perf report > report.txt
>
> and share the report.txt with us (well, if it's very large you may need
> to only cut the first 1000 lines or so).
>
> That should tell us in which functions most of the time is spent. That
> will give us some hints, hopefully.
>
> >
> > For what concerns the  self-contained test case - I'll do my best to
> > prepare it.
> >
>
> Yeah, that would be helpful.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-22 Thread Tomas Vondra
On 03/22/2018 11:29 PM, Alessandro Aste wrote:
> Thanks Tomas. We're currently building postgres from source. In order to
> enable symbols, you want me to re-configure postres with  --enable-debug
> then run perf?
> 

Yes.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-22 Thread Tomas Vondra


On 03/22/2018 11:33 AM, Alessandro Aste wrote:
> Tomas, thank you. This machine  is abare metal server running only a
> staging postgresql 10.3  instance. Nobody is using it beside me. 
> 
> I'm attaching 4 files.  
> 
> every_30_seconds_top_stats_during_query.txt    - this is a caputure of
> the top command every 30 seconds(more or less) for 10+ minutes while I'm
> running the query. Let me know if this helps to answere your question.
> EXPLAIN_WITH_LIMIT_AND_max_parallel_workers_per_gather_SET_TO_0.txt   - 
> query plan  with full query and  max_parallel_workers_per_gather  force
> to 0. Full output.
> EXPLAIN_WITH_LIMIT_AND_DEFAULT_PARALLEL_PROCESSING_SETTINGS.txt  - 
>  query plan with full query and  default parellel processing settings. 
> Full output.
> EXPLAIN_WITHOUT_LIMIT_AND_DEFAULT_PARALLEL_PROCESSING_SETTINGS.tx  -
> query plan of the query omitting the LIMIT clause and default parellel
> processing settings. Full output.
> 

OK. Looking at the top output, I see this:

   PID USER   VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
104880 postgres  30.8g 1.9g 1.9g R 92.0  1.5  15:15.60 postmaster
111732 postgres  30.8g 476m 473m R 88.2  0.4   0:00.47 postmaster
111730 postgres  30.8g 473m 470m R 86.4  0.4   0:00.46 postmaster
111731 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
111733 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
111734 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
111728 root  15824 1912  828 R  3.8  0.0   0:00.04 top

That means it certainly is not stuck, it's simply doing a lot of work on
CPU. The question is why and what it's doing ...

Can you collect some CPU profiles using perf? There's a howto here:

https://wiki.postgresql.org/wiki/Profiling_with_perf

But in short - install perf, install debuginfo packages for postgres,
and then do

perf record -a -g -s sleep 60

while running the query. Once the profile data is collected, do

perf report > report.txt

and share the report.txt with us (well, if it's very large you may need
to only cut the first 1000 lines or so).

That should tell us in which functions most of the time is spent. That
will give us some hints, hopefully.

> 
> For what concerns the  self-contained test case - I'll do my best to
> prepare it.
> 

Yeah, that would be helpful.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-22 Thread Alessandro Aste
Tomas, thank you. This machine  is abare metal server running only a
staging postgresql 10.3  instance. Nobody is using it beside me.

I'm attaching 4 files.

every_30_seconds_top_stats_during_query.txt- this is a caputure of the
top command every 30 seconds(more or less) for 10+ minutes while I'm
running the query. Let me know if this helps to answere your question.
EXPLAIN_WITH_LIMIT_AND_max_parallel_workers_per_gather_SET_TO_0.txt   -  query
plan  with full query and  max_parallel_workers_per_gather  force to 0.
Full output.
EXPLAIN_WITH_LIMIT_AND_DEFAULT_PARALLEL_PROCESSING_SETTINGS.txt  -query
plan with full query and  default parellel processing settings.  Full
output.
EXPLAIN_WITHOUT_LIMIT_AND_DEFAULT_PARALLEL_PROCESSING_SETTINGS.tx  - query
plan of the query omitting the LIMIT clause and default parellel processing
settings. Full output.


For what concerns the  self-contained test case - I'll do my best to
prepare it.

Thank you very much, please let me know if this answer your questions.






Il 22 mar 2018 3:04 AM, "Tomas Vondra"  ha
scritto:

>
> On 03/21/2018 08:44 PM, Alessandro Aste wrote:
> > Thanks for your reply Tomas.  The query just got stuck for forever.  I
> > observed no CPU spikes, it is currently running and I see 89 of the CPU
> > idle. I'm using a relative powerfull machine (120 GB ram, 56 CPU).
> >
>
> That doesn't really answer the question, I'm afraid. I suppose "89 of
> CPU" means that 89% idle in total, but 11% with 56 CPUs still means
> about 6 cores 100% busy. But maybe you meant something else?
>
> Is there something else running on the machine? If you look at "top" are
> the processes (the one you're connected to and the parallel workers)
> doing something on the CPU?
>
> >
> > Plain analyze as requested. :
> >
>
> I don't see anything obviously broken with the query plan, and it's
> difficult to compare with the other plans because they are quite different.
>
> But there's one part of the plan interesting:
>
>  Limit  (cost=253523.56..253523.57 rows=1 width=176)
>->  Sort  (cost=253523.56..253523.57 rows=1 width=176)
>  Sort Key: c_2.gii_circuitid, c_1.id
>  ->  Nested Loop  (cost=33190.89..253523.55 rows=1 width=176)
>Join Filter: (c_1.id = c.id)
>->  Nested Loop  (cost=31724.87..31736.29 rows=1 ...)
>  ...
>->  Gather  (cost=1466.02..221787.23 rows=3 width=75)
>  Workers Planned: 5
>  ->  Hash Join  (cost=466.02..220786.93 rows=1 ...)
>  ...
>
> That is, there's a Gather on the inner side of a Nested Loop. I wonder
> if that might cause issues in case of under-estimate (in which case we'd
> be restarting the Gather many times) ...
>
>
> BTW one of the plans you sent earlier is incomplete, because it ends
> like this:
>
> ->  Nested Loop  (cost=42469.41..42480.82 rows=1 width=85) (...)
> Join Filter: (c.status = cst.id)
> Time: 3016.688 ms (00:03.017)
>
> That is, it's missing the part below the join.
>
>
> That being said, I'm not sure what's the issue here. Can you prepare a
> self-contained test case that we might use to reproduce the issue? For
> example by dumping the relevant part of the tables?
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
QUERY PLAN
Sort  (cost=253514.80..253514.81 rows=1 width=176)
  Sort Key: c_2.gii_circuitid, c_1.id
  ->  Hash Join  (cost=33193.56..253514.79 rows=1 width=176)
Hash Cond: (c_1.id = c.id)
->  Gather  (cost=1466.02..221787.23 rows=3 width=75)
  Workers Planned: 5
  ->  Hash Join  (cost=466.02..220786.93 rows=1 width=75)
Hash Cond: (c_1.id = vendor_gtt_pop.gtt_pop_id)
->  Hash Join  (cost=444.07..219779.19 rows=157724 width=63)
  Hash Cond: (c_1.pop_support_vendor_id = v.id)
  ->  Merge Join  (cost=5.02..217348.87 rows=157724 
width=40)
Merge Cond: (cl.circuit_id = c_1.id)
->  Nested Loop  (cost=0.86..171314.49 
rows=157724 width=32)
  ->  Parallel Index Only Scan using 
circuit_layout_idx on circuit_layout cl  (cost=0.43..55430.93 rows=157724 
width=8)
Index Cond: (ordinal = 1)
  ->  Index Scan using uniqid on segment sg 
 (cost=0.43..0.73 rows=1 width=32)
Index Cond: (id = cl.segment_id)
->  Index Scan using circuit_id_key on circuit 
c_1  (cost=0.42..41790.58 rows=909014 width=8)
  ->  Hash  (cost=325.69..325.69 rows=9069 width=27)
->  Seq Scan on vendor v  (cost=0.00..325.69 

Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-21 Thread Tomas Vondra

On 03/21/2018 08:44 PM, Alessandro Aste wrote:
> Thanks for your reply Tomas.  The query just got stuck for forever.  I
> observed no CPU spikes, it is currently running and I see 89 of the CPU
> idle. I'm using a relative powerfull machine (120 GB ram, 56 CPU).  
> 

That doesn't really answer the question, I'm afraid. I suppose "89 of
CPU" means that 89% idle in total, but 11% with 56 CPUs still means
about 6 cores 100% busy. But maybe you meant something else?

Is there something else running on the machine? If you look at "top" are
the processes (the one you're connected to and the parallel workers)
doing something on the CPU?

> 
> Plain analyze as requested. : 
> 

I don't see anything obviously broken with the query plan, and it's
difficult to compare with the other plans because they are quite different.

But there's one part of the plan interesting:

 Limit  (cost=253523.56..253523.57 rows=1 width=176)
   ->  Sort  (cost=253523.56..253523.57 rows=1 width=176)
 Sort Key: c_2.gii_circuitid, c_1.id
 ->  Nested Loop  (cost=33190.89..253523.55 rows=1 width=176)
   Join Filter: (c_1.id = c.id)
   ->  Nested Loop  (cost=31724.87..31736.29 rows=1 ...)
 ...
   ->  Gather  (cost=1466.02..221787.23 rows=3 width=75)
 Workers Planned: 5
 ->  Hash Join  (cost=466.02..220786.93 rows=1 ...)
 ...

That is, there's a Gather on the inner side of a Nested Loop. I wonder
if that might cause issues in case of under-estimate (in which case we'd
be restarting the Gather many times) ...


BTW one of the plans you sent earlier is incomplete, because it ends
like this:

->  Nested Loop  (cost=42469.41..42480.82 rows=1 width=85) (...)
Join Filter: (c.status = cst.id)
Time: 3016.688 ms (00:03.017)

That is, it's missing the part below the join.


That being said, I'm not sure what's the issue here. Can you prepare a
self-contained test case that we might use to reproduce the issue? For
example by dumping the relevant part of the tables?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-21 Thread Alessandro Aste
Thanks for your reply Tomas.  The query just got stuck for forever.  I
observed no CPU spikes, it is currently running and I see 89 of the CPU
idle. I'm using a relative powerfull machine (120 GB ram, 56 CPU).


Plain analyze as requested. :



QUERY PLAN



-
-
 Limit  (cost=253523.56..253523.57 rows=1 width=176)
   ->  Sort  (cost=253523.56..253523.57 rows=1 width=176)
 Sort Key: c_2.gii_circuitid, c_1.id
 ->  Nested Loop  (cost=33190.89..253523.55 rows=1 width=176)
   Join Filter: (c_1.id = c.id)
   ->  Nested Loop  (cost=31724.87..31736.29 rows=1 width=85)
 Join Filter: (c.status = cst.id)
 ->  Nested Loop  (cost=31724.87..31734.84 rows=1
width=74)
   ->  Nested Loop Left Join
(cost=31724.45..31734.35 rows=1 width=70)
 Join Filter: (csc.id = cs.country_id)
 ->  Nested Loop Left Join
(cost=31724.45..31726.73 rows=1 width=68)
   ->  Nested Loop
(cost=31724.02..31726.27 rows=1 width=30)
 ->  GroupAggregate
(cost=31723.60..31723.62 rows=1 width=26)
   Group Key:
c_2.gii_circuitid
   ->  Sort
(cost=31723.60..31723.60 rows=1 width=26)
 Sort Key:
c_2.gii_circuitid
 ->  Gather
(cost=1000.85..31723.59 rows=1 width=26)
   Workers
Planned: 3
   ->  Nested
Loop  (cost=0.85..30723.49 rows=1 width=26)
 ->
Nested Loop  (cost=0.42..30722.56 rows=2 width=30)

 ->  Parallel Seq Scan on circuit c_2  (cost=0.00..30714.61 rows=3
width=26)

 Filter: ((status >= 20) AND (status <> ALL ('{160,999}'::int
eger[])) AND (status <> 160) AND (service_description = 28))

 ->  Index Scan using so_pid_idx on service_order so_1  (cost=0.42.
.2.65 rows=1 width=12)

 Index Cond: (product_id = c_2.product_id)

 Filter: (((ordertype_id <> 2) OR (status = 999)) AND ((order
type_id <> 3) OR (status = 999)))
 ->
Index Scan using master_service_order_id_key on master_service_order
 mso  (cost=0.42..0.46 rows=1 width=4)

 Index Cond: (id = so_1.master_service_order_id)

 Filter: (client_id = 11615)
 ->  Index Scan using
service_order_id_key on service_order so  (cost=0.42..2.64 rows=1 width=12)
   Index Cond: (id = (max(
so_1.id)))
   ->  Index Scan using
client_site_pkey on client_site cs  (cost=0.42..0.46 rows=1 width=46)
 Index Cond: (id = so.a_site_id)
 ->  Seq Scan on country csc
(cost=0.00..4.50 rows=250 width=6)
   ->  Index Scan using circuit_product_id_idx on
circuit c  (cost=0.42..0.49 rows=1 width=12)
 Index Cond: (product_id = so.product_id)
 ->  Seq Scan on circuit_status cst  (cost=0.00..1.20
rows=20 width=19)
   ->  Gather  (cost=1466.02..221787.23 rows=3 width=75)
 Workers Planned: 5
 ->  Hash Join  (cost=466.02..220786.93 rows=1 width=75)
   Hash Cond: (c_1.id = vendor_gtt_pop.gtt_pop_id)
   ->  Hash Join  (cost=444.07..219779.19
rows=157724 width=63)
 Hash Cond: (c_1.pop_support_vendor_id =
v.id)
 ->  Merge Join  (cost=5.02..217348.87
rows=157724 width=40)
   Merge Cond: (cl.circuit_id = c_1.id)
   ->  Nested Loop
(cost=0.86..171314.49 rows=157724 width=32)
 ->  Parallel Index Only Scan
using circuit_layout_idx on circuit_layout cl  (cost=0.43..55430.93
 rows=157724 width=8)
   Index Cond: (ordinal = 1)
 ->  Index Scan using uniqid on
segment sg  (cost=0.43..0.73 rows=1 width=32)
   Index Cond: (id =
cl.segment_id)
   ->  Index Scan using circuit_id_key
on circuit c_1  (cost=0.42..41790.58 rows=909014 width=8)
 ->  Hash  (cost=325.69..325.69 

Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-21 Thread Tomas Vondra


On 03/21/2018 05:09 PM, Alessandro Aste wrote:
> Hi there, we are using postgresql 10.3 and we're facing an issue with a
> query. The query (full query below)  completes only  when: 
> 
> 1 - LIMIT 10 is removed
> or
> 2 -  show max_parallel_workers_per_gather  is set to 0, so parallel
> processing is disabled.
> 
> With  max_parallel_workers_per_gather   set to the default value (8) I'm
> not even able to get the query plan.
> 
> Notes: 
> 
>   * We're experiencing the issue in any server of ours but I've
> reproduced the issue in a fresh restored database with full
> vacuum/reindex of the tables.
>   * We didn't touch any parameter concering the parallel processing,
> we're running the defaults: 
> 
> 
> cmdstaging=# show max_parallel_workers_per_gather ;
>  max_parallel_workers_per_gather
> -
>  8
> (1 row)
> 
> cmdstaging=# show max_worker_processes ;
>  max_worker_processes
> --
>  8
> (1 row)
> 
> cmdstaging=# show max_parallel_workers;
>  max_parallel_workers
> --
>  8
> (1 row)
> 
> 
> 
> 
> The query completes only omitting the LIMIT clause or when I disable
> parallel processing: 
> 
>  id   | vendor_id | gaa |   pop_name    | pop_status | pop_location |
> pop_provider_id | pop_provider |     pop_street      | pop_city | pop
> _postal_code | pop_state | pop_country | pop_country_id
> +---+-+---++--+-+--+-+--+
> -+---+-+
>  684807 |     12346 |     | GTT/POP/LON1T | Active     | LON1T        | 
>          12288 | Telehouse UK | 14 Coriander Avenue | London   | E14
>  2AA         |           | GB          |            219
> (1 row)
> 
> Time: 4374.759 ms (00:04.375)
> cmdstaging=# show max_parallel_workers_per_gather ;
>  max_parallel_workers_per_gather
> -
>  0
> (1 row)
> 
> Time: 0.097 ms
> 
> 
> Otherwise it just keep running for forever. 
> 

When you say "running forever" is it actually using CPU, or does it get
stuck on something?

> 
> This is the full query: 
> 
> 
> SELECT * FROM (
> SELECT
> seg.circuit_id AS id,
> vendor_gtt_pop.vendor_id,
> CASE WHEN vendor_gtt_pop.push_to_gaa = 1 THEN 'Y' END as gaa,
> pop.gii_circuitid AS pop_name,
> cst.label AS pop_status,
> seg.a_company_name AS pop_location,
> seg.vendor_id AS pop_provider_id,
> seg.vendor_name AS pop_provider,
> cs.address1 AS pop_street,
> cs.city AS pop_city,
> cs.postal_code AS pop_postal_code,
> cs.state AS pop_state,
> csc.code AS pop_country,
> cs.country_id AS pop_country_id
> FROM (
> SELECT c.gii_circuitid, max(so.id ) AS service_order_id
> FROM service_order so
> join circuit c on c.product_id=so.product_id
> join master_service_order mso on mso.id 
> =so.master_service_order_id
> WHERE NOT (so.ordertype_id = 2 AND so.status <> 999) AND
> NOT (so.ordertype_id = 3 AND so.status <> 999) AND
> c.status >= 20 AND
> c.status not in (160,999) AND
> mso.client_id=11615 AND
> c.service_description=28 AND
> c.status!=160
> GROUP BY c.gii_circuitid
> ) pop
> JOIN service_order so ON so.id  = pop.service_order_id
> left JOIN client_site cs on cs.id =so.a_site_id
> left JOIN country csc on csc.id =cs.country_id
> JOIN circuit c ON so.product_id=c.product_id
> JOIN circuit_status cst ON cst.id =c.status
> JOIN (
> SELECT c.id  AS circuit_id, sg.id  AS segment_id, 
> c.pop_support_vendor_id AS vendor_id,
> v.name  AS vendor_name, sg.a_company_name
> FROM segment sg
> JOIN circuit_layout cl ON cl.segment_id = sg.id  AND cl.ordinal 
> = 1
> JOIN circuit c ON c.id  = cl.circuit_id
> JOIN vendor v ON v.id  = c.pop_support_vendor_id
> ) seg ON seg.circuit_id = c.id 
> JOIN vendor_gtt_pop on vendor_gtt_pop.gtt_pop_id = seg.circuit_id
> ) foo where vendor_id = 12346 AND (1=1) ORDER BY pop_name ASC,id LIMIT 10
> 
> 
> 
> Execution plan with  max_parallel_workers_per_gather =0 , 
> max_parallel_workers_per_gather =8 and no LIMIT clause : 
> 
> 
We really need to see the execution plan that causes issues, i.e.
max_parallel_workers_per_gather=8 with LIMIT clause. Plain explain
(without analyze), at least.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-21 Thread Alessandro Aste
Hi there, we are using postgresql 10.3 and we're facing an issue with a
query. The query (full query below)  completes only  when:

1 - LIMIT 10 is removed
or
2 -  show max_parallel_workers_per_gather  is set to 0, so parallel
processing is disabled.

With  max_parallel_workers_per_gather   set to the default value (8) I'm
not even able to get the query plan.

Notes:


   - We're experiencing the issue in any server of ours but I've reproduced
   the issue in a fresh restored database with full vacuum/reindex of the
   tables.
   - We didn't touch any parameter concering the parallel processing, we're
   running the defaults:


cmdstaging=# show max_parallel_workers_per_gather ;
 max_parallel_workers_per_gather
-
 8
(1 row)

cmdstaging=# show max_worker_processes ;
 max_worker_processes
--
 8
(1 row)

cmdstaging=# show max_parallel_workers;
 max_parallel_workers
--
 8
(1 row)




The query completes only omitting the LIMIT clause or when I disable
parallel processing:

 id   | vendor_id | gaa |   pop_name| pop_status | pop_location |
pop_provider_id | pop_provider | pop_street  | pop_city | pop
_postal_code | pop_state | pop_country | pop_country_id
+---+-+---++--+-+--+-+--+
-+---+-+
 684807 | 12346 | | GTT/POP/LON1T | Active | LON1T|
   12288 | Telehouse UK | 14 Coriander Avenue | London   | E14
 2AA |   | GB  |219
(1 row)

Time: 4374.759 ms (00:04.375)
cmdstaging=# show max_parallel_workers_per_gather ;
 max_parallel_workers_per_gather
-
 0
(1 row)

Time: 0.097 ms


Otherwise it just keep running for forever.






This is the full query:



SELECT * FROM (
SELECT
seg.circuit_id AS id,
vendor_gtt_pop.vendor_id,
CASE WHEN vendor_gtt_pop.push_to_gaa = 1 THEN 'Y' END as gaa,
pop.gii_circuitid AS pop_name,
cst.label AS pop_status,
seg.a_company_name AS pop_location,
seg.vendor_id AS pop_provider_id,
seg.vendor_name AS pop_provider,
cs.address1 AS pop_street,
cs.city AS pop_city,
cs.postal_code AS pop_postal_code,
cs.state AS pop_state,
csc.code AS pop_country,
cs.country_id AS pop_country_id
FROM (
SELECT c.gii_circuitid, max(so.id) AS service_order_id
FROM service_order so
join circuit c on c.product_id=so.product_id
join master_service_order mso on mso.id=so.master_service_order_id
WHERE NOT (so.ordertype_id = 2 AND so.status <> 999) AND
NOT (so.ordertype_id = 3 AND so.status <> 999) AND
c.status >= 20 AND
c.status not in (160,999) AND
mso.client_id=11615 AND
c.service_description=28 AND
c.status!=160
GROUP BY c.gii_circuitid
) pop
JOIN service_order so ON so.id = pop.service_order_id
left JOIN client_site cs on cs.id=so.a_site_id
left JOIN country csc on csc.id=cs.country_id
JOIN circuit c ON so.product_id=c.product_id
JOIN circuit_status cst ON cst.id=c.status
JOIN (
SELECT c.id AS circuit_id, sg.id AS segment_id,
c.pop_support_vendor_id AS vendor_id,v.name AS vendor_name,
sg.a_company_name
FROM segment sg
JOIN circuit_layout cl ON cl.segment_id = sg.id AND cl.ordinal = 1
JOIN circuit c ON c.id = cl.circuit_id
JOIN vendor v ON v.id = c.pop_support_vendor_id
) seg ON seg.circuit_id = c.id
JOIN vendor_gtt_pop on vendor_gtt_pop.gtt_pop_id = seg.circuit_id
) foo where vendor_id = 12346 AND (1=1) ORDER BY pop_name ASC,id LIMIT 10



Execution plan with  max_parallel_workers_per_gather =0 ,
max_parallel_workers_per_gather =8 and no LIMIT clause :



=== QUERY PLAN WITH  max_parallel_workers_per_gather  = 0 ===

 Limit  (cost=433508.77..433508.77 rows=1 width=176) (actual
time=2992.875..2992.875 rows=1 loops=1)
   ->  Sort  (cost=433508.77..433508.77 rows=1 width=176) (actual
time=2992.874..2992.874 rows=1 loops=1)
 Sort Key: c_2.gii_circuitid, c_1.id
 Sort Method: quicksort  Memory: 25kB
 ->  Nested Loop  (cost=168438.04..433508.76 rows=1 width=176)
(actual time=1747.038..2992.864 rows=1 loops=1)
   Join Filter: (c_1.id = vendor_gtt_pop.gtt_pop_id)
   Rows Removed by Join Filter: 662
   ->  Hash Join  (cost=168438.04..433486.80 rows=1 width=148)
(actual time=1540.484..2952.522 rows=663 loops=1)
 Hash Cond: (c_1.id = c.id)
 ->  Hash Join  (cost=125957.20..388048.64 rows=788618
width=63) (actual time=1335.406..2746.835 rows=1582 loops=1)
   Hash Cond: (c_1.pop_support_vendor_id = v.id)
   ->  Hash Join  (cost=125518.15..377653.28
rows=788618 width=40) (actual time=1330.727..2711.206 rows=778564 loops=
1)
 Hash Cond: (sg.id = cl.segment_id)
 ->  Seq Scan on segment sg
(cost=0.00..231189.42 rows=3482542 width=32) (actual time=0.004..472.967
rows=34
83328 loops=1)