Re: [HACKERS] Gather Merge

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 9:56 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Cool, thanks for the review.  I'm not quite confident that we've found
>> all of the bugs here yet, but I think we're moving in the right
>> direction.
>
> I guess the real question here is why isn't Gather Merge more like
> Append and MergeAppend?  That is, why did you complicate matters
> by making it projection capable?  That seems like a pretty random
> decision from here.

Well, then it would be inconsistent with plain old Gather.  I thought
about going that route - ripping whatever projection logic Gather has
out and teaching the system that it's not projection-capable - but I
don't see what that buys us.  It's pretty useful to be able to project
on top of Gather-type nodes, because they will often be at the top of
the plan, just before returning the results to the user.
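As a concrete (hypothetical) illustration of that point: an expression in the query's target list can be evaluated in the projection sitting on top of the Gather Merge node rather than by the workers. The table and column names below are invented for the example, and the cost settings are only there to coax a parallel plan:

```sql
-- Hypothetical session: assumes a table demo_tab(i int) with an index on i.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

EXPLAIN (VERBOSE)
SELECT i, i * 2 AS doubled   -- i * 2 can be projected above Gather Merge
FROM demo_tab
ORDER BY i;
```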

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gather Merge

2017-03-14 Thread Tom Lane
Robert Haas  writes:
> Cool, thanks for the review.  I'm not quite confident that we've found
> all of the bugs here yet, but I think we're moving in the right
> direction.

I guess the real question here is why isn't Gather Merge more like
Append and MergeAppend?  That is, why did you complicate matters
by making it projection capable?  That seems like a pretty random
decision from here.

regards, tom lane




Re: [HACKERS] Gather Merge

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 5:47 AM, Rushabh Lathia
 wrote:
> Thanks Robert for the patch and the explanation.
>
> I studied the patch and it looks right to me. I performed manual
> testing: I ran the scripts I created during the Gather Merge work and
> also ran the TPC-H queries to make sure everything works.
>
> I haven't found any regressions with the changes.

Cool, thanks for the review.  I'm not quite confident that we've found
all of the bugs here yet, but I think we're moving in the right
direction.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Gather Merge

2017-03-14 Thread Rushabh Lathia
On Mon, Mar 13, 2017 at 10:56 PM, Robert Haas  wrote:

> On Fri, Mar 10, 2017 at 7:59 AM, Rushabh Lathia
>  wrote:
> > Error coming from create_gather_merge_plan() from below condition:
> >
> >     if (memcmp(sortColIdx, gm_plan->sortColIdx,
> >                numsortkeys * sizeof(AttrNumber)) != 0)
> >         elog(ERROR, "GatherMerge child's targetlist doesn't match GatherMerge");
> >
> > Above condition checks the sort column numbers explicitly, to ensure
> > that the tlists really do match up. This was copied from
> > create_merge_append_plan(). That makes sense for MergeAppend, since it
> > is not a projection-capable plan (see is_projection_capable_plan()).
> > But like Gather, GatherMerge is projection capable and its target list
> > can differ from the subplan's, so I don't think this condition makes
> > sense for GatherMerge.
> >
> > Here is some of the debugging info through which I reached this
> > conclusion:
> >
> > - targetlist for the GatherMerge and subpath is same during
> > create_gather_merge_path().
> >
> > - targetlist for the GatherMerge is getting changed into
> > create_gather_merge_plan().
> >
> > postgres=# explain (analyze, verbose) select t2.j from t1 JOIN t2 ON
> > t1.k=t2.k where t1.i=1 order by t1.j desc;
> > NOTICE:  path parthtarget: {PATHTARGET :exprs ({VAR :varno 2 :varattno 2
> > :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2
> :varoattno
> > 2 :location 34} {VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1
> > :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 2 :location 90})
> > :sortgrouprefs 0 1 :cost.startup 0.00 :cost.per_tuple 0.00 :width 8}
> >
> > NOTICE:  subpath parthtarget: {PATHTARGET :exprs ({VAR :varno 1
> :varattno 2
> > :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1
> :varoattno
> > 2 :location 90} {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1
> > :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 2 :location 34})
> > :cost.startup 0.00 :cost.per_tuple 0.00 :width 8}
> >
> > - Attached a memory watchpoint and found that the target list for
> > GatherMerge is getting changed in grouping_planner() ->
> > apply_projection_to_path().
> >
> > PFA patch to fix this issue.
>
> I don't think this fix is correct, partly on theoretical grounds and
> partly because I managed to make it crash.  The problem is that
> prepare_sort_from_pathkeys() actually alters the output tlist of Gather
> Merge, which is inconsistent with the idea that Gather Merge can do
> projection.  It's going to produce whatever
> prepare_sort_from_pathkeys() says it's going to produce, which may or
> may not be what was there before.  Using Kuntal's dump file and your
> patch:
>
> set min_parallel_table_scan_size = 0;
> set parallel_setup_cost = 0;
> set parallel_tuple_cost = 0;
> set enable_sort = false;
> set enable_bitmapscan = false;
> alter table t1 alter column j type text;
> select t2.i from t1 join t2 on t1.k=t2.k where t1.i=1 order by t1.j desc;
>
> Crash.   Abbreviated stack trace:
>
> #0  pg_detoast_datum_packed (datum=0xbc) at fmgr.c:2176
> #1  0x00010160e707 in varstrfastcmp_locale (x=188, y=819,
> ssup=0x7fe1ea06a568) at varlena.c:1997
> #2  0x0001013efc73 in ApplySortComparator [inlined] () at
> /Users/rhaas/pgsql/src/include/utils/sortsupport.h:225
> #3  0x0001013efc73 in heap_compare_slots (a=<value temporarily
> unavailable, due to optimizations>, b=<value temporarily unavailable,
> due to optimizations>, arg=0x7fe1ea04e590) at sortsupport.h:681
> #4  0x0001014057b2 in sift_down (heap=0x7fe1ea079458,
> node_off=<value temporarily unavailable, due to optimizations>) at
> binaryheap.c:274
> #5  0x00010140573a in binaryheap_build (heap=0x7fe1ea079458) at
> binaryheap.c:131
> #6  0x0001013ef771 in gather_merge_getnext [inlined] () at
> /Users/rhaas/pgsql/src/backend/executor/nodeGatherMerge.c:421
> #7  0x0001013ef771 in ExecGatherMerge (node=0x7fe1ea04e590) at
> nodeGatherMerge.c:240
>
> Obviously, this is happening because we're trying to apply a
> comparator for text to a value of type integer.  I propose the
> attached, slightly more involved fix, which rips out the first call to
> prepare_sort_from_pathkeys() altogether.

Thanks Robert for the patch and the explanation.

I studied the patch and it looks right to me. I performed manual
testing: I ran the scripts I created during the Gather Merge work and
also ran the TPC-H queries to make sure everything works.

I haven't found any regressions with the changes.



> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Rushabh Lathia


Re: [HACKERS] Gather Merge

2017-03-13 Thread Robert Haas
On Fri, Mar 10, 2017 at 7:59 AM, Rushabh Lathia
 wrote:
> The error comes from the following condition in create_gather_merge_plan():
>
>     if (memcmp(sortColIdx, gm_plan->sortColIdx,
>                numsortkeys * sizeof(AttrNumber)) != 0)
>         elog(ERROR, "GatherMerge child's targetlist doesn't match GatherMerge");
>
> Above condition checks the sort column numbers explicitly, to ensure
> that the tlists really do match up. This was copied from
> create_merge_append_plan(). That makes sense for MergeAppend, since it
> is not a projection-capable plan (see is_projection_capable_plan()).
> But like Gather, GatherMerge is projection capable and its target list
> can differ from the subplan's, so I don't think this condition makes
> sense for GatherMerge.
>
> Here is some of the debugging info through which I reached this
> conclusion:
>
> - targetlist for the GatherMerge and subpath is same during
> create_gather_merge_path().
>
> - targetlist for the GatherMerge is getting changed into
> create_gather_merge_plan().
>
> postgres=# explain (analyze, verbose) select t2.j from t1 JOIN t2 ON
> t1.k=t2.k where t1.i=1 order by t1.j desc;
> NOTICE:  path parthtarget: {PATHTARGET :exprs ({VAR :varno 2 :varattno 2
> :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno
> 2 :location 34} {VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1
> :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 2 :location 90})
> :sortgrouprefs 0 1 :cost.startup 0.00 :cost.per_tuple 0.00 :width 8}
>
> NOTICE:  subpath parthtarget: {PATHTARGET :exprs ({VAR :varno 1 :varattno 2
> :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno
> 2 :location 90} {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1
> :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 2 :location 34})
> :cost.startup 0.00 :cost.per_tuple 0.00 :width 8}
>
> - Attached a memory watchpoint and found that the target list for
> GatherMerge is getting changed in grouping_planner() ->
> apply_projection_to_path().
>
> PFA patch to fix this issue.

I don't think this fix is correct, partly on theoretical grounds and
partly because I managed to make it crash.  The problem is that
prepare_sort_from_pathkeys() actually alters the output tlist of Gather
Merge, which is inconsistent with the idea that Gather Merge can do
projection.  It's going to produce whatever
prepare_sort_from_pathkeys() says it's going to produce, which may or
may not be what was there before.  Using Kuntal's dump file and your
patch:

set min_parallel_table_scan_size = 0;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set enable_sort = false;
set enable_bitmapscan = false;
alter table t1 alter column j type text;
select t2.i from t1 join t2 on t1.k=t2.k where t1.i=1 order by t1.j desc;

Crash.   Abbreviated stack trace:

#0  pg_detoast_datum_packed (datum=0xbc) at fmgr.c:2176
#1  0x00010160e707 in varstrfastcmp_locale (x=188, y=819,
ssup=0x7fe1ea06a568) at varlena.c:1997
#2  0x0001013efc73 in ApplySortComparator [inlined] () at
/Users/rhaas/pgsql/src/include/utils/sortsupport.h:225
#3  0x0001013efc73 in heap_compare_slots (a=<value temporarily
unavailable, due to optimizations>, b=<value temporarily unavailable,
due to optimizations>, arg=0x7fe1ea04e590) at sortsupport.h:681
#4  0x0001014057b2 in sift_down (heap=0x7fe1ea079458,
node_off=<value temporarily unavailable, due to optimizations>) at
binaryheap.c:274
#5  0x00010140573a in binaryheap_build (heap=0x7fe1ea079458) at
binaryheap.c:131
#6  0x0001013ef771 in gather_merge_getnext [inlined] () at
/Users/rhaas/pgsql/src/backend/executor/nodeGatherMerge.c:421
#7  0x0001013ef771 in ExecGatherMerge (node=0x7fe1ea04e590) at
nodeGatherMerge.c:240

Obviously, this is happening because we're trying to apply a
comparator for text to a value of type integer.  I propose the
attached, slightly more involved fix, which rips out the first call to
prepare_sort_from_pathkeys() altogether.
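In rough pseudocode, the shape of such a fix might look like the following. This is only a sketch against planner internals, not the committed patch (see the attached gm_plan_fix_rmh.patch for the real change); every helper name except prepare_sort_from_pathkeys() is a placeholder:

```c
/* Pseudocode sketch only -- helper names are placeholders, not real
 * PostgreSQL functions.  The idea: adjust only the child so it emits
 * the sort columns, and let GatherMerge keep its own target list so
 * it can project on top of the child's output. */
GatherMerge *
create_gather_merge_plan_sketch(PlannerInfo *root, GatherMergePath *best_path)
{
    Plan *subplan = build_child_plan(root, best_path->subpath);

    /* Do NOT run prepare_sort_from_pathkeys() on the GatherMerge node
     * itself; that would overwrite the tlist it is supposed to project. */
    adjust_child_tlist_for_sort(subplan, best_path->path.pathkeys);

    return make_gather_merge_node(gather_merge_tlist(root, best_path), subplan);
}
```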

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


gm_plan_fix_rmh.patch
Description: Binary data



Re: [HACKERS] Gather Merge

2017-03-10 Thread Rushabh Lathia
The error comes from the following condition in create_gather_merge_plan():

    if (memcmp(sortColIdx, gm_plan->sortColIdx,
               numsortkeys * sizeof(AttrNumber)) != 0)
        elog(ERROR, "GatherMerge child's targetlist doesn't match GatherMerge");

Above condition checks the sort column numbers explicitly, to ensure
that the tlists really do match up. This was copied from
create_merge_append_plan(). That makes sense for MergeAppend, since it
is not a projection-capable plan (see is_projection_capable_plan()).
But like Gather, GatherMerge is projection capable and its target list
can differ from the subplan's, so I don't think this condition makes
sense for GatherMerge.

Here is some of the debugging info through which I reached this
conclusion:

- targetlist for the GatherMerge and subpath is same during
create_gather_merge_path().

- targetlist for the GatherMerge is getting changed into
create_gather_merge_plan().

postgres=# explain (analyze, verbose) select t2.j from t1 JOIN t2 ON
t1.k=t2.k where t1.i=1 order by t1.j desc;
NOTICE:  path parthtarget: {PATHTARGET :exprs ({VAR :varno 2 :varattno 2
:vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2
:varoattno 2 :location 34} {VAR :varno 1 :varattno 2 :vartype 23 :vartypmod
-1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 2 :location 90})
:sortgrouprefs 0 1 :cost.startup 0.00 :cost.per_tuple 0.00 :width 8}

NOTICE:  subpath parthtarget: {PATHTARGET :exprs ({VAR :varno 1 :varattno 2
:vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1
:varoattno 2 :location 90} {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod
-1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 2 :location 34})
:cost.startup 0.00 :cost.per_tuple 0.00 :width 8}

- Attached a memory watchpoint and found that the target list for
GatherMerge is getting changed in grouping_planner() ->
apply_projection_to_path().

PFA patch to fix this issue.



On Fri, Mar 10, 2017 at 4:12 PM, Rushabh Lathia 
wrote:

>
>
> On Fri, Mar 10, 2017 at 4:09 PM, Kuntal Ghosh 
> wrote:
>
>> On Fri, Mar 10, 2017 at 3:04 PM, Rushabh Lathia
>>  wrote:
>> >
>> >
>> > On Fri, Mar 10, 2017 at 2:42 PM, Andreas Joseph Krogh <
>> andr...@visena.com>
>> > wrote:
>> >>
>> >> On Friday, 10 March 2017 at 10:09:22, Rushabh Lathia wrote:
>> >>
>> >>
>> >>
>> >> On Fri, Mar 10, 2017 at 2:33 PM, Andreas Joseph Krogh <
>> andr...@visena.com>
>> >> wrote:
>> >>>
>> >>> [...]
>> >>> The execution-plan seems (unsurprisingly) to depend on
>> data-distribution,
>> >>> so is there a way I can force a GatherMerge?
>> >>
>> >>
>> >> Not directly. GatherMerge cost depends mainly on parallel_setup_cost,
>> >> parallel_tuple_cost and cpu_operator_cost. You may be able to force it
>> >> by setting those costs low enough, or by disabling the other plans.
>> >>
>> >> What plan are you getting now? Are you not seeing the below error?
>> >>
>> >> ERROR:  GatherMerge child's targetlist doesn't match GatherMerge
>> >>
>> >>
>> >> I'm seeing the same error, it's just that for reproducing it I'd rather
>> >> not copy my whole dataset.
>> >
>> >
>> > Can you share the schema information? I will try to reproduce on my side.
>> I'm able to reproduce the error. I've attached the dump file and a
>> script to reproduce it.
>>
>> The following query executes successfully.
>> postgres=# explain select t1.* from t1 JOIN t2 ON t1.k=t2.k where
>> t1.i=1 order by t1.j desc;
>>   QUERY PLAN
>> 
>> ---
>>  Gather Merge  (cost=0.58..243.02 rows=943 width=12)
>>Workers Planned: 1
>>->  Nested Loop  (cost=0.57..235.94 rows=555 width=12)
>>  ->  Parallel Index Scan Backward using idx_t1_i_j on t1
>> (cost=0.29..14.33 rows=603 width=12)
>>Index Cond: (i = 1)
>>  ->  Index Only Scan using idx_t2_k on t2  (cost=0.29..0.34
>> rows=3 width=4)
>>Index Cond: (k = t1.k)
>> (7 rows)
>>
>> Whereas, if columns from t2 are projected, it throws the same error.
>> postgres=# explain select t2.* from t1 JOIN t2 ON t1.k=t2.k where
>> t1.i=1 order by t1.j desc;
>> ERROR:  GatherMerge child's targetlist doesn't match GatherMerge
>>
>>
> Thanks Kuntal.
>
> I am able to reproduce the issue with the shared script. I will look into
> this now.
>
>
>>
>>
>> --
>> Thanks & Regards,
>> Kuntal Ghosh
>> EnterpriseDB: http://www.enterprisedb.com
>>
>
>
>
> --
> Rushabh Lathia
>



-- 
Rushabh Lathia


gm_plan_fix.patch
Description: application/download



Re: [HACKERS] Gather Merge

2017-03-10 Thread Rushabh Lathia
On Fri, Mar 10, 2017 at 4:09 PM, Kuntal Ghosh 
wrote:

> On Fri, Mar 10, 2017 at 3:04 PM, Rushabh Lathia
>  wrote:
> >
> >
> > On Fri, Mar 10, 2017 at 2:42 PM, Andreas Joseph Krogh <
> andr...@visena.com>
> > wrote:
> >>
> >> On Friday, 10 March 2017 at 10:09:22, Rushabh Lathia wrote:
> >>
> >>
> >>
> >> On Fri, Mar 10, 2017 at 2:33 PM, Andreas Joseph Krogh <
> andr...@visena.com>
> >> wrote:
> >>>
> >>> [...]
> >>> The execution-plan seems (unsurprisingly) to depend on
> data-distribution,
> >>> so is there a way I can force a GatherMerge?
> >>
> >>
> >> Not directly. GatherMerge cost depends mainly on parallel_setup_cost,
> >> parallel_tuple_cost and cpu_operator_cost. You may be able to force it
> >> by setting those costs low enough, or by disabling the other plans.
> >>
> >> What plan are you getting now? Are you not seeing the below error?
> >>
> >> ERROR:  GatherMerge child's targetlist doesn't match GatherMerge
> >>
> >>
> >> I'm seeing the same error, it's just that for reproducing it I'd rather
> >> not copy my whole dataset.
> >
> >
> > Can you share the schema information? I will try to reproduce on my side.
> I'm able to reproduce the error. I've attached the dump file and a
> script to reproduce it.
>
> The following query executes successfully.
> postgres=# explain select t1.* from t1 JOIN t2 ON t1.k=t2.k where
> t1.i=1 order by t1.j desc;
>   QUERY PLAN
> 
> ---
>  Gather Merge  (cost=0.58..243.02 rows=943 width=12)
>Workers Planned: 1
>->  Nested Loop  (cost=0.57..235.94 rows=555 width=12)
>  ->  Parallel Index Scan Backward using idx_t1_i_j on t1
> (cost=0.29..14.33 rows=603 width=12)
>Index Cond: (i = 1)
>  ->  Index Only Scan using idx_t2_k on t2  (cost=0.29..0.34
> rows=3 width=4)
>Index Cond: (k = t1.k)
> (7 rows)
>
> Whereas, if columns from t2 are projected, it throws the same error.
> postgres=# explain select t2.* from t1 JOIN t2 ON t1.k=t2.k where
> t1.i=1 order by t1.j desc;
> ERROR:  GatherMerge child's targetlist doesn't match GatherMerge
>
>
Thanks Kuntal.

I am able to reproduce the issue with the shared script. I will look into
this now.


>
>
> --
> Thanks & Regards,
> Kuntal Ghosh
> EnterpriseDB: http://www.enterprisedb.com
>



-- 
Rushabh Lathia


Re: [HACKERS] Gather Merge

2017-03-10 Thread Andreas Joseph Krogh
On Friday, 10 March 2017 at 10:34:48, Rushabh Lathia <rushabh.lat...@gmail.com> wrote:

> On Fri, Mar 10, 2017 at 2:42 PM, Andreas Joseph Krogh wrote:
>> On Friday, 10 March 2017 at 10:09:22, Rushabh Lathia wrote:
>>> On Fri, Mar 10, 2017 at 2:33 PM, Andreas Joseph Krogh wrote: [...]


The execution-plan seems (unsurprisingly) to depend on data-distribution, so 
is there a way I can force a GatherMerge?
 
Not directly. GatherMerge cost depends mainly on parallel_setup_cost,
parallel_tuple_cost and cpu_operator_cost. You may be able to force it
by setting those costs low enough, or by disabling the other plans.

What plan are you getting now? Are you not seeing the below error?
 
ERROR:  GatherMerge child's targetlist doesn't match GatherMerge



 
I'm seeing the same error, it's just that for reproducing it I'd rather not 
copy my whole dataset.
 
Can you share the schema information? I will try to reproduce on my side.



 
The relevant schema is this:
 
drop table if EXISTS temp_email_address_owner;
drop table if EXISTS temp_email_delivery;
drop table if EXISTS temp_email_message;

create table temp_email_message(
    entity_id BIGSERIAL PRIMARY KEY
);

create table temp_email_delivery(
    entity_id BIGSERIAL PRIMARY KEY,
    message_id bigint not null references temp_email_message(entity_id),
    from_entity_id bigint,
    received_timestamp timestamp not null
);

create table temp_email_address_owner(
    entity_id BIGSERIAL PRIMARY KEY,
    message_id bigint not null references temp_email_message(entity_id),
    recipient_id bigint
);

EXPLAIN ANALYSE
SELECT em.entity_id
FROM temp_email_delivery del
JOIN temp_email_message em ON (del.message_id = em.entity_id)
WHERE del.from_entity_id = 279519
   OR em.entity_id IN (
        SELECT ea_owner.message_id
        FROM temp_email_address_owner ea_owner
        WHERE ea_owner.recipient_id = 279519
   )
ORDER BY del.received_timestamp DESC
LIMIT 101 OFFSET 0;
 
.. But I'm having a hard time reproducing it.
I've tried to copy data from the relevant tables to the test-tables (temp_*),
adding indexes etc., but Gather Merge works just fine:
 
│ Limit  (cost=209378.96..209391.05 rows=101 width=16) (actual time=799.380..799.432 rows=101 loops=1)
│   ->  Gather Merge  (cost=209378.96..262335.79 rows=442285 width=16) (actual time=799.379..799.420 rows=101 loops=1)
│         Workers Planned: 4
│         Workers Launched: 4
│         ->  Sort  (cost=208378.90..208655.33 rows=110571 width=16) (actual time=785.029..785.042 rows=81 loops=5)
│               Sort Key: del.received_timestamp DESC
│               Sort Method: quicksort  Memory: 29kB
│               ->  Hash Join  (cost=52036.86..204145.01 rows=110571 width=16) (actual time=400.812..784.907 rows=95 loops=5)
│                     Hash Cond: (del.message_id = em.entity_id)
│                     Join Filter: ((del.from_entity_id = 279519) OR (hashed SubPlan 1))
│                     Rows Removed by Join Filter: 176799
│                     ->  Parallel Seq Scan on temp_email_delivery del  (cost=0.00..142515.18 rows=221118 width=24) (actual time=0.033..211.196 rows=176894 loops=5)
│                     ->  Hash  (cost=39799.72..39799.72 rows=730772 width=8) (actual time=368.746..368.746 rows=730772 loops=5)

Re: [HACKERS] Gather Merge

2017-03-10 Thread Rushabh Lathia
On Fri, Mar 10, 2017 at 2:42 PM, Andreas Joseph Krogh 
wrote:

> On Friday, 10 March 2017 at 10:09:22, Rushabh Lathia <rushabh.lat...@gmail.com> wrote:
>
>
>
> On Fri, Mar 10, 2017 at 2:33 PM, Andreas Joseph Krogh 
> wrote:
>>
>> [...]
>> The execution-plan seems (unsurprisingly) to depend on data-distribution,
>> so is there a way I can force a GatherMerge?
>>
>
> Not directly. GatherMerge cost depends mainly on parallel_setup_cost,
> parallel_tuple_cost and cpu_operator_cost. You may be able to force it
> by setting those costs low enough, or by disabling the other plans.
>
> What plan are you getting now? Are you not seeing the below error?
>
> ERROR:  GatherMerge child's targetlist doesn't match GatherMerge
>
>
> I'm seeing the same error, it's just that for reproducing it I'd rather
> not copy my whole dataset.
>

Can you share the schema information? I will try to reproduce on my side.



>
> --
> *Andreas Joseph Krogh*
>



-- 
Rushabh Lathia


Re: [HACKERS] Gather Merge

2017-03-10 Thread Andreas Joseph Krogh
On Friday, 10 March 2017 at 10:09:22, Rushabh Lathia <rushabh.lat...@gmail.com> wrote:

> On Fri, Mar 10, 2017 at 2:33 PM, Andreas Joseph Krogh wrote: [...]


The execution-plan seems (unsurprisingly) to depend on data-distribution, so 
is there a way I can force a GatherMerge?
 
Not directly. GatherMerge cost depends mainly on parallel_setup_cost,
parallel_tuple_cost and cpu_operator_cost. You may be able to force it
by setting those costs low enough, or by disabling the other plans.

What plan are you getting now? Are you not seeing the below error?
 
ERROR:  GatherMerge child's targetlist doesn't match GatherMerge



 
I'm seeing the same error, it's just that for reproducing it I'd rather not 
copy my whole dataset.
 
-- Andreas Joseph Krogh




Re: [HACKERS] Gather Merge

2017-03-10 Thread Rushabh Lathia
On Fri, Mar 10, 2017 at 2:33 PM, Andreas Joseph Krogh 
wrote:

> On Friday, 10 March 2017 at 09:53:47, Rushabh Lathia <rushabh.lat...@gmail.com> wrote:
>
>
>
> On Fri, Mar 10, 2017 at 1:44 PM, Andreas Joseph Krogh 
> wrote:
>>
>> On Thursday, 9 March 2017 at 18:09:45, Robert Haas <robertmh...@gmail.com> wrote:
>>
>> On Thu, Mar 9, 2017 at 11:25 AM, Rushabh Lathia
>>  wrote:
>> > I don't see this failure with the patch. Even I forced the gather merge
>> > in the above query and that just working fine.
>> >
>> > Attaching patch, with the discussed changes.
>>
>> Committed.
>>
>>
>>
>> I'm still getting (as of 9c2635e26f6f4e34b3b606c0fc79d0e111953a74):
>> ERROR:  GatherMerge child's targetlist doesn't match GatherMerge
>>
>>
>> from this query:
>>
>>
>> EXPLAIN ANALYSE
>> SELECT em.entity_id
>> FROM origo_email_delivery del
>> JOIN origo_email_message em ON (del.message_id = em.entity_id)
>> WHERE 1 = 1
>>   AND del.owner_id = 3
>>   AND (
>>     del.from_entity_id = 279519
>>     OR del.from_entity_id = 3 AND em.entity_id IN (
>>       SELECT ea_owner.message_id
>>       FROM origo_email_address_owner ea_owner
>>       WHERE ea_owner.recipient_id = 279519
>>     )
>>   )
>> ORDER BY del.received_timestamp DESC
>> LIMIT 101 OFFSET 0;
>>
>>
>> Is this known or shall I provide more info/schema etc?
>>
>
> Please provide the reproducible test if possible.
>
>
> The execution-plan seems (unsurprisingly) to depend on data-distribution,
> so is there a way I can force a GatherMerge?
>

Not directly. GatherMerge cost depends mainly on parallel_setup_cost,
parallel_tuple_cost and cpu_operator_cost. You may be able to force it
by setting those costs low enough, or by disabling the other plans.

What plan are you getting now? Are you not seeing the below error?

ERROR:  GatherMerge child's targetlist doesn't match GatherMerge
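For example, a session along these lines makes a parallel merge plan cheap enough to win. The cost values are illustrative knobs rather than recommended settings, and the table name is hypothetical:

```sql
-- Illustrative only: tilt the planner toward Gather Merge.
SET min_parallel_table_scan_size = 0;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
-- Alternatively, disable competing plan types:
SET enable_sort = off;

EXPLAIN SELECT * FROM some_table ORDER BY some_col;  -- hypothetical table

RESET ALL;
```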


> --
> *Andreas Joseph Krogh*
>



-- 
Rushabh Lathia


Re: [HACKERS] Gather Merge

2017-03-10 Thread Andreas Joseph Krogh
On Friday, 10 March 2017 at 09:53:47, Rushabh Lathia <rushabh.lat...@gmail.com> wrote:

> On Fri, Mar 10, 2017 at 1:44 PM, Andreas Joseph Krogh wrote:
>> On Thursday, 9 March 2017 at 18:09:45, Robert Haas wrote:
On Thu, Mar 9, 2017 at 11:25 AM, Rushabh Lathia
 > wrote:
 > I don't see this failure with the patch. Even I forced the gather merge
 > in the above query and that just working fine.
 >
 > Attaching patch, with the discussed changes.

 Committed.
 
 
I'm still getting (as of 9c2635e26f6f4e34b3b606c0fc79d0e111953a74): 
ERROR:  GatherMerge child's targetlist doesn't match GatherMerge

  
from this query:
 
EXPLAIN ANALYSE
SELECT em.entity_id
FROM origo_email_delivery del
JOIN origo_email_message em ON (del.message_id = em.entity_id)
WHERE 1 = 1
  AND del.owner_id = 3
  AND (
    del.from_entity_id = 279519
    OR del.from_entity_id = 3 AND em.entity_id IN (
      SELECT ea_owner.message_id
      FROM origo_email_address_owner ea_owner
      WHERE ea_owner.recipient_id = 279519
    )
  )
ORDER BY del.received_timestamp DESC
LIMIT 101 OFFSET 0;
 
Is this known or shall I provide more info/schema etc?
 
Please provide the reproducible test if possible.



 
The execution-plan seems (unsurprisingly) to depend on data-distribution, so 
is there a way I can force a GatherMerge?
 
-- Andreas Joseph Krogh




Re: [HACKERS] Gather Merge

2017-03-10 Thread Rushabh Lathia
On Fri, Mar 10, 2017 at 1:44 PM, Andreas Joseph Krogh 
wrote:

> On Thursday, 9 March 2017 at 18:09:45, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Thu, Mar 9, 2017 at 11:25 AM, Rushabh Lathia
>  wrote:
> > I don't see this failure with the patch. Even I forced the gather merge
> > in the above query and that just working fine.
> >
> > Attaching patch, with the discussed changes.
>
> Committed.
>
>
>
> I'm still getting (as of 9c2635e26f6f4e34b3b606c0fc79d0e111953a74):
> ERROR:  GatherMerge child's targetlist doesn't match GatherMerge
>
>
> from this query:
>
>
> EXPLAIN ANALYSE
> SELECT em.entity_id
> FROM origo_email_delivery del
> JOIN origo_email_message em ON (del.message_id = em.entity_id)
> WHERE 1 = 1
>   AND del.owner_id = 3
>   AND (
>     del.from_entity_id = 279519
>     OR del.from_entity_id = 3 AND em.entity_id IN (
>       SELECT ea_owner.message_id
>       FROM origo_email_address_owner ea_owner
>       WHERE ea_owner.recipient_id = 279519
>     )
>   )
> ORDER BY del.received_timestamp DESC
> LIMIT 101 OFFSET 0;
>
>
> Is this known or shall I provide more info/schema etc?
>

Please provide the reproducible test if possible.


>
> If I select del.entity_id, it works:
>
>
> EXPLAIN ANALYSE
> SELECT del.entity_id
> FROM origo_email_delivery del
> JOIN origo_email_message em ON (del.message_id = em.entity_id)
> WHERE 1 = 1
>   AND del.owner_id = 3
>   AND (
>     del.from_entity_id = 279519
>     OR del.from_entity_id = 3 AND em.entity_id IN (
>       SELECT ea_owner.message_id
>       FROM origo_email_address_owner ea_owner
>       WHERE ea_owner.recipient_id = 279519
>     )
>   )
> ORDER BY del.received_timestamp DESC
> LIMIT 101 OFFSET 0;
>
>
> Plan is:
> │ Limit  (cost=1259.72..15798.21 rows=101 width=16) (actual time=152.946..153.269 rows=34 loops=1)
> │   ->  Gather Merge  (cost=1259.72..3139414.43 rows=21801 width=16) (actual time=152.945..153.264 rows=34 loops=1)
> │         Workers Planned: 4
> │         Workers Launched: 4
> │         ->  Nested Loop  (cost=259.66..3135817.66 rows=5450 width=16) (actual time=95.295..137.549 rows=7 loops=5)
> │               ->  Parallel Index Scan Backward using origo_email_received_idx on origo_email_delivery del  (cost=0.42..312163.56 rows=10883 width=32) (actual time=0.175..121.434 rows=6540 loops=5)
> │                     Filter: ((owner_id = 3) AND ((from_entity_id = 279519) OR (from_entity_id = 3)))
> │                     Rows Removed by Filter: 170355
> │               ->  Index Only Scan using origo_email_message_pkey on origo_email_message em  (cost=259.24..259.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=32699)
> │                     Index Cond: (entity_id = del.message_id)
> │                     Filter: ((del.from_entity_id = 279519) OR ((del.from_entity_id = 3) AND (hashed SubPlan 1)))
> │                     Rows Removed by Filter: 1
> │                     Heap Fetches: 0
> │               SubPlan 1
> │                 ->  Index Scan using origo_email_address_owner_recipient_id_idx on origo_email_address_owner ea_owner  (cost=0.43..258.64 rows=69 width=8) (actual time=0.032..0.294 rows=175 loops=5)
> │                       Index Cond: (recipient_id = 279519)
> │ Planning time: 1.372 ms
> │ Execution time: 170.859 ms
>
>
>
> --
> *Andreas Joseph Krogh*
>
>



-- 
Rushabh Lathia


Re: [HACKERS] Gather Merge

2017-03-10 Thread Andreas Joseph Krogh
On Thursday, 9 March 2017 at 18:09:45, Robert Haas wrote:
On Thu, Mar 9, 2017 at 11:25 AM, Rushabh Lathia
  wrote:
 > I don't see this failure with the patch. Even I forced the gather merge
 > in the above query and that just working fine.
 >
 > Attaching patch, with the discussed changes.

 Committed.
 
 
I'm still getting (as of 9c2635e26f6f4e34b3b606c0fc79d0e111953a74): 
ERROR:  GatherMerge child's targetlist doesn't match GatherMerge

  
from this query:
 
EXPLAIN ANALYSE
SELECT em.entity_id
FROM origo_email_delivery del
JOIN origo_email_message em ON (del.message_id = em.entity_id)
WHERE 1 = 1
  AND del.owner_id = 3
  AND (
    del.from_entity_id = 279519
    OR del.from_entity_id = 3 AND em.entity_id IN (
      SELECT ea_owner.message_id
      FROM origo_email_address_owner ea_owner
      WHERE ea_owner.recipient_id = 279519
    )
  )
ORDER BY del.received_timestamp DESC
LIMIT 101 OFFSET 0;
 
Is this known or shall I provide more info/schema etc?
 
If I select del.entity_id, it works:
 
EXPLAIN ANALYSE
SELECT del.entity_id
FROM origo_email_delivery del
JOIN origo_email_message em ON (del.message_id = em.entity_id)
WHERE 1 = 1
  AND del.owner_id = 3
  AND (
    del.from_entity_id = 279519
    OR del.from_entity_id = 3 AND em.entity_id IN (
      SELECT ea_owner.message_id
      FROM origo_email_address_owner ea_owner
      WHERE ea_owner.recipient_id = 279519
    )
  )
ORDER BY del.received_timestamp DESC
LIMIT 101 OFFSET 0;
 
Plan is:
│ Limit  (cost=1259.72..15798.21 rows=101 width=16) (actual time=152.946..153.269 rows=34 loops=1)
│   ->  Gather Merge  (cost=1259.72..3139414.43 rows=21801 width=16) (actual time=152.945..153.264 rows=34 loops=1)
│         Workers Planned: 4
│         Workers Launched: 4
│         ->  Nested Loop  (cost=259.66..3135817.66 rows=5450 width=16) (actual time=95.295..137.549 rows=7 loops=5)
│               ->  Parallel Index Scan Backward using origo_email_received_idx on origo_email_delivery del  (cost=0.42..312163.56 rows=10883 width=32) (actual time=0.175..121.434 rows=6540 loops=5)
│                     Filter: ((owner_id = 3) AND ((from_entity_id = 279519) OR (from_entity_id = 3)))
│                     Rows Removed by Filter: 170355
│               ->  Index Only Scan using origo_email_message_pkey on origo_email_message em  (cost=259.24..259.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=32699)
│                     Index Cond: (entity_id = del.message_id)
│                     Filter: ((del.from_entity_id = 279519) OR ((del.from_entity_id = 3) AND (hashed SubPlan 1)))
│                     Rows Removed by Filter: 1
│                     Heap Fetches: 0
│               SubPlan 1
│                 ->  Index Scan using origo_email_address_owner_recipient_id_idx on origo_email_address_owner ea_owner  (cost=0.43..258.64 rows=69 width=8) (actual time=0.032..0.294 rows=175 loops=5)
│                       Index Cond: (recipient_id = 279519)
│ Planning time: 1.372 ms

 

Re: [HACKERS] Gather Merge

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 11:25 AM, Rushabh Lathia
 wrote:
> I don't see this failure with the patch. Even I forced the gather merge
> in the above query and that just working fine.
>
> Attaching patch, with the discussed changes.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gather Merge

2017-03-09 Thread Rushabh Lathia
On Thu, Mar 9, 2017 at 9:42 PM, Robert Haas  wrote:

> On Thu, Mar 9, 2017 at 8:21 AM, Rushabh Lathia 
> wrote:
> > Thanks Robert for committing this.
> >
> > My colleague Neha Sharma found one regression with the patch. I was about
> > to send this mail and noticed that you committed the patch.
>
> Oops.  Bad timing.
>
> > postgres=# explain select aid from pgbench_accounts where aid % 25= 0
> group
> > by aid;
> > ERROR:  ORDER/GROUP BY expression not found in targetlist
>
> I think your fix for this looks right, although I would write it this way:
>
> -gm_plan->plan.targetlist = subplan->targetlist;
> +gm_plan->plan.targetlist = build_path_tlist(root, &best_path->path);
>
> The second part of your fix looks wrong.  I think you want this:
>
>  create_gather_merge_path(root,
>   grouped_rel,
>   subpath,
> - NULL,
> + partial_grouping_target,
>   root->group_pathkeys,
>   NULL,
>   &total_groups);
>
> That will match the create_gather_path case.
>
>
Right, I made that change and performed the tests with the fix, and I don't
see any regression now.


> This test case is still failing for me even with those fixes:
>
> rhaas=# select aid+1 from pgbench_accounts group by aid+1;
> ERROR:  could not find pathkey item to sort
>
>
I don't see this failure with the patch. I even forced the gather merge
in the above query, and it works just fine.

Attaching patch, with the discussed changes.



Thanks,
Rushabh Lathia


fix_target_gm_v2.patch



Re: [HACKERS] Gather Merge

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 8:21 AM, Rushabh Lathia  wrote:
> Thanks Robert for committing this.
>
> My colleague Neha Sharma found one regression with the patch. I was about
> to send this mail and noticed that you committed the patch.

Oops.  Bad timing.

> postgres=# explain select aid from pgbench_accounts where aid % 25= 0 group
> by aid;
> ERROR:  ORDER/GROUP BY expression not found in targetlist

I think your fix for this looks right, although I would write it this way:

-gm_plan->plan.targetlist = subplan->targetlist;
+gm_plan->plan.targetlist = build_path_tlist(root, &best_path->path);

The second part of your fix looks wrong.  I think you want this:

 create_gather_merge_path(root,
  grouped_rel,
  subpath,
- NULL,
+ partial_grouping_target,
  root->group_pathkeys,
  NULL,
  &total_groups);

That will match the create_gather_path case.

This test case is still failing for me even with those fixes:

rhaas=# select aid+1 from pgbench_accounts group by aid+1;
ERROR:  could not find pathkey item to sort

So evidently there is at least one more bug here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Gather Merge

2017-03-09 Thread Rushabh Lathia
On Thu, Mar 9, 2017 at 6:19 PM, Robert Haas  wrote:

> On Wed, Mar 8, 2017 at 11:59 PM, Rushabh Lathia
>  wrote:
> > Here is another version of patch with the suggested changes.
>
> Committed.
>
>
Thanks Robert for committing this.

My colleague Neha Sharma found one regression with the patch. I was about
to send this mail and noticed that you committed the patch.

Here is the small example:

Test setup:

1) ./db/bin/pgbench postgres -i -F 100 -s 20

2) update pgbench_accounts set filler = 'foo' where aid%10 = 0;

3) vacuum analyze pgbench_accounts

4)

postgres=# set max_parallel_workers_per_gather = 4;
SET

postgres=# explain select aid from pgbench_accounts where aid % 25= 0 group
by aid;
ERROR:  ORDER/GROUP BY expression not found in targetlist

postgres=# set enable_indexonlyscan = off;
SET
postgres=# explain select aid from pgbench_accounts where aid % 25= 0 group
by aid;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Group  (cost=44708.21..45936.81 rows=10001 width=4)
   Group Key: aid
   ->  Gather Merge  (cost=44708.21..45911.81 rows=1 width=0)
 Workers Planned: 4
 ->  Group  (cost=43708.15..43720.65 rows=2500 width=4)
   Group Key: aid
   ->  Sort  (cost=43708.15..43714.40 rows=2500 width=4)
 Sort Key: aid
 ->  Parallel Seq Scan on pgbench_accounts
(cost=0.00..43567.06 rows=2500 width=4)
   Filter: ((aid % 25) = 0)
(10 rows)


- Index Only Scan with GM does work, but with an ORDER BY clause:
postgres=# set enable_seqscan = off;
SET
postgres=# explain select aid from pgbench_accounts where aid % 25= 0 order
by aid;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Gather Merge  (cost=1000.49..113924.61 rows=10001 width=4)
   Workers Planned: 4
   ->  Parallel Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.43..111733.33 rows=2500 width=4)
 Filter: ((aid % 25) = 0)
(4 rows)

Debugging further, I found that the problem occurs only with an IndexOnlyScan
under GM, and only with grouping. While debugging, I found that ressortgroupref
is not getting set. That led me to think it might be because
create_gather_merge_plan() is building the tlist with build_path_tlist().
Another problem I found is that create_grouping_paths() is passing NULL for
the target while calling create_gather_merge_path(). (fix_target_gm.patch)

With those changes the above test runs fine, but they broke other things,
like:

postgres=# explain select distinct(bid) from pgbench_accounts where filler
like '%foo%' group by aid;
ERROR:  GatherMerge child's targetlist doesn't match GatherMerge

Another thing I noticed: if we stop adding GatherMerge during
generate_gather_paths(), then all the tests work fine.
(comment_gm_from_generate_gather.patch)

I will continue looking at this problem.

Attaching both the patch for reference.


regards,
Rushabh Lathia
www.EnterpriseDB.com


comment_gm_from_generate_gather.patch


fix_target_gm.patch



Re: [HACKERS] Gather Merge

2017-03-09 Thread Robert Haas
On Wed, Mar 8, 2017 at 11:59 PM, Rushabh Lathia
 wrote:
> Here is another version of patch with the suggested changes.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Gather Merge

2017-03-08 Thread Rushabh Lathia
On Thu, Mar 9, 2017 at 8:40 AM, Robert Haas  wrote:

> On Mon, Feb 20, 2017 at 1:35 AM, Rushabh Lathia
>  wrote:
> > Thanks Amit for raising this point. I was not at all aware of
> mark/restore.
> > I tried to come up with the case, but haven't found such case.
> >
> > For now here is the patch with comment update.
>
> Sorry for the delay in getting back to this.  This seems to need minor
> rebasing again.
>
>
Done.


> A few other points:
>
> ExecEndGatherMerge needs to be patched along the lines of
> acf555bc53acb589b5a2827e65d655fa8c9adee0.
>
>
Done.


> @@ -2290,7 +2376,6 @@ create_limit_plan(PlannerInfo *root, LimitPath
> *best_path, int flags)
>  return plan;
>  }
>
> -
>  /*****************************************************************************
>   *
>   *  BASE-RELATION SCAN METHODS
>
> Unnecessary.
>

Fixed.

Here is another version of patch with the suggested changes.


Thanks,

Rushabh Lathia
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


gather-merge-v9.patch



Re: [HACKERS] Gather Merge

2017-03-08 Thread Robert Haas
On Mon, Feb 20, 2017 at 1:35 AM, Rushabh Lathia
 wrote:
> Thanks Amit for raising this point. I was not at all aware of mark/restore.
> I tried to come up with the case, but haven't found such case.
>
> For now here is the patch with comment update.

Sorry for the delay in getting back to this.  This seems to need minor
rebasing again.

A few other points:

ExecEndGatherMerge needs to be patched along the lines of
acf555bc53acb589b5a2827e65d655fa8c9adee0.

@@ -2290,7 +2376,6 @@ create_limit_plan(PlannerInfo *root, LimitPath
*best_path, int flags)
 return plan;
 }

-
 /*****************************************************************************
  *
  *  BASE-RELATION SCAN METHODS

Unnecessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Gather Merge

2017-02-20 Thread Amit Kapila
On Mon, Feb 20, 2017 at 1:58 PM, Dilip Kumar  wrote:
> On Mon, Feb 20, 2017 at 12:05 PM, Rushabh Lathia
>  wrote:
>> Thanks Amit for raising this point. I was not at all aware of mark/restore.
>> I tried to come up with the case, but haven't found such case.
>>
>> For now here is the patch with comment update.
>
> I think for reproducing this you need plan something like below (I
> think this is a really bad plan, but you can use to test this
> particular case).
>
> MergeJoin
> -> Index Scan
> -> Gather Merge
>->Parallel Index Scan
>
> So if only IndexScan node is there as a inner node which support
> Mark/Restore then we don't need to insert any materialize node. But
> after we put Gather Merge (which don't support Mark/Restore) then we
> need a materialize node on top of that. Therefore, plan should become
> like this, I think so.
> (But anyway if we have the Gather instead of the GatherMerge we would
> required a Sort node on top of the Gather and Materialize is obviously
> cheaper than the Sort.)
>
> MergeJoin
> -> Index Scan
> -> Materialize
>-> Gather Merge  (Does not support mark/restore)
>->Parallel Index Scan
>

Yes, that's exactly what will happen. However, we are not sure how often
such a plan (Gather Merge on the inner side of a merge join) will be
helpful, and whether adding Mark/Restore support would make it any
faster than just adding a Materialize on top of Gather Merge.  So, it
seems better not to go there unless we see some use for it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2017-02-20 Thread Dilip Kumar
On Mon, Feb 20, 2017 at 12:05 PM, Rushabh Lathia
 wrote:
> Thanks Amit for raising this point. I was not at all aware of mark/restore.
> I tried to come up with the case, but haven't found such case.
>
> For now here is the patch with comment update.

I think for reproducing this you need a plan something like the one below
(I think this is a really bad plan, but you can use it to test this
particular case).

MergeJoin
-> Index Scan
-> Gather Merge
   ->Parallel Index Scan

So if only an IndexScan node, which supports Mark/Restore, is there as the
inner node, then we don't need to insert any Materialize node. But once we
put a Gather Merge there (which doesn't support Mark/Restore), we need a
Materialize node on top of it. Therefore, the plan should become like
this, I think.
(But anyway, if we had a Gather instead of the Gather Merge, we would
require a Sort node on top of the Gather, and Materialize is obviously
cheaper than the Sort.)

MergeJoin
-> Index Scan
-> Materialize
   -> Gather Merge  (Does not support mark/restore)
   ->Parallel Index Scan
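A plan of that shape can, in principle, be coaxed out of the planner by disabling the competing join methods. The sketch below is purely illustrative: the table and index names (t1, t2, their primary keys) are hypothetical, and the planner may still pick a different plan depending on statistics and sizes.

```sql
-- Hypothetical repro sketch: push the planner toward a Merge Join whose
-- inner side is a Gather Merge. Because Gather Merge does not support
-- mark/restore, the planner must insert a Materialize node above it
-- (see the final_cost_mergejoin() logic quoted later in this thread).
SET enable_hashjoin = off;
SET enable_nestloop = off;
SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

EXPLAIN (COSTS OFF)
SELECT *
FROM t1 JOIN t2 USING (id)
ORDER BY id;

-- Hoped-for shape (not actual output):
--   Merge Join
--     ->  Index Scan using t1_pkey on t1
--     ->  Materialize
--           ->  Gather Merge
--                 ->  Parallel Index Scan using t2_pkey on t2
```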


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2017-02-19 Thread Rushabh Lathia
Thanks Amit for raising this point. I was not at all aware of mark/restore.
I tried to come up with a failing case, but haven't found one.

For now, here is the patch with the comment update.

Thanks,


On Sun, Feb 19, 2017 at 7:30 PM, Amit Kapila 
wrote:

> On Sun, Feb 19, 2017 at 2:22 PM, Robert Haas 
> wrote:
> > On Sat, Feb 18, 2017 at 6:43 PM, Amit Kapila 
> wrote:
> >> I think there is a value in supporting mark/restore position for any
> >> node which produces sorted results, however, if you don't want to
> >> support it, then I think we should update above comment in the code to
> >> note this fact.  Also, you might want to check other places to see if
> >> any similar comment updates are required in case you don't want to
> >> support mark/restore position for GatherMerge.
> >
> > I don't think it makes sense to put mark/support restore into Gather
> > Merge.  Maybe somebody else will come up with a test that shows
> > differently, but ISTM that with something like Sort it makes a ton of
> > sense to support mark/restore because the Sort node itself can do it
> > much more cheaply than would be possible with a separate Materialize
> > node.  If you added a separate Materialize node, the Sort node would
> > be trying to throw away the exact same data that the Materialize node
> > was trying to accumulate, which is silly.
> >
>
> I am not sure but there might be some cases where adding Materialize
> node on top of Sort node could make sense as we try to cost it as well
> and add it if it turns out to be cheap.
>
> >  Here with Gather Merge
> > there is no such thing happening; mark/restore support would require
> > totally new code - probably we would end up shoving the same code that
> > already exists in Materialize into Gather Merge as well.
> >
>
> I have tried to evaluate this based on plans reported by Rushabh and
> didn't find any case where GatherMerge can be beneficial by supporting
> mark/restore in the plans where it is being used (it is not being used
> on the right side of merge join).  Now, it is quite possible that it
> might be beneficial at higher scale factors or may be planner has
> ignored such a plan.  However, as we are not clear what kind of
> benefits we can get via mark/restore support for GatherMerge, it
> doesn't make much sense to take the trouble of implementing it.
>
> >
> > A comment update is probably a good idea, though.
> >
>
> Agreed.
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>



-- 
Rushabh Lathia


gather-merge-v8-update-comment.patch



Re: [HACKERS] Gather Merge

2017-02-19 Thread Amit Kapila
On Sun, Feb 19, 2017 at 2:22 PM, Robert Haas  wrote:
> On Sat, Feb 18, 2017 at 6:43 PM, Amit Kapila  wrote:
>> I think there is a value in supporting mark/restore position for any
>> node which produces sorted results, however, if you don't want to
>> support it, then I think we should update above comment in the code to
>> note this fact.  Also, you might want to check other places to see if
>> any similar comment updates are required in case you don't want to
>> support mark/restore position for GatherMerge.
>
> I don't think it makes sense to put mark/support restore into Gather
> Merge.  Maybe somebody else will come up with a test that shows
> differently, but ISTM that with something like Sort it makes a ton of
> sense to support mark/restore because the Sort node itself can do it
> much more cheaply than would be possible with a separate Materialize
> node.  If you added a separate Materialize node, the Sort node would
> be trying to throw away the exact same data that the Materialize node
> was trying to accumulate, which is silly.
>

I am not sure, but there might be some cases where adding a Materialize
node on top of a Sort node could make sense, since we cost it as well
and add it if it turns out to be cheap.

>  Here with Gather Merge
> there is no such thing happening; mark/restore support would require
> totally new code - probably we would end up shoving the same code that
> already exists in Materialize into Gather Merge as well.
>

I have tried to evaluate this based on the plans reported by Rushabh and
didn't find any case where GatherMerge would benefit from supporting
mark/restore in the plans where it is being used (it is not being used
on the right side of a merge join).  Now, it is quite possible that it
might be beneficial at higher scale factors, or maybe the planner has
ignored such a plan.  However, as we are not clear what kind of
benefits we can get via mark/restore support for GatherMerge, it
doesn't seem worth the trouble of implementing it.

>
> A comment update is probably a good idea, though.
>

Agreed.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2017-02-19 Thread Robert Haas
On Sat, Feb 18, 2017 at 6:43 PM, Amit Kapila  wrote:
> I think there is a value in supporting mark/restore position for any
> node which produces sorted results, however, if you don't want to
> support it, then I think we should update above comment in the code to
> note this fact.  Also, you might want to check other places to see if
> any similar comment updates are required in case you don't want to
> support mark/restore position for GatherMerge.

I don't think it makes sense to put mark/support restore into Gather
Merge.  Maybe somebody else will come up with a test that shows
differently, but ISTM that with something like Sort it makes a ton of
sense to support mark/restore because the Sort node itself can do it
much more cheaply than would be possible with a separate Materialize
node.  If you added a separate Materialize node, the Sort node would
be trying to throw away the exact same data that the Materialize node
was trying to accumulate, which is silly.  Here with Gather Merge
there is no such thing happening; mark/restore support would require
totally new code - probably we would end up shoving the same code that
already exists in Materialize into Gather Merge as well.  That doesn't
seem like a good idea off-hand.

A comment update is probably a good idea, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Gather Merge

2017-02-18 Thread Amit Kapila
On Fri, Feb 17, 2017 at 6:27 PM, Rushabh Lathia
 wrote:
> On Fri, Feb 17, 2017 at 4:47 PM, Amit Kapila 
> wrote:
>>
>> Are you suggesting that commit 0c2070ce has helped to improve
>> performance if so, I don't think that has been proved?  I guess the
>> numbers are different either due to different m/c or some other
>> settings like scale factor or work_mem.
>
>
> I don't really think 0c2070ce is the exact reason. I run the tpch runs
> with the same same setting as what Robert was running. I haven't
> noticed any regression with the runs. For the last runs I also
> uploaded the tpch run outputs for the individual queries for the
> reference.
>

Okay, then I am not sure why you and Robert are seeing different
results; probably because you are using a different machine.  Another
thing we might need to think about for this patch is support for
mark/restore position.  As of now, the paths which create data in
sorted order, like Sort nodes and index scans, support mark/restore
position.  This is required for correctness when such a node appears
on the inner side of a Merge Join.  Even though this patch doesn't
support mark/restore, it will not produce wrong results, because the
planner inserts a Materialize node to compensate for it; refer to the
code below.

final_cost_mergejoin()
{
	...
	/*
	 * Even if materializing doesn't look cheaper, we *must* do it if the
	 * inner path is to be used directly (without sorting) and it doesn't
	 * support mark/restore.
	 *
	 * Since the inner side must be ordered, and only Sorts and IndexScans can
	 * create order to begin with, and they both support mark/restore, you
	 * might think there's no problem --- but you'd be wrong.  Nestloop and
	 * merge joins can *preserve* the order of their inputs, so they can be
	 * selected as the input of a mergejoin, and they don't support
	 * mark/restore at present.
	 *
	 * We don't test the value of enable_material here, because
	 * materialization is required for correctness in this case, and turning
	 * it off does not entitle us to deliver an invalid plan.
	 */
	else if (innersortkeys == NIL &&
			 !ExecSupportsMarkRestore(inner_path))
		path->materialize_inner = true;
	...
}

I think there is a value in supporting mark/restore position for any
node which produces sorted results, however, if you don't want to
support it, then I think we should update above comment in the code to
note this fact.  Also, you might want to check other places to see if
any similar comment updates are required in case you don't want to
support mark/restore position for GatherMerge.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2017-02-17 Thread Rushabh Lathia
On Fri, Feb 17, 2017 at 4:47 PM, Amit Kapila 
wrote:

> On Fri, Feb 17, 2017 at 3:59 PM, Thomas Munro
>  wrote:
> > On Thu, Feb 2, 2017 at 2:32 AM, Rushabh Lathia 
> wrote:
> >> Please find attached latest patch.
> >
> > The latest patch still applies (with some fuzz), builds and the
> > regression tests pass.
> >
> > I see that Robert made a number of changes and posted a v6 along with
> > some numbers which he described as lacklustre, but then fixed a row
> > estimate problem which was discouraging parallel joins (commit
> > 0c2070ce).  Rushabh posted a v7 and test results which look good.
> >
>
> Are you suggesting that commit 0c2070ce has helped to improve
> performance if so, I don't think that has been proved?  I guess the
> numbers are different either due to different m/c or some other
> settings like scale factor or work_mem.
>

I don't really think 0c2070ce is the exact reason. I ran the tpch runs
with the same settings Robert was using, and I haven't noticed any
regression in those runs. For the last runs I also uploaded the tpch
outputs for the individual queries for reference.



> >  As
> > far as I can see there are no outstanding issues or unhandled review
> > feedback.  I've had a fresh read through of the latest version and
> > have no further comments myself.
> >
> > I've set this to ready-for-committer now.  If I've misunderstood and
> > there are still unresolved issues from that earlier email exchange or
> > someone else wants to post a review or objection, then of course
> > please feel free to set it back.
> >
> > BTW  There is no regression test supplied.  I see that commit 5262f7a4
> > adding parallel index scans put simple explain output in
> > "select_parallel" to demonstrate the new kind of plan being created;
>
> It has added both explain statement test and a test to exercise
> parallel index scan code.
>
>
Thanks for the reference. I added similar tests for GM in the latest
uploaded patch.


>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>



-- 
Rushabh Lathia


Re: [HACKERS] Gather Merge

2017-02-17 Thread Rushabh Lathia
On Fri, Feb 17, 2017 at 3:59 PM, Thomas Munro  wrote:

> On Thu, Feb 2, 2017 at 2:32 AM, Rushabh Lathia 
> wrote:
> > Please find attached latest patch.
>
> The latest patch still applies (with some fuzz), builds and the
> regression tests pass.
>
Attached is the latest patch, which applies cleanly on the latest source.


> I see that Robert made a number of changes and posted a v6 along with
> some numbers which he described as lacklustre, but then fixed a row
> estimate problem which was discouraging parallel joins (commit
> 0c2070ce).  Rushabh posted a v7 and test results which look good.  As
> far as I can see there are no outstanding issues or unhandled review
> feedback.  I've had a fresh read through of the latest version and
> have no further comments myself.
>
> I've set this to ready-for-committer now.  If I've misunderstood and
> there are still unresolved issues from that earlier email exchange or
> someone else wants to post a review or objection, then of course
> please feel free to set it back.
>
>
Thanks Thomas.


> BTW  There is no regression test supplied.  I see that commit 5262f7a4
> adding parallel index scans put simple explain output in
> "select_parallel" to demonstrate the new kind of plan being created;
> perhaps this patch should do the same?  I know it wouldn't really test
> much of the code but it's at least something.  Perhaps you could post
> a new version with that?
>
>
Added the regression test to the new version of the patch.

PFA latest patch.


-- 
Rushabh Lathia
www.EnterpriseDB.com


gather-merge-v8.patch



Re: [HACKERS] Gather Merge

2017-02-17 Thread Amit Kapila
On Fri, Feb 17, 2017 at 3:59 PM, Thomas Munro
 wrote:
> On Thu, Feb 2, 2017 at 2:32 AM, Rushabh Lathia  
> wrote:
>> Please find attached latest patch.
>
> The latest patch still applies (with some fuzz), builds and the
> regression tests pass.
>
> I see that Robert made a number of changes and posted a v6 along with
> some numbers which he described as lacklustre, but then fixed a row
> estimate problem which was discouraging parallel joins (commit
> 0c2070ce).  Rushabh posted a v7 and test results which look good.
>

Are you suggesting that commit 0c2070ce has helped to improve
performance if so, I don't think that has been proved?  I guess the
numbers are different either due to different m/c or some other
settings like scale factor or work_mem.

>  As
> far as I can see there are no outstanding issues or unhandled review
> feedback.  I've had a fresh read through of the latest version and
> have no further comments myself.
>
> I've set this to ready-for-committer now.  If I've misunderstood and
> there are still unresolved issues from that earlier email exchange or
> someone else wants to post a review or objection, then of course
> please feel free to set it back.
>
> BTW  There is no regression test supplied.  I see that commit 5262f7a4
> adding parallel index scans put simple explain output in
> "select_parallel" to demonstrate the new kind of plan being created;

It has added both explain statement test and a test to exercise
parallel index scan code.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2017-02-17 Thread Thomas Munro
On Thu, Feb 2, 2017 at 2:32 AM, Rushabh Lathia  wrote:
> Please find attached latest patch.

The latest patch still applies (with some fuzz), builds and the
regression tests pass.

I see that Robert made a number of changes and posted a v6 along with
some numbers which he described as lacklustre, but then fixed a row
estimate problem which was discouraging parallel joins (commit
0c2070ce).  Rushabh posted a v7 and test results which look good.  As
far as I can see there are no outstanding issues or unhandled review
feedback.  I've had a fresh read through of the latest version and
have no further comments myself.

I've set this to ready-for-committer now.  If I've misunderstood and
there are still unresolved issues from that earlier email exchange or
someone else wants to post a review or objection, then of course
please feel free to set it back.

BTW  There is no regression test supplied.  I see that commit 5262f7a4
adding parallel index scans put simple explain output in
"select_parallel" to demonstrate the new kind of plan being created;
perhaps this patch should do the same?  I know it wouldn't really test
much of the code but it's at least something.  Perhaps you could post
a new version with that?
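Such a test could look roughly like the sketch below. This is only illustrative: the exact cost settings and query are assumptions, and the test that was eventually committed may differ; tenk1 is the standard regression-test table that select_parallel.sql already uses.

```sql
-- Sketch of a possible regression test for select_parallel.sql:
-- force cheap parallel plans so a Gather Merge node shows up in EXPLAIN.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
SET max_parallel_workers_per_gather = 4;

EXPLAIN (COSTS OFF)
SELECT string4, count(unique2)
FROM tenk1
GROUP BY string4
ORDER BY string4;
```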

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2017-02-03 Thread Neha Sharma
Hi,

I have done some testing with the latest patch:

1) ./pgbench postgres -i -F 100 -s 20
2) update pgbench_accounts set filler = 'foo' where aid%10 = 0;
3) vacuum analyze pgbench_accounts;
4) set max_parallel_workers_per_gather = 4;
5) set max_parallel_workers = 4;

*Machine Configuration :-*
RAM:- 16GB
VCPU  :- 8
Disk :- 640 GB

Test case script with out-file attached.

*LCOV Report:*

File Name                                  Line cov.    Line cov.    Func cov.    Func cov.
                                           (no tests)   (with tests) (no tests)   (with tests)
src/backend/executor/nodeGatherMerge.c        0.0 %        92.3 %        0.0 %       92.3 %
src/backend/commands/explain.c               65.5 %        68.4 %       81.7 %       85.0 %
src/backend/executor/execProcnode.c          92.5 %        95.1 %      100.0 %      100.0 %
src/backend/nodes/copyfuncs.c                77.2 %        77.6 %       73.0 %       73.4 %
src/backend/nodes/outfuncs.c                 32.5 %        35.9 %       31.9 %       36.2 %
src/backend/nodes/readfuncs.c                62.7 %        68.2 %       53.3 %       61.7 %
src/backend/optimizer/path/allpaths.c        93.0 %        93.4 %      100.0 %      100.0 %
src/backend/optimizer/path/costsize.c        96.7 %        96.8 %      100.0 %      100.0 %
src/backend/optimizer/plan/createplan.c      89.9 %        91.2 %       95.0 %       96.0 %
src/backend/optimizer/plan/planner.c         95.1 %        95.2 %       97.3 %       97.3 %
src/backend/optimizer/plan/setrefs.c         94.7 %        94.7 %       97.1 %       97.1 %
src/backend/optimizer/plan/subselect.c       94.1 %        94.1 %      100.0 %      100.0 %
src/backend/optimizer/util/pathnode.c        95.6 %        96.1 %      100.0 %      100.0 %
src/backend/utils/misc/guc.c                 67.4 %        67.4 %       91.9 %       91.9 %

On Wed, Feb 1, 2017 at 7:02 PM, Rushabh Lathia 
wrote:

> Due to recent below commit, patch not getting apply cleanly on
> master branch.
>
> commit d002f16c6ec38f76d1ee97367ba6af3000d441d0
> Author: Tom Lane 
> Date:   Mon Jan 30 17:15:42 2017 -0500
>
> Add a regression test script dedicated to exercising system views.
>
> Please find attached latest patch.
>
>
>
> On Wed, Feb 1, 2017 at 5:55 PM, Rushabh Lathia 
> wrote:
>
>> I am sorry for the delay, here is the latest re-based patch.
>>
>> my colleague Neha Sharma, reported one regression with the patch, where
>> explain output for the Sort node under GatherMerge was always showing
>> cost as zero:
>>
>> explain analyze select '' AS "xxx" from pgbench_accounts  where filler
>> like '%foo%' order by aid;
>>QUERY
>> PLAN
>> 
>> 
>> 
>>  Gather Merge  (cost=47169.81..70839.91 rows=197688 width=36) (actual
>> time=406.297..653.572 rows=20 loops=1)
>>Workers Planned: 4
>>Workers Launched: 4
>>->  Sort  (*cost=0.00..0.00 rows=0 width=0*) (actual
>> time=368.945..391.124 rows=4 loops=5)
>>  Sort Key: aid
>>  Sort Method: quicksort  Memory: 3423kB
>>  ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..42316.60
>> rows=49422 width=36) (actual time=296.612..338.873 rows=4 loops=5)
>>Filter: (filler ~~ '%foo%'::text)
>>Rows Removed by Filter: 36
>>  Planning time: 0.184 ms
>>  Execution time: 734.963 ms
>>
>> This patch also fixes that issue.
>>
>>
>>
>>
>> On Wed, Feb 1, 2017 at 11:27 AM, Michael Paquier <
>> michael.paqu...@gmail.com> wrote:
>>
>>> On Mon, Jan 23, 2017 at 6:51 PM, Kuntal Ghosh
>>>  wrote:
>>> > On Wed, Jan 18, 2017 at 11:31 AM, Rushabh Lathia
>>> >  wrote:
>>> >>
>>> > The patch needs a rebase after the commit 69f4b9c85f168ae006929eec4.
>>>
>>> Is an update going to be provided? I have moved this patch to next CF
>>> with "waiting on author" as status.
>>> --
>>> Michael
>>>
>>
>>
>>
>> --
>> Rushabh Lathia
>>
>
>
>
> --
> Rushabh Lathia
>
>
>
>


-- 

Regards,

Neha Sharma
\! pgbench -F 100 -i -s 20 regression

\set VERBOSITY verbose

update pgbench_accounts set filler = 'foo' where aid%10 = 0;

vacuum analyze pgbench_accounts;

set max_parallel_workers_per_gather = 4;

set max_parallel_workers = 4;

explain analyse select count(b.aid) from pgbench_accounts CROSS JOIN pgbench_accounts as a CROSS JOIN pgbench_accounts as b where a.filler like '%foo%' group by a.aid,b.abalance;

explain analyze select sum(aid) from pgbench_accounts where aid % 25 = 0 group by aid; --this created a crash which was fixed by dev later

explain analyze select '' AS "xxx" from pgbench_accounts  where filler like '%foo%' group by aid; --performance improved about 65%

explain analyze select distinct(aid) from pgbench_accounts where filler like '%foo%' group by aid; --performance improved about 70%

explain analyze select distinct(bid) from pgbench_accounts where filler like '%foo%' group by aid; --performance improved about 60%

explain analyze select avg(a.aid) from  pgbench_accounts a join 

Re: [HACKERS] Gather Merge

2017-02-01 Thread Rushabh Lathia
Due to the recent commit below, the patch no longer applies cleanly on
the master branch.

commit d002f16c6ec38f76d1ee97367ba6af3000d441d0
Author: Tom Lane 
Date:   Mon Jan 30 17:15:42 2017 -0500

Add a regression test script dedicated to exercising system views.

Please find attached latest patch.



On Wed, Feb 1, 2017 at 5:55 PM, Rushabh Lathia 
wrote:

> I am sorry for the delay, here is the latest re-based patch.
>
> my colleague Neha Sharma, reported one regression with the patch, where
> explain output for the Sort node under GatherMerge was always showing
> cost as zero:
>
> explain analyze select '' AS "xxx" from pgbench_accounts  where filler
> like '%foo%' order by aid;
>QUERY
> PLAN
> 
> 
> 
>  Gather Merge  (cost=47169.81..70839.91 rows=197688 width=36) (actual
> time=406.297..653.572 rows=20 loops=1)
>Workers Planned: 4
>Workers Launched: 4
>->  Sort  (*cost=0.00..0.00 rows=0 width=0*) (actual
> time=368.945..391.124 rows=4 loops=5)
>  Sort Key: aid
>  Sort Method: quicksort  Memory: 3423kB
>  ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..42316.60
> rows=49422 width=36) (actual time=296.612..338.873 rows=4 loops=5)
>Filter: (filler ~~ '%foo%'::text)
>Rows Removed by Filter: 36
>  Planning time: 0.184 ms
>  Execution time: 734.963 ms
>
> This patch also fixes that issue.
>
>
>
>
> On Wed, Feb 1, 2017 at 11:27 AM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>> On Mon, Jan 23, 2017 at 6:51 PM, Kuntal Ghosh
>>  wrote:
>> > On Wed, Jan 18, 2017 at 11:31 AM, Rushabh Lathia
>> >  wrote:
>> >>
>> > The patch needs a rebase after the commit 69f4b9c85f168ae006929eec4.
>>
>> Is an update going to be provided? I have moved this patch to next CF
>> with "waiting on author" as status.
>> --
>> Michael
>>
>
>
>
> --
> Rushabh Lathia
>



-- 
Rushabh Lathia


gather-merge-v7.patch
Description: binary/octet-stream



Re: [HACKERS] Gather Merge

2017-02-01 Thread Rushabh Lathia
I am sorry for the delay; here is the latest rebased patch.

My colleague Neha Sharma reported a regression with the patch, where
the explain output for the Sort node under GatherMerge always showed
the cost as zero:

explain analyze select '' AS "xxx" from pgbench_accounts  where filler like
'%foo%' order by aid;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Gather Merge  (cost=47169.81..70839.91 rows=197688 width=36) (actual
time=406.297..653.572 rows=20 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Sort  (*cost=0.00..0.00 rows=0 width=0*) (actual
time=368.945..391.124 rows=4 loops=5)
 Sort Key: aid
 Sort Method: quicksort  Memory: 3423kB
 ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..42316.60
rows=49422 width=36) (actual time=296.612..338.873 rows=4 loops=5)
   Filter: (filler ~~ '%foo%'::text)
   Rows Removed by Filter: 36
 Planning time: 0.184 ms
 Execution time: 734.963 ms

This patch also fixes that issue.




On Wed, Feb 1, 2017 at 11:27 AM, Michael Paquier 
wrote:

> On Mon, Jan 23, 2017 at 6:51 PM, Kuntal Ghosh
>  wrote:
> > On Wed, Jan 18, 2017 at 11:31 AM, Rushabh Lathia
> >  wrote:
> >>
> > The patch needs a rebase after the commit 69f4b9c85f168ae006929eec4.
>
> Is an update going to be provided? I have moved this patch to next CF
> with "waiting on author" as status.
> --
> Michael
>



-- 
Rushabh Lathia


gather-merge-v7.patch
Description: binary/octet-stream



Re: [HACKERS] Gather Merge

2017-01-31 Thread Michael Paquier
On Mon, Jan 23, 2017 at 6:51 PM, Kuntal Ghosh
 wrote:
> On Wed, Jan 18, 2017 at 11:31 AM, Rushabh Lathia
>  wrote:
>>
> The patch needs a rebase after the commit 69f4b9c85f168ae006929eec4.

Is an update going to be provided? I have moved this patch to next CF
with "waiting on author" as status.
-- 
Michael




Re: [HACKERS] Gather Merge

2017-01-23 Thread Kuntal Ghosh
On Wed, Jan 18, 2017 at 11:31 AM, Rushabh Lathia
 wrote:
>
The patch needs a rebase after the commit 69f4b9c85f168ae006929eec4.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2017-01-17 Thread Rushabh Lathia
On Tue, Jan 17, 2017 at 6:44 PM, Amit Kapila 
wrote:

> On Tue, Jan 17, 2017 at 5:19 PM, Rushabh Lathia
>  wrote:
> >
> >
> >
> > Here is the benchmark number which I got with the latest (v6) patch:
> >
> > - max_worker_processes = DEFAULT (8)
> > - max_parallel_workers_per_gather = 4
> > - Cold cache environment is ensured. With every query execution - server
> is
> >   stopped and also OS caches were dropped.
> > - The reported values of execution time (in ms) is median of 3
> executions.
> > - power2 machine with 512GB of RAM
> > - With default postgres.conf
> >
>
> You haven't mentioned scale factor used in these tests.
>

Oops sorry. Those results are for scale factor 10.


>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>



-- 
Rushabh Lathia


Re: [HACKERS] Gather Merge

2017-01-17 Thread Peter Geoghegan
On Tue, Jan 17, 2017 at 4:26 AM, Rushabh Lathia
 wrote:
> Another observation is, HashAggregate (case 1) is performs better compare to
> GroupAggregate (case 2), but still it doesn't justify the cost difference of
> those two.

It may not be the only issue, or even the main issue, but I'm fairly
suspicious of the fact that cost_sort() doesn't distinguish between
the comparison cost of text and int4, for example.
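Peter's point is visible in the shape of the costing model itself. Below is a
rough Python sketch of the CPU charge cost_sort applies in the in-memory case
(simplified; the 2 * cpu_operator_cost per comparison follows the model's
convention, and the simplifications are assumptions, not the actual code): the
sort key's datatype never enters the formula, even though a collation-aware
text comparison can be far more expensive than an int4 comparison.

```python
import math

def sort_comparison_charge(tuples, cpu_operator_cost=0.0025):
    # Rough sketch of cost_sort's in-memory CPU charge: roughly
    # N * log2(N) comparisons, each charged 2 * cpu_operator_cost.
    # Note there is no parameter for the sort key's datatype.
    comparison_cost = 2.0 * cpu_operator_cost
    return tuples * math.log2(tuples) * comparison_cost

# The model charges a text sort and an int4 sort identically:
assert sort_comparison_charge(1_000_000) == sort_comparison_charge(1_000_000)
```

So under this model, a plan sorting a million text keys and one sorting a
million int4 keys get the same CPU charge, which can skew comparisons between
sorted and unsorted parallel plans.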

-- 
Peter Geoghegan




Re: [HACKERS] Gather Merge

2017-01-17 Thread Amit Kapila
On Tue, Jan 17, 2017 at 5:19 PM, Rushabh Lathia
 wrote:
>
>
>
> Here is the benchmark number which I got with the latest (v6) patch:
>
> - max_worker_processes = DEFAULT (8)
> - max_parallel_workers_per_gather = 4
> - Cold cache environment is ensured. With every query execution - server is
>   stopped and also OS caches were dropped.
> - The reported values of execution time (in ms) is median of 3 executions.
> - power2 machine with 512GB of RAM
> - With default postgres.conf
>

You haven't mentioned scale factor used in these tests.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2017-01-17 Thread Rushabh Lathia
On Tue, Jan 17, 2017 at 5:19 PM, Rushabh Lathia 
wrote:

>
>
> On Fri, Jan 13, 2017 at 10:52 AM, Rushabh Lathia  > wrote:
>
>>
>>
>> On Thu, Jan 12, 2017 at 8:50 AM, Robert Haas 
>> wrote:
>>
>>> On Sun, Dec 4, 2016 at 7:36 PM, Haribabu Kommi 
>>> wrote:
>>> > On Thu, Nov 24, 2016 at 11:12 PM, Rushabh Lathia <
>>> rushabh.lat...@gmail.com>
>>> > wrote:
>>> >> PFA latest patch with fix as well as few cosmetic changes.
>>> >
>>> > Moved to next CF with "needs review" status.
>>>
>>> I spent quite a bit of time on this patch over the last couple of
>>> days.  I was hoping to commit it, but I think it's not quite ready for
>>> that yet and I hit a few other issues along the way.  Meanwhile,
>>> here's an updated version with the following changes:
>>>
>>> * Adjusted cost_gather_merge because we don't need to worry about less
>>> than 1 worker.
>>> * Don't charge double maintenance cost of the heap per 34ca0905.  This
>>> was pointed out previous and Rushabh said it was fixed, but it wasn't
>>> fixed in v5.
>>> * cost_gather_merge claimed to charge a slightly higher IPC cost
>>> because we have to block, but didn't.  Fix it so it does.
>>> * Move several hunks to more appropriate places in the file, near
>>> related code or in a more logical position relative to surrounding
>>> code.
>>> * Fixed copyright dates for the new files.  One said 2015, one said 2016.
>>> * Removed unnecessary code from create_gather_merge_plan that tried to
>>> handle an empty list of pathkeys (shouldn't happen).
>>> * Make create_gather_merge_plan more consistent with
>>> create_merge_append_plan.  Remove make_gather_merge for the same
>>> reason.
>>> * Changed generate_gather_paths to generate gather merge paths.  In
>>> the previous coding, only the upper planner nodes ever tried to
>>> generate gather merge nodes, but that seems unnecessarily limiting,
>>> since it could be useful to generate a gathered path with pathkeys at
>>> any point in the tree where we'd generate a gathered path with no
>>> pathkeys.
>>> * Rewrote generate_ordered_paths() logic to consider only the one
>>> potentially-useful path not now covered by the new code in
>>> generate_gather_paths().
>>> * Reverted changes in generate_distinct_paths().  I think we should
>>> add something here but the existing logic definitely isn't right
>>> considering the change to generate_gather_paths().
>>> * Assorted cosmetic cleanup in nodeGatherMerge.c.
>>> * Documented the new GUC enable_gathermerge.
>>> * Improved comments.  Dropped one that seemed unnecessary.
>>> * Fixed parts of the patch to be more pgindent-clean.
>>>
>>>
>> Thanks Robert for hacking into this.
>>
>>
>>> Testing this against the TPC-H queries at 10GB with
>>> max_parallel_workers_per_gather = 4, seq_page_cost = 0.1,
>>> random_page_cost = 0.1, work_mem = 64MB initially produced somewhat
>>> demoralizing results.  Only Q17, Q4, and Q8 picked Gather Merge, and
>>> of those only Q17 got faster.  Investigating this led to me realizing
>>> that join costing for parallel joins is all messed up: see
>>> https://www.postgresql.org/message-id/CA+TgmoYt2pyk2CTyvYCtF
>>> ySXN=jsorgh8_mjttlowu5qkjo...@mail.gmail.com
>>>
>>> With that patch applied, in my testing, Gather Merge got picked for
>>> Q3, Q4, Q5, Q6, Q7, Q8, Q10, and Q17, but a lot of those queries get a
>>> little slower instead of a little faster.  Here are the timings --
>>> these are with EXPLAIN ANALYZE, so take them with a grain of salt --
>>> first number is without Gather Merge, second is with Gather Merge:
>>>
>>> Q3 16943.938 ms -> 18645.957 ms
>>> Q4 3155.350 ms -> 4179.431 ms
>>> Q5 13611.484 ms -> 13831.946 ms
>>> Q6 9264.942 ms -> 8734.899 ms
>>> Q7 9759.026 ms -> 10007.307 ms
>>> Q8 2473.899 ms -> 2459.225 ms
>>> Q10 13814.950 ms -> 12255.618 ms
>>> Q17 49552.298 ms -> 46633.632 ms
>>>
>>>
>> This is strange, I will re-run the test again and post the results soon.
>>
>>
> Here is the benchmark number which I got with the latest (v6) patch:
>
> - max_worker_processes = DEFAULT (8)
> - max_parallel_workers_per_gather = 4
> - Cold cache environment is ensured. With every query execution - server is
>   stopped and also OS caches were dropped.
> - The reported values of execution time (in ms) is median of 3 executions.
> - power2 machine with 512GB of RAM
> - With default postgres.conf
>
> Timing with v6 patch on REL9_6_STABLE branch
> (last commit: 8a70d8ae7501141d283e56b31e10c66697c986d5).
>
> Query 3: 49888.300 -> 45914.426
> Query 4: 8531.939 -> 7790.498
> Query 5: 40668.920 -> 38403.658
> Query 9: 90922.825 -> 50277.646
> Query 10: 45776.445 -> 39189.086
> Query 12: 21644.593 -> 21180.402
> Query 15: 63889.061 -> 62027.351
> Query 17: 142208.463 -> 118704.424
> Query 18: 244051.155 -> 186498.456
> Query 20: 212046.605 -> 159360.520
>
> Timing with v6 patch on master branch:
> (last commit: 0777f7a2e8e0a51f0f60cfe164d538bb459bf9f2)
>
> 

Re: [HACKERS] Gather Merge

2017-01-17 Thread Rushabh Lathia
On Fri, Jan 13, 2017 at 10:52 AM, Rushabh Lathia 
wrote:

>
>
> On Thu, Jan 12, 2017 at 8:50 AM, Robert Haas 
> wrote:
>
>> On Sun, Dec 4, 2016 at 7:36 PM, Haribabu Kommi 
>> wrote:
>> > On Thu, Nov 24, 2016 at 11:12 PM, Rushabh Lathia <
>> rushabh.lat...@gmail.com>
>> > wrote:
>> >> PFA latest patch with fix as well as few cosmetic changes.
>> >
>> > Moved to next CF with "needs review" status.
>>
>> I spent quite a bit of time on this patch over the last couple of
>> days.  I was hoping to commit it, but I think it's not quite ready for
>> that yet and I hit a few other issues along the way.  Meanwhile,
>> here's an updated version with the following changes:
>>
>> * Adjusted cost_gather_merge because we don't need to worry about less
>> than 1 worker.
>> * Don't charge double maintenance cost of the heap per 34ca0905.  This
>> was pointed out previous and Rushabh said it was fixed, but it wasn't
>> fixed in v5.
>> * cost_gather_merge claimed to charge a slightly higher IPC cost
>> because we have to block, but didn't.  Fix it so it does.
>> * Move several hunks to more appropriate places in the file, near
>> related code or in a more logical position relative to surrounding
>> code.
>> * Fixed copyright dates for the new files.  One said 2015, one said 2016.
>> * Removed unnecessary code from create_gather_merge_plan that tried to
>> handle an empty list of pathkeys (shouldn't happen).
>> * Make create_gather_merge_plan more consistent with
>> create_merge_append_plan.  Remove make_gather_merge for the same
>> reason.
>> * Changed generate_gather_paths to generate gather merge paths.  In
>> the previous coding, only the upper planner nodes ever tried to
>> generate gather merge nodes, but that seems unnecessarily limiting,
>> since it could be useful to generate a gathered path with pathkeys at
>> any point in the tree where we'd generate a gathered path with no
>> pathkeys.
>> * Rewrote generate_ordered_paths() logic to consider only the one
>> potentially-useful path not now covered by the new code in
>> generate_gather_paths().
>> * Reverted changes in generate_distinct_paths().  I think we should
>> add something here but the existing logic definitely isn't right
>> considering the change to generate_gather_paths().
>> * Assorted cosmetic cleanup in nodeGatherMerge.c.
>> * Documented the new GUC enable_gathermerge.
>> * Improved comments.  Dropped one that seemed unnecessary.
>> * Fixed parts of the patch to be more pgindent-clean.
>>
>>
> Thanks Robert for hacking into this.
>
>
>> Testing this against the TPC-H queries at 10GB with
>> max_parallel_workers_per_gather = 4, seq_page_cost = 0.1,
>> random_page_cost = 0.1, work_mem = 64MB initially produced somewhat
>> demoralizing results.  Only Q17, Q4, and Q8 picked Gather Merge, and
>> of those only Q17 got faster.  Investigating this led to me realizing
>> that join costing for parallel joins is all messed up: see
>> https://www.postgresql.org/message-id/CA+TgmoYt2pyk2CTyvYCtF
>> ySXN=jsorgh8_mjttlowu5qkjo...@mail.gmail.com
>>
>> With that patch applied, in my testing, Gather Merge got picked for
>> Q3, Q4, Q5, Q6, Q7, Q8, Q10, and Q17, but a lot of those queries get a
>> little slower instead of a little faster.  Here are the timings --
>> these are with EXPLAIN ANALYZE, so take them with a grain of salt --
>> first number is without Gather Merge, second is with Gather Merge:
>>
>> Q3 16943.938 ms -> 18645.957 ms
>> Q4 3155.350 ms -> 4179.431 ms
>> Q5 13611.484 ms -> 13831.946 ms
>> Q6 9264.942 ms -> 8734.899 ms
>> Q7 9759.026 ms -> 10007.307 ms
>> Q8 2473.899 ms -> 2459.225 ms
>> Q10 13814.950 ms -> 12255.618 ms
>> Q17 49552.298 ms -> 46633.632 ms
>>
>>
> This is strange, I will re-run the test again and post the results soon.
>
>
Here is the benchmark number which I got with the latest (v6) patch:

- max_worker_processes = DEFAULT (8)
- max_parallel_workers_per_gather = 4
- Cold cache environment is ensured. With every query execution - server is
  stopped and also OS caches were dropped.
- The reported values of execution time (in ms) are the median of 3 executions.
- power2 machine with 512GB of RAM
- With default postgres.conf

Timing with v6 patch on REL9_6_STABLE branch
(last commit: 8a70d8ae7501141d283e56b31e10c66697c986d5).

Query 3: 49888.300 -> 45914.426
Query 4: 8531.939 -> 7790.498
Query 5: 40668.920 -> 38403.658
Query 9: 90922.825 -> 50277.646
Query 10: 45776.445 -> 39189.086
Query 12: 21644.593 -> 21180.402
Query 15: 63889.061 -> 62027.351
Query 17: 142208.463 -> 118704.424
Query 18: 244051.155 -> 186498.456
Query 20: 212046.605 -> 159360.520

Timing with v6 patch on master branch:
(last commit: 0777f7a2e8e0a51f0f60cfe164d538bb459bf9f2)

Query 3: 45261.722 -> 43499.739
Query 4: 7444.630 -> 6363.999
Query 5: 37146.458 -> 37081.952
Query 9: 88874.243 -> 50232.088
Query 10: 43583.133 -> 38118.195
Query 12: 19918.149 -> 20414.114
Query 15: 62554.860 -> 61039.048

Re: [HACKERS] Gather Merge

2017-01-12 Thread Rushabh Lathia
On Thu, Jan 12, 2017 at 8:50 AM, Robert Haas  wrote:

> On Sun, Dec 4, 2016 at 7:36 PM, Haribabu Kommi 
> wrote:
> > On Thu, Nov 24, 2016 at 11:12 PM, Rushabh Lathia <
> rushabh.lat...@gmail.com>
> > wrote:
> >> PFA latest patch with fix as well as few cosmetic changes.
> >
> > Moved to next CF with "needs review" status.
>
> I spent quite a bit of time on this patch over the last couple of
> days.  I was hoping to commit it, but I think it's not quite ready for
> that yet and I hit a few other issues along the way.  Meanwhile,
> here's an updated version with the following changes:
>
> * Adjusted cost_gather_merge because we don't need to worry about less
> than 1 worker.
> * Don't charge double maintenance cost of the heap per 34ca0905.  This
> was pointed out previous and Rushabh said it was fixed, but it wasn't
> fixed in v5.
> * cost_gather_merge claimed to charge a slightly higher IPC cost
> because we have to block, but didn't.  Fix it so it does.
> * Move several hunks to more appropriate places in the file, near
> related code or in a more logical position relative to surrounding
> code.
> * Fixed copyright dates for the new files.  One said 2015, one said 2016.
> * Removed unnecessary code from create_gather_merge_plan that tried to
> handle an empty list of pathkeys (shouldn't happen).
> * Make create_gather_merge_plan more consistent with
> create_merge_append_plan.  Remove make_gather_merge for the same
> reason.
> * Changed generate_gather_paths to generate gather merge paths.  In
> the previous coding, only the upper planner nodes ever tried to
> generate gather merge nodes, but that seems unnecessarily limiting,
> since it could be useful to generate a gathered path with pathkeys at
> any point in the tree where we'd generate a gathered path with no
> pathkeys.
> * Rewrote generate_ordered_paths() logic to consider only the one
> potentially-useful path not now covered by the new code in
> generate_gather_paths().
> * Reverted changes in generate_distinct_paths().  I think we should
> add something here but the existing logic definitely isn't right
> considering the change to generate_gather_paths().
> * Assorted cosmetic cleanup in nodeGatherMerge.c.
> * Documented the new GUC enable_gathermerge.
> * Improved comments.  Dropped one that seemed unnecessary.
> * Fixed parts of the patch to be more pgindent-clean.
>
>
Thanks Robert for hacking into this.


> Testing this against the TPC-H queries at 10GB with
> max_parallel_workers_per_gather = 4, seq_page_cost = 0.1,
> random_page_cost = 0.1, work_mem = 64MB initially produced somewhat
> demoralizing results.  Only Q17, Q4, and Q8 picked Gather Merge, and
> of those only Q17 got faster.  Investigating this led to me realizing
> that join costing for parallel joins is all messed up: see
> https://www.postgresql.org/message-id/CA+TgmoYt2pyk2CTyvYCtFySXN=
> jsorgh8_mjttlowu5qkjo...@mail.gmail.com
>
> With that patch applied, in my testing, Gather Merge got picked for
> Q3, Q4, Q5, Q6, Q7, Q8, Q10, and Q17, but a lot of those queries get a
> little slower instead of a little faster.  Here are the timings --
> these are with EXPLAIN ANALYZE, so take them with a grain of salt --
> first number is without Gather Merge, second is with Gather Merge:
>
> Q3 16943.938 ms -> 18645.957 ms
> Q4 3155.350 ms -> 4179.431 ms
> Q5 13611.484 ms -> 13831.946 ms
> Q6 9264.942 ms -> 8734.899 ms
> Q7 9759.026 ms -> 10007.307 ms
> Q8 2473.899 ms -> 2459.225 ms
> Q10 13814.950 ms -> 12255.618 ms
> Q17 49552.298 ms -> 46633.632 ms
>
>
This is strange; I will re-run the test and post the results soon.



> I haven't really had time to dig into these results yet, so I'm not
> sure how "real" these numbers are and how much is run-to-run jitter,
> EXPLAIN ANALYZE distortion, or whatever.  I think this overall concept
> is good, because there should be cases where it's substantially
> cheaper to preserve the order while gathering tuples from workers than
> to re-sort afterwards.  But this particular set of results is a bit
> lackluster.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Rushabh Lathia


Re: [HACKERS] Gather Merge

2017-01-11 Thread Robert Haas
On Sun, Dec 4, 2016 at 7:36 PM, Haribabu Kommi  wrote:
> On Thu, Nov 24, 2016 at 11:12 PM, Rushabh Lathia 
> wrote:
>> PFA latest patch with fix as well as few cosmetic changes.
>
> Moved to next CF with "needs review" status.

I spent quite a bit of time on this patch over the last couple of
days.  I was hoping to commit it, but I think it's not quite ready for
that yet and I hit a few other issues along the way.  Meanwhile,
here's an updated version with the following changes:

* Adjusted cost_gather_merge because we don't need to worry about less
than 1 worker.
* Don't charge double maintenance cost of the heap per 34ca0905.  This
was pointed out previous and Rushabh said it was fixed, but it wasn't
fixed in v5.
* cost_gather_merge claimed to charge a slightly higher IPC cost
because we have to block, but didn't.  Fix it so it does.
* Move several hunks to more appropriate places in the file, near
related code or in a more logical position relative to surrounding
code.
* Fixed copyright dates for the new files.  One said 2015, one said 2016.
* Removed unnecessary code from create_gather_merge_plan that tried to
handle an empty list of pathkeys (shouldn't happen).
* Make create_gather_merge_plan more consistent with
create_merge_append_plan.  Remove make_gather_merge for the same
reason.
* Changed generate_gather_paths to generate gather merge paths.  In
the previous coding, only the upper planner nodes ever tried to
generate gather merge nodes, but that seems unnecessarily limiting,
since it could be useful to generate a gathered path with pathkeys at
any point in the tree where we'd generate a gathered path with no
pathkeys.
* Rewrote generate_ordered_paths() logic to consider only the one
potentially-useful path not now covered by the new code in
generate_gather_paths().
* Reverted changes in generate_distinct_paths().  I think we should
add something here but the existing logic definitely isn't right
considering the change to generate_gather_paths().
* Assorted cosmetic cleanup in nodeGatherMerge.c.
* Documented the new GUC enable_gathermerge.
* Improved comments.  Dropped one that seemed unnecessary.
* Fixed parts of the patch to be more pgindent-clean.

Testing this against the TPC-H queries at 10GB with
max_parallel_workers_per_gather = 4, seq_page_cost = 0.1,
random_page_cost = 0.1, work_mem = 64MB initially produced somewhat
demoralizing results.  Only Q17, Q4, and Q8 picked Gather Merge, and
of those only Q17 got faster.  Investigating this led to me realizing
that join costing for parallel joins is all messed up: see
https://www.postgresql.org/message-id/CA+TgmoYt2pyk2CTyvYCtFySXN=jsorgh8_mjttlowu5qkjo...@mail.gmail.com

With that patch applied, in my testing, Gather Merge got picked for
Q3, Q4, Q5, Q6, Q7, Q8, Q10, and Q17, but a lot of those queries get a
little slower instead of a little faster.  Here are the timings --
these are with EXPLAIN ANALYZE, so take them with a grain of salt --
first number is without Gather Merge, second is with Gather Merge:

Q3 16943.938 ms -> 18645.957 ms
Q4 3155.350 ms -> 4179.431 ms
Q5 13611.484 ms -> 13831.946 ms
Q6 9264.942 ms -> 8734.899 ms
Q7 9759.026 ms -> 10007.307 ms
Q8 2473.899 ms -> 2459.225 ms
Q10 13814.950 ms -> 12255.618 ms
Q17 49552.298 ms -> 46633.632 ms

I haven't really had time to dig into these results yet, so I'm not
sure how "real" these numbers are and how much is run-to-run jitter,
EXPLAIN ANALYZE distortion, or whatever.  I think this overall concept
is good, because there should be cases where it's substantially
cheaper to preserve the order while gathering tuples from workers than
to re-sort afterwards.  But this particular set of results is a bit
lackluster.
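To make the concept concrete, here is a toy sketch (plain Python, not the
executor code) of what Gather Merge does: each worker hands the leader an
already-sorted stream, and the leader merges the stream heads with a small
binary heap. The heapreplace step plays the role of binaryheap_replace_first
in the executor: replacing the top element and sifting down costs about
log2(K) comparisons per tuple, as opposed to a full re-sort above a plain
Gather.

```python
import heapq

def gather_merge(worker_streams):
    """Merge already-sorted per-worker streams into one sorted output,
    keeping only one head tuple per stream in a binary heap."""
    heap = []
    iters = [iter(s) for s in worker_streams]
    # Prime the heap with the first tuple from each non-empty stream.
    for i, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first, i))
    while heap:
        value, i = heap[0]                     # smallest head tuple
        yield value
        nxt = next(iters[i], None)
        if nxt is None:
            heapq.heappop(heap)                # stream exhausted
        else:
            heapq.heapreplace(heap, (nxt, i))  # one sift-down, ~log2(K) cmps

streams = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
assert list(gather_merge(streams)) == list(range(1, 10))
```

(Assumes the stream values are comparable and non-None; this is only an
illustration of why merging pre-sorted worker output can beat gathering
and re-sorting.)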

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


gather-merge-v6.patch
Description: application/download



Re: [HACKERS] Gather Merge

2016-12-04 Thread Haribabu Kommi
On Thu, Nov 24, 2016 at 11:12 PM, Rushabh Lathia 
wrote:

>
> PFA latest patch with fix as well as few cosmetic changes.
>
>
Moved to next CF with "needs review" status.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Gather Merge

2016-11-24 Thread Rushabh Lathia
On Wed, Nov 16, 2016 at 3:10 PM, Rushabh Lathia 
wrote:

>
>
> On Mon, Nov 14, 2016 at 3:51 PM, Thomas Munro <
> thomas.mu...@enterprisedb.com> wrote:
>
>> On Sat, Nov 12, 2016 at 1:56 AM, Rushabh Lathia
>>  wrote:
>> > On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro <
>> thomas.mu...@enterprisedb.com>
>> > wrote:
>> >> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development
>> Group
>> >> + * Portions Copyright (c) 1994, Regents of the University of
>> California
>> >>
>> >> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
>> >> Group"?
>> >
>> > Fixed.
>>
>> The year also needs updating to 2016 in nodeGatherMerge.h.
>>
>
> Oops sorry, fixed now.
>
>
>>
>> >> + /* Per-tuple heap maintenance cost */
>> >> + run_cost += path->path.rows * comparison_cost * 2.0 * logN;
>> >>
>> >> Why multiply by two?  The comment above this code says "about log2(N)
>> >> comparisons to delete the top heap entry and another log2(N)
>> >> comparisons to insert its successor".  In fact gather_merge_getnext
>> >> calls binaryheap_replace_first, which replaces the top element without
>> >> any comparisons at all and then performs a sift-down in log2(N)
>> >> comparisons to find its new position.  There is no per-tuple "delete"
>> >> involved.  We "replace" the top element with the value it already had,
>> >> just to trigger the sift-down, because we know that our comparator
>> >> function might have a new opinion of the sort order of this element.
>> >> Very clever!  The comment and the 2.0 factor in cost_gather_merge seem
>> >> to be wrong though -- or am I misreading the code?
>> >>
>> > See cost_merge_append.
>>
>> That just got tweaked in commit 34ca0905.
>>
>
> Fixed.
>
>
>>
>> > Looking at the plan I realize that this is happening because wrong
>> costing
>> > for Gather Merge. Here in the plan we can see the row estimated by
>> > Gather Merge is wrong. This is because earlier patch GM was considering
>> > rows = subpath->rows, which is not true as subpath is partial path. So
>> > we need to multiple it with number of worker. Attached patch also fixed
>> > this issues. I also run the TPC-H benchmark with the patch and results
>> > are same as earlier.
>>
>> In create_grouping_paths:
>> +   double  total_groups = gmpath->rows *
>> gmpath->parallel_workers;
>>
>> This hides a variable of the same name in the enclosing scope.  Maybe
>> confusing?
>>
>> In some other places like create_ordered_paths:
>> +   double  total_groups = path->rows * path->parallel_workers;
>>
>> Though it probably made sense to use this variable name in
>> create_grouping_paths, wouldn't total_rows be better here?
>>
>>
> Initially I just copied from the other places. I agree with you that
> create_ordered_paths - total_rows make more sense.
>
>
>> It feels weird to be working back to a total row count estimate from
>> the partial one by simply multiplying by path->parallel_workers.
>> Gather Merge will underestimate the total rows when parallel_workers <
>> 4, if using partial row estimates ultimately from  cost_seqscan which
>> assume some leader contribution.  I don't have a better idea though.
>> Reversing cost_seqscan's logic certainly doesn't seem right.  I don't
>> know how to make them agree on the leader's contribution AND give
>> principled answers, since there seems to be some kind of cyclic
>> dependency in the costing logic (cost_seqscan really needs to be given
>> a leader contribution estimate from its superpath which knows whether
>> it will allow the leader to pull tuples greedily/fairly or not, but
>> that superpath hasn't been created yet; cost_gather_merge needs the
>> row count from its subpath).  Or maybe I'm just confused.
>>
>>
> Yes, I agree with you. But we can't really make changes to cost_seqscan.
> Another option I can think of is to calculate the rows for Gather Merge
> using the reverse of the formula used in cost_seqscan. That way we can
> completely remove the rows argument from create_gather_merge_path() and
> then, inside create_gather_merge_path(), calculate total_rows using the
> same formula as cost_seqscan. This is working fine, but I'm not quite
> sure about the approach, so I attached that part of the changes as a
> separate patch. Any suggestions?
>

After an offline discussion with Thomas, I realized that this won't work:
it only works if the subplan is a seqscan, so this logic isn't enough to
estimate the rows. As Thomas said earlier, this is not a problem with the
Gather Merge implementation as such, so we will keep it separate.

Apart from this, my colleague Neha Sharma reported a server crash with the
patch. It was hitting the below Assert in create_gather_merge_path():

 Assert(pathkeys);

Basically, with a query like "select * from foo where a = 1 order by a;",
the query has a sort clause but the planner won't generate a sort key,
because the equality clause makes the sort column a constant, so the
pathkey is discarded as redundant.

Re: [HACKERS] Gather Merge

2016-11-16 Thread Rushabh Lathia
On Mon, Nov 14, 2016 at 3:51 PM, Thomas Munro  wrote:

> On Sat, Nov 12, 2016 at 1:56 AM, Rushabh Lathia
>  wrote:
> > On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro <
> thomas.mu...@enterprisedb.com>
> > wrote:
> >> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development
> Group
> >> + * Portions Copyright (c) 1994, Regents of the University of California
> >>
> >> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
> >> Group"?
> >
> > Fixed.
>
> The year also needs updating to 2016 in nodeGatherMerge.h.
>

Oops sorry, fixed now.


>
> >> + /* Per-tuple heap maintenance cost */
> >> + run_cost += path->path.rows * comparison_cost * 2.0 * logN;
> >>
> >> Why multiply by two?  The comment above this code says "about log2(N)
> >> comparisons to delete the top heap entry and another log2(N)
> >> comparisons to insert its successor".  In fact gather_merge_getnext
> >> calls binaryheap_replace_first, which replaces the top element without
> >> any comparisons at all and then performs a sift-down in log2(N)
> >> comparisons to find its new position.  There is no per-tuple "delete"
> >> involved.  We "replace" the top element with the value it already had,
> >> just to trigger the sift-down, because we know that our comparator
> >> function might have a new opinion of the sort order of this element.
> >> Very clever!  The comment and the 2.0 factor in cost_gather_merge seem
> >> to be wrong though -- or am I misreading the code?
> >>
> > See cost_merge_append.
>
> That just got tweaked in commit 34ca0905.
>

Fixed.


>
> > Looking at the plan I realized that this is happening because of wrong
> > costing for Gather Merge. Here in the plan we can see that the row count
> > estimated by Gather Merge is wrong. This is because the earlier patch had
> > GM considering rows = subpath->rows, which is not true as the subpath is
> > a partial path. So we need to multiply it by the number of workers. The
> > attached patch also fixes this issue. I also ran the TPC-H benchmark with
> > the patch and the results are the same as earlier.
>
> In create_grouping_paths:
> +   double  total_groups = gmpath->rows *
> gmpath->parallel_workers;
>
> This hides a variable of the same name in the enclosing scope.  Maybe
> confusing?
>
> In some other places like create_ordered_paths:
> +   double  total_groups = path->rows * path->parallel_workers;
>
> Though it probably made sense to use this variable name in
> create_grouping_paths, wouldn't total_rows be better here?
>
>
Initially I just copied from the other places. I agree with you that
in create_ordered_paths total_rows makes more sense.


> It feels weird to be working back to a total row count estimate from
> the partial one by simply multiplying by path->parallel_workers.
> Gather Merge will underestimate the total rows when parallel_workers <
> 4, if using partial row estimates ultimately from  cost_seqscan which
> assume some leader contribution.  I don't have a better idea though.
> Reversing cost_seqscan's logic certainly doesn't seem right.  I don't
> know how to make them agree on the leader's contribution AND give
> principled answers, since there seems to be some kind of cyclic
> dependency in the costing logic (cost_seqscan really needs to be given
> a leader contribution estimate from its superpath which knows whether
> it will allow the leader to pull tuples greedily/fairly or not, but
> that superpath hasn't been created yet; cost_gather_merge needs the
> row count from its subpath).  Or maybe I'm just confused.
>
>
Yes, I agree with you. But we can't really make changes to cost_seqscan.
Another option I can think of is to calculate the rows for Gather Merge
using the reverse of the formula used in cost_seqscan. That way we can
completely remove the rows argument from create_gather_merge_path() and
then, inside create_gather_merge_path(), calculate total_rows using the
same formula as cost_seqscan. This is working fine, but I'm not quite
sure about the approach, so I attached that part of the changes as a
separate patch. Any suggestions?


-- 
Rushabh Lathia
www.EnterpriseDB.com


gather_merge_v4_minor_changes.patch
Description: application/download


gm_v4_plus_rows_estimate.patch
Description: application/download



Re: [HACKERS] Gather Merge

2016-11-14 Thread Thomas Munro
On Sat, Nov 12, 2016 at 1:56 AM, Rushabh Lathia
 wrote:
> On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro 
> wrote:
>> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
>> + * Portions Copyright (c) 1994, Regents of the University of California
>>
>> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
>> Group"?
>
> Fixed.

The year also needs updating to 2016 in nodeGatherMerge.h.

>> + /* Per-tuple heap maintenance cost */
>> + run_cost += path->path.rows * comparison_cost * 2.0 * logN;
>>
>> Why multiply by two?  The comment above this code says "about log2(N)
>> comparisons to delete the top heap entry and another log2(N)
>> comparisons to insert its successor".  In fact gather_merge_getnext
>> calls binaryheap_replace_first, which replaces the top element without
>> any comparisons at all and then performs a sift-down in log2(N)
>> comparisons to find its new position.  There is no per-tuple "delete"
>> involved.  We "replace" the top element with the value it already had,
>> just to trigger the sift-down, because we know that our comparator
>> function might have a new opinion of the sort order of this element.
>> Very clever!  The comment and the 2.0 factor in cost_gather_merge seem
>> to be wrong though -- or am I misreading the code?
>>
> See cost_merge_append.

That just got tweaked in commit 34ca0905.

> Looking at the plan I realized that this is happening because of wrong
> costing for Gather Merge. Here in the plan we can see that the row count
> estimated by Gather Merge is wrong. This is because the earlier patch had
> GM considering rows = subpath->rows, which is not true as the subpath is
> a partial path. So we need to multiply it by the number of workers. The
> attached patch also fixes this issue. I also ran the TPC-H benchmark with
> the patch and the results are the same as earlier.

In create_grouping_paths:
+   double  total_groups = gmpath->rows *
gmpath->parallel_workers;

This hides a variable of the same name in the enclosing scope.  Maybe confusing?

In some other places like create_ordered_paths:
+   double  total_groups = path->rows * path->parallel_workers;

Though it probably made sense to use this variable name in
create_grouping_paths, wouldn't total_rows be better here?

It feels weird to be working back to a total row count estimate from
the partial one by simply multiplying by path->parallel_workers.
Gather Merge will underestimate the total rows when parallel_workers <
4, if using partial row estimates ultimately from  cost_seqscan which
assume some leader contribution.  I don't have a better idea though.
Reversing cost_seqscan's logic certainly doesn't seem right.  I don't
know how to make them agree on the leader's contribution AND give
principled answers, since there seems to be some kind of cyclic
dependency in the costing logic (cost_seqscan really needs to be given
a leader contribution estimate from its superpath which knows whether
it will allow the leader to pull tuples greedily/fairly or not, but
that superpath hasn't been created yet; cost_gather_merge needs the
row count from its subpath).  Or maybe I'm just confused.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2016-11-11 Thread Rushabh Lathia
Oops forgot to attach latest patch in the earlier mail.



On Fri, Nov 11, 2016 at 6:26 PM, Rushabh Lathia 
wrote:

>
>
> On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro <
> thomas.mu...@enterprisedb.com> wrote:
>
>> On Thu, Oct 27, 2016 at 10:50 PM, Rushabh Lathia
>>  wrote:
>> > Please find attached latest patch which fix the review point as well as
>> > additional clean-up.
>>
>> I've signed up to review this patch and I'm planning to do some
>> testing.  Here's some initial feedback after a quick read-through:
>>
>>
> Thanks Thomas.
>
>
>> + if (gather_merge_readnext(gm_state, i, initialize ? false : true))
>>
>> Clunky ternary operator... how about "!initialize".
>>
>>
> Fixed.
>
>
>> +/*
>> + * Function clear out a slot in the tuple table for each gather merge
>> + * slots and returns the clear clear slot.
>> + */
>>
>> Maybe better like this?  "_Clear_ out a slot in the tuple table for
>> each gather merge _slot_ and _return_ the _cleared_ slot."
>>
>>
> Fixed.
>
>
>> + /* Free tuple array as we no more need it */
>>
>> "... as we don't need it any more"
>>
>>
> Fixed
>
>
>> +/*
>> + * Read the next tuple for gather merge.
>> + *
>> + * Function fetch the sorted tuple out of the heap.
>> + */
>>
>> "_Fetch_ the sorted tuple out of the heap."
>>
>>
> Fixed
>
>
>> + * Otherwise, pull the next tuple from whichever participate we
>> + * returned from last time, and reinsert the index into the heap,
>> + * because it might now compare differently against the existing
>>
>> s/participate/participant/
>>
>>
> Fixed.
>
>
>> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
>> + * Portions Copyright (c) 1994, Regents of the University of California
>>
>> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
>> Group"?
>
>
> Fixed.
>
>
>> Are we supposed to be blaming the University of California
>> for new files?
>>
>>
> Not quite sure about this, so keeping this as it is.
>
>
>> +#include "executor/tqueue.h"
>> +#include "miscadmin.h"
>> +#include "utils/memutils.h"
>> +#include "utils/rel.h"
>> +#include "lib/binaryheap.h"
>>
>> Not correctly sorted.
>>
>>
> Copied from nodeGather.c, but fixed here.
>
>
>> + /*
>> + * store the tuple descriptor into gather merge state, so we can use it
>> + * later while initilizing the gather merge slots.
>> + */
>>
>> s/initilizing/initializing/
>>
>>
> Fixed.
>
>
>> +/* 
>> + * ExecEndGatherMerge
>> + *
>> + * frees any storage allocated through C routines.
>> + * 
>>
>> The convention in Postgres code seems to be to use a form like "Free
>> any storage ..." in function documentation.  Not sure if that's an
>> imperative, an infinitive, or if the word "we" is omitted since
>> English is so fuzzy like that, but it's inconsistent with other
>> documentation to use "frees" here.  Oh, I see that exact wording is in
>> several other files.  I guess I'll just leave this as a complaint
>> about all those files then :-)
>>
>>
> Sure.
>
>
>> + * Pull atleast single tuple from each worker + leader and set up the
>> heap.
>>
>> s/atleast single/at least a single/
>>
>>
> Fixed.
>
>
>> + * Read the tuple for given reader into nowait mode, and form the tuple
>> array.
>>
>> s/ into / in /
>>
>>
> Fixed.
>
>
>> + * Function attempt to read tuple for the given reader and store it into
>> reader
>>
>> s/Function attempt /Attempt /
>>
>>
> Fixed.
>
>
>> + * Function returns true if found tuple for the reader, otherwise returns
>>
>> s/Function returns /Return /
>>
>>
> Fixed.
>
>
>> + * First try to read tuple for each worker (including leader) into nowait
>> + * mode, so that we initialize read from each worker as well as leader.
>>
>> I wonder if it would be good to standardise on the terminology we use
>> when we mean workers AND the leader.  In my Parallel Shared Hash work,
>> I've been saying 'participants' if I mean = workers + leader.  What do
>> you think?
>>
>>
> I am not quite sure about "participants". In my opinion, when we
> explicitly say workers + leader it's clearer. I am open to changing it
> if the committer thinks otherwise.
>
>
>
>> + * After this if all active worker unable to produce the tuple, then
>> + * re-read and this time read the tuple into wait mode. For the worker,
>> + * which was able to produced single tuple in the earlier loop and still
>> + * active, just try fill the tuple array if more tuples available.
>> + */
>>
>> How about this?  "After this, if all active workers are unable to
>> produce a tuple, then re-read and this time use wait mode.  For workers
>> that were able to produce a tuple in the earlier loop and are still
>> active, just try to fill the tuple array if more tuples are
>> available."
>>
>>
> Fixed.
>
>
>> + * The heap is never spilled to disk, since we assume N is not very
>> large. So
>> + * this is much simple then cost_sort.
>>

Re: [HACKERS] Gather Merge

2016-11-11 Thread Rushabh Lathia
On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro 
wrote:

> On Thu, Oct 27, 2016 at 10:50 PM, Rushabh Lathia
>  wrote:
> > Please find attached latest patch which fix the review point as well as
> > additional clean-up.
>
> I've signed up to review this patch and I'm planning to do some
> testing.  Here's some initial feedback after a quick read-through:
>
>
Thanks Thomas.


> + if (gather_merge_readnext(gm_state, i, initialize ? false : true))
>
> Clunky ternary operator... how about "!initialize".
>
>
Fixed.


> +/*
> + * Function clear out a slot in the tuple table for each gather merge
> + * slots and returns the clear clear slot.
> + */
>
> Maybe better like this?  "_Clear_ out a slot in the tuple table for
> each gather merge _slot_ and _return_ the _cleared_ slot."
>
>
Fixed.


> + /* Free tuple array as we no more need it */
>
> "... as we don't need it any more"
>
>
Fixed


> +/*
> + * Read the next tuple for gather merge.
> + *
> + * Function fetch the sorted tuple out of the heap.
> + */
>
> "_Fetch_ the sorted tuple out of the heap."
>
>
Fixed


> + * Otherwise, pull the next tuple from whichever participate we
> + * returned from last time, and reinsert the index into the heap,
> + * because it might now compare differently against the existing
>
> s/participate/participant/
>
>
Fixed.


> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
>
> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
> Group"?


Fixed.


> Are we supposed to be blaming the University of California
> for new files?
>
>
Not quite sure about this, so keeping this as it is.


> +#include "executor/tqueue.h"
> +#include "miscadmin.h"
> +#include "utils/memutils.h"
> +#include "utils/rel.h"
> +#include "lib/binaryheap.h"
>
> Not correctly sorted.
>
>
Copied from nodeGather.c, but fixed here.


> + /*
> + * store the tuple descriptor into gather merge state, so we can use it
> + * later while initilizing the gather merge slots.
> + */
>
> s/initilizing/initializing/
>
>
Fixed.


> +/* 
> + * ExecEndGatherMerge
> + *
> + * frees any storage allocated through C routines.
> + * 
>
> The convention in Postgres code seems to be to use a form like "Free
> any storage ..." in function documentation.  Not sure if that's an
> imperative, an infinitive, or if the word "we" is omitted since
> English is so fuzzy like that, but it's inconsistent with other
> documentation to use "frees" here.  Oh, I see that exact wording is in
> several other files.  I guess I'll just leave this as a complaint
> about all those files then :-)
>
>
Sure.


> + * Pull atleast single tuple from each worker + leader and set up the
> heap.
>
> s/atleast single/at least a single/
>
>
Fixed.


> + * Read the tuple for given reader into nowait mode, and form the tuple
> array.
>
> s/ into / in /
>
>
Fixed.


> + * Function attempt to read tuple for the given reader and store it into
> reader
>
> s/Function attempt /Attempt /
>
>
Fixed.


> + * Function returns true if found tuple for the reader, otherwise returns
>
> s/Function returns /Return /
>
>
Fixed.


> + * First try to read tuple for each worker (including leader) into nowait
> + * mode, so that we initialize read from each worker as well as leader.
>
> I wonder if it would be good to standardise on the terminology we use
> when we mean workers AND the leader.  In my Parallel Shared Hash work,
> I've been saying 'participants' if I mean = workers + leader.  What do
> you think?
>
>
I am not quite sure about "participants". In my opinion, when we explicitly
say workers + leader it's clearer. I am open to changing it if the
committer thinks otherwise.



> + * After this if all active worker unable to produce the tuple, then
> + * re-read and this time read the tuple into wait mode. For the worker,
> + * which was able to produced single tuple in the earlier loop and still
> + * active, just try fill the tuple array if more tuples available.
> + */
>
> How about this?  "After this, if all active workers are unable to
> produce a tuple, then re-read and this time use wait mode.  For workers
> that were able to produce a tuple in the earlier loop and are still
> active, just try to fill the tuple array if more tuples are
> available."
>
>
Fixed.


> + * The heap is never spilled to disk, since we assume N is not very
> large. So
> + * this is much simple then cost_sort.
>
> s/much simple then/much simpler than/
>
>
Fixed.


> + /*
> + * Avoid log(0)...
> + */
> + N = (path->num_workers < 2) ? 2.0 : (double) path->num_workers;
> + logN = LOG2(N);
> ...
> + /* Per-tuple heap maintenance cost */
> + run_cost += path->path.rows * comparison_cost * 2.0 * logN;
>
> Why multiply by two?  The comment above this code says "about 

Re: [HACKERS] Gather Merge

2016-11-07 Thread Amit Kapila
On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro
 wrote:
> On Thu, Oct 27, 2016 at 10:50 PM, Rushabh Lathia
>  wrote:
>> Please find attached latest patch which fix the review point as well as
>> additional clean-up.
>
> +/*
> + * Read the tuple for given reader into nowait mode, and form the tuple 
> array.
> + */
> +static void
> +form_tuple_array(GatherMergeState *gm_state, int reader)
>
> This function is strangely named.  How about try_to_fill_tuple_buffer
> or something?
>

Hmm.  We discussed upthread naming it form_tuple_array.  Now you feel
that is also not good; I think it is basically a matter of perspective,
so why not leave it as it is for now. We can come back to the naming
towards the end of the patch review, or maybe leave it for the committer
to decide.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2016-11-04 Thread Thomas Munro
On Sat, Nov 5, 2016 at 2:42 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Fri, Nov 4, 2016 at 12:00 PM, Thomas Munro
>>  wrote:
>>> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
>>> Group"?  Are we supposed to be blaming the University of California
>>> for new files?
>
>> If the new file contains a portion of code from this age, yes.
>
> My habit has been to include the whole old copyright if there's anything
> at all in the new file that could be considered to be copy-and-paste from
> an existing file.  Frequently it's a gray area.

Thanks.  I see that it's warranted in this case, as code is recycled
from MergeAppend.

> Legally, I doubt anyone cares much.  Morally, I see it as paying due
> respect to those who came before us in this project.

+1

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2016-11-04 Thread Thomas Munro
On Sat, Nov 5, 2016 at 1:55 AM, Robert Haas  wrote:
> On Thu, Nov 3, 2016 at 11:00 PM, Thomas Munro
>  wrote:
>> + /*
>> + * Avoid log(0)...
>> + */
>> + N = (path->num_workers < 2) ? 2.0 : (double) path->num_workers;
>> + logN = LOG2(N);
>> ...
>> + /* Per-tuple heap maintenance cost */
>> + run_cost += path->path.rows * comparison_cost * 2.0 * logN;
>>
>> Why multiply by two?  The comment above this code says "about log2(N)
>> comparisons to delete the top heap entry and another log2(N)
>> comparisons to insert its successor".  In fact gather_merge_getnext
>> calls binaryheap_replace_first, which replaces the top element without
>> any comparisons at all and then performs a sift-down in log2(N)
>> comparisons to find its new position.  There is no per-tuple "delete"
>> involved.  We "replace" the top element with the value it already had,
>> just to trigger the sift-down, because we know that our comparator
>> function might have a new opinion of the sort order of this element.
>> Very clever!  The comment and the 2.0 factor in cost_gather_merge seem
>> to be wrong though -- or am I misreading the code?
>
> See cost_merge_append, and the header comments thereto.

I see.  So commit 7a2fe9bd got rid of the delete/insert code
(heap_siftup_slot and heap_insert_slot) and introduced
binaryheap_replace_first which does it in one step, but the costing
wasn't adjusted and still thinks we pay comparison_cost * logN twice.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2016-11-04 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Nov 4, 2016 at 12:00 PM, Thomas Munro
>  wrote:
>> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
>> Group"?  Are we supposed to be blaming the University of California
>> for new files?

> If the new file contains a portion of code from this age, yes.

My habit has been to include the whole old copyright if there's anything
at all in the new file that could be considered to be copy-and-paste from
an existing file.  Frequently it's a gray area.

Legally, I doubt anyone cares much.  Morally, I see it as paying due
respect to those who came before us in this project.

regards, tom lane




Re: [HACKERS] Gather Merge

2016-11-04 Thread Robert Haas
On Thu, Nov 3, 2016 at 11:00 PM, Thomas Munro
 wrote:
> + /*
> + * Avoid log(0)...
> + */
> + N = (path->num_workers < 2) ? 2.0 : (double) path->num_workers;
> + logN = LOG2(N);
> ...
> + /* Per-tuple heap maintenance cost */
> + run_cost += path->path.rows * comparison_cost * 2.0 * logN;
>
> Why multiply by two?  The comment above this code says "about log2(N)
> comparisons to delete the top heap entry and another log2(N)
> comparisons to insert its successor".  In fact gather_merge_getnext
> calls binaryheap_replace_first, which replaces the top element without
> any comparisons at all and then performs a sift-down in log2(N)
> comparisons to find its new position.  There is no per-tuple "delete"
> involved.  We "replace" the top element with the value it already had,
> just to trigger the sift-down, because we know that our comparator
> function might have a new opinion of the sort order of this element.
> Very clever!  The comment and the 2.0 factor in cost_gather_merge seem
> to be wrong though -- or am I misreading the code?

See cost_merge_append, and the header comments thereto.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Gather Merge

2016-11-03 Thread Michael Paquier
On Fri, Nov 4, 2016 at 12:00 PM, Thomas Munro
 wrote:
> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
>
> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
> Group"?  Are we supposed to be blaming the University of California
> for new files?

If the new file contains a portion of code from this age, yes. If
it's something completely new, using only PGDG is fine. At least
that's what I can conclude by looking at git log -p and searching for
"new file mode".
-- 
Michael




Re: [HACKERS] Gather Merge

2016-11-03 Thread Thomas Munro
On Thu, Oct 27, 2016 at 10:50 PM, Rushabh Lathia
 wrote:
> Please find attached latest patch which fix the review point as well as
> additional clean-up.

I've signed up to review this patch and I'm planning to do some
testing.  Here's some initial feedback after a quick read-through:

+ if (gather_merge_readnext(gm_state, i, initialize ? false : true))

Clunky ternary operator... how about "!initialize".

+/*
+ * Function clear out a slot in the tuple table for each gather merge
+ * slots and returns the clear clear slot.
+ */

Maybe better like this?  "_Clear_ out a slot in the tuple table for
each gather merge _slot_ and _return_ the _cleared_ slot."

+ /* Free tuple array as we no more need it */

"... as we don't need it any more"

+/*
+ * Read the next tuple for gather merge.
+ *
+ * Function fetch the sorted tuple out of the heap.
+ */

"_Fetch_ the sorted tuple out of the heap."

+ * Otherwise, pull the next tuple from whichever participate we
+ * returned from last time, and reinsert the index into the heap,
+ * because it might now compare differently against the existing

s/participate/participant/

+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California

Shouldn't this say just "(c) 2016, PostgreSQL Global Development
Group"?  Are we supposed to be blaming the University of California
for new files?

+#include "executor/tqueue.h"
+#include "miscadmin.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "lib/binaryheap.h"

Not correctly sorted.

+ /*
+ * store the tuple descriptor into gather merge state, so we can use it
+ * later while initilizing the gather merge slots.
+ */

s/initilizing/initializing/

+/* 
+ * ExecEndGatherMerge
+ *
+ * frees any storage allocated through C routines.
+ * 

The convention in Postgres code seems to be to use a form like "Free
any storage ..." in function documentation.  Not sure if that's an
imperative, an infinitive, or if the word "we" is omitted since
English is so fuzzy like that, but it's inconsistent with other
documentation to use "frees" here.  Oh, I see that exact wording is in
several other files.  I guess I'll just leave this as a complaint
about all those files then :-)

+ * Pull atleast single tuple from each worker + leader and set up the heap.

s/atleast single/at least a single/

+ * Read the tuple for given reader into nowait mode, and form the tuple array.

s/ into / in /

+ * Function attempt to read tuple for the given reader and store it into reader

s/Function attempt /Attempt /

+ * Function returns true if found tuple for the reader, otherwise returns

s/Function returns /Return /

+ * First try to read tuple for each worker (including leader) into nowait
+ * mode, so that we initialize read from each worker as well as leader.

I wonder if it would be good to standardise on the terminology we use
when we mean workers AND the leader.  In my Parallel Shared Hash work,
I've been saying 'participants' if I mean = workers + leader.  What do
you think?

+ * After this if all active worker unable to produce the tuple, then
+ * re-read and this time read the tuple into wait mode. For the worker,
+ * which was able to produced single tuple in the earlier loop and still
+ * active, just try fill the tuple array if more tuples available.
+ */

How about this?  "After this, if all active workers are unable to
produce a tuple, then re-read and this time use wait mode.  For workers
that were able to produce a tuple in the earlier loop and are still
active, just try to fill the tuple array if more tuples are
available."

+ * The heap is never spilled to disk, since we assume N is not very large. So
+ * this is much simple then cost_sort.

s/much simple then/much simpler than/

+ /*
+ * Avoid log(0)...
+ */
+ N = (path->num_workers < 2) ? 2.0 : (double) path->num_workers;
+ logN = LOG2(N);
...
+ /* Per-tuple heap maintenance cost */
+ run_cost += path->path.rows * comparison_cost * 2.0 * logN;

Why multiply by two?  The comment above this code says "about log2(N)
comparisons to delete the top heap entry and another log2(N)
comparisons to insert its successor".  In fact gather_merge_getnext
calls binaryheap_replace_first, which replaces the top element without
any comparisons at all and then performs a sift-down in log2(N)
comparisons to find its new position.  There is no per-tuple "delete"
involved.  We "replace" the top element with the value it already had,
just to trigger the sift-down, because we know that our comparator
function might have a new opinion of the sort order of this element.
Very clever!  The comment and the 2.0 factor in cost_gather_merge seem
to be wrong though -- or am I misreading the code?

Also, shouldn't we add 1 to N to account for the leader?  Suppose
there are 2 

Re: [HACKERS] Gather Merge

2016-10-27 Thread Rushabh Lathia
Please find attached the latest patch, which fixes the review points as
well as some additional clean-up:

- Get rid of funnel_slot, as it's not needed for Gather Merge
- Renamed fill_tuple_array to form_tuple_array
- Fix a possible infinite loop in gather_merge_init (reported by Amit
Kapila)
- Fix a tqueue.c memory leak by calling TupleQueueReaderNext() with a
per-tuple context
- Code cleanup in ExecGatherMerge
- Added a new function, gather_merge_clear_slots(), which clears out all
gather merge slots and also frees the tuple array at end of execution

I did the performance testing again with the latest patch and I haven't
observed any regression. Some of the TPC-H queries show additional
benefit with the latest patch, but it's just under 5%.

Do let me know if I missed anything.


On Mon, Oct 24, 2016 at 11:55 AM, Rushabh Lathia 
wrote:

>
>
> On Thu, Oct 20, 2016 at 1:12 PM, Amit Kapila 
> wrote:
>
>> On Tue, Oct 18, 2016 at 5:29 PM, Rushabh Lathia
>>  wrote:
>> > On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila 
>> > wrote:
>> >>
>> >> There is lot of common code between ExecGatherMerge and ExecGather.
>> >> Do you think it makes sense to have a common function to avoid the
>> >> duplicity?
>> >>
>> >> I see there are small discrepancies in both the codes like I don't see
>> >> the use of single_copy flag, as it is present in gather node.
>> >>
>> >
>> > Yes, even I thought to centrilize some things of ExecGather and
>> > ExecGatherMerge,
>> > but its really not something that is fixed. And I thought it might
>> change
>> > particularly
>> > for the Gather Merge. And as explained by Robert single_copy doesn't
>> make
>> > sense
>> > for the Gather Merge. I will still look into this to see if something
>> can be
>> > make
>> > centralize.
>> >
>>
>> Okay, I haven't thought about it, but do let me know if you couldn't
>> find any way to merge the code.
>>
>>
> Sure, I will look into this.
>
>
>> >>
>> >> 3.
>> >> +gather_merge_readnext(GatherMergeState * gm_state, int reader, bool
>> >> force)
>> >> {
>> >> ..
>> >> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);
>> >> +
>> >> + /*
>> >> +
>> >>  * try to read more tuple into nowait mode and store it into the tuple
>> >> + * array.
>> >> +
>> >>  */
>> >> + if (HeapTupleIsValid(tup))
>> >> + fill_tuple_array(gm_state, reader);
>> >>
>> >> How the above read tuple is stored in array?  In anycase the above
>> >> interface seems slightly awkward to me. Basically, I think what you
>> >> are trying to do here is after reading first tuple in waitmode, you
>> >> are trying to fill the array by reading more tuples.  So, can't we
>> >> push reading of this fist tuple into that function and name it as
>> >> form_tuple_array().
>> >>
>> >
>> > Yes, you are right.
>> >
>>
>> You have not answered my first question.  I will try to ask again, how
>> the tuple read by below code is stored in the array:
>>
>>
> Tuple directly get stored into related TupleTableSlot.
> In gather_merge_readnext()
> at the end of function it build the TupleTableSlot for the given tuple. So
> tuple
> read by above code is directly stored into TupleTableSlot.
>
> >> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);
>>
>> > First its trying to read tuple into wait-mode, and once
>> > it
>> > find tuple then its try to fill the tuple array (which basically try to
>> read
>> > tuple
>> > into nowait-mode). Reason I keep it separate is because in case of
>> > initializing
>> > the gather merge, if we unable to read tuple from all the worker - while
>> > trying
>> > re-read, we again try to fill the tuple array for the worker who already
>> > produced
>> > atleast a single tuple (see gather_merge_init() for more details).
>> >
>>
>> Whenever any worker produced one tuple, you already try to fill the
>> array in gather_merge_readnext(), so does the above code help much?
>>
>> > Also I
>> > thought
>> > filling tuple array (which basically read tuple into nowait mode) and
>> > reading tuple
>> > (into wait-mode) are two separate task - and if its into separate
>> function
>> > that code
>> > look more clear.
>>
>> To me that looks slightly confusing.
>>
>> > If you have any suggestion for the function name
>> > (fill_tuple_array)
>> > then I am open to change that.
>> >
>>
>> form_tuple_array (form_tuple is used at many places in code, so it
>> should look okay).
>
>
> Ok, I rename it with next patch.
>
>
>> I think you might want to consider forming array
>> even for leader, although it might not be as beneficial as for
>> workers, OTOH, the code will get simplified if we do that way.
>>
>
> Yes, I did that earlier - and as you guessed its not be any beneficial
> so to avoided extra memory allocation for the tuple array, I am not
> forming array for leader.
>
> Today, I observed another issue in code:
>>
>> +gather_merge_init(GatherMergeState *gm_state)
>> {
>> ..
>> +reread:
>> + for (i = 0; i < 

Re: [HACKERS] Gather Merge

2016-10-24 Thread Rushabh Lathia
On Thu, Oct 20, 2016 at 1:12 PM, Amit Kapila 
wrote:

> On Tue, Oct 18, 2016 at 5:29 PM, Rushabh Lathia
>  wrote:
> > On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila 
> > wrote:
> >>
> >> There is lot of common code between ExecGatherMerge and ExecGather.
> >> Do you think it makes sense to have a common function to avoid the
> >> duplicity?
> >>
> >> I see there are small discrepancies in both the codes like I don't see
> >> the use of single_copy flag, as it is present in gather node.
> >>
> >
> > Yes, even I thought to centrilize some things of ExecGather and
> > ExecGatherMerge,
> > but its really not something that is fixed. And I thought it might change
> > particularly
> > for the Gather Merge. And as explained by Robert single_copy doesn't make
> > sense
> > for the Gather Merge. I will still look into this to see if something
> can be
> > make
> > centralize.
> >
>
> Okay, I haven't thought about it, but do let me know if you couldn't
> find any way to merge the code.
>
>
Sure, I will look into this.


> >>
> >> 3.
> >> +gather_merge_readnext(GatherMergeState * gm_state, int reader, bool
> >> force)
> >> {
> >> ..
> >> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);
> >> +
> >> + /*
> >> +
> >>  * try to read more tuple into nowait mode and store it into the tuple
> >> + * array.
> >> +
> >>  */
> >> + if (HeapTupleIsValid(tup))
> >> + fill_tuple_array(gm_state, reader);
> >>
> >> How the above read tuple is stored in array?  In anycase the above
> >> interface seems slightly awkward to me. Basically, I think what you
> >> are trying to do here is after reading first tuple in waitmode, you
> >> are trying to fill the array by reading more tuples.  So, can't we
> >> push reading of this fist tuple into that function and name it as
> >> form_tuple_array().
> >>
> >
> > Yes, you are right.
> >
>
> You have not answered my first question.  I will try to ask again, how
> the tuple read by below code is stored in the array:
>
>
The tuple gets stored directly into the related TupleTableSlot.  At the end
of gather_merge_readnext() the function builds the TupleTableSlot for the
given tuple, so the tuple read by the above code lands directly in that
slot.

>> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);
>
> > First its trying to read tuple into wait-mode, and once
> > it
> > find tuple then its try to fill the tuple array (which basically try to
> read
> > tuple
> > into nowait-mode). Reason I keep it separate is because in case of
> > initializing
> > the gather merge, if we unable to read tuple from all the worker - while
> > trying
> > re-read, we again try to fill the tuple array for the worker who already
> > produced
> > atleast a single tuple (see gather_merge_init() for more details).
> >
>
> Whenever any worker produced one tuple, you already try to fill the
> array in gather_merge_readnext(), so does the above code help much?
>
> > Also I
> > thought
> > filling tuple array (which basically read tuple into nowait mode) and
> > reading tuple
> > (into wait-mode) are two separate task - and if its into separate
> function
> > that code
> > look more clear.
>
> To me that looks slightly confusing.
>
> > If you have any suggestion for the function name
> > (fill_tuple_array)
> > then I am open to change that.
> >
>
> form_tuple_array (form_tuple is used at many places in code, so it
> should look okay).


Ok, I will rename it in the next patch.


> I think you might want to consider forming array
> even for leader, although it might not be as beneficial as for
> workers, OTOH, the code will get simplified if we do that way.
>

Yes, I did that earlier - and as you guessed, it isn't really beneficial,
so to avoid the extra memory allocation for the tuple array I am not
forming the array for the leader.

> Today, I observed another issue in code:
>
> +gather_merge_init(GatherMergeState *gm_state)
> {
> ..
> +reread:
> + for (i = 0; i < nreaders + 1; i++)
> + {
> + if (TupIsNull(gm_state->gm_slots[i]) ||
> + gm_state->gm_slots[i]->tts_isempty)
> + {
> + if (gather_merge_readnext(gm_state, i, initialize ? false : true))
> + {
> + binaryheap_add_unordered(gm_state->gm_heap,
> + Int32GetDatum(i));
> + }
> + }
> + else
> + fill_tuple_array(gm_state, i);
> + }
> + initialize = false;
> +
> + for (i = 0; i < nreaders; i++)
> + if (TupIsNull(gm_state->gm_slots[i]) || gm_state->gm_slots[i]->tts_
> isempty)
> + goto reread;
> ..
> }
>
> This code can cause infinite loop.  Consider a case where one of the
> worker doesn't get any tuple because by the time it starts all the
> tuples are consumed by all other workers. The above code will keep on
> looping to fetch the tuple from that worker whereas that worker can't
> return any tuple.  I think you can fix it by checking if the
> particular queue has been exhausted.
>
>
Oh yes. I will work on the fix and submit the next version of the patch soon.


> >> >
> >> > Open Issue:
> >> >
> >> > - Commit 

Re: [HACKERS] Gather Merge

2016-10-20 Thread Rushabh Lathia
On Thu, Oct 20, 2016 at 12:22 AM, Peter Geoghegan  wrote:

> On Tue, Oct 4, 2016 at 11:05 PM, Rushabh Lathia
>  wrote:
> > Query 4:  With GM 7901.480 -> Without GM 9064.776
> > Query 5:  With GM 53452.126 -> Without GM 55059.511
> > Query 9:  With GM 52613.132 -> Without GM 98206.793
> > Query 15: With GM 68051.058 -> Without GM 68918.378
> > Query 17: With GM 129236.075 -> Without GM 160451.094
> > Query 20: With GM 259144.232 -> Without GM 306256.322
> > Query 21: With GM 153483.497 -> Without GM 168169.916
> >
> > Here from the results we can see that query 9, 17 and 20 are the one
> which
> > show good performance benefit with the Gather Merge.
>
> Were all other TPC-H queries unaffected? IOW, did they have the same
> plan as before with your patch applied? Did you see any regressions?
>
>
Yes, all other TPC-H queries were unaffected by the patch. At the initial
stage of patch development I noticed regressions, but then realized it was
because I was not allowing the leader to participate in the GM. Once I
fixed that, I didn't notice any regressions.

> I assume that this patch has each worker use work_mem for its own
> sort, as with hash joins today. One concern with that model when
> testing is that you could end up with a bunch of internal sorts for
> cases with a GM node, where you get one big external sort for cases
> without one. Did you take that into consideration?
>
>
Yes, but isn't that good? Please correct me if I am missing anything.


> --
> Peter Geoghegan
>



--
Rushabh Lathia
www.EnterpriseDB.com


Re: [HACKERS] Gather Merge

2016-10-20 Thread Amit Kapila
On Tue, Oct 18, 2016 at 5:29 PM, Rushabh Lathia
 wrote:
> On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila 
> wrote:
>>
>> There is lot of common code between ExecGatherMerge and ExecGather.
>> Do you think it makes sense to have a common function to avoid the
>> duplicity?
>>
>> I see there are small discrepancies in both the codes like I don't see
>> the use of single_copy flag, as it is present in gather node.
>>
>
> Yes, even I thought to centrilize some things of ExecGather and
> ExecGatherMerge,
> but its really not something that is fixed. And I thought it might change
> particularly
> for the Gather Merge. And as explained by Robert single_copy doesn't make
> sense
> for the Gather Merge. I will still look into this to see if something can be
> make
> centralize.
>

Okay, I haven't thought about it, but do let me know if you couldn't
find any way to merge the code.

>>
>> 3.
>> +gather_merge_readnext(GatherMergeState * gm_state, int reader, bool
>> force)
>> {
>> ..
>> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);
>> +
>> + /*
>> +
>>  * try to read more tuple into nowait mode and store it into the tuple
>> + * array.
>> +
>>  */
>> + if (HeapTupleIsValid(tup))
>> + fill_tuple_array(gm_state, reader);
>>
>> How the above read tuple is stored in array?  In anycase the above
>> interface seems slightly awkward to me. Basically, I think what you
>> are trying to do here is after reading first tuple in waitmode, you
>> are trying to fill the array by reading more tuples.  So, can't we
>> push reading of this fist tuple into that function and name it as
>> form_tuple_array().
>>
>
> Yes, you are right.
>

You have not answered my first question.  I will try to ask again, how
the tuple read by below code is stored in the array:

>> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);

> First its trying to read tuple into wait-mode, and once
> it
> find tuple then its try to fill the tuple array (which basically try to read
> tuple
> into nowait-mode). Reason I keep it separate is because in case of
> initializing
> the gather merge, if we unable to read tuple from all the worker - while
> trying
> re-read, we again try to fill the tuple array for the worker who already
> produced
> atleast a single tuple (see gather_merge_init() for more details).
>

Whenever any worker produces one tuple, you already try to fill the
array in gather_merge_readnext(), so does the above code help much?

> Also I
> thought
> filling tuple array (which basically read tuple into nowait mode) and
> reading tuple
> (into wait-mode) are two separate task - and if its into separate function
> that code
> look more clear.

To me that looks slightly confusing.

> If you have any suggestion for the function name
> (fill_tuple_array)
> then I am open to change that.
>

form_tuple_array (form_tuple is used at many places in code, so it
should look okay).  I think you might want to consider forming array
even for leader, although it might not be as beneficial as for
workers, OTOH, the code will get simplified if we do that way.


Today, I observed another issue in code:

+gather_merge_init(GatherMergeState *gm_state)
{
..
+reread:
+ for (i = 0; i < nreaders + 1; i++)
+ {
+ if (TupIsNull(gm_state->gm_slots[i]) ||
+ gm_state->gm_slots[i]->tts_isempty)
+ {
+ if (gather_merge_readnext(gm_state, i, initialize ? false : true))
+ {
+ binaryheap_add_unordered(gm_state->gm_heap,
+ Int32GetDatum(i));
+ }
+ }
+ else
+ fill_tuple_array(gm_state, i);
+ }
+ initialize = false;
+
+ for (i = 0; i < nreaders; i++)
+ if (TupIsNull(gm_state->gm_slots[i]) || gm_state->gm_slots[i]->tts_isempty)
+ goto reread;
..
}

This code can cause an infinite loop.  Consider a case where one of the
workers doesn't get any tuples because, by the time it starts, all the
tuples have been consumed by the other workers.  The above code will keep
looping to fetch a tuple from that worker even though it can't return one.
I think you can fix it by checking whether the particular queue has been
exhausted.
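
One way to avoid that, sketched here with toy structures rather than the patch's actual GatherMergeState (the Reader type and all names below are illustrative assumptions), is to remember which queues have hit end-of-stream and skip them in the reread loop:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative reader state: 'next' walks a fixed array of tuples. */
typedef struct Reader
{
    const int *tuples;   /* sorted stream from one worker */
    size_t     ntuples;
    size_t     next;
    bool       done;     /* queue exhausted: never poll it again */
} Reader;

/* Read one value; returns false and sets done at end-of-stream. */
bool reader_next(Reader *r, int *out)
{
    if (r->next >= r->ntuples)
    {
        r->done = true;
        return false;
    }
    *out = r->tuples[r->next++];
    return true;
}

/*
 * Prime one slot per reader, re-polling (like gather_merge_init's
 * "reread" loop) only readers that are neither primed nor done.  A
 * worker that never produces a tuple gets marked done instead of
 * being polled forever.
 */
size_t prime_readers(Reader *readers, size_t nreaders, int *slots)
{
    bool   primed[8] = {false};  /* sketch assumes nreaders <= 8 */
    size_t nprimed = 0;
    bool   progress = true;

    while (progress)
    {
        progress = false;
        for (size_t i = 0; i < nreaders; i++)
        {
            if (primed[i] || readers[i].done)
                continue;               /* nothing left to wait for */
            if (reader_next(&readers[i], &slots[i]))
            {
                primed[i] = true;
                nprimed++;
            }
            progress = true;            /* state advanced: primed or done */
        }
    }
    return nprimed;
}
```

Every pass either primes a reader or marks it done, so the loop always terminates even when a queue yields nothing.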

>> >
>> > Open Issue:
>> >
>> > - Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 fixed the leak into
>> > tqueue.c by
>> > calling gather_readnext() into per-tuple context. Now for gather merge
>> > that
>> > is
>> > not possible, as we storing the tuple into Tuple array and we want tuple
>> > to
>> > be
>> > free only its get pass through the merge sort algorithm. One idea is, we
>> > can
>> > also call gm_readnext_tuple() under per-tuple context (which will fix
>> > the
>> > leak
>> > into tqueue.c) and then store the copy of tuple into tuple array.
>> >
>>
>> Won't extra copy per tuple impact performance?  Is the fix in
>> mentioned commit was for record or composite types, if so, does
>> GatherMerge support such types and if it support, does it provide any
>> benefit over Gather?
>>
>
> I don't think was specificially for the record or composite types - but I
> might be
> wrong. As per my understanding 

Re: [HACKERS] Gather Merge

2016-10-19 Thread Peter Geoghegan
On Tue, Oct 4, 2016 at 11:05 PM, Rushabh Lathia
 wrote:
> Query 4:  With GM 7901.480 -> Without GM 9064.776
> Query 5:  With GM 53452.126 -> Without GM 55059.511
> Query 9:  With GM 52613.132 -> Without GM 98206.793
> Query 15: With GM 68051.058 -> Without GM 68918.378
> Query 17: With GM 129236.075 -> Without GM 160451.094
> Query 20: With GM 259144.232 -> Without GM 306256.322
> Query 21: With GM 153483.497 -> Without GM 168169.916
>
> Here from the results we can see that query 9, 17 and 20 are the one which
> show good performance benefit with the Gather Merge.

Were all other TPC-H queries unaffected? IOW, did they have the same
plan as before with your patch applied? Did you see any regressions?

I assume that this patch has each worker use work_mem for its own
sort, as with hash joins today. One concern with that model when
testing is that you could end up with a bunch of internal sorts for
cases with a GM node, where you get one big external sort for cases
without one. Did you take that into consideration?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gather Merge

2016-10-18 Thread Rushabh Lathia
Thanks Amit for reviewing this patch.

On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila 
wrote:

> On Wed, Oct 5, 2016 at 11:35 AM, Rushabh Lathia
>  wrote:
> > Hi hackers,
> >
> > Attached is the patch to implement Gather Merge.
> >
>
> Couple of review comments:
>
> 1.
> ExecGatherMerge()
> {
> ..
> + /* No workers? Then never mind. */
> + if (!got_any_worker
> ||
> + node->nreaders < 2)
> + {
> +
> ExecShutdownGatherMergeWorkers(node);
> + node->nreaders = 0;
> +
> }
>
> Are you planning to restrict the use of gather merge based on number
> of workers, if there is a valid reason, then I think comments should
> be updated for same?
>
>
Thanks for catching this. This is left over from the earlier design of the
patch. With the current design we don't have any limitation on the number of
workers. I did performance testing with max_parallel_workers_per_gather set
to 1 and didn't notice any performance degradation, so I removed this
limitation in the attached patch.

> 2.
> +ExecGatherMerge(GatherMergeState * node){
> ..
> + if (!node->initialized)
> + {
> + EState   *estate = node->ps.state;
> +
> GatherMerge *gm = (GatherMerge *) node->ps.plan;
> +
> + /*
> + * Sometimes we
> might have to run without parallelism; but if parallel
> + * mode is active then we can try to
> fire up some workers.
> + */
> + if (gm->num_workers > 0 && IsInParallelMode())
> +
> {
> + ParallelContext *pcxt;
> + bool got_any_worker =
> false;
> +
> + /* Initialize the workers required to execute Gather node. */
> +
> if (!node->pei)
> + node->pei = ExecInitParallelPlan(node-
> >ps.lefttree,
> +
>  estate,
> +
>  gm->num_workers);
> ..
> }
>
> There is lot of common code between ExecGatherMerge and ExecGather.
> Do you think it makes sense to have a common function to avoid the
> duplicity?
>
> I see there are small discrepancies in both the codes like I don't see
> the use of single_copy flag, as it is present in gather node.
>
>
Yes, I too thought about centralizing some things between ExecGather and
ExecGatherMerge, but it's really not something that is fixed yet, and I
thought it might change, particularly for Gather Merge. And as explained by
Robert, single_copy doesn't make sense for Gather Merge. I will still look
into whether something can be centralized.


> 3.
> +gather_merge_readnext(GatherMergeState * gm_state, int reader, bool
> force)
> {
> ..
> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);
> +
> + /*
> +
>  * try to read more tuple into nowait mode and store it into the tuple
> + * array.
> +
>  */
> + if (HeapTupleIsValid(tup))
> + fill_tuple_array(gm_state, reader);
>
> How the above read tuple is stored in array?  In anycase the above
> interface seems slightly awkward to me. Basically, I think what you
> are trying to do here is after reading first tuple in waitmode, you
> are trying to fill the array by reading more tuples.  So, can't we
> push reading of this fist tuple into that function and name it as
> form_tuple_array().
>
>
Yes, you are right. First it tries to read a tuple in wait mode, and once it
finds a tuple it tries to fill the tuple array (which basically reads tuples
in nowait mode). The reason I keep them separate is that, when initializing
the gather merge, if we are unable to read a tuple from every worker, then
while re-reading we again try to fill the tuple array for each worker that
has already produced at least one tuple (see gather_merge_init() for more
details). I also thought that filling the tuple array (which reads tuples in
nowait mode) and reading a tuple (in wait mode) are two separate tasks, and
that the code looks clearer with them in separate functions. If you have any
suggestion for the function name (fill_tuple_array), I am open to changing
it.



> 4.
> +create_gather_merge_path(..)
> {
> ..
> +  0 /* FIXME: pathnode->limit_tuples*/);
>
> What exactly you want to fix in above code?
>
>
Fixed.


> 5.
>  +/* Tuple array size */
> +#define MAX_TUPLE_STORE 10
>
> Have you tried with other values of MAX_TUPLE_STORE?  If yes, then
> what are the results?  I think it is better to add a comment why array
> size is best for performance.
>
>

Actually I was thinking about that, but I didn't want to add them because
they are just performance numbers from my machine. Anyway, I added the
comments.


>
> 6.
> +/* INTERFACE ROUTINES
> + * ExecInitGatherMerge - initialize the MergeAppend
> node
> + * ExecGatherMerge - retrieve the next tuple from the node
> + *
> ExecEndGatherMerge - shut down the MergeAppend node
> + *
> ExecReScanGatherMerge - rescan the MergeAppend node
>
> typo.  /MergeAppend/GatherMerge
>
>
Fixed.


> 7.
> +static TupleTableSlot *gather_merge_getnext(GatherMergeState * gm_state);
> +static HeapTuple
> gm_readnext_tuple(GatherMergeState * gm_state, int nreader, bool
> force, bool *done);
>
> Formatting near GatherMergeState doesn't seem to be appropriate.  I
> think you need to add GatherMergeState in typedefs.list and 

Re: [HACKERS] Gather Merge

2016-10-17 Thread Robert Haas
On Mon, Oct 17, 2016 at 4:56 AM, Amit Kapila  wrote:
> + node->nreaders < 2)
...
> I see there are small discrepancies in both the codes like I don't see
> the use of single_copy flag, as it is present in gather node.

single_copy doesn't make sense for GatherMerge, because the whole
point is to merge a bunch of individually-sorted output streams into a
single stream.  If you have only one stream of tuples, you don't need
to merge anything: you could have just used Gather for that.

It does, however, make sense to merge one worker's output with the
leader's output.
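
In other words, the only thing Gather Merge adds over Gather is the merge of already-sorted streams. A toy two-way merge (plain int arrays standing in for the leader's and one worker's tuple streams; not the executor code) shows the idea:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Merge two already-sorted streams into out[], preserving sort order --
 * this is the property Gather Merge guarantees and plain Gather, which
 * interleaves worker output arbitrarily, does not.
 */
size_t merge_two(const int *a, size_t na,
                 const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, k = 0;

    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];      /* drain leftover leader tuples */
    while (j < nb)
        out[k++] = b[j++];      /* drain leftover worker tuples */
    return k;
}
```

With N streams the same job is done with a binary heap instead of a pairwise comparison, but the single-stream case degenerates to a plain copy, which is why single_copy would buy nothing here.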

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Gather Merge

2016-10-17 Thread Amit Kapila
On Wed, Oct 5, 2016 at 11:35 AM, Rushabh Lathia
 wrote:
> Hi hackers,
>
> Attached is the patch to implement Gather Merge.
>

Couple of review comments:

1.
ExecGatherMerge()
{
..
+ /* No workers? Then never mind. */
+ if (!got_any_worker
||
+ node->nreaders < 2)
+ {
+
ExecShutdownGatherMergeWorkers(node);
+ node->nreaders = 0;
+
}

Are you planning to restrict the use of gather merge based on number
of workers, if there is a valid reason, then I think comments should
be updated for same?

2.
+ExecGatherMerge(GatherMergeState * node){
..
+ if (!node->initialized)
+ {
+ EState   *estate = node->ps.state;
+
GatherMerge *gm = (GatherMerge *) node->ps.plan;
+
+ /*
+ * Sometimes we
might have to run without parallelism; but if parallel
+ * mode is active then we can try to
fire up some workers.
+ */
+ if (gm->num_workers > 0 && IsInParallelMode())
+
{
+ ParallelContext *pcxt;
+ bool got_any_worker =
false;
+
+ /* Initialize the workers required to execute Gather node. */
+
if (!node->pei)
+ node->pei = ExecInitParallelPlan(node-
>ps.lefttree,
+
 estate,
+
 gm->num_workers);
..
}

There is lot of common code between ExecGatherMerge and ExecGather.
Do you think it makes sense to have a common function to avoid the
duplicity?

I see there are small discrepancies in both the codes like I don't see
the use of single_copy flag, as it is present in gather node.

3.
+gather_merge_readnext(GatherMergeState * gm_state, int reader, bool force)
{
..
+ tup = gm_readnext_tuple(gm_state, reader, force, NULL);
+
+ /*
+
 * try to read more tuple into nowait mode and store it into the tuple
+ * array.
+
 */
+ if (HeapTupleIsValid(tup))
+ fill_tuple_array(gm_state, reader);

How is the above read tuple stored in the array?  In any case, the above
interface seems slightly awkward to me. Basically, I think what you
are trying to do here is: after reading the first tuple in wait mode, you
try to fill the array by reading more tuples.  So, can't we
push the reading of this first tuple into that function and name it
form_tuple_array()?

4.
+create_gather_merge_path(..)
{
..
+  0 /* FIXME: pathnode->limit_tuples*/);

What exactly you want to fix in above code?

5.
 +/* Tuple array size */
+#define MAX_TUPLE_STORE 10

Have you tried with other values of MAX_TUPLE_STORE?  If yes, then
what are the results?  I think it is better to add a comment why array
size is best for performance.


6.
+/* INTERFACE ROUTINES
+ * ExecInitGatherMerge - initialize the MergeAppend
node
+ * ExecGatherMerge - retrieve the next tuple from the node
+ *
ExecEndGatherMerge - shut down the MergeAppend node
+ *
ExecReScanGatherMerge - rescan the MergeAppend node

typo.  /MergeAppend/GatherMerge

7.
+static TupleTableSlot *gather_merge_getnext(GatherMergeState * gm_state);
+static HeapTuple
gm_readnext_tuple(GatherMergeState * gm_state, int nreader, bool
force, bool *done);

Formatting near GatherMergeState doesn't seem to be appropriate.  I
think you need to add GatherMergeState in typedefs.list and then run
pgindent again.

8.
+ /*
+ * Initialize funnel slot to same tuple descriptor as outer plan.
+ */
+ if
(!ExecContextForcesOids(_state->ps, ))

I think in above comment, you mean Initialize GatherMerge slot.

9.
+ /* Does tuple array have any avaiable tuples? */
/avaiable/available

>
> Open Issue:
>
> - Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 fixed the leak into
> tqueue.c by
> calling gather_readnext() into per-tuple context. Now for gather merge that
> is
> not possible, as we storing the tuple into Tuple array and we want tuple to
> be
> free only its get pass through the merge sort algorithm. One idea is, we can
> also call gm_readnext_tuple() under per-tuple context (which will fix the
> leak
> into tqueue.c) and then store the copy of tuple into tuple array.
>

Won't an extra copy per tuple impact performance?  Was the fix in the
mentioned commit for record or composite types?  If so, does
GatherMerge support such types, and if it does, does it provide any
benefit over Gather?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

