subject:"\[HACKERS\] advanced partition matching algorithm for partition\-wise join"

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-16 Thread Etsuro Fujita

On Mon, Apr 6, 2020 at 8:43 PM Ashutosh Bapat
 wrote:
> On Fri, 3 Apr 2020 at 20:45, Etsuro Fujita  wrote:
> But it will be good to have following addition I suggested in my patches to 
> make sure nparts is set to 0 for an unpartitioned relation as per the comment 
> on nparts in RelOptInfo.
> @@ -1653,6 +1663,8 @@ build_joinrel_partition_info(RelOptInfo *joinrel, 
> RelOptInfo *outer_rel,
> jointype, restrictlist))
> {
> Assert(!IS_PARTITIONED_REL(joinrel));
> +   /* Join is not partitioned. */
> +   joinrel->nparts = 0;
> return;
> }

I didn't modified that function as proposed, because I thought that 1)
there would be no need to do so, and that 2) it would be better to set
joinrel->nparts only when we set joinrel->part_schema, for
consistency.

>> >> 3) I think the for nparts comment is somewhat misleading:
>> >>
>> >>int nparts;  /* number of partitions; 0 = not partitioned;
>> >>  * -1 = not yet set */
>> >>
>> >> which says that nparts=0 means not partitioned. But then there are
>> >> conditions like this:
>> >>
>> >>  /* Nothing to do, if the join relation is not partitioned. */
>> >>  if (joinrel->part_scheme == NULL || joinrel->nparts == 0)
>> >>  return;
>> >>
>> >> which (ignoring the obsolete comment) seems to say we can have nparts==0
>> >> even for partitioned tables, no?
>>
>> Yeah, I think I was a bit hasty.  I fixed this.

> For a non-join relation, nparts = 0 and nparts = -1 both have the same 
> meaning. Although we never set nparts = 0 for a non-join relation?

I don't think so.  Consider this:

create table prt (a int, b int) partition by range (a);
create table prt_p1 partition of prt for values from (0) to (250);
create table prt_p2 partition of prt for values from (250) to (500);
drop table prt_p1;
drop table prt_p2;
select count(*) from prt;

For this query, we would have nparts=0 for the partitioned table prt.

Thanks!  Sorry for the delay.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-10 Thread Etsuro Fujita

On Sat, Apr 11, 2020 at 1:00 AM Tom Lane  wrote:
> Ashutosh Bapat  writes:
> > On Fri, Apr 10, 2020 at 9:14 PM Tom Lane  wrote:
> >> I see no patch here ...
>
> > Sorry. Here it is
>
> LGTM, will push in a moment.

Thanks for taking care of this, Tom!  Thanks for the patch, Ashutosh!
Thanks for the report, Kuntal!

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-10 Thread Tom Lane

Ashutosh Bapat  writes:
> On Fri, Apr 10, 2020 at 9:14 PM Tom Lane  wrote:
>> I see no patch here ...

> Sorry. Here it is

LGTM, will push in a moment.

regards, tom lane

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-10 Thread Ashutosh Bapat

On Fri, Apr 10, 2020 at 9:14 PM Tom Lane  wrote:
>
> Ashutosh Bapat  writes:
> > On Fri, 10 Apr 2020 at 20:44, Jeff Janes  wrote:
> >> In that case, we really should add the PG_USED_FOR_ASSERTS_ONLY to make
> >> the compiler happy.
>
> > Attaching my patch again. It doesn't need PG_USED_FOR_ASSERTS_ONLY as well.
> > Kuntal has confirmed that this fixes the warning for him.
>
> I see no patch here ...
>

Sorry. Here it is


-- 
Best Wishes,
Ashutosh Bapat
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 7607501fe7..4681441dcc 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -1021,7 +1021,6 @@ partition_bounds_merge(int partnatts,
 	   List **outer_parts, List **inner_parts)
 {
 	PartitionBoundInfo outer_binfo = outer_rel->boundinfo;
-	PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
 
 	/*
 	 * Currently, this function is called only from try_partitionwise_join(),
@@ -1032,7 +1031,7 @@ partition_bounds_merge(int partnatts,
 		   jointype == JOIN_ANTI);
 
 	/* The partitioning strategies should be the same. */
-	Assert(outer_binfo->strategy == inner_binfo->strategy);
+	Assert(outer_binfo->strategy == inner_rel->boundinfo->strategy);
 
 	*outer_parts = *inner_parts = NIL;
 	switch (outer_binfo->strategy)

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-10 Thread Tom Lane

Ashutosh Bapat  writes:
> On Fri, 10 Apr 2020 at 20:44, Jeff Janes  wrote:
>> In that case, we really should add the PG_USED_FOR_ASSERTS_ONLY to make
>> the compiler happy.

> Attaching my patch again. It doesn't need PG_USED_FOR_ASSERTS_ONLY as well.
> Kuntal has confirmed that this fixes the warning for him.

I see no patch here ...

regards, tom lane

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-10 Thread Ashutosh Bapat

On Fri, 10 Apr 2020 at 20:44, Jeff Janes  wrote:

> On Thu, Apr 9, 2020 at 10:04 AM Ashutosh Bapat <
> ashutosh.bapat@gmail.com> wrote:
>
>> On Thu, Apr 9, 2020 at 12:03 PM Etsuro Fujita 
>> wrote:
>> >
>> > On Thu, Apr 9, 2020 at 2:36 PM Tom Lane  wrote:
>> > > Etsuro Fujita  writes:
>> > > > Yeah, partition_bounds_merge() is currently called only from
>> > > > try_partitionwise_join(), which guarantees that the strategies are
>> the
>> > > > same.
>> >
>> > > If there's only one caller and there's not likely to ever be more,
>> > > then I tend to agree that you don't need the assertion.
>> >
>> > It seems unlikely that partition_bounds_merge() will be called from
>> > more places in the foreseeable future, so I'd still vote for removing
>> > the assertion.
>>
>> When I wrote that function, I had UNION also in mind. A UNION across
>> multiple partitioned relations will be partitioned if we can merge the
>> partition bounds in a sensible manner. Of course the current structure
>> of that function looks more purposed for join, but it's not difficult
>> to convert it to be used for UNION as well. In that case those set of
>> functions will have many more callers. So, I will vote to keep that
>> assertion now that we have it there.
>>
>
> In that case, we really should add the PG_USED_FOR_ASSERTS_ONLY to make
> the compiler happy.
>
>
Attaching my patch again. It doesn't need PG_USED_FOR_ASSERTS_ONLY as well.
Kuntal has confirmed that this fixes the warning for him.

-- 
Best Wishes,
Ashutosh

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-10 Thread Jeff Janes

On Thu, Apr 9, 2020 at 10:04 AM Ashutosh Bapat 
wrote:

> On Thu, Apr 9, 2020 at 12:03 PM Etsuro Fujita 
> wrote:
> >
> > On Thu, Apr 9, 2020 at 2:36 PM Tom Lane  wrote:
> > > Etsuro Fujita  writes:
> > > > Yeah, partition_bounds_merge() is currently called only from
> > > > try_partitionwise_join(), which guarantees that the strategies are
> the
> > > > same.
> >
> > > If there's only one caller and there's not likely to ever be more,
> > > then I tend to agree that you don't need the assertion.
> >
> > It seems unlikely that partition_bounds_merge() will be called from
> > more places in the foreseeable future, so I'd still vote for removing
> > the assertion.
>
> When I wrote that function, I had UNION also in mind. A UNION across
> multiple partitioned relations will be partitioned if we can merge the
> partition bounds in a sensible manner. Of course the current structure
> of that function looks more purposed for join, but it's not difficult
> to convert it to be used for UNION as well. In that case those set of
> functions will have many more callers. So, I will vote to keep that
> assertion now that we have it there.
>

In that case, we really should add the PG_USED_FOR_ASSERTS_ONLY to make the
compiler happy.

Cheers,

Jeff

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-09 Thread Tomas Vondra

On Thu, Apr 09, 2020 at 07:34:01PM +0530, Ashutosh Bapat wrote:

On Thu, Apr 9, 2020 at 12:03 PM Etsuro Fujita  wrote:

On Thu, Apr 9, 2020 at 2:36 PM Tom Lane  wrote:
> Etsuro Fujita  writes:
> > Yeah, partition_bounds_merge() is currently called only from
> > try_partitionwise_join(), which guarantees that the strategies are the
> > same.

> If there's only one caller and there's not likely to ever be more,
> then I tend to agree that you don't need the assertion.

It seems unlikely that partition_bounds_merge() will be called from
more places in the foreseeable future, so I'd still vote for removing
the assertion.

When I wrote that function, I had UNION also in mind. A UNION across
multiple partitioned relations will be partitioned if we can merge the
partition bounds in a sensible manner. Of course the current structure
of that function looks more purposed for join, but it's not difficult
to convert it to be used for UNION as well. In that case those set of
functions will have many more callers. So, I will vote to keep that
assertion now that we have it there.

Yeah. I really don't see why we should remove an assertion that enforces
something useful, especially when it's just a plain comparions. Had it
been some expensive assert, maybe. But how much slower does this make
an assert-enabled build? 0.0001% or something like that?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-09 Thread Ashutosh Bapat

On Thu, Apr 9, 2020 at 12:03 PM Etsuro Fujita  wrote:
>
> On Thu, Apr 9, 2020 at 2:36 PM Tom Lane  wrote:
> > Etsuro Fujita  writes:
> > > Yeah, partition_bounds_merge() is currently called only from
> > > try_partitionwise_join(), which guarantees that the strategies are the
> > > same.
>
> > If there's only one caller and there's not likely to ever be more,
> > then I tend to agree that you don't need the assertion.
>
> It seems unlikely that partition_bounds_merge() will be called from
> more places in the foreseeable future, so I'd still vote for removing
> the assertion.

When I wrote that function, I had UNION also in mind. A UNION across
multiple partitioned relations will be partitioned if we can merge the
partition bounds in a sensible manner. Of course the current structure
of that function looks more purposed for join, but it's not difficult
to convert it to be used for UNION as well. In that case those set of
functions will have many more callers. So, I will vote to keep that
assertion now that we have it there.
-- 
Best Wishes,
Ashutosh Bapat

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-09 Thread Etsuro Fujita

On Thu, Apr 9, 2020 at 2:36 PM Tom Lane  wrote:
> Etsuro Fujita  writes:
> > Yeah, partition_bounds_merge() is currently called only from
> > try_partitionwise_join(), which guarantees that the strategies are the
> > same.

> If there's only one caller and there's not likely to ever be more,
> then I tend to agree that you don't need the assertion.

It seems unlikely that partition_bounds_merge() will be called from
more places in the foreseeable future, so I'd still vote for removing
the assertion.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Tom Lane

Etsuro Fujita  writes:
> Yeah, partition_bounds_merge() is currently called only from
> try_partitionwise_join(), which guarantees that the strategies are the
> same.  The assertion cost would be cheap, but not zero, so I still
> think it would be better to remove the assertion from
> partition_bounds_merge().

FWIW, our general policy is that assertion costs should be ignored
in any performance considerations.  If you're concerned about
performance you should be running a non-assert build, so it doesn't
matter.  (And certainly, there are lots of assertions in the backend
that cost FAR more than this one.)  The thing to evaluate an assertion
on is how likely it is that it would catch a foreseeable sort of coding
error in some future patch.  Maybe this one carries its weight on that
score or maybe it doesn't, but that's how to think about it.

If there's only one caller and there's not likely to ever be more,
then I tend to agree that you don't need the assertion.

regards, tom lane

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Etsuro Fujita

Hi,

On Thu, Apr 9, 2020 at 12:06 AM Kuntal Ghosh  wrote:
> Both of your patches fix the problem. I don't have much exposure in
> this area to comment on whether we should keep/remove the assertion
> from the code. But, here is my opinion:
>
> The code structure looks like following:
> Assert(condition A);
> if (Condition B)
> merge_*_bounds();
>
> Inside merge_*_bounds(), you have both the above assert and the if
> condition as another assert:
> Assert(condition A and Condition B);
>
> And, merge_*_bounds() are called from only one place. So, something is
> redundant here and I'm inclined towards removal of the assert
> condition. Another thing I noticed:
>
> /* The partitioning strategies should be the same. */
> Assert(outer_binfo->strategy == inner_binfo->strategy);
>
> The comment just reads the assertion aloud which looks unnecessary.
>

Yeah, partition_bounds_merge() is currently called only from
try_partitionwise_join(), which guarantees that the strategies are the
same.  The assertion cost would be cheap, but not zero, so I still
think it would be better to remove the assertion from
partition_bounds_merge().

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Kuntal Ghosh

Hello Ashutosh, Fujita,

On Wed, Apr 8, 2020 at 3:49 PM Ashutosh Bapat
 wrote:
> On Wed, 8 Apr 2020 at 15:42, Etsuro Fujita  wrote:
>> On Wed, Apr 8, 2020 at 4:30 PM Kuntal Ghosh  
>> wrote:
>> > I'm getting the following warning during compilation.
>> >
>> > partbounds.c: In function ‘partition_bounds_merge’:
>> > partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ 
>> > [-Wunused-variable]
>> >   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>> >  ^
>> > For fixing the same, we can declare inner_binfo as
>> > PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.
>>
>> I'd propose to remove an assertion causing this (and  the
>> outer_binfo/inner_binfo variables) from partition_bounds_merge(),
>> rather than doing so, because the assertion is redundant, as we have
>> the same assertion in merge_list_bounds() and merge_range_bounds().
>> Please find attached a patch.
>
>
> I think it's better to have the assertion in all the three places and also in 
> merge_hash_bounds() whenever that comes along. The assertion in 
> merge_*_bounds() will be good to in case those functions are called from 
> places other than partition_bounds_merge(). The assertion in 
> partition_bounds_merge() will make sure that when the individual 
> merge_*_bounds() functions are called based on one of the bounds both of the 
> bounds have same strategy.

Both of your patches fix the problem. I don't have much exposure in
this area to comment on whether we should keep/remove the assertion
from the code. But, here is my opinion:

The code structure looks like following:
Assert(condition A);
if (Condition B)
merge_*_bounds();

Inside merge_*_bounds(), you have both the above assert and the if
condition as another assert:
Assert(condition A and Condition B);

And, merge_*_bounds() are called from only one place. So, something is
redundant here and I'm inclined towards removal of the assert
condition. Another thing I noticed:

/* The partitioning strategies should be the same. */
Assert(outer_binfo->strategy == inner_binfo->strategy);

The comment just reads the assertion aloud which looks unnecessary.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Ashutosh Bapat

On Wed, 8 Apr 2020 at 15:42, Etsuro Fujita  wrote:

> Hi Kuntal,
>
> On Wed, Apr 8, 2020 at 4:30 PM Kuntal Ghosh 
> wrote:
> > I'm getting the following warning during compilation.
> >
> > partbounds.c: In function ‘partition_bounds_merge’:
> > partbounds.c:1024:21: warning: unused variable ‘inner_binfo’
> [-Wunused-variable]
> >   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
> >  ^
> > For fixing the same, we can declare inner_binfo as
> > PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.
>
> I'd propose to remove an assertion causing this (and  the
> outer_binfo/inner_binfo variables) from partition_bounds_merge(),
> rather than doing so, because the assertion is redundant, as we have
> the same assertion in merge_list_bounds() and merge_range_bounds().
> Please find attached a patch.
>

Oh, I didn't see this mail before sending my other mail.

I think it's better to have the assertion in all the three places and also
in merge_hash_bounds() whenever that comes along. The assertion in
merge_*_bounds() will be good to in case those functions are called from
places other than partition_bounds_merge(). The assertion in
partition_bounds_merge() will make sure that when the individual
merge_*_bounds() functions are called based on one of the bounds both of
the bounds have same strategy.
-- 
Best Wishes,
Ashutosh

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Ashutosh Bapat

Thanks Kuntal for the report. Let me know if this patch works for you.

On Wed, 8 Apr 2020 at 13:00, Kuntal Ghosh 
wrote:

> Hi,
>
> On Wed, Apr 8, 2020 at 7:07 AM Etsuro Fujita 
> wrote:
> >
> > Pushed after modifying some comments further, based on the suggestions
> > of Ashutosh.
> I'm getting the following warning during compilation.
>
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1024:21: warning: unused variable ‘inner_binfo’
> [-Wunused-variable]
>   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>  ^
> For fixing the same, we can declare inner_binfo as
> PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.
>
> --
> Thanks & Regards,
> Kuntal Ghosh
> EnterpriseDB: http://www.enterprisedb.com
>


-- 
Best Wishes,
Ashutosh
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 7607501fe7..4681441dcc 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -1021,7 +1021,6 @@ partition_bounds_merge(int partnatts,
 	   List **outer_parts, List **inner_parts)
 {
 	PartitionBoundInfo outer_binfo = outer_rel->boundinfo;
-	PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
 
 	/*
 	 * Currently, this function is called only from try_partitionwise_join(),
@@ -1032,7 +1031,7 @@ partition_bounds_merge(int partnatts,
 		   jointype == JOIN_ANTI);
 
 	/* The partitioning strategies should be the same. */
-	Assert(outer_binfo->strategy == inner_binfo->strategy);
+	Assert(outer_binfo->strategy == inner_rel->boundinfo->strategy);
 
 	*outer_parts = *inner_parts = NIL;
 	switch (outer_binfo->strategy)

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Etsuro Fujita

Hi Kuntal,

On Wed, Apr 8, 2020 at 4:30 PM Kuntal Ghosh  wrote:
> I'm getting the following warning during compilation.
>
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ 
> [-Wunused-variable]
>   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>  ^
> For fixing the same, we can declare inner_binfo as
> PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.

I'd propose to remove an assertion causing this (and  the
outer_binfo/inner_binfo variables) from partition_bounds_merge(),
rather than doing so, because the assertion is redundant, as we have
the same assertion in merge_list_bounds() and merge_range_bounds().
Please find attached a patch.

Best regards,
Etsuro Fujita

fix-compiler-warning.patch
Description: Binary data

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Kuntal Ghosh

Hi,

On Wed, Apr 8, 2020 at 7:07 AM Etsuro Fujita  wrote:
>
> Pushed after modifying some comments further, based on the suggestions
> of Ashutosh.
I'm getting the following warning during compilation.

partbounds.c: In function ‘partition_bounds_merge’:
partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ [-Wunused-variable]
  PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
 ^
For fixing the same, we can declare inner_binfo as
PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-07 Thread Ashutosh Bapat

Thanks a lot Fujita-san. Thanks Dmitry, Rajkumar, Amul, Mark, Robert,
Antonin, Amit, Justin,Thomas and Tomas for all your help and review.

On Wed, 8 Apr 2020 at 07:07, Etsuro Fujita  wrote:

> On Wed, Apr 8, 2020 at 2:24 AM Etsuro Fujita 
> wrote:
> > On Wed, Apr 8, 2020 at 12:15 AM Tomas Vondra
> >  wrote:
> > > On Mon, Apr 06, 2020 at 05:28:52PM +0900, Etsuro Fujita wrote:
> > > >On Sat, Apr 4, 2020 at 12:15 AM Etsuro Fujita <
> etsuro.fuj...@gmail.com> wrote:
> > > >> Attached is the original patch (0001) and one patch (0002) with
> > > >> changes including those by Tomas and Ashutosh.
> > > >
> > > >I merged the patches into one and rebased it against HEAD.  Attached
> > > >is a new version, in which I added the commit message as well.  Does
> > > >that make sense?  If there are no objections, I’ll commit the patch.
> >
> > > +1
> >
> > Great!  It's midnight in Japan now, so I'll push the patch early morning.
>
> Pushed after modifying some comments further, based on the suggestions
> of Ashutosh.
>
> Best regards,
> Etsuro Fujita
>


-- 
Best Wishes,
Ashutosh

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-07 Thread Etsuro Fujita

On Wed, Apr 8, 2020 at 2:24 AM Etsuro Fujita  wrote:
> On Wed, Apr 8, 2020 at 12:15 AM Tomas Vondra
>  wrote:
> > On Mon, Apr 06, 2020 at 05:28:52PM +0900, Etsuro Fujita wrote:
> > >On Sat, Apr 4, 2020 at 12:15 AM Etsuro Fujita  
> > >wrote:
> > >> Attached is the original patch (0001) and one patch (0002) with
> > >> changes including those by Tomas and Ashutosh.
> > >
> > >I merged the patches into one and rebased it against HEAD.  Attached
> > >is a new version, in which I added the commit message as well.  Does
> > >that make sense?  If there are no objections, I’ll commit the patch.
>
> > +1
>
> Great!  It's midnight in Japan now, so I'll push the patch early morning.

Pushed after modifying some comments further, based on the suggestions
of Ashutosh.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-07 Thread Etsuro Fujita

Hi Tomas,

On Wed, Apr 8, 2020 at 12:15 AM Tomas Vondra
 wrote:
> On Mon, Apr 06, 2020 at 05:28:52PM +0900, Etsuro Fujita wrote:
> >On Sat, Apr 4, 2020 at 12:15 AM Etsuro Fujita  
> >wrote:
> >> Attached is the original patch (0001) and one patch (0002) with
> >> changes including those by Tomas and Ashutosh.
> >
> >I merged the patches into one and rebased it against HEAD.  Attached
> >is a new version, in which I added the commit message as well.  Does
> >that make sense?  If there are no objections, I’ll commit the patch.

> +1

Great!  It's midnight in Japan now, so I'll push the patch early morning.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-07 Thread Tomas Vondra


On Mon, Apr 06, 2020 at 05:28:52PM +0900, Etsuro Fujita wrote:

On Sat, Apr 4, 2020 at 12:15 AM Etsuro Fujita  wrote:

Attached is the original patch (0001) and one patch (0002) with
changes including those by Tomas and Ashutosh.


I merged the patches into one and rebased it against HEAD.  Attached
is a new version, in which I added the commit message as well.  Does
that make sense?  If there are no objections, I’ll commit the patch.



+1


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-06 Thread Ashutosh Bapat

Here are some changes suggested on top of v34 as per my previous mail.
These are mostly comment changes.

On Mon, 6 Apr 2020 at 13:59, Etsuro Fujita  wrote:

> On Sat, Apr 4, 2020 at 12:15 AM Etsuro Fujita 
> wrote:
> > Attached is the original patch (0001) and one patch (0002) with
> > changes including those by Tomas and Ashutosh.
>
> I merged the patches into one and rebased it against HEAD.  Attached
> is a new version, in which I added the commit message as well.  Does
> that make sense?  If there are no objections, I’ll commit the patch.
>

I have no objections.

-- 
Best Wishes,
Ashutosh
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 13633841f3..d285e63b6b 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1106,32 +1106,33 @@ into joins between their partitions is called partitionwise join. We will use
 term "partitioned relation" for either a partitioned table or a join between
 compatibly partitioned tables.
 
-The technique is extended to some cases where the joining tables don't have
-exactly the same partition bounds, by an advanced partition-matching
-algorithm: it checks to see if there is a relationship where each partition of
-one joining table matches/overlaps at most one partition of the other, and
-vice versa; in which case the join between the joining tables can be broken
-down into joins between the matching partitions (ie, the join relation is
-considerd partitioned), so the algorithm produces the pairs of the matching
-partitions, plus the partition bounds for the join relation, to allow
-partitionwise join for the join.  The algorithm is implemented in
-partition_bounds_merge().  For an N-way join relation considered partitioned
-by this extension, not every pair of joining relations can use partitionwise
+Even if the joining relations do not have exactly same partition bounds,
+partitionwise join can be still applied by by using an advanced
+partition-matching algorithm. For both the joining relations, the algorithm
+checks whether for every given partition of given joining relation there
+existsa matching/overlapping partition in the other joining relation. In such a
+case the join between the joining relations can be broken down into joins
+between their matching/overlapping partitions. The join relation can then be
+considered partitioned. The algorithm produces the pairs of the
+matching/overlapping partitions, plus the partition bounds for the join
+relation, to allow partitionwise join for computing join.  The algorithm is
+implemented in partition_bounds_merge().  For an N-way join relation considered
+partitioned this way, not every pair of joining relations can use partitionwise
 join.  For example:
 
 	(A leftjoin B on (Pab)) innerjoin C on (Pac)
 
-where A, B, and C are partitioned tables, and A has an extra partition
-compared to B and C.  When considering partitionwise join for the join {A B},
-the extra partition of A doesn't have a matching partition on the nullable
-side, which is the case that the current implementation of partitionwise join
-can't handle.  So {A B} is not considered partitioned, and thus the pair of
-{A B} and C considered for the 3-way join can't use partitionwise join.  On
-the other hand, the pair of {A C} and B can use partitionwise join, because
-{A C} is considered partitioned, eliminating the extra partition (see identity
-1 on outer join reordering).  The partitionwise joinability of the N-way join
-relation is determined based on the first pair of joining relations that are
-both partitioned and can use partitionwise join.
+where A, B, and C are partitioned tables. A has an extra partition compared to
+B and C.  When considering partitionwise join for the join {A B}, the extra
+partition of A doesn't have a matching partition on the nullable side, which is
+the case that the current implementation of partitionwise join can't handle.
+So {A B} is not considered partitioned and the pair of {A B} and C considered
+for the 3-way join can not use partitionwise join.  On the other hand, the pair
+of {A C} and B can use partitionwise join, because {A C} is considered
+partitioned by eliminating the extra partition (see identity 1 on outer join
+reordering).  Whether an N-way join can use partitionwise join is determined
+based on the first pair of joining relations that are both partitioned and can
+use partitionwise join.
 
 The partitioning properties of a partitioned relation are stored in its
 RelOptInfo.  The information about data types of partition keys are stored in
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index e4c74d6c03..8ff798fd17 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -70,8 +70,8 @@ typedef struct PartitionRangeBound
 } PartitionRangeBound;
 
 /*
- * Mapping from partitions of a partitioned relation to partitions of a join
- * relation supposed to be partitioned

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-06 Thread Ashutosh Bapat

On Fri, 3 Apr 2020 at 20:45, Etsuro Fujita  wrote:

> Hi,
>
> On Thu, Apr 2, 2020 at 2:12 AM Ashutosh Bapat
>  wrote:
> > On Thu, 26 Mar 2020 at 00:35, Tomas Vondra 
> wrote:
> >> I've started reviewing the patch a couple days ago. I haven't done any
> >> extensive testing, but I do have a bunch of initial comments that I can
> >> share now.
> >>
> >> 1) I wonder if this needs to update src/backend/optimizer/README, which
> >> does have a section about partitionwise joins. It seems formulated in a
> >> way that that probably covers even this more advanced algorithm, but
> >> maybe it should mention handling of default partitions etc.?
>
> > Done. Please check the wording. It might need some word smithy.
>
> You heavily changed the existing documentation about PWJ, but I don't
> think we really need to do so.  Also, IMO I think the description
> about default partitions you added is needed in README.  I think it
> would be better to put such a description in source files.  How about
> something like the attached, instead?  I wrote part of this based on
> the commit message in the original versions of the patch you posted.
>

I corrected some grammar, typos. Broke longer sentences into smaller ones
so that its easy to read and understand. As is the concept is hard to
understand with all its limitations. Thanks for the example. Retained it.

You seem to have removed few comments that explained the algorithm in
detail from build_joinrel_partition_info(). It would have been good to have
those there. But I am ok not having them either.

But it will be good to have following addition I suggested in my patches to
make sure nparts is set to 0 for an unpartitioned relation as per the
comment on nparts in RelOptInfo.
@@ -1653,6 +1663,8 @@ build_joinrel_partition_info(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
jointype, restrictlist))
{
Assert(!IS_PARTITIONED_REL(joinrel));
+   /* Join is not partitioned. */
+   joinrel->nparts = 0;
return;
}


> >> There certainly needs to be some description of the algorithm somewhere,
> >> either in a README or before a suitable function. It doesn't have to be
> >> particularly detailed, a rough outline of the matching would be enough,
> >> so that readers don't have to rebuild the knowledge from pieces
> >> scattered around various comments.
>
> > The algorithm for list and range partitioned tables is slightly
> different. So, I have added separate prologue to each list_merge_bounds()
> and range_merge_bounds(). Please check if that serves the purpose.
>
> Too detailed to me.  In this:
>
> + * If there are multiple partitions from one side matching a given
> partition on
> + * the other side, the algorithm bails out since we do not have machinary
> for
> + * joining one partition with mulitple partitions. It might happen that
> any of
> + * the list items of a partition from the outer relation do not appear in
> the
> + * inner relation and there is no default partition in the inner
> relation. Such
> + * a partition from the outer side will have no matching partition on the
> inner
> + * side. The algorithm will bail out in such a case since we do not have a
> + * mechanism to perform a join with a non-existing relation.
>
> I don't think the last comment is correct; that would apply to the old
> versions of this function IIRC, but not to the latest version.  How
> about something much simpler like the attached, instead?
>

I know that algorithm pretty well by now, so it suffices for me to say we
use something similar to merge join, but may be for someone without that
background a detailed explanation is useful. But this looks fine at the
moment.


> > Done. Actually this wasn't updated when partition pruning was
> introduced, which could cause a partitionwise join to be not used even when
> those conditions were met. Similarly when a query involved whole row
> reference. It's hard to explain all the conditions under which
> partitionwise join technique will be used. But I have tried to keep it easy
> to understand.
>
> IMO I think your words "there is exactly one pair of matching
> partitions." is a bit misleading, because that sounds like that PWJ
> doesn't allow multiply-segmented join.  How about s/exact
> matching/one-to-one matching/ in the existing documentation, instead?


Good catch. That was really misleading. Looks good to me.


> >> 3) I think the for nparts comment is somewhat misleading:
> >>
> >>int nparts;  /* number of partitions; 0 = not partitioned;
> >>  * -1 = not yet set */
> >>
> >> which says that nparts=0 means not partitioned. But then there are
> >> conditions like this:
> >>
> >>  /* Nothing to do, if the join relation is not partitioned. */
> >>  if (joinrel->part_scheme == NULL || joinrel->nparts == 0)
> >>  return;
> >>
> >> which (ignoring the obsolete comment) seems to say we can have nparts==0
> >> even for partitioned tables, no?
>
> Yeah, I

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-01 Thread Etsuro Fujita

Hi Tomas and Ashutosh,

On Thu, Apr 2, 2020 at 1:51 AM Ashutosh Bapat
 wrote:
> On Thu, 26 Mar 2020 at 05:47, Tomas Vondra  
> wrote:

>> three more comments after eye-balling the code for a bit longer.
>>
>> 1) The patch probably needs to tweak config.sgml which says this about
>> the enable_partitionwise_join GUC:
>>
>>.. Partitionwise join currently applies only when the join conditions
>>include all the partition keys, which must be of the same data type
>>and have exactly matching sets of child partitions. ..
>
>
> Done. Actually this wasn't updated when partition pruning was introduced, 
> which could cause a partitionwise join to be not used even when those 
> conditions were met. Similarly when a query involved whole row reference. 
> It's hard to explain all the conditions under which partitionwise join 
> technique will be used. But I have tried to keep it easy to understand.
>
>>
>>
>> Which is probably incorrect, because the point of this patch is not to
>> require exact match of the partitions, right?
>>
>> 2) Do we really need the 'merged' flag in try_partitionwise_join? Can't
>> we simply use the joinrel->merged flag directly? ISTM the we always
>> start with joinrel->merged=false, and then only ever set it to true in
>> some cases. I've tried doing that, per the attached 0002 patch. The
>> regression tests seem to work fine.
>
>
> Thanks. I just added a small prologue to compute_partition_bounds().
>
>>
>>
>> I noticed this because I've tried moving part of the function into a
>> separate function, and removing the variable makes that simpler.
>>
>> The patch also does a couple additional minor tweaks.
>>
>> 3) I think the for nparts comment is somewhat misleading:
>>
>>int nparts;  /* number of partitions; 0 = not partitioned;
>>  * -1 = not yet set */
>>
>> which says that nparts=0 means not partitioned. But then there are
>> conditions like this:
>>
>>  /* Nothing to do, if the join relation is not partitioned. */
>>  if (joinrel->part_scheme == NULL || joinrel->nparts == 0)
>>  return;
>>
>> which (ignoring the obsolete comment) seems to say we can have nparts==0
>> even for partitioned tables, no?
>
>
> See my previous mail.
>
>>
>>
>> Anyway, attached is the original patch (0001) and two patches with
>> proposed changes. 0002 removes the "merged" flag as explained in (2),
>> 0003 splits the try_partitionwise_join() function into two parts.
>>
>> I'm saying these changes have to happen and it's a bit crude (and it
>> might be a bit of a bikeshedding).
>
>
> I have added 0005 with the changes I described in this as well as the 
> previous mail. 0004 is just some white space fixes.

Thanks for the comments, Tomas!  Thanks for the patch, Ashutosh!  I
will look at the patch.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-01 Thread Ashutosh Bapat

On Thu, 26 Mar 2020 at 00:35, Tomas Vondra 
wrote:

> Hi,
>
> I've started reviewing the patch a couple days ago. I haven't done any
> extensive testing, but I do have a bunch of initial comments that I can
> share now.
>
> 1) I wonder if this needs to update src/backend/optimizer/README, which
> does have a section about partitionwise joins. It seems formulated in a
> way that that probably covers even this more advanced algorithm, but
> maybe it should mention handling of default partitions etc.?
>

Done. Please check the wording. It might need some word smithy.


>
> There certainly needs to be some description of the algorithm somewhere,
> either in a README or before a suitable function. It doesn't have to be
> particularly detailed, a rough outline of the matching would be enough,
> so that readers don't have to rebuild the knowledge from pieces
> scattered around various comments.
>

The algorithm for list and range partitioned tables is slightly different.
So, I have added separate prologue to each list_merge_bounds() and
range_merge_bounds(). Please check if that serves the purpose.


>
> 2) Do we need another GUC enabling this more complex algorithm? In PG11
> the partitionwise join is disabled by default, on the grounds that it's
> expensive and not worth it by default. How much more expensive is this?
> Maybe it makes sense to allow enabling only the "simple" approach?
>

We have reduced the complexity of merging bounds quite a bit so this
shouldn't be costly. Further more we handle the usual case of equal bounds
quickly using the merge flag so most of the cases should be fine. It's only
when two partitioned tables with same partition scheme are joined but do
not have merge-able bounds that this algorithm would not yield useful
result - but that would be rare in the field. enable_partitionwise_join =
false should take care of such scenarios easily. I am not in favour of
adding another GUC which we set to false by default and then take another
few releases to make it true by default.


>
> 3) This comment in try_partitionwise_join is now incorrect, because the
> condition may be true even for partitioned tables with (nparts == 0).
>
>  /* Nothing to do, if the join relation is not partitioned. */
>  if (joinrel->part_scheme == NULL || joinrel->nparts == 0)
>  return;
>
>
If part_scheme = NULL, npart should be 0, fixed that in
build_joinrel_partition_info(). If partscheme != NULL but bounds can not be
merged, nparts = 0. So this condition is correct. Encapsulated in a macro
IS_JOINREL_NOT_PARTITITIONED(). and added comments for the macro. Given
that the macro is used exactly at one place, it may not be necessary to
define it but it looks *nice*.


> Moreover, the condition used to be
>
>  if (!IS_PARTITIONED_REL(joinrel))
>  return;
>
> which is way more readable. I think it's net negative to replace these
> "nice" macros with clear meaning with complex conditions. If needed, we
> can invent new macros. There are many other places where the patch
> replaces macros with less readable conditions.
>
>
The only other place where we have replaced a *nice* macro is in
build_joinrel_partition_info(). But I think it's a legit replacement. I
have added a comment there.


>
> 4) I'm a bit puzzled how we could get here with non-partitioned rels?
>
>  /*
>   * We can not perform partitionwise join if either of the joining
> relations
>   * is not partitioned.
>   */
>  if (!IS_PARTITIONED_REL(rel1) || !IS_PARTITIONED_REL(rel2))
>  return;
>

See the comment I have added in build_joinrel_partition_info(). Not all
joining pairs for a given relation are partitioned.


>
> 5) I find the "merged" flag in RelOptInfo rather unclear, because it
> does not clearly indicate what was merged. Maybe something like
> partbounds_merged would be better?
>

Done.


>
> 6) The try_partitionwise_join function is getting a bit too long and
> harder to understand. The whole block in
>
>  if (joinrel->nparts == -1)
>  {
>  ...
>  }
>
> seems rather well isolated, so I propose to move it to a separate
> function responsible only for the merging. We can simply call it on the
> joinrel, and make it return right away if (joinrel->nparts == -1).
>

Looks like you have already taken care of this one in one of your patches.


>
> 7) I'd suggest not to reference exact functions in comments unless
> abolutely necessary, because it's harder to maintain and it does not
> really explain purpose of the struct/code. E.g. consider this:
>
>  /* Per-partitioned-relation data for
> merge_list_bounds()/merge_range_bounds() */
>  typedef struct PartitionMap
>  { ... }
>
> Why does it matter where is the struct used? That's pretty trivial to
> find using 'git grep' or something. Instead the comment should explain
> the purpose of the struct.
>

Adjusted names and comments a bit.

-- 
Best Wishes,
Ashutosh

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-03-25 Thread Tomas Vondra


Hi,

I've started reviewing the patch a couple days ago. I haven't done any
extensive testing, but I do have a bunch of initial comments that I can
share now.

1) I wonder if this needs to update src/backend/optimizer/README, which
does have a section about partitionwise joins. It seems formulated in a
way that that probably covers even this more advanced algorithm, but
maybe it should mention handling of default partitions etc.?

There certainly needs to be some description of the algorithm somewhere,
either in a README or before a suitable function. It doesn't have to be
particularly detailed, a rough outline of the matching would be enough,
so that readers don't have to rebuild the knowledge from pieces
scattered around various comments.

2) Do we need another GUC enabling this more complex algorithm? In PG11
the partitionwise join is disabled by default, on the grounds that it's
expensive and not worth it by default. How much more expensive is this?
Maybe it makes sense to allow enabling only the "simple" approach?

3) This comment in try_partitionwise_join is now incorrect, because the
condition may be true even for partitioned tables with (nparts == 0).

/* Nothing to do, if the join relation is not partitioned. */
if (joinrel->part_scheme == NULL || joinrel->nparts == 0)
return;

Moreover, the condition used to be 


if (!IS_PARTITIONED_REL(joinrel))
return;

which is way more readable. I think it's net negative to replace these
"nice" macros with clear meaning with complex conditions. If needed, we
can invent new macros. There are many other places where the patch
replaces macros with less readable conditions.


4) I'm a bit puzzled how we could get here with non-partitioned rels?

/*
 * We can not perform partitionwise join if either of the joining relations
 * is not partitioned.
 */
if (!IS_PARTITIONED_REL(rel1) || !IS_PARTITIONED_REL(rel2))
return;

5) I find the "merged" flag in RelOptInfo rather unclear, because it
does not clearly indicate what was merged. Maybe something like
partbounds_merged would be better?

6) The try_partitionwise_join function is getting a bit too long and
harder to understand. The whole block in

if (joinrel->nparts == -1)
{
...
}

seems rather well isolated, so I propose to move it to a separate
function responsible only for the merging. We can simply call it on the
joinrel, and make it return right away if (joinrel->nparts == -1).

7) I'd suggest not to reference exact functions in comments unless
abolutely necessary, because it's harder to maintain and it does not
really explain purpose of the struct/code. E.g. consider this:

/* Per-partitioned-relation data for 
merge_list_bounds()/merge_range_bounds() */
typedef struct PartitionMap
{ ... }

Why does it matter where is the struct used? That's pretty trivial to
find using 'git grep' or something. Instead the comment should explain
the purpose of the struct.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-03-23 Thread Ashutosh Bapat

On Tue, Mar 17, 2020 at 1:44 PM Etsuro Fujita  wrote:
>
> > + /*
> > + * If this segment of the join is empty, it means that this segment
> >
> > "partition of the join" looks consistent with other usages than "segment of 
> > the
> > join".
>
> Actually, "segment" is used in the existing comments in the caller
> function try_partitionwise_join(), so I think "segment" is better here
> for consistency.

A segment can be any part of the join relation, not necessarily a
partition. May be we should change the caller.

>
> > + /*
> > + * Get a relids set of partition(s) involved in this join segment that
> > + * are from the rel1 side.
> > + */
> > + child_relids1 = bms_intersect(child_joinrel->relids,
> > +  rel1->all_partrels);
> >
> > The partition bounds are sorted by their values. Even for a list partitioned
> > table, the bounds are sorted by the least partition value. We do not allow
> > multiple paritions from one side to be joined with one partition on the 
> > other
> > and vice versa. All this together means that the partitions of the join
> > relation are formed by joining partitions from either side in the order of
> > their bounds. This means that the matching pairs of partitions can be found 
> > by
> > matching relids of partitions of join with those of the joining relation by
> > traversing partitions from all the three relations only once in the order 
> > they
> > appears in partition bounds of corresponding relations.
>
> Consider this 2-way join for list partitioned tables:
>
> CREATE TABLE plt1_ad (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt1_ad_p1 PARTITION OF plt1_ad FOR VALUES IN ('0001', '0003');
> CREATE TABLE plt1_ad_p2 PARTITION OF plt1_ad FOR VALUES IN ('0002');
> INSERT INTO plt1_ad SELECT i, i, to_char(i % 10, 'FM') FROM
> generate_series(1, 100) i WHERE i % 10 in (1, 2, 3);
> ANALYZE plt1_ad;
> CREATE TABLE plt2_ad (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt2_ad_p1 PARTITION OF plt2_ad FOR VALUES IN ('0002', '0004');
> CREATE TABLE plt2_ad_p2 PARTITION OF plt2_ad FOR VALUES IN ('0003');
> INSERT INTO plt2_ad SELECT i, i, to_char(i % 10, 'FM') FROM
> generate_series(1, 100) i WHERE i % 10 IN (2, 3, 4);
> ANALYZE plt2_ad;
>
> EXPLAIN (COSTS OFF)
> SELECT t1.c, t1.a, t2.a FROM plt1_ad t1 INNER JOIN plt2_ad t2 ON (t1.c
> = t2.c) WHERE t1.a < 10  ORDER BY t1.c, t1.a, t2.a;
>  QUERY PLAN
> -
>  Sort
>Sort Key: t1.c, t1.a, t2.a
>->  Append
>  ->  Hash Join
>Hash Cond: (t2_1.c = t1_2.c)
>->  Seq Scan on plt2_ad_p1 t2_1
>->  Hash
>  ->  Seq Scan on plt1_ad_p2 t1_2
>Filter: (a < 10)
>  ->  Hash Join
>Hash Cond: (t2_2.c = t1_1.c)
>->  Seq Scan on plt2_ad_p2 t2_2
>->  Hash
>  ->  Seq Scan on plt1_ad_p1 t1_1
>Filter: (a < 10)
> (15 rows)
>
> As you can see from the EXPLAIN result, the first partition on the
> outer side matches the second partition on the inner side, and the
> second partition on the outer side matches the first partition on the
> inner side.  How does the algorithm you proposed work e.g., when an
> N-way join for list partitioned tables contains this join as its lower
> join?

Hmm, this is a good example. I tried to work out the algorithm based
on the bound ordering. The algorithm worked well when all the bounds
on both the sides were included in the join. But it didn't work well,
when some bounds vanished. In order to detect whether a bound has
vanished, we need to either compare that bound with the bounds of join
(an operation costlier than comparing bitmapset) or compare relids of
all the partitions of the join. Either way it looks costlier than what
you have right now. May be we could improve by keeping track of such
lost bounds and corresponding partitions. But I didn't get time to
work on that part. Anyway, even if such an algorithm exists, we will
have to change just a single function. That could be done later I
think. So we are good here right now. Thanks.


> > + if (rel1_is_simple)
> >
> > This variable is used only in one place. So instead we should the expression
> > assigning the value to it. Changed in the attached patch.
>
> I don't think that's a good idea, because this check is done
> repeatedly in a for loop.

Compiler's optimizer would anyway optimize it away. But anyway, I
won't insist on this.

>
> > - rel->nparts = 0;
> > + rel->nparts = -1;
> >
> > I think we need to add comments around various values that nparts can take. 
> > How
> > about like something attached.
>
> +1
>
> > + case PARTITION_STRATEGY_HASH:
> > + merged_bounds = NULL;
> >
> > I think, we need to explain why we aren't merging hash partition bounds. 
> > AFAIU,
> > the reason is thus: When the modulo of used for partition mapping (i.e. 
> > maximum
> >

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-03-17 Thread Etsuro Fujita

Hi Ashutosh,

On Wed, Mar 4, 2020 at 1:48 AM Ashutosh Bapat
 wrote:
> I reviewed the patch. Except for the logic of matching the pairs of 
> partitions from already merged partitions, I think the code changes are good. 
> But there are several places where it needs further changes to comments. The 
> attached patch has those. I have described some of them below.

Thanks for reviewing!

> + * We can not perform partition-wise join if either of the joining
> + * relations is not partitioned.
>
> We are consistently using partitionwise instead of partition-wise.

Will fix.

> + /*
> + * See if the partition bounds for inputs are exactly the same, in
> + * which case we don't need to work hard: partitions with the same
> + * partition indexes will form join pairs, and the join rel will have
> + * the same partition bounds as inputs; otherwise try to merge the
> + * partition bounds along with generating join pairs.
>
> Phrase "joining relations" is better than "inputs", IMO. Updated in the
> attached patch.

"inputs" is used in many places in the planner performing join
planning, so I'm not sure "joining relations" is better than "inputs".

> + /*
> + * If the partition bounds for the join rel are not merged ones,
> + * inputs are guaranteed to have the same partition bounds, so
> + * partitions with the same partition indexes will form join pairs;
> + * else let get_matching_part_pairs() do the work.
> + */
> + if (joinrel->merged)
> + {
>
> This condition in the comment is opposite to the condition being checked in
> code, so looks confusing. BTW this comment is also repeated around line 1405.
> See attached patch for correction.

OK, I'll revise the comments as proposed.

> + /*
> + * If this segment of the join is empty, it means that this segment
>
> "partition of the join" looks consistent with other usages than "segment of 
> the
> join".

Actually, "segment" is used in the existing comments in the caller
function try_partitionwise_join(), so I think "segment" is better here
for consistency.

> + /*
> + * Get a relids set of partition(s) involved in this join segment that
> + * are from the rel1 side.
> + */
> + child_relids1 = bms_intersect(child_joinrel->relids,
> +  rel1->all_partrels);
>
> The partition bounds are sorted by their values. Even for a list partitioned
> table, the bounds are sorted by the least partition value. We do not allow
> multiple paritions from one side to be joined with one partition on the other
> and vice versa. All this together means that the partitions of the join
> relation are formed by joining partitions from either side in the order of
> their bounds. This means that the matching pairs of partitions can be found by
> matching relids of partitions of join with those of the joining relation by
> traversing partitions from all the three relations only once in the order they
> appears in partition bounds of corresponding relations.

Consider this 2-way join for list partitioned tables:

CREATE TABLE plt1_ad (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt1_ad_p1 PARTITION OF plt1_ad FOR VALUES IN ('0001', '0003');
CREATE TABLE plt1_ad_p2 PARTITION OF plt1_ad FOR VALUES IN ('0002');
INSERT INTO plt1_ad SELECT i, i, to_char(i % 10, 'FM') FROM
generate_series(1, 100) i WHERE i % 10 in (1, 2, 3);
ANALYZE plt1_ad;
CREATE TABLE plt2_ad (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt2_ad_p1 PARTITION OF plt2_ad FOR VALUES IN ('0002', '0004');
CREATE TABLE plt2_ad_p2 PARTITION OF plt2_ad FOR VALUES IN ('0003');
INSERT INTO plt2_ad SELECT i, i, to_char(i % 10, 'FM') FROM
generate_series(1, 100) i WHERE i % 10 IN (2, 3, 4);
ANALYZE plt2_ad;

EXPLAIN (COSTS OFF)
SELECT t1.c, t1.a, t2.a FROM plt1_ad t1 INNER JOIN plt2_ad t2 ON (t1.c
= t2.c) WHERE t1.a < 10  ORDER BY t1.c, t1.a, t2.a;
 QUERY PLAN
-
 Sort
   Sort Key: t1.c, t1.a, t2.a
   ->  Append
 ->  Hash Join
   Hash Cond: (t2_1.c = t1_2.c)
   ->  Seq Scan on plt2_ad_p1 t2_1
   ->  Hash
 ->  Seq Scan on plt1_ad_p2 t1_2
   Filter: (a < 10)
 ->  Hash Join
   Hash Cond: (t2_2.c = t1_1.c)
   ->  Seq Scan on plt2_ad_p2 t2_2
   ->  Hash
 ->  Seq Scan on plt1_ad_p1 t1_1
   Filter: (a < 10)
(15 rows)

As you can see from the EXPLAIN result, the first partition on the
outer side matches the second partition on the inner side, and the
second partition on the outer side matches the first partition on the
inner side.  How does the algorithm you proposed work e.g., when an
N-way join for list partitioned tables contains this join as its lower
join?

> If we use this
> algorithm, we don't need all_partrels to be collected and also don't need to
> search base or join relation. That, I think, will reduce the time and space
> complexity of this logic. Am I missing

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-03-03 Thread Ashutosh Bapat

Hi Fujita-san,
I reviewed the patch. Except for the logic of matching the pairs of
partitions from already merged partitions, I think the code changes are
good. But there are several places where it needs further changes to
comments. The attached patch has those. I have described some of them below.
+ * We can not perform partition-wise join if either of the joining
+ * relations is not partitioned.

We are consistently using partitionwise instead of partition-wise.

+ /*
+ * See if the partition bounds for inputs are exactly the same, in
+ * which case we don't need to work hard: partitions with the same
+ * partition indexes will form join pairs, and the join rel will have
+ * the same partition bounds as inputs; otherwise try to merge the
+ * partition bounds along with generating join pairs.

Phrase "joining relations" is better than "inputs", IMO. Updated in the
attached patch.

+ /*
+ * If the partition bounds for the join rel are not merged ones,
+ * inputs are guaranteed to have the same partition bounds, so
+ * partitions with the same partition indexes will form join pairs;
+ * else let get_matching_part_pairs() do the work.
+ */
+ if (joinrel->merged)
+ {

This condition in the comment is opposite to the condition being checked in
code, so looks confusing. BTW this comment is also repeated around line
1405.
See attached patch for correction.

+ /*
+ * If this segment of the join is empty, it means that this segment

"partition of the join" looks consistent with other usages than "segment of
the
join". Modified in the attached patch.

+ /*
+ * Get a relids set of partition(s) involved in this join segment that
+ * are from the rel1 side.
+ */
+ child_relids1 = bms_intersect(child_joinrel->relids,
+  rel1->all_partrels);

The partition bounds are sorted by their values. Even for a list partitioned
table, the bounds are sorted by the least partition value. We do not allow
multiple paritions from one side to be joined with one partition on the
other
and vice versa. All this together means that the partitions of the join
relation are formed by joining partitions from either side in the order of
their bounds. This means that the matching pairs of partitions can be found
by
matching relids of partitions of join with those of the joining relation by
traversing partitions from all the three relations only once in the order
they
appears in partition bounds of corresponding relations. If we use this
algorithm, we don't need all_partrels to be collected and also don't need to
search base or join relation. That, I think, will reduce the time and space
complexity of this logic. Am I missing something? As a side effect it won't
require any special handling for base and join relation.

+ /*
+ * Get a child rel for rel1 with the relids.  Note that we should have
+ * the child rel even if rel1 is a join rel, because in that case the
+ * partitions specified in the relids would have matching/overlapping
+ * boundaries, so those partitions should be considered as ones to be
+ * joined even when planning partitionwise joins of rel1, meaning that
+ * the child rel would have been built by the time we get here.
+ */
+ if (rel1_is_simple)

This variable is used only in one place. So instead we should the expression
assigning the value to it. Changed in the attached patch.

- rel->nparts = 0;
+ rel->nparts = -1;

I think we need to add comments around various values that nparts can take.
How
about like something attached.

+ case PARTITION_STRATEGY_HASH:
+ merged_bounds = NULL;

I think, we need to explain why we aren't merging hash partition bounds.
AFAIU,
the reason is thus: When the modulo of used for partition mapping (i.e.
maximum
number of has partitions) is same, their partition bounds are same and do
not
need merging. If the maximum number of partitions is different for both the
joining relations, there's high probability that one partition on one side
will
join with multiple partitions on the other side. So exact partition bounds
match will work in most of the cases. The cases otherwise are not so common
to
spend the effort in coding and planning.

I have added this explanation in the patch. Don't know if it's there written
somewhere already.

+ if (part_index == -1)
+ return -1;
+ } while (is_dummy_partition(rel, part_index));

I understand why we are skipping NULL positions. I am not sure why are we
are
skipping over RelOptInfos which exist but are marked as dummy; we can still
create a join pair with those partitions.

+/*
+ * get_merged_range_bounds
+ * Given the bounds of range partitions to be join, determine the range

s/join/joined/

There are more changes to comments, where I thought that the comments are
required or existing comments need more clarification. Please review the
attached patch. This patch is created on top of
v32-0001-Improve-partition-matching-for-partitionwise-join.

On Mon, Feb 10, 2020 at 5:14 PM Etsuro Fujita 
wrote:

> On Fri, Feb 7, 2020 at 9:57 PM Etsuro Fujita 
> wrote:
> > On

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-02-07 Thread Etsuro Fujita

On Thu, Feb 6, 2020 at 3:55 AM Mark Dilger  wrote:
> The patches apply and pass all tests.  A review of the patch vs. master looks 
> reasonable.

Thanks for the review!

> The partition_join.sql test has multiple levels of partitioning, but when 
> your patch extends that test with “advanced partition-wise join”, none of the 
> tables for the new section have multiple levels.  I spent a little while 
> reviewing the code and inventing multiple level partitioning tests for 
> advanced partition-wise join and did not encounter any problems.  I don’t 
> care whether you use this particular example, but do you want to have 
> multiple level partitioning in the new test section?

Yes, I do.

> CREATE TABLE alpha (a double precision, b double precision) PARTITION BY 
> RANGE (a);
> CREATE TABLE alpha_neg PARTITION OF alpha FOR VALUES FROM ('-Infinity') TO 
> (0) PARTITION BY RANGE (b);
> CREATE TABLE alpha_pos PARTITION OF alpha FOR VALUES FROM (0) TO ('Infinity') 
> PARTITION BY RANGE (b);
> CREATE TABLE alpha_nan PARTITION OF alpha FOR VALUES FROM ('Infinity') TO 
> ('NaN');
> CREATE TABLE alpha_neg_neg PARTITION OF alpha_neg FOR VALUES FROM 
> ('-Infinity') TO (0);
> CREATE TABLE alpha_neg_pos PARTITION OF alpha_neg FOR VALUES FROM (0) TO 
> ('Infinity');
> CREATE TABLE alpha_neg_nan PARTITION OF alpha_neg FOR VALUES FROM 
> ('Infinity') TO ('NaN');
> CREATE TABLE alpha_pos_neg PARTITION OF alpha_pos FOR VALUES FROM 
> ('-Infinity') TO (0);
> CREATE TABLE alpha_pos_pos PARTITION OF alpha_pos FOR VALUES FROM (0) TO 
> ('Infinity');
> CREATE TABLE alpha_pos_nan PARTITION OF alpha_pos FOR VALUES FROM 
> ('Infinity') TO ('NaN');
> INSERT INTO alpha (a, b)
> (SELECT * FROM
> (VALUES (-1.0::float8), (0.0::float8), (1.0::float8), 
> ('Infinity'::float8)) a,
> (VALUES (-1.0::float8), (0.0::float8), (1.0::float8), 
> ('Infinity'::float8)) b
> );
> ANALYZE alpha;
> ANALYZE alpha_neg;
> ANALYZE alpha_pos;
> ANALYZE alpha_nan;
> ANALYZE alpha_neg_neg;
> ANALYZE alpha_neg_pos;
> ANALYZE alpha_neg_nan;
> ANALYZE alpha_pos_neg;
> ANALYZE alpha_pos_pos;
> ANALYZE alpha_pos_nan;
> CREATE TABLE beta (a double precision, b double precision) PARTITION BY RANGE 
> (a, b);
> CREATE TABLE beta_lo PARTITION OF beta FOR VALUES FROM (-5, -5) TO (0, 0);
> CREATE TABLE beta_me PARTITION OF beta FOR VALUES FROM (0, 0) TO (0, 5);
> CREATE TABLE beta_hi PARTITION OF beta FOR VALUES FROM (0, 5) TO (5, 5);
> INSERT INTO beta (a, b)
> (SELECT * FROM
> (VALUES (-1.0::float8), (0.0::float8), (1.0::float8)) a,
> (VALUES (-1.0::float8), (0.0::float8), (1.0::float8)) b
> );
> ANALYZE beta;
> ANALYZE beta_lo;
> ANALYZE beta_me;
> ANALYZE beta_hi;
> EXPLAIN SELECT * FROM alpha INNER JOIN beta ON (alpha.a = beta.a AND alpha.b 
> = beta.b) WHERE alpha.a = 1 AND beta.b = 1;
>   QUERY PLAN
> ---
>  Nested Loop  (cost=0.00..2.11 rows=1 width=32)
>->  Seq Scan on alpha_pos_pos alpha  (cost=0.00..1.06 rows=1 width=16)
>  Filter: ((b = '1'::double precision) AND (a = '1'::double precision))
>->  Seq Scan on beta_hi beta  (cost=0.00..1.04 rows=1 width=16)
>  Filter: ((b = '1'::double precision) AND (a = '1'::double precision))
> (5 rows)

Hmm, I'm not sure this is a good test case for that, because this
result would be due to partition pruning applied to each side of the
join before considering partition-wise join; you could get the same
result even with enable_partitionwise_join=off.  I think it's
important that the partition-wise join logic doesn't break this query,
though.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-02-07 Thread Etsuro Fujita

On Thu, Feb 6, 2020 at 2:03 AM Mark Dilger  wrote:
> > On Feb 5, 2020, at 4:51 AM, Etsuro Fujita  wrote:

> >> On Tue, Jan 28, 2020 at 1:39 PM Mark Dilger
> >>  wrote:
> >>> I have added tests checking correctness and showing some partition 
> >>> pruning limitations.  Find three patches, attached.
> >>>
> >>> The v31-0001-… patch merely applies your patches as a starting point for 
> >>> the next two patches.  It is your work, not mine.
> >>>
> >>> The v31-0002-… patch adds more regression tests for range partitioning.  
> >>> The commit message contains my comments about that.
> >>>
> >>> The v31-0003-… patch adds more regression tests for list partitioning, 
> >>> and again, the commit message contains my comments about that.

> > I tested the new test patches.  The patches are applied to the
> > partition_join.sql regression test cleanly, but the new tests failed
> > in my environment (even with make check LANG=C).  I think I should set
> > the right locale for the testing.  Which one did you use?
>
> I did not set a locale in the tests.  My environment has:
>
> LANG="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_CTYPE="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_ALL=

Thanks for the information!

> >  Maybe I'm
> > missing something, but let me comment on the new tests.  This is the
> > one proposed in the v31-0002 patch:
> >
> > +EXPLAIN (COSTS OFF)
> > +SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a)
> > WHERE t1.a IN ('äbç', 'ὀδυσσεύς');
> > +QUERY PLAN
> > +--
> > + Hash Join
> > +   Hash Cond: (t2.a = t1.a)
> > +   ->  Append
> > + ->  Seq Scan on beta_a t2_1
> > + ->  Seq Scan on beta_b t2_2
> > + ->  Seq Scan on beta_c t2_3
> > + ->  Seq Scan on beta_d t2_4
> > + ->  Seq Scan on beta_e t2_5
> > + ->  Seq Scan on beta_f t2_6
> > + ->  Seq Scan on beta_g t2_7
> > + ->  Seq Scan on beta_h t2_8
> > + ->  Seq Scan on beta_default t2_9
> > +   ->  Hash
> > + ->  Append
> > +   ->  Seq Scan on alpha_e t1_1
> > + Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[]))
> > +   ->  Seq Scan on alpha_default t1_2
> > + Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[]))
> > +(18 rows)
> >
> > The commit message says:
> >
> >When joining with
> >
> >alpha.a = beta.a and alpha.a IN ('äbç', 'ὀδυσσεύς')
> >
> >the planner does the right thing for one side of the query, but
> >hits all partitions for the other side, which it doesn't need to
> >do.
> >
> > Yeah, I agree that the resulting plan is not efficient.  The reason
> > for that would be that the planner doesn't generate a qual on the
> > outer side matching the ScalarArrayOpExpr qual "a = ANY
> > ('{äbç,ὀδυσσεύς}'::text[])" on the inner side, which I think would be
> > a restriction caused by the equivalence machinery not by the
> > partitionwise join logic IIUC.
>
> It’s fine if this is beyond the scope of the patch.
>
> >  I think the critique would be useful,
> > so I don't object to adding this test case, but the critique would be
> > more about query planning that is actually not related to the
> > partitionwise join logic, so I'm not sure that the partition_join.sql
> > regression test is the best place to add it.  I guess that there would
> > be much better places than partition_join.sql.
>
> You don’t need to add the test anywhere.  It’s enough for me that you looked 
> at it and considered whether the partition-wise join patch should do anything 
> differently in this case.  Again, it sounds like this is beyond the scope of 
> the patch.

OK

> > (This is nitpicking;
> > but another thing I noticed about this test case is that the join
> > query contains only a single join condition "t1.a = t2.a", but the
> > queried tables alpha and beta are range-partitioned by multiple
> > columns a and b, so the query should have a join condition for each of
> > the columns like "t1.a = t2.a AND t1.b = t2.b" if adding this as a
> > test case for partitionwise join.)
>
> Well, it is important that partition-wise join does not break such queries.  
> I added the column ‘b’ to the partitioning logic to verify that did not 
> confuse the code in your patch.

OK, thanks for the testing!

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-02-05 Thread Mark Dilger




> On Feb 5, 2020, at 4:51 AM, Etsuro Fujita  wrote:
> 
> 


The patches apply and pass all tests.  A review of the patch vs. master looks 
reasonable.

The partition_join.sql test has multiple levels of partitioning, but when your 
patch extends that test with “advanced partition-wise join”, none of the tables 
for the new section have multiple levels.  I spent a little while reviewing the 
code and inventing multiple level partitioning tests for advanced 
partition-wise join and did not encounter any problems.  I don’t care whether 
you use this particular example, but do you want to have multiple level 
partitioning in the new test section?

CREATE TABLE alpha (a double precision, b double precision) PARTITION BY RANGE 
(a);
CREATE TABLE alpha_neg PARTITION OF alpha FOR VALUES FROM ('-Infinity') TO (0) 
PARTITION BY RANGE (b);
CREATE TABLE alpha_pos PARTITION OF alpha FOR VALUES FROM (0) TO ('Infinity') 
PARTITION BY RANGE (b);
CREATE TABLE alpha_nan PARTITION OF alpha FOR VALUES FROM ('Infinity') TO 
('NaN');
CREATE TABLE alpha_neg_neg PARTITION OF alpha_neg FOR VALUES FROM ('-Infinity') 
TO (0);
CREATE TABLE alpha_neg_pos PARTITION OF alpha_neg FOR VALUES FROM (0) TO 
('Infinity');
CREATE TABLE alpha_neg_nan PARTITION OF alpha_neg FOR VALUES FROM ('Infinity') 
TO ('NaN');
CREATE TABLE alpha_pos_neg PARTITION OF alpha_pos FOR VALUES FROM ('-Infinity') 
TO (0);
CREATE TABLE alpha_pos_pos PARTITION OF alpha_pos FOR VALUES FROM (0) TO 
('Infinity');
CREATE TABLE alpha_pos_nan PARTITION OF alpha_pos FOR VALUES FROM ('Infinity') 
TO ('NaN');
INSERT INTO alpha (a, b)
(SELECT * FROM
(VALUES (-1.0::float8), (0.0::float8), (1.0::float8), 
('Infinity'::float8)) a,
(VALUES (-1.0::float8), (0.0::float8), (1.0::float8), 
('Infinity'::float8)) b
);
ANALYZE alpha;
ANALYZE alpha_neg;
ANALYZE alpha_pos;
ANALYZE alpha_nan;
ANALYZE alpha_neg_neg;
ANALYZE alpha_neg_pos;
ANALYZE alpha_neg_nan;
ANALYZE alpha_pos_neg;
ANALYZE alpha_pos_pos;
ANALYZE alpha_pos_nan;
CREATE TABLE beta (a double precision, b double precision) PARTITION BY RANGE 
(a, b);
CREATE TABLE beta_lo PARTITION OF beta FOR VALUES FROM (-5, -5) TO (0, 0);
CREATE TABLE beta_me PARTITION OF beta FOR VALUES FROM (0, 0) TO (0, 5);
CREATE TABLE beta_hi PARTITION OF beta FOR VALUES FROM (0, 5) TO (5, 5);
INSERT INTO beta (a, b)
(SELECT * FROM
(VALUES (-1.0::float8), (0.0::float8), (1.0::float8)) a,
(VALUES (-1.0::float8), (0.0::float8), (1.0::float8)) b
);
ANALYZE beta;
ANALYZE beta_lo;
ANALYZE beta_me;
ANALYZE beta_hi;
EXPLAIN SELECT * FROM alpha INNER JOIN beta ON (alpha.a = beta.a AND alpha.b = 
beta.b) WHERE alpha.a = 1 AND beta.b = 1;
  QUERY PLAN
---
 Nested Loop  (cost=0.00..2.11 rows=1 width=32)
   ->  Seq Scan on alpha_pos_pos alpha  (cost=0.00..1.06 rows=1 width=16)
 Filter: ((b = '1'::double precision) AND (a = '1'::double precision))
   ->  Seq Scan on beta_hi beta  (cost=0.00..1.04 rows=1 width=16)
 Filter: ((b = '1'::double precision) AND (a = '1'::double precision))
(5 rows)




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-02-05 Thread Mark Dilger




> On Feb 5, 2020, at 4:51 AM, Etsuro Fujita  wrote:
> 
> On Wed, Jan 29, 2020 at 9:15 PM Etsuro Fujita  wrote:
>> On Tue, Jan 28, 2020 at 1:39 PM Mark Dilger
>>  wrote:
>>> I have added tests checking correctness and showing some partition pruning 
>>> limitations.  Find three patches, attached.
>>> 
>>> The v31-0001-… patch merely applies your patches as a starting point for 
>>> the next two patches.  It is your work, not mine.
>>> 
>>> The v31-0002-… patch adds more regression tests for range partitioning.  
>>> The commit message contains my comments about that.
>>> 
>>> The v31-0003-… patch adds more regression tests for list partitioning, and 
>>> again, the commit message contains my comments about that.
>> 
>> I'll dig into it more closely.
> 
> I tested the new test patches.  The patches are applied to the
> partition_join.sql regression test cleanly, but the new tests failed
> in my environment (even with make check LANG=C).  I think I should set
> the right locale for the testing.  Which one did you use?

I did not set a locale in the tests.  My environment has:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=


>  Maybe I'm
> missing something, but let me comment on the new tests.  This is the
> one proposed in the v31-0002 patch:
> 
> +EXPLAIN (COSTS OFF)
> +SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a)
> WHERE t1.a IN ('äbç', 'ὀδυσσεύς');
> +QUERY PLAN
> +--
> + Hash Join
> +   Hash Cond: (t2.a = t1.a)
> +   ->  Append
> + ->  Seq Scan on beta_a t2_1
> + ->  Seq Scan on beta_b t2_2
> + ->  Seq Scan on beta_c t2_3
> + ->  Seq Scan on beta_d t2_4
> + ->  Seq Scan on beta_e t2_5
> + ->  Seq Scan on beta_f t2_6
> + ->  Seq Scan on beta_g t2_7
> + ->  Seq Scan on beta_h t2_8
> + ->  Seq Scan on beta_default t2_9
> +   ->  Hash
> + ->  Append
> +   ->  Seq Scan on alpha_e t1_1
> + Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[]))
> +   ->  Seq Scan on alpha_default t1_2
> + Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[]))
> +(18 rows)
> 
> The commit message says:
> 
>When joining with
> 
>alpha.a = beta.a and alpha.a IN ('äbç', 'ὀδυσσεύς')
> 
>the planner does the right thing for one side of the query, but
>hits all partitions for the other side, which it doesn't need to
>do.
> 
> Yeah, I agree that the resulting plan is not efficient.  The reason
> for that would be that the planner doesn't generate a qual on the
> outer side matching the ScalarArrayOpExpr qual "a = ANY
> ('{äbç,ὀδυσσεύς}'::text[])" on the inner side, which I think would be
> a restriction caused by the equivalence machinery not by the
> partitionwise join logic IIUC.

It’s fine if this is beyond the scope of the patch.

>  I think the critique would be useful,
> so I don't object to adding this test case, but the critique would be
> more about query planning that is actually not related to the
> partitionwise join logic, so I'm not sure that the partition_join.sql
> regression test is the best place to add it.  I guess that there would
> be much better places than partition_join.sql.

You don’t need to add the test anywhere.  It’s enough for me that you looked at 
it and considered whether the partition-wise join patch should do anything 
differently in this case.  Again, it sounds like this is beyond the scope of 
the patch.

> (This is nitpicking;
> but another thing I noticed about this test case is that the join
> query contains only a single join condition "t1.a = t2.a", but the
> queried tables alpha and beta are range-partitioned by multiple
> columns a and b, so the query should have a join condition for each of
> the columns like "t1.a = t2.a AND t1.b = t2.b" if adding this as a
> test case for partitionwise join.)

Well, it is important that partition-wise join does not break such queries.  I 
added the column ‘b’ to the partitioning logic to verify that did not confuse 
the code in your patch.

> I feel almost the same to other
> test cases in the patch (and the v31-0003 patch), except this one
> proposed in the v31-0003 patch:
> 
> +CREATE TABLE alpha (a TEXT) PARTITION BY LIST(a);
> +CREATE TABLE alpha_a PARTITION OF alpha FOR VALUES IN ('Turkiye', 'TURKIYE');
> +CREATE TABLE alpha_b PARTITION OF alpha FOR VALUES IN ('b?t', 'BIT');
> +CREATE TABLE alpha_c PARTITION OF alpha FOR VALUES IN ('abc', 'ABC');
> +CREATE TABLE alpha_d PARTITION OF alpha FOR VALUES IN ('aaa', 'cote', 
> 'Gotz');
> +CREATE TABLE alpha_e PARTITION OF alpha FOR VALUES IN ('?δυσσε??', 
> '?ΔΥΣΣΕ?Σ');
> +CREATE TABLE alpha_f PARTITION OF alpha FOR VALUES IN ('を読み取り用',
> 'にオープンできませんでした', NULL);
> +CREATE TABLE alpha_default PARTITION OF alpha DEFAULT;

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-01-29 Thread Etsuro Fujita

Hi Mark,

On Tue, Jan 28, 2020 at 1:39 PM Mark Dilger
 wrote:
> There is stray whitespace in v30-0002:
>
> src/backend/partitioning/partbounds.c:4557: space before tab in indent.
> +   outer_null_unmerged = true;

Good catch!

> I have added tests checking correctness and showing some partition pruning 
> limitations.  Find three patches, attached.
>
> The v31-0001-… patch merely applies your patches as a starting point for the 
> next two patches.  It is your work, not mine.
>
> The v31-0002-… patch adds more regression tests for range partitioning.  The 
> commit message contains my comments about that.
>
> The v31-0003-… patch adds more regression tests for list partitioning, and 
> again, the commit message contains my comments about that.

The PWJ behavior shown by the tests you added looks interesting!  I'll
dig into it more closely.  Thanks for the patches and review!

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-12-10 Thread Etsuro Fujita

Hi Amul,

On Tue, Dec 10, 2019 at 3:49 PM amul sul  wrote:
> On Mon, Dec 9, 2019 at 3:08 PM amul sul  wrote:
>> Attached is the rebase version atop the latest master head(2d0fdfaccec).

Thanks for that!

> I have been through your changes proposed in [1] -- the changes make sense to 
> me &
> I didn't see any unobvious behaviour in testing as well.

Thanks for reviewing!  I'll merge the changes into the main patch,
then.  I don't see any issues in the latest version, but I think we
need to polish the patch, so I'll do that.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-12-09 Thread amul sul

On Mon, Dec 9, 2019 at 3:08 PM amul sul  wrote:

> Attached is the rebase version atop the latest master head(2d0fdfaccec).
>
>
Hi Fujita san,

I have been through your changes proposed in [1] -- the changes make sense
to me &
I didn't see any unobvious behaviour in testing as well.

Regards,
Amul

1]
https://postgr.es/m/CAPmGK15kXh76EnRn=B64u=+qlxzokwrod4umjmbnwcszpdq...@mail.gmail.com

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-11-28 Thread Etsuro Fujita

On Fri, Nov 22, 2019 at 10:08 PM Etsuro Fujita  wrote:
> Done.  I modified compare_range_partitions() as well to match its the
> variable names with others.  Attached is a new version of the patch.
>
> I reviewed the rest of the partbounds.c changes.  Here are my review comments:
>
> * I don't think this analysis is correct:
>
> +/*
> + * merge_null_partitions
> + *
> + * Merge NULL partitions, i.e. a partition that can hold NULL values for a 
> lis\
> t
> + * partitioned table, if any. Find the index of merged partition to which the
> + * NULL values would belong in the join result. If one joining relation has a
> + * NULL partition but not the other, try matching it with the default 
> partitio\
> n
> + * from the other relation since the default partition may have rows with 
> NULL
> + * partition key. We can eliminate a NULL partition when it appears only on 
> th\
> e
> + * inner side of the join and the outer side doesn't have a default 
> partition.
> + *
> + * When the equality operator used for join is strict, two NULL values will 
> no\
> t
> + * be considered as equal, and thus a NULL partition can be eliminated for an
> + * inner join. But we don't check the strictness operator here.
> + */
>
> First of all, I think we can assume here that the equality operator is
> *strict*, because 1) have_partkey_equi_join(), which is executed
> before calling merge_null_partitions(), requires that the
> corresponding clause is mergejoinable, and 2) currently, we assume
> that mergejoinable operators are strict (see MJEvalOuterValues() and
> MJEvalInnerValues()).  So I don't think it's needed to try merging a
> NULL partition on one side with the default partition on the other
> side as above.  (merge_null_partitions() tries merging NULL partitions
> as well, but in some cases I don't think we need to do that, either.)
> So I rewrote merge_null_partitions() completely.  Another change is
> that I eliminated the NULL partition of the joinrel in more cases.
> Attached is a patch (v28-0002-modify-merge_null_partitions.patch) for
> that created on top of the main patch.  I might be missing something,
> though.
>
> Other changes in that patch:
>
> * I fixed memory leaks in partition_list_bounds_merge() and
> partition_range_bounds_merge().
>
> * I noticed this in merge_default_partitions():
>
> +   Assert(outer_has_default && inner_has_default);
> +
> +   *default_index = map_and_merge_partitions(outer_map,
> + inner_map,
> + outer_default,
> + inner_default,
> + next_index);
> +   if (*default_index == -1)
> +   return false;
>
> I think the merging should be done successfully, because of 1) this in
> process_outer_partition():
>
> +   if (inner_has_default)
> +   {
> +   Assert(inner_default >= 0);
> +
> +   /*
> +* If the outer side has the default partition as well, we need to
> +* merge the default partitions (see merge_default_partitions()); give
> +* up on it.
> +*/
> +   if (outer_has_default)
> +   return false;
>
> and 2) this in process_inner_partition():
>
> +   if (outer_has_default)
> +   {
> +   Assert(outer_default >= 0);
> +
> +   /*
> +* If the inner side has the default partition as well, we need to
> +* merge the default partitions (see merge_default_partitions()); give
> +* up on it.
> +*/
> +   if (inner_has_default)
> +   return false;
>
> So I removed the above if test (ie, *default_index == -1).  I also
> modified that function a bit further, including comments.
>
> * I simplified process_outer_partition() and process_inner_partition()
> a bit, changing the APIs to match that of map_and_merge_partitions().
> Also, I think this in these functions is a bit ugly;
>
> +   /* Don't add this index to the list of merged indexes. */
> +   *merged_index = -1;
>
> so I removed it, and modified the condition on whether or not we add
> merged bounds to the lists in partition_list_bounds_merge() and
> partition_range_bounds_merge(), instead.

Moved to the next CF.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-11-15 Thread Etsuro Fujita

Hi Amul,

On Fri, Nov 15, 2019 at 2:21 PM amul sul  wrote:
> Thank you Fujita san for the patch & the enhancements.  I am fine with your
> delta patch.

OK, I'll merge the delta patch with the main one in the next version,
if no objections from others.

> I would like to share some thought for other code:
>
> File: partbounds.c:
> 3298 static void
> 3299 get_merged_range_bounds(int partnatts, FmgrInfo *partsupfuncs,
> 3300 Oid *partcollations, JoinType jointype,
> 3301 PartitionRangeBound *left_lb,
> 3302 PartitionRangeBound *left_ub,
> 3303 PartitionRangeBound *right_lb,
> 3304 PartitionRangeBound *right_ub,
> 3305 PartitionRangeBound *merged_lb,
> 3306 PartitionRangeBound *merged_ub)
>
> Can we rename these argument as inner_*  & outer_*  like we having for the
> functions, like 0003 patch?

+1  (Actually, I too was thinking about that.)

> File: partbounds.c:
> 3322
> 3323 case JOIN_INNER:
> 3324 case JOIN_SEMI:
> 3325 if (compare_range_bounds(partnatts, partsupfuncs, 
> partcollations,
> 3326  left_ub, right_ub) < 0)
> 3327 *merged_ub = *left_ub;
> 3328 else
> 3329 *merged_ub = *right_ub;
> 3330
> 3331 if (compare_range_bounds(partnatts, partsupfuncs, 
> partcollations,
> 3332  left_lb, right_lb) > 0)
>  *merged_lb = *left_lb;
> 3334 else
> 3335 *merged_lb = *right_lb;
> 3336 break;
> 3337
>
> How about reusing ub_cmpval & lb_cmpval  instead of calling
> compare_range_bounds() inside get_merged_range_bounds(), like 0004 patch?

Good idea!  So, +1

> Apart from this, I would like to propose 0005 cleanup patch where I have
> rearranged function arguments & code to make sure the arguments & the code
> related to lower bound should come first and then the upper bound.

+1

I'll merge these changes as well into the main patch, if no objections
of others.

Thanks for the review and patches!

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-10-31 Thread Etsuro Fujita

On Tue, Oct 29, 2019 at 7:29 PM Etsuro Fujita  wrote:
> On Fri, Oct 25, 2019 at 6:59 PM amul sul  wrote:
> > On Wed, Oct 16, 2019 at 6:20 PM Etsuro Fujita  
> > wrote:
> >> So I'd like to propose to introduce separate functions like
> >> process_outer_partition() and process_inner_partition() in the
> >> attached, instead of handle_missing_partition().  (I added a fast path
> >> to these functions that if both outer/inner sides have the default
> >> partitions, give up on merging partitions.  Also, I modified the
> >> caller sites of proposed functions so that they call these if
> >> necessary.)
>
> > Agree -- process_outer_partition() & process_inner_partition() approach 
> > looks
> > much cleaner than before.

> > Note that LHS numbers are the line numbers in your previously posted 
> > patch[1].
> >
> >  455 +   if (inner_has_default ||
> >  456 +   jointype == JOIN_LEFT ||
> >  457 +   jointype == JOIN_ANTI ||
> >  458 +   jointype == JOIN_FULL)
> >  459 +   {
> >  460 +   if (!process_outer_partition(_map,
> >
> >  512 +   if (outer_has_default || jointype == JOIN_FULL)
> >  513 +   {
> >  514 +   if (!process_inner_partition(_map,
> >
> > How about adding these conditions to the else block of 
> > process_outer_partition()
> > & process_inner_partition() function respectively so that these functions 
> > can be
> > called unconditionally?  Thoughts/Comments?
>
> I'm not sure that's a good idea; these functions might be called many
> times, so I just thought it would be better to call these functions
> conditionally, to avoid useless function call overhead.

The overhead might be small, but isn't zero, so I still think that we
should call these functions if necessary.

> >  456 +   jointype == JOIN_LEFT ||
> >  457 +   jointype == JOIN_ANTI ||
> >  458 +   jointype == JOIN_FULL)
> >
> > Also, how about using IS_OUTER_JOIN() instead. But we need an assertion to
> > restrict JOIN_RIGHT or something.
>
> Seems like a good idea.

Done.

> > For the convenience, I did both aforesaid changes in the attached delta 
> > patch that
> > can be applied atop of your previously posted patch[1].  Kindly have a look 
> > & share
> > your thoughts, thanks.
>
> Thanks for the patch!
>
> > 1273 + * *next_index is incremented when creating a new merged partition 
> > associated
> > 1274 + * with the given outer partition.
> > 1275 + */
> >
> > Correction: s/outer/inner
> > ---
> >
> > 1338 +* In range partitioning, if the given outer partition is 
> > already
> > 1339 +* merged (eg, because we found an overlapping range earlier), 
> > we know
> > 1340 +* where it fits in the join result; nothing to do in that 
> > case.  Else
> > 1341 +* create a new merged partition.
> >
> > Correction: s/outer/inner.
> > ---
> >
> > 1712 /*
> > 1713  * If the NULL partition was missing from the inner side of 
> > the join,
> >
> > s/inner side/outer side
> > --
>
> Good catch!  Will fix.

Done.

I also added some assertions to process_outer_partition() and
process_inner_partition(), including ones as proposed in your patch.
Attached is an updated version.  If no objections, I'll merge this
with the main patch [1].

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAPmGK14WHKckT1P6UJV2B63TZAxPyMn8iZJ99XF=azunhg6...@mail.gmail.com


modify-partbounds-changes-2.patch
Description: Binary data

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-10-29 Thread Etsuro Fujita

Hi Amul,

On Fri, Oct 25, 2019 at 6:59 PM amul sul  wrote:
> On Wed, Oct 16, 2019 at 6:20 PM Etsuro Fujita  wrote:
>> So I'd like to propose to introduce separate functions like
>> process_outer_partition() and process_inner_partition() in the
>> attached, instead of handle_missing_partition().  (I added a fast path
>> to these functions that if both outer/inner sides have the default
>> partitions, give up on merging partitions.  Also, I modified the
>> caller sites of proposed functions so that they call these if
>> necessary.)

> Agree -- process_outer_partition() & process_inner_partition() approach looks
> much cleaner than before.
>
> Here are the few comments:

Thanks for the review!

> Note that LHS numbers are the line numbers in your previously posted patch[1].
>
>  455 +   if (inner_has_default ||
>  456 +   jointype == JOIN_LEFT ||
>  457 +   jointype == JOIN_ANTI ||
>  458 +   jointype == JOIN_FULL)
>  459 +   {
>  460 +   if (!process_outer_partition(_map,
>
>  512 +   if (outer_has_default || jointype == JOIN_FULL)
>  513 +   {
>  514 +   if (!process_inner_partition(_map,
>
> How about adding these conditions to the else block of 
> process_outer_partition()
> & process_inner_partition() function respectively so that these functions can 
> be
> called unconditionally?  Thoughts/Comments?

I'm not sure that's a good idea; these functions might be called many
times, so I just thought it would be better to call these functions
conditionally, to avoid useless function call overhead.

>  456 +   jointype == JOIN_LEFT ||
>  457 +   jointype == JOIN_ANTI ||
>  458 +   jointype == JOIN_FULL)
>
> Also, how about using IS_OUTER_JOIN() instead. But we need an assertion to
> restrict JOIN_RIGHT or something.

Seems like a good idea.

> For the convenience, I did both aforesaid changes in the attached delta patch 
> that
> can be applied atop of your previously posted patch[1].  Kindly have a look & 
> share
> your thoughts, thanks.

Thanks for the patch!

> 1273 + * *next_index is incremented when creating a new merged partition 
> associated
> 1274 + * with the given outer partition.
> 1275 + */
>
> Correction: s/outer/inner
> ---
>
> 1338 +* In range partitioning, if the given outer partition is already
> 1339 +* merged (eg, because we found an overlapping range earlier), 
> we know
> 1340 +* where it fits in the join result; nothing to do in that case. 
>  Else
> 1341 +* create a new merged partition.
>
> Correction: s/outer/inner.
> ---
>
> 1712 /*
> 1713  * If the NULL partition was missing from the inner side of the 
> join,
>
> s/inner side/outer side
> --

Good catch!  Will fix.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-10-25 Thread amul sul

On Wed, Oct 16, 2019 at 6:20 PM Etsuro Fujita 
wrote:

> On Wed, Sep 25, 2019 at 12:59 AM Etsuro Fujita 
> wrote:
> > I will continue to review the rest of the patch.
>
> I've been reviewing the rest of the patch.  Here are my review comments:
>
[]

> So I'd like to propose to introduce separate functions like
> process_outer_partition() and process_inner_partition() in the
> attached, instead of handle_missing_partition().  (I added a fast path
> to these functions that if both outer/inner sides have the default
> partitions, give up on merging partitions.  Also, I modified the
> caller sites of proposed functions so that they call these if
> necessary.)
>

Agree -- process_outer_partition() & process_inner_partition() approach
looks
much cleaner than before.

Here are the few comments:

Note that LHS numbers are the line numbers in your previously posted
patch[1].

 455 +   if (inner_has_default ||
 456 +   jointype == JOIN_LEFT ||
 457 +   jointype == JOIN_ANTI ||
 458 +   jointype == JOIN_FULL)
 459 +   {
 460 +   if (!process_outer_partition(_map,

 512 +   if (outer_has_default || jointype == JOIN_FULL)
 513 +   {
 514 +   if (!process_inner_partition(_map,

How about adding these conditions to the else block of
process_outer_partition()
& process_inner_partition() function respectively so that these functions
can be
called unconditionally?  Thoughts/Comments?
---

 456 +   jointype == JOIN_LEFT ||
 457 +   jointype == JOIN_ANTI ||
 458 +   jointype == JOIN_FULL)

Also, how about using IS_OUTER_JOIN() instead. But we need an assertion to
restrict JOIN_RIGHT or something.

For the convenience, I did both aforesaid changes in the attached delta
patch that
can be applied atop of your previously posted patch[1].  Kindly have a look
& share
your thoughts, thanks.
--

1273 + * *next_index is incremented when creating a new merged partition
associated
1274 + * with the given outer partition.
1275 + */

Correction: s/outer/inner
---

1338 +* In range partitioning, if the given outer partition is
already
1339 +* merged (eg, because we found an overlapping range earlier),
we know
1340 +* where it fits in the join result; nothing to do in that
case.  Else
1341 +* create a new merged partition.

Correction: s/outer/inner.
---

1712 /*
1713  * If the NULL partition was missing from the inner side of
the join,

s/inner side/outer side
--

Regards,
Amul

1]
https://postgr.es/m/CAPmGK145V8DNCNQ2gQBgNE3QqoJGjsmK5WMwaA3FMirNM723KQ%40mail.gmail.com
From c7f165b575fd984ca3053ce7162bdd8e4bf56be8 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Thu, 24 Oct 2019 05:38:11 -0400
Subject: [PATCH] delta

Changes :
1.  Call process_outer_partition & process_inner_partition
unconditionally.
2. Used IS_OUTER_JOIN() instead of listing individual join type.
---
 src/backend/partitioning/partbounds.c | 209 --
 1 file changed, 93 insertions(+), 116 deletions(-)

diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 4c3a31c82c0..b29e44e2f28 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3601,24 +3601,18 @@ partition_range_bounds_merge(int partnatts, FmgrInfo *partsupfuncs,
 			}
 			else
 			{
-if (inner_has_default ||
-	jointype == JOIN_LEFT ||
-	jointype == JOIN_ANTI ||
-	jointype == JOIN_FULL)
-{
-	if (!process_outer_partition(_map,
- _map,
- outer_has_default,
- inner_has_default,
- outer_part,
- inner_default,
- jointype,
- outer_bi->strategy,
- _index,
- _index,
- _index))
-		return NULL;
-}
+if (!process_outer_partition(_map,
+			 _map,
+			 outer_has_default,
+			 inner_has_default,
+			 outer_part,
+			 inner_default,
+			 jointype,
+			 outer_bi->strategy,
+			 _index,
+			 _index,
+			 _index))
+	return NULL;
 
 merged_lb = _lb;
 merged_ub = _ub;
@@ -3640,21 +3634,18 @@ partition_range_bounds_merge(int partnatts, FmgrInfo *partsupfuncs,
 			}
 			else
 			{
-if (outer_has_default || jointype == JOIN_FULL)
-{
-	if (!process_inner_partition(_map,
- _map,
- outer_has_default,
- inner_has_default,
- inner_part,
- outer_default,
- jointype,
- outer_bi->strategy,
- _index,
- _index,
- _index))
-		return NULL;
-}
+if (!process_inner_partition(_map,
+			 _map,
+			 outer_has_default,
+			 inner_has_default,
+			 inner_part,
+			 outer_default,
+			 jointype,
+

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-10-16 Thread Etsuro Fujita

On Wed, Sep 25, 2019 at 12:59 AM Etsuro Fujita  wrote:
> I will continue to review the rest of the patch.

I've been reviewing the rest of the patch.  Here are my review comments:

* map_and_merge_partitions() checks whether the two partitions from
the outer and inner sides can be merged in two steps: 1) see if the
partitions are mapped to each other (ie, partmap1->from == index2 &&
partmap2->from == index1), and 2) see if the merged partition indexes
assigned are the same (ie, partmap1->to == partmap2->to) (or satisfy
some other conditions), but the first step seems redundant to me
because I think that if the merged partition indexes are the same,
then the partitions would be guaranteed to be mapped to each other.
Also, I noticed that that function can't handle some list-partitioning
cases properly.  Here is an example:

CREATE TABLE plt1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN ('', '0002');
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN ('0003', '0004');
INSERT INTO plt1 SELECT i, i, to_char(i % 5, 'FM') FROM
generate_series(0, 24) i WHERE i % 5 IN (0, 2, 3, 4);
ANALYZE plt1;
CREATE TABLE plt2 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('0001', '0002');
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN ('0003', '0004');
INSERT INTO plt2 SELECT i, i, to_char(i % 5, 'FM') FROM
generate_series(0, 24) i WHERE i % 5 IN (1, 2, 3, 4);
ANALYZE plt2;
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 FULL JOIN plt2 t2 ON (t1.c
= t2.c) WHERE COALESCE(t1.a, 0) % 5 != 3 AND COALESCE(t1.a, 0) % 5 !=
4 ORDER BY t1.c, t1.a, t2.a;
 QUERY PLAN

---
--
 Sort
   Sort Key: t1.c, t1.a, t2.a
   ->  Hash Full Join
 Hash Cond: (t1.c = t2.c)
 Filter: (((COALESCE(t1.a, 0) % 5) <> 3) AND ((COALESCE(t1.a,
0) % 5) <> 4))
 ->  Append
   ->  Seq Scan on plt1_p1 t1
   ->  Seq Scan on plt1_p2 t1_1
 ->  Hash
   ->  Append
 ->  Seq Scan on plt2_p1 t2
 ->  Seq Scan on plt2_p2 t2_1
(12 rows)

This should use partitionwise join by the new partition-matching
algorithm but doesn't.  The reason for that is because plt1_p1 and
plt2_p1 are mapped to different merged partitions and thus considered
not merge-able by map_and_merge_partitions() as-is.  I might be
missing something, but I don't think this is intentional, so I rewrote
that function completely in the attached, which is a WIP patch created
on top of the patch [1].

* In handle_missing_partition(), I noticed this:

+   else if (missing_side_inner)
+   {
+   /*
+* If this partition has already been mapped (say because we
+* found an overlapping range earlier), we know where does it
+* fit in the join result. Nothing to do in that case. Else
+* create a new merged partition.
+*/
+   PartitionMap *extra_map = _with_extra[extra_part];
+   if (extra_map->to < 0)
+   {
+   extra_map->to = *next_index;
+   *next_index = *next_index + 1;
+   *merged_index = extra_map->to;
+   }
+   }

As commented, that function skips setting *merged_index when the
"extra_part" partition is already mapped to a merged partition.  This
would be correct for range partitioning, but not for list
partitioning, I think.  Here is an example:

CREATE TABLE plt1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN ('', '0001', '0002');
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN ('0003', '0004');
INSERT INTO plt1 SELECT i, i, to_char(i % 5, 'FM') FROM
generate_series(0, 24) i;
ANALYZE plt1;
CREATE TABLE plt2 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('0002');
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN ('0003', '0004');
INSERT INTO plt2 SELECT i, i, to_char(i % 5, 'FM') FROM
generate_series(0, 24) i WHERE i % 5 IN (2, 3, 4);
ANALYZE plt2;
CREATE TABLE plt3 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt3_p1 PARTITION OF plt3 FOR VALUES IN ('0001');
CREATE TABLE plt3_p2 PARTITION OF plt3 FOR VALUES IN ('0003', '0004');
INSERT INTO plt3 SELECT i, i, to_char(i % 5, 'FM') FROM
generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4);
ANALYZE plt3;
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1 t1 LEFT JOIN plt2
t2 ON (t1.c = t2.c)) FULL JOIN plt3 t3 ON (t1.c = t3.c) WHERE
COALESCE(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
t1.c, t1.a, t2.a, t3.a;
 QUERY PLAN

---
--
 Sort
   Sort Key: t1.c, t1.a, t2.a, t3.a
   ->  Hash Full Join
 Hash Cond: (t1.c = t3.c)
 Filter:

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-09-05 Thread Etsuro Fujita

Hi,

On Thu, Sep 5, 2019 at 1:24 PM amul sul  wrote:
> On Wed, Sep 4, 2019 at 2:40 AM Alvaro Herrera  
> wrote:
>> CFbot complains that Fujita-san submitted a patch that doesn't apply,
>> which makes sense since the necessary previous patch was only referred
>> to without being resubmitted.  I suggest to always post all patches
>> together with each resubmission so that it can be checked automatically
>> by the cf bot: http://commitfest.cputube.org/etsuro-fujita.html

> Understood and sorry about that.

Sorry about that.

>> I'm not clear on who the author of this patch is, now.  Also, I'm not
>> sure what the *status* is.  Are we waiting for Fujita-san to review this
>> patch?

> Yes, we are waiting for Fujita-san to review.  Fujita-san has started a review
> and proposed some enhancement which I reviewed in the last update.
>
> I think soon Fujita-san might post the complete patch including his changes.

I'm a bit busy with something else recently, but I'll do that ASAP.
And I'll continue to review the other part of the patch.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-09-04 Thread amul sul

On Wed, Sep 4, 2019 at 2:40 AM Alvaro Herrera 
wrote:

> Fujita-san, amul,
>
> CFbot complains that Fujita-san submitted a patch that doesn't apply,
> which makes sense since the necessary previous patch was only referred
> to without being resubmitted.  I suggest to always post all patches
> together with each resubmission so that it can be checked automatically
> by the cf bot: http://commitfest.cputube.org/etsuro-fujita.html
>
>
Understood and sorry about that.

I'm not clear on who the author of this patch is, now.  Also, I'm not
> sure what the *status* is.  Are we waiting for Fujita-san to review this
> patch?
>

Yes, we are waiting for Fujita-san to review.  Fujita-san has started a
review
and proposed some enhancement which I reviewed in the last update.

I think soon Fujita-san might post the complete patch including his changes.

Regards,
Amul

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-09-03 Thread Alvaro Herrera

Fujita-san, amul,

CFbot complains that Fujita-san submitted a patch that doesn't apply,
which makes sense since the necessary previous patch was only referred
to without being resubmitted.  I suggest to always post all patches
together with each resubmission so that it can be checked automatically
by the cf bot: http://commitfest.cputube.org/etsuro-fujita.html

I'm not clear on who the author of this patch is, now.  Also, I'm not
sure what the *status* is.  Are we waiting for Fujita-san to review this
patch?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-09-01 Thread amul sul

Hi Fujita San,


Please find my comments inline below:

On Wed, Aug 28, 2019 at 3:52 PM Etsuro Fujita 
wrote:

> On Fri, Aug 16, 2019 at 10:25 PM Etsuro Fujita 
> wrote:
> >

[... skipped ..]

>
> About the attached:
>
> * The attached patch modified try_partitionwise_join() so that we call
> partition_bounds_equal() then partition_bounds_merge() (if necessary)
> to create the partition bounds for the join rel.  We don't support for
> merging the partition bounds for the hash-partitioning case, so this
> makes code to handle the hash-partitioning case in
> partition_bounds_merge() completely unnecessary, so I removed that
> entirely.
>

Yes, that make sense.

On thinking further, a thought hits to me why we can't join two hash
partitioned
table which has the same modulus and partition key specification, but some
of
the partitions are missing from either partitioned table.

For e.g. here is a smaller case where foo has two partitions and bar has
only one.

CREATE TABLE foo(a int) PARTITION BY HASH(a);
CREATE TABLE foo_p0 PARTITION OF foo FOR VALUES WITH(modulus 2, remainder
0);
CREATE TABLE foo_p1 PARTITION OF foo FOR VALUES WITH(modulus 2, remainder
1);

CREATE TABLE bar(a int) PARTITION BY HASH(a);  <-- missing partitions
for REMAINDER 1
CREATE TABLE bar_p0 PARTITION OF bar FOR VALUES WITH(modulus 2, remainder
0);

Explain:
postgres=# explain select * from foo p1, bar p2 where p1.a = p2.a;
   QUERY PLAN

-
 Merge Join  (cost=590.35..1578.47 rows=65025 width=8)
   Merge Cond: (p2.a = p1.a)
   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
 Sort Key: p2.a
 ->  Seq Scan on bar_p0 p2  (cost=0.00..35.50 rows=2550 width=4)
   ->  Sort  (cost=410.57..423.32 rows=5100 width=4)
 Sort Key: p1.a
 ->  Append  (cost=0.00..96.50 rows=5100 width=4)
   ->  Seq Scan on foo_p0 p1  (cost=0.00..35.50 rows=2550
width=4)
   ->  Seq Scan on foo_p1 p1_1  (cost=0.00..35.50 rows=2550
width=4)
(10 rows)

The partitions-wise join will be performed only if we fill the partition
hole that
exists for the joining table i.e. adding partitions to bar table.

postgres=# CREATE TABLE bar_p1 PARTITION OF bar FOR VALUES WITH(modulus 2,
remainder 1);
CREATE TABLE
postgres=# explain select * from foo p1, bar p2 where p1.a = p2.a;
   QUERY PLAN

-
 Append  (cost=359.57..2045.11 rows=65024 width=8)
   ->  Merge Join  (cost=359.57..860.00 rows=32512 width=8)
 Merge Cond: (p1.a = p2.a)
 ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
   Sort Key: p1.a
   ->  Seq Scan on foo_p0 p1  (cost=0.00..35.50 rows=2550
width=4)
 ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
   Sort Key: p2.a
   ->  Seq Scan on bar_p0 p2  (cost=0.00..35.50 rows=2550
width=4)
   ->  Merge Join  (cost=359.57..860.00 rows=32512 width=8)
 Merge Cond: (p1_1.a = p2_1.a)
 ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
   Sort Key: p1_1.a
   ->  Seq Scan on foo_p1 p1_1  (cost=0.00..35.50 rows=2550
width=4)
 ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
   Sort Key: p2_1.a
   ->  Seq Scan on bar_p1 p2_1  (cost=0.00..35.50 rows=2550
width=4)
(17 rows)

It would have been nice if we could support this case, as we do allow
partitions
hole for the other partition scheme, but there wouldn't be much objection if
we don't want to add this support for now since there will be a lesser
chance
that hash partitioned table has the hole, IMO.


> * I removed this assertion in partition_bounds_merge(), because I
> think this is covered by two assertions above this.
>
> +   Assert((*outer_parts == NIL || *inner_parts != NIL) &&
> +  (*inner_parts == NIL || *outer_parts != NIL));
>
> * (I forgot to mention this in a previous email, but) I removed this
> bit of generate_matching_part_pairs(), because we already have the
> same check in try_partitionwise_join(), so this bit would be redundant
> IIUC.
>

Looks good.


>
> * I added more comments.
>

Thanks.


> If there are no objections, I'll merge the attached with the base patch in
> [1].
>

The proposed enhancement in the patch is too good and the patch is pretty
much
reasonable to merge into the main patch.

Here are the few cosmetic fixes for this path I think is needed. Feel free
to
ignore if some of them do not make sense or too obvious.

Note: left side number represents code line number of the patch.

118 +   }
119 +
120 +   /*
121 +* Try to create the partition bounds along with join pairs.
122 +*/
123 +   if (boundinfo == NULL)
124 +   {

Can we add this block as else part of previous if condition checking equal
partitions bound?

133 +

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-08-28 Thread Etsuro Fujita

On Fri, Aug 16, 2019 at 10:25 PM Etsuro Fujita  wrote:
> It seems that I performed the above tests on an assertion-enabled
> build.  :(  So I executed the tests one more time.  Here are the
> results.
>
> * 2-way self-join of pt: explain analyze select * from pt t0, pt t1
> where t0.a = t1.a;
>  - HEAD:
>  Planning Time: 0.969 ms
>  Execution Time: 13.843 ms
>  - with patch:
>  Planning Time: 1.720 ms
>  Execution Time: 14.393 ms
>  - with patch plus attached:
>  Planning Time: 1.630 ms
>  Execution Time: 14.002 ms
>
> * 4-way self-join of pt: explain analyze select * from pt t0, pt t1,
> pt t2, pt t3 where t0.a = t1.a
>
> and t1.a = t2.a and t2.a = t3.a;
>  - HEAD:
>  Planning Time: 12.203 ms
>  Execution Time: 31.784 ms
>  - with patch:
>  Planning Time: 32.102 ms
>  Execution Time: 32.504 ms
>  - with patch plus attached:
>  Planning Time: 19.471 ms
>  Execution Time: 32.582 ms
>
> * 8-way self-join of pt: explain analyze select * from pt t0, pt t1,
> pt t2, pt t3, pt t4, pt t5, pt t6, pt t7 where t0.a = t1.a and t1.a =
> t2.a and t2.a = t3.a and t3.a = t4.a and t4.a = t5.a and t5.a = t6.a
> and t6.a = t7.a;
>  - HEAD:
>  Planning Time: 948.131 ms
>  Execution Time: 55.645 ms
>  - with patch:
>  Planning Time: 2939.813 ms
>  Execution Time: 56.760 ms
>  - with patch plus attached:
>  Planning Time: 1108.076 ms
>  Execution Time: 55.750 ms
>
> Note: the attached patch still uses the proposed partition matching
> algorithm for these queries.  As I said before, these queries don't
> need that algorithm, so we could eliminate the planning overhead
> compared to HEAD, by planning these queries as before, perhaps, but I
> haven't modified the patch as such yet.

I modified the patch further as such.  Attached is an updated version
of the patch created on top of the patch in [1].  I did the tests
again using the updated version of the patch.  Here are the results:

* 2-way self-join of pt:
 Planning Time: 1.043 ms
 Execution Time: 13.931 ms

* 4-way self-join of pt:
 Planning Time: 12.499 ms
 Execution Time: 32.392 ms

* 8-way self-join of pt:
 Planning Time: 968.412 ms
 Execution Time: 56.328 ms

The planning time for each test case still increased slightly, but IMO
I think that would be acceptable.  To see the efficiency of the
attached, I did another testing with test cases that really need the
new partition-matching algorithm:

* explain analyze select * from pt6 t6, pt7 t7 where t6.a = t7.a;
 - base patch in [1]
 Planning Time: 1.758 ms
 Execution Time: 13.977 ms
 - with attached
 Planning Time: 1.777 ms
 Execution Time: 13.959 ms

* explain analyze select * from pt4 t4, pt5 t5, pt6 t6, pt7 t7 where
t4.a = t5.a and t5.a = t6.a and t6.a = t7.a;
 - base patch in [1]
 Planning Time: 33.201 ms
 Execution Time: 32.480 ms
 - with attached
 Planning Time: 21.019 ms
 Execution Time: 32.777 ms

* explain analyze select * from pt0 t0, pt1 t1, pt2 t2, pt3 t3, pt4
t4, pt5 t5, pt6 t6, pt7 t7 where t0.a = t1.a and t1.a = t2.a and t2.a
= t3.a and t3.a = t4.a and t4.a = t5.a and t5.a = t6.a and t6.a =
t7.a;
 - base patch in [1]
 Planning Time: 3060.000 ms
 Execution Time: 55.553 ms
 - with attached
 Planning Time: 1144.996 ms
 Execution Time: 56.233 ms

where pt0, pt1, pt2, pt3, pt4, pt5, pt6 and pt7 are list partitioned
tables that have slighly different list values.  (The structure and
list values of ptN are almost the same as that of pt used above, but
ptN's N-th partition ptNpN has an extra list value that pt's N-th
partition ptpN doesn't have.)  If anyone is interested in this
testing, I'll send a script file for producing these list partitioned
tables.

About the attached:

* The attached patch modified try_partitionwise_join() so that we call
partition_bounds_equal() then partition_bounds_merge() (if necessary)
to create the partition bounds for the join rel.  We don't support for
merging the partition bounds for the hash-partitioning case, so this
makes code to handle the hash-partitioning case in
partition_bounds_merge() completely unnecessary, so I removed that
entirely.

* I removed this assertion in partition_bounds_merge(), because I
think this is covered by two assertions above this.

+   Assert((*outer_parts == NIL || *inner_parts != NIL) &&
+  (*inner_parts == NIL || *outer_parts != NIL));

* (I forgot to mention this in a previous email, but) I removed this
bit of generate_matching_part_pairs(), because we already have the
same check in try_partitionwise_join(), so this bit would be redundant
IIUC.

+   switch (jointype)
+   {
+   case JOIN_INNER:
+   case JOIN_SEMI:
+
+   /*
+* An inner or semi join can not return any row when the
+* matching partition on either side is missing. We should
+* have eliminated all such cases while merging the bounds.
+*/
+   Assert(part1 >= 0 && part2 >= 0);
+   break;
+
+   case JOIN_LEFT:
+   case JOIN_ANTI:
+

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-08-28 Thread amul sul

Thank you Fujita San for the enhancement, will have a look.

Regards,
Amul

On Wed, Aug 28, 2019 at 3:52 PM Etsuro Fujita 
wrote:

> On Fri, Aug 16, 2019 at 10:25 PM Etsuro Fujita 
> wrote:
> > It seems that I performed the above tests on an assertion-enabled
> > build.  :(  So I executed the tests one more time.  Here are the
> > results.
> >
> > * 2-way self-join of pt: explain analyze select * from pt t0, pt t1
> > where t0.a = t1.a;
> >  - HEAD:
> >  Planning Time: 0.969 ms
> >  Execution Time: 13.843 ms
> >  - with patch:
> >  Planning Time: 1.720 ms
> >  Execution Time: 14.393 ms
> >  - with patch plus attached:
> >  Planning Time: 1.630 ms
> >  Execution Time: 14.002 ms
> >
> > * 4-way self-join of pt: explain analyze select * from pt t0, pt t1,
> > pt t2, pt t3 where t0.a = t1.a
> >
> > and t1.a = t2.a and t2.a = t3.a;
> >  - HEAD:
> >  Planning Time: 12.203 ms
> >  Execution Time: 31.784 ms
> >  - with patch:
> >  Planning Time: 32.102 ms
> >  Execution Time: 32.504 ms
> >  - with patch plus attached:
> >  Planning Time: 19.471 ms
> >  Execution Time: 32.582 ms
> >
> > * 8-way self-join of pt: explain analyze select * from pt t0, pt t1,
> > pt t2, pt t3, pt t4, pt t5, pt t6, pt t7 where t0.a = t1.a and t1.a =
> > t2.a and t2.a = t3.a and t3.a = t4.a and t4.a = t5.a and t5.a = t6.a
> > and t6.a = t7.a;
> >  - HEAD:
> >  Planning Time: 948.131 ms
> >  Execution Time: 55.645 ms
> >  - with patch:
> >  Planning Time: 2939.813 ms
> >  Execution Time: 56.760 ms
> >  - with patch plus attached:
> >  Planning Time: 1108.076 ms
> >  Execution Time: 55.750 ms
> >
> > Note: the attached patch still uses the proposed partition matching
> > algorithm for these queries.  As I said before, these queries don't
> > need that algorithm, so we could eliminate the planning overhead
> > compared to HEAD, by planning these queries as before, perhaps, but I
> > haven't modified the patch as such yet.
>
> I modified the patch further as such.  Attached is an updated version
> of the patch created on top of the patch in [1].  I did the tests
> again using the updated version of the patch.  Here are the results:
>
> * 2-way self-join of pt:
>  Planning Time: 1.043 ms
>  Execution Time: 13.931 ms
>
> * 4-way self-join of pt:
>  Planning Time: 12.499 ms
>  Execution Time: 32.392 ms
>
> * 8-way self-join of pt:
>  Planning Time: 968.412 ms
>  Execution Time: 56.328 ms
>
> The planning time for each test case still increased slightly, but IMO
> I think that would be acceptable.  To see the efficiency of the
> attached, I did another testing with test cases that really need the
> new partition-matching algorithm:
>
> * explain analyze select * from pt6 t6, pt7 t7 where t6.a = t7.a;
>  - base patch in [1]
>  Planning Time: 1.758 ms
>  Execution Time: 13.977 ms
>  - with attached
>  Planning Time: 1.777 ms
>  Execution Time: 13.959 ms
>
> * explain analyze select * from pt4 t4, pt5 t5, pt6 t6, pt7 t7 where
> t4.a = t5.a and t5.a = t6.a and t6.a = t7.a;
>  - base patch in [1]
>  Planning Time: 33.201 ms
>  Execution Time: 32.480 ms
>  - with attached
>  Planning Time: 21.019 ms
>  Execution Time: 32.777 ms
>
> * explain analyze select * from pt0 t0, pt1 t1, pt2 t2, pt3 t3, pt4
> t4, pt5 t5, pt6 t6, pt7 t7 where t0.a = t1.a and t1.a = t2.a and t2.a
> = t3.a and t3.a = t4.a and t4.a = t5.a and t5.a = t6.a and t6.a =
> t7.a;
>  - base patch in [1]
>  Planning Time: 3060.000 ms
>  Execution Time: 55.553 ms
>  - with attached
>  Planning Time: 1144.996 ms
>  Execution Time: 56.233 ms
>
> where pt0, pt1, pt2, pt3, pt4, pt5, pt6 and pt7 are list partitioned
> tables that have slighly different list values.  (The structure and
> list values of ptN are almost the same as that of pt used above, but
> ptN's N-th partition ptNpN has an extra list value that pt's N-th
> partition ptpN doesn't have.)  If anyone is interested in this
> testing, I'll send a script file for producing these list partitioned
> tables.
>
> About the attached:
>
> * The attached patch modified try_partitionwise_join() so that we call
> partition_bounds_equal() then partition_bounds_merge() (if necessary)
> to create the partition bounds for the join rel.  We don't support for
> merging the partition bounds for the hash-partitioning case, so this
> makes code to handle the hash-partitioning case in
> partition_bounds_merge() completely unnecessary, so I removed that
> entirely.
>
> * I removed this assertion in partition_bounds_merge(), because I
> think this is covered by two assertions above this.
>
> +   Assert((*outer_parts == NIL || *inner_parts != NIL) &&
> +  (*inner_parts == NIL || *outer_parts != NIL));
>
> * (I forgot to mention this in a previous email, but) I removed this
> bit of generate_matching_part_pairs(), because we already have the
> same check in try_partitionwise_join(), so this bit would be redundant
> IIUC.
>
> +   switch (jointype)
> +   {
> +   case JOIN_INNER:
> +   case JOIN_SEMI:
> +
> +   /*
> +

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-08-16 Thread Etsuro Fujita

On Tue, Jul 30, 2019 at 6:00 PM Etsuro Fujita  wrote:
> On Fri, Jul 19, 2019 at 10:44 PM Robert Haas  wrote:
> > On Thu, Jul 18, 2019 at 2:55 AM Etsuro Fujita  
> > wrote:
> > > I.e., partition_bounds_merge() is performed for each pair of input
> > > partitioned relations for a join relation in try_partitionwise_join().
> > > Since partition_bounds_merge() would need a lot of CPU cycles, I don't
> > > think this is acceptable; ISTM that some redesign is needed to avoid
> > > this.  I'm wondering that once we successfully merged partition bounds
> > > from a pair of input partitioned relations for the join relation, by
> > > using the merged partition bounds, we could get the lists of matching
> > > to-be-joined partitions for subsequent pairs of input partitioned
> > > relations for the join relation in a more efficient way than by
> > > performing partition_bounds_merge() as proposed in the patch.
> >
> > I don't know whether partition_bounds_merge() is well-implemented; I
> > haven't looked.
>
> My concern about that is list partitioning.  In that case that
> function calls partition_list_bounds_merge(), which generates the
> partition bounds for a join relation between two given input
> relations, by performing merge join for a pair of the datums arrays
> from both the input relations.  Since the datums arrays contain all
> non-null list values across all partitions, if the numbers of the list
> values (ie, ndatums') are large, the merge join would require not a
> few cycles, so it would be much expensive to perform the merge join
> for each such pair when considering large N-way partitionwise joins of
> list-partitioned tables with large ndatums.  To see that, I did simple
> tests using a list-partitioned table pt created with the attached,
> which has 10 partitions, each with 1000 list values, so ndatums is
> 1.  (The tests below are performed with
> enable_partitionwise_join=on.)
>
> * 2-way self-join of pt: explain analyze select * from pt t0, pt t1
> where t0.a = t1.a;
>  - HEAD:
>  Planning Time: 1.731 ms
>  Execution Time: 15.159 ms
>  - Patched:
>  Planning Time: 1.884 ms
>  Execution Time: 15.127 ms
>
> * 4-way self-join of pt: explain analyze select * from pt t0, pt t1,
> pt t2, pt t3 where t0.a = t1.a and t1.a = t2.a and t2.a = t3.a;
>  - HEAD:
>  Planning Time: 28.787 ms
>  Execution Time: 34.313 ms
>  - Patched:
>  Planning Time: 40.263 ms
>  Execution Time: 35.019 ms
>
> * 8-way self-join of pt: explain analyze select * from pt t0, pt t1,
> pt t2, pt t3, pt t4, pt t5, pt t6, pt t7 where t0.a = t1.a and t1.a =
> t2.a and t2.a = t3.a and t3.a = t4.a and t4.a = t5.a and t5.a = t6.a
> and t6.a = t7.a;
>  - HEAD:
>  Planning Time: 2279.653 ms
>  Execution Time: 63.303 ms
>  - Patched:
>  Planning Time: 3834.751 ms
>  Execution Time: 62.949 ms
>
> Actually, these joins would not need the partition-matching algorithm
> the patch adds; we could probably avoid this regression by modifying
> the patch to plan these joins the same way as before, but ISTM that
> these results imply that the cost of performing the merge join for
> each such pair would not be negligible when considering large N-way
> partitionwise joins mentioned above.  Maybe I'm missing something,
> though.
>
> > But in general I don't see an alternative to doing
> > some kind of merging on each pair of input relations. That's just how
> > planning works, and I don't see why it should need to be prohibitively
> > expensive.  I might be missing something, though; do you have an idea?
>
> Yes, I do; but I think I should think a little more about that.

I gave more thought to this.  My idea is to generate the list of
matching partitions to be joined from the partbound info after
creating it with partition_bounds_merge(), as I stated before.  Let me
explain using an example.  Suppose that R, S and T are partitioned
tables, that R=R(a,b,c) is partitioned on ranges of a into three
partitions R1, R2 and R3, that S=S(a,b,c) is partitioned on ranges of
a into three partitions S1, S2 and S3, and that T=T(a,b,c) is
partitioned on ranges of a into three partitions T1, T2 and T3.
Consider a 3-way join query: SELECT * FROM R, S, T WHERE R.a=S.a AND
S.a=T.a;  Suppose that when considering 2-way join R IJ S,
partition_bounds_merge() successfully merges the partition bounds for
R and S, and generates join pairs (R1, S1), (R2, S2) and (R3, S3), and
that when considering 3-way join (R IJ S) IJ T, that function
successfully merges the partition bounds for (R IJ S) and T, and
generates join pairs ((R1 IJ S1), T1) and ((R2 IJ S2), T2).  The
question here is: do we really need to perform
partition_bounds_merge() to generate the list of matching partitions
to be joined for 3-way join R IJ (S IJ T), for example?  I don't think
so; because 1) we see from the 3-way join pairs ((R1 IJ S1), T1) and
((R2 IJ S2), T2) that Ri, Si and Ti (i=1,2) have boundaries
overlapping with each other, which means that there would be (S1, T1)
and (S2, T2) as 2-way join pairs

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-08-01 Thread Etsuro Fujita

Amit-san,

On Wed, Jul 31, 2019 at 2:47 PM Amit Langote  wrote:
> On Tue, Jul 30, 2019 at 6:00 PM Etsuro Fujita  wrote:
> > On Fri, Jul 19, 2019 at 10:44 PM Robert Haas  wrote:
> > > I don't know whether partition_bounds_merge() is well-implemented; I
> > > haven't looked.
> >
> > My concern about that is list partitioning.  In that case that
> > function calls partition_list_bounds_merge(), which generates the
> > partition bounds for a join relation between two given input
> > relations, by performing merge join for a pair of the datums arrays
> > from both the input relations.
>
> I had similar thoughts upon seeing that partition_bounds_merge() will
> be replacing the current way of determining if partition-wise join can
> occur; that it will make the handling of currently supported cases
> more expensive.
>
> The current way is to compare the PartitionBoundInfos of joining
> relations using partition_bounds_equal(), and if equal, simply join
> the pairs of matching partitions if the join quals permit doing so.
> There's no need to do anything extra to determine which partitions to
> join with each other, because it's already established.  Likewise,
> partition_bounds_merge() shouldn't to have to anything extra in that
> case.  That is, for the cases that are already supported, we should
> find a way to make partition_bounds_merge() only as expensive as just
> performing partition_bounds_equals(), or maybe just slightly more.

I 100% agree on that point.

One thing that was unexpected to me is this:

I wrote:
> To see that, I did simple
> tests using a list-partitioned table pt created with the attached,
> which has 10 partitions, each with 1000 list values, so ndatums is
> 1.  (The tests below are performed with
> enable_partitionwise_join=on.)
>
> * 2-way self-join of pt: explain analyze select * from pt t0, pt t1
> where t0.a = t1.a;
>  - HEAD:
>  Planning Time: 1.731 ms
>  Execution Time: 15.159 ms
>  - Patched:
>  Planning Time: 1.884 ms
>  Execution Time: 15.127 ms

IIUC, in this test, I think partition_bounds_equals() and
partition_bounds_merge() have been performed only once in HEAD and the
patched version respectively to plan the partitionwise join, so this
might imply that the cost of the latter is just slightly more
expensive than that of the former.  I'm missing something, though.

Anyway I'll continue to review this patch, so I'll move this to the
next CF with the same status (Needs review).

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-07-30 Thread Amit Langote

On Tue, Jul 30, 2019 at 6:00 PM Etsuro Fujita  wrote:
> On Fri, Jul 19, 2019 at 10:44 PM Robert Haas  wrote:
> > On Thu, Jul 18, 2019 at 2:55 AM Etsuro Fujita  
> > wrote:
> > > I.e., partition_bounds_merge() is performed for each pair of input
> > > partitioned relations for a join relation in try_partitionwise_join().
> > > Since partition_bounds_merge() would need a lot of CPU cycles, I don't
> > > think this is acceptable; ISTM that some redesign is needed to avoid
> > > this.  I'm wondering that once we successfully merged partition bounds
> > > from a pair of input partitioned relations for the join relation, by
> > > using the merged partition bounds, we could get the lists of matching
> > > to-be-joined partitions for subsequent pairs of input partitioned
> > > relations for the join relation in a more efficient way than by
> > > performing partition_bounds_merge() as proposed in the patch.
> >
> > I don't know whether partition_bounds_merge() is well-implemented; I
> > haven't looked.
>
> My concern about that is list partitioning.  In that case that
> function calls partition_list_bounds_merge(), which generates the
> partition bounds for a join relation between two given input
> relations, by performing merge join for a pair of the datums arrays
> from both the input relations.

I had similar thoughts upon seeing that partition_bounds_merge() will
be replacing the current way of determining if partition-wise join can
occur; that it will make the handling of currently supported cases
more expensive.

The current way is to compare the PartitionBoundInfos of joining
relations using partition_bounds_equal(), and if equal, simply join
the pairs of matching partitions if the join quals permit doing so.
There's no need to do anything extra to determine which partitions to
join with each other, because it's already established.  Likewise,
partition_bounds_merge() shouldn't to have to anything extra in that
case.  That is, for the cases that are already supported, we should
find a way to make partition_bounds_merge() only as expensive as just
performing partition_bounds_equals(), or maybe just slightly more.

Thanks,
Amit

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-07-30 Thread Etsuro Fujita

On Fri, Jul 19, 2019 at 10:44 PM Robert Haas  wrote:
> On Thu, Jul 18, 2019 at 2:55 AM Etsuro Fujita  wrote:
> > I.e., partition_bounds_merge() is performed for each pair of input
> > partitioned relations for a join relation in try_partitionwise_join().
> > Since partition_bounds_merge() would need a lot of CPU cycles, I don't
> > think this is acceptable; ISTM that some redesign is needed to avoid
> > this.  I'm wondering that once we successfully merged partition bounds
> > from a pair of input partitioned relations for the join relation, by
> > using the merged partition bounds, we could get the lists of matching
> > to-be-joined partitions for subsequent pairs of input partitioned
> > relations for the join relation in a more efficient way than by
> > performing partition_bounds_merge() as proposed in the patch.
>
> I don't know whether partition_bounds_merge() is well-implemented; I
> haven't looked.

My concern about that is list partitioning.  In that case that
function calls partition_list_bounds_merge(), which generates the
partition bounds for a join relation between two given input
relations, by performing merge join for a pair of the datums arrays
from both the input relations.  Since the datums arrays contain all
non-null list values across all partitions, if the numbers of the list
values (ie, ndatums') are large, the merge join would require not a
few cycles, so it would be much expensive to perform the merge join
for each such pair when considering large N-way partitionwise joins of
list-partitioned tables with large ndatums.  To see that, I did simple
tests using a list-partitioned table pt created with the attached,
which has 10 partitions, each with 1000 list values, so ndatums is
1.  (The tests below are performed with
enable_partitionwise_join=on.)

* 2-way self-join of pt: explain analyze select * from pt t0, pt t1
where t0.a = t1.a;
 - HEAD:
 Planning Time: 1.731 ms
 Execution Time: 15.159 ms
 - Patched:
 Planning Time: 1.884 ms
 Execution Time: 15.127 ms

* 4-way self-join of pt: explain analyze select * from pt t0, pt t1,
pt t2, pt t3 where t0.a = t1.a and t1.a = t2.a and t2.a = t3.a;
 - HEAD:
 Planning Time: 28.787 ms
 Execution Time: 34.313 ms
 - Patched:
 Planning Time: 40.263 ms
 Execution Time: 35.019 ms

* 8-way self-join of pt: explain analyze select * from pt t0, pt t1,
pt t2, pt t3, pt t4, pt t5, pt t6, pt t7 where t0.a = t1.a and t1.a =
t2.a and t2.a = t3.a and t3.a = t4.a and t4.a = t5.a and t5.a = t6.a
and t6.a = t7.a;
 - HEAD:
 Planning Time: 2279.653 ms
 Execution Time: 63.303 ms
 - Patched:
 Planning Time: 3834.751 ms
 Execution Time: 62.949 ms

Actually, these joins would not need the partition-matching algorithm
the patch adds; we could probably avoid this regression by modifying
the patch to plan these joins the same way as before, but ISTM that
these results imply that the cost of performing the merge join for
each such pair would not be negligible when considering large N-way
partitionwise joins mentioned above.  Maybe I'm missing something,
though.

> But in general I don't see an alternative to doing
> some kind of merging on each pair of input relations. That's just how
> planning works, and I don't see why it should need to be prohibitively
> expensive.  I might be missing something, though; do you have an idea?

Yes, I do; but I think I should think a little more about that.

Sorry for the delay.

Best regards,
Etsuro Fujita

list_parted2.sql
Description: Binary data

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-07-30 Thread Etsuro Fujita

On Fri, Jul 19, 2019 at 8:09 PM amul sul  wrote:
> On Mon, Jul 8, 2019 at 5:03 PM Etsuro Fujita  wrote:
>> I started reviewing this.  Here is my initial review comments:
>>
>> * 0001-Hash-partition-bound-equality-refactoring-v22.patch

>> However, I don't think it's a good idea to do this
>> refactoring, because that would lead to duplicating the code to check
>> whether two given hash bound collections are equal in
>> partition_bounds_equal() and partition_hash_bounds_merge() that will
>> be added by the main patch, after all.  To avoid that, how about
>> calling partition_bounds_equal() from partition_hash_bounds_merge() in
>> the main patch, like the attached?

> Agree, your changes look good to me, thanks for working on it.

Cool!  Thanks for reviewing!

Sorry for the delay.  I was busy with something else recently.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-07-19 Thread Robert Haas

On Thu, Jul 18, 2019 at 2:55 AM Etsuro Fujita  wrote:
> I.e., partition_bounds_merge() is performed for each pair of input
> partitioned relations for a join relation in try_partitionwise_join().
> Since partition_bounds_merge() would need a lot of CPU cycles, I don't
> think this is acceptable; ISTM that some redesign is needed to avoid
> this.  I'm wondering that once we successfully merged partition bounds
> from a pair of input partitioned relations for the join relation, by
> using the merged partition bounds, we could get the lists of matching
> to-be-joined partitions for subsequent pairs of input partitioned
> relations for the join relation in a more efficient way than by
> performing partition_bounds_merge() as proposed in the patch.

I don't know whether partition_bounds_merge() is well-implemented; I
haven't looked. But in general I don't see an alternative to doing
some kind of merging on each pair of input relations. That's just how
planning works, and I don't see why it should need to be prohibitively
expensive.  I might be missing something, though; do you have an idea?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-07-19 Thread amul sul

On Mon, Jul 8, 2019 at 5:03 PM Etsuro Fujita 
wrote:

> On Wed, Jul 3, 2019 at 3:44 PM Etsuro Fujita 
> wrote:
> > On Tue, Jul 2, 2019 at 1:47 PM amul sul  wrote:
> > > Attached version is rebase atop of the latest master head(c74d49d41c),
> thanks.
> >
> > Thanks!  Will review.
>
> I started reviewing this.  Here is my initial review comments:
>
> * 0001-Hash-partition-bound-equality-refactoring-v22.patch
>
> First of all, I agree with your view on hash partitioning:
>
> "3. For hash partitioned tables however, we support partition-wise join
> only when the bounds exactly match. For hash partitioning it's unusual
> to have missing partitions and hence generic partition matching is not
> required."
>
> which is cited from the commit message for the main patch
> "0002-Partition-wise-join-for-1-1-1-0-0-1-partition-matchi-v22.patch".
> (I think it would be better if we can extend the partition matching to
> the hash-partitioning case where there are missing partitions in
> future, though.)  However, I don't think it's a good idea to do this
> refactoring, because that would lead to duplicating the code to check
> whether two given hash bound collections are equal in
> partition_bounds_equal() and partition_hash_bounds_merge() that will
> be added by the main patch, after all.  To avoid that, how about
> calling partition_bounds_equal() from partition_hash_bounds_merge() in
> the main patch, like the attached?


Agree, your changes look good to me, thanks for working on it.

Regards,
Amul

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-07-18 Thread Etsuro Fujita

On Mon, Jul 8, 2019 at 8:33 PM Etsuro Fujita  wrote:
> I'll review the remaining parts (ie,
> "0002-Partition-wise-join-for-1-1-1-0-0-1-partition-matchi-v22.patch"
> and "0003-Tests-for-0-1-1-1-and-1-0-partition-matching-v22.patch")
> closely next.

I've been reviewing the main patch
"0002-Partition-wise-join-for-1-1-1-0-0-1-partition-matchi-v22.patch",
but it's pretty large and complicated, so I'm still at the first pass,
and so maybe I'm missing something, but let me comment a few things on
the patch.  First of all, I think the patch's problem setting, which
is stated in the commit message, is reasonable:

Earlier version of partition-wise join implementation allowed
partition-wise join between two relations with exactly same partition
bounds. This commit allows partition-wise join to be applied under
following conditions

1. the partition bounds of joining relations are such that rows from
given partition on one side join can join with rows from maximum one
partition on the other side i.e. bounds of a given partition on one
side match/overlap with those of maximum one partition on the other
side.

And I agree with the patch's approach to this that tries to find such
a partition mapping between two input partitioned relations for a join
relation, by trying to match their partition bounds, which is
implemented in a new function partition_bounds_merge(), in general.
But one thing that I'm concerned about most at this point is this in
try_partitionwise_join():

/*
-* Since we allow partitionwise join only when the partition bounds of the
-* joining relations exactly match, the partition bounds of the join
-* should match those of the joining relations.
+* Get the list of matching partitions to be joined along with the
+* partition bounds of the join relation. Because of the restrictions
+* imposed by partition matching algorithm, not every pair of joining
+* relations for this join will be able to use partition-wise join. But all
+* those pairs which can use partition-wise join will produce the same
+* partition bounds for the join relation.
 */
-   Assert(partition_bounds_equal(joinrel->part_scheme->partnatts,
- joinrel->part_scheme->parttyplen,
- joinrel->part_scheme->parttypbyval,
- joinrel->boundinfo, rel1->boundinfo));
-   Assert(partition_bounds_equal(joinrel->part_scheme->partnatts,
- joinrel->part_scheme->parttyplen,
- joinrel->part_scheme->parttypbyval,
- joinrel->boundinfo, rel2->boundinfo));
+   join_boundinfo = partition_bounds_merge(part_scheme->partnatts,
+   part_scheme->partsupfunc,
+   part_scheme->partcollation,
+   rel1, rel2,
+   parent_sjinfo->jointype,
+   , );
+
+   if (join_boundinfo == NULL)
+   return;

I.e., partition_bounds_merge() is performed for each pair of input
partitioned relations for a join relation in try_partitionwise_join().
Since partition_bounds_merge() would need a lot of CPU cycles, I don't
think this is acceptable; ISTM that some redesign is needed to avoid
this.  I'm wondering that once we successfully merged partition bounds
from a pair of input partitioned relations for the join relation, by
using the merged partition bounds, we could get the lists of matching
to-be-joined partitions for subsequent pairs of input partitioned
relations for the join relation in a more efficient way than by
performing partition_bounds_merge() as proposed in the patch.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-07-03 Thread Etsuro Fujita

On Tue, Jul 2, 2019 at 1:47 PM amul sul  wrote:
> Attached version is rebase atop of the latest master head(c74d49d41c), thanks.

Thanks!  Will review.

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-07-01 Thread Etsuro Fujita

On Mon, Jul 1, 2019 at 6:50 PM Thomas Munro  wrote:
> On Sat, May 18, 2019 at 12:20 AM Robert Haas  wrote:
> > On Tue, May 14, 2019 at 12:24 AM Amit Langote
> >  wrote:
> > > He did mention that cases where the nullable side is provably empty can be
> > > handled by simply returning the path of the non-nullable side with
> > > suitable projection path added on top to emit NULLs for the columns of the
> > > nullable-side.  If we teach populate_joinrel_with_paths() and underlings
> > > about that, then we can allow partitionwise join even in the case where
> > > the nullable side has some partitions missing.
> >
> > Yes, I think that would be a good approach to pursue.
>
> Hi Ashutosh, Amul, Fujita-san,
>
> Could we please have a fresh rebase for the new Commitfest?

Will do unless Ashutosh, Amul, or anyone wants to.

Thanks!

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-07-01 Thread Thomas Munro

On Sat, May 18, 2019 at 12:20 AM Robert Haas  wrote:
> On Tue, May 14, 2019 at 12:24 AM Amit Langote
>  wrote:
> > He did mention that cases where the nullable side is provably empty can be
> > handled by simply returning the path of the non-nullable side with
> > suitable projection path added on top to emit NULLs for the columns of the
> > nullable-side.  If we teach populate_joinrel_with_paths() and underlings
> > about that, then we can allow partitionwise join even in the case where
> > the nullable side has some partitions missing.
>
> Yes, I think that would be a good approach to pursue.

Hi Ashutosh, Amul, Fujita-san,

Could we please have a fresh rebase for the new Commitfest?

Thanks,

-- 
Thomas Munro
https://enterprisedb.com

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-05-17 Thread Robert Haas

On Tue, May 14, 2019 at 12:24 AM Amit Langote
 wrote:
> He did mention that cases where the nullable side is provably empty can be
> handled by simply returning the path of the non-nullable side with
> suitable projection path added on top to emit NULLs for the columns of the
> nullable-side.  If we teach populate_joinrel_with_paths() and underlings
> about that, then we can allow partitionwise join even in the case where
> the nullable side has some partitions missing.

Yes, I think that would be a good approach to pursue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-05-14 Thread Ashutosh Bapat

On Tue, May 14, 2019 at 10:00 AM Amit Langote 
wrote:

> On 2019/05/14 13:23, Amit Langote wrote:
> > Tom
> > strongly objected to that idea saying that such join paths are kind of
> > silly [1], even outside the context of partitionwise join.  He suggested
> > that we abandon partitionwise join in such cases, because having to build
> > a dummy base relation for pruned partitions only to generate
> silly-looking
> > paths would be an ugly kludge.
>
> I forgot to mention that he even committed a patch to disable
> partitionwise joins in such cases, which was also applied to v11 branch.
>
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d70c147fa217c4bae32ac1afb86ab42d98b36fdf
>
> Note that there were also other reasons for committing, beside what I
> described in my previous email.
>
>
I haven't seen the actual commit, but we could use these patches to enable
partition-wise join when partitions are pruned. For that the partition
descriptor of the pruned partition table should be arranged as if those
partitions are missing in the table itself. However, we will still need
code to handle the cases when the partitions are missing on the nullable
side. Tom mentioned the idea of using just projection to produce join
tuples with rows on the outer side appended with null columns from the
nullable side. If we can implement that, we can remove the restrictions in
this patch.

--
Best Wishes,
Ashutosh Bapat

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-05-13 Thread Amit Langote

On 2019/05/14 13:23, Amit Langote wrote:
> Tom
> strongly objected to that idea saying that such join paths are kind of
> silly [1], even outside the context of partitionwise join.  He suggested
> that we abandon partitionwise join in such cases, because having to build
> a dummy base relation for pruned partitions only to generate silly-looking
> paths would be an ugly kludge.

I forgot to mention that he even committed a patch to disable
partitionwise joins in such cases, which was also applied to v11 branch.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d70c147fa217c4bae32ac1afb86ab42d98b36fdf

Note that there were also other reasons for committing, beside what I
described in my previous email.

Thanks,
Amit

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-05-13 Thread Amit Langote

Hi Amul, Ashutosh,

On 2019/04/24 20:26, amul sul wrote:
> Attached version is rebase atop of the latest master head(fdc7efcc30), also
> incorporates the Ashutosh's suggestion, thanks.

Reading the commit message of 0002 and after testing to confirm, I
understand that the patch doesn't handle OUTER joins where the nullable
side is missing some partitions.  The reason given is that join planning
would have to add base relations corresponding to missing partitions on
the nullable side, which we can't do.  While working on partition pruning
refactoring recently (committed as 428b260f87e), we faced a similar
situation in that pruned partitions are like missing partitions, because
they're not added to the PlannerInfo anymore, whereas before that commit,
they'd be added and marked dummy afterwards.  Earlier versions of my patch
had code to add dummy base relations for such pruned partitions, because
partitionwise join expected pairs of matched partitions to be valid base
relations, because that's how things were when partitionwise joins feature
was committed.  Join path generated in this case would have a
constant-false Result path (an empty relation) for the nullable side.  Tom
strongly objected to that idea saying that such join paths are kind of
silly [1], even outside the context of partitionwise join.  He suggested
that we abandon partitionwise join in such cases, because having to build
a dummy base relation for pruned partitions only to generate silly-looking
paths would be an ugly kludge.  I guess the same argument applies to the
case where the nullable side is missing some partitions, so the right way
to support partitionwise join case in that case wouldn't be to figure out
how joinrels.c could add dummy base relations.

He did mention that cases where the nullable side is provably empty can be
handled by simply returning the path of the non-nullable side with
suitable projection path added on top to emit NULLs for the columns of the
nullable-side.  If we teach populate_joinrel_with_paths() and underlings
about that, then we can allow partitionwise join even in the case where
the nullable side has some partitions missing.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/25035.1553905052%40sss.pgh.pa.us

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-04-26 Thread Rajkumar Raghuwanshi

On Wed, Apr 24, 2019 at 4:56 PM amul sul  wrote:

> Attached version is rebase atop of the latest master head(fdc7efcc30), also
> incorporates the Ashutosh's suggestion, thanks.
>
Thanks for rebase patch, patches applied cleanly on PG head.
I did some crash testing with extra test case [0006 patch] [1] and found no
more issue.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB.

[1]
https://www.postgresql.org/message-id/CAFjFpReKuV_4LRRfdy80BqX8oZfwbo%2BHWLQNv3CsJ5iGPSyTfA%40mail.gmail.com




>
> Regards,
> Amul
>
> On Mon, Mar 11, 2019 at 10:14 PM Ashutosh Bapat <
> ashutosh.bapat@gmail.com> wrote:
>
>>
>>
>> On Mon, Mar 11, 2019 at 10:40 AM amul sul  wrote:
>>
>>>
>>> All the places from where this handle_missing_partition() get called
>>> have the following code to decide the value for
>>> missing_side_outer/_inner which
>>> I yet to understand. Do you think this has some flaw?
>>>
>>> /*
>>>  * For a FULL join, inner relation acts as both OUTER and INNER
>>>  * relation.  For LEFT and ANTI join the inner relation acts as
>>>  * INNER relation. For INNER and SEMI join OUTER and INNER
>>>  * differentiation is immaterial.
>>>  */
>>> missing_side_inner = (jointype == JOIN_FULL ||
>>>   jointype == JOIN_LEFT ||
>>>   jointype == JOIN_ANTI);
>>> missing_side_outer = (jointype == JOIN_FULL);
>>>
>>
>> I was wrong, sorry. The comment says it all.
>>
>>
>>>
>>>
>>>
 argument value which fails to set merged_index.
>
> In the attached patch, I tried to fix this case by setting
> merged_index
> explicitly which fixes the reported crash.
>

 I expect handle_missing_partition() to set the merged_index always. In
 your patches, I don't see that function in your patches is setting it
 explicitly. If we are setting merged_index explicitly somewhere else, other
 places may miss that explicit assignment. So it's better to move it inside
 this function.


>>>
>>> Ok, that can be fixed.
>>>
>>> Similarly, I think merge_null_partitions should set null_index instead
>>> of
>>> asserting when null partitions missing from both the side, make sense?
>>>
>>
>> I think not. null_index, once set shouldn't change and hence does not
>> change with each pair of partitions being matched. So, it makes sense to
>> make sure that null_index remains invalid if none of the tables have null
>> partition.
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>>
>

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-03-11 Thread Ashutosh Bapat

On Mon, Mar 11, 2019 at 10:40 AM amul sul  wrote:

>
> All the places from where this handle_missing_partition() get called
> have the following code to decide the value for missing_side_outer/_inner
> which
> I yet to understand. Do you think this has some flaw?
>
> /*
>  * For a FULL join, inner relation acts as both OUTER and INNER
>  * relation.  For LEFT and ANTI join the inner relation acts as
>  * INNER relation. For INNER and SEMI join OUTER and INNER
>  * differentiation is immaterial.
>  */
> missing_side_inner = (jointype == JOIN_FULL ||
>   jointype == JOIN_LEFT ||
>   jointype == JOIN_ANTI);
> missing_side_outer = (jointype == JOIN_FULL);
>

I was wrong, sorry. The comment says it all.


>
>
>
>> argument value which fails to set merged_index.
>>>
>>> In the attached patch, I tried to fix this case by setting merged_index
>>> explicitly which fixes the reported crash.
>>>
>>
>> I expect handle_missing_partition() to set the merged_index always. In
>> your patches, I don't see that function in your patches is setting it
>> explicitly. If we are setting merged_index explicitly somewhere else, other
>> places may miss that explicit assignment. So it's better to move it inside
>> this function.
>>
>>
>
> Ok, that can be fixed.
>
> Similarly, I think merge_null_partitions should set null_index instead of
> asserting when null partitions missing from both the side, make sense?
>

I think not. null_index, once set shouldn't change and hence does not
change with each pair of partitions being matched. So, it makes sense to
make sure that null_index remains invalid if none of the tables have null
partition.

--
Best Wishes,
Ashutosh Bapat

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-03-10 Thread amul sul

On Mon, Mar 11, 2019 at 8:29 AM Ashutosh Bapat 
wrote:

> On Thu, Mar 7, 2019 at 8:20 PM amul sul  wrote:
>
>>
>>
>> On Thu, Mar 7, 2019 at 1:02 PM amul sul  wrote:
>>
>>> Thanks Rajkumar,
>>>
>>> I am looking into this.
>>>
>>>
>> The crash happens when none of the if-else branch of
>> handle_missing_partition()
>> evaluates and returns merged_index unassigned.
>>
>> Let me explain, in Rajkumar 's test case, the join type is JOIN_INNER.
>> When
>> only outer rel has null partition, merge_null_partitions() function calls
>> handle_missing_partition() with missing_side_inner = false and
>> missing_side_outer = false
>>
>
> Both missing_side_ variables being false when the NULL partition is
> missing on the inner side looks suspicious. I guess from the variable names
> that the missing_side_inner should be true in this case.
>
>

All the places from where this handle_missing_partition() get called
have the following code to decide the value for missing_side_outer/_inner
which
I yet to understand. Do you think this has some flaw?

/*
 * For a FULL join, inner relation acts as both OUTER and INNER
 * relation.  For LEFT and ANTI join the inner relation acts as
 * INNER relation. For INNER and SEMI join OUTER and INNER
 * differentiation is immaterial.
 */
missing_side_inner = (jointype == JOIN_FULL ||
  jointype == JOIN_LEFT ||
  jointype == JOIN_ANTI);
missing_side_outer = (jointype == JOIN_FULL);

> argument value which fails to set merged_index.
>>
>> In the attached patch, I tried to fix this case by setting merged_index
>> explicitly which fixes the reported crash.
>>
>
> I expect handle_missing_partition() to set the merged_index always. In
> your patches, I don't see that function in your patches is setting it
> explicitly. If we are setting merged_index explicitly somewhere else, other
> places may miss that explicit assignment. So it's better to move it inside
> this function.
>
>

Ok, that can be fixed.

Similarly, I think merge_null_partitions should set null_index instead of
asserting when null partitions missing from both the side, make sense?

Regards,
Amul

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-03-08 Thread Rajkumar Raghuwanshi

On Thu, Mar 7, 2019 at 8:20 PM amul sul  wrote:

>
>
> On Thu, Mar 7, 2019 at 1:02 PM amul sul  wrote:
>
>> Thanks Rajkumar,
>>
>> I am looking into this.
>>
>>
> The crash happens when none of the if-else branch of
> handle_missing_partition()
> evaluates and returns merged_index unassigned.
>
> Let me explain, in Rajkumar 's test case, the join type is JOIN_INNER.
> When
> only outer rel has null partition, merge_null_partitions() function calls
> handle_missing_partition() with missing_side_inner = false and
> missing_side_outer = false argument value which fails to set merged_index.
>
> In the attached patch, I tried to fix this case by setting merged_index
> explicitly which fixes the reported crash.
>
Thanks Amul, with v20 patches, crash is fixed.


>
> Regards,
> Amul
>
>
>
>> On Thu, Mar 7, 2019 at 11:54 AM Rajkumar Raghuwanshi <
>> rajkumar.raghuwan...@enterprisedb.com> wrote:
>>
>>>
>>>
>>> On Tue, Mar 5, 2019 at 3:45 PM amul sul  wrote:
>>>
 Attached is the rebased atop of the latest master head(35bc0ec7c8).

>>> thanks Amul, patches applied cleanly on PG head.
>>>
>>> While testing this I got a server crash with below test case.
>>>
>>> CREATE TABLE plt1 (a int, b int, c varchar) PARTITION BY LIST(c);
>>> CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN
>>> ('0001','0002','0003');
>>> CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN
>>> ('0004','0005','0006');
>>> CREATE TABLE plt1_p3 PARTITION OF plt1 FOR VALUES IN
>>> (NULL,'0008','0009');
>>> CREATE TABLE plt1_p4 PARTITION OF plt1 FOR VALUES IN ('','0010');
>>> INSERT INTO plt1 SELECT i, i % 47, to_char(i % 17, 'FM') FROM
>>> generate_series(0, 500) i WHERE i % 17 NOT IN (7, 11, 12, 13, 14, 15,16);
>>> INSERT INTO plt1 SELECT i, i % 47, case when i % 17 = 7 then NULL else
>>> to_char(i % 17, 'FM') end FROM generate_series(0, 500) i WHERE i % 17
>>> IN (7,8,9);
>>> ANALYSE plt1;
>>>
>>> CREATE TABLE plt2 (a int, b int, c varchar) PARTITION BY LIST(c);
>>> CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('0002','0003');
>>> CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN
>>> ('0004','0005','0006');
>>> CREATE TABLE plt2_p3 PARTITION OF plt2 FOR VALUES IN
>>> ('0007','0008','0009');
>>> CREATE TABLE plt2_p4 PARTITION OF plt2 FOR VALUES IN
>>> ('',NULL,'0012');
>>> INSERT INTO plt2 SELECT i, i % 47, to_char(i % 17, 'FM') FROM
>>> generate_series(0, 500) i WHERE i % 17 NOT IN (1, 10, 11, 13, 14, 15, 16);
>>> INSERT INTO plt2 SELECT i, i % 47, case when i % 17 = 11 then NULL else
>>> to_char(i % 17, 'FM') end FROM generate_series(0, 500) i WHERE i % 17
>>> IN (0,11,12);
>>> ANALYZE plt2;
>>>
>>> CREATE TABLE plt1_e (a int, b int, c text) PARTITION BY LIST(ltrim(c,
>>> 'A'));
>>> CREATE TABLE plt1_e_p1 PARTITION OF plt1_e FOR VALUES IN ('0002',
>>> '0003');
>>> CREATE TABLE plt1_e_p2 PARTITION OF plt1_e FOR VALUES IN ('0004',
>>> '0005', '0006');
>>> CREATE TABLE plt1_e_p3 PARTITION OF plt1_e FOR VALUES IN ('0008',
>>> '0009');
>>> CREATE TABLE plt1_e_p4 PARTITION OF plt1_e FOR VALUES IN ('');
>>> INSERT INTO plt1_e SELECT i, i % 47, to_char(i % 17, 'FM') FROM
>>> generate_series(0, 500) i WHERE i % 17 NOT IN (1, 7, 10, 11, 12, 13, 14,
>>> 15, 16);
>>> ANALYZE plt1_e;
>>>
>>> EXPLAIN (COSTS OFF)
>>> SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM
>>> plt1 t1, plt2 t2, plt1_e t3 WHERE t1.c = t2.c AND ltrim(t3.c, 'A') =
>>> t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>>
>>> below is stack trace,  looks like some indexes got corrupted, please
>>> take a look.
>>>
>>> Core was generated by `postgres: edb postgres [local]
>>> EXPLAIN   '.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  0x00821aee in map_and_merge_partitions (partmaps1=0x2c1c8a8,
>>> partmaps2=0x2c1c8e0, index1=45540240, index2=0, next_index=0x7ffeebd43d3c)
>>> at partbounds.c:4162
>>> 4162if (partmap1->from < 0 && partmap2->from < 0)
>>> Missing separate debuginfos, use: debuginfo-install
>>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
>>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
>>> openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
>>> (gdb) bt
>>> #0  0x00821aee in map_and_merge_partitions (partmaps1=0x2c1c8a8,
>>> partmaps2=0x2c1c8e0, *index1=45540240*, index2=0,
>>> next_index=0x7ffeebd43d3c) at partbounds.c:4162
>>> #1  0x008226c3 in merge_null_partitions (outer_bi=0x2b6e338,
>>> inner_bi=0x2bf90b0, outer_maps=0x2c1c8a8, inner_maps=0x2c1c8e0,
>>> jointype=JOIN_INNER,
>>> next_index=0x7ffeebd43d3c, null_index=0x7ffeebd43d38,
>>> default_index=0x7ffeebd43d34) at partbounds.c:4610
>>> #2  0x00821726 in partition_list_bounds_merge
>>>

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-03-06 Thread amul sul

Thanks Rajkumar,

I am looking into this.

Regards,
Amul

On Thu, Mar 7, 2019 at 11:54 AM Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

>
>
> On Tue, Mar 5, 2019 at 3:45 PM amul sul  wrote:
>
>> Attached is the rebased atop of the latest master head(35bc0ec7c8).
>>
> thanks Amul, patches applied cleanly on PG head.
>
> While testing this I got a server crash with below test case.
>
> CREATE TABLE plt1 (a int, b int, c varchar) PARTITION BY LIST(c);
> CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN
> ('0001','0002','0003');
> CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN
> ('0004','0005','0006');
> CREATE TABLE plt1_p3 PARTITION OF plt1 FOR VALUES IN (NULL,'0008','0009');
> CREATE TABLE plt1_p4 PARTITION OF plt1 FOR VALUES IN ('','0010');
> INSERT INTO plt1 SELECT i, i % 47, to_char(i % 17, 'FM') FROM
> generate_series(0, 500) i WHERE i % 17 NOT IN (7, 11, 12, 13, 14, 15,16);
> INSERT INTO plt1 SELECT i, i % 47, case when i % 17 = 7 then NULL else
> to_char(i % 17, 'FM') end FROM generate_series(0, 500) i WHERE i % 17
> IN (7,8,9);
> ANALYSE plt1;
>
> CREATE TABLE plt2 (a int, b int, c varchar) PARTITION BY LIST(c);
> CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('0002','0003');
> CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN
> ('0004','0005','0006');
> CREATE TABLE plt2_p3 PARTITION OF plt2 FOR VALUES IN
> ('0007','0008','0009');
> CREATE TABLE plt2_p4 PARTITION OF plt2 FOR VALUES IN ('',NULL,'0012');
> INSERT INTO plt2 SELECT i, i % 47, to_char(i % 17, 'FM') FROM
> generate_series(0, 500) i WHERE i % 17 NOT IN (1, 10, 11, 13, 14, 15, 16);
> INSERT INTO plt2 SELECT i, i % 47, case when i % 17 = 11 then NULL else
> to_char(i % 17, 'FM') end FROM generate_series(0, 500) i WHERE i % 17
> IN (0,11,12);
> ANALYZE plt2;
>
> CREATE TABLE plt1_e (a int, b int, c text) PARTITION BY LIST(ltrim(c,
> 'A'));
> CREATE TABLE plt1_e_p1 PARTITION OF plt1_e FOR VALUES IN ('0002', '0003');
> CREATE TABLE plt1_e_p2 PARTITION OF plt1_e FOR VALUES IN ('0004', '0005',
> '0006');
> CREATE TABLE plt1_e_p3 PARTITION OF plt1_e FOR VALUES IN ('0008', '0009');
> CREATE TABLE plt1_e_p4 PARTITION OF plt1_e FOR VALUES IN ('');
> INSERT INTO plt1_e SELECT i, i % 47, to_char(i % 17, 'FM') FROM
> generate_series(0, 500) i WHERE i % 17 NOT IN (1, 7, 10, 11, 12, 13, 14,
> 15, 16);
> ANALYZE plt1_e;
>
> EXPLAIN (COSTS OFF)
> SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM
> plt1 t1, plt2 t2, plt1_e t3 WHERE t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c
> GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> below is stack trace,  looks like some indexes got corrupted, please take
> a look.
>
> Core was generated by `postgres: edb postgres [local]
> EXPLAIN   '.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00821aee in map_and_merge_partitions (partmaps1=0x2c1c8a8,
> partmaps2=0x2c1c8e0, index1=45540240, index2=0, next_index=0x7ffeebd43d3c)
> at partbounds.c:4162
> 4162if (partmap1->from < 0 && partmap2->from < 0)
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  0x00821aee in map_and_merge_partitions (partmaps1=0x2c1c8a8,
> partmaps2=0x2c1c8e0, *index1=45540240*, index2=0,
> next_index=0x7ffeebd43d3c) at partbounds.c:4162
> #1  0x008226c3 in merge_null_partitions (outer_bi=0x2b6e338,
> inner_bi=0x2bf90b0, outer_maps=0x2c1c8a8, inner_maps=0x2c1c8e0,
> jointype=JOIN_INNER,
> next_index=0x7ffeebd43d3c, null_index=0x7ffeebd43d38,
> default_index=0x7ffeebd43d34) at partbounds.c:4610
> #2  0x00821726 in partition_list_bounds_merge
> (partsupfunc=0x2ba3548, partcollation=0x2ba34e8, outer_rel=0x2b6ce40,
> inner_rel=0x2bf8d28, outer_parts=0x7ffeebd43ed8,
> inner_parts=0x7ffeebd43ed0, jointype=JOIN_INNER) at partbounds.c:4031
> #3  0x0081ff5d in partition_bounds_merge (partnatts=1,
> partsupfunc=0x2ba3548, partcollation=0x2ba34e8, outer_rel=0x2b6ce40,
> inner_rel=0x2bf8d28, jointype=JOIN_INNER,
> outer_parts=0x7ffeebd43ed8, inner_parts=0x7ffeebd43ed0) at
> partbounds.c:3053
> #4  0x007c610f in try_partitionwise_join (root=0x2be2a28,
> rel1=0x2b6ce40, rel2=0x2bf8d28, joinrel=0x2c1b0f0,
> parent_sjinfo=0x7ffeebd44010,
> parent_restrictlist=0x2c1c070) at joinrels.c:1370
> #5  0x007c5521 in populate_joinrel_with_paths (root=0x2be2a28,
> rel1=0x2b6ce40, rel2=0x2bf8d28, joinrel=0x2c1b0f0, sjinfo=0x7ffeebd44010,
> restrictlist=0x2c1c070)
> at joinrels.c:914
> #6  0x007c4f48 in make_join_rel (root=0x2be2a28,

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-03-06 Thread Rajkumar Raghuwanshi

On Tue, Mar 5, 2019 at 3:45 PM amul sul  wrote:

> Attached is the rebased atop of the latest master head(35bc0ec7c8).
>
thanks Amul, patches applied cleanly on PG head.

While testing this I got a server crash with below test case.

CREATE TABLE plt1 (a int, b int, c varchar) PARTITION BY LIST(c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN ('0001','0002','0003');
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN ('0004','0005','0006');
CREATE TABLE plt1_p3 PARTITION OF plt1 FOR VALUES IN (NULL,'0008','0009');
CREATE TABLE plt1_p4 PARTITION OF plt1 FOR VALUES IN ('','0010');
INSERT INTO plt1 SELECT i, i % 47, to_char(i % 17, 'FM') FROM
generate_series(0, 500) i WHERE i % 17 NOT IN (7, 11, 12, 13, 14, 15,16);
INSERT INTO plt1 SELECT i, i % 47, case when i % 17 = 7 then NULL else
to_char(i % 17, 'FM') end FROM generate_series(0, 500) i WHERE i % 17
IN (7,8,9);
ANALYSE plt1;

CREATE TABLE plt2 (a int, b int, c varchar) PARTITION BY LIST(c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('0002','0003');
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN ('0004','0005','0006');
CREATE TABLE plt2_p3 PARTITION OF plt2 FOR VALUES IN ('0007','0008','0009');
CREATE TABLE plt2_p4 PARTITION OF plt2 FOR VALUES IN ('',NULL,'0012');
INSERT INTO plt2 SELECT i, i % 47, to_char(i % 17, 'FM') FROM
generate_series(0, 500) i WHERE i % 17 NOT IN (1, 10, 11, 13, 14, 15, 16);
INSERT INTO plt2 SELECT i, i % 47, case when i % 17 = 11 then NULL else
to_char(i % 17, 'FM') end FROM generate_series(0, 500) i WHERE i % 17
IN (0,11,12);
ANALYZE plt2;

CREATE TABLE plt1_e (a int, b int, c text) PARTITION BY LIST(ltrim(c, 'A'));
CREATE TABLE plt1_e_p1 PARTITION OF plt1_e FOR VALUES IN ('0002', '0003');
CREATE TABLE plt1_e_p2 PARTITION OF plt1_e FOR VALUES IN ('0004', '0005',
'0006');
CREATE TABLE plt1_e_p3 PARTITION OF plt1_e FOR VALUES IN ('0008', '0009');
CREATE TABLE plt1_e_p4 PARTITION OF plt1_e FOR VALUES IN ('');
INSERT INTO plt1_e SELECT i, i % 47, to_char(i % 17, 'FM') FROM
generate_series(0, 500) i WHERE i % 17 NOT IN (1, 7, 10, 11, 12, 13, 14,
15, 16);
ANALYZE plt1_e;

EXPLAIN (COSTS OFF)
SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM
plt1 t1, plt2 t2, plt1_e t3 WHERE t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c
GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

below is stack trace,  looks like some indexes got corrupted, please take a
look.

Core was generated by `postgres: edb postgres [local]
EXPLAIN   '.
Program terminated with signal 11, Segmentation fault.
#0  0x00821aee in map_and_merge_partitions (partmaps1=0x2c1c8a8,
partmaps2=0x2c1c8e0, index1=45540240, index2=0, next_index=0x7ffeebd43d3c)
at partbounds.c:4162
4162if (partmap1->from < 0 && partmap2->from < 0)
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x00821aee in map_and_merge_partitions (partmaps1=0x2c1c8a8,
partmaps2=0x2c1c8e0, *index1=45540240*, index2=0,
next_index=0x7ffeebd43d3c) at partbounds.c:4162
#1  0x008226c3 in merge_null_partitions (outer_bi=0x2b6e338,
inner_bi=0x2bf90b0, outer_maps=0x2c1c8a8, inner_maps=0x2c1c8e0,
jointype=JOIN_INNER,
next_index=0x7ffeebd43d3c, null_index=0x7ffeebd43d38,
default_index=0x7ffeebd43d34) at partbounds.c:4610
#2  0x00821726 in partition_list_bounds_merge
(partsupfunc=0x2ba3548, partcollation=0x2ba34e8, outer_rel=0x2b6ce40,
inner_rel=0x2bf8d28, outer_parts=0x7ffeebd43ed8,
inner_parts=0x7ffeebd43ed0, jointype=JOIN_INNER) at partbounds.c:4031
#3  0x0081ff5d in partition_bounds_merge (partnatts=1,
partsupfunc=0x2ba3548, partcollation=0x2ba34e8, outer_rel=0x2b6ce40,
inner_rel=0x2bf8d28, jointype=JOIN_INNER,
outer_parts=0x7ffeebd43ed8, inner_parts=0x7ffeebd43ed0) at
partbounds.c:3053
#4  0x007c610f in try_partitionwise_join (root=0x2be2a28,
rel1=0x2b6ce40, rel2=0x2bf8d28, joinrel=0x2c1b0f0,
parent_sjinfo=0x7ffeebd44010,
parent_restrictlist=0x2c1c070) at joinrels.c:1370
#5  0x007c5521 in populate_joinrel_with_paths (root=0x2be2a28,
rel1=0x2b6ce40, rel2=0x2bf8d28, joinrel=0x2c1b0f0, sjinfo=0x7ffeebd44010,
restrictlist=0x2c1c070)
at joinrels.c:914
#6  0x007c4f48 in make_join_rel (root=0x2be2a28, rel1=0x2b6ce40,
rel2=0x2bf8d28) at joinrels.c:748
#7  0x007c4514 in make_rels_by_clause_joins (root=0x2be2a28,
old_rel=0x2b6ce40, other_rels=0x2bae4d8) at joinrels.c:294
#8  0x007c41c8 in join_search_one_level (root=0x2be2a28, level=3)
at joinrels.c:116
#9  0x007abe59 in standard_join_search (root=0x2be2a28,
levels_needed=3,

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-01-21 Thread Etsuro Fujita


(2019/01/21 20:56), amul sul wrote:

On Mon, Nov 26, 2018 at 9:33 PM Dmitry Dolgov <9erthali...@gmail.com
> wrote:

 > On Mon, Nov 26, 2018 at 1:41 PM Dmitry Dolgov
<9erthali...@gmail.com > wrote:
 >
 > Thanks, I've messed this up too - rebased a wrong branch, the
correct one
 > doesn't have this code already. Sorry for that, and here is the
proper version
 > of this patch.

And one more version, because I haven't fixed the tests (sorry for
the noise).



0003 patch need a rebase.


Will do.

Thank you for letting us know!

Best regards,
Etsuro Fujita

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-01-21 Thread amul sul

On Mon, Nov 26, 2018 at 9:33 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Mon, Nov 26, 2018 at 1:41 PM Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
> >
> > Thanks, I've messed this up too - rebased a wrong branch, the correct one
> > doesn't have this code already. Sorry for that, and here is the proper
> version
> > of this patch.
>
> And one more version, because I haven't fixed the tests (sorry for the
> noise).
>

0003 patch need a rebase.

Regards,
Amul

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-11-25 Thread Thomas Munro

On Mon, Nov 26, 2018 at 9:03 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> I've noticed, that this patch set is outdated, so here is the rebased version.

This turned red on cfbot because I turned on -Werror:

partbounds.c: In function ‘partition_bounds_merge’:
partbounds.c:3619:43: error: ‘outer_part’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
merged_index = map_and_merge_partitions(outer_pmap, outer_mmap,
^
partbounds.c:3475:6: note: ‘outer_part’ was declared here
int outer_part;
^
cc1: all warnings being treated as errors

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-09-17 Thread Ashutosh Bapat

On Thu, Sep 13, 2018 at 1:45 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Fri, 31 Aug 2018 at 08:23, Ashutosh Bapat <
> ashutosh.ba...@enterprisedb.com> wrote:
> >
> > On Thu, Aug 30, 2018 at 2:23 PM, Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
> > >
> > >> I won't be working on this actively in the next commitfest. I will be
> > >> glad if somebody else wants to take this up. If there's nobody,
> > >> probably we should mark this entry as "returned with feedback" in the
> > >> next commitfest.
> > >
> > > Since I'm more or less familiar with the code and I believe it's an
> interesting
> > > feature, I can try to take over it for now if you don't mind (but
> without any
> > > strong commitments to make it perfectly shining for the next CF).
> >
> > Please do. Thanks.
>
> I've noticed that the patch is outdated already, so here is the rebased
> version. I also removed the last part with the extra tests since it
> was something
> internal

I am fine with that. It was never meant to be committed. I used to run
those tests to make sure that any changes to the core logic do not break
any working scenarios. Whenever I found a new failure in the extra tests
which wasn't there in tests to be committed, I used to move that test from
the first to the second. Over the time, the number of new failures in extra
has reduced and recently I didn't see any extra failures. So, may be it's
time for the extra tests to be dropped. I will suggest that keep the extra
tests running from time to time and certainly around the time the feature
gets committed.

> and merged the debug message into the implementation part. Ashutosh,
> please let me know if you're not happy with these modifications.
>

Robert Haas raised objections, and I agreed to those, about a similar debug
message I had included in the basic partition-wise join patches. I think
those reasons still apply, so you will need to remove the debug message
before the patches get committed. Said that the debug message is a good
debugging aid, so keeping it around till that time is a good idea.

>
> Other than that I haven't changed anything yet, but hope this will come
> soon.
> And this version is more to keep it updated for those people who may be
> interested.
>

Thanks.

--
Best Wishes,
Ashutosh Bapat

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-08-31 Thread Ashutosh Bapat

On Thu, Aug 30, 2018 at 2:23 PM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
>> I won't be working on this actively in the next commitfest. I will be
>> glad if somebody else wants to take this up. If there's nobody,
>> probably we should mark this entry as "returned with feedback" in the
>> next commitfest.
>
> Since I'm more or less familiar with the code and I believe it's an 
> interesting
> feature, I can try to take over it for now if you don't mind (but without any
> strong commitments to make it perfectly shining for the next CF).

Please do. Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-08-30 Thread Dmitry Dolgov

> On Wed, 29 Aug 2018 at 09:32, Ashutosh Bapat 
>  wrote:
>
> > * Many functions carry some unrelated arguments just to pass them through,
> >   which obscures the purpose of a function.
>
> Can you please provide some examples?

E.g this chain with partsupfunc & collations:

partition_range_bounds_merge
 -> partition_range_cmp
 -> partition_range_bound_cmp
 -> partition_range_merge_next_lb

I believe I already mentioned that before (although I'm not saying that I have
a solution right away, it just bothers me).

> > * I want to suggest to introduce a new data structure for partitions mapping
> >   instead of just keeping them in arrays (was it discussed already before?).
>
> The best I could think of was a structure with just two arrays. So,
> instead of encapsulating the arrays within a structure, I thought it
> best to pass bare arrays around. If you have any other ideas, please
> let me know.

Well, what I had in mind is something like a structure Mapping with fields from
& to.

> > * What is the reason that almost everywhere in the patch there is a naming
> >   convention "outer/inner" partition maps, and sometimes (e.g. in
> >   generate_matching_part_pairs) it's "map1/map2"?
>
> You will find that the optimizer in general uses outer/inner,
> rel1/rel2 nomenclature interchangeably. This patch-set just inherits
> that. But yes, we should do more to straighten it out.

Thanks, good to know.

> I won't be working on this actively in the next commitfest. I will be
> glad if somebody else wants to take this up. If there's nobody,
> probably we should mark this entry as "returned with feedback" in the
> next commitfest.

Since I'm more or less familiar with the code and I believe it's an interesting
feature, I can try to take over it for now if you don't mind (but without any
strong commitments to make it perfectly shining for the next CF).

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-08-29 Thread Ashutosh Bapat

On Thu, Aug 23, 2018 at 4:01 PM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> On Fri, 27 Jul 2018 at 20:13, Robert Haas  wrote:
>>
>> On Fri, Jul 27, 2018 at 3:17 AM, Ashutosh Bapat
>>  wrote:
>> > Apart from the complexity there's also a possibility that this
>> > skipping will reduce the efficiency actually in normal cases. Consider
>> > a case where A and B have exactly matching partitions. Current
>> > partition matching algorithm compare any given range/bound only once
>> > and we will have O(n) algorithm. If we use a binary search, however,
>> > for every range comparison, it will be O(n log n) algorithm. There
>> > will be unnecessary comparisons during binary search. The current
>> > algorithm is always O(n), whereas binary search would be O(n log(n))
>> > with a possibility of having sub-O(n) complexity in some cases. I
>> > would go for an algorithm which has a consistent time complexity
>> > always and which is efficient in normal cases, rather than worrying
>> > about some cases which are not practical.
>>
>> Yeah, I think that's a good point.  The normal case here will be that
>> the partition bounds are equal, or that there are a few extra
>> partitions on one side that don't exist on the other.  We don't want
>> other cases to be crazily inefficient, but I suspect in practice that
>> if the partitioning bounds aren't pretty close to matching up exactly,
>> we're going to end up failing to be able to do a partition-wise join
>> at all.  It's not very likely that somebody happens to have a case
>> where they've partitioned two tables in randomly different ways, but
>> then they decide to join them anyway, but then it turns out that the
>> partition bounds happen to be compatible enough that this algorithm
>> works.
>
>> On Mon, 23 Jul 2018 at 10:38, Ashutosh Bapat 
>>  wrote:
>> 1. those cases are rare enough that we can ignore those right now. How
>> many times we would encounter the case you have quoted, for example?
>> Usually the ranges will be continuous only differing in the first or
>> last partition e.g time-series data.
>
> Ok, but what about list partitioning? I believe the area of application for it
> can be more diverse than mostly just for time-series, and in the patch almost
> the same approach is used to merge list partitions.

Same arguments hold for list partitioning as well. For a list
partitioned table the bounds are ordered by list datums and not
partitions, so it will be rather odd to have large runs of mismatching
datums, the case where binary search fares, from one of side of join.
So, my following argument still holds true

---
>> > I would go for an algorithm which has a consistent time complexity
>> > always and which is efficient in normal cases, rather than worrying
>> > about some cases which are not practical.
---


>
> Other than that everything seems fine from functionality point of view, and so
> far I couldn't find any situation that produces a wrong plan. Some of the 
> joins
> were not that intuitive from the first glance, but at the end everything was
> according to the documentation.

Thanks a lot for your tests and I am glad that you didn't find any failures.

> Taking this into account I wonder if it's possible somehow to give any hints 
> in
> an explain result about why it wasn't possible to apply partition wise join? 
> If
> something wasn't clear for me I ended up looking at the value of "merged" 
> flag,
> and to figure out actual reason one needs to trace the entire algorithm.

That's kind of tricky in PostgreSQL. The optimizer doesn't always
report why a particular path was not chosen. Said that we could add
trace logs printing that information, however, the main difficulty is
reporting the relations being joined.See 0004 for example.

>
> Besides that I checked the performance in some simple cases, no problems on
> this side (but I'll also do some checks for more complicated joins).

Thanks a lot.

>
> And I still have some complaints about readability, although I can imagine 
> that
> it's just me:
>
> * Many functions carry some unrelated arguments just to pass them through,
>   which obscures the purpose of a function.

Can you please provide some examples?

>
> * I want to suggest to introduce a new data structure for partitions mapping
>   instead of just keeping them in arrays (was it discussed already before?).

The best I could think of was a structure with just two arrays. So,
instead of encapsulating the arrays within a structure, I thought it
best to pass bare arrays around. If you have any other ideas, please
let me know.

>
> * What is the reason that almost everywhere in the patch there is a naming
>   convention "outer/inner" partition maps, and sometimes (e.g. in
>   generate_matching_part_pairs) it's "map1/map2"?

You will find that the optimizer in general uses outer/inner,
rel1/rel2 nomenclature interchangeably. This patch-set just inherits
that. But yes, we should do more to straighten it out.

I won't be working on this

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-08-23 Thread Dmitry Dolgov

> On Fri, 27 Jul 2018 at 20:13, Robert Haas  wrote:
>
> On Fri, Jul 27, 2018 at 3:17 AM, Ashutosh Bapat
>  wrote:
> > Apart from the complexity there's also a possibility that this
> > skipping will reduce the efficiency actually in normal cases. Consider
> > a case where A and B have exactly matching partitions. Current
> > partition matching algorithm compare any given range/bound only once
> > and we will have O(n) algorithm. If we use a binary search, however,
> > for every range comparison, it will be O(n log n) algorithm. There
> > will be unnecessary comparisons during binary search. The current
> > algorithm is always O(n), whereas binary search would be O(n log(n))
> > with a possibility of having sub-O(n) complexity in some cases. I
> > would go for an algorithm which has a consistent time complexity
> > always and which is efficient in normal cases, rather than worrying
> > about some cases which are not practical.
>
> Yeah, I think that's a good point.  The normal case here will be that
> the partition bounds are equal, or that there are a few extra
> partitions on one side that don't exist on the other.  We don't want
> other cases to be crazily inefficient, but I suspect in practice that
> if the partitioning bounds aren't pretty close to matching up exactly,
> we're going to end up failing to be able to do a partition-wise join
> at all.  It's not very likely that somebody happens to have a case
> where they've partitioned two tables in randomly different ways, but
> then they decide to join them anyway, but then it turns out that the
> partition bounds happen to be compatible enough that this algorithm
> works.

> On Mon, 23 Jul 2018 at 10:38, Ashutosh Bapat 
>  wrote:
> 1. those cases are rare enough that we can ignore those right now. How
> many times we would encounter the case you have quoted, for example?
> Usually the ranges will be continuous only differing in the first or
> last partition e.g time-series data.

Ok, but what about list partitioning? I believe the area of application for it
can be more diverse than mostly just for time-series, and in the patch almost
the same approach is used to merge list partitions.

Other than that everything seems fine from functionality point of view, and so
far I couldn't find any situation that produces a wrong plan. Some of the joins
were not that intuitive from the first glance, but at the end everything was
according to the documentation.
Taking this into account I wonder if it's possible somehow to give any hints in
an explain result about why it wasn't possible to apply partition wise join? If
something wasn't clear for me I ended up looking at the value of "merged" flag,
and to figure out actual reason one needs to trace the entire algorithm.

Besides that I checked the performance in some simple cases, no problems on
this side (but I'll also do some checks for more complicated joins).

And I still have some complaints about readability, although I can imagine that
it's just me:

* Many functions carry some unrelated arguments just to pass them through,
  which obscures the purpose of a function.

* I want to suggest to introduce a new data structure for partitions mapping
  instead of just keeping them in arrays (was it discussed already before?).

* What is the reason that almost everywhere in the patch there is a naming
  convention "outer/inner" partition maps, and sometimes (e.g. in
  generate_matching_part_pairs) it's "map1/map2"?

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-07-27 Thread Robert Haas

On Fri, Jul 27, 2018 at 3:17 AM, Ashutosh Bapat
 wrote:
> Apart from the complexity there's also a possibility that this
> skipping will reduce the efficiency actually in normal cases. Consider
> a case where A and B have exactly matching partitions. Current
> partition matching algorithm compare any given range/bound only once
> and we will have O(n) algorithm. If we use a binary search, however,
> for every range comparison, it will be O(n log n) algorithm. There
> will be unnecessary comparisons during binary search. The current
> algorithm is always O(n), whereas binary search would be O(n log(n))
> with a possibility of having sub-O(n) complexity in some cases. I
> would go for an algorithm which has a consistent time complexity
> always and which is efficient in normal cases, rather than worrying
> about some cases which are not practical.

Yeah, I think that's a good point.  The normal case here will be that
the partition bounds are equal, or that there are a few extra
partitions on one side that don't exist on the other.  We don't want
other cases to be crazily inefficient, but I suspect in practice that
if the partitioning bounds aren't pretty close to matching up exactly,
we're going to end up failing to be able to do a partition-wise join
at all.  It's not very likely that somebody happens to have a case
where they've partitioned two tables in randomly different ways, but
then they decide to join them anyway, but then it turns out that the
partition bounds happen to be compatible enough that this algorithm
works.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-07-27 Thread Ashutosh Bapat

On Thu, Jul 26, 2018 at 8:37 PM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> On Mon, 23 Jul 2018 at 10:38, Ashutosh Bapat 
>>  wrote:
>>
>> On Fri, Jul 20, 2018 at 3:13 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> >
>> > It's of course wrong, it's going to be O(max(m, n)) as you said, but
>> > the point is still valid - if we have partitions A1, A2 from one side
>> > and B1, ..., BN on another side, we can skip necessary the
>> > partitions from B that are between e.g. A1 and A2 faster with
>> > binary search.
>>
>> That's possible only when there is no default partition and the join
>> is an inner join. For an outer join, we need to include all the
>> partitions on the outer side, so we can't just skip over some
>> partitions. In case of a default partition, it can take place of a
>> missing partition, so we can't skip partitions using binary search.
>
> But in this case I described above (when we have a partition A3 on one side,
> and another matching partition B3 from other side, but separated by some other
> partitions, e.g. B1, B2) it means that we will merge A3 with a default
> partition from other side without actually needing that, am I right? In this
> sense it's an overhead out of nothing.

No. We will join A3 with B3 since they have matching bounds. We will
compare B1 and B2's bounds with A3 (assuming that there are no bounds
before A3). Since they won't be compatible we will match default of A
to B1 and B2. That will of-course fail since we will try to match
multiple partitions to a single partition, but that's not the point of
your question I think. You are right that we could skip comparing A3
with B1 and B2 and directly land on B3. Any partitions skipped in
between will be matched with A's default partition. But as I have said
this would be rare and complicating the logic for a rare case doesn't
seem practical at this stage.

Apart from the complexity there's also a possibility that this
skipping will reduce the efficiency actually in normal cases. Consider
a case where A and B have exactly matching partitions. Current
partition matching algorithm compare any given range/bound only once
and we will have O(n) algorithm. If we use a binary search, however,
for every range comparison, it will be O(n log n) algorithm. There
will be unnecessary comparisons during binary search. The current
algorithm is always O(n), whereas binary search would be O(n log(n))
with a possibility of having sub-O(n) complexity in some cases. I
would go for an algorithm which has a consistent time complexity
always and which is efficient in normal cases, rather than worrying
about some cases which are not practical.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-07-26 Thread Dmitry Dolgov

> On Mon, 23 Jul 2018 at 10:38, Ashutosh Bapat 
>  wrote:
>
> On Fri, Jul 20, 2018 at 3:13 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> >
> > It's of course wrong, it's going to be O(max(m, n)) as you said, but
> > the point is still valid - if we have partitions A1, A2 from one side
> > and B1, ..., BN on another side, we can skip necessary the
> > partitions from B that are between e.g. A1 and A2 faster with
> > binary search.
>
> That's possible only when there is no default partition and the join
> is an inner join. For an outer join, we need to include all the
> partitions on the outer side, so we can't just skip over some
> partitions. In case of a default partition, it can take place of a
> missing partition, so we can't skip partitions using binary search.

But in this case I described above (when we have a partition A3 on one side,
and another matching partition B3 from other side, but separated by some other
partitions, e.g. B1, B2) it means that we will merge A3 with a default
partition from other side without actually needing that, am I right? In this
sense it's an overhead out of nothing.

> 1. those cases are rare enough that we can ignore those right now. How
> many times we would encounter the case you have quoted, for example?
> Usually the ranges will be continuous only differing in the first or
> last partition e.g time-series data.

Well, unfortunately, I don't have enough context to discuss whether it's rare
or not.

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-07-23 Thread Ashutosh Bapat

On Fri, Jul 20, 2018 at 3:13 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> It's of course wrong, it's going to be O(max(m, n)) as you said, but
> the point is still valid - if we have partitions A1, A2 from one side
> and B1, ..., BN on another side, we can skip necessary the
> partitions from B that are between e.g. A1 and A2 faster with
> binary search.

That's possible only when there is no default partition and the join
is an inner join. For an outer join, we need to include all the
partitions on the outer side, so we can't just skip over some
partitions. In case of a default partition, it can take place of a
missing partition, so we can't skip partitions using binary search.
The code right now works for all the cases and is O(n). I agree that
it can be optimized for certain cases, but
1. those cases are rare enough that we can ignore those right now. How
many times we would encounter the case you have quoted, for example?
Usually the ranges will be continuous only differing in the first or
last partition e.g time-series data.
2. The code is enough complex right now and it's also a lot. Making it
complicated further is not the worth the rare use cases. If we get
complaints from the field, we can certainly improve it in future. But
I would wait for those complaints before improving it further.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-07-19 Thread Dmitry Dolgov

> On Thu, 19 Jul 2018 at 21:04, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > > * Just to clarify - the iterating through all the partitions, is it the 
> > > best
> > >   way of finding matching ranges? Correct me if I'm wrong, but from what 
> > > I see
> > >   in the comments for PartitionBoundInfoData, all the ranges are arranged 
> > > in
> > >   increasing order - maybe then it's possible to use binary search for 
> > > finding
> > >   a matching range?
> >
> > The loop in partition_range_bounds_merge() runs as many times as the
> > maximum of number of datums from the given partition bounds. So the
> > complexity is O(n) where n is the maximum of number of datums. With
> > binary search the complexity will increase to O(nlogn). I might be
> > missing something here.
>
> Now I'm confused even more. Correct me if I'm wrong, but here is what I see
> right now:
>
> * We're trying to solve the standard problem of finding overlapping intervals
>   from two sets of intervals
>
> * The current implementation implicitly compares every range from one side of 
> a
>   join with every range from another side, which is O(n^2).

It's of course wrong, it's going to be O(max(m, n)) as you said, but
the point is still valid - if we have partitions A1, A2 from one side
and B1, ..., BN on another side, we can skip necessary the
partitions from B that are between e.g. A1 and A2 faster with
binary search.

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-07-19 Thread Dmitry Dolgov

> On Tue, 17 Jul 2018 at 11:58, Ashutosh Bapat 
>  wrote:
>
> On Sun, Jul 15, 2018 at 11:13 PM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> >> On Thu, 28 Jun 2018 at 07:54, Amit Langote  
> >> wrote:
> >>
> >> On 2018/06/27 22:21, Ashutosh Bapat wrote:
> >> > On Wed, Jun 27, 2018 at 12:28 PM, Amit Langote
> >> >> Ah, okay.  I thought of reporting this because I felt the errors may 
> >> >> have
> >> >> to do with changes to the related code in HEAD between May 14 when you
> >> >> last posted the patches and today that you may need to account for in 
> >> >> you
> >> >> patches.  For instance, there are many diffs like this:
> >> >>
> >> >> Looks like the Result node on top of Append is no longer there after
> >> >> applying your patch.
> >> >
> >> > Yes. They are coming because of a commit which removed Result node on
> >> > top of an Append node. I don't remember exactly which.
> >> >
> >> > I wouldn't worry about those diffs at this time. As I have mentioned
> >> > in earlier mails, the expected output from 0006 is quite large and is
> >> > not supposed to be committed. So, I don't see much value in fixing the
> >> > plans in that output.
> >> >
> >> > Do you see that as a hindrance in reviewing the code changes and tests 
> >> > in 0005?
> >>
> >> I think not.  I'll ignore 0006 for now and focus on other patches.
> >
> > Hi,
> >
> > Sorry for my irregular reviews. Unfortunately, the patch need to be rebased
> > again.
>
> I rebased the patches without any problem on top of commit

Thanks!

> > In the meantime I have few more commentaries about range_bounds_merge:
> >
> > * From what I see partition_range_bound_cmp is executed on the same bounds 
> > few
> >   times to find whether they are overlapping and during the range merging, 
> > is
> >   it necessary? Probably it would be enough just to compare current ranges 
> > once
> >   per iteration.
>
> The bounds that are compared in those cases are different. Any two
> bounds are compared only once per iteration. Remember each range has
> two bounds, thus there are total four comparisons possible. In case of
> overlap, we do all four comparisons.

Yep, indeed, now I see.

> > * Just to clarify - the iterating through all the partitions, is it the best
> >   way of finding matching ranges? Correct me if I'm wrong, but from what I 
> > see
> >   in the comments for PartitionBoundInfoData, all the ranges are arranged in
> >   increasing order - maybe then it's possible to use binary search for 
> > finding
> >   a matching range?
>
> The loop in partition_range_bounds_merge() runs as many times as the
> maximum of number of datums from the given partition bounds. So the
> complexity is O(n) where n is the maximum of number of datums. With
> binary search the complexity will increase to O(nlogn). I might be
> missing something here.

Now I'm confused even more. Correct me if I'm wrong, but here is what I see
right now:

* We're trying to solve the standard problem of finding overlapping intervals
  from two sets of intervals

* The current implementation implicitly compares every range from one side of a
  join with every range from another side, which is O(n^2).

Let's imagine we have two tables:

=# \d+ test1_overlap
   Table "public.test1_overlap"
 Column |  Type   | Collation | Nullable | Default | Storage  |
+-+---+--+-+--+
 id | integer |   |  | | plain|
 data   | jsonb   |   |  | | extended |
Partition key: RANGE (id)
Partitions: test11_overlap FOR VALUES FROM (200) TO (210),
test12_overlap FOR VALUES FROM (220) TO (230),
test13_overlap FOR VALUES FROM (240) TO (250)

=# \d+ test2_overlap
   Table "public.test2_overlap"
 Column |  Type   | Collation | Nullable | Default | Storage  |
+-+---+--+-+--+
 id | integer |   |  | | plain|
 data   | jsonb   |   |  | | extended |
Partition key: RANGE (id)
Partitions: test210_overlap FOR VALUES FROM (160) TO (170),
test211_overlap FOR VALUES FROM (180) TO (190),
test21_overlap FOR VALUES FROM (0) TO (10),
test22_overlap FOR VALUES FROM (20) TO (30),
test23_overlap FOR VALUES FROM (200) TO (210),
test24_overlap FOR VALUES FROM (40) TO (50),
test25_overlap FOR VALUES FROM (60) TO (70),
test26_overlap FOR VALUES FROM (80) TO (90),
test27_overlap FOR VALUES FROM (100) TO (110),
test28_overlap FOR VALUES FROM (120) TO (130),
test29_overlap FOR VALUES FROM (140) TO (150)

And the join:

select * from test1_overlap inner join test2_overlap using (id);

Partition wise join will work fine, but what will happen (I see this following
the code under gdb) is that inside the function partition_range_bounds_merge we
start from two partitions:

test11_overlap (200, 210) and

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-07-16 Thread Robert Haas

On Sun, Jul 15, 2018 at 1:43 PM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Partitions: test11 FOR VALUES FROM (0) TO (100),
> test12 FOR VALUES FROM (100) TO (200),
> test13 FOR VALUES FROM (200) TO (300)
>
> Partitions: test21 FOR VALUES FROM (10) TO (110),
> test22 FOR VALUES FROM (110) TO (210),
> test23 FOR VALUES FROM (210) TO (310)
>
> I'm confused, since there is only one-to-one mapping of partitions.

No, test21 would have to be joined to both test11 and test12, since
either could contain matching rows.  Also, test22 would have to be
joined to both test12 and test13.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-07-15 Thread Dmitry Dolgov

> On Thu, 28 Jun 2018 at 07:54, Amit Langote  
> wrote:
>
> On 2018/06/27 22:21, Ashutosh Bapat wrote:
> > On Wed, Jun 27, 2018 at 12:28 PM, Amit Langote
> >> Ah, okay.  I thought of reporting this because I felt the errors may have
> >> to do with changes to the related code in HEAD between May 14 when you
> >> last posted the patches and today that you may need to account for in you
> >> patches.  For instance, there are many diffs like this:
> >>
> >> Looks like the Result node on top of Append is no longer there after
> >> applying your patch.
> >
> > Yes. They are coming because of a commit which removed Result node on
> > top of an Append node. I don't remember exactly which.
> >
> > I wouldn't worry about those diffs at this time. As I have mentioned
> > in earlier mails, the expected output from 0006 is quite large and is
> > not supposed to be committed. So, I don't see much value in fixing the
> > plans in that output.
> >
> > Do you see that as a hindrance in reviewing the code changes and tests in 
> > 0005?
>
> I think not.  I'll ignore 0006 for now and focus on other patches.

Hi,

Sorry for my irregular reviews. Unfortunately, the patch need to be rebased
again. In the meantime I have few more commentaries about range_bounds_merge:

* From what I see partition_range_bound_cmp is executed on the same bounds few
  times to find whether they are overlapping and during the range merging, is
  it necessary? Probably it would be enough just to compare current ranges once
  per iteration.

* Just to clarify - the iterating through all the partitions, is it the best
  way of finding matching ranges? Correct me if I'm wrong, but from what I see
  in the comments for PartitionBoundInfoData, all the ranges are arranged in
  increasing order - maybe then it's possible to use binary search for finding
  a matching range?

* I'm not sure why in this case partition wise join is not applied? Maybe I'm
  wrong, but I was under the impression that it should work here

=# \d+ test1;
Table "public.test1"
 Column |  Type   | Collation | Nullable | Default | Storage  |
+-+---+--+-+--+
 data   | jsonb   |   |  | | extended |
 id | integer |   |  | | plain|
Partition key: RANGE (id)
Indexes:
"test1_idx" btree (id)
Partitions: test11 FOR VALUES FROM (0) TO (100),
test12 FOR VALUES FROM (100) TO (200),
test13 FOR VALUES FROM (200) TO (300)

=# \d+ test2;
Table "public.test2"
 Column |  Type   | Collation | Nullable | Default | Storage  |
+-+---+--+-+--+
 data   | jsonb   |   |  | | extended |
 id | integer |   |  | | plain|
Partition key: RANGE (id)
Indexes:
"test2_idx" btree (id)
Partitions: test21 FOR VALUES FROM (10) TO (110),
test22 FOR VALUES FROM (110) TO (210),
test23 FOR VALUES FROM (210) TO (310)

=# set enable_partitionwise_join to true;

=# explain analyze select * from test1 t1 inner join test2 t2 using (id);
QUERY PLAN
---
 Hash Join  (cost=3.25..6.56 rows=9 width=54)
 (actual time=0.082..0.105 rows=3 loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Append  (cost=0.00..3.18 rows=12 width=29)
   (actual time=0.026..0.047 rows=12 loops=1)
 ->  Seq Scan on test11 t1  (cost=0.00..1.05 rows=5 width=29)
(actual time=0.025..0.028 rows=5 loops=1)
 ->  Seq Scan on test12 t1_1 (cost=0.00..1.04 rows=4 width=29)
 (actual time=0.006..0.0 07 rows=4 loops=1)
 ->  Seq Scan on test13 t1_2 (cost=0.00..1.03 rows=3 width=29)
 (actual time=0.005..0.0 06 rows=3 loops=1)
   ->  Hash  (cost=3.13..3.13 rows=9 width=29)
 (actual time=0.033..0.033 rows=9 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Append  (cost=0.00..3.13 rows=9 width=29)
 (actual time=0.006..0.022 rows=9 loops=1)
   ->  Seq Scan on test21 t2
   (cost=0.00..1.03 rows=3 width=29)
   (actual time=0.005.  .0.008 rows=3 loops=1)
   ->  Seq Scan on test22 t2_1
   (cost=0.00..1.03 rows=3 width=29)
   (actual time=0.00 4..0.005 rows=3 loops=1)
   ->  Seq Scan on test23 t2_2
   (cost=0.00..1.03 rows=3 width=29)
   (actual time=0.00 3..0.004 rows=3 loops=1)
 Planning Time: 0.921 ms
 Execution Time: 0.261 ms
(14 rows)

=# set enable_partitionwise_join to false;
=# explain analyze select * from test1 t1 inner join test2 t2 using (id);
QUERY PLAN

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-06-27 Thread Amit Langote

On 2018/06/27 22:21, Ashutosh Bapat wrote:
> On Wed, Jun 27, 2018 at 12:28 PM, Amit Langote
>> Ah, okay.  I thought of reporting this because I felt the errors may have
>> to do with changes to the related code in HEAD between May 14 when you
>> last posted the patches and today that you may need to account for in you
>> patches.  For instance, there are many diffs like this:
>>
>> Looks like the Result node on top of Append is no longer there after
>> applying your patch.
> 
> Yes. They are coming because of a commit which removed Result node on
> top of an Append node. I don't remember exactly which.
> 
> I wouldn't worry about those diffs at this time. As I have mentioned
> in earlier mails, the expected output from 0006 is quite large and is
> not supposed to be committed. So, I don't see much value in fixing the
> plans in that output.
> 
> Do you see that as a hindrance in reviewing the code changes and tests in 
> 0005?

I think not.  I'll ignore 0006 for now and focus on other patches.

Thanks,
Amit

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-06-27 Thread Ashutosh Bapat

On Wed, Jun 27, 2018 at 12:28 PM, Amit Langote
 wrote:
> On 2018/06/26 18:02, Ashutosh Bapat wrote:
>> On Tue, Jun 26, 2018 at 2:27 PM, Amit Langote
>>  wrote:
>>> Hi Ashutosh,
>>>
>>> On 2018/05/14 20:14, Ashutosh Bapat wrote:
 0001-Hash-partition-bound-equality-refactoring.patch
 0002-Targetlist-of-a-child-join-is-produced-by-translatin.patch
 0003-Partition-wise-join-for-1-1-1-0-0-1-partition-matchi.patch
 0004-Add-a-debug-message-to-notify-whether-partition-wise.patch
 0005-Tests-for-0-1-1-1-and-1-0-partition-matching.patch
 0006-Extra-extensive-tests-for-advanced-partition-matchin.patch
>>>
>>> I noticed after *cleanly* applying 0001-0004 to today's HEAD that while
>>> 0005's test all pass, there are many failures in 0006's tests.  Maybe, you
>>> need to adjust something in one of the patches or adjust test outputs.
>>
>> If the failures are because of plan changes, it's expected. If those
>> are because of crashes or changed output, those need to be fixed. I
>> have kept that patch to notice any crashes or output changes, in which
>> case, I pull that test into 0005 test set after fixing the code. Once
>> we are near commit, I will remove that patch from the patchset.
>
> Ah, okay.  I thought of reporting this because I felt the errors may have
> to do with changes to the related code in HEAD between May 14 when you
> last posted the patches and today that you may need to account for in you
> patches.  For instance, there are many diffs like this:
>
> ***
> *** 90,132 
>   -- left outer join, with whole-row reference
>   EXPLAIN (COSTS OFF)
>   SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b =
> 0 ORDER BY t1.a, t1.b, t1.c, t2.a, t2.b, t2.c;
> !QUERY PLAN
> ! 
>Sort
>  Sort Key: t1.a, t1.c, t2.a, t2.b, t2.c
> !->  Result
> !  ->  Append
> !->  Hash Right Join
> !  Hash Cond: (t2.b = t1.a)
> !  ->  Seq Scan on prt2_p0 t2
> !  ->  Hash
> !->  Seq Scan on prt1_p0 t1
> 
>
> --- 90,131 
>   -- left outer join, with whole-row reference
>   EXPLAIN (COSTS OFF)
>   SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b =
> 0 ORDER BY t1.a, t1.b, t1.c, t2.a, t2.b, t2.c;
> ! QUERY PLAN
> ! --
>Sort
>  Sort Key: t1.a, t1.c, t2.a, t2.b, t2.c
> !->  Append
> !  ->  Hash Right Join
> !Hash Cond: (t2.b = t1.a)
> !->  Seq Scan on prt2_p0 t2
> !->  Hash
> !  ->  Seq Scan on prt1_p0 t1
> !Filter: (b = 0)
>
> Looks like the Result node on top of Append is no longer there after
> applying your patch.

Yes. They are coming because of a commit which removed Result node on
top of an Append node. I don't remember exactly which.

I wouldn't worry about those diffs at this time. As I have mentioned
in earlier mails, the expected output from 0006 is quite large and is
not supposed to be committed. So, I don't see much value in fixing the
plans in that output.

Do you see that as a hindrance in reviewing the code changes and tests in 0005?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-06-27 Thread Amit Langote

On 2018/06/26 18:02, Ashutosh Bapat wrote:
> On Tue, Jun 26, 2018 at 2:27 PM, Amit Langote
>  wrote:
>> Hi Ashutosh,
>>
>> On 2018/05/14 20:14, Ashutosh Bapat wrote:
>>> 0001-Hash-partition-bound-equality-refactoring.patch
>>> 0002-Targetlist-of-a-child-join-is-produced-by-translatin.patch
>>> 0003-Partition-wise-join-for-1-1-1-0-0-1-partition-matchi.patch
>>> 0004-Add-a-debug-message-to-notify-whether-partition-wise.patch
>>> 0005-Tests-for-0-1-1-1-and-1-0-partition-matching.patch
>>> 0006-Extra-extensive-tests-for-advanced-partition-matchin.patch
>>
>> I noticed after *cleanly* applying 0001-0004 to today's HEAD that while
>> 0005's test all pass, there are many failures in 0006's tests.  Maybe, you
>> need to adjust something in one of the patches or adjust test outputs.
> 
> If the failures are because of plan changes, it's expected. If those
> are because of crashes or changed output, those need to be fixed. I
> have kept that patch to notice any crashes or output changes, in which
> case, I pull that test into 0005 test set after fixing the code. Once
> we are near commit, I will remove that patch from the patchset.

Ah, okay.  I thought of reporting this because I felt the errors may have
to do with changes to the related code in HEAD between May 14 when you
last posted the patches and today that you may need to account for in you
patches.  For instance, there are many diffs like this:

***
*** 90,132 
  -- left outer join, with whole-row reference
  EXPLAIN (COSTS OFF)
  SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b =
0 ORDER BY t1.a, t1.b, t1.c, t2.a, t2.b, t2.c;
!QUERY PLAN
! 
   Sort
 Sort Key: t1.a, t1.c, t2.a, t2.b, t2.c
!->  Result
!  ->  Append
!->  Hash Right Join
!  Hash Cond: (t2.b = t1.a)
!  ->  Seq Scan on prt2_p0 t2
!  ->  Hash
!->  Seq Scan on prt1_p0 t1


--- 90,131 
  -- left outer join, with whole-row reference
  EXPLAIN (COSTS OFF)
  SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b =
0 ORDER BY t1.a, t1.b, t1.c, t2.a, t2.b, t2.c;
! QUERY PLAN
! --
   Sort
 Sort Key: t1.a, t1.c, t2.a, t2.b, t2.c
!->  Append
!  ->  Hash Right Join
!Hash Cond: (t2.b = t1.a)
!->  Seq Scan on prt2_p0 t2
!->  Hash
!  ->  Seq Scan on prt1_p0 t1
!Filter: (b = 0)

Looks like the Result node on top of Append is no longer there after
applying your patch.

Thanks,
Amit

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-06-26 Thread Ashutosh Bapat

On Tue, Jun 26, 2018 at 2:27 PM, Amit Langote
 wrote:
> Hi Ashutosh,
>
> On 2018/05/14 20:14, Ashutosh Bapat wrote:
>> 0001-Hash-partition-bound-equality-refactoring.patch
>> 0002-Targetlist-of-a-child-join-is-produced-by-translatin.patch
>> 0003-Partition-wise-join-for-1-1-1-0-0-1-partition-matchi.patch
>> 0004-Add-a-debug-message-to-notify-whether-partition-wise.patch
>> 0005-Tests-for-0-1-1-1-and-1-0-partition-matching.patch
>> 0006-Extra-extensive-tests-for-advanced-partition-matchin.patch
>
> I noticed after *cleanly* applying 0001-0004 to today's HEAD that while
> 0005's test all pass, there are many failures in 0006's tests.  Maybe, you
> need to adjust something in one of the patches or adjust test outputs.

If the failures are because of plan changes, it's expected. If those
are because of crashes or changed output, those need to be fixed. I
have kept that patch to notice any crashes or output changes, in which
case, I pull that test into 0005 test set after fixing the code. Once
we are near commit, I will remove that patch from the patchset.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-06-26 Thread Amit Langote

Hi Ashutosh,

On 2018/05/14 20:14, Ashutosh Bapat wrote:
> 0001-Hash-partition-bound-equality-refactoring.patch
> 0002-Targetlist-of-a-child-join-is-produced-by-translatin.patch
> 0003-Partition-wise-join-for-1-1-1-0-0-1-partition-matchi.patch
> 0004-Add-a-debug-message-to-notify-whether-partition-wise.patch
> 0005-Tests-for-0-1-1-1-and-1-0-partition-matching.patch
> 0006-Extra-extensive-tests-for-advanced-partition-matchin.patch

I noticed after *cleanly* applying 0001-0004 to today's HEAD that while
0005's test all pass, there are many failures in 0006's tests.  Maybe, you
need to adjust something in one of the patches or adjust test outputs.

Thanks,
Amit

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-05-06 Thread Dmitry Dolgov

> On 3 April 2018 at 15:34, Ashutosh Bapat  
> wrote:
> On Fri, Mar 30, 2018 at 7:36 PM, Ashutosh Bapat
>  wrote:
>>
>> I am working on commenting portions of the code to make it more clear
>> and readable. Will update the patches with the comments soon, mostly
>> this Monday.
>>
>
> Here's set of patches rebased on the latest head. I have added a lot
> of comments and revised a lot of code to avoid duplication, causing a
> net reduction in the number of lines.

Thank you. Unfortunately, there are already few more conflicts since last time,
could you rebase again?

In the meantime, I'm a bit confused by `merge_null_partitions` and why
partitions with NULL values should be treated separately? It became even more
confusing when I looked at where this functions is used:

/* If merge is unsuccessful, bail out without any further processing. */
if (merged)
merged = merge_null_partitions(outer_bi, outer_pmap, outer_mmap,
   inner_bi, inner_pmap, inner_mmap,
   jointype, _index, _index,
   _index);

Is this commentary correct?

Also, I don't see anywhere in PostgreSQL itself or in this patch a definition
of the term "NULL partition", can you add it, just to make things clear?

Another question, I see this pattern a lot here when there is a code like:

if (!outer_has_something && inner_has_something)
{
// some logic
}
else if (outer_has_something && !inner_has_something)
{
// some logic symmetric to what we have above
}

By symmetric I mean that the code is more or less the same, just "inner"
variables were changed to "outer" (and required types of joins are opposite).
Is it feasible to actually use completely the same logic, and just change
"inner"/"outer" variables instead?

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-03-30 Thread Ashutosh Bapat

Thanks for your review comments. They are helpful.

On Thu, Mar 29, 2018 at 2:53 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:

> But actually you also mentioned
> another topic that bothers me about this patch. Different parts of the
> algorithm implementation (at least for functions that build maps of matching
> partitions) are quite dependent on each other in terms of shared state. At
> first glance in `partition_range_bounds_merge` we have about a dozen of
> variables of different mutability level, that affect the control flow:
>
> outer_lb_index
> inner_lb_index
> merged
> merged_index
> overlap
> merged_lb
> merged_ub
> finished_outer
> finished_inner
> ub_cmpval
> lb_cmpval
> inner_has_default
> outer_has_default
> jointype
>
> It looks a bit too much for me, and would require commentaries like "if you
> changed the logic here, also take a look there". But I'm not saying that I 
> have
> any specific suggestions how to change it (although I'll definitely try to do
> so, at least to get some better understanding of the underlying algorithm).

I am working on commenting portions of the code to make it more clear
and readable. Will update the patches with the comments soon, mostly
this Monday.

>
> Just to make myself clear, I wanted to suggest not to change the commentary 
> for
> `IS_PARTITIONED_REL` significantly, but just add a sentence that you need to
> check if given relation is fully set up.

For that we have REL_HAS_ALL_PART_PROPS. IS_PARTITIONED_REL is really
checking whether a relation is partitioned and following comment makes
it clear why we have to check all those things. So, I still don't see
why we need to change the comment there.

 * It's not enough to test whether rel->part_scheme is set, because it might
 * be that the basic partitioning properties of the input relations matched
 * but the partition bounds did not.

> Also, few more random notes (mostly related to readability, since I found some
> parts of the patch hard to read, but of course it's arguable).
>
> ```
> PartitionRangeBound outer_lb;
> PartitionRangeBound outer_ub;
> PartitionRangeBound inner_lb;
> PartitionRangeBound inner_ub;
> PartitionRangeBound *merged_lb = NULL;
> PartitionRangeBound *merged_ub = NULL;
> ```
>
> Maybe it would be better to not repeat the type here? Let's say:
>
> ```
> PartitionRangeBound outer_lb,
> outer_ub,

PostgreSQL code uses both the styles, but I like declarations with the
type is right there along side the variable. It also makes it easy to
delete the variable or modify its type, without causing unnecessary
diff on adjacent lines.

> ```
> partition_range_bounds_merge(int partnatts, FmgrInfo *partsupfuncs,
>  Oid *partcollations, PartitionBoundInfo outer_bi,
>  int outer_nparts, PartitionBoundInfo inner_bi,
>  int inner_nparts, JoinType jointype,
>  List **outer_parts, List **inner_parts)
> ```
>
> From what I see in `partition.c` there are a lot functions that accept
> `partnatts`, `partcollations` only to pass it down to, e.g.
> `partition_rbound_cmp`.
> What do you think about introducing a data structure to keep these arguments,
> and pass only an instance of this structure instead?

I have discussed this in an adjacent thread, but let me repeat it
here. Those members come from PartitionKey or PartitionScheme. If we
pack those in a structure and not include it in those two
structures,we will just create a structure before calling the function
and unpack those inside the comparison function e.g.
partition_rbound_cmp(). We could argue that that will benefit
intermediate functions like partition_range_bound_cmp(), which just
pass those values down, but there is all the possibility that its
future caller may not have that packing structure available readily.
So, I am inclined not to add a new structure just for this.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-03-28 Thread Dmitry Dolgov

> On 22 March 2018 at 14:18, Ashutosh Bapat  
> wrote:
> On Thu, Mar 22, 2018 at 4:32 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>>> On 12 March 2018 at 06:00, Ashutosh Bapat  
>>> wrote:
>>> Thanks for the note. Here are rebased patches.
>>
>> Since I started to look at this patch, I can share few random notes (although
>> it's not a complete review, I'm in the progress now), maybe it can be 
>> helpful.
>>
>> In `partition_range_bounds_merge`
>>
>> + if (!merged)
>> + break;
>>
>> is a bit redundant I think, because every time `merged` set to false it
>> followed by break.
>
> Yes, right now. May be I should turn it into Assert(merged); What do you 
> think?

Thank you for reply. Yes, that sounds good. But actually you also mentioned
another topic that bothers me about this patch. Different parts of the
algorithm implementation (at least for functions that build maps of matching
partitions) are quite dependent on each other in terms of shared state. At
first glance in `partition_range_bounds_merge` we have about a dozen of
variables of different mutability level, that affect the control flow:

outer_lb_index
inner_lb_index
merged
merged_index
overlap
merged_lb
merged_ub
finished_outer
finished_inner
ub_cmpval
lb_cmpval
inner_has_default
outer_has_default
jointype

It looks a bit too much for me, and would require commentaries like "if you
changed the logic here, also take a look there". But I'm not saying that I have
any specific suggestions how to change it (although I'll definitely try to do
so, at least to get some better understanding of the underlying algorithm).

>>
>> I've noticed that in some places `IS_PARTITIONED_REL` was replaced
>>
>> - if (!IS_PARTITIONED_REL(joinrel))
>> + if (joinrel->part_scheme == NULL)
>>
>> but I'm not quite follow why? Is it because `boundinfo` is not available
>> anymore at this point? If so, maybe it makes sense to update the commentary 
>> for
>> this macro and mention to not use for joinrel.
>
>
> This is done in try_partitionwise_join(). As explained in the comment
>
>  * Get the list of matching partitions to be joined along with the
>  * partition bounds of the join relation. Because of the restrictions
>  * imposed by partition matching algorithm, not every pair of joining
>  * relations for this join will be able to use partition-wise join. But 
> all
>  * those pairs which can use partition-wise join will produce the same
>  * partition bounds for the join relation.
>
> boundinfo for the join relation is built in this function. So, we
> don't have join relation's partitioning information fully set up yet.
> So we can't use IS_PARTITIONED_REL() there. joinrel->part_scheme if
> set tells that the joining relations have matching partition schemes
> and thus the join relation can possibly use partition-wise join
> technique. If it's not set, then we can't use partition-wise join.
>
> But IS_PARTITIONED_REL() is still useful at a number of other places,
> where it's known to encounter a RelOptInfo whose partitioning
> properties are fully setup. So, I don't think we should change the
> macro or the comments above it.

Just to make myself clear, I wanted to suggest not to change the commentary for
`IS_PARTITIONED_REL` significantly, but just add a sentence that you need to
check if given relation is fully set up.

Also, few more random notes (mostly related to readability, since I found some
parts of the patch hard to read, but of course it's arguable).

```
PartitionRangeBound outer_lb;
PartitionRangeBound outer_ub;
PartitionRangeBound inner_lb;
PartitionRangeBound inner_ub;
PartitionRangeBound *merged_lb = NULL;
PartitionRangeBound *merged_ub = NULL;
```

Maybe it would be better to not repeat the type here? Let's say:

```
PartitionRangeBound outer_lb,
outer_ub,
...
```

It's just too long and distracting.

```
partition_range_bounds_merge(int partnatts, FmgrInfo *partsupfuncs,
 Oid *partcollations, PartitionBoundInfo outer_bi,
 int outer_nparts, PartitionBoundInfo inner_bi,
 int inner_nparts, JoinType jointype,
 List **outer_parts, List **inner_parts)
```

>From what I see in `partition.c` there are a lot functions that accept
`partnatts`, `partcollations` only to pass it down to, e.g.
`partition_rbound_cmp`.
What do you think about introducing a data structure to keep these arguments,
and pass only an instance of this structure instead?

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-03-22 Thread Ashutosh Bapat

On Thu, Mar 22, 2018 at 4:32 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> On 12 March 2018 at 06:00, Ashutosh Bapat  
>> wrote:
>> Thanks for the note. Here are rebased patches.
>
> Since I started to look at this patch, I can share few random notes (although
> it's not a complete review, I'm in the progress now), maybe it can be helpful.
>
> In `partition_range_bounds_merge`
>
> + if (!merged)
> + break;
>
> is a bit redundant I think, because every time `merged` set to false it
> followed by break.

Yes, right now. May be I should turn it into Assert(merged); What do you think?

>
> At the end same function
>
> + if (merged)
> + merged = merge_default_partitions(outer_bi, outer_pmap, outer_mmap,
> + inner_bi, inner_pmap, inner_mmap,
> + jointype, _index,
> + _index);
>
> Looks like I misunderstand something in the algorithm, but aren't default
> partitions were already processed before e.g. here:
>
> + /*
> +  * Default partition on the outer side forms the default
> +  * partition of the join result.
> +  */

The default partition handling in the loop handles the cases of
missing partitions as explained in a comment
/*
 * If a range appears in one of the joining relations but not the other
 * (as a whole or part), the rows in the corresponding partition will
 * not have join partners in the other relation, unless the other
 * relation has a default partition.

But merge_default_partitions() tries to map the default partitions
from both the relations.

>
> Also in here
>
> + /* Free any memory we used in this function. */
> + list_free(merged_datums);
> + list_free(merged_indexes);
> + pfree(outer_pmap);
> + pfree(inner_pmap);
> + pfree(outer_mmap);
> + pfree(inner_mmap);
>
> I think `merged_kinds` is missing.

Done.

>
> I've noticed that in some places `IS_PARTITIONED_REL` was replaced
>
> - if (!IS_PARTITIONED_REL(joinrel))
> + if (joinrel->part_scheme == NULL)
>
> but I'm not quite follow why? Is it because `boundinfo` is not available
> anymore at this point? If so, maybe it makes sense to update the commentary 
> for
> this macro and mention to not use for joinrel.

This is done in try_partitionwise_join(). As explained in the comment
/*
 * Get the list of matching partitions to be joined along with the
 * partition bounds of the join relation. Because of the restrictions
 * imposed by partition matching algorithm, not every pair of joining
 * relations for this join will be able to use partition-wise join. But all
 * those pairs which can use partition-wise join will produce the same
 * partition bounds for the join relation.
 */
boundinfo for the join relation is built in this function. So, we
don't have join relation's partitioning information fully set up yet.
So we can't use IS_PARTITIONED_REL() there. joinrel->part_scheme if
set tells that the joining relations have matching partition schemes
and thus the join relation can possibly use partition-wise join
technique. If it's not set, then we can't use partition-wise join.

But IS_PARTITIONED_REL() is still useful at a number of other places,
where it's known to encounter a RelOptInfo whose partitioning
properties are fully setup. So, I don't think we should change the
macro or the comments above it.

>
> Also, as for me it would be helpful to have some commentary about this new
> `partsupfunc`, what is exactly the purpose of it (but it's maybe just me
> missing some obvious things from partitioning context)
>
> + FmgrInfo   *partsupfunc;

It's just copied from Relation::PartitionKey as is. It stores the
functions required to compare two partition key datums. Since we use
those functions frequently their FmgrInfo is cached.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-03-21 Thread Dmitry Dolgov

> On 12 March 2018 at 06:00, Ashutosh Bapat  
> wrote:
> Thanks for the note. Here are rebased patches.

Since I started to look at this patch, I can share few random notes (although
it's not a complete review, I'm in the progress now), maybe it can be helpful.

In `partition_range_bounds_merge`

+ if (!merged)
+ break;

is a bit redundant I think, because every time `merged` set to false it
followed by break.

At the end same function

+ if (merged)
+ merged = merge_default_partitions(outer_bi, outer_pmap, outer_mmap,
+ inner_bi, inner_pmap, inner_mmap,
+ jointype, _index,
+ _index);

Looks like I misunderstand something in the algorithm, but aren't default
partitions were already processed before e.g. here:

+ /*
+  * Default partition on the outer side forms the default
+  * partition of the join result.
+  */

Also in here

+ /* Free any memory we used in this function. */
+ list_free(merged_datums);
+ list_free(merged_indexes);
+ pfree(outer_pmap);
+ pfree(inner_pmap);
+ pfree(outer_mmap);
+ pfree(inner_mmap);

I think `merged_kinds` is missing.

I've noticed that in some places `IS_PARTITIONED_REL` was replaced

- if (!IS_PARTITIONED_REL(joinrel))
+ if (joinrel->part_scheme == NULL)

but I'm not quite follow why? Is it because `boundinfo` is not available
anymore at this point? If so, maybe it makes sense to update the commentary for
this macro and mention to not use for joinrel.

Also, as for me it would be helpful to have some commentary about this new
`partsupfunc`, what is exactly the purpose of it (but it's maybe just me
missing some obvious things from partitioning context)

+ FmgrInfo   *partsupfunc;

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-03-11 Thread Dmitry Dolgov

> On 26 February 2018 at 11:03, Ashutosh Bapat 
>  wrote:
> On Fri, Feb 23, 2018 at 7:35 PM, Robert Haas  wrote:
>> On Fri, Feb 16, 2018 at 12:14 AM, Ashutosh Bapat
>>  wrote:
>>> Appreciate you taking time for review.
>>>
>>> PFA updated version.
>>
>> Committed 0001.
>
> Thanks.
>
> Here's patchset rebased on the latest head. I have fixed all the
> crashes and bugs reported till now.

Hi,

I've stumbled upon this patch and noticed, that it's not compiled anymore after
2af28e6033, where `parttypcoll` was renamed to `partcollation`.

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-02-23 Thread Robert Haas

On Fri, Feb 16, 2018 at 12:14 AM, Ashutosh Bapat
 wrote:
> Appreciate you taking time for review.
>
> PFA updated version.

Committed 0001.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-02-15 Thread Amit Langote

Hi Ashutosh.

On 2018/02/09 14:27, Ashutosh Bapat wrote:
> Here's updated patch set with those comments added.

I looked at patches 0002 and 0003.

In 0002:

+ * In case of hash partition we store modulus and remainder in datums array

In case of hash partitioned table?

+ * which has the same data type irrespective of the number of partition keys
+ * and their data types. Hence we can compare the hash bound collection
without
+ * any partition key specific information.

"has the same data type" sounds like it means a Postgres data type,
whereas I think you mean that they are simple int32 values, so we don't
need any PartitionKey information to compare them.

In 0003:

A portion of code in both partition_range_bounds_merge(),
partition_list_bounds_merge(), and merge_null_partitions() has an extra
semi-colon at the end of a line starting with else if:

   if (default_index < 0)
default_index = merged_index;
else if(default_index != merged_index);
{

which emits warnings like this:

partition.c: In function ‘partition_range_bounds_merge’:
partition.c:4192:11: warning: this ‘if’ clause does not guard...
[-Wmisleading-indentation]
  else if(default_index != merged_index);

   ^~
partition.c: In function ‘partition_list_bounds_merge’:
partition.c:4261:11: warning: this ‘if’ clause does not guard...
[-Wmisleading-indentation]
  else if(default_index != merged_index);
   ^~
Also, get this warning.

partition.c:3955:1: warning: ‘is_next_range_continuous’ defined but not
used [-Wunused-function]

I'm trying to understand the optimizer side of this patch.  In your commit
message, I read:

This commit allows partition-wise join to be applied under
following conditions

1. the partition bounds of joining relations are such that rows from
given partition on one side join can join with rows from maximum one
partition on the other side i.e. bounds of a given partition on one
side match/overlap with those of maximum one partition on the other
side. If the mapping happens to be m:n where m > 1 or n > 1, we have
to gang multiple partition relations together into a single relation.
This means that we have to add simple relations during join
processing, something which is not supported right now.  ALso, in such
a case, different pairs of joining relations can produce different
partition bounds for the same join relation, which again is not
supported right now.


So, is the currently allowed case (partition bounds on both sides match
exactly) a special case of this new feature which tries to match
partitions in a more generalized manner?  I see that this patch removes
the partition_bound_equal(outer_rel->boundinfo, inner_rel->boundinfo)
check in build_joinrel_partition_info() in favor of reconciling any
differences in the representation of the partition bounds by calling
partition_bounds_merge() from try_partition_wise_join().

2. For every partition on outer side that can contribute to the result
of an OUTER side, there exists at least one (taken along with item 1,
it means exactly one)  matching partition on the inner side. To
support partition-wise join when the inner matching partition doesn't
exist, we have to add a dummy base relation corresponding to the
non-existent inner partition. We don't have support add base relations
during join processing.

Sorry but I'm a bit confused by the last sentence; does it mean we're not
able to allow partition-wise join to happen in this case?  But this is in
the list of the new cases that the patch makes partition-wise join to
happen for.


Looking at the code changes under src/backend/optimizer:

+else
+{
+Assert(partition_bounds_equal(part_scheme->partnatts,
+  part_scheme->parttyplen,
+  part_scheme->parttypbyval,
+  join_boundinfo, joinrel->boundinfo));

IIUC, this else block would run when try_partition_wise_join() is called
again for the same pair of relations.

+/*
+ * Every pair of joining relations should result in the same number
+ * of child-joins.
+ */

Sorry if I'm misreading this, but does it mean: a given pair of joining
relations should always result in the same number of (set of, too?)
child-joins?

In the new comment in build_joinrel_partition_info():

+ * Because of restrictions in partition_bounds_merge(), not every pair of
+ * joining relation

joining relations


I will try to hop into partition_bounds_merge() from now...

Thanks,
Amit

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-02-08 Thread Ashutosh Bapat

On Fri, Feb 9, 2018 at 11:26 AM, Amit Langote
 wrote:
> On 2018/02/09 14:31, Ashutosh Bapat wrote:
>>> I also noticed that a later patch adds partsupfunc to PartitionScheme,
>>> which the pruning patch needs too.  So, perhaps would be nice to take out
>>> that portion of the patch.  That is, the changes to PartitionScheme struct
>>> definition and those to find_partition_scheme().
>>
>> I am not sure whether a patch with just that change and without any
>> changes to use that member will be acceptable. So leaving this aside.
>
> I asked, because with everything that I have now changed in the partition
> pruning patch, one would need to pass these FmgrInfo pointers down to
> partition bound searching functions from the optimizer.  If the changes to
> add partsupfunc to the optimizer were taken out from your main patch, the
> pruning patch could just start using it.  For now, I'm making those
> changes part of the pruning patch.

That's fine. Someone's patch will be committed first and the other
will just take out those changes. But I am open to separate those
changes into other patch if a committer feels so.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-02-08 Thread Amit Langote

On 2018/02/09 14:31, Ashutosh Bapat wrote:
>> I also noticed that a later patch adds partsupfunc to PartitionScheme,
>> which the pruning patch needs too.  So, perhaps would be nice to take out
>> that portion of the patch.  That is, the changes to PartitionScheme struct
>> definition and those to find_partition_scheme().
> 
> I am not sure whether a patch with just that change and without any
> changes to use that member will be acceptable. So leaving this aside.

I asked, because with everything that I have now changed in the partition
pruning patch, one would need to pass these FmgrInfo pointers down to
partition bound searching functions from the optimizer.  If the changes to
add partsupfunc to the optimizer were taken out from your main patch, the
pruning patch could just start using it.  For now, I'm making those
changes part of the pruning patch.

Thanks,
Amit

1 2 >

1 - 100 of 103 matches

Mail list logo