Re: [HACKERS] path toward faster partition pruning

2017-11-13 Thread Amit Langote
Horiguchi-san,

Thanks for taking a look.  Replying to all your emails here.

On 2017/11/10 12:30, Kyotaro HORIGUCHI wrote:
> In 0002, bms_add_range has a bit naive-looking loop
> 
> + while (wordnum <= uwordnum)
> + {
> + bitmapword mask = (bitmapword) ~0;
> +
> + /* If working on the lower word, zero out bits below 'lower'. */
> + if (wordnum == lwordnum)
> + {
> + int lbitnum = BITNUM(lower);
> + mask >>= lbitnum;
> + mask <<= lbitnum;
> + }
> +
> + /* Likewise, if working on the upper word, zero bits above 
> 'upper' */
> + if (wordnum == uwordnum)
> + {
> + int ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 
> 1);
> + mask <<= ushiftbits;
> + mask >>= ushiftbits;
> + }
> +
> + a->words[wordnum++] |= mask;
> + }
> 
> Without aggressive optimization, the loop spends most of its time on
> check-and-jump for nothing, especially with many partitions, and it
> is somewhat unintuitive.
> 
> The following uses slightly tricky bitmap operations but
> is straightforward as a whole.
> 
> =
> /* fill the bits upper from BITNUM(lower) (0-based) of the first word */
> a->workds[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
> 
> /* fill up intermediate words */
> while (wordnum < uwordnum)
>a->words[wordnum++] = ~(bitmapword) 0;
> 
> /* fill up to BITNUM(upper) bit (0-based) of the last word */
> a->workds[wordnum++] |=
>  (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
> =

Considering also David's comment downthread, I will try to incorporate
this into bms_add_range().
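For illustration, here is a rough, self-contained sketch of how the loop-free fill could look once the lwordnum == uwordnum case is special-cased (David's point downthread) and |= is used throughout. The uint64_t word type and the WORDNUM/BITNUM macros are stand-ins modeled on bitmapset.c, not the actual PostgreSQL code:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bitmapword;
#define BITS_PER_BITMAPWORD 64
#define WORDNUM(x) ((x) / BITS_PER_BITMAPWORD)
#define BITNUM(x)  ((x) % BITS_PER_BITMAPWORD)

/*
 * Set bits lower..upper (inclusive) in 'words', which the caller must
 * have sized to cover WORDNUM(upper).  Loop-free except for the run of
 * intermediate all-ones words.
 */
static void
fill_range(bitmapword *words, int lower, int upper)
{
	int			lwordnum = WORDNUM(lower);
	int			uwordnum = WORDNUM(upper);
	/* bits BITNUM(lower)..63 set */
	bitmapword	lmask = ~(bitmapword) 0 << BITNUM(lower);
	/* bits 0..BITNUM(upper) set */
	bitmapword	umask = ~(bitmapword) 0 >>
		(BITS_PER_BITMAPWORD - 1 - BITNUM(upper));

	if (lwordnum == uwordnum)
	{
		/* both ends fall in the same word: intersect the two masks */
		words[lwordnum] |= (lmask & umask);
		return;
	}

	words[lwordnum] |= lmask;
	for (int w = lwordnum + 1; w < uwordnum; w++)
		words[w] = ~(bitmapword) 0;
	words[uwordnum] |= umask;
}
```

The two end masks make the intermediate loop body a plain all-ones store, which avoids the per-iteration boundary checks of the original loop.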

> In 0003, 
> 
> +match_clauses_to_partkey(RelOptInfo *rel,
> ...
> +  if (rinfo->pseudoconstant &&
> +      (IsA(clause, Const) &&
> +       ((((Const *) clause)->constisnull) ||
> +        !DatumGetBool(((Const *) clause)->constvalue))))
> +  {
> +*constfalse = true;
> +continue;
> 
> If we call this function in both conjunction and disjunction
> contexts (the latter occurs only in the recursive case), constfalse ==
> true means that no clauses are needed at all in the former case.
> 
> Since (I think) a plain list of RestrictInfo is expected to be
> treated as a conjunction (it's quite dubious, though..),

I think it makes sense to consider a list of RestrictInfos, such as
baserestrictinfo, that is passed as input to match_clauses_to_partkey(),
to be mutually conjunctive for our purposes here.

> we
> might do better to call this for each subnode of a disjunction. Or,
> like match_clauses_to_index, we might do better to provide
> match_clause_to_partkey(rel, rinfo, contains_const), which
> returns NULL if constfalse. (I'm not confident about this..)

After reading your comment, I realized that it was wrong for the
recursive call to match_clauses_to_partkey() to pass the arguments of an
OR clause all at once.  That broke the assumption mentioned above that
all of the clauses in the list passed to match_clauses_to_partkey() are
mutually conjunctive.  Instead, we must make a single-member list for
each of the OR clause's arguments and pass each such list separately.

Then, if we get constfalse for all of the OR's arguments, we return
constfalse=true to the original caller.
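As a sanity check of that rule, here is a tiny stand-alone model (hypothetical names, not the planner code): each OR arm is matched as its own single-member list, and the OR as a whole reduces to constant-false only when every arm does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for clause-matching outcomes; not PostgreSQL code. */
typedef enum { CLAUSE_CONST_FALSE, CLAUSE_OTHER } ClauseKind;

/* Matching a "list" holding exactly one OR arm. */
static bool
arm_is_constfalse(ClauseKind arm)
{
	return arm == CLAUSE_CONST_FALSE;
}

/* The OR is constant-false only if all of its arms are. */
static bool
or_is_constfalse(const ClauseKind *args, int nargs)
{
	for (int i = 0; i < nargs; i++)
	{
		/* one non-false arm keeps the whole OR alive */
		if (!arm_is_constfalse(args[i]))
			return false;
	}
	return true;
}
```

Matching the arms together, as one conjunctive list, would instead report constfalse as soon as any single arm was false, which is the bug described above.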

> 
> +  /*
> +   * If no commutator exists, cannot flip the qual's args,
> +   * so give up.
> +   */
> +  if (!OidIsValid(expr_op))
> +continue;
> 
> I suppose it's better to keep leftop and rightop as is rather than
> flipping them over so that the var is placed on the left side. Does
> that make things so complex?

The reason to do it that way is that code placed far away (the code in
partition.c that extracts constant values to use for pruning from matched
clauses) can then always assume that the clauses determined to be useful
for partition pruning come in the 'partkey op constant' form.

> + * It the operator happens to be '<>', which is never listed
> If?

Right, will fix.

> +if (!op_in_opfamily(expr_op, partopfamily))
> +{
> +  Oidnegator = get_negator(expr_op);
> +
> +  if (!OidIsValid(negator) ||
> +!op_in_opfamily(negator, partopfamily))
> +continue;
> 
> classify_partition_bounding_keys() checks the same thing by
> checking whether the negator's strategy is
> BTEqualStrategyNumber. (I'm not sure the operator is guaranteed
> to be of btree, though..) Shouldn't they be handled in a similar
> way?

You're right.  The <>'s negator may not always be a btree operator.  So,
we should add a check in match_clauses_to_partkey() that list or range
partitioning is in use, because only those require a btree operator
family.  We now have hash partitioning, so we need to be careful not to
assume that all partitioning operators come from btree operator
families.

If match_clauses_to_partkey() 

Re: [HACKERS] path toward faster partition pruning

2017-11-09 Thread David Rowley
On 10 November 2017 at 16:30, Kyotaro HORIGUCHI
 wrote:
> In 0002, bms_add_range has a bit naive-looking loop
>
> +   while (wordnum <= uwordnum)
> +   {
> +   bitmapword mask = (bitmapword) ~0;
> +
> +   /* If working on the lower word, zero out bits below 'lower'. 
> */
> +   if (wordnum == lwordnum)
> +   {
> +   int lbitnum = BITNUM(lower);
> +   mask >>= lbitnum;
> +   mask <<= lbitnum;
> +   }
> +
> +   /* Likewise, if working on the upper word, zero bits above 
> 'upper' */
> +   if (wordnum == uwordnum)
> +   {
> +   int ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) 
> + 1);
> +   mask <<= ushiftbits;
> +   mask >>= ushiftbits;
> +   }
> +
> +   a->words[wordnum++] |= mask;
> +   }
>
> Without aggressive optimization, the loop spends most of its time on
> check-and-jump for nothing, especially with many partitions, and it
> is somewhat unintuitive.
>
> The following uses slightly tricky bitmap operations but
> is straightforward as a whole.
>
> =
> /* fill the bits upper from BITNUM(lower) (0-based) of the first word */
> a->workds[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
>
> /* fill up intermediate words */
> while (wordnum < uwordnum)
>a->words[wordnum++] = ~(bitmapword) 0;
>
> /* fill up to BITNUM(upper) bit (0-based) of the last word */
> a->workds[wordnum++] |=
>  (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
> =

No objections here to making bms_add_range() perform better, but this
is not going to work when lwordnum == uwordnum. You'd need to
special-case that. I didn't think it was worth the trouble, but maybe
it is...

I assume the += should be |=.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] path toward faster partition pruning

2017-11-09 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 10 Nov 2017 14:44:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20171110.144455.117208639.horiguchi.kyot...@lab.ntt.co.jp>

> > Those two conditions are not orthogonal. Maybe something like
> > the following seems more understandable.
> > 
> > > if (!constfalse)
> > > {
> > >   /* No constraints on the keys, so, return *all* partitions. */
> > >   if (nkeys == 0)
> > > return bms_add_range(result, 0, partdesc->nparts - 1);
> > > 
> > >   result = get_partitions_for_keys(relation, );
> > > }

So, the condition (!constfalse && nkeys == 0) cannot return
there. I'm badly confused by the variable name.

I couldn't find another reasonable structure using the current
classify_p_b_keys(), but could you add a comment like the
following as an example?

+ /*
+  * This function processes clauses other than OR expressions and
+  * returns the excluded OR expressions in or_clauses.
+  */
>   nkeys = classify_partition_bounding_keys(relation, clauses,
>, ,
>_clauses);
>   /*
>* Only look up in the partition decriptor if the query provides
>* constraints on the keys at all.
>*/
>   if (!constfalse)
>   {
> if (nkey > 0)
>   result = get_partitions_for_keys(relation, );
> else
-+   /* No constraints on the keys, so, all partitions are passed. */
>   result = bms_add_range(result, 0, partdesc->nparts - 1);
>   }
> 
+   /*
+* We have a partition set for the clauses not returned in or_clauses
+* here. Conjoin the result for each OR clause.
+*/
>   foreach(lc, or_clauses)
>   {
> BoolExpr *or = (BoolExpr *) lfirst(lc);
> ListCell *lc1;
> Bitmapset *or_partset = NULL;
> 
+ Assert(or_clause(or));

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] path toward faster partition pruning

2017-11-09 Thread Kyotaro HORIGUCHI
Ooops! The following comment is wrong. Please ignore it.

At Fri, 10 Nov 2017 14:38:11 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20171110.143811.97616847.horiguchi.kyot...@lab.ntt.co.jp>
> Those two conditions are not orthogonal. Maybe something like
> the following seems more understandable.
> 
> > if (!constfalse)
> > {
> >   /* No constraints on the keys, so, return *all* partitions. */
> >   if (nkeys == 0)
> > return bms_add_range(result, 0, partdesc->nparts - 1);
> > 
> >   result = get_partitions_for_keys(relation, );
> > }
> 
> I'm not sure what is meant to be (formally) done here, but it
> seems to me that OrExpr is assumed to appear only at the top level
> of the clauses. So the following expression (just an example, but
> meaningful expressions in this shape must exist) is perhaps
> wrongly processed here.
> 
> CREATE TABLE p (a int) PARTITION BY RANGE (a);
> CREATE TABLE c1 PARTITION OF p FOR VALUES FROM (0) TO (10);
> CREATE TABLE c2 PARTITION OF p FOR VALUES FROM (10) TO (20);
> 
> SELECT * FROM p WHERE a = 15 AND (a = 15 OR a = 5);
> 
> get_partitions_for_keys() returns both c1 and c2 and still
> or_clauses here holds (a = 15 OR a = 5) and the function tries to
> *add* partitions for a = 15 and a = 5 separately.

This is working fine. Sorry for the bogus comment.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] path toward faster partition pruning

2017-11-09 Thread Kyotaro HORIGUCHI
Hello, this is the second part of the review.

At Fri, 10 Nov 2017 12:30:00 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20171110.123000.151902771.horiguchi.kyot...@lab.ntt.co.jp>
> In 0002, bms_add_range has a bit naive-looking loop
> In 0003, 

In 0004,

Regarding the name get_partitions_from_clauses_guts(): it seems to me
that we usually use _internal for the recursive part of such functions.
(I have the same comment as David about the comment for
get_partition_from_clause())

About the same function:

Couldn't we bail out through a fast path when clauses == NIL?

+/* No constraints on the keys, so, return *all* partitions. */
+result = bms_add_range(result, 0, partdesc->nparts - 1);

This allows us to return immediately here. And just above this,

+   if (nkeys > 0 && !constfalse)
+   result = get_partitions_for_keys(relation, );
+   else if (!constfalse)

Those two conditions are not orthogonal. Maybe something like the
following seems more understandable.

> if (!constfalse)
> {
>   /* No constraints on the keys, so, return *all* partitions. */
>   if (nkeys == 0)
> return bms_add_range(result, 0, partdesc->nparts - 1);
> 
>   result = get_partitions_for_keys(relation, );
> }

I'm not sure what is meant to be (formally) done here, but it
seems to me that OrExpr is assumed to appear only at the top level
of the clauses. So the following expression (just an example, but
meaningful expressions in this shape must exist) is perhaps
wrongly processed here.

CREATE TABLE p (a int) PARTITION BY RANGE (a);
CREATE TABLE c1 PARTITION OF p FOR VALUES FROM (0) TO (10);
CREATE TABLE c2 PARTITION OF p FOR VALUES FROM (10) TO (20);

SELECT * FROM p WHERE a = 15 AND (a = 15 OR a = 5);

get_partitions_for_keys() returns both c1 and c2 and still
or_clauses here holds (a = 15 OR a = 5) and the function tries to
*add* partitions for a = 15 and a = 5 separately.
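(As the follow-up message notes, this turned out to work fine.) For the record, a toy model of why: the partitions matched for the OR clause are intersected with the running result rather than added to it, so the stray c1 from the a = 5 arm is eliminated by the a = 15 conjunct. The bit assignments and the intersection step here are illustrative assumptions, not the actual planner code:

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 = c1 (values [0,10)), bit 1 = c2 (values [10,20)). */
typedef uint32_t partset;

/* Partitions that could hold rows satisfying a = <value>. */
static partset
partitions_for_eq(int a)
{
	if (a >= 0 && a < 10)
		return 1u;				/* c1 */
	if (a >= 10 && a < 20)
		return 2u;				/* c2 */
	return 0u;					/* no partition holds this value */
}
```

For WHERE a = 15 AND (a = 15 OR a = 5): the conjunct gives {c2}, the OR gives {c1, c2}, and intersecting the two leaves only c2.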


I'd like to pause here.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] path toward faster partition pruning

2017-11-09 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 10 Nov 2017 09:34:57 +0900, Amit Langote 
 wrote in 
<5fcb1a9f-b4ad-119d-14c7-282c30c7f...@lab.ntt.co.jp>
> Hi Amul.
> 
> On 2017/11/09 20:05, amul sul wrote:
> > On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote
> >  wrote:
> >> On 2017/11/06 14:32, David Rowley wrote:
> >>> On 6 November 2017 at 17:30, Amit Langote wrote:
>  On 2017/11/03 13:32, David Rowley wrote:
> > On 31 October 2017 at 21:43, Amit Langote wrote:
> > []
> >>
> >> Attached updated set of patches, including the fix to make the new pruning
> >> code handle Boolean partitioning.
> >>
> > 
> > I am getting the following warning on Mac OS X:
> 
> Thanks for the review.
> 
> > partition.c:1800:24: warning: comparison of constant -1 with
> > expression of type 'NullTestType'
> >   (aka 'enum NullTestType') is always false
> > [-Wtautological-constant-out-of-range-compare]
> > if (keynullness[i] == -1)
> > ~~ ^  ~~
> > partition.c:1932:25: warning: comparison of constant -1 with
> > expression of type 'NullTestType'
> >   (aka 'enum NullTestType') is always false
> > [-Wtautological-constant-out-of-range-compare]
> > if (keynullness[i] == -1)
> > ~~ ^  ~~
> > 2 warnings generated.
> > 
> > 
> > Comment for 0004 patch:
> > 270 +   /* -1 represents an invalid value of NullTestType. */
> >  271 +   memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
> > 
> > I think we should not use memset to set a value other than 0 or true/false.
> > This will work for -1 on systems where values are stored in two's
> > complement, but I am worried about other architectures.
> 
> OK, I will remove all instances of comparing and setting variables of type
> NullTestType to a value of -1.

In 0002, bms_add_range has a bit naive-looking loop

+   while (wordnum <= uwordnum)
+   {
+   bitmapword mask = (bitmapword) ~0;
+
+   /* If working on the lower word, zero out bits below 'lower'. */
+   if (wordnum == lwordnum)
+   {
+   int lbitnum = BITNUM(lower);
+   mask >>= lbitnum;
+   mask <<= lbitnum;
+   }
+
+   /* Likewise, if working on the upper word, zero bits above 
'upper' */
+   if (wordnum == uwordnum)
+   {
+   int ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 
1);
+   mask <<= ushiftbits;
+   mask >>= ushiftbits;
+   }
+
+   a->words[wordnum++] |= mask;
+   }

Without aggressive optimization, the loop spends most of its time on
check-and-jump for nothing, especially with many partitions, and it
is somewhat unintuitive.

The following uses slightly tricky bitmap operations but
is straightforward as a whole.

=
/* fill the bits upper from BITNUM(lower) (0-based) of the first word */
a->workds[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);

/* fill up intermediate words */
while (wordnum < uwordnum)
   a->words[wordnum++] = ~(bitmapword) 0;

/* fill up to BITNUM(upper) bit (0-based) of the last word */
a->workds[wordnum++] |=
 (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
=


In 0003, 

+match_clauses_to_partkey(RelOptInfo *rel,
...
+  if (rinfo->pseudoconstant &&
+      (IsA(clause, Const) &&
+       ((((Const *) clause)->constisnull) ||
+        !DatumGetBool(((Const *) clause)->constvalue))))
+  {
+*constfalse = true;
+continue;

If we call this function in both conjunction and disjunction
contexts (the latter occurs only in the recursive case), constfalse ==
true means that no clauses are needed at all in the former case.

Since (I think) a plain list of RestrictInfo is expected to be
treated as a conjunction (it's quite dubious, though..), we
might do better to call this for each subnode of a disjunction. Or,
like match_clauses_to_index, we might do better to provide
match_clause_to_partkey(rel, rinfo, contains_const), which
returns NULL if constfalse. (I'm not confident about this..)


+  /*
+   * If no commutator exists, cannot flip the qual's args,
+   * so give up.
+   */
+  if (!OidIsValid(expr_op))
+continue;

I suppose it's better to keep leftop and rightop as is rather than
flipping them over so that the var is placed on the left side. Does
that make things so complex?

+ * It the operator happens to be '<>', which is never listed
If?


+if (!op_in_opfamily(expr_op, partopfamily))
+{
+  Oidnegator = get_negator(expr_op);
+
+  if (!OidIsValid(negator) ||
+!op_in_opfamily(negator, partopfamily))
+continue;

classify_partition_bounding_keys() checks the same thing by
checking whether the 

Re: [HACKERS] path toward faster partition pruning

2017-11-09 Thread Amit Langote
Hi Amul.

On 2017/11/09 20:05, amul sul wrote:
> On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote
>  wrote:
>> On 2017/11/06 14:32, David Rowley wrote:
>>> On 6 November 2017 at 17:30, Amit Langote wrote:
 On 2017/11/03 13:32, David Rowley wrote:
> On 31 October 2017 at 21:43, Amit Langote wrote:
> []
>>
>> Attached updated set of patches, including the fix to make the new pruning
>> code handle Boolean partitioning.
>>
> 
> I am getting the following warning on Mac OS X:

Thanks for the review.

> partition.c:1800:24: warning: comparison of constant -1 with
> expression of type 'NullTestType'
>   (aka 'enum NullTestType') is always false
> [-Wtautological-constant-out-of-range-compare]
> if (keynullness[i] == -1)
> ~~ ^  ~~
> partition.c:1932:25: warning: comparison of constant -1 with
> expression of type 'NullTestType'
>   (aka 'enum NullTestType') is always false
> [-Wtautological-constant-out-of-range-compare]
> if (keynullness[i] == -1)
> ~~ ^  ~~
> 2 warnings generated.
> 
> 
> Comment for 0004 patch:
> 270 +   /* -1 represents an invalid value of NullTestType. */
>  271 +   memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
> 
> I think we should not use memset to set a value other than 0 or true/false.
> This will work for -1 on systems where values are stored in two's
> complement, but I am worried about other architectures.

OK, I will remove all instances of comparing and setting variables of type
NullTestType to a value of -1.

Thanks,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-11-09 Thread amul sul
On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote
 wrote:
> On 2017/11/06 14:32, David Rowley wrote:
>> On 6 November 2017 at 17:30, Amit Langote wrote:
>>> On 2017/11/03 13:32, David Rowley wrote:
 On 31 October 2017 at 21:43, Amit Langote wrote:
[]
>
> Attached updated set of patches, including the fix to make the new pruning
> code handle Boolean partitioning.
>

I am getting the following warning on Mac OS X:

partition.c:1800:24: warning: comparison of constant -1 with
expression of type 'NullTestType'
  (aka 'enum NullTestType') is always false
[-Wtautological-constant-out-of-range-compare]
if (keynullness[i] == -1)
~~ ^  ~~
partition.c:1932:25: warning: comparison of constant -1 with
expression of type 'NullTestType'
  (aka 'enum NullTestType') is always false
[-Wtautological-constant-out-of-range-compare]
if (keynullness[i] == -1)
~~ ^  ~~
2 warnings generated.


Comment for 0004 patch:
270 +   /* -1 represents an invalid value of NullTestType. */
 271 +   memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));

I think we should not use memset to set a value other than 0 or true/false.
This will work for -1 on systems where values are stored in two's
complement, but I am worried about other architectures.
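To illustrate the concern with a minimal sketch (not the patch code): memset() writes the byte 0xFF into every byte of the array, which happens to read back as -1 for an int only because of the two's-complement representation. An explicit per-element loop expresses the intent without leaning on that representation:

```c
#include <assert.h>
#include <string.h>

#define NKEYS 4

/* Portable initialization: assign -1 element by element. */
static void
init_minus_one(int *arr, int n)
{
	for (int i = 0; i < n; i++)
		arr[i] = -1;
}
```

On two's-complement machines the two approaches produce identical memory, but only the loop version is guaranteed to produce an array of -1 values by the language itself.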

Regards,
Amul




Re: [HACKERS] path toward faster partition pruning

2017-11-08 Thread Amit Langote
Hi Rajkumar,

Thanks for testing.

On 2017/11/08 15:52, Rajkumar Raghuwanshi wrote:
> On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote 
> wrote:
> 
>> Attached updated set of patches, including the fix to make the new pruning
>> code handle Boolean partitioning.
>>
> 
> Hi Amit,
> 
> I have tried pruning with different values of the constraint_exclusion
> GUC. I am not sure exactly how it should behave, but I can see that
> with a DELETE statement pruning is not happening when
> constraint_exclusion is off, while SELECT works as expected. Is this
> the expected behaviour?

Hmm, the new pruning only works for selects, not DML.  The patch also
changes get_relation_constraints() to not include the internal partition
constraints, but mistakenly does so for all query types, not just select.
Will look into it.

Thanks,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-11-07 Thread Rajkumar Raghuwanshi
On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote 
wrote:

> Attached updated set of patches, including the fix to make the new pruning
> code handle Boolean partitioning.
>

Hi Amit,

I have tried pruning with different values of the constraint_exclusion
GUC. I am not sure exactly how it should behave, but I can see that
with a DELETE statement pruning is not happening when
constraint_exclusion is off, while SELECT works as expected. Is this
the expected behaviour?

create table lp (c1 int, c2 text) partition by list(c1);
create table lp1 partition of lp for values in (1,2);
create table lp2 partition of lp for values in (3,4);
create table lp3 partition of lp for values in (5,6);
insert into lp values (1,'p1'),(2,'p1'),(3,'p2'),(4,'p2'),(5,'p3');

show constraint_exclusion ;
 constraint_exclusion
--
 partition
(1 row)

explain select c1 from lp where c1 >= 1 and c1 < 2;
QUERY PLAN
--
 Append  (cost=0.00..29.05 rows=6 width=4)
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=4)
 Filter: ((c1 >= 1) AND (c1 < 2))
(3 rows)

explain delete from lp where c1 >= 1 and c1 < 2;
QUERY PLAN
--
 Delete on lp  (cost=0.00..29.05 rows=6 width=6)
   Delete on lp1
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=6)
 Filter: ((c1 >= 1) AND (c1 < 2))
(4 rows)

set constraint_exclusion = off;

explain select c1 from lp where c1 >= 1 and c1 < 2;
QUERY PLAN
--
 Append  (cost=0.00..29.05 rows=6 width=4)
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=4)
 Filter: ((c1 >= 1) AND (c1 < 2))
(3 rows)

*explain delete from lp where c1 >= 1 and c1 < 2;*
QUERY PLAN
--
 Delete on lp  (cost=0.00..87.15 rows=18 width=6)
   Delete on lp1
   Delete on lp2
   Delete on lp3
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=6)
 Filter: ((c1 >= 1) AND (c1 < 2))
   ->  Seq Scan on lp2  (cost=0.00..29.05 rows=6 width=6)
 Filter: ((c1 >= 1) AND (c1 < 2))
   ->  Seq Scan on lp3  (cost=0.00..29.05 rows=6 width=6)
 Filter: ((c1 >= 1) AND (c1 < 2))
(10 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-07 Thread Amit Langote
Hi David.

Thanks for the review.

(..also looking at the comments you sent earlier today.)

On 2017/11/07 11:14, David Rowley wrote:
> On 7 November 2017 at 01:52, David Rowley  
> wrote:
>> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)
> 
> I have a little more review to share:
> 
> 1. Missing "in" in comment. Should be "mentioned in"
> 
>  * get_append_rel_partitions
>  * Return the list of partitions of rel that pass the clauses mentioned
>  * rel->baserestrictinfo
> 
> 2. Variable should be declared in inner scope with the following fragment:
> 
> void
> set_basic_child_rel_properties(PlannerInfo *root,
>RelOptInfo *rel,
>RelOptInfo *childrel,
>AppendRelInfo *appinfo)
> {
> AttrNumber attno;
> 
> if (rel->part_scheme)
> {
> 
> which makes the code the same as where you moved it from.

It seems like you included the above changes in your attached C file,
which I will incorporate into my repository.

> 3. Normally lfirst() is assigned to a variable at the start of a
> foreach() loop. You have code which does not follow this.
> 
> foreach(lc, clauses)
> {
> Expr   *clause;
> int i;
> 
> if (IsA(lfirst(lc), RestrictInfo))
> {
> RestrictInfo *rinfo = lfirst(lc);
> 
> You could assign this to a Node * since the type is unknown to you at
> the start of the loop.

Will make the suggested changes to match_clauses_to_partkey().

> 4.
> /*
> * Useless if what we're thinking of as a constant is actually
> * a Var coming from this relation.
> */
> if (bms_is_member(rel->relid, constrelids))
> continue;
> 
> should this be moved to just above the op_strict() test? This one seems 
> cheaper.

Agreed, will do.  Also makes sense to move the PartCollMatchesExprColl()
test together.

> 5. Typo "paritions": /* No clauses to prune paritions, so scan all
> partitions. */
> 
> But thinking about it more, the comment should say something more
> along the lines of /* No useful clauses for partition pruning. Scan
> all partitions. */

You fixed it. :)

> The key difference is that there might be clauses, just without Consts.
> 
> Actually, the more I look at get_append_rel_partitions() I think it
> would be better if you re-shaped that if/else if test so that it only
> performs the loop over the partindexes if it's been set.
> 
> I ended up with the attached version of the function after moving
> things around a little bit.

Thanks a lot for that.  Looks much better now.

> I'm still reviewing but thought I'd share this part so far.

As mentioned at the top, I'm looking at your latest comments and they all
seem to be good points to me, so will address those in the next version.

Thanks,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-11-07 Thread David Rowley
On 7 November 2017 at 01:52, David Rowley  wrote:
> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

Hi Amit,

I had another look over this today. Apologies if any of the review seems petty.

Here goes:

1. The if test seems to be testing for a child that's a partitioned
table, but does so by testing for a non-NULL part_scheme.

/*
* If childrel is itself partitioned, add it and its partitioned
* children to the list being propagated up to the root rel.
*/
if (childrel->part_scheme && rel->part_scheme)

Should this code use IS_PARTITIONED_REL() instead? It seems a bit
strange to test part_scheme for NULL-ness directly.

2. There's a couple of mistakes in my bms_add_range() code. I've
attached bms_add_range_fix.patch. Can you apply this to your tree?

3. This assert seems to be Asserting the same thing twice:

Assert(rel->live_partitioned_rels != NIL &&
   list_length(rel->live_partitioned_rels) > 0);

A List with length == 0 is always NIL.

4. get_partitions_from_clauses(), can you comment why you perform the
list_concat() there.

I believe this is there so that the partition bound from the parent is
passed down to the child so that we can properly eliminate all child
partitions when the 2nd level of partitioning is using the same
partition key as the 1st level. I think this deserves a paragraph of
comment to explain this.

5. Please add a comment to explain what's going on here in
classify_partition_bounding_keys()

if (partattno == 0)
{
partexpr = lfirst(partexprs_item);
partexprs_item = lnext(partexprs_item);
}

Looks like, similar to index expressions, that partition expressions
are attno 0 to mark to consume the next expression from the list.

Does this need validation that there are enough partexprs_item items
like what is done in get_range_key_properties()? Or is this validated
somewhere else?

6. The comment claims the if test checks for something which it does
not seem to check:

/*
* Redundant key elimination using btree-semantics based tricks.
*
* Only list and range partitioning use btree operator semantics, so
* skip otherwise.   Also, if there are expressions whose value is yet
* unknown, skip this step, because we need to compare actual values
* below.
*/
memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
if (partkey->strategy == PARTITION_STRATEGY_LIST ||
partkey->strategy == PARTITION_STRATEGY_RANGE)

I was expecting this to be skipped when the clauses contained a
non-const, but it does not seem to be.

7. Should be "compare them"

/*
* If the leftarg and rightarg clauses' constants are both of the type
* expected by "op" clause's operator, then compare then using the
* latter's comparison function.
*/

But if I look at the code, "compare then using the latter's comparison
function" is not true; it seems to use op's comparison function, not
rightarg's. In most of the calls op and rightarg are the same, but not
in all of them. The function shouldn't make that assumption even if op
always happened to be the same as rightarg.

8. remove_redundant_clauses() needs an overview comment of what the
function does.

9. The comment should explain what we would do in the case of key < 3
AND key <= 2 using some examples.

/* try to keep only one of <, <= */

10. Wondering why this loop runs backward?

for (s = BTMaxStrategyNumber; --s >= 0;)

Why not just:

for (s = 0; s < BTMaxStrategyNumber; s++)

I can't see a special reason for it to run backward. It seems unusual,
but if there's a good reason that I've failed to realise then it's
maybe worth a comment.

11. Please comment on why *constfalse = true is set here:

if (!chk || s == (BTEqualStrategyNumber - 1))
continue;

if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
   _result))
{
if (!test_result)
{
*constfalse = true;
return;
}
/* discard the redundant key. */
xform[s] = NULL;
}

Looks like we'd hit this in a case such as: WHERE key = 1 AND key > 1.

Also, please add a comment when discarding the redundant key; maybe
explain that equality is more useful than the other strategies when
there's an overlap.
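Regarding points 9 and 11, here is a hypothetical, self-contained sketch (not the remove_redundant_clauses() code, and using plain int keys rather than the operator-class comparison function) of the intended "keep only one of <, <=" behavior for upper bounds: the smaller constant wins, and on a tie the strict operator wins because it excludes the endpoint.

```c
#include <assert.h>
#include <stdbool.h>

/* An upper-bound clause: key < val (strict) or key <= val. */
typedef struct
{
	bool		is_strict;		/* true for '<', false for '<=' */
	int			val;
} UpperBound;

/* Return whichever bound admits fewer values. */
static UpperBound
tighter_upper(UpperBound a, UpperBound b)
{
	if (a.val != b.val)
		return (a.val < b.val) ? a : b;
	/* same constant: '<' excludes the endpoint, so it is tighter */
	return a.is_strict ? a : b;
}
```

So for key < 3 AND key <= 2, the key <= 2 bound is kept; for key < 2 AND key <= 2, the strict key < 2 is kept.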

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


bms_add_range_fix.patch
Description: Binary data



Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-06 Thread Amit Langote
On 2017/11/06 21:52, David Rowley wrote:
> On 6 November 2017 at 23:01, Amit Langote  
> wrote:
>> OK, I have gotten rid of the min/max partition index interface and instead
>> adopted the bms_add_range() approach by including your patch to add the
>> same in the patch set (which is now 0002 in the whole set).  I have to
>> admit that it's simpler to understand the new code with just Bitmapsets to
>> look at, but I'm still a bit concerned about materializing the whole set
>> right within partition.c, although we can perhaps optimize it later.
> 
> Thanks for making that change. The code looks much more simple now.
> 
> For performance, if you're worried about a very large number of
> partitions, then I think you're better off using bms_next_member()
> rather than bms_first_member(), (likely this applies globally, but you
> don't need to worry about those).
> 
> The problem with bms_first_member is that it must always loop over the
> 0 words before it finds any bits set for each call, whereas
> bms_next_member will start on the word it was last called for. There
> will likely be a pretty big performance difference between the two
> when processing a large Bitmapset.

Ah, thanks for the explanation.  I will change it to bms_next_member() in
the next version.
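A simplified model of the difference (a hypothetical helper, not the bitmapset.c implementation): a bms_next_member()-style call takes the previous member and resumes scanning from just past it, so one pass over the set costs one pass over the bits overall, whereas a first-member-style scan restarts from word zero on every call (and, in PostgreSQL's case, destroys the set as it goes).

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 4
typedef uint64_t bitmapword;

/*
 * Return the smallest member greater than prevbit, or -1 if none.
 * Call with prevbit = -1 to start a scan; the set is left intact.
 */
static int
next_member(const bitmapword *words, int prevbit)
{
	for (int b = prevbit + 1; b < NWORDS * 64; b++)
	{
		if (words[b / 64] & ((bitmapword) 1 << (b % 64)))
			return b;
	}
	return -1;
}
```

The idiomatic loop is then `for (x = -1; (x = next_member(words, x)) >= 0;) ...`, mirroring how bms_next_member() is used.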

>> Attached updated set of patches, including the fix to make the new pruning
>> code handle Boolean partitioning.
> 
> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

Thank you.

Regards,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-11-06 Thread David Rowley
On 7 November 2017 at 01:52, David Rowley  wrote:
> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

I have a little more review to share:

1. Missing "in" in comment. Should be "mentioned in"

 * get_append_rel_partitions
 * Return the list of partitions of rel that pass the clauses mentioned
 * rel->baserestrictinfo

2. The variable should be declared in an inner scope in the following fragment:

void
set_basic_child_rel_properties(PlannerInfo *root,
   RelOptInfo *rel,
   RelOptInfo *childrel,
   AppendRelInfo *appinfo)
{
AttrNumber attno;

if (rel->part_scheme)
{

which makes the code the same as where you moved it from.

3. Normally lfirst() is assigned to a variable at the start of a
foreach() loop. You have code which does not follow this.

foreach(lc, clauses)
{
Expr   *clause;
int i;

if (IsA(lfirst(lc), RestrictInfo))
{
RestrictInfo *rinfo = lfirst(lc);

You could assign this to a Node * since the type is unknown to you at
the start of the loop.

4.
/*
* Useless if what we're thinking of as a constant is actually
* a Var coming from this relation.
*/
if (bms_is_member(rel->relid, constrelids))
continue;

should this be moved to just above the op_strict() test? This one seems cheaper.

5. Typo "paritions": /* No clauses to prune paritions, so scan all
partitions. */

But thinking about it more, the comment should say something more along
the lines of /* No useful clauses for partition pruning. Scan all
partitions. */

The key difference is that there might be clauses, just without Consts.

Actually, the more I look at get_append_rel_partitions() I think it
would be better if you re-shaped that if/else if test so that it only
performs the loop over the partindexes if it's been set.

I ended up with the attached version of the function after moving
things around a little bit.

I'm still reviewing but thought I'd share this part so far.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
/*
 * get_append_rel_partitions
 *		Return the list of partitions of rel that pass the clauses mentioned
 *		in rel->baserestrictinfo. An empty list is returned if no matching
 *		partitions were found.
 *
 * Returned list contains the AppendRelInfos of chosen partitions.
 */
static List *
get_append_rel_partitions(PlannerInfo *root,
		  RelOptInfo *rel,
		  RangeTblEntry *rte)
{
	List   *partclauses;
	bool	contains_const,
			constfalse;

	/*
	 * Get the clauses that match the partition key.  contains_const is set
	 * true if at least one matched clause contains a Const operand, and
	 * constfalse is set true if a clause reduces to constant false.
	 */
	partclauses = match_clauses_to_partkey(rel,
										   list_copy(rel->baserestrictinfo),
										   &contains_const,
										   &constfalse);

	if (!constfalse)
	{
		Relation		parent = heap_open(rte->relid, NoLock);
		PartitionDesc	partdesc = RelationGetPartitionDesc(parent);
		Bitmapset	   *partindexes;
		List		   *result = NIL;
		int				i;

		/*
		 * If we have matched clauses that contain at least one constant
		 * operand, then use these to prune partitions.
		 */
		if (partclauses != NIL && contains_const)
			partindexes = get_partitions_from_clauses(parent, partclauses);

		/*
		 * Else there are no clauses that are useful to prune any partitions,
		 * so we must scan all partitions.
		 */
		else
			partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);

		/* Fetch the partition appinfos. */
		i = -1;
		while ((i = bms_next_member(partindexes, i)) >= 0)
		{
			AppendRelInfo *appinfo = rel->part_appinfos[i];

#ifdef USE_ASSERT_CHECKING
			RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);

			/*
			 * Must be the intended child's RTE here, because appinfos are ordered
			 * the same way as partitions in the partition descriptor.
			 */
			Assert(partdesc->oids[i] == rte->relid);
#endif

			result = lappend(result, appinfo);
		}

		/* Record which partitions must be scanned. */
		rel->live_part_appinfos = result;

		heap_close(parent, NoLock);

		return result;
	}

	return NIL;
}



Re: [HACKERS] path toward faster partition pruning

2017-11-06 Thread David Rowley
On 6 November 2017 at 23:01, Amit Langote  wrote:
> OK, I have gotten rid of the min/max partition index interface and instead
> adopted the bms_add_range() approach by including your patch to add the
> same in the patch set (which is now 0002 in the whole set).  I have to
> admit that it's simpler to understand the new code with just Bitmapsets to
> look at, but I'm still a bit concerned about materializing the whole set
> right within partition.c, although we can perhaps optimize it later.

Thanks for making that change. The code looks much more simple now.

For performance, if you're worried about a very large number of
partitions, then I think you're better off using bms_next_member()
rather than bms_first_member(), (likely this applies globally, but you
don't need to worry about those).

The problem with bms_first_member is that it must always loop over the
0 words before it finds any bits set for each call, whereas
bms_next_member will start on the word it was last called for. There
will likely be a pretty big performance difference between the two
when processing a large Bitmapset.
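The access-pattern difference can be sketched standalone. Below is a toy analogue of bms_next_member() over a plain word array (illustrative only, not PostgreSQL's implementation): each call resumes scanning at the word containing prevbit + 1, rather than rescanning from word 0 on every call the way bms_first_member() must.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bitmapword;
#define BITS_PER_BITMAPWORD 64
#define NWORDS 3

/*
 * Return the smallest member greater than prevbit, or -1 if none.
 * Pass prevbit = -1 to start; pass the previous result to continue.
 */
static int
next_member(const bitmapword *words, int prevbit)
{
	int			bit = prevbit + 1;
	int			wordnum = bit / BITS_PER_BITMAPWORD;
	int			bitnum = bit % BITS_PER_BITMAPWORD;

	for (; wordnum < NWORDS; wordnum++, bitnum = 0)
	{
		bitmapword	w = words[wordnum] >> bitnum;

		if (w != 0)
		{
			/* find the lowest set bit in the remainder of this word */
			while ((w & 1) == 0)
			{
				w >>= 1;
				bitnum++;
			}
			return wordnum * BITS_PER_BITMAPWORD + bitnum;
		}
	}
	return -1;
}
```

With members {3, 70, 129}, successive calls return 3, 70, 129, then -1; only the tail of the set is rescanned on each call.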

> Attached updated set of patches, including the fix to make the new pruning
> code handle Boolean partitioning.

Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-06 Thread Amit Langote
On 2017/11/06 13:15, David Rowley wrote:
> On 31 October 2017 at 21:43, Amit Langote  
> wrote:
>> Attached updated version of the patches
> 
> match_clauses_to_partkey() needs to allow for the way quals on Bool
> columns are represented.
> 
> create table pt (a bool not null) partition by list (a);
> create table pt_true partition of pt for values in('t');
> create table pt_false partition of pt for values in('f');
> explain select * from pt where a = true;
> QUERY PLAN
> --
>  Append  (cost=0.00..76.20 rows=2810 width=1)
>->  Seq Scan on pt_false  (cost=0.00..38.10 rows=1405 width=1)
>  Filter: a
>->  Seq Scan on pt_true  (cost=0.00..38.10 rows=1405 width=1)
>  Filter: a
> (5 rows)
> 
> match_clause_to_indexcol() shows an example of how to handle this.
> 
> explain select * from pt where a = false;
> 
> will need to be allowed too. This works slightly differently.

You're right.  I've fixed things to handle Boolean partitioning in the
updated set of patches I will post shortly.

Thanks,
Amit





Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-05 Thread David Rowley
On 6 November 2017 at 17:30, Amit Langote  wrote:
> On 2017/11/03 13:32, David Rowley wrote:
>> On 31 October 2017 at 21:43, Amit Langote  
>> wrote:
>> 1. This comment seem wrong.
>>
>> /*
>> * Since the clauses in rel->baserestrictinfo should all contain Const
>> * operands, it should be possible to prune partitions right away.
>> */
>
> Yes.  I used to think it was true, then realized it isn't and updated the
> code to get rid of that assumption, but I forgot updating this comment.
> Fixed.
>
>> How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ?
>> baserestrictinfo in this case will contain a single RestrictInfo with
>> an OpExpr containing two Var args and it'll come right through that
>> function too.

...

> We won't be able to use such a clause for pruning at all; neither
> planning-time pruning nor execution-time pruning.  Am I missing something?

I just meant the comment was wrong.

>
> The design with min/max partition index interface to the partition.c's new
> partition-pruning facility is intentional.  You can find hints about how
> such a design came about in the following Robert's email:
>
> https://www.postgresql.org/message-id/CA%2BTgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw%40mail.gmail.com

Yeah, I remember reading that before I had looked at the code. I
disagree with Robert on this. The fact that the min/max range gets
turned into a list of everything in that range in
get_append_rel_partitions means that any advantage of storing the
partitions as a range is voided. If you could have kept it a range the
entire time, that might be different, but it seems you need to
materialize the entire range in order to sort the partitions into
order. I've included Robert just in case he wants to take a look at
the code that resulted from that design. Maybe something is not
following what he had in mind, or maybe he'll change his mind based on
the code that resulted.


> For range queries, it is desirable for the partitioning module to return
> the set of qualifying partitions that are contiguous in a compact (O(1))
> min/max representation than having to enumerate all those indexes in the
> set.  It's nice to avoid iterating over that set twice -- once when
> constructing the set in the partitioning module and then again in the
> caller (in this case, planner) to perform the planning-related tasks per
> selected partition.

The idea is that you still get the min and max from the bsearch, but
then use bms_add_range() to populate a bitmapset of the matching
partitions. The big-O notation of the search shouldn't change.

> We need the other_parts Bitmapset too, because selected partitions may not
> always be contiguous, even in the case of range partitioning, if there are
> OR clauses and the possibility that the default partition is also
> selected.  While computing the selected partition set from a given set of
> clauses, partitioning code tries to keep the min/max representation as
> long as it makes sense to and once the selected partitions no longer
> appear to be contiguous it switches to the Bitmapset mode.

Yip. I understand that. I just think there's no benefit to having
min/max since it needs to be materialized into a list of the entire
range at some point; it might as well be done as soon as possible
using a bitmapset, which would save having all the partset_union,
partset_intersect, partset_range_empty, partset_range_overlap,
partset_range_adjacent code. You'd end up just using bms_union and
bms_intersect, then bms_add_range to handle the min/max bounds you get
from the bsearch.
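The combination being proposed — materialize the bsearch bounds with a range fill, then use plain set union/intersection for OR/AND — can be illustrated with a one-word toy (real Bitmapsets span multiple words, and range_word() is an illustrative helper, not a PostgreSQL function):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bitmapword;	/* a one-word toy Bitmapset */

/* Bits 'lower'..'upper' inclusive; assumes 0 <= lower <= upper < 63. */
static bitmapword
range_word(int lower, int upper)
{
	return ((((bitmapword) 1 << (upper + 1)) - 1) &
			~(((bitmapword) 1 << lower) - 1));
}
```

For "key BETWEEN 10 AND 20 OR key BETWEEN 30 AND 40", each arm's bsearch bounds become a range, and the arms are OR-combined with `|` (bms_union); an AND of overlapping ranges is `&` (bms_intersect).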


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-05 Thread Amit Langote
On 2017/11/06 12:53, David Rowley wrote:
> On 3 November 2017 at 17:32, David Rowley  
> wrote:
>> 2. This code is way more complex than it needs to be.
>>
>> if (num_parts > 0)
>> {
>> int j;
>>
>> all_indexes = (int *) palloc(num_parts * sizeof(int));
>> j = 0;
>> if (min_part_idx >= 0 && max_part_idx >= 0)
>> {
>> for (i = min_part_idx; i <= max_part_idx; i++)
>> all_indexes[j++] = i;
>> }
>> if (!bms_is_empty(other_parts))
>> while ((i = bms_first_member(other_parts)) >= 0)
>> all_indexes[j++] = i;
>> if (j > 1)
>> qsort((void *) all_indexes, j, sizeof(int), intcmp);
>> }
>>
>> It looks like the min/max partition stuff is just complicating things
>> here. If you need to build this array of all_indexes[] anyway, I don't
>> quite understand the point of the min/max. It seems like min/max would
>> probably work a bit nicer if you didn't need the other_parts
>> BitmapSet, so I recommend just getting rid of min/max completely and
>> just have a BitmapSet with bit set for each partition's index you
>> need, you'd not need to go to the trouble of performing a qsort on an
>> array and you could get rid of quite a chunk of code too.
>>
>> The entire function would then not be much more complex than:
>>
>> partindexes = get_partitions_from_clauses(parent, partclauses);
>>
>> while ((i = bms_first_member(partindexes)) >= 0)
>> {
>> AppendRelInfo *appinfo = rel->part_appinfos[i];
>> result = lappend(result, appinfo);
>> }
>>
>> Then you can also get rid of your intcmp() function too.
> 
> I've read a bit more of the patch and I think even more now that the
> min/max stuff should be removed.

Oops, I didn't catch this email before sending my earlier reply.  Thanks
for the bms range patch.  Will reply to this shortly after reading your
patch and thinking on it a bit.

Thanks,
Amit






Re: [HACKERS] path toward faster partition pruning

2017-11-05 Thread David Rowley
On 31 October 2017 at 21:43, Amit Langote  wrote:
> Attached updated version of the patches

match_clauses_to_partkey() needs to allow for the way quals on Bool
columns are represented.

create table pt (a bool not null) partition by list (a);
create table pt_true partition of pt for values in('t');
create table pt_false partition of pt for values in('f');
explain select * from pt where a = true;
QUERY PLAN
--
 Append  (cost=0.00..76.20 rows=2810 width=1)
   ->  Seq Scan on pt_false  (cost=0.00..38.10 rows=1405 width=1)
 Filter: a
   ->  Seq Scan on pt_true  (cost=0.00..38.10 rows=1405 width=1)
 Filter: a
(5 rows)

match_clause_to_indexcol() shows an example of how to handle this.

explain select * from pt where a = false;

will need to be allowed too. This works slightly differently.
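What match_clause_to_indexcol()-style handling boils down to for a boolean key can be modeled in a standalone sketch — the enum and struct below are a hypothetical stand-in for PostgreSQL's Node trees, not the real representation. A bare Var qual is treated as `a = true`, and a negated Var as `a = false`:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy qual forms on a boolean column (hypothetical model). */
typedef enum
{
	CLAUSE_VAR,				/* WHERE a        */
	CLAUSE_NOT_VAR,			/* WHERE NOT a    */
	CLAUSE_VAR_EQ_CONST		/* WHERE a = t/f  */
} ClauseKind;

typedef struct
{
	ClauseKind	kind;
	bool		constval;	/* used only for CLAUSE_VAR_EQ_CONST */
} BoolClause;

/*
 * Canonicalize a qual on a boolean partition key into "key = <value>",
 * mirroring how boolean index columns are matched.
 */
static bool
match_bool_clause(const BoolClause *clause, bool *value)
{
	switch (clause->kind)
	{
		case CLAUSE_VAR:
			*value = true;
			return true;
		case CLAUSE_NOT_VAR:
			*value = false;
			return true;
		case CLAUSE_VAR_EQ_CONST:
			*value = clause->constval;
			return true;
	}
	return false;
}
```

Once every form is reduced to a (key, boolean value) pair, the pruning code can treat `WHERE a`, `WHERE NOT a`, and `WHERE a = false` uniformly.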

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] path toward faster partition pruning

2017-11-05 Thread David Rowley
On 3 November 2017 at 17:32, David Rowley  wrote:
> 2. This code is way more complex than it needs to be.
>
> if (num_parts > 0)
> {
> int j;
>
> all_indexes = (int *) palloc(num_parts * sizeof(int));
> j = 0;
> if (min_part_idx >= 0 && max_part_idx >= 0)
> {
> for (i = min_part_idx; i <= max_part_idx; i++)
> all_indexes[j++] = i;
> }
> if (!bms_is_empty(other_parts))
> while ((i = bms_first_member(other_parts)) >= 0)
> all_indexes[j++] = i;
> if (j > 1)
> qsort((void *) all_indexes, j, sizeof(int), intcmp);
> }
>
> It looks like the min/max partition stuff is just complicating things
> here. If you need to build this array of all_indexes[] anyway, I don't
> quite understand the point of the min/max. It seems like min/max would
> probably work a bit nicer if you didn't need the other_parts
> BitmapSet, so I recommend just getting rid of min/max completely and
> just have a BitmapSet with bit set for each partition's index you
> need, you'd not need to go to the trouble of performing a qsort on an
> array and you could get rid of quite a chunk of code too.
>
> The entire function would then not be much more complex than:
>
> partindexes = get_partitions_from_clauses(parent, partclauses);
>
> while ((i = bms_first_member(partindexes)) >= 0)
> {
> AppendRelInfo *appinfo = rel->part_appinfos[i];
> result = lappend(result, appinfo);
> }
>
> Then you can also get rid of your intcmp() function too.

I've read a bit more of the patch and I think even more now that the
min/max stuff should be removed.

I understand that you'll be bsearching for a lower and upper bound for
cases like:

SELECT * FROM pt WHERE key BETWEEN 10 and 20;

but it looks like the min and max range stuff is thrown away if the
query is written as:

SELECT * FROM pt WHERE key BETWEEN 10 and 20 OR key BETWEEN 30 AND 40;

from reading the code, it seems like partset_union() would be called
in this case and if the min/max of each were consecutive then the
min/max range would get merged, but there's really a lot of code to
support this. I think it would be much better to invent
bms_add_range() and just use a Bitmapset to store the partition
indexes to scan. You could simply use bms_union for OR cases and
bms_intersect() for AND cases. It seems this would allow removal of
this complex code. It looks like this would allow you to remove all
the partset_range_* macros too.

I've attached a patch which implements bms_add_range() which will save
you from having to write the tight loops to call bms_add_member() such
as the ones in partset_union(). Those would not be so great if there
was a huge number of partitions as the Bitmapset->words[] array could
be expanded many more times than required. bms_add_range() will handle
that much more efficiently with a maximum of 1 repalloc() for the
whole operation. It would also likely be faster since it's working at the
bitmapword level rather than bit level.
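A standalone sketch of the proposed word-at-a-time range fill, over a plain uint64 array (the real patch operates on PostgreSQL's Bitmapset and must also enlarge words[] with that single repalloc(); allocation is omitted here):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bitmapword;
#define BITS_PER_BITMAPWORD 64
#define WORDNUM(x) ((x) / BITS_PER_BITMAPWORD)
#define BITNUM(x)  ((x) % BITS_PER_BITMAPWORD)

/*
 * Set all bits from 'lower' to 'upper' (inclusive) in words[], which the
 * caller must have sized to cover WORDNUM(upper).  Only the first and
 * last words need masking; intermediate words are filled wholesale.
 */
static void
add_range(bitmapword *words, int lower, int upper)
{
	int			lwordnum = WORDNUM(lower);
	int			uwordnum = WORDNUM(upper);
	int			wordnum;

	for (wordnum = lwordnum; wordnum <= uwordnum; wordnum++)
	{
		bitmapword	mask = ~(bitmapword) 0;

		/* If working on the lower word, clear bits below 'lower'. */
		if (wordnum == lwordnum)
			mask &= ~(bitmapword) 0 << BITNUM(lower);
		/* Likewise, if on the upper word, clear bits above 'upper'. */
		if (wordnum == uwordnum && BITNUM(upper) < BITS_PER_BITMAPWORD - 1)
			mask &= ((bitmapword) 1 << (BITNUM(upper) + 1)) - 1;
		words[wordnum] |= mask;
	}
}
```

Filling [5, 70] touches exactly two words: bits 5..63 of word 0 and bits 0..6 of word 1; any words strictly between the endpoints get `~0` in a single OR each.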

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Add-bms_add_range-to-add-members-within-the-specifie.patch
Description: Binary data



Re: [HACKERS] path toward faster partition pruning

2017-11-02 Thread David Rowley
On 31 October 2017 at 21:43, Amit Langote  wrote:
> Attached updated version of the patches addressing some of your comments

I've spent a bit of time looking at these but I'm out of time for now.

So far I have noted down the following;

1. This comment seem wrong.

/*
* Since the clauses in rel->baserestrictinfo should all contain Const
* operands, it should be possible to prune partitions right away.
*/

How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ?
baserestrictinfo in this case will contain a single RestrictInfo with
an OpExpr containing two Var args and it'll come right through that
function too.

2. This code is way more complex than it needs to be.

if (num_parts > 0)
{
int j;

all_indexes = (int *) palloc(num_parts * sizeof(int));
j = 0;
if (min_part_idx >= 0 && max_part_idx >= 0)
{
for (i = min_part_idx; i <= max_part_idx; i++)
all_indexes[j++] = i;
}
if (!bms_is_empty(other_parts))
while ((i = bms_first_member(other_parts)) >= 0)
all_indexes[j++] = i;
if (j > 1)
qsort((void *) all_indexes, j, sizeof(int), intcmp);
}

It looks like the min/max partition stuff is just complicating things
here. If you need to build this array of all_indexes[] anyway, I don't
quite understand the point of the min/max. It seems like min/max would
probably work a bit nicer if you didn't need the other_parts
BitmapSet, so I recommend just getting rid of min/max completely and
just have a BitmapSet with bit set for each partition's index you
need, you'd not need to go to the trouble of performing a qsort on an
array and you could get rid of quite a chunk of code too.

The entire function would then not be much more complex than:

partindexes = get_partitions_from_clauses(parent, partclauses);

while ((i = bms_first_member(partindexes)) >= 0)
{
AppendRelInfo *appinfo = rel->part_appinfos[i];
result = lappend(result, appinfo);
}

Then you can also get rid of your intcmp() function too.

3. Following code has the wrong size calculation:

memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType *));

should be PARTITION_MAX_KEYS * sizeof(NullTestType). It might have
worked on your machine if you're compiling as 32 bit.

I'll continue on with the review in the next few days.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] path toward faster partition pruning

2017-11-02 Thread David Rowley
On 31 October 2017 at 21:43, Amit Langote  wrote:
> Attached updated version of the patches addressing some of your comments
> above and fixing a bug that Rajkumar reported [1].  As mentioned there,
> I'm including here a patch (the 0005 of the attached) to tweak the default
> range partition constraint to be explicit about null values that it might
> contain.  So, there are 6 patches now and what used to be patch 0005 in
> the previous set is patch 0006 in this version of the set.

Hi Amit,

I've been looking over this. I see the latest patches conflict with
cf7ab13bf. Can you send patches rebased on current master?

Thanks

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] path toward faster partition pruning

2017-10-31 Thread Amit Langote
Thanks for the test case.

On 2017/10/30 17:09, Rajkumar Raghuwanshi wrote:
> I am getting wrong output when the default partition is sub-partitioned
> further; below is a test case.
> 
> CREATE TABLE lpd(a int, b varchar, c float) PARTITION BY LIST (a);
> CREATE TABLE lpd_p1 PARTITION OF lpd FOR VALUES IN (1,2,3);
> CREATE TABLE lpd_p2 PARTITION OF lpd FOR VALUES IN (4,5);
> CREATE TABLE lpd_d PARTITION OF lpd DEFAULT PARTITION BY LIST(a);
> CREATE TABLE lpd_d1 PARTITION OF lpd_d FOR VALUES IN (7,8,9);
> CREATE TABLE lpd_d2 PARTITION OF lpd_d FOR VALUES IN (10,11,12);
> CREATE TABLE lpd_d3 PARTITION OF lpd_d FOR VALUES IN (6,null);
> INSERT INTO lpd SELECT i,i,i FROM generate_Series (1,12)i;
> INSERT INTO lpd VALUES (null,null,null);
> 
> --on HEAD
> postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
> a IS NOT NULL ORDER BY 1;
>  QUERY PLAN
> -
>  Sort
>Sort Key: ((lpd_p1.tableoid)::regclass)
>->  Result
>  ->  Append
>->  Seq Scan on lpd_p1
>  Filter: (a IS NOT NULL)
>->  Seq Scan on lpd_p2
>  Filter: (a IS NOT NULL)
>->  Seq Scan on lpd_d3
>  Filter: (a IS NOT NULL)
>->  Seq Scan on lpd_d1
>  Filter: (a IS NOT NULL)
>->  Seq Scan on lpd_d2
>  Filter: (a IS NOT NULL)
> (14 rows)
> 
> postgres=#
> postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
> BY 1;
>  tableoid | a  | b  | c
> --+++
>  lpd_p1   |  1 | 1  |  1
>  lpd_p1   |  2 | 2  |  2
>  lpd_p1   |  3 | 3  |  3
>  lpd_p2   |  4 | 4  |  4
>  lpd_p2   |  5 | 5  |  5
>  lpd_d1   |  7 | 7  |  7
>  lpd_d1   |  8 | 8  |  8
>  lpd_d1   |  9 | 9  |  9
>  lpd_d2   | 12 | 12 | 12
>  lpd_d2   | 10 | 10 | 10
>  lpd_d2   | 11 | 11 | 11
>  lpd_d3   |  6 | 6  |  6
> (12 rows)
> 
> 
> --on HEAD + v8 patches
> 
> postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
> a IS NOT NULL ORDER BY 1;
>  QUERY PLAN
> -
>  Sort
>Sort Key: ((lpd_p1.tableoid)::regclass)
>->  Result
>  ->  Append
>->  Seq Scan on lpd_p1
>  Filter: (a IS NOT NULL)
>->  Seq Scan on lpd_p2
>  Filter: (a IS NOT NULL)
> (8 rows)
> 
> postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
> BY 1;
>  tableoid | a | b | c
> --+---+---+---
>  lpd_p1   | 1 | 1 | 1
>  lpd_p1   | 2 | 2 | 2
>  lpd_p1   | 3 | 3 | 3
>  lpd_p2   | 4 | 4 | 4
>  lpd_p2   | 5 | 5 | 5
> (5 rows)

I found bugs in 0003 and 0005 that caused this.  Will post the patches
containing the fix in reply to the Dilip's email which contains some code
review comments [1].

Also, I noticed that the new pruning code was having a hard time dealing
with the fact that the default "range" partition doesn't explicitly say in
its partition constraint that it might contain null values.  More
precisely perhaps, the default range partition's constraint appears to
imply that it can only contain non-null values, which confuses the new
pruning code.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAFiTN-thYsobXxPS6bwOA_9erpax_S=iztsn3rtuxkkmkg4...@mail.gmail.com





Re: [HACKERS] path toward faster partition pruning

2017-10-30 Thread Dilip Kumar
On Mon, Oct 30, 2017 at 12:20 PM, Amit Langote
 wrote:
> On 2017/10/30 14:55, Amit Langote wrote:
>> Fixed in the attached updated patch, along with a new test in 0001 to
>> cover this case.  Also, made a few tweaks to 0003 and 0005 (moved some
>> code from the former to the latter) around the handling of 
>> ScalarArrayOpExprs.
>
> Sorry, I'd forgotten to include some changes.
>
> In the previous versions, RT index of the table needed to be passed to
> partition.c, which I realized is no longer needed, so I removed that
> requirement from the interface.  As a result, patches 0002 and 0003 have
> changed in this version.

Some Minor comments:

+ * get_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,

Function name in function header is not correct.

+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;

DatumGetBool will return true if the first byte of constvalue is
nonzero, and false otherwise.  IIUC, this is not the intention here.
Or am I missing something?

+ * clauses in this function ourselves, for example, having both a > 1 and
+ * a = 0 the list

a = 0 the list -> a = 0 in the list

+static bool
+partkey_datum_from_expr(const Expr *expr, Datum *value)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;

I think for evaluating other expressions (e.g. T_Param) we will have
to pass ExprContext to this function. Or we can do something cleaner
because if we want to access the ExprContext inside
partkey_datum_from_expr then we may need to pass it to
"get_partitions_from_clauses" which is a common function for optimizer
and executor.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] path toward faster partition pruning

2017-10-30 Thread Rajkumar Raghuwanshi
On Mon, Oct 30, 2017 at 12:20 PM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> In the previous versions, RT index of the table needed to be passed to
> partition.c, which I realized is no longer needed, so I removed that
> requirement from the interface.  As a result, patches 0002 and 0003 have
> changed in this version.
>

Thanks for the fix.

I am getting wrong output when the default partition is sub-partitioned
further; below is a test case.

CREATE TABLE lpd(a int, b varchar, c float) PARTITION BY LIST (a);
CREATE TABLE lpd_p1 PARTITION OF lpd FOR VALUES IN (1,2,3);
CREATE TABLE lpd_p2 PARTITION OF lpd FOR VALUES IN (4,5);
CREATE TABLE lpd_d PARTITION OF lpd DEFAULT PARTITION BY LIST(a);
CREATE TABLE lpd_d1 PARTITION OF lpd_d FOR VALUES IN (7,8,9);
CREATE TABLE lpd_d2 PARTITION OF lpd_d FOR VALUES IN (10,11,12);
CREATE TABLE lpd_d3 PARTITION OF lpd_d FOR VALUES IN (6,null);
INSERT INTO lpd SELECT i,i,i FROM generate_Series (1,12)i;
INSERT INTO lpd VALUES (null,null,null);

--on HEAD
postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
a IS NOT NULL ORDER BY 1;
 QUERY PLAN
-
 Sort
   Sort Key: ((lpd_p1.tableoid)::regclass)
   ->  Result
 ->  Append
   ->  Seq Scan on lpd_p1
 Filter: (a IS NOT NULL)
   ->  Seq Scan on lpd_p2
 Filter: (a IS NOT NULL)
   ->  Seq Scan on lpd_d3
 Filter: (a IS NOT NULL)
   ->  Seq Scan on lpd_d1
 Filter: (a IS NOT NULL)
   ->  Seq Scan on lpd_d2
 Filter: (a IS NOT NULL)
(14 rows)

postgres=#
postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
BY 1;
 tableoid | a  | b  | c
--+++
 lpd_p1   |  1 | 1  |  1
 lpd_p1   |  2 | 2  |  2
 lpd_p1   |  3 | 3  |  3
 lpd_p2   |  4 | 4  |  4
 lpd_p2   |  5 | 5  |  5
 lpd_d1   |  7 | 7  |  7
 lpd_d1   |  8 | 8  |  8
 lpd_d1   |  9 | 9  |  9
 lpd_d2   | 12 | 12 | 12
 lpd_d2   | 10 | 10 | 10
 lpd_d2   | 11 | 11 | 11
 lpd_d3   |  6 | 6  |  6
(12 rows)


--on HEAD + v8 patches

postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
a IS NOT NULL ORDER BY 1;
 QUERY PLAN
-
 Sort
   Sort Key: ((lpd_p1.tableoid)::regclass)
   ->  Result
 ->  Append
   ->  Seq Scan on lpd_p1
 Filter: (a IS NOT NULL)
   ->  Seq Scan on lpd_p2
 Filter: (a IS NOT NULL)
(8 rows)

postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
BY 1;
 tableoid | a | b | c
--+---+---+---
 lpd_p1   | 1 | 1 | 1
 lpd_p1   | 2 | 2 | 2
 lpd_p1   | 3 | 3 | 3
 lpd_p2   | 4 | 4 | 4
 lpd_p2   | 5 | 5 | 5
(5 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] path toward faster partition pruning

2017-10-27 Thread Rajkumar Raghuwanshi
On Fri, Oct 27, 2017 at 2:41 PM, Amit Langote  wrote:

> 0001: added some new tests
> 0002: no change
> 0003: fixed issue that Rajkumar reported (cope with Params properly)
> 0004: no change
> 0005: fix the case to prune the default partition when warranted (the
>   issue reported by Beena)
>

Thanks for the updated patch. I am getting a server crash with the query below.

CREATE TABLE mp (c1 int, c2 int, c3 int) PARTITION BY LIST(c3);
CREATE TABLE mp_p1 PARTITION OF mp FOR VALUES IN (10, 20) PARTITION BY
RANGE(c2);
CREATE TABLE mp_p1_1 PARTITION OF mp_p1 FOR VALUES FROM (0) TO (200);
CREATE TABLE mp_p1_2 PARTITION OF mp_p1 FOR VALUES FROM (200) TO (400);
CREATE TABLE mp_p2 PARTITION OF mp FOR VALUES IN (30, 40) PARTITION BY
RANGE(c2);
CREATE TABLE mp_p2_1 PARTITION OF mp_p2 FOR VALUES FROM (0) TO (300);
CREATE TABLE mp_p2_2 PARTITION OF mp_p2 FOR VALUES FROM (300) TO (600);

INSERT INTO mp VALUES(10, 100, 10);
INSERT INTO mp VALUES(20, 200, 20);
INSERT INTO mp VALUES(21, 150, 30);
INSERT INTO mp VALUES(30, 200, 40);
INSERT INTO mp VALUES(31, 300, 30);
INSERT INTO mp VALUES(40, 400, 40);

EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM mp WHERE c3 = 40 AND
c2 < 300;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] path toward faster partition pruning

2017-10-27 Thread Rajkumar Raghuwanshi
On Thu, Oct 26, 2017 at 4:47 PM, Amit Langote  wrote:

>
> Meanwhile, attached updated set of patches including fixes for the typos
> you reported in the other message.  Updated 0005 fixes the first bug (the
> Case 1 in your email), while other patches 0002-0004 are updated mostly to
> fix the reported typos.  A couple of tests are added in 0001 to test the
> default partition case a bit more.
>

Hi Amit,

While testing this feature further, I hit a bug with partitions that are
foreign tables. A test case is given below. Take a look.

CREATE EXTENSION postgres_fdw;
CREATE SERVER fp_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname
'postgres', port '5432', use_remote_estimate 'true');
CREATE USER MAPPING FOR PUBLIC SERVER fp_server;

CREATE TABLE fplt1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE fplt1_p1 (a int, b int, c text);
CREATE FOREIGN TABLE ftplt1_p1 PARTITION OF fplt1 FOR VALUES IN ('',
'0001', '0002', '0003') SERVER fp_server OPTIONS (TABLE_NAME 'fplt1_p1');
CREATE TABLE fplt1_p2 (a int, b int, c text);
CREATE FOREIGN TABLE ftplt1_p2 PARTITION OF fplt1 FOR VALUES IN ('0004',
'0005', '0006', '0007') SERVER fp_server OPTIONS (TABLE_NAME 'fplt1_p2');

INSERT INTO fplt1_p1 SELECT i, i, to_char(i/50, 'FM') FROM
generate_series(0, 198, 2) i;
INSERT INTO fplt1_p2 SELECT i, i, to_char(i/50, 'FM') FROM
generate_series(200, 398, 2) i;

--PG-HEAD
postgres=# EXPLAIN (COSTS OFF) SELECT t1.c FROM fplt1 t1, LATERAL (SELECT
DISTINCT t2.c FROM fplt1 t2  WHERE  t2.c = t1.c ) q;
QUERY PLAN
--
 Nested Loop
   ->  Append
 ->  Foreign Scan on ftplt1_p1 t1
 ->  Foreign Scan on ftplt1_p2 t1_1
   ->  Unique
 ->  Append
   ->  Foreign Scan on ftplt1_p1 t2
   ->  Foreign Scan on ftplt1_p2 t2_1
(8 rows)

--PG-HEAD +v5 patches
postgres=# EXPLAIN (COSTS OFF) SELECT t1.c FROM fplt1 t1, LATERAL (SELECT
DISTINCT t2.c FROM fplt1 t2  WHERE  t2.c = t1.c ) q;

*ERROR:  invalid expression for partition key*
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] path toward faster partition pruning

2017-10-27 Thread Amit Langote
On 2017/10/27 13:57, Robert Haas wrote:
> On Fri, Oct 27, 2017 at 3:17 AM, Amit Langote
>  wrote:
>>> I don't think we really want to get into theorem-proving here, because
>>> it's slow.
>>
>> Just to be clear, I'm saying we could use theorem-proving (if at all) just
>> for the default partition.
> 
> I don't really see why it should be needed there either.  We've got
> all the bounds in order, so we should know where there are any gaps
> that are covered by the default partition in the range we care about.

Sorry, I forgot to add: "...just for the default partition, for cases like
the one in Beena's example."

In normal cases, default partition selection doesn't require any
theorem-proving.  It proceeds in a straightforward manner more or less
like what you said it should.

After thinking more on it, I think there is a rather straightforward trick
to implement the idea you mentioned to get this working for the case
presented in Beena's example, which works as follows:

For any non-root partitioned table, we add the list of its partition
constraint clauses to the query-provided list of clauses and use the whole
list to drive the partition-pruning algorithm.  So, when partition-pruning
runs for tprt_1, along with (< 1) which the original query provides,
we also have (>= 1) which comes from the partition constraint of tprt_1
(which is >= 1 and < 5).  Note that there exists a trick in the new
code for the (< 5) coming from the constraint to be overridden by the
more restrictive (< 1) coming from the original query.
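The intersection described above can be sketched abstractly. The following is an illustrative model only, not the patch's actual C implementation, using half-open integer intervals where None means unbounded:

```python
# Illustrative model only: combine the query-provided bounds with the
# partition constraint's bounds, keeping the more restrictive bound at
# each end.  Bounds are (lower, upper) with None meaning unbounded;
# lower is inclusive (>=), upper exclusive (<).
def intersect_bounds(query, constraint):
    q_lo, q_hi = query
    c_lo, c_hi = constraint
    lo = c_lo if q_lo is None else (q_lo if c_lo is None else max(q_lo, c_lo))
    hi = c_hi if q_hi is None else (q_hi if c_hi is None else min(q_hi, c_hi))
    return lo, hi

# Hypothetical values: the query says col1 < 10000; the partition's
# constraint says 1 <= col1 < 50001.  The constraint supplies the
# missing lower bound, and the query's tighter upper bound wins.
print(intersect_bounds((None, 10000), (1, 50001)))  # (1, 10000)
```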

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] path toward faster partition pruning

2017-10-26 Thread Robert Haas
On Fri, Oct 27, 2017 at 3:17 AM, Amit Langote
 wrote:
>> I don't think we really want to get into theorem-proving here, because
>> it's slow.
>
> Just to be clear, I'm saying we could use theorem-proving (if at all) just
> for the default partition.

I don't really see why it should be needed there either.  We've got
all the bounds in order, so we should know where there are any gaps
that are covered by the default partition in the range we care about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-10-26 Thread Amit Langote
On 2017/10/26 20:34, Robert Haas wrote:
> On Thu, Oct 26, 2017 at 1:17 PM, Amit Langote
>  wrote:
>> It can perhaps be taught to not make that conclusion by taking into account
>> the default partition's partition constraint, which includes constraint
>> inherited from the parent, viz. 1 <= col1 < 50001.  To do that, it might
>> be possible to summon up predtest.c's powers to conclude from the default
>> partition's partition constraint that it cannot contain any keys < 1, but
>> then we'll have to frame up a clause expression describing the latter.
>> Generating such a clause expression can be a bit daunting for a
>> multi-column key.   So, I haven't yet tried really hard to implement this.
>>  Any thoughts on that?
> 
> I don't think we really want to get into theorem-proving here, because
> it's slow.

Just to be clear, I'm saying we could use theorem-proving (if at all) just
for the default partition.

> Whatever we're going to do we should be able to do without
> that - keeping it in the form of btree-strategy + value.  It doesn't
> seem that hard.  Suppose we're asked to select partitions from tprt
> subject to (<, 1).  Well, we determine that some of the tprt_1
> partitions may be relevant, so we tell tprt_1 to select partitions
> subject to (>=, 1, <, 1).  We know to do that because we know that
> 1 < 5 and we know to include >= 1 because we haven't got any
> lower bound currently at all.  What's the problem?

Hmm, that's interesting.  With the approach that the patch currently
takes, (>= 1) wouldn't be passed down when selecting the partitions of
tprt_1.  The source of values (+ btree strategy) to use to select
partitions is the same original set of clauses for all partitioned tables
in the tree, as the patch currently implements it.  Nothing would get
added to that set (>= 1, as in this example), nor subtracted (such as
clauses that are trivially true).

I will think about this approach in general and to solve this problem in
particular.

> In some sense it's tempting to say that this case just doesn't matter
> very much; after all, subpartitioning on the same column used to
> partition at the top level is arguably lame.  But if we can get it
> right in a relatively straightforward manner then let's do it.

Yeah, I tend to agree.

Thanks for the input.

Regards,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-10-26 Thread Robert Haas
On Thu, Oct 26, 2017 at 1:17 PM, Amit Langote
 wrote:
> It can perhaps be taught to not make that conclusion by taking into account
> the default partition's partition constraint, which includes constraint
> inherited from the parent, viz. 1 <= col1 < 50001.  To do that, it might
> be possible to summon up predtest.c's powers to conclude from the default
> partition's partition constraint that it cannot contain any keys < 1, but
> then we'll have to frame up a clause expression describing the latter.
> Generating such a clause expression can be a bit daunting for a
> multi-column key.   So, I haven't yet tried really hard to implement this.
>  Any thoughts on that?

I don't think we really want to get into theorem-proving here, because
it's slow.  Whatever we're going to do we should be able to do without
that - keeping it in the form of btree-strategy + value.  It doesn't
seem that hard.  Suppose we're asked to select partitions from tprt
subject to (<, 1).  Well, we determine that some of the tprt_1
partitions may be relevant, so we tell tprt_1 to select partitions
subject to (>=, 1, <, 1).  We know to do that because we know that
1 < 5 and we know to include >= 1 because we haven't got any
lower bound currently at all.  What's the problem?
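The scheme of passing tightened bounds down the partition tree can be sketched as follows. The tree layout, function name, and bound values here are hypothetical, purely for illustration, not the actual PostgreSQL code:

```python
# Toy model of pushing tightened bounds down a partition tree, as
# described above.  Bounds are half-open [lower, upper); None means
# unbounded at that end.
def prune(partitions, lo, hi):
    """Return names of leaf partitions that may hold rows in [lo, hi)."""
    result = []
    for name, p_lo, p_hi, subparts in partitions:
        # Skip partitions wholly outside the requested range.
        if (hi is not None and p_lo >= hi) or (lo is not None and p_hi <= lo):
            continue
        if subparts is None:
            result.append(name)
        else:
            # Recurse with the bounds tightened by this level's own
            # range -- the "(>=, lower, <, upper)" hand-off above.
            new_lo = p_lo if lo is None else max(lo, p_lo)
            new_hi = p_hi if hi is None else min(hi, p_hi)
            result.extend(prune(subparts, new_lo, new_hi))
    return result

tree = [
    ("tprt_1", 1, 50001, [
        ("tprt_11", 1, 10000, None),
        ("tprt_12", 10000, 50001, None),
    ]),
    ("tprt_2", 50001, 100001, None),
]
# WHERE col1 < 10000: only tprt_11 can contain matching rows.
print(prune(tree, None, 10000))  # ['tprt_11']
```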

In some sense it's tempting to say that this case just doesn't matter
very much; after all, subpartitioning on the same column used to
partition at the top level is arguably lame.  But if we can get it
right in a relatively straightforward manner then let's do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-10-25 Thread Beena Emerson
Hello,

On Wed, Oct 25, 2017 at 1:07 PM, Amit Langote
 wrote:
> On 2017/10/25 15:47, Amit Langote wrote:
>> On 2017/10/24 1:38, Beena Emerson wrote:
>>> I had noticed this and also that this crash:
>>>
>>> tprt PARTITION BY RANGE(Col1)
>>>tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
>>>   tprt_11 FOR VALUES FROM (1) TO (1),
>>>   tprt_1d DEFAULT
>>>tprt_2 FOR VALUES FROM (50001) TO (11)
>>>
>>> EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 2 AND 7;
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> !>
>>
>> ...and this (crash) were due to bugs in the 0005 patch.
>
> [  ]
>
>> Should be fixed in the attached updated version.
>
> Oops, not quite.  The crash that Beena reported wasn't fixed (or rather
> reintroduced by some unrelated change after once confirming it was fixed).
>
> Really fixed this time.

Some minor comments:

1. wrong function name (0003)

The comment on function get_partitions_from_clauses_guts uses wrong name:
instead of "_from_", "_using_" is written.

 /*
+ * get_partitions_using_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *

2. typo information (0003)

+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * Nullness constraints and return that informatin in the output argument

3. misspell admissible (0003)
+* columns, whereas a prefix of all partition key columns is addmissible
+* as min and max keys.

4. double and? (0002)
+* as part of the operator family, check if its negator
+* exists and and that the latter is compatible with

5. typo inequality (0002)
+* (key < val OR key > val), if the partitioning method
+* supports such notion of inequlity.


6. typo output (0005)
+* return it as the only scannable partition, that means the query
+* doesn't want null values in its outout.

7. typo provide (0005)
+   /* Valid keys->eqkeys must provoide all partition keys. */
+   Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);

8. comment of struct PartClause (0003)
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid repeated recomputation in remove_redundant_clauses().
+ */

Instead of "repeated recomputation", we can use just "repeated
computation" or just "recomputation".



-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-10-25 Thread Beena Emerson
Hello Amit,

Thanks for the updated patches

On Wed, Oct 25, 2017 at 1:07 PM, Amit Langote
 wrote:
> On 2017/10/25 15:47, Amit Langote wrote:
>> On 2017/10/24 1:38, Beena Emerson wrote:
>>> I had noticed this and also that this crash:
>>>
>>> tprt PARTITION BY RANGE(Col1)
>>>tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
>>>   tprt_11 FOR VALUES FROM (1) TO (1),
>>>   tprt_1d DEFAULT
>>>tprt_2 FOR VALUES FROM (50001) TO (11)
>>>
>>> EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 2 AND 7;
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> !>
>>
>> ...and this (crash) were due to bugs in the 0005 patch.
>
> [  ]
>
>> Should be fixed in the attached updated version.
>
> Oops, not quite.  The crash that Beena reported wasn't fixed (or rather
> reintroduced by some unrelated change after once confirming it was fixed).
>
> Really fixed this time.
>

The crashes are fixed. However, the handling of the DEFAULT partition in
various queries is not correct.

Case 1: In this case the default partition should be selected.
 DROP TABLE tprt;
  CREATE TABLE tprt (col1 int, col2 int) PARTITION BY range(col1);
  CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (50001)
PARTITION BY list(col1);
  CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES IN (2, 25000);
  CREATE TABLE tprt_12 PARTITION OF tprt_1 FOR VALUES IN (5, 35000);
  CREATE TABLE tprt_13 PARTITION OF tprt_1 FOR VALUES IN (1);
  CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;


postgres=#  EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 < 1;
QUERY PLAN
--
 Result
   One-Time Filter: false
(2 rows)


Case 2: In this case the DEFAULT partition need not be selected.

DROP TABLE  tprt;
  CREATE TABLE tprt (col1 int, col2 int) PARTITION BY range(col1);
  CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (50001)
PARTITION BY range(col1);
  CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES FROM (1) TO (1);
  CREATE TABLE tprt_12 PARTITION OF tprt_1 FOR VALUES FROM (1) TO (2);
  CREATE TABLE tprt_13 PARTITION OF tprt_1 FOR VALUES FROM (2) TO (3);
  CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;
  INSERT INTO tprt SELECT generate_series(1,5), generate_series(1,5);

postgres=#  EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 < 1;
   QUERY PLAN

 Append
   ->  Seq Scan on tprt_11
 Filter: (col1 < 1)
   ->  Seq Scan on tprt_1d
 Filter: (col1 < 1)
(5 rows)


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-10-23 Thread Beena Emerson
On Mon, Oct 23, 2017 at 3:24 PM, Rajkumar Raghuwanshi
 wrote:
>
> On Mon, Oct 23, 2017 at 1:12 PM, Amit Langote
>  wrote:
>>
>> The compiler I have here (gcc (GCC) 6.2.0) didn't complain like that for
>> this typedef redefinition introduced by the 0002 patch, but it seems that
>> it's not needed anyway, so got rid of that line in the attached updated
>> patch.
>>
>> Fixed one more useless diff in 0002, but no changes in any other patch
>
>
> Thanks for the updated patches; I am able to compile them on HEAD.
>
> While testing this, I have an observation: pruning is not scanning the
> default partition, leading to wrong output. Below is a test to reproduce it.
>
> create table rp (a int, b varchar) partition by range (a);
> create table rp_p1 partition of rp default;
> create table rp_p2 partition of rp for values from (1) to (10);
> create table rp_p3 partition of rp for values from (10) to (maxvalue);
>
> insert into rp values (-1,'p1');
> insert into rp values (1,'p2');
> insert into rp values (11,'p3');
>
> postgres=# select tableoid::regclass,* from rp;
>  tableoid | a  | b
> --++
>  rp_p2|  1 | p2
>  rp_p3| 11 | p3
>  rp_p1| -1 | p1
> (3 rows)
>
> --with pruning
> postgres=# explain (costs off) select * from rp where a <= 1;
> QUERY PLAN
> --
>  Append
>->  Seq Scan on rp_p2
>  Filter: (a <= 1)
> (3 rows)
>
> postgres=# select * from rp where a <= 1;
>  a | b
> ---+
>  1 | p2
> (1 row)
>

I had noticed this and also that this crash:

tprt PARTITION BY RANGE(Col1)
   tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
  tprt_11 FOR VALUES FROM (1) TO (1),
  tprt_1d DEFAULT
   tprt_2 FOR VALUES FROM (50001) TO (11)

EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 2 AND 7;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>



-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-10-23 Thread Rajkumar Raghuwanshi
On Mon, Oct 23, 2017 at 1:12 PM, Amit Langote  wrote:

> The compiler I have here (gcc (GCC) 6.2.0) didn't complain like that for
> this typedef redefinition introduced by the 0002 patch, but it seems that
> it's not needed anyway, so got rid of that line in the attached updated
> patch.
>
> Fixed one more useless diff in 0002, but no changes in any other patch


Thanks for the updated patches; I am able to compile them on HEAD.

While testing this, I have an observation: pruning is not scanning the
default partition, leading to wrong output. Below is a test to reproduce it.

create table rp (a int, b varchar) partition by range (a);
create table rp_p1 partition of rp default;
create table rp_p2 partition of rp for values from (1) to (10);
create table rp_p3 partition of rp for values from (10) to (maxvalue);

insert into rp values (-1,'p1');
insert into rp values (1,'p2');
insert into rp values (11,'p3');

postgres=# select tableoid::regclass,* from rp;
 tableoid | a  | b
--++
 rp_p2|  1 | p2
 rp_p3| 11 | p3
 rp_p1| -1 | p1
(3 rows)

--with pruning
postgres=# explain (costs off) select * from rp where a <= 1;
QUERY PLAN
--
 Append
   ->  Seq Scan on rp_p2
 Filter: (a <= 1)
(3 rows)

postgres=# select * from rp where a <= 1;
 a | b
---+
 1 | p2
(1 row)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] path toward faster partition pruning

2017-10-23 Thread Rajkumar Raghuwanshi
On Thu, Oct 19, 2017 at 12:16 PM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> Description of the attached patches:
>
> 0001: add new tests for partition-pruning
>
> 0002: patch that makes all the changes needed in the planer (adds a stub
>   function in partition.c)
>
> 0003: patch that implements the aforementioned stub (significant amount of
>   code to analyze partition clauses and gin up bounding keys to
>   compare with the values in PartitionBoundInfo; the actual function
>   that will do the comparison is just a stub as of this patch)
>
> 0004: make some preparatory changes to partition_bound_cmp/bsearch, to be
>   able to pass incomplete partition keys (aka, prefix of a multi-
>   column key) for comparison with the values in PartitionBoundInfo
>  (just a refactoring patch)
>
> 0005: implements the stub mentioned in 0003 and finally gets the new
>   partition-pruning working (also disables constraint exclusion using
>   internal partition constraints by teaching get_relation_constraints
>   to not include those).
>
> Feedback greatly welcome.
>
Hi Amit,

I have tried to apply the attached patches. They applied cleanly on commit
id bf54c0f05c0a58db17627724a83e1b6d4ec2712c, but make failed with the
error below.

../../../../src/include/nodes/relation.h:2126: error: redefinition of
typedef ‘AppendRelInfo’
../../../../src/include/nodes/relation.h:584: note: previous declaration of
‘AppendRelInfo’ was here
make[4]: *** [gistbuild.o] Error 1


Re: [HACKERS] path toward faster partition pruning

2017-10-02 Thread Amit Langote
Hi David.

Thanks a lot for your review comments and sorry it took me a while to reply.

On 2017/09/28 18:16, David Rowley wrote:
> On 27 September 2017 at 14:22, Amit Langote
>  wrote:
>> - 0001 includes refactoring that Dilip proposed upthread [1] (added him as
>>   an author).  I slightly tweaked his patch -- renamed the function
>>   get_matching_clause to match_clauses_to_partkey, similar to
>>   match_clauses_to_index.
> 
> Hi Amit,
> 
> I've made a pass over the 0001 patch while trying to get myself up to
> speed with the various developments that are going on in partitioning
> right now.
> 
> These are just my thoughts from reading over the patch. I understand
> that there's quite a bit up in the air right now about how all this is
> going to work, but I have some thoughts about that too, which I
> wouldn't mind some feedback on as my line of thought may be off.
> 
> Anyway, I will start with some small typos that I noticed while reading:
> 
> partition.c:
> 
> + * telling what kind of NullTest has been applies to the corresponding
> 
> "applies" should be "applied"

Will fix.

> plancat.c:
> 
> * might've occurred to satisfy the query.  Rest of the fields are set in
> 
> "Rest of the" should be "The remaining"

Will fix.

> And onto more serious stuff:
> 
> allpath.c:
> 
> get_rel_partitions()
> 
> I wonder if this function does not belong in partition.c. I'd have
> expected a function to exist per partition type, RANGE and LIST, I'd
> imagine should have their own function in partition.c to eliminate
> partitions
> which cannot possibly match, and return the remainder. I see people
> speaking of HASH partitioning, but we might one day also want
> something like RANDOM or ROUNDROBIN (I'm not really kidding, imagine
> routing records to be processed into foreign tables where you never
> need to query them again). It would be good if we could easily expand
> this list and not have to touch files all over the optimizer to do
> that. Of course, there would be other work to do in the executor to
> implement any new partitioning method too.

I think there will have to be at least some code in the optimizer.  That
is, the code to match the query to the table's partition keys and using
the constant values therein to then look up the table's partitions.  More
importantly, once the partitions (viz. their offsets in the table's
partition descriptor) have been looked up by partition.c, we must be able
to then map them to their planner data structures viz. their
AppendRelInfo, and subsequently RelOptInfo.  This last part will have to
be in the optimizer.  In fact, that was the role of get_rel_partitions in
the initial versions of the patch, when neither of the code for matching
keys and for pruning using constants was implemented.

One may argue that the first part, that is, matching clauses to the
partition key and the subsequent lookup of partitions using constants,
could both be implemented in partition.c, but it seems better to me to
keep at least the part of matching clauses to the partition keys within
the planner (just like matching clauses to indexes is done by the
planner).  Looking up partitions using constants cannot be done outside
partition.c anyway, so that's something we have to implement there.

> I know there's mention of it somewhere about get_rel_partitions() not
> being so smart about WHERE partkey > 20 AND partkey > 10, the code
> does not understand what's more restrictive. I think you could
> probably follow the same rules here that are done in
> eval_const_expressions(). Over there I see that evaluate_function()
> will call anything that's not marked as volatile.

Hmm, unless I've missed it, I don't see in evaluate_function() anything
about determining which clause is more restrictive.  AFAIK, such
determination depends on the btree operator class semantics (at least in
the most common cases where, say, ">" means greater than in a sense that
btree code uses it), so I was planning to handle it the way btree code
handles it in _bt_preprocess_keys().  In fact, the existing code in
predtest.c, which makes decisions of the similar vein also relies on btree
semantics.  It's OK to do so, because partitioning methods that exist
today and for which we'd like to have such smarts use btree semantics to
partition the data.  Also, we won't need to optimize such cases for hash
partitioning anyway.

> I'd imagine, for
> each partition key, you'd want to store a Datum with the minimum and
> maximum possible value based on the quals processed. If either the
> minimum or maximum is still set to NULL, then it's unbounded at that
> end. If you encounter partkey = Const, then minimum and maximum can be
> set to the value of that Const. From there you can likely ignore any
> other quals for that partition key, as if the query did contain
> another qual with partkey = SomeOtherConst, then that would have
> become a gating qual during the constant folding process. This way 

Re: [HACKERS] path toward faster partition pruning

2017-10-02 Thread Robert Haas
On Sun, Oct 1, 2017 at 9:13 PM, Amit Langote
 wrote:
> I agree.  Equality checks are going to be common enough to warrant them to
> be handled specially, instead of implementing equality-pruning on top of
> min/max framework.

What you might do is pass a <value, btree strategy> pair and optionally
allow a second <value, btree strategy> pair.  Then for
the common case of equality you can pass BTEqualStrategyNumber and for
a range bounded at both ends you can pass BTGreaterStrategyNumber or
BTGreaterEqualStrategyNumber for one bound and BTLessStrategyNumber or
BTLessEqualStrategyNumber for the other.

Not sure if this is exactly the right idea but it's what pops to mind.
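A minimal sketch of the (strategy, value) idea follows. The numeric constants mirror PostgreSQL's btree strategy numbers (BTLessStrategyNumber through BTGreaterStrategyNumber, i.e. 1 through 5), but the matches() helper itself is a hypothetical illustration, not actual PostgreSQL code:

```python
# Sketch of expressing pruning conditions as (strategy, value) pairs.
BT_LESS, BT_LESS_EQUAL, BT_EQUAL, BT_GREATER_EQUAL, BT_GREATER = 1, 2, 3, 4, 5

OPS = {
    BT_LESS: lambda k, v: k < v,
    BT_LESS_EQUAL: lambda k, v: k <= v,
    BT_EQUAL: lambda k, v: k == v,
    BT_GREATER_EQUAL: lambda k, v: k >= v,
    BT_GREATER: lambda k, v: k > v,
}

def matches(key, conditions):
    """conditions: list of (strategy, value) pairs, ANDed together."""
    return all(OPS[s](key, v) for s, v in conditions)

# Equality is a single pair; a range bounded at both ends is two pairs.
print(matches(25, [(BT_EQUAL, 25)]))                         # True
print(matches(25, [(BT_GREATER_EQUAL, 1), (BT_LESS, 100)]))  # True
```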

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-10-01 Thread Amit Langote
On 2017/09/30 1:28, Robert Haas wrote:
> On Thu, Sep 28, 2017 at 5:16 AM, David Rowley
>  wrote:
>> I'd imagine, for
>> each partition key, you'd want to store a Datum with the minimum and
>> maximum possible value based on the quals processed. If either the
>> minimum or maximum is still set to NULL, then it's unbounded at that
>> end. If you encounter partkey = Const, then minimum and maximum can be
>> set to the value of that Const. From there you can likely ignore any
>> other quals for that partition key, as if the query did contain
>> another qual with partkey = SomeOtherConst, then that would have
>> become a gating qual during the constant folding process. This way if
>> the user had written WHERE partkey >= 1 AND partkey <= 1 the
>> evaluation would end up the same as if they'd written WHERE partkey =
>> 1 as the minimum and maximum would be the same value in both cases,
>> and when those two values are the same then it would mean just one
>> theoretical binary search on a partition range to find the correct
>> partition instead of two.
> 
> I have not looked at the code submitted here in detail yet but I do
> think we should try to avoid wasting cycles in the
> presumably-quite-common case where equality is being tested.  The
> whole idea of thinking of this as minimum/maximum seems like it might
> be off precisely for that reason.

I agree.  Equality checks are going to be common enough to warrant them to
be handled specially, instead of implementing equality-pruning on top of
min/max framework.

Thanks,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-09-29 Thread Robert Haas
On Thu, Sep 28, 2017 at 5:16 AM, David Rowley
 wrote:
> I'd imagine, for
> each partition key, you'd want to store a Datum with the minimum and
> maximum possible value based on the quals processed. If either the
> minimum or maximum is still set to NULL, then it's unbounded at that
> end. If you encounter partkey = Const, then minimum and maximum can be
> set to the value of that Const. From there you can likely ignore any
> other quals for that partition key, as if the query did contain
> another qual with partkey = SomeOtherConst, then that would have
> become a gating qual during the constant folding process. This way if
> the user had written WHERE partkey >= 1 AND partkey <= 1 the
> evaluation would end up the same as if they'd written WHERE partkey =
> 1 as the minimum and maximum would be the same value in both cases,
> and when those two values are the same then it would mean just one
> theoretical binary search on a partition range to find the correct
> partition instead of two.

I have not looked at the code submitted here in detail yet but I do
think we should try to avoid wasting cycles in the
presumably-quite-common case where equality is being tested.  The
whole idea of thinking of this as minimum/maximum seems like it might
be off precisely for that reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-09-28 Thread David Rowley
On 27 September 2017 at 14:22, Amit Langote
 wrote:
> - 0001 includes refactoring that Dilip proposed upthread [1] (added him as
>   an author).  I slightly tweaked his patch -- renamed the function
>   get_matching_clause to match_clauses_to_partkey, similar to
>   match_clauses_to_index.

Hi Amit,

I've made a pass over the 0001 patch while trying to get myself up to
speed with the various developments that are going on in partitioning
right now.

These are just my thoughts from reading over the patch. I understand
that there's quite a bit up in the air right now about how all this is
going to work, but I have some thoughts about that too, which I
wouldn't mind some feedback on as my line of thought may be off.

Anyway, I will start with some small typos that I noticed while reading:

partition.c:

+ * telling what kind of NullTest has been applies to the corresponding

"applies" should be "applied"

plancat.c:

* might've occurred to satisfy the query.  Rest of the fields are set in

"Rest of the" should be "The remaining"

And onto more serious stuff:

allpath.c:

get_rel_partitions()

I wonder if this function does not belong in partition.c. I'd have
expected a function to exist per partition type, RANGE and LIST, I'd
imagine should have their own function in partition.c to eliminate
partitions
which cannot possibly match, and return the remainder. I see people
speaking of HASH partitioning, but we might one day also want
something like RANDOM or ROUNDROBIN (I'm not really kidding, imagine
routing records to be processed into foreign tables where you never
need to query them again). It would be good if we could easily expand
this list and not have to touch files all over the optimizer to do
that. Of course, there would be other work to do in the executor to
implement any new partitioning method too.

I know there's mention of it somewhere about get_rel_partitions() not
being so smart about WHERE partkey > 20 AND partkey > 10, the code
does not understand what's more restrictive. I think you could
probably follow the same rules here that are done in
eval_const_expressions(). Over there I see that evaluate_function()
will call anything that's not marked as volatile. I'd imagine, for
each partition key, you'd want to store a Datum with the minimum and
maximum possible value based on the quals processed. If either the
minimum or maximum is still set to NULL, then it's unbounded at that
end. If you encounter partkey = Const, then minimum and maximum can be
set to the value of that Const. From there you can likely ignore any
other quals for that partition key, as if the query did contain
another qual with partkey = SomeOtherConst, then that would have
become a gating qual during the constant folding process. This way if
the user had written WHERE partkey >= 1 AND partkey <= 1 the
evaluation would end up the same as if they'd written WHERE partkey =
1 as the minimum and maximum would be the same value in both cases,
and when those two values are the same then it would mean just one
theoretical binary search on a partition range to find the correct
partition instead of two.
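The min/max accumulation described above can be modelled roughly as follows. This is illustrative only: it ignores the strict vs. non-strict distinction and assumes plain integer comparison semantics, which the real code cannot:

```python
# Rough model of folding the quals on one partition key into a single
# (min, max) pair, None meaning unbounded at that end.
def fold_quals(quals):
    lo = hi = None
    for op, val in quals:
        if op in ('>', '>='):
            lo = val if lo is None else max(lo, val)
        elif op in ('<', '<='):
            hi = val if hi is None else min(hi, val)
        elif op == '=':
            lo = val if lo is None else max(lo, val)
            hi = val if hi is None else min(hi, val)
    return lo, hi

# WHERE partkey >= 1 AND partkey <= 1 collapses to the same bounds as
# WHERE partkey = 1, so one binary search would suffice instead of two.
print(fold_quals([('>=', 1), ('<=', 1)]))  # (1, 1)
print(fold_quals([('=', 1)]))              # (1, 1)
```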

I see in get_partitions_for_keys you've crafted the function to return
something to identify which partitions need to be scanned. I think it
would be nice to see a special element in the partition array for the
NULL and DEFAULT partition. I imagine 0 and 1, but obviously, these
would be defined constants somewhere. The signature of that function
is not so pretty, and that would likely tidy it up a bit. The
matching partition indexes could be returned as a Bitmapset, yet I
don't really see any code which handles adding the NULL and DEFAULT
partitions in get_rel_partitions() either; maybe I've just not looked
hard enough yet...

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] path toward faster partition pruning

2017-09-28 Thread Dilip Kumar
On Thu, Sep 28, 2017 at 1:44 PM, Amit Langote
 wrote:
> On 2017/09/28 13:58, Dilip Kumar wrote:

>> I think the above logic is common between this patch and the runtime
>> pruning.  I think we can make
>> a reusable function.  Here we are preparing minkey and maxkey of Datum
>> because we can directly fetch rightop->constvalue whereas for runtime
>> pruning we are making minkeys and maxkeys of Expr because during
>> planning time we don't have the values for the Param.  I think we can
>> always make these minkey, maxkey array of Expr and later those can be
>> processed in whatever way we want it.  So this path will fetch the
>> constval out of Expr and runtime pruning will Eval that expression at
>> runtime.
>
> I think that makes sense.  In fact we could even move the minkey/maxkey
> collection code to match_clauses_to_partkey() itself.  No need for a
> different function and worrying about defining a separate interface for
> the same.  We match clauses exactly because we want to extract the
> constant or param values out of them.  No need to do the two activities
> independently and in different places.
>

+1


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] path toward faster partition pruning

2017-09-28 Thread Amit Langote
On 2017/09/28 13:58, Dilip Kumar wrote:
> On Wed, Sep 27, 2017 at 6:52 AM, Amit Langote
>  wrote:
> 
> I was looking into the latest patch set, seems like we can reuse some
> more code between this path and runtime pruning[1]
> 
> + foreach(lc1, matchedclauses[i])
> + {
> + Expr   *clause = lfirst(lc1);
> + Const  *rightop = (Const *) get_rightop(clause);
> + Oid opno = ((OpExpr *) clause)->opno,
> + opfamily = rel->part_scheme->partopfamily[i];
> + StrategyNumber strategy;
> +
> + strategy = get_op_opfamily_strategy(opno, opfamily);
> + switch (strategy)
> + {
> + case BTLessStrategyNumber:
> + case BTLessEqualStrategyNumber:
> + if (need_next_max)
> + {
> + maxkeys[i] = rightop->constvalue;
> + if (!maxkey_set[i])
> + n_maxkeys++;
> + maxkey_set[i] = true;
> + max_incl = (strategy == BTLessEqualStrategyNumber);
> + }
> + if (strategy == BTLessStrategyNumber)
> + need_next_max = false;
> 
> I think the above logic is common between this patch and the runtime
> pruning.  I think we can make
> a reusable function.  Here we are preparing minkey and maxkey of Datum
> because we can directly fetch rightop->constvalue whereas for runtime
> pruning we are making minkeys and maxkeys of Expr because during
> planning time we don't have the values for the Param.  I think we can
> always make these minkey, maxkey array of Expr and later those can be
> processed in whatever way we want it.  So this path will fetch the
> constval out of Expr and runtime pruning will Eval that expression at
> runtime.

I think that makes sense.  In fact we could even move the minkey/maxkey
collection code to match_clauses_to_partkey() itself.  No need for a
different function and worrying about defining a separate interface for
the same.  We match clauses exactly because we want to extract the
constant or param values out of them.  No need to do the two activities
independently and in different places.

> Does this make sense, or will it cause one level of extra processing
> for this path, i.e. converting the Expr array to a Const array?

Hm, it's not such a big cost to pay I'd think.

I will update the planner patch accordingly.

Thanks,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-09-27 Thread Dilip Kumar
On Wed, Sep 27, 2017 at 6:52 AM, Amit Langote
 wrote:

I was looking into the latest patch set, seems like we can reuse some
more code between this path and runtime pruning[1]

+ foreach(lc1, matchedclauses[i])
+ {
+ Expr   *clause = lfirst(lc1);
+ Const  *rightop = (Const *) get_rightop(clause);
+ Oid opno = ((OpExpr *) clause)->opno,
+ opfamily = rel->part_scheme->partopfamily[i];
+ StrategyNumber strategy;
+
+ strategy = get_op_opfamily_strategy(opno, opfamily);
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkeys[i] = rightop->constvalue;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+ }
+ if (strategy == BTLessStrategyNumber)
+ need_next_max = false;

I think the above logic is common between this patch and the runtime
pruning.  I think we can make
a reusable function.  Here we are preparing minkey and maxkey of Datum
because we can directly fetch rightop->constvalue whereas for runtime
pruning we are making minkeys and maxkeys of Expr because during
planning time we don't have the values for the Param.  I think we can
always make these minkey, maxkey array of Expr and later those can be
processed in whatever way we want it.  So this path will fetch the
constval out of Expr and runtime pruning will Eval that expression at
runtime.

Does this make sense, or will it cause one level of extra processing
for this path, i.e. converting the Expr array to a Const array?

[1] 
https://www.postgresql.org/message-id/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A%40mail.gmail.com
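In miniature, the Expr-array idea might look roughly like this (illustrative C only; the real patch would carry actual Expr nodes and evaluate Params with the executor's expression machinery):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for Const and Param expression nodes. */
typedef enum KeyExprKind
{
    KEY_CONST,                  /* value known at plan time */
    KEY_PARAM                   /* value known only at execution time */
} KeyExprKind;

typedef struct KeyExpr
{
    KeyExprKind kind;
    int         constvalue;     /* valid when kind == KEY_CONST */
    int         paramid;        /* valid when kind == KEY_PARAM */
} KeyExpr;

/*
 * Resolve a key expression to a concrete value.  Plan-time pruning calls
 * this with param_values == NULL and only KEY_CONST nodes present, while
 * run-time pruning calls it again once parameter values are bound.
 */
static int
eval_key(const KeyExpr *e, const int *param_values)
{
    if (e->kind == KEY_CONST)
        return e->constvalue;
    return param_values[e->paramid];
}
```

The minkey/maxkey arrays would then hold KeyExpr-like nodes in both paths, and only the point at which eval_key() runs differs.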

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] path toward faster partition pruning

2017-09-27 Thread amul sul
On Wed, Sep 27, 2017 at 6:09 AM, Amit Langote  wrote:

> On 2017/09/27 1:51, Robert Haas wrote:
> > On Tue, Sep 26, 2017 at 10:57 AM, Jesper Pedersen
> >  wrote:
> >> One could advocate (*cough*) that the hash partition patch [1] should be
> >> merged first in order to find other instances of where other CommitFest
> >> entries doesn't account for hash partitions at the moment in their method
> >> signatures; Beena noted something similar in [2]. I know that you said
> >> otherwise [3], but this is CommitFest 1, so there is time for a revert
> >> later, and hash partitions are already useful in internal testing.
> >
> > Well, that's a fair point.  I was assuming that committing things in
> > that order would cause me to win the "least popular committer" award
> > at least for that day, but maybe not.  It's certainly not ideal to
> > have to juggle that patch along and keep rebasing it over other
> > changes when it's basically done, and just waiting on other
> > improvements to land.  Anybody else wish to express an opinion?
>
> FWIW, I tend to agree that it would be nice to get the hash partitioning
> patch in, even with old constraint exclusion based partition-pruning not
> working for hash partitions.  That way, it might be more clear what we
> need to do in the partition-pruning patches to account for hash partitions.
>
+1

regards,
Amul


Re: [HACKERS] path toward faster partition pruning

2017-09-26 Thread Amit Langote
Hi Jesper.

Firstly, thanks for looking at the patch.

On 2017/09/26 22:00, Jesper Pedersen wrote:
> Hi Amit,
> 
> On 09/15/2017 04:50 AM, Amit Langote wrote:
>> On 2017/09/15 11:16, Amit Langote wrote:
>>> I will post rebased patches later today, although I think the overall
>>> design of the patch on the planner side of things is not quite there yet.
>>> Of course, your and others' feedback is greatly welcome.
>>
>> Rebased patches attached.  Because Dilip complained earlier today about
>> clauses of the form (const op var) not causing partition-pruning, I've
>> added code to commute the clause where it is required.  Some other
>> previously mentioned limitations remain -- no handling of OR clauses, no
>> elimination of redundant clauses for given partitioning column, etc.
>>
>> A note about 0001: this patch overlaps with
>> 0003-Canonical-partition-scheme.patch from the partitionwise-join patch
>> series that Ashutosh Bapat posted yesterday [1].  Because I implemented
>> the planner-portion of this patch based on what 0001 builds, I'm posting
>> it here.  It might actually turn out that we will review and commit
>> 0003-Canonical-partition-scheme.patch on that thread, but meanwhile apply
>> 0001 if you want to play with the later patches.  I would certainly like
>> to review  0003-Canonical-partition-scheme.patch myself, but won't be able
>> to immediately (see below).
>>
> 
> Could you share your thoughts on the usage of PartitionAppendInfo's
> min_datum_idx / max_datum_idx ? Especially in relation to hash partitions.
> 
> I'm looking at get_partitions_for_keys.

Sure.  You may have noticed that min_datum_idx and max_datum_idx in
PartitionAppendInfo are offsets in the PartitionDescData.boundinfo.datums
array, of datums that lie within the query-specified range (that is, using
=, >, >=, <, <= btree operators in the query). That array contains bounds
of all partitions sorted in the btree operator class defined order, at
least for list and range partitioning.  I haven't (yet) closely looked at
the composition of hash partition datums in PartitionBoundInfo, which
perhaps have different ordering properties (or maybe none) than list and
range partitioning datums.

Now, since they are offsets of datums in PartitionBoundInfo, not indexes
of partitions themselves, their utility outside partition.c might be
questionable.  But partition-wise join patch, for example, to determine if
two partitioned tables can be joined partition-wise, is going to check if
PartitionBoundInfos in RelOptInfos of two partitioned tables are
bound-to-bound equal [1].  Partition-pruning may select only a subset of
partitions of each of the joining partitioned tables.  Equi-join
requirement for partition-wise join means that the subset of partitions
will be same on both sides of the join.  My intent of having
min_datum_idx, max_datum_idx, along with contains_null_partition, and
contains_default_partition in PartitionAppendInfo is to have sort of a
cross check that those values end up being same on both sides of the join
after equi-join requirement has been satisfied.  That is,
get_partitions_for_keys will have chosen the same set of partitions for
both partitioned tables and hence will have set the same values for those
fields.

As mentioned above, that may be enough for list and range partitioning,
but since hash partition datums do not appear to have the same properties
as list and range datums [2], we may need an additional field(s) to
describe the hash partition selected by get_partitions_for_keys.  I guess
only one field will be enough, that will be the offset in the datums array
of the hash partition chosen for the query or -1 if query quals couldn't
conclusively determine one (maybe not all partition keys were specified to
be hashed or some or all used non-equality operator).

Hope that answers your question at least to some degree.  Now, there are
some points Robert mentioned in his reply that I will need to also
consider in the patch, which I'll go do now. :)

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAFjFpRc4UdCYknBai9pBu2GA1h4nZVNPDmzgs4jOkqFamT1huA%40mail.gmail.com

[2] It appears that in the hash partitioning case, unlike list and range
partitioning, PartitionBoundInfo doesn't contain values that are
directly comparable to query-specified constants, but a pair (modulus,
remainder) for each partition.  We'll first hash *all* the key values
(mentioned in the query) using the partitioning hash machinery and
then determine the hash partition index by using
(hash % largest_modulus) as offset into the PartitionBoundInfo.indexes
array.
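As a toy illustration of [2], with a made-up hash function standing in for the partitioning opclass hashing: the indexes array maps every slot 0 .. greatest_modulus-1 to a partition number, where a partition with (modulus m, remainder r) owns the slots whose (slot % m) == r.

```c
#include <assert.h>
#include <stdint.h>

/* Not the real hash; merely combines key columns deterministically. */
static uint64_t
toy_hash(uint64_t seed, uint64_t value)
{
    return seed * UINT64_C(1099511628211) ^ value;
}

/*
 * Select a hash partition: hash all the specified key values, then use
 * (hash % greatest_modulus) as an offset into the indexes array.
 */
static int
hash_partition_index(const uint64_t *keyvals, int nkeys,
                     const int *indexes, int greatest_modulus)
{
    uint64_t    hash = 0;

    for (int i = 0; i < nkeys; i++)
        hash = toy_hash(hash, keyvals[i]);

    return indexes[hash % (uint64_t) greatest_modulus];
}
```

For instance, with greatest modulus 4, a partition (modulus 2, remainder 0) owns slots 0 and 2, while partitions (4, 1) and (4, 3) own slots 1 and 3 respectively.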





Re: [HACKERS] path toward faster partition pruning

2017-09-26 Thread Amit Langote
Hi David,

On 2017/09/27 6:04, David Rowley wrote:
> On 25 September 2017 at 23:04, Amit Langote
>  wrote:
>> By the way, I'm now rebasing these patches on top of [1] and will try to
>> merge your refactoring patch in some appropriate way.  Will post more
>> tomorrow.
>>
>> [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf826
> 
> Yeah, I see 0001 conflicts with that. I'm going to set this to waiting
> on author while you're busy rebasing this.

Thanks for the reminder.  Just thought I'd say that while I'm actually
done rebasing itself (attaching rebased patches to try 'em out), I'm now
considering Robert's comments and will be busy for a bit revising things
based on those comments.

Some notes about the attached patches:

- 0001 includes refactoring that Dilip proposed upthread [1] (added him as
  an author).  I slightly tweaked his patch -- renamed the function
  get_matching_clause to match_clauses_to_partkey, similar to
  match_clauses_to_index.

- Code to set AppendPath's partitioned_rels in add_paths_to_append_rel()
  revised by 0a480502b09 (was originally introduced in d3cc37f1d80) is
  still revised to get partitioned_rels from a source that is not
  PlannerInfo.pcinfo_list.  With the new code, partitioned_rels won't
  contain RT indexes of the partitioned child tables that were pruned.


Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAFiTN-tGnQzF_4QtbOHT-3hE%3DOvNaMfbbeRxa4UY0CQyF0G8gQ%40mail.gmail.com
From f20aebcad9b089434ba60cd4439fa1a9d55091b8 Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 1/4] WIP: planner-side changes for partition-pruning

Firstly, this adds a stub get_partitions_for_keys() in partition.c
with appropriate interface for the caller to specify bounding scan
keys, along with other information about the scan keys extracted
from the query, such as NULL-ness of the keys, inclusive-ness, etc.

More importantly, this implements the planner-side logic to extract
bounding scan keys to be passed to get_partitions_for_keys.  That is,
it will go through rel->baserestrictinfo and match individual clauses
to partition keys and construct lower bound and upper bound tuples,
which may cover only a prefix of a multi-column partition key.

A bunch of smarts are still missing when mapping the clause operands
with keys.  For example, code to match a clause specified as
(constant op var) doesn't exist.  Also, redundant keys are not
eliminated; for example, a combination of the clauses a = 10 and a > 1
will cause the latter clause a > 1 to take over, resulting in
needless scanning of partitions containing values a > 1 and a < 10.

...constraint exclusion is still used, because
get_partitions_for_keys is just a stub...

Authors: Amit Langote, Dilip Kumar
---
 src/backend/catalog/partition.c   |  43 
 src/backend/optimizer/path/allpaths.c | 390 ++
 src/backend/optimizer/util/plancat.c  |  10 +
 src/backend/optimizer/util/relnode.c  |   7 +
 src/include/catalog/partition.h   |   9 +
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/relation.h  |  66 ++
 7 files changed, 487 insertions(+), 39 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 1ab6dba7ae..ccf8a1fa67 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1335,6 +1335,49 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
 }
 
+/*
+ * get_partitions_for_keys
+ * Returns the list of indexes of rel's partitions that will need to be
+ * scanned given the bounding scan keys.
+ *
+ * Each value in the returned list can be used as an index into the oids array
+ * of the partition descriptor.
+ *
+ * Inputs:
+ * keynullness contains between 0 and (key->partnatts - 1) values, each
+ * telling what kind of NullTest has been applied to the corresponding
+ * partition key column.  minkeys represents the lower bound on the partition
+ * key of the records that the query will return, while maxkeys represents
+ * the upper bound.  min_inclusive and max_inclusive tell whether the bounds
+ * specified by minkeys and maxkeys are inclusive, respectively.
+ *
+ * Other outputs:
+ * *min_datum_index will return the index in boundinfo->datums of the first
+ * datum that the query's bounding keys allow to be returned for the query.
+ * Similarly, *max_datum_index.  *null_partition_chosen returns whether
+ * the null partition will be scanned.
+ *
+ * TODO: Implement.
+ */
+List *
+get_partitions_for_keys(Relation rel,
+   NullTestType *keynullness,
+   Datum *minkeys, int n_minkeys, bool min_inclusive,
+   Datum *maxkeys, int n_maxkeys, bool max_inclusive,
+  

Re: [HACKERS] path toward faster partition pruning

2017-09-26 Thread Amit Langote
On 2017/09/27 1:51, Robert Haas wrote:
> On Tue, Sep 26, 2017 at 10:57 AM, Jesper Pedersen
>  wrote:
>> One could advocate (*cough*) that the hash partition patch [1] should be
>> merged first in order to find other instances of where other CommitFest
>> entries doesn't account for hash partitions at the moment in their method
>> signatures; Beena noted something similar in [2]. I know that you said
>> otherwise [3], but this is CommitFest 1, so there is time for a revert
>> later, and hash partitions are already useful in internal testing.
> 
> Well, that's a fair point.  I was assuming that committing things in
> that order would cause me to win the "least popular committer" award
> at least for that day, but maybe not.  It's certainly not ideal to
> have to juggle that patch along and keep rebasing it over other
> changes when it's basically done, and just waiting on other
> improvements to land.  Anybody else wish to express an opinion?

FWIW, I tend to agree that it would be nice to get the hash partitioning
patch in, even with old constraint exclusion based partition-pruning not
working for hash partitions.  That way, it might be more clear what we
need to do in the partition-pruning patches to account for hash partitions.

Thanks,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-09-26 Thread David Rowley
On 25 September 2017 at 23:04, Amit Langote
 wrote:
> By the way, I'm now rebasing these patches on top of [1] and will try to
> merge your refactoring patch in some appropriate way.  Will post more
> tomorrow.
>
> [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf826

Yeah, I see 0001 conflicts with that. I'm going to set this to waiting
on author while you're busy rebasing this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] path toward faster partition pruning

2017-09-26 Thread Robert Haas
On Tue, Sep 26, 2017 at 10:57 AM, Jesper Pedersen
 wrote:
> One could advocate (*cough*) that the hash partition patch [1] should be
> merged first in order to find other instances of where other CommitFest
> entries doesn't account for hash partitions at the moment in their method
> signatures; Beena noted something similar in [2]. I know that you said
> otherwise [3], but this is CommitFest 1, so there is time for a revert
> later, and hash partitions are already useful in internal testing.

Well, that's a fair point.  I was assuming that committing things in
that order would cause me to win the "least popular committer" award
at least for that day, but maybe not.  It's certainly not ideal to
have to juggle that patch along and keep rebasing it over other
changes when it's basically done, and just waiting on other
improvements to land.  Anybody else wish to express an opinion?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-09-26 Thread Jesper Pedersen

On 09/26/2017 10:33 AM, Robert Haas wrote:

On Tue, Sep 26, 2017 at 9:00 AM, Jesper Pedersen
 wrote:

Could you share your thoughts on the usage of PartitionAppendInfo's
min_datum_idx / max_datum_idx ? Especially in relation to hash partitions.


This brings up something that I've kind of been thinking about.  There
are sort of four cases when it comes to partition pruning:

1. There is exactly one matching partition.  For example, this happens
when there is an equality constraint on every partition column.

2. There are multiple matching partitions which are consecutive.  For
example, there is a single level of range partitioning with no default
partition and the single partitioning column is constrained by < > <=
or >=.

3. There are multiple matching partitions which are not consecutive.
This case is probably rare, but it can happen if there is a default
partition, if there are list partitions with multiple bounds that are
interleaved (e.g. p1 allows (1, 4), p2 allows (2), p3 allows (3, 5),
and the query allows values >= 4 and <= 5), if the query involves OR
conditions, or if there are multiple levels of partitioning (e.g.
partition by a, subpartition by b, put a range constraint on a and an
equality constraint on b).

4. There are no matching partitions.

One of the goals of this algorithm is to be fast.  The obvious way to
cater to case (3) is to iterate through all partitions and test
whether each one works, returning a Bitmapset, but that is O(n).
Admittedly, it might be O(n) with a pretty small constant factor, but
it still seems like exactly the sort of thing that we want to avoid
given the desire to scale to higher partition counts.

I propose that we create a structure that looks like this:

struct foo {
int min_partition;
int max_partition;
Bitmapset *extra_partitions;
};

This indicates that all partitions from min_partition to max_partition
need to be scanned, and in addition any partitions in extra_partitions
need to be scanned.  Assuming that we only consider cases where all
partition keys or a leading subset of the partition keys are
constrained, we'll generally be able to get by with just setting
min_partition and max_partition, but extra_partitions can be used to
handle default partitions and interleaved list bounds.  For equality
on all partitioning columns, we can do a single bsearch of the bounds
to identify the target partition at a given partitioning level, and
the same thing works for a single range-bound.  If there are two
range-bounds (< and > or <= and >= or whatever) we need to bsearch
twice.  The default partition, if any and if matched, must also be
included.  When there are multiple levels of partitioning things get a
bit more complex -- if someone wants to knock out a partition that
breaks up the range, we might need to shrink the main range to cover
part of it and kick the other indexes out to extra_partitions.

But the good thing is that in common cases with only O(lg n) effort we
can return O(1) data that describes what will be scanned.  In cases
where that's not practical we expend more effort but still prune with
maximal effectiveness.



For OLTP style applications 1) would be the common case, and with hash 
partitions it would be one equality constraint.


So, changing the method signature to use a data type as you described 
above instead of the explicit min_datum_idx / max_datum_idx output 
parameters would be more clear.


One could advocate (*cough*) that the hash partition patch [1] should be 
merged first in order to find other instances of where other CommitFest 
entries doesn't account for hash partitions at the moment in their 
method signatures; Beena noted something similar in [2]. I know that you 
said otherwise [3], but this is CommitFest 1, so there is time for a 
revert later, and hash partitions are already useful in internal testing.


[1] https://commitfest.postgresql.org/14/1059/
[2] 
https://www.postgresql.org/message-id/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A%40mail.gmail.com

[3] http://rhaas.blogspot.com/2017/08/plans-for-partitioning-in-v11.html

Best regards,
 Jesper




Re: [HACKERS] path toward faster partition pruning

2017-09-26 Thread Robert Haas
On Tue, Sep 26, 2017 at 9:00 AM, Jesper Pedersen
 wrote:
> Could you share your thoughts on the usage of PartitionAppendInfo's
> min_datum_idx / max_datum_idx ? Especially in relation to hash partitions.

This brings up something that I've kind of been thinking about.  There
are sort of four cases when it comes to partition pruning:

1. There is exactly one matching partition.  For example, this happens
when there is an equality constraint on every partition column.

2. There are multiple matching partitions which are consecutive.  For
example, there is a single level of range partitioning with no default
partition and the single partitioning column is constrained by < > <=
or >=.

3. There are multiple matching partitions which are not consecutive.
This case is probably rare, but it can happen if there is a default
partition, if there are list partitions with multiple bounds that are
interleaved (e.g. p1 allows (1, 4), p2 allows (2), p3 allows (3, 5),
and the query allows values >= 4 and <= 5), if the query involves OR
conditions, or if there are multiple levels of partitioning (e.g.
partition by a, subpartition by b, put a range constraint on a and an
equality constraint on b).

4. There are no matching partitions.

One of the goals of this algorithm is to be fast.  The obvious way to
cater to case (3) is to iterate through all partitions and test
whether each one works, returning a Bitmapset, but that is O(n).
Admittedly, it might be O(n) with a pretty small constant factor, but
it still seems like exactly the sort of thing that we want to avoid
given the desire to scale to higher partition counts.

I propose that we create a structure that looks like this:

struct foo {
   int min_partition;
   int max_partition;
   Bitmapset *extra_partitions;
};

This indicates that all partitions from min_partition to max_partition
need to be scanned, and in addition any partitions in extra_partitions
need to be scanned.  Assuming that we only consider cases where all
partition keys or a leading subset of the partition keys are
constrained, we'll generally be able to get by with just setting
min_partition and max_partition, but extra_partitions can be used to
handle default partitions and interleaved list bounds.  For equality
on all partitioning columns, we can do a single bsearch of the bounds
to identify the target partition at a given partitioning level, and
the same thing works for a single range-bound.  If there are two
range-bounds (< and > or <= and >= or whatever) we need to bsearch
twice.  The default partition, if any and if matched, must also be
included.  When there are multiple levels of partitioning things get a
bit more complex -- if someone wants to knock out a partition that
breaks up the range, we might need to shrink the main range to cover
part of it and kick the other indexes out to extra_partitions.

But the good thing is that in common cases with only O(lg n) effort we
can return O(1) data that describes what will be scanned.  In cases
where that's not practical we expend more effort but still prune with
maximal effectiveness.
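In miniature, the proposed structure and its membership test might look like this (a uint64_t bitmap stands in for a real Bitmapset, capping the toy at 64 partitions; names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A contiguous range of partition indexes plus a bitmap for the
 * out-of-range stragglers such as the default partition or interleaved
 * list partitions.
 */
typedef struct PrunedPartitions
{
    int         min_partition;      /* first partition of the range */
    int         max_partition;      /* last partition of the range */
    uint64_t    extra_partitions;   /* bitmap of extras outside the range */
} PrunedPartitions;

/* Is partition 'part' among those selected for scanning? */
static bool
partition_selected(const PrunedPartitions *pp, int part)
{
    if (part >= pp->min_partition && part <= pp->max_partition)
        return true;
    return (pp->extra_partitions & (UINT64_C(1) << part)) != 0;
}
```

The common cases (1) and (2) need only the two ints; case (4) can be encoded as an empty range (min > max) with no extras; only case (3) touches the bitmap.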

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-09-26 Thread Jesper Pedersen

Hi Amit,

On 09/15/2017 04:50 AM, Amit Langote wrote:

On 2017/09/15 11:16, Amit Langote wrote:

I will post rebased patches later today, although I think the overall
design of the patch on the planner side of things is not quite there yet.
Of course, your and others' feedback is greatly welcome.


Rebased patches attached.  Because Dilip complained earlier today about
clauses of the form (const op var) not causing partition-pruning, I've
added code to commute the clause where it is required.  Some other
previously mentioned limitations remain -- no handling of OR clauses, no
elimination of redundant clauses for given partitioning column, etc.

A note about 0001: this patch overlaps with
0003-Canonical-partition-scheme.patch from the partitionwise-join patch
series that Ashutosh Bapat posted yesterday [1].  Because I implemented
the planner-portion of this patch based on what 0001 builds, I'm posting
it here.  It might actually turn out that we will review and commit
0003-Canonical-partition-scheme.patch on that thread, but meanwhile apply
0001 if you want to play with the later patches.  I would certainly like
to review  0003-Canonical-partition-scheme.patch myself, but won't be able
to immediately (see below).



Could you share your thoughts on the usage of PartitionAppendInfo's 
min_datum_idx / max_datum_idx ? Especially in relation to hash partitions.


I'm looking at get_partitions_for_keys.

Best regards,
 Jesper




Re: [HACKERS] path toward faster partition pruning

2017-09-26 Thread Dilip Kumar
On Tue, Sep 26, 2017 at 2:45 PM, Amit Langote
 wrote:
> On 2017/09/25 20:21, Dilip Kumar wrote:

> I see.  So, in the run-time pruning case, only the work of extracting
> bounding values is deferred to execution time.  Matching clauses with the
> partition key still occurs during planning time.  Only that the clauses
> that require run-time pruning are not those in rel->baserestrictinfo.

Right.
>
>> After separating out the matching clause we will do somewhat similar
>> processing what "get_rel_partitions" is doing. But, at optimizer time
>> for PARAM we will not have Datum values for rightop, so we will keep
>> track of the PARAM itself.
>
> I guess information about which PARAMs map to which partition keys will be
> kept in the plan somehow.

Yes.
>
>> And, finally at runtime when we get the PARAM value we can prepare
>> minkey and maxkey and call get_partitions_for_keys function.
>
> Note that get_partitions_for_keys() is not planner code, nor is it bound
> with any other planning code.  It's callable from executor without much
> change.  Maybe you already know that though.

Yes, Right.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] path toward faster partition pruning

2017-09-26 Thread Amit Langote
On 2017/09/25 20:21, Dilip Kumar wrote:
> On Mon, Sep 25, 2017 at 3:34 PM, Amit Langote
>  wrote:
> 
>> Thanks for looking at the patches and the comments.
> 
>> It's not clear to me whether get_rel_partitions() itself, as it is, is
>> callable from outside the planner, because its signature contains
>> RelOptInfo.  We have the RelOptInfo in the signature, because we want to
>> mark certain fields in it so that later planning steps can use them.  So,
>> get_rel_partitions()'s job is not just to match clauses and find
>> partitions, but also to perform certain planner-specific tasks of
>> generating information that the later planning steps will want to use.
>> That may turn out to be unnecessary, but until we know that, let's not try
>> to export get_rel_partitions() itself out of the planner.
>>
>> OTOH, the function that your refactoring patch separates out to match
>> clauses to partition keys and extract bounding values seems reusable
>> outside the planner and we should export it in such a way that it can be
>> used in the executor.  Then, the hypothetical executor function that does
>> the pruning will first call the planner's clause-matching function,
>> followed by calling get_partitions_for_keys() in partition.c to get the
>> selected partitions.
> 
> Thanks for your reply.
> 
> Actually, we are still planning to call get_matching_clause at
> optimizer time only.  We cannot use the get_rel_partitions function
> directly for runtime pruning, because it does all the work (finding
> the matching clauses, creating minkey and maxkey, and calling
> get_partitions_for_keys) during planning time itself.
>
> For runtime pruning, we are planning to first call the
> get_matching_clause function at optimizer time to identify the
> clauses that match the partition keys, but for the PARAM_EXEC case we
> cannot depend upon baserestrictinfo; instead we will get them from
> the join clauses.  That's the reason I have separated out
> get_matching_clause.  But it will still be used during planning time.

I see.  So, in the run-time pruning case, only the work of extracting
bounding values is deferred to execution time.  Matching clauses with the
partition key still occurs during planning time.  Only that the clauses
that require run-time pruning are not those in rel->baserestrictinfo.

> After separating out the matching clause, we will do somewhat similar
> processing to what "get_rel_partitions" is doing. But at optimizer time
> for PARAM we will not have Datum values for rightop, so we will keep
> track of the PARAM itself.

I guess information about which PARAMs map to which partition keys will be
kept in the plan somehow.

> And, finally at runtime when we get the PARAM value we can prepare
> minkey and maxkey and call get_partitions_for_keys function.

Note that get_partitions_for_keys() is not planner code, nor is it bound
with any other planning code.  It's callable from executor without much
change.  Maybe you already know that though.

Thanks,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-09-25 Thread Dilip Kumar
On Mon, Sep 25, 2017 at 3:34 PM, Amit Langote
 wrote:

> Thanks for looking at the patches and the comments.

> It's not clear to me whether get_rel_partitions() itself, as it is, is
> callable from outside the planner, because its signature contains
> RelOptInfo.  We have the RelOptInfo in the signature, because we want to
> mark certain fields in it so that later planning steps can use them.  So,
> get_rel_partitions()'s job is not just to match clauses and find
> partitions, but also to perform certain planner-specific tasks of
> generating information that the later planning steps will want to use.
> That may turn out to be unnecessary, but until we know that, let's not try
> to export get_rel_partitions() itself out of the planner.
>
> OTOH, the function that your refactoring patch separates out to match
> clauses to partition keys and extract bounding values seems reusable
> outside the planner and we should export it in such a way that it can be
> used in the executor.  Then, the hypothetical executor function that does
> the pruning will first call the planner's clause-matching function,
> followed by calling get_partitions_for_keys() in partition.c to get the
> selected partitions.

Thanks for your reply.

Actually, we are still planning to call get_matching_clause at
optimizer time only.  We cannot use the get_rel_partitions function
directly for runtime pruning, because it does all the work (finding
the matching clauses, creating minkey and maxkey, and calling
get_partitions_for_keys) during planning time itself.

For runtime pruning, we are planning to first call the
get_matching_clause function at optimizer time to identify the clauses
that match the partition keys, but for the PARAM_EXEC case we cannot
depend upon baserestrictinfo; instead we will get them from the join
clauses.  That's the reason I have separated out get_matching_clause.
But it will still be used during planning time.

After separating out the matching clause, we will do somewhat similar
processing to what "get_rel_partitions" is doing. But at optimizer time
for PARAM we will not have Datum values for rightop, so we will keep
track of the PARAM itself.

And, finally at runtime when we get the PARAM value we can prepare
minkey and maxkey and call get_partitions_for_keys function.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] path toward faster partition pruning

2017-09-25 Thread Amit Langote
Hi Dilip.

Thanks for looking at the patches and the comments.

On 2017/09/16 18:43, Dilip Kumar wrote:
> On Fri, Sep 15, 2017 at 2:20 PM, Amit Langote
>  wrote:
>> On 2017/09/15 11:16, Amit Langote wrote:
> 
> Thanks for the updated patch.  I was going through the logic of
> get_rel_partitions in 0002, as almost the same functionality will be
> required by runtime partition pruning, on which Beena is working.  The
> only difference is that here we are processing
> "rel->baserestrictinfo", whereas for runtime pruning we also need to
> process the join clauses that are pushed down to the appendrel.

Yeah, I agree with the point you seem to be making that
get_rel_partitions() covers a lot of functionality, which it would be nice
to break down into reusable function(s) with suitable signature(s) that
the executor will also be able to use.

Your proposed refactoring patch down-thread seems to be a good step in
that direction.  Thanks for working on it.

> So can we make some generic logic that can be used by both patches?
> 
> So basically, we need to do two changes
> 
> 1. In get_rel_partitions, instead of processing
> "rel->baserestrictinfo", we can take a clause list as input; that way
> we can pass any clause list to this function.
> 
> 2. Don't call the "get_partitions_for_keys" routine from
> "get_rel_partitions"; instead, get_rel_partitions can just prepare
> minkey and maxkey, and the caller of get_rel_partitions can call
> get_partitions_for_keys, because for runtime pruning we need to call
> get_partitions_for_keys at runtime.

It's not clear to me whether get_rel_partitions() itself, as it is, is
callable from outside the planner, because its signature contains
RelOptInfo.  We have the RelOptInfo in the signature, because we want to
mark certain fields in it so that later planning steps can use them.  So,
get_rel_partitions()'s job is not just to match clauses and find
partitions, but also to perform certain planner-specific tasks of
generating information that the later planning steps will want to use.
That may turn out to be unnecessary, but until we know that, let's not try
to export get_rel_partitions() itself out of the planner.

OTOH, the function that your refactoring patch separates out to match
clauses to partition keys and extract bounding values seems reusable
outside the planner and we should export it in such a way that it can be
used in the executor.  Then, the hypothetical executor function that does
the pruning will first call the planner's clause-matching function,
followed by calling get_partitions_for_keys() in partition.c to get the
selected partitions.

We should be careful when designing the interface of the exported function
to make sure it's not bound to the planner.  Your patch still maintains
the RelOptInfo in the signature of the clause-matching function, which the
executor pruning function won't have access to.

> Even after these changes there will be one problem:
> get_partitions_for_keys directly fetches "rightop->constvalue",
> whereas for runtime pruning we need to store the rightop itself and
> calculate the value at runtime by param evaluation.  I haven't yet
> thought about how we can make this last part generic.

I don't think any code introduced by the patch in partition.c itself looks
inside OpExpr (or any type of clause for that matter).  That is, I don't
see where get_partitions_for_keys() is looking at rightop->constvalue.
All it receives to work with are arrays of Datums and some other relevant
information like inclusivity, nullness, etc.
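For illustration, that contract could be sketched in self-contained C like this (int64_t stands in for Datum, and every name below is invented for the sketch rather than taken from the patch): the function sees only sorted bound values plus min/max keys with inclusivity flags, never any clause node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Illustrative stand-in for get_partitions_for_keys(): partition i covers
 * the half-open range [bounds[i], bounds[i+1]).  Given min/max keys and
 * their inclusivity, report the first and last partition indexes that can
 * hold matching rows; *first > *last means no partition survives pruning.
 * (The exclusive-min arithmetic assumes a discrete integer key domain.)
 */
static void
partitions_for_range(const int64_t *bounds, size_t nparts,
                     int64_t minkey, bool min_incl,
                     int64_t maxkey, bool max_incl,
                     size_t *first, size_t *last)
{
    *first = nparts;            /* sentinel: nothing found yet */
    *last = 0;

    for (size_t i = 0; i < nparts; i++)
    {
        int64_t lower = bounds[i];
        int64_t upper = bounds[i + 1];      /* exclusive upper bound */

        /* Does partition i contain any value satisfying the min bound? */
        bool ok_min = min_incl ? (upper > minkey) : (upper > minkey + 1);
        /* ... and any value satisfying the max bound? */
        bool ok_max = max_incl ? (lower <= maxkey) : (lower < maxkey);

        if (ok_min && ok_max)
        {
            if (*first == nparts)
                *first = i;
            *last = i;
        }
    }
}
```

A real implementation would binary-search the bound array rather than scan it, but the interface point is the same: by the time this function runs, clause matching has already reduced everything to plain values.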

By the way, I'm now rebasing these patches on top of [1] and will try to
merge your refactoring patch in some appropriate way.  Will post more
tomorrow.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf826





Re: [HACKERS] path toward faster partition pruning

2017-09-19 Thread Dilip Kumar
I have done some refactoring of the code: I have moved the code that
gets the matching clauses into a separate function so that it can
fetch the matching clauses from any given restriction list.

It can be applied on top of 0002-WIP:
planner-side-changes-for-partition-pruning.patch

On Sat, Sep 16, 2017 at 3:13 PM, Dilip Kumar  wrote:
> On Fri, Sep 15, 2017 at 2:20 PM, Amit Langote
>  wrote:
>> On 2017/09/15 11:16, Amit Langote wrote:
>
> Thanks for the updated patch.  I was going through the logic of
> get_rel_partitions in 0002, as almost the same functionality will be
> required by runtime partition pruning, on which Beena is working.  The
> only difference is that here we are processing
> "rel->baserestrictinfo", whereas for runtime pruning we also need to
> process the join clauses that are pushed down to the appendrel.
>
> So can we make some generic logic that can be used by both patches?
>
> So basically, we need to do two changes
>
> 1. In get_rel_partitions, instead of processing
> "rel->baserestrictinfo", we can take a clause list as input; that way
> we can pass any clause list to this function.
>
> 2. Don't call the "get_partitions_for_keys" routine from
> "get_rel_partitions"; instead, get_rel_partitions can just prepare
> minkey and maxkey, and the caller of get_rel_partitions can call
> get_partitions_for_keys, because for runtime pruning we need to call
> get_partitions_for_keys at runtime.
>
> Even after these changes there will be one problem:
> get_partitions_for_keys directly fetches "rightop->constvalue",
> whereas for runtime pruning we need to store the rightop itself and
> calculate the value at runtime by param evaluation.  I haven't yet
> thought about how we can make this last part generic.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


0002-refactor_get_rel_partition.patch
Description: Binary data



Re: [HACKERS] path toward faster partition pruning

2017-09-16 Thread Dilip Kumar
On Fri, Sep 15, 2017 at 2:20 PM, Amit Langote
 wrote:
> On 2017/09/15 11:16, Amit Langote wrote:

Thanks for the updated patch.  I was going through the logic of
get_rel_partitions in 0002, as almost the same functionality will be
required by runtime partition pruning, on which Beena is working.  The
only difference is that here we are processing
"rel->baserestrictinfo", whereas for runtime pruning we also need to
process the join clauses that are pushed down to the appendrel.

So can we make some generic logic that can be used by both patches?

So basically, we need to make two changes:

1. In get_rel_partitions, instead of processing
"rel->baserestrictinfo", we can take a clause list as input; that way
we can pass any clause list to this function.

2. Don't call the "get_partitions_for_keys" routine from
"get_rel_partitions"; instead, get_rel_partitions can just prepare
minkey and maxkey, and the caller of get_rel_partitions can call
get_partitions_for_keys, because for runtime pruning we need to call
get_partitions_for_keys at runtime.

Even after these changes there will be one problem:
get_partitions_for_keys directly fetches "rightop->constvalue",
whereas for runtime pruning we need to store the rightop itself and
calculate the value at runtime by param evaluation.  I haven't yet
thought about how we can make this last part generic.
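One generic shape for that last part (purely a sketch in standalone C with invented names, not proposed patch code) is to record at plan time *how* to obtain each bound value rather than the value itself: a Const-derived bound stores the constant, a Param-derived bound stores the parameter number, and the run-time step resolves both to plain values just before pruning.

```c
#include <assert.h>
#include <stdint.h>

/* Invented stand-in: int64_t plays the role of Datum. */
typedef int64_t Datum;

/*
 * A deferred bound value, built at plan time: either a constant captured
 * from a Const clause, or a reference to an executor parameter that is
 * resolved at run time (the PARAM_EXEC case).
 */
typedef struct BoundValue
{
    int   paramno;      /* >= 0: read params[paramno] at run time */
    Datum constval;     /* used when paramno < 0 (plan-time constant) */
} BoundValue;

/* Resolve a bound against the executor's current parameter array. */
static Datum
resolve_bound(const BoundValue *bv, const Datum *params)
{
    return (bv->paramno >= 0) ? params[bv->paramno] : bv->constval;
}
```

With something like this, each minkey/maxkey slot is produced by resolve_bound() at run time and then handed to the pruning function unchanged, so the pruning code never needs to know whether a value came from a Const or a Param.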

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] path toward faster partition pruning

2017-09-15 Thread Amit Langote
On Sat, Sep 16, 2017 at 4:04 AM, Robert Haas  wrote:
> On Fri, Sep 15, 2017 at 4:50 AM, Amit Langote
>  wrote:
>> Rebased patches attached.  Because Dilip complained earlier today about
>> clauses of the form (const op var) not causing partition-pruning, I've
>> added code to commute the clause where it is required.  Some other
>> previously mentioned limitations remain -- no handling of OR clauses, no
>> elimination of redundant clauses for a given partitioning column, etc.
>>
>> A note about 0001: this patch overlaps with
>> 0003-Canonical-partition-scheme.patch from the partitionwise-join patch
>> series that Ashutosh Bapat posted yesterday [1].
>
> It doesn't merely overlap; it's obviously a derivative work,

Yes it is.  I noted upthread [1] that most of these are derived
from Ashutosh's patch on his suggestion.  I guess I should have
repeated that in this message too, sorry.

> and the
> commit message in your version should credit all the authors.

That was a mistake on my part, too.  Will be careful hereon.

Thanks,
Amit

[1] 
https://www.postgresql.org/message-id/0e829199-a43c-2a66-b966-89a0020a6cd4%40lab.ntt.co.jp




Re: [HACKERS] path toward faster partition pruning

2017-09-15 Thread Robert Haas
On Fri, Sep 15, 2017 at 4:50 AM, Amit Langote
 wrote:
> Rebased patches attached.  Because Dilip complained earlier today about
> clauses of the form (const op var) not causing partition-pruning, I've
> added code to commute the clause where it is required.  Some other
> previously mentioned limitations remain -- no handling of OR clauses, no
> elimination of redundant clauses for a given partitioning column, etc.
>
> A note about 0001: this patch overlaps with
> 0003-Canonical-partition-scheme.patch from the partitionwise-join patch
> series that Ashutosh Bapat posted yesterday [1].

It doesn't merely overlap; it's obviously a derivative work, and the
commit message in your version should credit all the authors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-09-15 Thread Amit Langote
On 2017/09/15 11:16, Amit Langote wrote:
> I will post rebased patches later today, although I think the overall
> design of the patch on the planner side of things is not quite there yet.
> Of course, your and others' feedback is greatly welcome.

Rebased patches attached.  Because Dilip complained earlier today about
clauses of the form (const op var) not causing partition-pruning, I've
added code to commute the clause where it is required.  Some other
previously mentioned limitations remain -- no handling of OR clauses, no
elimination of redundant clauses for a given partitioning column, etc.

A note about 0001: this patch overlaps with
0003-Canonical-partition-scheme.patch from the partitionwise-join patch
series that Ashutosh Bapat posted yesterday [1].  Because I implemented
the planner-portion of this patch based on what 0001 builds, I'm posting
it here.  It might actually turn out that we will review and commit
0003-Canonical-partition-scheme.patch on that thread, but meanwhile apply
0001 if you want to play with the later patches.  I would certainly like
to review 0003-Canonical-partition-scheme.patch myself, but won't be able
to immediately (see below).

> Also, I must inform all of those who're looking at this thread that I
> won't be able to respond to emails from tomorrow (9/16, Sat) until 9/23,
> Sat, due to some personal business.

To remind.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAFiTN-skmaqeCVaoAHCBqe2DyfO3f6sgdtEjHWrUgi0kV1yPLQ%40mail.gmail.com
From ff9ccd8df6555cfca31e54e22293ac1613db327c Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 1/5] Some optimizer data structures for partitioned rels

Nobody uses it though.
---
 src/backend/optimizer/util/plancat.c | 120 +++
 src/backend/optimizer/util/relnode.c |  20 ++
 src/include/nodes/nodes.h            |   1 +
 src/include/nodes/relation.h         | 135 +++
 src/include/optimizer/plancat.h      |   2 +
 5 files changed, 278 insertions(+)

diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a1ebd4acc8..f7e3a1df5f 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -68,6 +68,8 @@ static List *get_relation_constraints(PlannerInfo *root,
 static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
  Relation heapRelation);
 static List *get_relation_statistics(RelOptInfo *rel, Relation relation);
+static void get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+   Relation relation);
 
 /*
  * get_relation_info -
@@ -420,6 +422,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* Collect info about relation's foreign keys, if relevant */
get_relation_foreign_keys(root, rel, relation, inhparent);
 
+   /* Collect partitioning info, if relevant. */
+   if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+   get_relation_partition_info(root, rel, relation);
+
heap_close(relation, NoLock);
 
/*
@@ -1802,3 +1808,117 @@ has_row_triggers(PlannerInfo *root, Index rti, CmdType event)
heap_close(relation, NoLock);
return result;
 }
+
+static void
+get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+   Relation relation)
+{
+   int i;
+   ListCell *l;
+   PartitionKey key = RelationGetPartitionKey(relation);
+   PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+   rel->part_scheme = find_partition_scheme(root, relation);
+   rel->partexprs = (List **) palloc0(key->partnatts * sizeof(List *));
+
+   l = list_head(key->partexprs);
+   for (i = 0; i < key->partnatts; i++)
+   {
+   Expr *keyCol;
+
+   if (key->partattrs[i] != 0)
+   {
+			keyCol = (Expr *) makeVar(rel->relid,
+									  key->partattrs[i],
+									  key->parttypid[i],
+									  key->parttypmod[i],
+									  key->parttypcoll[i],
+									  0);
+   }
+   else
+   {
+			if (l == NULL)
+				elog(ERROR, "wrong number of partition key expressions");
+   keyCol = (Expr *) copyObject(lfirst(l));
+   l = lnext(l);
+   }
+
+   rel->partexprs[i] = list_make1(keyCol);
+   }
+
+   /* Values are filled in build_simple_rel(). */
+   

Re: [HACKERS] path toward faster partition pruning

2017-09-14 Thread Amit Langote
Hi Dilip,

Thanks for looking at the patch.

On 2017/09/15 13:43, Dilip Kumar wrote:
> On Wed, Sep 6, 2017 at 4:08 PM, Amit Langote
>> [PATCH 2/5] WIP: planner-side changes for partition-pruning
>>
>> This patch adds a stub get_partitions_for_keys in partition.c with a
>> suitable interface for the caller to pass bounding keys extracted from the
>> query and other related information.
>>
>> Importantly, it implements the logic within the planner to match query's
>> scan keys to a parent table's partition key and form the bounding keys
>> that will be passed to partition.c to compute the list of partitions that
>> satisfy those bounds.
> 
> + Node   *leftop = get_leftop(clause);
> +
> + if (IsA(leftop, RelabelType))
> + leftop = (Node *) ((RelabelType *) leftop)->arg;
> +
> + if (equal(leftop, partkey))
> 
> It appears that this patch always assumes that the clause will be of
> the form "var op const", but it can also be "const op var".
> 
> That's the reason why, in the example below, where the condition in
> both queries is the same, it can only prune in the first case but not
> in the second.
> 
> postgres=# explain select * from t where t.a < 2;
>QUERY PLAN
> 
>  Append  (cost=0.00..2.24 rows=1 width=8)
>->  Seq Scan on t1  (cost=0.00..2.24 rows=1 width=8)
>  Filter: (a < 2)
> (3 rows)
> 
> postgres=# explain select * from t where 2>t.a;
>QUERY PLAN
> 
>  Append  (cost=0.00..4.49 rows=2 width=8)
>->  Seq Scan on t1  (cost=0.00..2.24 rows=1 width=8)
>  Filter: (2 > a)
>->  Seq Scan on t2  (cost=0.00..2.25 rows=1 width=8)
>  Filter: (2 > a)
> (5 rows)

Yeah, there are a bunch of smarts still missing in that patch as it is.

Thanks,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-09-14 Thread Dilip Kumar
On Wed, Sep 6, 2017 at 4:08 PM, Amit Langote
 wrote:
> On 2017/09/04 10:10, Amit Langote wrote:
>> On 2017/09/02 2:52, Robert Haas wrote:

>
> [PATCH 2/5] WIP: planner-side changes for partition-pruning
>
> This patch adds a stub get_partitions_for_keys in partition.c with a
> suitable interface for the caller to pass bounding keys extracted from the
> query and other related information.
>
> Importantly, it implements the logic within the planner to match query's
> scan keys to a parent table's partition key and form the bounding keys
> that will be passed to partition.c to compute the list of partitions that
> satisfy those bounds.
>

+ Node   *leftop = get_leftop(clause);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))

It appears that this patch always assumes that the clause will be of
the form "var op const", but it can also be "const op var".

That's the reason why, in the example below, where the condition in
both queries is the same, it can only prune in the first case but not
in the second.

postgres=# explain select * from t where t.a < 2;
   QUERY PLAN
--------------------------------------------------
 Append  (cost=0.00..2.24 rows=1 width=8)
   ->  Seq Scan on t1  (cost=0.00..2.24 rows=1 width=8)
 Filter: (a < 2)
(3 rows)

postgres=# explain select * from t where 2>t.a;
   QUERY PLAN
--------------------------------------------------
 Append  (cost=0.00..4.49 rows=2 width=8)
   ->  Seq Scan on t1  (cost=0.00..2.24 rows=1 width=8)
 Filter: (2 > a)
   ->  Seq Scan on t2  (cost=0.00..2.25 rows=1 width=8)
 Filter: (2 > a)
(5 rows)
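For what it's worth, handling the second form looks mechanical: swap the operands and replace the operator with its commutator, so that "2 > a" is treated as "a < 2" before matching against the partition key. A toy standalone-C illustration (a small comparison enum stands in for PostgreSQL's operator OIDs and the catalog commutator lookup; none of this is patch code):

```c
#include <assert.h>

/* Stand-in for operator OIDs: the five btree comparison strategies. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT } CmpOp;

/*
 * Commutator lookup: a clause "const op var" is equivalent to
 * "var commute_op(op) const", e.g. "2 > a" becomes "a < 2".
 */
static CmpOp
commute_op(CmpOp op)
{
    switch (op)
    {
        case OP_LT: return OP_GT;
        case OP_LE: return OP_GE;
        case OP_EQ: return OP_EQ;   /* equality is its own commutator */
        case OP_GE: return OP_LE;
        case OP_GT: return OP_LT;
    }
    return op;                      /* not reached */
}
```

With that normalization in place, the clause-matching step only ever sees the "var op const" shape, and both queries above would prune identically.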

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] path toward faster partition pruning

2017-09-14 Thread Amit Langote
On 2017/09/15 10:55, David Rowley wrote:
> On 21 August 2017 at 18:37, Amit Langote  
> wrote:
>> I've been working on implementing a way to perform plan-time
>> partition-pruning that is hopefully faster than the current method of
>> using constraint exclusion to prune each of the potentially many
>> partitions one-by-one.  It's not fully cooked yet though.
> 
> I'm interested in seeing improvements in this area, so I've put my
> name down to review this.

Great, thanks!

I will post rebased patches later today, although I think the overall
design of the patch on the planner side of things is not quite there yet.
Of course, your and others' feedback is greatly welcome.

Also, I must inform all of those who're looking at this thread that I
won't be able to respond to emails from tomorrow (9/16, Sat) until 9/23,
Sat, due to some personal business.

Thanks,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-09-14 Thread David Rowley
On 21 August 2017 at 18:37, Amit Langote  wrote:
> I've been working on implementing a way to perform plan-time
> partition-pruning that is hopefully faster than the current method of
> using constraint exclusion to prune each of the potentially many
> partitions one-by-one.  It's not fully cooked yet though.

I'm interested in seeing improvements in this area, so I've put my
name down to review this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] path toward faster partition pruning

2017-09-08 Thread Amit Langote
On 2017/09/08 4:41, Robert Haas wrote:
> On Thu, Sep 7, 2017 at 7:16 AM, Amit Langote
>  wrote:
>> There is a patch in the Ashutosh's posted series of patches, which does
>> more or less the same thing that this patch does.  He included it in his
>> series of patches, because, IIUC, the main partitionwise-join planning
>> logic that one of the later patches implements depends on being able to
>> consider applying that new planning technique individually for every
>> partition sub-tree, instead of just at the whole tree root.
>>
>> One notable difference from his patch is that while his patch will expand
>> in a non-flattened manner even in the case where the parent is the result
>> relation of a query, my patch doesn't in that case, because the new
>> partition-pruning technique cannot be applied to an inheritance parent
>> that is a result relation, for example,
>>
>> update partitioned_table set ...
>>
>> And AFAICS, partitionwise-join cannot be applied to such a parent either.
>>
>> Note however that if there are other instances of the same partitioned
>> table (in the FROM list of an update statement) or other partitioned
>> tables in the query, they will be expanded in a non-flattened manner,
>> because they are themselves not the result relations of the query.  So,
>> the new partition-pruning and (supposedly) partitionwise-join can be
>> applied for those other partitioned tables.
> 
> It seems to me that it would be better not to write new patches for
> things that already have patches without a really clear explanation
> with what's wrong with the already-existing patch; I don't see any
> such explanation here.  Instead of writing your own patch for this to
> duel with his, why not review his and help him correct any
> deficiencies which you can spot?  Then we have one patch with more
> review instead of two patches with less review both of which I have to
> read and try to decide which is better.

Sorry, I think I should have just used the Ashutosh's patch.

> In this case, I think Ashutosh has the right idea.  I think that
> handling the result-relation and non-result-relation differently
> creates an unpleasant asymmetry.  With your patch, we have to deal
> with three cases: (a) partitioned tables that were expanded
> level-by-level because they are not result relations, (b) partitioned
> tables that were expanded "flattened" because they are result
> relations, and (c) non-partitioned tables that were expanded
> "flattened".  With Ashutosh's approach, we only have two cases to
> worry about in the future rather than three, and I like that better.

I tend to agree with this now.

> Your patch also appears to change things so that the table actually
> referenced in the query ends up with an AppendRelInfo for the parent,
> which seems pointless.

Actually, it wouldn't, because my patch also got rid of the notion of
adding the duplicate RTE for the original parent, because I thought the
duplicate RTE was pointless in the partitioning case.

> There are a couple of hunks from your patch that we might want or need
> to incorporate into Ashutosh's patch.  The change to
> relation_excluded_by_constraints() looks like it might be useful,
> although it needs a better comment and some tests.

I think we could just drop that part from this patch.  It also looks like
Ashutosh has a patch elsewhere concerning this.

https://commitfest.postgresql.org/14/1108/

Maybe we could discuss what to do about this on that thread.  Now that not
only the root partitioned table, but also other partitioned tables in the
tree get an RTE with inh = true, I think it would be interesting to
consider his patch.

> Also, Ashutosh's
> patch has no equivalent of your change to add_paths_to_append_rel().
> I'm not clear what the code you've introduced there is supposed to be
> doing, and I'm suspicious that it is confusing "partition root" with
> "table named in the query", which will often be the same but not
> always; the user could have named an intermediate partition.  Can you
> expand on what this is doing?

I've replied on the partition-wise join thread explaining why the
changes in add_paths_to_append_rel() are necessary.

Anyway, I'm dropping my patch in favor of the patch on the other thread.
Sorry for the duplicated effort involved in having to look at both the
patches.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoZEUonD9dUZH1FBEyq%3DPEv_KvE3wC%3DA%3D0zm-_tRz_917A%40mail.gmail.com





Re: [HACKERS] path toward faster partition pruning

2017-09-07 Thread Robert Haas
On Thu, Sep 7, 2017 at 7:16 AM, Amit Langote
 wrote:
> There is a patch in the Ashutosh's posted series of patches, which does
> more or less the same thing that this patch does.  He included it in his
> series of patches, because, IIUC, the main partitionwise-join planning
> logic that one of the later patches implements depends on being able to
> consider applying that new planning technique individually for every
> partition sub-tree, instead of just at the whole tree root.
>
> One notable difference from his patch is that while his patch will expand
> in a non-flattened manner even in the case where the parent is the result
> relation of a query, my patch doesn't in that case, because the new
> partition-pruning technique cannot be applied to an inheritance parent
> that is a result relation, for example,
>
> update partitioned_table set ...
>
> And AFAICS, partitionwise-join cannot be applied to such a parent either.
>
> Note however that if there are other instances of the same partitioned
> table (in the FROM list of an update statement) or other partitioned
> tables in the query, they will be expanded in a non-flattened manner,
> because they are themselves not the result relations of the query.  So,
> the new partition-pruning and (supposedly) partitionwise-join can be
> applied for those other partitioned tables.

It seems to me that it would be better not to write new patches for
things that already have patches without a really clear explanation of
what's wrong with the already-existing patch; I don't see any such
explanation here.  Instead of writing your own patch for this to duel
with his, why not review his and help him correct any deficiencies you
can spot?  Then we have one patch with more review instead of two
patches with less review, both of which I have to read while trying to
decide which is better.

In this case, I think Ashutosh has the right idea.  I think that
handling the result-relation and non-result-relation differently
creates an unpleasant asymmetry.  With your patch, we have to deal
with three cases: (a) partitioned tables that were expanded
level-by-level because they are not result relations, (b) partitioned
tables that were expanded "flattened" because they are result
relations, and (c) non-partitioned tables that were expanded
"flattened".  With Ashutosh's approach, we only have two cases to
worry about in the future rather than three, and I like that better.

Your patch also appears to change things so that the table actually
referenced in the query ends up with an AppendRelInfo for the parent,
which seems pointless.  And it has no tests.

There are a couple of hunks from your patch that we might want or need
to incorporate into Ashutosh's patch.  The change to
relation_excluded_by_constraints() looks like it might be useful,
although it needs a better comment and some tests.  Also, Ashutosh's
patch has no equivalent of your change to add_paths_to_append_rel().
I'm not clear what the code you've introduced there is supposed to be
doing, and I'm suspicious that it is confusing "partition root" with
"table named in the query", which will often be the same but not
always; the user could have named an intermediate partition.  Can you
expand on what this is doing?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-09-07 Thread Amit Langote
Forgot to mention a couple of important points about the relation of some
of the patches here to the patches and discussion at the
partitionwise-join thread [1].

On 2017/09/06 19:38, Amit Langote wrote:
> [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner
> 
> This will allow us to perform scan and join planning in a per partition
> sub-tree manner, with each sub-tree's root getting its own RelOptInfo.
> Previously, only the root of the whole partition tree would get a
> RelOptInfo, along with the leaf partitions, with each leaf partition's
> AppendRelInfo pointing to the former as its parent.
> 
> This is essential, because we will be doing partition-pruning for every
> partitioned table in the tree by matching query's scan keys with its
> partition key.  We won't be able to do that if the intermediate
> partitioned tables didn't have a RelOptInfo.

There is a patch in Ashutosh's posted series that does more or less the
same thing this patch does.  He included it in his series because, IIUC,
the main partitionwise-join planning logic that one of the later patches
implements depends on being able to apply that new planning technique
individually to every partition sub-tree, instead of just at the whole
tree root.

One notable difference from his patch is that while his patch will expand
in a non-flattened manner even when the parent is the result relation of
a query, my patch doesn't, because the new partition-pruning technique
cannot be applied to an inheritance parent that is a result relation;
for example:

update partitioned_table set ...

And AFAICS, partitionwise-join cannot be applied to such a parent either.

Note however that if there are other instances of the same partitioned
table (in the FROM list of an update statement) or other partitioned
tables in the query, they will be expanded in a non-flattened manner,
because they are themselves not the result relations of the query.  So,
the new partition-pruning and (supposedly) partitionwise-join can be
applied for those other partitioned tables.

> [PATCH 2/5] WIP: planner-side changes for partition-pruning
> 
> Also, it adds certain fields to RelOptInfos of the partitioned tables that
> reflect its partitioning properties.

There is something called PartitionScheme, a notion invented by one of
Ashutosh's patches, which this patch incorporates as one of the new
RelOptInfo fields I mentioned above (along with a list of
PartitionSchemes in the planner-global PlannerInfo).  Although
PartitionScheme is not significant for the task of partition-pruning
itself, it's still useful.  On Ashutosh's suggestion, I adopted the same
notion in my patch series, so that the partition-wise join patch doesn't
end up conflicting with the partition-pruning patch over reimplementing
it and can get straight to the task of implementing partition-wise joins.

The patch in the partition-wise join series that introduces
PartitionScheme also introduces a RelOptInfo field called partexprs,
which records the partition key expressions.  Since partition-pruning
has use for it too, I incorporated it here, in a format that Ashutosh's
partition-wise patch can use directly, instead of the latter having to
rework it to make it suitable for storing partition key expressions of
joinrels.  Again, that's to minimize conflicts and let his patch find
the field ready to use as is, instead of having to implement it first.

Lastly, this patch introduces a PartitionAppendInfo in a partitioned
table's RelOptInfo that stores AppendRelInfos of the partitions (child
relations) that survive partition-pruning, which serves to identify those
partitions' RelOptInfos.  Along with the identities of surviving
partitions, it also stores the partitioning configuration of the
partitioned table after partitions are pruned.  That includes
partdesc->boundinfo (which is simply a pointer into the table's relcache)
and a few other fields that are set by partition-pruning code, such as
min_datum_index, max_datum_index, and null_partition_chosen, which describe the
result after pruning.  So, for two partitioned tables being joined, if the
boundinfos match per partition_bounds_equal() and these other fields
match, they can be safely partition-wise joined.
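To make that join-safety condition concrete, here is a minimal,
self-contained sketch (not PostgreSQL code: the struct, field, and
function names are simplified stand-ins for boundinfo,
partition_bounds_equal(), and the pruning fields named above, with int
keys in place of Datums):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a pruned table's partitioning configuration:
 * the sorted bound values, plus the fields the pruning code fills in to
 * describe which bounds survived. */
typedef struct PrunedPartInfo
{
    int     ndatums;
    int     datums[8];              /* sorted bound values */
    int     min_datum_index;        /* first surviving bound */
    int     max_datum_index;        /* last surviving bound */
    bool    null_partition_chosen;  /* NULL-accepting partition survived? */
} PrunedPartInfo;

/* Rough analogue of the check described above: the bounds must match
 * datum-for-datum, and the same slice of partitions must have survived
 * pruning on both sides, before a partition-wise join is safe. */
static bool
pruned_parts_match(const PrunedPartInfo *a, const PrunedPartInfo *b)
{
    if (a->ndatums != b->ndatums)
        return false;
    for (int i = 0; i < a->ndatums; i++)
        if (a->datums[i] != b->datums[i])
            return false;
    return a->min_datum_index == b->min_datum_index &&
           a->max_datum_index == b->max_datum_index &&
           a->null_partition_chosen == b->null_partition_chosen;
}
```

The point of comparing the post-pruning fields, not just the bounds, is
that two tables with identical bounds may still have had different
partitions pruned away.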

[1]
https://www.postgresql.org/message-id/CAFjFpRfRDhWp%3DoguNjyzN%3DNMoOD%2BRCC3wS%2Bb%2BxbGKwKUk0dRKg%40mail.gmail.com





Re: [HACKERS] path toward faster partition pruning

2017-09-06 Thread Amit Langote
On 2017/09/04 10:10, Amit Langote wrote:
> On 2017/09/02 2:52, Robert Haas wrote:
>> It strikes me that this patch set is doing two things but maybe in the
>> opposite order that I would have chosen to attack them.  First,
>> there's getting partition pruning to use something other than
>> constraint exclusion.  Second, there's deferring work that is
>> currently done at an early stage of the process until later, so that
>> we waste less effort on partitions that are ultimately going to be
>> pruned.
> 
> OK.
> 
>> Therefore, IMHO, it would be best to focus first on how we're going to
>> identify the partitions that survive pruning, and then afterwards work
>> on transposing that logic to happen before partitions are opened and
>> locked.  That way, we get some incremental benefit sooner, and also
>> unblock some other development work.
> 
> Alright, I will try to do it that way.

Attached set of patches that does things that way.  Individual patches
described below:

[PATCH 1/5] Expand partitioned inheritance in a non-flattened manner

This will allow us to perform scan and join planning in a per partition
sub-tree manner, with each sub-tree's root getting its own RelOptInfo.
Previously, only the root of the whole partition tree would get a
RelOptInfo, along with the leaf partitions, with each leaf partition's
AppendRelInfo pointing to the former as its parent.

This is essential, because we will be doing partition-pruning for every
partitioned table in the tree by matching query's scan keys with its
partition key.  We won't be able to do that if the intermediate
partitioned tables didn't have a RelOptInfo.

[PATCH 2/5] WIP: planner-side changes for partition-pruning

This patch adds a stub get_partitions_for_keys in partition.c with a
suitable interface for the caller to pass bounding keys extracted from the
query and other related information.

Importantly, it implements the logic within the planner to match query's
scan keys to a parent table's partition key and form the bounding keys
that will be passed to partition.c to compute the list of partitions that
satisfy those bounds.

Also, it adds certain fields to RelOptInfos of the partitioned tables that
reflect its partitioning properties.

[PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch}

This guy modifies the partition bound comparison function so that the
caller can pass incomplete partition key tuple that is potentially a
prefix of a multi-column partition key.  Currently, the input tuple must
contain all of key->partnatts values, but that may not be true for
queries, which may not have restrictions on all the partition key columns.
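A toy illustration of that interface change, assuming int key columns
instead of Datums (partial_key_cmp and its arguments are invented for
this sketch, not the patch's actual comparison interface):

```c
#include <assert.h>

/* Compare a partition bound against a possibly-incomplete key tuple.
 * Only the first nvalues columns are compared; trailing partition key
 * columns that the query did not constrain are simply ignored, so a
 * prefix match returns 0 and the caller decides how to treat ties
 * depending on whether a lower or an upper bound is being probed. */
static int
partial_key_cmp(const int *bound, const int *tuple, int nvalues)
{
    for (int i = 0; i < nvalues; i++)
    {
        if (bound[i] < tuple[i])
            return -1;
        if (bound[i] > tuple[i])
            return 1;
    }
    return 0;                   /* equal on the supplied prefix */
}
```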

[PATCH 4/5] WIP: Implement get_partitions_for_keys()

This one fills the get_partitions_for_keys stub with the actual logic that
searches the partdesc->boundinfo for partition bounds that match the
bounding keys specified by the caller.
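The bound search could be sketched roughly as follows (a deliberately
simplified model: int split points, a linear scan where the real code
would binary-search partdesc->boundinfo, and invented names throughout):

```c
#include <assert.h>

/* splits[] holds the nsplits interior bound values of a range-partitioned
 * table with nsplits + 1 partitions: partition 0 covers (-inf, splits[0]),
 * partition i covers [splits[i-1], splits[i]), and the last partition
 * covers [splits[nsplits-1], +inf).  Compute the first and last partition
 * indexes that can contain values in the inclusive range [lower, upper]. */
static void
partitions_for_range(const int *splits, int nsplits,
                     int lower, int upper, int *first, int *last)
{
    int     i;

    /* first partition whose upper bound exceeds 'lower' */
    for (i = 0; i < nsplits && splits[i] <= lower; i++)
        ;
    *first = i;

    /* partition that contains 'upper' */
    for (i = 0; i < nsplits && splits[i] <= upper; i++)
        ;
    *last = i;
}
```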

[PATCH 5/5] Add more tests for the new partitioning-related planning

More tests.


Some TODO items still remain to be done:

* Process OR clauses to use for partition-pruning
* Process redundant clauses (such as a = 10 and a > 1) more smartly
* Other tricks that are missing
* Fix bugs
* More tests

Thanks,
Amit
From 17dfaff62fe04cf18f5bba298974d42f92b597ef Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 6 Sep 2017 09:28:14 +0900
Subject: [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner

...except when the partitioned table in question is the result rel
of the query.

This allows us to perform scan and join planning for each sub-tree in a
given partition tree, with each sub-tree's root partitioned table
getting its own RelOptInfo.  Previously only the root of the whole
partition tree got a RelOptInfo, along with the leaf partitions,
with each leaf partition's AppendRelInfo pointing to the former as
its parent.
---
 src/backend/optimizer/path/allpaths.c  |  34 ++-
 src/backend/optimizer/plan/planner.c   |   3 +-
 src/backend/optimizer/prep/prepunion.c | 166 +
 src/backend/optimizer/util/plancat.c   |   7 +-
 4 files changed, 146 insertions(+), 64 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index 2d7e1d84d0..6c3511bd47 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1289,11 +1289,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo 
*rel,
RangeTblEntry *rte;
 
rte = planner_rt_fetch(rel->relid, root);
+
+   /*
+* Get the partitioned_rels list from root->pcinfo_list after
+* confirming that rel is actually a root partitioned table.
+*/
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
{
-   partitioned_rels = get_partitioned_child_rels(root, rel->relid);
-   /* The root partitioned table is included as a child rel */
-   Assert(list_length(partitioned_rels) >= 1);
+   

Re: [HACKERS] path toward faster partition pruning

2017-09-03 Thread Amit Langote
Thanks for the comments.

On 2017/09/02 2:52, Robert Haas wrote:
> On Thu, Aug 31, 2017 at 2:02 AM, Amit Langote
>  wrote:
>> Attached is now also the set of patches that implement the actual
>> partition-pruning logic, viz. the last 3 patches (0004, 0005, and 0006) of
>> the attached.
> 
> It strikes me that this patch set is doing two things but maybe in the
> opposite order that I would have chosen to attack them.  First,
> there's getting partition pruning to use something other than
> constraint exclusion.  Second, there's deferring work that is
> currently done at an early stage of the process until later, so that
> we waste less effort on partitions that are ultimately going to be
> pruned.

OK.

> 
> The second one is certainly a worthwhile goal, but there are fairly
> firm interdependencies between the first one and some other things
> that are in progress.  For example, the first one probably ought to be
> done before hash partitioning gets committed, because
> constraint-exclusion-based partition pruning won't work with hash
> partitioning, but some mechanism based on asking the
> partitioning code which partitions might match will.

Yeah.

> Such a mechanism
> is more efficient for list and range partitions, but it's the only
> thing that will work for hash partitions.  Also, Beena Emerson is
> working on run-time partition pruning, and the more I think about it,
> the more I think that overlaps with this first part.  Both patches
> need a mechanism to identify, given a btree-indexable comparison
> operator (< > <= >= =) and a set of values, which partitions might
> contain matching values.  Run-time partition pruning will call that at
> execution time, and this patch will call it at plan time, but it's the
> same logic; it's just a question of the point at which the values are
> known.  And of course we don't want to end up with two copies of the
> logic.

Agreed here too.

I agree that spending effort on the first part (deferment of locking, etc.
within the planner) does not benefit either the hash partitioning and
run-time pruning patches much.

> Therefore, IMHO, it would be best to focus first on how we're going to
> identify the partitions that survive pruning, and then afterwards work
> on transposing that logic to happen before partitions are opened and
> locked.  That way, we get some incremental benefit sooner, and also
> unblock some other development work.

Alright, I will try to do it that way.

Thanks,
Amit





Re: [HACKERS] path toward faster partition pruning

2017-09-01 Thread Robert Haas
On Thu, Aug 31, 2017 at 2:02 AM, Amit Langote
 wrote:
> Attached is now also the set of patches that implement the actual
> partition-pruning logic, viz. the last 3 patches (0004, 0005, and 0006) of
> the attached.

It strikes me that this patch set is doing two things but maybe in the
opposite order that I would have chosen to attack them.  First,
there's getting partition pruning to use something other than
constraint exclusion.  Second, there's deferring work that is
currently done at an early stage of the process until later, so that
we waste less effort on partitions that are ultimately going to be
pruned.

The second one is certainly a worthwhile goal, but there are fairly
firm interdependencies between the first one and some other things
that are in progress.  For example, the first one probably ought to be
done before hash partitioning gets committed, because
constraint-exclusion-based partition pruning won't work with hash
partitioning, but some mechanism based on asking the
partitioning code which partitions might match will.  Such a mechanism
is more efficient for list and range partitions, but it's the only
thing that will work for hash partitions.  Also, Beena Emerson is
working on run-time partition pruning, and the more I think about it,
the more I think that overlaps with this first part.  Both patches
need a mechanism to identify, given a btree-indexable comparison
operator (< > <= >= =) and a set of values, which partitions might
contain matching values.  Run-time partition pruning will call that at
execution time, and this patch will call it at plan time, but it's the
same logic; it's just a question of the point at which the values are
known.  And of course we don't want to end up with two copies of the
logic.
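One way such shared logic could look, sketched with plain ints and
invented names (this is not from either patch): reduce each
btree-indexable comparison to an inclusive interval, so that plan time
and run time differ only in when the comparison value becomes known.

```c
#include <assert.h>
#include <limits.h>

typedef enum { STRAT_LT, STRAT_LE, STRAT_EQ, STRAT_GE, STRAT_GT } CmpStrategy;

/* Reduce one "col <op> value" clause to an inclusive [*lower, *upper]
 * interval over int keys.  Plan-time pruning and run-time pruning can
 * both funnel through the same routine once the value is known. */
static void
bounds_from_clause(CmpStrategy strat, int value, int *lower, int *upper)
{
    *lower = INT_MIN;
    *upper = INT_MAX;
    switch (strat)
    {
        case STRAT_LT: *upper = value - 1; break;
        case STRAT_LE: *upper = value;     break;
        case STRAT_EQ: *lower = *upper = value; break;
        case STRAT_GE: *lower = value;     break;
        case STRAT_GT: *lower = value + 1; break;
    }
}
```

The resulting interval would then be handed to the partition-bound
search, regardless of which phase produced it.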

Therefore, IMHO, it would be best to focus first on how we're going to
identify the partitions that survive pruning, and then afterwards work
on transposing that logic to happen before partitions are opened and
locked.  That way, we get some incremental benefit sooner, and also
unblock some other development work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-08-30 Thread Amit Langote
Hi Ashutosh,

Thanks for the comments and sorry that it took me a while to reply here.

On 2017/08/23 20:16, Ashutosh Bapat wrote:
> On Mon, Aug 21, 2017 at 12:07 PM, Amit Langote
>  wrote:
>> I've been working on implementing a way to perform plan-time
>> partition-pruning that is hopefully faster than the current method of
>> using constraint exclusion to prune each of the potentially many
>> partitions one-by-one.  It's not fully cooked yet though.
>>
>> Meanwhile, I thought I'd share a couple of patches that implement some
>> restructuring of the planner code related to partitioned table inheritance
>> planning that I think would be helpful.  They are to be applied on top of
>> the patches being discussed at [1].  Note that these patches themselves
>> don't implement the actual code that replaces constraint exclusion as a
>> method of performing partition pruning.  I will share that patch after
>> debugging it some more.
>>
>> The main design goal of the patches I'm sharing here now is to defer the
>> locking and  opening of leaf partitions in a given partition tree to a
>> point after set_append_rel_size() is called on the root partitioned table.
>>  Currently, AFAICS, we need to lock and open the child tables in
>> expand_inherited_rtentry() only to set the translated_vars field in
>> AppendRelInfo that we create for the child.  ISTM, we can defer the
>> creation of a child AppendRelInfo to a point when it (its field
>> translated_vars in particular) will actually be used and so lock and open
>> the child tables only at such a time.  Although we don't lock and open the
>> partition child tables in expand_inherited_rtentry(), their RT entries are
>> still created and added to root->parse->rtable, so that
>> setup_simple_rel_arrays() knows the maximum number of entries
>> root->simple_rel_array will need to hold and allocate the memory for that
>> array accordingly.   Slots in simple_rel_array[] corresponding to
>> partition child tables will be empty until they are created when
>> set_append_rel_size() is called on the root parent table and it determines
>> the partitions that will be scanned after all.
> 
> The partition pruning can happen only after the quals have been
> distributed to Rels i.e. after deconstruct_jointree(),
> reconsider_outer_join_clauses() and generate_base_implied_equalities()
> have been called. If the goal is to not heap_open() the partitions
> which are pruned, we can't do that in expand_inherited_rtentry(). One
> reason why I think we don't want to heap_open() partition relations is
> to avoid relcache bloat because of opened partition relations, which
> are ultimately pruned. But please note that according to your patches,
> we still need to populate catalog caches to get relkind and reltype
> etc.

Yes, we still hit syscache for *all* partitions.  I haven't yet thought
very hard about avoiding that altogether.

> There are many functions that traverse simple_rel_array[] after it's
> created. Most of them assume that the empty entries in that array
> correspond to non-simple range entries like join RTEs. But now we are
> breaking that assumption. Most of these functions also skip "other"
> relations, so that may be OK now, but I am not sure if it's really
> going to be fine if we keep empty slots in place of partition
> relations. There may be three options here 1. add placeholder
> RelOptInfos for partition relations (may be mark those specially) and
> mark the ones which get pruned as dummy later. 2. Prune the partitions
> before any functions scans simple_rel_array[] or postpone creating
> simple_rel_array till pruning. 3. Examine all the current scanners
> esp. the ones which will be called before pruning to make sure that
> skipping "other" relations is going to be kosher.

Between the point when slots in simple_rel_array are allocated
(setup_simple_rel_arrays) and the point when partition RelOptInfos are
actually created after the partition-pruning step (set_append_rel_size),
it seems that most places that iterate over simple_rel_array know to
skip slots containing NULL values.  We might need to document that NULL
means partitions in addition to its current meaning (non-baserels).
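The skip-NULL iteration pattern in question, reduced to a self-contained
sketch (RelStub and count_live_rels are stand-ins, not planner code):

```c
#include <assert.h>
#include <stddef.h>

typedef struct RelStub { int relid; } RelStub;

/* Count the relations actually present in the array, skipping NULL
 * slots.  Today a NULL slot means "not a baserel" (e.g. a join RTE);
 * under the proposal it would additionally mean "partition whose
 * RelOptInfo has not been created yet, or that was pruned". */
static int
count_live_rels(RelStub **simple_rel_array, int size)
{
    int     count = 0;

    for (int i = 1; i < size; i++)  /* slot 0 is unused, as in the planner */
    {
        if (simple_rel_array[i] == NULL)
            continue;
        count++;
    }
    return count;
}
```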

>> Patch augments the existing PartitionedChildRelInfo node, which currently
>> holds only the partitioned child rel RT indexes, to carry some more
>> information about the partition tree, which includes the information
>> returned by RelationGetPartitionDispatchInfo() when it is called from
>> expand_inherited_rtentry() (per the proposed patch in [1], we call it to
>> be able to add partitions to the query tree in the bound order).
>> Actually, since PartitionedChildRelInfo now contains more information
>> about the partition tree than it used to before, I thought the struct's
>> name is no longer relevant, so renamed it to PartitionRootInfo and renamed
>> root->pcinfo_list accordingly to prinfo_list.  That seems okay because we
only use that node internally.

Re: [HACKERS] path toward faster partition pruning

2017-08-23 Thread Ashutosh Bapat
On Mon, Aug 21, 2017 at 12:07 PM, Amit Langote
 wrote:
> I've been working on implementing a way to perform plan-time
> partition-pruning that is hopefully faster than the current method of
> using constraint exclusion to prune each of the potentially many
> partitions one-by-one.  It's not fully cooked yet though.
>
> Meanwhile, I thought I'd share a couple of patches that implement some
> restructuring of the planner code related to partitioned table inheritance
> planning that I think would be helpful.  They are to be applied on top of
> the patches being discussed at [1].  Note that these patches themselves
> don't implement the actual code that replaces constraint exclusion as a
> method of performing partition pruning.  I will share that patch after
> debugging it some more.
>
> The main design goal of the patches I'm sharing here now is to defer the
> locking and  opening of leaf partitions in a given partition tree to a
> point after set_append_rel_size() is called on the root partitioned table.
>  Currently, AFAICS, we need to lock and open the child tables in
> expand_inherited_rtentry() only to set the translated_vars field in
> AppendRelInfo that we create for the child.  ISTM, we can defer the
> creation of a child AppendRelInfo to a point when it (its field
> translated_vars in particular) will actually be used and so lock and open
> the child tables only at such a time.  Although we don't lock and open the
> partition child tables in expand_inherited_rtentry(), their RT entries are
> still created and added to root->parse->rtable, so that
> setup_simple_rel_arrays() knows the maximum number of entries
> root->simple_rel_array will need to hold and allocate the memory for that
> array accordingly.   Slots in simple_rel_array[] corresponding to
> partition child tables will be empty until they are created when
> set_append_rel_size() is called on the root parent table and it determines
> the partitions that will be scanned after all.

The partition pruning can happen only after the quals have been
distributed to Rels i.e. after deconstruct_jointree(),
reconsider_outer_join_clauses() and generate_base_implied_equalities()
have been called. If the goal is to not heap_open() the partitions
which are pruned, we can't do that in expand_inherited_rtentry(). One
reason why I think we don't want to heap_open() partition relations is
to avoid relcache bloat because of opened partition relations, which
are ultimately pruned. But please note that according to your patches,
we still need to populate catalog caches to get relkind and reltype
etc.

There are many functions that traverse simple_rel_array[] after it's
created. Most of them assume that the empty entries in that array
correspond to non-simple range entries like join RTEs. But now we are
breaking that assumption. Most of these functions also skip "other"
relations, so that may be OK now, but I am not sure if it's really
going to be fine if we keep empty slots in place of partition
relations. There may be three options here 1. add placeholder
RelOptInfos for partition relations (may be mark those specially) and
mark the ones which get pruned as dummy later. 2. Prune the partitions
before any functions scans simple_rel_array[] or postpone creating
simple_rel_array till pruning. 3. Examine all the current scanners
esp. the ones which will be called before pruning to make sure that
skipping "other" relations is going to be kosher.

>
> Patch augments the existing PartitionedChildRelInfo node, which currently
> holds only the partitioned child rel RT indexes, to carry some more
> information about the partition tree, which includes the information
> returned by RelationGetPartitionDispatchInfo() when it is called from
> expand_inherited_rtentry() (per the proposed patch in [1], we call it to
> be able to add partitions to the query tree in the bound order).
> Actually, since PartitionedChildRelInfo now contains more information
> about the partition tree than it used to before, I thought the struct's
> name is no longer relevant, so renamed it to PartitionRootInfo and renamed
> root->pcinfo_list accordingly to prinfo_list.  That seems okay because we
> only use that node internally.
>
> Then during the add_base_rels_to_query() step, when build_simple_rel()
> builds a RelOptInfo for the root partitioned table, it also initializes
> some newly introduced fields in RelOptInfo from the information contained
> in PartitionRootInfo of the table.  The aforementioned fields are only
> initialized in RelOptInfos of root partitioned tables.  Note that the
> add_base_rels_to_query() step won't add the partition "otherrel"
> RelOptInfos yet (unlike the regular inheritance case, where they are,
> after looking them up in root->append_rel_list).

Partition-wise join requires the partition hierarchy to be expanded
level-by-level, keeping intact the parent-child relationship between a
partitioned table and its partitions. Your patch