Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-02-13 Thread Alvaro Herrera
David Rowley wrote:
> On 19 January 2018 at 16:00, Kyotaro HORIGUCHI
>  wrote:
> > And I'd like to ask David to check out his mail environment so
> > that SPF record is available for his message.
> 
> Will investigate

This should be fixed now.  Please let us know if you still see problems.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-23 Thread David Rowley
On 23 January 2018 at 23:22, Amit Langote  wrote:
> On 2018/01/23 15:44, David Rowley wrote:
>> Attached is what I had in mind about how to do this.
>
> Thanks for the delta patch.  I will start looking at it tomorrow.

Thanks. I've been looking more at this and I've made a few more
adjustments in the attached.

This delta patch should be applied against the
faster_partition_prune_v21_delta_drowley_v1.patch one I sent
yesterday. This changes a few comments, also now correctly passes the
context to get_partitions_excluded_by_ne_clauses and fixes a small
error where the patch was failing to record a notnull for the
partition key when it saw a strict <> clause. It was only doing this
for the opposite case, but both seem to be perfectly applicable. I
also made a small adjustment to the regression tests to ensure this is
covered.

I'm now going to start work on basing the partition pruning patch on
top of this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


faster_partition_prune_v21_delta_drowley_v1_delta.patch
Description: Binary data


Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-22 Thread David Rowley
Hi Amit
,
On 19 January 2018 at 04:03, David Rowley  wrote:
> On 18 January 2018 at 23:56, Amit Langote  
> wrote:
>> So, I've been assuming that the planner changes in the run-time pruning
>> patch have to do with selecting clauses (restriction clauses not
>> containing Consts and/or join clauses) to be passed to the executor by
>> recording them in the Append node.  Will they be selected by the planner
>> calling into partition.c?
>
> I had thought so. I only have a rough idea in my head and that's that
> the PartitionPruneInfo struct that I wrote for the run-time pruning
> patch would have the clause List replaced with this new
> PartScanClauseInfo struct (which likely means it needs to go into
> primnodes.h), this struct would contain all the partition pruning
> clauses in a more structured form so that no matching of quals to the
> partition key wouldn't be required during execution. The idea is that
> we'd just need to call; remove_redundant_clauses,
> extract_bounding_datums and get_partitions_for_keys. I think this is
> the bare minimum of work that can be done in execution since we can't
> remove the redundant clauses until we know the values of the Params.
>
> Likely this means there will need to be 2 functions externally
> accessible for this in partition.c. I'm not sure exactly how best to
> do this. Maybe it's fine just to have allpaths.c call
> extract_partition_key_clauses to generate the PartScanClauseInfo then
> call some version of get_partitions_from_clauses which does do the
> extract_partition_key_clauses duties. This is made more complex by the
> code that handles adding the default partition bound to the quals. I'm
> not yet sure where that should live.
>
> I've also been thinking of having some sort of PartitionPruneContext
> struct that we can pass around the functions. Runtime pruning had to
> add structs which store the Param values to various functions which I
> didn't like. It would be good to just add those to the context and
> have them passed down without having to bloat the parameters in the
> functions. I might try and do that tomorrow too. This should make the
> footprint of the runtime pruning patch is a bit smaller.

Attached is what I had in mind about how to do this. Only the planner
will need to call populate_partition_clause_info. The planner and
executor can call get_partitions_from_clauseinfo. I'll just need to
change the run-time prune patch to pass the PartitionClauseInfo into
the executor in the Append node.

I've also added the context struct that I mentioned above. It's
currently not carrying much weight, but the idea is that this context
will be passed around a bit more with the run-time pruning patch and
it will also carry the details about parameter values. I'll need to
modify a few signatures of functions like partkey_datum_from_expr to
pass the context there too. I didn't do that here because currently,
those functions have no use for the context with the fields that they
currently have.

I've also fixed a bug where when you built the commutator OpExpr in
what's now called extract_partition_key_clauses the inputcollid was
not being properly set. The changes I made there go much further than
just that, I've completely removed the OpExpr from the PartClause
struct as only two members were ever used. I thought it was better
just to add those to PartClause instead.

I also did some renaming of variables that always assumed that the
Expr being compared to the partition key was a constant. This is true
now, but with run-time pruning patch, it won't be, so I thought it was
better to do this here rather than having to rename them in the
run-time pruning patch.

One thing I don't yet understand about the patch is the use of
predicate_refuted_by() in get_partitions_from_or_clause_args(). I did
adjust the comment above that code, but I'm still not certain I fully
understand why that function has to be used instead of building the
clauses for the OR being processed along with the remaining clauses.
Is it that this was too hard to extract that you ended up using
predicate_refuted_by()?

I've also removed the makeBoolExpr call that you were encapsulating
the or_clauses in. I didn't really see the need for this since you
just removed it again when looping over the or_clauses.

The only other changes are just streamlining code and comment changes.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


faster_partition_prune_v21_delta_drowley_v1.patch
Description: Binary data


Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-19 Thread David Rowley
On 19 January 2018 at 16:00, Kyotaro HORIGUCHI
 wrote:
> And I'd like to ask David to check out his mail environment so
> that SPF record is available for his message.

Will investigate

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-18 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 18 Jan 2018 11:41:00 -0800, Andres Freund <and...@anarazel.de> wrote in 
<20180118194100.dy3kxdtktsbvm...@alap3.anarazel.de>
> Hi Amit,
> 
> It seems your mail system continually adds "[Sender Address Forgery]"
> prefixes to messages. E.g. this mail now has
> Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender 
> Address Forgery]Re: [HACKERS] path toward faster partition pruning
> as its subject, whereas the mail you're replying to only had
> Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: 
> [HACKERS] path toward faster partition pruning
> two of them.
> 
> I think the two previous occurances of this also are from you.
> 
> This is somewhat annoying, could you try to figure out a) what the
> problem is b) how to prevent the subject being edited like that?

Our mail server is failing to fetch SPF record for David's mails
that received directly (not via -hakders ML) and the server adds
the subject header.  It is failing to fetch SPF record for
2ndquadrant.com. The reason might be that the envelope-from of
his mails is not consistent with his server's IP address.

Anyway, mails via -hackers ML doesn't suffer so, what Amit (and
I) side can do by myself is one of the following.

- Being careful to reply to the mails comming via the ML.
- Remove the added header by hand..


And I'd like to ask David to check out his mail environment so
that SPF record is available for his message.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-18 Thread Andres Freund
Hi Amit,

It seems your mail system continually adds "[Sender Address Forgery]"
prefixes to messages. E.g. this mail now has
Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender 
Address Forgery]Re: [HACKERS] path toward faster partition pruning
as its subject, whereas the mail you're replying to only had
Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] 
path toward faster partition pruning
two of them.

I think the two previous occurances of this also are from you.

This is somewhat annoying, could you try to figure out a) what the
problem is b) how to prevent the subject being edited like that?

Regards,

Andres



Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-18 Thread David Rowley
On 18 January 2018 at 23:56, Amit Langote  wrote:
>> I've not fully worked out how run-time pruning
>> will use this as it'll need another version of
>> get_partitions_from_clauses but passes in a PartScanClauseInfo
>> instead, and does not call extract_partition_key_clauses. That area
>> probably  needs some shuffling around so that does not end up a big
>> copy and paste of all that new logic.
>>
> So, I've been assuming that the planner changes in the run-time pruning
> patch have to do with selecting clauses (restriction clauses not
> containing Consts and/or join clauses) to be passed to the executor by
> recording them in the Append node.  Will they be selected by the planner
> calling into partition.c?

I had thought so. I only have a rough idea in my head and that's that
the PartitionPruneInfo struct that I wrote for the run-time pruning
patch would have the clause List replaced with this new
PartScanClauseInfo struct (which likely means it needs to go into
primnodes.h), this struct would contain all the partition pruning
clauses in a more structured form so that no matching of quals to the
partition key wouldn't be required during execution. The idea is that
we'd just need to call; remove_redundant_clauses,
extract_bounding_datums and get_partitions_for_keys. I think this is
the bare minimum of work that can be done in execution since we can't
remove the redundant clauses until we know the values of the Params.

Likely this means there will need to be 2 functions externally
accessible for this in partition.c. I'm not sure exactly how best to
do this. Maybe it's fine just to have allpaths.c call
extract_partition_key_clauses to generate the PartScanClauseInfo then
call some version of get_partitions_from_clauses which does do the
extract_partition_key_clauses duties. This is made more complex by the
code that handles adding the default partition bound to the quals. I'm
not yet sure where that should live.

I've also been thinking of having some sort of PartitionPruneContext
struct that we can pass around the functions. Runtime pruning had to
add structs which store the Param values to various functions which I
didn't like. It would be good to just add those to the context and
have them passed down without having to bloat the parameters in the
functions. I might try and do that tomorrow too. This should make the
footprint of the runtime pruning patch is a bit smaller.

> Meanwhile, here is a revised version (v21) that incorporates your changes.
>  I added you as the author in 0002 and 0005 patches.  I guess a v22 will
> have to follow very soon...

Thanks for merging that in.

I'll have a try at making this work tomorrow, although I'm not sure
yet how much time I'll have to dedicate to it as I have a few other
things to do too.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-17 Thread David Rowley
On 18 January 2018 at 00:13, David Rowley  wrote:
> On 17 January 2018 at 23:48, Amit Langote  wrote:
>> I'm concerned that after your patch to remove
>> match_clauses_to_partkey(), we'd be doing more work than necessary in
>> some cases.  For example, consider the case of using run-time pruning
>> for nested loop where the inner relation is a partitioned table.  With
>> the old approach, get_partitions_from_clauses() would only be handed
>> the clauses that are known to match the partition keys (which most
>> likely is fewer than all of the query's clauses), so
>> get_partitions_from_clauses() doesn't have to do the work of filtering
>> non-partition clauses every time (that is, for every outer row).
>> That's why I had decided to keep that part in the planner.
>
> That might be better served by splitting
> classify_partition_bounding_keys() into separate functions, the first
> function would be in charge of building keyclauses_all. That way the
> remaining work during the executor would never need to match clauses
> to a partition key as they'd be in lists dedicated to each key.

I've attached another delta against your v20 patch which does this.
It's very rough for now and I've only checked that it passes the
regression test so far.

It will need some cleanup work, but I'd be keen to know what you think
of the general idea. I've not fully worked out how run-time pruning
will use this as it'll need another version of
get_partitions_from_clauses but passes in a PartScanClauseInfo
instead, and does not call extract_partition_key_clauses. That area
probably  needs some shuffling around so that does not end up a big
copy and paste of all that new logic.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


faster_partition_prune_v20_delta_drowley_v2.patch
Description: Binary data


Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-17 Thread David Rowley
On 17 January 2018 at 23:48, Amit Langote  wrote:
> I'm concerned that after your patch to remove
> match_clauses_to_partkey(), we'd be doing more work than necessary in
> some cases.  For example, consider the case of using run-time pruning
> for nested loop where the inner relation is a partitioned table.  With
> the old approach, get_partitions_from_clauses() would only be handed
> the clauses that are known to match the partition keys (which most
> likely is fewer than all of the query's clauses), so
> get_partitions_from_clauses() doesn't have to do the work of filtering
> non-partition clauses every time (that is, for every outer row).
> That's why I had decided to keep that part in the planner.

That might be better served by splitting
classify_partition_bounding_keys() into separate functions, the first
function would be in charge of building keyclauses_all. That way the
remaining work during the executor would never need to match clauses
to a partition key as they'd be in lists dedicated to each key.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-17 Thread Amit Langote
Hi David.

On Wed, Jan 17, 2018 at 6:19 PM, David Rowley
 wrote:
> On 17 January 2018 at 17:05, David Rowley  
> wrote:
>> 6. Which brings me to; why do we need match_clauses_to_partkey at all?
>> classify_partition_bounding_keys seems to do all the work
>> match_clauses_to_partkey does, plus more. Item #3 above is caused by
>> an inconsistency between these functions. What benefit does
>> match_clauses_to_partkey give? I might understand if you were creating
>> list of clauses matching each partition key, but you're just dumping
>> everything in one big list which causes
>> classify_partition_bounding_keys() to have to match each clause to a
>> partition key again, and classify_partition_bounding_keys is even
>> coded to ignore clauses that don't' match any key, so it makes me
>> wonder what is match_clauses_to_partkey actually for?
>
> I started to look at this and ended up shuffling the patch around a
> bit to completely remove the match_clauses_to_partkey function.
>
> I also cleaned up some of the comments and shuffled some fields around
> in some of the structs to shrink them down a bit.
>
> All up, this has saved 268 lines of code in the patch.
>
> src/backend/catalog/partition.c   | 296 ---
> src/backend/optimizer/path/allpaths.c | 368 ++
> 2 files changed, 198 insertions(+), 466 deletions(-)
>
> It's had very minimal testing. Really I've only tested that the
> regression tests pass.
>
> I also fixed up the bad assumption that IN lists will contain Consts
> only which hopefully fixes the crash I reported earlier.
>
> I saw you'd added a check to look for contradicting IS NOT NULL
> clauses when processing an IS NULL clause, but didn't do anything for
> the opposite case. I added code for this so it behaves the same
> regardless of the clause order.
>
> Can you look at my changes and see if I've completely broken anything?

Thanks for the patch.  I applied the patch and see that it didn't
break any tests, although haven't closely reviewed the code yet.

I'm concerned that after your patch to remove
match_clauses_to_partkey(), we'd be doing more work than necessary in
some cases.  For example, consider the case of using run-time pruning
for nested loop where the inner relation is a partitioned table.  With
the old approach, get_partitions_from_clauses() would only be handed
the clauses that are known to match the partition keys (which most
likely is fewer than all of the query's clauses), so
get_partitions_from_clauses() doesn't have to do the work of filtering
non-partition clauses every time (that is, for every outer row).
That's why I had decided to keep that part in the planner.

Thanks,
Amit



Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-17 Thread David Rowley
On 17 January 2018 at 17:05, David Rowley  wrote:
> 6. Which brings me to; why do we need match_clauses_to_partkey at all?
> classify_partition_bounding_keys seems to do all the work
> match_clauses_to_partkey does, plus more. Item #3 above is caused by
> an inconsistency between these functions. What benefit does
> match_clauses_to_partkey give? I might understand if you were creating
> list of clauses matching each partition key, but you're just dumping
> everything in one big list which causes
> classify_partition_bounding_keys() to have to match each clause to a
> partition key again, and classify_partition_bounding_keys is even
> coded to ignore clauses that don't' match any key, so it makes me
> wonder what is match_clauses_to_partkey actually for?

I started to look at this and ended up shuffling the patch around a
bit to completely remove the match_clauses_to_partkey function.

I also cleaned up some of the comments and shuffled some fields around
in some of the structs to shrink them down a bit.

All up, this has saved 268 lines of code in the patch.

src/backend/catalog/partition.c   | 296 ---
src/backend/optimizer/path/allpaths.c | 368 ++
2 files changed, 198 insertions(+), 466 deletions(-)

It's had very minimal testing. Really I've only tested that the
regression tests pass.

I also fixed up the bad assumption that IN lists will contain Consts
only which hopefully fixes the crash I reported earlier.

I saw you'd added a check to look for contradicting IS NOT NULL
clauses when processing an IS NULL clause, but didn't do anything for
the opposite case. I added code for this so it behaves the same
regardless of the clause order.

Can you look at my changes and see if I've completely broken anything?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


faster_partition_prune_v20_delta_drowley.patch
Description: Binary data


Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-16 Thread David Rowley
On 16 January 2018 at 21:08, Amit Langote  wrote:
> Attached v20.  Thanks again.

Thanks for working on v20. I've had a look over part of it and I've
written down the following:

1. The following comment is not correct

/*
* Equality look up key.  Values in the following array appear in no
* particular order (unlike minkeys and maxkeys below which must appear in
* the same order as the partition key columns).

These must be in partition key order, just like the others.

This part is not true either:

* the same order as the partition key columns).  n_eqkeys must be equal to
* the number of partition keys to be valid (except in the case of hash
* partitioning where that's not required).  When set, minkeys and maxkeys
* are ignored.

range2 is pruned just fine from the following:

create table rangep (a int, b int) partition by range (a,b);
create table rangep1 partition of rangep for values from (1,10) to (1,20);
create table rangep2 partition of rangep for values from (2,10) to (2,20);

explain select * from rangep where a = 1;
  QUERY PLAN
---
 Append  (cost=0.00..38.25 rows=11 width=8)
   ->  Seq Scan on rangep1  (cost=0.00..38.25 rows=11 width=8)
 Filter: (a = 1)
(3 rows)

2. You've added a list_copy() to get_partitions_from_clauses so as not
to modify the input list, but this function calls
get_partitions_from_clauses_recurse which calls
classify_partition_bounding_keys() which modifes that list. Would it
not just be better to make a list copy in
get_partitions_from_clauses() without any conditions?

If we get new users of that function, e.g Run-time pruning, then they
might be surprised to see new items magically added to their input
list without mention of that behaviour in the function comment.

3. The following case causes an Assert failure:

drop table listp;
CREATE TABLE listp (a int, b int) partition by list (a);
create table listp1 partition of listp for values in (1);
create table listp2 partition of listp for values in (2);

prepare q1 (int) as select * from listp where a = 1 and a in($1,10);

explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1); -- <--- Assert failure!

In match_clauses_to_partkey you always add the ScalarArrayOpExpr to
the result regardless if it is a complete set of Consts, however, the
code in classify_partition_bounding_keys() that deals with
ScalarArrayOpExpr in can't handle non-consts

/* OK to add to the result. */
result = lappend(result, clause);
if (IsA(eval_const_expressions(root, rightop), Const))
*contains_const = true;
else
*contains_const = false;

*contains_consts is reset to true again by the a = 1 qual, so
get_partitions_from_clauses() gets called from
get_append_rel_partitions. Later classify_partition_bounding_keys()
when processing the ScalarArrayOpExpr, the following code assumes the
array exprs are all Consts:

foreach(lc1, elem_exprs)
{
Const  *rightop = castNode(Const, lfirst(lc1));


Setting *contains_const = false; in match_clauses_to_partkey() is not
correct either. If I understand the intent here correctly, you want
this to be set to true if the clause list contains quals with any
consts that are useful for partition pruning during planning. If
that's the case then you should set it to true if you find a suitable
clause, otherwise leave it set to false as you set it to at the start
of the function. What you have now will have varying results based on
the order of the clauses in the list, which is certainly not correct.

4. The following code can be rearranged to not pull_varnos if there's
no commutator op.

constexpr = leftop;
constrelids = pull_varnos((Node *) leftop);
expr_op = get_commutator(expr_op);

/*
* If no commutator exists, cannot flip the qual's args,
* so give up.
*/
if (!OidIsValid(expr_op))
continue;

5. Header comment for match_clauses_to_partkey() says only clauses in
the pattern of "partkey op const" and "const op partkey" are handled.
NULL tests are also mentioned but nothing is mentioned about
ScalarArrayOpExpr. It might be better to be less verbose about what
the function handles, but if you're listing what is handled then you
should not make false claims.

 * For an individual clause to match with a partition key column, the clause
 * must be an operator clause of the form (partkey op const) or (const op
 * partkey); the latter only if a suitable commutator exists.  Furthermore,

6. Which brings me to; why do we need match_clauses_to_partkey at all?
classify_partition_bounding_keys seems to do all the work
match_clauses_to_partkey does, plus more. Item #3 above is caused by
an inconsistency between these functions. What benefit does
match_clauses_to_partkey give? I might understand if you were creating
list of clauses matching each partition key, but you're just dumping
everything in one big list which 

Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-16 Thread David Rowley
On 16 January 2018 at 21:08, Amit Langote  wrote:
> On 2018/01/12 12:30, David Rowley wrote:
>> 8. The code in get_partitions_from_ne_clauses() does perform quite a
>> few nested loops. I think a more simple way to would be to track the
>> offsets you've seen in a Bitmapset. This would save you having to
>> check for duplicates, as an offset can only contain a single datum.
>> You'd just need to build a couple of arrays after that, one to sum up
>> the offsets found per partition, and one for the total datums allowed
>> in the partition. If the numbers match then you can remove the
>> partition.
>>
>> I've written this and attached it to this email. It saves about 50
>> lines of code and should perform much better for complex cases, for
>> example, a large NOT IN list. This also implements #6.
>
> I liked your patch, so incorporated it, except, I feel slightly
> uncomfortable about the new name that you chose for the function because
> it sounds a bit generic.

You're right. I only renamed it because I inverted the meaning of the
function in the patch. It no longer did
"get_partitions_from_ne_clauses", it did the opposite and give the
partitions which can't match. Please feel free to think of a new
better name. Is "get_partitions_excluded_by_ne_clauses" too long?

>> I think you quite often use "the same" to mean "it". Can you change that?
>
> I guess that's just one of my many odd habits when writing English, all of
> which I'm trying to get rid of, but apparently with limited success.  Must
> try harder. :)

Oops, on re-reading that it sounded as though I was asking you to
change some habit, but I just meant the comments. I understand there
will be places that use English where that's normal. It's just I don't
recall seeing that in PostgreSQL code before. American English is
pretty much the standard for the project, despite that not always
being strictly applied (e.g we have a command called ANALYSE which is
an alias for ANALYZE). I always try and do my best to spell words in
American English (which is not where I'm from), which for me stretches
about as far as putting 'z' in the place of some of my 's'es.

> I rewrote the above comment block as:
>
>  * Try to compare the constant arguments of 'leftarg' and 'rightarg', in that
>  * order, using the operator of 'op' and set *result to the result of this
>  * comparison.
>
> Is that any better?

Sounds good.

>
>> 15. "the latter" is normally used when you're referring to the last
>> thing in a list which was just mentioned. In this case, leftarg_const
>> and rightarg_const is the list, so "the latter" should mean
>> rightarg_const, but I think you mean to compare them using the
>> operator.
>>
>> * If the leftarg_const and rightarg_const are both of the type expected
>> * by op's operator, then compare them using the latter.
>
> Rewrote it as:
>
>  * We can compare leftarg_const and rightarg_const using op's operator
>  * only if both are of the type expected by it.

I'd probably write "expected type." instead of "type expected by it."

>> 17. The following example will cause get_partitions_for_keys_hash to 
>> misbehave:
>>
>> create table hashp (a int, b int) partition by hash (a, b);
>> create table hashp1 partition of hashp for values with (modulus 4, remainder 
>> 0);
>> create table hashp2 partition of hashp for values with (modulus 4, remainder 
>> 1);
>> create table hashp3 partition of hashp for values with (modulus 4, remainder 
>> 3);
>> create table hashp4 partition of hashp for values with (modulus 4, remainder 
>> 2);
>> explain select * from hashp where a = 1 and a is null;
>>
>> The following code assumes that you'll never get a NULL test for a key
>> that has an equality test, and ends up trying to prune partitions
>> thinking we got compatible clauses for both partition keys.
>
> Yeah, I think this example helps spot a problem.  I thought we'd never get
> to get_partitions_for_keys_hash() for the above query, because we
> should've been able to prove much earlier that the particular clause
> combination should be always false (a cannot be both non-null 1 and null).
>  Now, because the planner itself doesn't substitute a constant-false for
> that, I taught classify_partition_bounding_keys() to do so.  It would now
> return constfalse=true if it turns out that clauses in the input list lead
> to contradictory nullness condition for a given partition column.
>
>>   memset(keyisnull, false, sizeof(keyisnull));
>>   for (i = 0; i < partkey->partnatts; i++)
>>   {
>> if (bms_is_member(i, keys->keyisnull))
>> {
>>   keys->n_eqkeys++;
>>   keyisnull[i] = true;
>> }
>>   }
>>
>>   /*
>>* Can only do pruning if we know all the keys and they're all equality
>>* keys including the nulls that we just counted above.
>>*/
>>   if (keys->n_eqkeys == partkey->partnatts)
>>
>> The above code will need to be made smarter. It'll likely crash if you
>> change "b" to a pass-by-ref type.
>

Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-12 Thread Robert Haas
On Thu, Jan 11, 2018 at 10:30 PM, David Rowley
 wrote:
> Instead, can you make it:
>
> if (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
> keys->n_maxkeys > 0 || n_keynullness > 0)
> return true;
>
> return false;

Or better yet:

return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
keys->n_maxkeys > 0 || n_keynullness > 0);

It's not really necessary to write if (some condition is true) return
true; return false when you can just write return (boolean-valued
condition).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-11 Thread Amit Langote
David,

On 2018/01/12 12:30, David Rowley wrote:
> Can you also perform a self-review of the patch? Some of the things
> I'm picking up are leftovers from a previous version of the patch. We
> might never get through this review if you keep leaving those around!

Sorry, I will look more closely before posting the next version.  I guess
I may have rushed a bit too much when posting the v18/v19 patches, partly
because it's been 3 weeks since v17 and I felt I needed to catch up
quickly given the activity on the run-time pruning thread which depends on
the patches here.

Thanks,
Amit




Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-11 Thread David Rowley
On 12 January 2018 at 15:27, Amit Langote  wrote:
> On 2018/01/11 19:23, David Rowley wrote:

>> ERROR:  operator 531 is not a member of opfamily 1976
>
> You'll be able to see that the error no longer appears with the attached
> updated set of patches, but I'm now seeing that the resulting plan with
> patched for this particular query differs from what master (constraint
> exclusion) produces.  Master produces a plan with no partitions (as one
> would think is the correct plan), whereas patched produces a plan
> including the xy1 partition.  I will think about that a bit and post
> something later.

Thanks for looking at that.

I've got a few more things for you. I'm only partway through another
pass, but it makes sense to post what I have now if you're working on
a new version.

1. partitioing -> partitioning

 * Strategy of a partition clause operator per the partitioing operator class

2. get_partitions_from_clauses() modifies partclauses without
mentioning it in the header. I think you need to either:

a) warn about this in the header comment; or
b) do a list_copy() before list_concat()
c) do list_truncate back to the original length after you're done with the list.

3. get_partitions_from_clauses_recurse(), with:

  result = bms_add_range(result, 0, partdesc->nparts - 1);

You could change that to bms_add_range(NULL, ...) and ditch the
assignment of result to NULL at the start of the function.

4. classify_partition_bounding_keys() now returns bool, but the return
statement is still:

return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;

my compiler didn't warn about that, but I'd imagine some might.

Instead, can you make it:

if (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
keys->n_maxkeys > 0 || n_keynullness > 0)
return true;

return false;

probably equal keys are the most likely case, so it'll be good to
short circuit instead of performing addition on a bunch of stuff we
don't care about anymore.

5. In classify_partition_bounding_keys, why do we "continue" here?

clause = rinfo->clause;
if (rinfo->pseudoconstant &&
!DatumGetBool(((Const *) clause)->constvalue))
{
*constfalse = true;
continue;
}

Is there any point in searching further?

Also, if you were consistent with the return value for
classify_partition_bounding_keys when you've set *constfalse = true;
you wouldn't need to handle the case twice like you are in
get_partitions_from_clauses_recurse().

6. I think it would be nicer if get_partitions_from_ne_clauses returns
a set of partitions that could be excluded.

So instead of:

 * get_partitions_from_ne_clauses
 *
 * Return partitions of relation that satisfy all <> operator clauses in
 * ne_clauses.  Only ever called if relation is a list partitioned table.

Have:

 * get_partitions_from_ne_clauses
 *
 * Returns a Bitmapset of partitions that can be safely excluded due to
 * not-equal clauses existing for all possible partition values. It is only
 * valid to call this for LIST partitioned tables.

and instead of:

result = bms_add_range(NULL, 0, partdesc->nparts - 1);
result = bms_del_members(result, excluded_parts);
bms_free(excluded_parts);

return result;

Just do:

return excluded_parts;

and in get_partitions_from_clauses_recurse(), do bms_del_members
instead of bms_int_members.

there's less bit shuffling and it seems cleaner. Perhaps the function
name would need to be changed if we're inverting the meaning too.

(I've attached a patch which makes this change along with an idea in #8 below)

7. The following comment claims the function sets *datum, but there's
no param by that name:

/*
 * partkey_datum_from_expr
 * Extract constant value from expr and set *datum to that value
 */
static bool
partkey_datum_from_expr(PartitionKey key, int partkeyidx,
Expr *expr, Datum *value)

8. The code in get_partitions_from_ne_clauses() does perform quite a
few nested loops. I think a more simple way to would be to track the
offsets you've seen in a Bitmapset. This would save you having to
check for duplicates, as an offset can only contain a single datum.
You'd just need to build a couple of arrays after that, one to sum up
the offsets found per partition, and one for the total datums allowed
in the partition. If the numbers match then you can remove the
partition.

I've written this and attached it to this email. It saves about 50
lines of code and should perform much better for complex cases, for
example, a large NOT IN list. This also implements #6.

9. "the same" -> "it"

/*
* In case of NOT IN (..), we get a '<>', which while not
* listed as part of any operator family, we are able to
* handle the same if its negator is indeed a part of the
* partitioning operator family.
*/

10. in classify_partition_bounding_keys: "0" -> "false"

/* Return if no work to do below. */
if (!will_compute_keys || *constfalse)
return 0;

Likewise for:

if (*constfalse)
return 0;

11. I don't see partition_bound_bsearch used anywhere below the