Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-07 Thread Amit Langote
Hi David.

Thanks for the review.

(..also looking at the comments you sent earlier today.)

On 2017/11/07 11:14, David Rowley wrote:
> On 7 November 2017 at 01:52, David Rowley  
> wrote:
>> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)
> 
> I have a little more review to share:
> 
> 1. Missing "in" in comment. Should be "mentioned in"
> 
>  * get_append_rel_partitions
>  * Return the list of partitions of rel that pass the clauses mentioned
>  * rel->baserestrictinfo
> 
> 2. Variable should be declared in inner scope with the following fragment:
> 
> void
> set_basic_child_rel_properties(PlannerInfo *root,
>RelOptInfo *rel,
>RelOptInfo *childrel,
>AppendRelInfo *appinfo)
> {
> AttrNumber attno;
> 
> if (rel->part_scheme)
> {
> 
> which makes the code the same as where you moved it from.

It seems like you included the above changes in your attached C file,
which I will incorporate into my repository.

> 3. Normally lfirst() is assigned to a variable at the start of a
> foreach() loop. You have code which does not follow this.
> 
> foreach(lc, clauses)
> {
> Expr   *clause;
> int i;
> 
> if (IsA(lfirst(lc), RestrictInfo))
> {
> RestrictInfo *rinfo = lfirst(lc);
> 
> You could assign this to a Node * since the type is unknown to you at
> the start of the loop.

Will make the suggested changes to match_clauses_to_partkey().

> 4.
> /*
> * Useless if what we're thinking of as a constant is actually
> * a Var coming from this relation.
> */
> if (bms_is_member(rel->relid, constrelids))
> continue;
> 
> should this be moved to just above the op_strict() test? This one seems 
> cheaper.

Agreed, will do.  Also makes sense to move the PartCollMatchesExprColl()
test together.

> 5. Typo "paritions": /* No clauses to prune paritions, so scan all
> partitions. */
> 
> But thinking about it more the comment should something more along the
> lines of /* No useful clauses for partition pruning. Scan all
> partitions. */

You fixed it. :)

> The key difference is that there might be clauses, just without Consts.
> 
> Actually, the more I look at get_append_rel_partitions() I think it
> would be better if you re-shaped that if/else if test so that it only
> performs the loop over the partindexes if it's been set.
> 
> I ended up with the attached version of the function after moving
> things around a little bit.

Thanks a lot for that.  Looks much better now.

> I'm still reviewing but thought I'd share this part so far.

As mentioned at the top, I'm looking at your latest comments and they all
seem to be good points to me, so will address those in the next version.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-06 Thread Amit Langote
On 2017/11/06 21:52, David Rowley wrote:
> On 6 November 2017 at 23:01, Amit Langote  
> wrote:
>> OK, I have gotten rid of the min/max partition index interface and instead
>> adopted the bms_add_range() approach by including your patch to add the
>> same in the patch set (which is now 0002 in the whole set).  I have to
>> admit that it's simpler to understand the new code with just Bitmapsets to
>> look at, but I'm still a bit concerned about materializing the whole set
>> right within partition.c, although we can perhaps optimize it later.
> 
> Thanks for making that change. The code looks much more simple now.
> 
> For performance, if you're worried about a very large number of
> partitions, then I think you're better off using bms_next_member()
> rather than bms_first_member(), (likely this applies globally, but you
> don't need to worry about those).
> 
> The problem with bms_first_member is that it must always loop over the
> 0 words before it finds any bits set for each call, whereas
> bms_next_member will start on the word it was last called for. There
> will likely be a pretty big performance difference between the two
> when processing a large Bitmapset.

Ah, thanks for the explanation.  I will change it to bms_next_member() in
the next version.

>> Attached updated set of patches, including the fix to make the new pruning
>> code handle Boolean partitioning.
> 
> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

Thank you.

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-06 Thread Amit Langote
On 2017/11/06 13:15, David Rowley wrote:
> On 31 October 2017 at 21:43, Amit Langote  
> wrote:
>> Attached updated version of the patches
> 
> match_clauses_to_partkey() needs to allow for the way quals on Bool
> columns are represented.
> 
> create table pt (a bool not null) partition by list (a);
> create table pt_true partition of pt for values in('t');
> create table pt_false partition of pt for values in('f');
> explain select * from pt where a = true;
> QUERY PLAN
> --
>  Append  (cost=0.00..76.20 rows=2810 width=1)
>->  Seq Scan on pt_false  (cost=0.00..38.10 rows=1405 width=1)
>  Filter: a
>->  Seq Scan on pt_true  (cost=0.00..38.10 rows=1405 width=1)
>  Filter: a
> (5 rows)
> 
> match_clause_to_indexcol() shows an example of how to handle this.
> 
> explain select * from pt where a = false;
> 
> will need to be allowed too. This works slightly differently.

You're right.  I've fixed things to handle Boolean partitioning in the
updated set of patches I will post shortly.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-05 Thread David Rowley
On 6 November 2017 at 17:30, Amit Langote  wrote:
> On 2017/11/03 13:32, David Rowley wrote:
>> On 31 October 2017 at 21:43, Amit Langote  
>> wrote:
>> 1. This comment seem wrong.
>>
>> /*
>> * Since the clauses in rel->baserestrictinfo should all contain Const
>> * operands, it should be possible to prune partitions right away.
>> */
>
> Yes.  I used to think it was true, then realized it isn't and updated the
> code to get rid of that assumption, but I forgot updating this comment.
> Fixed.
>
>> How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ?
>> baserestrictinfo in this case will contain a single RestrictInfo with
>> an OpExpr containing two Var args and it'll come right through that
>> function too.

...

> We won't be able to use such a clause for pruning at all; neither
> planning-time pruning nor execution-time pruning.  Am I missing something?

I just meant the comment was wrong.

>
> The design with min/max partition index interface to the partition.c's new
> partition-pruning facility is intentional.  You can find hints about how
> such a design came about in the following Robert's email:
>
> https://www.postgresql.org/message-id/CA%2BTgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw%40mail.gmail.com

Yeah, I remember reading that before I had looked at the code. I
disagree with Robert on this. The fact that the min/max range gets
turned into a list of everything in that range in
get_append_rel_partitions means all the advantages that storing the
partitions as a range is voided. If you could have kept it a range the
entire time, then that might be different, but seems you need to
materialize the entire range in order to sort the partitions into
order. I've included Robert in just in case he wants to take a look at
the code that resulted from that design. Maybe something is not
following what he had in mind, or maybe he'll change his mind based on
the code that resulted.


> For range queries, it is desirable for the partitioning module to return
> the set of qualifying partitions that are contiguous in a compact (O(1))
> min/max representation than having to enumerate all those indexes in the
> set.  It's nice to avoid iterating over that set twice -- once when
> constructing the set in the partitioning module and then again in the
> caller (in this case, planner) to perform the planning-related tasks per
> selected partition.

The idea is that you still get the min and max from the bsearch, but
then use bms_add_range() to populate a bitmapset of the matching
partitions. The big-O notation of the search shouldn't change.

> We need the other_parts Bitmapset too, because selected partitions may not
> always be contiguous, even in the case of range partitioning, if there are
> OR clauses and the possibility that the default partition is also
> selected.  While computing the selected partition set from a given set of
> clauses, partitioning code tries to keep the min/max representation as
> long as it makes sense to and once the selected partitions no longer
> appear to be contiguous it switches to the Bitmapset mode.

Yip. I understand that. I just think there's no benefit to having
min/max since it needs to be materialized into a list of the entire
range at some point, it might as well be done as soon as possible
using a bitmapset, which would save having all the partset_union,
partset_intersect, partset_range_empty, partset_range_overlap,
partset_range_adjacent code. You'd end up just using bms_union and
bms_intersect then bms_add_range to handle the min/max bound you get
from the bsearch.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-05 Thread Amit Langote
On 2017/11/06 12:53, David Rowley wrote:
> On 3 November 2017 at 17:32, David Rowley  
> wrote:
>> 2. This code is way more complex than it needs to be.
>>
>> if (num_parts > 0)
>> {
>> int j;
>>
>> all_indexes = (int *) palloc(num_parts * sizeof(int));
>> j = 0;
>> if (min_part_idx >= 0 && max_part_idx >= 0)
>> {
>> for (i = min_part_idx; i <= max_part_idx; i++)
>> all_indexes[j++] = i;
>> }
>> if (!bms_is_empty(other_parts))
>> while ((i = bms_first_member(other_parts)) >= 0)
>> all_indexes[j++] = i;
>> if (j > 1)
>> qsort((void *) all_indexes, j, sizeof(int), intcmp);
>> }
>>
>> It looks like the min/max partition stuff is just complicating things
>> here. If you need to build this array of all_indexes[] anyway, I don't
>> quite understand the point of the min/max. It seems like min/max would
>> probably work a bit nicer if you didn't need the other_parts
>> BitmapSet, so I recommend just getting rid of min/max completely and
>> just have a BitmapSet with bit set for each partition's index you
>> need, you'd not need to go to the trouble of performing a qsort on an
>> array and you could get rid of quite a chunk of code too.
>>
>> The entire function would then not be much more complex than:
>>
>> partindexes = get_partitions_from_clauses(parent, partclauses);
>>
>> while ((i = bms_first_member(partindexes)) >= 0)
>> {
>> AppendRelInfo *appinfo = rel->part_appinfos[i];
>> result = lappend(result, appinfo);
>> }
>>
>> Then you can also get rid of your intcmp() function too.
> 
> I've read a bit more of the patch and I think even more now that the
> min/max stuff should be removed.

Oops, I didn't catch this email before sending my earlier reply.  Thanks
for the bms range patch.  Will reply to this shortly after reading your
patch and thinking on it a bit.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers