Hi David. Thanks a lot for your review comments and sorry it took me a while to reply.
On 2017/09/28 18:16, David Rowley wrote: > On 27 September 2017 at 14:22, Amit Langote > <langote_amit...@lab.ntt.co.jp> wrote: >> - 0001 includes refactoring that Dilip proposed upthread  (added him as >> an author). I slightly tweaked his patch -- renamed the function >> get_matching_clause to match_clauses_to_partkey, similar to >> match_clauses_to_index. > > Hi Amit, > > I've made a pass over the 0001 patch while trying to get myself up to > speed with the various developments that are going on in partitioning > right now. > > These are just my thoughts from reading over the patch. I understand > that there's quite a bit up in the air right now about how all this is > going to work, but I have some thoughts about that too, which I > wouldn't mind some feedback on as my line of thought may be off. > > Anyway, I will start with some small typos that I noticed while reading: > > partition.c: > > + * telling what kind of NullTest has been applies to the corresponding > > "applies" should be "applied" Will fix. > plancat.c: > > * might've occurred to satisfy the query. Rest of the fields are set in > > "Rest of the" should be "The remaining" Will fix. > Any onto more serious stuff: > > allpath.c: > > get_rel_partitions() > > I wonder if this function does not belong in partition.c. I'd have > expected a function to exist per partition type, RANGE and LIST, I'd > imagine should have their own function in partition.c to eliminate > partitions > which cannot possibly match, and return the remainder. I see people > speaking of HASH partitioning, but we might one day also want > something like RANDOM or ROUNDROBIN (I'm not really kidding, imagine > routing records to be processed into foreign tables where you never > need to query them again). It would be good if we could easily expand > this list and not have to touch files all over the optimizer to do > that. Of course, there would be other work to do in the executor to > implement any new partitioning method too. I think there will have to be at least some code in the optimizer. That is, the code to match the query to the table's partition keys and using the constant values therein to then look up the table's partitions. More importantly, once the partitions (viz. their offsets in the table's partition descriptor) have been looked up by partition.c, we must be able to then map them to their planner data structures viz. their AppendRelInfo, and subsequently RelOptInfo. This last part will have to be in the optimizer. In fact, that was the role of get_rel_partitions in the initial versions of the patch, when neither of the code for matching keys and for pruning using constants was implemented. One may argue that the first part, that is, matching clauses to match to the partition key and subsequent lookup of partitions using constants could both be implemented in partition.c, but it seems better to me to keep at least the part of matching clauses to the partition keys within the planner (just like matching clauses to indexes is done by the planner). Looking up partitions using constants cannot be done outside partition.c anyway, so that's something we have to implement there. > I know there's mention of it somewhere about get_rel_partitions() not > being so smart about WHERE partkey > 20 AND partkey > 10, the code > does not understand what's more restrictive. I think you could > probably follow the same rules here that are done in > eval_const_expressions(). Over there I see that evaluate_function() > will call anything that's not marked as volatile. Hmm, unless I've missed it, I don't see in evaluate_function() anything about determining which clause is more restrictive. AFAIK, such determination depends on the btree operator class semantics (at least in the most common cases where, say, ">" means greater than in a sense that btree code uses it), so I was planning to handle it the way btree code handles it in _bt_preprocess_keys(). In fact, the existing code in predtest.c, which makes decisions of the similar vein also relies on btree semantics. It's OK to do so, because partitioning methods that exist today and for which we'd like to have such smarts use btree semantics to partition the data. Also, we won't need to optimize such cases for hash partitioning anyway. > I'd imagine, for > each partition key, you'd want to store a Datum with the minimum and > maximum possible value based on the quals processed. If either the > minimum or maximum is still set to NULL, then it's unbounded at that > end. If you encounter partkey = Const, then minimum and maximum can be > set to the value of that Const. From there you can likely ignore any > other quals for that partition key, as if the query did contain > another qual with partkey = SomeOtherConst, then that would have > become a gating qual during the constant folding process. This way if > the user had written WHERE partkey >= 1 AND partkey <= 1 the > evaluation would end up the same as if they'd written WHERE partkey = > 1 as the minimum and maximum would be the same value in both cases, > and when those two values are the same then it would mean just one > theoretical binary search on a partition range to find the correct > partition instead of two. Given the way the patch recognizes a given qual as contributing to the equal key or minimum key or maximum key, it will not conclude the above to in fact be an equal key, because that presumably would require comparing clauses among each other to make such a discovery. I'm slightly hesitant to add code to do that in the first version of the patch. That is, for time being let WHERE partkey >= 1 and partkey <= 1 be handled by passing 1 as both minimum and maximum key, which requires two binary searches. Whereas, WHERE partkey = 1 would require only one. Planner code to get rid of the extra binary search lookup could come later, IMHO. > I see in get_partitions_for_keys you've crafted the function to return > something to identify which partitions need to be scanned. I think it > would be nice to see a special element in the partition array for the > NULL and DEFAULT partition. I imagine 0 and 1, but obviously, these > would be defined constants somewhere. The signature of that function > is not so pretty and that would likely tidy it up a bit. The matching > partition indexes could be returned as a Bitmapset, yet, I don't > really see any code which handles adding the NULL and DEFAULT > partition in get_rel_partitions() either, maybe I've just not looked > hard enough yet... New version of the patch I will post soon cleans up the interface of get_partitions_for_keys quite a bit; particularly the way selected partitions are returned, for which I adopted an idea that Robert Haas mentioned . When it recognizes that a sequence of consecutive partitions are to be scanned, it will return the starting and ending offsets as *min_part_idx and *max_part_idx. Those that don't fit this pattern (of which there should be only a few in many cases) are returned in a Bitmapset, as a supposedly unordered set of partitioned. Since NULL and DEFAULT partitions are partitions of this later category, they would be included the bitmapset if it turns out that the query will need to scan them after all. Thanks again. Regards, Amit  https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf8269  https://www.postgresql.org/message-id/CA%2BTgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw%40mail.gmail.com -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers