On Wed, Apr 26, 2017 at 9:30 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Mon, Apr 24, 2017 at 7:06 AM, Ashutosh Bapat > <ashutosh.ba...@enterprisedb.com> wrote: >> This assumes that datums in partition bounds have one to one mapping >> with the partitions, which isn't true for list partitions. For list >> partitions we have multiple datums corresponding to the items listed >> associated with a given partition. So, simply looping over the >> partitions of outer relations doesn't work; in fact there are two >> outer relations for a full outer join, so we have to loop over both of >> them together in a merge join fashion. > > Maybe so, but my point is that it can be done with the original types, > without converting anything to a different type. >
Theoretically, I agree with this. But practically the implementation is lot more complex than what you have described in the earlier mails. I am afraid, that the patch with those changes will be a lot harder to review and commit. Later in this mail, I will try to explain some of the complexities. > > I'm going to say this one more time: I really, really, really think > you need to avoid trying to convert the partition bounds to a common > type. I said before that the infrastructure to do that is not present > in our type system, and I'm pretty sure that statement is 100% > correct. The fact that you can find other cases where we do something > sorta like that but in a different case with different requirements > doesn't make that false. Ok. Thanks for the explanation. The current design and implementation is for a restricted case where the partition bounds, partition key types and numbers match exactly. We want to commit an implementation which is reasonably extensible and doesn't require a lot of changes when we add more capabilities. Some of the extensions we discussed are as follows: 1. Partition-wise join when the existing partitions have matching bounds/lists but there can be extra partitions on either side of the join (between base relations or join relations) without a matching partition on the other side.\ 2. Partition-wise join when the partition bounds/lists do not match exactly but there is 1:1 or 1:0 or 0:1 mapping between the partitions which can contribute to the final result. E.g. A (0-100, 100 - 150, 200-300), B (0-50, 125-200, 300-400) 3. Partition-wise join when the partition key types do not match, but there's a single opfamily being used for partitioning. 4. Partition-wise join where 1:m or m:n mapping exists between partitions of the joining relations. First one is clearly something that we will need. We may add it in the first commit or next commit, but it will be needed pretty soon (v11?). To me 2nd is more important than the 3rd one. You may have a different view. We will expect 3rd optimization to work with all the prior optimizations. I am restricting myself from thinking about 4th one since that requires ganging together multiple RelOptInfos as a single RelOptInfo while joining, something we don't have infrastruture for. In case of first goal, supporting INNER joins and OUTER joins where the partitions are missing on the OUTER side but not inner side are easier. In those cases we just drop those partitions and corresponding bounds/lists from the join. For a FULL OUTER join, where both sides act as OUTER as well as INNER, we will need exact mapping between the partitions. For supporting OUTER joins where partitions on the INNER sides can be missing, we need to create some "dummy" relations representing the missing partitions so that we have OUTER rows with NULL inner side. This requires giving those dummy relations some relids and thus in case of base relations we may need to inject some dummy children. This may mean that we have to expand simple_rel_array as part of outer join, may or may not require adding new AppendRelInfos and so on. We are basically breaking an assumption that base relations can not be introduced while planning joins and that might require some rework in the existing infrastructure. There might be other ways to introduce dummy relations during join planning, but I haven't really thought through the problem. The third goal requires that the partition bounds be compared based on the partition keys present in the equi-join. While matching the partitions to be joined, the partition bounds corresponding the base relation whose partition keys appear in the equi-join are used for comparison using support function corresponding to the data types of partition keys. This requires us to associate the partitions of a join with the bounds of base relation. E.g. A(A1, A2) with bounds (X1, X3) (notice missing X2), B (B1, B2) bounds (X1, X2), C (C1, C2, C3) bounds (X1, X2, X3) and the join is A LJ B on A.a = B.b LJ C on B.b = C.c assuming strict operators this can be executed as (AB)C or A(BC). AB will have partitions A1B1, A2B3 since there is no matching bound of A for B2 and A is outer relation. A1B1 is associated with bound X1 of A and C both. A2B3 is associated with bound of X3, which happens to be 2nd bound of A but third of B. When we join (AB) with C, we should notice that C1 goes with A1B1, C2 doesn't have any matching partition in AB and C3 goes with A2B3. If we compare bounds of B with C without any transformation we will know C2 matches B2, but we need to look at the children of AB to realize that B2 isn't present in any of the children and thus C2 should not be joined with any partition of AB. That usually looks a quadratic order operation on the number of partitions. The complexity can be reduced by maintaining as many partition bounds as the number of base relations participating in the join (an idea, I have floated earlier [1]) I don't elaborate it here to avoid digression. There's also the complexity of an N-way join with multiple partition keys and joins on partition keys from different relations as discussed in [1]. There may be more involved cases, that I haven't thought about. In short, implementation for 1st and 3rd optimization together looks fairly complex. Add to this the 2nd optimization and it becomes still more complex. In order to keep the patches manageable to implement review and commit, I am proposing following approach. 1. Implement first optimization on top of the current patches, which enforces that the partition key datatypes of the joining relations match. I am right now working on that patch. Do this for INNER join and OUTER join where partitions are missing on the OUTER side and not INNER side. As a side note, the existing partition bound comparison functions are tied to PartitionKey structure and require complete set of bounds from partitioned relation. Both of those are not applicable anymore, PartitionKey structure is not available for join and we have to compare individual bounds in case of join as against one probe with a complete set. This refactoring did eat some time. 2. Implement support for OUTER join where partitions can be missing from either side. 3. Implement support for partition-wise join with different partition key types. All those implementation will be different patches on top of v18 patches. Given the complexities involved in 2 and 3, I am not sure which order I should attack them. I don't have any estimates as to how much time each of those are going to require. May be a couple of months, but I am not sure. Obviously we have to wait till the first commitfest to commit the first version of the patch. So, based on the status at time, we can decide what goes in the first commit of this feature and adjust the patch set accordingly. Thoughts/comments? [1] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg312916.html -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers