Hi David: Thanks for your time.
On Wed, Mar 18, 2020 at 9:56 AM David Rowley <dgrowle...@gmail.com> wrote: > On Mon, 16 Mar 2020 at 06:01, Andy Fan <zhihui.fan1...@gmail.com> wrote: > > > > Hi All: > > > > I have re-implemented the patch based on David's suggestion/code, Looks > it > > works well. The updated patch mainly includes: > > > > 1. Maintain the not_null_colno in RelOptInfo, which includes the not > null from > > catalog and the not null from vars. > > What about non-nullability that we can derive from means other than > NOT NULL constraints. Where will you track that now that you've > removed the UniqueKeySet type? > I tracked it in 'deconstruct_recurse', just before the distribute_qual_to_rels call. + ListCell *lc; + foreach(lc, find_nonnullable_vars(qual)) + { + Var *var = lfirst_node(Var, lc); + RelOptInfo *rel = root->simple_rel_array[var->varno]; + if (var->varattno > InvalidAttrNumber) + rel->not_null_cols = bms_add_member(rel->not_null_cols, var->varattno); + } > Traditionally we use attno or attnum rather than colno for variable > names containing attribute numbers > Currently I use a list of Var for a UnqiueKey, I guess it is ok? > > > 3. postpone the propagate_unique_keys_to_joinrel call to > populate_joinrel_with_paths > > since we know jointype at that time. so we can handle the semi/anti > join specially. > > ok, but the join type was known already where I was calling the > function from. It just wasn't passed to the function. > > > 4. Add the rule I suggested above, if both of the 2 relation yields the > a unique result, > > the join result will be unique as well. the UK can be ( (rel1_uk1, > rel1_uk2).. ) > > I see. So basically you're saying that the joinrel's uniquekeys should > be the cartesian product of the unique rels from either side of the > join. I wonder if that's a special case we need to worry about too > much. Surely it only applies for clauseless joins Some other cases we may need this as well:). like select m1.pk, m2.pk from m1, m2 where m1.b = m2.b; The cartesian product of the unique rels will make the unqiue keys too long, so I maintain the UnqiueKeyContext to make it short. The idea is if (UK1) is unique already, no bother to add another UK as (UK1, UK2) which is just a superset of it. > > > 5. If the unique key is impossible to be referenced by others, we can > safely ignore > > it in order to keep the (join)rel->unqiuekeys short. > > You could probably have an equivalent of has_useful_pathkeys() and > pathkeys_useful_for_ordering() > > Thanks for suggestion, I will do so in the v5-xx.patch. > > 6. I only consider the not null check/opfamily check for the uniquekey > which comes > > from UniqueIndex. I think that should be correct. > > 7. I defined each uniquekey as List of Expr, so I didn't introduce new > node type. > > Where will you store the collation Oid? I left comments to mention > that needed to be checked but just didn't wire it up. > This is too embarrassed, I am not sure if it is safe to ignore it. I removed it due to the following reasons (sorry for that I didn't explain it carefully for the last email). 1. When we choose if an UK is usable, we have chance to compare the collation info for restrictinfo (where uk = 1) or target list (select uk from t) with the indexinfo's collation, the targetlist one has more trouble since we need to figure out the default collation for it. However relation_has_unique_index_for has the same situation as us, it ignores it as well. See comment /* XXX at some point we may need to check collations here too. */. It think if there are some reasons we can ignore that. 2. What we expect from UK is: a). Where m1.uniquekey = m2.b m2.uk will not be duplicated by this joinclause. Here if m1.uk has different collation, it will raise runtime error. b). Where m1.uniquekey collate 'xxxx' = m2.b. We may can't depends on the run-time error this time. But if we are sure that *if uk is uk at default collation is unique, then (uk collcate 'other-colation') is unique as well**. if so we may safe ignore it as well. c). select uniquekey from t / select uniquekey collate 'xxxx' from t. This have the same requirement as item b). 3). Looks maintain the collation for our case totally is a big effort, and user rarely use it, If my expectation for 2.b is not true, I prefer to detect such case (user use a different collation), we can just ignore the UK for that. But After all, I think this should be an open question for now. --- At last, I am so grateful for your suggestion/feedback, that's really heuristic and constructive. And so thanks Tom's for the quick review and suggest to add a new fields for RelOptInfo, without it I don't think I can add a new field to a so important struct. And also thanks Bapat who explains the thing more detailed. I'm now writing the code for partition index stuff, which is a bit of boring, since every partition may have different unique index. I am expecting that I can finish it in the following 2 days, and hope you can have another round of review again. Thanks for your feedback! Best Regards Andy Fan