Re: [PATCH] Erase the distinctClause if the result is unique by definition

Andy Fan Tue, 17 Mar 2020 19:57:40 -0700

Hi David:

Thanks for your time.

On Wed, Mar 18, 2020 at 9:56 AM David Rowley <dgrowle...@gmail.com> wrote:

> On Mon, 16 Mar 2020 at 06:01, Andy Fan <zhihui.fan1...@gmail.com> wrote:
> >
> > Hi All:
> >
> > I have re-implemented the patch based on David's suggestion/code,  Looks
> it
> > works well.   The updated patch mainly includes:
> >
> > 1. Maintain the not_null_colno in RelOptInfo, which includes the not
> null from
> >     catalog and the not null from vars.
>
> What about non-nullability that we can derive from means other than
> NOT NULL constraints. Where will you track that now that you've
> removed the UniqueKeySet type?
>

I tracked it in 'deconstruct_recurse',  just before
the distribute_qual_to_rels call.

+                       ListCell        *lc;
+                       foreach(lc, find_nonnullable_vars(qual))
+                       {
+                               Var *var = lfirst_node(Var, lc);
+                               RelOptInfo *rel =
root->simple_rel_array[var->varno];
+                               if (var->varattno > InvalidAttrNumber)
+                                       rel->not_null_cols =
bms_add_member(rel->not_null_cols, var->varattno);
+                       }

> Traditionally we use attno or attnum rather than colno for variable
> names containing attribute numbers
>

Currently I use a list of Var for a UnqiueKey,   I guess it is ok?

>
> > 3. postpone the propagate_unique_keys_to_joinrel call to
> populate_joinrel_with_paths
> >   since we know jointype at that time. so we can handle the semi/anti
> join specially.
>
> ok, but the join type was known already where I was calling the
> function from. It just wasn't passed to the function.
>
> > 4. Add the rule I suggested above,  if both of the 2 relation yields the
> a unique result,
> >   the join result will be unique as well. the UK can be  ( (rel1_uk1,
> rel1_uk2).. )
>
> I see. So basically you're saying that the joinrel's uniquekeys should
> be the cartesian product of the unique rels from either side of the
> join.  I wonder if that's a special case we need to worry about too
> much. Surely it only applies for clauseless joins

Some other cases we may need this as well:).   like select m1.pk, m2.pk
from m1, m2
where m1.b = m2.b;

The cartesian product of the unique rels will make the unqiue keys too
long,  so I maintain
the UnqiueKeyContext to make it short.  The idea is if  (UK1) is unique
already,  no bother
to add another UK as (UK1, UK2) which is just a superset of it.

>
>
> 5. If the unique key is impossible to be referenced by others,  we can
> safely ignore
> >     it in order  to keep the (join)rel->unqiuekeys short.
>
> You could probably have an equivalent of has_useful_pathkeys() and
> pathkeys_useful_for_ordering()
>
>
Thanks for suggestion,  I will do so  in the v5-xx.patch.

> > 6. I only consider the not null check/opfamily check for the uniquekey
> which comes
> >    from UniqueIndex.  I think that should be correct.
> > 7. I defined each uniquekey as List of Expr,  so I didn't introduce new
> node type.
>
> Where will you store the collation Oid? I left comments to mention
> that needed to be checked but just didn't wire it up.
>

This is too embarrassed,  I am not sure if it is safe to ignore it.  I
removed it due to
the following reasons (sorry for that I didn't explain it carefully for the
last email).

1.  When we choose if an UK is usable,  we have chance to compare the
collation info
for  restrictinfo  (where uk = 1) or target list (select uk from t) with
the indexinfo's collation,
the targetlist one has more trouble since we need to figure out the default
collation  for it.
However relation_has_unique_index_for has the same situation as us,  it
ignores it as well.
See comment /* XXX at some point we may need to check collations here too.
*/. It think
if there are some reasons we can ignore that.

2. What we expect from UK is:
a).  Where m1.uniquekey = m2.b   m2.uk will not be duplicated by this
joinclause.   Here
if m1.uk has different collation, it will raise runtime error.
b).  Where m1.uniquekey collate 'xxxx' = m2.b.   We may can't depends on
the run-time error this time. But if we are sure that  *if uk is uk at
default collation is unique,
then (uk collcate 'other-colation') is unique as well**.  if so we may safe
ignore it as well.
c).  select uniquekey from t / select uniquekey collate 'xxxx'  from t.
This have the same
requirement as item b).

3).  Looks maintain the collation for our case totally is a big effort,
and user rarely use it,  If
my expectation for 2.b is not true,  I prefer to detect such case (user use
a different collation),
we can just ignore the UK for that.

But After all, I think this should be an open question for now.

---
At last,  I am so grateful for your suggestion/feedback,  that's really
heuristic and constructive.
And so thanks Tom's for the quick review and suggest to add a new fields
for RelOptInfo,
without it I don't think I can add a new field to a so important struct.
And also thanks Bapat who
explains the thing more detailed.   I'm now writing the code for partition
index stuff, which
is a bit of boring, since every partition may have different unique index.
I am expecting that
I can finish it in the following 2 days,  and hope you can have another
round of review again.

Thanks for your feedback!

Best Regards
Andy Fan

Re: [PATCH] Erase the distinctClause if the result is unique by definition

Reply via email to