Re: [PATCH] Erase the distinctClause if the result is unique by definition

Andy Fan Tue, 10 Mar 2020 21:24:24 -0700

On Wed, Mar 11, 2020 at 6:49 AM David Rowley <dgrowle...@gmail.com> wrote:


> On Wed, 11 Mar 2020 at 02:50, Ashutosh Bapat
> <ashutosh.bapat....@gmail.com> wrote:
> >
> > On Tue, Mar 10, 2020 at 1:49 PM Andy Fan <zhihui.fan1...@gmail.com>
> wrote:
> > > In my current implementation, it calculates the uniqueness for each
> > > BaseRel only, but in your way,  looks we need to calculate the
> > > UniquePathKey for both BaseRel and JoinRel.   This makes more
> > > difference for multi table join.
> >
> > I didn't understand this concern. I think, it would be better to do it
> > for all kinds of relation types base, join etc. This way we are sure
> > that one method works across the planner to eliminate the need for
> > Distinct or grouping. If we just implement something for base
> > relations right now and don't do that for joins, there is a chance
> > that that method may not work for joins when we come to implement it.
>
> Yeah, it seems to me that we're seeing more and more features that
> require knowledge of uniqueness of a RelOptInfo. The skip scans patch
> needs to know if a join will cause row duplication so it knows if the
> skip scan path can be joined to without messing up the uniqueness of
> the skip scan.  Adding more and more places that loop over the rel's
> indexlist just does not seem the right way to do it, especially so
> when you have to dissect the join rel down to its base rel components
> to check which indexes there are. Having the knowledge on-hand at the
> RelOptInfo level means we no longer have to look at indexes for unique
> proofs.
>
> > > Another concern is UniquePathKey
> > > is designed for a general purpose,  we need to maintain it no matter
> > > distinctClause/groupbyClause.
> >
> > This should be ok. The time spent in annotating a RelOptInfo about
> > uniqueness is not going to be a lot. But doing so would help generic
> > elimination of Distinct/Group/Unique operations. What is
> > UniquePathKey; I didn't find this in your patch or in the code.
>
> Possibly a misinterpretation. There is some overlap between this patch
> and the skip scan patch, both would like to skip doing explicit work
> to implement DISTINCT. Skip scans just go about it by adding new path
> types that scan the index and only gathers up unique values.  In that
> case, the RelOptInfo won't contain the unique keys, but the skip scan
> path will. How I imagine both of these patches working together in
> create_distinct_paths() is that we first check if the DISTINCT clause
> is a subset of the a set of the RelOptInfo's unique keys (this patch),
> else we check if there are any paths with uniquekeys that we can use
> to perform a no-op on the DISTINCT clause (the skip scan patch), if
> none of those apply, we create the required paths to uniquify the
> results.
>

Now I am convinced that we should maintain UniquePath on RelOptInfo,
I would see how to work with "Index Skip Scan" patch.

Re: [PATCH] Erase the distinctClause if the result is unique by definition

Reply via email to