On Wed, Mar 11, 2020 at 6:49 AM David Rowley <dgrowle...@gmail.com> wrote:
> On Wed, 11 Mar 2020 at 02:50, Ashutosh Bapat > <ashutosh.bapat....@gmail.com> wrote: > > > > On Tue, Mar 10, 2020 at 1:49 PM Andy Fan <zhihui.fan1...@gmail.com> > wrote: > > > In my current implementation, it calculates the uniqueness for each > > > BaseRel only, but in your way, looks we need to calculate the > > > UniquePathKey for both BaseRel and JoinRel. This makes more > > > difference for multi table join. > > > > I didn't understand this concern. I think, it would be better to do it > > for all kinds of relation types base, join etc. This way we are sure > > that one method works across the planner to eliminate the need for > > Distinct or grouping. If we just implement something for base > > relations right now and don't do that for joins, there is a chance > > that that method may not work for joins when we come to implement it. > > Yeah, it seems to me that we're seeing more and more features that > require knowledge of uniqueness of a RelOptInfo. The skip scans patch > needs to know if a join will cause row duplication so it knows if the > skip scan path can be joined to without messing up the uniqueness of > the skip scan. Adding more and more places that loop over the rel's > indexlist just does not seem the right way to do it, especially so > when you have to dissect the join rel down to its base rel components > to check which indexes there are. Having the knowledge on-hand at the > RelOptInfo level means we no longer have to look at indexes for unique > proofs. > > > > Another concern is UniquePathKey > > > is designed for a general purpose, we need to maintain it no matter > > > distinctClause/groupbyClause. > > > > This should be ok. The time spent in annotating a RelOptInfo about > > uniqueness is not going to be a lot. But doing so would help generic > > elimination of Distinct/Group/Unique operations. What is > > UniquePathKey; I didn't find this in your patch or in the code. > > Possibly a misinterpretation. There is some overlap between this patch > and the skip scan patch, both would like to skip doing explicit work > to implement DISTINCT. Skip scans just go about it by adding new path > types that scan the index and only gathers up unique values. In that > case, the RelOptInfo won't contain the unique keys, but the skip scan > path will. How I imagine both of these patches working together in > create_distinct_paths() is that we first check if the DISTINCT clause > is a subset of the a set of the RelOptInfo's unique keys (this patch), > else we check if there are any paths with uniquekeys that we can use > to perform a no-op on the DISTINCT clause (the skip scan patch), if > none of those apply, we create the required paths to uniquify the > results. > Now I am convinced that we should maintain UniquePath on RelOptInfo, I would see how to work with "Index Skip Scan" patch.