Hi Ashutosh: Nice to see you again!
On Tue, May 17, 2022 at 8:50 PM Ashutosh Bapat <ashutosh.bapat....@gmail.com> wrote: > On Sun, May 15, 2022 at 8:41 AM Andy Fan <zhihui.fan1...@gmail.com> wrote: > > > > > The var in RelOptInfo->reltarget should have nullable = 0 but the var in > > RelOptInfo->baserestrictinfo should have nullable = 1; The beauty of > this > > are: a). It can distinguish the two situations perfectly b). Whenever we > want > > to know the nullable attribute of a Var for an expression, it is super > easy to > > know. In summary, we need to maintain the nullable attribute at 2 > different > > places. one is the before the filters are executed(baserestrictinfo, > joininfo, > > ec_list at least). one is after the filters are executed > (RelOptInfo.reltarget > > only?) > > Thanks for identifying this. What you have written makes sense and it > might open a few optimization opportunities. But let me put down some > other thoughts here. You might want to take those into consideration > when designing your solution. > Thanks. > > Do we want to just track nullable and non-nullable. May be we want > expand this class to nullable (var may be null), non-nullable (Var is > definitely non-NULL), null (Var will be always NULL). > > Currently it doesn't support "Var will be always NULL" . Do you have any use cases for this? and I can't think of too many cases where we can get such information except something like "SELECT a FROM t WHERE a IS NULL". But the other way to look at this is along the lines of equivalence > classes. Equivalence classes record the expressions which are equal in > the final result of the query. The equivalence class members are not > equal at all the stages of query execution. But because they are > equal in the final result, we can impose that restriction on the lower > levels as well. Can we think of nullable in that fashion? If a Var is > non-nullable in the final result, we can impose that restriction on > the intermediate stages since rows with NULL values for that Var will > be filtered out somewhere. Similarly we could argue for null Var. But > knowledge that a Var is nullable in the final result does not impose a > NULL, non-NULL restriction on the intermediate stages. If we follow > this thought process, we don't need to differentiate Var at different > stages in query. > I agree this is an option. If so we need to track it under the PlannerInfo struct but it would not be as fine-grained as my previous. Without intermediate information, We can't know if a UnqiueKey contains multiple NULLs, this would not be an issue for the "MARK Distinct as no-op" case, but I'm not sure it is OK for other UniqueKey user cases. So my current idea is I still prefer to maintain the intermediate information, unless we are sure it costs too much or it is too complex to implement which I don't think so for now at least. So if you have time to look at the attached patch, that would be super great as well. -- Best Regards Andy Fan