On Tue, Feb 16, 2021 at 10:03 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
> > > On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowle...@gmail.com> > wrote: > >> On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1...@gmail.com> wrote: >> > >> > On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowle...@gmail.com> >> wrote: >> >> The reason I don't really like this is that it really depends where >> >> you want to use RelOptInfo.notnullattrs. If someone wants to use it >> >> to optimise something before the base quals are evaluated then they >> >> might be unhappy that they found some NULLs. >> >> >> > >> > Do you mean the notnullattrs is not set correctly before the base quals >> are >> > evaluated? I think we have lots of data structures which are set just >> after some >> > stage. but notnullattrs is special because it is set at more than 1 >> stage. However >> > I'm doubtful it is unacceptable, Some fields ever change their meaning >> at different >> > stages like Var->varno. If a user has a misunderstanding on it, it >> probably will find it >> > at the testing stage. >> >> You're maybe focusing too much on your use case for notnullattrs. It >> only cares about NULLs in the result for each query level. >> >> .... thinks of an example... >> >> OK, let's say I decided that COUNT(*) is faster than COUNT(id) so >> decided that I might like to write a patch which rewrite the query to >> use COUNT(*) when it was certain that "id" could not contain NULLs. >> >> The query is: >> >> SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER >> JOIN sales s ON p.partid = s.partid GROUP BY p.partid; >> >> sale.saleid is marked as NOT NULL in pg_attribute. As the writer of >> the patch, I checked the comment for notnullattrs and it says "Not >> null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I >> should be ok to assume since sales.saleid is marked in notnullattrs >> that I can rewrite the query?! >> >> The documentation about the RelOptInfo.notnullattrs needs to be clear >> what exactly it means. I'm not saying your representation of how to >> record NOT NULL in incorrect. I'm saying that you need to be clear >> what exactly is being recorded in that field. >> >> If you want it to mean "attribute marked here cannot output NULL >> values at this query level", then you should say something along those >> lines. >> > > I think I get what you mean. You are saying notnullattrs is only correct > at the *current* stage, namely set_rel_size. It would not be true after > that, but the data is still there. That would cause some confusion. I > admit > that is something I didn't realize before. I checked other fields of > RelOptInfo, > looks no one filed works like this, so I am not really happy with this > design > now. I'm OK with saying more things along these lines. That can be done > we all understand each other well. Any better design is welcome as well. > I think the "Var represents null stuff" is good, until I see your > comments below. > > > >> However, having said that, because this is a Bitmapset of >> pg_attribute.attnums, it's only possible to record Vars from base >> relations. It does not seem like you have any means to record >> attributes that are normally NULLable, but cannot produce NULL values >> due to a strict join qual. >> >> e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something; >> >> I'd expect the RelOptInfo for t not to contain a bit for the >> "nullable" column, but there's no way to record the fact that the join >> RelOptInfo for {t,j} cannot produce a NULL for that column. It might >> be quite useful to know that for the UniqueKeys patch. >> >> > The current patch can detect t.nullable is not null correctly. That > is done by find_nonnullable_vars(qual) and deconstruct_recure stage. > > >> I know there's another discussion here between Ashutosh and Tom about >> PathTarget's and Vars. I had the Var idea too once myself [1] but it >> was quickly shot down. Tom's reasoning there in [1] seems legit. I >> guess we'd need some sort of planner version of Var and never confuse >> it with the Parse version of Var. That sounds like quite a big >> project which would have quite a lot of code churn. I'm not sure how >> acceptable it would be to have Var represent both these things. It >> gets complex when you do equal(var1, var2) and expect that to return >> true when everything matches apart from the notnull field. We >> currently have this issue with the "location" field and we even have a >> special macro which just ignores those in equalfuncs.c. I imagine not >> many people would like to expand that to other fields. >> >> > Thanks for sharing this. > > >> It would be good to agree on the correct representation for Vars that >> cannot produce NULLs so that we don't shut the door on classes of >> optimisation that require something other than what you need for your >> case. >> >> > Agreed. The simplest way is just adding some comments. If go a > step further, how about just reset the notnullattrs when it is nullable > later like outer join? I have added this logic in the attached patch. > (comment for the notnullattrs is still not touched). I think we only > need to handle this in build_join_rel stage. > .. > With the v2 commit 2, > notnullattrs might be unset too early, but if the value is there, then > it is correct. > > This looks bad as well. How about adding an extra field in RelOptInfo for the outer join case. For example: @@ -710,8 +710,14 @@ typedef struct RelOptInfo PlannerInfo *subroot; /* if subquery */ List *subplan_params; /* if subquery */ int rel_parallel_workers; /* wanted number of parallel workers */ - /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */ + /* + * Not null attrs, start from -FirstLowInvalidHeapAttributeNumber. The nullness + * might be changed after outer join, So we need to consult with leftouter_relids + * before using it. + */ Bitmapset *notnullattrs; + /* A list of Relids which will be a outer rel when join with this relation. */ + List *leftouter_relids; /* Information about foreign tables and foreign joins */ Oid serverid; /* identifies server for the table or join */ leftout_relids should be able to be filled with root->join_info_list. If we go with this direction, not sure leftouter_relids should be a List or not since I even can't think out a query which can have more than one relids for a relation. -- Best Regards Andy Fan (https://www.aliyun.com/)