On Mon, Feb 19, 2018 at 9:56 AM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote: > On Mon, Feb 19, 2018 at 9:35 AM, Ashutosh Bapat > <ashutosh.ba...@enterprisedb.com> wrote: >> On Sat, Feb 17, 2018 at 8:17 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >>> On Fri, Feb 16, 2018 at 9:29 AM, Ashutosh Bapat >>> <ashutosh.ba...@enterprisedb.com> wrote: >>>> On Thu, Feb 15, 2018 at 7:47 PM, Amit Kapila <amit.kapil...@gmail.com> >>>> wrote: >>>>> On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat >>>>> <ashutosh.ba...@enterprisedb.com> wrote: >>>>>> I happened to look at the patch for something else. But here are some >>>>>> comments. If any of those have been already discussed, please feel >>>>>> free to ignore. I have gone through the thread cursorily, so I might >>>>>> have missed something. >>>>>> >>>>>> In grouping_planner() we call query_planner() first which builds the >>>>>> join tree and creates paths, then calculate the target for scan/join >>>>>> rel which is applied on the topmost scan join rel. I am wondering >>>>>> whether we can reverse this order to calculate the target list of >>>>>> scan/join relation and pass it to standard_join_search() (or the hook) >>>>>> through query_planner(). >>>>>> >>>>> >>>>> I think the reason for doing in that order is that we can't compute >>>>> target's width till after query_planner(). See the below note in >>>>> code: >>>>> >>>>> /* >>>>> * Convert the query's result tlist into PathTarget format. >>>>> * >>>>> * Note: it's desirable to not do this till after query_planner(), >>>>> * because the target width estimates can use per-Var width numbers >>>>> * that were obtained within query_planner(). >>>>> */ >>>>> final_target = create_pathtarget(root, tlist); >>>>> >>>>> Now, I think we can try to juggle the code in a way that the width can >>>>> be computed later, but the rest can be done earlier. >>>> >>>> /* Convenience macro to get a PathTarget with valid cost/width fields */ >>>> #define create_pathtarget(root, tlist) \ >>>> set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist)) >>>> create_pathtarget already works that way. We will need to split it. >>>> >>>> Create the Pathtarget without widths. Apply width estimates once we >>>> know the width of Vars somewhere here in query_planner() >>>> /* >>>> * We should now have size estimates for every actual table involved in >>>> * the query, and we also know which if any have been deleted from the >>>> * query by join removal; so we can compute total_table_pages. >>>> * >>>> * Note that appendrels are not double-counted here, even though we >>>> don't >>>> * bother to distinguish RelOptInfos for appendrel parents, because the >>>> * parents will still have size zero. >>>> * >>>> The next step is building the join tree. Set the pathtarget before that. >>>> >>>>> However, I think >>>>> that will be somewhat major change >>>> >>>> I agree. >>>> >>>>> still won't address all kind of >>>>> cases (like for ordered paths) unless we can try to get all kind of >>>>> targets pushed down in the call stack. >>>> >>>> I didn't understand that. >>>> >>> >>> The places where we use a target different than the target which is >>> pushed via query planner can cause a similar problem (ex. see the >>> usage of adjust_paths_for_srfs) because the cost of that target >>> wouldn't be taken into consideration for Gather's costing. >>> >> >> Right. Right now apply_projection_to_path() or adjust_paths_for_srfs() >> do not take into consideration the type of path whose cost is being >> adjusted for the new targetlist. That works for most of the paths but >> not all the paths like custom, FDW or parallel paths. The idea I am >> proposing is to compute final targetlist before query planner so that >> it's available when we create paths for the topmost scan/join >> relation. That way, any path creation logic can then take advantage of >> this list to compute costs, not just parallel paths. > > In fact, we should do this not just for scan/join relations, but all > the upper relations as well. Upper relations too create parallel > paths, whose costs need to be adjusted after their targetlists are > updated by adjust_paths_for_srfs(). Similar adjustments are needed for > any FDWs, custom paths which cost targetlists differently. >
I think any such change in planner can be quite tricky and can lead to a lot of work. I am not denying that it is not possible to think along the lines you are suggesting, but OTOH, I don't see it as a realistic approach for this patch where we can deal with the majority of cases with the much smaller patch. In future, if you are someone can have a patch along those lines for some other purpose (considering it is feasible to do so which I am not completely sure), then we can adjust the things for parallel paths as well. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com