On 16/01/23 09:48, David Rowley wrote:
I don't think we should touch this.  It could significantly increase
the number of indexes that we consider when generating paths on base
relations and therefore *significantly* increase the number of paths
we consider during the join search.  What I had in mind was just
making better use of existing paths to see if we can find a cheaper
way to perform the DISTINCT.  That'll only possibly increase the
number of paths for the distinct upper relation which really only
increases the number of paths which are considered in
create_ordered_paths(). That's unlikely to cause much of a slowdown in
the planner.

Okay, I see the issue. Makes sense


I'm seeing these two things as separate patches. I don't think there's
any need to add further complexity to the patch that tries to reduce
the number of sorts for WindowAggs. I think you'd better start a new
thread for this.

Will be starting new thread for this and separate patch.


As far as I see it, you shouldn't be touching the distinct_pathkeys.
Those are set in such a way as to minimise the likelihood of an
additional sort for the ORDER BY.  If you've fiddled with that, then I
imagine this is why the plan below has an additional Incremental Sort
that didn't exist before.

I've not looked at your patch, but all I imagine you need to do for it
is to invent a function in pathkeys.c which is along the lines of what
pathkeys_count_contained_in() does, but returns a List of pathkeys
which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
that does not exist as a pathkey in keys1. In
create_final_distinct_paths(), you can then perform an incremental
sort on any input_path which has a non-empty return list and in
create_incremental_sort_path(), you'll pass presorted_keys as the
number of pathkeys in the path, and the required pathkeys the
input_path->pathkeys + the pathkeys returned from the new function.

Okay, this should be straightforward. Let me try this.

As an optimization, you might want to consider that the
distinct_pathkeys list might be long and that the new function, if you
code the lookup as a nested loop, might be slow. You might want to
consider hashing the distinct_pathkeys once in
create_final_distinct_paths(), then for each input_path, perform a
series of hash lookups to see which of the input_path->pathkeys are in
the hash table. That might require adding two functions to pathkeys.c,
one to build the hash table and then another to probe it and return
the remaining pathkeys list.  I'd go and make sure it all works as we
expect before going to the trouble of trying to optimize this. A
simple nested loop lookup will allow us to review that this works as
we expect.

Okay make sense, will start with nested loop while it is in review and then

optimal version once it is all good to go.

Thanks,

Ankit



Reply via email to