On 04/06/2018 01:43 AM, Alexander Korotkov wrote:
> Hi!
> On Tue, Apr 3, 2018 at 2:10 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com <mailto:tomas.von...@2ndquadrant.com>> wrote:
>     I think solving this may be fairly straight-forward. Essentially, until
>     now we only had one way to do the sort, so it was OK to make the sort
>     implicit by checking if the path is sorted
>         if (input not sorted)
>         {
>             ... add a Sort node ...
>         }
>     But now we have multiple possible ways to do the sort, with different
>     startup/total costs. So the places that create the sorts need to
>     actually generate the Sort paths for each sort alternative, and store
>     the information in the Sort node (instead of relying on pathkeys).
>     Ultimately, this should simplify the createplan.c places making all the
>     make_sort calls unnecessary (i.e. the input should be already sorted
>     when needed). Otherwise it'd mean the decision needs to be done locally,
>     but I don't think that should be needed.
>     But it's surely a fairly invasive change to the patch ...
> Right, there are situation when incremental sort has lower startup cost,
> but higher total cost.  In order to find lower cost, we ideally should
> generate
> paths for both full sort and incremental sort.  However, that would increase
> number of total pathes, and could slowdown planning time.  Another issue
> that we don't always generate pathes for sort.  And yes, it would be rather
> invasive.  So, that doesn't look feasible to get into 11.

I agree that's probably not feasible for PG11, considering the freeze is
about 48h from now. Not necessarily because of amount of code needed to
do that (it might be fairly small, actually) but because of the risk of
regressions in other types of plans and lack of time for review/testing.

I do not think this would cause a significant increase in path number.
We already do have the (partially sorted) paths in pathlist, otherwise
v21 wouldn't be able to build the incremental sort path anyway. And the
plans that did the decision in createplan.c could fairly easily do the
decision when constructing the path, I believe.

Looking v21, this affects three different places:

1) create_merge_append_plan

For merge_append the issue is that generate_mergeappend_paths() calls
get_cheapest_path_for_pathkeys(), which however only looks at cheapest
startup/total path, and then simply falls-back to cheapest_total_path if
there are no suitably sorted paths. IMHO if you modify this to also
consider partially-sorted paths, it should work. You'll have to account
for the extra cost of incremental cost, and it needs to be fairly cheap
(perhaps even by first quickly computing some initial cost estimate -
see e.g. initial_cost_nestloop/final_cost_nestloop).

2) create_mergejoin_plan

For mergejoin, the issue is that sort_inner_and_outer() only looks at
cheapest_total_path for both sides, even before computing the merge
pathkeys. So some of the code would have to move into the foreach loop,
and the paths would be picked by get_cheapest_path_for_pathkeys(). This
time only using total cost, per the comment in sort_inner_and_outer().

3) create_gather_merge_plan

This seems fairly simple - the issue is that gather_grouping_paths()
only looks at cheapest_partial_path. Should be a matter of simply
calling the improved get_cheapest_path_for_pathkeys().

Of course, this is all fairly hand-wavy and it may turn out to be fairly
expensive in some cases. But we can use another trick - we don't need to
search through all partially sorted paths, because for each pathkey
prefix there can be just one "best" path for startup cost and one for
total cost. So we could maintain a much shorter list of partially sorted
paths, I think.

> Intead, I decided to cut usage of incremental sort.  Now, incremental sort
> is generated only in create_sort_path().  Cheaper path selected between
> incremental sort and full sort with taking limit_tuples into account.
> That limits usage of incremental sort, however risk of regression by this
> patch is also minimal.  In fact, incremental sort will be used only when
> sort is explicitly specified and simultaneously LIMIT is specified or
> dataset to be sorted is large and incremental sort saves disk IO.

I personally am OK with reducing the scope of the patch like this. It's
still beneficial for the common ORDER BY + LIMIT case, which is good. I
don't think it may negatively affect other cases (at least I can't think
of any).

It's pretty obvious it may be extremely useful for the other cases too
(particularly for mergejoin on large tables, where it can allow
in-memory sort with quick startup).

But even if you managed to make the necessary code changes, it's
unlikely any experienced committer will look into such significant
change this close to the cutoff. Either they are going to be away for a
weekend, or they are already looking at other patches :-(

> Attached patch also incorporates following commits made by Alexander
> Kuzmenkov:
> * Rename fields of IncrementalSortState to snake_case for the sake of
> consistency.
> * Rename group test function to isCurrentGroup.
> * Comments from Tomas Vondra about nodeIncrementalSort.c
> * Add a test for incremental sort.
> * Add a separate function to calculate costs of incremental sort.

Those changes seem fine, but are still a couple of issues remaining:

1) pathkeys_useful_for_ordering() still uses enable_incrementalsort,
which I think is a bad idea. I've complained about it in my review on
31/3, and I don't see any explanation why this is a good idea.

2) Likewise, I've suggested that the claim about abbreviated keys in
nodeIncrementalsort.c is dubious. No response, and the XXX comment was
instead merged into the patch:

 * XXX The claim about abbreviated keys seems rather dubious, IMHO.

3) There is a comment at cost_merge_append, despite there being no
relevant changes in that function. Misplaced comment?

4) It's not clear to me why INITIAL_MEMTUPSIZE is defined the way it is.
There needs to be a comment - the intent seems to be making it large
enough to exceed ALLOCSET_SEPARATE_THRESHOLD, but it's not quite clear
why that's a good idea.

5) I do get this warning when building the code:

costsize.c: In function ‘cost_incremental_sort’:
costsize.c:1812:2: warning: ISO C90 forbids mixed declarations and code
  List    *presortedExprs = NIL;

6) The comment at cost_incremental_sort talks about cost_sort_internal,
but it's cost_sort_tuplesort I guess.

7) The new code in create_sort_path is somewhat ugly, I guess. It's
correct, but it really needs to be made easier to comprehend. I might
have time to look into that tomorrow, but I can't promise that.

Attached is a diff highlighting some of those places, and couple of
minor code formatting fixes.


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit 7603a0d1bdcf1ebead6f4671c8e2db96436c86ef
Author: Tomas Vondra <to...@2ndquadrant.com>
Date:   Fri Apr 6 16:10:56 2018 +0200


diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8d39b5c..2668cf3 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1564,14 +1564,12 @@ create_gather_merge_plan(PlannerInfo *root, GatherMergePath *best_path)
 	/* Now, insert a Sort node if subplan isn't sufficiently ordered */
 	if (!pathkeys_contained_in(pathkeys, best_path->subpath->pathkeys))
-	{
 		subplan = (Plan *) make_sort(subplan, gm_plan->numCols,
-	}
 	/* Now insert the subplan under GatherMerge. */
 	gm_plan->plan.lefttree = subplan;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6a595c3..6ff98b0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4817,6 +4817,8 @@ create_distinct_paths(PlannerInfo *root,
  * The only new path we need consider is an explicit sort on the
  * cheapest-total existing path.
+ * XXX This comment needs updating, I guess.
+ *
  * input_rel: contains the source-data Paths
  * target: the output tlist the result Paths must emit
  * limit_tuples: estimated bound on the number of output tuples,
@@ -4856,12 +4858,24 @@ create_ordered_paths(PlannerInfo *root,
 		Path	   *path = (Path *) lfirst(lc);
 		int			n_useful_pathkeys;
+		bool		is_partially_sorted;
+		bool		is_sorted;
 		n_useful_pathkeys = pathkeys_useful_for_ordering(root->sort_pathkeys,
-		if (path == cheapest_input_path || n_useful_pathkeys > 0)
+		/*
+		 * The path is consireder partially sorted when it's sorted by
+		 * a prefix of the pathkeys (possibly all of them).
+		 */
+		is_partially_sorted = (n_useful_pathkeys > 0);
+		/* It's fully sorted when it's sorted by all requested keys. */
+		is_sorted = (n_useful_pathkeys == list_length(root->sort_pathkeys));
+		if (path == cheapest_input_path || is_partially_sorted)
-			if (n_useful_pathkeys < list_length(root->sort_pathkeys))
+			if (!is_sorted)
 				/* An explicit sort here can take advantage of LIMIT */
 				path = (Path *) create_sort_path(root,
@@ -6235,17 +6249,28 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 			Path	   *path = (Path *) lfirst(lc);
 			int			n_useful_pathkeys;
+			bool		is_partially_sorted;
+			bool		is_sorted;
 			n_useful_pathkeys = pathkeys_useful_for_ordering(
 									root->group_pathkeys, path->pathkeys);
-			if (path == cheapest_path || n_useful_pathkeys > 0)
+			/*
+			 * The path is consireder partially sorted when it's sorted by
+			 * a prefix of the pathkeys (possibly all of them).
+			 */
+			is_partially_sorted = (n_useful_pathkeys > 0);
+			/* It's fully sorted when it's sorted by all requested keys. */
+			is_sorted = (n_useful_pathkeys == list_length(root->sort_pathkeys));
+			if (path == cheapest_path || is_partially_sorted)
 				 * Sort the path if it isn't already sorted.  Sort might
 				 * be needed for cheapest-total or path sorted by prefix
 				 * of required pathkeys.
-				if (n_useful_pathkeys < list_length(root->group_pathkeys))
+				if (!is_sorted)
 					path = (Path *) create_sort_path(root,
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 83665e0..f8d105b 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -661,7 +661,6 @@ static void free_sort_tuple(Tuplesortstate *state, SortTuple *stup);
 static void tuplesort_free(Tuplesortstate *state);
 static void tuplesort_updatemax(Tuplesortstate *state);
  * Special versions of qsort just for SortTuple objects.  qsort_tuple() sorts
  * any variant of SortTuples, using the appropriate comparetup function.
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5c207e7..815c567 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1532,7 +1532,6 @@ typedef struct IncrementalSortPath
 	int			presortedCols;	/* number of presorted columns */
 } IncrementalSortPath;
  * GroupPath represents grouping (of presorted input)

Reply via email to