On 05/30/15 03:52, Tomas Vondra wrote:


On 05/30/15 01:20, Tomas Vondra wrote:

Notice the cost - it's way lover than the previous plan (9.2 vs
~111k), yet this plan was not chosen. So either the change broke
something (e.g. by violating some optimizer assumption), or maybe
there's a bug somewhere else ...

After a bit more investigation, what I think is happening here is
add_path() does not realize this is a SEMI join and a single tuple is
enough, and discards the simple indexscan path in favor of the bitmap
index scan as that seems cheaper when scanning everything.

So not really a bug, but maybe 'consider_startup' would help?

A bit more info - the reason why the IndexPath gets discarded (in favor of BitmapHeapScan) seems to be twofold:

(a) Semijoin does no imply 'consider_startup' for the inner path, so it
    gets discarded by the BitmapHeapScan, as it has lower total cost.

    I'm not sure where exactly the consider_startup should be set, so
    I've forced that at the beginning on add_path() to see what happens.
    Sadly, this alone is not sufficient, because of the next point.

(b) To actually use startup costs in compare_path_costs_fuzzily, there
    is an additional condition about handling parameterized paths, as
    explained in this comment:

    * This function also includes special hacks to support a policy
    * enforced by its sole caller, add_path(): paths that have any
    * parameterization cannot win comparisons on the grounds of having
    * cheaper startup cost, since we deem only total cost to be of
    * interest for a parameterized path. (Unparameterized paths are
    * more common, so we check for this case last.)

    so the startup_cost comparisons look like this:

    if (consider_startup &&
        path2->startup_cost > path1->startup_cost * fuzz_factor &&
        path1->param_info == NULL)

    Of course, path1 (IndexPath) actually has parameters, which makes
    the whole condition false, and the IndexPath gets discarded :-(

    ISTM the reported test case seems like a counter example to the
    policy, so maybe this should really be relaxed somehow.

To see what happens, I tweaked both (a) and (b) by forcing

    consider_startup = true

at the beginning of add_path(), and removing the (param_info==NULL) conditions (patch attached). It also includes the run_cost tweak mentioned in the first message.

I'm not claiming this is the correct fix, the only goal was to see what happens. And it really does produce the 'right' plan:

                             QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.99..9.20 rows=1 width=74)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         Index Cond: (((facttablename)::text =
              'facttable_stat_fta4'::text) AND ((columnname)::text =
              'berechnungsart'::text))
   ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
       facttable_stat_fta4 f (cost=0.43..310525.02 rows=4870496 width=2)
         Index Cond: (berechnungsart = (t.term)::text)
(5 rows)


If I get the comments in pathnode.c about the startup_cost policy, it's "only" an optimization to reduce the number of parameterized paths, so I believe this does not break anything. Also, it's no fun if the planning is a tad faster, but the query runtime is 1000x higher.

Ideas how to relax the policy without significant impact on optimizer?

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ac865be..368bed4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1836,11 +1836,11 @@ initial_cost_nestloop(PlannerInfo *root, JoinCostWorkspace *workspace,
 		 * estimate whether that will happen, so be conservative and always
 		 * charge the whole first-scan cost once.
 		 */
-		run_cost += inner_run_cost;
-
 		outer_matched_rows = rint(outer_path_rows * semifactors->outer_match_frac);
 		inner_scan_frac = 2.0 / (semifactors->match_count + 1.0);
 
+		run_cost += inner_run_cost * inner_scan_frac;
+
 		/* Add inner run cost for additional outer tuples having matches */
 		if (outer_matched_rows > 1)
 			run_cost += (outer_matched_rows - 1) * inner_rescan_run_cost * inner_scan_frac;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 7f7aa24..aa97d4f 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -159,8 +159,7 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor,
 	{
 		/* path1 fuzzily worse on total cost */
 		if (consider_startup &&
-			path2->startup_cost > path1->startup_cost * fuzz_factor &&
-			path1->param_info == NULL)
+			path2->startup_cost > path1->startup_cost * fuzz_factor)
 		{
 			/* ... but path2 fuzzily worse on startup, so DIFFERENT */
 			return COSTS_DIFFERENT;
@@ -172,8 +171,7 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor,
 	{
 		/* path2 fuzzily worse on total cost */
 		if (consider_startup &&
-			path1->startup_cost > path2->startup_cost * fuzz_factor &&
-			path2->param_info == NULL)
+			path1->startup_cost > path2->startup_cost * fuzz_factor)
 		{
 			/* ... but path1 fuzzily worse on startup, so DIFFERENT */
 			return COSTS_DIFFERENT;
@@ -183,14 +181,12 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor,
 	}
 	/* fuzzily the same on total cost */
 	/* (so we may as well compare startup cost, even if !consider_startup) */
-	if (path1->startup_cost > path2->startup_cost * fuzz_factor &&
-		path2->param_info == NULL)
+	if (path1->startup_cost > path2->startup_cost * fuzz_factor)
 	{
 		/* ... but path1 fuzzily worse on startup, so path2 wins */
 		return COSTS_BETTER2;
 	}
-	if (path2->startup_cost > path1->startup_cost * fuzz_factor &&
-		path1->param_info == NULL)
+	if (path2->startup_cost > path1->startup_cost * fuzz_factor)
 	{
 		/* ... but path2 fuzzily worse on startup, so path1 wins */
 		return COSTS_BETTER1;
@@ -401,6 +397,9 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 	ListCell   *p1_prev;
 	ListCell   *p1_next;
 
+	/* XXX force startup cost comparison */
+	parent_rel->consider_startup = true;
+
 	/*
 	 * This is a convenient place to check for query cancel --- no part of the
 	 * planner goes very long without calling add_path().
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to