Hello,

I'd love to have some sort of dynamic query feedback, yet it's very complicated 
to do it right. I am not convinced that changing the plan during a single 
execution is the right way to do it, not only because it sounds intrusive to do 
crazy things in the executor, but also because don't understand why the new 
plan should be any better than the old one. Can you be more elaborate how you'd 
want to go about it?

In your example (which presumably just has a single relation), we have no 
notion of whether the scan returns no rows because we were unlucky, because 
just the first few pages were empty of matching rows (which in my experience 
happens more often), or because the cardinality estimation is wrong. Even if 
the cardinality estimation is wrong, we have no notion of which predicate or 
predicate combination actually caused the misestimation. If the first few pages 
where empty, the same might happen with every order (so also with every 
available indexscan). Imagine a very simple seqscan plan of 
select * from mytab where a = 3 and b = 40 limit 1
Even if we know the cardinality is overestimated, we have no idea whether the 
cardinality of a = 3 or b = 40 is wrong or they just correlate, so there is no 
notion of which is actually the cheapest plan. Usual workaround for most of 
these queries is to add an order by (which has the nice addition of having a 
deterministic result) with an appropriate complex index, usually resulting in 
indexscans.

While we actually know more after the first execution of a nodes like 
materialize, sort or hash nodes, I rarely encounter materialize nodes in the 
wild. Consequently that is the place where the work is usually already done, 
which is especially true with the hash node. Even though it still might be more 
optimal to switch from a mergejoin to a hashjoin in some cases, I doubt that's 
worth any work (and even less the maintenance).

Best regards
Arne Roland

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Oliver Mattos
Sent: Monday, November 13, 2017 5:45 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Query planner gaining the ability to replanning after start 
of query execution.

I am interested in giving the query planner the ability to replan (or re-rank 
plans) after query execution has begun, based on the progression of the query 
so far.

Example use case:

*  A LIMIT 1 query is planned using an expensive scan which the planner expects 
to return a large number of results, and to terminate
early.   The reality is the query actually produces no results, and
the scan must run to completion, potentially taking thousands of times longer 
than expected.

*  If this plans costs were adjusted mid-execution to reflect the fact that the 
scan is producing far fewer rows than expected, then another query plan might 
come out ahead, which would complete far faster.


Has this been done before?   Are there any pitfalls to beware of?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




-----Original Message-----
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Oliver Mattos
Sent: Monday, November 13, 2017 5:45 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Query planner gaining the ability to replanning after start 
of query execution.

I am interested in giving the query planner the ability to replan (or re-rank 
plans) after query execution has begun, based on the progression of the query 
so far.

Example use case:

*  A LIMIT 1 query is planned using an expensive scan which the planner expects 
to return a large number of results, and to terminate
early.   The reality is the query actually produces no results, and
the scan must run to completion, potentially taking thousands of times longer 
than expected.

*  If this plans costs were adjusted mid-execution to reflect the fact that the 
scan is producing far fewer rows than expected, then another query plan might 
come out ahead, which would complete far faster.


Has this been done before?   Are there any pitfalls to beware of?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to