On 12/30/2015 10:30 PM, David Rowley wrote:
On 31 December 2015 at 01:24, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
Personally I'd like to see automatic plan caching occur in
PostgreSQL. There is a bit of a problem with it in regards to a query
such as: select * from t where mySkewedColumn = 1; where the value 1
appears 99% of the time. Initially we may plan to seqscan, whereas with
other values we'd likely prefer to index scan.

I imagine that my unique joins patch could be expanded to test
baserestrictinfos to see if they contain quals matching a unique index.
This knowledge could later permit plan caching on queries which are
safe from having any skew in the results. It might sound rather
restrictive, but this would likely cover 99% of UPDATE and DELETE
operations in an OLTP workload, and likely a good deal of the SELECTs
too.

The safety of caching could be analyzed during planning, and a flag
could be set somewhere, perhaps in PlannedStmt, to mark whether the
plan is safe to cache. The planner() function could initially hash the
query string and check if any cached plan exists with that hash; if
not, plan the statement, then check if the "safeToCache" flag is set,
and if so, stick that plan into a hash table. Plans with no
baserestrictinfos could be "safeToCache" too.
Yeah, that's what I meant by "non-trivial". I don't know a good approach
to this problem, or whether such a "good" approach even exists, but I'd
say being able to decide whether to rebuild a plan in such cases is a
"must-have" feature. Otherwise we could easily lose any gains from the
more thorough optimization because of poor plans.
In other words, we'd have to come up with a way to decide whether to use
the same plan as before, or try building another plan (for the same
query with different parameter values). I can think of two approaches:
(1) Try to measure how "different" the parameter values used in the new
query are, compared to the existing plan(s). This probably means a
difference in terms of probabilities / frequencies etc.
(2) Compute the cost of the existing plan for the new parameters. I.e.
don't perform the whole optimization, just the costing for the
single plan. If the costs are "close" then use the existing plan.
Of course, none of this is trivial and still may fail for some cases.
I wonder what the other databases do, or if there are papers about this
topic (I'd guess there are, but I haven't looked too much).
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services