Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Fri, Jul 27, 2018 at 02:19:27PM +0530, Ashutosh Bapat wrote:
> [ removal of latest arguments ]
> +1, if we could do that.

The patch seems to have stalled, so I am marking it as returned with feedback because of no activity.
--
Michael
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Thu, Jul 26, 2018 at 10:30 PM, Dean Rasheed wrote:
> On 26 July 2018 at 07:12, Ashutosh Bapat wrote:
>> In the patch clauselist_selectivity() gets called repeatedly for every
>> new qual added to the clause list. Instead, if we try to combine the
>> cost/row estimation with order_qual_clauses() or
>> clauselist_selectivity(), we might be able to do what the current patch
>> does in O(n). clauselist_selectivity() calculates the combined selectivity
>> of all the given quals in O(n). We should do something similar to that
>> in this case.
>
> Duplicating the logic of clauselist_selectivity() seems like quite a
> lot of effort to go to just for improved filter cost estimates. Note
> also that clauselist_selectivity() might get a good deal more complex
> with multivariate stats.

I am not suggesting that we duplicate code from clauselist_selectivity(). Instead I am suggesting that we incorporate costing into that function. I don't know how feasible that is.

> Perhaps a reasonable simplification would be to just treat the clauses
> as independent when estimating the filter cost, and so use the
> following as a "good enough" estimate for the filter cost:
>
> cost(A) + sel(A) * cost(B) + sel(A) * sel(B) * cost(C) + ...
>
> That would probably be an improvement over what we currently have. It
> would be O(n) to compute, and it would probably use the cached
> selectivity estimates for the clauses.

That looks like a good trade-off. But looking at the following comment in clause_selectivity(), I doubt whether this will work in all cases:

/*
 * If possible, cache the result of the selectivity calculation for
 * the clause. We can cache if varRelid is zero or the clause
 * contains only vars of that relid --- otherwise varRelid will affect
 * the result, so mustn't cache. Outer join quals might be examined
 * with either their join's actual jointype or JOIN_INNER, so we need
 * two cache variables to remember both cases. Note: we assume the
 * result won't change if we are switching the input relations or
 * considering a unique-ified case, so we only need one cache variable
 * for all non-JOIN_INNER cases.
 */

> Note also that with this simplified expression for the filter cost, it
> would be possible to improve order_qual_clauses() to take into account
> the clause selectivities when sorting the clauses, and minimise the
> cost of the filter by evaluating more selective clauses sooner, if
> they're not too expensive. I believe that amounts to sorting by
> cost/(1-sel) rather than just cost, except for clauses with sel==1,
> which it makes sense to move to the end, and just sort by cost.

+1, if we could do that.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
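To make the arithmetic concrete, here is a minimal C sketch of the independent-clauses estimate quoted above, computable in a single O(n) pass. The QualInfo struct and its fields are illustrative stand-ins for a clause's cached per-tuple cost and selectivity; this is not actual PostgreSQL planner API.

/*
 * Sketch of the "good enough" filter cost estimate:
 *   cost(A) + sel(A)*cost(B) + sel(A)*sel(B)*cost(C) + ...
 * QualInfo is a made-up stand-in for the planner's cached per-clause
 * cost and selectivity, not real planner code.
 */
typedef struct QualInfo
{
    double cost;    /* per-tuple cost of evaluating this qual */
    double sel;     /* estimated selectivity of this qual */
} QualInfo;

static double
filter_cost_independent(const QualInfo *quals, int nquals)
{
    double total = 0.0;
    double pass = 1.0;  /* fraction of tuples reaching the current qual */
    int    i;

    for (i = 0; i < nquals; i++)
    {
        total += pass * quals[i].cost;
        pass *= quals[i].sel;   /* running product sel(A)*sel(B)*... */
    }
    return total;
}

The running product `pass` is exactly the combined selectivity of the clauses already applied, so each clause's cost is weighted by the fraction of tuples expected to reach it.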
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On 26 July 2018 at 07:12, Ashutosh Bapat wrote:
> In the patch clauselist_selectivity() gets called repeatedly for every
> new qual added to the clause list. Instead, if we try to combine the
> cost/row estimation with order_qual_clauses() or
> clauselist_selectivity(), we might be able to do what the current patch
> does in O(n). clauselist_selectivity() calculates the combined selectivity
> of all the given quals in O(n). We should do something similar to that
> in this case.

Duplicating the logic of clauselist_selectivity() seems like quite a lot of effort to go to just for improved filter cost estimates. Note also that clauselist_selectivity() might get a good deal more complex with multivariate stats.

Perhaps a reasonable simplification would be to just treat the clauses as independent when estimating the filter cost, and so use the following as a "good enough" estimate for the filter cost:

cost(A) + sel(A) * cost(B) + sel(A) * sel(B) * cost(C) + ...

That would probably be an improvement over what we currently have. It would be O(n) to compute, and it would probably use the cached selectivity estimates for the clauses.

Note also that with this simplified expression for the filter cost, it would be possible to improve order_qual_clauses() to take into account the clause selectivities when sorting the clauses, and minimise the cost of the filter by evaluating more selective clauses sooner, if they're not too expensive. I believe that amounts to sorting by cost/(1-sel) rather than just cost, except for clauses with sel==1, which it makes sense to move to the end, and just sort by cost.

Regards,
Dean
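As a rough illustration of that ordering rule, the comparator below (plain C, reusing the hypothetical QualInfo struct sketched earlier in this thread, not actual planner code) ranks quals by cost/(1 - sel) and moves sel == 1 clauses to the end, sorted by cost alone.

/*
 * Sketch of a comparator implementing the cost/(1 - sel) ranking rule.
 * QualInfo is the illustrative struct from the earlier sketch.
 */
static int
qual_rank_cmp(const void *a, const void *b)
{
    const QualInfo *qa = (const QualInfo *) a;
    const QualInfo *qb = (const QualInfo *) b;
    int     a_last = (qa->sel >= 1.0);  /* sel == 1 clauses go last */
    int     b_last = (qb->sel >= 1.0);
    double  lhs, rhs;

    if (a_last != b_last)
        return a_last - b_last;
    if (a_last)                         /* both sel == 1: plain cost order */
        return (qa->cost > qb->cost) - (qa->cost < qb->cost);

    /* cost_a/(1-sel_a) vs cost_b/(1-sel_b), cross-multiplied to avoid division */
    lhs = qa->cost * (1.0 - qb->sel);
    rhs = qb->cost * (1.0 - qa->sel);
    return (lhs > rhs) - (lhs < rhs);
}

/* usage: qsort(quals, nquals, sizeof(QualInfo), qual_rank_cmp); */

Cross-multiplying works because both (1 - sel) denominators are positive here, and it sidesteps any division-by-zero concern for the sel == 1 case, which is handled separately.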
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Mon, Jan 29, 2018 at 10:42 AM, Yuto Hayamizu wrote:
> On Fri, Jan 19, 2018 at 5:07 PM, Yuto Hayamizu wrote:
>> My idea of improving this patch is to give a threshold N_limit,
>> and for q_1 ... q_N_limit, do the same weighted cost estimation as in the
>> current version of this patch. For q_{N_limit+1} onwards, stop calling
>> clauselist_selectivity for calculating the weight and reuse the result
>> of clauselist_selectivity({q_1, q_2, ..., q_N_limit}).
>> For example, if N_limit=100, the additional overhead is only
>> sub-millisecond per range table entry, and cost estimation is surely
>> better than the current postgres implementation.
>
> The attached patch implements the improvement idea above.
> With this patch applied, performance degradation of the test query
> with many quals was <1%. An example test query is attached.

I looked at the patch and the idea sounds reasonable to me. But I still have doubts about the usefulness of the patch. What this patch does is produce more accurate costs for scans of a single relation. That's good, but it does so for all the paths created for that relation, so the more accurate estimate doesn't help us choose one method of scanning the relation over another. It also does not affect the costs of different join paths, sort paths, etc. IOW, I cannot think of a query that would be planned differently from upstream with the patch applied, with the new plan being more efficient than the previous one. I might be missing something, though. Can you please provide such a query?

In the patch, clauselist_selectivity() gets called repeatedly for every new qual added to the clause list. Instead, if we try to combine the cost/row estimation with order_qual_clauses() or clauselist_selectivity(), we might be able to do what the current patch does in O(n). clauselist_selectivity() calculates the combined selectivity of all the given quals in O(n); we should do something similar in this case.

As suggested upthread by Tom, it would be more useful to have this feature work not just for scans but for joins as well.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Fri, Jan 19, 2018 at 5:07 PM, Yuto Hayamizu wrote:
> My idea of improving this patch is to give a threshold N_limit,
> and for q_1 ... q_N_limit, do the same weighted cost estimation as in the
> current version of this patch. For q_{N_limit+1} onwards, stop calling
> clauselist_selectivity for calculating the weight and reuse the result
> of clauselist_selectivity({q_1, q_2, ..., q_N_limit}).
> For example, if N_limit=100, the additional overhead is only
> sub-millisecond per range table entry, and cost estimation is surely
> better than the current postgres implementation.

The attached patch implements the improvement idea above. With this patch applied, performance degradation of the test query with many quals was <1%. An example test query is attached.

regards,
Yuto Hayamizu

[Attachment: Mitigate-filter-cost-overestimation-v3.patch]
[Attachment: testquery.sql]
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Thu, Nov 9, 2017 at 12:33 PM, Ashutosh Bapat wrote:
> different set, within each set the order is same. FWIW, we can order
> all clauses in largest set once and use that order every time. Albeit
> we will have to remember the order somewhere OR make the separator
> routine retain the order in the larger set, which I guess is true
> about all separator functions.

For this patch, sorting of a qual list happens only once per range table entry, not once per path, so there is no need to cache sorted qual lists as far as I know.

regards,
Yuto Hayamizu
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Wed, Nov 8, 2017 at 6:48 AM, Tom Lane wrote:
> Because (1) up to now there's been no need to consider the qual ordering
> till later, and (2) re-doing that sort for each path seemed unduly
> expensive. If we're to try to estimate whether later quals will be
> reached, then sure the ordering becomes important. I'm still concerned
> about (2) though.

This patch changes 'set_baserel_size_estimates', which is called only once per range table entry, so the sort does not happen for each path and its negative impact on optimizer performance is negligible.

While sorting the quals does not cost much, I noticed another performance problem with this patch. The patched 'set_baserel_size_estimates' is O(N^2) (N is the number of quals) and can take a long time when a query has a large qual list. After sorting the quals, it loops over the quals q_1, q_2, ..., q_N to estimate the "weighted" cost of each qual q_k. In each iteration, in order to calculate the weight of q_k, it calls clauselist_selectivity({q_1, q_2, ..., q_k}), which itself loops k times.

I've checked the actual impact on optimization time with an artificial query:

SELECT count(*) FROM pgbench_accounts WHERE (randomly generated N quals);

N=50:    0.5966 msec (original),   0.938 msec (patched)
N=100:   1.1992 msec (original),  1.5396 msec (patched)
N=200:   1.9167 msec (original),  4.6676 msec (patched)
N=400:   2.7583 msec (original), 12.0786 msec (patched)
N=800:   5.2919 msec (original), 38.0228 msec (patched)
N=1600:  9.3428 msec (original), 167.737 msec (patched)

From these results, the patched version is clearly O(N^2). 100 quals in a query seems an unusually large number, but there may exist real-world queries with hundreds or thousands of quals, so I feel this patch needs improvement in handling large N.

My idea for improving this patch is to give a threshold N_limit, and for q_1 ... q_N_limit, do the same weighted cost estimation as in the current version of this patch. For q_{N_limit+1} onwards, stop calling clauselist_selectivity for calculating the weight and reuse the result of clauselist_selectivity({q_1, q_2, ..., q_N_limit}). For example, if N_limit=100, the additional overhead is only sub-millisecond per range table entry, and cost estimation is surely better than the current postgres implementation.

Any thoughts?
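To make the capping idea concrete, here is a hedged C sketch of the weighted estimation loop with the N_limit cutoff. QualInfo is the same illustrative struct as in the earlier sketches, and sel_of_prefix() stands in for clauselist_selectivity({q_1, ..., q_k}); none of these names are real planner API.

/*
 * Sketch of the N_limit idea: each qual's cost is weighted by the
 * selectivity of the quals preceding it, but the O(k) prefix-selectivity
 * call is made only for the first N_LIMIT quals, bounding total work at
 * O(N_LIMIT^2 + N) instead of O(N^2).
 */
#define N_LIMIT 100

static double
weighted_filter_cost(const QualInfo *quals, int nquals,
                     double (*sel_of_prefix)(const QualInfo *, int))
{
    double total = 0.0;
    double weight = 1.0;    /* selectivity of the quals preceding q_k */
    int    k;

    for (k = 0; k < nquals; k++)
    {
        total += weight * quals[k].cost;

        if (k < N_LIMIT)
            weight = sel_of_prefix(quals, k + 1);   /* O(k) per call */
        /* beyond N_LIMIT, freeze the weight at sel({q_1..q_N_LIMIT}) */
    }
    return total;
}

With N_LIMIT = 100 this matches the sub-millisecond per-range-table-entry overhead claimed above, while quals past the cutoff are still discounted by the selectivity of the first hundred.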
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Sun, Jan 7, 2018 at 8:25 AM, Stephen Frost wrote:
> Greetings,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
> > On Thu, Nov 9, 2017 at 12:33 PM, Ashutosh Bapat wrote:
> > > Looking at order_qual_clauses(), we can say that a set of quals q1
> > > ... qn are ordered the same irrespective of the set of clauses they
> > > are subset of. E.g. if {q1 .. qn} is subset of Q (ordered as Qo) and
> > > also Q' (ordered as Q'o), the order in which they appear in Qo and Q'o
> > > is the same. So, even if different paths segregate the clauses into
> > > different sets, within each set the order is the same. FWIW, we can
> > > order all clauses in the largest set once and use that order every
> > > time. Albeit we will have to remember the order somewhere OR make the
> > > separator routine retain the order in the larger set, which I guess
> > > is true about all separator functions.
> >
> > I am not sure what to think about this patch, so moved to next CF. The
> > patch still applies. Hayamizu-san, it would be nice as well if you
> > could review others' patches. One patch reviewed for one patch
> > submitted, with equal difficulty. You should also get a community
> > account so that it is possible to add your name as an author of this
> > patch.
>
> While this patch does still apply, it doesn't pass the 'make check'
> regression tests, and there are concerns raised up-thread about the
> performance impact.
>
> Yuto Hayamizu, it looks like some good questions were raised which need
> to be addressed, in addition to making sure that the patch at least
> passes 'make check', so I'm leaving this as waiting-for-author. If you
> believe the patch is no longer viable, we can change it to 'returned
> with feedback'; otherwise, an updated patch and some responses to the
> questions raised would be great, as we're already a week into this
> commitfest and this thread has been dormant since near the start of the
> last CF.
>
> Thanks!
>
> Stephen

Thank you for your kind guidance, and sorry for the late reply.

The attached patch fixes the regression tests and now passes 'make check'. Since this patch changes cost estimation in set_baserel_size_estimates, the chosen query plans changed for some tests, so I've updated their expected EXPLAIN results.

I'm going to answer the raised questions by replying to each message soon.

regards,
Yuto Hayamizu

[Attachment: Mitigate-filter-cost-overestimation-v2.patch]
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Greetings,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Thu, Nov 9, 2017 at 12:33 PM, Ashutosh Bapat wrote:
> > Looking at order_qual_clauses(), we can say that a set of quals q1
> > ... qn are ordered the same irrespective of the set of clauses they
> > are subset of. E.g. if {q1 .. qn} is subset of Q (ordered as Qo) and
> > also Q' (ordered as Q'o), the order in which they appear in Qo and Q'o
> > is the same. So, even if different paths segregate the clauses into
> > different sets, within each set the order is the same. FWIW, we can
> > order all clauses in the largest set once and use that order every
> > time. Albeit we will have to remember the order somewhere OR make the
> > separator routine retain the order in the larger set, which I guess
> > is true about all separator functions.
>
> I am not sure what to think about this patch, so moved to next CF. The
> patch still applies. Hayamizu-san, it would be nice as well if you
> could review others' patches. One patch reviewed for one patch
> submitted, with equal difficulty. You should also get a community
> account so that it is possible to add your name as an author of this
> patch.

While this patch does still apply, it doesn't pass the 'make check' regression tests, and there are concerns raised up-thread about the performance impact.

Yuto Hayamizu, it looks like some good questions were raised which need to be addressed, in addition to making sure that the patch at least passes 'make check', so I'm leaving this as waiting-for-author. If you believe the patch is no longer viable, we can change it to 'returned with feedback'; otherwise, an updated patch and some responses to the questions raised would be great, as we're already a week into this commitfest and this thread has been dormant since near the start of the last CF.

Thanks!

Stephen
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Thu, Nov 9, 2017 at 12:33 PM, Ashutosh Bapat wrote:
> Looking at order_qual_clauses(), we can say that a set of quals q1
> ... qn are ordered the same irrespective of the set of clauses they
> are subset of. E.g. if {q1 .. qn} is subset of Q (ordered as Qo) and
> also Q' (ordered as Q'o), the order in which they appear in Qo and Q'o
> is the same. So, even if different paths segregate the clauses into
> different sets, within each set the order is the same. FWIW, we can
> order all clauses in the largest set once and use that order every
> time. Albeit we will have to remember the order somewhere OR make the
> separator routine retain the order in the larger set, which I guess
> is true about all separator functions.

I am not sure what to think about this patch, so moved to next CF. The patch still applies. Hayamizu-san, it would be nice as well if you could review others' patches. One patch reviewed for one patch submitted, with equal difficulty. You should also get a community account so that it is possible to add your name as an author of this patch.
--
Michael