Re: [HACKERS] GIN indexscans versus equality selectivity estimation
Robert Haas robertmh...@gmail.com writes: On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: or we could hack eqsel() to bound the no-stats estimate to a bit less than 1. This seems like a pretty sensible thing to do. I can't immediately imagine a situation in which 1.0 is a sensible selectivity estimate in the no-stats case and 0.90 (say) is a major regression. After sleeping on it, that seems like my least favorite option. It's basically a kluge, as is obvious because there's no principled way to choose what the bound is (or the minimum result from get_variable_numdistinct, if we were to hack it there). I'm currently leaning to the idea of tweaking the logic in indxpath.c; in particular, why wouldn't it be a good idea to force consideration of the bitmap path if the index type hasn't got amgettuple? If we don't, then we've completely wasted the effort spent up to that point inside find_usable_indexes. Or we could just ignore the issue; as Josh says, that's not an unreasonable option. The particular case I ran into is certainly not too compelling. I'm a bit worried though that there might be other cases where the estimator comes up with 1.0 selectivity but it'd still be worth considering a bitmap scan. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN indexscans versus equality selectivity estimation
On 1/10/11 7:25 AM, Tom Lane wrote: I'm a bit worried though that there might be other cases where the estimator comes up with 1.0 selectivity but it'd still be worth considering a bitmap scan. Well, I think the answer is to apply the other fixes, and test. If there are other cases of selectivity=1.0, they'll show up. People are pretty fast to complain if indexes aren't used, and we have a good production test case available once you implement the other operators. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN indexscans versus equality selectivity estimation
Josh Berkus j...@agliodbs.com writes: On 1/10/11 7:25 AM, Tom Lane wrote: I'm a bit worried though that there might be other cases where the estimator comes up with 1.0 selectivity but it'd still be worth considering a bitmap scan. Well, I think the answer is to apply the other fixes, and test. If there are other cases of selectivity=1.0, they'll show up. People are pretty fast to complain if indexes aren't used, and we have a good production test case available once you implement the other operators. Implement the other operators? I don't think we're on the same page here. What I'm talking about is a one-line change in indxpath.c to not short-circuit consideration of a bitmap indexscan. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN indexscans versus equality selectivity estimation
On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: or we could hack eqsel() to bound the no-stats estimate to a bit less than 1. This seems like a pretty sensible thing to do. I can't immediately imagine a situation in which 1.0 is a sensible selectivity estimate in the no-stats case and 0.90 (say) is a major regression. After sleeping on it, that seems like my least favorite option. It's basically a kluge, as is obvious because there's no principled way to choose what the bound is (or the minimum result from get_variable_numdistinct, if we were to hack it there). Well, the general problem is that we have no reasonable way of handling planning uncertainty. We have no way of throwing our hands up in the air and saying I really have no clue how many rows are going to come out of that node; as far as the rest of the planning process is concerned, a selectivity estimate of 0.005 based on column = some MCV with a frequency of 0.005 is exactly identical to one that results from a completely inscrutable equality condition. So while I agree with you that there's no particular principled way to choose the exact value, that doesn't strike me as a compelling argument against fixing some value. ISTM that selectivity estimates of exactly 0 and exactly 1 ought to be viewed with a healthy dose of suspicion. I'm currently leaning to the idea of tweaking the logic in indxpath.c; in particular, why wouldn't it be a good idea to force consideration of the bitmap path if the index type hasn't got amgettuple? If we don't, then we've completely wasted the effort spent up to that point inside find_usable_indexes. I guess the obvious question is: why wouldn't it be a good idea to force consideration of the bitmap path even if the index type DOES have amgettuple? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN indexscans versus equality selectivity estimation
Robert Haas robertmh...@gmail.com writes: On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'm currently leaning to the idea of tweaking the logic in indxpath.c; in particular, why wouldn't it be a good idea to force consideration of the bitmap path if the index type hasn't got amgettuple? If we don't, then we've completely wasted the effort spent up to that point inside find_usable_indexes. I guess the obvious question is: why wouldn't it be a good idea to force consideration of the bitmap path even if the index type DOES have amgettuple? Well, the motivation is what the code comment said: not to waste time uselessly considering the bitmap form of an indexscan whose only reason to live was to produce output sorted in a particular way. That's irrelevant for GIN of course, but it's entirely relevant for btree. It might be just useless over-optimization, but I don't think so -- choose_bitmap_and is O(N^2) in the number of paths submitted to it, so adding a lot of uninteresting paths doesn't seem smart. A small variant of the approach would be to only reject paths that have non-empty pathkeys. That's not a *sufficient* condition, because a path could have both pathkeys and good selectivity --- but it could be added onto the selectivity test. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN indexscans versus equality selectivity estimation
On 1/9/11 3:38 PM, Tom Lane wrote: 1. Do nothing. The issue seems quite unlikely to affect anyone in the field, since in fact use a seqscan is probably the right answer anytime reltuples = 1; and anyway using a GIN index for plain equality is a corner case to begin with. However, it could confuse people who were doing testing (it confused me!). +1. It's an unexpected result, but not actually a bad one. It just doesn't seem worth messing with code which works in production just to help testing. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN indexscans versus equality selectivity estimation
On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: or we could hack eqsel() to bound the no-stats estimate to a bit less than 1. This seems like a pretty sensible thing to do. I can't immediately imagine a situation in which 1.0 is a sensible selectivity estimate in the no-stats case and 0.90 (say) is a major regression. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers