Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 or we could hack eqsel() to bound the no-stats estimate to a bit less
 than 1.

 This seems like a pretty sensible thing to do.  I can't immediately
 imagine a situation in which 1.0 is a sensible selectivity estimate in
 the no-stats case and 0.90 (say) is a major regression.

After sleeping on it, that seems like my least favorite option.  It's
basically a kluge, as is obvious because there's no principled way to
choose what the bound is (or the minimum result from
get_variable_numdistinct, if we were to hack it there).  I'm currently
leaning to the idea of tweaking the logic in indxpath.c; in particular,
why wouldn't it be a good idea to force consideration of the bitmap path
if the index type hasn't got amgettuple?  If we don't, then we've
completely wasted the effort spent up to that point inside
find_usable_indexes.

Or we could just ignore the issue; as Josh says, that's not an
unreasonable option.  The particular case I ran into is certainly not
too compelling.  I'm a bit worried though that there might be other
cases where the estimator comes up with 1.0 selectivity but it'd still
be worth considering a bitmap scan.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-10 Thread Josh Berkus
On 1/10/11 7:25 AM, Tom Lane wrote:
 I'm a bit worried though that there might be other
 cases where the estimator comes up with 1.0 selectivity but it'd still
 be worth considering a bitmap scan.

Well, I think the answer is to apply the other fixes, and test.  If
there are other cases of selectivity=1.0, they'll show up.  People are
pretty fast to complain if indexes aren't used, and we have a good
production test case available once you implement the other operators.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-10 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 1/10/11 7:25 AM, Tom Lane wrote:
 I'm a bit worried though that there might be other
 cases where the estimator comes up with 1.0 selectivity but it'd still
 be worth considering a bitmap scan.

 Well, I think the answer is to apply the other fixes, and test.  If
 there are other cases of selectivity=1.0, they'll show up.  People are
 pretty fast to complain if indexes aren't used, and we have a good
 production test case available once you implement the other operators.

Implement the other operators?  I don't think we're on the same page
here.  What I'm talking about is a one-line change in indxpath.c to not
short-circuit consideration of a bitmap indexscan.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-10 Thread Robert Haas
On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 or we could hack eqsel() to bound the no-stats estimate to a bit less
 than 1.

 This seems like a pretty sensible thing to do.  I can't immediately
 imagine a situation in which 1.0 is a sensible selectivity estimate in
 the no-stats case and 0.90 (say) is a major regression.

 After sleeping on it, that seems like my least favorite option.  It's
 basically a kluge, as is obvious because there's no principled way to
 choose what the bound is (or the minimum result from
 get_variable_numdistinct, if we were to hack it there).

Well, the general problem is that we have no reasonable way of
handling planning uncertainty.  We have no way of throwing our hands
up in the air and saying I really have no clue how many rows are
going to come out of that node; as far as the rest of the planning
process is concerned, a selectivity estimate of 0.005 based on
column = some MCV with a frequency of 0.005 is exactly identical
to one that results from a completely inscrutable equality condition.
So while I agree with you that there's no particular principled way to
choose the exact value, that doesn't strike me as a compelling
argument against fixing some value.  ISTM that selectivity estimates
of exactly 0 and exactly 1 ought to be viewed with a healthy dose of
suspicion.

 I'm currently
 leaning to the idea of tweaking the logic in indxpath.c; in particular,
 why wouldn't it be a good idea to force consideration of the bitmap path
 if the index type hasn't got amgettuple?  If we don't, then we've
 completely wasted the effort spent up to that point inside
 find_usable_indexes.

I guess the obvious question is: why wouldn't it be a good idea to
force consideration of the bitmap path even if the index type DOES
have amgettuple?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm currently
 leaning to the idea of tweaking the logic in indxpath.c; in particular,
 why wouldn't it be a good idea to force consideration of the bitmap path
 if the index type hasn't got amgettuple?  If we don't, then we've
 completely wasted the effort spent up to that point inside
 find_usable_indexes.

 I guess the obvious question is: why wouldn't it be a good idea to
 force consideration of the bitmap path even if the index type DOES
 have amgettuple?

Well, the motivation is what the code comment said: not to waste time
uselessly considering the bitmap form of an indexscan whose only reason
to live was to produce output sorted in a particular way.  That's
irrelevant for GIN of course, but it's entirely relevant for btree.

It might be just useless over-optimization, but I don't think so --
choose_bitmap_and is O(N^2) in the number of paths submitted to it,
so adding a lot of uninteresting paths doesn't seem smart.

A small variant of the approach would be to only reject paths that have
non-empty pathkeys.  That's not a *sufficient* condition, because a path
could have both pathkeys and good selectivity --- but it could be added
onto the selectivity test.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-09 Thread Josh Berkus
On 1/9/11 3:38 PM, Tom Lane wrote:
 1. Do nothing.  The issue seems quite unlikely to affect anyone in the
 field, since in fact use a seqscan is probably the right answer
 anytime reltuples = 1; and anyway using a GIN index for plain equality
 is a corner case to begin with.  However, it could confuse people who
 were doing testing (it confused me!).

+1.  It's an unexpected result, but not actually a bad one.  It just
doesn't seem worth messing with code which works in production just to
help testing.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-09 Thread Robert Haas
On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 or we could hack eqsel() to bound the no-stats estimate to a bit less
 than 1.

This seems like a pretty sensible thing to do.  I can't immediately
imagine a situation in which 1.0 is a sensible selectivity estimate in
the no-stats case and 0.90 (say) is a major regression.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers