Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 or we could hack eqsel() to bound the no-stats estimate to a bit less
 than 1.

 This seems like a pretty sensible thing to do.  I can't immediately
 imagine a situation in which 1.0 is a sensible selectivity estimate in
 the no-stats case and 0.90 (say) is a major regression.

After sleeping on it, that seems like my least favorite option.  It's
basically a kluge, as is obvious because there's no principled way to
choose what the bound is (or the minimum result from
get_variable_numdistinct, if we were to hack it there).  I'm currently
leaning to the idea of tweaking the logic in indxpath.c; in particular,
why wouldn't it be a good idea to force consideration of the bitmap path
if the index type hasn't got amgettuple?  If we don't, then we've
completely wasted the effort spent up to that point inside
find_usable_indexes.

Or we could just ignore the issue; as Josh says, that's not an
unreasonable option.  The particular case I ran into is certainly not
too compelling.  I'm a bit worried though that there might be other
cases where the estimator comes up with 1.0 selectivity but it'd still
be worth considering a bitmap scan.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-10 Thread Josh Berkus
On 1/10/11 7:25 AM, Tom Lane wrote:
 I'm a bit worried though that there might be other
 cases where the estimator comes up with 1.0 selectivity but it'd still
 be worth considering a bitmap scan.

Well, I think the answer is to apply the other fixes, and test.  If
there are other cases of selectivity=1.0, they'll show up.  People are
pretty fast to complain if indexes aren't used, and we have a good
production test case available once you implement the other operators.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-10 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 1/10/11 7:25 AM, Tom Lane wrote:
 I'm a bit worried though that there might be other
 cases where the estimator comes up with 1.0 selectivity but it'd still
 be worth considering a bitmap scan.

 Well, I think the answer is to apply the other fixes, and test.  If
 there are other cases of selectivity=1.0, they'll show up.  People are
 pretty fast to complain if indexes aren't used, and we have a good
 production test case available once you implement the other operators.

Implement the other operators?  I don't think we're on the same page
here.  What I'm talking about is a one-line change in indxpath.c to not
short-circuit consideration of a bitmap indexscan.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-10 Thread Robert Haas
On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 or we could hack eqsel() to bound the no-stats estimate to a bit less
 than 1.

 This seems like a pretty sensible thing to do.  I can't immediately
 imagine a situation in which 1.0 is a sensible selectivity estimate in
 the no-stats case and 0.90 (say) is a major regression.

 After sleeping on it, that seems like my least favorite option.  It's
 basically a kluge, as is obvious because there's no principled way to
 choose what the bound is (or the minimum result from
 get_variable_numdistinct, if we were to hack it there).

Well, the general problem is that we have no reasonable way of
handling planning uncertainty.  We have no way of throwing our hands
up in the air and saying I really have no clue how many rows are
going to come out of that node; as far as the rest of the planning
process is concerned, a selectivity estimate of 0.005 based on
column = some MCV with a frequency of 0.005 is exactly identical
to one that results from a completely inscrutable equality condition.
So while I agree with you that there's no particular principled way to
choose the exact value, that doesn't strike me as a compelling
argument against fixing some value.  ISTM that selectivity estimates
of exactly 0 and exactly 1 ought to be viewed with a healthy dose of
suspicion.

 I'm currently
 leaning to the idea of tweaking the logic in indxpath.c; in particular,
 why wouldn't it be a good idea to force consideration of the bitmap path
 if the index type hasn't got amgettuple?  If we don't, then we've
 completely wasted the effort spent up to that point inside
 find_usable_indexes.

I guess the obvious question is: why wouldn't it be a good idea to
force consideration of the bitmap path even if the index type DOES
have amgettuple?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm currently
 leaning to the idea of tweaking the logic in indxpath.c; in particular,
 why wouldn't it be a good idea to force consideration of the bitmap path
 if the index type hasn't got amgettuple?  If we don't, then we've
 completely wasted the effort spent up to that point inside
 find_usable_indexes.

 I guess the obvious question is: why wouldn't it be a good idea to
 force consideration of the bitmap path even if the index type DOES
 have amgettuple?

Well, the motivation is what the code comment said: not to waste time
uselessly considering the bitmap form of an indexscan whose only reason
to live was to produce output sorted in a particular way.  That's
irrelevant for GIN of course, but it's entirely relevant for btree.

It might be just useless over-optimization, but I don't think so --
choose_bitmap_and is O(N^2) in the number of paths submitted to it,
so adding a lot of uninteresting paths doesn't seem smart.

A small variant of the approach would be to only reject paths that have
non-empty pathkeys.  That's not a *sufficient* condition, because a path
could have both pathkeys and good selectivity --- but it could be added
onto the selectivity test.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-09 Thread Tom Lane
Whilst fooling around with GIN over the past few days, I noticed the
following rather surprising behavior:

regression=# create table t1 (f1 int[]);
CREATE TABLE
regression=# insert into t1 values (array[42]);
INSERT 0 1
regression=# create index ti1 on t1 using gin (f1);
CREATE INDEX
regression=# set enable_seqscan to 0;
SET
regression=# explain select * from t1 where f1 = array[42];
  QUERY PLAN   
---
 Seq Scan on t1  (cost=100.00..101.01 rows=1 width=32)
   Filter: (f1 = '{42}'::integer[])
(2 rows)

The system absolutely will not use the index in this state, even though
the GIN array opclass does support equality.  I dug around and
eventually figured out why:

1. We don't have any pg_stats statistics in this state; but we do know
reltuples = 1, since the CREATE INDEX helpfully updated the pg_class row
with that information.  Without stats, eqsel() falls back to a
selectivity estimate of 1/numdistinct; and without stats,
get_variable_numdistinct estimates the number of distinct values as
equal to reltuples, if that's a small number.  The upshot is that we get
a selectivity estimate of exactly 1.0 for the equality condition.

2. In indxpath.c, we have the following comment and code about selecting
which paths to generate bitmap scan paths for:

 * ... Also, pick out the ones that might be useful
 * as bitmap scans.  For that, we must discard indexes that don't support
 * bitmap scans, and we also are only interested in paths that have some
 * selectivity; we should discard anything that was generated solely for
 * ordering purposes.

if (ipath-indexinfo-amhasgetbitmap 
ipath-indexselectivity  1.0 
!ScanDirectionIsBackward(ipath-indexscandir))
bitindexpaths = lappend(bitindexpaths, ipath);

Since GIN doesn't support plain indexscans, only bitmap scans, this
means that the planner simply won't use a GIN index unless the estimated
selectivity is less than 1.0.

There are several things we might choose to do about this:

1. Do nothing.  The issue seems quite unlikely to affect anyone in the
field, since in fact use a seqscan is probably the right answer
anytime reltuples = 1; and anyway using a GIN index for plain equality
is a corner case to begin with.  However, it could confuse people who
were doing testing (it confused me!).

2. Hack the selectivity estimators so that we don't get exactly 1.0
for this sort of situation.  It might be plausible for
get_variable_numdistinct to set a floor of 2 on its result, for example;
or we could hack eqsel() to bound the no-stats estimate to a bit less
than 1.

3. Change the indxpath.c code to have some different way of determining
whether a path is sensible to use as a bitmap scan.  We could for
instance make the test be path has some indexquals and/or index is
partial.  Or maybe there should be an exception in the case where the
index type doesn't support plain indexscan.

I'm not really sold on any of these alternatives.  Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-09 Thread Josh Berkus
On 1/9/11 3:38 PM, Tom Lane wrote:
 1. Do nothing.  The issue seems quite unlikely to affect anyone in the
 field, since in fact use a seqscan is probably the right answer
 anytime reltuples = 1; and anyway using a GIN index for plain equality
 is a corner case to begin with.  However, it could confuse people who
 were doing testing (it confused me!).

+1.  It's an unexpected result, but not actually a bad one.  It just
doesn't seem worth messing with code which works in production just to
help testing.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-09 Thread Robert Haas
On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 or we could hack eqsel() to bound the no-stats estimate to a bit less
 than 1.

This seems like a pretty sensible thing to do.  I can't immediately
imagine a situation in which 1.0 is a sensible selectivity estimate in
the no-stats case and 0.90 (say) is a major regression.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers