On 09/13/2015 08:03 PM, Kevin Grittner wrote:

In my view, the most disappointing thing about the patch is that
when both indexes are present, it doesn't use the narrower one.  If
*only* the narrower index is present, it runs the index-only scan
using that index for count(b) and count(*), which is faster.  Can
we wrangle this patch into making a better choice among available
index-only scans?

That's indeed strange, but after poking into that for a while, it seems rather like a costing issue. Let me demonstrate:

create table t (a int, b int);
insert into t select i,i from generate_series(1,1000000) s(i);

create index idx1 on t (a)   where b between 300000 and 600000;
create index idx2 on t (a,b) where b between 300000 and 600000;

vacuum t;
analyze t;

explain select a from t where b between 300000 and 600000;

                            QUERY PLAN
 Index Only Scan using idx2 on t  (cost=0.42..9236.88 rows=297823 width=4)
   Index Cond: ((b >= 300000) AND (b <= 600000))
(2 rows)

drop index idx2;

                            QUERY PLAN
 Index Only Scan using idx1 on t  (cost=0.42..9236.88 rows=297823 width=4)
(1 row)

Now, both plans are index only scans, but the first one has Index Cond and the other one does not!

I've put a bunch of logging into cost_index(), and turns out that while the final cost is *exactly* the same, it's most likely by chance. After we call amcostestimate, we get these two results:

idx1: indexStartupCost=0.422500 indexTotalCost=4769.537500
      indexSelectivity=0.297823 indexCorrelation=1.000000

idx2: indexStartupCost=0.422500 indexTotalCost=6258.652500
      indexSelectivity=0.297823 indexCorrelation=0.750000

So amcostestimate does make a difference, and we get

idx1: run_cost = 4769.115000
idx2: run_cost = 6258.230000

and then for both indexes

    pages_fetched = 0.000000

but then we do

    run_cost += cpu_per_tuple * tuples_fetched;

and we end up with

    run_cost = 9236.460000

for both indexes. How's that possible? Number of tuples fetched is exactly the same for both indexes (297823), so clearly cpu_per_tuple must be different. That however seems a bit strange, because the only difference between the indexes is the additional column, and the condition should be covered by the index predicate ...

It seems that the problem is related to this:

        = extract_nonindex_conditions(baserel->baserestrictinfo,

while the "larger" index on (a,b) gets

    path->indexquals=(b BETWEEN 300000 AND 600000)

the "smaller" index on (a) gets

    qpquals=(b BETWEEN 300000 AND 600000)

And so the larger index gets qpqual_cost=0, the smaller one gets qpqual_cost=0.005, and so cpu_per_tuple is either 0.01 or 0.015.

Which is exactly the difference between costs from amcostestimate

idx1: 4769.115000 + 0.015 * 297823 = 9236.460000
idx2: 6258.230000 + 0.010 * 297823 = 9236.460000

Sppoky! Although it seems like a mere coincidence, thanks to the nice round numbers of tuples in the table, and lucky choice of two conditions.

For example after replacing the BETWEEN condition (which is actually two conditions) with a single one (b<300000) - both in the indexes and query, I get this:

                           QUERY PLAN
 Index Only Scan using idx2 on t  (cost=0.42..8541.25 rows=299507 width=4)
   Index Cond: (b < 300000)
(2 rows)

drop index idx2;

                           QUERY PLAN
 Index Only Scan using idx1 on t  (cost=0.42..8541.43 rows=299507 width=4)
(1 row)

The plans are not costed exactly the same anymore (I'm not saying the costs are correct - clearly still the 'larger' index was preferred).

It's not bound to index only scan either - after adding another column to the table, and requesting it from the query (so preventing IOS), I get exactly the same issues.

I really wonder why we get different path->indexquals for those indexes, because that's the root of the issue here. Any ideas?

It also seems disappointing that we don't recognize that
count(columnname) could be treated as a synonym for count(*) if
columnname is NOT NULL, but that doesn't seem like material for
this patch.

Yeah, that's really not what this patch deals with.

Benchmarking took so much time I did not get to a close review of
the code changes.  :-(  Based on just the test results, it appears
to me that the patch as it stands would be a net win for the vast
majority of workloads where it would make any noticeable difference,
and I didn't manage to create any big downside.  I would like to
see whether we can't improve the choice of partial index when there
are multiple possibilities -- it seems quite surprising to see
identical estimates for indexes of different column counts and key
widths, and to see the wider index chosen when the narrow one is
clearly (and predictably) faster.

I am changing this to Waiting on Author.

I will be on vacation without Internet access for the next 15 days,
so hopefully someone else can have a look when a new version is
posted.  If it's still open I'll have a look when I get back.

Thanks for the feedback!


Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to