Hi Teodor,

Thanks for the patch. This (missing) optimization popped up repeatedly recently, and I was planning to look into it for PG12. So now I don't have to, because you've done all the hard work ;-)

I've done only a very basic review so far, but ISTM the general approach of considering additional indexes in create_index_paths is correct. That being said, I do have a couple of comments:


1) add_path() ensures that we only keep the one cheapest sorted path for each pathkeys list. This patch somewhat defeats that because it considers additional pathkeys (essentially any permutation of group keys) as interesting. So we may end up with more paths in the list.

I wonder if we should make the add_path() logic smarter to recognize when two paths have different pathkeys but were generated to match the same grouping, to reduce the number of paths added by this optimization. Currently we do that for each pathkeys list independently, but we're considering many more pathkeys orderings ...

That being said, maybe this concern is moot - to generate large number of new paths, there would have to be a lot of indexes matching the group by clause (as a prefix), and the group by clause would need to be fairly large (to have many permutations). That seems like a fairly rare case. I do know a couple of people who do create large numbers of indexes on tables, but those are usually single-column indexes.
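To illustrate the kind of case I have in mind (the table and index names here are made up, just a sketch):

    CREATE TABLE events (a INT, b INT, c INT);
    CREATE INDEX events_a_b_idx ON events (a, b);

    -- Without the patch, only pathkeys matching the GROUP BY order exactly
    -- would be considered interesting. With it, the (a, b) index ordering
    -- can satisfy the grouping too, avoiding an explicit sort:
    EXPLAIN SELECT b, a, count(*) FROM events GROUP BY b, a;
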


2) sort reordering based on ndistinct estimates

This part of the patch (reordering the columns in an explicit sort node) seems to be largely independent of the "consider other indexes" part. So I do suggest separating those two, and discussing them independently.

But thinking about this optimization, I'm worried it relies on a couple of important assumptions. For now those decisions could have been made by the person writing the SQL query, but this optimization makes that impossible. So we really need to get this right.

For example, it seems to disregard that different data types have different comparison costs. Comparing bytea values will be far more expensive than comparing int4 values, so it may be much more efficient to compare int4 columns first, even if there are far fewer distinct values in them.

This costing is probably related to abbreviated keys too, which is a huge improvement for text and other varlena-based types.
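A quick (and admittedly unscientific) way to see the difference - the table here is made up, and the actual timings obviously depend on the machine:

    CREATE TABLE cmp_cost AS
      SELECT i AS v_int,
             decode(md5(i::text), 'hex') AS v_bytea   -- 16-byte bytea values
        FROM generate_series(1, 1000000) s(i);

    \timing on
    SELECT count(*) FROM (SELECT v_int   FROM cmp_cost ORDER BY v_int)   foo;
    SELECT count(*) FROM (SELECT v_bytea FROM cmp_cost ORDER BY v_bytea) foo;

The int4 sort works with pass-by-value keys, while the bytea sort has to deal with varlena headers and byte-wise comparisons, so the difference should be clearly visible.
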

Also, simply sorting the columns by their ndistinct estimate is somewhat naive, because it assumes the columns are independent. Imagine for example a table with three columns:

   CREATE TABLE t (a INT, b INT, c INT);

   INSERT INTO t SELECT mod(i,100), mod(i,100)/2, 49*random()
     FROM generate_series(1,1000000) s(i);

Clearly, "a" has the most distinct values (100), while both "b" and "c" have 50 distinct values. But "b" does not actually split any of the "a" groups, while "c" does:

    select count(*) from (select 1 from t group by a,b) foo;
     count
    -------
       100
    (1 row)

    select count(*) from (select 1 from t group by a,c) foo;
     count
    -------
      5000
    (1 row)

So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to use "(a,c,b)" than "(a,b,c)" but there is no way to decide this merely using per-column ndistinct values. Luckily, we now have ndistinct multi-column coefficients which could be used to decide this I believe (but I haven't tried).
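As a sketch of what I mean (the statistics object name is arbitrary), the extended statistics added in PG10 could capture exactly this:

    -- build multi-column ndistinct estimates for the table above
    CREATE STATISTICS t_nd (ndistinct) ON a, b, c FROM t;
    ANALYZE t;

    -- the planner now has ndistinct coefficients for the column
    -- combinations, which should show that (a,b) has ~100 groups
    -- while (a,c) has ~5000
    SELECT stxname, stxndistinct
      FROM pg_statistic_ext WHERE stxname = 't_nd';
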

The real issue however is that this decision can't be made entirely locally. Consider for example this:

    explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Sort  (cost=156010.16..156260.16 rows=100000 width=20)
       Sort Key: c, b, a
       ->  GroupAggregate  (cost=132154.34..145654.34 rows=100000 width=20)
             Group Key: a, c, b
             ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
                   Sort Key: a, c, b
                   ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
    (7 rows)

while without the patch, this would happen:

    explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
                                   QUERY PLAN
    ------------------------------------------------------------------------
     GroupAggregate  (cost=132154.34..145654.34 rows=100000 width=20)
       Group Key: c, b, a
       ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
             Sort Key: c, b, a
             ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
    (5 rows)

Which is clearly cheaper (at least according to the optimizer) than doing two separate sorts. So the optimization actually does the wrong thing here, and it needs to somehow consider the other ordering requirements (in this case ORDER BY) - either by generating multiple paths with different orderings or some heuristics.

I'm also wondering how freely we can change the group by result ordering. Assume you disable hash aggregate and parallel query - currently we are guaranteed to use group aggregate that produces exactly the ordering as specified in GROUP BY, but this patch removes that "guarantee" and we can produce arbitrary permutation of the ordering. But as I argued in other threads, such implicit guarantees are really fragile, can silently break for arbitrary reasons (say, parallel query will do just that) and can be easily fixed by adding a proper ORDER BY. So I don't think this is an issue.

The other random thought is how will/could this interact with the incremental sort patch. I know Alexander wanted to significantly limit where we actually consider the incremental sort, but I'm not sure if this was one of those places or not (it sure looks like a place where we would greatly benefit from it).

So to summarize this initial review - I do suggest splitting the patch into two parts. One that does the index magic, and one for this reordering optimization. The first part (additional indexes) seems fairly safe, and likely to get committable soon. The other part (ndistinct reordering) IMHO requires more thought regarding costing and interaction with other query parts.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services