On 20/7/2023 18:46, Tomas Vondra wrote:
On 7/20/23 08:37, Andrey Lepikhov wrote:
On 3/10/2022 21:56, Tom Lane wrote:
Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.

It may be time to restart the project. As a first step, I have rebased the
patch on the current master. It wasn't trivial because of some recent
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, let's repeat the review and rework the current patch according to
the reasons stated in the revert commit.
1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.
Ok, some thoughts on this part of the task. As far as I can see, we don't have that many distinct comparison operators: 26 over fixed-width types and 16 over variable-width types:

SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
  JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen>0
ORDER BY o.oid;

SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
  JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen<0
ORDER BY o.oid;

A benchmarking procedure for fixed-length types could look something like this:

CREATE OR REPLACE FUNCTION pass_sort(typ regtype) RETURNS TABLE (
    nrows integer,
    exec_time float
) AS $$
DECLARE
  data json;
  step integer;
BEGIN
  -- Keep the whole dataset in memory so we measure an in-memory sort.
  SET work_mem = '1GB';

  FOR step IN 0..3 LOOP
    -- 1, 100, 10000, 1000000 rows
    SELECT pow(100, step)::integer INTO nrows;
    DROP TABLE IF EXISTS test CASCADE;
    EXECUTE format('CREATE TABLE test AS SELECT gs::%s AS x
                    FROM generate_series(1, %s) AS gs;', typ, nrows);

    -- TIMING OFF avoids per-node instrumentation overhead;
    -- the total Execution Time is still reported.
    EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, FORMAT JSON)
      SELECT * FROM test ORDER BY x DESC INTO data;
    SELECT (data->0->>'Execution Time')::float INTO exec_time;
    RETURN NEXT;
  END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;
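
To compare two types directly, the function can be called side by side; a purely illustrative example (integer vs. numeric, but any sortable types would work):

SELECT i.nrows,
       i.exec_time AS int_ms,
       n.exec_time AS num_ms,
       n.exec_time / nullif(i.exec_time, 0) AS ratio
FROM pass_sort('integer') i
JOIN pass_sort('numeric') n USING (nrows);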

Execution of SELECT * FROM pass_sort('integer'); shows quite linear growth of the execution time. So, using '2.0 * cpu_operator_cost' as the cost for the integer type (as a baseline), we can calculate the costs for the other operators. Variable-width types, I think, will require a more complex technique to account for the dependency on value length.
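
If the measured ratios turn out to be stable, the per-operator costs could then be installed through the existing ALTER FUNCTION ... COST machinery rather than new infrastructure. A sketch, where the numbers are hypothetical, not measured:

-- Suppose sorting numerics measured ~3x slower than integers.
-- With the integer comparison pinned at 2.0 * cpu_operator_cost,
-- the numeric comparison function would get cost 6.0:
ALTER FUNCTION numeric_lt(numeric, numeric) COST 6.0;

Of course, for an in-core patch the tuned values would live in pg_proc.dat rather than being applied by hand like this.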

Does this way look worthwhile?

--
regards,
Andrey Lepikhov
Postgres Professional
