Hi, In a multi-level partitioned table, a parent whole-row reference gets translated into nested ConvertRowtypeExpr with child whole-row reference as the leaf. During the execution, the child whole-row reference gets translated into all all intermediate parents' whole-row references, ultimately represented as parent's whole-row reference. AFAIU, the intermediate translations are unnecessary. The leaf child whole-row can be directly translated into top parent's whole-row reference. Here's a WIP patch which does that by eliminating intermediate ConvertRowtypeExprs during ExecInitExprRec().
I tested the performance with two-level partitions, and 1M rows, on my laptop selecting just the whole-row expression. I saw ~20% improvement in the execution time. Please see the attached test and its output with and without patch. For two-level inheritance hierarchy, this patch doesn't show any performance difference since the plan time hierarchy is flattened into single level. Instead of the approach that the patch takes, we might modify adjust_appendrel_attrs() not to produce nested ConvertRowtypeExprs in the first place. With that we may get rid of code which handles nested ConvertRowtypeExprs like is_converted_whole_row_reference(). But I haven't tried that approach yet. Suggestions/comments welcome. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index db5fcaf..ec896a4 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -1364,6 +1364,37 @@ ExecInitExprRec(Expr *node, ExprState *state, case T_ConvertRowtypeExpr: { ConvertRowtypeExpr *convert = (ConvertRowtypeExpr *) node; + ConvertRowtypeExpr *tmp_cre = convert; + bool nested_cre = false; + + /* + * If this is a nested ConvertRowtypeExpr resulting from a + * multi-level partition/inheritance hierarchy, its leaf node + * will be a whole-row expression. We can convert the leaf + * whole-row directly into the topmost parent, without + * converting it into the intermediate parent row types. + */ + while (IsA(tmp_cre->arg, ConvertRowtypeExpr)) + { + tmp_cre = castNode(ConvertRowtypeExpr, tmp_cre->arg); + nested_cre = true; + } + + if (nested_cre && IsA(tmp_cre->arg, Var) && + castNode(Var, tmp_cre->arg)->varattno == 0) + { + ConvertRowtypeExpr *new_convert; + + /* + * XXX: Instead of modifying the expression directly, we + * save a modified copy in the execution tree. May be it's + * safe to modify the expression directly, not sure. + */ + new_convert = makeNode(ConvertRowtypeExpr); + memcpy(new_convert, convert, sizeof(ConvertRowtypeExpr)); + new_convert->arg = tmp_cre->arg; + convert = new_convert; + } /* evaluate argument into step's result area */ ExecInitExprRec(convert->arg, state, resv, resnull);
nest_cre.out
Description: Binary data
\set num_rows 1000000 drop table t1; drop table p cascade; create table t1 (a int, b varchar, c timestamp) partition by range(a); create table t1p1 (b varchar, c timestamp, a int) partition by range(a); alter table t1 attach partition t1p1 for values from (0) to (100); create table t1p1p1(c timestamp, a int, b varchar); alter table t1p1 attach partition t1p1p1 for values from (0) to (50); insert into t1 select abs(random()) * 49, i, now() from generate_series(1, :num_rows) i; explain analyze select t1 from t1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Append (cost=0.00..13565.97 rows=719697 width=32) (actual time=0.079..683.430 rows=1000000 loops=1) -> Seq Scan on t1p1p1 (cost=0.00..13565.97 rows=719697 width=32) (actual time=0.077..616.439 rows=1000000 loops=1) Planning time: 0.193 ms Execution time: 717.929 ms (4 rows) explain analyze select t1 from t1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Append (cost=0.00..13565.97 rows=719697 width=32) (actual time=0.017..607.063 rows=1000000 loops=1) -> Seq Scan on t1p1p1 (cost=0.00..13565.97 rows=719697 width=32) (actual time=0.016..541.619 rows=1000000 loops=1) Planning time: 0.115 ms Execution time: 640.628 ms (4 rows) explain analyze select t1 from t1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Append (cost=0.00..13565.97 rows=719697 width=32) (actual time=0.016..605.972 rows=1000000 loops=1) -> Seq Scan on t1p1p1 (cost=0.00..13565.97 rows=719697 width=32) (actual time=0.016..541.256 rows=1000000 loops=1) Planning time: 0.109 ms Execution time: 639.426 ms (4 rows) explain analyze select t1 from t1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Append (cost=0.00..13565.97 rows=719697 width=32) (actual time=0.017..613.866 rows=1000000 loops=1) -> Seq Scan on t1p1p1 (cost=0.00..13565.97 rows=719697 width=32) (actual time=0.017..548.002 rows=1000000 loops=1) Planning time: 0.112 ms Execution time: 647.820 ms (4 rows) explain analyze select t1 from t1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Append (cost=0.00..13565.97 rows=719697 width=32) (actual time=0.016..612.391 rows=1000000 loops=1) -> Seq Scan on t1p1p1 (cost=0.00..13565.97 rows=719697 width=32) (actual time=0.016..546.605 rows=1000000 loops=1) Planning time: 0.111 ms Execution time: 646.292 ms (4 rows) create table p (a int, b varchar, c timestamp); create table p_c (b varchar, c timestamp, a int, d int); alter table p_c inherit p; create table p_c_c (c timestamp, a int, b varchar, d int, e int); alter table p_c_c inherit p_c; insert into p_c_c select c, a, b, a, a from t1; explain analyze select p from p; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Append (cost=0.00..14851.04 rows=749906 width=32) (actual time=0.044..737.725 rows=1000000 loops=1) -> Seq Scan on p (cost=0.00..0.00 rows=1 width=68) (actual time=0.002..0.002 rows=0 loops=1) -> Seq Scan on p_c (cost=0.00..0.00 rows=1 width=32) (actual time=0.000..0.000 rows=0 loops=1) -> Seq Scan on p_c_c (cost=0.00..14851.04 rows=749904 width=32) (actual time=0.042..672.052 rows=1000000 loops=1) Planning time: 0.194 ms Execution time: 772.241 ms (6 rows) explain analyze select p from p; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Append (cost=0.00..14851.04 rows=749906 width=32) (actual time=0.018..677.591 rows=1000000 loops=1) -> Seq Scan on p (cost=0.00..0.00 rows=1 width=68) (actual time=0.001..0.001 rows=0 loops=1) -> Seq Scan on p_c (cost=0.00..0.00 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=1) -> Seq Scan on p_c_c (cost=0.00..14851.04 rows=749904 width=32) (actual time=0.016..611.849 rows=1000000 loops=1) Planning time: 0.098 ms Execution time: 711.619 ms (6 rows) explain analyze select p from p; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Append (cost=0.00..14851.04 rows=749906 width=32) (actual time=0.018..684.336 rows=1000000 loops=1) -> Seq Scan on p (cost=0.00..0.00 rows=1 width=68) (actual time=0.001..0.001 rows=0 loops=1) -> Seq Scan on p_c (cost=0.00..0.00 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=1) -> Seq Scan on p_c_c (cost=0.00..14851.04 rows=749904 width=32) (actual time=0.015..618.316 rows=1000000 loops=1) Planning time: 0.104 ms Execution time: 718.329 ms (6 rows) explain analyze select p from p; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Append (cost=0.00..14851.04 rows=749906 width=32) (actual time=0.018..677.647 rows=1000000 loops=1) -> Seq Scan on p (cost=0.00..0.00 rows=1 width=68) (actual time=0.001..0.001 rows=0 loops=1) -> Seq Scan on p_c (cost=0.00..0.00 rows=1 width=32) (actual time=0.000..0.000 rows=0 loops=1) -> Seq Scan on p_c_c (cost=0.00..14851.04 rows=749904 width=32) (actual time=0.016..611.973 rows=1000000 loops=1) Planning time: 0.098 ms Execution time: 711.535 ms (6 rows) explain analyze select p from p; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Append (cost=0.00..14851.04 rows=749906 width=32) (actual time=0.018..675.063 rows=1000000 loops=1) -> Seq Scan on p (cost=0.00..0.00 rows=1 width=68) (actual time=0.001..0.001 rows=0 loops=1) -> Seq Scan on p_c (cost=0.00..0.00 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=1) -> Seq Scan on p_c_c (cost=0.00..14851.04 rows=749904 width=32) (actual time=0.016..609.450 rows=1000000 loops=1) Planning time: 0.098 ms Execution time: 708.892 ms (6 rows)
nest_cre.out.wo_patch
Description: Binary data