Hello Ashutosh and Alvaro,

I appreciate you replying to the email.

On Tue, Dec 3, 2024 at 7:38 PM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
>
> I don't think planning time in assert-enabled builds is something we
> should worry about, at all.  Planning time in production builds is the
> important one.

Thank you for your reply. Making debug builds too slow is not good for
developers, so I'd like to see how these patches behave with
assert-enabled builds.

On Tue, Dec 3, 2024 at 1:12 PM Ashutosh Bapat
<ashutosh.bapat....@gmail.com> wrote:
>
> Hi Yuya,
> For one of the earlier versions, I had reported a large memory
> consumption in all cases and increase in planning time for Assert
> enabled builds. How does the latest version perform in those aspects?

1. Experimental results

I ran experiments to measure memory consumption during planning. These
are done with the release build. In the experiments, I used the
rebased version of your patch [1], which is attached to this email.

Table 1: Memory consumption when planning query A (without
partition-wise join (PWJ), MiB)
---------------------------------
    n | Master |    v23 |    v28
---------------------------------
    1 |  0.132 |  0.137 |  0.137
    2 |  0.158 |  0.166 |  0.166
    4 |  0.220 |  0.234 |  0.235
    8 |  0.347 |  0.375 |  0.375
   16 |  0.596 |  0.652 |  0.653
   32 |  1.104 |  1.221 |  1.223
   64 |  2.126 |  2.392 |  2.396
  128 |  4.245 |  4.917 |  4.925
  256 |  8.742 | 10.651 | 10.663
  384 | 13.603 | 17.159 | 17.176
  512 | 18.758 | 24.827 | 24.850
  640 | 23.924 | 32.223 | 32.253
  768 | 30.050 | 41.843 | 41.879
  896 | 36.224 | 51.937 | 51.978
 1024 | 42.923 | 64.058 | 64.105
---------------------------------

Table 2: Memory consumption when planning query A (with PWJ, MiB)
------------------------------------
    n |  Master |     v23 |     v28
------------------------------------
    1 |   0.190 |   0.194 |   0.195
    2 |   0.276 |   0.284 |   0.284
    4 |   0.461 |   0.475 |   0.475
    8 |   0.844 |   0.871 |   0.871
   16 |   1.584 |   1.640 |   1.641
   32 |   3.085 |   3.202 |   3.204
   64 |   6.099 |   6.365 |   6.369
  128 |  12.261 |  12.934 |  12.941
  256 |  25.061 |  26.970 |  26.982
  384 |  38.542 |  42.098 |  42.116
  512 |  52.541 |  58.610 |  58.633
  640 |  66.579 |  74.878 |  74.908
  768 |  82.421 |  94.214 |  94.250
  896 |  98.483 | 114.196 | 114.237
 1024 | 115.074 | 136.208 | 136.255
------------------------------------

Table 3: Memory consumption when planning query B (without PWJ, MiB)
----------------------------------
   n | Master |     v23 |     v28
----------------------------------
   1 | 16.019 |  16.404 |  16.404
   2 | 15.288 |  15.708 |  15.708
   4 | 15.674 |  16.360 |  16.360
   8 | 16.554 |  17.784 |  17.786
  16 | 18.221 |  19.954 |  19.958
  32 | 21.630 |  25.609 |  25.617
  64 | 28.913 |  39.419 |  39.427
 128 | 45.331 |  77.015 |  77.030
 256 | 86.127 | 192.884 | 192.916
----------------------------------

Table 4: Memory consumption when planning query B (with PWJ, MiB)
--------------------------------------
   n |   Master |      v23 |      v28
--------------------------------------
   1 |   33.623 |   34.008 |   34.008
   2 |   50.285 |   50.705 |   50.705
   4 |   85.562 |   86.247 |   86.247
   8 |  156.465 |  157.695 |  157.697
  16 |  298.692 |  300.424 |  300.428
  32 |  585.713 |  589.692 |  589.699
  64 | 1169.396 | 1179.901 | 1179.909
 128 | 2375.592 | 2407.275 | 2407.291
 256 | 4942.295 | 5049.053 | 5049.084
--------------------------------------

Next, I measured the planning times using the debug build with
assertions. In this experiment, I set CFLAGS to "-O0" and also used
the attached patch that removes assertions in Bitmapset-based indexes.

Table 5: Planning time of query A (debug build, ms)
-----------------------------------------
    n |  Master |     v28 | v28 w/ patch
-----------------------------------------
    1 |   0.648 |   0.664 |        0.665
    2 |   0.788 |   0.810 |        0.800
    4 |   0.891 |   0.936 |        0.931
    8 |   1.202 |   1.301 |        1.268
   16 |   1.973 |   2.145 |        2.042
   32 |   3.668 |   4.000 |        3.638
   64 |   8.093 |   8.597 |        7.167
  128 |  20.015 |  19.641 |       14.274
  256 |  57.634 |  51.008 |       29.930
  384 | 114.280 |  94.760 |       46.449
  512 | 196.492 | 154.230 |       63.758
  640 | 315.037 | 240.142 |       82.476
  768 | 466.149 | 338.043 |      101.318
  896 | 679.029 | 511.097 |      134.854
 1024 | 897.806 | 592.823 |      141.852
-----------------------------------------

Table 6: Planning time of query B (debug build, ms)
------------------------------------------
   n |   Master |      v28 | v28 w/ patch
------------------------------------------
   1 |   43.788 |   46.364 |       45.418
   2 |   42.637 |   45.750 |       44.093
   4 |   43.842 |   48.109 |       45.000
   8 |   47.504 |   54.410 |       48.199
  16 |   55.682 |   67.242 |       53.895
  32 |   77.736 |   98.507 |       66.877
  64 |  144.772 |  185.697 |       96.591
 128 |  411.967 |  503.644 |      166.437
 256 | 1653.681 | 1610.697 |      337.940
------------------------------------------

2. Discussion

Tables 1, 2, 3, and 4 show that the proposed patches increase memory
consumption. There seems to be no difference between v23 and v28. The
increase is more significant for query B, which is 2x or more.

For debug builds, I observed regressions compared to the master. The
regressions were reduced with the attached patch. This indicates that
the get_ec_[source|derive]_indexes[_strict]() functions (quoted below)
have time-consuming assertions. I think these assertions are helpful,
but it might be better to remove them to avoid slowing down debug
builds. What do you think?

=====
Bitmapset *
get_ec_derive_indexes_strict(PlannerInfo *root, EquivalenceClass *ec,
                             Relids relids)
{
    Bitmapset  *edis = NULL;
    int i = bms_next_member(relids, -1);

    if (i >= 0)
    {
        EquivalenceClassIndexes *index = &root->eclass_indexes_array[i];

        /*
         * bms_intersect to the first relation to try to keep the resulting
         * Bitmapset as small as possible.  This saves having to make a
         * complete bms_copy() of one of them.  One may contain significantly
         * more words than the other.
         */
        edis = bms_intersect(ec->ec_derive_indexes,
                             index->derive_indexes);

        while ((i = bms_next_member(relids, i)) >= 0)
        {
            index = &root->eclass_indexes_array[i];
            edis = bms_int_members(edis, index->derive_indexes);
        }
    }

#ifdef USE_ASSERT_CHECKING
    /* verify the results look sane */
    i = -1;
    while ((i = bms_next_member(edis, i)) >= 0)
    {
        RestrictInfo *rinfo = list_nth_node(RestrictInfo, root->eq_derives,
                                            i);

        Assert(bms_is_subset(relids, rinfo->clause_relids));
    }
#endif

    return edis;
}
=====

[1] 
https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com

-- 
Best regards,
Yuya Watari
From 1c5bdc574b9ab136ec29a2bd86b43fb484a57345 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.ba...@enterprisedb.com>
Date: Wed, 12 Jul 2023 14:34:14 +0530
Subject: [PATCH] Report memory used for planning a query in EXPLAIN ANALYZE

The memory used in the CurrentMemoryContext and its children is sampled
before and after calling pg_plan_query() from ExplainOneQuery(). The
difference in the two samples is reported as the memory consumed while
planning the query. This may not account for the memory allocated in
memory contexts which are not children of CurrentMemoryContext. These
contexts are usually other long lived contexts, e.g.
CacheMemoryContext, which are shared by all the queries run in a
session. The consumption in those can not be attributed only to a given
query and hence should not be reported any way.

The memory consumption is reported as "Planning Memory" property in
EXPLAIN ANALYZE output.

Ashutosh Bapat
---
 src/backend/commands/explain.c | 12 ++++++++++--
 src/backend/commands/prepare.c |  5 ++++-
 src/backend/utils/mmgr/mcxt.c  | 19 +++++++++++++++++++
 src/include/commands/explain.h |  3 ++-
 src/include/utils/memutils.h   |  1 +
 5 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a3f1d53d7a..42fe11b3ce 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -471,6 +471,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
        MemoryContextCounters mem_counters;
        MemoryContext planner_ctx = NULL;
        MemoryContext saved_ctx = NULL;
+       Size            mem_consumed;
 
        if (es->memory)
        {
@@ -491,12 +492,14 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
        if (es->buffers)
                bufusage_start = pgBufferUsage;
        INSTR_TIME_SET_CURRENT(planstart);
+       mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
        /* plan the query */
        plan = pg_plan_query(query, queryString, cursorOptions, params);
 
        INSTR_TIME_SET_CURRENT(planduration);
        INSTR_TIME_SUBTRACT(planduration, planstart);
+       mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) - 
mem_consumed;
 
        if (es->memory)
        {
@@ -514,7 +517,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
        /* run it (if needed) and produce output */
        ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
                                   &planduration, (es->buffers ? &bufusage : 
NULL),
-                                  es->memory ? &mem_counters : NULL);
+                                  es->memory ? &mem_counters : NULL, 
&mem_consumed);
 }
 
 /*
@@ -638,7 +641,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, 
ExplainState *es,
                           const char *queryString, ParamListInfo params,
                           QueryEnvironment *queryEnv, const instr_time 
*planduration,
                           const BufferUsage *bufusage,
-                          const MemoryContextCounters *mem_counters)
+                          const MemoryContextCounters *mem_counters,
+                          const Size *mem_consumed)
 {
        DestReceiver *dest;
        QueryDesc  *queryDesc;
@@ -769,6 +773,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, 
ExplainState *es,
                double          plantime = INSTR_TIME_GET_DOUBLE(*planduration);
 
                ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 
3, es);
+
+               if (mem_consumed)
+                       ExplainPropertyUInteger("Planning Memory", "bytes",
+                                                                       
(uint64) *mem_consumed, es);
        }
 
        /* Print info about runtime of triggers */
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index a93f970a29..e5a49e825e 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -582,6 +582,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause 
*into, ExplainState *es,
        MemoryContextCounters mem_counters;
        MemoryContext planner_ctx = NULL;
        MemoryContext saved_ctx = NULL;
+       Size            mem_consumed;
 
        if (es->memory)
        {
@@ -596,6 +597,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause 
*into, ExplainState *es,
        if (es->buffers)
                bufusage_start = pgBufferUsage;
        INSTR_TIME_SET_CURRENT(planstart);
+       mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
        /* Look it up in the hash table */
        entry = FetchPreparedStatement(execstmt->name, true);
@@ -632,6 +634,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause 
*into, ExplainState *es,
 
        INSTR_TIME_SET_CURRENT(planduration);
        INSTR_TIME_SUBTRACT(planduration, planstart);
+       mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) - 
mem_consumed;
 
        if (es->memory)
        {
@@ -656,7 +659,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause 
*into, ExplainState *es,
                if (pstmt->commandType != CMD_UTILITY)
                        ExplainOnePlan(pstmt, into, es, query_string, paramLI, 
pstate->p_queryEnv,
                                                   &planduration, (es->buffers 
? &bufusage : NULL),
-                                                  es->memory ? &mem_counters : 
NULL);
+                                                  es->memory ? &mem_counters : 
NULL, &mem_consumed);
                else
                        ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, 
paramLI);
 
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 70d33226cb..e0a119a886 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -865,6 +865,25 @@ MemoryContextStatsDetail(MemoryContext context,
        }
 }
 
+/*
+ * Compute the memory used in the given context and its children.
+ *
+ * XXX: Instead of this separate function we could modify
+ * MemoryContextStatsDetail() to report used memory and disable printing the
+ * detailed stats.
+ */
+extern Size
+MemoryContextMemUsed(MemoryContext context)
+{
+       MemoryContextCounters grand_totals;
+
+       memset(&grand_totals, 0, sizeof(grand_totals));
+
+       MemoryContextStatsInternal(context, 0, false, 100, &grand_totals, 
false);
+
+       return grand_totals.totalspace - grand_totals.freespace;
+}
+
 /*
  * MemoryContextStatsInternal
  *             One recursion level for MemoryContextStats
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index aa5872bc15..4fb267afef 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -108,7 +108,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, 
IntoClause *into,
                                                   ParamListInfo params, 
QueryEnvironment *queryEnv,
                                                   const instr_time 
*planduration,
                                                   const BufferUsage *bufusage,
-                                                  const MemoryContextCounters 
*mem_counters);
+                                                  const MemoryContextCounters 
*mem_counters,
+                                                  const Size *mem_consumed);
 
 extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
 extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index bf93433b78..9afb0bfb2a 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -92,6 +92,7 @@ extern void MemoryContextStatsDetail(MemoryContext context,
                                                                         bool 
print_to_stderr);
 extern void MemoryContextAllowInCriticalSection(MemoryContext context,
                                                                                
                bool allow);
+extern Size MemoryContextMemUsed(MemoryContext context);
 
 #ifdef MEMORY_CONTEXT_CHECKING
 extern void MemoryContextCheck(MemoryContext context);
-- 
2.45.2.windows.1

diff --git a/src/backend/optimizer/path/equivclass.c 
b/src/backend/optimizer/path/equivclass.c
index 86057538c5..9bbe11a9de 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -4014,18 +4014,6 @@ get_ec_source_indexes(PlannerInfo *root, 
EquivalenceClass *ec, Relids relids)
                rel_esis = bms_add_members(rel_esis, index->source_indexes);
        }
 
-#ifdef USE_ASSERT_CHECKING
-       /* verify the results look sane */
-       i = -1;
-       while ((i = bms_next_member(rel_esis, i)) >= 0)
-       {
-               RestrictInfo *rinfo = list_nth_node(RestrictInfo, 
root->eq_sources,
-                                                                               
        i);
-
-               Assert(bms_overlap(relids, rinfo->clause_relids));
-       }
-#endif
-
        /* bitwise-AND to leave only the ones for this EquivalenceClass */
        return bms_int_members(rel_esis, ec->ec_source_indexes);
 }
@@ -4065,18 +4053,6 @@ get_ec_source_indexes_strict(PlannerInfo *root, 
EquivalenceClass *ec,
                }
        }
 
-#ifdef USE_ASSERT_CHECKING
-       /* verify the results look sane */
-       i = -1;
-       while ((i = bms_next_member(esis, i)) >= 0)
-       {
-               RestrictInfo *rinfo = list_nth_node(RestrictInfo, 
root->eq_sources,
-                                                                               
        i);
-
-               Assert(bms_is_subset(relids, rinfo->clause_relids));
-       }
-#endif
-
        return esis;
 }
 
@@ -4103,18 +4079,6 @@ get_ec_derive_indexes(PlannerInfo *root, 
EquivalenceClass *ec, Relids relids)
                rel_edis = bms_add_members(rel_edis, index->derive_indexes);
        }
 
-#ifdef USE_ASSERT_CHECKING
-       /* verify the results look sane */
-       i = -1;
-       while ((i = bms_next_member(rel_edis, i)) >= 0)
-       {
-               RestrictInfo *rinfo = list_nth_node(RestrictInfo, 
root->eq_derives,
-                                                                               
        i);
-
-               Assert(bms_overlap(relids, rinfo->clause_relids));
-       }
-#endif
-
        /* bitwise-AND to leave only the ones for this EquivalenceClass */
        return bms_int_members(rel_edis, ec->ec_derive_indexes);
 }
@@ -4154,18 +4118,6 @@ get_ec_derive_indexes_strict(PlannerInfo *root, 
EquivalenceClass *ec,
                }
        }
 
-#ifdef USE_ASSERT_CHECKING
-       /* verify the results look sane */
-       i = -1;
-       while ((i = bms_next_member(edis, i)) >= 0)
-       {
-               RestrictInfo *rinfo = list_nth_node(RestrictInfo, 
root->eq_derives,
-                                                                               
        i);
-
-               Assert(bms_is_subset(relids, rinfo->clause_relids));
-       }
-#endif
-
        return edis;
 }
 

Reply via email to