On Fri, Mar 7, 2025, at 5:34 PM, Rafael Thofehrn Castro wrote:
>  Did additional benchmarks and found issues with the patch that doesn't do 
> execProcNode
> wrapping. There are sporadic crashes with **double free or corruption (top)**
> ****
> So making the patch that uses the wrapper the current one. Again, giving the 
> credits to
> torikoshia as being the owner of that section of the code.
> 

Rafael, thanks for working on it. It is a step forward in observability. I
started with some performance tests and the last improvements seem to fix the
overhead imposed in the initial patch version. I didn't notice any of these new
function in the perf report while executing fast queries.

I found a crash. It is simple to reproduce.

Session A:

select * from pg_stat_progress_explain;
\watch 2

Session B:

explain select pg_sleep(30);
\watch 2

8<--------------------------------------------------------------------8<

Backtrace:

Core was generated by `postgres: euler postgres [lo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401
5401 if (ps->ExecProcNodeOriginal != NULL)
#0  WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401
#1  0x00005624173829aa in handle_sig_alarm (postgres_signal_arg=<optimized 
out>) at timeout.c:414
#2  0x00005624173ba02c in wrapper_handler (postgres_signal_arg=14) at 
pqsignal.c:110
#3  <signal handler called>
#4  0x00007f20fa529e63 in epoll_wait (epfd=6, events=0x56244ef37e58, 
maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#5  0x00005624171fb02f in WaitEventSetWaitBlock (nevents=1, 
occurred_events=0x7ffdd9e62080, cur_timeout=-1, set=0x56244ef37dd8) at 
waiteventset.c:1190
#6  WaitEventSetWait (set=0x56244ef37dd8, timeout=timeout@entry=-1, 
occurred_events=occurred_events@entry=0x7ffdd9e62080, nevents=nevents@entry=1, 
wait_event_info=wait_event_info@entry=100663296) at waiteventset.c:1138
#7  0x000056241709513c in secure_read (port=0x56244eeb90e0, ptr=0x56241775a9a0 
<PqRecvBuffer>, len=8192) at be-secure.c:218
#8  0x000056241709bf2e in pq_recvbuf () at pqcomm.c:924
#9  0x000056241709ceb5 in pq_getbyte () at pqcomm.c:970
#10 0x000056241721b617 in SocketBackend (inBuf=0x7ffdd9e622a0) at postgres.c:361
#11 ReadCommand (inBuf=0x7ffdd9e622a0) at postgres.c:484
#12 PostgresMain (dbname=<optimized out>, username=<optimized out>) at 
postgres.c:4625
#13 0x00005624172167ed in BackendMain (startup_data=<optimized out>, 
startup_data_len=<optimized out>) at backend_startup.c:107
#14 0x000056241717519b in postmaster_child_launch (child_type=<optimized out>, 
child_slot=2, startup_data=startup_data@entry=0x7ffdd9e6253c, 
startup_data_len=startup_data_len@entry=4, 
client_sock=client_sock@entry=0x7ffdd9e62540) at launch_backend.c:274
#15 0x0000562417178c32 in BackendStartup (client_sock=0x7ffdd9e62540) at 
postmaster.c:3519
#16 ServerLoop () at postmaster.c:1688
#17 0x000056241717a6da in PostmasterMain (argc=argc@entry=1, 
argv=argv@entry=0x56244eeb81b0) at postmaster.c:1386
#18 0x0000562416e64f9a in main (argc=1, argv=0x56244eeb81b0) at main.c:230

8<--------------------------------------------------------------------8<

You call this feature "progressive explain". My first impression is that it
will only provide the execution plans for EXPLAIN commands. Instead of
"progressive explain", I would suggest "query progress" that is a general
database terminology. It seems natural to use "progressive explain" since you
are using the explain infrastructure (including the same options -- format,
settings, wal, ...) -- to print the execution plan.

+CREATE VIEW pg_stat_progress_explain AS
+    SELECT
+            *
+    FROM pg_stat_progress_explain(true);
+

There is no use for the function argument. If you decide to keep this function,
remove it.

Why don't you use the pgstat_progress_XXX() API? Since you are using a
pg_stat_progress_XXX view name I would expect using the command progress
reporting infrastructure (see backend_progress.c).

Maybe you could include datid and datname as the other progress reporting
views. It would avoid a join to figure out what the database is.

+static const struct config_enum_entry explain_format_options[] = {
+   {"text", EXPLAIN_FORMAT_TEXT, false},
+   {"xml", EXPLAIN_FORMAT_XML, false},
+   {"json", EXPLAIN_FORMAT_JSON, false},
+   {"yaml", EXPLAIN_FORMAT_YAML, false},
+   {NULL, 0, false}
+};

Isn't it the same definition as in auto_explain.c? Use only one definition for
auto_explain and this feature. You can put this struct into explain.c, use an
extern definition for guc_tables.c and put a extern PGDLLIMPORT defintion into
guc.h. See wal_level_options, for an example.

+static const struct config_enum_entry progressive_explain_options[] = {
+   {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+   {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+   {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+   {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+   {NULL, 0, false}
+};

The "analyze" is a separate option in auto_explain. Should we have 2 options?
One that enable/disable this feature and another one that enable/disable
analyze option.

Don't the other EXPLAIN options make sense here? Like serialize and summary.

    TupleDesc   tupDesc;        /* descriptor for result tuples */
    EState     *estate;         /* executor's query-wide state */
    PlanState  *planstate;      /* tree of per-plan-node state */
+   struct ExplainState *pe_es; /* progressive explain state if enabled */

Should you use the same name pattern here? pestate, for example.

PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)

Could you use a specific name? Even if you keep the proposed name, you should
use ProgressiveExplainHash, ProgressiveExplain or QueryProgress.

+$node->init;
+# Configure progressive explain to be logged immediatelly
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;

s/immediatelly/immediately/

+typedef struct progressiveExplainHashKey
+{
+   int         pid;            /* PID */
+} progressiveExplainHashKey;
+    
+typedef struct progressiveExplainHashEntry
+{
+   progressiveExplainHashKey key;  /* hash key of entry - MUST BE FIRST */
+   dsa_handle  h;
+   dsa_pointer p;
+} progressiveExplainHashEntry;

You don't need progressiveExplainHashKey. Use pid as key directly.

+           /* Update shared memory with new data */
+           strcpy(pe_data->plan, es->str->data);
+           pe_data->last_print = GetCurrentTimestamp();

I don't think last_print is accurate because it is not the time the execution 
plan
is printed but the time it was updated. I suggest last_updated_time.

+/* Flag set by timeouts to control when to print progressive explains */
+bool       ProgressiveExplainPending = false;

s/print/update/

There are other point that you use "print" but it is better to replace it with
"update".

+   progressiveExplainArray = ShmemInitHash("progressive explain hash",
+                                           50, 50,
+                                           &info,
+                                           HASH_ELEM | HASH_BLOBS);

I'm wondering why you use "array" in the name. ProgressiveExplainHash is a
better name.

+               entry->p = dsa_allocate(es->pe_a,
+                                       add_size(sizeof(progressiveExplainData),
+                                                add_size(strlen(es->str->data),
+                                                         
PROGRESSIVE_EXPLAIN_ALLOC_SIZE)));

I think you need a better name for PROGRESSIVE_EXPLAIN_ALLOC_SIZE because it
doesn't reflect what it is. PROGRESSIVE_EXPLAIN_FREE_SIZE or
PROGRESSIVE_EXPLAIN_AVAIL_SIZE?

Maybe you can use dshash.

There are some comments that still refers to the wrong function name.

+/*
+ * ExecProcNodeWithExplain
+ * ExecProcNode wrapper that initializes progressive explain
+ * and uwraps ExecProdNode to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)

and

+/*
+ * ExecProcNodeWithExplain
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *

+   /* state related to progressive explains */
+   struct PlanState *pe_curr_node; 
+   struct Instrumentation *pe_local_instr;
+   dsa_area   *pe_a;

Could you add some comments saying what each of these variables are for?

I didn't experiment but I was wondering if there is a way to avoid the
duplicates that you added to avoid the overhead.


--
Euler Taveira
EDB   https://www.enterprisedb.com/

Reply via email to