On Fri, Mar 7, 2025, at 5:34 PM, Rafael Thofehrn Castro wrote: > Did additional benchmarks and found issues with the patch that doesn't do > execProcNode > wrapping. There are sporadic crashes with **double free or corruption (top)** > **** > So making the patch that uses the wrapper the current one. Again, giving the > credits to > torikoshia as being the owner of that section of the code. >
Rafael, thanks for working on it. It is a step forward in observability. I started with some performance tests and the last improvements seem to fix the overhead imposed in the initial patch version. I didn't notice any of these new function in the perf report while executing fast queries. I found a crash. It is simple to reproduce. Session A: select * from pg_stat_progress_explain; \watch 2 Session B: explain select pg_sleep(30); \watch 2 8<--------------------------------------------------------------------8< Backtrace: Core was generated by `postgres: euler postgres [lo'. Program terminated with signal SIGSEGV, Segmentation fault. #0 WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401 5401 if (ps->ExecProcNodeOriginal != NULL) #0 WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401 #1 0x00005624173829aa in handle_sig_alarm (postgres_signal_arg=<optimized out>) at timeout.c:414 #2 0x00005624173ba02c in wrapper_handler (postgres_signal_arg=14) at pqsignal.c:110 #3 <signal handler called> #4 0x00007f20fa529e63 in epoll_wait (epfd=6, events=0x56244ef37e58, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 #5 0x00005624171fb02f in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffdd9e62080, cur_timeout=-1, set=0x56244ef37dd8) at waiteventset.c:1190 #6 WaitEventSetWait (set=0x56244ef37dd8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffdd9e62080, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=100663296) at waiteventset.c:1138 #7 0x000056241709513c in secure_read (port=0x56244eeb90e0, ptr=0x56241775a9a0 <PqRecvBuffer>, len=8192) at be-secure.c:218 #8 0x000056241709bf2e in pq_recvbuf () at pqcomm.c:924 #9 0x000056241709ceb5 in pq_getbyte () at pqcomm.c:970 #10 0x000056241721b617 in SocketBackend (inBuf=0x7ffdd9e622a0) at postgres.c:361 #11 ReadCommand (inBuf=0x7ffdd9e622a0) at postgres.c:484 #12 PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4625 #13 0x00005624172167ed in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at backend_startup.c:107 #14 0x000056241717519b in postmaster_child_launch (child_type=<optimized out>, child_slot=2, startup_data=startup_data@entry=0x7ffdd9e6253c, startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7ffdd9e62540) at launch_backend.c:274 #15 0x0000562417178c32 in BackendStartup (client_sock=0x7ffdd9e62540) at postmaster.c:3519 #16 ServerLoop () at postmaster.c:1688 #17 0x000056241717a6da in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x56244eeb81b0) at postmaster.c:1386 #18 0x0000562416e64f9a in main (argc=1, argv=0x56244eeb81b0) at main.c:230 8<--------------------------------------------------------------------8< You call this feature "progressive explain". My first impression is that it will only provide the execution plans for EXPLAIN commands. Instead of "progressive explain", I would suggest "query progress" that is a general database terminology. It seems natural to use "progressive explain" since you are using the explain infrastructure (including the same options -- format, settings, wal, ...) -- to print the execution plan. +CREATE VIEW pg_stat_progress_explain AS + SELECT + * + FROM pg_stat_progress_explain(true); + There is no use for the function argument. If you decide to keep this function, remove it. Why don't you use the pgstat_progress_XXX() API? Since you are using a pg_stat_progress_XXX view name I would expect using the command progress reporting infrastructure (see backend_progress.c). Maybe you could include datid and datname as the other progress reporting views. It would avoid a join to figure out what the database is. +static const struct config_enum_entry explain_format_options[] = { + {"text", EXPLAIN_FORMAT_TEXT, false}, + {"xml", EXPLAIN_FORMAT_XML, false}, + {"json", EXPLAIN_FORMAT_JSON, false}, + {"yaml", EXPLAIN_FORMAT_YAML, false}, + {NULL, 0, false} +}; Isn't it the same definition as in auto_explain.c? Use only one definition for auto_explain and this feature. You can put this struct into explain.c, use an extern definition for guc_tables.c and put a extern PGDLLIMPORT defintion into guc.h. See wal_level_options, for an example. +static const struct config_enum_entry progressive_explain_options[] = { + {"off", PROGRESSIVE_EXPLAIN_NONE, false}, + {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false}, + {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false}, + {"false", PROGRESSIVE_EXPLAIN_NONE, true}, + {NULL, 0, false} +}; The "analyze" is a separate option in auto_explain. Should we have 2 options? One that enable/disable this feature and another one that enable/disable analyze option. Don't the other EXPLAIN options make sense here? Like serialize and summary. TupleDesc tupDesc; /* descriptor for result tuples */ EState *estate; /* executor's query-wide state */ PlanState *planstate; /* tree of per-plan-node state */ + struct ExplainState *pe_es; /* progressive explain state if enabled */ Should you use the same name pattern here? pestate, for example. PG_LWLOCK(52, SerialControl) +PG_LWLOCK(53, ExplainHash) Could you use a specific name? Even if you keep the proposed name, you should use ProgressiveExplainHash, ProgressiveExplain or QueryProgress. +$node->init; +# Configure progressive explain to be logged immediatelly +$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0'); +$node->start; s/immediatelly/immediately/ +typedef struct progressiveExplainHashKey +{ + int pid; /* PID */ +} progressiveExplainHashKey; + +typedef struct progressiveExplainHashEntry +{ + progressiveExplainHashKey key; /* hash key of entry - MUST BE FIRST */ + dsa_handle h; + dsa_pointer p; +} progressiveExplainHashEntry; You don't need progressiveExplainHashKey. Use pid as key directly. + /* Update shared memory with new data */ + strcpy(pe_data->plan, es->str->data); + pe_data->last_print = GetCurrentTimestamp(); I don't think last_print is accurate because it is not the time the execution plan is printed but the time it was updated. I suggest last_updated_time. +/* Flag set by timeouts to control when to print progressive explains */ +bool ProgressiveExplainPending = false; s/print/update/ There are other point that you use "print" but it is better to replace it with "update". + progressiveExplainArray = ShmemInitHash("progressive explain hash", + 50, 50, + &info, + HASH_ELEM | HASH_BLOBS); I'm wondering why you use "array" in the name. ProgressiveExplainHash is a better name. + entry->p = dsa_allocate(es->pe_a, + add_size(sizeof(progressiveExplainData), + add_size(strlen(es->str->data), + PROGRESSIVE_EXPLAIN_ALLOC_SIZE))); I think you need a better name for PROGRESSIVE_EXPLAIN_ALLOC_SIZE because it doesn't reflect what it is. PROGRESSIVE_EXPLAIN_FREE_SIZE or PROGRESSIVE_EXPLAIN_AVAIL_SIZE? Maybe you can use dshash. There are some comments that still refers to the wrong function name. +/* + * ExecProcNodeWithExplain + * ExecProcNode wrapper that initializes progressive explain + * and uwraps ExecProdNode to the original function. + */ +static TupleTableSlot * +ExecProcNodeExplain(PlanState *node) and +/* + * ExecProcNodeWithExplain + * Responsible for initialization of all structures related to progressive + * explains. + * + /* state related to progressive explains */ + struct PlanState *pe_curr_node; + struct Instrumentation *pe_local_instr; + dsa_area *pe_a; Could you add some comments saying what each of these variables are for? I didn't experiment but I was wondering if there is a way to avoid the duplicates that you added to avoid the overhead. -- Euler Taveira EDB https://www.enterprisedb.com/