On 18.11.2025 20:37, Tomas Vondra wrote:
>
>
> On 11/18/25 19:35, David Geier wrote:
>>
>> On 18.11.2025 18:31, Tomas Vondra wrote:
>>> On 11/18/25 17:51, Tom Lane wrote:
>>>> David Geier <[email protected]> writes:
>>>>> On 18.11.2025 16:40, Tomas Vondra wrote:
>>>>>> It'd need code in the parallel-aware scans, i.e. seqscan, bitmap, index.
>>>>>> I don't think you'd need code in other plans, because all parallel plans
>>>>>> have one "driving" table.
>>>>
>>>> You're assuming that the planner will insert Gather nodes at arbitrary
>>>> places in the plan, which isn't true. If it does generate plans that
>>>> are problematic from this standpoint, maybe the answer is "don't
>>>> parallelize in exactly that way".
>>>>
>>>
>>> I think David has a point that nodes that "buffer" tuples (like Sort or
>>> HashAgg) would break the approach making this the responsibility of the
>>> parallel-aware scan. I don't see anything particularly wrong with such
>>> plans - plans with partial aggregation often look like that.
>>>
>>> Maybe this should be the responsibility of execProcnode.c, not the
>>> various nodes?

I hadn't realized that this approach has the same limitation:
ExecProcNode() is only called when e.g. heap_getnextslot() or
index_getnext_slot() have found a qualifying tuple. Otherwise, they just
keep crunching without returning.
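To illustrate, the loop that runs underneath ExecProcNode() for a
seqscan has roughly the following shape (schematic only, with simplified
names and signatures, not the actual executor code):

/*
 * Schematic only -- not the actual executor code.  The scan keeps
 * consuming tuples without returning to ExecProcNode() until one passes
 * the quals, so a stop check in execProcnode.c (or a check function
 * called between ExecProcNode() calls) never runs while this loop is
 * crunching through non-qualifying tuples.
 */
static TupleTableSlot *
SeqScanLoopSketch(TableScanDesc scan, TupleTableSlot *slot,
                  ExprState *qual, ExprContext *econtext)
{
    for (;;)
    {
        if (!heap_getnextslot(scan, ForwardScanDirection, slot))
            return NULL;    /* end of relation */

        econtext->ecxt_scantuple = slot;
        if (qual == NULL || ExecQual(qual, econtext))
            return slot;    /* only now do we get back to the caller */

        /*
         * Tuple didn't qualify: loop again.  A per-tuple stop check would
         * have to go here, or inside heap_getnextslot() itself.
         */
    }
}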
>> I like that idea, even though it would still not work while a node is
>> doing the crunching. That is after it has pulled all rows and before it
>> can return the first row. During this time the node won't call
>> ExecProcNode().
>>
> True. Perhaps we could provide a function nodes could call in suitable
> places to check whether to end?

This function would then also be required by the base relation scans,
and we would have to call it in more or less the same places
CHECK_FOR_INTERRUPTS() is called today. Beyond that, code such as
heap_getnextslot() or index_getnext_slot() doesn't have access to the
PlanState which would contain the stop flag, so that would have to be
propagated downwards as well. All of that would make for a fairly big
patch, which the initial patch avoids.

>
> Actually, how does canceling queries with parallel workers work? Is that
> done similarly to what your patch did?

Parallel workers use the same mechanism as normal backends, except that
parallel workers quit instead of waiting for the next query. The flow is
as follows: parallel workers catch SIGINT via StatementCancelHandler(),
which sets QueryCancelPending = true. The next time ProcessInterrupts()
is called, it errors out with elog(ERROR). BackgroundWorkerMain() then
catches the error and calls proc_exit().

This mechanism is very similar to what I have in my patch, with the
difference that (1) I use SendProcSignal() to tell the workers to stop,
and (2) I added another sigsetjmp() target around ExecutorRun() to be
able to bail out but still run all the shutdown code. (1) is necessary
to be able to distinguish between cancelling the query and stopping it
early without cancelling. (2) is necessary because the parallel shutdown
code needs to run before exiting. I had tried to piggyback on the
existing error handling mechanism by calling
siglongjmp(*PG_exception_stack, 1) and skipping EmitErrorReport() there
if got_stopped == true, but that gave me the following error:

ERROR:  lost connection to parallel worker
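For illustration, (2) has roughly the following shape (schematic, with
made-up names; simplified from what the patch actually does):

/*
 * Schematic of (2), not the patch code verbatim.  A custom ProcSignal
 * reason sets got_stopped; the worker's next interrupt check then
 * siglongjmp()s to stop_exec_jmp instead of raising an ERROR, so we
 * fall out of ExecutorRun() but still reach the shutdown calls below,
 * avoiding the "lost connection to parallel worker" failure.
 */
static sigjmp_buf stop_exec_jmp;
static volatile sig_atomic_t got_stopped = false;

static void
RunParallelQuerySketch(QueryDesc *queryDesc)
{
    if (sigsetjmp(stop_exec_jmp, 1) == 0)
    {
        /* normal path: run the plan to completion (arguments simplified) */
        ExecutorRun(queryDesc, ForwardScanDirection, 0);
    }
    else
    {
        /* stop path: we only arrive here when got_stopped was set */
        Assert(got_stopped);
    }

    /* the parallel shutdown code runs in both cases */
    ExecutorFinish(queryDesc);
    ExecutorEnd(queryDesc);
}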
>> Inspectability on that end seems useful. Maybe only with VERBOSE,
>> similarly to the extended per-worker information.
>>
>
> Maybe, no opinion. But it probably needs to apply to all nodes in the
> parallel worker, right? Or maybe it's even a per-worker detail.

I was thinking of making it a per-worker detail, such as time or number
of rows returned. Let's discuss this again once we've settled on a
solution.

--
David Geier