On 18.11.2025 20:37, Tomas Vondra wrote:
>
>
> On 11/18/25 19:35, David Geier wrote:
>>
>> On 18.11.2025 18:31, Tomas Vondra wrote:
>>> On 11/18/25 17:51, Tom Lane wrote:
>>>> David Geier <[email protected]> writes:
>>>>> On 18.11.2025 16:40, Tomas Vondra wrote:
>>>>>> It'd need code in the parallel-aware scans, i.e. seqscan, bitmap, index.
>>>>>> I don't think you'd need code in other plans, because all parallel plans
>>>>>> have one "driving" table.
>>>>
>>>> You're assuming that the planner will insert Gather nodes at arbitrary
>>>> places in the plan, which isn't true. If it does generate plans that
>>>> are problematic from this standpoint, maybe the answer is "don't
>>>> parallelize in exactly that way".
>>>>
>>>
>>> I think David has a point that nodes that "buffer" tuples (like Sort or
>>> HashAgg) would break the approach making this the responsibility of the
>>> parallel-aware scan. I don't see anything particularly wrong with such
>>> plans - plans with partial aggregation often look like that.
>>>
>>> Maybe this should be the responsibility of execProcnode.c, not the
>>> various nodes?

I hadn't realized that this approach has the same limitation:
ExecProcNode() is only called when e.g. heap_getnextslot() or
index_getnext_slot() have found a qualifying tuple. Otherwise, they just
keep crunching without returning.
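To illustrate, the loop that runs underneath ExecProcNode() for a
seqscan has roughly the following shape (schematic only, with simplified
names and signatures, not the actual executor code):

/*
 * Schematic only -- not the actual executor code.  The scan keeps
 * consuming tuples without returning to ExecProcNode() until one passes
 * the quals, so a stop check in execProcnode.c (or a check function
 * called between ExecProcNode() calls) never runs while this loop is
 * crunching through non-qualifying tuples.
 */
static TupleTableSlot *
SeqScanLoopSketch(TableScanDesc scan, TupleTableSlot *slot,
                  ExprState *qual, ExprContext *econtext)
{
    for (;;)
    {
        if (!heap_getnextslot(scan, ForwardScanDirection, slot))
            return NULL;    /* end of relation */

        econtext->ecxt_scantuple = slot;
        if (qual == NULL || ExecQual(qual, econtext))
            return slot;    /* only now do we get back to the caller */

        /*
         * Tuple didn't qualify: loop again.  A per-tuple stop check would
         * have to go here, or inside heap_getnextslot() itself.
         */
    }
}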
>> I like that idea, even though it would still not work while a node is
>> doing the crunching. That is after it has pulled all rows and before it
>> can return the first row. During this time the node won't call
>> ExecProcNode().
>>
> True. Perhaps we could provide a function nodes could call in suitable
> places to check whether to end?

This function would then also be required by the base relation scans,
and we would have to call it in more or less the same places
CHECK_FOR_INTERRUPTS() is called today. Beyond that, code such as
heap_getnextslot() or index_getnext_slot() doesn't have access to the
PlanState which would contain the stop flag, so that would have to be
propagated downwards as well. All of that would make for a fairly big
patch, which the initial patch avoids.

>
> Actually, how does canceling queries with parallel workers work? Is that
> done similarly to what your patch did?

Parallel workers use the same mechanism as normal backends, except that
parallel workers quit instead of waiting for the next query. The flow is
as follows: parallel workers catch SIGINT via StatementCancelHandler(),
which sets QueryCancelPending = true. The next time ProcessInterrupts()
is called, it errors out with elog(ERROR). BackgroundWorkerMain() then
catches the error and calls proc_exit().

This mechanism is very similar to what I have in my patch, with the
difference that (1) I use SendProcSignal() to tell the workers to stop,
and (2) I added another sigsetjmp() target around ExecutorRun() to be
able to bail out but still run all the shutdown code. (1) is necessary
to be able to distinguish between cancelling the query and stopping it
early without cancelling. (2) is necessary because the parallel shutdown
code needs to run before exiting. I had tried to piggyback on the
existing error handling mechanism by calling
siglongjmp(*PG_exception_stack, 1) and skipping EmitErrorReport() there
if got_stopped == true, but that gave me the following error:

ERROR:  lost connection to parallel worker
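For illustration, (2) has roughly the following shape (schematic, with
made-up names; simplified from what the patch actually does):

/*
 * Schematic of (2), not the patch code verbatim.  A custom ProcSignal
 * reason sets got_stopped; the worker's next interrupt check then
 * siglongjmp()s to stop_exec_jmp instead of raising an ERROR, so we
 * fall out of ExecutorRun() but still reach the shutdown calls below,
 * avoiding the "lost connection to parallel worker" failure.
 */
static sigjmp_buf stop_exec_jmp;
static volatile sig_atomic_t got_stopped = false;

static void
RunParallelQuerySketch(QueryDesc *queryDesc)
{
    if (sigsetjmp(stop_exec_jmp, 1) == 0)
    {
        /* normal path: run the plan to completion (arguments simplified) */
        ExecutorRun(queryDesc, ForwardScanDirection, 0);
    }
    else
    {
        /* stop path: we only arrive here when got_stopped was set */
        Assert(got_stopped);
    }

    /* the parallel shutdown code runs in both cases */
    ExecutorFinish(queryDesc);
    ExecutorEnd(queryDesc);
}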
>> Inspectability on that end seems useful. Maybe only with VERBOSE,
>> similarly to the extended per-worker information.
>>
>
> Maybe, no opinion. But it probably needs to apply to all nodes in the
> parallel worker, right? Or maybe it's even a per-worker detail.

I was thinking of making it a per-worker detail, such as time or number
of rows returned. Let's discuss this again once we've settled on a
solution.

--
David Geier