Re: a pool for parallel worker

Kirill Reshke Tue, 25 Mar 2025 22:42:51 -0700

On Tue, 11 Mar 2025 at 17:38, Andy Fan <[email protected]> wrote:
> Hi,
>


Hi!

> Currently, when a query needs parallel workers, the postmaster spawns
> backends for the query, and when the work is done, those backends
> exit.  This wastes work, e.g. rebuilding the syscache, relcache, smgr
> cache and vfd cache, plus the fork/exit syscalls themselves.
>
> I am wondering whether we should preallocate (or lazily create) some
> backends as a pool for parallel workers. The benefits include:
>
> (1) Lower the startup cost of a parallel worker.
> (2) Make the core more suitable for cases where the executor needs a
> new worker to run a piece of a plan. I think this is needed by some
> data-redistribution executor nodes in a distributed database.
>
> I guess both cases can share some well-designed code, such as costing
> or transferring data between worker and leader.

Surely forking from the postmaster is costly.

> The awkward thing about the pool is that it is [dbid + userId] based,
> by which I mean that if the dbid or userId differs from that of a
> backend in the pool, that backend can't be reused.  To reduce the effect
> of userId, I think we could start the pool as a superuser and then
> switch the user with 'SET ROLE xxx'. The pool can also be created lazily.

I don't think this is secure. Currently, if a PostgreSQL backend was
started under a superuser role, there is no way to undo that.
Consider a pooled worker running a user query that calls a UDF. Inside
this UDF, one can simply RESET SESSION AUTHORIZATION and proceed to do
anything with superuser rights.
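
A minimal sketch of the concern, assuming the session was authenticated
as a superuser named "postgres" with no default role configured, and that
a hypothetical unprivileged role "app_user" exists. It uses RESET ROLE,
the direct counterpart of SET ROLE; the point is only that the switch can
be reversed by whatever SQL runs in the session:

```sql
-- Assumption: connected as superuser "postgres"; "app_user" is a
-- hypothetical unprivileged role created for this illustration.
SET ROLE app_user;
SELECT session_user, current_user;   -- postgres | app_user

-- Nothing stops user-supplied SQL (e.g. the body of a UDF) from
-- undoing the switch, because session_user is still the superuser.
RESET ROLE;
SELECT session_user, current_user;   -- postgres | postgres
```

So SET ROLE changes which privileges are checked, but it is not a
sandbox for a backend whose session user is a superuser.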

> Any comments on this idea?
>
> --
> Best Regards
> Andy Fan


-- 
Best regards,
Kirill Reshke
