On Tue, 11 Mar 2025 at 17:38, Andy Fan <zhihuifan1...@163.com> wrote:
>
> Hi,

Hi!

> Currently when a query needs some parallel workers, postmaster spawns
> some backend for this query and when the work is done, the backend
> exit. there are some wastage here, e.g. syscache, relcache, smgr cache,
> vfd cache and fork/exit syscall itself.
>
> I am thinking if we should preallocate (or create lazily) some backends
> as a pool for parallel worker. The benefits includes:
>
> (1) Make the startup cost of a parallel worker lower in fact.
> (2) Make the core most suitable for the cases where executor need to a
> new worker to run a piece of plan more. I think this is needed in some
> data redistribution related executor in a distributed database.
>
> I guess the both cases can share some well designed code, like costing or
> transfer the data between worker and leader.

Surely forking from the postmaster is costly.

> The boring thing for the pool is it is [dbid + userId] based, which
> I mean if the dbid or userId is different with the connection in pool,
> they can't be reused. To reduce the effect of UserId, I think if we can
> start the pool with a superuser and then switch the user information
> with 'SET ROLE xxx'. and the pool can be created lazily.

I don't think this is secure. Currently, if a PostgreSQL backend was
started under a superuser role, there is no way to undo that. Consider a
pooled worker running a user query that calls a UDF. Inside this UDF,
one can simply run RESET SESSION AUTHORIZATION and then proceed to do
anything with superuser rights.

> Any comments on this idea?
>
> --
> Best Regards
> Andy Fan

--
Best regards,
Kirill Reshke
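
To make the concern above concrete, here is a toy model in Python. It is
not PostgreSQL code: `ToySession` and `untrusted_udf` are hypothetical
names, and the model assumes only what the reply states, namely that a
role switch on a superuser-started backend can be reset from inside user
code. PostgreSQL's real permission rules are richer and are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySession:
    """Toy stand-in for a pooled worker's identity (hypothetical, not PG internals)."""
    session_user: str               # identity the worker was started with
    session_is_superuser: bool      # whether that identity is a superuser
    role: Optional[str] = None      # set by a SET ROLE-like switch, if any

    @property
    def current_user(self) -> str:
        # The effective user is the switched-to role, else the session identity.
        return self.role if self.role is not None else self.session_user

    def set_role(self, role: str) -> None:
        # Models "SET ROLE xxx": only the effective role changes.
        self.role = role

    def reset_session_authorization(self) -> None:
        # Assumption from the reply: the reset falls back to the identity
        # the worker was started with, discarding the switched-to role.
        self.role = None

    def has_superuser_rights(self) -> bool:
        return self.role is None and self.session_is_superuser


def untrusted_udf(session: ToySession) -> str:
    """Hypothetical user-defined function that undoes the role switch."""
    session.reset_session_authorization()
    return session.current_user


# The pool starts the worker as a superuser, then switches to the caller's role.
worker = ToySession(session_user="postgres", session_is_superuser=True)
worker.set_role("alice")
user_before_udf = worker.current_user

# The user's query calls a UDF, which resets the switch.
user_after_udf = untrusted_udf(worker)
```

In this model the switch changes only the effective role; the
superuser identity the worker started with stays attached to the session,
which is why the role switch alone does not act as a security boundary.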