On 06.06.2023 5:13 PM, Robert Haas wrote:
On Tue, Jun 6, 2023 at 9:40 AM Robert Haas <robertmh...@gmail.com> wrote:
I'm not sure that there's a strong consensus, but I do think it's a good idea.
Let me elaborate on this a bit.



Not all databases have this problem, and PostgreSQL isn't going to be
able to stop having it without some kind of major architectural
change. Changing from a process model to a threaded model might be
insufficient, because while I think that threads consume fewer OS
resources than processes, what is really needed, in all likelihood, is
the ability to have idle connections have neither a process nor a
thread associated with them until they cease being idle. That's a huge
project and I'm not volunteering to do it, but if we want to have the
same kind of scalability as some competing products, that is probably
a place to which we ultimately need to go. Getting out of the current
model where every backend has an arbitrarily large amount of state
hanging off of random global variables, not all of which are even
known to any central system, is a critical step in that journey.

This looks like a built-in connection pooler, doesn't it?
Actually, a built-in connection pooler has a lot in common with multithreaded Postgres.
It also needs to keep session context.
The main difference is that there is no need to put all Postgres global/static variables there, because the lifetime of most of them is shorter than a transaction. So it is really enough to place the variables that do outlive a transaction in a single struct.
This is how the built-in connection pooler was implemented in PgPro.
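
To make the "single struct" idea concrete, here is a minimal sketch in C. It is not the PgPro code: SessionContext, its fields, session_switch() and session_set_work_mem() are all hypothetical names chosen for illustration. The assumption is that only state that outlives a transaction lives in the struct, so a backend can move to another session by swapping one pointer at a transaction boundary.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical session-level state. Only values that must survive
 * across transactions belong here; the fields are illustrative.
 */
typedef struct SessionContext
{
    int  session_id;
    char search_path[64];
    int  work_mem_kb;
    int  in_transaction;    /* nonzero while a transaction is open */
} SessionContext;

/* The session this backend is serving right now, or NULL when idle. */
static SessionContext *CurrentSession = NULL;

/*
 * Switch the backend to another session (or to NULL to go idle).
 * Refused while the current session has an open transaction, because
 * transaction-lifetime state is not part of SessionContext.
 * Returns 0 on success, -1 if the switch is not allowed.
 */
static int
session_switch(SessionContext *next)
{
    if (CurrentSession != NULL && CurrentSession->in_transaction)
        return -1;
    CurrentSession = next;
    return 0;
}

/* Code that used to write a global now writes through the pointer. */
static void
session_set_work_mem(int kb)
{
    assert(CurrentSession != NULL);
    CurrentSession->work_mem_kb = kb;
}
```

With this layout an idle session is just a SessionContext sitting in memory with no backend pointing at it, which is the property the quoted text asks for: an idle connection that has neither a process nor a thread associated with it.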

Reading all the concerns raised against multithreading Postgres makes me think that it may be reasonable to combine the two approaches: still have processes (backends), but be able to spawn multiple threads inside a process (for example, for parallel query execution). One could argue that such an approach only increases implementation complexity and combines the drawbacks of both approaches.
But actually such an approach would:
1. Support old (external, non-reentrant) extensions: they would be executed by dedicated backends.
2. Simplify parallel query execution and make it more efficient.
3. Allow the most efficient use of multithreaded PLs (such as JVM-based ones). Since there would be no single VM for all connections, but only one per group of connections (for example, those belonging to one user), most complaints about sharing a VM between different connections can be avoided.
4. Avoid or minimize problems with OOM and memory fragmentation.
5. Combine with a connection pooler (saving the state of an inactive connection without keeping a process or thread for it).






